A Fast Clustering Algorithm for Analyzing Highly Similar Compounds of Very Large Libraries

Li, Weizhong
Journal of Chemical Information and Modeling, ISSN 1549-9596, 2006, vol. 46, no. 5, pp. 1919-1923 (article, 5 pages)

Abstract

As a result of recent developments in high-throughput screening for drug discovery, the number of available screening compounds has been growing rapidly. Chemical vendors provide millions of compounds; however, these compounds are highly redundant. Clustering analysis, a technique that groups similar compounds into families, can be used to analyze such redundancy. Many available clustering methods focus on accurate classification of compounds; they are slow and are not suitable for very large compound libraries. This paper describes a fast clustering method based on an incremental clustering algorithm and the 2D fingerprints of compounds. The method can cluster a very large data set with millions of compounds in hours on a single computer. A program implementing this method, called cd-hit-fp, is available from http://chemspace.org.
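
The incremental clustering idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the cd-hit-fp implementation: the abstract does not state the similarity measure, the threshold, or the order in which compounds are processed. The sketch assumes fingerprints are given as sets of on-bit indices, assumes the Tanimoto coefficient as the similarity measure, and uses hypothetical names (`tanimoto`, `incremental_cluster`, `threshold`). Each compound is compared against the representatives of existing clusters; it joins the first cluster whose representative is similar enough, and otherwise starts a new cluster.

```python
from typing import FrozenSet, List, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices.

    Assumption: the abstract does not name a similarity measure; Tanimoto is
    used here only because it is a common choice for 2D fingerprints.
    """
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def incremental_cluster(
    fingerprints: Sequence[FrozenSet[int]],
    threshold: float = 0.9,  # hypothetical default, not taken from the paper
) -> List[List[int]]:
    """Greedy one-pass clustering.

    Returns clusters as lists of input indices; the first index in each
    list is that cluster's representative.
    """
    clusters: List[List[int]] = []
    for i, fp in enumerate(fingerprints):
        for members in clusters:
            representative = fingerprints[members[0]]
            if tanimoto(fp, representative) >= threshold:
                members.append(i)  # similar enough: join this cluster
                break
        else:
            clusters.append([i])  # no match: this compound starts a new cluster
    return clusters


if __name__ == "__main__":
    demo = [
        frozenset({1, 2, 3, 4}),
        frozenset({1, 2, 3, 4, 5}),
        frozenset({10, 11, 12}),
        frozenset({1, 2, 3, 4}),
    ]
    print(incremental_cluster(demo, threshold=0.75))
```

Because each compound is compared only with cluster representatives rather than with every other compound, the number of comparisons grows with the number of clusters, not with the square of the library size. That is the general reason a one-pass scheme of this kind suits highly redundant libraries.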
