About PPT-DB

The PPT-DB is a database of databases. It houses more than 20 carefully curated databases, each of which contains commonly predicted protein property information. PPT-DB is designed to serve two purposes. First, it is intended to serve as a centralized, up-to-date, freely downloadable, easily queried and carefully curated repository of predictable or "derived" protein property data (such as secondary structures, transmembrane helices, signal peptides, B-factors, disulfide pairings, contact order and folding rates). In this role, PPT-DB can serve as a one-stop, fully standardized repository from which developers can obtain the training, testing and validation data needed for almost any kind of protein property prediction program they may wish to create.

Second, PPT-DB can serve as a tool for homology-based protein property prediction. Users (as opposed to developers) may query PPT-DB with a sequence of interest and have a specific property (say, secondary structure) predicted using a sequence similarity search against PPT-DB's extensive collection of proteins with known properties. This approach exploits the well-known fact that protein structure, function and dynamics are highly conserved between homologous proteins. Predictions derived from PPT-DB's similarity searches are typically 90-95% correct (for categorical predictions, such as secondary structure) or within 5-10% of the actual measured values (for numeric predictions, such as B-factors or order parameters). This performance is 10-30% better than what is typically obtained from standard "ab initio" predictions. PPT-DB and all of its contents are available at: http://www.pptdb.ca
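To make the homology-based prediction idea concrete, here is a minimal sketch of transferring per-residue secondary-structure labels from the best-matching entry of a small in-memory database onto a query sequence. It is not PPT-DB's actual pipeline: a global Needleman-Wunsch alignment with a toy match/mismatch/gap scoring scheme stands in for a real similarity search (such as BLAST with a substitution matrix), and the `predict` function, the entry IDs, sequences and labels, and the choice of coil ('C') for query residues with no aligned template residue are all hypothetical.

```python
from typing import Dict, List, Tuple

# Toy scoring scheme (hypothetical): a real search would use a substitution
# matrix such as BLOSUM62 and affine gap penalties.
MATCH, MISMATCH, GAP = 2, -1, -2


def pair_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def align(a: str, b: str) -> Tuple[int, str, str]:
    """Needleman-Wunsch global alignment; returns (score, gapped a, gapped b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + GAP,  # gap in b
                score[i][j - 1] + GAP,  # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a: List[str] = []
    out_b: List[str] = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def predict(query: str, database: Dict[str, Tuple[str, str]]) -> Tuple[str, str]:
    """Copy per-residue labels from the highest-scoring database entry onto the query."""
    if not database:
        raise ValueError("database is empty")
    best = None  # (score, entry_id, query alignment, template alignment, labels)
    for entry_id, (seq, labels) in database.items():
        s, q_aln, t_aln = align(query, seq)
        if best is None or s > best[0]:
            best = (s, entry_id, q_aln, t_aln, labels)
    _, best_id, q_aln, t_aln, labels = best
    predicted: List[str] = []
    k = 0  # index into the template's residues (and hence its labels)
    for qc, tc in zip(q_aln, t_aln):
        if tc != "-":
            label = labels[k]
            k += 1
        else:
            label = "C"  # hypothetical default: coil where the query has no template partner
        if qc != "-":
            predicted.append(label)
    return best_id, "".join(predicted)


# Hypothetical two-entry database: ID -> (sequence, per-residue H/E/C labels).
toy_db = {
    "toy1": ("MKTAYIAKQR", "CHHHHHHHCC"),
    "toy2": ("GSVEVKVTGT", "CEEEEEECCC"),
}
print(predict("MKTAYLAKQR", toy_db))  # ('toy1', 'CHHHHHHHCC')
```

The query differs from `toy1` at a single position, so the whole label string is copied across; where the alignment contains gaps, template labels opposite a query gap are skipped, so the prediction always has one label per query residue.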

Users and developers are invited to suggest or submit additions, enhancements or corrections to the database.

Each database in PPT-DB contains one or more references describing the origin of the data or the program(s) used to generate the data. Many of the databases in PPT-DB were developed in-house using a program called VADAR to help derive or extract the data from PDB files. Other databases were assembled semi-automatically from information contained in SwissProt or the PDB. Some databases (such as the transmembrane helix, the transmembrane beta barrel, the beta hairpin, the beta edge/central strand and the folding rate databases) were assembled manually. Still others, such as EVA and TMH-Benchmark, were obtained from external sources and re-formatted to make them compatible with the PPT-DB annotation standards. Each database and each update to a database is dated and numbered, which provides a well-defined audit trail and allows developers to share and compare testing/training data. Examples of the database content and the associated database references are given under the "Databases Details" link. Depending on their size and ease of curation, PPT-DB databases are updated as frequently as once per month (e.g., the secondary structure databases) or as infrequently as once per year (e.g., the folding rate data).

This project is supported by Genome Alberta and by Genome Canada, a not-for-profit organization that is leading Canada's national genomics strategy with $600 million in funding from the federal government.