Database Support for Bioinformatics


The extensibility and algorithmic functionality embedded in modern DBMSs are not widely recognised, in particular the opportunity to extend DBMSs with new specialised datatypes and to implement algorithms and searches inside the database system as 'stored procedures'. The key problem is that these individual components, while potentially powerful, lack the integration required to optimise their support of bioinformatics databases. Once these components are integrated, DBMSs become an obvious platform for bioinformatics databases.
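To make the idea of running an algorithm inside the database concrete, here is a minimal sketch in Python. It is not taken from any of the projects below. It assumes SQLite's user-defined SQL functions as a lightweight stand-in for the stored procedures of a full DBMS, and the table `reads`, the function `gc_content`, and the sample sequences are hypothetical names chosen for illustration.

```python
import sqlite3


def gc_content(sequence):
    """Fraction of G and C bases in a DNA sequence (0.0 for empty or NULL input)."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def build_demo_db():
    """Create an in-memory database with a hypothetical 'reads' table."""
    conn = sqlite3.connect(":memory:")
    # Register the Python function so that SQL queries can call it by name.
    # This plays the role a stored procedure would play in a server DBMS.
    conn.create_function("gc_content", 1, gc_content)
    conn.execute("CREATE TABLE reads (id INTEGER PRIMARY KEY, seq TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO reads (seq) VALUES (?)",
        [("ATGC",), ("GGCC",), ("ATAT",)],
    )
    return conn


def gc_rich_reads(conn, threshold):
    """Return (id, seq) rows whose GC fraction is at least `threshold`."""
    cursor = conn.execute(
        "SELECT id, seq FROM reads WHERE gc_content(seq) >= ? ORDER BY id",
        (threshold,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = build_demo_db()
    print(gc_rich_reads(connection, 0.5))
```

The filter is evaluated by the database engine as part of the query, so only matching rows are returned to the application. A production system would more likely use the procedural or CLR/Java extension mechanism of the chosen DBMS, which this sketch does not attempt to model.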

We have several projects in this area of database support for Bioinformatics:

  • The BioSeqDB project develops a prototype system for processing the output of high-throughput DNA sequencers. Our idea is to utilize the facilities of today's database systems to efficiently process huge amounts of data.
    This work is done in cooperation with Microsoft Research.

  • In our dbBLAST project, we aim to improve the performance of sequence alignment searches in sequence databases. Our idea is to utilize the facilities of today's database systems to efficiently process huge amounts of data.
    This work is done in cooperation with Microsoft Research.

  • The PathBank project developed an integrated metabolic pathway web database. The system includes a flexible visualisation GUI on top of an optimized pathway database that supports the BioPAX standard. As part of this project, we developed Genea, a novel and flexible method for creating a relational mapping and storage layout from an arbitrary OWL schema, and studied the performance implications of our approach versus standard RDF and OWL stores.
    This work is done in cooperation with NICTA and Axogenic.

The Team

This work would not have been possible without the contributions of our project students:
Harshana Randeni, Tim Kraska, Leng Hong Tan, Puneet Kush, Joshua Ho, Tristan Manwaring, Thanh-Mai Diep, Alexander Bolodurin, and Sujit George.

Publications

Tim Kraska and Uwe Röhm. Genea: Schema-Aware Mapping of Ontologies into Relational Databases. In: Proceedings of the 13th International Conference on Management of Data (COMAD), December, Delhi, India, 2006.

Joshua Ho, Tristan Manwaring, Seokhee Hong, Uwe Röhm, Kai Xu, David Fung and Tim Kraska. PathBank: Web-based Querying and Visualisation of an Integrated Biological Pathway Database. In: Proceedings of the International Symposium on Bioinformatics Visualization (BioViz'06), 26-28 July, Sydney, 2006.

Uwe Röhm and Thanh-Mai Diep. How to BLAST your Database - A Comparison Study of Stored Procedures for BLAST Searches. In: Proceedings of the 11th International Conference on Database Systems and Advanced Applications (DASFAA'2006), 12-15 April, Singapore, 2006.

Chun-Wu Chen and Uwe Röhm. A Service-oriented Approach for the Parallelization of Data-intensive Algorithms in a Grid-enabled Cluster. In: Proceedings of the First International Workshop on Biomedical Data Engineering (BMDE) in conjunction with ICDE2005, Tokyo, Japan, April 3-4, pages 22-29, 2005.