Previous Student Projects

[Photo: Mac Mini Cluster]
The following list gives examples of projects I have supervised previously.
  • A Touch Interface for SQL Databases (Honours project) This project is about developing a touch interface for SQL databases. While SQL is the 'lingua franca' for database management systems, it requires some technical skill to be used effectively. Modern tablet and slate computers, however, feature an interactive multi-touch display that allows users to navigate interactive content intuitively with hand gestures. In this project, the student shall develop a small prototype that uses hand gestures for navigating a SQL database. To this end, the student will define a set of table visualisations and gestures, and the corresponding mapping onto SQL operators, which can then be performed on a relational database. We will concentrate on some core SQL functions such as filtering, projection, and 2-way joins. The goal is to offer an interactive frontend to a standard SQL database that can be operated on a touch display by gestures alone.
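
    The gesture-to-operator mapping can be pictured as a small dispatch table. The gesture names, parameters, and table schema below are illustrative assumptions, not the project's actual design:

```python
# Minimal sketch of mapping touch gestures to SQL operators.
# Gesture names and parameters are illustrative assumptions.

def gesture_to_sql(gesture, table, **params):
    """Translate a recognised gesture into a SQL statement string."""
    if gesture == "swipe_column_off":      # swipe a column off-screen -> projection
        keep = ", ".join(params["remaining_columns"])
        return f"SELECT {keep} FROM {table}"
    if gesture == "pinch_rows":            # pinch on a column value -> filter
        return f"SELECT * FROM {table} WHERE {params['column']} = ?"
    if gesture == "drag_tables_together":  # drag one table onto another -> 2-way join
        other, left, right = params["other"], params["left_col"], params["right_col"]
        return f"SELECT * FROM {table} JOIN {other} ON {table}.{left} = {other}.{right}"
    raise ValueError(f"unrecognised gesture: {gesture}")
```

    The real prototype would additionally keep the visualisation in sync with the query result after each gesture.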

  • Airline Passenger Connections (Honours project; co-supervisor: Irena Koprinska; industry-partner: Qantas)

    Passenger air travel involves a myriad of complex daily flight connections. Successful connections are important to a full-service airline's revenue management, airport operations and customer experience. Connecting times depend on many factors such as airport configuration, air and terminal congestion, local security procedures etc. Airlines design flight schedules, booking and airport procedures around major connecting traffic flows. They aim to provide optimum connections through their respective booking engines, balancing the needs between minimum transit time and maximum reliability. The goal of this project is to map out and analyse the flight connection hot spots across the full Qantas network, providing a full view of the performance of passenger connections and extracting data patterns. The analysis will be based on real passenger and schedules data, using a massively parallel Teradata relational database. This project is a fantastic opportunity to apply your knowledge in a high-profile, industry-linked project.

  • Building Monitoring with SSDQP: How to fix our airconditioning... (MIT 12cp project)
  • "Freshness-Aware Middle-Tier Caching for Multi-Object Requests." (Honours Project)

    Many e-business sites nowadays use a cluster as their hardware platform, together with the built-in clustering support of web and application servers. However, today's cluster approaches have some issues with guaranteeing distributed cache consistency and access to the latest data. In this project, we want to extend the clustering support of J2EE application servers so that it can guarantee the consistency and freshness of the cached data while at the same time improving reader performance. The idea is to allow different 'fresh' versions of beans in the cluster and to use this as an additional degree of freedom for the routing decision of J2EE requests. When a client agrees to access stale data, it can be served by the cluster much faster than a client requesting up-to-date data, because the latter must first wait for possible concurrent writers. This project can build on a prototypical implementation of such freshness-aware caching in JBoss and extend it to multi-object requests, that is, invocations to the J2EE server which involve several cached objects. The research question is how to support consistency and freshness for all cached objects together, and how efficiently this can be done.
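
    The routing idea can be sketched as follows. The class and its interface are hypothetical, not the JBoss prototype's actual API; for a multi-object request, taking the oldest refresh time across the requested objects ensures that all objects meet the freshness limit together:

```python
# Sketch of a freshness-aware routing decision for a replicated cache.
# A request carries a staleness limit (seconds); the router prefers a
# replica whose cached copies are all fresh enough, so the request avoids
# waiting for concurrent writers on the up-to-date master.

import time

class FreshnessRouter:
    def __init__(self, replicas):
        # replicas: dict replica name -> {object_id: last refresh time (s)}
        self.replicas = replicas

    def route(self, object_ids, staleness_limit, now=None):
        """Pick a replica where ALL requested objects are fresh enough."""
        now = time.time() if now is None else now
        for name, last_refresh in self.replicas.items():
            # the oldest refresh among the objects bounds the staleness
            staleness = now - min(last_refresh[o] for o in object_ids)
            if staleness <= staleness_limit:
                return name
        return "master"  # nobody is fresh enough: pay the full price
```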

  • Suffix-Tree Indexing of Sequence Databases (MIT Research Project)

    In our dbBLAST project, we aim at improving the performance of sequence alignment searches in sequence databases. We are currently using an inverted index structure from IR to efficiently identify potential sub-sequence matches. This project shall compare this with a suffix-tree index that indexes all substrings occurring in a sequence database. This is a challenging task: the suffix tree shall be implemented using standard SQL and stored procedures in MS SQL Server, and it must be optimised for disk-based access. The student shall also investigate the scalability and performance characteristics of a BLAST search with this new indexing method. This project can build on previous work on (a) DBMS-based sequence alignment search and (b) parallelisation in database clusters.

    Skills needed: Good knowledge of databases; good knowledge of data structures and experience with stored procedures an advantage

    Suitable majors: Databases, Software Engineering, Computer Science
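
    One possible relational encoding, sketched in Python for clarity: every trie node becomes a row (node_id, parent_id, edge_char), which maps directly onto a table with stored-procedure lookups. This is a naive suffix trie rather than the compressed, disk-optimised suffix tree the project targets, and the table layout is an illustrative assumption:

```python
# Sketch of a relational encoding of a suffix index over a sequence.

def build_suffix_rows(text):
    rows = []                      # rows (node_id, parent_id, edge_char)
    children = {0: {}}             # node_id -> {char: child_id}
    next_id = 1
    for i in range(len(text)):     # insert every suffix text[i:]
        node = 0
        for ch in text[i:]:
            if ch not in children[node]:
                rows.append((next_id, node, ch))
                children[node][ch] = next_id
                children[next_id] = {}
                next_id += 1
            node = children[node][ch]
    return rows

def contains(rows, pattern):
    """Walk the rows as a stored procedure would: one lookup per character."""
    index = {(parent, ch): node for node, parent, ch in rows}
    node = 0
    for ch in pattern:
        if (node, ch) not in index:
            return False
        node = index[(node, ch)]
    return True
```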

  • A SPARQL Query Processor for GeneaStore (MIT Software Project)

    The aim of this project is to integrate a SPARQL query processor into PathBank, a metabolic pathway web database that is built on top of GeneaStore, our own OWL/RDF store that uses PostgreSQL 8 as its underlying database system. The query processor shall be able to translate queries in SPARQL, the declarative query language for RDF data, into appropriate SQL to access our GeneaStore.

    Skills needed: Sound Java Programming Experience, Good knowledge in databases

    Suitable majors: Databases, Software Engineering, Computer Science
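
    The translation principle can be sketched over a hypothetical triples(subject, predicate, object) table - one table alias per triple pattern, joined on shared variables. GeneaStore's actual relational schema differs, so this only illustrates the idea:

```python
# Sketch of translating a SPARQL basic graph pattern into SQL over a
# single triples(subject, predicate, object) table.

def bgp_to_sql(patterns):
    """patterns: list of (s, p, o); strings starting with '?' are variables."""
    select, where, bound = [], [], {}
    for i, triple in enumerate(patterns):
        alias = f"t{i}"
        for col, term in zip(("subject", "predicate", "object"), triple):
            ref = f"{alias}.{col}"
            if term.startswith("?"):
                if term in bound:
                    where.append(f"{ref} = {bound[term]}")  # join on shared variable
                else:
                    bound[term] = ref
                    select.append(f"{ref} AS {term[1:]}")
            else:
                where.append(f"{ref} = '{term}'")           # constant term
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    return f"SELECT {', '.join(select)} FROM {tables} WHERE {' AND '.join(where)}"
```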

  • Resource-aware Data Processing in Wireless Sensor Networks (MIT Research Project)

    Energy efficiency and good resource management are of paramount importance for wireless sensor networks, which are battery powered and have limited resources. The aim of this project is to study the potential of resource-awareness and adaptivity of in-network data processing algorithms for wireless sensor networks.

    The student shall develop a resource-aware software component for sensor nodes that monitors the usage of different sensor node resources such as battery level, memory consumption, CPU load or communication buffers, and that can provide users with the current status of available resources in the sensor network. This component shall further be used to enable a given data processing algorithm to become resource-aware and adapt dynamically to the available resources.

    Skills needed: Java Programming Experience, Knowledge of networks and databases an advantage

    Suitable majors: Networking, Databases, Software Engineering, Computer Science
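
    The component's role can be sketched as follows. The resource names, thresholds, and adaptation rule are illustrative assumptions; on real sensor nodes the readings would come from the device APIs:

```python
# Sketch of a resource-awareness component: it samples node resources and
# lets a data-processing algorithm adapt, e.g. by lowering its sampling rate.

class ResourceMonitor:
    def __init__(self, read_battery, read_free_memory):
        # callables returning a level in 0.0..1.0 (stand-ins for device APIs)
        self.read_battery = read_battery
        self.read_free_memory = read_free_memory

    def status(self):
        """Current status of available resources on this node."""
        return {"battery": self.read_battery(), "memory": self.read_free_memory()}

def adapt_sampling_rate(base_rate_hz, status, battery_low=0.2):
    """Halve the sampling rate when the battery falls below the threshold."""
    return base_rate_hz / 2 if status["battery"] < battery_low else base_rate_hz
```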

  • Comparison of Central vs. In-network Data Processing for Wireless Sensor Networks (MIT Research Project)

    The central issue in wireless sensor networks (WSNs) is energy efficiency. Factors that contribute to the energy consumption of a sensor node are communication, CPU usage, memory usage and sensor activation. As the communication costs are typically the largest, the standard approach to energy efficiency is to minimise communication between nodes.

    However, sensors can also consume a lot of energy when activated, and furthermore, modern nodes such as Sun SPOT wireless sensor nodes provide comprehensive CPU power, large (flash) memory, and multi-threading capabilities. The result is that, depending on the processing task, sending data to a central processing node might actually be more energy efficient than complex, periodic in-network data processing.

    This project shall compare the resource consumption of central versus in-network data processing on modern sensor nodes for complex data gathering tasks. It shall study techniques that allow trading off accuracy against resource consumption in central and distributed data processing for WSNs. The outcome of this project will be recommendations on using either central or local processing in different situations.

    Skills needed: Java Programming Experience, Knowledge in networks and databases

    Suitable majors: Networking, Databases, Software Engineering, Computer Science
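
    The shape of the trade-off can be shown with a back-of-the-envelope model. All cost constants here are illustrative assumptions, not measurements: shipping raw samples pays a radio cost per sample, while in-network aggregation pays a CPU cost per sample but transmits only one result:

```python
# Toy energy model for the central vs. in-network comparison.

def central_cost(n_samples, tx_per_sample=1.0):
    """Every raw sample is radioed to a central processing node."""
    return n_samples * tx_per_sample

def in_network_cost(n_samples, cpu_per_sample=0.1, tx_per_result=1.0):
    """Samples are aggregated locally; only the result is transmitted."""
    return n_samples * cpu_per_sample + tx_per_result

def cheaper(n_samples):
    return "central" if central_cost(n_samples) < in_network_cost(n_samples) else "in-network"
```

    With these (assumed) constants, central processing wins for tiny workloads and in-network processing wins once enough samples are aggregated per transmission.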

  • Sun SPOT Study (12cp MIT Project)

    Sun Microsystems this week officially introduced their Sun SPOT wireless sensors with full Java support - and we already have about a dozen brand-new samples here in the School of IT for our research in Wireless Sensor Networks (WSN)!

    The aim of this project is the setup, study, and performance evaluation of a network of the new Sun SPOT wireless sensors. The student shall set up a first testbed, study the features of the given sensors, and conduct an initial evaluation of the nodes regarding their power consumption and time synchronisation capabilities.

  • GeneaStore Evaluation (12cp MIT Project)

    In a recent project last semester, we developed a novel method for automatically mapping and loading ontologies written in the Web Ontology Language (OWL) into relational databases. This project had a bioinformatics background (we are working with biological pathway databases) and won the 2005 EIE/SIT Conversazione Siemens prize for industry-related projects.

    In this follow-up project this semester, the student shall quantitatively and qualitatively evaluate our GeneaStore system for mapping and storing OWL ontologies in a relational DBMS against the Sesame system and the DLDB approach. S/he shall also look at two different storage engines - Postgres and MySQL - with different data sizes of the LUBM benchmark.

  • "PowerBLAST: Building a Data Warehouse for Biology using a Database Cluster." (Summer Vacation Project)

    Applications in bioinformatics need to analyse and cooperatively access large amounts of data. Besides 'raw' scientific data such as nucleotide sequences of genes, this also includes rich metadata such as source information, literature references, etc. This project will investigate the scalability and performance characteristics of a data warehouse for bioinformatics using a database cluster. A database cluster is a very powerful and easily extensible parallel database consisting of a cluster of PCs, each with its own DBMS. We will either use Microsoft SQL Server 2005 with its C# CLR extension, or PostgreSQL and C. The challenge is to map the feature-rich bioinformatics data sets to such an architecture while maintaining scalability and improving performance. This project can build on previous work of our group on (a) OLAP in database clusters and (b) DBMS-based sequence alignment search.

  • "Freshness-Aware Scheduling in J2EE Clusters." (UGrad Project)

    A cluster of PCs is attractive as a scalable but inexpensive parallel infrastructure. Hence, many e-business sites nowadays use a cluster as their hardware platform, and there is built-in clustering support in web and application servers. However, this support has been somewhat straightforward so far, as an Honours thesis at SIT has shown last year. In this project, we want to extend the clustering support of J2EE application servers to especially improve the performance of readers, which form the majority of e-business workloads. The idea is to allow different 'fresh' versions of beans in the cluster and to use this as an additional degree of freedom for the routing decision of J2EE requests. When a client agrees to access stale data, it can be served by the cluster much faster than a client requesting up-to-date data, because the latter must first wait for possible concurrent writers. The project goal is to implement such a freshness-aware scheduler in an Open Source J2EE application server (JBoss) and to investigate the performance characteristics of such an approach.

  • "Implementation of an Integrated Metabolic Pathway Database" (MIT Research Project)

    The goal of this project is to develop an integrated database for metabolic pathways, including a visualization component and an interactive web-browsing and querying interface. This is a joint project between the School of IT, NICTA and AxoGenic, a bioinformatics company. While the visualization component will be provided by NICTA, the project student will concentrate on the database, the access components and the planned web interface. This project is interesting from both a research and a commercial viewpoint, as no such integrated and dynamically visualized system exists so far.

    In the initial phase, the current field of biological pathway databases will be explored and a handful of suitable systems chosen. In this phase, we will also decide, together with NICTA and AxoGenic, on the visualization and mining algorithms to be supported by the integrated system. In the integration phase, the student has to integrate the chosen pathway databases by designing a uniform database schema and appropriate data integration tools. The implementation phase will set up the integrated database and the needed import components. The physical schema must be optimized for the chosen algorithms, and an appropriate interface must be provided. Eventually (if time permits), we will also investigate a design based on a database cluster. Finally, a web frontend shall be implemented that allows querying and browsing the database.

  • "Distributed Caching in J2EE versus Middle-Tier Database Caching" (MIT Research Project)

    In a modern e-business architecture, clients are served by a cluster of application servers which access a shared back-end database. In large-scale systems, application servers and databases can even be widely distributed. To prevent the database from becoming a bottleneck, the major database vendors have developed middle-tier database caching. Basically, this is a memory-resident database which serves as a local database to the application servers but actually only caches the data from the back-end system. This is a quite heavy-weight caching approach. Hence, in an Honours thesis last year, we developed distributed caching of EJBs within the application servers themselves.
    This project shall compare our distributed caching approach with middle-tier database caching. It can build on the results of an Honours thesis from the previous year and shall concentrate on the installation of, and performance measurements with, both caching approaches.

  • "Benchmarking Stored Procedures of Relational DBMS" (TSP Project)

    Modern relational database management systems can store and execute user-defined functions within the database server. These so-called 'stored procedures' provide improved performance for more complex business functions, which can then reside on the server. Although the latest SQL standard defines 'persistent stored modules', every system takes a slightly different approach to implementing stored procedures: from proprietary extensions of SQL towards a programming language, via support for Java programs, to the inclusion of a CLR runtime or C DLLs. Which approach is better? What are the advantages and disadvantages of each approach?
    This project shall develop a small set of well-defined stored procedure tests and compare the ease of use and the performance of the stored procedure implementations of three important relational DBMSs: Oracle, SQL Server, and PostgreSQL.
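
    A portable harness for such tests could look as follows, sketched using Python's standard DB-API so the same workload runs against each system. The procedure name and connection objects are illustrative assumptions:

```python
# Sketch of a benchmark harness that times repeated stored-procedure calls.

import time

def time_calls(call, repetitions):
    """Return the mean wall-clock seconds per invocation of `call`."""
    start = time.perf_counter()
    for _ in range(repetitions):
        call()
    return (time.perf_counter() - start) / repetitions

def benchmark(connections, proc_name, args, repetitions=100):
    """connections: dict of DBMS name -> DB-API connection.
    Runs the same stored procedure on every system and reports mean times."""
    results = {}
    for name, conn in connections.items():
        cur = conn.cursor()
        results[name] = time_calls(lambda: cur.callproc(proc_name, args),
                                   repetitions)
    return results
```

    The ease-of-use comparison (language features, debugging, deployment) would of course be qualitative and sit alongside these numbers.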

  • "Peer-to-Peer Knowledge Management" (Honours Project)

    This project corresponds to the previous project on the Personal Information Store in the sense that we want to investigate how such an information store can be built in a distributed system using peer-to-peer technologies. Such a system would be very attractive for small collaborating teams, where each member's local information is automatically integrated into a global view. The open research questions are how to do effective distributed searches/queries in such a decentralised architecture, as well as how to automatically integrate distributed TopicMaps.

  • "Study of Desktop Search Engines" (MIT Project)

    There are a number of different desktop search engines available: Google Desktop Search, MSN Search Toolbar, and Apple Spotlight, just to name a few. The purpose of this project is to give an overview of currently available desktop search engines and a comparison regarding key features. We are particularly interested in their indexing methods, metadata support, extensibility, and networking possibilities (only local search, or more?).

  • "In-Network Data Stream Processing using Eddies" (MIT Project)

    More and more applications do not work on a static data set, but on a continuous stream of data. Examples are click-streams of websites, RSS data streams, or sensor networks. The processing of such data streams is currently a hot topic in the database research community. One new approach is so-called "Eddies": highly adaptive query operators which follow the idea of routing data through query operators rather than the other way around. This makes them very attractive for decentralised, extensible architectures. In particular, it allows processing data "in-network" instead of collecting all data at a central processing site. In this project, we want to follow this vision and investigate the feasibility of Eddies for distributed data stream processing in a peer-to-peer architecture. We are particularly interested in optimizing the routing decision and in incorporating user-defined operators.
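
    The core routing idea can be sketched in a few lines. This is a simplification of the original Eddies proposal (ticket-based lottery scheduling and back-pressure are omitted): each tuple is routed to a not-yet-applied operator, preferring the one that has been most selective so far:

```python
# Minimal sketch of an Eddy routing tuples through filter operators.

def eddy(tuples, operators):
    """operators: list of predicates; a tuple survives if all accept it."""
    stats = [{"in": 1, "out": 1} for _ in operators]   # running selectivity
    out = []
    for t in tuples:
        done, alive = set(), True
        while alive and len(done) < len(operators):
            # route to the pending operator with the lowest observed pass rate
            i = min((j for j in range(len(operators)) if j not in done),
                    key=lambda j: stats[j]["out"] / stats[j]["in"])
            stats[i]["in"] += 1
            if operators[i](t):
                stats[i]["out"] += 1
                done.add(i)
            else:
                alive = False          # rejected: drop the tuple early
        if alive:
            out.append(t)
    return out
```

    Because the routing order is chosen per tuple at run time, the same mechanism can forward tuples to operators on other peers, which is what makes Eddies attractive for in-network processing.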

  • Summer Vacation Project "BLAST in a Database"

    The Basic Local Alignment Search Tool (BLAST) is one of the central algorithms in bioinformatics. In this project, we investigate the possibilities of implementing the standard BLAST algorithm using the stored procedure functionality of a relational database management system. We are particularly interested in the development effort needed and its performance characteristics compared with a version parsing text files. The idea is to utilize the facilities of today's database systems to efficiently process large amounts of data. This will include the development of an appropriate physical design for the sequence database and the identification of which parts of the BLAST algorithm can take advantage of the set processing capabilities of SQL. The core of this project is the implementation and optimisation of BLAST with a 4GL programming language such as PL/SQL or Transact-SQL in a commercial database system. A quantitative evaluation of the new implementation against a non-database version of BLAST will conclude this work.
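
    The part of BLAST that maps most naturally onto SQL's set processing is seed matching, sketched here in Python: fixed-length words (w-mers) are extracted into rows, so that finding candidate alignment seeds becomes a join between a query-word table and a pre-built sequence-word table. The row layout is an illustrative assumption:

```python
# Sketch of the set-oriented seed-matching step of a DBMS-based BLAST.

def extract_words(seq_id, sequence, w=3):
    """Rows (seq_id, position, word) for every overlapping w-mer."""
    return [(seq_id, i, sequence[i:i + w]) for i in range(len(sequence) - w + 1)]

def seed_matches(query_words, db_words):
    """The relational join: equal words yield candidate alignment seeds."""
    by_word = {}
    for seq_id, pos, word in db_words:
        by_word.setdefault(word, []).append((seq_id, pos))
    return [(qpos, seq_id, dpos)
            for _, qpos, word in query_words
            for seq_id, dpos in by_word.get(word, [])]
```

    The subsequent ungapped and gapped extension phases are tuple-at-a-time and would likely remain procedural code inside the stored procedure.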

  • Honour's Project "Distributed Cache Replication Framework for Middle-Tier Data Caching"

    In a modern web-based 3-tier architecture, a cluster of application servers is supported by a single database server. To increase the capacity of the system, the application server tier is scaled out. A direct result of this approach is that the database server becomes the bottleneck, and the challenge is to alleviate that bottleneck.
    This research project developed a scalable distributed cache replication framework that addresses the challenge through the management of a distributed cache, containing Container Managed Persistence (CMP) entity beans, in the application server tier. Essentially, an application server propagates all state updates to all other application servers in the cluster, so that they are not required to continuously consult the database server for the most up-to-date version of the data. This reduces the need to communicate with the database server whilst minimising the latency of requests for data, thereby increasing the scalability and performance of the application servers.
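
    The propagation idea, reduced to its essence (an in-memory stand-in for the CMP entity-bean cache; class and method names are illustrative assumptions):

```python
# Sketch of update propagation in a cluster of caching application servers:
# a server applies a state update locally and broadcasts it to its peers,
# so reads anywhere in the cluster see the new version without a database hit.

class CachingServer:
    def __init__(self):
        self.cache = {}
        self.peers = []

    def update(self, key, value):
        """Local write followed by propagation to the whole cluster."""
        self.apply(key, value)
        for peer in self.peers:
            peer.apply(key, value)

    def apply(self, key, value):
        self.cache[key] = value

def make_cluster(n):
    servers = [CachingServer() for _ in range(n)]
    for s in servers:
        s.peers = [p for p in servers if p is not s]
    return servers
```

    A real framework would additionally have to order concurrent updates and handle member failures, which is where the scalability challenge lies.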

  • MIT Project "Study of WiKi Engines"

    A Wiki is a writable web site, where basically everyone can contribute to both the content and the structure of the site. Some Wikis even go further and already include some document management facilities such as user management and attached files. This study shall give an overview of currently available Wiki systems and compare three to five chosen Wiki engines with regard to their suitability for two concrete use cases: using a Wiki website in teaching, and using a Wiki as a knowledge management tool for a research group.

  • MIT Project "Study of Benchmarks for e-Business"

    A study and comparison of different state-of-the-art e-business benchmarks, including TPC-W, IBM's Trade2, Sun's ECperf (SPEC jAppServer 2002), and CSIRO's Stock Online.

  • MIT Project "Parallelisation and Distribution Algorithms on the Grid"

    The proliferation of distributed grid infrastructures allows organizations to share resources efficiently in a distributed environment. An important application is the parallelization of computationally expensive algorithms as they can be found, for instance, in the life sciences (genomics, proteomics). The goal of this project is to evaluate existing, data-intensive scientific algorithms, e.g. in genomics and proteomics, and to analyze to what extent they can benefit from parallelization and a distributed service infrastructure. The parallelization shall not be based on parallel compilers and special hardware, but on a grid infrastructure (e.g., the Globus toolkit). In cooperation with the University for Health Informatics and Technology Tyrol (Austria), a distributed grid infrastructure will be jointly set up and the parallelization of selected life sciences algorithms will be implemented, even across continental boundaries.

  • MIT Project "Study of up-coming database-like file system: WinFS"

    Traditionally, file systems and database systems are thought to address different needs and audiences: file systems are the more light-weight but unreliable storage for applications, whereas databases are a relatively heavy-weight persistent storage with emphasis on robustness, security, and higher-level data models. This perception has changed in the last years, which saw file systems getting more robust (cf. journaling file systems) and databases moving more towards end-users and the desktop (e.g. document management systems). An interesting merge of both technologies will happen in the upcoming Windows version, code name 'Longhorn': with WinFS, a database-like storage is planned to be directly integrated into the Windows file system. In this study, we are interested in finding out how much this and other new systems are meant to be a file system, how much a document repository, and how much a database system.