Project Offerings for Semester 1, 2025
I currently have the following projects on offer in the areas of machine learning for self-tuning data systems, large-scale distributed data management, and next-generation data analytics:
- Machine Learning for Physical Data Design
- Cardinality Estimation with Large Language Models
- Benchmarking MongoDB and other JSON Document Stores for Analytical Workloads
If you are interested in any of these projects, please contact me by email or in person.
Projects on Machine Learning for Data Systems
- Machine Learning for Physical Data Design (Honours project)
In this project, we are interested in using machine learning techniques such as deep reinforcement learning (DRL) to automatically tune the physical design of a given database schema for a query workload. In particular, we want to find out how well index suggestions, table partitioning decisions and the choice between row and column storage can be automated using DRL, compared to manual database tuning.
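To give a flavour of how physical design can be framed as a reinforcement learning problem, here is a minimal sketch that uses plain tabular Q-learning in place of a deep network (a deliberate simplification of DRL). States are sets of chosen indexes, an action either adds one candidate index or stops, and the reward is the reduction in estimated workload cost. The workload, candidate columns, cost model and index budget are all hypothetical; a real system would rely on the optimiser's cost estimates or on measured query runtimes.

```python
import random
from collections import defaultdict

# Hypothetical toy workload: (set of filter columns, query frequency).
WORKLOAD = [
    ({"orders.customer_id"}, 50),
    ({"orders.order_date"}, 30),
    ({"lineitem.part_id"}, 20),
]
CANDIDATES = sorted(["orders.customer_id", "orders.order_date",
                     "lineitem.part_id", "customer.name"])
FULL_SCAN, INDEX_LOOKUP, INDEX_OVERHEAD = 100.0, 5.0, 40.0  # made-up cost units
MAX_INDEXES = 2  # hypothetical index budget
STOP = "STOP"


def workload_cost(indexes):
    """Toy cost model: a query is cheap if any of its filter columns is indexed;
    every index adds a fixed maintenance/storage overhead."""
    cost = sum(freq * (INDEX_LOOKUP if cols & indexes else FULL_SCAN)
               for cols, freq in WORKLOAD)
    return cost + INDEX_OVERHEAD * len(indexes)


def actions(state):
    """Either add one more unused candidate index, or stop tuning."""
    return [c for c in CANDIDATES if c not in state] + [STOP]


def step(state, action):
    """Apply an action; the reward is the reduction in estimated workload cost."""
    if action == STOP:
        return state, 0.0, True
    nxt = state | {action}  # frozenset | set yields a frozenset (hashable state)
    reward = workload_cost(state) - workload_cost(nxt)
    return nxt, reward, len(nxt) >= MAX_INDEXES


def train(episodes=3000, alpha=0.5, epsilon=0.2, seed=0):
    """Tabular, undiscounted, episodic Q-learning over index configurations."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = frozenset(), False
        while not done:
            acts = actions(state)
            if rng.random() < epsilon:
                action = rng.choice(acts)                        # explore
            else:
                action = max(acts, key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = step(state, action)
            future = 0.0 if done else max(q[(nxt, a)] for a in actions(nxt))
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = nxt
    return q


def recommend(q):
    """Follow the learned greedy policy starting from an empty configuration."""
    state, done = frozenset(), False
    while not done:
        action = max(actions(state), key=lambda a: q[(state, a)])
        state, _, done = step(state, action)
    return set(state)


if __name__ == "__main__":
    config = recommend(train())
    print(sorted(config), workload_cost(frozenset(config)))
```

Replacing the Q-table with a neural network is what turns this into DRL, and is what would let the approach scale beyond a handful of candidate indexes, partitionings and storage layouts.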
- Cardinality Estimation with Large Language Models (Honours project)
Large Language Models (LLMs) are powerful machine learning models for text generation and other natural language processing tasks. LLMs achieve this by learning statistical relationships from large amounts of text during a self-supervised and semi-supervised training process. In this project, we will investigate how well LLMs can learn the statistical relationships of the data inside a database, for the purpose of cardinality estimation and potentially also approximate query processing. Put simply: can we use LLMs to 'learn' a database? Cardinality estimation is a key technique for query optimisation inside databases: estimating the result sizes of query conditions and joins quickly and accurately is essential for query execution planning.
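For context, the following sketch shows the kind of classical estimates that a learned (for example LLM-based) estimator would be evaluated against: per-column selectivities combined under an attribute-independence assumption, the textbook equi-join estimate, and the q-error metric commonly used to score estimators. The tables and columns are hypothetical, and exact frequency counts stand in for the histograms or samples a real optimiser would maintain.

```python
from collections import Counter


def equality_selectivity(table, column, value):
    """Fraction of rows with row[column] == value (exact counts stand in for a histogram)."""
    counts = Counter(row[column] for row in table)
    return counts[value] / len(table)


def estimate_conjunction(table, predicates):
    """Estimated size of a conjunction of equality predicates, assuming independence."""
    selectivity = 1.0
    for column, value in predicates.items():
        selectivity *= equality_selectivity(table, column, value)
    return len(table) * selectivity


def estimate_equijoin(r, s, column):
    """Textbook estimate |R join S| = |R| * |S| / max(V(R, col), V(S, col))."""
    distinct_r = len({row[column] for row in r})
    distinct_s = len({row[column] for row in s})
    return len(r) * len(s) / max(distinct_r, distinct_s)


def q_error(estimate, actual):
    """Symmetric ratio error; 1.0 means a perfect estimate."""
    estimate, actual = max(estimate, 1.0), max(actual, 1.0)
    return max(estimate / actual, actual / estimate)


# Hypothetical table with perfectly correlated columns (the model determines the make).
CARS = [{"make": "Toyota", "model": "Corolla"}] * 4 + [{"make": "Ford", "model": "Focus"}] * 4

if __name__ == "__main__":
    predicates = {"make": "Toyota", "model": "Corolla"}
    est = estimate_conjunction(CARS, predicates)
    actual = sum(all(row[c] == v for c, v in predicates.items()) for row in CARS)
    print(est, actual, q_error(est, actual))  # 2.0 4 2.0
```

On this correlated example the independence assumption underestimates the result size by a factor of two; correlations like this are one place where a model that has 'learned' the data could do better.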
Projects on Large-Scale Distributed Data Management
- Dynamic First-Past-the-Post Optimiser for MongoDB
MongoDB is a popular distributed JSON document database with a distinctive query optimiser approach called First-Past-the-Post (FPTP): for a given query, it generates a number of candidate execution plans, which then 'compete' with each other in round-robin fashion, executing in discrete steps, until the first plan reaches a result threshold. This winning plan is then executed to completion, while the other plan variants are aborted.
This project will extend this approach to also take into account the actual execution time taken until the result threshold is reached. This would enable the FPTP optimiser to react dynamically to varying system loads and heterogeneous cluster configurations, which the current system does not yet support. The project would consist of some small modifications to the existing open-source MongoDB code base, plus a performance evaluation of the original and the modified system under varying cluster loads.
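The difference between the two strategies can be illustrated with a small simulation, sketched under stated assumptions: each candidate plan is modelled by a hypothetical rate of results per unit of work and a wall-clock cost per unit of work (standing in for node load), and neither MongoDB's internal work accounting nor its actual threshold values are modelled. The time-aware rule shown (rank plans by results per second once the trial ends) is only one possible design for the proposed extension.

```python
from dataclasses import dataclass


@dataclass
class CandidatePlan:
    """Simulated query plan; the rates are hypothetical inputs, not MongoDB internals."""
    name: str
    results_per_step: float  # documents returned per unit of work
    seconds_per_step: float  # wall-clock cost of one unit of work (varies with node load)
    produced: float = 0.0
    steps: int = 0
    elapsed: float = 0.0

    def work(self) -> None:
        """Execute one discrete unit of work."""
        self.steps += 1
        self.produced += self.results_per_step
        self.elapsed += self.seconds_per_step


def trial(plans, threshold):
    """Round-robin the plans one unit of work at a time until one reaches the threshold."""
    while True:
        for plan in plans:
            plan.work()
            if plan.produced >= threshold:
                return plan


def fptp_winner(plans, threshold):
    """Count-based FPTP: the first plan past the post wins; the others are aborted."""
    return trial(plans, threshold)


def time_aware_winner(plans, threshold):
    """Hypothetical extension: same trial, but rank plans by results per second."""
    trial(plans, threshold)
    return max(plans, key=lambda p: p.produced / p.elapsed if p.elapsed else 0.0)


def make_plans():
    return [
        CandidatePlan("plan_A", results_per_step=1.0, seconds_per_step=0.04),  # busy node
        CandidatePlan("plan_B", results_per_step=0.5, seconds_per_step=0.01),  # idle node
    ]


if __name__ == "__main__":
    print(fptp_winner(make_plans(), 100).name)        # plan_A
    print(time_aware_winner(make_plans(), 100).name)  # plan_B
```

Plan A needs fewer units of work and so wins the count-based race, but each of its steps runs on a loaded node; measured in wall-clock throughput, plan B (about 50 results per second versus 25) would be the better choice.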
- Benchmarking MongoDB and other JSON Document Stores for Analytical Workloads
In this project, we will compare a number of popular open-source JSON stores, such as MongoDB, Couchbase and AsterixDB, with regard to how well they suit different query workloads. In particular, we are interested in how these systems handle analytical queries on nested document structures. The project can build on a previous project in which we defined a benchmark for MongoDB; this benchmark would be extended to other JSON stores.
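As a concrete illustration of an analytical query over nested documents, the sketch below expresses "revenue per item category" as a MongoDB aggregation pipeline (standard `$unwind`, `$group`, `$sum`, `$multiply` and `$sort` stages), together with a plain-Python reference answer that could serve as a correctness oracle when the same query is run on each store, and a simple median-of-repetitions timer. The collection layout and data are hypothetical, and the pipeline is only defined here, not sent to a server; in SQL++-based stores such as Couchbase and AsterixDB, the equivalent query would typically flatten the nested array with `UNNEST`.

```python
import statistics
import time
from collections import defaultdict

# Hypothetical orders collection with a nested array of line items.
ORDERS = [
    {"_id": 1, "customer": "c1", "items": [
        {"category": "books", "price": 10.0, "qty": 2},
        {"category": "games", "price": 30.0, "qty": 1}]},
    {"_id": 2, "customer": "c2", "items": [
        {"category": "books", "price": 5.0, "qty": 4}]},
    {"_id": 3, "customer": "c3", "items": []},
]

# Revenue per item category as a MongoDB aggregation pipeline. It is only defined here;
# with PyMongo it could be run as db.orders.aggregate(PIPELINE).
PIPELINE = [
    {"$unwind": "$items"},
    {"$group": {"_id": "$items.category",
                "revenue": {"$sum": {"$multiply": ["$items.price", "$items.qty"]}}}},
    {"$sort": {"revenue": -1}},
]


def reference_revenue_by_category(orders):
    """Plain-Python oracle for the pipeline, used to check every store's answer."""
    totals = defaultdict(float)
    for order in orders:
        for item in order.get("items", []):  # like $unwind, orders without items vanish
            totals[item["category"]] += item["price"] * item["qty"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def median_runtime(run_query, repetitions=5):
    """Median wall-clock time of a query callable over several repetitions."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    print(reference_revenue_by_category(ORDERS))  # [('books', 40.0), ('games', 30.0)]
    print(median_runtime(lambda: reference_revenue_by_category(ORDERS)))
```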
Projects on Next-Generation Data Analytics
- A Touch Interface for SQL Databases (Honours project)
More and more computing systems come with touch interfaces, from smartphones and tablets to the latest versions of desktop operating systems (Windows 8 and Mac OS X). At the same time, the basic interface to database systems is still SQL, a text-based query language that requires keyboard input and is hard for novice users to learn.
In our TouchQL project, we aim to develop a query 'language' that is based purely on a graphical schema representation and input gestures, and that allows users to query a relational database using a tablet computer.
An initial prototype of TouchQL for Android devices already exists; it supports basic selections, projections and natural joins over local databases.
The goal of this Honours project is to extend this system with a mechanism for grouping and aggregation, and to support querying remote databases. The challenge in the latter part is to provide timely feedback to the user on the intended operations: in TouchQL there is no separation between query formulation and query execution, so users should get immediate feedback on their intended actions on the actual data set. It would also be beneficial if the student could port TouchQL from the Java-based Android platform to the Objective-C-based iOS.
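One way to think about the grouping and aggregation extension is as a query model that each gesture updates and that is compiled to SQL after every change, so that a small result preview can be shown immediately. The sketch below is hypothetical throughout: the `TouchQuery` class, its `group` and `aggregate` methods and the gesture mappings in the comments are illustrations, not TouchQL's actual design, and SQLite stands in for a remote database. Limiting the preview with `LIMIT` is one simple way to keep feedback timely on larger or remote data sets.

```python
import sqlite3
from dataclasses import dataclass, field

AGGREGATES = {"count", "sum", "avg", "min", "max"}


def quote(identifier: str) -> str:
    """Quote an SQL identifier so table/column names from the schema view are safe."""
    return '"' + identifier.replace('"', '""') + '"'


@dataclass
class TouchQuery:
    """Hypothetical query state that gestures would build up (not TouchQL's real model)."""
    table: str
    group_by: list = field(default_factory=list)
    aggregates: list = field(default_factory=list)  # (function, column) pairs

    def group(self, column: str) -> None:
        """E.g. triggered by dragging a column onto a 'group' area."""
        self.group_by.append(column)

    def aggregate(self, func: str, column: str) -> None:
        """E.g. triggered by choosing SUM/AVG/... from a gesture menu on a column."""
        if func not in AGGREGATES:
            raise ValueError(f"unsupported aggregate: {func}")
        self.aggregates.append((func, column))

    def to_sql(self) -> str:
        columns = [quote(c) for c in self.group_by]
        columns += [f"{f.upper()}({quote(c)})" for f, c in self.aggregates]
        sql = f"SELECT {', '.join(columns) or '*'} FROM {quote(self.table)}"
        if self.group_by:
            keys = ", ".join(quote(c) for c in self.group_by)
            sql += f" GROUP BY {keys} ORDER BY {keys}"
        return sql

    def preview(self, conn, limit=20):
        """Run the query with a LIMIT so the UI can show results after every gesture."""
        return conn.execute(self.to_sql() + " LIMIT ?", (limit,)).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("north", 10), ("north", 5), ("south", 7)])
    query = TouchQuery("sales")
    query.group("region")
    query.aggregate("sum", "amount")
    return query.to_sql(), query.preview(conn)


if __name__ == "__main__":
    print(demo())
```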
I also maintain a list of projects I have supervised in recent years.