Theoretical foundations for efficient enumeration of query results

Query evaluation is one of the central tasks of a database system. The theoretical foundations of query evaluation rely on a close connection between database theory and mathematical logic, as databases correspond to finite relational structures, and queries can be formulated by logical formulae. Since 2007, the fields of logic in computer science and database theory have seen a number of contributions that deal with the efficient enumeration of query results. In this scenario, the objective is as follows: given a structure (i.e., a database) and a logical formula (i.e., a query), after a short precomputation phase, the query results shall be generated one by one, while giving guarantees on the maximum delay time between the output of two tuples. Currently known algorithms for this scenario achieve good worst-case complexity bounds, but have the unpleasant property that they exhibit worst-case behaviour on virtually every input instance. The project's objective is to overcome this worst-case behaviour. We want to establish notions of "certificate complexity" and "instance optimality" for enumeration algorithms, and we want to develop algorithms that are instance optimal in this sense. Furthermore, we want to carry out a systematic study of query result enumeration under database updates. In the context of query result enumeration, database updates have received surprinsingly little attention in the database theory literature; most of the currently existing methods have to redo the entire precomputation phase after having received an update to the database. The query languages we want to consider are suitable fragments of first-order logic or monadic second-order logic. Whenever necessary, we will restrict attention to suitable classes of databases, including classes of bounded degree or bounded expansion. In each case, we want to identify the "tractability frontiers". I.e., given a class of structures, we want to exhibit the most expressive logic, and given a logic, we want to exhibit the largest class of structures, for which efficient query result enumeration is possible. For proving lower bounds, we want to utilise methods developed in the emerging field of "fine-grained complexity".

Query evaluation is one of the central tasks of a database system. The theoretical foundations of query evaluation rely on a close connection between database theory and mathematical logic, as databases correspond to finite relational structures, and queries can be formulated by logical formulae. Since 2007, the fields of logic in computer science and database theory have seen a number of contributions that deal with the efficient enumeration of query results. In this scenario, the objective is as follows: given a structure (i.e., a database) and a logical formula (i.e., a query), after a short precomputation phase, the query results shall be generated one by one, while giving guarantees on the maximum delay time between the output of two tuples. Currently known algorithms for this scenario achieve good worst-case complexity bounds, but have the unpleasant property that they exhibit worst-case behaviour on virtually every input instance. The project's objective is to overcome this worst-case behaviour. We want to establish notions of "certificate complexity" and "instance optimality" for enumeration algorithms, and we want to develop algorithms that are instance optimal in this sense. Furthermore, we want to carry out a systematic study of query result enumeration under database updates. In the context of query result enumeration, database updates have received surprinsingly little attention in the database theory literature; most of the currently existing methods have to redo the entire precomputation phase after having received an update to the database. The query languages we want to consider are suitable fragments of first-order logic or monadic second-order logic. Whenever necessary, we will restrict attention to suitable classes of databases, including classes of bounded degree or bounded expansion. In each case, we want to identify the "tractability frontiers". I.e., given a class of structures, we want to exhibit the most expressive logic, and given a logic, we want to exhibit the largest class of structures, for which efficient query result enumeration is possible. For proving lower bounds, we want to utilise methods developed in the emerging field of "fine-grained complexity".

Duration of project

Start date: 07/2016

End date: 12/2019

Research Areas

Research Areas