Details of Grant

EPSRC Reference:

EP/T033762/1

Title:

Foundations of the Finite Model Theory of Concatenation

Principal Investigator:

Freydenberger, Dr DD

Other Investigators:

Researcher Co-Investigators:

Project Partners:

Department:

Computer Science

Organisation:

Loughborough University

Scheme:

New Investigator Award

Starts:

01 July 2021

Ends:

30 June 2025

Value (£):

397,698

EPSRC Research Topic Classifications:

Fundamentals of Computing

Information & Knowledge Mgmt

EPSRC Industrial Sector Classifications:

No relevance to Underpinning Sectors

Related Grants:

Panel History:

Panel Date	Panel Name	Outcome
20 May 2020	EPSRC ICT Prioritisation Panel May 2020	Announced

Summary on Grant Application Form

The topic of this proposal is the finite model theory of concatenation (FC), a new logic that combines approaches from two different subfields of theoretical computer science and that has direct applications in information extraction and database theory, and eventual applications in all areas that deal with searching in or filtering of textual data.

Textual data is everywhere in modern life: No matter whether we are dealing with books, report, emails, social media, spreadsheets, or even HTML documents or log files, a huge amount of information is represented as a sequence of letters.

Information extraction (IE) deals with the problem of extracting relevant information from such data. One model for this are document spanners, a relational framework for IE that can be understood as querying text like one would query a database. Due to their similarity to relational queries, document spanners have recently become a trending topic in database theory.

Database theory uses methods from finite model theory (FMT), a sub-discipline of logic in computer science. FMT provided the inspiration for relational database, and both fields have maintained a close connection over these last fifty years. Of particular importance are methods for efficient evaluation of queries and deep insights of what can and cannot be expressed with certain types of queries. Every result in FMT is also a result on databases, as there is an immediate one-to-one correspondence between logical formulas and database queries: Every database query corresponds to a logical formula, and every logical formula corresponds to a database query.

This is well-understood for relational databases, but it does directly translate to the textual setting of document spanners: The canonical FMT-approaches to text are provably too weak to model document spanners. But to be able to evaluate, optimize, and compare queries on texts, having this for document spanners would be tremendously useful.

As first steps in this direction, previous research used "the theory of concatenation", a different logic from combinatorics on words (CoW), another sub-discipline of theoretical computer science, that studies patterns in texts. But this logic corresponds to infinite model theory, not finite model theory, which makes many useful techniques from FMT unavailable.

To address this problem, the proposed new logic FC combines the theory of concatenation from CoW with the finite model approach from FMT. As shown in the preparatory research, FC has exactly the required expressive power of document spanners while still allowing the use of many classical FMT techniques.

Apart from the connection to document spanners, FC can be seen as a natural variant of the logics that have previously been used: From an FMT-point of view, FC can be seen as extending the canonical first-order logic on strings with a string equality operator. From a CoW-point of view, FC is the finite model version of the theory of concatenation. As such, FC is a fundamental framework for using logic on textual data, and it can directly be extended to cover other operations, like length constraints as they are used in string verification.

The preparatory research introduces FC, proves it has the desired one-to-one corresponds to document spanners, and shows that many FMT-techniques can be adapted. But many open questions remain, in particular regarding two of the most important aspects: The efficient evaluation of queries, and the fundamental limitations of the framework -- or, less formally, which queries can be evaluated quickly, which algorithms should be used for this, and what is possible in this querying model?

This project will close this gap by creating a solid theoretical foundation for FC. This will require new proof techniques that at the very least merge those from CoW and FMT, or even go far beyond them. The ultimate goal is that FC will do for querying text what FMT did for databases.

Key Findings

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Potential use in non-academic contexts

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Impacts

Description	This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised

Sectors submitted by the Researcher

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Project URL:

Further Information:

Organisation Website:

http://www.lboro.ac.uk