Learning Table Similarity Measures


Existing table similarity measures build on simple models of table metadata, structure, and content. They are designed mainly for tables with a horizontal layout, where each column represents one attribute and data values are arranged in rows, and they cannot easily be applied to tables with other structures, such as matrix tables, where both rows and columns are described by attributes and values. Moreover, they rely, in different ways, on the frequencies of individual words. This is not sufficient to capture the semantics of table elements, because table elements have comparatively little context (compared to words in a document), and that context is difficult to model.
The main objective of this proposal is to research methods that bring more "semantics" to table similarity measures (TSMs). We expect that better TSMs will significantly improve the quality of applications relying on tables, such as table similarity search and table auto-completion. We will approach this problem in two ways: by learning specific word embeddings optimized to yield semantically meaningful comparisons of single tokens within tables, and by designing a neural network architecture that addresses table normalization and table comparison in a single, trainable framework.
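
The limitation of word-frequency-based comparison described above can be illustrated with a minimal Python sketch. It is not one of the existing measures nor the method proposed here; the function names and the two toy tables are hypothetical and chosen only for illustration. It scores two tables by the cosine similarity of their token counts:

```python
import math
from collections import Counter

def bag_of_words(table):
    """Count lower-cased tokens over all cells of a table (a list of rows)."""
    return Counter(
        token
        for row in table
        for cell in row
        for token in str(cell).lower().split()
    )

def table_similarity(table_a, table_b):
    """Cosine similarity of the token-count vectors of two tables."""
    a, b = bag_of_words(table_a), bag_of_words(table_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

# Two toy tables (hypothetical data) on the same topic with different wording.
cars = [["car", "price"], ["sedan", "20000"]]
autos = [["automobile", "cost"], ["coupe", "18000"]]

print(table_similarity(cars, autos))  # 0.0: no token is shared
```

Because the two tables share no tokens, a purely frequency-based score treats them as unrelated even though both describe vehicles and their prices. This is the kind of gap that comparisons based on learned embeddings are intended to close.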

Principal investigators
Leser, Ulf Prof. Dr.-Ing. (Knowledge Management in Bioinformatics)

Participating organisational units of HU Berlin

Financer
DFG: Sachbeihilfe

Duration of project
Start date: 10/2018
End date: 08/2021

Research Areas
Information Systems, Process and Knowledge Management, Operating, Communication, Database and Distributed Systems

Research Areas
Deep Learning, Information Retrieval, Web Science

Last updated on 2023-04-26 at 06:30