Learning Table Similarity Measures
Existing table similarity measures build on simple models of table metadata, structure, and content. They are designed mainly for tables with a horizontal layout, where each column represents one attribute and data values are arranged in rows. They cannot easily be applied to tables with other structures, such as matrix tables, in which both rows and columns carry attributes and values. Moreover, they all rely, in different ways, on frequency statistics of individual words. This is not sufficient to capture the semantics of table elements: compared to words in a running document, table elements have little context, and that context is hard to model.
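To make the limitation concrete, here is a minimal sketch of one simple frequency-based measure: cosine similarity between raw term-frequency vectors built from all cell tokens of two tables. This is an assumed stand-in chosen for illustration, not any specific published measure; the function names `table_tokens` and `tf_cosine` and the three example tables are hypothetical.

```python
from collections import Counter
import math
import re


def table_tokens(table):
    """Flatten a table (list of rows, header row first) into lowercase word tokens."""
    tokens = []
    for row in table:
        for cell in row:
            tokens.extend(re.findall(r"\w+", str(cell).lower()))
    return tokens


def tf_cosine(table_a, table_b):
    """Cosine similarity of the raw term-frequency vectors of two tables."""
    a = Counter(table_tokens(table_a))
    b = Counter(table_tokens(table_b))
    # Only tokens that occur in both tables contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical example tables in horizontal layout (header row first).
capitals_en = [["Country", "Capital"], ["Germany", "Berlin"], ["France", "Paris"]]
capitals_de = [["Land", "Hauptstadt"], ["Deutschland", "Berlin"], ["Frankreich", "Paris"]]
cities = [["City", "Population"], ["Berlin", "3645000"], ["Paris", "2161000"]]

print(round(tf_cosine(capitals_en, capitals_de), 3))  # 0.333
print(round(tf_cosine(capitals_en, cities), 3))       # 0.333
```

The German version of the same table and an unrelated table that merely shares two city names receive the same score, because word counts cannot tell that "Germany" and "Deutschland" denote the same entity.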
The main objective of this proposal is to research methods that bring more semantics to table similarity measures (TSMs). We expect that better TSMs will significantly improve the quality of applications that rely on tables, such as table similarity search and table auto-completion. We will approach this problem in two ways: by learning specific word embeddings optimized to yield semantically meaningful comparisons of individual tokens within tables, and by designing a dedicated neural network architecture that addresses table normalization and table comparison in a single trainable framework.
Participating organisational units of HU Berlin
Funder
DFG: Sachbeihilfe (Research Grants)
Duration of project
Start date: 10/2018
End date: 08/2021
Research Areas
Information Systems, Process and Knowledge Management, Operating, Communication, Database and Distributed Systems
Deep Learning, Information Retrieval, Web Science