Learning Table Similarity Measures

Existing table similarity measures build on simple models of table metadata, structure, and
content. They are designed mainly for tables with a horizontal layout, where each column
represents one attribute and data values are in rows, and they cannot easily be applied to tables
with other structures, such as matrix tables, where both rows and columns represent
attributes and values. Moreover, they rely, in different ways, on computing with
frequency values of individual words. This is not sufficient to capture the semantics of table
elements because, compared to words in a document, table elements have comparatively little
context, and that context is difficult to model.
The main objective of this proposal is to research methods that bring more "semantics" to
table similarity measures (TSM). We expect that better TSM will significantly improve the quality of
applications relying on tables, such as table similarity search and table auto-completion. We
will approach this problem in two ways: by learning specific word embeddings optimized to
yield semantically meaningful comparisons of single tokens within tables, and by designing a
dedicated neural network architecture that addresses table normalization and table comparison in
a single, trainable framework.
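
The limitation of word-frequency comparisons noted in the abstract can be illustrated with a minimal sketch. It is not the project's method: the helper names (`tokens`, `frequency_similarity`, `embedding_similarity`) are hypothetical, and the two-dimensional vectors in `TOY_EMBEDDINGS` are made-up values chosen only so that synonymous header words point in similar directions. Learned embeddings would take their place in practice.

```python
import math
from collections import Counter


def tokens(table):
    """Flatten a table (a list of rows) into lowercase word tokens."""
    return [w.lower() for row in table for cell in row for w in str(cell).split()]


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def frequency_similarity(t1, t2):
    """Baseline: compare raw word counts of the two tables."""
    return cosine(Counter(tokens(t1)), Counter(tokens(t2)))


# Hypothetical toy embeddings (illustrative values, not learned from data).
TOY_EMBEDDINGS = {
    "country": (1.0, 0.0),
    "nation": (0.9, 0.1),
    "population": (0.0, 1.0),
    "inhabitants": (0.1, 0.9),
}


def embedding_similarity(t1, t2, emb):
    """Compare the mean embedding of all tokens that have a vector."""
    def centroid(table):
        vecs = [emb[w] for w in tokens(table) if w in emb]
        if not vecs:
            return {}
        dim = len(vecs[0])
        return {i: sum(v[i] for v in vecs) / len(vecs) for i in range(dim)}
    return cosine(centroid(t1), centroid(t2))


# Two tables with the same meaning but no shared words.
table_a = [["country", "population"], ["France", "67"]]
table_b = [["nation", "inhabitants"], ["Japan", "125"]]

freq_score = frequency_similarity(table_a, table_b)
emb_score = embedding_similarity(table_a, table_b, TOY_EMBEDDINGS)
print(freq_score, emb_score)
```

Because the two tables share no tokens, the count-based score is zero even though their headers describe the same attributes, whereas the embedding-based score is high. Averaging all token vectors is a deliberately crude choice here; it ignores table structure, which is the second problem the proposal sets out to address.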

Leser, Ulf Prof. Dr.-Ing. (Wissensmanagement in der Bioinformatik)

Participating organizational units of HU

DFG: Research Grant (Sachbeihilfe)

Project start: 10/2018
Project end: 02/2021

Operating, Communication, Database and Distributed Systems

Deep Learning, Information Retrieval, Web Science

Last updated 2020-01-06 at 18:55