Generalized Analysis of Logs for Automatic Translation and Episodic Analysis of Searches


With the growth of digital libraries and digital library federations (as well as partially unstructured collections of documents such as websites), many vendors now offer engines that retrieve content and metadata in response to end-user search requests (queries). In most cases these queries are just unstructured fragments of text in a specific language. The first service offered by GALATEAS, LangLog, focuses on extracting meaning from these query logs and is aimed at library, federation and site managers. Unlike mainstream services in this field, GALATEAS services will not consider the standard structured information in web logs (e.g. click rate, visited pages, users' paths through the document tree). Instead, they will consider the information contained in the queries themselves, from the point of view of language interpretation. By subscribing to LangLog, federation administrators and managers will be able to answer questions such as: Which topics are most commonly searched in my collection in a given language? How do these topics relate to my catalogue? Which named entities (people, places) are most popular among my users?

The second problem addressed by GALATEAS is Cross-Language Information Retrieval (CLIR), i.e. the ability to type a query in one language and retrieve documents available in other languages. The CACAO consortium already successfully provides services for indexing and searching over digital libraries and metadata repositories. During commercial exploration for marketing CACAO, it emerged that certain institutions prefer to keep indexing and searching on their own premises (using their preferred search engine) and would be fully satisfied with a plain query translation service. The second service offered by GALATEAS, QueryTrans, has the ambitious and innovative goal of providing the first web translation service specifically tailored to query translation.
The languages addressed by both LangLog and QueryTrans are Italian, French, English, German, Dutch, Modern Arabic and Polish.


Principal investigators
Petras, Vivien Prof. (Information Retrieval)

Duration of project
Start date: 04/2010
End date: 03/2013

Last updated on 2020-03-19 at 23:17