Logic-based reasoning
for information integration and data linkage

Marie-Christine Rousset

Professor of Computer Science, member of the LIG (Laboratoire d'Informatique de Grenoble)
Univ. Grenoble Alpes, IUF, France

Biography: Marie-Christine Rousset is a Professor of Computer Science at the University of Grenoble Alpes and a senior member of the Institut Universitaire de France. Her research areas are Knowledge Representation, Information Integration, Pattern Mining and the Semantic Web. She has published around 100 refereed international journal articles and conference papers, and has participated in several cooperative industry-university projects. She received a best paper award from AAAI in 1996 and was named an ECCAI Fellow in 2005. She has served on many program committees of international conferences and workshops and on the editorial boards of several journals.

Information integration and data linkage raise many difficult challenges, because data are becoming ubiquitous, multi-form, multi-source and multi-scale. Data semantics is probably one of the keys to addressing these challenges in a principled way. Much effort has been devoted in the Semantic Web community to describing the semantics of information through ontologies.

In this tutorial, I will show that description logics provide a good model for specifying ontologies over Web data (described in RDF), but that restrictions are necessary to obtain scalable algorithms for checking data consistency and answering conjunctive queries. I will explain why the DL-Lite family is well suited to combining ontological reasoning and data management at large scale.
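To make the role of these restrictions more concrete, here is a minimal Python sketch of query answering by rewriting in the spirit of DL-Lite, limited to a small fragment: the TBox contains only inclusions between basic concepts (an atomic concept A, ∃P, or ∃P⁻), there are no role inclusions and no negative inclusions (so no consistency check), and queries are atomic rather than conjunctive. The tuple encoding and all concept, role and individual names are hypothetical. What it illustrates is that the ontology is compiled into the query, and the data are then queried as they are, without materializing inferred facts.

```python
# Minimal sketch of DL-Lite-style query rewriting, restricted to atomic queries.
# Hypothetical encoding (an assumption, not the tutorial's notation):
#   basic concept: ("atom", A) for A, ("some", P) for ∃P, ("some_inv", P) for ∃P⁻
#   TBox: list of positive inclusions (B1, B2), read as B1 ⊑ B2
#   ABox: concept assertions (A, a) and role assertions (P, a, b)


def rewrite(tbox, target):
    """Return every basic concept B such that B ⊑* target (backward closure)."""
    result = {target}
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for lhs, rhs in tbox:
            if rhs == current and lhs not in result:
                result.add(lhs)
                frontier.append(lhs)
    return result


def evaluate(basic, concept_facts, role_facts):
    """Evaluate one basic concept directly over the data, without any reasoning."""
    kind, name = basic
    if kind == "atom":
        return {a for (c, a) in concept_facts if c == name}
    if kind == "some":
        return {a for (p, a, _) in role_facts if p == name}
    if kind == "some_inv":
        return {b for (p, _, b) in role_facts if p == name}
    raise ValueError(f"unknown basic concept: {basic!r}")


def certain_answers(tbox, concept_facts, role_facts, concept):
    """Answer the atomic query concept(x): rewrite with the TBox, then evaluate the union."""
    answers = set()
    for basic in rewrite(tbox, ("atom", concept)):
        answers |= evaluate(basic, concept_facts, role_facts)
    return answers


# Usage with hypothetical names.
tbox = [
    (("atom", "Professor"), ("atom", "Faculty")),
    (("some", "teaches"), ("atom", "Faculty")),     # domain of teaches
    (("some_inv", "teaches"), ("atom", "Course")),  # range of teaches
    (("atom", "Faculty"), ("atom", "Person")),
]
concept_facts = {("Professor", "alice")}
role_facts = {("teaches", "bob", "db101")}

print(sorted(certain_answers(tbox, concept_facts, role_facts, "Person")))  # ['alice', 'bob']
print(sorted(certain_answers(tbox, concept_facts, role_facts, "Course")))  # ['db101']
```

Here bob is returned as a Person only because ∃teaches ⊑ Faculty ⊑ Person is folded into the query; nothing is added to the data. Full DL-Lite rewriting of conjunctive queries must additionally handle existentially quantified variables and role inclusions, which this sketch leaves out.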

Finally, I will describe a unifying rule-based logical framework for reasoning on RDF ontologies and databases. Its underlying rule language makes it possible to capture, in a uniform manner, OWL constraints that are useful in practice, such as property transitivity or symmetry, as well as domain-specific rules of practical relevance to users in many domains of interest.
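As an illustration of how transitivity, symmetry and a domain-specific rule can all be written in a single rule language, the following Python sketch evaluates Datalog-style rules over RDF-like triples by naive forward chaining (saturation) until a fixpoint is reached. The property names (partOf, adjacentTo, locatedIn), the triple encoding and the choice of naive bottom-up evaluation are assumptions made for the example; this is not claimed to be how the framework presented in the tutorial is implemented, and practical engines use more efficient strategies such as semi-naive evaluation.

```python
# Naive forward chaining of Datalog-style rules over triples (a sketch).
# Assumptions: terms starting with "?" are variables; every head variable
# also occurs in the body (range-restricted rules); property names are hypothetical.


def is_var(term):
    return term.startswith("?")


def match(pattern, fact, binding):
    """Extend binding so that pattern matches fact; return None on a clash."""
    extended = dict(binding)
    for p, t in zip(pattern, fact):
        if is_var(p):
            if extended.get(p, t) != t:
                return None
            extended[p] = t
        elif p != t:
            return None
    return extended


def body_bindings(body, facts):
    """All variable bindings satisfying every atom of a rule body (nested-loop join)."""
    bindings = [{}]
    for atom in body:
        extended = []
        for b in bindings:
            for fact in facts:
                m = match(atom, fact, b)
                if m is not None:
                    extended.append(m)
        bindings = extended
    return bindings


def saturate(facts, rules):
    """Apply all rules repeatedly until no new triple can be derived (a fixpoint)."""
    facts = set(facts)
    while True:
        new = set()
        for head, body in rules:
            for b in body_bindings(body, facts):
                triple = tuple(b.get(t, t) for t in head)
                if triple not in facts:
                    new.add(triple)
        if not new:
            return facts
        facts |= new


rules = [
    # Transitivity, as for an OWL transitive property: partOf
    (("?x", "partOf", "?z"), [("?x", "partOf", "?y"), ("?y", "partOf", "?z")]),
    # Symmetry, as for an OWL symmetric property: adjacentTo
    (("?y", "adjacentTo", "?x"), [("?x", "adjacentTo", "?y")]),
    # Domain-specific rule: whatever is located in a part is located in the whole
    (("?x", "locatedIn", "?z"), [("?x", "locatedIn", "?y"), ("?y", "partOf", "?z")]),
]
facts = {
    ("grenoble", "partOf", "isere"),
    ("isere", "partOf", "france"),
    ("isere", "adjacentTo", "savoie"),
    ("lig", "locatedIn", "grenoble"),
}
closure = saturate(facts, rules)
print(sorted(closure - facts))
```

The saturated set contains, among others, (grenoble, partOf, france) from transitivity and (lig, locatedIn, france), which needs two rounds: the domain-specific rule first derives (lig, locatedIn, isere), then fires again on it. The three rules are handled by the same generic engine, which is the kind of uniformity the rule-based framework aims for.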

I will illustrate the expressivity of this framework for modeling Linked Data applications and its genericity for developing inference algorithms. In particular, I will show how it can model data linkage in Linked Data as a reasoning problem over possibly decentralized data. I will also explain how it enables the efficient extraction of expressive modules from Semantic Web ontologies and databases with formal guarantees, while effectively controlling their succinctness. Experiments conducted on real-world datasets have demonstrated the feasibility of this approach and its practical usefulness for data integration and information extraction.
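To suggest how data linkage can be cast as reasoning, the Python sketch below derives sameAs-style links between resources coming from two hypothetical sources, using two classic rules: an inverse functional property (two resources with the same value for it denote the same entity) and a functional property (the values of one entity for it denote the same entity). Equalities are kept as equivalence classes in a union-find structure, which also accounts for the symmetry and transitivity of sameAs. The prefixes, the property names, and the assumption that email is inverse functional and employer is functional are all hypothetical; this is not presented as the tutorial's linkage algorithm, and it runs on centralized data rather than in the decentralized setting mentioned above.

```python
# Rule-based data linkage sketch: derive equivalence classes (sameAs links)
# from inverse functional and functional properties, iterated to a fixpoint.
# All resource and property names are hypothetical.


def link(triples, inverse_functional, functional):
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        """Merge the classes of x and y; return True if they were distinct."""
        rx, ry = find(x), find(y)
        if rx == ry:
            return False
        parent[rx] = ry
        return True

    changed = True
    while changed:
        changed = False
        # (props, position used as key, position to equate):
        #   inverse functional: same object  => same subject
        #   functional:         same subject => same object
        for props, key_pos, link_pos in ((inverse_functional, 2, 0), (functional, 0, 2)):
            seen = {}
            for t in triples:
                if t[1] not in props:
                    continue
                key = (t[1], find(t[key_pos]))
                if key in seen:
                    changed |= union(seen[key], t[link_pos])
                else:
                    seen[key] = t[link_pos]

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return {frozenset(c) for c in classes.values() if len(c) > 1}


triples = [
    ("a:alice", "email", "mailto:alice@example.org"),
    ("a:alice", "employer", "a:uga"),
    ("b:asmith", "email", "mailto:alice@example.org"),
    ("b:asmith", "employer", "b:univ_grenoble_alpes"),
    ("b:bob", "email", "mailto:bob@example.org"),
]
print(link(triples, inverse_functional={"email"}, functional={"employer"}))
```

The shared email first links a:alice and b:asmith; because employer is assumed functional, that link in turn equates a:uga with b:univ_grenoble_alpes. This chaining of equalities across sources is what makes linkage a recursive reasoning problem rather than a one-shot comparison.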

Everybody interested in description logics, databases and information integration is welcome to join. There are no specific prerequisites. The tutorial will be divided into the following three sessions.

I.
This part will introduce the problems of information integration and data linkage across heterogeneous data sources, in particular in the setting of the Web of data (also called Linked Data), and the ontology-based approach to addressing these problems.

II.
This part will be devoted to description logics, their use for specifying ontologies, and the associated inference algorithms for reasoning on data in the presence of ontologies.

III.
In this last part, we will present a unifying rule-based logical framework for reasoning on RDF ontologies and databases, based on Datalog and its extensions.

Bibliography
  • Serge Abiteboul, Ioana Manolescu, Philippe Rigaux, Marie-Christine Rousset, Pierre Senellart. Web Data Management. Cambridge University Press, 2012.