2-year Postdoctoral Research Position at INRAE Montpellier
2-year postdoctoral research position at MISTEA, INRAE Montpellier (https://www.inrae.fr/en/centres/occitanie-montpellier), in the context of the ANR project DACE-DL
Areas: Semantic Web, Linked Data, Data linking
Qualifications: PhD in the Semantic Web domain
Contact & Collaboration:
Danai Symeonidou, CR INRAE Montpellier, danai.symeonidou@inrae.fr
Clement Jonquet, CR (HDR) INRAE Montpellier, clement.jonquet@inrae.fr
Institute: INRAE is a pioneer in France in data sharing and commitment to Open Science. The MathNum research department gathers around 200 scientists in mathematics and digital technologies across 13 research units in France. MISTEA is a joint research unit of INRAE and the Montpellier Institut Agro engineering school; it develops mathematical, statistical and informatics methods for analysis and decision support in agronomy and the environment. The team is also recognized for its expertise in knowledge engineering and in ontology-based scientific data management and information systems.

Project context: Data linking is the scientific challenge of automatically establishing typed links between the entities of two or more structured datasets. A variety of complex data linking systems exist and have been evaluated on public benchmarks [1,2,3]. While these systems have enabled the generation of vast amounts of linked data within various dedicated projects, generic data linking systems often have limited applicability in real-world scenarios, where data are highly heterogeneous and domain-specific. The ANR project DACE-DL (2022-2024) targets a paradigm shift in the data linking field through a data-centric, bottom-up methodology relying on machine learning and representation learning models. We hypothesize that there exists a finite number of identifiable and generalisable linking problem types (LPTs), which need to be categorised and analysed in order to obtain better linking results.

Topic: The main goal is to identify the different linking problem types and organise them into a categorisation/taxonomy, based on an in-depth analysis of the linked datasets provided by the project and beyond. The first objective is to carry out an in-depth analysis of the available linked data, together with an exhaustive study of the state of the art in data linking.
The postdoctoral researcher will propose a finite number of generalisable linking problem types, organised in a taxonomy encoded in OWL or SKOS, in which the relations and inherent structure of the LPTs are made explicit to both humans and machines. Additionally, they will build a (semi-)automatic taxonomy-building methodology that is reproducible and easily extensible to other pairs of datasets in the future. The goal is to answer questions such as: are certain LPTs or groups of LPTs (e.g. siblings at a given level of the taxonomy) specific to a domain, a language or a community? Are certain LPTs inherent to specific types of data? Once a clear and unambiguous taxonomy of LPTs is produced, various datasets will be manually annotated. These annotations of existing dataset pairs will then be used to learn, with machine learning methods, features for the automatic categorisation of other datasets.

Starting period: January 2022
Duration: 24 months
Location: INRAE Centre Occitanie-Montpellier, MISTEA, 2 Place Pierre Viala, 34000 Montpellier
Salary: Between 2200€ and 2700€ gross per month, depending on qualifications and situation
Application: To apply, send an email to both contacts (danai.symeonidou@inrae.fr and clement.jonquet@inrae.fr):
The offer is also available here: