The Semantic Web is an initiative of the World Wide Web Consortium (W3C) that extends the World Wide Web with standards for organizing Web data so that it can be processed by machines. As a first step towards this aim, W3C standardized the Resource Description Framework (RDF) in 2004 as the data model of the Web. RDF offers a flexible, semi-structured format: data is stored as triples, which are naturally represented as a graph, making it well suited to modeling information from Web resources. This has favored its wide adoption, and large and growing RDF graphs are continuously being published on the Web. In practice, however, RDF data is often incorrect. As data quality has become increasingly important in RDF-based applications, the presence of faulty facts in RDF graphs has been widely acknowledged as a serious issue. To address this, W3C recommended in 2017 the Shapes Constraint Language (SHACL), a machine-readable constraint language for describing and validating RDF graphs. Intuitively, it allows us to specify a set of conditions to be checked against an RDF graph, which applications can exploit to improve the quality of the graph. Although it only recently became a W3C standard, SHACL has already been adopted by existing tools and software packages and has been the focus of research. Nevertheless, SHACL is not yet well understood, and its theoretical foundations are not yet well established.
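To make the idea of checking conditions against a graph of triples concrete, the following is a minimal sketch in plain Python. It is not SHACL syntax and does not implement the SHACL semantics; it only mimics the spirit of a minimum-cardinality constraint applied to all nodes of a class. The `ex:` identifiers, the graph contents, and the function `check_min_count` are assumptions made up for this illustration.

```python
# Toy RDF graph: a set of (subject, predicate, object) triples.
# All "ex:" names are hypothetical and exist only for this illustration.
RDF_TYPE = "rdf:type"

graph = {
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", RDF_TYPE, "ex:Person"),  # ex:bob has no ex:name
}


def check_min_count(graph, target_class, path, min_count):
    """Return the instances of target_class that have fewer than
    min_count distinct values for the property `path`."""
    # Target nodes: every subject typed with target_class.
    targets = sorted(
        s for (s, p, o) in graph if p == RDF_TYPE and o == target_class
    )
    violations = []
    for node in targets:
        # Distinct values reachable from the node via `path`.
        values = {o for (s, p, o) in graph if s == node and p == path}
        if len(values) < min_count:
            violations.append(node)
    return violations


# Condition: every ex:Person must have at least one ex:name.
violations = check_min_count(graph, "ex:Person", "ex:name", 1)
print(violations)
```

In this toy graph, only `ex:bob` fails the condition, since it is typed as `ex:Person` but has no `ex:name` value.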
This project will develop solid theoretical foundations for SHACL, paving the way for powerful techniques for intelligent Web data management and reasoning about data quality, and significantly advancing the state of the art. We will investigate fundamental static analysis tasks that support the design of SHACL constraints, as well as the crucial but as yet unexplored task of handling violations of SHACL constraints by RDF graphs. More precisely, the first major goal of the project is to investigate the satisfiability and containment of SHACL constraints, the most basic static analysis problems, which lie at the core of constraint design and optimization techniques. These tasks are crucial both for building meaningful sets of SHACL constraints that are free of inconsistencies and for optimization. The second major goal is to formalize the notions of explanations and repairs for SHACL and to study their properties. Roughly, this allows us to explain why an RDF graph violates the SHACL constraints and to provide ways to fix the graph so that it conforms to them. This is a key point mentioned in the SHACL specification in connection with validation reports. This study will also clarify the relationship between the emerging and the classic approaches to managing data inconsistency and explanations; for example, it will allow us to transfer existing results from the setting of databases and from that of OWL, a close relative of SHACL, to the setting of SHACL.
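As a rough illustration of what satisfiability, containment, and repairs mean, consider a drastically simplified setting: a single constraint that bounds the number of values a node may have for one fixed property. The sketch below is an assumption-laden toy, not the formalization the project aims for; the pair encoding `(min_count, max_count)` and the functions `satisfiable`, `contained`, and `repair_delta` are hypothetical names introduced only for this example.

```python
import math

# A toy constraint is a pair (min_count, max_count) with min_count >= 0.
# A node conforms when its number of values for one fixed property lies
# in that closed interval. This is a hypothetical simplification.


def satisfiable(c):
    """True if some number of values can meet the constraint."""
    lo, hi = c
    return lo <= hi


def contained(a, b):
    """True if every node conforming to a also conforms to b."""
    if not satisfiable(a):
        return True  # nothing conforms to a, so containment holds vacuously
    return a[0] >= b[0] and a[1] <= b[1]


def repair_delta(count, c):
    """Smallest change in the number of values that makes a node conform:
    positive means add values, negative means remove values.
    Returns None when no repair exists (unsatisfiable constraint)."""
    lo, hi = c
    if not satisfiable(c):
        return None
    if count < lo:
        return lo - count
    if count > hi:
        return hi - count
    return 0


exactly_one = (1, 1)
any_number = (0, math.inf)
contradictory = (2, 1)  # at least two and at most one value

print(satisfiable(contradictory))          # no count is >= 2 and <= 1
print(contained(exactly_one, any_number))  # interval [1, 1] lies in [0, inf)
print(repair_delta(0, exactly_one))        # one value must be added
```

Even in this toy setting, the repair doubles as an explanation: a delta of `1` says the node violates the constraint because one value is missing. For full SHACL, where constraints can refer to one another and to paths in the graph, these questions are far less straightforward, which is what motivates the study.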