Funding year 2017 / Scholarship Call #12 / Project ID: 2418 / Project: Decentralised Data Provenance based on the Blockchain
In complex, loosely coupled systems such as the one depicted in Figure 1, it is often hard to reproduce how a certain decision was made or how certain data was generated. In our example, the input data (depicted in green) passes through a hybrid system of human experts and Web services and produces some output data (depicted in violet). It is hard to build trust in this output data because it is no longer easy to reproduce what happened to the input data or what influence the human experts had on it. Nor is the path the data took during its lifetime easy to trace.
To solve these issues, data provenance can be used to provide reliable information about those processes, decisions and outcomes (depicted in orange). By collecting provenance information about the process and the changes made to the input data, we can build trust in the output data, as depicted in Figure 2. However, one major disadvantage of data provenance solutions is that you have to trust the provenance data store and its maintainers. This leads back to the initial problem of how to ensure trust in the data that is provided.
In our next blog entry, we will see how the blockchain can help solve this trust issue.
My master's thesis aims to combine the advantages of the blockchain with data provenance. The blockchain is a distributed ledger that allows data to be persisted in a way that cannot be changed afterwards. Data provenance is an approach to tracking what happened to data, which in turn makes it possible to build trust in that data.
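To make the idea of tamper-evident provenance a little more concrete, here is a minimal sketch in Python. It is not the design used in the thesis and it is not a blockchain: there is no distribution, no consensus and no network, only a local list of records where each record stores the hash of its predecessor. All names (`append_record`, `verify_chain`, the record fields, the example actors) are hypothetical and chosen only for illustration.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Return the SHA-256 hex digest of a record's canonical JSON form."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_record(chain: list, actor: str, action: str, data: str) -> dict:
    """Append a provenance record that points at the hash of its predecessor."""
    # The first record has no predecessor, so it points at an all-zero hash.
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {
        "index": len(chain),
        "actor": actor,      # hypothetical: who touched the data
        "action": action,    # hypothetical: what they did
        "data_hash": hashlib.sha256(data.encode("utf-8")).hexdigest(),
        "prev_hash": prev_hash,
    }
    entry = dict(body, hash=record_hash(body))
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Check that every record is unmodified and linked to its predecessor."""
    prev_hash = "0" * 64
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or record_hash(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_record(chain, "web-service-a", "normalise", "raw input")
append_record(chain, "human-expert-1", "review", "normalised input")
print(verify_chain(chain))  # True

# Changing an earlier record after the fact breaks verification.
chain[0]["actor"] = "someone-else"
print(verify_chain(chain))  # False
```

In this sketch, whoever holds the list could still rewrite every record and recompute all hashes, which is exactly the trust problem with a single provenance store described above. A distributed ledger is meant to address this by having many independent parties hold and agree on the chain.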