Project Detail
Knowledge Graphs (KGs) play a vital role in modern computer systems by organizing information as structured relations between concepts or entities. This structure supports storing and retrieving information and makes large volumes of data easier to navigate and analyze, which is crucial in interdisciplinary, knowledge-intensive applications such as disease diagnosis, drug discovery, ecological data interpretation, and specialized search engines. The knowledge in KGs is predominantly derived from unstructured textual sources, such as scientific articles and news feeds. However, tracing KG knowledge back to its origin in these textual sources, known as the provenance of KG knowledge, is currently challenging. Provenance detection is essential for explaining and validating the knowledge stored in KGs and for identifying potential inconsistencies with the textual sources. To address the lack of efficient KG provenance detection models, my method will tackle two major scientific challenges. First, processing a large volume of text as a source of information requires significant computational power, which poses a scalability problem. To overcome this, I will design subsampling methods that focus only on the textual passages most relevant to the knowledge represented in a KG. Second, the scalability problem is further complicated by the dynamic and evolving nature of knowledge, with millions of new textual sources appearing daily. This makes it difficult to efficiently identify the textual sources that contribute to knowledge shifts and to use them as provenance for defining KG updates. To address this, I will develop a novel scalable architecture that efficiently aligns knowledge shifts in text with concrete changes in KGs. Finally, I will collaborate closely with industrial researchers from multiple disciplines to demonstrate the effectiveness of the developed methodology in real-world scenarios.