Automated Knowledge Graph Construction From Raw Log Data

Poster @ISWC (22.10.2020)

Funding year 2017 / Science Call #1 / Project ID: / Project: SEPSES

In this paper, we propose SLOGERT, a workflow for automated knowledge graph construction from unstructured, heterogeneous, and potentially fragmented log sources. SLOGERT combines extraction techniques that leverage particular characteristics of log data into a modular and extensible processing framework.

Our underlying workflow combines log parsing and event template learning, natural language annotation, keyword extraction, automatic generation of RDF graph modelling patterns, and linking and enrichment to extract and integrate the evidence-based knowledge contained in logs.

[Figure: process overview]

Our approach expects unstructured log files as input and consists of five phases:

Template and Parameter Extraction: Log files typically consist of structured elements (e.g., timestamp and device ID) and an unstructured free-text message. We use LogPAI to identify the constant strings and the variable parts (parameters) in the text message; at this stage, the semantic meaning of the parameters is still undefined.
Semantic Annotation: We apply a combination of Named Entity Recognition (NER) techniques to identify semantic objects inside the extracted parameters and generate Reasonable Ontology Templates (OTTR).
RDFization: In this step, we generate an RDF knowledge graph with the help of the templates and log instances. For this purpose, we use LUTRA, the reference implementation for OTTR.
Background Knowledge Graph (KG) Linking: We contextualize entities that appear in a log file with local background knowledge (e.g., employees, servers, installed software) and external background knowledge (e.g., publicly available cybersecurity information).
Knowledge Graph Integration: We combine the generated KGs from previously isolated log files and sources.
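
The first three phases can be illustrated with a minimal Python sketch. It is not the SLOGERT implementation: a regular expression stands in for LogPAI's template learning, hand-written N-Triples lines stand in for OTTR templates expanded by LUTRA, and the `http://example.org/log#` namespace, the parameter types, and the function names are all assumptions made for illustration.

```python
import re

# Assumed parameter types; real template learning discovers variable parts
# statistically rather than from fixed patterns like these.
PARAMETER = re.compile(
    r"(?P<ip>\b\d{1,3}(?:\.\d{1,3}){3}\b)|(?P<number>\b\d+\b)"
)

# Hypothetical namespace, used only for this example.
EX = "http://example.org/log#"


def extract_template(message):
    """Mask variable parts with <*> and return (template, parameters)."""
    params = []

    def mask(match):
        # lastgroup is the name of the alternative that matched.
        params.append((match.lastgroup, match.group(0)))
        return "<*>"

    return PARAMETER.sub(mask, message), params


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(event_id, template, params):
    """Emit one triple for the template and one per typed parameter."""
    subject = f"<{EX}event/{event_id}>"
    lines = [f'{subject} <{EX}template> "{escape_literal(template)}" .']
    for kind, value in params:
        lines.append(f'{subject} <{EX}{kind}> "{escape_literal(value)}" .')
    return lines


if __name__ == "__main__":
    message = "Failed password for root from 192.168.1.10 port 22 ssh2"
    template, params = extract_template(message)
    for line in to_ntriples("e1", template, params):
        print(line)
```

In this sketch the type label attached to each parameter plays the role of the semantic annotation phase. The user name `root` is left in the template because the pattern only recognizes IP addresses and numbers, which is exactly the kind of gap that NER-based annotation is meant to close.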

With such knowledge graphs, security analysts can navigate and query the log data in an integrated fashion. The following SPARQL query and result visualization illustrate how a single query can combine log events with external knowledge (here, the standard services that IANA registers for each port, available as an ontology).

[Figure: SPARQL query example]

[Figure: query result]
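
The kind of join such a query performs can be approximated in plain Python. The sketch below is an in-memory stand-in, not the SPARQL query shown in the figure: the event records and the field names are hypothetical, and the port table is a small illustrative subset of IANA service names rather than the ontology itself.

```python
# Illustrative subset of IANA service names (port -> service).
IANA_SERVICES = {22: "ssh", 80: "http", 443: "https"}

# Hypothetical events, as they might look after extraction from a log file.
EVENTS = [
    {"id": "e1", "source_ip": "192.168.1.10", "port": 22},
    {"id": "e2", "source_ip": "192.168.1.11", "port": 51234},
]


def enrich_with_service(events, services):
    """Attach the service name for each event's port, or None if unknown."""
    return [{**event, "service": services.get(event["port"])} for event in events]


if __name__ == "__main__":
    for row in enrich_with_service(EVENTS, IANA_SERVICES):
        print(row["id"], row["port"], row["service"])
```

In a knowledge graph the same lookup is expressed as a graph pattern that matches the port of a log event against the port of a service description, so no separate lookup table has to be maintained in analysis code.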

By making log data amenable to semantic analysis, the workflow fills an important gap and opens up a wealth of data sources for knowledge graph building.

Andreas Ekelhart

Andreas is a researcher at the University of Vienna and SBA Research. His main research interests include semantic applications and machine learning to strengthen cybersecurity.

Skills:

IT Security

Semantic applications

Programming

Simulation

Attacker modeling

Ontologies

Machine Learning

LLMs

Further blog posts

An ATT&CK-KG for Linking Cybersecurity Attacks to Adversary Tactics and Techniques (ISWC P&D 2021)

The paper discusses an extension of our prior work, the Cybersecurity Knowledge Graph (CSKG), with adversary tactics and techniques to support analysts in connecting log events to higher-level attack steps. For this purpose, we developed a vocabula...

Virtual Knowledge Graphs for Federated Log Analysis (ARES Conference 2021)

The paper introduces a novel approach to dynamically construct virtual log knowledge graphs directly from heterogeneous raw log files across multiple hosts. It furthermore contextualizes the results with internal and external background knowledge to ...

The SLOGERT Framework for Automated Log Knowledge Graph Construction (ESWC 2021)

SLOGERT is an approach to (semi-)automatically transform raw log data, i.e., textual records of system events, into RDF graphs following a sequence of processes. SLOGERT supports automatic identification of rich RDF graph modelling patterns to repres...

Cross-Platform File System Activity Monitoring and Forensics – A Semantic Approach (IFIPSEC 2020)

In this paper, we introduce a semantic approach for file system activity monitoring and forensics. We propose a vocabulary (depicted in Figure 1) for file access information and implement an architecture (shown in Figure 2) for log acquisition, log ...

Report from the Semantics 2019 Conference

Semantics 2019 Trip Report

The SEPSES knowledge graph: An integrated resource for cybersecurity (ISWC 2019)

Resource paper @ISWC

Semantic Integration and Monitoring of File System Activity (Semantics 2019)

Semantics 2019 - Poster & Demo Session

SEPSES Cybersecurity Knowledge Graph (CSKG)

As part of the SEPSES project, we developed a Cybersecurity Knowledge Graph (CSKG) with information on identified software vulnerabilities, conceptual development errors, and attack patterns from various publicly accessible...

Finding Non-compliances with Declarative Process Constraints through Semantic Technologies (CAiSE’19 Forum)

Declarative Process Constraints through Semantic Technologies