Automated Knowledge Graph Construction From Raw Log Data
Poster @ISWC (22.10.2020)
Funding year 2017 / Science Call #1 / Project ID: / Project: SEPSES

In this paper, we propose SLOGERT, a workflow for automated knowledge graph construction from unstructured, heterogeneous, and potentially fragmented log sources. SLOGERT combines extraction techniques that leverage particular characteristics of log data into a modular and extensible processing framework.

Our underlying workflow combines log parsing and event template learning, natural language annotation, keyword extraction, automatic generation of RDF graph modelling patterns, and linking and enrichment to extract and integrate the evidence-based knowledge contained in logs.

[Figure: process overview]

Our approach expects unstructured log files as input and consists of five phases:

  1. Template and Parameter Extraction: Log files typically consist of structured elements (e.g., timestamp and device ID) and an unstructured free-text message. We use LogPAI to identify the constant strings and the variable parts (parameters) in the text message; at this point, the semantic meaning of the parameters is still undefined.
  2. Semantic Annotation: We apply a combination of Named Entity Recognition (NER) techniques to identify semantic objects inside the extracted parameters and generate Reasonable Ontology Templates (OTTRs).
  3. RDFization: In this step, we generate an RDF knowledge graph with the help of the templates and log instances. For this purpose, we use LUTRA, the reference implementation for OTTR.
  4. Background Knowledge Graph (KG) Linking contextualizes entities that appear in a log file with local background knowledge (e.g., employees, servers, installed software) and external background knowledge (e.g., publicly available cybersecurity information).
  5. Knowledge Graph Integration combines the generated KGs from previously isolated log files and sources.
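The first phase can be illustrated with a minimal sketch. It does not use LogPAI: a couple of hand-written regular expressions (an assumption made purely for illustration) mask variable tokens so that messages sharing the same constant text collapse into one template. The names `MASKS`, `extract_template`, and `group_by_template`, as well as the sample log messages, are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical masking rules: each pattern marks a variable part (parameter).
# LogPAI learns templates from the data; these fixed rules are only a stand-in.
MASKS = [
    ("ip", re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")),
    ("num", re.compile(r"\b\d+\b")),
]


def extract_template(message):
    """Return (template, parameters) for one free-text log message.

    Parameters are collected rule by rule (all IPs first, then all numbers),
    so their order follows MASKS, not necessarily their position in the text.
    """
    params = []
    template = message
    for label, pattern in MASKS:
        def mask(match, label=label):
            params.append((label, match.group(0)))
            return "<*>"
        template = pattern.sub(mask, template)
    return template, params


def group_by_template(messages):
    """Group the extracted parameter lists by the template they belong to."""
    groups = defaultdict(list)
    for message in messages:
        template, params = extract_template(message)
        groups[template].append(params)
    return dict(groups)


# Hypothetical sample messages, not taken from a real log.
SAMPLE = [
    "Connection from 10.0.0.5 port 51234",
    "Connection from 10.0.0.7 port 40022",
    "Disk usage at 91 percent",
]
GROUPS = group_by_template(SAMPLE)
```

In this sketch the two connection messages end up under the single template `Connection from <*> port <*>`, while the labels `ip` and `num` are only syntactic hints. Assigning actual semantic types to the parameters is the job of the annotation phase.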

With such knowledge graphs, security analysts can navigate and query log data in an integrated fashion. The following SPARQL query and result visualization illustrate how a single query can combine log events with external knowledge, in this case the standard services that IANA assigns to ports, which are available as an ontology.

[Figure: SPARQL query example]

[Figure: query result visualization]
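The join that such a query performs can also be sketched in plain Python, without a triple store. This is a toy under stated assumptions: the predicate names (`ex:destinationPort`, `ex:serviceName`), the event identifiers, and the helpers `match` and `events_with_service` are hypothetical and not part of the SLOGERT vocabulary. The two port-to-service pairs (22 for ssh, 443 for https) follow the standard IANA assignments.

```python
# Triples are (subject, predicate, object) tuples; all identifiers are hypothetical.
LOG_GRAPH = {
    ("ex:event1", "ex:destinationPort", "ex:port22"),
    ("ex:event2", "ex:destinationPort", "ex:port443"),
    ("ex:event3", "ex:destinationPort", "ex:port22"),
}

# Background knowledge, standing in for an ontology of IANA service names.
BACKGROUND_GRAPH = {
    ("ex:port22", "ex:serviceName", "ssh"),
    ("ex:port443", "ex:serviceName", "https"),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Yield all triples that fit the pattern; None acts as a wildcard."""
    for s, p, o in graph:
        if subject in (None, s) and predicate in (None, p) and obj in (None, o):
            yield (s, p, o)


def events_with_service(graph):
    """Join each log event with the service registered for its port."""
    rows = []
    for event, _, port in match(graph, predicate="ex:destinationPort"):
        for _, _, service in match(graph, subject=port, predicate="ex:serviceName"):
            rows.append((event, service))
    return sorted(rows)


# Integration is modelled here as a plain set union of the two graphs.
INTEGRATED = LOG_GRAPH | BACKGROUND_GRAPH
ROWS = events_with_service(INTEGRATED)
```

Run against the log graph alone, `events_with_service` returns no rows, because the service names live only in the background graph. The join succeeds once both graphs share identifiers for the same ports, which is what the linking and integration phases aim to provide.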

By making log data amenable to semantic analysis, the workflow fills an important gap and opens up a wealth of data sources for knowledge graph building.

Tags:

Security semantics log analysis

Andreas Ekelhart

Andreas is a researcher at the University of Vienna and SBA Research. His main research interests include semantic applications and machine learning to strengthen cybersecurity.

Skills: IT Security, Semantic applications, Programming, Simulation, Attacker modeling, Ontologies, Machine Learning, LLMs