HaSPI at Forschungsforum 2025
Presenting HaSPI at the 18th Research Forum of the Austrian Universities of Applied Sciences (15.04.2025)
Funding Year 2024 / Project Call #19 / Project ID: 7207 / Project: HaSPI

We’re excited to announce that we’ll be presenting our first research paper of the project, "Automatic Content Moderation for German Newspaper Comments", at this year’s Forschungsforum 2025. The event, hosted at FH Campus Wien on May 7–8, brings together Austria’s universities of applied sciences to explore the pressing questions of our time. Under the motto “Doing Research – Shaping the Future,” the forum offers a glimpse into how research can drive innovation and impact society — and we’re thrilled to be part of that conversation.

Presentation Details

  • Session: Software-Anwendungsfälle (Software Use Cases) – Session 7

  • Time: Thursday, May 8, 2025, 11:45 am – 1:15 pm

  • Location: Room A.-1.03, FH Campus Wien, Level U1

  • Title: Automatic Content Moderation for German Newspaper Comments

About the Paper

Social media platforms have been widely studied when it comes to detecting hate speech and offensive language, but German-language newspaper forums have received surprisingly little attention — despite being important venues for public debate. Our paper addresses this gap by exploring automated binary content moderation in this context using several machine learning models.

Research Focus

We examined how different machine learning approaches perform when tasked with predicting whether a comment should remain online or be removed (offline). The key novelty of our work lies in context-aware modeling — using not just the comment itself, but also:

  • Article metadata (title, topic path)

  • User comment history (online ratio)

  • Community guidelines (in some experiments)

We compared LSTM and CNN models and a large language model (GPT-3.5-Turbo via ChatGPT) against traditional models and transformer-based baselines such as BERT and GBERT from the existing literature.
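
To make the context-aware idea concrete, here is a minimal sketch of how a comment can be enriched with article metadata and a user-history signal before tokenization. The field names and marker tokens (`[TITLE]`, `[TOPIC]`, etc.) are purely illustrative assumptions, not the paper's actual input schema:

```python
# Hypothetical sketch: fusing a comment with article metadata and user
# history into one model input. Field names and marker tokens are
# illustrative, not the paper's actual schema.

def build_model_input(comment: str, article_title: str,
                      topic_path: str, online_ratio: float) -> str:
    """Prepend article context and a bucketed user-history signal
    to the comment text before tokenization."""
    # Bucket the continuous online ratio so it can appear as a token.
    bucket = "high" if online_ratio >= 0.8 else "mid" if online_ratio >= 0.5 else "low"
    return (f"[TITLE] {article_title} "
            f"[TOPIC] {topic_path} "
            f"[USER_ONLINE_{bucket.upper()}] "
            f"[COMMENT] {comment}")

example = build_model_input(
    comment="Das sehe ich anders.",
    article_title="Debatte um neue Verkehrspolitik",
    topic_path="Inland/Politik",
    online_ratio=0.92,
)
```

The same idea carries over to non-text models: the metadata can just as well be fed in as separate feature columns rather than inline tokens.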

Dataset: One Million Posts Corpus

Our experiments were conducted on the “One Million Posts Corpus” from the Austrian newspaper DerStandard, one of the largest and most richly annotated German datasets for online comments. We balanced the dataset to tackle the heavy class imbalance between online and offline posts, resulting in over 120,000 examples used for training and evaluation.
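
The balancing step can be sketched as simple undersampling of the majority class; this is a generic illustration, and the paper's exact procedure may differ:

```python
import random

def balance_binary(examples, label_key="label", seed=42):
    """Undersample the majority class to a 1:1 ratio.
    A generic sketch of class balancing; not the paper's exact method."""
    pos = [e for e in examples if e[label_key] == 1]  # e.g. offline/removed
    neg = [e for e in examples if e[label_key] == 0]  # e.g. online/kept
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    rng = random.Random(seed)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced
```

Fixing the random seed keeps the sampled subset reproducible across runs, which matters when several models are evaluated on the same balanced split.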

Key Findings

  • CNN and LSTM models, when given additional context like article titles and user history, performed competitively with BERT and GBERT models from prior studies.

  • Surprisingly, contextual information did not improve results for ChatGPT (GPT-3.5-Turbo), suggesting that prompt-based moderation has limitations in this domain — at least for now.

  • Traditional models like logistic regression and naive Bayes offered solid performance and efficiency, but were ultimately outpaced by deep learning models with context.

  • Our best model achieved an AUROC of 0.809, outperforming several transformer-based models trained on similar tasks.
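
Since AUROC is our headline metric, here is a minimal, stdlib-only sketch of how it can be computed from model scores, using the pairwise (Mann-Whitney) formulation; in practice one would use a standard library implementation:

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive
    example is scored above a randomly chosen negative one
    (ties count as half). O(n^2) pairwise sketch for illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under this reading, an AUROC of 0.809 means the model ranks a randomly drawn removed comment above a randomly drawn kept one about 81% of the time.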

What's Next

This paper is just the beginning: next we want to explore how imitation learning can enhance moderation by learning directly from human moderators. The goal is to develop systems that not only detect harmful content but learn nuanced decision-making patterns from professionals in real-world settings.

Looking Forward

We’re looking forward to engaging with fellow researchers, practitioners, and innovators at Forschungsforum 2025. If you’re attending, feel free to stop by our session — we’d love to hear your thoughts, ideas, or experiences in the field of content moderation, ethical AI, or digital communication.

See you on May 8th at FH Campus Wien!

If you'd like a sneak peek at the paper or have questions about the project, feel free to reach out!
