Building Knowledge Graphs for Under Resourced Languages

Elwin Huaman

Funding year 2024 / Scholarship Call #19 / Scholarship ID: 7335

This work aims to lay the foundations for building community-centred, language-aligned Knowledge Graphs for Under-Resourced Languages and their communities. These Knowledge Graphs will act as a common-sense layer and will play a key role in shaping the Internet and Artificial Intelligence applications for a better future in which no one is left behind.

Why this project matters: There are about 7000 languages in the world, and only a few of them are supported by language technologies.

How it is done: By allowing all languages to be represented in knowledge graphs.

Contributions: A community-driven approach to knowledge graph curation, and raising awareness of bias in knowledge graphs and AI applications.

University | University of Applied Sciences [University]

Universität Innsbruck

Subject area

Artificial Intelligence

Digital Divide

Information Visualization

Interdisciplinary Research

Project Management

Semantic Web

Semantic Modeling

Target group

Start-ups

System integrators

Thematic community

Overall classification

Dissertation | PhD

Information | Guidelines

Initiative | Campaign

Technology

AI

Database

Linked Data

Open Data

Semantic Networks

Project results

Interim Report CC-BY

Building Knowledge Graphs for Under-Resourced Languages Interim Report

stip7335_Call19_Zwischenbericht_V01.pdf
592.56 KB

Interim Report

Paper CC-BY-SA

Huaman, E., Huaman, J. L., Huaman, W., & Quispe, N. (2025). Quechua speech datasets in common voice: The case of Puno Quechua. In Information management and big data – 12th annual international conference (SIMBig 2025), Lima, Peru, October 29-31, 2025, Proceedings. Springer.

https://arxiv.org/pdf/2510.13871

Quechua Speech Datasets in Common Voice: The Case of Puno Quechua

Dataset CC0

This datasheet is for version 2.0 of the Mozilla Common Voice Spontaneous Speech dataset for Puno Quechua (qxp). The dataset contains 2211 clips representing 10 hours of recorded speech (6 hours validated) from 26 speakers.

https://datacollective.mozillafoundation.org/datasets/cmj8u48et004tnxzps28psruc

Common Voice Spontaneous Speech 2.0 - Puno Quechua

Dataset CC0

This datasheet is for version 24.0 of the Mozilla Common Voice Scripted Speech dataset for Puno Quechua (qxp). The dataset contains 9033 clips representing 11.63 hours of recorded speech (9.88 hours validated) from 14 speakers.

https://datacollective.mozillafoundation.org/datasets/cmj8u3pq700ndnxxb737sq5re

Common Voice Scripted Speech 24.0 - Puno Quechua
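
The figures quoted in the two datasheets can be turned into comparable per-dataset statistics. The following is a minimal sketch, not part of the project's tooling: the class name `DatasetStats` and its fields are hypothetical, and the numbers are copied from the datasheet descriptions on this page (the hour totals are rounded there, so the derived values are approximate).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetStats:
    """Headline figures as stated in a Common Voice datasheet (hypothetical container)."""
    name: str
    clips: int
    total_hours: float
    validated_hours: float
    speakers: int

    @property
    def mean_clip_seconds(self) -> float:
        # Average clip length in seconds: total recorded time divided by clip count.
        return self.total_hours * 3600 / self.clips

    @property
    def validated_share(self) -> float:
        # Fraction of recorded hours that have been validated.
        return self.validated_hours / self.total_hours


# Values taken from the datasheet descriptions for Puno Quechua (qxp).
spontaneous = DatasetStats("Spontaneous Speech 2.0", 2211, 10.0, 6.0, 26)
scripted = DatasetStats("Scripted Speech 24.0", 9033, 11.63, 9.88, 14)

for ds in (spontaneous, scripted):
    print(
        f"{ds.name}: {ds.mean_clip_seconds:.1f} s per clip on average, "
        f"{ds.validated_share:.0%} of hours validated, {ds.speakers} speakers"
    )
```

Read this way, the spontaneous clips are on average several times longer than the scripted ones, while the scripted dataset has the larger validated share.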

Presentation CC-BY-SA

I serve on the advisory committee for the Shared Task: Mozilla Common Voice Spontaneous Speech ASR, which challenges researchers and developers to push ASR further across 21 underrepresented languages from Africa, Asia, Europe and the Americas.

https://community.mozilladatacollective.com/shared-task-mozilla-common-voice-spontaneous-speech-asr/

Shared Task: Mozilla Common Voice Spontaneous Speech ASR

Final Report CC-BY

Final report that describes the current status and next steps.

stip7335_Call19_Endbericht_V01.pdf
741.28 KB

Final Report

Summary CC-BY

Summary

stip7335_Call19_Zusammenfassung_V01.pdf
176.76 KB

Summary

Blog posts

(26.01.2026) Case Study: Puno Quechua language

(25.01.2026) Giving a Voice to Quechua: Breaking Digital Barriers

(21.08.2025) Unlocking the Future of Speech data for Under-Resourced Languages

(30.04.2025) Multilingual Awareness in Knowledge Graphs

(20.01.2025) Bridging the Digital Gap

Similar scholarships

QUDAPI: Efficient Data Pipeline in Quantum-enhanced Cloud Computing

Advancing Privacy in Federated Learning

Analyzing and Understanding the Internet of Insecure Things

Intellectual Property Protection in Open Data Sharing

Hidden Dangers: Uncovering Security and Privacy Risks through Large-scale Mobile App Analysis
