Synthetic Content Needs a Home
And Jekyll will provide for it (19.09.2022)
Funding year 2021 / Scholarship Call #16 / Project ID: 5900 / Project: Synthetic Content for Probe-Resistant Proxies

In the previous blog articles we laid out the motivation and the idea behind this project. Now we get into the nitty-gritty: the challenges and solutions involved in building an automated tool that generates a website filled with synthetic content. Let's start with a deep dive into Jekyll.


As previously mentioned, Jekyll is a popular static site generator written in Ruby. It renders its source from Markdown or Textile, combined with Liquid templates. A key advantage of Jekyll is that no database has to be set up, since it can load content from YAML, JSON or CSV data files when the site is built.

Gems, Gemfile & Bundler

As in other programming ecosystems, Gems can be viewed as libraries or packages that are imported in Ruby in order to reuse existing functionality. Jekyll itself is a Gem that is used in Ruby to generate static websites. The Gems used by a Jekyll project are listed in a Gemfile inside the project's root folder.

Bundler is, just like Jekyll, another Ruby Gem. It takes care of installing all Gems that are specified inside the Gemfile. Using a Gemfile together with Bundler ensures that whenever the project is deployed in a different environment, the correct versions of Jekyll and its plugins are installed.
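Since the tool has to understand a Jekyll project before touching it, one early step could be reading the Gemfile to see which Gems a template depends on. The following is only a minimal sketch in Go under several assumptions: the names `GemDep` and `ParseGemfile` are hypothetical, the sample Gemfile and its version constraints are invented for illustration, and the line-based regular expression only recognises simple `gem "name", "constraint"` declarations (it keeps just the first constraint and ignores options such as `group:`).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// GemDep is one gem declaration found in a Gemfile (hypothetical type).
type GemDep struct {
	Name       string
	Constraint string // empty when no version constraint is given
}

// gemRe matches simple declarations such as: gem "jekyll", "~> 4.2"
var gemRe = regexp.MustCompile(`^\s*gem\s+["']([^"']+)["'](?:\s*,\s*["']([^"']+)["'])?`)

// ParseGemfile collects the gem declarations of a Gemfile, line by line.
// Comment lines are skipped because they do not start with "gem".
func ParseGemfile(src string) []GemDep {
	var deps []GemDep
	for _, line := range strings.Split(src, "\n") {
		if m := gemRe.FindStringSubmatch(line); m != nil {
			deps = append(deps, GemDep{Name: m[1], Constraint: m[2]})
		}
	}
	return deps
}

func main() {
	// Illustrative Gemfile; the gems and versions are assumptions.
	gemfile := `source "https://rubygems.org"

gem "jekyll", "~> 4.2"
gem "minima"

group :jekyll_plugins do
  gem "jekyll-feed", "~> 0.12"
end
`
	for _, d := range ParseGemfile(gemfile) {
		fmt.Printf("gem=%s constraint=%q\n", d.Name, d.Constraint)
	}
}
```

A real Gemfile is Ruby code, so a regular expression can only approximate it; for this project that may be enough, because only the list of declared Gems is of interest.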

Parsing Jekyll templates

This project is based on the idea of generating a website filled with synthetic content by passing the tool any of the countless open source Jekyll templates available, together with a few keywords that define the actual theme of the website. In order to do that, we need to understand the different components used inside a Jekyll project. This understanding is necessary for the most crucial step of this project: parsing Jekyll projects in order to identify the correct locations where the synthetic content needs to be inserted.

Let's start by listing a few of the most relevant components within a Jekyll project. 


Jekyll uses Liquid to process templates. Liquid is built around 3 main components: objects, tags and filters.

  1. Objects: Written between double curly braces ({{ and }}), they tell Liquid to output the value of a predefined variable as content on a page.
  2. Tags: Written between curly braces and percent signs ({% and %}), they define the logic and control flow inside a template.
  3. Filters: Change the output of a Liquid object. They are used inside an output and are separated from the object by a | character.
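To make the three components more concrete, here is a small Go sketch that scans a template string and classifies what it finds. It is an illustration under assumptions, not the tool's actual parser: `Token` and `ScanLiquid` are hypothetical names, the regular expression ignores Liquid's whitespace-control variants (`{{-` and `-%}`), and splitting on `|` would misread a pipe character inside a string literal.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Token is one Liquid construct found in a template (hypothetical type).
type Token struct {
	Kind    string   // "object" or "tag"
	Body    string   // trimmed text between the delimiters, without filters
	Filters []string // filter expressions after "|", objects only
}

// liquidRe matches {{ ... }} (group 1) or {% ... %} (group 2).
var liquidRe = regexp.MustCompile(`(?s)\{\{(.*?)\}\}|\{%(.*?)%\}`)

// ScanLiquid returns the Liquid objects and tags in src, in source order.
func ScanLiquid(src string) []Token {
	var tokens []Token
	for _, m := range liquidRe.FindAllStringSubmatchIndex(src, -1) {
		if m[2] >= 0 {
			// Group 1 matched: an object, optionally followed by filters.
			parts := strings.Split(src[m[2]:m[3]], "|")
			tok := Token{Kind: "object", Body: strings.TrimSpace(parts[0])}
			for _, f := range parts[1:] {
				tok.Filters = append(tok.Filters, strings.TrimSpace(f))
			}
			tokens = append(tokens, tok)
		} else {
			// Group 2 matched: a tag such as "if ..." or "endif".
			tokens = append(tokens, Token{Kind: "tag", Body: strings.TrimSpace(src[m[4]:m[5]])})
		}
	}
	return tokens
}

func main() {
	src := `<h1>{{ page.title | upcase }}</h1>{% if page.author %}<p>{{ page.author }}</p>{% endif %}`
	for _, t := range ScanLiquid(src) {
		fmt.Printf("%-6s %q filters=%q\n", t.Kind, t.Body, t.Filters)
	}
}
```

Everything the scanner does not match is plain HTML or text, which is where synthetic content would most plausibly go.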

Front Matter

This component is placed between two triple-dashed lines at the start of a page and declares variables that apply to this page only. These variables can then be accessed through the page variable followed by the variable name defined inside the Front Matter, for example page.title. Defining variables is not mandatory; however, the two triple-dashed lines still need to be present at the beginning of the page so that Jekyll processes it.
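For a parser, the first job on every page is therefore to separate the Front Matter from the rest. The Go sketch below shows one way this could look; `SplitFrontMatter` is a hypothetical helper, and it only splits the text without interpreting the YAML inside the block.

```go
package main

import (
	"fmt"
	"strings"
)

// SplitFrontMatter separates the front matter block from the page body.
// ok is false when the page does not start with a "---" line or when the
// closing "---" line is missing; in that case body is the unchanged page.
func SplitFrontMatter(page string) (frontMatter, body string, ok bool) {
	lines := strings.Split(strings.ReplaceAll(page, "\r\n", "\n"), "\n")
	if strings.TrimRight(lines[0], " \t") != "---" {
		return "", page, false
	}
	for i := 1; i < len(lines); i++ {
		if strings.TrimRight(lines[i], " \t") == "---" {
			return strings.Join(lines[1:i], "\n"), strings.Join(lines[i+1:], "\n"), true
		}
	}
	return "", page, false
}

func main() {
	page := "---\ntitle: Home\nlayout: default\n---\n<h1>{{ page.title }}</h1>\n"
	fm, body, ok := SplitFrontMatter(page)
	fmt.Printf("ok=%v\nfront matter:\n%s\nbody:\n%s", ok, fm, body)
}
```

An empty block (two consecutive triple-dashed lines) is reported with ok set to true and an empty front matter string, which matches the case where no variables are defined.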


In addition to HTML, Jekyll supports Markdown for building pages. A popular approach in Jekyll is to combine both: HTML defines the basic layout of the page, while Liquid, placed inside the HTML body, loads the actual content from a Markdown file.

The approach taken

[Figure: Abstract syntax tree for the Euclidean algorithm]

The parsing of the Jekyll template is implemented in the Go programming language. After the project's root folder has been parsed, an abstract syntax tree is generated. This tree simplifies the project's structure and makes it possible to insert additional nodes, which in this context represent our synthetic content. In the next blog post I will detail how the tool obtains its synthetic content and the logic that decides where to insert these additional nodes in the abstract syntax tree.
