Funding year 2020 / Project Call #15 / Project ID: 5214 / Project: MLReef
Enabling collaboration and concurrent workflows in ML development
At MLReef we want to take collaboration and concurrent workflows in Machine Learning development to a new level. The general idea is to move away from canvas-focused notebooks towards a modular way of doing ML. Giving structure to the general workflow means that you have distinct functions, such as data pre-processing, that are detached from the modelling part. This detachment means that the individual scripts need to be atomic, i.e. able to work outside of a previously defined context, which is not possible in notebooks, where each cell depends on state created by earlier cells.
Making scripts atomic means that teams can work on the same project both together and independently of each other. We already know this concept from software engineering, where teams work on the same project but, thanks to technologies like git, do not interfere with one another. In Machine Learning this has, until now, been more complicated. We believe that the publishing process can enable structured and independent work that builds on previous work from others. We call this the cross-project synergy effect.
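To make the idea of an atomic script more concrete, here is a minimal sketch of a pre-processing step that takes everything it needs as explicit arguments and relies on no state from earlier steps. The function names, the CSV format and the command-line flags (`--input-path`, `--output-path`, `--column`) are assumptions made for illustration, not MLReef's actual interface.

```python
import argparse
import csv


def min_max_scale(values):
    """Scale numbers to the range [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def preprocess(input_path, output_path, column):
    """Read a CSV, scale one numeric column, and write the result to a new CSV."""
    with open(input_path, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        rows = list(reader)
    if not rows:
        raise ValueError("input file has no data rows")

    scaled = min_max_scale([float(row[column]) for row in rows])
    for row, value in zip(rows, scaled):
        row[column] = value

    with open(output_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def main(argv=None):
    """Hypothetical entry point: all inputs arrive as arguments, none via shared state."""
    parser = argparse.ArgumentParser(description="Min-max scale one CSV column.")
    parser.add_argument("--input-path", required=True)
    parser.add_argument("--output-path", required=True)
    parser.add_argument("--column", required=True)
    args = parser.parse_args(argv)
    return preprocess(args.input_path, args.output_path, args.column)
```

Because the step only reads its input file and writes its output file, one person can change it while someone else works on the model, as long as both agree on the file format. Used as a standalone script, it would call `main()` under the usual `if __name__ == "__main__":` guard.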
What does the publishing process do?
The key was to build on code repositories, already known from software engineering, and to add a publishing concept that containerizes the script. This containerization makes the script immutable and also very easy to reuse: all its dependencies and libraries are versioned and enclosed in the resulting Docker image. It is also possible to publish different branches of one script, so you can iterate without losing any information or traceability.
In addition, the parameters that were annotated in the script are parsed and stored in a database. When a code module is published, it is automatically made available for the team or the community to use. Using it is then very easy: the module appears as a simple drag-and-drop element within MLReef's built-in pipelines. Users can set the described parameters and execute the script with just one click. There is no need for setup or configuration to make scripts work!
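As a rough illustration of parsing annotated parameters and storing them in a database, the sketch below reads the type annotations and defaults of a function and writes them to an SQLite table. This is only an assumption about how such a step could look: the function `augment_images`, the table layout and the use of Python type annotations are hypothetical and do not describe MLReef's actual annotation format or schema.

```python
import inspect
import sqlite3


def augment_images(input_path: str, output_path: str,
                   rotation: float = 15.0, flip: bool = False) -> None:
    """Hypothetical entry point of a code module that is about to be published."""


def extract_parameters(func):
    """Return (name, type name, default as text) for each parameter of func."""
    params = []
    for name, param in inspect.signature(func).parameters.items():
        if param.annotation is inspect.Parameter.empty:
            type_name = None  # parameter was not annotated
        else:
            type_name = getattr(param.annotation, "__name__", str(param.annotation))
        if param.default is inspect.Parameter.empty:
            default = None  # no default, so the user has to supply a value
        else:
            default = repr(param.default)
        params.append((name, type_name, default))
    return params


def store_parameters(conn, module_name, params):
    """Persist the parsed parameters so a UI could later offer them as form fields."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS parameters "
        "(module TEXT, name TEXT, type TEXT, default_value TEXT)"
    )
    conn.executemany(
        "INSERT INTO parameters VALUES (?, ?, ?, ?)",
        [(module_name, *p) for p in params],
    )
    conn.commit()


# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
store_parameters(conn, "augment_images", extract_parameters(augment_images))
rows = conn.execute(
    "SELECT name, type, default_value FROM parameters ORDER BY rowid"
).fetchall()
```

With the parameters stored this way, a pipeline editor could list each name, type and default without running or even importing the script again.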
What are the benefits?
Using git and CI/CD mechanisms allows teams and the community to build an interactive knowledge base. Everyone can contribute and create content that is easy to explore and reuse. This way, value-adding tasks are modular and are built from existing pieces. This speeds up prototyping and, at the same time, creates a new level of concurrent workflows that were previously impossible.