OpenML Evaluation Engine

The Product

Our project was to port the evaluation engine of the OpenML project. OpenML is an open platform for sharing datasets, algorithms and experiments. When a dataset is uploaded to the OpenML servers, a large number of attributes, called meta features, are calculated for it. The code that did this was written in Java. Normally this would not be a problem, but some of the packages it used were no longer supported. To keep all the functionality of the old evaluation engine while relying only on supported packages, we were asked to write a new evaluation engine in Python. An added benefit is that OpenML wants to switch to Python as a whole, so this is a good start.

The nice thing about the evaluation engine is that it can be used from the command line. To process datasets, you run a command in your terminal and the evaluation engine processes the datasets that have not been processed yet.

To get the documentation of the code right, we used tox and Sphinx. Sphinx is a documentation tool that automatically generates a website from reStructuredText (RST) files. RST is a lightweight markup format, comparable to Markdown. We have tried to keep our documentation consistent with the layout and styling of OpenML-python.
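To make the idea of meta features more concrete, here is a minimal sketch in Python of how a few simple ones could be calculated for a small tabular dataset. It is an illustration only and not the code of our evaluation engine: the function name compute_meta_features, the target_index parameter, the use of None for missing values and the names of the resulting keys are all assumptions made for this example.

```python
import math
from collections import Counter


def compute_meta_features(rows, target_index=-1):
    """Compute a few simple meta features for a tabular dataset.

    Hypothetical helper for illustration only.
    rows: list of equally long lists; None marks a missing value (assumption).
    target_index: position of the class label column (assumption).
    """
    n_instances = len(rows)
    n_columns = len(rows[0]) if rows else 0

    # Count every missing cell in the dataset, including the label column.
    n_missing = sum(value is None for row in rows for value in row)

    # Class distribution, ignoring rows whose label is missing.
    labels = [row[target_index] for row in rows if row[target_index] is not None]
    counts = Counter(labels)
    total = len(labels)

    # Shannon entropy of the class distribution, in bits.
    entropy = 0.0
    if total:
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

    return {
        "NumberOfInstances": n_instances,
        "NumberOfFeatures": n_columns,  # counts the label column as well
        "NumberOfMissingValues": n_missing,
        "NumberOfClasses": len(counts),
        "ClassEntropy": entropy,
    }


if __name__ == "__main__":
    # Tiny made-up dataset: two numeric columns and a class label.
    example = [
        [1.0, 2.0, "a"],
        [None, 3.0, "b"],
        [4.0, 5.0, "a"],
        [6.0, None, "b"],
    ]
    for name, value in compute_meta_features(example).items():
        print(name, value)
```

A real evaluation engine calculates many more meta features than these and reads its datasets from the OpenML servers rather than from a list in memory, but the principle is the same: a dataset goes in and a set of named values describing it comes out.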


The Customer

Our customer was Jan van Rijn, and he was also the person we mainly interacted with. His being an assistant professor at Leiden was of some help to us, since it meant he had experience dealing with students. As one of the main contributors to the OpenML project, he knew well how the existing program worked. However, we often had difficulty getting the right information. This was mostly because his schedule was rather full, so our meetings were always a bit too short to get all the information we wanted. We also started without any knowledge of the OpenML project, whilst he knew it very well. As a result, parts of the project that seemed easy to him were difficult for us, and we often did not quite understand what we needed to do. In principle this could be dealt with in the meetings, but because they were short we often did not get a satisfactory answer. Although there were some difficulties, we resolved them in a professional manner, and Jan was very understanding.

“The most important thing about software engineering is to be clear about what you are trying to build”

The Team

Our team, named StudentZIP, has six members from a variety of studies: three computer science students, an AI student and two bioinformatics students. Some team members had more experience than others, but the more experienced members were able to help the less experienced ones solve problems as they came up. Our weekly scrum meetings helped us keep a good view of our current progress and of what we could do next.


The Technologies