Pose Recognition

The project

OpenPose is a powerful program that detects the pose of a person in a photo or video. It pinpoints and marks key points of the body, the hands and the outline of the face, which makes it well suited to applications such as behavioral analysis, automatic sign language recognition and person counting. Previously, researchers struggled to use this program effectively in a research environment: each command could analyze only one video, and the available options were not clear to the user.

To solve these issues, we created an easy-to-use interface. Its settings can be adjusted to produce the desired analysis, such as a per-frame analysis of body, hands and face, and it can generate (animated) images and video to quickly verify and display the results. Moreover, multiple analysis jobs can be queued and executed without further attention from the user. In sum, all data needed for research can be generated with minimal effort, and the chances of finding hidden treasures in the data are greatly increased.


The customer

The Creative Intelligence Lab at the Leiden Institute of Advanced Computer Science uses the OpenPose interface we developed in its research on the structure of sign language.

"The most important aspect about Software Engineering is teamwork."

The team

The five members of the coding team split the work into three parts: the user interface, the Python scripts and a test suite. Working with other students with varying areas of expertise offered the team many learning experiences during the project.


The technologies

To design the graphical user interface we used C# in Visual Studio .NET, which offers a clear overview of the design and allows one to drag and drop components such as buttons, a media player and check boxes to the desired place in the window. Event handlers decide what happens when an event, such as a mouse hover or a mouse input, occurs in a component. The main action of the interface is to start, as a process from C#, the Python scripts that do the actual work.
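
Because the interface starts the Python scripts as a process, each script needs a command-line entry point that accepts the settings chosen in the window. The following is a minimal sketch of such an entry point, not the project's actual script: the function names and every option name (`video`, `--output`, `--hands`, `--face`) are hypothetical and chosen only for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the options a GUI might pass when launching an analysis script."""
    parser = argparse.ArgumentParser(description="Run a pose analysis on one video.")
    # All option names below are hypothetical, chosen only for illustration.
    parser.add_argument("video", help="path to the input video")
    parser.add_argument("--output", default="results.csv",
                        help="path of the CSV file to write")
    parser.add_argument("--hands", action="store_true",
                        help="also detect hand key points")
    parser.add_argument("--face", action="store_true",
                        help="also detect face key points")
    return parser.parse_args(argv)


def main(argv=None):
    """Report the requested analysis and return a process exit code."""
    args = parse_args(argv)
    # Body detection is assumed to always be on; hands and face are optional.
    parts = ["body"] + [name for name in ("hands", "face") if getattr(args, name)]
    # A real script would run the analysis here; the sketch only reports the request.
    print("Analyzing {} for {} -> {}".format(args.video, ", ".join(parts), args.output))
    return 0
```

In a script file, `main()` would typically be called under the usual `if __name__ == "__main__":` guard, so that the exit code tells the calling process whether the analysis succeeded.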

Our Python scripts analyze videos and detect gestures in them using the OpenPose C++/Python API, which detects key points of body parts in videos. The API has the following dependencies, all of which we bundled in an installer:
* CUDA, which allows us to use the GPU for the calculations needed by the analysis
* OpenCV
* NumPy
* Python 2.7

Our Python scripts write the analysis results to a file in the desired CSV format.
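
As an illustration of this last step, the sketch below writes per-frame key points to CSV with Python's standard `csv` module. It is a sketch under stated assumptions, not the project's actual format: the input layout (one list of `(x, y, confidence)` tuples per frame), the column names and the function name `write_keypoints_csv` are all hypothetical, and the example values are made up. It targets a current Python 3 interpreter rather than the Python 2.7 listed above.

```python
import csv
import io


def write_keypoints_csv(frames, out_file):
    """Write one CSV row per frame and key point.

    `frames` holds one entry per video frame; each entry is a list of
    (x, y, confidence) tuples, one per key point. This layout and the
    column names are assumptions made for the sketch.
    """
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["frame", "keypoint", "x", "y", "confidence"])
    for frame_index, keypoints in enumerate(frames):
        for keypoint_index, (x, y, confidence) in enumerate(keypoints):
            writer.writerow([frame_index, keypoint_index, x, y, confidence])


# Two frames with two made-up key points each.
example_frames = [
    [(10.0, 20.0, 0.9), (30.0, 40.0, 0.8)],
    [(11.0, 21.0, 0.7), (31.0, 41.0, 0.6)],
]
buffer = io.StringIO()
write_keypoints_csv(example_frames, buffer)
print(buffer.getvalue())
```

A long layout like this, with one row per key point, is easy to filter and aggregate in statistics tools. A wide layout with one row per frame would be an equally valid choice.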