
Samuel Pröll - Personal homepage

Managing multi-dimensional datasets with metadata in Python

Many datasets used in machine learning tutorials lack metadata. Example datasets are, with good reason, kept as simple as possible and only capture the core of the problem. In the real world, however, a business goal very rarely comes as one neat pack of uniform data. You may have CT images obtained with different scanners, manufacturing data from different facilities, sales data including different marketing campaigns… When working on a particular task, you want to be aware of such information, as disregarding it may lead to unexpected model behavior.

My personal COVID-19 experience: a data project

Covid knocked me out pretty badly. I contracted the virus right at the start of summer and spent more than a week in bed – with aching limbs, a sore throat and a body temperature of almost 40°C. Immediately after the first positive antigen test, I decided to monitor my body more closely and generate some data to play with later. This post summarizes what I learned from turning my Covid infection into a data project.

Speed up data exploration with ad-hoc data filters in Streamlit

I love Streamlit. It is an amazing tool for quickly creating interactive data apps. In data science, it is often beneficial to get first results early and then improve iteratively. Making data available and accessible to domain experts is an important step in that journey. With Streamlit, it is straightforward to build custom applications that are tailored to a specific data science project. But with a few tricks, they can also be made more generally applicable.