
Machine Learning Design Patterns: Problem Representation (Part 1)

Join Vaidas Armonas, our Machine Learning lead, as he explores two ML design patterns, Reframing and Neutral Class, in an attempt to predict song popularity on Spotify.

Author: Vaidas Armonas, Machine Learning Lead at Genus AI
This post was originally published on his personal blog.

In my previous post I discussed the data representation patterns presented in Machine Learning Design Patterns by V. Lakshmanan, S. Robinson & M. Munn. In this post I would like to talk about the next topic in the book: problem representation. After taking care of our data representation, this is the next logical step (and therefore the next chapter in the book).

This is also probably the most important decision to make for a Machine Learning problem: how we choose to model a given problem largely determines how well our solution will perform. The good news is that we do not need to get this decision right from the start. As with everything in Machine Learning, it is an iterative process, and when you find that your problem cannot be solved by regression, try classification (always try classification if you can).

I will do it differently this time – instead of just discussing patterns, I will define a task which we will solve using different design patterns. This way we will be able to compare results and see the influence of problem representation.

In this post, I will concentrate on Reframing and Neutral Class design patterns. Next time I will cover Rebalancing and Ensemble design patterns. But first, let’s define our task.

Task: Predict song popularity

To illustrate the above-mentioned design patterns I will try to predict track popularity on Spotify using only track (mostly audio) features such as danceability, liveness and tempo. The data, along with the full list of features used, can be downloaded from Kaggle. I will provide the accompanying notebook shortly.

Popularity is a Spotify metric calculated for each track mostly based on the number of plays and the recency of those plays. Given the above definition, we expect that new songs will be more popular on average. So we will limit ourselves to the tracks that were released in the past decade. Again, for more details, see the notebook.

Reframing and Neutral Class Design Patterns

As I have already mentioned, in this post I would like to discuss and illustrate two problem representation design patterns: Reframing and Neutral Class. The former is any data scientist’s bread and butter. As for the latter, it is much rarer (I haven’t seen it anywhere else so far).

Solving the task

The task is simple: we want to predict which songs are popular. The popularity rating is an integer from 0 to 100. It is not a real-valued target, as, for example, the price of a house would be, but regression is a reasonable approach here. However, if the only thing we want to predict is whether a song is popular (not the rating itself), we can make this task a classification problem by thresholding the popularity index.

Data

More on data can be found in the notebook referenced above.

We have 19,788 tracks released in the years 2011-2020. We split this dataset randomly into train and test sets of 15,830 and 3,958 tracks respectively. We have 11 audio features to predict popularity from.

The popularity has the following distributions for train and test splits:

Figure: Popularity distribution in train and test data sets

The two distributions are not identical, but they are close. The summary statistics are close too:

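| Statistic | Popularity (train) | Popularity (test) |
|---|---|---|
| Count | 15,830 | 3,958 |
| Mean | 58.89 | 58.58 |
| Std | 15.30 | 15.14 |
| Min | 0 | 0 |
| 25th | 54 | 53 |
| 50th | 61 | 60 |
| 75th | 67 | 67 |
| Max | 99 | 100 |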

Evaluation

For these types of problems, I use rank correlation as an evaluation metric. Since I want to know which tracks are or will be popular, I care whether a higher predicted score indicates higher actual popularity, and that is exactly what rank correlation measures. I will use Spearman’s Rho and Kendall’s Tau with their corresponding p-values.
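As a minimal sketch of this evaluation, assuming SciPy's `spearmanr` and `kendalltau` are used (the post does not say which implementation was used), the two coefficients can be computed as below. The helper name `rank_correlations` and the synthetic data are illustrative assumptions, not taken from the notebook.

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr


def rank_correlations(y_true, y_score):
    """Return Spearman's rho and Kendall's tau with their p-values."""
    rho, rho_p = spearmanr(y_true, y_score)
    tau, tau_p = kendalltau(y_true, y_score)
    return {
        "spearman_rho": float(rho),
        "spearman_p": float(rho_p),
        "kendall_tau": float(tau),
        "kendall_p": float(tau_p),
    }


# Synthetic stand-in data (NOT the Spotify dataset): integer scores 0..100
# and a noisy prediction that is still informative about them.
rng = np.random.default_rng(0)
actual = rng.integers(0, 101, size=500)
predicted = actual + rng.normal(0, 30, size=500)

metrics = rank_correlations(actual, predicted)
print(metrics)

# Rank correlations only depend on ordering, so squashing the score through
# a strictly increasing function (here a sigmoid) leaves them unchanged.
squashed = 1.0 / (1.0 + np.exp(-(predicted - 50.0) / 20.0))
metrics_squashed = rank_correlations(actual, squashed)
```

Because both metrics depend only on the ordering of the scores, a class probability is as usable here as a predicted rating, as long as it ranks the tracks in the same order.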

Regression

As mentioned, regression is a reasonable approach to model popularity. Running it with sklearn’s GradientBoostingRegressor produces the following results:

Figure: Predicted vs. Actual Popularity (Regression)

Correlation coefficients

| Metric | Coefficient | p-value |
|---|---|---|
| Spearman’s Rho | 0.270 | 5.25e-67 |
| Kendall’s Tau | 0.184 | 6.20e-66 |

There is a statistically significant correlation between predicted and actual popularity. But let’s see if we can do even better.
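The regression baseline could look roughly like the sketch below. It assumes default hyperparameters and a random 80/20 split, and it runs on synthetic data with 11 made-up features, so the numbers it prints are not comparable to the results above.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for 11 audio features and a 0..100 popularity score.
# This is NOT the Spotify data; it only makes the sketch runnable.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 11))
signal = X[:, 0] + 0.5 * X[:, 1]
noise = rng.normal(0, 12, size=2000)
popularity = np.clip(np.round(60 + 10 * signal + noise), 0, 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, popularity, test_size=0.2, random_state=0
)

# Default settings are an assumption; the post does not list hyperparameters.
regressor = GradientBoostingRegressor(random_state=0)
regressor.fit(X_train, y_train)
predicted = regressor.predict(X_test)

# Evaluate the ordering of the predictions, not their absolute error.
rho, p_value = spearmanr(y_test, predicted)
print(f"Spearman's rho: {rho:.3f} (p-value: {p_value:.2e})")
```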

Classification

We can reframe our original regression problem as classification by thresholding the popularity index to create classes for our model to predict. In my experience this often works better than regression, because we simplify the problem a bit. If the only thing we need is a relative ordering of the songs, the class probabilities from the classification model are perfectly good for that goal.

Split at the median

The simplest approach for binary class creation is to split the scores at the median value. With this approach, the two classes in the training data are roughly balanced. This might not work if values in your dataset are heavily skewed towards one end of the scale. In that case, you will have to experiment to see whether the imbalanced dataset works as is or whether some re-balancing design patterns (the subject of my next post!) should be employed.
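A minimal sketch of the median split, assuming scikit-learn's `GradientBoostingClassifier` as the classifier (the post names only the regressor) and using synthetic data in place of the Spotify tracks. The probability of the high class serves as the ranking score.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the Spotify dataset).
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 11))
signal = X[:, 0] + 0.5 * X[:, 1]
noise = rng.normal(0, 12, size=2000)
popularity = np.clip(np.round(60 + 10 * signal + noise), 0, 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, popularity, test_size=0.2, random_state=0
)

# Threshold is computed on the training split only. With integer scores,
# ties at the median mean the classes are close to, not exactly, 50/50.
threshold = np.median(y_train)
labels_train = (y_train > threshold).astype(int)  # 1 = high, 0 = low

classifier = GradientBoostingClassifier(random_state=0)
classifier.fit(X_train, labels_train)

# Column 1 of predict_proba is P(class == 1), i.e. the "high" class.
high_score = classifier.predict_proba(X_test)[:, 1]

# Compare the probability ranking against the original 0..100 score.
rho, p_value = spearmanr(y_test, high_score)
print(f"Spearman's rho: {rho:.3f} (p-value: {p_value:.2e})")
```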

Figure: Predicted vs. Actual Popularity (Classification - Median)

Correlation coefficients

| Metric | Coefficient | p-value |
|---|---|---|
| Spearman’s Rho | 0.306 | 7.69e-87 |
| Kendall’s Tau | 0.210 | 3.73e-85 |

Relative to regression, the improvement is roughly 13% in Spearman’s Rho (0.270 to 0.306) and 14% in Kendall’s Tau (0.184 to 0.210). That is not bad for some simple thresholding. Let’s see if we can improve this result.

Middle values removed

Another trick that I always experiment with is simplifying the problem for the algorithm by removing middle values. Depending on your problem, this might help the model distinguish between good and bad examples better. However, it means that we are discarding some of our data, so the tradeoff pays off only when losing the discarded data costs less than the ambiguity it removes.

Figure: Predicted vs. Actual Popularity (Classification - Middle Removed)

Correlation coefficients

| Metric | Coefficient | p-value |
|---|---|---|
| Spearman’s Rho | 0.303 | 7.98e-87 |
| Kendall’s Tau | 0.207 | 1.39e-82 |

In our example, I kept the top and bottom 40 percent of the dataset (discarding the middle 20 percent), and, as we can see, this ended up hurting the performance compared to the median split. I also varied the amount of data I removed, and every variant I tried hurt model performance, the same as in this case.
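The middle-removal step can be sketched as follows. The cut points (40th and 60th percentiles of the training scores) follow the description above; filtering only the training split and evaluating on the full test split is an assumption, since the post does not say how the test data was handled. The data is synthetic.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the Spotify dataset).
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 11))
signal = X[:, 0] + 0.5 * X[:, 1]
noise = rng.normal(0, 12, size=2000)
popularity = np.clip(np.round(60 + 10 * signal + noise), 0, 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, popularity, test_size=0.2, random_state=0
)

# Keep roughly the bottom 40% and top 40% of training scores; the rows in
# between are dropped. Ties at the cut points make the fractions approximate.
low_cut, high_cut = np.quantile(y_train, [0.4, 0.6])
keep = (y_train <= low_cut) | (y_train >= high_cut)

X_kept = X_train[keep]
labels_kept = (y_train[keep] >= high_cut).astype(int)  # 1 = high, 0 = low

classifier = GradientBoostingClassifier(random_state=0)
classifier.fit(X_kept, labels_kept)

# Assumption: score every test track, including those with middle values.
high_score = classifier.predict_proba(X_test)[:, 1]
rho, p_value = spearmanr(y_test, high_score)
print(f"Kept {keep.mean():.0%} of training rows, Spearman's rho: {rho:.3f}")
```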

Classification with Neutral Class

The above example tried to remove the ambiguity introduced by splitting a continuous variable at a threshold, but hurt model performance because part of the data was removed. The Neutral Class design pattern takes care of that. Instead of removing the middle part of the data, we assign it its own class and keep it in training. Then, at inference time, we only look at the predicted probability of the high class.

The Neutral Class design pattern is also useful when we train models on the output of human labelers, for example in medical imaging applications. When human labelers do not agree, we can represent that uncertainty to our model via a neutral class.
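A minimal sketch of the three-class setup on the popularity task. The width of the neutral band (the middle 20 percent of training scores) is an assumption carried over from the middle-removal experiment, the class codes `LOW`, `NEUTRAL` and `HIGH` are illustrative names, and the data is synthetic.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

LOW, NEUTRAL, HIGH = 0, 1, 2  # illustrative class codes

# Synthetic stand-in data (NOT the Spotify dataset).
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 11))
signal = X[:, 0] + 0.5 * X[:, 1]
noise = rng.normal(0, 12, size=2000)
popularity = np.clip(np.round(60 + 10 * signal + noise), 0, 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, popularity, test_size=0.2, random_state=0
)

# Assumed neutral band: scores strictly between the 40th and 60th percentile.
low_cut, high_cut = np.quantile(y_train, [0.4, 0.6])
labels_train = np.select(
    [y_train <= low_cut, y_train >= high_cut], [LOW, HIGH], default=NEUTRAL
)

# All training rows are used; nothing is discarded.
classifier = GradientBoostingClassifier(random_state=0)
classifier.fit(X_train, labels_train)

# At inference time only the probability of the high class is used as score.
probabilities = classifier.predict_proba(X_test)
high_column = list(classifier.classes_).index(HIGH)
high_score = probabilities[:, high_column]

rho, p_value = spearmanr(y_test, high_score)
print(f"Spearman's rho: {rho:.3f} (p-value: {p_value:.2e})")
```

The only change from the middle-removed variant is that the middle rows get a label instead of being dropped, so the model still sees every training example.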

Figure: Predicted vs. Actual Popularity (Classification - Neutral Class)

Correlation coefficients

| Metric | Coefficient | p-value |
|---|---|---|
| Spearman’s Rho | 0.308 | 8.23e-88 |
| Kendall’s Tau | 0.211 | 7.16e-86 |

As you can see, this increased Spearman’s Rho and Kendall’s Tau by roughly 1% and 0.5% respectively compared to the simple median-split classification. This will not make or break your Machine Learning project, and the size of the effect will depend on the dataset, but for such a simple change any positive effect is welcome.

Discussion

When looking at the scatterplots (or even looking at the label distribution), we see that more can be done to increase model performance. We could train a model to distinguish 0 (or very low) scores from the higher ones and then train another model to predict higher scores on the output of the first. This is called the Cascade design pattern and I will write about it in the future.

Summary

In this post, we explored two Machine Learning design patterns, Reframing and Neutral Class. I have shown that these can work in tandem by reframing a problem at hand from regression to classification and then adding a neutral class to help our model distinguish the high and low values better. These two steps add a nice performance boost if we can live with having an estimate of a probability of the high class rather than an estimate of the value itself.

In my next post, I will take a look at the Rebalancing and Ensembles design patterns. I think they can be useful every day, since most interesting problems are imbalanced by nature (predicting high-value customers, churners, etc.).
