Executive Summary

Problem Summary

With the advent of smartphones and the consumer economy, there has been a revolution in the ways that people consume products and content. At the same time, digital music and digital music distribution platforms have become some of the most widely accessible and highly consumed product markets in the world. Yet with this deluge of digital music content comes a challenge: how do users find new content that they enjoy, and how do digital music platforms enable music discovery? These challenges are exacerbated by the fact that people are often limited in time or attention, that other platforms compete for that attention, and that a digital content company's revenue often depends on the time consumers spend on, or interact with, its platform. These companies need to figure out what kind of content will increase the time customers spend on their platform, the amount of interaction with it, and users' overall satisfaction with the experience. The key challenge is predicting what content users are most likely to consume. Spotify is one such music content provider, with a huge user base across the world. With the ever-increasing volume of streaming music becoming available, finding new music of interest has become a tedious task in and of itself. Spotify has grown significantly because of its ability to make highly personalized music recommendations to every user of its platform, based on a huge preference database gathered over time (millions of customers and billions of songs). This is done with recommendation systems that suggest songs based on users' likes and dislikes, incorporating both content-based and latent features. However, the recommendation system used by Spotify and its hyperparameter settings remain a proprietary, closely guarded secret.
Here, I build a recommendation system that provides each user with a personalized top 10 list of songs that the user is most likely to enjoy or interact with, based on that user's musical preferences.

Solution Summary

In total, I explored six recommendation systems using the Million Song Dataset¹ as part of the solution design for this project: popularity-based, user-user collaborative filtering, item-item collaborative filtering, matrix factorization, cluster-based, and content-based. To evaluate the models, I relied on the F1 score, model predictions of user/song interactions, and the top 10 songs recommended by each model. By these measures, three of the six models performed best: matrix factorization (Singular Value Decomposition with default settings, with a 70% weight applied), user-user collaborative filtering (the KNNBasic algorithm with MSD similarity, a maximum of 50 neighbors (k), a minimum of 9 neighbors (min_k), and a 30% weight applied), and the content-based model (TF-IDF encoding and cosine similarity on album title, artist name, and release year). Therefore, I propose that a hybrid recommendation system, built by combining these three models with the specified hyperparameters, be adopted for personalized music recommendations with maximum user interaction potential. The proposed hybrid system offers fast computation, accurate prediction of user preferences, and a balance of familiar and new music for users. However, the model has limitations. It suffers from the 'cold start' problem when making predictions for new users with few listens, and as the volume of available songs and the number of users keep growing, computation times will grow longer (and user satisfaction will presumably fall). Additionally, the model generally favors popular artists and songs, which could affect the music industry by making it more difficult for new artists to break into the scene through this platform.
Importantly, to reduce the computational resources required to train, test, and evaluate the solution design for this project, it was necessary to reduce the size of the dataset. Filtering reduced the original Million Song Dataset subset (2 million records) to 121,900 records and increased the accuracy of predicted user interactions by reducing the high degree of imbalance caused by infrequent users and unpopular songs. While these results do not address the 'cold start' issue, they demonstrate the importance of a filtering step for making fast and relevant recommendations. Therefore, I suggest applying a pre-recommendation filter as part of a production design. Additionally, the content-based part of the recommendation system could be improved by incorporating further features such as genre, lyrics, danceability, and encoded recordings. With these additions, the weight of the content-based model in the final hybrid recommendation system might be increased in future versions of the product. I recommend that these limitations and proposals be considered in future releases of the recommendation system for improved personalization and increased user satisfaction.
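To make the weighting idea concrete, here is a minimal sketch of how two models' predicted ratings could be blended with the 70% / 30% weights mentioned above. The function names, the toy scores, and the fallback for songs scored by only one model are my own assumptions for illustration; the content-based component is left out because its weight is not stated in this summary.

```python
def blend_scores(svd_scores, knn_scores, w_svd=0.7, w_knn=0.3):
    """Weighted blend of two models' predicted ratings, keyed by song."""
    blended = {}
    for song in set(svd_scores) | set(knn_scores):
        # Assumed fallback: if one model has no estimate, reuse the other's.
        s = svd_scores.get(song, knn_scores.get(song))
        k = knn_scores.get(song, svd_scores.get(song))
        blended[song] = w_svd * s + w_knn * k
    return blended


def top_n(scores, n=10):
    """Return the n song keys with the highest blended score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy predicted ratings on the 1-10 scale (hypothetical values).
svd = {"song_a": 9.0, "song_b": 6.0, "song_c": 4.0}
knn = {"song_a": 5.0, "song_b": 8.0, "song_c": 3.0}

hybrid = blend_scores(svd, knn)
ranked = top_n(hybrid, n=2)
print(ranked)
```

With these toy numbers, song_a scores 0.7 * 9.0 + 0.3 * 5.0 = 7.8 and ranks first, even though the user-user model alone would have preferred song_b.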

==> NOTE <==
For the full code check out the GitHub Link at the bottom of the page

The objective:

Build a recommendation system to propose the top 10 songs for a user based on the likelihood of listening to those songs.

The key questions:

What is the structure of the dataset?
What variables will be used to make the recommendation system?
What is the distribution of the ‘rating’ variable?
Which models will be used to build a recommendation system?
How will models be evaluated?
Based on criteria, which model is ‘best’?
Is this model good enough for production?

The Data

The core data is the Taste Profile Subset released by the Echo Nest as part of the Million Song Dataset. There are two files in this dataset. The first file contains the details about the song id, titles, release, artist name, and the year of release. The second file contains the user id, song id, and the play count of users.

Data Source

http://millionsongdataset.com/ (1)

The dataset is split into two .csv files I load here. The two dataframes and their features are:

song_data

song_id - A unique id given to every song
title - Title of the song
release - Name of the released album
artist_name - Name of the artist
year - Year of release

	song_id	title	release	artist_name	year
0	SOQMMHC12AB0180CB8	Silent Night	Monster Ballads X-Mas	Faster Pussy cat	2003
1	SOVFVAK12A8C1350D9	Tanssi vaan	Karkuteillä	Karkkiautomaatti	1995
2	SOGTUKN12AB017F4F1	No One Could Ever	Butter	Hudson Mohawke	2006
3	SOBNYVR12A8C13558C	Si Vos Querés	De Culo	Yerba Brava	2003
4	SOHSBXH12A8C13B0DF	Tangle Of Aspens	Rene Ablaze Presents Winter Sessions	Der Mystic	0
5	SOZVAPQ12A8C13B63C	Symphony No. 1 G minor "Sinfonie Serieuse"/All...	Berwald: Symphonies Nos. 1/2/3/4	David Montgomery	0
6	SOQVRHI12A6D4FB2D7	We Have Got Love	Strictly The Best Vol. 34	Sasha / Turbulence	0
7	SOEYRFT12AB018936C	2 Da Beat Ch'yall	Da Bomb	Kris Kross	1993
8	SOPMIYT12A6D4F851E	Goodbye	Danny Boy	Joseph Locke	0
9	SOJCFMH12A8C13B0C2	Mama_ mama can't you see ?	March to cadence with the US marines	The Sun Harbor's Chorus-Documentary Recordings	0

count_data

user_id - A unique id given to the user
song_id - A unique id given to the song
play_count - Number of times the song was played

	user_id	song_id	play_count
0	b80344d063b5ccb3212f76538f3d9e43d87dca9e	SOAKIMP12A8C130995	1
1	b80344d063b5ccb3212f76538f3d9e43d87dca9e	SOBBMDR12A8C13253B	2
2	b80344d063b5ccb3212f76538f3d9e43d87dca9e	SOBXHDL12A81C204C0	1
3	b80344d063b5ccb3212f76538f3d9e43d87dca9e	SOBYHAJ12A6701BF1D	1
4	b80344d063b5ccb3212f76538f3d9e43d87dca9e	SODACBL12A8C13C273	1
5	b80344d063b5ccb3212f76538f3d9e43d87dca9e	SODDNQT12A6D4F5F7E	5
6	b80344d063b5ccb3212f76538f3d9e43d87dca9e	SODXRTY12AB0180F3B	1
7	b80344d063b5ccb3212f76538f3d9e43d87dca9e	SOFGUAY12AB017B0A8	1
8	b80344d063b5ccb3212f76538f3d9e43d87dca9e	SOFRQTD12A81C233C0	1
9	b80344d063b5ccb3212f76538f3d9e43d87dca9e	SOHQWYZ12A6D4FA701	1

Observations and Insights:

The Count dataset has 3 columns (user_id, song_id, and play_count)
The Count dataset has 2,000,000 observations
The Songs dataset has 5 columns (song_id,title,release,artist_name,year)
The Songs dataset has 1,000,000 observations
There are some missing titles and releases
The primary/foreign key to merge these two datasets is song_id
The user_id and song_id are long anonymized strings and could be encoded as integers. However, this could cause problems if we were working on a real-life data science business problem where user_id and song_id might need to be retained, or if later in this analysis we wanted to incorporate other features from the Million Song Dataset online. Therefore, I will not encode these.
The data also contains users who have listened to very few songs, and songs that have been played by very few users. Filtering these records out could ‘kill two birds with one stone’ by lessening the cold start problem and decreasing the computational resources needed to analyze this large dataset.

==> NOTE <==

A dataset of 2,000,000 rows x 7 columns is large and can require a lot of computing resources to process. This leads to long processing times and makes it difficult to train and evaluate models efficiently. To address this, I filtered the dataset to decrease its size and reduce the class imbalance, and then scaled the play_count feature.

Exploratory Data Analysis (EDA):

Let's take a look at the distribution of ‘play_count’, the feature I use as a proxy for user ‘rating’:

[Figure: distribution of play_count before filtering]

From this distribution plot, we can see that the number of songs played by each user (let's call it user interactions) is heavily right skewed. There are thousands of users who have only listened to a few songs, so any matrix I build from this data will be extremely sparse. Next, I reduce the skew in the play counts by filtering out users who have fewer than a minimum number of total plays (I settled on 90). These are users for whom there is very little preference data. Plotting the distribution of ‘play_count’ after filtering:

[Figure: distribution of play_count after filtering out users with fewer than 90 total plays]

	Column	Non-Null Count	Dtype
0	user_id	1224498 non-null	object
1	song_id	1224498 non-null	object
2	title	1224498 non-null	object
3	release	1224498 non-null	object
4	artist_name	1224498 non-null	object
5	year	1224498 non-null	int64
6	play_count	1224498 non-null	int64

Filtering out less active users has decreased the imbalance somewhat. The distribution in the above plot is still heavily right skewed, but the class imbalance has been reduced by roughly a factor of 6 (the y-axis limit is 1200 instead of 7000). This has also decreased the size of the dataset from 2 million records to about 1.2 million.

This is still not good enough, so I continue to decrease the sparsity and imbalance of the data by filtering out any user/song records that have a play count less than 6. There are many songs that a user has listened to only once or a few times. I want to recommend highly rated songs, so I drop these low-interaction records and treat them as ‘not-liked’.

	Column	Non-Null Count	Dtype
0	user_id	185694 non-null	object
1	song_id	185694 non-null	object
2	title	185694 non-null	object
3	release	185694 non-null	object
4	artist_name	185694 non-null	object
5	year	185694 non-null	int64
6	play_count	185694 non-null	int64

This last filtering step reduced the size of the dataset by 85%. This will speed up processing times for the models and should also make the predictions more accurate by reducing the class imbalance. Finally, I filter out all songs that have fewer than 20 user interactions:

[Figure: distribution plot after filtering out songs with fewer than 20 user interactions]

	Column	Non-Null Count	Dtype
0	user_id	124147 non-null	object
1	song_id	124147 non-null	object
2	title	124147 non-null	object
3	release	124147 non-null	object
4	artist_name	124147 non-null	object
5	year	124147 non-null	int64
6	play_count	124147 non-null	int64
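The three filtering steps described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: I read "total plays" as the sum of play_count per user and "user interactions" as the number of distinct remaining users per song, and the function name and toy thresholds are hypothetical.

```python
from collections import defaultdict


def filter_interactions(rows, min_user_plays=90, min_play_count=6, min_song_users=20):
    """rows: list of (user_id, song_id, play_count) tuples."""
    # Step 1: keep users whose summed play counts reach the minimum.
    plays_per_user = defaultdict(int)
    for user, _, count in rows:
        plays_per_user[user] += count
    rows = [r for r in rows if plays_per_user[r[0]] >= min_user_plays]

    # Step 2: drop individual user/song records with a low play count.
    rows = [r for r in rows if r[2] >= min_play_count]

    # Step 3: keep songs that still have enough distinct users.
    users_per_song = defaultdict(set)
    for user, song, _ in rows:
        users_per_song[song].add(user)
    return [r for r in rows if len(users_per_song[r[1]]) >= min_song_users]


# Toy data with small thresholds so the effect of each step is visible.
toy_rows = [
    ("u1", "s1", 5), ("u1", "s2", 1),
    ("u2", "s1", 4), ("u2", "s2", 3),
    ("u3", "s1", 1),
]
kept = filter_interactions(toy_rows, min_user_plays=6, min_play_count=3, min_song_users=2)
print(kept)
```

In the toy run, u3 is dropped in step 1, the record (u1, s2, 1) in step 2, and song s2 in step 3 because only one user remains for it.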

With these filtering steps, I have reduced the class imbalance, decreased the size of the dataset to make the models and grid search more tractable, and decreased the extreme sparsity of the resulting recommendation matrices. Now I am going to apply a threshold to further reduce the class imbalance and the play_count range, and then apply a min-max scaler to rescale the play counts so I can use them as a proxy for a 1-10 rating.

Clipping and Scaling play_count:

Because there are very few users who have listened to a song more than 25 times, I set a threshold at 25 plays. I don't want to drop records with more than 25 plays because they carry important information about what users like, so I clip anything > 25 to 25 and then apply the MinMaxScaler class from the sklearn package to scale the play counts to a 1-10 range.
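As a rough illustration of the clip-then-scale step, here is a plain-Python sketch of the same arithmetic that a min-max scaler with a target range of 1 to 10 performs. It is not the sklearn code used in the project; the function name and sample values are assumptions.

```python
def clip_and_scale(play_counts, cap=25, low=1.0, high=10.0):
    """Clip play counts at `cap`, then min-max scale them into [low, high]."""
    clipped = [min(c, cap) for c in play_counts]
    lo, hi = min(clipped), max(clipped)
    if hi == lo:
        # Degenerate case: every value is identical, so map all to `low`.
        return [low for _ in clipped]
    return [low + (c - lo) * (high - low) / (hi - lo) for c in clipped]


# After filtering, the smallest remaining play count is 6 (see above).
scaled = clip_and_scale([6, 10, 25, 40])
print(scaled)
```

The smallest play count maps to 1, anything at or above the cap maps to 10, and values in between are spread linearly.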
Distribution plot after filtering:

[Figure: play_count distribution after filtering]

Distribution plot after scaling:

[Figure: play_count distribution after scaling]

My final play_counts, filtered, clipped, and scaled, look pretty good. The class imbalance between the low and high play_counts has been reduced, the play_counts are scaled from 1-10, and I have kept the records containing the highest ratings as 10s.

(121900, 8)

The final dataset has 121,900 records. This is a much more manageable number of records for training and testing models. In earlier iterations of the filtering, I had dropped over 90% of the songs, so the final recommendations were not diverse and were nearly the same for every user. So I'm going to continue the EDA and take a look at the filtered dataset to make sure the user/song diversity is still good…

Exploratory Data Analysis Continued…

Checking the total number of unique users, songs, artists in the data

Total number of unique user id

Number of unique USERS =  19212

Total number of unique song id

Number of unique SONGS =  2210

Total number of unique artists

Number of unique artists =  1194

Observations and Insights:

There are 19212 unique users remaining in the dataset after filtering
There are 2210 unique songs remaining in the dataset after filtering
There are 1194 unique artists remaining in the dataset after filtering
This looks like a good balance: I have filtered out rare users and songs but retained many different users and a diversity of songs and artists.

Let’s find out about the most interacted songs and the most active users

Most interacted songs

	song_id	play_count
138	SOBONKR12A58A7A7E0	1466
80	SOAUWYT12A81C206F1	1445
1624	SOSXLTC12AF72A7F54	1186
478	SOFRQTD12A81C233C0	1133
88	SOAXGDH12A8C13F8A1	1024
357	SOEGIYH12A6D4FC0E3	993
1188	SONYKOW12AB01849C9	831
1901	SOWCKVR12A8C142411	748
1343	SOPUCYA12A8C13A694	744
647	SOHTKMO12AB01843B0	616

Most interacted users

	user_id	play_count
5683	4be305e02f4e72dad1b8ac78e630403543bab994	106
766	0a4c3c6999c74af7d8a44e96b44bf64e513c0f8b	82
827	0b19fe0fad7ca85693846f7dad047c449784647e	81
8262	6d625c6557df84b60d90426c0116138b617b9449	74
16334	da3890400751de76f0f05ef0e93aa1cd898e7dbc	69
7930	695179610d0b1fbb9d66267a3bd24946617af7fb	67
17477	e9a7dba8248ced646ea192016660e3c9056c0d03	66
2996	283882c3d18ff2ad0e17124002ec02b847d06e9a	65
5405	48567d388c6a7dda0e9d0a7b6648bdb42440475c	65
10463	8c78c69701072e204f4340ca4d6ee44fe39e40cc	64

Observations and Insights:

The most interacted song is ‘SOBONKR12A58A7A7E0’ which has been interacted with by 1466 different users
The most active user is ‘4be305e02f4e72dad1b8ac78e630403543bab994’, who has listened to 106 different songs
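The two rankings above amount to counting records per song and per user. A minimal sketch, assuming each (user_id, song_id) pair appears at most once in the filtered data; the helper name and toy rows are hypothetical:

```python
from collections import Counter


def most_interacted(rows, key_index, n=10):
    """Count records per key (0 = user_id, 1 = song_id) and return the top n."""
    return Counter(row[key_index] for row in rows).most_common(n)


toy_pairs = [("u1", "s1"), ("u2", "s1"), ("u1", "s2")]
top_songs = most_interacted(toy_pairs, key_index=1, n=1)
top_users = most_interacted(toy_pairs, key_index=0, n=1)
print(top_songs, top_users)
```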

Songs played in a year

	0	49	48	47	46	43	45	50	41	42	...	17	21	7	6	3	5	12	1	2	4
year	0	2009	2008	2007	2006	2003	2005	2010	2001	2002	...	1977	1981	1967	1966	1962	1965	1972	1958	1960	1963
play_count	26580	12782	10616	7499	6693	5718	4861	4617	4316	4147	...	151	145	137	134	113	110	77	66	62	61

2 rows × 51 columns

[Figure: plot by year; y-axis label: number of releases]

Observations and Insights:

It is not clear what the ‘year’ feature represents, but it is most likely the year a song/album was released.
We can clearly see that there is an increasing trend from 1960-2008 in the number of songs released
This makes sense, as there are more people and more artists, and musical equipment, recording equipment, and streaming platforms have made it easier to produce music

Now I apply different algorithms to build a recommendation system.

Building a baseline popularity-based recommendation system

In this basic recommendation system, I take the count and sum of play counts for each song and rank songs by the sum of play counts (for the full code, check out the GitHub link at the bottom of the page). The rank-based recommender function is:
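The original function is not reproduced on this page. As a stand-in, here is a minimal plain-Python sketch of a rank-based recommender that follows the description above (rank by summed play counts, with the count available as a minimum-interaction filter). The function name, parameters, and toy data are assumptions, not the project's actual code.

```python
from collections import defaultdict


def top_n_popular(rows, n=10, min_interactions=1):
    """rows: (user_id, song_id, play_count). Rank songs by summed play counts."""
    total = defaultdict(float)  # sum of play counts per song
    freq = defaultdict(int)     # number of records per song
    for _, song, count in rows:
        total[song] += count
        freq[song] += 1
    eligible = [s for s in total if freq[s] >= min_interactions]
    return sorted(eligible, key=lambda s: total[s], reverse=True)[:n]


toy_plays = [("u1", "s1", 2.0), ("u2", "s1", 3.0), ("u1", "s2", 9.0), ("u3", "s3", 1.0)]
popular = top_n_popular(toy_plays, n=2)
popular_min2 = top_n_popular(toy_plays, n=2, min_interactions=2)
print(popular, popular_min2)
```

Because this recommender ignores the user entirely, every user receives the same list, which is why it only serves as a baseline.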

Here is the output after running the popularity-based RS on the filtered dataset:

	title	artist_name
0	221	keller williams
1	Call It Off (Album Version)	Tegan And Sara
2	Clara meets Slope - Hard To Say	Clara Hill
3	Kelma	Rachid Taha
4	Numb (Album Version)	Disturbed
5	Voices On A String (Album Version)	Thursday
6	What If I Do?	Foo Fighters
7	Encore Break	Pearl Jam
8	Reign Of The Tyrants	Jag Panzer
9	Dance_ Dance	Fall Out Boy

Collaborative Filtering, Matrix Factorization, and Clustering based recommendation systems

Before running the following recommendation systems, I developed several functions for calculating metrics to evaluate the models. The metrics I used are Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Precision, Recall, and F1.
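For readers unfamiliar with these metrics, here is a minimal sketch of how they can be computed from (true rating, estimated rating) pairs. The per-user precision@k and recall@k averaging, the cutoff k, and the relevance threshold are assumptions on my part; the project's own functions may differ in those details.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true, estimated) pairs."""
    return math.sqrt(sum((t - e) ** 2 for t, e in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true, estimated) pairs."""
    return sum(abs(t - e) for t, e in pairs) / len(pairs)


def precision_recall_f1(user_pairs, k=10, threshold=5.0):
    """user_pairs: {user: [(true, est), ...]}. Averages per-user precision@k and recall@k."""
    precisions, recalls = [], []
    for pairs in user_pairs.values():
        ranked = sorted(pairs, key=lambda p: p[1], reverse=True)
        n_relevant = sum(t >= threshold for t, _ in ranked)
        top = ranked[:k]
        n_recommended = sum(e >= threshold for _, e in top)
        n_hits = sum(t >= threshold and e >= threshold for t, e in top)
        precisions.append(n_hits / n_recommended if n_recommended else 0.0)
        recalls.append(n_hits / n_relevant if n_relevant else 0.0)
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


toy_pairs = [(10, 8), (1, 3)]
toy_users = {"u1": [(10, 8), (1, 6), (7, 2)]}
toy_rmse, toy_mae = rmse(toy_pairs), mae(toy_pairs)
toy_p, toy_r, toy_f1 = precision_recall_f1(toy_users, k=2, threshold=5.0)
print(toy_rmse, toy_mae, toy_p, toy_r, toy_f1)
```

F1 here is the harmonic mean of the averaged precision and recall, which matches the reported figures (for example, precision 0.682 and recall 0.772 give an F1 of about 0.724).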

I then split the data into a train and test set (the link to the full code is at the bottom of the page). To build the user-user similarity-based model and the subsequent models, I used the Surprise library for Python.

User User Similarity-Based Collaborative Filtering Metrics

The code for implementing the user-user similarity matrix with base settings is:
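The original listing is not reproduced on this page. To show the idea behind a basic user-user KNN model, here is a plain-Python sketch of MSD-based similarity and a similarity-weighted average prediction. It is an illustration of the technique, not the Surprise implementation; the function names, the None return for too few neighbours, and the toy ratings are assumptions.

```python
def msd_similarity(ratings_a, ratings_b):
    """Similarity = 1 / (mean squared difference over common items + 1)."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    msd = sum((ratings_a[i] - ratings_b[i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)


def predict_user_user(ratings, user, item, k=40, min_k=1):
    """ratings: {user: {item: rating}}. Returns None if too few neighbours."""
    neighbours = []
    for other, other_ratings in ratings.items():
        if other != user and item in other_ratings:
            sim = msd_similarity(ratings[user], other_ratings)
            if sim > 0:
                neighbours.append((sim, other_ratings[item]))
    # Keep the k most similar users who rated the item.
    neighbours = sorted(neighbours, reverse=True)[:k]
    if len(neighbours) < min_k:
        # A library would typically fall back to a default estimate here.
        return None
    return sum(s * r for s, r in neighbours) / sum(s for s, _ in neighbours)


toy_ratings = {
    "u1": {"a": 10, "b": 2},
    "u2": {"a": 10, "b": 2, "c": 8},
    "u3": {"a": 2, "b": 10, "c": 1},
}
estimate = predict_user_user(toy_ratings, "u1", "c")
too_few = predict_user_user(toy_ratings, "u1", "c", min_k=3)
print(estimate, too_few)
```

In the toy data, u2 agrees with u1 exactly (similarity 1) while u3 disagrees strongly (similarity 1/65), so the estimate for song c lands close to u2's rating of 8.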

Result:

RMSE: 3.1852
MAE:  2.5755
  algorithm      rmse       mae  precision  recall  f1_score  popularity
0  KNNbasic  3.185239  2.575525      0.682   0.772     0.724        24.3

Observations and Insights:
For the untuned User-user similarity-based model,

The RMSE is 3.18
The MAE is 2.57
The f1 score is 0.72
The average popularity of recommended songs is 24.3

predictions using knn.KNNBasic for user 6ccd111af9b4baa497aacd6d1863cbf5a141acc6:

prediction for Red Dirt Road by Brooks and Dunn: r_ui = 10.00 est = 3.97
predictions for Till I collapse by Eminem and Nate Dogg: r_ui = 1.00 est = 3.74
prediction for SOTTGXB12A6701FA0B by Phoenix (other pheonix songs are 7.6): r_ui = None est = 3.45

Observations and Insights:

For the song the user has heard with a rating of 10, the model predicted a rating of 3.97
For the song the user has heard with a rating of 1, the model predicted a rating of 3.74
For the song the user has not heard, the model predicted 3.45
The user-user similarity-based collaborative filtering method has good RMSE, MAE and f1_score, but…
The user-user model isn't predicting ratings very well
All three songs have similar predicted ‘ratings’ for this user

Next, I tuned the model using GridSearchCV to try to improve the model performance.

The metrics after tuning are:

2.8805243519619927
{'k': 50, 'min_k': 9, 'sim_options': {'name': 'msd', 'user_based': True}}
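The two lines above are the best score and the best parameter combination found by the search. Conceptually, a grid search evaluates every combination in a parameter grid and keeps the one with the lowest error. The sketch below shows that loop in plain Python; the grid values and the scoring function are purely synthetic stand-ins (not the grid or the cross-validated RMSE used in the project).

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination; return (best_score, best_params)."""
    names = list(param_grid)
    best_score, best_params = float("inf"), None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # lower is better, e.g. a validation RMSE
        if score < best_score:
            best_score, best_params = score, params
    return best_score, best_params


# Synthetic score with a known minimum at k=30, min_k=6 (illustration only).
def toy_score(params):
    return (params["k"] - 30) ** 2 + (params["min_k"] - 6) ** 2


toy_grid = {"k": [10, 20, 30, 40], "min_k": [3, 6, 9]}
best_score, best_params = grid_search(toy_grid, toy_score)
print(best_score, best_params)
```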

RMSE: 2.8309
MAE:  2.2550
        algorithm      rmse       mae  precision  recall  f1_score  popularity
0        KNNbasic  3.185239  2.575525      0.682   0.772     0.724        24.3
1  KNNbasic_tuned  2.830942  2.254989      0.698   0.799     0.745        64.9

Observations and Insights:
After tuning the user-user model,

The RMSE has decreased over the untuned model
The MAE has also decreased
The f1 score has increased
The popularity of recommended songs has increased

predictions using knn.KNNBasic tuned for user 6ccd111af9b4baa497aacd6d1863cbf5a141acc6:

prediction for Red Dirt Road by Brooks and Dunn: r_ui = 10.00 est = 8.95
predictions for Till I collapse by Eminem and Nate Dogg: r_ui = 1.00 est = 3.14
prediction for SOTTGXB12A6701FA0B by Phoenix (other pheonix songs are 7.6): r_ui = None est = 3.38

Observations and Insights:

The model is predicting a rating of 8.95 for the song the user has heard and rated a 10, which is very good.
The model is predicting a rating of 3.14 for the song the user has heard and rated a 1, which is not bad.
The model is rating the unheard song 3.38.
It seems that tuning the model has improved its ability to predict ratings

Finally, because more ‘popular’ songs are more likely to be ‘liked’, I adjust the tuned recommendations by using a custom script to weight recommendations by the number of plays. The final, weighted recommendations from the User-User similarity-based recommendation system for user ‘6ccd111af9b4baa497aacd6d1863cbf5a141acc6’ are:

	title	artist_name	count_plays	predicted_interaction	corrected_ratings
0	Clara meets Slope - Hard To Say	Clara Hill	89	9.205088	9.099088
1	Numb (Album Version)	Disturbed	68	8.825430	8.704162
2	When You're Gone	Avril Lavigne	76	8.568666	8.453958
3	#40	DAVE MATTHEWS BAND	72	8.503340	8.385488
4	Speechless (Album Version)	The Veronicas	45	8.512847	8.363776
5	XRDS	Covenant	51	8.475145	8.335117
6	(iii)	The Gerbils	88	8.352223	8.245623
7	Modern world	Modern Lovers	51	8.274458	8.134430
8	The Memory Remains	Metallica / Marianne Faithfull	94	8.209485	8.106343
9	Sunburn	Muse	15	8.156890	7.898691
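The weighting script itself is not shown on this page. The corrected_ratings values in the table are, however, consistent with subtracting 1 / sqrt(count_plays) from predicted_interaction (for example, 9.205088 - 1/sqrt(89) ≈ 9.099088). Under that inferred assumption, a minimal sketch of the correction and re-ranking looks like this; the function names are hypothetical.

```python
import math


def corrected_rating(predicted, count_plays):
    """Penalize songs with few plays: subtract 1 / sqrt(number of plays)."""
    return predicted - 1.0 / math.sqrt(count_plays)


def rank_corrected(candidates, n=10):
    """candidates: list of (title, count_plays, predicted). Sort by corrected rating."""
    scored = [(title, corrected_rating(pred, count)) for title, count, pred in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Two rows taken from the table above.
rows = [
    ("Speechless (Album Version)", 45, 8.512847),
    ("#40", 72, 8.503340),
]
ranking = rank_corrected(rows)
print(ranking)
```

The penalty shrinks as the play count grows, so a song with many plays can overtake one with a slightly higher raw prediction, as "#40" does over "Speechless (Album Version)" in the table.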

Observations and Insights:

I recommended 10 songs for the user ‘6ccd111af9b4baa497aacd6d1863cbf5a141acc6’ with the user-user collaborative filtering method
Some of the songs that are recommended to the user have a predicted rating close to 10, which is very good.
Comparing the tuned and untuned models, the tuned model has improved performance based on the evaluation metrics I chose.

Item-Item Similarity-based collaborative filtering recommendation system

Metrics of the item-item model compared to user-user

RMSE: 2.9990
MAE:  2.2550
        algorithm      rmse       mae  precision  recall  f1_score  popularity
0        KNNbasic  3.185239  2.575525      0.682   0.772     0.724        24.3
1  KNNbasic_tuned  2.830942  2.254989      0.698   0.799     0.745        64.9
2   KNNbasic_item  2.998977  2.254952      0.658   0.736     0.695        25.1

Observations and Insights:

The RMSE for the item-item model is 2.99
The item-item collaborative filtering model has nearly the same MAE as the tuned user-user model
The f1 score is 0.695
The popularity of recommended songs is 25

predictions using knn.KNNBasic_item for user 6ccd111af9b4baa497aacd6d1863cbf5a141acc6:

prediction for Red Dirt Road by Brooks and Dunn: r_ui = 10.00 est = 4.40
predictions for Till I collapse by Eminem and Nate Dogg: r_ui = 1.00 est = 4.44
prediction for SOTTGXB12A6701FA0B by Phoenix (other pheonix songs are 7.6): r_ui = None est = 4.72

Observations and Insights:

The model is predicting 4.4 for the heard song that had rating of 10
The model is predicting 4.44 for the heard song that had rating of 1
The model is predicting 4.72 for the unheard song
Overall, the prediction of ratings is poor compared to the tuned user-user model

Next, I ran GridSearchCV to search for the optimal hyperparameters for the item-item model. The optimal hyperparameter settings found by the search are:

2.9243151529965794
{'k': 50, 'min_k': 3, 'sim_options': {'name': 'cosine', 'user_based': False}}

Model Metrics for item-item model after rerunning with the tuned parameters

RMSE: 2.9040
MAE:  2.3017
             algorithm      rmse       mae  precision  recall  f1_score  \
0             KNNbasic  3.185239  2.575525      0.682   0.772     0.724   
1       KNNbasic_tuned  2.830942  2.254989      0.698   0.799     0.745   
2        KNNbasic_item  2.998977  2.254952      0.658   0.736     0.695   
3  KNNbasic_item_tuned  2.903997  2.301706      0.688   0.785     0.733   

predictions using knn.KNNBasic_item tuned for user 6ccd111af9b4baa497aacd6d1863cbf5a141acc6:

prediction for Red Dirt Road by Brooks and Dunn: r_ui = 10.00 est = 4.61
predictions for Till I collapse by Eminem and Nate Dogg: r_ui = 1.00 est = 4.44
prediction for SOTTGXB12A6701FA0B by Phoenix (other pheonix songs are 7.6): r_ui = None est = 4.72

Observations and Insights:

The RMSE decreased slightly after tuning
The MAE has increased slightly
The f1 score increased
The popularity is the same
Similar to the untuned item-item model, the predictions are rather poor
In general, the predicted ratings of all the songs are about the same as with the untuned model
Given that all of these songs are quite different, these predictions may not reflect the user's actual taste.

The final, weighted, recommendations from the Item-Item similarity-based recommendation system for user ‘6ccd111af9b4baa497aacd6d1863cbf5a141acc6’ are:

	title	artist_name	count_plays	predicted_interaction	corrected_ratings
0	Ni Tú Ni Nadie (Versión Demo)	Alaska Y Dinarama	31	10.000000	9.820395
1	My Perfect Cousin	The Undertones	21	9.842105	9.623887
2	Walk On Water Or Drown (Album)	Mayday Parade	24	9.684211	9.480086
3	Docking Bay 94	The Alter Boys	25	9.661287	9.461287
4	Waters Of Nazareth (album version)	Justice	29	9.210526	9.024831
5	Valentine	Justice	24	8.934211	8.730086
6	Please_ Before I Go	Derek Webb	32	8.905551	8.728775
7	Hitsville U.K.	The Clash	25	8.421053	8.221053
8	Go Places	The New Pornographers	20	8.342105	8.118498
9	Uptown	Drake / Bun B / Lil Wayne	24	8.061607	7.857483

Observations and Insights:

Interestingly, the item-item model recommends a completely different top 10 songs than the user-user model

Model Based Collaborative Filtering - Matrix Factorization

Model-based collaborative filtering is a personalized recommendation approach: the recommendations are based on the past behavior of the user and do not depend on any additional information. It uses latent features to find recommendations for each user. Here I use the Singular Value Decomposition (SVD) method from the Surprise library and calculate the metrics after running the base model with default settings:

RMSE: 2.7215
MAE:  2.1682
             algorithm      rmse       mae  precision  recall  f1_score  \
0             KNNbasic  3.185239  2.575525      0.682   0.772     0.724   
1       KNNbasic_tuned  2.830942  2.254989      0.698   0.799     0.745   
2        KNNbasic_item  2.998977  2.254952      0.658   0.736     0.695   
3  KNNbasic_item_tuned  2.903997  2.301706      0.688   0.785     0.733   
4                  SVD  2.721453  2.168153      0.696   0.798     0.744   
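To give a feel for what the latent-feature model is doing, here is a small plain-Python sketch of biased matrix factorization trained with stochastic gradient descent, where each prediction is the global mean plus a user bias, an item bias, and the dot product of user and item factor vectors. This is an illustration of the technique rather than the Surprise implementation, and the hyperparameter values are picked for the toy data, not taken from the library defaults or from this project.

```python
import math
import random


def train_mf(triples, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """SGD for r_hat = mu + b_u + b_i + p_u . q_i on (user, item, rating) triples."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in triples) / len(triples)
    users = {u for u, _, _ in triples}
    items = {i for _, i, _ in triples}
    bu = {u: 0.0 for u in users}
    bi = {i: 0.0 for i in items}
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(a * b for a, b in zip(p[u], q[i]))

    for _ in range(n_epochs):
        for u, i, r in triples:
            err = r - predict(u, i)
            # Gradient steps with L2 regularization on biases and factors.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return predict, mu


def rmse(predict, triples):
    return math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in triples) / len(triples))


toy = [("u1", "a", 10), ("u1", "b", 2), ("u2", "a", 9),
       ("u2", "c", 8), ("u3", "b", 1), ("u3", "c", 3)]
predict, mu = train_mf(toy)
baseline = rmse(lambda u, i: mu, toy)  # always predict the global mean
trained = rmse(predict, toy)
print(baseline, trained)
```

On the toy data the trained model's training RMSE falls below the global-mean baseline, which is the behavior the RMSE column in the comparison table is measuring on held-out data.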

predictions using SVD for user 6ccd111af9b4baa497aacd6d1863cbf5a141acc6:

prediction for Red Dirt Road by Brooks and Dunn: r_ui = 10.00 est = 8.95
predictions for Till I collapse by Eminem and Nate Dogg: r_ui = 1.00 est = 2.06
prediction for SOTTGXB12A6701FA0B by Phoenix (other pheonix songs are 7.6): r_ui = None est = 5.39

Observations and Insights:

The SVD model has the best RMSE and MAE of any model yet
The f1 score is higher than that of any other untuned model
The popularity of recommended songs is far higher than for the other models
The prediction for the heard song with rating 10 is 8.95
The prediction for the heard song with rating 1 is 2.06
The prediction for the unheard song is 5.39
These are the most accurate predictions yet of any of the models

Improving matrix factorization based recommendation system by tuning its hyperparameters

To tune the SVD model, first I run a factor checking function that plots the RMSE based on a range of latent features.

Here is the plot for this data run for 100 features:

[Figure: RMSE versus number of latent factors]

According to the figure, RMSE generally decreases as the number of latent factors k increases. The lowest RMSE is achieved when k is 80. However, it is worth mentioning that k = 52 and k > 84 are also good. The result suggests a range of values that can be used in GridSearchCV() for hyperparameter tuning. Next, I ran GridSearchCV to find the optimal hyperparameter settings. The hyperparameter settings for the model that reduced RMSE the most are:

2.7097656150739744
{'n_epochs': 30, 'lr_all': 0.01, 'reg_all': 0.4, 'n_factors': 80}

Model Metrics for Matrix Factorization (SVD) method after rerunning with the tuned parameters

RMSE: 2.6736
MAE:  2.1434
             algorithm      rmse       mae  precision  recall  f1_score  \
0             KNNbasic  3.185239  2.575525      0.682   0.772     0.724   
1       KNNbasic_tuned  2.830942  2.254989      0.698   0.799     0.745   
2        KNNbasic_item  2.998977  2.254952      0.658   0.736     0.695   
3  KNNbasic_item_tuned  2.903997  2.301706      0.688   0.785     0.733   
4                  SVD  2.721453  2.168153      0.696   0.798     0.744   
5            SVD_tuned  2.673590  2.143426      0.694   0.799     0.743   

predictions using SVD tuned for user 6ccd111af9b4baa497aacd6d1863cbf5a141acc6:

prediction for Red Dirt Road by Brooks and Dunn: r_ui = 10.00 est = 8.22
predictions for Till I collapse by Eminem and Nate Dogg: r_ui = 1.00 est = 4.71
prediction for SOTTGXB12A6701FA0B by Phoenix (other pheonix songs are 7.6): r_ui = None est = 5.13

Observations and Insights:

The tuned SVD model has a slightly lower RMSE and MAE than the base SVD model
The f1_score of the tuned SVD model is essentially unchanged (0.743 versus 0.744)
The popularity of the tuned model dropped significantly
The popularity of the base model may have been affected by a single highly rated song.
The prediction for the heard song with rating 10 dropped slightly, from 8.95 to 8.22
The prediction for the heard song with rating 1 rose from 2.06 to 4.71, which is further from the true rating
In general the predicted ratings are much better than those of the non-matrix-factorization models, but tuning the SVD model did not improve performance much, if at all.

Since the untuned SVD model had predictions that were the closest to the user's actual play counts, I'm going to look at the weighted recommendations for both the default and tuned SVD algorithms. The final, weighted recommendations from the default SVD matrix factorization algorithm for user ‘6ccd111af9b4baa497aacd6d1863cbf5a141acc6’ are:

	title	artist_name	count_plays	predicted_interaction	corrected_ratings
0	Catch You Baby (Steve Pitron & Max Sanna Radio...	Lonnie Gordon	616	10.000000	9.959709
1	Clara meets Slope - Hard To Say	Clara Hill	89	9.879379	9.773379
2	Make Her Say	Kid Cudi / Kanye West / Common	156	9.186673	9.106609
3	Something (Album Version)	Jaci Velasquez	95	8.532769	8.430171
4	Electric Feel	MGMT	179	8.380396	8.305653
5	He's A Pirate	Klaus Badelt	20	8.514556	8.290949
6	221	keller williams	51	8.379492	8.239464
7	Gunn Clapp	O.G.C.	86	8.313128	8.205295
8	Call It Off (Album Version)	Tegan And Sara	66	8.098702	7.975610
9	Girls	Death In Vegas	25	8.072738	7.872738

And the Tuned recommendations:

	title	artist_name	count_plays	predicted_interaction	corrected_ratings
0	False Pretense	The Red Jumpsuit Apparatus	21	7.612781	7.394563
1	Sorrow (1997 Digital Remaster)	David Bowie	77	6.734795	6.620835
2	I'd Hate To Be You When People Find Out What T...	Mayday Parade	21	6.831190	6.612972
3	Recado Falado (Metrô Da Saudade)	Alceu Valença	143	6.648431	6.564807
4	Q-Ball	Brotha Lynch Hung	33	6.705204	6.531126
5	Underground	Eminem	24	6.631778	6.427654
6	Cold Blooded (Acid Cleanse)	The fFormula	38	6.565968	6.403747
7	Walk Through Hell (featuring Max Bemis Acousti...	Say Anything	31	6.578550	6.398944
8	Night Village	Deep Forest	20	6.575026	6.351420
9	Drive	Savatage	24	6.539166	6.335042

Observations and Insights:

The songs recommended by the tuned SVD model are very different from those recommended by the untuned SVD model
In general, it seems that the tuned model is recommending less popular songs

Cluster Based Recommendation System

In clustering-based recommendation systems, we explore the similarities and differences in people’s tastes in songs based on how they rate different songs. We cluster similar users together and recommend songs to a user based on play_counts from other users in the same cluster. After running the CoClustering method with default settings, the metrics compared to the other models are:

RMSE: 2.9591
MAE:  2.2116
             algorithm      rmse       mae  precision  recall  f1_score
           KNNbasic  3.185239  2.575525      0.682   0.772     0.724
     KNNbasic_tuned  2.830942  2.254989      0.698   0.799     0.745
      KNNbasic_item  2.998977  2.254952      0.658   0.736     0.695
KNNbasic_item_tuned  2.903997  2.301706      0.688   0.785     0.733
                SVD  2.721453  2.168153      0.696   0.798     0.744
          SVD_tuned  2.673590  2.143426      0.694   0.799     0.743
       CoClustering  2.959111  2.211564      0.622   0.666     0.643

Predictions using the default CoClustering model for user 6ccd111af9b4baa497aacd6d1863cbf5a141acc6:

prediction for Red Dirt Road by Brooks and Dunn: r_ui = 10.00 est = 4.71
prediction for Till I Collapse by Eminem and Nate Dogg: r_ui = 1.00 est = 4.00
prediction for SOTTGXB12A6701FA0B by Phoenix (other Phoenix songs are 7.6): r_ui = None est = 3.17

Observations and Insights:

The clustering model has an RMSE of 2.96 and an MAE of 2.21
The F1 score is 0.643
Overall, the clustering model performs slightly worse than the other models: it has the lowest precision, recall, and F1 score in the table, although its MAE is lower than that of the KNN models
The predicted rating for the song the user rated 10 is 4.71
The predicted rating for the other heard song, which the user rated 1, is 4.00
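
The f1_score column in the metrics table is the harmonic mean of the precision and recall columns. A minimal sketch that recomputes it from the rounded table values; the helper name is illustrative, and the notebook's own precision and recall computation is not reproduced here:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the metrics table
metrics = {
    "KNNbasic": (0.682, 0.772),
    "SVD": (0.696, 0.798),
    "CoClustering": (0.622, 0.666),
}
for name, (precision, recall) in metrics.items():
    print(f"{name}: F1 = {f1_score(precision, recall):.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two inputs, CoClustering's low precision and recall both drag its F1 score below the other models.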

Next, I run GridSearchCV on the clustering-based recommendation system to tune its hyperparameters. The best cross-validated RMSE and the optimal hyperparameters suggested by the search are:

3.0094114249385258
{'n_cltr_u': 3, 'n_cltr_i': 3, 'n_epochs': 40}
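
The search that produces a best score and a parameter dictionary like the ones above can be sketched as an exhaustive loop over every parameter combination. This is a simplified, library-free illustration: the candidate values in param_grid are hypothetical (the notebook's grid is not shown in this report), and toy_score is a placeholder for a cross-validated RMSE, chosen only so that the example has a unique minimum.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def grid_search(
    param_grid: Dict[str, List[int]],
    score_fn: Callable[[Dict[str, int]], float],
) -> Tuple[float, Dict[str, int]]:
    """Evaluate every parameter combination; return (lowest score, its parameters)."""
    names = list(param_grid)
    best_score, best_params = float("inf"), {}
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_score, best_params = score, params
    return best_score, best_params


# Hypothetical candidate values, for illustration only.
param_grid = {"n_cltr_u": [3, 5, 7], "n_cltr_i": [3, 5, 7], "n_epochs": [20, 40]}


def toy_score(params: Dict[str, int]) -> float:
    """Placeholder for a cross-validated RMSE; NOT a real model evaluation."""
    return (
        abs(params["n_cltr_u"] - 3)
        + abs(params["n_cltr_i"] - 3)
        + abs(params["n_epochs"] - 40)
    )


best_score, best_params = grid_search(param_grid, toy_score)
print(best_score, best_params)
```

In a real run, score_fn would fit the co-clustering model on each training fold and return the mean RMSE across validation folds, which is the expensive part of the search.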

The model metrics after rerunning CoClustering with the tuned hyperparameters are:

RMSE: 2.9613
MAE:  2.2139
             algorithm      rmse       mae  precision  recall  f1_score
           KNNbasic  3.185239  2.575525      0.682   0.772     0.724
     KNNbasic_tuned  2.830942  2.254989      0.698   0.799     0.745
      KNNbasic_item  2.998977  2.254952      0.658   0.736     0.695
KNNbasic_item_tuned  2.903997  2.301706      0.688   0.785     0.733
                SVD  2.721453  2.168153      0.696   0.798     0.744
          SVD_tuned  2.673590  2.143426      0.694   0.799     0.743
       CoClustering  2.959111  2.211564      0.622   0.666     0.643
 CoClustering_tuned  2.961292  2.213922      0.622   0.666     0.643

Predictions using the tuned CoClustering model for user 6ccd111af9b4baa497aacd6d1863cbf5a141acc6:

prediction for Red Dirt Road by Brooks and Dunn: r_ui = 10.00 est = 4.7
prediction for Till I Collapse by Eminem and Nate Dogg: r_ui = 1.00 est = 3.99
prediction for SOTTGXB12A6701FA0B by Phoenix (other Phoenix songs are 7.6): r_ui = None est = 3.23

Observations and Insights:

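Tuning did not improve the clustering model: RMSE (2.9613 vs. 2.9591) and MAE (2.2139 vs. 2.2116) are marginally worse, and precision, recall, and F1 score are unchanged
The prediction for the heard song with rating 10 is 4.7, essentially unchanged from 4.71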

The final, weighted, recommendations from the tuned coclustering algorithm for user ‘6ccd111af9b4baa497aacd6d1863cbf5a141acc6’ are:

	title	artist_name	count_plays	predicted_interaction	corrected_ratings
0	Love Is Gone (Original Mix)	David Guetta - Joachim Garraud - Chris Willis	20	8.561523	8.337917
1	Cold Blooded (Acid Cleanse)	The fFormula	38	8.275809	8.113588
2	The World Is Mine	David Guetta	21	8.056761	7.838544
3	What Is Light? Where Is Laughter? (Album Version)	Twin Atlantic	20	7.872448	7.648841
4	Kelma	Rachid Taha	58	7.642269	7.510962
5	Love Is Not A Fight	Warren Barfield	36	7.561523	7.394857
6	I Wonder Why	Dion & The Belmonts	41	7.517873	7.361699
7	Hasta siempre	Varios	20	7.552595	7.328988
8	Numb (Album Version)	Disturbed	68	7.418666	7.297398
9	221	keller williams	51	7.411147	7.271119

Observations and Insights:

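The top 10 recommended songs are similar to those from the user-user and SVD models; for example, ‘221’ also appears in the default SVD list and ‘Cold Blooded (Acid Cleanse)’ in the tuned SVD list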

Content Based Recommendation System

So far I have only used the play_count of songs to find recommendations, but other information about songs is available, and these features can be used to increase the personalization of the recommendation system. For example, we can use the artist name and album title to recommend unheard songs from artists and albums the user already likes. We can also include the year the song was released and recommend music from the same time period. The main steps I used for the content model are described below.

Before running any of the content-based models, I preprocessed the text data by tokenizing it with the natural language processing methods that come standard in the NLTK package. First, I coded the year column into text, mapping each release year to a decade token:

	song_id	artist_name	release	year	year_text
0	SODGVGW12AC9075A8D	Justin Bieber	My Worlds	2010	twothousandtens
1	SODGVGW12AC9075A8D	Justin Bieber	My World 2.0	2010	twothousandtens
2	SOEPZQS12A8C1436C7	Deadmau5	Ghosts 'n' Stuff	2009	twothousandzeros
3	SOGDDKR12A6701E8FA	Eminem / Hailie Jade	The Eminem Show	2006	twothousandzeros
4	SOWGXOP12A6701E93A	Eminem	Without Me	2002	twothousandzeros

Then I combined all of the text features into a single column ‘text’:

	year_text	text
song_id
SODGVGW12AC9075A8D	twothousandtens	Justin Bieber My Worlds twothousandtens
SODGVGW12AC9075A8D	twothousandtens	Justin Bieber My World 2.0 twothousandtens
SOEPZQS12A8C1436C7	twothousandzeros	Deadmau5 Ghosts 'n' Stuff twothousandzeros
SOGDDKR12A6701E8FA	twothousandzeros	Eminem / Hailie Jade The Eminem Show twothousa...
SOWGXOP12A6701E93A	twothousandzeros	Eminem Without Me twothousandzeros
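
The two preprocessing steps shown in the tables above can be sketched as follows. This is a minimal illustration, not the notebook's code: the function names are hypothetical, and only the two decade tokens that appear in the tables (‘twothousandzeros’ and ‘twothousandtens’) are defined, since the naming used for other decades is not shown in this report.

```python
# Only the two decades visible in the tables; other decade names are unknown.
DECADE_WORDS = {0: "zeros", 10: "tens"}


def year_to_text(year: int) -> str:
    """Encode a release year as one decade token, e.g. 2009 -> 'twothousandzeros'."""
    century, offset = divmod(year, 100)
    decade = offset - offset % 10
    if century != 20 or decade not in DECADE_WORDS:
        raise ValueError(f"no decade token defined for year {year}")
    return "twothousand" + DECADE_WORDS[decade]


def build_text(artist_name: str, release: str, year: int) -> str:
    """Concatenate artist, album (release), and decade token into one text field."""
    return f"{artist_name} {release} {year_to_text(year)}"


print(build_text("Justin Bieber", "My Worlds", 2010))
print(build_text("Eminem", "Without Me", 2002))
```

Turning the year into a single word lets the decade be treated like any other token, so two songs from the same decade share a term even when their artist and album differ.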

I then calculated a TF-IDF matrix from the ‘text’ column and computed the cosine similarity between songs.
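
To make the TF-IDF and cosine similarity step concrete, here is a simplified, library-free sketch. It is not the notebook's implementation: it splits on whitespace instead of using NLTK tokenization, uses one common smoothed form of the idf weight, and all function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector (token -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Number of documents in which each token appears at least once
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            # term frequency times a smoothed inverse document frequency
            tok: (cnt / len(toks)) * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
            for tok, cnt in counts.items()
        })
    return vectors


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "Justin Bieber My Worlds twothousandtens",
    "Justin Bieber My World 2.0 twothousandtens",
    "Eminem Without Me twothousandzeros",
]
vecs = tfidf_vectors(docs)
print(round(cosine_similarity(vecs[0], vecs[1]), 3))  # same artist and decade
print(round(cosine_similarity(vecs[0], vecs[2]), 3))  # no shared tokens
```

Songs that share an artist, album words, or decade token get a similarity above zero, so unheard songs can be ranked by their similarity to the songs a user already plays.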

The recommendations using this method are:

	title	artist_name	count_plays	predicted_interaction	corrected_ratings
0	Cats In The Cradle	Ugly Kid Joe	30	7.789474	7.606899
1	Cold Blooded (Acid Cleanse)	The fFormula	38	7.000000	6.837779
2	Hold Me_ Thrill Me_ Kiss Me_ Kill Me	U2	33	6.770725	6.596648
3	Twilight Galaxy	Metric	28	6.696697	6.507715
4	Last Nite	The Strokes	21	6.724206	6.505988
5	The Calculation (Album Version)	Regina Spektor	44	6.636842	6.486086
6	Mansard Roof (Album)	Vampire Weekend	26	6.587426	6.391310
7	Face To Face	Daft Punk	32	6.525241	6.348465
8	Injection	Rise Against	23	6.485387	6.276872
9	Chinese	Lily Allen	14	6.092105	5.824844

Observations and Insights:

This is excellent: we are recommending songs by the same artists and from the same albums the user likes, with some added weight for the decade of the user's song preferences.
We are also ranking the recommendations based on song popularity, so songs that other users have listened to more are weighted more heavily for this user

Hybrid Recommendation System

I have explored five different recommender methods (six if you include the popularity-based method) for recommending songs to a user in a personalized way. This is all well and good, but how do we know which method is recommending songs the user will actually listen to? As an avid listener of music, I also know that a user's preferences can change, sometimes drastically, during the day or from day to day. The models presented so far sometimes recommend similar songs, while others, such as the item-item and content-based models, recommend drastically different ones. In this section, I build a hybrid recommender system that takes into account recommendations from multiple models, applies weights based on which models we want to be most represented in the results, and then provides a ‘hybrid’ recommendation.

In my hybrid system, I fit models and make predictions by combining two of the previously evaluated models:

SVD (default settings)
User-User collaborative Filtering (tuned)

I chose these two models because they had the best F1 scores, both made fairly good predictions, and they recommended markedly different songs. I then add a subset of recommendations from the content-based filtering method to increase the ‘familiarity’ of the recommended songs and artists for the user. This should provide a highly personalized recommendation with a balance of new songs and familiar artists, giving the user choices that accommodate changing preferences. I also evaluated the performance of different weight combinations using the hold-out set. For example, we can try different combinations of the weights wA (SVD) and wB (user-user), ranging from (0.1, 0.9) to (0.9, 0.1), and compute their respective RMSE and MAE scores:

Weight combination (0.1, 0.9): RMSE = 2.7976, MAE = 2.2302
Weight combination (0.3, 0.7): RMSE = 2.7495, MAE = 2.1971
Weight combination (0.5, 0.5): RMSE = 2.7186, MAE = 2.1749
Weight combination (0.7, 0.3): RMSE = 2.7057, MAE = 2.1634
Weight combination (0.9, 0.1): RMSE = 2.7110, MAE = 2.1619

Observations and Insights:

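There is not much difference in RMSE across the weight combinations (2.7057 to 2.7976)
The ‘best’ weight combination by RMSE is SVD 0.7 and user-user collaborative filtering 0.3 (RMSE = 2.7057); MAE is marginally lower at (0.9, 0.1), 2.1619 versus 2.1634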
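
The weighted combination and its evaluation can be sketched as below. This is a minimal illustration under stated assumptions: the hybrid prediction is taken to be wA times the SVD estimate plus wB times the user-user estimate, the function names are illustrative, and the numbers are toy values, not the report's hold-out data.

```python
import math
from typing import List, Sequence


def blend(pred_a: Sequence[float], pred_b: Sequence[float],
          w_a: float, w_b: float) -> List[float]:
    """Weighted combination w_a * A + w_b * B of two models' predictions."""
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy values for illustration only; they are NOT the report's hold-out data.
actual = [10.0, 1.0, 5.0, 3.0]
svd_pred = [7.0, 3.0, 5.5, 3.5]
user_user_pred = [5.0, 4.0, 4.0, 5.0]

for w_a in (0.1, 0.3, 0.5, 0.7, 0.9):
    w_b = round(1.0 - w_a, 1)
    hybrid = blend(svd_pred, user_user_pred, w_a, w_b)
    print(f"Weight combination ({w_a}, {w_b}): "
          f"RMSE = {rmse(actual, hybrid):.4f}, MAE = {mae(actual, hybrid):.4f}")
```

Keeping the weights summing to 1 keeps the blended prediction on the same scale as the individual models, so the RMSE and MAE values remain directly comparable with the single-model metrics.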

Conclusion and Recommendations

1. Comparison of various techniques and their relative performance based on chosen Metric (Measure of success):

In general, the Method 2 dataset had better RMSE and MAE than the Method 1 dataset, but the Method 1 dataset had a better F1 score, and its predictions for listened-to songs were closer to the actual play counts
The SVD and user-user based models performed the best according to RMSE, MAE, and F1 score
The item-item model recommended different songs than the SVD, clustering, and user-user models
In general, tuning the models improved their performance; however, it became clear that one of the biggest effects on model performance is the data going into the model.
The content-based recommender system, based only on artist name, album title, and year, does a decent job of recommending artists the user likes and songs from albums the user has listened to.
Many of the songs in the SVD, clustering, and collaborative filtering recommendations were not by artists in the user's top listened artist/song list; therefore, I added the content-based recommendations into the hybrid recommender system.
I built my hybrid recommendation system from SVD with default settings, because it was the best-performing model, and the tuned user-user collaborative filtering model. Combining these added more potential song diversity that might align with the user's preferences into the final recommendation.
I also added content-based recommendations into the hybrid model for ‘familiarity’
There is room for improvement in the performance of the models and final recommender system:
More features such as genre, loudness, danceability, and lyrics could be added to improve the content-based recommendations
Further improvement could be made on filtering/scaling the play_counts to achieve better predictions
A different approach could be tried, such as neural networks

2. Refined insights:

The impact of the Y variable - in this case play_count - cannot be overstated.
The distribution and sparsity of the play_counts, and how I chose to filter and transform them, had large impacts on model performance
The content-based recommender system, although perhaps the ‘simplest’ of the models besides the popularity-based recommender system, performs predictably and precisely recommends songs based on content
Matrix factorization outperformed the other models. It runs quickly, and while its performance metrics are only marginally better than those of the other collaborative or cluster-based models, its predictions were the closest to the actual play counts.
The metrics you choose to assess model performance are extremely important. Depending on the metrics you choose, your decisions about models and recommendations can change.

3. Proposal for the final solution design:

I propose that the hybrid recommender system I built be adopted. It provides a good balance between artists and albums the user is known to like, including songs of theirs the user may not have heard, and new content recommended by user-user collaborative filtering and matrix factorization.
In short, it is highly personalized!
This hybrid recommender system could be adjusted and improved further with more features for content-based recommending or alternative compositions of models and weights

Final 10 song recommendation for user 6d625c6557df84b60d90426c0116138b617b9449:

	title	artist_name	count_plays	predicted_interaction	corrected_ratings
0	Clara meets Slope - Hard To Say	Clara Hill	89	9.677092	9.571092
1	Catch You Baby (Steve Pitron & Max Sanna Radio...	Lonnie Gordon	616	8.470891	8.430600
2	Make Her Say	Kid Cudi / Kanye West / Common	156	7.772713	7.692649
3	Cats In The Cradle	Ugly Kid Joe	30	7.789474	7.606899
4	Recado Falado (Metrô Da Saudade)	Alceu Valença	143	7.642525	7.558901
5	#40	DAVE MATTHEWS BAND	72	7.661807	7.543956
6	Walk Through Hell (featuring Max Bemis Acousti...	Say Anything	31	7.678929	7.499324
7	Eternal Flame (Single Version)	Atomic Kitten	49	7.411352	7.268495
8	Cold Blooded (Acid Cleanse)	The fFormula	38	7.000000	6.837779
9	Hold Me_ Thrill Me_ Kiss Me_ Kill Me	U2	33	6.770725	6.596648
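
The final table above is consistent with reserving a few slots for the top content-based recommendations, filling the remaining slots from the blended model predictions, and sorting everything by corrected_ratings. A minimal sketch under that assumption; the function name and the slot split are inferred from the table and are not confirmed by the notebook:

```python
from typing import Dict, List, Tuple

Rec = Tuple[str, float]  # (title, corrected_rating)


def merge_recommendations(model_recs: List[Rec], content_recs: List[Rec],
                          n_total: int = 10, n_content: int = 3) -> List[Rec]:
    """Reserve n_content slots for content-based picks, fill the rest from the
    model list, then sort the combined list by rating (highest first)."""
    def by_rating(rec: Rec) -> float:
        return rec[1]

    chosen: Dict[str, float] = {}
    for title, rating in sorted(content_recs, key=by_rating, reverse=True)[:n_content]:
        chosen[title] = rating
    for title, rating in sorted(model_recs, key=by_rating, reverse=True):
        if len(chosen) >= n_total:
            break
        chosen.setdefault(title, rating)  # skip titles already selected
    return sorted(chosen.items(), key=by_rating, reverse=True)


# Ratings copied from the tables in this report; list lengths shortened for brevity.
model_recs = [
    ("Clara meets Slope - Hard To Say", 9.571092),
    ("Make Her Say", 7.692649),
]
content_recs = [
    ("Cats In The Cradle", 7.606899),
    ("Cold Blooded (Acid Cleanse)", 6.837779),
    ("Twilight Galaxy", 6.507715),
]
for title, rating in merge_recommendations(model_recs, content_recs, n_total=4, n_content=2):
    print(f"{rating:.6f}  {title}")
```

Reserving slots guarantees that familiar artists appear in the final list even when the collaborative models score unfamiliar songs higher.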

References

Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), 2011.
