Model

We have two baselines:

Random baseline: for n total guesses, randomly select n songs from the set as predictions.
Mode baseline: Select the top n most popular songs in the dataset based on number of appearances in playlists where n is the number of guesses we allow the model to make. For example, when n=1, we guess the most popular song, “HUMBLE.” by Kendrick Lamar.

To improve on these baselines, we first attempted to use word embedding techniques to find the relationships between songs. Word embeddings map words to vectors to find relationships between different words (and therefore determine their contextual significance). We attempted to do something similar by treating each song as a word and each playlist as a sentence to create a Doc2Vec model. However, we found that the data was too sparse to make for an effective model. For the language problem, the number of words is on a magnitude of hundreds of thousand; the Oxford English Dictionary, for instance, has 170,000 words. At least in the general sense of the problem, there are a nearly infinite number of sentences out there to train off. With the playlist problem there are over 2 million songs but only 1 million playlists from which to train. This meant there was simply not enough data per song to find relationships and generate effective predictions.

We then turned to collaborative filtering. Collaborative filtering uses a playlist by song matrix: every row represents a playlist and every column represents a song. Matrix[i,j] is then defined as 1 if playlist i contains song j and 0 otherwise. Given this formulation, we can then think about the similarity between two songs as a function of cosine similarity between their respective vectors. If two songs are present in the same playlists, their column vectors will be very similar, and thus the cosine similarity between them will be high. To actually generate a new song for a playlist, we compared every other song in the dataset to every song in the playlist, and chose the song that was closest on average to each song in the playlist, using cosine similarity as our distance metric. This method functions independent of order, which suits our focus on general, batched recommendations instead of automatic playlist generation that flows continuously from song to song. We trained our model on 300,000 playlists (~⅓ the whole playlist set) and tested on 10,000 playlists. Training generally took 15 minutes and predictions took 30-45 minutes per set of 10,000, with some variance based on the number of guesses allowed and the number of input songs.

We also played with some prior assumptions about the usefulness of number of followers in weighting a model. During data exploration, we theorized that the most popular playlists have important insights into listener preference and therefore should actually be given more weight in future models. We attempted to leverage this insight by adding duplicate playlists for popular playlists. We took a separate training sample of 100,000 playlists and duplicated a playlist for every follower it had. This increased the “size” of that training set to 240,000 playlists (the average playlist has 2.4 followers). As we discuss in the results section, however, this did not increase accuracy and in fact decreased it, likely due to overfitting.