So, just as a quick recap: I started with 9,000 audio files, converted them into 9,000 spectrograms, split them up into 185,000 smaller spectrograms and trained a convolutional neural network on these images. I then extracted a feature vector for each of these 185,000 images and calculated the average vector for each of the 9,000 original audio files.
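The averaging step at the end of that recap can be sketched roughly as follows. This is a minimal illustration in NumPy, not the code actually used here: the function name `average_by_song`, the `song_ids` array mapping each spectrogram slice back to its source file, and the toy two-dimensional vectors are all assumptions made for the example. It also assumes every song has at least one slice.

```python
import numpy as np

def average_by_song(chunk_vectors, song_ids, n_songs):
    """Average the slice-level feature vectors belonging to each song.

    chunk_vectors: (n_chunks, n_features) array, one row per spectrogram slice.
    song_ids: (n_chunks,) integer array; song_ids[i] is the song that slice i came from.
    n_songs: total number of songs (hypothetical parameter for this sketch).
    """
    chunk_vectors = np.asarray(chunk_vectors, dtype=float)
    song_ids = np.asarray(song_ids)
    sums = np.zeros((n_songs, chunk_vectors.shape[1]))
    # Unbuffered add: rows that share a song id accumulate into the same output row.
    np.add.at(sums, song_ids, chunk_vectors)
    # Number of slices per song (assumed to be at least 1 for every song).
    counts = np.bincount(song_ids, minlength=n_songs)
    return sums / counts[:, None]

# Toy data: three slices, the first two from song 0 and the last from song 1.
toy_vectors = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
toy_song_ids = [0, 0, 1]
song_vectors = average_by_song(toy_vectors, toy_song_ids, n_songs=2)
print(song_vectors)  # one averaged row per song
```

With the real data the input would have 185,000 rows and the output 9,000 rows, one per original audio file.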
At this point I had extracted 128 features from each music file, each identifying a different characteristic of the music. To recommend songs that shared similar characteristics, all I needed to do was find the vectors that were most similar to one another. To do that, I calculated the cosine similarity between every pair of the 9,000 vectors.
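For reference, the cosine similarity of two vectors is their dot product divided by the product of their lengths, so it measures the angle between them rather than their magnitude. A minimal sketch of the pairwise calculation and a top-k lookup might look like the following. The function names `cosine_similarity_matrix` and `most_similar`, the small epsilon guard against zero-length vectors, and the toy data are assumptions for illustration, not the original implementation.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    """Return the (n, n) matrix of cosine similarities between all rows."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Scale each row to unit length; the clip avoids dividing by zero.
    unit = vectors / np.clip(norms, 1e-12, None)
    # For unit vectors, the dot product equals the cosine of the angle.
    return unit @ unit.T

def most_similar(similarity, song_index, k=5):
    """Return the indices of the k songs most similar to song_index."""
    scores = similarity[song_index].copy()
    scores[song_index] = -np.inf  # never recommend the song to itself
    return np.argsort(scores)[::-1][:k]

# Toy data: songs 0 and 1 point in nearly the same direction, song 2 does not.
toy_song_vectors = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0]]
similarity = cosine_similarity_matrix(toy_song_vectors)
print(most_similar(similarity, song_index=0, k=2))
```

Normalising the rows once and taking a single matrix product keeps the full comparison cheap: for 9,000 vectors of 128 features the result is a 9,000 × 9,000 matrix, which is about 650 MB in 64-bit floats, so casting to `float32` is worth considering if memory is tight.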