This small project is a joint work done with:

About

The goal of this project was to study whether we could build a network of songs and find reasonable communities in it using a community detection algorithm of our choosing (Walktrap).

This study was done as an assignment for the Information Retrieval (IR) course at the Facultat d’Informatica de Barcelona, Universitat Politecnica de Catalunya (FIB - UPC). The course is part of the Master in Innovation and Research in Informatics at that university.

Approach

Each node represents a song, and we use the lyrics to decide whether two songs are connected.

Edges (adjacencies) are created using the cosine similarity between the tf-idf (term frequency/inverse document frequency) vectors of the lyrics, in conjunction with a threshold that determines whether an adjacency is created or not.
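As an illustration of the edge-creation step, here is a minimal sketch in plain Python. It is not the code from the repo: the tf-idf variant (raw term count times log(N/df)), the whitespace tokenizer, the `>=` comparison, the threshold value and the toy lyrics are all assumptions made for the example, and the function names `tfidf_vectors`, `cosine` and `build_edges` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: raw term count * log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_edges(docs, threshold):
    """Connect songs i and j when their similarity reaches the threshold."""
    vectors = tfidf_vectors(docs)
    return [
        (i, j)
        for i, j in combinations(range(len(docs)), 2)
        if cosine(vectors[i], vectors[j]) >= threshold
    ]


# Toy lyrics, made up for this example.
lyrics = [
    "love me tender love me true",
    "all you need is love love love",
    "highway to hell on the highway",
    "born to run down the highway",
]
edges = build_edges(lyrics, threshold=0.1)
print(edges)
```

The resulting edge list is the kind of input a community detection algorithm such as Walktrap would then work on. Raising the threshold yields a sparser graph, and lowering it a denser one.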

tf-idf treats each document as a bag of words. This means that we do not take into account the context, grammar or topic of the terms: all that matters is whether a term appears and how many times it does.
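A short sketch of what the bag-of-words assumption implies in practice: two texts with the same words in a different order get the same representation. The helper `bag_of_words` and the sample phrases are hypothetical, and splitting on whitespace is an assumed simplification.

```python
from collections import Counter


def bag_of_words(text):
    """Count how many times each term appears, ignoring word order."""
    return Counter(text.lower().split())


a = bag_of_words("you love me not")
b = bag_of_words("not me you love")

# Same terms and same counts, so both phrases are indistinguishable.
same = a == b
print(same)
```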

Because we used this method, the results were inconclusive.

Date

Study report

You can find the report of the project here.

Code

You can find all the code in this repo.