Spotify | Akanksha

Songs Analysis w/ Spotify

Industry

Machine Learning

Course

Usable Artificial Intelligence

My Role

Data Scientist

Timeline

Jan 2025 - May 2025

Decoding the beats: a data-driven approach with Spotify.

During my journey to learn more about machine learning and artificial intelligence, I was drawn to exploring broader trends across music history to understand what makes a song popular.

In this project, I set out to answer two guiding questions: What patterns emerge across musical features over time and genre? Can we predict a song's popularity or recommend similar songs based on these features?

I ended up building a Streamlit-powered music recommender system that compares songs using audio features from Spotify data. Users can select a favorite song to get smart, feature-based recommendations.

Dataset Overview & Analysis

The dataset I used was sourced from Kaggle and was originally collected using Spotify's Web API. It includes 15,150 hit songs by 3,083 artists spanning over a century. Each song comes with audio features computed by Spotify, such as loudness, danceability, acousticness, speechiness, and valence.

Here are the main insights, grouped by the method that produced them:

Data Prep & Exploration

Pop dominates in count and popularity, followed by Rock and Rap
Popular tracks tend to be louder, more danceable, less acoustic
Feature correlations with popularity are weak and nonlinear

Predictive Modeling

Ran Regression (Random Forest, Ridge, Lasso) → Best R² ≈ 0.22
Ran Classification (Random Forest, SVM, XGBoost) → Best F1 ≈ 0.39
Models performed best on very popular songs and were less accurate on mid-tier ones
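
A minimal sketch of how a regression comparison like the one above can be set up with scikit-learn. This is not the project's actual code: the data here is synthetic (random features and a made-up noisy popularity score), and the hyperparameters (`alpha`, `n_estimators`, 5 folds) are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Spotify table (NOT the real data):
# 500 "songs", 6 audio-style features, and a noisy popularity score.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 50 + 5 * X[:, 0] - 3 * X[:, 1] ** 2 + rng.normal(scale=10, size=500)

# The linear models get standardized inputs; the forest does not need them.
models = {
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated R² for each model.
r2_scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}

for name, score in r2_scores.items():
    print(f"{name}: R² = {score:.2f}")
```

Cross-validated scores are used here rather than a single train/test split so that one lucky split does not flatter a model. A low R², like the ≈ 0.22 reported above, means most of the variation in popularity is not explained by the audio features.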

Clustering & PCA

Applied PCA to reduce dimensionality and visualize song space
K-Means Clustering (k=4) revealed hidden acoustic groupings like:
- Mainstream pop: Loud, upbeat, high-valence
- Modern rap/EDM: High speechiness, very popular
- Acoustic/folk/jazz: Soft, niche, less popular
- Rock/metal: Loud, fast, mixed success
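
The clustering and PCA steps above can be sketched roughly as follows. This is an illustrative sketch, not the project's code: the feature matrix is random placeholder data, and clustering on the standardized features (rather than on the PCA components) is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder data (NOT the real dataset): 400 songs x 8 audio features.
rng = np.random.default_rng(42)
features = rng.normal(size=(400, 8))

# Standardize so features on large scales (e.g. tempo, loudness)
# do not dominate the distance calculations.
scaled = StandardScaler().fit_transform(features)

# Group songs into k=4 clusters, matching the k used in the write-up.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = kmeans.fit_predict(scaled)

# Project to 2 dimensions purely for plotting the "song space".
pca = PCA(n_components=2)
coords = pca.fit_transform(scaled)

print(coords.shape)                    # one 2-D point per song
print(np.bincount(labels))             # number of songs in each cluster
print(pca.explained_variance_ratio_)   # variance kept by each component
```

Plotting `coords` colored by `labels` gives the kind of 2-D view where groupings become visible. The cluster names (mainstream pop, rock/metal, and so on) come from inspecting each cluster's average features afterwards; K-Means itself only returns numbered groups.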

Visualizing Data Clusters with Hexbin Plots

Overall, I found that no single audio feature predicts a song's popularity on its own; trends are contextual, not absolute. Combined, though, the features reveal meaningful patterns.

Reducing Dimensions to Reveal Patterns

Audio profiles are effective for clustering and recommendations, especially when analyzed across decades. The sound of success evolves over time.

Building an Interactive Music Recommendation App

After exploring clustering and recommendation techniques, I developed a Streamlit app that lets users select a song and discover others with similar audio features. The app uses K-Means clustering to narrow the search and cosine similarity to find the closest matches. Users can compare songs through radar charts and tables.
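
The two-step lookup described above (narrow by cluster, then rank by cosine similarity) can be sketched like this. It is a simplified illustration with NumPy only: the function name `recommend`, the toy feature vectors, and the hand-assigned cluster labels are all hypothetical, and the real app's code may differ.

```python
import numpy as np


def recommend(features, labels, query_idx, top_n=3):
    """Return indices of the top_n songs most similar to the query song.

    Candidates are limited to the query's cluster, then ranked by
    cosine similarity of their feature vectors.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)

    # Step 1: keep only songs in the same cluster, excluding the query itself.
    candidates = np.flatnonzero(labels == labels[query_idx])
    candidates = candidates[candidates != query_idx]
    if candidates.size == 0:
        return []

    # Step 2: cosine similarity = dot product divided by the vector norms.
    query = features[query_idx]
    cand = features[candidates]
    denom = np.linalg.norm(cand, axis=1) * np.linalg.norm(query)
    sims = cand @ query / np.where(denom == 0, 1.0, denom)

    # Step 3: highest similarity first.
    order = np.argsort(-sims)[:top_n]
    return candidates[order].tolist()


# Toy example: 5 songs with 3 made-up features each and made-up clusters.
features = [
    [0.9, 0.8, 0.1],  # song 0
    [0.8, 0.9, 0.2],  # song 1
    [0.1, 0.2, 0.9],  # song 2
    [0.9, 0.1, 0.1],  # song 3
    [0.2, 0.1, 0.8],  # song 4
]
labels = [0, 0, 1, 0, 1]

result = recommend(features, labels, query_idx=0, top_n=2)
print(result)
```

Restricting the search to one cluster keeps the comparison cheap and keeps results within the same broad sound profile. Because the ranking looks only at audio features, songs from very different genres can still end up as close matches.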

One cool example: Black Sabbath was matched with The Parting Glass, not because they share a genre, but because their audio features are alike.

What I learned

This project was definitely a challenge for me. I realized just how vast the world of machine learning is and how much the quality of your data impacts everything.

Still, I learned a lot along the way, and it's made me excited to dig deeper into more advanced machine learning methods in future projects.