Songs Analysis w/ Spotify
Decoding the beats: a data-driven approach with Spotify.
During my journey to learn more about machine learning and
artificial intelligence, I was drawn to exploring broader
trends across music history to understand what makes a song
popular.
In this project, I set out to answer two guiding questions:
What patterns emerge across musical features over time
and genre?
Can we predict a song's popularity or recommend similar
songs based on these features?
I ended up building a Streamlit-powered music recommender
system that compares songs using audio features from Spotify
data. Users can select a favorite song to get smart,
feature-based recommendations.
Dataset Overview & Analysis
The dataset I used was sourced from Kaggle, originally
collected using Spotify's Web API. It includes 15,150 hit
songs by 3,083 artists spanning over a century. Each song has
audio features that are added by Spotify.
Here are some insights I found from using the following methods:
Data Prep & Exploration
- Pop dominates in count and popularity, followed by Rock and Rap
- Popular tracks tend to be louder, more danceable, less acoustic
- Feature correlations with popularity are weak and nonlinear
Predictive Modeling
- Ran Regression (Random Forest, Ridge, Lasso) → Best R² ≈ 0.22
- Ran Classification (Random Forest, SVM, XGBoost) → Best F1 ≈ 0.39
- Models perform best on very popular songs and are less accurate on mid-tier ones
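To put those scores in context, here is a minimal sketch of how R² and F1 are defined. It is not the project's code, which most likely relied on a library's built-in metrics. The function names are my own, and since the write-up does not say which F1 averaging was used, the sketch shows the single-class version.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    1.0 is a perfect fit; 0.0 is no better than always predicting the mean.
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Read this way, an R² of about 0.22 means the audio features explain roughly a fifth of the variance in popularity, which fits the weak correlations noted above.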
Clustering & PCA
- Applied PCA to reduce dimensionality and visualize song space
- K-Means Clustering (k=4) revealed hidden acoustic groupings like:
- Mainstream pop: Loud, upbeat, high-valence
- Modern rap/EDM: High speechiness, very popular
- Acoustic/folk/jazz: Soft, niche, less popular
- Rock/metal: Loud, fast, mixed success
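The clustering step can be sketched roughly as follows. This is a standard-library-only illustration of K-Means (Lloyd's algorithm) on standardized features, not the project's actual code, which presumably used a library implementation. The feature columns and toy values are made up, the helper names are hypothetical, and PCA is left out to keep the sketch short.

```python
import random


def standardize(rows):
    """Rescale every feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iters=100, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up rows of [loudness (dB), acousticness] for six imaginary tracks.
toy_features = [
    [-5.0, 0.10], [-4.5, 0.15], [-6.0, 0.05],     # loud, not acoustic
    [-15.0, 0.90], [-14.0, 0.85], [-16.0, 0.95],  # quiet, acoustic
]
# k=2 only because the toy data is tiny; the analysis above used k=4.
labels, centroids = kmeans(standardize(toy_features), k=2)
print(labels)
```

Standardizing first matters because loudness is measured in decibels while most other audio features sit between 0 and 1, so an unscaled loudness column would dominate the distances.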
Visualizing Data Clusters with Hexbin Plots
Overall, I found that no single audio feature can predict a
song's popularity — trends are contextual, not absolute. But
when features are combined, they reveal meaningful patterns.
Reducing Dimensions to Reveal Patterns
Audio profiles are effective for clustering and
recommendations, especially when analyzed across decades.
The sound of success evolves over time.
Building an Interactive Music Recommendation App
After exploring clustering and recommendation techniques, I
developed a Streamlit app that lets users select a song and
discover others with similar audio features. The app uses
K-Means clustering to narrow the search and cosine
similarity to find the closest matches. Users can compare
songs through radar charts and tables.
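The two-step lookup described above (filter by cluster, then rank by cosine similarity) can be sketched like this. It is a simplified, standard-library-only illustration rather than the app's real code: the `recommend` function, the song titles, the cluster labels, and the three-feature vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector
    return dot / (norm_a * norm_b)


def recommend(seed_title, songs, top_n=3):
    """Rank songs from the seed's cluster by similarity to the seed.

    songs maps title -> (cluster_label, feature_vector).
    """
    seed_cluster, seed_vec = songs[seed_title]
    candidates = [
        (cosine_similarity(seed_vec, vec), title)
        for title, (cluster, vec) in songs.items()
        # The cluster label narrows the search before any scoring happens.
        if title != seed_title and cluster == seed_cluster
    ]
    candidates.sort(reverse=True)  # highest similarity first
    return [title for _, title in candidates[:top_n]]


# Made-up catalogue: [danceability, energy, acousticness] per song.
songs = {
    "Song A": (0, [0.90, 0.80, 0.10]),
    "Song B": (0, [0.85, 0.75, 0.15]),
    "Song C": (0, [0.20, 0.30, 0.90]),
    "Song D": (1, [0.90, 0.80, 0.10]),
}
print(recommend("Song A", songs, top_n=2))  # ['Song B', 'Song C']
```

Note that "Song D" has the same features as the seed but never shows up because it sits in a different cluster. That is the trade-off of filtering first: the search is faster, but near-identical songs on the other side of a cluster boundary are missed.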
One cool example: Black Sabbath was matched with The Parting Glass — not because they are the same genre, but because their sound features are alike.
What I learned
This project was definitely a challenge for me. I realized
just how vast the world of machine learning is and how much
the quality of your data impacts everything.
Still, I learned a lot along the way, and it has made me more excited to dig into more advanced machine learning methods in future projects.