A peep at the current state of jazz music through data from Spotify (part 2)

George Tsang
Mar 17, 2022

Focusing on Grammy-winning and nominated tracks, albums and artists

Abstract

In part 1, genre-specific search with Spotipy returned data of poor quality, so the approach was changed for part 2. The new approach uses Grammy-winning or nominated artists, albums and tracks listed on Wikipedia. Both the quality and the quantity of the data improved.

Link to part 1:
https://tsanggeorge.medium.com/a-peep-at-the-current-state-of-jazz-music-through-data-from-spotify-part-1-75400069f2dc

In part 2, the popularity of artists, albums and tracks continues to be the main subject. It was examined in relation to award category, album release date, gender and a number of other factors.

With 25 wins & 71 nominations, Chick Corea is perhaps the most decorated jazz artist in Grammy’s history. Photo credit: Chick Corea Productions.

Methodology

As in part 1, Spotify data were retrieved from Spotify Web API endpoints using the Spotipy Python library. However, I changed the approach to target Grammy-winning and nominated tracks, albums and artists, which is much more specific than the data used in part 1. To do that, the pandas.read_html function was used to scrape tabular data from the Wikipedia pages of the Grammy Awards for:

  1. Best Improvised Jazz Solo;
  2. Best Jazz Vocal Album;
  3. Best Jazz Instrumental Album;
  4. Best Latin Jazz Album; and
  5. Best Large Jazz Ensemble Album.

These are currently all of the Grammy Awards in the jazz field. Although most of the above awards have existed for decades, the data scraped were limited to 2000 (42nd Annual Grammy Awards) and later to align with the aim of the study, i.e., to describe the CURRENT state of jazz. All data were collected in mid to late February 2022.
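A minimal sketch of the scraping and year filter, not the exact code used for this article. pandas.read_html returns every table on a page as a list of DataFrames; the column name "Year", the cell format such as "2000[I]" and the helper names are assumptions for illustration.

```python
import re


def scrape_award_tables(url):
    """Return every HTML table on a page as a list of pandas DataFrames."""
    # Imported lazily so the helpers below work without pandas installed.
    # read_html also needs an HTML parser such as lxml.
    import pandas as pd

    return pd.read_html(url)


def parse_year(cell):
    """Extract a four-digit year from a cell such as '2000' or '2000[I]'."""
    match = re.search(r"\b(19|20)\d{2}\b", str(cell))
    return int(match.group(0)) if match else None


def keep_since(rows, first_year=2000, year_key="Year"):
    """Keep rows (dicts) whose year is first_year or later."""
    kept = []
    for row in rows:
        year = parse_year(row.get(year_key))
        if year is not None and year >= first_year:
            kept.append(row)
    return kept


# Hypothetical usage (needs network access; which table holds the nominees
# has to be checked by hand for each page):
# tables = scrape_award_tables(
#     "https://en.wikipedia.org/wiki/Grammy_Award_for_Best_Jazz_Vocal_Album")
# rows = keep_since(tables[1].to_dict("records"))
```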

A mix of manual and automated string operations was employed to clean the Wikipedia data using pandas and the Python standard library. The strings were essentially the names of tracks, albums and artists. The only numeric data were the years awarded or nominated. The information was then used to search for the URIs of matching tracks, albums and artists using the sp.search method of Spotipy. Various other methods were used to retrieve the necessary catalogue information, popularity and number of followers. Returned results were compared to the data from Wikipedia to make sure most of the results were correct. The criteria used in the comparison were:

  1. There must be at least one matching word in names of tracks, albums and artists between search results and Wikipedia data. For example, “Etta Jones” would match “Etta James” for “Etta”.
  2. The album release year has to be 1 or 2 years earlier than the year the annual award ceremony was held.

Although it is easy for a record to have one matching word in the track name, the chance of a false match is much slimmer when album name, artist name and release year are added to the criteria. The names were also converted to lowercase and stripped of punctuation and “stop words”, e.g., “and”, “the”, “big”, “band”, “trio”, etc. This improved accuracy considerably.
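The two criteria and the name clean-up can be sketched as below. This is an illustration rather than the original code: the stop-word list only contains the words named above, the function names are hypothetical, and I assume that each of track, album and artist name must share at least one word.

```python
import string

# Illustrative stop-word list; only the examples named in the text.
STOP_WORDS = {"and", "the", "big", "band", "trio"}

# Replace ASCII punctuation with spaces so hyphenated names still split.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(name):
    """Lowercase a name, strip punctuation, drop stop words; return a set of words."""
    cleaned = name.lower().translate(_PUNCT_TO_SPACE)
    return {word for word in cleaned.split() if word not in STOP_WORDS}


def shares_word(a, b):
    """Criterion 1: at least one word in common after normalization."""
    return bool(normalize(a) & normalize(b))


def release_year_ok(release_year, ceremony_year):
    """Criterion 2: released 1 or 2 years before the award ceremony."""
    return ceremony_year - release_year in (1, 2)


def is_probable_match(wiki, spotify):
    """Compare two dicts with 'album', 'artist', 'year' and optionally 'track'."""
    names_ok = all(
        shares_word(wiki[key], spotify[key])
        for key in ("track", "album", "artist")
        if key in wiki and key in spotify
    )
    return names_ok and release_year_ok(spotify["year"], wiki["year"])
```

As the “Etta Jones” example shows, a single shared word is a loose test on its own; it is the combination with the other fields and the release year that filters out most false matches.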

Returned results that did not fulfill the above criteria were marked as potential mismatches and saved for manual search. In the end, I was able to match 79 out of 115 Grammy-winning or nominated tracks in the Best Improvised Jazz Solo category, and 411 out of 454 Grammy-winning or nominated albums in the other categories. Evidently, not all tracks, albums and artists were available on Spotify. However, the data yielded by this method were of much higher quality and greater quantity than those from the method used in part 1.
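The search-and-triage loop might look roughly like this for albums. It is a sketch under assumptions: `sp` is any object that behaves like a spotipy.Spotify client (only its search method is used), the helper names are hypothetical, and the match rule is passed in as a function rather than fixed.

```python
def build_query(track=None, album=None, artist=None):
    """Build a Spotify search string from the track:/album:/artist: field filters."""
    parts = []
    if track:
        parts.append(f"track:{track}")
    if album:
        parts.append(f"album:{album}")
    if artist:
        parts.append(f"artist:{artist}")
    return " ".join(parts)


def search_albums(sp, album, artist, limit=5):
    """Return candidate albums from the search response as simplified dicts."""
    results = sp.search(q=build_query(album=album, artist=artist),
                        type="album", limit=limit)
    candidates = []
    for item in results["albums"]["items"]:
        candidates.append({
            "uri": item["uri"],
            "album": item["name"],
            "artist": item["artists"][0]["name"],    # first billed artist only
            "year": int(item["release_date"][:4]),   # date may be YYYY or YYYY-MM-DD
        })
    return candidates


def triage(wiki_rows, sp, is_match):
    """Split rows into (matched, needs_manual_search) using a caller-supplied rule."""
    matched, manual = [], []
    for row in wiki_rows:
        hits = [c for c in search_albums(sp, row["album"], row["artist"])
                if is_match(row, c)]
        if hits:
            matched.append((row, hits[0]))
        else:
            manual.append(row)  # potential mismatch: look it up by hand
    return matched, manual
```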

It should be noted that some artists are missing because I only allowed one (main) artist per album, while some albums bill multiple artists equally. Although awards and nominations were given to all of them, I did not include all of them in the dataset. This problem could be handled better if I took the time to create a database with separate tables for tracks, albums and artists.

The chart below shows the total no. of tracks, albums and artists in the dataset by category of Grammy Awards.

An overview of the dataset

There are far fewer tracks from the Best Improvised Jazz Solo category than from the other categories because individual tracks, not whole albums, are nominated in this category. The no. of albums and artists by category is fairly balanced. The percentages in the artists’ donut chart add up to more than 100% because some artists competed in multiple categories.

The differences between the % of artists and the % of albums for Best Jazz Vocal Album (green), Best Latin Jazz Album (red) and Best Large Jazz Ensemble Album (orange) are worth noting. For Best Jazz Vocal Album, fewer artists compete in this category, with the % of artists 5 percentage points lower than the % of albums. For the other 2 categories, noticeably more artists compete, with the % of artists 6 and 7 percentage points higher than their respective % of albums. This does not indicate more or less fierce competition. In fact, albums in the latter 2 categories are less popular (as you will see in the section below). I found that many of these albums were not available on Spotify when I searched for them manually.

Some genres are less popular

On Spotify, popularity ranges from 0 to 100. For the artists, albums and tracks in the dataset, popularity ranges from 0 to about 70, with medians of around 35, 15 and 5 respectively. Looking at individual categories, it is quite obvious that the popularity of artists, albums and tracks competing in the Large Jazz Ensemble and Latin Jazz categories is markedly lower than in the other 3 categories, i.e., they appear to come from a different distribution.

Maria Schneider (frontmost), winner of multiple Grammys, composer and big band bandleader. Photo credit: Jeff Riedel for The New York Times.

On a side note, all of Maria Schneider’s albums are unavailable on Spotify. There are only 6 tracks under her name. This is probably due to her stance on musicians’ rights and copyright, specifically on the “freemium” model.

Bebo and Chucho Valdés, father and son, both Latin jazz artists and Grammy winners. Photo credit: Fernando Alvarado / EPA.

To demonstrate that the popularities of the categories differ statistically, statistical tests are necessary. The Kruskal-Wallis test, Dunn’s test and a couple of others were chosen and run in RStudio. The following paragraphs explain why these tests were chosen.

Test for normality

The most classic test to determine whether the means of 2 datasets are the same is perhaps Student’s t-test (and ANOVA for comparing more than 2 datasets). One of its most crucial assumptions is that the data are normally distributed.

To check for normality, I used the Shapiro-Wilk test. Its null hypothesis is that the data are normally distributed. Alpha is set at 0.05. A p-value smaller than alpha rejects the null hypothesis. The test was run in R.

The data in the csv files are arranged so that the popularity values of each category occupy 1 column.

Results show that only the track popularities of the Best Improvised Jazz Solo, Best Jazz Vocal Album and Best Jazz Instrumental Album categories have p-values larger than 0.05, which means the null hypothesis of normality cannot be rejected for them. It does not make sense to use different tests for different groups because I want comparable results between groups. Therefore, I need a test that is applicable to all groups and does not assume a particular distribution.

Test for equal variance

Another assumption of Student’s t-test and ANOVA is equal variance. Levene’s test of equality of variance can be used to test this assumption. Since the distribution of the data departs significantly from the normal distribution, and there are lots of outliers in track popularities, I decided to go with a variation of Levene’s test, i.e., the Brown-Forsythe test, which uses the median instead of the mean. This means it calculates the deviation of observations from the group median.

A p-value smaller than alpha, i.e., 0.05, rejects the null hypothesis of equal variance between groups of data. The test was run in R.

tidyr pivots the 5 columns into 2, namely the category column and the popularity column.
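The tests in this article were run in R. Purely to illustrate the mechanics, here is a standard-library Python sketch of the same two steps: pivoting one-column-per-category data into (category, popularity) pairs, and computing the Brown-Forsythe statistic as a one-way ANOVA F on absolute deviations from each group's median. The function names are hypothetical and the p-value, which comes from an F distribution with (k - 1, N - k) degrees of freedom, is not computed here.

```python
from statistics import mean, median


def wide_to_long(columns):
    """Pivot {category: [values]} into a list of (category, value) pairs."""
    return [(cat, v) for cat, values in columns.items() for v in values]


def brown_forsythe_statistic(groups):
    """F statistic of a one-way ANOVA on |x - group median|.

    groups: {category: [popularity, ...]}; assumes the deviations are not
    all identical within every group (otherwise the denominator is zero).
    """
    deviations = {
        cat: [abs(v - median(values)) for v in values]
        for cat, values in groups.items()
    }
    k = len(deviations)
    n_total = sum(len(d) for d in deviations.values())
    grand_mean = mean(z for d in deviations.values() for z in d)
    # Spread of the group means of the deviations around the grand mean.
    between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in deviations.values())
    # Spread of the deviations around their own group mean.
    within = sum((z - mean(d)) ** 2 for d in deviations.values() for z in d)
    return (between / (k - 1)) / (within / (n_total - k))
```

In Python, scipy.stats.levene with center="median" performs the same test and also returns the p-value.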

Results show that only the p-value for the groups of artists is larger than 0.05, which means the null hypothesis of equal variance between groups cannot be rejected for artists.

Other consideration: sample size

Some sources point out that Student’s t-test and ANOVA tolerate a certain degree of violation of the equal-variance assumption, provided that the samples of the different groups are of similar size. However, the Best Improvised Jazz Solo category only makes up 2% of all tracks, while the other categories each contribute 20% to 30% of all tracks. Sample sizes are definitely not similar.

Although Welch’s t-test and Welch’s ANOVA are variations designed for groups with unequal variance and should tolerate some violation of normality, the distributions of many groups deviate far more seriously than a mild departure from the normal distribution. They range from exponential to multi-modal. Even though some resemble the normal distribution, they are so skewed that most of them did not pass the Shapiro-Wilk test. Hence, a test that does not assume equal variance, similar sample sizes or any particular distribution is needed. Enter the Kruskal-Wallis test.

Kruskal-Wallis test — a non-parametric test by ranks

The Kruskal-Wallis test works on the ranks of the popularities of tracks/albums/artists rather than on the popularities themselves. For example, a dataset of 4 artists with popularities of 0, 10, 15 and 30 would be ranked 1, 2, 3 and 4 respectively. Statistics are then computed from these ranks.
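To make the ranking step concrete, here is a small standard-library Python sketch (the article's own test was run in R, so this is an illustration, not the original code). It ranks the pooled values, giving tied values their average rank, and computes the tie-corrected H statistic; the function names are hypothetical. The p-value would come from a chi-square distribution with k - 1 degrees of freedom and is not computed here.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def kruskal_wallis_h(groups):
    """Tie-corrected H statistic for {category: [values]}.

    Assumes the pooled values are not all identical.
    """
    labels = [cat for cat, vals in groups.items() for _ in vals]
    pooled = [v for vals in groups.values() for v in vals]
    ranks = average_ranks(pooled)
    n = len(pooled)

    # Sum of ranks per category.
    rank_sums = {}
    for cat, r in zip(labels, ranks):
        rank_sums[cat] = rank_sums.get(cat, 0.0) + r

    h = 12 / (n * (n + 1)) * sum(
        s ** 2 / len(groups[cat]) for cat, s in rank_sums.items()
    ) - 3 * (n + 1)

    # Correction for ties, which are common with integer popularity scores.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


print(average_ranks([0, 10, 15, 30]))  # [1.0, 2.0, 3.0, 4.0]
```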

A p-value smaller than alpha, i.e., 0.05, rejects the null hypothesis that all groups come from the same distribution. The test was run in R.

Unsurprisingly, the p-values for groups of tracks, albums and artists are all much smaller than 0.05. As a result, the null hypothesis is rejected. It can be concluded that at least one group comes from a different distribution. However, the Kruskal-Wallis test does not tell which group(s) differ(s) from which. A post hoc analysis is necessary to answer that question.

Post hoc analysis: Dunn’s test

Dunn’s test is a common follow-up to the Kruskal-Wallis test. It tests groups pairwise, so it is possible to determine which one differs from which. A p-value smaller than 0.05 rejects the null hypothesis that both groups come from the same distribution. The test was run in R.
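As an illustration of the pairwise comparison (again a standard-library Python sketch with hypothetical names, not the R code behind the results), Dunn's z statistic compares the mean ranks of two groups using ranks computed over all groups pooled together. The p-values below are unadjusted; whether and how they were adjusted for multiple comparisons in the original analysis is not stated, so a correction such as Bonferroni would be an additional assumption.

```python
import math
from collections import Counter
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def dunn_test(groups):
    """Return {(cat_a, cat_b): (z, unadjusted two-sided p)} for every pair."""
    labels = [cat for cat, vals in groups.items() for _ in vals]
    pooled = [v for vals in groups.values() for v in vals]
    ranks = average_ranks(pooled)  # ranks over ALL groups, not per pair
    n = len(pooled)

    mean_rank = {
        cat: sum(r for lab, r in zip(labels, ranks) if lab == cat) / len(vals)
        for cat, vals in groups.items()
    }

    # Variance of the ranks, reduced by a correction term for ties.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values()) / (12 * (n - 1))
    variance = n * (n + 1) / 12 - tie_term

    results = {}
    for a, b in combinations(groups, 2):
        se = math.sqrt(variance * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
        results[(a, b)] = (z, p)
    return results
```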

As expected, the p-values for all pairs involving tracks, albums and artists in the Large Jazz Ensemble or Latin Jazz categories are smaller than 0.05, except for the Large Jazz Ensemble : Latin Jazz pair. The p-values for pairs among the other 3 categories are also larger than 0.05. In other words, the data are consistent with tracks, albums and artists in the Large Jazz Ensemble and Latin Jazz categories coming from one distribution, and those of the other 3 categories from another.

Newer tracks are not necessarily more popular

Note: this chart is not tracking popularities of the same tracks over the years.

There is no clear trend or pattern among tracks released in different years, although there are significant differences in distribution between certain years. Whether a track is old or new appears to have no bearing on its popularity.

Winners are more popular

I tried comparing popularities between winners and nominees too. It makes sense to see winners being more popular than nominees in general, although not by much except for artists.

Note that the sample sizes are very unbalanced (usually 5 nominees per category per year, only 1 winner among them), and some winners are included among the nominees too because they competed in multiple years and did not always win. If artists were instead divided into those with at least one win and those with no win at all, the medians could be farther apart.

Some tracks are included in both the winner and the nominee groups because a track could win Best Improvised Jazz Solo and be nominated in another category as part of an album, or vice versa.

Female jazz vocalists are the most popular

A few things to note here:

  1. Among the 258 artists, only 40 (~16%) have a popularity ≥ 50.
  2. Among the artists with popularity ≥ 50, over 40% compete in the Best Jazz Vocal Album category, which means they are either vocalists or double as vocalists. This share is notable, given that jazz nowadays is predominantly instrumental.
  3. The most popular artists are vocalists. In fact, jazz vocalists are more popular than instrumentalists on average.
  4. Among the artists with popularity ≥ 50, 16 are jazz vocalists, of whom 11 are female, more than double the number of male vocalists.
  5. Among the artists with popularity ≥ 50, 13 are female, of which only 2 are instrumentalists.
  6. No. of followers increases exponentially as popularity increases.

Overall, among the popular artists, vocalists are the most popular and make up a large share. The genders are unbalanced, both within individual categories and overall.

Gender and living/deceased

Here, we continue to dig into the popular artists. A few things to note:

  1. Among the 40 popular artists, only 11 are deceased. Of course, this dataset only contains artists who have competed for a Grammy since 2000.
  2. While deceased female artists on average have more followers than the living female artists, the opposite happens to the male artists (although the medians are very close).

Living popular artists are quite old

I went a step further to look into the age of the living popular artists. Below are some simple statistics:

  1. Age 62 on average. Average age of female artists is about 54. Average age of male artists is about 65.
  2. The youngest* male and female artists are Jon Batiste and Cécile McLorin Salvant, aged 35 and 32 respectively.
  3. The oldest male and female artists are Sonny Rollins and Patti Austin, aged 91 and 71 respectively.

I took a quick look at the winners and nominees of the Grammy Award for Best Pop Vocal Album and found, unsurprisingly, that artists in this category are in general much younger. The youngest and oldest winners/nominees are perhaps Billie Eilish (age 20) and Paul McCartney (age 79).

*The youngest jazz artist nominated in recent years is perhaps Joey Alexander. He was born on June 25, 2003 (now 18). His album My Favorite Things was nominated for Best Jazz Instrumental Album at the 58th Annual Grammy Awards (2016). The album was released on May 12, 2015, when Alexander was just under 12. However, the album was not available on Spotify and is therefore not in the dataset.

Conclusion

Among the 5 categories of Grammy Awards, Best Large Jazz Ensemble Album and Best Latin Jazz Album have significantly lower artist, album and track popularities than the other categories. The data agree with my own observation that Large Jazz Ensemble (e.g. big band, orchestral jazz) and Latin Jazz are the less popular forms of jazz nowadays.

In fact, a large jazz ensemble is difficult to maintain. According to the 63rd Grammy Awards Rules and Guidelines (2020) by the Recording Academy, an ensemble generally must have 9 or more members to be eligible to compete in this category. Getting 9 jazz musicians to play together is undoubtedly more difficult than putting together a trio or quartet, financially, logistically and technically. Generally, a large ensemble requires more effort in arrangement and rehearsal in order to sound good. In contrast, smaller formats like the trio, quartet and quintet can be much more spontaneous in nature.

While recency does not seem to have a relationship with the popularity of jazz tracks, jazz vocalists do tend to be more popular than instrumentalists, especially female vocalists. This conclusion is drawn only from artists with popularity of 50 or above. However, I do believe it holds true for the less popular artists too because there are simply more female than male jazz vocalists. Back in Hong Kong, I also observed more female than male jazz vocalists in the local music scene. Besides, winners also tend to be more popular than nominees, which is to be expected: winners should receive more recognition and fame than nominees.

In terms of age, popular jazz artists are quite old in general. I would not say jazz is old men’s music, but the data do suggest that being old in jazz is not a problem. Indeed, many revered veterans like Herbie Hancock, Wayne Shorter and Sonny Rollins are still relevant and will continue to be. Roy Haynes, now 97, is still gigging. They are old but GOLD.

In terms of followers, I do not see a significant lack thereof for living artists compared to deceased artists. This is opposite to the conclusion I made in part 1 of this study. It is true that the deceased female artists have more followers than living female artists on average, but this is only so because of the notable exception of Etta James. Living artists have the rest of their lives to catch up.

Epilogue

Since focusing on Grammy-winning or nominated artists, albums and tracks gives good data quality and quantity, this approach will also be applied to collection of data of Pop, Rock, R&B, Rap and Dance/Electronic for comparison in future.

It is worth noting that there is a Grammy Award for Best Traditional Pop Vocal Album. I only discovered it when I started writing this article. IMHO, many albums competing in this category are legitimately jazz. Quite a few tracks from albums that competed in this category are included in the playlist I created for the part 1 article. In this category, you find wildly popular names, past and present, e.g., Tony Bennett, Frank Sinatra, Michael Bublé, and even Lady Gaga, Norah Jones and John Legend.

According to the 63rd Grammy Awards Rules and Guidelines (2020) by Recording Academy:

This category is for performances of a type and style of song that cannot properly be intermingled with present forms of pop music. This includes older forms of traditional pop such as the Great American Songbook, created by the Broadway, Hollywood and Tin Pan Alley songwriters of the period between the Twenties and the end of World War II, as well as cabaret/ musical theater style songs and previous forms of contemporary pop. This would also include contemporary pop songs performed in traditional pop style — the term “traditional” being a reference, equally, to the style of the composition, vocal styling and the instrumental arrangement, without regard to the age of the material. (p. 42)

The above guideline actually points towards a very specific form of music. Depending on the elements involved, songs in this category are not necessarily jazz but always make reference to the jazz of that particular, older era. The very same song could compete in a jazz category if arranged differently. If you look closely, you will find some artists competing in both this and the jazz categories, e.g., Natalie Cole and Bill Charlap. I do not know why it is separated from Best Jazz Vocal Album, but I think the Recording Academy wants to preserve the tradition, as the name of the award suggests (which is great).

Reference

Recording Academy, 2020. 63rd Grammy Awards Rules and Guidelines
https://www.recordingacademy.com/sites/com/files/rulesandguidelines_2020_linked_update3.pdf

George Tsang

Studied biology and data analytics, self-taught in music and photography. Looking for the algorithm that connects the dots in my life.