My 2018 in Music

Published by Carl Colglazier.

If your social media feed is anything like mine, you probably see a lot of posts like this toward the end of the year.

Figure 1: Spotify promotional image for “Spotify Wrapped 2018”.

It can be fun to see what kind of music other people like and to share your own music tastes. It’s also a great advertising campaign for Spotify (see their nice logo in the top left of these graphics).

The only problem for me is that I’m not a Spotify user, so when I try to open my #2018Wrapped data, I am greeted with a very nicely packaged empty box. Fortunately, as I wrote about in my last post, I log all of my music streaming using a free, open-source service called ListenBrainz. I am going to use that data to create my own end-of-year music graphic similar to the ones posted by my friends who use Spotify.

The Data

I’m doing this project in R for a couple of reasons. First of all, I kind of like R. Honestly this wasn’t the case a few years ago. It has tons of great stats tools, but a lot of things are very much designed for statisticians.

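# Load the packages used throughout this post.
library("jsonlite")
library("tidyverse")
library("xml2")
library("RCurl")
library("scales")
library("purrrlyr")
# `lb` is assumed to hold the path or URL of my ListenBrainz JSON export;
# it is defined outside this snippet.
plays <- fromJSON(lb)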
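
As a minimal sketch of what this loading step produces, the snippet below parses a small inline JSON array instead of a real export. The two records and their timestamps are hypothetical; only the field names are taken from the columns used later in this post, and a real ListenBrainz export may be shaped differently.

```r
library(jsonlite)

# Hypothetical records; field names mirror the columns used in this post.
sample_json <- '[
  {"artist_name": "Kacey Musgraves", "track_name": "High Horse",
   "release_name": "Golden Hour", "timestamp": 1530403200},
  {"artist_name": "Troye Sivan", "track_name": "My My My!",
   "release_name": "Bloom", "timestamp": 1514764800}
]'

# fromJSON() simplifies an array of flat objects into a data frame,
# one row per record and one column per field.
plays_demo <- fromJSON(sample_json)
str(plays_demo)
```

Because the result is an ordinary data frame, the filtering and counting steps that follow work on it directly.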

I’m only interested in my activity from 2018, so I will filter my dataset down to only the entries with a timestamp in 2018.

stamp <- as.numeric(as.POSIXct("2018-01-01", format="%Y-%m-%d"))
recentPlays <- plays[plays$timestamp >= stamp, ]
recentPlays <- as_tibble(recentPlays[c("artist_name", "track_name", "release_name", "timestamp")])
nrow(recentPlays)
13226
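
The filter above only sets a lower bound, which is fine when the export is taken before the year ends. A sketch of a variant with both bounds, assuming the timestamps are Unix seconds and using UTC as the year boundary (the data frame and its values are made up for illustration):

```r
# Hypothetical Unix timestamps (seconds since 1970-01-01 UTC).
plays_demo <- data.frame(
  track_name = c("late 2017", "start of 2018", "mid 2018", "start of 2019"),
  timestamp  = c(1514764799, 1514764800, 1530403200, 1546300800)
)

# Half-open interval [2018-01-01, 2019-01-01) in UTC.
year_start <- as.numeric(as.POSIXct("2018-01-01", tz = "UTC"))
year_end   <- as.numeric(as.POSIXct("2019-01-01", tz = "UTC"))

in_2018 <- plays_demo[
  plays_demo$timestamp >= year_start & plays_demo$timestamp < year_end,
]
print(in_2018)
```

Fixing the time zone explicitly keeps the cutoff from shifting with the machine's local settings.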

That’s a lot of music! How was that listening distributed over time?

# Convert the Unix timestamps to dates, then count plays per week of the year.
recentPlays$date <- as.Date(as.POSIXct(recentPlays$timestamp, origin="1970-01-01"))
# Refer to the column by name inside aes() rather than through recentPlays$.
plot <- ggplot(recentPlays, aes(format(date, "%Y-%U"))) +
		geom_bar(stat = "count") +
		labs(x = "Week", title="Tracks streamed per week.") +
		theme(axis.text.x = element_text(angle = -90, hjust = 0),
				panel.border = element_blank(),
				legend.key = element_blank(),
				panel.background = element_blank(),
				plot.background = element_rect(fill = "transparent", colour = NA)
		)
# `fname` is the output path for the image; it is defined outside this snippet.
ggsave(filename=fname, plot=plot, width=7, height=4, dpi=300, bg="transparent")
fname
Figure 2: Tracks streamed per week.
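
To make the weekly grouping concrete, here is a small sketch of how the "%Y-%U" format string assigns dates to bars. The dates are picked for illustration; %U numbers weeks from the first Sunday of the year, so the days before that Sunday land in week 00.

```r
# A few illustrative dates from 2018 (January 1, 2018 was a Monday).
dates <- as.Date(c("2018-01-01", "2018-01-06", "2018-01-07",
                   "2018-01-08", "2018-12-31"))

# Year plus week-of-year, with weeks starting on Sunday.
weeks <- format(dates, "%Y-%U")
print(weeks)

# Counting the labels is what geom_bar(stat = "count") does per bar.
print(table(weeks))
```

One consequence is that the first and last bars of the year can cover fewer than seven days, so they may look lower than their neighbors.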

Top Artists

We can use this data to answer some pretty easy questions. For example, who were my top artists in 2018?

top_artists <-recentPlays %>%
		count(artist_name, sort=T)
top_artists %>% head()
artist_name              n
Charli XCX               870
Carly Rae Jepsen         427
Ariana Grande            311
Kacey Musgraves          277
Marina And The Diamonds  223
Lady Gaga                215

Critically acclaimed pop perfection yes!

Top Songs

I can also do something similar to find my top tracks for the year.

recentPlays %>%
		count(artist_name, track_name, sort=T) %>%
		head(5)
artist_name  track_name                                                 n
SOPHIE       Immaterial                                                 41
Charli XCX   No Angel                                                   40
Charli XCX   I Got It (feat. Brooke Candy, CupcakKe and Pabllo Vittar)  36
Charli XCX   Focus                                                      34
Charli XCX   Lucky                                                      33

I listen to a lot of Charli XCX, so this list doesn’t really have a lot of variety (though Charli is absolutely one of the most versatile artists in pop today). Let’s filter the results to only show one song per artist.

top_songs <- recentPlays %>%
		group_by(artist_name, track_name) %>%
		count(sort=T) %>%
		ungroup() %>%
		distinct(artist_name, .keep_all=T) %>%
		head(5)
artist_name       track_name     n
SOPHIE            Immaterial     41
Charli XCX        No Angel       40
Troye Sivan       My My My!      32
Kacey Musgraves   High Horse     31
Carly Rae Jepsen  Party For One  26
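
The "one song per artist" step relies on the rows already being sorted by play count, so that distinct() keeps each artist's most-played track. A toy sketch of that behavior, with made-up artists and tracks:

```r
library(dplyr)

# Hypothetical play log: one row per stream.
plays_demo <- tibble(
  artist_name = c("A", "A", "A", "A", "A", "B", "B", "C"),
  track_name  = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "c1")
)

top_per_artist <- plays_demo %>%
  # One row per (artist, track), most played first.
  count(artist_name, track_name, sort = TRUE) %>%
  # distinct() keeps the first row it sees for each artist,
  # which is that artist's top track because of the sort above.
  distinct(artist_name, .keep_all = TRUE)

print(top_per_artist)
```

Without the sort, distinct() would still return one row per artist, but not necessarily the most-played one.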

Top Albums

ListenBrainz also logs the release name, so it’s pretty easy to compile a list of my top albums.

topAlbums <- recentPlays %>%
		group_by(artist_name, release_name) %>%
		count(sort=T)
topAlbums %>% head()
artist_name              release_name      n
Charli XCX               Pop 2             296
Kacey Musgraves          Golden Hour       247
Carly Rae Jepsen         Emotion (Deluxe)  191
Marina And The Diamonds  Electra Heart     179
Charli XCX               Number 1 Angel    153
Ariana Grande            Dangerous Woman   144

Let’s say I just want to know which albums released in 2018 I streamed the most. To find out, I look up each album’s release date with the MusicBrainz search API.

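# Look up one album on MusicBrainz and return its release date and artist ID.
getAlbum <- function(row) {
		mburl <- sprintf(
				'https://beta.musicbrainz.org/ws/2/release/?query=artist:%s+release:%s+AND+status:official+AND+format:"Digital%%20Media"&inc=release-group&limit=1',
				curlEscape(row$artist_name),
				curlEscape(row$release_name)
		)
		print(mburl)
		# Pause between requests to stay under the rate limit.
		Sys.sleep(0.25)
		groupData <- read_xml(mburl)
		# Strip the default namespace so plain XPath expressions match.
		xml_ns_strip(groupData)
		# Only accept an exact match (search score of 100).
		release <- xml_find_first(groupData, '//release[@ns2:score=100]')
		xml_ns_strip(release)
		# If nothing matched, fall back to an empty node so the lookups return NA.
		if (inherits(release, "xml_missing")) {
				release <- xml_new_document() %>% xml_add_child("")
		}
		# Take the first date listed for the release.
		# Relative paths (".//") keep the search inside the matched release.
		date <- xml_text(xml_find_first(release, ".//date"))
		artistId <- xml_text(xml_find_first(release, ".//artist/@id"))
		df <- data.frame(date, artistId, stringsAsFactors=FALSE)
		colnames(df) <- c("date", "artistId")
		return(df)
}
# Query only albums with more than 25 plays to limit the number of requests.
recentAlbums <- topAlbums %>% filter(n > 25) %>% by_row(..f=getAlbum, .to=".out") %>% unnest()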
recentAlbums %>%
		filter(str_detect(date, "2018")) %>%
		select(artist_name, release_name, n, date) %>%
		filter(n > 75)
artist_name                release_name                        n    date
Kacey Musgraves            Golden Hour                         247  2018-03-30
Clarence Clarity           THINK: PEACE                        119  2018-10-04
SOPHIE                     OIL OF EVERY PEARL’S UN-INSIDES     119  2018-06-15
Amnesia Scanner            Another Life                        118  2018-09-07
Troye Sivan                Bloom                               118  2018-05-02
IDLES                      Joy as an Act of Resistance.        103  2018-08-31
Ariana Grande              Sweetener                           98   2018-08-17
A.A.L (Against All Logic)  2012 - 2017                         90   2018-02-17
Let’s Eat Grandma          I’m All Ears                        87   2018-06-29
Beach House                7                                   86   2018-05-11
Mitski                     Be the Cowboy                       86   2018-08-17
Mid-Air Thief              Crumbling 무너지기                  78   2018-07-31
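
The XML handling in the lookup can be tried offline. The sketch below parses a hand-written response instead of calling the API; the XML is a simplified, hypothetical stand-in for a MusicBrainz search result (the id is a placeholder), so treat the element layout as an assumption.

```r
library(xml2)

# Simplified, hypothetical stand-in for a release search response.
sample_xml <- '<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
  <release-list count="1">
    <release id="example-release-id">
      <title>Golden Hour</title>
      <date>2018-03-30</date>
    </release>
  </release-list>
</metadata>'

doc <- read_xml(sample_xml)
# Remove the default namespace in place so "release" matches without a prefix.
xml_ns_strip(doc)

release <- xml_find_first(doc, ".//release")
# "./date" is relative to the matched release, not the whole document.
release_date <- xml_text(xml_find_first(release, "./date"))
print(release_date)

# A failed lookup yields an xml_missing node, and xml_text() turns it into NA.
missing_node <- xml_find_first(doc, ".//no-such-element")
missing_date <- xml_text(missing_node)
print(inherits(missing_node, "xml_missing"))
print(missing_date)
```

The NA result is what lets albums without a match drop out of the later `str_detect(date, "2018")` filter instead of stopping the run.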

Minutes streamed

To estimate how many minutes I streamed, I need the length of each track, which my play data does not include. Initially I considered a brute-force approach; however, looking up the length of every single song does not seem like a good use of resources. Instead I’ll write a function to grab lengths for songs…

# Look up one track on MusicBrainz and return its length in milliseconds.
getLengths <- function(row) {
		# Drop parentheticals such as "(feat. ...)" to improve search matches.
		song_stripped <- trimws(sub("\\(.*\\)", "", row$track_name))
		mburl <- sprintf(
				'https://beta.musicbrainz.org/ws/2/recording/?query=artist:%s+AND+recording:%s&limit=2',
				curlEscape(row$artist_name),
				curlEscape(song_stripped)
		)
		# To comply with the rate limit.
		Sys.sleep(0.5)
		albumData <- read_xml(mburl)
		xml_ns_strip(albumData)
		# Use the first length reported; NA if the search found none.
		len <- xml_integer(xml_find_first(albumData, "//length"))
		return(len)
}

…and sample 250 of my streams.

set.seed(425368203)
len_sample <- recentPlays %>% sample_n(250) %>% by_row(..f=getLengths, .to="length") %>% unnest()

This gives me a reasonable mean length in milliseconds (about four minutes).

mean_len <- len_sample %>% dplyr::summarize(Mean=mean(length, na.rm=T))
Mean
240542.148760331

I can use this mean to estimate the total across all of my streams.

# Scale the sample mean up to every play, converting milliseconds to minutes.
mins <- nrow(recentPlays) * mean_len$Mean / 60000
x
50698.9453704167
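
A sample-based total comes with sampling error, which a rough interval can make visible. The sketch below uses simulated track lengths rather than my real data; the population size, mean, and spread are made-up numbers, and the normal-approximation interval is an addition of this sketch, not something computed above.

```r
set.seed(1)

# Simulated population of track lengths in milliseconds (hypothetical values).
population <- rnorm(13000, mean = 240000, sd = 60000)
N <- length(population)

# Draw a sample of 250 tracks, as in the post.
n <- 250
sampled <- sample(population, n)

# Point estimate: population size times sample mean, converted to minutes.
est_minutes <- N * mean(sampled) / 60000

# Standard error of the total, and an approximate 95% interval.
se_minutes <- N * sd(sampled) / sqrt(n) / 60000
ci <- est_minutes + c(-1.96, 1.96) * se_minutes

# The true total is known here only because the population is simulated.
true_minutes <- sum(population) / 60000
print(c(estimate = est_minutes, lower = ci[1], upper = ci[2], truth = true_minutes))
```

In the real data, lookups that return NA shrink the effective sample, so `n` should be the number of non-missing lengths.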

Top Genre

Observation: the top quartile of artists make up the vast majority of my streams this year.

top_artist_ids <- recentAlbums %>%
		group_by(artistId) %>%
		filter(!is.na(artistId)) %>%
		summarize(Sum=sum(n)) %>%
		arrange(desc(Sum))
top_artist_ids %>%
		summarize(sum(Sum))
sum(Sum)
6985

Conclusion: This is a good time to use a sample again.

fetchGenres <- function(row) {
		mburl <- sprintf(
				"https://beta.musicbrainz.org/ws/2/artist/%s?inc=genres",
				row$artistId
		)
		print(mburl)
		Sys.sleep(0.25)
		groupData <- read_xml(mburl)
		xml_ns_strip(groupData)
		genres <- xml_text(xml_find_all(groupData, "//genre/name"))
		return(genres)
}
top_artist_ids <- top_artist_ids %>%
		by_row(..f=fetchGenres, .to="Genres") %>%
		unnest()
topGenres <- top_artist_ids %>%
		group_by(Genres) %>%
		summarize(Sum=sum(Sum)) %>%
		arrange(desc(Sum))
topGenres %>% head()
Genres      Sum
pop         2535
electropop  1958
dance-pop   1712
electronic  1411
pop rock    1145
synth-pop   741
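
The genre totals come from expanding each artist's list of genres into one row per genre and then summing play counts. A toy sketch of that reshaping, with made-up artists and counts, assuming tidyr 1.0 or later for the `cols` argument of unnest():

```r
library(dplyr)
library(tidyr)

# Hypothetical artists with a list-column of genres.
artists <- tibble(
  artist = c("A", "B", "C"),
  plays  = c(100, 50, 10),
  genres = list(c("pop", "electropop"), c("pop"), character(0))
)

genre_totals <- artists %>%
  # One row per (artist, genre); artists with no genres drop out.
  unnest(cols = genres) %>%
  group_by(genres) %>%
  summarize(plays = sum(plays)) %>%
  arrange(desc(plays))

print(genre_totals)
```

An artist's plays are counted once for every genre they carry, so the genre totals can add up to more than the number of streams.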

Creating the graphic

library("ggpubr")
library("png")
library("raster")

# Table theme: white text on the green background, no cell borders.
myTheme <- ttheme(
		colnames.style = colnames_style(color = "white", fill = "#8cc257", linewidth = 0),
		tbody.style = tbody_style(color = "white", fill = "#8cc257", linewidth = 0)
)

# Shared background so every panel blends into one graphic.
bgTheme <- theme(
		plot.background = element_rect(fill = "#8cc257", color = "#8cc257"),
		panel.border = element_blank()
)

top_artist_names <- top_artists$artist_name %>%
		head()
artistTable <- ggtexttable(top_artist_names, rows = NULL,
		theme = myTheme, cols = c("Top Artists")) + bgTheme
trackTable <- ggtexttable(top_songs$track_name, rows = NULL,
		theme = myTheme, cols = c("Top Songs")) + bgTheme
# Text panel with the estimated minutes and the top genre.
minutes <- as_ggplot(text_grob(
		paste("Minutes Listened",
				toString(round(mins)),
				"",
				"Top Genre",
				toString(topGenres[1,1]),
				sep = "\n"),
		color = "white")) + bgTheme
# Album art panel: keep the top-left 250x250 pixels and the RGB channels.
img <- readPNG("images/albums.png")
im_A <- ggplot() +
		background_image(img[1:250, 1:250, 1:3]) +
		theme(
				plot.margin = margin(t = .5, l = .5, r = .5, b = .5, unit = "cm")
		) + bgTheme
# Arrange the four panels in a 2x2 grid and save the result.
p <- ggarrange(im_A, artistTable, minutes, trackTable, ncol = 2, nrow = 2)
ggsave(filename=fname, plot=p, width=4.5, height=4.5, dpi=300)
fname

I'm Carl Colglazier.

Rooted in a dual education in computer science and communication, I make meaningful information accessible with new media, social computing, and computational social science. I also post on YouTube and GitHub.
