Adele’s lyrics analysis in R

What stands behind “Hello, it’s me”?

18 min read · Apr 1, 2021

Adele is one of the greatest singers the music industry has ever had. Adele Laurie Blue Adkins is an English singer-songwriter who has won 15 Grammys (out of 18 nominations)! Her music is famous for its uniqueness, while her lyrics are rich and deep. As a fan of Adele and music analytics in general, I couldn’t pass up the opportunity to analyze the singer’s lyrics with the help of text analysis tools in R.

This analysis focuses on the three albums Adele has released so far (let’s hope that the fourth one is coming soon). Each album is named after the singer’s age during its production.

19 is the debut studio album. Adele wrote most of the album’s material on her own, but did work with a select few writers and producers including Jim Abbiss, Eg White and Sacha Skarbek. Their collaborations created a blue-eyed soul album with lyrics describing heartbreak, nostalgia and relationships.
21 is the second studio album. 21 shares the folk and Motown soul influences of her 2008 debut album 19, but was further inspired by the American country and Southern blues music to which she had been exposed during the North American leg of her 2008–09 tour An Evening with Adele. Composed in the aftermath of the singer’s separation from her then partner, the album typifies the near dormant tradition of the confessional singer-songwriter in its exploration of heartbreak, introspection, and forgiveness.
25 is the third studio album. Issued nearly five years after her previous album, the internationally successful 21 (2011), the album is titled as a reflection of her life and frame of mind at 25 years old and is termed a “make-up record”. Its lyrical content features themes of Adele “yearning for her old self, her nostalgia”, and “melancholia about the passage of time” according to an interview with the singer by Rolling Stone.

Summary

After loading the necessary packages, I load and clean the data from Adele’s three albums. Then I proceed with text mining of the songs and albums and their various facets. Next, I employ three different approaches to analyze the sentiment in Adele’s songs. Finally, I construct the network of Adele’s songs and highlight some key insights.

Loading packages

First things first, we need to load the corresponding packages.

library(dplyr) #God of all packages 
library(tidyverse) #Goddess of all packages 
library(genius) #lyrics
library(tidytext) #text tidying
library(extrafont) #personalized fonts for ggplot output
library(scales) #percentage scales in ggplot
library(widyr) #correlations between songs
library(ggraph) #network maps
library(igraph) #network analysis
library(knitr) #output formatting
library(kableExtra) #output formatting
library(wordcloud2) #word cloud visualization 
library(formattable) #output formatting
library(textdata) #working with lyrics 
library(ggrepel) #extra geoms for ggplot 
library(yarrr) #pirate plot

Downloading data

We can get the lyrics of Adele’s albums with the help of the genius package. Here, I’m using the genius_album function to download Adele’s albums one by one. All we need to pass to the function is the name of the artist and the album, and it fetches the lyrics. The output is a tibble with one row per lyric line and information on the track title, track number and line number, while the album’s name has to be added separately.

a_25 <- genius_album(artist = "Adele", album = "25") %>%
  mutate(album = "25")

a_21 <- genius_album(artist = "Adele", album = "21") %>%
  mutate(album = "21")

a_19 <- genius_album(artist = "Adele", album = "19") %>%
  mutate(album = "19")

Additionally, album “19” includes live versions of songs (starting from Track 13), so they need to be filtered out to avoid having the same lyrics twice.

a_19 <- a_19 %>%
  filter(!track_n %in% 13:22)

After downloading everything, I put the albums’ lyrics together in a tibble called adele_data.

adele_data <- rbind(a_25, a_21, a_19)

Finally, I save the dataset, as downloading the data again might be time-consuming.

save(adele_data, file = "adele_data.Rdata")
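
In a later session the saved file can be read back instead of calling genius_album again. Here is a minimal sketch of that round trip. It assumes a hypothetical toy tibble and a temporary file rather than the real adele_data.Rdata, so the names below are for illustration only.

```r
library(dplyr)

# Hypothetical stand-in for adele_data (not the real lyrics)
toy_data <- tibble(track_n = 1:2, lyric = c("first line", "second line"))

# Save to a temporary file, then remove the object from the session
tmp_file <- tempfile(fileext = ".Rdata")
save(toy_data, file = tmp_file)
rm(toy_data)

# load() restores the object under its original name
load(tmp_file)
nrow(toy_data)
```

Note that load() brings the object back under the name it had when it was saved, so there is no assignment on that line.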

Data preparation

Let’s view the final dataset.

str(adele_data)

As of now we have 927 rows of lyrics from Adele’s songs with 5 columns, namely track number, line, lyric itself, track title and the album. We can also check the summary of the tibble.

summary(adele_data)

In the summary we can observe some NA’s that need to be deleted.

adele_data <- na.omit(adele_data)

So now we have a tibble with more than 900 lines of lyrics from Adele’s songs. But the tidy text format requires one token per row, with a token meaning, in our case, a word. We can convert the songs to this format via the unnest_tokens function from tidytext, which takes our data frame and splits every line into words.

adele_tok <- adele_data%>%
  #word is the new column, lyric the column to retrieve the information from
  unnest_tokens(word, lyric)
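
To see what unnest_tokens does on a small scale, here is a sketch with two made-up lines. The toy_lyrics tibble and its values are assumptions for illustration, not rows from the downloaded data.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-line "song" with the same columns we care about
toy_lyrics <- tibble(
  track_title = c("Song A", "Song A"),
  line        = 1:2,
  lyric       = c("Hello, it's me", "Hello from the other side")
)

# One row per word; the other columns are repeated for every token
toy_tokens <- toy_lyrics %>%
  unnest_tokens(word, lyric)

toy_tokens
```

By default the function lowercases the text and strips punctuation, which is why “Hello,” ends up as “hello”.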

Our new dataset has 6,474 observations, with each row representing a separate word from lyrics.

adele_tok %>%
  mutate(album = color_tile("lightblue","lightblue")(album)) %>%
  mutate(word = color_tile("lightgreen", "lightgreen")(word)) %>%
  kable("html", escape = FALSE, align = "c", caption = "Lyrics in Adele's Songs") %>%
  kable_styling(bootstrap_options = 
                  c("striped", "condensed", "bordered"), 
                  full_width = FALSE)

In this output we can already see a lot of pronouns, prepositions, etc., which carry little meaning on their own. These are the so-called “stop words”, which can be easily removed with the stop_words dictionary from the tidytext package. Moreover, we can also remove words with fewer than three characters (often used for phonetic effect in music) and some manually selected undesirable words (e.g. “yeah”).

adele_tidy <- adele_tok %>%
  #manually selected undesirable words
  filter(!word %in% c("lea", "aye", "yeah", "da", "after", "where", "were",
                      "such", "from", "ooh", "set", "lay", "would've", "gonna")) %>%
  #drop words with fewer than three characters
  filter(nchar(word) >= 3) %>%
  #drop standard stop words
  anti_join(stop_words, by = "word")
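
The three cleaning steps are easier to follow on a handful of words. The sketch below uses a hypothetical toy_words tibble and a one-word custom list, both chosen purely for illustration.

```r
library(dplyr)
library(tidytext)

# Hypothetical tokens: two stop words, one short filler, one custom word,
# and two words we want to keep
toy_words <- tibble(word = c("the", "and", "oh", "yeah", "heart", "rain"))

custom_drop <- c("yeah")

toy_clean <- toy_words %>%
  filter(!word %in% custom_drop) %>%     # manual list
  filter(nchar(word) >= 3) %>%           # short words such as "oh"
  anti_join(stop_words, by = "word")     # stop words such as "the", "and"

toy_clean
```

Only “heart” and “rain” survive, which is the kind of vocabulary we want to count later on.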

Text mining

Now that the songs are in tidy format, we can analyze them. The purpose of text mining is to discover relevant insights from Adele’s lyrics at the level of word frequency, density and lexical diversity.

Firstly, we can check the word frequency (including stop words and unnecessary words) by songs.

adele_data$album <- as.factor(adele_data$album)

word_count <- adele_data %>%
  unnest_tokens(word, lyric) %>%
  group_by(track_title,album) %>%
  summarise(num_words = n()) %>%
  arrange(desc(num_words))

And here we look at songs with the highest word count.

word_count[1:10,] %>%
  #drop the grouping left over from summarise() before formatting
  ungroup() %>%
  mutate(num_words = color_bar("skyblue")(num_words)) %>%
  kable("html", escape = FALSE, align = "c", caption = "Songs with the Highest Word Count") %>%
  kable_styling(bootstrap_options = 
                  c("striped", "condensed", "bordered"), 
                  full_width = TRUE)

Clearly the song “He Won’t Go” by Adele has the highest number of words: 453 words. According to genius.com, the song “was mainly inspired by two friends of Adele’s that she had met after her first record, 19, was released. The male was a heroin-addict and had been making his journey to rehab. The couple’s bond had helped the male overcome his addiction, something that really touched Adele, who says that she is very proud of both of them, as the male has been “clean” for what is now over a year.”

And we can also visualize the distribution of words by albums.

word_count %>%
  ggplot() +
    geom_density(aes(x = num_words, fill = album), alpha = 0.5, position = 'stack') +
    ylab("Song Density") + 
    xlab("Word Count per Song") +
    ggtitle("Word Count Distribution") +
   theme_minimal()+
  theme(plot.title = element_text(hjust = 0.5,vjust = 1, face = "bold"), legend.position="bottom")

As we can observe, the highest word count belongs to the album “19”, with the following albums having fewer words.

Let’s do a simple count-call to see which are the most frequent words based on the “clean” dataset.

adele_tidy %>%
  #count(sort = TRUE) already orders the words by descending frequency
  count(word, sort = TRUE) %>%
  top_n(10, n) %>%
  mutate(n = color_tile("#FFE4B5", "#FF4500")(n)) %>%
  kable("html", escape = FALSE, align = "c", caption = "The Most Frequent Words in Adele's Songs") %>%
  kable_styling(bootstrap_options = 
                  c("striped", "condensed", "bordered"), 
                  full_width = TRUE)

The clear leader in Adele’s songs is the word “love” (mentioned 118 times!), which is no wonder given the character of Adele’s songs in general. The next most common words are “rumour” (63 times), “time” (33 times), “heart” (27 times), and “highs” and “lows” (16 times each). Exciting, isn’t it?

We can also plot the most frequent words.

adele_tidy %>%
  count(word, sort = TRUE) %>%
  #filtering to get only the information we want on the plot
  filter(n > 13) %>%
  ggplot(aes(x = reorder(word, n), y = n, fill = -n)) +
  #Use `fill = -n` to make the larger bars darker
  geom_col() +
  geom_text(aes(label = word), 
            hjust = 1.2, vjust = 0.3, color = "white", 
            size = 5) +
  labs(y = "Number of times mentioned", 
       x = NULL,
       title = "Most Frequent Words in Adele's Lyrics") +
  coord_flip() +
  ylim(c(0, 130)) + 
  theme_minimal() + 
  theme(plot.title = element_text(hjust = 0.5, vjust = 1, face = "bold"),
        axis.text.y = element_blank(), legend.position = "none")

Furthermore, we can create a word cloud with the most frequent words in Adele’s songs.

adele_words_counts <- adele_tidy %>%
  count(word, sort = TRUE)

wordcloud2(adele_words_counts[1:50, ], size = .5)

Based on the most frequent words, we can already state that Adele’s songs are melancholic and centered around the topic of love. But what about the distribution of the most common words across different albums?

words_albums <- adele_tidy %>%
  group_by(album) %>%
  count(word, album, sort = TRUE) %>%
  slice(seq_len(10)) %>%
  ungroup() %>%
  arrange(album, n) %>%
  mutate(row = row_number())

words_albums %>%
  ggplot(aes(row, n, fill = album)) +
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = "Word count") +
  ggtitle("Words Across Albums") + 
  theme_minimal() +   
  facet_wrap(~album, scales = "free") +
  scale_x_continuous(  # This handles replacement of row 
    breaks = words_albums$row, # notice need to reuse data frame
    labels = words_albums$word) +
  coord_flip() +
  theme(plot.title = element_text(hjust = 0.5, vjust = 1, face = "bold"))

An interesting fact: the word “love” was very prominent in the last two albums by Adele, namely “21” and “25”, whereas in the debut album “19” it’s not even in the top-10.

Last but not least, we can check the lexical diversity per album or, simply speaking, the vocabulary of each album. For this purpose we can use a pirate plot from the yarrr package. This plot is an advanced way of plotting a continuous dependent variable, such as the word count, as a function of a categorical independent variable, like album. Again, in this case we’re using the “untidy” dataset with all stop words and short words.

word_summary <- adele_tok %>%
  group_by(album, track_title) %>%
  mutate(word_count = n_distinct(word)) %>%
  select(track_title, word_count) %>%
  distinct() %>% #To obtain one record per song
  ungroup()

pirateplot(formula = word_count ~ album, 
             data = word_summary, #Data frame
   xlab = "Album", ylab = "Song Distinct Word Count", #Axis labels
   main = "Lexical Diversity Per Album", #Plot title
   pal = "google", #Color scheme
   point.o = .6, #Points
   avg.line.o = 1, #Turn on the Average/Mean line
   theme = 0, #Theme
   point.pch = 16, #Point `pch` type
   point.cex = 1.5, #Point size
   jitter.val = .1, #Turn on jitter to see the songs better
   cex.lab = .9, cex.names = .7) #Axis label size

Every colored circle in this pirate plot represents a song. There is a slight decrease in the mean number of unique words per song in the second and third albums. Overall, however, the lexical diversity of the albums seems to be almost at the same level.
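
The measure behind the plot is simply the number of distinct words per song. A minimal sketch on hypothetical tokens (the albums, songs and words below are made up) shows how it differs from the plain word count. It uses summarise() as a more compact alternative to the mutate/select/distinct chain.

```r
library(dplyr)

# Hypothetical tokens: S1 repeats a word, S3 repeats the same word three times
toy_tokens <- tibble(
  album       = c("X", "X", "X", "X", "Y", "Y", "Y"),
  track_title = c("S1", "S1", "S1", "S2", "S3", "S3", "S3"),
  word        = c("hello", "hello", "me", "rain", "love", "love", "love")
)

toy_diversity <- toy_tokens %>%
  group_by(album, track_title) %>%
  summarise(word_count  = n_distinct(word),  # distinct words (lexical diversity)
            total_words = n(),               # all words, repeats included
            .groups = "drop")

toy_diversity
```

A song like S3 has three words in total but only one distinct word, so a repetitive chorus raises the word count without adding to the lexical diversity.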

Term Frequency-Inverse Document Frequency (TF-IDF)

So far we’ve looked at the entire dataset without analyzing the importance of certain words in lyrics. One measure of how important a word may be is its term frequency (tf), how frequently a word occurs in a document. Another approach is to look at a term’s inverse document frequency (idf), which decreases the weight for commonly used words and increases the weight for words that are not used very much in a collection of documents. This can be combined with term frequency to calculate a term’s TF-IDF (the two quantities multiplied together), the frequency of a term adjusted for how rarely it is used. For this purpose, firstly we need to calculate tf, idf and tf_idf measures for each word in each album.

tfidf_words <- adele_tidy %>%
  count(album, word, sort = TRUE) %>%
  ungroup() %>%
  bind_tf_idf(word, album, n)

tfidf_words[1:12,] %>%
  mutate(tf = color_tile("#FFA07A", "#FF0000")(tf)) %>%
  mutate(idf = color_tile("#98FB98", "#32CD32")(idf)) %>%
  mutate(tf_idf = color_tile("#E0FFFF", "#00BFFF")(tf_idf)) %>%
  kable("html", escape = FALSE, align = "c", caption = "Term Frequency-Inverse Document Frequency in Adele's Songs") %>%
  kable_styling(bootstrap_options = 
                  c("striped", "condensed", "bordered"), 
                  full_width = FALSE)
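
To make the three measures concrete, here is a small sketch with hypothetical counts for two made-up albums. bind_tf_idf computes tf as a word’s share of all words in its document and idf as the natural log of the number of documents divided by the number of documents containing the word.

```r
library(dplyr)
library(tidytext)

# Hypothetical word counts: "love" appears in both albums,
# "rain" and "fire" in one album each
toy_counts <- tibble(
  album = c("A", "A", "B", "B"),
  word  = c("love", "rain", "love", "fire"),
  n     = c(3, 1, 2, 2)
)

toy_tfidf <- toy_counts %>%
  bind_tf_idf(word, album, n)

toy_tfidf
```

In this toy case “love” gets an idf of log(2/2) = 0, and therefore a tf_idf of 0, because it occurs in every document. The same logic applies to any word that shows up in all three albums, which is worth keeping in mind when reading the TF-IDF column for very common words.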

The words “love” and “rumour” still seem to have high TF-IDF values. Now let’s select the top 10 most important words per album…

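top_tfidf_words_album <- tfidf_words %>% 
  group_by(album) %>% 
  #tfidf_words is still sorted by n, so this keeps each album's 10 most frequent words;
  #slice_max(tf_idf, n = 10) would rank by TF-IDF instead
  slice(seq_len(10)) %>%
  ungroup() %>%
  arrange(album, tf_idf) %>%
  mutate(row = row_number())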

… and we can plot them too.

top_tfidf_words_album %>%
  ggplot(aes(x = row, tf_idf, fill = album)) +
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = "TF-IDF") + 
  ggtitle("Important Words using TF-IDF by Album") +
  theme_minimal() +  
  facet_wrap(~album,
             scales = "free") +
  scale_x_continuous(  # this handles replacement of row 
    breaks = top_tfidf_words_album$row, # notice need to reuse data frame
    labels = top_tfidf_words_album$word) +
  coord_flip() +
  theme(plot.title = element_text(hjust = 0.5,vjust = 1, face = "bold"))

According to the TF-IDF approach, the most important word in the album “19” is “tired”, while for albums “21” and “25” these are “rumour” and “love”, respectively. Going from “tired” to “rumour” and finishing with some “love”…what a combination!

Sentiment analysis

Now we dig deeper into the content of Adele’s songs with R. An interesting technique to apply here is sentiment analysis, a type of text mining that identifies the overall “mood” of a text. In the case of lyrics, it can reveal the artist’s attitudes and even cultural influences. Sentiment analysis in R can be performed with the tidytext package and its different sentiment lexicons. Three general-purpose lexicons are:

AFINN from Finn Årup Nielsen;
bing from Bing Liu and collaborators;
NRC from Saif Mohammad and Peter Turney.

All three of these lexicons are based on unigrams, i.e., single words.

Bing sentiment

The bing lexicon categorizes words in a binary fashion into positive and negative categories. Let’s compare the positivity and negativity of Adele’s songs first.

adele_bing_plot <- adele_tidy %>%
  inner_join(get_sentiments("bing"))

bing_plot <- adele_bing_plot %>%
  group_by(sentiment) %>%
  summarise(word_count = n()) %>%
  ungroup() %>%
  mutate(sentiment = reorder(sentiment, word_count)) %>%
  ggplot(aes(sentiment, word_count, fill = sentiment)) +
  geom_col() +
  guides(fill = "none") +
  labs(x = NULL, y = "Word Count") +
  ggtitle("Adele Bing Sentiment") +
  coord_flip() +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, vjust = 1, face = "bold"), legend.position = "bottom")

bing_plot

Here we can see that according to the bing approach, Adele’s songs are slightly more negative than positive.

In this section I compute the net sentiment of each of Adele’s songs as the difference between the number of positive and negative words. First, I perform an inner join with the selected lexicon, which adds a column specifying whether a word has a positive or a negative sentiment. Then I count the positive and negative words in each song with the count call. Finally, I carry out some basic data wrangling to create the desired tibble. For more information on the tidytext package and its functions, please consult this tutorial, highly recommended!

adele_bing<- adele_tidy%>%
  inner_join(get_sentiments("bing"))%>% 
  count(album, track_title, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)

adele_bing[1:10,] %>%
  mutate(negative = color_tile("#FFA07A", "#FF0000")(negative)) %>%
  mutate(positive = color_tile("#98FB98", "#32CD32")(positive)) %>%
  mutate(sentiment = color_tile("#E0FFFF", "#00BFFF")(sentiment)) %>%
  kable("html", escape = FALSE, align = "c", caption = "Bing Sentiment of Adele's Songs") %>%
  kable_styling(bootstrap_options = 
                  c("striped", "condensed", "bordered"), 
                  full_width = TRUE)
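
The join-count-reshape pattern can be checked by hand on a tiny example. The sketch below assumes a hypothetical four-word lexicon and two made-up songs rather than the real bing lexicon. It also uses pivot_wider from tidyr, the newer equivalent of spread, which is a substitution on my part.

```r
library(dplyr)
library(tidyr)

# Hypothetical mini-lexicon (NOT the bing lexicon)
toy_lexicon <- tibble(
  word      = c("love", "sweet", "pain", "tears"),
  sentiment = c("positive", "positive", "negative", "negative")
)

# Hypothetical tokens for two songs
toy_song_words <- tibble(
  track_title = c("Song A", "Song A", "Song A", "Song B", "Song B"),
  word        = c("love", "pain", "tears", "love", "sweet")
)

toy_net <- toy_song_words %>%
  inner_join(toy_lexicon, by = "word") %>%            # attach sentiment labels
  count(track_title, sentiment) %>%                   # words per song and label
  pivot_wider(names_from = sentiment, values_from = n,
              values_fill = 0) %>%                    # one column per label
  mutate(sentiment = positive - negative)             # net sentiment

toy_net
```

Song A has one positive and two negative words, so its net sentiment is -1. Song B has two positive words and no negative ones, so it scores 2. The fill value of 0 matters here: without it Song B would have a missing negative count and a missing net sentiment.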

Now we can plot the sentiment of different albums and tracks based on the bing lexicon.

adele_bing%>%
  ggplot(aes(reorder(track_title, sentiment), sentiment, fill = album)) +
  geom_col(show.legend = TRUE) + 
  scale_fill_manual(values = c("#D8BFD8", "#FFD700", "#AFEEEE"))+
  labs(x = NULL,
       y = "Sentiment",
       title = "Adele's Songs Ranked by Bing Sentiment")+
  theme_minimal()+
  theme(plot.title = element_text(hjust = 0.5,vjust = 1, face = "bold"), legend.position="bottom")+
  coord_flip()

According to the plot, the most positive song is “Why Do You Love Me” from album “25”. The song indeed has a very positive tone to it. Album “21” has numerous negative songs compared with the other albums, which makes sense given that the album was written after a hard breakup and symbolises forgiveness and broken-heartedness.

NRC sentiment

The NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive).

adele_nrc <- adele_tidy %>%
  inner_join(get_sentiments("nrc"))

Now we can see whether more of Adele’s lyrics are positive or negative according to the NRC approach.

nrc_plot <- adele_nrc %>%
  group_by(sentiment) %>%
  summarise(word_count = n()) %>%
  ungroup() %>%
  mutate(sentiment = reorder(sentiment, word_count)) %>%
  #Use `fill = -word_count` to make the larger bars darker
  ggplot(aes(sentiment, word_count, fill = -word_count)) +
  geom_col() +
  geom_text(aes(label = word_count), 
            hjust = 1.3,vjust = 0.5, color = "white", 
            size = 4)+
  labs(x = NULL, y = "Word Count") +
  ggtitle("Adele NRC Sentiment") +
  coord_flip() +
  theme_minimal()+
  theme(plot.title = element_text(hjust = 0.5, vjust = 1, face = "bold"), legend.position = "bottom")

nrc_plot + guides(fill = "none")

Contrary to the bing approach, here the positive sentiment prevails over the negative one. However, one has to bear in mind that besides the positive/negative sentiments, the lexicon also contains other categories that could be read as “positive” or “negative”, e.g., “joy” or “sadness”.

Now we can proceed with the analysis of words belonging to different sentiments determined by the NRC lexicon.

plot_words <- adele_nrc %>%
  group_by(sentiment) %>%
  count(word, sort = TRUE) %>%
  arrange(desc(n)) %>%
  slice(seq_len(8)) %>% #consider top_n() from dplyr also
  ungroup()

For the visualization, a special theme can be created.

theme_lyrics <- function(aticks = element_blank(),
                         pgminor = element_blank(),
                         lt = element_blank(),
                         lp = "none")
{
  theme(plot.title = element_text(hjust = 0.5,vjust = 1, face = "bold"), #Center the title
        axis.ticks = aticks, #Set axis ticks to on or off
        panel.grid.minor = pgminor, #Turn the minor grid lines on or off
        legend.title = lt, #Turn the legend title on or off
        legend.position = lp) #Turn the legend on or off
}

And now we can visualize the words themselves.

plot_words %>%
  #Set `y = 1` to just plot one variable and use word as the label
  ggplot(aes(word, 1, label = word, fill = sentiment)) +
  #You want the words, not the points
  geom_point(color = "transparent") +
  #Make sure the labels don't overlap
  geom_label_repel(force = 1,nudge_y = .5,  
                   direction = "y",
                   box.padding = 0.04,
                   segment.color = "transparent",
                   size = 3) +
  facet_grid(~sentiment) +
  theme_lyrics() +
  theme(axis.text.y = element_blank(), axis.text.x = element_blank(),
        axis.title.x = element_text(size = 6),
        panel.grid = element_blank(), panel.background = element_blank(),
        panel.border = element_rect("lightgray", fill = NA),
        strip.text.x = element_text(size = 9)) +
  xlab(NULL) + ylab(NULL) +
  ggtitle("Adele NRC Sentiment") +
  coord_flip()

As can be seen in the plot, the allocation of words to the pre-defined sentiments mostly makes sense. Some words, for instance “finally”, “true” and “god”, appear under several sentiments at once. Funnily enough, the word “boy” is classified as “negative” and “disgust”. NRC definitely knows something!

AFINN sentiment

The AFINN lexicon assigns words with a score that runs between -5 and 5, with negative scores indicating negative sentiment and positive scores indicating positive sentiment.

adele_afinn <- adele_tidy %>%
  inner_join(get_sentiments("afinn")) %>%
  group_by(album, track_title) %>%
  summarize(mean = mean(value, na.rm=TRUE))

Now let’s visualize it.

adele_afinn%>%
  ggplot(aes(reorder(track_title, mean), mean, fill = album)) +
  geom_col(show.legend = TRUE) + 
  scale_fill_manual(values = c("#D8BFD8", "#FFD700", "#AFEEEE"))+
  labs(x = NULL,
       y = "Sentiment",
       title = "Adele's Songs Ranked by AFINN Sentiment")+
  theme_minimal()+
  theme(plot.title = element_text(hjust = 0.5,vjust = 1, face = "bold"), legend.position="bottom")+
  coord_flip()

According to the AFINN lexicon, Adele’s most positive song is “Lay Me Down” from the album “25”, while the most negative ones are “Crazy for You” and “Tired” from “19”. Here the songs from album “25” are classified as positive although the album is mostly about regret and lost time. According to Adele, “My last record was a break-up record, and if I had to label this one, I would call it a make-up record. Making up for lost time. Making up for everything I ever did and never did. 25 is about getting to know who I’ve become without realising. And I’m sorry it took so long but, you know, life happened.”

Comparison of lexicons

Finally, let’s compare these three lexicons and how they define sentiments.

afinn <- adele_tidy %>% 
  inner_join(get_sentiments("afinn")) %>% 
  group_by(index = line) %>% 
  summarise(sentiment = sum(value)) %>% 
  mutate(method = "AFINN")

bing_and_nrc <- bind_rows(
  adele_tidy %>% 
    inner_join(get_sentiments("bing")) %>%
    mutate(method = "Bing et al."),
  adele_tok %>% 
    inner_join(get_sentiments("nrc") %>% 
                 filter(sentiment %in% c("positive", 
                                         "negative"))
    ) %>%
    mutate(method = "NRC")) %>%
  count(method, index = line, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)

We now have an estimate of the net sentiment (positive minus negative) for each line index and each sentiment lexicon. Let’s bind them and visualize them.

bind_rows(afinn, 
          bing_and_nrc) %>%
  ggplot(aes(index, sentiment, fill = method)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~method, ncol = 1, scales = "free_y") + 
  labs(x = "Index",
       y = "Sentiment") + 
  theme_bw()

The three lexicons produce different absolute results, which nevertheless follow similar trajectories through Adele’s songs. We can indeed observe similar dips and peaks in sentiment at about the same places in the songs. The AFINN lexicon has the largest absolute values, with high positive values. The lexicon from Bing et al. has lower absolute values. The NRC results are shifted higher relative to the other two, labeling the text more positively, but they show similar relative changes in the text.

Network of songs

We’ve come a long way! So far we’ve downloaded the lyrics, tidied up the data, performed basic text mining and analyzed the sentiment of Adele’s songs via different approaches. Now we can check how Adele’s songs are related to each other lyrics-wise. For this we will employ network analysis, a set of integrated techniques to depict relations among actors and to analyze the social structures that emerge from the recurrence of these relations.

To begin with, we need to check the pairwise correlations of songs.

adele_cor <- adele_tidy %>%
  pairwise_cor(track_title, word, sort = TRUE)

adele_cor %>%
  #sort = TRUE above already orders the pairs by descending correlation
  top_n(10, correlation) %>%
  mutate(correlation = color_tile("lightgreen", "skyblue")(correlation)) %>%
  kable("html", escape = FALSE, align = "c", caption = "Correlation between Adele's Songs") %>%
  kable_styling(bootstrap_options = 
                  c("striped", "condensed", "bordered"), 
                  full_width = TRUE)
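
To get a feeling for what pairwise_cor measures, consider a sketch with three made-up songs of three words each. The data are hypothetical. As far as I understand the function, it treats each song as a present/absent vector over all words and correlates those vectors.

```r
library(dplyr)
library(widyr)

# Hypothetical tokens: A and B share two words, B and C share one, A and C none
toy_song_words <- tibble(
  track_title = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  word        = c("love", "heart", "rain",
                  "love", "heart", "fire",
                  "time", "fire", "water")
)

toy_cor <- toy_song_words %>%
  pairwise_cor(track_title, word, sort = TRUE)

toy_cor
```

Songs A and B share the most vocabulary, so their pair comes out on top, while A and C have no words in common and end up with a negative correlation. Each pair is listed twice, once in each direction.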

With so many correlations, it will be difficult to produce a nice-looking plot. Therefore, we first need to filter the correlations. For example, let’s keep only those pairs of songs whose correlation is higher than 0.1. Quick note: the network layout is randomized, so R might produce a different plot every time you run the code. It’s therefore recommended to set a seed first.

set.seed(123)

adele_cor %>%
  filter(correlation > .1) %>%
  graph_from_data_frame() %>%
  ggraph(layout = "fr") +
  geom_edge_link( show.legend = FALSE, aes(edge_alpha = correlation)) +
  geom_node_point(color = "#BA55D3", size = 5) +
  geom_node_text(aes(label = name), repel = TRUE, size = 3.5, color = "black") +
  theme_void()

Here we can see that, lyrics-wise, Adele’s songs aren’t that interconnected, which suggests that her songs are indeed unique and she doesn’t repeat herself. “I Found a Boy” and “Set Fire to the Rain” are not connected to the rest of the network at all.

Key insights

All in all, the above analysis of Adele’s albums has definitely provided a lot of food for thought. Here is the recap of some findings.

1. “He Won’t Go” by Adele (Album “21”) has the highest word count: 453 words.

2. Album “25” has the highest word density.

3. The most frequent word in Adele’s songs is “love” (118 times), followed by “rumour” (63 times) and “time” (33 times).

4. The most frequently used word in album “19” is “tired”, while in albums “21” and “25” those are “rumour” and “love”, respectively.

5. All three albums have almost the same level of lexical diversity (number of distinct words per song).

6. The most important words per album according to the TF-IDF approach are again “tired” (album “19”), “rumour” (album “21”) and “love” (album “25”).

7. For the sentiment analysis, three different lexicons were employed. According to the bing lexicon, Adele’s songs are slightly more negative, in contrast to the AFINN approach. According to the NRC sentiment analysis, many words from Adele’s songs were classified as having “positive”, “joy”, “negative” and “anticipation” sentiments.

8. The sentiment analysis approaches showed some similarities; however, the distribution of positive and negative sentiments differs slightly.

9. Based on the network analysis approach, Adele’s songs are not that interconnected lyrics-wise.

Useful links for further analysis

Of course, the above analysis doesn’t present all aspects of lyrics analysis. For further research, the integration of Spotify features would be relevant. In addition, here are a few links where one can find inspirations and ideas for further work:
https://www.datacamp.com/community/tutorials/sentiment-analysis-R
https://www.tidytextmining.com/index.html
https://uc-r.github.io/sentiment_analysis

To sum it up, lyrics of Adele’s songs go far beyond numbers and statistics, and they are simply timeless!