A whole new (research) world: AI-based classification of Disney songs into academic fields
Research AI bite: 02.
Key takeaways
Dimensions AI classifiers can be applied to any text, including thesis abstracts, unfunded or internal grant proposals, and even Disney song lyrics.
Classifying non-scholarly text using our scholarly-trained classifiers highlights the shift in lyrics during the Disney Renaissance.
Dimensions has used AI (then referred to as Machine Learning) since its inception, enabling the unique ability to classify research publications at the article level. To demonstrate the versatility of our classifiers, we are applying them here to an unexpected dataset: Disney song lyrics. While this may seem frivolous, Disney is a subject of serious academic interest: Dimensions has indexed 6,600+ publications mentioning “disney” (in title & abstract, while excluding Disney as a researcher name), a number that has nearly tripled in the last 10 years. The company even funds research on robotics, wildlife conservation, marine ecology, dynamic multimedia technologies, ecological analysis, and theme park management, reflecting its interests in entertainment, sustainability, and innovation.
The Dimensions API classifier
The Dimensions API allows users to query items (publications, grants, researchers, and more) in Dimensions, but it also provides useful functions, from extracting affiliations with GRID to applying our classifications to any text, public or private. Whether you should apply them to text like Disney lyrics is obviously another question: we have trained our classifiers on research materials, so they are best used with research-related text. However, as always, our classification system remains inclusive, covering everything from Tiananmen Square research (spanning Human Society, Language, Communication and Culture, Creative Arts and Writing, and History, Heritage and Archaeology) to gender studies (4405 Gender Studies). You can even find mRNA research with applications in Clinical Sciences and Oncology.
Our classifiers are designed to identify research areas, cancer types, conditions, Fields of Research (ANZSRC classification), the UN Sustainable Development Goals (SDGs), and more. The Fields of Research (FoR) classifier we are using today was built on grants manually classified by researchers, then further refined by our skilled team—an approach developed to overcome the limitations of traditional journal-based classifications (Herzog & Lunn 2018). It models two levels: 2-digit Divisions and 4-digit Groups, ensuring precise categorisation across FoR, and is continuously updated (Porter et al. 2023).
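Because the 4-digit Groups nest inside the 2-digit Divisions, the Division can be read off the first two digits of a Group code. Here is a minimal sketch using the 4405 Gender Studies code mentioned above. The function name for_division is made up for this illustration and is not part of the Dimensions API.

```python
def for_division(group_code):
    """Return the 2-digit Division code for a 4-digit FoR Group code."""
    code = str(group_code).strip()
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected a 4-digit FoR Group code, got {group_code!r}")
    # The Division is the first two digits of the Group code
    return code[:2]


print(for_division("4405"))  # 44 (Human Society)
```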
Applying it to Disney songs
Based on online top-song lists, I identified 99 songs representative of the 3 periods that Disney has gone through (Disney only, although Toy Story is included):
Classic: 1937 to 1988. Hand-drawn animated fairy tales but also some classic live-action movies. 33 songs.
Renaissance: 1989 to 1999. Broadway-style musicals; it starts with The Little Mermaid and ends with Tarzan. 37 songs.
Modern: since 2000. CGI animation, diverse storytelling, and experimental narratives. 29 songs.
Although it lasted only 10 years, the Renaissance era is rich in the most memorable songs, hence the imbalance.
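The era boundaries listed above translate directly into a small lookup. This is only a sketch of how a song's era could be derived from its film's release year. The function name disney_era is illustrative, and rejecting years before 1937 is an assumption rather than something stated in the analysis.

```python
def disney_era(year):
    """Map a film's release year to one of the three eras used in this post."""
    if year < 1937:
        # Assumption: nothing before the first year of the Classic era is in scope
        raise ValueError(f"{year} is before the Classic era")
    if year <= 1988:
        return "Classic"
    if year <= 1999:
        return "Renaissance"
    return "Modern"


for year in (1937, 1989, 1999, 2000):
    print(year, disney_era(year))
```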
Using ChatGPT to identify artists and the Genius API (see the making of below), I compiled the lyrics for the songs. I then used the Dimensions API to classify the lyrics. Most songs were classified in the discipline Humanities and Creative Arts, especially in the 10 years of the Renaissance. None of them has more than one FoR, even though 25% of publications in Dimensions have more than one FoR. Here are some statistics, but all songs are represented in a figure/data table further down if you only want to look at that.
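For readers who want to reproduce this kind of tally, here is a hedged sketch of counting FoR assignments per era and checking how many songs received more than one FoR. The rows are a handful of examples taken from this post, not the full dataset, and the row layout and function names are assumptions made for the illustration.

```python
from collections import Counter, defaultdict

# Illustrative rows only: (era, song, list of FoR names), taken from examples in this post
rows = [
    ("Classic", "Bella Notte", ["Ophthalmology"]),
    ("Classic", "A Spoonful of Sugar", ["Zoology"]),
    ("Renaissance", "Under the Sea", ["Fisheries Sciences"]),
    ("Renaissance", "Arabian Nights", ["Performing Arts"]),
    ("Modern", "Let It Go", ["Theology"]),
    ("Modern", "Un Poco Loco", ["Applied Mathematics"]),
]


def tally_by_era(rows):
    """Count how often each FoR appears within each era."""
    counts = defaultdict(Counter)
    for era, _song, fors in rows:
        for name in fors:
            counts[era][name] += 1
    return counts


def share_multi_for(rows):
    """Fraction of songs that were assigned more than one FoR."""
    multi = sum(1 for _era, _song, fors in rows if len(fors) > 1)
    return multi / len(rows)


counts = tally_by_era(rows)
print(counts["Modern"].most_common())
print(share_multi_for(rows))  # 0.0 for these sample rows
```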
Looking deeper at the FoR Group level, we found a striking shift in classification during the Renaissance era, which may be one of the reasons why this is often considered the best Disney era (for extended discussion see for instance this Reddit post).
The Classic era was the leading era in the largest number of FoR Groups, reflecting its focus on the physical world, nature, and community. The Renaissance era marked a shift toward Theology, Literary, and Historical studies as narratives moved from external descriptions to inner journeys. Themes of miracles, destiny, and faith reinforced allegory and symbolism. In the Modern era, theological elements declined, replaced by an emphasis on Performing Arts, Religious Studies, and Music.
Classic era—natural world allegories
In this era, passive characters awaited rescue, villains were clearly evil, and romance-centred stories focused on overcoming obstacles. Some surprising song classifications include:
Bella Notte in Ophthalmology (the lyrics mention eyes),
A Spoonful of Sugar in Zoology (not too surprising with lyrics like A robin feathering his nest / The honey bees that fetch the nectar),
The Unbirthday Song in Applied Mathematics (lyrics include Now, statistics prove / Prove that you've / One birthday) [Lewis Carroll was a mathematician after all, thanks Juergen for the reminder],
Following the Leader in Applied Economics (lyrics mention leadership, group behaviour, and decision-making),
Ev'rybody Wants to Be a Cat in Music.
Renaissance era—looking for miracles
With more complex villains and characters shaping their destinies, this era saw songs integrated into character development. Theology, History, and Literary Studies dominated, partly due to films like Hercules and Mulan:
Under the Sea (The Little Mermaid) is unsurprisingly classified in Fisheries Sciences,
Honor to Us All (Mulan) in Art History, Theory and Criticism,
Arabian Nights (Aladdin) in Performing Arts.
Modern era—personal agency
While Renaissance characters faced external struggles, Modern-era protagonists confront internal conflicts. Allegories increasingly use natural and scientific imagery—waves, pressure, ice, and the cosmos—to explore self-discovery, with less emphasis on fate.
Let It Go (Frozen) was the reason I started this analysis: I expected that, with frozen fractals and crystallizes, chemistry or mathematics would be picked up, but the repeated ‘let it go’, the ‘heavens’ and ‘soul’ pushed it towards Theology. [25% of publications have more than one FoR; we see here that the classifier needs a stronger signal to assign more than one FoR, and none of the songs received more than one.]
In Summer (Frozen) in Atmospheric Sciences,
Un Poco Loco (Coco) in Applied Mathematics (the lyrics mention ‘count’, and ‘ay’ could also be interpreted as a variable).
Results
Below are all the songs with their Disciplines and FoR through the years, available as a visualisation or as a data view.
Conclusion
While applying AI classification to Disney songs may seem unconventional, it offers a chance to explore academic fields through unexpected connections. The Dimensions classifier, designed for scholarly content, revealed interesting patterns in musical themes: most songs aligned with Humanities and Creative Arts, with a peak in Theology during the Renaissance era. However, the most intriguing insights came from unexpected classifications, prompting a deeper dive into the disciplines themselves. Why did Peter Pan’s Following the Leader land in Applied Economics? How does Mary Poppins’ A Spoonful of Sugar relate to Zoology? The answers lie in the ways academic fields interpret language and themes.
With limited text to analyse, the classifier assigns categories based on the closest possible match, focusing on individual words. However, a song such as Let It Go, despite its scientific vocabulary, was still classified as Theology, suggesting that the song’s themes of transformation, power, and exile, along with words like heaven, made Theology the dominant match.
This analysis demonstrates the flexibility of AI-driven research tools while reminding us that context matters—a classifier trained on research publications will naturally respond to words used by researchers. This playful experiment gives us a better understanding of the fields themselves. Ultimately, while entertaining, applying our classifier to Disney lyrics is far from its intended use; it is best suited for research materials such as thesis abstracts and unfunded grant proposals.
The End
The making of this analysis
Artificial Intelligence
Here is where I used AI while preparing this bite:
Creating the list of the top Disney songs with their artists: ChatGPT 4o refused to give more than 20. o1 agreed to list 54 and then wrote a loop in Python to create placeholders for the rest (literally no shame!), although a second pass helped. I further refined the list because it hadn’t included some of my favourite songs. However, when it didn’t know the artists, it either inferred them from other songs or lazily called them “movie name Cast” (which is sometimes true). It even suggested singers who had nothing to do with the movies and an artist who had died 8 years before one movie, which was quite a hallucination.
I also lazily asked ChatGPT to write my Python code to access the Genius API, so I didn’t even need to read the documentation (however, I had a few issues when it retrieved a song in Japanese, so the code may not have been optimised).
Sounding board to shape the idea.
Spelling, grammar, and inconsistencies.
Code
I used the Genius API for the Lyrics, and the Dimensions API for the classifier. You can find here the discipline classification file used by Dimensions in its dashboards, which I used to categorise the Divisions of FoR into 5 disciplines.
Dimensions API—classification
import dimcli
import pandas as pd
from dimcli.utils import *

# Dimensions credentials (left blank here)
username = ""
password = ""
endpoint = "https://app.dimensions.ai"

dimcli.login(username, password, endpoint)
dsl = dimcli.Dsl()

df_lyrics = pd.read_csv("data/202502_disney_top100_songs_cur_lyrics.csv")

# get_classification(title, text) is a helper that is not shown here; it is
# expected to return a dict with a "FOR_2020" list of {"name": ...} entries.
song_for = []
for index, row in df_lyrics[["song", "lyrics"]].iterrows():
    # clean the lyrics (remove newlines, etc.)
    lyrics_clean = row["lyrics"].replace("\n", " ")
    try:
        classfor = get_classification(row["song"], lyrics_clean)
    except Exception as e:
        print(f"Failed to classify song '{row['song']}' after retries: {e}")
        continue
    print(classfor["FOR_2020"])
    # join with "; " because FoR names can themselves contain commas
    song_for.append([index, row["song"], "; ".join(x["name"] for x in classfor["FOR_2020"])])

df_results = pd.DataFrame(song_for, columns=["rank", "title", "Field_of_research"])
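The loop above calls get_classification, which is not included in the post. What follows is a hypothetical sketch of what such a helper might look like, not the code actually used. It assumes the Dimensions DSL classify function takes title, abstract, and system arguments, and that the song title and lyrics are passed as title and abstract. The names build_classify_query and run_query, the retry count, and the wait time are all assumptions. The query runner is passed in so the sketch can be tried without credentials.

```python
import time


def build_classify_query(title, abstract, system="FOR_2020"):
    """Build a DSL classify call (syntax assumed from the Dimensions DSL docs)."""
    def esc(text):
        # Escape backslashes and double quotes so the text fits inside a DSL string
        return text.replace("\\", "\\\\").replace('"', '\\"')
    return f'classify(title="{esc(title)}", abstract="{esc(abstract)}", system="{system}")'


def get_classification(title, abstract, run_query, retries=3, wait=1.0):
    """Hypothetical helper: run the classify query, retrying on failure."""
    query = build_classify_query(title, abstract)
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return run_query(query)
        except Exception as error:
            last_error = error
            if attempt < retries:
                # Wait a little longer after each failed attempt
                time.sleep(wait * attempt)
    raise RuntimeError(f"classification failed after {retries} attempts") from last_error
```

With dimcli, run_query could plausibly be something like lambda q: dsl.query(q).json, bound once with functools.partial so that the two-argument call in the loop keeps working. Treat that wiring as an assumption to check against the dimcli documentation.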