The following guest post comes from Daniel Teachey, an expert in data management, analytics and cloud computing and a member of the SAS External Communications team.
Is it Tupac Shakur or 2pac Shakur?
Notorious BIG, Notorious B.I.G. (note the periods), or Biggie Smalls? Ted Leo and the Pharmacists, Ted Leo & the Pharmacists or Ted Leo/Pharmacists? And, famously, how about Prince, the Artist Formerly Known as Prince, the Artist, or that little symbol deal? It’s enough to make “Big Poppa” break down and cry.
These are all valid ways to spell the names of just a few recording artists. Luckily, these variations don’t cause a problem when you’re speaking to another music lover. If you tell me that you were listening to a song by Prince, I can recall most of his catalog, regardless of the moniker.
But what happens when you want to find the entire music catalog of your favorite artist in iTunes, Spotify, Pandora and the myriad other online sites? What if I wanted to listen to (or purchase) the entire Prince catalog?
That can be a little tricky – and it all comes down to the way you manage that data. And this has far-reaching effects beyond just consumer frustration: it can actually affect how an artist collects royalties on the purchase or streaming of their music. Unfortunately, there is no consensus on the syntax for naming an artist, an album or a song.
Here’s another example: Daryl Hall and John Oates.
For decades, they have been regaling fans with their Philly blue-eyed soul. But check any music site, and how is their name listed? “Daryl Hall and John Oates,” “Hall & Oates,” and “Hall and Oates” are all options.
You would assume that high-powered music sites wouldn’t get confused by something as small as an ampersand. However, if the software running these sites doesn’t have a way to reconcile these variations, then “Hall and Oates” and “Hall & Oates” are essentially separate bands. Can’t find “Sara Smile” or “She’s Gone”? Maybe they’re with that other Hall & Oates.
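To make the ampersand problem concrete, here is a minimal Python sketch of the kind of reconciliation such software might perform: reducing each spelling to a comparison key before matching. The function name `match_key` and every rule in it are illustrative assumptions, not the logic of any actual music service.

```python
import re
import unicodedata


def match_key(name: str) -> str:
    """Reduce an artist name to a comparison key (illustrative rules only)."""
    # Strip accents and fold case so that accented and plain spellings match.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.casefold()
    # Treat "&", "+" and "/" as the word "and".
    text = re.sub(r"\s*[&+/]\s*", " and ", text)
    # Remove periods so "B.I.G." becomes "big"; turn other punctuation into spaces.
    text = text.replace(".", "")
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    # Ignore the word "the", unless that would leave nothing (e.g. "The The").
    words = text.split()
    kept = [w for w in words if w != "the"] or words
    return " ".join(kept)


for spelling in ("Hall & Oates", "Hall and Oates", "Ted Leo/Pharmacists"):
    print(f"{spelling!r} -> {match_key(spelling)!r}")
```

With these rules, “Hall & Oates” and “Hall and Oates” collapse to the same key, as do the three Ted Leo spellings. True aliases such as Biggie Smalls still produce a different key, so they call for a lookup table rather than text cleanup.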
Behind the Music: Data
We like to think of digital music as a collection of audio files. What’s less understood is that each file also carries descriptive data – known as metadata – that helps systems classify songs, albums and artists. These naming problems may be new to the average music lover, but they have flummoxed data professionals since the days of punch cards.
Computers love order. They thrive on certainty. Humans? We have no such need, and as things get added to computer systems, there are invariably inconsistencies in how the data is entered. (Anybody with thousands of songs in their iTunes library is probably nodding sadly at this point, after another frustrating effort at data quality management.)
In the digital age, those problems can become magnified. Digital music comes from both publishers and independent artists, and there are no set standards for how to classify names across systems. Unfortunately, data hiccups can lead to frustrating issues for the online listener. If you want to find all the songs by a single artist, you may have to think about all the permutations of that name. The question is: how do you introduce some order and certainty to millions of songs, artists, albums and other music elements?
The key is to understand that the data about songs is an asset that you can explore, rationalize and, ultimately, improve.
Managing Your Data and Playing the Hits
Do you remember the days when you got four copies of the same mailing at the same address? That was because a database marketer had your address in their system four separate times. Data quality technology allows marketers to find these duplicates and reconcile them. Nowadays, duplicate mailings rarely happen, because companies got smarter about their data. Along the way, they realized significant cost savings (no more extra mailings) while at the same time improving customer satisfaction (no more multiple catalogs clogging your mailbox).
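For the curious, one simple way to surface likely duplicates is to score how similar two records are and flag the pairs above a cutoff. The sketch below uses Python’s standard `difflib` module; the sample records and the 0.9 threshold are made up for illustration, and commercial data quality tools rely on far more elaborate matching.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score for two records, ignoring case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def likely_duplicates(records, threshold=0.9):
    """Return every pair of records whose similarity meets the threshold.

    Comparing all pairs is fine for a small list; a large database would
    need to group records first (for example by postal code).
    """
    return [
        (a, b)
        for a, b in combinations(records, 2)
        if similarity(a, b) >= threshold
    ]


# Invented sample data: the first two entries are the same person.
mailing_list = [
    "John Smith, 12 Oak Street",
    "Jon Smith, 12 Oak Street",
    "Mary Jones, 48 Elm Avenue",
]

for first, second in likely_duplicates(mailing_list):
    print(f"Possible duplicate: {first!r} / {second!r}")
```

A pair that clears the threshold is only a candidate. Deciding which spelling survives is the reconciliation step, and that usually needs either a business rule or a person.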
In most organizations, there is a group focused on the health of data like this. Sometimes there is even an executive, often called the Chief Data Officer, assigned to managing this data as an asset. For a bank or a retailer, this group will work with the organization’s technology and business leaders to codify the data rules for their company and decide how to apply them within their systems.
These same principles apply to digital music. By creating rules within the databases behind the online music, you can start to rationalize all the permutations of an artist’s name. These rules can be “always on” to catch potentially non-standard data as it gets added to the catalog.
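As a rough illustration of an “always on” rule, the Python sketch below checks each incoming track against a table of known spellings, stores the agreed name when it finds a match, and queues anything unrecognized for review. The `ALIASES` table, the `Catalog` class and the choice of canonical names are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical rule table: known spelling -> agreed canonical name.
# A real catalog would keep this in a governed reference table, not in code.
ALIASES = {
    "hall and oates": "Daryl Hall and John Oates",
    "daryl hall and john oates": "Daryl Hall and John Oates",
    "tupac shakur": "Tupac Shakur",
    "2pac shakur": "Tupac Shakur",
    "notorious big": "Notorious B.I.G.",
    "biggie smalls": "Notorious B.I.G.",
}


def simple_key(name: str) -> str:
    """Lowercase, drop periods, spell out '&' and collapse whitespace."""
    cleaned = name.casefold().replace(".", "").replace("&", " and ")
    return " ".join(cleaned.split())


@dataclass
class Catalog:
    tracks: list = field(default_factory=list)
    review_queue: list = field(default_factory=list)

    def add_track(self, title: str, artist: str) -> str:
        """Apply the alias rules on every insert and return the stored name."""
        canonical = ALIASES.get(simple_key(artist))
        if canonical is None:
            # Unknown spelling: keep it as entered, but flag it for review.
            self.review_queue.append((title, artist))
            canonical = artist
        self.tracks.append((title, canonical))
        return canonical


catalog = Catalog()
catalog.add_track("Sara Smile", "Hall & Oates")
catalog.add_track("She's Gone", "Hall and Oates")
catalog.add_track("Big Poppa", "Biggie Smalls")
print(catalog.tracks)
```

Because the check runs inside `add_track`, no record reaches the catalog without passing through the rules, which is the point of keeping them always on.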
The data problems in the music industry are not a new story, especially for groups that collect and distribute royalty fees. Just imagine the data problems that a royalty company faces when trying to track down recording artists, authors and publishers who are often on the move – and possibly using different stage names.
Ain’t No Party Like a Crowdsource Party
While the royalty world has a business need to get data right, there is less of an imperative for online music sites. They are wildly popular already, and the occasional bit of dirty data may not seem to be a huge problem. But, as Taylor Swift showed us in 2014 in her tiff with Spotify, artists are looking at the royalty stream from online systems.
As scrutiny of per-stream payments increases, these services will face more pressure to prove their value to artists by getting payments right. Which will mean getting the data right.
Perhaps a fix will spring from the collaborative nature of online music. These sites create – and thrive on – a community. There are audiophiles and music lovers everywhere consuming and interacting with music in ways that seemed ludicrous just two decades ago. Music, in many ways, is a social mechanism.
There have been efforts to “crowdsource” the categorization of music (is the song laid-back, mellow, acoustic and so on?) beyond the set genres in iTunes. Sites like Pandora, Spotify and Google Play are getting smarter at making recommendations based on these categorizations, and users can easily modify the suggestions with a simple thumbs up or thumbs down.
Maybe it’s time to crowdsource the data behind the music that we listen to every day. If a listener sees a song title or artist’s name that doesn’t square with other entries, they can just “Shake it Off,” flag it for review, and the data geeks can fix the data behind the scenes.
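A flag-for-review feature could be as simple as counting how many different listeners have reported an entry. The Python sketch below is one hypothetical design; the `FlagTracker` class and the threshold of three listeners are assumptions, not a description of any existing service.

```python
from collections import defaultdict

REVIEW_THRESHOLD = 3  # assumed number of distinct listeners before review


class FlagTracker:
    """Collect listener flags and report which entries need a human look."""

    def __init__(self, threshold: int = REVIEW_THRESHOLD):
        self.threshold = threshold
        # track id -> set of listeners who flagged it (a set ignores repeats)
        self._flags = defaultdict(set)

    def flag(self, track_id: str, user_id: str) -> bool:
        """Record one flag; return True once the track has enough flags."""
        self._flags[track_id].add(user_id)
        return len(self._flags[track_id]) >= self.threshold

    def needs_review(self) -> list:
        """Return the ids of all tracks that have reached the threshold."""
        return sorted(
            track_id
            for track_id, users in self._flags.items()
            if len(users) >= self.threshold
        )


tracker = FlagTracker()
for listener in ("ana", "ben", "cal"):
    tracker.flag("track-001", listener)
tracker.flag("track-002", "ana")
print(tracker.needs_review())
```

Counting distinct listeners rather than raw clicks keeps one enthusiastic user from sending an entry to review single-handedly.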
For both new and legacy artists, digital music is their main point of exposure to rabid fans and new listeners alike; now, more than ever, the data behind their music matters: for listeners, for artists and for the industry.
Daniel Teachey is a member of the SAS External Communications team, and in his current role, he works closely with global marketing groups to generate content about data management, analytics and cloud computing. Prior to this, he managed marketing efforts for DataFlux, helping the former SAS subsidiary go from a niche data quality software provider to a world leader in data management solutions.
Image by Joanna Poe, adapted under a Creative Commons Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0) license.