Tupac, or 2Pac? Sadly, Music Industry Still Doesn’t Know…

The following guest post comes from Daniel Teachey, an expert in data management, analytics, and cloud computing and a member of the SAS External Communications team.

Is it Tupac Shakur or 2pac Shakur?

Notorious BIG, Notorious B.I.G. (note the periods), or Biggie Smalls?  Ted Leo and the Pharmacists, Ted Leo & the Pharmacists or Ted Leo/Pharmacists? And, famously, how about Prince, the Artist Formerly Known as Prince, the Artist, or that little symbol deal?  It’s enough to make “Big Poppa” break down and cry.

These are all valid ways to spell the names of just a few recording artists.  Luckily, these variations don’t cause a problem when you’re speaking to another music lover.  If you tell me that you were listening to a song by Prince, I can recall most of his catalog, regardless of the moniker.

But what happens when you want to find the entire music catalog of your favorite artist in iTunes, Spotify, Pandora and the myriad other online sites?  What if I wanted to listen to (or purchase) the entire Prince catalog?

That can be a little tricky – and it all comes down to the way you manage that data.  And this has far-reaching effects beyond just consumer frustration—it can actually impact how an artist collects royalties on the purchase or streaming of their music. Unfortunately, there is no consensus on the syntax for an artist, an album or a song.

Here’s another example: Daryl Hall and John Oates.

For decades, they have been regaling fans with their Philly blue-eyed soul.  But check any music site, and how is their name listed?  “Daryl Hall and John Oates,” “Hall & Oates,” and “Hall and Oates” are all options.

You would assume that high-powered music sites wouldn’t get confused by something as small as an ampersand.  However, if the software running these sites doesn’t have a way to reconcile these things, then “Hall and Oates” and “Hall & Oates” are essentially separate bands. Can’t find “Sara Smile” or “She’s Gone”? Maybe they’re with that other Hall & Oates.
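
One way software can reconcile an ampersand is to reduce every name to a comparison key before matching. The Python below is a minimal sketch, not how any particular music service works: `normalize_artist` is a hypothetical helper, and its rules (case folding, treating "&" as "and," dropping punctuation) are illustrative assumptions.

```python
import re
import unicodedata


def normalize_artist(name: str) -> str:
    """Reduce an artist name to a comparison key (hypothetical helper)."""
    # Unify Unicode forms and ignore case.
    key = unicodedata.normalize("NFKC", name).casefold()
    # Assumption: an ampersand always means "and".
    key = key.replace("&", " and ")
    # Drop punctuation such as the periods in "B.I.G."
    key = re.sub(r"[^\w\s]", "", key)
    # Collapse runs of whitespace.
    return " ".join(key.split())


print(normalize_artist("Hall & Oates"))    # hall and oates
print(normalize_artist("Hall and Oates"))  # hall and oates
```

A key like this only handles spelling variants. Tupac and 2Pac, or Notorious B.I.G. and Biggie Smalls, still come out different, so true aliases need an explicit mapping.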

Behind the Music: Data

We like to think of digital music as a collection of audio files.  What’s less understood is that the file also has data – also known as metadata – that helps systems classify songs, albums or artists.  These problems may be new to the average music lover, but they have flummoxed data professionals since the days of punch cards.

Computers love order.  They thrive on certainty. Humans?  We have no such need, and as things get added to computer systems, there are invariably inconsistencies in how the data is entered.  (Anybody with thousands of songs in their iTunes system is probably nodding sadly at this point, after another frustrating effort at data quality management.)

In the digital age, those problems can become magnified.  Digital music comes from both publishers and independent artists, and there are no set standards for how to classify names across systems.  Unfortunately, data hiccups can lead to frustrating issues for the online listener.  If you want to find all the songs by a single artist, you may have to think about all the permutations of that name.  The question is: how do you introduce some order and certainty to millions of songs, artists, albums and other music elements?

The key is to understand that the data about songs is an asset that you can explore, rationalize and, ultimately, improve.

Managing Your Data and Playing the Hits

Do you remember the days when you got four copies of the same catalog at the same address?  That was because a database marketer had your address in their system four separate times.  Data quality technology allows marketers to find these duplicates and reconcile them.  Nowadays, duplicate mailings rarely happen, because companies got smarter about their data.  Along the way, they realized significant cost savings (no more extra mailings) while at the same time improving customer satisfaction (no more multiple catalogs clogging your mailbox).

In most organizations, there is usually a group focused on the health of data like this.  Sometimes, there is even an executive, often called the Chief Data Officer, assigned to managing this data as an asset.  For a bank or a retailer, this group will work with the organization’s technology and business leaders to codify the rules for their company – and how to apply them within their systems.

These same principles apply to digital music. By creating rules within the databases behind the online music, you can start to rationalize all the permutations of an artist’s name. These rules can be “always on” to catch potentially non-standard data as it gets added to the catalog.
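
As a rough illustration of an "always on" rule, the sketch below checks each incoming track against a table of known name variants and holds anything unrecognized for review. The `Catalog` class, the `ALIASES` table and its entries are all hypothetical; a real catalog would keep rules like these in its database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical alias table: known variant (lowercased) -> canonical name.
ALIASES: Dict[str, str] = {
    "daryl hall and john oates": "Daryl Hall and John Oates",
    "hall & oates": "Daryl Hall and John Oates",
    "hall and oates": "Daryl Hall and John Oates",
}


@dataclass
class Catalog:
    aliases: Dict[str, str]
    # (canonical artist, title) pairs that passed the rule
    tracks: List[Tuple[str, str]] = field(default_factory=list)
    # (artist as entered, title) pairs waiting for a human decision
    review_queue: List[Tuple[str, str]] = field(default_factory=list)

    def add_track(self, artist: str, title: str) -> bool:
        """Apply the alias rule on ingest; return False if held for review."""
        key = " ".join(artist.casefold().split())
        canonical = self.aliases.get(key)
        if canonical is None:
            self.review_queue.append((artist, title))
            return False
        self.tracks.append((canonical, title))
        return True

    def titles_by(self, canonical_artist: str) -> List[str]:
        """List titles filed under one canonical artist name."""
        return [t for a, t in self.tracks if a == canonical_artist]


catalog = Catalog(aliases=ALIASES)
catalog.add_track("Hall & Oates", "Sara Smile")
catalog.add_track("Hall and Oates", "She's Gone")
# A misspelled variant is not in the table, so it is held for review.
catalog.add_track("Darry Hall and John Oates", "Sara Smile")
```

Both recognized spellings end up under one canonical name, while the misspelling waits in `review_queue` instead of quietly creating a second band.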

The data problems in the music industry are not a new story, especially for groups that collect and distribute royalty fees.  Just imagine the data problems that a royalty company faces when trying to track down recording artists, authors and publishers who are often on the move – and possibly using different stage names.

Ain’t No Party Like a Crowdsource Party

While the royalty world has a business need to get data right, there is less of an imperative for online music sites. They are wildly popular already, and the occasional bit of dirty data may not seem to be a huge problem.  But, as Taylor Swift showed us in 2014 in her tiff with Spotify, artists are looking at the royalty stream from online systems.

As scrutiny of per-stream payments increases, these services will face more pressure to prove their value to artists by getting payments right, which will mean getting the data right.

Perhaps a fix will spring from the collaborative nature of online music.  These sites create – and thrive on – a community.  There are audiophiles and music lovers everywhere consuming and interacting with music in ways that seemed ludicrous just two decades ago. Music, in many ways, is a social mechanism.

There have been efforts to “crowdsource” the categorization of music – is the song laid-back, mellow, acoustic, etc., beyond the set genres in iTunes.  Sites like Pandora, Spotify and Google Play are getting smarter at making recommendations based on these categorizations—and users can easily modify the suggestions with a simple thumbs up or thumbs down.

Maybe it’s time to crowdsource the data behind the music that we listen to every day.  If a listener sees a song title or artist’s name that doesn’t square with other entries, they can just “Shake it Off,” flag it for review, and the data geeks can fix the data behind the scenes.
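
The flag-for-review idea could look something like the sketch below. Everything in it is an assumption for illustration: the `FlagTracker` class, the entry and user identifiers, and especially the threshold of distinct listeners required before an entry reaches the data geeks.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class FlagTracker:
    """Collect listener flags on catalog entries (hypothetical design)."""

    def __init__(self, threshold: int = 3) -> None:
        # Assumption: an entry needs this many distinct listeners
        # before it is sent for review.
        self.threshold = threshold
        self._flags: DefaultDict[str, Set[str]] = defaultdict(set)

    def flag(self, entry_id: str, user_id: str) -> bool:
        """Record a flag; return True once the entry needs review."""
        # A set means repeat flags from one listener count once.
        self._flags[entry_id].add(user_id)
        return len(self._flags[entry_id]) >= self.threshold

    def needs_review(self) -> List[str]:
        """Return the entries that have reached the threshold."""
        return sorted(
            entry for entry, users in self._flags.items()
            if len(users) >= self.threshold
        )
```

Counting distinct listeners rather than raw clicks is one simple guard against a single user flooding the review queue.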

For both new and legacy artists, digital music is their main point of exposure to rabid fans and new listeners alike; now, more than ever, the data behind their music matters: for listeners, for artists and for the industry.


Daniel Teachey is a member of the SAS External Communications team, and in his current role, he works closely with global marketing groups to generate content about data management, analytics and cloud computing. Prior to this, he managed marketing efforts for DataFlux, helping the former SAS subsidiary go from a niche data quality software provider to a world leader in data management solutions.

Image by Joanna Poe, adapted under a Creative Commons Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0) license.

11 Responses

  1. Myles

    Search engines also have an extremely hard time with classical music. You get bad results if you search by composer (composers are often listed under different name variations). You get better results if you know the performer. The best results are returned if you know the performer, composer, composition and the year it was performed.

  2. Will W

    Daniel – this is a great article. People who love music will probably do a better job with the data than recording label wonks or anybody else. I stand by you – crowdsource the data! Bring clarity to the future of music delivery and monetization. Show AC/DC and Taylor Swift that Spotify can make them money (not that I miss them of course).

    Write on and rock on –


  3. MarkC

    “Hall and Oates” vs “Hall & Oates” doesn’t keep Daryl Hall, or John Oates, from being paid. You report back to the labels the number of plays next to the unique identifier, like an ISRC, and then the label pays based on who they have on file for that. As for search engine madness, it is a problem, but the concept of an assigned artist alias fixes it, although not all services can keep on top of it (it can be like herding cats sometimes).

  4. Jia

    Ideally one artist uses one name. Technically, there is a workaround for a variety of names if needed. I don’t believe humans live by the rules of computers. It’s the other way around.

  5. CraigH

    MarkC – ISRC is great, but having been on the reporting side of that equation, I can say the codes are impossible to come by without paying for the information. The RIAA has been promising reasonable access to that data now for several years, but we’ll see what they say again this year at the MusicBiz Metadata Summit – maybe it will be only 12 months out instead of 18…
    As far as crowdsourcing, if you don’t know about MusicBrainz (http://musicbrainz.org), you owe it to yourself to check it out. Discogs is also working on this, but from a little different perspective.

  6. Dude

    Just find ways to link all the names when they look them up.

  7. Amy

    Crowdsourcing music data, as mentioned above, is happening via MusicBrainz and Discogs. The problem with any crowdsourced information is making sure 1,000 monkeys at 1,000 keyboards are giving you the right information. Wikipedia has done a good job of providing strict guidelines, checks & balances, and growing a community of people with its best interests at heart, so of course it’s possible. It’s also possible for malicious editors to add false information to Wikipedia articles. If you’re in the business of providing correct information, that possibility is too risky.

    DDEX has started to bang the drum about music metadata standards, and a good handful of businesses that export that data have become DDEX compliant (SONY, 7Digital, etc.). It’s still a work in progress, but if labels, publishers, and distributors all start adopting DDEX as the standard, it would be a step in the right direction. The latest style guide was updated earlier this year: http://musicbiz.org/wp-content/uploads/2014/08/MusicMetadataStyleGuide-MusicBiz-FINAL.pdf