Artificial Intelligence, or AI, is a subject generating significant attention, both in the music industry and beyond.
The following comes from Stephen Brady, Lead Software Engineer at Exactuals, a proud partner of DMN.
Many fear AI could put jobs at risk, especially considering research shows 45% of occupations could be automated by adapting currently demonstrated technologies. However, copyright-intensive industries, like music, could benefit from this type of overhaul, and given those industries’ key contributions to the national GDP, it is imperative we work to leverage AI to create an economic landscape that ensures all revenue is collected.
According to the most recent U.S. Census data from 2014, copyright-intensive industries accounted for 5.5% of the total U.S. GDP, up from 4.4% in 2010. This meaningful percentage would be more significant if not for substantial revenue loss caused by piracy and incorrect or unknown rights attribution. Despite the general assumption that piracy is declining, antipiracy consulting firm Muso found that visits to music piracy sites were actually up 14.7% in 2017 to 73.9 billion. In addition, an estimated $1.5 billion in unclaimed music royalties was reported in 2017, according to NOI filings. Both represent a significant loss of revenue.
In this article, we will focus on the problem of unclaimed royalties and show how Exactuals uses AI to resolve it.
At the most basic level, algorithms that attribute royalties to rights holders are asking questions of equivalence: the program is trying to determine whether two entities are the same thing. For this reason, these algorithms are called “entity resolution” algorithms. The process may sound simple, but it can be extremely difficult because the data is often inconsistent or unreliable and the comparisons are computationally expensive. The simplest way to perform entity resolution is to measure the similarity between metadata attributes like “Title” and “Performer,” a process known as “attribute similarity,” and use the results to make automated decisions. Here’s an example.
In Exactuals’ RAI open API, this type of algorithm is used to find links between sound recordings and musical works. To illustrate this, take a look at the song “Michelle” by The Beatles. Written by John Lennon and Paul McCartney, the track has been recorded by more than 778 different performers, resulting in an estimated 1,302 total recordings of the song. To properly attribute its rights, an algorithm needs to determine equivalence between the one original work and its various recordings. Let’s assume two recordings of the song exist with the same “Title” attribute of “Michelle.” However, the “Performer” attribute on one version is “The Beatles” while the other is “Beatles, The.” A popular similarity measure is based on “Levenshtein distance,” the number of single-character edits needed to turn one string into the other, normalized to a value between 0 and 1.0. By this measure, the Title attribute receives a 1.0 similarity score while the Performer attribute receives only a 0.25: turning “The Beatles” into “Beatles, The” takes nine edits against a longest string of 12 characters, and 1 - 9/12 = 0.25. Combining these using a naive weighting scheme, a simple average, results in an attribute similarity score of (1.0 + 0.25) / 2 = 0.625, or 62.5%.
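The arithmetic in this example can be reproduced with a short sketch. It is not Exactuals’ implementation: the function names are illustrative, and normalizing by the length of the longer string is an assumption, chosen because it reproduces the 0.25 Performer score quoted above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum single-character inserts, deletes, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance to a 0-1 score (assumed: divide by longer length)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


title_score = similarity("Michelle", "Michelle")
performer_score = similarity("The Beatles", "Beatles, The")
naive_score = (title_score + performer_score) / 2  # simple average

print(title_score, performer_score, naive_score)
```

The two Performer strings name the same band, yet a character-level comparison treats the reordering as nine separate edits, which is why the score is so low.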
This is oddly low considering that the attributes are the same, just out of order. To improve the results, an algorithm can be taught to classify whether or not two entities are the same. The algorithm accomplishes this by learning which attributes are more informative than others, thus improving the results by adding more context. This type of machine learning algorithm is known as a “binary classification” algorithm and is used extensively in Exactuals’ RAI.
Using the previous example, let’s assume the binary classification algorithm learns that the Performer attribute is not as informative as the Title attribute when determining match status, assigning a weight of 30% to the Performer and 70% to the Title. That brings the attribute similarity score up to 0.7 × 1.0 + 0.3 × 0.25 = 0.775, taken here as 0.77, or 77%, which is better but still not truly indicative of the underlying relationship between the entities.
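As a rough sketch of this weighting step, the snippet below applies the 70/30 weights to the two similarity scores from the example. The dictionary layout and the function name are assumptions made for illustration; in practice the weights would come from the trained classifier rather than being typed in.

```python
def weighted_similarity(scores: dict, weights: dict) -> float:
    """Weighted sum of per-attribute similarity scores."""
    return sum(scores[name] * weights[name] for name in scores)


# Similarity scores from the "Michelle" example.
scores = {"title": 1.0, "performer": 0.25}
# Weights quoted in the text: Title 70%, Performer 30%.
weights = {"title": 0.7, "performer": 0.3}

attribute_similarity = weighted_similarity(scores, weights)
print(round(attribute_similarity, 3))
```

The exact result is 0.775, which the article carries forward as 0.77.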
The result can still be improved, but doing so will require looking at the data in a different way. Instead of seeing the data simply as a set of attributes, it also must be viewed as a set of relationships. For example, the data shows that Paul McCartney and John Lennon are members of both “The Beatles” and “Beatles, The.” The degree of connectivity, or “connection strength,” between the two performer entities can be measured using “Jaccard similarity,” which divides the number of relationships the two entities share by the total number of distinct relationships between them. Because both entities have the same set of relationships, they produce a connection strength score of 1.0.
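A minimal sketch of the connection strength calculation follows, assuming each performer entity is represented simply as a set of member names. That representation is an assumption, and the second comparison uses a made-up entity, labeled hypothetical, only to show what a partial overlap looks like.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


the_beatles = {"John Lennon", "Paul McCartney"}
beatles_the = {"Paul McCartney", "John Lennon"}
connection_strength = jaccard(the_beatles, beatles_the)

# Hypothetical entity linked to only one of the two members.
hypothetical_entity = {"Paul McCartney"}
partial_strength = jaccard(the_beatles, hypothetical_entity)

print(connection_strength, partial_strength)
```

Because sets ignore order and spelling of the performer name itself, the two Beatles entities score 1.0 even though their “Performer” strings differ.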
The final step in determining equivalence is to combine the weighted attribute similarity score of 0.77 with the connection strength score of 1.0. This is accomplished by weighting the importance of attribute similarity vs. connection strength. Naively assigning a weight of 30% to the former and 70% to the latter, the total equivalence score is 0.3 × 0.77 + 0.7 × 1.0 = 0.931, or 93%, a much more accurate reading. But there’s a problem. In this scenario, the weights were assigned manually, which would not scale to the millions of listeners who are streaming millions of songs at any given point in time.
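The combination step can be sketched as a two-term weighted sum. The function and parameter names are illustrative assumptions; the inputs are the scores and the manually chosen 30/70 split from the example.

```python
def equivalence_score(attribute_similarity: float,
                      connection_strength: float,
                      attribute_weight: float) -> float:
    """Blend the two signals; the connection weight is the remainder."""
    connection_weight = 1.0 - attribute_weight
    return (attribute_weight * attribute_similarity
            + connection_weight * connection_strength)


# Manually assigned weights: 30% attribute similarity, 70% connection strength.
manual_score = equivalence_score(0.77, 1.0, attribute_weight=0.3)
print(round(manual_score, 3))
```

Hard-coding attribute_weight is exactly the part that does not scale, since the right balance has to be chosen by hand for each kind of data.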
This process can be optimized by adding another binary classifier to “learn” the relevance of attribute similarity vs. connection strength when making matching decisions. This new algorithm is used in conjunction with the previous binary classifier, thereby creating an “ensemble” of algorithms, a powerful strategy commonly used to improve results in Exactuals’ RAI. In this case, the algorithm determined that connection strength is more informative than attribute similarity, assigning a weight of 80% to the former and 20% to the latter. This yields an equivalence score of 0.2 × 0.77 + 0.8 × 1.0 = 0.954, or 95%.
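To give a feel for how weights like these might be learned rather than assigned, here is a toy sketch using logistic regression, one common binary classifier. The article does not say which classifier Exactuals uses, so the choice of model is an assumption, and the eight labeled pairs below are invented purely for illustration. Each row is (attribute similarity, connection strength) with a label of 1 for a match and 0 for a non-match.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=2000, learning_rate=1.0):
    """Fit weights and a bias with plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for k, value in enumerate(x):
                grad_w[k] += error * value
            grad_b += error
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


# Invented training pairs: (attribute_similarity, connection_strength).
rows = [(0.9, 0.9), (0.3, 1.0), (0.8, 0.8), (0.4, 0.9),   # labeled matches
        (0.9, 0.1), (0.7, 0.0), (0.2, 0.1), (0.6, 0.2)]   # labeled non-matches
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logistic(rows, labels)

# Score the "Michelle" pair: attribute similarity 0.77, connection strength 1.0.
match_probability = sigmoid(weights[0] * 0.77 + weights[1] * 1.0 + bias)
print(weights, bias, match_probability)
```

In this toy data the connection strength separates matches from non-matches while attribute similarity does not, so the model ends up putting more weight on connection strength. The learned coefficients are not percentages; a split such as 80/20 would be a normalized reading of them.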
An Elegant Solution
Artificial intelligence is an inherently complex subject. However, AI can make the process of attributing royalties just as elegant as a duck gliding across a pond. We don’t see the furious swimming beneath the surface, just the result. Solutions like Exactuals’ PaymentHub and RAI leverage AI to accurately detect all rights holders and get them the money they deserve. With a robust library of metadata, AI can connect the intricate web of relationships existing between recordings, significantly decreasing the amount of unclaimed royalties and ensuring all revenue is collected. This is good for the rights holders, good for the industry, and great for the country.