Jessie G Taft

DL Seminar | Machine Learning’s Copyright Problem

Updated: Jan 8, 2019

By Bradley Wise | Connective Media Student, Cornell Tech


Illustration by Gary Zamchick

DL Seminar Speaker Amanda Levendowski

Could copyright law be enhancing bias in the way machine learning algorithms learn? According to Amanda Levendowski, a teaching fellow in the Technology Law & Policy Clinic at NYU, the case is compelling. It boils down to who has access to certain works without fear of legal action.


Levendowski explained that right now only a handful of companies (Facebook, Google, IBM, etc.) have the financial capital both to create AI systems and to acquire the rights to the copyrighted works that feed those systems, either by building systems that acquire works outright or through acquisitions and licensing. This creates a problem. Because the data these companies generate or obtain includes works subject to copyright protection, it is very difficult for the public to know what data is being used to train these systems. Researchers and journalists have found that many AI systems built and used today suffer from flaws, due primarily to biased datasets. The copyright problem, in short, is that if the data can’t be accessed, this bias can’t be addressed.


Levendowski’s paper, How Copyright Law Can Fix Artificial Intelligence’s Implicit Bias Problem, highlights several examples of how biased datasets create biased systems, most notably through “biased, low-friction data (BLFD).” BLFD is regarded as legally low-risk, principally because it is widely and freely accessible. An example is the Enron emails: a set of 1.6 million emails sent among Enron employees and released publicly in 2003 as part of the investigation into the fraud that led to the company’s collapse. The Enron dataset has been used to build a number of AI systems in the computer science community, including spam filters, other natural language machine learning systems, and even the initial build of Apple’s Siri. However, researchers have also analyzed these emails, which come from an extremely narrow demographic, for gender bias and unsavory power dynamics, creating an ethical quandary for systems built upon them.


Additional questionable examples of BLFD include scraped profiles from the online dating site OkCupid, as well as WikiLeaks’ release of 20,000 hacked emails during the Hillary Clinton campaign, both disclosed in 2016. These datasets are questioned not for their bias per se, but for the ethics of viewing and using them. The OkCupid dataset drew public uproar over privacy and the identification of individual users (it has since been removed from public use). WikiLeaks’ release also raised concerns over the use of classified government emails in machine learning algorithms. Because BLFD is easily available and cheap to use, it presents a particularly acute dilemma for small startups and other companies with limited capital.


Levendowski’s solution is to push for fairer AI through the copyright “fair use” doctrine, which permits limited use of copyrighted works without permission from the copyright holder, including for purposes of criticism, news reporting, or research. If a court decision were to acknowledge that fair use applies to using copyrighted works as machine learning training data, she argues, there would be more competition to build fairer AI systems. It is an alluring prospect, even if achieving it in practice requires concerted effort by incumbents and new players alike.
