• Digital Life Initiative

DL Seminar | Data Ownership is Not Dispositive: Data Access Conflicts in Public-Private Contracting

By Stephen Terrence Brown

Cornell Tech

Property rights, and their implications for public access to resources, have never been a simple subject. And yet, in our modern age, data ownership has come along to complicate the meaning of that bundle of rights even further. There could hardly be a more relevant topic at a time when corporate access to data drives R&D arms races, like the one that got scientists to several COVID-19 vaccines in record time. The implications of who controls massive sets of data, be they healthcare-related or other forms of valued information, require serious research and contemplation, and this endeavor is being tackled by an emerging mind in the field. On May 6th, Meg Young, a Postdoctoral Fellow at Cornell Tech’s Digital Life Initiative, previously of the Tech Policy Lab at the University of Washington in Seattle, helped to unpack issues in this area for the lucky attendees. Young’s work incorporates ethnography and design to decipher exactly how data is being used by the government. In her talk, Young addressed government transparency in relation to contracting with private firms by walking the audience through her research, including ethnographic studies.

Getting Moving in South Lake Union

Young began with an explanation of the setting and methods of her ethnographic study, based in South Lake Union, in downtown Seattle. This is home to Amazon’s headquarters, as well as many other tech firms. As these companies have grown, the demand for labor, housing and transportation has grown in concert. Surrounded by water, the area has unique challenges related to transportation. To adapt to this predicament, Seattle has shifted to heavier reliance on digital technologies and data-intensive systems. In one of the busiest areas, the Department of Transportation (DoT) has installed hardware to collect the MAC addresses that uniquely identify the cell phones of people passing traffic lights. This information is sent to a vendor, a company that sells services to the government, which aggregates and anonymizes the data. Once sent back to the DoT, the information is used to time traffic lights more efficiently, in response to real-world demand. The issue arising here, from a legal and public policy perspective, is who is accountable for this data. According to Young, a public employee with whom she spoke described a general sentiment within the government that the anonymization of data, once handed to the vendor, is the vendor’s responsibility. Essentially, the government is passing the buck, which makes pressing transparency and accountability concerns harder to address.
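To make the "aggregates and anonymizes" step concrete, here is a minimal sketch of what such a pipeline could look like. It is purely illustrative: the talk did not describe the vendor's actual method, and the function names, the keyed-hash approach, and the suppression threshold are all assumptions. Note also that a keyed hash is pseudonymization rather than true anonymization, since whoever holds the key can recompute the mapping, which is part of why the question of who is accountable matters.

```python
import hashlib
import hmac
from collections import defaultdict
from statistics import median


def pseudonymize(mac: str, secret: bytes) -> str:
    """Replace a MAC address with a keyed hash (hypothetical helper).

    The raw address is not kept, but anyone holding `secret` can
    recompute the mapping, so this is pseudonymization only.
    """
    normalized = mac.strip().lower().replace("-", ":")
    return hmac.new(secret, normalized.encode("ascii"), hashlib.sha256).hexdigest()


def median_travel_times(sightings, secret: bytes, min_observations: int = 3):
    """Aggregate sensor sightings into median travel times per road segment.

    sightings: iterable of (mac, sensor_id, unix_seconds) tuples.
    Segments with fewer than `min_observations` traversals are dropped
    (an assumed threshold) so that sparse segments are not reported.
    """
    by_device = defaultdict(list)
    for mac, sensor, ts in sightings:
        by_device[pseudonymize(mac, secret)].append((ts, sensor))

    segment_times = defaultdict(list)
    for events in by_device.values():
        events.sort()  # order each device's sightings by time
        for (t0, s0), (t1, s1) in zip(events, events[1:]):
            if s0 != s1:  # only count movement between two different sensors
                segment_times[(s0, s1)].append(t1 - t0)

    return {
        segment: median(times)
        for segment, times in segment_times.items()
        if len(times) >= min_observations
    }
```

Under this sketch, only the per-segment medians would need to reach the agency; whether raw or hashed identifiers are retained anywhere along the way is exactly the kind of detail that stays opaque when responsibility sits with the vendor.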

Zooming out, Young took up the broader question of who is responsible. A rising share of public services is being handed over to companies using proprietary systems. These are often black-box technologies, where one can see the inputs and outputs but not the inner workings, leaving the question of what is actually being done with the data a mystery to the public. On its face, this is not a forward-thinking practice with regard to its implications for privacy rights. In diving into this conundrum, Young used empirical methods to analyze the technical, legal, and policy factors that enabled and constrained data access in these cross-sector data contracting relationships. The ethnographic study ran from 2014 to 2019 and included interviews with a diverse group of 40 individuals, from public agencies to vendors to advocates and activists. During that time, Young also worked in and with area advocacy and activist groups.

One illustrative anecdote features Phil Mosaic, an activist who requested records from the Tacoma Police Department about surveillance devices known as stingrays, manufactured by the Harris Corporation. The records that Mosaic received were redacted to the point that entire pages were blacked out. In response, Mosaic sued the department under Washington’s Freedom of Information law, claiming that the Harris Corporation had no right to withhold this information. The court agreed, awarding Mosaic fifty thousand dollars in damages, which was given to six police transparency non-profits. Washington state’s Freedom of Information law defines all records that the government prepares, owns, uses or retains to be available for disclosure, unless there is an explicit exemption in the statute. However, over five hundred such exemptions exist. In practice, the requester makes a records request to the agency, which, in turn, looks for any responsive records. If a record is proprietary or marked as confidential, the vendor is notified and can file for an injunction, which stops the agency from releasing the records until the court makes a determination. Because Washington’s law is comparatively forceful, the state is an excellent test case: a finding that some systems are barred to public access even there would suggest that the risks for transparency and accountability are even larger in states without similarly strong laws.

Rideshare, and Its Discontents: The Case of Lyft and King County Metro

In her first case study, Young turned to a public-private data partnership that ultimately failed. The government’s position has been that, in order to create good policy, it needs to know what is happening on the ground. This currently requires information from rideshare apps like Lyft and Uber. These companies, in contrast, are staunchly opposed to the release of their proprietary data at the granular level of specificity requested. For example, Uber launched an operation known as ‘Greyball,’ intended to shield data from collection through geofencing and other aggressive means. As a result, transportation agencies have had to proceed with incomplete data. The need for this data still exists, however, as early research suggests that, rather than diverting riders from privately owned vehicles, ridesharing may in fact be diverting them from public transit. In 2016, Seattle addressed this issue by making permits to operate within city limits contingent on data sharing. Jeff Kirk, a tech journalist, requested the data under the Public Records Act, causing Lyft and Uber to be notified. The companies sued on the basis that disclosure would harm their competitive advantage, in a case that went all the way to the state Supreme Court. In a 5-4 decision, the court ruled that the data was disclosable, as the public interest tilted in favor of researchers having access to Uber and Lyft data that spoke to racial inequity harms.

However, the focus of the case study was the public-private partnership between Lyft and the public transit agency King County Metro in 2016. Public agencies received federal funding to subsidize Lyft trips from transit riders’ homes to public transit hubs in low-density areas. This plan linked Lyft to transit cards and required the firm to share granular data, both to ensure that federal funds were going to users of public transit, rather than to riders heading to stores near the transit hub, and to ensure that drivers’ routes were making efficient use of the subsidy.

Following the court case, however, Lyft began to stall, despite the program already having been funded and Lyft having signed a letter of intent. Lyft cited privacy concerns and fears of losing competitive advantages in the marketplace. Young further theorizes that Lyft feared that sharing information would lead to further regulation of its services. In the wake of this, the public agencies began to assert intellectual property claims, arguing that because the program received public funding, the data was the property of the public agencies. Ultimately, the two sides reached an impasse, and Lyft pulled out of the program. The pilot moved forward with Via, a smaller rideshare service that promised to be more amenable to sharing data. In fact, it was not. Negotiations took over a year, and disagreements continued, with evidence showing a gap between Via’s contractual obligations and the data it actually shared. The takeaway is that the court’s reading of the public records law recognized the public interest in proprietary data, but simultaneously prevented new services from getting off the ground. The firms’ reluctance to share reflects a larger problem with getting access to private data.

More Than IP: The Case of ORCA

The second case study involved the government’s difficulty in getting data from its own vendors. The agencies behind ORCA (One Regional Card for All, a regional transit fare card program) thought that a turn-key solution would be ideal, in which the winning bidder for their contract would handle everything from hardware to software, data warehousing to payment services, and so on. The data flow accompanying these arrangements showed that the agencies released only the last two years of data, and everything else was warehoused by the vendor, VIX. Retrieving data from VIX’s technical infrastructure was extremely time-intensive, which caused problems, as the transit agencies had a strong need for longitudinal data for planning purposes. When VIX won the bid, the agencies took precautions to contractually own all of the information produced by the system. However, the agencies still did not have access, due to technical and organizational barriers to getting the data out of the system. It ended up costing three hundred thousand dollars more to export the data from the vendor’s warehouse.

In 2017, the transit agencies continued the bidding process, abandoning the turn-key solution in favor of more than one vendor in order to address the non-functioning parts of the system. Tech activist Tyler Trevor thought it could be important to establish whether software could be considered a public record. He issued a public records request for fare enforcement software, which was contested by VIX. VIX claimed the software was intellectual property and, like Lyft before it, argued that sharing it would cost the company a competitive advantage in the marketplace. The court sided with VIX. This case has parallels in other jurisdictions, where activists’ requests for the code that law enforcement uses in DNA matching software have been denied on the basis of trade secret.

Conclusions: Rethinking Data Governance

While trade secret claims block public access to software and code, creating transparency and accountability issues, organizational and technical boundaries, like the ORCA data being locked in VIX’s warehouse, can keep even the owner from accessing data. In both cases presented, proprietary systems blocked access, but not exclusively because of intellectual property claims. This demonstrates that IP is not solely determinative of outcomes. As the cases of Uber and Lyft being ordered by the Washington Supreme Court to hand over their data and of VIX’s infrastructural issues show, data owners can lose control over their own data. Further, as with King County Metro and Lyft, multiple parties can assert ownership over the same data. Young’s conclusion is that because ownership is neither necessary nor sufficient for data control, ownership is not a good framework for governing data. With the question of who owns the rights so constantly in flux, a more stable regime for addressing this increasingly important issue is needed.