Digital Life Initiative

DL Seminar | Taking Responsibility for Someone Else's Code: Studying the Privacy Behaviors of Mobile Apps at Scale


Individual reflections by Frank C. Barile and Heidi He.

By Frank C. Barile

Cornell Tech

Code Reuse: Software Development Kits and Resulting Mobile Application Privacy Violations

On October 7, 2020, the Digital Life Seminar series at Cornell Tech welcomed Serge Egelman, Research Director of the Usable Security and Privacy group at the International Computer Science Institute, an independent research institute affiliated with the University of California, Berkeley. Egelman is also the CTO and co-founder of AppCensus, Inc.[1], a startup offering on-demand privacy analysis of mobile apps. Egelman’s seminar was entitled “Taking Responsibility for Someone Else's Code: Studying the Privacy Behaviors of Mobile Apps at Scale”[2].

Code Reuse

Egelman focused on “code reuse”, in which software developers incorporate prefabricated, bundled portions of code (often distributed as “Software Development Kits” or “SDKs”) into larger software projects. SDKs aim for efficiency, sparing developers from rebuilding the same building blocks each time they write a program. This is similar to preparing a dish with pre-packaged ingredients: for example, adding a pre-mixed blend of spices to a sauce instead of sourcing, preparing, and combining the spices by hand. A cook who starts from a bundled product naturally has less control over the full contents of the final dish, whereas a cook who works from scratch knows exactly what went into it. SDK use is widespread because plugging in pre-engineered code saves developer resources, just as the pre-mixed spices save the cook time.


Code reuse carries insidious dangers that pervade the software industry. A developer who does not write every portion of a program cedes control to the SDK’s developer and may not even be aware of everything the SDK does. The reused code may not behave as expected and may even have unintended consequences. For example, Egelman described how the proliferation of code reuse has led to rampant privacy violations involving personal data. The practice also raises questions of accountability for the final product: which party is liable for a data breach? Perhaps both the app developer and the SDK provider?


App users may find themselves in a familiar quandary: “I need apps. I don’t have time[3] to read all the disclosures or options[4], and if I did, I might not understand them. Even if I did understand them, I would likely just press ‘accept’, especially since the service is free. Besides, I don’t really care what they do with my location data at any given moment.” If users are not motivated to protect their own data, app developers face little pressure to protect it either, which leaves guardianship to regulators, who have not acted fast enough to combat the ills of code reuse or data breaches. The result is the mass sharing of user data, whether or not users are aware of it, with apps that use the data themselves or pass it on to other parties.
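The 200-hour figure in footnote [3] follows from straightforward arithmetic. The short Python sketch below reproduces it from the three inputs given there (250 words per minute, 100 unique websites per month, a 2,500-word median policy). The annualization step, treating the 100 monthly sites as 1,200 policies read once each per year, is an assumption of this sketch rather than a detail reported from the talk.

```python
# Inputs taken from footnote [3]; the 12-month annualization is an
# assumption of this sketch.
WORDS_PER_MINUTE = 250
SITES_PER_MONTH = 100
WORDS_PER_POLICY = 2500
HOURS_PER_WORK_WEEK = 40


def annual_reading_hours(wpm: int, sites_per_month: int, words_per_policy: int) -> float:
    """Hours per year needed to read one privacy policy for every site visited."""
    minutes_per_policy = words_per_policy / wpm          # 2500 / 250 = 10 minutes
    policies_per_year = sites_per_month * 12             # 100 * 12 = 1,200 policies
    return minutes_per_policy * policies_per_year / 60   # convert minutes to hours


hours = annual_reading_hours(WORDS_PER_MINUTE, SITES_PER_MONTH, WORDS_PER_POLICY)
weeks = hours / HOURS_PER_WORK_WEEK
print(f"{hours:.0f} hours per year, about {weeks:.0f} forty-hour work weeks")
# prints "200 hours per year, about 5 forty-hour work weeks"
```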


Egelman has conducted extensive research to identify these issues and support his conclusions. In one study, Egelman surveyed 308 Android users and concluded that most users do not understand permission requests. In fact, users may not even read the requests, which calls into question the usefulness of such disclosures. Ultimately, Egelman found that privacy violations were ubiquitous, whether violations of the law or of an app’s own privacy policy.

Further, because code reuse involves not only the final app but also the original design of every embedded SDK, the SDK portion of the code can itself violate the app’s privacy policy. This is an especially critical blind spot if developers are unaware of the SDK’s behavior. The onus should therefore be on developers to investigate any SDK code they include in their final product, which raises further questions about how transparent an SDK should be about its functionality.

Cautionary Tale

Another Egelman study used a click farm of smartphones that ran apps, read privacy policies, and tracked data sharing. Among the apps surveyed, the CVS Pharmacy app[5] was by far the most revealing, sharing a user’s location data with forty external parties.[6] Egelman suspected that CVS’s app had been built using SDKs and that CVS was likely unaware of the external data sharing permitted by the SDK code. When Egelman contacted CVS, the company denied the claims, even suggesting that users could simply turn the app off to resolve the issue.[7] Egelman then went to the press, which published the account.[8] CVS subsequently resolved the issue within one week, and it now has a lengthy privacy policy[9] with explicit opt-out channels on its website. Egelman’s research broadly suggests that this kind of data leak is neither anomalous nor limited to CVS.


It would be absurd to absolve a developer of liability for unknowingly deploying SDKs that leak data. Developers who use SDKs should naturally bear the burden of investigating them before deploying to production. If they cannot conclude that an SDK is secure, the best practice may be to avoid it. SDK providers, in turn, should be transparent about how their code behaves.

Ultimately, it should not matter whether the developer knew how the SDK integrated into their app behaves. While fraud usually requires knowledge of a falsehood, recent data privacy regulations generally do not require knowledge of wrongdoing, deceit, or falsehood: one can be liable merely for sharing data in violation of a rule. Developers should therefore take precautions as a prophylactic measure to protect users, and perhaps themselves as well.


Some jurisdictions have regulations that protect data privacy, and most apps also maintain a privacy policy. It is important to distinguish between violating one and violating the other, as the consequences can differ. Violating either a law or a privacy policy can carry criminal or civil penalties; the difference is that the app sets its own privacy policy, but not the law. An app could even violate both its own policy and the law simultaneously. An interesting point Egelman raises is that if a developer is unaware of an embedded SDK’s behavior, the SDK itself could violate the app’s policy or the law, and the app should ultimately be held responsible.[10]


[1] https://search.appcensus.io/
[2] https://www.dli.tech.cornell.edu/seminars/Taking-Responsibility-for-Someone-Else's-Code%3A-Studying-the-Privacy-Behaviors-of-Mobile-Apps-at-Scale
[3] Egelman’s studies conclude that a typical user reads 250 words per minute and visits 100 unique websites per month, and that the median privacy policy contains 2,500 words. Reading every such policy would take 200 hours per year, or five 40-hour work weeks, which is impractical.
[4] For a Google product, Egelman spent 15 minutes merely trying to locate an online opt-out of behavioral targeting.
[5] https://www.cvs.com/mobile-cvs/apps
[6] https://blog.appcensus.io/2017/08/25/cvs-discretely-shares-your-location-with-40-other-sites/
[7] https://blog.appcensus.io/2017/08/28/cvs-responds-fake-news/
[8] https://www.cnet.com/news/cvs-app-sends-your-location-to-40-servers-researchers-say/
[9] https://www.cvs.com/help/privacy_policy.jsp
[10] Egelman’s research concludes that most of the privacy violations found were a result of code in the SDK, not in the code of the larger app.

By Heidi He

Cornell Tech

Taking Responsibility for Someone Else's Code: Studying the Privacy Behaviors of Mobile Apps at Scale

The October 7, 2020 edition of the Digital Life Seminar series at Cornell Tech welcomed Serge Egelman, research director of the Usable Security and Privacy group at the International Computer Science Institute (ICSI), an independent research institute affiliated with the University of California, Berkeley. Prof. Egelman is also CTO and co-founder of AppCensus, Inc., a startup that commercializes his research by performing on-demand privacy analysis of mobile apps for developers, regulators, and watchdog groups. Over the hour-long talk, Prof. Egelman presented his study of the privacy behaviors of mobile apps at scale and offered an in-depth view of the challenges facing consumer protection.

Ten years ago, when the privacy and security community focused mainly on the permission system, the key question was whether users understood it. Prof. Egelman's team studied users online and in laboratory experiments and found that people do not notice install-time permission requests, and do not understand them when they do. There is no good way for consumers to understand the privacy implications of apps. The “notice and consent” framework requires consumers to take responsibility for their privacy by reading privacy policies, which is an excessive workload. Moreover, those privacy policies often state that data will be shared with various third parties, each with its own policy, so the resulting web of data sharing can be intricate. Ultimately, a user is unlikely to understand all of these policies.

Although privacy policies are legally mandated in the US and the EU, not all apps comply with their own policies or with relevant laws and regulations. Prof. Egelman’s team built a system that downloads large numbers of Android apps and observes their data flows, detecting exfiltrated data and recording which APIs and SDKs are accessed. After running a large number of apps through this pipeline, the team began examining compliance at scale. Prof. Egelman discussed the case of CVS, in which the user’s location coordinates were shared from the user agent with any third-party content hosted on the same web page. He pointed out that developers are often unaware of their apps’ privacy issues, which exposes them to serious privacy liability. He also noted that 57% of apps in Google Play’s Designed for Families program are in potential violation of the US Children’s Online Privacy Protection Act (COPPA), because platform providers do not enforce their own terms. Moreover, the advertising ID can disclose a user’s data to third parties. Although the ad ID is designed so that users can reset it, 75% of apps transmit it alongside other persistent identifiers, which lets recipients link the old and new IDs and effectively disregards users’ privacy settings.
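Why does sending the advertising ID alongside a persistent identifier matter? The minimal Python sketch below uses entirely made-up records and a hypothetical `persistent_id` field to show how any recipient that receives both values can group a user's old and new advertising IDs under the same device, so resetting the ad ID no longer separates the two histories. It illustrates the general linkage problem only; it is not a model of any particular SDK or tracker.

```python
from collections import defaultdict

# Hypothetical transmissions seen by a third-party server. Each record pairs
# the resettable advertising ID with a persistent identifier; all values are
# invented for illustration.
observed = [
    {"ad_id": "ad-1111", "persistent_id": "dev-42", "event": "opened pharmacy app"},
    {"ad_id": "ad-1111", "persistent_id": "dev-42", "event": "searched allergy meds"},
    # The user resets the advertising ID at this point...
    {"ad_id": "ad-2222", "persistent_id": "dev-42", "event": "opened news app"},
]


def link_profiles(records):
    """Group advertising IDs and events by the persistent ID sent with them."""
    ad_ids_by_device = defaultdict(set)
    events_by_device = defaultdict(list)
    for record in records:
        ad_ids_by_device[record["persistent_id"]].add(record["ad_id"])
        events_by_device[record["persistent_id"]].append(record["event"])
    return ad_ids_by_device, events_by_device


ad_ids, events = link_profiles(observed)

# Keyed on the ad ID alone, the history splits into two apparent users...
print(len({r["ad_id"] for r in observed}))  # 2
# ...but the persistent ID rejoins both ad IDs and all three events.
print(sorted(ad_ids["dev-42"]))  # ['ad-1111', 'ad-2222']
print(len(events["dev-42"]))     # 3
```

The design point is that a reset only protects the user if the advertising ID is the sole identifier a recipient sees; any stable value sent next to it acts as a join key.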

Prof. Egelman’s talk offers an in-depth view of privacy concerns and regulations in mobile apps. Reading every privacy policy word by word already seems an impossible task for an ordinary user, and understanding the underlying network of data flows, along with the relevant laws and regulations, requires even greater expertise. What can consumers, developers, and managers do about this? Prof. Egelman suggested that developers read SDK documentation to identify any configuration options they need to disable to prevent an SDK from violating privacy regulations. In addition, compliance staff should stay informed and offer guidance to the developers in their organizations. Last but not least, anyone in the ecosystem can test apps and examine where their network traffic flows and who receives it.
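As a rough illustration of what examining an app's traffic might involve, the sketch below assumes the tester has planted known values on a test device (a latitude and an advertising ID, both hypothetical) and has exported captured request URLs, for example from an intercepting proxy. It reports which hosts received each planted value. The hostnames, parameter names, and values are invented, and this is a simplified stand-in, not AppCensus's actual method.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit

# Values the tester knows are set on the test device (hypothetical).
PLANTED = {
    "location": "40.7558",  # latitude configured on the test device
    "ad_id": "38400000-8cf0-11bd-b23e-10b96e40000d",
}

# Captured request URLs, e.g. exported from a proxy log (made up).
captured = [
    "https://tracker-a.example/collect?lat=40.7558&lng=-73.9563",
    "https://ads-b.example/bid?aid=38400000-8cf0-11bd-b23e-10b96e40000d",
    "https://cdn-c.example/logo.png?v=3",
    "https://tracker-a.example/ping?adid=38400000-8cf0-11bd-b23e-10b96e40000d",
]


def receivers_by_data_type(urls, planted):
    """Map each planted data type to the set of hosts whose query strings contain it."""
    receivers = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        values = [value for _, value in parse_qsl(parts.query)]
        for label, secret in planted.items():
            if any(secret in value for value in values):
                receivers[label].add(parts.hostname)
    return receivers


for label, hosts in sorted(receivers_by_data_type(captured, PLANTED).items()):
    print(label, sorted(hosts))
# ad_id ['ads-b.example', 'tracker-a.example']
# location ['tracker-a.example']
```

A real analysis would also need to inspect request bodies and headers, handle TLS interception, and catch values that are encoded or hashed before transmission; this sketch only searches URL query strings.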


