DL Seminar | Studying the Privacy Behaviors of Mobile Apps at Scale
Updated: Nov 2, 2020
Individual reflections by Frank C. Barile and Heidi He (scroll down).
By Frank C. Barile
Code Reuse: Software Development Kits and Resulting Mobile Application Privacy Violations
On October 7, 2020, the Digital Life Seminar series at Cornell Tech welcomed Serge Egelman, Research Director of the Usable Security and Privacy group at the International Computer Science Institute, an independent research institute affiliated with the University of California, Berkeley. Egelman is also the CTO and co-founder of AppCensus, Inc., a startup offering on-demand privacy analysis of mobile apps. Egelman’s seminar was entitled “Taking Responsibility for Someone Else's Code: Studying the Privacy Behaviors of Mobile Apps at Scale”.
Egelman focused on “code reuse,” the practice of software developers incorporating pre-fabricated bundles of code (often called “Software Development Kits,” or “SDKs”) into larger software projects. SDKs aim for efficiency, allowing a developer to avoid reconstructing common building blocks each time they write a program. This is similar to preparing a dish with pre-packaged ingredients: adding a pre-mixed spice blend to a sauce, for example, instead of sourcing, preparing, and combining the spices manually. A cook who starts from a bundled product naturally has less control over the total contents of the final dish, whereas one who cooks from scratch knows its contents in totality. The use of SDKs is widespread because it saves developer resources: just as the cook can shortcut food preparation, the developer can plug in pre-engineered code rather than reinvent it with each new project.
Code reuse comes with insidious dangers that pervade the development industry. A developer who does not design every portion of a program cedes control to the author of the SDK, and may not even be aware of everything the SDK does. Reused code may not behave in the manner imagined and can have unintended consequences. Egelman, for example, outlined how the proliferation of code reuse has produced rampant privacy violations, particularly involving personal data. The practice also raises questions of accountability for the final product: which party is liable for a data breach? Both the app developer and the SDK maker, perhaps?
App users may find themselves in a familiar quandary: “I need apps. I don’t have time to read all the disclosures or options; and if I did, I might not understand them. And even if I did comprehend them, I’d likely just press ‘accept,’ especially given that the service is free. Besides, I really don’t care what they do with my location data at a particular moment.” If users aren’t motivated to protect their own data, and app developers feel no pressure to protect it either, guardianship falls to regulatory authorities, who have not acted fast enough to combat the ills of code reuse or data breaches. The result is the mass sharing of user data, whether users are aware of it or not, with apps that then use the data or even pass it on to other parties.
It would be absurd to absolve a developer from liability for unknowingly deploying SDKs that leak data. Developers who use SDKs should naturally shoulder the burden of investigating them before rolling them into production. If they cannot conclude that an SDK is secure, best practice may be to eschew it. SDK vendors, in turn, should be transparent about how their code behaves.
Ultimately, it should not matter whether the developer knew how the SDK integrated into their app behaves. While fraud usually requires some knowledge of falsehood, recent data privacy regulations generally do not require knowledge of wrongdoing, deceit, or falsehood; one can be liable merely for sharing data in violation of a rule. Developers should therefore take prophylactic precautions to protect users, and perhaps themselves as well.
When asked how consumers can identify the egregious offenders among apps, Egelman jests: seek out permission-protected application programming interfaces, disassemble the app binaries, and then perform deep packet inspection. The technicality of this process illustrates Egelman’s point: it would be absurd to assume that the everyday user is qualified, or even equipped, to investigate an app’s privacy behaviors. If users cannot reasonably or efficiently protect themselves, the solution may lie in government regulation.
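To make the point concrete, here is a toy sketch of the last step of that process, deep packet inspection: scanning captured outbound requests for known personal identifiers. Everything here is hypothetical for illustration (the hosts, payloads, and identifier values are invented, and real inspection works on live, often encrypted traffic); this is not Egelman’s or AppCensus’s actual tooling.

```python
# Toy deep-packet-inspection pass: scan captured outbound HTTP requests
# for personal identifiers. All hosts, payloads, and identifier values
# below are hypothetical, invented for this sketch.

SENSITIVE = {
    "advertising_id": "38400000-8cf0-11bd-b23e-10b96e40000d",
    "latitude": "40.7557",
    "longitude": "-73.9565",
}

def flag_leaks(requests):
    """Return (host, field) pairs for every identifier found in a payload."""
    leaks = []
    for req in requests:
        for field, value in SENSITIVE.items():
            if value in req["body"]:
                leaks.append((req["host"], field))
    return leaks

captured = [
    {"host": "ads.example-sdk.com", "body": "aid=38400000-8cf0-11bd-b23e-10b96e40000d"},
    {"host": "api.example-app.com", "body": "lat=40.7557&lon=-73.9565"},
    {"host": "cdn.example.com",     "body": "asset=logo.png"},
]

for host, field in flag_leaks(captured):
    print(f"{host} received {field}")
```

Even this trivial version requires knowing what the identifiers look like and capturing the traffic in the first place, which is exactly why the everyday user cannot be expected to do it.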
In the absence of comprehensive regulation, Egelman proposed additional solutions, the primary one being AppCensus, a service that grew out of his privacy research. AppCensus enables users to examine the privacy behaviors of mobile apps in an automated fashion. Most app users likely don’t have the resources, given the multitude of apps used in daily life, to assess whether an app may be violating laws, or even its own privacy policy. AppCensus has dynamically analyzed hundreds of thousands of popular Android apps to assess (a) what data they collect, (b) whom they share it with, and (c) how these practices may conflict with privacy policies or laws. AppCensus users can then stay informed about privacy practices with a fraction of the effort.
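Step (c) above, checking observed behavior against stated policy, can be sketched as a simple cross-reference: take the recipients a policy discloses, take the recipients observed in traffic, and flag the difference. The domains and data labels below are hypothetical illustrations, not AppCensus’s actual data model or results.

```python
# Sketch of assessing policy conflicts: compare observed data recipients
# against the third parties declared in the app's privacy policy.
# All domains and data labels are hypothetical.

declared_recipients = {"analytics.example.com"}

observed_transmissions = {
    "analytics.example.com": {"app_usage"},
    "tracker.example-sdk.net": {"advertising_id", "location"},
}

# Any observed recipient the policy never mentions is a potential conflict.
undisclosed = {
    host: data
    for host, data in observed_transmissions.items()
    if host not in declared_recipients
}

for host, data in sorted(undisclosed.items()):
    print(f"undisclosed recipient {host}: {', '.join(sorted(data))}")
```

The value of automating this at scale is that the same comparison runs identically across hundreds of thousands of apps, which no individual user could do by reading policies.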
Further, Egelman suggests that developers should be aware of the risks of using SDKs and should be responsible for reviewing the code in an SDK, since they ultimately adopt it as their own when embedding it in their apps. Developers should engage compliance personnel to raise awareness of and investigate these issues. Lastly, testing should be standard practice and ultimately a requirement (whether by law, policy, or both), keeping developers aware of how their apps behave (whether the behavior comes from their own code or an SDK’s) before the code is pushed into production and potentially puts users at risk.
In the larger context of data privacy, society should move toward more disclosure so that users can make informed decisions. Yet, considering the research above, disclosure, while transparent, may not yield a better result, particularly if it becomes so burdensome that users will not consume it. A better solution, one that strikes a balance between users and apps, may be to regulate the collection, use, and sharing of data; to require transparency through disclosure that is meaningful and legible; and to impose strict penalties on violators. One can look to analogous regimes in the European Union, in particular the General Data Protection Regulation, though there likely isn’t yet a critical mass of enforcement to provide comprehensive guidance.
With the recent proliferation of technology that stimulates data sharing, privacy now competes more directly against such sharing. Regulation has evolved in certain jurisdictions, such as the US, Canada, Australia, the European Union, and the UK, but not in others. Contextual Integrity, as described by Professor Helen Nissenbaum, is a rubric for thinking about privacy, data, and the context in which data is used or shared. Factors to consider include how personal the data is, whether its use was consented to, whether the user was aware, and whether the user comprehended the disclosures. Applying Professor Nissenbaum’s concepts, perhaps the tide of data privacy regulation turns only when society as a whole deems it prudent to force legislators to hold data collectors accountable.
Turning back to the balance of interests, Contextual Integrity would seek equilibrium here; data usage or sharing may be socially permissible in one context but not in another. A malleable standard, while it may at times invite disparate enforcement and inequity, gives arbiters flexibility and lets data collectors scale their practices against such social norms.
Leveraging SDKs can be efficient for developers, but it comes with privacy risks that should be assessed. Everyday consumers likely aren’t equipped to sift through volumes of disclosures, policies, and consent forms. Research shows that data breaches occur, whether nefarious or inadvertent. New regulation is likely necessary to protect users: regulation has matured, but not enough to keep pace with advances in technology. Any resulting regime should balance the interests of users and developers. If we cannot strike a perfect balance, we should at the very least impose some accountability on developers and data collectors, so that we can better protect the privacy rights of the individual.
References:
- https://blog.appcensus.io/2017/08/28/cvs-responds-fake-news/
- https://www.cnet.com/news/cvs-app-sends-your-location-to-40-servers-researchers-say/
- https://www.cvs.com/help/privacy_policy.jsp
- Egelman’s research concludes that most of the privacy violations found were a result of code in the SDK, not in the code of the larger app.
- https://search.appcensus.io/about
- https://gdpr-info.eu/
- https://www.dlapiperdataprotection.com/
- https://crypto.stanford.edu/portia/papers/RevnissenbaumDTP31.pdf
By Heidi He
Taking Responsibility for Someone Else's Code: Studying the Privacy Behaviors of Mobile Apps at Scale
The Oct. 7, 2020 edition of the Digital Life Seminar series at Cornell Tech welcomed Serge Egelman, research director of the Usable Security and Privacy group at the International Computer Science Institute (ICSI), an independent research institute affiliated with the University of California, Berkeley. Prof. Egelman is also CTO and co-founder of AppCensus, Inc., a startup commercializing his research by performing on-demand privacy analysis of mobile apps for developers, regulators, and watchdog groups. Over the hour-long talk, Prof. Egelman introduced us to his study of the privacy behaviors of mobile apps at scale and to his in-depth view of the challenges in consumer protection.
Ten years ago, when the focus of the privacy and security community was limited to the permission system, the question was whether users understood that system. Prof. Egelman's team studied users online and in laboratory experiments and found that people do not notice install-time permission requests, and do not understand them when they do. There is no good way for consumers to understand the privacy implications of apps. The “notice and consent” framework requires consumers to take responsibility for their privacy by reading privacy policies, an excessive workload. Moreover, those policies often state that data will be shared with various third parties, each of which maintains its own policy, and the sharing arrangements can be intricate. Ultimately, it is unlikely that a user can understand all of the policies in play.