
DL Seminar | Studying the Privacy Behaviors of Mobile Apps at Scale

Updated: Apr 25, 2021


Individual reflections by Frank C. Barile and Heidi He (scroll down).



By Frank C. Barile

Cornell Tech

Code Reuse: Software Development Kits and Resulting Mobile Application Privacy Violations

On October 7, 2020, the Digital Life Seminar series at Cornell Tech welcomed Serge Egelman, Research Director of the Usable Security and Privacy group at the International Computer Science Institute, an independent research institute affiliated with the University of California, Berkeley. Egelman is also the CTO and co-founder of AppCensus, Inc. [1], a startup offering on-demand privacy analysis of mobile apps. Egelman’s seminar was entitled “Taking Responsibility for Someone Else's Code: Studying the Privacy Behaviors of Mobile Apps at Scale” [2].

Code Reuse


Egelman focused on “code reuse”, which entails software developers using pre-fabricated portions of bundled code (sometimes called “Software Development Kits” or “SDKs”) to complement larger software production. SDKs aim for efficiency, allowing a developer to avoid reconstructing the building blocks each time they develop a program. This is similar to preparing a dish with pre-packaged ingredients - for example, adding a pre-mixed collection of spices to a sauce instead of sourcing, preparing, and combining the spices manually. A preparer who uses a bundled product to get started naturally has less control over the total contents of the final product; one who creates a dish from scratch knows its contents in totality. The use of SDKs is widespread, as plugging in pre-engineered portions of code (just as the cook can shortcut food preparation) can save developer resources, rather than reinventing that code each time they develop software.
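From the developer's side, adopting an SDK often amounts to a single call into code they did not write. The self-contained Python sketch below (all names are hypothetical, and the "SDK" is a stand-in class rather than any real library) illustrates the point: the developer's contribution is one line, while everything else happens inside the reused code.

```python
# A minimal, self-contained sketch of code reuse. "HypotheticalAnalyticsSDK"
# stands in for a bundled third-party SDK; no real product is implied.
class HypotheticalAnalyticsSDK:
    """Pre-packaged code the app developer did not write and may not inspect."""

    def init(self, api_key: str) -> None:
        # The developer sees only this entry point...
        self._collect_device_details()  # ...but the SDK may also gather data
        self._send_to_vendor()          # ...and ship it off on its own.

    def _collect_device_details(self) -> None:
        print("SDK: reading identifiers the host app never asked about")

    def _send_to_vendor(self) -> None:
        print("SDK: uploading to the vendor's servers")


def developer_code() -> None:
    # From the app developer's perspective, adopting the SDK is one line:
    HypotheticalAnalyticsSDK().init(api_key="EXAMPLE-KEY")


if __name__ == "__main__":
    developer_code()
```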

Dangers


Code reuse comes with insidious dangers that pervade the development industry. A developer who does not design each portion of a program cedes control to the developer of the SDK, and may not even be aware of the SDK’s functions. The reused code in any SDK may not behave in the manner imagined, and might even have unintended consequences. For example, Egelman outlined how the proliferation of code reuse has resulted in rampant data privacy violations, particularly in the space of personal data. This practice also raises questions of accountability for the final product: which party would be liable for data breaches? Perhaps both the app developer and the SDK vendor?

Quandary


App users may find themselves in a familiar quandary: “I need apps. I don’t have time [3] to read all the disclosures or options [4]; and if I did, I might not understand them. And even if I did comprehend them, I’d likely just press ‘accept’, especially given that this service is free. Besides, I really don’t care what they do with my location data at a particular moment.” If users aren’t motivated to protect their own data, and app developers have no incentive to protect it either, guardianship falls to regulatory authorities, who haven’t acted fast enough to combat the ills of code reuse or data breaches. The result is the mass sharing of user data - whether users are aware of it or not - with apps that then use the data, or even share it with other parties.
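The reading-burden estimate in note [3] is easy to verify; a quick back-of-the-envelope calculation using the figures from the talk:

```python
# Back-of-the-envelope check of the figures in note [3]: a 2,500-word median
# privacy policy, read at 250 words per minute, for 100 unique sites per month.
words_per_policy = 2500
reading_speed_wpm = 250
sites_per_month = 100

minutes_per_policy = words_per_policy / reading_speed_wpm     # 10 minutes each
policies_per_year = sites_per_month * 12                      # 1,200 policies
hours_per_year = policies_per_year * minutes_per_policy / 60  # 200 hours

print(f"{hours_per_year:.0f} hours per year, "
      f"about {hours_per_year / 40:.0f} full 40-hour work weeks")
```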

Research


Egelman has conducted extensive research to identify these issues and support his conclusions. In one study, Egelman surveyed 308 Android users and concluded that most users don’t understand permission requests. In fact, they may not even read the requests, calling into question the utility of disclosures. Ultimately, Egelman found that privacy violations were ubiquitous - whether violations of law, or even violations of an app’s own privacy policy.

Further, because code reuse involves not just the final development but also the original design of any embedded SDK, it is possible for the SDK portion of the code to violate the app’s privacy policy on its own. This is an especially critical blind spot if developers are not aware of the SDK’s behavior. Thus, the onus should be on developers to investigate SDK code they use in their final product, raising further questions about how transparent an SDK should be about its functionality.

Cautionary Tale


Another Egelman study utilized a click-farm of smartphones that accessed apps, read privacy policies, and tracked data sharing. Of the apps surveyed, CVS Pharmacy [5] was far and away the worst offender, sharing a user’s location data with forty external parties. [6] Egelman suspected that CVS’s app was developed using SDKs and that CVS was likely unaware of the external data sharing permitted in the SDK code. Egelman contacted CVS, which denied the claims, even suggesting that users could simply turn the app off to resolve the issue. [7] Egelman then went to the press, which published the account. [8] Thereafter, CVS resolved the issue within one week. CVS currently has a lengthy privacy policy [9] with explicit opt-out channels on its website. Egelman’s research broadly suggests this occurrence is neither anomalous nor limited to CVS.

Scienter


It would be an absurd conclusion to absolve a developer from liability for unknowingly deploying SDKs that leak data. If developers use SDKs, they should naturally shoulder the burden of investigating the SDK before rolling it into production. If they cannot conclude that the SDK is secure, best practice may be to eschew it. SDK vendors, in turn, should offer transparency about the behavior of their designs.

Ultimately, it should not matter whether the developer knew of the behavior of the SDK that is integrated into their app. While fraud usually requires some knowledge of falsehood, recent data privacy regulations generally do not require knowledge of wrongdoing, deceit, or falsehood; one can be liable merely for sharing data in violation of a rule. Developers should take precautions as a prophylactic measure to protect users, and perhaps also themselves.

Regulation


Regulations protecting data privacy do exist in certain jurisdictions. Apart from the law, apps usually maintain their own privacy policies. It is important to understand the distinction between violating one or the other, as the consequences differ: violating either a law or a privacy policy can carry criminal or civil penalties, but the app sets its own privacy policy, whereas it does not set the law. An app could even violate both its own policy and the law simultaneously. An interesting point Egelman raises is that if a developer is unaware of the behaviors of an embedded SDK, not only may the app violate the law or its own policy, but the SDK could do so as well - and the app should ultimately be responsible. [10]

Solutions


When asked by consumers how to determine which apps are egregious offenders, Egelman jests: identify the permission-protected application programming interfaces, disassemble the app binaries, and then perform deep packet inspection. This technical process illustrates Egelman’s point: it would be absurd to assume that the everyday user is qualified or even equipped to investigate an app’s privacy practices. If users cannot reasonably or efficiently protect themselves, the potential solution may lie in government regulation.
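For a sense of what that last step involves, here is a rough sketch of traffic inspection using the open-source mitmproxy tool (chosen here purely for illustration; the identifier values are placeholders for whatever is assigned to the test device, and this is not AppCensus's actual pipeline):

```python
# flag_identifiers.py - route a test phone's traffic through a proxy with
# `mitmdump -s flag_identifiers.py` and flag requests that carry known
# identifier values. Placeholder values; for illustration only.
from mitmproxy import http

# Values observed on or assigned to the test device (placeholders).
WATCHED_VALUES = {
    "advertising_id": "00000000-0000-0000-0000-000000000000",
    "latitude": "40.7550",
    "longitude": "-73.9680",
}

def request(flow: http.HTTPFlow) -> None:
    # Search the URL and body of every outgoing request for watched values.
    body = flow.request.get_text(strict=False) or ""
    haystack = flow.request.pretty_url + body
    for name, value in WATCHED_VALUES.items():
        if value in haystack:
            print(f"[flag] {name} sent to {flow.request.pretty_host}")
```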

Egelman proposed additional solutions in the absence of comprehensive regulation, the primary one being the service he built at AppCensus, which grew out of his privacy research. AppCensus is a product that enables users to examine the privacy behaviors of mobile apps in an automated fashion. [11] Most app users likely don’t have the resources - given the multitude of apps used in daily life - to assess whether an app may be violating laws, or even its own privacy policy. AppCensus has dynamically analyzed hundreds of thousands of popular Android apps to assess (a) what data they collect, (b) whom they share it with, and (c) how these practices may conflict with privacy policies or laws. AppCensus users can then stay informed on privacy practices with a fraction of the effort.
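Once flows like the ones flagged above have been observed, the remaining step is aggregation. The toy sketch below (invented app and domain names) shows the kind of per-app summary such a pipeline can produce for comparison against an app's stated policy:

```python
# Toy aggregation of dynamic-analysis observations into a per-app report.
# Each record is (app, data type, destination host); all values are invented.
from collections import defaultdict

observations = [
    ("example.pharmacy.app", "precise_location", "ads.example-network.com"),
    ("example.pharmacy.app", "precise_location", "analytics.example-vendor.net"),
    ("example.game.app", "advertising_id", "tracker.example-exchange.io"),
]

report = defaultdict(lambda: defaultdict(set))
for app, data_type, destination in observations:
    report[app][data_type].add(destination)

for app, shared in report.items():
    print(app)
    for data_type, destinations in sorted(shared.items()):
        print(f"  {data_type} -> {', '.join(sorted(destinations))}")
```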

Further, Egelman suggests that developers should be aware of the risk when using SDKs and should be responsible for reviewing the code in the SDK, as they ultimately adopt it as their own when embedding it in their product. Developers should engage compliance personnel to raise awareness of and investigate these issues. Lastly, testing should be a standard practice and ultimately a requirement (whether by law, policy, or both). This will keep developers aware of app behavior (whether in their own code or in SDK code) before it is pushed into production and potentially puts users at risk.

Advocacy


In the larger context of data privacy, society should move toward more disclosure so that users can make informed decisions. However, considering the aforementioned research, disclosure, while transparent, may not yield a better result, particularly if it becomes so burdensome that users will not consume it. Perhaps a better solution that strikes a balance between users and apps is to regulate the collection, use, and sharing of data; require transparency through disclosure (but disclosure that is meaningful and legible); and impose strict penalties on violators. One can look at analogous regimes in the European Union, in particular the General Data Protection Regulation [12], though there likely isn’t a critical mass of enforcement yet to provide comprehensive guidance.

Contextual Integrity


With the recent proliferation of technologies that stimulate data sharing, privacy interests increasingly compete against such sharing. Regulation has evolved in certain jurisdictions, such as the US, Canada, Australia, the European Union, and the UK, but not in others. [13] Contextual Integrity [14], as described by Professor Helen Nissenbaum, is a rubric with which to think about privacy, data, and the context in which data is used or shared. Factors to consider include how personal the data is, whether its collection was consented to, whether the user was aware, and whether the user comprehended the disclosures. Applying Professor Nissenbaum’s concepts, perhaps the tide of data privacy regulation turns only when society as a whole deems it prudent to force legislators to hold data collectors accountable.

Turning back to striking a balance of interests, Contextual Integrity would perhaps seek equilibrium here; for example, data use or sharing may be socially permissible in one context, but not in another. A malleable standard, while it may invite disparate enforcement and inequity at times, provides flexibility for arbiters, and also for data collectors to scale their practices against such social norms.

Conclusion


Leveraging SDKs can be efficient for developers but comes with accompanying privacy risks that should be assessed. Everyday consumers likely aren’t equipped to sift through quantities of disclosure, policies, and consent forms. Research shows that data breaches occur, whether nefarious or otherwise. New regulations are likely necessary to protect users. Regulation has matured, but is not yet sufficient to keep pace with technology advances. Any resulting regimes should balance the interests of users and developers. If we cannot strike a perfect balance, we should at the very least impose some accountability on the developers and data collectors so we can better protect the privacy rights of the individual.

[1] https://search.appcensus.io/
[2] https://www.dli.tech.cornell.edu/seminars/Taking-Responsibility-for-Someone-Else's-Code%3A-Studying-the-Privacy-Behaviors-of-Mobile-Apps-at-Scale
[3] Egelman’s studies conclude that most users can read 250 words per minute, access 100 unique websites per month, and the median privacy policy contains 2500 words. This would result in 200 hours per year of reading, or 5 weeks of an ordinary 40-hour work week, and thus, is impractical.
[4] For a Google product, Egelman spent 15 minutes merely trying to locate an online opt out of behavioral targeting.
[5] https://www.cvs.com/mobile-cvs/apps


By Heidi He

Cornell Tech


Taking Responsibility for Someone Else's Code: Studying the Privacy Behaviors of Mobile Apps at Scale


The October 7, 2020 edition of the Digital Life Seminar series at Cornell Tech welcomed Serge Egelman, the research director of the Usable Security and Privacy group at the International Computer Science Institute (ICSI), an independent research institute affiliated with the University of California, Berkeley. Prof. Egelman is also CTO and co-founder of AppCensus, Inc., a startup that is commercializing his research by performing on-demand privacy analysis of mobile apps for developers, regulators, and watchdog groups. Over the hour-long talk, Prof. Egelman introduced us to his study of the privacy behaviors of mobile apps at scale and his in-depth view of the challenges in consumer protection.


Ten years ago, when the focus of the privacy and security community was largely limited to the permission system, the question was whether users understood that system. Prof. Egelman’s team studied users online and in laboratory experiments and found that people do not notice install-time permission requests, and do not understand them when they do. There is no good way for consumers to understand the privacy implications of apps. The “notice and consent” framework requires consumers to take responsibility for their privacy by reading privacy policies, which is an excessive workload. Moreover, those privacy policies often state that data will be shared with various third parties, each of which maintains its own policy, and the sharing arrangements can be intricate. After all, it is unlikely that a user will understand all of the policies.


Although privacy policies are legally mandated in the US and the EU, not all apps comply with their own policies or with relevant laws and regulations. Prof. Egelman’s team built a system that downloads large numbers of Android apps and observes their data flows, detecting exfiltrated data as well as the APIs and SDKs accessed. After testing an abundance of apps with their pipeline, they started to examine compliance at scale. Prof. Egelman discussed the case of CVS, where the user’s location coordinates were shared from the user agent with any third-party content hosted on the same web page. He pointed out that developers are often unaware of their apps’ privacy issues, which can create serious privacy liability. He also mentioned that 57% of designed-for-families apps are in potential violation of the US Children’s Online Privacy Protection Act because platform providers do not enforce their own terms. Moreover, the advertising ID can disclose one’s data to third parties. Although the ad ID is designed so that users can reset it, 75% of apps transmit the ad ID alongside other persistent identifiers, undermining users’ privacy settings.
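To see why transmitting the ad ID alongside a persistent identifier defeats the reset, consider a small illustration (all values invented): once both identifiers appear in the same records, a recipient can trivially re-link the old and new ad IDs.

```python
# Invented values illustrating why a resettable advertising ID does not help
# when it is sent alongside a persistent identifier: the recipient can join
# the "before" and "after" records on the identifier that never changes.
before_reset = {"ad_id": "aaaa-1111", "hardware_id": "persistent-device-42"}
after_reset  = {"ad_id": "bbbb-2222", "hardware_id": "persistent-device-42"}

ad_id_changed = before_reset["ad_id"] != after_reset["ad_id"]
still_linkable = before_reset["hardware_id"] == after_reset["hardware_id"]

print("Ad ID changed after reset:", ad_id_changed)    # True
print("Records still linkable:   ", still_linkable)   # True: reset defeated
```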


Prof. Egelman’s talk offers an in-depth view of privacy concerns and regulation in mobile apps. While reading every privacy policy word by word is an impossible task for a general user, understanding the network structure as well as the laws and regulations requires an even higher level of expertise and skill. What can consumers, developers, and managers do about this? Prof. Egelman suggested that developers read SDK documentation to check for any configuration they need to disable to prevent the SDK from violating privacy regulations. In addition, compliance personnel should be aware of these issues and offer guidance to the developers in their organization. Last but not least, everyone in the ecosystem can actually test the apps and look at the traffic flows and their recipients.
