DL Seminar | Security and Ethical Challenges of Big Data

Jessie G Taft
Mar 28, 2019
Updated: Mar 29, 2019

By Congzheng Song and Haojie Zhang | MS Students, Cornell Tech

Illustration by DLI Chronicler, Gary Zamchick.

Reflection 1 | Congzheng Song

In the era of big data, Internet of Things (IoT) devices collect data about our daily lives to make “smarter” decisions for us. Yet this massive amount of personal data raises questions about our privacy. Jessica Vitak, associate professor at the University of Maryland’s iSchool, gave a talk on her studies of privacy and surveillance on smartphones and intelligent personal assistants like Siri and Alexa, as well as the ethical challenges they raise.

Sensors, CCTV, smartphones and other IoT devices collect gigabytes of data about us on a daily basis. On the one hand, these devices collect data to help us better quantify ourselves: fitness trackers can measure the steps we take, the hours we sleep and more; smart home devices can help us save electricity. On the other hand, people know little about how this personal data is used, shared and analyzed by the platforms behind the devices. Through two case studies, Vitak took a deeper look at the privacy concerns raised by the data these smart devices collect and at how consumers evaluate the risks.

Vitak’s first case study focused on fitness trackers such as Fitbit and Apple Watch. This small device on your wrist collects a great deal of personal information, including steps taken, distance traveled, time slept, heart rate and sometimes GPS coordinates. This data, collected by the fitness tracker company, might be shared with other companies for analysis that goes beyond reporting your fitness statistics. With the fitness data in hand, one can infer your dietary habits, stress level, movement patterns and presumably even matters bearing on your insurance rates and fidelity. Not surprisingly, police and attorneys have used fitness data to investigate murder cases and as evidence in court.

Even though Fitbit states clearly that your personally identifiable data is not sold or shared with anyone else, aggregates of your data might be shared. It is not clear to what extent privacy is protected in this case. To explore users' privacy concerns, Vitak and her group interviewed Fitbit users about how much they knew about the company's data practices and about their attitudes toward the collected data. She found that users spent very little time trying to understand Fitbit’s data policy, and even those who had read and understood the privacy terms had few privacy concerns. Furthermore, users judged the benefits of these devices to outweigh their concerns about the collected data.

The second case study focused on the increasingly popular intelligent personal assistants (IPAs) such as Amazon Alexa and Google Home. These assistants are voice-based and passively listen for users’ commands to perform simple tasks such as turning on room lights or playing music. The voice commands are recorded and stored on the backend, which again raises privacy concerns. In this study, Vitak found that people with lower privacy concerns and higher data confidence are more likely to own IPAs. Users may also rationalize privacy risks by emphasizing device benefits.

Knowing that only a small portion of smart device consumers have concerns about their privacy, how should we help more of them become aware of the potential misuse of the collected personal data? Vitak offered several recommendations. We should embed privacy in the design of devices and make privacy the default setting. We should recognize that information norms vary across contexts: we might be fine with sharing personal data in one context and uncomfortable doing so in another. We should nudge users towards privacy-focused decision making rather than forcing people to make the “right” choice. Big data technology has changed our lives at a very fast pace, and many of us overlook the risks to our personal data. With a growing body of dedicated research on the security and privacy of personal data, we can expect to see more regulation of data collection and sharing, and more awareness of the problem among ourselves.

Reflection 2 | Haojie Zhang

With the development of information technology, the world has never produced such a massive amount of trackable data as it does today. It is no exaggeration to say that we have entered the era of big data, both in terms of the amount of data generated and the technology for processing it. In ancient times, before writing was invented, people stored information in their brains, so the information they generated and saved was evanescent and difficult to retrieve. Once written records appeared, it became much easier to save and exchange information. In the 20th century, Claude Shannon, the founder of information theory, proposed the "bit" as the basic unit of information and published two papers that laid the foundation of the field. Since then, human civilization has raced down the road of informatization. The vast amount of data utilized by AI technology has greatly enriched modern people’s everyday lives. However, such technology also creates threats to privacy protection and information security. Jessica Vitak, associate professor at the University of Maryland’s iSchool, whose research largely focuses on networked privacy and the social impact of ICTs, presented a wonderful talk entitled "Privacy, Security, and Ethical Challenges in the Era of Big Data" on March 21, 2019.

Jessica Vitak started her talk with the ‘quantified self’, which refers to understanding ourselves through self-tracking data generated by our daily use of mobile apps and other kinds of devices. She then walked through the idea of the smart home. Underlying these practices are the rapid development of smarter and better sensors, public sharing, and the emergence of cloud computing. Along with these technological advancements come concerns about how people’s personal data should be used by companies and governments. For example, in Vitak’s study of perceptions of personal fitness information, her team interviewed fitness device users and made the following findings: users have low concerns about their fitness data; their knowledge of the company’s privacy policy is often unrelated to their level of concern; perceived benefits outweigh perceived disadvantages; and users mostly interact with their trackers through mobile apps, which lack fine-grained privacy settings. In addition, the researchers found that most Fitbit users relied on the default privacy settings on their devices. In her second case study, she and her group turned their attention to smart home technologies such as Google Home. They found that people with low data concerns or high data confidence are more likely to own a home IPA, and that people rationalize privacy risks by emphasizing the benefits and convenience of using those devices. For other people, however, privacy and security concerns may serve as obstacles to adopting these smart home technologies. At the end of the talk, she pointed out that norms vary across contexts and that, to give people better control over their privacy, products should be designed to embrace privacy protection and to nudge users toward privacy-focused decision making.

Here I want to introduce several actions that people can take to better protect their data privacy in daily life. First, when you surf the internet from a desktop, set your browser to automatically clear cookies when it is closed. Since cookies are the primary tool websites use to track users, clearing them cuts off much of that tracking on most websites and therefore makes it harder for website providers to target users with advertisements. If you access websites through mobile devices, try to use incognito mode to further protect your privacy, since some mobile browsers may not provide the option of clearing cookies. However, using incognito mode on your desktop browser all the time is not recommended: pages you have not saved cannot be retrieved from your history, which is inconvenient. Also, in incognito mode all the data downloaded from websites is cleared when the session ends, which means that for frequently visited websites the same data has to be downloaded again and again, leading to longer response times. In addition, when using mobile devices, people should be careful about the permissions granted to each app. The most important permissions to focus on are location, SMS, contacts, app list, camera, and images. These permissions are closely related to personal privacy: location reveals where you currently are; SMS access exposes the verification codes you receive; contacts can be used to analyze your relationship graph; the app list reflects your preferences; and the camera and images can capture your surroundings. If you do not want to leak private information, the first thing you should do is check the permissions an app requests before installing it on your mobile device.
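As a rough illustration of the permission check described above, here is a minimal Python sketch. It assumes you have already written down the permissions an app requests as plain category names. The names in `SENSITIVE_PERMISSIONS` and the `flag_sensitive` helper are hypothetical and simply mirror the list in the text; they are not real Android or iOS permission identifiers.

```python
# Hypothetical illustration: these category names mirror the list in the text.
# They are NOT real Android or iOS permission identifiers.
SENSITIVE_PERMISSIONS = {
    "location": "reveals where you currently are",
    "sms": "exposes verification codes you receive",
    "contacts": "can be used to analyze your relationship graph",
    "app_list": "reflects your preferences",
    "camera": "can capture your surroundings",
    "images": "can capture your surroundings",
}


def flag_sensitive(requested):
    """Return {permission: risk} for each requested permission that is sensitive."""
    # Normalize case and whitespace so "Camera" and " camera " are treated alike.
    normalized = {p.strip().lower() for p in requested}
    return {
        p: SENSITIVE_PERMISSIONS[p]
        for p in sorted(normalized)
        if p in SENSITIVE_PERMISSIONS
    }


if __name__ == "__main__":
    # A made-up request list for an imaginary flashlight app.
    requested = ["camera", "location", "vibrate"]
    for permission, risk in flag_sensitive(requested).items():
        print(f"{permission}: {risk}")
```

A request that includes several of these categories is not necessarily malicious, but it is a reasonable prompt to ask whether the app really needs that access.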