DL Seminar | The Platform Data Crisis and How to Solve It
Updated: Apr 18, 2022
Individual reflections by Joseph Cera and Eunice Yoo (scroll below).
By Joseph Cera
Society could benefit from greater access to big tech platform data. In a talk on March 31, 2022, Jonathan Mayer (Assistant Professor of Computer Science and Public Affairs at Princeton University) made a strong case for giving researchers increased access to data from platforms such as Meta and Twitter. Jonathan covered ways in which data access could help address a range of societal concerns, from mental health issues among young women to scandals such as Cambridge Analytica and the Myspace privacy scandal. Tech platforms do not currently provide adequate data access for research.
I reflected on Jonathan’s informative talk from the perspective of a Product Manager with experience delivering digital publishing products and experiences. In my own product development work, ethics, appropriate stakeholder input, key metrics, and long-term business goals were all part of the equation for success.
The massive amounts of behavioral data that big tech platforms hold could allow them to classify the mental states of users and potentially exploit those states to drive revenue. Incentives to drive key engagement metrics may contradict societal goals to improve mental health: if increased screen time drives revenue, and the algorithms show that depressed users spend more time scrolling, there is an implicit incentive to make people feel depressed. Even if this is happening unknowingly, access to data can form the foundation for discussion, and could be the first step toward addressing unforeseen negative impacts on society.

Jonathan made a strong point when he highlighted that courtroom discussion of big tech is often based on anecdotes from ex-employees. Anecdotes may not be enough to move the discussion forward, since big tech companies seem to default to plausible deniability. With access to data, there would be less room for finger-pointing or excuses about undesired product behavior, and we could have a healthy debate about how algorithms and platform features should change to benefit society. The lack of transparency is eroding trust in big tech platforms, and as more time is spent on these platforms, the need for research data access only grows.
Jonathan posed an intriguing thought exercise: what if we could study real user behavior, in real-world settings, and both measure activity and evaluate interventions? Relating back to my reflection: if platforms are unknowingly (or knowingly) pushing users toward unhealthy mental states, studying user behavior in real-world settings is essential. Researchers should be allowed to access the data they need to establish whether a platform is trending toward or away from improving the mental health of the population. If a particular platform is trending in the wrong direction, perhaps a known intervention, such as a feature added to or removed from the platform, could be implemented. Without data access, researchers can't contribute to the conversation or even know there are issues in the first place. Jonathan did a fantastic job highlighting the need for interdisciplinary collaboration on the complex societal issues spawned by tech platforms; grounding these discussions in data would make them more productive and actionable.
By Eunice Yoo
Jonathan Mayer, an assistant professor at Princeton, poses a critical question that we all conveniently overlook: do the giant platforms hold data from which we could learn important insights, so that the platforms could better serve society as a whole? Yet granting access to that data would mean the platform companies sacrificing their proprietary business ingredients. Society therefore faces a dilemma: the platform companies keep expanding their technological capacity to serve the masses, and the speed at which their algorithms learn about and categorize people's behavior is staggering, but do we know where we are headed as a society?
As a consumer, I am an avid Netflix watcher. Netflix has performed particularly well in international markets, making over-the-top (OTT) content prevalent in many households around the world. Observing Netflix gives us a glimpse of the kind of algorithm behind the extreme radicalization that sometimes happens to viewers on YouTube: the more a viewer watches and clicks the like button, the more the viewer will be shown content in a similar genre.
So if a viewer liked a detective series and watched the thriller/mystery genre for several days, that person would be assigned to a certain segment and served similar content. This stickiness keeps the viewer returning to Netflix, because Netflix knows what they like to watch.
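The segmentation logic described above can be sketched as a toy recommender. This is not Netflix's actual system; the titles, genres, and scoring rule are invented purely to illustrate the idea that watching a genre makes that genre more likely to be recommended.

```python
from collections import Counter

def recommend(watch_history, catalog, k=3):
    """Toy sketch: rank unwatched titles by how often their genre
    appears in the viewer's watch history (more watched = more shown)."""
    genre_counts = Counter(genre for _, genre in watch_history)
    watched_titles = {title for title, _ in watch_history}
    unwatched = [item for item in catalog if item[0] not in watched_titles]
    # Stable sort: titles in the viewer's favorite genres rise to the top.
    ranked = sorted(unwatched, key=lambda item: genre_counts[item[1]], reverse=True)
    return [title for title, _ in ranked[:k]]

# Hypothetical viewer who binged mystery/thriller for a few days.
history = [("Sherlock", "mystery"), ("Luther", "mystery"), ("Mindhunter", "thriller")]
catalog = [("Broadchurch", "mystery"), ("Bake Off", "reality"),
           ("Dark", "thriller"), ("Friends", "comedy")]
print(recommend(history, catalog, k=2))  # → ['Broadchurch', 'Dark']
```

Even this trivial rule reproduces the stickiness the reflection describes: the viewer's past choices dominate what is offered next, and genres never sampled never surface.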
If we move the context to YouTube or Facebook, the situation becomes a little more complicated, because digital marketing plays a critical role in the hyper-customization of content. Based on the digital footprints users leave on a platform, they are grouped into segments to which a particular product or brand can be advertised most effectively.
Essentially, “more time spent” by a user on a digital platform translates into “more accuracy” in behavior-based segmentation and “more advertising money” as the user consumes more advertising content. As digital marketing becomes more sophisticated, platform giants and digital marketers will seek ways to leave a stronger impression of a brand or product while also speeding up the consumer's journey from awareness to actual purchase.
Alongside the hyper-customization driven by YouTube's and Facebook's algorithmic recommendations lie the pitfalls of radicalization by extreme content makers. While the platforms' goal is to have more users spend more time in their ecosystems, radicalization by a bad actor can take hold all too easily: a gentle nudge (creating awareness), then repeated recommendation of similar content (no alternative options or thorough evaluation), then stickiness to a limited worldview (indoctrination and confirmation bias).
Ultimately, if we move into a metaverse in which all of our digital footprints are stored on blockchain technology and our physical reality is re-translated into digital reality, the pitfalls of extreme radicalization must be taken into account and acted upon now. If the digital giants will not take the initiative, and the social responsibility, to instill unbiased AI/ML, less commercialized targeting and converting, and limits on the super-narrow audience segmentation that constricts users' worldviews, then it is time for academia, governments, and other organizations, including non-profits, to provide a vision and a roadmap for humankind in the digital transformation era. I therefore think Jonathan Mayer's first moon step toward understanding platform data is the beginning of a bigger change: bringing up an overlooked issue, quantifying what seemed mysterious and proprietary, and thereby prompting a society-wide change.