Jessie G Taft
DL Seminar | Fairness in Networks
Zimeng Zhu & Jingxuan Zhang | MS Students, Cornell Tech
At the DLI Seminar on 17th October, Sorelle Friedler from Haverford College spoke about Fairness in Networks: Understanding Disadvantage and Information Access, how to understand discrimination in social networks, and what interventions can be done to ensure more equal access to information.
Lots of information is being shared through social networks today: access to information spreading through a network can mean knowledge of jobs, public health information, social credit from the government, or even public safety information, such as who is being surveilled. All information spreading through social networks can influence an individual's life from different aspects. This information is either for good, which means that individuals with access to the information have an advantage, or for bad, which means individuals are trying to avoid access to this kind of information (consider surveillance).
One of the most important questions in this area is how to maximize influence in a social network. The goal is to choose where to place initial sources (the seeds) of information so as to maximize the number of people in a social network that receive specific information via word-of-mouth. However, one limitation of this work is that it has not typically focused on the impact that the information has on the individuals in the network.
In the social networks scenario, individuals have a social “position” in the network, which is a class marker defined by the network structure. Social standing within a network confers utility on an individual. Insiders who have a lot of connections in the network usually have more convenient access to relevant information about opportunities for advancement that can, in turn, lead to even better and broader connections. In the meantime, outsiders who have few connections in the network will experience lack of access to such information, and in turn, will find it more difficult to improve their social capital. This access gap may lead to a form of discrimination by social network position that is different from the traditional forms of inequality based on class, race, gender, or other attributes, but nonetheless puts a significant challenge in front of us.
In the work, Friedler and her team put more attention to each individual’s access to information and not just the number of people who are reached in the social networks. They put forward the question about which people are getting the information and how might we ensure that the access gap in information is reduced.
What Friedler and her team proposed is a new way of measuring the information access gap in a social network and how to improve the gaps through intervention, which in this case, by adding new seeds to the network. They modeled the flow of information in a social network according to the independent cascade model. In this model, a set of seeds send out the initial information, with a probability p of reaching each of the nodes in the next round. Once a node sends the information, it will then become inactive, meaning that it will no longer be spreading the information.
Two key concepts that were introduced during the talk were Welfare Functions and Access Gaps. A welfare function is a way of defining the fairness of a network. For example, we can use the expected number of people receiving information as a welfare function. An access gap, on the other hand, is defined as the difference in welfare between two groups of people and acts as an indication of unfairness. Ideally, we can do something to prevent the access gap from getting larger after we introduce new seeds to the network.
However, there is a phenomenon called “the rich get richer” in these social networks, where the access gap increases after intervention, meaning that the people who are more accessible to information will only become more accessible and the gap between the advantaged and disadvantaged groups will grow larger. Unfortunately, in many examples of arbitrary networks, this phenomenon will always exist. Therefore, preventing the rich from getting richer is perhaps too impossible of a task. But another goal that Friedler and her team discovered was to at least guarantee that the welfare of the disadvantaged group will increase after introducing new seeds.
This leads to a new notion of k-imbalance, that describes a welfare function. We say that a particular welfare function is k-imbalanced if, in a social network, the poor may get richer, but still cannot catch up to the level at which the rich started, and if intervention can only help the rich, but does not help the poor. Among the several welfare functions introduced in the talk, only one function is found to be balanced, which is the minimax function, defined as µ−∞. This function represents the minimum probability of all the individual nodes to receive information. Since µ−∞ focuses on the most disadvantaged individuals, there will always be some improvement for these groups after intervention. Thus, if we find a way to maximize this welfare function, we are solving the problem at hand.
The problem of maximizing the minimax function is called maximin, which is proved to be NP-hard by Friedler and her team.
As a work-around, the team used a set of heuristic algorithms on real data to solve the maximin problem. The most basic algorithm is a greedy one. In order to choose k new seeds with the greedy strategy, we will need to do k iterations. In each iteration, we will choose the one candidate seed that maximizes the minimum probability. However, this greedy strategy requires too much computation and thus, the team also came up with improved variations of the algorithm, including a myopic strategy, a naive myopic strategy, and ganzalez, which are more efficient computationally. For example, in the improved myopic strategy, we only choose the node with the smallest probability in the network in that one iteration.
The experiments conducted by Friedler and her team focused on the question of whether maximizing the µ−∞ welfare function really changes the access gaps. In addition, the team also investigated whether the result is different from improving the maximum reach welfare function and the efficiency of the proposed algorithms. To do so, the team used data from real social network databases and varied the scale of intervention between k = 1 to 100. In order to compare the minimax and maximum reach welfare functions, the team used an algorithm called TIM+ to optimize maximum reach.
Firstly, maximin is more than network structure. The experiment evidence shows that when using maximize reach(TIM+) instead of maximin, interventions end up strongly reflecting the existing structure of the network. That is, nodes in the network are more likely to become seeds if they are close to the center of the network, where probabilities of receiving the information are already high and additional interventions are not really needed. However, with intervention such as Myopic, the correlation between nodes’ probability of receiving the information and how well connected the nodes are is reduced.
Maximin is different than maximize reach also when considering that Myopic changes the distribution of probabilities. It significantly decreases the number of nodes with very low probability of receiving the information, and increases the number of nodes with larger probabilities over a broad range of probabilities.
In terms of the different heuristics, there is a tradeoff between effectiveness (maximizing µ−∞ ) and speed. In particular, the heuristics that make no use of estimation usually yield poorer quality results, but they run extremely fast because they avoid the step of estimating probabilities which is time consuming.
In order to understand the possible future improvements, Friedler first identified the limitations of the current work.
Not everyone needs to get information in the social networks, and some might need it more, for instance, consider the information of job openings.
There are feedback loops in these interventions that are not considered by the team.
There are different models for information flow (receiving information more than once might be better.
Furthermore, the possible improvement includes, but is not limited to, determining interventions that help individuals with avoid information access and trying to identify groups of people that share information access characteristics based on the network structure alone. The real world social networks structure and information flow is actually quite complicated, so it is difficult for researchers to consider all factors in a single model. In order to help individuals to get access to the information they need equally, we might better focus on a single factor everytime and try to make progress step by step.