DL Seminar | Modeling COVID with Mobility Data to Understand Inequality and Guide Reopening
Updated: Apr 25
Individual reflections by John DeFelice, Michael Hewson, and Victoria Woo (scroll below).
By John DeFelice
The COVID-19 global pandemic has led to an unprecedented focus on the spread of infectious disease. In the midst of a delicate battle against public health crisis and economic survival, we have raced to answer questions pertaining to primary causes of spread. During a presentation highlighting her research and modelling, Emma Pierson—a senior researcher and data scientist at Microsoft—argued the central importance of urban mobility and its correlation to COVID-19 infection rates.
Emma and her team used aggregated cell phone data to trace mobility across 10 of the largest metropolitan areas in the United States, covering about 98 million individuals. The study analyzed movement between two kinds of locations. First, the data focused on neighborhoods, represented as census block groups (CBGs). Second, key points of interest (POIs) were traced among the CBGs. For example, Emma’s data showed a significant drop in CBG movement between March and April 2020.
Next, Emma’s research centered on a related epidemiological model, which accounted for four key stages of disease progression: 1) Susceptibility, 2) Exposure, 3) Infection, and 4) Removal. Of particular importance, the research team considered the transition between stages 1 and 2 (Susceptibility and Exposure, respectively). Backed by real-world COVID spread data, Emma found a key correlation: rates of exposure were higher in dense metropolitan areas.
Indeed, the models back the real-world narrative we’ve all lived these past 6 months. What does this all mean, and how can the research be helpful moving forward?
Of Emma’s key results, I find POI risk, reopening considerations, and socio-economic and racial disparity to be the most compelling. An incredible amount of damage has already been caused by the pandemic, but can we take these results into account and do better? I think so.
The research shows that particular “types” of POIs are more dangerous than others. For example, a café with high volume and quick turnover is more dangerous than a more spread-out dining experience. In addition, the study found that individuals who fall into lower socio-economic demographics are more likely to visit higher risk POIs. As we reopen, perhaps we provide more resources or support dollars to the higher-risk POIs.
The easy answer is for everyone to stay home. However, as Emma’s study notes, folks who fall into lower socio-economic demographics, who are often non-white, live in dense urban neighborhoods. When it comes to sheltering in place, these groups are afforded less choice because their livelihoods often require mobility. Further, I would argue, basic economics points to the necessity of reopening.
Federal, state, and local governments should contemplate data (like Emma’s study on mobility) to strike a smarter strategy. For example, perhaps municipalities could rally around higher-risk CBGs and provide an elevated number of testing facilities. This would enhance awareness and minimize transfer between the Susceptibility and Exposure stages described above.
I remain hopeful. As winter approaches, and we close in on the better part of a year combatting this pandemic, we’re a lot smarter. We know we must pay greater attention to dense metropolitan communities, which are far more susceptible to infection. And looking beyond COVID, assuming a widespread vaccine in 2021, we can take these lessons with us even further into the future. Although far less alarming, flu seasons can be detrimental to many individuals in the populations at the center of Emma’s study. I am not suggesting limiting capacities five years from now just to combat the flu, but we can ramp up annual influenza vaccinations, mask-wearing, handwashing, and general awareness in these more susceptible communities.
Certainly, Emma’s presentation and findings are incredibly well-informed and shed an important light on the extra attention we must give to metropolitan neighborhoods.
By Michael Hewson
On September 30, 2020, the Digital Life Seminar series at Cornell Tech welcomed Emma Pierson, a senior researcher at Microsoft Research New England and a future assistant professor of computer science at Cornell Tech starting in summer 2021. Emma focuses on developing data science and machine learning methods to study the intersection of healthcare and inequality. During her presentation, she showed her model for tracking and analyzing COVID-19’s spread through dynamic mobility networks, the implications of reopening strategies, and the predictability of higher COVID-19 infection rates among disadvantaged groups.
To understand inequality and guide reopening during the COVID-19 pandemic, Emma presented a two-fold approach: (1) estimate the dynamic human contact and mobility network, and (2) build a model to capture transmission on this mobility network. First, Emma used cell phone mobility data to estimate mobility networks, over a given period of time, between neighborhoods (Census Block Groups, or “CBGs”) and places (Points of Interest, or “POIs”, such as restaurants, grocery stores, and gyms). Second, Emma overlaid disease transmission on the mobility network to analyze who is infected, where transmission occurred, and when transmission occurred.
Emma explained how her mobility network data and model work in four parts: Data, Methods, Results, and Conclusion.
Emma focused on city-specific data from 98 million people in 10 of the largest Metropolitan Statistical Areas (MSAs) in the United States, including Atlanta, Chicago, Dallas, Houston, Los Angeles, Miami, New York City, Philadelphia, San Francisco, and Washington DC. This data comes from SafeGraph, which looks at anonymized, aggregated location data from several cell phone apps.
The Patterns Dataset looks at POIs: how many visitors each POI had per hour, the distribution of visitors’ home CBGs per week or month, and metadata such as area, median dwell time, and category. The Social Distancing Metrics dataset looks at neighborhoods (CBGs): it estimates how many people stay at home per hour. Emma also drew on the New York Times for the daily number of COVID-19 cases per U.S. county, and on the U.S. Census for CBG demographics: population size, median household income, and percentage of white residents. She looked particularly at data between March 8, 2020, and May 9, 2020.
Emma then presented her Methods Pipeline, which she split into (1) Network Inference, (2) Epidemiological Model, and (3) Fit Model.
With the Network Inference: SafeGraph gave Emma information about which CBGs visit which POIs only at a weekly or monthly level, and that information is censored; it is not available hourly. What Emma did have was data about how many people go to each POI per hour, how many people leave each CBG per hour, and an initial estimate of the CBG-to-POI visit matrix. From these, she inferred an hourly matrix that is consistent with the row and column sums and “as similar as possible” to the initial noisy matrix.
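To make the network inference step more concrete, here is a minimal sketch of iterative proportional fitting, a standard technique for rescaling a matrix until its row and column sums match known totals. The function name, the tiny 2x2 matrix, and the totals are all made up for illustration. This is not the research team's code, and the real pipeline works on far larger hourly matrices.

```python
def iterative_proportional_fitting(seed, row_totals, col_totals, iterations=100):
    """Rescale a positive seed matrix so its row and column sums match the totals."""
    matrix = [list(row) for row in seed]  # copy so the seed is left untouched
    for _ in range(iterations):
        # Step 1: scale every row so it sums to its target.
        for i, target in enumerate(row_totals):
            current = sum(matrix[i])
            if current > 0:
                matrix[i] = [value * target / current for value in matrix[i]]
        # Step 2: scale every column so it sums to its target.
        for j, target in enumerate(col_totals):
            current = sum(row[j] for row in matrix)
            if current > 0:
                for row in matrix:
                    row[j] *= target / current
    return matrix


# Hypothetical example: 2 CBGs (rows) and 2 POIs (columns) for a single hour.
noisy_estimate = [[4.0, 1.0], [2.0, 3.0]]  # coarse visit pattern (made-up numbers)
people_leaving_cbg = [30.0, 20.0]          # hourly row totals (made-up numbers)
visitors_at_poi = [35.0, 15.0]             # hourly column totals (made-up numbers)

hourly_visits = iterative_proportional_fitting(
    noisy_estimate, people_leaving_cbg, visitors_at_poi
)
for row in hourly_visits:
    print([round(value, 2) for value in row])
```

Because each step only multiplies whole rows or whole columns, the fitted matrix keeps the relative visit pattern of the noisy estimate while honoring the hourly totals. Note that the two sets of totals must add up to the same number for the procedure to settle.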
With the Epidemiological Model: Emma updated her model hour-by-hour; for example, Emma said: “for each CBG, we look at each hour and say ok, 30% of people are in the susceptible state during this time.” She specifically split the population into four states: (1) Susceptible, (2) Exposed, (3) Infectious, and (4) Removed. ‘Susceptible’ pertains to those who have not yet been infected and can still catch the virus, ‘exposed’ pertains to those who have been exposed to the virus but are not yet infectious, ‘infectious’ pertains to those who can spread the virus, and ‘removed’ pertains to those individuals who have possibly recovered and can no longer infect others or catch the virus themselves. In particular, Emma emphasized the transition from susceptible to exposed as the transition that depends on mobility between CBGs and POIs.
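The hour-by-hour update can be sketched for a single CBG as follows. This is a deliberately simplified, deterministic illustration. The model described in the talk ties exposure to visits at individual POIs, whereas this sketch collapses that into one hypothetical `mobility` multiplier. The function names and every parameter value (`beta`, `latency_hours`, `infectious_hours`) are placeholders chosen for the example, not figures from the talk.

```python
def hourly_exposure_rate(infectious_fraction, mobility, beta=0.05):
    """Fraction of susceptible people exposed this hour (hypothetical formula)."""
    return min(1.0, beta * mobility * infectious_fraction)


def seir_step(state, mobility, latency_hours=96.0, infectious_hours=84.0):
    """Advance one CBG's (S, E, I, R) counts by one hour."""
    s, e, i, r = state
    total = s + e + i + r
    newly_exposed = s * hourly_exposure_rate(i / total, mobility)
    newly_infectious = e / latency_hours   # exposed people who become infectious
    newly_removed = i / infectious_hours   # infectious people who stop spreading
    return (
        s - newly_exposed,
        e + newly_exposed - newly_infectious,
        i + newly_infectious - newly_removed,
        r + newly_removed,
    )


def simulate(initial_state, hourly_mobility):
    """Run the update once per hour and return every intermediate state."""
    history = [initial_state]
    for mobility in hourly_mobility:
        history.append(seir_step(history[-1], mobility))
    return history


# Two days in a made-up CBG of 1,000 people, with and without mobility.
start = (990.0, 0.0, 10.0, 0.0)
normal = simulate(start, [1.0] * 48)
lockdown = simulate(start, [0.0] * 48)
print("susceptible after 48h, normal mobility:", round(normal[-1][0], 2))
print("susceptible after 48h, no mobility:", round(lockdown[-1][0], 2))
```

Only the susceptible-to-exposed step depends on `mobility`; the other two transitions run on fixed clocks, which mirrors the point that mobility enters the model through that one transition.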
With the Fit Model: Emma looked at model calibration. Emma (1) performed a grid search for over 1,500 parameter sets for each major city observed, (2) evaluated model fit [RMSE] on daily cases, and (3) selected parameter sets with RMSE within 20% of best fit RMSE.
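The calibration recipe (score each parameter set by RMSE, then keep everything within 20% of the best score) can be sketched in a few lines. The toy growth model, the observed case counts, and the candidate values below are invented stand-ins for the real epidemic simulation and its 1,500-plus parameter sets. Only the selection rule follows the description above.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length series."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def select_parameter_sets(candidates, observed, simulate, tolerance=0.20):
    """Keep every candidate whose RMSE is within `tolerance` of the best RMSE."""
    scored = [(params, rmse(simulate(params), observed)) for params in candidates]
    best = min(score for _, score in scored)
    return [params for params, score in scored if score <= best * (1 + tolerance)]


def toy_case_model(growth_rate, days=5, initial_cases=10.0):
    """Hypothetical stand-in for the simulation: simple exponential growth."""
    return [initial_cases * growth_rate ** day for day in range(days)]


observed_daily_cases = [10.0, 13.0, 14.0, 18.0, 22.0]    # made-up numbers
candidate_growth_rates = [1.0, 1.1, 1.2, 1.22, 1.3, 1.5]  # a tiny "grid"

selected = select_parameter_sets(
    candidate_growth_rates, observed_daily_cases, toy_case_model
)
print("kept growth rates:", selected)
```

Keeping a band of near-best parameter sets, rather than a single winner, lets later predictions reflect uncertainty in the fit.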
The model allows for information on “what-if scenarios” for mobility reduction, super-spreader locations and where the riskiest POIs are, reopening strategies analysis, and disparities in infection rates based on race and socioeconomic status. In terms of “what-if scenarios”, Emma adjusted her model to see what would have occurred at specific points in time if mobility had been reduced. Interestingly, 10% of POIs account for about 85% of infections (riskier locations being restaurants, hotels, and gyms, as people stay longer), and Emma mentioned the possibility of distributing masks at these riskier locations.
In terms of reopening plans, Emma compared several options: (1) fully reopening, which means mobility would be at 100%; (2) capping occupancy, which means allowing only a certain percentage of maximum capacity at a POI at a given time; and (3) uniform reduction, which means uniformly reducing the number of visits to a POI by a certain percentage. Emma’s model predicted that:
Fully reopening would make for a 32% increase in infections.
Capping at 20% maximum occupancy would cut infections down by 80%, while businesses would only be losing roughly 42% of overall visits. The loss in visits is comparatively small because the cap only affects the busiest hours: a store that is nearly empty at 3 A.M. is untouched, while the 5 P.M. rush is trimmed.
Interestingly, the capping occupancy option would make for 25% fewer infections than a uniform reduction.
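A small sketch shows why a cap and a uniform reduction can keep the same number of visits yet differ in crowding. The hourly visit counts are invented, and the "crowding" score (sum of squared hourly visits) is only a rough proxy I am assuming for how transmission grows when many people share a space at once. It is not the model's actual infection calculation.

```python
def cap_occupancy(hourly_visits, max_occupancy, cap_fraction):
    """Clip each hour's visits at a fraction of the maximum occupancy."""
    limit = cap_fraction * max_occupancy
    return [min(visits, limit) for visits in hourly_visits]


def uniform_reduction(hourly_visits, keep_fraction):
    """Scale every hour's visits by the same fraction."""
    return [visits * keep_fraction for visits in hourly_visits]


def crowding(hourly_visits):
    """Rough proxy for transmission risk: busy hours count quadratically."""
    return sum(visits ** 2 for visits in hourly_visits)


# One made-up POI: quiet overnight, a sharp peak at rush hour.
visits = [2, 5, 10, 40, 100, 40, 10, 5]
max_occupancy = max(visits)

capped = cap_occupancy(visits, max_occupancy, cap_fraction=0.20)
# Choose the uniform cut so both strategies keep the same total visits.
keep_fraction = sum(capped) / sum(visits)
uniform = uniform_reduction(visits, keep_fraction)

print("share of visits kept:", round(keep_fraction, 3))
print("crowding with cap:", round(crowding(capped), 1))
print("crowding with uniform cut:", round(crowding(uniform), 1))
```

In this toy case the cap leaves the quiet hours alone and removes visits only from the peak, so for the same number of lost visits it ends up with a lower crowding score than the uniform cut.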
In terms of disparities, Emma made clear that we cannot infer demographics on an individual level, only a neighborhood level, specifically comparing CBGs in low-income communities vs. higher-income communities. The model shows that lower-income and less white CBGs saw smaller amounts of mobility reduction (higher-income CBGs saw more of a drop in mobility). Being an essential worker from a lower-income community exposes you more to infection. The data shows that policymakers need to look at the overall population, and look closely at disadvantaged groups because they will be harmed the most when reopening occurs.
Emma concluded the presentation by describing some limitations and takeaways. The limitations she mentioned are that (1) cell phone mobility data does not cover all populations, such as children, and does not cover all POIs; (2) because of its simplicity, her model lacks features such as asymptomatic transmission, hand washing and mask wearing, and age-related variation in mobility and susceptibility; and (3) building a model in the first place requires making assumptions about quantities such as the transmission rate. Despite these pitfalls, Emma ended her presentation on a strong note with some takeaways: (1) although the model is simple, it provides accurate fits of case trajectories for the observed cities; (2) the model captures micro trends, hour by hour, for CBGs and POIs; and (3) it provides information on how to respond effectively and equitably to COVID-19.
The presentation was certainly interesting and provided me with much more detail about how these types of models are created.
By Victoria Woo
Epidemiological models have been central to our understanding of the salient factors in the spread of COVID-19 and of the populations more vulnerable to it. However, for the most part these models have either relied on historical data or been built at a macro level, which does not provide the fine-toothed comb we need to ask more precise questions. For example, what happens if we reopen restaurants, but not other areas of gathering, at 75% capacity? How does this disproportionately affect certain demographics?
To ask questions in this fine-grained way, Emma Pierson, a machine learning researcher at Microsoft Research and incoming Assistant Professor at Cornell Tech, has developed an epidemiological model based on mobility data together with an interdisciplinary team of researchers from sociology, epidemiology, and computer science. In essence, the model captures and predicts case trajectories to understand inequality in infection rates, which in turn can guide more equitable and effective policy responses.
Fine-grained Mobility Network Modeling to Ask Precise Questions
1. Estimating the contact mobility network through the data
There were two main tasks in building the model, and the result is a scalable and interoperable model. First, there was the estimation of the contact mobility network. This means building a picture of who is infected where, from anonymized cell phone mobility data provided to a third-party aggregator and a number of open source data sets. The data covers a population of 98 million people in 10 of the largest US metropolitan statistical areas, which were modeled separately, running from March to May 2020 (roughly, from lockdown to a limited reopening).
The cell phone mobility data provides two important sources of information for the model. The first is the Patterns Dataset, which gives the hourly number of visits to points of interest (“POIs”) such as restaurants and churches, the distribution of the visitors’ home neighborhoods (defined as census block groups, or “CBGs”), and metadata such as area and median dwell time. What is inferred from this data is a network that links neighborhoods to points of interest by showing how many visits there are from a given neighborhood to a given place and how long visitors stay. The second is the Social Distancing Metrics, which are hourly estimates of the proportion of people staying home. From these, a noisy estimate of the contact mobility network is built. Then, using an algorithm called iterative proportional fitting, the researchers link the data together to form a functional picture of who is moving from where to where and whether they are able to social distance.
2. Building a model that captures transmission across the mobility network
Second, the epidemiological model has to capture transmission across this network. For each neighborhood, the researchers model the proportion of people in each epidemiological state (susceptible, exposed, infectious, or removed from infection), which is updated every hour.
The result of these two steps is a model that fits noisy data well and can then be used to ask more precise questions.
Findings and Modelling Reopening Strategies
What the model found dovetailed with the reporting around “superspreader” events: a small fraction of POIs accounts for a large fraction of infections. An implication of this result is that those few POIs have an outsized impact on infections. At the policy level, this enables targeting and monitoring to reduce risk in a small fraction of POIs, along with effective measures such as diverting resources to those POIs, for example by handing out masks there.
Another important policy measure in reopening strategies has been to reduce mobility by capping occupancy. The model can be used to compare reopening strategies by, for example, simulating what happens when occupancy goes from 100% to 80% of the maximum historical capacity of each POI. The model found that because a cap reduces visits disproportionately at certain times, it has an outsized impact on infections. A capped occupancy strategy may therefore be more effective than uniform reduction strategies.
Inequality also shows up in infection rates, because socio-economic and racial disparities force greater exposure to risk. As demographics are measured at the neighborhood level, grounded in census block groups, one limitation is that inequities within a census block group may not be represented by its median. However, when the model is asked to compare the top and bottom deciles of CBGs, socio-economic and racial disparities are predicted based on mobility patterns alone. Lower-income and less white CBGs are not afforded the same opportunities to reduce their mobility, such as working from home. The disparate impact of reopening is that less white and poorer neighborhoods are going to suffer. While this is not news, the model gives metrics to the mechanisms through which these inequities play out, which is an important policy tool.
Pierson and the research team’s fine-tuned model, though based on noisy data, has been able to provide more place-based and demographic-driven guidance on the disparate impacts of COVID-19 and the policy choices leaders make in the course of managing the pandemic. Because of the data, the model cannot provide definitive clarity on the levels of risk seen through the lens of labour, schools and children, or on the impact of disparity within neighborhoods. Still, it showcases the power of being able to see micro trends down to the neighborhood and down to the hour to inform better policymaking.