DL Seminar | Inside the Internet
Individual reflections by George Pomar, Brandt Beckerman, and Jesse Zhu (scroll below).
By George Pomar
Tejas Narechania (pictured above) is an Assistant Professor of Law at UC Berkeley, and Nick Merrill is the director of the Daylight Lab at the UC Berkeley Center for Long-Term Cybersecurity. Together they presented Inside the Internet, a book they are co-writing.
Setting the Stage
Tejas starts the talk by setting the stage for how the internet works. Under the old system in the early 2000s, if Jane wanted to communicate with Joe, her request would first go to her ISP, which would hand it off to a tier 2 provider, then a tier 1 provider, then a different tier 1 provider, then another tier 2 provider, and finally to Joe's ISP, which connects the two. This is the old, decentralized story, and there are two problems with this structure. One is latency, which becomes more important with the rise of Web 2.0 and e-commerce. The other is security: DDoS (distributed denial of service) attacks become increasingly common, and in 2000 a 16-year-old is able to take down the biggest and most well-known sites on the internet. As a result, we get CDNs (Content Delivery Networks), which address both problems. CDNs reduce latency by distributing content across a network of geographically dispersed servers, and they improve security by offering a wider view into internet traffic that can be used to identify malicious activity.
The dominant view is that the market for internet traffic exchange is highly competitive. On this view, many providers across several classes of services compete to offer service. The ISP market is not competitive, but because the US Government has viewed the inside of the internet as competitive, they have not placed any regulation over the market for internet traffic exchange.
The questions we must ask ourselves are: what have CDNs given us? What have they taken away? And what is the state of the market? Recently, there have been two widespread internet outages due to Fastly, a CDN, which suggests that the CDN market might not be as competitive as initially thought.
Nick continues the talk by answering three questions. One, how do we know what the market for CDNs looks like? Two, what does that market look like? Three, so what?
To answer the first question, Nick outlines an experiment that he and his team undertook. They make requests to the top one million websites and analyze the responses to see whether each site uses a CDN and, if so, which one. The results: 11 providers control 99% of the CDN market, 5 firms control 96%, and 80.7% of websites that use a CDN use Cloudflare. The Herfindahl-Hirschman Index (HHI) of the CDN market is 6559, where 2500 is the threshold for a highly concentrated market. But out of all websites on the internet, only 22.6% use a CDN at all. This leads us to the real question: what proportion of user-facing bits deal with a CDN? The answer: roughly 76%.
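The talk doesn't detail the measurement pipeline, but the CDN-detection step can be sketched roughly as follows. This is a hypothetical illustration: the fingerprints below (Cloudflare's `cf-ray` header, Fastly's `x-served-by` cache names, Akamai's `x-akamai-*` headers) are commonly observed response markers, not necessarily the authors' actual methodology.

```python
def classify_cdn(headers):
    """Guess which CDN served a response, based on well-known header markers.

    Returns the CDN's name, or None if no fingerprint matches.
    """
    # Normalize header names and values for case-insensitive matching.
    h = {k.lower(): v.lower() for k, v in headers.items()}
    if h.get("server") == "cloudflare" or "cf-ray" in h:
        return "cloudflare"
    if "cache-" in h.get("x-served-by", ""):
        return "fastly"
    if any(k.startswith("x-akamai") for k in h):
        return "akamai"
    return None

# Hypothetical responses for illustration:
print(classify_cdn({"Server": "cloudflare", "CF-RAY": "7d0c8-SJC"}))  # cloudflare
print(classify_cdn({"X-Served-By": "cache-lhr7367-LHR"}))             # fastly
print(classify_cdn({"Server": "nginx/1.18.0"}))                       # None
```

Aggregating such classifications over a large crawl is what yields market-share figures like those above.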
If Akamai or Cloudflare went down, Nick hypothesizes that the entire internet would go down.
This centralization leads to two potential problems: one is cyberattacks, and the other is speech. A state-sponsored attacker could potentially take down a CDN; Russia and China are potential adversaries who don't have as much to lose as it might seem. Furthermore, the most realistic way of mediating speech on the internet is to contact the CDNs first. In fact, the Great Firewall of China is essentially a large state-run CDN that Chinese ISPs and service providers are forced to use.
Back to Tejas. In summary, we have a regulatory model predicated on the assumption of robust competition for internet traffic exchange, but in fact roughly 80% of your internet experience is mediated by CDNs, and roughly 80% of the CDN market is controlled by Cloudflare. The policy of no regulation is based on robust competition, but in reality the market is highly concentrated. As policymakers, we might ask for greater transparency, since without a window into CDNs' private networks, our public security is frustrated by the increasing privatization of internet traffic exchange. We also might be concerned about competition, which has two potential consequences: increased prices and a bottleneck for free speech. Regarding the first concern, there has been no monopoly or oligopoly pricing yet, but CDNs are extracting loads of data. Regarding the second concern, we already know that Cloudflare limits some consumer choice, such as Tor, and treats traffic from Ghana as intrinsically suspicious. In summary, do we trust CDNs to make these decisions for us? Do we want political accountability or market forces to hold CDNs accountable? Tejas concludes that the market is not competitive, so policymakers should revisit this issue, and he recommends that the U.S. Government impose net neutrality on the inside of the internet.
Tying this talk into the broader digital life, I feel that this problem underlies the speech and privacy concerns for the whole internet. Although most of the focus on speech and privacy falls on application-layer interfaces such as social media platforms, the speech and privacy concerns surrounding CDNs underpin every facet of the internet, which underscores the importance of this issue. The concentration of the CDN market also resembles the larger pattern of capitalism incentivizing large corporations to keep growing, which in many cases leads to monopoly or oligopoly structures in certain markets, as with the FAANG companies, for instance.
My personal take is that while the concentration of CDNs is of concern, many other parts of the internet are also highly concentrated, such as cloud computing, which is dominated by Amazon Web Services. The fact of the matter is that many things are already almost totally in the control of large corporations. Naturally, it does make sense for policymakers to revisit this topic with the new information brought to light by this work; we should let our elected officials decide the best course of action with all the relevant information present. Nonetheless, I personally believe that there haven't been significant problems with the way CDNs have been run for the past decade. Their concentration has allowed them greater insight into potential attacks, and their stance on Tor, for instance, is justified given the behavior of some individuals who use it, which includes spamming, attacking, and distributing malware.
By Brandt Beckerman
There is nothing more ubiquitous in society today than the internet. It has become an integral part of life in everything from buying toothpaste to checking health records.
Businesses rely on it for interacting with customers and parents rely on it for communicating with their children. One would therefore be placed in quite a predicament if their favorite app or website were knocked offline, for example due to a nefarious actor performing a distributed denial of service (DDoS) attack. Access is everything, and in an effort to maximize uptime and enhance security, many services centralized their traffic through Content Delivery Networks (CDNs). Overall, this seemed to solve the issue. The entire business models of services such as Cloudflare and Fastly are built on maintaining a service's ability to access the greater internet, thereby ensuring their countless users have access every hour of the day. The average internet denizen has over 75% of their browsing routed through these CDNs, with Cloudflare making up a large plurality of those three quarters.
With their vast traffic, these CDNs have used economies of scale to build cheaper, larger networks of infrastructure, as well as to understand traffic patterns to the degree that detection of malicious web traffic is now easier than ever. Overall performance for websites has also improved dramatically, with CDNs' vast networks of nodes allowing for low-latency access. From Akamai's 'What is a CDN': 'content storage at the network edge makes it possible to reduce latency and deliver the same content to multiple users for more efficient access.'2 Without CDNs, it would not be possible for individual websites to cache their content on numerous servers throughout the globe to allow for this increased speed of access. Consumers have come to expect this lightning-fast connection, with 'trust in e-commerce being inversely proportional to latency.'3 The more the hourglass spins when completing a transaction, the more time there is to wonder whether a credit card number is being stolen.
In an attempt to provide greater internet security and access through centralization, CDNs have created a singular weak point in the internet's infrastructure. If a malicious actor can take out a CDN, they can take out large swaths of the internet, not unlike Luke firing into the thermal exhaust port on the Death Star. It only takes one actor and one action to blow everything up. In addition to these nefarious third parties, CDNs themselves use their position to heavily throttle traffic from certain nation states, Ghana for example, as well as services such as Tor.4 They have the extreme power to create and enforce a 'digital no-fly list of sorts.'5 As pervasive as the internet is, this can be devastating to any individual, business, or other entity that finds itself on the bad side of a CDN.
Numerous governmental bodies have looked into this Achilles' heel of CDN centralization, determining that there is sufficient competition not to necessitate regulatory action on the scale implemented with respect to internet service providers (ISPs). What, then, would be the solution to this quandary? This is an open question that will hopefully find an answer sooner rather than later.
By Jesse Zhu
Tejas and Nick begin the discussion of the internet by talking about its infrastructure's history. In the old story, there are multiple hops that a request goes through to reach its end destination: from the internet service provider (ISP) to the ISP's central office, through multiple tier 1 and tier 2 providers, and finally through the end destination's ISP to the destination. As an example, if a restaurant were to use a data center through an ISP, the progression would go as follows: ISP, to the ISP's central office, to a tier 2 provider, to a tier 1 provider, to another tier 1 provider, to a tier 2 provider, to the data center.
They continue, however, by discussing the two main issues with this infrastructure. The first is latency, whereby larger distances take longer for a request to be processed. The current Web 2.0, which is more user-facing, is extremely latency sensitive, with critical consequences of higher latency, such as lowered trust in e-commerce. The second problem lies in security, where distributed denial of service (DDoS) attacks are common. On February 7th, 2000, Michael Calce successfully launched a DDoS attack on Yahoo by flooding its servers with different types of communications, and proceeded to bring down eBay, CNN, and Amazon over the next week.
With these problems, Tejas and Nick motivate the creation of content delivery networks (CDNs): geographically distributed networks of proxy servers and data centers. In this model, the request goes from the ISP to an interconnection point, where the CDN delivers the relevant data to that same interconnection point. As a result of this wide-scale distribution, CDNs have reduced distance-based latency and increased security compared to the traditional infrastructure of the internet. Additionally, they offer other features, such as file-size reduction via minification and file compression, that lead to quicker load times.
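The file-size savings are easy to see concretely. A minimal sketch using Python's standard `gzip` module (the payload here is a made-up stand-in for a web page, not data from the talk):

```python
import gzip

# A repetitive HTML payload stands in for a typical web page.
html = b"<html>" + b'<div class="item"><p>Hello, CDN!</p></div>' * 500 + b"</html>"
compressed = gzip.compress(html)

print(f"original: {len(html)} bytes, gzipped: {len(compressed)} bytes")
# Repetitive markup compresses dramatically, so far fewer bytes cross the wire.
```

Since markup and stylesheets are highly repetitive, a CDN edge serving compressed (and minified) assets sends a small fraction of the original bytes, which is a large part of the load-time win.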
The introduction of CDNs has raised several issues regarding competition. Tejas and Nick note that the dominant view holds that internet traffic exchange is highly competitive, with numerous tier 1 and tier 2 providers as well as CDNs, which might suggest that no regulation is needed. Indeed, regulating internet traffic exchange has been a recent development: the Federal Communications Commission (FCC) only brought traffic exchange into scope in 2015, and repealed net neutrality in 2018 on the belief that the market is competitive and dynamic. During internet outages, the old infrastructure avoided cascading problems by routing around the damaged provider, thanks to the many other providers available in a competitive market. With recent CDN outages, however, it has become apparent that numerous websites go down if one CDN goes down, which calls into question whether the CDN market is actually competitive.
Using their own analysis, Tejas and Nick used internet measurement to see what the CDN market looks like, abstracting away the types of machines and how they are connected to one another. They were able to track which provider uses which CDN by aggregating responses from various websites. From their analysis, 22-23% of websites use CDNs, with 11 providers controlling 99% of the market and 5 providers controlling 96%. Of the websites that use a CDN, 80.7% use Cloudflare, which can be seen as extremely concerning. A lot of traffic is not necessarily user-facing, with much of it being tier 1 only, routing, or application providers talking to each other. Even among the user-facing bits, 76% deal with a CDN. This suggests a centralization of CDNs, which is concerning: cyberattacks and outages could cause large portions of the web to fail if one CDN goes down, and it touches moral rights such as speech if CDNs are able to regulate large portions of the web.
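The concentration claim can be made concrete with the Herfindahl-Hirschman Index: the sum of squared percentage market shares, where a value above 2,500 is conventionally "highly concentrated." Cloudflare's 80.7% share alone contributes over 6,500 points. In this sketch, only the 80.7% figure comes from the talk; the remaining shares are hypothetical placeholders:

```python
def hhi(shares_pct):
    """Herfindahl-Hirschman Index: sum of squared percentage market shares."""
    return sum(s ** 2 for s in shares_pct)

# Cloudflare's 80.7% is from the talk; the other ten splits are made up
# purely so the shares total 100%.
shares = [80.7, 5.0, 4.0, 3.0, 2.5, 2.0, 1.5, 0.8, 0.3, 0.1, 0.1]
assert abs(sum(shares) - 100) < 1e-9

print(round(hhi(shares)))  # far above the 2,500 threshold for high concentration
```

Because the index squares each share, one dominant firm swamps everything else: however the remaining ~19% is divided, the market stays deep in "highly concentrated" territory.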
Tejas and Nick turn to possible policy responses for dealing with this issue. One relates to transparency, given cross-provider communication and the threat to public networks. The other deals with competition: monopoly pricing is unlikely, but the threat of data as currency needs to be handled. One proposed solution is to have publicly sponsored non-profit CDNs as competition. Finally, regarding the speech threat, intermediary control imposes important limits on consumer choice and on speech (e.g., if a CDN determines that traffic from Ghana is intrinsically suspicious). They propose that a solution rests in the net neutrality debate but has yet to be exactly determined.
Overall, Tejas and Nick did an excellent job at describing the past and current architecture of the internet, as well as their problems and motivating concerns for the CDN architecture, especially regarding competition. They shed light on possible solutions for the CDN competition problem, and I hope to see several solutions in the future!