Week 3: Traceroutes


We’re learning about the physical infrastructure of the Internet, including how bits travel back and forth across the ocean in mere milliseconds through undersea fiber optic cables (and we recently saw one of four NYC access points at Hunter Newby’s facility in downtown Manhattan here). In my mental model, I imagine the Internet as the global nervous system of the human race that’s increasingly spreading across the crust of the planet. Wireless satellite communications play a part in this, too, but as we learned from Hunter, this accounts for barely 1% of the pathways that information might travel.

Run from the command line, a traceroute reveals the path that packets of bits take to a particular destination, such as a server hosting a website. Packets hop from router to router, and each trace notes the router’s IP address, the autonomous system to which it belongs (a network maintained by an organization such as an internet service provider), and the amount of time (again, in milliseconds) it takes to reach and return from each router.

This week, I performed traceroutes on three websites that I visit regularly to learn how my web requests are routed and which networks handle them.

Destinations

  1. This blog where I record my learning at ITP

  2. The P5 Web Editor where I sketch out computational ideas

  3. And YouTube where I consume a limited but daily dose of news and occasionally fall down rabbit holes of rabbit holes when I need a break from #1 and #2

Questions

  1. What are the physical paths that my web requests take?

  2. Do these paths change over time?

  3. Do these paths differ when requested from different geolocations in my city?

  4. Through which networks do these requests pass? And again, do they differ when requests are made from separate geolocations?

Process

  1. From the command line, traceroute -a www.ellennickles.com returns a detailed list of the aforementioned data points, while traceroute -n www.ellennickles.com returns each hop with just the router’s IP address and each packet’s round-trip travel time. I recorded traceroutes for each of the three sites from both my home and NYU over the course of a week: on Monday 9.17.18, Wednesday 9.19.18, and Friday 9.21.18.

  2. I used ip-api.com to batch query the IP addresses for each traceroute, again via the command line. (Here’s an example from their site: curl ip-api.com/batch --data '[{"query": "208.80.152.201"}, {"query": "91.198.174.192"}]'.) This returns JSON-formatted data including the name of the internet service provider (ISP) along with the latitude and longitude of the router at each hop in the packet’s path.

  3. I converted these JSON files into CSV files to see the data in table form and also to easily grab the routers’ geolocations, which I plotted on a plain web map to quickly visualize the actual paths traveled across the Earth. (I’m sure there’s a way to programmatically do all of this, but I wasn’t sure what I wanted to do with the data until I saw it in this form.)
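
The three steps above could be done programmatically along these lines. This is only a minimal sketch, not the process I actually followed: the function names (parse_traceroute, batch_payload, to_csv) are made up for illustration, the sample output uses reserved documentation IP addresses rather than my real data, and the parser assumes the one-address-per-hop layout that traceroute -n usually prints. The record fields (query, isp, lat, lon) are the ones ip-api.com returns in its JSON.

```python
import csv
import io
import json
import re

# Matches one hop line of `traceroute -n` output, for example:
# " 3  198.51.100.7  11.250 ms  10.750 ms  12.000 ms"
HOP_RE = re.compile(r"^\s*(\d+)\s+(\d{1,3}(?:\.\d{1,3}){3})\s+(.*)$")
RTT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*ms")

# Placeholder output (documentation addresses, not real measurements).
SAMPLE_OUTPUT = """traceroute to www.example.com (203.0.113.10), 64 hops max, 52 byte packets
 1  192.0.2.1  2.114 ms  1.803 ms  1.652 ms
 2  * * *
 3  198.51.100.7  11.250 ms  10.750 ms  12.000 ms
"""


def parse_traceroute(text):
    """Return a list of {"hop", "ip", "rtts_ms"} dicts, one per answering hop."""
    hops = []
    for line in text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # header line, "* * *" timeouts, or continuation lines
        hop, ip, rest = match.groups()
        rtts = [float(value) for value in RTT_RE.findall(rest)]
        hops.append({"hop": int(hop), "ip": ip, "rtts_ms": rtts})
    return hops


def batch_payload(hops):
    """Build the JSON body to send to ip-api.com/batch (for example via curl --data)."""
    return json.dumps([{"query": hop["ip"]} for hop in hops])


def to_csv(hops, geo_records):
    """Join parsed hops with the batch response records and return CSV text."""
    by_ip = {record["query"]: record for record in geo_records}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["hop", "ip", "isp", "lat", "lon", "avg_rtt_ms"])
    for hop in hops:
        record = by_ip.get(hop["ip"], {})
        rtts = hop["rtts_ms"]
        avg = round(sum(rtts) / len(rtts), 3) if rtts else ""
        writer.writerow([
            hop["hop"], hop["ip"],
            record.get("isp", ""), record.get("lat", ""), record.get("lon", ""),
            avg,
        ])
    return out.getvalue()


if __name__ == "__main__":
    hops = parse_traceroute(SAMPLE_OUTPUT)
    print(batch_payload(hops))
    # Placeholder response record; a real one comes back from the batch query.
    fake_response = [{"query": "198.51.100.7", "isp": "Example ISP", "lat": 40.7, "lon": -74.0}]
    print(to_csv(hops, fake_response))
```

Keeping the network request out of the script means the payload can still be sent with the same curl command as before, and the saved JSON response fed back into to_csv.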

Findings
Geolocations - On each of the three days last week, the paths to each web destination did not vary that much; however, the routes did differ according to where I performed the traceroutes. My home is about 3.75 miles away from NYU and on the same island, yet the routes taken were often quite different as demonstrated below.

Here are snapshots of that geospatial data from my map. All traceroutes visualized below were performed on Monday, 9.17.18. (However, data for all three days from both starting locations is here.)
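
One way to put a number on how different two routes are is to compare the router IP addresses they pass through. The sketch below is an illustration I did not run on my data: route_overlap and first_divergence are made-up names, and the two routes are placeholder documentation addresses. It reports the share of routers two routes have in common and the first hop at which they part ways.

```python
def route_overlap(route_a, route_b):
    """Jaccard similarity of the router IPs on two routes (1.0 means the same set)."""
    a, b = set(route_a), set(route_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def first_divergence(route_a, route_b):
    """Index of the first hop where the routes differ, or None if they are identical."""
    for index, (ip_a, ip_b) in enumerate(zip(route_a, route_b)):
        if ip_a != ip_b:
            return index
    if len(route_a) != len(route_b):
        return min(len(route_a), len(route_b))
    return None


if __name__ == "__main__":
    # Placeholder routes, ordered by hop; real ones would come from traceroute -n.
    from_home = ["192.0.2.1", "198.51.100.7", "203.0.113.10"]
    from_school = ["192.0.2.50", "198.51.100.7", "203.0.113.10"]
    print(route_overlap(from_home, from_school))
    print(first_divergence(from_home, from_school))
```

The same two functions would also work for comparing one starting location across different days.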

Network Providers - For those traceroutes initiated at my home, around 50% of the hops are managed by Spectrum, which makes sense: I pay them monthly for Internet access. After that it varies. For traceroutes to my blog, there’s a hop through Vodafone, based in the UK, and then on to the American company, Akamai Technologies, before it reaches Squarespace. For the P5 Editor, Tata Communications America (Tata started as a provider in India) handles the remaining 50% of the hops after Spectrum. For YouTube, there are a couple of hops through Tata Communications America but then Google handles the rest (Google acquired YouTube in 2006).

For those run at ITP, NYU itself handles about the first third of hops, followed either by Tata Communications America, Akamai, or Google (for YouTube). The traceroute to my blog, however, reports some new mentions: GTT Communications, a multinational telecommunications company, and Telia Company AB, a Swedish company.
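
The percentages above come from counting how many hops each provider handles. A minimal sketch of that arithmetic, assuming the ISP name for each hop has already been pulled from the lookup results (isp_share is a made-up name and the provider labels are placeholders, not my recorded data):

```python
from collections import Counter


def isp_share(isps):
    """Return (isp, percent of hops) pairs, largest share first."""
    counts = Counter(isps)
    total = sum(counts.values())
    return [(isp, round(100 * n / total, 1)) for isp, n in counts.most_common()]


if __name__ == "__main__":
    # One ISP label per hop, in hop order (placeholder values).
    hops = ["ISP A", "ISP A", "ISP A", "ISP B", "ISP B", "ISP C"]
    for isp, percent in isp_share(hops):
        print(f"{isp}: {percent}%")
```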

Takeaways & Next Steps
This blog is hosted by Squarespace, which is located on the same island where the traceroutes originated. Yet the packets still travel around the country and over the seabed, often several times, before reaching their destination, just blocks from NYU. When I started this investigation, I was mostly curious about the geographical distances travelled, but as I read in Chapter 11 of Linked, it’s not necessarily about physical distance; it’s about how fast routers can move those bits. As we discussed in class after I collected and mapped this data, routers keep track of the fastest routes in their routing tables to shepherd packets to their destinations. For the most part routes are static, which my data supports, the one exception being the 9/21/18 traceroute from NYU to my blog. Changing routes would slow the system; this is one of the downsides of a mesh-type network. A good next step for me would be to include the round-trip times to and from those routers on my map, dynamically draw the lines to watch how each path emerges, and compare the routes again.
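
To set those round-trip times against the geographical distances travelled, the length of a mapped path can be estimated from the routers’ latitude and longitude with the haversine (great-circle) formula. This is a rough sketch under a few assumptions: haversine_km and path_length_km are illustrative names, the Earth is treated as a sphere of radius 6371 km, the coordinates are placeholders, and IP geolocation is only approximate, so the result is an estimate at best.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def path_length_km(points):
    """Sum the hop-to-hop distances along a list of (lat, lon) points."""
    return sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))


if __name__ == "__main__":
    # Placeholder coordinates, one per hop, in hop order.
    path = [(40.73, -73.99), (41.88, -87.63), (40.71, -74.01)]
    print(round(path_length_km(path), 1), "km")
```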