Understanding Networks

Week 10: RESTful Project Part I

Ridwan and I thought it would be fun to make an audience-controlled karaoke machine, with controls for displaying a random song video, playing and pausing that video, increasing and decreasing the playback rate, and giving compliments to the karaoke performer.

But I’m getting lost in thinking about a REST interface for our project. No matter how much I read about REST, including going over past students’ projects and the examples from class, I still can’t picture how it works. In an attempt to wrap my head around it, I made a few things to help me find the gaps in my understanding.

First, I made a P5 sketch to test the interaction, although in this example both the input controls and the visual output are combined on one page.

Next, I made a very simple web app. When the server is running on my localhost, and I visit that port through a browser, a GET request is made to render the index page, which includes JavaScript with one button. Pressing that button generates a random number that is sent to the server as a POST request. My server receives it, prints it out in the terminal window, and finally, sends a response back to the (input) client acknowledging its receipt of the message. (Btw, I’m using this plugin now to handle CORS error messages.)

// server.js
var express = require('express');
var server = express();
var bodyParser = require('body-parser');
var urlencodedParser = bodyParser.urlencoded({ extended: true });
server.use(bodyParser.json());
server.use('/',express.static('public'));

function serverStart(){
  var port = this.address().port;
  console.log('Server listening on port ' + port);
}

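function handleGet(request, response){
  console.log('Hey, I got a ' + request.method + ' request!');
  // Files in /public are served before this runs, so anything that lands here was not found.
  // Reply so the browser is not left waiting for a response.
  response.status(404).send('Nothing here.');
}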

function handleClicked(request, response){
  console.log('Got a ' + request.method + ' request.');

  var content = request.body.value;
  console.log(content);

  // send() ends the response on its own, so no separate end() call is needed
  response.status(200).send('You sent me ' + content);
}

server.listen(7006, serverStart);
server.get('/*', handleGet);
server.post('/clicked', handleClicked);
// index.html
<!DOCTYPE html>
<html>
    <head>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/0.7.2/p5.min.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/0.7.2/addons/p5.dom.min.js"></script>
    <title>Let's Play!</title>
    </head>
    <body>
    Hello World
    <script>
    var button;

    function setup(){
        noCanvas();
        button = createButton('Click me!');
        button.position(10, 40);
        button.mouseReleased(clicked);
    }

    function clicked(){
        let num = int(random(360));
        var data = {
            value: num
        }
        sendNumber(data);
    }

    function sendNumber(value){
        let path = 'http://127.0.0.1:7006/clicked';
        httpDo(path, 'POST', value, 'text', getResponse);
    }

    function getResponse(response){
        console.log(response);
    }
    </script>
    </body>
</html>

These exercises were a useful review and helped me talk through and diagram our application with Ridwan, which he cleaned up here:

1-1.jpg

Our device has an input client with controls that output to visuals with sound, most likely in a browser window so we can project it.

We envision the controller to have six buttons that all make POST requests to the server:

  1. New Song posts a random number to select a video from an array at URL/song

  2. Play posts a 1 to URL/state

  3. Pause posts a 0 to URL/state

  4. Faster posts an increase of 0.5 to URL/speed

  5. Slower posts a decrease of 0.5 to URL/speed

  6. Compliment posts a random number to select a happy cheer from an array at URL/compliment

The server responds to these requests with server status codes and stores the incoming values (along with the videos) to serve to the output client.
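One way to picture the server side of those six buttons is as a small state object that each POST updates. This is only a sketch under assumptions: the function names, the `version` counter, and the speed limits of 0.5 and 2 are made up for illustration and are not part of our design yet. An Express handler like `handleClicked` above could call `applyPost` with the request path and `request.body.value`, then send back the returned status code.

```javascript
// Hypothetical limits for playback speed (an assumption for this sketch).
var MIN_SPEED = 0.5;
var MAX_SPEED = 2;

// Starting state; songCount and complimentCount are the lengths of the two arrays.
function createState(songCount, complimentCount) {
  return {
    song: 0,
    playing: 0,
    speed: 1,
    compliment: null,
    songCount: songCount,
    complimentCount: complimentCount,
    version: 0 // bumped on every accepted POST
  };
}

// True when value is a whole number that can index an array of the given length.
function isIndex(value, length) {
  return Number.isInteger(value) && value >= 0 && value < length;
}

// Returns an HTTP-style status code plus the (possibly updated) state.
// The incoming state object is never modified; a copy is returned instead.
function applyPost(state, path, value) {
  var next = Object.assign({}, state);
  if (path === '/song' && isIndex(value, state.songCount)) {
    next.song = value;
  } else if (path === '/state' && (value === 0 || value === 1)) {
    next.playing = value;
  } else if (path === '/speed' && (value === 0.5 || value === -0.5)) {
    next.speed = Math.min(MAX_SPEED, Math.max(MIN_SPEED, state.speed + value));
  } else if (path === '/compliment' && isIndex(value, state.complimentCount)) {
    next.compliment = value;
  } else {
    // Known route with a bad value: 400. Unknown route: 404.
    var known = ['/song', '/state', '/speed', '/compliment'].indexOf(path) !== -1;
    return { status: known ? 400 : 404, state: state };
  }
  next.version = state.version + 1;
  return { status: 200, state: next };
}
```

Keeping the update logic separate from Express means the same function can be tried out in a console before any routes exist.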

But then how does our output client know when to GET the user input data from the server? Does it check at regular intervals? This seems kinda slow and impractical for video. Can’t our server just be a socket server that updates the output client as soon as it receives the new input data?
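To compare the two options side by side, here is a toy model of the stored input with both a polling check and a push-style subscription. Everything in it (`createStore`, `poll`, the `version` counter) is a hypothetical name for illustration; it runs in plain JavaScript with no network involved, so it only shows the shape of each approach, not a real socket server.

```javascript
// A tiny stand-in for the server's stored input (all names are hypothetical).
function createStore() {
  var listeners = [];
  var store = {
    version: 0,
    value: null,
    // Push style: the output client registers once and is called on every change.
    subscribe: function (listener) {
      listeners.push(listener);
    },
    // Called when the input client POSTs a new value.
    update: function (value) {
      store.value = value;
      store.version += 1;
      listeners.forEach(function (listener) {
        listener(value);
      });
    }
  };
  return store;
}

// Polling style: the output client asks "anything newer than what I last saw?"
function poll(store, lastSeenVersion) {
  if (store.version === lastSeenVersion) {
    return { changed: false, version: lastSeenVersion };
  }
  return { changed: true, version: store.version, value: store.value };
}
```

With polling, the output client would run something like `poll` on a timer through GET requests, so a change can go unnoticed for up to one interval. With a subscription, which is roughly what a socket connection gives you, the listener runs as soon as `update` is called.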

What am I missing? What are the dots that we need to connect?

Week 7: Packet Sniffing

Last week we spoke with ITP alumnus Surya Mattu about data traffic to and from network-connected devices. While the data itself is often encrypted, the metadata reveals quite a lot about a user's Internet activity and habits. Earlier this year, Surya and a colleague collected and analyzed packets from a fully-connected smart home (documented here) and among many other things, learned how chatty devices stay active even when their humans are away. Curious about the situation in my own home, I started my packet-scanning survey with the intention to compare differences between using my devices and leaving them connected but idle. The initial results opened up about a million questions for me, and I quickly meandered down some side roads…here we go:

Part I - What devices are on my network?
My home has a router that is connected to four Apple devices: a laptop, a TV, a tablet, and a phone. Each device has a Media Access Control (MAC) address and a private IP address. MAC addresses are unique identifiers assigned by the device’s manufacturer and rarely change (I say “rarely” because I’ve seen hacks online). IP addresses are assigned by the router when a device connects to it, and these are subject to change if a device “forgets the network” and then re-establishes a connection. This info will come in handy later on in the post.

Part II - What packets can I capture with Herbivore?
I started out simple with Surya’s friendly sniffing application, Herbivore. While packets may be sent using a variety of protocols, this app filters for TCP packets sent via HTTP and HTTPS on ports 80 and 443, respectively. Without actively engaging in any other activities on my laptop (no typing or clicking anywhere), I started sniffing my router and quickly saw traffic flowing to and from my computer.

Though no applications were running in the “foreground”, background apps like Adobe Creative Cloud, Avast, and Dropbox were regularly sending packets out every few minutes to the internet. I also noticed that except for Avast, the ports where the packets originated on my laptop changed with each transmission.

I noticed that the traffic changed depending on whether I left a browser open. Packets to Google (though not pictured below) and my password-manager app showed up when I left Firefox running with Inbox in a tab and a variety of other pages open.

With Chrome running and the same sites open, I saw similar activity and much more from services I did not recognize, like fw.adsafeprotected.com, s.yimg.com, ds.reson8.com, choices.truste.com, pr.ybp.yahoo.com, and others. Pretty sure I haven’t visited anything Yahoo since the late 90s.

Part III - What packets can I capture with Wireshark?
Then I switched to Wireshark, which provides a great deal more information about network activity, covering a wide range of protocols for both local and remote communication. I ran two hour-long sessions using Wireshark in promiscuous mode with the intent to capture activity from all my connected devices. For the first session my devices were on but idle, and for the second, I actively engaged with them as much as possible.

SESSION 1 — Idle Devices — Packets Captured: 15,438

This chart shows all the many protocols that were active during this period of hands-off devices.

Wireshark shows you the IP addresses where packets were headed and also where they originated. A Whois lookup told me that most of the conversation during this time was between my laptop and two different Google IP addresses, mostly via TCP, a handshaking protocol, which explains the same percentage results in both categories. While my browsers were closed, I do run Google’s Backup & Sync (for Drive) in the background, and my guess is that this handshaking occurs through a port filtered out in Herbivore.

Top sources of packets:
44% My laptop
21% Google
16% My router

Top destinations of packets:
34% —> My laptop
21% —> Google
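However the numbers are produced, the idea behind a "top sources" list is a simple tally. The sketch below assumes a hypothetical array of packet objects that each carry a `source` label (that shape is my own invention for illustration, not Wireshark's export format) and turns the counts into whole-number percentages.

```javascript
// Count packets per source and convert the counts to whole-number percentages.
// `packets` is a hypothetical array of objects with a `source` label.
function sourceShares(packets) {
  var counts = {};
  packets.forEach(function (packet) {
    counts[packet.source] = (counts[packet.source] || 0) + 1;
  });
  return Object.keys(counts)
    .map(function (source) {
      return {
        source: source,
        count: counts[source],
        percent: Math.round((counts[source] / packets.length) * 100)
      };
    })
    .sort(function (a, b) {
      return b.count - a.count; // busiest source first
    });
}
```

Because the percentages are rounded, a source that contributes only a handful of packets to a large capture shows up as 0% even though its count is not zero.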

SESSION 2 — Engaged Devices — Packets Captured: 140,742

My activities included streaming a show on my Apple TV, texting on my phone, browsing sites on my iPad, and attempting to do some homework on my laptop (nearly impossible, I was so distracted), including visiting various sites and cloning GitHub repos.

Compared to Session 1’s protocol list, GQUIC (Google Quick UDP Internet Connections) leaped onto the scene.

Top sources of packets:
28% My laptop
25% GitHub
17% Google

2% My router
0%* My phone
0%* My tablet
0%* My TV

Top destinations of packets:
69% —> My laptop

My router 0%*
My phone 0%*
My tablet 0%*
My TV 0%*

* Wait a second: I was using ALL of my connected devices, so how come I barely found any of their packets (so few that their percentages rounded to zero)? The ones I did find were related to internal device communication on my local network using the MDNS and IGMPv2 protocols, the latter used to support online video streaming.

The screenshot below shows traffic from my iPad. If you dig into any of the MDNS packets, you’ll find the name of my tablet, which happens to be my last name, clearly visible to me and anyone else sniffing my network. (But not any more!)

Part IV - So where are the missing packets?
I returned to Herbivore and sniffed my Apple TV while streaming a movie to see if I could capture packets with a different app, and sure enough, I saw them. So was I doing something wrong in Wireshark? Nope. In office hours (thank you, Tom!) I learned that Apple changed their network card configurations a while back to only see packets that are sent to them. So my laptop is only supposed to see the traffic headed its way, which explains why it could not see packets specifically addressed to my other Apple devices. But I can get around this by using Herbivore, which uses ARP spoofing.

Part V - So what is ARP and ARP spoofing?
Remember how each device on a network has a MAC address and a corresponding IP address? When devices on the same network need to send data to one another, they need to know each other’s MAC addresses. 

How do they find each other’s MAC address? They ask by using their IP addresses!

They broadcast their question (in packet form) to every node on the network using Address Resolution Protocol (ARP) by calling on active IP addresses. When a device “hears” its IP address called, it responds with its MAC address.

For example, take a look at the first two entries in my Wireshark log. My router wants to know the identity of the device with the IP address of 192.168.1.141:

In my mind, ARP sounds like a medium-sized dog bark.

My router: “ARP! Who has 192.168.1.141? Tell me at 192.168.1.1” 

My laptop: “ARP! Hey there, I’m 192.168.1.141, and my MAC address is 24:a0:74:f2:db:7e.”

My router then maps my laptop’s IP address to that MAC address and stores it in an ARP table. (You can check your computer’s ARP table by typing arp -a into a terminal window.) Then it continues to query to ensure that it has the most up-to-date information. During one hour, my router broadcast the same question over 100 times.

In the Open Systems Interconnection (OSI) model, which is used to describe how different computer systems talk with one another, MAC addresses are data link layer (or layer 2) addresses. Internet Protocol (IP) addresses are network layer (or layer 3) addresses. Address Resolution Protocol is a call-and-response method that uses network layer addresses to find and map to data link layer addresses.
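The call-and-response can be modeled in a few lines. This is a toy simulation, not real networking code: `createDevice` and `resolve` are hypothetical names, the laptop's addresses come from my Wireshark log, and the TV's addresses are placeholders.

```javascript
// Toy model of an ARP exchange on one local network (not real networking code).
function createDevice(ip, mac) {
  return {
    ip: ip,
    mac: mac,
    // A device answers only when the broadcast names its own IP address.
    hearRequest: function (targetIp) {
      return targetIp === ip ? { ip: ip, mac: mac } : null;
    }
  };
}

// Broadcast "who has targetIp?" to every device and store the answer in the ARP table.
function resolve(arpTable, devices, targetIp) {
  for (var i = 0; i < devices.length; i++) {
    var reply = devices[i].hearRequest(targetIp);
    if (reply) {
      arpTable.set(reply.ip, reply.mac); // layer 3 address mapped to layer 2 address
      return reply.mac;
    }
  }
  return null; // nobody claimed that IP address
}

var laptop = createDevice('192.168.1.141', '24:a0:74:f2:db:7e'); // values from my log
var tv = createDevice('192.168.1.150', '00:00:00:00:00:01');     // placeholder values
var routerArpTable = new Map();
var found = resolve(routerArpTable, [tv, laptop], '192.168.1.141');
```

After the call, `routerArpTable` holds one entry that maps the laptop's IP address to its MAC address, which is the same information `arp -a` prints.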

But here’s the thing about ARP: there’s nothing to prevent a device from lying. No authentication is needed for a device to send ARP messages. For example, my laptop can spoof the IP address of my router, and if that happens, then packets addressed to my router go to my laptop first.

And this is how I was able to see packets from my Apple TV with Herbivore.

My laptop: “ARP! I’m the router because my IP address is 192.168.1.1, and here’s my [actual] MAC address so we can talk.”

My Apple TV: “ARP! Cool, here’s my MAC address. I’m sending my packets to you since I trust anyone who tells me their network layer address.”

My laptop smirks and silently forwards the packets on to their original destination without being caught. This is a type of man-in-the-middle attack.
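The weak spot is easy to see in a toy model of the Apple TV's ARP table: it stores whatever mapping it hears last, with no check on who sent it. The names below are hypothetical and the router's MAC address is a placeholder; nothing here touches a real network.

```javascript
// Toy ARP table that, like real ARP, accepts any reply without checking who sent it.
function createArpTable() {
  var entries = {};
  return {
    learn: function (ip, mac) {
      entries[ip] = mac; // the newest claim simply replaces the old one
    },
    lookup: function (ip) {
      return entries[ip] || null;
    }
  };
}

var ROUTER_IP = '192.168.1.1';
var ROUTER_MAC = 'aa:aa:aa:aa:aa:aa'; // placeholder, not my router's real address
var LAPTOP_MAC = '24:a0:74:f2:db:7e'; // from the Wireshark log above

var tvTable = createArpTable();
tvTable.learn(ROUTER_IP, ROUTER_MAC); // honest reply from the router
tvTable.learn(ROUTER_IP, LAPTOP_MAC); // spoofed reply from the laptop overwrites it

// Anything the TV addresses to "the router" now carries the laptop's MAC address.
var nextHopMac = tvTable.lookup(ROUTER_IP);
```

The table never asks for proof, so the second `learn` call wins and `nextHopMac` ends up being the laptop's address.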

References:
Address Resolution Protocol @ Wireshark
Address Resolution Protocol @ Wikipedia
ARP Spoofing @ Wikipedia
Man-in-the-middle attack @ Wikipedia
Understanding Networks and TCP/IP by Gregory White
What is the OSI Model? @ Cloudflare
How To Do A Man-in-the-Middle Attack Using ARP Spoofing & Poisoning by Shivam Singh Sengar

Week 3: Traceroutes

We’re learning about the physical infrastructure of the Internet, including how bits travel back and forth across the ocean in mere milliseconds through undersea fiber optic cables (and we recently saw one of four NYC access points at Hunter Newby’s facility in downtown Manhattan here). In my mental model, I imagine the Internet as the global nervous system of the human race that’s increasingly spreading across the crust of the planet. Wireless satellite communications play a part in this, too, but as we learned from Hunter, this accounts for barely 1% of the pathways that information might travel.

Run from the command line, a traceroute reveals the path that packets of bits take to a particular destination, such as a server hosting a website. Packets hop from router to router, and each trace notes the router’s IP address, the autonomous system to which it belongs (networks maintained by internet service providers), and the amount of time (again, in milliseconds) it takes to reach and return from each router.

This week, I performed traceroutes on three websites that I visit regularly to learn how my web requests are routed and which networks handle them.

Destinations

  1. This blog where I record my learning at ITP

  2. The P5 Web Editor where I sketch out computational ideas

  3. And YouTube where I consume a limited but daily dose of news and occasionally fall down rabbit holes of rabbit holes when I need a break from #1 and #2

Questions

  1. What are the physical paths that my web requests take?

  2. Do these paths change over time?

  3. Do these paths differ when requested from different geolocations in my city?

  4. Through which networks do these requests pass? And again, do they differ when requests are made from separate geolocations?

Process

  1. Via the command line, traceroute -a www.ellennickles.com returns a detailed list of the aforementioned data points, while traceroute -n www.ellennickles.com returns each hop with just the router’s IP address along with each packet’s roundtrip travel time. I recorded traceroutes for each of the three sites from both my home and NYU over the course of a week: on Monday 9.17.18, Wednesday 9.19.18, and Friday 9.21.18.

  2. I used ip-api.com to batch query the IP addresses for each traceroute, again via the command line. (Here’s an example from their site: curl ip-api.com/batch --data '[{"query": "208.80.152.201"}, {"query": "91.198.174.192"}]'.) This returns JSON-formatted data including the name of the internet service provider (ISP) along with the latitude and longitude of the router at each hop in the packet’s path.

  3. I converted these JSON files into CSV files to see the data in table form and also to easily grab the routers’ geolocations, which I plotted on a plain web map to quickly visualize the actual paths traveled across the Earth. (I’m sure there’s a way to programmatically do all of this, but I wasn’t sure what I wanted to do with the data until I saw it in this form.)
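A rough sketch of how those three steps might be scripted follows. It is not what I actually ran. The helper names are hypothetical, the sample line uses placeholder round-trip times, and the field names `query`, `isp`, `lat`, and `lon` are my assumption about the ip-api.com response, so check their documentation before relying on them. The sketch only builds the request body and formats results; it makes no network calls.

```javascript
// Step 1: pull the hop number, router IP, and round-trip times out of one line
// of `traceroute -n` output. Returns null for lines that are not hops;
// `ip` is null when the hop timed out ("* * *").
function parseHop(line) {
  var match = line.match(/^\s*(\d+)\s+(.*)$/);
  if (!match) {
    return null;
  }
  var ipMatch = match[2].match(/(\d{1,3}(?:\.\d{1,3}){3})/);
  var times = [];
  var timePattern = /([\d.]+)\s*ms/g;
  var found;
  while ((found = timePattern.exec(match[2])) !== null) {
    times.push(parseFloat(found[1]));
  }
  return { hop: parseInt(match[1], 10), ip: ipMatch ? ipMatch[1] : null, rttMs: times };
}

// Step 2: build the JSON body for a batch lookup, skipping hops with no IP address.
function buildBatchBody(hops) {
  var queries = hops
    .filter(function (hop) { return hop && hop.ip; })
    .map(function (hop) { return { query: hop.ip }; });
  return JSON.stringify(queries);
}

// Step 3: turn an array of result objects into CSV text with the chosen columns.
function toCsv(rows, columns) {
  var lines = [columns.join(',')];
  rows.forEach(function (row) {
    var cells = columns.map(function (column) {
      var value = row[column];
      var cell = value === undefined || value === null ? '' : String(value);
      // Quote cells that contain commas, quotes, or line breaks.
      return /[",\n]/.test(cell) ? '"' + cell.replace(/"/g, '""') + '"' : cell;
    });
    lines.push(cells.join(','));
  });
  return lines.join('\n');
}

// Placeholder round-trip times; the IP address is the one from ip-api's own example.
var sampleHop = parseHop(' 3  208.80.152.201  12.5 ms  11.9 ms  12.1 ms');
var sampleBody = buildBatchBody([sampleHop, parseHop(' 4  * * *')]);
```

This handles the simple case of one router per hop; traceroute sometimes lists several IP addresses on a single hop line, which this sketch would reduce to the first one.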

Findings
Geolocations - On each of the three days last week, the paths to each web destination did not vary that much; however, the routes did differ according to where I performed the traceroutes. My home is about 3.75 miles away from NYU and on the same island, yet the routes taken were often quite different as demonstrated below.

Here are snapshots of that geospatial data from my map. All traceroutes visualized below were performed on Monday, 9.17.18. (However, data for all three days from both starting locations is here.)

Network Providers - For those traceroutes initiated at my home, around 50% of the hops are managed by Spectrum, which makes sense: I pay them monthly for Internet access. After that it varies. For traceroutes to my blog, there’s a hop through Vodafone, based in the UK, and then on to the American company, Akamai Technologies, before it reaches Squarespace. For the P5 Editor, Tata Communications America (Tata started as a provider in India) handles the remaining 50% of the hops after Spectrum. For YouTube, there are a couple hops through Tata Communications America but then Google handles the rest (Google acquired YouTube in 2006).

For those run at ITP, NYU itself handles about the first third of hops, followed either by Tata Communications America, Akamai, or Google (for YouTube). The traceroute to my blog, however, reports some new mentions: GTT Communications, a multinational telecommunications company, and Telia Company AB, a Swedish company.

Takeaways & Next Steps
This blog is hosted by Squarespace, which is located on the same island where the traceroutes originated. Yet the packets still travel around the country and over the seabed, often several times, before reaching their destination, just blocks from NYU. When I started this investigation, I was mostly curious about the geographical distances travelled, but as I read in Chapter 11 of Linked, it’s not necessarily about physical distance; it’s how fast routers can move those bits. As we discussed in class after I collected and mapped this data, routers keep track of the fastest routes in their routing tables to shepherd packets to their destinations (a traceroute only reveals the path those tables produce). For the most part routes are static, which my data supports, the one exception being the 9.21.18 traceroute from NYU to my blog. Changing routes would slow the system; this is one of the downsides of a mesh-type network. A good next step for me would be to include the roundtrip times to and from those routers on my map, as well as dynamically draw the lines to watch how the path emerges, and compare the routes again.