Week 7: Packet Analysis

Last week we spoke with ITP alumnus Surya Mattu about data traffic to and from network-connected devices. While the data itself is often encrypted, the metadata reveals quite a lot about a user's Internet activity and habits. Earlier this year, Surya and a colleague collected and analyzed packets from a fully-connected smart home (documented here) and among many other things, learned how chatty devices stay active even when their humans are away. Curious about the situation in my own home, I started my packet-scanning survey with the intention to compare differences between using my devices and leaving them connected but idle. The initial results opened up about a million questions for me, and I quickly meandered down some side roads…here we go:

Part I - What devices are on my network?
My home has a router that is connected to four Apple devices: a laptop, a TV, a tablet, and a phone. Each device has a Media Access Control (MAC) address and a private IP address. MAC addresses are unique identifiers assigned by the device’s manufacturer and rarely change (I say “rarely” because I’ve seen hacks online). IP addresses are assigned by the router when a device connects to it, and these are subject to change if a device “forgets the network” and then re-establishes a connection. This info will come in handy later on in the post.
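To make the two kinds of addresses a little more concrete, here is a minimal sketch using Python's standard library. It checks that an IP address falls in a private range and pulls a MAC address apart. The sample addresses are the ones my laptop reports later in this post; the helper name describe_mac is made up for illustration.

```python
import ipaddress


def describe_mac(mac):
    """Split a colon-separated MAC address into its prefix and flag bits."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}")
    first = octets[0]
    return {
        # The first three octets (the OUI) identify the manufacturer.
        "oui": mac.lower()[:8],
        # Bit 0x02 set means the address was assigned by software,
        # not burned in by the manufacturer.
        "locally_administered": bool(first & 0x02),
        # Bit 0x01 set means a group (multicast) address.
        "multicast": bool(first & 0x01),
    }


ip = ipaddress.ip_address("192.168.1.141")
print(ip.is_private)
print(describe_mac("24:a0:74:f2:db:7e"))
```

A MAC address with the locally administered bit set is one that software has overridden, which is one way the "rarely changes" rule gets broken.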

Part II - What packets can I capture with Herbivore?
I started out simple with Surya’s friendly sniffing application, Herbivore. While packets may be sent using a variety of protocols, this app filters TCP packets sent via HTTP and HTTPS on ports 80 and 443 respectively. Without actively engaging in any other activities on my laptop (no typing or clicking anywhere), I started sniffing my router and quickly saw traffic flowing to and from my computer.
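The kind of filter described above can be sketched in a few lines of Python. This is not Herbivore's actual code, only an illustration of keeping TCP packets on ports 80 and 443; the Packet type, the web_label helper, and the sample packets are all hypothetical.

```python
from typing import NamedTuple

# Well-known ports for the two web protocols mentioned above.
WEB_PORTS = {80: "HTTP", 443: "HTTPS"}


class Packet(NamedTuple):
    protocol: str   # transport protocol, e.g. "TCP" or "UDP"
    src_port: int
    dst_port: int


def web_label(packet):
    """Return "HTTP" or "HTTPS" for TCP packets on a web port, else None."""
    if packet.protocol != "TCP":
        return None
    # A packet belongs to a web conversation if either end is a web port:
    # requests have it as the destination, responses as the source.
    for port in (packet.dst_port, packet.src_port):
        if port in WEB_PORTS:
            return WEB_PORTS[port]
    return None


# Made-up packets, not taken from a real capture.
sample = [
    Packet("TCP", 52344, 443),   # request to an HTTPS server
    Packet("UDP", 5353, 5353),   # not TCP, filtered out
    Packet("TCP", 443, 52344),   # response from an HTTPS server
    Packet("TCP", 52345, 22),    # TCP, but not a web port
    Packet("TCP", 52346, 80),    # request to an HTTP server
]

labels = [web_label(p) for p in sample if web_label(p)]
print(labels)
```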

Though no applications were running in the “foreground”, background apps like Adobe Creative Cloud, Avast, and Dropbox were regularly sending packets out every few minutes to the internet. I also noticed that except for Avast, the ports where the packets originated on my laptop changed with each transmission.
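The changing source ports are most likely ephemeral ports: when a program does not ask for a specific port, the operating system picks a temporary one for it. The sketch below shows that behavior by binding a few sockets to port 0 on the loopback interface, which asks the OS to choose. It is an analogy for what happens on outgoing connections, not a reproduction of what these particular apps do, and the function name ephemeral_ports is made up.

```python
import socket


def ephemeral_ports(count=3):
    """Open `count` sockets at once and return the ports the OS assigned."""
    sockets = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            # Port 0 means "let the operating system pick a free port".
            s.bind(("127.0.0.1", 0))
            sockets.append(s)
        return [s.getsockname()[1] for s in sockets]
    finally:
        for s in sockets:
            s.close()


ports = ephemeral_ports(3)
print(ports)
```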

I noticed that the traffic changed depending on whether I left a browser open. Packets to Google (though not pictured below) and my password-manager app showed up when I left Firefox running with Inbox in a tab and a variety of other pages open.

With Chrome running and the same sites open, I saw similar activity and much more from services I did not recognize like fw.adsafeprotected.com, s.yimg.com, ds.reson8.com, choices.truste.com, pr.ybp.yahoo.com, and others. Pretty sure I haven’t visited anything Yahoo since the late 90s.

Part III - What packets can I capture with Wireshark?
Then I switched to Wireshark, which provides a great deal more information about network activity, covering a wide range of protocols for both local and remote communication. I ran two one-hour-long sessions using Wireshark in promiscuous mode with the intent to capture activity from all my connected devices. For the first session my devices were on but idle, and for the second, I actively engaged with them as much as possible.

SESSION 1 — Idle Devices — Packets Captured: 15,438

This chart shows all the many protocols that were active during this period of hands-off devices.

Wireshark shows you the IP addresses where packets were headed and also where they originated. A Whois lookup told me that most of the conversation during this time was between my laptop and two different Google IP addresses, mostly via TCP. Because TCP is a handshaking protocol in which each side acknowledges the other, this likely explains why Google has the same percentage in both lists below. While my browsers were closed, I do run Google’s Backup & Sync (for Drive) in the background, and my guess is that this handshaking occurs through a port filtered out in Herbivore.

Top sources of packets:
44% My laptop
21% Google
16% My router

Top destinations of packets:
34% —> My laptop
21% —> Google

SESSION 2 — Engaged Devices — Packets Captured: 140,742

My activities included streaming a show on my Apple TV, texting on my phone, browsing sites on my iPad, and attempting to do some homework on my laptop (nearly impossible, I was so distracted), including visiting various sites and cloning GitHub repos.

Compared to Session 1’s protocol list, GQUIC (Google Quick UDP Internet Connections) leaped onto the scene.

Top sources of packets:
My laptop 28%
GitHub 25%
Google 17%

My router 2%
My phone 0%*
My tablet 0%*
My TV 0%*

Top destinations of packets:
69% —> My laptop

My router 0%*
My phone 0%*
My tablet 0%*
My TV 0%*

* Wait a second: I was using ALL of my connected devices, so how come I barely found any of their packets (so few that their percentages rounded to zero)? What I did find was related to internal device communication on my local network using the MDNS and IGMPv2 protocols, the latter used to support online video streaming.
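One likely reason these particular packets showed up when the rest did not: MDNS and IGMP traffic is addressed to multicast groups rather than to a single device, so it is meant to reach many listeners at once. MDNS, for example, uses the group address 224.0.0.251 on UDP port 5353. The sketch below only classifies destination addresses; the who_receives helper is made up for illustration.

```python
import ipaddress

# Multicast group address used by MDNS (UDP port 5353).
MDNS_GROUP = "224.0.0.251"


def who_receives(dst):
    """Say roughly who a packet sent to this IPv4 destination is meant for."""
    addr = ipaddress.ip_address(dst)
    if addr.is_multicast:
        return "every device subscribed to the group"
    return "only the addressed device"


print(who_receives(MDNS_GROUP))
print(who_receives("192.168.1.141"))
```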

The screenshot below shows traffic from my iPad. If you dig into any of the MDNS packets, you’ll find the name of my tablet, which happens to be my last name, clearly visible to me and anyone else sniffing my network. (But not any more!)

Part IV - So where are the missing packets?
I returned to Herbivore and sniffed my Apple TV while streaming a movie to see if I could capture packets with a different app, and sure enough, I saw them. So was I doing something wrong in Wireshark? Nope. In office hours (thank you, Tom!) I learned that Apple changed their network card configurations a while back to see only the packets sent to them. So my laptop is only supposed to see the traffic headed its way, which explains why it could not see packets specifically addressed to my other Apple devices. But I can get around this by using Herbivore, which uses ARP spoofing.

Part V - So what is ARP and ARP spoofing?
Remember how each device on a network has a MAC address and a corresponding IP address? When devices on the same network need to send data to one another, they need to know each other’s MAC addresses. 

How do they find each other’s MAC address? They ask by using their IP addresses!

They broadcast their question (in packet form) to every node on the network using Address Resolution Protocol (ARP) by calling on active IP addresses. When a device “hears” its IP address called, it responds with its MAC address.

For example, take a look at the first two entries in my Wireshark log. My router wants to know the identity of the device with the IP address of 192.168.1.141:

In my mind, ARP sounds like a medium-sized dog bark.

My router: “ARP! Who has 192.168.1.141? Tell me at 192.168.1.1” 

My laptop: “ARP! Hey there, I’m 192.168.1.141, and my MAC address is 24:a0:74:f2:db:7e.”
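Under the dog barks, an ARP message on an Ethernet/IPv4 network is a fixed 28-byte structure: hardware type, protocol type, two address lengths, an operation code (1 for a request, 2 for a reply), then the sender's and target's MAC and IP addresses. The sketch below builds and reads that structure for the exchange above. It only constructs bytes in memory and sends nothing. The router's MAC address is a made-up placeholder, and the function names are hypothetical.

```python
import socket
import struct

# hardware type, protocol type, MAC length, IP length, operation,
# sender MAC, sender IP, target MAC, target IP (network byte order)
ARP_FORMAT = "!HHBBH6s4s6s4s"


def mac_to_bytes(mac):
    return bytes.fromhex(mac.replace(":", ""))


def bytes_to_mac(raw):
    return ":".join(f"{b:02x}" for b in raw)


def build_arp(oper, sender_mac, sender_ip, target_mac, target_ip):
    """Pack an ARP payload for Ethernet (type 1) and IPv4 (0x0800)."""
    return struct.pack(
        ARP_FORMAT, 1, 0x0800, 6, 4, oper,
        mac_to_bytes(sender_mac), socket.inet_aton(sender_ip),
        mac_to_bytes(target_mac), socket.inet_aton(target_ip),
    )


def describe_arp(payload):
    """Turn an ARP payload into a one-line summary."""
    _, _, _, _, oper, sha, spa, _, tpa = struct.unpack(ARP_FORMAT, payload)
    if oper == 1:
        return f"Who has {socket.inet_ntoa(tpa)}? Tell {socket.inet_ntoa(spa)}"
    if oper == 2:
        return f"{socket.inet_ntoa(spa)} is at {bytes_to_mac(sha)}"
    return f"unknown ARP operation {oper}"


ROUTER_MAC = "aa:bb:cc:dd:ee:01"   # hypothetical placeholder
LAPTOP_MAC = "24:a0:74:f2:db:7e"

# In a request the target MAC is unknown, so it is left as zeros.
request = build_arp(1, ROUTER_MAC, "192.168.1.1",
                    "00:00:00:00:00:00", "192.168.1.141")
reply = build_arp(2, LAPTOP_MAC, "192.168.1.141",
                  ROUTER_MAC, "192.168.1.1")

print(describe_arp(request))
print(describe_arp(reply))
```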

My router then maps my laptop’s IP address to that MAC address and stores it in an ARP table. (You can check your computer’s ARP table by typing arp -a into a terminal window.) Then it continues to query to ensure that it has the most up-to-date information. During one hour, my router broadcast the same question over 100 times.
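One plausible reason for the repeated questions is that ARP table entries are not kept forever: an entry that is too old gets dropped and has to be asked for again. The sketch below is a toy model of such a table, not how any particular router implements it. The class name and the 60-second lifetime are assumptions chosen for illustration; real timeouts vary by system.

```python
class ArpTable:
    """Toy IP-to-MAC table whose entries expire after max_age seconds."""

    def __init__(self, max_age=60.0):
        self.max_age = max_age   # assumed lifetime, for illustration only
        self._entries = {}       # ip -> (mac, time the entry was learned)

    def learn(self, ip, mac, now):
        """Record or refresh a mapping heard in an ARP reply."""
        self._entries[ip] = (mac, now)

    def lookup(self, ip, now):
        """Return the MAC for ip, or None if unknown or expired."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, learned_at = entry
        if now - learned_at > self.max_age:
            del self._entries[ip]   # stale: forces a new ARP question
            return None
        return mac


table = ArpTable(max_age=60.0)
table.learn("192.168.1.141", "24:a0:74:f2:db:7e", now=0.0)
print(table.lookup("192.168.1.141", now=30.0))
print(table.lookup("192.168.1.141", now=90.0))
```

Times are passed in explicitly rather than read from the clock so the example behaves the same on every run.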

In the Open Systems Interconnection (OSI) Network Model, which is used to describe how different computer systems talk with one another, MAC addresses are data link layer (or layer 2) addresses. Internet Protocol (IP) addresses are network layer (or layer 3) addresses. Address Resolution Protocol is a call-and-response method that uses network layer addresses to find and map to data link layer addresses.

But here’s the thing about ARP: there’s nothing to prevent a device from lying. No authentication is needed for a device to send ARP messages. For example, my laptop can spoof the IP address of my router, and if that happens, then packets addressed to my router go to my laptop first.

And this is how I was able to see packets from my Apple TV with Herbivore.

My laptop: “ARP! I’m the router because my IP address is 192.168.1.1, and here’s my [actual] MAC address so we can talk.”

My Apple TV: “ARP! Cool, here’s my MAC address. I’m sending my packets to you since I trust anyone who tells me their network layer address.”

My laptop smirks and silently forwards the packets on to their original destination without being caught. This is also known as a type of man-in-the-middle attack.
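The weakness can be shown without touching a real network. The sketch below simulates the Apple TV's ARP table as a plain dictionary that accepts any reply it hears, which is the "no authentication" problem in miniature. Nothing is sent anywhere; the router's MAC address is a made-up placeholder, and the function names are hypothetical.

```python
def apply_arp_reply(table, claimed_ip, sender_mac):
    """Accept an ARP reply the way ARP does: without checking who sent it."""
    table[claimed_ip] = sender_mac


def next_hop_mac(table, gateway_ip):
    """MAC address the device will address its outbound frames to."""
    return table[gateway_ip]


ROUTER_IP = "192.168.1.1"
ROUTER_MAC = "aa:bb:cc:dd:ee:01"   # hypothetical placeholder
LAPTOP_MAC = "24:a0:74:f2:db:7e"

apple_tv_table = {}

# Honest reply: the router announces its own IP-to-MAC mapping.
apply_arp_reply(apple_tv_table, ROUTER_IP, ROUTER_MAC)
before = next_hop_mac(apple_tv_table, ROUTER_IP)

# Spoofed reply: the laptop claims the router's IP with its own MAC.
apply_arp_reply(apple_tv_table, ROUTER_IP, LAPTOP_MAC)
after = next_hop_mac(apple_tv_table, ROUTER_IP)

print(before)
print(after)
```

After the second reply, traffic the Apple TV means for the router is addressed to the laptop's MAC instead, which is what lets the laptop sit in the middle.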

References:
Address Resolution Protocol @ Wireshark
Address Resolution Protocol @ Wikipedia
ARP Spoofing @ Wikipedia
Man-in-the-middle attack @ Wikipedia
Understanding Networks and TCP/IP by Gregory White
What is the OSI Model? @ Cloudflare
How To Do A Man-in-the-Middle Attack Using ARP Spoofing & Poisoning by Shivam Singh Sengar