04 July 2015

Monitoring Traffic

So as a scientific exercise, I set out to find what would be involved in monitoring traffic going from the internal network to the outside world. This article assumes quite a bit of prior knowledge and is based on a specific use case with lots of assumptions.

There are many ways to approach this problem. Some alternative options would be (in no particular order):

  • Have a hub on the network (before the internet connection) and monitor all traffic coming from it
  • Build a router setup on the Linux machine with two network cards (one on the internal side and one on the internet side)
  • Hack the Internet router to support packet inspection
  • Install a hardware solution to monitor traffic

The option I decided to explore was to route all traffic through a Linux server before it goes to the router. These were the rough steps to achieve this:

  1. Install a DHCP server
  2. Set up the DHCP server to give out the Linux server's IP as the gateway
  3. Use tcpdump to record all traffic
  4. Use investigative tools to analyse the collected traffic

Basic Network layout

For the sake of all the material that follows, here is a summary of what my network looks like:

192.168.0.1 - Router (cable modem) that provides access to the greater Internet
192.168.0.2 - Linux Server (Gentoo distribution)
192.168.0.3 - 192.168.0.9 - Other servers, devices and appliances (Printer, SIP Server, TV, etc)
192.168.0.10 - 192.168.0.99 - Computer devices (PCs, Laptops, mobile phones, tablets, Kindle, etc)

DHCP Server

Depending on your Linux distribution, you can pick the DHCP server you are most comfortable with. I use ISC DHCP. Your configuration will vary depending on the server you have chosen, but I've included some snippets from the configuration to demonstrate the setup I settled on.

In my installation, the file lives in /etc/dhcp/dhcpd.conf

# option definitions common to all supported networks...
option domain-name-servers 208.67.220.220, 208.67.222.222, 8.8.8.8, 8.8.4.4;

# Long lease times, in seconds, so devices keep their IPs
default-lease-time 360000;
max-lease-time 3600000;

# If this DHCP server is the official DHCP server for the local
# network, the authoritative directive should be uncommented.
authoritative;

# Use this to send dhcp log messages to a different log file (you also
# have to hack syslog.conf to complete the redirection).
log-facility local7;

# This is a very basic subnet declaration.
subnet 192.168.0.0 netmask 255.255.255.0 {
  # Assign IPs between 192.168.0.10-192.168.0.99
  range 192.168.0.10 192.168.0.99;
  # Address of server to route traffic through
  option routers 192.168.0.2;
}
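
Once the configuration is in place, it's worth checking it for errors and restarting the daemon. On my Gentoo (OpenRC) setup that looks roughly like this - systemd-based distributions would use systemctl instead:

# Test the configuration for syntax errors before applying it
dhcpd -t -cf /etc/dhcp/dhcpd.conf
# Restart the DHCP daemon so the new settings take effect
rc-service dhcpd restart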

Some explanations -
option domain-name-servers defines my DNS resolving servers. I use public servers for both speed and independence from my ISP's DNS (the 8.8.* addresses are Google's free DNS, while the other two are part of OpenDNS)
To decide which of the free DNS servers to use, I went by speed. To find the lowest-latency server, I used the following command

parallel -j0 --tag dig @{} "$*" ::: 208.67.222.222 208.67.220.220 198.153.192.1 198.153.194.1 156.154.70.1 156.154.71.1 8.8.8.8 8.8.4.4 | grep Query | sort -nk5

It requires GNU parallel. Essentially, it queries each server in a list of known free DNS servers, then sorts the results by query time. The top result is the fastest.

max-lease-time identifies how long (in seconds) a lease lasts before a new one is required. I set this to a very long time - 3,600,000 seconds is roughly 41 days - since I don't have a congested network and I would like devices to keep their IPs for as long as practical.

range 192.168.0.10 192.168.0.99; identifies the IPs I'd like dynamically allocated

option routers 192.168.0.2; - this is the most critical part for what I was trying to achieve. I want all traffic on the network to be routed through the Linux server machine, so I set that as the router (gateway) address.

After this step, any device that was getting its address automatically was also routing all its internet access through the Linux server (and thus packets could be intercepted and inspected on the way in and out).
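
One thing worth noting: pointing clients at the server as their gateway only works if the server actually forwards packets. A minimal sketch of enabling IP forwarding (standard sysctl settings; the persistence file location may vary by distribution):

# Enable IPv4 forwarding so client traffic is passed on to the real router
sysctl -w net.ipv4.ip_forward=1
# Persist the setting across reboots (file location may differ per distribution)
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf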

Record all network traffic

To record all the traffic, I settled on using tcpdump. It's one of the most established and reliable solutions. Instead of doing real-time inspection, I set up a running log that dumps everything to a file.

My shell script was:

#!/bin/bash
# Capture file location, the server's own IP, the internal network
# range and the network interface to capture on
export FILE=/var/log/internet.log
export INTIP=192.168.0.2
export INTNET=192.168.0.0/24
export DEV=enp2s0
# -s0 captures full packets rather than truncated headers
tcpdump -s0 -i $DEV -w $FILE "(ip src net $INTNET and not ip src $INTIP) or (ip dst net $INTNET and not ip dst $INTIP)"

The variables at the top set where the log file should be recorded, the server's IP, the internal network range and the network card's device name (note that yours could be something like "eth0" instead). The filter at the end ensured that the server's own internet traffic was not logged.

And just like that, I had all traffic between the computers on the network and the outside world being logged to a file. It turns out that was the easy part.
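
A quick way to sanity-check that packets are actually landing in the file is to read it back with tcpdump itself:

# Print the first few captured packets without resolving names or ports
tcpdump -nn -r /var/log/internet.log | head -20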

Traffic analysis

As I soon found out, there are many tools to make sense of what tcpdump captures in what's known as a pcap (packet capture) file. Most of the tools I ran into were quite "mature" (old) and very technical (not easy or straightforward to use). I'll discuss only a few of them, based on what I found useful.

wireshark

Most people would have heard of this one. It is capable of monitoring and investigating traffic. It is very powerful and lets you do a lot of things, but with that power comes the drawback of it being quite hard to use. There are newer features that make some things easy (like exporting all HTTP objects to files), but it is still a somewhat cumbersome tool. In summary, if you know what you are doing, this may be the only tool you need. But if you want a quick look and something easy to use, you probably want something simpler.
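
For a quick look without the GUI, Wireshark also ships with a command-line companion, tshark. A minimal sketch, assuming the capture file from earlier and a build recent enough to support the -Y display filter:

# Show each HTTP request in the capture: source IP, host and path
tshark -r /var/log/internet.log -Y "http.request" -T fields -e ip.src -e http.host -e http.request.uri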

tcpflow

This tool is quite simple to use and can give you a lot of readable information quickly.

tcpflow -aCJ -e http -r /var/log/internet.log

This is the basic usage scenario and it shows the raw ASCII data of the requests and responses between the local machines and remote servers.

tcpflow -aCJ -e http -r /var/log/internet.log | grep -E "(GET|POST|Host)"

Similar to the above command, but instead of showing the complete ASCII packets, it only shows the HTTP requests and the hosts they are addressed to. It can help with getting a good idea of what users are generally doing on the web, without going into details.

tcpflow -aCJ -e http -r /var/log/internet.log | grep -E "Host" | sort -u | awk '{ print $2 }'

Another variation, which will give you just a list of the websites users have gone to.
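
One more variation along the same lines (my own extension of the commands above, not part of tcpflow itself) ranks the sites by how often they were requested:

tcpflow -aCJ -e http -r /var/log/internet.log | grep -E "Host" | awk '{ print $2 }' | sort | uniq -c | sort -rn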

chaosreader

Now we are getting a bit more user friendly. This simple tool merges the TCP streams and extracts the relevant files. It stores the images and HTML files that have been downloaded and shows a complete log of all the established connections. It generates HTML reports that are very easy to use.

The usage is simple:

chaosreader /var/log/internet.log

Just make sure you run the command inside an empty directory - it will generate a lot of files as part of the export.
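
For example (the directory name is just an illustration):

mkdir /tmp/capture-report && cd /tmp/capture-report
chaosreader /var/log/internet.log
# then open the generated index.html in a browser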

NetworkMiner

I finally found a tool I am pretty happy with. Some parts of its interface are not as straightforward as I would like, but by and large it's the most comprehensive tool to service the needs I had. It lets you see all the remote connections, all the HTTP streams, all the files and images that have been downloaded, and a couple of views to investigate the raw text that has passed through. All in all, it can give you all the information you need about what has happened on your network.
It's a GUI, so you pick your capture file from the interface. It takes a bit of time to do its processing, but it is relatively quick. It also stores all the downloaded files locally.
It's written in .NET, but it is designed to run under Mono as well. 
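Assuming Mono is installed, launching it looks roughly like this (the exact path depends on where you unpacked it, and the capture file can also be opened from the GUI instead):

# Run NetworkMiner under Mono and load the capture file
mono NetworkMiner.exe /var/log/internet.log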
It is open source and I couldn't help making a couple of modifications. The main things I changed are storing session data in a sub-folder named after the host rather than its IP, and no longer storing certificate files - they were just creating a lot of useless clutter. Further modifications I'm considering are exporting data to DB/SQL files to make it easier to analyse and search through, as well as some UI changes to allow looking at HTTP packets quickly.

Conclusions

All in all, after some trial and error and some investigation, it was fairly straightforward to get network monitoring up and running. On the downside, this approach would have produced much better results a few years ago, before most major sites and communication applications started forcing SSL-encrypted traffic. Now, as expected, the only information to be gained from an HTTPS connection is the address of the site - none of the content is visible. The same goes for messaging applications. If you need to monitor network traffic on your own network, you are probably better off installing software locally on the machines/devices you are interested in; Fiddler for PCs and mSpy for mobile devices would be a good place to start. There are ways to do it through a router, but they get complex and rely on playing with trusted certificates. If you decide to go down that path, this article would get you only halfway there.