DarkNet Monitor Project

Posted on December 24, 2020
Tags: DarkNet network monitoring infosec 100DaysToOffload

Tim Lavoie

A Side Project Begins

A couple of years ago, my manager asked if anyone would be interested in a small, spare-time sort of side project. I jumped at the chance, because it let me stretch a bit and do some techie work that I wasn’t doing at the time. Also, the company I work for is a fairly large organization, and it’s always interesting to get insights into the parts of it that I don’t normally interact with.

One of our very senior security guys gets to think about all the big-brain, big-picture stuff, but doesn’t have time to try things out himself. Instead, he gets the occasional volunteer to dig into a topic, creating a proof of concept (PoC) to explore the space a little. This was mine.

I’m not allowed to post source code, simply due to it needing approval from Legal to avoid liability issues and the like. I do have permission to discuss the details though, so here we are.

DarkNet? You mean Dark Web? Deep Web? Online Drug Sales?

In this case… no

Many of these terms get tossed around together, and it gets confusing for sure. For this post, I’m not referring to things like the Silk Road marketplace that got busted a few years back, nor other sites that you’d have to use Tor or something to connect to. Though I do appreciate Tor, and actually run a couple nodes. That’s for another time.

In this case, what I mean is unallocated network space, where there should be no traffic at all. When you have a block of internet addresses, you hand out IP (internet protocol) addresses to each device that you want on the network. When you connect to your internet provider, your modem gets assigned an address, and the same thing happens with your Wi-Fi at home.

What’s left is what I’m referring to here as “DarkNet”. That is, if there is nothing assigned to a particular address, logically there should be no traffic to or from that address.

If there is traffic involving one of these addresses… well, that is interesting.

Objective

The goal was to create a tool to receive traffic sent to unallocated company address space and forward details to our in-house logging platform. The concept is that since no traffic should be received at all, anything that does arrive is automatically interesting. By recording this information and forwarding it to the logging systems, it can inform our incident response team and other parties defending against malicious traffic.

Approach

A Docker container was created with minimal system tools, providing packet capture and logging capabilities. The container has scripts to record nearly all network traffic in PCAP format as a rolling set of files, retained on the container in case a responder needs to investigate further.

A local, minimal Snort install that also logs locally was added, essentially as a base proof of concept. Ideally, this would be configured with rules to perform more than default inspection of the traffic.

A statically-compiled program was created to process each PCAP file. This program logs to a local database, and sends summary information to a local rsyslog instance within the Docker container. Rsyslog was configured to forward local logs to the logging platform.

Finally, an example Splunk query was created to confirm that incoming log events could be parsed and displayed.

Current(-ish) State

A container has been running for a few months on a test system, with an exclusion rule to minimize its own noise. By default, the local DB contains information on every packet processed over the container’s lifetime. Splunk can be queried to confirm that logs are being received.

Detailed Design

Docker Config

Docker containers are defined through a text file that specifies a starting-point image, then layers additional changes on top. The result is a new image that applies these changes in order and runs in what appears to be a very minimal virtual environment. The file in question is “Dockerfile” in the archive. It starts with a minimal Alpine Linux distribution and then adds the required pieces (a rough sketch follows the command breakdown below):

  • Start with the already-minimal Alpine Linux base image.
  • Add a few handy tools: snort, tmux, tcpdump, bash, rsyslog, sqlite.
  • Copy in some configuration files and scripts to make it all work.
  • Fire up the daemons that will run.
  • Launch tcpdump to capture packets, via:

/usr/sbin/tcpdump -F pcap_filter -s 1500 -w trace-%Y-%m-%d_%H.%M.%S.pcap -G 60 -Z foo -z /app/darknet-mon

What this command does is broken down like this:

  • Use the filter defined in “pcap_filter” to capture almost all network traffic, except for that legitimate traffic which will be coming from this host.
  • Capture the full packet size up to a reasonable MTU.
  • Dump everything into a file using a date/timestamp as a name.
  • Rotate the file every minute (we’ll get more into this).
  • Run as user “foo”, dropping root permissions once the capture is set up.
  • Run the program called “darknet-mon” on the pcap file that’s just been closed.
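
To give a feel for it, a Dockerfile along these lines would cover the steps above. This is only a sketch, not the actual file from the archive, so the base image tag, package names, and paths are assumptions:

FROM alpine:3.12

# Capture, IDS, logging and database tools layered onto the minimal base.
# (Package availability may vary; snort in particular may need another repo.)
RUN apk add --no-cache snort tmux tcpdump bash rsyslog sqlite

# Pull in the pcap filter, rsyslog config, startup script and compiled binary.
COPY pcap_filter rsyslog.conf start.sh darknet-mon /app/
WORKDIR /app

# start.sh fires up rsyslog and snort, then launches the tcpdump command
# shown above to capture, rotate and post-process the traffic.
ENTRYPOINT ["/app/start.sh"]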

The primary source code for this program is written in Haskell. It could be in anything, really; this program is just called for each PCAP file, effectively running as a short batch job each time.

The code creates separate types for various data fields, then defines a compound type that collects the fields of interest for a given packet.
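
As a rough sketch of that shape (the names here are mine, not the project’s actual identifiers):

-- Hypothetical field and record types for one captured packet.
import Data.Text (Text)
import Data.Time (UTCTime)

newtype SrcIP    = SrcIP Text
newtype DstIP    = DstIP Text
newtype Protocol = Protocol Text

data PacketRecord = PacketRecord
  { pktTime  :: UTCTime   -- timestamp parsed from the tcpdump text output
  , pktSrc   :: SrcIP     -- source address
  , pktDst   :: DstIP     -- destination address
  , pktProto :: Protocol  -- e.g. TCP, UDP, ICMP
  , pktInfo  :: Text      -- remaining summary text for the packet
  }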

The program takes the path of a PCAP file as its sole parameter, which in this case will be a file in the same subdirectory, /app.

A log entry is created, which includes the PID of the program as a sort of header for subsequent log entries that will be created during the run.
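
Something along these lines (purely illustrative; the real program sends its output to the local rsyslog instance) would produce that header:

import System.Posix.Process (getProcessID)

-- Build a header line that tags this run's subsequent log entries with its PID.
logHeader :: FilePath -> IO String
logHeader pcapPath = do
  pid <- getProcessID
  pure ("darknet-mon[" ++ show pid ++ "] processing " ++ pcapPath)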

The program calls tcpdump for basic processing of the PCAP file, essentially converting a binary file into a text format with consistent fields for simple processing of the data of interest. For reference, this call looks basically like:

/usr/sbin/tcpdump -n -l -q -tttt -r pcapfile
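
In Haskell, shelling out for that conversion can be as simple as the following sketch, using the process package (the function name is mine):

import System.Process (readProcess)

-- Run tcpdump over a rotated capture file, returning one text line per packet.
dumpPackets :: FilePath -> IO [String]
dumpPackets pcapPath = do
  out <- readProcess "/usr/sbin/tcpdump"
                     ["-n", "-l", "-q", "-tttt", "-r", pcapPath] ""
  pure (lines out)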

SQLite3 databases are used twice for each batch of PCAP data. First, entries are loaded into an in-memory temporary database, which can be queried for any desired details about that single file of data, typically to report how many new packets were processed. The process is then repeated for the persistent database, written on the container as /app/foo.db. Each database is defined with a single table called “packetcap”.

When reporting on the new file’s contents, the program runs queries to log starting and ending time/date stamps, to assist a responder by providing the window in question. Data is inserted into each database as a single bulk transaction for performance reasons; otherwise, SQLite wraps each insert in its own implicit transaction, which takes much longer. With a single transaction, a batch of inserts is very quick.
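
A minimal sketch of that bulk insert, using the sqlite-simple package (the column names are assumptions; only the “packetcap” table name comes from the project):

{-# LANGUAGE OverloadedStrings #-}
import Database.SQLite.Simple

-- Insert a whole batch of rows inside one transaction.
insertBatch :: FilePath -> [(String, String, String, String)] -> IO ()
insertBatch dbPath rows = do
  conn <- open dbPath  -- e.g. ":memory:" for the temporary DB, or the persistent file
  execute_ conn "CREATE TABLE IF NOT EXISTS packetcap \
                \(ts TEXT, src TEXT, dst TEXT, proto TEXT)"
  withTransaction conn $
    executeMany conn
      "INSERT INTO packetcap (ts, src, dst, proto) VALUES (?,?,?,?)" rows
  close conn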

The final piece of reporting lists the new data, using a query that displays the packet type, source and destination IPs, and payload type.

The intent is to show, at a high level, the distinct aspects of the traffic received during this time window, ordered by the source IP address.
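
A distinct-traffic summary along those lines could be pulled back out with something like this (again, column names assumed):

{-# LANGUAGE OverloadedStrings #-}
import Database.SQLite.Simple

-- Distinct protocol/source/destination combinations seen in this window,
-- ordered by source address.
summarize :: Connection -> IO [(String, String, String)]
summarize conn =
  query_ conn "SELECT DISTINCT proto, src, dst FROM packetcap ORDER BY src"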

Building

Haskell programs, such as this one, often use the stack tool to create the build environment. It can be found at https://docs.haskellstack.org/en/stable/README/. It includes Docker support, which we will leverage to actually perform the build in the desired Linux distribution.

The stack.yaml file in the project archive defines the various parameters. The Docker configuration in this case performs the build in a temporary Docker image, using a version of the compiler built for and running on Alpine Linux. As a result, the project should be built on a Linux system with Docker running and with stack already downloaded. Note that the stack version must match the one contained in the base image; the referenced Alpine image currently has version 1.9.3, so get that version for the local Linux system. The script found at https://get.haskellstack.org/ is currently set to download v2.1.3, so a user would save the script and edit it to fetch the correct version. When the base image changes the version of stack included, the error message will prompt you to fetch a new one to match.

In the project directory, running “stack build” should take care of fetching and building the dependencies, and leave a static darknet-mon binary in the dist/ folder. From there, a new Docker image would be created and deployed wherever makes sense. Finally, network rules would be defined to forward unallocated traffic to the host running Docker, so that the sensor (this package) can see it.

Logging

A member of the logging team helped with defining queries that list logged data from this program. The specific query used is parsing out IPv4 packet data, as that is what was contained in the example testing. A similar query would be used to show IPv6.

Interacting with Docker

  • On the host, running “docker images” will show you the available images.
  • Run “docker ps” to see what current containers are running.
  • To start a shell and do things inside the container: docker exec -it <container> /bin/bash
  • Note: Alpine doesn’t use bash by default, but the Dockerfile adds it to the deployment.
  • “sqlite3 foo.db” connects you to the database.
  • To count all recorded packets: select count(*) from packetcap;
  • “.headers on” enables column headers in queries.

Limitations and Next Steps

Note that the existing deployment captures the full interface of the Docker container’s host, so understand that it will also capture traffic such as SSH to the host platform for admin purposes. The aforementioned pcap_filter file limits the noise somewhat, but could be replaced with a deployment step that customizes it for the specific deployment site.

File cleanup: While the compiled app does compress processed PCAP files using gzip, it does not automatically delete old ones (e.g. > 90 days). The Dockerfile could have an entry added that creates a cron-style entry to do this using the find command (a sketch follows). The database will also continue to grow, so perhaps an additional command there would be useful.
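
For example, a cron-style entry roughly like the following (path, filename pattern, and retention period are assumptions) would prune compressed captures older than 90 days:

0 3 * * * find /app -name '*.pcap.gz' -mtime +90 -delete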

Reporting and use of this information should be refined further to get it beyond basic PoC stage.

Note that while Alpine is a nice, small base for a minimal app configuration, building against musl instead of the standard glibc does cause some additional work when cross-compiling from a non-musl host.

If you’d like to comment, please feel free to add to the tweet for this post, here

Note: This is post #6 for #100DaysToOffload.