Network HTTP Packet Snooper
Roland Wooster, Stephen Williams, Patrick Brooks
April 25, 1996
The World Wide Web is now the dominant source of traffic on the Internet backbone, and may soon become the dominant source of traffic on local area networks as well. To better understand the characteristics of that traffic it is necessary to isolate HTTP packets from those of other network applications. We have constructed a tool that performs this function. The developed tool is experimentally compared with an existing tool set to analyze efficiency and performance.
We have developed a tool based on the packetfilter mechanism to collect HTTP packets from a network. This new tool is experimentally compared with an existing tcpdump-based tool set (described in §ii-A). Both tools produce a variant of the common log [1] which is used to measure and compare performance differences. This paper includes:
Webservers collect logs of traffic to that server, and proxy servers may be used to collect logs of usage by groups of http clients. However, there are several areas in which an independent, in-network tool can be used to collect information in ways that servers and proxies cannot. Because httpdump can run transparently on a broadcast local area network, it can be used in a variety of situations where it is cumbersome or impossible to collect logs otherwise. httpdump can be used to monitor client activity, collect logs for a group of servers as a ``third-party'', and collect logs at a network router facility.
To maximize the usefulness of the tool, the output is in a form that other WWW analysis tools can take advantage of. This form is known as the common format log file [1]. Both the tcpdump-based tool and httpdump generate this form of output and also create an extended version. This allows for compatibility with existing tools and allows new tools to take advantage of the extra information in the extended format.
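For reference, a single entry in the common log format has the shape shown below (the host name, timestamp, URL, status code, and byte count here are illustrative, not taken from our data):

```
remote.host.edu - - [25/Apr/1996:14:03:22 -0500] "GET /index.html HTTP/1.0" 200 3042
```

The fields are the client host, two identity fields (unused here), the request timestamp, the request line, the response status code, and the number of bytes transferred.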
The first tool, developed prior to this project, uses the output from tcpdump to create a log file. Tcpdump is executed in a manner that only passes up packets with a host on port 80 (the standard http daemon port). Then, a series of three PERL scripts:
Figure 1: Tool one, using tcpdump and PERL scripts
Packet filters are a highly optimized set of procedures to do work on network packets [2]. They are many times faster than previous filtering routines and are the basis for tcpdump. httpdump increases the performance level of the translation by implementing all of the filtering and conversion steps in `C' code. Once httpdump has captured a TCP packet, it places the packet data in a large array until it can be processed. The main thread then picks up the information and determines whether it is:
Figure 2: Simple overview of tool two, the
multi-threaded program's architecture.
Our initial design for httpdump was a single multi-threaded executable. Unfortunately, we found that the `C' libraries pthread.h and pfilt.h conflicted with each other, leaving no alternative but to design the program as two intercommunicating executables.
A First In First Out, FIFO [3], queue is used between the two executables. The writer to the FIFO is the tcppf executable which ``snoops'' the network for packets and quickly filters out the packets which can not possibly be part of an HTTP connection. The reader from the FIFO is the http executable. This program is the multi-threaded executable which monitors HTTP sessions.
In order to minimize the delay in collecting network packets, we chose to use a packet-filter [2] based approach. The packet-filter itself passes all TCP packets up to a highly optimized `C' program within the same thread. The `C' program then discards all packets determined to be irrelevant and passes packet headers and some data portions through a FIFO to the httpdump main process.
In order to minimize execution time, the main program was optimized: all divisions and multiplications were removed, and additions and conditional branches were minimized. The FIFO buffer is also optimized by passing the data portion of a packet only when the first two characters of the data field correspond to a possible client or server header.
This executable is the second half of the httpdump set of executables. It is designed around DEC threads, called pthreads, from the `C' library pthreads.h [4]. This module reads the FIFO queue and generates the ``common-format'' log file by learning about each HTTP session as it begins and terminates. The module is designed with a default of 66 threads. Although this appears to be a large number of threads, there are only three different types. The first type is the Packet Filter thread, which reads from the FIFO. The second, the Main thread (not to be confused with the main() of every `C' program), manages the data and the other threads. The third type is the Process thread. By default, 64 Process threads are created, along with one Main and one Packet Filter thread. The following sections explain what the main() function of the program initializes and the inter-thread communication process.
Figure 3(left) illustrates the statically allocated memory structures that are created by main(). The threads it creates are:
It statically allocates memory, in the form of array structures, for three structures:
As mentioned above, in an attempt to reduce the quantity of memory copied during program execution, small nodes, which form a single element in a linked list, are moved from one list to another. The nodes simply have their pointers changed, rather than allocating and freeing memory each time the list is lengthened or shortened. A node consists of only a single next pointer and an integer to store the cell number that contains the actual data. When a thread's linked list is not empty it knows that there is something for it to process. When a thread has finished processing the contents of a cell, the node's ``cell pointer'' is moved to the tail of the next list that should process it. If a thread finds that its list is empty, it will go to sleep until the next node arrives. In this manner, the threads never busy wait, saving CPU cycles. The executable runs in just under 5MB of memory.
Figure: The initialization state of memory structures
and inter-thread communication
The width of the hash table was chosen to be 257, the smallest prime number greater than four times the number of threads. The factor of four is rather arbitrary; however, a prime table size is believed to yield far fewer hash table collisions. The depth of the hash table is 25, which allows up to 25 client-port values to hash to the same slot before new HTTP sessions are discarded.
Figure 3(right) illustrates the inter-thread communication between each of the threads. Each thread effectively communicates with the others by simply appending nodes to the list of cells that the recipient thread must process. Semaphore structures are used to ensure correct manipulation for addition to and removal from the list.
McBryde 118, one of two undergraduate computer science labs, was used to test the two tools described herein. We chose this lab for the following reasons:
We collected data using the tcpdump-based and filter-based tools on five weekdays each.
Our experiment is a two-factor design, with 5 replications, as described in [5]. The analysis parameters are described in [Table i].
Table i: Experimental Factors and Collection Schedule
Using the firewall logs, we determined the HIGH load level to be 4 or more http requests within a 5 second window; the LOW level was 1 or 2 requests during the same window. For each 5 second interval in which at least one request was found, we recorded the number of requests that matched, the total number of firewall entries, and the number of collected requests not in the firewall. The factor analysis used the totals of ``matches'' and ``firewall'' at the HIGH and LOW levels for each day.
The machine used to collect the data was a DECstation 5000/125 with a MIPS R3000 processor and 20MB main memory. The traffic collected on 10 weekdays represents 12797 accesses made by 21 client machines to a total of 1377 servers.
We expected that data collected using the threaded tool would be more complete for two reasons: First, the direct manipulation of packet filter files and the use of efficient, threaded programming methods should drop fewer network packets. Second, the tcpdump tool cannot run efficiently without limiting viewed packets to those where one of the connection ports is port 80. Therefore, it would not log any activity to http servers that are connected to other ports.
Visual inspection of log files collected by the firewall and by both tools yielded unexpected results. Whereas we had originally assumed that the firewall would collect all http data, we found that client ``POST''s of form data were not recorded. The server reply, which often contains the posted data within the returned URL, was recorded. We also noted that the firewall log contains accesses using other methods, such as ftp. We still have not determined an efficient way to separate ftp sessions made by a Web browser from those made by some other tool.
Because the firewall records named network addresses while httpdump records IP ``dot'' addresses, the two logs could not be compared directly. Some problems also arose in determining a URL match: large server sites rotate the same ``name'' among several IP addresses [6], while small server sites alias several ``names'' to the same IP address. In these cases, we used the relative URL and the time of collection to determine a match.
By viewing time-based traces [see Figures 4 and 5], we can see that the httpdump tool misses many references, even when only one reference is made within the 5 second window. The tcpdump tool has far fewer vertical lines, which indicate missed references; points with no vertical lines indicate times when the tool collected all references in the firewall log.
Error plots show a trend when the Tool is viewed as a factor, but not when the workload is viewed [Figures 6 and 7]. This indicates that the network load does not significantly affect the performance of either tool. A further analysis shows that the Tool accounts for 78.57% of the variation in efficiency, whereas the Load accounts for only 0.07%. The remainder of the variation is explained by the interaction between Tool and Load (1.65%) and by error (19.71%). Using a 95% confidence interval, only the Tool used shows a significant difference. We can, therefore, conclude that the developed tool httpdump, in its current form, is a poor performer compared with the previously developed tcpdump filter.
Figure 4: Reference errors for Tool 1, using tcpdump
and PERL scripts
Figure 5: Reference errors for Tool 2, using httpdump
Figure 6: Error plot for the Traffic load
Figure 7: Error plot for the Tool
Figure 8: Response-Error plot
Figure 9: Residual has a ``non-normal'' distribution
Although httpdump was designed to run efficiently, it did not perform as well as expected. Even though the tcpdump filter was necessarily less inclusive, it outperformed the developed code. One possible cause for this problem is the small amount of memory on the machine doing the collecting. We hypothesize that, during times of low activity, the httpc process was swapped out. When a burst of activity arrived, the FIFO was filled (and overflowed), before the httpc process had time to page in.
While the current version of httpdump is a poor performer, we feel that it, ultimately, has more potential to collect a complete view of the http traffic on a local network. We will therefore attempt to improve its performance in the following ways:
We also intend to improve the usefulness of httpdump by extending it to dynamically switch protocols for services that use the WWW as a connection agent. This capability will become more valuable as the use of so-called ``plug-ins'' becomes more prevalent.
set up packet-filter options (promiscuous, pass-packets, make timestamp)
determine the physical network (ethernet or fddi)
open FIFO for writing
while (true) {
    get a TCP packet
    set pointer to end of physical network header
    get IP datagram length
    if (length impossible for ethernet)      # use length as error filter
        discard packet                       # the condition uses bitwise operations only
    read source IP address
    read destination IP address
    if (IP options exist)
        skip over options
    read source port
    read destination port
    read TCP flags
    enter appropriate flags in httpdump flags
    if (data_length > 40)                    # min TCP + IP header length
        set an httpdump DATA flag
    create timestamp struct for http
    set pointer to beginning of ``data'' portion of packet
    if (data[0] is between 0x40 and 0xA0)    # includes all capital letters
        if (data[0] is one of {G,P,H} and data[1] is one of {E,O,E})
            add the entire data portion of the packet to the http structure
    write http structure to fifo
}
exit
set the thread's priority
while (not time to finish) {
    semaphore.wait the unassigned counter
    if (packet == client header) {
        semaphore.wait (with timeout) the free thread counter
        if (not timeout) {
            create a new thread-client-server session
            move the node related to the cell to the thread's process list
            semaphore.signal thread's process counter
        } else {
            discard packet
            move node related to cell to the free cell list
            semaphore.signal free cell counter
        }
    } else if (packet == known HTTP session) {
        move node related to the cell to the correct thread's process list
        semaphore.signal thread's process counter
    } else {
        discard packet
        move node related to the cell to the free cell list
        semaphore.signal free cell counter
    }
}
set the thread's priority
open the FIFO for reading
while (not time to finish) {
    obtain a free cell to add the next packet to
    read the FIFO until some data arrives, store in the cell
    move the node related to this cell to the unassigned cell list
    semaphore.signal the unassigned counter
}
set the thread's priority
while (not time to finish) {
    if (thread currently in use) {
        semaphore.wait (with timeout) the process list counter
        if (not timeout) {
            if (packet == final packet of connection) {
                write all collected information to the CERN.log file
                move the node related to the cell to the free cell list
                semaphore.signal free cell counter
                add thread to the free thread list
                semaphore.signal free thread counter
            } else {    /* not the final packet of the connection */
                read required data from packet
                move the node related to the cell to the free cell list
                semaphore.signal free cell counter
            }
        } else {        /* timed out */
            write all known information up to the timeout to the CERN.log file
            move the node related to the cell to the free cell list
            semaphore.signal free cell counter
            add thread to the free thread list
            semaphore.signal free thread counter
        }
    }
}
<URL: http://www.w3.org/pub/WWW/Daemon/User/Config/Logging.html>.