HTTPDUMP

Network HTTP Packet Snooper

Roland Wooster, Stephen Williams, Patrick Brooks

April 25, 1996

Abstract:

The World Wide Web is now the dominant source of traffic on the Internet backbone, and may soon become the dominant source of traffic on local area networks as well. To better understand the characteristics of that traffic it is necessary to isolate HTTP packets from those of other network applications. We have constructed a tool that performs this function. The developed tool is experimentally compared with an existing tool set to analyze efficiency and performance.


Introduction

We have developed a tool, based on the packetfilter mechanism, to collect HTTP packets from a network. This new tool is experimentally compared with an existing tcpdump-based tool set (described in §ii-A). Both tools produce a variant of the common log format [1], which is used to measure and compare the performance differences between them.

Utility of httpdump

Webservers collect logs of traffic to that server, and proxy servers may be used to collect logs of usage by groups of http clients. However, there are several areas in which an independent, in-network tool can be used to collect information in ways that servers and proxies cannot. Because httpdump can run transparently on a broadcast local area network, it can be used in a variety of situations where it is cumbersome or impossible to collect logs otherwise. httpdump can be used to monitor client activity, collect logs for a group of servers as a ``third-party'', and collect logs at a network router facility.

 

Tool Architectures

To maximize the usefulness of the tool, the output is in a form that other WWW analysis tools can take advantage of: the common format log file [1]. Both the tcpdump-based tool and httpdump generate this form of output and also create an extended version. This preserves compatibility with existing tools while allowing new tools to take advantage of the extra information in the extended format.
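As an illustration, a single common-format record can be emitted with one snprintf. The field layout below follows the common log format [1]; the struct and function names are hypothetical, not httpdump's own:

```c
#include <stdio.h>
#include <string.h>

/* Illustrative sketch: emit one common-log-format record for a finished
 * HTTP session.  Field order follows the common log format [1]:
 *   remotehost rfc931 authuser [date] "request" status bytes
 * These names are hypothetical, not httpdump's actual identifiers. */
struct http_session {
    const char *client;   /* dotted-quad client address */
    const char *date;     /* e.g. "25/Apr/1996:14:14:59 -0400" */
    const char *request;  /* e.g. "GET /index.html HTTP/1.0" */
    int status;           /* server reply code */
    long bytes;           /* bytes in the reply body */
};

int format_clf(char *buf, size_t len, const struct http_session *s)
{
    /* rfc931 and authuser are unavailable to a passive snooper,
     * so both fields are logged as "-". */
    return snprintf(buf, len, "%s - - [%s] \"%s\" %d %ld",
                    s->client, s->date, s->request, s->status, s->bytes);
}
```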

 

Tool one: Tcpdump with PERL script translation

The first tool, developed prior to this project, uses the output from tcpdump to create a log file. Tcpdump is invoked so that it passes up only packets with one endpoint on port 80 (the standard http daemon port). Then a series of three PERL scripts:

  1. converts the output to ASCII,
  2. filters out the packets with http headers, and
  3. generates the output record in the log file.
The output from one program is piped into the next [see Figure 1], such that all run as separate concurrent processes on the same machine.

Figure 1:  Tool one, using tcpdump and PERL scripts

 

Tool two: Packetfilter with `C' program

Packet filters are a highly optimized set of procedures for doing work on network packets [2]. They are many times faster than previous filtering routines and are the basis for tcpdump. httpdump increases the performance of the translation by implementing all of the filtering and conversion steps in `C' code. Once httpdump has captured a TCP packet, it places the packet data in a large array until it can be processed. The main thread then picks up the information and determines whether the packet begins a new HTTP session, belongs to a known session, or should be discarded.

TCP packets for each active http session are handled by a separate Process thread [see Figure 2]. When the Process thread receives a ``FIN'' packet, it records the connect time, writes out a log file entry, and frees itself for the next http session.
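The holding array described above can be sketched as a fixed pool of cells handed off by index, so no per-packet allocation is needed. Sizes and names here are illustrative assumptions, not httpdump's actual structures:

```c
#include <string.h>

/* Sketch of the capture buffer: each captured packet is copied into a
 * fixed array of cells and handed to the main thread by cell index.
 * NCELLS and CELLSIZE are illustrative values. */
#define NCELLS   512
#define CELLSIZE 1600           /* enough for one Ethernet frame */

static unsigned char cells[NCELLS][CELLSIZE];
static int cell_len[NCELLS];
static int head = 0, tail = 0;  /* circular queue of cell indices */

/* store a captured packet; returns its cell index, or -1 if full */
int cell_put(const unsigned char *pkt, int len)
{
    int next = (head + 1) % NCELLS;
    if (next == tail || len > CELLSIZE)
        return -1;              /* overflow: packet is dropped */
    memcpy(cells[head], pkt, len);
    cell_len[head] = len;
    int idx = head;
    head = next;
    return idx;
}

/* main thread picks up the next unprocessed cell; returns -1 if empty */
int cell_get(void)
{
    if (tail == head)
        return -1;              /* nothing waiting */
    int idx = tail;
    tail = (tail + 1) % NCELLS;
    return idx;
}
```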

Figure 2:   Simple overview of tool two, the multi-threaded program's architecture.

System Design

Our initial design for httpdump was a single multi-threaded executable. Unfortunately, we found that the `C' libraries pthread.h and pfilt.h conflicted with each other, leaving no alternative but to design the program as two intercommunicating executables.

Interprocess communication between the two executables

A First In First Out (FIFO) queue [3] is used between the two executables. The writer to the FIFO is the tcppf executable, which ``snoops'' the network for packets and quickly filters out those which cannot possibly be part of an HTTP connection. The reader from the FIFO is the httpc executable, the multi-threaded program which monitors HTTP sessions.
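The FIFO hand-off can be sketched as below, using the standard named-pipe calls described in [3]. The record layout and names are illustrative assumptions, not httpdump's actual wire format:

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Sketch of the tcppf -> httpc hand-off over a named FIFO [3].
 * tcppf opens the FIFO write-only and pushes fixed-size records;
 * httpc opens it read-only and blocks until data arrives.
 * This record layout is a stand-in for illustration only. */
struct pkt_record {
    unsigned int src_ip, dst_ip;
    unsigned short src_port, dst_port;
    unsigned char flags;
};

int fifo_create(const char *path)
{
    unlink(path);                       /* remove any stale FIFO */
    return mkfifo(path, 0600);
}

int fifo_send(int fd, const struct pkt_record *r)
{
    /* records are smaller than PIPE_BUF, so each write is atomic */
    return write(fd, r, sizeof *r) == (ssize_t)sizeof *r ? 0 : -1;
}

int fifo_recv(int fd, struct pkt_record *r)
{
    return read(fd, r, sizeof *r) == (ssize_t)sizeof *r ? 0 : -1;
}
```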

tcppf

In order to minimize the delay in collecting network packets, we chose a packet-filter [2] based approach. The packet-filter itself passes all TCP packets up to a highly optimized `C' program within the same thread. The `C' program then discards all packets determined to be irrelevant and passes packet headers and some data portions through a FIFO to the httpdump main process.

In order to minimize execution time, the main program was hand-optimized: all divisions and multiplications were removed, and additions and conditional branches were minimized. FIFO traffic is also reduced by passing the data portion of a packet only when its first two characters correspond to a possible client or server header.
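The two-character test, taken from the tcppf pseudo-code in the appendix, might be written as follows (the function name is illustrative):

```c
/* Sketch of the two-character pre-filter from the tcppf pseudo-code:
 * a packet's data portion is forwarded only when its first two bytes
 * could begin a request method ("GE"T, "PO"ST, "HE"AD).
 * Only comparisons are used -- no multiplies or divides. */
int could_be_http_header(const unsigned char *data, int len)
{
    if (len < 2)
        return 0;
    /* cheap range test from the pseudo-code: rejects most non-letters */
    if (data[0] < 0x40 || data[0] >= 0xA0)
        return 0;
    return (data[0] == 'G' && data[1] == 'E')
        || (data[0] == 'P' && data[1] == 'O')
        || (data[0] == 'H' && data[1] == 'E');
}
```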

httpc

This executable is the second half of the httpdump set of executables. It is designed around DEC threads, called pthreads, from the `C' library pthread.h [4]. This module reads the FIFO queue and generates the ``common-format'' log file by learning about each HTTP session as it begins and terminates. The module has been designed with a default of 66 threads. Although this appears to be a large number of threads, there are only three distinct types. The first type is the Packet Filter thread, which reads from the FIFO. The second, the Main thread (not to be confused with main() of each and every `C' program), manages the data and the other threads. The third type is the Process thread. By default, 64 Process threads are created, along with one Main and one Packet Filter thread. The following sections explain what the main() function of the program initializes and the inter-thread communication process.

Initialization structure

Figure 3(left) illustrates the statically allocated memory structures created by main(). The threads it creates are the single Packet Filter thread, the single Main thread, and the 64 Process threads described above.

It statically allocates memory, in the form of array structures, for the packet cells, the linked-list nodes, and the session hash table described below.

As mentioned above, in an attempt to reduce the quantity of memory copied during program execution, small nodes, which form a single element in a linked list, are moved from one list to another. The nodes simply have their pointers changed, rather than allocating and freeing memory each time the list is lengthened or shortened. A node consists of only a single next pointer and an integer to store the cell number that contains the actual data. When a thread's linked list is not empty it knows that there is something for it to process. When a thread has finished processing the contents of a cell, the node's ``cell pointer'' is moved to the tail of the next list that should process it. If a thread finds that its list is empty, it will go to sleep until the next node arrives. In this manner, the threads never busy wait, saving CPU cycles. The executable runs in just under 5MB of memory.
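The node-recycling scheme above can be sketched as follows; the structure and function names are illustrative, but the idea matches the text: a node carries only a next pointer and a cell number, so moving work between lists is pointer manipulation rather than allocation or copying:

```c
#include <stddef.h>

/* Sketch of the node lists: a node holds only a next pointer and the
 * index of the cell it refers to, so moving work between lists is two
 * pointer assignments, never an allocation or a data copy. */
struct node {
    struct node *next;
    int cell;                 /* index of the cell holding packet data */
};

struct list {
    struct node *head, *tail;
};

void list_append(struct list *l, struct node *n)
{
    n->next = NULL;
    if (l->tail)
        l->tail->next = n;
    else
        l->head = n;
    l->tail = n;
}

struct node *list_pop(struct list *l)
{
    struct node *n = l->head;
    if (n) {
        l->head = n->next;
        if (!l->head)
            l->tail = NULL;
    }
    return n;
}

/* returning a finished cell to another list is just pop + append */
void list_move_head(struct list *from, struct list *to)
{
    struct node *n = list_pop(from);
    if (n)
        list_append(to, n);
}
```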


Figure 3:  The initialization state of memory structures and inter-thread communication

The size of the hash table was chosen to be 257, the smallest prime number greater than four times the number of Process threads. The factor of four is rather arbitrary; however, a prime table size is believed to yield far fewer hash table collisions. The depth of the hash table is 25, allowing up to 25 client-port values to hash to the same slot before new HTTP sessions are discarded.
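That table policy can be sketched as below, assuming (as the discussion of client-port values suggests) that a session is keyed by its client port; the identifiers are illustrative:

```c
/* Sketch of the session hash: 257 slots (the smallest prime greater
 * than 4 * 64 Process threads), each at most 25 entries deep.  A
 * session is keyed by its client port, so the slot is simply the
 * port modulo the prime table size. */
#define HASH_SLOTS 257
#define HASH_DEPTH 25

static int table[HASH_SLOTS][HASH_DEPTH];   /* session ids */
static int depth[HASH_SLOTS];               /* entries used per slot */

int hash_slot(unsigned short client_port)
{
    return client_port % HASH_SLOTS;
}

/* returns 0 on success, -1 once 25 sessions collide in one slot,
 * at which point the new HTTP session is discarded */
int session_insert(unsigned short client_port, int session_id)
{
    int slot = hash_slot(client_port);
    if (depth[slot] >= HASH_DEPTH)
        return -1;
    table[slot][depth[slot]++] = session_id;
    return 0;
}
```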

Inter-thread communication

Figure 3(right) illustrates the inter-thread communication between each of the threads. Each thread effectively communicates with the others by simply appending nodes to the list of cells that the recipient thread must process. Semaphore structures are used to ensure correct manipulation for addition to and removal from the list.

Data Collection

McBryde 118, one of two undergraduate computer science labs, was used to test the two tools described herein. We chose this lab for the following reasons:

  1. there is a large population of potential users,
  2. precedence for collection of WWW traffic has been set,
  3. all WWW accesses are required to go through a gateway proxy server, which will provide an accounting of the actual http traffic, and
  4. there is an abundance of power outlets and network connections.

We collected data using the tcpdump-based and filter-based tools on five weekdays each.

Experimental Design and Performance Analysis

Our experiment is a 2^2 factorial design with 5 replications, as described in [5]. The analysis parameters are described in [Table i].

Table i:  Experimental Factors and Collection Schedule

Using the firewall logs, we defined the HIGH load level as 4 or more http requests within a 5 second window, and the LOW level as 1 or 2 requests during the same window. For each 5 second interval in which at least one request was found, we recorded the number of requests that matched, the total number of firewall entries, and the number of collected requests not in the firewall. The factor analysis used the per-day totals of ``matches'' and ``firewall'' entries at the HIGH and LOW levels.
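The window classification can be sketched as follows. Note that the definitions above leave a window of exactly 3 requests unassigned to either level, so the sketch reports it as neither; the names are illustrative:

```c
/* Sketch of the load-level classification: requests are binned into
 * 5-second windows; a window with 4 or more requests is HIGH, one
 * with 1 or 2 is LOW.  A window of exactly 3 requests fits neither
 * level as defined, so it is reported as LOAD_OTHER. */
enum load { LOAD_NONE, LOAD_LOW, LOAD_HIGH, LOAD_OTHER };

enum load classify_window(int requests)
{
    if (requests >= 4) return LOAD_HIGH;
    if (requests == 1 || requests == 2) return LOAD_LOW;
    if (requests == 0) return LOAD_NONE;
    return LOAD_OTHER;
}

/* count requests falling in the 5-second window starting at t0 */
int window_count(const long *times, int n, long t0)
{
    int c = 0;
    for (int i = 0; i < n; i++)
        if (times[i] >= t0 && times[i] < t0 + 5)
            c++;
    return c;
}
```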

The machine used to collect the data was a DECstation 5000/125 with a MIPS R3000 processor and 20MB of main memory. The traffic collected on 10 weekdays represents 12797 accesses made by 21 client machines to a total of 1377 servers.

We expected that data collected using the threaded tool would be more complete for two reasons: First, the direct manipulation of packet filter files and the use of efficient, threaded programming methods should drop fewer network packets. Second, the tcpdump tool cannot run efficiently without limiting viewed packets to those where one of the connection ports is port 80. Therefore, it would not log any activity to http servers that are connected to other ports.

Visual inspection of the log files collected by the firewall and by both tools yielded unexpected results. Whereas we had originally assumed that the firewall would collect all http data, we found that client ``POST''s of form data were not recorded; the server reply, which often contains the posted data within the returned URL, was recorded. We also noted that the firewall log contains accesses using other methods, such as ftp. We have not yet determined an efficient way to separate ftp sessions made by a Web browser from those made by some other tool.

Because the firewall records named network addresses while httpdump records IP ``dot'' addresses, some problems also arose in determining a URL match. Large server sites rotate the same ``name'' among several IP addresses [6]; small server sites alias several ``names'' to the same IP address. In these cases, we used the relative URL and the time of collection to determine a match.

Time based traces [see Figures 4 and 5] show that the httpdump tool misses many references, even when only one reference is made within the 5 second window. In these plots, vertical lines indicate missed references, while points with no vertical lines indicate intervals in which the tool collected every reference in the firewall log; the tcpdump tool has far fewer vertical lines.

Error plots show a trend when the Tool is viewed as a factor, but not when the workload is viewed [Figures 6 and 7]. This indicates that the network load does not significantly affect the performance of either tool. A further analysis shows that the Tool accounts for 78.57% of the variation in efficiency, whereas the Load accounts for only 0.07%. The remainder of the variation is explained by the interaction between Tool and Load (1.65%) and error (19.71%). Using a 95% Confidence Interval, only the Tool used shows a significant difference. We can therefore conclude that the developed tool, httpdump, in its current form, is a poor performer compared with the previously developed tcpdump filter.
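For reference, the allocation-of-variation computation of [5] for a 2^2 design with replications can be sketched as below. The data in the test is fabricated for illustration and is not the experiment's data; the function name is ours:

```c
/* Minimal sketch of Jain's allocation of variation [5] for a 2^2
 * factorial design with r replications.  Cell order is (A-,B-),
 * (A+,B-), (A-,B+), (A+,B+); y[cell][rep] holds observations.
 * Returns the fraction of total variation explained by factor A. */
double frac_variation_A(double y[4][5], int r)
{
    static const int sa[4] = { -1, +1, -1, +1 };  /* sign table for A */
    double ybar[4], qa = 0.0, grand = 0.0, sst = 0.0;

    for (int c = 0; c < 4; c++) {
        ybar[c] = 0.0;
        for (int i = 0; i < r; i++)
            ybar[c] += y[c][i];
        ybar[c] /= r;
        grand += ybar[c] / 4.0;          /* grand mean */
        qa += sa[c] * ybar[c] / 4.0;     /* effect of factor A */
    }
    for (int c = 0; c < 4; c++)
        for (int i = 0; i < r; i++) {
            double d = y[c][i] - grand;
            sst += d * d;                /* total sum of squares */
        }
    double ssa = 4.0 * r * qa * qa;      /* variation due to A */
    return sst > 0.0 ? ssa / sst : 0.0;
}
```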

Figure 4:  Reference errors for Tool 1, using tcpdump and PERL scripts

  figure193
Figure 5:  Reference errors for Tool 2, using httpdump


[Tool_err.eps and Load_err.eps omitted]

Fig. 7. Error plot for the Traffic load. Fig. 8. Error plot for the Tool.

[Response-Error.eps and Q-Q.eps omitted]

Fig. 9. Response-Error Plot. Fig. 10. Residual has a ``Non-Normal'' distribution.

Conclusions and Future Work

Although httpdump was designed to run efficiently, it did not perform as well as expected. Even though the tcpdump filter was necessarily less inclusive, it outperformed the developed code. One possible cause for this problem is the small amount of memory on the machine doing the collecting. We hypothesize that, during times of low activity, the httpc process was swapped out. When a burst of activity arrived, the FIFO was filled (and overflowed), before the httpc process had time to page in.

While the current version of httpdump is a poor performer, we feel that it ultimately has more potential to collect a complete view of the http traffic on a local network. We will therefore attempt to improve its performance in the following ways:

We also intend to improve the usefulness of httpdump by extending it to dynamically switch protocols for services that use the WWW as a connection agent. This capability will become more valuable as the use of so-called ``plug-ins'' becomes more prevalent.

Pseudo-code for tcppf

set up packet-filter options (promiscuous, pass-packets, make timestamp)
determine the physical network (ethernet or fddi)
open FIFO for writing
while(true){
   get a TCP packet.
   set pointer to end of physical network header
   get IP datagram length
   if (length impossible for ethernet)      # Use length as error filter
      discard packet                        # The condition uses bitwise operations only
   read source IP address
   read destination IP address
   if IP options exist
      skip over options
   read source port
   read destination port
   read TCP flags
   enter appropriate flags in httpdump flags
   if (data_length > 40)        # min TCP + IP header length
      set an httpdump DATA flag
   create timestamp struct for http
   set pointer to beginning of ``data'' portion of packet
   if (data[0] is between 0x40 and 0xA0) # Includes all capital letters
      if (data[0] is one of {G,P,H} and data[1] is one of {E,O,E})
         add the entire data portion of the packet to the http structure
   write http structure to fifo
}
exit

Pseudo-code for http Main thread

set the thread's priority
while(not time to finish) {
  semaphore.wait the unassigned counter
  if (packet == client header) {
    semaphore.wait (with timeout) the free thread counter
    if (not timeout) {
      create a new thread-client-server session
      move the node related to the cell to the thread's process list
      semaphore.signal thread's process counter
    } else {
      discard packet
      move node related to cell to the free cell list
      semaphore.signal free cell counter
    }
  } else if (packet == known HTTP session) {
    move node related to the cell to the correct thread's process list
    semaphore.signal thread's process counter
  } else {
    discard packet
    move node related to the cell to the free cell list
    semaphore.signal free cell counter
  }
}

Pseudo-code for http Packet thread

set the thread's priority
open the FIFO for reading
while(not time to finish) {
  obtain a free cell to add the next packet to
  read the FIFO until some data arrives, store in the cell
  move the node related to this cell to the unassigned cell list
  semaphore.signal the unassigned counter
}

Pseudo-code for http Process threads

set the thread's priority
while(not time to finish) {
  if (thread currently in use) {
    semaphore.wait (with timeout) the process list counter
    if (not timeout) {
      if (packet == final packet of connection) {
        write all collected information to the CERN.log file
        move the node related to the cell to the free cell list
        semaphore.signal free cell counter
        add thread to the free thread list
        semaphore.signal free thread counter
      } else { /* not the final packet of the connection */
        read required data from packet
        move the node related to the cell to the free cell list
        semaphore.signal free cell counter
      }
    } else {  /* timed out */
      write all known information up to the timeout to the CERN.log file
      move the node related to the cell to the free cell list
      semaphore.signal free cell counter
      add thread to the free thread list
      semaphore.signal free thread counter
    }
  }
}

References

1
W. W. W. Consortium, ``W3c httpd common log format.''

<URL: http://www.w3.org/pub/WWW/Daemon/User/Config/Logging.html>.

2
S. McCanne and V. Jacobson, ``The BSD Packet Filter: A New Architecture for User-level Packet Capture,'' Proceedings: 1993 Winter USENIX Conference, January 1993.

3
W. R. Stevens, UNIX Network programming. Prentice Hall, 1990.

4
DEC, man-page: pthread(3) - Introduction to DCE Threads. ULTRIX V4.4 (Rev. 69).

5
R. Jain, The Art of Computer Systems Performance Analysis. John Wiley and Sons, Inc., 1991.

6
T. T. Kwan, R. E. McGrath, and D. A. Reed, ``User access patterns to NCSA's World-wide Web server,'' Tech. Rep. UIUCDCS-R-95-1934, Dept. of Computer Science, Univ. IL, Feb. 1995.
