World Wide Web (WWW) Traffic Analysis Tools


Computer Science Department
Virginia Polytechnic Institute and State University
Blacksburg, VA 24061-0106

To analyze WWW traffic at Virginia Tech, we have developed several tools.

Tools currently available:

Cern2csl [PERL] IS NOW AVAILABLE
This code takes a CERN Common Log Format file and generates a .csl output file for use in CHITRA.
Here is the help for cern2csl:
USAGE: cern2csl [-ohcfpsntra -l "reg-expr"] <log.12Mon95> [<log2.11Mon95>]

cern2csl converts CERN-httpd server log files to a CHITRA format .csl file
         -o  will create a .csl that is compatible with CHITRA-93
         -h  prints this message
         -u  attempt to make time stamps unique 
         -m  MAP client and user to unique integer and output 
             a NEW CERN-http server log file with integers in 
             place of actual data to provide user anonymity.
Component switches: (at least one required for output)
         -a  ALL of the below switches turned ON

         -c  output the client process running the browser
         -f  output the source machine serving the file
         -p  the protocol used. Of 'http', 'ftp', 'gopher', 'wais'
         -s  the size of the URL in bytes
         -n  the name of the file requested
         -t  the type of the file requested. 'html','mpeg','txt'...etc.
         -r  whether or not the file made a cache hit. 2 = hit, 1 = no hit
         -a  ALL: all of the above component switches will be set to on
         -l  "vt\.edu" considers a source machine containing the string to
                      be local.  2 = local, 1 = NOT local

 Formal arguments are CERN log files. You may include as many as you like
 and they will be merged and sorted into a single .csl file 
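The component switches above can be illustrated with a short Python sketch. It is not the cern2csl source (which is Perl) and it does not emit CHITRA's .csl format, whose layout is not described on this page; it only shows, under stated assumptions, how one CERN Common Log Format line might be split into the fields the switches name. The function name `components`, the dictionary keys, reading -s as the response byte count, and defaulting relative URLs to 'http' are all assumptions. The cache-hit flag (-r) is omitted because it cannot be derived from a plain CLF line.

```python
import re
from urllib.parse import urlsplit

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_RE = re.compile(
    r'^(?P<client>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)$'
)


def components(line, local_re=r"vt\.edu"):
    """Split one CLF line into hypothetical per-request components.

    Returns None if the line does not look like Common Log Format.
    """
    m = CLF_RE.match(line.strip())
    if m is None:
        return None
    parts = m.group("request").split()          # e.g. GET <url> HTTP/1.0
    url = parts[1] if len(parts) > 1 else ""
    split = urlsplit(url)
    path = split.path or "/"
    name = path.rsplit("/", 1)[-1]
    server = split.hostname or ""               # proxy logs carry absolute URLs
    return {
        "client": m.group("client"),            # -c
        "server": server,                       # -f
        # Assumption: relative URLs are treated as 'http'.
        "protocol": split.scheme or "http",     # -p
        # Assumption: -s is read as the byte count; "-" means unknown -> 0.
        "size": 0 if m.group("bytes") == "-" else int(m.group("bytes")),
        "name": path,                           # -n
        "type": name.rsplit(".", 1)[-1].lower() if "." in name else "",  # -t
        # 2 = local, 1 = NOT local, mirroring the -l encoding in the help text.
        "local": 2 if re.search(local_re, server) else 1,
    }
```

Sorting the resulting records by their date field would be one way to merge several logs into a single stream, as the help text describes.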

httpdump [PERL]
Two pieces of code take output from tcpdump, filter it, and generate a Common Log Format file. The output is anonymized to protect individual users.
Educational institutions that wish to receive this software now should send mail to williams@csgrad.cs.vt.edu.
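The final step of such a pipeline, writing a reconstructed request as a Common Log Format line, can be sketched as follows. This assumes the tcpdump output has already been parsed into client, time, request and response fields elsewhere, and that the client has already been replaced by an anonymous integer. The helper name `clf_line` and the default "HTTP/1.0" protocol string are hypothetical; this is not the httpdump code.

```python
from datetime import datetime, timedelta, timezone

# English month abbreviations, used instead of %b so the output does not
# depend on the current locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def clf_line(client, when, method, url, status, nbytes,
             ident="-", user="-", protocol="HTTP/1.0"):
    """Format one request as a Common Log Format line (hypothetical helper).

    `client` is expected to be anonymized already (e.g. an integer string);
    `when` must be a timezone-aware datetime so that %z yields the offset.
    """
    stamp = f"{when.day:02d}/{MONTHS[when.month - 1]}/{when.year}:{when:%H:%M:%S %z}"
    size = "-" if nbytes is None else str(nbytes)  # CLF writes "-" for unknown size
    return f'{client} {ident} {user} [{stamp}] "{method} {url} {protocol}" {status} {size}'


if __name__ == "__main__":
    est = timezone(timedelta(hours=-5))
    print(clf_line("17", datetime(1995, 3, 12, 10, 0, 0, tzinfo=est),
                   "GET", "/index.html", 200, 2048))
```

Run as a script, this prints `17 - - [12/Mar/1995:10:00:00 -0500] "GET /index.html HTTP/1.0" 200 2048`.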

CLFmunge [PERL] (V1.2) IS AVAILABLE
This code takes a Common Log Format file (as defined by NCSA and CERN) and generates a new file in the same format with the client IP and user fields replaced by unique integers. This allows traffic analysis without compromising the privacy of individual users. In Europe, such anonymization is also a prerequisite for making log files available for others to use.

Additions for V1.1 (New as of 27-Jun-96)
  • Map integers are now assigned separately to each "part" of an IP address or URL.
  • You can now map Client IP, Server IP and the relative URL.
  • Map integers are consistent between Client IP and Server IP "parts".
  • You can save the integer mapping in order to maintain map consistency between log files.
  • You can define Client and Server IP substrings that you consider "LOCAL".
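The V1.1 features can be illustrated with a per-part mapper. This sketch assumes that "parts" are the dot-separated components of a host name or IP address, and that a matching LOCAL domain is replaced by the word 'local' while the remaining parts are mapped; both readings are assumptions, and the class name `PartMapper` is hypothetical. One table is shared by client and server hosts, so a given part maps to the same integer in either field.

```python
class PartMapper:
    """Map each dot-separated part of a host to an integer (hypothetical sketch).

    A single table serves both client and server hosts, keeping the integers
    consistent between the two fields.
    """

    def __init__(self, local=()):
        self.table = {}
        self.local = tuple(local)  # domains treated as LOCAL, e.g. "vt.edu"

    def _part(self, part):
        if part not in self.table:
            self.table[part] = len(self.table) + 1
        return str(self.table[part])

    def map_host(self, host):
        # Assumption: a matching LOCAL domain suffix becomes the key phrase
        # 'local', and only the remaining leading parts are mapped.
        for dom in self.local:
            if host == dom or host.endswith("." + dom):
                head = host[: -len(dom)].rstrip(".")
                mapped = [self._part(p) for p in head.split(".") if p]
                return ".".join(mapped + ["local"])
        return ".".join(self._part(p) for p in host.split("."))
```

With `local=["vt.edu"]`, a fresh mapper turns "128.173.40.20" into "1.2.3.4", and "host1.cs.vt.edu" keeps its shape as two integers followed by "local", so the number of parts survives anonymization.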
Additions for V1.2 (New as of 15-Oct-97)
  • Selecting a "-level" of "URL" now also maps any query string to a unique integer.
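For V1.2, mapping the query string can be sketched as splitting the relative URL at the first "?", giving each "/"-separated path segment an integer, and giving the whole query string a separate integer. The per-segment treatment of the path and the class name `URLMapper` are assumptions, not a description of CLFmunge.pl.

```python
class URLMapper:
    """Map path segments and query strings to integers (hypothetical sketch)."""

    def __init__(self):
        self.parts = {}    # path segment -> integer
        self.queries = {}  # full query string -> integer

    @staticmethod
    def _id(table, key):
        if key not in table:
            table[key] = len(table) + 1
        return str(table[key])

    def map_url(self, url):
        path, sep, query = url.partition("?")
        # Empty segments (leading or trailing "/") are kept so slashes survive.
        segs = [self._id(self.parts, s) if s else "" for s in path.split("/")]
        mapped = "/".join(segs)
        if sep:  # only append a query id when the URL actually had a "?"
            mapped += "?" + self._id(self.queries, query)
        return mapped
```

On a fresh mapper, "/cgi-bin/search?q=cats" and "/cgi-bin/search?q=dogs" become "/1/2?1" and "/1/2?2", so distinct searches remain distinguishable without revealing what was searched for.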

Here is the help for CLFmunge.pl:
USAGE: CLFmunge.pl [-h -use <dir> -local "list" -level "list"]


CLFmunge.pl maps sensitive client info from a  Common Log Format server or
proxy log to a unique unrevertable integer.

    -h  prints this message
    -use <directory>  gets and saves all map information for commonality
                     between multiple log files.
    -level "comma-sep list" DEFAULT 'CIP' which fields to map
           Choices:  'CIP'  Client IP address
                     'SIP'  Server IP address
                     'URL'  Relative path and filename in URL
           NOTE:   UID is ALWAYS mapped if it exists.
    -local "comma-sep list" domain names for clients and servers in this
            will be mapped to the key phrase 'local'.
           NOTE: DO NOT start substitution strings with "."

    Formal argument is a Common Log Format file.
    The output file is .mng 
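The -use option keeps map information between runs so that the same value receives the same integer in every log file. A sketch of that idea follows, assuming the tables are stored as JSON in a file named maps.json inside the given directory; the file name, the JSON format, and the helpers `load_maps`, `save_maps` and `map_value` are hypothetical, since the help text does not describe CLFmunge's on-disk format.

```python
import json
import os

# Hypothetical file name; CLFmunge's own storage format is not documented here.
MAP_FILE = "maps.json"


def load_maps(directory):
    """Return saved tables as {field: {value: integer}}, or {} on first use."""
    path = os.path.join(directory, MAP_FILE)
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def save_maps(directory, maps):
    """Write all tables back so the next log file reuses the same integers."""
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, MAP_FILE), "w", encoding="utf-8") as fh:
        json.dump(maps, fh, sort_keys=True)


def map_value(maps, field, value):
    """Map `value` within one field (e.g. 'CIP', 'SIP', 'URL') to an integer."""
    table = maps.setdefault(field, {})
    if value not in table:
        table[value] = len(table) + 1
    return table[value]
```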
  




The development of these tools was supported, in part, by National Science Foundation grant CCR-9211342 and by the National Science Foundation SUCCEED Coalition Cooperative Agreement No. EID-9109853. SUCCEED is a coalition of eight schools and colleges working to enhance engineering education for the twenty-first century.


Authors of WWW tools in Chitra:

  • Ghaleb Abdulla
  • Stephen Williams

Please send inquiries and comments to succeed-people@vtopus.cs.vt.edu.


    Last Modified: 27-June-1996