"A Web script is a program that can be executed by the Web
server in response to Web requests." [Yeager and McGrath]
Web scripts
can be written in any language (e.g., C, Perl, or a UNIX shell),
take their input from the Web (e.g., from a form) and write as output
an HTTP response that includes a document (e.g., HTML or plain text), and
may call other programs or scripts on the Web server or even on another
machine.
How Web Servers Execute Scripts
A Web server must
distinguish scripts from static documents based on URL,
locate code to execute (e.g., the script) on its file system,
check whether the script has execute permission,
start the script and pass form data (i.e., the fields after "?"
in a GET or the body of a POST) to the script,
route output from the script back to the Web browser,
send an error message to the browser if the script cannot be completed, and
then close the network connection.
Distinguishing Scripts from Static Documents
Each Web server specifies which directories or files can contain scripts.
Hence scripts can be identified from the URL alone.
NCSA httpd usually uses a cgi-bin directory for scripts. Our
earlier example used http://ei.cs.vt.edu/cgi-bin/wwwbtb/SampleForm.pl.
NCSA httpd can also treat all *.cgi files as scripts.
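The URL test described above can be sketched in a few lines of Python. This is only an illustration, not NCSA httpd's actual logic: the function name is_script_url and the two constants are hypothetical, and a real server reads these rules from its configuration.

```python
from urllib.parse import urlsplit

# Assumed configuration for this sketch: one script directory and one script suffix.
SCRIPT_DIR_PREFIX = "/cgi-bin/"
SCRIPT_SUFFIX = ".cgi"


def is_script_url(url):
    """Return True if the URL's path names a script rather than a static document."""
    path = urlsplit(url).path  # drops the scheme, host, and any "?query" part
    return path.startswith(SCRIPT_DIR_PREFIX) or path.endswith(SCRIPT_SUFFIX)


in_cgi_bin = is_script_url("http://ei.cs.vt.edu/cgi-bin/wwwbtb/SampleForm.pl")
by_suffix = is_script_url("/tools/search.cgi?q=web")     # hypothetical path
static_doc = is_script_url("/docs/index.html")           # hypothetical path
```

Only the path is examined, so a query string after "?" does not affect the decision.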
Common Gateway Interface (CGI)
CGI specifies rules for passing data between the Web server (httpd)
and a script.
CGI rules are platform dependent:
Platform      Script runs as    Input is                           Output is
UNIX          process           stdin and environment variables    standard out
Macintosh     Mac script        Apple events                       Apple events
Windows NT    application       temp file                          temp file
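For the UNIX row, the server side of the exchange might look roughly like the Python sketch below. It starts the script as a child process, passes GET data in the QUERY_STRING environment variable and POST data on stdin, and collects standard output. The function name run_cgi and the stand-in ECHO_SCRIPT are assumptions made for illustration. A real server sets many more CGI variables than the three shown.

```python
import os
import subprocess
import sys


def run_cgi(command, method, query_string="", body=b""):
    """Run a script as a child process, CGI style, and return (exit status, raw output)."""
    env = dict(os.environ)
    env["REQUEST_METHOD"] = method          # "GET" or "POST"
    env["QUERY_STRING"] = query_string      # GET data: the fields after "?"
    env["CONTENT_LENGTH"] = str(len(body))  # size of the POST body sent on stdin
    result = subprocess.run(command, input=body, env=env,
                            stdout=subprocess.PIPE, check=False)
    return result.returncode, result.stdout


# Hypothetical stand-in for a real script: echoes its query string as plain text.
ECHO_SCRIPT = [
    sys.executable, "-c",
    "import os, sys; "
    "sys.stdout.write('Content-type: text/plain\\n\\n' + os.environ['QUERY_STRING'])",
]

status, output = run_cgi(ECHO_SCRIPT, "GET", query_string="name=Ann")
```

The bytes in output are what the server would route back to the browser, and a nonzero status is one signal that an error message should be sent instead.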
How Scripts Work
Code for Script
The Overview document illustrated a form
that executed a script to print the users logged onto ei.cs.vt.edu. The
notes below refer to that script.
Some things to note:
Script is in Perl, a popular Web scripting language. However, it could
be written in another language.
Script requires no input.
Script must write the HTTP header itself!
The minimal HTTP header is a single Content-type header field.
Note that two newlines follow the first print statement, because a blank
line terminates the HTTP headers.
Script writes an HTML file after the HTTP header.
Script invokes another program to do real work: UNIX finger.
Exit status of zero is returned.
All writes to stderr go to the Web server, not to the browser window,
so the user will never see an error message from the script!
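The points above can be seen together in a small sketch. The course script is written in Perl. The version below is a Python analogue written for illustration, not a translation of it, and the names build_response and main are hypothetical. It assumes the finger program is installed and writes a diagnostic to stderr if it is not.

```python
import html
import subprocess
import sys


def build_response(listing):
    """Return the header, the blank line that ends it, and an HTML document."""
    header = "Content-type: text/html\n\n"  # second newline is the blank line
    body = ("<html><head><title>Users</title></head><body><pre>"
            + html.escape(listing)
            + "</pre></body></html>\n")
    return header + body


def main():
    """Invoke finger to do the real work, then write the response to stdout."""
    try:
        listing = subprocess.run(["finger"], capture_output=True,
                                 text=True, check=False).stdout
    except OSError as exc:
        # Goes to the Web server, not to the browser window.
        print(f"finger could not be run: {exc}", file=sys.stderr)
        listing = ""
    sys.stdout.write(build_response(listing))
    return 0  # exit status of zero


sample = build_response("alice <tty1>")  # made-up listing, for illustration only
```

An actual script file would end by calling sys.exit(main()) so that the zero exit status reaches the server.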
How Scripts are Executed
See diagrams on pp. 65-69 of Yeager & McGrath.
Cost of Using Scripts
Scripts can really slow down a Web server, compared to serving static
documents!
Some costs of script execution are:
Multiple processes are spawned: httpd, perl, and finger in the example
script.
Some scripts don't write their own protocol header lines, so httpd must
parse the script output to look for headers and supply any that are
missing. (This parsing can be deactivated for "no parse header" scripts
that have been carefully checked for HTTP compliance.)
Note that with HTTP/1.1, scripts that write their own protocol headers
will be more difficult to write.
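To make the header-parsing cost concrete, here is a rough Python sketch of the kind of work a server does when it looks for headers in script output and supplies a missing one. It is not httpd's actual algorithm: the function supply_missing_headers, the HEADER_LINE pattern, and the text/html default are all assumptions, and a real server would also add a status line and other protocol headers.

```python
import re

# A header line is assumed to look like "Name: value".
HEADER_LINE = re.compile(r"[A-Za-z0-9-]+:")


def supply_missing_headers(script_output, default_type="text/html"):
    """Split output at the first blank line and add Content-type if it is absent."""
    head, sep, body = script_output.partition("\n\n")
    lines = head.split("\n")
    if not sep or not all(HEADER_LINE.match(line) for line in lines):
        # No recognizable header block: treat the whole output as the document.
        lines, body = [], script_output
    names = {line.split(":", 1)[0].lower() for line in lines}
    if "content-type" not in names:
        lines.append("Content-type: " + default_type)
    return "\n".join(lines) + "\n\n" + body


complete = supply_missing_headers("Content-type: text/plain\n\nhello")
repaired = supply_missing_headers("<html>hi</html>\n")
```

Every response from an ordinary script passes through a scan like this, which is work a static document never requires.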
Script Input
Scripts must do the following to retrieve the input form data (called
the query string):
Determine if script was called via GET or POST.
Retrieve query string.
Parse query string.
Convert pluses and %XX hexadecimal character codes in the query string
back to normal characters.
On UNIX, detect "tainted" strings, i.e., those containing shell
control characters such as the semicolon.
Fortunately, libraries exist to do these things. In Perl, use the get_query
subroutine from cgi-utils.pl (see pp. 375-378 in the Stein book).
Also use Perl's associative arrays to easily access query string values.
See the example in Stein, p. 378.
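The five steps listed above can be sketched in Python, with a dict playing the role of Perl's associative array. This is an illustrative analogue, not the get_query routine from cgi-utils.pl: the function names, the SHELL_CONTROL_CHARS set, and the sample field values are assumptions. The standard-library call parse_qs performs the splitting and the decoding of pluses and %XX codes.

```python
import io
from urllib.parse import parse_qs

# Assumed list of shell control characters for this sketch (not exhaustive).
SHELL_CONTROL_CHARS = set(";&|`$<>(){}!*?\\\"'\n")


def get_query(environ, stdin):
    """Return the form fields as a dict, whether sent by GET or by POST."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the query string is the request body, read from stdin.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("ascii")
    else:
        # GET: the query string is the part of the URL after "?".
        raw = environ.get("QUERY_STRING", "")
    # parse_qs splits on "&" and "=", turns "+" into spaces, and decodes %XX codes.
    parsed = parse_qs(raw, keep_blank_values=True)
    return {name: values[-1] for name, values in parsed.items()}


def is_tainted(value):
    """Return True if the value contains a shell control character."""
    return any(ch in SHELL_CONTROL_CHARS for ch in value)


form = get_query({"REQUEST_METHOD": "GET",
                  "QUERY_STRING": "name=Ann+Lee&cmd=ls%3Brm"},
                 io.BytesIO(b""))
```

In a real script the arguments would be os.environ and sys.stdin.buffer. A tainted value such as form["cmd"] should never be passed to a shell.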