Open Computing "Hands-On" Tutorial: October 1994
Making Web Browsers Talk Back
The World Wide Web, accessed through forms-capable browsers like Netscape and
Mosaic, can be used for two-way communication. Here's how to create forms and
scripts for collecting information.
By Patrick M. Ryan
The World Wide Web (WWW) is an excellent tool not only for retrieving
information from remote sites but also for allowing you to interact with sites
in a way similar to transaction processing. Several Web browsers, most notably
Netscape and Mosaic, let you enter information into the browser and have that
information sent back to a server.
The resources most commonly accessed through WWW are documents written using
the Hypertext Markup Language (HTML). With HTML, portions of a document can be
treated as hyperlinks (or references) to other Web resources. These elements,
which are often textual and sometimes graphical, appear as highlighted objects
when viewed in a browser such as Netscape. If you click one of these
highlighted objects or select it from the keyboard, the browser goes out to the
net and retrieves the resource that the hyperlink points to.
HTML documents may be retrieved from machines that are running the Hypertext
Transfer Protocol (HTTP) daemon. This daemon (HTTPD) listens to a certain port
(default 80) for requests for documents within a certain domain on the host
system. Mosaic and HTTPD are both products of the National Center for
Supercomputing Applications (NCSA).
(See the June 1994 Open Computing ``Hands-On'' section tutorial article,
``Riding the Internet Wave,'' for instructions on setting up Mosaic.)
HTML also allows the reader to enter information into the HTML document and
have that information passed back to the server machine's HTTP daemon. These
types of HTML documents are called forms. The method for passing the
information back and processing that information is called the Common Gateway
Interface (CGI). Associated with an HTML form is a CGI program or script. The
CGI specification describes what CGI programs can expect from standard input,
what they should send to standard output, what environment variables they can
use, and what may appear on the command line.
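As a minimal sketch of the output side of that contract, the following Python fragment builds what a CGI program would write to standard output: a header block, a blank line, and then the document. Python is used here only for illustration, and the function name render_response is hypothetical. REQUEST_METHOD is one of the environment variables defined by the CGI specification.

```python
import html
import os
import sys


def render_response(environ):
    """Build the text a CGI program would write to standard output."""
    method = environ.get("REQUEST_METHOD", "(unset)")
    # The header block must be followed by a blank line.
    header = "Content-type: text/html\n\n"
    body = (
        "<TITLE>CGI check</TITLE>\n"
        "<H1>Request received</H1>\n"
        # Escape the value so it cannot inject markup into the page.
        f"<P>Method: {html.escape(method)}</P>\n"
    )
    return header + body


if __name__ == "__main__":
    sys.stdout.write(render_response(os.environ))
```

Keeping the response in a function that takes the environment as an argument makes it easy to try out without a running server.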
Nearly all current browsers support forms, including Netscape Navigator and
versions of Mosaic later than 2.0. The research for this tutorial used Mosaic
for X version 2.4 and httpd 1.1.
There are many things to learn about implementing HTML documents. The most
important for our purposes are how to configure the Web server (the HTTP
daemon program), how to write HTML forms, how the server and CGI program
interact, and what security measures to keep in mind.
Server Configuration
Implementing a form system using HTML requires you to write two files: an HTML
document and a CGI program to process the input from the form. Both Listing 1
and Listing 2 demonstrate a simple product-ordering system used by the prolific
and fictitious Yoyodyne Corp. We assume that their HTTP server resides on
www.yoyodyne.com.
Configuration of the Web server daemon is a straightforward but long process
and is a subject worthy of an article all to itself. The URL (Uniform Resource
Locator) for NCSA's excellent documentation on HTTPD configuration can be found
at the end of this article.
Once you have an operational HTTP daemon on your system, familiarize yourself
with the directory structure of the server. The server has a directive named
ServerRoot that points to the top of the HTTP daemon's directory tree (often
/usr/local/etc/httpd/). The server-root directory has several subdirectories,
including conf/, icons/, logs/, and cgi-bin/. The conf/ directory contains the
server's configuration files. In those files, most directory references are
relative to the value of ServerRoot.
Look at the file conf/srm.conf (the server resource map file). The directive
ScriptAlias indicates where CGI scripts reside. The first argument to
ScriptAlias is an alias name (for the actual path name) that HTML forms must
use to refer to their associated CGI programs. The second argument is the real
path on the system where CGI scripts live. For security reasons, any attempt to
reference a CGI program outside that alias directory will generate an error
from the server. We need to know the actual location of that directory so that
we know where to locate our CGI program.
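The alias lookup can be sketched in a few lines of Python. This is an illustration only, not the server's actual code: the function names are hypothetical, the sample configuration text uses assumed values, and the sketch ignores details such as ".." components in the path.

```python
def parse_script_aliases(conf_text):
    """Return (alias, real_path) pairs from ScriptAlias lines in srm.conf text."""
    aliases = []
    for line in conf_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and other directives.
        if len(fields) == 3 and fields[0].lower() == "scriptalias":
            aliases.append((fields[1], fields[2]))
    return aliases


def resolve_script(url_path, aliases):
    """Map a URL path to a file under an alias directory, or None if outside."""
    for alias, real_dir in aliases:
        if url_path.startswith(alias):
            return real_dir + url_path[len(alias):]
    return None


# Assumed sample values; check your own conf/srm.conf for the real ones.
SAMPLE_CONF = """\
# Hypothetical srm.conf excerpt
ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-bin/
"""

aliases = parse_script_aliases(SAMPLE_CONF)
print(resolve_script("/cgi-bin/process_order", aliases))
print(resolve_script("/products/order.html", aliases))
```

The second call returns None because the path does not begin with a script alias, which corresponds to the case where the server refuses to run the program.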
Form Syntax
Forms are set up in an HTML document using a FORM tag. The syntax is
<FORM ACTION="url" METHOD="method"> ... </FORM>. The ACTION attribute
is a URL that points to the form's CGI program. Usually but not
always, this CGI program resides on the same machine as the HTML document. The
METHOD attribute has a value of ``POST'' or ``GET'' and
indicates how the request will be transmitted to the server. In nearly all
cases, it will be ``POST.'' When using the POST method, the client sends the
query data as an Object-Body, that is, in the body of the request, and the CGI
program reads the data on its standard input.
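The input side of a POST request might be handled along the following lines. This is a sketch under stated assumptions: the function name read_form and the field names are hypothetical, and it relies on two details from the CGI specification rather than from this article, namely that the CONTENT_LENGTH environment variable gives the size of the data and that form data arrives URL-encoded.

```python
import io
from urllib.parse import parse_qs


def read_form(environ, stream):
    """Read a POSTed form body from `stream` and decode it into a dict."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = stream.read(length)
    # Each field maps to a list because a NAME may occur more than once.
    return parse_qs(raw, keep_blank_values=True)


# Stand-ins for the real environment and standard input.
# In a real CGI program, pass os.environ and sys.stdin instead.
demo_body = "product=yoyo&quantity=3&color=red"
demo_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(demo_body))}

print(read_form(demo_env, io.StringIO(demo_body)))
```

Reading exactly CONTENT_LENGTH characters, rather than reading until end of file, matters because the server is not required to close the program's standard input after the data.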
As mentioned before, CGI scripts must reside in the directory pointed to by the
ScriptAlias directive. A typical value for the alias directory name is
/cgi-bin/. The URL that points to a CGI program process_order would be
"/cgi-bin/process_order". Note that this URL does not have any protocol or host
information. In the absence of such information, the browser sends the request
to the host that served the form, so the CGI script is looked for on that same
host.
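The way a URL without protocol or host information gets completed can be checked with Python's standard urljoin function, which fills in the missing parts from the URL of the page that contains the form. The form's location used here is the Yoyodyne example URL and is assumed for illustration.

```python
from urllib.parse import urljoin

# URL of the HTML document that contains the form (assumed for illustration).
form_url = "http://www.yoyodyne.com/products/order.html"
# Value of the form's ACTION attribute.
action = "/cgi-bin/process_order"

resolved = urljoin(form_url, action)
print(resolved)
```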
As with CGI scripts, the locations of all resources served by the Web server
daemon, whether documents or otherwise, are restricted. The
DocumentRoot directive points to the top-level directory beneath which these
resources may be accessed. (The default value for DocumentRoot is
/usr/local/etc/httpd/htdocs/.) For example, if you have a URL that points to
http://www.yoyodyne.com/products/order.html, the HTTP daemon on
www.yoyodyne.com will translate this path into
/usr/local/etc/httpd/htdocs/products/order.html.
However, if the first part of the URL file path has the form ~user/, the server
consults the value of the UserDir server configuration directive. If this
directive has a directory name value, the server will look up that user's
home directory, append the value of UserDir, and look in that directory for the
referenced file. For instance, if UserDir is set to public_html and you have a
reference to http://www.yoyodyne.com/~dave/abstract.html, the server will
translate this path to ~dave/public_html/abstract.html on www.yoyodyne.com. The
administrator can set UserDir to ``DISABLED'' to defeat this feature.
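The two translation rules can be summarized in a short Python sketch. It is an approximation of the behavior described above, not the daemon's real logic: the function name translate is hypothetical, and the HOME_DIRS table with its /home/dave entry is an assumed stand-in for looking up home directories in the password file.

```python
import posixpath

DOCUMENT_ROOT = "/usr/local/etc/httpd/htdocs"
USER_DIR = "public_html"            # or "DISABLED"
HOME_DIRS = {"dave": "/home/dave"}  # hypothetical stand-in for the password file


def translate(url_path, user_dir=USER_DIR):
    """Map the path part of a URL to a file name, or None if it cannot be served."""
    if url_path.startswith("/~"):
        if user_dir == "DISABLED":
            return None
        # Split "/~dave/abstract.html" into the user name and the rest.
        user, _, rest = url_path[2:].partition("/")
        home = HOME_DIRS.get(user)
        if home is None:
            return None
        return posixpath.join(home, user_dir, rest)
    return posixpath.join(DOCUMENT_ROOT, url_path.lstrip("/"))


print(translate("/products/order.html"))
print(translate("/~dave/abstract.html"))
```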
HTML Buttons
An HTML form can make use of three different types of interface elements or
tags: INPUT, SELECT, and TEXTAREA.
The general form of an INPUT tag is <INPUT TYPE="type" NAME="name">. The INPUT element is
a ``standalone'' tag; it has no terminating tag. NAME defines the symbolic name
for the field value passed back to the server upon submission and must be
present for all but TYPE="submit" or TYPE="reset". The value for NAME does not
appear in the displayed document. Usually, any text immediately before or after
the INPUT tag serves as a label for the tag.
The TYPE attribute to the INPUT tag indicates which type of input you want:
text
    Textual input.
password
    Same as text but does not echo characters.
checkbox
    A button that is either on or off.
radio
    A ``one-of-many'' checkbox if multiple radio buttons are grouped with the
    same NAME.
reset
    Resets form values to their defaults.
submit
    Sends form information back to the server.
The text and password input TYPE values may take an optional
SIZE=columns,rows attribute that indicates the number of columns (characters)
and rows displayed for text input. Checkboxes and radio buttons may have an
optional CHECKED attribute to specify a pre-checked value.
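To make the attribute combinations concrete, here is a small Python helper that prints INPUT tags for a few field types. The helper input_tag and the field names customer, secret, and giftwrap are hypothetical; only the TYPE, NAME, SIZE, and CHECKED attributes come from the description above.

```python
def input_tag(type_, name=None, size=None, checked=False):
    """Render one INPUT tag as a string."""
    parts = [f'TYPE="{type_}"']
    if name is not None:
        parts.append(f'NAME="{name}"')
    if size is not None:
        parts.append(f"SIZE={size}")
    if checked:
        parts.append("CHECKED")
    return "<INPUT " + " ".join(parts) + ">"


# Text placed next to each tag acts as its label in the displayed form.
print("Name: " + input_tag("text", name="customer", size=30))
print("Password: " + input_tag("password", name="secret", size=8))
print(input_tag("checkbox", name="giftwrap", checked=True) + " Gift wrap")
```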
The submit and reset input TYPE values are special. If a user presses the
``Reset'' button, all of the inputs are set back to their default values.
Pressing the ``Submit'' button will cause the browser to package up the data
entered by the user and send it back to the server. These two values have an
optional attribute VALUE=button-label, which, if present, will be used as the
button label.
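What the browser's packaging step produces can be approximated with Python's standard urlencode function. The field names and values below are hypothetical, and the URL-encoded format shown is the usual form encoding rather than something spelled out in this section.

```python
from urllib.parse import urlencode

# Hypothetical (NAME, value) pairs in the order they appear in the form.
filled_in = [
    ("customer", "Dave Smith"),
    ("quantity", "3"),
    ("notes", "red & blue"),
]

# Pairs are joined with "&"; spaces and special characters are escaped.
body = urlencode(filled_in)
print(body)
```

The escaping is what lets a value containing "&" or "=" survive the trip without being mistaken for a field separator.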
The SELECT interface tag allows the user to choose from a list of items in a
pop-up menu or scrollable list. The selection items are enclosed between the
opening tag <SELECT NAME="name"> and the closing tag </SELECT>. Each choice in the list
begins with an