Introduction

When building an electronic library it seems wisest to adopt a single standard, or family of standards, so that all documents can be stored in one standard representation. At present, SGML (Standard Generalized Markup Language) is the most likely prospect. SGML is a metalanguage: it lets you specify grammars for languages whose documents are marked up with labelled brackets (tags), which may carry associated attributes.
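To make "labelled brackets with associated attributes" concrete, here is a minimal sketch in Python. It is not an SGML parser: it assumes a simplified syntax in which every element has an explicit start and end tag and every attribute value is double-quoted, and it ignores DTDs, tag omission, and entities. The function name parse and the sample memo are hypothetical, invented for illustration.

```python
import re

# Simplified tag syntax (an assumption, not full SGML):
#   <name attr="value"> ... </name>
# Stray "<" characters that fit neither pattern are silently skipped.
TOKEN = re.compile(
    r'<(/?)([A-Za-z][\w.-]*)((?:\s+[\w.-]+="[^"]*")*)\s*>|([^<]+)'
)
ATTR = re.compile(r'([\w.-]+)="([^"]*)"')


def parse(text):
    """Turn labelled brackets into a nested tree of dicts and strings."""
    root = {"name": "#root", "attrs": {}, "children": []}
    stack = [root]  # the currently open elements, innermost last
    for match in TOKEN.finditer(text):
        closing, name, attrs, data = match.groups()
        if data is not None:
            # Character data: keep it unless it is only whitespace.
            if data.strip():
                stack[-1]["children"].append(data.strip())
        elif closing:
            # An end tag must match the innermost open element.
            if len(stack) == 1 or stack[-1]["name"] != name:
                raise ValueError(f"unexpected </{name}>")
            stack.pop()
        else:
            # A start tag opens a new element with its attributes.
            node = {
                "name": name,
                "attrs": dict(ATTR.findall(attrs)),
                "children": [],
            }
            stack[-1]["children"].append(node)
            stack.append(node)
    if len(stack) != 1:
        raise ValueError(f"unclosed <{stack[-1]['name']}>")
    return root["children"]


doc = '<memo status="draft"><to>Students</to><body>Read Unit IN</body></memo>'
tree = parse(doc)
print(tree[0]["name"])   # memo
print(tree[0]["attrs"])  # {'status': 'draft'}
```

The stack mirrors the nesting of the brackets. A real SGML system would additionally check each element's content against the grammar given in a DTD, which this sketch does not attempt.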

SGML was approved as an ISO standard (ISO 8879) in 1986, and has been endorsed by numerous agencies worldwide. It will be used in Project Envision, and has strong support from publishers, governments, professional associations, humanists (e.g., those involved in developing guidelines as part of the Text Encoding Initiative sponsored by ACH-ALLC), and many other groups.

You have already worked a bit with SGML! In your work with PAT, you studied the tag sets for three document collections, and searched through three SGML-encoded files. Thus, you saw the benefit of SGML for text searching: it lets you ask precise questions that refer to the structure of the information. You have also used HTML, which is another application of SGML, in connection with the World-Wide Web.
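As a reminder of what a structure-aware question looks like, the sketch below finds documents in which a word occurs inside a particular element, rather than anywhere in the text. It does not use PAT or PAT's query language. It relies on Python's standard xml.etree.ElementTree module, which reads XML rather than general SGML, so the sample is written with every tag explicitly closed. The sample collection, its tag names, and the function docs_with_word_in are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample collection, invented for illustration.
SAMPLE = """
<collection>
  <doc id="d1"><title>Document grammars</title><body>Parsing tagged text.</body></doc>
  <doc id="d2"><title>Authoring tools</title><body>Editors can use grammars too.</body></doc>
</collection>
"""


def docs_with_word_in(element_name, word, source=SAMPLE):
    """Return ids of documents whose <element_name> text contains word."""
    root = ET.fromstring(source)
    hits = []
    for doc in root.iter("doc"):
        for element in doc.iter(element_name):
            # Case-insensitive substring match on the element's own text.
            if word.lower() in (element.text or "").lower():
                hits.append(doc.get("id"))
                break
    return hits


print(docs_with_word_in("title", "grammars"))  # ['d1']
print(docs_with_word_in("body", "grammars"))   # ['d2']
```

The same word yields different answers depending on which element the question names, which is the kind of precision that plain string searching cannot offer.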

In this unit, we explore SGML from a variety of other perspectives: what it is, how it can be applied to various types of documents (by developing Document Type Definitions, or DTDs), how SGML documents can be created, edited, and displayed with specially tailored tools, what is meant by SGML compliance, how we can analyze documents tagged according to SGML grammars, and how we can translate to and from SGML.

Thus we extend our investigation of compiler/translation methods, begun with lexical analysis in Unit IN, to document grammars and translators. We also extend the document analysis topic touched on earlier, now gaining experience with parsers for documents that conform to SGML DTDs. Finally, we continue themes relating to usability and human-computer interaction, seeing how structure editing tools can be coupled with SGML processing to yield a next generation of authoring tools and editing systems.

This unit appeals to your interest in new approaches, and asks you to reconsider how you think about word processing and document editing. It challenges you to integrate what you have learned about grammars with what you have learned about IS&R, to see the value of document representation standards, and to be prepared to change the way you approach future writing efforts, especially theses and project reports. In particular, it prepares you to become involved in an ongoing project at Virginia Tech to have theses and dissertations submitted electronically, according to standards now being developed and tested, in conjunction with the Council of Graduate Schools, University Microfilms International, and the Coalition for Networked Information.


fox@cs.vt.edu
Thu Oct 27 04:09:51 EDT 1994