SD - SGML
We have the evolution of:
- Markup language, to
- Metalanguage / generalized markup, to
- SGML (metalanguage) and
- Application of SGML using
- Document Type Definitions (DTDs).
SGML - Standards
- SGML = ISO 8879, Standard Generalized Markup Language
- Related Standards
- ISO 8613, ODA, Office Document Architecture: formatting, imaging,
interchange; represented in SGML or binary; includes layout
- ISO 10179, DSSSL, Document Style Semantics and Specification Language
SGML - Explanation
- Is a
- tagging language, DB language
- extensible document description language
- metalanguage to define document types
- Supports
- logical structure, hierarchies
- file linking & addressing
- multimedia and hypertext
Documents, DTDs, Procedures
SGML-encoded documents can be:
- checked for syntactic compliance with a DTD
- printed in accordance with DSSSL specifications
- displayed on a screen
- indexed for context-specific searching
- translated into another representation
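As one illustration of the "indexed for context-specific searching" item above, here is a minimal Python sketch that records, for each word, the path of elements enclosing it. It assumes fully tagged, well-nested input with no SGML tag minimization, and it borrows Python's `html.parser` as a stand-in tokenizer rather than a real SGML parser. The element names (`thesis`, `chapter`, `heading`, `para`) and the sample text are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class ContextIndexer(HTMLParser):
    """Index each word under the path of elements that encloses it."""

    def __init__(self):
        super().__init__()
        self.path = []                 # stack of currently open element names
        self.index = defaultdict(set)  # word -> set of element paths

    def handle_starttag(self, tag, attrs):
        self.path.append(tag)

    def handle_endtag(self, tag):
        # Assumption: input is well nested, so the end tag closes the top element.
        if self.path and self.path[-1] == tag:
            self.path.pop()

    def handle_data(self, data):
        context = "/".join(self.path)
        for raw in data.lower().split():
            word = raw.strip(".,;:!?")
            if word:
                self.index[word].add(context)


def build_index(text):
    indexer = ContextIndexer()
    indexer.feed(text)
    indexer.close()
    return indexer.index


# Hypothetical sample document.
SAMPLE = (
    "<thesis><heading>Markup Theory</heading>"
    "<chapter><heading>History</heading>"
    "<para>Markup predates SGML.</para></chapter>"
    "</thesis>"
)

index = build_index(SAMPLE)
print(sorted(index["markup"]))
```

A query such as "find `markup` only in headings" then becomes a filter on the stored paths, which is the kind of search that purely presentational markup cannot support.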
SGML - Markup
SGML allows 4 types of markup
- Descriptive: tags
- Referential: entity references
- Metamarkup: markup declarations
- Processing instructions: LINK, CONCUR
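To make the four categories concrete, the following Python sketch labels the markup found in a small SGML-like fragment. It is a rough classifier built on regular expressions, not a parser; the fragment, the entity name `acm`, and the `<?newpage>` instruction are invented for illustration, and LINK and CONCUR are not modeled.

```python
import re

# One alternative per markup type; the group name is the label reported.
TOKEN = re.compile(
    r"(?P<metamarkup><![^>]*>)"            # markup declarations, e.g. <!ENTITY ...>
    r"|(?P<processing><\?[^>]*>)"          # processing instructions, e.g. <?newpage>
    r"|(?P<descriptive></?[A-Za-z][^>]*>)" # start and end tags
    r"|(?P<referential>&[A-Za-z][A-Za-z0-9.-]*;)"  # entity references
)


def classify_markup(text):
    """Return (type, markup) pairs in the order they appear in the text."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)]


# Hypothetical fragment showing one instance of each type.
SAMPLE = (
    '<!ENTITY acm "Association for Computing Machinery">\n'
    "<para>Published by the &acm;.</para>\n"
    "<?newpage>\n"
)

for kind, markup in classify_markup(SAMPLE):
    print(f"{kind:12} {markup}")
```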
The most important type is Descriptive Markup.
- Elements
- GI (generic identifier): found in start and end tags
- attribute/value pairs, e.g., id=, idref=
- content
- Top element = doc.
- Elements have a content model (i.e., a grammar production)
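The idea that a content model is a grammar production can be sketched in Python by writing each model as a regular expression over the sequence of child GIs. The two models below are hypothetical and hand-translated from DTD-style notation; a real SGML parser derives them from the DTD and also handles features such as inclusions and exclusions, which this sketch ignores.

```python
import re

# Hypothetical content models. Each child GI is followed by one space so
# that a name such as "para" cannot match a prefix of a longer name.
CONTENT_MODELS = {
    "doc": r"front (chapter )+",             # (front, chapter+)
    "chapter": r"heading (para |list )*",    # (heading, (para | list)*)
}


def children_match(gi, child_gis):
    """Check whether the ordered child GIs satisfy the parent's content model."""
    sequence = "".join(name + " " for name in child_gis)
    return re.fullmatch(CONTENT_MODELS[gi], sequence) is not None


print(children_match("chapter", ["heading", "para", "list"]))  # valid order
print(children_match("chapter", ["para", "heading"]))          # heading must come first
```

Because the model constrains both which children appear and in what order, the same check covers hierarchy and ordering at once.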
Example
OHCO Model Diagram
DTDs
- Define a class of documents.
- Specialize SGML for documents in a class.
- Use an attribute grammar.
- Use a bracketed grammar.
- Hierarchy support means nested content.
- Ordering means OHCO is fully supported.
- Developing a DTD: DTDspin, modular approach, DTD for theses,
ACM's eclectic approach (but making sure formatters exist).
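The hierarchy and ordering points above amount to treating a document as an ordered hierarchy of content objects (OHCO). A minimal Python sketch of that view follows; it builds a tree from fully tagged, well-nested input and lists the elements in document order. The `Node` class, the element names, and the use of `html.parser` as a tokenizer are all assumptions made for illustration, not part of SGML itself.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Node:
    gi: str                                       # generic identifier
    children: list = field(default_factory=list)  # Node or str, in document order


class TreeBuilder(HTMLParser):
    """Build a Node tree from fully tagged, well-nested markup."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        if self.stack:
            self.stack[-1].children.append(node)  # nesting gives the hierarchy
        else:
            self.root = node
        self.stack.append(node)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1].gi == tag:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and data.strip():
            self.stack[-1].children.append(data.strip())


def parse(text):
    builder = TreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """List element names in document order, indented by nesting depth."""
    lines = ["  " * depth + node.gi]
    for child in node.children:
        if isinstance(child, Node):
            lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical sample document.
SAMPLE = (
    "<doc><chapter><heading>One</heading><para>First.</para></chapter>"
    "<chapter><heading>Two</heading></chapter></doc>"
)

print("\n".join(outline(parse(SAMPLE))))
```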
Our Uses of SGML
- CODER: dictionaries, INCARD (project with cardiology data)
- Project Envision
- Bibliographic records
- Transactions articles
- Tech reports (WATERS): for bibliographic records instead of refer or
BibTeX
- Dissertations and theses
- TREC, TULIP
- HTML and HTML+
Advantages of SGML
- Maintainability
- Allows multiple use: Paper - screen; thesis - article - book
- Immune to new word-processing software or new versions
- Portability
- Author to online use to publisher
- Electronic Manuscript Project (of AAP)
- Guidelines for articles, books
- Elements for mathematics, tables
- Text Encoding Initiative (TEI)
- Benefits
- Sharing, collaborating
- Avoid rekeying, proofing, and fixing galleys, both
initially and on new editions
- Derive bibliographies & citations
- Publish immediately in online databases and on CD-ROM as well
- Less Cognitive Demand, Better Support of Content Orientation
- Element recognition only -> derive markup
selection & performance
- Focus on structure
- Special Assistance / Processing
Doc. Modeling w. SGML
- Characterize structure using DTD
- Depth / thoroughness of tagging
- Bibliographic entries, names
- Math, tables
- PAT model, representation, retrieval
- PAT and LECTOR specification files
- OHCO, grammar models --- especially if one avoids
SGML customization
- Minimization, entity references
- CONCUR, processing markup
- Extensive use of attributes
- Attach semantics / weighting / evidence
effects -- to context
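For the entity references mentioned in the list above, a small Python sketch can show the basic mechanism: a declaration binds a name to replacement text, and each reference to that name is expanded in place. This handles only simple internal entities declared with a double-quoted literal; the entity name `tei` and the sample text are hypothetical, and unknown references are left untouched rather than reported as errors.

```python
import re

ENTITY_DECL = re.compile(r'<!ENTITY\s+([A-Za-z][\w.-]*)\s+"([^"]*)"\s*>')
ENTITY_REF = re.compile(r"&([A-Za-z][\w.-]*);")


def expand_entities(text):
    """Expand references to entities declared in the same text."""
    entities = dict(ENTITY_DECL.findall(text))  # name -> replacement text
    body = ENTITY_DECL.sub("", text)            # drop the declarations
    expanded = ENTITY_REF.sub(
        lambda m: entities.get(m.group(1), m.group(0)),  # keep unknown refs as-is
        body,
    )
    return expanded.strip()


# Hypothetical sample: one declared entity, one undeclared reference.
SAMPLE = (
    '<!ENTITY tei "Text Encoding Initiative">\n'
    "<para>Guidelines from the &tei;; see &unknown;.</para>"
)

print(expand_entities(SAMPLE))
```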
HTML References