10/23/95 Class Summary by Group I: Kalafut, Muhlenburg, Klein, Fitzerald

After taking care of administrative issues, we were referred to an article about markup before starting SGML. We mentioned other standards such as ODA and DSSSL, and then went over what SGML can do: most importantly, serve as an intermediate language and as a markup language for defining the structure of documents such as theses and dissertations. We then learned that SGML documents can be checked for syntax, printed according to other standards like DSSSL, indexed, used for translations, and even displayed in ASCII form (as we then viewed). We reviewed the four kinds of markup that SGML supports, the most important being descriptive markup, which uses "elements" to describe the structure of a type of document, and then reviewed an example using elements. Next we discussed DTDs, which define a class of documents, and a little about how they work, noting that good DTDs are difficult to develop. We saw some other uses of SGML, like the Envision project. After noting that CDs are now beginning to rust, we reviewed some advantages of SGML, like maintainability and portability. We then used UVA's Text Encoding Initiative to look up, among other things, some base tags for poetry. We then went into document modelling with SGML, including the PAT model and the OHCO grammar model, noting that one should develop with SGML using a dedicated tool rather than a plain ASCII editor. We then covered document translation, which has difficulties such as one-to-many and many-to-one translations and building an invertible translator for both up and down translations. We went over the ICA approach before concluding that SGML as an intermediate form is a very plausible solution. After closing the SGML discussion, we received a brief introduction to hypertext at the end of class.
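The case for SGML as an intermediate form rests on a simple counting argument covered in the lecture. The following is a minimal Python sketch assuming n distinct document formats that must all be translatable into one another; the function name `translators_needed` and the strategy labels are hypothetical, chosen only for illustration.

```python
def translators_needed(n: int) -> dict[str, int]:
    """Count the translators required for n formats under three strategies."""
    if n < 2:
        raise ValueError("need at least two formats to translate between")
    return {
        # one dedicated translator for every ordered (source, target) pair
        "direct": n * (n - 1),
        # per format: one "up" translator into the intermediate form
        # and one "down" translator out of it
        "intermediate": 2 * n,
        # per format: a single invertible translator handles both directions
        "invertible": n,
    }


for n in (3, 5, 10):
    print(n, translators_needed(n))
```

For three formats the direct and intermediate approaches tie at six translators each; from four formats on, the quadratic n*(n-1) term dominates (90 direct translators versus 20 through an intermediate form at n = 10).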
= = = = = == = = === = = = = = = = = = ==== = = == = = = == = =

October 23, 1995 - Class Summary
Carolyn O'Hare, Lauren Barton, Robert Ryan, Martin Falck, Nelson Kile

The class began with a discussion of the news and announcements and a reminder that the class needs to judge the debate topics by November 1. We then began the lecture on SGML. SGML is a tagging metalanguage that allows you to add descriptive information to documents. It supports logical structures, hierarchies, file linking, addressing, multimedia and hypertext. Documents created with SGML can be: 1) checked for syntactic compliance, 2) printed in accordance with DSSSL specifications, 3) displayed on a screen, 4) indexed for context-specific searching, and 5) translated into another representation. The present standard for SGML is ISO 8879. Other related standards are ISO 8613, Office Document Architecture (ODA), and ISO 10179, Document Style Semantics and Specification Language (DSSSL). SGML allows four types of markup: descriptive, referential, metamarkup and processing instructions. Of these, descriptive markup is the most important. The basic pieces of descriptive markup are the generic identifier, the id and the content. The top element identifies the document. SGML has many benefits, among them maintainability and portability. SGML allows multiple uses of the same source and is immune to changes between versions of word processors. SGML also provides for immediate on-line or CD-ROM publishing; there is no need to rekey or adjust an SGML document for this type of publishing. We ended class with a brief discussion of hypertext. Hyper-G and Hypermedia were mentioned as the second generation of the WWW.

= = = = = == = = === = = = = = = = = = ==== = = == = = = == = =

Class Summary for Oct 23, 1995 (Group 5)
Shirley Carr, Mike Joyce, Bushra Khan, Zakia Khan, Vas Madhava

CLASS SUMMARY CS5604 - OCT 23, 1995

UNIT SD (95%)
-------------

SD: SGML
--------

* SGML has had wide impact and will continue to do so.
  This is because it can be used to describe any kind of document, i.e., it is a metalanguage with which any markup language can be defined.
* The evolution of document languages is as follows:
  - Markup language: just tags
  - Metalanguage / generalized markup: a metalanguage specifies the format and structure of a language at a high level of abstraction.
  - SGML metalanguage
  - Application of SGML: use of DTDs (Document Type Definitions). The advantage of SGML when used with DTDs is that it can be personalized to the user's needs.
* SGML Standards:
  ISO 8879: "The" SGML standard - it specifies the logical structure of the language.
  Related Standards:
  ISO 8613: ODA - Office Document Architecture
    - This was more concerned with pages and page layout, formatting, imaging and interchange.
    - It never caught on because it was too cumbersome.
  ISO 10179: DSSSL - Document Style Semantics and Specification Language
    - This standard deals with style and semantic issues for different types of languages.
    - It has not been agreed upon to date.
* SGML - Description:
  - SGML provides methods for describing a document's content and structure.
  - It is a tagging language that can be used to describe anything in a database.
  - It can be used for interchange, which makes it good for WWW applications.
  - It is extensible, so it can be used without fear of obsolescence as future needs arise.
  - It is a metalanguage for defining document types, so it can be customized for particular needs. For example, a DTD for theses is being developed.
  - It supports hierarchies, allowing you to structure your document.
  - It allows for file linking and external addressing, allowing a document to point to the outside world.
  - It supports multimedia and hypertext, allowing descriptions of various kinds of elements (e.g., the different elements of a dance).
* SGML documents can be used for a variety of purposes (as opposed to paper, which can only be used for reading).
  These include:
  - Syntax checking with a DTD, to verify that the document conforms to specifications. This can be used to verify that the document has good structure.
  - Printing according to DSSSL specifications.
  - Display on a screen.
  - Indexing for context-specific searching. This is powerful because one language can be used for both processing and style considerations.
  - Translation into another representation. This is powerful because SGML can be used as the intermediate language for conversion between disparate document formats, such as various word processing formats.
* SGML Markup - 4 types of markup are allowed:
  Referential - used for referring to entities (e.g., &amp; for the & character)
  Metamarkup - used for markup declarations
  Processing Instructions - not covered here
  Descriptive Markup - the most important type of markup
    - Composed of 3 parts:
      a) The generic identifier, which is found in start and end tags
      b) Attribute-value pairs of the format: id=idref
      c) The contents
    - The topmost element is the document itself.
    - The elements have a content model, defined by a DTD, that specifies the grammar of the element.
* DTD - Example ...
  - A DTD allows you to define a class of documents and its associated grammar. Thus, defining a DTD for HTML allows you to specify the allowable syntax for HTML, from which you can then create a parser.
  - It uses an attribute grammar, allowing attributes to be attached to grammar elements. Purists dislike this because it makes writing a parser more difficult.
  - It uses a bracketed (context-free) grammar, meaning that elements are delimited by special sequences (start and end tags).
  - It is hierarchical, meaning that contents are nested.
  - It is ordered, meaning that the specified sequence must be followed.
  - Developing a DTD is very difficult because you need to know beforehand all the possible data items you will care about. DTDSpin is a tool that lets you develop DTDs more quickly.
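To make "syntax checking with a DTD" concrete, here is a minimal Python sketch of checking a document tree against ordered, hierarchical content models. It assumes a hypothetical thesis DTD and hand-translates each content model into a regular expression over child element names; the names `CONTENT_MODELS` and `validate` are illustrative only. A real SGML parser handles much more (tag omission, inclusions and exclusions, the & connector, entities), so this only illustrates the idea that each element's children must match its declared grammar.

```python
import re

PCDATA = "#PCDATA"  # marker for text-only content

# Hypothetical thesis DTD, in SGML-like notation:
#   <!ELEMENT thesis  - - (title, author, chapter+)>
#   <!ELEMENT chapter - - (title, para*)>
#   <!ELEMENT (title|author|para) - - (#PCDATA)>
# Element content models are hand-translated into regular expressions over
# the sequence of child names, each child name followed by one space.
CONTENT_MODELS = {
    "thesis": r"title author (?:chapter )+",
    "chapter": r"title (?:para )*",
    "title": PCDATA,
    "author": PCDATA,
    "para": PCDATA,
}


def validate(element, path="", errors=None):
    """Recursively check an element (generic identifier, content) against
    CONTENT_MODELS; content is either a text string or a list of child
    elements. Returns a list of error messages (empty if valid)."""
    errors = [] if errors is None else errors
    name, content = element
    here = f"{path}/{name}"
    model = CONTENT_MODELS.get(name)
    if model is None:
        errors.append(f"{here}: element not declared in DTD")
        return errors
    if model == PCDATA:
        if not isinstance(content, str):
            errors.append(f"{here}: expected text content only")
        return errors
    if isinstance(content, str):
        errors.append(f"{here}: text not allowed; expected child elements")
        return errors
    # Ordered check: the children, in sequence, must match the content model.
    sequence = "".join(child[0] + " " for child in content)
    if re.fullmatch(model, sequence) is None:
        errors.append(f"{here}: children {sequence.split()} violate model")
    # Hierarchical check: validate every nested element as well.
    for child in content:
        validate(child, here, errors)
    return errors


good = ("thesis", [
    ("title", "SGML for Theses"),
    ("author", "A. Student"),
    ("chapter", [("title", "Introduction"), ("para", "Body text.")]),
])
bad = ("thesis", [
    ("author", "A. Student"),  # title must come first, and a chapter is missing
    ("title", "SGML for Theses"),
])

print(validate(good))  # []
print(validate(bad))
```

Encoding each content model as a regular expression works because the ordered sequence, repetition (+, *) and alternation of an SGML content model are regular over the list of child names; the recursion supplies the hierarchical nesting.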
* VA Tech's uses of SGML include:
  - CODER dictionaries
  - The INCARD system for cardiology data
  - WATERS system for technical reports
  - Theses
  - etc.
* The advantages of SGML are:
  1) Maintainability:
     - The same source can be used for multiple purposes, e.g., printing and display.
     - It is immune to new versions of software packages, ensuring its viability in the future.
  2) Portability:
     - It can be used by everyone who needs to refer to the document, such as authors and publishers, and even on-line.
     - It has the approval of the Electronic Manuscript Project, which has given guidelines for its use in articles and books.
     - The Text Encoding Initiative (TEI) has guidelines for text encoding and interchange using SGML for various types of "documents", including poetry, sculpture descriptions, etc.
  3) Sharing: it allows everyone, especially publishers, to use one source.
  4) It avoids rekeying and proofing when conversions are made.
  5) It can be used for processing, such as generation of bibliographies and citations.
  6) It allows for immediate publishing in on-line databases and on CD-ROMs.
  7) It allows for more focus on the content rather than the style, especially on the structure of the document.
  8) It allows for special processing and viewing, such as Author/Editor's ability to move entire sections by moving tags, and Panorama, which lets you read SGML files directly (avoiding the need to keep an HTML copy of the same file).
* Issues involved in document modeling with SGML:
  1) Thorough tagging is needed, but it often gets in the way of writing content.
  2) Bibliographic entries are needed, but there are no conventions for encoding them; e.g., some people make distinctions between proceedings and journals.
  3) The issue of going from a structure to its rendering hasn't been resolved yet.
* There are many online references on writing HTML.

SD: ELECTRONIC PUBLISHING
-------------------------

* There are many approaches and systems in this area.
* Software packages include troff, LaTeX, and word processors such as Word and WordPerfect.
* SoftQuad's Author/Editor is advanced because it lets you create and edit SGML documents based on rules that you define.
* The industry trend is towards SGML with a WYSIWYG front end.

SD: DOCUMENT TRANSLATION
------------------------

* The translation problem is being able to go from any of n source formats to any of the other n-1 target formats. If direct translators have to be written, we would need n*(n-1) translators. If an intermediate format is used, this comes down to 2n (one translator into and one out of the intermediate format, per format). And if the same translator is used for going in both directions, it comes down to n.
* This intermediate format has to be powerful enough to handle all the contents of the sources and targets. It also needs to separate content and structure elements from style elements, so that it can be rendered differently on the target systems.
* The difficult part is going from a specific format to the SGML format, because there might be:
  - inconsistencies in markup selection and ordering
  - overloading of symbols
  - semantic vs. syntactic orientation in the contents
  - Math and tables are especially difficult to translate.
* Measurements of translation fidelity are:
  - Hardcopy: does it look the same when printed?
  - Screen: does it look the same on the screen, do lines wrap as the window size changes, etc.?
  - Editing: do you have screen fidelity and editability as in the source system?
* The ICA (Integrated Chameleon Architecture) approach:
  - A formal model; it does translation up to SGML and inverse translation from SGML down to the target.
  - Developers provide grammars and translations, while users apply them and resolve ambiguities that arise.
  - It uses SGML as the intermediate format.
  - It uses the following methodology:
    1) Develop the grammar using BNF, Yacc, etc.
    2) Replace existing tags
    3) Insert new tags
    4) Map specific to general
    5) Map general to specific

UNIT HT: HYPERTEXT (5%)
-----------------------

* IR and HT are actually integrated, in the sense that IR systems are used for searching while HT systems are used for browsing. One typically does one and then the other, often going back and forth.
* The emphasis in hypertext is on multimedia. The evolution of hypertext models is:
  1) Dexter Group - came up with the initial model
  2) Amsterdam Model - added support for multimedia, which involves many synchronization issues
  3) HyTime - now standardized by ISO. It was originally developed to describe music, but since it addressed the same kinds of issues as multimedia, it became an extension to SGML for handling multimedia.
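The search-then-browse pattern described for IR and HT can be sketched in a few lines of Python. The node collection, its ids and texts, and the functions `search` and `browse` are all hypothetical; the search is a plain substring match standing in for a real IR engine, and browsing is a breadth-first walk over links limited to a few hops.

```python
from collections import deque

# Hypothetical hypertext: node id -> (node text, ids of nodes it links to)
NODES = {
    "sgml": ("SGML is a metalanguage for defining document types", ["dtd", "hytime"]),
    "dtd": ("A DTD defines a class of documents", ["sgml"]),
    "hytime": ("HyTime extends SGML for hypermedia and multimedia", ["dexter"]),
    "dexter": ("The Dexter model is an early hypertext reference model", ["amsterdam"]),
    "amsterdam": ("The Amsterdam model adds multimedia synchronization", []),
}


def search(query):
    """IR step: ids of nodes whose text contains every query term
    (plain substring match, standing in for real retrieval)."""
    terms = query.lower().split()
    return sorted(
        node_id
        for node_id, (text, _links) in NODES.items()
        if all(term in text.lower() for term in terms)
    )


def browse(start, max_hops=2):
    """HT step: breadth-first walk along links from a starting node,
    returning node ids in the order they are visited."""
    seen = {start}
    visited = []
    queue = deque([(start, 0)])
    while queue:
        node_id, hops = queue.popleft()
        visited.append(node_id)
        if hops == max_hops:
            continue  # do not follow links beyond the hop limit
        for target in NODES[node_id][1]:
            if target not in seen:
                seen.add(target)
                queue.append((target, hops + 1))
    return visited


# Search first, then browse from each hit.
for hit in search("multimedia"):
    print(hit, "->", browse(hit))
```

In practice a user would alternate: search, browse the links from a promising hit, and return to searching when the links stop being useful.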