CS5604, Unit DL

Edward A. Fox
Department of Computer Science
Virginia Tech, Blacksburg VA 24061-0106

Abstract:

The vision of digital libraries, expressed so eloquently by Vannevar Bush in 1945 when he described Memex [1], provides a rallying point for work in information storage and retrieval. With our course theme of digital libraries, we begin by trying to figure out what they could and should be, their advantages and disadvantages, what might be in them and how they might function.

1. Unit Highlights

Lecture/Motivating Question:
What are the potential benefits, costs, and disadvantages relating to digital libraries?

Discussion:
Small groups of 2-6 people should meet during the week, drawing on the readings and other knowledge, to argue the pros and cons of digital libraries. Groups will collaborate on the exercise. They will also work together on the DL tutorial quiz (the only quiz this term to be done in groups) to make sure that all of it is clear.

Computer Exercise:
Since netlib is one of the oldest and most widely used electronic mail-based services providing digital library functions, students in groups will try it out to gain experience with indexes, query facilities, and semi-interactive archives.

2. Introduction

On July 20-21, 1992, the National Science Foundation sponsored a workshop on digital libraries, prompted by an earlier proposal by Lesk, Fox, and McGill that called for a National Electronic Library for Science, Engineering, and Technology. NSF will fund a good deal of R&D in this area in the 1990s, helping bring to fruition the dreams of such visionaries as Vannevar Bush and J.C.R. Licklider.

At the workshop, David Hartzband of DEC recounted experiences of a major multinational corporation involved in office and factory automation. They found that technology alone is not enough, that social and anthropological knowledge is also needed to effect change. Thus, in this unit, which introduces the course and our theme of digital libraries, there will be readings and discussion about the legal issues and essential characteristics of digital media and digital libraries.

To give concreteness to the idea of digital libraries, you will use computer networks to access netlib, an electronic archive for numerical and mathematical software and related information. This will also illustrate how semi-interactive querying can be carried out using electronic mail (or xnetlib).

Since this course makes use of the rapidly expanding digital library being developed in Project Envision, it is important to understand the background to that effort. The article on ACM Press Database and Electronic Products describes earlier work toward an ACM digital library, relates it to products and services for ACM members and other users, and discusses some of the financial and pragmatic aspects of such an archive.

This unit paves the way for discussions of the technology, methodology, and theory of information storage and retrieval (IS&R), and of commercial and research IS&R systems. It sets the stage for detailed discussions and introduces the key theme of digital libraries, which was part of legislation introduced by Senator Gore in 1992 and reintroduced in Congress in 1993.

3. Objectives

A key point of the Course Objectives is to prepare students to discuss and explain the main issues relating to developing digital libraries and related services. Toward that end, this unit will deal with digital library efforts by ACM and by experts in numerical software, and will explore the legal and conceptual issues relating to the fundamental properties of digital media.

Other, specific objectives include being able to:

  1. define and explain the term digital library, and describe a vision of its future in terms of possible scenarios for two different groups;

  2. list three key challenges facing builders of digital libraries;

  3. prepare a draft proposal that could be submitted to ACM Press Database and Electronic Products;

  4. discuss six characteristics of digital media that are likely to lead to significant changes in future law about those media, and why;

  5. obtain mathematical software and information from netlib; and

  6. describe the main services, advantages, disadvantages, limitations, and costs of netlib.

4. Suggested Procedure

There are three main types of effort required. First, the readings (see section 5) should be carefully studied, keeping the unit objectives in mind (see section 3). Second, students should work together in groups on the exercise and the digital libraries tutorial (along with the long quiz on it). Third, each student must independently take the final quiz for the unit (separate from, and shorter than, the tutorial quiz), according to the honor code.

4.1 DL Tutorial

The class will break into groups. Please send the instructor the list of students in each group, so all can get credit together. The groups will discuss two WWW resources related to digital libraries.

The first is a tutorial on digital libraries available in Adobe's Portable Document Format (PDF). For this class, the short version will suffice; however, those interested in in-depth reading are encouraged to look at the long version. Once the tutorial has been studied, groups can work together to answer the long multi-part self-study quiz available from the QUIZIT system.

The second is more fun, and should lead to a good discussion. Please read Raj Reddy's proposal for a Universal Library and prepare a group response, sending it to the instructor, who may send it along to Dr. Reddy if deemed of interest (e.g., making good suggestions, asking good questions).

4.2 Computer Exercise

In this exercise you will send email to netlib, following the instructions in the article [2]. A newer version of the instructions involves sending mail to the netlib@ornl.gov system. However, now that WWW browsers such as Mosaic or Netscape can access netlib directly, using a browser is much easier.
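For those who take the electronic mail route, the sketch below builds (but does not send) a netlib request message using Python's standard `email` package, with one request per line of the message body. The address is the one given above. The request `send index` is modeled on the requests described in [2]; `send index from hompack` assumes HOMPACK is stored under a library named `hompack`, which should be confirmed against the index netlib returns. The helper name `netlib_request` and the sender address are hypothetical.

```python
from email.message import EmailMessage

# Address taken from the unit text; netlib's addresses may have changed since.
NETLIB_ADDRESS = "netlib@ornl.gov"


def netlib_request(requests: list[str], sender: str) -> EmailMessage:
    """Build (without sending) a mail message carrying netlib requests.

    Each element of `requests` becomes one line of the plain-text body,
    since netlib reads one request per line.
    """
    msg = EmailMessage()
    msg["To"] = NETLIB_ADDRESS
    msg["From"] = sender
    msg["Subject"] = "netlib request"
    msg.set_content("\n".join(requests) + "\n")
    return msg


if __name__ == "__main__":
    # Hypothetical sender; the request lines are assumptions to be checked.
    message = netlib_request(["send index", "send index from hompack"],
                             "student@example.edu")
    print(message)
```

The printed message can be pasted into an ordinary mail client; actually sending it (for instance with `smtplib`) is left out so the sketch has no network dependency.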

Please send a copy of all results you receive to the instructor for review. Use netlib to:

  1. identify works that came from Virginia Tech;

  2. find works by John Dennis (but not by other people named John or Dennis); and

  3. find the originators of HOMPACK

Here are some hints:

  1. Some people use their first initial instead of their first name.
  2. Software systems may have only one originator.
  3. Software systems at universities often are partially developed by the students of a faculty member who is responsible for the system.
  4. Drs. Watson and Ribbens in CS at Virginia Tech work on numerical analysis problems; their names and research interests can be found on the departmental web pages.
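Task 2 is essentially a name-matching problem, and hint 1 suggests a rule for it. The sketch below is a hypothetical helper, `matches_author`, not part of netlib: it assumes author strings are written given-name first, requires the surname to match exactly, accepts either the full first name or its initial, ignores middle names, and drops a small, assumed list of suffixes such as "Jr.". An initial alone cannot distinguish John Dennis from another person who signs as "J. Dennis", so matches still need to be checked by hand.

```python
import re

# Generational suffixes ignored when locating the surname (an assumed list).
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def matches_author(name: str, first: str, last: str) -> bool:
    """Return True if the author string `name` plausibly denotes `first last`.

    The surname (last remaining token) must equal `last`; the leading token
    may be the full first name or just its initial. Middle names are ignored.
    """
    # Split on whitespace, periods, and commas, discarding empty pieces.
    tokens = [t for t in re.split(r"[\s.,]+", name) if t]
    tokens = [t for t in tokens if t.lower() not in SUFFIXES]
    if len(tokens) < 2:
        return False
    given, surname = tokens[0].lower(), tokens[-1].lower()
    if surname != last.lower():
        return False
    return given in (first.lower(), first[0].lower())


if __name__ == "__main__":
    for candidate in ["J. E. Dennis", "John Dennis Jr.", "John Smith", "Dennis Jones"]:
        print(candidate, matches_author(candidate, "John", "Dennis"))
```

Requiring an exact surname match is what rejects "other people named John or Dennis"; the given-name check then admits the initial-only forms mentioned in hint 1.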

5. Comments on Readings

Note: Information on all articles for the course is available online through a WWW browser such as Mosaic or Netscape.

Note: An online report on digital libraries and related research opportunities may be of interest. This includes definitions and use scenarios. Look for it in the long tutorial or use a WWW browser to access Interoperability, Scaling, and the Digital Libraries Research Agenda.

5.1 FOXE88d

This article describes the early work on ACM Press Database and Electronic Products, and plans for the future. The research aspects of this program have been carried forward into Project Envision, and ACM Headquarters and the Publications Board, along with the Electronic Publishing Volunteer Advisory Committee, are coordinating work on a plan that will include an electronic archive and electronic submissions.

Paragraph 1 gives a snapshot of the status, which is amplified at the end of the Introduction. The earlier part of the Introduction describes in general terms what technological and related advances have made digital libraries possible.

The last 3 paragraphs of Vision are important, dealing with collection building, standards, and the main classes of services. The Challenges section calls for more vision and ideas (such as those given in the first paragraph of the Opportunities section), then for focused R&D, and finally for work on economic, social and legal issues. Funding will be needed from ACM SIGs (paragraph 2) and from partnerships (paragraph 3).

The Organization and Acknowledgments sections are not relevant. However, the Proposals section gives important guidance regarding what to look for in developing information products, services, or even multimedia information packages.

5.2 SAMU91a

This article gives valuable insight into copyright and legal matters, but is especially helpful in pointing out important characteristics of digital media. The six characteristics should be carefully studied and pondered. The last one, on nonlinearity, should be re-read after completion of Unit 8 on Hypertext. Note that Ms. Samuelson's husband, Robert Glushko, has been very active in R&D relating to hypertext.

The second section, on Replication, explains an influential court case and its implications, and discusses several clever schemes for generating revenue in connection with electronic publishing.

Key to our course are the remaining sections. Transmission and Multiple Use are essential parts of digital libraries, but they bring serious dangers of piracy. Plasticity is one of the key added values of digital media, but protecting authors' rights will demand careful balancing of this benefit with the need to extend copyright protection to changed versions of a work. Equivalence issues relate to multimedia, but are not clearly explained here. Compactness is not really the theme of the next section; rather, it is about storage cost-effectiveness and storage hierarchies, an idea that dates back to such discussions as [3]. Issues of nonlinearity will be dealt with later in the course, but are previewed in an interesting way here.

5.3 DONG87

The netlib system was one of the first to provide electronic mail-based access to archives, and serves the scientific computing community. This short article describes how the system works and can be used (though details and addresses have changed!), summarizes the contents of the archive, briefly explains the server, gives advantages and disadvantages, and closes with a list of needed future enhancements and opportunities.

6. Summary of Key Concepts

  1. Digital libraries should carry over the benefits of current libraries, and add value in terms of speedy, ubiquitous, and flexible access, as well as through support of searching, browsing and linking.

  2. Not only must we consider the tools and the content of digital libraries, but we also must be concerned with electronic media and their characteristics; the demands of applications; and questions relating to economic, social and legal policies.

  3. Building digital libraries involves many types of work, including:

References

[1] V. Bush. As we may think. Atlantic Monthly, 176:101-108, July 1945.

[2] Jack J. Dongarra and Eric Grosse. Distribution of mathematical software via electronic mail. Communications of the ACM, 30(5):403-407, May 1987.

[3] J. C. R. Licklider. Libraries of the Future. The MIT Press, Cambridge, MA, 1965.


fox@cs.vt.edu
Fri Aug 29, 1996