CS5604, Unit DL

Edward A. Fox
Department of Computer Science
Virginia Tech, Blacksburg VA 24061-0106

Abstract:

The vision of digital libraries, expressed so eloquently by Vannevar Bush in 1945 when he described Memex [1], provides a rallying point for work in information storage and retrieval. Since digital libraries are the theme of this course, we begin by considering what they could and should be, their advantages and disadvantages, what might be in them, and how they might function.

Unit Highlights

Lecture/Motivating Question:
What are the potential benefits, costs, and disadvantages relating to digital libraries?

Discussion:
Small groups of 2-6 people should meet during the week, drawing on the readings and other knowledge, to argue the pros and cons of digital libraries. Each group will send in an email summary to the instructor.

Computer Exercise:
Since netlib is one of the oldest and most widely used electronic mail-based services providing digital library services, students will try it out to gain experience with indexes, query facilities, and semi-interactive archives.

Introduction

On July 20-21, 1992, the National Science Foundation sponsored a workshop on digital libraries, prompted by an earlier proposal by Lesk, Fox, and McGill that called for a National Electronic Library for Science, Engineering, and Technology. NSF will fund a good deal of R&D in this area in the 1990s, helping to bring to fruition the dreams of such visionaries as Vannevar Bush and J.C.R. Licklider.

At the workshop, David Hartzband of DEC recounted experiences of a major multinational corporation involved in office and factory automation. They found that technology alone is not enough, that social and anthropological knowledge is also needed to effect change. Thus, in this unit, which introduces the course and our theme of digital libraries, there will be readings and discussion about the legal issues and essential characteristics of digital media and digital libraries.

To give concreteness to the idea of digital libraries, you will use computer networks to access netlib, an electronic archive for numerical and mathematical software and related information. This will also illustrate how semi-interactive querying can be carried out using electronic mail (or xnetlib).

Since this course will make use of the rapidly expanding digital library being developed in Project Envision, it is important to understand the background of that effort. The article on ACM Press Database and Electronic Products describes earlier work toward an ACM digital library, relates it to products and services for ACM members and other users, and discusses some of the financial and pragmatic aspects of such an archive.

This unit paves the way for discussions of the technology, methodology, and theory of IS&R, and of commercial and research IS&R systems. It sets the stage for detailed discussions and introduces the key theme of digital libraries, which was part of legislation introduced by Senator Gore in 1992 and reintroduced in Congress in 1993.

Objectives

A key point of the Course Objectives is to prepare students to discuss and explain the main issues involved in developing digital libraries and related services. Toward that end, this unit covers digital library efforts by ACM and by experts in numerical software, and explores the legal and conceptual issues arising from the fundamental properties of digital media.

Other, specific objectives include being able to:

  1. define and explain the term digital library, and describe a vision of its future in terms of possible scenarios for two different groups;

  2. list three key challenges facing builders of digital libraries;

  3. prepare a draft proposal that could be submitted to ACM Press Database and Electronic Products;

  4. discuss six characteristics of digital media that are likely to lead to significant changes in future law about those media, and why;

  5. obtain mathematical software and information from netlib; and

  6. describe the main services, advantages, disadvantages, limitations, and costs of netlib.

Suggested Procedure

There are three main types of effort required. First, the readings (see Comments on Readings) should be carefully studied, keeping the unit objectives in mind (see Objectives). Second, students should prepare for and engage in the group discussions, drawing upon the readings and other resources (see Debate Topics). Third, students should use electronic mail (or xnetlib) to carry out the exercises relating to netlib (see Computer Exercise).

Debate Topics

The class will break into groups, and each group will be assigned 3 questions. Think about your group's debate topics in advance, since every member of that discussion group must be ready to argue them (either pro or con). After your group has discussed its 3 topics for a total of about 45 minutes, work together to send the instructor an email summary that lists the name of each person in the group and gives the consensus viewpoint on each of the 3 topics. Write carefully, with different people playing different roles (see the discussion under Course Format), and rotate the roles for each topic.

  1. Technology will cause the demise of print publishers, because they will be unable to prevent theft and widespread distribution of electronic forms of their publications.

  2. People won't take computers to the beach or put them on their night tables or spread them out to read the comics, since books and other print forms are much cheaper, friendlier, lighter, easier to use in a variety of lighting situations, and cover a larger surface area.

  3. ACM should allow electronic submissions for all of its publications to be made according to authors' wishes, and should ignore issues of standardization.

  4. Electronic publication should be funded by a system of subscriptions, so users are encouraged to make use of published materials that are covered by any of their subscriptions.

  5. ACM should not call for proposals but should instead carry out an ambitious electronic publishing effort in-house, an effort that should rapidly break even financially.

  6. There are too many electronic publishing forms and standards, and too little user-oriented access software, to motivate people to buy into the new technology.

  7. Copyright is a useless concept for digital libraries because publishers will control and charge for access to and use of digital works.

  8. Given the federal government's role in NSFNET and the NREN, ubiquitous networking seems imminent, and means that there will be national networked access to future digital libraries. One result will be serious problems with international copyright violation due to the ease of transmission.

  9. Derivative works will be commonplace, and primary publishers will suffer greatly from those who shape and combine prior work without giving credit to the original sources.

  10. Copyright law cannot hope to deal with classifying works by media type, given that multimedia publications will become widely used.

  11. Storage hierarchies, with personal, departmental, campus, state, regional, national, and international levels, will operate with multiple copies of each work available, in places chosen to optimize performance.

  12. With highly linked collections of materials like hypertexts and their search trails, whose raw materials originated from numerous sources, copyright protection and remuneration should go to the new editor, and not to the original authors.

Computer Exercise

In this exercise you will send email to netlib, following the instructions in the article [2]. Current instructions, which involve sending mail to the netlib@ornl.gov system, are in /u1/README/netlib on fox.cs.vt.edu (also available on the Web through Mosaic or Netscape). Alternatively, you can run the xnetlib program (on most CS department machines it is located in /usr/local/X11R5/bin), although xnetlib is no longer supported now that WWW browsers can access netlib information directly. The third option is to reach netlib through Mosaic or Netscape.
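To make the email route concrete, here is a minimal Python sketch that builds (but does not send) a request message addressed to netlib@ornl.gov. It assumes netlib accepts the classic one-command-per-line requests such as `send index` described in [2]; the sender address and subject line are hypothetical placeholders, and the current README should be checked for the exact command syntax.

```python
from email.message import EmailMessage

# Address from the current instructions; the request syntax is assumed to be
# netlib's one-command-per-line style (e.g. "send index"), per [2].
NETLIB_ADDRESS = "netlib@ornl.gov"

def build_netlib_request(sender, commands):
    """Build, but do not send, a plain-text request with one command per line."""
    msg = EmailMessage()
    msg["From"] = sender                   # hypothetical sender address
    msg["To"] = NETLIB_ADDRESS
    msg["Subject"] = "netlib request"      # placeholder; commands are assumed to be read from the body
    msg.set_content("\n".join(commands) + "\n")
    return msg

request = build_netlib_request("student@cs.vt.edu", ["send index"])
print(request)
```

To deliver it, one could hand the message to `smtplib.SMTP(host).send_message(request)` on a machine with access to a mail server; that step is left out so the sketch runs anywhere.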

Please send a copy of all results you receive to the instructor for review.

  1. get a copy of the index, and then identify works that came from Virginia Tech;

  2. repeat task 1 above by doing a direct search for the same items;

  3. find works by John Dennis (but not by other people named John or Dennis); and

  4. find the originators of HOMPACK.

Here are some hints:

  1. Some people use their first initial instead of their first name.
  2. Software systems may have only one originator.
  3. Software systems at universities often are partially developed by the students of a faculty member who is responsible for the system.
  4. Drs. Watson and Ribbens in CS at Virginia Tech work on numerical analysis problems; their names and research interests can be found on the departmental web pages.
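Task 3 and the first hint come down to matching one name written several ways while rejecting near misses. The following Python sketch assumes index entries are lines of text with author names written either as "John Dennis" / "J. E. Dennis" or as "Dennis, J. E."; those formats and the sample entries (pkg1 to pkg6) are hypothetical, not taken from the real netlib index.

```python
import re

# Hypothetical matcher for task 3: accept "John Dennis", "J. Dennis",
# "J.E. Dennis", "John E. Dennis", and last-name-first "Dennis, J. E."
# or "Dennis, John", but reject other Johns and other Dennises.
DENNIS_PATTERN = re.compile(
    r"\bJ(?:ohn|\.)(?:\s*[A-Z]\.)*\s+Dennis\b"   # first name or initial first
    r"|\bDennis,\s+J(?:ohn\b|\.)"                # "Dennis, J." or "Dennis, John"
)

def matching_lines(index_lines):
    """Return the index lines that mention John Dennis in any of the forms above."""
    return [line for line in index_lines if DENNIS_PATTERN.search(line)]

# Invented sample entries; real netlib index lines look different.
sample_index = [
    "pkg1  by John Dennis",
    "pkg2  by J.E. Dennis and A. Coauthor",
    "pkg3  by Dennis, J. E.",
    "pkg4  by Jack Dennis",      # a different Dennis
    "pkg5  by John Smith",       # a different John
    "pkg6  by Dennis Jones",     # Dennis as a first name
]

matches = matching_lines(sample_index)
for line in matches:
    print(line)
```

The pattern is deliberately case-sensitive and will miss unusual spellings. Even after narrowing the index this way, the remaining entries need to be checked by hand, since an initial such as "J." alone cannot rule out a different person with the same surname.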

Comments on Readings

Note: Information on all articles for the course is available on the course Web pages (viewable with Mosaic or Netscape).

Note: A new report on digital libraries and related research opportunities may be of interest; it includes definitions and use scenarios. Use a WWW browser to access "Interoperability, Scaling, and the Digital Libraries Research Agenda."

FOXE88

This article describes the early work on ACM Press Database and Electronic Products, and plans for the future. The research aspects of this program have been carried forward into Project Envision, and ACM Headquarters and the Publications Board, along with the Electronic Publishing Volunteer Advisory Committee, are coordinating work on a plan that will include an electronic archive and electronic submissions.

Paragraph 1 gives a snapshot of the status, which is amplified at the end of the Introduction. The earlier part of the Introduction describes in general terms what technological and related advances have made digital libraries possible.

The last 3 paragraphs of the Vision section are important, dealing with collection building, standards, and the main classes of services. The Challenges section calls for more vision and ideas (such as those given in the first paragraph of the Opportunities section), then for focused R&D, and finally for work on economic, social and legal issues. Funding will be needed from ACM SIGs (paragraph 2) and from partnerships (paragraph 3).

The Organization and Acknowledgments sections are not relevant. However, the Proposals section gives important guidance regarding what to look for in developing information products, services, or even multimedia information packages.

SAMU91a

This article gives valuable insight into copyright and legal matters, but is especially helpful in pointing out important characteristics of digital media. The six characteristics should be carefully studied and pondered. The last one, on nonlinearity, should be re-read after completing Unit 8 on Hypertext. Note that Ms. Samuelson's husband is Robert Glushko, who has been very active in R&D relating to hypertext.

The second section, on Replication, explains an influential court case and its implications, and discusses several clever schemes for generating revenue in connection with electronic publishing.

Key to our course are the remaining sections. Transmission and Multiple Use are essential parts of digital libraries, but there are serious dangers of piracy. Plasticity is one of the key added values of digital media, but protecting authors' rights will demand careful balancing of this benefit with the need for extending copyright protection to changes. Equivalence issues relate to multimedia, but are not clearly explained here. Compactness is not really the theme of the next section; rather, that section is about storage cost-effectiveness and storage hierarchies, an idea that dates back to discussions such as [3]. Issues of nonlinearity will be dealt with later in the course, but are previewed in an interesting way here.

DONG87

The netlib system was one of the first to provide electronic mail-based access to archives, and serves the scientific computing community. This short article describes how the system works and can be used (though details and addresses have changed!), summarizes the contents of the archive, briefly explains the server, gives advantages and disadvantages, and closes with a list of needed future enhancements and opportunities.

Summary of Key Concepts

  1. Digital libraries should carry over the benefits of current libraries, and add value in terms of speedy, ubiquitous, and flexible access, as well as through support of searching, browsing and linking.

  2. Not only must we consider the tools and the content of digital libraries, but we also must be concerned with electronic media and their characteristics; the demands of applications; and questions relating to economic, social and legal policies.

  3. Building digital libraries includes many types of work including:

References

[1] V. Bush. As we may think. Atlantic Monthly, 176:101-108, July 1945.

[2] J. J. Dongarra and E. Grosse. Distribution of mathematical software via electronic mail. Communications of the ACM, 30(5):403-407, May 1987.

[3] J. C. R. Licklider. Libraries of the Future. The MIT Press, Cambridge, MA, 1965.





fox@cs.vt.edu
Fri Sep 1 14:12:34 EDT 1995