IBM Academy Digital Library Workshop

September 12-13, 1994, Edith Macy Conf. Center, Briarcliff Manor, NY

Chair: Jacob Slonim, IBM Toronto Laboratory

Session: Introduction

Speaker: Jeffrey Crigler, IBM

The purpose of the meeting was to understand the requirements of various classes of users, and to understand the technology requirements. Today's libraries serve as:

They deliver value of various types:

Digital libraries, as part of the NII (where citizens are informed, exchange information, and are educated and entertained), should be considered in terms of information needs and consumptive behavior. They are institutions and repositories:

We want DLs to be wired (graphic representation, authoring, interaction, topics, superdistribution, natural language, subject webs, relevancy of results, come and find me, guides, up to the second, integrated and used, value derived, educate and entertain), not tired (abstracts and text, one-way delivery, sources, repositories, boolean, hit lists, accuracy of recall, go and get it, explore, archival, rip and read, time on line, inform).

Session: Copyright Owners

Speaker: Karen Hunter, Elsevier

Alan Marwick of IBM moderated. Ms. Hunter is Vice President and Assistant to the Chairman of Elsevier Science B.V. and leader of the TULIP project. She referred us to "The Changing Business of Scholarly Publishing" in Journal of Library Administration, 9(3/4), 23-38, 1993. Her presentation argued that the current paper system gives us answers to the following questions, but that the answers are uncertain for a future electronic scenario.

Session: Knowledge Workers and Education

Speaker: Edward Fox, Virginia Tech

Jeff Crigler of IBM moderated.

Session: Librarians and their Patrons

Speaker: Neil Smith, The British Library

Katherine Willis of the University of Michigan moderated. Her institution plans to use NSF/ARPA/NASA funding for research on digital libraries in the area of earth and space science.

Session: Distribution Service Providers

Speaker: Mark Thompson, Dow Jones and Company

Frank Licate of ISI (Institute for Scientific Information) moderated.

Session: Library of Congress

Speaker: Herbert Becker, Library of Congress

Mr. Becker is Director of Information Technology Services, a member of the LoC's Digital Library Coordinating Committee, and LoC representative to the Committee on Applications and Technology of the National Information Infrastructure Task Force. This discussion focused on LoC's initiatives in the area of digital libraries. By the year 2000, LoC will have 5M works in digital form, and it is charged with ensuring that those are the most important works. There is serious interest from foundations in assisting.

LoC is the copyright center for the USA, so two copies of all works should be deposited there. Part of the plan is for new works to be entered electronically, since 1M new works come in each year. Research is also underway on preservation, so that old works that may disintegrate will not be lost. It is important to understand that this effort may cost $5 per page.

A Draft for Public Comment document related to the NII Task Force was distributed; it began with an Introduction and then moved to a section on Libraries and the NII. The 12 September NY Times article by Peter H. Lewis, entitled "Library of Congress Offers to Feed the Data Highway", was discussed, and its inaccuracies and premature announcements were explained.

Session: Searching, Information Retrieval and Navigation

Speaker: Heinz Sagl, IBM

Donna Harman of NIST moderated. The context was that searching feeds presentation and distribution and depends on storage and control, which in turn draws on translation and transformation as well as on authoring and assembly and on capture and indexing.

Session: Rights, Management, Security and Billing

Speaker: Jeffrey Crigler, IBM

William Walker of the New York Public Library moderated. Rights management is the technical choke point of digital library services. Historically, value derived from scarcity, so the right to copy or distribute has been the means of protecting owners' rights. Now, however, that right is meaningless, so usage can become the basis for protecting the economic value of works. Performance models can then apply, whence copyright becomes usageright.

Superdistribution is one approach: entitlements are permanently attached to objects (and to any derivatives or aggregations) and enforced through automated means, so that objects can be distributed and copied freely, but when certain usage occurs, some payment must be made or pre-payment must be proved. This leads to a model where:

Challenges involve: keeping entitlements glued, dealing with derivatives or aggregates, metering usage, digital coins for anonymity, standards, royalty payment clearinghouses, connection to electronic commerce, etc.
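The superdistribution idea described above can be illustrated with a minimal sketch. This is not a design presented at the workshop; the class names (Entitlement, DigitalObject, Meter), the per-use fee, and the prepaid balance are all hypothetical, chosen only to show entitlements staying attached through copying and aggregation while usage, not copying, triggers payment.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entitlement:
    """Hypothetical record of who is owed what for each use."""
    owner: str
    fee_per_use_cents: int


@dataclass(frozen=True)
class DigitalObject:
    """Content with its entitlements permanently attached (illustrative only)."""
    content: str
    entitlements: tuple

    def copy(self):
        # Copying is free; the entitlements travel with the copy.
        return DigitalObject(self.content, self.entitlements)

    def aggregate(self, other):
        # An aggregate carries the entitlements of all of its parts.
        return DigitalObject(self.content + other.content,
                             self.entitlements + other.entitlements)


@dataclass
class Meter:
    """Hypothetical usage meter holding a prepaid balance."""
    prepaid_cents: int = 0
    owed: dict = field(default_factory=dict)

    def use(self, obj):
        total = sum(e.fee_per_use_cents for e in obj.entitlements)
        if total > self.prepaid_cents:
            raise PermissionError("payment or proof of pre-payment required")
        self.prepaid_cents -= total
        for e in obj.entitlements:
            # Record what a royalty clearinghouse would later pay each owner.
            self.owed[e.owner] = self.owed.get(e.owner, 0) + e.fee_per_use_cents
        return obj.content


article = DigitalObject("text ", (Entitlement("publisher", 25),))
figure = DigitalObject("image", (Entitlement("illustrator", 10),))
bundle = article.copy().aggregate(figure)

meter = Meter(prepaid_cents=50)
print(meter.use(bundle), meter.owed, meter.prepaid_cents)
```

The sketch leaves out the hard parts named in the challenges above: nothing here keeps entitlements from being stripped, and anonymity, standards, and clearinghouse settlement are not modeled.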

Joseph L. Ebersole's March 1994 book Protecting Intellectual Property Rights on the Information Superhighways (available through the IIA) lists the following set of requirements:

Session: Database Issues

Speaker: Jim Reimer

Bill Scherlis of CMU moderated. In the basic database, applications are connected to integrated APIs, which talk either to a store for the index and metadata or to an object data store (usually in HSM). The architecture must be scalable, have a content-independent data store, allow either store-and-forward or stream delivery, and work either centralized or distributed in a client/server environment. A solution must include:

Session: Music Libraries

Speaker: Tony Prior, Telstar Group

Tony Prior is co-founder of Telstar Group, which has the biggest annual media spend of any company in the UK record business. He pointed out that the company is ready to enter the digital distribution business as soon as the technical issues are worked out. This may allow it to bypass several steps and go directly to end users with music. It may lead to better services for users at lower cost.

Session: Storage Systems

Speaker: Alan Bell, IBM

The speaker gave some important background information. Each year 15 petabytes of disk storage are purchased, with two-thirds of that going to distributed systems. Tape storage purchases are 200 petabytes per year. The disk purchases amount to over 2 Mbytes of disk storage for each person on the planet.
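The per-person figure can be checked with a short calculation. The sketch below assumes decimal units (1 petabyte = 10^15 bytes, 1 Mbyte = 10^6 bytes) and a 1994 world population of roughly 5.6 billion; neither assumption was stated in the talk.

```python
PETABYTE = 10**15   # assumption: decimal petabyte
MEGABYTE = 10**6    # assumption: decimal Mbyte

disk_bytes_per_year = 15 * PETABYTE    # figure quoted in the talk
tape_bytes_per_year = 200 * PETABYTE   # figure quoted in the talk
world_population = 5.6e9               # assumption: approximate 1994 population


def mbytes_per_person(total_bytes, population):
    """Spread a yearly storage volume evenly over a population."""
    return total_bytes / population / MEGABYTE


disk_per_person = mbytes_per_person(disk_bytes_per_year, world_population)
tape_per_person = mbytes_per_person(tape_bytes_per_year, world_population)
distributed_petabytes = disk_bytes_per_year * 2 / 3 / PETABYTE

print(f"disk per person per year: {disk_per_person:.2f} Mbytes")
print(f"tape per person per year: {tape_per_person:.2f} Mbytes")
print(f"disk going to distributed systems: {distributed_petabytes:.0f} petabytes")
```

Under these assumptions the disk figure comes out between 2 and 3 Mbytes per person, consistent with the "over 2 Mbytes" quoted.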

An architecture was suggested with main servers connecting to a number of local servers and thence to client systems with smaller stores. The main server needs a full storage hierarchy, while the local server may have a storage hierarchy and CD storage. Clients will tend to have disk stores.

A terabyte is equivalent to:

Storage costs will continue to decrease quickly for all forms, with a 50% CAGR. In a year, magnetic disk will cost 45 cents per Mbyte. Read/write optical will cost 1/10th of that, CD-ROM 1/1000th, and tape 1/500th.
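The quoted ratios translate into per-terabyte costs as sketched below. The sketch assumes a decimal terabyte (10^6 Mbytes). The projection function additionally reads "50% CAGR" as the cost per Mbyte halving each year; that is only one possible reading of the figure, and the function name and parameters are illustrative.

```python
DISK_CENTS_PER_MBYTE = 45.0      # figure quoted in the talk
MBYTES_PER_TERABYTE = 10**6      # assumption: decimal terabyte

# Cost of each medium relative to magnetic disk, as quoted in the talk.
RELATIVE_COST = {
    "magnetic disk": 1.0,
    "read/write optical": 1 / 10,
    "CD-ROM": 1 / 1000,
    "tape": 1 / 500,
}


def dollars_per_terabyte(medium):
    """Cost in dollars of one terabyte on the given medium."""
    cents = DISK_CENTS_PER_MBYTE * RELATIVE_COST[medium] * MBYTES_PER_TERABYTE
    return cents / 100


def projected_disk_cents(years_later, annual_decline=0.5):
    """Disk cents per Mbyte after further years.

    Assumption: '50% CAGR' means the cost falls by half each year.
    """
    return DISK_CENTS_PER_MBYTE * (1 - annual_decline) ** years_later


for medium in RELATIVE_COST:
    print(f"{medium}: ${dollars_per_terabyte(medium):,.0f} per terabyte")
print(f"disk two years further on: {projected_disk_cents(2):.2f} cents per Mbyte")
```

If the figure was instead meant as 50% annual growth in capacity per dollar, the yearly cost factor would be 1/1.5 rather than 0.5, and the projection would fall more slowly.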