DL Project Suggestion: PetaPlex
Title: PetaPlex super storage system
- Number of people: 2-15
- Goal: for VT-PetaPlex-1 to go into production use on campus
- Contact information: Robert Akscyn (rma@ks.com), President of
Knowledge Systems Incorporated, developer of the PetaPlex technology;
the instructor; and others listed in the subprojects below
- Required background: ability to program networked applications in C and C++
- Description:
This is an exciting project to help
deploy the Virginia Tech super storage cluster for campus use.
The system consists of a powerful RS/6000 front end (1 GB RAM, 4 processors)
and the main PetaPlex unit itself, which has 100 nodes, each with a 233 MHz
Pentium, 64 MB of RAM, and a 25 GB disk, for a total capacity of 2.5 terabytes.
Robert Akscyn
will work with project groups during the semester.
Documentation should be available directly at
http://ks.com/vt/50.html
or indirectly from
ks.com/vt
under "Documents".
Particular efforts include the following (you can work on one or more):
- Help Ohm Sornil (osornil@vt.edu) extend his PhD work, which should soon
lead to a number of publications (which students may co-author) and probably
another dissertation. One effort is to integrate his inverted file technology
with MARIAN and run it on the PetaPlex as a production service.
Another is to carry out experiments
with information retrieval for 1 terabyte of text, to get performance figures
and tune the algorithms.
See O. Sornil, "A Distributed Inverted Index for a Large-Scale,
Dynamic Digital Library," Virginia Tech Computer Science, Blacksburg,
Ph.D. Dissertation Draft, 2000.
- Develop support for video on PetaPlex, working with Paul Mather
(paul@csgrad.cs.vt.edu),
who is funded by IBM to connect their VideoCharger software to PetaPlex
- Develop NFS and FTP services for PetaPlex
- Develop a Web server atop PetaPlex
- Develop quota and security controls for PetaPlex, so that multiple
applications can run without interfering with one another.
- Adapt bioinformatics algorithms to the PetaPlex, running on MPI.
One interesting
possibility is the work of Professor David Bevan (drbevan@vt.edu),
who has a particular interest in the parallel version of the Assisted
Model Building with Energy Refinement (AMBER) software.
- Connect PetaPlex with the Internet2 Distributed Storage
Infrastructure (I2-DSI)
- Set up PetaPlex to run the Storage Resource Broker (SRB) software from
San Diego Supercomputer Center, so it can support collections that
interoperate with other SRB collections around the world.
See http://www.npaci.edu/dice/srb and http://srb.npaci.edu/
- Develop support for images and GIS on PetaPlex
- Adapt a Web spider to PetaPlex, so we can index the Web.
Divide the sites to be indexed randomly among the 100 nodes,
and have each node work through its share one complete site at a time.
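The site-partitioning idea in the Web spider item can be sketched as follows. This is only an illustrative sketch in Python, not part of any PetaPlex software: the function name assign_sites, the example host names, and the choice of "shuffle, then deal round-robin" as the meaning of "randomly" are all assumptions. Each site goes to exactly one node, and each node's list is the order in which it would index its sites, one complete site at a time.

```python
import random

NUM_NODES = 100  # node count taken from the system description


def assign_sites(sites, num_nodes=NUM_NODES, seed=None):
    """Randomly divide whole sites among nodes (hypothetical helper).

    The sites are shuffled, then dealt out round-robin, so the
    assignment is random but node loads differ by at most one site.
    Returns a dict mapping node number -> ordered list of sites.
    """
    rng = random.Random(seed)
    shuffled = list(sites)
    rng.shuffle(shuffled)
    queues = {node: [] for node in range(num_nodes)}
    for i, site in enumerate(shuffled):
        queues[i % num_nodes].append(site)
    return queues


# Usage with made-up host names: 250 sites spread over the 100 nodes.
sites = [f"site{i}.example.org" for i in range(250)]
queues = assign_sites(sites, seed=0)

# Each node would then crawl its own queue in order, finishing one
# complete site before starting the next.
node_7_work = queues[7]
```

Dealing whole sites to nodes keeps all pages of a site on one node, at the cost of uneven load when site sizes differ greatly; weighting the assignment by estimated site size would be one possible refinement.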
- Students involved:
- Rohit Gupta, rogupta@csgrad.cs.vt.edu
- Palash Jain, pjain@csgrad.cs.vt.edu
- Abhishek Ram, aram@vt.edu