Electrotechnical Laboratory, Technical Report, TR-2001-4

Grid Data Farm for Petascale Data Intensive Computing

http://datafarm.apgrid.org/

Osamu Tatebe^1, Youhei Morita^2, Satoshi Matsuoka^3, Noriyuki Soda^5, Hiroyuki Sato^2, Yoshio Tanaka^1, Satoshi Sekiguchi^1, Yoshiyuki Watase^2, Masatoshi Imori^4 and Tomio Kobayashi^4

1. Electrotechnical Laboratory
2. High Energy Accelerator Research Organization (KEK)
3. Tokyo Institute of Technology/JST
4. The University of Tokyo
5. SRA Inc.

Introduction

High-performance, data-intensive computing and networking technology has become a vital part of large-scale scientific research projects in areas such as high energy physics, astronomy, space exploration and the human genome. One example is the Large Hadron Collider (LHC) project at CERN, where four major experiment groups will generate on the order of a Petabyte of raw data each year from four large underground particle detectors, with data acquisition starting in 2006. Grid technology will play an essential role in constructing world-wide data analysis environments in which thousands of physicists will collaborate and compete in particle physics data analysis at the energy frontier. A multi-tier "Regional Centers" world-wide computing model has been studied by the MONARC project. It consists of a Tier-0 center at CERN, multiple Tier-1 centers in participating continents, tens of Tier-2 centers in participating countries, and many Tier-3 centers in universities and institutes.

Grid Data Farm is a Petascale data-intensive computing project initiated in Japan. The project is a collaboration among KEK (High Energy Accelerator Research Organization), ETL/TACC (Electrotechnical Laboratory / Tsukuba Advanced Computing Center), the University of Tokyo, and Tokyo Institute of Technology (Titech). The challenge involves the construction of a data processing framework that will handle hundreds of Terabytes to Petabytes of data produced by the ATLAS experiment at the LHC. Both KEK and the University of Tokyo will collaborate on building a Tier-1 regional center in Japan. The underlying hardware will be a PC cluster on the scale of thousands of nodes, each node providing nearly a Terabyte of storage; incoming data arriving from CERN at a roughly continuous 600 Mbps will be systematically stored and subjected to intensive processing.

The Grid Data Farm will provide the following features for collider data processing, while also serving as a framework for other types of data-intensive scientific applications:

- Global distributed filesystem for Petabyte scale data,
- Parallel I/O and parallel processing for fast data analysis,
- World-wide group-oriented authentication and access control,
- Thousands-node, wide-area resource management and scheduling,
- Multi-tier data sharing and efficient access,
- Program sharing and management,
- System monitoring and administration,
- Fault tolerance, dynamic reconfiguration, and automated data regeneration or re-computation.

Major components of the Grid Data Farm are the Gfarm client, the Gfarm server and the Gfarm (distributed) filesystem with Gfarm parallel I/O. The Gfarm filesystem consists of a PC cluster on the scale of thousands of nodes, each node with a local disk and possibly distributed over the Grid; Petascale data are distributed across the disks of the Gfarm filesystem and managed by the Meta Data Management System and the Gfarm filesystem daemon.
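To make the distribution of a single logical file concrete, the following C sketch shows one plausible shape for the record that a metadata catalog of this kind could keep; every type and field name here is an assumption introduced for illustration, not an actual Gfarm data structure.

    /*
     * Illustrative sketch only: these type and field names are
     * assumptions made for exposition, not the actual Gfarm
     * metadata structures.
     */
    #include <stddef.h>

    /* One physical fragment of a logical file, stored on the
     * local disk of a cluster node. */
    struct gfarm_fragment {
        char   *hostname;    /* node that holds this fragment         */
        char   *local_path;  /* location of the fragment on that node */
        size_t  size;        /* fragment size in bytes                */
    };

    /* Metadata record kept by the Meta Data Management System
     * for one logical file. */
    struct gfarm_file_metadata {
        char                  *logical_name; /* location-independent name  */
        int                    nfragments;   /* degree of distribution     */
        struct gfarm_fragment *fragments;    /* physical file components   */
        struct gfarm_fragment *replicas;     /* replica catalog entries    */
        int                    nreplicas;
        char                  *history;      /* how to regenerate the data */
    };

A lookup by logical name would return the fragment list, from which a client can locate and operate on the physical components.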
The Meta Data Management System provides a mapping from logical file names to the distributed physical file components; it also stores metadata such as a replica catalog and the history necessary to reproduce the data. The Gfarm filesystem daemon provides remote file operations with access control, as well as remote program loading and resource monitoring. Large-scale distributed data are accessed through the Gfarm parallel I/O library and processed in parallel (a minimal usage sketch appears at the end of this section).

The Grid Data Farm middleware is based on Grid-based RPC (GridRPC), in particular an extended variant of our Ninf system, and on lower-level Grid service middleware, especially Globus. This makes it easy for users to register their analysis software and to process massive amounts of data spread over multiple nodes. Load balancing, job scheduling, fault tolerance, and data maintenance are handled transparently or semi-transparently by the system. Users can interact with the system through GUIs or a simple shell front end; more sophisticated client program interaction is possible with GridRPC.
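To illustrate how an application might touch such data, here is a minimal C client sketch assuming a parallel I/O interface along the lines just described; the header name, the gfarm_initialize and gfs_pio function names and signatures, and the error convention (NULL on success) are all illustrative assumptions rather than a definitive API reference.

    /*
     * A minimal client sketch, assuming a parallel I/O interface
     * along the lines sketched above.  The header name, the gfarm
     * and gfs_pio calls, their signatures, and the error convention
     * (NULL means success) are all illustrative assumptions.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <gfarm/gfarm.h>   /* assumed library header */

    int main(int argc, char **argv)
    {
        char buffer[8192];
        int nread;
        GFS_File f;            /* handle to a logical Gfarm file */
        char *e;               /* error message, NULL on success */

        gfarm_initialize(&argc, &argv);   /* contact the metadata system */

        /* Open a logical file; the library maps it to the physical
         * fragment stored on (or nearest to) this node. */
        e = gfs_pio_open("gfarm:/atlas/raw/run0001", GFARM_FILE_RDONLY, &f);
        if (e != NULL) {
            fprintf(stderr, "open: %s\n", e);
            exit(1);
        }

        /* Each parallel process reads and analyzes its own fragment. */
        while (gfs_pio_read(f, buffer, sizeof buffer, &nread) == NULL
               && nread > 0) {
            /* ... apply the registered analysis code to the buffer ... */
        }

        gfs_pio_close(f);
        gfarm_terminate();
        return 0;
    }

Running one such process per fragment lets each node read its own local disk, so the computation moves to where the data reside instead of pulling Petabytes across the network.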