Electrotechnical Laboratory, Technical Report, TR-2001-4

Grid Data Farm for Petascale Data Intensive Computing

http://datafarm.apgrid.org/

Osamu Tatebe^1, Youhei Morita^2, Satoshi Matsuoka^3, Noriyuki Soda^5, Hiroyuki Sato^2, Yoshio Tanaka^1, Satoshi Sekiguchi^1, Yoshiyuki Watase^2, Masatoshi Imori^4 and Tomio Kobayashi^4

1. Electrotechnical Laboratory
2. High Energy Accelerator Research Organization (KEK)
3. Tokyo Institute of Technology/JST
4. The University of Tokyo
5. SRA Inc.

Introduction

High-performance, data-intensive computing and networking technology has become a vital part of large-scale scientific research projects in areas such as high energy physics, astronomy, space exploration and the human genome. One example is the Large Hadron Collider (LHC) project at CERN, where four major experiment groups will generate on the order of a Petabyte of raw data each year from four large underground particle detectors, with data acquisition starting in 2006. Grid technology will play an essential role in constructing world-wide data analysis environments in which thousands of physicists will collaborate and compete in particle physics data analysis at the energy frontier. A multi-tier "Regional Centers" world-wide computing model has been studied by the MONARC project. It consists of a Tier-0 center at CERN, multiple Tier-1 centers in participating continents, tens of Tier-2 centers in participating countries, and many Tier-3 centers in universities and institutes.

Grid Data Farm is a Petascale data-intensive computing project initiated in Japan. The project is a collaboration among KEK (High Energy Accelerator Research Organization), ETL/TACC (Electrotechnical Laboratory / Tsukuba Advanced Computing Center), the University of Tokyo, and Tokyo Institute of Technology (Titech). The challenge involves the construction of a data processing framework that will handle hundreds of Terabytes to Petabytes of data produced by the ATLAS experiment at the LHC. Both KEK and the University of Tokyo will collaborate on building a Tier-1 regional center in Japan. The underlying hardware will be a PC cluster on the scale of thousands of nodes, each node providing nearly a Terabyte of storage; incoming data arriving from CERN at a roughly continuous 600 Mbps will be systematically stored and subjected to intensive processing.

The Grid Data Farm will provide the following features for collider data processing, while also serving as a framework for other types of data-intensive scientific applications:

- Global distributed filesystem for Petabyte scale data,
- Parallel I/O and parallel processing for fast data analysis,
- World-wide group-oriented authentication and access control,
- Thousands-node, wide-area resource management and scheduling,
- Multi-tier data sharing and efficient access,
- Program sharing and management,
- System monitoring and administration,
- Fault tolerance, dynamic reconfiguration, and automated data regeneration or re-computation.

Major components of the Grid Data Farm are the Gfarm client, the Gfarm server and the Gfarm (distributed) filesystem with Gfarm parallel I/O. The Gfarm filesystem consists of a PC cluster on the scale of thousands of nodes, each node with a local disk and possibly distributed over the Grid; Petascale data are distributed across the disks of the Gfarm filesystem and managed by the Meta Data Management System and the Gfarm filesystem daemon.
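To make the distribution of a single logical file concrete, the following C sketch shows one plausible shape for the record that a metadata catalog of this kind could keep; every type and field name here is an assumption introduced for illustration, not an actual Gfarm data structure.

    /*
     * Illustrative sketch only: these type and field names are
     * assumptions made for exposition, not the actual Gfarm
     * metadata structures.
     */
    #include <stddef.h>

    /* One physical fragment of a logical file, stored on the
     * local disk of a cluster node. */
    struct gfarm_fragment {
        char   *hostname;    /* node that holds this fragment         */
        char   *local_path;  /* location of the fragment on that node */
        size_t  size;        /* fragment size in bytes                */
    };

    /* Metadata record kept by the Meta Data Management System
     * for one logical file. */
    struct gfarm_file_metadata {
        char                  *logical_name; /* location-independent name  */
        int                    nfragments;   /* degree of distribution     */
        struct gfarm_fragment *fragments;    /* physical file components   */
        struct gfarm_fragment *replicas;     /* replica catalog entries    */
        int                    nreplicas;
        char                  *history;      /* how to regenerate the data */
    };

A lookup by logical name would return the fragment list, from which a client can locate and operate on the physical components.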
The Meta Data Management System provides a mapping from logical file names to the distributed physical file components; it also stores metadata such as a replica catalog and the history necessary to reproduce the data. The Gfarm filesystem daemon provides remote file operations with access control, as well as remote program loading and resource monitoring. Large-scale distributed data are accessed through the Gfarm parallel I/O library and processed in parallel (a minimal usage sketch appears at the end of this section).

The Grid Data Farm middleware is based on Grid-based RPC (GridRPC), in particular an extended variant of our Ninf system, and on lower-level Grid service middleware, especially Globus. This makes it easy for users to register their analysis software and to process massive amounts of data spread over multiple nodes. Load balancing, job scheduling, fault tolerance, and data maintenance are handled transparently or semi-transparently by the system. Users can interact with the system through GUIs or a simple shell front end; more sophisticated client program interaction is possible with GridRPC.
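To illustrate how an application might touch such data, here is a minimal C client sketch assuming a parallel I/O interface along the lines just described; the header name, the gfarm_initialize and gfs_pio function names and signatures, and the error convention (NULL on success) are all illustrative assumptions rather than a definitive API reference.

    /*
     * A minimal client sketch, assuming a parallel I/O interface
     * along the lines sketched above.  The header name, the gfarm
     * and gfs_pio calls, their signatures, and the error convention
     * (NULL means success) are all illustrative assumptions.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <gfarm/gfarm.h>   /* assumed library header */

    int main(int argc, char **argv)
    {
        char buffer[8192];
        int nread;
        GFS_File f;            /* handle to a logical Gfarm file */
        char *e;               /* error message, NULL on success */

        gfarm_initialize(&argc, &argv);   /* contact the metadata system */

        /* Open a logical file; the library maps it to the physical
         * fragment stored on (or nearest to) this node. */
        e = gfs_pio_open("gfarm:/atlas/raw/run0001", GFARM_FILE_RDONLY, &f);
        if (e != NULL) {
            fprintf(stderr, "open: %s\n", e);
            exit(1);
        }

        /* Each parallel process reads and analyzes its own fragment. */
        while (gfs_pio_read(f, buffer, sizeof buffer, &nread) == NULL
               && nread > 0) {
            /* ... apply the registered analysis code to the buffer ... */
        }

        gfs_pio_close(f);
        gfarm_terminate();
        return 0;
    }

Running one such process per fragment lets each node read its own local disk, so the computation moves to where the data reside instead of pulling Petabytes across the network.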