Implementing Relational Grid Monitoring Architecture (R-GMA)
provided by European Data Grid (EDG)
to Prototype a Knowledge Discovery Infrastructure

Frank Wang, John Gordon, Na Helian and Robert Allan

August 27, 2003

Abstract

The Grid has developed standards-based infrastructures, services and platforms relevant to data mining. For example, the European Data Grid (EDG) provides a Relational Grid Monitoring Architecture (R-GMA) for distributed resources. R-GMA exposes a relational model with SQL support, supplying both static and dynamic information about Grid resources and supporting application monitoring. The relational model is flexible and allows complex queries that draw on information in multiple objects. R-GMA makes information from Producers available to Consumers as relations (tables). This paper describes the implementation in SQL of a ScanOnce algorithm for quick association rule mining, and the development of a data mining infrastructure, JetGrid. Because JetGrid operates on top of R-GMA, its architecture is designed to be compatible with lower-level grid mechanisms. A preliminary prototype of JetGrid for quick knowledge discovery has been built; it is extensible, using an object-oriented design coded in C++. Mining agents will be staged to one or more computing elements on the EDG; the agents also perform data acquisition. Intensive tests are being carried out using GridFTP and other Globus transport mechanisms under two models: shipping raw data D across the grid to one node for processing (MD: Move Data Model), and processing the data locally until a result R is obtained and then shipping the result to one node for further processing (MR: Move Results Model). These tests give some indication of the performance of the JetGrid data mining infrastructure on top of EDG.
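
The difference between the two models can be illustrated with a small sketch. It is not the paper's ScanOnce algorithm or JetGrid code: it is a generic single-pass itemset counter in Python, with hypothetical function names (`count_itemsets`, `move_data`, `move_results`, `rules`) and made-up transactions, and it replaces R-GMA tables and GridFTP transfers with in-memory lists. Under those assumptions, it shows that counting on one node after moving the data (MD) and merging locally computed counts (MR) give the same supports, from which association rules can then be derived.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size=2):
    """Count every itemset of up to max_size items in one pass over the data."""
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))  # canonical order so keys are comparable
        for size in range(1, max_size + 1):
            for itemset in combinations(unique, size):
                counts[itemset] += 1
    return counts


def move_data(partitions):
    """MD (assumed reading): gather all raw transactions on one node, then count."""
    all_rows = [row for part in partitions for row in part]
    return count_itemsets(all_rows)


def move_results(partitions):
    """MR (assumed reading): count on each node, ship only the counts, then merge."""
    merged = Counter()
    for part in partitions:
        merged.update(count_itemsets(part))  # Counter.update adds counts
    return merged


def rules(counts, n_transactions, min_support, min_confidence):
    """Derive rules lhs -> rhs from pair counts, in both directions."""
    found = []
    for itemset, n in counts.items():
        support = n / n_transactions
        if len(itemset) != 2 or support < min_support:
            continue
        a, b = itemset
        for lhs, rhs in ((a, b), (b, a)):
            confidence = n / counts[(lhs,)]
            if confidence >= min_confidence:
                found.append((lhs, rhs, support, confidence))
    return sorted(found)


# Hypothetical data: two "nodes", each holding a few transactions.
node_a = [["bread", "milk"], ["bread", "butter", "milk"]]
node_b = [["milk"], ["bread", "butter"]]
partitions = [node_a, node_b]

md_counts = move_data(partitions)
mr_counts = move_results(partitions)
found_rules = rules(mr_counts, n_transactions=4,
                    min_support=0.5, min_confidence=0.6)
```

In this sketch the two models differ only in what crosses the network: MD moves every transaction, while MR moves one table of counts per node. Which is cheaper in practice depends on data volume and transport cost, which is what the tests described above are intended to measure.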