The information technology industry in general, and data mining technology in particular, has lingered too long in an era of over-specialization in which integration is just another specialty. We have made tremendous progress in almost every aspect of computing through these specializations. Some components are smaller; others are faster, cheaper, more easily connectable, more precise, or higher in capacity. But how do we deal with the complexity of the entire environment as a system, a complexity generated by all that "smaller/faster/cheaper" focus and expressed in the heterogeneity and dense interconnectivity of powerful software, hardware, and data components? Although this question applies to many IT disciplines, we concentrate on solutions for the data-mining domain.
This paper focuses on the complexity problem of the data-mining infrastructure and proposes a layered architecture that uses Grid technology to overcome it. By hiding the complexity of both data and software components within the Grid-based architecture, a more user-oriented data mining system can be built. The user will have a powerful, intelligent infrastructure for specifying data mining problems at a more abstract level, without needing to know the specific characteristics of the components required for data mining. The proposed architecture will allow users to concentrate on what they want to accomplish rather than on the technical details of tuning a computing system to execute data mining models. In a Grid-based infrastructure, the exchange and integration of data and tasks (tools, libraries, device drivers, middleware, etc.) can be implemented through open standards that define how components are identified, how they communicate, and how they negotiate with one another. The Grid-based distributed data mining approach is an attempt to build an infrastructure that offers a consistent, user-friendly interface and control while integrating a distributed, heterogeneous environment with complex interconnectivity.
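As a minimal illustration of the idea, and not the architecture proposed in this paper, the following Python sketch shows how a user request stated at an abstract level ("cluster the sales data") might be resolved to concrete components through identification, a communication protocol check, and a simple negotiation step. Every name here (`Component`, `Registry`, `negotiate`, `plan`, the `urn:example:` identifiers, the `cost` field, and the choice of lowest cost as the negotiation rule) is a hypothetical assumption introduced only for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """A data or software resource advertised on the grid (hypothetical schema)."""
    component_id: str        # identification of the component
    kind: str                # "data" or "task"
    capabilities: frozenset  # what the component offers, e.g. {"clustering"}
    protocol: str            # communication protocol the component speaks
    cost: float              # abstract cost used in the negotiation step


@dataclass
class Registry:
    """Hypothetical directory that hides component details from the user."""
    components: list = field(default_factory=list)

    def register(self, component):
        self.components.append(component)

    def negotiate(self, kind, capability, protocols):
        """Return the cheapest compatible component, or None if none matches."""
        candidates = [
            c for c in self.components
            if c.kind == kind
            and capability in c.capabilities
            and c.protocol in protocols
        ]
        return min(candidates, key=lambda c: c.cost, default=None)


def plan(registry, goal, dataset, protocols):
    """Resolve an abstract request into concrete component identifiers."""
    task = registry.negotiate("task", goal, protocols)
    data = registry.negotiate("data", dataset, protocols)
    if task is None or data is None:
        return None  # the request cannot be satisfied with known components
    return {"task": task.component_id, "data": data.component_id}


registry = Registry()
registry.register(Component("urn:example:kmeans-a", "task", frozenset({"clustering"}), "soap", 3.0))
registry.register(Component("urn:example:kmeans-b", "task", frozenset({"clustering"}), "soap", 1.5))
registry.register(Component("urn:example:sales-db", "data", frozenset({"sales"}), "soap", 1.0))

# The user states only the goal and the dataset, not which component to use.
result = plan(registry, goal="clustering", dataset="sales", protocols={"soap"})
print(result)
```

In this sketch the user never names a specific tool or data source; the registry selects `urn:example:kmeans-b` over `urn:example:kmeans-a` because of its lower assumed cost. A real Grid deployment would replace the in-memory list with standardized discovery and negotiation services.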