National Center for Data Mining Research Projects

Current Projects

The Teraflow Testbed.   The Teraflow Testbed is an infrastructure designed to use new 10 Gb/s network protocols and data services over long-haul, high-performance networks.

Angle.   The Angle Anomaly Detection project was created to monitor network traffic and to detect unknown anomalous events. Network health is monitored by capturing packets at multiple Internet locations and detecting new traffic trends and sudden changes.

Sector.   Sector is system infrastructure software for distributed data storage, access, and analysis/processing. It automatically manages large volumes of data across servers or clusters, including those connected by wide area high-speed networks.

SIDGrid.   The Social Informatics Data (SID) Grid enables researchers to collect real-time multimodal behavioral data at multiple time scales. Multimedia data (voice, video, images, text, numerical) is stored in a distributed data warehouse that employs Web and Grid services to support data storage, access, exploration, annotation, integration, analysis, and mining of individual and combined data sets. The SIDGrid project includes development of a user interface that supports access to the data regardless of the user's geographic location, and that couples with existing Grid technology.

Cistrack.   The Cistrack system supports studies of the dynamics of transcriptional networks on physiological, developmental, and evolutionary time scales. We are developing a comprehensive system that directs the data management, computing, simulation, and portal aspects of these studies. We have integrated a wide variety of biological information about transcriptional networks and experimental results, including gene expression, biomolecular interactions, physiological states, and spatio-temporal dynamics. We have also developed a preliminary data management and computational infrastructure to manage experimental data about transcriptional networks. We maintain a project-wide repository of transcriptional network data and an accompanying portal that is accessible to the research community. We plan to develop models to study the behavior of transcriptional regulatory networks during physiological stress, during cellular and organismal development, and during the evolution of species.

Ray.   Ray serves data using the Ferris-wheel method. In a Ferris-wheel system, multiple client queries are synchronized to access the same sectors on disk simultaneously, so that an arbitrary number of parallel queries to the same dataset can be executed without asymptotic degradation in performance.
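
The Ferris-wheel idea can be illustrated with a minimal Python sketch. It is not Ray's implementation: the names `Query` and `ferris_wheel_scan`, the in-memory list of blocks standing in for disk sectors, and the choice of a summing query are all assumptions made for illustration. One circular scan reads each block once per rotation, every active query consumes that block, and a query that joins mid-rotation finishes after it has seen every block once.

```python
from dataclasses import dataclass


@dataclass
class Query:
    """Hypothetical client query that sums every value in the dataset."""
    name: str
    blocks_seen: int = 0
    total: int = 0


def ferris_wheel_scan(blocks, arrivals):
    """Serve all queries with one shared circular scan over `blocks`.

    `arrivals` maps a non-negative step number to the queries joining at
    that step. Returns (results, block_reads), where block_reads counts
    physical reads rather than per-query reads.
    """
    n = len(blocks)
    active, results = [], {}
    block_reads = 0
    pending = sum(len(qs) for qs in arrivals.values())
    step = 0
    while active or pending:
        joining = arrivals.get(step, [])
        active.extend(joining)          # new riders board at the current position
        pending -= len(joining)
        if active:
            block = blocks[step % n]    # a single read shared by all active queries
            block_reads += 1
            block_sum = sum(block)
            for q in active:
                q.total += block_sum
                q.blocks_seen += 1
            for q in active:
                if q.blocks_seen == n:  # full rotation completed
                    results[q.name] = q.total
            active = [q for q in active if q.blocks_seen < n]
        step += 1
    return results, block_reads


blocks = [[1, 2], [3, 4], [5, 6], [7, 8]]
arrivals = {0: [Query("a")], 2: [Query("b"), Query("c")]}
results, block_reads = ferris_wheel_scan(blocks, arrivals)
print(results, block_reads)
```

In this run three queries complete with 6 block reads, where three independent scans of the 4 blocks would need 12. The read count depends on when queries arrive, not on how many share the wheel.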

UDT.   UDT is an application-level data transport protocol for emerging distributed data-intensive applications over wide area high-speed networks. UDT uses UDP to transfer bulk data and provides its own reliability control and congestion control mechanisms. The protocol is intended not only for private or QoS-enabled links, but also for shared networks. UDT is also a highly configurable framework that can accommodate various congestion control algorithms.
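
As a rough illustration of what a framework that accommodates various congestion control algorithms can look like, the Python sketch below defines a small plug-in interface. It is not UDT's API: the class names `CongestionControl`, `AIMD`, and `FixedRate`, the `on_ack`/`on_loss` hooks, and the `replay` helper are hypothetical, and the additive-increase/multiplicative-decrease rule is a generic textbook scheme, not a statement about UDT's own algorithm.

```python
from abc import ABC, abstractmethod


class CongestionControl(ABC):
    """Hypothetical plug-in interface; the transport only calls the hooks."""

    def __init__(self, initial_rate=1.0):
        self.rate = initial_rate  # sending rate, in arbitrary units

    @abstractmethod
    def on_ack(self):
        """Called when an acknowledgement arrives."""

    @abstractmethod
    def on_loss(self):
        """Called when packet loss is reported."""


class AIMD(CongestionControl):
    """Generic additive-increase, multiplicative-decrease controller."""

    def __init__(self, initial_rate=1.0, increase=1.0, decrease=0.5):
        super().__init__(initial_rate)
        self.increase = increase
        self.decrease = decrease

    def on_ack(self):
        self.rate += self.increase

    def on_loss(self):
        self.rate *= self.decrease


class FixedRate(CongestionControl):
    """Controller that ignores feedback, e.g. for a dedicated link."""

    def on_ack(self):
        pass

    def on_loss(self):
        pass


def replay(controller, events):
    """Feed "ack"/"loss" events to a controller; return the rate after each."""
    rates = []
    for event in events:
        if event == "ack":
            controller.on_ack()
        elif event == "loss":
            controller.on_loss()
        else:
            raise ValueError(f"unknown event: {event!r}")
        rates.append(controller.rate)
    return rates


events = ["ack", "ack", "loss", "ack"]
print(replay(AIMD(initial_rate=4.0), events))
print(replay(FixedRate(initial_rate=2.0), events))
```

Because the transport loop touches only the two hooks, swapping one algorithm for another requires no change to the code that sends data.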


Past Projects

DataSpace.   The web today provides an infrastructure for working with distributed multimedia documents. DataSpace is an infrastructure for creating a web of data instead of documents. DataSpace is designed to provide an infrastructure for the analysis and mining of business, e-business, scientific, engineering, and health care data.

DSTP.   Data is accessed in DataSpace using a protocol called the DataSpace Transfer Protocol (DSTP). One of the NCDM/LAC projects is to create a vendor-supported standard for working with remote and distributed data over the web. DSTP is the first version of this standard. Mercury is an open source DSTP server for putting data into data webs. Mercury is designed to work with small to medium size data sets over commodity and high performance networks. Jupiter is a high performance DSTP server designed for working with large data sets over high performance and optical networks.

PSockets and SABUL.   Designing network applications whose end-to-end performance scales to high performance SONET and emerging optical networks is a challenge. PSockets and SABUL are open source libraries for building network applications that scale in this way.

Data Web Applications.   The NCDM/LAC has prototyped a variety of data web applications, including applications for the remote analysis, distributed mining, and interactive exploration of earth science data, bioinformatics data, web data, health care data, as well as other scientific, engineering, medical and business data sets.

telephone (312) 996-0305
e-mail staff@teraflowtestbed.net
address 700 SEO MC 249, 851 S. Morgan St. Chicago, IL. 60607