National Center for Data Mining Welcome

November 2009

SC09: NCDM wins the Bandwidth Challenge Competition.

The NCDM/iCAIR/NRL team demonstrated three applications to show efficient bandwidth utilization in distributed data-intensive applications. The first demo processed very large datasets on 256 servers in 4 data centers connected by wide-area high-speed networks. The data analysis application exchanged data at over 100Gb/s among participating nodes. This application uses the open source software Sector/Sphere and UDT, developed by NCDM. The second demo was a cloud-based image rendering application that delivered very high resolution visualization (computed by remote cloud systems) over long-distance InfiniBand and IPv6. A hardware implementation of UDT was deployed to support the long-distance InfiniBand protocol. The third demo showcased a lightweight UDT variant called UDX, which can transfer data at 9.xGb/s using a single connection over a 10Gb/s network with 200ms RTT. Overall, our team achieved 25Gb/sec sustained throughput over a 200ms RTT, 12,000 mile path utilizing only seven servers on the SC09 floor.
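
To give a sense of why a single connection over a 10Gb/s, 200ms RTT path is demanding, the following is a rough back-of-the-envelope sketch in Python. It is illustrative arithmetic only and is not part of the demo software. The function names are hypothetical, decimal units are assumed, and the 64 KiB window is an example value rather than a measurement of any system.

```python
def bandwidth_delay_product_bytes(rate_gbps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep a path of this rate and RTT full."""
    bits_in_flight = rate_gbps * 1e9 * (rtt_ms / 1000.0)
    return bits_in_flight / 8


def window_limited_throughput_gbps(window_bytes: float, rtt_ms: float) -> float:
    """Upper bound on throughput when at most one window is sent per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000.0) / 1e9


# Path figures quoted above: 10Gb/s network, 200ms RTT.
bdp = bandwidth_delay_product_bytes(10, 200)
print(f"{bdp / 1e6:.0f} MB in flight")  # 250 MB in flight

# Hypothetical 64 KiB window (an illustrative value, not taken from the demo).
small_window = window_limited_throughput_gbps(64 * 1024, 200)
print(f"{small_window * 1000:.2f} Mb/s")  # 2.62 Mb/s
```

Under these assumptions, a sender needs roughly 250 MB of data in flight to fill the path, which is why transports tuned for short round trips tend to fall far short of line rate on such links.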

SC09: Sphere TeraSort performance visualization

NCDM's Sphere platform was demonstrated at the SuperComputing 2009 conference running a TeraSort HPC benchmark.

SC09: Canopy visualization

Canopy, the virtual network management library developed by NCDM, was demonstrated at the SuperComputing 2009 conference.

SC09: UDXnet BWC visualization

NCDM's UDT high performance network protocol was demonstrated across a 12,000 mile network at the SuperComputing 2009 conference.

WTTW 11 Chicago presents a segment on Cloud Computing

'Chicago has become a world center of "cloud computing." As we continue our Chicago Matters: Beyond Burnham series, Rich Samuels explains what "cloud computing" is and how you probably already use it on a daily basis.' Video link.

Summer 2009 Undergraduate Research Opportunity in Computer Science

Sponsored by The National Center for Data Mining at UIC & The National Science Foundation.

The National Center for Data Mining at UIC is hosting a Research Experience for Undergraduates (REU). Students will work on research projects using clouds for high performance computing, for applications in genomics and systems biology.

Information: reu-2009.pdf
Application: reu-2009-application.pdf

Sterling Commerce adopts UDT

Sterling Commerce, an AT&T Inc (NYSE: T) company, today announced Sterling File Accelerator (SFA). SFA combines the power of the company's Connect:Direct point-to-point file transfer software optimised for high-volume, secure, assured delivery of files with a new UDP Data Transfer-based file transport (UDT) - an application-level data transport protocol that overcomes the latency issues associated with transmission control protocol (TCP)-based transmissions.

source: iTWire.com.

UDT was developed by the National Center for Data Mining.


December Talk

U.K. e-Science and Exploiting Research Data

Speaker: Dr. Malcolm Atkinson

Friday, December 5, 2008
1-2 pm
Location: 636 SEO

ABSTRACT

In 2000 the U.K. coined the word "e-Science" for a long-established research strategy: making the best use of advances in computing science to enable new research methods. It recognized this as a two way dynamic process and placed emphasis on advances in distributed computing and on exploiting the opportunities delivered by the growing bonanza of data in all fields of research. I will argue that this combination requires new architectures and will discuss experiences of using data streaming architectures in the OGSA-DAI product and the ADMIRE research project.

Short Biography:

Malcolm Atkinson is Director of the e-Science Institute. He is the UK e-Science Envoy and plays a leading role in the Open Middleware Infrastructure Institute UK, is on the advisory boards of the National Grid Service, the National Centre for e-Social Science, and Baltic Grid. He led the EU IST project 'International Collaboration to Extend and Advance Grid Education' (ICEAGE). This project organized the International Summer School on Grid Computing (ISSGC) and he chaired the Programme Committee for ISSGC'06, ISSGC'07 and ISSGC'08. He is a member of the Joint Information Systems Committee Board and JISC Support of Research Committee. He is a representative of the UK at the e-Infrastructure Reflection Group.

He led the development of the Department of Computing Science in Glasgow and is now Professor of e-Science in the School of Informatics, University of Edinburgh. He has more than 130 publications. His current research is concerned with data integration and its exploitation. He is currently the lead architect on an EU Framework Programme 7 project called Advanced Data Mining and Integration Research for Europe (ADMIRE).

Hosted by:


November 2008

NCDM receives SC|08 Conference Bandwidth Challenge Award.

AUSTIN, Texas, Nov. 20 -- SC08 -- The National Center for Data Mining (NCDM) at UIC and the Open Cloud Consortium won the SC08 Bandwidth Challenge award today in Austin.

Their entry was titled "Towards Global Scale Cloud Computing: Using Sector and Sphere on the Open Cloud Testbed" and was led by Dr. Yunhong Gu of the University of Illinois at Chicago and Dr. Robert Grossman of the University of Illinois at Chicago and Open Data Group.

Although cloud computing is common today, data processing in clouds is almost always done within a single datacenter because of the technical challenges of processing data across multiple datacenters. The team today demonstrated, for the first time, technology that enables cloud computing to utilize high performance networks and to spread across datacenters to create wide area clouds. The technology that makes this possible is the open source Sector storage cloud and Sphere compute cloud developed by the NCDM.

For this challenge, NCDM used the Open Cloud Testbed, which is managed by the Open Cloud Consortium. The Open Cloud Consortium develops standards for computing within clouds and frameworks for interoperating between clouds.

"A whole new generation of cloud computing is now possible using the open source Sector storage cloud and the Sphere computing cloud and standards developed by the Open Cloud Consortium. For the first time, developing applications that span multiple distributed clouds is now possible," according to Robert Grossman.

According to Joe Mambretti, director of the International Center of Advanced Internet Research at Northwestern University and co-director of the Open Cloud Testbed, "These innovative technologies provide unique capabilities that will enable new generations of applications based on extremely large scale data streams."

During the Bandwidth Challenge at SC08, the team demonstrated three applications that used the Sector/Sphere cloud. The first application transported bioinformatics data using Sector from the conference floor in Austin to Kitakyushu in Japan at over 8 Gb/s.

The second application demonstrated was Creditstone, which is a benchmark for financial services applications. The Sector/Sphere implementation of Creditstone processed about 53.5 billion synthetic credit card transaction records in less than 1 hour.

The third application was TeraSort, which sorted 1 terabyte of data within 30 minutes. The average data moving rate was about 4.8Gb/s in the Open Cloud Testbed, with a peak speed reaching 10Gb/s.
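
As a rough sanity check on the Creditstone and TeraSort figures above, the sketch below works through the arithmetic in Python. It is not taken from the Sector/Sphere code. The function names are hypothetical, decimal units are assumed (1 terabyte = 1e12 bytes), and the TeraSort estimate assumes each byte crosses the network once, which the announcement does not state.

```python
def transfer_time_minutes(data_bytes: float, rate_gbps: float) -> float:
    """Minutes needed to move data_bytes once at a steady rate in Gb/s."""
    return data_bytes * 8 / (rate_gbps * 1e9) / 60


def min_records_per_second(records: float, max_seconds: float) -> float:
    """Lower bound on the processing rate if the job finished within max_seconds."""
    return records / max_seconds


# TeraSort: 1 terabyte moved once at the reported 4.8Gb/s average rate.
print(f"{transfer_time_minutes(1e12, 4.8):.1f} minutes")  # 27.8 minutes

# Creditstone: about 53.5 billion records in less than 1 hour.
rate = min_records_per_second(53.5e9, 3600)
print(f"{rate / 1e6:.1f} million records/s")  # 14.9 million records/s
```

Under these assumptions, the reported average rate is consistent with sorting 1 terabyte within 30 minutes, and the Creditstone run corresponds to at least about 14.9 million records per second.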

One of the key achievements of the Sector and Sphere software is that it is very easy to use. For example, the TeraSort code only requires about 50 lines of C++ code. This is critical, as it allows researchers to use their time to focus on research problems, rather than spending time dealing with distributed programming.

According to Yunhong Gu, "Sphere is a new software system that supports simplified distributed data processing application development. In contrast to traditional distributed computing methods such as MPI, Sphere allows users to write distributed applications with a few lines of code and without knowing the details of the underlying hardware."

Source: HPC Wire

The National Center for Data Mining (NCDM) at the University of Illinois at Chicago (UIC) was established in 1998 to serve as a resource for research, standards development, and education for high performance and distributed data mining and predictive modeling.

The NCDM is supported, in part, by the National Science Foundation, the Chicago Bioinformatics Consortium, the Department of Defense, and the University of Illinois at Chicago, as well as by other funding agencies and NCDM's industrial partners.

NCDM comprises the Laboratory for Advanced Computing (LAC), the Laboratory for Machine Learning and Data Mining (MLDM), Prof. Leland Wilkinson's group and Prof. Philip Yu's group.

The Center's recent projects:

  • Teraflow Testbed - distributed infrastructure designed to use new 10 Gb/s network protocols and data services.
  • Sector - infrastructure software providing distributed data storage, access and analysis/processing functionality.
  • Angle - network monitoring software to detect anomalous network events across multiple monitoring sites.
  • SidGrid - social informatics data collection and collaborative analysis software utilizing web and grid services.
  • largedataarchive - hosts a variety of large data sets for use by the larger research community.
  • UDT - application level data transport protocol for emerging distributed data-intensive applications over wide area high-speed networks.

The Center focuses on three research areas:

  • Scaling algorithms, applications and systems to massive data sets.
  • Developing algorithms, applications, and systems for mining distributed data.
  • Establishing standard languages, protocols, and services for data mining and predictive modeling.

The NCDM is a co-founding member of the Data Mining Group (DMG), which develops the Predictive Model Markup Language (PMML) and related standards.

Recent News and Awards page.


Groups within NCDM

Laboratory for Advanced Computing
Director : Prof. Robert Grossman

Machine Learning & Data Mining
Director : Prof. Bing Liu

Prof. Leland Wilkinson's Group

Prof. Philip Yu's Group

telephone (312) 996-0305
e-mail staff@teraflowtestbed.net
address 700 SEO MC 249, 851 S. Morgan St., Chicago, IL 60607