Recent News
November 2007
NCDM receives SC|07 Conference Award.
First Place at the SC 07 Analytics Challenge Awarded to ANGLE, the New
Approach for Protecting Cyber-infrastructure
Reno, NV, November 15, 2007 - A new approach for protecting
cyber-infrastructure won first place at the Third Annual Analytics
Challenge at the SC 2007 conference in Reno, NV.
Cyber-infrastructure refers to the Internet-based infrastructure that
allows businesses, consumers and the government to use the Internet and
Internet-based applications. There is a growing awareness that
protecting cyber-infrastructure from interference by criminals and other
threats is becoming a national priority.
A team led by the National Center for Data Mining (NCDM) at the
University of Illinois at Chicago, and including participants from
Northwestern University, the University of Chicago, Argonne National
Laboratory, and the University of Southern California, developed Angle,
an application for protecting cyber-infrastructure.
Given the high volume of the data that is transported over the Internet,
methods for identifying attacks on cyber-infrastructure can produce so
many alerts that analysts monitoring the infrastructure are often
overwhelmed. In these circumstances, it is common for analysts to miss
new behavior that might be the beginning of new types of attacks. The
Angle application developed by the team introduced a new algorithm for
identifying possibly malicious activity for further study.
Since the Internet is distributed, so is the data that must be analyzed
to protect it. With today's supercomputers, the data must be collected,
transported to the supercomputer, and then transported back. For large
data, the time required to do this can be a significant fraction of the
total time required by the analysis.
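A rough estimate shows why moving the data can dominate. The sketch below assumes a hypothetical 1 TB dataset that is shipped to a central supercomputer and back, 2 hours of actual analysis, decimal units, and ideal sustained rates with no protocol overhead. The function names are illustrative only.

```python
TB = 10**12  # decimal terabyte, in bytes

def transfer_seconds(size_bytes: float, rate_bps: float) -> float:
    """Ideal one-way transfer time at a sustained rate (no protocol overhead)."""
    return size_bytes * 8 / rate_bps

def transfer_fraction(transfer_s: float, compute_s: float) -> float:
    """Share of the end-to-end time spent moving data rather than computing."""
    return transfer_s / (transfer_s + compute_s)

# Hypothetical scenario: 1 TB shipped to a central supercomputer and back,
# with 2 hours of analysis in between.
COMPUTE_S = 2 * 3600
for label, rate in [("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9)]:
    moving = 2 * transfer_seconds(TB, rate)  # there and back
    share = transfer_fraction(moving, COMPUTE_S)
    print(f"{label:>8}: {moving / 3600:5.1f} h moving data, {share:.0%} of total")
```

At 1 Gbps, for example, the round trip alone takes 16,000 seconds (about 4.4 hours), more than twice the assumed two hours of computation.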
One of the innovations of the Angle project was the use of a data and
compute cloud so that the data could be left in place and computation
performed over the data. Although companies such as Google, Yahoo, Amazon
and Microsoft have used cloud computing over the past several years to
provide their services, these cloud infrastructures are, by and large,
based on the standard Internet. In contrast, the Sector data cloud used by
the Angle Project is a second-generation data cloud based on wide area
high performance networks. These high performance
networks enabled the large data sets produced by the project to be
handled easily.
"Winning the Analytics Challenge shows the potential that second
generation data and compute clouds have for changing the way we manage
and compute with large distributed data," said Robert Grossman, Director
of the National Center for Data Mining (NCDM) at the University of
Illinois at Chicago and Managing Partner of Open Data Group.
The Angle Project was sponsored in part by CDAR, a Chicago-based
research consortium that is developing new technologies and
methodologies for analyzing large, complex and distributed data.
The National Center for Data Mining has led teams that won two of
the first three Analytics Challenges (at SC 05 and SC 07).
Second Generation Data Cloud Announced at SC07
Reno, NV, November 12, 2007 - A second generation Data Cloud called Sector
was announced this week at the SC 2007 conference in Reno, NV.
Cloud computing is a critical piece of the infrastructure that allows companies
such as Google, Yahoo, Amazon and Microsoft to provide their services.
A cloud provides computing resources or services over the Internet. A storage
cloud provides storage services; a data cloud provides data management
services; and a computing cloud provides computational services. Often these
are layered to create a stack of cloud services that provide a computing
platform for developing cloud-based applications.
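The layering just described can be sketched as three toy in-process classes, each built on the one below it. The class and method names (StorageCloud, DataCloud, ComputeCloud, put, get, add_record, map) are hypothetical and do not correspond to Sector's API or any provider's; a real cloud spreads each layer across many machines.

```python
from typing import Callable, Dict, List

class StorageCloud:
    """Storage layer: stores and returns raw bytes under opaque keys."""
    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, blob: bytes) -> None:
        self._blobs[key] = blob

    def get(self, key: str) -> bytes:
        return self._blobs[key]

class DataCloud:
    """Data layer: manages named datasets of records on top of a storage cloud."""
    def __init__(self, storage: StorageCloud) -> None:
        self.storage = storage
        self._catalog: Dict[str, List[str]] = {}  # dataset name -> storage keys

    def add_record(self, dataset: str, record: str) -> None:
        keys = self._catalog.setdefault(dataset, [])
        key = f"{dataset}/{len(keys)}"
        self.storage.put(key, record.encode("utf-8"))
        keys.append(key)

    def records(self, dataset: str) -> List[str]:
        return [self.storage.get(k).decode("utf-8") for k in self._catalog.get(dataset, [])]

class ComputeCloud:
    """Compute layer: applies a user-supplied function to each record of a dataset."""
    def __init__(self, data: DataCloud) -> None:
        self.data = data

    def map(self, dataset: str, fn: Callable[[str], object]) -> List[object]:
        return [fn(r) for r in self.data.records(dataset)]

# Build the stack bottom-up, then compute over data without handling storage keys.
storage = StorageCloud()
data = DataCloud(storage)
compute = ComputeCloud(data)
for line in ["10.0.0.1 GET /", "10.0.0.2 POST /login"]:
    data.add_record("weblog", line)
print(compute.map("weblog", lambda r: r.split()[0]))
```

Each layer talks only to the layer directly beneath it, so an application written against the compute layer never needs to know how records are laid out in storage.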
Until now, data clouds all used the standard Internet to link distributed
computing resources.
At SC 07, the National Center for Data Mining (NCDM) at UIC announced a
second generation data cloud called Sector that uses high performance, wide
area 10 Gbps networks.
The foundation for the Sector Data Cloud is the 10 Gbps Teraflow Testbed, a
joint project of the NCDM and the International Center for Advanced Internet
Research (iCAIR) at Northwestern University.
"Data clouds have emerged as the preferred platform for distributed
computing when working with large amounts of data," said Robert Grossman,
Director of the National Center for Data Mining at the University of Illinois
at Chicago and Managing Partner of Open Data Group. According to Grossman,
"Sector is the first of a second generation of data clouds that are based on
new network protocols designed to work with the very large data sets that are
common in e-science and that are beginning to become more common in
e-business."
Sector is an open source data cloud built on the NCDM-developed UDP-based
Data Transfer (UDT) protocol, which enables even very large data sets to be
transported efficiently over high performance wide area networks.
"We have extensively used the Sector Data Cloud and the Teraflow Testbed to
distribute multi-terabyte astronomical datasets to the whole world. We are also
working to implement large-scale streaming queries across large astronomical
archives to support the users of the National Virtual Observatory," said
Alexander Szalay, Professor of Astrophysics and Computer Science at the Johns
Hopkins University.
About the National Center for Data Mining
The National Center for Data Mining (NCDM) at the University of Illinois at
Chicago (UIC) was founded in 1998 as a national resource for high-performance
and distributed data mining and data intensive computing. NCDM performs
research, hosts standards, operates testbeds, and engages in outreach. NCDM
coordinates the development of the Predictive Model Markup Language (PMML), a
standard for statistical and data mining models, and operates the Teraflow
Testbed, a network for distributing large e-science datasets. For more
information about NCDM, see http://www.ncdm.uic.edu.
Teraflow Testbed - A High Performance Facility for Distributing and
Sharing Large E-Science Data Sets Announced at SC07
Reno, NV, November 12, 2007. This week, at the SC 2007 conference in
Reno, NV, a consortium of researchers announced the Teraflow Testbed (TFT). The
Teraflow Testbed is a unique international facility for working with, and for
sharing, large remote and distributed data.
The Teraflow Testbed is the first advanced network dedicated to linking
together large e-science data sets so that they are easier to integrate with
each other and easier to share with colleagues.
The Teraflow Testbed employs specialized transport protocols and dedicated
lightpaths using 1 Gbps, 10 Gbps and multiple 10 Gbps data streams that connect
Teraflow Testbed sites around the world. With the ability to move the data at
10 Gbps and higher, the Teraflow Testbed provides as much bandwidth between its
distributed sites as most grid computers have between their nodes that are in
the same room.
The design and implementation of the Teraflow Testbed is being led by the
National Center for Data Mining (NCDM) at the University of Illinois at Chicago
and the International Center for Advanced Internet Research (iCAIR) at
Northwestern University. Other members of the consortium include StarLight, an
international communications facility in Chicago, and National LambdaRail.
"This facility is the first dedicated facility for distributing and sharing
large e-science data sets," said Robert Grossman, Director of the National
Center for Data Mining at the University of Illinois at Chicago and Managing
Partner of Open Data Group. "Until today, most high performance network
testbeds have been used for connecting supercomputers, not for changing the way
people work with data," according to Grossman.
"The ability to share large amounts of distributed, federated data and
stream between sites to support search, analysis, and visualization requires
reliable high-bandwidth, low-latency networking at 10Gbps to 40Gbps over
unconstrained lightpaths," said Henry Dardy, Chief Scientist for the Center for
Computational Science at the Naval Research Laboratory. "Our research today
with the Teraflow Testbed deploys Infiniband as a single wire hardware
interconnect of processing, storage and network assets along with open source
software to demonstrate virtualization of the enterprise."
"This facility will support multiple advanced applications, including many
advanced prototypes that cannot be sustained by traditional technology
infrastructures," said Joe Mambretti, Director of the International Center for
Advanced Internet Research (iCAIR) and Co-Director of the StarLight facility,
one of the world's largest optical network exchanges for national and
international research and education networks, which is located in Chicago.
The initial leg of the Teraflow Testbed uses a dedicated 10 Gbps lightpath
connecting a Teraflow Testbed cluster at the StarLight facility in Chicago and
a Teraflow Testbed cluster in McLean, Virginia. From McLean, the Teraflow
Testbed connects to clusters at NASA Goddard in Greenbelt, Maryland, Johns
Hopkins University in Baltimore, Maryland, and to the Naval Research Laboratory
in Arlington, Virginia.
The Teraflow Testbed also connects with Teraflow Testbed clusters in
Daejeon, Korea, Tokyo, Japan, and Amsterdam, The Netherlands using shared 10
Gbps networks.
Over the next year, it will be extended to multiple other sites nationally
and internationally.
The Teraflow Testbed is sponsored in part by the National Science
Foundation, the US Army, the Department of Energy, and the University of
Illinois at Chicago.
For more information, see http://www.teraflowtestbed.net.
About the International Center for Advanced Internet Research (iCAIR), Northwestern University
iCAIR accelerates leading-edge innovation and enhanced
global communications through advanced Internet technologies, in partnership
with the international community and national partners. The Center, which was
created in partnership with a number of major high tech corporations, designs
and implements large scale infrastructure and applications (metro, regional,
national, and global). The Center has designed multiple advanced research
testbeds, which are used to develop new communications architecture, services
and technology. iCAIR also participates in the operations of advanced networks
and facilities, such as StarLight, a unique global network exchange in Chicago.
See http://www.icair.org for more information.
About National LambdaRail
National LambdaRail, Inc. (NLR) is a major initiative of U.S. research
universities and private sector technology companies to provide a national
scale infrastructure for research and experimentation in networking
technologies and applications. NLR puts the control, the power and the promise
of experimental network infrastructure in the hands of our nation's scientists
and researchers. For more information, see http://www.nlr.net.
About StarLight (sm)
StarLight is an advanced optical infrastructure and proving ground for
network services optimized for high-performance, large scale national and
global applications. Operational since summer 2001, StarLight has 1GE and 10GE
switch/router facilities and true optical switching for wavelengths. StarLight
is being developed by the Electronic Visualization Laboratory (EVL) at the
University of Illinois at Chicago (UIC), the International Center for Advanced
Internet Research (iCAIR) at Northwestern University, and the Mathematics and
Computer Science Division at Argonne National Laboratory, in partnership with
Canada's CANARIE and the Netherlands' SURFnet. See
http://www.startap.net/starlight for more information.
The winner of the Second Annual Data Mining Practice Prize was announced at KDD-2007:
Data Quality Models for High Volume Transaction Streams: A Case Study
Authors:
Robert L. Grossman
Open Data Group
River Forest IL USA
& National Center for Data Mining
University of Illinois at Chicago
Chicago IL USA
rlg1 at opendatagroup.com
Joseph Bugajski
Visa International
Foster City, CA USA
JBugajsk at visa.com
Chris Curry, David Locke & Steve Vejcik
Open Data Group
River Forest IL USA
{ccurry, dlocke, vejcik} at opendatagroup.com
August 28, 2007
"Detecting Changes in Large Data Sets of Payments Cards Data: A Case Study"
by Robert Grossman is available as a video lecture.
July 22, 2007
ACM SIGKDD 2007 Service Award to Robert Grossman
Award Acceptance Video
ACM SIGKDD is pleased to announce that Robert Grossman is the winner
of its 2007 Service Award. Robert Grossman is recognized for his key
role in the development of open and scalable architectures and
standards for the SIGKDD and Global KDD Communities.
The ACM SIGKDD Service Award is the highest service award in the field
of data mining and knowledge discovery. It is given to one individual
or one group who has performed significant service to the data mining
and knowledge discovery field, including professional volunteer
services disseminating technical information to the field, leading
organizations or projects that contribute technically to the field as
a whole, furthering KDD education, or increasing funding to the KDD
community.
The previous SIGKDD Service Award winners were Gregory
Piatetsky-Shapiro, Ramasamy Uthurusamy, Usama M. Fayyad, Xindong Wu,
the Weka team led by Ian Witten and Eibe Frank, and Won Kim.
The award includes a plaque and a check for $2,500, to be presented at
KDD-2007 (The 13th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining) Opening Plenary Session on August 12, 2007
in San Jose, CA.
Grossman was one of the founders of the Data Mining Group in 1998,
which develops the Predictive Model Markup Language (PMML). He has
chaired the group since its founding, and during this time it has
released nine versions of PMML. PMML has seen widespread adoption by
the KDD community, in part because:
- PMML supports the sharing of statistical and data mining models in a
platform and application independent fashion.
- PMML supports architectures in which one application produces PMML models
(called the PMML Producer) and another application, which may not even be a
data mining application, consumes PMML models (called the PMML Consumer or
scoring engine).
- PMML supports KDD service oriented architectures.
- PMML facilitates the storing of models in model repositories.
- PMML supports applications in which models must be audited for compliance
and other regulatory requirements.
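The producer/consumer split can be illustrated with a minimal sketch using Python's standard xml.etree.ElementTree. It assumes a linear regression model expressed with PMML's RegressionModel, RegressionTable, and NumericPredictor elements under the PMML 3.2 namespace; the helper names produce_pmml and score, and the field names amount, count, and risk, are hypothetical. The scoring side handles only numeric predictors, so it is far from a complete PMML consumer.

```python
import xml.etree.ElementTree as ET
from typing import Dict

NS = "http://www.dmg.org/PMML-3_2"
ET.register_namespace("", NS)  # serialize PMML elements with a default xmlns

def _q(tag: str) -> str:
    """Qualify a tag name with the PMML namespace."""
    return f"{{{NS}}}{tag}"

def produce_pmml(intercept: float, coefficients: Dict[str, float], target: str) -> str:
    """PMML Producer: serialize a linear regression model as a PMML document."""
    root = ET.Element(_q("PMML"), version="3.2")
    ET.SubElement(root, _q("Header"))
    fields = [*coefficients, target]
    dd = ET.SubElement(root, _q("DataDictionary"), numberOfFields=str(len(fields)))
    for name in fields:
        ET.SubElement(dd, _q("DataField"), name=name, optype="continuous", dataType="double")
    model = ET.SubElement(root, _q("RegressionModel"), functionName="regression")
    schema = ET.SubElement(model, _q("MiningSchema"))
    for name in coefficients:
        ET.SubElement(schema, _q("MiningField"), name=name)
    ET.SubElement(schema, _q("MiningField"), name=target, usageType="predicted")
    table = ET.SubElement(model, _q("RegressionTable"), intercept=repr(intercept))
    for name, coef in coefficients.items():
        ET.SubElement(table, _q("NumericPredictor"), name=name, coefficient=repr(coef))
    return ET.tostring(root, encoding="unicode")

def score(pmml: str, record: Dict[str, float]) -> float:
    """PMML Consumer (scoring engine): evaluate the regression table on one record."""
    ns = {"p": NS}
    root = ET.fromstring(pmml)
    table = root.find("p:RegressionModel/p:RegressionTable", ns)
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    y = float(table.get("intercept", "0"))
    for pred in table.findall("p:NumericPredictor", ns):
        exponent = int(pred.get("exponent", "1"))  # PMML default exponent is 1
        y += float(pred.get("coefficient")) * record[pred.get("name")] ** exponent
    return y

# Producer and consumer share nothing but the PMML text.
doc = produce_pmml(1.0, {"amount": 2.0, "count": -0.5}, target="risk")
print(score(doc, {"amount": 3.0, "count": 4.0}))
```

Because the consumer depends only on the PMML text, it could run inside an application that does no model building at all.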
For the past 10 years, Grossman has led two international testbeds for
high performance and distributed data mining, which have been used by
over fifty different organizations and groups to test, benchmark, and
develop innovative technology for high performance and distributed
data mining and knowledge discovery. The testbeds have also been used
to develop and benchmark grid and service oriented technologies for
mining large remote and distributed data sets. The first testbed,
called the Terabyte Challenge, operated from 1995 to 1999, when
working with a terabyte of data was still relatively rare. The second
testbed, called the Teraflow Testbed, was started in 2004 and will operate
until at least 2008. Today, when most distributed data mining takes
place at 1-100 Mbps, the Teraflow Testbed can be used to mine data at
1-10 Gbps over wide area high performance networks.
Grossman has a long history of serving the KDD community. He was the
Industrial Track Co-Chair for KDD 2006, the General Chair of KDD 2005,
the Sponsorship Chair for KDD 2000 and 2001, and the co-chair of the
First and Second SIAM International Conferences on Data Mining (SDM-01
and SDM-02).
Grossman has published over 140 research and technical papers in
international conferences and journals. In 2005, he led the team that
won the first annual High Performance Analytics Challenge at the
ACM/IEEE International Conference for High Performance Computing and
Communications (SC 2005). He also led teams that won prizes involving
high performance data mining and related areas at SC 2006, SC 1999,
SC 1998, SC 1996, and SC 1995.
Grossman is the Director of the National Center for Data Mining at the
University of Illinois at Chicago and the Managing Partner of Open
Data Group.
ACM SIGKDD is pleased to present Grossman with its 2007 Service Award for his
significant service and contributions to the global KDD community.
2007 ACM SIGKDD Awards Committee:
Ramasamy Uthurusamy (General Motors, USA), Chair
Jerome Friedman (Stanford University, USA)
Jiawei Han (University of Illinois Urbana-Champaign, USA)
Vipin Kumar (University of Minnesota, USA)
Heikki Mannila (University of Helsinki, Finland)
Rajeev Motwani (Stanford University, USA)
Ramakrishnan Srikant (Google, USA)
Ian H. Witten and Eibe Frank (University of Waikato, New Zealand)
Xindong Wu (University of Vermont, USA)
June 5, 2007
UIC's National Data Mining Center Enables Fast Data Transfer of Terabyte-sized Scientific Datasets. Press Release.
NCDM wins Bandwidth Challenge at SC06
November 16, 2006
Tampa, FL
The National Center for Data Mining at UIC wins the HPC Bandwidth Challenge at SuperComputing '06

Debbie Montano presents the SC06 Bandwidth Challenge Winner award
to Dr. Robert Grossman and Dr. Yunhong Gu of NCDM.

SuperComputing 2006 Bandwidth Challenge Award
Bandwidth Challenge 06: End-to-End Achievement
The National Center for Data Mining at UIC has won the HPC Bandwidth
Challenge at SC06 in Tampa, FL, sponsored by Qwest. Nine institutions participated in the
competition. NCDM won by sustaining a data transfer rate of 8Gb/s over a 10Gb/s link,
with a peak rate of 9.18Gb/s during the competition window. NCDM uses its own open source
software products, UDT and SECTOR, to transfer large datasets efficiently at high speeds
on optical networks.
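The reported rates translate into link utilization and data volume with simple arithmetic. The sketch below assumes decimal units (1 Gb = 10^9 bits, 1 TB = 10^12 bytes) and ignores protocol overhead; the one-hour window is a hypothetical example, not the length of the competition window.

```python
LINK_GBPS = 10.0       # capacity of the 10Gb/s link
SUSTAINED_GBPS = 8.0   # sustained rate reported for NCDM
PEAK_GBPS = 9.18       # peak rate reported for NCDM

def utilization(rate_gbps: float, link_gbps: float = LINK_GBPS) -> float:
    """Fraction of link capacity used at a given rate."""
    return rate_gbps / link_gbps

def terabytes_moved(rate_gbps: float, seconds: float) -> float:
    """Decimal terabytes moved at a constant rate over `seconds`, ignoring overhead."""
    return rate_gbps * 1e9 * seconds / 8 / 1e12

print(f"sustained utilization: {utilization(SUSTAINED_GBPS):.0%}")
print(f"peak utilization:      {utilization(PEAK_GBPS):.1%}")
print(f"one hour at 8 Gb/s:    {terabytes_moved(SUSTAINED_GBPS, 3600):.1f} TB")
```

At the sustained rate, the link carries about 3.6 TB per hour, at 80% of its nominal capacity.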
This year the Bandwidth Challenge focused on a specific facet of networking:
End-to-End achievement. Competitors were asked to fully utilize one 10 Gb/s path,
end-to-end and disk-to-disk, from SC06 in Tampa back to their home institution, using
the actual production network back home. Participants were required to realize, demonstrate and
publish all the configuration, troubleshooting, tuning and policies used.
The SC06 show floor was connected with the major US research networks, specifically:
Abilene, ESnet, NLR PacketNet, NLR FrameNet, and HOPI. The US research networks provided
transit for the international networks with which they peer.
NCDM transferred Sloan Digital Sky Survey (SDSS) data between the SC06 show floor in Tampa, FL
and its lab in 4223 SEL at the University of Illinois at Chicago. It used SECTOR, the newly
developed distributed data space management system. SECTOR transparently manages file
location and data movement, while the NCDM-developed UDT software performs the actual
data transfer. The data transfer was disk to disk over one 10Gb/s shared routed link between
SC06 and UIC, via StarLight.
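The division of labor described above, in which one component knows where a file lives and another moves the bytes, can be sketched as follows. This is an illustrative assumption, not SECTOR's interface: FileLocator, transfer, and fetch are hypothetical names, the index is an in-memory dictionary, and a local file copy stands in for the UDT network transfer.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict

class FileLocator:
    """Hypothetical metadata index mapping a logical file name to where it is stored."""
    def __init__(self) -> None:
        self._index: Dict[str, Path] = {}

    def register(self, name: str, location: Path) -> None:
        self._index[name] = location

    def locate(self, name: str) -> Path:
        return self._index[name]

def transfer(src: Path, dst_dir: Path) -> Path:
    """Stand-in transport: a local copy. In SECTOR, UDT moves the bytes over the network."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dst_dir / src.name))

def fetch(locator: FileLocator, name: str, dst_dir: Path) -> Path:
    """The caller supplies only a name; locating and moving the file happen behind the call."""
    return transfer(locator.locate(name), dst_dir)

with tempfile.TemporaryDirectory() as tmp:
    remote = Path(tmp) / "show-floor-node"  # hypothetical storage node
    remote.mkdir()
    (remote / "sdss_part_0001.dat").write_bytes(b"example bytes")
    locator = FileLocator()
    locator.register("sdss_part_0001.dat", remote / "sdss_part_0001.dat")
    local_copy = fetch(locator, "sdss_part_0001.dat", Path(tmp) / "lab")
    fetched = local_copy.read_bytes()

print(local_copy.name, fetched)
```

Keeping location lookup separate from transport means the transport layer can be swapped (for example, a local copy here, a high-speed protocol on a real network) without changing how clients ask for files.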
Bandwidth Challenge Competitors:
Winner
National Center for Data Mining (NCDM) at UIC, Northwestern Univ., Johns Hopkins Univ., "Transporting Sloan Digital Sky Survey Data Using SECTOR"
Honorable Mention
1) Caltech, CERN, Univ. of Florida, Univ. of Michigan, "High Speed Data Gathering, Distribution and Analysis for Physics Discoveries at the Large Hadron Collider"
2) Indiana Univ., Pittsburgh Supercomputing Center, Oak Ridge National Laboratory, "All in a Day's Work: Advancing Data Intensive Research with the Data Capacitor"
Additional teams:
1) Japan Aerospace Exploration Agency
2) Pacific Northwest National Laboratory
3) Purdue University
4) Internet2, Univ. of Washington
5) Texas A&M Univ., Univ. of Delaware
6) Univ. of Tokyo
For more information on the HPC Bandwidth Challenge, see the SC06 BWC page.
March 2006
UDT
As of March 2006, 5519 users had downloaded UDT (UDP-based Data Transfer Protocol) from our SourceForge page.
UDT, developed in our lab, is a high performance data transport protocol for distributed, data intensive applications.
March 2006
UIC Chicago Alumni News features an article about Stuart Bailey, the founder of Infoblox.
February 2006
November 2005
LAC receives SC|05 Conference Awards.
NCDM won two awards at SC05 in Seattle, WA, including the first-ever
Tri-Challenge. The Tri-Challenge combined the HPC Analytics, HPC Bandwidth,
and StorCloud Challenges. NCDM entered all three competitions and received the highest combined
score for its entries, winning first place. Because this competition is not expected to be run again,
NCDM is proud to be the first, and only, winner of the SuperComputing Tri-Challenge Award.
SuperComputing Conference High Performance Computing Challenges:
Bandwidth Challenge Winner (2004, Pittsburgh) Award
Application Foundation Award (2003, Phoenix)
Best Use of Emerging Infrastructure (2002, Baltimore) Award
1st runner-up, Outstanding (2000, Dallas) Award
High Performance Communication Award (1999, Portland) Award
Most Innovative of Show (1998, Orlando) Award
Gold Medal for Innovation (1996, Pittsburgh) Award
1st Place (1995, San Diego) Award
Links of Interest
Selected press clippings.