State Space Realization Theorems For Data Mining (2009)
Grossman, Robert L, Larson, Richard G
In this paper, we consider formal series associated with events, profiles derived from events, and statistical models that make predictions about events. We prove theorems about realizations for...
Flynet: a genomic resource for Drosophila melanogaster transcriptional regulatory networks (2009)
Tian, Feng, Shah, Parantu K., Liu, Xiangjun, Negre, Nicolas, Chen, Jia, Karpenko, Oleksiy, ...
Motivation: The highly coordinated expression of thousands of genes in an organism is regulated by the concerted action of transcription factors, chromatin proteins and epigenetic mechanisms....
End-to-End Congestion Control for High (2008)
Yunhong Gu, Robert L. Grossman
One of the headache problems in the high performance computing area is the lack of a transport protocol to transfer bulk data fast over computational grids. TCP, the de facto transport...
Augustus: The Design and Architecture of a PMML-Based Scoring Engine (2008)
John Chaves, Chris Curry, Robert L. Grossman, David Locke, Steve Vejcik
The Predictive Model Markup Language or PMML is an XML markup language for statistical and data mining models that has been developed over the past several years by
Sector and Sphere: Towards Simplified Storage and Processing of Large Scale Distributed Data (2008)
Gu, Yunhong, Grossman, Robert L
Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply given the right programming model and infrastructure. In this paper, we describe the...
Data Mining Using High Performance Data Clouds: Experimental Studies Using Sector and Sphere (2008)
Grossman, Robert L, Gu, Yunhong
We describe the design and implementation of a high performance cloud that we have used to archive, analyze and mine large distributed data sets. By a cloud, we mean an infrastructure that provides...
Distributed Discovery in E-Science: Lessons from the Angle Project (2008)
Robert L. Grossman, Michael Sabala, Yunhong Gu, Anushka An, Matt H, Rajmonda Sulo, ...
We describe the design of a system called Angle that detects emergent and anomalous behavior in distributed IP packet data. Currently, Angle sensors are collecting IP packet data at four locations,...
UDT: UDP-based Data Transfer for High-Speed Wide Area Networks (2008)
Yunhong Gu, Robert L. Grossman
In this paper, we summarize our work on the UDT high performance data transport protocol in the past four years. UDT was designed to effectively utilize the rapidly emerging high-speed wide area...
Compute and Storage Clouds Using Wide Area High Performance Networks (2008)
Grossman, Robert L., Gu, Yunhong, Sabala, Michael, Zhang, Wanzhi
We describe a cloud based infrastructure that we have developed that is optimized for wide area, high performance networks and designed to support data mining applications. The infrastructure...
Experimental Studies Using Photonic Data Services (2008)
Robert L. Grossman, Yunhong Gu, Don Hamelburg, Dave Hanley, Xinwei Hong, ...
We describe an architecture for remote and distributed data intensive applications which integrates path services for optical paths, network protocol services for high performance data transport, and...
Yunhong Gu, Robert L. Grossman
As network bandwidth and delays increase, TCP becomes inefficient [1, 5, 8, 9]. These problems are due to slow loss-recovery, a RTT bias inherent in its AIMD congestion-control algorithm, and the...
Joseph M. Bugajski, Robert L. Grossman, Steve Vejcik
As the size of an organization grows, so does the tension between a centralized system for the management of data, metadata, derived data, and business intelligence and a distributed...
FastPara: a High-level Declarative Data-Parallel Programming Framework on Clusters (2008)
Yong Mao, Yunhong Gu, Jia Chen, Robert L. Grossman
This paper presents FastPara, a C++ programming framework and associated runtime support for writing and running data-parallel applications in computer cluster environments. With FastPara, the user...
Visual Browsing of Remote and Distributed Data (2008)
Parthasarathy Krishnaswamy, Stephen G Eick, Robert L Grossman
Data repositories around the world hold many thousands of data sets. Finding information from these data sets is greatly facilitated by being able to quickly and efficiently browse remote data sets....
Robert L. Grossman, Yunhong Gu, Dave Hanley, Xinwei Hong, Gokulnath Rao
The analysis and mining of remote and distributed data is critical for many applications being deployed on grid-based and web-based computing platforms. Broadly speaking, the
Joseph Bugajski, Robert L. Grossman
We introduce an end-to-end framework for data quality that integrates business strategy, data quality models, and supporting investigative and governance processes. We also describe a...
Yunhong Gu, Robert L. Grossman
This paper presents Sector, a distributed environment that was created specifically to address the challenges inherent in accessing, exploring, analyzing and transporting extremely...
Using Term Lists and Inverted Files to Improve Search Speed for Metabolic Pathway Databases (2008)
Greeshma Neglur, Robert L. Grossman, Natalia Maltsev, Clement Yu
This paper describes a technique for efficiently searching metabolic pathways similar to a given query pathway, from a pathway database. Metabolic pathways can be converted into labeled...
An Overview of Hopf Algebras of Trees and Their Actions on Functions (2007)
Grossman, Robert L., Larson, Richard G.
We provide an expository account of some of the Hopf algebras that can be defined using trees, labeled trees, ordered trees and heap ordered trees. We also describe some actions of these Hopf...
Merging Multiple Data Streams on Common Keys over High Performance Networks (2007)
Marco Mazzucco, Asvin Ananthanarayan, Robert L. Grossman, Jorge Levera, Gokulnath Bhagavantha Rao
The model for data mining on streaming data assumes that there is a buffer of fixed length and a data stream of infinite length and the challenge is to extract patterns, changes, anomalies, and...
Robert L. Grossman, Yunhong Gu, Dave Hanley, Xinwei Hong, Jorge Levera, Marco Mazzucco, ...
We describe an architecture for next generation, distributed data mining systems which integrates data services to facilitate remote data analysis and distributed data mining, network protocol...
Robert L. Grossman, Yunhong Gu, David Hanley, Xinwei Hong, Parthasarathy Krishnaswamy
Computer Networks, 2004. Although the amount of earth science data is growing rapidly, as is the availability of high performance networks, our ability to access large remote earth science data sets...
Dave Lillethun, Robert L. Grossman, Yunhong Gu, Dave Hanley, Xinwei Hong, Jorge Levera, ...
We argue that data webs employing specialized path services, network protocols, and data protocols can be an effective platform to analyze and access millions of distributed Gigabyte (and larger)...
Hopf Algebras of Heap Ordered Trees and Permutations (2007)
Robert L. Grossman, Richard G. Larson
A standard heap ordered tree with n + 1 nodes is a finite rooted tree in which all the nodes except the root are labeled with the natural numbers between 1 and n, and that satisfies the property that...
Joseph Bugajski, Robert L. Grossman, Eric Sumner, Steve Vejcik
In this paper we describe a new methodology for detecting data quality problems in high volume transaction streams called change detection using cubes of models or CDCM. We also describe how this...
Data mining middleware for wide-area high-performance networks, Future Generation (2006)
Robert L. Grossman, Yunhong Gu, David Hanley, Michal Sabala, Joe Mambretti, Alex Szalay, ...
In this paper, we describe two distributed, data intensive applications that were demonstrated at iGrid 2005 (iGrid Demonstration US109 and iGrid Demonstration US121). One involves transporting...
Data Mining Systems Selected Prior Research (2006)
• 1996- scaled tree-based classifiers to very large data sets. A fundamental challenge in data mining is to mine data sets that are so large that they do not fit into a computer’s memory. This is...
Greeshma Neglur, Robert L. Grossman, Bing Liu
Integrating data involving chemical structures is simplified when unique identifiers (UIDs) can be associated with chemical structures. For example, these identifiers can be used as database keys....
Joseph Bugajski, Robert L. Grossman, Eric Sumner, Tao Zhang
We introduce a methodology for improving information quality for complex, distributed event based systems and apply this methodology to an electronic payments system. The methodology...
Differential Algebra Structures on Families of Trees (2005)
Robert L. Grossman, Richard G. Larson
It is known that the vector space spanned by labeled rooted trees forms a Hopf algebra. Let k be a field and let R be a commutative k-algebra. Let H denote the Hopf algebra of rooted trees labeled...
Differential Algebra Structures on Families of Trees (2004)
Grossman, Robert L, Larson, Richard G
It is known that the vector space spanned by labeled rooted trees forms a Hopf algebra. Let k be a field and let R be a commutative k-algebra. Let H denote the Hopf algebra of rooted trees labeled...
Experiences in Design and Implementation of a High Performance Transport (2004)
Yunhong Gu, Xinwei Hong, Robert L. Grossman
This paper describes our experiences in the development of the UDP-based Data Transport (UDT) protocol, an application level transport protocol used in distributed data intensive applications. The...
Using Dataspace to Support Long-Term Stewardship of Remote and Distributed Data (2004)
Robert L. Grossman, Dave Hanley, Xinwei Hong, Parthasarathy Krishnaswamy
In this note, we introduce DataSpace Archives. DataSpace Archives are built on top of DataSpace's DSTP servers [2] and are designed not only to provide a long term archiving of...
Robert L. Grossman, Yunhong Gu, Xinwei Hong, Antony Antony, Johan Blom, ...
We describe a UDP based application level transport protocol, named UDT (UDP based Data Transfer) that is designed for high performance networking and computing. It is fast, fair, and friendly. UDT...
High Performance Data Streaming in Service Architecture (2004)
Geoffrey Fox, Harshawardhan Gadgil, Shrideep Pallickara, Marlon Pierce, Robert L. Grossman, Yunhong Gu, ...
Applications dealing with large data sets obtained via simulation or actual real-time sensor networks are increasing in abundance. The data obtained from real-time sources may contain certain...
Merging Multiple Data Streams on Common Keys over High Performance Networks (2002)
Marco Mazzucco, Asvin Ananthanarayan, Robert L. Grossman, Jorge Levera, Gokulnath Bhagavantha Rao
The model for data mining on streaming data assumes that there is a buffer of fixed length and a data stream of infinite length and the challenge is to extract patterns, changes, anomalies, and...
Experimental Studies Using Photonic Data Services at iGrid 2002 (2002)
Robert L. Grossman, Yunhong Gu, Don Hamelburg, Dave Hanley, Xinwei Hong, ...
We describe an architecture for remote and distributed data intensive applications which integrates optical path services, network protocol services for high performance data transport, and data...
Combining Families of Information Retrieval (2002)
Michael Cornelson, Robert L. Grossman, Ron Karidi, Dan Shnidman
This paper describes some experiments which use meta-learning to combine families of information retrieval (IR) algorithms obtained by varying the normalizations and similarity functions. By...
An Algebraic Approach to Data Mining: Some Examples (2002)
Robert L. Grossman, Richard G. Larson
In this paper, we introduce an algebraic approach to the foundations of data mining. Our approach is based upon two algebras of functions defined over a common state space X and a pairing between...
Structured Ensemble of Models (SEMs) (1999)
Robert L. Grossman, Ron Karidi, H. Vincent Poor
Predictive modeling is a fundamental tool with a wide range of business applications in industries such as advertising, banking, insurance, and finance. Such applications include personalization of...
A Note on Interfacing Object Warehouses and Mass Storage Systems for Data Mining Applications (1996)
Robert L. Grossman, Dave Northcutt
Data mining requires numerically and statistically intensive queries. Our assumption is that data mining requires a specialized data management infrastructure to support the aforementioned...
Wavelet transforms associated with finite cyclic groups (1993)
Robert L. Grossman, H. Vincent Poor
Multiresolution analysis via decomposition on wavelet bases has emerged as an important tool in the analysis of signals and images when these objects are viewed as sequences of complex or...
Data mining standards initiatives (0000)
The article reveals that lacking standards for statistical and data mining models, applications cannot leverage benefits of data mining. The data mining and statistical models generated by commercial...