Spectrogram: A Mixture-of-Markov-Chains Model for Anomaly Detection in Web Traffic (2009)
Yingbo Song, Angelos D. Keromytis, Salvatore J. Stolfo
We present Spectrogram, a machine learning based statistical anomaly detection (AD) sensor for defense against web-layer code-injection attacks. These attacks include PHP file inclusion,...
Towards Stealthy Malware Detection (2009)
Salvatore J. Stolfo, Ke Wang, Wei-jen Li
Malcode can be easily hidden in document files and go undetected by standard technology. We demonstrate this opportunity of stealthy malcode insertion in several experiments using a standard COTS...
Casting out Demons: Sanitizing Training Data for Anomaly Sensors (2009)
Gabriela F. Cretu, Angelos Stavrou, Michael E. Locasto, Salvatore J. Stolfo, Angelos D. Keromytis
The efficacy of Anomaly Detection (AD) sensors depends heavily on the quality of the data used to train them. Artificial or contrived training data may not provide a realistic view of the deployment...
SPARSE: A Hybrid System to Detect Malcode-Bearing Documents (2008)
Wei-jen Li, Salvatore J. Stolfo
Embedding malcode within documents provides a convenient means of penetrating systems which may be unreachable by network-level service attacks. Such attacks can be very targeted and difficult to...
Anomaly Detection in Computer Security and an Application to File System Accesses (2008)
Salvatore J. Stolfo, Shlomo Hershkop, Linh H. Bui, Ryan Ferster, Ke Wang
We present an overview of anomaly detection used in computer security, and provide a detailed example of a host-based Intrusion Detection System that monitors file systems to detect...
Spectrogram: A Mixture-of-Markov-Chains Model for Anomaly Detection in Web Traffic (2008)
Song, Yingbo, Keromytis, Angelos D., Stolfo, Salvatore J.
We present Spectrogram, a mixture of Markov-chains sensor for anomaly detection (AD) against web-layer (port 80) code-injection attacks such as PHP file inclusion, SQL-injection,...
STAND: Sanitization Tool for ANomaly Detection (2008)
Gabriela F. Cretu, Angelos Stavrou, Salvatore J. Stolfo, Angelos D. Keromytis
The efficacy of Anomaly Detection (AD) sensors depends heavily on the quality of the data used to train them. Artificial or contrived training data may not provide a realistic view of the deployment environment. Most realistic data sets are dirty; that...
Masquerade Detection Using a Taxonomy-Based Multinomial Modeling Approach in UNIX Systems (2008)
Salem, Malek Ben, Stolfo, Salvatore J.
This paper presents one-class Hellinger distance-based and one-class SVM modeling techniques that use a set of features to reveal user intent. The specific objective is to model user command profiles...
A Data Mining Framework for Building Intrusion Detection Models (2008)
Wenke Lee, Salvatore J. Stolfo, Kui W. Mok
There is often the need to update an installed Intrusion Detection System (IDS) due to new attack methods or upgraded computing environments. Since many current IDSs are constructed by manual...
WORMS 2005 Columbia IDS Lab Fileprint analysis for Malware Detection (2008)
Salvatore J. Stolfo, Ke Wang, Wei-jen Li
Malcode can be easily hidden in document files and embedded in application executables. We demonstrate this opportunity of stealthy malcode insertion in several experiments using a standard COTS...
Wenke Lee, Salvatore J. Stolfo
In this paper we describe a data mining framework for constructing intrusion detection models. The first key idea is to mine system audit data for consistent and useful patterns of program...
ABSTRACT ACE, a system for Automated Cable Expertise, is a Knowledge-Based Expert (2008)
Gregg T. Vesonder, Salvatore J. Stolfo, John E. Zielinski
System designed to provide troubleshooting reports and management analyses for telephone cable maintenance. Design decisions faced during the construction of ACE were guided by recent successes in...
Online Training and Sanitization of AD Systems (Extended Abstract) (2008)
Gabriela F. Cretu, Angelos Stavrou, Michael E. Locasto, Salvatore J. Stolfo
In this paper, we introduce novel techniques that enhance the training phase of Anomaly Detection (AD) sensors. Our aim is to both improve the detection performance and protect against attacks that...
Distributed Data Mining in Credit Card Fraud Detection (2008)
Philip K. Chan, Salvatore J. Stolfo
Credit card transactions continue to grow in number, taking an ever-larger share of the US payment system and leading to a higher rate of stolen account numbers and subsequent losses by banks. Improved fraud detection thus has become...
SPARSE: A Hybrid System to Detect Malcode-Bearing Documents (2008)
Li, Wei-Jen, Stolfo, Salvatore J.
Embedding malcode within documents provides a convenient means of penetrating systems which may be unreachable by network-level service attacks. Such attacks can be very targeted and difficult to...
DATABASE RESEARCH AT COLUMBIA UNIVERSITY (2007)
Shih-fu Chang, Luis Gravano, Gail E. Kaiser, Kenneth A. Ross, Salvatore J. Stolfo
Columbia University has a number of projects that
Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem (2007)
Salvatore J. Stolfo, Usama Fayyad
The problem of merging multiple databases of information about common entities is frequently encountered in KDD and decision support applications in large commercial and government...
A Comparison between Combiner and Stacked Generalization (2007)
David W. Fan, Philip K. Chan, Salvatore J. Stolfo
Combiner and Stacked Generalization are two very similar meta-learning methods that combine predictions of multiple classifiers to improve accuracy of any single classifier. Both methods form a...
Recursive-Stacking To Improve The Accuracy of Combined Classifiers (2007)
David W. Fan, Salvatore J. Stolfo, Philip K. Chan
We analyze the mechanism of stacking and point out the conflict problem. Two methods to reduce conflicts are discussed and their equivalence is established. We propose the...
The ALEXSYS Mortgage Pool Allocation System (2007)
Salvatore J. Stolfo, Philip K. Chan, Leland Woodbury, Jason Glazier, David Ohsie
We studied various approaches for the allocation of mortgage pools, a combinatorial optimization problem faced by financial institutions that trade in mortgage-backed securities. We review the...
Scalability of Learning Arbiter and Combiner Trees from Partitioned Data (2007)
Philip K. Chan, Salvatore J. Stolfo
Much of the research in inductive learning concentrates on problems with relatively small amounts of data residing at one location. In this paper we explore the scalability of learning arbiter and...
Meta-Learning Agents for Fraud and Intrusion Detection in Financial Information Systems (2007)
Salvatore J. Stolfo, Philip K. Chan, Dave Fan, Wenke Lee, Andreas Prodromidis
This paper describes a set of techniques and a general architecture realizing these that are presently under development at Columbia University. Our work is performed in collaboration with a...
Andreas L. Prodromidis, Salvatore J. Stolfo
In this paper we study methods that combine multiple classification models learned over separate data sets. Numerous studies posit that such approaches provide the means to efficiently scale...
Wenke Lee, Salvatore J. Stolfo, Philip K. Chan, Wei Fan, Matthew Miller, Shlomo Hershkop, ...
In this paper, we present an overview of our research in real time data mining-based intrusion detection systems (IDSs). We focus on issues related to deploying a data mining-based IDS in a real time...
Kui W. Mok, Salvatore J. Stolfo
In this paper, we present new algorithms to balance the computation of parallel hash joins over heterogeneous processors in the presence of data skew and external loads. Heterogeneity in our model...
Cost Complexity Pruning of Ensemble Classifiers (2007)
Andreas L. Prodromidis, Salvatore J. Stolfo
In this paper we study methods that combine multiple classification models learned over separate data sets in a distributed database setting. Numerous studies posit that such approaches provide the...
Distributed Data Mining: The JAM System Architecture (2007)
Andreas L. Prodromidis, Salvatore J. Stolfo, Shelley Tselepis, Terrance Truta, David Kalina
This paper describes the system architecture of JAM (Java Agents for Meta-learning), a distributed data mining system that scales up to large and physically separated data sets. An early version of...
Citizen's Attitudes about Privacy While Accessing Government Websites: Results of an... (2007)
Salvatore J. Stolfo, Eric Johnson, Tomislav Pavlicic, Stephen Jan
This paper reports the results of an investigation on citizens' attitudes and concerns regarding privacy and security on the Web, in general, and on the government websites they may visit, in...
Combining Knowledge Discovery and Knowledge Engineering to Build IDSs (2007)
Wenke Lee, Salvatore J. Stolfo
We have been developing a data mining (i.e., knowledge discovery) framework, MADAM ID, for
Automated Social Hierarchy Detection through Email Network Analysis (2007)
Rowe, Ryan, Creamer, German, Hershkop, Shlomo, Stolfo, Salvatore J.
We present our work on automatically extracting social hierarchies from electronic communication data. Data mining based on user behavior can be leveraged to analyze and catalog patterns of...
Frias-Martinez, Vanessa, Stolfo, Salvatore J., Keromytis, Angelos D.
There is a considerable body of literature and technology that provides access control and security of communication for Mobile Ad-hoc Networks (MANETs) based on cryptographic authentication...
A Study of Malcode-Bearing Documents (2007)
Li, Wei-Jen, Stolfo, Salvatore J., Stavrou, Angelos, Androulaki, Elli, Keromytis, Angelos D.
By exploiting the object-oriented dynamic composability of modern document applications and formats, malcode hidden in otherwise inconspicuous documents can reach third-party applications that may...
STAND: Sanitization Tool for ANomaly Detection (2007)
Cretu, Gabriela F., Stavrou, Angelos, Stolfo, Salvatore J., Keromytis, Angelos D.
The efficacy of Anomaly Detection (AD) sensors depends heavily on the quality of the data used to train them. Artificial or contrived training data may not provide a realistic view of the...
Data Sanitization: Improving the Forensic Utility of Anomaly Detection Systems (2007)
Cretu, Gabriela F., Stavrou, Angelos, Stolfo, Salvatore J., Keromytis, Angelos D.
Anomaly Detection (AD) sensors have become an invaluable tool for forensic analysis and intrusion detection. Unfortunately, the detection performance of all learning-based ADs depends heavily on the...
On the infeasibility of Modeling Polymorphic Shellcode for Signature Detection (2007)
Song, Yingbo, Locasto, Michael E., Stavrou, Angelos, Keromytis, Angelos D., Stolfo, Salvatore J.
Polymorphic malcode remains one of the most troubling threats for information security and intrusion defense systems. The ability for malcode to be automatically transformed into a semantically...
On the infeasibility of Modeling Polymorphic Shellcode for Signature Detection (2007)
Yingbo Song, Michael E. Locasto, Angelos Stavrou, Angelos D. Keromytis, Salvatore J. Stolfo
Polymorphic malcode remains one of the most troubling threats for information security and intrusion defense systems. The ability for malcode to be automatically transformed into a semantically...
Data Sanitization: Improving the Forensic Utility of Anomaly Detection Systems (2007)
Gabriela F. Cretu, Angelos Stavrou, Salvatore J. Stolfo, Angelos D. Keromytis
Anomaly Detection (AD) sensors have become an invaluable tool for forensic analysis and intrusion detection. Unfortunately, the detection accuracy of all learning-based ADs depends heavily on the...
Privacy-Preserving Payload-Based Correlation for Accurate Malicious Traffic Detection (2006)
Parekh, Janak J., Wang, Ke, Stolfo, Salvatore J.
With the increased use of botnets and other techniques to obfuscate attackers' command-and-control centers, Distributed Intrusion Detection Systems (DIDS) that focus on attack source IP addresses or...
Quantifying Application Behavior Space for Detection and Self-Healing (2006)
Locasto, Michael E., Stavrou, Angelos, Cretu, Gabriela F., Keromytis, Angelos D., Stolfo, Salvatore J.
The increasing sophistication of software attacks has created the need for increasingly finer-grained intrusion and anomaly detection systems, both at the network and the host level. We believe that...
Anagram: A Content Anomaly Detector Resistant to Mimicry Attack (2006)
Wang, Ke, Parekh, Janak, Stolfo, Salvatore J.
In this paper, we present Anagram, a content anomaly detector that models a mixture of high-order n-grams (n > 1) designed to detect anomalous and “suspicious” network packet payloads. By...
Host-Based Anomaly Detection Using Wrapping File Systems (2006)
Hershkop, Shlomo, Bui, Linh H., Ferster, Ryan, Stolfo, Salvatore J.
We describe an anomaly detector, called FWRAP, for a Host-based Intrusion Detection System that monitors file system calls to detect anomalous accesses. The system is intended to be used not as a...
Intrusion and Anomaly Detection Model Exchange for Mobile Ad-Hoc Networks (2006)
Gabriela F. Cretu, Janak J. Parekh, Ke Wang, Salvatore J. Stolfo
Mobile Ad-hoc NETworks (MANETs) pose unique security requirements and challenges due to their reliance on open, peer-to-peer models that often don’t require authentication between nodes....
Intrusion and Anomaly Detection Model Exchange for Mobile Ad-Hoc Networks (2006)
Gabriela F. Cretu, Janak J. Parekh, Ke Wang, Salvatore J. Stolfo
Mobile Ad-hoc NETworks (MANETs) pose unique security requirements and challenges due to their reliance on open, peer-to-peer models that often don’t require authentication between nodes....
Anagram: A Content Anomaly Detector Resistant to Mimicry Attack (2006)
Ke Wang, Janak J. Parekh, Salvatore J. Stolfo
In this paper, we present Anagram, a content anomaly detector that models a mixture of high-order n-grams (n > 1) designed to detect anomalous and “suspicious” network packet...
Data-Driven Detection of Malicious Document (2006)
Advisor: Prof. Salvatore J. Stolfo
Malcode hidden in otherwise normal-appearing public documents provides both a convenient and stealthy means for attackers to penetrate systems. By exploiting the ubiquitous and object-oriented approach...
A temporal based forensic analysis of electronic communication (2006)
Previous work [1] reported on our research in developing a data mining environment for analyzing email communication data. In this paper, we describe our extensions to EMT for applying forensic...
Quantifying Application Behavior Space for Detection and Self-Healing (2006)
Michael E. Locasto, Angelos Stavrou, Gabriela F. Cretu, Angelos D. Keromytis, Salvatore J. Stolfo
The increasing sophistication of software attacks has created the need for increasingly finer-grained intrusion and anomaly detection systems, both at the network and the host level. We believe that...
A Genre-based Clustering Approach to Content Extraction (2005)
Gupta, Suhit, Becker, Hila, Kaiser, Gail E., Stolfo, Salvatore J.
The content of a webpage is usually contained within a small body of text and images, or perhaps several articles on the same page; however, the content may be lost in the clutter (defined as...
Genre Classification of Websites Using Search Engine Snippets (2005)
Gupta, Suhit, Kaiser, Gail E., Stolfo, Salvatore J., Becker, Hila
Web pages often contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Automatic extraction of 'useful and...
Towards Collaborative Security and P2P Intrusion Detection (2005)
Michael Locasto, Janak J. Parekh, Angelos D. Keromytis, Salvatore J. Stolfo
The increasing array of Internet-scale threats is a pressing problem for every organization that utilizes the network. Organizations have limited resources to detect and respond to these threats. The...
FLIPS: Hybrid Adaptive Intrusion Prevention (2005)
Michael Locasto, Ke Wang, Angelos D. Keromytis, Salvatore J. Stolfo
Intrusion detection systems are fundamentally passive and fail-open. Because their primary task is classification, they do nothing to prevent an attack from succeeding. An intrusion prevention...
A Comparative Evaluation of Two Algorithms for Windows Registry Anomaly Detection, volume 13 (2005)
Salvatore J. Stolfo, Frank Apap, Eleazar Eskin, Katherine Heller, Andrew Honig, Krysta Svore
We present a component anomaly detector for a host-based intrusion detection system (IDS) for Microsoft Windows. The core of the detector is a learning-based anomaly detection algorithm...
FLIPS: Hybrid adaptive intrusion prevention (2005)
Michael E. Locasto, Ke Wang, Angelos D. Keromytis, Salvatore J. Stolfo
Intrusion detection systems are fundamentally passive and fail-open. Because their primary task is classification, they do nothing to prevent an attack from succeeding. An intrusion...
Anomalous payload-based worm detection and signature generation (2005)
Ke Wang, Gabriela Cretu, Salvatore J. Stolfo
New features of the PAYL anomalous payload detection sensor are demonstrated to accurately detect and generate signatures for zero-day worms. Experimental evidence demonstrates that...
Email Mining Toolkit Supporting Law Enforcement Forensic Analyses NSF Final Report. DG.o 2005 (2005)
The Email Mining Toolkit (EMT) is a data mining tool that visualizes a very wide range of detailed analyses of email and email flows derived from an archive of email in a variety of formats. EMT may...
Anomalous payload-based worm detection and signature generation (2005)
Ke Wang, Gabriela Cretu, Salvatore J. Stolfo
New features of the PAYL anomalous payload detection sensor are presented and demonstrated to accurately detect and generate signatures for zero-day worm exploits. Experimental evidence is...
Extracting Context To Improve Accuracy For HTML Content Extraction (2004)
Gupta, Suhit, Kaiser, Gail E., Stolfo, Salvatore J.
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of 'useful and relevant' content...
Host-based Anomaly Detection Using Wrapping File Systems (2004)
Hershkop, Shlomo, Bui, Linh H., Ferster, Ryan, Stolfo, Salvatore J.
We describe an anomaly detector, called FWRAP, for a Host-based Intrusion Detection System that monitors file system calls to detect anomalous accesses. The system is intended to be used not as a...
Anomalous payload-based network intrusion detection (2004)
We present a payload-based anomaly detector, we call PAYL, for intrusion detection. PAYL models the normal application payload of network traffic in a fully automatic, unsupervised fashion. The...
Detecting viral propagations using email behavior profiles (2004)
Salvatore J. Stolfo, Wei-jen Li, Shlomo Hershkop, Ke Wang, Chia-wei Hu, Olivier Nimeskern
The Email Mining Toolkit (EMT) is a data mining system that computes behavior profiles or models of user email accounts. These models may be used for a variety of forensic analyses and detection...
Anomalous Payload-based Network Intrusion Detection (2004)
We present a payload-based anomaly detector, we call PAYL, for intrusion detection. PAYL models the normal application payload of network traffic in a fully automatic, unsupervised and very efficient...
A Holistic Approach to Service Survivability (2003)
Keromytis, Angelos D., Parekh, Janak, Gross, Philip N., Kaiser, Gail E., Misra, Vishal, Nieh, Jason, ...
We present SABER (Survivability Architecture: Block, Evade, React), a proposed survivability architecture that blocks, evades and reacts to a variety of attacks by using several security and...
One class support vector machines for detecting anomalous windows registry accesses (2003)
Katherine A. Heller, Krysta M. Svore, Angelos D. Keromytis, Salvatore J. Stolfo
We present a new Host-based Intrusion Detection System (IDS) that monitors accesses to the Microsoft Windows Registry using Registry Anomaly Detection (RAD). Our system uses a one class Support...
One-Class Training for Masquerade Detection (2003)
We extend prior research on masquerade detection using UNIX commands issued by users as the audit source. Previous studies using multi-class training requires gathering data from multiple users to...
Surveillance detection in high bandwidth environments (2003)
Seth Robertson, Eric V. Siegel, Matt Miller, Salvatore J. Stolfo
In this paper, we describe System Detection’s surveillance detection techniques for enclave environments (ESD) and peering center environments (PSD) and evaluate each technique over data gathered...
A Behavior-based Approach To Securing Email Systems (2003)
Salvatore J. Stolfo, Shlomo Hershkop, Ke Wang, Olivier Nimeskern, Chia-Wei Hu, Wei Hu
The Malicious Email Tracking (MET) system, reported in a prior publication, is a behavior-based security system for email services. The Email Mining Toolkit (EMT) presented in this paper is an...
Combining Behavior Models to Secure Email Systems (2003)
Salvatore J. Stolfo, Chia-wei Hu, Wei-jen Li, Shlomo Hershkop, Ke Wang, Olivier Nimeskern
We introduce the Email Mining Toolkit (EMT), a system that implements behavior-based methods to improve security of email systems. Behavior models of email flows and email account usage may be used...
Citizen's Attitudes about Privacy While Accessing Government and Private Websites: Results of... (2003)
Salvatore J. Stolfo, Eric Johnson, Tomislav Pavlicic, Stephen Jan
This paper reports the results of an investigation on citizens' attitudes and concerns regarding privacy and security on the Web, in general, and on the government websites they may visit, in...
A network worm vaccine architecture (2003)
Stelios Sidiroglou, John Ioannidis, Angelos D. Keromytis, Salvatore J. Stolfo
We present an architecture for detecting “zero-day” worms and viruses in incoming email. Our main idea is to intercept every incoming message, prescan it for potentially dangerous...
A Behavior-based Approach to Securing Email Systems (2003)
Salvatore J. Stolfo, Shlomo Hershkop, Ke Wang, Olivier Nimeskern, Chia-Wei Hu, Chia-wei Wu
The Malicious Email Tracking (MET) system, reported in a prior publication, is a behavior-based security system for email services. The Email Mining Toolkit (EMT) presented in this paper is an...
One Class Support Vector Machines for Detecting Anomalous Windows Registry Accesses (2003)
Katherine A. Heller, Krysta M. Svore, Angelos D. Keromytis, Salvatore J. Stolfo
We present a new Host-based Intrusion Detection System (IDS) that monitors accesses to the Microsoft Windows Registry using Registry Anomaly Detection (RAD). Our system uses a one class Support...
Automatic Discovery of Heuristics for Non-Deterministic Programs. (2002)
Stolfo, Salvatore J., Harrison, Malcolm C.
During the last few years a number of relatively effective AI programs have been written incorporating considerable amounts of problem specific knowledge. Consequently, the problem of encoding such...
Toward cost-sensitive modeling for intrusion detection and response (2002)
Wenke Lee, Matthew Miller, Salvatore J. Stolfo, Wei Fan, Erez Zadok
Intrusion detection systems (IDSs) must maximize the realization of security goals while minimizing costs. In this paper, we study the problem of building cost-sensitive intrusion detection models....
Toward cost-sensitive modeling for intrusion detection and response (2002)
Wenke Lee, Wei Fan, Matthew Miller, Salvatore J. Stolfo, Erez Zadok
Intrusion detection systems (IDSs) must maximize the realization of security goals while minimizing costs. In this paper, we study the problem of building cost-sensitive intrusion detection models....
MET: An Experimental System for Malicious Email Tracking (2002)
Manasi Bhattacharyya, Matthew G. Schultz, Eleazar Eskin, Shlomo Hershkop, Salvatore J. Stolfo
Despite the use of state of the art methods to protect against malicious programs, they continue to threaten and damage computer systems around the world. In this paper we present MET, the Malicious...
Distributed Data Mining: The JAM system architecture (2001)
Prodromidis, Andreas L., Stolfo, Salvatore J., Tselepis, Shelley, Truta, Terrance, Sherwin, Jeffrey, Kalina, David
This paper describes the system architecture of JAM (Java Agents for Meta-learning), a distributed data mining system that scales up to large and physically separated data sets. An early version of the...
Real time data mining-based intrusion detection (2001)
Wenke Lee, Salvatore J. Stolfo, Philip K. Chan, Eleazar Eskin, Wei Fan, Matthew Miller, ...
Salvatore J. Stolfo, Wenke Lee, Philip K. Chan, Wei Fan, Eleazar Eskin
The field of Intrusion Detection has been an active area of research for some time. The goal of an Intrusion Detection System (IDS) is to provide another layer of defense against malicious (or
Data mining methods for detection of new malicious executables (2001)
Matthew G. Schultz, Eleazar Eskin, Erez Zadok, Salvatore J. Stolfo
A serious security threat today is malicious executables, especially new, unseen malicious executables. Many of these new malicious executables are undetectable by current anti-virus systems because...
Malicious Email Filter - A UNIX Mail Filter that Detects Malicious Windows Executables (2001)
Matthew G. Schultz, Eleazar Eskin, Erez Zadok, Manasi Bhattacharyya, Salvatore J. Stolfo
We present Malicious Email Filter, MEF, a freely distributed malicious binary filter incorporated into Procmail that can detect malicious Windows attachments by integrating with a UNIX mail server....
Malicious Email Filter - A UNIX Mail Filter that Detects Malicious Windows Executables (2001)
Matthew G. Schultz, Eleazar Eskin, Salvatore J. Stolfo
We present Malicious Email Filter, MEF, a freely distributed malicious binary filter incorporated into Procmail that can detect malicious Windows attachments by integrating with a UNIX mail server....
Data mining methods for detection of new malicious executables (2001)
Matthew G. Schultz, Eleazar Eskin, Erez Zadok, Salvatore J. Stolfo
A serious security threat today is malicious executables, especially new, unseen malicious executables often arriving as email attachments. These new malicious executables are created at the rate of...
Modeling system calls for intrusion detection with dynamic window sizes (2001)
Eleazar Eskin, Wenke Lee, Salvatore J. Stolfo
We extend prior research on system call anomaly detection modeling methods for intrusion detection by incorporating dynamic window sizes. The window size is the length of the subsequence of a system...
MEF: Malicious Email Filter (2001)
Matthew G. Schultz, Eleazar Eskin, Erez Zadok, Manasi Bhattacharyya, Salvatore J. Stolfo
We present Malicious Email Filter, MEF, a freely distributed malicious binary filter incorporated into Procmail that can detect malicious Windows attachments by integrating with a UNIX mail server....
Real Time Data Mining-based Intrusion Detection (2001)
Wenke Lee, Salvatore J. Stolfo, Philip K. Chan, Eleazar Eskin, Wei Fan, Matthew Miller, ...
In this paper, we present an overview of our research in real time data mining-based intrusion detection systems (IDSs). We focus on issues related to deploying a data mining-based IDS in a real time...
Toward Cost-Sensitive Modeling for Intrusion Detection (2000)
Lee, Wenke, Miller, Matthew, Stolfo, Salvatore J., Jallad, Kahil, Park, Christopher, Zadok, Erez, ...
Intrusion detection systems need to maximize security while minimizing costs. In this paper, we study the problem of building cost-sensitive intrusion detection models. We examine the major cost...
A Framework for Constructing Features and Models for Intrusion Detection Systems (2000)
Wenke Lee, Salvatore J. Stolfo
Intrusion detection (ID) is an important component of infrastructure protection mechanisms. Intrusion detection systems (IDSs) need to be accurate, adaptive, and extensible. Given these requirements...
Adaptive Intrusion Detection: A Data Mining Approach (2000)
Wenke Lee, Salvatore J. Stolfo, Kui W. Mok
In this paper we describe a data mining framework for constructing intrusion detection models. The first key idea is to mine system audit data for consistent and useful patterns of program...
A Framework for Constructing Features and Models for Intrusion Detection Systems (2000)
Wenke Lee, Salvatore J. Stolfo
Intrusion detection (ID) is an important component of infrastructure protection mechanisms. Intrusion detection systems (IDSs) need to be accurate, adaptive, and extensible. Given these requirements...
A Data Mining and CIDF Based Approach for Detecting Novel and Distributed Intrusions (2000)
Wenke Lee, Rahul A. Nimbalkar, Kam K. Yee, Sunil B. Patil, Pragneshkumar H. Desai, Thuan T. Tran, ...
As the recent distributed Denial-of-Service (DDOS) attacks on several major Internet sites have shown us, no open computer network is immune from intrusions. Furthermore, intrusion detection...
Meta-Learning in Distributed Data Mining Systems: Issues and Approaches (2000)
Andreas L. Prodromidis, Philip K. Chan, Salvatore J. Stolfo
Data mining systems aim to discover patterns and extract useful information from facts recorded in databases. A widely adopted approach to this objective is to apply various machine learning...
A Multiple Model Cost-Sensitive Approach for Intrusion Detection (2000)
Wei Fan, Wenke Lee, Salvatore J. Stolfo, Matthew Miller
Intrusion detection systems (IDSs) need to maximize security while minimizing costs. In this paper, we study the problem of building cost-sensitive intrusion detection models to be used for realtime...
Agent-Based Distributed Learning Applied to Fraud Detection (1999)
Andreas L. Prodromidis, Salvatore J. Stolfo
Inductive learning and classification techniques have been applied in many problems in diverse areas. In this paper we describe an AI-based approach that combines inductive learning algorithms and...
Automated Intrusion Detection using NFR: Methods and Experiences (1999)
Wenke Lee, Christopher T. Park, Salvatore J. Stolfo
AdaCost: misclassification cost-sensitive boosting (1999)
AdaCost, a variant of AdaBoost, is a misclassification cost-sensitive boosting method. It uses the cost of misclassifications to update the training distribution on successive boosting rounds. The...
Distributed Data Mining in Credit Card Fraud Detection (1999)
Philip K. Chan, Wei Fan, Andreas Prodromidis, Salvatore J. Stolfo
Credit card transactions continue to grow in number, taking a larger share of the US payment system, and have led to a higher rate of stolen account numbers and subsequent losses by banks. Hence,...
Effective and efficient pruning of metaclassifiers in a distributed data mining system (1999)
Andreas L. Prodromidis, Salvatore J. Stolfo, Philip K. Chan
Distributed data mining systems aim to discover and combine useful information that is distributed across multiple databases. One of the main challenges is the design of effective and efficient...
Distributed Data Mining in Credit Card Fraud Detection (1999)
Philip K. Chan, Wei Fan, Salvatore J. Stolfo
Credit card transactions continue to grow in number, taking a larger share of the US payment system, and have led to a higher rate of stolen account numbers and subsequent losses by banks. Hence,...
A Data Mining Framework for Building Intrusion Detection Models (1999)
Wenke Lee, Salvatore J. Stolfo, Kui W. Mok
There is often the need to update an installed Intrusion Detection System (IDS) due to new attack methods or upgraded computing environments. Since many current IDSs are constructed by manual...
Algorithms For Mining System Audit Data (1999)
Wenke Lee, Salvatore J. Stolfo, Kui W. Mok
We describe our research in applying data mining techniques to construct intrusion detection models. The key ideas are to mine system audit data for consistent and useful patterns of program and user...
Automated Intrusion Detection Methods Using NFR (1999)
Wenke Lee, Christopher T. Park, Salvatore J. Stolfo
There is often the need to update an installed Intrusion Detection System (IDS) due to new attack methods or upgraded computing environments. Since many current IDSs are constructed by manual...
Mining in a data-flow environment: experience in network intrusion detection (1999)
Wenke Lee, Salvatore J. Stolfo, Kui W. Mok
We discuss the KDD process in "data-flow" environments, where unstructured and time dependent data can be processed into various levels of structured and semantically-rich forms for...
AdaCost: Misclassification Cost-sensitive Boosting (1999)
Wei Fan, Salvatore J. Stolfo, Junxin Zhang, Philip K. Chan
AdaCost, a variant of AdaBoost, is a misclassification cost-sensitive boosting method. It uses the cost of misclassifications to update the training distribution on successive boosting rounds. The...
Using Conflicts Among Multiple Base Classifiers to Measure the Performance of Stacking (1999)
Wei Fan, Salvatore J. Stolfo, Philip K. Chan
We analyze the machine learning bias of stacking and point out the conflict problem. Conflicts are defined as base data with different class labels that produced the same predictions by a set of base...
Distributed Data Mining in Credit Card Fraud Detection (1999)
Philip K. Chan, Wei Fan, Andreas L. Prodromidis, Salvatore J. Stolfo
In this article, we survey and evaluate a number of techniques that address these three main issues concurrently. Our proposed methods of combining multiple learned fraud detectors under a "cost...
A Data Mining Framework for Building Intrusion Detection Models (1999)
Wenke Lee, Salvatore J. Stolfo, Kui W. Mok
There is often the need to update an installed Intrusion Detection System (IDS) due to new attack methods or upgraded computing environments. Since many current IDSs are constructed by manual...
Philip Chan, Salvatore J. Stolfo
Many factors influence the performance of a learned classifier. In this paper we study different methods of measuring performance based on a unified set of cost models and the effects of training...
Effective and Efficient Pruning of Meta-Classifiers in a Distributed Data Mining System (1999)
Andreas L. Prodromidis, Salvatore J. Stolfo, Philip K. Chan
Distributed data mining systems aim to discover and combine useful information that is distributed across multiple databases. One of the main challenges is the design of effective and efficient...
Cost Complexity-based Pruning of Ensemble Classifiers (1999)
Andreas L. Prodromidis, Salvatore J. Stolfo
In this paper we study methods that combine multiple classification models learned over separate data sets in a distributed database setting. Numerous studies posit that such approaches provide the...
Mining in a Data-flow Environment: Experience in Network Intrusion Detection (1999)
Wenke Lee, Salvatore J. Stolfo, Kui W. Mok
In this paper we discuss the KDD process in "data-flow" environments, where unstructured and time dependent data can be processed into various levels of structured and semantically-rich...
Minimal cost complexity pruning of meta-classifiers (1999)
Andreas L. Prodromidis, Salvatore J. Stolfo
Integrating multiple learned classification models (classifiers) computed over large and (physically) distributed data sets has been demonstrated as an effective approach to scaling inductive...
A Comparative Evaluation of Meta-Learning Strategies over Large and Distributed Data Sets (1999)
Andreas L. Prodromidis, Salvatore J. Stolfo
There has been considerable interest recently in various approaches to scaling up machine learning systems to large and distributed data sets. We have been studying approaches based upon the parallel...
The application of AdaBoost for distributed, scalable and on-line learning (1999)
Wei Fan, Salvatore J. Stolfo, Junxin Zhang
Learning from very large and distributed databases imposes major performance challenges for data mining. Many databases have grown too large to fit into main memory. Learning a...
Data Mining Approaches for Intrusion Detection (1998)
Wenke Lee, Salvatore J. Stolfo
In this paper we discuss our research in developing general and systematic methods for intrusion detection. The key ideas are to use data mining techniques to discover consistent and useful patterns...
Application-Level Anomaly Detection for the Master Caution Panel (1998)
The goal of this work was to study how to monitor a large distributed system and apply machine learning methods to, and generate models of, its normal operation. With this done, the generated...
Pruning Classifiers in a Distributed Meta-Learning System (1998)
Andreas L. Prodromidis, Salvatore J. Stolfo
JAM is a powerful and portable agent-based distributed data mining system that employs meta-learning techniques to integrate a number of independent classifiers (models) derived in parallel from...
Philip K. Chan, Salvatore J. Stolfo
Many factors influence a learning process and the performance of a learned classifier. In this paper we investigate the effects of class distribution in the training set on performance. We also study...
Pruning Meta-Classifiers in a Distributed Data Mining System (1998)
Andreas L. Prodromidis, Salvatore J. Stolfo
JAM is a powerful and portable agent-based distributed data mining system that employs meta-learning techniques to integrate a number of independent classifiers (models) derived in parallel from...
Pruning Meta-Classifiers in a Distributed Data Mining System (1998)
Andreas L. Prodromidis, Salvatore J. Stolfo
JAM is a powerful and portable agent-based distributed data mining system that employs meta-learning techniques to integrate a number of independent classifiers (models) derived in parallel from...
Mining Audit Data to Build Intrusion Detection Models (1998)
Wenke Lee, Salvatore J. Stolfo, Kui W. Mok
In this paper we discuss a data mining framework for constructing intrusion detection models. The key ideas are to mine system audit data for consistent and useful patterns of program and user...
Data Mining Approaches for Intrusion Detection (1998)
Wenke Lee, Salvatore J. Stolfo
In this paper we discuss our research in developing general and systematic methods for intrusion detection. The key ideas are to use data mining techniques to discover consistent and useful patterns...
Pruning Meta-Classifiers in a Distributed Data Mining System (1998)
Andreas L. Prodromidis, Salvatore J. Stolfo
JAM is a powerful and portable agent-based distributed data mining system that employs meta-learning techniques to integrate a number of independent classifiers (models) derived in parallel from...
Agent-based fraud and intrusion detection in financial information systems (1998)
Salvatore J. Stolfo, David W. Fan, Andreas Prodromidis, Wenke Lee, Shelley Tselepis, Philip K. Chan
A secured and trusted inter-banking network for electronic commerce requires high speed verification and authentication mechanisms that allow legitimate users easy access to conduct their business,...
Philip Chan, Salvatore J. Stolfo
Very large databases with skewed class distributions and non-uniform cost per error are not uncommon in real-world data mining tasks. One such task is credit card fraud detection: the number of...
Philip K. Chan, Salvatore J. Stolfo
Very large databases with skewed class distributions and non-uniform cost per error are not uncommon in real-world data mining tasks. We devised a multi-classifier meta-learning approach to address...
A Data Mining Framework for Adaptive Intrusion Detection (1998)
Wenke Lee, Salvatore J. Stolfo, Kui W. Mok
In this paper we describe a data mining framework for constructing intrusion detection models. The key ideas are to mine system audit data for consistent and useful patterns of program and user...
Philip K. Chan, Salvatore J. Stolfo
Many factors influence a learning process and the performance of a learned classifier. In this paper we investigate the performance effects of class distribution in the training set. We also study...
Pruning Meta-Classifiers in a Distributed Data Mining System (1998)
Andreas Prodromidis, Salvatore J. Stolfo
JAM is a powerful and portable agent-based distributed data mining system that employs meta-learning techniques to integrate a number of independent classifiers (models) derived in parallel from...
Mining Audit Data to Build Intrusion Detection Models (1998)
Wenke Lee, Salvatore J. Stolfo, Kui W. Mok
In this paper we discuss a data mining framework for constructing intrusion detection models. The key ideas are to mine system audit data for consistent and useful patterns of program and user...
Behavior-based modeling and its application to email analysis (1998)
Salvatore J. Stolfo, Shlomo Hershkop, Chia-wei Hu, Wei-jen Li, Olivier Nimeskern, Ke Wang
The Email Mining Toolkit (EMT) is a data mining system that computes behavior profiles or models of user email accounts. These models may be used for a multitude of tasks including forensic analyses...
Learning Patterns from Unix Process Execution Traces for Intrusion Detection (1997)
Wenke Lee, Salvatore J. Stolfo, Philip K. Chan
In this paper we describe our preliminary experiments to extend the work pioneered by Forrest (see Forrest et al. 1996) on learning the (normal and abnormal) patterns of Unix processes. These...
Credit Card Fraud Detection Using Meta-Learning: Issues and Initial Results (1997)
Salvatore J. Stolfo, David W. Fan, Wenke Lee, Andreas L. Prodromidis, Philip K. Chan
In this paper we describe initial experiments using meta-learning techniques to learn models of fraudulent credit card transactions. Our collaborators, some of the nation's largest banks, have...
Scalability of Hierarchical Meta-Learning on Partitioned Data (1997)
Philip K. Chan, Salvatore J. Stolfo
In this paper we study the issue of how to scale machine learning algorithms, that typically are designed to deal with main-memory based datasets, to efficiently learn models from large distributed...
A Generalization of Band Joins and The Merge/Purge Problem (1996)
Mauricio A. Hernández, Salvatore J. Stolfo
The problem of merging multiple databases of information about common entities is frequently encountered in large commercial and government organizations. The problem we study is often called the...
A Comparative Evaluation of Combiner and Stacked Generalization (1996)
David W. Fan, Philip K. Chan, Salvatore J. Stolfo
Combiner and Stacked Generalization are two very similar meta-learning methods that combine predictions of multiple classifiers to improve accuracy of any single classifier. In this paper, we compare...
On the Accuracy of Meta-learning for Scalable Data Mining (1996)
Philip Chan, Salvatore J. Stolfo
In this paper, we describe a general approach to scaling data mining applications that we have come to call meta-learning. Meta-Learning refers to a general strategy that seeks to learn how to...
Sharing Learned Models among Remote Database Partitions by Local Meta-learning (1996)
Philip K. Chan, Salvatore J. Stolfo
We explore the possibility of importing "blackbox" models learned over data sources at remote sites to improve models learned over locally available data sources. In this way, we may be...
Scaling Learning by Meta-Learning over Disjoint and Partially Replicated Data (1996)
Philip Chan, Salvatore J. Stolfo
Many existing learning algorithms assume that the entire data set fits into main memory, which is not feasible for massive amounts of inherently distributed data. One approach we explore to handling...
Learning Arbiter and Combiner Trees from Partitioned Data for Scaling Machine Learning (1995)
Philip K. Chan, Salvatore J. Stolfo
Knowledge discovery in databases has become an increasingly important research topic with the advent of wide area network computing. One of the crucial problems we study in this paper is how to scale...
A Comparative Evaluation of Voting and Meta-learning on Partitioned Data (1995)
Philip Chan, Salvatore J. Stolfo
Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of very large network computing, it is likely that orders of magnitude...
A Comparative Evaluation of Voting and Meta-learning on Partitioned Data (1995)
Philip K. Chan, Salvatore J. Stolfo
Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of very large network computing, it is likely that orders of magnitude...
Predictive Dynamic Load Balancing of Parallel and Distributed Rule and Query Processing (1994)
Hasanat M. Dewan, Salvatore J. Stolfo, Mauricio Hernández
Expert Databases are environments that support the processing of rule programs against a disk resident database. They occupy a position intermediate between active and deductive databases, with...
Toward Scalable and Parallel Inductive Learning: A Case Study in Splice Junction Prediction (1994)
Philip Chan, Salvatore J. Stolfo
Much of the research in inductive learning concentrates on problems with relatively small amounts of training data. With the steady progress of the Human Genome Project, it is likely that orders of...
Toward Parallel and Distributed Learning by Meta-Learning (1993)
Philip Chan, Salvatore J. Stolfo
Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of very large network computing, it is likely that orders of magnitude...
Experiments on Multistrategy Learning by Meta-Learning (1993)
Philip Chan, Salvatore J. Stolfo
In this paper, we propose meta-learning as a general technique to combine the results of multiple learning algorithms, each applied to a set of training data. We detail several meta-learning...
Parallel Programming of Rule-based Systems in PARULEL (1993)
Mauricio A. Hernández, Salvatore J. Stolfo
Although the problem of increasing the speed of rule-based programs by parallel processing has been studied for a long while, so far the level of parallelism achieved under various parallel processing...
PARULEL: Parallel Rule Processing Using Meta-rules for Redaction (1991)
Salvatore J. Stolfo, Ouri Wolfson, Philip K. Chan, Hasanat M. Dewan, Leland Woodbury, Jason S. Glazier, ...
Although the problem of increasing the speed of rule-based programs has been studied for a long while, so far the level of parallelism achieved under various parallel processing schemes fails to meet...
Incremental Evaluation of Rules and its Relationship to Parallelism (1991)
Ouri Wolfson, Hasanat M. Dewan, Salvatore J. Stolfo, Yechiam Yemini
Rule interpreters usually start with an initial database and perform the inference procedure in cycles, ending with a final database. In a real time environment it is possible to receive updates to...
Salvatore J. Stolfo, Leland Woodbury, Jason Glazier, Philip Chan
this report in detail.
Speech recognition in parallel (1989)
Salvatore J. Stolfo, Zvi Galil, Kathleen Mckeown, Russell Mills
Concomitantly with recent advances in speech coding, recognition and production, parallel computer systems are now commonplace delivering raw computing power measured in hundreds of MIPS and...
Typescript.
Cost-based Modeling for Fraud and Intrusion Detection: Results from the JAM Project
Salvatore J. Stolfo, Wei Fan, Wenke Lee, Andreas Prodromidis, Philip K. Chan
In this paper we describe the results achieved using the JAM distributed data mining system for the real world problem of fraud detection in financial information systems. For this domain we provide...
Data Mining Methods for Detection of New Malicious Executables
Matthew G. Schultz, Eleazar Eskin, Erez Zadok, Salvatore J. Stolfo
A serious security threat today is malicious executables, especially new, unseen malicious executables often arriving as email attachments. These new malicious executables are created at the rate of...