Benyu Zhang

Publication List Details

Period: 2004 - 2009

Number: 29

Effective and Efficient Dimensionality Reduction for Large-Scale and Streaming Data Preprocessing (2009)

Jun Yan, Benyu Zhang, Ning Liu, Shuicheng Yan, Qiansheng Cheng, Weiguo Fan, ...

Dimensionality reduction is an essential data preprocessing technique for large-scale and streaming data classification tasks. It can be used to improve both the efficiency and the...

Effective and Efficient Dimensionality Reduction for Large-Scale and Streaming Data Preprocessing (2008)

Jun Yan, Benyu Zhang, Ning Liu, Shuicheng Yan, Qiansheng Cheng, Weiguo Fan, ...

Dimensionality reduction is an essential data preprocessing technique for large-scale and streaming data classification tasks. It can be used to improve both the efficiency and the...

Learning Similarity Measures in Non-Orthogonal Spaces (2008)

Ning Liu, Benyu Zhang, Jun Yan, Qiang Yang, Shuicheng Yan, Zheng Chen, ...

Many machine learning and data mining algorithms crucially rely on the similarity metrics. The Cosine similarity, which calculates the inner product of two normalized feature vectors, is one of the...

Adding Semantics to Email Clustering (2008)

Hua Li, Dou Shen, Benyu Zhang, Zheng Chen, Qiang Yang

This paper presents a novel algorithm to cluster emails according to their contents and the sentence styles of their subject lines. In our algorithm, natural language processing techniques and...

Mining Quantitative Associations in Large Database (2008)

Chenyong Hu, Yongji Wang, Benyu Zhang, Qiang Yang, Qing Wang, Jinhui Zhou, ...

Association Rule Mining algorithms operate on a data matrix to derive association rules, discarding the quantities of the items, which contain valuable information. In order to make full...

A Similarity Reinforcement Algorithm for Heterogeneous Web Pages (2008)

Ning Liu, Jun Yan, Fengshan Bai, Benyu Zhang, Wensi Xi, Weiguo Fan, ...

Many machine learning and data mining algorithms crucially rely on the similarity metrics. However, most early research works such as Vector Space Model or Latent Semantic Index only used...

Diverse Topic Phrase Extraction from Text Collection (2008)

Jilin Chen, Benyu Zhang, Dou Shen, Qiang Yang, Zheng Chen, Qiansheng Cheng

Keyword extraction is an efficient approach to managing an explosion of online text on the Web. Traditionally, an abstraction of the online text is constructed through keywords, which are extracted...

A Scalable Supervised Algorithm for Dimensionality Reduction on Streaming Data (2008)

Jun Yan, Benyu Zhang, Shuicheng Yan, Ning Liu, Qiang Yang, Qiansheng Cheng, ...

Algorithms on streaming data have attracted increasing attention in the past decade. Among them, dimensionality reduction algorithms are greatly interesting due to the desirability of real tasks....

A Novel Clustering-based RSS Aggregator, WWW 2007 poster paper, topic: User Interfaces and Accessibility (2008)

Xin Li, Jun Yan, Zhihong Deng, Lei Ji, Weiguo Fan, Benyu Zhang, ...

In recent years, different commercial Weblog subscribing systems have been proposed to return stories from users' subscribed feeds. In this paper, we propose a novel clustering-based RSS...

Causal Relation of Queries from Temporal Logs, WWW 2007 poster paper, topic: Search (2008)

Yizhou Sun, Ning Liu, Kunqing Xie, Shuicheng Yan, Benyu Zhang, Zheng Chen

In this paper, we study a new problem of mining causal relations between queries in search engine query logs. A causal relation between two queries means that an event on one query is the cause of some event on...

Graph embedding and extension: A general framework for dimensionality reduction (2007)

Shuicheng Yan, Dong Xu, Benyu Zhang, Hong-jiang Zhang, Qiang Yang, ...

Over the past few decades, a large family of algorithms (supervised or unsupervised; stemming from statistics or geometry theory) has been designed to provide different solutions to the...

A novel scalable algorithm for supervised subspace learning (2006)

Jun Yan, Ning Liu, Benyu Zhang, Qiang Yang, Shuicheng Yan, Zheng Chen

Subspace learning approaches aim to discover important statistical distribution on lower dimensions for high dimensional data. Methods such as Principal Component Analysis (PCA) do not make use of...

Learning quantifiable associations via principal sparse non-negative matrix factorization (2005)

Chenyong Hu, Benyu Zhang, Yongji Wang, Shuicheng Yan, Zheng Chen, Qing Wang, ...

Association rules are traditionally designed to capture statistical relationships among itemsets in a given database. To additionally capture the quantitative association knowledge, Korn et al....

An incremental subspace learning algorithm to categorize large scale text data (2005)

Jun Yan, Qiansheng Cheng, Qiang Yang, Benyu Zhang

The dramatic growth in the number and size of on-line information sources has fueled increasing research interest in the incremental subspace learning problem. In this paper, we propose an...

Improving web search results using affinity graph (2005)

Benyu Zhang, Hua Li, Yi Liu, Lei Ji, Wensi Xi, Weiguo Fan, ...

In this paper, we propose a novel ranking scheme named Affinity Ranking (AR) to re-rank search results by optimizing two metrics: (1) diversity, which indicates the variance of topics in a group of...

Efficient text classification by weighted proximal SVM (2005)

Dong Zhuang, Benyu Zhang, Qiang Yang, Jun Yan, Zheng Chen, Ying Chen

In this paper, we present an algorithm that can classify large-scale text data with high classification quality and fast training speed. Our method is based on a novel extension of the proximal SVM...

Graph Embedding: A General Framework for Dimensionality Reduction (2005)

Shuicheng Yan, Dong Xu, Benyu Zhang, Hong-jiang Zhang

In the last decades, a large family of algorithms (supervised or unsupervised; stemming from statistics or geometry theory) has been proposed to provide different solutions to the problem of...

OCFS: Optimal orthogonal centroid feature selection for text categorization (2005)

Jun Yan, Ning Liu, Benyu Zhang, Shuicheng Yan, Zheng Chen, Qiansheng Cheng, ...

Text categorization is an important research area in many Information Retrieval (IR) applications. To save the storage space and computation time in text categorization, efficient and...

Link Fusion: A Unified Link Analysis Framework for Multi-Type Interrelated Data Objects (2004)

Wensi Xi, Benyu Zhang, Zheng Chen, Yizhou Lu, Shuicheng Yan, Wei-Ying Ma

Web link analysis has proven to be a significant enhancement for quality based web search. Most existing links can be classified into two categories: intra-type links (e.g., web hyperlinks), which...

Link fusion: A unified link analysis framework for multi-type interrelated data objects. In: WWW’04 (2004)

Wensi Xi, Benyu Zhang, Zheng Chen, Yizhou Lu, Shuicheng Yan, Wei-ying Ma, ...

Web link analysis has proven to be a significant enhancement for quality based web search. Most existing links can be classified into two categories: intra-type links (e.g., web hyperlinks), which...

Mining ratio rules via principal sparse non-negative matrix factorization (2004)

Chenyong Hu, Benyu Zhang, Shuicheng Yan, Qiang Yang, Jun Yan, Zheng Chen, ...

Association rules are traditionally designed to capture statistical relationships among itemsets in a given database. To additionally capture the quantitative association knowledge, F. Korn et al...

Affinity Rank: A New Scheme for Efficient Web Search (2004)

Yi Liu, Benyu Zhang, Zheng Chen, Michael R. Lyu, Wei-ying Ma

Maximizing only the relevance between queries and documents will not satisfy users if they want the top search results to present a wide coverage of topics by a few representative documents. In this...

Improving text classification using local latent semantic indexing (2004)

Tao Liu, Zheng Chen, Benyu Zhang, Wei-ying Ma, Gongyi Wu

Latent Semantic Indexing (LSI) has been shown to be extremely useful in information retrieval, but it is not an optimal representation for text classification. It always drops the text classification...

Learning similarity measures in non-orthogonal space (2004)

Ning Liu, Benyu Zhang, Jun Yan, Qiang Yang, Shuicheng Yan, Zheng Chen, ...

Many machine learning and data mining algorithms crucially rely on the similarity metrics. The Cosine similarity, which calculates the inner product of two normalized feature vectors, is one of the...