Building Bridges for Web Query Classification (2009)
Dou Shen, Jian-tao Sun, Qiang Yang, Zheng Chen
Web query classification (QC) aims to classify Web users’ queries, which are often short and ambiguous, into a set of target categories. QC has many applications including page ranking in Web...
Latent Friend Mining from Blog Data (2008)
Dou Shen, Jian-tao Sun, Qiang Yang, Zheng Chen
The rapid growth of blog (also known as “weblog”) data provides a rich resource for social community mining. In this paper, we put forward a novel research problem of mining the latent friends of...
Detect and Track Latent Factors with Online Nonnegative Matrix Factorization (2008)
Bin Cao, Dou Shen, Jian-tao Sun, Xuanhui Wang, Qiang Yang, Zheng Chen
Detecting and tracking latent factors from temporal data is an important task. Most existing algorithms for latent topic detection such as Nonnegative Matrix Factorization (NMF) have been designed...
Web-Data Driven Approach for Bridging the Gap between Image Content and Concept (2008)
Xin-jing Wang, Jian-tao Sun, Wei-ying Ma, Xing Li
Due to the semantic gap, current content-based image retrieval framework cannot satisfy the complex demands created by a user’s preferences and subjectivity. To retrieve images in a...
Algorithms, Experimentation (2008)
Dou Shen, Qiang Yang, Jian-tao Sun, Zheng Chen
Text message stream is a newly emerging type of Web data which is produced in enormous quantities with the popularity of Instant Messaging and Internet Relay Chat. It is beneficial for detecting the...
SHINE: Search Heterogeneous Interrelated Entities (2008)
Heterogeneous entities or objects are very common and are usually interrelated with each other in many scenarios. For example, typical Web search activities involve multiple types of interrelated...
CWS: A Comparative Web Search System (2008)
Jian-tao Sun, Xuanhui Wang, Dou Shen, Hua-jun Zeng, Zheng Chen
In this paper, we define and study a novel search problem: Comparative Web Search (CWS). The task of CWS is to seek relevant and comparative information from the Web to help users conduct comparisons...
Document Summarization using Conditional Random Fields (2007)
Dou Shen, Jian-tao Sun, Hua Li, Qiang Yang, Zheng Chen
Many methods, including supervised and unsupervised algorithms, have been developed for extractive document summarization. Most supervised methods consider the summarization task as a two-class...
A Comparison of Implicit and Explicit Links for Web Page Classification (2006)
Dou Shen, Jian-Tao Sun, Qiang Yang, Zheng Chen
It is well known that Web-page classification can be enhanced by using hyperlinks that provide linkages between Web pages. However, in the Web space, hyperlinks are usually sparse, noisy and thus in...
Latent semantic analysis for multiple-type interrelated data objects (2006)
Xuanhui Wang, Jian-tao Sun, Zheng Chen, Chengxiang Zhai
Co-occurrence data is quite common in many real applications. Latent Semantic Analysis (LSA) has been successfully used to identify semantic relations in such data. However, LSA can only handle a...
Mining Clickthrough Data for Collaborative Web Search (2006)
Jian-tao Sun, Xuanhui Wang, Dou Shen, Hua-jun Zeng, Zheng Chen
Query enrichment for web-query classification (2006)
Dou Shen, Rong Pan, Jian-tao Sun, Jeffrey Junfeng Pan, Kangheng Wu, Jie Yin, ...
Web search queries are typically short and ambiguous. To classify these queries into certain target categories is a difficult but important problem. In this paper, we present a new technique called...
Query enrichment for web-query classification (2006)
Dou Shen, Rong Pan, Jian-tao Sun, Jeffrey Junfeng Pan, Kangheng Wu, Jie Yin, ...
Web-search queries are typically short and ambiguous. To classify these queries into certain target categories is a difficult but important problem. In this article, we present a new technique called...
Web-page summarization using clickthrough data (2005)
Jian-tao Sun, Qiang Yang, Yuchang Lu
Most previous Web-page summarization methods treat a Web page as plain text. However, such methods fail to uncover the full knowledge associated with a Web page to build a high-quality summary,...
Q2C@UST: Our Winning Solution to Query Classification in KDDCUP 2005 (2005)
Dou Shen, Rong Pan, Jian-tao Sun, Jeffrey Junfeng Pan, Kangheng Wu, Jie Yin, ...
In this paper, we describe our ensemble-search based approach,
CubeSVD: A Novel Approach to Personalized Web Search (2005)
Jian-tao Sun, Hua-Jun Zeng, Huan Liu, Yuchang Lu
As the competition of Web search market increases, there is a high demand for personalized Web search to conduct retrieval incorporating Web users' information needs. This paper focuses on...
GE-CKO: A Method to Optimize Composite Kernels for Web Page Classification (2004)
Jian-tao Sun, Ben-yu Zhang, Zheng Chen, Yu-chang Lu, Chun-yi Shi, Wei-ying Ma
Most current research on Web page classification focuses on leveraging heterogeneous features such as plain text, hyperlinks and anchor texts in an effective and efficient way. Composite kernel...