Deng Cai, Shipeng Yu, Ji-rong Wen, Wei-ying Ma
A new web content structure analysis based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and automatic page...
Exploring URL Hit Priors for Web Search (2008)
Ruihua Song, Guomao Xin, Shuming Shi, Ji-rong Wen, Wei-ying Ma
URL usually contains meaningful information for measuring the relevance of a Web page to a query in Web search. Some existing works utilize URL depth priors (i.e. the probability of being a...
Pseudo-Anchor Text Extraction for Vertical Search (2008)
Shuming Shi, Fei Xing, Mingjie Zhu, Zaiqing Nie, Ji-rong Wen
Anchor text plays an especially important role in improving the performance of general Web search. The importance of anchor text comes from the fact that it is a fairly objective description for a Web page...
Dynamic Hierarchical Markov Random Fields and their Application to Web Data Extraction (2008)
Jun Zhu, Zaiqing Nie, Bo Zhang, Ji-rong Wen
Hierarchical models have been extensively studied in various domains. However, existing models assume fixed model structures or incorporate structural uncertainty generatively. In this paper, we...
Ruihua Song, Zhenxiao Luo, Ji-rong Wen, Yong Yu, Hsiao-wuen Hon
It is widely believed that some queries submitted to search engines are by nature ambiguous (e.g., java, apple). However, few studies have investigated the questions of “how many queries are...
Zaiqing Nie, Yunxiao Ma, Shuming Shi, Ji-rong Wen, Wei-ying Ma
The primary function of current Web search engines is essentially relevance ranking at the document level. However, myriad structured information about real-world objects is embedded in static Web...
Content-based Query Clustering (2008)
W. Wu, Ji-rong Wen, Hong-jiang Zhang
The high quality, structured data from Web structured sources is invaluable for many applications. Hidden Web databases are not directly crawlable by Web search engines and are only accessible...
Learning Block Importance Models for Web Pages (2008)
Ruihua Song, Haifeng Liu, Ji-rong Wen, Wei-ying Ma
Previous work shows that a web page can be partitioned into multiple segments or blocks, and often the importance of those blocks in a page is not equivalent. Also, it has been proven that...
A Systematic Study of Parameter Correlations in Large Scale Duplicate Document Detection (2008)
Shaozhi Ye, Ji-rong Wen, Wei-ying Ma
Although much work has been done on duplicate document detection (DDD) and its applications, we observe the absence of a systematic study of the performance and scalability of large-scale DDD. It is...
A systematic study on parameter correlations in large-scale duplicate document detection (2008)
Ye, Shaozhi, Wen, Ji-Rong, Ma, Wei-Ying
Although much work has been done on duplicate document detection (DDD) and its applications, we observe the absence of a systematic study on the performance and scalability of large-scale DDD...
Hang Cui, Ji-rong Wen, Jian-yun Nie, Wei-ying Ma
Queries to search engines on the Web are usually short. They do not provide sufficient indications for an effective selection of relevant documents. Previous research has proposed the utilization of...
A systematic study of parameter correlations in large scale duplicate document detection (2006)
Ye, Shaozhi, Wen, Ji-Rong, Ma, Wei-Ying
Although much work has been done on duplicate document detection (DDD) and its applications, we observe the absence of a systematic study of the performance and scalability of large-scale DDD. It is...
Automated Known Problem Diagnosis with Event Traces (2006)
Chun Yuan, Ni Lao, Ji-rong Wen, Jiwei Li, Zheng Zhang, Yi-min Wang, ...
Computer problem diagnosis remains a serious challenge to users and support professionals. Traditional troubleshooting methods relying heavily on human intervention make the process inefficient and...
Simultaneous record detection and attribute labeling in web data extraction (2006)
Jun Zhu, Zaiqing Nie, Ji-rong Wen
Recent work has shown the feasibility and promise of template-independent Web data extraction. However, existing approaches use decoupled strategies – attempting to do data record detection and...
Object-Level Ranking: Bringing Order to Web Objects (2005)
Zaiqing Nie, Yuanzhi Zhang, Ji-rong Wen, Wei-ying Ma
In contrast with the current Web search methods that essentially do document-level ranking and retrieval, we are exploring a new paradigm to enable Web search at the object level. We collect Web...
Gravitation-based model for information retrieval (extended version) (2005)
Shuming Shi, Ji-rong Wen, Qing Yu, Ruihua Song, Wei-ying Ma
This paper proposes GBM (gravitation-based model), a physical model for information retrieval inspired by Newton’s theory of gravitation. A mapping is built in this model from concepts of...
2D conditional random fields for web information extraction (2005)
Zaiqing Nie, Ji-rong Wen, Bo Zhang
The Web contains an abundance of useful semistructured information about real world objects, and our empirical study shows that strong sequence characteristics exist for Web information about objects...
2D conditional random fields for web information extraction (2005)
Zaiqing Nie, Ji-rong Wen, Bo Zhang, Wei-ying Ma
The Web contains an abundance of useful semistructured information about real world objects, and our empirical study shows that strong sequence characteristics exist for Web information about objects...
An Implementation and Experimental (2005)
Zaiqing Nie, Yuanzhi Zhang, Ji-rong Wen, Wei-ying Ma
In contrast with the current Web search methods that essentially do document-level ranking and retrieval, we are exploring a new paradigm to enable Web search at the object level. We collect Web...
Efficient browsing of web search results on mobile devices based on block importance model (2005)
Xing Xie, Gengxin Miao, Ruihua Song, Ji-rong Wen, Wei-ying Ma
It is expected that more and more people will search the web when they are on the move. Though conventional search engines can be directly visited from mobile devices with web browsing capabilities,...
Learning Block Importance Models for Web Pages (2004)
Song, Ruihua, Liu, Haifeng, Wen, Ji-Rong, Ma, Wei-Ying
Previous work shows that a web page can be partitioned into multiple segments or blocks, and often the importance of those blocks in a page is not equivalent. Also, it has been proven that...
Probabilistic Model for Contextual Retrieval (2004)
Contextual retrieval is a critical technique for facilitating many important applications such as mobile search, personalized search, PC troubleshooting, etc. Despite its importance, there is no...
Deng Cai, Shipeng Yu, Ji-rong Wen, Wei-ying Ma
The multiple topics and varying lengths of web pages are two negative factors that significantly affect the performance of web search. In this paper, we explore the use of page segmentation algorithms to...
Organizing WWW Images Based on The Analysis of Page Layout and Web Link Structure (2004)
Deng Cai, Xiaofei He, Wei-ying Ma, Ji-rong Wen, Hongjiang Zhang
Due to the rapid growth of the number of digital images on the Web, there is an increasing demand for effective and efficient methods for organizing and retrieving the images available. This paper...
Block-level Link Analysis (2004)
Deng Cai, Xiaofei He, Ji-rong Wen, Wei-ying Ma
Link Analysis has shown great potential in improving the performance of web search. PageRank and HITS are two of the most popular algorithms. Most of the existing link analysis algorithms treat a web...
Why PCs Are Fragile and What We Can Do About It: A Study of Windows Registry Problems (2004)
Archana Ganapathi, Yi-min Wang, Ni Lao, Ji-rong Wen
Software configuration problems are a major source of failures in computer systems. In this paper, we present a new framework for categorizing configuration problems. We apply this categorization to...
J.X. Yu, X. Lin, H. Lu, and Y. Zhang (Eds.): APWeb 2004, LNCS 3007, pp. 48--58, 2004. (2004)
Shaozhi Ye, Ruihua Song, Ji-rong Wen, Wei-ying Ma
Duplication of Web pages greatly hurts the perceived relevance of a search engine. Existing methods for detecting duplicated Web pages can be classified into two categories, i.e. offline and online...
A Query-Dependent Duplicate Detection Approach for Large Scale Search Engines (2004)
Shaozhi Ye, Ruihua Song, Ji-rong Wen, Wei-ying Ma
Duplication of Web pages greatly hurts the perceived relevance of a search engine. Existing methods for detecting duplicated Web pages can be classified into two categories, i.e. offline...
The Co-Evolution of Systems and (2004)
Shaozhi Ye, Ji-rong Wen, Wei-ying Ma
Under consideration for publication in Knowledge and Information
Deng Cai, Xiaofei He, Zhiwei Li, Wei-ying Ma, Ji-rong Wen
We consider the problem of clustering Web image search results. Generally, the image search results returned by an image search engine contain multiple topics. Organizing the results into different...
ImageSeer: Clustering and Searching WWW Images (2004)
Xiaofei He, Deng Cai, Ji-rong Wen, Wei-ying Ma, Hong-jiang Zhang
Due to the rapid growth of the number of digital images on the Web, there is an increasing demand for effective and efficient methods for organizing and retrieving the images available. This paper...
Learning Important Models for Web Page Blocks based on Layout and Content Analysis (2004)
Ruihua Song, Haifeng Liu, Ji-rong Wen, Wei-ying Ma
Previous work shows that a web page can be partitioned into multiple segments or blocks, and often the importance of those blocks in a page is not equivalent. It has also been proven that...
Improving Pseudo-Relevance Feedback in Web Information Retrieval Using Web Page Segmentation (2003)
Yu, Shipeng, Cai, Deng, Wen, Ji-Rong, Ma, Wei-Ying
In contrast to traditional document retrieval, a web page as a whole is not a good information unit to search because it often contains multiple topics and a lot of irrelevant information from...
A Multi-paradigm Querying Approach for a Generic Multimedia Database Management System (2003)
Ji-rong Wen, Qing Li, Wei-ying Ma, Hong-jiang Zhang
To truly meet the requirements of multimedia database (MMDB) management, an integrated framework for modeling, managing and retrieving various kinds of media data in a uniform way is necessary....
Hierarchical indexing and flexible element retrieval for structured documents (2003)
Hang Cui, Ji-rong Wen, Tat-seng Chua
As more and more structured documents, such as the SGML or XML documents, become available on the Web, there is a growing demand to develop effective structured document retrieval which...
Improving pseudo-relevance feedback in web information retrieval using web page segmentation (2003)
Shipeng Yu, Deng Cai, Ji-rong Wen, Wei-ying Ma
In contrast to traditional document retrieval, a web page as a whole is not a good information unit to search because it often contains multiple topics and a lot of irrelevant information from...
VIPS: a Vision-based Page Segmentation Algorithm (2003)
Deng Cai, Shipeng Yu, Ji-rong Wen, Wei-ying Ma
A new web content structure analysis based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and automatic page...
Hierarchical indexing and flexible element retrieval for structured documents (2003)
As more and more structured documents, such as SGML or XML documents, become available on the Web, there is a growing demand to develop effective structured document retrieval which exploits...
Query Expansion by Mining User Logs (2003)
Hang Cui, Ji-rong Wen, Jian-yun Nie, Wei-ying Ma
Queries to search engines on the Web are usually short. They do not provide sufficient evidence for an effective selection of relevant documents. Previous research has proposed the...
Cost-Driven Storage Schema Selection for XML (2003)
Various models and approaches have been proposed for mapping XML data into relational tables recently.
Hierarchical Indexing and Flexible Element Retrieval for Structured Document (2003)
Hang Cui, Ji-rong Wen
As more and more structured documents, such as SGML or XML documents become available on the Web, there is a growing demand to develop effective structured document retrieval which exploits both...
Query Expansion by Mining User Logs (2003)
Hang Cui, Ji-rong Wen, Jian-yun Nie, Wei-ying Ma
Queries to search engines on the Web are usually short. They do not provide sufficient information for an effective selection of relevant documents. Previous research has proposed the utilization of...
Extracting content structure for web pages based on visual representation (2003)
Deng Cai, Shipeng Yu, Ji-rong Wen, Wei-ying Ma
A new web content structure based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and automatic page...
Probabilistic Query Expansion Using Query Logs (2002)
Cui, Hang, Wen, Ji-Rong, Nie, Jian-Yun, Ma, Wei-Ying
Query expansion has long been suggested as an effective way to resolve the short query and word mismatching problems. A number of query expansion methods have been proposed in traditional information...
A Concentric Model for Community Mining in Graph Structures (2002)
Wen-jun Zhou, Ji-rong Wen, Wei-ying Ma, ...
Discovering communities from a graph structure such as the Web has become an interesting research problem recently. In this paper, comparing with the state-of-the-art authority detecting and graph...
Probabilistic Query Expansion Using Query Logs (2002)
Hang Cui, Ji-rong Wen, Jian-yun Nie, Wei-ying Ma
Query expansion has long been suggested as an effective way to resolve the short query and word mismatching problems. A number of query expansion methods have been proposed in traditional information...
Probabilistic Query Expansion Using Query Logs (2002)
Hang Cui, Ji-Rong Wen, Wei-Ying Ma
Query expansion has long been suggested as an effective way to resolve the short query and word mismatching problems. A number of query expansion methods have been proposed in the traditional...
Ji-rong Wen, Jian-yun Nie, Hong-jiang Zhang
This paper describes a new query clustering method that makes use of user logs which allow us to identify the documents the users have selected for a query. The similarity between two queries may be...
Clustering User Queries of a Search Engine (2001)
Wen, Ji-Rong, Nie, Jian-Yun, Zhang, Hong-Jiang
In order to increase retrieval precision, some new search engines provide manually verified answers to Frequently Asked Queries (FAQs). An underlying task is the identification of FAQs. This paper...
Query clustering using content words and user feedback (2001)
Query clustering is crucial for automatically discovering frequently asked queries (FAQs) or most popular topics on a question-answering search engine. Due to the short length of queries, the...
Qiang Yang, Hai-feng Wang, Ji-rong Wen, Gao Zhang, Ye Lu, Hong-jiang Zhang
As more information becomes available on the World Wide Web, it has become an acute problem to provide effective search tools for information access. Previous generations of search engines...
Towards A Next-Generation Search Engine (2000)
Qiang Yang, Hai-Feng Wang, Ji-rong Wen, Gao Zhang, Ye Lu, Kai-Fu Lee, ...
As more information becomes available on the World Wide Web, it has become an acute problem to provide effective search tools for information access. Previous generations of search engines are...