Ji-rong Wen

Publication List Details

Period

2000 - 2008

Number

67

General Terms (2008)

Deng Cai, Shipeng Yu, Ji-rong Wen, Wei-ying Ma

A new web content structure analysis based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and automatic page...

Exploring URL Hit Priors for Web Search (2008)

Ruihua Song, Guomao Xin, Shuming Shi, Ji-rong Wen, Wei-ying Ma

URL usually contains meaningful information for measuring the relevance of a Web page to a query in Web search. Some existing works utilize URL depth priors (i.e. the probability of being a...

Pseudo-Anchor Text Extraction for Vertical Search (2008)

Shuming Shi, Fei Xing, Mingjie Zhu, Zaiqing Nie, Ji-rong Wen

Anchor text plays an especially important role in improving the performance of general Web search. The importance of anchor text comes from the fact that it is a fairly objective description for a Web page...

Dynamic Hierarchical Markov Random Fields and their Application to Web Data Extraction (2008)

Jun Zhu, Zaiqing Nie, Bo Zhang, Ji-rong Wen

Hierarchical models have been extensively studied in various domains. However, existing models assume fixed model structures or incorporate structural uncertainty generatively. In this paper, we...

Language (2008)

Ruihua Song, Zhenxiao Luo, Ji-rong Wen, Yong Yu, Hsiao-wuen Hon

It is widely believed that some queries submitted to search engines are by nature ambiguous (e.g., java, apple). However, few studies have investigated the questions of “how many queries are...

Web Object Retrieval (WWW 2007, Track: Data Mining, Session: Identifying Structure in Web Pages) (2008)

Zaiqing Nie, Yunxiao Ma, Shuming Shi, Ji-rong Wen, Wei-ying Ma

The primary function of current Web search engines is essentially relevance ranking at the document level. However, myriad structured information about real-world objects is embedded in static Web...

Content-based Query Clustering (2008)

W. Wu, Ji-rong Wen, Hong-jiang Zhang

2.2 Similarity Based on Word-Level String Matching

Abstract (2008)

Ping Wu, Ji-rong Wen

The high quality, structured data from Web structured sources is invaluable for many applications. Hidden Web databases are not directly crawlable by Web search engines and are only accessible...

Learning Block Importance Models for Web Pages (2008)

Ruihua Song, Haifeng Liu, Ji-rong Wen, Wei-ying Ma

Previous work shows that a web page can be partitioned into multiple segments or blocks, and often the importance of those blocks in a page is not equivalent. Also, it has been proven that...

A Systematic Study of Parameter Correlations in Large Scale Duplicate Document Detection (2008)

Shaozhi Ye, Ji-rong Wen, Wei-ying Ma

Although much work has been done on duplicate document detection (DDD) and its applications, we observe the absence of a systematic study of the performance and scalability of large-scale DDD. It is...

A systematic study on parameter correlations in large-scale duplicate document detection (2008)

Ye, Shaozhi, Wen, Ji-Rong, Ma, Wei-Ying

Although much work has been done on duplicate document detection (DDD) and its applications, we observe the absence of a systematic study on the performance and scalability of large-scale DDD...

Query expansion for short queries by mining user logs. http://research.microsoft.com/asia/dload files/group/mediasearching/2002p/QETKDE.pdf (2007)

Hang Cui, Ji-rong Wen, Jian-yun Nie, Wei-ying Ma

Queries to search engines on the Web are usually short. They do not provide sufficient indications for an effective selection of relevant documents. Previous research has proposed the utilization of...

Web object retrieval (2007)

Zaiqing Nie, Yunxiao Ma, Shuming Shi, Ji-rong Wen, Wei-ying Ma

The primary function of current Web search engines is essentially relevance ranking at the document level. However, myriad structured information about real-world objects is embedded in static Web...

A systematic study of parameter correlations in large scale duplicate document detection (2006)

Ye, Shaozhi, Wen, Ji-Rong, Ma, Wei-Ying

Although much work has been done on duplicate document detection (DDD) and its applications, we observe the absence of a systematic study of the performance and scalability of large-scale DDD. It is...

Automated Known Problem Diagnosis with Event Traces (2006)

Chun Yuan, Ni Lao, Ji-rong Wen, Jiwei Li, Zheng Zhang, Yi-min Wang, ...

Computer problem diagnosis remains a serious challenge to users and support professionals. Traditional troubleshooting methods relying heavily on human intervention make the process inefficient and...

Simultaneous record detection and attribute labeling in web data extraction (2006)

Jun Zhu, Zaiqing Nie, Ji-rong Wen

Recent work has shown the feasibility and promise of template-independent Web data extraction. However, existing approaches use decoupled strategies – attempting to do data record detection and...

Object-Level Ranking: Bringing Order to Web Objects (2005)

Zaiqing Nie, Yuanzhi Zhang, Ji-rong Wen, Wei-ying Ma

In contrast with the current Web search methods that essentially do document-level ranking and retrieval, we are exploring a new paradigm to enable Web search at the object level. We collect Web...

Gravitation-based model for information retrieval (extended version) (2005)

Shuming Shi, Ji-rong Wen, Qing Yu, Ruihua Song, Wei-ying Ma

This paper proposes GBM (gravitation-based model), a physical model for information retrieval inspired by Newton’s theory of gravitation. A mapping is built in this model from concepts of...

2D conditional random fields for web information extraction (2005)

Zaiqing Nie, Ji-rong Wen, Bo Zhang

The Web contains an abundance of useful semistructured information about real world objects, and our empirical study shows that strong sequence characteristics exist for Web information about objects...

2D conditional random fields for web information extraction (2005)

Zaiqing Nie, Ji-rong Wen, Bo Zhang, Wei-ying Ma

The Web contains an abundance of useful semistructured information about real world objects, and our empirical study shows that strong sequence characteristics exist for Web information about objects...

An Implementation and Experimental (2005)

Zaiqing Nie, Yuanzhi Zhang, Ji-rong Wen, Wei-ying Ma

In contrast with the current Web search methods that essentially do document-level ranking and retrieval, we are exploring a new paradigm to enable Web search at the object level. We collect Web...

Efficient browsing of web search results on mobile devices based on block importance model (2005)

Xing Xie, Gengxin Miao, Ruihua Song, Ji-rong Wen, Wei-ying Ma

It is expected that more and more people will search the web when they are on the move. Though conventional search engines can be directly visited from mobile devices with web browsing capabilities,...

Learning Block Importance Models for Web Pages (2004)

Song, Ruihua, Liu, Haifeng, Wen, Ji-Rong, Ma, Wei-Ying

Previous work shows that a web page can be partitioned into multiple segments or blocks, and often the importance of those blocks in a page is not equivalent. Also, it has been proven that...

Probabilistic Model for Contextual Retrieval (2004)

Ji-rong Wen

Contextual retrieval is a critical technique for facilitating many important applications such as mobile search, personalized search, PC troubleshooting, etc. Despite its importance, there is no...

Block-based web search (2004)

Deng Cai, Shipeng Yu, Ji-rong Wen, Wei-ying Ma

Multiple-topic and varying-length of web pages are two negative factors significantly affecting the performance of web search. In this paper, we explore the use of page segmentation algorithms to...

Organizing WWW Images Based on The Analysis of Page Layout and Web Link Structure (2004)

Deng Cai, Xiaofei He, Wei-ying Ma, Ji-rong Wen, Hongjiang Zhang

Due to the rapid growth of the number of digital images on the Web, there is an increasing demand for effective and efficient method for organizing and retrieving the images available. This paper...

Block-level Link Analysis (2004)

Deng Cai, Xiaofei He, Ji-rong Wen, Wei-ying Ma

Link Analysis has shown great potential in improving the performance of web search. PageRank and HITS are two of the most popular algorithms. Most of the existing link analysis algorithms treat a web...

Why PCs Are Fragile and What We Can Do About It: A Study of Windows Registry Problems (2004)

Archana Ganapathi, Yi-min Wang, Ni Lao, Ji-rong Wen

Software configuration problems are a major source of failures in computer systems. In this paper, we present a new framework for categorizing configuration problems. We apply this categorization to...

J.X. Yu, X. Lin, H. Lu, and Y. Zhang (Eds.): APWeb 2004, LNCS 3007, pp. 48--58, 2004. (2004)

Shaozhi Ye, Ruihua Song, Ji-rong Wen, Wei-ying Ma

Duplication of Web pages greatly hurts the perceived relevance of a search engine. Existing methods for detecting duplicated Web pages can be classified into two categories, i.e. offline and online...

A Query-Dependent Duplicate Detection Approach for Large Scale Search Engines (2004)

Shaozhi Ye, Ruihua Song, Ji-rong Wen, Wei-ying Ma

Duplication of Web pages greatly hurts the perceived relevance of a search engine. Existing methods for detecting duplicated Web pages can be classified into two categories, i.e. offline...

The Co-Evolution of Systems and (2004)

Shaozhi Ye, Ji-rong Wen, Wei-ying Ma

Under consideration for publication in Knowledge and Information

Hierarchical clustering of WWW image search results using visual, textual and link information (2004)

Deng Cai, Xiaofei He, Zhiwei Li, Wei-ying Ma, Ji-rong Wen

We consider the problem of clustering Web image search results. Generally, the image search results returned by an image search engine contain multiple topics. Organizing the results into different...

ImageSeer: Clustering and Searching WWW Images Using Link and Page Layout Analysis, Microsoft (2004)

Xiaofei He, Deng Cai, Ji-rong Wen, Wei-ying Ma, Hong-jiang Zhang

Due to the rapid growth of the number of digital images on the Web, there is an increasing demand for effective and efficient method for organizing and retrieving the images available. This paper...

ImageSeer: Clustering and Searching WWW Images (2004)

Xiaofei He, Deng Cai, Ji-rong Wen, Wei-ying Ma, Hong-jiang Zhang

Due to the rapid growth of the number of digital images on the Web, there is an increasing demand for effective and efficient method for organizing and retrieving the images available. This paper...

Learning Important Models for Web Page Blocks based on Layout and Content Analysis (2004)

Ruihua Song, Haifeng Liu, Ji-rong Wen, Wei-ying Ma

Previous work shows that a web page can be partitioned into multiple segments or blocks, and often the importance of those blocks in a page is not equivalent. It has also been proven that...

Improving Pseudo-Relevance Feedback in Web Information Retrieval Using Web Page Segmentation (2003)

Yu, Shipeng, Cai, Deng, Wen, Ji-Rong, Ma, Wei-Ying

In contrast to traditional document retrieval, a web page as a whole is not a good information unit to search because it often contains multiple topics and a lot of irrelevant information from...

A Multi-paradigm Querying Approach for a Generic Multimedia Database Management System (2003)

Ji-rong Wen, Qing Li, Wei-ying Ma, Hong-jiang Zhang

To truly meet the requirements of multimedia database (MMDB) management, an integrated framework for modeling, managing and retrieving various kinds of media data in a uniform way is necessary....

Hierarchical indexing and flexible element retrieval for structured documents (2003)

Hang Cui, Ji-rong Wen, Tat-seng Chua

As more and more structured documents, such as the SGML or XML documents, become available on the Web, there is a growing demand to develop effective structured document retrieval which...

Improving pseudo-relevance feedback in web information retrieval using web page segmentation (2003)

Shipeng Yu, Deng Cai, Ji-rong Wen, Wei-ying Ma

In contrast to traditional document retrieval, a web page as a whole is not a good information unit to search because it often contains multiple topics and a lot of irrelevant information from...

VIPS: a Vision-based Page Segmentation Algorithm (2003)

Deng Cai, Shipeng Yu, Ji-rong Wen, Wei-ying Ma

A new web content structure analysis based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and automatic page...

Hierarchical indexing and flexible element retrieval for structured documents (2003)

Hang Cui, Ji-rong Wen

As more and more structured documents, such as SGML or XML documents, become available on the Web, there is a growing demand to develop effective structured document retrieval which exploits...

Query Expansion by Mining User Logs (2003)

Hang Cui, Ji-rong Wen, Jian-yun Nie, Wei-ying Ma

Queries to search engines on the Web are usually short. They do not provide sufficient evidence for an effective selection of relevant documents. Previous research has proposed the...

Cost-Driven Storage Schema Selection for XML (2003)

Ji-rong Wen, Hongjun Lu

Various models and approaches have been proposed for mapping XML data into relational tables recently.

Hierarchical Indexing and Flexible Element Retrieval for Structured Document (2003)

Hang Cui, Ji-rong Wen

As more and more structured documents, such as SGML or XML documents become available on the Web, there is a growing demand to develop effective structured document retrieval which exploits both...

Query Expansion by Mining User Logs (2003)

Hang Cui, Ji-rong Wen, Jian-yun Nie, Wei-ying Ma

Queries to search engines on the Web are usually short. They do not provide sufficient information for an effective selection of relevant documents. Previous research has proposed the utilization of...

Extracting content structure for web pages based on visual representation (2003)

Deng Cai, Shipeng Yu, Ji-rong Wen, Wei-ying Ma

A new web content structure based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and automatic page...

Probabilistic Query Expansion Using Query Logs (2002)

Cui, Hang, Wen, Ji-Rong, Nie, Jian-Yun, Ma, Wei-Ying

Query expansion has long been suggested as an effective way to resolve the short query and word mismatching problems. A number of query expansion methods have been proposed in traditional information...

A Concentric Model for Community Mining in Graph Structures (2002)

Wen-jun Zhou, Ji-rong Wen, Wei-ying Ma, ...

Discovering communities from a graph structure such as the Web has become an interesting research problem recently. In this paper, comparing with the state-of-the-art authority detecting and graph...

Probabilistic Query Expansion Using Query Logs (2002)

Hang Cui, Ji-rong Wen, Jian-yun Nie, Wei-ying Ma

Query expansion has long been suggested as an effective way to resolve the short query and word mismatching problems. A number of query expansion methods have been proposed in traditional information...

Probabilistic Query Expansion Using Query Logs (2002)

Hang Cui, Ji-Rong Wen, Wei-Ying Ma

Query expansion has long been suggested as an effective way to resolve the short query and word mismatching problems. A number of query expansion methods have been proposed in the traditional...

Ji-Rong Wen (2002)

Ji-rong Wen, Jian-yun Nie, Hong-jiang Zhang

This paper describes a new query clustering method that makes use of user logs which allow us to identify the documents the users have selected for a query. The similarity between two queries may be...

Clustering User Queries of a Search Engine (2001)

Wen, Ji-Rong, Nie, Jian-Yun, Zhang, Hong-Jiang

In order to increase retrieval precision, some new search engines provide manually verified answers to Frequently Asked Queries (FAQs). An underlying task is the identification of FAQs. This paper...

Clustering User Queries of a Search Engine (2001)

Ji-rong Wen

In order to increase retrieval precision, some new search engines provide manually verified answers to Frequently Asked Queries (FAQs). An underlying task is the identification of FAQs. This paper...

Query clustering using content words and user feedback (2001)

Ji-rong Wen

Query clustering is crucial for automatically discovering frequently asked queries (FAQs) or most popular topics on a question-answering search engine. Due to the short length of queries, the...

Eeye Digital Security, “All versions of Microsoft Internet Information Services Remote buffer overflow,” AdvisoryAD20010618 (2001)

Qiang Yang, Hai-feng Wang, Ji-rong Wen, Gao Zhang, Ye Lu, Hong-jiang Zhang

As more information becomes available on the World Wide Web, it has become an acute problem to provide effective search tools for information access. Previous generations of search engines...

Towards A Next-Generation Search Engine (2000)

Qiang Yang, Hai-Feng Wang, Ji-rong Wen, Gao Zhang, Ye Lu, Kai-Fu Lee, ...

As more information becomes available on the World Wide Web, it has become an acute problem to provide effective search tools for information access. Previous generations of search engines are...