
Level-biased Statistics in the Hierarchical Structure of the Web (2008)

Abstract
In the literature on web search and mining, researchers have typically treated the World Wide Web as a flat network in which every page and every hyperlink is handled identically. However, it is common knowledge that the Web has a natural hierarchical structure defined by the URLs of its pages. Exploring this hierarchical structure, we found several level-biased characteristics of the Web. First, the distribution of pages over levels has a spindle shape. Second, the average indegree in each level decreases sharply as the level goes down. Third, although the indegree distributions in deeper levels obey the same power law as the global indegree distribution, the top levels show quite different statistical characteristics. We believe that these new discoveries might be essential to the Web, and that by making use of them, current web search and mining technologies could be improved and thus provide better services to web users.
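
The per-level statistics mentioned in the abstract (page counts and average indegree by level) can be illustrated with a minimal sketch. The abstract does not give the paper's exact definition of a level, so this sketch assumes a page's level is the number of non-empty path segments in its URL, with a site root at level 0. The function names url_level and level_statistics are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from urllib.parse import urlparse


def url_level(url):
    """Assumed level of a page: count of non-empty URL path segments.

    This is an illustrative definition, not necessarily the paper's.
    A site root such as http://example.com/ is level 0.
    """
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


def level_statistics(links):
    """Compute per-level page counts and average indegree.

    links: iterable of (source_url, target_url) hyperlink pairs.
    Returns a dict {level: (page_count, average_indegree)}.
    """
    indegree = defaultdict(int)
    pages = set()
    for source, target in links:
        pages.add(source)
        pages.add(target)
        indegree[target] += 1

    # Group the indegree of every known page by its assumed level.
    degrees_by_level = defaultdict(list)
    for page in pages:
        degrees_by_level[url_level(page)].append(indegree[page])

    return {
        level: (len(degrees), sum(degrees) / len(degrees))
        for level, degrees in sorted(degrees_by_level.items())
    }
```

Applied to a crawl's link list, the page counts per level would show the shape of the page distribution, and the averages would show how indegree changes with depth. Whether the result matches the paper's findings depends on the crawl and on how levels are actually defined there.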

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.92.3613
Source http://learn.tsinghua.edu.cn:8080/2003315385/papers/Level-biased.pdf
Contributors CiteSeerX
Repository CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Type text
Language English
Relation 10.1.1.120.3875, 10.1.1.31.1768, 10.1.1.122.7284, 10.1.1.16.2407, 10.1.1.11.6843, 10.1.1.136.9917