Web topology histogram cube

Topological measures and maps of the Web

PI: Filippo Menczer

In this project we study the relationships between different types of Web topology, based for example on hyperlinks, words, and page meaning, and how they affect the performance of ranking and crawling algorithms such as InfoSpiders. This research extends prior work (see dissertation) in which we characterized, in terms of a relevance autocorrelation measure, a necessary condition for effective autonomous browsing of any distributed hypertext database such as the Web.

More recently, we have used a brute-force approach to map the relationships between lexical, linkage, and semantic similarity across billions of Web page pairs. The data cube in the picture is a graphical representation of the distribution of page pairs along the three similarity dimensions. By clicking on a face of the cube, you will see a more detailed histogram map projected onto two of the similarity axes.
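As a rough illustration of the data cube idea, the sketch below bins page pairs by their three similarity values and then sums over one axis to obtain the kind of two-dimensional histogram shown on a face of the cube. It is only a minimal sketch under stated assumptions: the similarity triples are made-up values in [0, 1], the number of bins is arbitrary, and the function names (bin_index, similarity_cube, project) are hypothetical. It does not reproduce the similarity measures or the binning used in the project.

```python
from collections import Counter


def bin_index(value, bins):
    """Map a similarity in [0, 1] to a bin index in [0, bins - 1]."""
    # A value of exactly 1.0 is placed in the last bin.
    return min(int(value * bins), bins - 1)


def similarity_cube(triples, bins=4):
    """Count page pairs in each (lexical, link, semantic) cell."""
    cube = Counter()
    for lexical, link, semantic in triples:
        cell = (
            bin_index(lexical, bins),
            bin_index(link, bins),
            bin_index(semantic, bins),
        )
        cube[cell] += 1
    return cube


def project(cube, drop_axis):
    """Sum the cube over one axis, giving the 2-D histogram for one face."""
    face = Counter()
    for cell, count in cube.items():
        key = tuple(i for axis, i in enumerate(cell) if axis != drop_axis)
        face[key] += count
    return face


# Hypothetical (lexical, link, semantic) similarities for five page pairs.
pairs = [
    (0.10, 0.05, 0.20),
    (0.15, 0.10, 0.22),
    (0.80, 0.60, 0.90),
    (0.55, 0.05, 0.70),
    (1.00, 1.00, 1.00),
]

cube = similarity_cube(pairs, bins=4)

# Drop the semantic axis (index 2) to get the lexical-vs-link face.
lexical_vs_link = project(cube, drop_axis=2)
```

A sparse counter is used here instead of a dense three-dimensional array because most cells are likely to be empty when the bins are fine; for billions of pairs, the counting would in practice have to be done in a streaming or distributed fashion.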

This research is being applied to build models that may help us understand how the scale-free distribution of Web links has emerged and how it can be exploited to design more effective Web crawlers and search engines.



This material is based upon work supported by the National Science Foundation under grant No. IIS-0133124/0348940. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.