Graph Theory Approach to Web Topology

Perhaps the best-known paradigm for studying the Web is graph theory. The Web can be seen as a graph whose nodes are pages and whose (directed) edges are links. Because very few weblinks are random, it is clear that the edges of the graph encode much structure that is seen by designers and authors of content as important. Strongly connected parts of the webgraph correspond to what are called cybercommunities and early investigations, for example by Kumar et al, led to the discovery and mapping of hundreds and thousands of such communities [175]. However, the identification of cybercommunities by knowledge mapping is still something of an art, and can be controversial - approaches often produce "communities" with unexpected or missing members, and different approaches often carve up the space differently [137].

The connectivity of the webgraph has been analysed in detail, using such structural indicators as how nodes are connected. Various macroscopic structures have been discerned and measured; for example one crawl of in excess of 200 million pages discovered that 90% of the Web was actually connected, if links were taken as non-directional, and that 56m of these pages were very strongly connected [49] cf. [80]. The structure thus uncovered is often referred to as a bowtie shape, as shown in Figure 4.1. The 'knot' of the tie is a strongly connected cluster (SCC) of the webgraph in which there is a path between each pair of nodes. The SCC is flanked by two sets of clusters, those which link into the SCC but from which there is no link back (marked as IN in the figure), and those which are linked to from the SCC but do not link back (OUT). The relationship between the SCC, IN and OUT gives the bowtie shape. The implications of these topological discoveries still need to be understood. Although some have suggested alterations to the PageRank algorithm to take advantage of the underlying topology [18], there is still plenty of work to do to exploit the structure discerned.

Indeed, the bowtie structure is prevalent at a variety of scales. Dill at al have discovered that smaller subsets of the Web also have a bowtie shape, a hint that the Web has interesting fractal properties - i.e. that each thematically-unified region displays (many of) the same Fig. 4.1 The bowtie shape of the Web and its fractal nature [78]. characteristics as the Web at large [78]. The Web is sufficiently sparsely connected to mean that the subgraph induced by a random set of nodes will be almost empty, but if we look for non-random clusters (thematically-unified clusters or TUCs) which are much more connected, then we see the bowtie shape appearing again. Each TUC will have its own SCC, and its own IN and OUT flank, contained within the wider SCC. The larger-scale SCC, because it is strongly connected, can then act as a navigational backbone between TUCs.

In this way the fractal nature of the Web gives us an indication of how well it is carrying the compromise between stability and diversity; a reasonably constant number of connections at various levels of scale means more effective communication [29]. Too many connections produce a high overhead for communication, while too few mean that essential communications may fail to happen. The assumption that levels of connectivity are reasonably constant at each level of scale is of importance for planning long-range and short-range bandwidth capacity, for example. TheWeb develops as a result of a number of essentially independent stochastic processes that evolve at various scales, which is why structural properties remain constant as we change scale. If we assume that the Web has this sort of fractal property, then for designing efficient algorithms for data services on the Web at various scales it is sufficient to understand the structure that emerges from one simple stochastic process [78].

Notes:

The graph theory approach produces a model of the web that is like a bowtie, and filled with other bowties, like a fractal. There is an image in the original document of this phenomena.

Folksonomies: semantic web web topology graph theory

Keywords:
bowtie shape (0.912532 (positive:0.400371)), graph theory approach (0.721781 (positive:0.453010)), Strongly connected parts (0.543921 (neutral:0.000000)), strongly connected cluster (0.518362 (positive:0.353931)), web (0.511033 (positive:0.091236)), SCC (0.463594 (neutral:0.000000)), fractal nature (0.428452 (positive:0.888185)), thematically-unified region displays (0.425377 (neutral:0.000000)), Various macroscopic structures (0.421992 (neutral:0.000000)), interesting fractal properties (0.414917 (positive:0.526106)), reasonably constant number (0.399985 (positive:0.767873)), short-range bandwidth capacity (0.398564 (positive:0.698615)), independent stochastic processes (0.396193 (neutral:0.000000)), larger-scale SCC (0.395083 (neutral:0.000000)), simple stochastic process (0.392208 (positive:0.602479)), wider SCC (0.385500 (neutral:0.000000)), various scales (0.366419 (positive:0.602479)), Web Topology (0.328236 (positive:0.453010)), graph encode (0.312053 (positive:0.564602)), original document (0.297036 (neutral:0.000000)), thematically-unified clusters (0.295889 (neutral:0.000000)), best-known paradigm (0.294192 (neutral:0.000000)), nodes (0.287397 (negative:-0.194458)), fractal property (0.285945 (neutral:0.000000)), webgraph correspond (0.281356 (neutral:0.000000)), early investigations (0.280184 (neutral:0.000000)), bowtie structure (0.275878 (neutral:0.000000)), knowledge mapping (0.274408 (positive:0.373627)), different approaches (0.270247 (neutral:0.000000)), topological discoveries (0.269438 (neutral:0.000000))

Entities:
SCC:Organization (0.805506 (neutral:0.000000))

Concepts:
Topology (0.966461): dbpedia | freebase | opencyc
Fractal (0.849189): dbpedia | freebase
Stochastic process (0.732270): dbpedia | freebase
Mathematics (0.705925): dbpedia | freebase | opencyc
World Wide Web (0.671109): dbpedia | freebase | yago
Planar graph (0.670063): dbpedia | freebase | yago
Graph (0.651731): dbpedia | freebase
Structure (0.613338): dbpedia | freebase

 A Framework for Web Science (Foundations and Trends(R) in Web Science)
Books, Brochures, and Chapters>Book:  Berners-Lee, Tim (2006-09-15), A Framework for Web Science (Foundations and Trends(R) in Web Science), Now Publishers Inc, Retrieved on 2010-11-15
  • Source Material [eprints.ecs.soton.ac.uk]
  • Folksonomies: web science