European Conference on Complex Systems
PASCAL

Web Click Network

author: Filippo Menczer, School of Informatics, Indiana University

Description

We analyze the traffic-weighted Web host graph built from the traffic of a large sample of real Web users over a period of about seven months. This complex, dynamic network reveals a number of interesting structural properties: some are in line with the well-studied Boolean link host graph, while others point to important differences. We find that while search is directly involved in a surprisingly small fraction of user clicks, it leads to a much larger fraction of all sites visited. The temporal traffic patterns display strong regularities, with a large portion of future requests being statistically predictable from past ones. Given the importance of topological measures such as PageRank in modeling user navigation, as well as their role in ranking sites for Web search, we use the traffic data to test the PageRank random surfing model. The ranking given by the actual frequency with which users visit a site differs significantly from the one approximated by the uniform surfing/teleportation behavior that PageRank models, especially for the most important sites. To interpret this finding, we consider each of the fundamental assumptions underlying PageRank and show that actual user behavior violates each of them. Joint work with Mark Meiss, Santo Fortunato, Alessandro Flammini, and Alessandro Vespignani.
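To make the comparison in the abstract concrete, here is a minimal sketch, not the authors' code, of the two quantities being contrasted: PageRank computed by power iteration on the unweighted (Boolean) link graph with uniform teleportation, and a traffic-based score given by the total clicks each host receives (its in-strength). The host names, click counts, and the damping factor 0.85 are hypothetical assumptions chosen only for illustration; they are not data from the talk.

```python
# Sketch: PageRank (uniform teleportation) vs. observed click traffic on a toy
# host graph. Hosts, click counts, and damping=0.85 are hypothetical.

def pagerank(edges, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration on an unweighted (Boolean) directed graph.

    Each link out of a host is followed with equal probability, and
    teleportation (probability 1 - damping) lands on any host uniformly.
    Dangling hosts (no out-links) redistribute their score uniformly.
    """
    edges = set(edges)  # Boolean graph: a link either exists or it does not
    nodes = sorted({host for edge in edges for host in edge})
    n = len(nodes)
    out_links = {host: [] for host in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {host: 1.0 / n for host in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[h] for h in nodes if not out_links[h])
        base = (1.0 - damping) / n + damping * dangling / n
        new = {host: base for host in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new[dst] += share
        delta = sum(abs(new[h] - rank[h]) for h in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def in_strength(clicks):
    """Total clicks received by each host; clicks maps (src, dst) -> count."""
    strength = {}
    for (src, dst), count in clicks.items():
        strength.setdefault(src, 0)  # keep hosts that receive no clicks
        strength[dst] = strength.get(dst, 0) + count
    return strength


def ranking(scores):
    """Hosts ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical click counts between three hosts.
clicks = {("a", "b"): 50, ("b", "c"): 5, ("c", "a"): 10, ("a", "c"): 1}
pr = pagerank(clicks)  # iterating a dict yields its (src, dst) keys
traffic = in_strength(clicks)
print(ranking(pr))       # ['c', 'a', 'b']
print(ranking(traffic))  # ['b', 'a', 'c']
```

In this toy graph the heavily clicked link a→b makes b the most visited host, but PageRank, which ignores click volume and splits a's score evenly between its two out-links, ranks b last. This mirrors, in miniature, the kind of disagreement the abstract describes; the real analysis uses the measured traffic of millions of clicks.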

Slides
0:00 The Web Click Network
1:03 Static Web link graph
3:54 NetFlow
6:17 Web server logs
7:35 Toolbars - part 1
7:40 Toolbars - part 2
10:17 ISP
12:12 The Internet is for... - part 1
12:31 The Internet is for... - part 2
12:54 Clicky - part 1
15:08 Clicky - part 2
16:20 ... But seriously... Outline
18:16 Data collection
22:30 Host graphs
25:42 Structural properties: degree - part 1
25:54 Structural properties: degree - part 2
27:53 Caveat: sampling bias
29:29 Structural properties: strength (site traffic)
30:43 Structural properties: strength (link traffic)
31:43 Behavioral patterns (HUMAN)
35:39 Ratios are stable - part 1
35:59 Ratios are stable - part 2
37:06 Does search mitigate the rich-get-richer dynamics?
38:09 Temporal patterns - part 1
38:37 Temporal patterns - part 2
39:06 HUMAN host graph
40:20 PageRank
40:54 Kendall’s rank correlation
42:07 PageRank assumptions
43:30 Kendall’s rank correlation
44:05 Local link heterogeneity - part 1
44:09 Local link heterogeneity - part 2
44:45 Teleportation source heterogeneity
45:14 Teleportation target heterogeneity
45:38 Summary
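Several slides compare rankings using Kendall's rank correlation. A minimal sketch of the plain tau-a coefficient is shown below, assuming two score dictionaries over the same hosts; the talk may use a tie-corrected or otherwise modified variant, and the scores here are hypothetical. For real data, `scipy.stats.kendalltau` computes the tie-corrected tau-b by default.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two score dicts defined over the same hosts.

    +1 means identical orderings, -1 means fully reversed; tied pairs count
    as neither concordant nor discordant (no tie correction).
    """
    hosts = sorted(x)
    concordant = discordant = 0
    for i, j in combinations(hosts, 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(hosts) * (len(hosts) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for three hosts: a model score vs. observed clicks.
model = {"a": 0.39, "b": 0.21, "c": 0.40}
clicks = {"a": 10, "b": 50, "c": 6}
print(kendall_tau(model, clicks))   # -1.0
print(kendall_tau(clicks, clicks))  # 1.0
```

The pairwise loop is O(n²), which is fine for illustration; for rankings over many hosts an O(n log n) merge-sort based count of discordant pairs is the usual choice.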
