
U.U.D.M.

Project Report 2012:14

The Google Markov Chain: convergence speed and eigenvalues


Fredrik Backåker

Degree project in mathematics, 15 credits. Supervisor and examiner: Jakob Björnberg. June 2012

Department of Mathematics Uppsala University

Acknowledgments
I would like to thank my supervisor Jakob Björnberg for helping me write this thesis.

The Google Markov Chain: convergence speed and eigenvalues


Contents

1 Introduction
2 Definitions and background
  2.1 Markov chains
  2.2 The Google PageRank
3 Convergence speed
  3.1 General theory of convergence speed
  3.2 Convergence speed and eigenvalues of Google's Markov Chain
4 Simulations
  4.1 Multiplicity of the second eigenvalue
  4.2 Quality of the limit distribution
5 Conclusion
6 References
Appendices
  Matlab Code

1 Introduction
There are many different search engines on the internet which help us find the information we want. These search engines use different methods to rank pages and display them to us in a way such that the most relevant and important information is shown first. In this thesis, we study a mathematical method that is a part of how PageRank, the ranking method for the search engine Google, determines the order in which pages are displayed in a search. The method uses pages as states in a stochastic Markov chain, where outgoing links from pages are the transitions and the corresponding transition probabilities are equally divided among the number of outgoing links from the related page. The transition probability matrix given by this is then used to compute a stationary distribution, where the page with the largest stationary value is ranked first, the page with the second largest is ranked second, and so on. This method comes in two variants, with a dampening factor or without. The variant without a dampening factor is the one we just described. In the other variant, which we study in this thesis, the dampening factor (often set to 0.85) is introduced mainly to ensure that the stationary distribution is unique. This variant is considered to be the most useful one, and in this thesis we take a brief look at how the dampening factor affects the computation of PageRank.

We begin by going through some basic definitions for Markov chains and explain the Google PageRank in more detail. In the section after that, we go through some general theory about the rate of convergence for Markov chains, since it turns out that the eigenvalues of a transition probability matrix are connected to the speed of convergence to its steady state. Further, we look at the second largest eigenvalue of the Google Markov chain and its algebraic multiplicity, which are the main factors that affect the convergence rate of the chain. Next, we go through some results on how the second eigenvalue of the Google Markov chain is limited by the dampening factor, which makes the choice of the dampening factor very important. We end by doing some simulations to check how different properties of PageRank are affected by choices of the dampening factor and, in particular, which value of the dampening factor is best adapted for a fast convergence speed of the Google Markov chain.


2 Definitions and background


2.1 Markov chains
A discrete time Markov chain is a stochastic process {X_n} with finite state space S that satisfies the Markov property:

    P(X_n = x_n | X_0 = x_0, ..., X_{n-1} = x_{n-1}) = P(X_n = x_n | X_{n-1} = x_{n-1})

for all x_0, ..., x_n ∈ S and n ≥ 1. In other words, the next step of a Markov chain is independent of the past and relies only upon the most recent state. The chain is called time-homogeneous if the transition probabilities do not change over time, i.e. if for each i, j ∈ S, p_ij = P(X_n = j | X_{n-1} = i) does not depend on n. In this case the probabilities p_ij are the Markov chain's transition probabilities when moving from state i to state j. Also let p_ij^(m) = P(X_{m+n} = j | X_n = i) denote the transition probabilities in m steps, m = 0, 1, 2, .... The probabilities can be collected in a transition probability matrix, here denoted by P:

    P = [ p_00  p_01  ...
          p_10  p_11  ...
           ...   ...      ]

This matrix is called a stochastic matrix if all of its row vectors sum to one:

    Σ_j p_ij = 1.

The Markov chain is said to be irreducible if it is possible to reach each state j from any other state i in some number of steps; more formally, if for all i, j ∈ S we have P(X_n = j | X_0 = i) > 0 for some n ≥ 0. A state i has period k if any return to state i occurs in multiples of k steps:

    k = greatest common divisor of the set {n : P(X_n = i | X_0 = i) > 0}.

If all the states in a Markov chain have period one, the chain is said to be aperiodic, i.e. the greatest common divisor of the return times of every state to itself is one. The following result is standard and we do not prove it.

Proposition 1. A Markov chain that is irreducible and aperiodic with finite state space has a unique stationary distribution π, which is a probability vector such that πP = π. Additionally, the transition probabilities converge to a steady state as the number of steps goes to infinity, in the sense that

    lim_{m→∞} p_ij^(m) = π_j for all i, j ∈ S.
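As a small illustration of Proposition 1 (the 3×3 matrix below is an arbitrary example chosen for this sketch, not one taken from the thesis), the stationary distribution can be obtained in Matlab as the left eigenvector for the eigenvalue 1, and the rows of P^m can be seen to approach it:

    P = [0.5 0.3 0.2; 0.2 0.6 0.2; 0.3 0.3 0.4];   % an irreducible, aperiodic stochastic matrix
    [V,D] = eig(P.');                              % left eigenvectors of P are right eigenvectors of P'
    [~,i] = min(abs(diag(D) - 1));                 % pick the eigenvalue 1
    p = real(V(:,i)).';  p = p/sum(p);             % normalise to a probability vector
    norm(p*P - p)                                  % essentially zero, so p is stationary
    P^20                                           % every row is (numerically) equal to p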

2.2 The Google PageRank


The Google PageRank is one of many methods that the search engine Google uses to determine the importance or relevance of a page. This method uses a special Markov chain to compute the rank of web pages, and this rank determines in which order the pages should be listed in a search in Google.

Let all the web pages Google communicates with be denoted by the state space W. The size of W is n, several billion pages. Let M = (c_ij) denote the connectivity matrix of W, which means that M is an n×n matrix with c_ij = 1 if there is a hyperlink from page i to page j and c_ij = 0 otherwise. The number of outgoing links from page i is the row sum

    s_i = Σ_{j=1}^{n} c_ij.

If s_i = 0, page i has no outgoing links and is called a dangling node. Let T = (t_ij) be given by t_ij = c_ij / s_i if s_i ≥ 1, and t_ij = 1/n if i is a dangling node. By this, T can be seen as the transition probability matrix of a Markov chain with state space W. Furthermore, to define the Google Markov chain we include an additional parameter d, a dampening factor that can be set between 0 and 1. The transition probability matrix of the Google Markov chain is defined by

    P = dT + (1 − d)(1/n)E,

where E is the n×n matrix with only ones. This Markov chain can be described as a "random surfer" who, with probability d, clicks on an outgoing link on the current web page with equal probabilities or, if the page has no outgoing links, chooses another page at random in W. Also, with probability 1 − d, the surfer jumps to a page chosen at random among all n pages.

The Google Markov chain is finite, and whether it is irreducible and aperiodic depends on the value of d. If d < 1, the chain is aperiodic, since every state has a positive probability of returning to itself and therefore has period one. If d = 1, we get P = T, and the periodicity and irreducibility are completely determined by the outgoing links of the pages. It is then possible that two pages link only to each other and create a subset with periodicity two; if so, the chain is neither aperiodic nor irreducible. Then there is no unique stationary distribution and, because the chain stays in a subset, the limit distribution depends on the starting state. This is the most realistic case considering how the internet is structured, and it is the main reason why the dampening factor d is introduced. Further, d affects, as we will see, the convergence speed of the Google Markov chain.

In the computation of PageRank, d is usually set to 0.85 [1], and then the Google Markov chain is finite, irreducible and aperiodic. Hence, by Proposition 1, there exists a unique stationary distribution π. This stationary distribution is used to rank all the pages in W by letting the page with the largest π_i be ranked first, the page with the second largest be ranked second, and so on, until we get a Google PageRank for all the pages. One way of computing the Google PageRank is to simulate the transitions until an (approximate) steady state is reached; according to Brin and Page [1], the creators of Google, "a PageRank for 26 million web pages can be computed in a few hours on a medium size workstation".
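As a concrete sketch of this construction (the 4-page connectivity matrix below is a made-up example, not data from the thesis), the matrices T and P can be built in Matlab as follows; page 4 is a dangling node, so its row of T is set to 1/n:

    C = [0 1 1 0;                    % c_ij = 1 if page i links to page j
         1 0 0 1;
         0 1 0 0;
         0 0 0 0];                   % page 4 has no outgoing links (dangling node)
    n = size(C,1);
    s = sum(C,2);                    % s_i = number of outgoing links from page i
    T = zeros(n);
    for i = 1:n
        if s(i) >= 1
            T(i,:) = C(i,:)/s(i);    % t_ij = c_ij / s_i
        else
            T(i,:) = 1/n;            % dangling node: t_ij = 1/n
        end
    end
    d = 0.85;
    P = d*T + (1-d)/n*ones(n);       % the Google Markov chain
    sum(P,2).'                       % every row sums to one, so P is stochastic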

3 Convergence speed
3.1 General theory of convergence speed
Since the Google PageRank involves many billions of pages, one might like to know how fast it can be computed. This can be done by determining how fast the transition probability matrix of the Google Markov chain converges to its steady state, as in Proposition 1. To find this rate of convergence, we need to go through some definitions and theorems.

Let A be a square stochastic matrix of dimension m, w a non-zero vector and λ a scalar such that Aw = λw, which is equivalent to (A − λI)w = 0 (where I is the identity matrix). Then λ is said to be a right eigenvalue of A corresponding to the eigenvector w. In words, an eigenvector of a matrix is a non-zero vector that remains parallel to the original vector after being multiplied by the matrix, and the eigenvalue of that eigenvector is the factor by which it is scaled when multiplied by the matrix. Eigenvectors can be either left or right eigenvectors, but the most commonly used are the right eigenvectors described above. We say that λ is a left eigenvalue if z^T A = λ z^T, where z is a non-zero vector (the left eigenvector). Each left eigenvalue is a right eigenvalue and vice versa, because if λ' is a left eigenvalue then

    z^T A = λ' z^T
    (z^T A)^T = λ' z,  i.e.  A^T z = λ' z,  so  (A^T − λ'I) z = 0,
    0 = det(A^T − λ'I) = det((A − λ'I)^T) = det(A − λ'I).

This shows that λ' is also a right eigenvalue.

Theorem 1. λ = 1 is always an eigenvalue of a stochastic m×m matrix A, associated with the right eigenvector v = 1, the vector with all entries equal to 1. If a stationary distribution π exists, then the corresponding left eigenvector is u = π.

Proof: A1 = 1 since the rows of A sum to one, and πA = π by the definition of a stationary distribution. □

Let λ_1, ..., λ_m be the m eigenvalues of A, assume that these eigenvalues are distinct, and let u_1, ..., u_m be the corresponding left eigenvectors and v_1, ..., v_m the corresponding right eigenvectors. We do not prove the following well-known fact.

Theorem 2. Let A be an irreducible, aperiodic and stochastic m×m matrix. Then λ_1 = 1 satisfies λ_1 > |λ_i| for any other eigenvalue λ_i.

We now perform some calculations that illustrate the relevance of the second largest eigenvalue for the convergence speed.

Proposition 2. The left and right eigenvectors form a biorthogonal system: u_i^T v_j = 0 whenever i ≠ j.

Proof of Proposition 2: The equations for the eigenvectors are u_i^T A = λ_i u_i^T and A v_j = λ_j v_j. By multiplication we find that

    u_i^T A v_j = λ_i u_i^T v_j  and  u_i^T A v_j = λ_j u_i^T v_j,

so λ_i u_i^T v_j = λ_j u_i^T v_j, i.e. (λ_i − λ_j) u_i^T v_j = 0, and hence u_i^T v_j = 0 if λ_i ≠ λ_j. Since the eigenvalues are assumed to be distinct, the following holds:

    u_i^T v_j = 0  if i ≠ j,  1 ≤ i, j ≤ m.    (1)

By this we see that eigenvectors belonging to different eigenvalues are orthogonal to each other. □

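A quick numerical check of Theorems 1 and 2 and of Proposition 2 (the small matrix below is the same arbitrary example as in Section 2.1; Matlab's eig can also return the left eigenvectors):

    A = [0.5 0.3 0.2; 0.2 0.6 0.2; 0.3 0.3 0.4];
    norm(A*ones(3,1) - ones(3,1))      % A*1 = 1, so lambda = 1 is an eigenvalue (Theorem 1)
    sort(abs(eig(A)),'descend')        % 1.0000, 0.3000, 0.2000: the others are smaller (Theorem 2)
    [V,D,W] = eig(A);                  % columns of V: right eigenvectors, columns of W: left eigenvectors
    B = W'*V;
    max(max(abs(B - diag(diag(B)))))   % off-diagonal entries vanish: u_i'*v_j = 0 for i ~= j (Proposition 2)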
Further, we can scale the eigenvectors so that

    u_i^T v_i = 1 for all i ∈ [1, m].    (2)

Collect the left eigenvectors u_i of A in U so that u_1 is the first column of U, u_2 is the second, and so on. Collect the right eigenvectors v_i in V in the same way:

    U = (u_1, u_2, ..., u_m),    V = (v_1, v_2, ..., v_m).

From (1) and (2) we get that

    U^T V = I,    (3)

and also, from the theory of matrices, that V U^T = I. Further, let Λ be the diagonal matrix with the eigenvalues of A as entries, i.e.

    Λ = [ λ_1  0    ...  0
          0    λ_2  ...  0
          ...
          0    0    ...  λ_m ].

Since V consists of the right eigenvectors, we get the equation

    A V = V Λ.    (4)

By (3) and (4) we get U^T A V = U^T V Λ = Λ, which can be rewritten as

    A = V Λ U^T = Σ_{i=1}^{m} λ_i v_i u_i^T.

We then take the n-th power of A. Since

    A^n = (V Λ U^T)(V Λ U^T) ··· (V Λ U^T)    (n factors)
        = V Λ (U^T V) Λ (U^T V) ··· Λ U^T = V Λ^n U^T,

we get the spectral decomposition

    A^n = Σ_{i=1}^{m} λ_i^n v_i u_i^T.

We can rewrite this as

    A^n = λ_1^n v_1 u_1^T + Σ_{i=2}^{m} λ_i^n v_i u_i^T.

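The decomposition can be checked numerically (again with the same arbitrary example matrix); note that with the scaling u_i^T v_i = 1, the rows of U^T are exactly the rows of V^{-1}:

    A = [0.5 0.3 0.2; 0.2 0.6 0.2; 0.3 0.3 0.4];
    [V,L] = eig(A);          % right eigenvectors and the diagonal matrix of eigenvalues
    Ut = inv(V);             % rows of Ut are the left eigenvectors scaled so that u_i'*v_i = 1
    norm(A^5 - V*L^5*Ut)     % essentially zero: A^n = V*Lambda^n*U^T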
Further, let the eigenvalues other than λ_1, i.e. λ_2, λ_3, ..., λ_m, be arranged such that

    λ_1 > |λ_2| ≥ ... ≥ |λ_m|.

The results above are an argument showing that λ_2 governs the difference between A^n and λ_1^n v_1 u_1^T when all eigenvalues are distinct. In fact [4], formula (5) below can be shown to hold also when the eigenvalues are not distinct. Ensure (by rearranging the eigenvalues if necessary) that, if any |λ_i| for i ≥ 3 is equal to |λ_2|, then r_i, the algebraic multiplicity of λ_i, is less than or equal to r_2. This arrangement of the eigenvalues is necessary for the Perron-Frobenius theorem, where the algebraic multiplicity of the second eigenvalue is related to the speed at which a transition probability matrix converges to its steady state. We state the theorem and then look at an example to make the rate of convergence and the algebraic multiplicity more clear.

Theorem 3. Let the eigenvectors be chosen so that u_i^T v_i = 1, where u_i is the left eigenvector and v_i is the right eigenvector of A. Then

    A^n = λ_1^n v_1 u_1^T + O(n^{r_2 − 1} |λ_2|^n).    (5)

Example 1 (Rates of convergence via the Perron-Frobenius theorem). If P is a stochastic, irreducible and aperiodic matrix with state space S = {1, ..., m}, then the first eigenvalue is λ_1 = 1 with eigenvectors v_1 = 1 and u_1 = π, and therefore by (5)

    P^n = 1π + O(n^{r_2 − 1} |λ_2|^n),

where 1π is the matrix with every row equal to π, and if the eigenvalues are distinct and their absolute values are different we even get

    P^n = 1π + O(|λ_2|^n).

By this we see that a smaller |λ_2| gives a higher rate of convergence. If we do not arrange the eigenvalues and count their multiplicity as described above, we might state a convergence speed that is not true. For example, suppose the eigenvalues 0.5 and −0.5 both occur, with algebraic multiplicities 4 and 1 respectively. Then, since we choose the eigenvalue with the largest multiplicity as λ_2, we get from Theorem 3 that

    P^n = 1π + O(n^{4−1} 0.5^n).

If we instead ordered the eigenvalues so that λ_2 = −0.5, we would get

    P^n = 1π + O(n^{1−1} 0.5^n),

which is not true for the rate of convergence.
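A small numerical check of this behaviour (the 3-state chain below is the same arbitrary example as before, with distinct eigenvalues 1, 0.3 and 0.2, so r_2 = 1): the error ||P^n − 1π|| shrinks by roughly a factor |λ_2| per step.

    P = [0.5 0.3 0.2; 0.2 0.6 0.2; 0.3 0.3 0.4];
    sort(abs(eig(P)),'descend')            % 1.0000, 0.3000, 0.2000
    Pinf = P^200;                          % rows have essentially converged to pi
    err = zeros(1,8);
    for n = 1:8
        err(n) = norm(P^n - Pinf,'fro');   % distance of P^n from its limit
    end
    err(2:end)./err(1:end-1)               % the ratios approach |lambda_2| = 0.3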

3.2 Convergence speed and eigenvalues of Google's Markov Chain


Now that we know the importance of the second eigenvalue, we can continue by looking at some results of Haveliwala and Kamvar [2]. These results relate the second eigenvalue of the Google Markov chain's transition probability matrix to the dampening factor d, which is relevant in view of Theorem 3.

Theorem 4. The Google Markov chain's transition probability matrix, P = dT + (1 − d)(1/n)E, has a second eigenvalue that satisfies |λ_2| ≤ d.

Theorem 5. If there exist at least two irreducible closed subsets in T, then the second eigenvalue of P is λ_2 = d.

In [2], these results are stated and proved for any rank-one row-stochastic matrix E, but in our case we only consider E to be the n×n matrix with only ones, which is also the most common choice in applications. Upon reading [2] we found that the proofs of the results become considerably simpler in our case, and we therefore give the proofs for these simpler cases in detail. The proofs of both theorems rely on the following lemma.

Lemma 1. If v ⊥ 1 is a right eigenvector of P, then v is also a right eigenvector of T: if Pv = λv then Tv = (λ/d)v.

Proof: Let λ_i denote the eigenvalues of P with associated eigenvectors v_i, and let the eigenvalues of T be denoted by μ_i with associated eigenvectors w_i, for i = 2, 3, ..., m. If we look at the equation for the eigenvalues of P we get

    P v_i = dT v_i + (1 − d)(1/n)E v_i = λ_i v_i.

Each row of E equals 1^T. By Proposition 2, since 1 is also the first right eigenvector of P, we get 1^T v_i = 0 and, by this, that E v_i = 0. The equation that is left is

    dT v_i = λ_i v_i.

Next, we can divide by d to get

    T v_i = (λ_i / d) v_i.

Letting w_i = v_i and μ_i = λ_i/d, we can rewrite this as T w_i = μ_i w_i, and we see that v_i is an eigenvector of T as well. □

With this we can continue to the proofs of the theorems.

Proof of Theorem 4: For this to be proven, we need to look at different values of d. Let λ_i denote the eigenvalues of P with associated eigenvectors v_i, and let the eigenvalues of T be denoted by μ_i for i = 2, 3, ..., m.

When d = 0 we get P = (1/n)E, which has λ_2 = 0, and therefore the theorem holds. When d = 1 we get P = T, and since T is stochastic we have |λ_2| ≤ 1, so the theorem holds in this case as well. To prove the theorem for the case 0 < d < 1, we use the lemma proved above. From Lemma 1 we know that T v_2 = (λ_2/d) v_2, and by this we see that the eigenvalues of P and T are related by λ_2 = d μ_2. Since T is stochastic, its eigenvalues satisfy |μ_i| ≤ 1, so |λ_2| ≤ d and Theorem 4 is proved. □

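Lemma 1 says that, apart from the leading eigenvalue 1, the spectrum of P is the spectrum of T scaled by d. This can be seen numerically with a small randomly generated stochastic matrix T (an arbitrary example for this sketch, not one used in the thesis):

    rng(1);                                    % for reproducibility
    T = rand(5);  T = T./sum(T,2);             % a random 5x5 row-stochastic matrix
    d = 0.85;  n = size(T,1);
    P = d*T + (1-d)/n*ones(n);
    muP = sort(abs(eig(P)),'descend');
    muT = sort(abs(eig(T)),'descend');
    [muP(2:end), d*muT(2:end)]                 % the two columns agree (up to rounding), so |lambda_2| <= d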
Proof of Theorem 5: It is a standard fact (page 126 of [3]) that the multiplicity of the eigenvalue 1 of a stochastic matrix is equal to the number of irreducible closed subsets. So, since T has at least two irreducible closed subsets, there are two linearly independent eigenvectors a, b of T with

    T a = a,    T b = b.

We want to construct a vector x with T x = x and 1^T x = 0. Set x = αa + βb, where α and β are scalars. By this we get T x = αT a + βT b = αa + βb, so T x = x for all α, β. Further, we want 1^T x = α 1^T a + β 1^T b to be equal to 0. If 1^T a = 0 and 1^T b = 0, we can choose any α, β. Otherwise, we may assume 1^T a ≠ 0 and choose

    α = −β (1^T b) / (1^T a).

By this, x is an eigenvector of T with eigenvalue 1 and 1^T x = 0, so E x = 0 and P x = dT x + (1 − d)(1/n)E x = d x; that is, x is an eigenvector of P with eigenvalue d (in line with Lemma 1), and hence λ_2 = d. □

Since T most likely has at least two irreducible closed subsets (as mentioned in Section 2.2, this happens for instance when two pages link only to each other; see also Figure 1 below), it is clear that the choice of the dampening factor is related to the rate at which the Google Markov chain converges.

Figure 1: Pages 3 and 5 create an irreducible closed subset, and so do pages 1 and 2.

To make these results on the second eigenvalue more clear, we look at some small examples.

Example 2. Consider a small web of 3 pages with the transition probability matrix

    T = [ 0    1/2  1/2
          1/2  0    1/2
          1/2  1/2  0   ]

and a dampening factor set to 0.85. Then the transition probability matrix of the Google Markov chain in this case is

    P = 0.85 T + (0.15/3) E = [ 1/20   19/40  19/40
                                19/40  1/20   19/40
                                19/40  19/40  1/20  ].

The eigenvalues of P are computed from det[P − λI] = 0. Using Matlab we get that the second eigenvalue is λ_2 = −0.4250, and according to Theorem 4, |λ_2| ≤ d, which is true since 0.4250 < 0.85.

Example 3. Consider another small web with 3 pages, where the transition probability matrix is

    T = [ 0  1/2  1/2
          0  1    0
          0  0    1   ]

and the dampening factor is set to 0.85. In this case we get that P is

    P = [ 1/20  19/40  19/40
          1/20  9/10   1/20
          1/20  1/20   9/10 ].

Since T has at least two irreducible closed subsets, the second eigenvalue must be equal to the dampening factor. From computation we get λ_2 = 0.85, and we see that the result from Theorem 5 holds.
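Both examples are easy to reproduce in Matlab (a short sketch; the matrices are exactly the ones above):

    T2 = [0 1/2 1/2; 1/2 0 1/2; 1/2 1/2 0];    % Example 2
    T3 = [0 1/2 1/2; 0 1 0; 0 0 1];            % Example 3: {2} and {3} are closed subsets
    d  = 0.85;
    eig(d*T2 + (1-d)/3*ones(3)).'              % contains -0.4250 (twice) and 1
    eig(d*T3 + (1-d)/3*ones(3)).'              % contains the eigenvalue d = 0.85 exactly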

4 Simulations
Now that we know from Theorems 4 and 5 that our choice of d affects λ_2, and thereby the convergence rate of the Google Markov chain, it is of interest to simulate some random networks in Matlab to investigate how different values of d affect some factors of the computation of PageRank. We begin by giving a short description of how our simulation of random networks is done and then move on to our results.

As mentioned earlier, the internet is structured in a way such that the pages link to each other and create more than one closed irreducible subset. Therefore, we take this into account in our simulation of networks to make them somewhat realistic, and we also remove links from a page to itself to prevent single pages from creating subsets. To do this, we randomize small transition probability matrices of a desired size and then put them on the diagonal of another, bigger matrix, which becomes our T. For example, if we randomize three smaller matrices A, B and C, it looks like

    T = [ A 0 0
          0 B 0
          0 0 C ]

(the 0's represent matrices of zeros of the same size as the small matrices). Then, if a page links to itself, we remove that link by putting a zero there instead; in other words, the diagonal of T contains only zeros. After this, if a page in T does not have a link, we randomize a page for it to link to, other than the page itself. We then use this modified T in our Google Markov chain P to do some tests.
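A condensed sketch of this construction (written differently from the appendix programs; the block size, the number of blocks and the link probability below are arbitrary example values):

    n = 5;  M = 3;  p = 0.4;  d = 0.85;            % block size, number of blocks, link probability, dampening factor
    blocks = arrayfun(@(k) double(rand(n) <= p), 1:M, 'UniformOutput', false);
    T = blkdiag(blocks{:});                        % M groups of pages that only link within their own group
    N = M*n;
    T(1:N+1:end) = 0;                              % remove links from a page to itself
    for i = find(sum(T,2) == 0).'                  % a page without links gets one random link to another page
        j = randi(N-1);
        T(i, j + (j >= i)) = 1;
    end
    T = T./sum(T,2);                               % row-normalise to get the transition probability matrix
    P = d*T + (1-d)/N*ones(N);                     % the Google Markov chain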

4.1 Multiplicity of the second eigenvalue


The first thing we tested was how r_2, the multiplicity of λ_2, is affected by different values of d. Recall (5), which shows that, apart from |λ_2| itself, the main factor affecting the convergence speed is r_2. If we choose a small d, we limit λ_2 to also being small and thereby get a high convergence rate, depending on the size of r_2. But if r_2 were bigger for smaller d, it would lower the rate of convergence, and then the choice of d would not give us the convergence speed we desired. When testing this we found that different d does not affect the multiplicity of λ_2, and therefore a choice of d close to 0 would give a fast convergence speed. For example, in a test with d = 0.85 for a network of 1000 pages we got r_2 = 3, and when changing d to 0.1 and 0.01 for the same matrix the multiplicity was still 3 (in Figures 2 and 3, the x-axis shows the multiplicity of the second eigenvalue and the y-axis the number of simulations).
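The same behaviour can be seen in a small, fully deterministic example (the three-group network below is a made-up toy case, so the multiplicity is 2 here rather than 3): the number of eigenvalues sharing the second-largest modulus does not change with d.

    B = [0 1 1; 1 0 1; 1 1 0];                     % one group of 3 pages, each linking to the other two
    T = blkdiag(B, B, B);  T = T./sum(T,2);        % three closed groups, zero diagonal
    m = size(T,1);
    for d = [0.85 0.1 0.01]
        lam = sort(abs(eig(d*T + (1-d)/m*ones(m))),'descend');
        r2  = sum(lam > lam(2) - 1e-6) - 1;        % eigenvalues sharing the second-largest modulus
        fprintf('d = %.2f   multiplicity = %d\n', d, r2);
    end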


Figure 2: d = 0.1

Figure 3: d = 0.01

4.2 Quality of the limit distribution


The next question is then why d should not be set to a very small number. After all, smaller d means faster convergence; in the extreme case d = 0, convergence would be almost immediate. However, since then P = (1/n)E, the limit distribution is always uniform, regardless of the structure of the internet itself. Intuitively, a larger d means that P is "closer" to T, and hence the PageRank should give a more realistic picture. We have not found any quantitative formulation of this intuition in the literature, and have therefore used simulations to obtain the PageRank for different d. In these tests we simulate transitions for networks of size 1000 for different d. The following figures show the number of times each state in P is visited when simulating 10 000 transitions with a starting state chosen at random (the x-axis shows the states and the y-axis shows the number of visits).

Figure 3: d = 0.1

Figure 4: d = 0.5

Figure 5: d = 0.85

Figure 6: d = 0.99

These visits are then counted and used to compute a steady state for P and, in turn, the PageRank of the pages. From the figures we see that the number of times a state is reached is evenly spread for lower values of d, which gives a more uniform steady state than for higher d. The PageRank for low d would then essentially be calculated from (1/n)E, and most of the structure from T would be lost. To investigate this further, we ran a test with a specified T (see Figure 7).

Figure 7: States 6, 7, 8, 9 and 10 have links to each other (not illustrated) and create an irreducible subset.

This time, we simulated 1000 transitions for different values of d, and did this 1000 times for each d, in order to get the mean value of the distance between the uniform distribution and our simulated stationary distribution.


Figure 8: d = 0.1

Figure 9: d = 0.5

Figure 10: d = 0.85

Figure 11: d = 0.99

In the figures above we see the number of visits in one simulation for each value of d we tested, and this time it is even more clear that lower values of d give a more uniform steady state. The distances we measured between the uniform distribution and our steady state, defined as

    Distance = √( Σ_{i=1}^{m} (π_i − 1/m)² ),

show that our statement is correct:

    Value of d:   0.1      0.5      0.85     0.99
    Distance:     0.0021   0.0030   0.0053   0.0679

Smaller values of d give a shorter distance to the uniform distribution. To make this obvious, we plotted the distance from simulations for values of d between 0.01 and 0.99 (the x-axis shows the value of d and the y-axis shows the distance).


Figure 12: The distance for different values of d.
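The same monotone behaviour can be cross-checked without simulation by computing the exact stationary distribution of a small chain. The sketch below uses the 3-page matrix of Example 3 rather than the 10-page network of Figure 7, so the numbers are not comparable to the table above, but the distance again grows with d:

    T = [0 1/2 1/2; 0 1 0; 0 0 1];                 % the matrix of Example 3 (two closed subsets)
    m = size(T,1);
    dvals = 0.01:0.01:0.99;
    dist = zeros(size(dvals));
    for k = 1:length(dvals)
        d = dvals(k);
        P = d*T + (1-d)/m*ones(m);
        [V,D] = eig(P.');                          % the stationary distribution is the left eigenvector for 1
        [~,i] = min(abs(diag(D) - 1));
        p = real(V(:,i)).';  p = p/sum(p);
        dist(k) = norm(p - ones(1,m)/m);           % Euclidean distance to the uniform distribution
    end
    plot(dvals, dist)                              % the distance increases with d, as in Figure 12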

5 Conclusion
From the results in Section 3, we have learned that the second largest eigenvalue of the Google Markov chain and its algebraic multiplicity directly affect the speed at which the chain converges to a stationary distribution. Furthermore, we have seen that the dampening factor restrains the second largest eigenvalue to be less than or equal to the value of the dampening factor, and to be exactly equal to it when there exist two or more closed irreducible subsets in the set of pages we use. In Section 4 we saw, from simulations of different networks, that different values of the dampening factor do not change the multiplicity of the second eigenvalue, and therefore a choice of a very small dampening factor would give a faster convergence speed than larger choices, such as 0.85. But from tests of the quality of the limit distribution, we discovered that setting the dampening factor to a low value changes the structure of the transition probability matrix of the Google Markov chain and pushes the limit distribution towards the uniform distribution. Since outgoing links from pages are the main factor in this method of computing a Google PageRank, a stationary distribution which is uniform would mean that all of the pages have the same PageRank. We would then have obtained a fast computation of a PageRank where almost all of the information from our original network has been lost, which would not be very useful. From these results we see that setting the dampening factor to 0.85, as the creators of Google did, might give us a good combination of quality of the limit distribution and fast convergence speed for the Google Markov chain.


6 References
[1] L. Page and S. Brin. "The Anatomy of a Large-Scale Hypertextual Web Search Engine". Computer Networks and ISDN Systems 30 (1998), 107-117.
[2] T.H. Haveliwala and S.D. Kamvar. "The Second Eigenvalue of the Google Matrix". Stanford University, Computer Science Department.
[3] D.L. Isaacson and R.W. Madsen. "Markov Chains: Theory and Applications", chapter IV, pages 126-127. John Wiley and Sons, Inc., New York, 1976.
[4] P. Brémaud. "Markov Chains: Gibbs Fields, Monte Carlo Simulation and Queues", chapter 6, pages 195-199. Springer-Verlag, New York, 1999.
[5] G.R. Grimmett and D.R. Stirzaker. "Probability and Random Processes", 2nd ed., Oxford Science Publications, Oxford, 1992.


Matlab Code
% Program for determining the multiplicity of the second eigenvalue.
% n    is the size of each smaller matrix put on the diagonal of a bigger matrix
% p    is the probability of a link
% d    is the dampening factor
% loop is the number of simulations
function [P] = randNet(n,p,d,loop)
M = 10;                               % number of smaller matrices on the diagonal
SEC = zeros(1,loop);                  % multiplicity of the second eigenvalue in each simulation
for k = 1:loop
    N = M*n;
    C = zeros(N);                     % connectivity matrix, built block by block
    for i = 1:M
        A = double(rand(n) <= p);     % random links within the block, each with probability p
        a = (i-1)*n+1;
        b = a+n-1;
        C(a:b,a:b) = A;
    end
    for i = 1:N
        C(i,i) = 0;                   % remove links from a page to itself
        if sum(C(i,:)) == 0           % a page without links gets one random link
            j = randi([1,N-1]);
            if j < i
                C(i,j) = 1;
            else
                C(i,j+1) = 1;
            end
        end
    end
    T = C./repmat(sum(C,2),1,N);      % row-normalise to get the transition matrix T
    P = d*T + (1-d)/N*ones(N);        % the Google Markov chain
    Eig = eig(P);
    sec = zeros(size(Eig));
    for i = 1:length(Eig)
        sec(i) = (abs(Eig(i)) > d - 0.000001);   % eigenvalues with modulus (close to) d or larger
    end
    SEC(1,k) = sum(sec) - 1;          % subtract the eigenvalue 1
end
hist(SEC)                             % x-axis: multiplicity, y-axis: number of simulations


% Program for determining PageRank and the distance to the uniform distribution.
% n     is the size of each smaller matrix put on the diagonal of a bigger matrix
% p     is the probability of a link
% d     is the dampening factor
% trans is the number of transitions
function [P] = Pagerank(n,p,d,trans)
M = 10;                               % number of smaller matrices on the diagonal
N = M*n;
C = zeros(N);
for i = 1:M
    A = double(rand(n) <= p);
    a = (i-1)*n+1;
    b = a+n-1;
    C(a:b,a:b) = A;
end
for i = 1:N
    C(i,i) = 0;                       % no links from a page to itself
    if sum(C(i,:)) == 0               % a page without links gets one random link
        j = randi([1,N-1]);
        if j < i
            C(i,j) = 1;
        else
            C(i,j+1) = 1;
        end
    end
end
T = C./repmat(sum(C,2),1,N);
P = d*T + (1-d)/N*ones(N);            % the Google Markov chain

nmb = trans;
states = zeros(1,nmb);
states(1) = randi([1,N]);             % random starting state
levels = cumsum(P,2);                 % cumulative transition probabilities, row by row
pi = zeros(1,N);                      % number of visits to each state
for i = 1:nmb-1
    u = rand;
    j = 1;
    while u > levels(states(i),j)     % draw the next state from row states(i) of P
        j = j+1;
    end
    states(i+1) = j;
    pi(j) = pi(j)+1;
end
pi = pi/sum(pi);                      % empirical stationary distribution
unif = ones(1,N)/N;
norm(pi-unif)                         % distance to the uniform distribution
hist(states(1000:nmb),N)              % visits per state (the first steps are discarded as burn-in)

% Program for determining PageRank and the distance to the uniform distribution
% for a specified Markov chain (the 10-page network of Figure 7).
% d     is the dampening factor (call with d = 0.99 to sweep d from 0.99 down to 0.01)
% trans is the number of transitions
% loop  is the number of simulations
function [P] = PagerankC(d,trans,loop)
% Connectivity matrix of the network in Figure 7: pages 6-10 all link to each
% other; the links of pages 1-5 are written here as one pattern consistent with
% the figure.
C = [0 1 0 0 0 0 0 0 0 0;
     0 0 1 0 0 0 0 0 0 0;
     0 0 0 1 0 0 0 0 0 0;
     0 0 0 0 1 0 0 0 0 0;
     0 0 0 0 0 1 0 0 0 0;
     0 0 0 0 0 0 1 1 1 1;
     0 0 0 0 0 1 0 1 1 1;
     0 0 0 0 0 1 1 0 1 1;
     0 0 0 0 0 1 1 1 0 1;
     0 0 0 0 0 1 1 1 1 0];
T = C./repmat(sum(C,2),1,10);
DIST = zeros(1,loop);                 % distance to the uniform distribution in each simulation
dampen = zeros(1,99);                 % mean distance for each value of d
dval = zeros(1,99);                   % the values of d that were used
for m = 1:99
    P = d*T + (1-d)/10*ones(10);
    for l = 1:loop
        nmb = trans;
        states = zeros(1,nmb);
        states(1) = 1;
        levels = cumsum(P,2);
        pi = zeros(1,10);
        for i = 1:nmb-1
            u = rand;
            j = 1;
            while u > levels(states(i),j)
                j = j+1;
            end
            states(i+1) = j;
            pi(j) = pi(j)+1;
        end
        pi = pi/sum(pi);
        unif = ones(1,10)/10;
        DIST(1,l) = norm(pi-unif);
    end
    dampen(1,m) = mean(DIST);
    dval(1,m) = d;
    d = d-0.01;                       % next, smaller, value of d
end
plot(dval(1:99),dampen(1:99))         % Figure 12: mean distance plotted against d

