Zipfs Law & Zipfian Distribution in SEO - Pubcon Virtual Fall 2020 - Dawn Anderson
15. Popularised by George Kingsley Zipf
• George Kingsley Zipf (1902-1950)
• Linguist & philologist
• Popularised ‘Zipf’s Law’ but did not claim to have discovered it
• Observed patterns in word frequency distribution
• Extended Zipf’s Law to data types other than words
• Developed ‘Principle of Least Effort’ (Path of least resistance)
• Image attribution: Freeport High School, Freeport, Illinois., Public domain, via Wikimedia Commons
41. Historically, of course, many of these words were considered to add no value (stop words)
59. And Zipf's Law is a Power Law
“A power law is a functional relationship between two quantities, where a relative change in one quantity results in a proportional relative change in the other quantity, independent of the initial size of those quantities: one quantity varies as a power of another.” (Source: Wikipedia)
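The "independent of the initial size" part of that definition can be checked numerically. The sketch below is illustrative only: the function names `power_law` and `relative_change` are made up for this example, and an exponent of 1 is assumed as the textbook Zipf case (real word-frequency data usually only approximates it).

```python
def power_law(x: float, c: float = 1.0, alpha: float = 1.0) -> float:
    """Return c * x**(-alpha). With alpha = 1 this is the textbook Zipf form."""
    return c * x ** (-alpha)


def relative_change(x: float, factor: float, alpha: float = 1.0) -> float:
    """Ratio f(factor * x) / f(x); for a power law this is factor**(-alpha)."""
    return power_law(factor * x, alpha=alpha) / power_law(x, alpha=alpha)


# Doubling the rank changes the expected frequency by the same ratio,
# whether we start at rank 1, rank 10 or rank 1000.
ratios = [relative_change(rank, 2.0) for rank in (1, 10, 1000)]
```

Under these assumptions every entry in `ratios` is the same value, which is what distinguishes a power law from, say, an exponential decay, where the ratio would depend on the step size rather than the scaling factor.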
69. Some Baeza-Yates Papers on Web Characteristics
• Characterization of National Web Domains (2007)
• Link Analysis in National Web Domains (2005)
• Characteristics of the Web of Spain (2005)
• Bias on The Web (2018)
221. • Adamic, L.A. and Huberman, B.A., 2002. Zipf's law and the Internet. Glottometrics, 3(1), pp.143-150.
• Baeza-Yates, R., Castillo Ocaranza, C. and López Martínez, V., 2005. Characteristics of the Web of Spain.
• Baeza-Yates, R., Gionis, A., Junqueira, F., Murdock, V., Plachouras, V. and Silvestri, F., 2007, July. The impact of caching on search engines. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 183-190).
• Baeza-Yates, R., Castillo, C. and Efthimiadis, E.N., 2007. Characterization of national web domains. ACM Transactions on Internet Technology (TOIT), 7(2), pp.9-es.
• Baeza-Yates, R., Boldi, P. and Chierichetti, F., 2015, May. Essential web pages are easy to find. In Proceedings of the 24th International Conference on World Wide Web (pp. 97-107).
• Baeza-Yates, R., 2018. Bias on the web. Communications of the ACM, 61(6), pp.54-61.
• Becchetti, L. and Castillo, C., 2006, May. The distribution of PageRank follows a power-law only for particular values of the damping factor. In Proceedings of the 15th international conference on World Wide Web (pp. 941-942).
• Zipf, G.K., 2016. Human behavior and the principle of least effort: An introduction to human ecology. Ravenio Books.
226. Maximal Shame Power Law (Baeza-Yates
“One phenomenon that has appeared before in our own studies and now is completely clear is the smaller power law exponent at the beginning of several of the measures presented. In fact, this happens for file sizes up to 25Kb, pages per-site up to (15–30), pages per-domain up to 10 (except South Korea), number of outlinks in a page up to 10 to 40, and average number of internal links per site up to 15 to 30, where a range is given to show the variability for different countries. We argue that this is due to another empirical power law that we call maximal shame which forces people to work a bit more than the minimum until they feel good about their work. Notice that this maximal shame can be for an individual or for a group (e.g., in the case of a Web site).”
Baeza-Yates, R., Castillo, C. and Efthimiadis, E.N., 2007. Characterization of national web domains. ACM Transactions on Internet Technology (TOIT), 7(2), pp.9-es.
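One way to picture the "smaller power law exponent at the beginning" described in the quote is to fit a straight line to the log-log plot separately before and after a break point. The sketch below is not the authors' method and uses no data from the paper: the curve is synthetic, and the break at 10 and the exponents 0.5 and 2.0 are arbitrary values chosen for illustration.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var


# Synthetic, made-up curve (NOT data from the paper): exponent 0.5 up to
# x = 10, then exponent 2.0, joined continuously at the break.
BREAK = 10


def two_regime(x):
    if x <= BREAK:
        return x ** -0.5
    return BREAK ** -0.5 * (x / BREAK) ** -2.0


xs = list(range(1, 101))
ys = [two_regime(x) for x in xs]

# The exponent is the negated slope of each straight segment on log-log axes.
head_exponent = -loglog_slope(xs[:BREAK], ys[:BREAK])
tail_exponent = -loglog_slope(xs[BREAK:], ys[BREAK:])
```

With noise-free synthetic input the two fits simply recover the exponents that were put in. On real crawl data the break point is not known in advance and a plain least-squares fit on log-log axes is only a rough estimate.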