Zipf Distribution
Generate zipfian subscription spaces
Take all combinations C(5,i) ($\sum_{i=0}^{5} C(5,i) = 2^5$)
C(5,0) {}
C(5,1) {1} {2} {3} {4} {5}
C(5,2) {1,2} {1,3} {1,4} {1,5} {2,3} {2,4} {2,5} {3,4} {3,5} {4,5}
C(5,3) {1,2,3} {1,2,4} {1,2,5} {1,3,4} {1,3,5} {1,4,5} {2,3,4} {2,3,5} {2,4,5} {3,4,5}
C(5,4) {1,2,3,4} {1,2,3,5} {1,2,4,5} {1,3,4,5} {2,3,4,5}
C(5,5) {1,2,3,4,5}
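The enumeration above can be reproduced with a few lines of Python. This is a minimal sketch, not the deck's own code; the names `items` and `subsets_by_size` are illustrative assumptions.

```python
from itertools import combinations

items = [1, 2, 3, 4, 5]

# Group every subset of `items` by its size i, i.e. the C(5,i) rows above.
subsets_by_size = {
    i: list(combinations(items, i)) for i in range(len(items) + 1)
}

for i, subsets in subsets_by_size.items():
    print(f"C(5,{i}): {len(subsets)} combinations")

# The row sizes add up to 2**5 subsets in total.
total = sum(len(subsets) for subsets in subsets_by_size.values())
print("total:", total)
```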
C(5,0)
C(5,1) {1} × 1/5 × C(5,1) × 1
       {2} × 1/5 × C(5,1) × 1
       {3} × 1/5 × C(5,1) × 1
       {4} × 1/5 × C(5,1) × 1
       {5} × 1/5 × C(5,1) × 1
C(5,2) {1} × 2/5 × C(5,2) × 2
       {2} × 2/5 × C(5,2) × 2
       {3} × 2/5 × C(5,2) × 2
       …
C(5,3)
C(5,4)
C(5,5)
Normal Combinations
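$$\mathrm{Prob}\big(\mathrm{rank},\, C(5,i)\big) = \frac{i}{5}$$

$$\mathrm{Prob}\Big(\mathrm{rank},\, \sum_{i=0}^{5} C(5,i)\Big) = \frac{\sum_{i=0}^{5} \mathrm{Prob}\big(\mathrm{rank},\, C(5,i)\big) \times C(5,i) \times i}{\sum_{i=0}^{5} C(5,i) \times i} = \text{constant}$$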
Every item has the same probability of appearing because the items carry no rank
How to generate normal combinations?
• Take C(5,2): its 10 pairs hold 20 item slots, and each item fills 4 of them
• 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5
• (1,2) (1,3) (1,4) (1,5) (2,3) (2,4) (2,5) (3,4) (3,5) (4,5)
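The uniform case can be checked directly: flattening the ten pairs gives the pool of 20 slots, and each item fills four of them. A small sketch (variable names are illustrative assumptions):

```python
from collections import Counter
from itertools import combinations

items = [1, 2, 3, 4, 5]

# All unordered pairs, i.e. the C(5,2) row.
pairs = list(combinations(items, 2))

# Flatten the pairs into item slots and count how often each item occurs.
slot_counts = Counter(item for pair in pairs for item in pair)

print(pairs)
print(dict(slot_counts))
```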
Zipf Formula
$$f(k; s, N) = \frac{1/k^{s}}{\sum_{n=1}^{N} \left(1/n^{s}\right)}$$
N - number of elements in distribution,
k - rank of element
s - value of exponent
Take N = 5, rank = {1, …, 5}, s = 1
Rank Zipf Prob
1 0.437956204379562062
2 0.218978102189781033
3 0.1459854014598544
4 0.109489051094890525
5 0.08759124087591241
CDF (5) 1.000000000000000004
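The table values follow from the Zipf formula with N = 5 and s = 1. The sketch below recomputes them; the function name `zipf_pmf` is an assumption introduced here, and the last printed digits may differ from the table because of floating-point rounding.

```python
def zipf_pmf(k: int, s: float, n: int) -> float:
    """Zipf probability of rank k among n elements with exponent s."""
    normaliser = sum(1.0 / (m ** s) for m in range(1, n + 1))
    return (1.0 / (k ** s)) / normaliser


# Probabilities for ranks 1..5 with s = 1, as in the table above.
probs = [zipf_pmf(k, s=1.0, n=5) for k in range(1, 6)]

for rank, p in enumerate(probs, start=1):
    print(rank, p)
print("CDF(5)", sum(probs))
```

With s = 1 the normaliser is the harmonic number 1 + 1/2 + 1/3 + 1/4 + 1/5 = 137/60, so rank 1 has probability 60/137.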
Take 5 attributes and rank them {1, …, 5}, with s = 1
Attribute Rank Zipf Prob
Feature 1 0.437956204379562062
ProductGroup 2 0.218978102189781033
Brand 3 0.1459854014598544
Label 4 0.109489051094890525
PackageQuantity 5 0.08759124087591241
CDF (5) 1.000000000000000004
Zipf Distribution
Prob(rank, C(5,i)) = zipf(rank)
Generating….
C(5,0) {}
C(5,1) {Feature} × zipf(Feature) × C(5,1) × 1
       {ProductGroup} × zipf(ProductGroup) × C(5,1) × 1
       {Brand} × zipf(Brand) × C(5,1) × 1
       {Label} × zipf(Label) × C(5,1) × 1
       {PQ} × zipf(PackageQuantity) × C(5,1) × 1
C(5,2) {Feature} × zipf(Feature) × C(5,2) × 2
       {ProductGroup} × zipf(ProductGroup) × C(5,2) × 2
       …
C(5,3)
C(5,4)
C(5,5)
Generating….
C(5,0) {}
C(5,1) {Feature} × 0.437956204379562062 × 5 × 1
       {ProductGroup} × 0.218978102189781033 × 5 × 1
       {Brand} × 0.1459854014598544 × 5 × 1
       {Label} × 0.109489051094890525 × 5 × 1
       {PQ} × 0.08759124087591241 × 5 × 1
C(5,2) {Feature} × 0.437956204379562062 × 10 × 2
       {ProductGroup} × 0.218978102189781033 × 10 × 2
       {Brand} × 0.1459854014598544 × 10 × 2
       {Label} × 0.109489051094890525 × 10 × 2
       {PQ} × 0.08759124087591241 × 10 × 2
C(5,3)
C(5,4)
C(5,5)
Zipf Distribution
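$$\mathrm{Prob}\big(\mathrm{rank},\, C(5,i)\big) = \mathrm{zipf}(\mathrm{rank})$$

$$\mathrm{Prob}\Big(\mathrm{rank},\, \sum_{i=0}^{5} C(5,i)\Big) = \frac{\sum_{i=0}^{5} \mathrm{Prob}\big(\mathrm{rank},\, C(5,i)\big) \times C(5,i) \times i}{\sum_{i=0}^{5} C(5,i) \times i}$$

$$\mathrm{Prob}\Big(\mathrm{rank},\, \sum_{i=0}^{5} C(5,i)\Big) \propto \mathrm{zipf}(\mathrm{rank})$$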
Ranks appear according to the Zipf distribution
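The proportionality can be checked numerically. The sketch below assumes, as the slide states, that the per-size probability equals zipf(rank) for every i, and weights each size by its C(5,i) × i item slots; `zipf_pmf`, `weights`, and `aggregated` are names introduced here for illustration.

```python
from math import comb


def zipf_pmf(k: int, s: float, n: int) -> float:
    """Zipf probability of rank k among n elements with exponent s."""
    normaliser = sum(1.0 / (m ** s) for m in range(1, n + 1))
    return (1.0 / (k ** s)) / normaliser


N = 5
# Number of item slots contributed by each combination size i: C(5,i) * i.
weights = [comb(N, i) * i for i in range(N + 1)]

aggregated = {}
for rank in range(1, N + 1):
    p = zipf_pmf(rank, 1.0, N)
    # Slot-weighted average of the per-size probability over all sizes.
    aggregated[rank] = sum(p * w for w in weights) / sum(weights)
    print(rank, p, aggregated[rank])
```

Because the per-size probability does not depend on i, the weighted average collapses to zipf(rank) itself.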
How to generate zipf combinations?
• Take C(5,2): the 20 item slots are shared out in proportion to zipf(rank), rounded to whole counts
• Feature × 9
• ProductGroup × 4
• Brand × 3
• Label × 2
• PQ × 2
• F1 F2 F3 F4 F5 F6 F7 F8 F9 P1 P2 P3 P4 B1 B2 B3 L1 L2 Q1 Q2
• (F1, P1) (F2, B1) (F3, L1) (F4, Q1) (P2, B2) (P3, L2) (P4, Q2) (F5, B3) (F6, F7) (F8, F9)
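The counts above (9, 4, 3, 2, 2) match zipf(rank) × 20 rounded to the nearest integer. The sketch below rebuilds the pool that way. The slides do not say how tokens are paired, so the shuffle-and-pair step is purely an assumption for illustration, as are all names used here.

```python
import random
from math import comb

attributes = ["Feature", "ProductGroup", "Brand", "Label", "PackageQuantity"]


def zipf_pmf(k: int, s: float, n: int) -> float:
    """Zipf probability of rank k among n elements with exponent s."""
    normaliser = sum(1.0 / (m ** s) for m in range(1, n + 1))
    return (1.0 / (k ** s)) / normaliser


size = 2
slots = comb(len(attributes), size) * size  # 10 pairs x 2 items = 20 slots

# Share the slots out in proportion to zipf(rank), rounded to whole tokens.
# Rounding is not guaranteed to sum to `slots` for other parameters.
counts = {
    name: round(zipf_pmf(rank, 1.0, len(attributes)) * slots)
    for rank, name in enumerate(attributes, start=1)
}

# Pool of attribute tokens, e.g. nine copies of "Feature".
pool = [name for name, count in counts.items() for _ in range(count)]

# Assumed pairing step: shuffle, then cut the pool into consecutive pairs.
rng = random.Random(0)  # fixed seed only so the run is repeatable
rng.shuffle(pool)
pairs = [tuple(pool[j:j + size]) for j in range(0, len(pool), size)]

print(counts)
print(pairs)
```

As in the slide's own example, this pairing can put two copies of the same attribute in one pair; a real generator may need to reject or repair such pairs.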
