Unique Identifier Generation in
Distributed Environment
Jianhan Zhu
• In distributed systems, sequential IDs are not always an option
• IDs should be as short as possible so they are easy to share
• A 36-character GUID can be too long: 00017071-8786-42a5-94d9-dc0f62f585fc
• A balance between ID length and probability of collision
• The shorter the ID, the higher the probability of collision
[Figure: probability of collision (%) against ID length, from 0 to 36 characters]
Birthday Paradox
• For 𝑛 randomly chosen people, the birthday paradox gives the probability that at least two of them
share a birthday; the same reasoning applies to 𝑛 randomly generated IDs:
$P(\text{collision}) \approx 1 - e^{-\frac{n^{2}}{2x}}$
• 𝑥: number of possible ID values
• 𝑛: number of IDs we plan to have
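For reference, the approximation can be derived from the exact probability of no collision. This is a sketch that assumes IDs are drawn uniformly and independently from the $x$ possible values and that $n$ is much smaller than $x$:

```latex
\begin{align*}
P(\text{no collision}) &= \prod_{k=1}^{n-1}\left(1 - \frac{k}{x}\right)
  \approx \prod_{k=1}^{n-1} e^{-k/x}
  = e^{-\frac{n(n-1)}{2x}}
  \approx e^{-\frac{n^{2}}{2x}}, \\
P(\text{collision}) &\approx 1 - e^{-\frac{n^{2}}{2x}}.
\end{align*}
```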
$x = 62^{8}$
• 52 Alphabetic and 10 numeric characters
• ID of length 8
𝑛
• Currently about 40K IDs, giving a collision probability of roughly 0.0003%
• At 1 million IDs, the probability is 0.23%
• The number of IDs will grow to tens of millions or more in future
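The figures above can be reproduced with a few lines of Python. This is a minimal sketch of the birthday-paradox approximation; the function name `collision_probability` and its parameters are illustrative and do not come from the slides.

```python
import math

ALPHABET_SIZE = 62  # 52 alphabetic + 10 numeric characters


def collision_probability(n: int, id_length: int,
                          alphabet_size: int = ALPHABET_SIZE) -> float:
    """Birthday-paradox approximation: 1 - exp(-n^2 / (2x))."""
    x = alphabet_size ** id_length  # number of possible ID values
    # -expm1(-t) equals 1 - exp(-t) but keeps precision when t is tiny
    return -math.expm1(-(n * n) / (2 * x))


if __name__ == "__main__":
    for n in (40_000, 1_000_000, 10_000_000):
        p = collision_probability(n, id_length=8)
        print(f"n = {n:>10,}: {100 * p:.5f}%")
```

Using `math.expm1` rather than `1 - math.exp(...)` avoids losing precision at small probabilities such as the 40K case.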
In triple store:
• Generated ID: 1AFu55Hs
• Prefix: https://id.parliament.uk
• Resource URI: https://id.parliament.uk/1AFu55Hs
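A generator for such IDs might look like the sketch below. The slides only name "Crypto Random" as the random source, so the use of Python's `secrets` module here is an assumption, as are the function names `generate_id` and `resource_uri`.

```python
import secrets
import string

# 52 alphabetic + 10 numeric characters, as described in the slides
ALPHABET = string.ascii_letters + string.digits
ID_LENGTH = 8
PREFIX = "https://id.parliament.uk"  # prefix taken from the slide


def generate_id(length: int = ID_LENGTH) -> str:
    """Return a random ID drawn uniformly from the 62-character alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def resource_uri(generated_id: str, prefix: str = PREFIX) -> str:
    """Join the prefix and the generated ID into a resource URI."""
    return f"{prefix}/{generated_id}"


if __name__ == "__main__":
    new_id = generate_id()
    print(new_id, resource_uri(new_id))
```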
• Results for different ID lengths:
• Random data source: Crypto Random

ID length  IDs generated before a collision (simulation)  Probability of collision (at number of IDs)
5          36K                                            51% (36K)
6          289K                                           52% (289K)
7          2.3 million                                    51% (2.3 million)
8          Out of memory                                  0.002% (100K)
                                                          0.06% (0.5 million)
                                                          0.23% (1 million)
                                                          5.56% (5 million)
                                                          20.5% (10 million)
9          -                                              0.37% (10 million)
                                                          8.82% (50 million)
                                                          30.88% (100 million)
10         -                                              0.59% (100 million)
                                                          2.35% (200 million)
                                                          5.22% (300 million)
                                                          13.84% (500 million)
                                                          44.88% (1000 million)
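A simulation of this kind can be sketched as follows. This is not the author's code: the function names are hypothetical, `secrets` stands in for the unspecified crypto random source, and the demo uses very short IDs so that it finishes quickly. Keeping every generated ID in memory is also what makes length 8 impractical, consistent with the "Out of memory" entry.

```python
import math
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters


def ids_until_first_collision(id_length: int) -> int:
    """Generate random IDs until one repeats; return how many were generated."""
    seen = set()
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(id_length))
        if candidate in seen:
            return len(seen) + 1  # the count includes the colliding ID
        seen.add(candidate)


def ids_for_even_odds(id_length: int) -> float:
    """Solve 1 - exp(-n^2 / (2x)) = 0.5 for n, giving n = sqrt(2 x ln 2)."""
    x = len(ALPHABET) ** id_length
    return math.sqrt(2 * x * math.log(2))


if __name__ == "__main__":
    for length in (2, 3, 4):  # short lengths keep run time and memory small
        runs = sorted(ids_until_first_collision(length) for _ in range(20))
        median = runs[len(runs) // 2]
        print(length, median, round(ids_for_even_odds(length)))
```

For length 5 the formula gives roughly 36K IDs for a 50% chance of collision, which is in line with the simulated figure reported for that length.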
• Data estimates on current triple store http://indexing.parliament.uk
• 174 million triples
• 9.2 million unique subjects (2.9 million blank nodes)
Subject prefix                                           Num of triples  Num of unique subjects  Average num of triples per subject
http://data.parliament.uk/pimsdata/                      92,708,852      2,960,851               31.3
http://data.parliament.uk/edms/                          24,196,297      1,939,024               12.5
http://hansard.intranet.data.parliament.uk/              18,115,694      552,505                 32.8
http://tabledpq.indexing.parliament.uk/                  6,967,006       191,173                 36.4
http://data.parliament.uk/writtenparliamentaryquestion/  3,716,166       70,199                  52.9
http://esid.parliament.uk/EUDocument/                    3,247,707       149,035                 21.8
http://data.parliament.uk/depositedpapers/               2,551,168       80,185                  31.8
http://services.paperslaid.devci.dev.parliament.uk/      644,193         23,121                  27.9
http://data.parliament.uk/terms/uncontrolled/            606,951         172,373                 3.5
http://data.parliament.uk/resources/                     490,192         31,509                  15.6
http://data.parliament.uk/currentawareness/              487,227         22,044                  22.1
http://paperslaidpoller.parliament.uk/                   396,153         9,636                   41.1
• Conclusions:
• Use 8-character IDs for the near future
• The ID length will need to increase to accommodate more IDs
• At 1 million IDs (0.23% probability of collision)?
• Will the data be structured differently from the previous two triple stores?
• In future, add an ID collision check against the triple store if the effect on
performance is acceptable
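One way to decide when the ID length has to grow is to invert the birthday approximation for a chosen risk threshold. The sketch below assumes a hypothetical tolerance of 1%; the slides do not state a threshold, and the function name `min_id_length` is illustrative.

```python
import math

ALPHABET_SIZE = 62  # 52 alphabetic + 10 numeric characters


def min_id_length(n: int, max_probability: float,
                  alphabet_size: int = ALPHABET_SIZE) -> int:
    """Smallest ID length whose approximate collision probability for n IDs
    stays at or below max_probability."""
    # Invert 1 - exp(-n^2 / (2x)) <= p into x >= n^2 / (-2 ln(1 - p))
    required_x = n * n / (-2 * math.log1p(-max_probability))
    length = 1
    while alphabet_size ** length < required_x:
        length += 1
    return length


if __name__ == "__main__":
    for n in (1_000_000, 10_000_000, 100_000_000):
        print(f"{n:>12,} IDs -> length {min_id_length(n, 0.01)}")
```

With the assumed 1% tolerance this gives lengths 8, 9 and 10 for 1 million, 10 million and 100 million IDs respectively, matching the probabilities in the results table.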
• Challenges:
• If a collision occurs, how do we spot it? (Log generated IDs?)
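The collision check and the logging idea could be combined roughly as below. This is only a sketch: `exists` is a hypothetical callable standing in for a lookup against the triple store, the retry limit is an arbitrary assumption, and an in-memory set replaces the store in the demo.

```python
import logging
import secrets
import string

ALPHABET = string.ascii_letters + string.digits
logger = logging.getLogger("id_generation")


def generate_unique_id(exists, length: int = 8, max_attempts: int = 5) -> str:
    """Generate an ID, retrying when exists(candidate) reports a collision.

    `exists` is a hypothetical stand-in for a triple store lookup; it takes
    a candidate ID and returns True if that ID is already in use.
    """
    for attempt in range(1, max_attempts + 1):
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if not exists(candidate):
            logger.info("generated id %s (attempt %d)", candidate, attempt)
            return candidate
        # Logging the collision makes it possible to spot it afterwards
        logger.warning("collision on id %s (attempt %d)", candidate, attempt)
    raise RuntimeError(f"no unique ID found after {max_attempts} attempts")


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    issued = set()  # in-memory stand-in for the triple store
    for _ in range(3):
        new_id = generate_unique_id(issued.__contains__)
        issued.add(new_id)
    print(sorted(issued))
```

Note that a check followed by a separate write is not atomic, so in a distributed setting two writers could still pick the same ID between the check and the insert; the log of generated IDs is what would reveal such a case.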
Further Reading
• https://en.wikipedia.org/wiki/Birthday_problem
• https://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator
• https://en.wikipedia.org/wiki/Universally_unique_identifier
• https://eager.io/blog/how-long-does-an-id-need-to-be/
• https://github.com/twitter/snowflake
• Parliament Data Platform: https://api.parliament.uk/openapi.json
