A bit of information about
Checksums
By Ross Spencer
Extracts from a joint presentation by myself, Jan Hutař, and Andrea K. Byrne for Archives
NZ colleagues…
Checksums – why?
• Why do we use checksums? Policy – Integrity:
“This policy deals with the integrity of digital content. Digital content is
information encapsulated in one or more digital objects. Within this
context, integrity of a digital object is the quality of its content
remaining ‘uncorrupted and free of unauthorized and undocumented
changes’” (UNESCO 2003).
• Moving files – validation after the move
• Working with files – uniquely identifying what
we’re working with
• Security… a by-product of integrity
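To make "validation after the move" concrete, here is a minimal sketch in Python. The helper names (file_checksum, move_and_validate) and the file names are made up for illustration, and MD5 is only a default; this is not how any particular archive system does it.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="md5", chunk_size=65536):
    """Return the hex checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_and_validate(source, destination, algorithm="md5"):
    """Move a file and report whether its checksum is unchanged afterwards."""
    before = file_checksum(source, algorithm)
    shutil.move(str(source), str(destination))
    after = file_checksum(destination, algorithm)
    return before == after


# Demonstration with a throwaway file in a temporary directory (hypothetical names).
with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "record.txt"
    original.write_bytes(b"an example record")
    moved = Path(workdir) / "moved-record.txt"
    move_ok = move_and_validate(original, moved)
    print("Checksum unchanged after move:", move_ok)
```

If the two checksums differ, the file changed somewhere between the source and the destination and the move should be investigated.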
What do checksums look like
• Hexadecimal notation, making a bigger number look smaller!
• Numbers 0-9
• And Letters A-F
---
281,949,770,000,000,000,000,000,000,000,000,000,000 (rounded)
becomes:
d41d8cd98f00b204e9800998ecf8427e
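A short Python sketch of the same idea, converting a checksum between hexadecimal and decimal. One fact from outside the slides: the hexadecimal value shown above is the MD5 of empty input (zero bytes), which is what the sketch hashes.

```python
import hashlib

# MD5 of zero bytes of input; this is the hexadecimal value shown above.
hex_digest = hashlib.md5(b"").hexdigest()

# The same checksum written as an ordinary (decimal) whole number.
as_integer = int(hex_digest, 16)

print(hex_digest)
print(f"{as_integer:,}")
print(len(hex_digest), "hex characters versus", len(str(as_integer)), "decimal digits")
```

Hexadecimal needs 32 characters where decimal needs 39 digits, which is the "bigger number look smaller" effect.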
What do checksums look like…
• John Doe
MD5: 4c2a904bafba06591225113ad17b5cec
• Jane Doe
SHA1: cac7bbb6b67b44ea0ab997d34a88e4ea9b4d3d62
• Axl Roe
SHA256: 21bd701e54de1d61bba99623509cdd794042dc3f2141eed2e853482cfbcccbf0
• MD5, SHA1, and SHA256 are different algorithms, each with a different output length
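The three algorithms can be tried side by side with Python's hashlib. This is only a sketch: it assumes the name is hashed as UTF-8 text with no trailing newline, which may or may not match how the values on the slide were produced.

```python
import hashlib

# Assumption: the text is encoded as UTF-8 with no trailing newline.
name = "John Doe".encode("utf-8")

digest_lengths = {}
for algorithm in ("md5", "sha1", "sha256"):
    hex_digest = hashlib.new(algorithm, name).hexdigest()
    digest_lengths[algorithm] = len(hex_digest)
    print(f"{algorithm:>6}: {hex_digest} ({len(hex_digest)} characters)")
```

Whatever the input, MD5 always gives 32 hexadecimal characters, SHA1 gives 40, and SHA256 gives 64.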
What do checksums look like…
USA: f75d91cdd36b85cc4a8dfeca4f24fa14
USB: 7aca5ec618f7317328dcd7014cf9bdcf
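A one-letter change in the input gives a completely different checksum. The sketch below counts how many hexadecimal positions differ between the two MD5 values; it assumes plain UTF-8 text input, and the helper names are made up for illustration.

```python
import hashlib


def md5_hex(text):
    """MD5 of a text string, assuming UTF-8 encoding."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def differing_positions(first, second):
    """Count the character positions at which two equal-length strings differ."""
    return sum(1 for a, b in zip(first, second) if a != b)


usa = md5_hex("USA")
usb = md5_hex("USB")
changed = differing_positions(usa, usb)

print("USA:", usa)
print("USB:", usb)
print(f"{changed} of {len(usa)} hex characters differ")
```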
What are checksums doing?
- Deterministic – The same input gives the same output
- Uniform/even distribution – inputs are spread evenly across the possible outputs
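Both properties can be checked informally in Python. The sketch hashes the same input twice, then hashes 16,000 different inputs and counts how often each first hexadecimal character appears. The inputs are simply the numbers 0 to 15,999 written as text, chosen for illustration.

```python
import hashlib
from collections import Counter


def md5_hex(data):
    return hashlib.md5(data).hexdigest()


# Deterministic: hashing the same input twice gives the same output.
same_twice = md5_hex(b"Joan") == md5_hex(b"Joan")
print("Same input, same output:", same_twice)

# Even distribution: bucket 16,000 inputs by the first hex character of their
# checksum. With 16 possible characters, each bucket should hold roughly 1,000.
buckets = Counter(md5_hex(str(i).encode("ascii"))[0] for i in range(16000))
for character in sorted(buckets):
    print(character, buckets[character])
```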
An algorithm does the computing bit…
MD5 or…
- A checksum algorithm is a one-way function…
- “a7fc44290f691cd888b68b59eb4989a1” cannot be turned back
into “Joan”!
- The algorithm computing the checksum varies in complexity and goes by
different names… e.g. MD5:
It’s irreversible:
Think: Susan Storm, She Hulk, and The Thing
Rather than: The Hulk
Why do we always talk about the same ones in our workflows?
• Namely: CRC32, MD5, SHA1, SHA256…
• different algorithms
• DROID can handle MD5, SHA1, and SHA256
• MD5 and SHA1 are the only overlaps with Rosetta
(Oct 2016)
• Rosetta handles (creates and validates):
• CRC32
• MD5
• SHA1
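Since different tools expect different algorithms, it can be handy to compute several checksums in one read of the data. A minimal Python sketch follows; the function name multi_checksum is made up, and nothing here reflects how DROID or Rosetta work internally.

```python
import hashlib
import zlib
from io import BytesIO


def multi_checksum(stream, chunk_size=65536):
    """Compute CRC32, MD5, SHA1 and SHA256 in a single pass over a binary stream."""
    crc = 0
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    sha256 = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        crc = zlib.crc32(chunk, crc)  # running CRC carried from chunk to chunk
        md5.update(chunk)
        sha1.update(chunk)
        sha256.update(chunk)
    return {
        "crc32": f"{crc & 0xFFFFFFFF:08x}",
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
        "sha256": sha256.hexdigest(),
    }


# BytesIO stands in for an open file; open(path, "rb") works the same way.
checksums = multi_checksum(BytesIO(b"an example record"))
for algorithm, value in checksums.items():
    print(f"{algorithm:>6}: {value}")
```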
Why multiple checksums?
• There are a limited number of unique numbers that can be output by a
checksum algorithm, so sometimes we see collisions:
4 possible outputs, 5 inputs:
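The "4 possible outputs, 5 inputs" situation can be reproduced with a toy checksum. The function below is invented purely for illustration: it squeezes MD5 down to four possible values, so five inputs must produce at least one collision.

```python
import hashlib


def tiny_checksum(text):
    """Toy checksum with only 4 possible outputs (0 to 3): first MD5 byte modulo 4."""
    return hashlib.md5(text.encode("utf-8")).digest()[0] % 4


inputs = ["alpha", "bravo", "charlie", "delta", "echo"]

seen = {}        # checksum value -> first input that produced it
collisions = []  # (earlier input, later input, shared checksum)
for item in inputs:
    value = tiny_checksum(item)
    if value in seen:
        collisions.append((seen[value], item, value))
    else:
        seen[value] = item

for earlier, later, value in collisions:
    print(f"{earlier!r} and {later!r} share checksum {value}")
```

Real algorithms have vastly more possible outputs, but the principle is the same: more possible inputs than outputs means collisions must exist.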
Collisions, really?
• But also keep in mind the probability of that happening for more complex
algorithms:
The probabilities are low (approximate number of files needed
for a 50% chance of one collision)
• CRC32 - 32-bit output - 8 character length
About 77 thousand – 77,165
• MD5 - 128-bit output - 32 character length
About 21 quintillion – 21,719,643,148,400,763,000
• SHA1 - 160-bit output - 40 character length
About 1 septillion – 1,423,418,533,373,592,400,000,000
• SHA256 - 256-bit output - 64 character length
About 400 undecillion – 400,656,698,530,848,040,000,000,000,000,000,000,000
For comparison: about 4.4 million (4,443,745) files in Rosetta (as of 13/01/2016)
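Figures of this kind come from the "birthday problem". The Python sketch below uses the standard approximation n ≈ sqrt(2 × N × ln(1 / (1 − p))), where N is the number of possible outputs and p the chance of a collision. It assumes checksums behave like random numbers, and its results may differ from the slide figures in the last few digits because of rounding.

```python
import math


def files_for_collision(bits, probability=0.5):
    """Approximate number of files needed for the given chance of one collision."""
    outputs = 2 ** bits
    return math.sqrt(2 * outputs * math.log(1 / (1 - probability)))


def collision_probability(files, bits):
    """Approximate chance of at least one collision among `files` random checksums."""
    outputs = 2 ** bits
    exponent = files * (files - 1) / (2 * outputs)
    return -math.expm1(-exponent)  # 1 - exp(-x), accurate for very small x


for name, bits in (("CRC32", 32), ("MD5", 128), ("SHA1", 160), ("SHA256", 256)):
    print(f"{name:>6}: about {files_for_collision(bits):.4g} files for a 50% chance")

# The collection size quoted above, tested against two of the algorithms.
for name, bits in (("CRC32", 32), ("MD5", 128)):
    chance = collision_probability(4_443_745, bits)
    print(f"{name:>6}: chance of a collision among 4,443,745 files is about {chance:.3g}")
```

At that collection size a CRC32 collision is practically certain, while an accidental MD5 collision remains vanishingly unlikely.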
What if we got one?
• Archivists have the concept of fixity – indicators
of the file not changing, but also – we can
understand what the file is…
• Two files the same according to checksum:
– What was the last accessed date?
– What is the file name?
– What is the file size?
– What is the file type?
– What does it look like?
– We can figure it out!
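If two files ever did share a checksum, a few lines of Python could answer several of those questions. This is a sketch with made-up helper and file names: it gathers name, size, and last-accessed time, then compares the files byte by byte.

```python
import filecmp
import os
import tempfile
from pathlib import Path


def describe(path):
    """Collect basic facts about a file: name, size in bytes, last-accessed time."""
    info = os.stat(path)
    return {"name": Path(path).name, "size": info.st_size, "last_accessed": info.st_atime}


def same_content(path_a, path_b):
    """Byte-by-byte comparison; shallow=False forces the contents to be read."""
    return filecmp.cmp(path_a, path_b, shallow=False)


# Demonstration with throwaway files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as workdir:
    first = Path(workdir) / "minutes.txt"
    second = Path(workdir) / "minutes-copy.txt"
    third = Path(workdir) / "agenda.txt"
    first.write_bytes(b"meeting minutes")
    second.write_bytes(b"meeting minutes")
    third.write_bytes(b"meeting agenda!")  # same size, different content

    first_facts = describe(first)
    third_facts = describe(third)
    copies_identical = same_content(first, second)
    others_identical = same_content(first, third)

print(first_facts["name"], first_facts["size"], "bytes")
print(third_facts["name"], third_facts["size"], "bytes")
print("minutes.txt vs minutes-copy.txt identical:", copies_identical)
print("minutes.txt vs agenda.txt identical:", others_identical)
```

Two files with the same checksum and the same bytes are duplicates; the same checksum with different bytes would be a genuine collision.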
So why?
• We will ensure uniqueness
• We can automate processes with the files better
with checksums (they’re just numbers!)
• Some may have a preference – it is convenient for us
that Rosetta handles MD5 as well!
• Future proof – one day we will have a lot more files!
• Security – for most altruistic purposes, our
checksums are okay… but older checksums can be
hacked (engineered) – we keep this in mind 10% of
the time we talk about them in an archive…
Checksums – where do they come from?
• We generate them with a tool:
– Free Commander (Windows)
– online tool on the Internet (http://www.md5.cz/)
– SHA1SUM, MD5SUM (Linux)
– DROID!!
• We create a list and compare and validate with another:
– Spreadsheet
– SHA1SUM, MD5SUM (Linux)
– AVPreserve Fixity: https://vimeo.com/100311241
– My comparator: https://github.com/exponential-decay/checksum-comparator
• Other tools out there, many internet links!
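Comparing one checksum list against another can be sketched in a few lines of Python. The sketch assumes lists in the "checksum, whitespace, file name" layout written by md5sum and sha1sum. The function names and example files are made up, and this is not the code of any of the tools listed above.

```python
import hashlib


def parse_manifest(text):
    """Parse 'checksum  filename' lines into a {filename: checksum} dictionary."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        checksum, name = line.split(None, 1)
        # md5sum marks binary mode with a leading '*' on the file name.
        entries[name.lstrip("*")] = checksum.lower()
    return entries


def compare_manifests(expected, actual):
    """Sort file names into matched, changed, missing and unexpected."""
    return {
        "matched": sorted(n for n in expected if actual.get(n) == expected[n]),
        "changed": sorted(n for n in expected if n in actual and actual[n] != expected[n]),
        "missing": sorted(n for n in expected if n not in actual),
        "unexpected": sorted(n for n in actual if n not in expected),
    }


def md5_hex(data):
    return hashlib.md5(data).hexdigest()


# Two hypothetical lists: one made before a transfer, one made after it.
before = "\n".join([
    f"{md5_hex(b'first file')}  a.txt",
    f"{md5_hex(b'second file')}  b.txt",
    f"{md5_hex(b'third file')}  c.txt",
])
after = "\n".join([
    f"{md5_hex(b'first file')}  a.txt",
    f"{md5_hex(b'second file, altered')}  b.txt",
    f"{md5_hex(b'fourth file')}  d.txt",
])

report = compare_manifests(parse_manifest(before), parse_manifest(after))
for status, names in report.items():
    print(f"{status:>10}: {names}")
```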
Tools using checksums
– Internet behind-the-scenes, verify data being sent
– Rsync – improve efficiency of backups/data moves
– Digital Asset Management systems – file management – ensure
storage integrity/accurate download and access
– Digital preservation (DP) systems – preserving files (integrity, authenticity)
– Law Enforcement – Software comparison databases – National
Software Reference Library
– Hardware (HW) – storage layers have their own checksum checks/validation
• Other cool uses:
Information management systems – de-duplication tools -
removing duplicate files with good reliability – files with different
names but same content produce the same checksum!
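De-duplication by checksum can be sketched in Python as follows: files are grouped by their SHA256 value, and any group with more than one member is a set of likely duplicates. The function name and the example files are invented for illustration. Reading each file fully into memory is a simplification that suits small files only.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def find_duplicates(folder, algorithm="sha256"):
    """Group files under `folder` by checksum; return groups with more than one file."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            digest = hashlib.new(algorithm, path.read_bytes()).hexdigest()
            groups[digest].append(path.name)
    return [names for names in groups.values() if len(names) > 1]


# Demonstration: two files with different names but the same content.
with tempfile.TemporaryDirectory() as workdir:
    folder = Path(workdir)
    (folder / "report.txt").write_bytes(b"annual report")
    (folder / "copy-of-report.txt").write_bytes(b"annual report")
    (folder / "other.txt").write_bytes(b"something else")
    duplicate_groups = find_duplicates(folder)

print(duplicate_groups)
```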
“I was having nightmares about the integrity of
my data and thought I was losing sleep… I
looked at my checksums and found that I hadn’t
lost any…” - @beet_keeper
498cd895eb5a102c5aeb977e2b928dee
Thank you!


Checksum 101
