A bit of information about Checksums
By Ross Spencer
Extracts from a joint presentation by myself, Jan Hutař, and Andrea K. Byrne for Archives NZ colleagues…
Checksums – why?
• Why do we use checksums? Policy – integrity:
“This policy deals with the integrity of digital content. Digital content is
information encapsulated in one or more digital objects. Within this
context, integrity of a digital object is the quality of its content
remaining ‘uncorrupted and free of unauthorized and undocumented
changes’” (UNESCO 2003).
• Moving files – validation after the move
• Working with files – uniquely identifying what we’re working with
• Security… a by-product of integrity
What do checksums look like?
• Hexadecimal notation, making a bigger number look smaller!
• Numbers 0-9
• And Letters A-F
---
281,949,770,000,000,000,000,000,000,000,000,000,000 (approximately; 39 decimal digits)
becomes:
d41d8cd98f00b204e9800998ecf8427e
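To make the notation point concrete, here is a minimal Python sketch that turns the example checksum back into a decimal number and compares the lengths. The last line rests on an assumption that is not stated on the slide: that the example value is the MD5 of empty input (zero bytes).

```python
import hashlib

checksum = "d41d8cd98f00b204e9800998ecf8427e"

# Interpret the 32 hexadecimal characters as one (very large) integer.
as_number = int(checksum, 16)
print(f"{as_number:,}")      # the full decimal value, with thousands separators
print(len(str(as_number)))   # 39 decimal digits
print(len(checksum))         # 32 hexadecimal characters

# Going the other way: format the integer as zero-padded hexadecimal.
assert format(as_number, "032x") == checksum

# Assumption (not from the slide): this is the MD5 of zero bytes of input.
print(hashlib.md5(b"").hexdigest() == checksum)
```

Each hexadecimal character carries 4 bits, which is why 32 characters are enough for a 128-bit number.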
What do checksums look like…
• John Doe (MD5):
4c2a904bafba06591225113ad17b5cec
• Jane Doe (SHA1):
cac7bbb6b67b44ea0ab997d34a88e4ea9b4d3d62
• Axl Roe (SHA256):
21bd701e54de1d61bba99623509cdd794042dc3f2141eed2e853482cfbcccbf0
• MD5, SHA1, and SHA256 are different algorithms, and each produces a checksum of a different length
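A short Python sketch of the same idea, using the standard `hashlib` module. The input bytes are an assumption: the slide does not say exactly which bytes were hashed (encoding, trailing newline, and so on), so the digests printed here are not guaranteed to match the ones above. The lengths, however, are fixed by each algorithm.

```python
import hashlib

# Hypothetical input: the exact bytes hashed on the slide are unknown.
name = "John Doe".encode("utf-8")

lengths = {}
for algorithm in ("md5", "sha1", "sha256"):
    digest = hashlib.new(algorithm, name).hexdigest()
    lengths[algorithm] = len(digest)
    # Each hex character encodes 4 bits, so bits = 4 * characters.
    print(f"{algorithm:7} {len(digest):3} chars {len(digest) * 4:4} bits  {digest}")
```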
What do checksums look like…
USA: f75d91cdd36b85cc4a8dfeca4f24fa14
USB: 7aca5ec618f7317328dcd7014cf9bdcf
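The point of this pair is that a one-letter change in the input produces a completely different checksum. A minimal sketch, assuming the inputs are the plain UTF-8 strings "USA" and "USB" (the slide does not confirm the exact bytes, so the printed values may not match it):

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 checksum of a string as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

a = md5_hex("USA")
b = md5_hex("USB")

# Count the positions at which the two checksums disagree.
differing = sum(1 for x, y in zip(a, b) if x != y)

print(a)
print(b)
print(f"{differing} of {len(a)} hex characters differ")
```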
What are checksums doing?
- Deterministic – The same input gives the same output
- Uniform/even distribution – inputs are spread evenly across the possible outputs
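Both properties can be checked with a small experiment. This is only a sketch: the input names (`file-0`, `file-1`, …) are made up, and bucketing by the first hex character is just one convenient way to look at the spread.

```python
import hashlib
from collections import Counter

def md5_hex(data: bytes) -> str:
    """Return the MD5 checksum of some bytes as 32 hex characters."""
    return hashlib.md5(data).hexdigest()

# Deterministic: hashing the same bytes twice gives the same checksum.
assert md5_hex(b"Joan") == md5_hex(b"Joan")

# Even distribution: sort 16,000 made-up inputs into 16 buckets
# according to the first hex character of their checksum.
buckets = Counter(md5_hex(f"file-{i}".encode("utf-8"))[0] for i in range(16000))

for char in sorted(buckets):
    print(char, buckets[char])  # each count should be somewhere near 1000
```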
An algorithm does the computing bit…
MD5 or…
- A checksum algorithm is a one-way function…
- “a7fc44290f691cd888b68b59eb4989a1” cannot be turned back into “Joan”!
- The algorithms that compute checksums vary in complexity and go by different names… e.g. MD5
It’s irreversible:
Think: Susan Storm, She Hulk, and The Thing
Rather than: The Hulk
Why do we always talk about the same ones in our workflows?
• Namely: CRC32, MD5, SHA1, SHA256…
• These are all different algorithms
• DROID can handle MD5, SHA1, and SHA256
• MD5 and SHA1 are the only overlaps with Rosetta (Oct 2016)
• Rosetta handles (creates and validates):
• CRC32
• MD5
• SHA1
Why multiple checksums?
• A checksum algorithm can only output a limited number of unique values, so sometimes two different inputs produce the same checksum (a collision)
• With 4 possible outputs and 5 inputs, at least two inputs must share an output
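The 4-outputs, 5-inputs situation can be reproduced with a deliberately tiny checksum. This is a toy sketch, not a real algorithm: `tiny_checksum` is a hypothetical helper that squeezes an MD5 value down to the range 0-3, and the five input strings are made up.

```python
import hashlib

def tiny_checksum(text: str) -> int:
    """Toy checksum with only 4 possible outputs (0-3); illustration only."""
    first_byte = hashlib.md5(text.encode("utf-8")).digest()[0]
    return first_byte % 4

inputs = ["one", "two", "three", "four", "five"]  # made-up inputs

seen = {}        # checksum value -> first input that produced it
collisions = []  # (earlier input, later input, shared value)
for item in inputs:
    value = tiny_checksum(item)
    if value in seen:
        collisions.append((seen[value], item, value))
    else:
        seen[value] = item

for first, second, value in collisions:
    print(f"{first!r} and {second!r} both give {value}")
```

With five inputs and only four possible values, at least one collision is unavoidable, whichever inputs are chosen.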
Collisions, really?
• But also keep in mind the probability of that happening for more complex algorithms:
The probabilities are low (files needed for a 50% chance of 1 collision)
• CRC32 – 32-bit output – 8 characters long
about 77 thousand (77,165)
• MD5 – 128-bit output – 32 characters long
about 21.7 quintillion (21,719,643,148,400,763,000)
• SHA1 – 160-bit output – 40 characters long
about 1.4 septillion (1,423,418,533,373,592,400,000,000)
• SHA256 – 256-bit output – 64 characters long
about 400 undecillion (400,656,698,530,848,040,000,000,000,000,000,000,000)
For comparison: about 4.4 million (4,443,745) files in Rosetta (as of 13/01/2016)
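Figures of this kind come from the "birthday problem". The sketch below uses the standard approximation n ≈ √(2 · ln 2 · 2^bits); the slides do not say which formula was used, so treat this as an assumption. The results should land close to the numbers above, with small differences due to rounding and the choice of approximation. Function names are illustrative.

```python
import math

def files_for_half_chance(bits: int) -> float:
    """Approximate number of random inputs needed for a 50% chance of one collision."""
    return math.sqrt(2 * math.log(2) * 2 ** bits)

def collision_probability(files: int, bits: int) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / (2 * 2^bits))."""
    return -math.expm1(-files * (files - 1) / (2 * 2 ** bits))

rosetta_files = 4_443_745  # file count quoted on the slide

for name, bits in [("CRC32", 32), ("MD5", 128), ("SHA1", 160), ("SHA256", 256)]:
    needed = files_for_half_chance(bits)
    chance = collision_probability(rosetta_files, bits)
    print(f"{name:7} files for 50% chance: {needed:,.0f}   chance at Rosetta's size: {chance:.3g}")
```

Under this model a collection of that size has long passed the CRC32 threshold, while the chance of an accidental MD5 collision remains vanishingly small.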
What if we got one?
• Archivists have the concept of fixity – indicators that a file has not changed – but we can also understand what the file is…
• If two files are the same according to their checksum, we can ask:
– What was the last accessed date?
– What is the file name?
– What is the file size?
– What is the file type?
– What does it look like?
– We can figure it out!
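Some of these questions can be answered automatically. The following is a minimal sketch using only the Python standard library; the helper names and the two throwaway files are hypothetical. It gathers name, size, and last-accessed time, and then settles the matter with a byte-for-byte comparison.

```python
import filecmp
import os
import tempfile

def describe(path: str) -> dict:
    """Collect the basic facts we would look at by hand."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size": info.st_size,
        "last_accessed": info.st_atime,
    }

def truly_identical(path_a: str, path_b: str) -> bool:
    """Byte-for-byte comparison; shallow=False forces the contents to be read."""
    return filecmp.cmp(path_a, path_b, shallow=False)

# Demo with two throwaway files (made-up names and content).
with tempfile.TemporaryDirectory() as folder:
    path_a = os.path.join(folder, "report.txt")
    path_b = os.path.join(folder, "copy-of-report.txt")
    for path in (path_a, path_b):
        with open(path, "wb") as handle:
            handle.write(b"same bytes")
    details = [describe(path_a), describe(path_b)]
    identical = truly_identical(path_a, path_b)

print(details)
print(identical)
```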
So why?
• We will ensure uniqueness
• We can automate processes with the files better with checksums (they’re just numbers!)
• Some may have a preference – it is convenient for us that Rosetta handles MD5 as well!
• Future proof – one day we will have a lot more files!
• Security – for most altruistic purposes, our checksums are okay… but older checksum algorithms can be hacked (collisions can be engineered) – we keep this in mind 10% of the time we talk about them in an archive…
Checksums – where do they come from?
• We generate them with a tool:
– FreeCommander (Windows)
– An online tool (http://www.md5.cz/)
– sha1sum, md5sum (Linux)
– DROID!!
• We create a list, then compare and validate it against another:
– Spreadsheet
– sha1sum, md5sum (Linux)
– AVPreserve Fixity: https://vimeo.com/100311241
– My comparator: https://github.com/exponential-decay/checksum-comparator
• Other tools out there, many internet links!
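The generate-a-list-then-compare workflow can also be sketched in a few lines of Python. This is not how any of the tools above are implemented; the function names, the choice of MD5, and the throwaway file are assumptions made for illustration.

```python
import hashlib
import os
import tempfile

def file_checksum(path: str, algorithm: str = "md5", chunk_size: int = 1024 * 1024) -> str:
    """Checksum a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(folder: str, algorithm: str = "md5") -> dict:
    """Map each file's path (relative to the folder) to its checksum."""
    manifest = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            manifest[os.path.relpath(path, folder)] = file_checksum(path, algorithm)
    return manifest

def compare_manifests(before: dict, after: dict) -> dict:
    """Report files that went missing, appeared, or changed between two lists."""
    return {
        "missing": sorted(set(before) - set(after)),
        "added": sorted(set(after) - set(before)),
        "changed": sorted(p for p in before.keys() & after.keys() if before[p] != after[p]),
    }

# Demo: list a folder, alter one file, list it again, and compare.
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "a.txt")
    with open(target, "wb") as handle:
        handle.write(b"original")
    before = build_manifest(folder)
    with open(target, "wb") as handle:
        handle.write(b"altered")
    after = build_manifest(folder)
    report = compare_manifests(before, after)

print(report)
```

In a real move, the first list would be made at the source and the second at the destination; an empty report means the move validated.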
Tools using checksums
– The Internet, behind the scenes – verifying data being sent
– Rsync – improving the efficiency of backups/data moves
– Digital Asset Management systems – file management – ensuring storage integrity and accurate download and access
– Digital preservation (DP) systems – preserving files (integrity, authenticity)
– Law enforcement – software comparison databases – National Software Reference Library
– Hardware (HW) – storage layers have their own checksum checks/validation
• Other cool uses:
Information management systems – de-duplication tools – removing duplicate files with good reliability – files with different names but the same content produce the same checksum!
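A de-duplication pass of this kind can be sketched by grouping files by checksum. This is only an illustration, not the method of any particular tool: the function name, the choice of SHA256, and the made-up file names are assumptions, and each file is read whole, which suits small files only.

```python
import hashlib
import os
import tempfile
from collections import defaultdict

def find_duplicates(folder: str) -> list:
    """Group files by checksum and return the groups with more than one member."""
    by_checksum = defaultdict(list)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as handle:
                # Reads the whole file at once; fine for a small demo.
                checksum = hashlib.sha256(handle.read()).hexdigest()
            by_checksum[checksum].append(os.path.relpath(path, folder))
    return [sorted(paths) for paths in by_checksum.values() if len(paths) > 1]

# Demo: two files with different names but the same content, plus one other.
with tempfile.TemporaryDirectory() as folder:
    samples = [
        ("minutes.doc", b"agenda"),
        ("minutes (copy).doc", b"agenda"),
        ("other.doc", b"notes"),
    ]
    for name, content in samples:
        with open(os.path.join(folder, name), "wb") as handle:
            handle.write(content)
    duplicates = find_duplicates(folder)

print(duplicates)
```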
“I was having nightmares about the integrity of
my data and thought I was losing sleep… I
looked at my checksums and found that I hadn’t
lost any…” - @beet_keeper
498cd895eb5a102c5aeb977e2b928dee
Thank you!
