The Archive, Big Data, and Security
Or
Why do dusty documents really matter?
Tim Gollins
Head of Digital Preservation
(The National Archives)
&
Honorary Research Fellow
(Glasgow University School of Computing Science)
Outline
● The Archive is Big Data
● The Archive is about our Security
● Sensitivity Review of Digital Records
– Both a Threat and an Opportunity to
The Archive and thus to our Security
The Archive is Big Data
● Big data is not new
– Data of a volume that is transformative
● Medieval Times
– The Master of The Rolls
● Over 1 Billion sheets of paper (records)
● Already over 1.5 Billion web pages
● Capacity for over 13 Petabytes of Digital
Records
The Archive is Security
● Security rests on the Citizen's Trust in the State
● The Archive underpins the fabric of our society
● Enables Trust
– Is the impartial witness
– Holds the executive to account - the court of history
● Fundamental to The Rule of Law
– Underpins many of Lord Bingham’s principles
– E.g. “Ministers and public officers at all levels must exercise
the powers conferred on them in good faith, fairly, for the
purpose for which the powers were conferred, without
exceeding the limits of such powers and not unreasonably”
Transfer To The Archive
● Complex and Opaque process
– Decisions can appear perverse
– Checks and Balances
– Involving an “Advisory Council”
– The “Lord Chancellor's Blanket” (Blue?)
● Journalists and Eminent Historians are
questioning the process
● Conspiracy Theorists Ply Their Trade
Digital Sensitivity Review
● Threat
– Volume & Resources
– Complexity - Content and Containers
– Risk – specifics are now easy to find
– Decisions – The Rule of Law
– Timing – transition from 30 to 20 years
● Opportunity
– Some things are easier – but search can overload
– Constancy
– Efficiencies possible
– Technically Assisted Digital Sensitivity Review
Conclusion - The Right Balance
● Freedom of Information not just openness
● Openness & Transparency of process
● Calls for Privacy - keep “my” data private
● Calls for Openness - what is done in “my” name
● Need for limits
– National Security
– Protection of individuals from harm
● Digital Records Make it Much Harder
● Clear need for Research
– and the means to conduct it

Editor's Notes

  1. The cuneiform tablets in Babylon (including instructions to build an Ark) and the Library of Alexandria: large collections of records have always been transformative, and thus I would regard them as the “big data” of their time. From the 12th century the duties of a clerk responsible for the “rolls” are worthy of mention; the post was explicitly known as the “Master of the Rolls” by the 15th century. The holder of that post now chairs the Lord Chancellor’s Advisory Council, which assures the transfer of records to the archives. The paper holdings at Kew are over 1 billion pages (1000 years of documents). The UK government web archive (less than 20 years of material, most of it from the last 5) held over 1.5 billion pages as of 18 months ago.
  2. Security relies on the trust of the citizen in the state. It is about the Rule of Law and the fact that the executive cannot be above that rule. For the UK it is about the very fabric of our society. The British state is different from many others in that the citizen expects the state to be subservient to her, rather than the reverse, which is the more common case. The Rule of Law supports and empowers the citizen. National archives are fundamental to all of this. They provide the impartial witness that enables the holding to account under the rule of law and in the court of history. Bingham’s 4th principle is accountability for the executive: how can we know what they have done if the records are not kept? It follows that the citizen must therefore trust the process by which the archive receives its material to sustain her rights.
  3. The UK system requires that selection, appraisal and sensitivity review are carried out by the department. This is counter-intuitive, as it appears to allow the department to hide material that it does not wish to see the light of day. However, in creating this system the great archivist Jenkinson, who articulated many of the fundamentals of the UK system, was trying to ensure that, unlike the Nazi archive that was complicit in the Holocaust, the UK archive was able to guard its independence under the rule of law. There are checks and balances. The first is the right of access under FOI. The second is the public visibility of the selection criteria that the departments must apply. The third is the Archive's oversight of the application of those criteria. The fourth is the Lord Chancellor’s Advisory Council's oversight of the application of FOI exemptions during sensitivity review, and its role in ensuring timely transmission. But what colour is the “Lord Chancellor’s Blanket”? Professor Margaret MacMillan, warden of St Antony's College, Oxford, quoted in the Guardian: "I am one of many historians who has benefited from using the British archives and who had confidence that the documents had not been weeded to suit particular interests. Now I am wondering whether I will have to go back and rethink my work on such matters as the outbreak of the first world war or the peace conference at the end. But when are we going to get the complete records? So far the pace of transferring them is stately, to put it politely." It looks like we have something to hide, and such appearances are important.
  4. Volume and Resources: Following the advance of office technology during the twentieth century and the broadening of the interests of the scholarly community, a much greater volume of material is being deemed worthy of preservation in the digital age. Against a background of budgetary constraint, manual review of digitally born records is not practical. Complex Context: Across government and elsewhere, the impact of technology has eroded earlier clear and unambiguous rules for the creation and management of information. This was very obvious in the evidence presented to the Hutton Inquiry [4], where the paper trail for a decision was no longer in a single manila file; instead, the record was found in a blizzard of emails sent from person to person and stored on multiple computing systems [5]. This situation will significantly complicate digital sensitivity review, as understanding a record’s context (including its distribution) is crucial in assessing its sensitivity. Risk: These challenges for review also occur in a context of significantly increased risk. Although the consequences of mistaken disclosure have not changed with the advent of digital records, the probability of discovering a mistake has. It is hard to discover particular information in the paper world, in marked contrast to the digital environment, where ubiquitous search engines index content rapidly. A risk-averse depositor may feel obliged to close large swathes of records if they cannot efficiently and effectively determine the sensitivity of each individual record with some clear degree of certainty. Defensible Decisions: The risk environment is further complicated by the fact that all closures of public records are open to challenge through FOIA, by appeal to the Information Commissioner, and ultimately in the courts.
This means that the Digital Sensitivity Review process must produce decisions which stand up to external scrutiny, and with which the Lord Chancellor’s Advisory Council and audit and risk management committees, both inside and outside the public sector, are comfortable.
  5. There is a fundamental difference between openness (driven by what the state wants you to see) and Freedom of Information, which prescribes your right to see while creating a balance between the public interest, the state interest and the personal interest, based on Human Rights and the Rule of Law. The concept of the FOI framework is fundamentally sound (in my view), although of course the details must always be open to debate; they are open to public scrutiny and the Rule of Law. Openness of process: drawing on Bingham's second principle, “Questions of legal right and liability should ordinarily be resolved by application of the law and not the exercise of discretion”, and the earlier principle, “All persons and authorities within the state, whether public or private, should be bound by and entitled to the benefit of laws publicly made, taking effect (generally) in the future and publicly administered in the courts”. I am of course extrapolating and generalising, but I think reasonably. It is about trust engendered by right being seen to be done!