Libraries, archives, and museums have traditionally preserved and provided access to many different kinds of physical materials, including books, papers, theses, faculty research notes, correspondence, and more. These items have been critical for researchers to have a full understanding of their fields of study as well as the history and context that surround the work.
However, in recent years many equivalent materials exist only electronically, on websites, laptops, private servers, and social media. These digital materials are currently very difficult to track, preserve, and make accessible. Future researchers may well find a black hole of content: discovering early physical materials and late electronic records, but little information for the late 20th through early 21st centuries. In other words, a portion of history, including the history of mathematics, may be lost unless this electronic content, perhaps some content you have right now, is cared for properly.
The presenters will cover the issues surrounding digital preservation, including the steps needed to make sure data is reasonably safe. Additionally, they will pose a small number of discrete challenges and unsolved problems in the field of digital preservation where mathematicians may be able to help with analysis and new algorithms.
Secure Deduplication with Efficient and Reliable Dekey Management with the Proof of Ownership (paperpublications3)
Abstract: Deduplication improves storage and bandwidth efficiency but is incompatible with traditional encryption. In the traditional model, encryption requires different users to encrypt their own data with their own master keys, so identical data copies belonging to different users produce different ciphertexts, making deduplication impossible. Each copy can be defined at different granularities: it may refer to a whole file (file-level deduplication) or to a data block (block-level deduplication). Applying deduplication to user data saves maintenance cost in the cloud. Beyond the normal encryption and decryption process, the authors propose a master-key concept together with a DeKey concept. For encryption and decryption they use the Triple Data Encryption Standard (3DES) algorithm, in which the plaintext is encrypted three times with the key so that the data is secure and reliable against attackers. The approach reduces the cost, time, and storage space of uploading and downloading.
Keywords: deduplication; storage and bandwidth efficiency; traditional encryption.
Title: Secure Deduplication with Efficient and Reliable Dekey Management with the Proof of Ownership
Author: M. Shankari, V. Sheela, S. Rajesh
International Journal of Recent Research in Mathematics, Computer Science and Information Technology
ISSN 2350-1022
Paper Publications
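The file-level vs. block-level granularity described in the abstract can be sketched with a toy content-addressed store; this is illustrative only, not the paper's implementation, and all names are made up for the example:

```python
import hashlib

def dedup_blocks(files, block_size):
    """Block-level deduplication: store each unique block once, keyed by digest."""
    store = {}      # SHA-256 digest -> block bytes (each stored once)
    manifests = {}  # file name -> ordered list of block digests
    for name, data in files.items():
        digests = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # skip blocks already stored
            digests.append(digest)
        manifests[name] = digests
    return store, manifests

def restore(store, manifests, name):
    """Rebuild a file's bytes from its manifest."""
    return b"".join(store[d] for d in manifests[name])

files = {"a.txt": b"hellohello", "b.txt": b"helloworld"}
store, manifests = dedup_blocks(files, block_size=5)
# the shared "hello" block is stored once across both files
```

Setting `block_size` to the whole file length degenerates to file-level deduplication; smaller blocks find more sharing at the cost of a larger manifest.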
This document summarizes a webinar about managing and preserving scientific data sets. It discusses the definition of science data according to the federal government, why science data is different from other data, and current trends and challenges in digital preservation for science. It outlines several levels of digital preservation and provides examples of data being preserved. The webinar discusses the benefits of data management, such as supporting open access and future funding. It also describes existing problems around data management, including a lack of standards, resources, and staffing. Potential solutions discussed include implementing research data management plans and using existing and upcoming tools to help with various stages of the research lifecycle, from data creation to long-term preservation and access.
Cloud storage (CS) is gaining popularity because it offers low-cost and convenient network storage services. In this big-data era, the explosive growth in digital data moves users towards CS to store their massive data. This growth puts great storage pressure on CS systems because a large volume of the data is redundant. Data deduplication is one of the most effective data reduction techniques, identifying and eliminating redundant data. The dynamic nature of data makes security and ownership very important issues. Proof-of-ownership schemes are a robust way to check the ownership claimed by any owner. However, to protect the privacy of their data, many users encrypt it before storing it in CS. This affects the deduplication process because encryption methods have varying characteristics. The convergent encryption (CE) scheme is widely used for secure data deduplication, but it destroys message equality. Although DupLESS provides stronger privacy by enhancing CE, it too has been found insufficient. The problem with CE-based schemes is that a user can still decrypt the cloud data after losing ownership of it. This paper addresses the problem of ownership revocation by proposing a secure deduplication scheme for encrypted data. The proposed scheme strengthens security against unauthorized encryption and poison attacks on the predicted set of data.
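The convergent-encryption idea this abstract builds on can be sketched minimally: derive the key from the plaintext itself, so equal plaintexts encrypt to equal ciphertexts and can be deduplicated. The SHA-256 counter-mode keystream below is a toy stand-in for a real cipher, not the scheme from the paper:

```python
import hashlib

def keystream(key, n):
    """Toy keystream: SHA-256 in counter mode (illustration only, not a real cipher)."""
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def ce_encrypt(data):
    """Convergent encryption: the key is a hash of the plaintext itself."""
    key = hashlib.sha256(data).digest()
    ct = bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))
    return key, ct

def ce_decrypt(key, ct):
    """XOR stream cipher is its own inverse."""
    return bytes(a ^ b for a, b in zip(ct, keystream(key, len(ct))))

k1, c1 = ce_encrypt(b"same data")
k2, c2 = ce_encrypt(b"same data")
# identical plaintexts produce identical ciphertexts, so the server can deduplicate
```

This determinism is exactly what enables deduplication and is also why CE leaks whether two users hold the same file, the weakness DupLESS and later schemes try to address.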
This document presents a proposal for an individual project on opportunistic persistent data storage. The proposal discusses opportunistic networks and their social network properties. It aims to efficiently implement persistent data storage in opportunistic networks by utilizing social network properties. Key research questions are how to implement efficient storage protocols using social properties and handling replicas with low overhead. The study will select server nodes based on social properties and implement a storage model with write and read quorums. Evaluation will use the ONE simulator and synthetic traces to analyze the approach.
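The write/read-quorum storage model mentioned in the proposal can be illustrated with a small sketch. Replica selection here is fixed for simplicity, whereas the actual study selects server nodes by social properties; the class and its fields are hypothetical:

```python
class QuorumStore:
    """Sketch of a replicated store with write and read quorums (requires W + R > N)."""

    def __init__(self, n=5, w=3, r=3):
        assert w + r > n  # overlap guarantees a read sees the latest write
        self.n, self.w, self.r = n, w, r
        self.replicas = [{} for _ in range(n)]  # each replica: key -> (version, value)
        self.version = 0

    def write(self, key, value):
        self.version += 1
        for rep in self.replicas[:self.w]:  # any W replicas would suffice
            rep[key] = (self.version, value)

    def read(self, key):
        # query R replicas; the highest version among them is the latest write
        hits = [rep[key] for rep in self.replicas[-self.r:] if key in rep]
        return max(hits)[1] if hits else None
```

With N=5, W=3, R=3, every read quorum intersects every write quorum in at least one replica, which is why the newest version is always visible.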
The document presents a classification of the different branches of law, dividing them into public law and private law and specifying some subcategories such as substantive criminal law, procedural law, constitutional law, commercial law, tax law, and others. It concludes with a brief bibliography of links on the classification of law.
The document outlines a draft customer service plan called "I Serve" for the Kansas City, Missouri School District (KCMSD). The plan aims to enhance customer service both internally and externally through a focus on meeting and exceeding constituent expectations. It involves developing positive relationships, establishing service standards and guidelines, providing training to employees, gathering customer feedback, and recognizing excellent service. The plan is presented in phases, beginning with establishing expectations, training administrators, and launching communications, followed by ongoing reinforcement, accountability measures, and use of data to improve the program over time.
1. The document discusses the concept of global governance and defines it as the system of global institutions and actors, both state and non-state, that work to manage global issues and interdependencies through cooperation and consensus building.
2. It analyzes China's concept of a "harmonious global society" based on the Five Principles of Peaceful Coexistence, which envisions a world where states and peoples live in harmony.
3. The document argues that mature global governance is not possible without implementing a model of harmonious global society, and that China's vision proposes a sophisticated approach for the global system that consolidates the role of states while accepting new forms of globalism and an emerging proto-global order.
This document describes the basics of electricity, including the components of an electrical circuit (generator, conductor, receiver), electric charge, voltage, current, resistance, and Ohm's law. It also explains that electrical energy can be transformed into heat, light, mechanical, magnetic, and chemical energy and can have biological effects, and it covers series and parallel circuits.
This document describes the basic components of an electrical circuit: an energy source that supplies voltage, a flow of electric current measured in amperes, and a resistance or load that consumes the energy, measured in ohms. It also explains what a short circuit is, Ohm's law relating voltage, current, and resistance, and that conductors allow the flow of electric current much as a pipe allows the flow of a…
This document presents an introduction to fundamental concepts of electricity. It explains that electricity is produced mainly in power plants, which transform primary energy such as hydraulic, thermal, or nuclear energy into electrical energy by means of generators. It then defines the atom as the basic unit of matter, composed of a central nucleus with protons and neutrons and electrons orbiting around it; the electric charge of an atom depends on the balance between its protons and electrons. Finally,…
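As a quick illustration of the Ohm's-law relationship (V = I · R) that these summaries refer to, a small helper can solve for whichever quantity is missing; the function name and interface are invented for the example:

```python
def ohm(voltage=None, current=None, resistance=None):
    """Solve V = I * R given any two of voltage (V), current (A), resistance (ohms)."""
    if voltage is None:
        return current * resistance   # V = I * R
    if current is None:
        return voltage / resistance   # I = V / R
    return voltage / current          # R = V / I

# a 6-ohm resistor carrying 2 A drops 12 V across it
drop = ohm(current=2.0, resistance=6.0)
```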
A keynote at the Web Science Conference 2018, held at VU Amsterdam [1]. It mainly describes the output of the Semantic Technology Institute International (STI2) Summit (for senior researchers in the Semantic Web field), held in Crete in September 2017 [2].
1. https://websci18.webscience.org/
2. https://www.sti2.org/events/2017-sti2-semantic-summit
eScience: A Transformed Scientific Method (Duncan Hull)
The document discusses the concept of eScience, which involves synthesizing information technology and science. It explains how science is becoming more data-driven and computational, requiring new tools to manage large amounts of data. It recommends that organizations foster the development of tools to help with data capture, analysis, publication, and access across various scientific disciplines.
“Filling the digital preservation gap”: an update from the Jisc Research Data ... (Jenny Mitcham)
This document summarizes the findings of the Jisc Research Data Spring project at the University of York and Hull which investigated how Archivematica could be used to provide digital preservation for research data. The project tested Archivematica, explored how it handles different file formats and research data, and identified ways to improve Archivematica and integrate it into research data management workflows. The next phases will develop Archivematica further and implement proof of concepts at York and Hull to preserve research data using Archivematica.
Research Data Curation _ Grad Humanities Class (Aaron Collie)
This document discusses best practices for research data curation and management. It covers topics such as data storage, file organization, documentation, sharing, and archiving. Effective data management practices include making backups in multiple locations, using logical file naming conventions and organization schemes, documenting projects, processes, and data, publishing and sharing data when appropriate, and archiving data for long-term preservation and access. Proper data management ensures that valuable research data is organized, preserved, and accessible to enable future research and verification of results.
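The backup and preservation practices above rest on fixity checking: record a checksum for each file, then later verify that the stored bytes still match. A minimal sketch, with invented function names, might look like this:

```python
import hashlib
import pathlib

def checksum_manifest(root):
    """Record a SHA-256 fixity value for every file directly under a directory."""
    return {p.name: hashlib.sha256(p.read_bytes()).hexdigest()
            for p in pathlib.Path(root).iterdir() if p.is_file()}

def verify(root, manifest):
    """Return the names of files whose content no longer matches the manifest."""
    current = checksum_manifest(root)
    return [name for name, digest in manifest.items()
            if current.get(name) != digest]
```

Running `verify` periodically against a manifest captured at archive time is how repositories detect silent corruption (bit rot) before all backup copies degrade.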
Data Communities - reusable data in and outside your organization (Paul Groth)
Description
Data is critical both as something that enables an organization and as a product in its own right. How can you make that data more usable for both internal and external stakeholders? There is a myriad of recommendations, advice, and strictures about what data providers should do to facilitate data (re)use; it can be overwhelming. Based on recent empirical work (analyzing data-reuse proxies at scale, understanding data sensemaking, and looking at how researchers search for data), I talk about which practices are a good place to start for helping others reuse your data. I put this in the context of the notion of data communities, which organizations can use to help foster the use of data both internally and externally.
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen... (DuraSpace)
“Hot Topics: The DuraSpace Community Webinar Series,” Series Six: “Research Data in Repositories,” curated by David Minor, Research Data Curation Program, UC San Diego Library. Webinar 2: “Metadata and Repository Services for Research Data Curation.”
Presented by Declan Fleming, Chief Technology Strategist; Arwen Hutt, Metadata Librarian; and Matt Critchlow, Manager of Development and Web Services, UC San Diego Library.
No Free Lunch: Metadata in the life sciences (Chris Dwan)
This presentation covers some challenges and makes suggestions to support the work of creating flexible, interoperable data systems for the life sciences.
Talk at the World Science Festival at Columbia, June 2, 2017: session on Big Data and Physics: http://www.worldsciencefestival.com/programs/big-data-future-physics/
Digital preservation and curation of information (Prince Sterling)
This document summarizes key aspects of digital preservation and curation. It discusses the rapid growth of digital information and the need for new preservation models. Effective preservation practices require consistent maintenance and addressing technological and social challenges. Different organizational models are described, including government libraries, independent preservation libraries like Portico, and networked library efforts like LOCKSS and CLOCKSS. The roles and responsibilities of curators and repositories include ensuring sustainability, access, security and addressing copyright issues.
This document discusses issues, opportunities, and challenges related to big data. It provides an overview of big data characteristics like volume, variety, velocity, and veracity. It also describes Hadoop and HDFS for distributed storage and processing of big data. The document outlines issues in big data like storage, management, and processing challenges due to scale. Opportunities in big data analytics are also presented. Finally, challenges like heterogeneity, scale, timeliness, and ownership are discussed along with approaches like Hadoop, Spark, NoSQL databases, and Presto for tackling big data problems.
This document discusses the challenges and opportunities presented by the increasing volume and complexity of biological data. It outlines four main areas: 1) Developing methods to efficiently store, access, and analyze large datasets; 2) Broadening our understanding of gene function beyond a small number of well-studied genes; 3) Accelerating research through improved sharing of data, results, and methods; and 4) Leveraging exploratory analysis of integrated datasets to generate new insights. The author advocates for lossy data compression, streaming analysis, preprint sharing, improved metadata collection, and incentivizing open data practices.
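The streaming-analysis idea raised above can be illustrated with reservoir sampling, a classic technique for keeping a uniform random sample of a stream too large to store in full. This is a generic sketch, not the author's method:

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # item i survives with probability k/(i+1)
            if j < k:
                sample[j] = item
    return sample
```

Each item is retained with probability k/n regardless of stream length, which makes the technique useful for exploratory analysis of datasets that cannot be held in memory.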
Responsible conduct of research: Data ManagementC. Tobin Magle
A presentation for the Food and Nutrition Science Responsible Conduct of Research class on data management best practices. It covers the material in the context of writing a data management plan.
The document discusses a Faculty Development Program (FDP) on database management systems that was held on December 6, 2018 at the University College of Engineering Tindivanam in Tindivanam, India. The FDP covered recent research perspectives in different database management systems and the importance of database management systems in Digital India. It was conducted by Dr. A. Karthirvel, Professor and Head of the Computer Science and Engineering Department at MNM Jain Engineering College in Chennai.
The computational requirements of next-generation sequencing are placing a huge demand on IT organisations.
Building compute clusters is now a well-understood and relatively straightforward problem. However, NGS sequencing applications require large amounts of storage and high IO rates.
This talk details our approach for providing storage for next-gen sequencing applications.
Talk given at BIO-IT World, Europe, 2009.
Real-World Data Challenges: Moving Towards Richer Data Ecosystems (Anita de Waard)
The document discusses trends in scientific data repositories and ecosystems. It notes that repositories are becoming more like virtual laboratories where scientists can conduct research. It also discusses how artificial intelligence and machine learning are being used to complement human discovery and analysis of large and complex datasets. The document raises several challenges around issues such as data ownership, rewards for data sharing and software development, and the roles of various stakeholders in research data management.
Research Data Management Fundamentals for MSU Engineering Students (Aaron Collie)
This document discusses the importance of research data management and outlines best practices. It notes that data is expensive to produce but is the primary output of research. Funding agencies now require data management plans to facilitate data sharing and reuse. The document recommends storing data on multiple types of storage, avoiding single points of failure, creating backup strategies, documenting projects and data, and selecting open file formats. Overall, it emphasizes that data management is an important skill for researchers.
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl... (DuraSpace)
“Hot Topics: The DuraSpace Community Webinar Series,” Series Six: “Research Data in Repositories,” curated by David Minor, Research Data Curation Program, UC San Diego Library. Webinar 1: “Research Data Curation at UC San Diego: An Overview.”
Presented by David Minor and Declan Fleming, Chief Technology Strategist, UC San Diego Library.
Similar to The Quest for Digital Preservation: Will Part of Math History Be Gone Forever? (20)
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behavior in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux tools: Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security-analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher overall coverage. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns, and DIAR helps you find such seeds.
- These are slides of the talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 2022.
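DIAR's actual byte analysis is not described in this abstract; as a rough illustration of the general idea of shrinking seeds while preserving interesting behavior, a delta-debugging-style trimmer greedily removes chunks of bytes as long as the target still reacts. The `interesting` predicate below is a stand-in for the fuzzer's coverage signal, and the whole function is a hypothetical sketch, not DIAR itself:

```python
def trim_seed(seed, interesting):
    """Greedily drop byte chunks whose removal keeps the seed 'interesting'.

    seed: bytes to shrink.
    interesting: predicate standing in for the target's coverage/behavior signal.
    """
    chunk = max(1, len(seed) // 2)
    while chunk >= 1:
        i = 0
        while i < len(seed):
            candidate = seed[:i] + seed[i + chunk:]   # try removing one chunk
            if interesting(candidate):
                seed = candidate   # chunk carried nothing interesting: discard it
            else:
                i += chunk         # chunk is needed: keep it, move on
        chunk //= 2                # retry with finer granularity
    return seed
```

Starting a campaign from such minimized seeds means every mutation the fuzzer spends lands on bytes that actually influence the target's behavior.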
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... (Neo4j)
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Generative AI Deep Dive: Advancing from Proof of Concept to Production
The Quest for Digital Preservation: Will Part of Math History Be Gone Forever?
1. The Quest for Digital Preservation
Will a portion of Math History be lost forever?
Steve DiDomenico
Head, Enterprise Systems
Northwestern University Library
steve@northwestern.edu
Linda Newman
Head of Digital Collections and
Repositories
University of Cincinnati Libraries
newmanld@ucmail.uc.edu
2. Today we’ll be talking about:
● What is Digital Preservation
● Why is it important
● Libraries, museums, and other institutions’
recent work in this area
● What you (Mathematicians) can do and how
you can help
● Time for questions
4. Tweeted in 2012 by Gail Steinhart, Head of Research Services, Mann Library, Cornell University
https://twitter.com/gailst/status/237525591547580416
5. The Availability of Research Data Declines Rapidly with Article Age
“The major cause of the reduced data availability for older papers was the rapid increase in the proportion of data sets reported as either lost or on inaccessible storage media.”
Predicted probability that the research data were extant, given a useful response was received
1. Vines, T. H., Albert, A. Y. K., Andrew, R. L., Débarre, F., Bock, D. G., Franklin, M. T., … Rennison, D. J. (2013). The Availability of Research Data Declines Rapidly with Article Age. Current Biology, 24(1), 94–97. doi:10.1016/j.cub.2013.11.014
7. A Library’s mission is to:
● Collect and preserve the scholarly record
● Provide access to this content
● Support the creation of new knowledge
Applies whether materials are in print or digital
8. Digital Preservation
Some issues with storing digital information include:
● Backups — Keeping separate geographic locations for disaster situations
● Maintenance (performing fixity checks, test restores)
● Versioning
● Costs of high-quality storage
● Curation: What can we not afford to lose?
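The fixity checks mentioned above reduce, in practice, to recomputing a file's checksum and comparing it against the value recorded at ingest. A minimal sketch in Python (the function names and chunk size are our own illustration, not taken from any particular repository system):

```python
import hashlib

def compute_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Stream the file through the hash so large files never load fully into memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_fixity(path, stored_digest, algorithm="sha256"):
    """Return True if the file still matches the digest recorded at ingest."""
    return compute_checksum(path, algorithm) == stored_digest
```

A repository would run something like `verify_fixity` on a schedule and flag any file whose digest has drifted from the stored value.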
9. Digital Preservation
Digitized physical content: Data where the content originated from a physical
object
Examples: scanned books, digitally captured photo of a painting
Issues include:
At what quality do we capture in order to make sure the item is preserved?
What equipment do we use to capture?
How do we document our digitization processes for the future (particularly if there is a problem)?
10. Digital Preservation
Born-digital content: Data that doesn’t have a physical equivalent
Examples: photos from a digital camera, email, web pages
Issues include:
● Storage medium (floppy disks, burned data discs that only last a few years,
etc.)
● Format — Obsolete formats difficult or impossible to open, may have
missing data
● Difficult to preserve content on websites, private servers, social media.
● There can be a lot of content, hard to determine what to keep
11. Digital Preservation
For both born-digital and digitized materials:
Can we preserve what we need before it’s too late?
Does metadata exist (or can it be created) so we know the
content that we have?
12. Digital Repository: A preservation system designed to help librarians,
curators, and other specialists keep track of and maintain digital information
over time. It can also provide users and patrons access to the content.
(In addition to some other things, such as access controls, APIs for content ingestion, copyright support, and more)
Libraries, museums, and other institutions have created community-developed,
open source software such as:
15. What can you do
● Use Library Services!
o Faculty members: contact your Digital Librarians/Preservation
specialists to discuss how to preserve your content
o Publish to Open Access journals
o Researchers: Discover digital resources that have become available
● For personal items, make sure you have good backups
and metadata (descriptions) of your data
16. Puzzle #1 - Checksums
Challenge/puzzle #1:
❖ Is there a better hash function – a better checksum
algorithm for the digital preservation use case?
● Checksums are hashes used to detect errors in data transfer and for
authenticity validation.
● ‘Collisions’ occur when the same checksum is generated for two files that
should not match.
● Low likelihood of a collision caused by accidental degradation (bit rot)?
● High likelihood of a collision caused by malicious alteration? (justifies a
cryptographic hash)
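The "low likelihood" of an accidental collision can be made concrete with the standard birthday-bound approximation (a back-of-the-envelope calculation of our own, not from the slides): for n random digests of b bits, the collision probability is roughly n(n-1)/2^(b+1).

```python
def collision_probability(n_files: int, hash_bits: int) -> float:
    """Birthday-bound approximation: P(any two of n random b-bit digests
    collide) is roughly n*(n-1) / 2**(b+1).  Valid while the result << 1."""
    return n_files * (n_files - 1) / 2 ** (hash_bits + 1)

# Even a billion-file repository is effectively collision-free by accident:
p_md5 = collision_probability(10**9, 128)     # ~1.5e-21
p_sha256 = collision_probability(10**9, 256)  # ~4.3e-60
```

Note that this bound only covers random degradation: MD5 is considered broken against deliberate collision attacks despite its tiny accidental-collision probability, which is exactly the distinction the bullets above draw.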
17. Checksums & Cryptographic Hashes – an oversimplified history:
➢ 1961: CRC (Cyclic Redundancy Check) – not cryptographic
The following checksums are created with cryptographic hash functions (here cryptographic means that it is
practically impossible to invert and recreate input data from the hash value). These checksums are less
vulnerable to collisions:
➢ 1991: MD5 (Message Digest algorithm) – cryptographic but found to be highly
vulnerable
➢ 1995: SHA-1 – (Secure Hash Algorithm)
➢ SHA-1 is still in wide use, but a NIST directive required U.S. agencies to stop using SHA-1 by 2010. Mozilla plans to stop accepting SHA-1 SSL certificates by 2017.
➢ 2002: SHA-2 – Less vulnerable than SHA-1.
➢ 2012: SHA-3 – also known as ‘Keccak’, developed after a five-year NIST-sponsored contest – not yet in widespread use.
Puzzle #1 - Checksums
18. Checksums - performance:
● CRCs are less complex than the SHA family of cryptographic hash functions.
● SHA-2 and SHA-1 are commonly assumed to be slower than MD5, but empirical data is unpublished and inconsistent. A 2014 study tried to address this evidence gap and found SHA-256 about 30% slower per gigabyte than MD5.2
● To be secure, hash functions may need to be slow – the faster the hash the more
vulnerable it may be to brute force attacks.
● For preservation at mass scale, has CPU power kept up with the amount of
constant file validation we need, allowing re-calculation, not just comparison of
stored values?
2. Duryee, Alex. “What Is the Real Impact of SHA-256? A Comparison of Checksum Algorithms.” AVPreserve.com, October 2014. http://www.avpreserve.com/wp-content/uploads/2014/10/ChecksumComparisons_102014.pdf.
Puzzle #1 - Checksums
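The relative-speed question above is easy to test empirically on one's own hardware. A rough benchmark sketch (buffer size and data volume are arbitrary choices of ours; absolute numbers are machine-specific, and modern CPUs with SHA extensions can even make SHA-256 faster than MD5):

```python
import hashlib
import time

def throughput_mb_s(algorithm: str, total_mb: int = 256) -> float:
    """Hash `total_mb` MiB of in-memory data and return MiB/s for a rough comparison."""
    block = b"\x00" * (1 << 20)  # 1 MiB buffer, hashed repeatedly
    h = hashlib.new(algorithm)
    start = time.perf_counter()
    for _ in range(total_mb):
        h.update(block)
    h.hexdigest()
    elapsed = time.perf_counter() - start
    return total_mb / elapsed

for algo in ("md5", "sha1", "sha256"):
    print(f"{algo}: {throughput_mb_s(algo):.0f} MiB/s")
```

Hashing in-memory data isolates CPU cost; a production fixity audit would also pay disk I/O costs on top of this.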
19. ● With digital preservation at mass scale, are cryptographic checksums, in
addition to robust system security, necessary? (The answer has been
considered to be yes, to prevent malicious alteration by bad actors.)
Challenge/puzzle #1:
❖Is there a better hash function – a better checksum
algorithm for the digital preservation use case?
Puzzle #1 - Checksums
20. Puzzle #2 - How many copies do we need?
● LOCKSS stores 7 copies across 7 geographically dispersed servers.
● Academic Preservation Trust uses Amazon S3 (3 copies) and Amazon Glacier (3 copies),
with geographic distance between S3 (Virginia) and Glacier (Oregon). Copies are in
different ‘availability zones’ with separate power and internet. Amazon claims 99.999999999% (11 nines) durability for S3 and Glacier in this configuration.3
● But are these estimates based on projections and models or empirical studies?
● Hardware manufacturers’ estimates of mean time to data loss (MTTDL) may not tell us
much about the extent of data loss over a given period of time.4
3. http://media.amazonwebservices.com/AWS_Storage_Options.pdf
4. Rosenthal, David S. H. “DSHR’s Blog: ‘Petabyte for a Century’ Goes Main-Stream.” DSHR’s Blog, October 6, 2010. http://blog.dshr.org/2010/09/petabyte-for-century-goes-main-stream.html and Greenan, Kevin M., James S. Plank, and Jay J. Wylie. “Mean Time to Meaningless: MTTDL, Markov Models, and Storage System Reliability.” In Proceedings of the 2nd USENIX Conference on Hot Topics in Storage and File Systems. Boston, Mass.: USENIX Association, 2010. https://www.usenix.org/legacy/events/hotstorage10/tech/full_papers/Greenan.pdf.
Challenge/puzzle #2:
❖ How many copies do we need to preserve a given
amount of data over a specified period of time?
21. ● Greenan et al. propose a new measurement – Normalized Magnitude of Data Loss (NOMDL) – and demonstrate that it is possible to compute this using Monte Carlo simulation, based on assigning failure and repair characteristics to hardware devices drawn from real-world data.5
● David Rosenthal has written extensively about the problems with proving that we can keep a petabyte for a century. He has concluded that “the requirements being placed on bit preservation systems are so onerous that the experiments required to prove that a solution exists are not feasible.”6 He proposed a bit half-life measurement (the time after which there is a 50% probability that a bit will have flipped), but argues that it is not possible to construct an experiment that proves that a given number of copies of files stored in a particular configuration will keep a petabyte safe for a century.7
5. Greenan, Kevin M., James S. Plank, and Jay J. Wylie. “Mean Time to Meaningless: MTTDL, Markov Models, and Storage System Reliability.” In Proceedings of the 2nd USENIX
Conference on Hot Topics in Storage and File Systems. Boston, Mass.: USENIX Association, 2010.
https://www.usenix.org/legacy/events/hotstorage10/tech/full_papers/Greenan.pdf.
6. Rosenthal, David S. H. “Bit Preservation: A Solved Problem?” International Journal of Digital Curation 5, no. 1 (June 22, 2010): 134–48. doi:10.2218/ijdc.v5i1.148.
http://www.ijdc.net/index.php/ijdc/article/view/151.
7. Rosenthal, David S. H. “Keeping Bits Safe - ACM Queue.” ACM Queue 8, no. 10 (October 2010) (October 1, 2010). http://queue.acm.org/detail.cfm?id=1866298
Puzzle #2 - How many copies do we need?
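A toy version of the Monte Carlo idea above illustrates why copy count matters so much. This is our own simplified sketch, not the NOMDL methodology: copies fail independently each year with an illustrative probability, and an annual audit restores all copies from any survivor, so data is lost only in a year when every copy fails at once.

```python
import random

def p_loss(n_copies: int, annual_fail_p: float, years: int,
           trials: int = 100_000) -> float:
    """Toy Monte Carlo estimate of the probability of losing the data.
    Assumes independent failures and a yearly audit-and-repair cycle;
    the failure probability is illustrative, not vendor data."""
    losses = 0
    for _ in range(trials):
        for _ in range(years):
            if all(random.random() < annual_fail_p for _ in range(n_copies)):
                losses += 1
                break
    return losses / trials

# Analytically, the per-year loss probability is annual_fail_p ** n_copies,
# so with p = 0.01/year, going from 1 to 3 copies drops the century-scale
# loss probability from ~63% to roughly 1e-4.
```

The simulation also makes the slide's caveat visible: the answer is only as good as the independence and failure-rate assumptions fed into it.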
22. Where does this leave us when designing digital preservation solutions?
We have general comfort with assumptions that
● The more copies of our files the better.
● The more independent the copies are from each other the better. (different
storage technology, network independence, as well as organizational and
geographic independence)
● The more frequently the copies are audited the better.
But as the size of the data increases, the time and cost for audits increases, the
per-copy cost increases, and the number of storage options that are feasible
decrease. If we maximize the number of copies, the frequency of audits and
network independence, high costs will force us to preserve much less.
Puzzle #2 - How many copies do we need?
23. ● Should librarians and archivists accept a higher level of risk in order to
preserve more?
● Is it better to preserve more quantity with some data loss than little quantity
with unproven projections of no data loss?
Challenge/puzzle #2:
❖ How many copies do we need to preserve a given amount
of data over a specified period of time?
A more definitive answer to this question would help us make better informed
choices (and get funding).
Puzzle #2 - How many copies do we need?
24. ● Format obsolescence predictions have not proven to be as dire as previously
thought – the web continues to be able to render much earlier content, even for
formats no longer in active use.
● The importance of open source documentation may greatly increase a format’s
chances of being rendered in the future.
● Popularity of a format may be a simple test, but popularity can wax and wane
quickly and unexpectedly as happened with the Flash video format.
Puzzle #3 - Format Obsolescence
Challenge/puzzle #3:
❖Is there a better algorithm that could be used to initiate
format preservation actions?
25. ● We’ve used tools such as DROID, JHOVE, and Apache Tika to identify formats and create metadata about them – leading some practitioners to emphasize preserving those formats we can identify. But one study suggests that browsers still render much of the content these tools fail to identify, so perhaps our format criteria are inadequate.8
● In some cases, emulation of a software environment could prove to be
equally if not more feasible than format migration.
8. Jackson, Andrew N. “Formats over Time: Exploring UK Web History.” In arXiv:1210.1714 [cs]. Toronto, Canada, 2012.
http://arxiv.org/abs/1210.1714.
Puzzle #3 - Format Obsolescence
26. Puzzle #3
Librarians and archivists would like to know when to take preservation ‘actions’
– to intervene and establish a method of format migration or emulation.
At present we are balancing opinions about format renderability and brittleness without knowing whether factors such as widespread use (popularity), self-documentation (open source), and rendering complexity (layers) might be better and possibly more quantifiable predictors.
Challenge/Puzzle #3:
❖Is there a better algorithm that could be used to initiate
format preservation actions?
27. Conclusion
➢ Data is at risk if it isn’t preserved properly
➢ Talk with libraries at your institution and make sure your
research will be found for later generations
28. Questions?
Steve DiDomenico
Head, Enterprise Systems
Northwestern University Library
steve@northwestern.edu
Linda Newman
Head of Digital Collections and
Repositories
University of Cincinnati Libraries
newmanld@ucmail.uc.edu
These slides are available on Slideshare at: http://www.slideshare.net/newmanld/the-quest-for-digital-preservation-will-part-of-math-history-be-gone-forever
A bibliography is available at: http://www.slideshare.net/newmanld/the-quest-for-digital-preservation-mathfest-2015-bibliography
And at Scholar@UC, the University of Cincinnati’s digital repository (https://scholar.uc.edu/)
http://dx.doi.org/doi:10.7945/C23S3C