This document presents the challenges of preserving electronic records for future use. It outlines stakeholders, such as government agencies and medical organizations, with an interest in electronic record preservation. Key open problems include how to appraise and process growing volumes of heterogeneous electronic records over time, given limited computational resources and budgets. Research examples illustrate these problems, such as developing scalable appraisal methodologies and automating parts of electronic record processing and preservation to enable long-term learning from the records.
To Preserve Or Not To Preserve?
1. To Preserve Or Not To Preserve?
The Challenges in Appraising Electronic Records
Peter Bajcsy, PhD
- Research Scientist, NCSA
- Adjunct Assistant Professor, ECE & CS, UIUC
- Associate Director, Center for Humanities, Social Sciences and Arts (CHASS), Illinois Informatics Institute (I3), UIUC
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
Date: January 21st, 2009
2. Acknowledgement
• This research was partially supported by a National Archives and Records Administration (NARA) supplement to NSF PACI cooperative agreement CA #SCI-9619019 and NCSA Industrial Partners.
• The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the National Archives and Records Administration, or the U.S. government.
• Contributions by: Peter Bajcsy, Kenton McHenry, Rob Kooper, Michal Ondrejcek, William McFadden, Sang-Chul Lee, David Clutter and Alex Yahja
3. Outline
• Introduction
• Stakeholders
• Conceptual Challenges
• Some Open Problems
• Research Examples Illustrating Open Problems
• Summary, Observations and Future Vision
4. Introduction
• Two Trends in the Context of Decision Processes (Government, Medical, Natural Disasters, …)
• Decision processes are moving from paper-based to electronic-record-based (~ computer-assisted decision processes)
• Electronic records depend on rapidly changing information technology
• The optimality of decisions depends on the available knowledge
• Any learning from electronic records depends on preservation and reconstruction of the records, as well as on the quality and granularity of the information
5. Fundamental Problems
• Limited learning from historical records today
• It is often due to missing information and high uncertainty/low quality of historical records.
• Lack of understanding of how to preserve and reconstruct data and decision processes.
• It is due to insufficient forecasting/simulation capabilities.
6. To Be Preserved!
[Diagram: information transfer from AGENCY to ARCHIVES, with the digital representation of information & knowledge as the object of preservation]
7. Motivation
• The problems related to preservation of electronic records are only going to become more serious
• Information becomes more heterogeneous and complex
• More data types
• Higher-dimensional data
• New file formats
• Volumes of electronic records have been increasing and will continue to grow
• The model of a paperless office (4 years of Bush’s email > 8 years of Clinton’s email)
• The paradigm shift to eScience
• Digital information technology has been changing faster than any previous preservation medium
• The time scale of electronic media is ephemeral in comparison with paper or clay tablets
8. Example of Preservation Needs in Medicine
• Short term:
• Medical practice requires comparing patients’ records acquired today with the patients’ records from 5, 10, 50 or 70 years ago in order to assess functional, structural or low-level biological changes due to diseases, treatments and/or aging.
• Long term:
• Genealogy studies compare data sets over several hundreds and thousands of years
9. Who Are the Stakeholders?
• Multiple institutions and organizations are active in the area of medical record preservation
• National Library of Medicine (NLM)
• Research Information Network (RIN)
• Medical Research Council (MRC) in the UK
• National Archives and Records Administration (NARA)
• Identified common goals:
• Seamless, uninterrupted access to expanding collections of biomedical data, medical knowledge, and health information
• Preserve medical record collections in highly usable forms and contribute to comprehensive strategies for preservation of biomedical information in the U.S. and worldwide.
10. Other Stakeholders
• Government agencies
• Prediction of patterns signaling natural disasters based on historical measurements
• Detection of terrorist attacks based on past experience
• Learning about other planets from past space shuttle missions
• Preservation of cultural heritage
• Companies
• Preservation of engineering drawings and architectural designs – Boeing, John Deere, GM
• Preservation of simulation results – Caterpillar, Ford
• Backward compatibility of hardware/software – GE
11. NARA as One of the Key Stakeholders
• According to The Strategic Plan of The National Archives and Records Administration 2006–2016, “Preserving the Past to Protect the Future”:
• “Strategic Goal: We will preserve and process records to ensure access by the public as soon as legally possible”
• “D. We will improve the efficiency with which we manage our holdings from the time they are scheduled through accessioning, processing, storage, preservation, and public use.”
12. Conceptual Challenges
• Learning Requires Reusing Electronic Records
• How to enable and support preservation and reconstruction of electronic records?
• Advancing Sensors and Instruments Leads to New Types of High-Dimensional Data and Large Volumes
• How to design preservation methodologies that scale well?
• Process to Enable Learning over Time from Electronic Records Requires Large Financial Investments
• How to minimize computational hardware, software, and storage cost and maximize the amount of preserved information?
13. What Are The Key Open Problems?
14. Some Open Problems -> Intellectual Merit
• Appraisal Methodology
• Appraisal by Visual Exploration
• Support of Appraisals by Enabling Comparisons
• Scalability of Appraisals with Increasing Heterogeneity of Information, Dimensionality of Data and Volume of Electronic Records
• Support of Archival Decisions
• Simulate Preservation Costs as a Function of Information Granularity and Information Technology
• Optimal Utilization of Computational and Human Resources
• Automation of Processing for Preservation
• Discovery of Relationships Among Electronic Records
• Information-Preserving Conversions of Electronic Records
• Sampling, Authenticity and Integrity Verification of a Collection of Temporally Changing Records
15. Broader Impacts
[Diagram: a process to enable learning over time converts Electronic Records into Knowledge that supports optimal decision making, with costs (-$) and benefits (+$)]
17. Open Problems Related to Appraisal Methodology
1. Appraisal by Visual Exploration
2. Support of Appraisals by Enabling Comparisons
3. Scalability of Appraisals with Increasing Heterogeneity of Information, Dimensionality of Data and Volume of Electronic Records
18. Definition of Appraisal in Archival Context
• Appraisal -- the process of determining the value, and thus the final disposition, of Federal records, making them either temporary or permanent.
• See http://www.archives.gov/records-mgmt/initiatives/appraisal.html
• The basis of appraisal decisions may include
• the records' provenance and content,
• the records' authenticity and reliability,
• the records' order and completeness,
• the records' condition and costs to preserve them, and
• the records' intrinsic value
19. Open Problem 1: Appraisal by Visual Exploration
• How to visualize the transition from raw data to information?
• Raw data (byte stream) -> Information: 0F0 -> (R,G,B) -> GREEN
• How to encode and represent heterogeneous information for visual exploration and for computer-assisted operations?
• Encoding (e.g., a shape consisting of a set of Bezier curves is encoded by a set of straight lines; see the sketch after this slide)
• Representation (e.g., colors are represented by an ordered sequence of intensity values from all bands)
• How to summarize representations for visual exploration?
• Frequency of occurrence of primitives
• Local and global summarizations
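To make the encoding example concrete, here is a minimal Python sketch (not from the presentation; the control points and segment count are arbitrary) of approximating a cubic Bezier curve by a set of straight line segments:

```python
# A minimal sketch of the slide's encoding example: approximate a cubic
# Bezier curve by straight line segments via uniform parameter sampling.
def bezier_point(p0, p1, p2, p3, t):
    # Bernstein form of a cubic Bezier curve at parameter t in [0, 1]
    u = 1 - t
    x = u**3*p0[0] + 3*u**2*t*p1[0] + 3*u*t**2*p2[0] + t**3*p3[0]
    y = u**3*p0[1] + 3*u**2*t*p1[1] + 3*u*t**2*p2[1] + t**3*p3[1]
    return (x, y)

def bezier_to_lines(p0, p1, p2, p3, n=8):
    pts = [bezier_point(p0, p1, p2, p3, i / n) for i in range(n + 1)]
    return list(zip(pts, pts[1:]))  # n straight segments approximating the curve

segments = bezier_to_lines((0, 0), (1, 2), (3, 2), (4, 0))
print(len(segments), "line segments; first:", segments[0])
```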
20. Example: Adobe Portable Document Format (PDF)
• Why PDF? - PDF is just an example of a container
• Office environment (Adobe PDF, PS, MS Word, HTML, …)
• Satellite measurements (HDF, netCDF, …)
[Figure: embedded content types by PDF version: 3D (Adobe Library 6.0), Movie (Adobe Library 7.0)]
21. Exploration of PDF Documents Using PDF Viewer
• PDF Viewer presents information as a set of pages with their layouts
• PDF Viewer renders layers of internal objects (components) and hence only the top layer is visible
22. Needed Exploration of PDF Components
• There is no support for archival appraisals that would include visual exploration of components in a document (a container of components)
• Viewers are needed for appraisal analyses that present information stored in a container (e.g., PDF) as a set of components and their characteristics (a rough sketch follows this slide)
• Text – word frequency
• Images (rasters) – color frequency (histogram)
• Vector graphics – line frequency
• Exploration for appraisal analyses needs to include visible and invisible objects
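As a rough illustration of the component census such viewers would build, the sketch below tallies word frequencies and per-page image counts in a PDF. It assumes the PyMuPDF library; the actual NCSA appraisal tools are not shown in the deck.

```python
# A minimal sketch, assuming PyMuPDF (pip install pymupdf). It illustrates
# summarizing a container as per-component frequency statistics; it is not
# the presentation's own viewer.
from collections import Counter
import fitz  # PyMuPDF

def component_census(path):
    doc = fitz.open(path)
    words, images = Counter(), Counter()
    for page in doc:
        # Text component: word frequency
        for w in page.get_text("text").split():
            words[w.lower().strip(".,;:()")] += 1
        # Image component: number of embedded raster images on each page
        images[page.number] = len(page.get_images(full=True))
    return words, images

if __name__ == "__main__":
    words, images = component_census("report.pdf")  # hypothetical input file
    print("Most frequent words:", words.most_common(10))
    print("Images per page:", dict(images))
```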
23. Exploration of Text Components
[Screenshot: text-component viewer with panels for loaded files, occurrence of words, occurrence of numbers, and “ignore” words]
24. Exploration of Image Components
[Screenshot: image-component viewer with panels for loaded files, list of images, occurrence of colors, “ignore” colors, and a preview]
25. Exploration of Vector Graphics Components
[Screenshot: vector-graphics viewer with panels for loaded files, a preview, and occurrence of vertical/horizontal lines]
26. Exploration of Visible And Invisible Objects
[Screenshot: viewer listing the objects intersected at the mouse-click location]
27. Open Problem 2: Support of Appraisals by Enabling Comparisons
• How to compare containers with heterogeneous information?
• Methodology
• Metrics
• Weighting factors for fusion (see the sketch after this slide)
• How to quantify differences between the same type of information?
• Encodings and Representations
• Metrics
• Local versus global differences
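A minimal sketch of similarity fusion across heterogeneous component types. The cosine metric and the example weights are illustrative assumptions, not the presentation's actual choices; it only assumes each container has already been summarized as per-type frequency histograms.

```python
# Fused container similarity: per-type histogram similarity combined with
# weighting factors. Metric and weights are illustrative assumptions.
import math

def cosine(h1, h2):
    keys = set(h1) | set(h2)
    dot = sum(h1.get(k, 0) * h2.get(k, 0) for k in keys)
    n1 = math.sqrt(sum(v * v for v in h1.values()))
    n2 = math.sqrt(sum(v * v for v in h2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def fused_similarity(doc_a, doc_b, weights):
    # doc_*  : dict mapping component type -> frequency histogram (dict)
    # weights: dict mapping component type -> weighting factor, summing to 1
    return sum(w * cosine(doc_a[t], doc_b[t]) for t, w in weights.items())

a = {"text": {"records": 4, "appraisal": 2}, "image": {"green": 9}}
b = {"text": {"records": 3, "archive": 1},  "image": {"green": 5, "red": 2}}
print(fused_similarity(a, b, {"text": 0.6, "image": 0.4}))
```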
30. Experimental Example
INPUT = 10 PDF docs (4 & 6 Groups)
UNIQUE ID= 1,2,3,4 UNIQUE ID= 5,6,7,8,9,10
Imaginations unbound
31. Comparative Experimental Results
INPUT = 10 PDF docs (4 and 6 members in the two groups)
[Figure: pairwise similarity matrices for the ten documents: text-based, image-based, and vector-based similarity]
32. Comparative Experimental Results
[Figure: (left) vector graphics similarity and word similarity
combined; (right) portion of document surface allotted to each
document feature. The comparison uses a combination of document
features in proportion to coverage.]
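A minimal sketch of the coverage-weighted fusion idea in this slide; the similarity scores and coverage fractions below are hypothetical, not the experiment's values.

    def fused_similarity(similarities, coverage):
        # Weight each per-feature similarity by the surface fraction it covers.
        total = sum(coverage.values())
        return sum(similarities[f] * coverage[f] / total
                   for f in similarities) if total else 0.0

    sims = {"text": 0.90, "image": 0.40, "vector": 0.75}    # hypothetical scores
    cover = {"text": 0.60, "image": 0.25, "vector": 0.15}   # fraction of page area
    print(fused_similarity(sims, cover))                    # ~0.75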
33. Accuracy Comparisons

Method                   Avg. Similarity  Avg. Similarity  Avg. Similarity
                         of Group 1       of Group 2       Across Groups 1 & 2
TEXT ONLY                1                0.489            0
TEXT & IMAGE & GRAPHICS  0.906            0.520            0.075

One refers to high similarity & zero refers to low similarity

Conclusions:
• Differences in similarity are up to 10% of the score
• Documents in Group 2 would likely be misclassified, as 0.5
similarity would be the threshold between similar and
dissimilar documents
34. Open Problem 3: Scalability of
Appraisals
• Scalability of appraisals with increasing
heterogeneity of information,
dimensionality of data and volume of
electronic records
• How should the appraisal process change
as 3D data is added to file containers?
• How should the appraisal process change
as 3D+time, 2D+spectrum,
3D+time+spectrum, nD, … data are added?
• How should appraisal operations be
designed to accommodate growing
volume of electronic records?
35. Approaches to Computational Scalability of
Document Appraisals
• Options for parallel processing
• message-passing interface (MPI)
• MPI is designed for the coordination of a program running as
multiple processes in a distributed memory environment by
passing control messages.
• open multi-processing (OpenMP)
• OpenMP is intended for shared memory machines. It uses a
multithreading approach where the master thread forks any
number of slave threads.
• MapReduce parallel programming paradigm for commodity
clusters
• It lets programmers write a simple Map function and a Reduce
function, which are then automatically parallelized without
requiring the programmers to code the details of parallel
processes and communications (a minimal sketch follows this list)
• Specialized hardware: FPGA, Cell processors, GPU
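A minimal sketch of the Map/Reduce paradigm using only Python's standard library (not Hadoop): the programmer supplies a Map function and a Reduce step, and the process pool handles the parallel Map calls.

    from collections import Counter
    from multiprocessing import Pool

    def map_count(text):
        # Map: one document -> its partial word counts.
        return Counter(text.lower().split())

    def reduce_counts(partials):
        # Reduce: merge the partial counts into a global histogram.
        total = Counter()
        for p in partials:
            total.update(p)
        return total

    if __name__ == "__main__":
        docs = ["records appraisal", "electronic records", "appraisal of records"]
        with Pool() as pool:
            print(reduce_counts(pool.map(map_count, docs)))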
37. Hardware & Software Dependencies with
Hadoop
• Test data: 15 PDF files from the Columbia investigation
web site at http://caib.nasa.gov/.
• Software configuration: Linux OS (Ubuntu flavor) and
the Hadoop implementation of Map and Reduce
functionalities
• Hardware configuration: homogeneous &
heterogeneous machines
[Chart: Hadoop average speed in seconds versus number of
machines (1-5), for homogeneous and heterogeneous hardware]
38. Open Problems Related to Archival
Decisions
• Simulate Preservation Costs as a Function of Information
Granularity and Information Technology
• Optimal Utilization of Computational and Human
Resources
39. Open Problem: Archival Decision Support
• Decision support for forecasting preservation
costs
• How to predict computational and storage
requirements of preservation as a function
of technology variables and information
granularity?
• How to optimize computational hardware,
software, storage, and networking
investments?
40. Basic Questions About Information to be
Preserved
41. Challenges in Forecasting
• Volatility of software/hardware/storage media
• Updates: Windows operating systems since 2000: two major new
releases, two minor service pack updates, around fifty security
patches since SP2
• Upgrades: Microsoft Office Pro for Windows
95/98/ME/2000/XP/2003/2007
• Media life expectancy: optical ~5 years, disk ~15 years,
microfiche ~100, microfilm ~300, newspaper ~50, clay tablet
~10,000 (life expectancy vs. information density – [P. Conway, 1996])
• Cost of software/hardware/storage media
• Operating system: Windows 3.1/95/98/NT/2000/XP/Vista: Windows
95 = $209; Windows NT = $280; Windows XP = $300; Windows Vista =
$399 -> $319 (2008)
• 128 MB of SDRAM: year 1999 ~ $120 -> $40 -> $200-250 due to
earthquake in Taiwan -> March 2000 ~ $55 -> March 2007 ~ $8.96
(flash card) – www.pricewatch.com (1 TB ~ $109.95 as of 01/15/2009)
• High performance computers: 2006: DARPA awards approximately
$500 million to Cray and IBM; 2007: NSF $200 million to NCSA/IBM
42. Archival Decision Support
• Lack of forecasting models to predict preservation costs
• Our work: understand the tradeoffs between information
value and computational/storage costs by providing
simulation frameworks (a toy sketch follows this slide)
• Information granularity, organization, compression, encryption,
document format, ...
• versus
• Cost of CPU for gathering information, for processing and for
input/output operations; cost of storage media, upgrades, storage
room, …
• Prototype simulation framework: Image Provenance To
Learn, available for download from
http://isda.ncsa.uiuc.edu
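A toy sketch, in the spirit of the simulation frameworks mentioned above, of forecasting media cost under periodic migration: every parameter (media lifetime, annual price decline, collection growth) is a hypothetical assumption, not a measured value.

    def media_cost(years=20, tb=10.0, price_per_tb=100.0,
                   price_decline=0.25, media_life=5, growth=0.30):
        # Re-buy media for the whole (growing) collection every media_life years.
        total = 0.0
        for year in range(years):
            if year % media_life == 0:
                total += tb * price_per_tb
            price_per_tb *= 1 - price_decline   # media gets cheaper every year
            tb *= 1 + growth                    # the collection keeps growing
        return total

    print(f"20-year media cost: ${media_cost():,.0f}")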
43. Simulation Framework
[Diagram: simulation framework – a decision maker connects an
information gathering and storage process (image viewer,
information gathering system) with an information retrieval and
learning process (process reconstruction system); provenance
information is exchanged between the two, and a cost /
information-granularity analysis plots preservation value
(linear vs. observed) against cost (memory, CPU)]
46. Storage vs. Information Organization
Tradeoffs: Test Case
• Information granules include interpreted, raw and snapshots
• Files were not compressed
[Chart: saved size in bytes (log scale, 1 to 10,000,000) per
event type – Change Auto Zoom, Change Gray Scale, Change RGB
Band, Add Annotation, Mouse Clicked, Mouse Clicked -
Magnification, Change Selection, Window Hidden, Change Gamma,
Window Shown, New Image, Change Visible Region, Change Zoom
Factor, Window Created – compared for the RDF (Resource
Description Framework) metadata model and the key-pair (XML)
metadata model]
47. Open Problems Related to Automating
Archival Processing for Preservation
1. Discovery of Relationships Among Electronic Records
2. Information Preserving Conversions of Electronic Records
3. Sampling, Authenticity and Integrity Verification of a Collection
of Temporally Changing Records
48. Open Problem 1: Discovering
Relationships Among Files
• How should one establish relationships among electronic
records coming from disparate sources or from the same
source at multiple time instances?
• How to extract metadata?
• What ontology to use to represent the extracted
metadata?
• How to automate metadata extraction from multiple data
types, e.g., 2D drawings and 3D CAD models?
• How to discover relationships between electronic records
corresponding to the same physical objects but different
multidimensional observations?
• Need to Understand the Complexity of the Problem
49. Metadata Extraction: Complexity & Size
The Crandon Mine Reports from 1981 till 2003
http://digicoll.library.wisc.edu/cgi-bin/EcoNatRes/EcoNatRes-idx?type=browse&scope=ECONATRES.CRANDONMINE
[Figure: RDF triples extracted using Aperture and visualized
using RDF-Gravity (red – edges, green – literal values,
violet – properties)]
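A minimal sketch, assuming the rdflib library, of representing extracted metadata as RDF triples as in the Crandon Mine example above; the namespace and property names are illustrative, not the ontology actually used there.

    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.org/records/")   # hypothetical namespace
    g = Graph()

    doc = EX["report-1981-001"]                     # hypothetical record ID
    g.add((doc, EX.fileFormat, Literal("application/pdf")))
    g.add((doc, EX.year, Literal(1981)))
    g.add((doc, EX.partOf, EX["CrandonMineReports"]))

    print(g.serialize(format="turtle"))             # triples ready for visualization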
50. Relationships Among Multiple Data Types
• Example Data: Torpedo Weapon Retriever 841
• 784 existing 2D image drawings and N>22 3D CAD
models
• How to establish relationships among the 3D
CAD models and 2D image drawings during a
product lifecycle?
[Figure: hypothetical distribution of 3D CAD models for TWR 841]
51. Understanding Challenges in Automation
[Diagram: relationship discovery based on OCR, descriptors
(metadata), and representation]
52. Open Problem 2: Conversions of
Electronic Records
• Conversions of electronic records are needed because
• Visual exploration depends on various software
packages
• Many formats are retired (deprecated) over time
• A subset of formats is selected for preservation
purposes
• How to measure the degree of information
preservation when files are converted from format A to
format B? (a round-trip sketch follows this slide)
• During conversions, information could be lost, added
or modified
• What is the importance of each byte, object, etc.?
• How to introduce a framework for measuring the
quality of conversion and visualization software?
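A minimal sketch of one way to quantify the degree of information preservation asked about above: round-trip the file from format A to B and back (as in the X3D -> STEP -> X3D example on the next slide) and count which elements survive. Here "elements" are just named objects and the example values are hypothetical; a real measure would weight each byte or object by importance.

    def preservation_ratio(original_elements, roundtrip_elements):
        # Fraction of the original's named elements that survive A -> B -> A.
        original, survived = set(original_elements), set(roundtrip_elements)
        return len(original & survived) / len(original) if original else 1.0

    before = ["Shape", "BezierCurve", "Material", "Transform"]   # hypothetical
    after = ["Shape", "IndexedLineSet", "Transform"]             # curve re-encoded
    print(f"{preservation_ratio(before, after):.0%} of elements preserved")  # 50%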
53. Example: Conversion of X3D to STEP to X3D
[Diagram: conversion chain between the X3D, WRL and STEP formats,
with the software used at each step (X3dToVrml97, A3D Reviewer,
Vrml97ToX3d) and one step for which no software exists
("Nothing!")]
54. Automation of 3D File Format Mapping &
Conversion
55. Open Problem 3: Sampling,
Integrity and Authenticity
• Given finite resources and increasing amounts of electronic
records, automation of sampling, integrity and authenticity
verification is very much needed
• What are the criteria for sampling a collection of temporally
changing versions of 'the same' document?
• Authenticity
• Integrity
• Information content
• How to measure a degree of authenticity?
• Computers might assign inaccurate time stamps to records
• How to detect integrity failures?
• e.g., a record describing a female patient with prostate cancer
• How to incorporate constraints into sampling?
• Storage space, compression, computational cost, etc.
56. Example: Temporal Ranking and Integrity
Verification
• Chronological ranking based on time stamps of files
• Last modification (current implementation)
• Ranking can be changed by a human
• Content referring to dates can be used for integrity
verification (a minimal sketch follows)
[Figure: documents ordered along a TIME axis]
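A minimal sketch of the chronological ranking described above: order versions by last-modification time (the current implementation's criterion), then extract dates from content so a human or a rule can flag contradictions. The ISO-style date pattern is an assumption.

    import os
    import re

    DATE = re.compile(r"\b(?:19|20)\d{2}-\d{2}-\d{2}\b")   # assumes ISO-style dates

    def rank_by_mtime(paths):
        # Chronological ranking by last-modification time stamp.
        return sorted(paths, key=os.path.getmtime)

    def embedded_dates(path):
        # Dates referred to in the content, usable to cross-check the ranking.
        with open(path, errors="ignore") as f:
            return DATE.findall(f.read())

    # Usage: for p in rank_by_mtime(files): print(p, embedded_dates(p))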
57. Rules and Attributes for Integrity Verification
• Document integrity attributes?
• appearance or disappearance of document images
• appearance and disappearance of dates embedded in
documents
• file size
• count of image groups
• number of sentences
• average value of dates found in a document
• Rules? (one attribute-plus-rule sketch follows below)
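A minimal sketch computing a few of the attributes listed above (file size, sentence count, dates found), plus one illustrative rule of the kind the slide asks for: a newer version of 'the same' document should not shrink drastically. Both the attribute choices and the 50% tolerance are assumptions.

    import os
    import re

    def integrity_attributes(path):
        # A few of the attributes above: file size, sentence count, dates found.
        text = open(path, errors="ignore").read()
        return {
            "file_size": os.path.getsize(path),
            "sentences": len(re.findall(r"[.!?]+", text)),
            "dates": re.findall(r"\b(?:19|20)\d{2}\b", text),
        }

    def shrank_suspiciously(old, new, tolerance=0.5):
        # Hypothetical rule: a newer version should not lose half its bytes.
        return new["file_size"] < old["file_size"] * tolerance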
58. Summary
• Introduced a set of open problems
related to
• Appraisal of electronic records
• Archival forecasting of preservation
costs
• Automation of processing for
preservation
• The examples from our research used to
illustrate these open problems just
scratch the surface
59. Observations
• Many stakeholders, including government agencies and
companies, are already aware of some of the open
problems
• As all government agencies have been
computerized, the continuity and functioning of the
agencies depend on preservation and reconstruction
of electronic records
• Right now, we are at the beginning of the
exponential growth of electronic records (many more
electronic records will be coming)
• Some scientific fields are already facing real-time
decisions about preserving electronic records (e.g.,
astronomers)
60. Future Vision
• It is envisioned that the preservation and
reconstruction of electronic records have to
follow different paradigms that incorporate
• Scalability (heterogeneity, dimensionality
and volume)
• Forecasting of preservation costs
• A new level of automation and quality
control in processing for preservation
purposes
• The field of electronic record management
and preservation needs forward-looking
solutions to stay abreast of the dynamics
of digital information
61. References to Presented Research
• Bajcsy P., R. Kooper and S-C. Lee, "Understanding Preservation and Reconstruction Requirements for Computer-Assisted
Decision Processes," ACM Journal on Computing and Cultural Heritage (JOCCH), (submitted October 2008).
• Bajcsy P., "A Perspective on Cyberinfrastructure for Water Research Driven by Informatics Methodologies," Geography
Compass, Volume 2, Issue 6 (p. 2040-2061), 2008, Blackwell Publishing Ltd, URL: http://www3.interscience.wiley.com/cgi-
bin/fulltext/121478978/PDFSTART
• Bajcsy P., R. Kooper, L. Marini and J. Myers, "Community-Scale Cyberinfrastructure for Exploratory Science," In:
Cyberinfrastructure Technologies and Applications, Editor: Junwei Cao, Nova Science Publishers, Inc., Chapter 12,
2009; URL: https://www.novapublishers.com/catalog/product_info.php?products_id=8011
• McHenry K. and P. Bajcsy, "An Overview of 3D Data Content, File Formats and Viewers," Technical Report NCSA-
ISDA08-002, October 31, 2008.
• McFadden W., K. McHenry, R. Kooper, M. Ondrejcek, A. Yahja and P. Bajcsy, "Advanced Information Systems for
Archival Appraisals of Contemporary Documents," the 4th IEEE International Conference on e-Science, December 8-12,
2008, Indianapolis, IN.
• Lee S-C., W. McFadden and P. Bajcsy, "Text, Image and Vector Graphics Based Appraisal of Contemporary
Documents," The Seventh International Conference on Machine Learning and Applications, December 11-13, 2008, San
Diego, CA.
• Bajcsy P. and S-C. Lee, "Computer Assisted Appraisal of Contemporary PDF Documents," ARCHIVES 2008: Archival
R/Evolution & Identities, 72nd Annual Meeting Pre-conference Programs, August 24-27, 2008, San Francisco, CA.
• Lee S-C. and P. Bajcsy, "Understanding Challenges in Preserving and Reconstructing Computer-Assisted Medical
Decision Processes," the Workshop on Machine Learning in Biomedicine and Bioinformatics (MLBB07) of the 2007
International Conference on Machine Learning and Applications (ICMLA07), Cincinnati, Ohio, December 13-15, 2007.
• Bajcsy P. and D. Clutter, "Gathering and Analyzing Information about Decision Making Processes Using Geospatial
Electronic Records," the 2006 Winter Federation of Earth Science Information Partners ("Federation") Conference,
poster, January 4-6, 2006, Washington, DC.
62. Questions
• Project URL:
http://isda.ncsa.uiuc.edu/NARA/index.html
and http://isda.ncsa.uiuc.edu/CompTradeoffs/
• Publications – see our URL at
http://isda.ncsa.uiuc.edu/publications
• Peter Bajcsy; email: pbajcsy@ncsa.uiuc.edu