Things to consider before, during and after a digitization project in a historical institution. Lecture by Daniel Jeller on 13 September 2011 in Volterra.
OpenGL is a cross-language API for 2D and 3D graphics rendering on the GPU. It was created by Silicon Graphics in 1992 and is now maintained by the Khronos Group. OpenGL provides an interface between software and graphics hardware to perform tasks like rendering, texture mapping, and shading. Developers write OpenGL code that gets translated into GPU commands by a driver for the specific graphics card. This allows hardware-accelerated graphics to be used across many platforms and programming languages.
This document discusses storage management in database systems. It describes the storage device hierarchy from fastest but smallest (cache) to slowest but largest (magnetic tapes). It covers main memory, hard disks, solid state drives and tertiary storage. The document also discusses RAID configurations and how the relational model is represented on secondary storage through records, blocks, files and indexes.
This document provides an introduction to predictive analytics. It defines analytics and predictive analytics, comparing their purposes and differences. Analytics uses past data to understand trends while predictive analytics anticipates the future. Business intelligence involves using data to support decision making and aims to provide historical, current and predictive views of business. As technologies advanced, business intelligence evolved from being organized under IT to potentially being aligned under strategy management. Effective communication between business and analytics professionals is important for organizations to benefit from predictive analytics. The business case for predictive analytics includes enabling strategic planning, competitive analysis, and improving business processes to work smarter.
Many resources discuss machine learning and data analytics from a technology deployment perspective. From a business standpoint, however, the real value of analytics lies in the methodology for solving systemic, holistic problems rather than in any specific technology or platform.
In this presentation, the focus shifts from technology deployment to the analytics methodology for solving holistic business problems. Two examples will be covered in detail:
(i) Analysis of the performance and optimal staffing of a team of doctors, nurses, and technicians for a large local hospital unit using discrete event simulation, with a live demonstration. This simulation methodology is not included in most machine learning algorithm libraries.
(ii) Identifying a few factors (or variables) that contribute most to the financial outcome of a local hospital, using principal component decomposition (PCD) of a large observational dataset of population demographics and disease prevalence.
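To make the decomposition step concrete, here is a minimal sketch in Python using scikit-learn's PCA (principal component decomposition is commonly implemented this way); the column names and random data are illustrative stand-ins, not the hospital dataset from the talk.

```python
# Minimal sketch of identifying the variables that dominate the top components.
# Columns and data are hypothetical, not from the presentation.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
columns = ["median_age", "pct_over_65", "diabetes_rate", "copd_rate", "uninsured_rate"]
X = rng.normal(size=(200, len(columns)))          # stand-in for the observational dataset

X_std = StandardScaler().fit_transform(X)         # put variables on a common scale
pca = PCA(n_components=3).fit(X_std)

print("variance explained:", pca.explained_variance_ratio_)
# The loadings show which original variables dominate each component,
# i.e. the "few factors that contribute most".
for i, comp in enumerate(pca.components_):
    top = sorted(zip(columns, comp), key=lambda t: abs(t[1]), reverse=True)[:2]
    print(f"PC{i + 1}:", top)
```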
The purpose of this presentation is to provide an overview of the main approaches to using big data: a data focus vs. a business analytics focus. The following topics will be covered:
- Why getting data should not be a starting point in business analytics, and why more data does not always result in more accurate predictions
- The simulation analytics methodology in comparison to machine learning and data science approaches (a minimal simulation sketch follows this list)
- Examples of two business cases:
(i) Healthcare: Pediatric Triage in a Severe Pandemic - Maximizing Population Survival by Establishing Admission Thresholds
(ii) Banking & Finance: Analysis of the staffing and utilization of a team of mutual fund analysts for the timely production of ‘buy-sell’ reports
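As referenced above, a minimal discrete event simulation of a staffed unit can be sketched with nothing but the Python standard library; the staffing levels, 4-minute inter-arrival time and 15-minute service time are made-up parameters for illustration, not figures from the presentation.

```python
# Illustrative multi-server queue simulation (FIFO); parameters are hypothetical.
import heapq
import random

def simulate_shift(n_staff, n_patients, interarrival_mean, service_mean, seed=1):
    """Return the average patient wait (minutes) over one simulated shift."""
    random.seed(seed)
    t = 0.0
    free_at = [0.0] * n_staff              # when each staff member next becomes free
    heapq.heapify(free_at)
    waits = []
    for _ in range(n_patients):
        t += random.expovariate(1.0 / interarrival_mean)    # next patient arrives
        soonest_free = heapq.heappop(free_at)
        start = max(t, soonest_free)                        # wait only if everyone is busy
        waits.append(start - t)
        heapq.heappush(free_at, start + random.expovariate(1.0 / service_mean))
    return sum(waits) / len(waits)

for staff in (3, 4, 5):
    avg = simulate_shift(staff, 500, interarrival_mean=4.0, service_mean=15.0)
    print(f"{staff} staff -> average wait {avg:.1f} min")
```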
The document discusses business intelligence and the decision making process. It defines business intelligence as using technology to gather, store, access and analyze data to help users make better decisions. This includes applications like decision support systems, reporting, online analytical processing, and data mining. It also discusses key concepts like data warehousing, OLTP vs OLAP, and the different layers of business intelligence including the presentation, data warehouse, and source layers.
Computer Graphics - Hidden Line Removal Algorithm (Jyotiraman De)
This document discusses various algorithms for hidden surface removal when rendering 3D scenes, including the z-buffer method, scan-line method, spanning scan-line method, floating horizon method, and discrete data method. The z-buffer method uses a depth buffer to track the closest surface at each pixel. The scan-line method only considers visible surfaces within each scan line. The floating horizon method finds the visible portions of curves using a horizon array. The discrete data method handles surfaces defined by discrete points rather than mathematical equations.
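A minimal sketch of the z-buffer idea described above: each candidate fragment is kept only if its depth is smaller than the value already stored for that pixel. The resolution, rectangles and depths below are illustrative only.

```python
import numpy as np

WIDTH, HEIGHT = 64, 48
depth = np.full((HEIGHT, WIDTH), np.inf)      # z-buffer: closest depth seen per pixel
frame = np.zeros((HEIGHT, WIDTH), dtype=int)  # surface id visible at each pixel

def draw_pixel(x, y, z, surface_id):
    """Keep the fragment only if it is closer than what is already stored."""
    if z < depth[y, x]:
        depth[y, x] = z
        frame[y, x] = surface_id

# Two overlapping axis-aligned rectangles at different depths (illustrative only).
for y in range(10, 30):
    for x in range(10, 40):
        draw_pixel(x, y, 5.0, surface_id=1)   # nearer surface
for y in range(20, 40):
    for x in range(30, 60):
        draw_pixel(x, y, 9.0, surface_id=2)   # farther surface, hidden where they overlap

print("pixels showing surface 1:", int((frame == 1).sum()))
print("pixels showing surface 2:", int((frame == 2).sum()))
```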
COLOR CRT MONITORS IN COMPUTER GRAPHICS (nehrurevathy)
1. Color CRT displays use phosphors and one of two methods - beam penetration or shadow mask - to generate colors.
2. The beam penetration method uses red and green phosphors and electron beam speed to produce four colors, while the shadow mask method uses three color phosphors and electron beam deflection through a shadow mask to generate millions of colors.
3. Flat panel displays like LCDs and plasma panels provide alternatives to CRTs with reduced size and power use, though early types had limitations in features like color capability.
Provide Effective Feedback and Guidance and Assistance (Angga Papiih)
This document discusses providing effective feedback and guidance in user interfaces. It covers acceptable response times, dealing with time delays, blinking for attention, and using sound to provide feedback. It also reviews different types of guidance and assistance, including preventing errors, instructions or prompting, help facilities, contextual help, task-oriented help, reference help, wizards, and hints or tips. The types of guidance aim to help users complete tasks and troubleshoot problems through step-by-step instructions, references, and brief contextual information. Design guidelines are provided for implementing each type effectively.
Decision trees are a type of supervised machine learning that use a tree-like model to predict target variables. They work by splitting data into smaller and smaller groups (branches) based on attribute values, continuing until the groups only contain similar target variable values or cannot be split further. The tree consists of decision nodes that test attributes, branches representing the outcome of the tests, and leaf nodes that represent classifications or predicted target values. The ID3 algorithm builds decision trees by selecting the attribute that creates the most information gain at each split in a greedy, top-down manner.
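The information-gain computation at the heart of ID3 can be sketched in a few lines of Python; the tiny weather-style dataset below is invented for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    base = entropy(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return base - remainder

# Tiny illustrative dataset: (outlook, windy) -> play?
rows   = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes"), ("overcast", "no")]
labels = ["no", "no", "yes", "no", "yes"]

# ID3 greedily picks the attribute with the highest gain at each node.
for i, name in enumerate(["outlook", "windy"]):
    print(name, "gain:", round(information_gain(rows, labels, i), 3))
```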
The document discusses different database models including hierarchical, network, relational, entity-relationship, object-oriented, object-relational, and semi-structured models. It provides details on the characteristics, structures, advantages and disadvantages of each model. It also includes examples and diagrams to illustrate concepts like hierarchical structure, network structure, relational schema, entity relationship diagrams, object oriented diagrams, and XML schema. The document appears to be teaching materials for a database management course that provides an overview of various database models.
Chap004-Product and Service Design.pdf (KhatVillados)
This document outlines the key concepts and learning objectives covered in Chapter 4 of an operations management textbook on product and service design. It discusses the strategic importance of design, the design process, sources of design ideas, considerations like quality, costs and sustainability, and phases of product and service life cycles. Key aspects of design covered include standardization, mass customization, reliability, and concurrent engineering. The document provides an overview of the chapter's content at a high level.
The document discusses business intelligence and analytics programs and careers. It provides information on topics like data mining, dashboards, enterprise resource planning systems, online analytical processing, and multidimensional data models. It also lists relevant course descriptions and curriculum from technical schools and colleges to prepare for careers in fields like business intelligence specialist, business intelligence developer, and business intelligence report developer.
The document provides an overview of SQL vs NoSQL databases. It discusses how RDBMS systems focus on ACID properties to ensure consistency but sacrifice availability and scalability. NoSQL systems embrace the CAP theorem, prioritizing availability and partition tolerance over consistency to better support distributed and cloud-scale architectures. The document outlines different NoSQL database models and how they are suited for high volume operations through an asynchronous and eventually consistent approach.
The document provides an introduction to information retrieval, including its history, key concepts, and challenges. It discusses how information retrieval aims to retrieve relevant documents from a collection to satisfy a user's information need. The main challenge in information retrieval is determining relevance, as relevance depends on personal assessment and can change based on context, time, location, and device. The document outlines the major issues and developments in the field over time from the 1950s to present day.
Forecasting demand for new product launches has been a major challenge across industries, and the cost of error is high. Multiple research studies suggest that new products contribute about one-third of an organization's sales across various industries. The blog highlights the use of deep learning / machine learning for effective new product forecasting.
This document provides a comparison of SQL and NoSQL databases. It summarizes the key features of SQL databases, including their use of schemas, SQL query languages, ACID transactions, and examples like MySQL and Oracle. It also summarizes features of NoSQL databases, including their large data volumes, scalability, lack of schemas, eventual consistency, and examples like MongoDB, Cassandra, and HBase. The document aims to compare the different approaches of SQL and NoSQL for managing data.
Intro to Data Science for Non-Data Scientists (Sri Ambati)
Erin LeDell and Chen Huang's presentations from the Intro to Data Science for Non-Data Scientists Meetup at H2O HQ on 08.20.15
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
This document provides guidance on how to write a clear scientific paper. It discusses the key sections of a paper, including the title, abstract, introduction, related work, method, results, and conclusions. The introduction should motivate the problem, summarize prior approaches, state the contributions, and provide a teaser figure. The related work section should group existing work into topics and compare approaches. The method section should describe the approach with subsections and forward references. The results section covers experiments, metrics, and datasets, and includes visual and quantitative results with an ablation study. Figures and tables should be able to stand alone in a presentation. Writing should be concise, consistent, specific and direct, with careful use of words, equations, and notation.
The document provides an overview of key concepts in data science including data types, the data value chain, and big data. It defines data science as extracting insights from large, diverse datasets using tools like machine learning. The data value chain involves acquiring, processing, analyzing and using data. Big data is characterized by its volume, velocity and variety. Common techniques for big data analytics include data mining, machine learning and visualization.
The document defines and describes key concepts related to data warehousing. It provides definitions of data warehousing, data warehouse features including being subject-oriented, integrated, and time-variant. It discusses why data warehousing is needed, using scenarios of companies wanting consolidated sales reports. The 3-tier architecture of extraction/transformation, data warehouse storage, and retrieval is covered. Data marts are defined as subsets of the data warehouse. Finally, the document contrasts databases with data warehouses and describes OLAP operations.
The document discusses object-oriented databases and their advantages over traditional relational databases, including their ability to model more complex objects and data types. It covers fundamental concepts of object-oriented data models like classes, objects, inheritance, encapsulation, and polymorphism. Examples are provided to illustrate object identity, object structure using type constructors, and how an object-oriented model can represent relational data.
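For readers unfamiliar with the object-model vocabulary used above, here is a plain-Python sketch of classes, inheritance, encapsulation and object identity; an actual OODBMS would persist such objects, so this only illustrates the concepts.

```python
class Person:
    def __init__(self, name, birth_year):
        self._name = name              # encapsulated state, exposed via a property
        self._birth_year = birth_year

    @property
    def name(self):
        return self._name

class Employee(Person):                # inheritance: Employee is a specialised Person
    def __init__(self, name, birth_year, salary):
        super().__init__(name, birth_year)
        self.salary = salary

e = Employee("Ada", 1985, 70000)
print(e.name, isinstance(e, Person))   # the subtype can be used wherever Person is expected
print(id(e))                           # object identity, independent of attribute values
```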
Online analytical processing (OLAP) allows users to easily extract and analyze data from different perspectives. It originated in the 1970s and was formalized in 1993, with OLAP cubes organizing numeric facts by dimensions to enable fast analysis. OLAP provides operations like roll-up, drill-down, slice, and dice to analyze aggregated data across multiple systems. It offers advantages over relational databases for consistent reporting and analysis.
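The roll-up, slice and dice operations can be mimicked on a small hypothetical fact table with pandas; a real OLAP engine would run them against a cube, so this only illustrates what each operation returns.

```python
import pandas as pd

# Hypothetical fact table: one row per (year, region, product) with a numeric measure.
sales = pd.DataFrame({
    "year":    [2022, 2022, 2022, 2023, 2023, 2023],
    "region":  ["EU", "EU", "US", "EU", "US", "US"],
    "product": ["A", "B", "A", "A", "A", "B"],
    "revenue": [10, 7, 12, 11, 14, 9],
})

# Roll-up: aggregate away the product dimension.
rollup = sales.groupby(["year", "region"])["revenue"].sum()

# Slice: fix one dimension to a single value.
slice_2023 = sales[sales["year"] == 2023]

# Dice: restrict several dimensions to chosen subsets of values.
dice = sales[(sales["region"] == "EU") & (sales["product"].isin(["A"]))]

print(rollup, slice_2023, dice, sep="\n\n")
```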
The document provides an introduction to database management systems and databases. It discusses:
1) Why we need DBMS and examples of common databases like bank, movie, and railway databases.
2) The definitions of data, information, databases, and DBMS. A DBMS allows for the creation, storage, and retrieval of data from a database.
3) Different types of file organization methods like heap, sorted, indexed, and hash files and their pros and cons. File organization determines how records are stored and accessed in a database.
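The trade-offs among heap, sorted and hash file organizations mentioned in point 3 can be illustrated with in-memory stand-ins; a real DBMS lays records out in disk blocks, so this is only a sketch of the lookup behaviour.

```python
import bisect

records = [(17, "Bob"), (3, "Ann"), (42, "Eve"), (8, "Dan")]   # (key, payload)

# Heap file: records kept in arrival order; lookup is a full scan, O(n).
def heap_lookup(key):
    return next((r for r in records if r[0] == key), None)

# Sorted file: records kept ordered by key; lookup is a binary search, O(log n).
sorted_file = sorted(records)
keys = [k for k, _ in sorted_file]
def sorted_lookup(key):
    i = bisect.bisect_left(keys, key)
    return sorted_file[i] if i < len(keys) and keys[i] == key else None

# Hash file: key hashed to a bucket; average O(1) lookup, but no efficient range scans.
hash_file = {k: v for k, v in records}
def hash_lookup(key):
    return hash_file.get(key)

print(heap_lookup(8), sorted_lookup(8), hash_lookup(8))
```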
The document discusses digitizing, which is the process of converting analog geographic data into digital vector data. There are two main types of digitizing - automatic digitizing, which uses image processing to convert raster to vector, and manual digitizing. Manual digitizing can be done on a tablet by tracing a hard copy, or on a computer screen using a mouse or digitizing cursor. When digitizing on screen, there are two modes: point mode places vertices manually and is slower but more accurate, while stream mode automatically places vertices and is faster but produces larger files.
88 digital interview questions and answers (davidgest49)
In this file, you can find reference interview materials for digital roles, such as the digital situational interview, digital behavioral interview, digital phone interview, digital interview thank-you letter, digital interview tips …
88 digital interview questions and answers
free ebook pdf download
A dialogue with the CEO and with Engineering, Legal, Compliance, HSE, Quality, HR, Finance, and Administration to demonstrate that "visible" change and true transformation are possible, and rapidly so.
- The document discusses recommendations for digitizing banking services based on a comparative study of digital and branch banking.
- A survey found customers prefer digital banking over branches due to convenience and time savings. Key implementation factors are infrastructure, data management, analytics, and user interfaces.
- The recommendations include creating an integrated customer database, origination systems, independent processing support, and data repository to power customized digital products and services.
The document provides step-by-step instructions for digitizing data in ArcMap. It explains how to customize the toolbar, select a shapefile to edit, use the pencil tool to start digitizing by clicking points on the map, and then stop editing to save changes. Digitizing involves creating shapefiles, adding data, starting editing, and selecting a scale. The digitized lines and their vertices are highlighted, and the process includes stopping and saving edits after selection.
This document summarizes the results of a study testing a mixed-mode questionnaire for Finland's Labour Force Survey, including cognitive interviews and a pilot survey comparing computer-assisted web interviewing (CAWI) to telephone interviewing (CATI).
Key findings include: 1) Questionnaire design improvements like layout, navigation bars, and "Don't know" options impacted usability. 2) The CAWI response rate was 30%, suggesting cost savings, but CAWI respondents had more "distorted" data. 3) No significant mode effects were found for employment status, but working hours questions showed mode effects between CAWI and CATI formats. Further testing is recommended to reduce biases between modes.
Digitization is fundamentally changing how property and casualty insurers and their customers do business. Emerging digital technologies like the internet of things, artificial intelligence, and autonomous vehicles are creating new risks and exposures. This is transforming the risk, liability, and insurance arena. Insurers must develop new products, services, and processes to respond to customers' evolving risks and take advantage of this opportunity. How insurers adapt to increasing digitization will determine whether they remain relevant to customers.
Structured conceptualization approach to survey design slideshare 0213 dmf (David Filiberto)
The document describes a structured conceptualization approach to survey design that merges best practices in survey design with concept mapping. It involves engaging stakeholders to generate statements in response to a focus prompt, sorting the statements into piles of similar meaning, and using multidimensional scaling and hierarchical clustering to compute a concept map that structures the statements and provides the framework for an effective survey. The approach was used successfully in two projects involving surveys on home energy use and a village comprehensive plan.
This document provides information on constructing questionnaires. It defines what a questionnaire is and describes the various types. The key steps outlined for constructing a questionnaire are: writing the study aim, identifying broad topic areas, breaking these into single-item statements, constructing questions and the questionnaire, and validating the questionnaire. Various question types like closed-ended, open-ended, rating scales, and checklists are described. Guidelines are provided for writing clear, unbiased questions and properly structuring the questionnaire. The importance of validation by piloting the questionnaire on a small sample is also covered.
This chapter introduces descriptive statistics. It aims to study basic statistical concepts including variables, measures of central tendency, and measures of dispersion. For measures of central tendency, it discusses how to calculate the mean, median, and mode for both ungrouped and grouped data. It also introduces how to calculate variance and standard deviation as measures of dispersion. Examples are provided to demonstrate calculating these descriptive statistics for raw data sets.
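For the ungrouped-data case, the measures listed above can be computed directly with Python's standard statistics module; the data set below is invented for illustration.

```python
import statistics

data = [4, 8, 6, 5, 3, 8, 9, 5, 8]        # small ungrouped raw data set (illustrative)

print("mean:    ", statistics.mean(data))
print("median:  ", statistics.median(data))
print("mode:    ", statistics.mode(data))
print("variance:", statistics.pvariance(data))   # population variance
print("std dev: ", statistics.pstdev(data))      # population standard deviation
```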
This document discusses descriptive statistics and analyzing student achievement data. It introduces common descriptive statistics like mean, median, mode, range, and standard deviation. While mean, median, mode, etc. can describe some aspects of the data, they do not tell the full story on their own. The document emphasizes that raw data contains all the information and descriptive statistics alone do not. It also warns against making inaccurate interpretations from average data and stresses the importance of critically consuming and making nuanced interpretations of student achievement data.
Descriptive statistics are used to summarize and describe characteristics of a data set. They include measures of central tendency like the mean, median, and mode as well as measures of variability such as range, standard deviation, and variance. Descriptive statistics help analyze and understand patterns in data through tables, charts, and summaries without drawing inferences about the underlying population.
The document discusses pre-experimental research designs. Pre-experimental designs lack key elements of true experiments such as control groups and random assignment. Three examples of pre-experimental designs are described: the one-shot case study which involves observing a group after a treatment with no pre-observation; the one group pretest-posttest study which involves measuring a dependent variable before and after treatment; and the static group comparison which compares groups that have and have not received a treatment. The document also provides an example of an experimental study on matching counselor and client interests that demonstrates key elements of an experimental method section including participants, design, instruments, and procedures.
Tackling the job of conducting a survey for your library can be daunting. A systematic and quality-driven approach will yield results which can provide valuable information to decision-makers and stakeholders. This first in a three-part series of workshops on conducting surveys will demystify the survey process, from beginning to end of your project.
This first workshop of the three-part series addresses 1) the reasons for conducting a survey; 2) issues in effective questionnaire design, data collection and analysis, and reporting; and 3) questionnaire design, especially measurement, question content, and structure, including examples.
This document provides an introduction to questionnaire design. It discusses important considerations for writing questions, such as ensuring respondents understand the question and are willing and able to answer. It also covers drafting and organizing questionnaires, including ordering questions by topic, starting with easy questions, and testing the questionnaire. The goal is to design questionnaires that yield accurate, truthful answers from respondents.
Development of concurrent services using In-Memory Data Grids (jlorenzocima)
This presentation, given as part of OTN Tour 2014, covers the basics of an IMDG (In-Memory Data Grid) solution, explains how it works and how it can be used within an architecture, and shows some use cases. Enjoy.
This document discusses deduplication, including what it is, different types of deduplication, where it occurs in the data storage process, advantages and disadvantages, expected storage reductions, and results from an experimental home deduplication. It defines deduplication as eliminating duplicate data and notes that while the basic premise is the same between vendors, implementations can vary. The document also provides examples of storage reductions seen in case studies and an experiment showing significant space savings from deduplicating personal files and software downloads.
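The core idea, storing each identical chunk only once and keeping a per-file recipe of chunk hashes, can be sketched in Python as follows; real deduplication products differ in chunking and indexing details, so treat this as illustrative only.

```python
import hashlib

def dedup_store(blobs, chunk_size=4096):
    """Store each unique chunk once, keyed by its SHA-256 digest."""
    store = {}                  # digest -> chunk bytes
    recipes = []                # per-blob list of digests needed to rebuild it
    for blob in blobs:
        recipe = []
        for i in range(0, len(blob), chunk_size):
            chunk = blob[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes

# Two synthetic files that share their first two chunks.
blobs = [b"A" * 8192 + b"B" * 4096, b"A" * 8192 + b"C" * 4096]
store, recipes = dedup_store(blobs)
raw = sum(len(b) for b in blobs)
kept = sum(len(c) for c in store.values())
print(f"raw {raw} bytes -> stored {kept} bytes ({raw - kept} saved)")
```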
The document summarizes a project testing the feasibility of emulation and migration as strategies for preserving web archives. It describes the National Library of Australia's web collections and common web object types. It then details tests conducted on a sample of the NLA's PANDORA archive using emulation and migration tools. The tests found that emulation was complex and slow, while available migration tools were imperfect and slow. Both strategies require significant resources and clearly defined preservation policies. The document concludes that no proven digital preservation methods currently exist and more real-world testing is needed.
This document discusses various types of forensic duplication including simple duplication that copies selected data versus forensic duplication that retains every bit on the source drive including deleted files. It covers requirements for forensic duplication including the need to act as admissible evidence. It describes different forensic image formats including complete disk, partition, and logical images and details scenarios for each type. Key aspects of forensic duplication covered include recovering deleted files, non-standard data types, ensuring image integrity with hashes, and traditional duplication methods like using hardware write blockers or live DVDs.
Rapid Evolution of Web Dev? aka Talking About The Web (PINT Inc)
Thomas Powell gives a meme-peppered talk at Interactive Day San Diego about the Web and web dev tech, focusing on how far (or not) we have come since the late 1990s.
Digitization is the process of converting analog materials like written records, photographs, films and artifacts into a digital format. While digitized materials allow for greater accessibility through efficient search and preservation, some information is lost in translation to digital 1s and 0s. However, digitization transforms how we research, present and access historical materials. There are different methods of digitization including page imaging, markup with standards like TEI to make text machine-readable and searchable, and optical character recognition. Each method has advantages and disadvantages for representing the original content.
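For the optical character recognition method mentioned above, a minimal sketch using the pytesseract wrapper might look like this; it assumes the Tesseract engine is installed, and "page_scan.png" is a placeholder for a scanned page image.

```python
# Hedged sketch of the OCR step: turn a page image into machine-readable text.
from PIL import Image
import pytesseract

image = Image.open("page_scan.png")          # page image produced by the scanning step
text = pytesseract.image_to_string(image)    # searchable, machine-readable text
print(text[:200])
```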
This document discusses architecting a data lake. It begins by introducing the speaker and topic. It then defines a data lake as a repository that stores enterprise data in its raw format including structured, semi-structured, and unstructured data. The document outlines some key aspects to consider when architecting a data lake such as design, security, data movement, processing, and discovery. It provides an example design and discusses solutions from vendors like AWS, Azure, and GCP. Finally, it includes an example implementation using Azure services for an IoT project that predicts parts failures in trucks.
Introduction to Big Data and NoSQL.
This presentation was given to the Master DBA course at John Bryce Education in Israel.
Work is based on presentations by Michael Naumov, Baruch Osoveskiy, Bill Graham and Ronen Fidel.
Brev loc cloud data storage, backup and recovery pres (danmraz)
This document discusses data backup and recovery solutions, specifically Brevard Local Cloud. It notes that data loss can be catastrophic for businesses and outlines potential causes of data loss like hardware and human errors. Brevard Local Cloud is presented as a local backup solution where a user's data is backed up every 15 minutes to storage in Brevard County. The service provides up to 4 devices with 100GB storage, remote access, version restoration, and assistance for $35 per month.
Managing active data: storage, access, academic dropbox services (Marieke Guy)
Researchers need to store and access active data for their work. They often use local storage devices and email to manage their data, but these methods lack resilience. Institutions provide network storage, but their systems are rarely used by researchers for active data access and sharing. An ideal solution would transparently synchronize local and network storage, providing both the convenience of local access and the resilience of network attached storage. However, there are challenges to managing active research data at large scales, including file sizes, storage locations, costs, ease of use, network transfers, versions, and access rights that institutions aim to address.
2010 AIRI Petabyte Challenge - View From The Trenches (George Ang)
This document provides an overview of trends in science-driven storage from the perspective of an independent consulting firm. It discusses how the needs of life science researchers are driving huge increases in data production and storage needs. It also describes some common problems encountered, such as enterprise storage solutions that don't meet research needs, do-it-yourself cluster configurations that are not optimized, and unchecked user requirements. The document concludes with some practical advice, such as the importance of a single namespace, user expectation management, and trends towards larger petabyte-scale storage deployments.
This document discusses security considerations for startups. It notes that while startups often don't prioritize security due to budget constraints, security breaches can impact revenue and data. The document outlines best practices at different layers including external network, application, internal network, and staff awareness. It also provides examples of typical security issues that startups encounter like platform dependencies and vulnerabilities, and recommends basic security scans and education resources to help improve practices.
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational? (DATAVERSITY)
Wordnik migrated from a MySQL relational database to the non-relational MongoDB database for 5 key reasons: speed, stability, scaling, simplicity, and fitting their object model better. They tested MongoDB extensively, iteratively improving their data mapping and access patterns. The migration was done without downtime by switching between the databases. While inserts were much faster in MongoDB, updates could be slow due to disk I/O. Wordnik addressed this through optimizations like pre-fetching on updates and moving to local storage. Overall, MongoDB was a better fit for Wordnik's large and evolving datasets.
The document summarizes key discussion points from Axiell East Coast Roadshow 2018's Digital Preservation Discussion Group. The group discussed: 1) the types of digital materials cultural heritage organizations manage like images, documents, and born-digital files; 2) who has organizational digital preservation policies in place; and 3) what digital preservation means and current challenges organizations face in preserving digital collections over time given risks like file formats becoming unreadable and materials becoming lost or unrecoverable without preservation planning. The group also explored options for on-site and cloud-based preservation solutions and factors to consider when developing a digital preservation program and policy.
The document provides an agenda and summaries of updates from Redmap's quarterly briefing, including:
1. ManagePoint 4.3.2 will include improved office integration, a new ManageAnywhere interface, updated image editing and viewing tools, and input queue functionality.
2. CaptureMail is an email archiving solution that will be available in June, with options for full archiving setup or importing existing emails.
3. The ManagePoint 4.3.2 release focuses on improved security, performance, and usability through changes to the application architecture. New features were also outlined for the image editor, viewer, and office integration.
A simple explanation about digital imaging (aechaa93)
This document discusses digital imaging and factors to consider for digitizing records. Digital imaging converts documents to computer-readable digital image files consisting of pixels arranged in rows and columns. Key factors to consider include the project's mission and users, priorities for speed, quality and quantity, and staff expertise. Maintaining records as digital images allows for high-density storage, faster retrieval when indexed, and multiple access levels, but images require computer equipment to read and involve equipment and scanning costs. Proper implementation requires selecting materials, preparing files, creating metadata, quality control, and staff training.
Esteban R. Frías
Social Innovation Labs at Universities: The Case of Medialab UGR – a Research Laboratory for Digital Culture and Society
ICARUS-Meeting #20 | The Age of Digital Technology: Documents, Archives and Society
23–25 October 2017, Complutense University Madrid, Calle del Prof. Aranguren, 28040 Madrid, Spain
Xpectraltek
Graphic Documents Unfolded With the XpeCAM X01 Solution
ICARUS-Meeting #20 | The Age of Digital Technology: Documents, Archives and Society
23–25 October 2017, Complutense University Madrid, Calle del Prof. Aranguren, 28040 Madrid, Spain
Csaba B. Stenge
The Chaldean Heritage Work Group – A Mission by ICARUS to Help Preserve the Ancient Iraqi Christian Written Heritage
ICARUS-Meeting #20 | The Age of Digital Technology: Documents, Archives and Society
23–25 October 2017, Complutense University Madrid, Calle del Prof. Aranguren, 28040 Madrid, Spain
Rosa María Martín Rey and Javier Hernández Díez
Interoperability, Records Management and e-Archiving in Spanish eGovernment: Legal Framework, Processes and Tools
ICARUS-Meeting #20 | The Age of Digital Technology: Documents, Archives and Society
23–25 October 2017, Complutense University Madrid, Calle del Prof. Aranguren, 28040 Madrid, Spain
Daniel Alves, David Bodenhamer and Paul Ell
DIGIWARMEMO: A Digital Humanities Approach to Re-Use the First World War Online Archives
ICARUS-Meeting #20 | The Age of Digital Technology: Documents, Archives and Society
23–25 October 2017, Complutense University Madrid, Calle del Prof. Aranguren, 28040 Madrid, Spain
Sergio Riolo
Bringing the Archives to the People, and Vice-Versa
ICARUS-Meeting #20 | The Age of Digital Technology: Documents, Archives and Society
23–25 October 2017, Complutense University Madrid, Calle del Prof. Aranguren, 28040 Madrid, Spain
ImageWare Austria
Collaborative Digitization in Archives vs. Contracted Services or Procurement. New Paths to High End Digitization
ICARUS-Meeting #20 | The Age of Digital Technology: Documents, Archives and Society
23–25 October 2017, Complutense University Madrid, Calle del Prof. Aranguren, 28040 Madrid, Spain
Presenting Archives Portal Europe, the largest online archives catalogue in the world. It provides access to archival descriptions from over 1,300 institutions across Europe. The portal uses international metadata standards like EAD and EAC-CPF and sees 500,000 users annually. The Archives Portal Europe Foundation governs the portal and has a strategy to improve its API, processing of EAC records, and additional finding aids.
Alfonso Sánchez Mairena
PARES 2.0: The Spanish State Archives and the Open Data Culture
ICARUS-Meeting #20 | The Age of Digital Technology: Documents, Archives and Society
23–25 October 2017, Complutense University Madrid, Calle del Prof. Aranguren, 28040 Madrid, Spain
Rafael Chelaru
Creating a Genealogical Database - Digitization of the Civil Registers and Matricula from Bucharest and Brasov County (Romania)
ICARUS-Meeting #17 | Transparency - Accessibility – Dialogue. How a creative archival landscape can effect society
23–25 May 2016, Krukmakarens hus (The Potter's house), Mellangatan 21, 621 56 Visby / The Regional State Archives in Visby, Broväg 27, 621 41 Visby, Sweden
Karl-Magnus Johansson
Archives and Art. The Regional State Archives in Gothenburg in Cooperation with HDK, Academy of Design and Crafts, University of Gothenburg
ICARUS-Meeting #17 | Transparency - Accessibility – Dialogue. How a creative archival landscape can effect society
23–25 May 2016, Krukmakarens hus (The Potter's house), Mellangatan 21, 621 56 Visby / The Regional State Archives in Visby, Broväg 27, 621 41 Visby, Sweden
Karin Sjöberg
Archives, Education and Learning. Archives as a Resource for Schools
ICARUS-Meeting #17 | Transparency - Accessibility – Dialogue. How a creative archival landscape can effect society
23–25 May 2016, Krukmakarens hus (The Potter's house), Mellangatan 21, 621 56 Visby / The Regional State Archives in Visby, Broväg 27, 621 41 Visby, Sweden
Björn Asker
Open Access in the 18th Century – The Swedish Freedom of the Press Act of 1766
ICARUS-Meeting #17 | Transparency - Accessibility – Dialogue. How a creative archival landscape can effect society
23–25 May 2016, Krukmakarens hus (The Potter's house), Mellangatan 21, 621 56 Visby / The Regional State Archives in Visby, Broväg 27, 621 41 Visby, Sweden
Martin Bjersby
The National Archival Database
ICARUS-Meeting #17 | Transparency - Accessibility – Dialogue. How a creative archival landscape can effect society
23–25 May 2016, Krukmakarens hus (The Potter's house), Mellangatan 21, 621 56 Visby / The Regional State Archives in Visby, Broväg 27, 621 41 Visby, Sweden
Hanna Wendelbo-Hansson
Archives on the Wall
ICARUS-Meeting #17 | Transparency - Accessibility – Dialogue. How a creative archival landscape can effect society
23–25 May 2016, Krukmakarens hus (The Potter's house), Mellangatan 21, 621 56 Visby / The Regional State Archives in Visby, Broväg 27, 621 41 Visby, Sweden
Anna Ketola
Documentation, Collections and Archives from the Civil Society
ICARUS-Meeting #17 | Transparency - Accessibility – Dialogue. How a creative archival landscape can effect society
23–25 May 2016, Krukmakarens hus (The Potter's house), Mellangatan 21, 621 56 Visby / The Regional State Archives in Visby, Broväg 27, 621 41 Visby, Sweden
Jan Östergren
The Swedish Archival Landscape. Vision, Future, Transparency, Access and Use
ICARUS-Meeting #17 | Transparency - Accessibility – Dialogue. How a creative archival landscape can effect society
23–25 May 2016, Krukmakarens hus (The Potter's house), Mellangatan 21, 621 56 Visby / The Regional State Archives in Visby, Broväg 27, 621 41 Visby, Sweden
Elisabeth Steiger
The European Archival Blog
ICARUS-Meeting #17 | Transparency - Accessibility – Dialogue. How a creative archival landscape can effect society
23–25 May 2016, Krukmakarens hus (The Potter's house), Mellangatan 21, 621 56 Visby / The Regional State Archives in Visby, Broväg 27, 621 41 Visby, Sweden
Bente Jensen
Archives’ Outreach in the Nordic Countries – a Question About Relevance, Participation and Dialogue
ICARUS-Meeting #17 | Transparency - Accessibility – Dialogue. How a creative archival landscape can effect society
23–25 May 2016, Krukmakarens hus (The Potter's house), Mellangatan 21, 621 56 Visby / The Regional State Archives in Visby, Broväg 27, 621 41 Visby, Sweden
Michael Scholz
Tourism History and Destination Development Using the Example of Gotland
ICARUS-Meeting #17 | Transparency - Accessibility – Dialogue. How a creative archival landscape can effect society
23–25 May 2016, Krukmakarens hus (The Potter's house), Mellangatan 21, 621 56 Visby / The Regional State Archives in Visby, Broväg 27, 621 41 Visby, Sweden
More from ICARUS - International Centre for Archival Research
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack (shyamraj55)
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Full-RAG: A modern architecture for hyper-personalization (Zilliz)
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf (Malak Abu Hammad)
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
What do a Lego brick and the XZ backdoor have in common? (Speck&Tech)
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might seem to have in common only the fact that both are building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case have much more in common than that.
Join the presentation to dive into a story of interoperability, standards and open formats, and then discuss the important role that contributors play in a sustainable open source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations and training activities. She previously worked on LibreOffice migrations and training courses for various public administrations and private organizations. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not pursuing her passion for computers and for Geeko, she cultivates her curiosity about astronomy (which is where her nickname deneb_alpha comes from).
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
OpenID AuthZEN Interop Read Out - AuthorizationDavid Brossard
During Identiverse 2024 and EIC 2024, members of the OpenID AuthZEN WG got together and demoed their authorization endpoints conforming to the AuthZEN API
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
2. What is digitization?
• The translation of analogue data into digital data
• The process of taking a digital image of something
• The creation of a digital representation of something
5. Initial reflections
• What do I want to digitize?
• Why do I want to do it?
• Do I already have the necessary metadata?
• What safety precautions are necessary?
• Do I want to digitize myself or hire a company?
• What will happen with the digital object, once produced?
6. Technical reflections
• Image format (TIFF, JPEG, JPEG2000,…)
• Image quality (Resolution, Image compression; see the sketch after this list)
• Equipment (Scanner vs. Digital Camera)
• Storage space (webhosting?)
• Long-term storage
• Filename (Human vs. Computer readable)
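A common pattern behind these choices is to keep an uncompressed TIFF as the archival master and derive a compressed JPEG access copy from it. The following is a minimal sketch of that step, assuming the Pillow library is available; the file names (a hypothetical TIFF master alongside the JPEG name used later on the file-naming slide) and the quality/dpi values are illustrative only, not part of the original lecture.

```python
# Minimal sketch: derive a compressed JPEG access copy from a TIFF master.
# Assumes Pillow is installed; file names and settings are illustrative.
from PIL import Image

with Image.open("HHStA_AUR_16890201_r.tif") as master:
    # JPEG cannot store alpha or 16-bit channels, so convert to RGB first,
    # then save a derivative with moderate compression at 300 dpi.
    master.convert("RGB").save(
        "HHStA_AUR_16890201_r.jpg",
        "JPEG",
        quality=85,
        dpi=(300, 300),
    )
```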
7. Scanner vs. Digital Camera
Scanner:
• Variable resolution
• Lower speed
• Mainly for flat objects
Digital Camera:
• Constant resolution
• High speed
• Higher adjustability
8. Metadata for Digitization
Why is this important?
• Every digital object has to be identifiable
• Most file systems only permit unique names
• Changes to filenames are difficult to make after digitization (see the mapping sketch after this list)
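One way to keep every digital object identifiable without relying on the filename to carry meaning is to record, before digitization starts, which archival identifier belongs to which assigned filename. Below is a minimal sketch of such a concordance written to CSV; the signature, filename, and `concordance.csv` path are illustrative assumptions, not prescribed by the lecture.

```python
# Minimal sketch: record the link between an archival identifier and the
# filename assigned before digitization, so every image stays identifiable.
# All values and the output path are illustrative only.
import csv

records = [
    {"signature": "HHStA AUR 1689-02-01", "filename": "HHStA_AUR_16890201_r.jpg"},
]

with open("concordance.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["signature", "filename"])
    writer.writeheader()
    writer.writerows(records)
```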
9. Preparation
• Enough space
• Constant indirect lighting, no direct sunlight
• Tables for preparation
• Enough power outlets
10. Digitization
• Speed vs. Quality
• Number of people depending on source material
• Storage of sources during the night
• Delivery of new sources
• Quality control
• Image postprocessing
11. Source material
Flat surface:
• One sheet
• Possibly folded
• Possibly packaged
Book:
• Possibly very large
• Possibly difficult to open
12. File naming
Human readability: "HHStA_AUR_16890201_r.jpg"
• Easy to read and share
• Has to be created by hand before digitization
• Easy to make mistakes
• Temptation to make changes later on
Computer readability: "7c7a1400-dd12-11e0.jpg"
• Can be created quickly
• Has to be copied, not entered by hand
• Doesn't mean anything
• Can stay the same regardless of metadata
(A minimal sketch for generating such computer-readable names follows this slide.)
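The computer-readable example above looks like a time-based UUID, which can be generated automatically rather than typed by hand. This is a minimal sketch of one way to do it with Python's standard library; the function name and the use of `uuid1()` (rather than, say, random `uuid4()`) are assumptions for illustration, not the method used in the lecture.

```python
# Minimal sketch: generate computer-readable, guaranteed-unique filenames.
# uuid1() yields time-based identifiers resembling the slide's example;
# uuid4() would give purely random ones instead.
import uuid

def next_filename(extension: str = "jpg") -> str:
    # Hypothetical helper: one fresh, collision-free name per call.
    return f"{uuid.uuid1()}.{extension}"

print(next_filename())  # e.g. "1b4e28ba-2fa1-11ec-8000-0242ac120002.jpg"
```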
13. Postproduction
• Image editing (cutting, compression,…)
• Quality control (missing images; see the sketch after this list)
• Preparation for storage (file organization)
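A simple quality-control check for missing images is to compare the files actually produced against the list of filenames assigned beforehand. The sketch below assumes the hypothetical `concordance.csv` from the earlier metadata example and an image directory named `scans`; both are illustrative, not part of the original lecture.

```python
# Minimal sketch: flag images listed in the concordance but missing on disk.
# The CSV file and the "scans" directory are illustrative assumptions.
import csv
from pathlib import Path

image_dir = Path("scans")

with open("concordance.csv", newline="", encoding="utf-8") as f:
    expected = [row["filename"] for row in csv.DictReader(f)]

missing = [name for name in expected if not (image_dir / name).exists()]
for name in missing:
    print(f"Missing image: {name}")
```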