This document provides an overview of the Oxford Common File Layout (OCFL) specification for digital preservation. It summarizes the goals of OCFL, which are to enable the completeness, parsability, robustness, and storage of digital objects on a variety of infrastructures. The key components of an OCFL system include OCFL objects, which group content files and metadata into versioned directories, and OCFL storage roots, which provide a hierarchical storage structure for multiple OCFL objects.
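As an illustration of the versioned-directory idea, a minimal OCFL object might be laid out as follows. This is a sketch based on the OCFL conventions (Namaste declaration file, per-object and per-version inventories); the content file names are invented for illustration:

```text
object_root/
    0=ocfl_object_1.0          # Namaste file declaring this directory an OCFL object
    inventory.json             # object inventory: content digests and version state
    inventory.json.sha512      # digest of the inventory itself
    v1/
        inventory.json
        inventory.json.sha512
        content/
            paper.pdf
    v2/
        inventory.json
        inventory.json.sha512
        content/
            paper-revised.pdf
```

Each version directory is immutable once written; the top-level inventory records which stored files make up the logical state of every version.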
RO-Crate: A framework for packaging research products into FAIR Research Objects (Carole Goble)
RO-Crate: A framework for packaging research products into FAIR Research Objects presented to Research Data Alliance RDA Data Fabric/GEDE FAIR Digital Object meeting. 2021-02-25
"SPARQL Cheat Sheet" is a short collection of slides intended to act as a guide to SPARQL developers. It includes the syntax and structure of SPARQL queries, common SPARQL prefixes and functions, and help with RDF datasets.
The "SPARQL Cheat Sheet" is intended to accompany the SPARQL By Example slides available at http://www.cambridgesemantics.com/2008/09/sparql-by-example/.
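To make the cheat-sheet material concrete, here is a small Python sketch of what a basic SPARQL SELECT does conceptually: matching triple patterns against an RDF-style set of triples and joining on a shared variable. The graph data and predicate names are invented for illustration:

```python
# Conceptual model of a SPARQL query such as:
#   SELECT ?name WHERE { ?person <hasName> ?name . ?person <livesIn> <Oxford> . }
# A triple store is, at heart, a set of (subject, predicate, object) tuples.
triples = {
    ("alice", "hasName", "Alice"),
    ("alice", "livesIn", "Oxford"),
    ("bob", "hasName", "Bob"),
    ("bob", "livesIn", "Cambridge"),
}

def select_names(graph, city):
    """Names of subjects with a livesIn edge to `city` (a join on ?person)."""
    people = {s for (s, p, o) in graph if p == "livesIn" and o == city}
    return sorted(o for (s, p, o) in graph if p == "hasName" and s in people)

print(select_names(triples, "Oxford"))  # ['Alice']
```

A real SPARQL engine generalizes this pattern matching to arbitrary graph patterns, OPTIONAL clauses, and FILTER expressions.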
Vector Databases 101 - An introduction to the world of Vector Databases (Zilliz)
An introduction to unstructured data and the world of vector databases: we will see how they differ from traditional databases, in which cases you need one, and in which you probably don’t. I will also go over similarity search, where vectors come from, and an example of a vector database architecture.
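The similarity-search idea at the core of a vector database can be sketched in a few lines of Python. This is a brute-force nearest-neighbour search over toy 3-dimensional vectors (real systems use high-dimensional embeddings from a model and approximate indexes such as HNSW):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query, vectors, k=2):
    """Brute-force k-nearest-neighbour search by cosine similarity."""
    scored = sorted(vectors.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]

embeddings = {  # toy "embeddings"; real ones come from an ML model
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
print(nearest([0.85, 0.15, 0.05], embeddings))  # ['cat', 'dog']
```

The brute-force scan is O(n) per query; a vector database exists precisely to replace that scan with an index that answers the same question approximately but much faster.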
Project Tungsten Phase II: Joining a Billion Rows per Second on a Laptop (Databricks)
Tech-talk at Bay Area Apache Spark Meetup.
Apache Spark 2.0 will ship with the second generation Tungsten engine. Building upon ideas from modern compilers and MPP databases, and applying them to data processing queries, we have started an ongoing effort to dramatically improve Spark’s performance and bring execution closer to bare metal. In this talk, we’ll take a deep dive into Apache Spark 2.0’s execution engine and discuss a number of architectural changes around whole-stage code generation/vectorization that have been instrumental in improving CPU efficiency and gaining performance.
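The core idea behind whole-stage code generation can be mimicked in a few lines of Python. This is a conceptual sketch, not Spark's actual generated code: instead of evaluating a plan operator by operator (the Volcano iterator model), the operators are fused into one tight loop over the rows:

```python
rows = [{"x": i} for i in range(10)]

# Volcano-style: each operator is a separate generator with per-row overhead.
def scan(rows):
    yield from rows

def filter_op(it):
    for r in it:
        if r["x"] % 2 == 0:
            yield r

def project_op(it):
    for r in it:
        yield r["x"] * 10

volcano_result = list(project_op(filter_op(scan(rows))))

# Whole-stage "generated" code: the same scan -> filter -> project plan
# fused into a single loop, as a code generator would emit it.
fused_result = [r["x"] * 10 for r in rows if r["x"] % 2 == 0]

assert volcano_result == fused_result  # same answer, far fewer virtual calls
print(fused_result)  # [0, 20, 40, 60, 80]
```

Tungsten performs this fusion at the JVM bytecode level, eliminating iterator dispatch and keeping intermediate values in CPU registers.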
Extracting, Aligning, and Linking Data to Build Knowledge Graphs (Craig Knoblock)
This document discusses building knowledge graphs by extracting, aligning, and linking data from various sources. It describes crawling websites to acquire raw data, using both structured and unstructured extraction to extract features from the data, aligning the extracted features to a common schema, and resolving entities in the data to merge records referring to the same real-world entity. It also discusses techniques for collectively resolving entities in large datasets, summarizing graphs by grouping similar nodes into super-nodes, and using the summarized graph to predict links in the original graph. The overall goal is to clean, organize, and link disconnected data into a knowledge graph that is easier to query, analyze, and visualize.
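The entity-resolution step described above can be illustrated with a minimal Python sketch: normalize names, group candidate matches into blocks by a cheap key, then merge records whose canonical forms agree. The records and blocking key are invented for illustration; real systems use richer similarity functions:

```python
def normalize(name):
    """Crude canonical form: lowercase, strip punctuation and extra spaces."""
    cleaned = "".join(c for c in name.lower() if c.isalnum() or c.isspace())
    return " ".join(cleaned.split())

records = [  # toy records referring to two real-world people
    {"id": 1, "name": "Craig  Knoblock"},
    {"id": 2, "name": "craig knoblock"},
    {"id": 3, "name": "C. Knoblock"},
    {"id": 4, "name": "Jane Doe"},
]

# Blocking: only compare records that share a cheap key (last name token),
# so resolution scales to large datasets.
blocks = {}
for rec in records:
    key = normalize(rec["name"]).split()[-1]
    blocks.setdefault(key, []).append(rec)

# Within each block, merge records whose normalized names are identical.
merged = {}
for group in blocks.values():
    for rec in group:
        merged.setdefault(normalize(rec["name"]), []).append(rec["id"])

print(merged)  # records 1 and 2 collapse into one entity
```

Collective resolution, as discussed in the talk, goes further by letting one merge decision (e.g. two co-author lists matching) raise the confidence of related merges.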
A Short Tutorial to Semantic MediaWiki (SMW) (Jie Bao)
This document provides an outline for a tutorial on Semantic MediaWiki (SMW). SMW allows semantic annotation of wiki pages, treating them as a lightweight semantic database. It covers what SMW is, how to edit pages semantically, browsing annotated data, using semantics for end users and developers, example applications, and additional resources.
Oak, the architecture of Apache Jackrabbit 3 (Jukka Zitting)
Apache Jackrabbit is just about to reach the 3.0 milestone based on a new architecture called Oak. Based on concepts like eventual consistency and multi-version concurrency control, and borrowing ideas from distributed version control systems and cloud-scale databases, the Oak architecture is a major leap ahead for Jackrabbit. This presentation describes the Oak architecture and shows what it means for the scalability and performance of modern content applications. Changes to existing Jackrabbit functionality are described and the migration process is explained.
This document provides an overview and introduction to MongoDB, an open-source, high-performance NoSQL database. It outlines MongoDB's features like document-oriented storage, replication, sharding, and CRUD operations. It also discusses MongoDB's data model, comparisons to relational databases, and common use cases. The document concludes that MongoDB is well-suited for applications like content management, inventory management, game development, social media storage, and sensor data databases due to its flexible schema, distributed deployment, and low latency.
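The document-oriented model can be sketched conceptually in Python, with dicts standing in for BSON documents. The `find` function below mimics the equality-match subset of a MongoDB query filter; the collection and field names are invented for illustration:

```python
inventory = [  # dicts standing in for BSON documents in a collection
    {"sku": "A1", "qty": 5,  "tags": ["new"]},
    {"sku": "B2", "qty": 0,  "tags": ["sale", "new"]},
    {"sku": "C3", "qty": 12, "tags": []},
]

def find(collection, query):
    """Equality-match subset of a document query: every field must match.
    Note documents need not share a schema; missing fields simply don't match."""
    return [doc for doc in collection
            if all(doc.get(field) == value for field, value in query.items())]

print([d["sku"] for d in find(inventory, {"qty": 0})])  # ['B2']
```

Real MongoDB adds operators (`$gt`, `$in`, ...), indexes over document fields, and distributes the collection across shards, but the flexible-schema document filter above is the heart of its CRUD model.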
DSpace 7 - The Power of Configurable Entities (Atmire)
The document discusses DSpace 7's new configurable entities feature. It describes an entities working group that defined requirements and a roadmap for implementing configurable entities. Key points include:
- Items can now be typed and relations between types can be configured
- Existing item functionality like submission forms, searching, and importing can support different entity types
- A journal use case was implemented to demonstrate entities
- "Virtual metadata" allows mapping metadata between related items to avoid duplication
- Item pages can display different configurations for each entity type and their relations
Elasticsearch is a free and open source distributed search and analytics engine. It allows documents to be indexed and searched quickly and at scale. Elasticsearch is built on Apache Lucene and uses RESTful APIs. Documents are stored in JSON format across distributed shards and replicas for fault tolerance and scalability. Elasticsearch is used by many large companies due to its ability to easily scale with data growth and handle advanced search functions.
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Spark
DevNexus 2022 Atlanta
https://devnexus.com/presentations/7150/
This talk is a quick overview of the how, what, and why of Apache Pulsar, Apache Flink, and Apache NiFi. I will show you how to design event-driven applications that scale the cloud-native way.
This talk was given live in person at DevNexus, across from the booth in room 311.
Tim Spann
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar, Apache Flink, Flink SQL, Apache NiFi, MiniFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
The document summarizes BuzzNumbers' transition from using SQL Server to MongoDB as their database. It discusses problems they faced with SQL Server like scalability issues and performance problems with large datasets. It then covers why they chose to use MongoDB, including its ability to scale horizontally and handle large volumes of writes and reads. Finally, it discusses lessons learned in moving to a NoSQL database and using MongoDB and .NET to build their analytics product.
Workshop held at Open Repository 2018, Bozeman, Montana
In late 2016, a DSpace 7 UI Working Group began developing an Angular User Interface which will replace the existing UIs in DSpace 7. This effort also includes the development of a new REST API for DSpace, designed to follow the principles of a RESTful web service and adopt emerging standards and formats. The goals of the REST API are twofold: (1) to fully support the new Angular UI, and (2) to provide a rich, RESTful integration point for third-party services and tools.
This workshop will allow developers to become more familiar with the new REST API framework before DSpace 7 is released.
This hands-on developers workshop will provide attendees with an overview of the DSpace 7 REST framework:
- standards / best practices that the API is based on (HAL, JSON Patch, JWT)
- DSpace 7 REST Contract (documentation of all endpoints)
- interacting with the REST API (via HAL browser, curl and/or postman)
- how to build new endpoints into the REST API
- where to look when issues arise
- how to document and test existing/new endpoints
Attendees will be expected to set up a virtual machine (or install the DSpace 7 codebase locally) to get more familiar with the codebase/development tools.
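To preview the JSON Patch mechanics the REST contract relies on, here is a Python sketch of applying patch operations to a document. It implements only a tiny subset of JSON Patch (RFC 6902), and the item fields are invented for illustration; the actual DSpace endpoints accept full JSON Patch documents over HTTP:

```python
import copy

def apply_patch(doc, ops):
    """Apply a minimal subset of JSON Patch: 'replace' and 'add' on
    one-level-deep paths like '/withdrawn'. Real servers support full paths."""
    result = copy.deepcopy(doc)
    for op in ops:
        key = op["path"].lstrip("/")
        if op["op"] in ("replace", "add"):
            result[key] = op["value"]
        else:
            raise ValueError(f"unsupported op: {op['op']}")
    return result

item = {"name": "Test Item", "withdrawn": True}
patched = apply_patch(item, [
    {"op": "replace", "path": "/withdrawn", "value": False},
])
print(patched)  # {'name': 'Test Item', 'withdrawn': False}
```

The appeal of the format is that a client sends only the changes, not the whole resource, and each operation names its own path and semantics.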
PostgreSQL + Kafka: The Delight of Change Data Capture (Jeff Klukas)
PostgreSQL is an open source relational database. Kafka is an open source log-based messaging system. Because both systems are powerful and flexible, they’re devouring whole categories of infrastructure. And they’re even better together.
In this talk, you’ll learn about commit logs and how that fundamental data structure underlies both PostgreSQL and Kafka. We’ll use that basis to understand what Kafka is, what advantages it has over traditional messaging systems, and why it’s perfect for modeling database tables as streams. From there, we’ll introduce the concept of change data capture (CDC) and run a live demo of Bottled Water, an open source CDC pipeline, watching INSERT, UPDATE, and DELETE operations in PostgreSQL stream into Kafka. We’ll wrap up with a discussion of use cases for this pipeline: messaging between systems with transactional guarantees, transmitting database changes to a data warehouse, and stream processing.
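The shared commit-log structure can be sketched in Python: an append-only sequence of records, with consumers reading forward from an offset. This is a conceptual model of what underlies both Postgres's WAL and a Kafka topic partition, not either system's actual format; the change records are invented for illustration:

```python
class CommitLog:
    """Append-only log; consumers track their own read offsets."""
    def __init__(self):
        self.entries = []

    def append(self, record):
        self.entries.append(record)
        return len(self.entries) - 1  # offset of the new record

    def read_from(self, offset):
        return self.entries[offset:]

log = CommitLog()
log.append({"op": "INSERT", "table": "users", "row": {"id": 1, "name": "Ada"}})
log.append({"op": "UPDATE", "table": "users", "row": {"id": 1, "name": "Ada L."}})
log.append({"op": "DELETE", "table": "users", "row": {"id": 1}})

# A CDC consumer replays the log to reconstruct the current table state.
state = {}
for change in log.read_from(0):
    if change["op"] in ("INSERT", "UPDATE"):
        state[change["row"]["id"]] = change["row"]
    else:
        state.pop(change["row"]["id"], None)
print(state)  # {} (the row was inserted, updated, then deleted)
```

This is why "a table is a stream, and a stream is a table": the table is just the fold of the change stream, and any consumer can rebuild it from any offset it has retained.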
The document provides an overview of the Spring Framework. It describes Spring as an open source application development framework for Java that provides features like inversion of control (IoC) and dependency injection. The key benefits of Spring include its modular architecture, support for testing, integration with other technologies like ORM frameworks, and its web MVC framework. The core container in Spring uses dependency injection to manage application components (beans). Configuration can be done via XML, annotations, or Java-based approaches. Spring also supports AOP and auto-wiring to reduce coupling between objects.
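The dependency-injection idea at the heart of the Spring container can be illustrated language-neutrally in Python (a conceptual analogue, not Spring code; the class names are invented). A component declares its dependencies in its constructor, and the wiring decides which concrete implementation to supply:

```python
class SmtpMailer:
    def send(self, to, body):
        return f"smtp:{to}:{body}"

class FakeMailer:
    """Test double; injecting it swaps out the real dependency."""
    def send(self, to, body):
        return f"fake:{to}:{body}"

class WelcomeService:
    # Constructor injection: the service names what it needs and the
    # container (or a test) decides which concrete bean to supply.
    def __init__(self, mailer):
        self.mailer = mailer

    def greet(self, user):
        return self.mailer.send(user, "Welcome!")

# "Production" wiring vs. test wiring, with no change to WelcomeService.
print(WelcomeService(SmtpMailer()).greet("ada"))  # smtp:ada:Welcome!
print(WelcomeService(FakeMailer()).greet("ada"))  # fake:ada:Welcome!
```

Spring performs this wiring automatically from XML, annotations, or Java config, which is what makes beans easy to test and loosely coupled.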
0-60: Tesla's Streaming Data Platform (Jesse Yates, Tesla), Kafka Summit SF 2019 (confluent)
Tesla ingests trillions of events every day from hundreds of unique data sources through our streaming data platform. Find out how we developed a set of high-throughput, non-blocking primitives that allow us to transform and ingest data into a variety of data stores with minimal development time. Additionally, we will discuss how these primitives allowed us to completely migrate the streaming platform in just a few months. Finally, we will talk about how we scale team size sub-linearly to data volumes, while continuing to onboard new use cases.
Apache Jackrabbit Oak - Scale your content repository to the cloud (Robert Munteanu)
The document discusses Apache Jackrabbit Oak, an open source content repository that can scale to the cloud. It provides an overview of content, repositories, scaling techniques using different storage backends like TarMK and MongoMK, and how Oak can be deployed in the cloud using technologies like S3 and MongoDB. The presentation covers key JCR concepts and shows how Oak can be used for applications like content management, digital asset management, and invoice management.
A Brief History of Database Management (SQL, NoSQL, NewSQL) (Abdelkader OUARED)
What's the Difference Between SQL, NoSQL, and NewSQL
SQL is a relational database management system (RDBMS) based on ... NewSQL tries to bring some of the features and scalability of NoSQL to SQL.
Introduction to Java GC Tuning and Java Mission Control (Leon Chen)
This document provides an introduction and overview of Java garbage collection (GC) tuning and the Java Mission Control tool. It begins with information about the speaker, Leon Chen, including his background and patents. It then outlines the Java and JVM roadmap and upcoming features. The bulk of the document discusses GC tuning concepts like heap sizing, generation sizing, footprint vs throughput vs latency. It provides examples and recommendations for GC logging, analysis tools like GCViewer and JWorks GC Web. The document is intended to outline Oracle's product direction and future plans for Java GC tuning and tools.
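To ground the heap-sizing and logging concepts, here are a few commonly used HotSpot options of the kind such a talk covers. The sizes are illustrative placeholders, not recommendations; flag availability depends on JDK version:

```text
# Heap and generation sizing (trading footprint against throughput and latency)
-Xms4g -Xmx4g        # fixed overall heap size (avoids resize pauses)
-Xmn1g               # young-generation size

# GC logging, for later analysis in tools such as GCViewer
-XX:+PrintGCDetails -Xloggc:gc.log     # JDK 8 style
-Xlog:gc*:file=gc.log                  # JDK 9+ unified logging
```

Setting `-Xms` equal to `-Xmx` and logging every collection are typical first steps before tuning anything else.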
1) The document discusses information retrieval and search engines. It describes how search engines work by indexing documents, building inverted indexes, and allowing users to search indexed terms.
2) It then focuses on Elasticsearch, describing it as a distributed, open source search and analytics engine that allows for real-time search, analytics, and storage of schema-free JSON documents.
3) The key concepts of Elasticsearch include clusters, nodes, indexes, types, shards, and documents. Clusters hold the data and provide search capabilities across nodes.
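The inverted-index mechanism described in the first part can be shown in a few lines of Python. This is a conceptual sketch (Elasticsearch builds its indexes in Lucene, with analysis, scoring, and compression on top); the documents are invented for illustration:

```python
docs = {
    1: "elasticsearch is a distributed search engine",
    2: "kafka is a distributed log",
    3: "search engines build inverted indexes",
}

# Build the inverted index: term -> set of document ids containing it.
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

def search(query):
    """AND-search: return ids of documents containing every query term."""
    sets = [index.get(term, set()) for term in query.split()]
    return sorted(set.intersection(*sets)) if sets else []

print(search("distributed search"))  # [1]
```

Because the index maps terms to documents rather than documents to terms, a query touches only the postings for its terms instead of scanning every document.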
Towards an Open Research Knowledge Graph (Sören Auer)
The document-oriented workflows in science have reached (or already exceeded) the limits of adequacy as highlighted for example by recent discussions on the increasing proliferation of scientific literature and the reproducibility crisis. Now it is possible to rethink this dominant paradigm of document-centered knowledge exchange and transform it into knowledge-based information flows by representing and expressing knowledge through semantically rich, interlinked knowledge graphs. The core of the establishment of knowledge-based information flows is the creation and evolution of information models for the establishment of a common understanding of data and information between the various stakeholders as well as the integration of these technologies into the infrastructure and processes of search and knowledge exchange in the research library of the future. By integrating these information models into existing and new research infrastructure services, the information structures that are currently still implicit and deeply hidden in documents can be made explicit and directly usable. This has the potential to revolutionize scientific work because information and research results can be seamlessly interlinked with each other and better mapped to complex information needs. Also research results become directly comparable and easier to reuse.
Reactive Microservices with Spring 5: WebFlux (Trayan Iliev)
On November 27 Trayan Iliev from IPT presented “Reactive microservices with Spring 5: WebFlux” @Dev.bg in Betahaus Sofia. IPT – Intellectual Products & Technologies has been organizing Java & JavaScript trainings since 2003.
Spring 5 introduces a new model for end-to-end functional and reactive web service programming with Spring 5 WebFlux, Spring Data & Spring Boot. The main topics include:
– Introduction to reactive programming, Reactive Streams specification, and project Reactor (as WebFlux infrastructure)
– REST services with WebFlux – a comparison between annotation-based and functional reactive programming approaches for building them.
– Router, handler and filter functions
– Using reactive repositories and reactive database access with Spring Data. Building end-to-end non-blocking reactive web services using Netty-based web runtime
– Reactive WebClients and integration testing. Reactive WebSocket support
– Real-time event streaming to web clients using JSON streams, and to JS clients using SSE.
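The non-blocking style WebFlux encourages can be illustrated with Python's asyncio (a conceptual analogue, not Spring code; the handler and function names are invented). A handler awaits I/O instead of blocking a thread, so many requests interleave on a single event loop:

```python
import asyncio

async def fetch_user(user_id):
    await asyncio.sleep(0.01)  # stands in for a non-blocking database call
    return {"id": user_id, "name": f"user-{user_id}"}

async def handler(user_id):
    user = await fetch_user(user_id)  # event loop runs other handlers here
    return {"status": 200, "body": user}

async def main():
    # Many "requests" serviced concurrently on one thread.
    return await asyncio.gather(*(handler(i) for i in range(3)))

responses = asyncio.run(main())
print([r["body"]["id"] for r in responses])  # [0, 1, 2]
```

In WebFlux the same role is played by Reactor's `Mono`/`Flux` pipelines running on Netty's event loop, which is what makes end-to-end non-blocking services possible.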
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...) (Amazon Web Services)
AWS hosts a variety of public data sets that anyone can access for free. Previously, large data sets such as satellite imagery or genomic data have required hours or days to locate, download, customize, and analyze. When data is made publicly available on AWS, anyone can analyze any volume of data without downloading or storing it themselves. In this session, the AWS Open Data Team shares tips and tricks, patterns and anti-patterns, and tools to help you effectively stage your data for analysis in the cloud.
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles (dgarijo)
This document describes FOOPS, an ontology validation service that checks ontologies for adherence to the FAIR principles. FOOPS tests ontologies against criteria related to findability, accessibility, interoperability, and reusability. It provides explanations for test failures to help users improve their ontologies. FOOPS validation results include an overall FAIRness score and coverage of FAIR categories to assess ontology quality, though there is no single threshold for what makes an ontology fully FAIR. The document demonstrates FOOPS and lists the types of tests it supports under each FAIR category. It invites feedback to help further improve FOOPS.
This document provides an overview and introduction to MongoDB. It discusses how new types of applications, data, volumes, development methods and architectures necessitated new database technologies like NoSQL. It then defines MongoDB and describes its features, including using documents to store data, dynamic schemas, querying capabilities, indexing, auto-sharding for scalability, replication for availability, and using memory for performance. Use cases are presented for companies like Foursquare and Craigslist that have migrated large volumes of data and traffic to MongoDB to gain benefits like flexibility, scalability, availability and ease of use over traditional relational database systems.
The Oxford Common File Layout: A common approach to digital preservation (Simeon Warner)
The Oxford Common File Layout (OCFL) specification began as a discussion at a Fedora/Samvera Camp held at Oxford University in September of 2017. Since then, it has grown into a focused community effort to define an open and application-independent approach to the long-term preservation of digital objects. Developed for structured, transparent, and predictable storage, it is designed to promote sustainable long-term access and management of content within digital repositories. This presentation will focus on the motivations and vision for the OCFL, explain key choices for the specification, and describe the status of implementation efforts.
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters (New York University)
This presentation given to University of Iowa Libraries on Nov. 17, 2014, discussing 1) the alphabet soup of metadata standards, e.g. CDM, VRA, CCO, METS, MODS, RDF, including sample tagging and their applications for digital libraries, and 2) why metadata matters. It does not address metadata issues and tools for metadata creation, extraction, transformation, quality control, syndication and ingest.
This document provides an overview and introduction to MongoDB, an open-source, high-performance NoSQL database. It outlines MongoDB's features like document-oriented storage, replication, sharding, and CRUD operations. It also discusses MongoDB's data model, comparisons to relational databases, and common use cases. The document concludes that MongoDB is well-suited for applications like content management, inventory management, game development, social media storage, and sensor data databases due to its flexible schema, distributed deployment, and low latency.
DSpace 7 - The Power of Configurable EntitiesAtmire
The document discusses DSpace 7's new configurable entities feature. It describes an entities working group that defined requirements and a roadmap for implementing configurable entities. Key points include:
- Items can now be typed and relations between types can be configured
- Existing item functionality like submission forms, searching, and importing can support different entity types
- A journal use case was implemented to demonstrate entities
- "Virtual metadata" allows mapping metadata between related items to avoid duplication
- Item pages can display different configurations for each entity type and their relations
Elasticsearch is a free and open source distributed search and analytics engine. It allows documents to be indexed and searched quickly and at scale. Elasticsearch is built on Apache Lucene and uses RESTful APIs. Documents are stored in JSON format across distributed shards and replicas for fault tolerance and scalability. Elasticsearch is used by many large companies due to its ability to easily scale with data growth and handle advanced search functions.
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Spark
DevNexus 2022 Atlanta
https://devnexus.com/presentations/7150/
This talk is a quick overview of the How, What and WHY of Apache Pulsar, Apache Flink and Apache NiFi. I will show you how to design event-driven applications that scale the cloud native way.
This talk was done live in person at DevNexus across from the booth in room 311
Tim Spann
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar, Apache Flink, Flink SQL, Apache NiFi, MiniFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
The document summarizes BuzzNumbers' transition from using SQL Server to MongoDB as their database. It discusses problems they faced with SQL Server like scalability issues and performance problems with large datasets. It then covers why they chose to use MongoDB, including its ability to scale horizontally and handle large volumes of writes and reads. Finally, it discusses lessons learned in moving to a NoSQL database and using MongoDB and .NET to build their analytics product.
Workshop held at Open Repository 2018, Bozeman, Montana
As of late 2016, a DSpace 7 UI Working Group has begun developing an Angular User Interface which will replace the existing UIs in DSpace 7. This effort also includes the development of a new REST API for DSpace, designed to follow the principles of a RESTful webservice and adopt emerging standards and formats. The goals of the REST API are twofold: (1) to fully support the new Angular UI, and (2) to provide a rich, RESTful integration point for third-party services and tools.
This workshop will allow developers to become more familiar with the new REST API framework before DSpace 7 is released.
This hands-on developers workshop will provide attendees with an overview of the DSpace 7 REST framework:
- standards / best practices that the API is based on (HAL, JSON+PATCH, JWT)
- DSpace 7 REST Contract (documentation of all endpoints)
- interacting with the REST API (via HAL browser, curl and/or postman)
- how to build new endpoints into the REST API
- where to look when issues arise
- how to document and test existing/new endpoints
Attendees will be expected to setup a virtual machine (or install the DSpace 7 codebase locally) to get more familiar with the codebase/development tools.
PostgreSQL + Kafka: The Delight of Change Data CaptureJeff Klukas
PostgreSQL is an open source relational database. Kafka is an open source log-based messaging system. Because both systems are powerful and flexible, they’re devouring whole categories of infrastructure. And they’re even better together.
In this talk, you’ll learn about commit logs and how that fundamental data structure underlies both PostgreSQL and Kafka. We’ll use that basis to understand what Kafka is, what advantages it has over traditional messaging systems, and why it’s perfect for modeling database tables as streams. From there, we’ll introduce the concept of change data capture (CDC) and run a live demo of Bottled Water, an open source CDC pipeline, watching INSERT, UPDATE, and DELETE operations in PostgreSQL stream into Kafka. We’ll wrap up with a discussion of use cases for this pipeline: messaging between systems with transactional guarantees, transmitting database changes to a data warehouse, and stream processing.
The document provides an overview of the Spring Framework. It describes Spring as an open source application development framework for Java that provides features like inversion of control (IoC) and dependency injection. The key benefits of Spring include its modular architecture, support for testing, integration with other technologies like ORM frameworks, and web MVC framework. The core container in Spring uses dependency injection to manage application components (beans). Configuration can be done via XML, annotations, or Java-based approaches. Spring also supports aspects like dependency injection, AOP, and auto-wiring to reduce coupling between objects.
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019confluent
Tesla ingests trillions of events every day from hundreds of unique data sources through our streaming data platform. Find out how we developed a set of high-throughput, non-blocking primitives that allow us to transform and ingest data into a variety of data stores with minimal development time. Additionally, we will discuss how these primitives allowed us to completely migrate the streaming platform in just a few months. Finally, we will talk about how we scale team size sub-linearly to data volumes, while continuing to onboard new use cases.
Apache Jackrabbit Oak - Scale your content repository to the cloudRobert Munteanu
The document discusses Apache Jackrabbit Oak, an open source content repository that can scale to the cloud. It provides an overview of content, repositories, scaling techniques using different storage backends like TarMK and MongoMK, and how Oak can be deployed in the cloud using technologies like S3 and MongoDB. The presentation covers key JCR concepts and shows how Oak can be used for applications like content management, digital asset management, and invoice management.
A Brief History of Database Management (SQL, NoSQL, NewSQL)Abdelkader OUARED
What's the Difference Between SQL, NoSQL, and NewSQL
SQL is a relational database management system (RDBMS) based on ... NewSQL tries to bring some of the features and scalability of NoSQL to SQL.
Introduction of Java GC Tuning and Java Java Mission ControlLeon Chen
This document provides an introduction and overview of Java garbage collection (GC) tuning and the Java Mission Control tool. It begins with information about the speaker, Leon Chen, including his background and patents. It then outlines the Java and JVM roadmap and upcoming features. The bulk of the document discusses GC tuning concepts like heap sizing, generation sizing, footprint vs throughput vs latency. It provides examples and recommendations for GC logging, analysis tools like GCViewer and JWorks GC Web. The document is intended to outline Oracle's product direction and future plans for Java GC tuning and tools.
1) The document discusses information retrieval and search engines. It describes how search engines work by indexing documents, building inverted indexes, and allowing users to search indexed terms.
2) It then focuses on Elasticsearch, describing it as a distributed, open source search and analytics engine that allows for real-time search, analytics, and storage of schema-free JSON documents.
3) The key concepts of Elasticsearch include clusters, nodes, indexes, types, shards, and documents. Clusters hold the data and provide search capabilities across nodes.
Towards an Open Research Knowledge GraphSören Auer
The document-oriented workflows in science have reached (or already exceeded) the limits of adequacy as highlighted for example by recent discussions on the increasing proliferation of scientific literature and the reproducibility crisis. Now it is possible to rethink this dominant paradigm of document-centered knowledge exchange and transform it into knowledge-based information flows by representing and expressing knowledge through semantically rich, interlinked knowledge graphs. The core of the establishment of knowledge-based information flows is the creation and evolution of information models for the establishment of a common understanding of data and information between the various stakeholders as well as the integration of these technologies into the infrastructure and processes of search and knowledge exchange in the research library of the future. By integrating these information models into existing and new research infrastructure services, the information structures that are currently still implicit and deeply hidden in documents can be made explicit and directly usable. This has the potential to revolutionize scientific work because information and research results can be seamlessly interlinked with each other and better mapped to complex information needs. Also research results become directly comparable and easier to reuse.
Reactive Microservices with Spring 5: WebFlux – Trayan Iliev
On November 27 Trayan Iliev from IPT presented “Reactive microservices with Spring 5: WebFlux” @Dev.bg in Betahaus Sofia. IPT – Intellectual Products & Technologies has been organizing Java & JavaScript trainings since 2003.
Spring 5 introduces a new model for end-to-end functional and reactive web service programming with Spring WebFlux, Spring Data & Spring Boot. The main topics include:
– Introduction to reactive programming, Reactive Streams specification, and project Reactor (as WebFlux infrastructure)
– REST services with WebFlux – a comparison of annotation-based and functional reactive approaches to building them
– Router, handler and filter functions
– Using reactive repositories and reactive database access with Spring Data. Building end-to-end non-blocking reactive web services using Netty-based web runtime
– Reactive WebClients and integration testing. Reactive WebSocket support
– Realtime event streaming to WebClients using JSON streams, and to JavaScript clients using Server-Sent Events (SSE).
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS... – Amazon Web Services
AWS hosts a variety of public data sets that anyone can access for free. Previously, large data sets such as satellite imagery or genomic data have required hours or days to locate, download, customize, and analyze. When data is made publicly available on AWS, anyone can analyze any volume of data without downloading or storing it themselves. In this session, the AWS Open Data Team shares tips and tricks, patterns and anti-patterns, and tools to help you effectively stage your data for analysis in the cloud.
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles – dgarijo
This document describes FOOPS, an ontology validation service that checks ontologies for adherence to the FAIR principles. FOOPS tests ontologies against criteria related to findability, accessibility, interoperability, and reusability. It provides explanations for test failures to help users improve their ontologies. FOOPS validation results include an overall FAIRness score and coverage of FAIR categories to assess ontology quality, though there is no single threshold for what makes an ontology fully FAIR. The document demonstrates FOOPS and lists the types of tests it supports under each FAIR category. It invites feedback to help further improve FOOPS.
This document provides an overview and introduction to MongoDB. It discusses how new types of applications, data, volumes, development methods and architectures necessitated new database technologies like NoSQL. It then defines MongoDB and describes its features, including using documents to store data, dynamic schemas, querying capabilities, indexing, auto-sharding for scalability, replication for availability, and using memory for performance. Use cases are presented for companies like Foursquare and Craigslist that have migrated large volumes of data and traffic to MongoDB to gain benefits like flexibility, scalability, availability and ease of use over traditional relational database systems.
The Oxford Common File Layout: A common approach to digital preservation – Simeon Warner
The Oxford Common File Layout (OCFL) specification began as a discussion at a Fedora/Samvera Camp held at Oxford University in September of 2017. Since then, it has grown into a focused community effort to define an open and application-independent approach to the long-term preservation of digital objects. Developed for structured, transparent, and predictable storage, it is designed to promote sustainable long-term access and management of content within digital repositories. This presentation will focus on the motivations and vision for the OCFL, explain key choices for the specification, and describe the status of implementation efforts.
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters – New York University
This presentation, given to the University of Iowa Libraries on Nov. 17, 2014, discusses 1) the alphabet soup of metadata standards, e.g. CDM, VRA, CCO, METS, MODS, RDF, including sample tagging and their applications for digital libraries, and 2) why metadata matters. It does not address metadata issues and tools for metadata creation, extraction, transformation, quality control, syndication and ingest.
This document summarizes a presentation on the Hypatia platform, which was developed to help archivists manage, preserve, and provide access to digital archival materials. Key points include:
- Hypatia is an open source software based on Hydra and Fedora that aims to be a repository solution for digital archives.
- It grew out of the Archives Information Management System (AIMS) project and leverages the Hydra framework.
- The presentation covered Hypatia's functional requirements gathering, data models, demonstration of capabilities, and plans for future development and community involvement.
Fedora is an open-source digital object repository system that provides persistent storage and delivery of digital content. It is implemented as a set of Java services and stores content and associated metadata in XML files. The repository can scale to support millions of objects and provides features such as versioning, audit trails and triple store capabilities through integrated systems like Mulgara.
Memory Analysis of the Dalvik (Android) Virtual Machine – Andrew Case
The document summarizes research on analyzing the memory of the Dalvik virtual machine used in Android. It describes acquiring memory from Android devices, locating key data structures in memory like loaded classes and their fields, and analyzing specific Android applications to recover data like call histories, text messages, and location information. The goal is to develop forensics capabilities for investigating Android devices through memory analysis.
This presentation gives an overview of: 1) Fedora Commons, 2) its current use by CLARIN B centres, and 3) the new TLA/FLAT setup that meets the CLARIN B centre requirements using the Fedora Commons/Islandora stack.
Using Fedora Commons To Create A Persistent Archive – Phil Cryer
With the amount of digital data and the demand for open access to view and reuse it continually increasing, the adoption of open source digital repository software is critical for the long-term storage and management of digital objects. By utilizing the open source Fedora Commons software, the Missouri Botanical Garden has created a stable, persistent archive for Tropicos digital objects, including specimen images, plant photos, and other digital media. Metadata, organized in standard Dublin Core extracted from Tropicos, are stored alongside the digital objects, providing search and sharing of data via open standards such as REST and OAI, opening the door for mash-ups and alternative uses. The presentation will cover initial discovery, required hardware and software, and an overview of our experience implementing Fedora Commons. Lessons learned, pros and cons, and other options will also be covered.
This document provides an overview of archival technologies presented at the 46th Annual Georgia Archives Institute on June 10-21, 2013. The presentation introduces various archival management tools like Archon and Archivists' Toolkit for managing archival collections. It also discusses digital collection management software such as CONTENTdm and Islandora. Emerging standards, formats and linked open data initiatives are also covered. The goal is to help archivists identify existing and new technologies that can help manage and provide access to archival materials.
SWORD (Simple Web-service Offering Repository Deposit) will take forward the Deposit protocol developed by a small working group as part of the JISC Digital Repositories Programme by implementing it as a lightweight web-service in four major repository software platforms: EPrints, DSpace, Fedora and IntraLibrary. The existing protocol documentation will be finalised by project partners and a prototype ‘smart deposit’ tool will be developed to facilitate easier and more effective population of repositories.
The document discusses the AudioMD metadata scheme created by the Library of Congress to describe technical qualities of digital audio objects. It defines AudioMD, provides examples of its use, and describes its importance in understanding audio files. The scheme captures administrative, technical, and preservation metadata in a structured XML format. It has evolved through versions 1.0 and 2.0. Additionally, the document outlines the BIBFRAME initiative led by the Library of Congress to transform bibliographic standards to a linked data model and make library catalog records more accessible online.
FAIR Workflows and Research Objects get a Workout – Carole Goble
So, you want to build a pan-national digital space for bioscience data and methods? That works with a bunch of pre-existing data repositories and processing platforms? So you can share FAIR workflows and move them between services? Package them up with data and other stuff (or just package up data for that matter)? How? WorkflowHub (https://workflowhub.eu) and RO-Crate Research Objects (https://www.researchobject.org/ro-crate) that’s how! A step towards FAIR Digital Objects gets a workout.
Presented at DataVerse Community Meeting 2021
OCFL is an open standard for storing digital objects in a repository-independent and preservation-friendly format. It aims to ensure completeness, parsability, robustness, versioning, and storage diversity. The standard was developed through an open community process and draws on existing digital preservation best practices. An OCFL storage root contains digital objects and their metadata and versions arranged in a filesystem structure. The standard is now in version 1.0 along with implementation notes and validation tools.
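As a rough illustration of the structure described above, here is a Python sketch that builds a minimal single-version OCFL inventory. It is illustrative only; a real OCFL object also needs a namaste file (`0=ocfl_object_1.0`) and an `inventory.json.sha512` sidecar, and inventories usually record user information per version.

```python
import hashlib
import json
from datetime import datetime, timezone

def sha512_hex(data: bytes) -> str:
    return hashlib.sha512(data).hexdigest()

def make_inventory(obj_id, files):
    """Build a minimal OCFL 1.0 inventory for a single version v1.
    `files` maps logical paths to content bytes."""
    manifest, state = {}, {}
    for logical_path, content in files.items():
        digest = sha512_hex(content)
        # Content files are stored under v1/content/<logical path>
        manifest.setdefault(digest, []).append(f"v1/content/{logical_path}")
        state.setdefault(digest, []).append(logical_path)
    return {
        "id": obj_id,
        "type": "https://ocfl.io/1.0/spec/#inventory",
        "digestAlgorithm": "sha512",
        "head": "v1",
        "manifest": manifest,
        "versions": {
            "v1": {
                "created": datetime.now(timezone.utc).isoformat(),
                "message": "Initial ingest",
                "state": state,
            }
        },
    }

inv = make_inventory("urn:example:obj1", {"foo.txt": b"hello"})
print(json.dumps(inv, indent=2))
```

Later versions add `v2`, `v3`, … entries whose `state` maps reference digests already in the `manifest`, which is how OCFL gets forward-delta storage and de-duplication.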
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama... – OpenAIRE
The 2019 International Open Access Week will be held October 21-27, 2019. This year’s theme, “Open for Whom? Equity in Open Knowledge,” builds on the groundwork laid during last year’s focus of “Designing Equitable Foundations for Open Knowledge.”
As has become a tradition of sorts, OpenAIRE organises a series of webinars during this week, highlighting OpenAIRE activities, services and tools, and reach out to the wider community with relevant talks on many aspects of Open Science.
The document discusses Eclipse Memory Analyzer, a tool for analyzing Java heap dumps and system dumps. It can simplify memory analysis of large heap dumps, provide automated detection of memory leak suspects, and allow exploration of OSGi bundles in an application. Key features include a dominator tree for identifying retained memory, a query language, and adapters for analyzing different dump formats like IBM system dumps.
This document provides an overview of digital libraries, including definitions, benefits, limitations, components, standards, and challenges. It defines a digital library as a collection of information stored and accessed electronically, extending the functions of a traditional library digitally. Benefits include improved access and searchability, easier information sharing and preservation. Emerging technologies discussed include metadata standards, XML, and protocols like OAI-PMH for metadata harvesting. Common digital library software includes DSpace, Greenstone, and EPrints. Challenges involve digitization, description, legal issues, presentation of heterogeneous resources, and economic sustainability.
This document provides an overview of digital libraries, including definitions, benefits, limitations, components, standards, and challenges. It defines a digital library as a collection of information stored and accessed electronically, extending the functions of a traditional library digitally. Benefits include improved access, information sharing, and preservation, while limitations include technological obsolescence and rights management. Key components discussed include digital objects, metadata, and tools like DSpace and Greenstone for developing digital libraries. Emerging standards around identifiers, encoding, and metadata are also summarized.
UKOLN supports the SWORD project which aims to improve the efficiency and options for depositing content into repositories by developing a standard specification. SWORD defines a deposit interface that has been implemented and tested in several repository systems, including EPrints, DSpace, Fedora, and IntraLibrary. The specification is informed by requirements from JISC and other international work in this area.
Questioning Authority Lookup Service: Linking the Data – Simeon Warner
One segment of a presentation "From idea to implementation: BIBFRAME becomes reality", Charleston, 2022
The implementation of BIBFRAME in active cataloguing workflows and linked data exchange environments is live and it’s evolving across several paths that are often intertwined. This complex bibliographic ecosystem consists of many experiences that the speakers will present highlighting their value both as autonomous endeavours, as well as from the perspective of interaction and options for mutual integration.
The Library of Congress, with the BIBFRAME original cataloguing editor, Marva, will report about developments and achievements for bringing BIBFRAME into practice in a very large library environment with many cataloguing workflows for diverse types of resources, encompassing the use of and adjustments to the BIBFRAME ontology and its modelling.
On the topic of original and copy cataloguing in linked data, Stanford and Cornell Universities are working to achieve a dynamic form of cataloguing through the implementation of Sinopia linked data editor and enrichment tools such as the Questioning Authority that queries authoritative sources to support linked data authorities.
Regarding the impact of linked data processes on the user experience, the University of Pennsylvania has contributed a study describing the functionalities and scenarios which the Share-VDE 2.0 entity discovery system https://www.svde.org/ addresses, and the ways in which user feedback is supporting the evolution of linked data discovery.
Share-VDE (SVDE) is an international library-driven initiative which brings together the bibliographic catalogues and authority files of a community of libraries in an innovative entity discovery environment based on linked data. A path towards the integration of SVDE with the local library services at the University of Pennsylvania and with the Sinopia environment is ongoing. Being a linked open data node, SVDE supports various levels of interoperability and also provides additional tools like the J.Cricket entity editor based on BIBFRAME that opens up new forms of cooperation among libraries to manage and maintain linked data entities.
OCFL: A Shared Approach to Preservation Persistence – Simeon Warner
A lightning talk at the CNI Fall Forum 2022: The Oxford Common File Layout (OCFL) is an application-independent method for storing and versioning content for digital preservation. Version 1.1 was released in October 2022, including backwards compatible corrections and clarifications based on implementation experience and community feedback. The session will recap goals, summarize changes in v1.1, and survey current implementations.
This document discusses FOLIO's support for linked data and plans to enhance it. Currently, FOLIO only supports the MARC format for source record storage but plans to support additional metadata formats like Dublin Core, VRAcore and PBCore in 2023-2025. It also plans to support entity-based data models and interchange with linked open data, including BIBFRAME. There is discussion of building out a linked data SRS or connecting to an external one. The document explores options for external linked data editing and storage and integrating with an entity management application. It proposes a prototype integrating the Sinopia linked data editor and QA lookups with FOLIO to advance linked data work.
The document discusses metadata fields and labels. It notes that Simeon Warner of Cornell University presented at a conference in Austin, Texas in June 2011 on whether metadata fields should be boldly labeled. The presentation suggested thinking first about what information can be presented without labels and what is most important. It also recommended considering accepted cues to make the content clear and when fielded displays might be useful, such as for editing interfaces or search interfaces prioritizing search over comprehension.
Samvera and IIIF: Opportunities and Challenges presented by Jon Dunn (Indiana), Simeon Warner (Cornell), Hannah Frost (Stanford), Adam Wead (Penn. State), and Trey Pendragon (Princeton). The document discusses the opportunities and challenges of integrating Samvera with the International Image Interoperability Framework (IIIF). It provides an overview of IIIF APIs and specifications. It also highlights several institutions' experiences with and plans for IIIF, including using it for audiovisual content, experiments with the Presentation API, and consuming and creating IIIF manifests in digital collections. Key challenges mentioned include tool support for IIIF 3.0 and authentication.
This document summarizes ORCID implementation efforts at Cornell University. It notes that over 53,000 authors have connected their ORCID IDs to their arXiv.org profiles. Cornell uses ORCID for faculty reporting in the College of Agriculture and Life Sciences. Users can connect their institutional login to their ORCID profile through Shibboleth single sign-on. So far 631 users have connected in this way. The library maintains a guide about ORCID benefits and registration.
From Open Annotations to W3C Web Annotations (and the impact on IIIF Present... – Simeon Warner
This document summarizes the key changes between Open Annotations and the W3C Web Annotations specification, and their impact on IIIF Presentation API 3.0. Some of the main changes include: 1) Splitting the annotation model and vocabulary into separate specifications for cleaner JSON; 2) Adding an annotation protocol for creating/updating annotations; 3) Replacing complex body types with simpler TextualBody; 4) Using fragment selectors instead of direct fragment URIs to select parts of resources; 5) Replacing annotation lists and layers with AnnotationPages and AnnotationCollections from Activity Streams. Overall the changes aim to improve the specifications for developers through cleaner JSON, better documentation, and a stricter definition of properties and values.
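To make the model changes concrete, here is a minimal W3C Web Annotation sketched as a Python dict that serializes to the JSON-LD shape the spec describes (the `example.org` URIs are placeholders):

```python
import json

annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno1",          # placeholder identifier
    "type": "Annotation",
    "body": {
        "type": "TextualBody",                 # the simpler body type noted above
        "value": "A comment on part of the image",
        "format": "text/plain",
    },
    "target": {
        "source": "http://example.org/image1",
        # A FragmentSelector instead of a bare URI with #xywh=... appended
        "selector": {
            "type": "FragmentSelector",
            "conformsTo": "http://www.w3.org/TR/media-frags/",
            "value": "xywh=10,10,100,80",
        },
    },
}
print(json.dumps(annotation, indent=2))
```

The selector-based target is what lets clients address a region of a resource without minting a new fragment URI for it.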
Introduction to the IIIF Presentation API (@SWIB17) – Simeon Warner
This document provides an introduction and overview of the International Image Interoperability Framework (IIIF) Presentation API:
- The Presentation API allows for rich online viewing of image-based objects by providing descriptive information for human users to view objects, but not necessarily for machines.
- It uses a shared canvas data model where image resources can be "painted" onto canvases using annotations. Other annotation types include transcriptions and commentaries.
- Manifests describe a work or collection using metadata, technical properties, and linking properties. Collections aggregate related manifests.
- The API uses JSON-LD to represent data in a Linked Data format that is easy for JavaScript clients to consume and treat as ordinary JSON.
Introduction to the International Image Interoperability Framework (IIIF) – Simeon Warner
Introduction to the International Image Interoperability Framework (IIIF), Tutorial at Library Network Days, National Library of Finland, Helsinki, 2017-10-26
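The shared canvas model described above can be illustrated with a skeletal IIIF Presentation 3.0 manifest, sketched here as a Python dict (all URLs are hypothetical):

```python
import json

canvas_id = "https://example.org/iiif/book1/canvas/p1"
manifest = {
    "@context": "http://iiif.io/api/presentation/3/context.json",
    "id": "https://example.org/iiif/book1/manifest",
    "type": "Manifest",
    "label": {"en": ["Example Book"]},
    "items": [{
        "id": canvas_id,
        "type": "Canvas",
        "height": 1800, "width": 1200,
        "items": [{
            "id": canvas_id + "/page/1",
            "type": "AnnotationPage",
            "items": [{
                "id": canvas_id + "/annotation/1",
                "type": "Annotation",
                "motivation": "painting",   # the image is "painted" onto the canvas
                "body": {
                    "id": "https://example.org/iiif/book1/page1.jpg",
                    "type": "Image",
                    "format": "image/jpeg",
                },
                "target": canvas_id,
            }],
        }],
    }],
}
print(json.dumps(manifest, indent=2)[:120])
```

Transcriptions and commentaries would be further annotations targeting the same canvas, with motivations other than `painting`.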
From Open Access to Open Standards, (Linked) Data and Collaborations – Simeon Warner
This document discusses moving from MARC to linked data formats like BIBFRAME. It notes that MARC has limitations like using text where data is needed and limited extensibility. Linked data formats use identifiers rather than names, connect to the web using URIs, and can be extended over time by the community. The LD4L project converted millions of MARC records to BIBFRAME at scale and developed a blacklight search over combined linked data catalogs.
Mind the gap! Reflections on the state of repository data harvesting – Simeon Warner
A 24x7 presentation at Open Repositories 2017 in Brisbane, Australia.
I start with an opinionated history of the evolution of repository data harvesting from the late 1990s to the present. A conclusion is that we are currently in danger of creating a repository environment with fewer cross-repository services than before, with the potential to reinforce the silos we hope to open. I suggest that the community needs to agree upon a new solution, and further suggest that solution should be ResourceSync.
Who's the Author? Identifier soup - ORCID, ISNI, LC NACO and VIAF – Simeon Warner
Identifiers, including ORCID, ISNI, LC NACO and VIAF, are playing an increasing role in library authority work. Well describe changes to cataloging practices to leverage identifiers. We'll then tell a short story of the how and why of ORCID identifiers for researchers, and relationships with other person identifiers. Finally, we'll discuss the use of identifiers as part of moves toward linked data cataloging being explored in Linked Data for Libraries work (in the LD4L Labs and LD4P projects).
IIIF without an image server? No problem! – Simeon Warner
This document discusses how to implement the IIIF Image API using only static files on a web server, without a full image server. It describes how to generate tiles from an image that can be requested by IIIF viewers like OpenSeadragon. The tiles and other files like info.json are hosted using Amazon S3, enabling pan and zoom of the image in Universal Viewer and Mirador. The document emphasizes the importance of using HTTPS for security and avoiding mixed content issues.
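The static-tiles approach can be sketched as follows: enumerate the Image API region/size paths that must exist as files for one zoom level, so a viewer can request them as plain URLs. This is a simplification; real static tile generators also emit scaled full-image sizes, additional zoom levels, and an `info.json`.

```python
def tile_paths(width, height, tile=512):
    """Yield IIIF Image API paths 'region/size/0/default.jpg' for one zoom level."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            w = min(tile, width - x)    # edge tiles may be smaller
            h = min(tile, height - y)
            region = f"{x},{y},{w},{h}"
            size = f"{w},"              # width-only size, as level-0 viewers request
            yield f"{region}/{size}/0/default.jpg"

paths = list(tile_paths(1200, 900, tile=512))
print(len(paths), paths[0])   # 6 tiles for a 1200x900 image
```

Uploading files under these paths to static hosting such as S3 is enough for OpenSeadragon-style pan and zoom, provided everything is served over HTTPS.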
IIIF Technical Specification Status Update – Simeon Warner
This document summarizes updates to the IIIF specifications and groups. It discusses the release of version 1.0 of the IIIF Authentication API in January 2017. It also notes patch releases for the Image and Presentation APIs with minor corrections and clarifications. Finally, it provides updates on the work of various IIIF technical specification groups.
This document discusses the work of the IIIF Discovery Technical Subgroup, which aims to support the discovery of IIIF resources across institutions. It notes that over 335 million IIIF resources are available from over 100 institutions. Key areas the group is addressing include: (1) how to crawl and harvest IIIF resource descriptions from different sources, (2) how to index and link IIIF content to external metadata formats to support search, and (3) how to set up change notification systems and import IIIF content into different viewers. The group is developing recommendations and reference implementations in these areas to facilitate broad discovery of the growing amount of IIIF content available online.
Dandelion Hashtable: beyond billion requests per second on a commodity server – Antonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. On a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
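For orientation, closed addressing with bounded buckets can be sketched in Python. This is a toy model only; it has none of DLHT's lock-freedom, prefetching, or non-blocking resizing, and the bucket bound of 7 is an arbitrary stand-in for entries packed into a cache line.

```python
class ChainedTable:
    """Toy closed-addressing hashtable with bounded buckets."""
    SLOTS_PER_BUCKET = 7   # stand-in for entries packed in one cache line

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        b = self._bucket(key)
        for i, (k, _) in enumerate(b):
            if k == key:
                b[i] = (key, value)    # update in place
                return
        if len(b) >= self.SLOTS_PER_BUCKET:
            raise OverflowError("bucket full; a real design chains another line or resizes")
        b.append((key, value))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def delete(self, key):
        b = self._bucket(key)
        b[:] = [(k, v) for k, v in b if k != key]   # slot is freed instantly

t = ChainedTable()
t.put("a", 1); t.put("b", 2)
t.delete("a")
print(t.get("a"), t.get("b"))   # None 2
```

The contrast with open addressing is visible in `delete`: freeing a slot needs no tombstones and no blocking of other requests to that bucket.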
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin... – Tatiana Kojar
Skybuffer AI, built on the robust SAP Business Technology Platform (SAP BTP), is the latest and most advanced version of our AI development, reaffirming our commitment to delivering top-tier AI solutions. Skybuffer AI harnesses all the innovative capabilities of the SAP BTP in the AI domain, from Conversational AI to cutting-edge Generative AI and Retrieval-Augmented Generation (RAG). It also helps SAP customers safeguard their investments into SAP Conversational AI and ensure a seamless, one-click transition to SAP Business AI.
With Skybuffer AI, various AI models can be integrated into a single communication channel such as Microsoft Teams. This integration empowers business users with insights drawn from SAP backend systems, enterprise documents, and the expansive knowledge of Generative AI. And the best part of it is that it is all managed through our intuitive no-code Action Server interface, requiring no extensive coding knowledge and making the advanced AI accessible to more users.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers – akankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ... – alexjohnson7307
Predictive maintenance is a proactive approach that anticipates equipment failures before they happen. At the forefront of this innovative strategy is Artificial Intelligence (AI), which brings unprecedented precision and efficiency. AI in predictive maintenance is transforming industries by reducing downtime, minimizing costs, and enhancing productivity.
Monitoring and Managing Anomaly Detection on OpenShift.pdf – Tosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
GraphRAG for Life Science to increase LLM accuracy – Tomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf – Chart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Digital Marketing Trends in 2024 | Guide for Staying Ahead – Wask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
Programming Foundation Models with DSPy - Meetup Slides – Zilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency – ScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
Trusted Execution Environment for Decentralized Process Mining – LucaBarbaro3
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
HCL Notes and Domino license cost reduction in the world of DLAU – panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove redundant or unused accounts to save money. There are also some practices that can lead to unnecessary expense, e.g. using a person document instead of a mail-in database for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and the know-how to keep track of things. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
Topics covered:
- Reducing license costs by finding and fixing misconfigurations and redundant accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Real-world examples and best practices you can apply immediately
Oxford Common File Layout (OCFL)
1. Oxford Common File Layout
Rosalyn Metz (Emory),
Simeon Warner (Cornell)
Samvera Connect 2018
http://bit.ly/ocfl-samcon2018
2. Not just us...
OCFL Editorial Group
● Andrew Hankinson (Oxford)
● Neil Jefferies (Oxford)
● Julian Morley (Stanford)
● Andrew Woods (DuraSpace)
● and us (Rosalyn and Simeon)
Community input from the pasig-discuss and ocfl-community groups, and from others
4. BagIt
Well established and implemented specification for handling sets of files
● Being formally standardized as RFC:
https://tools.ietf.org/html/draft-kunze-bagit-17
● Used for transfer and (somewhat less) for files at rest
● Good fixity support
● No explicit versioning support
○ Could use local conventions for version inside a bag
○ Could use bag-per-version
5. Moab: A Brief History
Slides adapted from Julian Morley's in the OR2018 OCFL presentation
● Moab is the closest ancestor of OCFL
● Developed at Stanford Libraries by Richard Anderson
○ Article: http://journal.code4lib.org/articles/8482
● Named after Moab, UT
6. Moab: A Brief History
● Moab is a versioned, forward-delta file structure that supports fixity and file de-duplication.
● You can preserve anything with it (even cat pictures found on the internet)
● The tools to manage and create Moabs are an open-source Ruby gem
○ https://github.com/sul-dlss/moab-versioning
7. Moab is part of the
Stanford Digital Repository
Here be Moabs!
8. Moab in Practice @ Stanford
We have many Moabs in the SDR
● 1.6 million Moab objects
● 5 million version directories
● 50+ million files
● 500+ TB of data (25TB added last month)
● Spread across 15 NFS volumes on NetApp filers
● Backed up by IBM Spectrum Protect (formerly TSM)
○ 1 tape copy kept in local tape frame; 1 sent to Iron Mountain
12. CULAR @ 2017
It worked, what now?
● Fedora 3 no longer being developed; Fedora 4 not an appropriate option
● Decision not to buy "preservation services", primarily on cost grounds
● Decision that we want one local copy for legal access reasons
Short term ⇒ use local disk and AWS S3. Build tools over filesystem and object stores
14. Those files sure are piling up!
Nearly 100TB now, planning 100TB/year digitization
● Plan to purchase a scalable local (object) storage system for 1 copy
● Two more copies in cloud (perhaps tape)
● Content will outlast any application or software system
● Content will outlast any storage system
● Expect change and hence migration ⇒ KISS
18. Shared Cornell and OCFL Goals
● Provide an application and vendor neutral storage arrangement that can be
used with filesystems and object stores
○ Allow easy replication between multiple storage environments
○ Allow easy migration between storage systems (modulo the inherent burdens)
○ Allow use with multiple and changing applications
● Support package versioning at low cost (complexity and storage use)
● Support internal package validation for completeness and fixity
● Support audit and self-description of entire store
● Have an easy migration path from current archival storage arrangements
● Develop a shared model that is useful at multiple institutions so that all benefit
from community developed tools and expertise.
20. Lessons from Emory: Deliverables
Actively engaged in a multi-year effort to gather requirements, design, and develop a digital repository based on the Samvera framework.
Selected deliverables included...
● Develop object definitions/types (e.g. collections, objects, other entities) and their relationships to one another; determine preservation objects inside and outside of Fedora.
● Identify needs for AIP structure.
● Identify storage requirements (e.g. number of copies, file access scenarios)
21. Lessons from Emory: Identified requirements
The means to distribute digital objects to third-party preservation services.
A well understood and well documented model for storing digital objects.
Ability to place multiple copies of digital objects into diverse storage services
(AWS, local storage, etc.).
Easily allow for fixity checking of digital objects.
22. Digital Object
● Content Files (Primary or Supplemental)
○ Content file 1, content file 2, content file 3 ... + additional
○ The content itself: relationships provided in structural metadata
● Metadata (Actionable/Indexed)
○ Descriptive metadata
○ Technical metadata (file-level)
○ Preservation events/audits
○ Administrative metadata
○ Structural metadata (PCDM)
○ Metadata converted to RDF for Hyrax/Fedora - editable and/or searchable
● Supplemental Preservation Files (Metadata/Administrative Files)
○ Source metadata, descriptive metadata record, METS, license/agreement, supplemental PREMIS (each a binary file)
○ Variable supplemental info stored as files (not directly system-readable): staff can view or download a file to read it
23. Major Emory Entities PCDM Context - Simple Example
● Collections reflect the process the libraries followed when deciding to collect materials.
● Digital Objects must be a part of an Administrative Collection and optionally in one or more Collections.
● Digital Objects may contain one or more files.
● Digital Objects and Collections receive Emory-defined metadata and relationships.
● Individual Agreements contain information about the Administrative Collection.
● Individual Agreements may contain one or more files.
● Individual Agreements are assigned to objects through their parent Collection.
Example entities: the Digital Object "Statuette of a Cat" is a member of the Collection "Divine Felines Exhibition" and of the Collection "Ancient Egyptian Collection"; the Administrative Collection "Carlos Museum Administrative" has the Individual Agreement "Carlos Museum Agreement".
25. OCFL Requirements
1) Completeness, so that a repository can be
rebuilt from the files it stores,
2) Parsability, both by humans and machines,
most importantly in the absence of original
software,
3) Robustness, against errors, corruption, and
migration between storage technologies, and
4) Storage, on a variety of infrastructures
including cloud object stores.
Many existing digital preservation
standards like:
● TDR (ISO 16363)
● OAIS (ISO 14721)
● NDSA Levels of Preservation
● BagIt
discuss the need for these requirements, but none provides a standardized way to do it.
27. OCFL Object
A group of one or more content files and
administrative information identified by a
URI.
The object may contain a sequence of versions
of the files organized into version directories.
The base directory of the object may contain a logs directory and contains a NAMASTE file indicating conformance.
An object contains an inventory digest file which provides a digest for the inventory.json file.
[object root]
├── 0=ocfl_object_1.0
├── inventory.json
├── inventory.json.sha512
├── v1
│ ├── empty.txt
│ ├── foo
│ │ └── bar.xml
│ ├── image.tiff
│ ├── inventory.json
│ └── inventory.json.sha512
├── v2
│ ├── foo
│ │ └── bar.xml
│ ├── inventory.json
│ └── inventory.json.sha512
└── v3
├── inventory.json
└── inventory.json.sha512
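The inventory digest sidecar can be checked with a few lines of code. A minimal sketch (the helper name is mine, not from the spec), assuming the sidecar holds the sha512 hex digest followed by the filename:

```python
# Illustrative helper (not official OCFL tooling): verify the inventory
# sidecar. The sidecar file (inventory.json.sha512) is assumed to contain
# "<sha512 hex digest> inventory.json".
import hashlib

def verify_inventory_sidecar(inventory_bytes, sidecar_text):
    # First whitespace-separated token is the expected digest.
    expected = sidecar_text.split()[0]
    actual = hashlib.sha512(inventory_bytes).hexdigest()
    return actual == expected
```

A writer produces the sidecar the same way: hash the serialized inventory.json bytes and write the digest plus filename.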
28. OCFL Object
An object contains an inventory.json file
which inventories the contents of an object.
The manifest block lists all the digests and
existing file paths for all of the object’s content.
The versions block identifies the logical file path
and the digest for each version of the object’s
content.
Separating the logical file path from the
existing file path and using digests to refer to
files allows for deduplication of content.
{
"head": "v3",
"id": "ark:/12345/bcd987",
"manifest": {
"4d27c8...b53": [ "v2/foo/bar.xml" ],
"7dcc35...c31": [ "v1/foo/bar.xml" ],
"cf83e1...a3e": [ "v1/empty.txt" ],
"ffccf6...62e": [ "v1/image.tiff" ]
},
"type": "Object",
"versions": [
{
"created": "2018-01-01T01:01:01Z",
"message": "Initial import",
"state": {
"7dcc35...c31": [ "foo/bar.xml" ],
"cf83e1...a3e": [ "empty.txt" ],
"ffccf6...62e": [ "image.tiff" ]
},
"type": "Version",
"user": {
"address": "alice@example.com",
"name": "Alice"
},
"version": "v1"
},
{
"created": "2018-02-02T02:02:02Z",
"message": "Fix bar.xml, remove image.tiff,
29. OCFL Storage Root
The base directory of an OCFL storage layout.
Should also contain the OCFL specification in
human-readable plain-text format.
Should contain the conformance declaration.
OCFL Objects may conform to the same or an earlier version of the specification.
The storage hierarchy must terminate with an
OCFL Object Root.
[storage root]
├── 0=ocfl_1.0
├── ocfl_1.0.txt (optional)
├── ab12cd34
│ ├── 0=ocfl_object_1.0
│ ├── inventory.json
│ ├── inventory.json.sha512
│ └── v1
│ ├── file.txt
│ ├── inventory.json
│ └── inventory.json.sha512
└── ef56gh78
    ├── 0=ocfl_object_1.0
    ├── inventory.json
    ├── inventory.json.sha512
    ├── v1
    │ ├── empty.txt
    │ ├── foo
    │ │ └── bar.xml
    │ ├── image.tiff
    │ ├── inventory.json
    │ └── inventory.json.sha512
    └── v2
        ├── foo
        │ └── bar.xml
        ├── inventory.json
        └── inventory.json.sha512
30. OCFL Storage Root
Storage hierarchies must not include files within intermediate directories.
Storage hierarchies must be terminated by OCFL Object Roots.
Storage hierarchies within the same OCFL Storage Root should use just one layout pattern.
Storage hierarchies within the same OCFL Storage Root should consistently use either a directory hierarchy of OCFL Objects or top-level OCFL Objects.
[storage root]
├── 0=ocfl_1.0
├── ocfl_1.0.txt (optional)
└── ab
└── 12
└── cd
└── 34
└── ab12cd34
├── 0=ocfl_object_1.0
├── inventory.json
├── inventory.json.sha512
├── v1
│ ├── empty.txt
│ ├── foo
│ │ └── bar.xml
│ ├── image.tiff
│ ├── inventory.json
│ └── inventory.json.sha512
└── v2
├── foo
│ └── bar.xml
├── inventory.json
└── inventory.json.sha512
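The hierarchical layout above can be computed mechanically from the object identifier. A minimal sketch, assuming a pairtree-like convention of two-character segments (this particular layout is a local choice, not mandated by OCFL):

```python
# Compute an OCFL object root path under a storage root by splitting the
# object id into fixed-width directory segments, then appending the full
# id as the object root directory. Parameters are illustrative defaults.
def object_root(storage_root, obj_id, width=2, depth=4):
    parts = [obj_id[i:i + width] for i in range(0, width * depth, width)]
    return "/".join([storage_root] + parts + [obj_id])

print(object_root("/ocfl", "ab12cd34"))  # /ocfl/ab/12/cd/34/ab12cd34
```

Splitting ids this way keeps the number of entries per directory bounded, which matters on filesystems that degrade with very large directories.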
32. Rebuildability
● Key OCFL goal -- be able to rebuild the repo from an OCFL storage root
● Therefore, in OAIS terms: must include all the descriptive, administrative, structural, representation, and preservation metadata relevant to the object.
● Optionally include a copy of the spec in the top level of the OCFL storage root
● A more complete option would be a specific OCFL object that contains this documentation, with a pointer to its location in the storage root.
Filesystem metadata
e.g. permissions, access, and creation times
● not portable between filesystems
● not preservable through file transfer operations
● ill-defined fixity
⇒ out-of-scope. If important, use a filesystem image format or extract as metadata.
33. Empty Directories
● OCFL preserves files and their content
● Directories serve as an organizational convention
● Empty directories not directly supported
⇒ Use a zero-length `.keep` file as necessary (à la `git`, BagIt)
Data and Metadata
The only special files are the inventory, its digest file, and conformance declaration files. Otherwise OCFL makes no distinction between different types of files.
⇒ Use local conventions as needed
34. Storage
● Filesystem or Object Store -- you choose
● Original filename or Normalized filename -- you choose
● Deduplication & Forward delta differencing (at file level) --
optional but likely desirable/normal
"logical file path" - path of file in content as part of state for a particular version
"existing file path" - path of file in OCFL object
content addressing ties these two together
36. File operations (mungification?)
Yes - OCFL supports that...
● Inheritance
● Addition
● Updating
● Renaming
● Deletion
● Reinstatement
● Purging ⇒ choices:
a. rebuild new object
b. break immutability and rewrite (not recommended)
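These operations fall out of comparing the state blocks of two versions. A hypothetical sketch (function and digest names are illustrative, and it simplifies by assuming one path per digest) that classifies additions, updates, renames, and deletions:

```python
# Classify file operations between two version states. Each state is a
# dict mapping digest -> list of logical paths, as in the inventory's
# versions block. Assumes at most one path per digest for rename detection.
def diff_states(old, new):
    old_files = {p: d for d, paths in old.items() for p in paths}
    new_files = {p: d for d, paths in new.items() for p in paths}
    ops = {"added": [], "updated": [], "deleted": [], "renamed": []}
    for path, digest in sorted(new_files.items()):
        if path in old_files:
            if old_files[path] != digest:
                ops["updated"].append(path)          # content changed in place
        elif digest in old and old[digest][0] not in new_files:
            ops["renamed"].append((old[digest][0], path))  # same digest, new path
        else:
            ops["added"].append(path)
    for path in sorted(old_files):
        if path not in new_files and not any(path == o for o, _ in ops["renamed"]):
            ops["deleted"].append(path)
    return ops

v1 = {"7dcc35": ["foo/bar.xml"], "ffccf6": ["image.tiff"]}
v2 = {"4d27c8": ["foo/bar.xml"], "ffccf6": ["picture.tiff"]}
print(diff_states(v1, v2))
# e.g. foo/bar.xml updated; image.tiff renamed to picture.tiff
```

Reinstatement is the same mechanism: a later version's state simply references a digest that appears in an older version's content.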
37. Version Immutability
OCFL supports systems where versions (everything in a given version directory) are immutable once written.
● It is recommended to follow this practice
● BUT you can rewrite objects if you really want to
Deduplication
OCFL supports (in fact, enforces for internal references) deduplication through digests
● Only within an object
● File level
● sha512 digest recommended
38. Forward Delta
Each version need only include new and changed files
● Files from previous versions included by reference
● Reference by content (digest) supports renaming without duplicating
(You can avoid this and include files again if you really want. But why?)
Fixity
1. Digests used for reference already provide the basis for strong fixity checks (pref. sha512)
2. Additional digests may be included to support legacy fixity information (e.g. md5)
(Fixity of the inventory files themselves is handled by a sidecar file, e.g. inventory.json.sha512)
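A manifest-driven fixity check can be sketched as follows (illustrative code, not official tooling): recompute each content file's sha512 and compare it with the digest it is stored under.

```python
# Walk the manifest, rehash every content file, and report any path whose
# recomputed sha512 does not match the digest it is filed under.
import hashlib
import os

def check_fixity(object_root, manifest):
    """manifest: dict mapping sha512 hex digest -> list of content paths."""
    failures = []
    for expected, paths in manifest.items():
        for rel_path in paths:
            h = hashlib.sha512()
            with open(os.path.join(object_root, rel_path), "rb") as f:
                # Read in 1 MiB chunks so large files don't exhaust memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            if h.hexdigest() != expected:
                failures.append(rel_path)
    return failures
```

An empty return value means every stored file still matches its manifest digest; any path listed has been corrupted or altered.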
39. Log Information
A log directory in an OCFL object is available for information that is not part of the object's content and is not versioned
● form not specified
● will be ignored in object validation
Small Files
Objects with many small files may cause problems with some storage infrastructures and may make validation/fixity time-consuming
● package in a single file (ZIP recommended)
(Options for a later version of the OCFL spec are ZIPped objects and/or ZIP by version)
40. Roadmap
Alpha (yesterday)
● Released(ish) on October 10 community call
(OCFL Editors and PASIG Discuss)
● Feedback for November community call
Beta (date based on feedback)
● Experimental validation tool
● Determine what other groups and communities to seek input from
Release 1.0 (2019)
● One production-ready validator
● Test suite and fixture objects
● Two institutions committed to backing the
initiative (should define that)