Using OCLC Data sync to enhance records in your ILS with information from the WorldCat master record, minimizing time spent on cataloging. Presented at the Te Puna forum May 2017, Wellington, New Zealand.
Integrating music resources from one library to another
A presentation on how we used MS Excel and MarcEdit to create a MARC file using Massey University bibliographic and item data to batch load into Voyager / Alma.
A presentation from CatSIG's Professional Development Day in Auckland, New Zealand on 16 October 2016.
http://www.lianza.org.nz/presentations-catsigs-professional-development-day-now-available
MARC (Machine Readable Cataloging) is an international standard format for bibliographic data. It allows catalog records to be shared and processed by different library systems. The MARC standard ensures compatibility and enables efficient management of catalog records across libraries. Some key aspects include standardized field and tag definitions, a predictable record structure, and established formats like MARC21 that are used internationally. Common errors to avoid include incorrect field codes and indicators, typographical mistakes, failure to follow punctuation conventions, and not accounting for how one's library system handles specific MARC fields.
This document provides an overview of batch processing in OCLC and outlines the basic workflow. It defines common types of batch processing projects like reclamation, retrospective conversion, and ongoing batchloading. The workflow involves ordering a project, submitting data files, pre-processing and matching records at OCLC, and receiving output reports and records. Output options include cross reference reports, records with OCLC numbers merged, and full OCLC-MARC records. Documentation and support resources are also listed.
The document describes the functionalities of the technical processing section of the MUET Library & Online Information Center. It discusses activities like classification, cataloguing, and preparing MARC records. It provides examples of MARC records and describes the daily workflow of the technical processing section, including receiving documents, harvesting records, editing records, generating barcodes, and transferring materials to departments. Performance statistics for 2012 show 1800 books were purchased, with 1140 processed so far and 600 still under process.
This document provides an overview of OCLC's batch processing services for loading bibliographic records. It discusses the different types of batchload projects, output options, and the basic workflow process. The key points covered are:
1) The types of batchload projects include standard bibliographic loads, cross reference reports, and full OCLC-MARC record returns.
2) Output options for batchload projects include cross reference reports, records with OCLC numbers merged in, and full OCLC records with local data merged in.
3) The basic workflow involves ordering a project, submitting data files to OCLC, pre-processing and matching records, and receiving output reports and/or records.
4) Batchloading can help synchronize a library's bibliographic records with WorldCat and keep them up-to-date. A library can order various batchload projects including reclamation projects to set holdings, retrospective projects to fill gaps, and ongoing batchloads to maintain current holdings. The document describes how to order standard and non-standard batchload projects through OCLC and ensure optimal processing of records and output of OCLC numbers and reports.
Batchloading can help synchronize a library's bibliographic records with WorldCat and keep them up-to-date. A library can order various batchload projects including reclamation projects to set holdings, retrospective projects to obtain OCLC numbers, and ongoing batchloads to maintain current holdings. The document describes how to order standard and non-standard batchload projects through OCLC and the processing and output options available.
This document discusses using JSON in Oracle Database 18c/19c. It begins by introducing the presenter and their background. It then covers storing and querying JSON in the database using various SQL and PL/SQL features like JSON_QUERY, JSON_OBJECT, and JSON_TABLE. The document discusses how different SQL data types are converted to JSON. It shows examples of converting rows to CSV, XML, and JSON formats for data exchange. In summary, the document provides an overview of Oracle Database's support for JSON focusing on using it properly for data exchange and storage.
This document discusses using JSON in Oracle Database 18c/19c. It begins with an introduction of the author and their experience. It then covers how JSON is used everywhere in databases and applications now. The main topic is using JSON as intended - as a data interchange format rather than for storage. It highlights the advantages of JSON over CSV and XML for data exchange. It also demonstrates how to transform between SQL and JSON data types and discusses some of the challenges around data formats and encoding.
The document provides an overview of SSIS connectivity options for Oracle, DB2 and SAP databases. It discusses the various connectors that can be used to extract, load and transform data between these enterprise databases and SQL Server. Performance tests were conducted using these connectors to load and extract data from Oracle, DB2 and SAP systems. Tips are provided on optimizing extraction and loading speeds by leveraging data type conversions and parallel processing capabilities.
This document provides an overview of how bibliographic, holding, and item data for a print monograph titled "Philosophy: the quest for truth and meaning" is represented in the Voyager integrated library system. It demonstrates the relationships between the MARC record, bibliographic record, holding record, item record, and relevant database tables. Basic SQL queries are presented to retrieve and view specific data from tables such as bib_text, bib_index, and others.
Practical data visualization using Excel and Python (ElmaLyrics)
This document provides examples of different types of data visualizations that can be created using Excel, Octave/Matlab, R, and Python. It discusses creating line graphs, bar charts, scatterplots, histograms, boxplots, heatmaps, and treemaps from various sample datasets to visualize trends in data over time, relationships between variables, and the distribution of variable values. Code examples are provided for creating these visualizations in Octave/Matlab, R, and Python.
ECU ODS data integration using OWB and SSIS, UNC Cause 2013 (Keith Washer)
ECU’s Extract Transform and Load (ETL) Framework consists of two paths for loading external data into the Operational Data Store (ODS): Non-Oracle Data Sources (Microsoft SQL Server, MS Access databases, web services) and Oracle data sources. The paths are controlled by the external system and the mechanism to connect and extract the data. When the external system does not allow for an Oracle to Oracle connection, Microsoft SQL Server Integration Services (SSIS) is used as the foundation for the Non-Oracle data source path. When the external systems allows for an Oracle to Oracle connection the Oracle Data Source path is selected.
In this session we will present several major projects showcasing how ECU leverages Microsoft SQL Server Integration Services (SSIS), Oracle Streams, and the Ellucian/Banner ODS ETL process to load various types of external data into the Ellucian/Banner Operational Data Store (ODS).
Tired of trying to maintain an ever-changing list of eSerials holdings? Looking for ways to automatically set and maintain your library’s eSerials holdings in WorldCat? The OCLC eSerials Holdings service has what you need!
Attend this session to learn how your library can:
* Make your electronic content more visible and accessible - without adding to your cataloging workload
* Automate setting and maintaining your library’s location holdings in WorldCat
* Increase the value of your investment in A-Z lists, OpenURL resolver, WorldCat, and WorldCat Collection Analysis
* Increase usage of your electronic serials collection
* Automatically control and/or deflect ILL requests from colleagues in the OCLC cooperative
* Get started using the eSerials Holdings service - available at no charge to OCLC member libraries
Presented by Christa Burns as the NEBASE Hour on December 5, 2007.
Benjamin Ferguson presented on using OCLC's Reclamation Batchload service to update a library's records in OCLC's WorldCat catalog. The process involves submitting bibliographic records from the local catalog to OCLC, which will match the records to WorldCat, update local holdings, and remove outdated holdings. Preparation is important, such as fixing cataloging issues. Challenges included problem records, missing records, and vendor-supplied records that needed fixes to the 001 field before overlaying updates. It is recommended to thoroughly prepare records beforehand and closely monitor the results for completeness.
SAP Business Objects XIR3.0/3.1, BI 4.0 & 4.1 Course Content
SAP Business Objects Web Intelligence and BI Launch Pad 4.0
Introducing Web Intelligence
BI launch pad: What's new in 4.0
Customizing BI launch pad
Creating Web Intelligence Documents with Queries
Restricting Data Returned by a Query
Report Design in the Java Report Panel
Enhancing the Presentation of Reports
Formatting Reports
Creating Formulas and Variables
Synchronizing Data
Analyzing Data
Drilling
Filtering data
Alerts
Input Control
Scheduling (email)
Data Refresh introduction
Sharing Web Intelligence Documents
SAP Business Objects BI Information Design Tool 4.0
Create a project
Create a connection to a relational database
Create a data foundation based on a single source relational database
Create a business layer based on a single relational data source
Publish a new universe file based on a single data source
Retrieve a universe from a repository location
Publish a universe to a local folder
Retrieve a universe from a local folder
Open a local project
Delete a local project
Convert a repository universe from a UNV to a UNX
Convert a local universe from a UNV to a UNX
Connecting to Data Sources
Create a connection shortcut
View and filter data source values in the connection editor
Create a connection to an OLAP data source
Create a BICS connection to SAP BW for client tools
Create a relational connection to SQL Server using OLEDB providers
Building the Structure of a Universe
Arrange tables in a data foundation
View table values in a data foundation
View values from multiple tables in a data foundation
Filter table values in a data foundation
Filter values from multiple tables in a data foundation
Apply a wildcard to filter table values in a data foundation
Apply a wildcard to filter values from multiple tables in a data foundation
Sort and re-order table columns in a data foundation
Edit table values in a data foundation
Create an equi-join, theta join, outer join, shortcut join
Create a self-restricting join using a column filter
Modify and remove a column filter
Detect join cardinalities in a data foundation
Manually set join cardinalities in a data foundation
Refresh the structure of a universe
Creating the Business Layer of a Universe
Create business layer folders and subfolders
Create a business layer folder and objects automatically from a table
Create a business layer subfolder and objects automatically from a table
Create dimension objects automatically from a table
Create a dimension, attribute, measure
Hide folders and objects in a business layer
Organize folders and subfolders in a business layer
View table and object dependencies
Create a custom navigation path
Create a dimensional business layer from an OLAP data source
Copy and paste folders and objects in a business layer
Filtering Data in Objects
Create a pre-defined filter
The document discusses best practices for using Oracle Database In-Memory. It provides an overview of In-Memory and describes how to configure and populate the In-Memory Column Store. It also discusses how the optimizer utilizes In-Memory statistics and hints to optimize queries for In-Memory. Several examples of queries that benefit from In-Memory, such as aggregation queries and queries with predicates, are also provided.
How to Implement Distributed Data Store (Philip Zhong)
This document discusses the design of an XQuery engine data storage system. It covers major features like data query and storage. It describes the data flow and data structures. It also outlines rules for selectivity calculation, SQL generation, database high availability and monitoring. Performance test results are provided for different queries on large tables with and without indexes.
The document discusses moving beyond simply moving bytes in stream processing and instead focusing on understanding data semantics through the use of a schema registry. A schema registry is a centralized service for storing and retrieving schemas to support serialization and deserialization across applications and systems. Several existing schema registries are described, along with how schemas can be referenced in messages rather than embedded. The use of a schema registry in a data pipeline is demonstrated. Finally, the document discusses implementing serialization and deserialization using schemas with Apache Flink.
Streaming applications almost always require a schema. This is because the most interesting operations that can be applied to a data stream -- projection, scaling, aggregation, filtering, joining, streaming SQL -- all require you to know something about the types and values of fields in your data; otherwise you’re just moving bytes and counting anonymous things. This talk is an introduction and overview of shared schema registries [1,2] with a demonstration of how they can be integrated into Apache Flink pipelines to centralize schema management and enable schema reuse across data flow systems (e.g., from Apache Kafka or Apache NiFi to Flink and back again). We will begin with a discussion about the shortcomings of the common practice of embedding schemas and generated classes in code projects, followed by an illustration of essential registry features (e.g., centralization, versioning, transformation and validation) as they appear in both Confluent’s and Hortonworks’s schema registries. And, we’ll close with a detailed look at how these schema registries can be integrated into Flink serializers, sources and sinks. 1. https://github.com/confluentinc/schema-registry 2. http://github.com/hortonworks/registry
Building Robust ETL Pipelines with Apache Spark (Databricks)
Stable and robust ETL pipelines are a critical component of the data infrastructure of modern enterprises. ETL pipelines ingest data from a variety of sources and must handle incorrect, incomplete or inconsistent records and produce curated, consistent data for consumption by downstream applications. In this talk, we’ll take a deep dive into the technical details of how Apache Spark “reads” data and discuss how Spark 2.2’s flexible APIs; support for a wide variety of datasources; state of art Tungsten execution engine; and the ability to provide diagnostic feedback to users, making it a robust framework for building end-to-end ETL pipelines.
The document discusses SQLite, a widely used lightweight database format. It notes that SQLite databases are commonly used in smartphones and applications to store structured data. The document outlines challenges in recovering deleted data from SQLite databases and introduces an advanced SQLite recovery tool being developed by viaForensics. It provides information on SQLite database structure, including pages, B-trees, records, and data types. It also discusses viewing and analyzing SQLite databases using command line and graphical tools.
Spark and Cassandra with the Datastax Spark Cassandra Connector
How it works and how to use it!
Missed Spark Summit but still want to see some slides?
This slide deck is for you!
The document discusses the Datastax Spark Cassandra Connector. It provides an overview of how the connector allows Spark to interact with Cassandra data, including performing full table scans, pushing down filters and projections to Cassandra, distributed joins using Cassandra's partitioning, and writing data back to Cassandra in a distributed way. It also highlights some recent features of the connector like support for Cassandra 3.0, materialized views, and performance improvements from the Java Wildcard Cassandra Tester project.
Rajat Venkatesh from Qubole presented on Quark, a virtualization engine for analytics. Quark uses a multi-store architecture to optimize queries using materialized views, predicate injection, and denormalized/sorted tables. It supports multiple SQL and storage engines. The roadmap includes improvements to the cost-based optimizer, support for OLAP cubes, and developing Quark as a service. Coordinates for the Quark GitHub and mailing list were provided.
Making Apache Spark Better with Delta Lake (Databricks)
Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake offers ACID transactions, scalable metadata handling, and unifies the streaming and batch data processing. It runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
In this talk, we will cover:
* What data quality problems Delta helps address
* How to convert your existing application to Delta Lake
* How the Delta Lake transaction protocol works internally
* The Delta Lake roadmap for the next few releases
* How to get involved!
Presentation of RAW, a prototype query engine which enables querying heterogeneous data sources transparently using Just-In-Time access paths. Presentation given at the 40th International Conference on Very Large Databases (VLDB 2014)
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc... (DanBrown980551)
This LF Energy webinar took place June 20, 2024. It featured:
-Alex Thornton, LF Energy
-Hallie Cramer, Google
-Daniel Roesler, UtilityAPI
-Henry Richardson, WattTime
In response to the urgency and scale required to effectively address climate change, open source solutions offer significant potential for driving innovation and progress. Currently, there is a growing demand for standardization and interoperability in energy data and modeling. Open source standards and specifications within the energy sector can also alleviate challenges associated with data fragmentation, transparency, and accessibility. At the same time, it is crucial to consider privacy and security concerns throughout the development of open source platforms.
This webinar will delve into the motivations behind establishing LF Energy’s Carbon Data Specification Consortium. It will provide an overview of the draft specifications and the ongoing progress made by the respective working groups.
Three primary specifications will be discussed:
-Discovery and client registration, emphasizing transparent processes and secure and private access
-Customer data, centering around customer tariffs, bills, energy usage, and full consumption disclosure
-Power systems data, focusing on grid data, inclusive of transmission and distribution networks, generation, intergrid power flows, and market settlement data
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
Taking AI to the Next Level in Manufacturing.pdf (ssuserfac0301)
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
The Department of Veteran Affairs (VA) invited Taylor Paschal, Knowledge & Information Management Consultant at Enterprise Knowledge, to speak at a Knowledge Management Lunch and Learn hosted on June 12, 2024. All Office of Administration staff were invited to attend and received professional development credit for participating in the voluntary event.
The objectives of the Lunch and Learn presentation were to:
- Review what KM ‘is’ and ‘isn’t’
- Understand the value of KM and the benefits of engaging
- Define and reflect on your “what’s in it for me?”
- Share actionable ways you can participate in Knowledge Capture & Transfer
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an... (Jason Yip)
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip, presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
At this talk we will discuss DDoS protection tools and best practices, network architectures, and what AWS has to offer. Also, we will look into one of the largest DDoS attacks on Ukrainian infrastructure, which happened in February 2022. We'll see what techniques helped to keep the web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on the Ukraine experience.
Main news related to the CCS TSI 2023 (2023/1695) (Jakub Marek)
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Must Know Postgres Extension for DBA and Developer during Migration (Mydbops)
Mydbops Opensource Database Meetup 16
Topic: Must-Know PostgreSQL Extensions for Developers and DBAs During Migration
Speaker: Deepak Mahto, Founder of DataCloudGaze Consulting
Date & Time: 8th June | 10 AM - 1 PM IST
Venue: Bangalore International Centre, Bangalore
Abstract: Discover how PostgreSQL extensions can be your secret weapon! This talk explores how key extensions enhance database capabilities and streamline the migration process for users moving from other relational databases like Oracle.
Key Takeaways:
* Learn about crucial extensions like oracle_fdw, pgtt, and pg_audit that ease migration complexities.
* Gain valuable strategies for implementing these extensions in PostgreSQL to achieve license freedom.
* Discover how these key extensions can empower both developers and DBAs during the migration process.
* Don't miss this chance to gain practical knowledge from an industry expert and stay updated on the latest open-source database trends.
Mydbops Managed Services specializes in taking the pain out of database management while optimizing performance. Since 2015, we have been providing top-notch support and assistance for the top three open-source databases: MySQL, MongoDB, and PostgreSQL.
Our team offers a wide range of services, including assistance, support, consulting, 24/7 operations, and expertise in all relevant technologies. We help organizations improve their database's performance, scalability, efficiency, and availability.
Contact us: info@mydbops.com
Visit: https://www.mydbops.com/
Follow us on LinkedIn: https://in.linkedin.com/company/mydbops
For more details and updates, please follow up the below links.
Meetup Page : https://www.meetup.com/mydbops-databa...
Twitter: https://twitter.com/mydbopsofficial
Blogs: https://www.mydbops.com/blog/
Facebook(Meta): https://www.facebook.com/mydbops/
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers (akankshawande)
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...Fwdays
Direct losses from downtime in 1 minute = $5-$10 thousand dollars. Reputation is priceless.
As part of the talk, we will consider the architectural strategies necessary for the development of highly loaded fintech solutions. We will focus on using queues and streaming to efficiently work and manage large amounts of data in real-time and to minimize latency.
We will focus special attention on the architectural patterns used in the design of the fintech system, microservices and event-driven architecture, which ensure scalability, fault tolerance, and consistency of the entire system.
AppSec PNW: Android and iOS Application Security with MobSF (Ajin Abraham)
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
Using OCLC Data sync to enhance records in your ILS
1. Using OCLC Data sync to enhance records in your ILS with information from the WorldCat master record, minimizing time spent on cataloging.
Presented by
Skalk van der Merwe (Discovery Systems Librarian)
2. Statistics on Victoria Catalogue…
[Chart: % of Records with OCLC Numbers pre and post record upgrade work – Book, Computer file, Journal, Map, Mixed material, Music, Visual material; series Nov-16 vs Mar-17]
[Chart: % of Records with no Classification – same formats; series Nov-16 vs Mar-17]
3. What is OCLC Datasync
• Data synchronization is an automated service which allows you to synchronize your holdings with WorldCat to make your collections visible and available through OCLC services by:
– Adding original cataloging records to WorldCat
– Enhancing records in your ILS with information from the WorldCat master record, minimizing time spent on cataloging
– Matching records from your local catalog with records in WorldCat
– Managing local holdings data
– Setting or deleting holdings for single institutions or groups to accurately reflect what is in your collection
– Updating your holdings in WorldCat with additional local bibliographic data
4. Step 1: Get OCLC Numbers into bibliographic records with no OCLC number
1. Publish MARC records with no OCLC Numbers for resolution to get the OCLC Number. (Published over 400,000 MARC records!)
2. Download the cross-reference files from the OCLC FTP server: metacoll.*.datasync.*.*.*.xrefrpt.txt
3. Download the MARC update files from the OCLC FTP server: metacoll.*.datasync.*.*.WorldCatRecords.*.*.mrc
4. Concatenate these files
5. Process these files with Notepad++ and MS Excel (a scripted alternative is sketched below)
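For steps 2–4, a minimal Python sketch of the download-and-concatenate work. The host, credentials and local file names are placeholders (the remote directory is the updates path named later in these notes), so treat it as an outline rather than a finished tool:

import ftplib
import glob

# Placeholder host and credentials -- substitute your OCLC file exchange details.
with ftplib.FTP('ftp.example.oclc.org') as ftp:
    ftp.login('oclc_user', 'oclc_password')
    # MARC update files live under /xfer/metacoll/out/ongoing/updates
    ftp.cwd('/xfer/metacoll/out/ongoing/updates')
    for name in ftp.nlst():
        if name.endswith('.mrc'):
            with open(name, 'wb') as local_file:
                ftp.retrbinary('RETR ' + name, local_file.write)

# Binary MARC is record-delimited (each record ends with byte 0x1D),
# so the update files can be concatenated directly.
with open('all_worldcat_updates.mrc', 'wb') as combined:
    for name in sorted(glob.glob('metacoll.*.WorldCatRecords.*.mrc')):
        with open(name, 'rb') as part:
            combined.write(part.read())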
6. Use MarcEdit to Create Short records
1. Click the Delimited Text Translator option.
2. Supply Source file and Output file details:
i. The Source File contains the Local System Number and OCLC number.
ii. The Output File will contain MARC records.
3. Click Edit LDR/008
4. Remove 008 data.
5. Click OK
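The same short records can also be generated by script instead of through MarcEdit. A minimal sketch, assuming pymarc 5.x and a hypothetical two-column CSV of local system number / OCLC number pairs taken from the cross-reference report:

import csv
from pymarc import Field, MARCWriter, Record, Subfield

with open('xref_pairs.csv', newline='') as pairs, \
        open('short_records.mrc', 'wb') as out:
    writer = MARCWriter(out)
    for local_id, oclc_number in csv.reader(pairs):
        record = Record()
        # 001 holds the local system number the LMS will match on.
        record.add_field(Field(tag='001', data=local_id))
        # 035 $a holds the OCLC number with its (OCoLC) prefix.
        record.add_field(Field(
            tag='035',
            indicators=[' ', ' '],
            subfields=[Subfield(code='a', value=f'(OCoLC){oclc_number}')],
        ))
        writer.write(record)
    writer.close()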
7. Process Short Records in MarcEdit and Load Into the LMS
=LDR 00087nam a2200049Ia 4500
=001 996887014002386
=035 $a(OCoLC)173411524
=LDR 00085nam a2200049Ia 4500
=001 991984504002386
=035 $a(OCoLC)2787314
=LDR 00085nam a2200049Ia 4500
=001 995134274002386
=035 $a(OCoLC)4506400
=LDR 00087nam a2200049Ia 4500
=001 993734314002386
=035 $a(OCoLC)966359245
1. In LMS create a Merge Import profile that matches on 001.
2. In LMS create a merge import rule: if an 035 $a with an (OCoLC) prefix exists in the existing record, replace that 035 with the 035 from the incoming record; if it doesn't exist, the 035 from the incoming record is added to the existing record.
3. In Alma, use Sheffield University's rule:
rule "Replace 035 OCLC number"
when
merge
then
replace MARC."035" when MARC."035"."a"
contains "(OCoLC)" excluding
MARC."035"("9","9")
end
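To illustrate the rule's effect with hypothetical values: an existing (OCoLC) 035 is swapped for the incoming one, while an 035 carrying a $9 subfield is excluded from replacement and left untouched.

Existing record:  =035 \\$a(OCoLC)ocm00012345
Incoming record:  =035 \\$a(OCoLC)173411524
After merge:      =035 \\$a(OCoLC)173411524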
8. Load into LMS the WorldCat DataSync records
• In LMS create an Import rule that matches on 035 $a
• Load the metacoll.*.datasync.*.*.WorldCatRecords.*.*.mrc file(s) into LMS
• You then have the option to:
– Replace the whole record
– Merge only the tags you want
• In Alma, create a Merge Rule:
9. Alma create a “Merge Rule”
rule "Merge WorldCat import records for All field"
when
merge
then
remove MARC."001"
remove MARC."003"
remove MARC."019"
remove MARC."029"
remove MARC."9"XX excluding "945,950,957,980,980,984,994"
add MARC."035" when MARC."035"."a" contains "NZ-WeVUL"
add MARC."035" when MARC."035"."a" contains "nzNZBN"
replace MARC.control."007"
replace MARC.control."008"
replace MARC."010" if exists
replace MARC."020" if exists
replace MARC."022" if exists
replace MARC."035" when MARC."035"."a" contains "(OCoLC)"
excluding MARC."035"("9","9")
Retain from existing record}
10. Alma create a “Merge Rule” cont…
replace MARC."041" if exists
replace MARC."043" if exists
replace MARC."045" if exists
replace MARC."048" if exists
replace MARC."050" if exists
replace MARC."082" if exists
replace MARC."1"XX if exists
replace MARC."24"X if exists
replace MARC."3"XX if exists
replace MARC."4"XX if exists
11. Alma create a “Merge Rule” cont…
replace MARC."507" if exists
replace MARC."508" if exists
replace MARC."511" if exists
replace MARC."513" if exists
replace MARC."514" if exists
replace MARC."515" if exists
replace MARC."516" if exists
replace MARC."518" if exists
replace MARC."521" if exists
replace MARC."522" if exists
replace MARC."524" if exists
replace MARC."525" if exists
replace MARC."533" if exists
replace MARC."547" if exists
replace MARC."550" if exists
replace MARC."555" if exists
replace MARC."556" if exists
replace MARC."567" if exists
replace MARC."585" if exists
replace MARC."586" if exists
replace MARC."588" if exists
replace MARC."6"XX if exists
replace MARC."7"XX if exists
replace MARC."8"XX if exists
add MARC."050" if does not exists
add MARC."500" if does not exists
add MARC."501" if does not exists
add MARC."502" if does not exists
add MARC."504" if does not exists
add MARC."505" if does not exists
add MARC."520" if does not exists
add MARC."546" if does not exists
end
Retain from existing record
Today I am going to talk about how Victoria University is planning to use OCLC Data sync to enhance records in our ILS with information from WorldCat, minimizing time spent on cataloging.
To see where DataSync might be of use, I ran a snapshot analysis report on our Alma print / physical collection records.
Here we can see, for example, the percentage of bib records without OCLC Numbers and the number of bib records without Classification.
Notice how the percentage of bib records with OCLC Numbers increased from 71% to 97% (a 26% increase). This has an impact on:
Collection Evaluation, e.g. duplicate detection
Record linking between print and electronic in the discovery system Primo
Also notice how the percentage of bib records without Classification dropped from 37% to 28% (a 9% decrease). This has an impact on:
Collection Evaluation, e.g. reporting by Classification
Weeding efforts, e.g. reporting by Classification
Reclassification and relabelling
Records without Subjects currently stand at 7.5%.
So what is OCLC Data Sync?
OCLC Data Sync is an automated service which allows Libraries to synchronize their holdings with OCLC WorldCat to make collections visible and available through OCLC services.
By:
Adding records.
Removing records
Adding holdings
Removing holdings
Data elements used as the primary source of retrieval and comparison for matching include, but are not limited to, the following:
"Unique" Numbers including OCLC Numbers, ISBN, ISSN, etc.
Physical Material Type
Dates of Publication
Language of Cataloging
Title
Author
Edition
Publisher
Extent
A fingerprint is a pattern of data created by multiple data elements in the record. Any individual field can be part of multiple Fingerprints.
We wanted to take advantage of this but needed to do a couple of things….
First we needed to update existing MARC records with OCLC numbers:
Publish MARC records with no OCLC Numbers for resolution to get the OCLC Number. (Published over 400,000 MARC records!)
Download from OCLC FTP server the cross-reference files.
Download from OCLC FTP server the MARC update files.
Concatenate cross-reference files and the MARC update files
Process these files with Notepad++ and MS Excel
Cross-reference files in /xfer/metacoll/reports
MARC update files in: /xfer/metacoll/out/ongoing/updates
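Step 1, selecting the bib records with no OCLC number for publishing, can also be scripted. A minimal pymarc sketch with placeholder file names:

from pymarc import MARCReader, MARCWriter

with open('full_catalog.mrc', 'rb') as src, \
        open('records_without_oclc.mrc', 'wb') as out:
    writer = MARCWriter(out)
    for record in MARCReader(src):
        # A record already has an OCLC number if any 035 $a carries the (OCoLC) prefix.
        has_oclc = any(
            '(OCoLC)' in subfield
            for field in record.get_fields('035')
            for subfield in field.get_subfields('a')
        )
        if not has_oclc:
            writer.write(record)
    writer.close()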
Process the OCLC Cross Reference Files:
Old report – difficult to work with
New report – much more useful
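A scripted take on that cross-reference processing. The report layout is an assumption here (tab-delimited, local system number then OCLC number), so adjust the parsing to whatever the metacoll.*.xrefrpt.txt files actually contain:

import csv
import glob

pairs = []
for path in sorted(glob.glob('metacoll.*.xrefrpt.txt')):
    with open(path, encoding='utf-8') as report:
        for line in report:
            columns = line.rstrip('\n').split('\t')
            # Assumed layout: local system number, OCLC number, ...
            if len(columns) >= 2 and columns[1].strip().isdigit():
                pairs.append((columns[0].strip(), columns[1].strip()))

# A two-column CSV ready for MarcEdit's Delimited Text Translator
# (or the pymarc short-record sketch shown earlier).
with open('xref_pairs.csv', 'w', newline='') as out:
    csv.writer(out).writerows(pairs)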
Use MarcEdit to create short records
Run through steps:
Click the Delimited Text Translator option.
Supply Source file and Output file details:
The Source File contains the Local System Number and OCLC number.
The Output File will contain MARC records.
Click Edit LDR/008
Remove 008 data.
Click OK
Process Short Records in MarcEdit and Load Into the LMS
In Alma create a Merge Rule
The options to take advantage of this would be to:
Replace whole record
Merge the tags you want
We opted to use Selective Merge
Needed to consult with Metadata Librarians on tags
Solicited examples
Tested loading of records.
Your decision will depend on local Cataloguing practices and in what tags you have added local data etc.
This Alma merge rule will remove from the incoming WorldCat record:
001
003
019
029
260
9XX fields but not the 945, 950, 957, 980, 984 and 995
We want to retain the existing NZ-WeVUL and the NZBN number if possible.
Replace Control Fields 007 and 008
Replace fields 010, 020, 022 if they exist in the existing Alma record.
Replace the 035 if the existing Alma record contains an OCLC prefix
Replace
041
043
045
050
082
All 1XX fields
All 24X fields
All 3XX fields
All 4XX fields
if they exist in the existing Alma record.
Replace a number of specified 5XX fields
Replace a number of specified 6XX fields
If a 69X field is in the existing Alma record, the incoming record will not replace it; since it is not mentioned in the list, it is retained.
Replace all 7XX fields
Replace all 8XX fields
These fields are added from the incoming record only if they do not already exist in the Alma record (existing fields are retained):
050
500
501
502
504
505
520
546
MARC records in Alma pre and post import:
The Middle East: a physical, social and regional geography.
MARC records in Alma pre and post import:
An illustrated guide to New Zealand hebes / by Michael Bayly and Alison Kellow.