Planning and Implementing a Digital Library Project – Jenn Riley
Brancolini, Kristine and Jenn Riley. "Planning and Implementing a Digital Library Project," Indiana LSTA Digital Project Planning Workshop, February 7, 2006.
I would like to present my slides on 'Blockchain and Libraries', prepared for the course "Digital Services in Data Centers and Archives" supervised by Dr. Farah Sbeity at the Lebanese University.
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies? – Kai Wähner
The concepts and architectures of the data warehouse, the data lake, and data streaming are complementary approaches to solving business problems.
Unfortunately, the underlying technologies are often misunderstood, overused in monolithic and inflexible architectures, and pitched by vendors for the wrong use cases. This presentation explores that dilemma.
The slides cover technologies such as Apache Kafka, Apache Spark, Confluent, Databricks, Snowflake, Elasticsearch, AWS Redshift, GCP with Google BigQuery, and Azure Synapse.
Slides from the IFLA ARL Hot Topics 2023 session held in Rotterdam, The Netherlands on 22 August 2023.
Presentation by: Cecilia Adewumi and Adetoun Oyelude (Nigeria)
Access the recording on YouTube: https://tinyurl.com/3xdkmbtb
Delight: An Improved Apache Spark UI, Free, and Cross-Platform – Databricks
Delight (https://www.datamechanics.co/delight) is a free, cross-platform monitoring dashboard for Apache Spark that displays system metrics (CPU usage, memory usage) along with Spark information (jobs, stages, tasks) on the same timeline. Delight is a great complement to the Spark UI for troubleshooting your Spark application and understanding its performance bottlenecks. It works freely on top of any Spark platform, whether open source or commercial, in the cloud or on-premises. You can install it using an open-source Spark agent (https://github.com/datamechanics/delight).
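Enabling the agent amounts to attaching a Spark listener to the session. The package coordinates, configuration keys, and repository URL below are taken from the project's README as I recall it and should be verified against the repository before use; this is a sketch, not a definitive recipe:

```python
from pyspark.sql import SparkSession

# Sketch: verify the package coordinates and config keys against the
# Delight README (https://github.com/datamechanics/delight) before use.
spark = (
    SparkSession.builder
    .appName("delight-demo")
    # Pull the open-source Delight agent published by Data Mechanics
    .config("spark.jars.packages", "co.datamechanics:delight_2.12:latest-SNAPSHOT")
    .config("spark.jars.repositories", "https://oss.sonatype.org/content/repositories/snapshots")
    # Register the listener that streams metrics to the Delight dashboard
    .config("spark.extraListeners", "co.datamechanics.delight.DelightListener")
    # Access token generated from the Delight web UI (placeholder value)
    .config("spark.delight.accessToken.secret", "<your-access-token>")
    .getOrCreate()
)
```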
In this session, the co-founders of Data Mechanics will take you through performance troubleshooting sessions with Delight on real-world data engineering pipelines. You will see how Delight and the Spark UI can jointly help you spot the performance bottleneck of your applications, and how you can use these insights to make your applications more cost-effective and stable.
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac... – Kai Wähner
Hybrid cloud architectures are the new black for most companies. A cloud-first strategy is evident for many new enterprise architectures, but some use cases require resiliency across edge sites and multiple cloud regions. Data streaming with the Apache Kafka ecosystem is a perfect technology for building resilient and hybrid real-time applications at any scale. This talk explores different architectures and their trade-offs for transactional and analytical workloads. Real-world examples include financial services, retail, and the automotive industry.
Video recording:
https://qconlondon.com/london2022/presentation/resilient-real-time-data-streaming-across-the-edge-and-hybrid-cloud
SQL Performance Improvements At a Glance in Apache Spark 3.0 – Kazuaki Ishizaki
This is the presentation deck for Spark + AI Summit 2020, available at
https://databricks.com/session_na20/sql-performance-improvements-at-a-glance-in-apache-spark-3-0
Productionizing Spark and the Spark Job Server – Evan Chan
You won't find this in many places: an overview of deploying, configuring, and running Apache Spark, including Mesos vs. YARN vs. standalone clustering modes, useful config tuning parameters, and other tips from years of using Spark in production. Also, learn about the Spark Job Server and how it can help your organization deploy Spark as a RESTful service, track Spark jobs, and enable fast queries (including SQL!) of cached RDDs.
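The "Spark as a RESTful service" workflow boils down to two HTTP calls: upload a jar, then submit a job against it. The endpoint shapes below follow the spark-jobserver documentation as I recall it, and the host, app, and class names are invented for the sketch:

```python
from urllib.parse import urlencode

# Sketch of the Spark Job Server REST workflow. Endpoint shapes follow the
# spark-jobserver docs; host, app name, and class path are hypothetical.

def upload_jar_url(host: str, app_name: str) -> str:
    """POST the application jar to this URL to register it under app_name."""
    return f"http://{host}/jars/{app_name}"

def submit_job_url(host: str, app_name: str, class_path: str, sync: bool = False) -> str:
    """POST to this URL to run a job; sync=True blocks until the result is ready."""
    params = {"appName": app_name, "classPath": class_path}
    if sync:
        params["sync"] = "true"
    return f"http://{host}/jobs?{urlencode(params)}"

print(upload_jar_url("localhost:8090", "wordcount"))
print(submit_job_url("localhost:8090", "wordcount", "com.example.WordCountJob"))
```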
Lecture presented by Dr. Reinabelle C. Reyes at PAARL's Summer Conference on the theme "Library Analytics: Data-driven Library Management," held at Pearl Hotel, Manila, on 20-22 April 2016.
Planning your Next-Gen Change Data Capture (CDC) Architecture in 2019 - Strea... – Impetus Technologies
Traditional databases and batch ETL operations have not been able to serve the growing data volumes and the need for fast and continuous data processing.
How can modern enterprises provide their business users real-time access to the most up-to-date and complete data?
In this webinar, our experts talk about how real-time CDC improves data availability and enables fast data processing through incremental updates in the big data lake, without modifying or slowing down source systems. Join this session to learn:
• What CDC is and how it impacts business
• The various methods for CDC in the enterprise data warehouse
• The key factors to consider while building a next-gen CDC architecture:
  - Batch vs. real-time approaches
  - Moving from just capturing and storing to capturing, enriching, transforming, and storing
  - Moving from stopgap silos to straight-through processing
• Implementation of CDC through a live demo and use case
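The batch-vs.-incremental contrast above can be illustrated with a toy query-based CDC loop: capture only rows changed since the last watermark instead of re-reading the whole table. (Log-based CDC, which the webinar also covers, reads the database's transaction log instead; the table and column names here are invented for the sketch.)

```python
import sqlite3

# Toy query-based CDC over a version-column watermark. Table and column
# names are invented; real systems often use a log sequence number instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, version INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 10.0, 1), (2, 20.0, 1), (3, 30.0, 1)])

def capture_changes(conn, watermark):
    """Return rows with version > watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, amount, version FROM orders WHERE version > ? ORDER BY version",
        (watermark,)).fetchall()
    new_watermark = max((r[2] for r in rows), default=watermark)
    return rows, new_watermark

# Initial load: everything is "new" relative to watermark 0.
batch, wm = capture_changes(conn, 0)
# The source system updates one row, bumping its version.
conn.execute("UPDATE orders SET amount = 25.0, version = 2 WHERE id = 2")
# Incremental capture: only the changed row crosses the wire.
delta, wm = capture_changes(conn, wm)
print(len(batch), len(delta))  # → 3 1
```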
You can view the webinar here - https://www.streamanalytix.com/webinar/planning-your-next-gen-change-data-capture-cdc-architecture-in-2019/
For more information visit - https://www.streamanalytix.com
https://fosdem.org/2017/schedule/event/hpc_bigdata_calcite/
When working with big data and IoT systems, we often feel the need for a common query language. Platform-specific languages are often harder to integrate with and require longer adoption time.
To fill this gap, many NoSQL (not-only-SQL) vendors are building SQL layers for their platforms. It is worth exploring the driving forces behind this trend, how it fits into your big data stack, and how we can adopt it in our favorite tools. However, building a SQL engine from scratch is a daunting job, and frameworks like Apache Calcite can help with the heavy lifting. Calcite allows you to integrate a SQL parser, a cost-based optimizer, and JDBC with your big data system.
Calcite has been used to power many big data platforms, such as Hive, Spark, Drill, and Phoenix, to name a few.
I will walk you through the process of building a SQL access layer for Apache Geode (an in-memory data grid). I will share my experience, pitfalls, and technical considerations, such as balancing between SQL/RDBMS semantics and the design choices and limitations of the underlying data system.
Hopefully this will enable you to add SQL capabilities to your preferred NoSQL data system.
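As a toy illustration of what such a layer does, the sketch below parses a deliberately minimal, invented subset of SELECT and translates it into dictionary operations over an in-memory store. Calcite industrializes exactly this idea with a full parser, planner, and cost-based optimizer:

```python
import re

# Toy SQL layer over a key-value "NoSQL" store: parse a tiny invented
# subset of SELECT and translate it into dict operations. Real engines
# like Apache Calcite provide a full parser and optimizer instead.
STORE = {
    "users": [
        {"id": 1, "name": "ada", "age": 36},
        {"id": 2, "name": "bob", "age": 41},
    ]
}

def query(sql: str):
    """Supports: SELECT <cols|*> FROM <table> [WHERE <col> = <int>]"""
    m = re.match(r"SELECT (.+) FROM (\w+)(?: WHERE (\w+) = (\d+))?$", sql.strip())
    if not m:
        raise ValueError(f"unsupported query: {sql}")
    cols, table, wcol, wval = m.groups()
    rows = STORE[table]
    if wcol:                                   # predicate push-down
        rows = [r for r in rows if r[wcol] == int(wval)]
    if cols.strip() != "*":                    # projection
        names = [c.strip() for c in cols.split(",")]
        rows = [{c: r[c] for c in names} for r in rows]
    return rows

print(query("SELECT name FROM users WHERE id = 2"))  # → [{'name': 'bob'}]
```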
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark – Bo Yang
The slides explain how shuffle works in Spark and help people understand more details about Spark internals. They show how the major classes are implemented, including ShuffleManager (SortShuffleManager), ShuffleWriter (SortShuffleWriter, BypassMergeSortShuffleWriter, UnsafeShuffleWriter), and ShuffleReader (BlockStoreShuffleReader).
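The core idea those classes implement can be sketched in a few lines: the map side hash-partitions each record by key into one bucket per reducer, and each reducer pulls and merges its bucket from every map task. This is a drastic simplification of the writers named above (no sorting, spilling, or serialization), with invented data:

```python
from collections import defaultdict

# Toy shuffle: map tasks write one bucket per reduce partition; each
# reduce task merges its bucket from every map output. This mirrors the
# hash-partitioning step inside Spark's shuffle writers, minus sorting,
# spilling, and serialization.
NUM_REDUCERS = 2

def map_side(records):
    """Partition (key, value) records into per-reducer buckets."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[hash(key) % NUM_REDUCERS].append((key, value))
    return buckets

def reduce_side(all_map_outputs, reducer_id):
    """Fetch this reducer's bucket from every map output and sum by key."""
    merged = defaultdict(int)
    for buckets in all_map_outputs:
        for key, value in buckets.get(reducer_id, []):
            merged[key] += value
    return dict(merged)

map_outputs = [map_side([("a", 1), ("b", 2)]), map_side([("a", 3), ("c", 4)])]
totals = {}
for r in range(NUM_REDUCERS):
    totals.update(reduce_side(map_outputs, r))
print(totals)
```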
Spark Streaming makes it easy to build scalable fault-tolerant streaming applications. In this webinar, developers will learn:
*How Spark Streaming works - a quick review.
*Features in Spark Streaming that help prevent potential data loss.
*Complementary tools in a streaming pipeline - Kafka and Akka.
*Design and tuning tips for Reactive Spark Streaming applications.
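The data-loss-prevention point above can be illustrated with a toy micro-batch loop: store a batch's results durably first, and commit the source offset only afterwards, so a crash between the two steps replays the batch instead of dropping it. All names here are invented; Spark Streaming achieves the same guarantee with checkpointing and write-ahead logs:

```python
# Toy at-least-once micro-batch loop. The offset is committed only after
# the batch's results are durably stored, so a crash in between replays
# the batch rather than losing it. Names are invented; Spark Streaming
# uses checkpointing and write-ahead logs for this.
source = [10, 20, 30, 40, 50]          # pretend event stream
checkpoint = {"offset": 0}             # durable checkpoint store (simulated)
sink = []                              # durable results store (simulated)

def run_microbatches(batch_size):
    while checkpoint["offset"] < len(source):
        start = checkpoint["offset"]
        batch = source[start:start + batch_size]
        sink.extend(x * 2 for x in batch)           # 1. store results durably
        checkpoint["offset"] = start + len(batch)   # 2. only then commit offset

run_microbatches(batch_size=2)
print(sink, checkpoint["offset"])  # → [20, 40, 60, 80, 100] 5
```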
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi – DataWorks Summit
Apache NiFi provides a revolutionary data flow management system with a broad range of integrations with existing data production, consumption, and analysis ecosystems, all covered by robust data delivery and provenance infrastructure. Now learn about the follow-on project that extends the reach of NiFi to the edge: Apache MiNiFi. MiNiFi is a lightweight application that can be deployed on hardware orders of magnitude smaller and less powerful than the existing standard data collection platforms. With both a JVM-compatible and a native agent, MiNiFi allows data collection in brand-new environments: sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can also be encrypted and secured immediately. Local governance and regulatory policies can be applied across geopolitical boundaries to conform with legal requirements. And all of this configuration can be done from central command and control using an existing NiFi instance, with the trusted and stable UI data flow managers already love.
Expected prior knowledge / intended audience: developers and data flow managers should have passing knowledge of Apache NiFi as a platform for routing, transforming, and delivering data through systems (a brief overview will be provided). The talk will focus on extending the data collection, routing, provenance, and governance capabilities of NiFi to IoT/edge integration via MiNiFi.
Speaker
Andy LoPresto, Sr. Member of Technical Staff, Hortonworks
Objectives
The objectives of the webinar are to:
• introduce AI in libraries
• describe the IDEA Institute on AI and its contribution to providing professional, innovative training in AI to library and other information professionals
• understand challenges and opportunities in implementing AI in libraries based on real-world experiences of the first cohort of Institute Fellows
• consider equity, diversity, inclusion and accessibility issues, and ethical questions, in AI implementation.
Speakers
Prof. Dr. Dania Bilal
Professor, School of Information Sciences at the University of Tennessee in Knoxville, TN.
Researcher, scholar and educator in Human Information Behavior, Human–Computer Interaction (HCI), User Experience and Design (UXD), Human–AI Interaction, and Information Science Theory.
Research focus is on user information interaction and behavior (children, teenagers and adults) with information systems, products and interfaces; and on user-centered design for better user engagement and experiences.
Principal Investigator and co-developer, IDEA Institute on Artificial Intelligence.
Clara M. Chu
Director and Mortenson Distinguished Professor, Mortenson Center for International Library Programs, University of Illinois at Urbana-Champaign, IL.
• Expert in developing appropriate and strategic solutions to deliver equitable and relevant library services in culturally diverse and dynamic libraries.
• Studies the information needs of culturally diverse communities in a globalized and technological society.
• Co-developer, IDEA Institute on Artificial Intelligence.
How a distributed graph analytics platform uses Apache Kafka for data ingesti... – HostedbyConfluent
Using Kafka to stream data into TigerGraph, a distributed graph database, is a common pattern in our customers’ data architecture. In the TigerGraph database, the Kafka Connect framework was used to build the native S3 data loader. In TigerGraph Cloud, we will be building native integration with many data sources, such as Azure Blob Storage and Google Cloud Storage, using Kafka as an integrated component of the Cloud Portal.
In this session, we will discuss both architectures: (1) the built-in Kafka Connect framework within the TigerGraph database; (2) using a Kafka cluster for cloud-native integration with other popular data sources. A demo will be provided for both data streaming processes.
Tools for retrieving bibliographic information and library servi... – Evelina Ceccato
Information literacy, or information competence: a help for the THESIS, but not only. Slides from lesson 3, part I, of the information literacy seminar for third-year students of the degree programme in Social Work @unipr, academic year 2016-2017. 15 hours of lectures, 9 hours of e-learning lab (BiblioPatente), 6 ECTS credits. The seminar aims to train students in the informed use of bibliographic and documentary sources and to provide skills that are useful not only for the rest of their academic career but also for their professional future. The final goal is to give students the tools to carry out their own research independently: choosing and focusing a research topic, finding the material, and managing the bibliography. Topics of this lesson: library catalogues, local and national; the electronic journals catalogue; OPAC; SFX; library services; digital lending; interlibrary loan; reference.
Seminar in preparation for writing a degree thesis in law – Evelina Ceccato
Topics of the seminar in preparation for writing a degree thesis in law:
Tools for retrieving legal bibliographic information: catalogues and databases
Bibliographic citation
Library services
Seminar for students of the Degree Programme in Social Work - Lesson of 16-4-13 – Elisa Minardi
Tools for bibliographic research (bibliographies and catalogues, the OPAC of the Parma library system, SFX, the university's electronic journals catalogue, online catalogues of other libraries)
The library system of the Università Cattolica di Milano, presented to students to encourage them to use its many tools and services.