Matthew Hale from the King's Fund gave an interesting talk about how they implemented Hyku, an open-source online archive solution, and how it integrates with Koha.
Hadoop is an open source software framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It provides storage for large datasets in the Hadoop Distributed File System (HDFS) and allows parallel processing of the data using the MapReduce programming model. Hadoop grew out of work published by Google and is developed by the Apache community, with major early contributions from Yahoo, to provide a low-cost solution for very large data volumes and processing needs across both structured and unstructured data sources.
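As a rough illustration of the MapReduce programming model described above (a pure-Python sketch of the concept, not Hadoop's actual Java API), a word count can be expressed as a map phase emitting key/value pairs, a shuffle grouping values by key, and a reduce phase aggregating each group:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the list of values for each key.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big clusters", "big data processing"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["big"])  # 3
```

In a real Hadoop job, each phase runs in parallel across the cluster and the shuffle moves data between nodes; the data flow, however, is the same.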
NBITS is a Hadoop training institute providing project-based training and placement in Big Data Hadoop. NBITS provides Hadoop training in Hyderabad with real-time expert faculty who have 10+ years of experience.
Kibana + Timelion: time series with the Elastic Stack, by Sylvain Wallez
The document discusses Kibana and Timelion, which are tools for visualizing and analyzing time series data in the Elastic Stack. It provides an overview of Kibana's evolution and capabilities for creating dashboards. Timelion is introduced as a scripting language that allows users to transform, aggregate, and calculate on time series data from multiple sources to create visualizations. The document demonstrates Timelion's expression language, which includes functions, combinations, filtering, and attributes to process and render time series graphs.
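For readers unfamiliar with the expression language mentioned above, a Timelion expression chains transformation functions onto a data source; the sketch below compares a count against the previous week (the index pattern is a made-up placeholder):

```
.es(index=logs-*, metric=count),
.es(index=logs-*, metric=count, offset=-1w).label('previous week')
```

Each `.es()` call pulls a series from Elasticsearch, and chained functions such as `.label()` transform or annotate it before rendering.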
Big data workloads using Apache Spark on HDInsight, by Nilesh Gule
Slide deck used during the Azure UG meetup in Singapore on 17th May 2019. It demonstrates the use of Spark for running big data workloads on an HDInsight cluster; Spark SQL and the Dataset API, along with Hive support, were demonstrated.
Hadoop architecture discussion of the Global Biodiversity Information Facility (GBIF) by Oliver Meyn for Toronto Hadoop Users Group (THUG) on 2015-11-27.
Video in French at https://www.youtube.com/watch?v=9LNnNh63rBI
Sizing an Elasticsearch cluster involves many dimensions. In this presentation we go through the different elements and features you should consider to handle big and varying loads of log data.
Kibana is a data visualization tool that is part of the ELK stack (Elasticsearch, Logstash, Kibana) and allows users to search, analyze, and visualize data stored in Elasticsearch. The document discusses Kibana's essential features including Discover to query data, Visualize to create visualizations, and Dashboard to combine them. It also covers additional tools like Dev Tools, X-Pack plugins, and Machine Learning capabilities.
Open source big data landscape and possible ITS applications, by SoftwareMill
What is big data, and how open-source big data projects, such as Apache Spark, Kafka and Cassandra can be used in ITS (Intelligent Transport Systems) related projects.
Presented on Codemotion Warsaw 2016 and JDD 2016.
Pig, Hive, Flink, Kafka, Zeppelin... if you are now wondering whether someone just tried to offend you, or whether those are just Pokémon names, then this talk is for you!
Big Data is everywhere, and new tools for it are released almost at the speed of new JavaScript frameworks. During this entry-level presentation we will walk through the challenges Big Data presents, reflect on how big is big, and introduce the currently most popular (mostly open source) tools.
We'll try to spark off interest in Big Data by showing application areas and throwing out ideas you can dive into later.
OpenNebulaConf2017EU: Providing cloud and Managed Hosting Environment by Mich... (OpenNebula Project)
Virtion provides managed hosting and IT services using an infrastructure as a service platform based on OpenNebula. The platform utilizes multiple server clusters across two datacenter locations plus external clouds for scaling. OpenNebula has been in use since 2011 to manage all IaaS resources, both for Virtion's hosting customers and for managing customer on-premise locations. Block storage is provided by Storpool which has been in production since 2016 and provides high performance storage across two clusters located in separate fire zones. Future plans include upgrading OpenNebula, taking new Storpool features into production like cross-zone replication, expanding container services, and automating monitoring.
This document summarizes the key new features and capabilities in Neo4j 4.0. It discusses how Neo4j 4.0 provides unlimited scalability through sharding and federation. It also introduces a fully reactive architecture and granular security controls for privacy. Finally, it highlights how Neo4j Desktop can help developers work with Neo4j from idea to production.
This document discusses implementing graph database capabilities in XPages. It begins with introductions of the presenter Oliver Busse and an overview of graph databases. It then discusses some graph database products and frameworks as well as companies using graph databases. Key terminology for graphs like vertices, properties, and edges is defined. The document explains how graphs could be implemented in Domino using documents to store vertices and edges and outlines the data modeling and initialization process. Code examples and a demo application are referenced for further information.
Searching Billions of Product Logs in Real Time (Use Case), by Ryan Tabora
This document provides an overview and summary of searching billions of product logs in real-time. It discusses how product logs from devices can generate large amounts of big data and the challenges of analyzing this data in real-time. It then summarizes new techniques like using Apache Solr or Elasticsearch to index the data and provide powerful search capabilities, allowing users to analyze logs across all devices. Key design challenges discussed include scaling the search infrastructure and handling changes to the data schema.
Couldn't make it to the DevOps day organized by Xebia? Here is Guillaume Arnaud's presentation on Graphite, an application monitoring tool.
Frank van der Linden presented on connecting XPages applications to Cloudant. He began with an introduction to Cloudant, describing it as the cloud version of CouchDB that stores data as JSON documents. He then covered how to connect to Cloudant directly via REST or through an OSGi plugin, and described storing and retrieving data from Cloudant using a Java connector. Finally, he demonstrated integrating Cloudant with an XPages application to store and search job documents, attachments, and rich text.
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p... (PROIDEA)
According to estimates, by 2020 we will have generated 40 zettabytes of data, and by 2025 as many as 163 zettabytes of data of various kinds; careful analysis of this data will allow us to discover new phenomena, optimize processes, and support decision-making. To process such large data sets efficiently, we need new data analysis techniques and innovative technological solutions. The Azure cloud plays an important role here, offering a range of services that can be used to build Big Data processing solutions in both batch and near-real-time modes. During the session we will build an example Big Data processing solution based on the Lambda architecture, using Azure platform services such as Azure Data Factory, Azure Stream Analytics, Azure HDInsight, Azure Event (IoT) Hub, and Azure Data Lake.
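The Lambda architecture the session builds can be sketched language-agnostically: a batch layer periodically recomputes views over the full historical dataset, a speed layer maintains incremental views over events that arrived since the last batch run, and queries merge the two. A minimal Python sketch follows, with toy in-memory stand-ins for the Azure services named above:

```python
from collections import Counter

def batch_view(master_events):
    # Batch layer: recomputed periodically over the full master dataset.
    return Counter(e["page"] for e in master_events)

class SpeedLayer:
    # Speed layer: incrementally updated from events not yet in a batch run.
    def __init__(self):
        self.view = Counter()
    def ingest(self, event):
        self.view[event["page"]] += 1

def query(batch, speed, page):
    # Serving layer: merge the precomputed batch view with the real-time view.
    return batch[page] + speed.view[page]

historical = [{"page": "home"}, {"page": "home"}, {"page": "about"}]
batch = batch_view(historical)
speed = SpeedLayer()
speed.ingest({"page": "home"})      # event arriving after the last batch run
print(query(batch, speed, "home"))  # 2 from batch + 1 from speed = 3
```

In the architecture described in the talk, the batch layer would be played by HDInsight over Azure Data Lake and the speed layer by Azure Stream Analytics over Event Hub.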
New Thor & Roxie Hardware Architecture, by HPCC Systems
From the 2017 HPCC Systems Community Day:
Jon Burger gives a deep dive into the plans for the new Thor & Roxie hardware architecture.
Jon Burger
Senior Architect, LexisNexis Risk Solutions
Jon Burger is LexisNexis Risk’s head infrastructure architect with 20+ years in information technology and over 15 years’ experience with the HPCC Systems platform. He has worked in a variety of roles within technology including Director of Technology, Director of HPCC Engineering in Network, Linux and Microsoft. He currently works out of the Boca Raton office and is the father to two teenage boys.
Drupal and the Semantic Web - ESIP Webinar, by scorlosquet
This document summarizes a presentation about using semantic web technologies like the Resource Description Framework (RDF) and Linked Data with Drupal 7. It discusses how Drupal 7 maps content types and fields to RDF vocabularies by default and how additional modules can add features like mapping to Schema.org and exposing SPARQL and JSON-LD endpoints. The presentation also covers how Drupal integrates with the larger Semantic Web through technologies like Linked Open Data.
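As an illustration of the Schema.org mapping and JSON-LD output mentioned above (a hand-written fragment with placeholder values, not Drupal's exact serialization):

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example article title",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "datePublished": "2013-01-15"
}
```

Markup like this, embedded in a page or served from an endpoint, is what lets Linked Data consumers interpret a Drupal node's fields as typed, machine-readable properties.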
This document summarizes the September 2015 community update for Apache Flink. Key highlights include Matthias Sax joining as a new committer, the release of version 0.9.1, and discussions starting around releasing version 0.10. Version 0.10 will include improvements to window operators, memory allocation, and new connectors to HDFS, Elasticsearch, and Kafka. The community held various meetups and presentations around the world in September and Flink was recognized as one of the best open source big data tools.
Webinar 2017: Supercharge your analytics with ClickHouse, by Alexander Zaitsev (Altinity Ltd)
Alexander Zaitsev presented on LifeStreet's experience implementing ClickHouse for their ad analytics platform. Some key points:
- LifeStreet processes over 10 billion events per day from their ad exchange and needed a high performance analytics solution.
- They tried various databases but migrated fully to ClickHouse due to its performance for analytics workloads.
- Major challenges included designing an efficient schema, sharding and replication strategy, and reliable data ingestion.
- ClickHouse's dictionary feature allowed them to implement normalized dimensions tables while supporting updates, improving storage efficiency and query performance.
This document discusses streaming data architectures and technologies. It begins with defining streaming processing as processing data continuously as it arrives, rather than in batches. It then covers streaming architectures, scalable data ingestion technologies like Kafka and Flume, and real-time streaming processing systems like Storm, Samza and Spark Streaming. The document aims to provide an overview of building distributed streaming systems for processing high volumes of real-time data.
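The contrast the summary draws between batch and stream processing can be sketched in a few lines of Python: instead of collecting a complete dataset and processing it at once, a streaming consumer updates its state as each record arrives (a toy stand-in for systems like Storm, Samza, or Spark Streaming):

```python
def batch_average(records):
    # Batch: needs the complete dataset before it can produce a result.
    return sum(records) / len(records)

class StreamingAverage:
    # Streaming: maintains running state, updated continuously per record.
    def __init__(self):
        self.total = 0.0
        self.count = 0
    def on_record(self, value):
        self.total += value
        self.count += 1
        return self.total / self.count  # a result is available after every event

stream = StreamingAverage()
for value in [10, 20, 30]:
    latest = stream.on_record(value)
print(latest)  # 20.0, the same answer batch_average([10, 20, 30]) gives
```

The streaming version trades a single final answer for a continuously updated one, which is the core property the architectures in the document are built around.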
Getting Started with Riak - NoSQL Live 2010 - Boston, by Rusty Klophaus
Riak (http://basho.com), a Dynamo-inspired, open-source key/value datastore, was built to scale from a single machine to a cluster of 100+ nodes without driving you or your operations team crazy.
This presentation points out the characteristics of Riak that become important in small, medium, and large clusters, and then demonstrates the Riak API via the Python client library.
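Riak's ability to scale from one machine to a large cluster rests on Dynamo-style consistent hashing of keys onto a ring of nodes. The sketch below is a simplified Python illustration of that idea, not Riak's actual implementation or client API:

```python
import hashlib
from bisect import bisect

class HashRing:
    """Toy consistent-hash ring: each key maps to the next node clockwise."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node gets several virtual points for an even spread.
        self.ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Find the first ring point at or past the key's hash, wrapping around.
        idx = bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))  # deterministically one of the three nodes
```

Because only keys near a joining or leaving node's ring points move, the cluster can grow or shrink without rehashing everything, which is what makes the single-machine-to-100-nodes story workable.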
The IT committee update document discusses:
1) The IT committee has 18 members and is recruiting new members until December 12th.
2) Past meetings included discussions around server virtualization, alternative solutions for the infocenter, a new Galaxy strategy, and mobile app ideas.
3) Upcoming meetings will take place in Brussels from December 14-16 with 8-10 members attending.
OpenNebulaconf2017EU: OpenNebula 5.4 and Beyond by Tino Vázquez and Ruben S. ... (OpenNebula Project)
In this talk, Rubén and Tino will lay out the novelties (not all of them, there are many!) present in 5.4, ranging from new core functionality to the big changes in vCenter. The roadmap for 5.6 and future versions will also be laid out, as far as it is consolidated (it won't be closed yet, but nearly so).
It would also be the perfect session for feature requests, so don't miss it!
YouTube: https://youtu.be/Czzm2EimayY
Concourse CI is a container-based continuous integration and delivery tool that uses a pipeline-first approach with YAML configuration files. Key features include being language agnostic, having configuration defined as code, and providing good visibility into workflows. It consists of a web UI, API, and workers that run tasks in isolated containers to execute jobs when dependent resources change.
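A minimal Concourse pipeline illustrating the pipeline-as-YAML approach described above (the repository URL and test command are placeholders):

```yaml
resources:
- name: source-code
  type: git
  source:
    uri: https://example.com/repo.git   # placeholder repository

jobs:
- name: unit-tests
  plan:
  - get: source-code
    trigger: true          # run the job whenever the resource changes
  - task: run-tests
    config:
      platform: linux
      image_resource:
        type: registry-image
        source: {repository: python}
      inputs:
      - name: source-code
      run:
        path: sh
        args: ["-c", "cd source-code && ./run_tests.sh"]
```

The `trigger: true` on the `get` step is what makes a job run when its upstream resource changes, and the task executes inside an isolated container built from the named image.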
Rainer Schmidt, AIT Austrian Institute of Technology, presented Scalable Preservation Workflows from SCAPE at the five-day ‘Digital Preservation Advanced Practitioner Training’ event (http://bit.ly/1fYCvMO), hosted by the DPC in Glasgow on 15-19 July 2013.
The presentation gives an introduction to the SCAPE Platform, presents scenarios from the SCAPE Testbeds, and finally describes how to create scalable workflows and execute them on the SCAPE Platform.
The Elephant in the Library - Integrating Hadoop, by cneudecker
This document discusses integrating Hadoop into libraries to help scale up their digitization of cultural heritage materials. It provides background on two libraries' digitization projects and data volumes, then outlines the challenges of scaling up through use cases exploring document recognition, file format migration, and web archiving with Hadoop. Scenarios demonstrate running analytics on book metadata, page images, and web archives stored in Hadoop.
SCAPE Presentation at the Elag2013 conference in Gent/Belgium, by Sven Schlarb
Presentation of the European project SCAPE (www.scape-project.eu) at the Elag2013 conference in Gent/Belgium. The presentation includes details about use cases and implementation at the Austrian National Library.
This case study deals with the creation, migration and maintenance of all websites of Arkema, the biggest French chemical company.
First, a quick overview of both the Arkema and Eurelis companies will be given, to better understand how we organized ourselves to centralize all of Arkema's websites.
Then, two different OpenCms platforms will be described, one internet and one intranet: which architecture was chosen, and how each was set up and configured to host around 45 sites. An overview will also be given of how the projects were managed to guarantee their success, and how the deployment of all the websites was handled, including a quick presentation of the training given to every contributor.
After that, the evolutions of the last three years will be described, particularly concerning the product and range pages, making the front end responsive, and the OpenCms technical migrations.
Finally, thoughts regarding the future of both platforms will be shared.
Presented at the International Conference on Web Archives and Electronic Legal Deposit, at the National Library of Spain (BNE), on 9 July 2013.
Scala is widely used at Treasure Data for data analytics workflows, management of the Presto query engine, and open-source libraries. Some key uses of Scala include analyzing query logs to optimize Presto performance, developing Prestobase using Scala macros and libraries like Airframe, and integrating Spark with Treasure Data. Treasure Data engineers have also created several open-source Scala libraries, such as wvlet-log for logging and Airframe for dependency injection, and sbt plugins to facilitate packaging, testing, and deployment.
SE2016 - Java EE revisits design patterns 2016, by Alex Theedom
Design patterns are not only cool but represent the collective wisdom of many developers. Since the publication of Design Patterns: Elements of Reusable Object-Oriented Software by the GoF, many new concepts have extended the coverage of these design patterns, and Java EE now provides out-of-the-box implementations of many of the best-known patterns. This talk will show how, by taking advantage of Java EE features such as CDI and the smart use of annotations, traditional design patterns can be implemented in a much cleaner and quicker way. Among the design patterns discussed will be Singleton, Façade, Observer, Factory, Dependency Injection, Decorator, and more.
This document discusses LinkedIn's use of Kafka, Hadoop, Storm, and Couchbase in their big data pipeline. It provides an overview of each technology and how LinkedIn uses them together. Specifically, it describes how LinkedIn uses Kafka to stream data to Hadoop for analytics and report generation. It also discusses how LinkedIn uses Hadoop to pre-build and warm Couchbase buckets for improved performance. The presentation includes a use case of streaming member profile and activity data through Kafka to both Hadoop and Couchbase clusters.
Thesis Proposal: User Application Profiles for Publishing Linked Data in HTM..., by Sean Petiya
User Application Profiles for Publishing Linked Data in HTML/RDFa: Building a Semantic Web of Comic Book Metadata.
Kent State University - July 30, 2014
The objective is to present a case study for building a domain ontology and extending the usability and usage of that vocabulary by developing metadata application profiles for specific user groups. These objectives will be realized by a metadata vocabulary for the description of comic books and comic book collections, titled the Comic Book Ontology (CBO) and a series of schemata for encoding records using appropriate members of that ontology, specifically an XML schema and a corresponding minimal version. A set of metadata application profiles will also be developed to guide the publication of comic book data using the vocabulary by identified user groups, which include libraries, collectors, creators, retailers, and publishers, and will present recommended elements, guidelines, and examples of encoding data in the markup of existing hypertext systems using HTML5 and RDFa. The study then aims to extend the usability and usage of those schemata by presenting a methodology for building application profiles guided by the development of assumptive, data-driven personas. It will generate these personas through a review of systems used by each participant and an analysis of existing content. The study also seeks to demonstrate how an ontology can be applied to existing collaborative indexing projects, datasets, or research to enhance the visibility, reference, and utilization of those endeavors through their publication as Linked Data. The overall, and long-term, goal is to explore methods for bringing enhanced bibliographic control and organization to the comic book domain, allowing the creative and intellectual efforts of writers, artists, contributors, scholars, researchers, and collectors to be better combined and shared, and well represented in the Semantic Web.
The document summarizes upcoming changes and improvements to Cincom Smalltalk products, including the Foundation, ObjectStudio, and VisualWorks. Key changes include modernizing the text editor, source code editor, and user interface with new "millennial" versions. The Foundation will see improvements to tools like SiouX and AppeX, as well as updated PostgreSQL drivers. ObjectStudio is focusing on next generation user interface integration. VisualWorks will improve skinning and layout functionality. Future plans include additional framework updates across products.
Eclipse Hawk provides scalable querying of models by indexing them into graph databases. It addresses challenges of collaborative modeling on large systems by distributed teams. The Hawk API is designed for flexibility, performance, and scalability through features like multiple communication styles, efficient encodings, and paged results.
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley (Lucidworks)
This document provides an agenda and overview for a presentation on H-Hypermap, a project to build a search platform called the Billion Object Platform (BOP) to index and search over billions of geo-tagged tweets in near real-time. The presentation will cover the architecture using Apache Kafka, Solr sharding, and techniques for fast geo-spatial queries and heatmaps. It will also discuss experiences using technologies like Kotlin, Dropwizard, Docker and Kontena.
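The heatmap side of such a platform rests on Solr's spatial heatmap faceting. As a rough sketch of the kind of request involved, here is a Python standard-library illustration; the collection name ("bop") and field name ("coord") are invented for this example, and while facet.heatmap and its sub-parameters are Solr's own, the values shown are arbitrary:

```python
from urllib.parse import urlencode

# Build a Solr select request that asks for a spatial heatmap facet
# instead of documents (rows=0). All names/values are illustrative.
params = urlencode({
    "q": "*:*",
    "facet": "true",
    "facet.heatmap": "coord",                         # spatial field to grid into cells
    "facet.heatmap.geom": '["-130 20" TO "-60 55"]',  # bounding box of interest
    "facet.heatmap.gridLevel": "4",                   # coarser grid = faster, fewer cells
    "rows": "0",                                      # only the facet counts are needed
})
url = "http://localhost:8983/solr/bop/select?" + params
print(url)
```

The response would contain a grid of counts per cell, which the client renders as a heatmap overlay.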
State of Image Annotations - I Annotate 2016 (r0bcas7)
- The document discusses image annotation standards and software. It proposes using resolution-independent coordinates and GeoJSON/WKT to annotate image areas with points, lines, and polygons.
- Major annotation standards mentioned are Open Annotation, Web Annotation, SharedCanvas, and IIIF. Software discussed includes Annotorious, HyperImage, digilib, SemToNotes, Mirador, and Diva.js.
- The document advocates for standards that allow annotating specific image areas, sharing annotations, and connecting annotations to form a "semantic network" of annotations linked to sources.
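The resolution-independent, GeoJSON-based approach the document advocates can be sketched as follows. This is a minimal illustration assuming a Web-Annotation-style "target"/"body" envelope; the exact property names here are illustrative, not a normative profile of any one standard:

```python
import json

# An area annotation on an image: a polygon expressed in
# resolution-independent (0..1) coordinates as a GeoJSON geometry,
# so the annotation survives rescaling to any resolution.
annotation = {
    "target": {
        "source": "https://example.org/images/page-42",  # invented image URI
        "selector": {
            "type": "GeoJSON",
            "value": {
                "type": "Polygon",
                # Fractions of image width/height; first and last
                # points coincide to close the ring.
                "coordinates": [[[0.10, 0.20], [0.45, 0.20],
                                 [0.45, 0.60], [0.10, 0.60],
                                 [0.10, 0.20]]],
            },
        },
    },
    "body": {"type": "TextualBody", "value": "Initial letter 'D'"},
}
print(json.dumps(annotation, indent=2))
```

Sharing such records, each pointing at a source image, is what allows annotations to link up into the "semantic network" the document describes.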
The document describes a project to identify and extract images from a collection of 19th-century scanned books. Researchers used computer vision algorithms to detect faces in the books and found that female faces were detected more often than male faces. Over 580GB of images were extracted and uploaded to Flickr Commons, where they received over 55 million views within 5 days. Workers and queues were used to distribute the image processing and uploading tasks. Ongoing monitoring tracks changes to the images on Flickr.
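The worker/queue pattern used to distribute the processing and uploading can be sketched with Python's standard library; the book identifiers and the stand-in "processing" step below are invented for illustration:

```python
import queue
import threading

# Distribute per-book work across a pool of workers via a shared queue.
tasks = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    while True:
        book_id = tasks.get()
        if book_id is None:          # poison pill: shut this worker down
            tasks.task_done()
            return
        # Stand-in for the real work (image extraction, upload, etc.)
        processed = f"extracted-images-from-{book_id}"
        with lock:
            results.append(processed)
        tasks.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for book_id in ["book-001", "book-002", "book-003"]:
    tasks.put(book_id)
for _ in threads:
    tasks.put(None)                  # one pill per worker
tasks.join()                         # block until every task is done
print(sorted(results))
```

The same shape scales from threads on one machine to separate worker processes consuming from a shared message queue.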
1) Scratchpads are digital workspaces for collaboratively building biodiversity data and publishing information. Over 239 sites and 3,000+ users have used Scratchpads since 2007.
2) An updated version, Scratchpads 2, is scheduled for release in January 2012. It will include new technical features and user enhancements to better support collaboration and scholarly publication.
3) Long term, Scratchpads aim to serve as a fully digital scholarly communication system integrated throughout the entire research process, from assembling data to formal publication. Current publishing efforts via Scratchpads include open access journals and data publishing.
Koha is Ireland's fastest growing library management system. Interleaf Technology, which has long been associated with library technology, has expanded its product offerings to include Wi-Fi solutions, RFID, PC booking systems, open source consulting, and legal databases. Interleaf supports open source technology and has seen a 30% increase in staff alongside 47 new Irish libraries adopting Koha in 2018, bringing the total number of Koha libraries in Ireland to 158.
KohaCon 2018 was held in Portland, Oregon from May 21-25 with over 230 registered users from around the world. The conference included a cultural day and 3-day hackfest after 3 days of presentations on topics like EDI standards in the US, the SubjectsPlus discovery tool, linked data, data-driven decision making, and the Koha ILL module. Upcoming EDS and citation plugins were demonstrated. Talks also covered the Koha manual, Coral ERM integration, Elasticsearch indexing, and customizations at BULAC library. KohaCon 2019 will be held in Dublin, Ireland from May 20-26, 2019.
This document provides a summary of PTFS Europe's activities in 2018. It discusses the hiring of a new staff member and succession planning for leadership positions, and notes that staff have over 260 years of combined library experience. PTFS Europe also brought 16 new Koha customers live on the system. The company renewed ISO certifications, presented at KohaCon18, and won the bid to host KohaCon19. PTFS Europe increased community engagement through leadership roles in releases, meetings, and development contributions. The year was summarized as a success overall.
The document discusses the changing expectations of library users and the OPAC of the future. Users now expect to easily find exactly what they need with no effort. The OPAC of 2025 will have easy search options, minimal yet pertinent search results, intuitive design, and provide contextual information to users. It will utilize what patrons already know rather than requiring new skills.
The document provides an update on enhancements to Koha upgrades and the new features in version 18.11. Key points include:
1) PTFS Europe will maintain a branch to cherry-pick bugs and security fixes from the community version and add local fixes/features to support yearly upgrades to the .11 release.
2) Benefits are a known environment, quick security fixes, and potential sharing of customizations.
3) Infrastructure Manager will continue managing upgrades and Customer Services will assist with testing.
4) New features in 18.11 include duplicating order lines, invoice adjustments, currency permissions, and various circulation, reporting, and GDPR enhancements.
Plugins are optional modules that can extend or modify the functionality of the Koha library system in specific ways without requiring ongoing server configuration. The document discusses what plugins are available, including some from PTFS Europe that enable payment integrations and permissions checks, as well as plugins for coverflow browsing, improved searching, MARC record checking, and batch emailing patrons. Both advantages like easy installation and niche needs, as well as disadvantages like limited community involvement and support are outlined. A demo of the CLA Permissions Check plugin is provided.
This document discusses interlibrary loan (ILL) systems and services used in the UK and Ireland. It provides an overview of the ILL options available in different library types and countries. It also describes recent developments to improve ILL functionality within the Koha integrated library system, including the addition of modular backends that allow Koha to connect to different ILL services and exchange requests via APIs or email. The document outlines the configuration and setup needed to enable and use Koha's ILL module with these backends.
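Koha's real ILL backends are Perl modules; the following Python sketch only illustrates the architectural idea described above: a core ILL module that knows nothing about individual suppliers and dispatches requests to pluggable, per-service backends (which might talk to an API or send email). All class and method names here are invented for illustration:

```python
class ILLBackend:
    """Interface each supplier-specific backend implements."""
    name = "base"
    def create_request(self, metadata):
        raise NotImplementedError

class EmailBackend(ILLBackend):
    """Backend that exchanges requests by email."""
    name = "email"
    def create_request(self, metadata):
        return f"queued email request for {metadata['title']}"

class APIBackend(ILLBackend):
    """Backend that exchanges requests via a supplier API."""
    name = "api"
    def create_request(self, metadata):
        return f"POSTed API request for {metadata['title']}"

class ILLModule:
    """Core module: registers backends and routes requests to them."""
    def __init__(self):
        self._backends = {}
    def register(self, backend):
        self._backends[backend.name] = backend
    def request(self, backend_name, metadata):
        return self._backends[backend_name].create_request(metadata)

ill = ILLModule()
ill.register(EmailBackend())
ill.register(APIBackend())
print(ill.request("api", {"title": "Some obscure thesis"}))
```

The payoff of the modular design is that adding support for a new ILL service means writing one new backend, with no change to the core module.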
Hand Rolled Applicative User Validation Code Kata (Philip Schwarz)
Could you use a simple piece of Scala validation code (granted, a very simplistic one too!) that you can rewrite, now and again, to refresh your basic understanding of Applicative operators <*>, <*, *>?
The goal is not to write perfect code showcasing validation, but rather to provide a small, rough-and-ready exercise to reinforce your muscle memory.
Despite its grandiose-sounding title, this deck consists of just three slides showing the Scala 3 code to be rewritten whenever the details of the operators begin to fade away.
The code is my rough and ready translation of a Haskell user-validation program found in a book called Finding Success (and Failure) in Haskell - Fall in love with applicative functors.
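As a rough analogue only (the deck's exercise is in Scala 3, built on the operators <*>, <* and *>), here is the error-accumulating shape that applicative validation gives you, sketched in Python with invented helper names:

```python
# A validation is either ("valid", value) or ("invalid", [errors]).
def valid(x):
    return ("valid", x)

def invalid(errs):
    return ("invalid", errs)

def map2(va, vb, f):
    """Applicative-style combine: run both validations and, on failure,
    accumulate *all* errors instead of stopping at the first."""
    if va[0] == "valid" and vb[0] == "valid":
        return valid(f(va[1], vb[1]))
    errs = (va[1] if va[0] == "invalid" else []) + \
           (vb[1] if vb[0] == "invalid" else [])
    return invalid(errs)

def check_name(name):
    return valid(name) if name.isalpha() else invalid([f"bad name: {name!r}"])

def check_age(age):
    return valid(age) if age >= 0 else invalid([f"bad age: {age}"])

print(map2(check_name("Ada"), check_age(36), lambda n, a: (n, a)))
print(map2(check_name("Ada!"), check_age(-1), lambda n, a: (n, a)))
```

In the Scala version, map2 disappears behind the <*> operator; the accumulation of failures is the part worth committing to muscle memory.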
Measures in SQL, SIGMOD 2024, Santiago, Chile (Julian Hyde)
SQL has attained widespread adoption, but Business Intelligence tools still use their own higher level languages based upon a multidimensional paradigm. Composable calculations are what is missing from SQL, and we propose a new kind of column, called a measure, that attaches a calculation to a table. Like regular tables, tables with measures are composable and closed when used in queries.
SQL-with-measures has the power, conciseness and reusability of multidimensional languages but retains SQL semantics. Measure invocations can be expanded in place to simple, clear SQL.
To define the evaluation semantics for measures, we introduce context-sensitive expressions (a way to evaluate multidimensional expressions that is consistent with existing SQL semantics), a concept called evaluation context, and several operations for setting and modifying the evaluation context.
A talk at SIGMOD, June 9–15, 2024, Santiago, Chile
Authors: Julian Hyde (Google) and John Fremlin (Google)
https://doi.org/10.1145/3626246.3653374
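The notion of an evaluation context can be illustrated outside SQL: think of a measure as a calculation attached to a table that is re-evaluated against whatever set of rows the surrounding query supplies, rather than a pre-computed per-row column. A rough Python sketch (the table and figures are invented; the paper's actual proposal is SQL syntax):

```python
# A "measure" here is modelled as a function from a set of rows
# (the evaluation context) to a value.
orders = [
    {"region": "NA", "revenue": 100, "cost": 60},
    {"region": "NA", "revenue": 50,  "cost": 30},
    {"region": "EU", "revenue": 80,  "cost": 20},
]

def margin(rows):
    """Composable calculation: margin = (revenue - cost) / revenue."""
    revenue = sum(r["revenue"] for r in rows)
    cost = sum(r["cost"] for r in rows)
    return (revenue - cost) / revenue

# The same measure evaluated in different contexts gives different,
# correctly aggregated results -- which is what a pre-aggregated
# per-row ratio column cannot do.
total = margin(orders)                # grand-total context
by_region = {reg: margin([r for r in orders if r["region"] == reg])
             for reg in {"NA", "EU"}}
print(total, by_region)
```

Note that averaging per-row margins would give the wrong grand total; re-evaluating the measure in each context is what makes it composable.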
What to do when you have a perfect model for your software but you are constrained by an imperfect business model?
This talk explores the challenges of bringing modelling rigour to the business and strategy levels, and talking to your non-technical counterparts in the process.
Using Query Store in Azure PostgreSQL to Understand Query Performance (Grant Fritchey)
Microsoft has added an excellent new extension in PostgreSQL on their Azure Platform. This session, presented at Posette 2024, covers what Query Store is and the types of information you can get out of it.
Breaking Free from Frameworks with Web Components (Massimo Artizzu)
Originally in Italian.
A presentation on the features and use of Web Components in the development of web pages and applications. It recounts the historical reasons behind the advent of Web Components, highlights their advantages and the challenges they pose, points out best practices, and places particular emphasis on using Web Components to ease the migration of applications to new technology stacks.
The Key to Digital Success: A Comprehensive Guide to Continuous Testing Integ... (kalichargn70th171)
In today's business landscape, digital integration is ubiquitous, demanding swift innovation as a necessity rather than a luxury. In a fiercely competitive market with heightened customer expectations, the timely launch of flawless digital products is crucial for both acquisition and retention—any delay risks ceding market share to competitors.
The most important new features of Oracle 23c for DBAs and developers. You can get more detail from my YouTube channel video at https://youtu.be/XvL5WtaC20A
Artificial Intelligence and XPath Extension Functions (Octavian Nadolu)
The purpose of this presentation is to provide an overview of how you can use AI from XSLT, XQuery, Schematron, or XML Refactoring operations, the potential benefits of using AI, and some of the challenges we face.
E-commerce Development Services (Hornet Dynamics)
For any business hoping to succeed in the digital age, having a strong online presence is crucial. We offer Ecommerce Development Services that are customized according to your business requirements and client preferences, enabling you to create a dynamic, safe, and user-friendly online store.
Mobile App Development Company In Noida (Drona Infotech)
Drona Infotech is a premier mobile app development company in Noida, providing cutting-edge solutions for businesses.
Visit us at: https://www.dronainfotech.com/mobile-application-development/
8 Best Automated Android App Testing Tools and Frameworks in 2024 (kalichargn70th171)
Regarding mobile operating systems, two major players dominate: Android and iOS. With Android leading the market, software development companies are focused on delivering apps compatible with this OS. Ensuring an app's functionality across various Android devices, OS versions, and hardware specifications is critical, making Android app testing essential.
Project Management: The Role of Project Dashboards (Karya Keeper)
Project management is a crucial aspect of any organization, ensuring that projects are completed efficiently and effectively. One of the key tools used in project management is the project dashboard, which provides a comprehensive view of project progress and performance. In this article, we will explore the role of project dashboards in project management, highlighting their key features and benefits.
King's Fund - implementing Hyku
1. Implementing an open source online archive solution (Hyku) - interaction and comparisons with Koha
Matthew Hale
2. Digitised publications and first solution
• The King's Fund publications from 1898.
• 2013 - 1,771 documents digitised (in 2 batches) @ 30,000 images.
• Digital Archive and Repository Team at ULCC (CoSector) to implement ePrints with Wellcome Player embedded.
• TIFFs for Wellcome Player; JPEG2000 for archive copy; OCR’d for search; metadata from Koha.
3. Need for new solution
• Search poor; little ePrints functionality used; not good fit.
• Hybrid solution, poor admin interface - reliant on custom scripts.
• Upgrades and development issues; use of Wellcome Player.
• New pricing model January 2017.
4. The solution
• Samvera (known as Hydra 2008-2017) platform.
• Fedora Commons repository software; Solr indexing and searching; Blacklight facets; Ruby plugins ("gems"), e.g. IIIF search.
• "Heads" for applications, user interfaces - technical, dependencies.
• Need for packaged, simpler solution - Hyku, easy implementation.
• Universal Viewer embedded; improved searching; modern admin interface.
5. A familiar situation
• Similarities to Koha 9 years ago:
• open source;
• opportunity, but no other comparable examples;
• trusted supplier;
• remote hosting, SaaS;
• mutual learning process, ongoing development;
• trailblazing (with community responsibility);
6. Implementing Hyku
• New version of Archive successfully launched June 2018.
• First live Hyku instance in world – https://archive.kingsfund.org.uk
• Integrating with Koha:
• biblionumber;
• Koha metadata (Tools - Export);
• styling;
7. Linking from Koha results
• Example search
• MARC21slim2OPACDetail.xsl:

  <xsl:variable name="bibnumber">
    <xsl:value-of select="marc:datafield[@tag=999]/marc:subfield[@code='c']" />
  </xsl:variable>
  <a href="http://archive.kingsfund.org.uk/biblionumber/{$bibnumber}/">
    <img id="imageArchiveLink"
         src="https://archive.kingsfund.org.uk/biblionumber_thumbnail/{$bibnumber}"
         height="120" width="90" onerror="this.style.display='none'"
         title="Digital copy" />
  </a>

• jQuery for any that are problematic.
8. Comparison of communities
• Unique requirements, local development and solutions.
• Decentralised, shared ethos rather than single product. Toolkit.
• Flexibility rather than cost: Brown, Columbia, Cornell, Harvard, Princeton, Stanford, Yale. In UK: Oxford, Durham, LSE, York.
• Anglo-US rather than global: HEA; Samvera partners, developers and users.
• Reliance on local development teams.
• Branding issues, support monetisation, organisation, funding central developer.
9. Future work
• Collections for online exhibitions (Spotlight).
• Born-digital: as PDFs, native support in Universal Viewer; tesseract-ocr for OCR?; PDF/A archiving format with OCRmyPDF?
• Other formats – video, image, sound, web.
• Commitment to open source project - CoSector feeding code back to community. What will others develop? (Warburg, British Library.)