ResourceSync is a framework for synchronizing web resources between systems. The core team is developing standards for baseline synchronization using inventories, incremental synchronization using changesets, and push notifications using XMPP. The framework is based on reusing and extending existing sitemap formats to describe resources and changes in a modular way. Experiments show it can scale to synchronize large datasets like DBpedia and arXiv. Feedback is being solicited throughout 2012 to finalize the specifications.
ResourceSync core team members Bernhard Haslhofer and Simeon Warner will present on the ResourceSync specification and provide practical examples and scenarios for its application.
The slides were used to accompany an overview of the outcomes of the ResourceSync project at the 2014 Spring Membership Meeting of the Coalition for Networked Information (CNI).
The launch of ResourceSync, a joint project of the National Information Standards Organization (NISO) and the Open Archives Initiative (OAI) funded by the Alfred P. Sloan Foundation, was motivated by the ubiquitous need to synchronize resources for applications in the realm of cultural heritage and research communication. After an initial problem definition and scoping phase, the project has designed, specified, and tested a framework for web-based synchronization that is based on Sitemaps, a protocol widely used by web servers to advertise the resources they make available to search engines for indexing. This choice allows repositories to address both search engine optimization and resource synchronization needs using the same technology.
The ResourceSync framework specifies various modular capabilities that a repository can support in order to allow third party systems to remain synchronized with its evolving resources. For example, a Resource List provides an inventory of resources, whereas a Change List details resources that were created, deleted, or updated during a given temporal interval. Support for capabilities can be combined in order to meet local or community requirements. The framework specifies capabilities that require a third party to recurrently poll for up-to-date information about a repository's resources, but also publish/subscribe capabilities that keep third parties informed about changes through notifications, thereby significantly reducing synchronization latency.
This presentation introduces ResourceSync, a specification aimed at enabling web-based synchronization of resources. The specification is the result of a collaboration between NISO and the Open Archives Initiative funded by the Sloan Foundation and JISC. The proposed resource synchronization approach is based on several existing specifications (e.g. Sitemaps, PubSubHubbub, well-known URI) and is aligned with common architectural principles (e.g. REST, follow your nose).
A 15 minute video version of these slides is available at https://www.youtube.com/watch?v=ASQ4jMYytsA
Presented at SWIB13 in Hamburg, 2013-11-27. ResourceSync slides excerpted from the full tutorial at http://www.slideshare.net/OpenArchivesInitiative/resourcesync-tutorial
Persistent Identifiers and the Web: The Need for an Unambiguous Mapping (Herbert Van de Sompel)
Presentation given at the International Digital Curation Conference in San Francisco, February 26, 2014. Highlights the lack of machine-actionability of persistent identifiers assigned to scholarly communication assets. Proposes an approach to address the issue that meets requirements that take into account the changing nature of web-based research communication. A draft paper provides more details: http://public.lanl.gov/herbertv/papers/Papers/2014/IDCC2014_vandesompel.pdf
This presentation provides a problem perspective from the recently launched NISO/OAI ResourceSync effort that aims at devising a framework for synchronizing web resources. The slides were used during a WebEx conference on March 6, 2012.
Maintaining scholarly standards in the digital age: Publishing historical gaz... (Humphrey Southall)
This presentation: (1) Discusses why providing detailed attributions of individual contributions is essential to large scale sharing of historical research data; (2) Provides a short introduction to Linked Open Data; (3) Introduces the PastPlace Gazetteer API (Applications Programming Interface), explaining components of the RDF it generates using the example of Oxford, UK; (4) Notes that most open data projects use the Creative Commons -- Must Acknowledge license (CC-BY) while not actually acknowledging contributors within their RDF, then shows how we do it; (5) Introduces the separate PastPlace Datafeed API, which implements the W3C Data Cube Vocabulary.
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2 (tcloudcomputing-tw)
The presentation is designed for those interested in Hadoop technology and can broaden your knowledge of Hadoop, covering topics such as community history, current development status, service features, the distributed computing framework, and big data scenarios in the enterprise.
Lessons learnt from the Murchison Widefield Array Data Archive (Chen Wu)
Presentation at the “Realising SKA-LOW: New Technologies & Techniques for Imaging and Calibration of Low Frequency Arrays” Conference on 29 March 2017
This presentation addresses the main issues of Linked Data and scalability. In particular, it gives details on approaches and technologies for clustering, distributing, sharing, and caching data. Furthermore, it addresses the means for publishing data through cloud deployment and the relationship between Big Data and Linked Data, exploring how some of the solutions can be transferred to the context of Linked Data.
NISO access related projects (presented at the Charleston conference 2016) (Christine Stohn)
Presentation by Pascal Calarco (University of Windsor), Christine Stohn (Ex Libris/ProQuest), John G. Dove (Paloma Associates), covering NISO D2D work, ResourceSync, KBART and KBART automation, ODI (Open Discovery Initiative), Link origin tracking, ALI (Access and License Indicators), and a discussion around improvements and challenges for open access discovery
Slides for talk presented at Boulder Java User's Group on 9/10/2013, updated and improved for presentation at DOSUG, 3/4/2014
Code is available at https://github.com/jmctee/hadoopTools
Big Data Architecture Workshop - Vahid Amiri (datastack)
Big Data Architecture Workshop
These slides cover big data tools, technologies, and layers that can be used in enterprise solutions.
TopHPC Conference
2019
From Backups To Time Travel: A Systems Perspective on Snapshots (NuoDB)
Many applications today are dependent on databases. Access to past states of database data enables new kinds of useful queries: time-traveling queries. With time travel, application developers can analyze and predict trends in changing data over time, detect data anomalies, and recover from user error such as accidental deletion of data (without relying on a cumbersome database restore). System administrators want simple and efficient backups. Database snapshots can bridge this gap and provide both, without disrupting performance.
This talk dives into snapshots as a database system service. We will discuss design choices for snapshots and time travel, and how those choices impact applications. You will learn about novel research results on adding snapshots to a database system in a modular way, and we will touch on the challenges and opportunities that arise when that database is distributed.
Mind the gap! Reflections on the state of repository data harvesting (Simeon Warner)
A 24x7 presentation at Open Repositories 2017 in Brisbane, Australia.
I start with an opinionated history of the evolution of repository data harvesting since the late 1990s to the present. A conclusion is that we are currently in danger of creating a repository environment with fewer cross-repository services than before, with the potential to reinforce the silos we hope to open. I suggest that the community needs to agree upon a new solution, and further suggest that solution should be ResourceSync.
Open Source Cloud Sync and Share software provides a synchronisation layer on top of a variety of backend storages, such as local filesystems and object storage. In some software stacks, such as ownCloud, a SQL database is used to support the synchronisation requirements.
We tested how different technology stacks impact the ownCloud HTTP-based Synchronisation Protocol. Efficiency and scalability analysis was performed based on benchmarking results. The results have been produced using the ClawIO framework prototype.
ClawIO is a Cloud Synchronisation Benchmarking Framework. The software provides a base architecture to stress different storage solutions against different cloud synchronisation protocols. The architecture is based on the IETF Storage Sync draft specification and CERN EOS. The synchronisation logic is divided into control and data servers. This separation is achieved through highly decoupled microservices connected to each other using high-performance communication protocols such as gRPC and HTTP/2.
Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Onl... (Amazon Web Services)
Learning Objectives:
- Understand key requirements for collecting, preparing, and loading streaming data into data lakes
- Get an overview of transmitting data using Amazon Kinesis Firehose
- Learn how to perform data transformations with Amazon Kinesis Firehose
Data lakes enable your employees across the organization to access and analyze massive amounts of unstructured and structured data from disparate data sources, many of which generate data continuously and rapidly. Making this data available in a timely fashion for analysis requires a streaming solution that can durably and cost-effectively ingest this data into your data lake. Amazon Kinesis Firehose is a fully managed service that makes it easy to prepare and load streaming data into AWS. In this tech talk, we will provide an overview of Amazon Kinesis Firehose and dive deep into how you can use the service to collect, transform, batch, compress, and load real-time streaming data into your Amazon S3 data lakes.
Stateful streaming and the challenge of state (Yoni Farin)
The different challenges of working with state in a distributed streaming data pipeline, and how we solve them with the 3S architecture and Kafka Streams stores based on RocksDB.
RDBMS gave us table schemas. A table schema, which is an essential metadata component, gave us the power to validate data types, and enforce constraints. In the age of varying data and schema-less data stores, how can we enforce these rules and how can we leverage metadata (even in RDBMS) to empower data validity, code checks, and automation.
This is a brief background on Big Data (the data lake) to put in context the importance of metadata from a governance perspective, and more especially in today's heterogeneous big data platforms.
Data scientists spend too much of their time collecting, cleaning and wrangling data as well as curating and enriching it. Some of this work is inevitable due to the variety of data sources, but there are tools and frameworks that help automate many of these non-creative tasks. A unifying feature of these tools is support for rich metadata for data sets, jobs, and data policies. In this talk, I will introduce state-of-the-art tools for automating data science and I will show how you can use metadata to help automate common tasks in Data Science. I will also introduce a new architecture for extensible, distributed metadata in Hadoop, called Hops (Hadoop Open Platform-as-a-Service), and show how tinker-friendly metadata (for jobs, files, users, and projects) opens up new ways to build smarter applications.
Rainer Schmidt, AIT Austrian Institute of Technology, presented Scalable Preservation Workflows from SCAPE at the five-day ‘Digital Preservation Advanced Practitioner Training’ event (http://bit.ly/1fYCvMO), hosted by DPC, in Glasgow on 15-19 July 2013.
The presentation gives an introduction to the SCAPE Platform, presents scenarios from the SCAPE Testbeds, and finally describes how to create scalable workflows and execute them on the SCAPE Platform.
Global introduction to Elasticsearch presented at a BigData meetup.
Use cases, getting started, Rest CRUD API, Mapping, Search API, Query DSL with queries and filters, Analyzers, Analytics with facets and aggregations, Percolator, High Availability, Clients & Integrations, ...
Similar to ResourceSync: Web-based Resource Synchronization (20)
Questioning Authority Lookup Service: Linking the Data (Simeon Warner)
One segment of a presentation "From idea to implementation: BIBFRAME becomes reality", Charleston, 2022
The implementation of BIBFRAME in active cataloguing workflows and linked data exchange environments is live and it’s evolving across several paths that are often intertwined. This complex bibliographic ecosystem consists of many experiences that the speakers will present highlighting their value both as autonomous endeavours, as well as from the perspective of interaction and options for mutual integration.
The Library of Congress, with the BIBFRAME original cataloguing editor, Marva, will report about developments and achievements for bringing BIBFRAME into practice in a very large library environment with many cataloguing workflows for diverse types of resources, encompassing the use of and adjustments to the BIBFRAME ontology and its modelling.
On the topic of original and copy cataloguing in linked data, Stanford and Cornell Universities are working to achieve a dynamic form of cataloguing through the implementation of Sinopia linked data editor and enrichment tools such as the Questioning Authority that queries authoritative sources to support linked data authorities.
Regarding the impact of linked data processes on the user experience, the University of Pennsylvania has contributed a study describing the functionalities and scenarios which the Share-VDE 2.0 entity discovery system https://www.svde.org/ addresses, and the ways in which user feedback is supporting the evolution of linked data discovery.
Share-VDE (SVDE) is an international library-driven initiative which brings together the bibliographic catalogues and authority files of a community of libraries in an innovative entity discovery environment based on linked data. A path towards the integration of SVDE with the local library services at the University of Pennsylvania and with the Sinopia environment is ongoing. Being a linked open data node, SVDE supports various levels of interoperability and also provides additional tools like the J.Cricket entity editor based on BIBFRAME that opens up new forms of cooperation among libraries to manage and maintain linked data entities.
OCFL: A Shared Approach to Preservation Persistence (Simeon Warner)
A lightning talk at the CNI Fall Forum 2022: The Oxford Common File Layout (OCFL) is an application-independent method for storing and versioning content for digital preservation. Version 1.1 was released in October 2022, including backwards compatible corrections and clarifications based on implementation experience and community feedback. The session will recap goals, summarize changes in v1.1, and survey current implementations.
The Oxford Common File Layout: A common approach to digital preservation (Simeon Warner)
The Oxford Common File Layout (OCFL) specification began as a discussion at a Fedora/Samvera Camp held at Oxford University in September of 2017. Since then, it has grown into a focused community effort to define an open and application-independent approach to the long-term preservation of digital objects. Developed for structured, transparent, and predictable storage, it is designed to promote sustainable long-term access and management of content within digital repositories. This presentation will focus on the motivations and vision for the OCFL, explain key choices for the specification, and describe the status of implementation efforts.
Introduction to the International Image Interoperability Framework (IIIF) (Simeon Warner)
Introduction to the International Image Interoperability Framework (IIIF), Tutorial at Library Network Days, National Library of Finland, Helsinki, 2017-10-26
Who's the Author? Identifier soup - ORCID, ISNI, LC NACO and VIAF (Simeon Warner)
Identifiers, including ORCID, ISNI, LC NACO and VIAF, are playing an increasing role in library authority work. We'll describe changes to cataloging practices to leverage identifiers. We'll then tell a short story of the how and why of ORCID identifiers for researchers, and relationships with other person identifiers. Finally, we'll discuss the use of identifiers as part of moves toward linked data cataloging being explored in Linked Data for Libraries work (in the LD4L Labs and LD4P projects).
PHP Frameworks: I want to break free (IPC Berlin 2024) (Ralf Eggert)
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Search and Society: Reimagining Information Access for Radical Futures (Bhaskar Mitra)
The field of information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Key Trends Shaping the Future of Infrastructure.pdf (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... (Jeffrey Haguewood)
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 (Tobias Schneck)
As AI technology pushes into IT, I was wondering, as an “infrastructure container Kubernetes guy”, how does this fancy AI technology get managed from an infrastructure operations view? Is it possible to apply our lovely cloud native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and give you a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need in order to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could be beneficial for or limiting your AI use cases in an enterprise environment. An interactive demo will give you some insights into what approaches I have already got working for real.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... (Ramesh Iyer)
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes real work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
ResourceSync: Web-based Resource Synchronization
1. ResourceSync: Web-based Resource Synchronization
Simeon Warner (Cornell)
Open Repositories 2012, Edinburgh, 11 July 2012
2. Core team -- Todd Carpenter (NISO), Bernhard Haslhofer (Cornell University), Martin Klein (Los Alamos National Laboratory), Nettie Lagace (NISO), Carl Lagoze (Cornell University), Peter Murray (NISO), Michael L. Nelson (Old Dominion University), Robert Sanderson (Los Alamos National Laboratory), Herbert Van de Sompel (Los Alamos National Laboratory), Simeon Warner (Cornell University)
Team members -- Richard Jones (JISC/Cottage Labs), Stuart Lewis (JISC/Cottage Labs), Graham Klyne (JISC), Shlomo Sanders (Ex Libris), Kevin Ford (LoC), Ed Summers (LoC), Jeff Young (OCLC), David Rosenthal (Stanford)
Funding -- The Sloan Foundation (core team) and JISC (UK participation)
Thanks for slides from -- Stuart Lewis, Herbert Van de Sompel
3. Synchronize what?
• Web resources – things with a URI that can be dereferenced and are cache-able (no dependency on underlying OS, technologies, etc.)
• Small websites/repositories (a few resources) to large repositories/datasets/linked data collections (many millions of resources)
• That change slowly (weeks/months) or quickly (seconds), and where latency needs may vary
• Focus on needs of research communication and cultural heritage organizations, but aim for generality
4. Why?
… because lots of projects and services are doing synchronization but have to roll their own on a case-by-case basis!
• Project team involved with projects that need this
• Experience with OAI-PMH: widely used in repos but
o XML metadata only
o Web technology has moved on since 1999
• Data / Metadata / Linked Data – Shared solution?
8. Use case: DBpedia Live duplication
• 20M entries updated @ 1/s though sporadic
• Want low latency => need a push technology
9. Use case: arXiv mirroring
• 1M article versions, ~800/day created or updated at 8pm US eastern time
• Metadata and full-text for each article
• Accuracy important
• Want low barrier for others to use
• Look for more general solution than current homebrew mirroring (running with minor modifications since 1994!) and occasional rsync (filesystem layout specific, auth issues)
10. Terminology
• Resource: an object to be synchronized, a web resource
• Source: system with the original or master resources
• Destination: system to which resources from the source will be copied and kept in synchronization
• Pull: process to get information from source to destination, initiated by the destination
• Push: process to get information from source to destination, initiated by the source (and some subscription mechanism)
• Metadata: information about resources such as URI, modification time, checksum, etc. (not to be confused with resources that may themselves be metadata records)
11. Three basic needs
1. Baseline synchronization – A destination must be able to perform an initial load or catch-up with a source
- avoid out-of-band setup; provide discovery
2. Incremental synchronization – A destination must have some way to keep up-to-date with changes at a source
- subject to some latency; minimal: create/update/delete
3. Audit – It should be possible to determine whether a destination is synchronized with a source
- subject to some latency; want efficiency > HTTP HEAD
12. Baseline synchronization
Either
• Get inventory of resources and then copy them one-by-one using HTTP GET
o simplest, inventory is list of resources plus perhaps metadata
o inventory format?
or
• Get dump of resources and all necessary metadata
o more efficient: reduce number of round trips
o dump format?
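A minimal sketch of the first option, pull-based baseline synchronization from a plain Sitemap inventory (the Sitemap URI and flat destination layout are hypothetical; error handling omitted):

import os
import urllib.request
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"  # Sitemap namespace

def baseline_sync(sitemap_uri, dest_dir):
    # Get the inventory (Sitemap), then copy each resource with HTTP GET.
    os.makedirs(dest_dir, exist_ok=True)
    with urllib.request.urlopen(sitemap_uri) as f:
        tree = ET.parse(f)
    for url in tree.iter(SM + "url"):
        loc = url.find(SM + "loc").text
        # Hypothetical layout: store each resource under its last path segment
        local = os.path.join(dest_dir, loc.rstrip("/").split("/")[-1])
        with urllib.request.urlopen(loc) as r, open(local, "wb") as out:
            out.write(r.read())

baseline_sync("http://example.org/sitemap.xml", "/tmp/mirror")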
13. Audit
Could do new Baseline synchronization and compare … but likely very inefficient! Optimize by adding:
• Get inventory and compare with copy at destination
o use timestamp, digest or other metadata in inventory to check content (effort ↔ accuracy tradeoff)
o latency depends on freshness of inventory and time to copy and check (easier to cope with if modification times included in metadata)
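For illustration, a fragment of how a destination might use inventory metadata for audit, assuming the inventory supplies an MD5 digest per resource (that per-resource digest is an assumed extension, not a fixed element of the spec):

import hashlib

def out_of_sync(local_path, inventory_md5):
    # True if the local copy is missing or its digest differs from the inventory.
    try:
        with open(local_path, "rb") as f:
            return hashlib.md5(f.read()).hexdigest() != inventory_md5
    except FileNotFoundError:
        return True  # not yet copied: must be fetched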
14. Incremental synchronization
Simplest method is Audit and then copy of all new/updated resources, plus removal of deleted resources. Optimize by adding:
• Change Communication – Exchange ChangeSet listing only updates
- How to understand sequence, schedule?
• Resource Transfer – Exchange dumps for ChangeSets or even diffs appropriate to resource type
Change Memory necessary to record sequence or intermediate states.
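A sketch of the resulting destination loop, assuming change events have already been parsed into (URI, event type) pairs using the create/update/delete types described on slide 25 (fetch and delete are caller-supplied callbacks):

def apply_changes(events, fetch, delete):
    # Apply an ordered list of change events at the destination.
    for loc, event in events:
        if event in ("create", "update"):
            fetch(loc)    # copy new/updated resource with HTTP GET
        elif event == "delete":
            delete(loc)   # remove the local copy

# e.g. apply_changes([("http://example.org/res1", "update")], fetch, delete)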
18. A framework based on Sitemaps
• Modular framework allowing selective deployment
• Sitemap is the most basic component of the framework
• Reuse Sitemap form for changesets and notifications (same <url> element describing resource)
• Selective synchronization via tagging
• Discovery of capabilities via <atom:link>!
• Further extension possible
20. Level zero → Publish a Sitemap
• Periodic publication of an up-to-date Sitemap is base level implementation
• Use Sitemap <url> as is with <loc> and <lastmod> as core elements for each Resource
o Introduce optional extra elements to convey fixity information, size, tags for selective synchronization, etc.
• Extend to:
o Convey Source capabilities, discovery information, locations of dumps, locations of changesets, change memory, etc.
o Provide timestamp and/or additional metadata for the Sitemap
22. Two resources, with lastmod times, sizes and digests. The second also with a tag.
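The XML shown on this slide did not survive extraction; the following is a plausible reconstruction based on the elements named in slides 20 and 23 (the rs: namespace URI, element names, and all values are illustrative, not normative):

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url>
    <loc>http://example.org/res1</loc>
    <lastmod>2012-07-01T12:00:00Z</lastmod>
    <rs:size>8876</rs:size>
    <rs:md5>89ab00aa2c8a9f5e4bf2c5c3b8a5e2dc</rs:md5>
  </url>
  <url>
    <loc>http://example.org/res2</loc>
    <lastmod>2012-07-03T09:30:00Z</lastmod>
    <rs:size>14599</rs:size>
    <rs:md5>0f1d8cbb4a5c2d9e7a6b3c4d5e6f7a8b</rs:md5>
    <rs:tag>images</rs:tag>
  </url>
</urlset>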
23. Sitemap details & issues
• Sitemap XML format designed to allow extension
• ResourceSync additions:
o Additional core elements in ResourceSync namespace (digest, size, update information)
o Discovery information using <atom:link> elements
• Use existing Sitemap Index scheme for large sets of resources (handles up to 2.5 billion resources before further extension required)
• Provide mapping to RDF semantics but keep XML simple
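The Sitemap Index scheme referenced here is the standard one: an index lists up to 50,000 child sitemaps of up to 50,000 URLs each, which gives the 2.5 billion resource ceiling. For example (URLs hypothetical):

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://example.org/sitemap-00001.xml</loc>
    <lastmod>2012-07-01T12:00:00Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://example.org/sitemap-00002.xml</loc>
    <lastmod>2012-07-01T12:00:00Z</lastmod>
  </sitemap>
</sitemapindex>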
25. ChangeSet
• Reuse Sitemap format but include information only for change events over a certain period:
• One <url> element per change event
• The <url> element uses <loc> and <lastmod> as is and is extended with:
• an event type to express create/update/delete
• an optional event id to provide a unique identifier for the event
• can further extend to include fixity, tag info, Memento TimeGate link, special-purpose access-point, etc.
• Introduce minimal <urlset>-level extensions to support:
• Navigation between ChangeSets via <atom:link>
• Timestamping the ChangeSet
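A sketch of what such a ChangeSet might look like, reusing the Sitemap <urlset> form with one <url> per change event (the rs:change element name, namespace URI, and values are illustrative assumptions):

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/"
        xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:link rel="prev" href="http://example.org/changeset-0041.xml"/>
  <url>
    <loc>http://example.org/res2</loc>
    <lastmod>2012-07-10T08:00:00Z</lastmod>
    <rs:change>update</rs:change>
  </url>
  <url>
    <loc>http://example.org/res3</loc>
    <lastmod>2012-07-10T08:05:00Z</lastmod>
    <rs:change>delete</rs:change>
  </url>
</urlset>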
26. Expt: arXiv – Inventory and ChangeSet
• Baseline synchronization and Audit (Inventory):
o 2.3M resources (300GB content)
o 46 sitemaps and 1 sitemapindex (50k resources/sitemap)
o sitemaps ~9.3MB each -> 430MB total uncompressed; 1.7MB each -> 78MB total if gzipped (<0.03% content size)
• Incremental synchronization (ChangeSet):
o arXiv has updates daily @ 8pm so create daily ChangeSet
o ~1k additions and 700 updates per day
o 1 sitemap ~300kB or 20kB gzipped, can be generated and served statically
o keep chain of ChangeSets, link with <atom:link>
28. Change Communication: Push via XMPP
• Rapid notification of change events via XMPP PubSub node; one notification per event
• Each change event is conveyed using a Sitemap <url> element contained in a dedicated XMPP <item> wrapper
• Use same resource metadata (e.g. <loc>, <lastmod>) and same extensions as with changesets
• Multiple change events can be grouped into a single XMPP message (using <items>)
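Putting those pieces together, a hypothetical notification could look like the following XMPP pubsub event (XEP-0060 conventions), with a Sitemap <url> element as the payload inside the <item> wrapper; the node name, ids, addresses, and the rs:change element are illustrative assumptions:

<message from="pubsub.example.org" to="subscriber@example.net">
  <event xmlns="http://jabber.org/protocol/pubsub#event">
    <items node="resourcesync">
      <item id="change-12345">
        <url xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
             xmlns:rs="http://www.openarchives.org/rs/terms/">
          <loc>http://example.org/res2</loc>
          <lastmod>2012-07-10T08:00:00Z</lastmod>
          <rs:change>update</rs:change>
        </url>
      </item>
    </items>
  </event>
</message>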
29. Expt: LiveDBpedia with XMPP Push
• LANL Research Library ran a significant scale experiment in synchronization of the LiveDBpedia database from Los Alamos to two remote sites using XMPP to push change notifications
o Push for change communication only, content then obtained with HTTP GET
• Destination sites were able to keep in close synchronization with sources
o Maximum queued updates <400 over 6 runs with 100k updates; and bursty updates averaging ~1/s
o Small number of errors suggests use for audit in many real-life situations
30. Dumps
Optimization over making repeated HTTP GET requests for multiple resources. Use for baseline and changeset. Options:
1. ZIP+Sitemap
o simple and ZIP very widely used
o consistent inventory/changeset format
o con: “custom”
2. WARC
o designed for exactly this purpose
o con: little used outside web archiving community
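A minimal sketch of option 1, packaging resources together with the Sitemap that inventories them (archive entry names and arguments are hypothetical):

import zipfile

def write_dump(dump_path, sitemap_xml, resources):
    # Write a ZIP+Sitemap dump: the inventory plus the resource payloads.
    # resources: mapping of archive path -> content bytes
    with zipfile.ZipFile(dump_path, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("sitemap.xml", sitemap_xml)
        for name, data in resources.items():
            z.writestr(name, data)

write_dump("dump.zip", "<urlset>...</urlset>", {"res1": b"payload"})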
32. Timeline and input
• July 2012 – First draft of sitemap-based spec (SOON)
• August 2012 – Publicize and solicit feedback (will be NISO email list)
• September 2012 – Revise, more experiments, more feedback
• December 2012 – Finalize specification (?)
• NISO webspace
• Code on github: http://github.org/resync/simulator