Data ingestion and distribution with Apache NiFi - Lev Brailovskiy
In this session, we will cover our experience working with Apache NiFi, an easy-to-use, powerful, and reliable system for processing and distributing large volumes of data. The first part of the session will be an introduction to Apache NiFi, going over its main components, building blocks, and functionality.
In the second part of the session, we will show our use case for Apache NiFi and how it's being used inside our Data Processing infrastructure.
K8s in 3h - Kubernetes Fundamentals Training - Piotr Perzyna
Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications. This training helps you understand key concepts within 3 hours.
Docker - a tool to simplify software development and deployment - sdenier
This presentation is aimed at beginners as well as Docker users looking to discover new aspects of the tool.
- Docker characteristics and ecosystem
- use cases: creating automated development environments, container deployment and orchestration, Docker on Windows
Presentation given as part of Festival Transfo 2019: http://www.festival-transfo.fr/evenement/145/14-docker-un-outil-pour-faciliter-le-developpement-et-le-deploiement-informatique.htm
Join the Sogilis "Matinales techniques" meetup: https://www.meetup.com/Les-matinales-techniques-de-Sogilis
Creating a Virtual Library space using free web tools - S. L. Faisal
An introduction to selected web tools useful for creating a virtual library space. The tools include WordPress, Wakelet, SoundCloud, Linktree, Facebook, Twitter, YouTube, Padlet, Flipgrid, ReadWorks, Book Creator, and Storyweaver.
Dataflow Management From Edge to Core with Apache NiFi - DataWorks Summit
What is “dataflow?” — the process and tooling around gathering necessary information and getting it into a useful form to make insights available. Dataflow needs change rapidly — what was noise yesterday may be crucial data today, an API endpoint changes, or a service switches from producing CSV to JSON or Avro. In addition, developers may need to design a flow in a sandbox and deploy to QA or production — and those database passwords aren’t the same (hopefully). Learn about Apache NiFi — a robust and secure framework for dataflow development and monitoring.
Abstract: Identifying, collecting, securing, filtering, prioritizing, transforming, and transporting abstract data is a challenge faced by every organization. Apache NiFi and MiNiFi allow developers to create and refine dataflows with ease and ensure that their critical content is routed, transformed, validated, and delivered across global networks. Learn how the framework enables rapid development of flows, live monitoring and auditing, data protection and sharing. From IoT and machine interaction to log collection, NiFi can scale to meet the needs of your organization. Able to handle both small event messages and “big data” on the scale of terabytes per day, NiFi will provide a platform which lets both engineers and non-technical domain experts collaborate to solve the ingest and storage problems that have plagued enterprises.
Expected prior knowledge / intended audience: developers and dataflow managers interested in learning about and improving their dataflows. The intended audience does not need experience in designing and modifying data flows.
Takeaways: Attendees will gain an understanding of dataflow concepts, data management processes, and flow management (including versioning, rollbacks, promotion between deployment environments, and various backing implementations).
Current uses: I am a committer and PMC member for the Apache NiFi, MiNiFi, and NiFi Registry projects and help numerous users deploy these tools to collect data from an incredibly diverse array of endpoints, aggregate, prioritize, filter, transform, and secure this data, and generate actionable insight from it. Current users of these platforms include many Fortune 100 companies, governments, startups, and individual users across fields like telecommunications, finance, healthcare, automotive, aerospace, and oil & gas, with use cases like fraud detection, logistics management, supply chain management, machine learning, IoT gateway, connected vehicles, smart grids, etc.
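The routing, filtering, and transformation ideas described above can be illustrated with a small standard-library Python sketch. This is a concept illustration only, not NiFi's actual API (NiFi flows are built from configured processors, not hand-written code), and the record formats here are invented:

```python
import csv
import io
import json

def route_and_transform(raw_records):
    """Toy dataflow: detect each record's format (JSON vs. headered CSV),
    normalize it to a dict, and route unparseable records to a failure queue.
    (Hypothetical sketch; real flows would add provenance, back-pressure, etc.)"""
    routed = {"parsed": [], "failed": []}
    for raw in raw_records:
        try:
            if raw.lstrip().startswith("{"):  # looks like a JSON object
                routed["parsed"].append(json.loads(raw))
            else:                             # assume CSV with a header row
                routed["parsed"].extend(csv.DictReader(io.StringIO(raw)))
        except (json.JSONDecodeError, csv.Error):
            routed["failed"].append(raw)      # route to the failure relationship
    return routed

result = route_and_transform(['{"id": 1, "temp": 21.5}', "id,temp\n2,19.0"])
```

The point of the sketch is the routing decision per record: in NiFi the same decision would be expressed as processors and connections on the flow canvas rather than in code.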
We will introduce Airflow, an Apache project for scheduling and workflow orchestration. We will discuss use cases, applicability, and how best to use Airflow, mainly in the context of building data engineering pipelines. We have been running Airflow in production for about two years, and we will also go over some learnings, best practices, and tools we have built around it.
Speakers: Robert Sanders, Shekhar Vemuri
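The core idea behind Airflow-style orchestration is running tasks in an order that respects a dependency graph (a DAG). As a rough, hedged illustration using only the standard library (this is not Airflow's API; task names are invented):

```python
from graphlib import TopologicalSorter

# A toy pipeline: extract feeds transform and validate, which both feed load.
# (In Airflow these would be operators wired into a DAG object.)
dag = {
    "transform": {"extract"},          # transform depends on extract
    "validate":  {"extract"},
    "load":      {"transform", "validate"},
}

def run_pipeline(dag):
    """Execute tasks in a dependency-respecting order."""
    executed = []
    for task in TopologicalSorter(dag).static_order():
        executed.append(task)          # a real runner would invoke the task here
    return executed

order = run_pipeline(dag)
```

A scheduler like Airflow layers retries, scheduling intervals, and distributed execution on top of exactly this kind of topological ordering.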
Koha is the world's leading free, open-source library management solution, available for libraries of all types and sizes.
OpenLX is the largest service provider for Koha in and around India.
Open Source and Accessibility - t12t meetup 181122 - Erik Zetterström
How can accessibility benefit from open source solutions? And how do you build a business based on open source solutions? Current success stories and an exclusive peek at the future through Erik's crystal ball.
Slides accompanying a presentation by Dan Gillean, delivered at the Glenstone Digital Preservation Roundtable in Potomac, Maryland, November 4th, 2016.
These slides introduce Archivematica's approach to supporting digital preservation workflows, and our development philosophy behind the application.
Building the Future Together: AtoM3, Governance, and the Sustainability of Op... - Artefactual Systems - AtoM
Slides accompanying a presentation given by Dan Gillean on June 7th, 2018 at Open Repositories 2018, held in Bozeman, MT.
Access to Memory (AtoM) is a web-based open source application for standards-based description and access. AtoM was first released in 2008, and much of the codebase now relies on deprecated frameworks and libraries; at the same time, new standards and technologies are changing how our profession approaches description and access. Currently Artefactual Systems, a Canadian-based company, uses a services model to support the project. Artefactual is looking ahead to AtoM3 and considering building a linked-data-driven platform for archival description and access. As we consider AtoM's next generation, we are also examining governance and maintenance models to sustain the project and better empower our user community, as Artefactual wasn't originally intended to be AtoM's organizational home. This presentation will offer some thoughts on existing open source project governance models, challenges, and possibilities for the future. How do we ensure community engagement and project sustainability over time?
Slides accompanying a presentation delivered at the VII Congresso Nacional de Arquivologia in Fortaleza, Brazil, on October 19th, 2016. The slides provide an overview of the AtoM project's history, its maintenance by Artefactual, and its development philosophy, before proceeding to examine the application as a component used in a digital preservation ecosystem. Aspects of ISO 16363:2012, the Audit and Certification of Trustworthy Digital Repositories standard, are used to evaluate how AtoM can support description, management, administration, and access functions when used to maintain a chain of custody in a trustworthy digital repository ecosystem.
Slides for Culture Hack panel @SXSW2013: http://schedule.sxsw.com/2013/events/event_IAP4580
Some slides re-used from Harry Verwayen (http://www.slideshare.net/hverwayen/business-model-innovation-open-data) and Julia Fallon
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 - Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Generative AI Deep Dive: Advancing from Proof of Concept to Production - Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Pushing the limits of ePRTC: 100ns holdover for 100 days - Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
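To ground the idea of XPath-driven selection and transformation of XML content, here is a minimal sketch using Python's standard library rather than the XSLT/Schematron tooling the talk itself covers. The sample document and its element names are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (invented for this example).
doc = ET.fromstring("""
<articles>
  <article lang="en"><title>XML and AI</title></article>
  <article lang="de"><title>XSLT Grundlagen</title></article>
</articles>""")

# ElementTree supports a small XPath subset; select English titles only.
titles = [a.findtext("title") for a in doc.findall("article[@lang='en']")]
```

A full XSLT processor evaluates far richer XPath expressions, and AI-defined XPath extension functions extend that vocabulary further; the selection-then-transformation pattern is the same.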
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Communications Mining Series - Zero to Hero - Session 1 - DianaGray10
This session provides an introduction to UiPath Communications Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communications Mining as we go over the platform with you. Topics covered:
• Communications Mining overview
• Why is it important?
• How it can help today’s business, and the benefits
• Phases in Communications Mining
• Demo of the platform
• Q/A
GraphRAG is All You Need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Climate Impact of Software Testing at Nordic Testing Days - Kari Kakkonen
My slides at Nordic Testing Days, 6.6.2024.
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
2. What is it?
Collections management software for museums and archives.
Collections presentation software providing a framework for web and kiosk applications; includes media clients such as a high-resolution image viewer and audio/video player, and can transcode video & audio formats.
A collaboration between Whirl-i-Gig and partner institutions in N. America & Europe.
Freely available under the open source GNU Public License (GPL).
3. Two Versions
Version 0.5x (current version = 0.55)
Currently available
Originally named OpenCollection
First deployed in 2004
In use at 25+ sites (that we know of) worldwide
Version 0.6/1.0 (aka “Providence”)
Expected release in first half of 2009
Addresses many of the limitations of the 0.5x model that were exposed in 4+ years of real-world use in a variety of settings
4. 0.5x Features
Entirely web-based.
Integrated digital asset management: support for many media and document formats.
Extensive support for authority lists and controlled vocabularies.
Configurable (but limited) support for metadata standards.
Direct web presence with CA-Access.
Georeferencing/GIS support.
Can run on Linux/Unix, Mac OS X and Windows servers.
13. Improvements in 0.6
Localization: user interface can be in many languages.
Multilingual cataloguing: all fields support translation into multiple languages.
No longer object-centric: object and authority items (people, geographic places, events, film productions) can be given equal importance in the user interface.
Configurable schema: all fields are now configurable. No more hardcoded fields.
Compound fields: configurable fields can be composed of many values, each having a specific type (e.g. text, date range, number, predefined pick-list, type-ahead lookup into an authority or web service). This makes PBCore support possible.
Pre-configured standards: can be automatically configured to support various metadata schemes via configuration profiles. Initial support for PBCore, Dublin Core, CEN/TC-372, CA 0.5x compatibility mode and a selection of use-specific custom schemas (e.g. location-based photo archive, documentary archive, exhibition archive). You can write your own.
14. Improvements in 0.6
Overhauled user interface: fewer clicks, easier navigation. Uses shiny new browser features that weren’t available in 2003-2004 when the 0.5x UI was designed.
New media types: adding built-in support for image formats such as DPX (digital projection).
“Pluggable” search engine: can use any back-end search engine for which a plug-in has been written. Five engines are being developed for the first release: PHP Lucene, Apache SOLR, Sphinx, MySQL FullText and MySQL inverted index. A new engine can be employed without having to rewrite the core application.
Extensibility: support for implementation of custom plugins for parsing and transforming media; user authentication; new value types for fields (e.g. custom web-service lookups); generation of accession/ID numbers; hooks into the UI for additional functionality; file storage API (planned) to allow for Fedora support.
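A pluggable back end like the search-engine layer described above is typically built around a small plug-in contract that the core application codes against. The sketch below is a hedged Python illustration; the class and method names are invented and are not CollectiveAccess's actual (PHP) plug-in API:

```python
from abc import ABC, abstractmethod

class SearchEngine(ABC):
    """Minimal plug-in contract: any back end implementing these two
    methods can be swapped in without rewriting the core application."""

    @abstractmethod
    def index(self, doc_id, text): ...

    @abstractmethod
    def search(self, term): ...

class InvertedIndexEngine(SearchEngine):
    """Trivial in-memory inverted index standing in for Lucene, SOLR, etc."""

    def __init__(self):
        self._index = {}

    def index(self, doc_id, text):
        for word in text.lower().split():
            self._index.setdefault(word, set()).add(doc_id)

    def search(self, term):
        return sorted(self._index.get(term.lower(), set()))

engine: SearchEngine = InvertedIndexEngine()  # core code sees only the interface
engine.index("obj1", "Film archive catalogue")
engine.index("obj2", "Photo archive")
hits = engine.search("archive")
```

The design payoff is the one named in the slide: because callers depend only on the abstract interface, a new engine is a new subclass, not a change to the core.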
15. Better Public Access in 0.6
Faceted browsing: browse collections with selective filtering, à la http://www.lost-films.eu/films.
User-provided content: support for user tagging, commenting and submission of resources.
Improved time-based media presentation: new Flash-based media player provides display of time-based cataloguing during playback and can display synchronized media (e.g. images during an audio interview).
Curated sets: tools for creating ordered, annotated sets of objects or authority items and presenting these sets as slideshows, timelines and maps.
Tours: tools for creating location-based “tours” of collections.
All of these features are being developed for existing public access projects and will be open-sourced by their sponsors.
18. Open Source?
All software is free to download and use. There is no commercial aspect to the project.
GNU General Public License version 2 (GPLv2): do what you want with the software. Forever.
Source code is included: it gives you the freedom and ability to modify the software to suit your needs.
The software can never be orphaned, as the user community has the means (source code and legal rights) to fix bugs and maintain compatibility.
GPLv2 gives you the right to distribute your modifications so long as source code is included.
19. History
The project was begun in 2003 by Whirl-i-Gig, with roots in web-based cataloguing systems developed in the 1990s.
First users start working in 2005.
February 2007: first public release.
November 2008: name change from OpenCollection to CollectiveAccess.
Today: 25 institutional users (that we know about).
February 2009: first five sites begin using 0.6 for work. These include two film archives, a “digital memory” project, a catalogue raisonné and an archive documenting the physical remains of the World Trade Center in NY.
Summer 2009: first public release of 0.6.
20. Selected users
Royal Museum for Central Africa, Brussels
Deutsche Kinemathek, Berlin
Center for Biodiversity Conservation, American Museum of Natural History, New York
Northeast Historic Film, Bucksport, ME
The Parrish Art Museum, Southampton, NY
The Frick Collection, New York
Museum of Jewish Heritage, New York
National Museum of Women in the Arts, Washington, D.C.
Hansen’s Sno-Bliz, New Orleans, LA
21. Types of collections
Fine Art
Film
Technology
Architectural design archives
Costumes and clothing
Anthropology/ethnographic collections
Biodiversity conservation (field photographs)
Oral history
Exhibition asset management
Corporate archives
Photography
Historical societies
22. Support
Whirl-i-Gig’s work on CA is directly funded by users; all developed code is contractually covered by the GPL.
Indirect funding through our related work in cultural heritage and the natural sciences.
Indirect support from Kulturstiftung des Bundes, Bundeszentrale für politische Bildung, IMLS, NEH, NEA, the New York State Council on the Arts (NYSCA) and the New York City Department of Cultural Affairs through project partners.
23. Overarching goals
Develop a broad-based international user community.
Establish CA as a viable platform for the widest practical range of uses in as many locales as possible.
Develop support infrastructure: net-based community support as well as local consultants.
Establish productive collaborations with complementary projects.
24. Thank you!
Questions or comments?
Contact: Seth Kaufman (seth@CollectiveAccess.org)
For more information on CollectiveAccess: http://www.CollectiveAccess.org