This document provides a summary of 46 Apache content-related projects and 8 content-related incubating projects. It covers projects for transforming and reading content, such as Apache PDFBox, POI, and Tika; projects for text and language analysis, such as UIMA, OpenNLP, and Mahout; projects that work with structured and linked data, such as Any23, Stanbol, and Jena; projects for data management and processing on Hadoop, such as MRQL, DataFu, and Falcon; projects for serving content, such as HTTPD Server, TrafficServer, and Tomcat; projects focused on generating content, such as OpenOffice, Forrest, and Abdera; and projects for working with hosted content, such as Chemistry and ManifoldCF.
Large Scale ETL for Hadoop and Cloudera Search using Morphlines (whoschek)
Cloudera Morphlines is a new, embeddable, open source Java framework that reduces the time and skills necessary to integrate and build Hadoop applications that extract, transform, and load data into Apache Solr, Apache HBase, HDFS, enterprise data warehouses, analytic online dashboards, or other consumers. If you want to integrate, build, or facilitate streaming or batch transformation pipelines without programming and without MapReduce skills, and get the job done with a minimum amount of fuss and support costs, Morphlines is for you.
In this talk, you'll get an overview of Morphlines internals and explore sample use cases that can be widely applied.
http://sigir2013.ie/industry_track.html#GrantIngersoll
Abstract: Apache Lucene and Solr are the most widely deployed search technology on the planet, powering sites like Twitter, Wikipedia, Zappos and countless applications across a large array of domains. They are also free, open source, extensible and extremely scalable. Lucene and Solr also contain a large number of features for solving common information retrieval problems ranging from pluggable posting list compression and scoring algorithms to faceting and spell checking. Increasingly, Lucene and Solr also are being (ab)used to power applications going way beyond the search box. In this talk, we'll explore the features and capabilities of Lucene and Solr 4.x, as well as look at how to (ab)use your search engine technology for fun and profit.
Search in the Apache Hadoop Ecosystem: Thoughts from the Field (Alex Moundalexis)
This presentation describes the Hadoop ecosystem and gives examples of how these open source tools are combined and used to solve specific and sometimes very complex problems. Drawing upon case studies from the field, Mr. Moundalexis demonstrates that one-size, rigid traditional systems don’t fit all, but that combinations of tools in the Apache Hadoop ecosystem provide a versatile and flexible platform for integrating, finding, and analyzing information.
Presentation on Taming Text from the March 18th, 2013 Triangle Java User Group meeting. The presentation covers search, question answering, clustering, classification, named entity recognition, etc. See http://www.manning.com/ingersoll for more.
Building a Large Scale SEO/SEM Application with Apache Solr (Rahul Jain)
Slides from my talk "Building a Large Scale SEO/SEM Application with Apache Solr" at Lucene/Solr Revolution 2014, where I discuss how we handle indexing/search of 40 billion records (documents) per month in Apache Solr with 4.6 TB of compressed index data.
Abstract: We are building a SEO/SEM application where an end user searches for a "keyword" or a "domain" and gets all the insights about these, including search engine ranking, CPC/CPM, search volume, number of ads, competitor details, etc., in a couple of seconds. To provide this intelligence, we gather huge amounts of web data from various sources; after intensive processing, it amounts to 40 billion records/month in a MySQL database, with 4.6 TB of compressed index data in Apache Solr.
Due to the large volume, we faced several challenges while improving indexing performance, search latency, and scaling the overall system. In this session, I will talk about our various design approaches for importing data faster from MySQL, tricks and techniques to improve indexing performance, distributed search, DocValues (a life saver), Redis, and the overall system architecture.
ElasticSearch in Production: lessons learned (BeyondTrees)
With ProQuest Udini, we have created the world's largest online article store, and aim to be the center for researchers all over the world. We connect to a 700M-document Solr cluster for search, but have recently also implemented a search component with ElasticSearch. We will discuss how we did this, and how we want to use the 30M index for scientific citation recognition. We will highlight lessons learned integrating ElasticSearch in our virtualized EC2 environments, and challenges aligning it with our continuous deployment processes.
A 1-hour intro to search, Apache Lucene and Solr, and LucidWorks Search. Contains a quick start with LucidWorks Search and a demo using financial data (see the GitHub project: http://bit.ly/lws-financial), as well as some basic vocabulary and search explanations.
Introduction to Solr, presented at Bangkok meetup in April 2014:
http://www.meetup.com/bkk-web/events/172090992/
Covers high-level use-cases for Solr. Demos include support for Thai language (with GitHub link for source).
Includes slides showcasing the Solr ecosystem, as well as a couple of ideas for possible Solr-specific learning projects.
Aaron Cordova outlines how Accumulo helps provide the essential features of a "Data Lake": a system in which all types of data from all sources can be imported, secured, analyzed, and delivered to decision makers.
Solving real world data problems with Jerakia (Craig Dunn)
This is the talk I gave at Config Management Camp 2016 in Ghent, introducing Jerakia as a lookup tool that can be used in place of, or alongside, Hiera to solve some of the edge cases around data separation.
An overview of all the different content-related technologies at the Apache Software Foundation
Talk from ApacheCon NA 2010 in Atlanta in November 2010
A talk given by Ted Dunning in February 2013 on Apache Drill, an open-source, community-driven project to provide easy, dependable, fast and flexible ad hoc query capabilities.
Introducing Apache Kafka and why it is important to Oracle, Java and IT profe... (Lucas Jellema)
Events are playing an increasingly important role in modern application architecture. They represent fast, streaming data, they fuel the interaction between microservices, and they are at the core of CQRS and event sourcing. Apache Kafka has quickly emerged as the de facto standard event platform: open source, cross technology, reliable, extremely scalable, and available on any platform, in Docker, and from the major cloud platforms, including Oracle Cloud's Event Hub service. This session explains the what, why and how of Apache Kafka. What role does it play, how is it used, and what are the challenges and tricks for real-life applications? How does it fit in with Oracle Database and Fusion Middleware, and with Oracle Public Cloud? In several demos, Kafka is seen at work: in real-time streaming event analysis through KSQL, in CQRS and microservices scenarios, and with user interfaces updated in real time through events and HTML5 server-sent events.
This presentation includes a demonstration of remote database synchronization through Twitter.
Data Science at Scale: Using Apache Spark for Data Science at Bitly (Sarah Guido)
Given at Data Day Seattle 2015.
Bitly generates over 9 billion clicks on shortened links a month, as well as over 100 million unique link shortens. Analyzing data of this scale is not without its challenges. At Bitly, we have started adopting Apache Spark as a way to process our data. In this talk, I’ll elaborate on how I use Spark as part of my data science workflow. I’ll cover how Spark fits into our existing architecture, the kind of problems I’m solving with Spark, and the benefits and challenges of using Spark for large-scale data science.
From the Fast Feather Track at ApacheCon NA 2010 in Atlanta
This quick talk provides an overview of Apache Tika, looks at new features and supported file formats, then shows how to create a new parser, and finishes with using Tika from your own application.
Big Data Architecture Workshop - Vahid Amiri (datastack)
These slides cover big data tools, technologies and layers that can be used in enterprise solutions.
TopHPC Conference, 2019
A presentation from ApacheCon Europe 2015 / Apache Big Data Europe 2015
Apache Tika detects and extracts metadata and text from a huge range of file formats and types. From Search to Big Data, single file to internet scale, if you've got files, Tika can help you get out useful information!
Apache Tika has been around for nearly 10 years now, and in that time, a lot has changed. Not only has the number of formats supported gone up and up, but the ways of using Tika have expanded, and some of the philosophies on the best way to handle things have altered with experience. Tika has gained support for a wide range of programming languages too, and more recently, Big-Data scale support, and ways to automatically compare the effects of changes to the library.
Whether you're an old-hand with Tika looking to know what's hot or different, or someone new looking to learn more about the power of Tika, this talk will have something in it for you!
Global introduction to Elasticsearch, presented at a BigData meetup.
Use cases, getting started, REST CRUD API, Mapping, Search API, Query DSL with queries and filters, Analyzers, Analytics with facets and aggregations, Percolator, High Availability, Clients & Integrations, ...
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr (Sease)
The University Seminar series aims to provide a basic understanding of open source Information Retrieval and its application in the real world through the Apache Lucene/Solr technologies.
ODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and Spark (Carolyn Duby)
ODSC East 2017 - How to use Zeppelin and Spark to document your research.
Reproducible research documents not just the findings of a study but the exact code required to produce those findings. Reproducible research is a requirement for study authors to reliably repeat their analysis or accelerate new findings by applying the same techniques to new data. The increased transparency allows peers to quickly understand and compare the methods of the study to other studies, and can lead to higher levels of trust, interest and eventually more citations of your work.

Big data introduces some new challenges for reproducible research. As our data universe expands and the open data movement grows, more data is available than ever to analyze, and the possible combinations are infinite. Data cleaning and feature extraction often involve lengthy sequences of transformations. The space allotted for publications is not adequate to effectively describe all the details, so they can be reviewed and reproduced by others. Fortunately, the open source community is addressing this need with Apache Spark, Zeppelin and Hadoop. Apache Spark 2.0 makes it even simpler and faster to harness the power of a Hadoop computing cluster to clean, analyze, explore and train machine learning models on large data sets. Zeppelin web-based notebooks capture and share code and interactive visualizations with others.

After this session you will be able to create a reproducible data science pipeline over large data sets using Spark, Zeppelin, and a Hadoop distributed computing cluster. Learn how to combine Spark with other supported interpreters to codify your results from cleaning to exploration to feature extraction and machine learning. Discover how to share your notebooks and data with others using the cloud. This talk will cover Spark and show examples, but it is not intended to be a complete tutorial on Spark.
A comprehensive overview of Hadoop operations and tools: cluster management, coordination, ingestion, streaming, formats, storage, resources, processing, workflow, analysis, search and visualization
Turning XML to XLS on the JVM, without losing your Sanity, with Groovy (gagravarr)
You've got an XML file. You need an XLS or XLSX spreadsheet. You need it urgently. How do you turn one into the other, without going mad? Groovy on the Java JVM to the rescue!
But we're already open source! Why would I want to bring my code to Apache? (gagravarr)
From ApacheCon Europe 2015 in Budapest
So, your business has already open sourced some of its code? Great! Or you're thinking about it? That's fine! But now, someone's asking you about giving it to these Apache people? What's up with that, and why isn't just being open source enough?
In this talk, we'll look at several real world examples of where companies have chosen to contribute their existing open source code to the Apache Software Foundation. We'll see the advantages they got from it, the problems they faced along the way, why they did it, and how it helped their business. We'll also look briefly at where it may not be the right fit.
Wondering about how to take your business's open source involvement to the next level, and if contributing to projects at the Apache Software Foundation will deliver RoI, then this is the talk for you!
What's with the 1s and 0s? Making sense of binary data at scale - Berlin Buzz... (gagravarr)
If you have one or two files, you can take the time to manually work out what they are, what they contain, and how to get the useful bits out (probably....). However, this approach really doesn't scale, mechanical turks or no! Luckily, there are open source projects and libraries out there which can help, and which can scale!
In this talk, we'll first look at how we can work out what a given blob of 1s and 0s actually is, be it textual or binary. We'll then see how to extract common metadata from it, along with text, embedded resources, images, and maybe even the kitchen sink! We'll see how to use things like Apache Tika to do this, along with some other libraries to complement it. Once that part's all sorted, we'll look at how to roll this all out for a large-scale Search or Big Data setup, helping you turn those 1s and 0s into useful content at scale!
This talk was given at Berlin Buzzwords 2015
The ""Apache Way"" is the process by which Apache Software Foundation projects are managed. It has evolved over many years and has produced over 100 highly successful open source projects. But what is it and how does it work?
How Big is Big – Tall, Grande, Venti Data? (gagravarr)
Apache has a wide range of Big Data projects, some suitable for smaller problem sets, some which scale to huge problems. Today though, that one label "Big Data" can cause confusion for new users, as they may struggle to pick the right project for the right scale for their problem.
Do we need new titles for different kinds of Big Data? Does the buzz and VC funding cause confusion? Is the humble requirement dead? Or can we help new users better find the right Apache project for them?
But We're Already Open Source! Why Would I Want To Bring My Code To Apache? (gagravarr)
So, your business has already open sourced some of its code? Great! But now, someone's asking you about giving it to these Apache people? What's up with that, and why isn't just being open source enough?
In this talk, we'll look at several real world examples of where companies have chosen to contribute their existing open source code to the Apache Software Foundation. We'll see the advantages they got from it, the problems they faced along the way, why they did it, and how it helped their business. We'll also look briefly at where it may not be the right fit.
Wondering about how to take your business's open source involvement to the next level, and if contributing to projects at the Apache Software Foundation will deliver RoI, then this is the talk for you!
What's with the 1s and 0s? Making sense of binary data at scale with Tika and... (gagravarr)
If you have one or two files, you can take the time to manually work out what they are, what they contain, and how to get the useful bits out (probably....). However, this approach really doesn't scale, mechanical turks or no! Luckily, there are Apache projects out there which can help!
In this talk, we'll first look at how we can work out what a given blob of 1s and 0s actually is, be it textual or binary. We'll then see how to extract common metadata from it, along with text, embedded resources, images, and maybe even the kitchen sink! We'll see how to do all of this with Apache Tika, and how to dive down to the underlying libraries (including its Apache friends like POI and PDFBox) for specialist cases. Finally, we'll look a little bit about how to roll this all out on a Big Data or Large-Search case.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
- UI automation introduction
- UI automation sample
- Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I found myself wondering, as an "infrastructure container Kubernetes guy", how does this fancy AI technology get managed from an infrastructure operations point of view? Is it possible to apply our lovely cloud native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide you with a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it working from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and what could be beneficial or limiting for your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already gotten working for real.
Elevating Tactical DDD Patterns Through Object Calisthenics (Dorra BARTAGUIZ)
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don't know what they don't know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients' needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
4. Picking the “most interesting” ones
36 Projects in 45 minutes
With time for questions...
This is not a comprehensive guide!
5. Active Committer - ~3 of these projects
Committer - ~6 of these projects
User - ~12 of these projects
Interested - ~24 of these projects
My experience levels / knowledge
will vary from project to project!
6. Different Technologies
• Transforming and Reading
• Text and Language Analysis
• RDF and Structured
• Data Management and Processing
• Serving Content
• Hosted Content
But not: Storing Content
7. What can we get in 45 mins?
• A quick overview of each project
• Roughly how they fit together / cluster
into related areas
• When talks on the project are
happening at ApacheCon
• The project's URL, so you can look
them up and find out more!
• What interests me in the project
9. Apache PDFBox
http://pdfbox.apache.org/
• Read, Write, Create and Edit PDFs
• Create PDFs from text
• Fill in PDF forms
• Extract text and formatting (Lucene,
Tika etc)
• Edit existing files, add images, add text
etc
• Continues to improve with each
release!
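To make the text-extraction bullet concrete, here is a minimal sketch against the PDFBox 2.x API (the file name is hypothetical):

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class ExtractText {
    public static void main(String[] args) throws IOException {
        // Load an existing PDF and pull its text out as a plain String
        try (PDDocument doc = PDDocument.load(new File("report.pdf"))) {
            String text = new PDFTextStripper().getText(doc);
            System.out.println(text);
        }
    }
}
```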
10. Apache POI
http://poi.apache.org/
• File format reader and writer for
Microsoft office file formats
• Support binary & ooxml formats
• Strong read edit write for .xls & .xlsx
• Read and basic edit for .doc & .docx
• Read and basic edit for .ppt & .pptx
• Read for Visio, Publisher, Outlook
• Continues growing/improving with time
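As a quick taste of the API, a minimal sketch that opens a spreadsheet via the format-agnostic WorkbookFactory (which detects .xls vs .xlsx for you) and prints every cell; the file name is hypothetical and a recent POI version is assumed:

```java
import java.io.File;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class DumpSheet {
    public static void main(String[] args) throws Exception {
        // WorkbookFactory picks the right implementation (binary HSSF or OOXML XSSF)
        try (Workbook wb = WorkbookFactory.create(new File("data.xlsx"))) {
            Sheet sheet = wb.getSheetAt(0);
            for (Row row : sheet) {
                for (Cell cell : row) {
                    System.out.print(cell + "\t");  // Cell.toString() gives a readable value
                }
                System.out.println();
            }
        }
    }
}
```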
11. ODF Toolkit (Incubating)
http://incubator.apache.org/odftoolkit/
• File format reader and writer for ODF
(Open Document Format) files
• A bit like Apache POI for ODF
• ODFDOM – Low level DOM interface
for ODF Files
• Simple API – High level interface for
working with ODF Files
• ODF Validator – Pure java validator
12. Apache Tika
http://tika.apache.org/
• Talks – Tuesday + Wednesday
• Java (+app +server +OSGi) library for
detecting and extracting content
• Identifies what a blob of content is
• Gives you consistent, structured
metadata back for it
• Parses the contents into plain text,
HTML, XHTML or sax events
• Growing fast!
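The simplest entry point is the Tika facade, which wraps detection and parsing in two calls; a minimal sketch, with a hypothetical file name:

```java
import java.io.File;

import org.apache.tika.Tika;

public class IdentifyAndExtract {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        File file = new File("mystery-attachment");

        // Identify what the blob of content actually is...
        String mimeType = tika.detect(file);      // e.g. "application/pdf"

        // ...then parse it to plain text, whatever the format turned out to be
        String text = tika.parseToString(file);

        System.out.println(mimeType);
        System.out.println(text);
    }
}
```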
13. Apache Cocoon
http://cocoon.apache.org/
• Component Pipeline framework
• Plug together “Lego-Like” generators,
transformers and serialisers
• Generate your content once in your
application, serve to different formats
• Read in formats, translate and publish
• Can power your own “Yahoo Pipes”
• Modular, powerful and easy
14. Apache Xalan
http://xalan.apache.org/
• XSLT processor
• XPath engine
• Java and C++ flavours
• Cross platform
• Library and command line executables
• Transform your XML
• Fast and reliable XSLT transformation
engine
Project rebooted in 2014!
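Xalan-J is normally driven through the standard JAXP TrAX API, so a transformation takes only a few lines; a minimal sketch with hypothetical file names:

```java
import java.io.File;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformXml {
    public static void main(String[] args) throws Exception {
        // With Xalan on the classpath, JAXP uses it as the XSLT engine
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
                factory.newTransformer(new StreamSource(new File("style.xsl")));

        transformer.transform(new StreamSource(new File("input.xml")),
                              new StreamResult(new File("output.html")));
    }
}
```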
15. Apache XML Graphics: FOP
http://xmlgraphics.apache.org/fop/
• XSL-FO processor in Java
• Reads W3C XSL-FO, applies the
formatting rules to your XML
document, and renders it
• Output to Text, PS, PDF, SVG, RTF,
Java Graphics2D etc
• Lets you leave your XML clean, and
define semantically meaningful rich
rendering rules for it
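Embedding FOP follows the same TrAX pattern as Xalan above: run the XSL-FO through an identity transform whose result is FOP's renderer. A minimal sketch against the FOP 2.x API, with hypothetical file names:

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;

import org.apache.fop.apps.Fop;
import org.apache.fop.apps.FopFactory;
import org.apache.fop.apps.MimeConstants;

public class FoToPdf {
    public static void main(String[] args) throws Exception {
        FopFactory fopFactory = FopFactory.newInstance(new File(".").toURI());

        try (OutputStream out =
                 new BufferedOutputStream(new FileOutputStream("document.pdf"))) {
            // Ask FOP for a PDF renderer, then stream the XSL-FO into it
            Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, out);
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.transform(new StreamSource(new File("document.fo")),
                               new SAXResult(fop.getDefaultHandler()));
        }
    }
}
```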
16. Apache Commons: Codec
http://commons.apache.org/codec/
• Encode and decode a variety of
encoding formats
• Base64, Base32, Hex, Binary Strings
• Digest – crypt(3) password hashes
• Caverphone, Metaphone, Soundex
• Quoted Printable, URL Encoding
• Handy when interchanging content
with external systems
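A few of those codecs in action; a minimal sketch:

```java
import java.nio.charset.StandardCharsets;

import org.apache.commons.codec.binary.Base64;
import org.apache.commons.codec.binary.Hex;
import org.apache.commons.codec.digest.DigestUtils;
import org.apache.commons.codec.language.Soundex;

public class CodecSamples {
    public static void main(String[] args) {
        byte[] data = "hello world".getBytes(StandardCharsets.UTF_8);

        System.out.println(Base64.encodeBase64String(data)); // aGVsbG8gd29ybGQ=
        System.out.println(Hex.encodeHexString(data));       // 68656c6c6f20776f726c64
        System.out.println(DigestUtils.sha256Hex(data));     // digest as a hex string
        System.out.println(new Soundex().encode("Robert"));  // R163
    }
}
```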
17. Apache Commons: Compress
http://commons.apache.org/compress/
• Standard way to deal with archive and
compression formats
• Read and write support
• zip, tar, gzip, bzip, ar, cpio, unix dump,
XZ, Pack200, 7z, arj, lzma, snappy, Z
• Wider range of capabilities than
java.util.Zip
• Common API across all formats
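That common API means one loop can list the entries of any supported archive; a minimal sketch for a gzipped tar, with a hypothetical file name:

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.ArchiveInputStream;
import org.apache.commons.compress.archivers.ArchiveStreamFactory;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

public class ListArchive {
    public static void main(String[] args) throws Exception {
        InputStream file = new BufferedInputStream(new FileInputStream("backup.tar.gz"));

        // Strip the gzip layer, then let the factory auto-detect the archive format
        try (ArchiveInputStream archive = new ArchiveStreamFactory()
                .createArchiveInputStream(
                        new BufferedInputStream(new GzipCompressorInputStream(file)))) {
            ArchiveEntry entry;
            while ((entry = archive.getNextEntry()) != null) {
                System.out.println(entry.getName() + " (" + entry.getSize() + " bytes)");
            }
        }
    }
}
```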
18. Apache Commons: Imaging
http://commons.apache.org/imaging/
• Used to be called Commons Sanselan
• Pure Java image reader and writer
• Fast parsing of image metadata and
information (size, color space, icc etc)
• Much easier to use than ImageIO
• Slower though, as pure Java
• Wider range of formats supported
• PNG, GIF, TIFF, JPEG + Exif, BMP,
ICO, PNM, PPM, PSD, XMP
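Probing an image's metadata without decoding all its pixels; a minimal sketch with a hypothetical file name:

```java
import java.io.File;

import org.apache.commons.imaging.ImageInfo;
import org.apache.commons.imaging.Imaging;

public class ImageProbe {
    public static void main(String[] args) throws Exception {
        // Parses just headers and metadata, not the full pixel data
        ImageInfo info = Imaging.getImageInfo(new File("photo.jpg"));

        System.out.println(info.getFormat());
        System.out.println(info.getWidth() + "x" + info.getHeight());
        System.out.println(info.getColorType());
    }
}
```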
19. Apache SIS
http://sis.apache.org/
• Spatial Information System
• Java library for working with geospatial
content
• Enables geographic content
searching, clustering and archiving
• Supports co-ordinate conversions
• Implements GeoAPI 3.0, uses ISO-19115 + ISO-19139 + ISO-19111
21. Apache UIMA
http://uima.apache.org/
• Unstructured Information analysis
• Lets you build a tool to extract
information from unstructured data
• Language Identification,
Segmentation, Sentences, Entities etc
• Components in C++ and Java
• Network enabled – can spread work
out across a cluster
• Helped IBM to win Jeopardy!
22. Apache OpenNLP
http://opennlp.apache.org/
• Natural Language Processing
• Various tools for sentence detection,
tokenization, tagging, chunking, entity
detection etc
• Maximum Entropy and Perceptron
based machine learning
• OpenNLP good when integrating
NLP into your own solution
• UIMA wins for OOTB whole-solution
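A minimal sentence-detection sketch; it assumes you have downloaded one of the pre-trained model files (here en-sent.bin) from the OpenNLP site:

```java
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;

public class SplitSentences {
    public static void main(String[] args) throws Exception {
        // Pre-trained sentence model, downloaded separately
        try (InputStream modelIn = new FileInputStream("en-sent.bin")) {
            SentenceModel model = new SentenceModel(modelIn);
            SentenceDetectorME detector = new SentenceDetectorME(model);

            String[] sentences = detector.sentDetect(
                    "Apache OpenNLP detects sentences. It can tokenize and tag them too.");
            for (String s : sentences) {
                System.out.println(s);
            }
        }
    }
}
```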
23. Apache cTAKES
http://ctakes.apache.org/
• Clinical Text Analysis and Knowledge
Extraction System – cTAKES
• NLP system for information extraction
from clinical records free text in EMR
• Identifies named entities from various
dictionaries, eg diseases, procedures
• Does subject, content, ontology
mappings, relations and severity
• Built on UIMA and OpenNLP
24. Apache Mahout
http://mahout.apache.org/
• Scalable Machine Learning Library
• Large variety of scalable, distributed
algorithms
• Clustering – find similar content
• Classification – analyse and group
• Recommendations
• Formerly Hadoop based, now moving
to a DSL based on Apache Spark
26. Apache Any23
http://any23.apache.org/
• Anything To Triples
• Library, Web Service and CLI Tool
• Extracts structured data from many
input formats
• RDF / RDFa / HTML with Microformats
or Microdata, JSON-LD, CSV
• To RDF, JSON, Turtle, N-Triples, N-Quads,
XML
27. Apache Blur
http://incubator.apache.org/blur/
• Search engine for massive amounts of
structured data at high speed
• Query rich, structured data model
• US Census example: show me all of
the people in the US who were born in
Alaska between 1940 and 1970 who
are now living in Kansas.
• Maybe? Content → Classify → Search
• Built on Apache Hadoop
28. Apache Stanbol
http://stanbol.apache.org/
• Set of re-usable components for
semantic content management
• Components offer RESTful APIs
• Can add semantic services on top of
existing content management systems
• Content Enhancement – reasoning to
add semantic information to content
• Reasoning – add more semantic data
• Storage, Ontologies, Data Models etc
29. Apache Clerezza
http://clerezza.apache.org/
• For management of semantically
linked data available via REST
• Service platform based on OSGi
• Makes it easy to build semantic web
applications and RESTful services
• Fetch, store and query linked data
• SPARQL and RDF Graph API
• Renderlets for custom output
30. Apache Jena
http://jena.apache.org/
• Java framework for building Linked
Data and Semantic Web applications
• High performance Triple Store
• Exposes as SPARQL http endpoint
• Run local, remote and federated
SPARQL queries over RDF data
• Ontology API to add extra semantics
• Inference API – derive additional data
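A minimal sketch using the Jena 3.x packages: load RDF into an in-memory model and run a SPARQL query over it (the data file and the FOAF predicate are just illustrative):

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;

public class QueryRdf {
    public static void main(String[] args) {
        // Load Turtle (or other RDF serialisations) into an in-memory model
        Model model = RDFDataMgr.loadModel("data.ttl");

        String sparql =
                "SELECT ?s ?name WHERE { ?s <http://xmlns.com/foaf/0.1/name> ?name }";
        try (QueryExecution qe = QueryExecutionFactory.create(sparql, model)) {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println(row.get("s") + " -> " + row.get("name"));
            }
        }
    }
}
```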
31. Apache Marmotta
http://marmotta.apache.org/
• Open source Linked Data Platform
• W3C Linked Data Platform (LDP)
• Read-Write Linked Data
• RDF Triple Store with transactions,
versioning and rule based reasoning
• SPARQL, LDP and LDPath queries
• Caching and security
• Builds on Apache Stanbol and Solr
33. Apache Calcite (Incubating)
http://calcite.incubator.apache.org/
• Formerly known as Optiq
• Dynamic Data Management framework
• Highly customisable engine for planning
and parsing queries on data from a wide
variety of formats
• SQL interface for data not in relational
databases, with query optimisation
• Complementary to Hadoop and NoSQL
systems, esp. combinations of them
34. Apache MRQL (pronounced "miracle")
http://mrql.apache.org/
• Large scale, distributed data analysis
system, built on Hadoop, Hama, Spark
• Query processing and optimisation
• SQL-like query for data analysis
• Works on raw data in-situ, such as
XML, JSON, binary files, CSV
• Powerful query constructs avoid the
need to write MapReduce code
• Write data analysis tasks as SQL-like
35. Apache DataFu (Incubating)
http://datafu.incubator.apache.org/
• Collection of libraries for working with
large-scale data in Hadoop, for data
mining, statistics etc
• Provides Map-Reduce jobs and high
level language functions for data
analysis, eg statistics calculations
• Incremental processing with Hadoop
with sliding data, eg computing daily
and weekly statistics
36. Apache Falcon (Incubating)
http://falcon.apache.org/
• Data management and processing
framework built on Hadoop
• Quickly onboard data + its processing
into a Hadoop based system
• Declarative definition of data endpoints
and processing rules, inc dependencies
• Orchestrates data pipelines,
management, lifecycle, motion etc
37. Apache Ignite (Incubating)
http://ignite.incubator.apache.org/
• Formerly known as GridGain
• Only just entered incubation
• In-Memory data fabric
• High performance, distributed data
management between heterogeneous
data sources and user applications
• Stream processing and compute grid
• Structured and unstructured data
39. Apache HTTPD Server
http://httpd.apache.org/
• Talks – All day today
• Very wide range of features
• (Fairly) easy to extend
• Can host most programming
languages
• Can front most content systems
• Can proxy your content applications
• Can host code and content
40. Apache TrafficServer
http://trafficserver.apache.org/
• High performance web proxy
• Forward and reverse proxy
• Ideally suited to sitting between your
content application and the internet
• For proxy-only use cases, will probably
be better than httpd
• Fewer other features though
• Often used as a cloud-edge http router
41. Apache Tomcat
http://tomcat.apache.org/
• Talks – Tuesday
• Java based, as many of the Apache
Content Technologies are
• Java Servlet Container
• And you probably all know the rest!
44. Apache OpenOffice
http://openoffice.apache.org
• Talks – Tuesday and Wednesday
• Apache Licensed way to create, read
and write your documents and content
• Our first big “Consumer Focused”
project
• Can be used directly
• Or can be used as the upstream for
other applications
45. Apache Forrest
http://forrest.apache.org/
• Document rendering solution built on
top of Cocoon
• Reads in content in a variety of
formats (xml, wiki etc), applies the
appropriate formatting rules, then
outputs to different formats
• Heavily used for documentation and
websites
• eg read in a file, format as changelog
and readme, output as html + pdf
46. Apache Abdera
http://abdera.apache.org/
• Atom – syndication and publishing
• High performance Java
implementation of RFC 4287 + 5023
• Generate Atom feeds from Java or by
converting
• Parse and process Atom feeds
• Atompub server and clients
• Supports Atom extensions like
GeoRSS, MediaRSS & OpenSearch
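Building and serialising a feed with Abdera is only a handful of calls; a minimal sketch in which the ids, titles and author are hypothetical:

```java
import java.util.Date;

import org.apache.abdera.Abdera;
import org.apache.abdera.model.Entry;
import org.apache.abdera.model.Feed;

public class BuildFeed {
    public static void main(String[] args) throws Exception {
        Abdera abdera = new Abdera();

        Feed feed = abdera.newFeed();
        feed.setId("urn:uuid:example-feed-id");
        feed.setTitle("Example Feed");
        feed.setUpdated(new Date());
        feed.addAuthor("Jane Doe");

        Entry entry = feed.addEntry();
        entry.setId("urn:uuid:example-entry-id");
        entry.setTitle("First post");
        entry.setContent("Hello, Atom!");
        entry.setUpdated(new Date());

        feed.writeTo(System.out);   // serialise the feed as Atom XML
    }
}
```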
47. Apache JSPWiki
http://jspwiki.apache.org/
• Feature-rich extensible wiki
• Written in Java (Servlets + JSP)
• Fairly easy to extend
• Can be used as a wiki out of the box
• Provides a good platform for new wiki
based applications
• Rich wiki markup and syntax
• Attachments, security, templates etc
49. Apache Chemistry
http://chemistry.apache.org/
• Java, Python, .net, PHP, Mobile
• Atom, Web Services, Browser (JSON) interfaces
• OASIS CMIS (Content Management
Interoperability Services)
• Client and Server bindings
• “SQL for Content”
• Consistent view on content across
different repositories
• Read / Write / Manipulate content
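Connecting with the OpenCMIS Java client and running a CMIS query (the "SQL for Content" bullet); a minimal sketch in which the endpoint, repository id and credentials are all hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.chemistry.opencmis.client.api.ItemIterable;
import org.apache.chemistry.opencmis.client.api.QueryResult;
import org.apache.chemistry.opencmis.client.api.Session;
import org.apache.chemistry.opencmis.client.runtime.SessionFactoryImpl;
import org.apache.chemistry.opencmis.commons.SessionParameter;
import org.apache.chemistry.opencmis.commons.enums.BindingType;

public class CmisQuery {
    public static void main(String[] args) {
        Map<String, String> params = new HashMap<String, String>();
        params.put(SessionParameter.ATOMPUB_URL,
                   "http://localhost:8080/repository/atom");  // hypothetical endpoint
        params.put(SessionParameter.BINDING_TYPE, BindingType.ATOMPUB.value());
        params.put(SessionParameter.REPOSITORY_ID, "repo1");  // hypothetical id
        params.put(SessionParameter.USER, "admin");
        params.put(SessionParameter.PASSWORD, "secret");

        Session session = SessionFactoryImpl.newInstance().createSession(params);

        // The same query should work against any CMIS-compliant repository
        ItemIterable<QueryResult> hits = session.query(
                "SELECT cmis:name FROM cmis:document WHERE cmis:name LIKE 'report%'",
                false);  // false = don't search all versions
        for (QueryResult hit : hits) {
            System.out.println(hit.getPropertyValueByQueryName("cmis:name"));
        }
    }
}
```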
50. Apache ManifoldCF
http://manifoldcf.apache.org/
• Name has changed a few times...
(Lucene/Apache Connectors)
• Provides a standard way to get content
out of other systems, ready for sending
to Lucene etc
• Different goals to CMIS (Chemistry)
• Uses many parsers and libraries to talk
to the different repositories / systems
• Analogous to Tika but for repos
51. Chemistry vs ManifoldCF
http://incubator.apache.org/chemistry/ vs http://incubator.apache.org/connectors/
• ManifoldCF treats repo as nasty black
box, and handles talking to the parsers
• Chemistry talks / exposes repo's
contents through CMIS
• ManifoldCF supports a wider range of
repositories
• Chemistry supports read and write
• Chemistry delivers a richer model
• ManifoldCF great for getting text out