A brief introduction to Elasticsearch and the many possibilities Elasticsearch offers in terms of search, data exploration and data aggregation. The presentation includes a brief introduction to search engine fundamentals and core features of Elasticsearch. The talk focuses on how we can navigate structured and unstructured data for search as well as aggregating and visualizing data for analytical purposes.
The talk aims to demonstrate case studies beyond traditional full-text search and to show that Elasticsearch can help us build much more than just a search engine.
5.
• Aleksander M. Stensby
• CEO of Monokkel AS
• Previously COO of Integrasco AS
• Working with search and data analysis since 2004
www.monokkel.io
6.
• CEO of Monokkel AS
• Previously COO of Integrasco AS
• Persistence, processing and presentation of data
Persistence – Processing – Presentation
12. Agenda
• Search fundamentals primer
• Intro to elasticsearch
• Search, filter and aggregate!
13. Agenda
• Search fundamentals primer
• Intro to elasticsearch
• Search, filter and aggregate!
… and some bonus visualisation!
14. What we will not cover today…
• All the different searches, filters and aggregations available in elasticsearch :)
• Details on tokenization, analyzers…
• Elasticsearch in production and performance tuning…
• Data integration
22. “We were born to run”
“No one told you when to run”
“Some were born to sing the blues”
26. The Inverted Index
Term    Frequency   Documents
blues   1           3
born    2           1,3
no      1           2
one     1           2
run     2           1,2
sing    1           3
some    1           3
the     1           3
to      3           1,2,3
told    1           2
we      1           1
were    2           1,3
when    1           2
you     1           2
(dictionary: Term and Frequency; postings: Documents)
1. “We were born to run”
2. “No one told you when to run”
3. “Some were born to sing the blues”
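The table on this slide can be rebuilt with a few lines of code. The following is a minimal sketch, not how Lucene or Elasticsearch actually store their index: the `tokenize` and `build_inverted_index` helpers are hypothetical names, tokenization is assumed to be plain lowercasing plus splitting on letters, and the frequency column is taken to be the number of documents containing the term (which matches the slide for these three sentences).

```python
import re
from collections import defaultdict

# The three example sentences from the slides, keyed by document id.
DOCS = {
    1: "We were born to run",
    2: "No one told you when to run",
    3: "Some were born to sing the blues",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic terms (an assumed, very simple analyzer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(docs):
    """Map each term (dictionary) to the sorted ids of the documents containing it (postings)."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


INDEX = build_inverted_index(DOCS)

if __name__ == "__main__":
    for term, doc_ids in INDEX.items():
        # The frequency column is the length of the postings list.
        print(f"{term:<6} {len(doc_ids)}  {','.join(map(str, doc_ids))}")
```

Looking up a single term such as `INDEX["born"]` is then a dictionary access rather than a scan over every document, which is the point of inverting the index.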
27. Searching
born
1. “We were born to run”
2. “No one told you when to run”
3. “Some were born to sing the blues”
28. The Boolean Model
Term    Frequency   Documents
blues   1           3
born    2           1,3
no      1           2
one     1           2
run     2           1,2
sing    1           3
some    1           3
the     1           3
to      3           1,2,3
told    1           2
we      1           1
were    2           1,3
when    1           2
you     1           2
(dictionary: Term and Frequency; postings: Documents)
born
29. born blues
30. born OR blues
31. born AND blues
32. born NOT blues
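In the Boolean model, each operator corresponds to a set operation on the postings lists. A minimal sketch, using only the postings for "born" and "blues" copied from the table (the `docs_for` helper is a hypothetical name, and real engines merge sorted postings lists rather than building Python sets):

```python
# Postings copied from the slide's table, restricted to the two query terms.
POSTINGS = {"born": {1, 3}, "blues": {3}}


def docs_for(term):
    """Return the set of document ids for a term, or an empty set if the term is unknown."""
    return POSTINGS.get(term, set())


# OR is a union, AND is an intersection, NOT is a set difference.
born_or_blues = docs_for("born") | docs_for("blues")
born_and_blues = docs_for("born") & docs_for("blues")
born_not_blues = docs_for("born") - docs_for("blues")

if __name__ == "__main__":
    print("born OR blues :", sorted(born_or_blues))
    print("born AND blues:", sorted(born_and_blues))
    print("born NOT blues:", sorted(born_not_blues))
```

The Boolean model only decides whether a document matches; it says nothing about which matching document is the better one.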
33. Relevancy and Ranking
• Term frequency
• Inverse document frequency
• Field-length norm
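The three factors above can be sketched numerically. The formulas below follow one classic formulation (the Lucene-style TF/IDF described in older Elasticsearch documentation); they are an assumption, not something stated on the slide, and newer Elasticsearch versions use BM25 by default. Multiplying the three factors into a single `term_weight` is a simplification for illustration.

```python
import math


def term_frequency(freq):
    """More occurrences of a term in a field raise the weight, with diminishing returns."""
    return math.sqrt(freq)


def inverse_document_frequency(num_docs, doc_freq):
    """Terms that appear in fewer documents carry more weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms):
    """A match in a short field counts for more than a match in a long one."""
    return 1.0 / math.sqrt(num_terms)


def term_weight(freq, num_docs, doc_freq, num_terms):
    """Simplified weight of one term in one field: the product of the three factors."""
    return (
        term_frequency(freq)
        * inverse_document_frequency(num_docs, doc_freq)
        * field_length_norm(num_terms)
    )


if __name__ == "__main__":
    # "born" occurs once in document 1 (5 terms) and once in document 3 (7 terms);
    # it appears in 2 of the 3 example documents.
    print("doc 1:", term_weight(1, 3, 2, 5))
    print("doc 3:", term_weight(1, 3, 2, 7))
```

With these assumptions, a search for "born" ranks document 1 above document 3 only because document 1 is shorter.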
34. Similarity
1. “We were born to run” [2, 0]
2. “No one told you when to run” [0, 0]
3. “Some were born to sing the blues” [2, 5]
[Plot: documents and query as vectors, axes “blues” and “born”]
query: [2,5]
doc 3: [2,5]
doc 2: [0,0]
doc 1: [2,0]
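The slide does not name the similarity measure. A common choice in the vector space model is the cosine of the angle between the query vector and each document vector, and that is what this sketch assumes. The vectors are the ones shown on the slide; `cosine_similarity` is a hypothetical helper that returns 0.0 for an all-zero vector to avoid dividing by zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Vectors as shown on the slide.
QUERY = [2, 5]
DOC_VECTORS = {1: [2, 0], 2: [0, 0], 3: [2, 5]}

# Document ids ordered from most to least similar to the query.
RANKED = sorted(
    DOC_VECTORS,
    key=lambda doc_id: cosine_similarity(QUERY, DOC_VECTORS[doc_id]),
    reverse=True,
)

if __name__ == "__main__":
    for doc_id in RANKED:
        print(doc_id, round(cosine_similarity(QUERY, DOC_VECTORS[doc_id]), 3))
```

Under this assumption document 3 points in the same direction as the query and ranks first, document 1 shares only one of the two terms, and document 2 shares none.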
36. Brief history of elasticsearch
Shay Banon
-> Abstraction layer on top of Lucene
-> Compass
-> Rewritten: high performance, real-time, distributed
-> Elasticsearch
-> February 2010
37. elasticsearch
• Open source search engine - written in Java
• Built on top of Lucene
• Simple, coherent, RESTful API
• Distributed, scalable search engine with real-time analytics
38. “more useable and concise API, scalability, and operational tools on top of Lucene’s search implementation”
42. Much more than just search!
• Real-time analytics
• Log analysis
• Prediction modelling
• Recommendations
43. DEMO in 5 minutes
44. DEMO
• Install ElasticSearch
• Load in some data
• Run a very basic search
45. DEMO in 15 minutes
46. Easy peasy…
• http://www.elasticsearch.org/download
• bin/elasticsearch or bin/elasticsearch.bat on Windows
• http://localhost:9200/ or curl -X GET http://localhost:9200/
54. Mapping
• Is it a number? String? Date?
• Combining multiple fields?
• Default values?
• Stored?
• Analyzed?
• How should we tokenize/analyse/normalize the field?
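Each question above maps to a setting in an index mapping. The sketch below builds such a mapping as a Python dictionary. It assumes the typeless mapping syntax of recent Elasticsearch releases (older releases, such as the one current when this talk was given, used a `string` type instead of `text` and `keyword`), and every field name (`title`, `speaker`, `held_on`, `all_text`) is hypothetical.

```python
import json

# Hypothetical mapping for an index of conference talks.
TALK_MAPPING = {
    "mappings": {
        "properties": {
            # Analyzed full-text field: tokenized and normalized by the named analyzer,
            # and also copied into a combined field for searching several fields at once.
            "title": {"type": "text", "analyzer": "english", "copy_to": "all_text"},
            # Not analyzed: matched as one exact value, with a default for missing values,
            # and stored separately from the original document source.
            "speaker": {"type": "keyword", "null_value": "unknown", "store": True},
            # Typed as a date rather than a string.
            "held_on": {"type": "date"},
            # Target of copy_to above.
            "all_text": {"type": "text"},
        }
    }
}

# JSON body that would be sent when creating the index.
MAPPING_JSON = json.dumps(TALK_MAPPING, indent=2)

if __name__ == "__main__":
    print(MAPPING_JSON)
```

The main design decision per field is whether it should be analyzed for full-text search or kept as an exact value for filtering and aggregating.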
64. And lots more…
filtered query
prefix query
simple query string query
range query
regexp query
term query
terms query
wildcard query
dis max query
geoshape query
nested query
more like this query
more like this field query
boosting query
common terms query
constant score query
fuzzy like this query
fuzzy like this field query
function score query
fuzzy query
has child query
has parent query
ids query
indices query
span first query
span multi term query
span near query
span not query
span or query
span term query
top children query
minimum should match
multi term query rewrite
template query
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-queries.html
65. Filtering
• Filters do not score so they are faster to execute than queries
• Filters can be cached in memory - significantly faster than queries
If relevance is not important, use filters, otherwise, use queries!
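The rule of thumb above can be expressed in a request body. This sketch assumes the `bool` query with a `filter` clause found in later Elasticsearch versions (the release current at the time of the talk expressed the same idea with the filtered query listed on the previous slide). The `build_search` helper and the `title` and `speaker` field names are hypothetical.

```python
import json


def build_search(text, speaker=None):
    """Build a search body: the text match is scored, the optional speaker restriction is not."""
    bool_query = {
        # Scored clause: contributes to relevance ranking.
        "must": [{"match": {"title": text}}],
    }
    if speaker is not None:
        # Unscored clause: only includes or excludes documents.
        bool_query["filter"] = [{"term": {"speaker": speaker}}]
    return {"query": {"bool": bool_query}}


if __name__ == "__main__":
    print(json.dumps(build_search("data exploration", speaker="stensby"), indent=2))
```

Anything that is a yes/no condition (a speaker, a date range, a status) goes in the filter; anything where "how well does it match" matters goes in the scored part.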
77. Aggregations
• Buckets and Metrics: partitioning documents based on a criterion
SELECT COUNT(color)    <- metric
FROM table
GROUP BY color         <- bucket
An aggregation is a combination of buckets and metrics
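The SQL statement on this slide can be mimicked in plain Python to make the two roles concrete: grouping by a field value creates the buckets, and counting the rows in each bucket is the metric. The rows and the `count_by` helper below are made up for illustration.

```python
from collections import Counter

# Made-up rows standing in for the documents in an index.
ROWS = [
    {"color": "red"},
    {"color": "blue"},
    {"color": "red"},
    {"color": "green"},
    {"color": "red"},
]


def count_by(rows, field):
    """Bucket rows by the value of `field` and count the rows in each bucket."""
    return Counter(row[field] for row in rows if field in row)


COUNTS = count_by(ROWS, "color")

if __name__ == "__main__":
    for color, count in COUNTS.most_common():
        print(color, count)
```

Swapping the count for another calculation over each bucket (a sum, an average) changes the metric while the buckets stay the same.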
78. Aggregations
{
  "aggs": {
    "speakers": {
      "terms": {
        "field": "speaker"
      }
    }
  }
}
your aggregation name: "speakers"
bucket type: "terms"
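A small sketch of how the request above might be built and its result read from Python. The request body is the one from the slide with `"size": 0` added so that no search hits are returned alongside the aggregation. The helper names and the sample response are hypothetical; the response is hand-written in the shape a terms aggregation returns (a list of buckets, each with a `key` and a `doc_count`), not output captured from a real cluster.

```python
def terms_aggregation(name, field):
    """Build a request body with one terms aggregation and no search hits."""
    return {"size": 0, "aggs": {name: {"terms": {"field": field}}}}


def bucket_counts(response, name):
    """Flatten the buckets of a terms aggregation into a key -> document count dictionary."""
    buckets = response["aggregations"][name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hypothetical response with made-up speakers and counts.
SAMPLE_RESPONSE = {
    "aggregations": {
        "speakers": {
            "buckets": [
                {"key": "alice", "doc_count": 3},
                {"key": "bob", "doc_count": 1},
            ]
        }
    }
}

if __name__ == "__main__":
    print(terms_aggregation("speakers", "speaker"))
    print(bucket_counts(SAMPLE_RESPONSE, "speakers"))
```

The aggregation name chosen in the request ("speakers") is the key under which the buckets come back, which is why it is passed to both helpers.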
82. Aggregations
min
max
sum
avg
stats
extended stats
value count
percentiles
percentile ranks
cardinality
top hits
scripted metric
global
filter
filters
missing
nested
reverse nested
children
terms
significant terms
range
date range
ipv4 range
histogram
date histogram
geo bounds
geo distance
geohash grid
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html
83. And a whole lot more!
• Geosearch, distance and bounds
• “More Like This”
• Suggesters / Autocomplete
• Percolation
• Language drivers
• Scripting
84. Further reading and some great resources!
• http://www.elasticsearch.org/guide/
• http://blog.monokkel.io/
• https://found.no/foundation/