Talk given for the #phpbenelux user group, March 27th in Gent (BE), with the goal of convincing developers that are used to build php/mysql apps to broaden their horizon when adding search to their site. Be sure to also have a look at the notes for the slides; they explain some of the screenshots, etc.
An accompanying blog post about this subject can be found at http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
In this presentation, we are going to discuss how elasticsearch handles the various operations like insert, update, delete. We would also cover what is an inverted index and how segment merging works.
Deep Dive on ElasticSearch Meetup event on 23rd May '15 at www.meetup.com/abctalks
Agenda:
1) Introduction to NOSQL
2) What is ElasticSearch and why is it required
3) ElasticSearch architecture
4) Installation of ElasticSearch
5) Hands on session on ElasticSearch
A brief presentation outlining the basics of elasticsearch for beginners. Can be used to deliver a seminar on elasticsearch.(P.S. I used it) Would Recommend the presenter to fiddle with elasticsearch beforehand.
This slide deck talks about Elasticsearch and its features.
When you talk about ELK stack it just means you are talking
about Elasticsearch, Logstash, and Kibana. But when you talk
about Elastic stack, other components such as Beats, X-Pack
are also included with it.
what is the ELK Stack?
ELK vs Elastic stack
What is Elasticsearch used for?
How does Elasticsearch work?
What is an Elasticsearch index?
Shards
Replicas
Nodes
Clusters
What programming languages does Elasticsearch support?
Amazon Elasticsearch, its use cases and benefits
In this presentation, we are going to discuss how elasticsearch handles the various operations like insert, update, delete. We would also cover what is an inverted index and how segment merging works.
Deep Dive on ElasticSearch Meetup event on 23rd May '15 at www.meetup.com/abctalks
Agenda:
1) Introduction to NOSQL
2) What is ElasticSearch and why is it required
3) ElasticSearch architecture
4) Installation of ElasticSearch
5) Hands on session on ElasticSearch
A brief presentation outlining the basics of elasticsearch for beginners. Can be used to deliver a seminar on elasticsearch.(P.S. I used it) Would Recommend the presenter to fiddle with elasticsearch beforehand.
This slide deck talks about Elasticsearch and its features.
When you talk about ELK stack it just means you are talking
about Elasticsearch, Logstash, and Kibana. But when you talk
about Elastic stack, other components such as Beats, X-Pack
are also included with it.
what is the ELK Stack?
ELK vs Elastic stack
What is Elasticsearch used for?
How does Elasticsearch work?
What is an Elasticsearch index?
Shards
Replicas
Nodes
Clusters
What programming languages does Elasticsearch support?
Amazon Elasticsearch, its use cases and benefits
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Edureka!
( ELK Stack Training - https://www.edureka.co/elk-stack-trai... )
This Edureka Elasticsearch Tutorial will help you in understanding the fundamentals of Elasticsearch along with its practical usage and help you in building a strong foundation in ELK Stack. This video helps you to learn following topics:
1. What Is Elasticsearch?
2. Why Elasticsearch?
3. Elasticsearch Advantages
4. Elasticsearch Installation
5. API Conventions
6. Elasticsearch Query DSL
7. Mapping
8. Analysis
9 Modules
An introduction to elasticsearch with a short demonstration on Kibana to present the search API. The slide covers:
- Quick overview of the Elastic stack
- indexation
- Analysers
- Relevance score
- One use case of elasticsearch
The query used for the Kibana demonstration can be found here:
https://github.com/melvynator/elasticsearch_presentation
ElasticSearch introduction talk. Overview of the API, functionality, use cases. What can be achieved, how to scale? What is Kibana, how it can benefit your business.
The talk covers how Elasticsearch, Lucene and to some extent search engines in general actually work under the hood. We'll start at the "bottom" (or close enough!) of the many abstraction levels, and gradually move upwards towards the user-visible layers, studying the various internal data structures and behaviors as we ascend. Elasticsearch provides APIs that are very easy to use, and it will get you started and take you far without much effort. However, to get the most of it, it helps to have some knowledge about the underlying algorithms and data structures. This understanding enables you to make full use of its substantial set of features such that you can improve your users search experiences, while at the same time keep your systems performant, reliable and updated in (near) real time.
An introduction to and a couple of examples and tips on how to use Elasticsearch for general data analytics. Examples are based on Elasticsearch version 2.x.
Visualize some of Austin's open source data using Elasticsearch with Kibana. ObjectRocket's Steve Croce presented this talk on 10/13/17 at the DBaaS event in Austin, TX.
This presentation contains differences between Elasticsearch and relational Databases. Along with that it also has some Glossary Of Elasticsearch and its basic operation.
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...Edureka!
( ELK Stack Training - https://www.edureka.co/elk-stack-trai... )
This Edureka tutorial on What Is ELK Stack will help you in understanding the fundamentals of Elasticsearch, Logstash, and Kibana together and help you in building a strong foundation in ELK Stack. Below are the topics covered in this ELK tutorial for beginners:
1. Need for Log Analysis
2. Problems with Log Analysis
3. What is ELK Stack?
4. Features of ELK Stack
5. Companies Using ELK Stack
Deep dive to ElasticSearch - معرفی ابزار جستجوی الاستیکیEhsan Asgarian
در این اسلاید به مباحث زیر می پردازیم:
مقدمات پایگاه داده های غیر اس.کیو.ال، مبانی جستجوگرها
سپس معرفی ابزار جستجوی الاستیکی، کاربردها، معماری کلی، مقایسه با ابزارهای مشابه
افزودن تحلیلگر متن و در نهایت لینک آن با دات نت
ا
NoSQL (Not Only SQL) is believed to be a superset of, or sometimes an intersecting set with, relational SQL databases. The concept itself is still shaping, but already now we can say for sure: NoSQL addresses the task of storing and retrieving the data of large volumes in the systems with high load. There is another very important angle in perceiving the concept:
NoSQL systems can allow storing and efficient searching of the unstructured or semi-unstructured data, like completely raw or preprocessed documents. Using the example of one world-class document retrieval system Apache SOLR (performant HTTP wrapper around Apache Lucene) as a reference we will check upon its use cases, horizontal and vertical scalability, faceted search, distribution and load balancing, crawling, extendability, linguistic support, integration with relational databases and much more.
Dmitry Kan will shortly touch upon *hot* topic of cloud computing using the famous project Apache Hadoop and will help the audience to see whether SOLR shines through the cloud.
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Edureka!
( ELK Stack Training - https://www.edureka.co/elk-stack-trai... )
This Edureka Elasticsearch Tutorial will help you in understanding the fundamentals of Elasticsearch along with its practical usage and help you in building a strong foundation in ELK Stack. This video helps you to learn following topics:
1. What Is Elasticsearch?
2. Why Elasticsearch?
3. Elasticsearch Advantages
4. Elasticsearch Installation
5. API Conventions
6. Elasticsearch Query DSL
7. Mapping
8. Analysis
9 Modules
An introduction to elasticsearch with a short demonstration on Kibana to present the search API. The slide covers:
- Quick overview of the Elastic stack
- indexation
- Analysers
- Relevance score
- One use case of elasticsearch
The query used for the Kibana demonstration can be found here:
https://github.com/melvynator/elasticsearch_presentation
ElasticSearch introduction talk. Overview of the API, functionality, use cases. What can be achieved, how to scale? What is Kibana, how it can benefit your business.
The talk covers how Elasticsearch, Lucene and to some extent search engines in general actually work under the hood. We'll start at the "bottom" (or close enough!) of the many abstraction levels, and gradually move upwards towards the user-visible layers, studying the various internal data structures and behaviors as we ascend. Elasticsearch provides APIs that are very easy to use, and it will get you started and take you far without much effort. However, to get the most of it, it helps to have some knowledge about the underlying algorithms and data structures. This understanding enables you to make full use of its substantial set of features such that you can improve your users search experiences, while at the same time keep your systems performant, reliable and updated in (near) real time.
An introduction to and a couple of examples and tips on how to use Elasticsearch for general data analytics. Examples are based on Elasticsearch version 2.x.
Visualize some of Austin's open source data using Elasticsearch with Kibana. ObjectRocket's Steve Croce presented this talk on 10/13/17 at the DBaaS event in Austin, TX.
This presentation contains differences between Elasticsearch and relational Databases. Along with that it also has some Glossary Of Elasticsearch and its basic operation.
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...Edureka!
( ELK Stack Training - https://www.edureka.co/elk-stack-trai... )
This Edureka tutorial on What Is ELK Stack will help you in understanding the fundamentals of Elasticsearch, Logstash, and Kibana together and help you in building a strong foundation in ELK Stack. Below are the topics covered in this ELK tutorial for beginners:
1. Need for Log Analysis
2. Problems with Log Analysis
3. What is ELK Stack?
4. Features of ELK Stack
5. Companies Using ELK Stack
Deep dive to ElasticSearch - معرفی ابزار جستجوی الاستیکیEhsan Asgarian
در این اسلاید به مباحث زیر می پردازیم:
مقدمات پایگاه داده های غیر اس.کیو.ال، مبانی جستجوگرها
سپس معرفی ابزار جستجوی الاستیکی، کاربردها، معماری کلی، مقایسه با ابزارهای مشابه
افزودن تحلیلگر متن و در نهایت لینک آن با دات نت
ا
NoSQL (Not Only SQL) is believed to be a superset of, or sometimes an intersecting set with, relational SQL databases. The concept itself is still shaping, but already now we can say for sure: NoSQL addresses the task of storing and retrieving the data of large volumes in the systems with high load. There is another very important angle in perceiving the concept:
NoSQL systems can allow storing and efficient searching of the unstructured or semi-unstructured data, like completely raw or preprocessed documents. Using the example of one world-class document retrieval system Apache SOLR (performant HTTP wrapper around Apache Lucene) as a reference we will check upon its use cases, horizontal and vertical scalability, faceted search, distribution and load balancing, crawling, extendability, linguistic support, integration with relational databases and much more.
Dmitry Kan will shortly touch upon *hot* topic of cloud computing using the famous project Apache Hadoop and will help the audience to see whether SOLR shines through the cloud.
Turning a Search Engine into a Relational DatabaseMatthias Wahl
About the How and Why of taking Lucene and Elasticsearch and turning it into a Relational Database.
Talk I gave at Search User Group Berlin September Meetup http://www.meetup.com/de/Search-UG-Berlin/events/224765731/
MYSQL Query Anti-Patterns That Can Be Moved to SphinxPythian
PalominoDB European Team lead, Vladimir Fedorkov will be discussing how to handle query bottlenecks that can result from increases in dataset and traffic
PostgreSQL is a well-known relational database. But in the last few years, it has gained capabilities that previously belonged only to "NoSQL" databases. In this talk, I describe several of PostgreSQL that give it such capabilities.
PostgreSQL - It's kind've a nifty databaseBarry Jones
This presentation was given to a company that makes software for churches that is considering a migration from SQL Server to PostgreSQL. It was designed to give a broad overview of features in PostgreSQL with an emphasis on full-text search, various datatypes like hstore, array, xml, json as well as custom datatypes, TOAST compression and a taste of other interesting features worth following up on.
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Michael Rys
Data Lakes have become a new tool in building modern data warehouse architectures. In this presentation we will introduce Microsoft's Azure Data Lake offering and its new big data processing language called U-SQL that makes Big Data Processing easy by combining the declarativity of SQL with the extensibility of C#. We will give you an initial introduction to U-SQL by explaining why we introduced U-SQL and showing with an example of how to analyze some tweet data with U-SQL and its extensibility capabilities and take you on an introductory tour of U-SQL that is geared towards existing SQL users.
slides for SQL Saturday 635, Vancouver BC, Aug 2017
Here are the slides for my talk "An intro to Azure Data Lake" at Techorama NL 2018. The session was held on Tuesday October 2nd from 15:00 - 16:00 in room 7.
Global introduction to elastisearch presented at BigData meetup.
Use cases, getting started, Rest CRUD API, Mapping, Search API, Query DSL with queries and filters, Analyzers, Analytics with facets and aggregations, Percolator, High Availability, Clients & Integrations, ...
You're not using ElasticSearch (outdated)Timon Vonk
The slightly verbose slides accompanying my introductory ElasticSearch talk at Utrecht.rb
These slides are outdated, see https://speakerdeck.com/timonv/youre-not-using-elasticsearch
A walkthrough of the updated Engagor, a tool for "Intelligent Social Media Management". The most powerful features of this tool I've been working on are showcased in a couple of screenshots.
"Engagor is your Swiss Army knife for the social web. Easily measure, grow and manage your online presences in style."
Introduction to memcached, a caching service designed for optimizing performance and scaling in the web stack, seen from perspective of MySQL/PHP users. Given for 2nd year students of professional bachelor in ICT at Kaho St. Lieven, Gent.
An approach to horizontal database scaling. Slides from a presentation at the mySQL dev room at FOSDEM 2009. Notes and remarks to these slides will become available soon on www.jurriaanpersyn.com and netlog.com/go/developer
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
10. TEXT SEARCH WITH PHP/MYSQL
• FULLTEXT index
• Only for CHAR, VARCHAR & TEXT columns
• For MyISAM & InnoDB tables
• Configurable Stop Words
• Types:
• Natural Language
• Natural Language with Query Expansion
• Boolean Full Text
11. MYSQL FULLTEXT BOOLEAN MODE
Operators:
•+ AND
•- NOT
• OR implied
•() Nesting
•* Wildcard
•“ Literal matches
12. TEXT SEARCH WITH PHP/MYSQL (CONT‟D)
• Typical columns for search table:
• Type
• Id
• Text
• Priority
• Process:
• Blog posts, comments, …
• Save (filtered) duplicate of text in search table.
• When searching …
• Search table and translate to original data via type/id
This is how most php/mysql sites implement their search, right?
14. SELECT *
FROM mysearchtable
WHERE MATCH(text)
AGAINST („+shizzle –”ma
nizzle”‟ IN BOOLEAN MODE)
15. SELECT * FROM jobs
WHERE role = „DEVELOPER‟
AND MATCH(job_description)
AGAINST („node.js‟)
16. SELECT * FROM jobs j
JOIN jobs_benefits jb ON j.id =
jb.job_id
WHERE j.role = „DEVELOPER‟
AND (MATCH(job_description)
AGAINST („node.js -asp‟ IN
BOOLEAN MODE)
AND jb.free_espresso = TRUE
17.
18. WHAT IS A SEARCH ENGINE?
• Efficient indexing of data
• On all fields / combination of fields
• Analyzing data
• Text Search
• Tokenizing
• Stemming
• Filtering
• Understanding locations
• Date parsing
• Relevance scoring
19. TOKENIZING
• Finding word boundaries
• Not just explode(„ „, $text);
• Chinese has no spaces. (Not every single character is a
word.)
• Understand patterns:
• URLs
• Emails
• #hashtags
• Twitter @mentions
• Currencies (EUR, €, …)
20. STEMMING
• “Stemming is the process for reducing inflected (or sometimes
derived) words to their stem, base or root form.”
• Conjugations
• Plurals
• Example:
• Fishing, Fished, Fish, Fisher > Fish
• Better > Good
• Several ways to find the stem:
• Lookup tables
• Suffix-stripping
• Lemmatization
•…
• Different stemmers for every language.
21. FILTERING
• Remove stop words
• Different for every language
• HTML
• If you‟re indexing web content, not every character is
meaningful.
22. UNDERSTANDING LOCATIONS
• Reverse geocoding of locations to longitude & latitude
• Search on location:
• Bounding box searches
• Distance searches
• Searching nearby
• Geo Polygons
• Searching a country
(Note: MySQL also has geospatial indeces.)
23. RELEVANCE SCORING
• From the matched documents, which ones do you show first?
• Several strategies:
• How many matches in document?
• How many matches in document as percentage of length?
• Custom scoring algorithms
• At index time
• At search time
• … A combination
Think of Google PageRank.
26. APACHE LUCENE
• “Information retrieval software library”
• Free/open source
• Supported by Apache Foundation
• Created by Doug Cutting
• Written in 1999
30. ELASTICSEARCH
• “You know, for Search”
• Also Free & Open Source
• Built on top of Lucene
• Created by Shay Banon @kimchy
• Versions
• First public release, v0.4 in February 2010
• A rewrite of earlier “Compass” project, now with scalability
built-in from the very core
• Now stable version at 0.20.6
• Beta branch at 0.90 (working towards 1.0 release)
• In Java, so inherently cross-platform
31. WHAT DOES IT ADD TO LUCENE?
• RESTfull Service
• JSON API over HTTP
• Want to use it from PHP?
• CURL Requests, as if you‟d do requests to the Facebook
Graph API.
• High Availability & Performance
• Clustering
• Long Term Persistency
• Write through to persistent storage system.
32. $ cd ~/Downloads
$ wget https://github.com/…/elasticsearch-0.20.5.tar.gz
$ tar –xzf elasticsearch-0.20.5.tar.gz
$ cd elasticsearch-0.20.5/
$ ./bin/elasticsearch
33. $ cd ~/Downloads
$ wget https://github.com/…/elasticsearch-0.20.5.tar.gz
$ tar –xzf elasticsearch-0.20.5.tar.gz
$ git clone https://github.com/elasticsearch/elasticsearch-
servicewrapper.git elasticsearch-servicewrapper
$ sudo mv elasticsearch-0.20.5 /usr/local/share
$ cd elasticsearch-servicewrapper
$ sudo mv service /usr/local/share/elasticsearch-0.20.5/bin
$ cd /usr/local/share
$ sudo ln -s elasticsearch-0.20.5 elasticsearch
$ sudo chown -R root:wheel elasticsearch
$ cd /usr/local/share/elasticsearch
$ sudo bin/service/elasticsearch start
42. SCHEMALESS, DOCUMENT ORIENTED
• No need to configure schema upfront
• No need for slow ALTER TABLE –like operations
• You can define a mapping (schema) to customize the indexing
process
• Require fields to be of certain type
• If you want text fields that should not be analyzed (stemming,
…)
46. TERMINOLOGY
MySQL Elastic Search
Database Index
Table Type
Row Document
Column Field
Schema Mapping
Index Everything is indexed
SQL Query DSL
SELECT * FROM table … GET http://…
UPDATE table SET … PUT http://…
47. DISTRIBUTED & HIGHLY AVAILABLE
• Multiple servers (nodes) running in a cluster
• Acting as single service
• Nodes in cluster that store data or nodes that just help in
speeding up search queries.
• Sharding
• Indeces are sharded (# shards is configurable)
• Each shard can have zero or more replicas
• Replicas on different servers (server pools) for failover
• One in the cluster goes down? No problem.
• Master
• Automatic Master detection + failover
• Responsible for distribution/balancing of shards
48.
49. SCALING ISSUES?
• No need for an external load balancer
• Since cluster does it‟s own routing.
• Ask any server in the cluster, it will delegate to correct node.
• What if …
• More data > More shards.
• More availability > More replicas per shard.
50. PERFORMANCE TWEAKING
• Bulk Indexing
• Multi-Get
• Avoids network latency (HTTP Api)
• Api with administrative & monitoring interface
• Cluster‟s availability state
• Health
• Nodes‟ memory footprint
• Alternatives voor HTTP Api?
• Java library
• PHP wrappers (Sherlock, Elastica, …)
• But simplicity of HTTP Api is brilliant to work with, latency is
hardly an issue.
57. Query DSL Example:
(language:nl OR location.country:be OR location.country:aa)
(tag:sentiment.negative) author.followers:[1000 TO *] (-
sub_category:like) ((-status:857.assigned) (-status:857.done))
58.
59.
60.
61. FACETS
• Instead of returning the matching documents …
• … return data about the distribution of values in the set of
matching documents
• Or a subset of the matching documents
• Possibilities:
• Totals per unique value
• Averages of values
• Distributions of values
•…
65. ADVANCED FEATURES
• Nested documents (Child-Parent)
• Like MySQL joins?
• Percolation Index
• Store queries in Elastic
• Send it documents
• Get returned which queries match
• Index Warming
• Register search queries that cause heavy load
• New data added to index will be warmed
• So next time query is executed: pre cached
66. WHAT ARE MY OTHER OPTIONS?
• RDBMS
• MySQL, …
• NoSQL
• MongoDB, …
• Search Engines
• Solr
• Sphinx
• Xapian
• Lucene itself
• SaaS
• Amazon CloudSearch
67. … VS. SOLR
•+
• Also built on Lucene
• So similar feature set
• Also exposes Lucene functionality, like Elastic Search, so
easy to extend.
• A part of Apache Lucene project
• Perfect for Single Server search
•-
• Clustering is there. But it‟s definitely not as simple as
ElasticSearch‟
• Fragmented code base. (Lots of branches.)
Engagor used to run on Solr.
68. … VS. SPHINX
•+
• Great for single server full text searches;
• Has graceful integration with SQL database;
• (Eg. for indexing data)
• Faster than the others for simple searches;
•-
• No out of the box clustering;
• Not built on Lucene; lacks some advanced features;
Netlog & Twoo use Sphinx.
69. WANT TO USE IT?
• In an existing project:
• As an extra layer next to your data …
• Send to both your database & elasticsearch;
• Consistency problems?;
• Or as replacement for database
• Elastic is as persistent as MySQL;
• If you don‟t need RDBMS features;
• @Engagor: Our social messages are only in Elastic
70. “Users are incredibly bad at
finding and researching things
on the web.”
Nielsen (March 2013)
http://www.nngroup.com/articles/search-navigation/
71. “Pathetic and useless are
words that come to mind after
this year‟s user testing.”
Nielsen (March 2013)
http://www.nngroup.com/articles/search-navigation/
This talk is about adding search to your own website.Implementing a search engine for your own content.First the bit about how it’s done in MySQL, and what problems that brings with it;Then about what a search engine should do for you;Then about how Elastic Search helps;And code of course, lots of code.
I’m leading the development team at Engagor, a startup in the city centreof Gent, Belgium.We’re 2 years old.We build a social media monitoring & management product.Before that I worked for Massive Media for 5 years.As a lead developer I worked on the Netlog & Gatchaproducts.
Search can be a “simple” textsearch.Here I’m searching Tumblr for funny gifs, because that’s what Tumblris for.
But search can go deeper and more into detail too ..Here I’m usingAND, OR, NOTNestingRestrictionson fields
Or very difficult …Searching in a mixed set of dataProfilesPhotosFriend connectionsSearching in a graph …
My first thought when I’d have to add search to a php/mysql site … It sort of works …
Problems arisewhen you have lots of data …To speed things up you add indecesto your MySQL tables.
And the library analogy for a MySQL index is this …An index card box.
MySQL has an index type esp. for full text search.Natural Language:case insensitive, accent incensitiveQuery Expansion:Search for “database” > returns results that has often has words like “mysql”, “oracle”, … A second search with extended query string happens to find related documents too.Boolean ModeAdds operators to Natural Language type
Anyone using a similar system to this?Implemented yourself, or from a CMS?phpBB has/had a table like this.
Example of FULLTEXT MySQL search query.
Example of FULLTEXT MySQL search query in boolean mode. A bit more powerful.
Now, we add restrictions on a certain fields. Now you need combined indeces to keep this fast.
And even more restrictions.Indeces on all involved feeds? In all combinations? In all orders?Lots of indeces make WRITE operations on your tables slower.
MySQL just isn’t built for complex search …So let’s look at what a system built for search needs …
So it’s old.But still active.Used a lot.
It is however a java library. It’s not a fully managed service.
If you have lots of data; you need a search engine to be scalable & highly available.
And that’s where ElasticSearch comes in.
Download, unzip, start.
This time we install it in the right place, and wrapped in a service.
Now it runs on your localhost. As a HTTP service, so open your browser and surf to your Elastic Search server.
HTTP Access, it’s brilliant!Do you want to secure it? Add firewalls …Do you want to cache it? Add Varnish …
Example of adding something to ElasticSearch from your command shell via a HTTP PUT request done by curl.You can do this right after installation. No need to create an index or configure anything, just add data right away.
Adding a second record (document).
Updating an existing record. (Mind the new version number.)
Want to see the record?Surf to it.The url consist ofindex, type & id.
Want to add a new field (column)?Update your document with new field added.
It’s right there.
So it’s schemaless.And actually we did ZERO CONFIGURATION. We didn’t have to create indeces or tell Elastic Search what type of data we’ll be adding.Actually, you can configure a mapping/schema:To require certain fields to be of a certain typeTo avoid text fields of being analyzed (text analysis)Basically: to speed things up …
What we’ve demoed so far is aNoSQL store. That’s cool. But not all.
Here we do a GET request (in the browser) that searches our newly created index for the word “smashing”It returns the 2nd document.
Curl in PHP is simple.Simplest example of how to do a search to elastic from PHP.
I’ve mentioned a few Elastic Search specific terms.Here’s the full breakdown of terminology and how they related to MySQL concepts.
Back to the clustering features Elastic Search adds …
Here you have a Engagor specific dashboard that shows the 12 servers in the Engagor cluster.You see:server12 is the master;That every node knows each other;There are 11K shards;Each with one replica.Health = green means every shard has a replica (on a different server).If one server goes down: no problem.
Now let’s look at how Engagor uses Elastic Search.First, what do we do?From a technical point of view:Engagor = Huge database of social messages.Facebook, Twitter, Instagram, News sites …We save those that are clients are interested in to our application and offer:Statistics about the tracked dataWorkflow toolsAutomation tools
This is the time to show a slide I stole from a presentation from our CEO Folke.We started 2 years ago.I joined after 6 months.Now a team of 16 people.7 of them technical profilesweb developersdata scientistsbackend developersfrontend developersOur customers includeMobistarTelenetMicrosoft EuropeEuropean CommisionAlproSeveral agenciesThey use Engagor for customer support (“call center” software for social media)marketing insightscrisis detection…
Here you see an example of a search on our cluster for a certain twitter user’s handle.You see it returns 260 social messages.Each message has data like the:IdService it’s fromContentDateAuthor details
Here we’re searching in a topic about Belgacom & Mobile Vikings.“All messages from users with at least 1000 followers, that are negative and from Belgium or in Dutch.”On 4 different fields, and nested … It just works.
This is the Query SQL we’re sending to Elastic for the previous search.
Here we are in a topic about Coca Cola, thus high volume.About 50k message per day. 28 days.That’s 1,4M messages we’re searching in.This is a graph of messages per day.
This is the inbox. Showing the last 10 messages in that topic.Performance: about half a second.
Same inbox, but now only showing messages with the word “thirsty”.Performance: again about half a second.(Only 1 sample, so this is not really a benchmark .)
Now, there’s another feature of search engines you might not immediately think about.
Think of it as an equivalent to MySQL GROUP BY query.
The pie chart:Facet on sentiment field of a mention. Returns totals per value.Sentiment Per Day:Facet on combined fields: sentiment + day(dateadd).Returns data used for second chart.You can also use the filter like in the inbox, to see these facets for a filtered set of data. Eg. Sentiment per day for mentions coming from Belgium only.
For the Telenet Twitter profile:Totals of messages per dayPer typeRetweetsMentionsOwn tweetsEvery “segment” (color) is facet with custom filter/search querySingle ElasticSearch call to get all this information.
Percolation example:Use it to route documents …Eg. you have a stream of data coming in and need to decide what to do with those documents based on queries.Everything matching x is for client AEverything matching y is for client B
There’s a good competition going on. Looks like Elastic Search has made Solr alert again; since they’re also focusing on clustering features now.
It’s not because you have a great search engine, that you have great search experience on your site …Users aren’t very good at it …But searching for things is not that easy …
Look at how natural language differs from search query language.
This happens quite often at the Engagor office.Not that we blame our users …It’s just difficult to get it right.
If you want to play with it, and are excited …But you’re boss is all #meh.
ElasticSearch is a mature product.Soundcloud is using it.
StumbleUpon is using it.
Foursquare switched to it …When checking in somewhere, they show:Locations close to youLocations you previously checked inLocations that are popularThat’s a pretty nifty search query right there.
There’s also a real company behind Elastic Search, backing it with:TutorialsTrainingSupportSLA’sConsulting
Recently redesigned site. Documentation is a bit frightening at first, since it’s hard to know what to search for, but I hope this presentation solves that.IRC channel is very active; quick answers.
If you’re interested in working with Elastic Search, or in fact, any of these other cool technologies …We can always use good profiles: junior or senior, backend or frontend … Send us your resumes ;).March 27th: Now we’re esp. searching for someone who’s good at Javascript.
Already gave an overview of how it’s like to work with the technologies at Engagor.Go to http://startupshizzle.tumblr.com to find out how it’s like to work with the people of our team ;).