Elasticsearch: Big Data Search Simplified
speaker.dump()
{
"name": "Manoj Mohan",
"email": "manoj@intelligrape.com",
"company": "Intelligrape Software Ltd",
"role": "Develop customized web solutions",
"partTimeHobby": ["Photography", "Sketching"]
}
Agenda
Big Data Search
Contenders
Elasticsearch
- Elastic Whaaat ??
- Some cool features
- Analyzing Data
More than just search
Uncharted Territory
Looking Back ….
Big Data Search
What is Big Data
A highly subjective term, used when the human or technical
infrastructure is unable to cope with a company's data needs
Rapidly growing data consumption means today's “Big” is
tomorrow's “Normal”
The Big V's – Velocity, Volume, Variety, Veracity
Small but complex is still “Big”
Is it really Big Data ??
“If you aren’t taking advantage of big data, then
you don’t have big data, you have just a
pile of data”
Jay Parikh, VP of Infrastructure
Facebook
Challenges with Big Data Search
Data can be both structured and unstructured
Need to search across exabytes of data
Data needs to be collated and normalized first to get
accurate results
Fine tuning and refining search can take time
Needs an easily scalable solution
Contenders (Open Source)
Lucene-search
Apache Solr
Sphinx
Elasticsearch
Lucene
Started in 1999, joined Apache family in 2001
High-performance, full-featured text search engine
library written entirely in Java
Used by adding a jar to your application
Apache Solr
Scalable and distributed solution
Uses Lucene under the hood
Provides an HTTP wrapper over Lucene, marketing itself as a
ready-to-use solution
Adds XML/JSON support, caching, replication, sharding
Deployable in any servlet container – Tomcat, Jetty,
Resin
Sphinx
Cannot be hit directly from your web-app
Aligns closely with SQL databases such as MySQL and PostgreSQL
Scalable and Distributed Searching
High performance
No partial updates
No RESTful API or data replication
Elasticsearch
“Your own private Google”
Elastic Whaaat .... ??
ElasticSearch is a distributed, RESTful, free/open source
search server
Based on Apache Lucene
Developed by Shay Banon, written in Java
Latest and Greatest - 0.90.3
Really Elastic !!
High performance
Document oriented and schema free
Built distributed from the ground up
Support for complex documents
Support for multitenancy
RESTful API
Supports dynamic schema updates
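To make "RESTful API" and "document oriented and schema free" a little more concrete, here is a minimal sketch that builds (but does not send) an HTTP request to index one JSON document. The host `localhost:9200`, the index name `talks`, the type `speaker`, the id `1` and the helper `build_index_request` are all assumptions for illustration; the `/{index}/{type}/{id}` path is the layout used by the 0.90-era releases covered here.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that would index one JSON document."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# No schema is declared up front: the document is just JSON.
speaker = {"name": "Manoj Mohan", "partTimeHobby": ["Photography", "Sketching"]}
request = build_index_request("http://localhost:9200", "talks", "speaker", 1, speaker)
print(request.get_method(), request.full_url)
```

Against a running node, the request could be sent with `urllib.request.urlopen(request)`; nothing is sent here, so the sketch runs without a server.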
Push Replication
Index a document on the primary shard (P)
Replica (R) created after successful indexing
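The push replication flow can be sketched as a toy in-memory model: a document is routed to one primary shard, indexed there, and only then copied to that shard's replica. Everything here (`Shard`, `ToyReplicatedIndex`, the use of `crc32` for routing) is a hypothetical stand-in, not Elasticsearch's actual implementation.

```python
import zlib


class Shard:
    """A toy shard: just a name and an in-memory dict of documents."""

    def __init__(self, name):
        self.name = name
        self.docs = {}


class ToyReplicatedIndex:
    """Hypothetical index with one replica (R) per primary shard (P)."""

    def __init__(self, num_primaries=2):
        self.primaries = [Shard(f"P{i}") for i in range(num_primaries)]
        self.replicas = [Shard(f"R{i}") for i in range(num_primaries)]

    def route(self, doc_id):
        # Stable hash of the id, modulo the number of primary shards.
        # crc32 is a stand-in, not the hash function Elasticsearch uses.
        return zlib.crc32(str(doc_id).encode("utf-8")) % len(self.primaries)

    def index(self, doc_id, document):
        shard_no = self.route(doc_id)
        # Step 1: index on the primary shard.
        self.primaries[shard_no].docs[doc_id] = dict(document)
        # Step 2: only after that succeeds, push a copy to the replica.
        self.replicas[shard_no].docs[doc_id] = dict(document)
        return shard_no


index = ToyReplicatedIndex(num_primaries=2)
shard_no = index.index("1", {"name": "John Doe", "distance": 120})
print(f"document 1 stored on P{shard_no} and R{shard_no}")
```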
Node Auto Discovery
New node detected automatically
Shards and replicas are now distributed taking into account the new node
Fail-Safe
Ping check: data nodes ping the master to check if the master is alive
If the master fails, a new master is elected automatically from among the other nodes
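The ping check and re-election can be illustrated with a small simulation. The rule used here (the live node with the lowest name becomes master) is an assumption chosen for simplicity, not a description of the real election algorithm; `elect_master`, `ping_check` and the node names are hypothetical.

```python
def elect_master(nodes, alive):
    """Pick a new master among live nodes (toy rule: lowest name wins)."""
    candidates = sorted(node for node in nodes if node in alive)
    if not candidates:
        raise RuntimeError("no live node left to become master")
    return candidates[0]


def ping_check(master, nodes, alive):
    """Return the current master if it answers the ping, else elect a new one."""
    if master in alive:
        return master
    return elect_master([node for node in nodes if node != master], alive)


nodes = ["node-1", "node-2", "node-3"]
master = "node-1"

# All nodes alive: the master stays where it is.
master = ping_check(master, nodes, alive={"node-1", "node-2", "node-3"})
print(master)

# node-1 stops answering pings: another node takes over.
master = ping_check(master, nodes, alive={"node-2", "node-3"})
print(master)
```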
Analyzing Data
Indexing it Right !!!
The first step to make big data useful is to identify the
relevant data.
Let's index 2 documents
1) I know I am a really great developer
2) I develop excellent Excuses
Indexing it Right !!!
1) I know I am a really great developer
2) I develop excellent Excuses
Hmmm... Let's index everything ...
Wait ....
Do we actually need words like “I”, “a”, “am” etc.?
Wait ....
Some words “stem” from others.
Wait ....
Some words are simply synonyms of others.
Wait ....
Case insensitive ??
Pre Process Strategy
Drop stopwords (Stopword Analyzer)
Lowercase everything (Lowercase Filter)
Reduce words to stems (Stemming Analyzer)
Consider synonyms (Synonym Filter)
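The four pre-processing steps can be sketched on the two example documents in plain Python. This is a toy, not Elasticsearch's analysis chain: the stopword set, the suffix list and the synonym map are tiny assumptions chosen to fit these two sentences, and lowercasing runs first so that the stopword lookup is case insensitive.

```python
import re

STOPWORDS = {"i", "a", "am"}          # assumed stopword list
SYNONYMS = {"great": "excellent"}     # assumed synonym map
SUFFIXES = ("er", "s")                # deliberately tiny "stemmer"


def stem(token):
    """Strip one known suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    tokens = re.findall(r"[a-z]+", text.lower())         # lowercase + tokenize
    tokens = [t for t in tokens if t not in STOPWORDS]   # drop stopwords
    tokens = [stem(t) for t in tokens]                   # reduce to stems
    return [SYNONYMS.get(t, t) for t in tokens]          # map synonyms


doc1 = analyze("I know I am a really great developer")
doc2 = analyze("I develop excellent Excuses")
print(doc1)  # ['know', 'really', 'excellent', 'develop']
print(doc2)  # ['develop', 'excellent', 'excuse']
```

After analysis the two documents share the terms `develop` and `excellent`, which the raw text did not make obvious.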
More than just search ...
Facets
Faceting - Example
Range Facet
Histogram Facet
Geo Facet
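What range and histogram facets compute can be sketched in plain Python as bucket counts over a numeric field. The `prices` values, the function names and the boundary convention (lower bound inclusive, upper bound exclusive) are assumptions for illustration; the geo facet is not covered here.

```python
from collections import Counter


def range_facet(values, ranges):
    """Count values per (low, high) range; None means open-ended."""
    counts = []
    for low, high in ranges:
        n = sum(
            1
            for v in values
            if (low is None or v >= low) and (high is None or v < high)
        )
        counts.append({"from": low, "to": high, "count": n})
    return counts


def histogram_facet(values, interval):
    """Count values per fixed-width bucket, keyed by the bucket's lower bound."""
    buckets = Counter((v // interval) * interval for v in values)
    return dict(sorted(buckets.items()))


prices = [5, 12, 18, 25, 40, 95, 120]  # hypothetical field values
print(range_facet(prices, [(None, 20), (20, 100), (100, None)]))
print(histogram_facet(prices, 50))
```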
Percolator
Search ... Reversed
What is Percolation?
“Reversed search”
Instead of storing documents, and then searching them
with queries ..... store queries and “percolate” documents
through them
Elasticsearch Percolator
Register a number of queries and determine which of
them match a particular document
Can encode the business rules of your application, helping you
decide on the “nature” of a document
Commonly used for alerts
Percolating !!!
Register the query “distance > 100” as distanceAlarm under TestPercolator in the Elastic index
Percolate a document through TestPercolator:
{ "name": "John Doe", "distance": 120 }
Response:
{ "ok" : true, "matches" : ["distanceAlarm"] }
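The "reversed search" idea can be mimicked in a few lines of plain Python: queries are stored first, and each incoming document is checked against all of them. `ToyPercolator` is a hypothetical class that models queries as Python predicates; it only imitates the shape of the response shown above and does not call Elasticsearch.

```python
class ToyPercolator:
    """Stores named queries and reports which ones match a given document."""

    def __init__(self):
        self._queries = {}

    def register(self, name, predicate):
        # A "query" here is any function taking a document and returning a bool.
        self._queries[name] = predicate

    def percolate(self, document):
        matches = [
            name for name, predicate in self._queries.items() if predicate(document)
        ]
        return {"ok": True, "matches": matches}


percolator = ToyPercolator()
percolator.register("distanceAlarm", lambda doc: doc.get("distance", 0) > 100)

result = percolator.percolate({"name": "John Doe", "distance": 120})
print(result)  # {'ok': True, 'matches': ['distanceAlarm']}
```

A document with a distance of 100 or less would come back with an empty `matches` list, which is how an alerting rule stays silent.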
Uncharted Territory
Elasticsearch as a NoSQL Database
Stores documents or JSON data
CRUD
High performance: 24 billion records queried in 900 ms
Unlike Solr, the original source document is retained
Built in support for Sharding and Replicas
Versioning and Commits
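The CRUD, retained-source and versioning points can be illustrated with a toy in-memory store. In Elasticsearch each document carries a version number that increases on every write; the class below only imitates that idea, and `ToyDocumentStore`, `VersionConflict` and the `expected_version` parameter are hypothetical names for this sketch.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the document."""


class ToyDocumentStore:
    def __init__(self):
        self._docs = {}

    def index(self, doc_id, source, expected_version=None):
        """Create or update a document; optionally require a specific current version."""
        current = self._docs.get(doc_id)
        current_version = current["_version"] if current else 0
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        # The original JSON source is kept alongside the version.
        self._docs[doc_id] = {"_version": new_version, "_source": dict(source)}
        return new_version

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def delete(self, doc_id):
        return self._docs.pop(doc_id, None) is not None


store = ToyDocumentStore()
v1 = store.index("1", {"name": "John Doe", "distance": 120})
v2 = store.index("1", {"name": "John Doe", "distance": 80}, expected_version=v1)
print(v1, v2, store.get("1")["_source"])
```

A second writer still holding `v1` would get a `VersionConflict` instead of silently overwriting the newer document.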
What IF ??
Remove the hassle of maintaining 2 data stores
Simply use ES as a NoSQL DB with advanced out-of-the-box
search capabilities
Has limitations, but they can be worked around
References
www.elasticsearch.com
http://www.elasticsearch.org/guide/
http://www.villanovau.com/university-online-programs/what-is-big-data/
http://gigaom.com/2012/08/22/facebook-is-collecting-your-data-500-terabytes-a-day/
https://lithosphere.lithium.com/t5/science-of-social-blog/Searching-and-Filtering-Big-Data-The-2-Sides-of-the-Relevances/ba-p/38074
http://lbroudoux.wordpress.com/2013/04/30/real-time-analytics-with-elasticsearch-and-kibana3/
http://onemilliontweetmap.com/
And lots of Googling … :D
Thank You
