http://www.meetup.com/abctalks
Agenda
• What is Big Data?
• Why is NOSQL required?
• What are the different types of NOSQL databases?
• ElasticSearch - Introduction
• ElasticSearch - Features
• Hands on
Big Data
• Any collection of data sets so large and complex that it becomes difficult to
process using traditional data processing applications.
• Requires "massively parallel software running on tens, hundreds, or even
thousands of servers".
Factors of growth, challenges and
opportunities of big data
• Volume – the quantity of data that is generated.
• Variety – the type and nature of the data, i.e. the category it belongs to.
• Velocity – how fast the data is generated and processed to meet demand.
Horizontal & Vertical scaling
• Horizontal scaling - scale by adding more machines to your pool of resources.
• Vertical scaling - scale by adding more power (CPU, RAM, etc.) to your existing
machine.
• Horizontal scaling makes it easier to scale dynamically, since more machines can be added to
the existing pool.
• Vertical scaling is limited by the capacity of a single machine.
• Examples of horizontally scaled stores: cloud data stores such as DynamoDB, Cassandra and
MongoDB.
• Example of a vertically scaled store: MySQL, including Amazon RDS (the cloud version of MySQL).
NOSQL
• Basically a large serialized object store
• Doesn’t have a fixed, structured schema
• Recommends de-normalization
• Designed to be distributed (cloud-scale) out of the box
• Because of this, typically relaxes the ACID requirements
• Any node can answer any query
• Any write can be sent to any node and will “eventually” propagate to the other
distributed servers
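The "eventually propagates" behaviour can be pictured with a toy model. This is only a sketch of the idea and does not describe how any particular NoSQL store replicates data; the `Cluster` class and its methods are names invented for this illustration.

```python
class Cluster:
    """Toy model: every replica accepts writes, changes propagate later."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # writes not yet copied to the other replicas

    def write(self, replica_index, key, value):
        # The write is applied locally first and queued for propagation.
        self.replicas[replica_index][key] = value
        self.pending.append((key, value))

    def read(self, replica_index, key):
        return self.replicas[replica_index].get(key)

    def propagate(self):
        # Replay queued writes everywhere; afterwards all replicas agree.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()


cluster = Cluster(replica_count=3)
cluster.write(0, "city", "Hyderabad")
stale = cluster.read(2, "city")   # replica 2 has not seen the write yet
cluster.propagate()
fresh = cluster.read(2, "city")   # all replicas now return the same value
```

Between the write and the call to `propagate`, a read from another replica returns stale data; that window is the price paid for letting any server accept the write.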
Why NOSQL?
• Today, data is becoming easier to access and capture through third parties such as Facebook,
Google+ and others.
• Personal user information, social graphs, geo-location data, user-generated content and
machine logging data are just a few examples where the volume of data has been increasing
exponentially.
• Using these services properly requires processing huge amounts of data, a workload that
traditional SQL databases were not designed for.
• NoSQL databases have evolved to handle data at this scale.
CAP Theorem
• Consistency - all nodes see the same
data at the same time.
• Availability - the system is always on,
with no downtime.
• Partition Tolerance - the system
continues to function even if the communication
among the servers is unreliable.
Distributed systems must be partition tolerant, so we
have to choose between Consistency and Availability.
Different types of NOSQL
Column Store
• Column data is saved together, as opposed to row data
• Very useful for data analytics
• HBase (on Hadoop), Cassandra, Hypertable
Key-Value Store
• A key that refers to a payload
• MemcacheDB, Azure Table Storage, Redis
Document / XML / Object Store
• Key (and possibly other indexes) point at a serialized object
• DB can operate against values in the document
• MongoDB, CouchDB, RavenDB, ElasticSearch
Graph Store
• Nodes are stored independently, and the relationships between nodes (edges) are
stored with the data
RDBMS vs NOSQL
RDBMS                                    | NoSQL
Structured and organized data            | Semi-structured or unorganized data
Structured Query Language (SQL)          | No declarative query language
Tight consistency                        | Eventual consistency
ACID transactions                        | BASE transactions
Data and relationships stored in tables  | No predefined schema
What is ElasticSearch?
• ElasticSearch is a free and open source distributed inverted index created by Shay Banon.
• Built on top of Apache Lucene.
• Lucene is the most popular Java-based full-text search index implementation.
• First public release, version 0.4, in February 2010.
• Developed in Java, so inherently cross-platform.
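For readers new to the term, an inverted index maps each term to the documents that contain it. The sketch below is a deliberately minimal, in-memory illustration in Python; it is not how Lucene stores its index, and the function names and sample sentences are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents containing every term in the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {
    1: "Elasticsearch is built on Lucene",
    2: "Lucene is a search library",
    3: "Elasticsearch is distributed",
}
index = build_inverted_index(docs)
matches = search(index, "lucene search")  # only document 2 contains both terms
```

Looking up a term is a dictionary access rather than a scan over every document, which is what makes full-text search fast.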
Why ElasticSearch?
• Easy to scale (Distributed)
• Everything is one JSON call away (RESTful API)
• Unleashed power of Lucene under the hood
• Excellent Query DSL
• Multi-tenancy
• Support for advanced search features (Full Text)
• Configurable and Extensible
• Document Oriented
• Schema free
• Conflict management
• Active community
Easy to Scale (Distributed)
• ElasticSearch is built to scale horizontally out of the box. Whenever
you need to increase capacity, just add more nodes, and let the
cluster reorganize itself to take advantage of the extra hardware.
• One server can hold one or more parts of one or more indexes, and
whenever new nodes are introduced to the cluster they simply
join the party. Every such index, or part of it, is called a
shard, and ElasticSearch shards can be moved around the cluster
very easily.
RESTful API
• ElasticSearch is API driven. Almost any action can be performed using a
simple RESTful API using JSON over HTTP.
• Responses are always in JSON format.
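As a rough illustration of "JSON over HTTP", the sketch below prepares a request with Python's standard library without sending it. The helper name `build_json_request` is invented here, and the host and port are assumed to be the local default used elsewhere in these slides.

```python
import json
import urllib.request


def build_json_request(method, url, body=None):
    """Create (but do not send) an HTTP request carrying an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


request = build_json_request("GET", "http://localhost:9200/_cluster/health")

# To actually call a running node, something like this would be needed:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Every operation in the hands-on part of the deck follows this shape: an HTTP verb, a path, and optionally a JSON document as the body.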
Built on top of Apache Lucene
• Apache Lucene is a high performance, full-featured Information
Retrieval library, written in Java. ElasticSearch uses Lucene internally to
build its state of the art distributed search and analytics capabilities.
• Lucene is a stable, proven technology that continuously gains
new features and best practices, so having Lucene as the
underlying engine that powers ElasticSearch is a significant advantage.
Excellent Query DSL
• The REST API exposes a very rich and capable query DSL that is nevertheless
easy to use. Every query is just a JSON object that can contain practically
any type of query, or even several of them combined.
• Using filtered queries, with some queries expressed as Lucene filters,
helps leverage caching and thus speeds up common queries, or complex
queries with parts that can be reused.
• Faceting, another very common search feature, is returned alongside the
search results on request, ready for you
to use.
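A query in this DSL is just a nested JSON object, so it can be assembled as a plain dictionary. The sketch below builds a body in the Elasticsearch 1.x style these slides are based on, combining a `match` query, a `range` filter and a `terms` facet. Later versions replaced `filtered` with `bool` and facets with aggregations, so treat this as illustrative for that era only. The helper name is invented and the field names are borrowed from the cities example in the hands-on slides.

```python
import json


def filtered_search_body(text_field, text, filter_field, minimum, facet_field):
    """Build an Elasticsearch 1.x-style search body as a plain dict."""
    return {
        "query": {
            "filtered": {
                # Scored full-text part of the search.
                "query": {"match": {text_field: text}},
                # Unscored, cacheable yes/no condition.
                "filter": {"range": {filter_field: {"gte": minimum}}},
            }
        },
        # Per-value counts returned alongside the hits.
        "facets": {
            facet_field: {"terms": {"field": facet_field}}
        },
    }


body = filtered_search_body("city", "Hyderabad", "population2014", 1000000, "state")
print(json.dumps(body, indent=2))
```

Keeping the yes/no condition in the filter rather than the query is what allows it to be cached and reused across searches.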
Multi-tenancy
• Multiple indexes can be stored on one ElasticSearch installation
- node or cluster. Each index can have multiple "types", which
are essentially completely different indexes.
• The nice thing is you can query multiple types and multiple
indexes with one simple query.
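Querying several indexes and types at once comes down to listing them, comma-separated, in the request path. The helper below is a small sketch of that convention as it worked in the 1.x line; `search_path` is an invented name, and the `logs` and `events` names in the usage line are hypothetical.

```python
def search_path(indexes=(), types=()):
    """Build the path of a search endpoint spanning several indexes and types."""
    parts = []
    if indexes:
        parts.append(",".join(indexes))
    elif types:
        # A type segment needs an index segment before it; _all means every index.
        parts.append("_all")
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


path = search_path(["test", "logs"], ["cities", "events"])
# path == "/test,logs/cities,events/_search"
```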
Support for advanced search features (Full Text)
• ElasticSearch uses Lucene under the covers to provide some of the most powerful
full text search capabilities available in any open source product.
• Search comes with multi-language support, a powerful query language,
support for geolocation, context aware did-you-mean suggestions,
autocomplete and search snippets.
• Script support in filters and scorers
Configurable and Extensible
• Many ElasticSearch settings can be changed while ElasticSearch is running, but some require a restart (and in
some cases re-indexing). Most settings can also be changed using the REST API.
• ElasticSearch has several extension points - namely site plugins (which let you serve static content from ES, such as monitoring JavaScript
apps), rivers (for feeding data into ElasticSearch), and plugins that add modules or components within ElasticSearch
itself. This allows you to switch almost every part of ElasticSearch, if you so choose, fairly easily.
Document Oriented
• Store complex real world entities in ElasticSearch as structured JSON
documents. All fields are indexed by default, and all the indices can be
used in a single query, to return results at breathtaking speed.
Per-operation Persistence
• ElasticSearch puts data safety first. Document changes are recorded
in transaction logs on multiple nodes in the cluster to minimize the chance
of any data loss.
Schema free
• ElasticSearch allows you to get started easily. Send a JSON
document and it will try to detect the data structure, index the
data and make it searchable.
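"Detecting the data structure" can be pictured as guessing a field type from each JSON value. The sketch below is a heavily simplified stand-in for that behaviour, not ElasticSearch's actual logic: it ignores date detection, arrays and null values, and the type names (`long`, `double`, `string`) follow the 1.x naming as an assumption.

```python
def guess_field_type(value):
    """Very rough guess of a field type from a decoded JSON value."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        # Nested objects get a nested mapping of their own fields.
        return {name: guess_field_type(item) for name, item in value.items()}
    raise TypeError(f"unsupported value: {value!r}")


def guess_mapping(document):
    """Guess a type for every top-level field of a document."""
    return {name: guess_field_type(value) for name, value in document.items()}


city = {
    "rank": 3,
    "city": "Hyderabad",
    "land_area": 625,
    "location": {"lat": 17.37, "lon": 78.48},
}
mapping = guess_mapping(city)
```

Once a type has been guessed for a field it is recorded in the mapping, which is why explicit mappings are still worth defining for fields whose first value could be misleading.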
Conflict management
• Optimistic version control can be used where needed to ensure that data
is never lost due to conflicting changes from multiple processes.
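Optimistic version control means a writer states which version it last read, and the write is rejected if the document has changed since. The in-memory sketch below only illustrates that rule; `VersionedStore` and `VersionConflict` are invented names, and a real cluster performs this check on the server.

```python
class VersionConflict(Exception):
    """Raised when a write was based on an outdated version."""


class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, source, expected_version=None):
        current_version = self._docs[doc_id][0] if doc_id in self._docs else 0
        # Reject the write if the caller's view of the document is stale.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(source))
        return new_version


store = VersionedStore()
store.put("1", {"rank": 3})                      # first write, version 1
store.put("1", {"rank": 2}, expected_version=1)  # based on version 1, accepted
try:
    store.put("1", {"rank": 5}, expected_version=1)  # stale: document is at version 2
except VersionConflict as error:
    print("write rejected:", error)
```

The rejected writer is expected to re-read the document and retry, so neither change silently overwrites the other.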
Active community
• The community, other than creating nice tools and plugins, is very helpful and supportive. The overall vibe is really great, and
this is an important metric of any OSS project.
• There are also some books currently being written by community members, and many blog posts around the net sharing
experiences and knowledge.
Architecture
Basic Concepts
• Cluster: A cluster consists of one or more nodes which share the same cluster name. Each cluster has a single master
node which is chosen automatically by the cluster and which can be replaced if the current master node fails.
• Node: A node is a running instance of ElasticSearch which belongs to a cluster. Multiple nodes can be started on a
single server for testing purposes, but usually you should have one node per server. At startup, a node will use unicast
(or multicast, if specified) to discover an existing cluster with the same cluster name and will try to join that cluster.
• Index: An index is like a ‘database’ in a relational database. It has a mapping which defines multiple types. An index is a
logical namespace which maps to one or more primary shards and can have zero or more replica shards.
• Type: A type is like a ‘table’ in a relational database. Each type has a list of fields that can be specified for documents of
that type. The mapping defines how each field in the document is analyzed.
Basic Concepts
• Document: A document is a JSON document which is stored in ElasticSearch. It is like a row in a table in a relational database. Each
document is stored in an index and has a type and an id. A document is a JSON object (also known in other languages as a hash /
hashmap / associative array) which contains zero or more fields, or key-value pairs. The original JSON document that is indexed will be
stored in the _source field, which is returned by default when getting or searching for a document.
• Field: A document contains a list of fields, or key-value pairs. The value can be a simple (scalar) value (e.g. a string, integer, date), or a
nested structure like an array or an object. A field is similar to a column in a table in a relational database. The mapping for each field has
a field ‘type’ (not to be confused with document type) which indicates the type of data that can be stored in that field, e.g. integer, string,
object. The mapping also allows you to define (amongst other things) how the value for a field should be analyzed.
• Mapping: A mapping is like a ‘schema definition’ in a relational database. Each index has a mapping, which defines each type within
the index, plus a number of index-wide settings. A mapping can either be defined explicitly, or it will be generated automatically when a
document is indexed.
Basic Concepts
• Shard: A shard is a single Lucene instance. It is a low-level “worker” unit which is managed automatically by
ElasticSearch. An index is a logical namespace which points to primary and replica shards.
ElasticSearch distributes shards amongst all nodes in the cluster, and can move shards automatically from one node to
another in the case of node failure, or the addition of new nodes.
• Primary Shard: Each document is stored in a single primary shard. When a document is sent for indexing, it is indexed
first on the primary shard, then on all replicas of the primary shard. By default, an index has 5 primary shards. You can
specify fewer or more primary shards to scale the number of documents that your index can handle.
• Replica Shard: Each primary shard can have zero or more replicas. A replica is a copy of the primary shard, and has two
purposes:
a. increase failover: a replica shard can be promoted to a primary shard if the primary fails.
b. increase performance: get and search requests can be handled by primary or replica shards.
• Documents are identified by index/type/id.
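To make the primary/replica relationship concrete, the sketch below spreads shard copies over a list of nodes with a simple round-robin rule that keeps a replica off the node holding its primary. The rule is a stand-in chosen for clarity, not ElasticSearch's real allocation algorithm, and the function and node names are invented.

```python
def allocate_shards(node_names, primaries=5, replicas=1):
    """Round-robin toy allocation; returns {node: [shard labels]}."""
    if replicas >= len(node_names):
        raise ValueError("need more nodes than replicas to keep copies apart")
    allocation = {name: [] for name in node_names}
    for shard in range(primaries):
        for copy in range(replicas + 1):
            # Consecutive copies of one shard land on consecutive nodes.
            node = node_names[(shard + copy) % len(node_names)]
            label = f"P{shard}" if copy == 0 else f"R{shard}"
            allocation[node].append(label)
    return allocation


allocation = allocate_shards(["node-1", "node-2", "node-3"])
for node, shards in allocation.items():
    print(node, shards)
```

With the defaults of 5 primaries and 1 replica there are 10 shard copies in total, and losing any single node still leaves at least one copy of every shard on the remaining nodes.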
Configuration
• cluster.name: The cluster name identifies the cluster for auto-discovery. If a production environment has multiple clusters on the same network, each cluster name must be
unique.
• node.name: Node names are generated dynamically on startup, but the user can also assign a name to a node manually.
• node.master & node.data: Every node can be configured to allow or deny being eligible as the master, and to allow or deny storing data. node.master allows
this node to be eligible as a master node (enabled by default) and node.data allows this node to store data (enabled by default).
The following settings can be used to design advanced cluster topologies.
1. The node should never become a master node, only hold data. This will be the "workhorse" of the cluster.
node.master: false, node.data: true
2. The node should only serve as a master, not store data, and have free resources. This will be the "coordinator" of the cluster.
node.master: true, node.data: false
3. The node should be neither a master nor a data node, but act as a "search load balancer" (fetching data from nodes, aggregating, etc.).
node.master: false, node.data: false
• Index: A number of options (such as shard/replica options, mapping or analyzer definitions, translog settings, ...) can be set for indices globally, in this file.
Note that it makes more sense to configure index settings specifically for a certain index, either when creating it or by using the index templates API.
Example: index.number_of_shards: 5, index.number_of_replicas: 1
• Discovery: ElasticSearch supports different types of discovery, which make multiple ElasticSearch instances talk to each other.
The default type of discovery is multicast. Unicast discovery allows you to explicitly control which nodes will be used to discover the cluster. It can be used when
multicast is not present, or to restrict the cluster communication-wise.
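The three topologies above, plus the default, are just the four combinations of two boolean settings. The small helper below spells that out; `node_role` is an invented name and the role labels are taken from the wording of this slide.

```python
def node_role(master, data):
    """Name the role implied by the node.master / node.data settings."""
    if master and data:
        return "default (master-eligible data node)"
    if data:
        return "workhorse (data only)"
    if master:
        return "coordinator (master only)"
    return "search load balancer (neither master nor data)"


for master in (True, False):
    for data in (True, False):
        print(f"node.master: {master}, node.data: {data} ->", node_role(master, data))
```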
Cluster Architecture
Is it running?
http://localhost:9200/?pretty
Response:
{
"status" : 200,
"name" : "elasticsearch",
"version" : {
"number" : "1.3.4",
"build_hash" : "f1585f096d3f3985e73456debdc1a0745f512bbc",
"build_timestamp" : "2015-04-21T14:27:12Z",
"build_snapshot" : false,
"lucene_version" : "4.9"
},
"tagline" : "You Know, for Search"
}
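A script can run the same health check by parsing this response. The sketch below works on a copy of the response shown above rather than calling a live node; the helper name `summarize_root_response` is made up for this example.

```python
import json


def summarize_root_response(raw_json):
    """Pull the fields of interest out of the root endpoint's JSON response."""
    info = json.loads(raw_json)
    version = info.get("version", {})
    return {
        "ok": info.get("status") == 200,
        "node": info.get("name"),
        "version": version.get("number"),
        "lucene": version.get("lucene_version"),
    }


sample = """
{
  "status" : 200,
  "name" : "elasticsearch",
  "version" : {
    "number" : "1.3.4",
    "lucene_version" : "4.9"
  },
  "tagline" : "You Know, for Search"
}
"""
summary = summarize_root_response(sample)
```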
Index Request
Indexing a document
Request:
PUT test/cities/1
{
"rank": 3,
"city": "Hyderabad",
"state": "Telangana",
"population2014": 7750000,
"land_area": 625,
"location":
{
"lat": 17.37,
"lon": 78.48
},
"abbreviation": "Hyd"
}
Response:{ "_index": "test", "_type": "cities", "_id": "1", "_version": 1, "created": true }
Search Request
Getting a document
Request:
GET test/cities/1?pretty
Response:
{
"_index": "test",
"_type": "cities",
"_id": "1",
"_version": 1,
"found": true,
"_source": {
"rank": 3,
"city": "Hyderabad",
"state": "Telangana",
"population2014": 7750000,
"land_area": 625,
"location": {
"lat": 17.37,
"lon": 78.48
},
"abbreviation": "Hyd"
}
}
Updating a document
Request:
PUT test/cities/1
{
"rank": 3,
"city": "Hyderabad",
"state": "Telangana",
"population2013": 7023000,
"population2014": 7750000,
"land_area": 625,
"location":
{
"lat": 17.37,
"lon": 78.48
},
"abbreviation": "Hyd"
}
Response:{"_index": "test", "_type": "cities", "_id": "1", "_version": 2, "created": false}
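The index and update calls use the same PUT request; the response tells them apart. The sketch below reads the `created` and `_version` fields from the two responses shown in these slides. The function name `describe_write` is invented for illustration.

```python
import json


def describe_write(raw_response):
    """Summarize an index/update response using the fields shown on the slides."""
    response = json.loads(raw_response)
    action = "created" if response["created"] else "updated"
    location = "/".join([response["_index"], response["_type"], response["_id"]])
    return f"{action} {location} (version {response['_version']})"


first = '{"_index": "test", "_type": "cities", "_id": "1", "_version": 1, "created": true}'
second = '{"_index": "test", "_type": "cities", "_id": "1", "_version": 2, "created": false}'

print(describe_write(first))   # created test/cities/1 (version 1)
print(describe_write(second))  # updated test/cities/1 (version 2)
```

Note that the second PUT sends the whole document again: the stored document is replaced and its version incremented, rather than patched field by field.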
Searching
Search across all indexes and all types
http://localhost:9200/_search
Search across all types in the test index.
http://localhost:9200/test/_search
Search explicitly for documents of type cities within the test index.
http://localhost:9200/test/cities/_search
There are 3 different types of search queries:
• Full Text Search (query string)
• Structured Search (filter)
• Analytics (facets)
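The simplest of these, a query-string search, can be expressed entirely in the URL through the `q` parameter. The helper below is a sketch that builds such a URL for the scopes listed above; `uri_search_url` is an invented name, and the `city:Hyderabad` query assumes the document indexed earlier.

```python
from urllib.parse import urlencode


def uri_search_url(base_url, query_string, index=None, doc_type=None):
    """Build a query-string search URL scoped to everything, an index, or a type."""
    if doc_type and not index:
        raise ValueError("a type can only be searched within an index")
    parts = [part for part in (index, doc_type) if part]
    parts.append("_search")
    # urlencode percent-escapes characters such as ':' in the query.
    query = urlencode({"q": query_string})
    return base_url + "/" + "/".join(parts) + "?" + query


url = uri_search_url("http://localhost:9200", "city:Hyderabad", "test", "cities")
# url == "http://localhost:9200/test/cities/_search?q=city%3AHyderabad"
```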
Routing
• All the data lives in primary shards in the cluster. You may have ‘N’ shards in the cluster. Routing is the
process of determining which shard a document will reside in.
• Without a routing value, a search has no way of knowing which shard holds the documents it needs, so ElasticSearch broadcasts the request
to all shards. This is a non-negligible overhead and can easily impact performance.
• Routing ensures that all documents with the same routing value are placed on the same shard, eliminating the
need to broadcast searches and thereby increasing performance.
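The routing rule is usually described as `shard = hash(routing) % number_of_primary_shards`, where the routing value defaults to the document id. The sketch below applies that formula with CRC32 as a stand-in hash, so the shard numbers it prints will not match those a real cluster would choose. `shard_for` and the `user-42` value are made up for this example.

```python
import zlib


def shard_for(routing_value, number_of_primary_shards):
    """Pick a shard as hash(routing) % primaries, with CRC32 as a stand-in hash."""
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


# Every document indexed with the same routing value lands on the same shard,
# so a search that supplies that value only needs to visit one shard.
first = shard_for("user-42", 5)
second = shard_for("user-42", 5)
print(first == second)  # True
```

The formula also shows why the number of primary shards cannot be changed after an index is created: a different modulus would send existing documents to different shards.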
Data Synchronization
• ElasticSearch supports rivers: a river is a pluggable service that runs within the ElasticSearch cluster and pulls data (or is
pushed data) that is then indexed into the cluster (https://github.com/jprante/ElasticSearch-river-jdbc).
Rivers are available for MongoDB, CouchDB, RabbitMQ, Twitter, Wikipedia, MySQL, etc.
The relational data is internally transformed into structured JSON objects for the schema-less
indexing model of ElasticSearch documents.
The plugin can fetch data from different RDBMS sources in parallel, and multithreaded bulk mode
ensures high throughput when indexing to ElasticSearch.
• Typically, a worker role is implemented as a layer within the application to push data/entities to ElasticSearch.
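Whichever side pushes the data, the core step is turning relational rows into JSON documents and sending them in bulk. The sketch below builds the newline-delimited body used by the bulk API from a list of rows. It does not reproduce the JDBC river's internals; `rows_to_bulk`, the column names and the sample rows are invented for illustration.

```python
import json


def rows_to_bulk(index, doc_type, columns, rows, id_column):
    """Turn relational rows into the newline-delimited body of a bulk request."""
    lines = []
    for row in rows:
        document = dict(zip(columns, row))
        action = {
            "index": {
                "_index": index,
                "_type": doc_type,
                "_id": str(document[id_column]),
            }
        }
        # Each document takes two lines: the action, then the source.
        lines.append(json.dumps(action))
        lines.append(json.dumps(document))
    # The bulk endpoint expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"


columns = ["id", "city", "state"]
rows = [(1, "Hyderabad", "Telangana"), (2, "Pune", "Maharashtra")]
body = rows_to_bulk("test", "cities", columns, rows, id_column="id")
print(body)
```

Sending many documents in one bulk request avoids a network round trip per row, which is where most of the throughput gain comes from.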
Products
Monitoring Tools
• ElasticSearch-Head - https://github.com/mobz/ElasticSearch-head
• Marvel - http://www.elastic.co/guide/en/marvel/current/#_marvel_8217_s_dashboards
• Paramedic - https://github.com/karmi/ElasticSearch-paramedic
• Bigdesk - https://github.com/lukas-vlcek/bigdesk/
Who is using
https://www.elastic.co/use-cases
References
http://www.elastic.co/guide/en/elasticsearch/guide/current/index.html
http://www.elasticsearchtutorial.com/
http://lucene.apache.org/
Lucene in Action
SlideShare.net presentations on ElasticSearch
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 

Elastic search overview

  • 2. Agenda • What is Big Data? • Why is NOSQL required? • What are the different types of NOSQL database? • ElasticSearch - Introduction • ElasticSearch - Features • Hands on
  • 3. Big Data  Any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications.  Requires "massively parallel software running on tens, hundreds, or even thousands of servers".
  • 4. Factors of growth, challenges and opportunities of big data  Volume – the quantity of data that is generated.  Variety – the category to which the data belongs.  Velocity – how fast the data is generated and processed to meet demand.
  • 5. Horizontal & Vertical scaling  Horizontal scaling - scale by adding more machines to your pool of resources.  Vertical scaling - scale by adding more power (CPU, RAM, etc.) to your existing machine.  Horizontal scaling is easier to do dynamically, by adding more machines to the existing pool.  Vertical scaling is often limited to the capacity of a single machine.  Examples of horizontally scaled systems are the cloud data stores, e.g. DynamoDB, Cassandra, MongoDB.  An example of a vertically scaled system is MySQL - Amazon RDS (the cloud version of MySQL).
  • 6. NOSQL  Basically a large serialized object store  Doesn’t have a structured schema  Recommends de-normalization  Designed to be distributed (cloud-scale) out of the box  Because of this, drops the ACID requirements  Any database server can answer any query  Any write query can operate against any database server and will “eventually” propagate to the other distributed servers
  • 7. Why NOSQL?  Today, data is becoming easier to access and capture through third parties such as Facebook, Google+ and others.  Personal user information, social graphs, geo-location data, user-generated content and machine logging data are just a few examples where the data has been increasing exponentially.  Using the above services properly requires processing huge amounts of data, which SQL databases handle poorly and were never designed for.  NoSQL databases have evolved to handle this huge volume of data properly.
  • 8. CAP Theorem  Consistency - all nodes see the same data at the same time.  Availability - the system is always on, with no downtime.  Partition Tolerance - the system continues to function even if communication among the servers is unreliable. Distributed systems must be partition tolerant, so we have to choose between Consistency and Availability.
  • 9. Different types of NOSQL Column Store  Column data is saved together, as opposed to row data  Very useful for data analytics  Hadoop (HBase), Cassandra, Hypertable Key-Value Store  A key that refers to a payload  MemcacheDB, Azure Table Storage, Redis Document / XML / Object Store  Key (and possibly other indexes) points at a serialized object  The DB can operate against values in the document  MongoDB, CouchDB, RavenDB, ElasticSearch Graph Store  Nodes are stored independently, and the relationships between nodes (edges) are stored with the data
  • 10. RDBMS vs NOSQL
      RDBMS | NoSQL
      Structured and organized data | Semi-structured or unorganized data
      Structured Query Language (SQL) | No declarative query language
      Tight consistency | Eventual consistency
      ACID transactions | BASE transactions
      Data and relationships stored in tables | No predefined schema
  • 11. What is ElasticSearch?  ElasticSearch is a free and open source distributed inverted index created by Shay Banon.  Built on top of Apache Lucene.  Lucene is a very popular Java-based full-text search index implementation.  First public release was version v0.4 in February 2010.  Developed in Java, so inherently cross-platform.
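To make the term "inverted index" on this slide concrete, here is a minimal sketch in Python. It is not how Lucene stores or analyzes data: the whitespace tokenizer and the function names `build_inverted_index` and `search` are assumptions made purely for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Hypothetical analyzer: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


docs = {
    1: "Hyderabad is a city in Telangana",
    2: "Mumbai is a city in Maharashtra",
}
index = build_inverted_index(docs)
print(search(index, "city telangana"))
```

The lookup goes from term to documents rather than from document to terms, which is why a full-text query does not need to scan every stored document.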
  • 12. Why ElasticSearch?  Easy to scale (Distributed)  Everything is one JSON call away (RESTful API)  Unleashed power of Lucene under the hood  Excellent Query DSL  Multi-tenancy  Support for advanced search features (Full Text)  Configurable and Extensible  Document Oriented  Schema free  Conflict management  Active community
  • 13. Easy to Scale (Distributed)  ElasticSearch is built to scale horizontally out of the box. Whenever you need to increase capacity, just add more nodes, and let the cluster reorganize itself to take advantage of the extra hardware.  One server can hold one or more parts of one or more indexes, and whenever new nodes are introduced to the cluster they simply join the party. Every such index, or part of it, is called a shard, and ElasticSearch shards can be moved around the cluster very easily. RESTful API  ElasticSearch is API driven. Almost any action can be performed using a simple RESTful API using JSON over HTTP.  Responses are always in JSON format.
  • 14. Built on top of Apache Lucene  Apache Lucene is a high performance, full-featured Information Retrieval library, written in Java. ElasticSearch uses Lucene internally to build its state of the art distributed search and analytics capabilities.  Lucene is a stable, proven technology that is continuously gaining features and best practices, which makes it a solid underlying engine to power ElasticSearch. Excellent Query DSL  The REST API exposes a very rich and capable query DSL that is very easy to use. Every query is just a JSON object that can contain practically any type of query, or even several of them combined.  Using filtered queries, with some queries expressed as Lucene filters, helps leverage caching and thus speeds up common queries, or complex queries with parts that can be reused.  Faceting, another very common search feature, can be returned alongside the search results on request, ready for you to use.
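As a rough illustration of "every query is just a JSON object", the sketch below builds a filtered query body in Python. It assumes the `filtered` query form of the 1.x releases these slides target (later major versions replaced it with a `bool` query), and it borrows the `city` and `rank` fields from the cities document used in the hands-on slides. The helper name `filtered_query` is hypothetical.

```python
import json


def filtered_query(text_field, text, filter_field, filter_value):
    """Build a 1.x-style 'filtered' query: full-text match plus a term filter."""
    return {
        "query": {
            "filtered": {
                # Scored full-text part of the search.
                "query": {"match": {text_field: text}},
                # Unscored yes/no part, which is the piece that can be cached.
                "filter": {"term": {filter_field: filter_value}},
            }
        }
    }


body = filtered_query("city", "Hyderabad", "rank", 3)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of a `_search` request.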
  • 15. Multi-tenancy  Multiple indexes can be stored on one ElasticSearch installation - node or cluster. Each index can have multiple "types", which are essentially completely different indexes.  The nice thing is you can query multiple types and multiple indexes with one simple query. Support for advanced search features (Full Text)  ElasticSearch uses Lucene under the covers to provide the most powerful full text search capabilities available in any open source product.  Search comes with multi-language support, a powerful query language, support for geolocation, context aware did-you-mean suggestions, autocomplete and search snippets.  Script support in filters and scorers
  • 16. Configurable and Extensible  Many ElasticSearch configuration settings can be changed while ElasticSearch is running, but some will require a restart (and in some cases re-indexing). Most settings can be changed using the REST API too.  ElasticSearch has several extension points - namely site plugins (which let you serve static content from ES, such as monitoring JavaScript apps), rivers (for feeding data into ElasticSearch), and plugins that add modules or components within ElasticSearch itself. This allows you to switch almost every part of ElasticSearch, if you so choose, fairly easily. Document Oriented  Store complex real world entities in ElasticSearch as structured JSON documents. All fields are indexed by default, and all the indices can be used in a single query, to return results at breathtaking speed. Per-operation Persistence  ElasticSearch's primary motto is data safety. Document changes are recorded in transaction logs on multiple nodes in the cluster to minimize the chance of any data loss.
  • 17. Schema free  ElasticSearch allows you to get started easily. Send a JSON document and it will try to detect the data structure, index the data and make it searchable. Conflict management  Optimistic version control can be used where needed to ensure that data is never lost due to conflicting changes from multiple processes. Active community  The community, other than creating nice tools and plugins, is very helpful and supportive. The overall vibe is really great, and this is an important metric of any OSS project.  There are also some books currently being written by community members, and many blog posts around the net sharing experiences and knowledge.
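The idea behind optimistic version control can be sketched with a toy in-memory store. This is an illustration of the technique only, not ElasticSearch code: `VersionedStore`, `VersionConflict` and `expected_version` are hypothetical names. In the 1.x REST API the equivalent check is requested by passing a `version` parameter with the write.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedStore:
    """Toy in-memory store that mimics optimistic version checks."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def put(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # Reject the write if the document changed since the caller read it.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected {expected_version}, current {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, source)
        return new_version


store = VersionedStore()
v1 = store.put("1", {"city": "Hyderabad"})
v2 = store.put("1", {"city": "Hyderabad", "rank": 3}, expected_version=v1)
try:
    store.put("1", {"city": "Hyd"}, expected_version=v1)  # stale version
    conflict = False
except VersionConflict:
    conflict = True
print(v1, v2, conflict)
```

No lock is held between read and write; the second writer simply finds out at write time that its copy is out of date and can re-read and retry.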
  • 19. Basic Concepts  Cluster: A cluster consists of one or more nodes which share the same cluster name. Each cluster has a single master node which is chosen automatically by the cluster and which can be replaced if the current master node fails.  Node: A node is a running instance of ElasticSearch which belongs to a cluster. Multiple nodes can be started on a single server for testing purposes, but usually you should have one node per server. At startup, a node will use unicast (or multicast, if specified) to discover an existing cluster with the same cluster name and will try to join that cluster.  Index: An index is like a ‘database’ in a relational database. It has a mapping which defines multiple types. An index is a logical namespace which maps to one or more primary shards and can have zero or more replica shards.  Type: A type is like a ‘table’ in a relational database. Each type has a list of fields that can be specified for documents of that type. The mapping defines how each field in the document is analyzed.
  • 20. Basic Concepts  Document: A document is a JSON document which is stored in ElasticSearch. It is like a row in a table in a relational database. Each document is stored in an index and has a type and an id. A document is a JSON object (also known in other languages as a hash / hashmap / associative array) which contains zero or more fields, or key-value pairs. The original JSON document that is indexed will be stored in the _source field, which is returned by default when getting or searching for a document.  Field: A document contains a list of fields, or key-value pairs. The value can be a simple (scalar) value (e.g. a string, integer, date), or a nested structure like an array or an object. A field is similar to a column in a table in a relational database. The mapping for each field has a field ‘type’ (not to be confused with document type) which indicates the type of data that can be stored in that field, e.g. integer, string, object. The mapping also allows you to define (amongst other things) how the value for a field should be analyzed.  Mapping: A mapping is like a ‘schema definition’ in a relational database. Each index has a mapping, which defines each type within the index, plus a number of index-wide settings. A mapping can either be defined explicitly, or it will be generated automatically when a document is indexed.
  • 21. Basic Concepts  Shard: A shard is a single Lucene instance. It is a low-level “worker” unit which is managed automatically by ElasticSearch. An index is a logical namespace which points to primary and replica shards. ElasticSearch distributes shards amongst all nodes in the cluster, and can move shards automatically from one node to another in the case of node failure, or the addition of new nodes.  Primary Shard: Each document is stored in a single primary shard. When a document is sent for indexing, it is indexed first on the primary shard, then on all replicas of the primary shard. By default, an index has 5 primary shards. You can specify fewer or more primary shards to scale the number of documents that your index can handle.  Replica Shard: Each primary shard can have zero or more replicas. A replica is a copy of the primary shard, and has two purposes: a. increase failover: a replica shard can be promoted to a primary shard if the primary fails. b. increase performance: get and search requests can be handled by primary or replica shards.  Documents are identified by index/type/id
  • 22. Configuration  cluster.name: The cluster name identifies the cluster for auto-discovery. If the production environment has multiple clusters on the same network, each cluster name must be unique.  node.name: Node names are generated dynamically on startup, but the user can specify a node name manually.  node.master & node.data: Every node can be configured to allow or deny being eligible as the master, and to allow or deny storing data. node.master allows this node to be eligible as a master node (enabled by default) and node.data allows this node to store data (enabled by default). The following settings can be used to design advanced cluster topologies. 1. If a node should never become a master node and only hold data, it will be the "workhorse" of the cluster: node.master: false, node.data: true 2. If a node should only serve as a master, not store data, and have free resources, it will be the "coordinator" of the cluster: node.master: true, node.data: false 3. If a node should be neither a master nor a data node, but act as a "search load balancer" (fetching data from nodes, aggregating, etc.): node.master: false, node.data: false  Index: A number of options (such as shard/replica options, mapping or analyzer definitions, translog settings, ...) can be set for indices globally, in this file. Note that it makes more sense to configure index settings specifically for a certain index, either when creating it or by using the index templates API. Example: index.number_of_shards: 5, index.number_of_replicas: 1  Discovery: ElasticSearch supports different types of discovery, which make multiple ElasticSearch instances talk to each other. The default type of discovery is multicast. Unicast discovery allows you to explicitly control which nodes will be used to discover the cluster. It can be used when multicast is not present, or to restrict the cluster communication-wise.
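The three topologies above, plus the default, cover all four combinations of the two flags. The short Python sketch below just tabulates them; the function name `node_role` and the wording of the role labels are assumptions for illustration, based on the nicknames used on the slide.

```python
def node_role(master, data):
    """Name the role implied by a node.master / node.data pair."""
    if master and data:
        return "default (master-eligible data node)"
    if data:
        return "workhorse (data only)"
    if master:
        return "coordinator (master only)"
    return "search load balancer (neither)"


for master in (True, False):
    for data in (True, False):
        print(f"node.master: {master}, node.data: {data} -> {node_role(master, data)}")
```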
  • 24. Is it running? http://localhost:9200/?pretty Response: { "status" : 200, "name" : "elasticsearch", "version" : { "number" : "1.3.4", "build_hash" : "f1585f096d3f3985e73456debdc1a0745f512bbc", "build_timestamp" : "2015-04-21T14:27:12Z", "build_snapshot" : false, "lucene_version" : "4.9" }, "tagline" : "You Know, for Search" }
  • 26. Indexing a document Request: PUT test/cities/1 { "rank": 3, "city": "Hyderabad", "state": "Telangana", "population2014": 7750000, "land_area": 625, "location": { "lat": 17.37, "lon": 78.48 }, "abbreviation": "Hyd" } Response: { "_index": "test", "_type": "cities", "_id": "1", "_version": 1, "created": true }
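The same indexing call can be prepared from Python using only the standard library. This is a sketch under assumptions: the node address `http://localhost:9200` is taken from the "Is it running?" slide, the helper name `build_index_request` is hypothetical, and the request is only built here, not sent, so the example runs without a cluster.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Prepare (but do not send) a PUT request that indexes one document."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    data = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request(
    "http://localhost:9200", "test", "cities", 1,
    {"rank": 3, "city": "Hyderabad", "state": "Telangana"},
)
print(request.get_method(), request.full_url)
# Sending it requires a running node: urllib.request.urlopen(request)
```

The URL path mirrors the index/type/id triple that identifies the document.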
  • 28. Getting a document Request: GET test/cities/1?pretty Response: { "_index": "test", "_type": "cities", "_id": "1", "_version": 1, "found": true, "_source": { "rank": 3, "city": "Hyderabad", "state": "Telangana", "population2014": 7750000, "land_area": 625, "location": { "lat": 17.37, "lon": 78.48 }, "abbreviation": "Hyd" } }
  • 29. Updating a document Request: PUT test/cities/1 { "rank": 3, "city": "Hyderabad", "state": "Telangana", "population2013": 7023000, "population2014": 7750000, "land_area": 625, "location": { "lat": 17.37, "lon": 78.48 }, "abbreviation": "Hyd" } Response: { "_index": "test", "_type": "cities", "_id": "1", "_version": 2, "created": false }
  • 30. Searching Search across all indexes and all types: http://localhost:9200/_search Search across all types in the test index: http://localhost:9200/test/_search Search explicitly for documents of type cities within the test index: http://localhost:9200/test/cities/_search There are 3 different types of search queries:  Full Text Search (query string)  Structured Search (filter)  Analytics (facets)
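The three search scopes differ only in how much of the path is filled in. A small Python sketch of that pattern follows; `search_url` is a hypothetical helper, and rejecting a type without an index is a simplification made for this example.

```python
def search_url(base_url, index=None, doc_type=None):
    """Build the _search URL for all indexes, one index, or one type."""
    if doc_type and not index:
        # Simplification for this sketch: a type is always scoped to an index.
        raise ValueError("a type can only be searched within a named index")
    parts = [base_url.rstrip("/")]
    if index:
        parts.append(index)
    if doc_type:
        parts.append(doc_type)
    parts.append("_search")
    return "/".join(parts)


base = "http://localhost:9200"
print(search_url(base))
print(search_url(base, "test"))
print(search_url(base, "test", "cities"))
```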
  • 31. Routing  All the data lives in a primary shard in the cluster. You may have ‘N’ number of shards in the cluster. Routing is the process of determining which shard a document will reside in.  Without a routing value, ElasticSearch has no idea at search time which shard an indexed document is located in, so it broadcasts the request to all shards. This is a non-negligible overhead and can easily impact performance.  Routing ensures that all documents with the same routing value are located on the same shard, eliminating the need to broadcast searches and improving performance.
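Routing of this kind is commonly done by hashing the routing value and taking the result modulo the number of primary shards. The Python sketch below shows that general technique only: `zlib.crc32` is a stand-in hash chosen for the example, not the hash ElasticSearch uses, and `pick_shard` is a hypothetical name.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value to a shard number with hash-modulo."""
    # zlib.crc32 is a stand-in hash for this sketch, not ElasticSearch's own.
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


shards = 5  # default number of primary shards mentioned in the slides
for user in ("alice", "bob", "alice"):
    print(user, "->", pick_shard(user, shards))
```

Because the same routing value always hashes to the same shard, a search that supplies that value can be sent to one shard instead of all of them. It also shows why the number of primary shards cannot change freely once documents are indexed: a different modulus would send existing routing values to different shards.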
  • 32. Data Synchronization  ElasticSearch supports rivers: a river is a pluggable service that runs within the ElasticSearch cluster to pull data (or be pushed data) that is then indexed into the cluster (https://github.com/jprante/ElasticSearch-river-jdbc). Rivers are available for MongoDB, CouchDB, RabbitMQ, Twitter, Wikipedia, MySQL, etc. The relational data is internally transformed into structured JSON objects for the schema-less indexing model of ElasticSearch documents. The plugin can fetch data from different RDBMS sources in parallel, and multithreaded bulk mode ensures high throughput when indexing to ElasticSearch.  Typically, ElasticSearch implementations use a worker role as a layer within the application to push data/entities to ElasticSearch.
  • 34. Monitoring Tools  ElasticSearch-Head - https://github.com/mobz/ElasticSearch-head  Marvel - http://www.elastic.co/guide/en/marvel/current/#_marvel_8217_s_dashboards  Paramedic - https://github.com/karmi/ElasticSearch-paramedic  Bigdesk - https://github.com/lukas-vlcek/bigdesk/