3 DBAs walk into a NOSQL bar. A little while later they walk out... because they couldn't find a table.
3 nosql guys walk into a SQL bar... a little while later they leave because they couldn't find a relationship.
Objective – Understanding of ElasticSearch as a search engine and nosql datastore.
About Me
Search Architect
Big Data/Hadoop Engineer
NoSql Advocate
blog.nosqltips.com
@nosqltips on twitter
ElasticSearch
You know, for search
Shay Banon - compass
Elasticsearch.org
Built on Lucene core (4.3 as of 0.90.3)
JSON over HTTP (REST)
ElasticSearch
Very easy scalability – multicast, unicast, AWS
Open Source – Apache 2 license
 https://github.com/elasticsearch
Transports – HTTP, memcached, thrift
Scripting – mvel, javascript, java, python, groovy
 custom scoring, document updates
Schema Free & Document Oriented
$ curl -XPUT http://localhost:9200/twitter/user/kimchy -d '{
"name" : "Shay Banon"
}'
$ curl -XPUT http://localhost:9200/twitter/tweet/1 -d '{
"user": "kimchy",
"post_date": "2009-11-15T13:12:00",
"message": "Trying out elasticsearch, so far so good?"
}'
$ curl -XPUT http://localhost:9200/twitter/tweet/2 -d '{
"user": "kimchy",
"post_date": "2009-11-15T14:12:12",
"message": "You know, for Search"
}'
Search
$ curl -XGET 'http://localhost:9200/twitter/tweet/_search?q=user:kimchy'
$ curl -XGET http://localhost:9200/twitter/tweet/_search -d '{
"query" : {
"term" : { "user": "kimchy" }
}
}'
$ curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -d '{
"query" : {
"range" : {
"post_date" : {
"from" : "2009-11-15T13:00:00",
"to" : "2009-11-15T14:30:00"
}
}
}
}'
GETting Some Data
$ curl -XPUT http://localhost:9200/twitter/tweet/2 -d '{
"user": "kimchy",
"post_date": "2009-11-15T14:12:12",
"message": "You know, for Search"
}'
$ curl -XGET http://localhost:9200/twitter/tweet/2
Schema Mapping
$ curl -XPUT http://localhost:9200/twitter
$ curl -XPUT http://localhost:9200/twitter/user/_mapping -d '{
"properties" : {
"name" : { "type" : "string" }
}
}'
Multi Tenancy
$ curl -XPUT http://localhost:9200/kimchy
$ curl -XPUT http://localhost:9200/elasticsearch
$ curl -XPUT http://localhost:9200/elasticsearch/tweet/1 -d '{
"post_date": "2009-11-15T14:12:12",
"message": "Zug Zug",
"tag": "warcraft"
}'
$ curl -XPUT http://localhost:9200/kimchy/tweet/1 -d '{
"post_date": "2009-11-15T14:12:12",
"message": "Whatyouwant?",
"tag": "warcraft"
}'
$ curl -XGET 'http://localhost:9200/kimchy,elasticsearch/tweet/_search?q=tag:warcraft'
$ curl -XGET 'http://localhost:9200/_all/tweet/_search?q=tag:warcraft'
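A multi-index search only differs from a single-index search in the URL: the index names are joined with commas. The sketch below builds such a URL without sending it. The helper name search_url and its argument order are made up for this illustration and are not part of ElasticSearch.

```bash
# Hypothetical helper (not an ElasticSearch tool): build a search URL that
# spans several indices by joining their names with commas.
# Usage: search_url HOST TYPE QUERY INDEX [INDEX...]
search_url() {
  local host="$1" type="$2" query="$3"
  shift 3
  local IFS=,            # makes "$*" join the remaining arguments with commas
  printf '%s/%s/%s/_search?q=%s\n' "$host" "$*" "$type" "$query"
}

# Prints the same URL as the two-index search on the slide.
search_url http://localhost:9200 tweet tag:warcraft kimchy elasticsearch
```

The result can be passed to curl in quotes, for example curl -XGET "$(search_url ...)".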
Settings
$ curl -XPUT http://localhost:9200/elasticsearch/ -d '{
"settings" : {
"number_of_shards" : 2,
"number_of_replicas" : 3
}
}'
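To see what these settings ask of the cluster, it helps to count shard copies. Each primary shard gets number_of_replicas additional copies, so the arithmetic can be sketched as below. The variable names simply mirror the settings keys; total_copies is a name chosen for this example.

```bash
# Values taken from the settings example above.
number_of_shards=2
number_of_replicas=3

# Every primary shard exists once, plus number_of_replicas replica copies.
total_copies=$(( number_of_shards * (1 + number_of_replicas) ))

echo "primaries=$number_of_shards total_shard_copies=$total_copies"
```

Because copies of the same shard are not allocated to the same node, a cluster would need at least number_of_replicas + 1 nodes (4 here) before every replica can be assigned.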
Distributed
Shards – write scale
Replicas – read scale, durability
Segments
Routing – index and search
Discovery – multicast, unicast, AWS
http://www.youtube.com/watch?v=l4ReamjCxHo
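Routing can be pictured as "hash the routing value (the document id by default), then take it modulo the number of primary shards". The sketch below illustrates only that idea. It uses cksum as a stand-in hash, which is an assumption for the demo and not the hash function ElasticSearch uses internally, so the shard numbers it prints will not match a real cluster. The function name shard_for is hypothetical.

```bash
# Illustration only: cksum (CRC) stands in for ElasticSearch's internal hash.
# Usage: shard_for ROUTING_VALUE NUMBER_OF_PRIMARY_SHARDS
shard_for() {
  local routing="$1" shards="$2" hash
  # cksum prints "<checksum> <byte count>"; keep only the checksum.
  hash=$(printf '%s' "$routing" | cksum | cut -d' ' -f1)
  echo $(( hash % shards ))
}

# The same routing value always lands on the same shard.
for id in 1 2 kimchy; do
  echo "routing=$id -> shard $(shard_for "$id" 2)"
done
```

The modulo step is also why the number of primary shards is fixed when the index is created: changing it would send existing ids to different shards.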
Consistency
Always read consistent with RT GET
 View always consistent after write
Document searchable after short delay (1s)
Write tunable – one, quorum, all
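The three write levels translate into a number of shard copies that must be available before a write proceeds. A rough sketch of that arithmetic follows, assuming the commonly documented rule quorum = int((primary + replicas) / 2) + 1 and assuming quorum is only enforced when there are more than two copies. The function name required_copies is made up for this example.

```bash
# Hypothetical helper: how many active shard copies a write would need.
# Usage: required_copies LEVEL NUMBER_OF_REPLICAS   (LEVEL: one | quorum | all)
required_copies() {
  local level="$1" replicas="$2"
  local copies=$(( 1 + replicas ))   # the primary plus its replicas
  case "$level" in
    one) echo 1 ;;
    all) echo "$copies" ;;
    quorum)
      # Assumption: with only one or two copies, quorum falls back to 1.
      if [ "$copies" -gt 2 ]; then
        echo $(( copies / 2 + 1 ))
      else
        echo 1
      fi ;;
    *) echo "unknown level: $level" >&2; return 1 ;;
  esac
}

# With number_of_replicas = 3 there are 4 copies of each shard.
for level in one quorum all; do
  echo "$level -> $(required_copies "$level" 3) of 4 shard copies"
done
```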
Gateway
Local/NFS
Amazon S3
Hadoop
River
Twitter
CouchDB
MongoDB
RabbitMQ
Wikipedia
Logstash
SOLR over ElasticSearch
Release synchronized with Lucene
Larger community
Larger tool set
Feature set a bit better
XML configuration
ElasticSearch over SOLR
Natively distributed
JSON based
Dynamic, template, and defined schema
Returns source document by default
Avoids overhead of index commit after write
Mock SOLR interface
Rivers
Who uses ElasticSearch
StumbleUpon
Mozilla Foundation
Sony Computer Entertainment
Infochimps
Foursquare
GitHub
Ataxo Social Insider
Sonian Inc.
Demo
Hadoop Integration
Hadoop as gateway storage
LoadFunc/StoreFunc – Pig/MapReduce
Hadoop streaming interface
Manual export and import of data
ES in Big Data
Endpoint for processed data
Aggregator for BI or dashboard (facets)
Used to query reduced data sets for machine learning algorithms
Data storage engine in its own right plus full search capabilities
ElasticStore
Make ES look and function more like a document store while exposing advanced ES features
Influenced by Mongo API
Expose a simpler, more programmer-centric API
Expose a QueryBuilder style API (HQL)
Expose annotations for easier schema definition, properties, analyzers, etc.
Allow both strong and weak object mapping
https://github.com/nosqltips/elasticstore
Resources
www.elasticsearch.org
www.elasticsearch.com
https://github.com/elasticsearch
http://lucene.apache.org/core

Craig Brown speaks on ElasticSearch
