Elastic 5.0
…so much awesomeness!
Matias Cascallares, Solutions Architect
matias@elastic.co
• Made in Argentina, living in Singapore
• Java / Python / NodeJS
• Working with/in open source for the last 8 years
• Using Elasticsearch since 2014, working for Elastic since 2015
• Meme lover
> whoami
The Elastic Stack
It’s Complicated
5.0
5.0
Store, Index & Analyze
• Resilient; designed for
scale-out
• High availability;
multitenancy
• Structured & unstructured
data
Distributed
& Scalable
Developer
Friendly
Search &
Analytics
• Schemaless
• Native JSON
• Client libraries
• Apache Lucene
• Real-time
• Full-text search
• Aggregations
• Geospatial
• Multilingual
• Lower memory usage & improved cluster stability
(new keyword type)
• Better scoring, faster, reduced hardware demand
(Okapi BM25)
• IPv6 type support
Update To Lucene 6
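A minimal sketch of how the split between analyzed and exact-value strings might look in a 5.0 index definition. The mapping type `logs` and the field names `message`, `verb` and `client_ip` are hypothetical; `text`, `keyword` and `ip` are the field types the slide refers to.

```python
import json

# Hypothetical index body for a 5.0 cluster.
# `message` is analyzed for full-text search (scored with BM25 by default),
# `verb` is kept as an exact value for filtering and aggregations,
# `client_ip` uses the ip type, which accepts IPv4 and IPv6 addresses.
index_body = {
    "mappings": {
        "logs": {  # mapping type name, assumed for illustration
            "properties": {
                "message": {"type": "text"},
                "verb": {"type": "keyword"},
                "client_ip": {"type": "ip"},
            }
        }
    }
}

# This JSON would be the body of an index-creation request.
print(json.dumps(index_body, indent=2))
```

A field such as an HTTP verb is a natural fit for `keyword`, since it is never searched word by word.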
• Half the disk space
• Twice as fast to ingest
• 25% faster to search
• For numeric and geospatial fields only
• Scaled floats
• Technically a BKD Tree implementation
Lucene Dimensional Fields
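The idea behind scaled floats can be sketched in a few lines: a decimal value is multiplied by a fixed scaling factor and stored as an integer, which the numeric structures handle compactly. The helper names below are hypothetical, and Python's `round` only approximates whatever rounding Elasticsearch applies internally.

```python
# Hypothetical field definition: prices kept with two decimal places.
price_field = {"type": "scaled_float", "scaling_factor": 100}


def to_scaled(value: float, scaling_factor: int) -> int:
    """Convert a decimal value to the integer that would be stored."""
    return round(value * scaling_factor)


def from_scaled(stored: int, scaling_factor: int) -> float:
    """Recover the decimal value from the stored integer."""
    return stored / scaling_factor


stored = to_scaled(19.99, price_field["scaling_factor"])
print(stored, from_scaled(stored, price_field["scaling_factor"]))
```

Precision beyond the scaling factor is lost, so the factor should match the precision the data actually needs.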
Some Benchmarks
New Scripting Language: Painless
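As an illustration, a scripted update using Painless might carry a body like the one below. The field name `counter` and the parameter `count` are hypothetical, and the `inline` key is the 5.0-era spelling as far as I know (later releases use a different key), so treat the exact shape as an assumption.

```python
import json

# Hypothetical body for an update request that increments a counter.
# Values are passed through `params` rather than concatenated into the
# script text, so the same compiled script can be reused.
update_body = {
    "script": {
        "lang": "painless",
        "inline": "ctx._source.counter += params.count",
        "params": {"count": 4},
    }
}

print(json.dumps(update_body, indent=2))
```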
• Aggregation and suggestion results are
cached on shard level for instant returns
after the first query.
• Combined with a new query rewrite,
typical Kibana dashboards that use “last
X days” type of queries will improve
dramatically.
Shard Request Cache
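One way to picture the query rewrite: each shard compares the requested time range with the minimum and maximum timestamps it actually holds. The sketch below is a conceptual illustration, not Elasticsearch's implementation, and the function name is hypothetical.

```python
def rewrite_range(shard_min: int, shard_max: int,
                  query_from: int, query_to: int) -> str:
    """Decide how a range query could be simplified on one shard."""
    # Every document in the shard falls inside the range: the range
    # clause is redundant, and the simplified request caches well.
    if query_from <= shard_min and shard_max <= query_to:
        return "match_all"
    # No document in the shard can match: skip the work entirely.
    if shard_max < query_from or shard_min > query_to:
        return "match_none"
    # Partial overlap: the range has to be evaluated as written.
    return "range"


# Daily indices covering days 1..3, queried for days 2..10.
for day in (1, 2, 3):
    start, end = day * 24, day * 24 + 23  # hours, inclusive
    print(day, rewrite_range(start, end, 2 * 24, 10 * 24 + 23))
```

With time-based indices, a "last X days" dashboard mostly touches shards that fall entirely inside the range, which is why the combination with the cache pays off.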
Rollover API
• Indices not based on time, but on size of the data.
• Even if your data sizes are not consistent per day, Elasticsearch will use
constant index/shard sizes.
• Set up rules around automatic rollover to a new index, with aliases.
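A rollover call could be assembled roughly as follows. The alias name `logs_write` and the thresholds are hypothetical; the sketch uses the `max_age` and `max_docs` conditions, with document count standing in for data size.

```python
import json
from typing import Tuple


def rollover_request(alias: str, max_age: str, max_docs: int) -> Tuple[str, str, dict]:
    """Build the method, path and body for a rollover request."""
    path = f"/{alias}/_rollover"
    # The alias moves to a new index once any one condition is met.
    body = {"conditions": {"max_age": max_age, "max_docs": max_docs}}
    return "POST", path, body


method, path, body = rollover_request("logs_write", "7d", 1_000_000)
print(method, path)
print(json.dumps(body))
```

Writers keep indexing into the alias, so they never need to know which concrete index is currently behind it.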
Shrink API
• Reduce resources on immutable data
• Easily reduce the number of shards to free up resources
• An index can be shrunk to a factor of its original number of shards
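The "factor" rule is easy to check by hand: the target shard count must divide the source shard count evenly. A small sketch, with hypothetical index names and helper functions:

```python
from typing import List, Tuple


def valid_shrink_targets(source_shards: int) -> List[int]:
    """Shard counts that evenly divide the source index's shard count."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]


def shrink_request(source: str, target: str, shards: int) -> Tuple[str, str, dict]:
    """Build the method, path and body for a shrink request."""
    body = {"settings": {"index.number_of_shards": shards}}
    return "POST", f"/{source}/_shrink/{target}", body


print(valid_shrink_targets(8))   # 8 shards can become 4, 2 or 1
print(valid_shrink_targets(15))  # 15 shards can become 5, 3 or 1
print(shrink_request("logs-2016.10", "logs-2016.10-small", 1))
```

An index with a prime number of shards can only be shrunk to a single shard.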
• Low-level client
• Allows communication through HTTP/S
• Sync and Async semantics
• Connection handling
• Node discovery (sniffer module)
Java REST Client
• Define processing pipelines right in the Elasticsearch cluster.
• Depending on use case, can simplify the architecture
• Has Processors for the most common actions.
• Combine it with Logstash when needed for power & flexibility.
Ingest Node
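A pipeline definition might look like the sketch below, chaining a few of the common processors. The pipeline purpose, field names and date format are hypothetical, and as far as I know the `geoip` processor ships as a separate plugin in 5.0, so its availability is an assumption.

```python
import json

# Hypothetical pipeline for web access logs. Processors run in order.
pipeline = {
    "description": "Parse access-log lines (illustrative)",
    "processors": [
        # Split the raw line into named fields.
        {"grok": {"field": "message", "patterns": ["%{COMBINEDAPACHELOG}"]}},
        # Parse the extracted timestamp into a proper date.
        {"date": {"field": "timestamp", "formats": ["dd/MMM/yyyy:HH:mm:ss Z"]}},
        # Enrich the document with location data for the client address.
        {"geoip": {"field": "clientip"}},
        # Drop the raw line once it has been parsed.
        {"remove": {"field": "message"}},
    ],
}

# This JSON would be stored under the ingest pipeline endpoint and then
# referenced by name when indexing documents.
print(json.dumps(pipeline, indent=2))
```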
Bootstrap Checks
• Detects if it’s running in production or development mode
• When running in production, it will now refuse to start under certain conditions
that could seriously impact performance, stability, or data integrity
‒ Heap size (initial vs max)
‒ Memory lock (mlockall)
‒ Virtual memory size
‒ File descriptors
‒ Threads
‒ JVM in server mode
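The behaviour can be sketched as a list of checks that only block startup in production mode. This is an illustration and not Elasticsearch's implementation: the function names are hypothetical, only three of the checks are modelled, and the file descriptor threshold is the value I believe 5.0 uses.

```python
from typing import List

MIN_FILE_DESCRIPTORS = 65536  # assumed threshold


def bootstrap_errors(initial_heap_mb: int, max_heap_mb: int,
                     max_file_descriptors: int,
                     memory_lock_requested: bool,
                     memory_locked: bool) -> List[str]:
    """Collect every failed check instead of stopping at the first."""
    errors = []
    if initial_heap_mb != max_heap_mb:
        errors.append("initial heap size must equal maximum heap size")
    if max_file_descriptors < MIN_FILE_DESCRIPTORS:
        errors.append("max file descriptors is too low")
    if memory_lock_requested and not memory_locked:
        errors.append("memory locking requested but memory is not locked")
    return errors


def should_start(production_mode: bool, errors: List[str]) -> bool:
    """Failed checks are fatal only in production mode."""
    return not (production_mode and errors)


errors = bootstrap_errors(1024, 2048, 4096, True, False)
print(errors)
print(should_start(False, errors), should_start(True, errors))
```

Reporting all failures at once saves a restart per misconfigured setting.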
More Goodies…
• Dots in field names were supported in 1.x and removed in 2.x. 5.0
supports dots in field names again!
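To my understanding, 5.0 treats a dotted field name as a path into object fields. The sketch below mimics that interpretation on a plain dictionary; the function name and sample document are hypothetical, and conflicts between a leaf and an object at the same path are not handled.

```python
def expand_dots(doc: dict) -> dict:
    """Rewrite {"a.b": 1} as {"a": {"b": 1}}."""
    result: dict = {}
    for key, value in doc.items():
        *parents, leaf = key.split(".")
        node = result
        # Walk down, creating intermediate objects as needed.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


flat = {"user.name": "matias", "user.city": "Singapore", "age": 30}
print(expand_dots(flat))
```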
More Goodies…
• New lock method speeds up small-document indexing by up to 15-20%
• New fsync method for increased ingestion speed
• refresh=[true|wait_for] for index, update, delete and bulk APIs
• Migration Helper
‒ Cluster checkup before upgrading
‒ Reindex helper for 1.x indices
‒ Deprecation logging
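The `refresh` parameter mentioned above is passed on the request URL. A small sketch of building such a URL, with hypothetical index, type and id values; `wait_for` makes the call return only once the change is visible to search.

```python
from urllib.parse import urlencode

VALID_REFRESH = {"true", "false", "wait_for"}


def index_path(index: str, doc_type: str, doc_id: str,
               refresh: str = "false") -> str:
    """Build the path for an index request with a refresh policy."""
    if refresh not in VALID_REFRESH:
        raise ValueError(f"unsupported refresh value: {refresh}")
    return f"/{index}/{doc_type}/{doc_id}?" + urlencode({"refresh": refresh})


print(index_path("tweets", "tweet", "1", refresh="wait_for"))
```

`wait_for` avoids forcing an immediate refresh on every write, which is the cost of `true`.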
Version Compatibility
IDX_v1x IDX_v2x IDX_v5x
ES 1.X
ES 2.X
ES 5.X
Website: www.elastic.co
Products: https://www.elastic.co/products
Forums: https://discuss.elastic.co/
Community: https://www.elastic.co/community/meetups
Twitter: @elastic
Thank You.

Elasticsearch 5.0

Editor's Notes

  • #5 Totally different products, different teams, different programming languages
  • #10 String is being split into Keyword and Text. Text will be similar to the existing String, but Keyword allows some basic strings to live off-heap, which will increase stability and reduce heap usage. This will be great for things like HTTP verbs (GET, POST, PUT, DELETE). BM25 considers the length of the document: the longer the document, the higher the odds of matching it.
  • #11 We added a new BKD tree to Lucene, which lets us do lots of neat things with numbers very efficiently. The new Geo type merges geopoint and geoshape, which makes development easier, but also wraps up into a much more efficient structure. It can also be used to build very large number support, which is needed for IPv6 types and other big numbers (like nanosecond-granularity epoch
  • #12 We have POIs from London. In this video you can see that where the density of POIs is higher, the storage grid has a smaller resolution.
  • #13 Go to benchmarks.elastic.co and check it out for yourself. We run this every night so we can detect any performance regression.
  • #14 Go to benchmarks.elastic.co and check it out for yourself. We run this every night so we can detect any performance regression.
  • #15 Lucene Expressions. Groovy (we thought it could be sandboxed). Painless is twice as fast as Groovy. More importantly, it is safe and secure.
  • #19 Persistent connections. Failed connection penalization. Load balancing/failover.
  • #20 GROK, GEOIP, DATE, change field names, remove fields, timestamp, etc
  • #21 Before the sheet hits the fan