New features coming soon with the Elastic Stack 6.0 (Elasticsearch, Logstash, Kibana, Beats). Presented at the Elasticsearch Meetup San Francisco on Nov 2, 2017.
Beats <3 containerization
Monitor your Docker and Kubernetes deployments with ease
• New Kubernetes module in Metricbeat
  ‒ CPU, memory, network bytes, and more
• New add_docker_metadata processor
  ‒ Container ID, name, image, labels
• New add_kubernetes_metadata processor
  ‒ Pod name, pod namespace, container name, pod labels
Multiple Pipelines, One Logstash
• Run multiple, distinct workloads on a single Logstash JVM
• Simplify dataflow logic by managing per-data-source logic independently
• Monitor each pipeline separately with the new Pipeline Viewer
(Diagram: a single Logstash instance running JDBC, Netflow, and Apache pipelines side by side)
Zoom in on your Pipelines
Pipeline Viewer (X-Pack feature: Basic, free)
• Visualize pipeline topologies as graphs
• Reveal bottlenecks at the plugin level
• Optimize dataflow with better metrics
• Integrated with the Monitoring UI
Upgrade (2017)
• New Upgrade Assistant (UI & API)
• Zero-downtime upgrades
  ‒ Rolling restarts from latest 5.x to 6.x
  ‒ Cross-cluster search across major versions
Tapping into Lucene 7 goodness (sparse doc values)
Space-saving columnar store
• Better for storing sparse fields
• Save on disk space & file system cache

user   | first | middle | last   | age | phone
johns  | Alex  |        | Smith  |     |
jrice  | Jill  | Amy    | Rice   |     | 508.567.1211
mt123  | Jeff  |        | Twain  | 56  |
sadams | Sue   |        | Adams  |     |
adoe   | Amy   |        | Doe    | 31  |
lp12   | Liz   |        | Potter |     |
Tapping into Lucene 7 goodness (index sorting)
Much speedier sorted queries
• Sort at index time vs. query time
• Optimize on-disk format for some use cases
• Improve query performance at the cost of index performance

Illustration: query for the top 3 player scores.
5.x (index unsorted): Player 1: 600, Player 2: 0, Player 3: 200, Player 4: 700, Player 5: 300, ... Player 1907: 800. The query must examine every document.
6.x (index sorted by score): Player 1907: 800, Player 4: 700, Player 1: 600, then 300, 200, 0, ... The query can stop after collecting the top 3 hits.
Large Improvements to Replication
New operation-based approach to recovery (sequence numbers)
• Limit syncs to only changed documents (instead of file-based recovery)
• Fast replica recovery after temporary unavailability (network issues, etc.)
• Re-sync on primary failure
• Laying the foundation for future big-league features
  ‒ Cross-datacenter replication
  ‒ Changes API (TBD)
Simpler data models with type removal
• Breaking change
• Gradual migration path
‒ 6.0 indices can be created with only one type
‒ Existing 5.x indices using _type will continue to function
• Introducing new APIs for type-less operations
Say goodbye to _type confusion
6.0 starts Kibana on the accessibility path
• High contrast color scheme
• Keyboard accessibility
• Screen reader support
• More improvements on the way
Accessibility improvements
Kibana now supports multiple query languages
• Lucene Query Language (default)
• Kuery (off by default, experimental in 6.0)
• ... perhaps others in the future
We want your feedback!
• Enable Kuery from Advanced Settings
More ways to query with Kuery
Consistent syntax and simple to get started
geoBoundingBox("coordinates", topLeft="40.73, -74.1", bottomRight="40.01, -71.12")
We could pull out these features individually by functionality but we talk about this together because we want to provide a better experience for Docker and Kubernetes. You can now pull in metadata about your containers, your pods and even directly gather metrics and logs from your Kubelet to keep an eye on what’s happening across the board.
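As a sketch of how these pieces fit together, here is a minimal Metricbeat configuration combining the new Kubernetes module with both metadata processors. The host, period, and metricset list are illustrative; adjust them for your Kubelet setup.

```yaml
metricbeat.modules:
  - module: kubernetes
    metricsets: ["node", "pod", "container"]
    hosts: ["localhost:10255"]      # Kubelet read-only port (illustrative)
    period: 10s

processors:
  - add_kubernetes_metadata: ~      # enrich events with pod name, namespace, labels
  - add_docker_metadata: ~          # enrich events with container ID, name, image, labels
```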
We've added support for running multiple, self-contained pipelines in the same Logstash instance (on the same JVM). Logstash configurations can be complex with multiple data flows built using conditionals. This feature will help simplify that — if your data flow and processing can be separated, you can run them in an isolated fashion as separate pipelines. This separation means that a blocked output in one pipeline won’t exert backpressure in the other pipeline.
A new configuration file, pipelines.yml, has been added to define the multiple pipelines that run on the same instance. Users can dynamically add, modify, and remove pipeline configurations. This feature also lets each pipeline run with its own runtime settings, such as worker count and batch size.
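A hypothetical pipelines.yml illustrating the shape of the file; the pipeline IDs, config paths, and settings values are made up for the example.

```yaml
# pipelines.yml: one entry per self-contained pipeline
- pipeline.id: apache
  path.config: "/etc/logstash/conf.d/apache.conf"
  pipeline.workers: 4             # per-pipeline runtime settings
  pipeline.batch.size: 250
- pipeline.id: netflow
  path.config: "/etc/logstash/conf.d/netflow.conf"
```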
The monitoring UI got a significant upgrade with this new X-Pack Basic feature. Users can now visualize their often complex pipeline configuration as a directed acyclic graph (DAG) representation. This UI provides a simple way to understand the overall pipeline topology, data flow, branching logic, and granular plugin level metrics. We overlay important metrics such as events per second, and time spent in milliseconds for each plugin in this view. Plus, there are visual indicators (colored labels) on the components when events spend extra time in certain plugins — this should draw your attention to the problem areas, providing an easy way to diagnose bottlenecks and optimize them.
Major versions are a time to make breaking changes, to clean out the cruft. With previous versions, you had to do a lot of research in the release notes to figure out what changes to make to your application to work with the new version. In 5.x, we have made this much easier. Wherever possible, we have added deprecation logging to warn you about functionality that is either going away or changing, and (wherever possible) we have added a backwards compatibility layer which allows you to migrate your application to the new functionality before moving to 6.0.
The upgrade experience got a major overhaul in 6.0.
The new Upgrade Assistant UI (an X-Pack Basic feature) and revamped APIs greatly simplify upgrade preparation.
With 6.0, we introduce two changes that make it possible to upgrade your existing deployments with zero downtime.
Upgrading to 6.0 with Rolling Restarts
You will be able to upgrade from the latest 5.x version to 6.0 using rolling restarts, without any cluster downtime. The only exception to this is if you use X-Pack Security without SSL/TLS enabled. TLS between nodes is required in X-Pack Security in 6.0 and the only way to enable it if you aren’t already using it is to do a full cluster restart, which you can choose to do either in 5.x or in 6.0.
Cross Cluster Search Across Major Versions
As with previous major version upgrades, Elasticsearch 6.0 will be able to read indices created in 5.x, but not those created in 2.x. However, instead of needing to reindex all of your old indices, you can choose to leave them in a 5.x cluster and to use Cross Cluster Search to search across both your 6.x and 5.x clusters at the same time.
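A rough sketch of the wiring, assuming a 5.x cluster whose transport endpoint is es5-node:9300 and a remote alias of legacy5x (both made up); the search.remote.* setting name follows the 6.0-era naming.

```yaml
# elasticsearch.yml on the 6.x cluster: register the 5.x cluster as a remote
search.remote.legacy5x.seeds: ["es5-node:9300"]
# A single request can then search both clusters at once:
#   GET logs-*,legacy5x:logs-*/_search
```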
Doc values (the columnar data store in Elasticsearch) have allowed us to escape the limitations of JVM heap size to support scalable analytics on larger amounts of data. Doc values are a very good fit for dense values, where every document has a value for every field.
But they have been a poor fit for sparse values (many fields, with few documents having a value for each field), where the matrix structure ends up wasting a lot of space.
Lucene 7 brings support for sparse doc values — an alternative encoding format for the sparse case which can save a lot of both disk space and file-system cache.
Imagine that you have a large search-heavy index. Searches should be super-fast, but a significant part of every search request is sorting the results into the correct order in order to return just the top 10 best hits. With index sorting, you can pay the price of sorting at index time (30-40% of throughput) instead of at search time. That way, a search can terminate as soon as it has gathered sufficient hits.
To take advantage of this, your documents need to be sorted at index time in the same order as will be used for your primary sort criterion at search time, e.g. by price or timestamp. This means that it won’t work well where your primary sort is on the relevance _score. It also isn’t suitable for searches with aggregations, as aggregations have to examine all documents regardless and can’t terminate early.
However, there is another non-obvious benefit of index sorting. Sorting on low-cardinality fields such as age, gender, is_published, which are commonly used as filters, can result in more efficient searches as all potential matching documents are grouped together.
Elasticsearch 6.0.0-alpha1 supports index sorting (and so would already benefit the low-cardinality use case), but doesn’t yet expose the early termination of searches. See Index Sorting for more details.
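To make this concrete, here is a minimal sketch of creating a 6.0 index sorted by score, matching the player-scores example above; the index, type, and field names are illustrative.

```console
PUT game_scores
{
  "settings": {
    "index.sort.field": "score",
    "index.sort.order": "desc"
  },
  "mappings": {
    "doc": {
      "properties": {
        "score": { "type": "long" }
      }
    }
  }
}
```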
While synced flush has greatly improved shard recovery times for indices that are not being written to, recovery of active indices is still a slow and heavy operation. An active replica on a node that leaves the cluster for a brief period still needs to copy over all or most of the files in the primary shard in order to bring itself up to date.
The new Sequence Numbers infrastructure assigns an incremental operation ID to every index, update, or delete. This new infrastructure allows a replica to ask the primary for all operations from X onwards. If these operations are found in the primary’s translog an older replica can bring itself up to date by just replaying the transaction log and avoid the need to copy files.
This is a feature that is partially present in 6.0.0-alpha1 and will continue to evolve towards 6.0 and during the 6.x series. The main areas of improvements are:
Increase the chance of a successful ops-based recovery by keeping transaction logs around for longer than strictly necessary. It will be possible to configure how long to keep old transaction logs, allowing you to balance disk usage against longer outage periods.
If a primary shard fails and it is configured to have multiple replicas, it is possible for each replica to have different operations in flight — operations which have not yet been acknowledged to the user. Sequence numbers allow the replicas to sync with the newly elected primary immediately rather than wait for the next recovery to ensure that all shards hold the same data.
Sequence numbers can improve optimistic locking, which is currently implemented using internal versioning.
See Consensus and Replication in Elasticsearch for more on the topic.
Long term we are planning to use Sequence Numbers to power new features like a Changes API and Cross Data-Centre Disaster Recovery.
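You can already see sequence numbers surfacing in 6.0: every write response carries the operation's _seq_no and the primary term under which it was performed. A sketch (the index name is made up, and the response shown is indicative rather than verbatim):

```console
PUT products/doc/1
{ "name": "widget" }

# The 6.0 response includes the new per-operation metadata, along the lines of:
# { "_index": "products", "_type": "doc", "_id": "1",
#   "_seq_no": 0, "_primary_term": 1, "result": "created", ... }
```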
Search for the documents you want to export in the Discover app, and then export matching documents as a CSV file via the reporting menu. CSV export comes with X-Pack basic, which is our free license.
• Currently limited to 10 MB, aimed at the ad-hoc analysis use case
• The limit can be raised through the kibana.yml file, but is still bounded by Elasticsearch settings
• The next phase is to use the reporting system for chunked export of larger files
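For reference, a kibana.yml sketch raising the export cap; the byte value is an example, and the setting name follows the 6.x X-Pack reporting convention:

```yaml
# kibana.yml: raise the CSV export limit from the 10 MB default
xpack.reporting.csv.maxSizeBytes: 20971520   # 20 MB; Elasticsearch-side limits such as
                                             # http.max_content_length still apply
```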
You can now enter full-screen mode when viewing a dashboard. This hides the browser chrome and the top nav bar. If you have any filters applied, you'll see the filter bar; otherwise that will be hidden as well. To exit full-screen mode, hover over and click the Kibana button on the lower left side of the page, or simply press the ESC key. This mode complements the Dashboard Only Mode introduced in alpha2, and together they make a great solution for monitors in NOCs, SOCs, and other kiosks around the office!
In 6.0 we're working across Kibana to improve Accessibility, one of those efforts is to make the colors in the UI have appropriate contrast for folks who have different forms of color blindness. We've redone the styling for most of Kibana in this beta to address these issues. Here are some sample screens:
In #12282 we introduce an experimental Kibana Query Language. It is disabled by default and can be enabled through the Kibana configuration.
Kibana currently provides four different search mechanisms with overlapping responsibilities:
Lucene query syntax in the query bar
Query DSL in the query bar
Filters created via the UI (which could include custom query DSL if edited)
Console
Exposing the Lucene query syntax and the query DSL to users creates a few problems. Since we don't control the query syntax we can't implement features that would require introspection into a user's query. This includes things like:
Safe and seamless migrations of saved searches when ES search APIs change
Typeahead/autocomplete in the query bar
Dynamic help text
We could solve these problems by building a model in Kibana to represent raw Elasticsearch queries, but there are other advantages to building our own query language:
We can support query types that are available in the ES query DSL that are not supported by the Lucene query syntax
We can implement functionality that is beyond the scope of the Lucene query syntax, e.g. support for aggregations and visualizations in the query language
We can provide finer grain controls for admins to restrict access to expensive queries, e.g. leading wildcards or regexes
We can add support for scripted fields to the language
We can unify the query bar and the filter bar, eliminating confusion about when to use one or the other
So, we hope you'll turn on the Kibana Query Language and give it a spin and send us feedback!
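To give a feel for the function-style syntax (modeled on the geoBoundingBox example on the slide), here are a few hypothetical Kuery expressions; since the language is experimental in 6.0, treat the exact function names as subject to change:

```text
is("response", 200) and is("extension", "php")
exists("error.message")
geoBoundingBox("coordinates", topLeft="40.73, -74.1", bottomRight="40.01, -71.12")
```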