
Schema on read with runtime fields


Elasticsearch has always been fast, but required structuring and indexing your data up front. We're changing that with the introduction of runtime fields, which enable you to extract, calculate, and transform fields at query time. They can be defined after data is indexed or provided with your query, enabling new cost/storage/performance tradeoffs, and letting analysts gradually define fields over time.



  1. Runtime Fields. Gilad Gal, Product Manager, Elasticsearch
  2. Forward-Looking Statements
     This presentation and the accompanying oral presentation contain forward-looking statements, including statements concerning plans for future offerings; the expected strength, performance or benefits of our offerings; and our future operations and expected performance. These forward-looking statements are subject to the safe harbor provisions under the Private Securities Litigation Reform Act of 1995. Our expectations and beliefs in light of currently available information regarding these matters may not materialize. Actual outcomes and results may differ materially from those contemplated by these forward-looking statements due to uncertainties, risks, and changes in circumstances, including, but not limited to those related to: the impact of the COVID-19 pandemic on our business and our customers and partners; our ability to continue to deliver and improve our offerings and successfully develop new offerings, including security-related product offerings and SaaS offerings; customer acceptance and purchase of our existing offerings and new offerings, including the expansion and adoption of our SaaS offerings; our ability to realize value from investments in the business, including R&D investments; our ability to maintain and expand our user and customer base; our international expansion strategy; our ability to successfully execute our go-to-market strategy and expand in our existing markets and into new markets, and our ability to forecast customer retention and expansion; and general market, political, economic and business conditions. Additional risks and uncertainties that could cause actual outcomes and results to differ materially are included in our filings with the Securities and Exchange Commission (the “SEC”), including our Annual Report on Form 10-K for the most recent fiscal year, our quarterly report on Form 10-Q for the most recent fiscal quarter, and any subsequent reports filed with the SEC.
     SEC filings are available on the Investor Relations section of Elastic’s website at ir.elastic.co and the SEC’s website at www.sec.gov. Any features or functions of services or products referenced in this presentation, or in any presentations, press releases or public statements, which are not currently available or not currently available as a general availability release, may not be delivered on time or at all. The development, release, and timing of any features or functionality described for our products remains at our sole discretion. Customers who purchase our products and services should make the purchase decisions based upon services and product features and functions that are currently available. All statements are made only as of the date of the presentation, and Elastic assumes no obligation to, and does not currently intend to, update any forward-looking statements or statements relating to features or functions of services or products, except as required by law.
  3. Runtime fields in a nutshell: schema on read
     A runtime field is a field associated with instructions for calculating it at query time (e.g. a script). Runtime fields can be defined in the mapping or introduced in a query. Other than that, runtime fields behave like any other field in Elasticsearch.
     • Empowering all users to generate fields when needed
     • Flexibility vs. performance at query time
  4. Agenda
     1. What are runtime fields?
     2. Why are runtime fields useful?
     3. How will runtime fields be implemented?
  5. Schema on write: query performance
     Extract, transform, index. Ready for immediate queries and aggregations.
     Advantages:
     ● Immediate response time
     ● Flexibility for new docs
  6. Schema on read: flexibility, cost, ingest pace
     Load almost raw; prepare per query when needed.
     Advantages:
     ● Flexibility for ingested docs
     ● Start without prior knowledge of the data or how it will be used
     ● Improved ingest rate
     Schema on write: query performance
     Extract, transform, index. Ready for immediate queries and aggregations.
     Advantages:
     ● Immediate response time
     ● Flexibility for new docs
  7. Runtime fields: Elastic’s schema on read
     • Instructions for calculating the field when needed (e.g. a script)
     • Defined in the mappings or introduced in a query
     • Smaller index and faster ingest
     • Lower query performance
     • Other than that, like any other field
     Schema on read: flexibility, cost, ingest pace
     Load almost raw; prepare per query when needed.
     Advantages:
     ● Flexibility for ingested docs
     ● Start without prior knowledge of the data or how it will be used
     ● Improved ingest rate
  8. Add to mapping

         PUT /test
         {
           "mappings": {
             "properties": {
               "@timestamp": {
                 "type": "date",
                 "format": "strict_date_optional_time||epoch_second"
               },
               "message": { "type": "wildcard" },
               "status": {
                 "type": "runtime",
                 "runtime_type": "long",
                 "script": "String m = doc['message'].value; int end = m.lastIndexOf(' '); int start = m.lastIndexOf(' ', end - 1) + 1; emit(Long.parseLong(m.substring(start, end)));"
               }
             }
           }
         }

         POST /test/_doc?refresh
         {
           "@timestamp": "1998-04-30T14:30:17-05:00",
           "message": "40.135.0.0 - - [1998-04-30T14:30:17-05:00] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"
         }
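To make the `status` script on this slide concrete, the following Python sketch mirrors its string logic on the sample log line. It is an illustration only: `extract_status` is a hypothetical helper name, and in Elasticsearch the value is computed by the Painless script at query time, not by client code.

```python
def extract_status(message: str) -> int:
    """Mirror of the slide's script: parse the second-to-last
    space-separated token of the log line as an integer."""
    end = message.rfind(" ")                 # space before the response size
    start = message.rfind(" ", 0, end) + 1   # first char after the previous space
    return int(message[start:end])


sample = ('40.135.0.0 - - [1998-04-30T14:30:17-05:00] '
          '"GET /images/hm_bg.jpg HTTP/1.0" 200 24736')
status = extract_status(sample)
print(status)
```

Because nothing is written to the index for `status`, changing this logic later only means changing the script.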
  9. ...and use like any other field

         POST /_async_search
         {
           "query": {
             "bool": {
               "must": [
                 { "match": { "status": "200" } },
                 { "range": { "@timestamp": { "gte": "1998-05-01T00:00:00Z", "lt": "1998-05-02T00:00:00Z" } } }
               ]
             }
           }
         }
  10. Query a runtime field defined on the fly

          POST /_async_search
          {
            "runtime_mappings": {
              "ip": {
                "type": "runtime",
                "runtime_type": "ip",
                "script": "String m = doc['message'].value; emit(m.substring(0, m.indexOf(' ')));"
              }
            },
            "query": {
              "bool": {
                "must": [
                  { "range": { "ip": { "gte": "40.135.0.0", "lt": "40.135.255.255" } } },
                  { "match": { "status": "200" } },
                  { "range": { "@timestamp": { "gte": "1998-05-01T00:00:00Z", "lt": "1998-05-02T00:00:00Z" } } }
                ]
              }
            }
          }

      Sample document:

          POST /test/_doc?refresh
          {
            "@timestamp": "1998-04-30T14:30:17-05:00",
            "message": "40.135.0.0 - - [1998-04-30T14:30:17-05:00] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"
          }
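The `ip` field above is extracted and typed on the fly, and the range clause then compares it as an address. A minimal Python sketch of that behaviour, using the standard `ipaddress` module, might look like this; `extract_ip` and `in_range` are hypothetical names used only for illustration, and the real comparison happens inside Elasticsearch.

```python
import ipaddress


def extract_ip(message: str):
    """Mirror of the slide's script: the client IP is the text before the first space."""
    return ipaddress.ip_address(message[: message.index(" ")])


def in_range(ip, gte: str, lt: str) -> bool:
    """Emulate a range clause with an inclusive lower and exclusive upper bound."""
    return ipaddress.ip_address(gte) <= ip < ipaddress.ip_address(lt)


sample = ('40.135.0.0 - - [1998-04-30T14:30:17-05:00] '
          '"GET /images/hm_bg.jpg HTTP/1.0" 200 24736')
ip = extract_ip(sample)
matches = in_range(ip, "40.135.0.0", "40.135.255.255")
print(ip, matches)
```

Comparing parsed addresses rather than raw strings matters: as plain text, "40.135.9.0" would sort after "40.135.10.0", which is presumably why the field is declared with an `ip` type instead of being left as a keyword.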
  11. Future enhancements: options for defining the function that yields the value in the field
      • Painless script
      • Grok patterns
      • Query time enrichment
      • Source field
  12. Agenda
      1. What are runtime fields?
      2. Why are runtime fields useful?
      3. How will runtime fields be implemented?
  13. Schema on read: extract, transform and index data *only* when needed
      Benefits:
      – Flexibility in defining the data
      – No index footprint (lower TCO)
      – Improved ingest pace
      Beneficial, but we do have better mechanisms to help deal with these.
      Letting analysts define their schema in retrospect.
  14. A new field lifecycle
      1. Index only @timestamp; keep the rest as a log entry in _source
      2. Extract more data with runtime fields
      3. Turn frequently used runtime fields into indexed fields
      Benefits:
      ● Save time and effort
      ● Add fields if and when required, without knowing everything in advance
      ● Index only what you need: save on index size, performance and hardware cost
  15. Fix mapping errors
      1. Index: index data for optimal performance
      2. Retrospective fix: identify an error in the ingest instructions and override the indexed field with a runtime field for already-indexed documents
      3. Index: index new documents with the revised mapping
      Benefits:
      • Fix immediately, without reindexing
      • Queries and schema don’t change (though performance is impacted)
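The retrospective fix on this slide amounts to letting a runtime definition shadow the value that was indexed by mistake. The toy model below sketches that lookup order in Python. Everything in it is an assumption made for illustration: the `Doc` layout, the `resolve_field` helper, and the specific ingest bug (the response size indexed as the status) are not from the slides or the Elasticsearch API.

```python
from typing import Any, Callable, Dict

# Hypothetical in-memory model of one document: the raw `_source` plus the
# values that were written to the index at ingest time.
Doc = Dict[str, Any]
RuntimeFields = Dict[str, Callable[[Dict[str, Any]], Any]]


def extract_status(source: Dict[str, Any]) -> int:
    """Second-to-last space-separated token of the log line."""
    message = source["message"]
    end = message.rfind(" ")
    start = message.rfind(" ", 0, end) + 1
    return int(message[start:end])


def resolve_field(doc: Doc, name: str, runtime_fields: RuntimeFields) -> Any:
    """Use the runtime definition when one shadows `name`,
    otherwise fall back to the value stored in the index."""
    script = runtime_fields.get(name)
    if script is not None:
        return script(doc["_source"])
    return doc["indexed"][name]


doc: Doc = {
    "_source": {
        "message": '40.135.0.0 - - [1998-04-30T14:30:17-05:00] '
                   '"GET /images/hm_bg.jpg HTTP/1.0" 200 24736'
    },
    # Assumed ingest bug: the response size was indexed as the status.
    "indexed": {"status": 24736},
}

before_fix = resolve_field(doc, "status", {})
after_fix = resolve_field(doc, "status", {"status": extract_status})
print(before_fix, after_fix)
```

Callers keep asking for `status` in both cases, which is the point of the slide: the query and the schema stay the same, and only the cost of computing the value changes.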
  16. Field per context: query, visualization, or completely ad hoc

          "runtime_mappings": {
            "ip": {
              "type": "runtime",
              "runtime_type": "ip",
              "script": "String m = doc['message'].value; emit(m.substring(0, m.indexOf(' ')));"
            }
          }

      Benefits:
      • Avoid polluting everyone’s schema with fields that answer a need only for a subset of the users
      • Analyze more efficiently with fields designed to answer a specific need
      “What’s the average size of an article in my index? I need to know for relevance ranking tuning.”
      “Please don’t add it to everyone’s articles index… You’re the only one interested in it, and even you just look at it once a month.”
  17. Autonomy
      Anyone is free to create new fields.
      No collateral impact.
      Adding a runtime field (not indexed).
      Low permission barrier.
      Benefits:
      ● Administrators avoid spending time on creating schema for specific needs
      ● Employees who are permitted to define their own data structure can achieve more with fewer resources
  18. Agenda
      1. What are runtime fields?
      2. Why are runtime fields useful?
      3. How will runtime fields be implemented?
  19. The complex parts are things we already have: putting pre-existing mechanisms together
      • Calculate a field value per document, and do that quickly
        – Preferred a Painless script over adapting ingest processors
      • An index to rely on for the heavy lifting
      • Logic to minimize the cases in which the calculation is performed
      • Async search to deal with slow queries
  20. Async queries: robustness to slow queries
  21. Sync search vs. async search (diagram)
      Sync search: Query → Results, or Query → Timeout
      Async search: Query → Results, or Query → Partial results & ID, then Call w. ID → Complete result set
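The async flow in this diagram (submit a query, get partial results plus an ID, call back with the ID until the result set is complete) can be sketched with a small in-memory stand-in. `ToyAsyncSearch`, its `budget` parameter, and the shard layout are all hypothetical; the sketch does not call Elasticsearch and only imitates the shape of the exchange.

```python
import itertools
from typing import Any, Callable, Dict, List


class ToyAsyncSearch:
    """In-memory stand-in for an async search endpoint (not the real API)."""

    def __init__(self, shards: List[List[int]]) -> None:
        self._shards = shards
        self._searches: Dict[str, Dict[str, Any]] = {}
        self._ids = itertools.count(1)

    def submit(self, predicate: Callable[[int], bool], budget: int = 1) -> Dict[str, Any]:
        """Start a search, scan at most `budget` shards, return what is ready."""
        search_id = f"search-{next(self._ids)}"
        self._searches[search_id] = {"predicate": predicate, "next_shard": 0, "hits": []}
        return self._advance(search_id, budget)

    def get(self, search_id: str, budget: int = 1) -> Dict[str, Any]:
        """Call back with the ID: scan up to `budget` more shards."""
        return self._advance(search_id, budget)

    def _advance(self, search_id: str, budget: int) -> Dict[str, Any]:
        state = self._searches[search_id]
        for _ in range(budget):
            if state["next_shard"] >= len(self._shards):
                break
            shard = self._shards[state["next_shard"]]
            state["hits"].extend(v for v in shard if state["predicate"](v))
            state["next_shard"] += 1
        return {
            "id": search_id,
            "is_partial": state["next_shard"] < len(self._shards),
            "hits": list(state["hits"]),
        }


# Three shards of status codes; search for 200s one shard at a time.
service = ToyAsyncSearch([[200, 404], [200], [500, 200]])
first_response = service.submit(lambda status: status == 200)
response = first_response
while response["is_partial"]:
    response = service.get(response["id"])
print(first_response["hits"], response["hits"])
```

The caller sees useful hits after the first call and never hits a hard timeout, which is the robustness property the slide attributes to async search.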
  22. Efficient calculation at query time
      • Calculate only when needed
        – Aggregations
        – Filter only after filtering by indexed fields
        – Display fields for top documents per query
      • Initial performance tests show the importance of an indexed timestamp
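The "filter only after filtering by indexed fields" point can be illustrated by counting how often the extraction logic runs. In the sketch below, a cheap timestamp check stands in for the indexed filter and `extract_status` stands in for the runtime script. The `search` helper and the sample documents are hypothetical, and a real index would not scan documents linearly as this loop does.

```python
from typing import Dict, List, Tuple


def extract_status(message: str) -> int:
    """Second-to-last space-separated token of the log line."""
    end = message.rfind(" ")
    start = message.rfind(" ", 0, end) + 1
    return int(message[start:end])


def search(docs: List[Dict[str, str]], ts_from: str, ts_to: str,
           status: int) -> Tuple[List[Dict[str, str]], int]:
    """Return matching docs and the number of times the script had to run."""
    script_calls = 0
    hits = []
    for doc in docs:
        # Cheap filter first (stands in for the indexed @timestamp range).
        # ISO-8601 UTC strings of equal length compare correctly as text.
        if not (ts_from <= doc["@timestamp"] < ts_to):
            continue
        # Expensive per-document calculation only for the survivors.
        script_calls += 1
        if extract_status(doc["message"]) == status:
            hits.append(doc)
    return hits, script_calls


docs = [
    {"@timestamp": "1998-05-01T10:00:00Z", "message": 'a - - "GET /x" 200 10'},
    {"@timestamp": "1998-05-01T11:00:00Z", "message": 'b - - "GET /y" 404 20'},
    {"@timestamp": "1998-04-30T19:30:17Z", "message": 'c - - "GET /z" 200 30'},
]
hits, script_calls = search(docs, "1998-05-01T00:00:00Z", "1998-05-02T00:00:00Z", 200)
print(len(hits), script_calls)
```

Only two of the three documents reach the script here; on a large index restricted to a narrow time window the saving would be proportionally larger, which fits the slide's remark about the indexed timestamp.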
  23. Matching is done by the query; only the extract and transform steps are done by a script.
  24. Define a field with the script

          PUT /test
          {
            "mappings": {
              "properties": {
                "@timestamp": {
                  "type": "date",
                  "format": "strict_date_optional_time||epoch_second"
                },
                "message": { "type": "wildcard" },
                "status": {
                  "type": "runtime",
                  "runtime_type": "long",
                  "script": "String m = doc['message'].value; int end = m.lastIndexOf(' '); int start = m.lastIndexOf(' ', end - 1) + 1; emit(Long.parseLong(m.substring(start, end)));"
                }
              }
            }
          }

          POST /test/_doc?refresh
          {
            "@timestamp": "1998-04-30T14:30:17-05:00",
            "message": "40.135.0.0 - - [1998-04-30T14:30:17-05:00] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"
          }
  25. Matching logic is in the query

          POST /_async_search
          {
            "query": {
              "bool": {
                "must": [
                  { "match": { "status": "200" } },
                  { "range": { "@timestamp": { "gte": "1998-05-01T00:00:00Z", "lt": "1998-05-02T00:00:00Z" } } }
                ]
              }
            }
          }
  26. Summary
      • Runtime fields: schema on read in Elasticsearch
      • Gaining flexibility, a smaller index and a faster ingest pace, at a cost to query performance
      • Leveraging existing mechanisms, e.g. the index, async search, Painless, query optimization
      • Facilitating new workflows:
        – Field per context (query, visualization, schema, etc.)
        – Fixing ingest errors in retrospect
        – New field creation and ingest workflow: start working and gradually create the schema
      Runtime fields: coming soon to an Elasticsearch cluster near you
  27. Thank You!
