1
Search, Time Series, and Graph Analysis in the Cloud
Dave Erickson
dave@elastic.co
Visualizing Data in Elasticsearch
2
Dave Erickson – Developer
• Biotech
• Electronic Archives & Libraries
• Geospatial
• Healthcare
• Air Traffic Control
• Financial Services
3
4
Elastic Stack: Real Time Search & Analytics at Scale
• Kibana: User Interface
• Elasticsearch: Store, Index, & Analyze
• Logstash, Beats: Ingest
• + X-Pack: Security, Alerting, Monitoring, Reporting, Graph
• Elastic Cloud
5
6
Visualization is Important
https://www.reddit.com/r/dataisugly/
7
Visualization in the Cloud
• Qualities We Want:
‒ Parallel
‒ Highly Available
‒ Platform Independent
‒ Multi-tenancy
‒ Extensible
• Use Cases:
‒ Search, Discovery, & Analytics
‒ Metrics & Time Series Data
‒ Structured & Unstructured
‒ Security Analytics
8
Wait …
Why would you use a search engine for analytics?
9
Search indexes have been around for a long time
10
Scaled, distributed search indexes have been around for a long time
11
Electronic search engines have been around for a long time
1928 – patent application by Emanuel Goldberg for a “Statistical Machine”
http://www.google.com/patents/US1838389
Basically an optical version of grep that predates almost everything
12
Timeline, in no way complete
• 7th Century B.C.E. ? – library catalogs
• 1928 – Goldberg “Statistical Machine”
– Optical search on microfilm
• 1945 – Vannevar Bush “microfilm rapid selector”; “Memex”
• 1960s – SMART Information Retrieval System (Cornell U.)
• 1973 – grep first appears in Unix v4
• 1990s – WWW search engines
• 1999 – Doug Cutting Lucene search indexer
13
Inverted Indexes
• Pay the cost at indexing time (insertion time)
• Reap the benefits at retrieval time
Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”

Term     Postings List   Statistics (count)
quick    1               1
brown    1, 2, 3         3
fox      1, 2            2
forest   2               1
bear     3               1
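
A minimal Python sketch of how the table above could be built from the three example documents. The function name `build_inverted_index` and the stopword set are assumptions for illustration (the slide's table omits "the" and "in", so the sketch drops them); this is not how Lucene or Elasticsearch implement indexing internally.

```python
from collections import defaultdict

# The three example documents from the slide, keyed by document id.
docs = {
    1: "the quick brown fox",
    2: "brown fox in the forest",
    3: "brown bear",
}

# Assumption: common words are dropped at indexing time, which would
# explain why "the" and "in" do not appear in the slide's table.
STOPWORDS = {"the", "in"}


def build_inverted_index(documents):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            if term not in STOPWORDS:
                postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


index = build_inverted_index(docs)
for term, ids in index.items():
    # The "statistics" column is simply the length of the postings list.
    print(f"{term:<8}{ids}  count={len(ids)}")
```

All of the work happens once, when documents are inserted; every later question is a lookup against `index`.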
14
Pretty Good At Retrieval
Find documents mentioning “foxes” ?
Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”

Term     Postings List   Statistics (count)
quick    1               1
brown    1, 2, 3         3
fox      1, 2            2
forest   2               1
bear     3               1
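
Retrieval is a single dictionary lookup once the query word has been reduced to an indexed term. The sketch below assumes the query "foxes" is normalized to "fox" before the lookup; the `normalize` helper is a hypothetical, deliberately crude plural stripper, not a real stemmer.

```python
# Postings lists copied from the slide's table.
index = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}


def normalize(word):
    """Hypothetical analyzer: lowercase and strip a plural suffix."""
    word = word.lower()
    for suffix in ("es", "s"):
        # Only strip when a reasonably long stem would remain.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def search(word):
    """Return the ids of documents whose postings list matches the word."""
    return index.get(normalize(word), [])


print(search("foxes"))
```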
15
Excellent at Search
Find documents mentioning “quick” AND “fox”?

Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”

Term     Postings List   Statistics (count)
quick    1               1
brown    1, 2, 3         3
fox      1, 2            2
forest   2               1
bear     3               1
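
One way to picture the AND query: intersect the postings lists of the two terms. This is a sketch using Python sets; the function name `search_and` is an assumption, and a real engine would walk sorted postings lists rather than build sets.

```python
# Postings lists copied from the slide's table.
index = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}


def search_and(*terms):
    """Return ids of documents that contain every one of the given terms."""
    postings = [set(index.get(term, [])) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


print(search_and("quick", "fox"))
```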
16
Excellent at Real Time Analytics
What was the most commonly mentioned term?

Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”

Term     Postings List   Statistics (count)
quick    1               1
brown    1, 2, 3         3
fox      1, 2            2
forest   2               1
bear     3               1
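
The analytics question needs no scan of the documents: the count column is the length of each postings list, so ranking terms is a sort over the index. A sketch, with `top_terms` as an assumed helper name:

```python
# Postings lists copied from the slide's table.
index = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}


def top_terms(n):
    """Return the n terms mentioned in the most documents, with counts."""
    counts = [(term, len(ids)) for term, ids in index.items()]
    # Sort by count, largest first; Python's sort keeps ties in index order.
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return counts[:n]


print(top_terms(1))
```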
17
Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”

Histogram about the mention of foxes over time:

Term     Postings List   Statistics (count)
quick    1               1
brown    1, 2, 3         3
fox      1, 2            2
forest   2               1
bear     3               1
18
Columnar Indexes
Document (1)
  text: “the quick brown fox”
  date: Monday
Document (2)
  text: “brown fox in the forest”
  date: Tuesday
Document (3)
  text: “brown bear”
  date: Monday

Doc id   Date
1        Monday
2        Tuesday
3        Monday

Term     Postings List   Statistics (count)
quick    1               1
brown    1, 2, 3         3
fox      1, 2            2
forest   2               1
bear     3               1
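
Combining the two structures answers the histogram question: the inverted index gives the document ids for a term, and the column gives the date for each id. The sketch below models the column as a plain dict named `date_column` (an assumed name); it illustrates the idea only, not the on-disk layout a search engine uses.

```python
from collections import Counter

# Postings lists copied from the slide's table.
index = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}

# Column of dates keyed by document id, as in the "Doc id / Date" table.
date_column = {1: "Monday", 2: "Tuesday", 3: "Monday"}


def histogram(term):
    """Count, per date, the documents that mention the term."""
    return Counter(date_column[doc_id] for doc_id in index.get(term, []))


for day, count in histogram("fox").items():
    print(f"{day:<8}{'#' * count}")
```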
19
Now do it in parallel
• Distributed
• Non-blocking
• Read / Write
• Commodity hardware
• Fault-tolerance
• High Availability
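
A rough sketch of the "distributed" bullet only: split the documents across shards, compute partial term counts on each shard concurrently, then merge the partial results. The two-shard split, the stopword set, and the names `count_terms` and `scatter_gather` are assumptions for illustration; threads stand in for separate machines, and fault tolerance and availability are not modeled.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical split of the three example documents across two shards.
shards = [
    {1: "the quick brown fox", 3: "brown bear"},
    {2: "brown fox in the forest"},
]

# Assumption: same stopwords as implied by the slide's table.
STOPWORDS = {"the", "in"}


def count_terms(shard):
    """Per-shard work: count how many local documents mention each term."""
    counts = Counter()
    for text in shard.values():
        counts.update(set(text.lower().split()) - STOPWORDS)
    return counts


def scatter_gather(all_shards):
    """Run the per-shard count concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(all_shards)) as pool:
        partials = list(pool.map(count_terms, all_shards))
    return sum(partials, Counter())


totals = scatter_gather(shards)
print(totals.most_common(1))
```

Because the counts are additive, the merged result matches what a single index over all three documents would report.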
20
Use Cases
21
22
23
24
25
26
Thank You!
dave@elastic.co

Visualizing Data in Elasticsearch DevFest DC 2016


Editor's Notes

  • #15 You’ve done the work ahead of time by building the index. Access is fast.
  • #16 Incredibly flexible ad-hoc query, structured or unstructured
  • #17 Wait .. Did we just do analytics?
  • #18 Wait .. Did we just do analytics?