Real Time search using Spark and Elasticsearch
Sushmitha Ch
Published in: Software

  1. “Streaming Simplified”
     Real Time search using Spark and Elasticsearch
     Sushmitha Ch
     sushmitha@sigmoidanalytics.com

  2. Contents
     • Use case
     • Spark Streaming
     • Spark
     • Elasticsearch
     • Elasticsearch with Spark Streaming
     • elasticsearch-hadoop connector

  3. Use case
     • Stream of the organization's data, in document format:
       { Ip: "1.1.1.1", Response: "HTTP/1.1 404 not found", Port: 80, Timestamp: 1409076765 }
     • Process and enrich the data
     • Store in a database
     • Search and analyze in real time, and generate alerts
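The flow on this slide (take an event document, process and enrich it, then raise alerts) can be sketched in plain Python. This is an illustration under assumptions, not the author's pipeline: the helper names `enrich` and `alerts`, the derived fields `protocol`, `status_code`, `is_error` and `event_time`, and the reading of `Timestamp` as Unix epoch seconds are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical enrichment step for one event in the slide's document format.
# Ip/Response/Port/Timestamp come from the slide; the derived fields below
# are illustrative assumptions.
def enrich(doc):
    enriched = dict(doc)
    # "HTTP/1.1 404 not found" -> protocol "HTTP/1.1", status code 404
    protocol, code, _reason = doc["Response"].split(" ", 2)
    enriched["protocol"] = protocol
    enriched["status_code"] = int(code)
    enriched["is_error"] = int(code) >= 400
    # Assumption: Timestamp is Unix epoch seconds; convert to ISO-8601 UTC
    enriched["event_time"] = datetime.fromtimestamp(
        doc["Timestamp"], tz=timezone.utc).isoformat()
    return enriched

def alerts(docs):
    # Hypothetical alert rule: one message per error response
    return [f"{d['Ip']}:{d['Port']} returned {d['status_code']}"
            for d in docs if d["is_error"]]

event = {"Ip": "1.1.1.1", "Response": "HTTP/1.1 404 not found",
         "Port": 80, "Timestamp": 1409076765}
doc = enrich(event)
print(doc["status_code"], doc["event_time"])
print(alerts([doc]))
```

In a streaming deployment, logic of this kind could run on every incoming record before the enriched document is indexed.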
  4. Use case (contd.)
     Input stream → Processing → Output storage
     Key requirements: low latency, fault tolerance, fast computing, consistency, latency

  5. Spark Streaming
     • Runs on top of Spark
     • Applications are scalable and fault tolerant
     • Can achieve latencies on the order of seconds
     • Can ingest live data streams from sources such as Kafka and Flume
     Live data stream → Spark Streaming → Spark → Processed results

  6. DStreams → Transformations → Output
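The DStream model on this slide (a live stream cut into micro-batches, the same transformation applied to each batch, results pushed to an output) can be illustrated with a toy, stdlib-only simulation. It is not the Spark API: the functions `discretize`, `transform` and `output`, the batch interval of 5 and the `status` field are assumptions, and event timestamps stand in for the arrival times Spark actually batches on.

```python
from collections import Counter

# Toy model of a discretized stream (DStream): cut the stream into
# fixed-interval micro-batches, transform each batch, then output it.
def discretize(events, batch_interval):
    """Group (timestamp, record) pairs into batches by interval start.
    Intervals with no events produce no batch here, whereas Spark
    emits an (empty) batch every interval."""
    batches = {}
    for ts, record in events:
        start = ts - ts % batch_interval
        batches.setdefault(start, []).append(record)
    return [batches[k] for k in sorted(batches)]

def transform(batch):
    # Transformation: keep error responses and count them per IP
    return Counter(r["Ip"] for r in batch if r["status"] >= 400)

def output(results, sink):
    # Output operation: push each batch's result to an external sink
    for batch_result in results:
        sink.append(dict(batch_result))

events = [
    (0, {"Ip": "1.1.1.1", "status": 404}),
    (1, {"Ip": "2.2.2.2", "status": 200}),
    (2, {"Ip": "1.1.1.1", "status": 500}),
    (5, {"Ip": "3.3.3.3", "status": 404}),
]
sink = []
output((transform(b) for b in discretize(events, batch_interval=5)), sink)
print(sink)
```

In Spark Streaming the same shape is expressed with DStream transformations such as `filter`, `map` and `reduceByKey`, followed by an output operation such as `foreachRDD`.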
  7. Spark

  8. Elasticsearch
     • Powerful, open-source, distributed, real-time search engine
     • Core concepts: Document, Mapping, Index, Type, Shard, Node
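Elasticsearch's full-text search rests on Lucene's inverted index, which maps each term to the documents that contain it. A minimal sketch of that structure follows; the class `InvertedIndex`, the regex tokenizer and the AND semantics of `search` are simplifications chosen here (Elasticsearch's `match` query, for instance, combines terms with OR by default), and scoring, mappings, shards and replicas are left out.

```python
import re
from collections import defaultdict

# Minimal inverted index: term -> ids of documents containing the term.
def tokenize(text):
    # Simplified analyzer: lowercase, split on non-alphanumerics
    return re.findall(r"[a-z0-9]+", text.lower())

class InvertedIndex:
    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)

    def index(self, doc_id, doc, field):
        self.docs[doc_id] = doc
        for term in tokenize(str(doc[field])):
            self.postings[term].add(doc_id)

    def search(self, query):
        # AND semantics: a document must contain every query term
        terms = tokenize(query)
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(ids)

idx = InvertedIndex()
idx.index(1, {"Ip": "1.1.1.1", "Response": "HTTP/1.1 404 not found"}, "Response")
idx.index(2, {"Ip": "2.2.2.2", "Response": "HTTP/1.1 200 OK"}, "Response")
print(idx.search("not found"))
print(idx.search("http"))
```

Because lookups touch only the posting sets of the query terms, search cost depends on how many documents match rather than on scanning every stored document.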
  9. Percolate API
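The Percolate API reverses the usual direction of search: queries are registered ahead of time, and each incoming document is checked against them, which fits the alert-generation part of the use case. The toy `Percolator` class below only models that idea; its field/value filters stand in for Elasticsearch's query DSL, and the query ids are made up.

```python
# Toy model of percolation: store queries first, then match each new
# document against all stored queries. Filters are exact field/value
# matches, a stand-in for real Elasticsearch queries.
class Percolator:
    def __init__(self):
        self.queries = {}  # query id -> {field: required value}

    def register(self, query_id, terms):
        self.queries[query_id] = terms

    def percolate(self, doc):
        # Return ids of every stored query the document satisfies
        return sorted(qid for qid, terms in self.queries.items()
                      if all(doc.get(f) == v for f, v in terms.items()))

p = Percolator()
p.register("not-found-alert", {"status": 404})
p.register("port-80", {"Port": 80})
p.register("server-error", {"status": 500})
print(p.percolate({"Ip": "1.1.1.1", "Port": 80, "status": 404}))
```

In Elasticsearch 5.0 and later, this capability is provided by the `percolator` field type and the `percolate` query rather than a separate Percolate API.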
  10. Elasticsearch with Spark Streaming
      Spark Streaming → Elasticsearch
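Writing each processed micro-batch into Elasticsearch typically goes through the `_bulk` API, whose body is newline-delimited JSON: an action line followed by the document source, with a trailing newline. The sketch below only builds such a body; the helper `bulk_body`, the index name `logs` and the use of Ip plus Timestamp as the document id are assumptions, and in practice the elasticsearch-hadoop connector builds and sends these requests itself.

```python
import json

# Build an NDJSON _bulk body for one batch: an "index" action line,
# then the document source, for every document.
def bulk_body(docs, index):
    lines = []
    for doc in docs:
        doc_id = f"{doc['Ip']}-{doc['Timestamp']}"  # hypothetical id scheme
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"

batch = [{"Ip": "1.1.1.1", "Port": 80, "Timestamp": 1409076765}]
print(bulk_body(batch, "logs"), end="")
```

Deriving the id from the event itself makes a replayed batch overwrite the same documents instead of duplicating them, which helps when a failed batch is retried.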
  11. elasticsearch-hadoop connector
      • Spark integrates with Elasticsearch through the connector's dedicated InputFormat and OutputFormat
      • Operations: reading and writing
      • Enable Kryo serialization for efficient handling of conversions
      • Integrates two distributed systems:
        - Hadoop (batch-oriented computing platform)
        - Elasticsearch
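Enabling Kryo and pointing the connector at a cluster comes down to a handful of configuration keys. The sketch below just collects them in a dictionary: `spark.serializer`, `es.nodes`, `es.port`, `es.resource` and `es.index.auto.create` are real Spark and elasticsearch-hadoop settings, while the helper name `es_spark_conf` and the values (`localhost`, `9200`, `logs/access`) are placeholders.

```python
# Hypothetical job configuration: Kryo serialization plus
# elasticsearch-hadoop connection settings (values are placeholders).
def es_spark_conf(nodes="localhost", port=9200, resource="logs/access"):
    return {
        # Kryo is faster and more compact than default Java serialization
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "es.nodes": nodes,        # Elasticsearch host(s)
        "es.port": str(port),     # HTTP/REST port
        "es.resource": resource,  # target index/type to read or write
        "es.index.auto.create": "true",
    }

conf = es_spark_conf()
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

With PySpark installed, these pairs could be applied through `SparkConf().setAll(conf.items())` before creating the streaming context.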
  12. Query → elasticsearch-hadoop → Elasticsearch shards 1 to 5
      • Scalability: each shard is read by its own Hadoop task, so the read runs in parallel across shards
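The scaling idea on this slide, one task per shard with partial results merged, can be modeled in a few lines. This is a toy under stated assumptions: Elasticsearch's documented routing rule is essentially hash(_routing) % number_of_primary_shards using Murmur3, and here `zlib.crc32` is a deterministic stand-in hash; threads stand in for Hadoop/Spark tasks; `NUM_SHARDS`, `route`, `build_shards`, `read_shard` and `parallel_query` are hypothetical names.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 5

def route(doc_id, num_shards=NUM_SHARDS):
    # Stand-in for hash(_routing) % number_of_primary_shards
    return zlib.crc32(doc_id.encode()) % num_shards

def build_shards(docs, num_shards=NUM_SHARDS):
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[route(doc["id"], num_shards)].append(doc)
    return shards

def read_shard(shard, predicate):
    # One "task": scan only its own shard
    return [d for d in shard if predicate(d)]

def parallel_query(shards, predicate):
    # One worker per shard, then merge the partial results
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        parts = list(pool.map(lambda s: read_shard(s, predicate), shards))
    return sorted((d for part in parts for d in part), key=lambda d: d["id"])

docs = [{"id": f"doc-{i}", "status": 404 if i % 3 == 0 else 200}
        for i in range(12)]
shards = build_shards(docs)
hits = parallel_query(shards, lambda d: d["status"] == 404)
print([len(s) for s in shards])
print([d["id"] for d in hits])
```

Adding shards adds independent units of work, which is why read throughput can grow with the number of shards and tasks.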
  13. Architecture

  14. Thank You