Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Hadoop is a great platform for storing and processing massive amounts of data. Elasticsearch is the ideal solution for searching and visualizing the same data. Join us to learn how you can leverage the full power of both platforms to maximize the value of your big data.

In this webinar we'll walk you through:
• How Elasticsearch fits in the Modern Data Architecture.
• A demo of Elasticsearch and Hortonworks Data Platform.
• Best practices for combining Elasticsearch and Hortonworks Data Platform to extract maximum insights from your data.

Published in: Technology

Transcript of "Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data"

  1. Combine Apache Hadoop & Elasticsearch to get the most of your big data... © Hortonworks Inc. 2013
  2. Your Presenters Steve Mayzak (@smayzak) – Head of Sales Engineering – Seahawks fan! Mark Lochbihler (@mlochbihler) – Partner Solutions Engineer – HUGE FC Barcelona Fan!
  3. Today’s Topics • Drivers for the Modern Data Architecture (MDA) • Elasticsearch’s role in the MDA • Q&A
  4. Hadoop Adoption “Hadoop’s momentum is unstoppable as its open source roots grow wildly into enterprises. Its refreshingly unique approach to data management is transforming how companies store, process, analyze, and share big data” --Mike Gualtieri, Forrester
  5. APPLICATIONS A Traditional Approach Under Pressure Custom Applications Business Analytics Packaged Applications DATA SYSTEM 2.8 ZB in 2012 85% from New Data Types RDBMS EDW MPP REPOSITORIES 15x Machine Data by 2020 40 ZB by 2020 SOURCES Source: IDC Existing Sources (CRM, ERP, Clickstream, Logs) Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
  6. APPLICATIONS Emerging Modern Data Architecture Custom Applications Business Analytics Packaged Applications DEV & DATA TOOLS SOURCES DATA SYSTEM BUILD & TEST OPERATIONAL TOOLS RDBMS EDW MANAGE & MONITOR MPP REPOSITORIES Existing Sources (CRM, ERP, Clickstream, Logs) Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
  7. MDA Driver #1: A New Approach to Insight Current Approach § Apply schema on write § Heavily dependent on IT Hadoop Approach § Apply schema on read § Support range of access patterns to data stored in HDFS: polymorphic access Single Query Engine SQL Determine list of questions Design solution Right Engine, Right Job batch interactive real-time in-memory Collect structured data Ask questions from list Detect additional questions HADOOP Iterate over structure Transform and Analyze
  8. MDA Driver #2: Data Warehouse Optimization Current Reality § EDW at capacity; some usage from low value workloads § Older transformed data archived, unavailable for ongoing exploration § Source data often discarded Augment with Hadoop § Free up EDW resources from low value tasks § Keep 100% of source data and historical data for ongoing exploration § Mine data for value after loading it because of schema-on-read Analytics 20% ETL Process 30% Analytics 50% Operations 50% Operations 50% HADOOP Parse, cleanse, apply structure, transform
  9. SCALE The Common Journey with Hadoop MDA/Data Lake More data and analytic apps Cost, Insight IT Driven New Analytic Apps New Types of Data LOB Driven SCOPE
  10. Unlock Value in New Types of Data 1. Social Understand how people are feeling and interacting – right now 2. Clickstream Capture and analyze website visitors’ data trails and optimize your website 3. Sensor/Machine Discover patterns in data streaming from remote sensors and machines 4. Geographic Analyze location-based data to manage operations where they occur 5. Server Logs Diagnose process failures and prevent security breaches 6. Unstructured (txt, video, pictures, etc.) Understand patterns in files across millions of web pages, emails, and documents Value + Online archive Data that was once purged or moved to tape can be stored in Hadoop to discover long term trends and previously hidden value
  11. 20 Business Applications of Hadoop Industry Use Case New Account Risk Screens Geographic Clickstream Sensor Assembly Line Quality Assurance Sensor Crowdsourced Quality Assurance Social Use Genomic Data in Medical Trials Structured Monitor Patient Vitals in Real-Time Sensor Recruit and Retain Patients for Drug Trials Social, Clickstream Improve Prescription Adherence Social, Unstructured, Geographic Unify Exploration & Production Data Sensor, Geographic & Unstructured Monitor Rig Safety in Real-Time Clickstream, Text Supply Chain and Logistics Government Server Logs, Text, Social Website Optimization Oil & Gas Machine, Server Logs Localized, Personalized Promotions Pharmaceuticals Machine, Geographic 360° View of the Customer Healthcare Geographic, Sensor, Text Real-time Bandwidth Allocation Manufacturing Server Logs Infrastructure Investment Retail Trading Risk Call Detail Records (CDRs) Telecom Text, Server Logs Insurance Underwriting Financial Services Type of Data Sensor, Unstructured ETL Offload in Response to Federal Budgetary Pressures Structured Sentiment Analysis for Government Programs Social
  12. YARN Unlocks the Data Lake Vision Store all data in one place, interact in multiple ways Single Use System Multi-Use Data Platform Batch Apps Batch, Interactive, Online, Streaming, … 1st Gen of Hadoop 2nd Gen of Hadoop Classic Hadoop Apps Batch MapReduce MapReduce Hive, Pig, others… Batch & Interactive Tez Flexible Data Processing Online Data Processing HBase, Accumulo Stream Processing Storm (cluster resource management & data processing) Efficient Cluster Resource Management & Shared Services HDFS others … Redundant, Reliable Storage (redundant, reliable storage) (YARN) (HDFS)
  13. SCALE The Common Journey with Hadoop MDA/Data Lake More data and analytic apps Cost, Insight IT Driven New Analytic Apps New Types of Data LOB Driven SCOPE
  14. Example Journey Towards a Data Lake PB’s Data Lake PB Risk Management E.g., Fraud Reduction New Business E.g., Data as a Product DATA TB’s Customer Intimacy E.g., 360 Degree View of the Customer DATA LAKE Operational Excellence E.g., Network Maintenance An architectural shift in the data center that uses Hadoop to deliver deep insight across a large, broad, diverse set of data at efficient scale VALUE
  15. Enabling Hadoop for the Enterprise 1 2 3 Capabilities Ensure enterprise capabilities are delivered in 100% open source to benefit all Integration Interoperable with existing data center investments Skills Leverage your existing skills: development, analytics, operations 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
  16. Core Capabilities of Enterprise Hadoop 1 Presentation & Application Enable both existing and new applications to provide value to the organization Capabilities Operations Empower current operations and security tools to manage Hadoop Ensure enterprise capabilities are delivered in 100% open source to benefit all Data Governance BROAD INSIGHT Data Access Integrate with existing systems and move data in/out and within the environment Access your data simultaneously in multiple ways (batch, interactive, realtime) EFFICIENT SCALE Security Provide layered approach to security through Authentication, Authorization, Accountability and Data Protection Operations Allow you to deploy and effectively manage the environment Data Management Store and process all of your Corporate Data Assets Deployment Model Provide the efficient deployment option for your organization
  17. 3 Skills Leverage your existing skills: development, analytics, operations Integration Interoperable with existing data center investments ANALYST 2 Ensure enterprise capabilities are delivered in 100% open source to benefit all OPERATOR 1 Capabilities DEVELOPER Enabling Familiar and Existing Tools COLLECT PROCESS BUILD EXPLORE QUERY DELIVER PROVISION MANAGE MONITOR
  18. APPLICATIONS Requirements for Enterprise Hadoop 1 DATA SYSTEM 2 SOURCES 3 Business Analytics Capabilities Custom Applications Packaged Applications Ensure enterprise capabilities are delivered in 100% open source to benefit all Integrate with DEV & DATA TOOLS Applications BUILD & Business Intelligence, TEST Developer IDEs, Data Integration Skills OPERATIONAL TOOLS Leverage your existing RDBMS EDW skills: development, MPP analytics, operations MANAGE & Systems MONITOR Integration Platforms Data Systems & Storage, Systems Management REPOSITORIES Interoperable with existing data center investments Existing Sources (CRM, ERP, Clickstream, Logs) Emerging Sources (Sensor, Sentiment, Geo, Unstructured) Operating Systems, Virtualization, Cloud, Appliances
  19. DATA SYSTEM APPLICATIONS Elasticsearch in the Modern Data Architecture DEV & DATA TOOLS OPERATIONAL TOOLS RDBMS EDW HANA MPP SOURCES INFRASTRUCTURE Existing Sources (CRM, ERP, Clickstream, Logs) Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
  20. Today’s Topics • Drivers for the Modern Data Architecture (MDA) • Elasticsearch’s role in the MDA • Q&A
  21. What is Elasticsearch? Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited
  22. open-source RESTful API JSON over HTTP scales massively high availability schema free Elasticsearch real time, search and analytics engine Lucene based distributed multi tenancy
  23. The Elasticsearch ELK Stack Logstash Elasticsearch Kibana Data From Any Source Instantly Analyze Actionable Insights
  24. What about Elasticsearch the Company? • Support 100s of Companies in Production environments • Training Developers and Ops around the world on ELK • Drive the ELK Projects forward, great things to come! • Commercial products: Marvel to monitor and manage ELK • Backed by the best: Benchmark, Index Ventures
  25. Who’s using Elasticsearch?
  26. What are people saying about Elasticsearch?
  27. Real-time Search • Europe’s largest professional social network • Over 14 Million members • New data available for search immediately vs 50 mins • “According to the customer survey that we conduct every quarter, search is the most important feature on our platform,” Dr. Daniel Olmedilla, Vice President, Data Science at XING
  28. How do they fit together?
  29. Elasticsearch Index seamlessly Free Text Search Analytics Elasticsearch-Hadoop Library Integrate Natively Choice Clean, Enrich Raw data
  30. Elasticsearch-Hadoop Library • Java Library for integrating Elasticsearch and Hadoop • Pig, Hive, Cascading, MapReduce • Search & Real-time Analytics with Elasticsearch, Hadoop as Data Lake • Scales with Hadoop • Works with Apache Hadoop, Certified on HDP 1.x and 2.x (YARN compatible Binary)
  31. Multiple Architectures -Same Hardware -1 for 1
  32. Multiple Architectures ES Node ES Node ES Node -Separate Hardware -Clusters of each -Scale Independently
  33. Show me! • Hortonworks HDP Sandbox - making Hadoop easy! • Installed Elasticsearch, Marvel and Kibana on Sandbox • Upload elasticsearch-hadoop jar as Pig Storage lib • Index CSV data from Pig to Elasticsearch • Query Elasticsearch from Pig - best of both • Kibana to Visualize and Discover
  34. Where to find us? elasticsearch.com elasticsearch.org @elasticsearch #elasticsearch IRC (webchat.freenode) GitHub elasticsearch/elasticsearch
  35. Try Hadoop Today… Get Involved More about Elasticsearch & Hortonworks hortonworks.com/partner/elasticsearch Download the Hortonworks Sandbox Learn Hadoop Build Your Analytic App Try Hadoop 2 Contact us: events@hortonworks.com
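
The demo steps on slide 33 (index CSV data, then query it) and the "JSON over HTTP" point on slide 22 can be illustrated without a running cluster. What follows is a minimal Python sketch under stated assumptions, not the elasticsearch-hadoop Pig path used in the webinar: it swaps in a direct use of the HTTP API and only builds the request bodies that Elasticsearch's bulk and search endpoints accept. The index name `demo`, the sample CSV, and the helper names `rows_to_bulk` and `match_query` are hypothetical and chosen for illustration; depending on the Elasticsearch version, a document type may also be expected in the URL or in the action line.

```python
import csv
import io
import json


def rows_to_bulk(csv_text, index_name):
    """Turn CSV text (with a header row) into a bulk request body (hypothetical helper)."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Action line: asks Elasticsearch to index the document on the next line.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself, one JSON object per CSV row.
        lines.append(json.dumps(row))
    # The bulk format is newline-delimited JSON and must end with a newline.
    return "\n".join(lines) + "\n"


def match_query(field, text):
    """Build a free-text search body using a `match` query (hypothetical helper)."""
    return {"query": {"match": {field: text}}}


# Sample data invented for this sketch; it does not come from the webinar.
SAMPLE_CSV = (
    "user,message\n"
    "alice,hadoop stores the raw data\n"
    "bob,elasticsearch searches it\n"
)

bulk_body = rows_to_bulk(SAMPLE_CSV, "demo")
search_body = json.dumps(match_query("message", "hadoop"))

print(bulk_body, end="")
print(search_body)
```

Sending `bulk_body` to a cluster's `_bulk` endpoint and `search_body` to an index's `_search` endpoint would be the next step; that part is left out so the sketch stays runnable offline.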