Combine Apache Hadoop & Elasticsearch
to get the most of your big data...

© Hortonworks Inc. 2013

Your Presenters
Steve Mayzak (@smayzak)
–  Head of Sales Engineering
–  Seahawks fan!

Mark Lochbihler (@mlochbihler)
– Partner Solutions Engineer
– HUGE FC Barcelona Fan!

Today’s Topics
• Drivers for the Modern Data Architecture (MDA)
• Elasticsearch’s role in the MDA
• Q&A

Hadoop Adoption
“Hadoop’s momentum is unstoppable as its open
source roots grow wildly into enterprises. Its refreshingly
unique approach to data management is transforming how
companies store, process, analyze, and share big data”
--Mike Gualtieri, Forrester

A Traditional Approach Under Pressure
[Diagram: applications on top of a data system fed by data sources]
APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
DATA SYSTEM: RDBMS, EDW and MPP repositories
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
Data growth (Source: IDC): 2.8 ZB in 2012; 85% from New Data Types; 15x Machine Data by 2020; 40 ZB by 2020

Emerging Modern Data Architecture
[Diagram: the same stack, extended with tool layers]
APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
DEV & DATA TOOLS: Build & Test
OPERATIONAL TOOLS: Manage & Monitor
DATA SYSTEM: RDBMS, EDW and MPP repositories
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

MDA Driver #1: A New Approach to Insight
Current Approach
§  Apply schema on write
§  Heavily dependent on IT
Single Query Engine: SQL
Determine list of questions, design solution, collect structured data, ask questions from list, detect additional questions

Hadoop Approach
§  Apply schema on read
§  Support range of access patterns to
data stored in HDFS: polymorphic
access
Right Engine, Right Job: batch, interactive, real-time, in-memory
HADOOP: Iterate over structure, transform and analyze
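
The schema-on-read approach on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Hadoop code: the sample log lines, the field names, and the `read_with_schema` helper are all hypothetical. The point it tries to show is that raw records are stored untouched and each new question brings its own schema at read time.

```python
from typing import Callable, Dict, Iterator, List

# Raw records are kept exactly as they arrived; nothing is parsed on write.
# These sample lines are made up for illustration.
RAW_LINES: List[str] = [
    "2013-11-02,alice,/home,200",
    "2013-11-02,bob,/search,500",
    "2013-11-03,alice,/search,200",
]

# A "schema" here is just ordered field names plus a converter per field.
Schema = Dict[str, Callable[[str], object]]


def read_with_schema(lines: List[str], schema: Schema) -> Iterator[dict]:
    """Apply a schema while reading: split each raw line and convert fields."""
    names = list(schema)
    for line in lines:
        values = line.split(",")
        # zip stops at the shorter sequence, so a schema may name only
        # the leading fields it cares about.
        yield {name: schema[name](value) for name, value in zip(names, values)}


# Question 1: which requests failed? Read the last field as an integer status.
status_schema: Schema = {"date": str, "user": str, "path": str, "status": int}
errors = [r for r in read_with_schema(RAW_LINES, status_schema) if r["status"] >= 500]

# Question 2, asked later: requests per day. Same raw data, different projection.
daily_schema: Schema = {"date": str}
hits_per_day: Dict[str, int] = {}
for record in read_with_schema(RAW_LINES, daily_schema):
    hits_per_day[record["date"]] = hits_per_day.get(record["date"], 0) + 1

print(errors)
print(hits_per_day)
```

With schema on write, the second question would have required the date to be modeled before loading; here it only requires a new schema definition.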

MDA Driver #2: Data Warehouse Optimization
Current Reality
§  EDW at capacity; some usage
from low value workloads
§  Older transformed data
archived, unavailable for
ongoing exploration
§  Source data often discarded
EDW workload: Analytics 20%, ETL Process 30%, Operations 50%

Augment with Hadoop
§  Free up EDW resources from low
value tasks
§  Keep 100% of source data and
historical data for ongoing exploration
§  Mine data for value after loading it
because of schema-on-read
EDW workload: Analytics 50%, Operations 50%
HADOOP: Parse, cleanse, apply structure, transform

The Common Journey with Hadoop
[Chart: SCALE versus SCOPE]
–  IT Driven: Cost, Insight
–  LOB Driven: New Analytic Apps, New Types of Data
–  MDA/Data Lake: More data and analytic apps

Unlock Value in New Types of Data
1.  Social
Understand how people are feeling and interacting –
right now

2.  Clickstream
Capture and analyze website visitors’ data trails and
optimize your website

3.  Sensor/Machine
Discover patterns in data streaming from remote
sensors and machines

4.  Geographic
Analyze location-based data to manage operations
where they occur

5.  Server Logs
Diagnose process failures and prevent security
breaches

6.  Unstructured (text, video, pictures, etc.)
Understand patterns in files across millions of web
pages, emails, and documents

+ Online archive
Data that was once purged or moved
to tape can be stored in Hadoop to
discover long term trends and
previously hidden value

20 Business Applications of Hadoop
Industry: Use Case (Type of Data)

Financial Services
–  New Account Risk Screens (Text, Server Logs)
–  Trading Risk (Server Logs)
–  Insurance Underwriting (Geographic, Sensor, Text)

Telecom
–  Call Detail Records (CDRs) (Machine, Geographic)
–  Infrastructure Investment (Machine, Server Logs)
–  Real-time Bandwidth Allocation (Server Logs, Text, Social)

Retail
–  360° View of the Customer (Clickstream, Text)
–  Localized, Personalized Promotions (Geographic)
–  Website Optimization (Clickstream)

Manufacturing
–  Supply Chain and Logistics (Sensor)
–  Assembly Line Quality Assurance (Sensor)
–  Crowdsourced Quality Assurance (Social)

Healthcare
–  Use Genomic Data in Medical Trials (Structured)
–  Monitor Patient Vitals in Real-Time (Sensor)

Pharmaceuticals
–  Recruit and Retain Patients for Drug Trials (Social, Clickstream)
–  Improve Prescription Adherence (Social, Unstructured, Geographic)

Oil & Gas
–  Unify Exploration & Production Data (Sensor, Geographic & Unstructured)
–  Monitor Rig Safety in Real-Time (Sensor, Unstructured)

Government
–  ETL Offload in Response to Federal Budgetary Pressures (Structured)
–  Sentiment Analysis for Government Programs (Social)

YARN Unlocks the Data Lake Vision
Store all data in one place, interact in multiple ways
1st Gen of Hadoop: Single Use System (Batch Apps)
–  MapReduce (cluster resource management & data processing)
–  HDFS (redundant, reliable storage)

2nd Gen of Hadoop: Multi-Use Data Platform (Batch, Interactive, Online, Streaming, …)
–  Flexible Data Processing:
   Classic Hadoop Apps: Batch (MapReduce)
   Hive, Pig, others…: Batch & Interactive (Tez)
   HBase, Accumulo: Online Data Processing
   Storm: Stream Processing
   others …
–  YARN: Efficient Cluster Resource Management & Shared Services
–  HDFS: Redundant, Reliable Storage

Example Journey Towards a Data Lake

[Chart: DATA (TBs to PBs) versus VALUE]
–  Operational Excellence, e.g., Network Maintenance
–  Customer Intimacy, e.g., 360 Degree View of the Customer
–  Risk Management, e.g., Fraud Reduction
–  New Business, e.g., Data as a Product

DATA LAKE
An architectural shift in the
data center that uses Hadoop
to deliver deep insight across a
large, broad, diverse set of
data at efficient scale

Enabling Hadoop for the Enterprise

1  Capabilities
Ensure enterprise capabilities
are delivered in 100% open
source to benefit all

2  Integration
Interoperable with existing
data center investments

3  Skills
Leverage your existing
skills: development,
analytics, operations

[Timeline: 2006 to 2015]

Core Capabilities of Enterprise Hadoop

1  Capabilities
Ensure enterprise capabilities
are delivered in 100% open
source to benefit all

BROAD INSIGHT
–  Presentation & Application: Enable both existing and new applications to provide value to the organization
–  Operations: Empower current operations and security tools to manage Hadoop
–  Data Access: Access your data simultaneously in multiple ways (batch, interactive, realtime)

EFFICIENT SCALE
–  Data Governance: Integrate with existing systems and move data in/out and within the environment
–  Security: Provide layered approach to security through Authentication, Authorization, Accountability and Data Protection
–  Operations: Allow you to deploy and effectively manage the environment
–  Data Management: Store and process all of your Corporate Data Assets
–  Deployment Model: Provide the efficient deployment option for your organization

Enabling Familiar and Existing Tools

1  Capabilities
Ensure enterprise capabilities
are delivered in 100% open
source to benefit all

2  Integration
Interoperable with existing
data center investments

3  Skills
Leverage your existing
skills: development,
analytics, operations

DEVELOPER: COLLECT, PROCESS, BUILD
ANALYST: EXPLORE, QUERY, DELIVER
OPERATOR: PROVISION, MANAGE, MONITOR

Requirements for Enterprise Hadoop

1  Capabilities
Ensure enterprise capabilities
are delivered in 100% open
source to benefit all

2  Integration
Interoperable with existing
data center investments

3  Skills
Leverage your existing
skills: development,
analytics, operations

Integrate with
–  Applications: Business Intelligence, Developer IDEs, Data Integration
–  Systems: Data Systems & Storage, Systems Management
–  Platforms: Operating Systems, Virtualization, Cloud, Appliances

[Diagram: Modern Data Architecture]
APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
DEV & DATA TOOLS: Build & Test
OPERATIONAL TOOLS: Manage & Monitor
DATA SYSTEM: RDBMS, EDW and MPP repositories
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

Elasticsearch in the Modern Data Architecture
[Diagram: Modern Data Architecture]
APPLICATIONS
DEV & DATA TOOLS
OPERATIONAL TOOLS
DATA SYSTEM: RDBMS, EDW, HANA, MPP
INFRASTRUCTURE
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

Today’s Topics
• Drivers for the Modern Data Architecture (MDA)
• Elasticsearch’s role in the MDA
• Q&A

What is Elasticsearch?

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited

Elasticsearch: real time, search and analytics engine
•  open-source
•  RESTful API
•  JSON over HTTP
•  scales massively
•  high availability
•  schema free
•  Lucene based
•  distributed
•  multi tenancy
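
To make "RESTful API" and "JSON over HTTP" concrete, here is a minimal Python sketch that builds, but does not send, a full-text search request. The `_search` endpoint and the `match` query are part of Elasticsearch's REST API; the node address `http://localhost:9200`, the index name `logs`, the field name `message`, and the `build_search_request` helper are assumptions made up for this illustration.

```python
import json
from urllib.request import Request


def build_search_request(base_url: str, index: str, field: str, text: str) -> Request:
    """Build (but do not send) a full-text search request for one index."""
    # The query itself is an ordinary JSON document.
    body = {"query": {"match": {field: text}}}
    return Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local node, index, and field names.
request = build_search_request("http://localhost:9200", "logs", "message", "timeout error")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing `request` to `urllib.request.urlopen` would send it, which requires a running Elasticsearch node; the response would also be JSON.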
The Elasticsearch ELK Stack

Logstash: Data From Any Source
Elasticsearch: Instantly Analyze
Kibana: Actionable Insights

What about Elasticsearch the Company?
•  Support 100s of Companies in Production environments
•  Training Developers and Ops around the world on ELK
•  Drive the ELK Projects forward, great things to come!
•  Commercial products: Marvel to monitor and manage ELK
•  Backed by the best: Benchmark, Index Ventures

Who’s using Elasticsearch?

What are people saying about Elasticsearch?

Real-time Search
•  Europe’s largest professional social network
•  Over 14 Million members
•  New data available for search immediately vs 50 mins
•  “According to the customer survey that we conduct every quarter, search is the most important feature on our platform,” Dr. Daniel Olmedilla, Vice President, Data Science at XING

How do they fit together?

[Diagram: how the pieces fit together]
Elasticsearch: Free Text Search, Analytics
Elasticsearch-Hadoop Library: Index seamlessly, Integrate Natively
Raw data: Clean, Enrich
Choice

Elasticsearch-Hadoop Library
•  Java Library for integrating Elasticsearch and Hadoop
•  Pig, Hive, Cascading, MapReduce
•  Search & Real-time Analytics with Elasticsearch, Hadoop as Data Lake
•  Scales with Hadoop
•  Works with Apache Hadoop, Certified on HDP 1.x and 2.x (YARN compatible Binary)

Multiple Architectures

-Same Hardware
-1 for 1

Multiple Architectures
[Diagram: ES Node, ES Node, ES Node]

-Separate Hardware
-Clusters of each
-Scale Independently

Show me!
•  Hortonworks HDP Sandbox - making Hadoop easy!
•  Installed Elasticsearch, Marvel and Kibana on Sandbox
•  Upload elasticsearch-hadoop jar as Pig Storage lib
•  Index CSV data from Pig to Elasticsearch
•  Query Elasticsearch from Pig - best of both
•  Kibana to Visualize and Discover
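
The demo indexes CSV data through Pig and the elasticsearch-hadoop jar, and that script is not part of this transcript. As a stand-in, the Python sketch below shows only the general idea of turning CSV rows into an Elasticsearch `_bulk` request body (newline-delimited JSON, one action line followed by one document line). It does not use Pig or elasticsearch-hadoop; the `csv_to_bulk` helper, the sample data, and the index name `weblogs` are hypothetical, and the action-line layout follows current Elasticsearch releases (releases from the time of this talk also expected a document type).

```python
import csv
import io
import json


def csv_to_bulk(csv_text: str, index: str) -> str:
    """Turn CSV text with a header row into an Elasticsearch _bulk request body."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(row))                           # document line
    # The bulk format is newline-delimited JSON and must end with a newline.
    return "\n".join(lines) + "\n"


# Made-up sample data; every CSV value is kept as a string.
SAMPLE_CSV = "user,path,status\nalice,/home,200\nbob,/search,500\n"
bulk_body = csv_to_bulk(SAMPLE_CSV, "weblogs")
print(bulk_body)
```

In the demo setup, the library handles this batching on behalf of the Pig job, which is why the Pig script itself only needs to name the target index.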

Where to find us?
elasticsearch.com
elasticsearch.org
@elasticsearch
#elasticsearch
IRC (webchat.freenode)
GitHub elasticsearch/elasticsearch

Try Hadoop Today… Get Involved
More about Elasticsearch & Hortonworks
hortonworks.com/partner/elasticsearch

Download the Hortonworks Sandbox
Learn Hadoop
Build Your Analytic App
Try Hadoop 2

Contact us: events@hortonworks.com

More Related Content

What's hot

Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangDatabricks
 
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionProtect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionDataWorks Summit
 
Building an Event Streaming Architecture with Apache Pulsar
Building an Event Streaming Architecture with Apache PulsarBuilding an Event Streaming Architecture with Apache Pulsar
Building an Event Streaming Architecture with Apache PulsarScyllaDB
 
Building Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaBuilding Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaDatabricks
 
Modularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache SparkModularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache SparkDatabricks
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenDatabricks
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Julien Le Dem
 
SQL to NoSQL Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...
SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...
SQL to NoSQL Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...Amazon Web Services
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPDatabricks
 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureSkillspeed
 
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon RedshiftBDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon RedshiftAmazon Web Services
 
Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark InternalsPietro Michiardi
 
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...Edureka!
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsDatabricks
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 
Cisco’s E-Commerce Transformation Using Kafka
Cisco’s E-Commerce Transformation Using Kafka Cisco’s E-Commerce Transformation Using Kafka
Cisco’s E-Commerce Transformation Using Kafka confluent
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 

What's hot (20)

Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionProtect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
 
AWS Big Data Platform
AWS Big Data PlatformAWS Big Data Platform
AWS Big Data Platform
 
Building an Event Streaming Architecture with Apache Pulsar
Building an Event Streaming Architecture with Apache PulsarBuilding an Event Streaming Architecture with Apache Pulsar
Building an Event Streaming Architecture with Apache Pulsar
 
Building Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaBuilding Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks Delta
 
Modularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache SparkModularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache Spark
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
 
SQL to NoSQL Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...
SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...
SQL to NoSQL Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCP
 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
 
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon RedshiftBDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
 
Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark Internals
 
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
 
Cassandra Database
Cassandra DatabaseCassandra Database
Cassandra Database
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Hadoop
HadoopHadoop
Hadoop
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
Cisco’s E-Commerce Transformation Using Kafka
Cisco’s E-Commerce Transformation Using Kafka Cisco’s E-Commerce Transformation Using Kafka
Cisco’s E-Commerce Transformation Using Kafka
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 

Similar to Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Hortonworks
 
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu BariApache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu Barijaxconf
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopPOSSCON
 
Enterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the UnionEnterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the UnionHortonworks
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata Hortonworks
 
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopSlim Baltagi
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paperSupratim Ray
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataWANdisco Plc
 
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...Hortonworks
 
Yahoo! Hack Europe
Yahoo! Hack EuropeYahoo! Hack Europe
Yahoo! Hack EuropeHortonworks
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopHortonworks
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...Hortonworks
 
Bridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldBridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldCA Technologies
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Hortonworks
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...Hortonworks
 
Hadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHortonworks
 

Similar to Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data (20)

Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
 
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu BariApache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Enterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the UnionEnterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the Union
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
 
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paper
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 

Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

  • 1. Combine Apache Hadoop & Elasticsearch to get the most of your big data... © Hortonworks Inc. 2013 Page 1
  • 2. Your Presenters: Steve Mayzak (@smayzak), Head of Sales Engineering, Seahawks fan! Mark Lochbihler (@mlochbihler), Partner Solutions Engineer, HUGE FC Barcelona fan!
  • 3. Today’s Topics: Drivers for the Modern Data Architecture (MDA); Elasticsearch’s role in the MDA; Q&A
  • 4. Hadoop Adoption: “Hadoop’s momentum is unstoppable as its open source roots grow wildly into enterprises. Its refreshingly unique approach to data management is transforming how companies store, process, analyze, and share big data.” (Mike Gualtieri, Forrester)
  • 5. A Traditional Approach Under Pressure. Applications: Custom Applications, Business Analytics, Packaged Applications. Data System: RDBMS, EDW, MPP repositories. Sources: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured). Data growth (Source: IDC): 2.8 ZB in 2012; 85% from new data types; 15x machine data by 2020; 40 ZB by 2020.
  • 6. Emerging Modern Data Architecture. Applications: Custom Applications, Business Analytics, Packaged Applications. Dev & Data Tools: Build & Test. Operational Tools: Manage & Monitor. Data System: RDBMS, EDW, MPP repositories. Sources: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured).
  • 7. MDA Driver #1: A New Approach to Insight. Current Approach: apply schema on write; heavily dependent on IT. Single query engine (SQL): determine list of questions, design solution, collect structured data, ask questions from list, detect additional questions. Hadoop Approach: apply schema on read; support a range of access patterns to data stored in HDFS (polymorphic access). Right engine, right job (batch, interactive, real-time, in-memory): iterate over structure, transform and analyze.
  • 8. MDA Driver #2: Data Warehouse Optimization. Current Reality: EDW at capacity, with some usage from low-value workloads; older transformed data archived, unavailable for ongoing exploration; source data often discarded. Augment with Hadoop: free up EDW resources from low-value tasks; keep 100% of source data and historical data for ongoing exploration; mine data for value after loading it, because of schema-on-read. Chart labels: Analytics 20%, ETL Process 30%, Operations 50% versus Analytics 50%, Operations 50%. Hadoop: parse, cleanse, apply structure, transform.
  • 9. The Common Journey with Hadoop (axes: Scale and Scope). MDA/Data Lake: more data and analytic apps. Cost, Insight: IT driven. New Analytic Apps, New Types of Data: LOB driven.
  • 10. Unlock Value in New Types of Data: 1. Social: understand how people are feeling and interacting, right now. 2. Clickstream: capture and analyze website visitors’ data trails and optimize your website. 3. Sensor/Machine: discover patterns in data streaming from remote sensors and machines. 4. Geographic: analyze location-based data to manage operations where they occur. 5. Server Logs: diagnose process failures and prevent security breaches. 6. Unstructured (text, video, pictures, etc.): understand patterns in files across millions of web pages, emails, and documents. Plus online archive: data that was once purged or moved to tape can be stored in Hadoop to discover long-term trends and previously hidden value.
  • 11. 20 Business Applications of Hadoop (Industry: Use Case (Type of Data)). Financial Services: New Account Risk Screens (Text, Server Logs); Trading Risk (Server Logs); Insurance Underwriting (Geographic, Sensor, Text). Telecom: Call Detail Records (CDRs) (Machine, Geographic); Infrastructure Investment (Machine, Server Logs); Real-time Bandwidth Allocation (Server Logs, Text, Social). Retail: 360° View of the Customer (Clickstream, Text); Localized, Personalized Promotions (Geographic); Website Optimization (Clickstream). Manufacturing: Supply Chain and Logistics (Sensor); Assembly Line Quality Assurance (Sensor); Crowdsourced Quality Assurance (Social). Healthcare: Use Genomic Data in Medical Trials (Structured); Monitor Patient Vitals in Real-Time (Sensor). Pharmaceuticals: Recruit and Retain Patients for Drug Trials (Social, Clickstream); Improve Prescription Adherence (Social, Unstructured, Geographic). Oil & Gas: Unify Exploration & Production Data (Sensor, Geographic & Unstructured); Monitor Rig Safety in Real-Time (Sensor, Unstructured). Government: ETL Offload in Response to Federal Budgetary Pressures (Structured); Sentiment Analysis for Government Programs (Social).
  • 12. YARN Unlocks the Data Lake Vision: store all data in one place, interact in multiple ways. 1st Gen of Hadoop (single-use system, batch apps): classic Hadoop apps on MapReduce (cluster resource management & data processing) over HDFS (redundant, reliable storage). 2nd Gen of Hadoop (multi-use data platform: batch, interactive, online, streaming, …): flexible data processing with Batch (MapReduce), Batch & Interactive (Tez; Hive, Pig, others…), Online Data Processing (HBase, Accumulo), Stream Processing (Storm), and others…, on YARN (efficient cluster resource management & shared services) over HDFS (redundant, reliable storage).
  • 13. The Common Journey with Hadoop (axes: Scale and Scope). MDA/Data Lake: more data and analytic apps. Cost, Insight: IT driven. New Analytic Apps, New Types of Data: LOB driven.
  • 14. Example Journey Towards a Data Lake (axes: Data, from TBs to PBs, and Value). Risk Management (e.g., fraud reduction); New Business (e.g., data as a product); Customer Intimacy (e.g., 360-degree view of the customer); Operational Excellence (e.g., network maintenance). Data Lake: an architectural shift in the data center that uses Hadoop to deliver deep insight across a large, broad, diverse set of data at efficient scale.
  • 15. Enabling Hadoop for the Enterprise. 1. Capabilities: ensure enterprise capabilities are delivered in 100% open source to benefit all. 2. Integration: interoperable with existing data center investments. 3. Skills: leverage your existing skills (development, analytics, operations). Timeline: 2006 through 2015.
  • 16. Core Capabilities of Enterprise Hadoop (1. Capabilities: ensure enterprise capabilities are delivered in 100% open source to benefit all). Presentation & Application: enable both existing and new applications to provide value to the organization. Operations: empower current operations and security tools to manage Hadoop. Data Governance: integrate with existing systems and move data in/out and within the environment. Data Access (broad insight): access your data simultaneously in multiple ways (batch, interactive, real time). Security: provide a layered approach to security through authentication, authorization, accountability, and data protection. Operations: allow you to deploy and effectively manage the environment. Data Management (efficient scale): store and process all of your corporate data assets. Deployment Model: provide the efficient deployment option for your organization.
  • 17. Enabling Familiar and Existing Tools. 1. Capabilities: ensure enterprise capabilities are delivered in 100% open source to benefit all. 2. Integration: interoperable with existing data center investments. 3. Skills: leverage your existing skills (development, analytics, operations). Developer: collect, process, build. Analyst: explore, query, deliver. Operator: provision, manage, monitor.
  • 18. Requirements for Enterprise Hadoop. 1. Capabilities: ensure enterprise capabilities are delivered in 100% open source to benefit all. 2. Integration: interoperable with existing data center investments. 3. Skills: leverage your existing skills (development, analytics, operations). Integrate with: Applications (Business Intelligence, Developer IDEs, Data Integration); Systems (Data Systems & Storage, Systems Management); Platforms (Operating Systems, Virtualization, Cloud, Appliances). Architecture layers: Applications (Business Analytics, Custom Applications, Packaged Applications); Dev & Data Tools (Build & Test); Operational Tools (Manage & Monitor); Data System (RDBMS, EDW, MPP repositories); Sources: Existing Sources (CRM, ERP, Clickstream, Logs), Emerging Sources (Sensor, Sentiment, Geo, Unstructured).
  • 19. Elasticsearch in the Modern Data Architecture. Layers: Applications; Dev & Data Tools; Operational Tools; Data System (RDBMS, EDW, HANA, MPP); Infrastructure; Sources: Existing Sources (CRM, ERP, Clickstream, Logs), Emerging Sources (Sensor, Sentiment, Geo, Unstructured).
  • 20. Today’s Topics: Drivers for the Modern Data Architecture (MDA); Elasticsearch’s role in the MDA; Q&A
  • 21. What is Elasticsearch? Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited.
  • 22. Elasticsearch: real-time search and analytics engine. Open-source; RESTful API; JSON over HTTP; scales massively; high availability; schema free; Lucene based; distributed; multi-tenancy.
  • 23. The Elasticsearch ELK Stack: Logstash (data from any source); Elasticsearch (instantly analyze); Kibana (actionable insights).
  • 24. What about Elasticsearch the Company? Support 100s of companies in production environments; training developers and ops around the world on ELK; drive the ELK projects forward, great things to come!; commercial products: Marvel to monitor and manage ELK; backed by the best: Benchmark, Index Ventures.
  • 25. Who’s using Elasticsearch?
  • 26. What are people saying about Elasticsearch?
  • 27. Real-time Search: Europe’s largest professional social network; over 14 million members; new data available for search immediately vs. 50 mins. “According to the customer survey that we conduct every quarter, search is the most important feature on our platform,” Dr. Daniel Olmedilla, Vice President, Data Science at XING.
  • 28. How do they fit together?
  • 29. Elasticsearch: free text search, analytics. Elasticsearch-Hadoop Library: index seamlessly, integrate natively, choice. Raw data: clean, enrich.
  • 30. Elasticsearch-Hadoop Library: Java library for integrating Elasticsearch and Hadoop; Pig, Hive, Cascading, MapReduce; search & real-time analytics with Elasticsearch, Hadoop as data lake; scales with Hadoop; works with Apache Hadoop, certified on HDP 1.x and 2.x (YARN-compatible binary).
  • 31. Multiple Architectures: same hardware; 1 for 1.
  • 32. Multiple Architectures: ES nodes on separate hardware; clusters of each; scale independently.
  • 33. Show me! Hortonworks HDP Sandbox: making Hadoop easy!; installed Elasticsearch, Marvel and Kibana on Sandbox; upload elasticsearch-hadoop jar as Pig storage lib; index CSV data from Pig to Elasticsearch; query Elasticsearch from Pig: best of both; Kibana to visualize and discover.
  • 34. Where to find us? elasticsearch.com; elasticsearch.org; @elasticsearch; #elasticsearch on IRC (webchat.freenode); GitHub: elasticsearch/elasticsearch
  • 35. Try Hadoop Today… Get Involved. More about Elasticsearch & Hortonworks: hortonworks.com/partner/elasticsearch. Download the Hortonworks Sandbox: learn Hadoop, build your analytic app, try Hadoop 2. Contact us: events@hortonworks.com
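
Slide 33 lists "index CSV data from Pig to Elasticsearch" as a demo step but the transcript carries no code. The sketch below is not the presenters' Pig script. It swaps in plain Python to show the general idea of turning CSV rows into the newline-delimited JSON body that Elasticsearch's bulk endpoint accepts. The function name `csv_to_bulk_ndjson`, the index name `weblogs`, and the sample columns are hypothetical, chosen only for illustration. Releases from the period of this deck also expected a `_type` in the action line, which this sketch leaves out.

```python
import csv
import io
import json


def csv_to_bulk_ndjson(csv_text, index_name):
    """Turn CSV text with a header row into an Elasticsearch bulk request body.

    Hypothetical helper for illustration; not part of elasticsearch-hadoop.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        # Action line: asks Elasticsearch to index the following line as a document.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the CSV row as a JSON object (all values remain strings).
        lines.append(json.dumps(row))
    # The bulk body is newline-delimited and must end with a trailing newline.
    return "\n".join(lines) + "\n" if lines else ""


# Made-up sample data; real input would come from a file or a Hadoop job.
sample_csv = "host,status\nweb01,200\nweb02,500\n"
payload = csv_to_bulk_ndjson(sample_csv, "weblogs")
print(payload, end="")
```

A payload like this would typically be sent with an HTTP POST to the cluster's `_bulk` endpoint using the `application/x-ndjson` content type. In the Hadoop setting the deck describes, the elasticsearch-hadoop library is what handles the batching and writing from Pig, Hive, Cascading, or MapReduce jobs.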