Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data

Hadoop is a great platform for storing and processing massive amounts of data. Elasticsearch is the ideal solution for searching and visualizing the same data. Join us to learn how you can leverage the full power of both platforms to maximize the value of your Big Data.

In this webinar we'll walk you through:
  • How Elasticsearch fits in the Modern Data Architecture.
  • A demo of Elasticsearch and Hortonworks Data Platform.
  • Best practices for combining Elasticsearch and Hortonworks Data Platform to extract maximum insights from your data.

Usage Rights

© All Rights Reserved


Presentation Transcript

  • Combine Apache Hadoop & Elasticsearch to get the most of your big data... © Hortonworks Inc. 2013 Page 1
  • Your Presenters: Steve Mayzak (@smayzak), Head of Sales Engineering, Seahawks fan! Mark Lochbihler (@mlochbihler), Partner Solutions Engineer, HUGE FC Barcelona fan!
  • Today's Topics: Drivers for the Modern Data Architecture (MDA); Elasticsearch's role in the MDA; Q&A
  • Hadoop Adoption: "Hadoop's momentum is unstoppable as its open source roots grow wildly into enterprises. Its refreshingly unique approach to data management is transforming how companies store, process, analyze, and share big data." (Mike Gualtieri, Forrester)
  • A Traditional Approach Under Pressure. Applications: Custom Applications, Business Analytics, Packaged Applications. Data System repositories: RDBMS, EDW, MPP. Sources: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured). Data growth (Source: IDC): 2.8 ZB in 2012; 85% from new data types; 15x machine data by 2020; 40 ZB by 2020.
  • Emerging Modern Data Architecture. Applications: Custom Applications, Business Analytics, Packaged Applications. Dev & Data Tools: Build & Test. Operational Tools: Manage & Monitor. Data System repositories: RDBMS, EDW, MPP. Sources: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured).
  • MDA Driver #1: A New Approach to Insight. Current Approach: apply schema on write; heavily dependent on IT; single query engine (SQL). Steps: determine list of questions, design solution, collect structured data, ask questions from list, detect additional questions. Hadoop Approach: apply schema on read; support a range of access patterns to data stored in HDFS (polymorphic access); right engine, right job (batch, interactive, real-time, in-memory). Steps: iterate over structure, transform and analyze.
  • MDA Driver #2: Data Warehouse Optimization. Current Reality: EDW at capacity, with some usage from low value workloads; older transformed data archived, unavailable for ongoing exploration; source data often discarded. Augment with Hadoop: free up EDW resources from low value tasks; keep 100% of source data and historical data for ongoing exploration; mine data for value after loading it because of schema-on-read. Chart labels: Analytics 20%, ETL Process 30%, Operations 50% (current); Analytics 50%, Operations 50% (augmented). Hadoop: parse, cleanse, apply structure, transform.
  • The Common Journey with Hadoop. Chart axes: SCALE and SCOPE. Labels: MDA/Data Lake; More data and analytic apps; Cost, Insight; IT Driven; New Analytic Apps; New Types of Data; LOB Driven.
  • Unlock Value in New Types of Data. 1. Social: understand how people are feeling and interacting, right now. 2. Clickstream: capture and analyze website visitors' data trails and optimize your website. 3. Sensor/Machine: discover patterns in data streaming from remote sensors and machines. 4. Geographic: analyze location-based data to manage operations where they occur. 5. Server Logs: diagnose process failures and prevent security breaches. 6. Unstructured (text, video, pictures, etc.): understand patterns in files across millions of web pages, emails, and documents. Plus Online Archive: data that was once purged or moved to tape can be stored in Hadoop to discover long term trends and previously hidden value.
  • 20 Business Applications of Hadoop. Table columns: Industry, Use Case, Type of Data. Industries: Financial Services, Telecom, Retail, Manufacturing, Healthcare, Pharmaceuticals, Oil & Gas, Government. Use cases: New Account Risk Screens; Trading Risk; Insurance Underwriting; Call Detail Records (CDRs); Infrastructure Investment; Real-time Bandwidth Allocation; 360° View of the Customer; Localized, Personalized Promotions; Website Optimization; Supply Chain and Logistics; Assembly Line Quality Assurance; Crowdsourced Quality Assurance; Use Genomic Data in Medical Trials; Monitor Patient Vitals in Real-Time; Recruit and Retain Patients for Drug Trials; Improve Prescription Adherence; Unify Exploration & Production Data; Monitor Rig Safety in Real-Time; ETL Offload in Response to Federal Budgetary Pressures; Sentiment Analysis for Government Programs. Types of data: Text, Server Logs, Geographic, Sensor, Machine, Clickstream, Social, Structured, Unstructured.
  • YARN Unlocks the Data Lake Vision: store all data in one place, interact in multiple ways. 1st Gen of Hadoop (single use system, batch apps): classic Hadoop apps on MapReduce (cluster resource management & data processing) over HDFS (redundant, reliable storage). 2nd Gen of Hadoop (multi-use data platform: batch, interactive, online, streaming, …): Batch (MapReduce); Batch & Interactive (Tez); Hive, Pig, others…; Flexible Data Processing; Online Data Processing (HBase, Accumulo); Stream Processing (Storm); others…; all on YARN (efficient cluster resource management & shared services) over HDFS (redundant, reliable storage).
  • Example Journey Towards a Data Lake. Chart axes: DATA (TBs to PBs) and VALUE. Stages: Risk Management (e.g., Fraud Reduction); New Business (e.g., Data as a Product); Customer Intimacy (e.g., 360 Degree View of the Customer); Operational Excellence (e.g., Network Maintenance). Data Lake: an architectural shift in the data center that uses Hadoop to deliver deep insight across a large, broad, diverse set of data at efficient scale.
  • Enabling Hadoop for the Enterprise. 1. Capabilities: ensure enterprise capabilities are delivered in 100% open source to benefit all. 2. Integration: interoperable with existing data center investments. 3. Skills: leverage your existing skills: development, analytics, operations. Timeline: 2006 to 2015.
  • Core Capabilities of Enterprise Hadoop. 1. Capabilities: ensure enterprise capabilities are delivered in 100% open source to benefit all. Presentation & Application: enable both existing and new applications to provide value to the organization. Operations: empower current operations and security tools to manage Hadoop. Data Governance: integrate with existing systems and move data in/out and within the environment. Data Access (broad insight): access your data simultaneously in multiple ways (batch, interactive, realtime). Security: provide a layered approach to security through Authentication, Authorization, Accountability and Data Protection. Operations: allow you to deploy and effectively manage the environment. Data Management (efficient scale): store and process all of your Corporate Data Assets. Deployment Model: provide the efficient deployment option for your organization.
  • Enabling Familiar and Existing Tools. 1. Capabilities: ensure enterprise capabilities are delivered in 100% open source to benefit all. 2. Integration: interoperable with existing data center investments. 3. Skills: leverage your existing skills: development, analytics, operations. Roles: Developer, Analyst, Operator. Activities: Collect, Process, Build; Explore, Query, Deliver; Provision, Manage, Monitor.
  • Requirements for Enterprise Hadoop. 1. Capabilities: ensure enterprise capabilities are delivered in 100% open source to benefit all. 2. Integration: interoperable with existing data center investments. 3. Skills: leverage your existing skills: development, analytics, operations. Integrate with: Applications (Business Intelligence, Developer IDEs, Data Integration); Systems (Data Systems & Storage, Systems Management); Platforms (Operating Systems, Virtualization, Cloud, Appliances). Diagram layers: Applications (Custom Applications, Business Analytics, Packaged Applications); Dev & Data Tools (Build & Test); Operational Tools (Manage & Monitor); Data System repositories (RDBMS, EDW, MPP); Sources: Existing Sources (CRM, ERP, Clickstream, Logs) and Emerging Sources (Sensor, Sentiment, Geo, Unstructured).
  • Elasticsearch in the Modern Data Architecture. Diagram layers: Applications; Dev & Data Tools; Operational Tools; Data System (RDBMS, EDW, HANA, MPP); Infrastructure; Sources: Existing Sources (CRM, ERP, Clickstream, Logs) and Emerging Sources (Sensor, Sentiment, Geo, Unstructured).
  • What is Elasticsearch? Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited
  • Elasticsearch: real time, search and analytics engine. Open-source; RESTful API; JSON over HTTP; scales massively; high availability; schema free; Lucene based; distributed; multi tenancy.
  • The Elasticsearch ELK Stack: Logstash (Data From Any Source); Elasticsearch (Instantly Analyze); Kibana (Actionable Insights).
  • What about Elasticsearch the Company? Support 100s of companies in production environments; training developers and ops around the world on ELK; drive the ELK projects forward, great things to come!; commercial products: Marvel to monitor and manage ELK; backed by the best: Benchmark, Index Ventures.
  • Who's using Elasticsearch?
  • What are people saying about Elasticsearch?
  • Real-time Search: Europe's largest professional social network; over 14 million members; new data available for search immediately vs 50 mins. "According to the customer survey that we conduct every quarter, search is the most important feature on our platform." (Dr. Daniel Olmedilla, Vice President, Data Science at XING)
  • How do they fit together?
  • Elasticsearch (free text search, analytics) and the Elasticsearch-Hadoop Library. Diagram labels: Index seamlessly; Integrate Natively; Choice; Clean, Enrich; Raw data.
  • Elasticsearch-Hadoop Library: Java library for integrating Elasticsearch and Hadoop; Pig, Hive, Cascading, MapReduce; search & real-time analytics with Elasticsearch, Hadoop as data lake; scales with Hadoop; works with Apache Hadoop, certified on HDP 1.x and 2.x (YARN-compatible binary).
  • Multiple Architectures: same hardware; 1 for 1.
  • Multiple Architectures: ES nodes on separate hardware; clusters of each; scale independently.
  • Show me! Hortonworks HDP Sandbox, making Hadoop easy!; installed Elasticsearch, Marvel and Kibana on Sandbox; upload elasticsearch-hadoop jar as Pig Storage lib; index CSV data from Pig to Elasticsearch; query Elasticsearch from Pig, best of both; Kibana to visualize and discover.
  • Where to find us? elasticsearch.com; elasticsearch.org; @elasticsearch; #elasticsearch IRC (webchat.freenode); GitHub: elasticsearch/elasticsearch.
  • Try Hadoop Today… Get Involved. More about Elasticsearch & Hortonworks: hortonworks.com/partner/elasticsearch. Download the Hortonworks Sandbox: Learn Hadoop; Build Your Analytic App; Try Hadoop 2. Contact us: events@hortonworks.com
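
The demo steps on the "Show me!" slide (index CSV data into Elasticsearch, then query it) can be sketched outside Pig as follows. This is a minimal Python sketch under stated assumptions, not the elasticsearch-hadoop Pig integration used in the webinar: it only builds the JSON request bodies that Elasticsearch's REST API accepts (a newline-delimited `_bulk` body and a `match` query) and sends nothing over the network. The sample rows, the `demo` index name, and the helper names `csv_to_bulk_payload` and `match_query` are hypothetical and chosen for illustration; the CSV used in the webinar is not shown in the transcript.

```python
import csv
import io
import json

# Hypothetical sample data standing in for the demo's CSV file.
SAMPLE_CSV = """id,name,city
1,Alice,Seattle
2,Bob,Barcelona
"""


def csv_to_bulk_payload(csv_text, index_name):
    """Turn CSV rows into a newline-delimited body for the _bulk endpoint.

    Each document takes two lines: an action line naming the target index
    and document id, followed by the document source itself.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        action = {"index": {"_index": index_name, "_id": row["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    # The bulk body must be terminated by a final newline.
    return "\n".join(lines) + "\n"


def match_query(field, text):
    """Build a full-text match query body for the _search endpoint."""
    return {"query": {"match": {field: text}}}


if __name__ == "__main__":
    print(csv_to_bulk_payload(SAMPLE_CSV, "demo"), end="")
    print(json.dumps(match_query("city", "Seattle")))
```

In a real setup the first body would be POSTed to the cluster's `_bulk` endpoint and the second to the index's `_search` endpoint; host, port, and index layout depend on the installation and are deliberately left out here.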