Non-Stop Hadoop for Hortonworks

In this webinar, we'll:
- Examine the key drivers and use cases for High Availability, performance and scalability for Apache Hadoop.
- Walk through an overview of the reference architecture for a Non-Stop Hadoop implementation.
- Show how you can get started with Non-Stop Hadoop with the Hortonworks Data Platform.

Usage Rights

© All Rights Reserved

    Non-Stop Hadoop for Hortonworks: Presentation Transcript

    • Modern Data Architecture …for Non-Stop Hadoop © Hortonworks Inc. 2013 Page 1
    • Your Presenters • Jagane Sundar (@jagane) – CTO of Big Data at WANdisco – Co-founder of AltoStor and former Director of Engineering in Yahoo’s Hadoop group – Managed Hadoop 0.20.204 release for Yahoo • Rohit Bakhshi (@Rohit2b) – Product Management at Hortonworks – Focus on HDP Platform Services, Hadoop Core and Windows enablement – Enjoys live jazz and espresso
    • Today’s Topics • Introduction • Drivers for the Modern Data Architecture (MDA) • Apache Hadoop in the MDA • WANdisco’s role in the MDA • Q&A
    • Existing Data Architecture [diagram labels: Applications (Custom Applications, Business Analytics, Packaged Applications); Dev & Data Tools (Build & Test); Operational Tools (Manage & Monitor); Data System repositories (RDBMS, EDW, MPP); Sources: Existing Sources (CRM, ERP, Clickstream, Logs)]
    • Existing Data Architecture [diagram labels: Applications (Custom Applications, Business Analytics, Packaged Applications); Data System repositories (RDBMS, EDW, MPP); Sources: Existing Sources (CRM, ERP, Clickstream, Logs)] • 2.8 ZB in 2012 • 85% from New Data Types • 15x Machine Data by 2020 • 40 ZB by 2020 (Source: IDC)
    • Modern Data Architecture Enabled [diagram labels: Applications (Custom Applications, Business Analytics, Packaged Applications); Dev & Data Tools (Build & Test); Operational Tools (Manage & Monitor); Data System repositories (RDBMS, EDW, MPP); Sources: Existing Sources (CRM, ERP, Clickstream, Logs) and Emerging Sources (Sensor, Sentiment, Geo, Unstructured)]
    • Drivers of Hadoop Adoption • Architectural: A Modern Data Architecture. Complement your existing data systems: the right workload in the right place • New Business Applications • Types of Big Data: CRM, ERP; Server log; Clickstream; Sentiment/Social; Machine/Sensor; Geo-locations
    • Opportunity in types of data 1. Sentiment: Understand how your customers feel about your brand and products, right now 2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website 3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines 4. Geographic: Analyze location-based data to manage operations where they occur 5. Server Logs: Research logs to diagnose process failures and prevent security breaches 6. Unstructured (text, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents
    • 3 Requirements for Hadoop Adoption: Requirements for Hadoop’s Role in the Modern Data Architecture • Integrated: Interoperable with existing data center investments • Key Services: Platform, operational and data services essential for the enterprise • Skills: Leverage your existing skills: development, operations, analytics
    • Requirements for Enterprise Hadoop • 1 Key Services: Platform, Operational and Data services essential for the enterprise • 2 Skills: Leverage your existing skills: development, analytics, operations • 3 Integrated: Engineered with existing data center investments • Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots [diagram labels: Hortonworks Data Platform (HDP); service groups: Operational Services, Data Services, Platform Services, Core, Load & Extract; components: Ambari, HBase, Pig, Sqoop, Flume, Falcon*, Oozie, NFS, WebHDFS, Knox*, Hive & HCatalog, MapReduce, Tez, YARN, HDFS; deployment: OS/VM, Cloud, Appliance]
    • Requirements for Enterprise Hadoop • 1 Key Services: Platform, operational and data services essential for the enterprise • 2 Skills: Leverage your existing skills: development, analytics, operations • 3 Integration: Engineered with existing data center investments [diagram labels: Develop, Analyze, Operate; Collect, Process, Build, Explore, Query, Deliver, Provision, Manage, Monitor]
    • Familiar and Existing Tools • 1 Key Services: Platform, operational and data services essential for the enterprise • 2 Skills: Leverage your existing skills: development, analytics, operations • 3 Integration: Interoperable with existing data center investments [diagram labels: Develop, Analyze, Operate; Collect, Process, Build, Explore, Query, Deliver, Provision, Manage, Monitor; BusinessObjects BI]
    • Requirements for Enterprise Hadoop • 3 Integration: Engineered with existing data center investments • Integrated with Applications: Business Intelligence, Developer IDEs, Data Integration • Integrated with Systems: Data Systems & Storage, Systems Management • Integrated with Platforms: Operating Systems, Virtualization, Cloud, Appliances [diagram labels: Applications (Custom Applications, Business Analytics, Packaged Applications); Dev & Data Tools (Build & Test); Operational Tools (Manage & Monitor); Data System repositories (RDBMS, EDW, MPP); Sources: Existing Sources (CRM, ERP, Clickstream, Logs) and Emerging Sources (Sensor, Sentiment, Geo, Unstructured)]
    • WANdisco in the Modern Data Architecture [diagram labels: Applications; BusinessObjects BI; Dev & Data Tools; Operational Tools; Data System (RDBMS, EDW, HANA, MPP); Infrastructure; Sources: Existing Sources (CRM, ERP, Clickstream, Logs) and Emerging Sources (Sensor, Sentiment, Geo, Unstructured)]
    • Non-Stop Hadoop for Hortonworks • Non-stop technology delivers continuous uptime with no data loss • One Hadoop cluster across data centers at any distance • Eliminates the bottleneck of a single active NameNode • Automatic backup, failover and recovery within and across data centers • LAN-speed read and write
    • Today’s Topics • Introduction • Drivers for the Modern Data Architecture (MDA) • Apache Hadoop’s role in the MDA • WANdisco’s role in the MDA • Q&A
    • WANdisco Background • WANdisco: Wide Area Network Distributed Computing – Enterprise-ready, high availability software solutions that enable globally distributed organizations to meet today’s data challenges of secure storage, scalability and availability • Leader in tools for software engineers: Subversion – Apache Software Foundation sponsor • Highly successful IPO, London Stock Exchange, June 2012 (LSE:WAND) • US patent for active-active replication technology granted, November 2012 • Global locations – San Ramon (CA) – Chengdu (China) – Tokyo (Japan) – Boston (MA) – Sheffield (UK) – Belfast (UK) © WANdisco 2013 / page 17
    • Customers
    • WANdisco • Overarching theme: We’re enabling global protection against: – Data loss – Downtime – Loss of Intellectual Property – Loss of revenue/time to market – Falling behind the competition
    • Non-Stop Hadoop: Extending HDFS across Data Centers • Single HDFS that spans multiple Data Centers across the world • Provides 100% Uptime for Hadoop • Built as an extension on top of Apache Hadoop HDFS • 100% HDFS / 100% compatibility with Hadoop applications – Applications run unmodified • Applications can run in any Data Center • Not Simple Mirroring or a Copy
    • WANdisco DConE: Distributed Coordination Engine • WANdisco’s patented WAN-capable Paxos implementation – Mathematically proven – Provides distributed coordination of file system metadata: Create, Modify, Delete; Active-Active (all locations); Shared nothing (no leader) • No restrictions on distance between data centers – US Patent granted for time-independent implementation of Paxos • Not based on SAN block device synchronization such as EMC SRDF – SAN block replication has distance limits resulting from the inability of file systems such as NTFS and ext4 to tolerate long RTTs to block storage – Possible distribution of corrupted blocks
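
    The slide above describes coordinating file system metadata operations (create, modify, delete) across active-active sites with no leader. The following is a minimal Python sketch of the general replicated-log idea behind such designs, not WANdisco's DConE or a Paxos implementation. All names (`MetadataOp`, `Replica`, `agree_order`, the `dc-a`/`dc-b` sites) are hypothetical, and the consensus step is replaced by a deterministic sort purely for illustration: once every site applies the same agreed sequence of operations, every site ends up with the same namespace.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataOp:
    """A hypothetical file system metadata operation."""
    kind: str        # "create", "modify", or "delete"
    path: str
    value: str = ""


@dataclass
class Replica:
    """A hypothetical metadata replica at one site."""
    name: str
    namespace: dict = field(default_factory=dict)
    applied: int = 0  # number of agreed operations applied so far

    def apply(self, seq: int, op: MetadataOp) -> None:
        # Apply strictly in the agreed order; reject gaps or replays.
        if seq != self.applied:
            raise ValueError(f"{self.name}: expected seq {self.applied}, got {seq}")
        if op.kind == "create":
            self.namespace.setdefault(op.path, op.value)
        elif op.kind == "modify":
            if op.path in self.namespace:
                self.namespace[op.path] = op.value
        elif op.kind == "delete":
            self.namespace.pop(op.path, None)
        else:
            raise ValueError(f"unknown operation {op.kind!r}")
        self.applied += 1


def agree_order(proposals):
    """Stand-in for consensus (ASSUMPTION: a real system would run an
    agreement protocol such as Paxos here). Sorting by (round, site)
    simply gives every replica the same total order."""
    ordered = sorted(proposals, key=lambda p: (p[0], p[1]))
    return [op for _, _, op in ordered]


# Both sites accept writes (active-active); proposals are (round, site, op).
proposals = [
    (0, "dc-b", MetadataOp("create", "/logs/b", "v1")),
    (0, "dc-a", MetadataOp("create", "/logs/a", "v1")),
    (1, "dc-b", MetadataOp("modify", "/logs/a", "v2")),
    (1, "dc-a", MetadataOp("delete", "/logs/b")),
]

agreed = agree_order(proposals)
replicas = [Replica("dc-a"), Replica("dc-b")]
for replica in replicas:
    for seq, op in enumerate(agreed):
        replica.apply(seq, op)

# Every replica applied the same sequence, so the namespaces match.
print(replicas[0].namespace == replicas[1].namespace)
```

    The design point illustrated is that replicas exchange agreed operations rather than block-level copies of storage, so each site can validate and apply an operation itself.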
    • Apache Hadoop [diagram slides, pages 22–25]
    • Non-Stop Hadoop over WAN: Continuous availability [diagram slides, pages 26–29]
    • Non-Stop Hadoop over WAN: Unlimited performance and scalability [diagram slide, page 30]
    • Non-Stop Hadoop over WAN: Automated failover and recovery [diagram slides, pages 31–38]
    • Non-Stop Hadoop • Architecture – Non-Intrusive: Not Simple Mirroring or a Copy – Does not modify Apache Hadoop – Runs on HDP 2 and later • Provides 100% Uptime for Hadoop – Provides Continuous Availability of HDFS Data – Guarantees 100% Uptime of HDFS During all 4 Categories of Failures • Enables HDFS to be Deployed Globally Across the WAN – Extends HDFS Across Multiple Data Centers – Unifies the HDFS Namespace – Exceeds Business Continuity Requirements for SLAs and Compliance • Load Balances NameNode Traffic for Increased Scalability
    • Demo
    • Use Cases for Non-Stop Hadoop with Hortonworks • Disaster Recovery – Data is as current as possible (no periodic synchronizations) – Virtually zero downtime to recover from regional data center failure – Regulatory compliance • Load Balancing • Multi Data Center Ingest – Information doesn’t need to be sent to one DC and then copied back to the other using DistCp – Parallel ingest methods don’t require redirected data streams • Global MapReduce – Global Click Stream Analysis – Global Log Analysis – Etc. • Maximize Resource Utilization – All data centers can be used to run different jobs concurrently
    • Key Takeaways: Non-Stop Hadoop for Hortonworks • Non-Stop Hadoop makes Hadoop Enterprise/Production Ready • Load balancing eliminates the bottleneck of a single NameNode • Active-Active replication solves the Hadoop high availability issue • No job restarts or lost time for NameNode failures (Continuous Availability) • Single HDFS across multiple data centers – No out-of-sync issues – No Load Balancer maintenance problems • Data Centers can be located at any distance from each other • If any Data Center fails, applications can be run on any other replicated Data Center • If a Data Center is completely lost, any other replica of that Data Center can be used to restore it
    • Next Steps • More about Non-Stop Hadoop for Hortonworks: http://www.wandisco.com/hadoop/non-stop-hadoophortonworks • Get started on Hadoop with Hortonworks Sandbox: http://hortonworks.com/hadoop-tutorial/ • Try Non-Stop Hadoop for Hortonworks. Contact us: WANdisco@hortonworks.com