SlideShare a Scribd company logo
1 of 17
WANdisco and Hadoop:
The Future of Big Data
     December 11, 2012
• WANdisco: Wide Area Network Distributed Computing
• Patented technology for active-active replication
• Leader in tools for software engineers (Subversion)
• No venture capital, angel investors or private equity funding
• Listed on the London Stock Exchange on June 1, 2012 in a highly
  successful IPO (LSE:WAND)
• Offices in San Ramon (CA), Boston (MA), Sheffield (UK), Belfast
  (UK), Chengdu (China), Tokyo (Japan)




                                                                    2
WANdisco Technology   Traditional Approach




                                             3
"unlike conventional solutions, the multi-site
computing system architecture does not rely on a central
transaction coordinator that is known to be a single-point-of-
failure."
                                                                  4
“Big Data is the new
       definitive source of
   competitive advantage
 across all industries. For
 those organizations that
understand and embrace
    the new reality of Big
 Data, the possibilities for
new innovation, improved
    agility, and increased
    profitability are nearly
                   endless”
                     - Wikibon




                             5
• Fixing specific problems:
    • Easy to use appliance
    • High availability
    • Disaster recovery / zero time
      to recovery (over a WAN)
• Highly differentiated
    • Nobody else can do
      active-active replication
      over a WAN




                                      6
Dr. Konstantin Shvachko
•   Co-founder of AltorStor, acquired by WANdisco
•   Was part of the team that invented Hadoop at Yahoo in 2006 and went on to
    become the Principal Big Data architect at eBay
•   Credited with the creation and maintenance of the Hadoop Distributed File
    System (HDFS), which is at the very core of both Hadoop and any replication
    solution for Hadoop
Jagane Sundar
•   Co-founder of AltorStor, acquired by WANdisco
•   Was responsible for conceiving, architecting and managing the development
    of AltoScale’s Hadoop As A Service platform before selling it to VertiCloud
•   Visionary behind AltoStor’s Cloud and Big Data Storage Appliance
•   Former Director of Hadoop Engineering at Yahoo! and managed the
    development of Hadoop 0.20.204 with Disk Fail In Place
Complementary IP and skills
•   Ideal fit with our patented active-active replication technology
•   Altostor founders faced problems (scaling, performance and high availability)
    we are planning to solve

                                                                                    7
• 24-by-7 Reliability, Availability, Scalability and Performance
• Planned as well as unplanned outages are extremely expensive
• Steep learning curve and dearth of trained specialists
• Many enterprises forced to rely on public cloud options such as Amazon
    •   Expensive hourly billing models
    •   Vendor lock-in with difficult migration paths
    •   Periodic availability and performance problems
    •   Data security concerns with cloud-based deployments
• Moving from batch model to real-time transaction model
• Our product suite will be designed to meet all of these challenges



                                                                           8
• Plug-and-play pre-packaged software eliminates the need for
  specialized Hadoop skills
• Wizard based deployment, monitoring and management
• Supports migration from Amazon to private in-house clouds
• S3-enabled filesystem unique to WANdisco’s AltoStor appliance
   • Allows searches for any kind of data (images, videos, etc.) based
      on descriptive characteristics
• HBase support for real-time transaction processing




                                                                         9
NameNode    NameNode
       NameNode




     HDFS Data




                       10
• Works over a LAN within a single data
  center
                                          NameNode    NameNode
• Works over a WAN across data centers
  thousands of miles apart

• Supports simultaneous read and write
  access on every server

                                              HDFS Data




                                                                 11
Public Cloud S3 Apps for
            the private cloud – e.g.
            JungleDisk, SmugMug,
             Senduit, Zmanda, etc.                                        Traditional Hadoop M/R
                                                    HBase Apps                     Apps

Step 1
  WANdisco AltoStor
     Appliance
  Hadoop Mgmt Server                   S3 API          HBase API        HDFS API            JobTracker
• Deploy Hadoop(s)
• Manage Hadoop                                              AltoStor Hadoop
• Monitor Hadoop
                                     Step 3



    Enterprise                                     Physical (e.g. rack of Dell servers) or
     Active                                        Virtual Infrastructure (e.g. VMware VI)
    Directory
                   Step 2
                                              for WANdisco AltoStor to use in building Hadoop

                                                                                                         12
• WANdisco AltoStor appliance has the capability to deploy on virtualized
  infrastructure such as VMware
• Advantages:
    • Extra level of reliability
    • Elasticity – live cluster shrinking and expansion
    • Extremely high hardware resource utilization
    • Resource isolation
    • Ease of management – preconfigured VMs




                                                                            13
HDFS Clients



HDFS Cluster
                                       DataNodes


      Active
                   Standby
    NameNode
                  NameNode




                      Shared Storage



                                                   14
Client                                 Client                                     Client




      Active                               Active                                       Active
    NameNode                             NameNode                             …       NameNode
                    Proposal Handler




                                                           Proposal Handler




                                                                                                      Proposal Handler
                                                                                  Dispatch
                                       Dispatch
Dispatch




                                          WANdisco PAXOS


                                                                                                                         15
Questions and
  Answers
Thank You


http://blogs.wandisco.com/autho
r/jagane-sundar

Visit us www.wandisc.com

More Related Content

What's hot

Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
Geo-based content processing using hbase
Geo-based content processing using hbaseGeo-based content processing using hbase
Geo-based content processing using hbaseRavi Veeramachaneni
 
Hadoop disaster recovery
Hadoop disaster recoveryHadoop disaster recovery
Hadoop disaster recoverySandeep Singh
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadooplarsgeorge
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache HadoopAjit Koti
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopRan Ziv
 
Hadoop 101
Hadoop 101Hadoop 101
Hadoop 101EMC
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewKonstantin V. Shvachko
 
Design, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for HadoopDesign, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for Hadoopmcsrivas
 

What's hot (20)

Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Geo-based content processing using hbase
Geo-based content processing using hbaseGeo-based content processing using hbase
Geo-based content processing using hbase
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
HDFS Tiered Storage
HDFS Tiered StorageHDFS Tiered Storage
HDFS Tiered Storage
 
Hadoop disaster recovery
Hadoop disaster recoveryHadoop disaster recovery
Hadoop disaster recovery
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Hadoop
Hadoop Hadoop
Hadoop
 
Hadoop Fundamentals I
Hadoop Fundamentals IHadoop Fundamentals I
Hadoop Fundamentals I
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache Hadoop
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop 101
Hadoop 101Hadoop 101
Hadoop 101
 
2. hadoop fundamentals
2. hadoop fundamentals2. hadoop fundamentals
2. hadoop fundamentals
 
Hadoop HDFS
Hadoop HDFSHadoop HDFS
Hadoop HDFS
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Distributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology OverviewDistributed Computing with Apache Hadoop: Technology Overview
Distributed Computing with Apache Hadoop: Technology Overview
 
Design, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for HadoopDesign, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for Hadoop
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 

Viewers also liked

Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataHortonworks
 
Large scale ETL with Hadoop
Large scale ETL with HadoopLarge scale ETL with Hadoop
Large scale ETL with HadoopOReillyStrata
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Hortonworks
 
AWS re:Invent 2016: Disaster Recovery and Business Continuity for Systemicall...
AWS re:Invent 2016: Disaster Recovery and Business Continuity for Systemicall...AWS re:Invent 2016: Disaster Recovery and Business Continuity for Systemicall...
AWS re:Invent 2016: Disaster Recovery and Business Continuity for Systemicall...Amazon Web Services
 
Business Continuity And Disaster Recovery Notes
Business Continuity And Disaster Recovery NotesBusiness Continuity And Disaster Recovery Notes
Business Continuity And Disaster Recovery NotesAlan McSweeney
 
Disaster Recovery Presentation
Disaster Recovery PresentationDisaster Recovery Presentation
Disaster Recovery PresentationTimSchaefer
 
An Introduction to Disaster Recovery Planning
An Introduction to Disaster Recovery PlanningAn Introduction to Disaster Recovery Planning
An Introduction to Disaster Recovery PlanningNEBizRecovery
 
The A to Z Guide to Business Continuity and Disaster Recovery
The A to Z Guide to Business Continuity and Disaster RecoveryThe A to Z Guide to Business Continuity and Disaster Recovery
The A to Z Guide to Business Continuity and Disaster RecoverySirius
 
Disaster Recovery Plan for IT
Disaster Recovery Plan for ITDisaster Recovery Plan for IT
Disaster Recovery Plan for IThhuihhui
 

Viewers also liked (9)

Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Large scale ETL with Hadoop
Large scale ETL with HadoopLarge scale ETL with Hadoop
Large scale ETL with Hadoop
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks
 
AWS re:Invent 2016: Disaster Recovery and Business Continuity for Systemicall...
AWS re:Invent 2016: Disaster Recovery and Business Continuity for Systemicall...AWS re:Invent 2016: Disaster Recovery and Business Continuity for Systemicall...
AWS re:Invent 2016: Disaster Recovery and Business Continuity for Systemicall...
 
Business Continuity And Disaster Recovery Notes
Business Continuity And Disaster Recovery NotesBusiness Continuity And Disaster Recovery Notes
Business Continuity And Disaster Recovery Notes
 
Disaster Recovery Presentation
Disaster Recovery PresentationDisaster Recovery Presentation
Disaster Recovery Presentation
 
An Introduction to Disaster Recovery Planning
An Introduction to Disaster Recovery PlanningAn Introduction to Disaster Recovery Planning
An Introduction to Disaster Recovery Planning
 
The A to Z Guide to Business Continuity and Disaster Recovery
The A to Z Guide to Business Continuity and Disaster RecoveryThe A to Z Guide to Business Continuity and Disaster Recovery
The A to Z Guide to Business Continuity and Disaster Recovery
 
Disaster Recovery Plan for IT
Disaster Recovery Plan for ITDisaster Recovery Plan for IT
Disaster Recovery Plan for IT
 

Similar to Hadoop and WANdisco: The Future of Big Data

Why Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueDataWhy Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueDataData Con LA
 
13. The Transition to IPv6 and the Necessity for IP Address Management - Frey...
13. The Transition to IPv6 and the Necessity for IP Address Management - Frey...13. The Transition to IPv6 and the Necessity for IP Address Management - Frey...
13. The Transition to IPv6 and the Necessity for IP Address Management - Frey...Digicomp Academy AG
 
Hadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionHadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionBenoit Perroud
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanJim Kaskade
 
Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - finalHortonworks
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Hortonworks
 
End of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationEnd of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationCeph Community
 
Intro to GlusterFS Webinar - August 2011
Intro to GlusterFS Webinar - August 2011Intro to GlusterFS Webinar - August 2011
Intro to GlusterFS Webinar - August 2011GlusterFS
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextHortonworks
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesDataWorks Summit
 
Liberate Your Files with a Private Cloud Storage Solution powered by Open Source
Liberate Your Files with a Private Cloud Storage Solution powered by Open SourceLiberate Your Files with a Private Cloud Storage Solution powered by Open Source
Liberate Your Files with a Private Cloud Storage Solution powered by Open SourceIsaac Christoffersen
 
Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Hortonworks
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceHortonworks
 
New Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesNew Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesKamesh Pemmaraju
 
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?Red_Hat_Storage
 
Inside Triton, July 2015
Inside Triton, July 2015Inside Triton, July 2015
Inside Triton, July 2015Casey Bisson
 
Red Hat Storage - Introduction to GlusterFS
Red Hat Storage - Introduction to GlusterFSRed Hat Storage - Introduction to GlusterFS
Red Hat Storage - Introduction to GlusterFSGlusterFS
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)outstanding59
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldRichard McDougall
 
App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)outstanding59
 

Similar to Hadoop and WANdisco: The Future of Big Data (20)

Why Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueDataWhy Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueData
 
13. The Transition to IPv6 and the Necessity for IP Address Management - Frey...
13. The Transition to IPv6 and the Necessity for IP Address Management - Frey...13. The Transition to IPv6 and the Necessity for IP Address Management - Frey...
13. The Transition to IPv6 and the Necessity for IP Address Management - Frey...
 
Hadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionHadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment Evolution
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
 
Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - final
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
 
End of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationEnd of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph Replication
 
Intro to GlusterFS Webinar - August 2011
Intro to GlusterFS Webinar - August 2011Intro to GlusterFS Webinar - August 2011
Intro to GlusterFS Webinar - August 2011
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual Machines
 
Liberate Your Files with a Private Cloud Storage Solution powered by Open Source
Liberate Your Files with a Private Cloud Storage Solution powered by Open SourceLiberate Your Files with a Private Cloud Storage Solution powered by Open Source
Liberate Your Files with a Private Cloud Storage Solution powered by Open Source
 
Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
 
New Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesNew Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference Architectures
 
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
 
Inside Triton, July 2015
Inside Triton, July 2015Inside Triton, July 2015
Inside Triton, July 2015
 
Red Hat Storage - Introduction to GlusterFS
Red Hat Storage - Introduction to GlusterFSRed Hat Storage - Introduction to GlusterFS
Red Hat Storage - Introduction to GlusterFS
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworld
 
App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)
 

More from WANdisco Plc

Hadoop scalability
Hadoop scalabilityHadoop scalability
Hadoop scalabilityWANdisco Plc
 
Forrester On Using Subversion to Optimize Globally Distributed Development
Forrester On Using Subversion to Optimize Globally Distributed DevelopmentForrester On Using Subversion to Optimize Globally Distributed Development
Forrester On Using Subversion to Optimize Globally Distributed DevelopmentWANdisco Plc
 
03.13.13 WANDisco SVN Training: Advanced Branching & Merging
03.13.13 WANDisco SVN Training: Advanced Branching & Merging03.13.13 WANDisco SVN Training: Advanced Branching & Merging
03.13.13 WANDisco SVN Training: Advanced Branching & MergingWANdisco Plc
 
02.28.13 WANDisco SVN Training: Getting Info Out of SVN
02.28.13 WANDisco SVN Training: Getting Info Out of SVN02.28.13 WANDisco SVN Training: Getting Info Out of SVN
02.28.13 WANDisco SVN Training: Getting Info Out of SVNWANdisco Plc
 
02.19.13 WANDisco SVN Training: Branching Options for Development
02.19.13 WANDisco SVN Training: Branching Options for Development02.19.13 WANDisco SVN Training: Branching Options for Development
02.19.13 WANDisco SVN Training: Branching Options for DevelopmentWANdisco Plc
 
uberSVN introduction by WANdisco
uberSVN introduction by WANdiscouberSVN introduction by WANdisco
uberSVN introduction by WANdiscoWANdisco Plc
 
WANdisco Subversion Support Services
WANdisco Subversion Support ServicesWANdisco Subversion Support Services
WANdisco Subversion Support ServicesWANdisco Plc
 
Make Subversion Agile
Make Subversion AgileMake Subversion Agile
Make Subversion AgileWANdisco Plc
 
Subversion in 2010 and Beyond
Subversion in 2010 and BeyondSubversion in 2010 and Beyond
Subversion in 2010 and BeyondWANdisco Plc
 
Forrester Research on Optimizing Globally Distributed Software Development Us...
Forrester Research on Optimizing Globally Distributed Software Development Us...Forrester Research on Optimizing Globally Distributed Software Development Us...
Forrester Research on Optimizing Globally Distributed Software Development Us...WANdisco Plc
 
Forrester Research on Globally Distributed Development Using Subversion
Forrester Research on Globally Distributed Development Using SubversionForrester Research on Globally Distributed Development Using Subversion
Forrester Research on Globally Distributed Development Using SubversionWANdisco Plc
 

More from WANdisco Plc (13)

Hadoop scalability
Hadoop scalabilityHadoop scalability
Hadoop scalability
 
Forrester On Using Subversion to Optimize Globally Distributed Development
Forrester On Using Subversion to Optimize Globally Distributed DevelopmentForrester On Using Subversion to Optimize Globally Distributed Development
Forrester On Using Subversion to Optimize Globally Distributed Development
 
03.13.13 WANDisco SVN Training: Advanced Branching & Merging
03.13.13 WANDisco SVN Training: Advanced Branching & Merging03.13.13 WANDisco SVN Training: Advanced Branching & Merging
03.13.13 WANDisco SVN Training: Advanced Branching & Merging
 
02.28.13 WANDisco SVN Training: Getting Info Out of SVN
02.28.13 WANDisco SVN Training: Getting Info Out of SVN02.28.13 WANDisco SVN Training: Getting Info Out of SVN
02.28.13 WANDisco SVN Training: Getting Info Out of SVN
 
02.19.13 WANDisco SVN Training: Branching Options for Development
02.19.13 WANDisco SVN Training: Branching Options for Development02.19.13 WANDisco SVN Training: Branching Options for Development
02.19.13 WANDisco SVN Training: Branching Options for Development
 
uberSVN introduction by WANdisco
uberSVN introduction by WANdiscouberSVN introduction by WANdisco
uberSVN introduction by WANdisco
 
Subversion Zen
Subversion ZenSubversion Zen
Subversion Zen
 
WANdisco Subversion Support Services
WANdisco Subversion Support ServicesWANdisco Subversion Support Services
WANdisco Subversion Support Services
 
Make Subversion Agile
Make Subversion AgileMake Subversion Agile
Make Subversion Agile
 
Why Svn
Why SvnWhy Svn
Why Svn
 
Subversion in 2010 and Beyond
Subversion in 2010 and BeyondSubversion in 2010 and Beyond
Subversion in 2010 and Beyond
 
Forrester Research on Optimizing Globally Distributed Software Development Us...
Forrester Research on Optimizing Globally Distributed Software Development Us...Forrester Research on Optimizing Globally Distributed Software Development Us...
Forrester Research on Optimizing Globally Distributed Software Development Us...
 
Forrester Research on Globally Distributed Development Using Subversion
Forrester Research on Globally Distributed Development Using SubversionForrester Research on Globally Distributed Development Using Subversion
Forrester Research on Globally Distributed Development Using Subversion
 

Recently uploaded

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 

Recently uploaded (20)

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 

Hadoop and WANdisco: The Future of Big Data

  • 1. WANdisco and Hadoop: The Future of Big Data December 11, 2012
  • 2. • WANdisco: Wide Area Network Distributed Computing • Patented technology for active-active replication • Leader in tools for software engineers (Subversion) • No venture capital, angel investors or private equity funding • Listed on the London Stock Exchange on June 1, 2012 in a highly successful IPO (LSE:WAND) • Offices in San Ramon (CA), Boston (MA), Sheffield (UK), Belfast (UK), Chengdu (China), Tokyo (Japan) 2
  • 3. WANdisco Technology Traditional Approach 3
  • 4. "unlike conventional solutions, the multi-site computing system architecture does not rely on a central transaction coordinator that is known to be a single-point-of- failure." 4
  • 5. “Big Data is the new definitive source of competitive advantage across all industries. For those organizations that understand and embrace the new reality of Big Data, the possibilities for new innovation, improved agility, and increased profitability are nearly endless” - Wikibon 5
  • 6. • Fixing specific problems: • Easy to use appliance • High availability • Disaster recovery / zero time to recovery (over a WAN) • Highly differentiated • Nobody else can do active-active replication over a WAN 6
  • 7. Dr. Konstantin Shvachko • Co-founder of AltorStor, acquired by WANdisco • Was part of the team that invented Hadoop at Yahoo in 2006 and went on to become the Principal Big Data architect at eBay • Credited with the creation and maintenance of the Hadoop Distributed File System (HDFS), which is at the very core of both Hadoop and any replication solution for Hadoop Jagane Sundar • Co-founder of AltorStor, acquired by WANdisco • Was responsible for conceiving, architecting and managing the development of AltoScale’s Hadoop As A Service platform before selling it to VertiCloud • Visionary behind AltoStor’s Cloud and Big Data Storage Appliance • Former Director of Hadoop Engineering at Yahoo! and managed the development of Hadoop 0.20.204 with Disk Fail In Place Complementary IP and skills • Ideal fit with our patented active-active replication technology • Altostor founders faced problems (scaling, performance and high availability) we are planning to solve 7
  • 8. • 24-by-7 Reliability, Availability, Scalability and Performance • Planned as well as unplanned outages are extremely expensive • Steep learning curve and dearth of trained specialists • Many enterprises forced to rely on public cloud options such as Amazon • Expensive hourly billing models • Vendor lock-in with difficult migration paths • Periodic availability and performance problems • Data security concerns with cloud-based deployments • Moving from batch model to real-time transaction model • Our product suite will be designed to meet all of these challenges 8
  • 9. • Plug-and-play pre-packaged software eliminates the need for specialized Hadoop skills • Wizard based deployment, monitoring and management • Supports migration from Amazon to private in-house clouds • S3-enabled filesystem unique to WANdisco’s AltoStor appliance • Allows searches for any kind of data (images, videos, etc.) based on descriptive characteristics • HBase support for real-time transaction processing 9
  • 10. NameNode NameNode NameNode HDFS Data 10
  • 11. • Works over a LAN within a single data center NameNode NameNode • Works over a WAN across data centers thousands of miles apart • Supports simultaneous read and write access on every server HDFS Data 11
  • 12. Public Cloud S3 Apps for the private cloud – e.g. JungleDisk, SmugMug, Senduit, Zmanda, etc. Traditional Hadoop M/R HBase Apps Apps Step 1 WANdisco AltoStor Appliance Hadoop Mgmt Server S3 API HBase API HDFS API JobTracker • Deploy Hadoop(s) • Manage Hadoop AltoStor Hadoop • Monitor Hadoop Step 3 Enterprise Physical (e.g. rack of Dell servers) or Active Virtual Infrastructure (e.g. VMware VI) Directory Step 2 for WANdisco AltoStor to use in building Hadoop 12
  • 13. • WANdisco AltoStor appliance has the capability to deploy on virtualized infrastructure such as VMware • Advantages: • Extra level of reliability • Elasticity – live cluster shrinking and expansion • Extremely high hardware resource utilization • Resource isolation • Ease of management – preconfigured VMs 13
  • 14. HDFS Clients HDFS Cluster DataNodes Active Standby NameNode NameNode Shared Storage 14
  • 15. Client Client Client Active Active Active NameNode NameNode … NameNode Proposal Handler Proposal Handler Proposal Handler Dispatch Dispatch Dispatch WANdisco PAXOS 15
  • 16. Questions and Answers

Editor's Notes

  1. SPOFs – Name Node, Hbase, YARN
  2. SPOFs – Name Node, Hbase, YARN