Piranha vs. Mammoth: Predator Appliances That Chew Up Big Data

If you also got the Big Data itch, here is something to ease the pain :-)
Answers to these questions will be available soon (more info in the attached link)

Which Big Data Appliance should YOU use?
(click on the attached link for Poll results)

Appliances are Small and Quick, Right?

Revealing the 6 Types of Big Data Appliances

Uncovering the Main Players

Challenges, Pitfalls, and Winning the Big Data Game

Where is all this leading YOU to?

Usage Rights

© All Rights Reserved

Presentation Transcript

  • Piranha vs. Mammoth: Predator Appliances That Chew Up BIG DATA
  • Piranha vs. Mammoth: Predator Appliances That Chew Up BIG DATA
    • Appliances are Small and Quick, Right?
    • Revealing the 6 Types of Big Data Appliances
    • Uncovering the Main Players
    • Which Big Data Appliance should YOU use?
    • Challenges, Pitfalls, and Winning the Big Data Game
    • Where is all this leading YOU to?
  • Appliances are Small and Quick, Right?
  • Well, in some cases. But a Big Data Appliance can be… BIG…
    • Quantum StorNext M330, presented on YouTube: http://www.youtube.com/watch?v=X1IZpoyHxlY
  • So what makes a great appliance?
  • But first, let’s get to know you (Big Data Appliance Poll #1…)
  • How deep have you dived into Big Data?
    A. Just starting to learn it
    B. Learning a lot, nothing done yet
    C. Planning a Big Data Project
    D. Running a Big Data Operation
    E. I don’t get it yet! What’s all the fuss about?
  • Results…
  • So what makes a great appliance?
  • So what makes a great appliance?
    1. Does the job – no more, no less
    2. Quick and simple setup
    3. Quick and easy updates
    4. Easy control of one or many instances
    5. Simple Infrastructure requirements
    6. Reliable underlying system
    7. No delays doing its job
    8. What else?
  • What’s the most important Job for a Great Appliance? (Poll #2)
  • What’s the most important Job for a Great Appliance? (Poll #2)
    A. Does the job on time – no more, no less
    B. Quick and simple Setup and Updates
    C. Easy control of one or many instances
    D. Simple Infrastructure requirements
    E. Reliable underlying system
  • Results…
  • What is the job for your Big Data Appliance?
  • What is the job for your Big Data Appliance?
    1. Extend your Existing Data Warehouse to include Non-Structured Data
    2. Discover new types of insights to Increase Innovation
    3. Run a pilot to verify it is worth it
    4. Process more (types of) Data
    5. Process Data faster
    6. Process Data cheaper
    7. Static or Continuous Analysis of Data
    8. Flexibility and Lock-In prevention (yes, sure :-)) – Hadoop
    9. Turn Operational Data into Assets
    10. Break Data Silo barriers
    11. Stick to existing Data vendors or work with new ones
  • Revealing the 6 Types of Big Data Appliances
  • Revealing the 6 Types of Big Data Appliances
    • Hadoop Engine – Software Based Appliance
    • Data Warehouse Hardware Engine + API to Hadoop / Analytics
    • Hardware Storage “Only”
    • Software Based Appliance, Compatible with Hadoop
    • Cloud based VMs + Hadoop Engine
    • Cloud Based API with Hooks to Hadoop
  • What type of Big Data Appliance will you use? (Poll #3)
  • What type of Big Data Appliance will you use? (Poll #3)
    A. Hadoop Engine or Compatible – Software Based
    B. Data Warehouse Hardware Engine + API to Hadoop / Analytics
    C. Hardware Storage “Only”
    D. Cloud based VMs + Hadoop Engine
    E. Cloud Based API with Hooks to Hadoop
  • Results…
  • Uncovering (some of) the Main Players
  • Hadoop Engine – Software Based Appliances
    • Oracle
    • Cloudera
    • Hortonworks (like Cloudera; partners include VMware, Microsoft, Teradata, …)
    • MapR (available on Amazon EMR and Google Compute Engine VMs)
    • Red Hat Storage 2.0 Beta (includes compatibility for Apache Hadoop)
  • Oracle Big Data Appliance
    • End goal: get data into Oracle Database 11g
    • Includes open source Hadoop (now Cloudera’s distribution)
    • Oracle NoSQL Database (JVM DB vs. HDFS!)
    • Oracle Loader for Hadoop (more on the next slide)
    • Open source distribution of R
    • Oracle Linux + Oracle Java HotSpot VM
  • Oracle Big Data Appliance
    • Oracle Data Integrator + Hadoop API
      – Easy upload to HDFS by automating MapReduce
      – Validate constraints on Hive tables
      – Add data to Hive tables
      – Upload to Oracle using Oracle Loader for Hadoop
      – Query Hive tables with Oracle SQL via a “connector” Oracle table
  • Oracle Big Data Appliance
    • Type: Hadoop Engine – Software Based Appliance
    • Does the job – see next slide
    • Quick and simple setup – Medium (Oracle)
    • Quick and easy updates – Medium (Oracle/CDH?)
    • Easy control of one or many instances
    • Simple Infrastructure requirements – Medium (Oracle)
    • Reliable underlying system
    • No delays doing its job – ?
    • What else?
      – Great if you’ve got Oracle already
      – Add-on to Oracle Exadata hardware / Data Warehouse
  • Oracle Big Data Appliance
    • Can do most of the job requirements
    • Exceptions:
      – Process Data faster – looks like…
      – Process Data cheaper – Oracle is not a cheap product…
      – Flexibility and Lock-In prevention – Medium
  • Cloudera
    • Integrated, tested collection of open source Apache Hadoop components (more on the next slide)
    • HDFS (with HBase on top) serves as the NoSQL store...
    • Management console for rapid node deployment
    • Free up to X nodes
    • Paid Enterprise Subscription, includes support
    • Integrated into products from several data software giants
  • Cloudera – Included Open Source Modules:
    • Apache HBase – HDFS-based tables
    • Apache Hive – SQL-like language
    • Apache Mahout – machine learning algorithms
    • Apache Pig – high-level data flow language
    • Apache Sqoop – engine for integrating with SQL databases
    • Apache Whirr – deploys Hadoop in the cloud
    • Hue – browser-based interface for Hadoop
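Hive and Pig both compile down to MapReduce jobs. As a rough illustration of the programming model underneath, here is a minimal word count in the Hadoop Streaming style, where the mapper and reducer are plain programs that read lines and write tab-separated key/value pairs. This is a sketch, not Cloudera-specific code; the file name `wordcount.py` and the helper `run` are hypothetical.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, TextIO


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per (lowercased) word, like a Streaming mapper."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; assumes records arrive grouped by key (Hadoop sorts them)."""
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run(stage: str, stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Hypothetical entry point: 'map' runs the mapper, anything else the reducer."""
    step = mapper if stage == "map" else reducer
    for out in step(stdin):
        stdout.write(out + "\n")
```

To try it locally, add an entry point that calls `run(sys.argv[1])` and pipe data through it: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`. The `sort` stands in for Hadoop's shuffle, which delivers all records for one key to the reducer consecutively; that is the only property `groupby` relies on.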
  • Cloudera
    • Type: Hadoop Engine – Software Based Appliance
    • Does the job – see next slide
    • Quick and simple setup – great once the first node is set up
    • Quick and easy updates
    • Easy control of one or many instances
    • Simple Infrastructure requirements
    • Reliable underlying system
    • No delays doing its job – maybe
    • What else?
      – Easy to start as a pilot!
      – Great for old hardware
  • Cloudera
    • Can do most of the job requirements
    • Exceptions:
      – Process Data faster – depends on allocated resources
      – Process Data cheaper – yes (but cheap hardware can be costly)
      – Static or Continuous Analysis – needs more tools
      – Endorsement from Huge Players
  • MapR Special Features (Do You Need Them?)
    • ExpressLane – small jobs finish quickly (Medium)
    • Mount / use HDFS over NFS (Strategic?)
    • NFS allows data streaming (Important / lock-in?)
    • Volumes (manage, mirror, snapshot) (Important?)
    • X times more scalable / faster (Lock-in?)
    • NameNode and JobTracker HA (MapR claims regular Hadoop has only one NameNode) (Medium)
    • Software snapshots / mirrors (Fast? Complex?)
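The NFS bullets mean that, once the cluster file system is mounted, any tool that can write a file can feed the cluster, with no HDFS client library involved. A minimal sketch, assuming the cluster is mounted at a hypothetical path such as `/mapr/my.cluster.com`; the helper names are illustrative only:

```python
from pathlib import Path


def append_events(mount_root: Path, relative_path: str, events: list[str]) -> Path:
    """Append log lines to a file under an (assumed) NFS mount of the cluster.

    Because the mount behaves like a local directory, ordinary file writes
    are all that is needed to stream data in.
    """
    target = mount_root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as handle:
        for event in events:
            handle.write(event + "\n")
    return target


def count_lines(path: Path) -> int:
    """Read the same file back with standard I/O, as any non-Hadoop tool could."""
    with path.open(encoding="utf-8") as handle:
        return sum(1 for _ in handle)
```

The flip side, hinted at by the "(lock-in?)" notes, is that code written against the mount path only works where such a mount exists; on a stock HDFS cluster the same data would go in through an HDFS client or the `hdfs dfs` command instead.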
  • Data Warehouse Hardware Engine + API to Hadoop / Analytics
    • Teradata Aster MapReduce Appliance
    • EMC Greenplum
    • IBM Netezza + Cloudera/Hadoop as part of IBM’s Big Data Solutions Suite
    • Cray Big Data Appliance, Urika (YarcData division)
  • Teradata Aster MapReduce Appliance
    • Hadoop is not at the front; MapReduce is
    • Short learning curve, using current DW tools
    • MPP is already built in for scale as part of the DW
    • Reliability and performance handled by the hardware
    • Connectivity (JDBC, ODBC) to Big Data: Cloudera
    • Price is probably higher than Hadoop solutions (a guess)
    • Platform: SUSE Linux
    • Aster Data nCluster: Amazon AWS Cloud Edition
  • Teradata Aster MapReduce Appliance
    • Type: Data Warehouse Hardware Engine + API to Hadoop / Analytics
    • Does the job – see next slide
    • Quick and simple setup
    • Quick and easy updates
    • Easy control of one or many instances
    • Simple Infrastructure requirements – specialized hardware…
    • Reliable underlying system
    • No delays doing its job – maybe
  • Teradata Aster MapReduce Appliance
    • Can do most of the job requirements
    • Exceptions:
      – Run a pilot to verify it is worth it – probably pricey… unless using the software / cloud editions
      – Process Data cheaper – probably not…
      – Static or Continuous Analysis of Data – should excel!
      – Lock-In – probably, not sure how much
    • Turn Operational Data into Assets – should excel at this…
  • Hardware Storage “Only”
    • DataDirect Networks Big Data Storage Appliances
    • Quantum StorNext Metadata Appliances
  • DataDirect Networks Big Data Storage Appliances
    • “Science Fiction” I/O performance
      – Single array: 40 GB/s and 1.4 million flash IOPS
      – Up to 25 FC/InfiniBand-connected arrays: 1 TB/s+
      – More info and pricing
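Checking the headline numbers: under the idealized assumption that aggregate bandwidth scales linearly with the number of arrays (and using decimal units), 25 arrays at 40 GB/s each give roughly 1 TB/s. The second line is an illustrative scan time for 1 PB at that rate, not a vendor figure:

```latex
\begin{aligned}
B_{\text{aggregate}} &= 25 \times 40\,\text{GB/s} = 1000\,\text{GB/s} \approx 1\,\text{TB/s} \\
t_{\text{scan}}(1\,\text{PB}) &= \frac{10^{6}\,\text{GB}}{1000\,\text{GB/s}} = 1000\,\text{s} \approx 16.7\,\text{min}
\end{aligned}
```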
  • Quantum StorNext Metadata Appliances
    • Special additional features:
      – Huge file size support
      – Support for huge numbers of files
      – Direct access from various operating systems
  • Hardware Storage “Only”
    • Does the job – see next slide
    • Quick and simple setup – once the hardware is set up
    • Quick and easy updates – probably
    • Easy control of one or many instances
    • Simple Infrastructure requirements – specialized hardware…
    • Reliable underlying system
    • No delays doing its job
  • Hardware Storage “Only”
    • Can do SOME of the job requirements
    • Exceptions (can’t do these without additional software):
      – Run a pilot to verify it is worth it – too costly for a pilot?
      – Process Data faster
      – Process Data cheaper
      – Flexibility and Lock-In prevention
  • Cloud based VMs + Hadoop Engine
    • Amazon Elastic MapReduce (Amazon EMR)
    • Google Compute Engine
  • Amazon Elastic MapReduce (Amazon EMR)
    • Type: Cloud based VMs + Hadoop Engine
    • Cost-effective (not always = cheap!)
    • Includes Hadoop software such as MapR, with all of MapR’s advanced software-based file services
    • Easily add or remove nodes
      – Preset VMs
      – Easy mass deployment using the AWS console
    • HA integrated into Amazon S3
    • Hadoop HBase database as an EMR service
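“Easily add or remove nodes” comes down to a number in the cluster request. The sketch below only builds a request dictionary for the EMR `RunJobFlow` API as exposed by boto3’s `run_job_flow`; the helper name, the default instance type `m5.xlarge`, and the application list are assumptions, and the release label is left to the caller because valid labels change over time.

```python
def emr_cluster_request(
    name: str,
    release_label: str,
    core_nodes: int,
    instance_type: str = "m5.xlarge",          # assumed default, pick per workload
    applications: tuple[str, ...] = ("Hadoop", "HBase"),
) -> dict:
    """Build keyword arguments for boto3's EMR run_job_flow (hypothetical helper)."""
    if core_nodes < 1:
        raise ValueError("a cluster needs at least one core node")
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "InstanceGroups": [
                # One master node coordinates the cluster.
                {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": instance_type, "InstanceCount": 1},
                # Core nodes run HDFS and tasks; this count is the "add or remove nodes" knob.
                {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
                 "InstanceType": instance_type, "InstanceCount": core_nodes},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default IAM role names that EMR can create for an account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With AWS credentials configured, `boto3.client("emr").run_job_flow(**request)` would submit it; resizing a running cluster is a separate step, for example through the EMR console or the `modify_instance_groups` call.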
  • Google Compute Engine Special Features
    • Type: Cloud based VMs + Hadoop Engine
    • Based on CentOS (nice – open…)
    • Various disk types (all encrypted, fast)
      – Non-persistent (dies with the VM)
      – Persistent (shared, with snapshots)
      – Cloud based (looks similar to Amazon S3)
    • Cheaper than Amazon?
  • Amazon Elastic MapReduce (Amazon EMR)
    • Does the job – see next slide
    • Quick and simple setup
    • Quick and easy updates – probably
    • Easy control of one or many instances
    • Simple Infrastructure requirements
    • Reliable underlying system
    • No delays doing its job
  • Amazon Elastic MapReduce (Amazon EMR)
    • Can do most of the job requirements
    • Exceptions:
      – Extend your Existing Data Warehouse to include Non-Structured Data – your DW out in the cloud…
      – Run a pilot to verify it is worth it – excels at this!
      – Process Data faster
      – Process Data cheaper
      – Static or Continuous Analysis of Data
      – Turn Operational Data into Assets – operational data in the cloud…
  • Cloud Based API with Hooks to Hadoop
    • Google App Engine MapReduce
    • Microsoft Big Data via Windows Azure
  • Google App Engine MapReduce
    • Open-source library for doing MapReduce on the Google App Engine platform
    • Can process Datastore entities and blob files (probably Google Cloud Storage)
    • Both in-memory and on-disk operation
    • Scale “working threads” up or down
    • Python and Java support
    • Experimental, but offers a look into the future…
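The App Engine library spreads the map, shuffle, and reduce phases across App Engine instances. The sketch below is not that library’s API; it is a local, in-process illustration of the same three phases, with a `workers` parameter standing in for the “working threads” knob. All names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable


def map_reduce(
    records: Iterable[Any],
    map_fn: Callable[[Any], Iterable[tuple[Any, Any]]],
    reduce_fn: Callable[[Any, list], Any],
    workers: int = 4,
) -> dict:
    """Run map_fn on a thread pool, group values by key, then reduce each key."""
    # Map phase: each record becomes a list of (key, value) pairs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(lambda record: list(map_fn(record)), records))

    # Shuffle phase: collect every value emitted for the same key.
    groups: dict[Any, list] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: one task per key, also on the pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        keys = list(groups)
        results = pool.map(lambda key: reduce_fn(key, groups[key]), keys)
        return dict(zip(keys, results))
```

For example, `map_reduce(lines, lambda line: [(w, 1) for w in line.split()], lambda k, vs: sum(vs), workers=8)` counts words. Threads suit I/O-bound steps such as fetching entities or blobs; in CPython they do not speed up CPU-bound map functions because of the global interpreter lock, and a process pool would be the usual swap there.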
  • Google App Engine MapReduce
    • Does the job – see next slide
    • Quick and simple setup – once you learn the API
    • Quick and easy updates
    • Easy control of one or many instances
    • Simple Infrastructure requirements
    • Reliable underlying system – still beta…
    • No delays doing its job
  • Google App Engine MapReduce
    • Can do SOME of the job requirements
    • Exceptions:
      – Extend your Existing Data Warehouse – cloud security and DW
      – Run a pilot to verify it is worth it – could be great!
      – Process Data faster
      – Process Data cheaper
      – Static or Continuous Analysis of Data
      – Flexibility and Lock-In prevention – the code is open, but the process may not be
      – Turn Operational Data into Assets – cloud security…
  • Microsoft Big Data via Windows Azure
    • Provides a SQL Server Hadoop Connector
    • Provides an ODBC Hadoop connector to tie MS Office and other apps to Hadoop Hive
    • Seems similar to DW providers that have a connector to Hadoop
      – Reason: it is not clear exactly where and how the Azure cloud implementation is going…
  • Which Big Data Appliance should YOU use?
  • Which Big Data Appliance should YOU use?
    • Let’s look at the Big Data Appliance job to be done and ask questions:
    • Where are you and what is your goal?
      – Do you already have some of the puzzle pieces?
      – Any constraints?
      – Long term vs. short term?
      – (Always start with a pilot if this is your first time…)
  • Challenges, Pitfalls, and Winning the Big Data Game
  • Challenges, Pitfalls, and Winning the Big Data Game
    • You can’t get much out of Big Data if you don’t know how to find useful insights (lack of data scientists)
    • The same abilities you needed for digging into a Data Warehouse, you need with Big Data, even more so
    • Commoditization of the data warehouse (Hadoop + cloud) = more players and innovation
  • Challenges, Pitfalls, and Winning the Big Data Game
    • You can’t make use of it if you lack the innovative, agile ability to change direction quickly and respond on time
    • Privacy (implied and specific)
    • Security (implied and specific)
    • To run cheaply (many x86 nodes), you need a mass node management app
    • Big DW vendors embrace Hadoop through solution providers such as Cloudera and Hortonworks, but it “feels” a bit “vague”
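A mass node management app, at its simplest, fans one operation out to many nodes in parallel and reports the stragglers. A minimal sketch, assuming a hypothetical `check` function (for example an SSH or HTTP health probe) supplied by the caller; node names are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def check_fleet(nodes: list[str], check: Callable[[str], bool], workers: int = 32) -> list[str]:
    """Run `check` against every node concurrently and return the unhealthy ones."""

    def safe_check(node: str) -> bool:
        try:
            return check(node)
        except Exception:
            # A node whose probe raises (unreachable, timeout, ...) counts as unhealthy.
            return False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        healthy = list(pool.map(safe_check, nodes))
    # pool.map preserves input order, so results line up with `nodes`.
    return [node for node, ok in zip(nodes, healthy) if not ok]
```

With hundreds of cheap x86 nodes, the same fan-out pattern carries over to deploying packages or rolling config changes; real tools add retries, batching, and audit logs on top.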
  • Where is all this leading YOU to?
  • Where is all this leading YOU to?
    • The Simple Stuff (I know it looks complicated)
      – Crunching More and Faster for Less
      – Optimizing the Process and Utilizing the right Tools
    • The real challenge: Turning Data into an Asset
      – Finding: The Golden Nuggets
      – Deciding: What should I do now?
      – Pitching and leading: The Transformation
    • Big Data does not mean Endless Capacity…
    • Don’t get lost in the Technology Playground
  • Q&A soon… But first, I need your help now…
    1. Please rate the Webinar
    2. Download the resource attachments for future use
    3. Register for my channel on BrightTalk
    4. Spread the word
    5. Have fun with Big Data and Enjoy Life
  • Questions?
  • Reminder…
    1. Please rate the Webinar
    2. Download the resource attachments for future use
    3. Register for my channel on BrightTalk
    4. Spread the word
    5. Have fun with Big Data and Enjoy Life