Getting Started with Big Data in the Cloud



      Vijay Tolani
      Sr. Sales Engineer


Talk with the Experts.



Agenda
   • What is Big Data and Why is it a Good Fit for the Cloud?

   • Use Cases for running Big Data in the Cloud
        • Storing Large Data Sets and Unstructured Data
        • Data Analytics using Hadoop


   • RightScale Ecosystem Solutions
        • NoSQL
        • Hadoop Analytics


   • How I learned to Use Hadoop in the Cloud





What is Big Data?

“Big data is data that exceeds the processing capacity
of conventional database systems. The data is too big,
moves too fast, or doesn't fit the strictures of your
database architectures. To gain value from this data,
you must choose an alternative way to process it.”

      - O’Reilly






Why is Big Data a Good Fit for the Cloud?
   “What insight could you gain if you had full use of a 100-node cluster?”

   “We don’t have resources to do anything like that.”

   “What if one hour of this 100-node cluster would cost $34?”
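
The arithmetic behind that $34 figure can be sketched as follows. The slide gives only the total for 100 nodes for one hour; the per-node rate is derived from it, and the flat, linear pricing and the `cluster_cost` helper are assumptions made for illustration, not a quoted price list.

```python
# Figures from the slide: 100 nodes for one hour at $34 total.
NODES = 100
CLUSTER_COST_PER_HOUR = 34.00  # USD, as quoted on the slide

# Derived, not stated on the slide: the implied price per node-hour.
cost_per_node_hour = CLUSTER_COST_PER_HOUR / NODES


def cluster_cost(nodes, hours, rate=cost_per_node_hour):
    """Cost of renting `nodes` machines for `hours`, assuming a flat hourly rate."""
    return nodes * hours * rate


print(f"Implied rate per node-hour: ${cost_per_node_hour:.2f}")
print(f"100 nodes for 8 hours: ${cluster_cost(100, 8):.2f}")
```

The point of the slide survives the simplification: the cluster is rented by the hour and released afterwards, so no hardware has to be bought up front.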





Relational Databases…since 1970
Data is stored in Tables




Data is accessed via SQL Queries







Now Let Me Tell You a Story







Draw Something Goes Viral
[Chart: Daily Active Users (millions), 2/6 through 3/21; y-axis up to 16]





As Usage Grew, So Did Game Data
[Chart: Daily Active Users (millions), 2/6 through 3/21; y-axis up to 16]

By March 29, there were:

   • over 30,000,000 downloads of the app,
   • over 5,000 drawings being stored per second,
   • over 2,200,000,000 drawings stored,
   • over 105,000 database transactions per second,
   • and over 3.3 terabytes of data stored.





This Isn’t The Only Example
Food for Thought:

•   Facebook is expected to have more than 1 billion users by August
    2012, handles 40 billion photos, and generates 10 TB of log data per day.
•   Twitter has more than 100 million users and generates some 7 TB of tweet
    data per day.
•   For every trading session, the NYSE captures 1 TB of trade information.

Conventional Data Warehouses and SQL Databases cannot meet the
demands of many of today’s applications on 3 key metrics:

•   Volume
•   Variety
•   Velocity




Storing Large Data Sets in the Cloud

   • “I want to use Hadoop, but I’m out of capacity in my current
     Data Warehouse.”

   • If you can’t store the data, you can’t analyze the data.

   • Many customers are choosing to begin their Big Data projects
     by implementing NoSQL databases to store large volumes of
     data in a variety of formats (Structured, Unstructured, & Semi-
     Structured)






What is NoSQL?
   •   Highly Scalable, Distributed, & Fault Tolerant

   •   Designed for use on Commodity Hardware.

   •   Does NOT use SQL

   •   Does NOT Guarantee Immediate Consistency


   NoSQL Databases are ideal when the following criteria are met:

   •   Simple Data Models are used.
   •   Flexibility is more important than strict control over defined Data
       Structures.
   •   High Performance is a must.
   •   Strict Data Consistency is not required.




Types of NoSQL Databases
Key-Value Store




Document Database




Column Oriented Database







MapReduce
   The MapReduce paradigm consists of three steps:

   1. A mapper function or script goes through your input data and outputs a
      series of keys and values.
   2. A sort step orders the unordered list of keys so that all the fragments that
      have the same key are next to one another in the file.
   3. The reducer stage then goes through the sorted output and receives all of the
      values that have the same key in a contiguous block.
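
The three steps above can be sketched on a single machine as a word count. This is a minimal illustration in plain Python, not Hadoop code: the `mapper`, `reducer`, and `map_reduce` names and the sample input are assumptions made for the example, and a real cluster would spread each step across many nodes.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map step: emit a (key, value) pair for every word in one input line."""
    for word in line.lower().split():
        yield (word, 1)


def reducer(key, values):
    """Reduce step: combine all of the values that share one key."""
    return (key, sum(values))


def map_reduce(lines):
    # 1. Map: run the mapper over every input record.
    pairs = [pair for line in lines for pair in mapper(line)]
    # 2. Sort: bring identical keys next to one another.
    pairs.sort(key=itemgetter(0))
    # 3. Reduce: each key's values now arrive as one contiguous block.
    return [
        reducer(key, (value for _, value in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    ]


sample = ["big data in the cloud", "data in hadoop"]
word_counts = map_reduce(sample)
print(word_counts)
```

The sort in step 2 is what lets the reducer stay simple: it never has to search for a key, because `groupby` only groups adjacent items.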







Hadoop Architecture







Hadoop Concepts







Interacting with Hadoop
Hive

•     Program Hadoop jobs using a SQL-like query language (HiveQL).
•     Caution: Because of Hadoop’s focus on large-scale processing, the latency may mean
      that even simple jobs take minutes to complete, so it’s not a substitute for a real-time
      transactional database.

Pig

•     Procedural data processing language designed for Hadoop where you specify a series
      of steps to perform on the data.
•     Often described as “the duct tape of Big Data” for its usefulness, it is
      often combined with custom streaming code written in a scripting language for more
      general operations.







Key-Value Stores
• Use a hash table where there is a unique key and a pointer to a
  particular item of data.

• Typical Application: Content Caching

• Example: Redis
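
As a rough sketch of the content-caching use case, the class below keeps values in a hash table (a Python `dict`) under unique keys, with an optional expiry. It is an in-memory illustration only: `KeyValueCache`, its methods, and the sample keys are hypothetical and do not represent the Redis API.

```python
import time


class KeyValueCache:
    """In-memory key-value store with optional expiry, for illustration only."""

    def __init__(self):
        # Hash table: unique key -> (value, expiry time or None).
        self._items = {}

    def set(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else time.monotonic() + ttl_seconds
        self._items[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._items[key]  # The cached item has expired.
            return default
        return value


cache = KeyValueCache()
cache.set("page:/home", "<html>cached home page</html>", ttl_seconds=60)
print(cache.get("page:/home"))
print(cache.get("page:/missing", default="cache miss"))
```

Every operation is a lookup by exact key, which is why this model is fast and easy to distribute but offers no way to query by the contents of a value.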







Document Databases
• Document databases are essentially the next level of Key-Value
  stores, allowing nested values associated with each key.
• The semi-structured documents are stored in formats such as
  JSON.

• Typical Applications: Web Apps

• MongoDB and Couchbase Hadoop Connectors

• Example: Couchbase, MongoDB
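
To make the "nested values associated with each key" idea concrete, here is a small sketch using plain Python dictionaries as JSON-style documents. The document IDs, field names, and the `get_path` and `find` helpers are all hypothetical; this is not the MongoDB or Couchbase query API.

```python
import json

# Hypothetical documents keyed by ID; each value is a nested, JSON-style structure.
documents = {
    "user:1001": {
        "name": "Ada",
        "address": {"city": "Santa Barbara"},
        "drawings": [{"word": "cat", "guessed": True}],
    },
    "user:1002": {
        "name": "Grace",
        "address": {"city": "Austin"},
        # Documents need not share a schema: this one has no "drawings" field.
    },
}


def get_path(doc, dotted_path):
    """Follow a dotted path such as 'address.city' through nested dictionaries."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(docs, dotted_path, expected):
    """Return the IDs of documents whose nested field equals `expected`."""
    return [key for key, doc in docs.items() if get_path(doc, dotted_path) == expected]


matching_ids = find(documents, "address.city", "Austin")
print(matching_ids)
print(json.dumps(documents["user:1001"], indent=2))
```

Unlike a plain key-value store, the store can look inside each value, which is what makes this model a natural fit for web applications with evolving data.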







MongoDB Hadoop Integration

Built-in MapReduce
• JavaScript Only
• Limited Scalability
• One JavaScript Implementation at a Time

Hadoop Connector
• Integrates MongoDB with Hadoop so that Hadoop jobs can read data from
  and write data to MongoDB







Column Oriented Database
• Store and process very large amounts of data distributed over
  many machines. There are still keys but they point to multiple
  columns.

• Typical Application: Distributed File Systems

• Native Hadoop Integration for HBase and Cassandra

• Examples: Cassandra, HBase
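
The "keys that point to multiple columns" model can be sketched as a table in which each row key maps to its own set of named columns. This is a single-process illustration under that assumption; `WideColumnTable` and the sample row keys are hypothetical and do not reflect the Cassandra or HBase client APIs, and nothing here is distributed.

```python
from collections import defaultdict


class WideColumnTable:
    """Row key -> {column name: value}; rows may hold different columns."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Write one column of one row."""
        self._rows[row_key][column] = value

    def get(self, row_key, column=None):
        """Read one column, or a copy of the whole row when no column is given."""
        row = self._rows.get(row_key, {})
        return dict(row) if column is None else row.get(column)


table = WideColumnTable()
table.put("drawing:42", "word", "cat")
table.put("drawing:42", "author", "user:1001")
table.put("drawing:43", "word", "dog")  # This row has no "author" column.

print(table.get("drawing:42"))
print(table.get("drawing:43", "author"))
```

Because each row stores only the columns it actually has, sparse data costs nothing for the missing fields.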







Cassandra Hadoop Integration
•   Native Support for Apache Pig and Apache Hive
•   Cassandra's Hadoop support implements the same interface as HDFS to achieve input data locality




•   One thing Cassandra can’t do well yet is MapReduce.
•   MapReduce and related systems such as Pig and Hive work well with HBase because it uses
    Hadoop’s HDFS to store its data.






My Approach to Learning about Using Hadoop in the Cloud… courtesy of IBM

• Learn It
      • Big Data University


• Try It
      • BigInsights Basic, Available for Free in the MultiCloud MarketPlace


• Buy It
      • BigInsights Enterprise for Advanced Functionality







How I Learned to Use Hadoop in the Cloud
   • Hadoop Fundamentals
        • Hadoop Architecture, MapReduce, and HDFS
        • Using Pig and Hive
   • Using BigInsights in the Cloud with RightScale
   • The Best Part – It’s Free!!
   • http://www.bigdatauniversity.com/







BigInsights Basic – Get Started for Free

   • Available in the MultiCloud MarketPlace

   • Free for Data Sets up to 10 TB







BigInsights Enterprise




Questions?




Talk with the Experts.

More Related Content

What's hot

Introduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopIntroduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopSavvycom Savvycom
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - IntroductionTomy Rhymond
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataHaluan Irsad
 
Hadoop core concepts
Hadoop core conceptsHadoop core concepts
Hadoop core conceptsMaryan Faryna
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Mahantesh Angadi
 
Hadoop and big data
Hadoop and big dataHadoop and big data
Hadoop and big dataYukti Kaura
 
Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An OverviewArvind Kalyan
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry PerspectiveCloudera, Inc.
 
An introduction to Big Data
An introduction to Big DataAn introduction to Big Data
An introduction to Big DataForwardSprint
 
20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinar20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinarCloudera, Inc.
 
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012Gigaom
 
Data infrastructure and Hadoop at LinkedIn
Data infrastructure and Hadoop at LinkedInData infrastructure and Hadoop at LinkedIn
Data infrastructure and Hadoop at LinkedInHari Shankar Sreekumar
 
BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTAmrit Chhetri
 
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study VMworld
 
Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP vinoth kumar
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An OverviewC. Scyphers
 

What's hot (20)

Introduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopIntroduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & Hadoop
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - Introduction
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big Data: an introduction
Big Data: an introductionBig Data: an introduction
Big Data: an introduction
 
Hadoop core concepts
Hadoop core conceptsHadoop core concepts
Hadoop core concepts
 
BIG DATA
BIG DATABIG DATA
BIG DATA
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
 
Hadoop and big data
Hadoop and big dataHadoop and big data
Hadoop and big data
 
Big data analytics - hadoop
Big data analytics - hadoopBig data analytics - hadoop
Big data analytics - hadoop
 
Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An Overview
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry Perspective
 
An introduction to Big Data
An introduction to Big DataAn introduction to Big Data
An introduction to Big Data
 
20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinar20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinar
 
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
 
Data infrastructure and Hadoop at LinkedIn
Data infrastructure and Hadoop at LinkedInData infrastructure and Hadoop at LinkedIn
Data infrastructure and Hadoop at LinkedIn
 
BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRT
 
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
 
Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An Overview
 

Viewers also liked

RightScale Webinar: Benchmark your Cloud Adoption: 2014 State of the Cloud Re...
RightScale Webinar: Benchmark your Cloud Adoption: 2014 State of the Cloud Re...RightScale Webinar: Benchmark your Cloud Adoption: 2014 State of the Cloud Re...
RightScale Webinar: Benchmark your Cloud Adoption: 2014 State of the Cloud Re...RightScale
 
Automating Servers in the Cloud
Automating Servers in the CloudAutomating Servers in the Cloud
Automating Servers in the CloudRightScale
 
The Fast Path to Building a Private Cloud (With Guest Speaker from Forrester ...
The Fast Path to Building a Private Cloud (With Guest Speaker from Forrester ...The Fast Path to Building a Private Cloud (With Guest Speaker from Forrester ...
The Fast Path to Building a Private Cloud (With Guest Speaker from Forrester ...RightScale
 
RightScale Customer Use Case - Ubisoft
RightScale Customer Use Case - UbisoftRightScale Customer Use Case - Ubisoft
RightScale Customer Use Case - UbisoftRightScale
 
Accelerate to Cloud
Accelerate to CloudAccelerate to Cloud
Accelerate to CloudRightScale
 
CodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the CloudCodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the CloudRightScale
 
RightScale Webinar: Rock Your SoftLayer Cloud with RightScale
RightScale Webinar: Rock Your SoftLayer Cloud with RightScaleRightScale Webinar: Rock Your SoftLayer Cloud with RightScale
RightScale Webinar: Rock Your SoftLayer Cloud with RightScaleRightScale
 

Viewers also liked (7)

RightScale Webinar: Benchmark your Cloud Adoption: 2014 State of the Cloud Re...
RightScale Webinar: Benchmark your Cloud Adoption: 2014 State of the Cloud Re...RightScale Webinar: Benchmark your Cloud Adoption: 2014 State of the Cloud Re...
RightScale Webinar: Benchmark your Cloud Adoption: 2014 State of the Cloud Re...
 
Automating Servers in the Cloud
Automating Servers in the CloudAutomating Servers in the Cloud
Automating Servers in the Cloud
 
The Fast Path to Building a Private Cloud (With Guest Speaker from Forrester ...
The Fast Path to Building a Private Cloud (With Guest Speaker from Forrester ...The Fast Path to Building a Private Cloud (With Guest Speaker from Forrester ...
The Fast Path to Building a Private Cloud (With Guest Speaker from Forrester ...
 
RightScale Customer Use Case - Ubisoft
RightScale Customer Use Case - UbisoftRightScale Customer Use Case - Ubisoft
RightScale Customer Use Case - Ubisoft
 
Accelerate to Cloud
Accelerate to CloudAccelerate to Cloud
Accelerate to Cloud
 
CodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the CloudCodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the Cloud
 
RightScale Webinar: Rock Your SoftLayer Cloud with RightScale
RightScale Webinar: Rock Your SoftLayer Cloud with RightScaleRightScale Webinar: Rock Your SoftLayer Cloud with RightScale
RightScale Webinar: Rock Your SoftLayer Cloud with RightScale
 

Similar to Getting Started with Big Data in the Cloud

Big Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R UsersBig Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R UsersAdaryl "Bob" Wakefield, MBA
 
Transform from database professional to a Big Data architect
Transform from database professional to a Big Data architectTransform from database professional to a Big Data architect
Transform from database professional to a Big Data architectSaurabh K. Gupta
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online TrainingLearntek1
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologiesneeraj rathore
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-HadoopNagarjuna D.N
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Andrew Brust
 
The Hadoop Ecosystem for Developers
The Hadoop Ecosystem for DevelopersThe Hadoop Ecosystem for Developers
The Hadoop Ecosystem for DevelopersZohar Elkayam
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusersBob Hardaway
 
Nosql Now 2012: MongoDB Use Cases
Nosql Now 2012: MongoDB Use CasesNosql Now 2012: MongoDB Use Cases
Nosql Now 2012: MongoDB Use CasesMongoDB
 
Augmenting Mongo DB with Treasure Data
Augmenting Mongo DB with Treasure DataAugmenting Mongo DB with Treasure Data
Augmenting Mongo DB with Treasure DataTreasure Data, Inc.
 
Augmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataAugmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataTreasure Data, Inc.
 
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...BigDataEverywhere
 
When to Use MongoDB
When to Use MongoDBWhen to Use MongoDB
When to Use MongoDBMongoDB
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSatish Mohan
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelUwe Printz
 

Similar to Getting Started with Big Data in the Cloud (20)

Hadoop HDFS.ppt
Hadoop HDFS.pptHadoop HDFS.ppt
Hadoop HDFS.ppt
 
Big Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R UsersBig Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R Users
 
Spark
SparkSpark
Spark
 
Transform from database professional to a Big Data architect
Transform from database professional to a Big Data architectTransform from database professional to a Big Data architect
Transform from database professional to a Big Data architect
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologies
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
Intro to Big Data
Intro to Big DataIntro to Big Data
Intro to Big Data
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World
 
The Hadoop Ecosystem for Developers
The Hadoop Ecosystem for DevelopersThe Hadoop Ecosystem for Developers
The Hadoop Ecosystem for Developers
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
Nosql Now 2012: MongoDB Use Cases
Nosql Now 2012: MongoDB Use CasesNosql Now 2012: MongoDB Use Cases
Nosql Now 2012: MongoDB Use Cases
 
Augmenting Mongo DB with Treasure Data
Augmenting Mongo DB with Treasure DataAugmenting Mongo DB with Treasure Data
Augmenting Mongo DB with Treasure Data
 
Augmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataAugmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure data
 
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
 
When to Use MongoDB
When to Use MongoDBWhen to Use MongoDB
When to Use MongoDB
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform Concept
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data Model
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 

More from RightScale

10 Must-Have Automated Cloud Policies for IT Governance
10 Must-Have Automated Cloud Policies for IT Governance10 Must-Have Automated Cloud Policies for IT Governance
10 Must-Have Automated Cloud Policies for IT GovernanceRightScale
 
Kubernetes and Terraform in the Cloud: How RightScale Does DevOps
Kubernetes and Terraform in the Cloud: How RightScale Does DevOpsKubernetes and Terraform in the Cloud: How RightScale Does DevOps
Kubernetes and Terraform in the Cloud: How RightScale Does DevOpsRightScale
 
Optimize Software, SaaS, and Cloud with Flexera and RightScale
Optimize Software, SaaS, and Cloud with Flexera and RightScaleOptimize Software, SaaS, and Cloud with Flexera and RightScale
Optimize Software, SaaS, and Cloud with Flexera and RightScaleRightScale
 
Prepare Your Enterprise Cloud Strategy for 2019: 7 Things to Think About Now
Prepare Your Enterprise Cloud Strategy for 2019: 7 Things to Think About NowPrepare Your Enterprise Cloud Strategy for 2019: 7 Things to Think About Now
Prepare Your Enterprise Cloud Strategy for 2019: 7 Things to Think About NowRightScale
 
How to Set Up a Cloud Cost Optimization Process for your Enterprise
How to Set Up a Cloud Cost Optimization Process for your EnterpriseHow to Set Up a Cloud Cost Optimization Process for your Enterprise
How to Set Up a Cloud Cost Optimization Process for your EnterpriseRightScale
 
Multi-Cloud Management with RightScale CMP (Demo)
Multi-Cloud Management with RightScale CMP (Demo)Multi-Cloud Management with RightScale CMP (Demo)
Multi-Cloud Management with RightScale CMP (Demo)RightScale
 
Comparing Cloud VM Types and Prices: AWS vs Azure vs Google vs IBM
Comparing Cloud VM Types and Prices: AWS vs Azure vs Google vs IBMComparing Cloud VM Types and Prices: AWS vs Azure vs Google vs IBM
Comparing Cloud VM Types and Prices: AWS vs Azure vs Google vs IBMRightScale
 
How to Allocate and Report Cloud Costs with RightScale Optima
How to Allocate and Report Cloud Costs with RightScale OptimaHow to Allocate and Report Cloud Costs with RightScale Optima
How to Allocate and Report Cloud Costs with RightScale OptimaRightScale
 
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...RightScale
 
Using RightScale CMP with Cloud Provider Tools
Using RightScale CMP with Cloud Provider ToolsUsing RightScale CMP with Cloud Provider Tools
Using RightScale CMP with Cloud Provider ToolsRightScale
 
Best Practices for Multi-Cloud Security and Compliance
Best Practices for Multi-Cloud Security and ComplianceBest Practices for Multi-Cloud Security and Compliance
Best Practices for Multi-Cloud Security and ComplianceRightScale
 
Automating Multi-Cloud Policies for AWS, Azure, Google, and More
Automating Multi-Cloud Policies for AWS, Azure, Google, and MoreAutomating Multi-Cloud Policies for AWS, Azure, Google, and More
Automating Multi-Cloud Policies for AWS, Azure, Google, and MoreRightScale
 
The 5 Stages of Cloud Management for Enterprises
The 5 Stages of Cloud Management for EnterprisesThe 5 Stages of Cloud Management for Enterprises
The 5 Stages of Cloud Management for EnterprisesRightScale
 
9 Ways to Reduce Cloud Storage Costs
9 Ways to Reduce Cloud Storage Costs9 Ways to Reduce Cloud Storage Costs
9 Ways to Reduce Cloud Storage CostsRightScale
 
Serverless Comparison: AWS vs Azure vs Google vs IBM
Serverless Comparison: AWS vs Azure vs Google vs IBMServerless Comparison: AWS vs Azure vs Google vs IBM
Serverless Comparison: AWS vs Azure vs Google vs IBMRightScale
 
Best Practices for Cloud Managed Services Providers: The Path to CMP Success
Best Practices for Cloud Managed Services Providers: The Path to CMP SuccessBest Practices for Cloud Managed Services Providers: The Path to CMP Success
Best Practices for Cloud Managed Services Providers: The Path to CMP SuccessRightScale
 
Cloud Storage Comparison: AWS vs Azure vs Google vs IBM
Cloud Storage Comparison: AWS vs Azure vs Google vs IBMCloud Storage Comparison: AWS vs Azure vs Google vs IBM
Cloud Storage Comparison: AWS vs Azure vs Google vs IBMRightScale
 
2018 Cloud Trends: RightScale State of the Cloud Report
2018 Cloud Trends: RightScale State of the Cloud Report2018 Cloud Trends: RightScale State of the Cloud Report
2018 Cloud Trends: RightScale State of the Cloud ReportRightScale
 
Got a Multi-Cloud Strategy? How RightScale CMP Helps
Got a Multi-Cloud Strategy? How RightScale CMP HelpsGot a Multi-Cloud Strategy? How RightScale CMP Helps
Got a Multi-Cloud Strategy? How RightScale CMP HelpsRightScale
 
How to Manage Cloud Costs with RightScale Optima
How to Manage Cloud Costs with RightScale OptimaHow to Manage Cloud Costs with RightScale Optima
How to Manage Cloud Costs with RightScale OptimaRightScale
 

More from RightScale (20)

10 Must-Have Automated Cloud Policies for IT Governance
10 Must-Have Automated Cloud Policies for IT Governance10 Must-Have Automated Cloud Policies for IT Governance
10 Must-Have Automated Cloud Policies for IT Governance
 
Kubernetes and Terraform in the Cloud: How RightScale Does DevOps
Kubernetes and Terraform in the Cloud: How RightScale Does DevOpsKubernetes and Terraform in the Cloud: How RightScale Does DevOps
Kubernetes and Terraform in the Cloud: How RightScale Does DevOps
 
Optimize Software, SaaS, and Cloud with Flexera and RightScale
Optimize Software, SaaS, and Cloud with Flexera and RightScaleOptimize Software, SaaS, and Cloud with Flexera and RightScale
Optimize Software, SaaS, and Cloud with Flexera and RightScale
 
Prepare Your Enterprise Cloud Strategy for 2019: 7 Things to Think About Now
Prepare Your Enterprise Cloud Strategy for 2019: 7 Things to Think About NowPrepare Your Enterprise Cloud Strategy for 2019: 7 Things to Think About Now
Prepare Your Enterprise Cloud Strategy for 2019: 7 Things to Think About Now
 
How to Set Up a Cloud Cost Optimization Process for your Enterprise
How to Set Up a Cloud Cost Optimization Process for your EnterpriseHow to Set Up a Cloud Cost Optimization Process for your Enterprise
How to Set Up a Cloud Cost Optimization Process for your Enterprise
 
Multi-Cloud Management with RightScale CMP (Demo)
Multi-Cloud Management with RightScale CMP (Demo)Multi-Cloud Management with RightScale CMP (Demo)
Multi-Cloud Management with RightScale CMP (Demo)
 
Comparing Cloud VM Types and Prices: AWS vs Azure vs Google vs IBM
Comparing Cloud VM Types and Prices: AWS vs Azure vs Google vs IBMComparing Cloud VM Types and Prices: AWS vs Azure vs Google vs IBM
Comparing Cloud VM Types and Prices: AWS vs Azure vs Google vs IBM
 
How to Allocate and Report Cloud Costs with RightScale Optima
How to Allocate and Report Cloud Costs with RightScale OptimaHow to Allocate and Report Cloud Costs with RightScale Optima
How to Allocate and Report Cloud Costs with RightScale Optima
 
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
Should You Move Between AWS, Azure, or Google Clouds? Considerations, Pros an...
 
Using RightScale CMP with Cloud Provider Tools
Using RightScale CMP with Cloud Provider ToolsUsing RightScale CMP with Cloud Provider Tools

Getting Started with Big Data in the Cloud

  • 1. Getting Started with Big Data in the Cloud. Vijay Tolani, Sr. Sales Engineer. Talk with the Experts.
  • 2. Agenda
      • What is Big Data and Why is it a Good Fit for the Cloud?
      • Use Cases for running Big Data in the Cloud
          • Storing Large Data Sets and Unstructured Data
          • Data Analytics using Hadoop
      • RightScale Ecosystem Solutions
          • NoSQL
          • Hadoop Analytics
      • How I learned to Use Hadoop in the Cloud
  • 3. What is Big Data? “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.” - O’Reilly
  • 4. Why is Big Data a Good Fit for the Cloud?
      • What insight could you gain if you had full use of a 100-node cluster?
      • We don’t have resources to do anything like that.
      • What if one hour of this 100-node cluster would cost $34?
  • 5. Relational Databases…since 1970
      • Data is stored in Tables
      • Data is accessed via SQL Queries
  • 6. Now Let Me Tell You a Story
  • 7. Draw Something Goes Viral [Chart: Daily Active Users (millions), 2/6 through 3/21]
  • 8. As Usage Grew, So Did Game Data [Chart: Daily Active Users (millions), 2/6 through 3/21]. By March 29, there were over 30,000,000 downloads of the app, over 5,000 drawings being stored per second, over 2,200,000,000 drawings stored, over 105,000 database transactions per second, and over 3.3 terabytes of data stored.
  • 9. This Isn’t The Only Example
      Food for Thought:
      • Facebook is expected to have more than 1 billion users by August 2012, handles 40 billion photos, and generates 10 TB of log data per day.
      • Twitter has more than 100 million users and generates some 7 TB of tweet data per day.
      • For every trading session, the NYSE captures 1 TB of trade information.
      Conventional Data Warehouses and SQL Databases do not meet the demands of many of today’s applications on 3 key metrics:
      • Volume
      • Variety
      • Velocity
  • 10. Storing Large Data Sets in the Cloud
      • “I want to use Hadoop, but I’m out of capacity in my current Data Warehouse.”
      • If you can’t store the data, you can’t analyze the data.
      • Many customers are choosing to begin their Big Data projects by implementing NoSQL databases to store large volumes of data in a variety of formats (Structured, Unstructured, & Semi-Structured).
  • 11. What is NoSQL?
      • Highly Scalable, Distributed, & Fault Tolerant
      • Designed for use on Commodity Hardware
      • Does NOT use SQL
      • Does NOT Guarantee Immediate Consistency
      NoSQL Databases are an ideal fit when the following criteria are met:
      • Simple Data Models are used.
      • Flexibility is more important than strict control over defined Data Structures.
      • High Performance is a must.
      • Strict Data Consistency is not required.
  • 12. Types of NoSQL Databases
      • Key-Value Store
      • Document Database
      • Column Oriented Database
  • 13. MapReduce
      The MapReduce paradigm consists of three steps:
      1. A mapper function or script goes through your input data and outputs a series of keys and values.
      2. The unordered list of keys is sorted to ensure that all the fragments that have the same key are next to one another in the file.
      3. The reducer stage then goes through the sorted output and receives all of the values that have the same key in a contiguous block.
  • 16. Interacting with Hadoop
      Hive
      • Program Hadoop jobs using SQL.
      • Caution: Because of Hadoop’s focus on large-scale processing, the latency may mean that even simple jobs take minutes to complete, so it’s not a substitute for a real-time transactional database.
      Pig
      • Procedural data processing language designed for Hadoop where you specify a series of steps to perform on the data.
      • Often described as “the duct tape of Big Data” for its usefulness there, and it is often combined with custom streaming code written in a scripting language for more general operations.
  • 17. Key-Value Stores
      • Use a hash table where there is a unique key and a pointer to a particular item of data.
      • Typical Application: Content Caching
      • Example: Redis
  • 18. Document Databases
      • Document databases are essentially the next level of Key-Value stores, allowing nested values associated with each key.
      • The semi-structured documents are stored in formats such as JSON.
      • Typical Applications: Web Apps
      • MongoDB and Couchbase Hadoop Connectors
      • Example: Couchbase, MongoDB
  • 19. MongoDB Hadoop Integration
      Built in MapReduce
      • Built in MapReduce (JavaScript Only)
      • Limited Scalability
      • One JavaScript Implementation at a Time
      Hadoop Connector
      • Integrating MongoDB and Hadoop to Read/Write data to/from MongoDB via Hadoop
  • 20. Column Oriented Database
      • Store and process very large amounts of data distributed over many machines. There are still keys but they point to multiple columns.
      • Typical Application: Distributed File Systems
      • Native Hadoop Integration for HBase and Cassandra
      • Example: Cassandra, HBase
  • 21. Cassandra Hadoop Integration
      • Native Support for Apache Pig and Apache Hive
      • Cassandra's Hadoop support implements the same interface as HDFS to achieve input data locality
      • One thing Cassandra can’t do well yet is MapReduce.
      • MapReduce and related systems such as Pig and Hive work well with HBase because it uses Hadoop HDFS to store its data.
  • 22. My Approach to Learning about using Hadoop in the Cloud… courtesy of IBM
      • Learn It
          • Big Data University
      • Try It
          • BigInsights Basic, Available for Free in the MultiCloud MarketPlace
      • Buy It
          • BigInsights Enterprise for Advanced Functionality
  • 23. How I Learned to use Hadoop in the Cloud
      • Hadoop Fundamentals
      • Hadoop Architecture, MapReduce, and HDFS
      • Using Pig and Hive
      • Using BigInsights in the Cloud with RightScale
      • The Best Part – It’s Free!!
      • http://www.bigdatauniversity.com/
  • 24. BigInsights Basic – Get Started for Free
      • Available in the MultiCloud MarketPlace
      • Free for Data Sets up to 10 TB
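
The three MapReduce steps described on slide 13 (map, sort, reduce) can be sketched in a few lines of plain Python. This is a single-process illustration only, not how Hadoop itself is implemented; the word-count task and the function names `mapper`, `shuffle_sort`, `reducer`, and `word_count` are assumptions chosen for the example.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Step 1: walk the input and emit a (key, value) pair per word."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle_sort(pairs):
    """Step 2: sort by key so pairs with the same key are adjacent."""
    return sorted(pairs, key=itemgetter(0))


def reducer(sorted_pairs):
    """Step 3: read each contiguous run of one key and sum its values."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield (key, sum(value for _, value in group))


def word_count(lines):
    """Chain the three steps; hypothetical helper for this sketch."""
    return dict(reducer(shuffle_sort(mapper(lines))))
```

Calling `word_count(["big data", "big cloud"])` counts "big" twice and the other words once. The reducer depends on the sort step: `groupby` only merges keys that sit next to one another, which is the reason the paradigm sorts before reducing.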

Editor's Notes

  1. The MapReduce Engine consists of one Job Tracker and Task Trackers assigned to every Node. Applications submit jobs to the Job Tracker, and the Job Tracker pushes the jobs to the Task Trackers closest to the data. The Job Tracker knows which node the data is located on, which keeps the work close to the data.
  2. Cassandra has no master node and, hence, no single point of failure.
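
The idea in note 1, pushing work to a Task Tracker on the node that holds the data, can be illustrated with a toy scheduler. This is a simplified sketch under stated assumptions, not the Job Tracker's actual algorithm: the function `assign_tasks`, the block and node names, and the slot counts are all hypothetical, and a real scheduler weighs more factors than this.

```python
def assign_tasks(block_locations, free_slots):
    """Assign each input block to a node, preferring a node that stores it.

    block_locations: dict of block id -> list of nodes holding a replica
    free_slots: dict of node -> number of tasks the node can still accept
    Returns a dict of block id -> chosen node (None if no slot is free).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment = {}
    for block, replicas in block_locations.items():
        # Prefer a node that already holds the data (a data-local task).
        local = [node for node in replicas if slots.get(node, 0) > 0]
        # Otherwise fall back to any node with spare capacity.
        candidates = local or [n for n, free in slots.items() if free > 0]
        node = candidates[0] if candidates else None
        if node is not None:
            slots[node] -= 1
        assignment[block] = node
    return assignment
```

With one free slot each on `n1` and `n2` and none on `n3`, a block stored on `n1` and `n2` runs locally on `n1`, a second block stored only on `n1` falls back to `n2`, and a block stored only on `n3` stays unassigned until a slot frees up.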