Big Data with Hadoop and Cloud Computing

Transcript

  • 1. Big Data with Hadoop and Cloud Computing http://clean-clouds.com
  • 2. Why “Big Data Processing” is relevant for Enterprises
    – Big Data used to be discarded, or archived un-analyzed — a loss of information, insight, and prospects to extract new value.
    – How is Big Data beneficial?
      – Energy companies: geophysical analysis.
      – Science and medicine: empiricism is growing faster than experimentation.
      – Disney: customer behavior patterns across its stores and theme parks.
    – Pursuit of a “Competitive Advantage” is the driving factor for Enterprises: data mining (log processing, click-stream analysis, similarity algorithms, etc.), financial simulation (Monte Carlo simulation), file processing (resizing JPEGs), web indexing.
    Researcher’s Blog - http://clean-clouds.com
  • 3. Cloud Computing brings economy to Big Data Processing
    – Big Data Processing can be implemented on HPC or on the Cloud.
      1) An HPC implementation is very costly w.r.t. CAPEX and OPEX.
      2) Cloud Computing is efficient because of its pay-per-use nature.
    – The MapReduce programming model is used for processing big data sets.
    – Pig, Hive, Hadoop, … are used for Big Data processing:
      – Pig: SQL-like operations that apply to datasets.
      – Hive: perform SQL-like data analysis on data.
      – Hadoop: processes vast amounts of data (the focal point here).
    – Use EC2 instances to analyze Big Data on Amazon IaaS.
    – Amazon Elastic MapReduce reduces complex setup and management.
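The MapReduce model the slide refers to can be illustrated with a minimal, self-contained sketch: a word count expressed as separate map, shuffle, and reduce phases. This is a single-process illustration of the programming model only, not Hadoop's distributed implementation; all function names are our own.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group intermediate values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in groups.items()}

documents = ["big data with hadoop", "cloud computing with hadoop"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```

On a real Hadoop cluster the map and reduce functions run in parallel across nodes, and the shuffle moves data over the network; the phase boundaries, however, are exactly the ones sketched above.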
  • 4. Cost Comparison of Alternatives
    Use case: analyze Next Generation Sequencing data to understand the genetics of cancer.
    – HPC: 100 steady and 200 peak-load servers, 68.4GB memory, 1690GB storage.
      Drawbacks: CAPEX and OPEX; time-consuming set-up; management of Hadoop clusters.
    – Amazon IaaS: 400 reserved and 600 on-demand Standard Extra Large instances (15GB RAM, 1690GB storage); cost $1,746,769.
      Drawbacks: time-consuming set-up; management of Hadoop clusters.
    – Amazon Elastic MapReduce: elastic, 1000 Standard Extra Large instances (15GB RAM, 1690GB storage); cost $377,395.
      Benefits: elastic, easy to use, reliable resources; auto turnoff.
    As per the Amazon EC2 cost comparison calculator.
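The savings implied by the two dollar figures on this slide are easy to check with a few lines of arithmetic. The figures are taken from the slide (as reported by the Amazon EC2 cost comparison calculator); the variable names are ours.

```python
# Figures from the slide, in USD.
iaas_cost = 1_746_769  # self-managed Hadoop on Amazon IaaS
emr_cost = 377_395     # Amazon Elastic MapReduce

savings = iaas_cost - emr_cost
savings_pct = savings / iaas_cost * 100
print(f"EMR saves ${savings:,} (~{savings_pct:.0f}% cheaper than self-managed IaaS)")
```

In other words, for this workload the managed Elastic MapReduce service comes in at roughly a fifth of the cost of running the same Hadoop cluster on raw IaaS instances.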
  • 5. Future Direction
    – Current experiments and identified areas:
      – Social network analysis
      – Managing data centers
      – Collective intelligence: algorithms and visualization techniques
      – Predictive analytics
    – Accelerators under exploration:
      – Apache Whirr: a cloud-neutral way to run services
      – Apache Mahout: a scalable machine-learning library
      – Cascading: a distributed computing framework
      – HAMA: a Bulk Synchronous Parallel (BSP) computing framework
    – Exploration of a LAMP-like stack for Big Data aggregation, processing, and analytics.