Big Data with Hadoop and Cloud Computing
Usage Rights: CC Attribution-NonCommercial-ShareAlike License

Big Data with Hadoop and Cloud Computing Presentation Transcript

  • 1. Big Data with Hadoop and Cloud Computing
  • 2. Why “Big Data Processing” is relevant for Enterprises
    • Big Data used to be discarded, or archived un-analyzed.
      – Loss of information, insight, and prospects to extract new value.
    • How is Big Data beneficial?
      – Energy companies: geophysical analysis.
      – Science and medicine: empiricism is growing faster than experimentation.
      – Disney: customer behavior patterns across its stores and theme parks.
    • Pursuit of a “competitive advantage” is the driving factor for enterprises.
      – Data mining (log processing, click-stream analysis, similarity algorithms, etc.), financial simulation (Monte Carlo simulation), file processing (resizing JPEGs), web indexing.
    Researcher’s Blog -
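Slide 2 names Monte Carlo simulation as a typical enterprise workload. As a minimal illustrative sketch (not from the deck itself), here is the classic Monte Carlo estimate of pi, which shows the embarrassingly parallel structure that makes such simulations a good fit for cloud clusters:

```python
import random

def estimate_pi(samples: int, seed: int = 42) -> float:
    """Monte Carlo estimate of pi: sample random points in the unit
    square and count the fraction that land inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # (area of quarter circle) / (area of unit square) = pi / 4
    return 4.0 * inside / samples

if __name__ == "__main__":
    print(estimate_pi(100_000))  # approximately 3.14
```

Each batch of samples is independent, so the work splits naturally across many cheap pay-per-use instances.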
  • 3. Cloud Computing brings economy to Big Data Processing
    • Big Data processing can be implemented on HPC or in the cloud.
      1) An HPC implementation is very costly w.r.t. CAPEX & OPEX.
      2) Cloud computing is efficient because of its pay-per-use nature.
    • The MapReduce programming model is used for processing big data sets.
    • Pig, Hive, Hadoop, … are used for big data processing:
      – Pig: high-level dataflow language (Pig Latin) for operations that apply to datasets.
      – Hive: perform SQL-like data analysis on data.
      – Hadoop: processes vast amounts of data (focal point).
    • Use EC2 instances to analyze “Big Data” on Amazon IaaS.
    • Amazon Elastic MapReduce reduces complex setup & management.
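To make the MapReduce model mentioned on slide 3 concrete, the following sketch simulates its three phases (map, shuffle, reduce) in plain Python on the canonical word-count example. This is an illustration of the programming model only, not Hadoop's actual distributed implementation:

```python
from collections import defaultdict
from typing import Iterator

def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit a (word, 1) pair for every word in an input line.
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs) -> dict:
    # Shuffle: group values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list) -> tuple[str, int]:
    # Reducer: sum the counts emitted for one word.
    return (key, sum(values))

def word_count(lines) -> dict:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

print(word_count(["big data with hadoop", "big data processing"]))
# {'big': 2, 'data': 2, 'with': 1, 'hadoop': 1, 'processing': 1}
```

In Hadoop, the mapper and reducer run on many machines and the shuffle happens over the network; the programmer supplies only the two functions.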
  • 4. Cost Comparison of Alternatives
    Use case: analyze Next Generation Sequencing data to understand the genetics of cancer.
    – HPC: 100 steady-state and 200 peak-load servers (68.4 GB memory, 1690 GB storage). Drawbacks: CAPEX & OPEX, time-consuming set-up, management of Hadoop clusters.
    – Amazon IaaS: 400 reserved and 600 on-demand Standard Extra Large instances (15 GB RAM, 1690 GB storage). Cost: $1,746,769. Drawbacks: time-consuming set-up, management of Hadoop clusters.
    – Amazon Elastic MapReduce: elastic, 1000 Standard Extra Large instances (15 GB RAM, 1690 GB storage). Cost: $377,395. Elastic, easy to use, reliable resources, auto turn-off.
    As per the Amazon EC2 cost comparison calculator.
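A quick check of the arithmetic implied by slide 4's two dollar figures (only the two costs are from the slide; the derived savings and ratio are computed here):

```python
iaas_cost = 1_746_769  # Amazon IaaS cost, from the slide
emr_cost = 377_395     # Amazon Elastic MapReduce cost, from the slide

savings = iaas_cost - emr_cost
ratio = iaas_cost / emr_cost

print(f"savings: ${savings:,}")          # savings: $1,369,374
print(f"EMR is {ratio:.1f}x cheaper")    # EMR is 4.6x cheaper
```

So for this use case, Elastic MapReduce comes out roughly 4.6 times cheaper than running the same instances as plain IaaS.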
  • 5. Future Direction
    • Current experiments & identified areas:
      – Social network analysis
      – Managing data centers
      – Collective intelligence: algorithms and visualization techniques
      – Predictive analytics
    • Accelerators under exploration:
      – Apache Whirr: cloud-neutral way to run services
      – Apache Mahout: scalable machine-learning library
      – Cascading: define and execute fault-tolerant data processing workflows
      – HAMA: distributed computing framework (Bulk Synchronous Parallel)
    • Exploration of a LAMP-like stack for Big Data aggregation, processing, and analytics.