Big Data with Hadoop and Cloud Computing http://clean-clouds.com
“Big Data Processing” relevant for Enterprises
• Big Data used to be discarded, or archived without being analyzed.
  – Loss of information, insight, and prospects to extract new value.
• How is Big Data beneficial?
  – Energy companies - Geophysical analysis.
  – Science and medicine - Empiricism is growing relative to experimentation.
  – Disney - Customer behavior patterns across its stores and theme parks.
• Pursuit of a “Competitive Advantage” is the driving factor for Enterprises
  – Data mining (log processing, click stream analysis, similarity algorithms, etc.), financial simulation (Monte Carlo simulation), file processing (resizing JPEGs), web indexing
Researcher’s Blog - http://clean-clouds.com
Cloud Computing ~ brings economy to Big Data Processing
• Big Data Processing can be implemented by HPC & Cloud.
  1) HPC implementation is very costly w.r.t. CAPEX & OPEX.
  2) Cloud Computing is efficient because of its pay-per-use nature.
• The MapReduce programming model is used for processing big data sets.
• Pig, Hive, Hadoop, … are used for Big Data Processing
  – Pig - SQL-like operations that apply to datasets
  – Hive - Performs SQL-like data analysis on data
  – Hadoop - Processes vast amounts of data (focal point)
• Use EC2 instances to analyze “Big Data” in Amazon IaaS.
• Amazon Elastic MapReduce reduces complex setup & management.
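The MapReduce model named on this slide can be illustrated with a minimal, single-process sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical and chosen only for illustration. It assumes a word-count job: the map step emits (word, 1) pairs, the shuffle step groups them by key, and the reduce step sums each group.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    lines = ["big data with hadoop", "big data in the cloud"]
    print(run_job(lines))
```

In a real cluster the map and reduce calls would run in parallel on different machines and the shuffle would move data over the network; here everything runs in one process to keep the three phases visible.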
Cost Comparison of Alternatives
Use case: Analyze Next Generation Sequencing data to understand genetics of cancer.
• Amazon HPC
  – 100 steady & 200 peak-load servers
  – 68.4GB memory, 1690GB storage
  – CAPEX & OPEX
  – Time-consuming set-up
  – Management of Hadoop clusters
• Amazon IaaS
  – 400 reserved, 600 on-demand Standard Extra Large instances
  – 15GB RAM, 1690GB storage
  – Time-consuming set-up
  – Management of Hadoop clusters
• Elastic MapReduce
  – Elastic: 1000 Standard Extra Large instances
  – 15GB RAM, 1690GB storage
  – Elastic, easy to use, reliable
  – Auto turn-off of resources
• Cost figures on the slide: $377395 (shown beside Elastic MapReduce) and $1,746,769
As per Amazon EC2 cost comparison calculator
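The split between reserved and on-demand instances in the comparison above can be expressed as simple arithmetic. The sketch below is only an illustration of that calculation: the function `cluster_cost`, the hourly rates, and the hour counts are all assumptions made up for the example. They are not Amazon prices and they do not reproduce the figures on the slide.

```python
def cluster_cost(reserved, on_demand, hours, peak_hours,
                 reserved_rate, on_demand_rate, reserved_upfront=0.0):
    """Estimate the cost of a steady reserved fleet plus on-demand peak capacity.

    reserved / on_demand: number of instances of each kind
    hours: hours the reserved fleet runs
    peak_hours: hours the on-demand instances run
    *_rate: hourly price per instance (hypothetical values in the example)
    reserved_upfront: one-time fee per reserved instance
    """
    steady = reserved * (reserved_upfront + reserved_rate * hours)
    peak = on_demand * on_demand_rate * peak_hours
    return steady + peak


if __name__ == "__main__":
    # Instance counts come from the slide; every rate and hour count
    # below is a placeholder, not a published price.
    total = cluster_cost(reserved=400, on_demand=600,
                         hours=8760, peak_hours=1000,
                         reserved_rate=0.25, on_demand_rate=0.50)
    print(f"Estimated cost: ${total:,.0f}")
```

With real inputs, the same function could be evaluated once per alternative to compare a fixed fleet against one that only pays for peak capacity while it is in use.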
Future Direction
• Current Experiments & Identified areas
  – Social network analysis
  – Managing data centers
  – Collective Intelligence - Algorithms and visualization techniques
  – Predictive analytics
• Accelerators Exploration
  – Apache Whirr - Cloud-neutral way to run services
  – Apache Mahout - Scalable machine learning library
  – Cascading - Define and execute fault-tolerant data processing workflows
  – Apache Hama - Distributed computing framework
• Exploration of LAMP-like stack for Big Data aggregation, processing and analytics