Combining Hadoop & RDBMS for Large-Scale Big Data Analytics


When working with structured, semi-structured, and unstructured data, there is often a tendency to try to force one tool, either Hadoop or a traditional DBMS, to do all the work. At Vertica, we've found that there are reasons to use Hadoop for some analytics projects and Vertica for others; the magic comes in knowing when to use which tool and how the two can work together. Join us as we walk through use cases for combining Hadoop with a purpose-built analytics platform into an effective, unified analytics solution.

  1. Combining Hadoop & Vertica for Large-Scale Analytics
     Hadoop Summit 2012
     Shilpa Lawande, VP Engineering, Vertica, an HP Company
     ©2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
  2. The Big Data Problem
  3. The (popular) Solution: Vertica's Real-time Analytics + Scale & Flexibility of Hadoop
  4. What is Vertica?
     • SQL database for real-time analytics
     • Runs on x86 hardware
     • MPP columnar architecture – scales to PBs!
     • Extensible analytics capabilities
     • Easy to set up and use
     • Elastic: grow/shrink as needed
     (Diagram labels: Speed, Services, Cloud, Monetize, Real Time, Better Decisions, Statistics, Mobile, Individual Analysis, Simplicity)
  5. What analytics can Vertica do?
     • SQL and extended SQL: window functions, sessionization, time series, pattern matching, event series joins
     • SDKs: C++, R
     • Graph, Monte Carlo, statistical, and geospatial analytics
     Check out:
  6. Who uses Vertica?
     600+ customers worldwide
     “… by partnering with Vertica we’re able to provide operators the tools they need to confidently interpret customer experience…” (Steve Kish, Director, Product Management, Empirix)
     “…being able to run social graph analysis on tables with tens of billions of rows with a fast turn around is amazing…” (Dan McCaffrey, Director of Analytics, Zynga)
  7. What's different: Hadoop vs. Vertica?
     • Vertica: designed for performance; SQL analytics; interactive analytics
     • Hadoop: designed for fault tolerance; Map-Reduce; batch analytics
     • Both: purpose-built, scalable analytics platforms
     Read:
  8. Getting the best of both worlds!
     [Architecture diagram. Components: SQL/Extensions in C++, R; ODBC/JDBC; Vertica Engine; External Tables; Native Loads; User-defined Loads; Vertica Storage; Hadoop/MR Connector. New in Vertica 6.]
  9. Joint Use Cases
     Hadoop for ETL, Vertica for analytics
     • Log parsing / tagging / filtering
     • Convert JSON into relational tuples
     HDFS for data storage, Vertica + Hadoop for analytics
     • Real-time analytics on Vertica (needs speed)
     • Long-running / exploratory analytics on Hadoop (needs fault tolerance)
     • Load from HDFS directly to Vertica (needs Vertica 6)
     • SQL access to HDFS (needs Vertica 6)
     Vertica for data storage, Hadoop as a multi-purpose tool
     • Hadoop as a scheduler / load balancer
     • Hadoop to convert to formats for other tools (e.g., STATA)
     • Hadoop for backup via Sqoop
  10. Customer Stories
  11. Accelerating Drug Discovery
      The problem
      • Analyzing gene variants using SNPs and microarray data
      The solution
      • Hadoop to find the variants between a sample and a reference genome
      • Vertica to determine oncology targets
      • Tools: Pipeline Pilot, Spotfire, R
      The value
      • Queries went from 5 hours to 5 minutes
      • Scale to 100s of TB of sequence data
      • More experiments => faster discoveries!
  12. Digital Consumer Insights
      • HDFS to store raw input behavioral data
      • Hadoop / MR to find conversions (regexp processing)
      • Custom ETL
      • Vertica to store & operationalize high-value biz data
      • Reporting & analytics via Tableau and R
      Faster insights delivered more consistently, with less administrative overhead and cheaper hardware!!
  13. On a Privacy Assurance Mission
      • Collect user privacy reporting requests into HDFS
      • Use MR to process and structure the data into Vertica (ETL)
      • Use Vertica to analyze stats for every 3rd-party tag on a website
      Provide greater transparency to end users.
      For consumers: a free browser plugin that can tell you who's tracking you!
      For advertisers: understand the impact of 3rd-party tags on website performance.
  14. Social Video
      • Social Video Analytics: video analytics for 100+ leading publishers
      • Social Video Advertising: campaign measurement for 100+ major brands
      • Industry-wide charts
      • Hadoop for batch processing of logs and ETL into Vertica
      • Vertica for ad-hoc analytics and interactive dashboards
      • Redis KV store for serving low-latency data needs
      100s of millions of events collected and processed daily on petabyte-scale infrastructure!
  15. Try Vertica for free!
      • Community Edition: up to 1 TB limit, 3 nodes!
      • Check out Vertica Extensions on GitHub!
  16. References and Other Info
      • Website:
      • Community Edition:
      • GitHub:
      • Questions or Comments:
      • Jobs: (Awesome new location in Cambridge, MA!)
      • Follow us on Twitter: @slawande, @verticacorp
  17. Sessions will resume at 2:25pm
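Slide 4 describes Vertica as an MPP columnar database. The toy sketch below illustrates why a column layout helps analytics. It is not Vertica's storage format, and the function names are hypothetical. It pivots rows into one list per column and run-length encodes a sorted, low-cardinality column. It then answers a grouped count directly from the encoded runs, without decoding them.

```python
from itertools import groupby
from typing import Any


def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into one list per column (toy columnar layout)."""
    columns: dict[str, list[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def rle_encode(values: list[Any]) -> list[tuple[Any, int]]:
    """Run-length encode a column into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def count_by_value(encoded: list[tuple[Any, int]]) -> dict[Any, int]:
    """COUNT(*) ... GROUP BY computed directly on the runs, without decoding."""
    counts: dict[Any, int] = {}
    for value, length in encoded:
        counts[value] = counts.get(value, 0) + length
    return counts


# Rows sorted by region, so equal values are adjacent and form long runs.
rows = [
    {"region": "east", "sales": 10},
    {"region": "east", "sales": 5},
    {"region": "west", "sales": 7},
]
columns = to_columns(rows)
region_runs = rle_encode(columns["region"])
print(region_runs)                  # [('east', 2), ('west', 1)]
print(count_by_value(region_runs))  # {'east': 2, 'west': 1}
print(sum(columns["sales"]))        # 22 (a SUM reads only the sales column)
```

The compression benefit depends on sort order. An unsorted column such as `["a", "b", "a"]` produces one run per row and saves nothing.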
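Slide 5 lists sessionization among Vertica's extended SQL analytics. This sketch assumes a common definition: a new session starts whenever the gap between a user's consecutive events exceeds an inactivity timeout. The plain-Python version below applies that rule to `(user, timestamp)` pairs. The 30-minute timeout and the `sessionize` name are hypothetical choices, not Vertica's SQL interface.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # hypothetical 30-minute inactivity timeout, in seconds


def sessionize(events, timeout=SESSION_TIMEOUT):
    """Assign a per-user session number to each (user, timestamp) event.

    A user's first event opens session 0; a gap strictly greater than
    `timeout` seconds since that user's previous event opens the next one.
    Returns (user, timestamp, session_id) tuples ordered by user, then time.
    """
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)

    result = []
    for user in sorted(by_user):
        session_id = 0
        previous = None
        for ts in sorted(by_user[user]):
            if previous is not None and ts - previous > timeout:
                session_id += 1  # inactivity gap too long: start a new session
            result.append((user, ts, session_id))
            previous = ts
    return result


# Input order does not matter; events are grouped and sorted per user.
print(sessionize([("a", 4000), ("b", 100), ("a", 0), ("a", 600)]))
# [('a', 0, 0), ('a', 600, 0), ('a', 4000, 1), ('b', 100, 0)]
```

A SQL engine typically expresses the same rule with window functions: compare each event with the previous one for the same user, then take a running count of the "new session" flags.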
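The first pattern on slide 9 uses Hadoop for ETL: filtering logs and converting JSON into relational tuples before loading them into Vertica. The sketch below covers only the per-record transformation a mapper might apply. It is written in plain Python rather than against a Hadoop API. The field names (`user`, `event`, `ts`) and the pipe-delimited output are assumptions for illustration.

```python
import json

COLUMNS = ("user", "event", "ts")  # hypothetical target table columns


def json_line_to_tuple(line):
    """Parse one JSON log line into a tuple ordered like COLUMNS.

    Returns None for malformed lines, non-object JSON, or records missing
    a required field, so the caller can filter them out.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or any(col not in record for col in COLUMNS):
        return None
    return tuple(record[col] for col in COLUMNS)


def to_delimited(rows, sep="|"):
    """Render tuples as delimited text lines suitable for a bulk loader."""
    return "\n".join(sep.join(str(value) for value in row) for row in rows)


lines = [
    '{"user": "u1", "event": "click", "ts": 1}',
    "not json",
    '{"user": "u2", "ts": 2}',  # missing "event": filtered out
]
rows = [t for t in map(json_line_to_tuple, lines) if t is not None]
print(to_delimited(rows))  # u1|click|1
```

This sketch does not escape values that contain the separator. A real pipeline would need to escape them or choose a delimiter that cannot appear in the data.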
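Slide 11 uses Hadoop to find variants between a sample and a reference genome. Real variant calling works on aligned reads with quality scores. This deliberately simplified sketch instead assumes the sample is already aligned to the reference as an equal-length string. It reports single-base mismatches as SNP candidates. The work is split into a per-chunk "map" step and a merge, mirroring how it could be spread across Hadoop tasks. The function names and chunk size are hypothetical.

```python
def find_mismatches(reference, sample, offset=0):
    """Map step: report (position, ref_base, sample_base) where bases differ.

    Assumes both strings are aligned and equally long. An 'N' (unknown base)
    in the sample is skipped. `offset` turns chunk-local indices into genome
    coordinates.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (offset + i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if sample_base != "N" and ref_base != sample_base
    ]


def call_variants(reference, sample, chunk_size=4):
    """Split the sequence into chunks (one per hypothetical task) and merge results."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for start in range(0, len(reference), chunk_size):
        end = start + chunk_size
        variants.extend(find_mismatches(reference[start:end], sample[start:end], offset=start))
    return variants


print(call_variants("ACGTACGTAC", "ACGAACTTNC"))  # [(3, 'T', 'A'), (6, 'G', 'T')]
```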
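Slide 12 uses Hadoop/MapReduce with regular expressions to find conversions in raw behavioral logs. The sketch below imitates that map/reduce shape in plain Python rather than the Hadoop API. The log format, the `/checkout/complete` URL that counts as a conversion, and the function names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical log line format: "<user_id> <timestamp> <url>"
LOG_LINE = re.compile(r"^(?P<user>\S+)\s+(?P<ts>\d+)\s+(?P<url>\S+)$")
# Hypothetical rule: a conversion is a request for the checkout-complete page.
CONVERSION_URL = re.compile(r"^/checkout/complete(?:\?|$)")


def map_line(line):
    """Map: emit (user, 1) for each conversion line; skip unparseable lines."""
    match = LOG_LINE.match(line.strip())
    if match and CONVERSION_URL.match(match.group("url")):
        yield match.group("user"), 1


def reduce_counts(pairs):
    """Reduce: sum emitted counts per user (a dict stands in for the shuffle)."""
    totals = defaultdict(int)
    for user, count in pairs:
        totals[user] += count
    return dict(totals)


def count_conversions(lines):
    return reduce_counts(pair for line in lines for pair in map_line(line))


logs = [
    "u1 100 /home",
    "u1 160 /checkout/complete",
    "u2 200 /checkout/complete?order=7",
    "garbage",
    "u1 300 /checkout/complete",
]
print(count_conversions(logs))  # {'u1': 2, 'u2': 1}
```

On a real cluster, the framework would run `map_line` over input splits in parallel. It would then group the emitted pairs by key before the reduce step.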