Combining Hadoop & RDBMS for Large-Scale Big Data Analytics

When working with structured, semi-structured, and unstructured data, there is often a tendency to try to force one tool – either Hadoop or a traditional DBMS – to do all the work. At Vertica, we’ve found that there are reasons to use Hadoop for some analytics projects and Vertica for others. The magic comes in knowing when to use which tool and how the two can work together. Join us as we walk through some of the use cases for pairing Hadoop with a purpose-built analytics platform to build an effective, combined analytics solution.


Usage Rights

© All Rights Reserved

Presentation Transcript

  • Combining Hadoop & Vertica for Large-Scale Analytics. Hadoop Summit 2012. Shilpa Lawande, VP Engineering, Vertica, an HP Company. ©2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
  • The Big Data Problem
  • The (popular) Solution: Vertica’s Real-time Analytics + Scale & Flexibility of Hadoop
  • What is Vertica?
    • SQL database for real-time analytics
    • Runs on x86 hardware
    • MPP columnar architecture – scales to PBs!
    • Extensible analytics capabilities
    • Easy to set up and use
    • Elastic: grow/shrink as needed
    (Slide graphic labels: Speed, Services, Cloud, Monetize, Real Time, Better Decisions, Statistics, Mobile, Individual, Analysis, Simplicity)
  • What Analytics can Vertica do?
    • SQL: window functions, graph, Monte Carlo, statistical, geospatial
    • Extended SQL: sessionization, time series, pattern matching, event series joins
    • SDKs: C++, R
    Check out: https://github.com/vertica/Vertica-Extension-Packages
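To make "sessionization" from the slide above concrete, here is a minimal sketch of the idea in plain Python. It is not Vertica's implementation or syntax: the `sessionize` function, the `(user, timestamp)` event shape, and the 30-minute timeout are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter


def sessionize(events, timeout=timedelta(minutes=30)):
    """Assign a per-user session number to (user, timestamp) events.

    Hypothetical helper: a new session starts whenever the gap since the
    same user's previous event exceeds `timeout`.
    Returns a list of (user, timestamp, session_id) tuples.
    """
    out = []
    # Sort by user, then by time, so each user's events are contiguous.
    ordered = sorted(events, key=itemgetter(0, 1))
    for user, group in groupby(ordered, key=itemgetter(0)):
        session_id = 0
        prev = None
        for _, ts in group:
            if prev is not None and ts - prev > timeout:
                session_id += 1  # gap too long: start a new session
            out.append((user, ts, session_id))
            prev = ts
    return out
```

In a database this kind of logic is typically expressed as a window function over events partitioned by user and ordered by time; the loop above only spells out the same rule step by step.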
  • Who uses Vertica? 600+ customers worldwide.
    • “… by partnering with Vertica we’re able to provide operators the tools they need to confidently interpret customer experience…” Steve Kish, Director, Product Management, Empirix
    • “…being able to run social graph analysis on tables with tens of billions of rows with a fast turn around is amazing…” Dan McCaffrey, Director of Analytics, Zynga
  • What's different – Hadoop vs Vertica?
    • Vertica: designed for performance; SQL; interactive analytics
    • Hadoop: designed for fault-tolerance; Map-Reduce; batch analytics
    • Both: purpose-built scalable analytics platforms
    Read: http://www.vertica.com/2011/09/21/counting-triangles/
  • Getting the best of both worlds!
    (Architecture diagram labels: SQL / ODBC / JDBC; Extensions in C++, R; Vertica Engine; External Tables; User-defined Loads; Native Vertica Storage; Hadoop/MR Connector; "New in Vertica 6")
  • Joint Use Cases
    • Hadoop for ETL, Vertica for Analytics
      • Log parsing / tagging / filtering
      • Convert JSON into relational tuples
    • HDFS for data storage, Vertica + Hadoop for Analytics
      • Real-time analytics on Vertica (needs speed)
      • Long-running / exploratory analytics on Hadoop (needs fault tolerance)
      • Load from HDFS directly to Vertica (needs Vertica 6)
      • SQL access to HDFS (needs Vertica 6)
    • Vertica for data storage, Hadoop as a multi-purpose tool
      • Hadoop as a scheduler / load-balancer
      • Hadoop to convert to formats for other tools (e.g. STATA)
      • Hadoop for Backup via Sqoop
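The "convert JSON into relational tuples" ETL step above could look roughly like the following sketch. It is an illustration only: the column names in `COLUMNS` and the `json_lines_to_rows` helper are hypothetical, and a real job would run this logic inside a Hadoop mapper rather than over an in-memory list.

```python
import json

# Hypothetical target-table columns; not taken from the presentation.
COLUMNS = ("user_id", "event", "ts")


def json_lines_to_rows(lines, columns=COLUMNS):
    """Yield one tuple per valid JSON object, with fields in column order.

    Missing fields become None. Blank lines, malformed JSON, and
    non-object values are skipped (a simple form of filtering).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # drop lines that are not valid JSON
        if not isinstance(record, dict):
            continue  # only JSON objects map cleanly onto a row
        yield tuple(record.get(col) for col in columns)
```

The resulting fixed-width tuples can then be written out as delimited text, which is the kind of input a relational bulk loader generally expects.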
  • Customer Stories
  • Accelerating Drug Discovery
    • The problem
      • Analyzing gene variants using SNPs and Microarray data
    • The solution
      • Hadoop to find the variants between a sample and a reference genome
      • Vertica to determine oncology targets
      • Tools: Pipeline Pilot, Spotfire, R
    • The value
      • Queries went from 5 hours to 5 minutes
      • Scale to 100s of TB of sequence data
      • More experiments => faster discoveries!
  • Digital Consumer Insights
    • HDFS to store raw input behavioral data
    • Hadoop / MR to find conversions (regexp processing)
    • Custom ETL
    • Vertica to store & operationalize high value biz data
    • Reporting & analytics via Tableau and R
    • Faster insights delivered more consistently with less administrative overhead, and cheaper hardware!!
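As a rough illustration of "Hadoop / MR to find conversions (regexp processing)", the sketch below imitates a map step and a reduce step in plain Python. The log-line format, the `CONVERSION` pattern, and the function names are all assumptions made up for this example; the customer's actual pipeline is not described in the slides.

```python
import re
from collections import defaultdict

# Hypothetical rule: a "conversion" is a hit on an order-confirmation URL.
CONVERSION = re.compile(r"user=(?P<user>\w+)\s+GET\s+/checkout/confirm\b")


def map_line(line):
    """Map step: emit (user, 1) for each log line that is a conversion."""
    match = CONVERSION.search(line)
    if match:
        yield match.group("user"), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_conversions(lines):
    """Run the map and reduce steps locally over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(pairs)
```

On a cluster the map and reduce functions would run as separate distributed tasks with a shuffle in between; running them in one process here only shows the shape of the computation.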
  • On a Privacy Assurance Mission
    • Collect user privacy reporting requests into HDFS
    • Use MR to process and structure the data into Vertica (ETL)
    • Use Vertica to analyze stats for every 3rd party tag on a website
    • For Consumers: a free browser plugin that can tell you who’s tracking you!
    • For Advertisers: provide greater transparency to end-users (look for ... on an a...); understand impact of 3rd party tags on website performance
  • Social Video
    • Social Video Analytics / Social Video Advertising
      ▫ Video analytics – 100+ Leading Pubs
      ▫ Campaign Measurement – 100+ major brands
      ▫ Industry-Wide Charts
    • Hadoop for batch processing of logs and ETL into Vertica
    • Vertica for ad-hoc analytics and interactive dashboards
    • Redis KV store for serving low-latency data needs
    • 100s of millions of events collected and processed daily on Petabyte scale infrastructure!
  • Try Vertica for free! Community Edition: up to 1 TB limit, 3 nodes! Check out Vertica Extensions on GitHub!
  • References and Other Info
    • Website: www.vertica.com
    • Community Edition: http://www.vertica.com/community/
    • GitHub: https://github.com/vertica/Vertica-Extension-Packages
    • Questions or comments: shilpa@vertica.com
    • Jobs: resumes@vertica.com (Awesome new location in Cambridge, MA!)
    • Follow us on Twitter: @slawande, @verticacorp