When working with structured, semi-structured, and unstructured data, there is often a tendency to try to force one tool, either Hadoop or a traditional DBMS, to do all the work. At Vertica, we’ve found that there are reasons to use Hadoop for some analytics projects and Vertica for others; the magic comes in knowing when to use which tool and how the two can work together. Join us as we walk through some of the use cases for combining Hadoop with a purpose-built analytics platform into an effective analytics solution.
4. What is Vertica?
• SQL Database for Real-time Analytics Services
• Runs on x86 hardware
• MPP Columnar Architecture – scales to PBs!
• Extensible analytics capabilities
• Easy to set up and use
• Elastic - grow/shrink as needed
[Diagram labels: Speed, Cloud, Monetize, Real Time, Better Decisions, Statistics, Mobile, Individual Analysis, Simplicity]
5. What Analytics can Vertica do?
SQL
• Window functions
• Graph
• Monte Carlo
• Statistical
• Geospatial
Extended SQL
• Sessionization
• Time series
• Pattern matching
• Event series joins
SDKs
• C++
• R
Check out: https://github.com/vertica/Vertica-Extension-Packages
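To make one of the Extended SQL items concrete: sessionization groups each user's time-ordered events into sessions, starting a new session whenever the gap between consecutive events exceeds a timeout. The sketch below is plain Python for illustration only and is not how Vertica implements it; the function name `sessionize`, the 30-minute timeout, and the sample events are all assumptions.

```python
from collections import defaultdict

def sessionize(events, timeout_seconds=1800):
    """Assign a per-user session id to each (user, timestamp) event.

    A new session starts when the gap since the user's previous event
    exceeds timeout_seconds. Returns (user, timestamp, session_id)
    tuples sorted by user, then by timestamp.
    """
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)

    rows = []
    for user in sorted(by_user):
        session_id = 0
        previous = None
        for ts in sorted(by_user[user]):
            # A long enough pause closes the current session.
            if previous is not None and ts - previous > timeout_seconds:
                session_id += 1
            rows.append((user, ts, session_id))
            previous = ts
    return rows

# Hypothetical click events: (user, seconds since an arbitrary start)
events = [("alice", 0), ("alice", 600), ("alice", 4000), ("bob", 100)]
sessions = sessionize(events)
```

Here the third event for `alice` arrives more than 30 minutes after the second, so it opens a second session for that user.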
6. Who uses Vertica?
600+ Customers worldwide
“… by partnering with Vertica we’re able to provide operators the tools they need to confidently interpret customer experience…”
Steve Kish, Director, Product Management, Empirix
“…being able to run social graph analysis on tables with tens of billions of rows with a fast turn around is amazing…”
Dan McCaffrey, Director of Analytics, Zynga
7. What's different – Hadoop vs Vertica?
Vertica
• Designed for Performance
• SQL
• Interactive Analytics
Hadoop
• Designed for Fault-tolerance
• Map-Reduce
• Batch Analytics
Both
• Purpose-built, Scalable Analytics Platforms
Read: http://www.vertica.com/2011/09/21/counting-triangles/
8. Getting the best of both worlds!
[Architecture diagram. Labels: SQL / ODBC / JDBC; Extensions in C++, R; Vertica Engine; External Tables; Native / User-defined Loads; Vertica Storage; Hadoop/MR Connector; New in Vertica 6]
9. Joint Use Cases
Hadoop for ETL, Vertica for Analytics
• Log parsing / tagging / filtering
• Convert JSON into relational tuples
HDFS for data storage, Vertica + Hadoop for Analytics
• Real-time analytics on Vertica (needs speed)
• Long-running / exploratory analytics on Hadoop (needs fault tolerance)
• Load from HDFS directly to Vertica (needs Vertica 6)
• SQL access to HDFS (needs Vertica 6)
Vertica for data storage, Hadoop as a multi-purpose tool
• Hadoop as a scheduler / load-balancer
• Hadoop to convert to formats for other tools (e.g. STATA)
• Hadoop for Backup via Sqoop
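The first use case above (Hadoop for ETL) amounts to turning raw log lines into rows that a relational table can load. A minimal sketch of the per-record logic a map step might apply, written in plain Python; the field names (`user`, `event`, `ts`), the `COLUMNS` order, and the sample lines are hypothetical, and a real job would run this inside a Hadoop streaming or MapReduce task.

```python
import json

# Hypothetical target schema: column order of the relational table.
COLUMNS = ("user", "event", "ts")

def json_to_tuple(line):
    """Parse one JSON log line into a fixed-width tuple, or None if unusable.

    Missing fields become None so every tuple has the same shape.
    """
    try:
        record = json.loads(line)
    except ValueError:
        return None  # filter out lines that are not valid JSON
    if not isinstance(record, dict):
        return None  # filter out JSON values that are not objects
    return tuple(record.get(col) for col in COLUMNS)

def to_rows(lines):
    """Convert an iterable of raw lines to tuples, dropping unusable ones."""
    rows = (json_to_tuple(line) for line in lines)
    return [row for row in rows if row is not None]

raw = [
    '{"user": "alice", "event": "click", "ts": 1}',
    'not json at all',
    '{"user": "bob", "ts": 2}',
]
rows = to_rows(raw)
```

Padding missing fields with `None` keeps every row the same width, which is what a bulk load into a fixed schema generally expects.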
11. Accelerating Drug Discovery
The problem
• Analyzing gene variants using SNPs and Microarray data
The solution
• Hadoop to find the variants between a sample and a reference genome
• Vertica to determine oncology targets
• Tools: Pipeline Pilot, Spotfire, R
The value
• Queries went from 5 hours to 5 minutes
• Scale to 100s of TB of sequence data
• More experiments => faster discoveries!
12. Digital Consumer Insights
• HDFS to store raw input behavioral data
• Hadoop / MR to find conversions (regexp processing)
• Custom ETL
• Vertica to store & operationalize high value biz data
• Reporting & analytics via Tableau and R
Faster insights delivered more consistently with less administrative overhead, and cheaper hardware!!
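The "find conversions (regexp processing)" step can be pictured as a map-style filter over raw log lines. The sketch below is plain Python under stated assumptions: the log format, the `/checkout/complete` path that marks a conversion, and the names `CONVERSION` and `find_conversions` are all hypothetical and do not come from the deployment described on this slide.

```python
import re

# Hypothetical log format: "<user> <METHOD> <path>". A conversion is assumed
# to be a GET request for /checkout/complete, with an optional query string.
CONVERSION = re.compile(r"^(?P<user>\S+) GET /checkout/complete(?:\?\S*)?$")

def find_conversions(lines):
    """Return the users whose log lines match the conversion pattern."""
    users = []
    for line in lines:
        match = CONVERSION.match(line.strip())
        if match:
            users.append(match.group("user"))
    return users

logs = [
    "u1 GET /home",
    "u2 GET /checkout/complete?order=17",
    "u3 GET /checkout/complete",
    "u4 POST /checkout/complete",
]
converted = find_conversions(logs)
```

Only the matching users would then be carried forward into the relational store for reporting.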
13. On a Privacy Assurance Mission
• Collect user privacy reporting requests into HDFS
• Use MR to process and structure the data into Vertica (ETL)
• Use Vertica to analyze stats for every 3rd party tag on a website.
For Consumers:
• A free browser plugin that can tell you who’s tracking you!
For Advertisers:
• Provide greater transparency to end-users (look for on an a
• Understand impact of 3rd party tags on website performance
14. Social Video
Social Video Analytics / Social Video Advertising
▫ Video analytics – 100+ Leading Pubs
▫ Campaign Measurement – 100+ major brands
▫ Industry-Wide Charts
• Hadoop for batch processing of logs and ETL into Vertica
• Vertica for ad-hoc analytics and interactive dashboards
• Redis KV store for serving low-latency data needs
100s of millions of events collected and processed daily on Petabyte scale infrastructure!
15. Try Vertica for free!
Community Edition
Up to 1 TB limit, 3 nodes!
Check out Vertica Extensions on Github!
16. References and Other Info …
Website: www.vertica.com
Community Edition: http://www.vertica.com/community/
Github: https://github.com/vertica/Vertica-Extension-Packages
Questions or Comments: shilpa@vertica.com
Jobs: resumes@vertica.com (Awesome new location in Cambridge, MA!)
Follow us on Twitter: @slawande, @verticacorp