SlideShare a Scribd company logo
1 of 17
COMBINING HADOOP &
VERTICA FOR LARGE SCALE
ANALYTICS
Hadoop Summit 2012




Shilpa Lawande
VP Engineering, Vertica, an HP Company
1
    ©2011 Hewlett-Packard Development Company, L.P.
    The information contained herein is subject to change without notice
The Big Data Problem




2
The (popular) Solution




    Vertica’s Real-time Analytics
                  +
    Scale & Flexibility of Hadoop
3
What is Vertica?
                                                                      Speed

    •    SQL Database for Real-time Analytics         Services
          Cloud

    •    Runs on x86 hardware
    •    MPP Columnar Architecture – scales to PBs!
                  Monetize
                                             Real
                                             Time


    •    Extensible analytics capabilities
    •    Easy to setup and use
    •    Elastic - grow/shrink as needed
                   Better
                  Decisions
                                         Statistics



         Mobile                                       Individual
                              Analysis




                                                                   Simplicity
4
What Analytics can Vertica do?




     SQL                   Extended
                           SQL
     •  Window
        functions          •  Sessionization           SDKs
     •  Graph              •  Time series              •  C++
     •  Monte Carlo        •  Pattern                  •  R
                              matching
     •  Statistical
                           •  Event series
     •  Geospatial
                              joins




5   Check out: https://github.com/vertica/Vertica-Extension-Packages
Who uses Vertica?

        600+
    Customers worldwide     “… by partnering with Vertica
                            we’re able to provide operators the
                            tools they need to confidently
                            interpret customer experience…”
                                   Steve Kish, Director, Product Management Empirix




                              “…being able to run social graph
                            analysis on tables with tens of billions
                              of rows with a fast turn around is
                                         amazing…”
                                    Dan McCaffrey, Director of Analytics, Zynga




6
What's different – Hadoop vs Vertica?




    Vertica                                       Hadoop
                                  Both
    •  Designed for           Purpose-built       •  Designed for
       Performance              Scalable             Fault-tolerance
    •  SQL                      Analytics         •  Map-Reduce
    •  Interactive             Platforms          •  Batch Analytics
       Analytics




        Read: http://www.vertica.com/2011/09/21/counting-triangles/
7
Getting the best of both worlds!

                             SQL/                                           Extensions
                                           	
  	
                           In C++, R
                             ODBC/         	
  	
  	
  Ver%ca	
  	
  
                             JDBC          	
  	
  	
  	
  Engine	
  




                                                                        External Tables
                                       Native




                                             User-defined Loads
                      Ver%ca	
  	
  
                      Storage	
  
   8                                       Hadoop/MR Connector
New in Vertica 6
Joint Use Cases

Hadoop for ETL, Vertica for Analytics
    •  Logparsing / tagging / filtering
    •  Convert JSON into relational tuples

HDFS for data storage, Vertica + Hadoop for Analytics
    •  Real-timeanalytics on Vertica (needs speed)
    •  Long-running / exploratory analytics on Hadoop (needs fault tolerance)
    •  Load from HDFS directly to Vertica (needs Vertica 6)
    •  SQL access to HDFS (needs Vertica 6)

Vertica for data storage, Hadoop as a multi-purpose tool
    •  Hadoop as a scheduler / load-balancer
    •  Hadoop to convert to formats for other tools (e.g. STATA)
    •  Hadoop for Backup via Sqoop


9
Customer Stories




10
Accelerating Drug Discovery




                                     The solution
•  Analyzing gene                                                •  Queries went from 5
   variants using SNPs                                              hours to 5 minutes
   and Microarray data   •  Hadoop to find the variants          •  Scale to 100s of TB of
                            between a sample sequence               data
                            and a reference genome               •  More experiments =>
                         •  Vertica to determine oncology           faster discoveries!
                            targets
                         •  Tools: Pipeline Pilot, Spotfire, R
     The problem                                                       The value


11
Digital Consumer Insights


     •  HDFS to store raw    •  Vertica to store &    Faster insights
     input behavioral data   operationalize high     delivered more
     •  Hadoop / MR to       value biz data          consistently with less
     find conversions        •  Reporting &          administrative
     (regexp processing)     analytics via Tableau   overhead, and
                             and R                   cheaper hardware!!
                             •  Custom ETL




12
On a Privacy Assurance Mission


             Collect user              Use MR to                 Use Vertica to
          privacy reporting           process and               analyze stats for
            requests into          structure the data            every 3rd party
               HDFS                into Vertica (ETL)          tag on a website.




     For Consumers:                         For Advertisers:

                                            Provide greater transparency
                                            to end-users    (look for               on an a
     A free browser plugin that
     can tell you who’s tracking            Understand impact of 3rd party tags on
     you!                                   website performance

13
Social Video
      Social Video Analytics                                     Social Video Advertising




▫  Video analytics – 100+ Leading Pubs
                                                    Hadoop for batch processing
                                                    of logs and ETL into Vertica
       ▫ Campaign Measurement – 100+ major brands
                                                    Vertica for ad-hoc analytics
                                                    and interactive dashboards
▫ Industry-Wide Charts


                                                    Redis KV store for serving
                                                    low-latency data needs


                     100s of millions of events collected and processed
                           daily on Petabyte scale infrastructure!


14
Try Vertica for free!



     Community Edition

     Up to 1 TB limit, 3 nodes!

     Check out Vertica
       Extensions on Github!




15
References and Other Info …


 Website:    www.vertica.com

 Community Edition: http://www.vertica.com/community/

 Github:    https://github.com/vertica/Vertica-Extension-Packages

 Questions or Comments: shilpa@vertica.com

 Jobs: resumes@vertica.com     (Awesome new location in Cambridge,
 MA!)

 Follow us on Twitter: @slawande, @verticacorp

16
Sessions will resume at 2:25pm




                             Page 17

More Related Content

What's hot

End-to-end Machine Learning Pipelines with HP Vertica and Distributed R
End-to-end Machine Learning Pipelines with HP Vertica and Distributed REnd-to-end Machine Learning Pipelines with HP Vertica and Distributed R
End-to-end Machine Learning Pipelines with HP Vertica and Distributed RJorge Martinez de Salinas
 
How Salesforce.com uses Hadoop
How Salesforce.com uses HadoopHow Salesforce.com uses Hadoop
How Salesforce.com uses HadoopNarayan Bharadwaj
 
Modernizing your Application Architecture with Microservices
Modernizing your Application Architecture with MicroservicesModernizing your Application Architecture with Microservices
Modernizing your Application Architecture with Microservicesconfluent
 
Trafodion overview
Trafodion overviewTrafodion overview
Trafodion overviewRohit Jain
 
Cascading User Group Meet
Cascading User Group MeetCascading User Group Meet
Cascading User Group MeetVinoth Kannan
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
 
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Ontico
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Rittman Analytics
 
R and Big Data using Revolution R Enterprise with Hadoop
R and Big Data using Revolution R Enterprise with HadoopR and Big Data using Revolution R Enterprise with Hadoop
R and Big Data using Revolution R Enterprise with HadoopRevolution Analytics
 
2 - Trafodion and Hadoop HBase
2 - Trafodion and Hadoop HBase2 - Trafodion and Hadoop HBase
2 - Trafodion and Hadoop HBaseRohit Jain
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicDataWorks Summit
 
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...DataWorks Summit
 
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...MapR Technologies
 
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, ParisHadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, ParisOW2
 
Real World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionReal World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
 
Delivering on the Hadoop/HBase Integrated Architecture
Delivering on the Hadoop/HBase Integrated ArchitectureDelivering on the Hadoop/HBase Integrated Architecture
Delivering on the Hadoop/HBase Integrated ArchitectureDataWorks Summit
 
APAC Big Data Strategy RadhaKrishna Hiremane
APAC Big Data  Strategy RadhaKrishna  HiremaneAPAC Big Data  Strategy RadhaKrishna  Hiremane
APAC Big Data Strategy RadhaKrishna HiremaneIntelAPAC
 
The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...DataWorks Summit
 

What's hot (20)

End-to-end Machine Learning Pipelines with HP Vertica and Distributed R
End-to-end Machine Learning Pipelines with HP Vertica and Distributed REnd-to-end Machine Learning Pipelines with HP Vertica and Distributed R
End-to-end Machine Learning Pipelines with HP Vertica and Distributed R
 
How Salesforce.com uses Hadoop
How Salesforce.com uses HadoopHow Salesforce.com uses Hadoop
How Salesforce.com uses Hadoop
 
Modernizing your Application Architecture with Microservices
Modernizing your Application Architecture with MicroservicesModernizing your Application Architecture with Microservices
Modernizing your Application Architecture with Microservices
 
Trafodion overview
Trafodion overviewTrafodion overview
Trafodion overview
 
Cascading User Group Meet
Cascading User Group MeetCascading User Group Meet
Cascading User Group Meet
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
 
Big Data Ready Enterprise
Big Data Ready Enterprise Big Data Ready Enterprise
Big Data Ready Enterprise
 
R and Big Data using Revolution R Enterprise with Hadoop
R and Big Data using Revolution R Enterprise with HadoopR and Big Data using Revolution R Enterprise with Hadoop
R and Big Data using Revolution R Enterprise with Hadoop
 
2 - Trafodion and Hadoop HBase
2 - Trafodion and Hadoop HBase2 - Trafodion and Hadoop HBase
2 - Trafodion and Hadoop HBase
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
 
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
 
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
 
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, ParisHadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
 
Real World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionReal World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in Production
 
Delivering on the Hadoop/HBase Integrated Architecture
Delivering on the Hadoop/HBase Integrated ArchitectureDelivering on the Hadoop/HBase Integrated Architecture
Delivering on the Hadoop/HBase Integrated Architecture
 
APAC Big Data Strategy RadhaKrishna Hiremane
APAC Big Data  Strategy RadhaKrishna  HiremaneAPAC Big Data  Strategy RadhaKrishna  Hiremane
APAC Big Data Strategy RadhaKrishna Hiremane
 
The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...
 

Viewers also liked

Big Data in Disease Management
Big Data in Disease ManagementBig Data in Disease Management
Big Data in Disease ManagementInterpretOmics
 
Contrato servicio profesionales pdvsa
Contrato servicio profesionales pdvsaContrato servicio profesionales pdvsa
Contrato servicio profesionales pdvsaArvelis Reyes
 
Buscadores
Buscadores Buscadores
Buscadores PauColdR
 
TERRENOS DE PLANTACIONES DE PIÑA
TERRENOS DE PLANTACIONES DE PIÑATERRENOS DE PLANTACIONES DE PIÑA
TERRENOS DE PLANTACIONES DE PIÑAyensen rioja
 
Expressions pour utiliser en cours de français2
Expressions pour utiliser en cours de français2Expressions pour utiliser en cours de français2
Expressions pour utiliser en cours de français2Francés Sari
 
la voix passive - B2 vp eoi
la voix passive - B2 vp eoila voix passive - B2 vp eoi
la voix passive - B2 vp eoivirginie gomez
 
Integrated Communications Summer Internship 2016
Integrated Communications Summer Internship 2016Integrated Communications Summer Internship 2016
Integrated Communications Summer Internship 2016Heather Ellis
 
Lomce datorkigu /Lo que nos viene con la Lomce
Lomce datorkigu /Lo que nos viene con la LomceLomce datorkigu /Lo que nos viene con la Lomce
Lomce datorkigu /Lo que nos viene con la Lomceeguzsal
 
Digital Ethnography: New Ways of Knowing Ourselves and Our Culture
Digital Ethnography: New Ways of Knowing Ourselves and Our CultureDigital Ethnography: New Ways of Knowing Ourselves and Our Culture
Digital Ethnography: New Ways of Knowing Ourselves and Our CultureRuss Nelson
 
Digital Strategy Recommendations Written Proposal - Allo Communications
Digital Strategy Recommendations Written Proposal - Allo CommunicationsDigital Strategy Recommendations Written Proposal - Allo Communications
Digital Strategy Recommendations Written Proposal - Allo CommunicationsHeather Ellis
 
la voix passive - B2 vp eoi
la voix passive - B2 vp eoila voix passive - B2 vp eoi
la voix passive - B2 vp eoivirginie gomez
 
Argan oil liquid gold with magical benefits
Argan oil  liquid gold with magical benefitsArgan oil  liquid gold with magical benefits
Argan oil liquid gold with magical benefitsClintech Tech
 
PLC-SCADA and automation
PLC-SCADA and automationPLC-SCADA and automation
PLC-SCADA and automationGaurav Sharma
 
PLC and SCADA in Industrial Automation
PLC and SCADA in Industrial AutomationPLC and SCADA in Industrial Automation
PLC and SCADA in Industrial AutomationNikhil nnk
 
Slideshare.Com Powerpoint
Slideshare.Com PowerpointSlideshare.Com Powerpoint
Slideshare.Com Powerpointguested929b
 

Viewers also liked (20)

Grammaire intermediare
Grammaire   intermediareGrammaire   intermediare
Grammaire intermediare
 
Big Data in Disease Management
Big Data in Disease ManagementBig Data in Disease Management
Big Data in Disease Management
 
Contrato servicio profesionales pdvsa
Contrato servicio profesionales pdvsaContrato servicio profesionales pdvsa
Contrato servicio profesionales pdvsa
 
updated resume 2016 - QS
updated resume 2016 - QSupdated resume 2016 - QS
updated resume 2016 - QS
 
Real madrid
Real madrid Real madrid
Real madrid
 
Buscadores
Buscadores Buscadores
Buscadores
 
TERRENOS DE PLANTACIONES DE PIÑA
TERRENOS DE PLANTACIONES DE PIÑATERRENOS DE PLANTACIONES DE PIÑA
TERRENOS DE PLANTACIONES DE PIÑA
 
Situations
SituationsSituations
Situations
 
Expressions pour utiliser en cours de français2
Expressions pour utiliser en cours de français2Expressions pour utiliser en cours de français2
Expressions pour utiliser en cours de français2
 
la voix passive - B2 vp eoi
la voix passive - B2 vp eoila voix passive - B2 vp eoi
la voix passive - B2 vp eoi
 
Integrated Communications Summer Internship 2016
Integrated Communications Summer Internship 2016Integrated Communications Summer Internship 2016
Integrated Communications Summer Internship 2016
 
Lomce datorkigu /Lo que nos viene con la Lomce
Lomce datorkigu /Lo que nos viene con la LomceLomce datorkigu /Lo que nos viene con la Lomce
Lomce datorkigu /Lo que nos viene con la Lomce
 
Digital Ethnography: New Ways of Knowing Ourselves and Our Culture
Digital Ethnography: New Ways of Knowing Ourselves and Our CultureDigital Ethnography: New Ways of Knowing Ourselves and Our Culture
Digital Ethnography: New Ways of Knowing Ourselves and Our Culture
 
Digital Strategy Recommendations Written Proposal - Allo Communications
Digital Strategy Recommendations Written Proposal - Allo CommunicationsDigital Strategy Recommendations Written Proposal - Allo Communications
Digital Strategy Recommendations Written Proposal - Allo Communications
 
la voix passive - B2 vp eoi
la voix passive - B2 vp eoila voix passive - B2 vp eoi
la voix passive - B2 vp eoi
 
Argan oil liquid gold with magical benefits
Argan oil  liquid gold with magical benefitsArgan oil  liquid gold with magical benefits
Argan oil liquid gold with magical benefits
 
PLC-SCADA and automation
PLC-SCADA and automationPLC-SCADA and automation
PLC-SCADA and automation
 
PLC and SCADA in Industrial Automation
PLC and SCADA in Industrial AutomationPLC and SCADA in Industrial Automation
PLC and SCADA in Industrial Automation
 
Slideshare.Com Powerpoint
Slideshare.Com PowerpointSlideshare.Com Powerpoint
Slideshare.Com Powerpoint
 
Slideshare ppt
Slideshare pptSlideshare ppt
Slideshare ppt
 

Similar to Combining Hadoop RDBMS for Large-Scale Big Data Analytics

Building Fast Applications for Streaming Data
Building Fast Applications for Streaming DataBuilding Fast Applications for Streaming Data
Building Fast Applications for Streaming Datafreshdatabos
 
Big Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQLBig Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQLTugdual Grall
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Hortonworks
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinerySteve Loughran
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranJAX London
 
HP Vertica and MapR Webinar: Building a Business Case for SQL-on-Hadoop
HP Vertica and MapR Webinar: Building a Business Case for SQL-on-HadoopHP Vertica and MapR Webinar: Building a Business Case for SQL-on-Hadoop
HP Vertica and MapR Webinar: Building a Business Case for SQL-on-HadoopMapR Technologies
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Datacwensel
 
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiWhither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiFelicia Haggarty
 
Cardinality-HL-Overview
Cardinality-HL-OverviewCardinality-HL-Overview
Cardinality-HL-OverviewHarry Frost
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld
 
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...BigDataEverywhere
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data SolutionsMark Kromer
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointInside Analysis
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013Michael Hiskey
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio
 
VMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware HadoopVMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware HadoopVMUG IT
 

Similar to Combining Hadoop RDBMS for Large-Scale Big Data Analytics (20)

Building Fast Applications for Streaming Data
Building Fast Applications for Streaming DataBuilding Fast Applications for Streaming Data
Building Fast Applications for Streaming Data
 
Big Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQLBig Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQL
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinery
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve Loughran
 
HP Vertica and MapR Webinar: Building a Business Case for SQL-on-Hadoop
HP Vertica and MapR Webinar: Building a Business Case for SQL-on-HadoopHP Vertica and MapR Webinar: Building a Business Case for SQL-on-Hadoop
HP Vertica and MapR Webinar: Building a Business Case for SQL-on-Hadoop
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Data
 
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiWhither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
 
Cardinality-HL-Overview
Cardinality-HL-OverviewCardinality-HL-Overview
Cardinality-HL-Overview
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
 
4AA6-4492ENW
4AA6-4492ENW4AA6-4492ENW
4AA6-4492ENW
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data Solutions
 
Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013
 
Kognitio overview jan 2013
Kognitio overview jan 2013Kognitio overview jan 2013
Kognitio overview jan 2013
 
VMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware HadoopVMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware Hadoop
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 

Recently uploaded (20)

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 

Combining Hadoop RDBMS for Large-Scale Big Data Analytics

  • 1. COMBINING HADOOP & VERTICA FOR LARGE SCALE ANALYTICS Hadoop Summit 2012 Shilpa Lawande VP Engineering, Vertica, an HP Company 1 ©2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
  • 2. The Big Data Problem 2
  • 3. The (popular) Solution Vertica’s Real-time Analytics + Scale & Flexibility of Hadoop 3
  • 4. What is Vertica? Speed •  SQL Database for Real-time Analytics Services Cloud •  Runs on x86 hardware •  MPP Columnar Architecture – scales to PBs! Monetize Real Time •  Extensible analytics capabilities •  Easy to setup and use •  Elastic - grow/shrink as needed Better Decisions Statistics Mobile Individual Analysis Simplicity 4
  • 5. What Analytics can Vertica do? SQL Extended SQL •  Window functions •  Sessionization SDKs •  Graph •  Time series •  C++ •  Monte Carlo •  Pattern •  R matching •  Statistical •  Event series •  Geospatial joins 5 Check out: https://github.com/vertica/Vertica-Extension-Packages
  • 6. Who uses Vertica? 600+ Customers worldwide “… by partnering with Vertica we’re able to provide operators the tools they need to confidently interpret customer experience…” Steve Kish, Director, Product Management Empirix “…being able to run social graph analysis on tables with tens of billions of rows with a fast turn around is amazing…” Dan McCaffrey, Director of Analytics, Zynga 6
  • 7. What's different – Hadoop vs Vertica? Vertica Hadoop Both •  Designed for Purpose-built •  Designed for Performance Scalable Fault-tolerance •  SQL Analytics •  Map-Reduce •  Interactive Platforms •  Batch Analytics Analytics Read: http://www.vertica.com/2011/09/21/counting-triangles/ 7
  • 8. Getting the best of both worlds! SQL/ Extensions     In C++, R ODBC/      Ver%ca     JDBC        Engine   External Tables Native User-defined Loads Ver%ca     Storage   8 Hadoop/MR Connector New in Vertica 6
  • 9. Joint Use Cases Hadoop for ETL, Vertica for Analytics •  Logparsing / tagging / filtering •  Convert JSON into relational tuples HDFS for data storage, Vertica + Hadoop for Analytics •  Real-timeanalytics on Vertica (needs speed) •  Long-running / exploratory analytics on Hadoop (needs fault tolerance) •  Load from HDFS directly to Vertica (needs Vertica 6) •  SQL access to HDFS (needs Vertica 6) Vertica for data storage, Hadoop as a multi-purpose tool •  Hadoop as a scheduler / load-balancer •  Hadoop to convert to formats for other tools (e.g. STATA) •  Hadoop for Backup via Sqoop 9
  • 11. Accelerating Drug Discovery The solution •  Analyzing gene •  Queries went from 5 variants using SNPs hours to 5 minutes and Microarray data •  Hadoop to find the variants •  Scale to 100s of TB of between a sample sequence data and a reference genome •  More experiments => •  Vertica to determine oncology faster discoveries! targets •  Tools: Pipeline Pilot, Spotfire, R The problem The value 11
  • 12. Digital Consumer Insights •  HDFS to store raw •  Vertica to store & Faster insights input behavioral data operationalize high delivered more •  Hadoop / MR to value biz data consistently with less find conversions •  Reporting & administrative (regexp processing) analytics via Tableau overhead, and and R cheaper hardware!! •  Custom ETL 12
  • 13. On a Privacy Assurance Mission Collect user Use MR to Use Vertica to privacy reporting process and analyze stats for requests into structure the data every 3rd party HDFS into Vertica (ETL) tag on a website. For Consumers: For Advertisers: Provide greater transparency to end-users (look for on an a A free browser plugin that can tell you who’s tracking Understand impact of 3rd party tags on you! website performance 13
  • 14. Social Video Social Video Analytics Social Video Advertising ▫  Video analytics – 100+ Leading Pubs Hadoop for batch processing of logs and ETL into Vertica ▫ Campaign Measurement – 100+ major brands Vertica for ad-hoc analytics and interactive dashboards ▫ Industry-Wide Charts Redis KV store for serving low-latency data needs 100s of millions of events collected and processed daily on Petabyte scale infrastructure! 14
  • 15. Try Vertica for free! Community Edition Up to 1 TB limit, 3 nodes! Check out Vertica Extensions on Github! 15
  • 16. References and Other Info … Website: www.vertica.com Community Edition: http://www.vertica.com/community/ Github: https://github.com/vertica/Vertica-Extension-Packages Questions or Comments: shilpa@vertica.com Jobs: resumes@vertica.com (Awesome new location in Cambridge, MA!) Follow us on Twitter: @slawande, @verticacorp 16
  • 17. Sessions will resume at 2:25pm Page 17