SlideShare a Scribd company logo
Building Blocks for Large
Analytic Systems
Andrew Lamb (alamb@vertica.com)
Vertica Systems, an HP Company
Oct 19, 2011



          VERTICA CONFIDENTIAL DO NOT DISTRIBUTE
Outline
          • Emergence of new
            hardware, software and
            data volume is driving the
            wave of new analytic
            systems in spite of mature
            existing systems.
          • Common architectural
            choices when building these
            new systems
          • Choices we made when
            building the Vertica Analytic
            Database
          • Broadly applicable (and
            widely used)                2
Architectural Changes

 • Driven by changing
    – Hardware: (really) cheap
      x86_64 servers, high
      capacity disk, high speed




                                     Better-ness
      networking
    – Software: Linux, Open
      Source
    – Requirements: Data Deluge
 • Hard for Legacy Software
    – Compatibility is brutal
    – Linux x86_64 today
                                                   Time
    – Solaris, HPUX, IRIX, etc. 10
      years ago
• Storage Organization and Processing Principles
                                                          3
[Storage + Processing] MPP (no shared disk)
                                                      SMP Server

                                                 CP     CP   CP    CP
                                                 U      U    U     U
• Any modern system should run on a              CP     CP   CP    CP
                                                 U      U    U     U
  cluster of nodes, scale up/down
                                                  System Bus
• Mid range servers are really cheap
                                               Main Memory/Cache
  (to rent or own)
• Aggregate available resources are
  enormous and scalable:                               Shared
                                                        Disk
      – I/O Bandwidth (Disk and Network)
      – Cores, Memory, etc.
                                     Network


MPP Servers




  Local Disk
                                                                        4
[Storage] Keep Your Data Sorted


• For large data sets, extra indexes
  are expensive to maintain
• Much better to index the data itself
  by sorting it
• Easy to find what you are looking
  for, reduces seeks




                                         5
[Storage] Distribute Data by Value (not chunks)


• “Sharding” -- Distribute data so you can easily find it
  again (not round robin)
• Segmenting in analytics layer simplifies app layer
• Round Robin computations won't scale: need to swizzle
  data around the cluster to do most useful thing
• Replication for high availability: not logs

 B     A   C         B   A   C           B   A   C
 2     2   2         1   1   1           3   3   3

 A     B   C         A   B   C           A   B   C
 3     3   3         2   2   2           1   1   1



                                                       6
[Storage] Store the Data in Columns


• Rarely do all fields of a
  data set appear in
  analytic queries
• Really don't want to
  waste I/O for data you
  don't need
• Columns let you be
  clever about applying
  predicates individually,
  further reducing I/O
• Not appropriate for row
  lookup or transaction
  processing systems                    7
[Storage] Write it Once, Don't Modify

• Physical use of disk
  objects should be write
  once (append only)

• System should present the
  illusion of mutability

• Immutable storage
  drastically simplifies
  coherence in a distributed
  system

                                               8
[Processing] Use Large Sequential IOs


       • Spinning disks are very good at large sequential IOs
       • You really don't want to whipsaw the read head
         (another reason why secondary indexes are bad)
       • Especially useful with sorted data
       40             Random vs Sequential Reads
       35

       30

       25
MB/s




       20

       15

       10
                                                   Random
                                                   Sequential
       5

       0
                                                                9
[Processing] Trade CPU for I/O Bandwidth


• Use very aggressive compression, even if it seems/is
  wasteful (keep getting more cores)
• Example: data type specific encoding, then LZO before
  actually writing to disk
• Hide additional
  latency with
  execution
  pipeline
  parallelism



                                                     10
[Processing] Mess with the Data where it Lives

• Bring your processing to the data, not data to the
  processing
• System should push computation down close to data
  (even if calculation turns out to be redundant)
• Example: multi-phase
  aggregation, each phase
  tries to aggregate
  intermediates before passing
  up the memory / node
  hierarchy


                                                       11
[Processing] Declarative & Extendable

• Give users a declarative query language for most tasks
• Writing procedural code for simple queries is wasteful
• Provide procedural extensions for complex analysis




                                                       12
Conclusion

• Emergence of new hardware, software and data
  volumes implies certain architectural choices in
  modern big data analytic systems.
• Commonly observed in new systems
• Vertica (unsurprisingly) features all of them




                                                     13
Introducing Community Edition

• Free version of Vertica
   – help steward better analytics and the democratization of data-
     closed source, but open access!
• All of the features and advanced analytics of Vertica
  Enterprise Edition
• Seamless upgrade to Enterprise Edition
• Limited to 1 TB raw data on 3-node hardware cluster
• Revamping Community area of Vertica’s website for
  knowledge sharing, third party tools, and code sharing
• Launching academic and non-profit research use
  program as well
• Sign up at: www.vertica.com/community
                                                                  14

More Related Content

What's hot

S ss0886 pendulum-swings-edge2015-v3
S ss0886 pendulum-swings-edge2015-v3S ss0886 pendulum-swings-edge2015-v3
S ss0886 pendulum-swings-edge2015-v3
Tony Pearson
 
Ibm spectrum scale_backup_n_archive_v03_ash
Ibm spectrum scale_backup_n_archive_v03_ashIbm spectrum scale_backup_n_archive_v03_ash
Ibm spectrum scale_backup_n_archive_v03_ash
Ashutosh Mate
 
S ss0885 spectrum-scale-elastic-edge2015-v5
S ss0885 spectrum-scale-elastic-edge2015-v5S ss0885 spectrum-scale-elastic-edge2015-v5
S ss0885 spectrum-scale-elastic-edge2015-v5
Tony Pearson
 
Vancouver bug enterprise storage and zfs
Vancouver bug   enterprise storage and zfsVancouver bug   enterprise storage and zfs
Vancouver bug enterprise storage and zfs
Rami Jebara
 
IBM Platform Computing Elastic Storage
IBM Platform Computing  Elastic StorageIBM Platform Computing  Elastic Storage
IBM Platform Computing Elastic Storage
Patrick Bouillaud
 
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
xKinAnx
 
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Doug O'Flaherty
 
Ibm spectrum scale fundamentals workshop for americas part 3 Information Life...
Ibm spectrum scale fundamentals workshop for americas part 3 Information Life...Ibm spectrum scale fundamentals workshop for americas part 3 Information Life...
Ibm spectrum scale fundamentals workshop for americas part 3 Information Life...
xKinAnx
 
Technical Report NetApp Clustered Data ONTAP 8.2: An Introduction
Technical Report NetApp Clustered Data ONTAP 8.2: An IntroductionTechnical Report NetApp Clustered Data ONTAP 8.2: An Introduction
Technical Report NetApp Clustered Data ONTAP 8.2: An Introduction
NetApp
 
IBM #Softlayer infographic 2016
IBM #Softlayer infographic 2016IBM #Softlayer infographic 2016
IBM #Softlayer infographic 2016
Patrick Bouillaud
 
DB2 10 & 11 for z/OS System Performance Monitoring and Optimisation
DB2 10 & 11 for z/OS System Performance Monitoring and OptimisationDB2 10 & 11 for z/OS System Performance Monitoring and Optimisation
DB2 10 & 11 for z/OS System Performance Monitoring and Optimisation
John Campbell
 
IBM Spectrum Scale for File and Object Storage
IBM Spectrum Scale for File and Object StorageIBM Spectrum Scale for File and Object Storage
IBM Spectrum Scale for File and Object Storage
Tony Pearson
 
S cv3179 spectrum-integration-openstack-edge2015-v5
S cv3179 spectrum-integration-openstack-edge2015-v5S cv3179 spectrum-integration-openstack-edge2015-v5
S cv3179 spectrum-integration-openstack-edge2015-v5
Tony Pearson
 
Spectrum Scale Best Practices by Olaf Weiser
Spectrum Scale Best Practices by Olaf WeiserSpectrum Scale Best Practices by Olaf Weiser
Spectrum Scale Best Practices by Olaf Weiser
Sandeep Patil
 
Presentation db2 best practices for optimal performance
Presentation   db2 best practices for optimal performancePresentation   db2 best practices for optimal performance
Presentation db2 best practices for optimal performance
solarisyougood
 
Engage for success ibm spectrum accelerate 2
Engage for success   ibm spectrum accelerate 2Engage for success   ibm spectrum accelerate 2
Engage for success ibm spectrum accelerate 2
xKinAnx
 
Ibm spectrum scale fundamentals workshop for americas part 7 spectrumscale el...
Ibm spectrum scale fundamentals workshop for americas part 7 spectrumscale el...Ibm spectrum scale fundamentals workshop for americas part 7 spectrumscale el...
Ibm spectrum scale fundamentals workshop for americas part 7 spectrumscale el...
xKinAnx
 
NetApp C-mode for 7 mode engineers
NetApp C-mode for 7 mode engineersNetApp C-mode for 7 mode engineers
NetApp C-mode for 7 mode engineers
subtitle
 
Spectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSpectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN Caching
Sandeep Patil
 
S016827 pendulum-swings-nola-v1710d
S016827 pendulum-swings-nola-v1710dS016827 pendulum-swings-nola-v1710d
S016827 pendulum-swings-nola-v1710d
Tony Pearson
 

What's hot (20)

S ss0886 pendulum-swings-edge2015-v3
S ss0886 pendulum-swings-edge2015-v3S ss0886 pendulum-swings-edge2015-v3
S ss0886 pendulum-swings-edge2015-v3
 
Ibm spectrum scale_backup_n_archive_v03_ash
Ibm spectrum scale_backup_n_archive_v03_ashIbm spectrum scale_backup_n_archive_v03_ash
Ibm spectrum scale_backup_n_archive_v03_ash
 
S ss0885 spectrum-scale-elastic-edge2015-v5
S ss0885 spectrum-scale-elastic-edge2015-v5S ss0885 spectrum-scale-elastic-edge2015-v5
S ss0885 spectrum-scale-elastic-edge2015-v5
 
Vancouver bug enterprise storage and zfs
Vancouver bug   enterprise storage and zfsVancouver bug   enterprise storage and zfs
Vancouver bug enterprise storage and zfs
 
IBM Platform Computing Elastic Storage
IBM Platform Computing  Elastic StorageIBM Platform Computing  Elastic Storage
IBM Platform Computing Elastic Storage
 
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
Ibm spectrum scale fundamentals workshop for americas part 5 ess gnr-usecases...
 
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
Introducing IBM Spectrum Scale 4.2 and Elastic Storage Server 3.5
 
Ibm spectrum scale fundamentals workshop for americas part 3 Information Life...
Ibm spectrum scale fundamentals workshop for americas part 3 Information Life...Ibm spectrum scale fundamentals workshop for americas part 3 Information Life...
Ibm spectrum scale fundamentals workshop for americas part 3 Information Life...
 
Technical Report NetApp Clustered Data ONTAP 8.2: An Introduction
Technical Report NetApp Clustered Data ONTAP 8.2: An IntroductionTechnical Report NetApp Clustered Data ONTAP 8.2: An Introduction
Technical Report NetApp Clustered Data ONTAP 8.2: An Introduction
 
IBM #Softlayer infographic 2016
IBM #Softlayer infographic 2016IBM #Softlayer infographic 2016
IBM #Softlayer infographic 2016
 
DB2 10 & 11 for z/OS System Performance Monitoring and Optimisation
DB2 10 & 11 for z/OS System Performance Monitoring and OptimisationDB2 10 & 11 for z/OS System Performance Monitoring and Optimisation
DB2 10 & 11 for z/OS System Performance Monitoring and Optimisation
 
IBM Spectrum Scale for File and Object Storage
IBM Spectrum Scale for File and Object StorageIBM Spectrum Scale for File and Object Storage
IBM Spectrum Scale for File and Object Storage
 
S cv3179 spectrum-integration-openstack-edge2015-v5
S cv3179 spectrum-integration-openstack-edge2015-v5S cv3179 spectrum-integration-openstack-edge2015-v5
S cv3179 spectrum-integration-openstack-edge2015-v5
 
Spectrum Scale Best Practices by Olaf Weiser
Spectrum Scale Best Practices by Olaf WeiserSpectrum Scale Best Practices by Olaf Weiser
Spectrum Scale Best Practices by Olaf Weiser
 
Presentation db2 best practices for optimal performance
Presentation   db2 best practices for optimal performancePresentation   db2 best practices for optimal performance
Presentation db2 best practices for optimal performance
 
Engage for success ibm spectrum accelerate 2
Engage for success   ibm spectrum accelerate 2Engage for success   ibm spectrum accelerate 2
Engage for success ibm spectrum accelerate 2
 
Ibm spectrum scale fundamentals workshop for americas part 7 spectrumscale el...
Ibm spectrum scale fundamentals workshop for americas part 7 spectrumscale el...Ibm spectrum scale fundamentals workshop for americas part 7 spectrumscale el...
Ibm spectrum scale fundamentals workshop for americas part 7 spectrumscale el...
 
NetApp C-mode for 7 mode engineers
NetApp C-mode for 7 mode engineersNetApp C-mode for 7 mode engineers
NetApp C-mode for 7 mode engineers
 
Spectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSpectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN Caching
 
S016827 pendulum-swings-nola-v1710d
S016827 pendulum-swings-nola-v1710dS016827 pendulum-swings-nola-v1710d
S016827 pendulum-swings-nola-v1710d
 

Viewers also liked

1.4亿在线背后的故事(1)
1.4亿在线背后的故事(1)1.4亿在线背后的故事(1)
1.4亿在线背后的故事(1)
liqiang xu
 
Hdfs comics
Hdfs comicsHdfs comics
Hdfs comics
liqiang xu
 
浅谈灰度发布在贴吧的应用 支付宝 20130909
浅谈灰度发布在贴吧的应用 支付宝 20130909浅谈灰度发布在贴吧的应用 支付宝 20130909
浅谈灰度发布在贴吧的应用 支付宝 20130909liqiang xu
 
the NML project
the NML projectthe NML project
the NML projectLei Yang
 
Nginx internals
Nginx internalsNginx internals
Nginx internalsliqiang xu
 
Nginx维护手册
Nginx维护手册Nginx维护手册
Nginx维护手册
Lei Yang
 
高性能Web服务器Nginx及相关新技术的应用实践
高性能Web服务器Nginx及相关新技术的应用实践高性能Web服务器Nginx及相关新技术的应用实践
高性能Web服务器Nginx及相关新技术的应用实践
rewinx
 

Viewers also liked (7)

1.4亿在线背后的故事(1)
1.4亿在线背后的故事(1)1.4亿在线背后的故事(1)
1.4亿在线背后的故事(1)
 
Hdfs comics
Hdfs comicsHdfs comics
Hdfs comics
 
浅谈灰度发布在贴吧的应用 支付宝 20130909
浅谈灰度发布在贴吧的应用 支付宝 20130909浅谈灰度发布在贴吧的应用 支付宝 20130909
浅谈灰度发布在贴吧的应用 支付宝 20130909
 
the NML project
the NML projectthe NML project
the NML project
 
Nginx internals
Nginx internalsNginx internals
Nginx internals
 
Nginx维护手册
Nginx维护手册Nginx维护手册
Nginx维护手册
 
高性能Web服务器Nginx及相关新技术的应用实践
高性能Web服务器Nginx及相关新技术的应用实践高性能Web服务器Nginx及相关新技术的应用实践
高性能Web服务器Nginx及相关新技术的应用实践
 

Similar to Xldb2011 wed 1415_andrew_lamb-buildingblocks

Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Community
 
How to Build a Compute Cluster
How to Build a Compute ClusterHow to Build a Compute Cluster
How to Build a Compute Cluster
Ramsay Key
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoopMohit Tare
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
LinkedIn
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth
Fabio Fumarola
 
Private cloud virtual reality to reality a partner story daniel mar_technicom
Private cloud virtual reality to reality a partner story daniel mar_technicomPrivate cloud virtual reality to reality a partner story daniel mar_technicom
Private cloud virtual reality to reality a partner story daniel mar_technicomMicrosoft Singapore
 
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Community
 
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
Lars Marowsky-Brée
 
Bigtable and Dynamo
Bigtable and DynamoBigtable and Dynamo
Bigtable and Dynamo
Iraklis Psaroudakis
 
Building a High Performance Analytics Platform
Building a High Performance Analytics PlatformBuilding a High Performance Analytics Platform
Building a High Performance Analytics Platform
Santanu Dey
 
25 snowflake
25 snowflake25 snowflake
25 snowflake
剑飞 陈
 
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and InfrastrctureRevolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
sabnees
 
Lessons from lhc
Lessons from lhcLessons from lhc
Lessons from lhc
drsm79
 
Severalnines Training: MySQL® Cluster - Part IX
Severalnines Training: MySQL® Cluster - Part IXSeveralnines Training: MySQL® Cluster - Part IX
Severalnines Training: MySQL® Cluster - Part IX
Severalnines
 
NoSQL overview implementation free
NoSQL overview implementation freeNoSQL overview implementation free
NoSQL overview implementation free
Benoit Perroud
 
Casual mass parallel data processing in Java
Casual mass parallel data processing in JavaCasual mass parallel data processing in Java
Casual mass parallel data processing in JavaAltoros
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarKognitio
 
Database & Technology 2 _ Marting Lambert _ Mixed Workloads Why and How.pdf
Database & Technology 2 _ Marting Lambert _ Mixed Workloads Why and How.pdfDatabase & Technology 2 _ Marting Lambert _ Mixed Workloads Why and How.pdf
Database & Technology 2 _ Marting Lambert _ Mixed Workloads Why and How.pdfInSync2011
 
End of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationEnd of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph Replication
Ceph Community
 
Wolfgang Lehner Technische Universitat Dresden
Wolfgang Lehner Technische Universitat DresdenWolfgang Lehner Technische Universitat Dresden
Wolfgang Lehner Technische Universitat Dresden
InfinIT - Innovationsnetværket for it
 

Similar to Xldb2011 wed 1415_andrew_lamb-buildingblocks (20)

Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
 
How to Build a Compute Cluster
How to Build a Compute ClusterHow to Build a Compute Cluster
How to Build a Compute Cluster
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth
 
Private cloud virtual reality to reality a partner story daniel mar_technicom
Private cloud virtual reality to reality a partner story daniel mar_technicomPrivate cloud virtual reality to reality a partner story daniel mar_technicom
Private cloud virtual reality to reality a partner story daniel mar_technicom
 
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
 
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
 
Bigtable and Dynamo
Bigtable and DynamoBigtable and Dynamo
Bigtable and Dynamo
 
Building a High Performance Analytics Platform
Building a High Performance Analytics PlatformBuilding a High Performance Analytics Platform
Building a High Performance Analytics Platform
 
25 snowflake
25 snowflake25 snowflake
25 snowflake
 
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and InfrastrctureRevolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
 
Lessons from lhc
Lessons from lhcLessons from lhc
Lessons from lhc
 
Severalnines Training: MySQL® Cluster - Part IX
Severalnines Training: MySQL® Cluster - Part IXSeveralnines Training: MySQL® Cluster - Part IX
Severalnines Training: MySQL® Cluster - Part IX
 
NoSQL overview implementation free
NoSQL overview implementation freeNoSQL overview implementation free
NoSQL overview implementation free
 
Casual mass parallel data processing in Java
Casual mass parallel data processing in JavaCasual mass parallel data processing in Java
Casual mass parallel data processing in Java
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Database & Technology 2 _ Marting Lambert _ Mixed Workloads Why and How.pdf
Database & Technology 2 _ Marting Lambert _ Mixed Workloads Why and How.pdfDatabase & Technology 2 _ Marting Lambert _ Mixed Workloads Why and How.pdf
Database & Technology 2 _ Marting Lambert _ Mixed Workloads Why and How.pdf
 
End of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationEnd of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph Replication
 
Wolfgang Lehner Technische Universitat Dresden
Wolfgang Lehner Technische Universitat DresdenWolfgang Lehner Technische Universitat Dresden
Wolfgang Lehner Technische Universitat Dresden
 

More from liqiang xu

Csrf攻击原理及防御措施
Csrf攻击原理及防御措施Csrf攻击原理及防御措施
Csrf攻击原理及防御措施
liqiang xu
 
Xldb2011 tue 1055_tom_fastner
Xldb2011 tue 1055_tom_fastnerXldb2011 tue 1055_tom_fastner
Xldb2011 tue 1055_tom_fastnerliqiang xu
 
Xldb2011 tue 1005_linked_in
Xldb2011 tue 1005_linked_inXldb2011 tue 1005_linked_in
Xldb2011 tue 1005_linked_inliqiang xu
 
Xldb2011 tue 0940_facebook_realtimeanalytics
Xldb2011 tue 0940_facebook_realtimeanalyticsXldb2011 tue 0940_facebook_realtimeanalytics
Xldb2011 tue 0940_facebook_realtimeanalyticsliqiang xu
 
Xldb2011 tue 1120_youtube_datawarehouse
Xldb2011 tue 1120_youtube_datawarehouseXldb2011 tue 1120_youtube_datawarehouse
Xldb2011 tue 1120_youtube_datawarehouseliqiang xu
 
Selenium私房菜(新手入门教程)
Selenium私房菜(新手入门教程)Selenium私房菜(新手入门教程)
Selenium私房菜(新手入门教程)liqiang xu
 
大话Php之性能
大话Php之性能大话Php之性能
大话Php之性能
liqiang xu
 
1.4亿在线背后的故事(2)
1.4亿在线背后的故事(2)1.4亿在线背后的故事(2)
1.4亿在线背后的故事(2)liqiang xu
 

More from liqiang xu (8)

Csrf攻击原理及防御措施
Csrf攻击原理及防御措施Csrf攻击原理及防御措施
Csrf攻击原理及防御措施
 
Xldb2011 tue 1055_tom_fastner
Xldb2011 tue 1055_tom_fastnerXldb2011 tue 1055_tom_fastner
Xldb2011 tue 1055_tom_fastner
 
Xldb2011 tue 1005_linked_in
Xldb2011 tue 1005_linked_inXldb2011 tue 1005_linked_in
Xldb2011 tue 1005_linked_in
 
Xldb2011 tue 0940_facebook_realtimeanalytics
Xldb2011 tue 0940_facebook_realtimeanalyticsXldb2011 tue 0940_facebook_realtimeanalytics
Xldb2011 tue 0940_facebook_realtimeanalytics
 
Xldb2011 tue 1120_youtube_datawarehouse
Xldb2011 tue 1120_youtube_datawarehouseXldb2011 tue 1120_youtube_datawarehouse
Xldb2011 tue 1120_youtube_datawarehouse
 
Selenium私房菜(新手入门教程)
Selenium私房菜(新手入门教程)Selenium私房菜(新手入门教程)
Selenium私房菜(新手入门教程)
 
大话Php之性能
大话Php之性能大话Php之性能
大话Php之性能
 
1.4亿在线背后的故事(2)
1.4亿在线背后的故事(2)1.4亿在线背后的故事(2)
1.4亿在线背后的故事(2)
 

Recently uploaded

Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 

Recently uploaded (20)

Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 

Xldb2011 wed 1415_andrew_lamb-buildingblocks

  • 1. Building Blocks for Large Analytic Systems Andrew Lamb (alamb@vertica.com) Vertica Systems, an HP Company Oct 19, 2011 VERTICA CONFIDENTIAL DO NOT DISTRIBUTE
  • 2. Outline • Emergence of new hardware, software and data volume is driving the wave of new analytic systems in spite of mature existing systems. • Common architectural choices when building these new systems • Choices we made when building the Vertica Analytic Database • Broadly applicable (and widely used) 2
  • 3. Architectural Changes • Driven by changing – Hardware: (really) cheap x86_64 servers, high capacity disk, high speed Better-ness networking – Software: Linux, Open Source – Requirements: Data Deluge • Hard for Legacy Software – Compatibility is brutal – Linux x86_64 today Time – Solaris, HPUX, IRIX, etc. 10 years ago • Storage Organization and Processing Principles 3
  • 4. [Storage + Processing] MPP (no shared disk) SMP Server CP CP CP CP U U U U • Any modern system should run on a CP CP CP CP U U U U cluster of nodes, scale up/down System Bus • Mid range servers are really cheap Main Memory/Cache (to rent or own) • Aggregate available resources are enormous and scalable: Shared Disk – I/O Bandwidth (Disk and Network) – Cores, Memory, etc. Network MPP Servers Local Disk 4
  • 5. [Storage] Keep Your Data Sorted • For large data sets, extra indexes are expensive to maintain • Much better to index the data itself by sorting it • Easy to find what you are looking for, reduces seeks 5
  • 6. [Storage] Distribute Data by Value (not chunks) • “Sharding” -- Distribute data so you can easily find it again (not round robin) • Segmenting in analytics layer simplifies app layer • Round Robin computations won't scale: need to swizzle data around the cluster to do most useful thing • Replication for high availability: not logs B A C B A C B A C 2 2 2 1 1 1 3 3 3 A B C A B C A B C 3 3 3 2 2 2 1 1 1 6
  • 7. [Storage] Store the Data in Columns • Rarely do all fields of a data set appear in analytic queries • Really don't want to waste I/O for data you don't need • Columns let you be clever about applying predicates individually, further reducing I/O • Not appropriate for row lookup or transaction processing systems 7
  • 8. [Storage] Write it Once, Don't Modify • Physical use of disk objects should be write once (append only) • System should present the illusion of mutability • Immutable storage drastically simplifies coherence in a distributed system 8
  • 9. [Processing] Use Large Sequential IOs • Spinning disks are very good at large sequential IOs • You really don't want to whipsaw the read head (another reason why secondary indexes are bad) • Especially useful with sorted data 40 Random vs Sequential Reads 35 30 25 MB/s 20 15 10 Random Sequential 5 0 9
  • 10. [Processing] Trade CPU for I/O Bandwidth • Use very aggressive compression, even if it seems/is wasteful (keep getting more cores) • Example: data type specific encoding, then LZO before actually writing to disk • Hide additional latency with execution pipeline parallelism 10
  • 11. [Processing] Mess with the Data where it Lives • Bring your processing to the data, not data to the processing • System should push computation down close to data (even if calculation turns out to be redundant) • Example: multi-phase aggregation, each phase tries to aggregate intermediates before passing up the memory / node hierarchy 11
  • 12. [Processing] Declarative & Extendable • Give users a declarative query language for most tasks • Writing procedural code for simple queries is wasteful • Provide procedural extensions for complex analysis 12
  • 13. Conclusion • Emergence of new hardware, software and data volumes implies certain architectural choices in modern big data analytic systems. • Commonly observed in new systems • Vertica (unsurprisingly) features all of them 13
  • 14. Introducing Community Edition • Free version of Vertica – help steward better analytics and the democratization of data- closed source, but open access! • All of the features and advanced analytics of Vertica Enterprise Edition • Seamless upgrade to Enterprise Edition • Limited to 1 TB raw data on 3-node hardware cluster • Revamping Community area of Vertica’s website for knowledge sharing, third party tools, and code sharing • Launching academic and non-profit research use program as well • Sign up at: www.vertica.com/community 14