Redshift vs BigQuery: Lessons Learned at Yahoo!
Presented by Jonathan Raspaud | January 2nd, 2017
About Jonathan Raspaud:
[Career timeline graphic. Years shown: 1997, 1998, 1999, 2000, 2006, 2011, 2012]
Roles shown: Senior Principal Data Architect, Mobility Practice Lead, Manager Business Intelligence, Data Warehouse Engineer, Software Engineer (listed twice)
Also shown: Teamlog; IAE Grenoble, Master of Science in Management of Information Systems
Data Analytics
Redshift vs BigQuery:
• Amazon Redshift is a partially managed service. If
Amazon Redshift users want to scale a cluster up or
down (for example, to reduce costs during periods of low
usage, or to increase resources during periods of heavy
usage), they must do so manually. In addition, Amazon
Redshift requires users to carefully define and manage
their distribution and sort keys, and to perform data
cleanup and defragmentation processes manually.
• Amazon Redshift can scale from a single node to a
maximum of either 128 nodes for 8xlarge node types or 32
nodes for smaller node types. These limits mean that
Amazon Redshift has a maximum capacity of 2 PB of
stored data, including replicated data.
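The 2 PB ceiling follows from the node limit multiplied by per-node storage. The slide does not state the per-node figure, so the sketch below assumes 16 TB per 8xlarge dense-storage node; that number, and the function name, are assumptions made only for this illustration.

```python
# Back-of-the-envelope check of the 2 PB ceiling quoted on the slide.
MAX_NODES_8XLARGE = 128   # from the slide
TB_PER_NODE = 16          # ASSUMPTION: per-node storage, not stated on the slide
TB_PER_PB = 1024


def max_capacity_pb(nodes: int, tb_per_node: float) -> float:
    """Return the total raw cluster storage in petabytes."""
    return nodes * tb_per_node / TB_PER_PB


print(max_capacity_pb(MAX_NODES_8XLARGE, TB_PER_NODE))  # prints 2.0
```

Because the slide counts replicated data toward the 2 PB, the space available for unique user data would be lower than this raw figure.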
Redshift vs BigQuery (2):
• To achieve good performance, the user must define their static distribution keys at the time of table
creation. These distribution keys are then used by the system to shard the data across the nodes so
that queries can be performed in parallel. Because distribution keys have a significant effect on query
performance, the user must choose these keys carefully. After the user defines their distribution keys,
the keys cannot be changed; to use different keys, the user must create a new table with the new
keys and copy their data from the old table.
• In addition, Amazon recommends that the administrator perform periodic maintenance to reclaim lost
space. Because updates and deletes do not automatically compact the resident data on disk, they
can eventually lead to performance bottlenecks. For more information, see Vacuuming Tables in the
Amazon Redshift documentation.
• Amazon Redshift administrators must manage their end users and applications carefully. For
example, users must tune the number of concurrent queries they perform. By default, Amazon
Redshift performs up to 5 concurrent queries. Because resources are provisioned ahead of time,
performance and throughput can begin to suffer as you increase this limit (the maximum is 50). See the
Concurrency Levels section of Defining Query Queues in the Amazon Redshift documentation for
details.
• Amazon Redshift administrators must also size their cluster to support the overall data size, query
performance, and number of concurrent users. Administrators can scale up the cluster; however,
given the provisioned model, the users pay for what they provision, regardless of usage.
• Finally, Amazon Redshift clusters are restricted to a single zone by default. To create a highly
available, multi-regional Amazon Redshift architecture, the user must create additional clusters in
other zones, and then build out a mechanism for achieving consistency across clusters. For more
information, see the Building Multi-AZ or Multi-Region Amazon Redshift Clusters post in the Amazon
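To make the point about choosing distribution keys carefully more concrete, here is a minimal sketch of hash-based sharding. It is not Redshift's actual hashing scheme: CRC32, the slice count, the column names, and the sample data are all assumptions chosen for illustration. It shows how a key with few distinct values can leave most slices empty, while a high-cardinality key spreads rows more evenly.

```python
import zlib
from collections import Counter


def slice_for(key_value: str, num_slices: int) -> int:
    """Map a distribution-key value to a slice using a stable hash (CRC32, illustrative only)."""
    return zlib.crc32(key_value.encode("utf-8")) % num_slices


def rows_per_slice(key_values, num_slices):
    """Count how many rows land on each slice for the given key column."""
    counts = Counter(slice_for(value, num_slices) for value in key_values)
    return [counts.get(s, 0) for s in range(num_slices)]


NUM_SLICES = 8  # hypothetical cluster size

# Hypothetical table of 10,000 rows with two candidate distribution keys.
user_ids = [f"user-{i}" for i in range(10_000)]                # many distinct values
countries = ["US" if i % 10 else "FR" for i in range(10_000)]  # only two distinct values

by_user = rows_per_slice(user_ids, NUM_SLICES)
by_country = rows_per_slice(countries, NUM_SLICES)

print("distributed by user_id:", by_user)
print("distributed by country:", by_country)
```

With the two-value key, at most two slices receive any rows, so a parallel query would be bound by the busiest slice. Under the model described on the slide, correcting such a choice means creating a new table with the new key and copying the data across.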
Redshift vs BigQuery (3):
• In contrast, BigQuery is fully managed. Users do not
need to provision resources; instead, they can simply push
data into BigQuery, and then query across the data. The
BigQuery service manages the associated resources
opaquely and scales them automatically as appropriate.
• BigQuery has no practical limits on the size of a stored
dataset. Ingestion resources scale quickly, and ingestion
itself is extremely fast: by using the BigQuery API, you
can ingest millions of rows into BigQuery per second. In
addition, ingestion resources are decoupled from
query resources, so an ingestion load cannot degrade
the performance of a query load.
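Pushing rows through an API usually involves grouping them into batches on the client side. The sketch below shows one possible batching loop. The `send_batch` callable, the batch size of 500, and the row shape are hypothetical; in practice `send_batch` might wrap a streaming-insert call such as `Client.insert_rows_json` from the google-cloud-bigquery library, but nothing here talks to BigQuery.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def batched(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most batch_size rows."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def ingest(rows: Iterable[Row],
           send_batch: Callable[[List[Row]], None],
           batch_size: int = 500) -> int:
    """Send rows in batches through send_batch and return the number of rows sent."""
    sent = 0
    for batch in batched(rows, batch_size):
        send_batch(batch)
        sent += len(batch)
    return sent


# Stand-in sender that only records batch sizes (hypothetical, no network calls).
batch_sizes: List[int] = []
total = ingest(({"id": i} for i in range(1_234)),
               lambda batch: batch_sizes.append(len(batch)))
print(total, batch_sizes)  # prints 1234 [500, 500, 234]
```

Keeping the sender as a parameter leaves the batching logic testable without credentials or a live dataset.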
Redshift vs BigQuery (4):
• BigQuery handles sharding automatically. Users do not
need to create and maintain distribution keys.
• BigQuery is an on-demand service rather than a
provisioned one. Users do not need to worry about
underprovisioning, which can cause bottlenecks, or
overprovisioning, which can result in unnecessary costs.
• BigQuery provides global, managed data replication.
Users do not need to set up and manage multiple
deployments.
• BigQuery supports up to 50 concurrent interactive
queries, with no effect on performance or throughput.
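The difference between provisioned and on-demand billing can be sketched with simple arithmetic. Every price and size below is a placeholder chosen for illustration, not a quoted rate from either vendor, and storage charges are left out to keep the comparison small.

```python
HOURS_PER_MONTH = 730  # average hours in a month


def provisioned_monthly_cost(nodes: int, price_per_node_hour: float) -> float:
    """Provisioned model: the cluster is billed for every hour, used or not."""
    return nodes * price_per_node_hour * HOURS_PER_MONTH


def on_demand_monthly_cost(tb_scanned: float, price_per_tb: float) -> float:
    """On-demand model: billing follows the volume of data the queries scan."""
    return tb_scanned * price_per_tb


# PLACEHOLDER inputs, not real prices.
NODES = 4
PRICE_PER_NODE_HOUR = 1.00
PRICE_PER_TB_SCANNED = 5.00

cluster_cost = provisioned_monthly_cost(NODES, PRICE_PER_NODE_HOUR)
for tb in (10, 100, 1_000):
    query_cost = on_demand_monthly_cost(tb, PRICE_PER_TB_SCANNED)
    print(f"{tb:>5} TB scanned: provisioned={cluster_cost:.2f} on-demand={query_cost:.2f}")
```

Under these placeholder numbers the provisioned bill is flat while the on-demand bill tracks usage, so light or bursty workloads favor on-demand and sustained heavy scanning can favor a provisioned cluster.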
Cloud 2.0 vs 3.0 with GCP
9 Yahoo Confidential & Proprietary
