Big Data in the Cloud
@bensullins
Amazon Redshift
Columnar DB

MPP Architecture

Speed!
2TB XL Node
High Storage Extra Large (XL) DW Node:
CPU: 2 virtual cores - Intel Xeon E5
Memory: 15 GiB
Storage: 3 HDD with 2TB of local attached storage
Network: Moderate
Disk I/O: Moderate
API: dw.hs1.xlarge

16TB 8XL Node
High Storage Eight Extra Large (8XL) DW Node:
CPU: 16 virtual cores - Intel Xeon E5
Memory: 120 GiB
Storage: 24 HDD with 16TB of local attached storage
Network: 10 Gigabit Ethernet with support for cluster placement groups
Disk I/O: Very High
API: dw.hs1.8xlarge
On-Demand Pricing

DW Node Class (On-Demand)              Hourly
XL Node - 2TB storage (Per Node)       $0.850 per Hour
8XL Node - 16TB storage (Per Node)     $6.800 per Hour

Reserved Instance 1yr (41% savings)

DW Node Class (Reserved)               Up front    Hourly
XL Node - 2TB storage (Per Node)       $2,500      $0.215 per Hour
8XL Node - 16TB storage (Per Node)     $20,000     $1.720 per Hour

Reserved Instance 3yr (73% savings)

DW Node Class (Reserved)               Up front    Hourly
XL Node - 2TB storage (Per Node)       $3,000      $0.114 per Hour
8XL Node - 16TB storage (Per Node)     $24,000     $0.912 per Hour
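
The savings percentages and the "$1K/TB/Yr" figure follow from amortizing the up-front fee over the reservation term. The sketch below reproduces that arithmetic from the prices listed above. The function names are illustrative, and the 8,760-hour year (no leap days, node running continuously) is an assumption, not something the slides state.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; assumes continuous use and ignores leap days


def effective_hourly(upfront, hourly, years):
    """Hourly rate plus the up-front fee spread over the whole term."""
    return hourly + upfront / (years * HOURS_PER_YEAR)


def savings_vs_on_demand(upfront, hourly, years, on_demand_hourly):
    """Fraction saved relative to the on-demand hourly rate."""
    return 1 - effective_hourly(upfront, hourly, years) / on_demand_hourly


def cost_per_tb_year(upfront, hourly, years, storage_tb):
    """Yearly cost of one node divided by its storage in TB."""
    return effective_hourly(upfront, hourly, years) * HOURS_PER_YEAR / storage_tb


# XL node, prices taken from the tables above
print(round(effective_hourly(3000, 0.114, 3), 3))                  # 0.228
print(round(savings_vs_on_demand(2500, 0.215, 1, 0.85) * 100))     # 41
print(round(savings_vs_on_demand(3000, 0.114, 3, 0.85) * 100))     # 73
print(round(cost_per_tb_year(3000, 0.114, 3, 2), 2))               # 999.32
```

Under the same assumptions the 8XL node works out to the same cost per TB per year, since both its price and its storage are eight times those of the XL node.
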
Web Interface
Fully Managed

Automated Backups
Fault Tolerant
AES-256 bit Encryption

Amazon VPC Firewall
BigQuery
Columnar DB

Tree Architecture

Speed!
“Dremel can scan 35 billion rows without an index in tens of seconds”
– Solutions Architect, Google Cloud Solutions Team
On-Demand Pricing

Resource               Pricing
Storage                $80 (per TB/month)
Interactive Queries    $35 (per TB processed)
Batch Queries          $20 (per TB processed)

Packaged Pricing

Data        Cost
100 TB      $3,300 per month ($33 per TB)
400 TB      $12,000 per month ($30 per TB)
1,500 TB    $40,500 per month ($27 per TB)
4,000 TB    $100,000 per month ($25 per TB)

• Packages are billed in full at the end of each month, whether the package is used or not.
• If you use more data than the amount in your chosen package, on-demand rates apply for any additional data.
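
The package rules above can be turned into a small calculator to see when a package beats on-demand pricing. This is only a sketch: it assumes the package sizes refer to TB processed by queries and that overage is charged at the interactive rate, neither of which the slide spells out. All names are hypothetical.

```python
# Prices from the slide; names are illustrative only.
INTERACTIVE_PER_TB = 35   # on-demand, per TB processed
PACKAGES = {100: 3300, 400: 12000, 1500: 40500, 4000: 100000}  # TB -> $ per month


def monthly_query_cost(tb_processed, package_tb=None):
    """Monthly query bill, with or without a package.

    Assumption: overage beyond the package is billed at the interactive rate.
    """
    if package_tb is None:
        return tb_processed * INTERACTIVE_PER_TB
    overage_tb = max(0, tb_processed - package_tb)
    # The package fee is owed in full even when usage falls short of it.
    return PACKAGES[package_tb] + overage_tb * INTERACTIVE_PER_TB


print(monthly_query_cost(120))        # 4200
print(monthly_query_cost(120, 100))   # 4000
print(monthly_query_cost(50, 100))    # 3300
print(monthly_query_cost(50))         # 1750
```

Under these assumptions the 100 TB package only pays off once monthly usage passes roughly 94 TB ($3,300 / $35 per TB); below that, on-demand is cheaper.
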
Cloud Big Data Sources Comparison

Amazon Redshift                          Google BigQuery
Columnar + MPP                           Columnar + Tree
Petabytes in Scale                       Infinite Scalability
Easy management interface                No Management Required
Straightforward billing ($1K/TB/Yr)      Confusing Pricing Model
Great connectivity w/ BI Tools           Fair Connectivity w/ BI Tools
Big data in the cloud

  • 1. Big Data in the Cloud @bensullins
  • 2.
  • 4. 2TB XL Node High Storage Extra Large (XL) DW Node: CPU: 2 virtual cores - Intel Xeon E5 Memory: 15 GiB Storage: 3 HDD with 2TB of local attached storage Network: Moderate Disk I/O: Moderate API: dw.hs1.xlarge 16TB 8XL Node High Storage Eight Extra Large (8XL) DW Node: CPU: 16 virtual cores - Intel Xeon E5 Memory: 120 GiB Storage: 24 HDD with 16TB of local attached storage Network: 10 Gigabit Ethernet with support for cluster placement groups Disk I/O: Very High API: dw.hs1.8xlarge
  • 5. On-Demand Pricing DW Node Class (On-Demand) Hourly XL Node - 2TB storage (Per Node) $0.850 per Hour 8XL Node - 16TB storage (Per Node) $6.800 per Hour Reserved Instance 1yr (41% savings) DW Node Class (Reserved) Up front Hourly XL Node - 2TB storage (Per Node) $2,500 $0.215 per Hour 8XL Node - 16TB storage (Per Node) $20,000 $1.720 per Hour Reserved Instance 3yr (73% savings) DW Node Class (Reserved) Up front Hourly XL Node - 2TB storage (Per Node) $3,000 $0.114 per Hour 8XL Node - 16TB storage (Per Node) $24,000 $0.912 per Hour
  • 6. Web Interface Fully Managed Automated Backups Fault Tolerant
  • 8.
  • 10.
  • 12. “Dremel can Scan 35 Billion Rows without an Index in Tens of Seconds” – Solutions Architect, Google Cloud Solutions Team
  • 13.
  • 14. On-Demand Pricing Resource Pricing Storage $80 (per TB/month) Interactive Queries $35 (per TB processed) Batch Queries $20 (per TB processed) Packaged Pricing Data 100 TB $3,300 per month ($33 per TB) 400 TB $12,000 per month ($30 per TB) 1,500 TB $40,500 per month ($27 per TB) 4,000 TB • • Cost $100,000 per month ($25 per TB) Packages are billed in full at the end of each month, whether the package is used or not. If you use more data than the amount in your chosen package, on-demand rates apply for any additional data.
  • 15.
  • 16. Cloud Big Data Sources Comparison
       Amazon Redshift | Google BigQuery
       Columnar + MPP | Columnar + Tree
       Petabytes in Scale | Infinite Scalability
       Easy management interface | No Management Required
       Straight forward billing ($1K/TB/Yr) | Confusing Pricing Model
       Great connectivity w/ BI Tools | Fair Connectivity w/ BI Tools
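
The reserved-instance savings quoted on slide 5 and the "$1K/TB/Yr" figure on slide 16 can be checked by spreading the up-front fee over the term and adding the hourly rate. The sketch below is only a sanity check, not an official pricing calculator: it assumes a node that runs continuously for 8,760 hours per year, and the function names are made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: node runs all year, non-leap year


def effective_hourly(upfront: float, hourly: float, years: int) -> float:
    """Spread the up-front fee over the whole term and add the hourly rate."""
    return upfront / (years * HOURS_PER_YEAR) + hourly


def savings_pct(on_demand: float, effective: float) -> float:
    """Percentage saved relative to the on-demand hourly rate."""
    return (1 - effective / on_demand) * 100


# XL node (2TB) figures taken from slide 5.
XL_ON_DEMAND = 0.850
xl_1yr = effective_hourly(upfront=2500, hourly=0.215, years=1)
xl_3yr = effective_hourly(upfront=3000, hourly=0.114, years=3)

# Yearly cost per TB on the 3-year plan (an XL node holds 2TB).
xl_3yr_per_tb_year = xl_3yr * HOURS_PER_YEAR / 2

print(f"1yr effective: ${xl_1yr:.3f}/h, savings {savings_pct(XL_ON_DEMAND, xl_1yr):.0f}%")
print(f"3yr effective: ${xl_3yr:.3f}/h, savings {savings_pct(XL_ON_DEMAND, xl_3yr):.0f}%")
print(f"3yr cost per TB per year: ${xl_3yr_per_tb_year:.0f}")
```

Under these assumptions the 3-year effective rate comes out at about $0.228 per hour, which matches the figure in the Amazon Redshift notes.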

Editor's Notes

  1. Optimized for Data Warehousing – Amazon Redshift uses a variety of innovations to obtain very high query performance on datasets ranging in size from hundreds of gigabytes to a petabyte or more. It uses columnar storage, data compression, and zone maps to reduce the amount of IO needed to perform queries. Amazon Redshift has a massively parallel processing (MPP) architecture, parallelizing and distributing SQL operations to take advantage of all available resources. The underlying hardware is designed for high performance data processing, using local attached storage to maximize throughput between the Intel Xeon E5 processor and drives, and a 10GigE mesh network to maximize throughput between nodes.
  2. Scalable – With a few clicks of the AWS Management Console or a simple API call, you can easily scale the number of nodes in your data warehouse up or down as your performance or capacity needs change. Amazon Redshift enables you to start with as little as a single 2TB XL node and scale up all the way to a hundred 16TB 8XL nodes for 1.6PB of compressed user data. Amazon Redshift will place your existing cluster into read-only mode, provision a new cluster of your chosen size, and then copy data from your old cluster to your new one in parallel. You can continue running queries against your old cluster while the new one is being provisioned. Once your data has been copied to your new cluster, Amazon Redshift will automatically redirect queries to your new cluster and remove the old cluster.
  3. No Up-Front Costs – You pay only for the resources you provision. You can choose On-Demand pricing with no up-front costs or long-term commitments, or obtain significantly discounted rates with Reserved Instance pricing. On-Demand pricing starts at just $0.85 per hour for a single node 2TB data warehouse, scaling linearly with cluster size. With Reserved Instance pricing, you can lower your effective price to $0.228 per hour for a single 2TB node, or under $1,000 per TB per year. To see more details, visit the Amazon Redshift Pricing page.
  4. Get Started in Minutes – With a few clicks in the AWS Management Console or simple API calls, you can create a cluster, specifying its size, underlying node type, and security profile. Amazon Redshift will provision your nodes, configure the connections between them, and secure the cluster. Your data warehouse should be up and running in minutes.
     Fully Managed – Amazon Redshift handles all the work needed to manage, monitor, and scale your data warehouse, from monitoring cluster health and taking backups to applying patches and upgrades. You can easily add or remove nodes from your cluster as your performance and capacity needs change. By handling all these time-consuming, labor-intensive tasks, Amazon Redshift frees you up to focus on your data and business.
     Fault Tolerant – Amazon Redshift has multiple features that enhance the reliability of your data warehouse cluster. All data written to a node in your cluster is automatically replicated to other nodes within the cluster and all data is continuously backed up to Amazon S3. Amazon Redshift continuously monitors the health of the cluster and automatically re-replicates data from failed drives and replaces nodes as necessary.
     Automated Backups – Amazon Redshift’s automated snapshot feature continuously backs up new data on the cluster to Amazon S3. Snapshots are automated, incremental, and continuous. Amazon Redshift stores your snapshots for a user-defined period, which can be from one to thirty-five days. You can also take your own snapshots at any time, which leverage all existing system snapshots and are retained until you explicitly delete them. Once you delete a cluster, your system snapshots are removed but your user snapshots are available until you explicitly delete them.
     Easy Restores – You can use any system or user snapshot to restore your cluster using the AWS Management Console or the Amazon Redshift APIs. Your cluster is available as soon as the system metadata has been restored and you can start running queries while user data is spooled down in the background.
  5. Encryption – With just a couple of parameter settings, you can set up Amazon Redshift to use SSL to secure data in transit and hardware-accelerated AES-256 encryption for data at rest. If you choose to enable encryption of data at rest, all data written to disk will be encrypted, as will any backups.
     Isolation – Amazon Redshift enables you to configure firewall rules to control network access to your data warehouse cluster. You can also run Amazon Redshift inside Amazon Virtual Private Cloud (Amazon VPC) to isolate your data warehouse cluster in your own virtual network and connect it to your existing IT infrastructure using industry-standard encrypted IPsec VPN.
  6. SQL – Amazon Redshift is a SQL data warehouse and uses industry standard ODBC and JDBC connections and Postgres drivers. Many popular software vendors are certifying Amazon Redshift with their offerings to enable you to continue to use the tools you do today. See the Amazon Redshift partner page for details.
     Designed for use with other AWS Services – Amazon Redshift is integrated with other AWS services and has built in commands to load data in parallel to each node from Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB. AWS Data Pipeline enables easy, programmatic integration between Amazon Redshift, Amazon Elastic MapReduce (Amazon EMR), and Amazon Relational Database Service (Amazon RDS).
  7. BigQuery is Google’s cloud Big Data solution based on the Dremel platform. Dremel has been in development for over 6 years and powers much of Google’s Cloud Platform. It’s worth mentioning that in this course I’m going to cover BigQuery at a high level, and later we’ll connect Tableau to it to see how to use it in practice. If you’d like to dive deeper into BigQuery, Lynn Langit has a course on here that goes into much greater detail and is definitely worth checking out. Let’s start by taking a look at their homepage.
  8. Looking at their interface, on their homepage they proclaim that you can analyze terabytes of data with just a click of a button. That sounds promising, except that Amazon Redshift offers petabytes in scale. You’ll also notice a preview of a query editor and result pane. This is encouraging, although for non-SQL developers it can be a scary sight.
  9. Similar to Amazon Redshift, Google BigQuery stores data in a columnar database format, which is great for data compression and query speed. BigQuery differs from Redshift, however, in that it uses a tree structure. This is similar to an MPP database, but it spreads the data extremely wide and creates execution “trees” for queries, which can scan tens of thousands of servers, or leaf nodes, containing the data and return results in milliseconds. Like Redshift, this all adds up to speed! Google is trying to differentiate BigQuery from MPP solutions by providing what they call full-scan results. Essentially, it does this by creating a query tree of every possible combination of query you can run. In the whitepaper titled “An Inside Look at Google BigQuery”, Kazunori Sato states that “BigQuery solves the parallel disk I/O problem by utilizing the cloud platform’s economy of scale. You would need to run 10,000 disk drives and 5,000 processors simultaneously to execute the full scan of 1TB of data within one second.” Impressive. The quote from this whitepaper tat
  10. Dremel is the platform on which Google BigQuery is based.
  11. Scalability with Google BigQuery is a bit of a mystery, to be honest. Since they handle all of the administration and data distribution for you, scalability is really limited only by how much you can afford. Once you upload your data to BigQuery, it handles the rest; you only need to worry about how much data your queries will process. This brings us to their pricing model.
  12. Big Data analysis engine without operating a data center
       Managed service means no additional capital costs
       Ability to terminate service and remove your data at any time
     Transparency in pricing and usage
       Simplicity: only 2 pricing components (query processing, storage)
       Flexibility: choice to pay-by-the-month for what you use
     Full Visibility and Control
       Monthly billing: Monitor and throttle what you use
       Tools to optimize usage/costs: best practices, tooling, samples
     Since you’re charged by the amount of data processed, this can be very expensive if you’re using a “chatty” query tool like Tableau. Google recommends sharding data into separate tables by timestamp and setting your queries to filter to a specific date range to minimize query costs.
     In my view this is the only issue with BigQuery. Let’s say you have a query which pulls back something like sales for the west region by month for the past year. This will return 24 data points: 12 integers for sales and 12 date values corresponding to the month of sales. To get to these 24 data points your query may have to scan millions or billions of rows (imagine Amazon’s detailed sales transactions), aggregate the data, then return your results. Since you’re paying for all the data scanned, a single query could really rack up the bills. Now, if you were building a focused application and not doing visual analytics with a tool like Tableau, you could probably handle this quite well. In this case, however, it can be cost prohibitive to store your data here. I have a friend who was testing this, and one of his analysts actually ran a single query that cost them $400!
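
To make the per-query billing in the last note concrete, here is a rough sketch using the on-demand rates from slide 14 ($35 per TB for interactive queries, $20 per TB for batch). The 1.2 TB table size and the assumption that it splits evenly into 12 monthly shards are invented for illustration, as are the function names; real billing depends on the columns and tables a query actually touches.

```python
INTERACTIVE_PER_TB = 35.0  # USD per TB processed, from slide 14
BATCH_PER_TB = 20.0        # USD per TB processed, from slide 14


def query_cost(tb_processed: float, rate_per_tb: float = INTERACTIVE_PER_TB) -> float:
    """Cost of one query that scans `tb_processed` terabytes."""
    return tb_processed * rate_per_tb


def tb_for_cost(cost: float, rate_per_tb: float = INTERACTIVE_PER_TB) -> float:
    """How many terabytes a query must scan to reach a given cost."""
    return cost / rate_per_tb


# The $400 query from the notes, if it was billed at the interactive rate.
tb_scanned = tb_for_cost(400)

# Hypothetical 1.2 TB sales table, split evenly into 12 monthly shards.
TABLE_TB = 1.2
MONTHLY_SHARDS = 12
full_scan = query_cost(TABLE_TB)
one_month = query_cost(TABLE_TB / MONTHLY_SHARDS)

print(f"$400 at the interactive rate is about {tb_scanned:.1f} TB scanned")
print(f"Full-table scan: ${full_scan:.2f}, single monthly shard: ${one_month:.2f}")
```

The same arithmetic shows why a chatty tool hurts: every refresh that rescans the full table is billed again at the full-scan price, while a query restricted to one date shard pays for only that shard.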