AWS Segment XO Group Joint webinar
- 1. Loading and Analyzing Behavioral Data in Amazon Redshift
Presented by Segment, AWS & XO Group Inc.
March 3, 2015
- 6. Amazon Redshift is Easy to Use
• Provision in minutes
• Monitor query performance
• Point-and-click resize
• Built-in security
• Automatic backups
- 7. Amazon Redshift Architecture
• Leader node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3
– Parallel load from Amazon DynamoDB, Amazon EMR, Amazon S3, HDFS/SSH
• Two hardware platforms
– Optimized for data processing
– DW1: HDD; scale from 2 TB to 1.6 PB
– DW2: SSD; scale from 160 GB to 256 TB
[Diagram: SQL clients/BI tools connect via JDBC/ODBC to the leader node; the leader node and three compute nodes (each labeled 128 GB RAM, 16 TB disk, 16 cores) are linked over 10 GigE (HPC); ingestion, backup, and restore go through Amazon S3.]
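The split between a coordinating leader node and compute nodes that execute in parallel can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `compute_node_sum` and `leader_total` are hypothetical, the data is made up, and a thread pool stands in for separate machines. It is not a description of how Redshift is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_sum(local_amounts):
    """Each simulated compute node aggregates only its own local data."""
    return sum(local_amounts)


def leader_total(node_slices):
    """The simulated leader fans the work out, then combines the partial results."""
    with ThreadPoolExecutor(max_workers=len(node_slices)) as pool:
        # pool.map preserves input order, so partials[i] belongs to node i.
        partials = list(pool.map(compute_node_sum, node_slices))
    return sum(partials), partials


# Hypothetical amounts spread over three simulated compute nodes.
node_slices = [[500, 250], [125], [375]]
total, partials = leader_total(node_slices)
print(total, partials)  # prints: 1250 [750, 125, 375]
```

Each node returns one small partial result, so the leader only has to combine three numbers regardless of how many rows each node scanned.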
- 8. Amazon Redshift Dramatically Reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• With row storage you do unnecessary I/O
• To get the total amount, you have to read everything
 ID  | Age | State | Amount
-----+-----+-------+--------
 123 | 20  | CA    | 500
 345 | 25  | WA    | 250
 678 | 40  | FL    | 125
 957 | 37  | WA    | 375
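The difference between row and column storage for the "total amount" question can be made concrete with the four rows from the slide. The sketch below is an assumption-laden toy: it counts values touched rather than disk blocks, and the function names are hypothetical.

```python
# Rows from the slide's example table: (ID, Age, State, Amount).
rows = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]


def total_amount_row_store(rows):
    """Row layout: every field of every row is read to reach Amount."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row is read, not just Amount
        total += row[3]
    return total, values_read


def total_amount_column_store(rows):
    """Column layout: the values of one column are stored together."""
    columns = {
        name: [row[i] for row in rows]
        for i, name in enumerate(("id", "age", "state", "amount"))
    }
    amount = columns["amount"]  # only this column is read
    return sum(amount), len(amount)


row_total, row_reads = total_amount_row_store(rows)
col_total, col_reads = total_amount_column_store(rows)
print(row_total, row_reads)  # prints: 1250 16
print(col_total, col_reads)  # prints: 1250 4
```

Both layouts give the same total, but the column layout touches 4 values instead of 16; the gap widens as tables gain more columns.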
- 10. Amazon Redshift Dramatically Reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• COPY compresses automatically
• You can analyze and override
• More performance, less cost
analyze compression listing;
 Table   | Column         | Encoding
---------+----------------+----------
 listing | listid         | delta
 listing | sellerid       | delta32k
 listing | eventid        | delta32k
 listing | dateid         | bytedict
 listing | numtickets     | bytedict
 listing | priceperticket | delta32k
 listing | totalprice     | mostly32
 listing | listtime       | raw
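Several columns in the output above are reported with delta-style encodings. The general idea behind delta encoding can be sketched as follows; this is a minimal illustration, assuming mostly sequential integer ids, with hypothetical `listid` values and helper names. It does not reproduce Redshift's on-disk format.

```python
def delta_encode(values):
    """Store the first value, then each difference from the previous value."""
    if not values:
        return []
    encoded = [values[0]]
    for prev, cur in zip(values, values[1:]):
        encoded.append(cur - prev)
    return encoded


def delta_decode(encoded):
    """Rebuild the original values with a running sum."""
    values = []
    running = 0
    for i, item in enumerate(encoded):
        running = item if i == 0 else running + item
        values.append(running)
    return values


# Hypothetical listid values; real data will differ.
list_ids = [100001, 100002, 100004, 100007, 100008]
encoded = delta_encode(list_ids)
print(encoded)                            # prints: [100001, 1, 2, 3, 1]
print(delta_decode(encoded) == list_ids)  # prints: True
```

After the first value, every stored number is small, which is why nearly sequential columns compress well under this kind of scheme.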
- 11. Amazon Redshift Dramatically Reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Track the minimum and maximum value for each block
• Skip over blocks that don’t contain relevant data
Block 1: 10, 13, 14, 26, …, 100, 245, 324 (min 10, max 324)
Block 2: 375, 393, 417, …, 512, 549, 623 (min 375, max 623)
Block 3: 637, 712, 809, …, 834, 921, 959 (min 637, max 959)
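The block-skipping behavior can be sketched with the three blocks shown on the slide. This is a simplified model, not Redshift's implementation: only the values visible on the slide are used (the elided ones are left out), and `scan_range` is a hypothetical helper.

```python
# Values visible on the slide, one list per block (elided values omitted).
blocks = [
    [10, 13, 14, 26, 100, 245, 324],
    [375, 393, 417, 512, 549, 623],
    [637, 712, 809, 834, 921, 959],
]

# Zone map: one (min, max) pair per block.
zone_map = [(min(b), max(b)) for b in blocks]


def scan_range(lo, hi):
    """Return matching values and the indexes of the blocks actually read."""
    matches, blocks_read = [], []
    for i, (block_min, block_max) in enumerate(zone_map):
        if block_max < lo or block_min > hi:
            continue  # no overlap with [lo, hi]: skip without reading the block
        blocks_read.append(i)
        matches.extend(v for v in blocks[i] if lo <= v <= hi)
    return matches, blocks_read


matches, blocks_read = scan_range(400, 550)
print(zone_map)              # prints: [(10, 324), (375, 623), (637, 959)]
print(matches, blocks_read)  # prints: [417, 512, 549] [1]
```

For the range 400 to 550, only the second block is read; the other two are ruled out from their minimum and maximum alone.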
- 12. Amazon Redshift Dramatically Reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• >2 GB/s scan rate
• Optimized for data processing
• High disk density
DW.HS1.8XL: 128 GB RAM, 16 cores, 16 TB disk
DW.HS1.XL: 16 GB RAM, 2 cores, 2 TB disk
- 13. Amazon Redshift Parallelizes and Distributes Everything
• Query
• Load
• Backup/Restore
• Resize
- 14. Amazon Redshift Parallelizes and Distributes Everything
• Query
• Load
• Backup/Restore
• Resize
• Parallel load from Amazon DynamoDB, Amazon EMR, Amazon S3, HDFS/SSH
• Kinesis integration
• Data automatically distributed and sorted according to DDL
• Scales linearly with number of nodes
[Diagram: three compute nodes (each labeled 128 GB RAM, 16 TB disk, 16 cores) connected to Amazon S3/DynamoDB.]
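What "distributed and sorted according to DDL" means in practice can be approximated with a toy model: rows are assigned to nodes by hashing a distribution key, and each node keeps its rows ordered by a sort key. Everything here is an assumption for illustration, including the `distribute` and `sort_within_nodes` helpers, the use of CRC32 as the hash, and the sample rows; Redshift's actual distribution logic is not shown in the slides.

```python
import zlib


def distribute(rows, key_index, num_nodes):
    """Assign each row to a node by hashing its distribution key."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        key = str(row[key_index]).encode("utf-8")
        # CRC32 is deterministic, so equal keys always land on the same node.
        nodes[zlib.crc32(key) % num_nodes].append(row)
    return nodes


def sort_within_nodes(nodes, sort_index):
    """Keep each node's rows ordered by the sort key."""
    return [sorted(node, key=lambda r: r[sort_index]) for node in nodes]


# Sample rows: (ID, Age, State, Amount).
rows = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]

# Distribute on ID (column 0) across three nodes, then sort on Amount (column 3).
nodes = sort_within_nodes(distribute(rows, key_index=0, num_nodes=3), sort_index=3)
for i, node in enumerate(nodes):
    print(i, node)
```

Because placement depends only on the key, a parallel load can send each row straight to its node, and adding nodes spreads the same rows over more workers.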
- 15. Amazon Redshift Parallelizes and Distributes Everything
• Query
• Load
• Backup/Restore
• Resize
• Backups to Amazon S3 are automatic, continuous and incremental
• Back up your cluster to a second region
• Configurable system snapshot retention period; take user snapshots on demand
• Streaming restores enable you to resume querying faster
[Diagram: three compute nodes (each labeled 128 GB RAM, 16 TB disk, 16 cores) connected to Amazon S3.]
- 16. Amazon Redshift Parallelizes and Distributes Everything
• Query
• Load
• Backup/Restore
• Resize
• Add/remove nodes or change node type while remaining online
• Provision a new cluster and copy data in parallel from node to node
• Only charged for source cluster until SQL endpoint has automatically been switched over via DNS
[Diagram: SQL clients/BI tools and two clusters, one with a leader node and three compute nodes, the other with a leader node and four compute nodes; each node is labeled 128 GB RAM, 48 TB disk, 16 cores.]
- 17. Amazon Redshift Has Security Built In
• SSL to secure data in transit
• Encryption to secure data at rest
– AES-256; hardware accelerated
– All blocks on disks and in Amazon S3 encrypted
– HSM/CloudHSM
• No direct access to compute nodes
• Amazon VPC support
[Diagram: SQL clients/BI tools connect via JDBC/ODBC to the leader node; labels mark a customer VPC and an internal security group; the leader node and three compute nodes (each labeled 128 GB RAM, 16 TB disk, 16 cores) are linked over 10 GigE (HPC), with ingestion, backup, and restore through Amazon S3 / Amazon DynamoDB.]
- 30. XO Group Inc. + Segment
Individual product teams wanted isolated access to their own analytics.
Segment + Mixpanel + Customer.io + Optimizely + Uservoice
- 31. XO Group Inc. + Segment
Still needed a solution to connect Segment data from multiple products and platforms into a single view.
Segment SQL + Mode Analytics
- 39. More considerate “Share” options
SMS address from desktop or mobile web browser.
One-click email yourself the details of a venue.