
Data warehouse solutions

Data warehouse solutions with AWS

Published in: Data & Analytics

  1. Data Warehouse Solutions. Presented by: Tu Pham
  2. Contents • Flow • Client tracker • Log collector/forwarder • Data storage • Data ETL • Analytics • Architecture
  3. What we need to be
  4. Flow
  5. Client tracker • JavaScript • Mobile SDK
  6. Log collector • Logstash • Kafka • Spring XD
  7. Why choose Amazon Web Services (AWS)?
  8. Data storage with AWS
     – S3: simple object storage
     – Glacier: low-cost archive storage
     – Redshift: petabyte-scale data warehouse
     – EBS: block storage volumes for EC2
     – EFS: elastic file system for EC2
  9. S3: simple object storage
     – Pros: easy to use, secure, durable, scalable
     – Cons: slow I/O
     – Pricing (Standard storage, Asia region):
       • Storage: $30 / TB / month
       • Requests: PUT, COPY, POST, LIST: $5 per 1M requests; GET and others: $0.40 per 1M requests
       • Data transfer out to the Internet: $120 per TB
  10. Glacier: low-cost archive storage
     – Pros: secure, durable, low cost
     – Cons: suitable only for backup; slow I/O
     – Pricing:
       • Storage: $10 / TB / month
       • Requests (UPLOAD and RETRIEVAL): $5 per 1M requests
       • Data transfer out to the Internet: $90 per TB
  11. Redshift: petabyte-scale data warehouse
     – Pros: secure, durable, high speed, SQL-compatible (based on PostgreSQL)
     – Cons: very expensive; requires a fixed schema, so it is not a schemaless database for mass storage
     – Pricing: $900 for a common server (4 vCPU, 31 GB RAM, 2 TB HDD)
  12. Data ETL with AWS: AWS Data Pipeline
     – Pros: easily transforms and moves data between AWS services (S3, EMR, RDS, DynamoDB); low cost (almost free)
     – Cons: works only with AWS services
  13. Analytics with AWS
     – EMR: process big data quickly and cost-effectively
     – Kinesis: real-time data processing
  14. EMR: process big data quickly and cost-effectively
     – Pros: scalable; flexible data stores (S3, Glacier, Redshift, HDFS, …); supports Hadoop tools (Hive, Pig, …) and Spark; can run by the hour at low cost
     – Cons: not as fast as Redshift (Redshift has 10x the performance)
     – Pricing: based on the instances used ($94 to $2,367 per year)
  15. Challenges
     – A proxy between the local country and the AWS data center
     – A high-performance, durable, scalable log shipper/collector system
     – Support for a dynamic data model
     – Reducing AWS cost
  16. THANKS FOR LISTENING
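
The S3 and Glacier prices on slides 9 and 10 can be turned into a rough monthly estimate. The sketch below is a minimal Python calculator, assuming the flat per-unit rates exactly as listed on the slides (real AWS pricing is tiered and has changed since this deck) and a hypothetical workload; the constant and function names are illustrative only.

```python
# Flat per-unit prices copied from slides 9 and 10 (Standard, Asia region).
# Real AWS pricing is tiered and has changed; treat these as illustrative.
S3_STORAGE_PER_TB_MONTH = 30.0
S3_WRITE_PER_MILLION = 5.0    # PUT, COPY, POST, LIST
S3_READ_PER_MILLION = 0.40    # GET and other requests
S3_EGRESS_PER_TB = 120.0      # data transfer out to the Internet

GLACIER_STORAGE_PER_TB_MONTH = 10.0
GLACIER_REQUEST_PER_MILLION = 5.0  # UPLOAD and RETRIEVAL
GLACIER_EGRESS_PER_TB = 90.0


def s3_monthly_cost(stored_tb, write_requests, read_requests, egress_tb):
    """Rough monthly S3 bill in USD, applying the slide's rates linearly."""
    return (stored_tb * S3_STORAGE_PER_TB_MONTH
            + write_requests / 1_000_000 * S3_WRITE_PER_MILLION
            + read_requests / 1_000_000 * S3_READ_PER_MILLION
            + egress_tb * S3_EGRESS_PER_TB)


def glacier_monthly_cost(stored_tb, requests, egress_tb):
    """Rough monthly Glacier bill in USD, applying the slide's rates linearly."""
    return (stored_tb * GLACIER_STORAGE_PER_TB_MONTH
            + requests / 1_000_000 * GLACIER_REQUEST_PER_MILLION
            + egress_tb * GLACIER_EGRESS_PER_TB)


if __name__ == "__main__":
    # Hypothetical workload: 10 TB stored, 2M writes, 10M reads, 1 TB egress.
    print(s3_monthly_cost(10, 2_000_000, 10_000_000, 1))  # 434.0
    # The same 10 TB archived in Glacier: 2M uploads, nothing retrieved.
    print(glacier_monthly_cost(10, 2_000_000, 0))         # 110.0
```

With these rates, keeping 10 TB in Glacier instead of S3 cuts the storage line from $300 to $100 a month, in exchange for the slow I/O listed among Glacier's cons.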
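
Slide 12 presents ETL through AWS Data Pipeline, which orchestrates such jobs as a managed service. The extract-transform-load step itself can be sketched in plain Python; this is not the Data Pipeline API. The sketch assumes raw events arrive as newline-delimited JSON with hypothetical `event`, `ts`, `user_id` and `url` fields, and uses an in-memory SQLite database purely as a local stand-in for a SQL warehouse such as Redshift or RDS.

```python
import json
import sqlite3


def extract(lines):
    """Parse newline-delimited JSON, skipping lines that are not valid JSON."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # a real job would route bad records somewhere for review


def transform(events):
    """Keep page views that carry a timestamp, as (day, user_id, url) rows."""
    for event in events:
        if not isinstance(event, dict):
            continue  # valid JSON but not an event object, e.g. a bare number
        ts = event.get("ts")
        if event.get("event") != "page_view" or not isinstance(ts, str):
            continue
        yield (ts[:10], event.get("user_id"), event.get("url"))


def load(conn, rows):
    """Append rows to a page_views table, creating it on first use."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_views (day TEXT, user_id TEXT, url TEXT)"
    )
    conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    raw = [
        '{"event": "page_view", "ts": "2015-06-01T10:00:00Z", "user_id": "u1", "url": "/"}',
        '{"event": "click", "ts": "2015-06-01T10:00:05Z", "user_id": "u1"}',
        "not json",
        '{"event": "page_view", "ts": "2015-06-02T09:30:00Z", "user_id": "u2", "url": "/pricing"}',
    ]
    conn = sqlite3.connect(":memory:")  # stand-in for a SQL warehouse
    load(conn, transform(extract(raw)))
    print(conn.execute(
        "SELECT day, COUNT(*) FROM page_views GROUP BY day ORDER BY day"
    ).fetchall())  # [('2015-06-01', 1), ('2015-06-02', 1)]
```

Keeping the three stages as separate generators means each can be swapped independently, for example reading from S3 objects in `extract` or writing to a different target in `load`.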
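
Several of the challenges on slide 15 (a high-performance log shipper, a dynamic data model, lower AWS cost) suggest batching on the collector side. The sketch below assumes a hypothetical `sink(key, data)` upload callback in place of a real S3 client. It buffers schemaless JSON events and flushes them as gzip-compressed, date-partitioned batches, so many events share one PUT request (billed per request on slide 9) and compression reduces the bytes stored. The buffer lives only in memory; a durable shipper would also need on-disk buffering or a queue such as Kafka, plus retries.

```python
import gzip
import json
from datetime import datetime, timezone


class BatchingShipper:
    """Buffers schemaless JSON events and ships them in gzip-compressed batches.

    `sink(key, data)` is a hypothetical upload callback (for example, one S3
    PUT per batch); it is not part of any AWS SDK.
    """

    def __init__(self, sink, max_events=500):
        self.sink = sink
        self.max_events = max_events
        self.buffer = []
        self.batch_seq = 0  # keeps keys unique even within the same second

    def emit(self, event):
        # Events are plain dicts, so new fields need no schema migration.
        self.buffer.append(event)
        if len(self.buffer) >= self.max_events:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        now = datetime.now(timezone.utc)
        # Date-partitioned key such as logs/2015/06/01/103000-000000.json.gz
        key = f"logs/{now:%Y/%m/%d}/{now:%H%M%S}-{self.batch_seq:06d}.json.gz"
        # Newline-delimited JSON: one event per line.
        body = "\n".join(json.dumps(e, sort_keys=True) for e in self.buffer)
        self.sink(key, gzip.compress(body.encode("utf-8")))
        self.batch_seq += 1
        self.buffer = []


if __name__ == "__main__":
    uploads = []
    shipper = BatchingShipper(lambda key, data: uploads.append((key, data)), max_events=3)
    for i in range(7):
        shipper.emit({"event": "page_view", "seq": i})
    shipper.flush()  # ship the final partial batch
    print(len(uploads))  # 3 uploads (3 + 3 + 1 events) instead of 7
```

The buffer is cleared only after `sink` returns, so a failed upload leaves the events in place for a later retry.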
