Big Data Computing: Media Industry Case Studies (大數據運算媒體業案例分享)

An introduction to big data architecture and how to apply it in the media industry.

Published in: Technology

  1. Big Data Computing: Media Industry Case Studies (大數據運算 媒體業案例分享). John Chang, Ecosystem Solutions Architect, April 2016. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. What is big data?; Big data on AWS; Northbay customer case studies; Best practices; APN resources
  3. What Is Big Data & Why Do We Care?
  4. Big Data: Unconstrained Growth
     • 95% of the 1.2 zettabytes of data in the digital universe is unstructured
     • 70% of this data is user-generated content
     • Unstructured data growth is explosive
     • Machine data/IoT will only steepen the curve
     [Chart: data volume growth, axis labels GB, TB, PB, EB, ZB]
     Source: IDC
  5. Data Gap
     [Chart: data volume over time (1990, 2000, 2010, 2020), Generated Data vs. Available for Analysis]
     Sources:
     • Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
     • IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
  6. Big Data Evolution: Batch (Report), Real-time (Alerts), Prediction (Forecast)
  7. Plethora of Tools: Amazon Glacier, S3, DynamoDB, RDS, EMR, Amazon Redshift, Data Pipeline, Amazon Kinesis, Cassandra, CloudSearch, Kinesis-enabled app, Lambda, ML, SQS, ElastiCache, DynamoDB Streams
  8. A complete platform for big data & analytics
     • Retrospective analysis and reporting
     • Here-and-now real-time processing and dashboards
     • Predictions to enable smart applications
     Services shown: Amazon Kinesis, Amazon EC2, Amazon Redshift, Amazon EMR, Amazon ML
  9. Is there a reference architecture? What tools should I use? How? Why?
  10. Big Data Case Studies: Learn from other AWS customers. aws.amazon.com/solutions/case-studies/big-data (also shown: http://aws.amazon.com/marketplace)
  11. Simplify Big Data Processing
      • Stages: ingest / collect, store, process / analyze, consume / visualize
      • Trade-offs: Time to Answer (Latency), Throughput, Cost
  12. Collect / Ingest
  13. Types of Data
      • Transactional
        - Database reads & writes (OLTP)
        - Cache
      • Search
        - Logs
        - Streams
      • File
        - Log files (/var/log)
        - Log collectors & frameworks
      • Stream
        - Log records
        - Sensors & IoT data
      [Diagram: applications (iOS, Android, web apps, mobile apps, Logstash, logging, IoT) produce transactional, search, file, and stream data, which is collected into database, search, file storage, and stream storage]
  14. Store
  15. Stream Storage
      [Diagram: collect and store view. Sources: iOS, Android, web apps, mobile apps, Logstash, logging, IoT, applications. Data types: transactional, search, file, stream. Stores: Amazon RDS, Amazon DynamoDB, Amazon ES, Amazon S3, Apache Kafka, Amazon Glacier, Amazon Kinesis, Amazon ElastiCache. Store categories: search, SQL, NoSQL, cache, stream storage, file storage.]
  16. Why Is Amazon S3 Good for Big Data?
      • Natively supported by big data frameworks (Spark, Hive, Presto, etc.)
      • No need to run compute clusters for storage (unlike HDFS)
      • Can run transient Hadoop clusters & Amazon EC2 Spot instances
      • Multiple distinct (Spark, Hive, Presto) clusters can use the same data
      • Unlimited number of objects
      • Very high bandwidth: no aggregate throughput limit
      • Highly available: can tolerate AZ failure
      • Designed for 99.999999999% durability
      • Tiered storage (Standard, IA, Amazon Glacier) via life-cycle policy
      • Secure: SSL, client/server-side encryption at rest
      • Low cost
  17. What about HDFS & Amazon Glacier?
      • Use HDFS for very frequently accessed (hot) data
      • Use Amazon S3 Standard for frequently accessed data
      • Use Amazon S3 Standard-IA for infrequently accessed data
      • Use Amazon Glacier for archiving cold data
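The tiering described in slides 16 and 17 can be expressed as an S3 life-cycle rule. The sketch below only builds the rule as a plain dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts; the `logs/` prefix, the 30 and 90 day thresholds, the rule ID format, and the helper name `build_lifecycle_rule` are illustrative assumptions, not values from the deck.

```python
def build_lifecycle_rule(prefix, ia_after_days=30, glacier_after_days=90):
    """Return one life-cycle rule that tiers objects under `prefix`.

    Objects move to Standard-IA after `ia_after_days` and to Glacier after
    `glacier_after_days`. Both thresholds are assumed example values.
    """
    if not 0 < ia_after_days < glacier_after_days:
        raise ValueError("expected 0 < ia_after_days < glacier_after_days")
    return {
        "ID": f"tier-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }


# Hypothetical configuration for a bucket that stores logs under "logs/".
lifecycle_configuration = {"Rules": [build_lifecycle_rule("logs/")]}

# With boto3 installed and credentials configured, the dictionary could be
# applied to a (hypothetical) bucket like this:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",
#       LifecycleConfiguration=lifecycle_configuration,
#   )
```

Keeping the rule as data makes it easy to review the thresholds before anything is applied to a real bucket.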
  18. Database + Search Tier
      [Diagram: collect and store view with Amazon RDS, Amazon DynamoDB, Amazon ES, Amazon S3, Apache Kafka, Amazon Glacier, Amazon Kinesis, and Amazon ElastiCache, grouped as search, SQL, NoSQL, cache, stream storage, and file storage]
  19. Database + Search Tier Anti-pattern
  20. Best Practice: Use the Right Tool for the Job
      Data tier (database + search tier):
      • Search: Amazon Elasticsearch Service, Amazon CloudSearch
      • Cache: Redis, Memcached
      • SQL: Amazon Aurora, MySQL, PostgreSQL, Oracle, SQL Server
      • NoSQL: Cassandra, Amazon DynamoDB, HBase, MongoDB
  21. What Data Store Should I Use?
      • Data structure → Fixed schema, JSON, key-value
      • Access patterns → Store data in the format you will access it
      • Data / access characteristics → Hot, warm, cold
      • Cost → Right cost
  22. Data Structure and Access Patterns

      | Access Patterns                     | What to use?  |
      | Put/Get (Key, Value)                | Cache, NoSQL  |
      | Simple relationships → 1:N, M:N     | NoSQL         |
      | Cross table joins, transaction, SQL | SQL           |
      | Faceting, Search                    | Search        |

      | Data Structure     | What to use?  |
      | Fixed schema       | SQL, NoSQL    |
      | Schema-free (JSON) | NoSQL, Search |
      | (Key, Value)       | Cache, NoSQL  |
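The two tables on slide 22 can be read as lookups that are intersected: a store category is a candidate only if it suits both the access pattern and the data structure. The following is a minimal sketch of that reading; the dictionary keys and the function name `candidate_stores` are made up for illustration, while the category values come from the slide.

```python
# Store categories per access pattern, as listed on slide 22.
STORE_BY_ACCESS_PATTERN = {
    "put_get_key_value": ["Cache", "NoSQL"],
    "simple_relationships": ["NoSQL"],
    "joins_transactions_sql": ["SQL"],
    "faceting_search": ["Search"],
}

# Store categories per data structure, as listed on slide 22.
STORE_BY_DATA_STRUCTURE = {
    "fixed_schema": ["SQL", "NoSQL"],
    "schema_free_json": ["NoSQL", "Search"],
    "key_value": ["Cache", "NoSQL"],
}


def candidate_stores(access_pattern, data_structure):
    """Return the categories recommended by both tables, in access-pattern order."""
    by_access = STORE_BY_ACCESS_PATTERN[access_pattern]
    by_structure = STORE_BY_DATA_STRUCTURE[data_structure]
    return [store for store in by_access if store in by_structure]


print(candidate_stores("put_get_key_value", "key_value"))
print(candidate_stores("faceting_search", "schema_free_json"))
```

An empty result, for example SQL-style joins over schema-free JSON, suggests that the data should be reshaped before a store is chosen.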
  23. Data / Access Characteristics: Hot, Warm, Cold

      |              | Hot       | Warm    | Cold      |
      | Volume       | MB–GB     | GB–TB   | PB        |
      | Item size    | B–KB      | KB–MB   | KB–TB     |
      | Latency      | ms        | ms, sec | min, hrs  |
      | Durability   | Low–High  | High    | Very High |
      | Request rate | Very High | High    | Low       |
      | Cost/GB      | $$-$      | $-¢¢    | ¢         |
  24. What Data Store Should I Use?

      |                       | Amazon ElastiCache | Amazon DynamoDB      | Amazon Aurora       | Amazon Elasticsearch | Amazon EMR (HDFS) | Amazon S3                  | Amazon Glacier     |
      | Average latency       | ms                 | ms                   | ms, sec             | ms, sec              | sec, min, hrs     | ms, sec, min (~ size)      | hrs                |
      | Data volume           | GB                 | GB–TBs (no limit)    | GB–TB (64 TB max)   | GB–TB                | GB–PB (~ nodes)   | MB–PB (no limit)           | GB–PB (no limit)   |
      | Item size             | B–KB               | KB (400 KB max)      | KB (64 KB)          | KB (1 MB max)        | MB–GB             | KB–GB (5 TB max)           | GB (40 TB max)     |
      | Request rate          | High to Very High  | Very High (no limit) | High                | High                 | Low to Very High  | Low to Very High (no limit)| Very Low           |
      | Storage cost GB/month | $$                 | ¢¢                   | ¢¢                  | ¢¢                   | ¢                 | ¢                          | ¢/10               |
      | Durability            | Low to Moderate    | Very High            | Very High           | High                 | High              | Very High                  | Very High          |

      The columns run from hot data (left) through warm data to cold data (right).
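As a rough illustration of how the item size row on slide 24 narrows the choice of store, the sketch below filters stores by the maximum item size the slide states. Only stores with an explicit maximum on the slide are included; the limits are the deck's 2016 figures and may have changed since, and treating 1 KB as 1024 bytes is an assumption made here.

```python
KB = 1024          # assumption: binary units
MB = 1024 ** 2
TB = 1024 ** 4

# Maximum item sizes as stated on slide 24 (2016 figures).
MAX_ITEM_SIZE_BYTES = {
    "Amazon DynamoDB": 400 * KB,
    "Amazon Aurora": 64 * KB,
    "Amazon Elasticsearch": 1 * MB,
    "Amazon S3": 5 * TB,
    "Amazon Glacier": 40 * TB,
}


def stores_that_fit(item_size_bytes):
    """Return the stores whose stated maximum item size can hold the item."""
    return [
        name
        for name, limit in MAX_ITEM_SIZE_BYTES.items()
        if item_size_bytes <= limit
    ]


print(stores_that_fit(100 * KB))  # a 100 KB item
print(stores_that_fit(2 * MB))    # a 2 MB item
```

Item size is only one filter; latency, request rate, and cost from the same table would further narrow the list.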
  25. Process / Analyze
  26. Analyze
      [Diagram: collect, store, and analyze view. Stores: Amazon RDS, Amazon DynamoDB, Amazon ES, Amazon S3, Apache Kafka, Amazon Glacier, Amazon Kinesis, Amazon ElastiCache, labeled hot, warm, and cold. Analysis tools: Amazon Redshift, Impala, Pig, Amazon ML, Streaming, Amazon Kinesis, AWS Lambda, Amazon Elastic MapReduce, grouped as stream processing, batch, interactive, and ML.]
  27. Process / Analyze
      Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making.
      Examples:
      • Interactive dashboards → Interactive analytics
      • Daily/weekly/monthly reports → Batch analytics
      • Billing/fraud alerts, 1 minute metrics → Real-time analytics
      • Sentiment analysis, prediction models → Machine learning
  28. Data Temperature vs Processing Latency
      [Chart: data temperature (hot to cold) against processing latency (low to high). Data stores: Amazon Kinesis, Apache Kafka, Amazon DynamoDB, Amazon S3, Amazon EMR (HDFS). Processing engines: Spark Streaming, Apache Storm, AWS Lambda, KCL, Amazon Redshift, Spark, Impala, Presto, Hive. Additional labels: Native, Batch, Answers.]
  29. Interactive Analytics
      • Takes a large amount of (warm/cold) data
      • Takes seconds to get answers back
      • Example: Self-service dashboards
  30. Batch Analytics
      • Takes a large amount of (warm/cold) data
      • Takes minutes or hours to get answers back
      • Example: Generating daily, weekly, or monthly reports
  31. Real-Time Analytics
      Takes a small amount of hot data and asks questions of it. Takes a short amount of time (milliseconds or seconds) to get an answer back.
      • Real-time (event)
        - Real-time response to events in data streams
        - Example: Billing/Fraud Alerts
      • Near real-time (micro-batch)
        - Near real-time operations on small batches of events in data streams
        - Example: 1 Minute Metrics
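The "1 Minute Metrics" example on slide 31 amounts to grouping stream events into fixed one-minute windows and aggregating each window. The sketch below shows that idea in plain Python over an in-memory list; in practice the events would arrive from a stream such as Amazon Kinesis, and the event shape `(epoch_seconds, key)` and the function name are assumptions made for illustration.

```python
from collections import Counter


def one_minute_counts(events):
    """Count events per (window_start, key) using one-minute tumbling windows.

    `events` is an iterable of (epoch_seconds, key) pairs. The window start is
    the timestamp rounded down to a multiple of 60 seconds.
    """
    counts = Counter()
    for timestamp, key in events:
        window_start = int(timestamp) // 60 * 60
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical playback events: (seconds since epoch, event type).
events = [(0, "play"), (30, "play"), (59, "pause"), (60, "play"), (125, "play")]
metrics = one_minute_counts(events)
for (window_start, key), count in sorted(metrics.items()):
    print(window_start, key, count)
```

A real micro-batch job would also need to decide how to treat late or out-of-order events, which this sketch ignores.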
  32. Predictions via Machine Learning
      ML gives computers the ability to learn without being explicitly programmed.
      Machine Learning Algorithms:
      • Supervised Learning ← "teach" program
        - Classification ← Is this transaction fraud? (Yes/No)
        - Regression ← Customer Life-time value?
      • Unsupervised Learning ← let it learn by itself
        - Clustering ← Market Segmentation
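To make the clustering bullet on slide 32 concrete, here is a minimal one-dimensional k-means sketch that splits customers into segments by a single number. It is a toy illustration of unsupervised learning, not how Amazon ML or Spark ML work internally; the monthly spend figures and the starting centroids are invented.

```python
def assign(values, centroids):
    """Group each value with its nearest centroid (ties go to the lower index)."""
    clusters = [[] for _ in centroids]
    for value in values:
        nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
        clusters[nearest].append(value)
    return clusters


def kmeans_1d(values, initial_centroids, iterations=10):
    """Run a fixed number of k-means iterations on one-dimensional data."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        clusters = assign(values, centroids)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, assign(values, centroids)


# Hypothetical monthly spend per customer.
monthly_spend = [10, 12, 11, 95, 100, 105]
centroids, segments = kmeans_1d(monthly_spend, initial_centroids=[0, 50])
print(centroids)
print(segments)
```

No labels are supplied: the two segments emerge from the data alone, which is the contrast with the supervised classification and regression bullets.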
  33. Analysis Tools and Frameworks
      • Machine Learning: Mahout, Spark ML, Amazon ML
      • Interactive Analytics: Amazon Redshift, Presto, Impala, Spark
      • Batch Processing: MapReduce, Hive, Pig, Spark
      • Stream Processing
        - Micro-batch: Spark Streaming, KCL, Hive, Pig
        - Real-time: Storm, AWS Lambda, KCL
      [Diagram: Analyze stage with Amazon Redshift, Impala, Pig, Amazon Machine Learning, Streaming, Amazon Kinesis, AWS Lambda, and Amazon Elastic MapReduce, grouped as stream processing, batch, interactive, and ML]
  34. Real-time Analytics
      [Diagram: a producer writes to stream storage (Apache Kafka, Amazon Kinesis, DynamoDB Streams). Processing: KCL, AWS Lambda, Spark Streaming, Apache Storm. Outputs: Amazon SNS (notifications, alerts), Amazon ML (real-time prediction), and Amazon ElastiCache (Redis), Amazon DynamoDB, Amazon RDS, Amazon ES (app state, KPI).]
  35. Interactive & Batch Analytics
      [Diagram: a producer writes to Amazon S3 (store). Batch processing: Amazon EMR with Hive, Pig, Spark. Interactive processing: Amazon Redshift, Amazon EMR with Presto, Impala, Spark. Amazon ML provides batch prediction and real-time prediction. Results are passed to the consume stage.]
  36. Lambda Architecture
      [Diagram: data enters through Amazon Kinesis. Batch layer: Amazon Kinesis S3 Connector, Amazon S3, Amazon Redshift, Amazon EMR with Presto, Hive, Pig, Spark. Speed layer: KCL, AWS Lambda, Spark Streaming, Storm. Serving layer: Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon ES. Amazon ML is also shown. Each layer returns answers to applications.]
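The Lambda Architecture on slide 36 combines a batch layer, which periodically recomputes results over all stored data, with a speed layer that covers events the last batch run has not seen yet; the serving layer answers queries from both. The sketch below illustrates that merge with in-memory counters. The video IDs and function names are hypothetical, and real layers would be backed by the services named on the slide.

```python
from collections import Counter


def build_batch_view(stored_events):
    """Batch layer: recompute view counts from all events already in storage."""
    return Counter(stored_events)


def build_speed_view(recent_events):
    """Speed layer: count only the events that arrived after the last batch run."""
    return Counter(recent_events)


def serve(key, batch_view, speed_view):
    """Serving layer: answer a query by merging the batch and speed views."""
    return batch_view.get(key, 0) + speed_view.get(key, 0)


# Hypothetical play events, split at the time of the last batch run.
stored_events = ["video-1", "video-2", "video-1"]
recent_events = ["video-1", "video-3"]

batch_view = build_batch_view(stored_events)
speed_view = build_speed_view(recent_events)

print(serve("video-1", batch_view, speed_view))
print(serve("video-3", batch_view, speed_view))
```

After the next batch run absorbs the recent events, the speed view would be reset so that nothing is counted twice.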
  37. Consume / Visualize
  38. Collect, Store, Analyze, Consume
      [Diagram: the full pipeline. Collect: iOS, Android, web apps, mobile apps, Logstash, logging, IoT, applications. Store: Amazon RDS, Amazon DynamoDB, Amazon ES, Amazon S3, Apache Kafka, Amazon Glacier, Amazon Kinesis, Amazon ElastiCache (hot, warm, cold). Analyze: Amazon Redshift, Impala, Pig, Amazon ML, Streaming, Amazon Kinesis, AWS Lambda, Amazon Elastic MapReduce (fast to slow), with ETL. Consume: analysis & visualization (Amazon QuickSight), notebooks, predictions, apps & APIs, IDE.]
  39. Consume
      • Predictions
      • Analysis and Visualization
      • Notebooks
      • IDE
      • Applications & API
      [Diagram: Store, Analyze (with ETL), and Consume stages. Consumers: business users; data scientists and developers. Tools: Amazon QuickSight, notebooks, predictions, apps & APIs, IDE.]
  40. Putting It All Together
  41. Reference Architecture
      [Diagram: the full Collect, Store, Analyze, Consume pipeline shown on slide 38]
  42. Customer: Netflix
      Problem Statement:
      • Need massive scalability and elasticity
      Use of AWS:
      • Nearly 100% of its online video service on AWS
      • Global use of Amazon EC2, Amazon S3, Amazon SQS, Amazon EMR, Lambda, etc.
      • 30-50K EC2 instances
      Business Benefits:
      • Application achieves near zero downtime
      • Massive scalability and elasticity
      • Transcoding entire library to ~60 output renditions
      "AWS is the market leader and has been able to create a continuous and virtuous cycle." (Kevin McEntee, VP Content Engineering, Netflix)
  43. AdRoll Builds Bidding Platform on AWS and Cuts Costs by 83%
      AdRoll is a global leader in digital advertising retargeting products.
      "We've been able to seamlessly scale our infrastructure and reduce our fixed costs by 75% and operational costs by 83%." (Valentino Volonghi, CTO, AdRoll)
      • AdRoll manages its Real-Time Bidding platform using Amazon EC2, Amazon DynamoDB, and Amazon S3
      • Reduced annual operational costs by 83%
      • Reduced fixed costs by 75%
      • Staff now 95% focused on new product development
  44. Customer: Comcast (Xfinity X1 Set Top Box Platform)
      Problem Statement:
      • Needed scalable, high performance, and highly available storage and big data solutions
      Use of AWS:
      • Direct Connect, S3, EMR, other AWS services
      • Went from ~5 GB of logs per day to ~1300 GB/day
      Business Benefits:
      • By moving to AWS, went from spending $50K/month to $13K/month on big data solutions
  45. MLB Advanced Media
      "Consumer behavior is changing. People go online and shop from their mobile devices, and this technology is very important to the evolution of the game."
      "The most exciting part of our work is Statcast, powered by AWS. For the first time, we can measure data that could not be measured before."
  46. Partnering with AWS
  47. Thank you! Questions?
  48. Thank you!
