82.
Workflow tools from re:Invent 2015
• Dataduct / AWS Data Pipeline
– BDT404: Building and Managing Large-Scale ETL Data Flows with AWS
Data Pipeline and Dataduct (Coursera)
• Airflow
– DAT308: How Yahoo! Analyzes Billions of Events a Day on Amazon
Redshift (Yahoo)
• Luigi
– CMP310: Building Robust Data Processing Pipelines Using Containers
and Spot Instances (AdRoll)
• Others: Oozie (EMR 4.1.0 and later), Azkaban
89.
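Putting Big Data to Work on AWS
• Everyone has big data; the question is whether you put it to use
– Full-population analysis, correlations, feedback to the individual
• With AWS you can start right away and handle any scale
– No upfront investment, pay-as-you-go pricing, many large-scale case studies
• The rise of SQL on big data
– Query every record, very easily and quickly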
90.
References and Materials
• Books
– 『分析力を駆使する企業』, 『データ・アナリティクス3.0』
• re:Invent session materials
– BDT403: Best Practices for Building Real-time Streaming Applications with
Amazon Kinesis (AdRoll)
– BDT320: Streaming Data Flows with Amazon Kinesis Firehose and Amazon
Kinesis Analytics (Sushiro)
– BDT307: Zero Infrastructure, Real-Time Data Collection, and Analytics (Zillow)
– BDT208: A Technical Introduction to Amazon Elastic MapReduce (AOL)
– BDT303: Running Spark and Presto on the Netflix Big Data Platform (Netflix)
– BDT305: Amazon EMR Deep Dive and Best Practices (FINRA)
– BDT205: Your First Big Data Application on AWS
– BDT306: The Life of a Click: How Hearst Publishing Manages Clickstream
Analytics with AWS (HEARST)
– BDT314: Running a Big Data and Analytics Application on Amazon EMR and
Amazon Redshift with a Focus on Security (Nasdaq)