
Big Data on AWS - Toronto FSI Symposium - October 2016


Shawn Gandhi, Head of Solutions Architecture for AWS Canada, takes us on a journey through Big Data and the different strategies and services available to implementers and practitioners.

Published in: Business

Big Data on AWS - Toronto FSI Symposium - October 2016

  1. Shawn Gandhi, Head of Solutions Architecture, AWS Canada (@shawnagram): Big Data on AWS
  2. Data volume: generated data; available for analysis. Sources: Gartner, "User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011"; IDC, "Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares"
  3. Abraham Wald (1902-1950)
  4. Data is part of the fabric of the applications: front-end and UX; mobile; back-end and operations; data and analytics
  5. What is AWS? AWS Global Infrastructure; Application Services; Networking; Deployment & Administration; Database; Storage; Compute
  6. ENTERPRISE APPS; DEVELOPMENT & OPERATIONS; MOBILE SERVICES; APP SERVICES; ANALYTICS: Data Warehousing Hadoop/Spark Streaming Data Collection Machine Learning Elastic Search Virtual Desktops Sharing & Collaboration Corporate Email Backup Queuing & Notifications Workflow Search Email Transcoding One-click App Deployment Identity Sync Single Integrated Console Push Notifications DevOps Resource Management Application Lifecycle Management Containers Triggers Resource Templates. TECHNICAL & BUSINESS SUPPORT: Account Management Support Professional Services Training & Certification Security & Pricing Reports Partner Ecosystem Solutions Architects. MARKETPLACE: Business Apps Business Intelligence Databases DevOps Tools Networking Security Storage. INFRASTRUCTURE: Regions, Availability Zones, Points of Presence. CORE SERVICES: Compute (VMs, Auto-scaling, & Load Balancing); Storage (Object, Blocks, Archival, Import/Export); Databases (Relational, NoSQL, Caching, Migration); Networking (VPC, DX, DNS); CDN. SECURITY & COMPLIANCE: Access Control Identity Management Key Management & Storage Monitoring & Logs Assessment and reporting Resource & Usage Auditing Configuration Compliance Web application firewall. HYBRID ARCHITECTURE: Data Backups Integrated App Deployments Direct Connect Identity Federation Integrated Resource Management Integrated Networking. Also shown: API Gateway; IoT (Rules Engine, Device Shadows, Device SDKs, Registry, Device Gateway); Streaming Data Analysis; Business Intelligence; Mobile Analytics
  7. Three types of data-driven development: retrospective analysis and reporting. Services shown: Amazon Redshift, Amazon RDS, Amazon S3, Amazon EMR
  8. Three types of data-driven development: retrospective analysis and reporting; here-and-now real-time processing and dashboards. Services shown: Amazon Kinesis, Amazon EC2, AWS Lambda, Amazon Redshift, Amazon RDS, Amazon S3, Amazon EMR
  9. Three types of data-driven development: retrospective analysis and reporting; here-and-now real-time processing and dashboards; predictions to enable smart applications. Services shown: Amazon Kinesis, Amazon EC2, AWS Lambda, Amazon Redshift, Amazon RDS, Amazon S3, Amazon EMR
  10. Global Footprint
  11. What is a Region? (Diagram of five AZs.) • Each datacenter has a purpose-built network
  12. What is a Region? (Diagram of five AZs.) • Metro-area DWDM links between AZs • AZs are <2 ms apart and usually <1 ms • Each datacenter has a purpose-built network
  13. Big Data Pipeline (Data to Answers): Collect, Process, Analyze, Store
  14. Primitive Patterns (Collect, Process, Analyze, Store): Data Collection and Storage; Data Processing; Event Processing; Data Analysis
  15. One tool to rule them all
  16. Primitive Patterns (Collect, Process, Analyze, Store): Data Collection and Storage; Data Processing; Data Analysis; Event Processing. Services shown: S3, Kinesis, DynamoDB, RDS (Aurora), AWS Lambda, KCL Apps, EMR, Redshift, Machine Learning
  17. Primitive Patterns (Collect, Process, Analyze, Store): Data Collection and Storage. Services shown: S3, Kinesis, DynamoDB, RDS (Aurora)
  18. Data Collection and Storage: File, Stream, Transactional; Apps, Logging, Frameworks
  19. AWS Services – Data Collection and Storage
  20. S3: $0.030/GB-Mo; Redshift: starts at $0.25/hour; EC2: starts at $0.02/hour; Glacier: $0.010/GB-Mo; Kinesis: $0.015/shard (1MB/s in; 2MB/out), $0.028/million puts
  21. Primitive Patterns (Collect, Process, Analyze, Store): Event Processing. Services shown: AWS Lambda, KCL Apps
  22. Event Processing – Enabling Capabilities: AWS Lambda, KCL Apps
  23. Primitive Patterns (Collect, Process, Analyze, Store): Data Collection and Storage; Data Processing; Event Processing; Data Analysis. Services shown: EMR, Redshift, Machine Learning
  24. Big Data in Action: FINRA handles approximately 30 billion market events every day to build a holistic picture of trading in the U.S. • Deter misconduct by enforcing the rules • Detect and prevent wrongdoing in the U.S. markets • Discipline those who break the rules
  25. FINRA – The Need for Big Data • Market volumes are volatile and steadily increasing • Exchanges and markets are evolving dynamically • New securities products are being introduced • New rules and regulations are being created • Market manipulators are innovating
  26. AWS Offered the Right Services for FINRA’s Platform: cloud platform; APIs at the right layer; automated infrastructure deployment; open source commitment; operations; security
  27. FINRA – A Platform That Adapts to Market Dynamics. Diagram labels: Data Integration; HBase; Hadoop MapReduce; Flexible Interactive Queries; Hadoop; EMR; SQL/Hive; Fast Predefined Queries; HBase/NoSQL; Hadoop; Predefined Datamarts; Surveillance Analytics; EMR; Hive; Web Applications; Analysts; Regulators; Data Management Services; Data Movement; Data Registration; Notification; Version Management; Job Management; Cluster Management; S3; Firms
  28. From One Instance
  29. To Thousands
  30. And Back Again
  31. FINRA: FINRA is the largest independent regulator for all securities firms doing business in the US. FINRA oversees about 4,250 brokerage firms, about 162,155 branch offices and approximately 629,525 registered securities representatives. “At FINRA, we chose AWS because we wanted to be able to deliver innovation at a much larger scale and much more rapidly to our core business.” - Saman Michael Far, SVP of Technology. What FINRA needed: • Infrastructure for its market surveillance platform • Analysis and storage of approximately 75 billion market records every day • Interactive queries on multi-petabyte data sets. Why they chose AWS: • Fulfillment of FINRA’s security requirements • Ability to create a flexible platform using dynamic clusters (Hadoop, Hive, and HBase), Amazon EMR, and Amazon S3. Benefits realized: • Increased agility, speed, and cost savings • Estimated savings of $20m annually by using AWS
  32. The National Bank of Canada: National Bank of Canada has more than CAD$219 billion in AUM. The bank’s Global Equity Derivatives Group (GED) is a leader in providing stock-trading solutions that manage exchange-traded securities such as stocks, funds, futures, and options. “The speed and performance of AWS are impressive. Data manipulation processes that took days are now down to one minute.” - Pascal Bergeron, Director of Algorithmic Trading. What the National Bank of Canada needed: • Quickly collect a fast-growing volume of stock-market financial data • Scale its data-analysis platform, which was outgrowing the on-prem resources • Process and analyze structured and unstructured data, historic and real time. Why they chose AWS: • The broadest set of big data services and solutions, such as Cloudera and TickSmith • Reliability to easily process and analyze hundreds of terabytes of data. Benefits realized: • Ability to easily access historic data, as far back as 10 years ago • Acceleration of post-trade analysis time, from weeks to hours • Improvement and optimization of trading operations, resulting in more revenue
  33. The Benefits of Big Data on AWS. Agility: respond quickly to market challenges. Speed: reduce query times from hours to seconds. Cost savings: efficient scale; pay for what you use
  34. Thank you @shawnagram
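
The price points quoted on slide 20 lend themselves to a back-of-the-envelope estimate. The sketch below is illustrative only: the figures are the 2016 numbers from the slide, not current AWS pricing; reading "$0.015/shard" as a per-shard-hour rate and using 730 hours per month are assumptions; and the function names are hypothetical, not part of any AWS SDK.

```python
import math

# Price points as quoted on slide 20 (2016 figures, not current pricing).
S3_PER_GB_MONTH = 0.030
GLACIER_PER_GB_MONTH = 0.010
KINESIS_PER_SHARD_HOUR = 0.015    # assumption: "$0.015/shard" is per shard-hour
KINESIS_PER_MILLION_PUTS = 0.028
SHARD_INGEST_MB_PER_S = 1.0       # "1MB/s in" per shard, from the slide

HOURS_PER_MONTH = 730             # assumption: average month length in hours


def shards_needed(ingest_mb_per_s: float) -> int:
    """Smallest whole number of shards that covers the ingest rate."""
    return max(1, math.ceil(ingest_mb_per_s / SHARD_INGEST_MB_PER_S))


def kinesis_monthly_cost(ingest_mb_per_s: float, puts_per_second: float) -> float:
    """Shard-hour charge plus per-put charge for one month of streaming."""
    shard_cost = shards_needed(ingest_mb_per_s) * KINESIS_PER_SHARD_HOUR * HOURS_PER_MONTH
    puts_per_month = puts_per_second * 3600 * HOURS_PER_MONTH
    put_cost = puts_per_month / 1_000_000 * KINESIS_PER_MILLION_PUTS
    return shard_cost + put_cost


def storage_monthly_cost(s3_gb: float, glacier_gb: float) -> float:
    """Monthly charge for data held in S3 and archived in Glacier."""
    return s3_gb * S3_PER_GB_MONTH + glacier_gb * GLACIER_PER_GB_MONTH


if __name__ == "__main__":
    # Hypothetical workload: 2.5 MB/s of events at 1,000 puts per second,
    # with 1 TB kept in S3 and 5 TB archived in Glacier.
    print(f"Kinesis: ${kinesis_monthly_cost(2.5, 1000):.2f}/month")
    print(f"Storage: ${storage_monthly_cost(1000, 5000):.2f}/month")
```

The shard count is rounded up because a stream is provisioned in whole shards; the estimate ignores data transfer, request charges, and retrieval fees.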
