Financial Services Analytics on AWS

Financial Services Analytics on AWS by Ian Meyers, Principal Solution Architect, Amazon Web Services.

Published in: Technology

Financial Services Analytics on AWS

  1. 1. Financial Services Analytics on AWS
  2. 2. Infrastructure: Regions, Availability Zones, Points of Presence. Core Services: Storage (Object, Block and Archival); Compute (VMs, Auto-scaling and Load Balancing); Databases (Relational, NoSQL, Caching); Networking (VPC, DX, DNS); CDN. Administration & Security: Access Control, Usage & Resource Tracking, Monitoring and Logs, Key Storage & Management, Identity Management, Service Catalog. Platform Services: Deployment & Management (One-click web app deployment, Dev/ops resource management, Resource Templates, Code Deploy, Code Pipeline, Code Commit); Mobile Services (Push Notifications, Identity, Sync, Mobile Analytics); App Services (Queuing & Notifications, Workflow, App streaming, Transcoding, Email, Search); Analytics (Hadoop, Data Pipelines, Data warehouse, Real-time Streaming Data, Machine Learning). Enterprise Applications: Virtual Desktops, Sharing & Collaboration.
  3. 3. Global Infrastructure. Regions: US-WEST (Oregon), EU-WEST (Ireland), ASIA PAC (Tokyo), ASIA PAC (Singapore), US-WEST (San Francisco), SOUTH AMERICA (Sao Paulo), US-EAST (Virginia), GOV CLOUD, ASIA PAC (Sydney), CHINA (beta), EU-CENTRAL (Germany). Map legend: Current Region, Announced Region. Other map labels: Hong Kong, Ohio, London, India.
  4. 4. Global Infrastructure: Availability Zone
  5. 5. Requirements: Store Any Amount of Data Without Capacity Planning; Perform Complex Analysis on Any Data; Scale on Demand; Store Data Securely; Move to Real Time. Realised Value: Agile Analytics, DevOps in the Warehouse; Decrease Time to Market; Build Environments Quickly; Reduce Costs; Reduce Capital Expenditure; Enable Global Reach.
  6. 6. 87% now will consider cloud for their big data. Advanced analytics closing in on BI. Issues beyond security (reality, perception, regulation) being addressed by march of technology. Building & deploying Big Data analytics or processing applications in the cloud can reduce complexity and time to market. Source: Gigaom Research data warehousing survey 2014
  7. 7. Ingestion … Integration … Retention
  8. 8. Diagram: one STORAGE block and ten COMPUTE blocks
  9. 9. Simple Storage Service: Highly scalable object storage for the internet. 1 byte to 5TB in size. 99.999999999% durability. A Distributed Object Store: Not a file system; No Single Points of Failure; Eventually consistent. Availability: 99.99%. Durability: 99.999999999%. Paradigm: Object store. Performance: Very Fast. Redundancy: Across Availability Zones. Security: Public Key / Private Key. Pricing: $0.03/GB/month. Typical use case: Write once, read many.
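As a rough illustration of the pricing figure on this slide, the sketch below estimates a monthly storage bill at the quoted $0.03/GB/month. It assumes a flat rate with no tiering, request or data-transfer charges, and 1 TB = 1,024 GB; the function name is hypothetical and not part of any AWS API.

```python
PRICE_PER_GB_MONTH = 0.03  # USD, figure quoted on the slide; a flat rate is an assumption


def monthly_storage_cost(terabytes: float, price_per_gb: float = PRICE_PER_GB_MONTH) -> float:
    """Estimate the monthly cost in USD of storing `terabytes` of data (1 TB = 1,024 GB assumed)."""
    if terabytes < 0:
        raise ValueError("terabytes must be non-negative")
    return terabytes * 1024 * price_per_gb


if __name__ == "__main__":
    # 200 TB is the S3 data volume quoted later in the deck for the weather-insurance example.
    print(f"200 TB: ${monthly_storage_cost(200):,.2f} per month")
```

Real bills depend on storage class, region and request volume, so this is only an order-of-magnitude check.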
  10. 10. S3 – Standard, S3 – Infrequent Access, Amazon Glacier
  11. 11. S3 Performance & Scalability. S3 Streaming Performance: 100 VMs; 9.6GB/s; $26/hr. 350 VMs; 28.7GB/s; $90/hr. 34 secs per terabyte. Amazon S3 provides near linear scalability. Chart axes: GB/Second, Reader Connections.
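The throughput figures on this slide can be sanity-checked with a little arithmetic. The sketch below assumes 1 TB = 1,000 GB and pro-rates the hourly price by the second; the function names are hypothetical. At 28.7 GB/s one terabyte takes about 35 seconds, in line with the quoted 34 secs, and per-VM throughput at 350 VMs is about 85% of that at 100 VMs.

```python
# Figures quoted on the slide: (VM count, aggregate GB/s, USD per hour).
RUNS = [(100, 9.6, 26.0), (350, 28.7, 90.0)]


def seconds_per_terabyte(gb_per_second: float, gb_per_tb: float = 1000.0) -> float:
    """Time to stream one terabyte at the given aggregate rate (1 TB = 1,000 GB assumed)."""
    return gb_per_tb / gb_per_second


def scaling_efficiency(small, large) -> float:
    """Per-VM throughput of the larger run relative to the smaller one (1.0 = perfectly linear)."""
    return (large[1] / large[0]) / (small[1] / small[0])


if __name__ == "__main__":
    for vms, rate, price in RUNS:
        secs = seconds_per_terabyte(rate)
        cost = price * secs / 3600  # pro-rated hourly price; ignores billing granularity
        print(f"{vms} VMs: {secs:.1f} s/TB, about ${cost:.2f} per TB streamed")
    print(f"scaling efficiency: {scaling_efficiency(*RUNS):.0%}")
```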
  12. 12. AWS Security Services Compute Storage AWS Global Infrastructure Database App Services Deployment & Administration Networking Analytics
  13. 13. IAM Users, AWS Directory Service, AD Connector, Direct Connect, Hardware VPN
  14. 14. Amazon Kinesis: Managed Service for Real Time Big Data Processing. Create Streams to Produce & Consume Data. Elastically Add and Remove Shards for Performance. Use the Kinesis Client Library to Process Data. Integration with S3, Redshift and DynamoDB. Application Services.
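To make the shard idea concrete: Kinesis maps each record's partition key to a 128-bit integer with MD5 and routes the record to the shard whose hash key range contains that value. The sketch below reproduces that routing locally. It assumes the hash space is split evenly across shards (real streams can have uneven ranges after splits and merges), and the function names are hypothetical, not SDK calls.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit integers


def shard_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Split the hash space into contiguous, inclusive ranges (an even split is assumed)."""
    if shard_count < 1:
        raise ValueError("need at least one shard")
    bounds = [HASH_SPACE * i // shard_count for i in range(shard_count + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(shard_count)]


def shard_for_key(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains the MD5 hash of the key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hashed <= end:
            return index
    raise RuntimeError("hash fell outside every range")  # unreachable if ranges cover the space


if __name__ == "__main__":
    ranges = shard_ranges(4)
    for key in ("account-1", "account-2", "account-3"):  # illustrative keys
        print(key, "-> shard", shard_for_key(key, ranges))
```

Because routing depends only on the key, all records with the same partition key land on the same shard, which is why adding shards raises throughput only when keys are well distributed.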
  15. 15. Data Sources → AWS Endpoint → Kinesis Streams (Shard 1, Shard 2, … Shard N, spread across three Availability Zones) → App.1 [Aggregate & De-Duplicate], App.2 [Metric Extraction], App.3 [Sliding Window Analysis], App.4 [Machine Learning] → S3, DynamoDB, Redshift
  16. 16. Kinesis Firehose: Capture and submit streaming data to Firehose, without writing an application or managing infrastructure. Firehose loads streaming data continuously into S3 and Redshift: batch, compress and encrypt in as little as 60 secs. Analyze streaming data using your favorite BI tools.
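The "batch, compress" behaviour described on this slide can be pictured as a buffer that flushes when it grows too large or too old. The following is a minimal local sketch of that idea, not Firehose's implementation: the class name, thresholds and newline-delimited layout are assumptions, encryption is left out, and the age check only runs when a new record arrives.

```python
import gzip
import time
from typing import Callable, List, Optional


class BufferedBatcher:
    """Hypothetical illustration: buffer records, flush as one gzip blob on size or age."""

    def __init__(self, max_bytes: int = 1_000_000, max_age_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_bytes = max_bytes
        self.max_age_seconds = max_age_seconds
        self._clock = clock
        self._records: List[bytes] = []
        self._size = 0
        self._opened_at: Optional[float] = None

    def add(self, record: bytes) -> Optional[bytes]:
        """Buffer one record; return a compressed batch if a flush threshold is reached."""
        if self._opened_at is None:
            self._opened_at = self._clock()
        self._records.append(record)
        self._size += len(record)
        too_big = self._size >= self.max_bytes
        too_old = self._clock() - self._opened_at >= self.max_age_seconds
        return self.flush() if (too_big or too_old) else None

    def flush(self) -> Optional[bytes]:
        """Compress and return everything buffered so far, or None if the buffer is empty."""
        if not self._records:
            return None
        blob = gzip.compress(b"\n".join(self._records) + b"\n")
        self._records, self._size, self._opened_at = [], 0, None
        return blob
```

A caller would write each returned blob to its destination (for example one object per batch); the 60-second default mirrors the figure on the slide.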
  17. 17. Traditional Business Intelligence … OLAP … Data Sources for ML
  18. 18. Relational Database Service: Managed Database-as-a-Service. No need to install or manage database instances. Automated Backup/Recover, Patching & Upgrade. Scalable and fault tolerant configurations. 6TB & 30,000 IOPS. Managed Database: RDS, DynamoDB, Redshift, ElastiCache.
  19. 19. Redshift: Managed, Massively Parallel, Petabyte Scale Data Warehouse. Streaming Backup/Restore to S3. Load data from S3, DynamoDB and EMR. Extensive Security Features. Scale from 160 GB to 1.6 PB Online. Managed Data Warehouse: RDS, DynamoDB, Redshift, ElastiCache.
  20. 20. Redshift lets you start small and grow big. Extra Large Node (dc1.xl & ds2.xl): 3 spindles, 15-30GiB RAM, 2 or 4 virtual cores, 10GigE. Single Node (160GB SSD or 2TB Magnetic). Cluster 2-32 Nodes (320GB SSD – 64TB Magnetic). 8 Extra Large Node (dc1.8xl & ds2.8xl): 24 spindles, 120-244GiB RAM, 2.56TB SSD or 16TB Magnetic, 16 or 32 virtual cores, 10GigE. Cluster 2-100 Nodes (5TB SSD – 1.6PB Magnetic).
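The cluster sizes on this slide follow from per-node storage multiplied by node count. The sketch below reproduces that arithmetic using the node names and limits as written on the slide; decimal units (1 TB = 1,000 GB) and the helper name are assumptions.

```python
# Per-node storage in GB as quoted on the slide (node names as written there).
NODE_STORAGE_GB = {
    "dc1.xl": 160,      # SSD
    "ds2.xl": 2_000,    # magnetic
    "dc1.8xl": 2_560,   # SSD
    "ds2.8xl": 16_000,  # magnetic
}

# Multi-node cluster size limits quoted on the slide: (minimum, maximum) nodes.
NODE_LIMITS = {
    "dc1.xl": (2, 32),
    "ds2.xl": (2, 32),
    "dc1.8xl": (2, 100),
    "ds2.8xl": (2, 100),
}


def cluster_storage_gb(node_type: str, nodes: int) -> int:
    """Total storage of a multi-node cluster, in GB."""
    low, high = NODE_LIMITS[node_type]
    if not low <= nodes <= high:
        raise ValueError(f"{node_type} clusters take {low} to {high} nodes, got {nodes}")
    return NODE_STORAGE_GB[node_type] * nodes


if __name__ == "__main__":
    for node_type, (low, high) in NODE_LIMITS.items():
        smallest = cluster_storage_gb(node_type, low)
        largest = cluster_storage_gb(node_type, high)
        print(f"{node_type}: {smallest:,} GB to {largest:,} GB")
```

The endpoints match the slide: 2 × 160 GB = 320 GB, 32 × 2 TB = 64 TB, 2 × 2.56 TB is roughly 5 TB, and 100 × 16 TB = 1.6 PB.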
  21. 21. LEADING INDEX PROVIDER WITH 41,000+ INDEXES ACROSS ASSET CLASSES AND GEOGRAPHIES. Over 10,000 Corporate Clients in 60 countries. Our technology powers over 70 MARKETPLACES, regulators, CSDs and clearing-houses in over 50 COUNTRIES. 100+ DATA PRODUCT OFFERINGS supporting 2.5+ million investment professionals and users IN 98 COUNTRIES. 26 Markets, 3 Clearing Houses, 5 Central Securities Depositories. Lists more than 3,500 companies in 35 countries, representing more than $8.8 trillion in total market value.
  22. 22. Exploratory Analytics … Data Cleansing … Advanced Data Science
  23. 23. Elastic MapReduce: Managed, elastic Hadoop (1.x & 2.x) cluster. Integrates with S3, DynamoDB and Redshift. Install End User Tools Automatically (Spark, Presto, Impala). Support for EC2 Spot Instances. Transient or Always on Clusters. Managed Big Data: Elastic MapReduce.
  24. 24. Choose your instance types: try different configurations to find the optimal cost/performance balance. CPU: c4 family, cc2.8xlarge, d2 family. Memory: m2 family, r3 family. Disk/IO: d2 family, i2 family. General: m3 family. Workloads: ETL, Machine Learning, Spark, HDFS.
  25. 25. Weather Insurance for Farms. Challenge: Volatile weather is deadly to crops like grapes. 60 years of crop data; 200 TB of S3 Data; 1M government Doppler radar points. Solution: Built a predictive model based on freely available data: 150B Soil Observations; 850K Precision Rainfall Grids Tracked; 3M Daily Weather Measurements. 50 EMR clusters process new data as it comes into S3 each day, continuously updating the model.
  26. 26. $10-20M Savings by moving Platform to AWS
  27. 27. Predictive Analytics …
  28. 28. Introducing Amazon Machine Learning. Easily create machine learning models. Visualize and optimize models. Put models into production in seconds. Battle-hardened technology. Machine Learning, Software Development.
  29. 29. Developing with Amazon Machine Learning: (1) Build model, (2) Validate & optimize, (3) Make predictions
  30. 30. Building a Predictive Model. Use existing data in S3, Redshift and RDS. Automatic data visualization & exploration: descriptive and summary statistics. Your data doesn’t have to be perfect: missing data, malformed data records, type validation.
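To show what "summary statistics on imperfect data" can look like in practice, here is a small standalone sketch. It is not how Amazon Machine Learning computes its statistics; it only illustrates the idea of tolerating missing and malformed values while reporting how many were skipped. The function names and the sample column are made up for illustration.

```python
import math
import statistics
from typing import Iterable, Optional


def to_float(raw: Optional[str]) -> Optional[float]:
    """Parse one field; return None for missing, malformed or non-finite values."""
    if raw is None or raw.strip() == "":
        return None
    try:
        value = float(raw)
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def summarize(column: Iterable[Optional[str]]) -> dict:
    """Descriptive statistics for a numeric column, counting rows that could not be used."""
    values = list(column)
    numbers = [n for n in (to_float(v) for v in values) if n is not None]
    summary = {
        "rows": len(values),
        "valid": len(numbers),
        "invalid_or_missing": len(values) - len(numbers),
    }
    if numbers:
        summary.update(
            min=min(numbers),
            max=max(numbers),
            mean=statistics.fmean(numbers),
            median=statistics.median(numbers),
        )
    return summary


if __name__ == "__main__":
    # Illustrative column with an empty field, a missing value and a malformed record.
    print(summarize(["1.5", "", None, "abc", "2.5"]))
```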
  31. 31. Model Validation and Optimization Tools
  32. 32. Making Predictions. Batch predictions: asynchronous predictions with trained model. Real time predictions: synchronous, low latency, high throughput. Mount API end-point with a single click.
  33. 33. Data Visualization …
  34. 34. Old-guard BI. Costs Too Much: pay $ million before seeing first analysis; 3 year TCO $150 to $250 per user per month. Takes Too Long: spend 6 to 12 months of consulting and SW implementation time.
  35. 35. A very fast, cloud-powered BI service for 1/10th the cost of old-guard BI software
  36. 36. $9 per user per month, with 1 year commitment
  37. 37. Business user: Sign-in. First analysis in about 60 seconds. Register for preview beginning Oct 7 at aws.amazon.com/quicksight
  38. 38. Business User → QuickSight UI (Mobile Devices, Web Browsers). Business User → QuickSight API (Partner BI products). QuickSight components: Data Prep, Metadata, Suggestions, Connectors, SPICE. Data sources: Amazon S3, Amazon Kinesis, Amazon DynamoDB, Amazon EMR, Amazon Redshift, Amazon RDS, Files, Third-party.
  39. 39. Native mobile experience: iOS, Android. Full experience on tablets. Consumption experience on smart phones. Very fast response.
  40. 40. $9 per user per month; $18 per user per month
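The "1/10th the cost" claim can be checked against the figures quoted on the slides: $9 and $18 per user per month versus an old-guard 3 year TCO of $150 to $250 per user per month. The sketch below only divides those numbers; it assumes the two sets of figures are directly comparable, which the slides do not spell out.

```python
OLD_GUARD_TCO = (150.0, 250.0)    # USD per user per month, range quoted on the slides
QUICKSIGHT_PRICES = (9.0, 18.0)   # USD per user per month, the two price points quoted


def cost_ratio(new_price: float, old_price: float) -> float:
    """New price as a fraction of the old price."""
    return new_price / old_price


if __name__ == "__main__":
    low_tco, high_tco = OLD_GUARD_TCO
    for price in QUICKSIGHT_PRICES:
        best = cost_ratio(price, high_tco)   # compared with the most expensive old-guard case
        worst = cost_ratio(price, low_tco)   # compared with the cheapest old-guard case
        print(f"${price:.0f}/user/month is {best:.1%} to {worst:.1%} of old-guard TCO")
```

At $9 the ratio is between 3.6% and 6%, comfortably under one tenth; at $18 it ranges from 7.2% to 12%.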
  41. 41. Integrated Analytics
  42. 42. Pipeline stages: Receive, Ingest, Validate, Process, Output. Data states: Raw data, Common format, Validated, Processed, Output format. Receive: Get data; Manage input queue; Manage receive rules; Assign dataset identifier; Assign dataset metadata; Store original data; Store event details. Also shown: Receive rules, Receive audit log, Original data store, Data service endpoints. Ingest: Manage ingestion rules; Split data into records; Assign record identifiers; Output records; Store event details – rule, stamp etc; Assign record metadata; Check record format; Transform to common format. Also shown: Ingestion rules, Ingestion audit log. Validate: Validate records, recordsets or datasets; Store validation status; Manage validation rules. Also shown: Abide data store, Validation rules, Validation results / log. Process: Fetch data set; Perform calculations; Save datasets; Re-validate; Store event details; Manage processing rules. Also shown: Processing audit log, Processing rules. Output: Format data; Check data; Store event details; Manage output rules; Send output. Also shown: Output audit log, Output rules, Data service endpoints. Legend: Storage, Service endpoint, Function, Rules; Data path, Events & logic, Optional data path; Raw data*, With dataset metadata* (* Visio 2013 only). © Abide Financial
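The five stages on this slide (Receive, Ingest, Validate, Process, Output) can be sketched as a chain of functions that each advance the data state and append to an audit log. Everything below is hypothetical: the record layout (name, amount), the validation rule and the output format are invented for illustration and are not taken from the Abide Financial design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Dataset:
    dataset_id: str
    raw: str                                            # original data, kept unchanged
    records: List[Dict[str, str]] = field(default_factory=list)
    results: Dict[str, float] = field(default_factory=dict)
    state: str = "raw data"
    audit_log: List[str] = field(default_factory=list)


def receive(dataset_id: str, raw: str) -> Dataset:
    """Assign a dataset identifier and keep the original data."""
    ds = Dataset(dataset_id=dataset_id, raw=raw)
    ds.audit_log.append("receive: stored original data")
    return ds


def ingest(ds: Dataset) -> Dataset:
    """Split into records, assign record identifiers, transform to a common format."""
    for number, line in enumerate(ds.raw.strip().splitlines(), start=1):
        name, _, amount = line.partition(",")
        ds.records.append({"record_id": f"{ds.dataset_id}-{number}",
                           "name": name.strip(), "amount": amount.strip()})
    ds.state = "common format"
    ds.audit_log.append(f"ingest: {len(ds.records)} records")
    return ds


def validate(ds: Dataset) -> Dataset:
    """Store a validation status on each record (rule assumed: amount must be numeric)."""
    for record in ds.records:
        try:
            float(record["amount"])
            record["valid"] = "yes"
        except ValueError:
            record["valid"] = "no"
    ds.state = "validated"
    ds.audit_log.append("validate: status stored")
    return ds


def process(ds: Dataset) -> Dataset:
    """Perform a calculation over valid records only."""
    ds.results["total"] = sum(float(r["amount"]) for r in ds.records if r["valid"] == "yes")
    ds.state = "processed"
    ds.audit_log.append("process: total calculated")
    return ds


def output(ds: Dataset) -> str:
    """Format the result for sending."""
    ds.state = "output format"
    ds.audit_log.append("output: formatted")
    return f"{ds.dataset_id},total={ds.results['total']:.2f}"


def run_pipeline(dataset_id: str, raw: str) -> Tuple[str, Dataset]:
    ds = receive(dataset_id, raw)
    for stage in (ingest, validate, process):
        ds = stage(ds)
    return output(ds), ds
```

Calling `run_pipeline("d1", "a, 10\nb, oops\nc, 2.5\n")` carries one malformed record through to the validated state without failing the run, and the audit log records one entry per stage.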
  43. 43. Thank You!
