Big Data and HPC for AWS Brasil Summit

Presentations from the AWS Summit São Paulo 2014. Download the content prepared by our specialists to help you on your journey to the cloud.

Published in: Business

  1. 1. © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Big Data and High Performance Computing Solutions in the AWS Cloud. Michel Pereira, Enterprise Solutions Architect. May 27, 2014
  2. 2. Big Data HPC Customer Success Story Getting Started on AWS What we’ll cover today…
  3. 3. Big Data HPC Customer Success Story Getting Started on AWS What we’ll cover today…
  4. 4. Generation Collection & storage Analytics & computation Collaboration & sharing
  5. 5. Generation Collection & storage Analytics & computation Collaboration & sharing
  6. 6. Big Data: Unconstrained data growth (GB, TB, PB, EB, ZB). 95% of the 1.2 zettabytes of data in the digital universe is unstructured. 70% of this is user-generated content. Unstructured data growth is explosive, with estimates of compound annual growth rate (CAGR) at 62% from 2008 to 2012. Source: IDC
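
To make the growth figure concrete, here is a minimal sketch of what a 62% CAGR implies when compounded over the 2008 to 2012 window quoted on the slide. It assumes four annual compounding periods; the function name is illustrative only.

```python
# Compound growth implied by a 62% CAGR between 2008 and 2012 (figures from the slide).
CAGR = 0.62
START_YEAR, END_YEAR = 2008, 2012


def growth_factor(rate: float, years: int) -> float:
    """Return the total multiplier after compounding `rate` for `years` periods."""
    return (1.0 + rate) ** years


years = END_YEAR - START_YEAR  # assumption: four annual compounding periods
factor = growth_factor(CAGR, years)
print(f"{years} years at {CAGR:.0%} CAGR -> x{factor:.2f}")
```

Under that assumption, the volume of unstructured data grows to nearly seven times its starting size over the period.
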
  7. 7. Lower cost, higher throughput Generation Collection & storage Analytics & computation Collaboration & sharing
  8. 8. Customer segmentation Marketing spend optimization Financial modeling & forecasting Ad targeting & real time bidding Clickstream analysis Fraud detection Use Cases
  9. 9. Visits, views, clicks, purchases Source, device, location, time Latency, throughput, uptime Likes, shares, friends, follows Price, frequency Metrics
  10. 10. Relational NoSQL Web servers Mobile phones Tablets 3rd party feeds Sources
  11. 11. Structured Unstructured Text Binary Near Real-time Batched Formats
  12. 12. Reporting Dashboards Sentiment Clustering Machine Learning Optimization Analysis
  13. 13. Lower cost, higher throughput Highly constrained Generation Collection & storage Analytics & computation Collaboration & sharing
  14. 14. Data volume: generated data vs. data available for analysis. Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
  15. 15. Elastic and highly scalable + No upfront capital expense + Only pay for what you use + Available on-demand = Remove constraints
  16. 16. Accelerated Generation Collection & storage Analytics & computation Collaboration & sharing
  17. 17. Technologies and techniques for working productively with data, at any scale. Big Data
  18. 18. Big data and AWS cloud computing Big data Cloud computing Variety, volume, and velocity requiring new tools Variety of compute, storage, and networking options
  19. 19. Big data and AWS cloud computing Big data Cloud computing Potentially massive datasets Massive, virtually unlimited capacity
  20. 20. Big data and AWS cloud computing Big data Cloud computing Iterative, experimental style of data manipulation and analysis Iterative, experimental style of infrastructure deployment/usage
  21. 21. Big data and AWS cloud computing Big data Cloud computing Frequently not a steady-state workload; peaks and valleys At its most efficient with highly variable workloads
  22. 22. Big data and AWS cloud computing Big data Cloud computing Absolute performance not as critical as “time to results”; shared resources are a bottleneck Parallel compute projects allow each workgroup to have more autonomy, get faster results
  23. 23. Ease of use. Lower costs.
  24. 24. Lower costs: no capital investment, pay as you go, no subscriptions, only pay for what you use.
  25. 25. Ease of use: programmable, zero admin, easy to configure, integrate with existing tools.
  26. 26. One tool to rule them all
  27. 27. Use the right tools Amazon S3 Amazon Kinesis Amazon DynamoDB Amazon Redshift Amazon Elastic MapReduce
  28. 28. Store anything Object storage Scalable 99.999999999% durability Amazon S3
  29. 29. Real-time processing High throughput; elastic Easy to use EMR, S3, Redshift, DynamoDB Integrations Amazon Kinesis
  30. 30. NoSQL Database Seamless scalability Zero admin Single digit millisecond latency Amazon DynamoDB
  31. 31. Relational data warehouse Massively parallel Petabyte scale Fully managed $1,000/TB/Year Amazon Redshift
  32. 32. Hadoop/HDFS clusters Hive, Pig, Impala, Hbase Easy to use; fully managed On-demand and spot pricing Tight integration with S3, DynamoDB, and Kinesis Amazon Elastic MapReduce
  33. 33. Data Sources; AWS Data Pipeline; Amazon S3; Amazon DynamoDB; Amazon RDS; Amazon Kinesis; Amazon Redshift; Amazon EMR; HDFS; Analytics languages; Data management
  34. 34. Generation Collection & storage Analytics & computation Collaboration & sharing
  35. 35. Generation Collection & storage Analytics & computation Collaboration & sharing Amazon Glacier S3 Amazon DynamoDB Amazon RDS Amazon Redshift AWS Direct Connect AWS Storage Gateway AWS Import/ Export Amazon Kinesis Amazon EMR
  36. 36. Generation Collection & storage Analytics & computation Collaboration & sharing Amazon EC2 Amazon EMR Amazon Kinesis
  37. 37. Generation Collection & storage Analytics & computation Collaboration & sharing Amazon CloudFront AWS CloudFormation S3 Amazon DynamoDB Amazon RDS Amazon Redshift Amazon EC2 Amazon EMR AWS Data Pipeline
  38. 38. The right tools. At the right scale. At the right time.
  39. 39. Big Data HPC Customer Success Story Getting Started on AWS What we’ll cover today…
  40. 40. AWS Customer Success Story. Victor Oliveira, Director of Engineering, Concrete Solutions. Marcos Prete, Partnerships Manager, SAS
  41. 41. The Power to Know. The Company: Worldwide • World leader in analytics: turning data into strategic information, faster decisions, anticipating opportunities • Founded in 1976 • Headquartered in Cary, North Carolina • 14,000 employees worldwide • 134 countries, 400 offices • Great Place to Work: 1st place in the 2010, 2011, and 2012 rankings
  42. 42. The Power to Know. The Company: Brazil • Operating since 1996 • 180+ clients • Offices in SP, RJ, and DF • 140+ employees • Top Employers certification in 2012 and 2013. Products are offered under a license model, but there is latent demand for software delivered as a service (SaaS)
  43. 43. The Power to Know. The SAS Challenge • Lower operating costs for its customers • Acquire and manage physical servers • Simplify the sale (from license to SaaS) • Offer a complete solution • Lower entry costs for its customers
  44. 44. The Power to Know. Approach • Big Data • The product already exists! • Business evolution • Value proposition • Leverage AWS IaaS • Intelligent partnership • Concrete Solutions and SAS
  45. 45. The Power to Know. The Product: Visual Analytics • A first for SaaS in Brazil. • The tool benefits departments that need to: make fast decisions based on a large volume and variety of data (Big Data); simplify the analysis of their business indicators • Easy, fast delivery at a lower cost than the traditional model. • The customer will not need to manage multiple providers or maintain an internal structure to support the application.
  46. 46. The Power to Know. Introduction to Visual Analytics. Dashboards and Scorecards; Corporate Reports; Dynamic and Ad Hoc Analyses; Advanced Analyses and Data Mining; Mobile Apps, Information Distribution, and Alerts • Ad Hoc Analysis • Predictive Analysis • Data Mining • Visual Exploration • Slice & Dice Investigative Analysis • Root Cause Determination • Page-perfect Operational Reporting • Pixel-perfect Business Reporting • Print-perfect Statements & Invoices • Dynamic Dashboards • Operational Scorecards • Metrics Management • Mobile Applications • Massive Information Distribution • iPad, iPhone, email • Exception-based Alerts
  47. 47. The Power to Know. AWS and Benefits • Capacity flexibility • Cash flow planning • Scalability and agility at low cost • Payment flexibility • Fewer staff needed to manage the application • Improved cash flow. Complete Solution: Services (Installation, Support, Training, Data Loading); Software (SAS Visual Analytics); Managed Infrastructure (AWS and Concrete)
  48. 48. The Power to Know. Traditional BI vs. Data Exploration Environment
  49. 49. The Power to Know. Thank you! For more information, visit us at the Concrete booth. Marcos Prete, Alliances Manager, SAS Brasil, marcos.prete@sas.com. Victor Oliveira, Director of Engineering, victor.oliveira@concretesolutions.com.br, @v_oliv
  50. 50. Big Data HPC Customer Success Story Getting Started on AWS What we’ll cover today…
  51. 51. Take a typical big computation task…
  52. 52. …for which an average cluster is too small (or simply takes too long to complete)…
  53. 53. …optimization of algorithms can give some leverage…
  54. 54. …and complete the task in hand…
  55. 55. Applying a large cluster…
  56. 56. …can sometimes be overkill and too expensive
  57. 57. AWS instance clusters can be balanced to the job in hand…
  58. 58. …neither too large…
  59. 59. …nor too small…
  60. 60. …with multiple clusters running at the same time
  61. 61.   Why AWS for HPC? Low cost with flexible pricing Efficient clusters Unlimited infrastructure Faster time to results Concurrent Clusters on-demand Increased collaboration
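
The "concurrent clusters on-demand" and "faster time to results" points can be illustrated with a little arithmetic. The sketch below assumes a batch of independent, equal-length jobs, linear per-hour billing, and no cluster start-up overhead; all workload numbers are hypothetical and do not come from the deck.

```python
import math


def wall_clock_hours(jobs: int, hours_per_job: float, clusters: int) -> float:
    """Elapsed time when `clusters` identical clusters each run one job at a time."""
    return math.ceil(jobs / clusters) * hours_per_job


def instance_hours(jobs: int, hours_per_job: float, nodes_per_cluster: int) -> float:
    """Billable instance-hours; independent of how many clusters run concurrently."""
    return jobs * hours_per_job * nodes_per_cluster


# Hypothetical workload: 24 independent jobs, 2 hours each, on 8-node clusters.
JOBS, HOURS, NODES = 24, 2.0, 8
for clusters in (1, 4, 24):
    elapsed = wall_clock_hours(JOBS, HOURS, clusters)
    billed = instance_hours(JOBS, HOURS, NODES)
    print(f"{clusters:>2} cluster(s): {elapsed:5.1f} h elapsed, {billed:.0f} instance-hours")
```

Under these simplifying assumptions the billed instance-hours stay constant while elapsed time shrinks as more clusters run side by side, which is the trade-off a fixed on-premises cluster cannot offer.
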
  62. 62. AWS High Performance Computing: Cluster compute instances. Implement HVM process execution. Intel® Xeon® processors. 10 Gigabit Ethernet; C3 has Enhanced Networking (SR-IOV). cc2.8xlarge: 32 vCPUs, 2.6 GHz Intel Xeon E5-2670 (Sandy Bridge), 60.5 GB RAM, 4 x 840 GB local HDD. c3.8xlarge: 32 vCPUs, 2.8 GHz Intel Xeon E5-2680v2 (Ivy Bridge), 60 GB RAM, 2 x 320 GB local SSD
  63. 63. Top 500 supercomputer using Amazon EC2: 64th fastest supercomputer, Nov 2013. 26,496 Intel® Xeon® cores. Linpack Performance (Rmax) 484.2 TFlop/s. Theoretical (Rpeak) 593.5 TFlop/s. Built from c3.8xlarge instances: 32 vCPUs, 2.8 GHz Intel Xeon E5-2680v2 (Ivy Bridge), 60 GB RAM, 2 x 320 GB local SSD
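
The Rmax and Rpeak figures can be turned into two derived numbers that are commonly quoted for Linpack runs: efficiency (Rmax divided by Rpeak) and sustained throughput per core. This is a small sketch using only the values on the slide; the variable names are illustrative.

```python
RMAX_TFLOPS = 484.2   # measured Linpack performance, from the slide
RPEAK_TFLOPS = 593.5  # theoretical peak, from the slide
CORES = 26_496        # Intel Xeon cores, from the slide

# Fraction of theoretical peak actually sustained on the benchmark.
efficiency = RMAX_TFLOPS / RPEAK_TFLOPS

# Sustained throughput per core, converted from TFlop/s to GFlop/s.
gflops_per_core = RMAX_TFLOPS * 1000 / CORES

print(f"Linpack efficiency: {efficiency:.1%}")
print(f"Sustained GFlop/s per core: {gflops_per_core:.2f}")
```
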
  64. 64. AWS High Performance Computing: Network placement groups. Cluster instances deployed in a Placement Group enjoy low latency and full-bisection 10 Gbps bandwidth
  65. 65. AWS High Performance Computing: GPU compute instances. cg1.4xlarge: Intel® Xeon® X5570, 33.5 ECUs, 22.5 GB RAM, 2 x NVIDIA GPU (448 cores, 3 GB memory each). g2.2xlarge: Intel® Xeon® E5-2670, 8 vCPUs, 15 GB RAM, 1 x NVIDIA GPU (1536 cores, 4 GB memory). G2 instances: 1 NVIDIA Kepler GK104 GPU; I/O performance: Very High (10 Gigabit Ethernet). CG1 instances: 2 x NVIDIA Tesla “Fermi” M2050 GPUs; I/O performance: Very High (10 Gigabit Ethernet)
  66. 66. HPC Partners and Apps
  67. 67. Making production cloud HPC easy, from 64 cores to 156,314 cores. Pharma: Johnson & Johnson. Manufacturing: HGST, a Western Digital Company. Financial Services: Pacific Life Insurance. Genomics: Life Technologies. Research: The Aerospace Corporation. 156,314 cores for better solar panel materials for $33k, not $68M. Amazon EC2: 16,788 Spot Instances. Amazon S3: 4 TB processed. Spot Instances in all 8 Regions. 1.21 PetaFLOPS. Intel Sandy Bridge on CC2
  68. 68. Big Data HPC Customer Success Story Getting Started on AWS What we’ll cover today…
  69. 69. AWS Customer Success Story. Sergio Mafra, IT Innovation Lead, ONS (Operador Nacional do Sistema Elétrico, the National Electric System Operator)
  70. 70. • The Operador Nacional do Sistema Elétrico (ONS) is a private company responsible for planning and operating electricity generation and transmission in the Sistema Interligado Nacional (SIN, the National Interconnected System). • With about 800 employees in 5 locations (Rio de Janeiro, Recife, Florianópolis, and Brasília), ONS is an information-intensive company that continuously uses mathematical models requiring HPC (High Performance Computing) and Big Data. “Amazon Web Services lets us provision high-performance clusters in minutes, significantly reducing total processing time.” “With that, we realized that AWS turns High Performance Computers into High Performance Customers.” - Sérgio Mafra
  71. 71. SIN: the Brazilian Electric System. The SIN serves 98% of Brazil's electricity consumption. A predominantly hydroelectric model with large reservoirs and large interconnections. Isolated systems (Amazônia Legal): 2% of the market, predominantly thermal, 300+ isolated locations
  72. 72. The Challenge • Provide ONS with a platform with greater processing capacity, reducing the time to solve the mathematical models, at a cost proportional to usage time, with easy management of the cluster environment, and transparent to the organization. • Enable “time-to-market” for the IT team, retaining the know-how and the responsiveness to unexpected demands from across the organization. “Scotty, We Need More Power”
  73. 73. Benefits achieved • About a 40% reduction in the time to solve the electro-energy planning mathematical models, at 30% lower cost. • Ability to analyze 5 strategies for using the Newave/Decomp models in record time (1 week), running 600 cases. The on-premises time would have been 3 weeks, incompatible with the commitment agreed with the MME. Diagram labels: Virtual Private Cloud; Work; Controller; Internet/AWS; 10.24.0.0/24; 10.24.1.0/24; 10.21.0.0/16
  74. 74. Benefits achieved • “Wow... from 40 minutes to 4 minutes!!!!” • “Now I will use all the calculation parameters to get a more complete study” • “Launch 4 x 80 right now!!!” • “Thank you for letting me leave 2 hours early. All the cases have already run” • “We ran the study in 2 minutes. The system can go into operation and will become an international success story”
  75. 75. Synchronized Phasor Measurement System (Sistema de Medição Sincronizada de Fasores, SMSF). PDC
  76. 76. Annual SMSF storage • 2013: 8.5 TB • 2015: 70 TB • 2018: 120 TB • 2022: 312 TB. Big Data. Estimated collection for only 7 measured quantities. Reference: total storage volume of the Rio data center in 2013
  77. 77. Architecture UNDER STUDY (diagram labels): PMUs; OpenPDC; Collector; Hadoop cluster with a Master and Nodes 1, 2, 3, ... N, each on HDFS; Controller; Processing; Analytics; S3 (storage); Glacier (historian); History: 1 Tb
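
The three address ranges shown in the network diagram can be sanity-checked with Python's standard `ipaddress` module. This is only a sketch: the slide does not say which range belongs to which network, so the dictionary keys below are hypothetical labels, not the actual ONS layout.

```python
import ipaddress

# CIDR blocks as they appear on the slide; the role labels are assumptions.
blocks = {
    "subnet_a": ipaddress.ip_network("10.24.0.0/24"),
    "subnet_b": ipaddress.ip_network("10.24.1.0/24"),
    "other_network": ipaddress.ip_network("10.21.0.0/16"),
}

for name, net in blocks.items():
    print(f"{name}: {net} ({net.num_addresses} addresses)")

# Networks that must route to each other need non-overlapping ranges.
names = list(blocks)
overlaps = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if blocks[a].overlaps(blocks[b])
]
print("overlapping pairs:", overlaps)
```

None of the three ranges overlap, which is what one would expect if a cloud network and an existing corporate network are meant to be connected.
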
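
The SMSF storage estimates (8.5 TB in 2013, 70 TB in 2015, 120 TB in 2018, 312 TB in 2022) imply a steep growth rate. The sketch below derives the implied compound annual growth rate for the whole period and for each interval; it assumes smooth compounding between the quoted years, which the slide does not state.

```python
# Annual SMSF storage estimates from the slide, in TB.
storage_tb = {2013: 8.5, 2015: 70.0, 2018: 120.0, 2022: 312.0}


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that takes start_value to end_value in `years`."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


years_sorted = sorted(storage_tb)
first, last = years_sorted[0], years_sorted[-1]
overall = implied_cagr(storage_tb[first], storage_tb[last], last - first)
print(f"{first}-{last}: {overall:.0%} per year")

# Growth rate for each consecutive pair of quoted years.
for a, b in zip(years_sorted, years_sorted[1:]):
    rate = implied_cagr(storage_tb[a], storage_tb[b], b - a)
    print(f"{a}-{b}: {rate:.0%} per year")
```
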
  78. 78. Big Data HPC Customer Success Story Getting Started on AWS What we’ll cover today…
  79. 79. Solution Architects Professional Services Premium Support AWS Partner Network (APN) AWS is here to help
  80. 80. AWS Architecture Diagrams https://aws.amazon.com/architecture/ Processing large amounts of parallel data using a scalable cluster Use commonly-available cluster scheduling tools, such as Grid Engine or Condor
  81. 81. Big Data Case Studies: Learn from other AWS customers https://aws.amazon.com/solutions/case-studies/big-data
  82. 82. AWS Online Software Store https://aws.amazon.com/marketplace AWS Marketplace
  83. 83. AWS Public Data Sets: Free access to big data sets https://aws.amazon.com/publicdatasets
  84. 84. AWS Big Data Test Drives: APN Partner-provided labs https://aws.amazon.com/testdrive/bigdata
  85. 85. Webinars, Bootcamps, and Self-Paced Labs https://aws.amazon.com/training AWS Training & Events https://aws.amazon.com/events
  86. 86. Big Data to AWS: Brand new course on Big Data https://aws.amazon.com/training/course-descriptions/bigdata/
  87. 87. © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. https://aws.amazon.com/big-data https://aws.amazon.com/hpc
