AWS Summit Barcelona - Data Analysis on AWS

Usage Rights

© All Rights Reserved

AWS Summit Barcelona - Data Analysis on AWS: Presentation Transcript

  • AWS Summit 2013 Barcelona, Oct 24 – Barcelona, Spain. DATA ANALYSIS ON AWS. Carlos Conde, Sr. Mgr. Solutions Architecture
  • GENERATE → STORE → ANALYZE → SHARE
  • THE COST OF DATA GENERATION IS FALLING
  • THE MORE DATA YOU COLLECT THE MORE VALUE YOU CAN DERIVE FROM IT
  • Lower cost, higher throughput: GENERATE → STORE → ANALYZE → SHARE
  • Lower cost, higher throughput: GENERATE → STORE → ANALYZE → SHARE. Highly constrained.
  • DATA VOLUME (chart: generated data vs. data available for analysis). Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
  • GENERATE → STORE → ANALYZE → SHARE
  • ACCELERATE: GENERATE → STORE → ANALYZE → SHARE
  • + ELASTIC AND HIGHLY SCALABLE + NO UPFRONT CAPITAL EXPENSE + ONLY PAY FOR WHAT YOU USE + AVAILABLE ON-DEMAND = REMOVE CONSTRAINTS
  • GENERATE → STORE → ANALYZE → SHARE
  • AWS Import/Export, AWS Direct Connect: GENERATE → STORE → ANALYZE → SHARE
  • Generated and stored in AWS; inbound data transfer is free; multipart upload to S3; physical media; AWS Direct Connect; regional replication of AMIs and snapshots
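The multipart upload to S3 mentioned on this slide splits a large object into parts that can be uploaded independently. Below is a minimal sketch of how the byte ranges could be planned, assuming the S3 limits of a 5 MiB minimum part size (the last part may be smaller) and at most 10,000 parts per upload. The function name `plan_parts` and the 64 MiB default are hypothetical choices for illustration, not part of any AWS API, and the upload calls themselves are not shown.

```python
MIN_PART = 5 * 1024 * 1024   # S3 minimum part size (the last part may be smaller)
MAX_PARTS = 10_000           # S3 maximum number of parts per multipart upload


def plan_parts(total_size, part_size=64 * 1024 * 1024):
    """Return (part_number, offset, length) tuples that cover total_size bytes."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    part_size = max(part_size, MIN_PART)
    # Grow the part size if the object would otherwise need more than MAX_PARTS parts.
    smallest_allowed = -(-total_size // MAX_PARTS)  # ceiling division
    part_size = max(part_size, smallest_allowed)

    parts = []
    offset = 0
    number = 1  # S3 part numbers start at 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

Each tuple describes one part that a worker could read from the file and send on its own, which is what allows parts to be uploaded in parallel or retried individually.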
  • Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Redshift, AWS Storage Gateway, Data on Amazon EC2: GENERATE → STORE → ANALYZE → SHARE
  • AMAZON S3 SIMPLE STORAGE SERVICE
  • AMAZON DYNAMODB HIGH-PERFORMANCE, FULLY MANAGED NoSQL DATABASE SERVICE
  • DURABLE & AVAILABLE: CONSISTENT, DISK-ONLY WRITES (SSD)
  • LOW LATENCY: AVERAGE READS < 5 MS, WRITES < 10 MS
  • NO ADMINISTRATION
  • 500,000 WRITES PER SECOND DURING SUPER BOWL
  • AMAZON REDSHIFT: FULLY MANAGED, PETABYTE-SCALE DATA WAREHOUSE ON AWS
  • AMAZON REDSHIFT DESIGN OBJECTIVES: a petabyte-scale data warehouse service that was… a lot faster, a lot cheaper, a whole lot simpler
  • AMAZON REDSHIFT RUNS ON OPTIMIZED HARDWARE. HS1.8XL: 128 GB RAM, 16 cores, 16 TB compressed user storage, 2 GB/sec scan rate. HS1.XL: 16 GB RAM, 2 cores, 2 TB compressed customer storage.
  • 30 MINUTES DOWN TO 12 SECONDS
  • AMAZON REDSHIFT LETS YOU START SMALL AND GROW BIG. Extra Large Node (HS1.XL): single node (2 TB) or cluster of 2-32 nodes (4 TB – 64 TB). Eight Extra Large Node (HS1.8XL): cluster of 2-100 nodes (32 TB – 1.6 PB).
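The capacity ranges on this slide follow from multiplying the per-node storage by the node count (2 TB × 32 = 64 TB, 16 TB × 100 = 1.6 PB). As a rough sketch, the smallest cluster for a given volume of compressed data could be estimated as follows. The `nodes_needed` helper is hypothetical, it covers multi-node clusters only, and it ignores headroom for growth and maintenance.

```python
import math

# Figures from the slide: compressed storage per node and allowed cluster sizes.
NODE_TYPES = {
    "HS1.XL": {"tb_per_node": 2, "min_nodes": 2, "max_nodes": 32},
    "HS1.8XL": {"tb_per_node": 16, "min_nodes": 2, "max_nodes": 100},
}


def nodes_needed(data_tb, node_type):
    """Smallest cluster size (in nodes) whose storage holds data_tb terabytes."""
    spec = NODE_TYPES[node_type]
    nodes = max(spec["min_nodes"], math.ceil(data_tb / spec["tb_per_node"]))
    if nodes > spec["max_nodes"]:
        raise ValueError(f"{data_tb} TB does not fit in a {node_type} cluster")
    return nodes
```

For example, `nodes_needed(10, "HS1.XL")` gives a five-node cluster, while 100 TB only fits on HS1.8XL nodes.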
  • CREATE A DATA WAREHOUSE IN MINUTES
  • JDBC/ODBC
  • Pricing:
                           Price Per Hour for      Effective Hourly   Effective Annual
                           HS1.XL Single Node      Price Per TB       Price per TB
      On-Demand            $ 0.850                 $ 0.425            $ 3,723
      1 Year Reservation   $ 0.500                 $ 0.250            $ 2,190
      3 Year Reservation   $ 0.228                 $ 0.114            $ 999
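The effective prices on this slide can be reproduced with simple arithmetic. The sketch below assumes, based on the figures themselves, that the effective hourly price per TB is the node price divided by the 2 TB of an HS1.XL node, and that the annual price uses 8,760 hours per year. The names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365   # 8,760
STORAGE_TB = 2              # compressed storage of one HS1.XL node

# Hourly prices for an HS1.XL single node, as listed on the slide.
HOURLY_PRICE = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}


def effective_prices(hourly_price, storage_tb=STORAGE_TB):
    """Return (price per TB per hour, price per TB per year)."""
    per_tb_hour = hourly_price / storage_tb
    per_tb_year = per_tb_hour * HOURS_PER_YEAR
    return per_tb_hour, per_tb_year


for plan, price in HOURLY_PRICE.items():
    per_hour, per_year = effective_prices(price)
    print(f"{plan}: ${per_hour:.3f} per TB-hour, ${per_year:,.0f} per TB-year")
```

Rounded to whole dollars, the annual figures match the slide: $3,723, $2,190 and $999 per TB.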
  • DATA WAREHOUSING DONE THE AWS WAY: easy to provision and scale up massively; no upfront costs, pay as you go; really fast performance at a really low price; open and flexible with support for popular tools
  • USAGE SCENARIOS
  • S3 → EMR → Redshift → Reporting and BI
  • OLTP Web Apps → DynamoDB → Redshift → Reporting and BI
  • OLTP ERP → RDBMS → Redshift → Reporting & BI
  • OLTP ERP → RDBMS → Redshift + Reporting & BI
  • Social Point Analytics in AWS Marc Canaleta (CTO) @mcanaleta AWS Summit Barcelona 2013
  • Social games developer for mobile and Facebook. Founded in 2008, offices in Barcelona (22@), 170 people. Top #20 mobile grossing games worldwide. Top #3 Facebook developer.
  • Social games: interaction between friends, virality; freemium model: playing is free, some items are paid; midcore segment; leader in Breeding & Collecting strategy games
  • Top 20 Grossing on the iOS App Store worldwide; recently launched on Android, featured on Google Play; 6M DAU on Facebook
  • No hardware to maintain or plan: speeds up the business; flexible: pay per use; makes scaling easier: Auto Scaling; makes high availability easier: multiple availability zones; managed components: load balancers, databases, …
  • Analytics driven. Analytics are needed by almost all of our teams. Engineers: real-time analytics, monitoring, problem detection; Product: decision making, A/B testing, game balancing, …; Marketing: campaign optimization; Finance: business tracking
  • Architecture diagram: FLASH CLIENT, IOS CLIENT, ANDROID CLIENT → BACKEND SERVERS (Symfony 2) → ANALYTICS QUEUES (Redis) → LOGFILES STORAGE (AWS S3) → ANALYTICS DATABASE (AWS Redshift)
  • The backend writes events to Redis lists. Why Redis? Cost and performance: 10K events/second/server. Problem: it is an in-memory database, so the queues have to be drained constantly. Scaling and HA: N servers, with events distributed among them at random. Diagram: BACKEND → REDIS, REDIS, REDIS
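The write path on this slide (the backend appends each event to a list on one of N Redis servers picked at random) can be sketched without a running Redis. `InMemoryQueueServer` is a hypothetical stand-in that mimics only the list operations RPUSH and LPOP. The key name `analytics:events` and the JSON event format are also assumptions, not details from the talk.

```python
import json
import random
from collections import defaultdict, deque


class InMemoryQueueServer:
    """Stand-in for one Redis server; mimics only list RPUSH and LPOP."""

    def __init__(self):
        self._lists = defaultdict(deque)

    def rpush(self, key, value):
        """Append value to the tail of the list and return the new length."""
        self._lists[key].append(value)
        return len(self._lists[key])

    def lpop(self, key):
        """Remove and return the head of the list, or None if it is empty."""
        queue = self._lists[key]
        return queue.popleft() if queue else None


def publish_event(servers, event, key="analytics:events", rng=random):
    """Serialize the event and append it to the list on a randomly chosen server."""
    server = rng.choice(servers)
    server.rpush(key, json.dumps(event))
    return server
```

Because the server is picked at random, no single queue is a bottleneck and adding capacity means adding servers to the list. The cost is that events from different servers have no global ordering.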
  • EVENT GENERATION: Python processes constantly consume the queues and: compute real-time metrics; store event logfiles to upload them to S3; enqueue the URL of the S3 object in SQS. Diagram: Redis Queue → (LPOP event) → Consumer; Consumer → (INCR counter) → Redis Real Time; Consumer → (write event) → Event Log File → (put object) → Amazon S3; Consumer → (enqueue S3 object URL) → Amazon SQS → DATA LOADING
  • EVENT GENERATION: Python is well suited to writing workers and processing data; Redis: structures such as counters, sets and sorted sets for real-time metrics; S3: virtually unlimited space, scalable, highly available; SQS: reliability and availability, at a higher price than Redis. Diagram: same as the previous slide.
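The consumer described on these two slides can be sketched as one loop. All four external systems are replaced here by in-memory stand-ins (a deque for the Redis list, a `Counter` for Redis INCR, a dict for S3 and a list for SQS). The batch size, the bucket name and the object naming scheme are hypothetical. A real worker would block while waiting for new events instead of stopping when the queue is empty.

```python
import json
from collections import Counter, deque


def consume(queue, counters, uploaded, load_queue, batch_size=3):
    """Drain the queue, update counters and ship logfiles in batches.

    queue      -- deque of JSON strings (stand-in for a Redis list)
    counters   -- Counter (stand-in for real-time counters kept with INCR)
    uploaded   -- dict mapping object URL to logfile body (stand-in for S3)
    load_queue -- list of object URLs (stand-in for SQS)
    """
    lines = []

    def flush():
        if not lines:
            return
        # Hypothetical naming scheme for the uploaded logfile.
        url = f"s3://example-bucket/events/{len(uploaded):06d}.log"
        uploaded[url] = "\n".join(lines) + "\n"   # put object
        load_queue.append(url)                    # enqueue S3 object URL
        lines.clear()

    while queue:
        raw = queue.popleft()                     # LPOP event
        event = json.loads(raw)
        counters[event["type"]] += 1              # INCR counter
        lines.append(raw)                         # write event to the logfile
        if len(lines) >= batch_size:
            flush()
    flush()                                       # ship the final partial batch
```

Putting only the object URL on the load queue keeps the messages small: the event data lives in S3, and the queue only tells the importers what is ready to load.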
  • EVENT PROCESSING: the importers read URLs from SQS; download the logfiles from S3; convert them to TSV; bulk-import them into Redshift (N logfiles at a time). Diagram: Amazon SQS, Amazon S3 → Importer → TSV → Redshift
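The "convert to TSV" step of the importer could look like the sketch below. It assumes the logfiles contain one JSON event per line and uses a hypothetical four-column schema, since the talk does not specify the event format. Downloading from S3 and the bulk load into Redshift are not shown.

```python
import csv
import io
import json

COLUMNS = ["ts", "user_id", "type", "amount"]  # hypothetical event schema


def logfile_to_tsv(logfile_text, columns=COLUMNS):
    """Convert a logfile with one JSON event per line into tab-separated rows."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for line in logfile_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        event = json.loads(line)
        # Missing fields become empty columns so every row has the same width.
        writer.writerow([event.get(col, "") for col in columns])
    return out.getvalue()
```

A fixed column order matters because a bulk load maps file columns to table columns by position.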
  • Lets us be flexible: schema changes without downtime; very scalable (with write downtime); low deployment risk: offline system, backups; minimal maintenance: vacuums, disk space; good SQL support, unlike other columnar databases
  • Daily transformations and calculations are implemented in SQL. Example: UPDATE USER SET total_revenues = (SELECT SUM(amount) FROM transaction t WHERE t.user_id = user.user_id); Why not Hadoop? Much more complex and slower; for now, SQL operations meet all of our requirements
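The correlated-subquery pattern from the example on this slide can be tried locally. This sketch runs it against an in-memory SQLite database rather than Redshift. The tables are renamed to `users` and `transactions` because `user` and `transaction` can clash with reserved words, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, total_revenues REAL);
    CREATE TABLE transactions (user_id INTEGER, amount REAL);
    INSERT INTO users (user_id) VALUES (1), (2), (3);
    INSERT INTO transactions VALUES (1, 4.99), (1, 9.99), (2, 0.99);
""")

# For every user, recompute the total as the sum of that user's transactions.
conn.execute("""
    UPDATE users
    SET total_revenues = (
        SELECT SUM(amount) FROM transactions t WHERE t.user_id = users.user_id
    )
""")

totals = dict(conn.execute(
    "SELECT user_id, total_revenues FROM users ORDER BY user_id"
))
```

Users without transactions end up with NULL (`None` in Python) because `SUM` over no rows is NULL. Wrapping the subquery in `COALESCE(..., 0)` would store zero instead.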
  • Would you like to work in the video game industry? We are looking for talent. Talent attracts talent. www.socialpoint.es/jobs THANK YOU!
  • GENERATE → STORE → ANALYZE → SHARE: Amazon EC2, Amazon Elastic MapReduce
  • AMAZON ELASTIC MAPREDUCE HADOOP AS A SERVICE
  • A FRAMEWORK: SPLITS DATA INTO PIECES; LETS PROCESSING OCCUR; GATHERS THE RESULTS
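The three steps on this slide (split the data, process the pieces, gather the results) can be illustrated with a word count, the usual introductory MapReduce example. This is a single-process sketch of the idea only: Hadoop runs the processing step on many machines in parallel, and the function names here are illustrative rather than Hadoop APIs.

```python
from collections import Counter
from functools import reduce


def split(lines, pieces):
    """Split the input lines into `pieces` roughly equal chunks."""
    return [lines[i::pieces] for i in range(pieces)]


def process(piece):
    """Count the words in one chunk (the work a single node would do)."""
    counts = Counter()
    for line in piece:
        counts.update(line.lower().split())
    return counts


def gather(partials):
    """Merge the per-chunk counts into one overall result."""
    return reduce(lambda a, b: a + b, partials, Counter())


def word_count(lines, pieces=3):
    return gather(map(process, split(lines, pieces)))
```

Because each chunk is processed independently, the `map` call is the part that a cluster can spread across nodes. Only the small per-chunk counts need to be brought together at the end.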
  • Corporate Data Center and Elastic Data Center
  • Corporate Data Center → Elastic Data Center: application data and logs for analysis pushed to S3
  • Elastic Data Center: Amazon Elastic MapReduce name node (N) to control analysis
  • Elastic Data Center: Hadoop cluster started by Elastic MapReduce
  • Elastic Data Center: adding many hundreds or thousands of nodes
  • Elastic Data Center: disposed of when job completes
  • Elastic Data Center → Corporate Data Center: results of analysis pulled back into your systems
  • Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2: GENERATE → STORE → ANALYZE → SHARE
  • PUBLIC DATA SETS http://aws.amazon.com/publicdatasets
  • GENERATE → STORE → ANALYZE → SHARE
  • FROM DATA TO ACTIONABLE INFORMATION