The Enterprise in the Age of Extreme Information (A Empresa na Era da Informação Extrema)
Slides from the talk given at ECM Show 2013

Published in: Technology

  1. José Papo, Amazon Evangelist, @josepapo, josepapo@amazon.com
  2. “Algorithms have already written symphonies as moving as those composed by Beethoven, picked through legalese with the deftness of a senior law partner, diagnosed patients with more accuracy than a doctor, written news articles like a seasoned reporter, and driven vehicles on urban highways with better control than a human driver.”
  3. The cloud is the enabler of the new technology trends
  4. ○○○○
  5. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance. We are constantly producing more data
  6. From all types of industries
  7. “Every market is being transformed by the new digital wave” http://www.amazon.com.br/Digital-Disruption-Unleashing-Innovation-ebook/dp/B009L7QD1S/
  8. 3Vs
  9. 27 TB per day: Large Hadron Collider, CERN
  10. The Role of Data is Changing
  11. Until now, the questions you asked drove the data model. The new model is to collect as much data as possible: the “Data-First Philosophy”
  12. Data is the new raw material for any business, on par with capital, people, and labor
  13. Data; Actionable Information
  14. Generated data; available for analysis. Sources: Gartner, “User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011”; IDC, “Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares”
  15. Data Strategist
  16. lunch hours last year?
  17. SQL: select productId, count(*) from page_hits where hour in (12,13) group by productId order by count(*) desc. Shell: cat *-(12|13) | cut -f3 | sort | uniq -c > out. Hit <enter>?
  18. 1 PB = 10^15 (1,000,000,000,000,000) bytes; 1 PB = 231 days at 50 MB/s
  19. Solution: Massively Parallel Processing
  20. ○○○○
  21. HDFS: reliable storage. MapReduce: data analysis
  22. Very large log (e.g., TBs)
  23. Very large log (e.g., TBs); lots of actions by John
  24. Very large log (e.g., TBs); lots of actions by John; split into small pieces
  25. Very large log (e.g., TBs); lots of actions by John; split into small pieces; process in a Hadoop cluster
  26. Very large log (e.g., TBs); lots of actions by John; split into small pieces; process in a Hadoop cluster; aggregate the results; John’s history
  27. Worker node: input file, map, reduce, output file
  28. Three worker nodes, each with: input file, map, reduce, output file
  29. #3 ♥ ○○●○○
  30. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.
  31. Elastic; On Demand; Pay as you go; Focus on YOUR business
  32. November
  33. Provisioned capacity; November
  34. 76%; 24%; Provisioned capacity; November
  35. November
  36. [Chart: axis from 0 to 6,000,000]
  37. “What kind of movies do people like?”
  38. More than 25 million streaming members; 50 billion events per day; 30 million plays every day; 2 billion hours of video in 3 months; 4 million ratings per day; 3 million searches; device location, time, day, week, etc.; social data
  39. 10 TB of streaming data per day
  40. ~1 PB of data stored in Amazon S3
  41. Wide range of processing languages used (diagram: S3, Prod Cluster (EMR))
  42. Data consumed in multiple ways: Recommendation Engine, Ad-hoc Analysis, Personalization (diagram: S3, Prod Cluster (EMR))
  43. [Diagram: S3, Prod Cluster (EMR), Query Cluster (EMR), and further EMR clusters]
  44. Foursquare… 33 million users, 1.3 million businesses …generates a lot of data: 3.5 billion check-ins, 15M+ venues, terabytes of log data
  45. Uses EMR for: evaluation of new features; machine learning; exploratory analysis; daily customer usage reporting; long-term trend analysis
  46. 70% lower 5-year TCO per app (on-premises $3.01M vs. AWS $0.90M); 50% reduction in analytics costs. Source: IDC Whitepaper, sponsored by Amazon, “The Business Value of Amazon Web Services Accelerates Over Time.” July 2012
  47. [Charts: share by gender (Female, Male), scale 0 to 0.6; distribution by age, 0 to 80]
  48. [Chart: Gorilla Coffee, Gray's Papaya, Amorino; Thursday, Friday, Saturday, Sunday]
  49. Log files; 250 EMR clusters spun up and down every week
  50. Challenge: Large amounts of computing resources needed for short periods of time; significant data storage costs. Solution: Clusters of 100s of nodes on EMR running 4-5 hours at a time. Leverages the 1000 Genomes Public Data Set on AWS: free access to ~200 TB of genomes for over 2,600 people from 26 populations around the world.
  51. Challenge: Volatile weather is deadly to crops like grapes. Solution: Built a predictive model based on freely available data: 60 years of crop data, 14 TB of soil data, and 1M government Doppler radar points. 50 EMR clusters process new data as it comes into S3 each day, continuously updating the model.
  52. THANK YOU! http://awshub.com.br slideshare.net/AmazonWebServicesLATAM José Papo, Amazon Evangelist, @josepapo, josepapo@amazon.com
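
The figure on slide 18 (1 PB takes 231 days at 50 MB/s) can be checked with a few lines of arithmetic. The sketch below assumes 1 MB = 10^6 bytes and a single sequential reader. The 1,000-reader case is a hypothetical illustration of ideal linear speed-up, which is the motivation for slide 19; it is not a number taken from the slides.

```python
PETABYTE_BYTES = 10**15                # slide 18: 1 PB = 10^15 bytes
THROUGHPUT_BYTES_PER_S = 50 * 10**6    # slide 18: 50 MB/s (assuming 1 MB = 10^6 bytes)
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_read(total_bytes, bytes_per_second, workers=1):
    """Days needed to scan total_bytes with `workers` readers in parallel.

    Assumes ideal linear scaling: every reader sustains the full
    throughput and the data splits evenly between them.
    """
    seconds = total_bytes / (bytes_per_second * workers)
    return seconds / SECONDS_PER_DAY


single = days_to_read(PETABYTE_BYTES, THROUGHPUT_BYTES_PER_S)
# Hypothetical cluster size, chosen only for illustration.
parallel = days_to_read(PETABYTE_BYTES, THROUGHPUT_BYTES_PER_S, workers=1000)

print(f"1 reader: {single:.1f} days")            # 1 reader: 231.5 days
print(f"1000 readers: {parallel * 24:.1f} hours")  # 1000 readers: 5.6 hours
```

Real clusters will not scale perfectly, so the parallel figure is best read as a lower bound.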
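
The split, process, and aggregate flow on slides 22 to 28 can be sketched in plain Python, applied to the lunch-hour question from slides 16 and 17. This is a single-process illustration of the MapReduce idea, not Hadoop or EMR code. The log layout (hour, user, productId separated by tabs), the sample records, and all function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical log records: hour, user, productId (tab-separated).
LOG_LINES = [
    "12\tjohn\tp1",
    "13\tmary\tp2",
    "09\tjohn\tp1",
    "12\tanne\tp1",
    "13\tjohn\tp2",
    "12\tmary\tp3",
]


def split_into_pieces(lines, piece_size):
    """Split the large log into small pieces (slide 24)."""
    return [lines[i:i + piece_size] for i in range(0, len(lines), piece_size)]


def map_piece(piece):
    """Map step: emit (productId, 1) for every hit during hours 12 and 13."""
    pairs = []
    for line in piece:
        hour, _user, product_id = line.split("\t")
        if int(hour) in (12, 13):
            pairs.append((product_id, 1))
    return pairs


def shuffle(mapped_pieces):
    """Group the emitted values by key so each key is reduced once."""
    grouped = defaultdict(list)
    for pairs in mapped_pieces:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: aggregate the results (slide 26) by summing per product."""
    return {key: sum(values) for key, values in grouped.items()}


def lunch_hour_hits(lines, piece_size=2):
    pieces = split_into_pieces(lines, piece_size)
    # On a cluster, each piece would be handed to a different worker node.
    mapped = [map_piece(piece) for piece in pieces]
    counts = reduce_counts(shuffle(mapped))
    # Same ordering as "order by count(*) desc", with ties broken by productId.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(lunch_hour_hits(LOG_LINES))  # [('p1', 2), ('p2', 2), ('p3', 1)]
```

Because the map step looks at each piece independently, the result does not depend on how the log is split, which is what lets the work be spread over many worker nodes.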