Big Data Engineering - Top 10 Pragmatics

Very high level, but covers all the essentials. Slides of my talk at the Naval Postgraduate School, Monterey.


  1. "The road lies plain before me;--'tis a theme / Single and of determined bounds; …" - Wordsworth, The Prelude. Krishna Sankar, EC4000 - PhD Guest Seminar, Naval Postgraduate School, April 27, 2012. http://doubleclix.wordpress.com
  2. What is Big Data? • Big Data to smart data • Big Data pipeline. Agenda: • Cover the broad picture of the Big Data domain … • Touch upon instances of the technologies employed: analytics, modeling, algorithms, R, architectures, cloud, processing, Hadoop, storage, NOSQL, visualization
  3. Thanks to … the giants on whose shoulders I am standing. Special thanks to: Peter Ateshian, NPS • Prof. Murali Tummala, NPS • Shirley Bailes, O'Reilly • Ed Dumbill, O'Reilly • Jeff Barr, AWS • Jenny Kohr Chynoweth, AWS
  4. Porcelain vs. Plumbing • The balance is always interesting … • This talk has both • I would be happy to dive deep into plumbing topics such as Hadoop, R, MongoDB, Cassandra, et al.
  5. EBC322: ① Volume: scale ② Velocity: data change rate vs. decision window ③ Variety: different sources & formats; structured vs. unstructured ④ Variability: breadth of interpretation & depth of analytics. Refs: http://doubleclix.wordpress.com/2011/09/13/when-is-big-data-really-big-data/ • http://www.hpts.ws/posters/Poster2011_13_Bulkowski.pdf • http://www.quora.com/Business-Intelligence/What-is-the-future-of-business-intelligence
  9. EBC322: ① Volume: scale ② Velocity: data change rate vs. decision window ③ Variety: different sources & formats; structured vs. unstructured ④ Variability: breadth of interpretation & depth of analytics ⑤ Contextual: dynamic variability; recommendation ⑥ Connectedness. Refs: http://doubleclix.wordpress.com/2011/09/13/when-is-big-data-really-big-data/ • http://www.hpts.ws/posters/Poster2011_13_Bulkowski.pdf
  10. • "… they didn't need a genius, … but build the world's most impressive dilettante … battling the efficient human mind with spectacular flamboyant inefficiency" - Final Jeopardy by Stephen Baker • 15 TB of memory across 90 IBM Power 750 servers, in 10 racks • 1 TB dataset • 200 million pages processed by Hadoop • A good example of connected data: contextual with variability; breadth of interpretation; analytics depth. Refs: http://doubleclix.wordpress.com/2011/03/01/the-education-of-a-machine-%E2%80%93-review-of-book-%E2%80%9Cfinal-jeopardy%E2%80%9D-by-stephen-baker/ • http://doubleclix.wordpress.com/2011/02/17/watson-at-jeopardy-a-race-of-machines/
  11. Refs: http://www.ciol.com/News/News/News-Reports/Vinod-Khosla%E2%80%99s-cool-dozen-tech-innovations/156307/0/ • http://yourstory.in/2011/11/vinod-khoslas-keynote-at-nasscom-product-conclave-reject-punditry-believe-in-an-idea-take-risk-and-succeed/
  12. The infer-ability stack, climbing from Volume → Velocity → Variability → Variety → Connectedness → Context → Model → Infer-ability. Tools shown along the way: logs, XML, files, …, Scribe, Flume, Storm, HDFS, Hadoop, NOSQL, various other tools, Dryad, .NET, Pig, Hive, SQL, BI tools, R, Mahout, …, hand-coded programs, Tableau, internal dashboards. Decomplexify! Contextualize! Network! Reason! Infer! Ref: http://goo.gl/Mm83k
  13. Twitter: • 200 million tweets/day • Peak 10,000/second • How would you handle the fire hose for social network analytics? AWS: 900 billion objects! Zynga: • "Analytics company, not a gaming company!" • Harvests data: 15 TB/day • Tests new features • Targets advertising • 230 million players/month. Storage: • 4U box = 40 TB • 1 PB = 25 boxes! Ref: http://goo.gl/dcBsQ
  14. • 6 billion messages per day • 2 PB (with compression) online • 6 PB with replication • 250 TB/month growth • HBase infrastructure
  15. eBay Extreme Analytics Architecture: • 50 TB/day • Very systematic • 240 nodes, 84 PB • Teradata installation • Path analysis • A/B testing. The diagram speaks volumes! Ref: http://www.hpts.ws/sessions/2011HPTS-TomFastner.pdf
  16. The Big Data pipeline: Collect → Store → Transform & Analyze → Model & Reason → Predict, Recommend & Visualize. Tools shown: Scribe, Flume, Storm, Hadoop, HBase, Cassandra, MongoDB, Neo4j, NOSQL, Splunk, Pig/Hive, R, Mahout, BI tools, Tableau, D3.js, dashboards. "When I think of my own native land, / In a moment I seem to be there; / But, alas! recollection at hand / Soon hurries me back to despair." - Cowper, The Solitude of Alexander Selkirk
  17. NOSQL: • Key-value (in-memory and disk-based): Memcached, Redis, Tokyo Cabinet, Dynamo, Voldemort, Azure TS • Column: SimpleDB, Google BigTable, HBase, Cassandra, HyperTable • Document: CouchDB, MongoDB, Lotus Domino, Riak • Graph: Neo4j, FlockDB, InfiniteGraph
  18. MapReduce • Data parallelism • Large installations (many ~5,000-node clusters!)
  19. Software as a Service • Platform as a Service • Infrastructure as a Service
  20. Amazon: Canonical Cloud • S3: blob storage • DynamoDB: NOSQL • EMR: Elastic MapReduce • EC2: compute • 1% of Internet traffic. "Scalability is about building wider roads, not about building faster cars" - Steve Swartz. Ref: http://blog.deepfield.net/2012/04/18/how-big-is-amazons-cloud/
  21. Ref: http://www.slideshare.net/AmazonWebServices/keynote-your-future-with-cloud-computing-dr-werner-vogels-aws-summit-2012-nyc
  22. EC2, EC2 (images: http://openclipart.org/detail/152311/internet-cloud-by-b.gaultier, http://openclipart.org/detail/17847)
  23. • Social network analysis • Sentiment analysis • Brand strength • Citation/co-citation ≅ followed by/also follows • Metrics: network diameter, weak ties, Erdős-Rényi model & Kronecker graphs • Data: tweets, followers, follow/unfollow. Ref: http://www.oscon.com/oscon2012/public/schedule/detail/23130
  24. "Was it a vision, or a waking dream? / Fled is that music:--do I wake or sleep?" - Keats, Ode to a Nightingale
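The figures on slide 13 lend themselves to some quick back-of-envelope capacity arithmetic. The sketch below uses only the numbers on the slide. It assumes decimal units (1 PB = 1,000 TB) and ignores replication and headroom. The variable and function names are illustrative.

```python
# Back-of-envelope capacity arithmetic using the figures quoted on slide 13.
# Assumptions: decimal units (1 PB = 1,000 TB), no replication, no headroom.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(events_per_day: float) -> float:
    """Average events per second for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY


def boxes_needed(total_tb: float, tb_per_box: float) -> float:
    """Storage boxes needed to hold total_tb (no replication, no headroom)."""
    return total_tb / tb_per_box


# Twitter: 200 million tweets/day, peak 10,000 tweets/second.
tweets_per_day = 200_000_000
peak_per_second = 10_000
avg = average_rate_per_second(tweets_per_day)
peak_to_avg = peak_per_second / avg

# Zynga: 4U box = 40 TB; 15 TB/day harvested.
boxes_per_pb = boxes_needed(1_000, 40)
days_to_fill_box = 40 / 15

if __name__ == "__main__":
    print(f"Twitter average: {avg:,.0f} tweets/s, peak/average = {peak_to_avg:.1f}x")
    print(f"1 PB / 40 TB per 4U box = {boxes_per_pb:.0f} boxes")
    print(f"At 15 TB/day, one 40 TB box fills in about {days_to_fill_box:.1f} days")
```

Run as a script, it reports an average of about 2,315 tweets/s. The quoted 10,000/s peak is therefore roughly 4.3 times the daily mean, so an ingest tier sized for the average would fall well short at peak.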
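The slide-14 numbers also imply a replication factor. The slide does not say how the 6 PB is replicated. The 3x ratio below is simply derived from its two figures, and it happens to match HDFS's default of three replicas (HBase normally keeps its files in HDFS). Two further assumptions, labeled in the code, are that the 250 TB/month growth counts pre-replication bytes and that the units are decimal.

```python
# Storage arithmetic from the slide-14 figures.
# Assumption: decimal units (1 PB = 1,000 TB).
SECONDS_PER_DAY = 86_400

online_pb_compressed = 2.0   # "2 PB (with compression) online"
online_pb_replicated = 6.0   # "6 PB with replication"
growth_tb_per_month = 250    # "250 TB/month growth"
messages_per_day = 6_000_000_000

# Implied replication factor: replicated footprint / logical footprint.
replication_factor = online_pb_replicated / online_pb_compressed

# Assumption: the monthly growth is pre-replication, so raw disk grows 3x faster.
raw_growth_tb_per_month = growth_tb_per_month * replication_factor

# Average write rate implied by the daily message count.
messages_per_second = messages_per_day / SECONDS_PER_DAY

if __name__ == "__main__":
    print(f"implied replication factor: {replication_factor:.0f}x")
    print(f"raw capacity consumed per month: {raw_growth_tb_per_month:.0f} TB")
    print(f"average message rate: {messages_per_second:,.0f}/s")
```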
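Slide 16's flow (Collect → Store → Transform & Analyze → Model & Reason → Predict, Recommend & Visualize) can be made concrete with a toy, single-process sketch. Every stage function is hypothetical. Plain Python lists and dicts stand in for the tools named on the slide, such as Scribe or Flume for collection and HDFS or a NOSQL store for storage. The "user,action" record format is invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def collect(raw_lines: Iterable[str]) -> List[str]:
    """Collect: ingest raw lines, dropping blanks (stand-in for Scribe/Flume)."""
    return [line.strip() for line in raw_lines if line.strip()]


def store(records: List[str], sink: List[str]) -> List[str]:
    """Store: append to an in-memory list (stand-in for HDFS or a NOSQL store)."""
    sink.extend(records)
    return sink


def transform_and_analyze(records: List[str]) -> Counter:
    """Transform & analyze: parse hypothetical 'user,action' records and count actions."""
    counts: Counter = Counter()
    for rec in records:
        _user, action = rec.split(",", 1)
        counts[action] += 1
    return counts


def model_and_reason(counts: Counter) -> Dict[str, float]:
    """Model & reason: turn raw counts into relative frequencies."""
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}


def predict_and_visualize(model: Dict[str, float]) -> str:
    """Predict & visualize: pick the most frequent action and render a text bar chart."""
    best = max(model, key=model.get)
    bars = "\n".join(f"{action:<8}{'#' * round(p * 20)}" for action, p in sorted(model.items()))
    return f"most likely next action: {best}\n{bars}"


if __name__ == "__main__":
    raw = ["alice,view", "bob,buy", "", "carol,view", "dave,view"]
    model = model_and_reason(transform_and_analyze(store(collect(raw), sink=[])))
    print(predict_and_visualize(model))
```

Each stage only consumes the previous stage's output. That separation is what lets a real deployment swap any single stage for a distributed tool without touching the others.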
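The four NOSQL families on slide 17 differ mainly in their data model. The sketch below models one hypothetical user record in each style using plain Python structures. It does not use any listed product's API, and the keys (`user:42`, `profile`, `follows`) are invented for illustration.

```python
import json
from typing import Dict, List

# Key-value (Memcached/Redis style): an opaque value looked up by a single key.
kv_store: Dict[str, str] = {"user:42": json.dumps({"name": "Ada", "city": "Monterey"})}

# Column family (BigTable/HBase/Cassandra style): row key -> column family -> column -> value.
column_store = {
    "user42": {
        "profile": {"name": "Ada", "city": "Monterey"},
        "follows": {"user7": "1", "user9": "1"},
    }
}

# Document (MongoDB/CouchDB style): self-describing nested documents, filterable by field.
document_store: List[dict] = [{"_id": 42, "name": "Ada", "city": "Monterey", "follows": [7, 9]}]

# Graph (Neo4j/FlockDB style): nodes plus explicit, typed edges.
nodes = {42: {"name": "Ada"}, 7: {"name": "Grace"}, 9: {"name": "Alan"}}
edges = [(42, "FOLLOWS", 7), (42, "FOLLOWS", 9)]


def kv_get_city(key: str) -> str:
    # The store cannot see inside the value; the client fetches and decodes it.
    return json.loads(kv_store[key])["city"]


def doc_find(field: str, value) -> List[dict]:
    # Documents can be filtered on any field they contain.
    return [d for d in document_store if d.get(field) == value]


def follows_of(node_id: int) -> List[int]:
    # Graph traversal: follow outgoing FOLLOWS edges.
    return sorted(dst for src, rel, dst in edges if src == node_id and rel == "FOLLOWS")


if __name__ == "__main__":
    print(kv_get_city("user:42"), doc_find("city", "Monterey"), follows_of(42))
```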
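Slide 18's point about data parallelism is easiest to see in the classic word-count example. This is a single-machine sketch in which a thread pool stands in for cluster nodes. It mirrors the map, shuffle and reduce phases conceptually and is not Hadoop's API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(splits: List[str], workers: int = 4) -> Dict[str, int]:
    # Data parallelism: each split is mapped independently. The thread pool is
    # only a stand-in for the nodes of a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, splits))
    grouped = shuffle(chain.from_iterable(mapped))
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big", "data pipeline"]))
```

Because `map_phase` touches only its own split, adding splits and workers scales the map step out horizontally. Only the shuffle needs to see all keys.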
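Two of slide 23's ideas can be sketched on a small follower graph: co-citation ("also follows") and network diameter. The graph `SAMPLE_FOLLOWS` and all function names are hypothetical. Diameter is computed by breadth-first search from every node, with follow edges treated as undirected, which assumes a connected graph. For real analyses, a library such as NetworkX (`networkx.diameter`) would be the usual choice.

```python
from collections import deque
from typing import Dict, Set

Follows = Dict[str, Set[str]]  # follows[u] = accounts that u follows

# Hypothetical follower graph used for illustration.
SAMPLE_FOLLOWS: Follows = {"ann": {"bob", "cat"}, "dan": {"bob", "cat"}, "eve": {"cat"}, "bob": set()}


def co_citation(follows: Follows, a: str, b: str) -> int:
    """Number of accounts that follow both a and b."""
    return sum(1 for targets in follows.values() if a in targets and b in targets)


def also_follows(follows: Follows, account: str) -> Dict[str, int]:
    """For followers of `account`, count which other accounts they also follow."""
    counts: Dict[str, int] = {}
    for targets in follows.values():
        if account in targets:
            for other in targets - {account}:
                counts[other] = counts.get(other, 0) + 1
    return counts


def undirected(follows: Follows) -> Dict[str, Set[str]]:
    """Adjacency sets with every follow edge treated as undirected."""
    adj: Dict[str, Set[str]] = {}
    for u, targets in follows.items():
        adj.setdefault(u, set())
        for v in targets:
            adj[u].add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def eccentricity(adj: Dict[str, Set[str]], source: str) -> int:
    """Longest BFS distance from source to any reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return max(dist.values())


def diameter(follows: Follows) -> int:
    """Longest shortest path; assumes the undirected graph is connected."""
    adj = undirected(follows)
    return max(eccentricity(adj, n) for n in adj)


if __name__ == "__main__":
    print(co_citation(SAMPLE_FOLLOWS, "bob", "cat"), also_follows(SAMPLE_FOLLOWS, "cat"), diameter(SAMPLE_FOLLOWS))
```

On the sample graph, "bob" and "cat" are co-cited twice (by "ann" and "dan"). The diameter is 3, via the path eve → cat → ann → bob.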
