Big Data Engineering - Top 10 Pragmatics
 


Very high level, but covers all the essentials. Slides of my talk at the Naval Postgraduate School, Monterey.

    Presentation Transcript

    • "The road lies plain before me;--'tis a theme / Single and of determined bounds; …" - Wordsworth, The Prelude. Krishna Sankar, http://doubleclix.wordpress.com. PhD Guest Seminar, Naval Postgraduate School, April 27, 2012
    • Agenda: What is Big Data? o Big Data to smart data o Big Data Pipeline. Aim: to cover the broad picture of the Big Data domain … and touch upon instances of the technologies employed: Analytics/Modeling, R, Algorithms, Architectures, Cloud, Processing (Hadoop), Storage (NOSQL), Visualization
    • Thanks to … the giants whose shoulders I am standing on. Special thanks to: Peter Ateshian, NPS; Prof. Murali Tummala, NPS; Shirley Bailes, O'Reilly; Ed Dumbill, O'Reilly; Jeff Barr, AWS; Jenny Kohr Chynoweth, AWS
    • Porcelain vs. Plumbing • The balance is always interesting … • This talk has both • Would be happy to dive deep into plumbing topics such as Hadoop, R, MongoDB, Cassandra, et al.
    • ① Volume: scale ② Velocity: data change rate vs. decision window ③ Variety: different sources & formats; structured vs. unstructured ④ Variability: breadth of interpretation & depth of analytics ⑤ Contextual: dynamic variability; recommendation ⑥ Connectedness. Refs: http://doubleclix.wordpress.com/2011/09/13/when-is-big-data-really-big-data/ · http://www.hpts.ws/posters/Poster2011_13_Bulkowski.pdf · http://www.quora.com/Business-Intelligence/What-is-the-future-of-business-intelligence
    • "… they didn't need a genius, … but build the world's most impressive dilettante … battling the efficient human mind with spectacular flamboyant inefficiency" – Final Jeopardy by Stephen Baker • 15 TB memory, across 90 IBM Power 750 servers, in 10 racks • 1 TB dataset • 200 million pages processed by Hadoop • This is a good example of connected data – Contextual w/ variability – Breadth of interpretation – Analytics depth. Refs: http://doubleclix.wordpress.com/2011/03/01/the-education-of-a-machine-%E2%80%93-review-of-book-%E2%80%9Cfinal-jeopardy%E2%80%9D-by-stephen-baker/ · http://doubleclix.wordpress.com/2011/02/17/watson-at-jeopardy-a-race-of-machines/
    • Ref: http://www.ciol.com/News/News/News-Reports/Vinod-Khosla%E2%80%99s-cool-dozen-tech-innovations/156307/0/ · http://yourstory.in/2011/11/vinod-khoslas-keynote-at-nasscom-product-conclave-reject-punditry-believe-in-an-idea-take-risk-and-succeed/
    • Infer-ability Model: building up from Volume, Velocity, Variety and Variability through Connectedness and Context to Infer-ability. Tools across the layers: Hadoop, HDFS, Storm, Flume, Scribe, logs, XML, files, SQL, NOSQL, .NET, Dryad, Pig, Hive, BI tools, R, Mahout, hand-coded programs, Tableau, internal dashboards, various other tools. Decomplexify! Contextualize! Network! Reason! Infer! Ref: http://goo.gl/Mm83k
    • Twitter §  200 million tweets/day §  Peak 10,000/second §  How would you handle the fire hose for social network analytics ? AWS – 900 Billion objects! Zynga §  “Analytics company, not a gaming company!” §  Harvests data : 15 TB/day Storage §  Test new features §  4 U box = 40 TB, §  Target advertising 1 PB = 25 boxes ! §  §  230 million players/month hKp://goo.gl/dcBsQ  
    • 6 billion messages per day • 2 PB (w/ compression) online • 6 PB w/ replication • 250 TB/month growth • HBase infrastructure
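The same kind of arithmetic makes the storage figures above easier to read. A minimal sketch, assuming decimal units and that the 250 TB/month growth figure is on the same (compressed, unreplicated) basis as the 2 PB figure; the slide does not say which. The 6 PB / 2 PB ratio of 3 is consistent with HDFS's default replication factor of 3, which is the file system HBase normally stores its data on.

```python
# Ratios derived from the slide's HBase figures.
SECONDS_PER_DAY = 24 * 60 * 60

messages_per_day = 6_000_000_000
online_pb_compressed = 2      # "2 PB (w/compression) online"
online_pb_replicated = 6      # "6 PB w/ replication"
growth_tb_per_month = 250     # "250 TB/Month growth"

# Average ingest rate if messages were spread evenly over the day.
messages_per_sec = messages_per_day / SECONDS_PER_DAY

# Replicated footprint / unreplicated footprint = effective replication factor.
replication_factor = online_pb_replicated / online_pb_compressed

# Months for growth alone to add another (unreplicated) petabyte; assumption noted above.
months_per_additional_pb = 1000 / growth_tb_per_month

print(f"~{messages_per_sec:,.0f} messages/s on average")
print(f"effective replication factor: {replication_factor:.0f}x")
print(f"another PB every {months_per_additional_pb:.0f} months")
```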
    • eBay Extreme Analytics Architecture • 50 TB/day • 240 nodes, 84 PB • Very systematic • Diagram speaks volumes! • Path analysis • Teradata installation • A/B testing. Ref: http://www.hpts.ws/sessions/2011HPTS-TomFastner.pdf
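A/B testing at this scale ultimately reduces to deciding whether two variants really differ. As an illustration only (the slide does not say which statistical test eBay uses), here is a minimal two-proportion z-test on conversion counts; the function name and the example counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: variants A and B convert at the same rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero (all or no conversions in both groups)")
    z = (p_b - p_a) / se
    # Two-sided tail of the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 1,000 visitors per variant.
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The normal approximation is reasonable only when each group has enough conversions and non-conversions; with the volumes on the slide that is rarely the constraint.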
    • Big Data Pipeline: Collect → Store → Transform & Analyze → Model & Reason → Predict, Recommend & Visualize. Tools on the slide: Flume, Scribe, Storm, HBase, Cassandra, MongoDB, Neo4j, NOSQL, Hadoop, Pig/Hive, Splunk, R, Mahout, BI tools, dashboards, Tableau, D3.js. "When I think of my own native land, / In a moment I seem to be there; / But, alas! recollection at hand / Soon hurries me back to despair." - Cowper, The Solitude of Alexander Selkirk
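The pipeline stages above can be sketched as composable functions. This is a toy, assuming a hypothetical `user,item` record format, an in-memory list standing in for the storage tier, and a simple popularity count standing in for the model; none of it reflects how Flume, HBase, Mahout and the other tools actually work.

```python
from collections import Counter
from typing import Iterable, Iterator


def collect(raw_events: Iterable[str]) -> Iterator[str]:
    """Collect: ingest raw events, dropping blank lines."""
    for line in raw_events:
        line = line.strip()
        if line:
            yield line


def store(events: Iterable[str]) -> list[str]:
    """Store: persist events (an in-memory list stands in for HDFS/NOSQL)."""
    return list(events)


def transform(stored: list[str]) -> list[tuple[str, str]]:
    """Transform & analyze: parse hypothetical 'user,item' records into tuples."""
    pairs = []
    for record in stored:
        user, item = record.split(",", 1)
        pairs.append((user.strip(), item.strip()))
    return pairs


def model(pairs: list[tuple[str, str]]) -> Counter:
    """Model & reason: a trivial popularity model (item -> number of views)."""
    return Counter(item for _, item in pairs)


def recommend(popularity: Counter, seen: set[str], k: int = 2) -> list[str]:
    """Predict & recommend: the most popular items the user has not seen yet."""
    return [item for item, _ in popularity.most_common() if item not in seen][:k]


raw = ["alice,hadoop", "bob,hive", "", "carol,hadoop", "bob,r"]
popularity = model(transform(store(collect(raw))))
print(recommend(popularity, seen={"hadoop"}))
```

Keeping each stage a separate function mirrors the slide's point that each layer can be swapped for a different tool without touching the others.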
    • NOSQL – Key-Value: in-memory (Memcached); disk-based (Redis, Riak, Tokyo Cabinet, Dynamo, Voldemort, Azure TS) – Column: Google BigTable, HBase, Cassandra, HyperTable – Document: SimpleDB, CouchDB, MongoDB, Lotus Domino – Graph: Neo4j, FlockDB, InfiniteGraph
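To make the four NOSQL families above concrete, the sketch below models one hypothetical user record each way using plain Python structures. It illustrates the data models only; it does not use or imitate any of the listed products' APIs, and all keys, names and helper functions are made up for the example.

```python
import json

# Key-value: the store sees only an opaque value under a key.
kv_store: dict[str, bytes] = {}
kv_store["user:42"] = json.dumps({"name": "Ada", "city": "Monterey"}).encode()

# Column family: row key -> column family -> column -> value.
column_store: dict[str, dict[str, dict[str, str]]] = {
    "user:42": {
        "profile": {"name": "Ada", "city": "Monterey"},
        "stats": {"logins": "17"},
    },
}

# Document: self-describing nested documents that can be queried by field.
documents = [
    {"_id": 42, "name": "Ada", "city": "Monterey", "tags": ["hadoop", "r"]},
    {"_id": 43, "name": "Grace", "city": "Arlington", "tags": ["cobol"]},
]

# Graph: nodes plus explicit relationships, here as adjacency sets ("42 follows 43").
follows: dict[int, set[int]] = {42: {43}, 43: set()}


def find_by_field(docs: list[dict], field: str, value: object) -> list[dict]:
    """Document-style query: filter on a field inside each document."""
    return [d for d in docs if d.get(field) == value]


def followers_of(graph: dict[int, set[int]], node: int) -> list[int]:
    """Graph-style query: who has an edge pointing at this node?"""
    return sorted(src for src, dsts in graph.items() if node in dsts)


print(json.loads(kv_store["user:42"])["city"])
print(column_store["user:42"]["stats"]["logins"])
print([d["name"] for d in find_by_field(documents, "city", "Monterey")])
print(followers_of(follows, 43))
```

The trade-off in miniature: the key-value store can only fetch by key, while the document and graph models let the store itself answer field and relationship queries.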
    • MapReduce • Data parallelism • Large installations (many ~5,000-node clusters!)
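The MapReduce data flow above (map, shuffle/group by key, reduce) can be shown with the classic word count. This is a single-process sketch of the programming model, not Hadoop's API: each `map_phase` call depends only on its own input document, which is exactly the data parallelism that lets a real cluster run those calls on thousands of nodes.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(doc: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one document."""
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(docs: list[str]) -> dict[str, int]:
    # Each document is mapped independently; here sequentially, on a cluster in parallel.
    mapped = chain.from_iterable(map_phase(d) for d in docs)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["Big data big", "data pipeline"]))
```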
    • Software as a Service • Platform as a Service • Infrastructure as a Service
    • Amazon – Canonical Cloud • S3 – Blob storage • DynamoDB – NOSQL • EMR – Elastic MapReduce • EC2 – Compute • 1% of Internet traffic • "Scalability is about building wider roads, not about building faster cars" – Steve Swartz. Ref: http://blog.deepfield.net/2012/04/18/how-big-is-amazons-cloud/
    • Ref: http://www.slideshare.net/AmazonWebServices/keynote-your-future-with-cloud-computing-dr-werner-vogels-aws-summit-2012-nyc
    • EC2 (image credits: http://openclipart.org/detail/152311/internet-cloud-by-b.gaultier, http://openclipart.org/detail/17847)
    • Social Network Analysis • Sentiment Analysis • Brand Strength • Citation/co-citation ≅ Followed by/Also follows • Data: Tweets, Followers, Follow/Unfollow • Metrics – Network diameter – Weak ties – Erdős–Rényi model & Kronecker graphs. Ref: http://www.oscon.com/oscon2012/public/schedule/detail/23130
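Two of the metrics above lend themselves to a short sketch: the diameter of a network (longest shortest path, via breadth-first search) and a co-citation analogue for followers (how many accounts follow both A and B). The Erdős–Rényi G(n, p) generator serves as a random baseline to compare a real follower graph against. All function names are hypothetical, graphs are treated as undirected adjacency sets, and the all-pairs BFS is O(n·(n+m)), fine for illustration but not for a Twitter-scale graph.

```python
import random
from collections import deque
from itertools import combinations
from typing import Optional


def erdos_renyi(n: int, p: float, seed: Optional[int] = None) -> dict[int, set[int]]:
    """G(n, p): include each of the n*(n-1)/2 possible undirected edges independently with probability p."""
    rng = random.Random(seed)
    graph: dict[int, set[int]] = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def bfs_distances(graph: dict[int, set[int]], source: int) -> dict[int, int]:
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(graph: dict[int, set[int]]) -> Optional[int]:
    """Longest shortest-path length; None if the graph is disconnected."""
    best = 0
    for s in graph:
        dist = bfs_distances(graph, s)
        if len(dist) < len(graph):
            return None
        best = max(best, max(dist.values()))
    return best


def co_followers(followers: dict[str, set[str]], a: str, b: str) -> int:
    """Co-citation analogue: number of accounts that follow both a and b."""
    return len(followers.get(a, set()) & followers.get(b, set()))


g = erdos_renyi(200, 0.05, seed=7)
print("diameter of G(200, 0.05):", diameter(g))
print(co_followers({"nps": {"ann", "bob"}, "aws": {"bob", "cy"}}, "nps", "aws"))
```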
    • "Was it a vision, or a waking dream? / Fled is that music: do I wake or sleep?" - Keats, Ode to a Nightingale