Big Data in Texas: Then, Now, and Ahead

Plenary talk from Data Day Texas 2013 http://datadaytexas.com/ in Austin

Published in: Technology

  1. “Big Data in Texas: Then, Now, and Ahead” Paco Nathan, Evil Mad Scientist @Concurrent, Inc. 1
  2. Then, Now, and Ahead THEN: 1. Keep Austin Weird? 2. Something Called Data Science 3. Rise Of The Machine Data 4. A Cambrian Explosion 5. Eat, Drink, Be Merry… 6. Data-Driven In TX 7. Roll Up Your Sleeves 2
  3. observations… Lynn asked me to talk about Data here today. A few weeks ago we stepped back for a moment to reflect on what we’d seen happen in Austin over the years. Both of us ran alternative bookstores in Austin twenty or so years ago, and we participated as the Internet thing exploded in the 1990s. That was a blast… 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. observations… We noticed a trend. Thinking about some of those who kept showing up whenever interesting things were afoot… 8
  9. 9
  10. “curation and metadata” 10
  11. observations… Overall, it’s about systems thinking. We have a wealth of that here, at UT/Austin in particular… Ilya Prigogine spent years here, which is just incredible. School of Architecture, with leading work in VR, GIS, etc. Interactive innovations at ACTLab… Quantitative emphasis at McCombs… major intellectual resources here. 11
  12. Then, Now, and Ahead NOW: 1. Keep Austin Weird? 2. Something Called Data Science 3. Rise Of The Machine Data 4. A Cambrian Explosion 5. Eat, Drink, Be Merry… 6. Data-Driven In TX 7. Roll Up Your Sleeves 12
  13. Data Science [diagram of team roles, set over a backdrop of mirrored event-log text] Domain Expert: business process, stakeholder. Data Scientist: data science (data prep, discovery, modeling, etc.). App Dev: software engineering, automation. Ops: systems engineering, availability. introduced capability 13
  14. Data Science in Texas… 14
  15. references… by DJ Patil: “Data Jujitsu”, O’Reilly, 2012, amazon.com/dp/B008HMN5BE; “Building Data Science Teams”, O’Reilly, 2011, amazon.com/dp/B005O4U3ZE 15
  16. Enterprise Data Workflows [diagram: word-count workflow] Document Collection → Tokenize (Regex, token) → Scrub (token) → HashJoin Left, with Stop Word List as RHS → GroupBy (token) → Count → Word Count; M and R mark the map and reduce stages. cascading.org 16
  17. Enterprise Data Workflows Over the past 5+ years, we’ve seen many large-scale Enterprise production deployments based on Cascading, Cascalog, Scalding, PyCascading, Cascading.JRuby, etc. Enterprise data workflows, Machine learning at scale, Big Data… Why? amazon.com/dp/1449358721 17
  18. Then, Now, and Ahead NOW: 1. Keep Austin Weird? 2. Something Called Data Science 3. Rise Of The Machine Data 4. A Cambrian Explosion 5. Eat, Drink, Be Merry… 6. Data-Driven In TX 7. Roll Up Your Sleeves 18
  19. Three broad categories of data, Curt Monash, 2010, dbms2.com/2010/01/17/three-broad-categories-of-data • Human/Tabular data – human-generated data which fits well into tables/arrays • Human/Nontabular data – all other data generated by humans • Machine-Generated data 19
  20. Three broad categories of data, Curt Monash, 2010, dbms2.com/2010/01/17/three-broad-categories-of-data • Human/Tabular data – human-generated data which fits well into tables/arrays • Human/Nontabular data – all other data generated by humans • Machine-Generated data • Adjusted Data – Dr. Don Easterbrook, Senate witness 20
  21. Q3 1997: inflection point Four independent teams were working toward horizontal scale-out of workflows based on commodity hardware. This effort prepared the way for huge Internet successes in the 1997 holiday season… AMZN, EBAY, Inktomi (YHOO Search), then GOOG. MapReduce and the Apache Hadoop open source stack emerged from this. 21
  22. Circa 1996: pre-inflection point Stakeholder Customers Excel pivot tables PowerPoint slide decks strategy BI Product Analysts requirements SQL Query optimized Engineering code Web App result sets transactions RDBMS 22
  23. Circa 1996: pre-inflection point Stakeholder Customers Excel pivot tables PowerPoint slide decks strategy “Throw it over the wall” BI Product Analysts requirements SQL Query optimized Engineering code Web App result sets transactions RDBMS 23
  24. Circa 2001: post- big ecommerce successes Stakeholder Product Customers dashboards UX Engineering models servlets recommenders Algorithmic + Web Apps Modeling classifiers Middleware aggregation event SQL Query history result sets customer transactions Logs DW ETL RDBMS 24
  25. Circa 2001: post- big ecommerce successes Stakeholder Product Customers “Data products” dashboards UX Engineering models servlets recommenders Algorithmic + Web Apps Modeling classifiers Middleware aggregation event SQL Query history result sets customer transactions Logs DW ETL RDBMS 25
  26. Circa 2013: clusters everywhere Data Products Customers business Domain process Prod Expert Workflow dashboard metrics data Web Apps, s/w History services science Mobile, etc. dev Data Scientist Planner social discovery interactions + optimized transactions, Eng modeling taps capacity content App Dev Use Cases Across Topologies Hadoop, Log In-Memory etc. Events Data Grid Ops DW Ops batch near time Cluster Scheduler introduced existing capability SDLC RDBMS RDBMS 26
  27. Circa 2013: clusters everywhere Data Products Customers business Domain process Prod Expert Workflow dashboard metrics data Web Apps, s/w History services science Mobile, etc. dev Data Scientist Planner social discovery interactions + optimized transactions, Eng modeling taps capacity content App Dev “Optimizing topologies” Use Cases Across Topologies Hadoop, Log In-Memory etc. Events Data Grid Ops DW Ops batch near time Cluster Scheduler introduced existing capability SDLC RDBMS RDBMS 27
  28. references… • Lambda Architecture: blending topologies • Big Data by Nathan Marz, James Warren • manning.com/marz source: Nathan Marz 28
  29. references… by Leo Breiman: “Statistical Modeling: The Two Cultures”, Statistical Science, 2001, bit.ly/eUTh9L 29
  30. references… Amazon: “Early Amazon: Splitting the website” – Greg Linden, glinden.blogspot.com/2006/02/early-amazon-splitting-website.html; eBay: “The eBay Architecture” – Randy Shoup, Dan Pritchett, addsimplicity.com/adding_simplicity_an_engi/2006/11/you_scaled_your.html, addsimplicity.com.nyud.net:8080/downloads/eBaySDForum2006-11-29.pdf; Inktomi (YHOO Search): “Inktomi’s Wild Ride” – Eric Brewer (0:05:31 ff), youtube.com/watch?v=E91oEn1bnXM; Google: “Underneath the Covers at Google” – Jeff Dean (0:06:54 ff), youtube.com/watch?v=qsan-GQaeyk, perspectives.mvdirona.com/2008/06/11/JeffDeanOnGoogleInfrastructure.aspx 30
  31. Then, Now, and Ahead NOW: 1. Keep Austin Weird? 2. Something Called Data Science 3. Rise Of The Machine Data 4. A Cambrian Explosion 5. Eat, Drink, Be Merry… 6. Data-Driven In TX 7. Roll Up Your Sleeves 31
  32. Displacement Geoffrey Moore (Mohr Davidow Ventures, author of Crossing The Chasm), Hadoop Summit, 2012: what Amazon did to the retail sector… has put the entire Global 1000 on notice over the next decade; data as the major force… mostly through apps – verticals, leveraging domain expertise. Michael Stonebraker (INGRES, PostgreSQL, Vertica, VoltDB, Paradigm4, etc.), XLDB, 2012: complex analytics workloads are now displacing SQL as the basis for Enterprise apps. 32
  33. Drivers algorithmic modeling + machine data + curation, metadata + Open Data: data products, as feedback into automation; evolution of feedback loops, a big part of the science in data science… Internet of Things + complex analytics: accelerated evolution, additional feedback loops, taking this out into a highly social dimension 33
  34. “A kind of Cambrian explosion” source: National Geographic 34
  35. Internet of Things 35
  36. A Thought Exercise Consider that when a company like Caterpillar moves into data science, they won’t be building the world’s next search engine or social network. They will most likely be optimizing supply chain, optimizing fuel costs, automating data feedback loops integrated into their equipment… Operations Research – crunching amazing amounts of data 36
  37. A Thought Exercise That’s a $50B company, in a market segment worth $250B. Upcoming: tractors as drones – guided by complex, distributed data apps 37
  38. Alternatively… climate.com 38
  39. Two Avenues to the App Layer Enterprise: must contend with complexity at scale every day… incumbents extend current practices and infrastructure investments. Start-ups: crave complexity and scale to become viable… new ventures move into Enterprise space to compete using relatively lean staff. [axis labels: complexity ➞, scale ➞] 39
  40. Then, Now, and Ahead AHEAD: 1. Keep Austin Weird? 2. Something Called Data Science 3. Rise Of The Machine Data 4. A Cambrian Explosion 5. Eat, Drink, Be Merry… 6. Data-Driven In TX 7. Roll Up Your Sleeves 40
  41. For instance… Let’s drill down on that intersection of tractors and crops, as a focus… Some of the largest use cases for large-scale data workflows which we encounter are in Agriculture. Here’s a sector which integrates some of those themes from the Internet of Things, Caterpillar, Climate Corp, etc. 41
  42. Data and Agriculture, Ahead • single largest employer, livelihood for 40% globally • 500 million small farms worldwide • most family farmers rely on rain-fed agriculture • approx $2T agricultural real estate in US alone • high annual rate of soil depletion • cycles of flooding, drought, desertification • high resolution from private satellite networks, e.g., skyboximaging.com • SMS networks for “business intelligence” among family farmers in Ethiopia agrepedia.com • microfinance, e.g., kiva.org, slowmoney.org 42
  43. Data and Agriculture, Ahead Consider the emerging reality of drone tractors, guided by satellite feeds, with predictive analytics accessing remote cloud-based clusters, crunching data for crops planted per-plot, based on years of history evaluated in time series analysis. It would be difficult to identify a bigger Big Data problem in the world. 43
  44. Data and Agriculture, Ahead You’ve heard about Peak Oil, Peak Phosphorus? How about Peak Snow? In other words, rising variance of snowpack levels, increasingly earlier peak snow in the mountains… which stresses the watersheds, infrastructure, etc., which in turn stress agriculture, energy, transportation, financial markets, tax basis, etc. Jeff Dozier, William Gail, “The Emerging Science of Environmental Applications”, The Fourth Paradigm, 2009. source: J. Dozier, et al., UCSB 44
  45. Data and Agriculture, Ahead Variance in the timing of the water cycle causes stress on natural resources and infrastructure: reservoirs, aqueducts, river ways, aquifers, levees, farm lands, seawater incursion, etc. Even in the face of so much IoT data looming, we lack adequate data and modeling of snowpack, snow melt, runoff, evaporation, water basins, etc., to understand the impact of these changes – now needed to forecast where to change infrastructure or strategies. There’s not much machine data up in the mountain peaks, and satellite data only serves so far… new opportunities for Big Data. source: J. Dozier, et al., UCSB 45
  46. Data and Agriculture, Ahead 46
  47. Data and Agriculture, Ahead We can resolve these kinds of problems; however, solutions must leverage huge amounts of data 47
  48. Then, Now, and Ahead AHEAD: 1. Keep Austin Weird? 2. Something Called Data Science 3. Rise Of The Machine Data 4. A Cambrian Explosion 5. Eat, Drink, Be Merry… 6. Data-Driven In TX 7. Roll Up Your Sleeves 48
  49. Everything’s Bigger in Texas Agriculture is just one sector, one set of problems to tackle. We have much, much more here in Texas. For example, Houston is a major center for Maritime work… check out: marinexplore.org 49
  50. Everything’s Bigger in Texas There’s also the not so small matter of the Energy and Transportation sectors. GE is putting sensors in each and every wind generator, each and every jet engine – again, the Internet of Things. I’ve heard rumors there are a few of those wind turbines out in West Texas? 50
  51. Everything’s Bigger in Texas Another of the fastest growing use cases we see for large-scale predictive modeling is in Telecom. Think about the stream of call detail records (CDRs), billions of us bipeds wandering about the planet with our phones… The firehose for that makes Twitter look like MySpace! The value of location services as data products for local businesses and communities is astounding. 51
  52. Then, Now, and Ahead AHEAD: 1. Keep Austin Weird? 2. Something Called Data Science 3. Rise Of The Machine Data 4. A Cambrian Explosion 5. Eat, Drink, Be Merry… 6. Data-Driven In TX 7. Roll Up Your Sleeves 52
  53. What is needed? Approximately 80% of the costs for data-related projects get spent on data preparation – mostly on cleaning up data quality issues: ETL, log file analysis, etc. Unfortunately, data-related budgets for many companies tend to go into frameworks which can only be used after cleanup. Most valuable skills: ‣ learn to use programmable tools that prepare data ‣ learn to generate compelling data visualizations ‣ learn to estimate the confidence for reported results ‣ learn to automate work, making analysis repeatable. source: D3 53
  54. What else do we need? • more emphasis on statistical thinking • not SQL vs. NoSQL, but instead a focus on apps as the process of structuring data • multi-disciplinary teams, not cubicles and silos • evolving more feedback loops, to drive more automation • oddly enough, we need automation to be able to employ more people in intelligent, productive ways • otherwise, we’re left with… source: Schwa Corporation 54
  55. source: Twentieth Century Fox 55
  56. Thank you very much! source: Twentieth Century Fox 56
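
The word-count workflow diagrammed on slide 16 (Document Collection, Tokenize, Scrub, HashJoin against a Stop Word List, GroupBy, Count) can be sketched in a few lines of plain Python. This is only an illustration of the data flow, not Cascading code: the function names, the tokenizing regex, and the `STOP_WORDS` set are all assumptions made for the example, and the step order is one reading of the flattened diagram.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the slide does not enumerate one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}


def tokenize(documents):
    """Split each document into raw tokens (stand-in for the Tokenize step)."""
    for text in documents:
        for token in re.split(r"[ \[\]\(\),.]+", text):
            yield token


def scrub(tokens):
    """Trim and lowercase tokens, dropping empty ones (stand-in for Scrub)."""
    for token in tokens:
        cleaned = token.strip().lower()
        if cleaned:
            yield cleaned


def remove_stop_words(tokens, stop_words):
    """Keep tokens with no match in the stop-word list (stand-in for the HashJoin step)."""
    return (token for token in tokens if token not in stop_words)


def word_count(documents, stop_words=STOP_WORDS):
    """Group identical tokens and count them (stand-in for GroupBy plus Count)."""
    return Counter(remove_stop_words(scrub(tokenize(documents)), stop_words))


if __name__ == "__main__":
    docs = ["The rain in Spain", "Rain is the thing."]
    for word, count in sorted(word_count(docs).items()):
        print(word, count)
```

Each function maps to one box in the diagram, so a stage can be swapped or tested on its own; a workflow framework takes the same chain of steps and plans it onto a cluster.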
