Big Data and the Art of Data Science

Everybody has heard of Big Data and its promise as the next great frontier for innovation. However, Big Data is neither new nor easily defined. What are the key drivers that make Big Data so critically important today? What is the single idea behind Big Data that promises such game-changing outcomes for capable organizations? Who are the skilled people who deliver Big Data results?

This presentation briefly reviews the opportunities, motivation and trends that are driving Big Data disruption. Data science is introduced as the enabling engine for Big Data transformation via the creation of new Data Products. The data scientist is defined, and the tools, workflow and challenges of the role are reviewed. Finally, practical tips are presented for approaching data product development.

Key takeaways include:
- Big Data disruption is driven by four megatrends
- Data is the essential raw material for creating valuable Data Products
- Data scientists are heterogeneous by role & skill set, but share common tools, workflows and challenges
- Data science talent is more important than raw data for Big Data success

These slides are modified from an invited presentation for the Gwinnett Chamber of Commerce on March 18, 2014. An excerpt was presented at the Georgia Pacific Social Media Working Session on March 19, 2014.

Published in: Technology

Big Data and the Art of Data Science

  1. 1. Big Data and the Art of Data Science Andrew B. Gardner, PhD www.linkedin.com/in/andywocky/ agardner@momentics.com www.momentics.com
  2. 2. Big Data is Not New Big Data Challenge: 1880 census – 50M people The First Big Data Solution: Hollerith Tabulating System • Punched cards – 80 variables {age, number of insanes, …} • Used for 1890 census • 6 weeks instead of 7+ years (7 years → 6 weeks) Image Credit – http://en.wikipedia.org/wiki/File:1880_census_Edison.gif Image Credit – http://en.wikipedia.org/wiki/File:Hollerith_Punched_Card.jpg Image Credit – http://en.wikipedia.org/wiki/File:HollerithMachine.CHM.jpg
  3. 3. Big Data Is More Than 3 Vs* *2001 (META Group) / 2012 (Gartner) definition of Big Data. Volume: IDC Report 2011 – 8 billion TB in 2015, 40 billion TB in 2020; 90% of all data < 2 years old; storage, transport, processing. Variety: relational, graph, time series, sensor, audio, video, text, geo, scientific, …; 80% unstructured. Velocity: facebook 500 TB/day; Large Hadron Collider 35 GB/sec; twitter 300K tweets/min; real time, stream.
  4. 4. Big Data Opportunities “… big data market will grow from $3.2B (2010) to $16.9B (2015)…” “… gains of 5-6% productivity and profitability …” “… business volume will double every 1.2 years …” “… required for companies to stay innovative and competitive …” “… retail 60% increase in net margin attainable …” “… manufacturing production costs decrease 50% …” “… $300B annual savings in healthcare …” IBM | The Economist | McKinsey & Company | PWC | KPMG | Accenture
  5. 5. Big Data Successes Walmart • 10-15% online sales lift • $1B incremental revenue • Recommendations • Engineered content • 2012 Presidential Election • Fleet telematics save fuel
  6. 6. What’s Going On?
  7. 7. 1: Growth of Data Amount of data in the world… 2005 100 EB 2012 2800 EB 2013 8000 EB 1 EB = 1 Exabyte = 1 billion GB … doubles every 2 years
  8. 8. 2: Connectedness & Sources More non-human nodes online than people 50B+ non-human nodes online The Internet of Things (IoT) Source: Swan, M. Sensor Mania! The Internet of Things, Objective Metrics, and the Quantified Self 2.0. J Sens Actuator Netw (2012) 1(3), 217-253. social mobile web enriched data science IoT Data Sources
  9. 9. 3: Demand Increasing dependence on data.
  10. 10. 4: Economics Attention economy not information economy! • Data is bountiful • Storage is cheap • Computing is cheap • Analysis is cheap • Talent is expensive • Time is expensive
  11. 11. Big Data Disruption OLD: • define schema • pour in data • analyze → (few) well calculated questions first. NEW: • collect data • explore • schema as needed → data first, then exploratory decision making; unknown unknowns = insight gold. Better Cycle Times and Better Questions Win!
  12. 12. Rumsfeld Analytics Things we know we know: Facts – could be wrong. Things we know we don’t know: Questions – do reporting. Things we don’t know we know: Intuition – quantify to improve. Things we don’t know we don’t know: Exploration – unfair advantages. Goal: data discoveries = insights = game changers = unknown unknowns.
  13. 13. Data Alone is Just An Asset • Depreciating • Liability • Useful lifetime • Expense Finished goods create value from raw materials data $$ data product $$
  14. 14. Enter the Data Scientist • mathematical • developer • data talented • problem solver • insight whisperer • product savvy Source: FICO Infographic data + data scientist $$ data product $$
  15. 15. A Brief History of Data Science BC - The Greeks 1974 Peter Naur @ UoC 2001 William S. Cleveland @ CSU 2003 Journal of Data Science 2009 Jeff Hammerbacher @Facebook 2010 Hilary Mason & Chris Wiggins @ Dataists 2010 Mike Loukides @ O'Reilly 2011 DJ Patil @ LinkedIn
  16. 16. Famous Definitions – New Blend Conway’s “Data Science” Venn Diagram (2010) Image credit: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram new skill blend: one stop rock star
  17. 17. Famous Definitions – Skeptic [… with a great salary]
  18. 18. Famous Definitions – Comparison
  19. 19. Many Flavors of Data Scientist Alternatively, Data Roles × Skill Sets (2012-3 Survey) … from research to development to business-focused Source / Image Credit: H. Harris, S. Murphy, M. Vaisman. “Analyzing the Analyzers.” O’Reilly Media, Jun 2013.
  20. 20. Universal Agreement: Scarcity McKinsey Prediction: In 2018, a huge shortage of analytic talent (140K+) and a gap of 1.5M managers who can make decisions based on data analysis. • Talent is the biggest resource • There is a raging talent war Source: J. Manyika et al., “Big data: The next frontier for innovation, competition, and productivity.” McKinsey Global Institute (2011). http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
  21. 21. The Data Scientist’s Craft • Discover unknown unknowns in data • Obtain predictive, actionable insight • Communicate business data stories • Build business decision confidence • Create valuable Data Products
  22. 22. Valuable & Reusable Data Products Image credit: Harlan Harris
  23. 23. Building Data Products Objectives: What outcome am I trying to achieve? Levers: What inputs can we control? Data: What data can we collect? Models: How do the levers impact the data? Source / Adapted From: J. Howard. “Designing Great Data Products.” O’Reilly Media, Mar 2012.
  24. 24. Data Product Aims provide increase open new improve data
  25. 25. Some Data Products fitbit flu tracker amazon traffic ads SIRI
  26. 26. How Do Data Scientists Do It? • Tools • Workflow • Creativity
  27. 27. Data Science Tools • Java, R, Python • Hadoop, HDFS, MapReduce, Spark, Storm • HBase, Pig, Hive, Shark, Impala • ETL, Webscrapers, Flume, Sqoop • SQL, RDBMS, DW, OLAP • Weka, RapidMiner, numpy, scipy, pandas • D3.js, ggplot2, Wakari, Tableau, Flare, Shiny • SPSS, Matlab, SAS • NoSQL, MongoDB, Redis, .. • MS-Excel • Machine Learning • ...
  28. 28. Data Science Workflow Source: Josh Wills, Senior Director of Data Science, Cloudera. “From the Lab to the Factory: Building a Production Machine Learning Infrastructure.” + creative exploration
  29. 29. Data Science Creativity TECHNOLOGY (feasibility) BUSINESS (viability) HUMAN VALUES (usability, desirability) 1. Design thinking 2. Scientific method 3. Lots of ideas 4. Inspiration 5. Perspiration
  30. 30. Challenges for Data Scientists • Stakeholder naiveté – 2-3 days, right? • Red tape – No access allowed • Terminology – What’s a wonkulator? • Real world data – Messy, noisy, missing, … • Unknown need – What’s the business goal? • Stakeholder alignment – CMO, CIO, Prod, DevOps • Analysis distrust – … but I don’t like that result
  31. 31. Some Practical Tips Rapid Iteration Implement Implement Feedback Visualize, Draw, Sketch, Share Start Simple, Start Small Goal, But Not Perfection
  32. 32. Big Data Science & Sensemaking Source: HP “Monetizing Big Data” Perspective.
  33. 33. A Final Word of Caution [Figure: expectations vs. time curve with phases labeled hype, hope, happy; big data and cloud computing marked; years 2013 and 2018-2023] Adapted from: Gartner’s 2013 Hype Cycle Special Report (Jul 2013).
  34. 34. Notable Quotes “Simple models and a lot of data trump more elaborate models based on less data.” - Peter Norvig “In God we trust, all others bring data.” - W.E. Deming “Big data is not about the data! The value in big data [is in] the analytics.” - Harvard Prof. Gary King
  35. 35. Conclusion • Data is an asset, talent is a more valuable asset. • Big data represents a disruptive shift. • Data science is the magic enabler via Data Products. • Better + faster explorations & questions win. Andrew B. Gardner, PhD http://linkd.in/1byADxC agardner@momentics.com www.momentics.com
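
The growth figures quoted on slides 4 and 7 can be sanity-checked with a few lines of arithmetic. The sketch below is not part of the original presentation. It assumes smooth exponential growth between the two quoted data points, and the function names are illustrative only.

```python
import math


def doubling_time_years(start_value, end_value, years):
    """Doubling time implied by two observations, assuming constant exponential growth."""
    return years * math.log(2) / math.log(end_value / start_value)


def compound_annual_growth_rate(start_value, end_value, years):
    """Constant yearly growth rate that turns start_value into end_value over `years`."""
    return (end_value / start_value) ** (1 / years) - 1


# Slide 7: 100 EB in 2005 and 2800 EB in 2012 (figures as quoted on the slide).
data_doubling = doubling_time_years(100, 2800, 2012 - 2005)

# Slide 4: market size of $3.2B in 2010 and $16.9B in 2015 (figures as quoted on the slide).
market_cagr = compound_annual_growth_rate(3.2, 16.9, 2015 - 2010)

print(f"Implied doubling time of world data: {data_doubling:.2f} years")
print(f"Implied market growth rate: {market_cagr:.1%} per year")
```

Under that assumption, the slide 7 figures for 2005 and 2012 imply a doubling time of roughly 1.5 years, somewhat faster than the "doubles every 2 years" rule of thumb, and the slide 4 market figures correspond to annual growth of just under 40%.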