Big Data and the Art of Data Science

Everybody has heard of Big Data, and its promise as the next great frontier for innovation. However, Big Data is neither new nor easily defined. What are the key drivers that make Big Data so critically important today? What is the single idea behind Big Data that promises such game-changing outcomes for capable organizations? Who are the skilled people who deliver Big Data results?

This presentation briefly reviews the opportunities, motivation and trends that are driving Big Data disruption. Data science is introduced as the enabling engine for Big Data transformation via the creation of new Data Products. The data scientist is defined and his tools, workflow and challenges are reviewed. Finally, practical tips are presented for approaching data product development.

Key takeaways include:
- Big Data disruption is driven by four megatrends
- Data is the essential raw material for creating valuable Data Products
- Data scientists are heterogeneous by role & skill set, but share common tools, workflows and challenges
- Data science talent is more important than raw data for Big Data success

These slides are modified from an invited presentation for the Gwinnett Chamber of Commerce on March 18, 2014. An excerpt was presented at the Georgia Pacific Social Media Working Session on March 19, 2014.

  • Herman Hollerith. Obsolete. 1880 census: 50,189,209 people; 1890 census: 62,947,714 people.
  • ~15 mins via 10 Gbps LAN to transfer 1 TB; ~220 hrs for 1 PB => move the servers?
  • Harlan Harris
  • Data is the new currency of business. Understand customer use, behavior, and interests. Targeted products and marketing offers. Understand customer experience across network, services, and social conversation. Network optimization. Connect with OTT players, advertisers, and verticals. New business models.
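The transfer-time note above can be checked with a quick back-of-the-envelope calculation. The sketch below makes some assumptions. It uses decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes). It also assumes the 10 Gbps link runs at its full nominal rate with no protocol overhead. The helper name `transfer_seconds` is hypothetical.

```python
# Idealized transfer time over a network link. ASSUMPTIONS: decimal units
# and a link that sustains its full nominal rate with zero protocol overhead.

def transfer_seconds(num_bytes: float, link_bits_per_sec: float) -> float:
    """Return the idealized time in seconds to move num_bytes over the link."""
    return num_bytes * 8 / link_bits_per_sec  # bytes -> bits, then divide by rate

TB = 10**12          # decimal terabyte, in bytes
PB = 10**15          # decimal petabyte, in bytes
LINK_10GBPS = 10e9   # 10 Gbps, in bits per second

one_tb = transfer_seconds(TB, LINK_10GBPS)
one_pb = transfer_seconds(PB, LINK_10GBPS)

print(f"1 TB: {one_tb / 60:.1f} minutes")  # 800 s, about 13.3 minutes
print(f"1 PB: {one_pb / 3600:.1f} hours")  # 800,000 s, about 222.2 hours
```

Under these ideal assumptions, 1 TB takes about 13 minutes and 1 PB takes about 222 hours. This is consistent with the note's rounded "~15 mins" and "~220 hrs". Real links with overhead would be slower, which only strengthens the "move the servers?" point.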


  • 1. Big Data and the Art of Data Science Andrew B. Gardner, PhD www.linkedin.com/in/andywocky/ agardner@momentics.com www.momentics.com
  • 2. Big Data is Not New. Big Data Challenge: 1880 census – 50M people. The First Big Data Solution • Hollerith Tabulating System • Punched cards – 80 variables {age, number of insanes, …} • Used for 1890 census • 6 weeks instead of 7+ years (7 years → 6 weeks). Image Credit – http://en.wikipedia.org/wiki/File:1880_census_Edison.gif; Image Credit – http://en.wikipedia.org/wiki/File:Hollerith_Punched_Card.jpg; Image Credit – http://en.wikipedia.org/wiki/File:HollerithMachine.CHM.jpg
  • 3. Big Data Is More Than 3 Vs* • Volume: storage, transport, processing; IDC Report 2011: 8 billion TB in 2015, 40 billion TB in 2020; 90% of all data created in the last 2 years • Variety: relational, graph, time series, sensor, audio, video, text, geo, scientific, …; 80% unstructured • Velocity: real time, stream • Examples: facebook 500 TB/day; Large Hadron Collider 35 GB/sec; twitter 300K tweets/min. *2001 (META Group) / 2012 (Gartner) definition of Big Data
  • 4. Big Data Opportunities “… big data market will grow from $3.2B (2010) to $16.9B (2015)…” “… gains of 5-6% productivity and profitability …” “… business volume will double every 1.2 years …” “… required for companies to stay innovative and competitive …” “… retail 60% increase in net margin attainable …” “… manufacturing production costs decrease 50% …” “… $300B annual savings in healthcare …” IBM | The Economist | McKinsey & Company | PWC | KPMG | Accenture
  • 5. Big Data Successes • Walmart: 10-15% online sales lift, $1B incremental revenue • Recommendations • Engineered content • 2012 Presidential Election • Fleet telematics save fuel
  • 6. What’s Going On?
  • 7. 1: Growth of Data. Amount of data in the world: 2005 – 100 EB; 2012 – 2800 EB; 2013 – 8000 EB (1 EB = 1 exabyte = 1 billion GB) … doubles every 2 years
  • 8. 2: Connectedness & Sources. More non-human nodes online than people: 50B+ non-human nodes online, the Internet of Things (IoT). Data sources: social, mobile, web, enriched, data science, IoT. Source: Swan, M. Sensor Mania! The Internet of Things, Objective Metrics, and the Quantified Self 2.0. J Sens Actuator Netw (2012) 1(3), 217-253.
  • 9. 3: Demand Increasing dependence on data.
  • 10. 4: Economics. Attention economy, not information economy! • Data is bountiful • Storage is cheap • Computing is cheap • Analysis is cheap • Talent is expensive • Time is expensive
  • 11. Big Data Disruption. OLD: (few) well-calculated questions first • define schema • pour in data • analyze. NEW: data first, then exploratory decision making • collect data • explore • schema as needed. Unknown unknowns = insight gold. Better Cycle Times and Better Questions Win!
  • 12. Rumsfeld Analytics • Things we know we know: Facts – could be wrong. • Things we know we don’t know: Questions – do reporting. • Things we don’t know we know: Intuition – quantify to improve. • Things we don’t know we don’t know: Exploration – unfair advantages. Goal: data discoveries = insights = game changers = unknown unknowns.
  • 13. Data Alone is Just An Asset • Depreciating • Liability • Useful lifetime • Expense. Finished goods create value from raw materials: data → data product → $$
  • 14. Enter the Data Scientist • mathematical • developer • data talented • problem solver • insight whisperer • product savvy. data + data scientist → data product → $$. Source: FICO Infographic
  • 15. A Brief History of Data Science • BC – The Greeks • 1974 – Peter Naur @ UoC • 2001 – William S. Cleveland @ CSU • 2003 – Journal of Data Science • 2009 – Jeff Hammerbacher @ Facebook • 2010 – Hilary Mason & Chris Wiggins @ Dataists • 2010 – Mike Loukides @ O'Reilly • 2011 – DJ Patil @ LinkedIn
  • 16. Famous Definitions – New Blend. Conway’s “Data Science” Venn Diagram (2010). New skill blend: one-stop rock star. Image credit: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  • 17. Famous Definitions – Skeptic [… with a great salary]
  • 18. Famous Definitions – Comparison
  • 19. Many Flavors of Data Scientist. Alternatively, data roles × skill sets (2012-3 survey), from research to development to business-focused. Harlan Harris, et al. datacommunitydc.org/blog/wp-content/uploads/… Source / Image Credit: H. Harris, S. Murphy, M. Vaisman. “Analyzing the Analyzers.” O’Reilly Media, Jun 2013.
  • 20. Universal Agreement: Scarcity. McKinsey prediction for 2018 (United States): a huge shortage of analytic talent (140K+) and a gap of 1.5M managers who can make decisions based on data analysis. • Talent is the biggest resource • There is a raging talent war. Source: J. Manyika et al., “Big data: The next frontier for innovation, competition, and productivity.” McKinsey Global Institute (2011). http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
  • 21. The Data Scientist’s Craft • Discover unknown unknowns in data • Obtain predictive, actionable insight • Communicate business data stories • Build business decision confidence • Create valuable Data Products
  • 22. Valuable & Reusable Data Products Image credit: Harlan Harris
  • 23. Building Data Products • Objectives: What outcome am I trying to achieve? • Levers: What inputs can we control? • Data: What data can we collect? • Models: How do the levers impact the data? Source / Adapted From: J. Howard. “Designing Great Data Products.” O’Reilly Media, Mar 2012.
  • 24. Data Product Aims provide increase open new improve data
  • 25. Some Data Products fitbit flu tracker amazon traffic ads SIRI
  • 26. How Do Data Scientists Do It? • Tools • Workflow • Creativity
  • 27. Data Science Tools • Java, R, Python • Hadoop, HDFS, MapReduce, Spark, Storm • HBase, Pig, Hive, Shark, Impala • ETL, Webscrapers, Flume, Sqoop • SQL, RDBMS, DW, OLAP • Weka, RapidMiner, numpy, scipy, pandas • D3.js, ggplot2, Wakari, Tableau, Flare, Shiny • SPSS, Matlab, SAS • NoSQL, MongoDB, Redis, … • MS-Excel • Machine Learning • ...
  • 28. Data Science Workflow Source: Josh Wills, Senior Director of Data Science, Cloudera. “From the Lab to the Factory: Building a Production Machine Learning Infrastructure.” + creative exploration
  • 29. Data Science Creativity TECHNOLOGY (feasibility) BUSINESS (viability) HUMAN VALUES (usability, desirability) 1. Design thinking 2. Scientific method 3. Lots of ideas 4. Inspiration 5. Perspiration
  • 30. Challenges for Data Scientists • Stakeholder naïveté – 2-3 days, right? • Red tape – No access allowed • Terminology – What’s a wonkulator? • Real world data – Messy, noisy, missing, … • Unknown need – What’s the business goal? • Stakeholder alignment – CMO, CIO, Prod, DevOps • Analysis distrust – … but I don’t like that result
  • 31. Some Practical Tips • Rapid iteration: implement, implement, feedback • Visualize, draw, sketch, share • Start simple, start small • Goal, but not perfection
  • 32. Big Data Science & Sensemaking Source: HP “Monetizing Big Data” Perspective.
  • 33. A Final Word of Caution [Figure: hype curve of expectations over time (hype, hope, happy), placing big data and cloud computing; 2013 and 2018-2023 marked.] Adapted from: Gartner’s 2013 Hype Cycle Special Report (Jul 2013).
  • 34. Notable Quotes • “Simple models and a lot of data trump more elaborate models based on less data.” - Peter Norvig • “In God we trust, all others bring data.” - W.E. Deming • “Big data is not about the data! The value in big data [is in] the analytics.” - Harvard Prof. Gary King
  • 35. Conclusion • Data is an asset; talent is a more valuable asset. • Big data represents a disruptive shift. • Data science is the magic enabler via Data Products. • Better + faster explorations & questions win. Andrew B. Gardner, PhD http://linkd.in/1byADxC agardner@momentics.com www.momentics.com
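Slide 7's "doubles every 2 years" is a rule of thumb for exponential growth. The sketch below uses two hypothetical helpers, `project` and `implied_doubling_time`. The first projects a volume forward under a fixed doubling time. The second works backward from two observations to the doubling time they imply. The inputs are the slide's own 2005 and 2012 figures, in EB.

```python
import math

def project(volume: float, years: float, doubling_time: float = 2.0) -> float:
    """Project a volume forward assuming steady exponential growth that
    doubles every `doubling_time` years (hypothetical helper)."""
    return volume * 2 ** (years / doubling_time)

def implied_doubling_time(v0: float, v1: float, years: float) -> float:
    """Doubling time in years implied by growth from v0 to v1 over `years`."""
    return years * math.log(2) / math.log(v1 / v0)

# Figures from slide 7, in exabytes (EB).
eb_2005, eb_2012 = 100, 2800

print(project(eb_2005, 7))                         # 2-year rule, 2005 -> 2012: about 1,131 EB
print(implied_doubling_time(eb_2005, eb_2012, 7))  # about 1.46 years
```

Applied to the slide's 2005 and 2012 figures, the second helper gives roughly 1.5 years. In other words, those two data points grow somewhat faster than the two-year rule of thumb.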
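Slide 11 contrasts the OLD flow (define schema, pour in data, analyze) with the NEW flow (collect data, explore, schema as needed). This is often called schema-on-write versus schema-on-read. The sketch below illustrates the idea with the standard `json` module. The event records, field names and helper functions are all hypothetical.

```python
import json

# Hypothetical raw event log: records do not all share the same fields.
RAW_EVENTS = [
    '{"user": "a", "action": "view", "ms": 120}',
    '{"user": "b", "action": "buy", "amount": 19.99}',
    '{"user": "a", "action": "view", "ms": 95, "device": "mobile"}',
]

# OLD (schema-on-write): columns are fixed before loading, so any field
# outside the schema is silently dropped at load time.
SCHEMA = ("user", "action", "ms")

def load_with_schema(lines):
    return [tuple(json.loads(line).get(col) for col in SCHEMA) for line in lines]

# NEW (schema-on-read): keep the raw records, explore what is there, and
# project a schema only when a particular question needs one.
def explore_fields(lines):
    fields = set()
    for line in lines:
        fields.update(json.loads(line).keys())
    return sorted(fields)

def project_columns(lines, columns):
    return [tuple(json.loads(line).get(col) for col in columns) for line in lines]

print(load_with_schema(RAW_EVENTS))                   # "amount" and "device" are lost
print(explore_fields(RAW_EVENTS))                     # ['action', 'amount', 'device', 'ms', 'user']
print(project_columns(RAW_EVENTS, ("user", "amount")))
```

Under schema-on-write, the "amount" and "device" fields never reach the analyst. Under schema-on-read, exploration surfaces them first, which is where the slide's "unknown unknowns" can show up.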
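Slide 23 frames a data product as objectives, levers, data and models. The toy sketch below is a deliberately simplified illustration, not the method from the cited article. Every number in it is made up. It maps the four steps as follows:

- **Objective:** profit.
- **Lever:** price.
- **Data:** hypothetical observations of price and units sold.
- **Model:** a least-squares linear demand curve linking the lever to the outcome.

The model is then used to pick the lever setting that maximizes the objective.

```python
# Hypothetical observations of (price lever, units sold): illustrative only.
OBSERVED = [(8.0, 120), (10.0, 100), (12.0, 80), (14.0, 60)]
UNIT_COST = 5.0  # hypothetical cost per unit

def fit_linear(points):
    """Ordinary least squares fit of units = a + b * price (the 'model')."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def profit(price, a, b):
    """The 'objective': margin times predicted units (never below zero units)."""
    units = max(a + b * price, 0.0)
    return (price - UNIT_COST) * units

a, b = fit_linear(OBSERVED)
candidates = [p / 2 for p in range(10, 61)]  # candidate prices 5.0 .. 30.0 in 0.5 steps
best = max(candidates, key=lambda p: profit(p, a, b))
print(a, b, best)
```

A grid search over the lever keeps the example easy to follow. A real data product would need a model validated beyond the observed price range.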
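Slide 27 lists Hadoop and MapReduce among the core tools. The sketch below shows the MapReduce pattern (map, shuffle, reduce) in pure Python, using word count as the example. It is not Hadoop's API. The function names and input documents are hypothetical, and in a real cluster each phase runs distributed across many machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)

documents = ["big data is not new", "data science is not just big data"]
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts["data"])  # 3
```

The map and reduce functions only see one document or one key at a time. That independence is what lets a framework spread the work over a cluster.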