This document discusses the rise of big data and data science. It notes that while data volumes are growing exponentially, data alone is just an asset; it is data scientists who create value by building data products that provide insights. The document outlines the data science workflow and highlights both the tools used and the challenges faced by data scientists in extracting value from big data.
Big Data and the Art of Data Science
1. Big Data and the Art of Data Science
Andrew B. Gardner, PhD
www.linkedin.com/in/andywocky/
agardner@momentics.com
www.momentics.com
2. Big Data is Not New
Big Data Challenge
1880 census – 50M people
The First Big Data Solution
• Hollerith Tabulating System
• Punched cards – 80 variables
• Used for 1890 census
• 6 weeks instead of 7+ years
[Figure: Hollerith Tabulation System; census fields such as {age, number of insanes, …}; tabulation time 7 years vs. 6 weeks]
Image Credit – http://en.wikipedia.org/wiki/File:1880_census_Edison.gif
Image Credit – http://en.wikipedia.org/wiki/File:Hollerith_Punched_Card.jpg
Image Credit – http://en.wikipedia.org/wiki/File:HollerithMachine.CHM.jpg
3. Big Data Is More Than 3 Vs*
Volume
• IDC Report 2011: 8 billion TB in 2015; 40 billion TB in 2020
• 90% of all data is less than 2 years old
• storage, transport, processing
Variety
• relational, graph, time series, sensor, audio, video, text, geo, scientific, …
• 80% unstructured
Velocity
• Facebook: 500 TB/day
• Large Hadron Collider: 35 GB/sec
• Twitter: 300K tweets/min
• real-time streams
*2001 (Meta) / 2012 (Gartner) Definition of Big Data
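As a back-of-the-envelope check on the Volume figures, the projection from 8 billion TB (2015) to 40 billion TB (2020) implies a constant annual growth rate and a doubling time. This is a minimal sketch that assumes smooth exponential growth between the two points (an assumption, not something the report states); the function names are illustrative.

```python
import math

def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate implied by growing from start to end over `years`."""
    return (end / start) ** (1.0 / years) - 1.0

def doubling_time(start: float, end: float, years: float) -> float:
    """Years needed to double at the constant rate implied by start -> end."""
    return years * math.log(2) / math.log(end / start)

# Figures from the Volume column, in billions of TB (assumes exponential growth).
v2015, v2020 = 8.0, 40.0
rate = cagr(v2015, v2020, 2020 - 2015)
t2 = doubling_time(v2015, v2020, 2020 - 2015)
print(f"implied growth: {rate:.1%} per year, doubling every {t2:.2f} years")
```

The result, a doubling roughly every two years, is in line with the rule of thumb quoted on the Growth of Data slide.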
4. Big Data Opportunities
“… big data market will grow from $3.2B (2010) to $16.9B (2015)…”
“… gains of 5-6% productivity and profitability …”
“… business volume will double every 1.2 years …”
“… required for companies to stay innovative and competitive …”
“… retail 60% increase in net margin attainable …”
“… manufacturing production costs decrease 50% …”
“… $300B annual savings in healthcare …”
IBM | The Economist | McKinsey & Company | PWC | KPMG | Accenture
5. Big Data Successes
Walmart
• 10-15% online sales lift
• $1B incremental revenue
• Recommendations
• Engineered content
• 2012 Presidential Election
• Fleet telematics save fuel
7. 1: Growth of Data
Amount of data in the world…
• 2005: 100 EB
• 2012: 2800 EB
• 2013: 8000 EB
1 EB = 1 Exabyte = 1 billion GB
… doubles every 2 years
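The slide uses decimal (SI) byte units, where 1 EB is 10^18 bytes, i.e. 1 billion GB. A tiny converter helps cross-check figures quoted in different units; for instance, 8 billion TB works out to 8000 EB. This sketch assumes decimal units throughout; binary units (e.g. EiB = 2^60 bytes) would give somewhat different numbers.

```python
# Decimal (SI) byte units, as assumed on the slide: 1 EB = 10**18 bytes = 1 billion GB.
UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9,
         "TB": 10**12, "PB": 10**15, "EB": 10**18}

def convert(amount: float, src: str, dst: str) -> float:
    """Convert an amount between decimal byte units, e.g. EB -> GB."""
    return amount * UNITS[src] / UNITS[dst]

print(convert(1, "EB", "GB"))    # 1 EB expressed in GB
print(convert(8e9, "TB", "EB"))  # 8 billion TB expressed in EB
```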
8. 2: Connectedness & Sources
• More non-human nodes online than people
• 50B+ non-human nodes online
• The Internet of Things (IoT)
Source: Swan, M. Sensor Mania! The Internet of Things, Objective Metrics, and the Quantified Self 2.0. J Sens Actuator Netw (2012) 1(3), 217-253.
Data Sources: social, mobile, web, enriched data, science, IoT
10. 4: Economics
Attention economy not information economy!
• Data is bountiful
• Storage is cheap
• Computing is cheap
• Analysis is cheap
• Talent is expensive
• Time is expensive
11. Big Data Disruption
OLD: (few) well-calculated questions first
• define schema
• pour in data
• analyze
NEW: data first, then exploratory decision making
• collect data
• explore
• schema as needed
unknown unknowns = insight gold
Better Cycle Times and Better Questions Win!
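The OLD/NEW contrast is essentially schema-on-write versus schema-on-read. Below is a minimal sketch of the NEW column, assuming raw events arrive as JSON lines (the event fields are hypothetical): the data is stored untouched, and a schema is derived only when someone explores it.

```python
import json
from collections import defaultdict

# Hypothetical raw events, collected as-is with no upfront schema ("collect data").
raw_events = [
    '{"user": "a", "action": "click", "ts": 1}',
    '{"user": "b", "action": "view", "ts": 2, "device": "mobile"}',
    '{"user": "a", "action": "buy", "ts": 3, "amount": 19.99}',
]

def infer_schema(lines):
    """Schema-on-read: map each field to the sorted type names seen for it."""
    schema = defaultdict(set)
    for line in lines:
        for key, value in json.loads(line).items():
            schema[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in schema.items()}

print(infer_schema(raw_events))  # "schema as needed", computed at exploration time
```

New fields such as `device` or `amount` appear in the inferred schema without any redesign, which is what lets the NEW cycle start from data rather than from a fixed schema.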
12. Rumsfeld Analytics
• Things we know we know – Facts: could be wrong.
• Things we know we don’t know – Questions: do reporting.
• Things we don’t know we know – Intuition: quantify to improve.
• Things we don’t know we don’t know – Exploration: unfair advantages.
Goal: data discoveries = insights = game changers = unknown unknowns.
13. Data Alone is Just An Asset
• Depreciating
• Liability
• Useful lifetime
• Expense
Finished goods create value from raw materials:
data → $$ data product $$
14. Enter the Data Scientist
• mathematical
• developer
• data talented
• problem solver
• insight whisperer
• product savvy
Source: FICO Infographic
data + data scientist → $$ data product $$
15. A Brief History of Data Science
BC - The Greeks
1974 Peter Naur @ UoC
2001 William S. Cleveland @ CSU
2003 Journal of Data Science
2009 Jeff Hammerbacher @ Facebook
2010 Hilary Mason & Chris Wiggins @ Dataists
2010 Mike Loukides @ O'Reilly
2011 DJ Patil @ LinkedIn
16. Famous Definitions – New Blend
Conway’s “Data Science” Venn Diagram (2010)
Image credit: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
new skill blend: one-stop rock star
19. Many Flavors of Data Scientist
Alternatively, Data Roles × Skill Sets (2012-3 survey)
… from research to development to business-focused
[Figure: role × skill matrix from the 2012-3 survey by H. Harris, S. Murphy, and M. Vaisman]
Source / Image Credit: H. Harris, S. Murphy, M. Vaisman. “Analyzing the Analyzers.” O’Reilly Media, Jun 2013.
20. Universal Agreement: Scarcity
McKinsey Prediction for 2018:
• Huge shortage of analytic talent (140K+).
• Gap of 1.5M managers who can make decisions based on data analysis.
• Talent is the biggest resource
• There is a raging talent war
Source: J. Manyika et al., “Big data: The next frontier for innovation, competition, and productivity.” McKinsey Global Institute (2011).
http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
21. The Data Scientist’s Craft
• Discover unknown unknowns in data
• Obtain predictive, actionable insight
• Communicate business data stories
• Build business decision confidence
• Create valuable Data Products
23. Building Data Products
• Objectives – What outcome am I trying to achieve?
• Levers – What inputs can we control?
• Data – What data can we collect?
• Models – How do the levers impact the data?
Source / Adapted From: J. Howard, “Designing Great Data Products.” O’Reilly Media, Mar 2012.
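The four questions chain together: a model predicts how the levers move the outcome, and the objective scores each lever setting so the best one can be picked. Below is a minimal sketch assuming a single pricing lever and a made-up linear demand model (both purely illustrative, not from the source); in practice the model would be learned from the collected data.

```python
# Objective: maximize revenue. Lever: the price we set.
# Model: a hypothetical linear demand curve standing in for one fitted from data.

def demand(price: float) -> float:
    """Hypothetical model of how the price lever affects units sold."""
    return max(0.0, 1000.0 - 40.0 * price)

def objective(price: float) -> float:
    """Outcome we are trying to achieve: revenue = price * predicted demand."""
    return price * demand(price)

# Search over lever settings we can actually control (prices in 0.50 steps).
candidates = [p / 2 for p in range(1, 61)]
best_price = max(candidates, key=objective)
print(best_price, objective(best_price))
```

A grid search is used here only because the lever has one dimension; with several levers the same structure holds, but the search would usually be replaced by an optimizer or a simulation.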
28. Data Science Workflow
[Figure: data science workflow diagram, annotated “+ creative exploration”]
Source: Josh Wills, Senior Director of Data Science, Cloudera. “From the Lab to the Factory: Building a Production Machine Learning Infrastructure.”
30. Challenges for Data Scientists
• Stakeholder naïveté
– 2-3 days, right?
• Red tape
– No access allowed
• Terminology
– What’s a wonkulator?
• Real world data
– Messy, noisy, missing, …
• Unknown need
– What’s the business goal?
• Stakeholder alignment
– CMO, CIO, Prod, DevOps
• Analysis distrust
– … but I don’t like that result
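The “Real world data” challenge usually means parsing, screening outliers, and filling gaps before any modeling. A minimal standard-library sketch, assuming numeric readings where a fixed deviation from the median (the hypothetical `max_dev` threshold) marks an outlier and gaps are filled with the median of the remaining clean values; the data is invented for illustration.

```python
import math
import statistics

# Hypothetical sensor readings with typical problems:
# missing values (None), unparseable strings, and an implausible outlier.
raw = ["21.5", "22.1", None, "n/a", "21.9", "250.0", "22.4"]

def parse(value):
    """Return a float, or None when the value is missing or unparseable."""
    try:
        x = float(value)
    except (TypeError, ValueError):
        return None
    return x if math.isfinite(x) else None

def clean(values, max_dev=10.0):
    """Parse, drop readings far from the median, then impute gaps with the median."""
    parsed = [parse(v) for v in values]
    med = statistics.median([x for x in parsed if x is not None])
    kept = [x if x is not None and abs(x - med) <= max_dev else None for x in parsed]
    fill = statistics.median([x for x in kept if x is not None])
    return [x if x is not None else fill for x in kept]

print(clean(raw))
```

The median is used for both screening and imputation because, unlike the mean, it is not dragged upward by the outlier it is supposed to detect.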
31. Some Practical Tips
• Rapid iteration: implement, feedback, implement
• Visualize, draw, sketch, share
• Start simple, start small
• Goal, but not perfection
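“Start simple, start small” often means measuring a trivial baseline before trying anything more elaborate, so every added bit of complexity has to earn its keep. A minimal sketch on toy data invented for illustration: a predict-the-mean baseline, then an ordinary least-squares line, compared by mean absolute error.

```python
import statistics

# Hypothetical toy data: ad spend (xs) vs. sales (ys).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def mae(preds, targets):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)

# Step 1: the simplest possible baseline, always predict the mean.
baseline = [statistics.fmean(ys)] * len(ys)

# Step 2: one small step up, an ordinary least-squares line.
x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar
line = [slope * x + intercept for x in xs]

print(f"baseline MAE={mae(baseline, ys):.2f}, linear MAE={mae(line, ys):.2f}")
```

If the richer model does not clearly beat the baseline on held-out data, that is a signal to iterate on data or framing rather than on model complexity.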
32. Big Data Science & Sensemaking
Source: HP “Monetizing Big Data” Perspective.
33. A Final Word of Caution
[Figure: expectations vs. time curve (hype → hope → happy) placing big data and cloud computing, spanning 2013 to 2018-2023]
Adapted from: Gartner’s 2013 Hype Cycle Special Report (Jul 2013).
34. Notable Quotes
Simple models and a lot of data trump more elaborate models based on less data.
- Peter Norvig
In God we trust, all others bring data.
- W.E. Deming
Big data is not about the data! The value in big data [is in] the analytics.
- Harvard Prof. Gary King
35. Conclusion
• Data is an asset, talent is a more valuable asset.
• Big data represents a disruptive shift.
• Data science is the magic enabler via Data Products.
• Better + faster explorations & questions win.
Andrew B. Gardner, PhD
http://linkd.in/1byADxC
agardner@momentics.com
www.momentics.com
Editor's Notes
Herman Hollerith. Obsolete. 1880 – 50,189,209; 1890 – 62,947,714
~15 mins via 10 Gbps LAN to transfer 1 TB; ~220 hrs for 1 PB => move the servers?
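The note’s figures follow from simple arithmetic: at an ideal 10 Gbps, 1 TB (8 × 10^12 bits) takes 800 seconds, about 13 minutes (so ~15 minutes with some overhead), and 1 PB takes 1000 times longer, about 222 hours. A sketch assuming decimal units and a hypothetical `efficiency` factor to model protocol overhead:

```python
def transfer_hours(size_bytes: float, link_bits_per_sec: float, efficiency: float = 1.0) -> float:
    """Hours to move size_bytes over a link; efficiency < 1 models protocol overhead."""
    seconds = size_bytes * 8 / (link_bits_per_sec * efficiency)
    return seconds / 3600

TB, PB = 10**12, 10**15  # decimal units
GBPS = 10**9             # bits per second

print(f"1 TB: {transfer_hours(TB, 10 * GBPS) * 60:.1f} min")
print(f"1 PB: {transfer_hours(PB, 10 * GBPS):.0f} h")
```

At petabyte scale the transfer time grows into days, which is why the note asks whether it is easier to move the servers than the data.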
Harlan Harris
Data is the new currency of business. Understand customer use, behavior, and interests. Targeted products and marketing offers. Understand customer experience across network, services, and social conversation. Network optimization. Connect with OTT players, advertisers, and verticals. New business models.