Data Science Innovations
#bigdata&artificialintelligence, #thinkdifferentlyaboutdata
6 June 2018
Suresh Sood, PhD
@soody,
linkedin.com/in/sureshsood
suresh.sood@uts.edu.au
Vignettes in the two-step arrival of the internet
of things and its reshaping of marketing
management’s service-dominant logic
Woodside & Sood
Journal of Marketing
Management Volume 33, 2017 -
Issue 1-2: The Internet of Things
(IoT) and Marketing: The State of
Play, Future Trends and the
Implications for Marketing
Areas for Conversation
Democratisation of data science (AI & tools)
Democratisation of big data
Gartner & Forrester Trends
• Natural Language Generation
• Natural Language Processing
• Systems of Insight
Data Science Innovation
#Thinkingdifferentlyaboutdata
Data science innovation is something
an organization has not done before or
even something nobody anywhere has
done before. A data science innovation
focuses on discovering and using new
or untraditional data sources to solve
new problems.
Adapted from:
Franks, B. (2012) Taming the Big Data Tidal
Wave, p. 255, John Wiley & Sons
Data Science Algorithms
Companies are reimagining Business
Processes with Algorithms and there
is “evidence of significant, even
exponential, business gains in
customer engagement,
cost & revenue performance”
Wilson, H., Alter A. and Shukla, P. (2016),
Companies Are Reimagining Business Processes
with Algorithms, Harvard Business Review,
February
Variety of Data Types & Big Data Challenge
1.Astronomical
2.Documents
3.Earthquake
4.Email
5.Environmental sensors
6.Fingerprints
7.Health (personal) Images
8.Graph data (social network)
9.Location
10.Marine
11.Particle accelerator
12.Satellite
13.Scanned survey data
14.Sound & Music
15.Text
16.Transactions
17.Video
Big Data consists of extensive datasets, primarily in the characteristics of volume, variety, velocity, and/or variability, that require a scalable
architecture for efficient storage, manipulation, and analysis.
Computational portability is the movement of the computation to the location of the data.
• The data collected in a single day take nearly two million years to playback on an MP3 player
• Generates enough raw data to fill 15 million 64GB iPods every day
• The central computer has processing power of about one hundred million PCs
• Uses enough optical fiber linking up all the radio telescopes to wrap twice around the Earth
• The dishes when fully operational will produce 10 times the global internet traffic as of 2013
• The supercomputer will perform 10^18 operations per second - equivalent to the number of stars in
three million Milky Way galaxies - in order to process all the data produced.
• Sensitivity to detect an airport radar on a planet 50 light years away.
• Thousands of antennas with a combined collecting area of 1,000,000 square meters (1 sq km)
• Previous mapping of Centaurus A galaxy took a team 12,000 hours of observations and several
years - SKA ETA 5 minutes !
To the scientists involved, however, the SKA is no testbed, it’s a transformative instrument which,
according to Luijten, will lead to “fundamental discoveries of how life and planets and matter all came
into existence. As a scientist, this is a once in a lifetime opportunity.”
Sources: http://bit.ly/amazin-facts & http://bit.ly/astro-ska
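The iPod comparison above can be turned into bytes per day as a quick sanity check. This is a back-of-envelope sketch: it assumes decimal units (1 GB = 10^9 bytes), which the slide does not state, and reads the supercomputer figure as 10^18 operations per second.

```python
# Back-of-envelope conversion of the SKA figures quoted above.
# Assumption: decimal units (1 GB = 10**9 bytes); the source does not say.
GB = 10**9
PB = 10**15

ipods_per_day = 15_000_000   # "15 million 64GB iPods every day"
ipod_capacity_gb = 64

raw_bytes_per_day = ipods_per_day * ipod_capacity_gb * GB
raw_pb_per_day = raw_bytes_per_day / PB

# Assumption: the quoted throughput is read as 10**18 operations per second.
ops_per_second = 10**18
seconds_per_day = 24 * 60 * 60
ops_per_byte = ops_per_second * seconds_per_day / raw_bytes_per_day

print(f"Raw data: {raw_pb_per_day:,.0f} PB per day")
print(f"Compute budget: about {ops_per_byte:,.0f} operations per raw byte")
```

The operations-per-byte ratio is only a rough way of relating the two quoted figures; it says nothing about how the actual processing pipeline is organised.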
Galileo
Square Kilometer Array Construction
(SKA1 - 2018-23; SKA2 - 2023-30)
Centaurus A
The following BigQuery query (note that the wildcard on "TAX_WEAPONS_SUICIDE_" catches suicide vests, suicide bombers, suicide bombings, suicide
jackets, and so on):
SELECT DATE, DocumentIdentifier, SourceCommonName, V2Themes, V2Locations, V2Tone, SharingImage, TranslationInfo
FROM [gdeltv2.gkg]
WHERE (V2Themes LIKE '%TAX_TERROR_GROUP_ISLAMIC_STATE%'
    OR V2Themes LIKE '%TAX_TERROR_GROUP_ISIL%'
    OR V2Themes LIKE '%TAX_TERROR_GROUP_ISIS%'
    OR V2Themes LIKE '%TAX_TERROR_GROUP_DAASH%')
  AND (V2Themes LIKE '%TERROR%TERROR%'
    OR V2Themes LIKE '%SUICIDE_ATTACK%'
    OR V2Themes LIKE '%TAX_WEAPONS_SUICIDE_%')
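To make the filter logic in the query easier to follow, the same WHERE clause can be mirrored in plain Python over in-memory strings. This is a sketch only: the sample rows are hypothetical, invented for illustration, and the function name is not part of GDELT or BigQuery.

```python
import re

# Theme substrings copied from the query above.
GROUP_THEMES = (
    "TAX_TERROR_GROUP_ISLAMIC_STATE",
    "TAX_TERROR_GROUP_ISIL",
    "TAX_TERROR_GROUP_ISIS",
    "TAX_TERROR_GROUP_DAASH",
)
ATTACK_THEMES = ("SUICIDE_ATTACK", "TAX_WEAPONS_SUICIDE_")

# LIKE '%TERROR%TERROR%' matches when "TERROR" occurs at least twice.
TERROR_TWICE = re.compile(r"TERROR.*TERROR", re.DOTALL)


def matches_query(v2themes: str) -> bool:
    """Mirror the WHERE clause: (any group theme) AND (any attack signal)."""
    has_group = any(theme in v2themes for theme in GROUP_THEMES)
    has_attack = (
        TERROR_TWICE.search(v2themes) is not None
        or any(theme in v2themes for theme in ATTACK_THEMES)
    )
    return has_group and has_attack


# Hypothetical V2Themes values, invented for illustration only.
sample_rows = [
    "TAX_TERROR_GROUP_ISIS,12;TAX_WEAPONS_SUICIDE_VEST,40",
    "TAX_TERROR_GROUP_ISIL,5;TERROR,30",
    "TAX_TERROR_GROUP_DAASH,7;ECON_OILPRICE,50",
    "SUICIDE_ATTACK,9;TERROR,3;TERROR,20",
]
kept = [row for row in sample_rows if matches_query(row)]
print(kept)
```

Because each group theme already contains the word TERROR once, the '%TERROR%TERROR%' pattern is satisfied as soon as any second terror-related theme appears in the same record.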
The GDELT Project pushes the boundaries of “big data,” weighing in at over a quarter-billion rows with 59 fields for each record,
spanning the geography of the entire planet, and covering a time horizon of more than 35 years. The GDELT Project is the largest
open-access database on human society in existence. Its archives contain nearly 400M latitude/longitude geographic coordinates
spanning over 12,900 days, making it one of the largest open-access spatio-temporal datasets as well.
GDELT + BigQuery = Query The Planet
The ANZ Heavy Traffic Index comprises
flows of vehicles weighing more than 3.5
tonnes (primarily trucks) on 11 selected
roads around NZ. It is contemporaneous
with GDP growth.
The ANZ Light Traffic Index is made up of
light or total traffic flows (primarily cars and
vans) on 10 selected roads around the
country. It gives a six month lead on GDP
growth in normal circumstances (but
cannot predict sudden adverse events such
as the Global Financial Crisis).
http://www.anz.co.nz/about-us/economic-markets-research/truckometer/
ANZ TRUCKOMETER
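One simple way to check a claim such as "a six month lead on GDP growth" is to correlate the indicator with the target at several lags and see which lag fits best. The sketch below uses SYNTHETIC quarterly numbers built so that the target trails the indicator by two quarters; it is not ANZ's method or data.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lead_correlation(indicator, target, lead):
    """Correlate indicator[t] with target[t + lead]."""
    if lead == 0:
        return pearson(indicator, target)
    return pearson(indicator[:-lead], target[lead:])


# SYNTHETIC quarterly growth rates, invented for illustration only.
traffic_growth = [0.5, 1.2, 0.8, -0.3, 0.1, 1.5, 2.0, 0.9, -0.6, 0.4, 1.1, 0.7]
# GDP growth is constructed as the traffic series delayed by two quarters.
gdp_growth = [0.6, 0.2] + [0.5 * g + 0.3 for g in traffic_growth[:-2]]

best_lead = max(
    range(0, 4),
    key=lambda k: lead_correlation(traffic_growth, gdp_growth, k),
)
print("Lead (quarters) with the highest correlation:", best_lead)
```

With real series the best lag would rarely be this clean, and as the slide notes, a lead indicator of this kind cannot anticipate sudden shocks.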
The “Massive” Skills Gap
US data only & includes job title of Marketing Manager
Source: Investing in America’s data science and analytics talent, PWC, April 2017
“There is a MASSIVE shortage of marketers that are skilled in the art of
data analysis…number of marketers with analytics skills is DECREASING
as the job levels increase toward CxO. This discrepancy between the
demand and supply is the most in all of the experiment, at over 10x for
every level…Google Analytics has 1.5x more demand than supply (only
7% of marketers have it)”
Source: Marketing Skills 2017: Are You Qualified to Be Hired? [Updated], Ryan
McCready, Sept 08, 2016, last viewed 18 November 2017
<https://venngage.com/blog/marketing-skills-2016/>
© 2017 FORRESTER. REPRODUCTION PROHIBITED.
Source: Forrester Research eCommerce Trends And Outlook For Asia Pacific Trends In Australia, China, India, Japan, South Korea From The Forrester Data: Online Retail
Forecast, 2016 To 2021 (Asia Pacific) June 28, 2017
Asia Pacific contains both the largest and the fastest-growing
eCommerce markets (China and India, respectively). Today,
total online retail revenues in just five markets in Asia Pacific
— China, Japan, South Korea, India, and Australia — surpass
the combined figure for online retail in the US and all of
Western Europe.
In these markets, total online retail revenues will jump from
$862 billion in 2016 to $1.4 trillion in 2021.
E-Commerce Outlook 2016-2021
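The forecast above implies an annual growth rate, which can be worked out from the two quoted totals. This is a small sketch that assumes smooth compound growth over the five years, which the Forrester figures themselves do not claim.

```python
# Implied compound annual growth rate (CAGR) for the forecast quoted above.
# Assumption: smooth compounding between the two end points.
start_value = 862e9    # online retail revenue in 2016 (USD)
end_value = 1.4e12     # forecast revenue in 2021 (USD)
years = 2021 - 2016

cagr = (end_value / start_value) ** (1 / years) - 1
print(f"Implied CAGR 2016-2021: {cagr:.1%}")
```

The result is roughly 10% a year across the five markets combined.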
Online tenure leads to more spending per customer
High engagement leads to more orders, more
categories purchased, and more spend
https://www.quillengage.com
Oil reserves shipment monitoring
Ras Tanura Najmah compound, Saudi Arabia
Source: http://www.skyboximaging.com/blog/monitoring-oil-reserves-from-space
https://nodexl.codeplex.com/
Sherman and Young (2016), When Financial Reporting Still Falls
Short, Harvard Business Review, July-August
Sood (2015), Truth, Lies and Brand Trust The Deceit Algorithm,
http://datafication.com.au/
New Analytical Tools Can Help
Deception Algorithm
(1) Self words e.g. “I” and “me” – decrease when someone distances themselves from content
(2) Exclusive words e.g. “but” and “or” decrease with fabricated content owing to complexity of
maintaining deception
(3) Negative emotion words e.g. “hate” increase in word usage owing to shame or guilty feeling
(4) Motion verbs e.g. “go” or “move” increase as exclusive words go down to keep the story on
track
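The four cues above are all word-category rates, so a first pass can be sketched as a simple counter. In this sketch only the quoted example words come from the slide; the remaining list entries and the function name are assumptions for illustration, and the output is a set of rates to compare against a baseline, not a verdict on truthfulness.

```python
import re
from collections import Counter

# Tiny illustrative word lists. Only the quoted examples ("I", "me", "but",
# "or", "hate", "go", "move") come from the slide; the rest are assumptions.
CATEGORIES = {
    "self": {"i", "me", "my", "mine", "myself"},
    "exclusive": {"but", "or", "except", "without"},
    "negative_emotion": {"hate", "angry", "awful", "worthless"},
    "motion": {"go", "move", "walk", "carry"},
}


def category_rates(text):
    """Return each category's share of all word tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, words in CATEGORIES.items():
            if token in words:
                counts[name] += 1
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {name: counts[name] / total for name in CATEGORIES}


sample = "I hate to go, but I will move on or stay."
rates = category_rates(sample)
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}")
```

In practice the rates for a suspect text would be compared with the same author's or genre's typical rates, since the cues describe increases and decreases rather than absolute thresholds.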
I. Natural Language Processing Leads to New Areas of Discovery
http://www.analyzewords.com
(Berger and Packard 2018)
Are Atypical Things More Popular?
Psychological Science
Every business would love to know the minds of its
competitors, and what they are likely to do next.
Strategy analysts have thus far used simple tools…But
new research at Wharton has shown how natural
language processing techniques could be used to
parse tomes of unstructured data such as text buried in
conference calls or annual reports to more accurately
anticipate competitor strategies. The research opens
new pathways to measure and test assumptions firms
make in their competitive strategies, and to “visualize
how firms are positioned with respect to each other,
and then map that on to performance consequences.”
(Menon and Choi 2018)
“What You Say Your Strategy Is and Why It Matters: Natural
Language Processing of Unstructured Texts,”
II. Natural Language Processing Leads to New Areas of Discovery
Music Strategy
Language on Twitter Tracks Rates of Coronary Heart
Disease, Psychological Science, January 2015
The findings show that expressions of negative emotions such as anger, stress, and fatigue in the tweets from people in a given
county were associated with higher heart disease risk in that county.
On the other hand, expressions of positive emotions like excitement and optimism were associated with lower risk.
The results suggest that using Twitter as a window into a community’s collective mental state may provide a
useful tool in epidemiology…So predictions from Twitter can actually be more accurate than using a set of
traditional variables.
2017 Hype Cycle for Data Science and Machine Learning,
29 July, http://www.gartner.com/document/3772081
Gartner (2017)
Strategic Predictions for 2017 and Beyond, research note
14 October, http://www.gartner.com/document/3471568
By 2020-22:
• 100 million consumers shop in augmented reality
• 30% of web browsing sessions without a screen
• Algorithms positively alter behavior of over 1B
• Blockchain-based business worth $10B
• IoT will save consumers/businesses $1T a year
• 40% of employees cut healthcare costs via fitness tracker
Smart Data Discovery Will Enable New Class of Citizen Data Scientist (Gartner 2015)
“With the addition of NLG [Natural Language Generation], smart data discovery
platforms automatically present a written or spoken context-based narrative of
findings in the data that, alongside the visualization, inform the user about what is
most important for them to act on in the data.”
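At its simplest, a written narrative of findings can be produced from a template. The sketch below is a deliberately minimal stand-in, not how any commercial NLG platform works; the function name and the figures are hypothetical.

```python
def describe_change(metric, previous, current, period):
    """Turn one before/after comparison into a single English sentence."""
    if previous == 0:
        return f"{metric} was {current:,.0f} in {period}; no prior value to compare."
    change = (current - previous) / previous
    if abs(change) < 0.01:
        trend = "was essentially flat"
    elif change > 0:
        trend = f"rose {change:.0%}"
    else:
        trend = f"fell {abs(change):.0%}"
    return f"{metric} {trend} in {period}, from {previous:,.0f} to {current:,.0f}."


# Hypothetical figures, invented for illustration only.
sentence = describe_change("Weekly site visits", 12_000, 15_000, "the latest week")
print(sentence)
```

Production systems add ranking of which findings matter most and vary the wording, but the core step is the same: map a detected pattern to a sentence the reader can act on.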
Systems of Insight (Forrester 2015)
• Automated pattern extraction
• Outlier detection
• Correlation
• Time series
• Analytics integration with process, app or IoT
outlier-detection “allow detecting a significant fraction
of fraudulent cases…different in nature from historical
fraud…resulting in a novel fraud pattern”
Baesens, B., Van Vlasselaer, V., and Verbeke, W., 2015, Fraud Analytics Using Descriptive, Predictive,
and Social Network Techniques: A Guide to Data Science for Fraud Detection, Wiley
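Outlier detection needs no labelled fraud history, which is why it can surface patterns unlike past cases. The sketch below uses one common robust rule, a modified z-score based on the median and the median absolute deviation (MAD); this is a swapped-in textbook technique, not the method from the book cited above, and the amounts are hypothetical.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds threshold.

    The 0.6745 constant and the 3.5 cut-off are conventional choices for the
    modified z-score; both are assumptions that should be tuned per data set.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical transaction amounts, invented for illustration only.
amounts = [42.0, 39.5, 41.2, 40.8, 38.9, 43.1, 40.0, 975.0, 41.7]
flagged = mad_outliers(amounts)
print(flagged)
```

Using the median rather than the mean keeps a single extreme value from masking itself by inflating the spread estimate.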
Forrester Research, 2016
Traditional Analytics: Slow & Expensive
Led by Data Analyst or Scientist
Data Aggregation; Reports & Analysis; Visualisation & Interpretation; Write Data/Business “Story”; Insights
80% of time sifting through data
System of Insight (SoI): Fast & Cost Effective
Led by SME owner or Corporate, Machine Learning and Natural Language Generation
Data Aggregation or Data Set; Detect & Extract Patterns and Relationships; Generate Insights & Story; Operationalise (Process, Application, IoT)
80% of time in decision making with client
Fusion of data science, business knowledge & creativity for maximum ROI
Systems of Insight
• Helps move away from “crisis levels” in talent
• Traditional 5 step analytics process reduced to 2 step from data to action
• Reimagine business processes through “machine engineering”
• Minimise messy data issues and data preparation time
Better customer experiences . . .
. . . and half the inventory-carrying costs
of other online fashion retailers.
Forrester, 2016
Data Science Resources
University of Helsinki :
Online AI Course
https://www.elementsofai.com/
http://brookfieldinstitute.ca/wp-content/uploads/2016/06/TalentedMrRobot.pdf
https://industry.gov.au/Innovation-and-Science-Australia/Documents/Australia-2030-Prosperity-through-Innovation-Full-Report.pdf
Deep Learning Libraries, Platforms, APIs and Hardware
Next Step
Start using Data Science Resources
Systems of Insight and innovative data sources
Natural Language Generation
The future is impossible to predict.
However one thing is certain :
The company that can excite its customers' dreams
Is out ahead in the race to business success
Selling Dreams, Gian Luigi Longinotti


Editor's Notes

  • #2 We have entered the age of democratisation of data science and big data. Democratisation of data science means we have moved from IT- and business-led analytics to an almost invisible use of machine learning that helps provide insights from all types of data.
  • #7 Categories of data: transactions; external data; customer data (including web/e-commerce site Google Analytics); social media and online search data.
  • #12 BFC than ANZ Google trends/correlate