Collabor8now Ltd
1st December 2016
Steve Dale @stephendale
Unless otherwise noted, this work is licensed under a Creative Commons
Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Trends in Big Data, Data
Analytics & AI
What is “Big Data”?
Big Data is data whose scale, diversity and
complexity require new architecture, techniques,
algorithms, and analytics to manage it and extract
value and hidden knowledge from it…
Big Data – Big Challenges
Big Data can be a combination of different data formats:
• Structured, e.g. databases
• Semi-structured, e.g. email, e-forms, HTML, XML
• Unstructured, e.g. document collections (text), social interactions (text, images, video, sound)
• Machine generated, e.g. weblogs, sensor data, etc.
There is massive growth in unstructured data… but just wait for the IoT!
Big Data Challenges
Image Source: IBM
It’s not just how fast data is produced or changed,
but the speed at which it must be received,
understood and processed.
The credibility gap
[Chart: data volume against time, comparing the data available to an organisation with the data an organisation can process]
Collabor8now Ltd
Data-driven decisions
Source: PwC Global Data & Analytics Survey 2016
[Chart: Decision making is best described as highly data driven, somewhat data driven or rarely data driven. Values shown: 8%, 53%, 39%.]
[Chart: Use of analytics is mostly:
• Descriptive: What happened? (27%)
• Diagnostic: Why did it happen? (28%)
• Predictive: What could happen?
• Prescriptive: What should happen now?
Other values shown: 29%, 13%.]
What do decision-makers need?
Strategic decisions are still often based
on instinct. But more businesses are
beginning to look at sophisticated
machine learning algorithms to support
decision making.
A mix of mind and machine
Our next decision will likely be based on:
[Chart: Machine Algorithms versus Human Judgement. Values shown: 59%, 41%.]
Source: PwC Global Data & Analytics Survey 2016
AI Lexicon
Machine Learning
Machine learning techniques are designed to seek out
opportunities to optimise decisions based on the predictive
value of large-scale data sets.
Image Source: Tata Consultancy Services
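As a minimal sketch of what "learning from data" can look like in practice, the Python below fits a straight line to a handful of points by repeatedly measuring its prediction error and nudging its parameters to reduce it. The data, the function names (`fit_line`, `predict`), the learning rate and the number of passes are all illustrative assumptions; none of them come from the slides or from any particular product.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # start with no knowledge of the relationship
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every example under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Correct the parameters a little, then iterate again.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict(w, b, x):
    """Use the learned parameters to predict an unseen value."""
    return w * x + b


# Hypothetical training data that happens to follow y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3), round(predict(w, b, 5.0), 3))
```

Nothing in the code states the rule "multiply by two and add one"; the parameters are recovered from the examples alone, which is the sense in which the program is not explicitly programmed.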
Analytical Techniques
• Cluster analysis: The task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar, in some sense or another, to each other than to those in other groups (clusters).
• Comparative analysis: A step-by-step procedure of comparisons and calculations to detect patterns within very large data sets.
• Decision tree analysis: A decision support tool that uses a tree-like graph of decisions and their possible consequences, including chance event outcomes, resource costs and utility.
• Factor analysis: Used to analyse large numbers of dependent variables to detect certain aspects of the independent variables (factors) affecting those dependent variables.
• Machine learning: A type of artificial intelligence which provides computers with the ability to learn without being explicitly programmed.
• Multivariate analysis: The observation and analysis of more than one statistical outcome variable at a time.
• Regression analysis: A statistical process for estimating relationships between a dependent variable and one or more independent variables.
• Segmentation analysis: Divides a broad category into subsets that have, or are perceived to have, common features, needs, interests or priorities.
• Sentiment analysis: The process of identifying and categorising opinions expressed in a piece of text to determine whether the writer’s attitude towards a topic or issue is positive, negative or neutral.
• Simulation: The imitation of the operation of a real-world process or system over time. It requires a model that represents the key characteristics or behaviours of the selected physical or abstract system or process.
• Time series analysis: Comprises methods for analysing time series data to extract meaningful statistics and other characteristics of the data.
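To make one of these definitions concrete, here is a minimal sketch of cluster analysis using k-means (Lloyd's algorithm), which is only one of many clustering methods and is not named in the slides. The sample points, the starting centroids and the function name `k_means` are hypothetical, chosen purely for illustration.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def k_means(points, centroids, iterations=10):
    """Group points around the nearest centroid, then re-centre and repeat."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(axis) / len(cluster) for axis in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return centroids, clusters


# Hypothetical two-dimensional data with two visibly separate groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])

for centre, members in zip(centroids, clusters):
    print(centre, members)
```

The "similarity" in the definition is Euclidean distance here; a different distance measure would give a different notion of which objects belong together.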
The art and science of decision making
1. Unlock existing insights. Data do not have to be “big” to be useful. Analysing databases previously mothballed or kept in silos can lead to fresh insights.
2. Beware of inherent bias. Important decisions have already taken place before data analysis. Understand the provenance and quality of the data.
3. Invest in talent. Can you give existing employees a foundation in data analysis before recruiting new data scientists?
4. Accountability. Be clear about who has decision-making rights. Opening up access to data and analysis can allow decisions to be challenged.
Life in a Big Data World
Trends & Predictions
• Power to the business users (Information Week 2016)
• By 2018, 20% of business content will be authored by machines (Gartner 2016)
• Embedding intelligence (Gartner 2016)
• Shortage of talent (A.T. Kearney 2016)
• Machine Learning gaining momentum (Ovum 2016)
• Data-as-a-Service business models (Forrester 2016)
• Real-time insights (Forrester 2016)
• The start of algorithm markets (Forrester 2016)
• By 2018, 3 million workers worldwide will be supervised by a “roboboss” (Gartner 2016)
IBM Watson Editions
IBM Watson
Try it for yourself – free!
Go to http://www.ibm.com/analytics/watson-analytics/ and sign up with a valid email address.
Once your account has been validated, sign in and you'll see the main Watson interface:
https://watson.analytics.ibmcloud.com/
The help videos are worth a look. I recommend:
- Getting Started
- Load your data
- Create an AssembledView
- Create an Exploration
- Create a Prediction
Also – IBM offer regular webinars for new users:
http://www.ibm.com/smarterplanet/us/en/ibmwatson/building-with-watson-webinar.html
Steve Dale
@stephendale
steve.dale@collabor8now.com
“Errors using inadequate data
are much less than those using
no data at all” Charles Babbage


Editor's Notes

  • #3 Big Data is said to be like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks that everyone else is doing it, so everyone claims they are doing it! Eric Schmidt (Google) was quoted in 2010 as saying “There were 5 exabytes of information created between the dawn of civilisation through 2003, but that much information is now created in 2 days”.
  • #4 Big Data aggregation can help break down business silos – the analytics will be mainly looking for patterns. Data patterns can tell a story. The Internet of Things (IoT) requires all of its interconnected devices (cars, TVs, fridges, RFIDs, etc. – several billion sensors in all) to communicate, swap and report status data. All of this data will need to be stored, analysed and interpreted.
  • #5 If all of this seems a bit abstract, consider one example: the Large Synoptic Survey Telescope (LSST). The LSST is a new kind of telescope. Currently under construction in Chile, it is designed to conduct a ten-year survey of the dynamic universe. It will generate 30 terabytes of image data per night, every night, for 10 years. By the end of the 10-year sky survey, a final image archive of 100-200 petabytes will be achieved, along with a 20- to 40-petabyte queryable database of astronomical source information. There will be one 3-gigapixel (6-gigabyte) image obtained with LSST’s camera every 20 seconds every night for those 10 years. Within 60 seconds, the image (actually a pair of images, to remove instrumental artifacts) needs to be processed, and all objects in the image pair that have changed in any way (via movement or flux variation) must be reported to the worldwide astronomical community. It is anticipated that each image pair (every 40 seconds) will generate several thousand alerts every single night for 10 years. That is fast, high-velocity information! The ability to mine, characterize, classify, and respond to this rapid avalanche of alerts will be an enormous challenge to the researchers seeking to make astronomical discoveries from this data fire hose. Furthermore, each of the 50 billion objects in the LSST survey will be observed roughly 1000 times over the 10-year project duration, and each observation (40-second image pair) will yield about 200 unique scientific features measured for each object. Consequently, the final completed database of 50 billion astronomical objects will contain approximately 200,000 dimensions of information per object!
  • #6 Most organisations still don’t know what data they actually have, and what they’re creating and storing on a daily basis. Some are beginning to realise that these massive archives of data might hold some useful information that could potentially deliver some business value. But it takes time to access, analyse, interpret and apply actions resulting from this analysis, and in the meantime, the world has moved on. According to Veritas Technologies’ ‘Global Databerg Report’, 52% of all information is considered to be ‘Dark’, meaning its value is unknown. 33% of information is considered redundant, obsolete or trivial (ROT).
  • #7 The report emphasises the need for companies to establish a data-driven innovation culture – but there is still some way to go. Those using data and analytics are focused on the past, looking back with descriptive (27%) or diagnostic (28%) methods. The more sophisticated organisations use a forward-looking predictive and prescriptive approach to data.
  • #8 Strategic decisions are still often based on instinct. Sophisticated machine learning should complement experience and intuition. Today’s business environment is not just about automating business processes – it’s about automating thought processes. However, we need to be able to understand and trust the algorithms.
  • #9 Machine learning’s ability to scale across the broad spectrum of contract management, customer service, finance, legal, sales, quote-to-cash, quality, pricing and production challenges that enterprises face is attributable to its ability to continually learn and improve. Machine learning algorithms are iterative in nature, constantly learning and seeking to optimise outcomes. Every time a miscalculation is made, machine learning algorithms correct the error and begin another iteration of the data analysis. These calculations happen in milliseconds, which makes machine learning exceptionally efficient at optimising decisions and predicting outcomes. A recent McKinsey study found that a dozen European banks are replacing statistical modelling techniques with machine learning. The banks are also increasing customer satisfaction scores and customer lifetime value.
  • #10 Target assigns every customer a Guest ID number, tied to their credit card, name, or email address that becomes a bucket that stores a history of everything they’ve bought and any demographic information Target has collected from them or bought from other sources. They analysed historical buying data for all the ladies who had signed up for Target baby registries in the past. Some useful patterns emerged. It was noticed that women on the baby registry were buying larger quantities of unscented lotion around the beginning of their second trimester and that sometime in the first 20 weeks, pregnant women loaded up on supplements like calcium, magnesium and zinc. Overall, they were able to identify about 25 products that, when analyzed together, allowed them to assign each shopper a “pregnancy prediction” score. Target then started sending coupons for baby items to customers according to their pregnancy scores. An angry man went into a Target outside of Minneapolis, demanding to talk to a manager: “My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?” The manager apologized and then called a few days later to apologize again. On the phone, though, the father was somewhat abashed. “I had a talk with my daughter,” he said. “It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.”
  • #11 Power to business users: Driven by a shortage of big data talent and the ongoing gap between needing business information and unlocking it from the analysts and data scientists, look for more tools and features that expose information directly to the people who use it.
    Machine generated content: Content that is based on data and analytical information will be turned into natural language writing by technologies that can proactively assemble and deliver information through automated composition engines. Content currently written by people, such as shareholder reports, legal documents, market reports, press releases and white papers, are prime candidates for these tools.
    Embedding intelligence: On a mass scale, Gartner identifies "autonomous agents and things" as one of the up-and-coming trends, which is already marking the arrival of robots, autonomous vehicles, virtual personal assistants, and smart advisers.
    Shortage of talent: Business consultancy A.T. Kearney reported that 72% of market-leading global companies said they had a hard time hiring data science talent.
    Machine learning: Gartner said that an advanced form of machine learning called deep neural nets will create systems that can autonomously learn to perceive the world on their own.
    Data as a service: IBM's acquisition of the Weather Company -- with all its data, data streams, and predictive analytics -- highlighted something that's coming.
    Real-time insights: The window for turning data into action is narrowing. The next 12 months will be about distributed, open source streaming alternatives built on open source projects like Kafka and Spark.
    Roboboss: Some performance measurements can be consumed more swiftly by smart machine managers, aka “robo-bosses”, who will perform supervisory duties and make decisions about staffing or management incentives.
    Algorithm markets: Firms will recognize that many algorithms can be acquired rather than developed. “Just add data”, Forrester's Brian Hopkins wrote, giving several examples of services available today, including Algorithmia, DataXu, and Kaggle.
  • #12 IBM Watson run regular (free) webinars for users that sign up to use Watson. http://www.ibm.com/smarterplanet/us/en/ibmwatson/building-with-watson-webinar.html The professional version starts at £58 per month.  The Plus version starts at £22 per month.
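
The LSST figures quoted in note #5 can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes 365 observing nights a year and decimal units (1 petabyte = 1,000 terabytes), neither of which is stated in the note.

```python
def archive_petabytes(tb_per_night, years, nights_per_year=365):
    """Total image data in petabytes, assuming 1 PB = 1,000 TB."""
    return tb_per_night * nights_per_year * years / 1000


def dimensions_per_object(observations, features_per_observation):
    """Every observation of an object adds its own set of measured features."""
    return observations * features_per_observation


# Figures taken from note #5: 30 TB per night for 10 years,
# about 1000 observations per object, about 200 features per observation.
total_pb = archive_petabytes(tb_per_night=30, years=10)
dims = dimensions_per_object(observations=1000, features_per_observation=200)

print(total_pb, "PB of image data")
print(dims, "dimensions per object")
```

Under those assumptions the nightly volume adds up to a total inside the 100-200 petabyte range given for the final archive, and the per-object dimension count matches the 200,000 quoted.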