A top-down look at current industry and technology trends for Big Data, Data Analytics and Machine Learning (cognitive technologies, AI etc.). New slides added for Ark Group presentation on 1st December 2016.
1. Collabor8now Ltd
1st December 2016
Steve Dale @stephendale
Unless otherwise noted, this work is licensed under a Creative Commons
Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Trends in Big Data, Data
Analytics & AI
2. Collabor8now Ltd
What is “Big Data”?
Big Data is data whose scale, diversity and
complexity require new architecture, techniques,
algorithms, and analytics to manage it and extract
value and hidden knowledge from it…
3. Collabor8now Ltd
Big Data – Big Challenges
Big Data can be a combination of different data formats:
• Structured e.g. databases
• Semi-structured e.g. email, e-forms, HTML, XML
• Unstructured e.g. document collections (text), social interactions (text, images, video, sound)
• Machine generated e.g. weblogs, sensor data, etc.
There is massive growth in unstructured data…but
just wait for the IoT!
4. Collabor8now Ltd
Big Data Challenges
Image Source: IBM
It’s not just how fast data is produced or changed,
but the speed at which it must be received,
understood and processed.
5. Collabor8now Ltd
The credibility gap
[Chart: Data Volume plotted against Time, with two curves: "Data available to an organisation" and "Data an organisation can process"]
6. Collabor8now Ltd
Data-driven decisions
Source: PwC Global Data & Analytics Survey 2016
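[Chart 1] Decision making is best described as: highly data driven, somewhat data driven, or rarely data driven. Values shown: 8%, 53%, 39% (the pairing of values to labels is not recoverable from the slide text).
[Chart 2] Use of analytics is mostly:
Descriptive: What happened? (27%)
Diagnostic: Why did it happen? (28%)
Predictive: What could happen?
Prescriptive: What should happen now?
Remaining values shown for predictive and prescriptive: 29%, 13% (pairing not recoverable from the slide text).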
7. Collabor8now Ltd
What do decision-makers need?
Strategic decisions are still often based
on instinct. But more businesses are
beginning to look at sophisticated
machine learning algorithms to support
decision making.
Our next decision will likely be based on:
Machine Algorithms Human Judgement
59%
41%
Source: PwC Global Data & Analytics Survey 2016
A mix of mind and
machine
9. Collabor8now Ltd
Machine Learning
Machine learning techniques are designed to seek out
opportunities to optimise decisions based on the predictive
value of large-scale data sets.
Image Source: Tata Consultancy Services
10. Collabor8now Ltd
Analytical Techniques
Cluster analysis: The task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar, in some sense or another, to each other than to those in other groups (clusters).
Comparative analysis: A step-by-step procedure of comparisons and calculations to detect patterns within very large data sets.
Decision tree analysis: A decision support tool that uses a tree-like graph of decisions and their possible consequences, including chance event outcomes, resource costs and utility.
Factor analysis: Used to analyse large numbers of dependent variables to detect certain aspects of the independent variables (factors) affecting those dependent variables.
Machine learning: A type of artificial intelligence which provides computers with the ability to learn without being explicitly programmed.
Multivariate analysis: The observation and analysis of more than one statistical outcome variable at a time.
Regression analysis: A statistical process for estimating relationships between a dependent variable and one or more independent variables.
Segmentation analysis: Divides a broad category into subsets that have, or are perceived to have, common features, needs, interests or priorities.
Sentiment analysis: The process of identifying and categorising opinions expressed in a piece of text to determine whether the writer’s attitude towards a topic or issue is positive, negative or neutral.
Simulation: The imitation of the operation of a real world process or system over time. It requires a model that represents the key characteristics or behaviours of the selected physical or abstract system or process.
Time series analysis: Comprises methods for analysing time series data to extract meaningful statistics and other characteristics of the data.
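To make one of these techniques concrete, here is a minimal sketch of regression analysis: fitting a straight line between one independent variable and one dependent variable by ordinary least squares. The data (advertising spend against sales) and the function name fit_line are invented for illustration and do not come from the slides.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance-style and variance-style sums around the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Invented illustration data: advertising spend (x) and sales (y).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(spend, sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * spend")
```

The slope estimates how much the dependent variable changes for each unit of the independent variable, which is the "relationship" the definition above refers to.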
11. Collabor8now Ltd
The art and science of decision making
Unlock existing
insights. Data do not
have to be “big” to be
useful. Analysing
databases previously
mothballed or kept in
silos can lead to fresh
insights.
1
Beware of inherent
bias. Important
decisions have already
taken place before data
analysis. Understand the
provenance and quality of
the data.
2
Invest in talent.
Can you give existing
employees a foundation
in data analysis before
recruiting new data
scientists?
3
Accountability. Be
clear about who has
decision making rights.
Opening up access to data
and analysis can allow
decisions to be challenged.
4
13. Collabor8now Ltd
Trends & Predictions
• Power to the business users (Information Week 2016)
• By 2018, 20% of business content will be authored by machines (Gartner 2016)
• Embedding intelligence (Gartner 2016)
• Shortage of talent (A.T. Kearney 2016)
• Machine Learning gaining momentum (Ovum 2016)
• Data-as-a-Service Business Models (Forrester 2016)
• Real-time insights (Forrester 2016)
• The start of algorithm markets (Forrester 2016)
• By 2018, 3 million workers worldwide will be supervised by a “robo-boss” (Gartner 2016)
15. Collabor8now Ltd
IBM Watson
Try it for yourself – free!
Go to: http://www.ibm.com/analytics/watson-analytics/ and sign in with a valid email address.
Once your account has been validated, sign in and you'll see the main Watson interface:
https://watson.analytics.ibmcloud.com/
The help videos are worth a look. I recommend:
- Getting Started
- Load your data
- Create an AssembledView
- Create an Exploration
- Create a Prediction
Also – IBM offer regular webinars for new users:
http://www.ibm.com/smarterplanet/us/en/ibmwatson/building-with-watson-webinar.html
16. Collabor8now Ltd
Steve Dale
@stephendale
steve.dale@collabor8now.com
“Errors using inadequate data
are much less than those using
no data at all” Charles Babbage
Editor's Notes
Big Data is said to be like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks that everyone else is doing it, so everyone claims they are doing it!
Eric Schmidt (Google) was quoted in 2010 as saying “There were 5 exabytes of information created between the dawn of civilisation through 2003, but that much information is now created in 2 days”.
Big Data aggregation can help break down business silos – the analytics will be mainly looking for patterns. Data patterns can tell a story.
The Internet of Things (IoT) requires all of its interconnected devices (cars, TVs, fridges, RFIDs, etc. – several billion sensors in all) to communicate, swap and report status data. All of this data will need to be stored, analysed and interpreted.
If all of this seems a bit abstract, consider one example: the Large Synoptic Survey Telescope (LSST). The LSST is a new kind of telescope. Currently under construction in Chile, the LSST is designed to conduct a ten-year survey of the dynamic universe. It will generate 30 terabytes of image data per night, every night, for 10 years. By the end of the 10-year sky survey, a final image archive of 100-200 petabytes will be achieved, along with a 20- to 40-petabyte queryable database of astronomical source information. There will be one 3-gigapixel (6-gigabyte) image obtained with LSST’s camera every 20 seconds every night for those 10 years. Within 60 seconds, the image (actually a pair of images, to remove instrumental artifacts) needs to be processed, and all objects in the image pair that have changed in any way (via movement or flux variation) must be reported to the worldwide astronomical community. It is anticipated that each image pair (every 40 seconds) will generate several thousand alerts every single night for 10 years. That is fast, high-velocity information! The ability to mine, characterize, classify, and respond to this rapid avalanche of alerts will be an enormous challenge to the researchers seeking to make astronomical discoveries from this data fire hose. Furthermore, each of the 50 billion objects in the LSST survey will be observed roughly 1000 times over the 10-year project duration, and each observation (40-second image pair) will yield about 200 unique scientific features measured for each object. Consequently, the final completed database of 50 billion astronomical objects will contain approximately 200,000 dimensions of information per object!
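The LSST figures can be sanity-checked with a little arithmetic. The sketch below multiplies out the nightly data rate and the per-object feature count quoted in the note. It assumes observing on every night of the year and decimal units (1 petabyte = 1,000 terabytes); neither assumption is stated in the note.

```python
# Back-of-envelope check of the LSST figures quoted in the note.
# Assumptions (not in the note): observing every night of the year,
# and decimal units (1 petabyte = 1,000 terabytes).
TB_PER_NIGHT = 30
NIGHTS_PER_YEAR = 365
SURVEY_YEARS = 10

raw_image_tb = TB_PER_NIGHT * NIGHTS_PER_YEAR * SURVEY_YEARS
raw_image_pb = raw_image_tb / 1000

# Each object: roughly 1000 observations, about 200 features per observation.
OBSERVATIONS_PER_OBJECT = 1000
FEATURES_PER_OBSERVATION = 200
dimensions_per_object = OBSERVATIONS_PER_OBJECT * FEATURES_PER_OBSERVATION

print(f"Image data over the survey: about {raw_image_pb:.1f} PB")
print(f"Dimensions per object: {dimensions_per_object:,}")
```

Under these assumptions the nightly rate adds up to roughly 110 petabytes, at the lower end of the 100-200 petabyte archive quoted, and 1000 observations of 200 features each gives the 200,000 dimensions per object.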
Most organisations still don’t know what data they actually have, and what they’re creating and storing on a daily basis. Some are beginning to realise that these massive archives of data might hold some useful information that can potentially deliver some business value. But it takes time to access, analyse, interpret and apply actions resulting from this analysis, and in the meantime, the world has moved on.
According to Veritas Technologies’ ‘Global Databerg Report’, 52% of all information is considered to be ‘dark’ data, whose value is unknown. 33% of information is considered redundant, obsolete or trivial (ROT).
The PwC report emphasises the need for companies to establish a data-driven innovation culture – but there is still some way to go. Most of those using data and analytics are focused on the past, looking back with descriptive (27%) or diagnostic (28%) methods. The more sophisticated organisations use a forward-looking predictive and prescriptive approach to data.
Strategic decisions are still often based on instinct. Sophisticated machine learning should complement experience and intuition. Today’s business environment is not just about automating business processes – it’s about automating thought processes. However, we need to be able to understand and trust the algorithms.
Machine learning’s ability to scale across the broad spectrum of contract management, customer service, finance, legal, sales, quote-to-cash, quality, pricing and production challenges enterprises face is attributable to its ability to continually learn and improve. Machine learning algorithms are iterative in nature, constantly learning and seeking to optimise outcomes. Every time a miscalculation is made, machine learning algorithms correct the error and begin another iteration of the data analysis. These calculations happen in milliseconds, which makes machine learning exceptionally efficient at optimising decisions and predicting outcomes.
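The "correct the error and iterate" idea can be shown in a few lines. This is a minimal sketch of one common approach, gradient descent on a one-parameter model; it is an illustration only, not a claim about how any particular product works. The data, learning rate and iteration count are invented.

```python
def train(samples, learning_rate=0.01, iterations=200):
    """Fit y = w * x by repeatedly nudging w to shrink the prediction error."""
    w = 0.0  # start with no knowledge of the relationship
    for _ in range(iterations):
        for x, y in samples:
            error = w * x - y                # how wrong the current prediction is
            w -= learning_rate * error * x   # correct w in proportion to the error
    return w


# Invented data that follows y = 3x exactly.
samples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
w = train(samples)
print(f"learned weight: {w:.3f}")
```

Each pass makes a prediction, measures the miscalculation and adjusts the model slightly, so the weight settles on the value that fits the data.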
A recent McKinsey study found that a dozen European banks are replacing statistical modelling techniques with machine learning. The banks are also increasing customer satisfaction scores and customer lifetime value.
Target assigns every customer a Guest ID number, tied to their credit card, name, or email address, that becomes a bucket that stores a history of everything they’ve bought and any demographic information Target has collected from them or bought from other sources. They analysed historical buying data for all the women who had signed up for Target baby registries in the past. Some useful patterns emerged. It was noticed that women on the baby registry were buying larger quantities of unscented lotion around the beginning of their second trimester and that sometime in the first 20 weeks, pregnant women loaded up on supplements like calcium, magnesium and zinc. Overall, they were able to identify about 25 products that, when analysed together, allowed them to assign each shopper a “pregnancy prediction” score. Target then started sending coupons for baby items to customers according to their pregnancy scores. An angry man went into a Target outside of Minneapolis, demanding to talk to a manager: “My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?” The manager apologized and then called a few days later to apologize again. On the phone, though, the father was somewhat abashed. “I had a talk with my daughter,” he said. “It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.”
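A score of this kind can be sketched as a weighted sum of signal products squashed into a 0-1 range. Everything specific below is hypothetical: the note names four of the roughly 25 products but gives no weights and no formula, so the weights, the baseline and the logistic function are invented purely to show the shape of the idea.

```python
import math

# Hypothetical weights: the note gives neither the weights nor the
# scoring formula, so every number here is invented for illustration.
PRODUCT_WEIGHTS = {
    "unscented lotion": 1.2,
    "calcium supplement": 0.8,
    "magnesium supplement": 0.7,
    "zinc supplement": 0.7,
}
BIAS = -2.5  # invented baseline, so a basket with no signal products scores low


def prediction_score(basket):
    """Return a 0-1 score from the signal products present in a basket."""
    total = BIAS + sum(PRODUCT_WEIGHTS.get(item, 0.0) for item in set(basket))
    return 1.0 / (1.0 + math.exp(-total))  # logistic function


low = prediction_score(["bread", "milk"])
high = prediction_score(["unscented lotion", "calcium supplement",
                         "magnesium supplement", "zinc supplement"])
print(f"no signal products: {low:.2f}, all four signal products: {high:.2f}")
```

No single product decides the score; it is the combination that moves it, which is why the products had to be analysed together.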
Power to business users: Driven by a shortage of big data talent and the ongoing gap between needing business information and unlocking it from the analysts and data scientists, look for more tools and features that expose information directly to the people who use it.
Machine generated content: Content that is based on data and analytical information will be turned into natural language writing by technologies that can proactively assemble and deliver information through automated composition engines. Content currently written by people, such as shareholder reports, legal documents, market reports, press releases and white papers are prime candidates for these tools.
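At its simplest, turning data into prose is a matter of filling sentence templates from the numbers. The sketch below is a toy version of that idea; commercial composition engines are far more sophisticated. The function name, company name and figures are invented.

```python
def describe_quarter(company, quarter, revenue_m, prior_revenue_m):
    """Turn a few numbers into one sentence of report text."""
    change = (revenue_m - prior_revenue_m) / prior_revenue_m * 100
    # Choose the wording from the direction of the change.
    if change > 0:
        trend = f"rose {change:.1f}%"
    elif change < 0:
        trend = f"fell {abs(change):.1f}%"
    else:
        trend = "was unchanged"
    return (f"{company} revenue {trend} in {quarter} "
            f"({revenue_m:.1f} million, against {prior_revenue_m:.1f} million previously).")


sentence = describe_quarter("Example Co", "Q3", 12.6, 12.0)
print(sentence)
```

The same function run over every row of a results table would produce a draft market report with no human author, which is the kind of content the prediction has in mind.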
Embedding intelligence: On a mass scale, Gartner identifies "autonomous agents and things" as one of the up-and-coming trends, which is already marking the arrival of robots, autonomous vehicles, virtual personal assistants, and smart advisers.
Shortage of talent: Business consultancy A.T. Kearney reported that 72% of market-leading global companies had a hard time hiring data science talent.
Machine learning: Gartner said that an advanced form of machine learning called deep neural nets will create systems that can autonomously learn to perceive the world on their own.
Data as a service: IBM's acquisition of the Weather Company -- with all its data, data streams, and predictive analytics -- highlighted an emerging business model: selling data and analytics as a service.
Real-time insights: The window for turning data into action is narrowing. The next 12 months will be about distributed, open source streaming alternatives built on projects like Kafka and Spark.
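The core of a real-time insight is computing over a moving window of recent events rather than waiting for a batch. The sketch below shows that idea in plain Python as a stand-in; it does not use the Kafka or Spark APIs, and the class name and sensor readings are invented.

```python
from collections import deque


class SlidingWindowAverage:
    """Running average over the most recent `size` events."""

    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def update(self, value):
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]  # oldest value is about to be evicted
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


# Invented stream of sensor readings with one spike.
readings = [10, 12, 11, 50, 13]
monitor = SlidingWindowAverage(size=3)
averages = [monitor.update(r) for r in readings]
print(averages)
```

Each new reading updates the answer immediately, so a spike shows up in the average as soon as it arrives instead of in tomorrow's report.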
Roboboss: Some performance measurements can be consumed more swiftly by smart machine managers aka “robo-bosses,” who will perform supervisory duties and make decisions about staffing or management incentives.
Algorithm markets: Firms will recognize that many algorithms can be acquired rather than developed. “Just add data”, Forrester's Brian Hopkins wrote, giving several examples of services available today, including Algorithmia, DataXu, and Kaggle.
IBM Watson runs regular (free) webinars for users who sign up to use Watson. http://www.ibm.com/smarterplanet/us/en/ibmwatson/building-with-watson-webinar.html
The professional version starts at £58 per month. The Plus version starts at £22 per month.