Big Data Trends



A top-down look at current industry and technology trends for Big Data, Data Analytics and Machine Learning (cognitive technologies, AI etc.). New slides added for Ark Group presentation on 1st December 2016.

Published in: Data & Analytics

  • Big Data is said to be like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks that everyone else is doing it, so everyone claims they are doing it!

    Eric Schmidt (Google) was quoted in 2010 as saying “There were 5 exabytes of information created between the dawn of civilisation through 2003, but that much information is now created in 2 days”.
  • Big Data aggregation can help break down business silos – the analytics will mainly be looking for patterns. Data patterns can tell a story.

    The Internet of Things (IoT) requires all of its interconnected devices (cars, TVs, fridges, RFIDs, etc. – several billion sensors in all) to communicate, swap and report status data. All of this data will need to be stored, analysed and interpreted.
  • If all of this seems a bit abstract, consider one example: the Large Synoptic Survey Telescope (LSST). The LSST is a new kind of telescope. Currently under construction in Chile, it is designed to conduct a ten-year survey of the dynamic universe. It will generate 30 terabytes of image data per night, every night, for 10 years. By the end of the 10-year sky survey, the final image archive will hold 100-200 petabytes, along with a 20- to 40-petabyte queryable database of astronomical source information. LSST’s camera will capture one 3-gigapixel (6-gigabyte) image every 20 seconds, every night, for those 10 years. Within 60 seconds, the image (actually a pair of images, to remove instrumental artifacts) needs to be processed, and all objects in the image pair that have changed in any way (via movement or flux variation) must be reported to the worldwide astronomical community. It is anticipated that each image pair (every 40 seconds) will generate several thousand alerts every single night for 10 years. That is high-velocity information! The ability to mine, characterize, classify, and respond to this rapid avalanche of alerts will be an enormous challenge to the researchers seeking to make astronomical discoveries from this data fire hose. Furthermore, each of the 50 billion objects in the LSST survey will be observed roughly 1000 times over the 10-year project duration, and each observation (40-second image pair) will yield about 200 unique scientific features measured for each object. Consequently, the final completed database of 50 billion astronomical objects will contain approximately 200,000 dimensions of information per object!
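The LSST figures in the note above can be sanity-checked with some simple arithmetic. The sketch below is only a back-of-envelope check: it assumes observing on all 365 nights of every year and decimal units (1 PB = 1000 TB), neither of which is stated in the slide notes.

```python
# Back-of-envelope check of the LSST figures quoted in the slide notes.
TB_PER_NIGHT = 30        # image data per night (from the notes)
SURVEY_YEARS = 10        # survey duration (from the notes)
NIGHTS_PER_YEAR = 365    # assumption: observing every night of the year

total_tb = TB_PER_NIGHT * NIGHTS_PER_YEAR * SURVEY_YEARS
total_pb = total_tb / 1000   # assumption: decimal units, 1 PB = 1000 TB

OBSERVATIONS_PER_OBJECT = 1000   # roughly, from the notes
FEATURES_PER_OBSERVATION = 200   # roughly, from the notes
dimensions_per_object = OBSERVATIONS_PER_OBJECT * FEATURES_PER_OBSERVATION

print(f"Raw image data over the survey: about {total_pb:.1f} PB")
print(f"Dimensions per object: {dimensions_per_object:,}")
```

Under these assumptions the raw image volume falls within the 100-200 petabyte range quoted for the final archive, and 1000 observations of 200 features each give the 200,000 dimensions per object.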
  • Most organisations still don’t know what data they actually have, or what they’re creating and storing on a daily basis. Some are beginning to realise that these massive archives of data might hold useful information that could deliver business value. But it takes time to access, analyse and interpret the data and to act on the results, and in the meantime, the world has moved on.

    According to Veritas Technologies’ ‘Global Databerg Report’, 52% of all information is considered to be ‘Dark’, meaning its value is unknown. 33% of information is considered redundant, obsolete or trivial (ROT).
  • The PwC Global Data & Analytics Survey 2016 emphasises the need for companies to establish a data-driven innovation culture, but there is still some way to go. Many of those using data and analytics are focused on the past, looking back with descriptive (27%) or diagnostic (28%) methods. The more sophisticated organisations take a forward-looking predictive or prescriptive approach to data.
  • Strategic decisions are still often based on instinct. Sophisticated machine learning should complement experience and intuition. Today’s business environment is not just about automating business processes – it’s about automating thought processes. However, we need to be able to understand and trust the algorithms.
  • Machine learning’s ability to scale across the broad spectrum of contract management, customer service, finance, legal, sales, quote-to-cash, quality, pricing and production challenges that enterprises face is attributable to its ability to continually learn and improve. Machine learning algorithms are iterative in nature, constantly learning and seeking to optimise outcomes. Every time a miscalculation is made, machine learning algorithms correct the error and begin another iteration of the data analysis. These calculations happen in milliseconds, which makes machine learning exceptionally efficient at optimising decisions and predicting outcomes.

    A recent McKinsey study found that a dozen European banks are replacing statistical modelling techniques with machine learning. The banks are also increasing customer satisfaction scores and customer lifetime value.
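The iterative "measure the error, correct, repeat" loop described in this note can be illustrated with plain gradient descent on a one-parameter model. This is a minimal sketch, not any particular product's algorithm: the toy data, the learning rate and the function names are all assumptions made for illustration.

```python
# Toy data that follows y = 3x exactly (hypothetical, for illustration only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]


def mean_squared_error(w):
    """Average squared gap between the predictions w * x and the targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(learning_rate=0.01, iterations=200):
    """Start from a poor guess and nudge w against the error gradient."""
    w = 0.0
    history = []
    for _ in range(iterations):
        history.append(mean_squared_error(w))
        # Gradient of the mean squared error with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * gradient  # correct the error, then iterate again
    return w, history


w, history = fit()
print(f"Learned weight: {w:.4f}")
print(f"Error at start: {history[0]:.2f}, at end: {history[-1]:.2e}")
```

Each pass measures how wrong the current model is and adjusts the weight in the direction that reduces that error, so the recorded error shrinks as the iterations proceed.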
  • Target assigns every customer a Guest ID number, tied to their credit card, name, or email address, that becomes a bucket storing a history of everything they’ve bought and any demographic information Target has collected from them or bought from other sources. Target's analysts studied historical buying data for all the women who had signed up for Target baby registries in the past. Some useful patterns emerged. Women on the baby registry were buying larger quantities of unscented lotion around the beginning of their second trimester, and sometime in the first 20 weeks, pregnant women loaded up on supplements like calcium, magnesium and zinc. Overall, the analysts were able to identify about 25 products that, when analyzed together, allowed them to assign each shopper a “pregnancy prediction” score. Target then started sending coupons for baby items to customers according to their pregnancy scores. An angry man went into a Target outside of Minneapolis, demanding to talk to a manager: “My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?” The manager apologized and then called a few days later to apologize again. On the phone, though, the father was somewhat abashed. “I had a talk with my daughter,” he said. “It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.”
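A score built from a set of signal products, as in the note above, is often a weighted sum passed through a logistic function. The sketch below shows that general idea only: the product weights, the bias and the function name are hypothetical and are not Target's actual model.

```python
import math

# Hypothetical weights: illustrative only, not Target's actual model.
WEIGHTS = {
    "unscented lotion": 1.2,
    "calcium supplement": 0.8,
    "magnesium supplement": 0.7,
    "zinc supplement": 0.7,
}
BIAS = -3.0  # hypothetical baseline, so a basket with no signal items scores low


def propensity_score(basket):
    """Map a list of purchased product names to a score between 0 and 1."""
    # Count each product once, however many times it appears in the basket.
    z = BIAS + sum(WEIGHTS.get(item, 0.0) for item in set(basket))
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


low = propensity_score(["bread", "milk"])
high = propensity_score(
    ["unscented lotion", "calcium supplement", "magnesium supplement", "zinc supplement"]
)
print(f"No signal products: {low:.3f}")
print(f"All four signal products: {high:.3f}")
```

In practice such weights would be estimated from historical purchase data, for example by fitting a logistic regression to the baskets of shoppers whose outcome is already known.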
  • Power to business users: Driven by a shortage of big data talent and the ongoing gap between needing business information and unlocking it from the analysts and data scientists, look for more tools and features that expose information directly to the people who use it.

    Machine generated content: Content that is based on data and analytical information will be turned into natural language writing by technologies that can proactively assemble and deliver information through automated composition engines. Documents currently written by people, such as shareholder reports, legal documents, market reports, press releases and white papers, are prime candidates for these tools.

    Embedding intelligence: On a mass scale, Gartner identifies "autonomous agents and things" as one of the up-and-coming trends, which is already marking the arrival of robots, autonomous vehicles, virtual personal assistants, and smart advisers.

    Shortage of talent: Business consultancy A.T. Kearney found that 72% of market-leading global companies had a hard time hiring data science talent.

    Machine learning: Gartner said that an advanced form of machine learning called deep neural nets will create systems that can autonomously learn to perceive the world.

    Data as a service: IBM's acquisition of the Weather Company, with all its data, data streams, and predictive analytics, highlighted the emergence of data-as-a-service business models.

    Real-time insights: The window for turning data into action is narrowing. The next 12 months will be about distributed streaming alternatives built on open source projects like Kafka and Spark.

    Roboboss: Some performance measurements can be consumed more swiftly by smart machine managers aka “robo-bosses,” who will perform supervisory duties and make decisions about staffing or management incentives.

    Algorithm markets: Firms will recognize that many algorithms can be acquired rather than developed. “Just add data”, Forrester's Brian Hopkins wrote, giving several examples of services available today, including Algorithmia, DataXu, and Kaggle.
  • IBM runs regular (free) webinars for users who sign up to use Watson.

    The Professional version starts at £58 per month. The Plus version starts at £22 per month.
  • Big Data Trends

    1. Collabor8now Ltd, 1st December 2016. Steve Dale, @stephendale. Trends in Big Data, Data Analytics & AI. Unless otherwise noted, this work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
    2. What is “Big Data”? Big Data is data whose scale, diversity and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it…
    3. Big Data – Big Challenges. Big Data can be a combination of different data formats: • Structured, e.g. databases • Semi-structured, e.g. email, e-forms, HTML, XML • Unstructured, e.g. document collections (text), social interactions (text, images, video, sound) • Machine generated, e.g. weblogs, sensor data, etc. There is massive growth in unstructured data… but just wait for the IoT!
    4. Big Data Challenges. It’s not just how fast data is produced or changed, but the speed at which it must be received, understood and processed. (Image source: IBM)
    5. The credibility gap. [Chart: data volume against time, comparing the data available to an organisation with the data an organisation can process.]
    6. Data-driven decisions. Decision making is best described as: highly data driven, rarely data driven, somewhat data driven (chart values as extracted: 8%, 53%, 39%). Use of analytics is mostly: Predictive (What could happen?), Prescriptive (What should happen now?), Descriptive (What happened?), Diagnostic (Why did it happen?) (chart values as extracted: 27%, 28%, 29%, 13%). Source: PwC Global Data & Analytics Survey 2016
    7. What do decision-makers need? A mix of mind and machine. Strategic decisions are still often based on instinct. But more businesses are beginning to look at sophisticated machine learning algorithms to support decision making. Our next decision will likely be based on: machine algorithms, human judgement (chart values as extracted: 59%, 41%). Source: PwC Global Data & Analytics Survey 2016
    8. AI Lexicon
    9. Machine Learning. Machine learning techniques are designed to seek out opportunities to optimise decisions based on the predictive value of large-scale data sets. (Image source: Tata Consultancy Services)
    10. Analytical Techniques
        • Cluster analysis: The task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar, in some sense or another, to each other than to those in other groups (clusters).
        • Comparative analysis: A step-by-step procedure of comparisons and calculations to detect patterns within very large data sets.
        • Decision tree analysis: A decision support tool that uses a tree-like graph of decisions and their possible consequences, including chance event outcomes, resource costs and utility.
        • Factor analysis: Used to analyse large numbers of dependent variables to detect certain aspects of the independent variables (factors) affecting those dependent variables.
        • Machine learning: A type of artificial intelligence which provides computers with the ability to learn without being explicitly programmed.
        • Multivariate analysis: The observation and analysis of more than one statistical outcome variable at a time.
        • Regression analysis: A statistical process for estimating relationships between a dependent variable and one or more independent variables.
        • Segmentation analysis: Divides a broad category into subsets that have, or are perceived to have, common features, needs, interests or priorities.
        • Sentiment analysis: The process of identifying and categorising opinions expressed in a piece of text to determine whether the writer’s attitude towards a topic or issue is positive, negative or neutral.
        • Simulation: The imitation of the operation of a real world process or system over time. It requires a model that represents the key characteristics or behaviours of the selected physical or abstract system or process.
        • Time series analysis: Comprises methods for analysing time series data to extract meaningful statistics and other characteristics of the data.
    11. The art and science of decision making. (1) Unlock existing insights. Data do not have to be “big” to be useful. Analysing databases previously mothballed or kept in silos can lead to fresh insights. (2) Beware of inherent bias. Important decisions have already taken place before data analysis. Understand the provenance and quality of the data. (3) Invest in talent. Can you give existing employees a foundation in data analysis before recruiting new data scientists? (4) Accountability. Be clear about who has decision making rights. Opening up access to data and analysis can allow decisions to be challenged.
    12. Life in a Big Data World
    13. Trends & Predictions • Power to the business users (Information Week 2016) • By 2018, 20% of business content will be authored by machines (Gartner 2016) • Embedding intelligence (Gartner 2016) • Shortage of talent (A.T. Kearney 2016) • Machine Learning gaining momentum (Ovum 2016) • Data-as-a-Service Business Models (Forrester 2016) • Real-time insights (Forrester 2016) • The start of algorithm markets (Forrester 2016) • By 2018, 3 million workers worldwide will be supervised by roboboss (Gartner 2016)
    14. IBM Watson Editions
    15. IBM Watson. Try it for yourself – free! Go to: and sign-in with a valid email address. Once your account has been validated, sign-in and you'll see the main Watson interface. Worth looking at the help videos, and I recommend: - Getting Started - Load your data - Create an AssembledView - Create an Exploration - Create a Prediction. Also – IBM offer regular webinars for new users:
    16. “Errors using inadequate data are much less than those using no data at all” (Charles Babbage). Steve Dale @stephendale