5 predictions in 5 years:
what the big data environment
will look like…
Nicky Hekster
Technical Leader Healthcare & Lifesciences
IBM
n.s.hekster@nl.ibm.com
IBM Research Global Technology Outlook (GTO)
Every year, IBM’s top researchers identify
significant technology trends and disruptive
technologies that hold the greatest potential
to transform industries, businesses and
society over the next 3-to-10 years.
The IBM GTO has continuously foreseen major IT developments
e-business
Cloud
Workload Optimized
Systems
Mobility in Enterprise
Computing
Big Data
… and many others
High Performance
Computing
The recent evolution of the GTO: It’s all about Data
2012 2013 2014
Volume
Velocity
Variety
Veracity
Confluence of
Social, Mobile, Cloud,
Big Data, and Analytics
Systems of Insight
Data Transforming
Industries
Mobile Social
Cloud
Internet of Things
The recent evolution of the GTO: it’s all about Data
2012 2013 2014 2015
Volume
Velocity
Variety
Veracity
Confluence of
Social, Mobile, Cloud,
Big Data, and Analytics
Systems of Insight
Data Transforming
Industries
Data will disrupt
entire industries
Mobile Social
Cloud
Internet of Things
Data
Doing Predictions….
“I think there is a world market
for maybe five computers.”
Thomas Watson, chairman of IBM, 1943
“Computers in the future may weigh no more
than 1.5 tons.” Popular Mechanics, 1949
“There is no reason anyone would want a
computer in their home.” Ken Olsen, founder of DEC, 1977
“640K ought to be enough for
anybody.” Bill Gates, 1981
“Prediction is difficult, especially about the
future” Yogi Berra
Approach: Data is growing exponentially and demands
new approaches (technology and strategy)
Chart (2010 to 2020): structured and unstructured data volume, labeled 44 zettabytes, with a “We are here” marker
© 2015 International Business Machines Corporation
Industry Data Curation
Curating data to generate meaningful insights
can easily consume 70-80% of the time-to-value
High-value business in the emerging data economy is shifting towards providing and curating
data content and industry insights. Sophisticated scalable methods are required for integrating
data sources into a useful, curated form. For example:
• Spatio-temporal data includes raw satellite images that are distorted or incomplete due to
atmospheric interference. Curation analytics can construct a complete image using
statistical methods and detailed physical models.
• Social media data is often text, images, or video. Curation analytics that include a suite of
language, image and video tools and that combine text analysis and psychology can provide
deeper understanding of combined media data.
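To make the satellite-image bullet concrete, here is a minimal Python sketch of gap filling. It assumes a tiny image stored as a list of rows, with `None` marking pixels lost to atmospheric interference, and it fills each gap with the mean of its valid neighbours. Neighbour averaging is a deliberately simple stand-in chosen for illustration; it is not the statistical and physical modelling the slide refers to, and the function name and sample values are hypothetical.

```python
def fill_gaps(image):
    """Return a copy of `image` in which each missing pixel (None) is replaced
    by the mean of its valid 4-connected neighbours. Pixels with no valid
    neighbour stay None. The input is left unchanged."""
    rows, cols = len(image), len(image[0])
    filled = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] is not None:
                continue
            neighbours = [
                image[nr][nc]
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols and image[nr][nc] is not None
            ]
            if neighbours:
                filled[r][c] = sum(neighbours) / len(neighbours)
    return filled


# Hypothetical 3x3 scene with two pixels missing.
scene = [
    [10.0, 12.0, 14.0],
    [11.0, None, 15.0],
    [12.0, 14.0, None],
]
restored = fill_gaps(scene)
print(restored)
```

A production curation pipeline would replace the averaging step with models that account for sensor and atmospheric behaviour; the sketch only shows where such a step sits.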
Un-curated
Diverse sources, resolutions, gaps, data types,
update frequencies and uncertainty levels
Curated
Creates an integrated up-to-date view of layered
spatio-temporal data with industry-specific
analytics
Roads Soil
Weather Land
Analyzing open spatio-temporal data at scale illustrates
the power of curation
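Once the layers are curated onto a common grid, a cross-layer query becomes a simple filter. The Python below is a minimal sketch under stated assumptions: each grid cell already carries one value from the soil, weather and roads layers, and the `Cell` schema, field names, sample data and the 50 km threshold are all hypothetical, not a description of any IBM data store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One curated grid cell with values from several aligned layers."""
    cell_id: str
    soil_type: str         # from the soil layer
    surface_temp: float    # from the weather layer
    km_to_road_hub: float  # derived from the roads layer


def find_cells(cells, soil_type, temp_range, max_km_to_hub):
    """Return the ids of cells that satisfy all three layer conditions."""
    low, high = temp_range
    return [
        cell.cell_id
        for cell in cells
        if cell.soil_type == soil_type
        and low <= cell.surface_temp <= high
        and cell.km_to_road_hub <= max_km_to_hub
    ]


# Hypothetical sample data.
cells = [
    Cell("A1", "loam", 48.0, 12.0),
    Cell("A2", "clay", 50.0, 5.0),
    Cell("B1", "loam", 70.0, 8.0),
    Cell("B2", "loam", 40.0, 120.0),
    Cell("C1", "loam", 35.0, 49.9),
]
matches = find_cells(cells, "loam", (35, 65), 50.0)
print(matches)
```

The filter itself is trivial; the effort lies in the curation that brings sources with different resolutions, gaps and update frequencies onto one grid so that a single record per cell exists at all.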
Transforming Healthcare Consumption
In a lifetime, an
average human*
will generate 0.4 TB
of clinical data, 6 TB
of genomic data,
and 1100 TB of
exogenous data
Exogenous data is the behavior, socio-economic and environmental data that is generated by
the individual, fueled by the rapid adoption of smartphones and patient-controlled medical
devices.
Accelerating the curation and integration of this mass of patient-generated data can lead to
improvements in the accuracy of health assessments, monitoring, and self-treatment and
strengthen health outcomes.
These activities will also serve as the catalyst for new service models that will shape and
influence the healthcare market.
* In the developed world
Source: "The Relative Contribution of Multiple Determinants to Health Outcomes", Lauren McGovern et al., Health Affairs, 33, no.2 (2014)
Rapid growth of exogenous data is transforming healthcare
Exogenous data: 1100 TB generated per lifetime, 60% influence on outcomes
Genomics data: 6 TB per lifetime, 30% influence on outcomes
Clinical data: 0.4 TB per lifetime, 10% influence on outcomes
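The per-lifetime figures can be put side by side with a few lines of arithmetic. The Python below is a small sketch that uses only the three volumes quoted on the slide; the variable names are illustrative.

```python
# Lifetime data volumes per person, in terabytes, as quoted on the slide.
lifetime_tb = {"clinical": 0.4, "genomic": 6.0, "exogenous": 1100.0}

total_tb = sum(lifetime_tb.values())
share_of_volume = {kind: tb / total_tb for kind, tb in lifetime_tb.items()}

for kind, share in share_of_volume.items():
    print(f"{kind:>9}: {lifetime_tb[kind]:7.1f} TB, {share:6.2%} of lifetime volume")
```

By volume, exogenous data makes up more than 99% of what a person generates, which is why its curation dominates the integration effort.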
Payment Insights
About 85% of consumer
transactions and 10% of
government and business
transactions are in cash,
which is largely invisible
to analytics today
Digitizing cash transactions and new sources of digitized data will enable a major
transformation driven by innovations in payment insights. The emergence of mobile
payments is opening up this payment insights market to many new players, including mobile
network operators, retailers, phone platform vendors, e-commerce vendors and many
startups.
The combination of innovative solutions built on curated payment data and other relevant
information with reusable machine learning models will generate insights that lead to new
business models in the industry.
Consumer volume: 2.5 T transactions (85% cash, 15% non-cash)
Government & Business volume: 0.3 T transactions (10% cash, 90% non-cash)
Cash is the most used retail payment instrument*
* Source: http://www.frbsf.org/cash/publications/fed-notes/2014/april/cash-consumer-spending-payment-diary
Source: MasterCard Advisors, 2013 report
New payment insights through digitization of cash
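The volumes and cash shares on this slide combine into an overall figure with simple arithmetic. The Python sketch below assumes that "T" stands for trillion transactions; the dictionary layout and names are illustrative only.

```python
TRILLION = 10**12  # assumption: "T" on the slide means trillion

# Transaction volume and cash share per segment, as quoted on the slide.
segments = {
    "consumer": {"transactions": 2.5 * TRILLION, "cash_share": 0.85},
    "government_and_business": {"transactions": 0.3 * TRILLION, "cash_share": 0.10},
}

cash = {
    name: seg["transactions"] * seg["cash_share"]
    for name, seg in segments.items()
}
total_transactions = sum(seg["transactions"] for seg in segments.values())
total_cash = sum(cash.values())
overall_cash_share = total_cash / total_transactions

print(f"Cash transactions: {total_cash / TRILLION:.3f} T "
      f"of {total_transactions / TRILLION:.1f} T ({overall_cash_share:.0%})")
```

Under that assumption, roughly 2.2 of 2.8 trillion transactions, a little over three quarters, are in cash and therefore outside the reach of analytics today.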
Data at the Edge
90% of data
created over the
last 10 years was
abandoned
Smartphones already possess more storage capacity than all servers in the world combined.
Over 5000 Exabytes of data generated at the “edge” of the network by these and other
devices stayed there or was thrown away in 2014.
To process the valuable data at the edge, a new computing platform is needed, one that can
tap into the growing storage and compute of these devices.
This platform will create a marketplace for connecting producers and consumers – of
infrastructure and data – at the edge.
Neuromorphic Computing
60% of valuable
sensory data
loses value in
milliseconds
Today’s computing systems are challenged to extract real-time actionable information from
complex and cluttered sensor data with very low power. Biological systems, on the other
hand, have evolved to be extremely efficient at making sense of raw sensory information in
real time at power levels orders of magnitude smaller than conventional compute.
IBM’s SyNAPSE (Systems of Neuromorphic Adaptive Plastic Scalable Electronics) chip has a
brain-inspired computer architecture powered by an unprecedented one million neurons
and 256 million synapses. Such differentiated architecture – both in hardware and software
– can enable neural computation to be deployed where the sensory data is at the edge, or at
massive scale within the cloud.
Next Step: Holistic computing intelligence
DARPA SyNAPSE program - TrueNorth
Cloud Data Foundation
By 2017, data stored
on off-premise
clouds may equal as
much as 75% of all
data in traditional
data centers
Indisputably, cloud is now a critical element of any company’s data architecture. As more data
moves to or is born on clouds, new requirements are emerging for a viable enterprise data
foundation. A robust data foundation requires:
1. A variety of scalable data stores, from low latency to archive, tuned to different data types.
2. A rich set of APIs and curation tools for developers to capture insight quickly from large,
complex and dispersed data sets.
3. A diverse pool of high-quality data, solutions and services provided by an ecosystem of
players for users to exploit insights quickly for business value.
Finally …
Thank you for your attention

Big Data Expo 2015 - IBM 5 predictions


Editor's Notes

  • #3 IBM’s annual Global Technology Outlook (GTO) is an authoritative analysis of the future of information technology and its implications for business and society. Over the past 25 years, the GTO has been carried out annually for the benefit of IBM clients to help inform their IT investments and long-term strategic planning. It draws insights and contributions from IBM Research’s nearly 3,000 leading scientists, engineers, mathematicians and subject matter experts who sit in 12 labs on six continents.
  • #4 Forecasting is always risky business, but over the past 25 years the GTO has proven to be remarkably accurate in anticipating cycles of technological progress and change. It continues to inform and influence the roughly $6B investment in research and development that IBM makes annually, as well as our acquisition strategy. The breadth of IBM’s expertise across systems, software, services, semiconductors and a wide swath of industries gives the company a perspective on trends unmatched by other companies. For example, in the early days of the Internet, when businesses were debating whether Netscape or Internet Explorer would win the browser wars, IBM foresaw that the Internet was about business – not browsers. In 2004, IBM accurately predicted the emergence of petaflop-scale computers. (A petaflop is 1 quadrillion floating point operations per second.) IBM achieved that milestone four years later. Today, there are more than 50 computers operating at that speed. More recently, the GTO foresaw the advent of hybrid clouds – combining the accessibility of public clouds with the data security of private clouds.
  • #5 The focus on one topic over a four year period is unprecedented in GTO history. But the implications and impact of the changing nature of data, and how we extract value from it, are so important that it has been a key GTO topic since 2012. In 2012, we looked at the four Vs of data. Everyone was talking about volume (big data). But IBM also did a deep dive into the implications of the speed at which data could move (velocity), and the different types (or variety) of data, both structured (databases) and unstructured (video, tweets, sensor data, etc.). We also looked at the implications that not all data is accurate or reliable (veracity) and explored how to analyze and gain value from data whose veracity is in doubt. The next year, IBM examined the confluence of five trends, which individually were well known, but whose interactions had not been previously examined. This confluence is now widely recognized as the key driver of Systems of Engagement. In 2014, the GTO took on Systems of Insight as we recognized that the greatest value for enterprises occurs at the intersection of systems of record (back office applications) and systems of engagement (front office applications). To more rapidly extract this value, and build Systems of Insight, three components are needed: an engine to compute the insight, a cloud to deliver the insight, and, most importantly, data.
  • #6 The current 2015 GTO is dedicated entirely to this last element: data. We brought a lot of earlier themes together to look at how data is driving new business models and transforming not just IT but entire industries with digitization of new data. We will also look at a growing class of data – data at the edge – that is likely to define an entirely new IT paradigm.
  • #7 10/2/2015
  • #12 Of course, data gets even more interesting – and insights more powerful – when it gets curated and analyzed in context of other data. Data curation means integrating data, enriching the data, aligning, reclassifying, projecting etc. Right now, to be totally transparent, curation is a huge gating factor. A study of many analytics engagements shows that 70% of the time and cost invested is spent simply finding, organizing, aligning and classifying data. And often the data sets could drive value across multiple teams, enterprises and industries. But today, the exercise of curation is repeated by each team, enterprise, industry and must be done before the analysis which creates competitive differentiation can even begin. Open spatio-temporal data has the broadest cross-industry value and is currently the least curated. But it holds tremendous potential to connect and extract insights that offer great context and deeper understanding of what’s happening. IBM is building the expertise to organize the world’s open spatio-temporal data and developing a unified, cloud-based, searchable, big data store of pre-processed physical data with industry-specific analytics. This will enable complex query capabilities across layers and data sets. For example, “Find all areas with a specific soil type, surface temperatures between 35 and 65, and access to major transportation hubs…” That’s actually a very simplistic way of applying this concept, but hopefully it suggests ideas for your business.
  • #14 Let me give you two further examples of how the emergence of new forms of data is beginning to transform industries, and suggest that we later discuss the implications for your industry. In a lifetime, an average human (in the developed world) will generate 0.4 terabytes of clinical data (the type of stuff in our electronic health records), 6 terabytes of genomic data (the stuff we’re born with)… and 1,100 terabytes of exogenous data! You may be asking “What is exogenous data?” That’s the data created by our behavior, or related to our socio-economic strata, or the environment that we live in, etc. So little of this information was available to us before, but it is now being generated by the rapid adoption of smartphones, fitness monitors and wearables, and patient-controlled medical devices. This development provides a significant opportunity to capture and use relevant, factual patient-generated information to yield improved self-managed outcomes. Why is that so important? Believe it or not, that exogenous data holds the key to delivering healthy outcomes. Studies show that exogenous factors determine 60% of a person’s health outcome – not genetics, not clinical data. That’s big! We’re seeing the emergence of a new era of “personalized medicine” as a result. Personalized medicine is broadly about customization of healthcare. To date, the use of genetic information has shown incredible promise in realizing personalized medicine. Using exogenous data, we can move away from generic population-based analytics to cohorts of one. This is a huge untapped opportunity for healthcare providers.
  • #15 The OV-chipkaart (the Dutch public transport smart card) works only digitally; without it you pay more for public transport. Impulse purchases and donations are higher when they can be made digitally (for example, collections).
  • #16 Or consider financial transactions. Despite all of the advances in credit cards and electronic payments, the vast majority of the world’s retail transactions – 85% – are still done in cash. Currently there’s no easy way to develop insights from these transactions. But imagine what happens when we accelerate the digitization of those payments! In emerging markets especially, governments are increasingly looking to transition from physical cash to digital cash. In these markets, government deregulation allows banks to acquire mobile operator licenses and offer mobile money services. In other instances mobile operators and retailers are being allowed to provide payment services. In Kenya, we worked with the country’s largest bank on three months of microloan data that included over four million loan requests. We found that if the bank would take the system into production, they could increase revenue by 26%, reduce defaults by 40%, and double the profit on every microloan. Not surprisingly, they are very excited with these results and are planning to move the solution into production this year.
  • #19 http://www.research.ibm.com/articles/brain-chip.shtml
  • #20 http://www.research.ibm.com/cognitive-computing/neurosynaptic-chips.shtml#fbid=8zqqrQQPQzL Next goal: integrating 4,096 chips in a single rack with 4 billion neurons and 1 trillion synapses while consuming ~4kW of power. What is a cognitive chip? The latest SyNAPSE chip, introduced on August 7, 2014, has the potential to transform mobility by spurring innovation around an entirely new class of applications with sensory capabilities at incredibly low power levels. This is enabled by a revolutionary new technology design inspired by the human brain. IBM built a new chip with a brain-inspired computer architecture powered by an unprecedented 1 million neurons and 256 million synapses. It is the largest chip IBM has ever built at 5.4 billion transistors, and has an on-chip network of 4,096 neurosynaptic cores. Yet, it only consumes 70mW during real-time operation, orders of magnitude less energy than traditional chips. As part of a complete cognitive hardware and software ecosystem, this technology opens new computing frontiers for distributed sensor and supercomputing applications. http://www.scientificamerican.com/article/neuroelectronics-make-smarter-computer-chips/ The human brain contains many tens of billions of neurons (86 billion). http://www.digitaltrends.com/computing/ibms-unveils-the-brain-inspired-truenorth-cognitive-computer/#ixzz3jcefOdIc
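The rack-scale goal in the last note can be checked against the per-chip figures with simple multiplication. The Python sketch below uses only numbers quoted in these notes (1 million neurons, 256 million synapses and 70 mW per chip, 4,096 chips per rack); the constant names are illustrative.

```python
# Per-chip figures as quoted in the notes (rounded values).
NEURONS_PER_CHIP = 1_000_000
SYNAPSES_PER_CHIP = 256_000_000
CHIP_POWER_W = 0.070          # 70 mW during real-time operation
CHIPS_PER_RACK = 4_096

rack_neurons = CHIPS_PER_RACK * NEURONS_PER_CHIP
rack_synapses = CHIPS_PER_RACK * SYNAPSES_PER_CHIP
rack_chip_power_w = CHIPS_PER_RACK * CHIP_POWER_W
synapses_per_neuron = SYNAPSES_PER_CHIP // NEURONS_PER_CHIP

print(f"Neurons per rack:  {rack_neurons:,}")
print(f"Synapses per rack: {rack_synapses:,}")
print(f"Chip power only:   {rack_chip_power_w:.1f} W")
print(f"Synapses per neuron: {synapses_per_neuron}")
```

The products come to about 4.1 billion neurons and about 1.05 trillion synapses, consistent with the stated goal. The chips alone would draw under 300 W, so the ~4kW figure presumably covers more than chip power; that reading is an inference from the arithmetic, not something the notes state.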