www.iactglobal.in
What is Module 1 about?
After completing this Module, you should be able to:
Understand what Big Data is and what its characteristics are
Understand in detail the need for a Big Data solution
Understand where Big Data is appropriate
List the IBM products that make up IBM’s Big Data strategy
Describe the type of data appropriate for:
- InfoSphere BigInsights
- InfoSphere Streams
List the open source programs that are part of InfoSphere BigInsights
System of Units / Binary System of Units
Name        Symbol   International System of Units (SI)   Binary usage (deprecated)
kilobyte    KB       10^3                                 2^10
megabyte    MB       10^6                                 2^20
gigabyte    GB       10^9                                 2^30
terabyte    TB       10^12                                2^40
petabyte    PB       10^15                                2^50
exabyte     EB       10^18                                2^60
zettabyte   ZB       10^21                                2^70
yottabyte   YB       10^24                                2^80
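The gap between the two conventions grows with each prefix. The short Python sketch below computes, for every prefix in the table, the SI byte count, the binary (power-of-two) count and how much larger the binary value is; the `unit_sizes` helper is a hypothetical name used only for illustration.

```python
# Decimal (SI) vs. binary readings of the byte prefixes in the table.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def unit_sizes(prefix: str) -> tuple[int, int]:
    """Return (SI bytes, binary bytes) for a prefix, e.g. 'giga' -> (10**9, 2**30)."""
    n = PREFIXES.index(prefix) + 1  # kilo is step 1, mega step 2, and so on
    return 10 ** (3 * n), 2 ** (10 * n)


for p in PREFIXES:
    si, binary = unit_sizes(p)
    gap = (binary / si - 1) * 100  # percent by which the binary value exceeds the SI value
    print(f"{p}byte: SI={si:.3e} B, binary={binary:.3e} B, binary is {gap:.1f}% larger")
```

For example, a terabyte read as 2^40 bytes is about 10% larger than 10^12 bytes, and the gap reaches roughly 21% at the yottabyte level, which is why the binary reading is deprecated in favor of the IEC prefixes (KiB, MiB, GiB and so on).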
Big Data @ Scale
2.5 petabytes: memory capacity of the human brain
13 petabytes: amount that could be downloaded from the internet in two minutes if every American (300M) got on a computer at the same time
4.75 exabytes: total genome sequences of all people on Earth
422 exabytes: total digital data created in 2008
1 zettabyte: world’s current digital storage capacity
1.8 zettabytes: total digital data expected to be created in 2011
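The figures above can be sanity-checked with a little arithmetic. This Python sketch assumes decimal SI units, treats 422 EB as the 2008 total and 1.8 ZB as the 2011 total, and spreads the 13 PB over 300 million people and two minutes (120 s); it is a rough worked example, not data from the original source.

```python
# Rough arithmetic on the "Big Data @ Scale" figures, using decimal SI units.
PB = 10 ** 15
EB = 10 ** 18
ZB = 10 ** 21

created_2008 = 422 * EB
created_2011 = 1.8 * ZB
years = 2011 - 2008

ratio = created_2011 / created_2008   # total growth over the period
cagr = ratio ** (1 / years) - 1       # compound annual growth rate

# 13 PB downloaded by 300 million people in two minutes (120 s).
per_person = 13 * PB / 300e6 / 120    # bytes per second per person

print(f"growth factor 2008->2011: {ratio:.2f}x, implied annual growth: {cagr:.0%}")
print(f"per-person rate: {per_person / 1e3:.0f} kB/s (~{per_person * 8 / 1e6:.1f} Mbit/s)")
```

Under these assumptions the total grows about 4.27-fold over three years (roughly 62% per year), and the 13 PB scenario works out to about 361 kB/s, or roughly 2.9 Mbit/s, per person.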
Explosion in data and real-world events
Source: IBM internal: http://www.slideshare.net/jowen_evansdata/keynote-randy-newell-of-ibm
Examples of Big Data
Commercial
• Web Events / Database Logs
• Sensor Networks
• RFID
• Internet Text and Documents
• Internet Search Indexing
• CDR (Call Detail Records)
• Medical Records, etc.
Government
• Regular Government Business and Commerce Needs
• Military and Homeland Security Surveillance
Examples of Big Data
Science
• Astronomy
• Atmosphere
• Biological
• Genomics
Social
• Social Networks
• Social Data
Big Data @ Organizations
Source: http://www.slideshare.net/albertspijkers/2011-07-27baoclientpresentation
Perception gap surrounding social media
Source: IBM internal
Big Data Characteristics
Source: http://www.linguamatics.com/blog/big-data-real-world-data-where-does-text-analytics-fit
The Big Data challenge: finding new insights
Source: IBM internal: http://www.slideshare.net/cmeniche/1524-how-ibms-big-data-solution-can-help-you-gain-insight-into-your-data-center-v2
Is there really a need for Big Data?
Source: http://www.slideshare.net/cmeniche/1524-how-ibms-big-data-solution-can-help-you-gain-insight-into-your-data-center-v2
Case Study and Implementation @ Vestas
Vestas Wind Systems has 43,000 wind turbines in 65 countries across 5 continents.
Customer Pain Points:
• Finding the optimal place to install a wind turbine
• A large number of location-dependent factors must be considered, such as temperature, precipitation, wind velocity and humidity
• The existing legacy process cannot handle all of the data that needs to be analyzed
• Analysis of the data must be completed in hours
Solution Required:
• Leverage all available data, drastically reduce modeling time, and support future expansions in modeling techniques
• Improve the accuracy of wind turbine placement decisions
Case Study and Implementation @ Vestas
Implementation using InfoSphere BigInsights:
• Created a “wind and site competence center”
• Engineers will model data and forecast optimal turbine locations
• Initially uses publicly available weather data from national weather data services, as well as Vestas’ own recorded weather data
• Data sources considered: global deforestation metrics, satellite images, historical metrics, geospatial data
• InfoSphere BigInsights will be used as the core infrastructure to hold the generated weather data
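BigInsights is built on Apache Hadoop, whose core processing model is MapReduce. The plain-Python sketch below only illustrates the map, shuffle and reduce phases on per-site weather records; it is not the BigInsights API or Vestas’ actual model, and the site IDs, readings and the choice of mean wind velocity as a ranking score are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical weather records: (site_id, wind velocity in m/s, humidity in %).
records = [
    ("site-A", 7.2, 61), ("site-B", 5.1, 70), ("site-A", 8.0, 58),
    ("site-C", 9.4, 65), ("site-B", 4.8, 72), ("site-C", 8.9, 63),
]


def map_phase(record):
    """Emit a (key, value) pair: the site is the key, wind velocity the value."""
    site, wind, _humidity = record
    yield site, wind


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(site, winds):
    """Collapse one site's readings into a single score (here: mean wind velocity)."""
    return site, mean(winds)


pairs = (pair for record in records for pair in map_phase(record))
scores = dict(reduce_phase(site, winds) for site, winds in shuffle(pairs).items())
ranking = sorted(scores, key=scores.get, reverse=True)  # best-scoring site first
print(ranking)
```

On a cluster, the framework runs many map and reduce tasks in parallel and performs the shuffle across the network; a real siting model would combine many more of the factors listed above (temperature, precipitation, humidity and so on) than this single score.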
Big Data presents big opportunities?
Source: IBM internal: http://www.slideshare.net/cmeniche/1524-how-ibms-big-data-solution-can-help-you-gain-insight-into-your-data-center-v2
Traditional vs. Big Data approaches
Source: http://image.slidesharecdn.com/1524howibmsbigdatasolutioncanhelpyougaininsightintoyourdatacenterv2-130306205122-php
Merging the Traditional and Big Data Approaches
Source: IBM internal: http://www.rosebt.com/uploads/8/1/8/1/8181762/3861342_orig.jpg?1
Enterprise information architecture:
Big Data will be a permanent part of your information architecture.
It cannot be a silo; it must be fully integrated in order to leverage its value.
It must be easy to deploy and integrate.
Source: IBM internal: http://www.slideshare.net/albertspijkers/2011-07-27baoclientpresentation
IBM Big Data platform strategy:
• Integrate and manage the full variety, velocity and volume of Big Data
• Apply advanced analytics to information in its native form
• Visualize all available data for ad hoc analysis
• Provide a development environment for building new analytic applications
• Support workload optimization and scheduling
• Provide for security and governance
• Integrate with enterprise software
IBM Big Data platform strategy:
Source: http://www.slideshare.net/cmeniche/1524-how-ibms-big-data-solution-can-help-you-gain-insight-into-your-data-center-v2
Enterprise-class Big Data product @ IBM:
Failure Tolerance:
• High-availability architecture that withstands hardware or application failures.
Scale Economically:
• Runs on scalable hardware, with the ability to add nodes dynamically.
Security & Privacy:
• Security protection with granular data access control.
Source: IBM internal
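A common way to achieve this kind of failure tolerance is block replication: HDFS, the Hadoop file system, stores each block on several nodes, three by default. The Python sketch below is a simplified illustration, not BigInsights’ actual placement policy: it places hypothetical blocks on distinct nodes of a hypothetical five-node cluster, then checks whether every block is still readable after some nodes fail.

```python
# Simplified block replication: hypothetical cluster and blocks, not a real placement policy.
REPLICATION = 3  # HDFS's default replication factor
NODES = ["node1", "node2", "node3", "node4", "node5"]


def place_blocks(block_ids, nodes, replicas=REPLICATION):
    """Put each block on `replicas` distinct nodes, shifting the start node per block."""
    if replicas > len(nodes):
        raise ValueError("cannot place more replicas than there are nodes")
    return {
        block: {nodes[(i + k) % len(nodes)] for k in range(replicas)}
        for i, block in enumerate(block_ids)
    }


def readable_after_failure(placement, failed):
    """True if every block still has at least one replica on a healthy node."""
    return all(replica_nodes - failed for replica_nodes in placement.values())


placement = place_blocks([f"blk_{i}" for i in range(10)], NODES)
print(readable_after_failure(placement, {"node2"}))                    # True
print(readable_after_failure(placement, {"node1", "node2"}))           # True
print(readable_after_failure(placement, {"node1", "node2", "node3"}))  # False: blk_0 lost
```

With three replicas, any two simultaneous node failures leave every block readable, while losing the three nodes that hold all copies of some block does not. Real HDFS placement is also rack-aware, which this sketch ignores.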
Different BigInsights editions for varying needs
Source: IBM internal: http://www.bloter.net/wp-content/uploads/2013/04/ibm_biginsights_2_1.jpg
Different BigInsights editions for varying needs
Characteristics that distinguish BigInsights include its built-in support for analytics, its integration with other enterprise software, and its production readiness.
InfoSphere BigInsights is offered in two editions:
• Basic Edition
• Enterprise Edition
InfoSphere Streams:
Source: IBM internal: https://bruceweed.wordpress.com/tag/ibm-infosphere-streams/
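IBM positions InfoSphere Streams for data in motion: records are analyzed continuously as they arrive, rather than stored first and queried later as with BigInsights. The plain-Python sketch below (not the Streams SPL language) shows one typical streaming pattern, a sliding-window average over a hypothetical sensor feed that flags readings well above the recent average; the window size and the 1.5x threshold are arbitrary assumptions.

```python
from collections import deque

WINDOW = 3       # hypothetical window size (number of recent readings)
THRESHOLD = 1.5  # hypothetical alert factor


def sliding_average(readings, window=WINDOW):
    """Yield (reading, mean of the last `window` readings, current one included)."""
    recent = deque(maxlen=window)  # the oldest value drops out automatically
    for value in readings:
        recent.append(value)
        yield value, sum(recent) / len(recent)


# Hypothetical sensor feed; a real stream would be an unbounded source.
feed = [10, 12, 11, 30, 12]
for value, avg in sliding_average(feed):
    if value > THRESHOLD * avg:
        print(f"alert: {value} vs recent average {avg:.1f}")
```

Because the window keeps only the last few values, memory use stays constant however long the stream runs, which is what lets this style of processing handle continuous, unbounded feeds.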
To Summarize
• An enterprise-ready Big Data platform
• Innovative, customer-tested products: InfoSphere BigInsights and InfoSphere Streams
• Platform and products enabled for integration with the overall enterprise infrastructure
• Even though BigInsights contains open source code, it is licensed like other IBM software offerings
To Summarize
Having completed this Module, you should be able to:
Understand the need for a Big Data solution
List the IBM products that make up IBM’s Big Data strategy
Describe the type of data appropriate for:
- InfoSphere BigInsights
- InfoSphere Streams
List the open source programs that are part of InfoSphere BigInsights

Introduction to Big Data & Hadoop
