THE NEXT BIG THING!!!
• OLTP: Online Transaction Processing (DBMSs)
• OLAP: Online Analytical Processing (Data Warehousing)
• RTAP: Real-Time Analytics Processing (Big Data Architecture & Technology)
• Structured: relational tables
• Semi-structured: XML
• Unstructured: graphic images, videos, streaming instrument data, web pages, logs, tweets, etc.
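As a rough illustration of how the three kinds of data differ in practice, the sketch below reads one sample of each using only the Python standard library. All sample values (the CSV row, the XML order, the tweet text) are invented for this example and do not come from the slides.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed columns, as in a relational table (sample row is invented).
table = io.StringIO("id,name,balance\n1,Asha,250.0\n")
rows = list(csv.DictReader(table))

# Semi-structured: self-describing tags, but no fixed schema (sample is invented).
doc = ET.fromstring("<order id='7'><item>book</item><item>pen</item></order>")
items = [item.text for item in doc.findall("item")]

# Unstructured: free text; any structure has to be extracted (sample is invented).
tweet = "Loving the new phone! #gadgets #bigdata"
hashtags = re.findall(r"#(\w+)", tweet)

print(rows[0]["name"], items, hashtags)
```

The structured row can be addressed by column name straight away, the XML needs a tree walk, and the tweet yields structure only after pattern matching.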
[Figure: data surrounding the customer: social media, gaming, entertainment, banking, finance, our known history, purchase]
• 12+ TBs of tweet data every day
• 25+ TBs of log data every day
• ? TBs of data every day
• 2+ billion people on the Web by the end of 2011
• 30 billion RFID tags today (1.3 billion in 2005)
• 4.6 billion camera phones worldwide
• Hundreds of millions of GPS-enabled devices sold annually
• 76 million smart meters in 2009… 200 million by 2014
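To put the smart-meter figures (76 million in 2009, 200 million by 2014) in perspective, the following sketch works out the overall growth factor and the implied compound annual growth rate. It assumes steady year-over-year growth across the five years, which the slides do not state.

```python
# Smart-meter counts quoted on the slide.
meters_2009 = 76e6
meters_2014 = 200e6
years = 2014 - 2009

# Overall growth factor across the period.
growth_factor = meters_2014 / meters_2009

# Implied compound annual growth rate, assuming steady growth (an assumption).
cagr = growth_factor ** (1 / years) - 1

print(f"growth factor: {growth_factor:.2f}x, implied annual growth: {cagr:.1%}")
```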
[Chart: Data Usage for 2009, 2012, and 2020]
[Chart: Data Usage by type (unstructured, semi-structured, structured); values shown: 0.216, 0.324, 2.16]
• Not all unstructured data is useful
• Breaking down data silos to access all the data an organization stores in different places, and often in different systems
• Creating platforms that can pull in unstructured data as easily as structured data
• Obtained and processed through new techniques to produce the best value
= +
• Enormous volume of information stored in data centers, data warehouses, databases, and transaction systems
• The process of collecting, organizing, and analyzing large sets of data to discover patterns and other useful information
• Helps organizations better understand the information contained within the data
• Helps identify the data that matters most to the business and to future business decisions
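The collect, organize, and analyze steps described here can be sketched on a very small scale as follows. The log lines and field names are hypothetical, made up for this example; a real big data pipeline would run the same idea over far larger inputs with distributed tooling.

```python
from collections import Counter

# Collect: hypothetical web-server log lines (invented for illustration).
raw_logs = [
    "2011-05-01 GET /home 200",
    "2011-05-01 GET /cart 500",
    "2011-05-02 GET /home 200",
    "2011-05-02 GET /cart 500",
    "2011-05-02 GET /help 404",
]

# Organize: split each line into named fields.
records = []
for line in raw_logs:
    date, method, path, status = line.split()
    records.append({"date": date, "method": method, "path": path, "status": int(status)})

# Analyze: count failing requests per path to surface a pattern.
errors = Counter(r["path"] for r in records if r["status"] >= 400)
most_common_path, count = errors.most_common(1)[0]

print(most_common_path, count)
```

Here the pattern that emerges is that one path accounts for most of the failures, which is the kind of finding that can then feed a business decision.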
To address the following…
• Sophisticated BI & analytics
• Leverage universal data
• Select proven technologies
• Agility for business change
• Easy to build and manage
…and to overcome the challenges
• Long timelines for infrastructure setup
• Time and cost uncertainties
• Limitations in onboarding new data sources
Volume
Velocity
Variety
Value
Traditional Data Warehouse
• Complete records from transactional systems
• All data is centralized
• New data added every month or day
• Analytics designed against a stable environment
• Many reports run on a production basis
Big Data Environment
• Data from many sources inside and outside the organization
• Data often physically distributed
• Need to iterate on the solution to test and improve the model
• Large-memory analytics is also part of the iteration
• Every iteration requires a complete reload of information
• Researchers use big data to decode human DNA in minutes
• To predict where terrorists plan to attack
• To determine which gene is most likely to be responsible for certain diseases
• To decide which ads you are most likely to respond to on Facebook
• Improves customer retention
• Helps with product development and gaining a competitive advantage
• Increases efficiency and optimizes operations
• Improves speed and reduces complexity
• Hadoop, Cloudera, Hortonworks, MapR, and Amazon. There are also other products, such as HPCC, and cloud-based services, such as Google BigQuery.
Old model: A few companies generate data; all others consume it
New model: All of us generate data, and all of us consume data
Thank You
