Big Data
By Raghav Tripathi
Department of Computer Science Engineering, MMMUT
 Key enablers for the appearance and growth of ‘Big-Data’ are:
◦ Increase in storage capabilities
◦ Increase in processing power
◦ Availability of data
◦ Every day we create 2.5 quintillion bytes of data; 90% of the
data in the world today has been created in the last two
years alone
REF:2
Mobile Devices
Readers/Scanners
Science facilities
Microphones
Cameras
Social Media
Programs/Software
Social media and networks (all of us are generating data)
Scientific instruments (collecting all sorts of data)
Mobile devices (tracking all objects all the time)
Sensor technology and networks (measuring all kinds of data)
 Progress and innovation are no longer hindered by the ability to collect data,
but by the ability to manage, analyze, summarize, visualize, and discover
knowledge from the collected data in a timely manner and in a scalable fashion.
 Activity data, conversation data, sensor data, and photo and video image data.
 Relational Data (Tables/Transaction/Legacy Data)
 Text Data (Web)
 Semi-structured Data (XML)
 Graph Data
◦ Social Network, Semantic Web (RDF), …
 Streaming Data
◦ You can only scan the data once
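The single-scan constraint on streaming data can be illustrated with a small sketch. It assumes a hypothetical `sensor_readings` generator standing in for a real stream, and uses Welford's online algorithm (a technique chosen here for illustration, not one named in the slides) to compute the mean and variance without storing or re-reading the data.

```python
from typing import Iterable, Iterator, Tuple


def running_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, population variance) after a single pass."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / count if count else 0.0
    return count, mean, variance


def sensor_readings() -> Iterator[float]:
    """Hypothetical stream: a generator can be consumed only once."""
    for value in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
        yield value


if __name__ == "__main__":
    print(running_stats(sensor_readings()))
```

Memory use stays constant regardless of how many values arrive, which is the property that matters when the data cannot be scanned a second time.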
 Aggregation and Statistics
◦ Data warehouse and OLAP
 Indexing, Searching, and Querying
◦ Keyword-based search
◦ Pattern matching (XML/RDF)
 Knowledge discovery
◦ Data Mining
◦ Statistical Modeling
• Mainframe
• SQL Server
• Oracle
• DB2
• Sybase
• Access, Excel, txt, etc.
• Teradata
• Emerging Market Data
• E-commerce
• Third Party Data
• Weather
• Stock Exchange
• Syndicated Data
• Social Media
• Chats
• Blogs
• Tweets
• Likes
• Followers
• Digital, Video
• Audio
• Geo-Spatial
Structured, Unstructured, Semi-Structured
Every minute we send 204 million emails, generate 1.8 million Facebook likes,
send 278 thousand Tweets, and upload 200,000 photos to Facebook.
Disk Speed:
• Traditional Hard Drive: 60-100 MB/s
• Solid State Disk: 250-500 MB/s
Processing Time for a 1 TB File:
• Traditional Hard Disk: 10,000 seconds (about 167 minutes, roughly 3 hours)
• Solid State Disk: 2,000 seconds (about 33 minutes)
So, the main problem (storage and analysis): disk speed is increasing almost
linearly, whereas Big Data is growing exponentially!
Other Problems:
Risk of machine failure, backup problems, expense.
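The processing times quoted for a 1 TB file follow from dividing file size by sequential throughput. The sketch below reproduces that arithmetic under two assumptions: throughput is measured in megabytes per second (which is what the quoted times imply), and the upper end of each speed range is used (100 MB/s for the hard drive, 500 MB/s for the solid state disk).

```python
def read_time_seconds(file_size_mb: float, throughput_mb_per_s: float) -> float:
    """Time to read a file sequentially at a constant throughput."""
    return file_size_mb / throughput_mb_per_s


ONE_TB_IN_MB = 1_000_000  # assumption: decimal units, 1 TB = 10^6 MB

hdd = read_time_seconds(ONE_TB_IN_MB, 100)  # upper end of the hard drive range
ssd = read_time_seconds(ONE_TB_IN_MB, 500)  # upper end of the solid state range

if __name__ == "__main__":
    print(f"HDD: {hdd:.0f} s (~{hdd / 60:.0f} min)")
    print(f"SSD: {ssd:.0f} s (~{ssd / 60:.0f} min)")
```

At the lower end of each range the times are correspondingly longer, so these figures are best cases for a single disk.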
The ‘Datafication’ of our World:
• Activities
• Conversations
• Words
• Voice
• Social Media
• Browser logs
• Photos
• Videos
• Sensors
• Etc.
Volume
Veracity
Variety
Velocity
Analysing Big Data:
• Text analytics
• Sentiment analysis
• Face recognition
• Voice analytics
• Movement analytics
• Etc.
Value
Turning Big Data into Value:
 MapReduce computation framework
 Hadoop
 Distributed Database
 NoSQL technologies
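As an illustration of the MapReduce idea listed above, here is a minimal single-machine sketch of a word count. It is plain Python, not the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen only to mirror the three conceptual stages.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big value"]))
```

In a real cluster the map and reduce calls would run in parallel on different machines, with the framework handling the shuffle; the sketch only shows how the work is divided.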
Applications of Big Data Analytics
Homeland Security
Smarter Healthcare
Multi-channel sales
Telecom
Manufacturing
Traffic Control
Trading Analytics
Search Quality
• Will be so overwhelmed
• Need the right people to solve the right problems
• Costs escalate too fast
• It isn't necessary to capture 100%
• For many sources of big data, the issue is privacy
• Self-regulation
• Legal regulation
 $15 billion on software firms only specializing in data
management and analytics.
 This industry on its own is worth more than $100 billion and
growing at almost 10% a year which is roughly twice as fast
as the software business as a whole.
 In February 2012, the open source analyst firm Wikibon
released the first market forecast for Big Data, listing $5.1B
revenue in 2012 with growth to $53.4B in 2017.
 The McKinsey Global Institute estimates that data volume is
growing 40% per year, and will grow 44x between 2009 and
2020.
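The two growth figures in the McKinsey estimate can be cross-checked with compound growth arithmetic. The sketch below is only a consistency check on the quoted numbers; the helper names are hypothetical, and it assumes growth compounds once per year over the 11 years from 2009 to 2020.

```python
def compound_growth(rate_per_year: float, years: int) -> float:
    """Total growth multiple after compounding a yearly rate."""
    return (1.0 + rate_per_year) ** years


def implied_annual_rate(total_multiple: float, years: int) -> float:
    """Yearly rate that produces a given total multiple."""
    return total_multiple ** (1.0 / years) - 1.0


YEARS = 2020 - 2009  # 11 compounding periods

if __name__ == "__main__":
    print(f"40% per year for {YEARS} years: {compound_growth(0.40, YEARS):.1f}x")
    print(f"44x over {YEARS} years implies {implied_annual_rate(44, YEARS):.1%} per year")
```

Growth of 40% per year compounds to roughly 40x over the period, and 44x corresponds to about 41% per year, so the two figures are broadly consistent with each other.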
 Silicon Valley, through social media, is
making Big Data a global phenomenon.
 Not only is Big Data “cool”, it also happens to be a
huge growth area.

Editor's Notes

  • #17  Quote practical examples