BIG DATA
B. Abinaya Bharathi,
II M.Sc. CS & IT,
Nadar Saraswathi College of Arts and Science, Theni.
SYNOPSIS
 What is big data?
 How big is it?
 Data generated by us
 Real-world examples
 The 5 Vs of big data
 Technology
 Applications
 Conclusion
WHAT IS BIG DATA?
 Big data refers to data whose sheer size is its defining feature.
 It is data with a very large volume.
 More precisely, it is a collection of data sets so large that they are difficult to process with conventional tools.
HOW BIG IS IT?
Byte - one seed
Kilobyte - a cup of seeds
Megabyte - 8 bags of seeds
Gigabyte - 3 trucks of seeds
Terabyte - 2 ships of seeds
Petabyte - covers the whole of India
Exabyte - covers the Asian continent
Zettabyte - fills the Indian Ocean
Yottabyte - the volume of the whole Earth
[Figure labels along the scale: a text file, desktop, Internet, big data, future]
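The ladder of units above can be made concrete with a short sketch. It assumes decimal (SI) units, where each step up is a factor of 1,000; the function names `bytes_in` and `humanize` are illustrative, not part of any library.

```python
# Decimal (SI) storage units: each step up the ladder is a factor of 1,000.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]
SYMBOLS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def bytes_in(unit: str) -> int:
    """Return how many bytes make up one of the named unit."""
    return 1000 ** UNITS.index(unit)


def humanize(n_bytes: float) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(n_bytes)
    index = 0
    while value >= 1000 and index < len(SYMBOLS) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {SYMBOLS[index]}"


if __name__ == "__main__":
    for position, unit in enumerate(UNITS):
        print(f"1 {unit} = 10^{3 * position} bytes")
    # 2.5 quintillion bytes, the daily figure quoted later in the deck
    print(humanize(2.5e18))
```

Each unit is a thousand times the previous one, so a yottabyte is 10^24 bytes.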
REAL-WORLD EXAMPLES
 Facebook
 Google
DATA GENERATED BY US
 About 2.5 quintillion bytes (2.5 exabytes) of data are created each day.
 Google processes more than 40,000 searches every second (about 3.5 billion searches per day).
 Five new Facebook profiles are created every second.
 Every minute, 510,000 comments are posted and 293,000 statuses are updated on Facebook.
 95 million photos and videos are uploaded to Facebook per day.
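The per-second, per-minute, and per-day figures above can be cross-checked with simple arithmetic. The sketch below only converts the rates quoted on this slide; the variable names are illustrative.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Google: 40,000 searches per second, converted to a daily total
searches_per_second = 40_000
searches_per_day = searches_per_second * SECONDS_PER_DAY

# Facebook: five new profiles per second, converted to a daily total
profiles_per_second = 5
profiles_per_day = profiles_per_second * SECONDS_PER_DAY

# Facebook: 510,000 comments per minute, converted to a per-second rate
comments_per_minute = 510_000
comments_per_second = comments_per_minute / SECONDS_PER_MINUTE

# 2.5 quintillion bytes per day, converted to bytes per second
bytes_per_day = 2.5e18
bytes_per_second = bytes_per_day / SECONDS_PER_DAY

print(f"Searches per day:    {searches_per_day:,}")
print(f"New profiles per day: {profiles_per_day:,}")
print(f"Comments per second:  {comments_per_second:,.0f}")
print(f"Data per second:      {bytes_per_second / 1e12:.1f} TB")
```

The search figure works out to 3,456,000,000 per day, which matches the rounded "3.5 billion" on the slide.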
TECHNOLOGY
 Big data brings a number of challenges.
 About 80% of data is unstructured.
 How can that data be given structure?
 How can it be analyzed and stored?
 The top technologies used to store and analyze big data are:
 Hadoop
 NoSQL
 Hive
 Sqoop
HADOOP
 Developed by the Apache Software Foundation.
 It is an open-source framework written in Java.
 The framework runs on a cluster and lets us process data in parallel across all of its nodes.
 The Hadoop Distributed File System (HDFS) is Hadoop's storage system.
 HDFS splits the data into blocks and distributes them among the different nodes in the cluster.
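How HDFS splits and distributes data can be illustrated with a toy simulation. This is a simplified sketch, not Hadoop code: it assumes the common defaults of a 128 MB block size and a replication factor of 3, and it places replicas round-robin, whereas real HDFS uses a rack-aware placement policy. The function `place_blocks` and the node names are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # assumed HDFS default block size (recent Hadoop releases)
REPLICATION = 3      # assumed HDFS default replication factor


def place_blocks(file_size_mb, nodes,
                 block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Split a file into fixed-size blocks and assign each block's
    replicas to distinct nodes, round-robin (a simplification)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block_id in range(n_blocks):
        # Consecutive offsets modulo the node count give distinct nodes
        placement[block_id] = [
            nodes[(block_id + offset) % len(nodes)]
            for offset in range(replication)
        ]
    return placement


if __name__ == "__main__":
    cluster = ["node1", "node2", "node3", "node4"]
    for block_id, replicas in place_blocks(1000, cluster).items():
        print(f"block {block_id}: {', '.join(replicas)}")
```

A 1,000 MB file becomes 8 blocks (the last one only partly full), and every block lives on three different nodes, so losing one node does not lose data.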
NOSQL
 NoSQL stands for "Not Only SQL".
 NoSQL databases are designed to handle unstructured data.
 They store data without requiring a fixed schema.
 NoSQL databases can perform better than relational databases when storing very large amounts of data.
 Free, open-source NoSQL databases include:
 MongoDB
 CouchDB
 HBase
 Perst
 Cassandra
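The idea of storing data with no fixed schema can be shown without any database at all. The sketch below uses plain Python dictionaries as stand-ins for documents in a document store; the `collection` contents and the `find` helper are hypothetical and do not reproduce the API of MongoDB or any other product.

```python
import json

# A hypothetical "collection": every document is a dict, and the
# documents do not share a common set of fields (no fixed schema).
collection = [
    {"_id": 1, "type": "post", "author": "asha", "text": "Hello", "tags": ["intro"]},
    {"_id": 2, "type": "photo", "author": "ravi", "url": "photo-001.jpg", "width": 1024},
    {"_id": 3, "type": "comment", "author": "asha", "text": "Nice photo", "reply_to": 2},
]


def find(docs, **criteria):
    """Return the documents whose fields match every key/value pair given.
    Documents that lack a field simply do not match on it."""
    return [
        doc for doc in docs
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    for doc in find(collection, author="asha"):
        print(json.dumps(doc))
```

In a relational table, the photo's `width` and the comment's `reply_to` would each need a column that is empty for most rows; here each document carries only the fields it needs.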
HIVE
 Hive is a data warehouse system for managing and querying distributed data in Hadoop.
 It offers an SQL-like query language, HiveQL (HQL), to access big data.
 It is used primarily for data mining.
 It runs on top of Hadoop.
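To give a feel for the SQL-like style of querying, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in. It is not Hive: on a real cluster the query would be written in HiveQL and executed as distributed jobs over data in Hadoop. The `page_views` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for a Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [
        ("u1", "home"),
        ("u2", "home"),
        ("u1", "search"),
        ("u3", "home"),
        ("u2", "profile"),
    ],
)

# A typical analytical query: count the views of each page.
query = """
    SELECT page, COUNT(*) AS views
    FROM page_views
    GROUP BY page
    ORDER BY views DESC, page
"""
rows = conn.execute(query).fetchall()

for page, views in rows:
    print(f"{page}: {views}")
```

The same SELECT / GROUP BY / ORDER BY shape is what an analyst would write against big data, without having to program the distributed processing by hand.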
SQOOP
 Sqoop connects Hadoop with various relational databases to transfer data.
 It is used to transfer structured data into Hadoop or Hive.
THE 5 Vs OF BIG DATA
 Volume: the size of the data generated that needs to be analyzed.
 Velocity: the speed at which new data is generated and the speed at which data moves.
 Value: the meaningful output, or worth, of the data being extracted. Having endless amounts of data is one thing, but unless it can be turned into value it is useless.
 Variety: the types of data that can be analyzed. Previously we used an RDBMS, which holds structured data that is easy to analyze. Today, however, about 80% of data is unstructured, and big data technology allows structured and unstructured data to be collected, stored, and used together.
 Veracity: the trustworthiness of the data. Just how accurate is all this data?
APPLICATIONS
 Government
 Media and entertainment
 Education
 Health care
 IoT
 Transportation
CONCLUSION
 Companies are turning to big data to expand into new markets and improve customer relations.
 The use of analytics can improve analysts' knowledge of their industry.
 There is a large demand for big data analytics across different fields and industries.
 Big data therefore plays an important role in today's IT world.
THANK YOU