Overview of Big Data
1. BIG DATA
B . Abinaya Bharathi,
II-M.Sc Cs&IT.,
Nadar Saraswathi college Of Arts and Science, Theni.
2. SYNOPSIS
What is big data?
How big it is...?
Data generated by us
Real time example
5 V of big data
Technology
Application
Conclusion
3. WHAT IS BIG DATA ?
Big data is, at its simplest, a matter of data size.
Data with large volume.
A collection of data sets so large that it
is difficult to process with conventional tools.
4. HOW BIG IT IS!!
Byte - one seed
Kilobyte - a cup of seeds
Megabyte - 8 bags of seeds
Gigabyte - 3 trucks of seeds
Terabyte - 2 ships of seeds
Petabyte - the whole volume of India
Exabyte - the volume of the Asian continent
Zettabyte - fills the Indian Ocean
Yottabyte - the volume of the whole Earth
Scale labels on the slide: a text file, desktop, Internet, big data, future.
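The seed analogy above can be set beside the actual numbers. The following is a minimal sketch, assuming decimal (SI) prefixes where each unit is 1000 times the previous one; some systems use powers of 1024 instead. The function names `unit_size` and `humanize` are illustrative and do not come from the slides.

```python
# Decimal (SI) byte units: each step is 1000 times the previous one.
# Assumption: powers of 1000, not the binary (1024-based) convention.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def unit_size(name: str) -> int:
    """Return the number of bytes in one unit of the given name."""
    return 1000 ** UNITS.index(name)


def humanize(n_bytes: float) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(n_bytes)
    for name in UNITS:
        # Stop at the first unit where the value drops below 1000,
        # or at the last unit in the list.
        if value < 1000 or name == UNITS[-1]:
            suffix = "" if value == 1 else "s"
            return f"{value:g} {name}{suffix}"
        value /= 1000


if __name__ == "__main__":
    for name in UNITS:
        print(f"1 {name:<10} = 10^{3 * UNITS.index(name)} bytes")
```

For example, `humanize(2.5e18)` expresses 2.5 quintillion bytes as 2.5 exabytes.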
6. DATA GENERATED BY US
2.5 quintillion bytes of data are created each day.
Google now processes more than 40,000 searches every
second (about 3.5 billion searches per day).
Five new Facebook profiles are created every
second.
Every minute, 510,000 comments are posted and
293,000 statuses are updated on Facebook.
95 million photos and videos are uploaded to Facebook
per day.
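The per-second and per-minute figures above can be converted to daily totals with simple arithmetic. This is a small sketch that only rescales the rates quoted on the slide; the helper names are illustrative, and the rates themselves are taken as given, not verified.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
MINUTES_PER_DAY = 24 * 60        # 1,440


def per_day_from_per_second(rate: int) -> int:
    """Scale a per-second rate to a per-day total."""
    return rate * SECONDS_PER_DAY


def per_day_from_per_minute(rate: int) -> int:
    """Scale a per-minute rate to a per-day total."""
    return rate * MINUTES_PER_DAY


# Rates quoted on the slide.
searches_per_day = per_day_from_per_second(40_000)   # 3,456,000,000
profiles_per_day = per_day_from_per_second(5)        # 432,000
comments_per_day = per_day_from_per_minute(510_000)  # 734,400,000
statuses_per_day = per_day_from_per_minute(293_000)  # 421,920,000

if __name__ == "__main__":
    print(f"Searches per day: {searches_per_day:,}")
    print(f"New profiles per day: {profiles_per_day:,}")
    print(f"Comments per day: {comments_per_day:,}")
    print(f"Status updates per day: {statuses_per_day:,}")
```

The first result, roughly 3.46 billion, is consistent with the "3.5 billion searches per day" figure on the slide.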
7. TECHNOLOGY
Big data brings a number of challenges:
80% of data is unstructured.
How to structure that data, and
how to analyze and store it.
The top technologies used to store and analyze big data are
Hadoop
NoSQL
Hive
Sqoop
8. HADOOP
Developed by the Apache Software Foundation.
It is a framework written in Java.
This framework runs on a cluster and allows us to
process data across all nodes.
Hadoop Distributed File System (HDFS) - the storage system of
Hadoop.
HDFS splits the data and distributes it among different
nodes in the cluster.
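The idea of splitting data and spreading it over nodes can be illustrated with a toy model. This sketch does not use the Hadoop API; the function names, the tiny block size, and the round-robin placement are assumptions made for illustration. Real HDFS uses much larger blocks and a more elaborate placement policy, and the `replication` parameter (copies per block) goes beyond what the slide states.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list[str],
                 replication: int = 1) -> dict[str, list[int]]:
    """Assign each block index to `replication` distinct nodes, round-robin.

    Toy placement only: real HDFS does not place blocks this way.
    """
    if replication > len(nodes):
        raise ValueError("replication cannot exceed the number of nodes")
    placement: dict[str, list[int]] = {node: [] for node in nodes}
    for idx in range(num_blocks):
        for r in range(replication):
            node = nodes[(idx + r) % len(nodes)]
            placement[node].append(idx)
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    layout = place_blocks(len(blocks), ["node1", "node2", "node3"],
                          replication=2)
    print(blocks)
    print(layout)
```

With two copies per block, every block in this toy layout lives on two different nodes, so losing one node does not lose any block.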
9. NOSQL
Not Only SQL
NoSQL (Not Only SQL) databases handle unstructured data.
NoSQL databases store unstructured data with no particular schema.
NoSQL can give better performance when storing very large amounts of data.
Free, open-source NoSQL databases include
MongoDB
CouchDB
HBase
Perst
Cassandra
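What "no particular schema" means can be shown with plain Python dictionaries standing in for documents. This is only a sketch of the idea and not the API of any of the databases listed; the sample records and the helpers `find` and `fields_used` are hypothetical.

```python
# Three "documents" in one collection, each with a different set of fields.
# No table definition forces them to share the same columns.
documents = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "tags": ["student", "cricket"]},
    {"_id": 3, "title": "Sensor reading", "payload": {"temp_c": 31.5}},
]


def find(collection: list[dict], **criteria) -> list[dict]:
    """Return documents whose fields match all given key/value pairs."""
    return [doc for doc in collection
            if all(doc.get(key) == value for key, value in criteria.items())]


def fields_used(collection: list[dict]) -> list[str]:
    """List every field name that appears in at least one document."""
    return sorted({key for doc in collection for key in doc})


if __name__ == "__main__":
    print(find(documents, name="Ravi"))
    print(fields_used(documents))
```

A document that lacks a queried field is simply skipped by `find`, which is the flexibility a fixed relational schema does not offer.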
10. Hive
Hive is a distributed data management system for Hadoop.
It provides an SQL-like query language, HiveQL (HQL), to access big data.
It is used primarily for data mining.
It runs on top of Hadoop.
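To give a feel for what an SQL-like query looks like, the sketch below runs a simple aggregation with Python's built-in `sqlite3` module. SQLite is only a stand-in here: Hive itself is not involved, and the table `page_views` and its rows are made up for illustration. The point is the shape of the query (SELECT, GROUP BY, ORDER BY), which is the kind of statement an SQL-like language is meant to express.

```python
import sqlite3

# Hypothetical sample data: which user viewed which page.
ROWS = [
    ("u1", "home"), ("u2", "home"), ("u1", "cart"),
    ("u3", "home"), ("u2", "cart"), ("u3", "search"),
]

# Count views per page, most viewed first.
QUERY = """
SELECT page, COUNT(*) AS views
FROM page_views
GROUP BY page
ORDER BY views DESC, page
"""


def top_pages() -> list[tuple[str, int]]:
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE page_views (user_id TEXT, page TEXT)")
        conn.executemany("INSERT INTO page_views VALUES (?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for page, views in top_pages():
        print(page, views)
```

In a Hadoop setting the same style of query would be run over data stored across the cluster instead of an in-memory table.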
Sqoop
This tool connects Hadoop with various relational databases to
transfer data.
It is used to transfer structured data to Hadoop or Hive.
12. Volume
The size of the data generated that needs to be analyzed.
Velocity
The speed at which new data is generated, and the speed at which
data moves.
Value
Meaningful output:
the worth of the data being extracted.
Having endless amounts of data is one thing, but unless it can be
turned into value it is useless.
13. Variety
The types of data that can be analyzed. Previously we used an RDBMS, which
holds structured data, so the data was easy to analyze. Nowadays
80% of data is unstructured. Big data technology now
allows structured and unstructured data to be collected, stored,
and used simultaneously.
Veracity
The trustworthiness of the data. Just how accurate is all this data?
21. CONCLUSION
Companies are turning to big data in order to expand into new
markets and improve customer relations.
The use of analytics can improve the industry knowledge of
analysts.
There is huge demand for big data analytics in different fields
and industries.
So the role of big data in the present IT world is highly valued.