Mrs. D. Suja Mary
Assistant Professor
Nanjil Catholic College of Arts and Science,
Kaliyakkavilai
What is Data?
• Data is the quantities, characters, or symbols on which a
computer performs operations. It may be stored and transmitted
in the form of electrical signals and recorded on magnetic,
optical, or mechanical recording media.
What is Big Data?
• Big Data is a collection of data that is huge in volume and
keeps growing exponentially with time. Its size and complexity
are so large that traditional data management tools cannot
store or process it efficiently. In short, Big Data is still
data, but at a very large scale.
What Is Data Analytics?
• The term data analytics refers to the process of examining
datasets to draw conclusions about the information they contain.
Data analytics techniques take raw data and uncover patterns in
it to extract valuable insights.
• Data scientists and analysts use data analytics techniques in
their research, and businesses use them to inform their
decisions.
• Data analysis can help companies better understand their
customers, evaluate their ad campaigns, personalize content,
create content strategies, and develop products.
• Businesses can use data analytics to boost business
performance and improve their bottom line.
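As a small illustration of evaluating ad campaigns from raw data, the sketch below totals impressions and clicks per campaign and compares click-through rates. The records, campaign names, and variable names are all hypothetical and made up for this example; it is only one simple way to uncover a pattern, not a full analytics workflow.

```python
from collections import defaultdict

# Hypothetical raw records: (campaign, impressions, clicks).
raw_records = [
    ("email", 1000, 50),
    ("social", 2000, 40),
    ("email", 500, 35),
    ("search", 800, 64),
    ("social", 1000, 20),
]

# Aggregate impressions and clicks per campaign.
totals = defaultdict(lambda: [0, 0])
for campaign, impressions, clicks in raw_records:
    totals[campaign][0] += impressions
    totals[campaign][1] += clicks

# Click-through rate = clicks / impressions for each campaign.
ctr = {
    campaign: clicks / impressions
    for campaign, (impressions, clicks) in totals.items()
}

# The campaign with the highest click-through rate.
best = max(ctr, key=ctr.get)

for campaign, rate in sorted(ctr.items()):
    print(f"{campaign}: {rate:.1%}")
print("Best campaign:", best)
```

With these made-up numbers, the search campaign has the highest click-through rate, which is the kind of insight a business could use to decide where to spend next.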
• Analysis techniques give businesses access to insights that can
help them improve their performance.
• As the importance of data analytics in the business world
increases, it becomes more critical that our company understands
how to implement it. Some benefits of data analytics include:
• 1. Improved Decision Making
Companies can use the insights they gain from data analytics to
inform their decisions, leading to better outcomes.
• 2. More Effective Marketing
Data analytics also gives us useful insights into how our campaigns
are performing, so that we can fine-tune them for optimal outcomes.
• 3. Better Customer Service
Data analytics provides us with more insights into our customers,
allowing us to tailor customer service to their needs, provide more
personalization, and build stronger relationships with them.
• 4. More Efficient Operations
Data analytics can help us streamline our processes, save money,
and boost our bottom line. When we have an improved
understanding of what our audience wants, we waste less time
creating ads and content that don’t match our audience’s
interests.
• A database (DB) is an organized collection of structured data,
that is, a collection of related information.
• A DB stores and accesses data electronically.
• A database is stored as a file or a set of files on magnetic
disk or tape, optical disk, or some other secondary storage
device.
• It is a data structure that stores organized information.
• Databases are administered to facilitate the storage,
retrieval, modification, and deletion of data.
• A database supports a variety of data-processing operations.
• Databases support the storage and manipulation of information.
• Databases make information management simple.
• Any database developer who knows the required query syntax
can work on the database.
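The four operations listed above (storage, retrieval, modification, and deletion) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a recommended design: the in-memory database, the students table, and its columns and rows are hypothetical names chosen for the example.

```python
import sqlite3

# An in-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)"
)

# Storage: insert rows.
conn.executemany(
    "INSERT INTO students (name, marks) VALUES (?, ?)",
    [("Anu", 82), ("Ben", 67), ("Chitra", 91)],
)

# Retrieval: read the rows back with a query.
rows_before = conn.execute(
    "SELECT name, marks FROM students ORDER BY id"
).fetchall()

# Modification: update one row.
conn.execute("UPDATE students SET marks = ? WHERE name = ?", (70, "Ben"))

# Deletion: remove one row.
conn.execute("DELETE FROM students WHERE name = ?", ("Anu",))

rows_after = conn.execute(
    "SELECT name, marks FROM students ORDER BY id"
).fetchall()

print(rows_before)
print(rows_after)
conn.close()
```

The SQL statements are the "syntax" a developer needs to know; the same four kinds of statement apply to most relational databases.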
• A DB is a collection of related data. There are two types of
database management system: the Relational Database
Management System and the Non-Relational Database
Management System.
• If a database can store and process a very large volume of
data, then Big Data can be stored and processed through
relational or non-relational databases.
• Big Data is not going to replace databases. In one form or
another, we will keep using SQL databases to store and process
Big Data. In this regard, Big Data and DB are separate
concepts: Big Data describes the data itself, while a DB is a
system for storing and processing it.
Given below are the differences between Big Data and Database:
• Big Data is a term applied to data sets whose size or type is
beyond the ability of traditional relational databases: a
traditional database cannot capture, manage, and process such a
high volume of data with low latency. A database is a collection
of information that is organized so that it can be easily
captured, accessed, managed, and updated.
• Big Data refers to technologies and initiatives that involve data
that is too diverse, too fast-changing, or too massive for
conventional skills, technologies, and infrastructure to address
efficiently. A database management system (DBMS) extracts
information from the database in response to queries, but only
under restricted conditions.
• Big Data can contain any variety of data, while the data in a DB
is defined through a schema.
• Big Data is difficult to store and process, while in databases
such as SQL databases data can be stored and processed easily.
• Raw data is data that has been collected from a source and is
still in its initial state. It has not yet been processed, that
is, cleaned, organized, or visually presented. Raw data can be
manually written down or typed, recorded, or automatically
input by a machine. You can find raw data in a variety of
places, including databases, files, spreadsheets, and even on
source devices, such as a camera.
• Data analysts, software, and artificial intelligence (AI) all
work to transform raw data into processed data.
• They start by organizing and cleaning the raw data. One of the
most important parts of this process is removing outliers and
duplicates from the data set.
• The next step is an initial analysis that may involve data
manipulation. In particular, when the raw data consists of
human responses to a question, analysts look closely at those
responses to determine whether respondents answered
inaccurately in a way that would change the results.
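The cleaning step described above can be sketched as follows. The survey records are hypothetical, and the outlier rule used here (the common 1.5 × IQR rule) is only one of several possible choices; the slides do not say which rule to use.

```python
import statistics

# Hypothetical raw survey records: (respondent_id, reported_age).
raw = [
    (1, 34), (2, 29), (2, 29), (3, 41), (4, 290),
    (5, 38), (6, 31), (6, 31), (7, 36),
]

# Step 1: remove exact duplicate records, keeping the original order.
deduped = list(dict.fromkeys(raw))

# Step 2: flag outliers with the 1.5 * IQR rule (an assumed choice).
ages = [age for _, age in deduped]
q1, _, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only the records whose age lies inside the accepted range.
cleaned = [(rid, age) for rid, age in deduped if low <= age <= high]

print("Records before cleaning:", len(raw))
print("After removing duplicates:", len(deduped))
print("After removing outliers:", len(cleaned))
```

Here the age of 290 is almost certainly a typing mistake, which is the kind of inaccurate response an analyst would want to catch before drawing conclusions.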
• Raw data serves several purposes, particularly in businesses
where full data visibility is key to statistical and predictive
analytics.
Here are a few reasons why businesses rely heavily on raw data
sources:
• Raw data is the starting point of all data and the initial source
of data-based decisions. We can’t make visually compelling charts
or overarching analytical statements about processed data until
we’ve worked through all of the raw data.
• We can trust that raw data is complete. We don’t have to worry
that something has been removed or adjusted, because the data
has not yet been manipulated by humans or machines, although it
may still contain errors made at collection time.
• Many AI and machine learning methods work best with access to
the raw data. Once the data has been heavily processed or
summarized, details these technologies could learn from may be
lost.
• Raw data gives us a backup resource. We can check our work
and go back to the source after processing and manipulating our
data sets. It’s all there for our reference if we run into a
problem and need a new analysis.
All data inside a computer is transmitted as a
series of electrical signals that are either on
or off. Therefore, for a computer to be able to
process any kind of data, including text, images,
and sound, the data must be converted into
binary form.
• Data Representation
Types of data:
– Numbers
– Text
– Images
– Audio & Video
• Text
• When any key on a keyboard is pressed, it needs to be
converted into a binary number so that it can be processed by
the computer and the typed character can appear on the
screen.
• A code in which each number represents a character can be
used to convert text into binary. One code we can use for this
is called ASCII. The ASCII code assigns a binary number to each
character on the keyboard.
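As a minimal sketch of this conversion, the functions below turn text into binary ASCII codes and back again. The function names are made up for the example. ASCII itself defines 7-bit codes; they are shown here padded to 8 bits, since characters are usually stored one per byte.

```python
def text_to_binary(text):
    """Return the 8-bit binary string for each character's ASCII code."""
    # encode("ascii") raises an error for characters outside ASCII.
    return [format(code, "08b") for code in text.encode("ascii")]


def binary_to_text(bits):
    """Convert a list of binary strings back into text."""
    return "".join(chr(int(b, 2)) for b in bits)


encoded = text_to_binary("Hi")
print(encoded)                  # ['01001000', '01101001']
print(binary_to_text(encoded))  # Hi
```

The letter H has the ASCII code 72 and the letter i has the code 105, which is where the two binary numbers come from.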
• Images also need to be converted into binary in
order for a computer to process them so that
they can be seen on our screen. Digital images
are made up of pixels. Each pixel in an image is
represented by a binary number.
• If we say that 1 is black (or on) and 0 is white (or
off), then a simple black and white picture can
be created using binary.
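The idea can be sketched with a tiny picture stored as rows of 1s and 0s. The 5 × 5 bitmap and the render function are hypothetical; the characters "#" and "." simply stand in for black and white pixels on a text screen.

```python
# A 5 x 5 black and white picture: 1 is black (on), 0 is white (off).
bitmap = [
    "00100",
    "01010",
    "10001",
    "11111",
    "10001",
]


def render(rows):
    """Draw each 1 as '#' (black) and each 0 as '.' (white)."""
    return "\n".join(
        "".join("#" if bit == "1" else "." for bit in row) for row in rows
    )


picture = render(bitmap)
print(picture)
```

Each pixel here needs only one bit. Pictures with shades of grey or colour need more bits per pixel, but the principle is the same.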
• The terms audio and video commonly refer to
time-based media formats for storing sound or
music and moving pictures. Digital audio and
video are encoded and decoded by codecs, and
the result can be uncompressed, losslessly
compressed, or lossy compressed, depending
on the desired quality and use case.
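The difference between lossless and lossy compression can be sketched with a toy example. The sample values are hypothetical, and the "lossy" step below (discarding the low 4 bits of each sample) is a deliberately crude stand-in; real audio and video codecs are far more sophisticated. The lossless step uses Python's standard zlib module.

```python
import zlib

# Hypothetical 8-bit audio samples, repeated to give the compressor
# some redundancy to work with.
samples = bytes([10, 10, 10, 12, 12, 13, 13, 13, 13, 200, 200, 201] * 50)

# Lossless: the original samples can be recovered exactly.
packed = zlib.compress(samples)
restored = zlib.decompress(packed)

# Lossy (toy version): throw away the low 4 bits of every sample.
# The detail removed here cannot be recovered afterwards.
quantized = bytes(s & 0xF0 for s in samples)

print("Original size:", len(samples), "bytes")
print("Lossless size:", len(packed), "bytes")
print("Lossless round trip is exact:", restored == samples)
print("Lossy version equals original:", quantized == samples)
```

Lossless compression trades nothing but processing time for a smaller size, while lossy compression gives up some quality, which is why the choice depends on the use case.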
• Connected objects are another source of raw
data: they collect large amounts of data
through their sensors.
• The Internet of Things (IoT) contributes to
doubling the size of the digital universe every 2
years. It could reach 44,000 billion gigabytes in
2020, 10 times more than in 2013.
• Connected objects thus extend the scope of the
internet, allowing any object, machine, or living
element to transmit information about its
environment and, eventually, to be activated
remotely.
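The growth figures quoted above can be checked with a little arithmetic. The sketch below uses only the numbers on the slide (44,000 billion gigabytes in 2020, 10 times the 2013 size, doubling every 2 years) and assumes decimal units, where 1 zettabyte is 10^12 gigabytes. The variable names are made up for the example.

```python
GB_PER_ZB = 10**12  # decimal units: 1 ZB = 10^21 bytes = 10^12 GB

# Figures taken from the slide.
size_2020_gb = 44_000 * 10**9        # 44,000 billion gigabytes
size_2013_gb = size_2020_gb // 10    # "10 times more than in 2013"

size_2020_zb = size_2020_gb / GB_PER_ZB
size_2013_zb = size_2013_gb / GB_PER_ZB

# Strict doubling every 2 years over the 7 years from 2013 to 2020.
doubling_factor = 2 ** ((2020 - 2013) / 2)
projected_2020_zb = size_2013_zb * doubling_factor

print(f"2020 size: {size_2020_zb:.1f} ZB")
print(f"2013 size: {size_2013_zb:.1f} ZB")
print(f"Growth factor from doubling every 2 years: {doubling_factor:.1f}")
print(f"Projected 2020 size from doubling: {projected_2020_zb:.1f} ZB")
```

Doubling every 2 years for 7 years gives a growth factor of a little over 11, so the two statements on the slide (doubling every 2 years and a tenfold increase) are roughly consistent with each other.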
Unit I: Big Data Introduction
