UNIT_1-BD.pptx
1. Big Data Analytics
Government Arts and Science College
Tittagudi-606106
Department of Computer Science
DECS 43A – Big Data Analysis
II Year IV Semester
Dr. S. P. Ponnusamy
Assistant Professor and Head
Unit -1
Introduction to Big Data
Data
Characteristics of data
Types of digital data: unstructured, semi-structured and structured
Sources of data
Working with unstructured data
Evolution and Definition of big data
Characteristics and Need of big data
Challenges of big data
Data environment versus big data environment
Data
• The quantities, characters, or symbols on which operations are performed by a
computer, which may be stored and transmitted in the form of electrical signals and
recorded on magnetic, optical, or mechanical recording media.
Big Data
• Big Data is a collection of data that is huge in volume and still growing exponentially
with time.
• It is data of such size and complexity that no traditional data management tool
can store or process it efficiently.
• Working with it involves data mining, data storage, data analysis, data sharing, and data
visualization.
Data vs Big Data
Data Growth
• 1,024 bytes = 1 kilobyte (KB)
• 1,024 KB = 1 megabyte (MB)
• 1,024 MB = 1 gigabyte (GB)
• 1,024 GB = 1 terabyte (TB)
• 1,024 TB = 1 petabyte (PB)
• 1,024 PB = 1 exabyte (EB)
• 1,024 EB = 1 zettabyte (ZB)
• 1,024 ZB = 1 yottabyte (YB)
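The ladder of units above can be applied mechanically: keep dividing by 1,024 until the value drops below 1,024. The following is a minimal Python sketch that assumes the 1,024-based convention used on this slide (strictly, the IEC names for these binary units are kibibyte, mebibyte, and so on). The helper name `human_readable` is hypothetical and not part of any library.

```python
# Unit names in ascending order, each 1,024 times the previous one.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value is below 1,024,
        # or at YB, the largest unit in the list.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


print(human_readable(1024))           # 1.00 KB
print(human_readable(5 * 1024 ** 4))  # 5.00 TB
```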
Types of Data
Structured Data
• This is data in an organized form (e.g., in rows and columns) that can be
easily used by a computer program.
• Relationships exist between entities of data, such as classes and their objects.
• Data stored in databases is an example of structured data.
• Structured data is also called relational data.
• It is split into multiple tables so that each entity is described by a single
record, which improves the integrity of the data.
• Structured Query Language (SQL) is used to bring the data together again.
• Structured data is easy to enter, query, and analyze.
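As an illustration of the points above, the sketch below stores two related entities in separate tables and uses an SQL join to bring them together. It uses Python's standard `sqlite3` module with an in-memory database; the `department` and `student` tables and their rows are hypothetical sample data, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        dept_id    INTEGER NOT NULL REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Computer Science'), (2, 'Mathematics');
    INSERT INTO student VALUES (101, 'Asha', 1), (102, 'Ravi', 2), (103, 'Meena', 1);
""")

# Each department is stored once; the join reassembles student and department.
rows = conn.execute("""
    SELECT s.name, d.name
    FROM student AS s
    JOIN department AS d ON d.dept_id = s.dept_id
    ORDER BY s.student_id
""").fetchall()

for student_name, dept_name in rows:
    print(student_name, "-", dept_name)
```

Because each department name lives in exactly one row, renaming a department is a single update rather than one per student.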
Structured Data - Sources
Ease with Structured Data
Semi-Structured Data
• This is data that does not conform to a formal data model but still has some structure.
• However, it is not in a form that a computer program can use easily.
• Examples: emails, XML, markup languages such as HTML, JSON documents, etc.
• Metadata for this data is available but is not sufficient.
• It is sometimes loosely called NoSQL data, because it is often stored in NoSQL databases.
Semi-Structured Data - Sources
Semi-Structured Data – XML Example
<ProgrammerDetails>
  <FirstName>Jane</FirstName>
  <LastName>Doe</LastName>
  <CodingPlatforms>
    <CodingPlatform Type="Fav">GeeksforGeeks</CodingPlatform>
    <CodingPlatform Type="2ndFav">Code4Eva!</CodingPlatform>
    <CodingPlatform Type="3rdFav">CodeisLife</CodingPlatform>
  </CodingPlatforms>
</ProgrammerDetails>
<!--The 2ndFav and 3rdFav Coding Platforms are imaginative because Geeksforgeeks is
the best!-->
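The tags in the XML above are what give the record "some structure": a program can navigate by element and attribute name even though no table schema exists. A minimal sketch using Python's standard `xml.etree.ElementTree` module is shown below; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

xml_text = """
<ProgrammerDetails>
  <FirstName>Jane</FirstName>
  <LastName>Doe</LastName>
  <CodingPlatforms>
    <CodingPlatform Type="Fav">GeeksforGeeks</CodingPlatform>
    <CodingPlatform Type="2ndFav">Code4Eva!</CodingPlatform>
    <CodingPlatform Type="3rdFav">CodeisLife</CodingPlatform>
  </CodingPlatforms>
</ProgrammerDetails>
""".strip()

root = ET.fromstring(xml_text)

# Child elements are looked up by tag name.
first = root.findtext("FirstName")
last = root.findtext("LastName")

# Attributes carry extra metadata; here the Type attribute becomes a dict key.
platforms = {p.get("Type"): p.text for p in root.iter("CodingPlatform")}

print(first, last)
print(platforms)
```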
Semi-Structured Data – JSON Example
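The JSON figure from this slide is not reproduced in the text. As a stand-in, the sketch below expresses the programmer record from the XML example as JSON using Python's standard `json` module. The mapping is an assumption: in particular, the `Name` key is hypothetical, since XML element text has no direct JSON equivalent.

```python
import json

# Hypothetical JSON rendering of the XML example's programmer record.
record = {
    "FirstName": "Jane",
    "LastName": "Doe",
    "CodingPlatforms": [
        {"Type": "Fav", "Name": "GeeksforGeeks"},
        {"Type": "2ndFav", "Name": "Code4Eva!"},
        {"Type": "3rdFav", "Name": "CodeisLife"},
    ],
}

text = json.dumps(record, indent=2)  # serialize to a JSON document
print(text)

parsed = json.loads(text)            # parse it back into Python objects
favourite = next(
    p["Name"] for p in parsed["CodingPlatforms"] if p["Type"] == "Fav"
)
print(favourite)
```

As with XML, the keys describe the data, but nothing forces two records to share the same set of keys.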
Characteristics of Semi-Structured Data
Unstructured Data
• This is data that does not conform to a data model or is not in a form that can be
used easily by a computer program.
• The data cannot be stored in rows and columns as in a database.
• The data does not follow any semantics or rules.
• The data lacks any particular format or sequence.
• The data has no easily identifiable structure.
• Because it lacks an identifiable structure, it cannot be used easily by computer programs.
• About 80–90% of an organization's data is in this format.
• Examples: memos, chat rooms, PowerPoint presentations, images, videos, letters, research
reports, white papers, the body of an email, etc.
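One common first step in making unstructured text usable by a program is to pull out the few fields that do follow a pattern. The sketch below does this with regular expressions from Python's standard `re` module. The memo text is hypothetical, and the patterns are deliberately simple, so they will miss many real-world variants.

```python
import re

# Hypothetical free-text memo: no rows, columns, or tags.
memo = (
    "Team, the budget review moved to 2024-03-15. "
    "Send slides to priya@example.com and cc raj@example.org by 2024-03-12."
)

# Simplified patterns for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# The result is a small structured record derived from the unstructured text.
extracted = {
    "emails": EMAIL.findall(memo),
    "dates": DATE.findall(memo),
}
print(extracted)
```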
Unstructured Data – Example
Unstructured Data – Sources
Unstructured Data – Issues
Dealing with Unstructured Data
Definition of Big Data
Big Data is high-volume, high-velocity, and high-variety information assets that demand
cost-effective, innovative forms of information processing for enhanced insight and
decision making.
Source: Gartner IT Glossary
The definition has three parts:
• High-volume, high-velocity, high-variety
• Cost-effective, innovative forms of information processing
• Enhanced insight and decision making
Characteristics of Data
1. Composition: The composition of data deals with the structure of data, that is,
the sources of data, the granularity, the types, and the nature of data as to
whether it is static or real-time streaming.
2. Condition: The condition of data deals with the state of data, that is, "Can one
use this data as is for analysis?" or "Does it require cleansing for further
enhancement and enrichment?"
3. Context: The context of data deals with "Where has this data been generated?",
"Why was this data generated?", "How sensitive is this data?", "What are the events
associated with this data?" and so on.
Evolution of Big Data
Why Big Data?
More data
→ More accurate analysis
→ More confidence in decision making
→ Greater operational efficiencies, cost reduction, time reduction, new product
development, optimized offerings, etc.
Need for Big Data
Characteristics of Big Data/What is Big Data?
• Volume: the size and amount of data that companies manage and analyze.
• Variety: the diversity and range of data types, including unstructured,
semi-structured and structured data.
• Velocity: the speed at which companies receive, store and manage data,
e.g., the number of social media posts or search queries received within a
day, hour or other unit of time.
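Velocity, as described above, is a rate: events per unit of time. The sketch below measures it by bucketing event timestamps by hour. The timestamps are hypothetical sample data standing in for social media posts or search queries.

```python
from collections import Counter
from datetime import datetime

# Hypothetical arrival times of incoming posts or queries.
timestamps = [
    "2024-01-01 09:05:12",
    "2024-01-01 09:17:40",
    "2024-01-01 09:59:03",
    "2024-01-01 10:02:51",
    "2024-01-01 10:30:00",
    "2024-01-01 11:45:19",
]

# Truncate each timestamp to its hour and count how many events fall in each bucket.
per_hour = Counter(
    datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d %H:00")
    for ts in timestamps
)

for hour, count in sorted(per_hour.items()):
    print(hour, count)
```

Changing the format string used for truncation switches the unit of time, for example to days or minutes.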
Characteristics of Big Data – other V’s
• Value: the value that big data can provide, which relates directly to what
organizations can do with the collected data.
• Veracity: the “truth” or accuracy of data and information assets, which often
determines executive-level confidence.
Challenges of Big Data
• Capture
• Storage
• Curation
• Search
• Transfer
• Visualization
• Privacy violations
• Analysis
Sources of Big Data
Traditional Business Intelligence (BI) versus Big Data
• In a traditional BI environment, data resides on a central server, whereas
in a big data environment, data resides in a distributed file system.
• Traditional BI moves data to the code.
• A big data environment moves code to the data.
• In a traditional BI environment, data is analyzed in offline mode,
whereas in a big data environment, data is analyzed in both real-time and
offline modes.
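The difference between moving data to code and moving code to data can be sketched in a few lines. This is a single-process simulation under stated assumptions: the `partitions` dictionary stands in for data blocks held on three hypothetical nodes, and the second element of each result counts how many values would have to cross the network.

```python
# Hypothetical data blocks, keyed by the node that stores them.
partitions = {
    "node-a": [12, 7, 3],
    "node-b": [8, 21],
    "node-c": [5, 5, 5, 9],
}


def move_data_to_code(parts):
    """Traditional style: copy every record to one place, then compute."""
    central = [x for records in parts.values() for x in records]
    return sum(central), len(central)  # (total, records shipped)


def move_code_to_data(parts):
    """Big data style: compute where the data lives, ship only partial sums."""
    partials = [sum(records) for records in parts.values()]
    return sum(partials), len(partials)  # (total, values shipped)


print(move_data_to_code(partitions))  # (75, 9)
print(move_code_to_data(partitions))  # (75, 3)
```

Both approaches give the same total, but the second ships one value per node rather than one per record, which matters more as the volume of data grows.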
A Typical Data Warehouse Environment
[Figure: a typical data warehouse environment. Sources: ERP, CRM, legacy systems,
3rd party apps. Store: data warehouse. Uses: reporting/dashboarding, OLAP, ad hoc
querying, modeling.]
• In a typical DW environment, data is collected from multiple disparate sources, then
integrated, cleansed and transformed before being loaded into a data warehouse.
• A host of market-leading BI tools can then be used on top of the data warehouse for
reporting/dashboarding, ad hoc querying and modeling.
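The collect, integrate, cleanse, transform and load sequence described above can be sketched as a tiny pipeline. Everything here is an assumption made for illustration: the two source lists stand in for ERP and CRM extracts, the field names are hypothetical, and an in-memory SQLite table plays the role of the data warehouse.

```python
import sqlite3

# Hypothetical extracts from two disparate sources with different field names.
erp_rows = [{"cust": " Asha ", "amount": "1200.50"}, {"cust": "Ravi", "amount": "300"}]
crm_rows = [{"customer_name": "asha", "amount": None},
            {"customer_name": "Meena", "amount": "75.25"}]


def extract():
    """Collect and integrate: map both sources onto one (customer, amount) shape."""
    for r in erp_rows:
        yield r["cust"], r["amount"]
    for r in crm_rows:
        yield r["customer_name"], r["amount"]


def transform(records):
    """Cleanse and transform: drop incomplete rows, normalize names and types."""
    for customer, amount in records:
        if amount is None:
            continue
        yield customer.strip().title(), float(amount)


# Load: write the cleaned rows into the warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?)", transform(extract()))

# A simple report of the kind a BI tool would run on top of the warehouse.
report = warehouse.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(report)
```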
A Typical Hadoop Environment
Hadoop takes care of storage and processing using the following:
a) HDFS (Hadoop Distributed File System): distributed storage
b) MapReduce: distributed processing
[Figure: a typical Hadoop environment. Labels: web logs; images and videos; social
media (Twitter, Facebook, etc.); docs and PDFs; Hadoop (HDFS, MapReduce); operational
systems; data warehouse; data marts; ODS (operational data store).]
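To make the MapReduce idea concrete, the sketch below simulates its three phases (map, shuffle, reduce) for a word count in plain Python. It runs in a single process and does not use Hadoop's API; the `blocks` list is hypothetical sample text standing in for file blocks that HDFS would spread across machines.

```python
from collections import defaultdict

# Hypothetical file blocks; in Hadoop these would live on different nodes.
blocks = [
    "big data needs big storage",
    "hadoop stores data in hdfs",
]


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one block."""
    return [(word, 1) for word in block.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


mapped = [pair for block in blocks for pair in map_phase(block)]
counts = reduce_phase(shuffle(mapped))
print(counts["big"], counts["data"], counts["hdfs"])  # 2 2 1
```

Each call to `map_phase` needs only its own block, which is why the map step can run in parallel on the machines that hold the data.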
Co-existence of Big Data and Data Warehouse
[Figure: co-existence of big data and data warehouse. Labels: web logs; images and
videos; social media (Twitter, Facebook, etc.); docs and PDFs; Hadoop (HDFS,
MapReduce); operational systems; data warehouse; data marts; ODS.]