TOPIC 1
INTRODUCTION TO BIG DATA
COMP6725 - Big Data Technologies
LEARNING OUTCOMES
At the end of this session, students will be able to:
o LO1: Describe big data architecture layers and processing concepts
OUTCOMES
Students are able to describe big data architecture layers and processing concepts
OUTLINE
1. Business Motivations and Drivers for Big Data Adoption
2. Intro to Big Data
3. Big Data Characteristics
4. Big Data Technology
5. Big Data Life Cycle
6. Challenges Faced by Big Data Technology
7. Big Data Examples
BUSINESS MOTIVATIONS AND DRIVERS FOR BIG DATA ADOPTION
CURRENT SITUATION
Pic 1.1. The state of information in the world.
Source : https://www.datasciencecentral.com/ (2013)
INFORMATION FROM INTERNET OF THINGS
Pic 1.2. Information from Internet of Things.
Source : https://www.datasciencecentral.com/ (2013)
STORAGE GROWTH AND DIGITIZATION
Nowadays
Pic 1.3. Storage Growth.
Source : https://www.datasciencecentral.com/ (2013)
INTRO TO BIG DATA
EVOLUTION OF BIG DATA
Pic 1.4. Evolution of Big Data.
Source : Big Data Concepts, Technology, and Architecture. 2021
FAILURE OF TRADITIONAL DATABASES IN HANDLING BIG DATA
The limitations of traditional databases in handling big data:
o The exponential increase in data volume, which now scales to terabytes and petabytes, has
become a challenge for the RDBMS.
o To address this issue, RDBMS deployments added more processors and more memory units,
which in turn increased the cost.
o Almost 80% of the data fetched were in semi-structured or unstructured format, which the
RDBMS could not deal with.
o The RDBMS could not capture data coming in at high velocity.
RDBMS VS. BIG DATA
ATTRIBUTES    | RDBMS                  | BIG DATA
Data volume   | Gigabytes to terabytes | Petabytes to zettabytes
Organization  | Centralized            | Distributed
Data type     | Structured             | Unstructured and semi-structured
Hardware type | High-end model         | Commodity hardware
Updates       | Read/write many times  | Write once, read many times
Schema        | Static                 | Dynamic
DATA MINING VS. BIG DATA
No | DATA MINING | BIG DATA
1 | Data mining is the process of discovering the underlying knowledge from data sets. | Big data refers to massive volumes of data characterized by volume, velocity, and variety.
2 | Structured data retrieved from spreadsheets, relational databases, etc. | Structured, unstructured, or semi-structured data retrieved from non-relational databases, such as NoSQL.
3 | Data mining is capable of processing large data sets, but the data processing costs are high. | Big data tools and technologies are capable of storing and processing large volumes of data at a comparatively lower cost.
4 | Data mining can process only data sets that range from gigabytes to terabytes. | Big data technology is capable of storing and processing data that range from petabytes to zettabytes.
WHAT IS BIG DATA?
o Big data is defined as collections of datasets whose volume, velocity or variety is so large
that it is difficult to store, manage, process and analyze the data using traditional databases
and data processing tools.
o Big data analytics deals with the collection, storage, processing and analysis of this
massive-scale data.
o Specialized tools and frameworks are required for big data analysis when:
1. The volume of data involved is so large that it is difficult to store, process and analyze
data on a single machine,
2. The velocity of data is very high and the data needs to be analyzed in real-time,
3. There is a variety of data involved, which can be structured, unstructured or
semi-structured, and is collected from multiple data sources,
4. Various types of analytics need to be performed to extract value from the data such as
descriptive, diagnostic, predictive and prescriptive analytics.
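To make the single-machine limit in point 1 concrete, here is a rough back-of-envelope sketch. The 200 MB/s read speed and the 1000-machine cluster are illustrative assumptions, not figures from the slides, and the estimate ignores network and coordination overhead.

```python
# Back-of-envelope estimate: time to scan a data set once.
# All figures below are illustrative assumptions, not measurements.

PETABYTE = 10**15                 # bytes (decimal petabyte)
DISK_THROUGHPUT = 200 * 10**6     # assumed sequential read speed: 200 MB/s per machine


def scan_seconds(total_bytes: int, machines: int) -> float:
    """Seconds to read total_bytes once, split evenly across machines."""
    return total_bytes / (DISK_THROUGHPUT * machines)


single = scan_seconds(PETABYTE, machines=1)
cluster = scan_seconds(PETABYTE, machines=1000)

print(f"1 machine:     {single / 86400:.1f} days")    # 57.9 days
print(f"1000 machines: {cluster / 3600:.1f} hours")   # 1.4 hours
```

Under these assumptions, simply reading one petabyte takes weeks on one machine but about an hour and a half when the work is spread across a cluster, which is the motivation for distributed frameworks.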
WHAT IS BIG DATA? (CONT)
o Big data analytics involves several steps: data cleansing, data munging (or
wrangling), data processing and visualization.
o The big data analytics life cycle starts with the collection of data from multiple data
sources.
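The steps named above can be sketched as a tiny pipeline. This is a toy, single-machine illustration only: the sample records and the function names (cleanse, wrangle, process, visualize) are hypothetical and chosen for this example, not taken from any big data framework.

```python
from collections import Counter

# Hypothetical raw records collected from several sources; None and blanks are dirty data.
raw = [" Web ", "iot", None, "WEB", "", "IoT", "web"]


def cleanse(records):
    """Data cleansing: drop missing or empty records."""
    return [r for r in records if r is not None and r.strip()]


def wrangle(records):
    """Data munging: normalize records into one consistent form."""
    return [r.strip().lower() for r in records]


def process(records):
    """Data processing: count records per source."""
    return Counter(records)


def visualize(counts):
    """Visualization: render a plain-text bar chart, one line per source."""
    return [f"{name:<4} {'#' * n}" for name, n in counts.most_common()]


counts = process(wrangle(cleanse(raw)))
for line in visualize(counts):
    print(line)
```

Each stage takes the output of the previous one, so a stage can be replaced by a distributed equivalent without changing the overall flow.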
BIG DATA CHARACTERISTICS
CHARACTERISTICS OF BIG DATA
Volume
• Data too large to fit on a
single machine.
• Specialized tools and
frameworks are required to
store, process and analyze it.
Velocity
• How fast the data is
generated.
• Specialized tools are
required to ingest such high
velocity data into the big
data infrastructure and
analyze the data in real-time.
Variety
• The forms of the data.
• Consists of structured,
unstructured, or semi-
structured data, including
text data, image, audio,
video and sensor data.
Veracity
• How accurate the data is.
• Cleansing of data is
important so that incorrect
and faulty data can be
filtered out.
Value
• The usefulness of data for the intended purpose.
• The end goal of any big data analytics system is to
extract value from the data.
• For some applications, value also depends on how fast
we are able to process the data.
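As a small illustration of Variety, the sketch below reads the same kind of value (a temperature) from structured, semi-structured and unstructured input. The sample records, field names and helper functions are invented for this example.

```python
import csv
import io
import json
import re

# Hypothetical samples of the three forms of data.
structured = "id,temp\n1,21.5\n"                                   # fixed columns (CSV)
semi_structured = '{"id": 2, "readings": {"temp": 22.0}}'          # self-describing, nested (JSON)
unstructured = "Sensor 3 reported a temperature of 23.5 degrees."  # free text


def temp_from_csv(text):
    """Structured: the column name tells us exactly where the value is."""
    row = next(csv.DictReader(io.StringIO(text)))
    return float(row["temp"])


def temp_from_json(text):
    """Semi-structured: navigate the nested keys carried in the record itself."""
    return float(json.loads(text)["readings"]["temp"])


def temp_from_text(text):
    """Unstructured: no schema, so the value must be extracted by pattern matching."""
    match = re.search(r"(\d+(?:\.\d+)?) degrees", text)
    return float(match.group(1)) if match else None


print(temp_from_csv(structured), temp_from_json(semi_structured), temp_from_text(unstructured))
```

The less structure the data carries, the more work is needed to extract the same information, which is one reason variety calls for specialized tooling.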
10V’S OF BIG DATA
Pic 1.5. 10V’s of Big Data
Source : http://www.datasciencecentral.com/
BIG DATA TECHNOLOGY
o The core components of big data technologies are the tools and technologies that provide
the capacity to store, process, and analyze the data.
o The key technologies include
• Hadoop
• HDFS
• MapReduce
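The MapReduce idea can be sketched with a word count in plain Python. This is a single-process illustration of the map, shuffle and reduce phases, not the Hadoop API; the function names and input lines are assumptions made for this example. In a real cluster, the map and reduce calls would run in parallel on many machines over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


# Hypothetical input; each line stands in for one block of a large file.
lines = ["big data is big", "data is everywhere"]
mapped = [pair for line in lines for pair in map_phase(line)]
result = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(result)
```

Because each map call sees only one line and each reduce call sees only one key, the work can be divided across machines without the phases needing to share state.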
BIG DATA LIFE CYCLE
Pic 1.6. Big Data Life Cycle
Source : Big Data Concepts, Technology, and Architecture. 2021
CHALLENGES FACED BY BIG DATA TECHNOLOGY
o Dealing with big data brings many challenges. Some data are structured and can be
stored in traditional databases, while other data, such as videos, pictures, and documents,
may be unstructured or semi-structured and are generated by sensors, social media,
satellites, business transactions, and much more.
o The real challenge is how to make sense of the data by integrating disparate data from diverse sources:
• Heterogeneity and incompleteness
• Volume and velocity of the data
• Data storage
• Data privacy
BIG DATA EXAMPLES
Web
• Web Analytics
• Performance Monitoring
• Ad Targeting & Analytics
• Content Recommendation
Financial
• Credit Risk Modeling
• Fraud Detection
Healthcare
• Epidemiological Surveillance
• Patient Similarity-based Decision Intelligence Application
• Adverse Drug Events Prediction
• Detecting Claim Anomalies
• Evidence-based Medicine
• Real-time Health Monitoring
Internet of Things
• Intrusion Detection
• Smart Parking
• Smart Roads
• Structural Health Monitoring
• Smart Irrigation
Environment
• Weather Monitoring
• Air Pollution Monitoring
• Noise Pollution Monitoring
• Forest Fire Detection
• River Flood Detection
• Water Quality Monitoring
BIG DATA IN EDUCATION INDUSTRY
Pic 1.7. Big Data in Education Industry
Source : https://intellipaat.com/
BIG DATA IN HEALTHCARE
Pic 1.8. Big Data in Healthcare
Source : https://intellipaat.com/
BIG DATA IN GOVERNMENT SECTOR
Pic 1.9. Big Data in Government Sector
Source : https://intellipaat.com/
BIG DATA IN BANKING SECTOR
Pic 1.10. Big Data in Banking Sector
Source : https://intellipaat.com/
BIG DATA IN WEATHER PATTERNS
Pic 1.11. Big Data in Weather Patterns
Source : https://intellipaat.com/
THANK YOU
SUMMARY
o Big data is a term for data sets that are so large or complex that traditional data
processing application software is inadequate to deal with them.
o Challenges include capture, storage, analysis, search, sharing, transfer, visualization,
querying, updating and information privacy.
o The term "big data" often refers simply to the use of predictive analytics, user
behaviour analytics, or certain other advanced data analytics methods that extract
value from data, and seldom to a particular size of data set.
o The characteristics of big data are Volume, Velocity, Variety, Veracity and Value.
REFERENCES
o Balusamy, B., Abirami R, N., Kadry, S., & Gandomi, A. H. (2021). Big Data
Concepts, Technology, and Architecture. 1st ed. Wiley. ISBN 978-1-119-70182-8. Chapter 1
o Sawant, N., & Shah, H. (2013). Big Data Application Architecture Q&A: A Problem-Solution Approach.
Apress, Springer Science. ISBN 978-1-4302-6292-3. Chapter 1
o https://www.youtube.com/watch?v=aC2CmTTZTVU
o http://www.datasciencecentral.com/
o https://www.irjet.net/archives/V4/i9/IRJET-V4I957.pdf
o http://www.martinhilbert.net/WorldInfoCapacity.html
o http://www.cse.unsw.edu.au/~cs9313/
o https://intellipaat.com/


Editor's Notes

  • #7 Source: https://www.datasciencecentral.com/profiles/blogs/basic-understanding-of-big-data-what-is-this-and-how-it-is-going
  • #9 Source: http://www.martinhilbert.net/WorldInfoCapacity.html https://ipsrsolutions.com/library/uploads/2015/07/cloud-computing.png
  • #19 https://www.irjet.net/archives/V4/i9/IRJET-V4I957.pdf
  • #28 Source: https://intellipaat.com/blog/wp-content/uploads/2016/07/BigData-02.jpg
  • #29 Source: https://intellipaat.com/blog/wp-content/uploads/2016/07/BigData-03.jpg
  • #30 Source: https://intellipaat.com/blog/wp-content/uploads/2016/07/BigData-04.jpg
  • #31 Source: https://intellipaat.com/blog/wp-content/uploads/2016/07/Big-Data-in-Banking-Sector-1.png
  • #32 Source: https://intellipaat.com/blog/wp-content/uploads/2016/07/BigData.png