Introduction to Big Data
Ma XinJie
Signal Processing for Big Data
2016/09/13
7/15/2023 1
Big-Data in numbers
Big-Data in numbers
What is Big Data?
• There is no consensus on how to define Big Data
“A collection of data sets so large and complex that it becomes difficult to
process using on-hand database management tools or traditional data processing
applications.” – Wikipedia.
“Big data refers to data sets whose size is beyond the ability of typical database
software tools to capture, store, manage and analyze.” - The McKinsey Global
Institute, 2011.
#1. Not defined by a specific size threshold
#2. The definition can vary by sector
Characterization of Big Data:
Volume, Velocity, Variety, Veracity (the four Vs)
• Volume – challenging to load and process (how to index, retrieve)
• Variety – different data types and degree of structure (how to
query semi-structured data)
• Velocity – real-time processing influenced by rate of data arrival
• Veracity – verifying inference-based models from comprehensive
data collections
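The Variety point (how to query semi-structured data) can be illustrated with a minimal sketch. It assumes records arrive as JSON lines whose fields differ from record to record; the sample records and the helper names `get_path` and `query` are hypothetical and not part of the original slides.

```python
import json

# Hypothetical semi-structured records: one JSON object per line,
# and not every record carries every field.
RAW_LINES = [
    '{"user": "a", "device": {"os": "android"}, "clicks": 3}',
    '{"user": "b", "clicks": 5}',
    '{"user": "c", "device": {"os": "ios"}}',
]


def get_path(record, path, default=None):
    """Follow a dotted path such as 'device.os'; return default if a key is missing."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def query(lines, path, default=None):
    """Parse each JSON line and extract the value found at the given path."""
    return [get_path(json.loads(line), path, default) for line in lines]


# Missing fields fall back to a default instead of raising an error.
os_values = query(RAW_LINES, "device.os", default="unknown")
total_clicks = sum(query(RAW_LINES, "clicks", default=0))
```

The design choice here is to tolerate missing fields rather than enforce a fixed schema, which is one common way to cope with varying degrees of structure.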
Characterization of Big Data:
Volume, Velocity, Variety, Veracity (the four Vs)
7 Key Insights
• 1. Data have swept into every industry and business
function and are now an important factor of production.
• 2. Big data creates value in several ways.
• 3. The use of big data will become a key basis of competition
and growth for individual firms.
• 4. The use of big data will underpin new waves of
productivity growth and consumer surplus.
• 5. While the use of big data will matter across sectors,
some sectors are poised for greater gains.
• 6. There will be a shortage of the talent necessary for
organizations to take advantage of big data.
• 7. Several issues will have to be addressed to capture the
full potential of big data.
Mapping global data: Growth and
value creation
• Why Big Data?
• Key enablers for the appearance and growth
of “Big Data” are:
– Increase in storage capacity
– Increase in processing power
– Availability of data
Enabler: Data storage
Enabler: Computation capacity
Enabler: Data availability
Types of available data
Data available from social networks
and mobile devices
Big data techniques and technologies
• What matters when dealing with data?
– Scalability
– Streaming
– Context
– Quality
– Usage
Techniques for analyzing big data
• Association rule learning
• Classification
• Cluster analysis
• Data fusion and data integration
• Data mining
• Genetic algorithms
• Regression
• Signal processing
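As a small illustration of the first technique in the list, association rule learning, the sketch below computes the two standard measures of a rule, support and confidence, over a toy set of transactions. The transaction data and the function names are hypothetical; this is not an implementation taken from the slides.

```python
# Hypothetical market-basket transactions, each a set of purchased items.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / base


# Rule under test: customers who buy bread also buy milk.
rule_support = support({"bread", "milk"}, TRANSACTIONS)
rule_confidence = confidence({"bread"}, {"milk"}, TRANSACTIONS)
```

Here two of the four transactions contain both items, and two of the three bread transactions also contain milk, so the rule has support 0.5 and confidence 2/3.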
Big data technologies
The transformative potential of big
data in five domains
• Five representative domains in which the
importance of big data is growing:
– Health care (United States)
– Public sector administration (European Union)
– Retail (United States)
– Manufacturing (global)
– Personal location data (global)
Key findings that apply across sectors
Thanks