SlideShare a Scribd company logo
1 of 27
Introduction to Big Data
Big Data & IoT
Lecture #2
Umair Shafique (03246441789)
Scholar MS Information Technology - University of Gujrat
CONTENT
• What is Big Data?
• What is an example of Big Data?
• Why Is Big Data Important?
• Big Data Analytics
• Benefits of Big Data Analytics
• Types of Big Data
• Characteristics of Big Data
• Primary Source of Big Data
• Big Data Tools and Software
• Big Data Mining
• Top Trends in Big Data
WHAT IS BIG DATA?
• Big Data is a massive collection of data that is growing exponentially
over time.
• It is a data set that is so large and complex that traditional data
management tools cannot store or process it efficiently.
• Big data is a type of data that is extremely large in size.
WHAT IS AN EXAMPLE OF BIG DATA?
• The following are some Big Data examples:
• The New York Stock Exchange, for example, generates approximately one
terabyte of new trade data per day.
• The statistic shows that 500+terabytes of new data get ingested into the
databases of social media site Facebook, every day. This data is mainly
generated in terms of photo and video uploads, message exchanges, putting
comments etc.
• A single Jet engine can generate 10+terabytes of data in 30 minutes of flight
time. With many thousand flights per day, generation of data reaches up to
many Petabytes.
WHY IS BIG DATA IMPORTANT?
• Companies use big data in their systems to improve operations,
provide better customer service, create personalized marketing
campaigns and take other actions that, ultimately, can increase
revenue and profits.
• Big data is also used by medical researchers to identify disease signs
and risk factors and by doctors to help diagnose illnesses and medical
conditions in patients.
• In addition, a combination of data from electronic health records,
social media sites, the web and other sources gives healthcare
organizations and government agencies up-to-date information on
infectious disease threats or outbreaks.
BIG DATA ANALYTICS
• Big data analytics examines large amounts of data to uncover hidden
patterns, correlations and other insights.
• Big data analytics helps organizations harness their data and use it to
identify new opportunities.
• That, in turn, leads to smarter business moves, more efficient
operations, higher profits and happier customers.
BENEFITS OF BIG DATA ANALYTICS
• Real-time forecasting and monitoring of business as well as the
market.
• Identify crucial points hidden within large datasets to influence
business decisions.
• Identify issues in systems and business processes in real-time.
• Dig in customer data to create tailor-made products, services, offers,
discounts, etc.
• Facilitate speedy delivery of products/services that meet and exceed
client expectations.
TYPES OF BIG DATA
• Following are the types of Big Data:
 Structured
 Unstructured
 Semi-structured
STRUCTURED
• Structured Data is used to refer to the data which is already stored in
databases, in an ordered manner.
• There are two sources of structured data;
• Human-Generated
• Machine-Generated
• All the data received from sensors, web logs and financial systems are
classified under machine-generated data.
• Human-generated structured data mainly includes all the data a
human input a computer, such as his name.
UN-STRUCTURED
• Unstructured data is defined as any data with an unknown form or
structure.
• Aside from its massive size, unstructured data presents a number of
challenges in terms of processing and extracting value from it.
• A heterogeneous data source containing a mix of simple text files,
images, videos, and so on is an example of unstructured data.
SEMI-STRUCTURED
• Semi-structured data can contain both types of information.
• Semi-structured data appears to be structured, but it is not defined in
the same way that a table definition in a relational DBMS is.
• A data representation in an XML file is an example of semi-structured
data.
CHARACTERISTICS OF BIG DATA
VOLUME
• The name Big Data itself is related to a size which is enormous.
• Size of data plays a very crucial role in determining value out of data.
Also, whether a particular data can actually be considered as a Big
Data or not, is dependent upon the volume of data.
• Hence, Volume is one characteristic which needs to be considered
while dealing with Big Data solutions.
• For example;
• Organizational data
• Social media data
VELOCITY
• The term ‘velocity’ refers to the speed of generation of data.
• How fast the data is generated and processed to meet the demands,
determines real potential in the data.
• Big Data Velocity deals with the speed at which data flows in from
sources like business processes, application logs, networks, and social
media sites, sensors, Mobile devices, etc.
• The flow of data is massive and continuous.
VERACITY
• When we are dealing with a high volume, velocity and variety of data,
it is not possible that all of the data is going to be 100% correct, there
will be dirty data.
• The quality of the data being captured can vary greatly.
• The data accuracy of analysis depends on the veracity of the source
data.
VALUE
• Value is the most important aspect in the big data.
• Though, the potential value of the big data is huge.
• It is all well and good having access to big data but unless we can turn
it into value it is become useless.
• It becomes very costly to implement IT infrastructure systems to store
big data, and businesses are going to require a return on investment.
VARIETY
• Big data is not always structured data and it is not always easy to put
big data into a relational database.
• This means that the category to which Big Data belongs to is also a
very essential fact that needs to be known by the data analysis.
• Dealing with a variety of structured and unstructured data greatly
increases the complexity of both storing and analyzing Big Data.
• 90% of data generated is data is in unstructured form.
PRIMARY SOURCE OF BIG DATA
• Primary sources of Big Data are;
MEDIA AS A BIG DATA SOURCE
• Media is the most popular source of big data, as it provides valuable
insights on consumer preferences and changing trends.
• Since it is self-broadcasted and crosses all physical and
demographical barriers, it is the fastest way for businesses to get an
in-depth overview of their target audience, draw patterns and
conclusions, and enhance their decision-making.
• Media includes social media and interactive platforms, like Google,
Facebook, Twitter, YouTube, Instagram, as well as generic media like
images, videos, audios, and podcasts that provide quantitative and
qualitative insights on every aspect of user interaction.
CLOUD AS A BIG DATA SOURCE
• Today, companies have moved ahead of traditional data sources by
shifting their data on the cloud.
• Cloud storage accommodates structured and unstructured data and
provides business with real-time information and on-demand
insights.
• The main attribute of cloud computing is its flexibility and scalability.
• As big data can be stored and sourced on public or private clouds, via
networks and servers, cloud makes for an efficient and economical
data source.
WEB AS A BIG DATA SOURCE
• The public web constitutes big data that is widespread and easily
accessible.
• Data on the Web or ‘Internet’ is commonly available to individuals
and companies alike.
• Moreover, web services such as Wikipedia provide free and quick
informational insights to everyone.
• The enormity of the Web ensures for its diverse usability and is
especially beneficial to start-ups and SME’s, as they don’t have to wait
to develop their own big data infrastructure and repositories before
they can leverage big data.
IOT AS A BIG DATA SOURCE
• Machine-generated content or data created from IoT constitute a valuable
source of big data.
• This data is usually generated from the sensors that are connected to
electronic devices.
• The sourcing capacity depends on the ability of the sensors to provide real-
time accurate information.
• IOT is now gaining momentum and includes big data generated, not only
from computers and smartphones, but also possibly from every device that
can emit data.
• With IoT, data can now be sourced from medical devices, vehicular
processes, video games, meters, cameras, household appliances, and the
like.
DATABASES AS A BIG DATA SOURCE
• Businesses today prefer to use an incorporation of traditional and modern
databases to acquire relevant big data.
• This integration paves the way for a hybrid data model and requires low
investment and IT infrastructural costs.
• Furthermore, these databases are deployed for several business intelligence
purposes as well.
• These databases can then provide for the extraction of insights that are used to
drive business profits.
• Popular databases include a variety of data sources, such as MS Access, DB2,
Oracle, SQL, and Amazon Simple, among others.
BIG DATA TOOLS AND SOFTWARE
• Hadoop
• Atlas.it
• HPCC
• Storm
• Cassandra
• Kaggle
• CouchDB
• Pentaho
BIG DATA MINING
• Big data mining is referred to the collective data mining or extraction
techniques that are performed on large sets /volume of data or the
big data.
• Big data mining is primarily done to extract and retrieve desired
information or pattern from humongous quantity of data.
• Big data mining works on data searching, refinement , extraction and
comparison algorithms.
TOP TRENDS IN BIG DATA
• Four major trends in big data are helping organizations meet those
challenges.
• More data, increased data diversity drive advances in processing and
the rise of edge computing.
• Big data storage needs spur innovations in cloud and hybrid cloud
platforms, growth of data lakes.
• DataOps and data governance are becoming more prominent.
• Adoption of advanced analytics, machine learning and other AI
technologies increases dramatically.
TOP TRENDS IN BIG DATA

More Related Content

What's hot

Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookJames Serra
 
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...Edureka!
 
Big Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewBig Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewSivashankar Ganapathy
 
DAX and Power BI Training - 001 Overview
DAX and Power BI Training -  001 OverviewDAX and Power BI Training -  001 Overview
DAX and Power BI Training - 001 OverviewWill Harvey
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkRahul Jain
 
A Reference Process Model for Master Data Management
A Reference Process Model for Master Data ManagementA Reference Process Model for Master Data Management
A Reference Process Model for Master Data ManagementBoris Otto
 
Data Governance with Profisee, Microsoft & CCG
Data Governance with Profisee, Microsoft & CCG Data Governance with Profisee, Microsoft & CCG
Data Governance with Profisee, Microsoft & CCG CCG
 
Base de datos y SGBR
Base  de datos y SGBRBase  de datos y SGBR
Base de datos y SGBRMaybelt King
 
Data Vault 2.0 DeMystified with Dan Linstedt and WhereScape
Data Vault 2.0 DeMystified with Dan Linstedt and WhereScapeData Vault 2.0 DeMystified with Dan Linstedt and WhereScape
Data Vault 2.0 DeMystified with Dan Linstedt and WhereScapeWhereScape
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 

What's hot (20)

Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Introduction to Database
Introduction to DatabaseIntroduction to Database
Introduction to Database
 
Big data
Big dataBig data
Big data
 
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewBig Data - Applications and Technologies Overview
Big Data - Applications and Technologies Overview
 
Dbms presentaion
Dbms presentaionDbms presentaion
Dbms presentaion
 
SAP BI Overview
SAP BI OverviewSAP BI Overview
SAP BI Overview
 
DAX and Power BI Training - 001 Overview
DAX and Power BI Training -  001 OverviewDAX and Power BI Training -  001 Overview
DAX and Power BI Training - 001 Overview
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Big data
Big dataBig data
Big data
 
A Reference Process Model for Master Data Management
A Reference Process Model for Master Data ManagementA Reference Process Model for Master Data Management
A Reference Process Model for Master Data Management
 
What is "data"?
What is "data"?What is "data"?
What is "data"?
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
 
Data Governance with Profisee, Microsoft & CCG
Data Governance with Profisee, Microsoft & CCG Data Governance with Profisee, Microsoft & CCG
Data Governance with Profisee, Microsoft & CCG
 
Data lake
Data lakeData lake
Data lake
 
Base de datos y SGBR
Base  de datos y SGBRBase  de datos y SGBR
Base de datos y SGBR
 
Big data
Big dataBig data
Big data
 
Data Vault 2.0 DeMystified with Dan Linstedt and WhereScape
Data Vault 2.0 DeMystified with Dan Linstedt and WhereScapeData Vault 2.0 DeMystified with Dan Linstedt and WhereScape
Data Vault 2.0 DeMystified with Dan Linstedt and WhereScape
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 

Similar to Introduction to Big Data

Similar to Introduction to Big Data (20)

Big Data Analytics Materials, Chapter: 1
Big Data Analytics Materials, Chapter: 1Big Data Analytics Materials, Chapter: 1
Big Data Analytics Materials, Chapter: 1
 
Big data
Big dataBig data
Big data
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data Management
 
What is big data
What is big dataWhat is big data
What is big data
 
Big_Data.pptx
Big_Data.pptxBig_Data.pptx
Big_Data.pptx
 
Big data
Big dataBig data
Big data
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
 
Big data
Big dataBig data
Big data
 
bigdatappt.pptx
bigdatappt.pptxbigdatappt.pptx
bigdatappt.pptx
 
Intro big data analytics
Intro big data analyticsIntro big data analytics
Intro big data analytics
 
TOPIC.pptx
TOPIC.pptxTOPIC.pptx
TOPIC.pptx
 
Special issues on big data
Special issues on big dataSpecial issues on big data
Special issues on big data
 
Introduction to Big Data Analytics
Introduction to Big Data AnalyticsIntroduction to Big Data Analytics
Introduction to Big Data Analytics
 
Big data
Big dataBig data
Big data
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
 
Big data and analytics
Big data and analyticsBig data and analytics
Big data and analytics
 
A beginner's guide to Big data
A beginner's guide to Big dataA beginner's guide to Big data
A beginner's guide to Big data
 
Content1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docxContent1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docx
 

More from Umair Shafique

Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data AnalysisUmair Shafique
 
Pre-Processing and Data Preparation
Pre-Processing and Data PreparationPre-Processing and Data Preparation
Pre-Processing and Data PreparationUmair Shafique
 
Big Data Analytics With Hadoop
Big Data Analytics With HadoopBig Data Analytics With Hadoop
Big Data Analytics With HadoopUmair Shafique
 
BIG DATA ANALYTICS USING R
BIG DATA ANALYTICS USING  RBIG DATA ANALYTICS USING  R
BIG DATA ANALYTICS USING RUmair Shafique
 
BIG DATA AND MACHINE LEARNING
BIG DATA AND MACHINE LEARNINGBIG DATA AND MACHINE LEARNING
BIG DATA AND MACHINE LEARNINGUmair Shafique
 
Handling and Processing Big Data
Handling and Processing Big DataHandling and Processing Big Data
Handling and Processing Big DataUmair Shafique
 

More from Umair Shafique (6)

Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysis
 
Pre-Processing and Data Preparation
Pre-Processing and Data PreparationPre-Processing and Data Preparation
Pre-Processing and Data Preparation
 
Big Data Analytics With Hadoop
Big Data Analytics With HadoopBig Data Analytics With Hadoop
Big Data Analytics With Hadoop
 
BIG DATA ANALYTICS USING R
BIG DATA ANALYTICS USING  RBIG DATA ANALYTICS USING  R
BIG DATA ANALYTICS USING R
 
BIG DATA AND MACHINE LEARNING
BIG DATA AND MACHINE LEARNINGBIG DATA AND MACHINE LEARNING
BIG DATA AND MACHINE LEARNING
 
Handling and Processing Big Data
Handling and Processing Big DataHandling and Processing Big Data
Handling and Processing Big Data
 

Recently uploaded

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 

Recently uploaded (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 

Introduction to Big Data

  • 1. Introduction to Big Data Big Data & IoT Lecture #2 Umair Shafique (03246441789) Scholar MS Information Technology - University of Gujrat
  • 2. CONTENT • What is Big Data? • What is an example of Big Data? • Why Is Big Data Important? • Big Data Analytics • Benefits of Big Data Analytics • Types of Big Data • Characteristics of Big Data • Primary Source of Big Data • Big Data Tools and Software • Big Data Mining • Top Trends in Big Data
  • 3. WHAT IS BIG DATA? • Big Data is a massive collection of data that is growing exponentially over time. • It is a data set that is so large and complex that traditional data management tools cannot store or process it efficiently. • Big data is a type of data that is extremely large in size.
  • 4. WHAT IS AN EXAMPLE OF BIG DATA? • The following are some Big Data examples: • The New York Stock Exchange, for example, generates approximately one terabyte of new trade data per day. • The statistic shows that 500+terabytes of new data get ingested into the databases of social media site Facebook, every day. This data is mainly generated in terms of photo and video uploads, message exchanges, putting comments etc. • A single Jet engine can generate 10+terabytes of data in 30 minutes of flight time. With many thousand flights per day, generation of data reaches up to many Petabytes.
  • 5. WHY IS BIG DATA IMPORTANT? • Companies use big data in their systems to improve operations, provide better customer service, create personalized marketing campaigns and take other actions that, ultimately, can increase revenue and profits. • Big data is also used by medical researchers to identify disease signs and risk factors and by doctors to help diagnose illnesses and medical conditions in patients. • In addition, a combination of data from electronic health records, social media sites, the web and other sources gives healthcare organizations and government agencies up-to-date information on infectious disease threats or outbreaks.
  • 6. BIG DATA ANALYTICS • Big data analytics examines large amounts of data to uncover hidden patterns, correlations and other insights. • Big data analytics helps organizations harness their data and use it to identify new opportunities. • That, in turn, leads to smarter business moves, more efficient operations, higher profits and happier customers.
  • 7. BENEFITS OF BIG DATA ANALYTICS • Real-time forecasting and monitoring of business as well as the market. • Identify crucial points hidden within large datasets to influence business decisions. • Identify issues in systems and business processes in real-time. • Dig in customer data to create tailor-made products, services, offers, discounts, etc. • Facilitate speedy delivery of products/services that meet and exceed client expectations.
  • 8. TYPES OF BIG DATA • Following are the types of Big Data:  Structured  Unstructured  Semi-structured
  • 9. STRUCTURED • Structured Data is used to refer to the data which is already stored in databases, in an ordered manner. • There are two sources of structured data; • Human-Generated • Machine-Generated • All the data received from sensors, web logs and financial systems are classified under machine-generated data. • Human-generated structured data mainly includes all the data a human input a computer, such as his name.
  • 10. UN-STRUCTURED • Unstructured data is defined as any data with an unknown form or structure. • Aside from its massive size, unstructured data presents a number of challenges in terms of processing and extracting value from it. • A heterogeneous data source containing a mix of simple text files, images, videos, and so on is an example of unstructured data.
  • 11. SEMI-STRUCTURED • Semi-structured data can contain both types of information. • Semi-structured data appears to be structured, but it is not defined in the same way that a table definition in a relational DBMS is. • A data representation in an XML file is an example of semi-structured data.
  • 13. VOLUME • The name Big Data itself is related to a size which is enormous. • Size of data plays a very crucial role in determining value out of data. Also, whether a particular data can actually be considered as a Big Data or not, is dependent upon the volume of data. • Hence, Volume is one characteristic which needs to be considered while dealing with Big Data solutions. • For example; • Organizational data • Social media data
  • 14. VELOCITY • The term ‘velocity’ refers to the speed of generation of data. • How fast the data is generated and processed to meet the demands, determines real potential in the data. • Big Data Velocity deals with the speed at which data flows in from sources like business processes, application logs, networks, and social media sites, sensors, Mobile devices, etc. • The flow of data is massive and continuous.
  • 15. VERACITY • When we are dealing with a high volume, velocity and variety of data, it is not possible that all of the data is going to be 100% correct, there will be dirty data. • The quality of the data being captured can vary greatly. • The data accuracy of analysis depends on the veracity of the source data.
  • 16. VALUE • Value is the most important aspect in the big data. • Though, the potential value of the big data is huge. • It is all well and good having access to big data but unless we can turn it into value it is become useless. • It becomes very costly to implement IT infrastructure systems to store big data, and businesses are going to require a return on investment.
  • 17. VARIETY • Big data is not always structured data and it is not always easy to put big data into a relational database. • This means that the category to which Big Data belongs to is also a very essential fact that needs to be known by the data analysis. • Dealing with a variety of structured and unstructured data greatly increases the complexity of both storing and analyzing Big Data. • 90% of data generated is data is in unstructured form.
  • 18. PRIMARY SOURCE OF BIG DATA • Primary sources of Big Data are;
  • 19. MEDIA AS A BIG DATA SOURCE • Media is the most popular source of big data, as it provides valuable insights on consumer preferences and changing trends. • Since it is self-broadcasted and crosses all physical and demographical barriers, it is the fastest way for businesses to get an in-depth overview of their target audience, draw patterns and conclusions, and enhance their decision-making. • Media includes social media and interactive platforms, like Google, Facebook, Twitter, YouTube, Instagram, as well as generic media like images, videos, audios, and podcasts that provide quantitative and qualitative insights on every aspect of user interaction.
  • 20. CLOUD AS A BIG DATA SOURCE • Today, companies have moved ahead of traditional data sources by shifting their data on the cloud. • Cloud storage accommodates structured and unstructured data and provides business with real-time information and on-demand insights. • The main attribute of cloud computing is its flexibility and scalability. • As big data can be stored and sourced on public or private clouds, via networks and servers, cloud makes for an efficient and economical data source.
  • 21. WEB AS A BIG DATA SOURCE • The public web constitutes big data that is widespread and easily accessible. • Data on the Web or ‘Internet’ is commonly available to individuals and companies alike. • Moreover, web services such as Wikipedia provide free and quick informational insights to everyone. • The enormity of the Web ensures for its diverse usability and is especially beneficial to start-ups and SME’s, as they don’t have to wait to develop their own big data infrastructure and repositories before they can leverage big data.
  • 22. IOT AS A BIG DATA SOURCE • Machine-generated content or data created from IoT constitute a valuable source of big data. • This data is usually generated from the sensors that are connected to electronic devices. • The sourcing capacity depends on the ability of the sensors to provide real- time accurate information. • IOT is now gaining momentum and includes big data generated, not only from computers and smartphones, but also possibly from every device that can emit data. • With IoT, data can now be sourced from medical devices, vehicular processes, video games, meters, cameras, household appliances, and the like.
  • 23. DATABASES AS A BIG DATA SOURCE • Businesses today prefer to use an incorporation of traditional and modern databases to acquire relevant big data. • This integration paves the way for a hybrid data model and requires low investment and IT infrastructural costs. • Furthermore, these databases are deployed for several business intelligence purposes as well. • These databases can then provide for the extraction of insights that are used to drive business profits. • Popular databases include a variety of data sources, such as MS Access, DB2, Oracle, SQL, and Amazon Simple, among others.
  • 24. BIG DATA TOOLS AND SOFTWARE • Hadoop • Atlas.it • HPCC • Storm • Cassandra • Kaggle • CouchDB • Pentaho
  • 25. BIG DATA MINING • Big data mining is referred to the collective data mining or extraction techniques that are performed on large sets /volume of data or the big data. • Big data mining is primarily done to extract and retrieve desired information or pattern from humongous quantity of data. • Big data mining works on data searching, refinement , extraction and comparison algorithms.
  • 26. TOP TRENDS IN BIG DATA • Four major trends in big data are helping organizations meet those challenges. • More data, increased data diversity drive advances in processing and the rise of edge computing. • Big data storage needs spur innovations in cloud and hybrid cloud platforms, growth of data lakes. • DataOps and data governance are becoming more prominent. • Adoption of advanced analytics, machine learning and other AI technologies increases dramatically.
  • 27. TOP TRENDS IN BIG DATA

Editor's Notes

  1. Edge Computing: Edge Computing, which shifts the processing load to the devices themselves before the data is sent to the servers. Edge computing optimizes performance and storage by reducing the need for data to flow through networks, reducing computing and processing costs, especially cloud storage, bandwidth and processing expenses. Edge computing helps to speed up data analysis and provides faster responses to the user. Cloud & Hybrid Cloud Computing: To deal with the inexorable increase in data generation, organizations are spending more of their resources storing this data in a range of cloud-based and hybrid cloud systems optimized for all the V's of big data. In previous decades, organizations handled their own storage infrastructure, resulting in massive data centers that enterprises had to manage, secure and operate. The move to cloud computing changed that dynamic. By shifting the responsibility to cloud infrastructure providers -- such as AWS, Google, Microsoft and IBM -- organizations can deal with almost limitless amounts of new data and pay for storage and compute capability on demand without having to maintain their own large and complex data centers. Data Lakes: One area of innovation is the emergence of DataOps, a methodology and practice that focuses on agile, iterative approaches for dealing with the full lifecycle of data as it flows through the organization. Rather than thinking about data in piecemeal fashion with separate people dealing with data generation, storage, transportation, processing and management, DataOps processes and frameworks address organizational needs across the data lifecycle from generation to archiving. Machine Learning and AI Technologies: No technology has been as revolutionary to big data analytics as machine learning and AI systems. AI is used by organizations of all sizes to optimize and improve their business processes. Machine learning enables them to more easily identify patterns and detect anomalies in large data sets to provide predictive analytics and other advanced data analysis capabilities. This includes recognition systems for image, video and text data; automated classification of information; natural language processing capabilities for chatbots and voice and text analysis; autonomous business process automation; high degrees of personalization and recommendation; and systems that can find optimal solutions among the sea of data.