Introduction to Big Data
Big Data & IoT
Lecture #2
Umair Shafique (03246441789)
Scholar MS Information Technology - University of Gujrat
CONTENT
• What is Big Data?
• What is an example of Big Data?
• Why Is Big Data Important?
• Big Data Analytics
• Benefits of Big Data Analytics
• Types of Big Data
• Characteristics of Big Data
• Primary Source of Big Data
• Big Data Tools and Software
• Big Data Mining
• Top Trends in Big Data
WHAT IS BIG DATA?
• Big Data is a massive collection of data that is growing exponentially
over time.
• It is a data set that is so large and complex that traditional data
management tools cannot store or process it efficiently.
• Big data is a type of data that is extremely large in size.
WHAT IS AN EXAMPLE OF BIG DATA?
• The following are some Big Data examples:
• The New York Stock Exchange generates approximately one
terabyte of new trade data per day.
• More than 500 terabytes of new data are ingested into Facebook's
databases every day. This data mainly comes from photo and video
uploads, message exchanges, and comments.
• A single jet engine can generate more than 10 terabytes of data in 30
minutes of flight time. With many thousands of flights per day, the data
generated reaches many petabytes.
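The jet engine figure can be turned into a rough daily estimate. The sketch below is only a back-of-the-envelope calculation: the number of flights, the flight duration and the engines per aircraft are assumed values chosen for illustration, not figures from this lecture.

```python
# Back-of-the-envelope data volume estimate (decimal units: 1 PB = 1,000 TB).
TB_PER_HALF_HOUR = 10      # figure quoted above for a single jet engine
FLIGHTS_PER_DAY = 5_000    # assumption: stands in for "many thousands of flights"
HOURS_PER_FLIGHT = 2       # assumption: average flight duration
ENGINES_PER_AIRCRAFT = 2   # assumption: twin-engine aircraft


def daily_volume_tb(flights, hours_per_flight, engines, tb_per_half_hour):
    """Return the estimated terabytes generated per day across all flights."""
    half_hours = hours_per_flight * 2
    return flights * engines * half_hours * tb_per_half_hour


total_tb = daily_volume_tb(
    FLIGHTS_PER_DAY, HOURS_PER_FLIGHT, ENGINES_PER_AIRCRAFT, TB_PER_HALF_HOUR
)
total_pb = total_tb / 1_000
print(f"{total_tb:,} TB per day = {total_pb:,.0f} PB per day")
# prints "400,000 TB per day = 400 PB per day"
```

Under these assumptions the daily volume is in the hundreds of petabytes, which is consistent with the "many petabytes" claim.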
WHY IS BIG DATA IMPORTANT?
• Companies use big data in their systems to improve operations,
provide better customer service, create personalized marketing
campaigns and take other actions that, ultimately, can increase
revenue and profits.
• Big data is also used by medical researchers to identify disease signs
and risk factors and by doctors to help diagnose illnesses and medical
conditions in patients.
• In addition, a combination of data from electronic health records,
social media sites, the web and other sources gives healthcare
organizations and government agencies up-to-date information on
infectious disease threats or outbreaks.
BIG DATA ANALYTICS
• Big data analytics examines large amounts of data to uncover hidden
patterns, correlations and other insights.
• Big data analytics helps organizations harness their data and use it to
identify new opportunities.
• That, in turn, leads to smarter business moves, more efficient
operations, higher profits and happier customers.
BENEFITS OF BIG DATA ANALYTICS
• Real-time forecasting and monitoring of business as well as the
market.
• Identify crucial points hidden within large datasets to influence
business decisions.
• Identify issues in systems and business processes in real-time.
• Dig into customer data to create tailor-made products, services, offers,
discounts, etc.
• Facilitate speedy delivery of products/services that meet and exceed
client expectations.
TYPES OF BIG DATA
• Following are the types of Big Data:
• Structured
• Unstructured
• Semi-structured
STRUCTURED
• Structured Data is used to refer to the data which is already stored in
databases, in an ordered manner.
• There are two sources of structured data:
• Human-Generated
• Machine-Generated
• All the data received from sensors, web logs and financial systems are
classified under machine-generated data.
• Human-generated structured data mainly includes all the data a
person enters into a computer, such as their name.
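As a small illustration of structured data, the sketch below stores human-generated and machine-generated records in a relational table using Python's built-in sqlite3 module. The table name, column names and sample rows are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (source TEXT, origin TEXT, value TEXT)")

rows = [
    ("web_form", "human", "Alice"),                        # a name typed by a person
    ("temperature_sensor", "machine", "21.5"),             # sensor reading
    ("web_server_log", "machine", "GET /index.html 200"),  # web log entry
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# Because the data has a fixed schema, it can be queried directly.
machine_count = conn.execute(
    "SELECT COUNT(*) FROM readings WHERE origin = ?", ("machine",)
).fetchone()[0]
print(machine_count)  # prints 2
```

The fixed schema is what makes the data "structured": every record has the same columns, so it can be filtered and counted with a plain SQL query.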
UNSTRUCTURED
• Unstructured data is defined as any data with an unknown form or
structure.
• Aside from its massive size, unstructured data presents a number of
challenges in terms of processing and extracting value from it.
• A heterogeneous data source containing a mix of simple text files,
images, videos, and so on is an example of unstructured data.
SEMI-STRUCTURED
• Semi-structured data can contain both structured and unstructured elements.
• Semi-structured data appears to be structured, but it is not defined in
the same way that a table definition in a relational DBMS is.
• A data representation in an XML file is an example of semi-structured
data.
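The XML case can be sketched with Python's standard xml.etree.ElementTree module. The document below is made up for illustration: the two customer records carry different fields, which is what separates semi-structured data from a fixed relational table.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: the second record has no <email> element.
XML_DOC = """
<customers>
  <customer id="1"><name>Alice</name><email>alice@example.com</email></customer>
  <customer id="2"><name>Bob</name></customer>
</customers>
""".strip()

root = ET.fromstring(XML_DOC)

records = []
for customer in root.findall("customer"):
    record = {"id": customer.get("id")}
    # Each record keeps whatever fields its element happens to contain.
    for field in customer:
        record[field.tag] = field.text
    records.append(record)

print(records)
```

The tags describe the data, but no schema forces every record to have the same fields.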
CHARACTERISTICS OF BIG DATA
VOLUME
• The name Big Data itself refers to enormous size.
• The size of data plays a crucial role in determining its value.
Whether a particular data set can be considered Big Data at all
depends on its volume.
• Hence, volume is one characteristic that must be considered
when dealing with Big Data solutions.
• For example:
• Organizational data
• Social media data
VELOCITY
• The term ‘velocity’ refers to the speed at which data is generated.
• How fast data is generated and processed to meet demand
determines its real potential.
• Big Data velocity deals with the speed at which data flows in from
sources such as business processes, application logs, networks, social
media sites, sensors, and mobile devices.
• The flow of data is massive and continuous.
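Velocity can be made concrete by measuring how many events arrive per second over a recent time window. The class below is a minimal sketch: the name RateMeter, the window length and the synthetic timestamps are all assumptions made for this example.

```python
from collections import deque


class RateMeter:
    """Track the event rate over a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()

    def record(self, timestamp):
        """Register an event and drop events that fell out of the window."""
        self.events.append(timestamp)
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()

    def rate(self):
        """Return events per second over the window."""
        return len(self.events) / self.window


meter = RateMeter(window_seconds=10)
for second in range(20):  # synthetic stream: one event per second
    meter.record(second)

print(meter.rate())  # prints 1.0
```

Real streaming systems do this at a far larger scale, but the idea is the same: only recent events count toward the current rate.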
VERACITY
• When dealing with a high volume, velocity and variety of data,
not all of the data will be 100% correct; some of it
will be dirty data.
• The quality of the data being captured can vary greatly.
• The accuracy of any analysis depends on the veracity of the source
data.
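A basic veracity check filters out dirty records before analysis. The sketch below is one possible approach under assumed rules: the field names, the sample readings and the plausible temperature range are all hypothetical.

```python
def is_clean(record):
    """Return True if the record passes basic quality checks (assumed rules)."""
    temp = record.get("temperature_c")
    return (
        record.get("sensor_id") is not None
        and isinstance(temp, (int, float))
        and -90.0 <= temp <= 60.0  # assumed plausible range in Celsius
    )


readings = [
    {"sensor_id": "a1", "temperature_c": 21.5},
    {"sensor_id": "a2", "temperature_c": None},   # missing value
    {"sensor_id": None, "temperature_c": 19.0},   # missing identifier
    {"sensor_id": "a3", "temperature_c": 999.0},  # out of range
]

clean = [r for r in readings if is_clean(r)]
dirty_ratio = 1 - len(clean) / len(readings)
print(len(clean), dirty_ratio)  # prints "1 0.75"
```

Reporting the share of rejected records alongside the results gives a rough indication of how far the source data can be trusted.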
VALUE
• Value is the most important aspect of big data.
• The potential value of big data is huge.
• Having access to big data is all well and good, but unless we can turn
it into value it is useless.
• Implementing IT infrastructure to store big data is costly, and
businesses require a return on that investment.
VARIETY
• Big data is not always structured data and it is not always easy to put
big data into a relational database.
• This means that the category to which Big Data belongs is also an
essential fact that data analysts need to know.
• Dealing with a variety of structured and unstructured data greatly
increases the complexity of both storing and analyzing Big Data.
• 90% of the data generated is in unstructured form.
PRIMARY SOURCE OF BIG DATA
• The primary sources of Big Data are media, the cloud, the web, IoT, and databases.
MEDIA AS A BIG DATA SOURCE
• Media is the most popular source of big data, as it provides valuable
insights on consumer preferences and changing trends.
• Since it is self-broadcast and crosses all physical and
demographic barriers, it is the fastest way for businesses to get an
in-depth overview of their target audience, draw patterns and
conclusions, and enhance their decision-making.
• Media includes social media and interactive platforms, like Google,
Facebook, Twitter, YouTube, Instagram, as well as generic media like
images, videos, audio, and podcasts that provide quantitative and
qualitative insights on every aspect of user interaction.
CLOUD AS A BIG DATA SOURCE
• Today, companies have moved beyond traditional data sources by
shifting their data to the cloud.
• Cloud storage accommodates structured and unstructured data and
provides businesses with real-time information and on-demand
insights.
• The main attributes of cloud computing are its flexibility and scalability.
• As big data can be stored and sourced on public or private clouds, via
networks and servers, the cloud makes for an efficient and economical
data source.
WEB AS A BIG DATA SOURCE
• The public web constitutes big data that is widespread and easily
accessible.
• Data on the Web or ‘Internet’ is commonly available to individuals
and companies alike.
• Moreover, web services such as Wikipedia provide free and quick
informational insights to everyone.
• The sheer size of the Web makes it usable in diverse ways, and it is
especially beneficial to start-ups and SMEs, as they don’t have to wait
to develop their own big data infrastructure and repositories before
they can leverage big data.
IOT AS A BIG DATA SOURCE
• Machine-generated content or data created by IoT devices constitutes a valuable
source of big data.
• This data is usually generated by sensors connected to
electronic devices.
• The sourcing capacity depends on the ability of the sensors to provide
accurate real-time information.
• IoT is now gaining momentum and includes big data generated not only
from computers and smartphones but potentially from every device that
can emit data.
• With IoT, data can now be sourced from medical devices, vehicular
processes, video games, meters, cameras, household appliances, and the
like.
DATABASES AS A BIG DATA SOURCE
• Businesses today prefer to use a combination of traditional and modern
databases to acquire relevant big data.
• This integration paves the way for a hybrid data model and requires low
investment and IT infrastructure costs.
• Furthermore, these databases are deployed for several business intelligence
purposes as well.
• These databases can then support the extraction of insights that are used to
drive business profits.
• Popular databases include MS Access, DB2,
Oracle, SQL, and Amazon Simple, among others.
BIG DATA TOOLS AND SOFTWARE
• Hadoop
• Atlas.ti
• HPCC
• Storm
• Cassandra
• Kaggle
• CouchDB
• Pentaho
BIG DATA MINING
• Big data mining refers to the collective data mining or extraction
techniques that are performed on large sets or volumes of data.
• Big data mining is primarily done to extract and retrieve desired
information or patterns from huge quantities of data.
• Big data mining relies on data searching, refinement, extraction and
comparison algorithms.
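The four steps named above (searching, refinement, extraction and comparison) can be sketched on a toy scale. The example below is an illustration only, not a real big data mining algorithm: the log lines, their format and the function name mine_errors are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical log lines in the form "<date> <LEVEL> <message> on <node>".
LOG_LINES = [
    "2024-01-01 ERROR disk full on node-1",
    "2024-01-01 INFO job finished on node-2",
    "2024-01-02 error Disk Full on node-3",
    "2024-01-02 ERROR timeout on node-1",
]


def mine_errors(lines):
    """Return error messages ranked by how often they occur."""
    # Searching: keep only the lines that mention an error.
    matches = [line for line in lines if "error" in line.lower()]
    # Refinement: normalize case and whitespace so variants compare equal.
    cleaned = [line.lower().strip() for line in matches]
    # Extraction: pull out the message between the level and the node name.
    messages = [
        re.sub(r"^\S+ error (.*) on \S+$", r"\1", line) for line in cleaned
    ]
    # Comparison: rank messages by frequency.
    return Counter(messages).most_common()


print(mine_errors(LOG_LINES))
# prints [('disk full', 2), ('timeout', 1)]
```

On real big data the same steps would run in parallel across many machines, for example with a framework such as Hadoop.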
TOP TRENDS IN BIG DATA
• Four major trends are helping organizations meet the challenges of
big data:
• More data and increased data diversity drive advances in processing and
the rise of edge computing.
• Big data storage needs spur innovations in cloud and hybrid cloud
platforms and the growth of data lakes.
• DataOps and data governance are becoming more prominent.
• Adoption of advanced analytics, machine learning and other AI
technologies increases dramatically.
Introduction to Big Data

  • 1. Introduction to Big Data Big Data & IoT Lecture #2 Umair Shafique (03246441789) Scholar MS Information Technology - University of Gujrat
  • 2. CONTENT • What is Big Data? • What is an example of Big Data? • Why Is Big Data Important? • Big Data Analytics • Benefits of Big Data Analytics • Types of Big Data • Characteristics of Big Data • Primary Source of Big Data • Big Data Tools and Software • Big Data Mining • Top Trends in Big Data
  • 3. WHAT IS BIG DATA? • Big Data is a massive collection of data that is growing exponentially over time. • It is a data set that is so large and complex that traditional data management tools cannot store or process it efficiently. • Big data is a type of data that is extremely large in size.
  • 4. WHAT IS AN EXAMPLE OF BIG DATA? • The following are some Big Data examples: • The New York Stock Exchange generates approximately one terabyte of new trade data per day. • Statistics show that 500+ terabytes of new data are ingested into the databases of the social media site Facebook every day. This data is mainly generated through photo and video uploads, message exchanges, comments, etc. • A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time. With many thousands of flights per day, data generation reaches many petabytes.
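The jet engine figure above can be checked with a quick back-of-the-envelope calculation. The sketch below takes only the 10 TB per 30 minutes figure from the slide; the number of flights per day and the average flight duration are illustrative assumptions, not values from the lecture.

```python
# Rough estimate of daily jet engine data volume.
TB_PER_HALF_HOUR = 10      # from the slide: 10+ TB per 30 minutes of flight
FLIGHTS_PER_DAY = 5_000    # assumption: stands in for "many thousands of flights"
HOURS_PER_FLIGHT = 2       # assumption: average flight duration in hours
TB_PER_PB = 1_000          # decimal units: 1 PB = 1,000 TB


def daily_petabytes(flights, hours_per_flight, tb_per_half_hour=TB_PER_HALF_HOUR):
    """Return the data generated per day, in petabytes, for one engine per flight."""
    tb_per_flight = tb_per_half_hour * 2 * hours_per_flight  # two half-hours per hour
    return flights * tb_per_flight / TB_PER_PB


estimate = daily_petabytes(FLIGHTS_PER_DAY, HOURS_PER_FLIGHT)
print(f"{estimate:.0f} PB per day")
```

Even with these modest assumed numbers the result lands in the hundreds of petabytes per day, which is consistent with the slide's claim.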
  • 5. WHY IS BIG DATA IMPORTANT? • Companies use big data in their systems to improve operations, provide better customer service, create personalized marketing campaigns and take other actions that, ultimately, can increase revenue and profits. • Big data is also used by medical researchers to identify disease signs and risk factors and by doctors to help diagnose illnesses and medical conditions in patients. • In addition, a combination of data from electronic health records, social media sites, the web and other sources gives healthcare organizations and government agencies up-to-date information on infectious disease threats or outbreaks.
  • 6. BIG DATA ANALYTICS • Big data analytics examines large amounts of data to uncover hidden patterns, correlations and other insights. • Big data analytics helps organizations harness their data and use it to identify new opportunities. • That, in turn, leads to smarter business moves, more efficient operations, higher profits and happier customers.
  • 7. BENEFITS OF BIG DATA ANALYTICS • Real-time forecasting and monitoring of business as well as the market. • Identify crucial points hidden within large datasets to influence business decisions. • Identify issues in systems and business processes in real-time. • Dig in customer data to create tailor-made products, services, offers, discounts, etc. • Facilitate speedy delivery of products/services that meet and exceed client expectations.
  • 8. TYPES OF BIG DATA • Following are the types of Big Data: • Structured • Unstructured • Semi-structured
  • 9. STRUCTURED • Structured data refers to data that is already stored in databases in an ordered manner. • There are two sources of structured data: • Human-Generated • Machine-Generated • All the data received from sensors, web logs and financial systems is classified as machine-generated data. • Human-generated structured data mainly includes all the data a human inputs into a computer, such as a name.
  • 10. UN-STRUCTURED • Unstructured data is defined as any data with an unknown form or structure. • Aside from its massive size, unstructured data presents a number of challenges in terms of processing and extracting value from it. • A heterogeneous data source containing a mix of simple text files, images, videos, and so on is an example of unstructured data.
  • 11. SEMI-STRUCTURED • Semi-structured data can contain both types of information. • Semi-structured data appears to be structured, but it is not defined in the same way that a table definition in a relational DBMS is. • A data representation in an XML file is an example of semi-structured data.
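To make the XML example above concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The document content and the `parse_people` helper are hypothetical, made up for illustration. Note that the two records carry different fields, which a fixed relational table definition would not allow without schema changes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: each <person> has tags, but not the same set of tags.
XML_DOC = """
<people>
  <person><name>Asha</name><age>31</age></person>
  <person><name>Bilal</name><email>bilal@example.com</email></person>
</people>
"""


def parse_people(xml_text):
    """Turn each <person> element into a dict of whatever child tags it has."""
    root = ET.fromstring(xml_text.strip())
    return [
        {child.tag: child.text for child in person}
        for person in root.findall("person")
    ]


records = parse_people(XML_DOC)
for record in records:
    print(sorted(record.keys()))
```

The tags give the data some structure, but the set of fields is decided per record rather than by a shared schema.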
  • 13. VOLUME • The name Big Data itself refers to a size that is enormous. • The size of data plays a crucial role in determining the value that can be drawn from it. Whether a particular data set can be considered Big Data also depends on its volume. • Hence, Volume is one characteristic that needs to be considered when dealing with Big Data solutions. • For example: • Organizational data • Social media data
  • 14. VELOCITY • The term ‘velocity’ refers to the speed at which data is generated. • How fast data is generated and processed to meet demand determines its real potential. • Big Data velocity deals with the speed at which data flows in from sources like business processes, application logs, networks, social media sites, sensors, mobile devices, etc. • The flow of data is massive and continuous.
  • 15. VERACITY • When we are dealing with a high volume, velocity and variety of data, not all of the data is going to be 100% correct; there will be dirty data. • The quality of the data being captured can vary greatly. • The accuracy of the analysis depends on the veracity of the source data.
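One simple way to picture "dirty data" is a validation pass that separates plausible records from implausible ones before analysis. The sketch below is only an illustration: the `temp_c` field, the sensor records and the accepted temperature range are all assumptions, not part of the lecture.

```python
def split_clean_dirty(readings, low=-50.0, high=60.0):
    """Split readings into clean and dirty lists based on a plausible value range."""
    clean, dirty = [], []
    for reading in readings:
        value = reading.get("temp_c")
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and low <= value <= high:
            clean.append(reading)
        else:
            dirty.append(reading)  # missing, non-numeric, or out-of-range value
    return clean, dirty


# Hypothetical sensor readings: one missing value and one impossible value.
sample = [
    {"sensor": "a", "temp_c": 21.5},
    {"sensor": "b", "temp_c": None},
    {"sensor": "c", "temp_c": 999.0},
    {"sensor": "d", "temp_c": 19.0},
]

clean, dirty = split_clean_dirty(sample)
dirty_share = len(dirty) / len(sample)
print(f"{len(clean)} clean, {len(dirty)} dirty ({dirty_share:.0%} dirty)")
```

Tracking the share of dirty records gives a rough indicator of how far the results of an analysis on that source can be trusted.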
  • 16. VALUE • Value is the most important aspect of big data. • The potential value of big data is huge. • It is all well and good having access to big data, but unless we can turn it into value it is useless. • It is very costly to implement IT infrastructure systems to store big data, and businesses are going to require a return on investment.
  • 17. VARIETY • Big data is not always structured data and it is not always easy to put big data into a relational database. • This means that the category to which Big Data belongs is also an essential fact that needs to be known for data analysis. • Dealing with a variety of structured and unstructured data greatly increases the complexity of both storing and analyzing Big Data. • 90% of data generated is in unstructured form.
  • 18. PRIMARY SOURCE OF BIG DATA • Primary sources of Big Data are: media, cloud, web, IoT and databases.
  • 19. MEDIA AS A BIG DATA SOURCE • Media is the most popular source of big data, as it provides valuable insights on consumer preferences and changing trends. • Since it is self-broadcasted and crosses all physical and demographic barriers, it is the fastest way for businesses to get an in-depth overview of their target audience, draw patterns and conclusions, and enhance their decision-making. • Media includes social media and interactive platforms, like Google, Facebook, Twitter, YouTube, Instagram, as well as generic media like images, videos, audio, and podcasts that provide quantitative and qualitative insights on every aspect of user interaction.
  • 20. CLOUD AS A BIG DATA SOURCE • Today, companies have moved beyond traditional data sources by shifting their data to the cloud. • Cloud storage accommodates structured and unstructured data and provides businesses with real-time information and on-demand insights. • The main attributes of cloud computing are its flexibility and scalability. • As big data can be stored and sourced on public or private clouds, via networks and servers, the cloud makes for an efficient and economical data source.
  • 21. WEB AS A BIG DATA SOURCE • The public web constitutes big data that is widespread and easily accessible. • Data on the Web or ‘Internet’ is commonly available to individuals and companies alike. • Moreover, web services such as Wikipedia provide free and quick informational insights to everyone. • The enormity of the Web ensures its diverse usability and is especially beneficial to start-ups and SMEs, as they don’t have to wait to develop their own big data infrastructure and repositories before they can leverage big data.
  • 22. IOT AS A BIG DATA SOURCE • Machine-generated content or data created from IoT constitutes a valuable source of big data. • This data is usually generated from the sensors that are connected to electronic devices. • The sourcing capacity depends on the ability of the sensors to provide accurate real-time information. • IoT is now gaining momentum and includes big data generated not only from computers and smartphones, but also possibly from every device that can emit data. • With IoT, data can now be sourced from medical devices, vehicular processes, video games, meters, cameras, household appliances, and the like.
  • 23. DATABASES AS A BIG DATA SOURCE • Businesses today prefer to use a combination of traditional and modern databases to acquire relevant big data. • This integration paves the way for a hybrid data model and requires low investment and IT infrastructure costs. • Furthermore, these databases are deployed for several business intelligence purposes as well. • These databases can then provide for the extraction of insights that are used to drive business profits. • Popular databases include a variety of data sources, such as MS Access, DB2, Oracle, SQL, and Amazon Simple, among others.
  • 24. BIG DATA TOOLS AND SOFTWARE • Hadoop • Atlas.ti • HPCC • Storm • Cassandra • Kaggle • CouchDB • Pentaho
  • 25. BIG DATA MINING • Big data mining refers to the collective data mining or extraction techniques that are performed on large sets or volumes of data, i.e. big data. • Big data mining is primarily done to extract and retrieve desired information or patterns from huge quantities of data. • Big data mining works on data searching, refinement, extraction and comparison algorithms.
  • 26. TOP TRENDS IN BIG DATA • Four major trends in big data are helping organizations meet those challenges. • More data, increased data diversity drive advances in processing and the rise of edge computing. • Big data storage needs spur innovations in cloud and hybrid cloud platforms, growth of data lakes. • DataOps and data governance are becoming more prominent. • Adoption of advanced analytics, machine learning and other AI technologies increases dramatically.
  • 27. TOP TRENDS IN BIG DATA

Editor's Notes

  1. Edge Computing: Edge computing shifts the processing load to the devices themselves before the data is sent to the servers. It optimizes performance and storage by reducing the need for data to flow through networks, which lowers computing and processing costs, especially cloud storage, bandwidth and processing expenses. Edge computing helps to speed up data analysis and provides faster responses to the user.

     Cloud & Hybrid Cloud Computing: To deal with the inexorable increase in data generation, organizations are spending more of their resources storing this data in a range of cloud-based and hybrid cloud systems optimized for all the V's of big data. In previous decades, organizations handled their own storage infrastructure, resulting in massive data centers that enterprises had to manage, secure and operate. The move to cloud computing changed that dynamic. By shifting the responsibility to cloud infrastructure providers, such as AWS, Google, Microsoft and IBM, organizations can deal with almost limitless amounts of new data and pay for storage and compute capability on demand without having to maintain their own large and complex data centers.

     DataOps: One area of innovation is the emergence of DataOps, a methodology and practice that focuses on agile, iterative approaches for dealing with the full lifecycle of data as it flows through the organization. Rather than thinking about data in piecemeal fashion, with separate people dealing with data generation, storage, transportation, processing and management, DataOps processes and frameworks address organizational needs across the data lifecycle from generation to archiving.

     Machine Learning and AI Technologies: No technology has been as revolutionary to big data analytics as machine learning and AI systems. AI is used by organizations of all sizes to optimize and improve their business processes. Machine learning enables them to more easily identify patterns and detect anomalies in large data sets to provide predictive analytics and other advanced data analysis capabilities. This includes recognition systems for image, video and text data; automated classification of information; natural language processing capabilities for chatbots and voice and text analysis; autonomous business process automation; high degrees of personalization and recommendation; and systems that can find optimal solutions among the sea of data.
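The edge computing idea in the notes above, processing on the device before sending data to servers, can be sketched as a device that uploads a small summary instead of every raw reading. This is an illustrative sketch only: the `summarize_on_device` function and the sample readings are hypothetical, and no real upload is performed.

```python
from statistics import mean


def summarize_on_device(readings):
    """Reduce a batch of raw readings to one small summary record before upload."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
    }


# Hypothetical batch of raw sensor readings collected on the device.
raw = [20.0, 21.0, 22.0, 21.0]
summary = summarize_on_device(raw)

# Only the summary would cross the network, not the full batch.
print(f"raw values: {len(raw)}, summary fields: {len(summary)}")
```

With a batch this small the saving is negligible, but the size of the summary stays fixed while the raw batch grows, which is where the bandwidth and storage savings described in the note come from.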