Course – Big Data Analytics (Professional Elective-II)
Course code-IT314B
Unit-I- BIG DATA ANALYTICS LIFE CYCLE
Sanjivani Rural Education Society’s
Sanjivani College of Engineering, Kopargaon-423603
(An Autonomous Institute Affiliated to Savitribai Phule Pune University, Pune)
NAAC ‘A’ Grade Accredited, ISO 9001:2015 Certified
Department of Information Technology
(NBA Accredited)
Mr. Rajendra N Kankrale
Asst. Prof.
BDA- Unit-I BIG DATA ANALYTICS LIFE CYCLE Department of IT
Unit-I BIG DATA ANALYTICS LIFE CYCLE
• Syllabus
• Introduction to Big Data, sources of Big Data, Data Analytic Lifecycle:
Introduction, Phase 1: Discovery, Phase 2: Data Preparation, Phase 3: Model
Planning, Phase 4: Model Building, Phase 5: Communication results, Phase 6:
Operationalize.
Unit-I BIG DATA ANALYTICS LIFE CYCLE
1. Why Big Data analytics?
2. What is Big Data analytics?
3. Lifecycle of Big Data analytics
4. Types of Big Data analytics
5. Tools used in Big Data analytics
6. Big Data application domains
Why Big Data analytics?
Take the music streaming platform Spotify, for example. The company has nearly 96
million users who generate a tremendous amount of data every day. From this
information, the cloud-based platform automatically generates suggested songs,
through a smart recommendation engine, based on likes, shares, search history, and
more. What enables this are the techniques, tools, and frameworks that come out of
Big Data analytics.
If you are a Spotify user, you have probably come across the top recommendations
section, which is based on your likes, listening history, and other signals. It is
produced by a recommendation engine that collects data and then filters it using
algorithms.
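The collect-then-filter idea can be sketched with a toy recommender. This is only an illustration and not Spotify's actual engine: the listening histories, the song names, and the function `recommend` are all hypothetical. It scores songs a user has not liked yet by how many likes that user shares with other listeners.

```python
from collections import Counter

# Hypothetical listening histories (user -> set of liked songs).
LIKES = {
    "asha": {"song_a", "song_b", "song_c"},
    "ben": {"song_a", "song_b", "song_d"},
    "chen": {"song_b", "song_d", "song_e"},
}


def recommend(user, likes, top_n=2):
    """Suggest songs liked by users who share at least one song with `user`."""
    own = likes[user]
    scores = Counter()
    for other, songs in likes.items():
        if other == user:
            continue
        overlap = len(own & songs)  # similarity = number of shared likes
        if overlap == 0:
            continue
        for song in songs - own:  # only songs the user has not liked yet
            scores[song] += overlap
    # Sort by score (descending), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, _ in ranked[:top_n]]


print(recommend("asha", LIKES))
```

Real systems work on far larger data and use more refined similarity measures, but the shape is the same: collect interaction data, then filter it down to a ranked list.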
Uses and Examples of Big Data Analytics
There are many different ways that Big Data analytics can be used in order to improve
businesses and organizations. Here are some examples:
• Using analytics to understand customer behavior in order to optimize the customer
experience
• Predicting future trends in order to make better business decisions
• Improving marketing campaigns by understanding what works and what doesn't
• Increasing operational efficiency by understanding where bottlenecks are and how
to fix them
• Detecting fraud and other forms of misuse sooner
These are just a few examples. How Big Data analytics is applied depends on which
part of the business you want to improve.
What is Big Data analytics?
• What is Big Data?
• Big Data refers to data sets so large or complex that they cannot be stored,
processed, or analyzed using traditional tools.
• Big Data analytics is the process of extracting meaningful insights, such as
hidden patterns, unknown correlations, market trends, and customer
preferences, from that data. Its advantages include better decision making
and the prevention of fraudulent activities, among other things.
Big Data sources
• Today, there are millions of data sources that generate data at a very rapid
rate. These data sources are present across the world. Some of the largest
sources of data are social media platforms and networks. Let’s use
Facebook as an example—it generates more than 500 terabytes of data
every day. This data includes pictures, videos, messages, and more.
• Data also exists in different formats, like structured data, semi-structured
data, and unstructured data. For example, in a regular Excel sheet, data is
classified as structured data—with a definite format. In contrast, emails fall
under semi-structured, and your pictures and videos fall under unstructured
data. All this data combined makes up Big Data.
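The difference between the three formats shows up in how much a program can assume about the data. The sketch below uses only the Python standard library; the sample rows, the email text, and the byte string are made up for illustration.

```python
import csv
import io
from email import message_from_string

# Structured: fixed columns, like a spreadsheet row (sample data is made up).
table = io.StringIO("user_id,country,plays\n1,IN,42\n2,US,17\n")
rows = list(csv.DictReader(table))
total_plays = sum(int(r["plays"]) for r in rows)

# Semi-structured: an email has tagged headers but a free-text body.
raw_email = "From: a@example.com\nSubject: Weekly report\n\nNumbers look good this week."
msg = message_from_string(raw_email)
subject = msg["Subject"]
body = msg.get_payload()

# Unstructured: raw bytes of an image or video carry no field names at all.
fake_image = bytes([0x89, 0x50, 0x4E, 0x47])  # first bytes of a PNG signature
size_in_bytes = len(fake_image)

print(total_plays, subject, size_in_bytes)
```

Structured data can be queried by column name straight away, semi-structured data only through its tags, and unstructured data needs further processing before any field can be extracted.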
Types of Big Data analytics
The following are the four types of Big Data analytics:
1. Prescriptive Analytics (What is the solution?)
2. Diagnostic Analytics (Why did it happen?)
3. Predictive Analytics (What will happen?)
4. Descriptive Analytics (What has happened?)
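The four questions can be illustrated on one small data set. The monthly figures, the naive forecasting rule, and the stocking rule below are all assumptions chosen to keep the sketch short; real projects would use proper statistical or machine learning models for each step.

```python
from statistics import mean

# Hypothetical monthly sales and ad spend for one product.
sales = [100, 110, 120, 90, 130, 140]
ad_spend = [10, 11, 12, 5, 13, 14]

# Descriptive: what has happened?
average_sales = mean(sales)

# Diagnostic: why did it happen? Find the worst month and inspect a likely driver.
worst_month = sales.index(min(sales))
ad_spend_in_worst_month = ad_spend[worst_month]

# Predictive: what will happen? Naive forecast = last value + average monthly change.
changes = [b - a for a, b in zip(sales, sales[1:])]
forecast = sales[-1] + mean(changes)

# Prescriptive: what is the solution? A simple rule on top of the forecast.
action = "increase stock" if forecast > average_sales else "hold stock"

print(average_sales, worst_month, ad_spend_in_worst_month, forecast, action)
```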
Tools used in Big Data analytics
• Apache Spark: Spark is an open-source engine for large-scale batch and
near-real-time (streaming) data analytics. It is often used alongside the
Hadoop ecosystem but can also run on its own.
• Python: Python is one of the most versatile programming languages that
is rapidly being deployed for various applications including machine
learning.
• SAS: SAS is an advanced analytical tool that is used for working with large
volumes of data and deriving valuable insights from it.
• Hadoop: Hadoop is a widely used open-source big data framework for
distributed storage and processing, deployed by a wide range of
organizations around the world for making sense of big data.
• SQL: SQL is used for working with relational database management
systems.
• Tableau: Tableau is a widely used business intelligence tool that is
deployed for data visualization and business analytics.
• Splunk: Splunk is a widely used tool for parsing machine-generated data
and deriving valuable business insights from it.
• R: R is a programming language widely used by data scientists for
statistical computing and graphics.
Cassandra
Apache Cassandra is an open-source, distributed NoSQL database used to store and
retrieve large amounts of data. It is a popular tool for data analytics and is
valued by many tech companies for its high scalability and availability
without compromising speed and performance. It can deliver thousands of
operations every second and handle petabytes of data with almost zero
downtime. It was created at Facebook and released as open source in 2008.
Apache Storm
Apache Storm is a robust, user-friendly tool used for data analytics, especially in
small companies. One of its strengths is that it is not tied to a single
programming language and can be used with any of them. It was designed to
process large volumes of data in a fault-tolerant and horizontally scalable
way. Storm is a distributed real-time big data processing system, which makes
it a strong choice for real-time data processing, and many technology
companies use Apache Storm in their systems. Notable names include Twitter,
Zendesk, and NaviSite.
Applications of Big Data Analytics
• Customer Acquisition and Retention: Customer data helps organizations spot
marketing trends and take data-driven actions that increase customer satisfaction.
For example, the personalization engines of Netflix, Amazon, and Spotify improve
the customer experience and build customer loyalty.
• Targeted Ads: Personalized data about interaction patterns, order history, and
product page viewing history can help immensely to create targeted ad campaigns
for customers on a larger scale and at the individual level.
• Product Development: Big Data analytics can generate insights on development
decisions, product viability, and performance measurements, and direct
improvements that serve customers better.
• Price Optimization: Retailers can build and apply pricing models using diverse
data sources to maximize revenue.
• Supply Chain and Channel Analytics: Predictive analytical models help with
B2B supplier networks, preemptive replenishment, route optimizations, inventory
management, and notification of potential delays in deliveries.
• Risk Management: It helps in the identification of new risks with the help of data
patterns for the purpose of developing effective risk management strategies.
• Improved Decision-making: The insights that are extracted from the data can help
enterprises make sound and quick decisions.
Examples/Areas Using Big Data Analytics Tools
• Healthcare: Big data analytics technologies and tools are being used in healthcare
to predict patient outcomes, identify at-risk patients, and improve population health.
• Retail: Big data analytics tools are being used by retailers to improve customer
experience, target marketing campaigns, and prevent fraud.
• Manufacturing: Big data analytics tools are being used in manufacturing to
improve quality control, reduce downtime, and optimize production processes.
• Banking: Real-time big data analytics tools are being used by banks to detect
fraudulent activities, prevent money laundering, and improve customer service.
• Government: Big data analytics tools are being used by government agencies to
improve public services, combat fraud and corruption, and better understand citizen
needs.
• Life Cycle of Data Analytics
• The Data Analytics Lifecycle was designed to address Big Data problems and data
science projects. The cycle is iterative, which reflects how real projects
proceed. To meet the specific demands of conducting analysis on Big Data, a
step-by-step methodology is required to plan the various tasks associated with
the acquisition, processing, analysis, and reuse of data.
• Phase 1: Discovery –
• The data science team learns about and investigates the problem.
• The team develops context and understanding.
• The team learns which data sources are needed and which are available to the project.
• The team formulates an initial hypothesis, which can later be tested
with data.
• The first phase of the Data Analytics Lifecycle is the data discovery step. This
stage involves identifying potential data sources, both internal and external,
that are relevant to the business problem at hand. It is essential to define the
scope of the analysis and gather data from various databases, applications, and
online repositories. Data can come in different formats, including structured,
unstructured, and semi-structured data.
• The key to success in this phase is to ensure the data collected is accurate,
relevant, and comprehensive. Missing or flawed data can lead to misleading
insights and decisions down the line. Rigorous data quality checks and
validation procedures are necessary to maintain data integrity.
• Phase 2: Data Preparation -
• Once the data is collected, it is crucial to clean and preprocess it before
analysis. Data preparation involves identifying and rectifying errors,
duplications, and inconsistencies in the dataset. This process ensures that the
data is of high quality and ready for further analysis.
• Data preprocessing tasks may include data transformation, normalisation, and
handling missing values. Cleaning and preprocessing are time-consuming but
vital steps that significantly impact the accuracy and reliability of the final
results. Proper data preprocessing can also help in dealing with noise and
irrelevant data, leading to better outcomes.
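The cleaning and preprocessing tasks named above can be sketched in a few lines. The records, the field names, and the choice of mean imputation and min-max normalisation are assumptions made for illustration; other strategies are equally valid depending on the data.

```python
# Hypothetical raw records: one duplicate, one missing age, inconsistent city casing.
raw = [
    {"id": 1, "age": 20, "city": "Pune"},
    {"id": 2, "age": None, "city": "pune "},
    {"id": 1, "age": 20, "city": "Pune"},
    {"id": 3, "age": 40, "city": "MUMBAI"},
]


def prepare(records):
    """Deduplicate by id, fix inconsistencies, impute and min-max normalise age."""
    seen, cleaned = set(), []
    for rec in records:
        if rec["id"] in seen:  # drop duplicate rows
            continue
        seen.add(rec["id"])
        cleaned.append({**rec, "city": rec["city"].strip().title()})

    known = [r["age"] for r in cleaned if r["age"] is not None]
    mean_age = sum(known) / len(known)
    lo, hi = min(known), max(known)
    for r in cleaned:
        if r["age"] is None:  # handle missing values with mean imputation
            r["age"] = mean_age
        # Min-max normalisation maps age onto the range [0, 1].
        r["age_scaled"] = (r["age"] - lo) / (hi - lo) if hi > lo else 0.0
    return cleaned


for row in prepare(raw):
    print(row)
```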
• The team explores, pre-processes, and conditions the data before analysis
and modelling.
• An analytic sandbox is required. The team extracts, loads, and transforms
data to bring it into the sandbox.
• Data preparation tasks may be repeated and need not follow a predetermined sequence.
• Tools commonly used for this process include Hadoop, Alpine Miner, and
OpenRefine.
• Data preparation and processing involves gathering, sorting, processing, and
cleansing the collected information so that it can be used by the subsequent
steps of analysis.
• Data Collection: Drawing information from external sources.
• Data Entry: Within an organization, data entry refers to creating new data
points using either digital technologies or manual input procedures.
• Signal Reception: Accumulating data from digital devices such as Internet of
Things devices and control systems.
• An analytical sandbox is essential during the data preparation stage of the data
analytics lifecycle. This scalable platform is used by data analysts and
data scientists to process their data sets. Once data has been extracted,
loaded, or transformed, it resides securely inside the sandbox for later
examination and modification.
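The role of the sandbox can be sketched with an in-memory SQLite database standing in for it. This is only a small-scale analogy: the table names, the sample rows, and the use of SQLite are assumptions, and a real sandbox would be a much larger, separately provisioned environment.

```python
import sqlite3

# Hypothetical source records, as they might arrive from an operational system.
source_rows = [("2024-01-01", "pune", "120.50"), ("2024-01-01", "mumbai", "80.00"),
               ("2024-01-02", "pune", "99.50")]

# The in-memory database stands in for the sandbox, isolated from production.
sandbox = sqlite3.connect(":memory:")

# Load: copy the raw data into the sandbox unchanged.
sandbox.execute("CREATE TABLE raw_sales (day TEXT, city TEXT, amount TEXT)")
sandbox.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", source_rows)

# Transform: derive a typed, aggregated table inside the sandbox.
sandbox.execute(
    """CREATE TABLE sales_by_city AS
       SELECT city, SUM(CAST(amount AS REAL)) AS total
       FROM raw_sales GROUP BY city"""
)

totals = sandbox.execute("SELECT city, total FROM sales_by_city ORDER BY city").fetchall()
print(totals)
```

Keeping the raw table next to the transformed one lets the team repeat or revise a transformation without going back to the source system.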
• Phase 3: Model Planning -
• The team explores the data to learn the relationships between variables, and
then selects the key variables and the most suitable models.
• In this phase, the team also identifies the data sets that will be used for
training, testing, and production.
• Models are built and executed in the next phase, based on the work completed
in this model planning phase.
• Tools commonly used for this stage include MATLAB and
STATISTICA.
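One simple way to study the relationships between variables is to compute a correlation coefficient between each candidate variable and the target. The sketch below is an illustration under assumptions: the variable names and values are hypothetical, and Pearson correlation is only one of several possible selection criteria.

```python
from math import sqrt

# Hypothetical candidate predictors and a target variable.
data = {
    "ad_spend": [1, 2, 3, 4, 5],
    "temperature": [30, 25, 35, 28, 32],
    "sales": [12, 14, 16, 18, 20],
}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


target = data["sales"]
# Absolute correlation of each candidate with the target variable.
strength = {name: abs(pearson(values, target))
            for name, values in data.items() if name != "sales"}
best_variable = max(strength, key=strength.get)
print(strength, best_variable)
```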
• Phase 4: Model Building -
• The team develops data sets for training, testing, and production use.
• The team builds and executes models based on the work completed in the
model planning phase.
• The team also evaluates whether its existing tools are sufficient to run the
models or whether a more robust environment is required.
• Free or open-source tools: R and PL/R, Octave, WEKA.
• Commercial tools: MATLAB, STATISTICA.
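The split into training and test data, followed by fitting and evaluating a model, can be sketched as below. The data is synthetic and noise-free, and one-variable least squares is chosen only because it fits in a few lines; it is an assumption, not the method prescribed by the lifecycle.

```python
import random

# Hypothetical data following y = 2x + 1 with no noise, to keep the sketch checkable.
points = [(x, 2 * x + 1) for x in range(20)]

# Split into training and test sets (80/20) with a fixed seed for repeatability.
rng = random.Random(0)
shuffled = points[:]
rng.shuffle(shuffled)
train, test = shuffled[:16], shuffled[16:]


def fit_line(pairs):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(train)

# Evaluate on held-out data using mean absolute error.
mae = sum(abs((slope * x + intercept) - y) for x, y in test) / len(test)
print(round(slope, 3), round(intercept, 3), round(mae, 3))
```

Evaluating on data the model did not see during fitting is what makes the error estimate meaningful for production use.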
• Phase 5: Communication Results -
• After executing the model, the team compares its outcomes against the
criteria established for success and failure.
• The team considers how best to present findings and outcomes to the
various team members and other stakeholders, taking into account caveats and
assumptions.
• The team should identify the key findings, quantify their value to
the business, and develop a narrative to summarize and convey the findings to
all stakeholders.
• Phase 6: Operationalize -
• The team communicates the benefits of the project to a wider audience. It sets up a
pilot project to deploy the work in a controlled manner before
expanding it to the full enterprise or user base.
• This approach allows the team to learn about the performance and
constraints of the model in a production setting at a small scale and
then make the necessary adjustments before full deployment.
• The team delivers the final reports, presentations, and code.
• Free or open-source tools used in this phase include WEKA, SQL, MADlib, and Octave.
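A controlled pilot is often implemented by routing only a small, stable share of users to the new model. The sketch below shows one possible way to do this with hashing; the percentage, the function names `in_pilot` and `score`, and the user ids are hypothetical.

```python
import hashlib

PILOT_PERCENT = 10  # hypothetical share of users who see the new model


def in_pilot(user_id, percent=PILOT_PERCENT):
    """Deterministically assign a user to the pilot group by hashing the id."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return bucket < percent


def score(user_id):
    """Route to the new model only for pilot users; others keep the old path."""
    return "new_model" if in_pilot(user_id) else "current_process"


users = [f"user{i}" for i in range(1000)]
pilot_share = sum(in_pilot(u) for u in users) / len(users)
print(pilot_share)
```

Because the assignment depends only on the user id, each user stays in the same group across requests, and raising `PILOT_PERCENT` step by step expands the deployment gradually.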
Lifecycle of Big Data analytics
The Big Data analytics lifecycle can also be divided into nine phases:
1. Business Case/Problem Definition
2. Data Identification
3. Data Acquisition and Filtration
4. Data Extraction
5. Data Munging (Validation and Cleaning)
6. Data Aggregation & Representation (Storage)
7. Exploratory Data Analysis
8. Data Visualization (Preparation for Modeling and Assessment)
9. Utilization of Analysis Results
New techniques for characterising damage in rock slopes.pdf
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdfBPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
 

Unit-I_Big data life cycle.pptx, sources of Big Data

  • 1. Course – Big Data Analytics (Professional Elective-II) Course code-IT314B Unit-II- BIG DATA ANALYTICS LIFE CYCLE Sanjivani Rural Education Society’s Sanjivani College of Engineering, Kopargaon-423603 (An Autonomous Institute Affiliated to Savitribai Phule Pune University, Pune) NAAC ‘A’ Grade Accredited, ISO 9001:2015 Certified Department of Information Technology (NBA Accredited) Mr. Rajendra N Kankrale Asst. Prof.
  • 2. BDA- Unit-I BIG DATA ANALYTICS LIFE CYCLE, Department of IT. Syllabus: Introduction to Big Data, sources of Big Data, Data Analytic Lifecycle: Introduction, Phase 1: Discovery, Phase 2: Data Preparation, Phase 3: Model Planning, Phase 4: Model Building, Phase 5: Communicate Results, Phase 6: Operationalize.
  • 3. Unit-I BIG DATA ANALYTICS LIFE CYCLE: 1. Why Big Data analytics? 2. What is Big Data analytics? 3. Lifecycle of Big Data analytics 4. Types of Big Data analytics 5. Tools used in Big Data analytics 6. Big Data application domains
  • 4. Why Big Data analytics? Take the music streaming platform Spotify, for example. The company has nearly 96 million users who generate a tremendous amount of data every day. From this information, the cloud-based platform automatically generates suggested songs through a smart recommendation engine, based on likes, shares, search history, and more. What enables this are the techniques, tools, and frameworks of Big Data analytics. If you are a Spotify user, you must have come across the top recommendation section, which is based on your likes, listening history, and other signals. It works by using a recommendation engine that leverages data filtering tools, which collect data and then filter it using algorithms. This is what Spotify does.
  • 5. Uses and Examples of Big Data Analytics: Big Data analytics can be used in many ways to improve businesses and organizations. Here are some examples: • Understanding customer behavior to optimize the customer experience • Predicting future trends to make better business decisions • Improving marketing campaigns by understanding what works and what doesn't • Increasing operational efficiency by finding bottlenecks and working out how to fix them • Detecting fraud and other forms of misuse sooner. These are just a few examples; how Big Data analytics helps depends on how you choose to apply it to your business.
  • 6. What is Big Data analytics? • What is Big Data? Big Data refers to data sets so massive that they cannot be stored, processed, or analyzed using traditional tools. • Big Data analytics is a process used to extract meaningful insights, such as hidden patterns, unknown correlations, market trends, and customer preferences. Big Data analytics provides various advantages: it can be used for better decision making and for preventing fraudulent activities, among other things.
  • 7. Big Data sources • Today, there are millions of data sources that generate data at a very rapid rate. These data sources are present across the world. Some of the largest sources of data are social media platforms and networks. Take Facebook as an example: it generates more than 500 terabytes of data every day. This data includes pictures, videos, messages, and more. • Data also exists in different formats: structured, semi-structured, and unstructured. For example, data in a regular Excel sheet is structured data, with a definite format. In contrast, emails are semi-structured, and pictures and videos are unstructured data. All this data combined makes up Big Data.
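The three data formats named on the slide above can be illustrated with a small Python sketch. The sample CSV text, the email, and the byte string are made-up values chosen only for illustration; they are not taken from the slides.

```python
import csv
import io
from email import message_from_string

# Structured: every row follows the same fixed columns (like a spreadsheet).
csv_text = "user_id,country,plays\n1,IN,42\n2,US,17\n"  # hypothetical sample
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: an email has tagged header fields but a free-text body.
raw_email = "From: a@example.com\nSubject: Report\n\nPlease see the attached numbers."
msg = message_from_string(raw_email)
email_record = {
    "from": msg["From"],
    "subject": msg["Subject"],
    "body": msg.get_payload(),
}

# Unstructured: raw bytes of an image or video carry no field structure at all.
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47])  # start of the PNG file signature

print(rows[0]["plays"])
print(email_record["subject"])
print(len(image_bytes))
```

The structured rows can be queried by column name straight away, the email needs a parser to separate its tagged fields from its free text, and the image bytes need specialised processing before any analysis is possible.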
  • 8. Types of Big Data analytics: The following are the four types of big data analytics: 1. Prescriptive Analytics (What is the solution?) 2. Diagnostic Analytics (Why did it happen?) 3. Predictive Analytics (What will happen?) 4. Descriptive Analytics (What has happened?)
  • 9. Tools used in Big Data analytics • Apache Spark: Spark is a framework for real-time data analytics, which is a part of the Hadoop ecosystem. • Python: Python is one of the most versatile programming languages and is rapidly being adopted for various applications, including machine learning. • SAS: SAS is an advanced analytical tool used for working with large volumes of data and deriving valuable insights from it.
  • 10. Tools used in Big Data analytics • Hadoop: Hadoop is the most popular big data framework, deployed by a wide range of organizations around the world for making sense of big data. • SQL: SQL is used for working with relational database management systems. • Tableau: Tableau is the most popular business intelligence tool, deployed for data visualization and business analytics.
  • 11. Tools used in Big Data analytics • Splunk: Splunk is the tool of choice for parsing machine-generated data and deriving valuable business insights from it. • R: R is one of the most widely used programming languages among data scientists for statistical computing and graphics.
  • 12. Tools used in Big Data analytics • Cassandra: Apache Cassandra is an open-source NoSQL distributed database used to store and retrieve large amounts of data. It is one of the most popular tools for data analytics and has been praised by many tech companies for its high scalability and availability without compromising speed and performance. It is capable of delivering thousands of operations every second and can handle petabytes of data with almost zero downtime. It was created at Facebook and released as open source in 2008.
  • 13. Tools used in Big Data analytics • Apache Storm: Storm is a robust, user-friendly tool used for data analytics, especially in small companies. One of its strengths is that it is not tied to a single programming language and can be used with any of them. It was designed to process large volumes of data in a fault-tolerant and horizontally scalable way. Storm is a leading choice for real-time data processing because it is a distributed real-time big data processing system, which is why many tech giants use Apache Storm today. Notable users include Twitter, Zendesk, and NaviSite.
  • 14. Applications of Big Data Analytics • Customer Acquisition and Retention: Customer information helps marketers spot trends and take data-driven actions that increase customer satisfaction. For example, the personalization engines of Netflix, Amazon, and Spotify improve the customer experience and build customer loyalty. • Targeted Ads: Personalized data about interaction patterns, order history, and product page viewing history helps create targeted ad campaigns for customers, both at scale and at the individual level.
  • 15. Applications of Big Data Analytics • Product Development: Analytics can generate insights on development decisions, product viability, performance measurements, etc., and direct improvements that serve customers better. • Price Optimization: Retailers can build and use pricing models, drawing on diverse data sources, to maximize revenue. • Supply Chain and Channel Analytics: Predictive analytical models help with B2B supplier networks, preemptive replenishment, route optimization, inventory management, and notification of potential delivery delays. • Risk Management: Data patterns help identify new risks so that effective risk management strategies can be developed. • Improved Decision-making: The insights extracted from the data help enterprises make sound and quick decisions.
  • 16. Examples/Areas Using Big Data Analytics Tools • Healthcare: Big data analytics technologies and tools are being used in healthcare to predict patient outcomes, identify at-risk patients, and improve population health. • Retail: Big data analytics tools are being used by retailers to improve customer experience, target marketing campaigns, and prevent fraud. • Manufacturing: Big data analytics tools are being used in manufacturing to improve quality control, reduce downtime, and optimize production processes. • Banking: Real-time big data analytics tools are being used by banks to detect fraudulent activities, prevent money laundering, and improve customer service. • Government: Big data analytics tools are being used by government agencies to improve public services, combat fraud and corruption, and better understand citizen needs.
  • 17. Life Cycle of Data Analytics • The data analytics lifecycle was designed to address Big Data problems and data science projects. The cycle is iterative, reflecting how real projects proceed. To address the specific demands of conducting analysis on Big Data, a step-by-step methodology is required to plan the various tasks associated with the acquisition, processing, analysis, and repurposing of data.
  • 18. Life Cycle of Data Analytics
  • 19. Life Cycle of Data Analytics • Phase 1: Discovery. • The data science team learns about and researches the problem. • Create context and gain understanding. • Learn about the data sources that are needed and accessible to the project. • The team comes up with an initial hypothesis, which can later be tested against evidence.
  • 20. Life Cycle of Data Analytics • Phase 1: Discovery. • The first phase of the Data Analytics Lifecycle is the data discovery step. This stage involves identifying potential data sources, both internal and external, that are relevant to the business problem at hand. It is essential to define the scope of the analysis and gather data from various databases, applications, and online repositories. Data can come in different formats, including structured, unstructured, and semi-structured data. • The key to success in this phase is to ensure the data collected is accurate, relevant, and comprehensive. Missing or flawed data can lead to misleading insights and decisions down the line. Rigorous data quality checks and validation procedures are necessary to maintain data integrity.
  • 21. Life Cycle of Data Analytics • Phase 2: Data Preparation. • Once the data is collected, it is crucial to clean and preprocess it before analysis. Data preparation involves identifying and rectifying errors, duplications, and inconsistencies in the dataset. This process ensures that the data is of high quality and ready for further analysis. • Data preprocessing tasks may include data transformation, normalisation, and handling missing values. Cleaning and preprocessing are time-consuming but vital steps that significantly impact the accuracy and reliability of the final results. Proper data preprocessing can also help in dealing with noise and irrelevant data, leading to better outcomes.
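The preparation tasks listed on the slide above (removing duplicates, handling missing values, normalisation) can be sketched in a few lines of Python. The records, the field layout, and the choice of mean imputation and min-max scaling are assumptions made for illustration; the slides do not prescribe specific techniques.

```python
from statistics import mean

# Hypothetical raw records: (customer_id, age, monthly_spend).
# None marks a missing value.
raw = [
    (1, 25, 200.0),
    (2, None, 150.0),
    (1, 25, 200.0),  # exact duplicate of the first record
    (3, 35, 400.0),
]

# 1. Remove exact duplicates while preserving the original order.
deduped = list(dict.fromkeys(raw))

# 2. Handle missing values: fill a missing age with the mean of the known ages.
known_ages = [age for _, age, _ in deduped if age is not None]
fill_age = mean(known_ages)
filled = [
    (cid, age if age is not None else fill_age, spend)
    for cid, age, spend in deduped
]

# 3. Normalise spend to the [0, 1] range (min-max scaling).
spends = [spend for _, _, spend in filled]
lo, hi = min(spends), max(spends)
prepared = [(cid, age, (spend - lo) / (hi - lo)) for cid, age, spend in filled]

for record in prepared:
    print(record)
```

Mean imputation and min-max scaling are only one option each; the right choice depends on the data and on the models planned for later phases.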
  • 22. Life Cycle of Data Analytics • Phase 2: Data Preparation. • The team investigates ways to pre-process, analyse, and prepare data before analysis and modelling. • An analytic sandbox is required. The team extracts, loads, and transforms data to bring it into the sandbox. • Data preparation tasks can be repeated and need not follow a predetermined sequence. • Tools commonly used for this process include Hadoop, Alpine Miner, and OpenRefine.
  • 23. Life Cycle of Data Analytics • Phase 2: Data Preparation. • Data preparation and processing involves gathering, sorting, processing, and cleaning the collected information so that it can be used by subsequent steps of the analysis. • Data Collection: Drawing information from external sources. • Data Entry: Within an organization, creating new data points using either digital technologies or manual input procedures. • Signal Reception: Accumulating data from digital devices such as Internet of Things devices and control systems. • An analytical sandbox is essential during the data preparation stage of the data analytics life cycle. Data analysts and scientists use this scalable platform to process their data sets; once data has been extracted, loaded, or transformed, it resides securely inside the sandbox for later examination and modification.
  • 24. Life Cycle of Data Analytics • Phase 3: Model Planning. • The team studies the data to discover the relationships between variables. It then selects the most significant variables and the most suitable models. • In this phase, the data science team creates data sets that can be used for training, testing, and production purposes. • The team then builds and implements models based on the work completed in the model planning phase. • Tools commonly used for this stage include MATLAB and STATISTICA.
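One simple way to explore the relationships between variables during model planning is to rank candidate variables by their correlation with the target. The sketch below uses the Pearson correlation coefficient; the variable names and values are hypothetical, and correlation is only one of several possible selection criteria, not the method the slides mandate.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical target (weekly sales) and candidate explanatory variables.
target = [10, 20, 30, 40, 50]
candidates = {
    "ad_spend": [1, 2, 3, 4, 5],
    "discount": [5, 4, 3, 2, 1],
    "store_temp": [3, 1, 4, 1, 5],
}

# Score each candidate, then rank by strength of the linear relationship.
scores = {name: pearson(values, target) for name, values in candidates.items()}
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)

for name in ranked:
    print(f"{name}: r = {scores[name]:+.3f}")
```

A strong negative correlation (as with `discount` here) is just as informative as a strong positive one, which is why the ranking uses the absolute value.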
  • 26. Life Cycle of Data Analytics • Phase 4: Model Building. • The team creates datasets for training, testing, and production use. • The team also evaluates whether its current tools are sufficient to run the models or whether a more robust environment is required. • Free or open-source tools: R and PL/R, Octave, WEKA. • Commercial tools: MATLAB, STATISTICA.
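Creating separate training and testing datasets, fitting a model on one and checking it on the other, can be sketched as follows. The synthetic data, the 80/20 split ratio, the fixed random seed, and the choice of a least-squares line as the model are all assumptions made for this illustration.

```python
import random

# Hypothetical data set: points on the line y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(10)]

# Shuffle with a fixed seed so the split is reproducible, then split 80/20.
rng = random.Random(42)
shuffled = data[:]
rng.shuffle(shuffled)
split = int(0.8 * len(shuffled))
train, test = shuffled[:split], shuffled[split:]


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sum(
        (x - mean_x) ** 2 for x, _ in points
    )
    return slope, mean_y - slope * mean_x


# Fit on the training set only, then measure error on the held-out test set.
slope, intercept = fit_line(train)
mae = sum(abs(y - (slope * x + intercept)) for x, y in test) / len(test)

print(f"train size: {len(train)}, test size: {len(test)}")
print(f"mean absolute error on test set: {mae:.6f}")
```

Keeping the test set out of the fitting step is what makes the reported error an honest estimate of how the model behaves on unseen data.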
  • 27. Life Cycle of Data Analytics • Phase 5: Communicate Results. • After executing the model, the team evaluates its outcomes against the criteria for success or failure. • The team considers how best to present findings and outcomes to team members and other stakeholders, taking caveats and assumptions into account. • The team should determine the most important findings, quantify their value to the business, and create a narrative that summarizes and presents the findings to all stakeholders.
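Evaluating model outcomes against success criteria can be as simple as comparing measured metrics with agreed thresholds. In the sketch below, the metric names, thresholds, and measured values are all hypothetical, and it assumes the criteria were agreed before the model was run.

```python
# Hypothetical success criteria (minimum acceptable value per metric)
# and the metrics measured after running the model.
criteria = {"accuracy": 0.85, "recall": 0.70}
measured = {"accuracy": 0.91, "recall": 0.64}

# A metric passes when the measured value meets or exceeds its threshold.
results = {name: measured[name] >= threshold for name, threshold in criteria.items()}
overall_success = all(results.values())

# Short summary that could feed a stakeholder report.
for name, passed in results.items():
    status = "met" if passed else "not met"
    print(f"{name}: {measured[name]:.2f} (target {criteria[name]:.2f}) {status}")
print("overall:", "success" if overall_success else "needs further work")
```

Reporting each criterion separately, rather than only the overall verdict, gives stakeholders the caveats along with the headline result.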
  • 28. Life Cycle of Data Analytics • Phase 6: Operationalize. • The team communicates the benefits of the project to a wider audience. It sets up a pilot project that deploys the work in a controlled manner before expanding it to the full enterprise of users. • This approach allows the team to learn about the performance and constraints of the model in a production setting at a small scale and then make the necessary adjustments before full deployment. • The team delivers the final reports, presentations, and code. • Open-source or free tools used in this phase include WEKA, SQL, MADlib, and Octave.
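A controlled pilot before full deployment is often implemented by routing a fixed percentage of users to the new model. The sketch below shows one common way to do this with a stable hash; the function name `in_pilot`, the user IDs, and the 10% pilot size are hypothetical and not taken from the slides.

```python
import hashlib


def in_pilot(user_id: str, pilot_percent: int) -> bool:
    """Return True if this user falls in the pilot group.

    Hashing the user ID gives each user a stable bucket from 0 to 99,
    so the same user always receives the same decision.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < pilot_percent


# Hypothetical user IDs; roughly 10% should land in the pilot.
users = [f"user-{i}" for i in range(1000)]
pilot_users = [u for u in users if in_pilot(u, 10)]
print(f"{len(pilot_users)} of {len(users)} users are in the pilot")
```

Because the bucket depends only on the user ID, raising `pilot_percent` later keeps every existing pilot user in the pilot and only adds new ones, which suits a gradual expansion to the full user base.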
  • 29. Lifecycle of Big Data analytics: The Big Data analytics life cycle is divided into nine phases: 1. Business Case/Problem Definition 2. Data Identification 3. Data Acquisition and Filtration 4. Data Extraction 5. Data Munging (Validation and Cleaning) 6. Data Aggregation & Representation (Storage) 7. Exploratory Data Analysis 8. Data Visualization (Preparation for Modeling and Assessment) 9. Utilization of Analysis Results