Data Analytics
An Introduction to the Principles – Developed for
NCS in 2023
Table of Contents
• What is data?
• Types of Data
• What is data Analytics and its importance
• Types of data analytics
• The Data ecosystem
• Data Preparation
• Data Mining
• Planning Data Analysis
What we will not do in
this programme!
1. We shall not write any code
2. We shall not manually
calculate anything
3. We shall not go beyond
shallow interpretations of
results
This thing called data! What is it?
What is
data?
According to Webster's Dictionary:
• Factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation
• Information in digital form that can be transmitted or processed
• Data is a collection of facts, such as
numbers, words, measurements,
observations or just descriptions of
things.
It is important to note that this is not just numbers;
but text, images, videos, sound clips etc.
What is data?
Data are collected observations or measurements represented as text, numbers, or multimedia.
Raw facts that have not been processed to explain their meaning
What is data? Data can be..
Numbers/digits Field notes Audio recording Documents
Transcripts Videos Images Meeting notes
Exercise
Match the proper type of dataset to the problem:
• Problem I – HS classification
• Problem II – Pre-arrival information
• Problem III – Biometric images
• Problem IV – Management meeting minutes
Categories of
Data
Qualitative Data
Quantitative
Data
Quantitative
Data
• Quantitative data can be expressed as a number, counted, or compared
• Information can be counted and expressed
numerically
• Can easily be represented visually in tables and
graphs
Quantitative Data
• Discrete: finite values; space between values
• Continuous: values fall on a continuum; usually a physical measure
Qualitative
Data
• Provide depth of understanding
• Add ‘feel’ or ‘texture’ to quantitative findings
• May define a problem
• May generate new ideas for research or
intervention
• Very useful in programme evaluation
Exercise
Quantitative and Qualitative Data Examples
What is Data
Analytics?
• Data analytics is the process of analyzing
raw data in order to draw out meaningful,
actionable insights
• It is the process of examining data sets in order
to find trends and draw conclusions about
the information they contain
Why is Data analytics
important?
Informed Decision Making
Analyzing data is very advantageous from a management standpoint because it enables you to base judgments on logic rather than gut feeling.
For instance, you can identify opportunities for growth, forecast your revenue, or deal with unusual circumstances before they become issues. In this way, you can gather pertinent insights from every department in your company and, with the aid of dashboard software, display the data to various stakeholders in a polished and interactive manner.
Reduced Cost
Cost savings are another major advantage. Businesses can identify opportunities for improvement, trends, and patterns in their data with the use of cutting-edge technologies like predictive analytics, and then adjust their strategies accordingly. This will eventually enable you to avoid wasting money and resources on misguided strategies. You can also anticipate production and supply needs by forecasting scenarios such as sales and demand.
Targeted Campaigns
• Perhaps the most important factor in every business is the customers. You can discover the
channels your consumers use to communicate with you, their demographics, interests, habits,
purchase behaviors, and more by using analytics to gain a 360° perspective of all elements of your
customers. In the long run, it will help your marketing initiatives succeed, enable you to find new
potential clients, and prevent you from wasting money on mistargeting or miscommunicating.
You can monitor customer satisfaction by examining feedback from clients or the effectiveness of
your customer service division.
Enhance Productivity
With business analytics tools, organizations can gain a more profound understanding of the primary and secondary data emerging from their activities. This helps businesses refine their procedures further and become more productive.
Other benefits
• Improve operational efficiency in daily activities.
• Assist businesses in understanding their customers more precisely.
• Use data visualization to offer projections of future outcomes.
• These insights help in decision-making and planning for the future.
• Business analytics measures performance and drives growth.
• Discover hidden trends, generate leads, and scale the business in the right direction.
Types of Data analytics
Descriptive Data Analytics
Descriptive analytics is the most basic type of analytics and serves as the foundation for all others. It
enables you to extract trends from raw data and describe what happened or is happening in a
concise manner.
Performing descriptive analysis is essential, as it allows us to present our insights in a meaningful
way. Although it is relevant to mention that this analysis on its own will not allow you to predict
future outcomes or tell you the answer to questions like why something happened, it will leave your
data organized and ready to conduct further investigations.
Descriptive Data Analytics - Scenario
A report showing sales of N 900 million may sound impressive, but it lacks context. If that figure represents a 20% month-
over-month decline, it is a concern. If it is a 40% year-over-year increase, then it suggests something is going right with the
sales strategy. However, the larger context including targeted growth is required to obtain an informed view of the
company's sales performance.
Diagnostic Data Analytics
Diagnostic data analytics empowers analysts and executives by helping them gain a firm
contextual understanding of why something happened. If you know why something
happened as well as how it happened, you will be able to pinpoint the exact ways of
tackling the issue or challenge.
This is one of the world's most important research methods, designed to provide direct and
actionable answers to specific questions.
Diagnostic Data Analytics - Scenario
Human resource departments can collect data on employees' physical and psychological safety, issues they care about, and the
qualities and skills that make someone successful and happy. Many of these insights come from conducting internal, anonymous
surveys and exit interviews to identify factors that influenced employees' decisions to stay or leave.
Gathering information about employees' thoughts and feelings allows you to analyze the data and determine how to improve areas
such as company culture and benefits. This can range from wishing the company made more corporate social responsibility (CSR)
contributions to experiencing workplace discrimination. In these cases, the data makes a case for allocating more resources to CSR
and efforts to promote diversity, equity, inclusion, and belonging.
Predictive Data Analytics
• The predictive method allows you to predict what will happen in the future. It does this by
combining the results of the previously mentioned descriptive, exploratory, and diagnostic
analyses with machine learning (ML) and artificial intelligence (AI). You can discover future trends,
potential problems or inefficiencies, connections, and causalities in your data in this manner.
• You can use predictive analysis to develop initiatives that will not only improve your various
operational processes, but will also help you gain an important competitive advantage. You will
be able to develop an informed projection of how things may unfold in specific areas of the
business if you understand why a trend, pattern, or event occurred through data.
Predictive Data Analytics - Scenario
Consumer data is abundant in marketing and is used to create content, advertisements, and
strategies to better reach potential customers where they are. Predictive analytics is the process
of analyzing historical behavioral data and using it to predict what will happen in the future.
Furthermore, historical behavioral data can help predict a lead's likelihood of progressing from
awareness to purchase. For example, you could use a single linear regression model to determine
that the number of content offerings a lead engages with predicts their likelihood of converting
to a customer with a statistically significant level of certainty. With this information, you can
create targeted ads at various stages of the customer's lifecycle.
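As an optional illustration (not something we will do in this programme), the regression idea in this scenario could be sketched in Python. Everything below (the engagement counts, the conversion flags, and the choice of the statsmodels library) is hypothetical and only shows the shape of such an analysis.

```python
# Hypothetical sketch: does the number of content offerings a lead engages
# with predict whether they convert? All figures below are made up.
import numpy as np
import statsmodels.api as sm

content_engagements = np.array([1, 2, 2, 3, 4, 4, 5, 6, 7, 8])  # pieces of content viewed
converted = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])            # 1 = became a customer

X = sm.add_constant(content_engagements)  # add the intercept term
model = sm.OLS(converted, X).fit()        # simple (single) linear regression

print(model.params)   # intercept and slope
print(model.pvalues)  # a small p-value for the slope suggests statistical significance
```

A positive, statistically significant slope would support targeting leads with more content at the right stage of the customer lifecycle.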
Prescriptive Data Analytics
• Prescriptive analytics is another of the most effective types of analysis methods. Prescriptive techniques are similar to predictive analysis in that they use patterns or trends to develop responsive, practical business strategies.
• You will play an active role in the data consumption process by taking well-organized sets of visual data and using them as a powerful fix for emerging issues in several key areas by drilling down into prescriptive analysis.
Data ecosystem
What is the Data
Ecosystem
The term data ecosystem refers to the programming
languages, packages, algorithms, cloud computing services,
and overall infrastructure that an organization uses to
collect, store, analyze, and leverage data.
No two organizations leverage the same data in the
same way. As such, each organization has a unique
data ecosystem. These ecosystems may overlap in
some cases, particularly when data is pulled or
scraped from a public source, or when third-party
providers are leveraged (for example, cloud storage
providers).
We shall look at the ecosystem via the Data Project
Life Cycle
The data project life cycle
Sensing Collection Wrangling Analysis
Sensing
The process of identifying data sources for your project is referred to as sensing. It entails assessing the quality of data in order to determine its usefulness. This evaluation includes questions such as:
• Is the data accurate?
• Is the data current and up to date?
• Is the data complete?
• Can the data be relied on?
Internal sources of data include databases, spreadsheets, CRMs, and other software. It can
also come from outside sources like websites or third-party data aggregators.
Sensing – Aspects of Data Ecosystem leveraged
• Internal data sources: Proprietary databases, spreadsheets, and other
resources that originate from within your organization
• External data sources: Databases, spreadsheets, websites, and other data
sources that originate from outside your organization
• Software: Custom software that exists for the sole purpose of data sensing
• Algorithms: A set of steps or rules that automates the process of evaluating
data for accuracy and completion before it’s used
Collection
• Data collection can be completed manually or automatically. However, it is generally not
feasible to collect large amounts of data manually. That is why data scientists write
software in programming languages to automate the data collection process.
• For example, it is possible to write code to "scrape" relevant information from a website
(aptly named a web scraper). It is also possible to create and code an application
programming interface, or API, to directly extract information from a database or interact
with a web application.
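A minimal sketch of automated collection through an API might look like the following. The endpoint URL, query parameters, and field layout are placeholders, not a real service.

```python
# Hypothetical sketch: pull records from a REST API and save them for later analysis.
import csv
import requests

response = requests.get(
    "https://api.example.com/v1/transactions",          # placeholder endpoint
    params={"from": "2023-01-01", "to": "2023-01-31"},   # placeholder query parameters
    timeout=30,
)
response.raise_for_status()   # stop if the request failed
records = response.json()     # assume the API returns a list of JSON records

with open("transactions.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=records[0].keys())
    writer.writeheader()
    writer.writerows(records)  # store the collected data as a flat CSV file
```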
Collection – Aspects of Data Ecosystem
leveraged
• Key pieces of the data ecosystem leveraged in this stage include:
• Various programming languages: These include R, Python, SQL, and
JavaScript
• Code packages and libraries: Existing code that’s been written and
tested and allows data scientists to generate programs more quickly and
efficiently
• APIs: Software programs designed to interact with other applications and
extract data
Wrangling
• Data wrangling is a collection of processes
used to convert raw data into a more usable
format. It may involve merging multiple
datasets, identifying and filling gaps in data,
deleting unnecessary or incorrect data, and
"cleaning" and structuring data for future
analysis, depending on the quality of the data
in question.
• Data wrangling, like data collection, can be
done manually or automatically. Manual
processes can be effective if the dataset is
small enough. Most larger data projects
require automation because the amount of
data is too large.
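A minimal data wrangling sketch with the pandas library is shown below; the file names and column names are hypothetical and simply illustrate merging, de-duplicating, and gap-filling.

```python
# Hypothetical wrangling sketch: merge two raw data sets and clean them with pandas.
import pandas as pd

sales = pd.read_csv("sales.csv")          # e.g. columns: customer_id, amount, date
customers = pd.read_csv("customers.csv")  # e.g. columns: customer_id, region

df = sales.merge(customers, on="customer_id", how="left")  # combine the data sets
df = df.drop_duplicates()                                  # remove duplicate rows
df["amount"] = df["amount"].fillna(0)                      # fill gaps in a numeric field
df["date"] = pd.to_datetime(df["date"], errors="coerce")   # standardize the date format

df.to_csv("sales_clean.csv", index=False)                  # hand the cleaned data to analysis
```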
Wrangling –
Aspects of
Data
Ecosystem
leveraged
• Key pieces of the data ecosystem
leveraged in this stage include:
• Algorithms: A series of steps or rules to
be followed to solve a problem (in this
case, the evaluation and manipulation of
data)
• Various programming languages: These
include R, Python, SQL, and JavaScript, and
can be used to write algorithms
• Data wrangling tools: A variety of data
wrangling tools can be purchased or
sourced for free to perform parts of the
data wrangling process. OpenRefine,
DataWrangler, and CSVKit are all
examples.
Analysis
• Raw data can be analyzed after it has been
inspected and transformed into a usable state.
This analysis can be diagnostic, descriptive,
predictive, or prescriptive, depending on the
specific challenge your data project seeks to
address. Although each of these types of
analysis is distinct, they all rely on the same
processes and tools.
• Typically, your analysis will begin with some
form of automation, especially if your dataset
is extremely large. Following the completion of
automated processes, data analysts apply
their expertise to glean additional insights.
Analysis –
Aspects of
Data
Ecosystem
Leveraged
• Key pieces of the data ecosystem
leveraged in this stage include:
• Algorithms: A series of steps or rules
to be followed to solve a problem (in
this case, the analysis of various data
points)
• Statistical models: Mathematical
models used to investigate and
interpret data
• Data visualization tools: Tableau, Microsoft Power BI, and Google Charts can generate graphical representations of data. Data visualization software may also have other functionality you can leverage.
Storage
Throughout all the data life cycle
stages, data must be stored in a way
that’s both secure and accessible. The
exact medium used for storage is
dictated by your organization’s policies.
Storage - Aspects of Data Ecosystem
Leveraged
• Key pieces of the data ecosystem leveraged in this stage
include:
• Cloud-based storage solutions: These allow an
organization to store data off-site and access it remotely
• On-site servers: These give organizations a greater sense
of control over how data is stored and used
• Other storage media: These include hard drives, USB devices, and cloud drives
Some Data Formats
Types of Data Used in
Data Analytics
Tabular Data and
Flat files
Multimedia Files
Some Tabular file formats
1. xlsx
2. csv
3. tsv
4. txt
5. json
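As an optional sketch, each of the formats listed above can be loaded with the pandas library in Python; the file names below are placeholders.

```python
# Hypothetical sketch: reading the tabular formats listed above with pandas.
import pandas as pd

df_xlsx = pd.read_excel("data.xlsx")              # Excel workbook (requires the openpyxl package)
df_csv  = pd.read_csv("data.csv")                 # comma-separated values
df_tsv  = pd.read_csv("data.tsv", sep="\t")       # tab-separated values
df_txt  = pd.read_csv("data.txt", delimiter=";")  # plain text with some delimiter
df_json = pd.read_json("data.json")               # JSON records

print(df_csv.head())  # quick look at the first rows
```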
Data Gathering
What is data
collection?
Data collection is the process of gathering data
for use in business decision-making, strategic
planning, research and other purposes. It's a
crucial part of data analytics applications and
research projects: Effective data collection
provides the information that's needed to answer
questions, analyze business performance or
other outcomes, and predict future trends, actions
and scenarios.
Data Collection in
organizations
• In businesses, data is collected at different levels. Examples include:
• IT systems that regularly collect data on customers, employees, sales, and other aspects of business operations as transactions are processed and data is entered
• Surveys (including from social media)
• Internal data systems and external data sources (usually used for applications and business intelligence (BI))
Data
Collection
for
Research
For research in science, medicine,
higher education and other fields, data
collection is often a more specialized
process, in which researchers create
and implement measures to collect
specific sets of data. In both the
business and research contexts,
though, the collected data must be
accurate to ensure that analytics
findings and research results are valid.
Data Collection
Methods
The methods used to collect data vary based on the type
of application. Some involve the use of technology,
while others are manual procedures. The following are
some common data collection methods:
Data
Collection
Methods
1. Automated data collection functions
built into business applications,
websites and mobile apps
2. Sensors that collect operational data
from industrial equipment, vehicles
and other machinery;
3. Collection of data from information
services providers and other external
data sources
Data
Collection
Methods
4. Tracking social media, discussion forums, reviews sites, blogs and other online channels;
5. Surveys, questionnaires and forms, done online, in person or by phone, email or regular mail;
6. Focus groups and one-on-one interviews;
7. Direct observation of participants in a research study
Data
Collection
Process
1. Identify a business or research issue that needs to be addressed and set goals for the project.
2. Gather data requirements to answer the business question or deliver the research information.
3. Identify the data sets that can provide the desired information.
4. Set a plan for collecting the data, including the collection methods that will be used.
5. Collect the available data and begin working to prepare it for analysis.
Challenges with Data Collection
• Data quality issues. Raw data typically includes errors, inconsistencies
and other issues. Ideally, data collection measures are designed to avoid
or minimize such problems. That is not foolproof in most cases, though. As
a result, collected data usually needs to be put through data profiling to
identify issues and data cleaning to fix them.
• Finding relevant data. With a wide range of systems to navigate,
gathering data to analyze can be a complicated task for data scientists and
other users in an organization. The use of data curation techniques helps
make it easier to find and access data. For example, that might
include creating a data dictionary and searchable indexes.
Challenges with Data Collection
• Deciding what data to collect. This is a fundamental issue both for upfront collection of
raw data and when users gather data for analytics applications. Collecting data that isn't
needed adds time, cost and complexity to the process. But leaving out useful data can
limit a data set's business value and affect analytics results.
• Dealing with big data. Big data environments typically include a combination of
structured, unstructured and semistructured data, in large volumes. That makes the initial
data collection and processing stages more complex. In addition, data scientists often
need to filter sets of raw data stored in a data lake for specific analytics applications.
• Low response and other research issues. In research studies, a lack of responses or
willing participants raises questions about the validity of data that's collected. Other
research challenges include training people to collect the data and creating sufficient
quality assurance procedures to ensure that the data is accurate.
Data Collection Consideration and Best
Practices
• The European Union's General Data Protection Regulation (GDPR) and Nigerian Data
Protection Regulation (NDPR) enacted in recent years make data privacy and security
bigger considerations when collecting data, particularly if it contains personal information
about customers. An organization’s data governance program should include policies to
ensure that data collection practices comply with laws such as the GDPR and NDPR.
• Other data collection best practices include the following:
• Make sure you collect the right data to meet business or research needs.
• Ensure that the data is accurate, either as it's collected or as part of the data preparation
process.
• Don't waste time and resources collecting irrelevant data.
Data Preparation
Data preparation is the process of gathering, combining,
structuring and organizing data so it can be used in
business intelligence (BI), analytics, data visualization
applications and Machine Learning
Components of
Data
Preparation
• Data preprocessing
• Profiling
• Cleansing
• Validation
• Transformation
• Pulling together data from different internal systems and external sources
Why Data Prep?
Data is commonly created with missing values, inaccuracies or
other errors, and separate data sets often have different formats
that need to be reconciled when they're combined. Correcting
data errors, validating data quality and consolidating data sets
are big parts of data preparation projects.
Why Data Preparation
• Ensure the data used in analytics
applications produces reliable results;
• Identify and fix data issues that
otherwise might not be detected;
• Enable more informed decision-making
by business executives and operational
workers;
• Reduce data management and analytics
costs;
• Avoid duplication of effort in preparing
data for use in multiple applications; and
• Get a higher ROI from BI, analytics and
ML initiatives.
Data Preparation
Steps
1. Data collection. Relevant data is gathered from operational systems, data warehouses, data
lakes and other data sources. During this step, data scientists, members of the BI team, other
data professionals and end users who collect data should confirm that it's a good fit for the
objectives of the planned analytics applications.
2. Data discovery and profiling. The next step is to explore the collected data to better
understand what it contains and what needs to be done to prepare it for the intended uses. To
help with that, data profiling identifies patterns, relationships and other attributes in the data,
as well as inconsistencies, anomalies, missing values and other issues so they can be
addressed.
3. Data cleansing. Next, the identified data errors and issues are corrected to create complete
and accurate data sets. For example, as part of cleansing a data set, faulty data is removed or fixed, missing values are filled in, and inconsistent entries are harmonized.
Data Preparation
Steps
4. Data structuring. At this point, the data needs to be modeled and organized to meet the analytics requirements. For example, data stored in comma-separated values (CSV) files or other file formats has to be converted into tables to make it accessible to BI and analytics tools.
5. Data transformation and enrichment. In addition to being structured, the data typically must be transformed into a unified and usable format. For example, data transformation may involve creating new fields or columns that aggregate values from existing ones. Data enrichment further enhances and optimizes data sets as needed, through measures such as augmenting and adding data.
6. Data validation and publishing. In this last step, automated routines are run against the data to validate its consistency, completeness and accuracy. The prepared data is then stored in a data warehouse, a data lake or another repository and either used directly by whoever prepared it or made available for other users to access.
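As an optional sketch of the validation-and-publishing step, simple automated checks could be written with pandas; the column names and rules below are hypothetical.

```python
# Hypothetical sketch: automated validation checks run before publishing a prepared data set.
import pandas as pd

df = pd.read_csv("prepared_sales.csv")

checks = {
    "no missing customer ids": df["customer_id"].notna().all(),
    "amounts are non-negative": (df["amount"] >= 0).all(),
    "no duplicate rows": not df.duplicated().any(),
    "dates parse correctly": pd.to_datetime(df["date"], errors="coerce").notna().all(),
}

for rule, passed in checks.items():
    print(f"{rule}: {'PASS' if passed else 'FAIL'}")

if all(checks.values()):
    df.to_csv("published_sales.csv", index=False)  # publish only if every check passes
```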
Challenges with Data Preparation
• Inadequate or nonexistent data profiling. If
data isn't properly profiled, errors, anomalies
and other problems might not be identified,
which can result in flawed analytics.
• Missing or incomplete data. Data sets often
have missing values and other forms of
incomplete data; such issues need to be
assessed as possible errors and addressed if
so.
• Invalid data values. Misspellings, other
typos and wrong numbers are examples of
invalid entries that frequently occur in data
and must be fixed to ensure analytics
accuracy.
• Name and address
standardization. Names and addresses may
be inconsistent in data from different
systems, with variations that can affect views
of customers and other entities.
Challenges with
Data Preparation
• Inconsistent data across enterprise systems. Other inconsistencies in data
sets drawn from multiple source systems, such as different terminology and
unique identifiers, are also a pervasive issue in data preparation efforts.
• Data enrichment. Deciding how to enrich a data set -- for example, what to add to it -- is a complex task that requires a strong understanding of business needs and analytics goals.
• Maintaining and expanding data prep processes. Data preparation work
often becomes a recurring process that needs to be sustained and enhanced
on an ongoing basis.
Data
Preparation
Guide and
Best practice
1. Think of data preparation as part of data
analysis. Data preparation and analysis are
"two sides of the same coin," Farmer wrote.
Data, he said, can't be properly prepared
without knowing what analytics use it needs
to fit.
2. Define what data preparation success
means. Desired data accuracy levels and
other data quality metrics should be set as
goals, balanced against projected costs to
create a data prep plan that's appropriate to
each use case.
3. Prioritize data sources based on the
application. Resolving differences in data
from multiple source systems is an important
element of data preparation that also should
be based on the planned analytics use case.
Data
Preparation
Guide and
Best practice
4. Use the right tools for the job and your skill
level. Self-service data preparation tools aren't
the only option available -- other tools and
technologies can also be used, depending on
your skills and data needs.
5. Be prepared for failures when preparing
data. Error-handling capabilities need to be
built into the data preparation process to
prevent it from going awry or getting bogged
down when problems occur.
6. Keep an eye on data preparation costs. The
cost of software licenses, processing and
storage resources, and the people involved in
preparing data should be watched closely to
ensure that they don't get out of hand.
Data Validation
Data validation is the practice of checking the integrity,
accuracy and structure of data before it is used for a business
operation. It can also be used to ensure the integrity of data
for financial accounting or regulatory compliance.
Data Validation in
Excel
We will practice data validation in Excel.
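For readers who later want to automate the same idea outside the Excel interface, a data-validation rule can also be created programmatically with the third-party openpyxl package. The sheet layout and allowed values below are hypothetical.

```python
# Hypothetical sketch: adding an Excel data-validation rule with openpyxl.
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = Workbook()
ws = wb.active
ws["A1"] = "Region"

# Allow only entries from a fixed list in column A
dv = DataValidation(type="list", formula1='"North,South,East,West"', allow_blank=True)
dv.error = "Please choose a region from the list."
ws.add_data_validation(dv)
dv.add("A2:A100")

wb.save("validation_demo.xlsx")
```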
Data Mining
Data mining is the process of sorting through large data sets to
identify patterns and relationships that can help solve business
problems through data analysis. It’s a key part of data analytics and it
is one of the key disciplines of Data Science.
Importance of
Data Mining
Effective data mining aids in various aspects of planning business strategies and
managing operations. That includes customer-facing functions such as marketing,
advertising, sales and customer support, plus manufacturing, supply chain
management, finance and HR. Data mining supports fraud detection, risk
management, cyber security planning and many other critical business use
cases. It also plays an important role in healthcare, government, scientific
research, mathematics, sports and more.
Data
Mining
Process
1. Data gathering: Relevant data for an
analytics application is identified and
assembled. The data may be in different
source systems.
2. Data Preparation: This stage includes a
set of steps to get the data ready to be
mined. It starts with data exploration,
profiling and pre-processing, followed by
data cleansing work to fix errors and
other issues. Data transformation is also
done to make data sets consistent,
unless a data scientist is looking to
analyze unfiltered raw data for a
particular application.
Data
Mining
Process
3. Mining the data. Once the data is
prepared, a data analyst chooses the
appropriate data mining technique and
then implements one or more algorithms
to do the mining. In machine learning
applications, the algorithms typically must
be trained on sample data sets to look for
the information being sought before
they're run against the full set of data.
4. Data analysis and interpretation. The
data mining results are used to create
analytical models that can help drive
decision-making and other business
actions.
Some Data Mining
Techniques
• Association rule mining. In data mining, association rules are if-then
statements that identify relationships between data elements. Support and
confidence criteria are used to assess the relationships -- support measures
how frequently the related elements appear in a data set, while confidence reflects how often the if-then statement holds true.
• Classification. This approach assigns the elements in data sets to different
categories defined as part of the data mining process. Decision trees, Naive
Bayes classifiers, k-nearest neighbor and logistic regression are some
examples of classification methods.
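A minimal sketch of the classification technique above, using a decision tree from scikit-learn on a tiny made-up data set (the features, labels, and thresholds are all hypothetical):

```python
# Hypothetical sketch: classify transactions as flagged (1) or normal (0) with a decision tree.
from sklearn.tree import DecisionTreeClassifier

# Each row: [transaction amount, hour of day]
X = [[100, 10], [25, 14], [9000, 2], [40, 16], [7500, 3], [60, 11]]
y = [0, 0, 1, 0, 1, 0]  # labels assigned during the data mining process

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(clf.predict([[8000, 1], [30, 13]]))  # classify two new, unseen transactions
```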
Some Data Mining
Techniques
• Clustering. In this case, data elements that share characteristics are grouped
together into clusters as part of data mining applications. Examples include k-
means clustering, hierarchical clustering and Gaussian mixture models.
• Regression. This is another way to find relationships in data sets, by
calculating predicted data values based on a set of variables. Linear regression
and multivariate regression are examples. Decision trees and some other
classification methods can be used to do regressions, too.
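As a hedged illustration of the clustering technique above, k-means from scikit-learn can group similar records; the customer figures below are made up.

```python
# Hypothetical sketch: group customers by [annual spend, visits per month] with k-means.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[200, 1], [220, 2], [800, 8], [760, 7], [3000, 20], [2900, 22]])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment for each customer
print(kmeans.cluster_centers_)  # centre of each cluster
```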
Some Data Mining
Techniques
• Sequence and path analysis. Data can also be mined to look for patterns in
which a particular set of events or values leads to later ones.
• Neural networks. A neural network is a set of algorithms that simulates the
activity of the human brain. Neural networks are particularly useful in complex
pattern recognition applications involving deep learning, a more advanced
offshoot of machine learning.
Some
Applications
in finance
1. Banks and credit card companies
use data mining tools to build
financial risk models, detect
fraudulent transactions and vet loan
and credit applications. Data mining
also plays a key role in marketing
and in identifying potential upselling
opportunities with existing
customers.
2. Insurers rely on data mining to aid
in pricing insurance policies and
deciding whether to approve policy
applications, including risk modeling
and management for prospective
customers.
Planning Data Analysis
Step-by-step guide to data analysis
Preamble
Data analysis, like any scientific discipline, follows
a strict step-by-step procedure. Each stage
necessitates a unique set of skills and knowledge.
To gain meaningful insights, however, it is
necessary to comprehend the entire process. A
solid foundation is essential for producing results
that can withstand scrutiny.
The
following are
standard
steps
towards data
analysis
Defining the
question
Collecting
the data
Cleaning
the data
Analyzing
the data
Sharing
your results
Embracing failure
Step 1 Defining
the Objective
Defining your objective entails developing a hypothesis and determining how
to test it. Begin by asking yourself, "What business problem am I attempting
to solve?" While this may appear to be a simple task, it can be more difficult
than it appears. For example, your organization's senior management may
raise the question, "Why are so many microfinance banks failing?" However, it
is possible that this does not address the root of the problem. A data
analyst's job is to understand the organization and its goals thoroughly
enough to frame the problem correctly.
Step 2
Collecting
the Data
After you've determined your goal, you'll need
to devise a strategy for collecting and
aggregating the necessary data. A critical
component of this is determining which data
you require. This could be quantitative
(numerical) data, such as sales figures, or
qualitative (descriptive) data, such as
customer feedback. All data can be classified as
first-party, second-party, or third-party data.
Let's investigate each one.
Step 2
Collecting
the Data –
First Party
Data
First-party data is information that you or your
company has obtained directly from
customers. It could be transactional tracking
data or data from your company's customer
relationship management (CRM) system. First-
party data, regardless of its source, is typically
structured and organized in a consistent,
defined manner. Customer satisfaction
surveys, focus groups, interviews, and direct
observation are all possible sources of first-
party data.
Step 2
Collecting
the Data –
Second
Party Data
You may want to secure a secondary data
source to supplement your analysis. The first-
party data of other organizations is referred to
as second-party data. This could be obtained
directly from the company or via a private
marketplace. The main advantage of second-
party data is that it is usually structured, and
while it is less relevant than first-party data, it
is also quite reliable. Website, app, or social
media activity, such as online purchase
histories or shipping data, are examples of
second-party data.
Step 2
Collecting
the Data –
Third Party
Data
Third-party data is information gathered
and aggregated from multiple sources
by a third-party organization. Third-
party data frequently (but not always)
contains many unstructured data points
(big data). Many businesses collect big
data in order to create industry reports
or conduct market research. Gartner, a
research and advisory firm, is a good
real-world example of a company that
collects big data and sells it to other
companies. Third-party data can also be
found in open data repositories and
government portals.
Step 3 Cleaning
the Data
Cleaning data usually takes up 70% to 90% of the time in a typical data project. The following are standard data cleaning procedures:
• Removing major errors, duplicates, and outliers—all of which are
inevitable problems when aggregating data from numerous sources.
• Removing unwanted data points—extracting irrelevant observations that
have no bearing on your intended analysis.
Step 3 Cleaning
the Data
• Bringing structure to your data—general ‘housekeeping’, i.e. fixing typos
or layout issues, which will help you map and manipulate your data more
easily.
• Filling in major gaps—as you’re tidying up, you might notice that
important data are missing. Once you’ve identified gaps, you can go about
filling them.
Step 4
Analyze
the data
The type of data analysis you perform is
largely determined by your goal. However,
there are numerous techniques available.
Some examples include univariate or
bivariate analysis, time-series analysis, and
regression analysis. What matters more than
the different types is how you use them.
This is determined by the insights you seek.
Step 5 Presentation
of the Result
The final step in the data analytics process
is to share these insights with the rest of
the world (or, at the very least, with the
stakeholders in your organization!) This is
more complicated than simply sharing the
raw results of your work; it entails
interpreting the results and presenting
them in a way that all types of audiences
can understand. Because you will be
presenting information to decision-makers
on a regular basis, it is critical that the
insights you present are completely clear
and unambiguous.
Wait! One
more step-
Embrace
your failures
Data analytics is inherently messy, and the
process you use will vary depending on the
project. For example, while cleaning data,
you may notice patterns that prompt a
whole new set of questions. This could
return you to step one (to redefine your
objective). Similarly, an exploratory analysis
may reveal a set of data points you had
never considered using before. Or perhaps
you discover that the results of your core
analyses are misleading or incorrect. This
could be due to data errors or human error
earlier in the process.
Embrace
your
failure
Data analysis is inherently chaotic, and
mistakes occur. What’s important is to
hone your ability to spot and rectify
errors. If data analytics was
straightforward, it might be easier, but it
certainly wouldn’t be as interesting. Use
the steps we’ve outlined as a
framework, stay open-minded, and be
creative. If you lose your way, you can
refer to the process to keep yourself on
track.
Data
Visualization
and Techniques
Data visualization is the process of creating
graphical representations of information. This
process helps the presenter communicate
data in a way that’s easy for the viewer to
interpret and draw conclusions.
1. Pie Charts
Pie charts are ideal for illustrating
proportions, or part-to-whole
comparisons.
Pie charts are best suited for audiences
who are unfamiliar with the information
or are only interested in the key
takeaways because they are relatively
simple and easy to read. Pie charts fall
short in their ability to display complex
information for viewers who require a
more thorough explanation of the data.
2. Bar Charts
The categories being compared are shown on
one axis of the chart, and the measured value
is shown on the other. The length of the bar
represents how each group performs in
relation to the value.
One disadvantage is that when there are too
many categories, labeling and clarity can
become difficult. They, like pie charts, can be
too simple for more complex data sets.
3. Histogram
Histograms, as opposed to bar charts, depict the
distribution of data over a continuous interval or
defined period. These visualizations aid in determining
where values are concentrated as well as gaps or
unusual values.
Histograms are particularly useful for displaying the
frequency of an occurrence. A histogram, for example,
can be used to show how many clicks your website
received each day over the last week. You can quickly
determine which days your website received the most
and least clicks using this visualization.
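A minimal matplotlib sketch of such a histogram is shown below; the daily click counts are made up for illustration.

```python
# Hypothetical sketch: distribution of daily website clicks as a histogram.
import matplotlib.pyplot as plt

daily_clicks = [120, 135, 150, 95, 160, 180, 175, 140, 130, 155,
                145, 90, 165, 170, 150, 125, 185, 160, 140, 150]

plt.hist(daily_clicks, bins=6, edgecolor="black")
plt.xlabel("Clicks per day")
plt.ylabel("Number of days")
plt.title("Distribution of daily website clicks")
plt.show()
```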
4. Heat Map
A heat map is used to communicate values in such a way that the viewer can quickly identify trends. A clear legend is required for a user to successfully read and interpret a heat map.
5. Scatter Plot
A scatter plot displays data for two
variables as represented by points plotted
against the horizontal and vertical axis.
This type of data visualization is useful in
illustrating the relationships that exist
between variables and can be used to
identify trends or correlations in data.
Scatter plots are most effective for fairly
large data sets, since it’s often easier to
identify trends when there are more data
points present. Additionally, the closer the
data points are grouped together, the
stronger the correlation or trend tends to
be.
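A hedged matplotlib sketch of a scatter plot follows; both variables and all values are hypothetical.

```python
# Hypothetical sketch: scatter plot of advertising spend against sales.
import matplotlib.pyplot as plt

advertising_spend = [5, 8, 10, 12, 15, 18, 20, 22, 25, 30]  # e.g. millions of naira
sales = [40, 55, 60, 70, 78, 85, 95, 100, 110, 125]         # e.g. millions of naira

plt.scatter(advertising_spend, sales)
plt.xlabel("Advertising spend")
plt.ylabel("Sales")
plt.title("Advertising spend vs. sales")
plt.show()
```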
6.Pictogram Chart
Pictogram charts, also known as pictograph
charts, are particularly useful for visually
and engagingly presenting simple data.
These charts visualize data using icons, with
each icon representing a different value or
category. Data about time, for example,
could be represented by clock or watch
icons. Each icon can represent a single unit
or a set number of units (for example, each
icon represents 100 units).
In addition to making the data more engaging,
pictogram charts are helpful in situations
where language or cultural differences might
be a barrier to the audience’s understanding of
the data.
7. Highlight Table
A highlight table is a more interesting option than
traditional tables. You can make it easier for viewers
to spot trends and patterns in the data by
highlighting cells in the table with color. These
visualizations can help you compare categorical data.
You may be able to add conditional formatting rules
to the table that automatically color cells that meet
specified conditions, depending on the data
visualization tool you're using. When using a
highlight table to visualize a company's sales data, for
example, you can color cells red if the sales data is
below the goal, or green if the sales data is above the
goal. Unlike a heat map, each color in a highlight
table represents a single meaning or value.
8. Choropleth Map
A choropleth map visualizes numerical values
across geographic regions by using color, shading,
and other patterns. These visualizations use a
color progression (or shading) on a spectrum to
distinguish between high and low values.
Choropleth maps show how a variable changes
from one region to the next. Because the colors
represent a range of values, the exact numerical
values aren't easily accessible in this type of
visualization. However, some data visualization
tools allow you to add interactivity to your map
so that you can see the exact values.
9. Word Cloud
A word cloud, also known as a tag cloud, is a visual representation of
text data in which the size of each word corresponds to its frequency.
The larger a specific word appears in the visualization, the more
frequently it appears in the dataset. Words may appear bolder or
follow a specific color scheme depending on their frequency, in
addition to size.
Word clouds are frequently used on websites and blogs to identify
significant keywords and compare textual data differences between
two sources. They are also useful when analyzing qualitative datasets,
such as the specific words used to describe a product by customers.
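As an optional sketch, a word cloud can be generated with the third-party wordcloud package (an assumption here, not a tool used elsewhere in this programme); the review text is made up.

```python
# Hypothetical sketch: word cloud of made-up customer review text.
import matplotlib.pyplot as plt
from wordcloud import WordCloud  # third-party package: pip install wordcloud

reviews = (
    "fast delivery great price great quality slow support "
    "great price easy setup fast delivery great quality"
)

wc = WordCloud(width=800, height=400, background_color="white").generate(reviews)

plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```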
10. Correlation Matrix
A correlation matrix is a table that shows
correlation coefficients between variables.
Each cell represents the relationship between
two variables, and a color scale is used to
communicate whether the variables are
correlated and to what extent.
Correlation matrices are useful to summarize
and find patterns in large data sets. In
business, a correlation matrix might be used
to analyze how different data points about a
specific product might be related, such as
price, advertising spend, launch date, etc.
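A minimal pandas and matplotlib sketch of a correlation matrix follows; the product figures are made up.

```python
# Hypothetical sketch: correlation matrix for a few made-up product variables.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "price":      [10, 12, 11, 15, 14, 16, 18, 17],
    "ad_spend":   [2, 3, 3, 5, 4, 6, 7, 6],
    "units_sold": [200, 220, 210, 260, 250, 280, 300, 290],
})

corr = df.corr()        # pairwise correlation coefficients
print(corr.round(2))

plt.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
plt.colorbar(label="correlation")
plt.xticks(range(len(corr)), corr.columns, rotation=45)
plt.yticks(range(len(corr)), corr.columns)
plt.title("Correlation matrix")
plt.show()
```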
Making Data Backed
Decisions with Data
Visualization
Data visualization is a powerful tool when it comes to
addressing business questions and making informed decisions.
Learning how to create effective illustrations can empower you
to share findings with key stakeholders and other audiences in a
manner that’s engaging and easy to understand.
Data Story Telling
Data storytelling is crucial because it enables data to be communicated clearly. Effective
data analysis can support better decision-making, which strengthens your working
relationship with your clients, management, and particularly investors. This type of data
analysis can also assist in altering people's habits and enhancing their comprehension of
challenging problems.
The Psychological
Power of Storytelling
• When someone hears a story, multiple parts of the brain are
engaged, including:
• The area which controls language comprehension
• The area which processes emotional response
• The neurons which play a role in empathy
• When multiple areas of the brain are engaged, it is more
likely to convert the experience of hearing a story into a
long-term memory. In other words, you leave a lasting impression on your audience.
• Instead of sharing or reading from a data spreadsheet
and a list of figures, think about how you may activate
different brain regions listed above. You may make
your arguments more memorable and actionable by
using data storytelling to elicit an emotional response
on a brain level.
• By using data storytelling, data insights may be put
into action. Without clear communication, insights may
be overlooked and forgotten.
Data Storytelling
Process
• There are 3 key steps to data storytelling:
• Storyboarding - mapping out the direction
and flow that data insights will follow, from
start to conclusion. This can be done on a
sheet of paper or with diagrams. It helps
identify how insights should best be
presented to guide their audience to a
meaningful and valuable conclusion.
• Data visualization - Data visualization gives
stakeholders the ability to use information
intuitively, without deep technical expertise.
Our eyes are drawn to colors, patterns, and
shapes as viewers. Thus, putting analyses
into graphs, charts, and graphics enables the
audience to access and understand the
information not just with their eyes but with
their mind. In our case here, we will use the
dashboard from the earlier exercise.
• Data narrative (the most crucial step): It is the key vehicle to convey meaningful insights, with visualizations and data being the ‘proof’.
Crafting a Compelling Data
Narrative Components
• The following steps are key to crafting a compelling data
narrative:
• Character
• Setting
• Conflict or value
• Resolution
• We shall use the following scenario to explain data
narrative.
• An ISP received a grant from a donor agency to provide internet services in a remote community which has only been made popular by a college of education. The ISP is to present a report to the donors on how it utilized the bandwidth given to it and the money made from it. The grant was used to provide the following services: PoS, final-year project research by students, receiving and sending of emails as a service, job applications, online examinations, and general internet surfing by students.
Data Narrative : Character
• Before this narrative, a storyboard has been developed and a dashboard of
visual data created for this.
• Set the scene by explaining how the service has provided impact in that
community. Examples:
• The service has created employment for 50 PoS merchants
• 413 final-year students utilized our facility for their projects
• During the last Nigerian Army recruitment, over 100 youths in the community used our facility to submit their applications; 86 of them were called for an interview
• Use a data visualization to show these impacts across the different user groups and highlight the most significant ones.
Data Narrative :
Conflict or Value
• This aspect of the narrative throws more light on the success or the failure.
In the case of our scenario, the conflict could go like this:
• The relatively huge success record is largely due to our Customer
Education efforts as a value-added service to the internet services
in the community.
• We take time to show community members how to properly complete these job applications.
• Once a student comes in and registers to use the facility
for their project, we go out of our way to train them on
how to conduct research over the internet.
• The community produces over 3.5 tons of cashews annually and sells these to urban merchants. These transactions were cash-based, but the 40 PoS jobs created by our internet service have facilitated payments to the tune of over 33 million naira for the farmers and yielded over 80% financial inclusion amongst the farmers in the community.
Data Narrative :
Resolution
• The PoS services exist only within a radius of 150 metres
from the facility with about 40 PoS merchants. People
come from all over the community to carry out financial
transactions.
• An upgrade on the infrastructure and bandwidth
will allow the service to go beyond 150 metres
radius to as much as 4 kilometres radius.
• This can also increase the number of
students using the service for their project
research from 413 to over 1000 students
annually.
• Over 400 PoS merchants will emerge to
facilitate financial transactions. This will also
attract more urban merchants to conduct
business with the community farmers.
Data Management Best
Practices and
Protection Laws
GDPR and NDPR
General
Data
Protection
Regulation -
GDPR
• GDPR came into force on May 25, 2018. Countries
within Europe were given the ability to make their
own small changes to suit their own needs. Within
the UK this flexibility led to the creation of the Data
Protection Act (2018), which superseded the
previous 1998 Data Protection Act.
• GDPR can be considered as the world's strongest
set of data protection rules, which enhance how
people can access information about them and
places limits on what organisations can do with
personal data
• GDPR was designed to "harmonise" data privacy laws across all of its member countries, as well as providing greater protection and rights to individuals.
• GDPR was also created to alter how businesses and
other organisations can handle the information of
those that interact with them.
• There's the potential for large fines and reputational
damage for those found in breach of the rules.
GDPR
Principles
Article 5 of the General Data
Protection Regulation (GDPR) sets
out key principles which lie at the
heart of the general data protection
regime. These key principles are set
out right at the beginning of the
GDPR and they both directly and
indirectly influence the other rules
and obligations found throughout
the legislation. Therefore,
compliance with these fundamental
principles of data protection is the
first step for controllers in ensuring
that they fulfil their obligations
under the GDPR. The following is a
brief overview of the Principles of
Data Protection found in article 5
GDPR:
1. Lawfulness, fairness,
and transparency:
Any processing of personal data should be lawful and fair. It should be
transparent to individuals that personal data concerning them are collected,
used, consulted, or otherwise processed and to what extent the personal
data are or will be processed. The principle of transparency requires that any
information and communication relating to the processing of those personal
data be easily accessible and easy to understand, and that clear and plain
language be used.
2. Purpose
Limitation
Personal data should only be collected for specified, explicit, and legitimate
purposes and not further processed in a manner that is incompatible with
those purposes. In particular, the specific purposes for which personal data
are processed should be explicit and legitimate and determined at the time
of the collection of the personal data. However, further processing for
archiving purposes in the public interest, scientific, or historical research
purposes or statistical purposes (in accordance with Article 89(1) GDPR) is not
considered to be incompatible with the initial purposes.
3. Data
Minimisation
Processing of personal data must be adequate, relevant, and limited to what
is necessary in relation to the purposes for which they are processed.
Personal data should be processed only if the purpose of the processing
could not reasonably be fulfilled by other means. This requires, in particular,
ensuring that the period for which the personal data are stored is limited to a
strict minimum (see also the principle of ‘Storage Limitation’ below).
4. Accuracy
Controllers must ensure that personal data are accurate and, where
necessary, kept up to date; taking every reasonable step to ensure that
personal data that are inaccurate, having regard to the purposes for which
they are processed, are erased or rectified without delay. In particular,
controllers should accurately record information they collect or receive and
the source of that information.
5. Storage
Limitation
Personal data should only be kept in a form which permits identification of
data subjects for as long as is necessary for the purposes for which the
personal data are processed. In order to ensure that the personal data are
not kept longer than necessary, time limits should be established by the
controller for erasure or for a periodic review.
6. Integrity and
Confidentiality:
Personal data should be processed in a manner that ensures appropriate
security and confidentiality of the personal data, including protection against
unauthorised or unlawful access to or use of personal data and the
equipment used for the processing and against accidental loss, destruction
or damage, using appropriate technical or organisational measures.
7. Accountability
Finally, the controller is responsible for, and must be able to demonstrate,
their compliance with all of the above-named Principles of Data Protection.
Controllers must take responsibility for their processing of personal data and
how they comply with the GDPR and be able to demonstrate (through
appropriate records and measures) their compliance, in particular to the
DPC.
Nigeria Data
Protection
Regulation -
NDPR
• In Nigeria, data protection is a constitutional
right founded on Section 37 of
the Constitution of the Federal Republic of
Nigeria 1999 (as amended) ('the
Constitution'). The Nigerian Data Protection
Regulation, 2019 ('NDPR') is the main data
protection regulation in Nigeria. The NDPR
was issued by the National Information
Technology Development Agency ('NITDA').
The NDPR expounded the concept of data
protection under the Constitution.
• The NDPR makes provision for the rights of
data subjects, the obligations of data
controllers and data processors, transfer of
data to a foreign territory amongst others.
1. NDPR Principles -
Transparency
• A data controller has an obligation to take appropriate measures to
provide any information relating to processing to the data subject in a
concise, transparent, intelligible, and easily accessible form, using clear
and plain language, and for any information relating to a child (Section
3.1(1) of the NDPR).
• In addition, prior to collecting personal data from a data subject, a data
controller has to inform the data subject of the purpose(s) of the
processing for which the personal data is intended as well as the legal
basis for the processing (Section 3.1(7)(c) of the NDPR).
2. NDPR Principles –
Purpose and Limitations
• A data controller has an obligation to specify in its privacy policy the
purpose of processing personal data (Section 2.5(c) of the NDPR).
• Where a data controller intends to further process the personal data for a
purpose other than that for which the personal data was collected, the
controller shall provide the data subject prior to that further processing
with information on that other purpose, and with any relevant further
information (Section 3.1(7)(m) of the NDPR).
3. NDPR Principles
- Limitations
The provisions of the NDPR are sacrosanct and no limitation clause in a
privacy policy will exonerate a data controller from liability for violating the
NDPR (Section 2.5(i) of the NDPR).
4. NDPR Principles
- Accuracy
Personal data is expected to be accurate and without prejudice to the dignity
of human person (Section 2.1(1)(b) of the NDPR). A data subject has the right
to access and rectify their data (Section 3.1(7)(h) of the NDPR).
5. NDPR Principles –
Storage Limitation
A data controller has to stipulate in its privacy policy the period for which
personal data will be stored, or if that is not possible, the criteria used to
determine that period (Section 3.1(7)(g) of the NDPR).
6. NDPR Principles
- Confidentiality
A data controller is required to put in place data security apparatus in order
to keep the collected data confidential and protect it against attacks (Section
2.6 of the NDPR).
7. NDPR Principles
- Accountability
Anyone who is entrusted with personal data of a data subject or who is in
possession of such data is accountable for its acts and omissions in respect
of data processing, and in accordance with the principles contained in the
NDPR (Section 2.1(3) of the NDPR).
References
• Hillier, W. (2022). A Step-by-Step Guide to the Data Analysis Process. CareerFoundry. Available at: https://careerfoundry.com/en/blog/data-analytics/the-data-analysis-process-step-by-step/.
• Stedman, C. (n.d.). What is Data Mining? TechTarget SearchBusinessAnalytics. Available at: https://www.techtarget.com/searchbusinessanalytics/definition/data-mining.
• TechTarget SearchBusinessAnalytics. (n.d.). What is Data Preparation? An In-Depth Guide to Data Prep. Available at: https://www.techtarget.com/searchbusinessanalytics/definition/data-preparation.
• Business Insights Blog, HBS Online. (2021). 5 Key Elements of a Data Ecosystem. Available at: https://online.hbs.edu/blog/post/data-ecosystem.
• Miller, K. (2019). Data Visualization Techniques for All Professionals. Business Insights Blog, HBS Online. Available at: https://online.hbs.edu/blog/post/data-visualization-techniques.
• Calzon, B. (2022). Learn Here Different Ways of Data Analysis Methods & Techniques. Datapine BI Blog. Available at: https://www.datapine.com/blog/data-analysis-methods-and-techniques/.
• Simplilearn (2021). What Is Data Collection: Methods, Types, Tools, and Techniques. Available at: https://www.simplilearn.com/what-is-data-collection-article.
• Burgess, M. (2020). What is GDPR? The Summary Guide to GDPR Compliance in the UK. Wired UK. Available at: https://www.wired.co.uk/article/what-is-gdpr-uk-eu-legislation-compliance-summary-fines-2018.
• Data Protection Commission. (2019). Rights of Individuals under the General Data Protection Regulation. Available at: https://www.dataprotection.ie/en/individuals/rights-individuals-under-general-data-protection-regulation.
• DataGuidance. (2021). Nigeria - Data Protection Overview. Available at: https://www.dataguidance.com/notes/nigeria-data-protection-overview.
