What's Driving Data Deluge?
• Big Data is data whose
scale, distribution,
diversity, and/or
timeliness require the
use of new technical
architectures and
analytics to enable
insights that unlock new
sources of business
value.
Three attributes defining Big Data characteristics:
• Huge volume of data: Rather than thousands or millions of
rows, Big Data can be billions of rows and millions of columns.
• Complexity of data types and structures: Big Data reflects the
variety of new data sources, formats, and structures, including
digital traces being left on the web and other digital repositories
for subsequent analysis.
• Speed of new data creation and growth: Big Data can describe
high velocity data, with rapid data ingestion and near real time
analysis.
Big Data is sometimes described as having 3 Vs: volume,
variety, and velocity.
• Social media and genetic sequencing are among the fastest-growing
sources of Big Data and examples of untraditional sources of data
being used for analysis.
• Another example comes from genomics.
• While data has grown, the cost to perform this work has fallen
dramatically. The cost to sequence one human genome has
fallen from $100 million in 2001 to $10,000 in 2011, and the cost
continues to drop.
Data Structures
• Most Big Data is unstructured
or semi-structured in
nature, which requires
different techniques and
tools to process and
analyze.
Data Structures
• Although analyzing structured data tends to be the most familiar
technique, a different technique is required to meet the
challenges to analyze semi-structured data (shown as XML),
quasi-structured (shown as a clickstream), and unstructured
data. Here are examples of how each of the four main types of
data structures may look.
Structured data: Data containing a defined data type, format,
and structure (such as transaction data, online analytical
processing [OLAP] data cubes, traditional RDBMS, CSV files,
and even simple spreadsheets).
Example of structured data
Data Structures
• Semi-structured data:
Textual data files with a
discernible pattern that
enables parsing (such as
Extensible Markup
Language [XML] data files
that are self describing and
defined by an XML
schema).
• See Figure 1-5.
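As a rough illustration of why a discernible pattern "enables parsing", the sketch below reads a small XML document with Python's standard library. The document, its element names (orders, order, customer, total), and the helper parse_orders are all hypothetical and were made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; element and attribute names are illustrative only.
XML_TEXT = """<orders>
  <order id="1001"><customer>Ana</customer><total currency="USD">25.50</total></order>
  <order id="1002"><customer>Ben</customer><total currency="USD">12.00</total></order>
</orders>"""


def parse_orders(xml_text):
    """Return one dict per <order> element found under the root."""
    root = ET.fromstring(xml_text)
    orders = []
    for order in root.findall("order"):
        orders.append({
            "id": int(order.get("id")),                # attribute on <order>
            "customer": order.findtext("customer"),    # text of child element
            "total": float(order.findtext("total")),
        })
    return orders


orders = parse_orders(XML_TEXT)
print(orders)
```

Because the tags describe the data, the same few lines work for any number of order elements without guessing at column positions.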
Data Structures
• Quasi-structured data:
Textual data with erratic
data formats that can be
formatted with effort, tools,
and time (for instance, web
clickstream data that may
contain inconsistencies in
data values and formats).
See Figure 1-6.
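The "effort, tools, and time" needed for quasi-structured data can be sketched with a few inconsistent clickstream entries. The URLs below are invented, and normalize_click is a hypothetical helper; real clickstream logs usually need many more cleanup rules than this.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical raw clickstream entries with inconsistent case, scheme, and spacing.
RAW_CLICKS = [
    "https://www.example.com/search?q=big+data&page=2",
    "HTTP://WWW.EXAMPLE.COM/Search?Q=analytics",
    "www.example.com/product/42?ref=home",
    "   https://www.example.com/cart   ",
]


def normalize_click(raw):
    """Turn one messy URL string into a dict with host, path, and query."""
    url = raw.strip()
    if "://" not in url:
        url = "http://" + url  # assume a scheme so urlparse can find the host
    parts = urlparse(url)
    # Keep the first value of each query parameter and lowercase the keys.
    query = {key.lower(): values[0] for key, values in parse_qs(parts.query).items()}
    return {"host": parts.netloc.lower(), "path": parts.path.lower(), "query": query}


clicks = [normalize_click(raw) for raw in RAW_CLICKS]
for click in clicks:
    print(click)
```

After normalization, all four entries share the same host value and can be grouped or counted like structured records.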
Data Structures
• Unstructured data:
Data that has no
inherent structure,
which may include text
documents, PDFs,
images, and video.
See Figure 1-7.
BI Versus Data Science
BI tends to provide reports,
dashboards, and queries on
business questions for the current
period or in the past. BI systems
make it easy to answer questions
related to quarter-to-date revenue,
progress toward quarterly targets,
and how much of a
given product was sold in a prior
quarter or year. These questions
tend to be closed-ended and
explain current or past behavior,
typically by aggregating historical
data and grouping it in some way.
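A BI-style question such as "what was revenue per quarter?" comes down to aggregating and grouping historical rows. The sketch below assumes a made-up list of sales records; the SALES data and the revenue_by_quarter helper are illustrative and not taken from any particular BI tool.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales records: (sale date, product, amount).
SALES = [
    (date(2023, 1, 15), "widget", 120.0),
    (date(2023, 2, 3), "gadget", 80.0),
    (date(2023, 4, 20), "widget", 200.0),
    (date(2023, 5, 11), "widget", 50.0),
    (date(2023, 5, 30), "gadget", 75.0),
]


def revenue_by_quarter(sales):
    """Sum amounts per (year, quarter), the way a BI report would group them."""
    totals = defaultdict(float)
    for day, _product, amount in sales:
        quarter = (day.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        totals[(day.year, quarter)] += amount
    return dict(totals)


quarterly = revenue_by_quarter(SALES)
print(quarterly)
```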
• Organizations and data collectors are realizing that the data
they can gather from individuals contains intrinsic value and, as
a result, a new economy is emerging. As this new digital
economy continues to evolve, the market sees the introduction
of data vendors and data cleaners that use crowdsourcing
(such as Mechanical Turk and GalaxyZoo) to test the outcomes
of machine learning techniques. As the new ecosystem takes
shape, there are four main groups of players within this
interconnected web. These are shown in Figure 1-11.
• Data devices
• Data collectors
• Data aggregators
• Data users and buyers
What is Analytics?
Raw data in itself does not have a meaning until it
is contextualized and processed into useful
information.
Analytics is this process of extracting and creating
information from raw data by filtering, processing,
categorizing, condensing and contextualizing the
data.
What is Analytics?
The choice of the technologies, algorithms, and frameworks for
analytics is driven by the analytics goals of the application. For
example, the goals of the analytics task may be: (1) to predict
something (for example whether a transaction is a fraud or not,
whether it will rain on a particular day, or whether a tumor is benign or
malignant), (2) to find patterns in the data (for example, finding the top
10 coldest days in the year, finding which pages are visited the most on
a particular website, or finding the most searched celebrity in a
particular year), or (3) to find relationships in the data (for example,
finding similar news articles, finding similar patients in an electronic
health record system, finding related products on an eCommerce
website, finding similar images, or finding correlation between news
items and stock prices).
What is Analytics?
Descriptive Analytics
• Descriptive analytics comprises analyzing past data to present it
in a summarized form which can be easily interpreted.
Descriptive analytics aims to answer the question "What has happened?"
For example, computing the total number of likes for a particular
post, computing the average monthly rainfall or finding the
average number of visitors per month on a website. Descriptive
analytics is useful to summarize the data.
A major portion of analytics done today is descriptive analytics
through use of statistics functions such as counts, maximum,
minimum, mean, top-N, and percentage.
These functions help describe patterns in the data and present the data in a
summarized form.
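The statistics functions listed above (count, minimum, maximum, mean, top-N, percentage) can be sketched in a few lines. The monthly visitor figures are invented for illustration, and describe is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical monthly website visitor counts.
MONTHLY_VISITORS = {
    "Jan": 1200, "Feb": 950, "Mar": 1430,
    "Apr": 1100, "May": 1700, "Jun": 1620,
}


def describe(values_by_label, top_n=3):
    """Summarize labeled values with common descriptive statistics."""
    values = list(values_by_label.values())
    total = sum(values)
    # Sort labels by value, largest first, and keep the first top_n.
    top = sorted(values_by_label.items(), key=lambda item: item[1], reverse=True)[:top_n]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "top_n": top,
        "share_pct": {label: 100 * v / total for label, v in values_by_label.items()},
    }


summary = describe(MONTHLY_VISITORS)
print(summary)
```

Every value in the summary answers "what has happened?"; nothing here predicts a future month.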
What is Predictive Data Analytics?
The term predictive analytics refers to the use of statistics and
modeling techniques to make predictions about future outcomes and
performance. Predictive analytics looks at current and historical data
patterns to determine if those patterns are likely to emerge again. This
allows businesses and investors to adjust where they use their
resources to take advantage of possible future events. Predictive
analysis can also be used to improve operational efficiencies and
reduce risk.
Key Takeaways of PDA
• Predictive analytics uses statistics and modeling techniques to
determine future performance.
• Industries and disciplines, such as insurance and marketing, use
predictive techniques to make important decisions.
• Predictive models help make weather forecasts, develop video games,
translate voice-to-text messages, support customer service decisions, and
develop investment portfolios.
• People often confuse predictive analytics with machine learning even
though the two are different disciplines.
• Types of predictive models include decision trees, regression, and
neural networks.
Understanding Predictive Analytics
• Predictive analytics is a form of technology that
makes predictions about certain unknowns in the
future. It draws on a series of techniques to make
these determinations, including artificial intelligence
(AI), data mining, machine learning, modeling, and
statistics.
• For instance, data mining involves the analysis of
large sets of data to detect patterns in them. Text
analysis does the same, except for large blocks of
text.
Applications of Predictive Models
• Weather forecasts
• Creating video games
• Translating voice to text for mobile phone messaging
• Customer service
• Investment portfolio development
All of these applications use descriptive statistical models of
existing data to make predictions about future data.
Applications of Predictive Models
They're also useful for businesses to help them manage inventory,
develop marketing strategies, and forecast sales. They also help
businesses survive, especially those in highly competitive industries,
such as health care and retail. Investors and financial professionals can
draw on this technology to help craft investment portfolios and reduce
the potential for risk.
Uses of Predictive Analytics
• Forecasting
Forecasting is essential in manufacturing because it ensures the optimal
utilization of resources in a supply chain. Predictive modeling is often used
to clean and optimize the quality of data used for such forecasts. Modeling
ensures that more data can be ingested by the system, including from
customer-facing operations, to ensure a more accurate forecast.
• Credit
Credit scoring makes extensive use of predictive analytics. When a consumer
or business applies for credit, data on the applicant's credit history and the
credit record of borrowers with similar characteristics are used to predict the
risk that the applicant might fail to perform on any credit extended.
• Underwriting
Data and predictive analytics play an important role in
underwriting. Insurance companies examine policy
applicants to determine the likelihood of having to pay out
for a future claim based on the current risk pool of similar
policyholders, as well as past events that have resulted in
payouts. Predictive models that consider characteristics
in comparison to data about past policyholders and
claims are routinely used by actuaries.
Applications of Predictive Models
• Marketing
Individuals who work in this field look at how consumers
have reacted to the overall economy when planning on a
new campaign. They can use these shifts in
demographics to determine if the current mix of products
will entice consumers to make a purchase.
Active traders, meanwhile, look at a variety of metrics
based on past events when deciding whether to buy or
sell a security. Moving averages, bands,
and breakpoints are based on historical data and are
used to forecast future price movements.
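Of the indicators mentioned above, the moving average is the simplest to sketch. The example below computes a simple (unweighted) moving average over a hypothetical price series; the prices and the window length of 3 are assumptions chosen for illustration, not trading guidance.

```python
def simple_moving_average(prices, window):
    """Return the average of each run of `window` consecutive prices."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


# Hypothetical closing prices for six consecutive days.
prices = [10.0, 11.0, 12.0, 13.0, 12.0, 11.0]
sma3 = simple_moving_average(prices, 3)
print(sma3)
```

Each output value summarizes only past prices, which is why the indicator is described as being based on historical data.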
Predictive Analytics vs. Machine Learning
A common misconception is that predictive analytics
and machine learning are the same thing.
Predictive analytics helps us understand possible future
occurrences by analyzing the past. At its core, predictive
analytics includes a series of statistical techniques
(including machine learning, predictive modeling, and
data mining) and uses statistics (both historical and
current) to estimate, or predict, future outcomes.
Predictive Analytics vs. Machine Learning
Machine learning, on the other hand, is a subfield of computer science
that, as per the 1959 definition by Arthur Samuel (an American pioneer
in the field of computer gaming and artificial intelligence) means "the
programming of a digital computer to behave in a way which, if done
by human beings or animals, would be described as involving the
process of learning."
There are three
common
techniques used in
predictive
analytics:
❖Decision trees,
❖Neural networks,
❖Regression.
Types of Predictive Analytical Models
• Decision Trees
If you want to understand what leads to
someone's decisions, then you may find decision trees
useful. This type of model places data into different
sections based on certain variables, such as price
or market capitalization. Just as the name implies, it
looks like a tree with individual branches and leaves.
Branches indicate the choices available while individual
leaves represent a particular decision.
• Decision trees are the simplest models because they're
easy to understand and dissect. They're also very
useful when you need to make a decision in a short
period of time.
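The branch-and-leaf structure can be sketched as a small hand-written tree. Everything specific here is an assumption made for illustration: the thresholds, the buy/hold/sell labels, and the decide helper. A real decision tree would normally be learned from data rather than typed in by hand.

```python
# Hypothetical tree: inner nodes test one variable, leaves hold a decision.
TREE = {
    "feature": "price",
    "threshold": 50.0,
    "below": {"decision": "buy"},
    "at_or_above": {
        "feature": "market_cap",
        "threshold": 1_000_000_000,
        "below": {"decision": "hold"},
        "at_or_above": {"decision": "sell"},
    },
}


def decide(node, sample):
    """Follow branches from the root until a leaf (a decision) is reached."""
    while "decision" not in node:
        value = sample[node["feature"]]
        branch = "below" if value < node["threshold"] else "at_or_above"
        node = node[branch]
    return node["decision"]


print(decide(TREE, {"price": 20.0, "market_cap": 500_000_000}))
print(decide(TREE, {"price": 80.0, "market_cap": 2_000_000_000}))
```

Because each step is a single readable comparison, the path to any decision can be traced by hand.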
Types of Predictive Analytical Models
Regression
It is used when you want to determine
patterns in large sets of data and when
there's a linear relationship between the
inputs and the outcome. This method works by figuring out
a formula, which represents the
relationship between the inputs found
in the dataset and the value being predicted.
For example, you can use regression to
figure out how price and other key
factors can shape the performance of
a security.
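"Figuring out a formula" can be illustrated with ordinary least squares for a single input, which fits a line y = slope * x + intercept. This is a minimal sketch using invented data points that lie exactly on y = 2x + 1; fit_line is a hypothetical helper, and real regressions typically involve several inputs and noisy data.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x with y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)

# Use the fitted formula to predict the outcome for an unseen input.
predicted = slope * 5.0 + intercept
print(predicted)
```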
Applications of Predictive Models
• Neural Networks
Neural networks were
developed as a form of
predictive analytics by
imitating the way the human
brain works. This model can
deal with complex data
relationships using artificial
intelligence and pattern
recognition.
Applications of Predictive Models
• An Artificial Neural
Network
(ANN) uses the
processing of the
brain as a basis to
develop algorithms
that can be used to
model complex
patterns and
prediction
problems.
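The smallest building block of such a network is a single artificial neuron. The sketch below trains one perceptron on the logical AND function using the classic perceptron learning rule; this is an illustrative toy under stated assumptions (integer weights, a step activation, 20 passes over the data), not a full neural network.

```python
# Training data for logical AND: ((input1, input2), target output).
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(weights, bias, inputs):
    """Step activation: fire (1) only if the weighted sum is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(data, epochs=20):
    """Adjust weights and bias whenever a prediction is wrong."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in data:
            error = target - predict(weights, bias, inputs)
            weights = [w + error * x for w, x in zip(weights, inputs)]
            bias += error
    return weights, bias


weights, bias = train_perceptron(AND_DATA)
predictions = [predict(weights, bias, inputs) for inputs, _ in AND_DATA]
print(weights, bias, predictions)
```

Networks that model complex patterns stack many such units in layers and use smoother activation functions, but the idea of adjusting weights from errors is the same.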
How Businesses Can Use Predictive Analytics
• Predictive models are frequently used by businesses to help
improve their customer service and outreach.
• Executives and business owners can take advantage of this
kind of statistical analysis to determine customer behavior. For
instance, the owner of a business can use predictive techniques
to identify and target regular customers who could defect and
go to a competitor.
• Predictive analytics plays a key role in advertising
and marketing. Companies can use models to determine which
customers are likely to respond positively to marketing and
sales campaigns. Business owners can save money by
targeting customers who will respond positively rather than
doing blanket campaigns.
Benefits of Predictive Analytics
• This type of analysis can help when you need to make
predictions about outcomes and no other (or obvious)
answers are available. Investors, financial professionals, and business
leaders are able to use models to help reduce risk. For instance, an
investor and their advisor can use certain models to help craft an
investment portfolio with minimal risk to the investor by taking
certain factors into consideration, such as age, capital, and goals.
• Businesses can determine the likelihood of success or failure of a
product before it launches.
Criticism of Predictive Analytics
• The use of predictive analytics has been criticized and, in some
cases, legally restricted due to perceived inequities in its
outcomes. Most commonly, this involves predictive models that
result in statistical discrimination against racial or ethnic groups
in areas such as credit scoring, home lending, employment,
or risk of criminal behavior.
A famous example of this is the (now illegal) practice
of redlining in home lending by banks. Regardless of whether
the predictions drawn from the use of such analytics are
accurate, their use is generally frowned upon, and data that
explicitly include information such as a person's race are now
often excluded from predictive analytics.