Researchers, as a whole, tend to underestimate the need for power. I'm just now starting to get it.
I recently gave a brief, easy-to-follow presentation on statistical power, its importance, and how to go about getting it.
Hope you find it useful.
INFLUENCE OF DATA GEOMETRY IN RANDOM SUBSET FEATURE SELECTION (IJDKP)
The geometry of data, also known as its probability distribution, is an important consideration for accurate computation in data mining tasks such as pre-processing, classification, and interpretation. The data geometry influences the outcome and accuracy of statistical analysis to a large extent. This paper focuses on understanding the influence of data geometry on the feature subset selection process using the random forest algorithm. In practice, the data is often assumed to follow a normal distribution, which may not be true. The dimensionality reduction varies with changes in the distribution of the data. A comparison is made using three standard distributions: Triangular, Uniform, and Normal. The results are discussed in this paper.
Naive Bayes is a classification algorithm suitable for binary and multiclass classification. It performs well with categorical input variables compared to numerical variables, and it is useful for making predictions and forecasting data based on historical results.
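As a minimal illustration of the idea, here is a pure-Python sketch of Naive Bayes on categorical data with toy history; the `train_nb`/`predict_nb` helpers and the data are made up for this example, not taken from the presentation described above:

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Fit a categorical Naive Bayes model: class priors plus
    per-feature value counts for each class."""
    priors = Counter(labels)
    cond = defaultdict(Counter)  # (feature_index, class) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, y)][v] += 1
    return priors, cond

def predict_nb(model, row):
    priors, cond = model
    n = sum(priors.values())
    best, best_p = None, -1.0
    for y, c in priors.items():
        p = c / n  # class prior
        for i, v in enumerate(row):
            counts = cond[(i, y)]
            # simple Laplace-style smoothing: pad the count by 1 and the
            # denominator by the number of seen values plus one for unseen
            p *= (counts[v] + 1) / (sum(counts.values()) + len(counts) + 1)
        if p > best_p:
            best, best_p = y, p
    return best

# Toy historical data: (outlook, temperature) -> played outdoors?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("rain", "mild")))  # -> yes
```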
Primer on the application of statistical significance testing for business research purposes.
1) How to use statistics to make more informed decisions (and when not to use them).
2) Highlight differences between statistics in science vs business.
3) Highlight assumptions, limitations and best practices.
A Decision Tree Based Classifier for Classification & Prediction of Diseases (ijsrd.com)
In this paper, we propose a modified classification algorithm based on the concept of decision trees. The proposed algorithm is better than previous algorithms and provides more accurate results. We have tested the proposed method on a patient data set. Our methodology uses a greedy approach to select the best attribute, based on information gain: the attribute with the highest information gain is selected. If the information gain is not good enough, attribute values are again divided into groups. These steps are repeated until a good classification/misclassification ratio is obtained. The proposed algorithm classifies the data sets more accurately and efficiently.
Statistical power lays a foundation for a successful clinical trial, thus affecting all clinical trial professionals. Underpowered studies have a higher risk of not showing a statistically significant effect at the end of the study; whereas overpowered studies can lead to unreasonably large sample sizes, unnecessary risk to patients, and added expense. This webinar will address the basics of statistical power for non-statisticians, highlighting what you need to know about statistical power, how it affects your clinical trial, and what to ask for from your statistician.
Market Research - Course Slides
CONTENTS
1. Introduction
-Marketing Research
-Types of Market Research
-Research Methods
2. Qualitative Research Methods
- Focus Groups
- Depth Interview
- Projective Techniques
- Comparison of Qualitative Techniques
3. Observation Methods
4. Survey: Measurement and Scaling
- Introduction
- Comparative Scales
- Non-comparative Scales
- Multi-item Scales
- Reliability and Validity
5. Questionnaire
- Asking Questions
- Overcoming Inability to Answer
- Overcoming Unwillingness to Answer
- Increasing Willingness of Respondents
- Determining the Order of Questions
- What’s Next?
6. Sampling
- Non-probability Sampling
- Probability Sampling
- Choosing Non-Probability vs. Probability Sampling
- Sample Size
7. Data Analysis: A Concise Overview of Statistical Techniques
- Descriptive Statistics: Some Popular Displays of Data
- Organizing Qualitative Data
- Organizing Quantitative Data
- Summarizing Data Numerically
- Cross-Tabulations
- Inferential Statistics: Can the results be generalized to population?
- Hypothesis Testing
- Strength of a Relationship in Cross-Tabulation
- Describing the Relationship Between Two (Ratio Scaled) Variables
8. Advanced Techniques of Market Analysis: A Brief Overview of Some Useful Concepts
- Conjoint Analysis
- Market Simulations
- Market Segmentation
- Perceptual Positioning Maps
9. Reporting Results
Data reduction: breaking down large sets of data into more-manageable groups or segments that provide better insight.
- Data sampling
- Data cleaning
- Data transformation
- Data segmentation
- Dimension reduction
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY (Editor IJMTER)
A data mining environment produces a large amount of data that needs to be analysed; patterns have to be extracted from it to gain knowledge. In this new period, with the rumble of data both ordered and unordered, it has become difficult to process, manage, and analyse patterns using traditional databases and architectures. To gain knowledge about Big Data, a proper architecture should be understood. Classification is an important data mining technique with broad applications, used to classify the various kinds of data found in nearly every field of our life. Classification assigns an item to one of a predefined set of classes according to the item's features. This paper provides an inclusive survey of different classification algorithms and puts a light on various classification algorithms, including J48, C4.5, k-nearest neighbor, Naive Bayes, SVM, etc., using the random concept.
Assignment 2: RA: Annotated Bibliography (josephinepaterson7611)
Assignment 2: RA: Annotated Bibliography
In your final paper for this course, you will need to write a Methods section that is about 3–4 pages long where you will assess and evaluate the methods and analysis of your proposed research.
In preparation for this particular section, answer the following questions thoroughly and provide justification/support. The more complete and detailed your answers for these questions, the better prepared you are to successfully write your final paper:
· What is the problem being addressed by your research study?
· State the refined research question and hypothesis (null and alternative).
· What are your independent and dependent variables? What are their operational definitions?
· Who will be included in your sample (i.e., inclusion and exclusion characteristics)?
· How many participants will you have in your sample?
· How will you recruit your sample?
· Identify the type of measurement instrument to be used to collect the raw numeric data to be statistically analyzed and the type of measurement data the instrument produces.
· What issues will you cover in the informed consent?
· If there is potential risk or harm, how will you ensure the safety of all participants?
· Name any possible threats to validity and steps that can be taken to minimize these threats.
· What type of parametric or nonparametric inferential statistical process (correlation, difference, or effect) will you use in your proposed research? Why is this statistical test the best fit?
· State an acceptable behavioral research alpha level you would use to reject or fail to reject the stated null hypothesis, and explain your choice.
This paper may be written in question-and-answer format rather than a flowing paper. Write your response in a 3- to 4-page Microsoft Word document.
All written assignments and responses should follow APA rules for attributing sources.
Submission Details:
· By the due date assigned, save your document as M4_A2_Lastname_Firstname.doc and submit it to the Submissions Area .
Assignment 2 Grading Criteria
Maximum Points
Stated the problem being addressed.
8
Stated the refined research question and hypothesis (null and alternative).
6
Stated the independent and dependent variables and provided the operational definitions.
12
Discussed sample characteristics and size.
8
Discussed a sample recruitment strategy.
6
Identified the type of measurement instrument to be used and the type of measurement data the instrument produces.
8
Discussed the informed consent and potential risk and protection factors.
12
Named the possible threats to validity and steps that can be taken to minimize these threats.
12
Discussed the type of parametric or nonparametric inferential statistical process that will be used and why it is a best fit.
8
Stated an acceptable behavioral research alpha level for analyzing the data.
4
Wrote in a clear, concise, and organized manner; demonstrated ethical scholarship in accurate representation and attribution of sources.
Analysis of Common Supervised Learning Algorithms Through Application (aciijournal)
Supervised learning is a branch of machine learning wherein the machine is equipped with labelled data which it uses to create sophisticated models that can predict the labels of related unlabelled data. The literature on the field offers a wide spectrum of algorithms and applications. However, there is limited research available comparing the algorithms, making it difficult for beginners to choose the most efficient algorithm and tune it for their application.
This research aims to analyse the performance of common supervised learning algorithms when applied to sample datasets, along with the effect of hyper-parameter tuning. For the research, each algorithm is applied to the datasets, and the validation curves (for the hyper-parameters) and learning curves are analysed to understand the sensitivity and performance of the algorithms. The research can guide new researchers aiming to apply supervised learning algorithms to better understand, compare, and select the appropriate algorithm for their application. Additionally, they can also tune the hyper-parameters for improved efficiency and create ensembles of algorithms for enhanced accuracy.
Unlock the power of data with our comprehensive guide to data analytics. Take your business decision making to the next level.
Useful link: https://www.attitudetallyacademy.com/functionalarea/mis-and-data-analytics
Data science is likely to become even more important as the volume and complexity of data continues to increase. With advancements in machine learning and artificial intelligence, data scientists will have access to more sophisticated tools and algorithms to analyze and extract insights from data. Data science will continue to play a crucial role in fields such as healthcare, finance, and technology, helping organizations make better decisions and drive innovation. Additionally, there will be a greater emphasis on data privacy and ethical considerations as the use of data becomes more prevalent.
Data science is a field that involves using statistical and computational methods to analyze and extract insights from data. It plays a crucial role in various industries, from business and healthcare to finance and technology.
Generating a custom Ruby SDK for your web service or Rails API using Smithy (g2nightmarescribd)
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
- UI automation introduction
- UI automation sample
- Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Elevating Tactical DDD Patterns Through Object Calisthenics (Dorra BARTAGUIZ)
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... (Jeffrey Haguewood)
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
4. Data Analysis
- A set of methods and techniques used to obtain information and insights from data
- Helps avoid erroneous judgements and conclusions
- Can constructively influence the research objectives and the research design
Essentials of Marketing Research, Kumar, Aaker, Day
5. Preparing the Data for Analysis
- Data editing
- Coding
- Statistically adjusting the data
6. Preparing the Data for Analysis (Contd.)
Data Editing
- Identifies omissions, ambiguities, and errors in responses
- Conducted in the field by the interviewer and field supervisor, and by the analyst prior to data analysis
7. Preparing the Data for Analysis (Contd.)
Problems Identified With Data Editing
- Interviewer error
- Omissions
- Ambiguity
- Inconsistencies
- Lack of cooperation
- Ineligible respondent
8. Preparing the Data for Analysis (Contd.)
Coding
- Coding closed-ended questions involves specifying how the responses are to be entered
- Open-ended questions are difficult to code
  - A lengthy list of possible responses is generated
9. Preparing the Data for Analysis (Contd.)
Statistically Adjusting the Data: Weighting
- Each response is assigned a number according to a pre-specified rule
- Makes sample data more representative of the target population on specific characteristics
- Modifies the number of cases in the sample that possess certain characteristics
- Adjusts the sample so that greater importance is attached to respondents with certain characteristics
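The weighting step described above can be sketched in a few lines; the sample and the population shares below are hypothetical:

```python
# Weight each respondent so the sample matches known population shares
# on one characteristic (here: gender). Shares below are hypothetical.
sample = ["F", "F", "F", "M"]            # 75% F, 25% M in the sample
population_share = {"F": 0.5, "M": 0.5}  # known population proportions

n = len(sample)
sample_share = {g: sample.count(g) / n for g in population_share}
# weight = population share / sample share, assigned per respondent
weights = [population_share[g] / sample_share[g] for g in sample]
print(weights)  # F respondents get 2/3, the M respondent gets 2.0
```

With these weights, the weighted counts (3 × 2/3 = 2 for F, 1 × 2 = 2 for M) reproduce the 50/50 population split.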
10. Preparing the Data for Analysis (Contd.)
Statistically Adjusting the Data: Variable Re-specification
- Existing data is modified to create new variables
- A large number of variables is collapsed into fewer variables
- Creates variables that are consistent with study objectives
- Dummy variables are used (binary, dichotomous, instrumental, quantitative variables)
- Use (d-1) dummy variables to specify (d) levels of a variable
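The (d-1) dummy-coding rule can be sketched like this; the `dummy_code` helper and the income levels are illustrative, not from the source text:

```python
# Re-specify a categorical variable with d levels as (d-1) dummy variables.
# The first level encountered acts as the reference (all-zeros) category.
def dummy_code(values):
    levels = list(dict.fromkeys(values))  # d distinct levels, in order seen
    rest = levels[1:]                     # d-1 dummy columns
    return [[1 if v == lvl else 0 for lvl in rest] for v in values], rest

codes, columns = dummy_code(["low", "high", "middle", "low"])
print(columns)  # ['high', 'middle'] -- 'low' is the reference level
print(codes)    # [[0, 0], [1, 0], [0, 1], [0, 0]]
```

Three levels (d = 3) are fully described by two dummies; the reference level is the row with all zeros.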
11. Preparing the Data for Analysis (Contd.)
Statistically Adjusting the Data: Scale Transformation
- Scale values are manipulated to ensure comparability with other scales
- Standardization allows the researcher to compare variables that have been measured using different types of scales
- Variables are forced to have a mean of zero and a standard deviation of one
- Can be done only on interval or ratio scaled data
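Standardization as described above, as a small sketch using Python's statistics module (the data values are arbitrary):

```python
import statistics

# Standardize (z-score) a variable: subtract the mean and divide by the
# standard deviation, so the result has mean 0 and standard deviation 1.
def standardize(xs):
    mu = statistics.mean(xs)
    sd = statistics.pstdev(xs)  # population std dev; sample stdev is also common
    return [(x - mu) / sd for x in xs]

z = standardize([10, 20, 30, 40])
print(round(statistics.mean(z), 10))    # 0.0
print(round(statistics.pstdev(z), 10))  # 1.0
```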
12. Simple Tabulation
- Consists of counting the number of cases that fall into various categories
Use of Simple Tabulation
- Determine the empirical distribution (frequency distribution) of the variable in question
- Calculate summary statistics, particularly the mean or percentages
- Aid in "data cleaning" aspects
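A minimal sketch of simple tabulation (frequency counts and percentages); the responses are illustrative:

```python
from collections import Counter

# Simple tabulation: count the cases per category, then derive percentages.
responses = ["yes", "no", "yes", "yes", "undecided", "no"]
freq = Counter(responses)
print(freq)  # Counter({'yes': 3, 'no': 2, 'undecided': 1})
pct = {k: 100 * v / len(responses) for k, v in freq.items()}
print(pct["yes"])  # 50.0
```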
13. Frequency Distribution
- Reports the number of responses that each question received
- Organizes data into classes or groups of values
- Shows the number of observations that fall into each class
- Can be illustrated simply as a number, as a percentage, or as a histogram
- Response categories may be combined for many questions
- Should result in worthwhile categories
14. Descriptive Statistics
- Statistics normally associated with a frequency distribution that help summarize the information in the frequency table
- Measures of central tendency (mean, median, and mode)
- Measures of dispersion (range, standard deviation, and coefficient of variation)
- Measures of shape (skewness and kurtosis)
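These descriptive measures can be computed for a small illustrative sample:

```python
import statistics

# Descriptive statistics for a small sample: central tendency and dispersion.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(data)      # 5.0
median = statistics.median(data)  # 4.5
mode = statistics.mode(data)      # 4
rng = max(data) - min(data)       # range: 7
sd = statistics.pstdev(data)      # population standard deviation: 2.0
cv = sd / mean                    # coefficient of variation: 0.4
print(mean, median, mode, rng, sd, cv)
```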
15. Analysis for Various Population Subgroups
- Differences between the means or percentages of two subgroup responses can provide insights
- Difference between means is concerned with the association between two questions
- Questions upon which means are based are intervally scaled
16. Cross Tabulations
- Statistical analysis technique to study the relationships among and between variables
- The sample is divided to learn how the dependent variable varies from subgroup to subgroup
- The frequency distribution for each subgroup is compared to the frequency distribution for the total sample
- The two variables that are analyzed must be nominally scaled
17. Factors Influencing the Choice of Statistical Technique
Type of Data
- Classification of data involves nominal, ordinal, interval and ratio scales of measurement
- Nominal scaling is restricted to the mode as the only measure of central tendency
- Both median and mode can be used for ordinal scales
- Non-parametric tests can only be run on ordinal data
- Mean, median and mode can all be used to measure central tendency for interval and ratio scaled data
18. Factors Influencing the Choice of Statistical Technique (Contd.)
Research Design
- Dependency of observations
- Number of observations per object
- Number of groups being analyzed
- Control exercised over the variable of interest
Assumptions Underlying the Test Statistic
- If the assumptions on which a statistical test is based are violated, the test will provide meaningless results
19. Overview of Statistical Techniques
Univariate Techniques
- Appropriate when there is a single measurement of each of the 'n' sample objects, or when there are several measurements of each of the 'n' observations but each variable is analyzed in isolation
- Nonmetric: measured on a nominal or ordinal scale
- Metric: measured on an interval or ratio scale
- Determine whether single or multiple samples are involved
- For multiple samples, the choice of statistical test depends on whether the samples are independent or dependent
20. Overview of Statistical Techniques (Contd.)
Multivariate Techniques
- A collection of procedures for analyzing association between two or more sets of measurements that have been made on each object in one or more samples of objects
- Dependence or interdependence techniques
21. Overview of Statistical Techniques (Contd.)
Multivariate Techniques (Contd.)
Dependence Techniques
- One or more variables can be identified as dependent variables and the remaining as independent variables
- The choice of dependence technique depends on the number of dependent variables involved in the analysis
22. Overview of Statistical Techniques (Contd.)
Multivariate Techniques (Contd.)
Interdependence Techniques
- The whole set of interdependent relationships is examined
- Further classified as having a focus on variables or on objects
23. Overview of Statistical Techniques (Contd.)
Why Use Multivariate Analysis?
- To group variables, people, or objects
- To improve the ability to predict variables (such as usage)
- To understand relationships between variables (such as advertising and sales)
24. Hypothesis Testing: Basic Concepts
- An assumption (hypothesis) made about a population parameter (not a sample parameter)
- Purpose of hypothesis testing:
  - To make a judgement about the difference between two sample statistics, or between a sample statistic and a hypothesized population parameter
- Evidence has to be evaluated statistically before arriving at a conclusion regarding the hypothesis
25. Hypothesis Testing
- The null hypothesis (H0) is tested against the alternative hypothesis (Ha)
- At least the null hypothesis is stated
- Decide upon the criteria to be used in making the decision whether to "reject" or "not reject" the null hypothesis
26. Significance Level
- Indicates the percentage of sample means that is outside the cut-off limits (critical value)
- The higher the significance level (α) used for testing a hypothesis, the higher the probability of rejecting a null hypothesis when it is true (Type I error)
- Accepting a null hypothesis when it is false is called a Type II error, and its probability is (β)
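One way to see the link between α and the Type I error rate is a small simulation (illustrative only): sampling repeatedly from a population where the null hypothesis is true, a z-test should reject roughly α of the time.

```python
import random
import statistics

# Simulate the Type I error rate: draw samples from a population where the
# null hypothesis (mu = 0) is TRUE, run a z-test each time, and check how
# often we wrongly reject at alpha = 0.05 (two-sided critical z = 1.96).
random.seed(1)
rejections, trials, n = 0, 2000, 30
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = statistics.mean(sample) / (1 / n ** 0.5)  # sigma = 1 is known here
    if abs(z) > 1.96:
        rejections += 1
print(rejections / trials)  # should land near 0.05, the chosen alpha
```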
27. Hypothesis Testing: Tests in This Class
Statistical tests:
• Frequency distributions: χ2
• Means (one): z (if σ is known), t (if σ is unknown)
• Means (two or more): ANOVA
28. Cross-tabulation and Chi-square
In marketing applications, the chi-square statistic is used as:
Test of Independence
• Are there associations between two or more variables in a study?
Test of Goodness of Fit
• Is there a significant difference between an observed frequency distribution and a theoretical frequency distribution?
29. Chi-Square As a Test of Independence
Null hypothesis H0
• The two (nominally scaled) variables are statistically independent
Alternative hypothesis Ha
• The two variables are not independent
Use the chi-square distribution to test.
30. Chi-square Statistic (χ2)
• Measures the difference between the actual number observed in cell i (Oi) and the number expected (Ei) under independence, if the null hypothesis were true:

  χ2 = Σ (i = 1 to n) (Oi − Ei)2 / Ei

  with (r − 1) × (c − 1) degrees of freedom, where r = number of rows and c = number of columns
• Expected frequency in each cell: Ei = pc × pr × n, where pc and pr are the column and row proportions and n is the total number of observations
31. Chi-square Step-by-Step
1) Formulate Hypotheses
2) Calculate row and column totals
3) Calculate row and column proportions
4) Calculate expected frequencies (Ei)
5) Calculate χ2 statistic
6) Calculate degrees of freedom
7) Obtain Critical Value from table
8) Make decision regarding the Null-hypothesis
32. Example of Chi-square as a Test of Independence

               Class 1   Class 2
    Grade A      10         8
    Grade B      20        16
    Grade C      45        18
    Grade D      16         6
    Grade E       9         2

Each entry in the table is a 'cell'.
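The step-by-step procedure from slide 31 can be sketched in plain Python for the Grade x Class table above. This is a minimal illustration, not library code; the critical value 9.49 (α = 0.05, df = 4) is read from a chi-square table.

```python
# Chi-square test of independence on the Grade x Class example,
# following the step-by-step procedure: totals -> proportions ->
# expected frequencies -> chi-square statistic -> decision.

observed = [  # rows: grades A-E, columns: classes 1 and 2
    [10, 8],
    [20, 16],
    [45, 18],
    [16, 6],
    [9, 2],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency in each cell: Ei = p_row * p_col * n = (row total * col total) / n
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

critical_value = 9.49  # chi-square table, alpha = 0.05, df = 4
print(f"chi2 = {chi2:.3f}, df = {df}")  # chi2 ≈ 5.143, df = 4
print("reject H0" if chi2 > critical_value else "do not reject H0")
```

Since 5.143 < 9.49, we do not reject H0: the data give no evidence that grade and class are related, matching the worked solution in the notes.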
33. Chi-square As a Test of Independence - Exercise

                              Income
                        Low   Middle   High
  Own Expensive   Yes    45     34      55
  Automobile      No     52     53      27

Task: make a decision whether the two variables are independent!
34. Hypothesis Testing About a Single Mean
• Make a judgement about a single sample parameter.
• The choice of test depends on whether the population variance is known or not known:

  z = (X̄ − µ) / σx̄   if the population variance is known

  t = (X̄ − µ) / sx̄   if the population variance is not known, or if the sample size < 60
35. Hypothesis Testing About a Single Mean - Step-by-Step
1) Formulate hypotheses
2) Select appropriate formula
3) Select significance level
4) Calculate z or t statistic
5) Calculate degrees of freedom (for t-test)
6) Obtain critical value from table
7) Make decision regarding the null hypothesis
36. Hypothesis Testing About a Single Mean - Example 1
• H0: µ = 5000 (hypothesized value of the population mean)
• Ha: µ ≠ 5000 (alternative hypothesis)
• n = 100
• X̄ = 4960
• σ = 250
• α = 0.05
Rejection rule: if |zcalc| > zα/2 then reject H0.
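Example 1 can be checked in a few lines of Python (a sketch; the two-sided critical value 1.96 is read from a z-table for α = 0.05):

```python
import math

# z-test for a single mean (Example 1): population sigma is known.
mu0, xbar, sigma, n = 5000, 4960, 250, 100

se = sigma / math.sqrt(n)    # standard error of the mean: 250/10 = 25
z_calc = (xbar - mu0) / se   # (4960 - 5000)/25 = -1.6

z_crit = 1.96                # z_{alpha/2} for alpha = 0.05 (from table)
print(f"z = {z_calc:.2f}")   # z = -1.60
print("reject H0" if abs(z_calc) > z_crit else "do not reject H0")
```

Since |−1.6| < 1.96, H0 is not rejected, matching the worked solution in the notes.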
37. Hypothesis Testing About a Single Mean - Example 2
• H0: µ = 1000 (hypothesized value of the population mean)
• Ha: µ ≠ 1000 (alternative hypothesis)
• n = 12
• X̄ = 1087.1
• s = 191.6
• α = 0.01
Rejection rule: if |tcalc| > tdf, α/2 then reject H0.
38. Hypothesis Testing About a Single Mean - Example 3
• H0: µ ≤ 1000 (hypothesized value of the population mean)
• Ha: µ > 1000 (alternative hypothesis)
• n = 12
• X̄ = 1087.1
• s = 191.6
• α = 0.05
Rejection rule: if tcalc > tdf, α then reject H0.
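Examples 2 and 3 use the same data, so both decisions can be sketched together. The critical values 3.106 (df = 11, α = 0.01 two-sided) and 1.796 (df = 11, α = 0.05 one-sided) are read from a t-table.

```python
import math

# t-test for a single mean: sigma unknown, small sample (Examples 2 and 3).
mu0, xbar, s, n = 1000, 1087.1, 191.6, 12

se = s / math.sqrt(n)        # 191.6/sqrt(12) ≈ 55.31
t_calc = (xbar - mu0) / se   # ≈ 1.57
df = n - 1                   # 11

t_two_sided = 3.106          # t_{11, alpha/2}, alpha = 0.01 (Example 2)
t_one_sided = 1.796          # t_{11, alpha},   alpha = 0.05 (Example 3)

print("Example 2 (two-sided):",
      "reject H0" if abs(t_calc) > t_two_sided else "do not reject H0")
print("Example 3 (one-sided):",
      "reject H0" if t_calc > t_one_sided else "do not reject H0")
```

In both cases 1.57 falls short of the critical value, so H0 is not rejected, matching the worked solutions in the notes.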
39. Confidence Intervals
• Hypothesis testing and confidence intervals are two sides of the same coin.

  t = (X̄ − µ) / sx̄   ⇒   X̄ ± t · sx̄ = interval estimate of µ
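Turning Example 2 around into an interval estimate illustrates the "two sides of the same coin" point. A sketch, using the same table value t = 3.106 (df = 11, α = 0.01, two-sided):

```python
import math

# 99% confidence interval for the mean, using the Example 2 data.
xbar, s, n = 1087.1, 191.6, 12

se = s / math.sqrt(n)   # ≈ 55.31
t_crit = 3.106          # t_{11, alpha/2} for alpha = 0.01 (from table)

lower, upper = xbar - t_crit * se, xbar + t_crit * se
print(f"99% CI for mu: ({lower:.1f}, {upper:.1f})")  # ≈ (915.3, 1258.9)
```

The hypothesized mean 1000 falls inside the interval, which is exactly why the two-sided test in Example 2 does not reject H0.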
40. Analysis of Variance (ANOVA)
• Response variable: the dependent variable (Y)
• Factor(s): the independent variable(s) (X)
• Treatments: different levels of the factors (r1, r2, r3, …)
41. Example (Book p.495)

                          Product Sales
                   1    2    3    4    5   Total   X̄p
  Price   39¢      8   12   10    9   11     50    10
  Level   44¢      7   10    6    8    9     40     8
          49¢      4    8    7    9    7     35     7

Overall sample mean: X̄ = 8.333
Overall sample size: n = 15
No. of observations per price level: np = 5
42. Example (Book p.495)
[Chart slide: the sales observations plotted with the grand mean]
43. One-Factor Analysis of Variance
• Studies the effect of r treatments on one response variable
• Determines whether or not there are any statistically significant differences between the treatment means µ1, µ2, …, µr
• H0: all treatments have the same effect on the mean responses
• Ha: at least two of µ1, µ2, …, µr are different
44. One-Factor ANOVA - Intuitively
• Consider the ratio (between-treatment variance) / (within-treatment variance):
  - if it is large, then there are differences between treatments
  - if it is small, then there are no differences between treatments
• To test the hypothesis, compute the ratio of the "between treatment" variance to the "within treatment" variance
45. One-Factor ANOVA Table

  Source of        Variation   Degrees of   Mean Sum            F-ratio
  Variation        (SS)        Freedom      of Squares
  Between          SSr         r − 1        MSSr = SSr/(r−1)    MSSr/MSSu
  (price levels)
  Within           SSu         n − r        MSSu = SSu/(n−r)
  (price levels)
  Total            SSt         n − 1
46. One-Factor Analysis of Variance
• Between-treatment variance:

  SSr = Σ (p = 1 to r) np (X̄p − X̄)2 = 23.3

• Within-treatment variance:

  SSu = Σ (p = 1 to r) Σ (i = 1 to np) (Xip − X̄p)2 = 34

Where SSr = treatment sums of squares, r = number of groups, np = sample size of group p, X̄p = mean of group p
47. One-Factor Analysis of Variance
• Between variance estimate: MSSr = SSr/(r−1) = 23.3/2 = 11.65
• Within variance estimate: MSSu = SSu/(n−r) = 34/12 = 2.83
Where n = total sample size and r = number of groups
48. One-Factor Analysis of Variance
• Total variation (SSt): SSt = SSr + SSu = 23.3 + 34 = 57.3
• F-statistic: F = MSSr / MSSu = 11.65/2.83 ≈ 4.12
• DF: (r−1), (n−r) = 2, 12
• Critical value from table: CV(α, df) = 3.89
Since F > CV, reject H0: the mean sales are not the same at all price levels.
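The whole one-factor ANOVA calculation from slides 41-48 can be reproduced from scratch. A sketch in plain Python; the critical value 3.89 is F(α = 0.05; df = 2, 12) from an F-table, and the slide's F = 4.16 comes from the rounded intermediates 11.65 and 2.8 (the unrounded ratio is ≈ 4.12).

```python
# One-factor ANOVA for the price-level example: compute SSr, SSu,
# the mean squares, and the F-ratio, then compare against the table value.

groups = {                 # sales observations per price level
    "39c": [8, 12, 10, 9, 11],
    "44c": [7, 10, 6, 8, 9],
    "49c": [4, 8, 7, 9, 7],
}

all_obs = [x for g in groups.values() for x in g]
n, r = len(all_obs), len(groups)
grand_mean = sum(all_obs) / n                      # 8.333

# Between-treatment sum of squares: SSr = sum_p n_p (Xbar_p - Xbar)^2
ssr = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())

# Within-treatment sum of squares: SSu = sum_p sum_i (X_ip - Xbar_p)^2
ssu = sum((x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g)

mssr = ssr / (r - 1)       # 23.33 / 2 ≈ 11.67
mssu = ssu / (n - r)       # 34 / 12   ≈ 2.83
f_stat = mssr / mssu       # ≈ 4.12

critical_value = 3.89      # F-table, alpha = 0.05, df = (2, 12)
print(f"SSr = {ssr:.2f}, SSu = {ssu:.2f}, F = {f_stat:.2f}")
print("reject H0" if f_stat > critical_value else "do not reject H0")  # reject
```

Because 4.12 > 3.89, H0 is rejected: price level has a statistically significant effect on mean sales.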
Editor's Notes
Solutions for confidence interval exercises (last class):
Problem 1 (X̄ = 56, s = 4, n = 49): sx̄ = 4/7; 95% CI (54.85, 57.14); 90% CI (55.05, 56.95)
Problem 2 (X̄ = 56, s = 4, n = 100): sx̄ = 4/10; 95% CI (55.2, 56.8); 90% CI (55.33, 56.66)
Look at book page 473: explain Type I/II error
We do not deal with Goodness of fit!!
Test whether grade and class are related. H0: grade and class are not related; Ha: grade and class are related.
Observed (expected) counts, with row/column proportions in the Sum row and column:
            Class 1       Class 2      Sum
Grade A     10 (12)        8 (6)       18 (0.12)
Grade B     20 (24)       16 (12)      36 (0.24)
Grade C     45 (42)       18 (21)      63 (0.42)
Grade D     16 (14.66)     6 (7.33)    22 (0.1466)
Grade E      9 (7.33)      2 (3.66)    11 (0.0733)
Sum        100 (0.666)    50 (0.333)  150
χ2 = (10−12)2/12 + (8−6)2/6 + (20−24)2/24 + (16−12)2/12 + (45−42)2/42 + (18−21)2/21 + (16−14.66)2/14.66 + (6−7.33)2/7.33 + (9−7.33)2/7.33 + (2−3.66)2/3.66
   = 0.333 + 0.666 + 0.666 + 1.333 + 0.214 + 0.428 + 0.121 + 0.242 + 0.379 + 0.752 = 5.136
df = (r−1)(c−1) = 4 × 1 = 4; α = 0.05 (significance level); critical value (from table) = 9.49
Since 5.136 < CV: do not reject H0.
Chi-square = 14.201; df = (r−1)(c−1) = (2−1)(3−1) = 2; α = 0.05; CV = 5.991. Reject H0 of independence.
Talk about Z and t distribution
Population variance known: therefore z-test. Standard error of the mean: σx̄ = σ/√n = 250/10 = 25. zcalc = (4960 − 5000)/25 = −1.6. zα/2 = 1.96. If |zcalc| > zα/2 then reject H0; since |−1.6| < 1.96, do not reject H0.
A soft drink manufacturer plans to introduce a new soft drink. 12 supermarkets are selected at random and the soft drink is offered in them for a limited time. Average existing soft drink sales are 1000; new soft drink sales are 1087.1. Sample < 60, therefore t-test. Standard error of the mean: sx̄ = s/√n = 191.6/√12 = 55.31. tcalc = (1087.1 − 1000)/55.31 = 1.57. df = 12 − 1 = 11. t11, α/2 = 3.106. If |tcalc| > t11, α/2 then reject H0; since |1.57| < 3.106, do not reject H0.
One-sided test. Sample < 30, therefore t-test. Standard error of the mean: sx̄ = s/√n = 191.6/√12 = 55.31. tcalc = (1087.1 − 1000)/55.31 = 1.57. df = 12 − 1 = 11. t11, α = 1.796. If tcalc > t11, α then reject H0; since 1.57 < 1.796, do not reject H0. Rejection rule for the opposite directionality: if tcalc < −t11, α then reject H0.