Testing of Hypothesis - Large Sample Test, by Parag Shah
The different types of tests used for large samples are covered in this presentation. Steps for each test and a case study are included for concept clarity and practice.
The hypothesis is usually considered the principal instrument in research and quality control. Its main function is to suggest new experiments and observations; in fact, many experiments are carried out with the deliberate object of testing a hypothesis. Decision makers often face situations in which they are interested in testing hypotheses on the basis of available information and then taking decisions on the basis of such testing. In Six Sigma methodology, hypothesis testing is a tool of substance, used in the Analyze phase of a Six Sigma project so that improvement can be made in the right direction.
The standard error is used in place of the standard deviation; it shows how the variation among samples relates to sampling error. A list of formulas for the standard error of different statistics, and applications of tests of significance in the biological sciences, is also given.
2. Parameter and Statistic
• A measure calculated from population data is called a Parameter.
• A measure calculated from sample data is called a Statistic.

Measure                   Parameter   Statistic
Size                      N           n
Mean                      μ           x̄
Standard deviation        σ           s
Proportion                P           p
Correlation coefficient   ρ           r
4. Statistical Hypothesis
A statistical hypothesis is an assumption, or any logical statement, about a parameter of the population.
E.g.
• India will score on average 300 runs in the next ODI series.
• The average marks obtained by students at Guj. Uni. in Statistics are at least 80.
• The proportion of diabetic patients in Gujarat is not more than 10%.
• Students of Guj. Uni. score better than students from other universities.
5. Null Hypothesis
A statistical hypothesis formulated for possible acceptance is called a Null hypothesis. It is denoted by H0.
• If the parameter in the null hypothesis assumes a specific value, it is called a Simple hypothesis.
E.g. μ = 280, P = 0.10
• If the parameter in the null hypothesis assumes a set of values, it is called a Composite hypothesis.
E.g. μ ≥ 280, P ≤ 0.10
6. Alternative Hypothesis
A statistical hypothesis complementary to the null hypothesis is called the Alternative hypothesis. It is denoted by H1.
7. Problem Statement, Null Hypothesis (H0) and Alternative Hypothesis (H1)
• India will score on average 300 runs in the next ODI series: H0: μ = 300, H1: μ ≠ 300
• The average marks obtained by students at Guj. Uni. in Statistics are at least 80: H0: μ = 80, H1: μ < 80
• The proportion of diabetic patients in Gujarat is not more than 10%: H0: P = 0.10, H1: P > 0.10
• Students of Guj. Uni. score better than students from other universities: H0: μ1 = μ2, H1: μ1 > μ2
8. Testing of Hypothesis
The procedure to decide whether to accept or reject the null
hypothesis is called Testing of hypothesis.
9. Test Statistic
If the sample size is 30 or more, the sample is called large; if it is less than 30, it is called small. A different test statistic is used for testing of hypothesis depending on the size of the sample.
• For a large sample, the test statistic z is used.
• For a small sample, the test statistic t is used.
10. Steps of Testing of Hypothesis
• Step 1: Set up the Null hypothesis
• Step 2: Set up the Alternative hypothesis
• Step 3: Calculate the test statistic
• Step 4: Determine the table value of the test statistic
• Step 5: Conclusion
– If |test statistic| ≤ table value, the null hypothesis is accepted
– If |test statistic| > table value, the null hypothesis is rejected
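The five steps above can be sketched in code. Below is a minimal sketch of a single-mean z test (the large-sample case) using illustrative numbers of my own choosing — H0: μ = 300, a hypothetical sample of n = 100 with mean 290 and standard deviation 20 — not data from the slides:

```python
from math import sqrt
from scipy.stats import norm

mu0 = 300                       # Step 1: H0: mu = 300
# Step 2: H1: mu != 300 (two-sided alternative)
n, xbar, s = 100, 290.0, 20.0   # hypothetical sample summary

# Step 3: calculate the test statistic
z = (xbar - mu0) / (s / sqrt(n))

# Step 4: table (critical) value at the 5% level, two-sided
z_table = norm.ppf(1 - 0.05 / 2)   # about 1.96

# Step 5: conclusion
reject_H0 = abs(z) > z_table
print(z, z_table, reject_H0)
```

Here z = -5, which exceeds 1.96 in absolute value, so the null hypothesis would be rejected for this made-up sample.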
12. Small Sample Test
• Test of Single Mean
• Test of significance of difference between two means
(Independent samples)
• Test of significance of difference between two means
(dependent samples)
13. Test for Single Mean
• Step 1: Null hypothesis H0: μ = μ0
• Step 2: Alternative hypothesis H1: μ ≠ μ0 or μ > μ0 or μ < μ0
• Step 3: Test statistic
  t = (x̄ − μ0) / (s / √(n − 1))
  The denominator is the Standard Error of the sample mean, i.e. S.E.(x̄)
• Step 4: Table value of t at α% level of significance and n − 1 d.f.
• Step 5: If |t| ≤ table value of t, H0 is accepted
  If |t| > table value of t, H0 is rejected
14. Case Study 1
The price of a popular tennis racket at a national chain store is Rs. 1790. Ronish bought five of the same racket from an online platform at the following prices:
1550 1790 1750 1750 1610
Assuming that the online platform prices of rackets are normally distributed, determine whether there is sufficient evidence in the sample, at the 5% level of significance, to conclude that the average price of the racket is less than Rs. 1790 when purchased from the online platform.
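As a sketch, this case study can be checked with a one-sample t test in Python. scipy's ttest_1samp divides by s/√n with s computed using the n − 1 divisor, which is numerically the same standard error as the slide's s/√(n − 1) form with the n divisor; the test is left-tailed since H1: μ < 1790:

```python
from scipy.stats import ttest_1samp

prices = [1550, 1790, 1750, 1750, 1610]   # Ronish's five online prices
mu0 = 1790                                # chain-store price under H0

# Left-tailed test: H1 is that the mean online price is below 1790
t_stat, p_value = ttest_1samp(prices, mu0, alternative='less')
print(t_stat, p_value)   # t is about -2.15 with 4 d.f.
```

Since the p-value falls just below 0.05, H0 is rejected at the 5% level: the sample gives (barely) sufficient evidence that the online price is lower.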
15. Test for Difference Between Two Means (Independent Samples)
• Step 1: Null hypothesis H0: μ1 = μ2
• Step 2: Alternative hypothesis H1: μ1 ≠ μ2 or μ1 > μ2 or μ1 < μ2
• Step 3: Test statistic
  t = (x̄1 − x̄2) / (S √(1/n1 + 1/n2)), where S² = (n1 S1² + n2 S2²) / (n1 + n2 − 2)
  The denominator is the Standard Error of the difference of sample means, i.e. S.E.(x̄1 − x̄2)
• Step 4: Table value of t at α% level of significance and n1 + n2 − 2 d.f.
• Step 5: If |t| ≤ table value of t, H0 is accepted
  If |t| > table value of t, H0 is rejected
16. Case Study
A software company markets a new computer game with two experimental packaging designs. Design 1 is sent to 11 stores; their average sales in the first month are 52 units with sample standard deviation 12 units. Design 2 is sent to 6 stores; their average sales in the first month are 46 units with sample standard deviation 10 units. Test at the 5% level whether there is a significant difference in average monthly sales between the two package designs.
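This two-sample case study can be sketched with scipy's ttest_ind_from_stats, which works directly from summary statistics; the sketch assumes the reported values are the usual n − 1 sample standard deviations:

```python
from scipy.stats import ttest_ind_from_stats

# Design 1: 11 stores, mean 52, sd 12; Design 2: 6 stores, mean 46, sd 10
t_stat, p_value = ttest_ind_from_stats(
    mean1=52, std1=12, nobs1=11,
    mean2=46, std2=10, nobs2=6,
    equal_var=True,   # pooled-variance t test, d.f. = n1 + n2 - 2 = 15
)
print(t_stat, p_value)   # t is about 1.04
```

With p well above 0.05, H0 is not rejected: the sample does not show a significant difference in average monthly sales between the two designs.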
17. Test for Difference Between Two Means (Dependent Samples)
• Step 1: Null hypothesis H0: μd = 0 (the mean of the paired differences is zero)
• Step 2: Alternative hypothesis H1: μd ≠ 0
• Step 3: Test statistic
  t = d̄ / (s / √(n − 1)), where d̄ is the sample mean of the differences and s their standard deviation
  The denominator is the Standard Error of the sample mean of differences, i.e. S.E.(d̄)
• Step 4: Table value of t at α% level of significance and n − 1 d.f.
• Step 5: If |t| ≤ table value of t, H0 is accepted
  If |t| > table value of t, H0 is rejected
18. Case Study
A clinic that provides a program to help its clients lose weight asks a consumer agency to investigate the effectiveness of the program. The agency takes a sample of 15 people, weighing each person in the sample before the program begins and again 3 months later, producing the table given below. Determine whether the program is effective.
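The before/after table is not reproduced here, so the paired (dependent-samples) test is sketched below with hypothetical weights for five clients — illustrative numbers only, not the agency's data, and fewer than the 15 people in the study:

```python
from scipy.stats import ttest_rel

# Hypothetical before/after weights (kg) for 5 clients
before = [82, 79, 90, 76, 85]
after  = [78, 80, 86, 75, 81]

# H0: mean paired difference is 0
# H1: the program is effective, i.e. weights decreased (before > after)
t_stat, p_value = ttest_rel(before, after, alternative='greater')
print(t_stat, p_value)
```

For these made-up numbers t exceeds the 5% table value with 4 d.f., so H0 would be rejected; with the real table of 15 pairs the same call applies with n − 1 = 14 d.f.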