Statistics is both the science of uncertainty and the technology of extracting information from data.
A statistic is a summary measure of data.
Descriptive statistics are methods that describe and summarize data.
Microsoft Excel supports statistical analysis in two ways:
1. Statistical functions
2. Analysis Toolpak add-in
Statistical Methods for Summarizing Data
A frequency distribution is a table that shows the number of observations in each of several nonoverlapping groups.
Categorical variables naturally define the groups in a frequency distribution.
To construct a frequency distribution, we need only count the number of observations that appear in each category.
This can be done using the Excel COUNTIF function.
Frequency Distributions for Categorical Data
Example 3.16: Constructing a Frequency Distribution for Items in the Purchase Orders Database
List the item names in a column on the spreadsheet.
Use the function =COUNTIF($D$4:$D$97,cell_reference), where cell_reference is the cell containing the item name
Example 3.16: Constructing a Frequency Distribution for Items in the Purchase Orders Database
Construct a column chart to visualize the frequencies.
Relative frequency is the fraction, or proportion, of the total.
If a data set has n observations, the relative frequency of category i is:
We often multiply the relative frequencies by 100 to express them as percentages.
A relative frequency distribution is a tabular summary of the relative frequencies of all categories.
Relative Frequency Distributions
Example 3.17: Constructing a Relative Frequency Distribution for Items in the Purchase Orders Database
First, sum the frequencies to find the total number (note that the sum of the frequencies must be the same as the total number of observations, n).
Then divide the frequency of each category by this value.
For numerical data that consist of a small number of discrete values, we may construct a frequency distribution similar to the way we did for categorical data; that is, we simply use COUNTIF to count the frequencies of each discrete value.
Frequency Distributions for Numerical Data
In the Purchase Orders data, the A/P terms are all whole numbers 15, 25, 30, and 45.
Example 3.18: Frequency and Relative Frequency Distribution for A/P Terms
A graphical depiction of a frequency distribution for numerical data in the form of a column chart is called a histogram.
Frequency distributions and histograms can be created using the Analysis Toolpak in Excel.
Click the Data Analysis tools button in the Analysis group under the Data tab in the Excel menu bar and select Histogram from the list.
Excel Histogram Tool
Specify the Input Range corresponding to the data. If you include the column header, then also check the Labels box so Excel knows that the range contains a label. The Bin Range defines the groups (Excel calls these “bins”) used for the frequency distribution.
Histogra.
The document discusses statistical analysis and Excel functions for organizing and summarizing data. It provides information on entering data into Excel sheets, describes functions like SUM, AVERAGE, COUNT, and STDEV that calculate values like sums, means, numbers of data points, and standard deviations. It also discusses using Excel to group data using functions like FREQUENCY and analyzing descriptive statistics using the Data Analysis ToolPak.
This document discusses methods for organizing and presenting qualitative and quantitative data using frequency tables, charts, and graphs. It covers:
1. Creating frequency tables to organize qualitative and quantitative data, and presenting qualitative data as bar charts or pie charts.
2. Constructing frequency distributions to organize quantitative data into class intervals and determining class frequencies, and presenting quantitative data using histograms, frequency polygons, and cumulative frequency polygons.
3. An example of creating a frequency table and histogram based on sales price data from 80 vehicles to compare typical selling prices on dealer lots.
This document discusses exploring and visualizing data in Microsoft Excel. It covers topics such as creating charts, sorting and filtering data, statistical analysis methods for summarizing data, and using PivotTables and PivotCharts. Examples demonstrate how to construct frequency distributions, calculate percentiles and quartiles, filter records, and create cross-tabulations and charts from a structured data set.
The document discusses methods for organizing and presenting both qualitative and quantitative data, including frequency tables, bar charts, pie charts, and different types of frequency distributions. It provides examples of how to construct a frequency table by determining the number of classes, class intervals, and class limits based on a set of data. It also describes how to create histograms, frequency polygons, and cumulative frequency distributions to graphically display a frequency distribution and highlights key terms such as class frequency, class interval, and relative frequency.
This document discusses different methods for organizing data in research. It describes data organization as the process of structuring collected factual information in a way that is accepted by the scientific community. Proper data organization is important for research because it allows facts to be represented in context and helps researchers answer questions and hypotheses. The document then explains three common ways to organize data: frequency distribution tables, stem-and-leaf diagrams, and different types of charts including bar charts, pie charts, line charts, and histograms. Guidelines are provided for constructing each of these data organization methods.
The document discusses various data analysis and visualization techniques in Microsoft Excel including filtering, sorting, formulas, functions, pivot tables, charts and conditional formatting. It provides step-by-step instructions on how to use these tools to extract insights from data by filtering to select specific records, using formulas and functions to perform calculations, sorting data, validating data entry, creating pivot tables and pivot charts to summarize data, and formatting cells based on conditions.
This document provides examples of useful functions and formulas in Microsoft Excel across several categories including common text, math, conditional, date and time functions. It demonstrates how to use functions like UPPER, ROUND, COUNTIF, IF, and DATE among many others to manipulate text, perform calculations, add conditional logic, work with dates and times. Instructions are provided on copying formulas down a column and removing formulas to paste only values.
Introduction to Business analytics unit3jayarellirs
This document discusses various methods for visualizing and summarizing data. It describes different types of charts like column charts, line charts, pie charts, and scatter plots that can be used to visualize quantitative data. It also discusses tools in Excel for filtering, sorting, and summarizing data in tables and how techniques like Pareto analysis can help identify key factors.
The document discusses statistical analysis and Excel functions for organizing and summarizing data. It provides information on entering data into Excel sheets, describes functions like SUM, AVERAGE, COUNT, and STDEV that calculate values like sums, means, numbers of data points, and standard deviations. It also discusses using Excel to group data using functions like FREQUENCY and analyzing descriptive statistics using the Data Analysis ToolPak.
This document discusses methods for organizing and presenting qualitative and quantitative data using frequency tables, charts, and graphs. It covers:
1. Creating frequency tables to organize qualitative and quantitative data, and presenting qualitative data as bar charts or pie charts.
2. Constructing frequency distributions to organize quantitative data into class intervals and determining class frequencies, and presenting quantitative data using histograms, frequency polygons, and cumulative frequency polygons.
3. An example of creating a frequency table and histogram based on sales price data from 80 vehicles to compare typical selling prices on dealer lots.
This document discusses exploring and visualizing data in Microsoft Excel. It covers topics such as creating charts, sorting and filtering data, statistical analysis methods for summarizing data, and using PivotTables and PivotCharts. Examples demonstrate how to construct frequency distributions, calculate percentiles and quartiles, filter records, and create cross-tabulations and charts from a structured data set.
The document discusses methods for organizing and presenting both qualitative and quantitative data, including frequency tables, bar charts, pie charts, and different types of frequency distributions. It provides examples of how to construct a frequency table by determining the number of classes, class intervals, and class limits based on a set of data. It also describes how to create histograms, frequency polygons, and cumulative frequency distributions to graphically display a frequency distribution and highlights key terms such as class frequency, class interval, and relative frequency.
This document discusses different methods for organizing data in research. It describes data organization as the process of structuring collected factual information in a way that is accepted by the scientific community. Proper data organization is important for research because it allows facts to be represented in context and helps researchers answer questions and hypotheses. The document then explains three common ways to organize data: frequency distribution tables, stem-and-leaf diagrams, and different types of charts including bar charts, pie charts, line charts, and histograms. Guidelines are provided for constructing each of these data organization methods.
The document discusses various data analysis and visualization techniques in Microsoft Excel including filtering, sorting, formulas, functions, pivot tables, charts and conditional formatting. It provides step-by-step instructions on how to use these tools to extract insights from data by filtering to select specific records, using formulas and functions to perform calculations, sorting data, validating data entry, creating pivot tables and pivot charts to summarize data, and formatting cells based on conditions.
This document provides examples of useful functions and formulas in Microsoft Excel across several categories including common text, math, conditional, date and time functions. It demonstrates how to use functions like UPPER, ROUND, COUNTIF, IF, and DATE among many others to manipulate text, perform calculations, add conditional logic, work with dates and times. Instructions are provided on copying formulas down a column and removing formulas to paste only values.
Introduction to Business analytics unit3jayarellirs
This document discusses various methods for visualizing and summarizing data. It describes different types of charts like column charts, line charts, pie charts, and scatter plots that can be used to visualize quantitative data. It also discusses tools in Excel for filtering, sorting, and summarizing data in tables and how techniques like Pareto analysis can help identify key factors.
The document discusses various data analysis and visualization techniques in Microsoft Excel including filtering, sorting, formulas, functions, pivot tables, charts and conditional formatting. It provides step-by-step instructions on how to use these tools to extract insights from data by filtering to select specific records, using formulas and functions like VLOOKUP to perform calculations, sorting data, creating pivot tables and pivot charts to summarize and visualize data relationships, and applying conditional formatting to highlight important values.
This document provides tips for managing spreadsheets and extracting information from data. It recommends using Google Sheets to collaborate on spreadsheets with others. It also outlines various spreadsheet functions for summarizing data, extracting text, concatenating strings, and looking up values. Conditional formatting is suggested to highlight important information. Pivot tables are presented as a way to summarize tables with filters and aggregations.
1. The document discusses different topics related to data collection and presentation including sources of data, data collection methods, processing data, and presenting data through graphs, tables, frequency distributions, and other visual formats.
2. Common data collection methods are surveys, observation, interviews, and existing sources; data must then be processed, organized, and cleaned before analysis.
3. Data can be presented visually through tables, graphs, frequency distributions and other charts to reveal patterns and insights in the data in a clear, understandable format.
Focusing on specific data by using filterssum5ashm
1. Excel allows users to focus on important data by limiting the data displayed through powerful filtering tools. Filters can be applied to individual columns to display only certain values.
2. Formulas like SUM and AVERAGE do not dynamically update when rows are hidden, but SUBTOTAL and AGGREGATE functions can summarize just the visible data. Finding unique values in a column can also help analyze data.
3. Validation rules restrict data entry to valid values, helping catch errors. Rules define allowed data types, values, and display custom messages to users.
ROLL NO 1 TO 9(G1) USE OF EXCEL IN CA PROFESSION (Final Draft).pptxDishantGola
The document provides instructions for an Excel training course. It lists the students and batch details, and thanks the instructors. It then outlines the topics to be covered in the course, including cell referencing, charts, functions like IF and logical functions, calculators for income tax and HRA exemption, data validation, data protection, pivot tables, conditional formatting, data analysis tools, and dashboard reporting. The purpose is to teach how to use Excel in the CA profession.
Chapter 2: Frequency Distribution and GraphsMong Mara
This document discusses different types of graphs and charts that can be used to represent frequency distributions of data, including histograms, frequency polygons, ogives, bar charts, pie charts, and stem-and-leaf plots. It provides examples of how to construct each graph or chart using sample data sets and discusses key aspects of each type such as class intervals, relative frequencies, and ordering of data. Guidelines are given for determining the optimal number of classes and class widths for grouped data. Exercises at the end provide practice applying these techniques to additional data sets.
UPDATED NOTE Nov 2013: this method of storing information is no longer recommended by the creator of this presentation in light of new data... We migrated to WorldShare Management systems in June 2013, and this presentation is to show how our library is using WorldCat Local lists to create reports on current serials subscriptions because our system currently does not have the reporting ability we need to do our work.
Microsoft Excel is a powerful tool used for creating and formatting spreadsheets. Spreadsheets allow information to be organized in rows and columns and analyzed using automatic mathematics calculations. Excel is commonly used to perform various types of calculations by using functions like IF, AND, OR, SUM, VLOOKUP, and more. Macros can also be recorded and assigned to buttons to automate repetitive tasks in Excel.
This document discusses methods for summarizing data, including frequency distributions, measures of central tendency, and measures of dispersion. It provides examples and formulas for constructing frequency distributions and calculating the mean, median, mode, range, variance, and standard deviation. Key points covered include using frequency distributions to group data, calculating central tendency measures for grouped data, and methods for measuring dispersion both for raw data and grouped data.
Excel tutorial for frequency distributionS.c. Chopra
This document provides a step-by-step tutorial for creating a frequency distribution table in Excel. It explains how to:
1. Prepare the data by naming columns and creating a "FreqDist" sheet.
2. Fill out a template table with parameters like number of observations, class interval, and minimum/maximum values.
3. Use formulas to determine values like class limits, frequencies, and cumulative percentages.
4. Copy formulas down to automatically generate the full distribution table.
The tutorial demonstrates an easy way to analyze numeric data sets in Excel by creating frequency distributions.
This document outlines a training overview for a Microsoft Excel extended introduction course. The course consists of 6 classes covering topics like terminology, navigation, formatting, functions, macros, importing data, and charts. Each class is scheduled for a different date and includes the topics that will be covered, such as formatting, sorting, filtering, and different types of functions like date, logical, and statistical functions.
The document provides steps to calculate summary statistics and create plots for different datasets in Excel. For the first dataset of nuclear reactor counts from 104 to 111, it describes calculating the mean, mode, median using the AVERAGE, MODE, and MEDIAN functions directly in cells or using the Insert Function tool. For the second dataset of European auto sales from 11.2 to 14.3, it describes calculating the variance, standard deviation, and range using the VAR, STDEV, and MAX-MIN functions. Finally, it provides steps to generate a boxplot and stem-and-leaf plot for a third dataset ranging from 23 to 51 using the MegaStat add-in.
1. Outline the differences between Hoarding power and Encouraging..docxpaynetawnya
1. Outline the differences between Hoarding power and Encouraging.
2. Explain about the power of Congruency in Leadership.
DataIDSalaryCompaMidpoint AgePerformance RatingServiceGenderRaiseDegreeGender1GrCopy Employee Data set to this page.822.10.962233290915.81FAThe ongoing question that the weekly assignments will focus on is: Are males and females paid the same for equal work (under the Equal Pay Act)? 1522.60.984233280814.91FANote: to simplfy the analysis, we will assume that jobs within each grade comprise equal work.3522.60.984232390415.30FA37230.999232295216.20FAThe column labels in the table mean:1023.11.003233080714.71FAID – Employee sample number Salary – Salary in thousands 2323.11.004233665613.30FAAge – Age in yearsPerformance Rating – Appraisal rating (Employee evaluation score)1123.31.01223411001914.81FASERvice – Years of serviceGender: 0 = male, 1 = female 2623.51.020232295216.20FAMidpoint – salary grade midpoint Raise – percent of last raise3123.61.028232960413.91FAGrade – job/pay gradeDegree (0= BS\BA 1 = MS)3623.61.026232775314.30FAGender1 (Male or Female)Compa-ratio - salary divided by midpoint4023.81.034232490206.30MA14241.04523329012161FA4224.21.0512332100815.71FA1924.31.055233285104.61MA25251.0872341704040MA3226.50.855312595405.60MB227.70.895315280703.90MB3428.60.923312680204.91MB3933.91.094312790615.50FB2034.11.1013144701614.80FB1834.51.1133131801115.60FB335.11.132313075513.61FB1341.11.0274030100214.70FC741.31.0324032100815.71FC1642.21.054404490405.70MC4145.81.144402580504.30MC2746.91.172403580703.91MC548.21.0044836901605.71MD3049.31.0274845901804.30MD2456.31.173483075913.80FD4556.91.185483695815.21FD4757.21.003573795505.51ME3357.51.008573590905.51ME4581.01857421001605.51ME3858.81.0325745951104.50ME5059.61.0465738801204.60ME4660.21.0575739752003.91ME2260.31.257484865613.81FD161.61.081573485805.70ME4461.81.0855745901605.21ME49631.1055741952106.60ME1763.71.1185727553131FE1264.71.1355752952204.50ME4869.51.2195734901115.31FE973.91.103674910010041MF4375.61.1286742952015.50FF2976.31.139675295505.40MF2177.21.1526743951306.31MF678.11.1656736701204.51MF2878.31.169674495914.40FF
Week 2This assignment covers the material presented in weeks 1 and 2.Six QuestionsBefore starting this assignment, make sure the the assignment data from the Employee Salary Data Set file is copied over to this Assignment file.You can do this either by a copy and paste of all the columns or by opening the data file, right clicking on the Data tab, selecting Move or Copy, and copying the entire sheet to this file(Weekly Assignment Sheet or whatever you are calling your master assignment file).It is highly recommended that you copy the data columns (with labels) and paste them to the right so that whatever you do will not disrupt the original data values and relationships.To Ensure full credit for each question, you need to show how you got your results. For example, Question 1 asks for several data values. If you obtain them using descript ...
Summarizing Data : Listing and Grouping pdfJustynOwen
Introduction
Descriptive Statistics describe basic features of the data gathered from an experimental study in various ways.
They provide simple summaries about the sample via graphs and numbers, mainly measures of center and variation.
Together with graphics analysis (histograms, bar plots, pie-charts), they are the cornerstone of quantitative data analysis.
Tables (frequency distributions, stem-and-leaf plots, …) that summarize the data.
Graphical representations of the data (histograms, bar plots, pie-charts).
Summary statistics (numbers) which summarize the data
The document discusses the key components of Microsoft Excel, including worksheets, cells, formulas, functions, charts, and printing. It describes how to enter and format data, use formulas and functions, navigate between sheets, resize rows and columns, and create basic charts using the Chart Wizard. Key components of the Excel window include the worksheet, formula bar, row and column headings, and sheet tabs. Formulas in Excel always begin with an equal sign and can include arithmetic operators. Functions like SUM can be used to calculate values across ranges of cells.
Oracle provides several analytical functions that allow for powerful data analysis using SQL. These include group functions that aggregate data over groups or windows, as well as window functions like ROW_NUMBER, RANK, and LAG that analyze data relative to the current row. ROLLUP and CUBE extensions to the GROUP BY clause enable calculation of subtotals across multiple dimensions of data with a single query.
The document discusses data mining and the Microsoft SQL Server 2005 Data Mining Add-ins for Excel 2007. It provides an overview of data mining, how the add-in works, its prerequisites, who can use it, and how to use its various tools for data preparation, modeling, validation and connection to SQL Server Analysis Services.
Data mining refers to analyzing data sets to discover hidden patterns and trends. This information can help companies improve strategies for marketing, analyzing customers and markets, increasing revenue, and forecasting sales. Data mining has proven useful in business, computing, biotechnology, and analyzing stock markets. While a relatively new term, data mining has long been used by large corporations to analyze large data sets and draw conclusions. Microsoft has introduced the SQL Server Data Mining Add-ins for Office 2007 to make data mining accessible through a familiar Microsoft Office environment. It connects Excel to the powerful data mining algorithms in SQL Server Analysis Services. The add-in allows users to perform tasks like data preparation, modeling, and validating models with just a few clicks.
Statistica Sinica 16(2006), 847-860
PSEUDO-R
2
IN LOGISTIC REGRESSION MODEL
Bo Hu, Jun Shao and Mari Palta
University of Wisconsin-Madison
Abstract: Logistic regression with binary and multinomial outcomes is commonly
used, and researchers have long searched for an interpretable measure of the strength
of a particular logistic model. This article describes the large sample properties
of some pseudo-R2 statistics for assessing the predictive strength of the logistic
regression model. We present theoretical results regarding the convergence and
asymptotic normality of pseudo-R2s. Simulation results and an example are also
presented. The behavior of the pseudo-R2s is investigated numerically across a
range of conditions to aid in practical interpretation.
Key words and phrases: Entropy, logistic regression, pseudo-R2
1. Introduction
Logistic regression for binary and multinomial outcomes is commonly used
in health research. Researchers often desire a statistic ranging from zero to one
to summarize the overall strength of a given model, with zero indicating a model
with no predictive value and one indicating a perfect fit. The coefficient of deter-
mination R2 for the linear regression model serves as a standard for such measures
(Draper and Smith (1998)). Statisticians have searched for a corresponding in-
dicator for models with binary/multinomial outcome. Many different R2 statis-
tics have been proposed in the past three decades (see, e.g., McFadden (1973),
McKelvey and Zavoina (1975), Maddala (1983), Agresti (1986), Nagelkerke
(1991), Cox and Wermuch (1992), Ash and Shwartz (1999), Zheng and Agresti
(2000)). These statistics, which are usually identical to the standard R2 when
applied to a linear model, generally fall into categories of entropy-based and
variance-based (Mittlböck and Schemper (1996)). Entropy-based R2 statistics,
also called pseudo-R2s, have gained some popularity in the social sciences (Mad-
dala (1983), Laitila (1993) and Long (1997)). McKelvey and Zavoina (1975)
proposed a pseudo-R2 based on a latent model structure, where the binary/
multinomial outcome results from discretizing a continuous latent variable that
is related to the predictors through a linear model. Their pseudo-R2 is defined
as the proportion of the variance of the latent variable that is explained by the
848 BO HU, JUN SHAO AND MARI PALTA
covariate. McFadden (1973) suggested an alternative, known as “likelihood-
ratio index”, comparing a model without any predictor to a model including all
predictors. It is defined as one minus the ratio of the log likelihood with inter-
cepts only, and the log likelihood with all predictors. If the slope parameters
are all 0, McFadden’s R2 is 0, but it is never 1. Maddala (1983) developed
another pseudo-R2 that can be applied to any model estimated by the maximum
likelihood method. This popular and widely used measure is expressed as
R2M = 1 −
(
L(θ̃)
L(θ̂)
)
2
n
, (1)
.
Stations yourself somewhere (library, cafeteria, etc.) and observe.docxrafaelaj1
Stations yourself somewhere (library, cafeteria, etc.) and observe the nonverbal communication that occurs.
What do people say with their bodies?
What messages are implicit in vocal expressions, clothes, make-up and so on?
Take notes on five of the most eloquent messages sent nonverbally.
*one page.
*Read the instructions then write about 5 difeerent people
.
More Related Content
Similar to Statistics is both the science of uncertainty and the technology.docx
The document discusses various data analysis and visualization techniques in Microsoft Excel including filtering, sorting, formulas, functions, pivot tables, charts and conditional formatting. It provides step-by-step instructions on how to use these tools to extract insights from data by filtering to select specific records, using formulas and functions like VLOOKUP to perform calculations, sorting data, creating pivot tables and pivot charts to summarize and visualize data relationships, and applying conditional formatting to highlight important values.
This document provides tips for managing spreadsheets and extracting information from data. It recommends using Google Sheets to collaborate on spreadsheets with others. It also outlines various spreadsheet functions for summarizing data, extracting text, concatenating strings, and looking up values. Conditional formatting is suggested to highlight important information. Pivot tables are presented as a way to summarize tables with filters and aggregations.
1. The document discusses different topics related to data collection and presentation including sources of data, data collection methods, processing data, and presenting data through graphs, tables, frequency distributions, and other visual formats.
2. Common data collection methods are surveys, observation, interviews, and existing sources; data must then be processed, organized, and cleaned before analysis.
3. Data can be presented visually through tables, graphs, frequency distributions and other charts to reveal patterns and insights in the data in a clear, understandable format.
Focusing on specific data by using filterssum5ashm
1. Excel allows users to focus on important data by limiting the data displayed through powerful filtering tools. Filters can be applied to individual columns to display only certain values.
2. Formulas like SUM and AVERAGE do not dynamically update when rows are hidden, but SUBTOTAL and AGGREGATE functions can summarize just the visible data. Finding unique values in a column can also help analyze data.
3. Validation rules restrict data entry to valid values, helping catch errors. Rules define allowed data types, values, and display custom messages to users.
ROLL NO 1 TO 9(G1) USE OF EXCEL IN CA PROFESSION (Final Draft).pptxDishantGola
The document provides instructions for an Excel training course. It lists the students and batch details, and thanks the instructors. It then outlines the topics to be covered in the course, including cell referencing, charts, functions like IF and logical functions, calculators for income tax and HRA exemption, data validation, data protection, pivot tables, conditional formatting, data analysis tools, and dashboard reporting. The purpose is to teach how to use Excel in the CA profession.
Chapter 2: Frequency Distribution and GraphsMong Mara
This document discusses different types of graphs and charts that can be used to represent frequency distributions of data, including histograms, frequency polygons, ogives, bar charts, pie charts, and stem-and-leaf plots. It provides examples of how to construct each graph or chart using sample data sets and discusses key aspects of each type such as class intervals, relative frequencies, and ordering of data. Guidelines are given for determining the optimal number of classes and class widths for grouped data. Exercises at the end provide practice applying these techniques to additional data sets.
UPDATED NOTE Nov 2013: this method of storing information is no longer recommended by the creator of this presentation in light of new data... We migrated to WorldShare Management systems in June 2013, and this presentation is to show how our library is using WorldCat Local lists to create reports on current serials subscriptions because our system currently does not have the reporting ability we need to do our work.
Microsoft Excel is a powerful tool used for creating and formatting spreadsheets. Spreadsheets allow information to be organized in rows and columns and analyzed using automatic mathematics calculations. Excel is commonly used to perform various types of calculations by using functions like IF, AND, OR, SUM, VLOOKUP, and more. Macros can also be recorded and assigned to buttons to automate repetitive tasks in Excel.
This document discusses methods for summarizing data, including frequency distributions, measures of central tendency, and measures of dispersion. It provides examples and formulas for constructing frequency distributions and calculating the mean, median, mode, range, variance, and standard deviation. Key points covered include using frequency distributions to group data, calculating central tendency measures for grouped data, and methods for measuring dispersion both for raw data and grouped data.
Excel tutorial for frequency distributionS.c. Chopra
This document provides a step-by-step tutorial for creating a frequency distribution table in Excel. It explains how to:
1. Prepare the data by naming columns and creating a "FreqDist" sheet.
2. Fill out a template table with parameters like number of observations, class interval, and minimum/maximum values.
3. Use formulas to determine values like class limits, frequencies, and cumulative percentages.
4. Copy formulas down to automatically generate the full distribution table.
The tutorial demonstrates an easy way to analyze numeric data sets in Excel by creating frequency distributions.
This document outlines a training overview for a Microsoft Excel extended introduction course. The course consists of 6 classes covering topics like terminology, navigation, formatting, functions, macros, importing data, and charts. Each class is scheduled for a different date and includes the topics that will be covered, such as formatting, sorting, filtering, and different types of functions like date, logical, and statistical functions.
The document provides steps to calculate summary statistics and create plots for different datasets in Excel. For the first dataset of nuclear reactor counts from 104 to 111, it describes calculating the mean, mode, median using the AVERAGE, MODE, and MEDIAN functions directly in cells or using the Insert Function tool. For the second dataset of European auto sales from 11.2 to 14.3, it describes calculating the variance, standard deviation, and range using the VAR, STDEV, and MAX-MIN functions. Finally, it provides steps to generate a boxplot and stem-and-leaf plot for a third dataset ranging from 23 to 51 using the MegaStat add-in.
1. Outline the differences between Hoarding power and Encouraging..docxpaynetawnya
1. Outline the differences between Hoarding power and Encouraging.
2. Explain about the power of Congruency in Leadership.
DataIDSalaryCompaMidpoint AgePerformance RatingServiceGenderRaiseDegreeGender1GrCopy Employee Data set to this page.822.10.962233290915.81FAThe ongoing question that the weekly assignments will focus on is: Are males and females paid the same for equal work (under the Equal Pay Act)? 1522.60.984233280814.91FANote: to simplfy the analysis, we will assume that jobs within each grade comprise equal work.3522.60.984232390415.30FA37230.999232295216.20FAThe column labels in the table mean:1023.11.003233080714.71FAID – Employee sample number Salary – Salary in thousands 2323.11.004233665613.30FAAge – Age in yearsPerformance Rating – Appraisal rating (Employee evaluation score)1123.31.01223411001914.81FASERvice – Years of serviceGender: 0 = male, 1 = female 2623.51.020232295216.20FAMidpoint – salary grade midpoint Raise – percent of last raise3123.61.028232960413.91FAGrade – job/pay gradeDegree (0= BS\BA 1 = MS)3623.61.026232775314.30FAGender1 (Male or Female)Compa-ratio - salary divided by midpoint4023.81.034232490206.30MA14241.04523329012161FA4224.21.0512332100815.71FA1924.31.055233285104.61MA25251.0872341704040MA3226.50.855312595405.60MB227.70.895315280703.90MB3428.60.923312680204.91MB3933.91.094312790615.50FB2034.11.1013144701614.80FB1834.51.1133131801115.60FB335.11.132313075513.61FB1341.11.0274030100214.70FC741.31.0324032100815.71FC1642.21.054404490405.70MC4145.81.144402580504.30MC2746.91.172403580703.91MC548.21.0044836901605.71MD3049.31.0274845901804.30MD2456.31.173483075913.80FD4556.91.185483695815.21FD4757.21.003573795505.51ME3357.51.008573590905.51ME4581.01857421001605.51ME3858.81.0325745951104.50ME5059.61.0465738801204.60ME4660.21.0575739752003.91ME2260.31.257484865613.81FD161.61.081573485805.70ME4461.81.0855745901605.21ME49631.1055741952106.60ME1763.71.1185727553131FE1264.71.1355752952204.50ME4869.51.2195734901115.31FE973.91.103674910010041MF4375.61.1286742952015.50FF2976.31.139675295505.40MF2177.21.1526743951306.31MF678.11.1656736701204.51MF2878.31.169674495914.40FF
Week 2This assignment covers the material presented in weeks 1 and 2.Six QuestionsBefore starting this assignment, make sure the the assignment data from the Employee Salary Data Set file is copied over to this Assignment file.You can do this either by a copy and paste of all the columns or by opening the data file, right clicking on the Data tab, selecting Move or Copy, and copying the entire sheet to this file(Weekly Assignment Sheet or whatever you are calling your master assignment file).It is highly recommended that you copy the data columns (with labels) and paste them to the right so that whatever you do will not disrupt the original data values and relationships.To Ensure full credit for each question, you need to show how you got your results. For example, Question 1 asks for several data values. If you obtain them using descript ...
Summarizing Data : Listing and Grouping pdfJustynOwen
Introduction
Descriptive Statistics describe basic features of the data gathered from an experimental study in various ways.
They provide simple summaries about the sample via graphs and numbers, mainly measures of center and variation.
Together with graphics analysis (histograms, bar plots, pie-charts), they are the cornerstone of quantitative data analysis.
Tables (frequency distributions, stem-and-leaf plots, …) that summarize the data.
Graphical representations of the data (histograms, bar plots, pie-charts).
Summary statistics (numbers) which summarize the data
The document discusses the key components of Microsoft Excel, including worksheets, cells, formulas, functions, charts, and printing. It describes how to enter and format data, use formulas and functions, navigate between sheets, resize rows and columns, and create basic charts using the Chart Wizard. Key components of the Excel window include the worksheet, formula bar, row and column headings, and sheet tabs. Formulas in Excel always begin with an equal sign and can include arithmetic operators. Functions like SUM can be used to calculate values across ranges of cells.
Oracle provides several analytical functions that allow for powerful data analysis using SQL. These include group functions that aggregate data over groups or windows, as well as window functions like ROW_NUMBER, RANK, and LAG that analyze data relative to the current row. ROLLUP and CUBE extensions to the GROUP BY clause enable calculation of subtotals across multiple dimensions of data with a single query.
The document discusses data mining and the Microsoft SQL Server 2005 Data Mining Add-ins for Excel 2007. It provides an overview of data mining, how the add-in works, its prerequisites, who can use it, and how to use its various tools for data preparation, modeling, validation and connection to SQL Server Analysis Services.
Data mining refers to analyzing data sets to discover hidden patterns and trends. This information can help companies improve strategies for marketing, analyzing customers and markets, increasing revenue, and forecasting sales. Data mining has proven useful in business, computing, biotechnology, and analyzing stock markets. While a relatively new term, data mining has long been used by large corporations to analyze large data sets and draw conclusions. Microsoft has introduced the SQL Server Data Mining Add-ins for Office 2007 to make data mining accessible through a familiar Microsoft Office environment. It connects Excel to the powerful data mining algorithms in SQL Server Analysis Services. The add-in allows users to perform tasks like data preparation, modeling, and validating models with just a few clicks.
Similar to Statistics is both the science of uncertainty and the technology.docx (20)
Statistica Sinica 16(2006), 847-860
PSEUDO-R
2
IN LOGISTIC REGRESSION MODEL
Bo Hu, Jun Shao and Mari Palta
University of Wisconsin-Madison
Abstract: Logistic regression with binary and multinomial outcomes is commonly
used, and researchers have long searched for an interpretable measure of the strength
of a particular logistic model. This article describes the large sample properties
of some pseudo-R2 statistics for assessing the predictive strength of the logistic
regression model. We present theoretical results regarding the convergence and
asymptotic normality of pseudo-R2s. Simulation results and an example are also
presented. The behavior of the pseudo-R2s is investigated numerically across a
range of conditions to aid in practical interpretation.
Key words and phrases: Entropy, logistic regression, pseudo-R2
1. Introduction
Logistic regression for binary and multinomial outcomes is commonly used
in health research. Researchers often desire a statistic ranging from zero to one
to summarize the overall strength of a given model, with zero indicating a model
with no predictive value and one indicating a perfect fit. The coefficient of deter-
mination R2 for the linear regression model serves as a standard for such measures
(Draper and Smith (1998)). Statisticians have searched for a corresponding in-
dicator for models with binary/multinomial outcome. Many different R2 statis-
tics have been proposed in the past three decades (see, e.g., McFadden (1973),
McKelvey and Zavoina (1975), Maddala (1983), Agresti (1986), Nagelkerke
(1991), Cox and Wermuch (1992), Ash and Shwartz (1999), Zheng and Agresti
(2000)). These statistics, which are usually identical to the standard R2 when
applied to a linear model, generally fall into categories of entropy-based and
variance-based (Mittlböck and Schemper (1996)). Entropy-based R2 statistics,
also called pseudo-R2s, have gained some popularity in the social sciences (Mad-
dala (1983), Laitila (1993) and Long (1997)). McKelvey and Zavoina (1975)
proposed a pseudo-R2 based on a latent model structure, where the binary/
multinomial outcome results from discretizing a continuous latent variable that
is related to the predictors through a linear model. Their pseudo-R2 is defined
as the proportion of the variance of the latent variable that is explained by the
848 BO HU, JUN SHAO AND MARI PALTA
covariate. McFadden (1973) suggested an alternative, known as “likelihood-
ratio index”, comparing a model without any predictor to a model including all
predictors. It is defined as one minus the ratio of the log likelihood with inter-
cepts only, and the log likelihood with all predictors. If the slope parameters
are all 0, McFadden’s R2 is 0, but it is never 1. Maddala (1983) developed
another pseudo-R2 that can be applied to any model estimated by the maximum
likelihood method. This popular and widely used measure is expressed as
R2M = 1 −
(
L(θ̃)
L(θ̂)
)
2
n
, (1)
.
Stations yourself somewhere (library, cafeteria, etc.) and observe.docxrafaelaj1
Stations yourself somewhere (library, cafeteria, etc.) and observe the nonverbal communication that occurs.
What do people say with their bodies?
What messages are implicit in vocal expressions, clothes, make-up and so on?
Take notes on five of the most eloquent messages sent nonverbally.
*one page.
*Read the instructions then write about 5 difeerent people
.
StatementState legislatures continue to advance policy proposals.docxrafaelaj1
Statement
State legislatures continue to advance policy proposals to address cyber threats directed at governments and private businesses. As threats continue to evolve and expand and as the pace of new technologies accelerates, legislatures are making cybersecurity measures a higher priority.
Assignment
You are to author a 2-page (maximum) paper about the “failed” amendments proposed by the Kentucky legislature in 2019 with respect to Cyber Policy. APA format – 1 cover page, 2 content pages, and 1 reference page.
You are to answer two questions in your individual papers.
Brief background of the proposed amendment and “researched” speculation as to why it failed?
What would you propose for them to pass in 2020?
Remember to cite your sources appropriately and turn in original work!
Section 54
KY S 14
Status: Failed - Adjourned
Provides definitions relating to personal information, provides certain personal information that shall be protected from disclosure by a public agency or third-party contractor through redaction or other means, provides a list of covered persons, provides guidelines for contracts between a public agency and a third-party contractor.
.
StatementState legislatures continue to advance policy propo.docxrafaelaj1
Statement
State legislatures continue to advance policy proposals to address cyber threats directed at governments and private businesses. As threats continue to evolve and expand and as the pace of new technologies accelerates, legislatures are making cybersecurity measures a higher priority.
Assignment
You are to author a 2-page (maximum) paper about the “failed” amendments proposed by the Kentucky legislature in 2019 with respect to Cyber Policy. APA format – 1 cover page, 2 content pages, and 1 reference page.
You are to answer two questions in your individual papers.
Brief background of the proposed amendment and “researched” speculation as to why it failed?
What would you propose for them to pass in 2020?
Remember to cite your sources appropriately and turn in original work!
KY S 14
Status: Failed - Adjourned
Provides definitions relating to personal information, provides certain personal information that shall be protected from disclosure by a public agency or third-party contractor through redaction or other means, provides a list of covered persons, provides guidelines for contracts between a public agency and a third-party contractor.
.
Statement of PurposeProvide a statement of your educational .docxrafaelaj1
Statement of Purpose
Provide a statement of your educational background, experience, and preparation relevant to a graduate program in computer science, and specify your research and career goals.
The statement of purpose is a short essay introducing the applicant and his or her
interests, goals, and reasons for pursuing graduate study in history. Applicants may wish
to share a draft of their statement with the individuals writing their letters of
recommendation. While every statement, like every prospective student, will be different,
applicants should devote special attention to the following items:
• Academic/Professional Background: Please give your academic credentials, with
degrees, dates, and relevant employment experience. You do not need to list every
job you have had, only those that bear directly on your desire to enter graduate
school.
• Motivations and Aims: Explain what motivates you to do graduate work in history
and what your goals are, both within the graduate program and after the
completion of your degree.
• Existing Expertise and Accomplishments in History: Discuss any areas of
expertise you may already have in your proposed area of interest. If you have
experience doing research, please describe the project and your work on it. If you
have any special talents or skills, such as a foreign language, please describe
them.
• Proposed Course of Study: Please identify planned major field and minor fields of
study.
• Other Relevant Experiences or Personal Qualities: Discuss any experiences or
personal attributes that may illuminate your commitment to the study of history
and to the successful completion of the graduate program.
Format: Your statement of purpose should be limited to no more than 750 words
(between 2 and 3 pages).
.
States and the federal government should not use private prisons for.docxrafaelaj1
States and the federal government should not use private prisons for various reasons. First, most of the private prisons are for-profit facilities. Therefore, they cut on expenses such as lacking enough staffing and resources, which is likely to affect inmates' safety and quality of life. Further, while pro-private prisons note that private prisons save taxpayers' money, studies indicate that they do not reduce costs. For instance, the day to day cost of housing an inmate in 2010 was $53.02 for private prisons compared to $48.42 for a medium-security public prison (Pedowitz, 2012). Also, prisoners do not receive similar kinds of treatment in private facilities. While they may be suitable for the local economy, such as offering job opportunities, lowering costs by private facilities leaves inmates sick and not well cared for (NPR Staff, 2011).
.
StatementState legislatures continue to advance policy proposa.docxrafaelaj1
The document discusses a 2-page paper assignment on failed cybersecurity amendments proposed by the Kentucky legislature in 2019. Students are asked to analyze why one amendment failed and propose a new amendment for 2020. The failed amendment, KY S 14, aimed to protect personal information from disclosure by public agencies or third-party contractors.
Statement of Interest (This is used to apply for Graduate Schoo.docxrafaelaj1
Statement of Interest: (This is used to apply for Graduate School, digital media program)
Length: 2 pages. (500-750 words)
Area of interest in digital media.
-computational arts.
I did a mix media group exhibition in Feb. 2018 called What Makes You You. Half of my show is a sculpture-based installation. The other half is an interactive digital programed art(using Processing software). The visual of human evolution ties these two park together. See details at https://dongpu.weebly.com/what-makes-you-you.html
-short videos (documentary production).
I love shooting short videos. I formed a Youtube team called 2037 Club last year. https://www.youtube.com/channel/UCmtUQfDMvL9iE8IOy9oshSA
The latest documentary I did is called Liang(Grain). Video Statement:This documentary discovers the Chinese planned economy history period, which starts at the 1950s. People were given a certain amount of coupons to buy food and daily needs because of the limitation of products. Since this is a historical theme, reference images are included to support the concept of buying food today and before 1990. Other than that, the visuals are mainly about common people’s daily routine nowadays. Along with the visual, the most artistic part in this video is Shanghainese dialogue, which explained food coupons in the way of storytelling. “I accidentally found many food coupons in my grandparents’ house this summer, so I went to ask them about the experiences they had with these coupons. My curiosity leads me to the theme.” said by the producer.
This particular video also has a different meaning to me. My Grandmother passed away one week before the video published at the film festival (at Scottsdale Museum of Commemoratory Art). This piece becomes memorable to me, sadly, my grandmother never had a chance to see the whole piece.
https://www.youtube.com/watch?v=fD0Y-BXDfnY
-3D animation for games
I learned Maya in an animation course. Like editing videos, I soon full in love with 3D modeling.
https://www.youtube.com/watch?v=nGIRxaUdYiY
Business idea, if an applicant has one. It is fine if an applicant is unsure when applying to the program. It is also fine if an applicant is interested in the artistic aspect of digital media and not the entrepreneurial aspect.
I would like to complete a mobile game project in my Graduate studies period and start a game company after graduation. Meanwhile, still active in the art field being an intermedia artist.
Goals or expectations upon completion of the digital media program.
-I want to learn more about computer technologies to create artistic works. Focusing on the field of game development.
-Do more social, get to know people in my field
-Get professional advices of my projects
Here is my cover letter, you can utilize this for the statement of interest.
As a creative and passionate professional with a rich history of developing creative materials, I am eager to submit my resume for consideration for the (Position Title) position .
StatementState legislatures continue to advance policy prop.docxrafaelaj1
Statement
State legislatures continue to advance policy proposals to address cyber threats directed at governments and private businesses. As threats continue to evolve and expand and as the pace of new technologies accelerates, legislatures are making cybersecurity measures a higher priority.
Assignment
You are page amendments to author a 2-page (maximum) paper about the “failed” amendments proposed by the Kentucky legislature in 2019 with respect to Cyber Policy. APA format – 1 cover page, 2 content pages, and 1 reference pageamendments proposed.
You are to answer two questions in your individual papers.
Brief background of the proposed amendment and “researched” speculation as to why it failed?
What would you propose for them to pass in 2020?
.
Statement of cash flows (indirect method) Cash flows from ope.docxrafaelaj1
Statement of cash flows (indirect method)
Cash flows from operating activities
Net income
72,600
adjustments to net income
depreciation
4,000
Gan on sale of investments
-7,000
Increase in AR
-36,000
Decrease in inventory
40,000
Increased in Accounts payable
13,000
Decrease in Accrued liabilities
-3,100
net cash provided by operating activities
83,500
Cash flows from investing activities
Purchase of Plant assets
-16,000
Sale of long-term investments
20,000
net cash provided by investing activities
4,000
Cash flows from financing activities
retiement of bonds
-31,000
payment of dividend
-32,500
sale of common stock
6,000
net cash provided by financing activities
-57,500
net increase in cash
30,000
Cash balance, beginning
230,000
cash balance, ending
260,000
Statement of Cash flows (direct method)
Cash flows from operating activities
cash received from customers
714,000
(sales - increase in AR)
cash paid for merchandise
477,000
(cogs - decrease in invnetory - increase in AP)
cash paid for other operating expenes
105,100
(selling & admin exp + decrease in accrued liab - depreciation)
cash paid for income taxes
48,400
net cash provided b oeprating activities
83,500
Cash flows from investing activities
Purchase of Plant assets
-16,000
Sale of long-term investments
20,000
net cash provided by investing activities
4,000
Cash flows from financing activities
retiement of bonds
-31,000
payment of dividend
-32,500
sale of common stock
6,000
net cash provided by financing activities
-57,500
net increase in cash
30,000
Cash balance, beginning
230,000
cash balance, ending
260,000
.
Stateline Shipping and Transport CompanyRachel Sundusky is the m.docxrafaelaj1
Stateline Shipping and Transport Company
Rachel Sundusky is the manager of the South-Atlantic office of the Stateline Shipping and Transport Company. She is in the process of negotiating a new shipping contract with Polychem, a company that manufactures chemicals for industrial use. Polychem want Stateline to pick up and transport waste products from its six plants to three waste disposal sites. Rachel is very concerned about this proposed arrangement. The chemical wastes that will be hauled can be hazardous to humans and the environment if they leak. In addition, a number of towns and communities in the region where the plants are located prohibit hazardous materials from being shipped through their municipal limits. Thus, not only will the shipments have to be handled carefully and transported at reduced speeds, they will also have to traverse circuitous routes in many cases. Rachel has estimated the cost of shipping a barrel of waste from each of the six plants to each of the three waste disposal sites as shown in the following table:
Waste Disposal Site
Plant
Whitewater
Los Canos
Duras
Kingsport
$12
$15
$17
Danville
14
9
10
Macon
13
20
11
Selma
17
16
19
Columbus
7
14
12
Allentown
22
16
18
The plants generate the following amounts of waste products each week:
Plant
Waste per Week (bbl)
Kingsport
35
Danville
26
Macon
42
Selma
53
Columbus
29
Allentown
38
The three waste disposal sites at Whitewater, Los Canos, and Duras can accommodate a maximum of 65, 80, and 105 barrels per week respectively. In addition to shipping directly from each of the six plants to one of the three waste disposal sites, Rachel is also considering using each of the plants and waste disposal sites as intermediate shipping points. Trucks would be able to drop a load at a plant or disposal site to be picked up and carried on to the final destination by another truck, and vice versa. Stateline would not incur any handling costs because Polychem has agreed to take care of all local handling of the waste materials at the plants and the waste disposal sites. In other words, the only cost Stateline incurs is the actual transportation cost. So Rachel wants to be able to consider the possibility that it may be cheaper to drop and pick up loads at intermediate points rather than ship them directly. Rachel estimates the shipping costs per barrel between each of the six plants to be as follows:
Plant
Plant
Kingsport
Danville
Macon
Selma
Columbus
Allentown
Kingsport
$ __
$6
$4
$9
$7
$8
Danville
6
__
11
10
12
7
Macon
5
11
__
3
7
15
Selma
9
10
3
__
3
16
Columbus
7
12
7
3
__
14
Allentown
8
7
15
16
14
__
The e.
State Two ways in which Neanderthals and Cro-Magnons differed. .docxrafaelaj1
State Two ways in which Neanderthals and Cro-Magnons differed.
List an important achievement for each of these scientist. Aristarchus, Euclid, Archimedes, and Herophilus
"Civilizations" is defined as the stage of development in which people have developed :
1. large, permanet communites.
2. a system of writing
3. divirsion
4. trade
5. a srtong central goverment
.
STAT 3300 Homework #6Due Thursday, 03282019Note Answe.docxrafaelaj1
This document outlines a homework assignment for a statistics course. It provides details on a multiple regression analysis examining the relationship between average student debt and various college metrics like admission rates, graduation rates, and in-state costs. The assignment asks students to conduct the multiple regression, analyze residuals, test hypotheses, and determine which variables are significant predictors of debt. It also provides learning objectives for a chapter on juvenile justice treatment and prevention programs.
State Standard by Content AreaLiteracy State Standard to Integra.docxrafaelaj1
State Standard by Content Area
Literacy State Standard to Integrate into Another Content Area
Use a different literacy standard for each content standard.
Standards-based Learning Objective
Aligned to content standards
Instructional Strategy to Integrate Literacy
Resources
Provide links to websites, PDFs, and any other documents used or referenced for strategy
Rationale
How the strategy will promote balanced literacy curriculum
State Content Standard 1:
State Content Standard 2:
State Content Standard 3:
.
STAT200: Assignment #2 - Descriptive Statistics Analysis and Writeup - Instructions
Page 1 of 3
STAT200 Introduction to Statistics
Assignment #2: Descriptive Statistics Analysis and Writeup
Assignment #2: Descriptive Statistics Analysis and Writeup
In the first assignment (Assignment #1: Descriptive Statistics Analysis Data Plan), you developed a
scenario about annual household expenditures and a plan for analyzing the data using descriptive
statistic methods. The purpose of this assignment is to carry out the descriptive statistics analysis plan
and write up the results. The expected outcome of this assignment is a two to three page write-up of
the findings from your analysis as well as a recommendation.
Assignment Steps:
Step #1: Review Feedback from Your Instructor
Before performing any analysis, please make sure to review your instructor’s feedback on Assignment
#1: Descriptive Statistics Data Analysis Plan. Based on the feedback, modify variables, tables, and
selected statistics, graphs, and tables, if needed.
Step #2: Perform Descriptive Statistic Analysis
Task 1: Look at the dataset.
• (Re)Familiarize yourself with the variables. Review Table 1: Variables Selected for the
Analysis you generated for the first assignment as well as your instructor’s feedback. In
addition, look at the data dictionary contained in the data set for information about the
variables.
• Select the variables you need for the analysis.
Task 2: Complete your data analysis, as outlined in your first assignment, with any needed
modifications, based on your instructor’s feedback.
• Calculate Measures of Central Tendency and Variability. Use the information from
Assignment #1 - Table 2. Numerical Summaries of the Selected Variables. Here again,
be sure to see your instructor’s feedback and incorporate into the analysis.
• Prepare Graphs and/or Tables. Use the information from Assignment #1 - Table 3.
Type of Graphs and/or Tables for Selected Variables. Here again, be sure to see your
instructor’s feedback and incorporate into the analysis.
STAT200: Assignment #2 - Descriptive Statistics Analysis and Writeup - Instructions
Page 2 of 3
Step #3: Write-up findings using the Provided Template
For this part of the assignment, write a short 2-3 page write-up of the process you followed and the
findings from your analysis. You will describe, in words, the statistical analysis used and present the
results in both statistical/text and graphic formats.
Here are the main sections for this assignment:
✓ Identifying Information. Fill in information on name, class, instructor, and date.
✓ Introduction. For this section, use the same scenario you submitted for the first assignment and
modified using your instructor’s feedback, if needed. Include Table 1 (Table 1: Variables
Selected for the Analysis) you used in Assignment #1 to show the variables you selected for the
analysis.
✓ Data .
STAT200: Assignment #2 - Descriptive Statistics Analysis Writeup - Template
Page 3 of 3
University of Maryland University College
STAT200 - Assignment #2: Descriptive Statistics Analysis and Writeup
Identifying Information
Student (Full Name):
Class:
Instructor:
Date:
Introduction:
Use the same scenario you submitted for the first assignment with modifications using your instructor’s feedback, if needed. Include Table 1: Variables Selected for the Analysis you used in Assignment #1 to show the variables you selected for analysis.
Table 1. Variables Selected for the Analysis
Variable Name in data set
Description
Type of Variable (Qualitative or Quantitative)
Variable 1: “Income”
Annual household income in USD.
Quantitative
Variable 2:
Variable 3:
Variable 4:
Variable 5:
Data Set Description and Method Used for Analysis:
Results:
Variable 1: Income
Numerical Summary.
Table 2. Descriptive Analysis for Variable 1
Variable
n
Measure(s) of Central Tendency
Measure(s) of Dispersion
Variable: Income
Median=
SD =
Graph and/or Table: Histogram of Income
(Place Histogram here)
Description of Findings.
Variable 2: (Fill in name of variable)
Numerical Summary.
Table 3. Descriptive Analysis for Variable 2
Variable
n
Measure(s) of Central Tendency
Measure(s) of Dispersion
Variable:
Graph and/or Table.
(Place Graph or Table Here)
Description of Findings.
Variable 3: (Fill in name of variable)
Numerical Summary.
Table 4. Descriptive Analysis for Variable 3
Variable
n
Measure(s) of Central Tendency
Measure(s) of Dispersion
Variable:
Graph and/or Table.
(Place Graph or Table Here)
Description of Findings.
Variable 4: (Fill in name of variable)
Numerical Summary.
Table 5. Descriptive Analysis for Variable 4
Variable
N
Mean/Median
St. Dev.
Variable 4:
Graph and/or Table.
(Place Graph or Table Here)
Description of Findings.
Variable 5: (Fill in name of variable)
Numerical Summary.
Table 6. Descriptive Analysis for Variable 5
Variable
n
Measure(s) of Central Tendency
Measure(s) of Dispersion
Variable:
Graph and/or Table.
(Place Graph or Table Here)
Description of Findings.
Discussion and Conclusion.
Briefly discuss each variable in the same sequence as presented in the results. What has the highest expenditure? What variable has the lowest expenditure? If you were to recommend a place to save money, which expenditure would it be and why? Note: The section should be no more than 2 paragraphs.
STAT200 Introduction to Statistics
Dataset for Written Assignments
Description of Dataset:
The data is a random sample from the US Department of Labor’s 2016 Consumer Expenditure Surveys (CE) and provides information about the composition of households and their annual expenditures (https://www.bls.gov/cex/). It contains information from 30 households, where a survey responder provided the requested information; it is all self-reported information. This dataset contains four socioeconomic variables (whose names.
State legislatures continue to advance policy proposals to address c.docxrafaelaj1
The document discusses a 2-page paper assignment on failed cybersecurity policy amendments proposed by the Kentucky legislature in 2019. Students are asked to answer two questions: 1) provide a brief background on a proposed amendment that failed and speculate on why it failed, and 2) propose an amendment for the legislature to pass in 2020. The assignment requires citing sources and original work, and is due by the specified date. It also provides background on one failed proposed amendment related to protecting personal information.
State FLORIDAInstructionsThis written assignment requ.docxrafaelaj1
State: FLORIDA
Instructions
This written assignment requires the student to investigate his/her local, state and federal legislators and explore their assigned committees and legislative commitments. The student is expected to investigate current and actual legislative initiatives that have either passed or pending approval by the house, senate or Governor’s office. The student will draft a letter to a specific legislator and offer support or constructive argument against pending policy or legislation. The letter must be supported with a minimum of 3 evidence based primary citations. (See Rubric)
Submission Details:
Support your responses with examples.
Cite any sources in APA format.
Submit your document to the
Submissions Area
by
the due date assigned.
.
State of the Science Quality ImprovementNameInst.docxrafaelaj1
State of the Science Quality ImprovementNameInstitutionsDate
Abstract
The condition of chronic heart failure sometimes is referred to as congestive heart failure (CHF), which is recognized as an acute life-threatening disease that majorly affects millions of American citizens annually. The condition of the chronic heart failure results when the heart is incapable of sufficient pump the blood throughout the body tissues due to the weak heart muscles (January et al., 2019). Certain conditions, such as narrowed arteries in the heart (CAD) or high blood pressure, gradually leave the heart too weak or stiff to fill and pump efficiently. Moreover, there are some of the several conditions such as coronary artery diseases and hypertension that leads to acute and chronic heart failure in the body system. More importantly, to avoid the possibility of this dangerous condition as well as the ever-increasing of the re-admitted hospital continue, collectively, the patient must be able to control the earlier stated conditions along with diabetes as well as obesity at home-based care and with their primary healthcare providers as well. According to Santesmases-Masana et al. (2019), "Primary health care planned care has been shown to reduce heart failure re-hospitalizations and maintain the patient quality of life." With this known knowledge, it is important to continue care at home and with their primary care provider to monitor and detect worsening of their condition sooner rather than later with evidence-based treatment practices. There are many evidence-based treatments for chronic heart failure that includes monitoring of vital signs, weight, and diet along with medications. In this paper, chronic heart failure, problem discussion, PICO question, and theoretical framework will be presented.
Problem Discussion
Chronic heart failure is a chief public health care concern linked with the high degree of mortality and morbidity in the U.S. Heart failure usually results in adverse outcomes, and the most costly is the issues of hospital readmissions. Currently, the heart failure management clinical procedures and pieces of evidence emphasizes the significance and the function of the care interventions a mid preventing the heart failure readmissions in the hospital set up. The current literature review is meant to evaluate and assess the effectiveness of transitional care interventions that intend to minimize hospital readmissions. Increase hospital readmission and worsening chronic heart failure complications are due to lack of following of a primary care provider and home monitoring of vital signs, weight, diet, energy level, and breathing patterns by the patient. There are many evidence-based practices and comprehensive guidelines for chronic heart failure treatment with side effects of some medications about individual races. For instance, losartan has little to adverse impact on blacks. Furthermore, according to Hadidi et al. (2018), "It has been.
State Data_1986-2015YearGross state product per capitaEducation sp.docxrafaelaj1
The document provides data on various metrics for a U.S. state from 1986 to 2016 including gross state product per capita, education spending per student, unemployment rates, and high school graduation rates. It shows trends over time, with generally increasing economic output and education spending, and decreasing unemployment and increasing graduation rates. The data could help inform policymaking and planning.
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...PECB
Denis is a dynamic and results-driven Chief Information Officer (CIO) with a distinguished career spanning information systems analysis and technical project management. With a proven track record of spearheading the design and delivery of cutting-edge Information Management solutions, he has consistently elevated business operations, streamlined reporting functions, and maximized process efficiency.
Certified as an ISO/IEC 27001: Information Security Management Systems (ISMS) Lead Implementer, Data Protection Officer, and Cyber Risks Analyst, Denis brings a heightened focus on data security, privacy, and cyber resilience to every endeavor.
His expertise extends across a diverse spectrum of reporting, database, and web development applications, underpinned by an exceptional grasp of data storage and virtualization technologies. His proficiency in application testing, database administration, and data cleansing ensures seamless execution of complex projects.
What sets Denis apart is his comprehensive understanding of Business and Systems Analysis technologies, honed through involvement in all phases of the Software Development Lifecycle (SDLC). From meticulous requirements gathering to precise analysis, innovative design, rigorous development, thorough testing, and successful implementation, he has consistently delivered exceptional results.
Throughout his career, he has taken on multifaceted roles, from leading technical project management teams to owning solutions that drive operational excellence. His conscientious and proactive approach is unwavering, whether he is working independently or collaboratively within a team. His ability to connect with colleagues on a personal level underscores his commitment to fostering a harmonious and productive workplace environment.
Date: May 29, 2024
Tags: Information Security, ISO/IEC 27001, ISO/IEC 42001, Artificial Intelligence, GDPR
-------------------------------------------------------------------------------
Find out more about ISO training and certification services
Training: ISO/IEC 27001 Information Security Management System - EN | PECB
ISO/IEC 42001 Artificial Intelligence Management System - EN | PECB
General Data Protection Regulation (GDPR) - Training Courses - EN | PECB
Webinars: https://pecb.com/webinars
Article: https://pecb.com/article
-------------------------------------------------------------------------------
For more information about PECB:
Website: https://pecb.com/
LinkedIn: https://www.linkedin.com/company/pecb/
Facebook: https://www.facebook.com/PECBInternational/
Slideshare: http://www.slideshare.net/PECBCERTIFICATION
The simplified electron and muon model, Oscillating Spacetime: The Foundation...RitikBhardwaj56
Discover the Simplified Electron and Muon Model: A New Wave-Based Approach to Understanding Particles delves into a groundbreaking theory that presents electrons and muons as rotating soliton waves within oscillating spacetime. Geared towards students, researchers, and science buffs, this book breaks down complex ideas into simple explanations. It covers topics such as electron waves, temporal dynamics, and the implications of this model on particle physics. With clear illustrations and easy-to-follow explanations, readers will gain a new outlook on the universe's fundamental nature.
How to Fix the Import Error in the Odoo 17Celine George
An import error occurs when a program fails to import a module or library, disrupting its execution. In languages like Python, this issue arises when the specified module cannot be found or accessed, hindering the program's functionality. Resolving import errors is crucial for maintaining smooth software operation and uninterrupted development processes.
it describes the bony anatomy including the femoral head , acetabulum, labrum . also discusses the capsule , ligaments . muscle that act on the hip joint and the range of motion are outlined. factors affecting hip joint stability and weight transmission through the joint are summarized.
How to Add Chatter in the odoo 17 ERP ModuleCeline George
In Odoo, the chatter is like a chat tool that helps you work together on records. You can leave notes and track things, making it easier to talk with your team and partners. Inside chatter, all communication history, activity, and changes will be displayed.
Main Java[All of the Base Concepts}.docxadhitya5119
This is part 1 of my Java Learning Journey. This Contains Custom methods, classes, constructors, packages, multithreading , try- catch block, finally block and more.
This presentation was provided by Steph Pollock of The American Psychological Association’s Journals Program, and Damita Snow, of The American Society of Civil Engineers (ASCE), for the initial session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session One: 'Setting Expectations: a DEIA Primer,' was held June 6, 2024.
Macroeconomics- Movie Location
This will be used as part of your Personal Professional Portfolio once graded.
Objective:
Prepare a presentation or a paper using research, basic comparative analysis, data organization and application of economic information. You will make an informed assessment of an economic climate outside of the United States to accomplish an entertainment industry objective.
हिंदी वर्णमाला पीपीटी, hindi alphabet PPT presentation, hindi varnamala PPT, Hindi Varnamala pdf, हिंदी स्वर, हिंदी व्यंजन, sikhiye hindi varnmala, dr. mulla adam ali, hindi language and literature, hindi alphabet with drawing, hindi alphabet pdf, hindi varnamala for childrens, hindi language, hindi varnamala practice for kids, https://www.drmullaadamali.com
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Dr. Vinod Kumar Kanvaria
Exploiting Artificial Intelligence for Empowering Researchers and Faculty,
International FDP on Fundamentals of Research in Social Sciences
at Integral University, Lucknow, 06.06.2024
By Dr. Vinod Kumar Kanvaria
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Statistics is both the science of uncertainty and the technology.docx
1. Statistics is both the science of uncertainty and the technology
of extracting information from data.
A statistic is a summary measure of data.
Descriptive statistics are methods that describe and summarize
data.
Microsoft Excel supports statistical analysis in two ways:
1. Statistical functions
2. Analysis Toolpak add-in
Statistical Methods for Summarizing Data
A frequency distribution is a table that shows the number of
observations in each of several nonoverlapping groups.
Categorical variables naturally define the groups in a frequency
distribution.
To construct a frequency distribution, we need only count the
number of observations that appear in each category.
This can be done using the Excel COUNTIF function.
Frequency Distributions for Categorical Data
Example 3.16: Constructing a Frequency Distribution for Items
in the Purchase Orders Database
List the item names in a column on the spreadsheet.
Use the function =COUNTIF($D$4:$D$97,cell_reference),
where cell_reference is the cell containing the item name
2. Example 3.16: Constructing a Frequency Distribution for Items
in the Purchase Orders Database
Construct a column chart to visualize the frequencies.
Relative frequency is the fraction, or proportion, of the total.
If a data set has n observations, the relative frequency of
category i is:
We often multiply the relative frequencies by 100 to express
them as percentages.
A relative frequency distribution is a tabular summary of the
relative frequencies of all categories.
Relative Frequency Distributions
Example 3.17: Constructing a Relative Frequency Distribution
for Items in the Purchase Orders Database
First, sum the frequencies to find the total number (note that the
3. sum of the frequencies must be the same as the total number of
observations, n).
Then divide the frequency of each category by this value.
For numerical data that consist of a small number of discrete
values, we may construct a frequency distribution similar to the
way we did for categorical data; that is, we simply use
COUNTIF to count the frequencies of each discrete value.
Frequency Distributions for Numerical Data
In the Purchase Orders data, the A/P terms are all whole
numbers 15, 25, 30, and 45.
Example 3.18: Frequency and Relative Frequency Distribution
for A/P Terms
A graphical depiction of a frequency distribution for numerical
data in the form of a column chart is called a histogram.
Frequency distributions and histograms can be created using the
Analysis Toolpak in Excel.
Click the Data Analysis tools button in the Analysis group
under the Data tab in the Excel menu bar and select Histogram
4. from the list.
Excel Histogram Tool
Specify the Input Range corresponding to the data. If you
include the column header, then also check the Labels box so
Excel knows that the range contains a label. The Bin Range
defines the groups (Excel calls these “bins”) used for the
frequency distribution.
Histogram Dialog
If you do not specify a Bin Range, Excel will automatically
determine bin values for the frequency distribution and
histogram, which often results in a rather poor choice.
If you have discrete values, set up a column of these values in
your spreadsheet for the bin range and specify this range in the
Bin Range field.
Using Bin Ranges
We will create a frequency distribution and histogram for the
A/P Terms variable in the Purchase Orders database.
We defined the bin range below the data in cells H99:H103 as
follows:
5. Month
15
25
30
45
Example 3.19: Using the Histogram Tool
Histogram tool results:
Example 3.19: Using the Histogram Tool
For numerical data that have many different discrete values with
little repetition or are continuous, a frequency distribution
requires that we define by specifying
the number of groups,
the width of each group, and
the upper and lower limits of each group.
Choose between 5 to 15 groups, and the range of each should be
equal.
Choose the lower limit of the first group (LL) as a whole
number smaller than the minimum data value and the upper
limit of the last group (UL) as a whole number larger than the
maximum data value.
Histograms for Numerical Data
6. The data range from a minimum of $68.75 to a maximum of
$127,500; set the lower limit of the first group to $0 and the
upper limit of the last group to $130,000.
If we select 5 groups, using equation (3.2) the width of each
group is ($130,000 - 0) / 5 = $26,000
Example 3.20: Constructing a Frequency Distribution and
Histogram for Cost per Order
Ten-group histogram
Example 3.20: Constructing a Frequency Distribution and
Histogram for Cost per Order
Set the cumulative relative frequency of the first group equal to
its relative frequency. Then add the relative frequency of the
next group to the cumulative relative frequency.
For, example, the cumulative relative frequency in cell D3 is
computed as =D2+C3 = 0.000 + 0.447 = 0.447.
Example 3.21 Computing Cumulative Relative Frequencies
7. The kth percentile is a value at or below which at least k
percent of the observations lie. The most common way to
compute the kth percentile is to order the data values from
smallest to largest and calculate the rank of the kth percentile
using the formula:
Statistical software use different methods that often involve
interpolating between ranks instead of rounding, thus producing
different results.
The Excel function PERCENTILE.INC(array, k) computes the
kth percentile of data in the range specified in the array field,
where k is in the range 0 to 1, inclusive (i.e., including 0 and
1).
Percentiles
Compute the 90th percentile for Cost per order in the Purchase
Orders data.
Rank of kth percentile = nk/100 + 0.5
n = 94; k = 90
For the 90th percentile, the rank is
= 94(90)/100+0.5 = 85.1 (round to 85)
Value of the 85th observation = $74,375
Using the Excel function PERCENTILE.INC(G4:G97,0.9), the
90th percentile is $73,737.50, which is different from using
formula (3.3).
8. Examples 3.22 and 3.23: Computing Percentiles
Data >
Data Analysis >
Rank and Percentile
90.3rd percentile
= $74,375
(same result as
manually computing
the 90th percentile)
Example 3.24 Excel Rank and Percentile Tool
The Excel value of the 90th percentile that was computed in
Example 3.23 as $74,375 is the 90.3rd percentile value.
Quartiles break the data into four parts.
The 25th percentile is called the first quartile,Q1;
the 50th percentile is called the second quartile, Q2;
the 75th percentile is called the third quartile, Q3; and
the 100th percentile is the fourth quartile, Q4.
One-fourth of the data fall below the first quartile, one-half are
below the second quartile, and three-fourths are below the third
quartile.
Excel function QUARTILE. INC(array, quart), where array
specifies the range of the data and quart is a whole number
between 1 and 4, designating the desired quartile.
Quartiles
9. Compute the Quartiles of the Cost per Order data
First quartile: =QUARTILE.INC(G4:G97,1) = $6,757.81
Second quartile: =QUARTILE.INC(G4:G97,2) = $15,656.25
Third quartile: =QUARTILE.INC(G4:G97,3) = $27,593.75
Fourth quartile: =QUARTILE.INC(G4:G97,4) = $127,500.00
Example 3.25 Computing Quartiles in Excel
A cross-tabulation is a tabular method that displays the number
of observations in a data set for different subcategories of two
categorical variables.
A cross-tabulation table is often called a contingency table.
The subcategories of the variables must be mutually exclusive
and exhaustive, meaning that each observation can be classified
into only one subcategory, and, taken together over all
subcategories, they must constitute the complete data set.
Cross-Tabulations
Sales Transactions database
10. Count the number (and compute the percentage) of books and
DVDs ordered by region.
Example 3.26: Constructing a Cross-Tabulation
Cross-Tabulation Visualization: Chart of Regional Sales by
Product
Select the Insert tab.
Highlight the data.
Click on chart type, then subtype.
Use Chart Tools to customize.
Creating Charts in Microsoft Excel
11. Excel distinguishes between vertical and horizontal bar charts,
calling the former column charts and the latter bar charts.
A clustered column chart compares values across categories
using vertical rectangles;
a stacked column chart displays the contribution of each value
to the total by stacking the rectangles;
a 100% stacked column chart compares the percentage that each
value contributes to a total.
Column and bar charts are useful for comparing categorical or
ordinal data, for illustrating differences between sets of values,
and for showing proportions or percentages of a whole.
Column and Bar Charts
Example 3.2: Creating a Column Chart
Highlighted Cells
Highlight the range C3:K6, which includes the headings and
data for each category. Click on the Column Chart button and
then on the first chart type in the list (a clustered column chart).
Example 3.2: Creating a Column Chart
To add a title, click on the first icon in the Chart Layouts group.
Click on “Chart Title” in the chart and change it to “EEO
Employment Report—Alabama.” The names of the data series
can be changed by clicking on the Select Data button in the
12. Data group of the Design tab. In the Select Data Source dialog
(see below), click on “Series1” and then the Edit button. Enter
the name of the data series, in this case “All Employees.”
Change the names of the other data series to “Men” and
“Women” in a similar fashion.
Line charts provide a useful means for displaying data over
time.
You may plot multiple data series in line charts; however, they
can be difficult to interpret if the magnitude of the data values
differs greatly. In that case, it would be advisable to create
separate charts for each data series.
Line Charts
Example 3.3: A Line Chart for China Export Data
Pie Charts
A pie chart displays this by partitioning a circle into pie-shaped
areas showing the relative proportion.
Example 3.4: A Pie Chart for Census Data
13. Pie Charts
Data visualization professionals don't recommend using pie
charts. In a pie chart, it is difficult to compare the relative sizes
of areas; however, the bars in the column chart can easily be
compared to determine relative ratios of the data.
If you do use pie charts, restrict them to small numbers of
categories, always ensure that the numbers add to 100%, and
use labels to display the group names and actual percentages.
Avoid three-dimensional (3-D) pie charts—especially those that
are rotated—and keep them simple.
An area chart combines the features of a pie chart with those of
line charts.
Area charts present more information than pie or line charts
alone but may clutter the observer’s mind with too many details
if too many data series are used; thus, they should be used with
care.
Area Charts
Example 3.5: An Area Chart for Energy Consumption
Scatter charts show the relationship between two variables. To
construct a scatter chart, we need observations that consist of
pairs of variables.
Scatter Charts
14. Example 3.6: A Scatter Chart for Real Estate Data
A bubble chart is a type of scatter chart in which the size of the
data marker corresponds to the value of a third variable;
consequently, it is a way to plot three variables in two
dimensions.
Bubble Charts
Example 3.7: A Bubble Chart for Stock Comparisons
Stock chart
Surface chart
Doughnut chart
Radar chart
Miscellaneous Excel Charts
Many applications of business analytics involve geographic
data. Visualizing geographic data can highlight key data
relationships, identify trends, and uncover business
opportunities. In addition, it can often help to spot data errors
and help end users understand solutions, thus increasing the
likelihood of acceptance of decision models.
Companies like Nike use geographic data and information
15. systems for visualizing where products are being distributed and
how that relates to demographic and sales information. This
information is vital to marketing strategies.
Geographic mapping capabilities were introduced in Excel 2000
but were not available in Excel 2002 and later versions. These
capabilities are now available through Microsoft MapPoint
2010, which must be purchased separately.
Geographic Data
Visualizing and Exploring Data
Data visualization - the process of displaying data (often in
large quantities) in a meaningful fashion to provide insights that
will support better decisions.
Data visualization improves decision-making, provides
managers with better analysis capabilities that reduce reliance
on IT professionals, and improves collaboration and information
sharing.
Data Visualization
Tabular data can be used to determine exactly how many units
16. of a certain product were sold in a particular month, or to
compare one month to another.
For example, we see that sales of product A dropped in
February, specifically by 6.7% (computed as 1 – B3/B2).
Beyond such calculations, however, it is difficult to draw big
picture conclusions.
Example 3.1: Tabular vs. Visual Data Analysis
A visual chart provides the means to
easily compare overall sales of different products (Product C
sells the least, for example);
identify trends (sales of Product D are increasing), other
patterns (sales of Product C is relatively stable while sales of
Product B fluctuates more over time), and exceptions (Product
E’s sales fell considerably in September).
Example 3.1: Tabular vs. Visual Data Analysis
A dashboard is a visual representation of a set of key business
measures. It is derived from the analogy of an automobile’s
control panel, which displays speed, gasoline level,
temperature, and so on.
Dashboards provide important summaries of key business
information to help manage a business process or function.
Dashboards
17. Hypothesis Testing – Examples and
Case Studies
How Hypothesis Tests Are Reported
Determine the null hypothesis and the
alternative hypothesis.
Collect and summarize the data into a
test statistic.
Use the test statistic to determine the p-value.
The result is statistically significant if the p-value is less than
or equal to the level of significance.
2
18. Testing Hypotheses About Proportions and Means
If the null and alternative hypotheses are expressed in terms of
a population proportion, mean, or difference between two means
and if the sample sizes are large …
… the test statistic is simply the corresponding standardized
score computed assuming the null hypothesis is true; and the p-
value is found from a table of percentiles for standardized
scores.
3
Example 2: Weight Loss for Diet vs Exercise
Did dieters lose more fat than the exercisers?
Diet Only:
sample mean = 5.9 kg
sample standard deviation = 4.1 kg sample size = n = 42
Exercise Only: sample mean = 4.1 kg
sample standard deviation = 3.7 kg sample size = n = 47
measure of variability = [(0.633)2 + (0.540)2] = 0.83
19. 4
Example 2: Weight Loss for Diet vs Exercise
Step 1. Determine the null and alternative hypotheses.
Null hypothesis: No difference in average fat lost in population
for two methods. Population mean difference is zero.
Alternative hypothesis: There is a difference in average fat lost
in population for two methods. Population mean difference is
not zero.
Step 2. Collect and summarize data into a test statistic.
The sample mean difference = 5.9 – 4.1 = 1.8 kg and the
standard error of the difference is 0.83.
So the test statistic: z = 1.8 – 0 = 2.17
0.83
5
Example 2: Weight Loss for Diet vs Exercise
Step 3. Determine the p-value.
Recall the alternative hypothesis was two-sided.
p- -shaped curve above 2.17]
20. Step 4. Make a decision.
The p-value of 0.03 is less than or equal to 0.05, so …
If really no difference between dieting and exercise as fat loss
methods, would see such an extreme result only 3% of the time,
or 3 times out of 100.
Prefer to believe truth does not lie with null hypothesis. We
conclude that there is a statistically significant difference
between average fat loss for the two methods.
6
Example 3: Public Opinion About President
On May 16, 1994, Newsweek reported the results of a public
opinion poll that asked: “From everything you know about Bill
Clinton, does he have the honesty and integrity you expect in a
president?” (p. 23).
Poll surveyed 518 adults and 233, or 0.45 of them (clearly less
than half), answered yes.
Could Clinton’s adversaries conclude from this that only a
minority (less than half) of the population of Americans thought
Clinton had the honesty and integrity to be president?
7
Example 3: Public Opinion About President
21. Step 1. Determine the null and alternative hypotheses.
Null hypothesis: There is no clear winning opinion on this
issue; the proportions who would answer yes or no are each
0.50.
Alternative hypothesis: Fewer than 0.50, or 50%, of the
population would answer yes to this question. The majority do
not think Clinton has the honesty and integrity to be president.
Step 2. Collect and summarize data into a test statistic.
Sample proportion is: 233/518 = 0.45.
The standard deviation =
– 0.50) = 0.022.
518
Test statistic: z = (0.45 – 0.50)/0.022 = –2.27
8
Example 3: Public Opinion About President
Step 3. Determine the p-value.
Recall the alternative hypothesis was one-sided.
p-value = proportion of bell-shaped curve below –2.27 Exact p-
value = 0.0116.
Step 4. Make a decision.
The p-value of 0.0116 is less than 0.05, so we conclude that the
proportion of American adults in 1994 who believed Bill
22. Clinton had the honesty and integrity they expected in a
president was significantly less than a majority.
9
Revisiting Case Studies: How Journals Present Tests
Whereas newspapers and magazines tend to simply report the
decision from hypothesis testing, journals tend to report p-
values as well.
This allows you to make your own decision, based on the
severity of a type 1 error and the magnitude of the p-value.
10
Case Study 5.1: Quitting Smoking with
Nicotine Patches
11
Compared the smoking cessation rates for smokers randomly
assigned to use a nicotine patch versus a placebo patch.
Null hypothesis: The proportion of smokers in the population
who would quit smoking using a nicotine patch and a placebo
patch are the same.
Alternative hypothesis: The proportion of smokers in the
population who would quit smoking using a nicotine patch is
23. higher than the proportion who would quit using a placebo
patch.
Case Study 5.1: Quitting Smoking with
Nicotine Patches
12
Higher smoking cessation rates were observed in the active
nicotine patch group at 8 weeks (46.7% vs 20%) (P < .001)
and at 1 year (27.5% vs 14.2%) (P = .011).
(Hurt et al., 1994, p. 595)
Conclusion: p-values are quite small: less than 0.001 for
difference after 8 weeks and equal to 0.011 for difference after
a year. Therefore, rates of quitting are significantly higher
using a nicotine patch than using a placebo patch after 8 weeks
and after 1 year.
Case Study 6.4: Smoking During
Pregnancy and Child’s IQ
13
Study investigated impact of maternal smoking on subsequent
IQ of child at ages 1, 2, 3, and 4 years of age.
24. Null hypothesis: Mean IQ scores for children whose mothers
smoke 10 or more cigarettes a day during pregnancy are same as
mean for those whose mothers do not smoke, in populations
similar to one from which this sample was drawn.
Alternative hypothesis: Mean IQ scores for children whose
mothers smoke 10 or more cigarettes a day during pregnancy are
not the same as mean for those whose mothers do not smoke, in
populations similar to one from which this sample was drawn.
Case Study 6.4: Smoking During
Pregnancy and Child’s IQ
14
Children born to women who smoked 10+ cigarettes per day
during pregnancy had developmental quotients at 12 and 24
months of age that were 6.97 points lower (averaged across
these two time points) than children born to women who did not
smoke during pregnancy (95% CI: 1.62,12.31, P = .01); at 36
and 48 months they were 9.44 points lower (95% CI:
4.52, 14.35, P = .0002). (Olds et al., 1994, p. 223)
Researchers conducted two-tailed tests for possibility the mean
IQ score could actually be higher for those whose mothers
smoke. The CI provides evidence of the direction in which the
difference falls. The p-value simply tells us there is a
statistically significant difference.
For Those Who Like Formulas
25. 15
For Those Who Like Formulas
16
For Those Who Like Formulas
17
Statistics
Spring 2019
Module 3 Comprehensive Problem
26. INFERENTIAL STATISTICS – HYPOTHESIS TESTING
Either individually or in groups of 2 or 3, your task is to
perform some real-world inferential statistics. You will take a
claim that someone has made, form a hypothesis from that,
collect the data necessary to test the hypothesis, perform a
hypothesis test, and interpret the results.
You will test to see if less than 50% of students participate in
the Student Evaluation of Teaching system (SETS) in the
School of Business Administration at USCA. Why or Why not?
Determine and describe the type of data that you will collect
and how you plan to collect this data in order to answer your
questions. You will need to collect data on many characteristics
of your sample so that these characteristics can later be
compared somehow (e.g., before and after data; comparisons by
gender, major, type, year, age, etc.) Define the population and
the sample that you will be studying. (you must sample at least
100 students in the SOBA)Project Components
The report will include a description of the problem, and why
you think it is important, or what you hope to gain from testing
the hypothesis. It should also include the context of the data, all
data collected, and the values generated in EXCEL. A decision
and conclusion should be stated. An analysis should follow
with what the conclusion means in terms of the original
problem. The report should be in narrative format like you were
writing for a newspaper or magazine, must be typed, printed,
and should be double spaced.
An excellent final report (100 points) will have the following
components.
· An introduction to the problem including the claim(s) being
tested
· The context (who, what, where, when, why, how) of the data
27. (remember this is in narrative format) and any possible
problems with collecting the data
· Descriptive statistics and/or tables depending on your type of
data
· Appropriate graphs (every project should have at least one
graph or chart of the data in it)
· Inferential statistics including ...
· the null and alternative hypotheses written symbolically
· statistical output including a test statistic and p-value
· a graph showing the critical and non-critical regions, test
statistic, and p-value
· the decision and a conclusion written in terms of the original
claim
· Conclusion
· Suggestions for the next time this project is done
· No statistical usage errors
What can we test?
Some things are easier to test than other things. The purpose of
this project is to expose you to the process of hypothesis testing
in a real-world application. You may test means, proportions, or
linear correlation. You may have one or more samples. You may
categorize your variables in one or two ways.
If you are dealing with one sample, then you will need some
numerical value to test against. The claim "more people prefer
Pepsi than Coke" becomes a claim that the proportion of Pepsi
drinkers is greater than 0.5. There are not two independent
samples (Pepsi drinkers / Coke drinkers), just one sample
categorized in two ways. A problem with the Pepsi / Coke thing
is that it omits other soft drinks because that is more difficult to
do. A chi-square goodness of fit test would be more appropriate
in this case.
Categorical Data
If your data consists solely of categories and not measured
quantities, then you should be looking at proportions or counts.
28. Things to look for that let you know you're dealing with
categorical data or proportions include: proportions, percents,
counts, frequencies, fractions, or ratios. If your data consists of
names or labels, you're dealing with categorical data.
You really need to think about the response that was recorded
for each case. Did you record a yes/no response for each case
or did you record a number that means something? If it was a
yes/no or other categorical data, then this is the place to be.
Example Claims about Categorical Data
· 93.1% of Americans feel there should not be nudity on
television during children's viewing time.
http://www.parentstv.org/PTC/publications/lbbcolumns/2003/05
28.asp
This is a claim about a single proportion. We know this because
the value includes a percentage and the data is categorical (yes
or no), not numerical. The original claim here could be written
as p=0.931.
Quantitative (Numerical) Data
If your data consists of measured quantities, then you will
probably be testing a mean or perhaps correlation between two
variables. It is possible to test a claim about a standard
deviation, but that is rare, and not covered in this course.
There are four main ways to analyze means.
1. A test about a single mean that requires a number as the
claimed value.
2. A test about two independent means doesn't need a number
because you compare them to each other. This compares the
same thing in two different groups.
3. A test for two dependent means, often called paired samples,
compares two values for each case in the same group.
4. The Analysis of Variance is an extension of the two
independent samples case where there are more than two
groups.
29. You can also perform correlation and regression with two
quantitative variables. Simple regression, with just one
predictor variable, is covered in the book. Multiple regression,
with several predictor variables, is not covered in the textbook
but is available online.
Example Claims about Quantitative Data:
· Women live five years longer than men.
http://www.medicalnewstoday.com/medicalnews.php?newsid=1
8866 This is a claim about two averages, the average lifespan of
women and that of men. We don't know the average of either
gender (they're given in the article), we just know that women
are supposed to live five years longer than men. When you're
working with one sample, it's important to have a value to
compare against, but with two samples, you don't need a value
for each, just the difference between the two (in this case 5
years). The original claim here could be written as μw-μm=5
(the difference in the mean ages of women and men is 5 years).
· Seat belts save lives. http://dot.state.il.us/trafficsafety/seatbelt
june 2006.pdf and http://www-
fars.nhtsa.dot.gov/FinalReport.cfm?stateid=17&title=states&titl
e2=fatalities_and_fatality_rates&year=2005. Okay, this claim
is all over the place, but I wanted to give some links on how it
would be tested.
You could take the data regarding the percent of people wearing
their seat belts and compare it to the fatality rate. These are two
numerical values that are paired together for each case
(probably based on an annual report). Remember that you
cannot perform correlation and regression with categorical
variables. The original claim that seat belts save lives would be
interpreted as a negative correlation (as seat belt use goes up,
fatalities go down) and would be written as ρ<0.
Sample Final Report
Available online are sample projects and resources. Your
project may not be as long or detailed.
30. Assignment is due April 15th, either electronically prior to the
start of class or a hard copy at the start of class.
Hypothesis testing
31. Hypothesis testing: procedure
1
6
7
8
We ask a yes/no question about a population.
We answer the question yes, and answer the question no, using
symbols for the population means.
We label one answer the null hypothesis and the other answer
the alternative hypothesis.
We decide the criterion for rejecting the null hypothesis. The
test is one of: two-tailed, right-tailed, or left-tailed. We take a
sample, and calculate our test statistic (Z or t for now)
We find if the observed test statistic is in the rejection region
(critical region or tail) of the distribution.
If the statistic is in the rejection region, we reject the null
hypothesis and accept the alternative hyopthesis.
If the statistic is not in the rejection region, we retain the null
hypothesis, and do not accept the alternative hypothesis.
2
3
4
5
9
32. STATISTICS
PROJECT:
Hypothesis
Testing
INTRODUCTION
My topic is the average tuition cost of a 4-yr. public college.
Since I will soon be transferring to a 4-yr. college, I thought
this topic would be perfect. "The College Board" says that the
average tuition cost of college is $5836 per year. I will be
researching online the costs of different public colleges to test
this claim. I will be using the T-test for a mean, since my
sample is going to be less than 30 and an unknown population
standard deviation. I will also use Chi-Square Test of
Independence.
HYPOTHESIS
I think the average cost of tuition is lower than the average
stated by “The College Board”.
Ho: mu >/= $5836.
H1: mu< $5836 (Claim)
DATA ANALYSIS
I collected my data from various college websites. I looked up
the cost of tuition per year and the number of students enrolled.
Here is what I came up with:
33. College
Tuition
Number of Students
Central Washington University
$4392
10,200
University of Washington
$5985
25,469
Washington State University
$5888
18,432
Western Washington University
$4356
13,000
Evergreen State University
$4590
4400
Eastern Washington University
$5904
10,000
Peninsula College
$3639
10,120
University of Oregon
$6174
20,394
Portland State University
$5208
24,284
Oregon State University
$5604
19,362
Southern Oregon University
$5233
5000
34. Eastern Oregon University
$4500
3000
Western Oregon University
$5763
4500
University of Idaho
$4410
11,739
Idaho State University
$4400
13,000
There weren’t really any large gaps or outliers in the data that I
collected. There was a gap between 5,000 – 10,000 students.
But the rest was mostly consistent. The lowest tuition was
$3639 from Peninsula College and the highest tuition was
$6174 from the University of Oregon. Some of the websites
were hard to find the information I wanted, but I eventually
found it. Some of the websites were specific as to undergraduate
or graduate and some probably contain both. I should have done
further research to make sure that my numbers only contain
undergraduates and not graduates. So, that is one possible
mistake in the data collection.
HYPOTHESIS TESTING
T-Test for a Mean
Step 1: State the hypothesis and identify the claim.
I claim that the average cost of college tuition is less than
$5836 per year as concluded from “The College Board”. At
a=.025, can it be concluded that the average is less than $5836
35. based on a sample of 15 colleges?
H0: mu>/= $5836
H1: mu<$5836 (claim)
Step 2: Find the critical value
At a=.025 and d.f. = 14, the critical value is -2.145.
Step 3: Compute the sample test value. m= 5069.73, s=787.80
t= (5069.73-5836)/(787.80/sqrt(15)) = -3.767
Step 4: Make the decision to reject or not reject the null
hypothesis. Reject the null hypotheses since -3.767 falls in the
critical region.
Step 5: Summarize the results.
I will reject the null hypotheses since there is enough evidence
to support the claim that the average cost of tuition is less than
$5836 per year.
Chi-Squared Independence Test
Step 1: State the hypotheses and identify the claim.
I claim that there is a correlation between the number of
students at a college and the cost of tuition per year. Here is
the data that I collected:
Cost of Tuition
Number of Students
Total
3000-9,999
37. that attend the college. (claim) (x²>0)
Step 2: Find the critical value:
The critical value is 14.449 since the degrees of freedom are (3-
1)(4-1)=6.
Step 3: Compute the test value.
First we have to find the expected value:
E1,1 = (6)(4)/15=1.6
E2,1 = (3)(4)/15=.8
E3,1 = (6)(4)/15=1.6
E1,2 = (6)(6)/15=2.4
E2,2 = (3)(6)/15=1.2
E3,2 = (6)(6)/15=2.4
E1,3 = (6)(3)/15=1.2
E2,3 = (3)(3)-15=.6
E3,3 = (6)(3)/15=1.2
E1,4 = (6)(2)/15=.8
E2,4 = (3)(2)/15=.4
E3,4 = (6)(2)/15=.8
The completed table is shown:
Cost of Tuition
Number of Students
Total
3000-9,999
10,000-16,999
17,000-23,999
39. Step 4: Make the decision to reject or not to reject the null
hypothesis. Do not reject the null hypothesis since 13.333 is
less than 14.449.
Step 5: Summarize the results.
There is not enough evidence to support the claim that the cost
of tuition is dependent on the number of students that attend the
college.
SUMMARY
My first hypothesis test about the tuition cost of 4-year
universities being less than the average was correct. The
average as stated by “The College Board” said that the tuition
was $5836 per year. I thought that was a little high. The average
tuition of the fifteen colleges that I researched was $5069.73.
Maybe if I would have researched colleges all around the
country instead of just our surrounding states I would have
come up with different numbers. Another thing that may have
caused this test to be a little off was that when I was collecting
data, some of the costs of tuition may include other fees and
some may not. When I looked them up, some fees were listed
separately and some were not. This could have lead to a Type I
error where the null hypothesis was true and it was rejected.
My second hypothesis test about whether the cost of tuition is
dependant on the number of students that attend the college was
rejected. I thought that the fewer the students that attend a
specific college, that tuition would be cheaper, but that wasn’t
the case. One main problem I can see with colleting my data is
that on the college websites for the number of students, some
said “over” or “approximately”. So, these weren’t the exact
numbers of students enrolled. Also, as stated earlier, some of
the students could be undergraduates or graduates. Some of the
websites didn’t list them separately. Tuition is higher for
graduates, so they should not have been included in this study
40. and it would have thrown off the number of students. So, these
may have affected the outcome a little, but I don’t’ think
enough for it to change the hypothesis.
It would have also been interesting to test to see whether the
tuition is higher in urban areas where more people live verses
rural areas where there are not as
many people. I would be inclined to say that this is true, but it
would need to be tested further to say for sure. It would also be
interesting to do this same testing for private colleges to see if
they have the same results. I thought this was fun to come up
with our own hypothesis and try to prove ourselves right or
wrong using what we have learned all quarter. It was a good test
of our skills and it made me get a better understanding of how
the formulas really work rather than just doing the homework
examples in the book.