This is a short introductory course on Stata statistical software, version 9; it still applies to later versions of Stata. The course runs 9 hours and has been given at the Faculty of Economics and Political Science, Cairo University.
The document presents a statistical analysis to test the independence of two attributes: condition of home and condition of child. A chi-square test is conducted using observed and expected counts from a contingency table with 300 total observations across 5 categories. The calculated chi-square value of 25.633 exceeds the critical value of 9.210 with 2 degrees of freedom at the 1% significance level. Therefore, the null hypothesis that the two attributes are independent is rejected.
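The rejection rule can be reproduced numerically; a minimal sketch using SciPy, needing only the statistic and degrees of freedom quoted above (the underlying contingency-table counts are not required):

```python
from scipy.stats import chi2

# Chi-square statistic and degrees of freedom reported in the document
chi2_stat, df = 25.633, 2

# Critical value at the 1% significance level
critical = chi2.ppf(0.99, df)   # approximately 9.210

# Reject H0 (the attributes are independent) when the statistic
# exceeds the critical value
reject_h0 = chi2_stat > critical
```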
MODULE 5 _ Mining frequent patterns and associations.pptx, by nikshaikh786
1. The FP-Growth algorithm constructs an FP-tree to store transaction data, with frequent items listed in descending order of frequency.
2. It then uses a divide-and-conquer strategy to mine the conditional pattern base of each frequent item prefix, extracting combinations of frequent items.
3. This recursively mines the frequent patterns from the conditional FP-tree for each prefix path, without generating a large number of candidate itemsets.
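The tree-building phase described in steps 1 and 2 can be sketched as follows (the recursive mining of step 3 is omitted; the item names and support threshold in the test are illustrative):

```python
from collections import Counter

class FPNode:
    """Node of an FP-tree: an item, a count, a parent link, and children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 1, {}

def build_fp_tree(transactions, min_support):
    # Step 1: count item frequencies and keep only the frequent items
    counts = Counter(i for t in transactions for i in t)
    frequent = {i for i, c in counts.items() if c >= min_support}
    root = FPNode(None, None)
    for t in transactions:
        # List each transaction's frequent items in descending frequency
        # order (ties broken by name for determinism)
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-counts[i], i))
        node = root
        for item in items:
            if item in node.children:
                node.children[item].count += 1   # shared prefix: bump count
            else:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
    return root, counts
```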
This document discusses finite automata and provides definitions and examples. It defines deterministic finite automata (DFA) and nondeterministic finite automata (NFA) and their components. It describes how strings are processed by DFAs using transition functions. Notations for finite automata like transition diagrams and tables are presented. The reasons for nondeterminism and how to convert NFAs to equivalent DFAs are summarized. Examples of finite automata design are provided.
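String processing by a DFA's transition function can be sketched in a few lines; the two-state machine below (accepting binary strings that end in 1) is an illustrative example, not one taken from the document:

```python
def dfa_accepts(string, delta, start, accepting):
    """Run a DFA: apply the transition function to each symbol in turn."""
    state = start
    for symbol in string:
        state = delta[(state, symbol)]
    return state in accepting

# Example DFA over {0, 1} that accepts strings ending in '1'
delta = {("q0", "0"): "q0", ("q0", "1"): "q1",
         ("q1", "0"): "q0", ("q1", "1"): "q1"}
```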
This document discusses arrays and operations on arrays. It defines an array as a fixed collection of homogeneous data items stored in contiguous memory locations and indexed by integers. It describes the insert operation as shifting existing elements over and adding the new element at the specified index. It also describes the delete operation as shifting elements back and overwriting the element at the specified index.
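The shifting behaviour of both operations can be sketched in Python, treating a list as a fixed-capacity array with a separate logical length n (function names here are illustrative):

```python
def insert_at(arr, n, index, value):
    """Insert value at index in arr[0:n], shifting arr[index:n] right
    by one. Assumes spare capacity in the underlying array (len(arr) > n).
    Returns the new logical length."""
    for i in range(n, index, -1):
        arr[i] = arr[i - 1]
    arr[index] = value
    return n + 1

def delete_at(arr, n, index):
    """Delete arr[index] by shifting arr[index+1:n] left by one,
    overwriting the deleted element. Returns the new logical length."""
    for i in range(index, n - 1):
        arr[i] = arr[i + 1]
    return n - 1
```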
SPSS is a statistical software package used for interactive or programmed data analysis. It can perform complex data analysis and statistics with simple commands. Originally called the Statistical Package for the Social Sciences when it was first created in 1968, SPSS is now owned by IBM. The default window in SPSS contains a data editor with two sheets - the data view sheet displays raw data while the variable view sheet defines metadata for each variable. SPSS allows users to easily enter, clean, manage and analyze data to derive useful information for making informed decisions.
This document discusses using R to perform multivariate analysis, including one sample and two sample Hotelling's T-square tests and a two-way MANOVA. It analyzes pulmonary response data using a one sample test and rating data from two teachers using two sample tests with different hypothesized mean vectors. It also analyzes triathlon performance data using a two-way MANOVA to examine the effects of gender, age category, and their interaction on swim, bike, and run times. Key results include no significant change in pulmonary function, no difference in teacher ratings, and significant effects of both gender and age category on triathlon performance times.
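The one-sample Hotelling's T-square statistic can be sketched with NumPy and SciPy (the document works in R; this Python version with illustrative data computes the same statistic and its F-transformed p-value):

```python
import numpy as np
from scipy.stats import f

def hotelling_t2_one_sample(X, mu0):
    """One-sample Hotelling's T^2 test of H0: population mean vector == mu0.
    X is an (n, p) data matrix."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)          # unbiased sample covariance
    d = xbar - mu0
    t2 = n * d @ np.linalg.solve(S, d)   # T^2 = n (xbar-mu0)' S^-1 (xbar-mu0)
    F = (n - p) / (p * (n - 1)) * t2     # exact F transformation
    p_value = f.sf(F, p, n - p)
    return t2, p_value
```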
How to calculate sample size for different study designs, by Shine Stephen
1. The document discusses different methods for calculating sample sizes for various study designs including cross-sectional studies, case-control studies, cohort studies, clinical trials, and animal studies.
2. The methods depend on whether the variable is qualitative or quantitative, and formulas are provided for determining sample sizes based on expected proportions, differences in means, odds ratios, and other statistical parameters.
3. Online software and calculators are recommended for performing sample size calculations for different types of biomedical studies.
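As an illustration of the proportion-based case in point 2, a common single-proportion formula, n = z²p(1−p)/d², can be coded directly; the z-values below are standard normal quantiles, while the function name is illustrative:

```python
import math

def sample_size_proportion(p, d, confidence=0.95):
    """Sample size for estimating a single proportion:
    n = z^2 * p * (1 - p) / d^2
    p: expected proportion, d: absolute precision (margin of error)."""
    z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}[confidence]
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)
```

With the conservative choice p = 0.5 and a 5% margin at 95% confidence, this gives the familiar n = 385.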
Introduces common data management techniques in Stata. Topics covered include basic data manipulation commands such as: recoding variables, creating new variables, working with missing data, and generating variables based on complex selection criteria, merging and collapsing data sets. Intended for users who have an introductory level of knowledge of Stata software.
All workshop materials including slides, do files, and example data sets can be downloaded from http://projects.iq.harvard.edu/rtc/event/data-management-stata
Radix sort is a non-comparative integer sorting algorithm that sorts data by grouping keys based on their individual digit positions and values, from least to most significant. It works by sorting the integers based on the ones place value first, then the tens place, hundreds place, and so on. The algorithm uses buckets to separate numbers into groups based on their digit values, then concatenates the buckets to sort the list.
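The bucket-and-concatenate procedure can be sketched in Python for non-negative integers:

```python
def radix_sort(nums):
    """LSD radix sort: distribute keys into ten buckets by the current
    digit (ones, then tens, then hundreds, ...) and concatenate the
    buckets in order after each pass."""
    if not nums:
        return nums
    place = 1
    while max(nums) // place > 0:
        buckets = [[] for _ in range(10)]
        for n in nums:                          # distribute by current digit
            buckets[(n // place) % 10].append(n)
        nums = [n for b in buckets for n in b]  # concatenate the buckets
        place *= 10
    return nums
```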
A Youden square design is a modification of a Latin square design that allows for the analysis of treatment effects while blocking out two sources of variation. It involves an incomplete rectangular arrangement of treatments with some columns missing. The rows form a randomized block design and the columns form a balanced incomplete block design. It provides an analysis of variance that tests the significance of treatment effects while accounting for row and column variations.
A Presentation About Array Manipulation (Insertion & Deletion in an Array), by Imdadul Himu
The document discusses arrays, which are collections of same-typed data organized in a sequence. It describes one-dimensional, two-dimensional, and multi-dimensional arrays. Initialization of arrays involves declaring the type, name, and size. Values can be initialized individually or in sets within curly braces. Loops are used to input or search values in arrays, running from 0 to the size minus 1. Two-dimensional arrays are often considered multi-dimensional and allow nested looping through rows and columns. Deletion in arrays involves replacing matching values with 0.
This document provides an outline and overview of Chapter 9 from a statistics textbook. The chapter covers hypothesis testing for single populations, including:
- Establishing null and alternative hypotheses
- Understanding Type I and Type II errors
- Testing hypotheses about single population means when the standard deviation is known or unknown
- Testing hypotheses about single population proportions and variances
- Solving for Type II errors
The chapter teaches students how to implement the HTAB (Hypothesis, Test Statistic, Accept/Reject regions, Boundaries, Conclusion) system to scientifically test hypotheses using statistical techniques like z-tests and t-tests. Key concepts covered include one-tailed and two-tailed tests, critical values, and p-values.
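A one-sample t-test following the reject/fail-to-reject logic described above can be run with SciPy; the sample data and significance level here are illustrative:

```python
from scipy import stats

# Hypothetical sample; H0: mu = 50 vs Ha: mu != 50 (two-tailed)
sample = [52.1, 48.3, 55.0, 51.2, 49.8, 53.5, 50.9, 47.6]
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)

# Decision rule at alpha = 0.05: reject H0 when the p-value < alpha
reject_h0 = p_value < 0.05
```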
Clustering is an unsupervised learning technique used to group unlabeled data points into clusters based on similarity. It is widely used in data mining applications. The k-means algorithm is one of the simplest clustering algorithms that partitions data into k predefined clusters, where each data point belongs to the cluster with the nearest mean. It works by assigning data points to their closest cluster centroid and recalculating the centroids until clusters stabilize. The k-medoids algorithm is similar but uses actual data points as centroids instead of means, making it more robust to outliers.
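The assign-then-recompute loop of k-means can be sketched in pure Python; this one-dimensional version with hand-picked starting centroids is illustrative only:

```python
def kmeans(points, centroids, iters=10):
    """Plain k-means on 1-D data: assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; keep an empty cluster's centroid unchanged
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters
```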
This document provides an overview of analysis of variance (ANOVA). It lists the goals as conducting hypothesis tests to determine if variances or means of populations are equal. It describes the characteristics of the F-distribution and how it is used to test hypotheses about equal variances or means. Examples are provided to demonstrate comparing two variances, comparing means of two or more groups, and constructing confidence intervals for differences in means. The key steps of ANOVA including organizing data in an ANOVA table and making conclusions based on the F-statistic are outlined.
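A one-way ANOVA F-test of equal group means can be run with SciPy; the three groups below are illustrative data, with the second group deliberately shifted upward:

```python
from scipy import stats

g1 = [5.1, 4.9, 5.5, 5.0]
g2 = [6.2, 6.8, 6.0, 6.5]   # clearly higher mean than the other groups
g3 = [5.0, 5.2, 4.8, 5.1]

# F = (variance between group means) / (variance within groups)
F, p = stats.f_oneway(g1, g2, g3)
```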
SPSS for beginners, a short course about how novices can use SPSS to analyze their research findings. With this tutorial anyone becomes able to use SPSS for basic statistical analysis. No need to be a professional to use SPSS.
SAS Ron Cody solutions for even-numbered problems from Chapters 7 to 15, by Ayapparaj SKS
I have added answers to the even-numbered exercise problems from Chapters 7 to 15 of Ron Cody's Learning SAS by Example: A Programmer's Guide.
Chapter 7 covers conditional statements such as IF, ELSE IF, WHERE, and SELECT, subsetting with those statements, and using Boolean operators. Chapter 8 covers DO, DO WHILE, and DO UNTIL loops along with the LEAVE and CONTINUE statements, as well as making a simple gplot. Chapter 9 deals with dates: finding differences by day, weekday, month, and year, computing quarterly differences, imputing missing values with various functions, and making a qplot. Chapter 10 is mainly about merging two data sets, subsetting with the IN= option, updating a master table from another table, and more. Chapter 11 covers functions that round and truncate numerical values, handle missing values, compute constant values, generate random values, and fetch values from previous observations.
Chapter 12 covers functions for manipulating character values, Chapter 13 covers array functions, Chapter 14 mainly deals with presenting data, and Chapter 15 is about generating reports.
The document describes the process of minimizing a deterministic finite automaton (DFA) using the Myhill-Nerode theorem. It involves 4 steps: 1) creating a table of all state pairs, 2) marking pairs where one state is final and the other is not, 3) transitively marking additional pairs, 4) combining any remaining unmarked pairs into single states. An example is provided where the given 6-state DFA is minimized to a 3-state DFA using this process. Terminology used in DFA minimization is also defined.
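Steps 1 through 3 of the table-filling procedure can be sketched in Python; the 3-state example DFA below (in which states A and B turn out equivalent) is illustrative and smaller than the document's 6-state example, and step 4, merging the unmarked pairs, is left out:

```python
from itertools import combinations

def minimize_pairs(states, alphabet, delta, finals):
    """Myhill-Nerode table filling: return the set of distinguishable
    state pairs. Pairs left unmarked can be merged into one state."""
    # Step 1 + 2: enumerate all pairs; mark those where exactly one
    # state is final
    marked = {frozenset(p) for p in combinations(states, 2)
              if (p[0] in finals) != (p[1] in finals)}
    changed = True
    while changed:                       # step 3: propagate markings
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                succ = frozenset((delta[p][a], delta[q][a]))
                if len(succ) == 2 and succ in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked

# Illustrative DFA: A and B behave identically, C is the only final state
delta = {"A": {"0": "C", "1": "A"},
         "B": {"0": "C", "1": "B"},
         "C": {"0": "C", "1": "C"}}
marked = minimize_pairs(["A", "B", "C"], ["0", "1"], delta, {"C"})
```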
Radix sort is a non-comparative sorting algorithm that sorts numeric keys by decomposing them into digits and sorting the digits individually. It works by representing keys as d-digit numbers in some base k, then sorting the numbers by looking at one column of digits at a time from least to most significant. This requires d passes through the list, giving a time complexity of O(d(n+k)), where n is the number of keys and k is the number of possible digit values. When d is constant and k is O(n), the overall time complexity is O(n).
This document provides instructions for inputting and managing data in SAS. It discusses creating a SAS library to organize data files. Steps are provided to manually create a SAS data set within a library and input data. Importing data from an external file is also mentioned as an alternative to manual input. The document reviews key SAS concepts like librefs and permanent vs temporary libraries.
This chapter discusses hypothesis testing for the difference between two population means and two population proportions. It covers tests for:
1) Matched or dependent pairs, using a t-test and assuming normal distributions.
2) Independent populations when variances are known, using a z-test.
3) Independent populations when variances are unknown but assumed equal, using a pooled variance t-test.
4) Independent populations when variances are unknown and assumed unequal, requiring other techniques.
The document provides examples and decision rules for conducting hypothesis tests on differences between two means or proportions in various situations. Formulas for calculating test statistics like z-scores and t-statistics are presented.
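Cases 3 and 4 above can be run side by side with SciPy's ttest_ind, whose equal_var flag switches between the pooled-variance test and Welch's test (Welch's test is a standard choice for case 4, though the document does not name it); the data are illustrative:

```python
from scipy import stats

a = [23.1, 25.4, 24.8, 26.0, 22.9, 24.1]
b = [27.5, 28.1, 26.9, 29.2, 27.8, 28.5]

# Case 3: variances unknown but assumed equal -> pooled-variance t-test
t_pooled, p_pooled = stats.ttest_ind(a, b, equal_var=True)

# Case 4: variances unknown and assumed unequal -> Welch's t-test
t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)
```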
This document describes an implementation of a stack data structure using a single linked list in C. It includes functions to push elements onto the stack, pop elements off the stack, and display the elements currently in the stack. The main function contains a menu loop that calls these functions based on user input and exits when the user selects option 4.
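A Python sketch mirroring the linked-list stack design described above (the document's implementation is in C; the names here are illustrative, and display returns a list instead of printing):

```python
class Node:
    def __init__(self, value, next=None):
        self.value, self.next = value, next

class Stack:
    """Stack backed by a singly linked list; push and pop both work
    at the head, so each is O(1)."""
    def __init__(self):
        self.head = None

    def push(self, value):
        self.head = Node(value, self.head)   # new node becomes the top

    def pop(self):
        if self.head is None:
            raise IndexError("pop from empty stack")
        value, self.head = self.head.value, self.head.next
        return value

    def display(self):
        items, node = [], self.head
        while node:
            items.append(node.value)
            node = node.next
        return items                          # top of stack first
```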
STACK (LIFO STRUCTURE) - Data Structure, by Yaksh Jethva
A stack is known as a LIFO (last-in, first-out) structure. It is a linear data structure and a non-primitive data structure.
Definition: non-primitive data structures are not basic data structures; they are built from primitive data structures (integer, float, etc.).
Non-primitive data structures cannot be operated on directly by machine-level instructions.
This document provides an overview of classification techniques. It defines classification as assigning records to predefined classes based on their attribute values. The key steps are building a classification model from training data and then using the model to classify new, unseen records. Decision trees are discussed as a popular classification method that uses a tree structure with internal nodes for attributes and leaf nodes for classes. The document covers decision tree induction, handling overfitting, and performance evaluation methods like holdout validation and cross-validation.
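Decision-tree induction typically chooses splits by entropy reduction (information gain); a minimal sketch with illustrative labels:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy reduction from splitting labels into the given groups
    (groups must partition the label list)."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)
```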
This document discusses various research methods and tools used in cognitive neuroscience, including questionnaires, eye trackers, EEG/MEG, PET, MRI/fMRI, NIRS, TMS, and tDCS. It provides examples of how each method is used, such as measuring brain activity with EEG during eye open and closed states, and detecting awareness in vegetative patients using fMRI.
This document discusses the uses of PROC PRINT and PROC MEANS in SAS. PROC PRINT is used to print out and list the data values in a SAS data set. It allows you to specify titles, variable identifiers, and variables. PROC MEANS calculates descriptive statistics like means, standard deviations, minimum and maximum values from the data. You can specify which statistics to compute using options with PROC MEANS and list the variables to analyze. Both procedures end with a run statement and utilize semicolons after each statement.
This document summarizes different methods for inputting data in SAS, including column mode, list mode, and formatted mode. Column mode requires calculating data locations, while list mode is easiest, separating data with blanks and only allowing periods for missing values. Formatted mode requires specifying data lengths and allows blanks or periods for missing values. List mode is generally preferred for inputting data when lengths are unequal, as it easily handles variable data with blanks as separators.
This document discusses using machine learning and data mining techniques to gain knowledge from big data. It defines key terms like data, databases, and big data. It explains that machine learning and data mining can help solve the problem of "data overloading" by discovering patterns and making predictions. The document also introduces social network analysis and crowdsourcing as collective intelligence approaches for learning from data. It provides examples like Amazon Mechanical Turk, which utilizes crowdsourcing for various tasks.
The document provides an introduction to functional magnetic resonance imaging (fMRI). It discusses how fMRI works by detecting changes in blood oxygenation, which serves as an indirect measure of neural activity. The basics of MRI are also reviewed, including how MRI uses strong magnetic fields and radio waves to generate images based on magnetic properties of tissue. Example fMRI studies measuring brain activity in response to visual stimuli are presented.
The SAS program creates a data set with the student names and scores, uses PROC MEANS to calculate the mean and sum for each student, and outputs the results in two tables.
data scores;  /* DATA _NULL_ would create no data set; a named data set (the name "scores" is assumed here) is needed for PROC MEANS */
input Names $ Subject $ Score;  /* Subject holds character values (Math, English, ...), so it also needs the $ modifier */
cards;
John Math 90
John English 96
John French 87
John Physics 45
John History 77
Mary Math 80
Mary English 60
Mary French 87
Mary Physics 65
Mary History 66
Dick Math 78
Dick English 78
Dick French 56
Dick Physics 34
Dick History 88
Lucy Math 74
Luc