InfoStat Statistical Software
User’s Manual
Version 2012
Data Management
InfoStat software and documentation are the result of the active and multidisciplinary
participation of all the members of Grupo InfoStat, who are Copyright owners. Principal
responsibilities and activities are as follows:
Programming: Julio A. Di Rienzo
Quality control: Fernando Casanoves, Laura A. Gonzalez, Mónica G. Balzarini
Editorial director of the User’s Manual: Fernando Casanoves, Julio A. Di Rienzo
Electronic version of the User’s Manual: Fernando Casanoves
Online help: Elena M. Tablada
Citation for this manual is as follows:
Casanoves F., Balzarini M.G., Di Rienzo J.A., Gonzalez L., Tablada M., Robledo C.W.
(2012). InfoStat. User Manual, Córdoba, Argentina
The software to which this manual refers should be cited as follows:
Di Rienzo J.A., Casanoves F., Balzarini M.G., Gonzalez L., Tablada M., Robledo C.W.
InfoStat version 2012. InfoStat Group, Facultad de Ciencias Agropecuarias, Universidad
Nacional de Córdoba, Argentina. URL http://www.infostat.com.ar
Total or partial reproduction of this publication, in identical or modified form, by any means,
mechanical or electronic, including photocopying, recording or the use of any
information storage and retrieval system, without the authorization of the Copyright owners is
prohibited.
Prologue
InfoStat is a statistical software package developed by Grupo InfoStat, a team of professionals in
Applied Statistics based at the Faculty of Agronomy of the National University of Córdoba
(Facultad de Ciencias Agropecuarias, Universidad Nacional de Córdoba). The following
professors of Statistics and Biometry participated in the development of InfoStat: Julio A. Di
Rienzo, Mónica G. Balzarini, Fernando Casanoves, Laura A. Gonzalez, Elena M.
Tablada, and Carlos W. Robledo. InfoStat is a synthesis of experience accumulated since
1982. It has been enriched by teaching at the undergraduate and graduate
levels, statistical consulting, and the training of human resources in Applied Statistics.
We are proud of InfoStat’s level of acceptance within university environments, at research
and technological institutions, and among businesses devoted to the production of goods and
services.
This manual consists of four chapters: Data Management, Statistics, Graphs and
Applications. The chapter on Data Management explains how to operate the
program to work with files, and it describes the operations that can be performed on data tables.
The chapter on Statistics describes the methodological tools that the user can select in
analyzing his or her data. These descriptions are accompanied by examples of their
implementation in InfoStat, based on numerous real situations in which the
application of one or more statistical techniques is beneficial. The chapter on Graphs also
uses examples to describe the different types of graphical representations available. The
chapter on Applications presents methods used in statistical quality control, the
quantification of biodiversity, and computational tools that facilitate the teaching and learning
of classical statistical concepts.
This manual reflects the state of development of InfoStat at the time of printing; nevertheless,
InfoStat keeps growing, improving and upgrading its algorithms and user interfaces. Through
InfoStat’s HELP menu, users can access the electronic version of the manual and a link for
updating it.
Table of contents
Installation
Upgrading
Requirements
General aspects
Data Management
File
New table
Open table
Save table
Save table as
Close table
Edit
Data
New row
Insert row
Delete row
Deactivate case
Activate case
Invert selection
Choosing cases
New column
Insert column
Delete column
Edit Labels
Read labels from…
Data type
Alignment
Decimals
Automatically adjust columns
Sort
Categorize
Edit categories
Transforming
Create dummy variables
Fill...
Formula
Search
Resampling
Color selection
Merge tables
Rearrange columns, one under the other
Rearrange rows as columns
Create a new table using active cases
Merge categories
Output
Upload results
Save results
Decimals
Field separator
Typography
Export results to table
Statistics
Descriptive statistics
Summary statistics
Frequency tables
Probabilities and quantiles
Estimators of population characteristics
Definitions of terms associated with the sampling technique
Simple random sample
Stratified sample
Stratified sampling
Sample size calculation
Estimating a mean with a given precision
Inference in one and two populations
Inference based on one sample
Two-sample inference
Analysis of variance
Completely random design
Block design
Latin square design
Multiple comparisons
ANOVA assumptions
Analysis of covariance
Non-parametric analysis of variance
Kruskal-Wallis test
Friedman test
Validation of assumptions
Regression with dummy variables
Non linear analysis of regression
Correlation analysis
Correlation between distance matrices
Categorical data analysis
Contingency tables
Logistic regression
Kaplan-Meier survival analysis
Multivariate Analysis
Multivariate descriptive statistics
Hierarchical clustering methods
Non-hierarchical clustering methods
Distances
Principal components
Canonical correlations
Partial Least Squares Regression
Multivariate analysis of variance
Distances and association matrices
Principal coordinates analysis
Classification-regression trees
Biplot and MST
Generalized Procrustes analysis
Cross-correlations
Box and Jenkins methodology (ARIMA)
Fitting and smoothing
Series Tab
Legends
Applications
Quality control
Control chart for attributes
Variable control charts
Confidence intervals
All possible samples
Sampling from the empirical distribution
Biodiversity indexes
Installation
To install InfoStat, go to our web page www.infostat.com.ar, download the installer and run
it. Once the installation has completed successfully, the installer will have created a folder
called InfoStat in C:\Program Files and a shortcut icon on the desktop.
Inside the InfoStat folder, C:\Program Files\InfoStat, you should find the
following items:
Data file: contains all the Data files to which this manual refers.
Help file: contains the Online Help file.
Manual.pdf file: contains the printed material that comes along with the CD. The electronic
version may contain an updated version of the printed material.
Upgrading
Upgrading instructions can be accessed through the HELP menu. The UPGRADE option
opens the InfoStat web page, where the latest applications can be downloaded.
Requirements
Processor: Pentium or higher
Minimum suggested memory: 128 MB
Operating systems: Windows XP or newer
Monitor configuration: minimum resolution of 800 x 600 pixels, small fonts. Configuring
large fonts may prevent parts of the windows displayed by InfoStat from being viewed
correctly.
General aspects
InfoStat offers different tools that let the user explore information easily. When InfoStat
is opened, a menu bar appears at the top of the program window; it contains the
following menus: File, Edit, Data, Results, Statistics, Graphics, Windows, Help, and
Applications.
Below the menus, the toolbar contains a series of buttons that allow the user to perform
actions quickly. All of the actions that can be performed with the buttons can also be
performed from one of the menus listed above.
When the mouse pointer rests over a button, without clicking, a help label appears over
the button and a legend appears at the bottom of the screen, indicating the action
performed by that button. These actions are as follows (buttons
ordered from left to right): New table, Open table, Save active table, Export table, Print,
New column, Sort, Categories, Font, Align left, Align center, and Align right.
At the bottom of the screen, the user will see three minimized windows, named
Results, Graphs, and Graphical Tools. If the Results window is
maximized as soon as the program is opened, InfoStat will report that there are no results
available. This window receives content as analyses that generate results are
performed. The Graphs and Graphical Tools windows are only activated when a graph is
generated.
In the FILE menu, InfoStat allows the user to open and save different types of data files. For
example, if NEW TABLE is selected, a new data window appears.
By using the keyboard, the user can enter information in the table or file temporarily named
New. Using this table, the user can perform data analysis as well as produce results and
graphics. The Exit command, used to close the application, can also be found in the FILE
menu.
Commands for cutting, copying and pasting information from the data, results and graphics
windows can be found in the EDIT menu. The DATA menu allows the user to perform
different types of operations on a data grid: sorting a table, transforming columns,
generating new columns from formulas, simulating random variables, and automatically finding
and replacing information, among other actions. From the OUTPUT menu the user can invoke
actions related to the presentation and export of results in table format.
All of the generated results (tables and graphs) can be copied using the EDIT menu (Copy)
and then pasted into a word processor. This is the simplest way to transfer results
from InfoStat to a document or written report. The Copy and Paste commands are
also the simplest way to import and export data between InfoStat and a word processor or
a spreadsheet program such as Excel. To simplify the transfer of spreadsheet data,
InfoStat provides the commands Copy with column name and Paste with column name, which
preserve the names and labels of columns. It is also possible to
import and export information in ASCII format. In this chapter, the options from the FILE,
EDIT, DATA and OUTPUT menus are described with examples.
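Copying and pasting with column names can be pictured as exchanging tab-separated text whose first line carries the column names. The Python sketch below assumes that format (the one spreadsheet programs such as Excel commonly place on the clipboard); the helpers `copy_with_column_names` and `paste_with_column_names` are hypothetical and are not part of InfoStat.

```python
def copy_with_column_names(columns, rows):
    """Serialize a table as tab-separated text, column names on the first line."""
    lines = ["\t".join(columns)]
    for row in rows:
        lines.append("\t".join(str(value) for value in row))
    return "\n".join(lines)


def paste_with_column_names(text):
    """Parse tab-separated text whose first line holds the column names."""
    lines = text.splitlines()
    columns = lines[0].split("\t")
    # Values come back as strings; a real import would also assign data types.
    rows = [line.split("\t") for line in lines[1:]]
    return columns, rows


# Illustrative values only.
clipboard = copy_with_column_names(["A", "B"], [[1, 2.5], [3, 4.0]])
columns, rows = paste_with_column_names(clipboard)
print(columns)  # ['A', 'B']
print(rows)     # [['1', '2.5'], ['3', '4.0']]
```

Keeping the names on the first line is what lets the receiving program rebuild the column labels instead of falling back on generic ones.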
InfoStat works with three types of windows: one where data are found (Data), one where
the results of the requested procedures are displayed (Results), and one where graphs created
by the user are shown and stored (Graphs). Several data windows can be kept open
simultaneously. In such cases, the active window is the one in front, with a colored (not gray)
frame. All actions are executed on the active data window. The Results and Graphs windows
contain a sheet for each result and/or graph produced. The user can move across the
different sheets by clicking once on the tabs at the bottom of the window, which
index the results.
In the STATISTICS menu, InfoStat makes it possible to carry out a wide variety of statistical
analyses almost automatically, through dialogue windows.
The user can calculate descriptive statistics; calculate probabilities; estimate population
characteristics under different sampling plans; perform inference for one and two
samples using different types of confidence intervals and hypothesis tests (parametric
and non-parametric); fit regression models and analysis of variance for different types of
experimental designs and observational studies; perform inference for categorical data;
apply multivariate statistics; analyze time series; and fit and smooth curves in graphs.
After selecting the statistical procedure to be used in analyzing the data of an open
table (the active table), a window (Variables) appears in which all of the table’s columns are listed
on the left-hand side, so that the user can select the column(s) to be included in the
analysis, either as variables of interest or as classification criteria. The selected columns
are moved to the Variables list, on the right-hand side of the
window, using the arrow button between the two lists. If a variable was selected by mistake
or is no longer needed, it can be removed from the list of variables and returned to
the list of columns in the file by selecting it and pressing the opposite arrow button, or by
double clicking on it.
The variable selector facilitates analysis, making it unnecessary to remember or write down
the names of the variables each time they are to be used.
In the GRAPHS menu, InfoStat provides professional-style graphical tools for the
presentation of results. The various graphical techniques available are described
in the chapter entitled “Graphs”. The program allows several series to be included in a
single graph and all of their attributes to be edited interactively in the Graphical Tools window,
which opens automatically when a graph is requested. InfoStat has an algorithm for
copying and applying formats, which facilitates the creation of graphical series with
identical characteristics. Graphs created by InfoStat can be saved, or copied and pasted into
any Windows application that supports images (enhanced metafile) by using the classic Copy
and Paste (or Paste Special) Windows commands. All the tools on the GRAPHS menu are
available in every version of InfoStat.
Through the WINDOWS menu, the user can move from one window to another. Another
way to access a window is simply to click on it. The WINDOWS
menu also allows the user to choose how the open windows are arranged on
the screen: in cascade, vertically or horizontally, by selecting
the appropriate option (Cascade, Align vertical, or Align horizontal). From this menu, the
user can also access the Results window, where the results of the session that have not been
deliberately erased are stored. Similarly, the user can move to the Graphs window. The
names of open data tables are also listed.
Through the HELP menu, the user can access online documentation regarding procedures
and types of statistical analysis which can be implemented from any of the enabled menus,
as well as access an electronic version of the InfoStat manual. Moreover, this menu can be
used to gain fast access to software updates.
In the APPLICATIONS menu, analysis tools traditionally used in specific areas of knowledge
are available for exploring data. The following
applications are available: QUALITY CONTROL, TEACHING TOOLS, INDICES and
DNA-MICROARRAY ANALYSIS. The TEACHING TOOLS application is oriented
toward providing classical elements for teaching and learning applied statistics. Some tools
frequently used in statistical quality control are found in the QUALITY CONTROL
application. Under the INDICES item, the user can calculate numerous biodiversity indices
commonly used in Ecology. The DNA-MICROARRAY ANALYSIS application provides,
among others, procedures for normalizing, transforming, filtering, clustering and ordering genes,
ordering microarrays, correcting p-values to control the false discovery rate (FDR), and testing
p-values.
When an option in any of the menus appears in gray instead of black, the option
is not enabled. This may be because the user has not completed a previous step
required for that action, or because the action is not available in the acquired version of
InfoStat.
Data Management
InfoStat processes data stored in tables. A table is a set of data
organized in rows and columns. Columns usually represent variables, while rows
usually represent observations. Column labels are the names assigned to the variables.
File
The actions (submenus) in the FILE menu for managing tables are the
following:
NEW TABLE, OPEN…, SAVE TABLE, SAVE TABLE AS..., and CLOSE TABLE. This menu
also contains an EXIT option and a list of the most recently modified files.
New table
FILE menu ⇒ NEW TABLE creates a new table. The user can also press <Ctrl+N>
or use the button with the blank sheet found on the toolbar (New Table button). A table with
one row and two columns will appear, and these can be expanded in order to enter data.
New tables are numbered consecutively (New table, New table_1, New table_2, etc.).
Open table
FILE menu ⇒ OPEN… opens an existing table. The user can also press <Ctrl+O>
or use the button with the picture of a file (Open Table button) on the toolbar. By holding
<Shift> while pressing the Open Table button, the user can directly access the Data file, which
contains the files used in the examples in this manual. To open a table, the user should provide
the information requested in the dialogue window.
InfoStat allows users to open files with the following formats:
InfoStat (*.IDB, *.IDB2)
Excel (*.XLS)
Graph (*.IGB)
Text (*.TXT, *.DAT)
Dbase (*.DBF)
Results (*.ITRES)
InfoGen (*.IGDB)
Paradox (*.DB)
EpiInfo (*.REC)
InfoStat assumes that in the data structure, columns represent variables and rows represent
observations. For each variable, every value should be of the same data type
(integer, real, categorical or date).
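The one-type-per-column rule can be illustrated with a small Python sketch that classifies a column of text values as integer, real, date or categorical. This is not InfoStat’s own type detection: the helper `infer_column_type` and the day/month/year date format are assumptions made for the example.

```python
from datetime import datetime


def infer_column_type(values):
    """Classify a column of text values as integer, real, date or categorical.

    A column gets a numeric or date type only if every non-empty value
    parses as such; anything else makes it categorical. The day/month/year
    date format is an assumption for this example.
    """
    present = [v.strip() for v in values if v.strip() != ""]

    def all_parse(parser):
        try:
            for v in present:
                parser(v)
        except ValueError:
            return False
        return bool(present)

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "real"
    if all_parse(lambda v: datetime.strptime(v, "%d/%m/%Y")):
        return "date"
    return "categorical"


print(infer_column_type(["3", "7", "12"]))   # integer
print(infer_column_type(["3.5", "7", ""]))   # real
print(infer_column_type(["01/02/2012"]))     # date
print(infer_column_type(["red", "green"]))   # categorical
```

The checks run from the most restrictive type to the least, so a column of whole numbers is reported as integer rather than real.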
If the user opens an ASCII file with a TXT or DAT extension, the Import text
window is activated.
In the Import text window the user can indicate the Field separators to
use (tab, comma, semicolon, space or others). The data to be imported may contain the
names of the variables (columns). If they do, the user can
indicate whether Row 1 contains the names of the future columns of the data table
(InfoStat selects this option by default). If there is heading text above the names of the
columns, the user should indicate which line contains the column names. This can be
done by changing the number next to the Row 1 option until the line with the
column names is shown in the first row. If the data do not contain column names,
the option Use first row as column name should be deselected. In this case, the
variables will be named Column 1, Column 2, etc. To check the information
that will make up the table once it is imported, press the Preview table button. If the
structure is correct, press Accept; otherwise, change the options and try again with Preview
table until the desired result is obtained.
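The options of the Import text window map naturally onto a few parser parameters. The following Python sketch is a hypothetical illustration, not InfoStat code: in the invented function `import_text`, `sep` plays the role of the field separator, `header_row` the 1-based Row number, and `use_first_row_as_names` the Use first row as column name option.

```python
def import_text(text, sep="\t", header_row=1, use_first_row_as_names=True):
    """Split delimited text into column names and data rows.

    header_row is 1-based: lines above it (for example, a descriptive
    heading) are skipped, as when the Row number is changed.
    """
    lines = text.splitlines()[header_row - 1:]
    rows = [line.split(sep) for line in lines if line.strip() != ""]
    if use_first_row_as_names:
        names, data = rows[0], rows[1:]
    else:
        # Without a header line, columns get generic names.
        width = max(len(row) for row in rows)
        names = [f"Column {i}" for i in range(1, width + 1)]
        data = rows
    return names, data


raw = "Germination trial\nA;B\n1;2\n3;4"
print(import_text(raw, sep=";", header_row=2))
# (['A', 'B'], [['1', '2'], ['3', '4']])
print(import_text("1,2\n3,4", sep=",", use_first_row_as_names=False))
# (['Column 1', 'Column 2'], [['1', '2'], ['3', '4']])
```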
Note: When tables saved as text (.TXT extension) from Microsoft Excel are imported, empty cells
in the original file appear as two consecutive separators. In this case,
the option Consecutive separators are generated as one should not be selected. By default, InfoStat
leaves this option unselected when a text file is opened. Also note that if the file contains numeric
and alphanumeric data in a single column, InfoStat recognizes only the type of the first entry in the
column: if it is a number, the alphanumeric entries are erased, and vice versa. The simplest way to read
files from another program is by using the Copy and Paste functions. InfoStat provides the options Copy
with column name and Paste with column name to facilitate the importing and exporting of data.
For example, to import data from an Excel file, the user should simply copy in Excel the data to be
exported to InfoStat, including the column names, then open a new table in InfoStat and paste the
copied content by using the option Paste with column name.
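The effect of the Consecutive separators are generated as one option can be shown with a short Python sketch. The helper `split_fields` is hypothetical, but it follows the behavior described in the note: when the option is off, an empty cell survives as an empty field; when it is on, the empty cell vanishes and later values shift one column to the left.

```python
import re


def split_fields(line, sep, merge_consecutive=False):
    """Split one line of delimited text into fields.

    merge_consecutive=False: two adjacent separators produce an empty
    field (a missing value), so columns stay aligned.
    merge_consecutive=True: a run of separators counts as one, so the
    empty cell disappears and later values shift one column to the left.
    """
    if merge_consecutive:
        pattern = "(?:" + re.escape(sep) + ")+"
        return [field for field in re.split(pattern, line) if field != ""]
    return line.split(sep)


row = "12\t\t3.4"  # the middle cell was empty in the spreadsheet
print(split_fields(row, "\t"))                          # ['12', '', '3.4']
print(split_fields(row, "\t", merge_consecutive=True))  # ['12', '3.4']
```

With the option on, the value 3.4 would land in the second column instead of the third, which is why the note recommends leaving it unselected for text exported from Excel.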
Table toolbar
By positioning the cursor over a table and right clicking the mouse, several options become
available, including the Toolbar. This option allows the user to add a bar of buttons to an
active table, such as the one shown below.
These buttons allow the user to do the following, from left to right: increase font size,
reduce font size, remove decimals (the user should first click on a cell of the column of
interest), add decimals (the user should first click on a cell of the column of interest), insert
a row (before a previously selected row), delete a previously selected row, add a
column to the end of the table, insert a column (before a previously selected column),
delete a previously selected column, and highlight a selection.
The font size can also be modified by pressing Ctrl and ↑ (to increase the size) or Ctrl and ↓
(to decrease the size).
Variable management
This window appears when an active table is open and the user presses <Ctrl+E>. The
following actions are available in the dialogue box:
Rename variables: This can be done by double clicking on a variable name in a list of
variables.
Move the position of one or more variables: The variables can be selected from the list,
and by pressing <Ctrl>, the selected block can be moved by using the arrow buttons (↑
moves it up and ↓ moves it down). Changes in the position of the list are automatically
updated in the table.
Select one or more variables to be eliminated: Once the variables are selected from the
list, click on the Mark to eliminate button. The selected variables will be eliminated from both
the list and the table.
Deactivate / activate one or more variables: When the check box to the left of the label is
unchecked, the variable is deactivated. (In the example, all the variables with a “1” in the
label are activated and selected.) The deactivated variables do not appear either in the table
or in the variable selector.
Forming groups of variables: Groups of variables can be formed by selecting the variables
and pressing the Group selection button. Variables in a group can be activated or de-
activated, colored, erased, etc. all together.
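Moving a selected block of variables up or down is, in essence, a list operation. This Python sketch is illustrative only: `move_block` is a hypothetical helper, and gathering a non-contiguous selection into a single block is an assumption, because the manual only describes moving a selected block.

```python
def move_block(names, selected, step):
    """Move the selected variables together by `step` positions.

    step=-1 moves the block up one place, step=+1 moves it down. The block
    keeps its internal order (a non-contiguous selection is gathered at the
    position of its first member) and the move stops at either end of the list.
    """
    block = [name for name in names if name in selected]
    if not block:
        return list(names)
    rest = [name for name in names if name not in selected]
    target = names.index(block[0]) + step
    target = min(max(target, 0), len(rest))
    return rest[:target] + block + rest[target:]


variables = ["Size", "Episperm", "PG", "NP", "DW"]
print(move_block(variables, {"PG", "NP"}, -1))  # ['Size', 'PG', 'NP', 'Episperm', 'DW']
print(move_block(variables, {"PG", "NP"}, +1))  # ['Size', 'Episperm', 'DW', 'PG', 'NP']
```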
Save table
FILE menu ⇒ SAVE TABLE saves the active table in InfoStat format (with .IDB2
extension) in the current directory. The same can be achieved by pressing <Ctrl+S> or the
Save active table button on the toolbar.
Save table as
FILE menu ⇒ SAVE TABLE AS saves the active table in the format and directory chosen
by the user. The available formats are listed below:
InfoStat (*.IDB, *.IDB2)
Excel (*.XLS)
Dbase (*.DBF)
ASCII (*.TXT)
InfoGen (*.IGDB)
Paradox (*.DB)
The Export table button on the toolbar can also be used.
In the dialogue box, indicate the name, place and type of file. If an ASCII format is selected,
the user should select a field separator and indicate whether the first row should be used as
the name of columns (labels). If desired, the user can also indicate whether a character (or
group of characters) should identify a missing observation in the exported file.
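The three ASCII export choices (field separator, column names in the first row, and a missing-value marker) can be sketched with Python’s standard `csv` module. The function `export_ascii` is hypothetical, and the default marker "." is only an example, not an InfoStat default.

```python
import csv
import io


def export_ascii(columns, rows, sep="\t", header=True, missing="."):
    """Write a table as delimited ASCII text.

    header=True writes the column names as the first row; None values are
    written as the `missing` marker (the default "." is only an example).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=sep, lineterminator="\n")
    if header:
        writer.writerow(columns)
    for row in rows:
        writer.writerow([missing if value is None else value for value in row])
    return buffer.getvalue()


print(export_ascii(["A", "B"], [[1, None], [2, 5.5]], sep=";"), end="")
# A;B
# 1;.
# 2;5.5
```

Using `csv.writer` rather than joining strings by hand means that a value containing the separator is quoted automatically instead of silently creating an extra column.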
Close table
FILE menu ⇒ CLOSE TABLE closes the active table. Alternatively, the user can press
<Ctrl+W>. If the table has been modified and has not been saved, InfoStat will ask the user
to confirm whether he wishes to save it.
Edit
The actions (sub-menus) that can be applied to the management of InfoStat tables in the
EDIT menu are the following: Cut, Copy, Paste, Copy with column name, Paste with
column name, Undo and Select all. The actions are used to edit cells, columns and/or rows,
similar to the editing of texts in Windows.
Data in an InfoStat table are modified directly in the active table. Pressing
<Enter> writes the typed characters to the cell. Pressing <Esc>
before <Enter> restores the cell’s previous content.
To stop editing, use the arrow keys (up, down, left, right) or the <Tab> key, or select another cell
with the mouse.
To select a group of cells, drag the mouse over the desired area. Alternatively,
select cells with the keyboard by holding down the <Shift> key and using the arrow
keys to select the desired area. The highlighted area can be printed by pressing the
Print button on the toolbar.
It is possible to select the font type, style, size and color for the entire table. This can
be done by simply selecting a cell and pressing the button with the letter “A” on the toolbar
to obtain the appropriate menu for this action. Buttons for the alignment of data to the right,
left, and center of the column also exist. These are located next to the “A” button.
In tables with .IDB2 format, a description of data contained in the table can be saved. The
description can be edited by pressing F2. When F2 is pressed, a field for writing the
description appears. If the second button on the toolbar of the dialogue window is pressed,
this field will be inserted in the file. If the user wishes to definitively include the description
in the data file, he should save the table.
A description can be uploaded from a file with TXT or RTF format by pressing the first
button on the mentioned toolbar.
Data
The actions (submenus) applied to the management of InfoStat tables in the DATA menu
are the following: New row, Insert row, Delete row, Deactivate case, Activate case,
Invert selection, Select cases, New column, Insert column, Delete column, Edit
categories, Edit label, Read labels from…, Data type, Alignment, Decimals, Variable
manager, Categorize, Fill, Generate a class-variable according to cell color, Adjust
column width, Sort, Transformation, Create dummy variables, Formula, Search,
Sampling-Resampling, Color selection, Merge tables, Rearrange columns, one under
the other, Rearrange rows as columns, Create new table using active cases, Make a
new column by merging categorical variables, Split a category in its components,
Update, Show-edit data table description.
These actions can also be invoked by right-clicking the mouse when positioned on the data
table.
The following example illustrates some of the actions executed by the submenus.
Example 1: The user has access to a group of observations that refer to seed size (Size), color of episperm (Episperm), percentage of germination (PG), number of normal seedlings (NP) and dry weight (DW) of seeds of Atriplex cordobensis, a forage shrub. The data are located in the file Atriplex.idb (courtesy of Dr. M.T. Aiazzi, of the Faculty of Agricultural Sciences, U.N.C.).
Note: Files used in this manual are located in C:\Program Files\InfoStat\Data.
New row
DATA menu ⇒ NEW ROW adds the number of rows specified by the user in the emerging window to the end of the table. Alternatively, position the cursor on the last row and press <Enter> to generate new rows.
Insert row
DATA menu ⇒ INSERT ROW inserts a new row above the selected row.
Delete row
DATA menu ⇒ DELETE ROWS eliminates the selected row(s) from the table. This action
can be undone by using the Undo submenu from the Edit menu.
Deactivate case
DATA menu ⇒ DEACTIVATE CASE allows the user to exclude selected rows from the
procedure to be executed. To deactivate a row in the table, the user should double click on
the case number. Deactivated observations show their case number inside parentheses and
the corresponding row is colored.
Activate case
DATA menu ⇒ ACTIVATE CASE activates cases that have been deactivated (i.e.,
activated cases participate in the analysis). To activate a single row, the user should double
click on its case number. To simultaneously activate several cases, the user should select a
cell from each row to be activated and activate them from the DATA menu or from the
menu that appears by right clicking the mouse. All selected cases are activated by default.
Invert selection
DATA menu ⇒ INVERT SELECTION activates (deactivates) cases that are deactivated
(activated).
Select cases
DATA menu ⇒ SELECT CASES... allows the user to establish criteria for selecting cases.
Once the action is executed, unselected cases are deactivated. First, the user should establish
to which variables the selection criteria will be applied, then specify the criteria.
In the Select cases dialogue window, a list of variables from the active table appears. From
this list, the user should select the variables to which the selection criteria will be applied,
entering these in the corresponding box on the Variables tab (a partition can be indicated in
the corresponding tab).
Procedures that facilitate the selection of variables are available when many variables are
used. At the foot of the list of variables, there are options to select variables according to a
particular common characteristic in their names. If the variables share a specific character or
succession of characters, they can be selected simultaneously. The figure illustrates the selection of all the variables whose names contain the letter P, once the (…) box has been activated. To specify that the character or succession of characters is at the beginning of the label, activate the […) box; to indicate that it is at the end of the label, activate the (…] box. Wildcard characters can also be used. For example, entering the sequence “**1” selects all variables whose labels have any two characters before the number 1; entering “??1” selects all variables whose labels contain a “1” preceded by two alphabetical characters; and entering “##1” selects all variables whose labels contain a “1” preceded by two numerical characters.
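The wildcard rules can be imitated with regular expressions. The sketch below is only an interpretation of the text, not InfoStat's implementation: it assumes that `*` matches any single character, `?` one letter and `#` one digit, and that the pattern may occur anywhere in the label (as with the (…) box). The helper names and the labels are hypothetical.

```python
import re

def wildcard_to_regex(pattern: str) -> str:
    """Translate an InfoStat-style wildcard pattern into a regex (assumed semantics)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".")          # any single character
        elif ch == "?":
            parts.append("[A-Za-z]")   # one alphabetical character
        elif ch == "#":
            parts.append("[0-9]")      # one numerical character
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

def matches_wildcard(label: str, pattern: str) -> bool:
    # re.search: the pattern may appear anywhere in the label
    return re.search(wildcard_to_regex(pattern), label) is not None

labels = ["PG1", "DW1", "X211", "A1", "Size"]  # hypothetical variable labels
print([lab for lab in labels if matches_wildcard(lab, "??1")])  # ['PG1', 'DW1']
print([lab for lab in labels if matches_wildcard(lab, "##1")])  # ['X211']
print([lab for lab in labels if matches_wildcard(lab, "**1")])  # ['PG1', 'DW1', 'X211']
```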
If groups have been formed (using the Variable manager window), the box labeled {g}
becomes available. By activating this box, a field that contains the list of available groups
appears, from which the groups can be selected.
Variables can also be selected from a list saved in a text file, in which case all the variables contained in the file are selected. To do so, right-click the box that contains the list of variables of the active table and, in the menu that appears, choose the Select from a list option followed by the Text file option. The same menu also offers an option to sort the list of variables alphabetically.
Once the variables have been selected, the criteria for selecting cases are established. The variables that participate in the selection appear in the dialogue box, together with a field for writing the criteria. When the criterion involves more than one variable, select one of the variables, write the sentence that expresses the criterion for it (for example, x<80) and press Enter; proceed in the same way with each variable of interest. When Accept is pressed, the cases outside the selection appear deactivated in the active table (colored, with their case number in parentheses).
More than one sentence can be written to define the criterion for a single variable; press Enter after writing each sentence.
If the Create new table using active cases option is activated, a new table containing the selected cases is generated.
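As a rough illustration of how criteria deactivate cases, the sketch below stores the table as a list of dictionaries and each sentence as a Python predicate. It assumes that a case stays active only if it satisfies every sentence of every variable (logical AND); the manual does not state how InfoStat combines sentences. The function names and rows are hypothetical; `invert` mimics Invert selection and `active_table` mimics Create new table using active cases.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Criterion = Callable[[object], bool]

def select_cases(rows: List[Row], criteria: Dict[str, List[Criterion]]) -> List[bool]:
    """Return an 'active' flag per row (assumed AND combination of all sentences)."""
    return [
        all(test(row[var]) for var, tests in criteria.items() for test in tests)
        for row in rows
    ]

def invert(active: List[bool]) -> List[bool]:
    # Active cases become inactive and vice versa
    return [not a for a in active]

def active_table(rows: List[Row], active: List[bool]) -> List[Row]:
    # Keep only the active cases in a new table
    return [row for row, a in zip(rows, active) if a]

rows = [{"Size": "big", "PG": 93}, {"Size": "medium", "PG": 60}, {"Size": "small", "PG": 20}]
# Two sentences for PG (x<80 and x>30) and one for Size
criteria = {"PG": [lambda x: x < 80, lambda x: x > 30], "Size": [lambda x: x != "big"]}
active = select_cases(rows, criteria)
print(active)                      # [False, True, False]
print(invert(active))              # [True, False, True]
print(active_table(rows, active))  # [{'Size': 'medium', 'PG': 60}]
```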
New column
DATA menu ⇒ NEW COLUMN adds a new column at the end of the table. The data type can be indicated (whole, real, categorical, or date). Added columns are named Column 1, Column 2, etc. Pressing the toolbar button with the image of a table adds new columns to the right-hand side of the active table. Columns generated in this way have no type assigned in advance; the type is assigned automatically when content is entered in any of their cells: real if the content is numerical, categorical if it is alphanumeric. To obtain a column of whole type, change the type afterwards, starting from a column with real data.
Insert column
DATA menu ⇒ INSERT COLUMN inserts a column immediately before the column where the cursor is located. The data type (real, whole, categorical or date) can be indicated. Inserted columns are named Column 1, Column 2, etc.
Delete column
DATA menu ⇒ DELETE COLUMN eliminates the selected column(s). The user need only
select one cell of each column. This action can be reversed by using the Undo submenu in
the Edit menu.
Note: to change the position of a column, select the column while pressing <Ctrl> and move the
mouse, while continuing to press down on the mouse button, to the new desired position. Upon
releasing the mouse button, the column will remain in its new location.
Edit Labels
DATA menu ⇒ EDIT LABELS allows the user to change the name of a column: position the cursor on a cell of the column to be renamed and invoke this action. Names may contain spaces and ASCII characters, up to a limit of twenty characters. If a name begins with a number, InfoStat adds the letter C in front of it. If several columns are selected when the action is applied, a dialogue window appears that allows the user to change the column names one after another.
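The two naming rules can be summarized in a small function. This is a sketch under assumptions: the order of the rules (prefix first, then truncation to twenty characters) and dropping non-ASCII characters are guesses, and `clean_label` is a hypothetical name.

```python
MAX_LABEL_LENGTH = 20  # limit stated in the manual

def clean_label(name: str) -> str:
    """Apply the documented label rules (rule order is an assumption)."""
    # Keep only ASCII characters; spaces are allowed
    name = "".join(ch for ch in name if ord(ch) < 128)
    # Names that begin with a number get a leading 'C'
    if name[:1].isdigit():
        name = "C" + name
    # Truncate to twenty characters
    return name[:MAX_LABEL_LENGTH]

print(clean_label("2012 yield"))                       # C2012 yield
print(clean_label("Germination percentage of seeds"))  # Germination percenta
```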
In files with the IDB2 extension, double-clicking the edit field that contains the name of the variable opens a dialogue in which a description of the variable can be written. To keep the description in the file, save the file.
Read labels from…
DATA menu ⇒ READ LABELS FROM… reads the names of the variables of the active table from a text file (*.txt). InfoStat assumes that the names are listed one per line (one name beneath the other), in the same order as the variables in the table.
Data type
DATA menu ⇒ DATA TYPE allows the user to declare the type of data in a column. The
following data types are acceptable: whole, real, categorical, and date. Dates can be entered
in the following formats: 20/05/07, 20-05-07 or 20.5.07.
If the user does not declare a data type, InfoStat assigns the type that corresponds to the first value entered. Once the type has been declared, only data of that type can be entered.
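The sketch below illustrates the date formats and the "type of the first value" rule. It assumes day/month/two-digit-year order and Python's `%y` century convention (00-68 map to the 2000s); InfoStat's own century rule is not documented here. `parse_date` and `infer_type` are hypothetical helpers, and whole type is never inferred, consistent with the New column rules.

```python
from datetime import date, datetime
from typing import Optional

# Formats from the manual: 20/05/07, 20-05-07 or 20.5.07 (day, month, year assumed)
DATE_FORMATS = ("%d/%m/%y", "%d-%m-%y", "%d.%m.%y")

def parse_date(text: str) -> Optional[date]:
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None

def infer_type(first_value: str) -> str:
    """Guess a column type from the first value entered (assumed rules)."""
    if parse_date(first_value) is not None:
        return "date"
    try:
        float(first_value)
        return "real"
    except ValueError:
        return "categorical"

print(parse_date("20.5.07"))   # 2007-05-20
print(infer_type("0.0032"))    # real
print(infer_type("reddish"))   # categorical
```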
Alignment
DATA menu ⇒ ALIGNMENT changes the position of the content in the selected cells: left, center or right. The default alignment is right for numerical cells and left for categorical cells. Alignment can also be set with the buttons on the toolbar next to the “A” button.
Decimals
DATA menu ⇒ DECIMALS changes the number of decimal places shown in the numerical content of the cells. Up to 10 decimal places are allowed; by default, 2 decimal places are shown. When data are copied from the grid, only the visible decimals are copied, so it is important to set the desired number of decimals for each variable.
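To see why the visible decimals matter, the sketch below imitates copying dry-weight values like those of the Atriplex file with the default 2 decimals: they all collapse to 0.00. Rounding through Python's string formatting is an assumption about how the grid rounds, and `copy_visible` is a hypothetical name.

```python
def copy_visible(values, decimals=2):
    """Text that would be copied when only `decimals` places are visible."""
    if not 0 <= decimals <= 10:  # manual: up to 10 decimal places
        raise ValueError("decimals must be between 0 and 10")
    return [f"{v:.{decimals}f}" for v in values]

dw = [0.0032, 0.0040, 0.0038]
print(copy_visible(dw))     # ['0.00', '0.00', '0.00']
print(copy_visible(dw, 4))  # ['0.0032', '0.0040', '0.0038']
```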
Automatically adjust columns
DATA menu ⇒ ADJUST COLUMN WIDTH (<Ctrl+L>) adjusts the width of the selected columns to the length of the column labels or of the cell content. If no column is selected, the action is applied to all the columns of the table.
Sort
DATA menu ⇒ SORT sorts the records in ascending or descending order of the values in one or more columns. A dialogue window shows the names of the columns of the active table in a list on the left. On the right, two lists, ascending order and descending order, show the variables to be sorted; the sorting hierarchy is given by the order in which the user selects the variables. For example, if the file has two columns, gender and age, and gender is selected first into the ascending order list and age second into the descending order list, the file will be sorted by gender and, within each gender, in descending order of age.
The buttons found on the lower part of the dialogue window allow the user to change the
sorting criteria (ascending or descending) and the sorting hierarchy.
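A hierarchical sort with mixed directions can be reproduced with successive stable sorts, applied from the least to the most important key. The sketch below sorts by gender (ascending) and, within gender, by age (descending), as in the example above; the records and `multi_sort` are made up for illustration.

```python
records = [
    {"gender": "M", "age": 30},
    {"gender": "F", "age": 25},
    {"gender": "M", "age": 45},
    {"gender": "F", "age": 52},
]

def multi_sort(rows, keys):
    """keys: list of (column, descending) pairs, most important first."""
    result = list(rows)
    # Python's sort is stable (also with reverse=True), so sorting by the
    # secondary key first and the primary key last keeps the secondary order
    # inside each group of the primary key.
    for column, descending in reversed(keys):
        result.sort(key=lambda r: r[column], reverse=descending)
    return result

for r in multi_sort(records, [("gender", False), ("age", True)]):
    print(r["gender"], r["age"])
# F 52
# F 25
# M 45
# M 30
```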
For example, using data from the Atriplex file, observations were sorted in descending
order, according to the values of the variable PG. The resulting configuration is shown in
the following table:
Table 1: Atriplex file sorted in descending order by variable PG.
Size     Color    Germination  Normal seedlings  DW
medium   reddish  100          80                0.0032
big      yellow   93           80                0.0040
medium   yellow   93           80                0.0038
medium   yellow   93           80                0.0043
small    reddish  93           7                 0.0030
big      yellow   87           87                0.0043
medium   yellow   87           54                0.0033
.        .        .            .                 .
.        .        .            .                 .
small    dark     20           0                 0.0030
medium   dark     13           7                 0.0030
Alternatively, sorting can be invoked from the toolbar by activating the Sort icon.
Warning: this option cannot be automatically undone. To keep the original file, close the table
without saving changes, save the file with another name, or sort in such a way as to recover the
original order of the data.
Categorize
DATA menu ⇒ CATEGORIZE categorizes the data of a previously selected column, generating a new column with the desired categorization. This action is available only when the data in the selected column are whole or real. Two procedures are available: assign categories to intervals or assign categories to numeric codes.
By selecting assign categories to intervals, categories are made by setting the upper limits
of a group of class intervals. Cases that belong to the same class are assigned to the same
category. The following categorization methods are defined, depending on the way in which
class intervals are established:
FIXED: divides the range of the data into as many intervals of equal length as categories requested. The minimum and maximum values, the interval length, and the upper limit of each category are shown; categories are identified as C1, C2, etc. To identify each category with whole numbers instead, activate the Numbers box. By default the categories are sorted in ascending order; to change this, activate the Descendent box.
To execute the categorization, press the Accept button. The user can change Minimum and
Maximum values to obtain the desired categorization.
PROBABILISTIC: the upper limit of each category is a percentile of the distribution of the variable, according to the number of intervals requested. For example, if 4 intervals are requested, their upper limits are the 25th, 50th, 75th and 100th percentiles. To apply the categorization, press the Accept button.
CUSTOMIZED: the upper limit of each interval is entered by the user. To do so, select the number of categories to create and enter the upper limit of each interval in the adjacent table. By default, the upper limit of the last category is the maximum observed value. To apply the categorization, press the Accept button.
As an example, using data from the Atriplex file (previously sorted by the variable PG in descending order), observations were categorized by intervals; the resulting configuration is shown in Table 2. With the FIXED option, the pre-established configuration was used: number of categories: 5; min: 13; max: 100; interval length: 17.4; upper interval limits: 30.4, 47.8, 65.2, 82.6 and 100. With the PROBABILISTIC option, 5 categories were requested, with the following upper limits: 33, 60, 80, 87 and 100. With the CUSTOMIZED option (column Pers. in Table 2), two categories were created: one with germination values less than or equal to 80%, specified by writing the number 80 in the LS1 field, and another with values greater than 80%, specified in the LS2 field, where the number 100 appears by default.
Table 2: Atriplex file with the variable PG categorized according to three criteria.
Germ.   Fixed  Prob.  Pers.     Germ.  Fixed  Prob.  Pers.
100.00  C5     C5     C2        73.00  C4     C3     C1
93.00   C5     C5     C2        66.00  C4     C3     C1
93.00   C5     C5     C2        60.00  C3     C2     C1
93.00   C5     C5     C2        60.00  C3     C2     C1
93.00   C5     C5     C2        53.00  C3     C2     C1
87.00   C5     C4     C2        53.00  C3     C2     C1
87.00   C5     C4     C2        40.00  C2     C2     C1
87.00   C5     C4     C2        33.00  C2     C1     C1
87.00   C5     C4     C2        33.00  C2     C1     C1
87.00   C5     C4     C2        26.00  C1     C1     C1
80.00   C4     C3     C1        20.00  C1     C1     C1
80.00   C4     C3     C1        20.00  C1     C1     C1
80.00   C4     C3     C1        13.00  C1     C1     C1
73.00   C4     C3     C1
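The three interval methods can be checked against Table 2. The sketch below assumes that a value belongs to the first category whose upper limit is greater than or equal to it, and that PROBABILISTIC limits are nearest-rank percentiles; both assumptions reproduce the limits and categories reported above, but InfoStat's exact percentile rule is not documented here. The function names are hypothetical.

```python
def fixed_limits(values, k):
    """FIXED: k intervals of equal length between the minimum and the maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * (i + 1) for i in range(k - 1)] + [hi]

def probabilistic_limits(values, k):
    """PROBABILISTIC: limits at the 100/k, 200/k, ... percentiles (nearest rank, assumed)."""
    ordered = sorted(values)
    n = len(ordered)
    # ceil(n * (i + 1) / k) computed with integers to avoid rounding problems
    return [ordered[(n * (i + 1) + k - 1) // k - 1] for i in range(k)]

def categorize(values, limits):
    """Label each value C1, C2, ... using the first upper limit >= value."""
    labels = []
    for v in values:
        for i, upper in enumerate(limits):
            if v <= upper:
                labels.append(f"C{i + 1}")
                break
        else:
            labels.append(None)  # above the last limit
    return labels

# PG values in the order of Table 2
pg = [100, 93, 93, 93, 93, 87, 87, 87, 87, 87, 80, 80, 80, 73,
      73, 66, 60, 60, 53, 53, 40, 33, 33, 26, 20, 20, 13]

print([round(x, 1) for x in fixed_limits(pg, 5)])  # [30.4, 47.8, 65.2, 82.6, 100]
print(probabilistic_limits(pg, 5))                 # [33, 60, 80, 87, 100]
fixed = categorize(pg, fixed_limits(pg, 5))
prob = categorize(pg, probabilistic_limits(pg, 5))
custom = categorize(pg, [80, 100])  # CUSTOMIZED: LS1 = 80, LS2 = 100
```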
When assign categories to numeric codes is selected, the categories can be read from a table or entered by the user. This procedure is useful, for example, for a file that uses numeric codes to represent the different states of qualitative variables. The corresponding dialogue window is shown below.
In the dialogue window, the list of numbers to be categorized appears on the left, and an empty list of categories appears on the right. The categories can be entered manually or read from a text file or from a table stored on the clipboard. The text file should contain as many lines as categories, and each line should contain a number, followed by a separator symbol (“=”, “:”, “.” or a tab), followed by the name of the category associated with that number. For example, if, when recording the type of occupation, the number 2 corresponds to the category “unemployed”, the line should read: 2=unemployed. If the categories are assigned from a table stored on the clipboard, the table should have been copied beforehand from a file with the same structure as described for text files. These loading options are selected from a menu that appears when right-clicking the assignment table, as shown in the figure. The Copy from clipboard option appears only when a table is on the clipboard.
To obtain the categorization, press the Accept button. The categories appear in a new column whose label consists of the prefix “Cat_” followed by the name of the categorized variable. The figure shows an edit field containing Cat_Occupation, which can be changed by writing a new name.
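The text-file format for numeric codes can be read with a few lines of code. In this sketch the file content is passed as a string; tolerating blanks around the separator and leaving unmatched codes empty are assumptions, and `read_code_table` and `categorize_codes` are hypothetical names. The new column name follows the Cat_ prefix convention described above.

```python
import re

# number, separator ('=', ':', '.' or tab), category name
LINE = re.compile(r"^\s*(\d+)\s*[=:.\t]\s*(.+?)\s*$")

def read_code_table(text: str) -> dict:
    """Parse lines such as '2=unemployed' into {2: 'unemployed'}."""
    table = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            table[int(match.group(1))] = match.group(2)
    return table

def categorize_codes(variable: str, codes: list, table: dict):
    # Codes missing from the table are left empty (None) in this sketch
    return "Cat_" + variable, [table.get(c) for c in codes]

table = read_code_table("1=employed\n2:unemployed\n3\tretired")
print(categorize_codes("Occupation", [2, 1, 3, 2], table))
# ('Cat_Occupation', ['unemployed', 'employed', 'retired', 'unemployed'])
```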
When a numerical variable is categorized using an assignment table, the table can be read
from the description of the resulting variable.
Edit categories
To apply this action, select the column that contains the categories. DATA menu ⇒ EDIT CATEGORIES opens the Edit categories dialogue window, which lists the existing categories of the selected variable (column). When a category is selected, its name appears in an editing field located above the list, where it can be modified; the change is shown in the list immediately. When the Accept button is pressed, the changes are applied to the data table.
A category can be grouped with another one by using the arrow buttons Upper limit and Lower limit. If a category is selected and the right arrow button (Lower limit) is pressed, the selected category is “included” within the category that precedes it in the list. When the Accept button is pressed, the included category disappears from the data table and is replaced by the category in which it was included. Another way to include one category within another is to drag it with the mouse onto the category that is to contain it. If a category is placed within the wrong category, it can be relocated by dragging it to the correct one. Before the Accept button is pressed, the inclusion can be reversed by selecting the included category and pressing the left arrow button (Upper limit). The position of the categories can be changed with the up (↑) and down (↓) arrows. Once the categorization is satisfactory, press the Accept button so that the changes are reflected in the data sheet.
To facilitate entering data for categorical variables, each category is associated with a number given by its position in the list of the Edit categories dialogue window. For example, if the categories “small”, “medium” and “large” appear in the list in that order, entering “1” in a cell of the column and pressing <Enter> displays the name “small”. If the order of the categories in the list is changed, the numeric coding follows the new order.
If the type of a variable is changed from categorical to whole, the categories are replaced by the numbers that correspond to their order in the list.
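The position-based coding can be sketched as a lookup into the ordered category list. `enter_value` and `to_whole` are hypothetical helpers, and treating any other entry as a literal category name is an assumption.

```python
categories = ["small", "medium", "large"]  # order shown in Edit categories

def enter_value(text: str, categories: list) -> str:
    """Typing '1' stores the first category, '2' the second, and so on."""
    if text.isdigit() and 1 <= int(text) <= len(categories):
        return categories[int(text) - 1]
    return text

def to_whole(column: list, categories: list) -> list:
    # Changing the type from categorical to whole yields each category's position
    return [categories.index(value) + 1 for value in column]

print(enter_value("1", categories))                    # small
print(to_whole(["large", "small"], categories))        # [3, 1]
print(enter_value("1", ["large", "medium", "small"]))  # large (new order)
```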
The button shown in this paragraph, found on InfoStat’s toolbar, allows the user to edit categories without going through the DATA menu.
Transforming
When this action is invoked, the Transformations window appears so that the user can select the variable(s) to transform; these should be quantitative variables. When the Accept button is pressed, another window for selecting the transformation appears. It contains two lists of transformations: one for transformations applied to a single variable and another for transformations applied to a combination of variables. Whatever transformation is selected, InfoStat generates new columns containing the transformed variables, which will
automatically be named with the name of the transformation followed by an underscore and
the name of the original variable.
Selecting the transformation: the available transformations include the following: Standardize, Standardize (by row), Center, Center (by row), Externally studt res (externally studentized residuals), Rank, Normal score, Log10 (base 10 logarithm), Log2 (base 2 logarithm), Ln (natural logarithm), Square root, Inverse, Power, ArcSin (square root (p)), Probit, Logit, Complement log-log, Map to [0,1], If >= mean then 1 else 0, If >= median then 1 else 0, Multiply by, and Scale by the maximum. If two or more variables are selected, the transformations in the Combining variables list can also be executed.
Standardize: standardizes the selected variable(s) by subtracting the column mean from each observation and dividing the result by the standard deviation of the values of the column.
Standardize (by row): if the user selects more than one variable in the transformations
menu, the “standardize by row” option becomes enabled. In such cases, each entry in the
table is transformed to its standardized value using the mean and standard deviation of the
elements in the corresponding row.
Center: this transformation centers by column. In other words, from each observation,
InfoStat subtracts the mean value of the variable, obtained using data from the
corresponding column.
Center (by row): in this case, from each value of a selected variable, InfoStat subtracts the
row mean, obtained using data for all the selected variables.
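The four options above differ only in whether the mean and standard deviation come from a column or from a row. The sketch below uses the sample standard deviation (divisor n-1), which is an assumption about InfoStat's choice; the small table and the helper names are hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    m, s = mean(values), stdev(values)  # sample SD (n-1), assumed
    return [(v - m) / s for v in values]

def center(values):
    m = mean(values)
    return [v - m for v in values]

def by_row(table, transform):
    """Apply a transformation to each row of a list-of-rows table."""
    return [transform(row) for row in table]

def by_column(table, transform):
    """Apply a transformation to each column and rebuild the rows."""
    columns = [transform(list(col)) for col in zip(*table)]
    return [list(row) for row in zip(*columns)]

table = [[1.0, 2.0, 6.0],
         [3.0, 4.0, 8.0]]
print(by_column(table, center))  # [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]]
print(by_row(table, center))     # [[-2.0, -1.0, 3.0], [-2.0, -1.0, 3.0]]
```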
Externally studt res (externally studentized residuals): for a location model, the externally studentized residual of observation i is defined as
$$ERS_i = \frac{y_i - \bar{y}_{(-i)}}{S_{(-i)}}$$
where $y_i$ is the value of the discarded observation, $\bar{y}_{(-i)}$ is the mean of the data without observation $y_i$, and $S_{(-i)}$ is the standard deviation of the data calculated after that observation is discarded.
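The leave-one-out definition translates directly into code. This sketch follows the formula above, assuming the usual sample standard deviation (divisor n-2 for the n-1 remaining values); it needs at least three observations, and the data are invented.

```python
from statistics import mean, stdev

def externally_studentized(values):
    """ERS_i = (y_i - mean without i) / (SD without i), as defined above."""
    result = []
    for i, y in enumerate(values):
        rest = values[:i] + values[i + 1:]  # discard observation i
        result.append((y - mean(rest)) / stdev(rest))
    return result

data = [10.0, 12.0, 11.0, 13.0, 30.0]
ers = externally_studentized(data)
print(round(ers[-1], 2))  # 14.33: the value 30 stands far from the other four
```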
Rank: this function replaces each observation by the position it occupies when the data are sorted in ascending order. In a group of n data, the smallest observation is assigned rank 1, the second smallest rank 2, and so on; the largest observation is assigned rank n. If two or more observations share the same value (a tie), each receives the average of the consecutive ranks corresponding to that value.
For example, the series 10, 20, 20, 30, 40, 50, 50, 50, 60 is transformed into 1, 2.5, 2.5, 4, 5, 7, 7, 7, 9.
Normal score: the “Rank” transformation is first applied to the selected variable. Each rank is then divided by (n+1), where n is the total number of data in the sample, and the inverse of the Normal(0,1) distribution function is evaluated at each quotient.
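The Rank and Normal score transformations can be sketched with the standard library; the average-rank rule for ties reproduces the example series above. The helper names are hypothetical, and `statistics.NormalDist().inv_cdf` plays the role of the inverse Normal(0,1) distribution function.

```python
from statistics import NormalDist

def average_ranks(values):
    """Ascending ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j -> mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def normal_scores(values):
    n = len(values)
    return [NormalDist().inv_cdf(r / (n + 1)) for r in average_ranks(values)]

series = [10, 20, 20, 30, 40, 50, 50, 50, 60]
print(average_ranks(series))  # [1.0, 2.5, 2.5, 4.0, 5.0, 7.0, 7.0, 7.0, 9.0]
print(normal_scores(series)[4])  # 0.0 (rank 5 of 9 -> quotient 0.5)
```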
Logarithm transformation: InfoStat allows users to generate variables using Log10 (base 10 logarithm), Log2 (base 2 logarithm) and Ln (natural logarithm). If the value to be transformed is less than or equal to zero, the result is a missing value; in this case, log(y+c) can be used, where c is a constant.
Square root: $\sqrt{y}$ or $\sqrt{y+c}$, where c is a constant.
Inverse: $1/y$.
Power: $y^{\lambda}$ with $\lambda \neq 0$, where $\lambda$ is the desired power.
ArcSin (square root (p)): $\sin^{-1}\sqrt{p}$ with $p \in [0,1]$ (arcsine of the square root of the proportion).
Probit: defined as $\mathrm{Probit}(p) = F^{-1}(p)$ with $p \in (0,1)$, where $F^{-1}$ is the inverse of the standard normal distribution function.
Logit: defined as $\mathrm{Logit}(p) = \ln\left(p/(1-p)\right)$ with $p \in (0,1)$.
Complement log-log: defined as $\mathrm{CLL}(p) = \ln\left[-\ln(1-p)\right]$ with $p \in (0,1)$.
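The logarithm and proportion transformations can be written in a few lines. This is a sketch, not InfoStat code: missing values are represented by `None`, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def log_or_missing(y, base=math.e, c=0.0):
    """log(y + c); values <= 0 give a missing value (None)."""
    y = y + c
    return math.log(y, base) if y > 0 else None

def arcsin_sqrt(p):  # p in [0, 1]
    return math.asin(math.sqrt(p))

def probit(p):       # p in (0, 1); inverse standard normal CDF
    return NormalDist().inv_cdf(p)

def logit(p):        # p in (0, 1)
    return math.log(p / (1 - p))

def cloglog(p):      # p in (0, 1)
    return math.log(-math.log(1 - p))

print(log_or_missing(0))           # None
print(log_or_missing(0, 10, c=1))  # 0.0
print(logit(0.5), probit(0.5))     # 0.0 0.0
```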
Map to [0,1]: given a group of observations {y1,...,yn}, the transformation subtracts the minimum of {y1,...,yn} from each value and divides the result by the range (the difference between the maximum and the minimum).
If >= mean then 1 else 0: dichotomizes the data with respect to the mean of the observations. Observations greater than or equal to the mean take the value 1; the others take the value 0.
If >= median then 1 else 0: dichotomizes the data with respect to the median of the observations. Observations greater than or equal to the median take the value 1; the others take the value 0.
Accumulate: generates a column whose t-th element is the sum of the first t elements. For example, if the column contains the values 10, 12 and 20, this action generates the values 10, 22 and 42.
Scale by the maximum: divides the selected columns by their maximum.
Divide by the sum: divides the selected columns by their sum.
Fill with a sequence: replaces the non-missing values of the selected columns with a sequence that follows the order of the records.
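The column-wise transformations above can be sketched as plain functions on a list of values. Assumptions: missing values are `None`, and Fill with a sequence starts at 1 and leaves missing cells untouched; the manual does not specify either detail.

```python
from itertools import accumulate
from statistics import mean, median

def map_to_unit(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def at_least_mean(values):
    m = mean(values)
    return [1 if v >= m else 0 for v in values]

def at_least_median(values):
    m = median(values)
    return [1 if v >= m else 0 for v in values]

def cumulative(values):
    return list(accumulate(values))

def scale_by_max(values):
    m = max(values)
    return [v / m for v in values]

def divide_by_sum(values):
    s = sum(values)
    return [v / s for v in values]

def fill_with_sequence(values):
    """Replace non-missing values by 1, 2, 3, ... in record order (assumed start at 1)."""
    counter, out = 0, []
    for v in values:
        if v is None:
            out.append(None)
        else:
            counter += 1
            out.append(counter)
    return out

print(cumulative([10, 12, 20]))          # [10, 22, 42]
print(fill_with_sequence([5, None, 7]))  # [1, None, 2]
```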
Combining variables allows the user to apply functions that involve several columns of the file. The variables to be used in evaluating the selected function are specified in the variables selector. The function can be one of the following: Sum, Mean, Median, Variance, Standard deviation, Minimum, Maximum, and Linear combination. The Sum function adds the values of the selected columns in each row of the file and generates a new variable named Sum. Similarly, the Mean, Median, Variance, Standard deviation, Minimum, and Maximum of the values in each row can be requested.
When Linear combination is selected, the coefficients of the combination should be indicated in the Coefficients window, entering them one by one and pressing <Enter> after each entry. Thus, if there are two columns, say X and Y, and the numbers 2 and 3 are specified in the Coefficients window, a new column containing the linear combination 2X+3Y will be generated.
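The row-wise functions and the linear combination can be sketched on a small dict-of-columns table. `combine` and `linear_combination` are hypothetical helpers; the mapping of names to functions mirrors the list above, using the standard `statistics` module (sample variance and standard deviation are assumed).

```python
from statistics import mean, median, stdev, variance

ROW_FUNCTIONS = {
    "Sum": sum, "Mean": mean, "Median": median,
    "Variance": variance, "Standard deviation": stdev,
    "Minimum": min, "Maximum": max,
}

def combine(table, columns, name):
    """Apply a row-wise function to the selected columns."""
    func = ROW_FUNCTIONS[name]
    return [func(list(row)) for row in zip(*(table[c] for c in columns))]

def linear_combination(table, coefficients):
    """coefficients: {column: coefficient}, e.g. {'X': 2, 'Y': 3} gives 2X+3Y."""
    n = len(next(iter(table.values())))
    return [sum(coef * table[col][i] for col, coef in coefficients.items())
            for i in range(n)]

table = {"X": [1.0, 2.0], "Y": [4.0, 5.0]}
print(combine(table, ["X", "Y"], "Sum"))            # [5.0, 7.0]
print(linear_combination(table, {"X": 2, "Y": 3}))  # [14.0, 19.0]
```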
Create dummy variables
In some statistical applications, for example those associated with regression models, it is necessary to transform a categorical variable X with k categories into k-1 binary variables (with value 0 or 1). A binary variable of this type is known as a dummy (auxiliary or indicator) variable. The group of k-1 dummy variables is used to identify each of the categories of the original variable X. Thus, if, for example, X has k=3 categories, two dummy variables D1 and D2 are enough to represent each of the categories of X: the combination D1=1 and D2=0 can identify the first category, D1=0 and D2=1 the second category, and D1=0 and D2=0 the third category. In this case, the third category (the one in which all dummy variables equal zero) is generally called the reference category.
To generate dummy variables, select the original categorical variable and press Accept; a Dummy variable generator window appears, listing the original variable(s) and the categories available for each one. The first category is automatically selected as the reference category; to use another category as reference, move the cursor to it and select it. InfoStat generates k-1 dummy variables and adds them to the data table; each one is identified by the original variable name followed by an extension that distinguishes it.
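Dummy coding with a reference category can be sketched as follows. Assumptions: categories are taken in order of appearance, the default reference is the first of them, and column names use a hypothetical `Variable_category` extension (InfoStat's actual naming may differ). `multiply_by` imitates the Multiply by… option.

```python
def make_dummies(values, variable, reference=None):
    """Create k-1 indicator columns; the reference category gets all zeros."""
    categories = list(dict.fromkeys(values))  # order of appearance (assumed)
    if reference is None:
        reference = categories[0]             # first category by default
    return {
        f"{variable}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories if cat != reference
    }

def multiply_by(dummies, other, other_name):
    """Products of each dummy with another variable."""
    return {f"{name}*{other_name}": [d * x for d, x in zip(col, other)]
            for name, col in dummies.items()}

size = ["small", "medium", "big", "medium"]
dummies = make_dummies(size, "Size")
print(dummies)  # {'Size_medium': [0, 1, 0, 1], 'Size_big': [0, 0, 1, 0]}
```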
The option Multiply by… which appears in the Dummy variables generator screen can be
used to obtain the product of a dummy variable and some other variable of interest. These
products will be shown in new columns in the data table, with a name that indicates their
origin. An example of the application of this option is available in Regression with dummy
variables.
Fill...
This option automatically fills a group of selected cells according to the specified option. To fill cells, select the desired cell(s) and choose the fill option from the DATA ⇒ FILL... menu.
Warning: these actions replace the content of the selected cells; to preserve the content of the original column, copy the column and apply the fill option to the copy.
Downward
Empty cells are filled with the content of the nearest filled cell above them in the same column. This action can also be performed by pressing <Ctrl+D>.
With sequence
Beginning with the first selected cell, the selected cells are assigned consecutive natural numbers in ascending order. The numbering continues in the columns to the right and does not restart with each new column.
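Both fill options can be sketched on plain Python lists. Empty cells are represented by `None`, and numbering the block column by column starting at 1 is an interpretation of the description above; the function names are hypothetical.

```python
def fill_downward(column):
    """Replace empty cells (None) with the closest filled cell above them."""
    filled, last = [], None
    for value in column:
        if value is None:
            filled.append(last)
        else:
            last = value
            filled.append(value)
    return filled

def fill_block_with_sequence(n_rows, n_cols, start=1):
    """Number a selected block down each column, continuing in the next column."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    value = start
    for c in range(n_cols):
        for r in range(n_rows):
            grid[r][c] = value
            value += 1
    return grid

print(fill_downward(["A", None, None, "B", None]))  # ['A', 'A', 'A', 'B', 'B']
print(fill_block_with_sequence(3, 2))               # [[1, 4], [2, 5], [3, 6]]
```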
With uniform (0,1)
Upon selecting this option the selected cells are assigned the value of a continuous random
variable with uniform distribution, between 0 and 1.
With Standard normal (0,1)
Upon selecting this option, the selected cells are assigned values of a random variable with a standard normal distribution (mean = 0 and variance = 1).
Others...
To cover a wide range of distributions of random variables, InfoStat allows the user to fill cells with: 1) realizations of the random variable, 2) the cumulative distribution function evaluated at arguments read from the selected cells, 3) the inverse distribution function evaluated at the selected values, and 4) the probability function evaluated at the selected values.
The following distributions are available: Uniform, Normal, Student T, Chi square, Non central F, Exponential, Gamma, Weibull, Logistic, Gumbel, Poisson, Binomial, Geometric, Hypergeometric and Negative binomial.
The Sequence (begin, step) option is also available. It fills cells with a sequence of real numbers whose first value and distance between consecutive values are defined by the user in the Parameters (begin and step) subwindow, which is activated when Sequence (begin, step) is selected. For example, if the beginning number is 1 and the step is 2, the selected column will contain 1, 3, 5, and so on.
To fill cells with realizations of the random variable, cumulative distribution function,
inverse distribution function, or probability distribution of one of the available random
variables, select the random variable and in the Parameters panel, specify the constants that
characterize the selected distribution.
Select seed: By default, InfoStat uses a random seed to generate random numbers; however, in some cases it is useful to be able to reproduce the same random sequence. This can be done by entering any nonzero number in the edit field that is activated when the Select seed button is pressed. If zero is specified as the seed, InfoStat uses a random seed, and therefore the sequences will always be different.
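The seed convention and the Sequence (begin, step) option can be imitated as below. This uses Python's own generator (`random.Random`), not InfoStat's, so the numbers will differ from InfoStat's; only the behavior (seed 0 means a fresh random seed, any other value makes the sequence reproducible) follows the text. Function names are hypothetical.

```python
import random

def fill_random(n, seed=0, dist="uniform"):
    """Fill n cells with Uniform(0,1) or standard normal draws.

    seed == 0 -> a fresh random seed each time (sequences differ);
    any other value -> a reproducible sequence.
    """
    rng = random.Random() if seed == 0 else random.Random(seed)
    if dist == "uniform":
        return [rng.random() for _ in range(n)]
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def sequence(begin, step, n):
    """Sequence (begin, step): begin, begin + step, begin + 2*step, ..."""
    return [begin + i * step for i in range(n)]

print(sequence(1, 2, 4))  # [1, 3, 5, 7]
print(fill_random(3, seed=123) == fill_random(3, seed=123))  # True
```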
A brief description of the available distributions is shown below:
Note: E(X) and V(X) indicate the expected value and the variance of the random variable (X),
respectively.
Uniform (a,b): A continuous random variable X has a Uniform distribution on the interval [a,b] if its density function is as follows:
$$f(x; a, b) = \frac{1}{b-a}\, I_{[a,b]}(x),$$
where $I_{[a,b]}(x)$ is the indicator function, and the parameters a and b satisfy $-\infty < a < b < \infty$. $E(X) = (a+b)/2$ and $Var(X) = (b-a)^2/12$.
Normal (mean, variance): A continuous random variable X, with $-\infty < x < \infty$, has a Normal distribution if its density function is as follows:
$$f(x; m, v) = \frac{1}{\sqrt{2\pi v}}\, e^{-(x-m)^2/(2v)},$$
where the parameters m (mean) and v (variance) satisfy $-\infty < m < \infty$ and $v > 0$. InfoStat uses m and v to represent the parameters $E(X) = \mu$ and $Var(X) = \sigma^2$, respectively.
Student-T (v): The continuous random variable X (with $-\infty < x < \infty$) has a Student-T distribution with $\nu$ degrees of freedom if its density function is as follows:
$$f(x; \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2},$$
where $\nu$ is a positive whole number known as the degrees of freedom, and $\Gamma(\cdot)$ is the gamma function, defined as
$$\Gamma(r) = \int_0^{\infty} y^{r-1} e^{-y}\, dy.$$
$E(X) = 0$ for degrees of freedom greater than 1, and $V(X) = \nu/(\nu-2)$ for $\nu > 2$.
Chi square (v, lambda) (non-central): The random variable X has a non-central Chi square distribution if its density function is as follows:
$$f(x; \nu, \lambda) = \sum_{j=0}^{\infty} \frac{e^{-\lambda}\lambda^{j}}{j!}\; \frac{x^{(\nu+2j)/2-1}\, e^{-x/2}}{2^{(\nu+2j)/2}\,\Gamma\left(\frac{\nu+2j}{2}\right)}\; I_{(0,\infty)}(x),$$
where $I_{(0,\infty)}(x)$ is the indicator function, $\nu$ is a positive whole number that denotes the degrees of freedom, $\Gamma(\cdot)$ is the gamma function, and $\lambda \geq 0$ is the non-centrality parameter, with $\lambda^{j} = 1$ when $\lambda = 0$ and $j = 0$.
$E(X) = \nu + 2\lambda$ and $V(X) = 2(\nu + 4\lambda)$. If $\lambda = 0$, the distribution is the central Chi square.
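The series density can be evaluated by truncating the sum, and its moments checked numerically against $E(X) = \nu + 2\lambda$ and $V(X) = 2(\nu + 4\lambda)$. This sketch works in log space to avoid overflow; the truncation at 100 terms, the integration range and the helper names are assumptions chosen for moderate values of $\lambda$.

```python
import math

def ncx2_pdf(x: float, nu: int, lam: float, terms: int = 100) -> float:
    """Non-central chi-square density as a Poisson(lam) mixture of central ones."""
    if x <= 0:
        return 0.0
    total = 0.0
    for j in range(terms):
        if lam == 0 and j > 0:
            break  # lambda**j = 0 for j > 0 when lambda = 0
        log_weight = -lam - math.lgamma(j + 1) + (j * math.log(lam) if j > 0 else 0.0)
        k = nu + 2 * j  # degrees of freedom of the j-th central component
        log_chi2 = ((k / 2 - 1) * math.log(x) - x / 2
                    - (k / 2) * math.log(2) - math.lgamma(k / 2))
        total += math.exp(log_weight + log_chi2)
    return total

def trapezoid_moments(pdf, upper: float, steps: int):
    """Approximate E(X) and V(X) by the trapezoidal rule on [0, upper]."""
    h = upper / steps
    m1 = m2 = 0.0
    for i in range(steps + 1):
        x = i * h
        w = 0.5 if i in (0, steps) else 1.0
        fx = pdf(x)
        m1 += w * x * fx
        m2 += w * x * x * fx
    m1 *= h
    m2 *= h
    return m1, m2 - m1 ** 2

nu, lam = 4, 1.5
mean_x, var_x = trapezoid_moments(lambda x: ncx2_pdf(x, nu, lam), upper=100.0, steps=2000)
print(round(mean_x, 3), nu + 2 * lam)       # numerical vs. nu + 2*lambda
print(round(var_x, 3), 2 * (nu + 4 * lam))  # numerical vs. 2(nu + 4*lambda)
```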
F non-central (u, v, lambda): The continuous random variable X has a non-central F
distribution, characterized by u degrees of freedom (numerator) and ν degrees of freedom
(denominator), and by the non-centrality parameter λ, if its density function is as follows:
$$f(x; u, \nu, \lambda) = \sum_{j=0}^{\infty} \frac{e^{-\lambda}\lambda^{j}}{j!}\, \frac{\Gamma\left(\frac{2j+u+\nu}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\Gamma\left(\frac{2j+u}{2}\right)} \left(\frac{u}{\nu}\right)^{(2j+u)/2} \frac{x^{(2j+u)/2-1}}{\left(1+\frac{ux}{\nu}\right)^{(2j+u+\nu)/2}}\, I_{(0,\infty)}(x),$$
where $I_{(0,\infty)}(x)$ is the indicator function, u and ν are positive integers, Γ(.) is the
gamma function, and λ≥0, with λ^j defined as 1 when λ=0 and j=0. If λ=0, the distribution
is the central F, with E(X)=ν/(ν-2) for ν>2 and V(X)=2ν²(u+ν-2)/(u(ν-2)²(ν-4)) for ν>4.
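Below is a Python sketch of the truncated series, using hypothetical helper names and an arbitrary truncation length. It checks that with λ=0 the mean approaches ν/(ν-2), and that with λ>0 the density still integrates to approximately 1.

```python
import math


def noncentral_f_pdf(x, u, nu, lam, terms=100):
    """Non-central F density as the Poisson(lam)-weighted series above,
    truncated after `terms` terms; lam = 0 gives the central F(u, nu)."""
    if x <= 0:
        return 0.0
    weight = math.exp(-lam)                 # e^{-lam} lam^j / j! for j = 0
    total = 0.0
    for j in range(terms):
        s = (2 * j + u) / 2                 # (2j+u)/2
        log_term = (math.lgamma(s + nu / 2) - math.lgamma(nu / 2) - math.lgamma(s)
                    + s * math.log(u / nu) + (s - 1) * math.log(x)
                    - (s + nu / 2) * math.log1p(u * x / nu))
        total += weight * math.exp(log_term)
        weight *= lam / (j + 1)             # next Poisson weight
    return total


def midpoint_integral(f, lo, hi, n):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


u, nu = 4, 10
# Central case (lam = 0): the mean should be close to nu/(nu-2) = 1.25
central_mean = midpoint_integral(lambda x: x * noncentral_f_pdf(x, u, nu, 0.0, terms=1), 0, 100, 20_000)
# Non-central case: the density should integrate to approximately 1
total_mass = midpoint_integral(lambda x: noncentral_f_pdf(x, u, nu, 2.0, terms=25), 0, 100, 20_000)
print(central_mean, nu / (nu - 2))
print(total_mass)
```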
Exponential (lambda): The continuous random variable X has an Exponential distribution
if its density function is as follows:
$$f(x; \lambda) = \lambda e^{-\lambda x}\, I_{(0,\infty)}(x),$$
where $I_{(0,\infty)}(x)$ is the indicator function and λ>0. E(X)=1/λ and V(X)=1/λ².
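Python's `random.expovariate` uses the same rate parameterization λ. The sketch below (hypothetical helper name, fixed seed for reproducibility) simulates from it and compares the sample mean and variance with 1/λ and 1/λ².

```python
import math
import random
import statistics


def exponential_pdf(x, lam):
    """Exponential density with rate lam > 0: lam * exp(-lam * x) for x > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return lam * math.exp(-lam * x) if x > 0 else 0.0


random.seed(1)                       # fixed seed so the check is reproducible
lam = 2.0
draws = [random.expovariate(lam) for _ in range(100_000)]
print(statistics.fmean(draws), 1 / lam)              # E(X) = 1/lam
print(statistics.variance(draws), 1 / lam ** 2)      # V(X) = 1/lam^2
```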
Gamma (r, lambda): The continuous random variable X has a Gamma distribution if its
density function is as follows:
$$f(x; r, \lambda) = \frac{\lambda^{r}}{\Gamma(r)}\, x^{r-1} e^{-\lambda x}\, I_{(0,\infty)}(x),$$
where $I_{(0,\infty)}(x)$ is the indicator function, r>0 and λ>0, and Γ(.) is the gamma function.
E(X)=r/λ and V(X)=r/λ².
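In the formula above, λ is a rate. Python's `random.gammavariate(alpha, beta)`, however, takes a scale parameter `beta`, so λ must be passed as 1/λ. The following sketch (hypothetical helper name, fixed seed) shows the conversion and checks E(X)=r/λ and V(X)=r/λ² by simulation.

```python
import math
import random
import statistics


def gamma_pdf(x, r, lam):
    """Gamma density with shape r > 0 and RATE lam > 0, as written above."""
    if r <= 0 or lam <= 0:
        raise ValueError("r and lam must be positive")
    if x <= 0:
        return 0.0
    return math.exp(r * math.log(lam) - math.lgamma(r) + (r - 1) * math.log(x) - lam * x)


random.seed(2)
r, lam = 3.0, 2.0
# random.gammavariate(alpha, beta) takes a SCALE beta, so the rate lam becomes 1/lam
draws = [random.gammavariate(r, 1 / lam) for _ in range(100_000)]
print(statistics.fmean(draws), r / lam)            # E(X) = r/lam
print(statistics.variance(draws), r / lam ** 2)    # V(X) = r/lam^2
```

With r=1, the Gamma density reduces to the Exponential density with rate λ.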
Beta (a, b): The continuous random variable X has a Beta distribution if its density function
is as follows:
$$f(x; a, b) = \frac{1}{B(a,b)}\, x^{a-1}(1-x)^{b-1}\, I_{(0,1)}(x),$$
where $I_{(0,1)}(x)$ is the indicator function, a>0, b>0, and B(a,b) is the beta function given by
the following expression:
$$B(a,b) = \int_0^1 x^{a-1}(1-x)^{b-1}\, dx, \quad a>0,\ b>0.$$
E(X)=a/(a+b) and V(X)=ab/((a+b+1)(a+b)²).
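The beta function can also be computed through the gamma function, as B(a,b)=Γ(a)Γ(b)/Γ(a+b). The Python sketch below (hypothetical helpers) compares this closed form with the integral definition above and checks the stated mean and variance by midpoint integration.

```python
import math


def beta_function(a, b):
    """B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b), computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def beta_pdf(x, a, b):
    """Beta(a, b) density on (0, 1)."""
    if not 0 < x < 1:
        return 0.0
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_function(a, b)


def midpoint_integral(f, lo, hi, n=100_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


a, b = 2.0, 3.0
b_numeric = midpoint_integral(lambda x: x ** (a - 1) * (1 - x) ** (b - 1), 0, 1)
print(b_numeric, beta_function(a, b))                 # integral definition vs gamma form
mean = midpoint_integral(lambda x: x * beta_pdf(x, a, b), 0, 1)
var = midpoint_integral(lambda x: (x - mean) ** 2 * beta_pdf(x, a, b), 0, 1)
print(mean, a / (a + b))                              # E(X) = a/(a+b)
print(var, a * b / ((a + b + 1) * (a + b) ** 2))      # V(X) = ab/((a+b+1)(a+b)^2)
```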
Weibull (a, b): The continuous random variable X has a Weibull distribution if its density
function is as follows:
$$f(x; a, b) = a b\, x^{b-1} e^{-a x^{b}}\, I_{(0,\infty)}(x),$$
where $I_{(0,\infty)}(x)$ is the indicator function, a>0 and b>0. E(X)=(1/a)^{1/b} Γ(1+1/b) and
V(X)=(1/a)^{2/b} [Γ(1+2/b) - Γ²(1+1/b)], where Γ(.) is the gamma function.
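Other software often writes the Weibull with a scale parameter instead. Python's `random.weibullvariate(alpha, beta)` uses the CDF 1-exp(-(x/α)^β), so the parameterization above corresponds to α=a^(-1/b) and β=b. The sketch below (hypothetical helpers, fixed seed) applies that conversion and compares simulated moments with the closed forms.

```python
import math
import random
import statistics


def weibull_pdf(x, a, b):
    """Weibull density a*b*x^(b-1)*exp(-a*x^b) for x > 0 (parameterization above)."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    return a * b * x ** (b - 1) * math.exp(-a * x ** b) if x > 0 else 0.0


def weibull_moments(a, b):
    """Closed-form E(X) and V(X) for the parameterization above."""
    g1 = math.gamma(1 + 1 / b)
    g2 = math.gamma(1 + 2 / b)
    return (1 / a) ** (1 / b) * g1, (1 / a) ** (2 / b) * (g2 - g1 ** 2)


random.seed(3)
a, b = 2.0, 1.5
# Scale alpha = a^(-1/b) makes (x/alpha)^b equal to a*x^b
draws = [random.weibullvariate(a ** (-1 / b), b) for _ in range(100_000)]
mean, var = weibull_moments(a, b)
print(statistics.fmean(draws), mean)
print(statistics.variance(draws), var)
```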
Logistic (a,b): The continuous random variable X has a logistic distribution if its cumulative
distribution function is as follows:
$$F(x; a, b) = \left[1 + e^{-(x-a)/b}\right]^{-1},$$
where -∞<a<∞ and b>0. E(X)=a and V(X)=(π²b²)/3.
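Because the logistic CDF has a closed-form inverse, a+b·ln(p/(1-p)), values can be simulated by inverse-transform sampling. The Python sketch below (hypothetical helper names, fixed seed) uses this approach to check E(X)=a and V(X)=π²b²/3.

```python
import math
import random
import statistics


def logistic_cdf(x, a, b):
    """F(x; a, b) = 1 / (1 + exp(-(x - a)/b))."""
    return 1.0 / (1.0 + math.exp(-(x - a) / b))


def logistic_quantile(p, a, b):
    """Inverse of the CDF above: a + b*ln(p/(1-p)) for 0 < p < 1."""
    return a + b * math.log(p / (1 - p))


random.seed(4)
a, b = 5.0, 2.0
# Inverse-transform sampling; the bounds keep p strictly inside (0, 1)
draws = [logistic_quantile(random.uniform(1e-12, 1 - 1e-12), a, b) for _ in range(100_000)]
print(statistics.fmean(draws), a)                             # E(X) = a
print(statistics.variance(draws), math.pi ** 2 * b ** 2 / 3)  # V(X) = pi^2 b^2 / 3
```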
Gumbel or extreme value (a,b): The continuous random variable X has a Gumbel
distribution if its cumulative distribution function is as follows:
$$F(x; a, b) = \exp\left(-e^{-(x-a)/b}\right),$$
where -∞<a<∞ and b>0. E(X)=a+bγ, where γ≈0.577216 is the Euler-Mascheroni constant, and
V(X)=(π²b²)/6.
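For the CDF exactly as written above, the inverse is a-b·ln(-ln p). The Python sketch below (hypothetical helper names, fixed seed) simulates by inverse transform and compares the sample mean and variance with a+bγ and π²b²/6.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329   # Euler-Mascheroni constant


def gumbel_cdf(x, a, b):
    """F(x; a, b) = exp(-exp(-(x - a)/b))."""
    return math.exp(-math.exp(-(x - a) / b))


def gumbel_quantile(p, a, b):
    """Inverse of the CDF above: a - b*ln(-ln p) for 0 < p < 1."""
    return a - b * math.log(-math.log(p))


random.seed(5)
a, b = 1.0, 2.0
draws = [gumbel_quantile(random.uniform(1e-12, 1 - 1e-12), a, b) for _ in range(100_000)]
print(statistics.fmean(draws), a + b * EULER_GAMMA)            # E(X) = a + b*gamma
print(statistics.variance(draws), math.pi ** 2 * b ** 2 / 6)   # V(X) = pi^2 b^2 / 6
```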
Poisson (lambda): This distribution provides a model for count variables, where the
counts refer to the number of events of interest in a unit of time or space (hours, minutes,
m², m³, etc.). A discrete random variable X has a Poisson distribution if its density function
is as follows:
$$f(x; \lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}\, I_{\{0,1,\ldots\}}(x),$$
where $I_{\{0,1,\ldots\}}(x)$ is the indicator function and λ>0. E(X)=λ and Var(X)=λ.
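Computing the Poisson probabilities in log space avoids overflow in λ^x and x! for large counts. The Python sketch below (hypothetical helper name) sums the probabilities over a long enough support and confirms that the mean and the variance both equal λ.

```python
import math


def poisson_pmf(x, lam):
    """e^{-lam} lam^x / x! for x = 0, 1, 2, ..., computed in log space."""
    if x < 0:
        return 0.0
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


lam = 4.0
support = range(0, 200)        # the mass beyond x = 199 is negligible for lam = 4
probs = [poisson_pmf(x, lam) for x in support]
mean = sum(x * p for x, p in zip(support, probs))
var = sum((x - mean) ** 2 * p for x, p in zip(support, probs))
print(sum(probs), mean, var)   # approximately 1, lam, lam
```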
Binomial (n, p): This distribution occurs when the following conditions are simultaneously
present: a) Bernoulli trials are executed, b) the parameter p (probability of “success”) is
constant between trials, and c) trials are independent of each other.
Bernoulli distribution: in some experiments, there are only two possible results: success or failure,
presence or absence, yes or no, etc. A Bernoulli variable is a binary variable that identifies these
events. For example, x=1 may represent success and x=0 may represent failure. E(X)=p and
V(X)=p(1-p), where p is the probability of success.
A discrete random variable X is said to have a Binomial distribution if its density function is
as follows:
$$f(x; n, p) = \binom{n}{x} p^{x} q^{n-x}\, I_{\{0,1,\ldots,n\}}(x),$$
where $I_{\{0,1,\ldots,n\}}(x)$ is the indicator function, 0≤p≤1, q=1-p and n=1,2,... is the total
number of trials. E(X)=np and Var(X)=npq.
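Exact rational arithmetic makes it possible to check the Binomial formulas without rounding error. The Python sketch below (hypothetical helper name; `math.comb` requires Python 3.8 or later) computes the probabilities with `fractions.Fraction` and recovers E(X)=np and Var(X)=npq exactly.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(x, n, p):
    """C(n, x) p^x q^(n-x) for x = 0, ..., n, with q = 1 - p."""
    if not 0 <= x <= n:
        return Fraction(0)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 10, Fraction(3, 10)
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * pr for x, pr in enumerate(probs))
var = sum((x - mean) ** 2 * pr for x, pr in enumerate(probs))
print(sum(probs), mean, var)   # 1 3 21/10, i.e. total mass 1, n*p and n*p*q
```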
Geometric (p): This distribution is of special interest in modeling the number of failures
that occur before the first success. A discrete random variable X has a Geometric (or
Pascal) distribution if its density function is as follows:
$$f(x; p) = p(1-p)^{x}\, I_{\{0,1,\ldots\}}(x),$$
where $I_{\{0,1,\ldots\}}(x)$ is the indicator function, 0<p≤1, and q=1-p. E(X)=q/p and Var(X)=q/p².
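With support {0,1,...}, X counts the failures before the first success. The Python sketch below (hypothetical helpers, fixed seed) checks the moments both by summing the probability function and by simulating Bernoulli trials directly.

```python
import random
import statistics


def geometric_pmf(x, p):
    """p (1-p)^x: probability of x failures before the first success."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    return p * (1 - p) ** x if x >= 0 else 0.0


def failures_before_success(p, rng):
    """Run Bernoulli trials until the first success and count the failures."""
    count = 0
    while rng.random() >= p:   # a failure occurs with probability q = 1 - p
        count += 1
    return count


p = 0.3
q = 1 - p
xs = range(0, 1000)            # q**1000 is negligible for p = 0.3
probs = [geometric_pmf(x, p) for x in xs]
mean = sum(x * pr for x, pr in zip(xs, probs))
var = sum((x - mean) ** 2 * pr for x, pr in zip(xs, probs))
print(mean, q / p, var, q / p ** 2)

rng = random.Random(6)
sim = [failures_before_success(p, rng) for _ in range(100_000)]
print(statistics.fmean(sim), q / p)
```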
Hypergeometric (m,k,n): This distribution is associated with sampling without
replacement, that is, situations in which an element of the population is randomly selected,
then another, and so on until the sample is complete, without returning the extracted
elements to the population. Let a population be a group of m elements, k of which are in one
of two possible states (success) and m-k of which are in the other state (failure). As with the
Binomial distribution, the problem of interest is to find the probability of obtaining x
successes in a sample of size n. A discrete random variable X has a Hypergeometric
distribution if its density function is as follows:
$$f(x; m, k, n) = \frac{\binom{k}{x}\binom{m-k}{n-x}}{\binom{m}{n}}\, I_{\{0,1,\ldots,n\}}(x),$$
where $I_{\{0,1,\ldots,n\}}(x)$ is the indicator function, m=1,2,..., k=0,1,...,m and n=1,2,...,m.
E(X)=n(k/m) and Var(X)=n(k/m)((m-k)/m)((m-n)/(m-1)).
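The Python sketch below (hypothetical helper name, illustrative values of m, k and n) evaluates the Hypergeometric probabilities with exact fractions. It confirms the total mass, the mean and the variance, including the finite-population factor (m-n)/(m-1).

```python
from fractions import Fraction
from math import comb


def hypergeometric_pmf(x, m, k, n):
    """C(k, x) C(m-k, n-x) / C(m, n): x successes in a sample of n drawn
    without replacement from m elements, k of which are successes."""
    if not 0 <= x <= n:
        return Fraction(0)
    # math.comb returns 0 when the lower index exceeds the upper one (impossible counts)
    return Fraction(comb(k, x) * comb(m - k, n - x), comb(m, n))


m, k, n = 20, 7, 5
probs = [hypergeometric_pmf(x, m, k, n) for x in range(n + 1)]
mean = sum(x * pr for x, pr in enumerate(probs))
var = sum((x - mean) ** 2 * pr for x, pr in enumerate(probs))
expected_var = n * Fraction(k, m) * Fraction(m - k, m) * Fraction(m - n, m - 1)
print(sum(probs), mean, Fraction(n * k, m))   # total mass 1 and E(X) = n(k/m)
print(var, expected_var)                      # Var(X) = n(k/m)((m-k)/m)((m-n)/(m-1))
```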
Negative binomial (m,k): As with repeated Bernoulli trials, certain problems common in
studies of natural populations concentrate on the probability of finding x individuals in a
sampling unit where the individuals tend to be aggregated (contagious distribution). InfoStat
allows the user to calculate those probabilities by means of the Negative binomial
distribution. A discrete random variable X has a Negative binomial distribution if its density
function is as follows:
$$f(x; m, k) = \frac{k(k+1)(k+2)\cdots(k+x-1)}{x!} \left(\frac{1}{q}\right)^{k} \left(\frac{p}{q}\right)^{x} I_{\{0,1,\ldots\}}(x),$$
where $I_{\{0,1,\ldots\}}(x)$ is the indicator function, p=m/k and q=p+1 (the product in the
numerator is taken as 1 when x=0). The parameters m and k satisfy the following conditions:
m>0 (average number of individuals per sampling unit) and k>0 (contagion or aggregation
parameter).
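The rising product k(k+1)...(k+x-1) equals Γ(k+x)/Γ(k), so the probabilities can be computed in log space. The Python sketch below uses a hypothetical helper and illustrative values (a small k, i.e. a strongly aggregated population). It checks that the probabilities sum to 1 and that the mean equals m, which confirms that m is the average number of individuals per unit.

```python
import math


def negbin_pmf(x, m, k):
    """k(k+1)...(k+x-1)/x! * (1/q)^k * (p/q)^x with p = m/k and q = p + 1,
    with the rising product written as Gamma(k+x)/Gamma(k)."""
    if m <= 0 or k <= 0:
        raise ValueError("m and k must be positive")
    if x < 0:
        return 0.0
    p = m / k
    q = p + 1
    log_f = (math.lgamma(k + x) - math.lgamma(k) - math.lgamma(x + 1)
             - k * math.log(q) + x * math.log(p / q))
    return math.exp(log_f)


m, k = 3.0, 0.8                  # illustrative values: mean 3, strong aggregation
probs = [negbin_pmf(x, m, k) for x in range(2000)]
mean = sum(x * pr for x, pr in enumerate(probs))
print(sum(probs), mean)          # approximately 1 and m
```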
Formula
It is possible to specify a formula whose results can either replace the content of an existing
column or be stored in a new column.
Warning: the names of the variables used in the calculation should not contain parentheses,
mathematical operation symbols or names of reserved functions, but they can contain accent
marks and the letter ñ.
The dialogue window is shown below:
During a work session, formulae are stored in a list as they are written, so they remain
available for later use. To view them, right-click on the field in which the formulae are
written.
The dialogue window shows a list of available variables, which can be included in a formula
by clicking on their names in the list. When variables are added to the expression in this
way, their names appear in quotes. This allows the user to include names that contain spaces
or mathematical symbols that should not be interpreted as such.
The user can either use predefined functions or define his own functions. In the latter case,
the user should write the function in the panel that appears below the formula edition field.
For example, the function cube(x) is not a predefined function, but it can be specified by the
user in the User defined functions panel by writing: cube(x)=x*x*x. This definition allows
the user to apply the cube function to any other variable in the active table or to any other
valid expression. For example, by writing h=cube(COLUMN2) in the formula specification
field, the cube function will be applied to the data in column 2.
If the variables involved in the formula have a very long name, these names can be
substituted in the formula with %#, where # is the number of the column that holds the
variable. For example, if the data table has 3 columns, %1 denotes the name of the first
column, %2 denotes the name of the second column, and %3 denotes the name of the third
column. To identify the correspondence between column name and number, press the Alt
key. While this key is held, the names of the columns in the active table will be shown as
%#.
If the user wishes to apply a function that accepts multiple arguments, such as mean(.), min(.)
or max(.), to a block of variables, he should use the notation f(%a:%b), where f denotes the
function, and %a and %b denote the column numbers of the beginning and end of the block,
respectively. Note that the character that separates the beginning and end of a block is a
colon (:). Continuing with the above example, to calculate the average of the first 3 variables
in the file, the user should write mean(%1:%3). Another way to apply a function such as
mean() to a group of variables is to use the format mean(name variable1:name variableN),
which indicates that the mean of all the variables between the first and the nth variable is
desired. This expression can be written manually or generated automatically by selecting the
block of variables in the list of variables.
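InfoStat formulas use InfoStat's own syntax. As an illustration only, the Python sketch below mimics, on a hypothetical in-memory table, how h=cube(COLUMN2) and mean(%1:%3) are assumed to be evaluated row by row to produce a new column. The table, the `block` helper and the row-wise interpretation are all assumptions made for this example.

```python
from statistics import fmean

# Hypothetical table: column name -> list of values (one entry per case)
table = {
    "COLUMN1": [1.0, 2.0, 3.0],
    "COLUMN2": [2.0, 0.5, 4.0],
    "COLUMN3": [6.0, 3.5, 2.0],
}
names = list(table)                      # column order: %1 -> names[0], %2 -> names[1], ...


def cube(x):
    """Analogue of the user-defined function cube(x)=x*x*x."""
    return x * x * x


def block(a, b):
    """Column names for %a..%b (1-based, inclusive), mimicking the %a:%b notation."""
    return names[a - 1:b]


n_rows = len(table["COLUMN1"])
# h=cube(COLUMN2): the function is applied to each value of COLUMN2
table["h"] = [cube(v) for v in table["COLUMN2"]]
# mean(%1:%3): the mean across columns 1 to 3, computed separately for each case
table["m"] = [fmean(table[c][i] for c in block(1, 3)) for i in range(n_rows)]
print(table["h"], table["m"])
```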
IDB2 data tables save the formulae that generate the contents of a column. It is possible to
update the content of a column by applying the formula again. To do so, select the column
and then choose the Update option from the Data menu or from the menu that appears upon
right-clicking. The dialogue appears in Macros mode, with the corresponding formula (or
formulae, if more than one column was selected). These formulae can be edited or executed,
individually or jointly, to update column content. Modifications can be made in the data
table while the formula window remains open.
To specify a formula, select DATA⇒ FORMULA and write, for example, the expression
Y=LN(COLUMN1)+3 in the window.
The following operators and functions are predefined in InfoStat:
+ : addition operator.
- : subtraction operator.
* : multiplication operator.
/ : division operator.
^ : exponent operator (only positive numbers in the base).
( : open parentheses.
) : close parentheses.
e : constant 2.71828…
PI: constant 3.141592653…
SETSEED(x): Use this statement with any integer as argument to set the random seed to a
given initial value.
ABS(x): absolute value of x (Range of x: -1e4932...1e4932).
ARCCOSINE(x) or ARCCOSIN(x): arccosine of x.
ARCSINE (x) or ARCSIN (x): arcsine of x.
AREAY(y1;…;yn): Calculates the area under the curve defined by the ordered pairs (Y,X),
assuming that the values of X are equally spaced by one unit.
AREAYX(y1;x1;…;yn;xn): Calculates the area under the curve defined by the ordered pairs
(Y,X).
ATAN(x): arctangent of x (Range of x: -1e4932...1e4932).
COSINE(x) or COS(x): cosine of x (Range of x: -1e18...1e18).
SQUARE(x) or SQR(x): square of x (Range of x: -1e2446... 1e2446).
STDEV(x1;x2;…;xn): Calculates the standard deviation of the indicated variables.
DISTNORMAL(x;m;v): Calculates the cumulative probability up to x for a normal
distribution with mean m and variance v.
EXP(x): exponential e^x (Range of x: -11356...11356).
FACTORIAL(x): factorial of x.
GAMMA(x): Assigns values of the Gamma distribution to the values of the indicated
function.
INVNORMAL(p;m;v): Calculates the value of x such that P(X<x)=p with X~N(m,v).
LN(x): natural logarithm of x (Range of x: 0...1e4932).
LN2(x): base 2 logarithm of x.
LOG10(x): base 10 logarithm of x.
MAX(x1;x2;…;xn): Calculates the maximum value of the indicated data group.
MEAN(x1;x2;…;xn): Calculates the mean of the values of the indicated variables.
MEDIAN(x1;x2;…;xn): Calculates the median of the values of the indicated variables.
MIN(x1;x2;…;xn): Calculates the minimum value of the indicated data group.
MOD(x) : modulus (or remainder) operator (applicable only to whole numbers).
NORMA(x1;x2;…;xn): Calculates the norm of the vector x.
NORMAL(m, v): Generates realizations of a normal random variable with mean m and
variance v.
ROUND(x): rounds x (Range of x: -1e9...1e9).
SQRT(x): square root of x (Range of x: 0...1e4932).
SINE(x) or SIN(x): sine of x (Range of x -1e18...1e18).
SUM(x1;x2;…;xn): Sum of the values of the indicated variables.
TANGENT (x): Tangent of x.
TRUNC(x): returns the integer part of x (Range of x: -1e9... 1e9).
URN: Generates realizations of a random variable with uniform distribution.
UNIFORM(a, b): Generates realizations of a random variable with uniform distribution on
the interval (a, b).
VARIANCE(x1;x2;…;xn): Calculates the variance of the values of the indicated variables.
ZRN: Generates realizations of a random variable with standard normal distribution.
To work with date-type variables, the functions described below are available (the arguments
required by each function are in parentheses).
DIADELCICLO(date,day,month): this statement generates a column that contains the day of
the cycle (on a scale of 1 to 365) corresponding to each date, taking into account that the
cycle begins on the day and month specified in the arguments. For example, if the user enters
day=DIADELCICLO(date,1,9) in the formula field, a column named day is generated that
contains whole numbers between 1 and 365, each corresponding to the value in the date
column, where day 1 of the cycle is September first. Thus, in this example, if the date column
reads 18/09/07, the day column will contain the number 18; if the date column reads
03/10/07, the day column will contain the number 33.
FECHADELDIADELCICLO(diadelciclo,day,month,year): returns the date that corresponds to
the specified day of the cycle, given the day, month and year of the date of origin of the
cycle. If the year argument is omitted, the current year is used. This function is the inverse
of DIADELCICLO.
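The Python sketch below reproduces the arithmetic behind DIADELCICLO and its inverse using the standard `datetime` module. The function names are hypothetical. The sketch assumes that a date earlier than the cycle start belongs to the cycle that began the previous year. How InfoStat handles leap years is not stated here: this sketch returns 366 for the last day of a cycle that contains February 29, and it fails for a cycle starting on February 29.

```python
from datetime import date, timedelta


def day_of_cycle(d, day, month):
    """Day of the cycle (1 = cycle start) for date d, when the cycle starts on day/month."""
    start = date(d.year, month, day)
    if d < start:                       # d belongs to the cycle that began the previous year
        start = date(d.year - 1, month, day)
    return (d - start).days + 1


def date_of_cycle_day(n, day, month, year):
    """Inverse: the date of cycle day n for a cycle starting on day/month/year."""
    return date(year, month, day) + timedelta(days=n - 1)


print(day_of_cycle(date(2007, 9, 18), 1, 9))   # 18
print(day_of_cycle(date(2007, 10, 3), 1, 9))   # 33
print(date_of_cycle_day(33, 1, 9, 2007))       # 2007-10-03
```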
DIAJULIANO(date): generates a column containing the Julian day that corresponds to each
value in the date column.
YEAR(date): generates a column containing the year that corresponds to each value in the
date column.
MONTH(date): generates a column containing the month that corresponds to each value in
the date column.
DAY(date): generates a column containing the day of the month that corresponds to each
value in the date column.
DATE(day, month, year): generates a column containing the date that corresponds to the
specified day, month and year.
Search
DATA menu ⇒ SEARCH presents a dialogue window that allows the user to search, within a
previously selected part of the table, for numbers, categories or dates that are equal to,
greater than, less than and/or different from a value specified by the user. The values found
can be replaced by another value (by activating the Replace box) or excluded from the
analysis (by activating the Deactivate case box), or their cells can be colored (by activating
the Color it box). The search can match the complete content of a cell (if the Whole cell box
is activated) or only part of the text within a cell. After each replacement or deactivation, the
search tool reports the number of cases that were found or deactivated.
Resampling
DATA menu ⇒ SAMPLING/RESAMPLING allows the user to obtain samples from a group of
data using the bootstrap, jackknife, random sampling with replacement, or random sampling
without replacement methods. The bootstrap method conducts random sampling with
replacement and generates samples of size n equal to the size of the original sample, while
the option randomly with replacement allows the user to generate samples of a size different
from n. The column from which the samples are to be drawn should be indicated, as well as
classification and/or partition criteria, if these exist. Then, the user should select a sampling
technique (in the Resampling method panel) and the values to be reported for each sample
(Save panel). If bootstrap is selected, the number of samples to be extracted should be entered
in the Bootstrap field; if randomly with or without replacement is selected, the number of
samples to be generated (# of samples) and their size (Sample size) should be indicated. The
user can save the values of the variable that make up each of the requested samples (Samples
option) as well as one or several summary statistics for each sample: Mean, Median,
Maximum, Minimum, Range, Variance, Standard deviation (S.D.), Standard error, Coefficient
of variation (C.V.), Sum, Sum of squares, Median absolute deviation (MAD), Percentiles (P01,
P05, P10, P20, P25, P50, P75, P80, P90, P95, P99), Kurtosis and Skewness.
The results are shown in a new table. If the values of the variable are requested, the new table
will have a column for each sample. If one or more summary statistics are requested, the new
table will contain each sample and each statistic in a column.
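The resampling schemes described above can be sketched in Python as follows. This is an illustrative analogue with hypothetical data and function names, not InfoStat's implementation. It uses the standard-library methods `random.Random.choices` (with replacement) and `random.Random.sample` (without replacement).

```python
import random
import statistics


def bootstrap_means(data, n_samples, rng):
    """Bootstrap: resamples WITH replacement of size len(data); returns the mean of each."""
    return [statistics.fmean(rng.choices(data, k=len(data))) for _ in range(n_samples)]


def jackknife_means(data):
    """Jackknife: the mean of each leave-one-out sample."""
    return [statistics.fmean(data[:i] + data[i + 1:]) for i in range(len(data))]


def random_samples(data, n_samples, size, replace, rng):
    """Random samples of a chosen size, with or without replacement."""
    if replace:
        return [rng.choices(data, k=size) for _ in range(n_samples)]
    return [rng.sample(data, size) for _ in range(n_samples)]


rng = random.Random(7)
data = [12.1, 9.8, 11.4, 10.0, 13.2, 8.7, 10.9]   # hypothetical values
boot = bootstrap_means(data, 1000, rng)
print(statistics.stdev(boot))     # bootstrap estimate of the standard error of the mean
print(jackknife_means(data)[:3])
print(random_samples(data, 2, 4, replace=False, rng=rng))
```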
Color selection
DATA menu ⇒ COLOR SELECTION allows the user to color a group of previously selected
cells. When a variable is colored, it appears in that color in the Variables selector list. This
feature is useful, for example, when colors are used to distinguish groups of variables.
Merge tables
DATA menu ⇒ MERGE TABLES allows the user to merge the active table with other tables,
horizontally or vertically. The merge is done one table at a time.
A Horizontal merge adds columns to the active table to include the new information and
requires that the user select one or more merging criteria. Once these criteria are established,
a dialogue window will appear from which the table and columns to be merged (added) to
the active table should be selected. The window contains a list of tables open on the screen,
from which the table to be merged should be identified. If the desired table is not listed, the
Other table button should be pressed in order to open the corresponding table from its
location, and thus the table will be added to the list. Upon selecting a table from the list,
column (variable) names will appear with an activated check box, indicating which
variables will be added to the active table. The user can deactivate those which he does not
wish to participate in the process. In the case that both tables have the same column names,
upon adding the new information, InfoStat will place a number at the end of the name of the
added column in order to distinguish it from the other column with the same name. If the
user wishes to replace the content of the columns with the same name in the active table, he
should activate the Overwrite box.
Upon completing the horizontal merge, the solicited columns are added, but the information
from the original table is not included.
A vertical merge adds new rows to the active table to include the information contained in
coinciding columns, and creates new columns for variables that do not coincide. The process
is similar to the one described for a horizontal merge, except that there is no need to specify
merging criteria.
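The two kinds of merge can be sketched in Python on tables stored as lists of dictionaries. The sketch rests on assumptions that are not stated in the manual. In the horizontal merge, rows of the active table without a match receive None, a duplicated column name gets the suffix "1" unless overwrite=True, and if a key appears more than once in the other table the last row wins. In the vertical merge, columns missing from a table are filled with None. The function names and the data are hypothetical.

```python
def merge_horizontal(active, other, keys, overwrite=False):
    """Add the columns of `other` to `active`, matching rows on the `keys` columns."""
    other_cols = list(other[0]) if other else []
    lookup = {tuple(row[k] for k in keys): row for row in other}   # last duplicate wins
    merged = []
    for row in active:
        new_row = dict(row)
        match = lookup.get(tuple(row[k] for k in keys), {})
        for col in other_cols:
            if col in keys:
                continue
            # Keep both columns (suffix "1") unless the user asked to overwrite
            target = col if (overwrite or col not in row) else col + "1"
            new_row[target] = match.get(col)
        merged.append(new_row)
    return merged


def merge_vertical(active, other):
    """Append the rows of `other`; columns missing from a table are filled with None."""
    columns = list(dict.fromkeys([*active[0], *other[0]]))
    return [{c: row.get(c) for c in columns} for row in active + other]


plots = [{"plot": 1, "yield": 4.2}, {"plot": 2, "yield": 3.9}]
soils = [{"plot": 1, "yield": 4.0, "pH": 6.1}, {"plot": 2, "yield": 3.8, "pH": 5.7}]
print(merge_horizontal(plots, soils, ["plot"]))
print(merge_vertical(plots, [{"plot": 3, "pH": 6.4}]))
```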
Rearrange columns, one under the other
DATA menu ⇒ Rearrange columns, one under the other merges the content of two or more
columns into a single column. The columns to be merged should be selected in the Columns
option of the dialogue window, and the merge will be conducted according to the selection
order. The user may also choose to copy the information from a column of interest (Copy...
option). There is an option to conduct the merge with only the active cases. Clicking Go
generates a new table that shows the result of the union.
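A minimal Python sketch of this stacking operation, on a hypothetical table stored as a dictionary of lists, is shown below. The "Column" label column and the output layout are assumptions made for illustration and are not necessarily what InfoStat produces.

```python
def stack_columns(table, columns, copy=None, active=None):
    """Put the values of `columns` one under the other, in selection order.
    Optionally repeat a `copy` column alongside each block, and keep only the
    rows flagged True in `active` (all rows when active is None)."""
    n = len(table[columns[0]])
    rows = [i for i in range(n) if active is None or active[i]]
    out = {"Column": [], "Value": []}      # hypothetical output layout
    if copy is not None:
        out[copy] = []
    for col in columns:                    # selection order determines the stacking order
        for i in rows:
            out["Column"].append(col)
            out["Value"].append(table[col][i])
            if copy is not None:
                out[copy].append(table[copy][i])
    return out


table = {"id": [1, 2], "A": [10, 11], "B": [20, 21]}   # hypothetical data
print(stack_columns(table, ["B", "A"], copy="id"))
print(stack_columns(table, ["B", "A"], active=[True, False]))
```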
Rearrange rows as columns
DATA menu ⇒ REARRANGE ROWS AS COLUMNS allows the user to transfer the
content of the rows of an active table to the columns of a new table, according to the
classification criteria established by the user. In the Columns option of the dialogue
window, the user should indicate the variables whose data will appear in the columns of the
new table, and in the Partition criteria option, he should indicate those variables which will
define the columns of the new table. The user may also copy entries of a particular column
of interest (Copy... option). The new table will appear upon clicking OK.
Create a new table using active cases
DATA menu ⇒ CREATE NEW TABLE USING ACTIVE CASES generates a new table
that will contain only the active cases of an open table that also contains inactive cases.
Merge categories
DATA menu ⇒ MAKE A NEW COLUMN BY MERGING CATEGORICAL
VARIABLES allows the user to obtain the combinations that result from merging the
categories of two or more variables. In the dialogue window, under the Partition criteria
option, the user should indicate the variables he wishes to cross. Upon clicking OK, a new
column with the classes obtained by the merge will appear in the table.
Output
The OUTPUT menu shows
the actions that can be
applied to an active result
(the last result of an action
solicited from the Statistics
or Applications menu). In
order to activate another
previously obtained result,
click on the tab that
indexes that result and that
can be found at the foot of
the RESULTS window. Upon activating the OUTPUT menu, the user will be able to choose
from among the following options:
Upload results
This allows the user to open a file that contains results that have been saved during a work
session. The file name and location are specified in the dialogue window.
Save results
This allows the user to create a file containing results that have been obtained during a work
session. The file name and location are specified in the dialogue window. The files will have
a “.ITRES” extension.
Decimals
This item displays a submenu that allows the user to select the desired number of decimals
to be shown. At the bottom of this menu, an option for exponential notation appears; in the
case that a result cannot show any significant digit with the specified number of decimals,
InfoStat will use exponential notation.
Field separator
This allows the user to select the type of separator (space, tab, comma or semicolon) used as
the character that separates the columns of a table; the default separator is a space. Usually
this separator does not need to be modified, but changing it can be useful when results are
exported as a table.
Typography
This allows the user to change the typographical attributes (font and font style) used to
present results. This action can also be invoked by clicking the “A” button on the
Toolbar.
Export results to table
This allows the user to export the text of a Results window as a table. Upon selecting this
action, a dialogue window called Text Importer will open. For details regarding operations
with this window, see OPEN TABLE in the DATA menu.
Access to results submenus through right clicking on the mouse
In addition to the actions presented in the RESULTS menu, the user can also access the
following options by right-clicking when a Results window is active:
Decimals: establishes the number of decimals that are shown in an active window.
Copy: copies the previously selected text, using tabs as field separators. The text can be
read directly in word processors for the construction of tables.
Delete: deletes the active result.
Delete present and previous windows: deletes the active result as well as all previous
results.
Print: prints the content of the active result.
Statistics
InfoStat conducts different statistical analyses using an active data table. The selection of
the type of analysis is done from the STATISTICS menu. Each time a procedure is invoked,
the output is presented in a results window which can be formatted and prepared to be
exported according to the specifications given by the user from the OUTPUT menu.
The actions (submenus) of the STATISTICS menu that are applied to the analysis of tables in
InfoStat are the following: Summary statistics, Frequency tables, Probabilities and quantiles,
Estimating population parameters, Sample size, One-sample inference, Two-sample inference,
Analysis of variance, Non-parametric ANOVA, Extended and mixed linear models, Linear
regression, Correlation analysis, Categorical data, Multivariate analysis, Time series, and
Fitting and smoothing.
In general, these actions first invoke a window that is used to select variables. In it, the user
should indicate the variable(s) of interest and, if the analysis is to be conducted by group or
partition of the data file, the desired partition. In the variables selector, the user can include
variables of interest by clicking on the arrows in the Variables subwindow. The variables that
define the partition should be declared in the Partitions tab, where the Partition by command
allows the user to identify the variable(s) that will be used to partition the analysis. When
more than one variable is selected, the groups result from the combination of the levels of
the selected variables.
For example, if the file contains the variables seed color (light, dark and red) and seed size
(large, medium and small), selecting only color generates three groups (the three levels of
color). If, instead, both variables are selected, 9 groups will be generated. The partitions
appear in a list to the right of the window. The list can be modified by selecting one or more
groups that the user does not wish to include in the analysis and removing them with the
arrows found at the bottom of the list. Once the groups have been identified, InfoStat will
conduct the requested analysis on the observations of each group separately.
Descriptive statistics
The first block in the Statistics menu allows the user to describe a group of data by means of
univariate summary statistics, frequency tables and theoretical distribution functions fitted to
empirical distributions (sample frequency tables). All of these actions can be conducted on
the active table, either as a whole or for each subgroup or partition of the file, if the user
indicates a partitioning variable in the Partitions tab. For summary statistics and frequency
tables, it is possible to work with files that have as many rows as observations (see the
Atriplex.idb file), or with files in which each row of the column of interest represents a value
of the variable and another column contains the frequency of each value (see the Insectos.idb
file). In the first case, the variable(s) of interest should be indicated in the variables selector
and the Frequencies field should be left empty. In the second case, the column that contains
the different values of the variable should be indicated in the Variables window of the
selector, and the column that contains the frequencies should be indicated in the Frequencies
(only option) window. InfoStat also provides a probabilities and quantiles calculator for
different types of random variables.
Summary statistics
The following summary statistics are available: number of observations (n), Mean, standard
deviation (S.D.), variance with denominator n-1 (Var(n-1)), variance with denominator n
(Var(n)), standard error (S.E.), coefficient of variation (CV), minimum value (Min),
maximum value (Max), Median, quantile 0.25 or first quartile (Q1), quantile 0.75 or third
quartile (Q3), sum of observations (Sum), Asymmetry, Kurtosis, uncorrected sum of
squares (USS), corrected sum of squares (CSS), median absolute deviation (MAD), Missing
data, and percentiles 5, 10, 25, 50, 75, 90 and 95 (P(05), P(10), etc.).
The number of observations reported corresponds to the number of active cases. The sample
statistics are calculated using the cases that remain after observations with missing data
have been omitted. The code for missing data can be entered by the user. The Mean statistic
refers to the arithmetic mean. The Standard deviation is the square root of the sample
variance, calculated as the sum of the squared deviations from the sample mean divided by
(n-1). The Standard error is the standard deviation divided by the square root of n. The
Coefficient of variation is the quotient of the standard deviation and the sample mean,
expressed as a percentage.
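The definitions above can be reproduced with Python's `statistics` module. The sample below is hypothetical. Note that `statistics.variance` uses the denominator n-1, corresponding to Var(n-1), while `statistics.pvariance` uses n, corresponding to Var(n).

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical sample

n = len(data)
mean = statistics.fmean(data)
var_n1 = statistics.variance(data)     # Var(n-1): squared deviations divided by n-1
var_n = statistics.pvariance(data)     # Var(n): squared deviations divided by n
sd = math.sqrt(var_n1)                 # S.D. = square root of Var(n-1)
se = sd / math.sqrt(n)                 # S.E. = S.D. / sqrt(n)
cv = 100 * sd / mean                   # CV as a percentage of the mean
print(n, mean, var_n1, var_n, sd, se, cv)
```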
The first quartile (Q1), the median and the third quartile (Q3), as well as any other
percentile, can be obtained by ordering the sample and selecting one of the observed values
according to its position, or estimated from an approximation of the empirical distribution
function. If the user selects Based on EDF in the Percentiles subwindow, InfoStat will first
estimate the function and then use it to report the requested percentile. If the Sample option
is selected, the percentile will be one of the sample values, obtained after the sample is
ordered. For this reason, the two procedures do not necessarily produce the same numeric
result.
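The exact EDF approximation used by InfoStat is not described here. The Python sketch below therefore contrasts two common conventions, which are assumptions and are not claimed to be InfoStat's rules. The first is an observed value: the smallest x whose empirical CDF reaches p. The second is linear interpolation between order statistics at position (n-1)p. On the same data, the two can give different answers, as noted above.

```python
import math


def percentile_sample(data, p):
    """An observed value: the smallest x whose empirical CDF is >= p (0 < p <= 1)."""
    xs = sorted(data)
    idx = max(math.ceil(p * len(xs)), 1) - 1
    return xs[idx]


def percentile_interpolated(data, p):
    """Linear interpolation between order statistics at position (n-1)p."""
    xs = sorted(data)
    pos = (len(xs) - 1) * p
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


data = [3, 1, 4, 2]            # hypothetical sample
print(percentile_sample(data, 0.5), percentile_interpolated(data, 0.5))   # 2 2.5
```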
Results can be presented horizontally or vertically. A horizontal presentation is useful to
export results to a new data table prior to conducting further analysis using a data table that
contains summary statistics.
Summary statistics for one or more variables can be simultaneously solicited from the file
(indicated in the variables selector). These summary statistics can be obtained using all the
observations from the file, or for a subgroup of observations. The subgroups can be formed
from a single variable or from a combination of two or more variables from the file. To form
groups, the user should indicate the variables that define the groups by listing these in the
Class variables (optional) subwindow in the variables selector. Alternatively, the Partition
tab can be activated to indicate the variables that partition the file; however, this option is
less efficient than using Class variables in terms of execution time. For this reason, we
recommend using the Class variables option when the user wishes to obtain summary
statistics for a large number of subgroups of a large file.
To illustrate, we use data from the Atriplex file. Selecting STATISTICS menu ⇒
SUMMARY STATISTICS opens the Descriptive statistics window, in which the desired
variable(s) are selected. If a variable is selected in the Partition tab to create a partition of the
file, the requested summary statistics will be generated for each group or partition. In this
example, the variables “Percentage” and “Normal Seedlings” were selected, and the variable
“Size” was selected in the Partition tab. The following summary statistics were requested: n,
Mean, S.D., Var(n-1), Min, Max, Median, and P(50) estimated from the empirical
distribution function. (This statistic does not coincide exactly with the Median, since the
Median is calculated from the ordered sample data, whereas P(50) is calculated from the
estimated distribution of the sample data. If the Sample box is left activated when requesting
P(50), the Median and P(50) will be the same.) The Horizontal presentation was selected. The
results are shown in the following table:
Table 3: Summary statistics for variables in the Atriplex file, according to the partition by seed size
(horizontal presentation).
Summary statistics
Size Variable n Mean S.D. Var(n-1) Minimum Maximum Median P(50)
small Germinación 9 54.56 26.34 694.03 20.00 93.00 60.00 48.67
small Normal Seedlings 9 24.44 20.24 409.53 0.00 60.00 20.00 20.00
big Germinación 9 73.33 19.28 371.75 40.00 93.00 80.00 71.00
big Normal Seedlings 9 51.33 22.12 489.50 27.00 87.00 47.00 42.33
medium Germinación 9 68.78 32.81 1076.19 13.00 100.00 87.00 80.00
medium Normal Seedlings 9 50.67 27.44 752.75 7.00 80.00 54.00 40.50
Frequency tables
STATISTICS menu ⇒ FREQUENCY TABLES allows the user to obtain a frequency table and/or to test
the fit of theoretical distribution models to an empirical distribution. Depending on the fields
activated by the user, frequency tables can contain the following information: lower limits (LL)
and upper limits (UL) of the class intervals, midpoint of the interval (MI), absolute frequencies
(AF), relative frequencies (RF), cumulative absolute frequencies (CAF) and cumulative relative
frequencies (CRF). The number of classes can be obtained automatically or defined by the user
(PERSONALIZED). For the automatic method, InfoStat obtains the number of classes as log2(n+1),
where n is the number of observations. For the personalized case, InfoStat allows the user to
specify the minimum, the maximum and the number of intervals. The intervals are closed on the
right. If the variable is categorical, personalization is not accepted, and the frequency table
shows as many classes as there are categories for the variable. If the values of the variable
were declared integers, by default InfoStat considers the variable a count variable and shows
the frequencies of all the integer values between the minimum and the maximum. If the variable
contains integer values and the Consider integer variables as countings box is deactivated,
InfoStat treats the variable as continuous and uses its values to define class intervals and
construct the table.
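The class-construction rules described above can be sketched as follows. This is a minimal sketch under stated assumptions, not InfoStat's implementation: it assumes equal-width classes running from the minimum to the maximum, rounds log2(n+1) to the nearest integer (the manual does not say how non-integer results are handled), and places the minimum in the first class even though intervals are closed on the right. All function names are hypothetical.

```python
import math
from collections import Counter


def n_classes(n):
    """Automatic number of classes, log2(n + 1).

    Rounding to the nearest integer is an assumption.
    """
    return max(1, round(math.log2(n + 1)))


def class_limits(minimum, maximum, k):
    """Limits of k equal-width classes spanning [minimum, maximum] (assumption)."""
    width = (maximum - minimum) / k
    return [minimum + i * width for i in range(k + 1)]


def frequency_table(values, k=None):
    """Rows with LL, UL, MI, AF, RF, CAF and CRF for a continuous variable."""
    n = len(values)
    if k is None:
        k = n_classes(n)
    limits = class_limits(min(values), max(values), k)
    af = [0] * k
    for v in values:
        # Classes are closed on the right, (LL, UL]; the minimum falls in the
        # first class, and the last class absorbs any floating-point drift.
        for i in range(k):
            if v <= limits[i + 1] or i == k - 1:
                af[i] += 1
                break
    rows, caf = [], 0
    for i in range(k):
        caf += af[i]
        ll, ul = limits[i], limits[i + 1]
        rows.append({
            "Class": i + 1, "LL": ll, "UL": ul, "MI": (ll + ul) / 2,
            "AF": af[i], "RF": af[i] / n, "CAF": caf, "CRF": caf / n,
        })
    return rows


def count_frequencies(values):
    """Integer count variable: one row per integer from min to max,
    including integers that never occur (frequency 0)."""
    counts = Counter(values)
    return [(x, counts[x]) for x in range(min(values), max(values) + 1)]
```

With n = 9 the rule gives round(log2 10) = 3 classes, and with the medium-seed germination minimum and maximum from Table 3 (13 and 100), `class_limits(13, 100, 3)` returns 13, 42, 71 and 100, which agree with the LL and UL values reported for that group in the frequency table that follows.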
Again using data from the Atriplex file, we obtained a frequency table for the germination
variable for each of the seed sizes as follows: STATISTICS ⇒ FREQUENCY TABLES; in the Frequency
distribution window, Variables tab, germination was selected; before clicking OK, the Partitions
criteria tab was activated and the variable size was added (all the seed sizes present in the file
are displayed automatically). Upon clicking OK, the Distribution of frequencies – Frequency table
options window appears, from which the user can indicate the type of information to display in
the table and define the number of classes. In this example, all the default options were
accepted, and upon clicking OK, the number of classes was calculated automatically.
The results are shown in the following table:
Table 4: Frequency table for the germination variable from the Atriplex file, according to the
partition by seed size.
Frequency distribution
Size    Variable     Class     LL      UL     MI  AF    RF
big     Germination      1  40.00   57.67  48.83   3  0.33
big     Germination      2  57.67   75.33  66.50   0  0.00
big     Germination      3  75.33   93.00  84.17   6  0.67
medium  Germination      1  13.00   42.00  27.50   3  0.33
medium  Germination      2  42.00   71.00  56.50   0  0.00
medium  Germination      3  71.00  100.00  85.50   6  0.67
small   Germination      1  20.00   44.33  32.17   3  0.33
small   Germination      2  44.33   68.67  56.50   3  0.33
small   Germination      3  68.67   93.00  80.83   3  0.33
Fittings
STATISTICS menu ⇒ FREQUENCY TABLES, Fittings tab, allows the user to obtain goodness of fit
tests. The null hypothesis specifies a theoretical distribution model for the data. The
frequencies observed in the sample are compared with the frequencies expected under the specified
model, using the Chi-square statistic and/or the likelihood ratio statistic G (Agresti, 1990).
The user must select at least one of these two statistics in order to conduct a goodness of fit
test. Furthermore, the user must specify whether the parameters of the theoretical distribution
hypothesized to describe the data are to be estimated from the sample or specified externally.
If specify is activated, as many boxes as there are parameters in the selected theoretical
distribution will appear, so that the user can enter their values. These boxes automatically
contain the sample estimates of each parameter. In the case of continuous variables, the
empirical distribution is constructed from the automatically generated class intervals. These
intervals can be open or closed at their lower and upper limits, depending on how the user
specifies them in the Frequency distribution - Fittings window.
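The two goodness of fit statistics mentioned above have standard forms, X² = Σ (O - E)²/E and G = 2 Σ O ln(O/E), summed over the classes, with observed frequency O and expected frequency E in each class. The sketch below computes both, together with expected class frequencies under a normal model. It is an illustration under stated assumptions, not InfoStat's implementation: the outer classes are treated as open-ended so that the expected frequencies add up to n, every expected frequency is assumed to be positive, and the function names are hypothetical. P-values are not computed here; they would come from a chi-square distribution with the degrees of freedom returned by `degrees_of_freedom`, which applies the usual rule of classes minus one minus the number of parameters estimated from the sample.

```python
import math


def normal_cdf(x, mu, sigma):
    """Normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def expected_counts_normal(limits, n, mu, sigma):
    """Expected class frequencies under N(mu, sigma^2).

    `limits` are the class limits LL_1, UL_1 = LL_2, ..., UL_k. The first and
    last classes are treated as open-ended (assumption) so that the expected
    frequencies add up to n.
    """
    k = len(limits) - 1
    expected = []
    for i in range(k):
        lo = -math.inf if i == 0 else limits[i]
        hi = math.inf if i == k - 1 else limits[i + 1]
        expected.append(n * (normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)))
    return expected


def chi_square(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E; assumes every E > 0."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def g_statistic(observed, expected):
    """Likelihood ratio statistic G = 2 * sum of O ln(O / E); classes with O = 0 add 0."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def degrees_of_freedom(k, n_estimated):
    """Usual rule: number of classes - 1 - number of estimated parameters."""
    return k - 1 - n_estimated
```

Classes with an observed frequency of zero are skipped in G because O ln(O/E) tends to 0 as O tends to 0; in the Chi-square statistic such a class still contributes E.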
Manual infostat eng

  • 2. Data Management i InfoStat User’s Manual Version 2012 InfoStat software and documentation are the result of the active and multidisciplinary participation of all the members of Grupo InfoStat, who are Copyright owners. Principal responsibilities and activities are as follows: Programming: Julio A. Di Rienzo Quality control: Fernando Casanoves, Laura A. Gonzalez, Mónica G. Balzarini Editorial director of the User’s Manual: Fernando Casanoves, Julio A. Di Rienzo Electronic version of the User’s Manual: Fernando Casanoves Online help: Elena M. Tablada Citation for this manual is as follows: Casanoves F., Balzarini M.G., Di Rienzo J.A., Gonzalez L., Tablada M., Robledo C.W. (2012). InfoStat. User Manual, Córdoba, Argentina The software to which this manual refers should be cited as follows: Di Rienzo J.A., Casanoves F., Balzarini M.G., Gonzalez L., Tablada M., Robledo C.W. InfoStat versión 2012. InfoStat Group, Facultad de Ciencias Agropecuarias, Universidad Nacioal de Córdoba, Argentina. URL http://www.infostat.com.ar Total or partial reproduction of this reference in identical or modified form, by any means, mechanical or electronic, including photocopying, recording or through the use of any information storage and recuperation system not authorized by the Copyright owners, is prohibited.
  • 3. Data Management ii Prologue InfoStat is a statistical software developed by Grupo InfoStat—a team of professionals in Applied Statistics, with a center at the Faculty of Agronomy at Cordoba National University (Facultad de Ciencias Agropecuarias, Universidad Nacional de Córdoba). The following professors of Statistics and Biometry participated in the elaboration of InfoStat: Julio A. Di Rienzo, Mónica G. Balzarini, Fernando Casanoves, Laura A. Gonzalez, Elena M. Tablada, and Carlos W. Robledo. InfoStat is a synthesis of experiences accumulated since 1982. It has been enriched by teaching experiences at the undergraduate and graduate levels, consulting in Statistics and the development of human resources in Applied Statistic. We are proud of InfoStat’s level of acceptance within university environments, at research and technological institutions, and among businesses devoted to the production of goods and services. This manual consists of four chapters: Data Management, Statistics, Graphs and Applications. The chapter on Data Management contains information on how to operate the program in order to use files, and it describes the activities that can be done with data tables. The chapter on Statistics describes the methodological tools that the user can select in analyzing his or her data. These descriptions are accompanied by examples of their implementation using InfoStat, and they are based on numerous real situations in which the application of one or more statistical techniques is beneficial. The chapter on Graphics also uses examples to describe the different types of graphical representations available. The chapter on Applications shows statistical methods used in the statistical quality control, the quantification of biodiversity and computational tools used to facilitate the teaching-learning process of classical statistical concepts. 
This manual reflects the state of development of InfoStat at the time of print; nevertheless, InfoStat keeps growing, improving and upgrading algorithms and user interfaces. Through InfoStat’s Help Menu, users can access the electronic version of the manual and a link to upgrade the manual.
  • 4. Data Management iii Table of contents Installation ____________________________________________________________ 1 Upgrading _____________________________________________________________ 1 Requirements __________________________________________________________ 1 General aspects_________________________________________________________ 2 Data Management ______________________________________________________ 5 File ________________________________________________________________________5 New table_________________________________________________________________5 Open table ________________________________________________________________5 Save table_________________________________________________________________8 Save table as ______________________________________________________________9 Close table ________________________________________________________________9 Edit________________________________________________________________________9 Data ______________________________________________________________________11 New row ________________________________________________________________12 Insert row________________________________________________________________12 Delete row _______________________________________________________________12 Deactivate case ___________________________________________________________12 Activate case _____________________________________________________________13 Invert selection ___________________________________________________________13 Choosing cases ___________________________________________________________13 New column______________________________________________________________15 Insert column_____________________________________________________________15 Delete column ____________________________________________________________15 Edit Labels_______________________________________________________________15 Read labels from… ________________________________________________________16 Data type 
________________________________________________________________16 Alignment _______________________________________________________________16 Decimals ________________________________________________________________16 Automatically adjust columns ________________________________________________16 Sort ____________________________________________________________________16 Categorize _______________________________________________________________18 Edit categories ____________________________________________________________20 Transforming _____________________________________________________________20 Create dummy variables ____________________________________________________23 Fill... ___________________________________________________________________23 Formula _________________________________________________________________29 Search __________________________________________________________________33 Resampling ______________________________________________________________33 Color selection____________________________________________________________33 Merge tables _____________________________________________________________34 Rearrange columns, one under the other ________________________________________34 Rearrange rows as columns __________________________________________________34 Create a new table using active cases __________________________________________35 Merge categories __________________________________________________________35
Output
Upload results
Save results
Decimals
Field separator
Typography
Export results to table
Statistics
Descriptive statistics
Summary statistics
Frequency tables
Probabilities and quantiles
Estimators of population characteristics
Definitions of terms associated with the sampling technique
Simple random sample
Stratified sample
Stratified sampling
Sample size calculation
Estimating a mean with a given precision
Inference in one and two populations
Inference based on one sample
Two-sample inference
Analysis of variance
Completely random design
Block design
Latin square design
Multiple comparisons
ANOVA assumptions
Analysis of covariance
Non-parametric analysis of variance
Kruskal-Wallis test
Friedman test
Validation of assumptions
Regression with dummy variables
Non linear analysis of regression
Correlation analysis
Correlation between distance matrices
Categorical data analysis
Contingency tables
Logistic regression
Kaplan-Meier survival analysis
Multivariate Analysis
Multivariate descriptive statistics
Hierarchical clustering methods
Non-hierarchical clustering methods
Distances
Principal components
Canonical correlations
Partial Least Squares Regression
Multivariate analysis of variance
Distances and association matrices
Principal coordinates analysis
Classification-regression trees
Biplot and MST
Generalized Procrustes analysis
Cross-correlations
Box and Jenkins methodology (ARIMA)
Fitting and smoothing
Series Tab
Legends
Applications
Quality control
Control chart for attributes
Variable control charts
Confidence intervals
All possible samples
Sampling from the empirical distribution
Biodiversity indices
Installation
To install InfoStat, visit our web page www.infostat.com.ar, then download and run the installer. Once the installation has completed successfully, the installer will have created a folder called InfoStat in C:\Program Files and a shortcut icon on the desktop. Inside the InfoStat folder, C:\Program Files\InfoStat, you should find the following:
Data folder: contains all the data files referred to in this manual.
Help file: contains the online help.
Manual.pdf file: contains the printed material that comes with the CD. The electronic version may be more up to date than the printed material.
Upgrading
Upgrading instructions can be accessed through the HELP menu. The UPGRADE option opens the InfoStat web page, where the latest version can be downloaded.
Requirements
Processor: Pentium or higher
Minimum suggested memory: 128 MB
Operating system: Windows XP or newer
Monitor configuration: minimum resolution of 800 x 600 pixels, small fonts. Large font settings may prevent parts of the InfoStat windows from being displayed correctly.
General aspects
InfoStat offers different tools that allow the user to explore information easily. When InfoStat is opened, a menu bar appears at the top of the main window containing the following menus: File, Edit, Data, Results, Statistics, Graphics, Windows, Help, and Applications. Below the menus, a toolbar contains a series of buttons for performing actions quickly. Every action available through a button can also be performed from one of the menus listed above. Hovering the mouse over a button, without clicking, displays a help label over the button as well as a legend at the bottom of the screen indicating the action performed by that button. These actions are, from left to right: New table, Open table, Save active table, Export table, Print, New column, Sort, Categories, Font, Align left, Align center, and Align right.
At the bottom of the screen there are three minimized windows, named Results, Graphs, and Graphical Tools. If the Results window is maximized as soon as the program is opened, InfoStat will report that no results are available; this window receives content as analyses that generate results are performed. The Graphs and Graphical Tools windows are activated only when a graph is generated.
In the FILE menu, InfoStat allows the user to open and save different types of data files. For example, selecting New Table opens a new, empty table. Using the keyboard, the user can enter information into this table, temporarily named New, and then use it to perform data analyses and produce results and
graphics. The Exit command, used to close the application, is also found in the FILE menu.
Commands for cutting, copying and pasting information from the data, results and graphics windows are found in the EDIT menu.
The DATA menu allows the user to perform different types of operations on a data grid: sorting a table, transforming columns, generating new columns from formulas, simulating random variables, and automatically finding and replacing information, among other actions.
From the OUTPUT menu the user can invoke actions related to the presentation and export of results in table format.
All generated results (tables and graphs) can be copied using the EDIT menu (Copy) and then pasted into a word processor. This is the simplest way to transfer results from InfoStat to a document or written report. The Copy and Paste commands are also the simplest way to import and export data between InfoStat and a word processor or a spreadsheet program such as Excel. To simplify the transfer of data tables, InfoStat provides the commands Copy with column name and Paste with column name, which preserve the names and labels of columns. It is also possible to import and export information in ASCII format. In this chapter, the options of the FILE, EDIT, DATA and OUTPUT menus are described with examples.
InfoStat works with three types of windows: one where data are found (Data), one where the results of requested procedures are displayed (Results), and one where graphs created by the user are shown and stored (Graphs). Several data windows can be kept open simultaneously. In that case, the active window is the one in front, with a colored (not gray) frame. All actions are executed on the active data window. The Results and Graphs windows contain a sheet for each result or graph produced.
The user can move across the sheets by clicking the tabs at the bottom of the window, which index the results.
In the STATISTICS menu, InfoStat makes it possible to perform a wide variety of statistical analyses in an almost automatic manner, through dialogue windows. The user can calculate descriptive statistics and probabilities; estimate population characteristics under different sampling plans; make inferences for one and two samples using different types of confidence intervals and hypothesis tests (parametric and non-parametric); fit regression models and analysis of variance for different types of experimental designs and observational studies; analyze categorical data; perform multivariate analyses; analyze time series; and apply smoothing and fitting to graphs.
After the desired analysis is selected for the data of the open (active) table, a window (Variables) appears listing all the columns of the file on the left-hand side, so that the user can select the column(s) to be included in the analysis, either as variables of interest or as classification criteria. The selected columns are moved to the Variables list on the right-hand side of the window using the right-pointing arrow button. If a variable was selected incorrectly or is no longer needed, it can be removed from the Variables list and returned to the list of columns by selecting it and pressing the left-pointing arrow button, or by double-clicking it.
The variable selector facilitates analysis, making it unnecessary to remember or type the names of the variables each time they are used.
In the GRAPHS menu, InfoStat provides professional-quality graphical tools for presenting results. The graphical techniques available are described in the chapter entitled “Graphs”. The program allows several series to be included in a single graph and all of their attributes to be edited interactively through the Graphical Tools window, which opens automatically when a graph is requested. InfoStat can copy a format and apply it to other series, which makes it easy to create graphical series with identical characteristics. Graphs created by InfoStat can be saved, or copied and pasted into any Windows application that supports images (enhanced metafile) by using the standard Windows Copy and Paste (or Paste Special) commands. All the tools on the GRAPHS menu are available in every version of InfoStat.
Through the WINDOWS menu, the user can move from one window to another; alternatively, the user can simply move the cursor to the desired window. The WINDOWS menu also lets the user choose how the open windows are arranged on the screen: in cascade, vertically or horizontally, by selecting the appropriate option (Cascade, Align vertical, or Align horizontal). From this menu, the user can access the Results window, where the results of the session that the user has not deliberately erased are stored, as well as the Graphs window. The names of open data tables are also listed.
Through the HELP menu, the user can access online documentation on the procedures and types of statistical analysis available from the enabled menus, as well as an electronic version of the InfoStat manual. This menu also provides quick access to software updates.
The APPLICATIONS menu provides traditional analysis tools for exploring data from specific areas of knowledge. The following applications are available: QUALITY CONTROL, TEACHING TOOLS, INDICES and DNA-MICROARRAY ANALYSIS. The TEACHING TOOLS application provides classical elements for teaching and learning applied statistics. The QUALITY CONTROL application contains tools frequently used in statistical quality control. Under INDICES, the user can calculate numerous biodiversity indices commonly used in ecology. The DNA-MICROARRAY ANALYSIS application provides procedures for normalizing, transforming, filtering, grouping and ordering genes, ordering microarrays, correcting p-values to control the false discovery rate (FDR), and testing p-values, among others.
When an option in any of the menus appears in gray instead of black, the option is not enabled. This may be because the user has not completed a previous step required for that action, or because the action is not available in the acquired version of InfoStat.
Data Management
InfoStat processes data stored in tables. A table is a set of data organized in rows and columns: columns usually represent variables, while rows usually represent observations. Column labels are the names assigned to the variables.
File
The actions (submenus) of the FILE menu used to manage tables are: NEW TABLE, OPEN…, SAVE TABLE, SAVE TABLE AS... and CLOSE TABLE. This menu also contains an EXIT option and a list of the most recently modified files.
New table
FILE menu ⇒ NEW TABLE creates a new table. The user can also press <Ctrl+N> or use the button with the blank sheet on the toolbar (New Table button). A table with one row and two columns appears, which can be expanded to enter data. New tables are numbered consecutively (New table, New table_1, New table_2, etc.).
Open table
FILE menu ⇒ OPEN… opens an existing table. The user can also press <Ctrl+O> or use the button with the picture of a file (Open Table button) on the toolbar. Pressing <Shift> while clicking the Open Table button gives direct access to the Data folder, which contains the files used in the examples in this manual. To open a table, the user should provide the information requested in the dialogue window. InfoStat can open files in the following formats:
InfoStat (*.IDB, *.IDB2)
Excel (*.XLS)
Graph (*.IGB)
Text (*.TXT, *.DAT)
Dbase (*.DBF)
Results (*.ITRES)
InfoGen (*.IGDB)
Paradox (*.DB)
EpiInfo (*.REC)
InfoStat assumes that in the data structure, columns represent variables and rows represent observations. All values of a given variable must be of the same data type (whole, real, categorical or date).
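The one-type-per-column rule can be pictured with a short Python sketch. It is not InfoStat code: the table is modelled as a plain dictionary of columns, the mapping of Python types to the four InfoStat types (with "whole" taken to mean integer) is an assumption, `None` stands for a missing value, and the column names and values are hypothetical.

```python
from datetime import date

# Assumed correspondence between Python types and InfoStat column types.
TYPE_NAMES = {int: "whole", float: "real", str: "categorical", date: "date"}


def column_type(values):
    """Return the single type shared by a column's values.

    Missing values (None) are ignored; a column that mixes types is rejected.
    """
    kinds = {TYPE_NAMES[type(v)] for v in values if v is not None}
    if len(kinds) > 1:
        raise ValueError(f"column mixes types: {sorted(kinds)}")
    return kinds.pop() if kinds else None


# Hypothetical table: each key is a variable (column), each position a case (row).
table = {
    "group": ["A", "B", "A"],
    "weight": [1.25, 0.75, None],
    "count": [17, 19, 15],
    "sampled": [date(2007, 5, 20), date(2007, 5, 21), date(2007, 5, 22)],
}

for name, values in table.items():
    print(name, column_type(values))
```

Running it prints one type per column; adding, say, a text value to the `count` column would raise the error instead.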
If the user opens an ASCII file with a TXT or DAT extension, the Import text window is activated. In this window the user can indicate the field separator to be used (tab, comma, semicolon, space or other). The data to be imported may contain the names of the variables (columns). If so, the user can indicate that the contents of Row 1 are to be used as the column names of the data table (InfoStat selects this option by default). If the file has a header with text before the column names, the user should indicate which line contains the names by changing the number next to the Row 1 option until the line with the column names is shown in the first row. If the data do not contain column names, the option Use first row as column name should be deselected; the variables will then be named Column 1, Column 2, etc. To check the table that will be created once the data are imported, press the Preview table button. If the structure is correct, press Accept; otherwise, change the options and press Preview table again until the desired result is obtained.
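The import options can be illustrated with a minimal Python sketch built on the standard `csv` module. The function `import_text` and its parameters are hypothetical names chosen for this illustration, not part of InfoStat, and merging consecutive separators is approximated by dropping empty cells. The example also shows why that merging option should stay off for text exported from Excel, where an empty cell is written as two consecutive separators: merging them shifts the remaining values into the wrong columns.

```python
import csv


def import_text(text, sep="\t", header_row=1, merge_consecutive=False):
    """Split delimited text into column names and data rows.

    header_row: 1-based line holding the column names, or None when the file
    has no names (columns are then called Column 1, Column 2, ...).
    Lines above the header row are discarded.
    merge_consecutive: treat a run of separators as one, approximated here
    by dropping empty cells.
    """
    rows = list(csv.reader(text.splitlines(), delimiter=sep))
    if merge_consecutive:
        rows = [[cell for cell in row if cell != ""] for row in rows]
    if header_row is None:
        width = max(len(row) for row in rows)
        return [f"Column {i}" for i in range(1, width + 1)], rows
    return rows[header_row - 1], rows[header_row:]


# Hypothetical file: one line of text before the names, one empty cell.
text = "Exported from a spreadsheet\nA,B,C\n1,,3\n4,5,6\n"

names, rows = import_text(text, sep=",", header_row=2)
print(names, rows)  # the empty cell keeps 3 under column C

_, merged = import_text(text, sep=",", header_row=2, merge_consecutive=True)
print(merged)  # 3 has shifted under column B
```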
Note: When a data table from Microsoft Excel is saved as text (with .TXT extension) and then imported, the empty cells of the original file appear as two consecutive separators. In this case, the option Consecutive separators are generated as one should not be selected. By default, InfoStat leaves this option unselected when a text file is opened. Also note that if a single column contains both numeric and alphanumeric data, InfoStat uses only the first value in the column to determine its type: if it is a number, the alphanumeric values are erased, and vice versa.
The simplest way to read files from another program is with the Copy and Paste functions. InfoStat provides the options Copy with column name and Paste with column name to facilitate importing and exporting data. For example, to import data from Excel, the user should simply copy the data to be transferred to InfoStat, including the column names, then open a new table in InfoStat and paste the copied content with the option Paste with column name.
Table toolbar
Right-clicking on a table makes several options available, including Toolbar. This option adds a bar of buttons to the active table. From left to right, these buttons allow the user to: increase the font size; reduce the font size; remove decimals (first click on a cell of the column of interest); add decimals (first click on a cell of the column of interest); insert a row (before a previously selected row); delete a previously selected row; add a column at the end of the table; insert a column (before a previously selected column); delete a previously selected column; and highlight a selection.
The font size can also be changed by pressing Ctrl and ↑ (to increase the size) or Ctrl and ↓ (to decrease it).
Variable management
This window appears when the user presses <Ctrl+E> while a table is active. The following actions are available in the dialogue box:
Rename variables: double-click on a variable name in the list of variables.
Move one or more variables: select the variables in the list and, while pressing <Ctrl>, move the selected block with the arrow buttons (↑ moves it up and ↓ moves it down). Changes in the order of the list are automatically applied to the table.
Delete one or more variables: select the variables in the list and click the Mark to eliminate button. The variables are removed from both the list and the table.
Deactivate / activate one or more variables: when the check box to the left of a label is unchecked, the variable is deactivated. (In the example, all the variables with a “1” in the label are activated and selected.) Deactivated variables appear neither in the table nor in the variable selector.
Form groups of variables: select the variables and press the Group selection button. The variables in a group can be activated or deactivated, colored, erased, etc., all together.
Save table
FILE menu ⇒ SAVE TABLE saves the active table in InfoStat format (with .IDB2 extension) in the current directory. The same can be done by pressing <Ctrl+S> or the Save active table button on the toolbar.
Save table as
FILE menu ⇒ SAVE TABLE AS saves the active table in the format and directory chosen by the user. The available formats are:
InfoStat (*.IDB, *.IDB2)
Excel (*.XLS)
Dbase (*.DBF)
ASCII (*.TXT)
InfoGen (*.IGDB)
Paradox (*.DB)
The Export table button on the toolbar can also be used. In the dialogue box, indicate the name, location and type of file. If ASCII format is selected, the user should choose a field separator and indicate whether the first row should contain the column names (labels). If desired, the user can also specify a character (or group of characters) that identifies a missing observation in the exported file.
Close table
FILE menu ⇒ CLOSE TABLE closes the active table. Alternatively, the user can press <Ctrl+W>. If the table has been modified and not saved, InfoStat asks the user whether to save it.
Edit
The actions (submenus) of the EDIT menu used to manage InfoStat tables are: Cut, Copy, Paste, Copy with column name, Paste with column name, Undo and Select all. They are used to edit cells, columns and/or rows, in the same way as text is edited in Windows.
Data in an InfoStat table are modified directly in the active table. Pressing <Enter> enters the typed characters into the cell. Pressing <Esc> before <Enter> restores the previous content of the cell. To stop editing, use the arrow keys (up, down, left, right) or the Tab key, or select another cell with the mouse. To select a group of cells, drag the mouse over the desired area. Alternatively, select cells with the keyboard by holding down the <Shift> key and using the arrow keys. The highlighted areas can be printed by pressing the Print button on the toolbar.
The font type, style, size and color can be chosen for the entire table by selecting a cell and pressing the button with the letter “A” on the toolbar. Buttons for aligning data to the right, left, and center of the column are located next to the “A” button.
In tables with .IDB2 format, a description of the data contained in the table can be saved. The description is edited by pressing F2, which opens a field for writing it. Pressing the second button on the toolbar of this dialogue window inserts the field into the file. To include the description permanently in the data file, the user should save the table.
A description can be loaded from a file in TXT or RTF format by pressing the first button on the same toolbar.
Data
The actions (submenus) of the DATA menu used to manage InfoStat tables are: New row, Insert row, Delete row, Deactivate case, Activate case, Invert selection, Select cases, New column, Insert column, Delete column, Edit categories, Edit label, Read labels from…, Data type, Alignment, Decimals, Variable manager, Categorize, Fill, Generate a class-variable according to cell color, Adjust column width, Sort, Transformation, Create dummy variables, Formula, Search, Sampling-Resampling, Color selection, Merge tables, Rearrange columns, one under the other, Rearrange rows as columns, Create new table using active cases, Make a new column by merging categorical variables, Split a category in its components, Update, and Show-edit data table description. These actions can also be invoked by right-clicking on the data table.
The following example illustrates some of the actions performed by these submenus.
Example 1: The user has a set of observations on seed size (Size), color of the episperm (Episperm), percentage of germination (PG), number of normal seedlings (NP) and dry weight (DW) of seeds of Atriplex cordobensis, a forage shrub. The data are in the file Atriplex.idb (courtesy of Dr. M.T. Aiazzi, Faculty of Agricultural Sciences, U.N.C.).
Note: Files used in this manual are located in C:\Program Files\InfoStat\Data.
New row
DATA menu ⇒ NEW ROW adds the number of rows specified by the user in the dialogue window to the end of the table. Alternatively, the user can position the cursor on the last row and press <Enter> to generate new rows.
Insert row
DATA menu ⇒ INSERT ROW inserts a new row above the selected row.
Delete row
DATA menu ⇒ DELETE ROWS removes the selected row(s) from the table. This action can be undone with the Undo submenu of the Edit menu.
Deactivate case
DATA menu ⇒ DEACTIVATE CASE excludes the selected rows from the procedures to be executed. To deactivate a single row, double-click on its case number. Deactivated observations show their case number in parentheses, and the corresponding row is colored.
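Case activation behaves like a flag attached to each row. The Python sketch below is a hypothetical model, not InfoStat code: the class `CaseTable` and its methods are invented for illustration, and rows are indexed from 0 internally while case numbers are shown from 1. It covers deactivating and reactivating cases, the INVERT SELECTION action, the parenthesized case numbers, and analyses that use only active cases.

```python
class CaseTable:
    """Hypothetical model of case (row) activation in a data table."""

    def __init__(self, n_rows):
        self.active = [True] * n_rows  # every case starts active

    def deactivate(self, rows):
        for i in rows:
            self.active[i] = False

    def activate(self, rows):
        for i in rows:
            self.active[i] = True

    def invert(self):
        # Active cases become inactive and vice versa.
        self.active = [not a for a in self.active]

    def case_labels(self):
        # Deactivated cases show their (1-based) number in parentheses.
        return [str(i + 1) if a else f"({i + 1})" for i, a in enumerate(self.active)]

    def active_rows(self, data):
        # Only active cases take part in an analysis.
        return [row for row, a in zip(data, self.active) if a]


cases = CaseTable(4)
cases.deactivate([1])
print(cases.case_labels())                  # case 2 is shown as (2)
print(cases.active_rows([10, 20, 30, 40]))  # the value of case 2 is left out
```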
Activate case
DATA menu ⇒ ACTIVATE CASE reactivates cases that have been deactivated (activated cases take part in the analysis). To activate a single row, double-click on its case number. To activate several cases at once, select a cell in each row to be activated and choose Activate case from the DATA menu or from the menu that appears when right-clicking. All selected cases are activated.
Invert selection
DATA menu ⇒ INVERT SELECTION activates cases that are deactivated and deactivates cases that are active.
Choosing cases
DATA menu ⇒ SELECT CASES... allows the user to establish criteria for selecting cases. Once the action is executed, unselected cases are deactivated. First, the user should establish the variables to which the selection criteria will be applied, then specify the criteria. The Select cases dialogue window lists the variables of the active table. From this list, the user should select the variables to which the selection criteria will be applied and move them to the corresponding box on the Variables tab (a partition can be indicated on the corresponding tab). When there are many variables, several procedures make their selection easier. At the bottom of the list of variables, there are options for selecting variables that share a particular characteristic in their names. If the variables share a specific character or
sequence of characters, they can be selected simultaneously. The figure illustrates the selection of all the variables whose names contain the letter P, once the option (…) box has been activated. To specify that the character or sequence of characters must be at the beginning of the label, activate the option […) box; to indicate that it must be at the end of the label, activate the option (…] box. Wildcard characters can also be used. For example, entering the sequence “**1” selects all variables whose labels have two characters before the number 1; entering “??1” selects all variables whose labels contain a “1” preceded by two alphabetical characters; and entering “##1” selects all variables whose labels contain a “1” preceded by two numerical characters.
If groups have been formed (using the Variable manager window), the box labeled {g} becomes available. Activating this box displays a field with the list of available groups, from which groups can be selected.
Variables can also be selected from a list saved in a text file; all the variables contained in the file will then be selected. To do so, right-click on the box that contains the list of variables of the active table and choose Select from a list, followed by Text file. The same menu also has an option for sorting the list of variables alphabetically.
Once the variables have been selected, the criteria for selecting cases should be established. The variables that participate in the selection appear in the dialogue box, together with a field for writing the criteria. When the criterion involves more than one variable, the user should select one of the variables, write the sentence that states the criterion, for example x<80, and then press Enter.
The user should proceed in the same way with each variable of interest. When Accept is pressed, the cases outside the selection appear deactivated in the active table (colored and with their case number in parentheses).
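The two steps of the Select cases dialogue, choosing variables by a name pattern and then deactivating the cases that fail the criteria, can be sketched in Python. This is an illustration under stated assumptions, not InfoStat's implementation: the wildcards are read as `*` for any character, `?` for a letter and `#` for a digit; the pattern may match anywhere in the label; and criteria on different variables are combined with a logical AND. The helper names, variable names and data are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate a label pattern: * = any character, ? = letter, # = digit."""
    pieces = {"*": ".", "?": "[A-Za-z]", "#": "[0-9]"}
    return re.compile("".join(pieces.get(ch, re.escape(ch)) for ch in pattern))


def select_variables(names, pattern):
    # Assumption: the pattern may occur anywhere in the label.
    regex = pattern_to_regex(pattern)
    return [name for name in names if regex.search(name)]


def select_cases(table, criteria):
    """Return one active flag per row; a row stays active only if it meets
    the criterion of every listed variable (criteria combined with AND)."""
    n_rows = len(next(iter(table.values())))
    return [
        all(test(table[var][i]) for var, test in criteria.items())
        for i in range(n_rows)
    ]


names = ["PG1", "NP1", "X221", "A1"]
print(select_variables(names, "??1"))  # two letters before the 1
print(select_variables(names, "##1"))  # two digits before the 1

# Hypothetical data; criterion x<80 on one variable, y>1 on another.
table = {"x": [75, 82, 60, 90], "y": [1, 2, 3, 4]}
active = select_cases(table, {"x": lambda v: v < 80, "y": lambda v: v > 1})
print(active)  # False marks a case that would be deactivated
```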
More than one sentence can be written to define the criterion for a single variable; press Enter after each sentence. If the Create new table using active cases option is activated, a table with the selected cases is generated.
New column
DATA menu ⇒ NEW COLUMN adds a new column at the end of the table. The data type can be indicated (whole, real, categorical, or date). Added columns are named Column 1, Column 2, etc. Pressing the button with the image of a table on the toolbar also adds new columns to the right-hand side of the active table. Columns generated in this way have no type assigned beforehand; the type is assigned automatically when content is entered into any of their cells. If the content is numerical, the type assigned is real; if it is alphanumeric, the type assigned is categorical. If the user wants the type to be whole, he should change it afterwards from real.
Insert column
DATA menu ⇒ INSERT COLUMN inserts a column before the one where the cursor is located. The data type (real, whole, categorical or date) can be indicated. Inserted columns are named Column 1, Column 2, etc.
Delete column
DATA menu ⇒ DELETE COLUMN removes the selected column(s). The user need only select one cell of each column. This action can be reversed with the Undo submenu of the Edit menu.
Note: To move a column, select it while pressing <Ctrl>, drag it with the mouse button held down to the new position, and release the button. The column will remain in its new location.
Edit Labels
DATA menu ⇒ EDIT LABELS allows the user to change the name of a column. Position the cursor on a cell of the column to be renamed and choose this action. Names may include spaces and ASCII characters, up to a limit of twenty characters.
If the name begins with a number, InfoStat adds the letter C in front of it. If several columns are selected before this action is applied, a dialogue window appears that lets the user change the column names one after another. In files with the IDB2 extension, double-clicking the edit field that contains the variable name opens a dialogue where the user can write a description of the variable. To keep the description in the file, the file must be saved.

Read labels from…
DATA menu ⇒ READ LABELS FROM… reads the names of the variables in the active table from a text file (*.txt). InfoStat assumes that the names are listed one per line, in the same order as the variables appear in the table.

Data type
DATA menu ⇒ DATA TYPE declares the type of data in a column. The accepted types are whole, real, categorical, and date. Dates can be entered in the formats 20/05/07, 20-05-07, or 20.5.07. If no data type is declared, InfoStat assigns the type that corresponds to the first value entered. Once the type has been declared, only data of that type can be entered.

Alignment
DATA menu ⇒ ALIGNMENT changes how the content of the selected cells is positioned: left, center, or right. By default, numerical cells are right-aligned and categorical cells are left-aligned. Alignment buttons are also available on the toolbar, next to the “A” button.

Decimals
DATA menu ⇒ DECIMALS changes the number of decimal places shown for the numerical content of the cells. Up to 10 decimal places are allowed; the default is 2. When data are copied from the grid, only the visible decimals are copied, so it is important to set the desired number of decimals for each variable.

Automatically adjust columns
DATA menu ⇒ ADJUST COLUMN WIDTH (<Ctrl+L>) adjusts the width of the selected columns to the length of the column labels or of the cell content. If no column is selected, the action is applied to all the columns of the table.

Sort
DATA menu ⇒ SORT sorts the records in ascending or descending order of the values in one or more columns.
A dialogue window lists the columns of the active table on the left. On the right, two lists, Ascending order and Descending order, hold the variables to sort by; the sorting hierarchy is given by the order in which the variables were selected. For example, if a file has two columns, gender and age, and gender is placed first, in the ascending group, while age is placed second, in the descending group, the file is sorted by gender and, within each gender, in descending order of age. The buttons in the lower part of the dialogue window change the sorting criterion (ascending or descending) and the sorting hierarchy.

As an example, the observations of the Atriplex file were sorted in descending order of the variable PG (germination). The result is shown in the following table:

Table 1: Atriplex file sorted in descending order by variable PG.

Size    Color    Germination  Normal seedlings  DW
medium  reddish  100          80                0.0032
big     yellow   93           80                0.0040
medium  yellow   93           80                0.0038
medium  yellow   93           80                0.0043
small   reddish  93           7                 0.0030
big     yellow   87           87                0.0043
medium  yellow   87           54                0.0033
...     ...      ...          ...               ...
small   dark     20           0                 0.0030
medium  dark     13           7                 0.0030

Sorting can also be invoked from the toolbar with the Sort icon.
Warning: this action cannot be automatically undone. To keep the original file, close the table without saving the changes, save the sorted file under another name, or sort again in a way that recovers the original order of the data.
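The two-level sort described above (gender ascending, then age descending within each gender) can be illustrated with a short Python sketch. It mirrors the sorting rule only, not InfoStat's implementation; the helper `sort_records` and the sample records are hypothetical.

```python
# Hypothetical helper illustrating a hierarchical sort; not InfoStat code.

def sort_records(records, keys):
    """Sort records (dicts) by several columns.

    `keys` lists (column, descending) pairs from highest to lowest priority.
    Python's sort is stable (also with reverse=True), so sorting by the
    lowest-priority key first and the highest-priority key last yields the
    hierarchical order.
    """
    result = list(records)
    for column, descending in reversed(keys):
        result.sort(key=lambda row: row[column], reverse=descending)
    return result

people = [
    {"gender": "M", "age": 30},
    {"gender": "F", "age": 25},
    {"gender": "M", "age": 41},
    {"gender": "F", "age": 52},
]
# gender in ascending order, then age in descending order within each gender
for row in sort_records(people, [("gender", False), ("age", True)]):
    print(row["gender"], row["age"])
```

Relying on sort stability keeps each pass simple; an equivalent single pass would need a composite key that negates numeric columns.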
Categorize
DATA menu ⇒ CATEGORIZE categorizes the data in a previously selected column, generating a new column with the desired categorization. This action is available only when the selected column contains whole or real data. Two procedures are available: assign categories to intervals, or assign categories to numeric codes.

With assign categories to intervals, categories are defined by setting the upper limits of a set of class intervals, and cases that fall in the same class are assigned the same category. Depending on how the class intervals are established, the following methods are available:

FIXED: generates as many intervals of equal length as categories are requested. The minimum and maximum values, the interval length, and the upper limit of each category are shown; the categories are identified as C1, C2, etc. To identify the categories with whole numbers instead, activate the Numbers box. By default the categories are sorted in ascending order; to reverse this, activate the Descendent box. The Minimum and Maximum values can be changed to obtain the desired categorization. To execute the categorization, press Accept.

PROBABILISTIC: the upper limit of each category is a percentile of the distribution of the variable, according to the number of intervals requested. For example, if 4 intervals are requested, their limits are the 25th, 50th, 75th, and 100th percentiles. To apply the categorization, press Accept.

CUSTOMIZED: the user enters the upper limit of each interval. Select the number of categories to create and enter the upper limit of each interval in the adjacent table. By default, the upper limit of the last category is the maximum observed value. To apply the categorization, press Accept.
As an example, the observations of the Atriplex file (previously sorted by the variable PG in descending order) were categorized by intervals; the results are shown in Table 2. With the FIXED option, the preset configuration was used: number of categories: 5; minimum: 13; maximum: 100; interval length: 17.4; upper interval limits: 30.4, 47.8, 65.2, 82.6, and 100. With the PROBABILISTIC option, 5 categories were requested, giving the upper limits 33, 60, 80, 87, and 100. With the CUSTOMIZED option, two categories were defined: one for germination values less than or equal to 80%, specified by writing 80 in the LS1 field, and another for values greater than 80%, specified in the LS2 field, where the number 100 appears by default.
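The three interval rules can be checked against Table 2 with a minimal Python sketch. It assumes that a value equal to an upper limit belongs to that category, and that a PROBABILISTIC limit is the smallest observed value whose empirical cumulative proportion reaches i/k; with the 27 germination values listed in Table 2, this assumption reproduces the limits 33, 60, 80, 87, and 100 quoted above. The helper names are hypothetical, not InfoStat functions.

```python
import bisect

# Hypothetical helpers reproducing Table 2; not InfoStat code.

def fixed_limits(values, k):
    """Upper limits of k equal-length intervals between the minimum and maximum."""
    lo, hi = min(values), max(values)
    length = (hi - lo) / k
    # The last limit is the maximum itself, which avoids floating-point drift.
    return [lo + length * i for i in range(1, k)] + [hi]

def probabilistic_limits(values, k):
    """Upper limits at the 100*i/k percentiles, i = 1..k.

    Assumption: the percentile is the ceil(n*i/k)-th smallest observed value.
    """
    data = sorted(values)
    n = len(data)
    # -(-a // b) is integer ceiling division; subtract 1 for a 0-based index.
    return [data[-(-n * i // k) - 1] for i in range(1, k + 1)]

def categorize(value, limits):
    """C1, C2, ...: the first category whose upper limit is >= value."""
    return f"C{bisect.bisect_left(limits, value) + 1}"

# Germination (PG) values of the Atriplex file, as listed in Table 2.
germination = [100, 93, 93, 93, 93, 87, 87, 87, 87, 87, 80, 80, 80, 73,
               73, 66, 60, 60, 53, 53, 40, 33, 33, 26, 20, 20, 13]

fixed = fixed_limits(germination, 5)          # about 30.4, 47.8, 65.2, 82.6, 100
prob = probabilistic_limits(germination, 5)   # 33, 60, 80, 87, 100
custom = [80, 100]                            # LS1 = 80, LS2 = 100
for pg in (100, 87, 73, 60, 40, 26):
    print(pg, categorize(pg, fixed), categorize(pg, prob), categorize(pg, custom))
```

Each printed row can be compared with the Fixed, Prob., and Cust. columns of Table 2.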
Table 2: Atriplex file with the variable PG categorized according to three criteria.

Germ.   Fixed  Prob.  Cust.    Germ.  Fixed  Prob.  Cust.
100.00  C5     C5     C2       73.00  C4     C3     C1
93.00   C5     C5     C2       66.00  C4     C3     C1
93.00   C5     C5     C2       60.00  C3     C2     C1
93.00   C5     C5     C2       60.00  C3     C2     C1
93.00   C5     C5     C2       53.00  C3     C2     C1
87.00   C5     C4     C2       53.00  C3     C2     C1
87.00   C5     C4     C2       40.00  C2     C2     C1
87.00   C5     C4     C2       33.00  C2     C1     C1
87.00   C5     C4     C2       33.00  C2     C1     C1
87.00   C5     C4     C2       26.00  C1     C1     C1
80.00   C4     C3     C1       20.00  C1     C1     C1
80.00   C4     C3     C1       20.00  C1     C1     C1
80.00   C4     C3     C1       13.00  C1     C1     C1
73.00   C4     C3     C1

Germ.: germination (PG); Fixed, Prob., Cust.: categories obtained with the FIXED, PROBABILISTIC, and CUSTOMIZED options.

With assign categories to numeric codes, the categories can be read from a table or entered by the user. This is useful, for example, when a file uses numeric codes to represent the states of qualitative variables. The corresponding dialogue window is shown below. The list of numbers to be categorized appears on the left, and an empty list of categories appears on the right. The categories can be entered manually, read from a text file, or read from a table stored on the clipboard. The text file should contain one line per category, and each line should have a number followed by a separator symbol (“=”, “:”, “.” or a tab) and the name of the category associated with that number. For example, if, when recording the type of occupation, the number 2 corresponds to the category “unemployed”, the line should read 2=unemployed. If the categories are to be read from a table on the clipboard, that table must first have been copied from a file with the structure described for text files. These loading options are selected from the menu that appears when right-clicking the assignment table, as shown in the figure. The Copy from clipboard option appears only when a table is on the clipboard. To obtain the categorization, press Accept.
The categories appear in a new column whose label is the prefix “Cat” followed by the name of the categorized variable. The figure shows an edit field containing Cat_Occupation, which can be changed by typing a new name. When a numerical variable is categorized using an assignment table, the table can be read from the description of the resulting variable.
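A minimal sketch of reading an assignment table in the text-file format described above (one number, a separator, and a category name per line, with “=”, “:”, “.” or a tab as separator). The function names are hypothetical, and leaving codes that are missing from the table as `None` is an assumption of this sketch, not documented InfoStat behavior.

```python
import re

# Hypothetical helpers; the line format follows the text above.
LINE = re.compile(r"^(\d+)\s*[=:.\t]\s*(.+)$")

def parse_code_table(text):
    """Parse lines such as '2=unemployed' into {2: 'unemployed'}."""
    table = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"cannot parse line: {raw!r}")
        table[int(match.group(1))] = match.group(2).strip()
    return table

def apply_codes(codes, table):
    """Map numeric codes to category names; unknown codes become None (assumption)."""
    return [table.get(code) for code in codes]

occupation = parse_code_table("1=employed\n2=unemployed\n3:retired\n4\tstudent\n")
print(apply_codes([2, 1, 4, 3], occupation))
```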
Edit categories
To apply this action, first select the column that contains the categories. DATA menu ⇒ EDIT CATEGORIES opens the Edit categories dialogue window, which lists the existing categories of the selected variable (column). When a category is selected, its name appears in an edit field above the list, where it can be modified; the change is shown immediately in the list. Pressing Accept applies the changes to the data table.
A category can be grouped with another one by using the arrow buttons Upper limit and Lower limit. If a category is selected and the right arrow button (Lower limit) is pressed, the selected category is “included” in the category that precedes it in the list. When Accept is pressed, the included category disappears from the data table and is replaced by the category that contains it. Another way to include one category in another is to drag it with the mouse onto the category that is to contain it. If a category has been placed in the wrong category, it can be relocated by dragging it to the correct one. Before Accept is pressed, the inclusion can be reversed by selecting the included category and pressing the left arrow button (Upper limit). The up (↑) and down (↓) arrows change the position of the categories in the list. Once the categorization is satisfactory, press Accept so that the changes are reflected in the data sheet.
To make data entry easier for categorical variables, each category is associated with a number given by its position in the list shown in the Edit categories dialogue window.
For example, if the categories are “small”, “medium”, and “large” and they appear in the list in that order, typing “1” in a cell of the column that contains these categories and pressing <Enter> displays “small”. If the order of the categories in the list is changed, the numeric coding follows the new order. If the variable is changed from categorical to whole, each category is replaced by the number that corresponds to its position in the list. The toolbar button shown in this paragraph lets the user edit categories without going through the DATA menu.

Transforming
Invoking this action opens the Transformations window, where the user selects the variable(s) to transform; these should be quantitative variables. After Accept is pressed, a second window appears for selecting the transformation. It contains two lists of transformations: one applied to each variable separately and another applied to a combination of variables. Whichever transformation is selected, InfoStat generates new columns with the transformed variables, automatically named with the name of the transformation followed by an underscore and the name of the original variable.

Selecting the transformation: the available transformations include Standardize, Standard (by row), Center, Center (by row), Externally studt res (externally studentized residuals), Rank, Normal score, Log10 (base 10 logarithm), Log2 (base 2 logarithm), Ln (natural logarithm), Square root, Inverse, Power, ArcSin (square root (p)), Probit, Logit, Complement log-log, Map to [0,1], if >= mean then 1 else 0, if >= median then 1 else 0, Multiply by, and Scale by the maximum. If two or more variables are selected, the transformations in the Combining variables list can also be executed.

Standardize: standardizes the selected variable(s) by subtracting the column mean from each observation and dividing the result by the standard deviation of the column values.
Standardize (by row): when more than one variable is selected in the transformations menu, the “standardize by row” option becomes enabled. Each entry in the table is then standardized using the mean and standard deviation of the elements in its row.
Center: centers by column; InfoStat subtracts from each observation the mean of the variable, computed from the data in the corresponding column.
Center (by row): InfoStat subtracts from each value of a selected variable the row mean, computed from all the selected variables.
Externally studt res (externally studentized residuals): for a location model, define
$$ERS_i=\frac{y_i-\bar{y}_{(-i)}}{S_{(-i)}}$$
where $y_i$ is the value of the discarded observation, $\bar{y}_{(-i)}$ is the mean of the data without observation $y_i$, and $S_{(-i)}$ is the standard deviation of the data calculated after that observation is discarded.
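The column-wise and row-wise transformations above, and the externally studentized residual as defined here, can be sketched in Python. The sketch assumes the sample standard deviation (divisor n−1), which the manual does not state explicitly; the helper names are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical helpers illustrating the definitions above (not InfoStat code).
# Assumption: the sample standard deviation (divisor n - 1) is used.

def standardize(column):
    """Subtract the column mean and divide by the column standard deviation."""
    m, s = mean(column), stdev(column)
    return [(y - m) / s for y in column]

def center(column):
    """Subtract the column mean from each observation."""
    m = mean(column)
    return [y - m for y in column]

def by_row(rows, transform):
    """Apply a column transformation to each row (the "by row" variants)."""
    return [transform(list(row)) for row in rows]

def externally_studentized(column):
    """ERS_i = (y_i - mean without y_i) / (sd without y_i); needs >= 3 values."""
    result = []
    for i, y in enumerate(column):
        rest = column[:i] + column[i + 1:]
        result.append((y - mean(rest)) / stdev(rest))
    return result

print(standardize([2, 4, 6]))              # column-wise standardization
print(by_row([[1, 3], [2, 6]], center))    # row-wise centering
print(externally_studentized([1, 2, 3, 10]))
```

In the last call the value 10 is far from the other three observations, so its ERS (8.0) stands out clearly.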
Rank: replaces each value with its position in the list of data sorted in ascending order. In a set of n data, the smallest observation receives rank 1, the second smallest receives rank 2, and so on; the largest observation receives rank n. If two or more observations share the same value (a tie), each of them receives the average of the consecutive ranks they occupy. For example, the series 10, 20, 20, 30, 40, 50, 50, 50, 60 is transformed into 1, 2.5, 2.5, 4, 5, 7, 7, 7, 9.
Normal score: the Rank transformation is first applied to the selected variable. Each rank is then divided by (n+1), where n is the number of data in the sample, and the inverse of the Normal(0,1) distribution function is evaluated at each quotient.
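A short Python sketch of the Rank rule with averaged ties and of the Normal score built from it. It uses `statistics.NormalDist` from the standard library for the inverse Normal(0,1) distribution function; the helper names are hypothetical.

```python
from statistics import NormalDist

# Hypothetical helpers; not InfoStat's implementation.

def average_ranks(values):
    """Rank in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Grow the block of tied values in sorted order.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for j in range(start, end + 1):
            ranks[order[j]] = shared_rank
        start = end + 1
    return ranks

def normal_scores(values):
    """Inverse standard normal CDF evaluated at rank / (n + 1)."""
    n = len(values)
    z = NormalDist()  # mean 0, standard deviation 1
    return [z.inv_cdf(r / (n + 1)) for r in average_ranks(values)]

print(average_ranks([10, 20, 20, 30, 40, 50, 50, 50, 60]))
# [1.0, 2.5, 2.5, 4.0, 5.0, 7.0, 7.0, 7.0, 9.0]
```

Dividing by n+1 rather than n keeps every quotient strictly inside (0,1), so the inverse distribution function is always finite.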
Logarithm transformations: InfoStat can generate variables with Log10 (base 10 logarithm), Log2 (base 2 logarithm), and Ln (natural logarithm). If the value to be transformed is less than or equal to zero, the result is a missing value; in that case log(y+c), where c is a constant, can be used instead.
Square root: $\sqrt{y}$ or $\sqrt{y+c}$, where c is a constant.
Inverse: $1/y$.
Power: $y^{\lambda}$ with $\lambda\neq 0$, where λ is the desired power.
ArcSin (square root (p)): $\sin^{-1}\left(\sqrt{p}\right)$ with $p\in[0,1]$ (arcsine of the square root of the proportion).
Probit: $\mathrm{Probit}(p)=F^{-1}(p)$ with $p\in(0,1)$, where $F^{-1}$ is the inverse of the standard normal distribution function.
Logit: $\mathrm{Logit}(p)=\ln\left(p/(1-p)\right)$ with $p\in(0,1)$.
Complement log-log: $\mathrm{CLL}(p)=\ln\left[-\ln(1-p)\right]$ with $p\in(0,1)$.
Map to [0,1]: given a set of observations $\{y_1,\dots,y_n\}$, subtracts the minimum of the set from each value and divides the result by the range (the difference between the maximum and the minimum).
If >= mean then 1 else 0: dichotomizes the data around the mean of the observations; observations greater than or equal to the mean take the value 1, and the rest take 0.
If >= median then 1 else 0: dichotomizes the data around the median of the observations; observations greater than or equal to the median take the value 1, and the rest take 0.
Accumulate: generates a column whose t-th element is the sum of the first t elements. For example, if the column contains the values 10, 12, and 20, this action generates 10, 22, and 42.
Scale by the maximum: divides the selected columns by their maximum.
Divide by the sum: divides the selected columns by their sum.
Fill with a sequence: replaces the non-missing values of the selected columns with a sequence that follows the order of the records.
Combining variables: applies functions that involve several columns of the file.
The variables involved in the selected function are specified in the variable selector. The available functions are Sum, Mean, Median, Variance, Standard deviation, Minimum, Maximum, and Linear combination. The Sum function adds the values of the selected columns in each row of the file and generates a new variable named Sum. Similarly, the Mean, Median, Variance, Standard deviation, Minimum, or Maximum of the values in each row can be requested.
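Several of the single-variable transformations and the row-wise Combining variables functions described above can be sketched with Python's standard library. Helper names are hypothetical; `None` stands in for InfoStat's missing value, storing the table as a dictionary of equal-length columns is an arbitrary choice for this illustration, and the n−1 divisor used by `statistics.variance` and `statistics.stdev` is assumed to match InfoStat's.

```python
import math
from itertools import accumulate
from statistics import NormalDist, mean, median, stdev, variance

# Hypothetical helpers illustrating the definitions above (not InfoStat code).

def probit(p):
    return NormalDist().inv_cdf(p)            # p in (0, 1)

def logit(p):
    return math.log(p / (1 - p))              # p in (0, 1)

def cloglog(p):
    return math.log(-math.log(1 - p))         # p in (0, 1)

def log10_or_missing(y):
    return math.log10(y) if y > 0 else None   # None stands for a missing value

def map_to_unit(column):
    lo, hi = min(column), max(column)
    return [(y - lo) / (hi - lo) for y in column]

def at_least(column, threshold):
    """1 if the observation is >= threshold (mean or median), else 0."""
    return [1 if y >= threshold else 0 for y in column]

# Row-wise "Combining variables" functions.
ROW_FUNCTIONS = {
    "Sum": sum, "Mean": mean, "Median": median, "Variance": variance,
    "Standard deviation": stdev, "Minimum": min, "Maximum": max,
}

def combine(table, columns, name):
    """Evaluate ROW_FUNCTIONS[name] across the selected columns, row by row."""
    func = ROW_FUNCTIONS[name]
    n_rows = len(table[columns[0]])
    return [func([table[c][r] for c in columns]) for r in range(n_rows)]

data = [1, 2, 3, 10]
print(at_least(data, mean(data)), at_least(data, median(data)))
print(list(accumulate([10, 12, 20])))          # Accumulate
table = {"X": [1, 2], "Y": [3, 6], "Z": [5, 1]}
print(combine(table, ["X", "Y", "Z"], "Mean"))
```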
When Linear combination is selected, the coefficients of the combination must be entered in the Coefficients window one at a time, pressing <Enter> after each entry. Thus, if there are two columns, say X and Y, and the numbers 2 and 3 are entered as coefficients, a new column containing the linear combination 2X+3Y is generated.

Create dummy variables
In some statistical applications, for example those involving regression models, a categorical variable X with k categories must be transformed into k−1 binary variables (with values 0 or 1). A binary variable of this type is known as a dummy (auxiliary or indicator) variable, and the set of k−1 dummy variables identifies each category of the original variable X. For example, if X has k=3 categories, two dummy variables D1 and D2 are enough to represent each category: D1=1 and D2=0 can identify the first category, D1=0 and D2=1 the second, and D1=0 and D2=0 the third. In this case the third category (the one in which all dummy variables equal zero) is usually called the reference category.
To generate dummy variables, select the original categorical variable and press Accept. The Dummy variable generator then appears, listing the original variable(s) and the categories available for each one. The first category is automatically selected as the reference category; to use another category instead, move the cursor to it to select it. InfoStat generates k−1 dummy variables and adds them to the data table; each is named after the original variable followed by an extension that distinguishes it.
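A minimal sketch of the linear combination and of the dummy-variable rule (k−1 indicator columns, with the reference category coded as all zeros). Keeping categories in order of first appearance and keying each dummy column by its category are assumptions of this sketch; the helper names are hypothetical.

```python
# Hypothetical helpers, not InfoStat code. Assumptions: categories are kept in
# order of first appearance, and each dummy column is keyed by its category.

def linear_combination(columns, coefficients):
    """Row-wise sum of coefficient * column, e.g. 2X + 3Y."""
    return [sum(a * x for a, x in zip(coefficients, row)) for row in zip(*columns)]

def dummy_variables(values, reference=None):
    """Return k-1 indicator columns for a variable with k categories.

    The reference category (the first one unless another is given) gets no
    column, so it is the category for which every dummy equals 0.
    """
    categories = list(dict.fromkeys(values))
    if reference is None:
        reference = categories[0]
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }

print(linear_combination([[1, 2], [10, 20]], [2, 3]))   # X = [1, 2], Y = [10, 20]
size = ["small", "medium", "large", "small"]
print(dummy_variables(size))                            # reference: "small"
print(dummy_variables(size, reference="large"))
```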
The Multiply by… option in the Dummy variable generator window can be used to obtain the product of a dummy variable and another variable of interest. These products appear in new columns of the data table, with names that indicate their origin. An example of this option is given in Regression with dummy variables.

Fill...
This option automatically fills a group of selected cells according to the chosen option. Select the desired cell(s) and choose the option from the DATA ⇒ FILL... menu.
Warning: these actions replace the content of the selected cells, so to preserve the original column, copy it first and apply the fill to the copy.
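Three of the Fill options described in this section (Downward, Sequence (begin, step), and seeded random values, where a seed of 0 means a random seed as explained under Select seed) can be mimicked in Python. This is only an illustration: Python's `random` module is a different generator, so its numbers will not match InfoStat's, and the helper names are hypothetical.

```python
import random

# Hypothetical helpers that mimic some Fill options; not InfoStat code.

def fill_down(column):
    """Replace each empty cell (None) with the nearest non-empty value above it."""
    filled, last = [], None
    for value in column:
        if value is not None:
            last = value
        filled.append(last)
    return filled

def sequence(n, begin=1, step=1):
    """Sequence (begin, step): begin, begin + step, begin + 2*step, ..."""
    return [begin + step * i for i in range(n)]

def fill_random(n, seed=0, normal=False):
    """Uniform(0,1) or standard normal values; a seed of 0 means a random seed."""
    rng = random.Random(None if seed == 0 else seed)
    if normal:
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [rng.random() for _ in range(n)]

print(fill_down([5, None, None, 8, None]))                     # [5, 5, 5, 8, 8]
print(sequence(4, begin=1, step=2))                            # [1, 3, 5, 7]
print(fill_random(3, seed=123) == fill_random(3, seed=123))    # True
```

A fixed nonzero seed makes the sequence reproducible, which is the purpose of the Select seed option.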
Downward
Empty cells are filled with the content of the nearest filled cell above them in the same column. This action can also be performed by pressing <Ctrl+D>.

With sequence
Starting with the first selected cell, the selected cells are assigned natural numbers in ascending order. The numbering continues into the columns to the right and does not restart with each new column.

With uniform (0,1)
The selected cells are assigned values of a continuous random variable with a uniform distribution between 0 and 1.

With Standard normal (0,1)
The selected cells are assigned values of a random variable with a standard normal distribution (mean 0, variance 1).

Others...
InfoStat offers a wide range of distributions and can fill cells with: 1) realizations of the random variable, 2) the cumulative distribution function evaluated at arguments read from the selected cells, 3) the inverse distribution function evaluated at the selected values, and 4) the probability function evaluated at the selected values. The following distributions are available: Uniform, Normal, Student T, Chi square, Non central F, Exponential, Gamma, Weibull, Logistic, Gumbel, Poisson, Binomial, Geometric, Hypergeometric, and Negative binomial. The Sequence (begin, step) option is also available; it fills cells with a sequence of real numbers whose first value and spacing are set in the Parameters (begin and step) subwindow that opens when Sequence (begin, step) is selected. For example, if the beginning number is 1 and the step is 2, the selected column starts with 1 and continues with 3, 5, and so on. To fill cells with realizations, the cumulative distribution function, the inverse distribution function, or the probability function of one of the available random variables, select the random variable and specify, in the Parameters panel, the constants that characterize the distribution.

Select seed: by default, InfoStat uses a random seed to generate random numbers; however, it is sometimes useful to generate a reproducible random sequence. This can be done by entering any nonzero number in the edit field that is activated when the Select seed button is pressed. If zero is entered as the seed, InfoStat treats the seed as random, so the sequences differ from run to run.

A brief description of the available distributions follows.
Note: E(X) and V(X) denote the expected value and the variance of the random variable X, respectively.

Uniform (a,b): a continuous random variable X has a Uniform distribution on the interval [a,b] if its density function is
$$f(x;a,b)=\frac{1}{b-a}\,I_{[a,b]}(x)$$
where $I_{[a,b]}(x)$ is the indicator function and the parameters satisfy $-\infty<a<b<\infty$. E(X)=(a+b)/2 and V(X)=(b−a)²/12.

Normal (mean, variance): a continuous random variable X, with −∞<x<∞, has a Normal distribution if its density function is
$$f(x;m,v)=\frac{1}{\sqrt{2\pi v}}\,e^{-(x-m)^{2}/(2v)}$$
where the parameters m (mean) and v (variance) satisfy −∞<m<∞ and v>0. InfoStat uses m and v to represent E(X)=µ and V(X)=σ², respectively.
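Because the Normal is parametrized here by its variance v rather than its standard deviation, care is needed when reproducing it elsewhere. For example, Python's `statistics.NormalDist` takes the standard deviation, so σ = √v must be passed. The sketch below (hypothetical helper names) checks this against the density formula above.

```python
import math
from statistics import NormalDist

# Hypothetical helpers; NormalDist is from the Python standard library.

def normal_from_mean_variance(m, v):
    """Normal(m, v) as parametrized above: v is a variance, so sigma = sqrt(v)."""
    return NormalDist(mu=m, sigma=math.sqrt(v))

def normal_pdf(x, m, v):
    """Density written exactly as in the formula above."""
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

dist = normal_from_mean_variance(10, 4)      # mean 10, variance 4 (sd 2)
print(dist.pdf(11), normal_pdf(11, 10, 4))   # the two values agree
print(dist.cdf(12))                          # P(X < 12), one sd above the mean
```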
Student-T (ν): the continuous random variable X (with −∞<x<∞) has a Student-T distribution with ν degrees of freedom if its density function is
$$f(x;\nu)=\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{x^{2}}{\nu}\right)^{-(\nu+1)/2}$$
where ν is a positive whole number known as the degrees of freedom and Γ(·) is the gamma function, which has the following form:
$$\Gamma(r)=\int_{0}^{\infty}y^{r-1}e^{-y}\,dy$$
E(X)=0 for ν>1, and V(X)=ν/(ν−2) for ν>2.

Chi square (ν, lambda) (non-central): the random variable X has a non-central Chi square distribution if its density function is
$$f(x;\nu,\lambda)=\sum_{j=0}^{\infty}\frac{e^{-\lambda}\lambda^{j}}{j!}\,\frac{x^{(\nu+2j)/2-1}\,e^{-x/2}}{2^{(\nu+2j)/2}\,\Gamma\left(\frac{\nu+2j}{2}\right)}\,I_{(0,\infty)}(x)$$
where $I_{(0,\infty)}(x)$ is the indicator function, ν is a positive whole number that denotes the degrees of freedom, Γ(·) is the gamma function, and λ≥0 is the non-centrality parameter, with $\lambda^{j}=1$ when λ=0 and j=0. E(X)=ν+2λ and V(X)=2(ν+4λ). If λ=0, the distribution is the central Chi square.

F non-central (u, ν, lambda): the continuous random variable X has a non-central F distribution with u degrees of freedom in the numerator, ν degrees of freedom in the denominator, and non-centrality parameter λ if its density function is
$$f(x;u,\nu,\lambda)=\sum_{j=0}^{\infty}\frac{e^{-\lambda}\lambda^{j}}{j!}\,\frac{\Gamma\left(\frac{u+2j+\nu}{2}\right)}{\Gamma\left(\frac{u+2j}{2}\right)\Gamma\left(\frac{\nu}{2}\right)}\left(\frac{u}{\nu}\right)^{(u+2j)/2}\frac{x^{(u+2j)/2-1}}{\left(1+\frac{u}{\nu}x\right)^{(u+2j+\nu)/2}}\,I_{(0,\infty)}(x)$$
where $I_{(0,\infty)}(x)$ is the indicator function, u and ν are positive whole numbers, Γ(·) is the gamma function, and λ≥0, with $\lambda^{j}=1$ when λ=0 and j=0. If λ=0, the distribution is the central F, with E(X)=ν/(ν−2) for ν>2 and V(X)=2ν²(u+ν−2)/[u(ν−2)²(ν−4)] for ν>4.

Exponential (lambda): the continuous random variable X has an Exponential distribution if its density function is
$$f(x;\lambda)=\lambda e^{-\lambda x}\,I_{(0,\infty)}(x)$$
where $I_{(0,\infty)}(x)$ is the indicator function and λ>0. E(X)=1/λ and V(X)=1/λ².

Gamma (r, lambda): the continuous random variable X has a Gamma distribution if its density function is
$$f(x;r,\lambda)=\frac{\lambda^{r}}{\Gamma(r)}\,x^{r-1}e^{-\lambda x}\,I_{(0,\infty)}(x)$$
where $I_{(0,\infty)}(x)$ is the indicator function, r>0 and λ>0, and Γ(·) is the gamma function. E(X)=r/λ and V(X)=r/λ².

Beta (a, b): the continuous random variable X has a Beta distribution if its density function is
$$f(x;a,b)=\frac{1}{B(a,b)}\,x^{a-1}(1-x)^{b-1}\,I_{(0,1)}(x)$$
where $I_{(0,1)}(x)$ is the indicator function, a>0, b>0, and B(a,b) is the beta function, given by
$$B(a,b)=\int_{0}^{1}x^{a-1}(1-x)^{b-1}\,dx,\qquad a>0,\ b>0$$
E(X)=a/(a+b) and V(X)=ab/[(a+b+1)(a+b)²].

Weibull (a, b): the continuous random variable X has a Weibull distribution if its density function is
$$f(x;a,b)=a\,b\,x^{b-1}e^{-a x^{b}}\,I_{(0,\infty)}(x)$$
where $I_{(0,\infty)}(x)$ is the indicator function, a>0 and b>0. E(X)=(1/a)^{1/b}Γ(1+1/b) and V(X)=(1/a)^{2/b}[Γ(1+2/b)−Γ²(1+1/b)], where Γ(·) is the gamma function.

Logistic (a,b): the continuous random variable X has a Logistic distribution if its cumulative distribution function is
$$F(x;a,b)=\left[1+e^{-(x-a)/b}\right]^{-1}$$
where −∞<a<∞ and b>0. E(X)=a and V(X)=π²b²/3.

Gumbel or extreme value (a,b): the continuous random variable X has a Gumbel distribution if its cumulative distribution function is
$$F(x;a,b)=\exp\left(-e^{-(x-a)/b}\right)$$
where −∞<a<∞ and b>0. E(X)=a+bγ, where γ≈0.577216 is Euler's constant, and V(X)=π²b²/6.

Poisson (lambda): this distribution models count variables, where the counts refer to the number of events of interest in a unit of time or space (hours, minutes, m², m³, etc.). A discrete random variable X has a Poisson distribution if its probability function is
$$f(x;\lambda)=\frac{e^{-\lambda}\lambda^{x}}{x!}\,I_{\{0,1,\dots\}}(x)$$
where $I_{\{0,1,\dots\}}(x)$ is the indicator function and λ>0. E(X)=λ and V(X)=λ.
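Two of the moment formulas above can be checked numerically: the non-central Chi square density, a Poisson-weighted mixture of central Chi square densities with ν+2j degrees of freedom, should have mean ν+2λ, and the Gumbel distribution should have mean a+bγ. The sketch truncates the series after a fixed number of terms (an arbitrary choice) and uses a rough midpoint-rule integration; it is a check of the formulas, not how InfoStat computes them, and all names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, gamma ~ 0.577216

def noncentral_chi2_pdf(x, nu, lam, terms=200):
    """Poisson(lam)-weighted mixture of central chi-square densities (nu + 2j df).

    Computed on the log scale to avoid overflow; E(X) = nu + 2*lam here.
    """
    if x <= 0:
        return 0.0
    total = 0.0
    for j in range(terms):
        if lam == 0:
            if j > 0:
                break                  # only the j = 0 term has weight 1
            log_weight = 0.0
        else:
            log_weight = -lam + j * math.log(lam) - math.lgamma(j + 1)
        half_df = nu / 2 + j
        log_density = ((half_df - 1) * math.log(x) - x / 2
                       - half_df * math.log(2) - math.lgamma(half_df))
        total += math.exp(log_weight + log_density)
    return total

def gumbel_pdf(x, a, b):
    """Derivative of F(x; a, b) = exp(-exp(-(x - a) / b))."""
    z = (x - a) / b
    return math.exp(-z - math.exp(-z)) / b

def midpoint_mean(pdf, lo, hi, steps):
    """Rough numerical E(X): integral of x * f(x) over [lo, hi], midpoint rule."""
    h = (hi - lo) / steps
    return sum((lo + (i + 0.5) * h) * pdf(lo + (i + 0.5) * h) * h for i in range(steps))

# Numerical means next to the closed forms nu + 2*lam and a + b*gamma.
print(midpoint_mean(lambda x: noncentral_chi2_pdf(x, 3, 2, terms=60), 0, 100, 4000), 3 + 2 * 2)
print(midpoint_mean(lambda x: gumbel_pdf(x, 1, 2), -40, 100, 14000), 1 + 2 * EULER_GAMMA)
```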
Binomial (n, p): this distribution arises when the following conditions hold simultaneously: a) Bernoulli trials are performed, b) the parameter p (probability of “success”) is constant from trial to trial, and c) the trials are independent of each other.
Bernoulli distribution: in some experiments there are only two possible results: success or failure, presence or absence, yes or no, etc. A Bernoulli variable is a binary variable that identifies these events; for example, x=1 may represent success and x=0 failure. E(X)=p and V(X)=p(1−p), where p is the probability of success.
A discrete random variable X has a Binomial distribution if its probability function is
$$f(x;n,p)=\binom{n}{x}p^{x}q^{n-x}\,I_{\{0,1,\dots,n\}}(x)$$
where $I_{\{0,1,\dots,n\}}(x)$ is the indicator function, 0≤p≤1, q=1−p, and n=1,2,... is the total number of trials. E(X)=np and V(X)=npq.

Geometric (p): this distribution is of special interest for modeling the number of failures that occur before the first success. A discrete random variable X has a Geometric (or Pascal) distribution if its probability function is
$$f(x;p)=p(1-p)^{x}\,I_{\{0,1,\dots\}}(x)$$
where $I_{\{0,1,\dots\}}(x)$ is the indicator function, 0<p≤1, and q=1−p. E(X)=q/p and V(X)=q/p².

Hypergeometric (m,k,n): this distribution is associated with sampling without replacement: elements of the population are drawn at random one at a time, and drawn elements are not returned to the population, until the sample is complete. Let a population consist of m elements, k of which are in one of two possible states (success) and m−k in the other (failure). As with the Binomial distribution, the problem of interest is to find the probability of obtaining x successes in a sample of size n.
A discrete random variable X has a Hypergeometric distribution if its probability function is
$$f(x;m,k,n)=\frac{\binom{k}{x}\binom{m-k}{n-x}}{\binom{m}{n}}\,I_{\{0,1,\dots,n\}}(x)$$
where $I_{\{0,1,\dots,n\}}(x)$ is the indicator function, m=1,2,..., k=0,1,...,m, and n=1,2,...,m. E(X)=n(k/m) and V(X)=n(k/m)((m−k)/m)((m−n)/(m−1)).

Negative binomial (m,k): as with repeated Bernoulli trials, certain problems common in studies of natural populations concern the probability of finding x individuals in a sampling unit when the individuals tend to be aggregated (contagious distribution). InfoStat computes these probabilities with the Negative binomial distribution. A discrete random variable X has a Negative binomial distribution if its probability function is
$$f(x;m,k)=\frac{k(k+1)(k+2)\cdots(k+x-1)}{x!}\left(\frac{p}{q}\right)^{x}\frac{1}{q^{k}}\,I_{\{0,1,\dots\}}(x)$$
where $I_{\{0,1,\dots\}}(x)$ is the indicator function, p=m/k, and q=p+1. The parameters satisfy m>0 (average number of individuals per sampling unit) and k>0 (contagion or aggregation parameter).

Formula
A formula can be specified whose results either replace the content of an existing column or are placed in a new column.
Warning: the names of the variables used in the calculation should not contain parentheses, mathematical operator symbols, or names of reserved functions, but they may contain accent marks and the letter ñ.
The dialogue window is shown below:
During a work session, formulae are stored in a list as they are written, so they remain available for later use. To display them, right-click the field where formulae are written. The dialogue window shows a list of available variables, which can be included in a formula by clicking their names in the list. When variables are added this way, their names appear in quotes, which allows names containing spaces or mathematical symbols to be used without those symbols being interpreted as operators.
The user can apply predefined functions or define new ones. In the latter case, the function is written in the panel below the formula edit field. For example, cube(x) is not a predefined function, but it can be defined in the User defined functions panel by writing cube(x)=x*x*x. This definition allows the cube function to be applied to any variable in the active table or to any other valid expression. For example, writing h=cube(COLUMN2) in the formula field applies the cube function to the data in COLUMN2.
If the variables involved in the formula have very long names, the names can be replaced in the formula by %#, where # is the number of the column that holds the variable. For example, if the data table has 3 columns, %1 denotes the first column, %2 the second, and %3 the third. To see the correspondence between column names and numbers, press the Alt key: while it is held down, the columns of the active table are labeled %#.
To apply a function that accepts multiple arguments, such as mean(.), min(.), or max(.), to a block of variables, use the notation f(%a:%b), where f denotes the function and %a and %b denote the column numbers at the beginning and end of the block, respectively. Note that the beginning and end of the block are separated by a colon (:). Continuing with the example above, the average of the first 3 variables in the file is obtained with mean(%1:%3). Another way to apply a function such as mean() to a group of variables is the format mean(variable1:variableN), which requests the mean of all the variables from the first to the nth one indicated.
This expression can be written manually or automatically, by selecting the block of variables in the list of variables.

IDB2 data tables save the formulae that generate the contents of a column, so the content of a column can be updated by applying its formula again. To do so, select the column and choose Update from the Data menu or from the context menu that appears when right-clicking. The dialogue opens in Macros mode with the corresponding formula (or formulae, if more than one column was selected). These formulae can be edited or executed, individually or jointly, to update the column content. The data table can be modified while the formulae window is kept open.

To specify a formula, select DATA ⇒ FORMULA and write, for example, the expression Y=LN(COLUMN1)+3 in the window. The following operators and functions are predefined in InfoStat:
+ : addition operator.
- : subtraction operator.
* : multiplication operator.
/ : division operator.
^ : exponentiation operator (only positive numbers in the base).
( : open parenthesis.
) : close parenthesis.
e : constant 2.71828…
PI: constant 3.141592653…
SETSEED(x): sets the random seed to a given initial value; use it with any integer as argument.
ABS(x): absolute value of x (Range of x: -1e4932...1e4932).
ARCCOSINE(x) or ARCCOSIN(x): arccosine of x.
ARCSINE(x) or ARCSIN(x): arcsine of x.
AREAY(y1;…;yn): calculates the area under the curve defined by the ordered pairs (Y,X), assuming that the values of X are equally spaced by one unit.
AREAYX(y1;x1;…;yn;xn): calculates the area under the curve defined by the ordered pairs (Y,X).
ATAN(x): arctangent of x (Range of x: -1e4932...1e4932).
COSINE(x) or COS(x): cosine of x (Range of x: -1e18...1e18).
SQUARE(x) or SQR(x): square of x (Range of x: -1e2446...1e2446).
STDEV(x1;x2;…;xn): calculates the standard deviation of the indicated variables.
DISTNORMAL(x;m;v): calculates the cumulative probability up to x for a normal distribution with mean m and variance v.
EXP(x): exponential e^x (Range of x: -11356...11356).
FACTORIAL(x): factorial of x.
GAMMA(x): assigns values of the Gamma distribution to the values of the indicated function.
INVNORMAL(p;m;v): calculates the value x such that P(X<x)=p, with X~N(m,v).
LN(x): natural logarithm of x (Range of x: 0...1e4932).
LN2(x): base 2 logarithm of x.
LOG10(x): base 10 logarithm of x.
MAX(x1;x2;…;xn): calculates the maximum value of the indicated data group.
MEAN(x1;x2;…;xn): calculates the mean of the values of the indicated variables.
MEDIAN(x1;x2;…;xn): calculates the median of the values of the indicated variables.
MIN(x1;x2;…;xn): calculates the minimum value of the indicated data group.
MOD(x): modulus (or remainder) operator (applicable only to integers).
NORMA(x1;x2;…;xn): calculates the norm of the vector (x1, x2, …, xn).
NORMAL(m, v): generates realizations of a normal random variable with mean m and variance v.
ROUND(x): rounds x (Range of x: -1e9...1e9).
SQRT(x): square root of x (Range of x: 0...1e4932).
SINE(x) or SIN(x): sine of x (Range of x: -1e18...1e18).
SUM(x1;x2;…;xn): sum of the values of the indicated variables.
TANGENT(x): tangent of x.
TRUNC(x): integer part of x (Range of x: -1e9...1e9).
URN: generates realizations of a random variable with uniform distribution.
UNIFORM(a, b): generates realizations of a random variable with uniform distribution on the interval (a, b).
VARIANCE(x1;x2;…;xn): calculates the variance of the values of the indicated variables.
ZRN: generates realizations of a random variable with standard normal distribution.

To work with date type variables, the following functions are available (the arguments required by each function are in parentheses).
DIADELCICLO(date,day,month): generates a column that contains the day of the cycle (on a scale of 1 to 365) corresponding to each date, taking into account that the cycle begins on the day and month specified in the arguments. For example, if the user enters day=DIADELCICLO(date,1,9) in the formula field, a column named day is generated; it contains whole numbers between 1 and 365, one for each value of the date column, where day 1 of the cycle is September 1. Thus, if the date column reads 18/09/07, the day column will contain 18; if the date column reads 03/10/07, the day column will contain 33.
FECHADELDIADELCICLO(diadelciclo,day,month,year): returns the date that corresponds to the specified day of the cycle, given the day, month and year on which the cycle begins.
If the year argument is omitted, the current year is used. This function is the inverse of DIADELCICLO.
DIAJULIANO(date): generates a column containing the Julian day that corresponds to each value in the date column.
YEAR(date): generates a column containing the year that corresponds to each value in the date column.
MONTH(date): generates a column containing the month that corresponds to each value in the date column.
DAY(date): generates a column containing the day of the month that corresponds to each value in the date column.
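The day-of-cycle arithmetic behind DIADELCICLO and its inverse FECHADELDIADELCICLO can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not InfoStat's implementation: the function names day_of_cycle and date_of_cycle_day are hypothetical, and leap days are assumed to count as ordinary days, so a cycle that contains 29 February has 366 days rather than 365.

```python
from datetime import date, timedelta
from typing import Optional


def day_of_cycle(d: date, day: int, month: int) -> int:
    """Position (1, 2, ...) of date `d` in a yearly cycle starting on day/month.

    Hypothetical analogue of DIADELCICLO. Leap days count as ordinary days
    (assumption), so a cycle containing 29 February has 366 days.
    A cycle start of 29 February is not handled.
    """
    start = date(d.year, month, day)
    if start > d:
        # The cycle that contains `d` began in the previous calendar year.
        start = date(d.year - 1, month, day)
    return (d - start).days + 1


def date_of_cycle_day(n: int, day: int, month: int, year: Optional[int] = None) -> date:
    """Date of day `n` of the cycle that starts on day/month/year.

    Hypothetical analogue of FECHADELDIADELCICLO; when `year` is omitted,
    the current year is used.
    """
    if year is None:
        year = date.today().year
    return date(year, month, day) + timedelta(days=n - 1)


# Cycle starting on 1 September, as in DIADELCICLO(date,1,9).
print(day_of_cycle(date(2007, 9, 18), 1, 9))   # 18
print(day_of_cycle(date(2007, 10, 3), 1, 9))   # 33
print(date_of_cycle_day(33, 1, 9, 2007))       # 2007-10-03
```

Because September has 30 days, 3 October falls on day 30 + 3 = 33 of a cycle that starts on 1 September, and converting 33 back gives the original date.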
DATE(day, month, year): generates a column containing the date that corresponds to the specified day, month and year.

Search

DATA menu ⇒ SEARCH opens a dialogue window for searching, within a previously selected part of the table, for numbers, categories or dates that are equal to, greater than, less than and/or different from a value specified by the user. The values found can be replaced by another value (Replace box), excluded from the analysis (Deactivate case box), or highlighted by coloring their cells (Color it box). The search can match the complete cell content (if the Whole cell box is activated) or certain elements within a text cell. After each replacement or deactivation, the search tool reports the number of cases that were found or deactivated.

Resampling

DATA menu ⇒ SAMPLING/RESAMPLING allows the user to draw samples from a group of data using the bootstrap, jackknife, random sampling with replacement, or random sampling without replacement methods. The bootstrap method samples at random with replacement and generates samples of size n, equal to the size of the original sample, while the random sampling with replacement option allows the user to generate samples of a size different from n. The column from which the samples are to be drawn should be indicated, as well as classification and/or partition criteria, if any. Then, the user should select a sampling technique (in the Resampling method panel) and the values to be reported for the samples (Save panel). If bootstrap is selected, the number of samples to be drawn should be entered in the Bootstrap field; if random sampling with or without replacement is selected, both the number of samples (in # of samples) and their size (in Sample size) should be indicated.
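The sampling schemes listed above can be sketched with Python's standard random module. This is a minimal sketch, not InfoStat's code: the function names and the data values are hypothetical, and the mean is used as the only summary statistic. The jackknife shown is the usual leave-one-out scheme, which produces n samples of size n-1.

```python
import math
import random
import statistics


def bootstrap_samples(data, n_samples, rng):
    """Bootstrap: samples of size n = len(data), drawn with replacement."""
    return [rng.choices(data, k=len(data)) for _ in range(n_samples)]


def random_samples(data, n_samples, size, rng, replace=True):
    """Random samples of a user-chosen size, with or without replacement."""
    if replace:
        return [rng.choices(data, k=size) for _ in range(n_samples)]
    return [rng.sample(data, size) for _ in range(n_samples)]


def jackknife_samples(data):
    """Jackknife: the n leave-one-out samples, each of size n - 1."""
    return [data[:i] + data[i + 1:] for i in range(len(data))]


def jackknife_se(data, stat=statistics.mean):
    """Jackknife standard error of a statistic computed on each leave-one-out sample."""
    n = len(data)
    thetas = [stat(s) for s in jackknife_samples(data)]
    theta_bar = sum(thetas) / n
    return math.sqrt((n - 1) / n * sum((t - theta_bar) ** 2 for t in thetas))


data = [20, 35, 48, 60, 60, 72, 81, 93, 22]  # hypothetical values
rng = random.Random(2012)  # fixed seed so that the samples are reproducible
boot_means = [statistics.mean(s) for s in bootstrap_samples(data, 1000, rng)]
print("bootstrap SE of the mean:", round(statistics.stdev(boot_means), 2))
print("jackknife SE of the mean:", round(jackknife_se(data), 2))
```

For the mean, the jackknife standard error equals the classical S.D./√n exactly, which makes a convenient check; the bootstrap value is only approximately equal and changes with the seed.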
The Save panel can report the values of the variable that make up each of the requested samples (Samples option), as well as one or several summary statistics for each sample (Mean, Median, Maximum, Minimum, Range, Variance, Standard deviation (S.D.), Standard error, Coefficient of variation (C.V.), Sum, Sum of squares, Median absolute deviation (MAD), Percentiles P01, P05, P10, P20, P25, P50, P75, P80, P90, P95 and P99, Kurtosis and Skewness). The results are shown in a new table. If the values of the variable are requested, the new table will have a column for each sample. If one or more summary statistics are requested, the new table will contain each sample and each statistic in a column.

Color selection

DATA menu ⇒ COLOR SELECTION allows the user to color a group of previously selected cells. When a variable is colored, it appears in that color in the Variables selector list. This is useful, for example, for using colors to distinguish groups of variables.
Merge tables

DATA menu ⇒ MERGE TABLES allows the user to merge the active table with other tables, horizontally or vertically. Tables are merged one at a time.

A horizontal merge adds columns to the active table to include the new information and requires the user to select one or more merging criteria. Once these criteria are established, a dialogue window appears from which the table and the columns to be merged into (added to) the active table are selected. The window contains a list of the tables open on the screen, from which the table to be merged should be chosen. If the desired table is not listed, press the Other table button to open it from its location; the table is then added to the list. When a table is selected from the list, its column (variable) names appear with activated check boxes, indicating which variables will be added to the active table. The user can deactivate those that should not take part in the merge. If both tables have columns with the same name, InfoStat appends a number to the name of the added column to distinguish it from the existing column with the same name. To replace the content of the same-named columns in the active table instead, activate the Overwrite box. Upon completing the horizontal merge, only the requested columns are added; no other information from the source table is included.

A vertical merge adds new rows to the active table to hold the information in the columns that coincide, and creates new columns for the variables that do not coincide. The process is similar to the one described for a horizontal merge, except that there is no need to specify merging criteria.
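The two kinds of merge can be sketched in Python with tables represented as lists of dictionaries (one dictionary per row). This is a minimal sketch under stated assumptions: the function names are hypothetical, rows of the active table with no match in the other table receive None, and a clashing column name gets the suffix 1 (or the next free number), since InfoStat's exact numbering rule is not documented here.

```python
def merge_horizontal(active, other, keys, columns, overwrite=False):
    """Add the chosen `columns` of `other` to `active`, matching rows on `keys`.

    Unmatched rows get None (assumption). A name that already exists in
    `active` gets a numeric suffix unless overwrite=True (suffix rule assumed).
    """
    lookup = {tuple(r[k] for k in keys): r for r in other}
    taken = set(active[0]) if active else set()
    names = {}
    for col in columns:
        name = col
        if col in taken and not overwrite:
            i = 1
            while f"{col}{i}" in taken:
                i += 1
            name = f"{col}{i}"
        names[col] = name
        taken.add(name)
    merged = []
    for row in active:
        match = lookup.get(tuple(row[k] for k in keys))
        new_row = dict(row)  # copy, so the input table is left unchanged
        for col, name in names.items():
            new_row[name] = match[col] if match is not None else None
        merged.append(new_row)
    return merged


def merge_vertical(active, other):
    """Stack the rows of `other` under `active`; cells of non-shared columns become None."""
    columns = list(dict.fromkeys(c for table in (active, other) for row in table for c in row))
    return [{c: row.get(c) for c in columns} for row in active + other]


active = [{"plot": 1, "yield": 5.0}, {"plot": 2, "yield": 6.5}]
other = [{"plot": 2, "yield": 7.0, "rain": 30}, {"plot": 1, "yield": 4.0, "rain": 25}]
print(merge_horizontal(active, other, keys=["plot"], columns=["yield", "rain"]))
print(merge_vertical(active, [{"plot": 3, "rain": 10}]))
```

In the horizontal example, the incoming yield column clashes with the existing one and is added as yield1; passing overwrite=True would replace the active table's yield values instead.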
Rearrange columns, one under the other

DATA menu ⇒ REARRANGE COLUMNS, ONE UNDER THE OTHER stacks the content of two or more columns into a single column. The columns to be stacked are selected in the Columns option of the dialogue window, and they are stacked in the order in which they were selected. The user may also choose to copy the information from a column of interest (Copy... option). There is an option to include only the active cases. Clicking Go generates a new table that shows the result.

Rearrange rows as columns

DATA menu ⇒ REARRANGE ROWS AS COLUMNS allows the user to transfer the content of the rows of the active table to the columns of a new table, according to classification criteria established by the user. In the Columns option of the dialogue window, indicate the variables whose data will appear in the columns of the new table; in the Partition criteria option, indicate the variables that will define the columns of the new table. The user may also copy the entries of a particular column of interest (Copy... option). The new table appears upon clicking OK.
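Stacking columns one under the other and rearranging rows as columns are inverse operations, similar to converting a table between wide and long layouts. The Python sketch below shows both; the function names, the optional copy argument and the output column names "Column" and "Value" are hypothetical, and placing the i-th value of each partition level in row i is an assumption.

```python
def stack_columns(table, columns, copy=None):
    """Stack `columns` one under the other, in the order given.

    `copy` optionally names one extra column whose entries are carried along.
    """
    stacked = []
    for col in columns:  # selection order determines the stacking order
        for row in table:
            new_row = {"Column": col, "Value": row[col]}
            if copy is not None:
                new_row[copy] = row[copy]
            stacked.append(new_row)
    return stacked


def rows_as_columns(table, value, partition):
    """Spread `value` into one new column per level of `partition`.

    The i-th value of each level goes to row i (assumption); shorter
    columns are padded with None.
    """
    groups = {}
    for row in table:
        groups.setdefault(row[partition], []).append(row[value])
    n_rows = max((len(v) for v in groups.values()), default=0)
    return [
        {level: (vals[i] if i < len(vals) else None) for level, vals in groups.items()}
        for i in range(n_rows)
    ]


wide = [{"A": 1, "B": 10}, {"A": 2, "B": 20}]
long = stack_columns(wide, ["A", "B"])
print(long)
print(rows_as_columns(long, value="Value", partition="Column"))  # back to the original layout
```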
Create a new table using active cases

DATA menu ⇒ CREATE NEW TABLE USING ACTIVE CASES generates a new table that contains only the active cases of an open table that also contains inactive cases.

Merge categories

DATA menu ⇒ MAKE A NEW COLUMN BY MERGING CATEGORICAL VARIABLES allows the user to obtain the combinations that result from crossing the categories of two or more variables. In the dialogue window, under the Partition criteria option, indicate the variables to be crossed. Upon clicking OK, a new column with the classes obtained by the merge appears in the table.

Output

The OUTPUT menu shows the actions that can be applied to the active result (the last result of an action requested from the Statistics or Applications menu). To activate a previously obtained result, click on the tab that indexes that result at the foot of the RESULTS window. The OUTPUT menu offers the following options:

Upload results
This allows the user to open a file that contains results saved during a work session. The file name and location are specified in the dialogue window.

Save results
This allows the user to create a file containing the results obtained during a work session. The file name and location are specified in the dialogue window. The files have a “.ITRES” extension.

Decimals
This item displays a submenu that allows the user to select the number of decimals to be shown. At the bottom of this menu there is an option for exponential notation; if a result cannot show any significant digit with the specified number of decimals, InfoStat uses exponential notation.

Field separator
This allows the user to select the character (space, tab, comma or semicolon) that separates the columns of a table; the default separator is a space. Usually this separator does not need to be changed, but changing it can be useful when results are exported as a table.

Typography
This allows the user to change the typographical attributes (font and font style) used to present results. This action can also be invoked with the “A” button on the Toolbar.

Export results to table
This allows the user to export the text of a Results window as a table. Selecting this action opens a dialogue window called Text Importer. For details on operating this window, see OPEN TABLE in the DATA menu.

Access to results submenus by right-clicking
In addition to the actions in the OUTPUT menu, the following options are available by right-clicking when a Results window is active:
Decimals: sets the number of decimals shown in the active window.
Copy: copies the previously selected text, using tabs as field separators. The text can be pasted directly into word processors to build tables.
Delete: deletes the active result.
Delete present and previous windows: deletes the active result as well as all previous results.
Print: prints the content of the active result.
Statistics

InfoStat conducts different statistical analyses on the active data table. The type of analysis is selected from the STATISTICS menu. Each time a procedure is invoked, the output is presented in a results window, which can be formatted and prepared for export as specified by the user in the OUTPUT menu. The actions (submenus) of the STATISTICS menu are the following: Summary statistics, Frequency tables, Probabilities and quantiles, Estimating population parameters, Sample size, One-sample inference, Two-sample inference, Analysis of variance, Non-parametric ANOVA, Extended and mixed linear models, Linear regression, Correlation analysis, Categorical data, Multivariate analysis, Time series, and Fitting and smoothing.

In general, these actions first open a window used to select variables. In it, the user should indicate the variable(s) of interest and, if the analysis is to be done by groups, the desired partition of the data file. In the variables selector, variables of interest are included by clicking on the arrows in the Variables subwindow. The variables that generate the groups should be declared in the Partitions tab, where the Partition by command identifies the variable(s) that will be used to partition the analysis. When more than one variable is selected, the groups result from the combinations of the levels of the selected variables.
For example, given the variables seed color (light, dark and red) and seed size (large, medium and small), selecting only color generates three groups (the three levels of color). If both variables are selected, 9 groups (3 × 3) are generated. The partitions appear in a list on the right of the window; the user can remove from this list one or more groups that should not take part in the analysis, using the arrows at the bottom of the list. Once the groups have been identified, InfoStat conducts the requested analysis on the observations of each group separately.

Descriptive statistics

The first block in the Statistics menu allows the user to describe a group of data by means of univariate summary statistics, frequency tables and theoretical distribution functions fitted to empirical distributions (sample frequency tables). All of these actions can be conducted for the active table as a whole, or for each subgroup or partition of the file if the user indicates a partitioning variable in the Partitions tab. For summary statistics and frequency tables, it is possible to work with files that have one row per observation (see the Atriplex.idb file), or with files in which each row of the column of interest holds a value of the variable and another column holds the frequency of that value (see the Insectos.idb file). In the first case, the variable(s) of interest should be indicated in the variables selector and the Frequencies field should be left empty. In the second case, the column that contains the values of the variable should be indicated in the Variables window of the selector, and the column that contains the frequencies should be indicated in the Frequencies (optional) window.
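When each row holds a value and another column holds its frequency, summary statistics can be computed by weighting each value by its frequency, which gives the same result as repeating each value as many times as its frequency. The Python sketch below illustrates this with hypothetical counts (not the Insectos.idb data); the function name is hypothetical.

```python
import math
import statistics


def summary_from_frequencies(values, freqs):
    """Return n, mean and S.D. (denominator n-1) for frequency-weighted data."""
    n = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / n
    css = sum(f * (v - mean) ** 2 for v, f in zip(values, freqs))  # corrected sum of squares
    return n, mean, math.sqrt(css / (n - 1))


# Hypothetical counts: value 0 observed 4 times, 1 three times, 2 twice and 3 once.
values, freqs = [0, 1, 2, 3], [4, 3, 2, 1]
print(summary_from_frequencies(values, freqs))

# The same data with one row per observation give identical results.
raw = [v for v, f in zip(values, freqs) for _ in range(f)]
print(len(raw), statistics.mean(raw), statistics.stdev(raw))
```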
InfoStat also provides a probabilities and quantiles calculator for different types of random variables.

Summary statistics

The following summary statistics are available: number of observations (n), Mean, standard deviation (S.D.), variance with denominator n-1 (Var(n-1)), variance with denominator n (Var(n)), standard error (S.E.), coefficient of variation (CV), minimum value (Min), maximum value (Max), Median, quantile 0.25 or first quartile (Q1), quantile 0.75 or third quartile (Q3), sum of observations (Sum), Asymmetry, Kurtosis, uncorrected sum of squares (USS), corrected sum of squares (CSS), median absolute deviation (MAD), Missing data, and percentiles 5, 10, 25, 50, 75, 90 and 95 (P(05), P(10), etc.). The number of observations reported corresponds to the number of active cases. The sample statistics are calculated using the cases that remain after observations with missing data have been omitted. The code for missing data can be entered by the user. The Mean is the arithmetic mean. The Standard deviation is the square root of the sample variance, calculated as the sum of the squared deviations from the
sample mean, divided by (n-1). The Standard error is the standard deviation divided by the square root of n. The Coefficient of variation is the quotient of the standard deviation and the sample mean, expressed as a percentage. The first quartile (Q1), the median and the third quartile (Q3), as well as any other percentile, can be obtained either by ordering the sample and selecting one of the observed values according to its position, or by estimation from an approximation of the empirical distribution function (EDF). If the user selects Based on EDF in the Percentiles subwindow, InfoStat first estimates this function and then uses it to report the requested percentile. If the Sample option is selected, the percentile is one of the values of the ordered sample. For this reason, the two procedures do not necessarily produce the same numeric result. Results can be presented horizontally or vertically. A horizontal presentation is useful for exporting results to a new data table, so that further analyses can be conducted on a table of summary statistics. Summary statistics can be requested simultaneously for one or more variables of the file (indicated in the variables selector). They can be obtained using all the observations in the file or for subgroups of observations. The subgroups can be formed from a single variable or from a combination of two or more variables of the file. To form groups, list the variables that define them in the Class variables (optional) subwindow of the variables selector. Alternatively, the Partition tab can be used to indicate the variables that partition the file; however, this option is less efficient than using Class variables in terms of execution time.
For this reason, we recommend using the Class variables option to obtain summary statistics for a large number of subgroups of a large file.

To illustrate, we use data from the Atriplex file. Selecting STATISTICS menu ⇒ SUMMARY STATISTICS activates the Descriptive statistics window, in which the desired variable(s) are selected. If a variable is selected in the Partition tab to partition the file, the requested summary statistics are generated for each group or partition. In this example, the germination variable and “Normal Seedlings” were selected, and the variable “Size” was selected in the Partition tab. The following summary statistics were requested: n, Mean, S.D., Var(n-1), Min, Max, Median and P(50) estimated from the empirical distribution function. P(50) does not coincide exactly with the Median, because the Median is calculated from the sample values, whereas P(50) is calculated from the estimated distribution of the sample data. If the Sample box is left activated when requesting P(50), the Median and P(50) will be the same. The Horizontal presentation was selected. The results are shown in the following table:
Table 3: Summary statistics for variables in the Atriplex file, according to the partition by seed size (horizontal presentation).

Summary statistics
Size    Variable          n   Mean   S.D.  Var(n-1)  Minimum  Maximum  Median  P(50)
small   Germinación       9  54.56  26.34    694.03    20.00    93.00   60.00  48.67
small   Normal Seedlings  9  24.44  20.24    409.53     0.00    60.00   20.00  20.00
big     Germinación       9  73.33  19.28    371.75    40.00    93.00   80.00  71.00
big     Normal Seedlings  9  51.33  22.12    489.50    27.00    87.00   47.00  42.33
medium  Germinación       9  68.78  32.81   1076.19    13.00   100.00   87.00  80.00
medium  Normal Seedlings  9  50.67  27.44    752.75     7.00    80.00   54.00  40.50

Frequency tables

STATISTICS menu ⇒ FREQUENCY TABLES allows the user to obtain a frequency table and/or to test the fit of theoretical distribution models to an empirical distribution. Depending on the fields activated by the user, frequency tables can contain the following information: lower limits (LL) and upper limits (UL) of the class intervals, midpoint of the interval (MI), absolute frequencies (AF), relative frequencies (RF), cumulative absolute frequencies (CAF) and cumulative relative frequencies (CRF). The number of classes can be obtained automatically or defined by the user (PERSONALIZED). For the automatic method, InfoStat obtains the number of classes as log2(n+1). For the personalized case, InfoStat allows the user to specify the minimum, the maximum and the number of intervals. The intervals are closed on the right. If the variable is categorical, personalization is not accepted, and the frequency table shows as many classes as there are categories of the variable. If the values of the variable were declared integers, by default InfoStat considers the variable a count variable and shows the frequencies of all the integer values between the minimum and the maximum.
If the variable contains integer values and the Consider integer variables as countings box is deactivated, InfoStat treats the variable as continuous and uses its values to define class intervals and construct the table.
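The automatic class construction described above can be sketched in Python: the number of classes is taken from log2(n+1), the range from the minimum to the maximum is split into equal-width intervals closed on the right, and the first interval also includes the minimum. Rounding log2(n+1) down to an integer and the handling of the minimum are assumptions, and the function name and data values are hypothetical. With n = 9 the sketch gives 3 classes, and with a minimum of 40 and a maximum of 93 it reproduces the limits and midpoints reported for the big group in Table 4; the frequencies depend on the hypothetical values.

```python
import math


def frequency_table(data, n_classes=None):
    """Equal-width frequency table with classes closed on the right.

    If n_classes is None, log2(n + 1) rounded down is used (assumption).
    """
    n = len(data)
    if n_classes is None:
        n_classes = int(math.log2(n + 1))
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_classes
    limits = [lo + i * width for i in range(n_classes)] + [hi]
    rows, cum = [], 0
    for c in range(n_classes):
        ll, ul = limits[c], limits[c + 1]
        # Right-closed intervals (ll, ul]; the first class also includes the minimum.
        af = sum(1 for x in data if (ll < x or c == 0) and x <= ul)
        cum += af
        rows.append({"Class": c + 1, "LL": ll, "UL": ul, "MI": (ll + ul) / 2,
                     "AF": af, "RF": af / n, "CAF": cum, "CRF": cum / n})
    return rows


data = [40, 45, 50, 80, 80, 87, 90, 93, 93]  # hypothetical values, n = 9
for row in frequency_table(data):
    print(row["Class"], f"{row['LL']:.2f}", f"{row['UL']:.2f}",
          f"{row['MI']:.2f}", row["AF"], f"{row['RF']:.2f}")
```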
Again using data from the Atriplex file, we obtained a frequency table for the germination variable for each seed size as follows: STATISTICS ⇒ FREQUENCY TABLES; in the Frequency distribution window, Variables tab, germination was selected, and before clicking OK, the Partition criteria tab was activated and the variable size was added (all the seed sizes present in the file are displayed automatically). Upon clicking OK, the Distribution of frequencies – Frequency table options window appears, in which the user can indicate the type of information to be shown in the table and define the number of classes. In this example, all the default options were accepted, so the number of classes was calculated automatically. The results are shown in the following table:

Table 4: Frequency table for the germination variable from the Atriplex file, according to the partition by seed size.

Frequency distribution
Size    Variable     Class     LL      UL     MI  AF    RF
big     Germination      1  40.00   57.67  48.83   3  0.33
big     Germination      2  57.67   75.33  66.50   0  0.00
big     Germination      3  75.33   93.00  84.17   6  0.67
medium  Germination      1  13.00   42.00  27.50   3  0.33
medium  Germination      2  42.00   71.00  56.50   0  0.00
medium  Germination      3  71.00  100.00  85.50   6  0.67
small   Germination      1  20.00   44.33  32.17   3  0.33
small   Germination      2  44.33   68.67  56.50   3  0.33
small   Germination      3  68.67   93.00  80.83   3  0.33

Fittings

STATISTICS menu ⇒ FREQUENCY TABLES, Fittings tab, allows the user to obtain goodness of fit tests. The null hypothesis specifies a theoretical distribution model for the data. The values observed in the sample are compared with the values expected under the specified model, using the Chi-square statistic and/or the likelihood ratio statistic, G (Agresti, 1990).
The user should select one of these two statistics to conduct the goodness of fit test. In addition, the user should specify whether the parameters of the theoretical distribution that hypothetically describes the data are to be estimated from the sample or specified externally. If Specify is activated, as many boxes as there are parameters in the selected theoretical distribution appear, so that the user can enter their values. These boxes initially contain the sample estimates of each parameter. For continuous variables, the empirical distribution is constructed from the automatically generated class intervals. These intervals can be open or closed at their lower and upper limits, depending on how the user specified them in the Frequency distribution - Fittings window.
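The two test statistics can be sketched in Python. As an illustration, the expected counts below come from the negative binomial probability function defined earlier in this manual (with p=m/k and q=p+1), with its parameters specified externally rather than estimated. The observed counts, the parameter values and the function names are hypothetical, and pooling the upper tail into the last class is a common convention assumed here, not a documented InfoStat rule. Degrees of freedom follow the usual rule: number of classes minus 1, minus one more for each parameter estimated from the sample.

```python
import math


def nb_pmf(x, m, k):
    """Negative binomial probability f(x; m, k) with p = m/k and q = p + 1."""
    p = m / k
    q = p + 1.0
    f = q ** (-k)  # f(0) = (1/q)^k
    for j in range(1, x + 1):
        # f(j)/f(j-1) = (k + j - 1)/j * (p/q), from the product k(k+1)...(k+x-1)/x!
        f *= (k + j - 1) / j * (p / q)
    return f


def expected_counts(n_total, m, k, n_classes):
    """Expected counts for x = 0, ..., n_classes-2 plus a final 'n_classes-1 or more' class."""
    probs = [nb_pmf(x, m, k) for x in range(n_classes - 1)]
    probs.append(1.0 - sum(probs))  # pooled upper tail (assumed convention)
    return [n_total * pr for pr in probs]


def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def g_statistic(observed, expected):
    """Likelihood ratio statistic G = 2 * sum O ln(O/E); empty cells contribute 0."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


observed = [30, 25, 20, 25]  # hypothetical counts for x = 0, 1, 2 and 3 or more
expected = expected_counts(sum(observed), m=1.5, k=2.0, n_classes=len(observed))
print("Chi-square:", round(chi_square(observed, expected), 3))
print("G:", round(g_statistic(observed, expected), 3))
print("df (parameters specified):", len(observed) - 1)
print("df (m and k estimated):", len(observed) - 1 - 2)
```

Either statistic is then compared with a chi-square distribution with the corresponding degrees of freedom; the p-value computation is omitted here to keep the sketch within the standard library.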