A multiple regression analysis was conducted to determine what factors influence human body mass. The analysis found that body mass is best predicted by a model containing sex, body height, and hours spent weekly on physical exercise. Lower AIC values indicated this three-predictor model had the best fit compared to models with additional or fewer predictors.
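The AIC comparison described above can be sketched as follows. This is a minimal illustration, not the original analysis: the residual sums of squares, sample size, and the fourth candidate model ("+ age") are hypothetical numbers chosen only to show how the lowest-AIC model is picked.

```python
import math

def aic(rss, n, k):
    """AIC for a least-squares fit (up to an additive constant):
    n * ln(RSS/n) + 2k, where k counts estimated parameters."""
    return n * math.log(rss / n) + 2 * k

# Hypothetical fits of body mass on increasing predictor sets:
# (residual sum of squares, parameter count incl. intercept), n = 100
candidates = {
    "height only":                   (5200.0, 2),
    "height + sex":                  (3900.0, 3),
    "height + sex + exercise":       (3300.0, 4),
    "height + sex + exercise + age": (3290.0, 5),
}
scores = {name: aic(rss, 100, k) for name, (rss, k) in candidates.items()}
best = min(scores, key=scores.get)
```

Note how the four-predictor model barely reduces the RSS, so its extra parameter penalty (+2) leaves it with a higher AIC than the three-predictor model.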
1. The document discusses analyses of multiple location trials to evaluate genotype performance across environments. It describes factors to consider in determining the number and location of environments for trials and statistical analyses for multiple experiments.
2. Key analyses covered include the Bartlett test for homogeneity of error variances, analysis of variance models for sites and years, and joint regression analysis to evaluate genotype by environment interactions.
3. Joint regression analysis fits linear regressions between genotype performances in each environment and the mean performance across environments to identify which interactions are linear versus non-linear.
This document summarizes key concepts in building multiple regression models, including:
1) Analyzing nonlinear variables, qualitative variables, and building and evaluating regression models.
2) Transforming variables to improve model fit, including using indicator variables for qualitative data.
3) Common model building techniques like stepwise regression, forward selection, and backward elimination.
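One of the model-building techniques listed above, forward selection, can be sketched generically: start with no predictors and greedily add whichever predictor most improves the criterion, stopping when no addition helps. The `fit_aic` callback and the toy AIC table below are hypothetical stand-ins for an actual model-fitting routine.

```python
def forward_select(predictors, fit_aic):
    """Greedy forward selection: start empty, repeatedly add the
    predictor that lowers AIC the most; stop when no addition helps.
    `fit_aic` maps a frozenset of predictor names to that model's AIC."""
    selected = frozenset()
    current = fit_aic(selected)
    while True:
        trials = [(fit_aic(selected | {p}), p)
                  for p in predictors if p not in selected]
        if not trials:
            break
        best_aic, best_p = min(trials)
        if best_aic >= current:
            break
        selected, current = selected | {best_p}, best_aic
    return selected, current

# Toy AIC surface (hypothetical numbers, not from the document):
table = {
    frozenset(): 400.0,
    frozenset({"x1"}): 380.0, frozenset({"x2"}): 390.0,
    frozenset({"x1", "x2"}): 385.0,
}
sel, score = forward_select(["x1", "x2"], lambda s: table[s])
```

Backward elimination is the mirror image (start full, drop the worst predictor each round), and stepwise regression alternates the two moves.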
This document provides an overview of continuous probability distributions covered in Lecture 5, including:
- Continuous random variables can take on uncountably infinite values within an interval, unlike discrete variables. Probability density functions (PDFs) are used instead of probabilities.
- The uniform, normal, and exponential distributions are introduced as examples of continuous distributions. Key properties like expected value and variance are discussed.
- The standard normal distribution is especially important, and its probabilities are provided in tables. Examples show how to calculate probabilities for normal distributions using the tables.
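The table lookups described in the bullets above can be reproduced with the standard normal CDF, which is expressible through the error function. A small sketch (the interval 90 to 110 with mean 100 and standard deviation 10 is an illustrative example, not from the lecture):

```python
import math

def phi(z):
    """Standard normal CDF, Phi(z) = (1 + erf(z / sqrt(2))) / 2 --
    the same values a standard normal table lists."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_prob(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2), by standardizing to Z."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

p = normal_prob(90, 110, mu=100, sigma=10)  # within one sigma of the mean
```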
There is a significant difference in body mass between differently colored Norwegian rats (F(2,12) = 7.59, p = 0.0074). Post-hoc analysis shows white rats are heavier than black or brown rats. The ANOVA model explains 56% of the variation in mass. Assumptions of normality and homogeneity of variances are met based on examination of residual plots.
The document discusses the steps for conducting a response surface methodology (RSM) experiment using central composite design (CCD). It involves determining independent and dependent variables, selecting an appropriate CCD, conducting the experiment runs according to the design, analyzing the data using statistical methods to develop a mathematical model and check its adequacy, and using the model to optimize responses. Key aspects of RSM and CCD covered include developing the design, analyzing results through ANOVA and regression, and checking model validity.
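The CCD step mentioned above (selecting the design) amounts to laying out coded design points: a two-level factorial core, axial points, and replicated center runs. A sketch under common conventions (the rotatable alpha and four center points are typical defaults, not taken from the document):

```python
from itertools import product

def ccd_points(k, alpha=None, n_center=4):
    """Coded design points for a central composite design in k factors:
    a 2^k factorial core (+/-1), 2k axial points at +/-alpha, and
    replicated center points. alpha defaults to the rotatable choice
    (2^k)^(1/4)."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25
    factorial = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for a in (-alpha, alpha):
            pt = [0.0] * k
            pt[i] = a
            axial.append(pt)
    center = [[0.0] * k for _ in range(n_center)]
    return factorial + axial + center

design = ccd_points(2)  # 4 factorial + 4 axial + 4 center = 12 runs
```

The coded points are then mapped to the actual ranges of the independent variables before the runs are conducted.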
In preparation for the Geodetic Engineering Licensure Examination, BSGE students must memorize the fastest possible solution for the LEAST SQUARES ADJUSTMENT using a Casio fx-991ES PLUS calculator technique in order to save time during the examination. Note: for Lec 2 and above I did not include solutions so that my techniques will not be copied. Just add me on FB so I can teach you the solutions, since they are not on Google, YouTube, or calculator-technique books, and are not taught in review centers either.
The document discusses factorial analysis of variance (ANOVA) and provides an example to illustrate the steps. It analyzes the flavor acceptability of luncheon meat from different sources. The null hypothesis is that there is no significant difference between the sources. The two-way ANOVA calculations show that the computed F-values are greater than the critical values, so the null hypothesis is rejected, indicating there are significant differences between the sources of luncheon meat.
The document discusses factorial analysis of variance (ANOVA) and provides an example to illustrate the steps in a two-way ANOVA. Specifically, it presents a study on the flavor acceptability of luncheon meat from different sources. It provides the problem statement, hypotheses, assumptions, and 10 step-by-step computations to conduct a two-way ANOVA on the data. The results of the ANOVA show that the flavor acceptability significantly differs between the meat sources, leading to a rejection of the null hypothesis.
The document describes using SPSS to analyze soil science data through descriptive statistics, normality tests, and one-sample and independent-samples t-tests. Specifically, it provides examples of:
1) Using SPSS to calculate descriptive statistics like mean, median, mode, range and standard deviation for soil bulk density and exchangeable calcium data.
2) Performing a normality test and one-sample t-test to determine if the mean of exchangeable calcium data transformed using log(x+1) differs significantly from normal.
3) Demonstrating the use of one-sample t-test to analyze soil moisture levels assessed by nine soil scientists and number of earthworms sampled from different farms.
This document discusses properties of pseudo-random numbers and methods for generating random numbers computationally. It covers:
- Properties of pseudo-random numbers including being continuous between 0 and 1 and uniformly distributed.
- Common methods for generating pseudo-random numbers including table lookup, linear congruential generators (LCG), and feedback shift registers.
- Desirable properties for random number generators including being fast, requiring little memory, having a long cycle or period, and producing numbers that are close to uniform and independent.
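One of the methods listed above, the linear congruential generator, is short enough to sketch in full. The multiplier and increment below are the well-known Numerical Recipes constants, chosen here only as a concrete illustration:

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2 ** 32):
    """Linear congruential generator: x_{k+1} = (a*x_k + c) mod m.
    Returns n pseudo-random floats in [0, 1) by dividing each state
    by the modulus, matching the 'continuous between 0 and 1' property."""
    x = seed
    out = []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x / m)
    return out

u = lcg(seed=42, n=1000)
```

The same seed always reproduces the same stream, which is exactly why these are called pseudo-random; the period here is at most m, illustrating the "long cycle" criterion.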
Speaker: Eduardo Vallejos, Associate Professor, Molecular Biology & Physiology.
The talk will cover an overall perspective on both genetics and modeling, advanced methods for combining genetic and phenotypic data with crop models, and a perspective on promising future approaches.
Metaheuristic Tuning of Type-II Fuzzy Inference System for Data Mining (Varun Ojha)
The document proposes using metaheuristic optimization techniques to tune the parameters of an interval type-2 fuzzy inference system (IT2FIS) for data mining applications. Specifically, it aims to 1) create diverse rules in the IT2FIS, 2) reduce the number of fuzzy rules, 3) determine appropriate shapes for type-2 fuzzy sets, and 4) analyze the performance of proposed IT2FIS optimization methods. The proposed framework uses genetic algorithms to tune the IT2FIS knowledge base and swarm intelligence methods to tune rule parameters. Experimental results on four datasets show that differential evolution generally provides the best performance, though no single algorithm works best on all datasets.
ANOVA (analysis of variance) is a statistical technique used to compare differences between group means. It involves calculating the F ratio, which is the ratio of variance between groups to variance within groups. If the calculated F value is greater than the critical F value from statistical tables, then the difference between group means is considered statistically significant. The document provides steps for conducting a one-way ANOVA, including calculating sums of squares, mean squares, and the F ratio to determine if differences between three varieties of wheat are statistically significant based on per acre production data.
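The one-way ANOVA steps described above (sums of squares, mean squares, F ratio) can be computed by hand. The three groups of per-acre yields below are hypothetical numbers standing in for the wheat-variety data:

```python
def one_way_anova(groups):
    """One-way ANOVA by hand: partition total SS into between-group
    and within-group parts, form mean squares, and take their ratio F."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    ms_between = ss_between / (k - 1)   # between-groups mean square
    ms_within = ss_within / (n - k)     # within-groups (error) mean square
    return ms_between / ms_within, (k - 1, n - k)

# Hypothetical per-acre yields for three varieties:
F, df = one_way_anova([[6, 7, 3, 8], [5, 5, 3, 7], [5, 4, 3, 4]])
```

The computed F is then compared to the critical F value for the returned degrees of freedom at the chosen significance level.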
This document summarizes a lab on model selection and multi-model inference. It discusses fitting four linear models to predict species richness using variables from the Swiss data. Model selection is performed using AIC, with the top model having an elevation and forest term. Model-averaged predictions are also calculated by weighting predictions from each model by their AIC weights.
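The AIC-weighting step mentioned above uses Akaike weights, which follow directly from the AIC differences. A sketch with hypothetical AIC values and per-model predictions (not the Swiss-data results):

```python
import math

def aic_weights(aics):
    """Akaike weights: w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2),
    where delta_i = AIC_i - min(AIC). Weights sum to 1 and serve as the
    averaging coefficients in multi-model inference."""
    lo = min(aics)
    rel = [math.exp(-(a - lo) / 2) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical AICs for four candidate models and each model's
# prediction of species richness at one new site:
aics = [210.3, 212.1, 215.7, 221.0]
preds = [34.2, 31.8, 36.0, 30.1]
w = aic_weights(aics)
avg_pred = sum(wi * p for wi, p in zip(w, preds))
```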
Genotype x environment interactions occur when genotypes respond differently to varying environmental conditions. Researchers conduct multi-location trials over multiple years to investigate these interactions and identify genotypes that perform well across different environments or are specifically adapted to certain environments. Analyzing the data from such trials involves testing for homogeneity of error variances between locations and years before performing analyses of variance to partition variance components and determine which genotypes interact least or perform best on average over environments.
The document discusses applying machine learning techniques to identify compiler optimizations that impact program performance. It used classification trees to analyze a dataset containing runtime measurements for 19 programs compiled with different combinations of 45 LLVM optimizations. The trees identified optimizations like SROA and inlining that generally improved performance across programs. Analysis of individual programs found some variations, but also common optimizations like SROA and simplifying the control flow graph. Precision, accuracy, and AUC metrics were used to evaluate the trees' ability to classify optimizations for best runtime.
This document discusses measures of dispersion such as standard deviation and variance. It provides formulas and examples of calculating standard deviation, variance, and coefficient of variation from data sets. It also describes steps for conducting a chi-square test on frequency data, including determining the appropriate test, establishing significance level, formulating hypotheses, calculating test statistics, determining degrees of freedom, and comparing the computed statistic to critical values. An example contingency table and chi-square calculation are also provided.
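The chi-square contingency-table calculation described above can be sketched directly: expected counts come from the row and column totals, and the statistic sums the scaled squared deviations. The 2x2 table below is hypothetical:

```python
def chi_square(table):
    """Pearson chi-square for an r x c contingency table:
    sum over cells of (observed - expected)^2 / expected, with
    expected = row_total * col_total / grand_total.
    Degrees of freedom: (r - 1) * (c - 1)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    grand = sum(rows)
    stat = sum((obs - rows[i] * cols[j] / grand) ** 2
               / (rows[i] * cols[j] / grand)
               for i, r in enumerate(table)
               for j, obs in enumerate(r))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical 2x2 table of observed frequencies:
stat, df = chi_square([[30, 10], [20, 40]])
```

As in the document's steps, the computed statistic is then compared to the critical chi-square value for df degrees of freedom.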
The document discusses various methods for modeling input distributions in simulation models, including trace-driven simulation, empirical distributions, and fitting theoretical distributions to real data. It provides examples of several continuous and discrete probability distributions commonly used in simulation, including the exponential, normal, gamma, Weibull, binomial, and Poisson distributions. Key parameters and properties of each distribution are defined. Methods for selecting an appropriate input distribution based on summary statistics of real data are also presented.
This document discusses various measures of central tendency and dispersion used to summarize collected data. It defines mean, median and mode as measures of central tendency, and how to calculate each one. It also covers variance, standard deviation and coefficient of variation as measures of dispersion. The document provides examples of calculating and interpreting these statistical concepts, and explains when to use each measure to best summarize a data set.
This document discusses probability distributions for random variables. It introduces discrete distributions like the binomial and Poisson distributions which are used for counting experiments. It also introduces continuous distributions like the normal distribution which are defined over continuous ranges of values. Key concepts covered include probability density functions, cumulative distribution functions, and how to relate random variables with specific parameters to standard distributions. Examples are provided to illustrate concepts like modeling the number of plant stems in a sampling area with a Poisson distribution.
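The Poisson example mentioned above reduces to the pmf P(X = k) = e^(-lam) * lam^k / k!. A sketch, where the mean rate of 2.5 stems per quadrat is an assumed illustrative value:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): e^(-lam) * lam^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical example: plant stems per quadrat, mean rate 2.5
p_empty = poisson_pmf(0, 2.5)                           # no stems at all
p_at_most_2 = sum(poisson_pmf(k, 2.5) for k in range(3))  # CDF at k = 2
```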
Simulation - Generating Continuous Random Variables (Martin Kretzer)
The document discusses various methods for generating continuous random variables in simulations, including the inverse transform method and acceptance-rejection method. It provides examples of how to generate random variables from important distributions like the exponential, normal, Poisson, and nonhomogeneous Poisson distributions. The agenda includes an introduction, overview of methods, generating specific distributions, summary, and exercises in R to apply the methods.
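The inverse transform method named above is easiest to see for the exponential distribution: inverting the CDF F(x) = 1 - e^(-lam*x) gives F^{-1}(u) = -ln(1 - u) / lam. A sketch (the document's exercises are in R; this is a Python transcription of the same idea):

```python
import math
import random

def exponential_inverse_transform(lam, n, seed=0):
    """Inverse transform method: if U ~ Uniform(0, 1), then
    X = -ln(1 - U) / lam is Exponential(lam), because plugging X into
    the exponential CDF recovers the uniform U."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

x = exponential_inverse_transform(lam=2.0, n=10000)
sample_mean = sum(x) / len(x)   # should sit near 1 / lam = 0.5
```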
The document describes a nested experimental design where sites are nested within study and control areas. There are multiple sites in each area with replications. A nested ANOVA can test for differences between areas while accounting for variability among sites within areas. The example shows calculations for a nested ANOVA with 3 areas, 4 sites each, and 3 replications. It finds a significant difference among sites within areas but not between the overall study and control areas.
FFT is an efficient algorithm to compute the discrete Fourier transform (DFT) and convert a time-domain signal to its frequency-domain representation. Radix-2 FFT is the most common algorithm, in which the input is divided into groups of 2 samples at each stage. FFT algorithms generally require a number of samples that is a power of 2, i.e. 2^N, to compute the DFT efficiently. The radix-2 FFT breaks the computation into "butterflies", using decimation-in-time (DIT) or decimation-in-frequency (DIF) structures to recursively compute the DFT. Twiddle factors, representing complex roots of unity, are used to compute the outputs of each butterfly operation.
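The radix-2 decimation-in-time structure described above fits in a few lines. This is a textbook sketch for illustration, not an optimized implementation; the length-8 rectangular pulse input is an assumed example:

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT. Input length must be a
    power of 2. Split into even/odd samples, transform each half, then
    combine with twiddle factors e^(-2*pi*i*k/N) in butterfly pairs."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle * odd half
        out[k] = even[k] + tw             # butterfly: top output
        out[k + n // 2] = even[k] - tw    # butterfly: bottom output
    return out

X = fft([1, 1, 1, 1, 0, 0, 0, 0])  # DFT of a length-8 rectangular pulse
```

The recursion halves the problem at each stage, giving the familiar N log N cost instead of the N^2 cost of the direct DFT sum.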
This document summarizes the symptoms and characteristics of various insect pests that affect plants. It describes pests such as the shoot and fruit borer, which causes withering of terminal shoots and bores holes in shoots and fruits. It also mentions stem borers that cause drooping and withering of young plant tops and stunting of older plants. Descriptions of other pests include the hadda beetle, leaf roller, lace wing bug, bud worm, hairy caterpillar, brinjal mealy bug, tobacco caterpillar, whitefly, spiraling whitefly, serpentine leaf miner, and striped mealybug. For each, it outlines the visual symptoms of damage caused and, in some cases, additional details.
The document summarizes various pests that affect millet crops, including shoot flies, stem borers, pink borers, white borers, white grubs, root aphids, caterpillars, ear head caterpillars, web worms, gall midges, ear head bugs, ear head beetles, weevils, leaf beetles, flea beetles, leaf rollers, slug caterpillars, chafer beetles, grasshoppers, aphids, and whiteflies. It describes the symptoms, appearance of larvae and adults, and in some cases the specific crops affected for each pest.
Genes located on the same chromosome tend to be inherited together due to their physical linkage on the chromosome. However, linkage can be broken during meiosis by recombination between homologous chromosomes through crossover. Recombination produces new combinations of parental alleles; under complete linkage, by contrast, only the parental gamete types are produced. Morgan's experiments with Drosophila provided evidence of this by demonstrating non-Mendelian inheritance ratios when genes were linked versus unlinked.
The document discusses various methods used for physical and transcript mapping including somatic cell hybrids, radiation hybrid maps, fluorescence in situ hybridization, flow sorting chromosomes, pulsed field electrophoresis, clone contig mapping, chromosome walking, inverse PCR, bubble PCR, and PCR-based screening. It also discusses genetic mapping techniques such as identifying genetic markers and recombinants, calculating genetic vs physical distances, multipoint mapping, and autozygosity mapping.
Mendel observed patterns of inheritance in pea plants through experimentation with traits such as flower color, seed shape, and pod color. His work provided evidence that heritable traits are specified by discrete units (later identified as genes) that are transmitted from parents to offspring in predictable patterns. Through experiments involving one trait (monohybrid crosses) and two traits (dihybrid crosses), Mendel deduced that genes assort and transmit independently during gamete formation and fertilization. Later work showed that traits are influenced not only by genes but also environmental factors and that variations exist in patterns of gene expression and dominance.
There are five main types of endogenous DNA damage including oxidation of bases from reactive oxygen species, alkylation of bases such as methylation, and hydrolysis of bases through deamination, depurination, and depyrimidination. DNA can also be damaged through alkylation and oxidation by exogenous sources like radiation exposure. Various mutagens including base analogs, intercalating agents, and UV light-induced thymine dimers further contribute to DNA damage that must be repaired.
The genetic code is a triplet code where each group of three nucleotides (codon) in mRNA specifies a single amino acid in the resulting polypeptide. The genetic code is almost universal across organisms, with 61 codons specifying 20 standard amino acids and 3 codons acting as termination signals. It exhibits several key properties including being comma-free, non-overlapping, degenerate, and containing start and stop signals. Wobble can occur in the third position of the codon-anticodon pairing to allow some redundancy.
Viruses contain genetic material surrounded by a protein capsid. They rely on host cells for replication and typically infect specific cell types of one host species. A viral genome can be DNA or RNA, single or double stranded, and circular or linear in size. Viruses enter host cells, hijack their machinery to produce viral components, and assemble new viral particles which are then released to infect more cells through either a lytic or lysogenic pathway.
Human DNA must be highly compressed to fit inside the nucleus. It achieves this by wrapping around proteins called nucleosomes, which act as spools. Each nucleosome contains an octamer of histone proteins around which 147 base pairs of DNA are wrapped. Multiple nucleosomes then coil further to form a 30nm fiber. This fiber is attached to a scaffold of RNA and proteins, forming loops that allow for further compaction of the DNA into chromosomes. The positioning of centromeres divides chromosomes into metacentric, submetacentric and acrocentric types in humans.
Polytene chromosomes are large chromosomes found in secretory cells like salivary glands that contain thousands of identical DNA strands aligned in parallel. This gives them a banded appearance with dark bands and clear interbands when viewed under a microscope. The bands represent regions of condensed and transcriptionally active DNA. B chromosomes are nonessential supernumerary chromosomes that are found in some populations but not others and can provide adaptive advantages in some species and environments.
The document discusses euploidy, or changes in the number of sets of chromosomes, in animals and plants. It notes that while most animal species are diploid, some natural variations exist, including polyploidy in certain tissues and endopolyploidy. Polyploidy is more common and tolerated in plants, with 30-35% of ferns and flowering plants being polyploid. Examples are given of polyploid crops like wheat, cotton, and strawberries that are important agricultural products.
This document discusses two topics: microRNAs and alternative splicing.
For microRNAs: Computational methods are used to predict microRNA genes by looking for evolutionarily conserved sequences that can form stem-loop hairpin structures. MicroRNAs regulate gene expression by binding to mRNA.
For alternative splicing: Splicing of pre-mRNA can result in different mRNA and protein isoforms through various combinations of exons. Bioinformatics methods aim to identify alternative splice variants by comparing cDNA and genomic sequences and analyzing microarray data. Splice graphs can model alternative splicing pathways.
This document discusses heat shock proteins and their relationship to cancer. Heat shock proteins help cells cope with stress and prevent protein damage. Studies show they play a role in cancer progression and drug resistance in breast cancer cells. Current research proposes investigating if plant extracts like Echinacea increase heat shock proteins in breast cancer cells. Understanding this relationship could help develop more effective cancer treatments. Further research on Echinacea's effects may provide insights into a possible cure for breast cancer.
DNA consists of a double helix structure made up of nucleotides. Each nucleotide contains a phosphate, pentose sugar, and one of four nitrogenous bases: adenine, thymine, guanine, or cytosine. The bases bond specifically with each other - adenine pairs with thymine and cytosine pairs with guanine. The sequence of these base pairs encodes genetic information.
Self-incompatibility is a plant's inability to set seed when self-pollinated due to morphological, genetic, physiological or biochemical causes controlled by the multi-allelic S locus. It is classified based on flower morphology, genes involved, site of expression, and pollen cytology. Two main types are distyly found in primula, controlled by two S alleles, and tristyly found in lythrum, controlled by S and M genes determining three style positions. Self-incompatibility prevents self-fertilization by arresting pollen tube growth when the pollen and pistil share the same S allele.
This document discusses different modes of reproduction in living organisms, including sexual reproduction, asexual reproduction, and their sub-types. Sexual reproduction involves the fusion of male and female gametes to produce offspring, while asexual reproduction does not. Some key sub-types are vegetative reproduction, apomixis (reproduction without fertilization), and gametophytic apomixis which allows embryo development without meiosis or fertilization. The document also notes the significance of different reproductive modes for plant breeding and genome evolution.
Genetics and Heredity defines key genetic terminology such as genes, alleles, chromosomes, loci, and inheritance. It discusses how Gregor Mendel conducted early experiments breeding pea plants over generations to develop the laws of inheritance and establish the foundations of genetics. Mendel demonstrated that traits are passed from parents to offspring through discrete factors, now known as genes, located on chromosomes. His work showed that some traits are dominant over others and that segregation and independent assortment of alleles allows for prediction of phenotypic ratios in offspring.
This document provides an overview of a course on basic plant breeding techniques. The course objectives are to understand how breeders meet breeding goals, learn classical and modern breeding methods, and see examples of genetics' importance in modern breeding. Key learning outcomes are to understand plant breeding developments, basics of genetics, and breeding concepts. The document then discusses the history and milestones of plant breeding, achievements in various crops, activities in plant breeding like creation of variation and selection, and breeding objectives like increasing yield and improving quality. It also covers concepts of centers of origin and diversity first proposed by Vavilov.
This document summarizes different patterns of inheritance including autosomal dominant, autosomal recessive, X-linked, and Y-linked traits. It describes the key characteristics of each type of inheritance such as affected family members, chance of passing the trait to offspring, and examples of related genetic disorders.
Sex determination in animals can occur through homogametic or heterogametic sex determination. In homogametic sex determination, the presence of a Y chromosome determines maleness in mammals and fish. In heterogametic sex determination, the presence of a second X chromosome determines femaleness in fruit flies, while the presence of two Z chromosomes determines maleness in birds, amphibians, and reptiles. For insects like grasshoppers and plants, the female is XX and the male is XO. In fruit flies, sex is determined by the ratio of X chromosomes to sets of autosomes, and the inactive X chromosome is inactivated through a process where Xist expression coats the inactive X chromosome in RNA
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxMAGOTI ERNEST
Although Artemia has been known to man for centuries, its use as a food for the culture of larval organisms apparently began only in the 1930s, when several investigators found that it made an excellent food for newly hatched fish larvae (Litvinenko et al., 2023). As aquaculture developed in the 1960s and ‘70s, the use of Artemia also became more widespread, due both to its convenience and to its nutritional value for larval organisms (Arenas-Pardo et al., 2024). The fact that Artemia dormant cysts can be stored for long periods in cans, and then used as an off-the-shelf food requiring only 24 h of incubation makes them the most convenient, least labor-intensive, live food available for aquaculture (Sorgeloos & Roubach, 2021). The nutritional value of Artemia, especially for marine organisms, is not constant, but varies both geographically and temporally. During the last decade, however, both the causes of Artemia nutritional variability and methods to improve poorquality Artemia have been identified (Loufi et al., 2024).
Brine shrimp (Artemia spp.) are used in marine aquaculture worldwide. Annually, more than 2,000 metric tons of dry cysts are used for cultivation of fish, crustacean, and shellfish larva. Brine shrimp are important to aquaculture because newly hatched brine shrimp nauplii (larvae) provide a food source for many fish fry (Mozanzadeh et al., 2021). Culture and harvesting of brine shrimp eggs represents another aspect of the aquaculture industry. Nauplii and metanauplii of Artemia, commonly known as brine shrimp, play a crucial role in aquaculture due to their nutritional value and suitability as live feed for many aquatic species, particularly in larval stages (Sorgeloos & Roubach, 2021).
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Exposé invité Journées Nationales du GDR GPL 2024
hematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...Sérgio Sacani
Context. With a mass exceeding several 104 M⊙ and a rich and dense population of massive stars, supermassive young star clusters
represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions
among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate
the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars.
The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically,
the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec.
Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within
and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation
were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a
photon flux threshold of approximately 2 × 10−8 photons cm−2
s
−1
. The X-ray sources exhibit a highly concentrated spatial distribution,
with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known
massive stars of the cluster, and we have collected over 71 000 photons from the magnetar CXO J164710.20-455217.
ESR spectroscopy in liquid food and beverages.pptxPRIYANKA PATEL
With increasing population, people need to rely on packaged food stuffs. Packaging of food materials requires the preservation of food. There are various methods for the treatment of food to preserve them and irradiation treatment of food is one of them. It is the most common and the most harmless method for the food preservation as it does not alter the necessary micronutrients of food materials. Although irradiated food doesn’t cause any harm to the human health but still the quality assessment of food is required to provide consumers with necessary information about the food. ESR spectroscopy is the most sophisticated way to investigate the quality of the food and the free radicals induced during the processing of the food. ESR spin trapping technique is useful for the detection of highly unstable radicals in the food. The antioxidant capability of liquid food and beverages in mainly performed by spin trapping technique.
BREEDING METHODS FOR DISEASE RESISTANCE.pptxRASHMI M G
Plant breeding for disease resistance is a strategy to reduce crop losses caused by disease. Plants have an innate immune system that allows them to recognize pathogens and provide resistance. However, breeding for long-lasting resistance often involves combining multiple resistance genes
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...University of Maribor
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...Wasswaderrick3
In this book, we use conservation of energy techniques on a fluid element to derive the Modified Bernoulli equation of flow with viscous or friction effects. We derive the general equation of flow/ velocity and then from this we derive the Pouiselle flow equation, the transition flow equation and the turbulent flow equation. In the situations where there are no viscous effects , the equation reduces to the Bernoulli equation. From experimental results, we are able to include other terms in the Bernoulli equation. We also look at cases where pressure gradients exist. We use the Modified Bernoulli equation to derive equations of flow rate for pipes of different cross sectional areas connected together. We also extend our techniques of energy conservation to a sphere falling in a viscous medium under the effect of gravity. We demonstrate Stokes equation of terminal velocity and turbulent flow equation. We look at a way of calculating the time taken for a body to fall in a viscous medium. We also look at the general equation of terminal velocity.
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...University of Maribor
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
Nucleophilic Addition of carbonyl compounds.pptxSSR02
Nucleophilic addition is the most important reaction of carbonyls. Not just aldehydes and ketones, but also carboxylic acid derivatives in general.
Carbonyls undergo addition reactions with a large range of nucleophiles.
Comparing the relative basicity of the nucleophile and the product is extremely helpful in determining how reversible the addition reaction is. Reactions with Grignards and hydrides are irreversible. Reactions with weak bases like halides and carboxylates generally don’t happen.
Electronic effects (inductive effects, electron donation) have a large impact on reactivity.
Large groups adjacent to the carbonyl will slow the rate of reaction.
Neutral nucleophiles can also add to carbonyls, although their additions are generally slower and more reversible. Acid catalysis is sometimes employed to increase the rate of addition.
3. Interaction – test of additivity
H0: the effect of a factor is not affected by the other factor
– in plots: mean-connecting lines are parallel
[Two interaction plots of mowed*fertilized LS means: No. species vs. fertilized (0/1), one line per mowing level (mowed 0 / mowed 1), vertical bars denote 0.95 confidence intervals:
– left: no interaction, current effect F(1, 16) = 0.0000, p = 1.0000
– right: interaction, current effect F(1, 16) = 18.000, p = 0.00062]
no interaction: additive effect
– lines are parallel
– the effect of mowing is the same regardless of fertilization
interaction: non-additive effect
– lines are not parallel
– the effect of mowing is more pronounced in unfertilized plots
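The additivity test above can be sketched numerically. This is a minimal pure-Python sketch (not from the slides; the data, factor names, and 2×2 layout are made up for illustration) of a balanced two-way ANOVA in which the interaction F is MS_AB / MS_error:

```python
from itertools import product

# replicate measurements per cell: 2 levels of mowing x 2 of fertilization (made-up data)
data = {
    ("mowed", "fert"):     [7.0, 8.0, 9.0],
    ("mowed", "unfert"):   [4.0, 5.0, 6.0],
    ("unmowed", "fert"):   [10.0, 11.0, 12.0],
    ("unmowed", "unfert"): [5.0, 6.0, 7.0],
}
n = 3  # replicates per cell (balanced design)

all_vals = [v for vs in data.values() for v in vs]
grand = sum(all_vals) / len(all_vals)

a_levels = ["mowed", "unmowed"]
b_levels = ["fert", "unfert"]
cell_mean = {k: sum(v) / len(v) for k, v in data.items()}

def level_mean(factor_index, level):
    # mean over all observations at one level of one factor
    vals = [v for key, vs in data.items() if key[factor_index] == level for v in vs]
    return sum(vals) / len(vals)

# sums of squares: main effects, interaction, error, total
ss_a = sum(n * len(b_levels) * (level_mean(0, a) - grand) ** 2 for a in a_levels)
ss_b = sum(n * len(a_levels) * (level_mean(1, b) - grand) ** 2 for b in b_levels)
ss_ab = sum(n * (cell_mean[(a, b)] - level_mean(0, a) - level_mean(1, b) + grand) ** 2
            for a, b in product(a_levels, b_levels))
ss_err = sum((v - cell_mean[k]) ** 2 for k, vs in data.items() for v in vs)
ss_tot = sum((v - grand) ** 2 for v in all_vals)

# interaction F test: MS_AB / MS_error, here with (1, 8) degrees of freedom
f_ab = (ss_ab / 1) / (ss_err / 8)
```

With these toy numbers the interaction has 1 df and the error 8 df; a large F relative to the F(1, 8) distribution would indicate non-parallel lines.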
4. Chocolate rats heavier, music no effect

Number of observations – should be balanced in all groups:

             music
diet         Rock  Folk
Paper          15    15
Chocolate      15    15

Mean mass of rats ~ diet + music

Unbalanced numbers of observations (the factors become confounded):

             music                       music
diet         Rock  Folk     diet        Rock  Folk
Paper          20    40     Paper         30    10
Chocolate      10    20     Chocolate     10    30

Chocolate rats heavier, Folk rats heavier
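A one-line check can make the balance requirement concrete; a hypothetical helper (the cell counts are taken from the tables above):

```python
# a design is balanced when every diet x music cell has the same count
def is_balanced(cell_counts):
    return len(set(cell_counts.values())) == 1

balanced = {("Paper", "Rock"): 15, ("Paper", "Folk"): 15,
            ("Chocolate", "Rock"): 15, ("Chocolate", "Folk"): 15}
unbalanced = {("Paper", "Rock"): 20, ("Paper", "Folk"): 40,
              ("Chocolate", "Rock"): 10, ("Chocolate", "Folk"): 20}
```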
5. Fixed vs. random effects of predictors
– depends on how general your research question is
– fixed effect: you are interested in differences between particular factor levels only
  (fertilized / unfertilized, breed1 / breed2 / breed3)
– random effect: you want to generalize the results to all other possible levels
  (site1 / site2 / site3 / site4, breed1 / breed2 / breed3, sampling-unit identity)
Example: fertilization trial on site 1 – site 4
– fixed: fertilization; random: sites
– you are not interested in the effect of site
– you can generalize to all (comparable) sites
  (with sites as fixed effects, your results would be valid only for your sites)
[Diagram: each site shown as a checkerboard of fertilized (F) and unfertilized (U) plots]
6. Fixed vs. random effects of predictors – several computational approaches
(same example: fertilization fixed, sites random):
– simple: use different MS terms in the F test (see the table below)
– complicated: advanced model fitting in R
Effect tested        A fixed, B fixed    A random, B random    A fixed, B random
Factor A             MSA/MSe             MSA/MSAB              MSA/MSAB
Factor B             MSB/MSe             MSB/MSAB              MSB/MSe
A x B interaction    MSAB/MSe            MSAB/MSe              MSAB/MSe
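The table above can be encoded as a lookup. A hedged sketch (the model labels and the MS values are made up; only the numerator/denominator choices come from the table):

```python
# (model, effect) -> which mean square goes in the F-test denominator
DENOMINATOR = {
    ("both fixed", "A"): "MSe",
    ("both fixed", "B"): "MSe",
    ("both fixed", "AB"): "MSe",
    ("both random", "A"): "MSAB",
    ("both random", "B"): "MSAB",
    ("both random", "AB"): "MSe",
    ("A fixed, B random", "A"): "MSAB",
    ("A fixed, B random", "B"): "MSe",
    ("A fixed, B random", "AB"): "MSe",
}

def f_value(ms, model, effect):
    # numerator is always the effect's own MS; denominator comes from the table
    return ms["MS" + effect] / ms[DENOMINATOR[(model, effect)]]

# hypothetical mean squares from a two-way ANOVA
ms = {"MSA": 120.0, "MSB": 80.0, "MSAB": 40.0, "MSe": 10.0}
```

Note how the same MSA gives a different F depending on whether B is fixed (denominator MSe) or random (denominator MSAB).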
7. Hierarchical (nested) design
– not all combinations of factor levels are available
– the factor with more levels is "nested in" the factor with fewer levels
Example: 2 bedrocks – granite (sites 1–3) and limestone (sites 4–6)
Factors:
– fertilization
– bedrock
– sites (nested in bedrock) – it is impossible to have more bedrocks on one site
F_bedrock = MS_bedrock / MS_site
[Diagram: each site shown as a checkerboard of fertilized (F) and unfertilized (U) plots]
8. R – fairly complicated to fit proper models with nested factors, random effects
and interactions
– packages: lme4 (simpler) or nlme (advanced)
https://www.jaredknowles.com/journal/2013/11/25/getting-started-with-mixed-effect-models-in-r
Same example: 2 bedrocks – granite (sites 1–3) and limestone (sites 4–6)
Factors:
– fertilization (fixed)
– bedrock (fixed)
– sites (random, nested in bedrock) – it is impossible to have more bedrocks on one site
– fertilization:bedrock (the interaction of the fixed factors is interesting, the others usually not)
[Diagram: each site shown as a checkerboard of fertilized (F) and unfertilized (U) plots]
11. Multiple regression – main effects of two predictors:
> summary(lm(seedlings ~ productivity + temperature, data=seedl))

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  21.286421   7.220427   2.948  0.00652 **
productivity -0.017128   0.007928  -2.160  0.03977 *
temperature   0.245520   0.681770   0.360  0.72156
...
Multiple R-squared: 0.192, Adjusted R-squared: 0.132
F-statistic: 3.207 on 2 and 27 DF, p-value: 0.0563
(ANOVA test of the whole model; explained variation:
(43+114)/(43+114+662)=0.192)

ANOVA tests of each predictor – mind the order of predictors
(simple vs. partial effects: each term is tested in addition to the previous predictors):

> anova(lm(seedlings~temperature+productivity,data=seedl))
Analysis of Variance Table
Response: seedlings
             Df Sum Sq Mean Sq F value  Pr(>F)
temperature   1  42.80  42.800  1.7454 0.19755
productivity  1 114.46 114.464  4.6678 0.03977 *
Residuals    27 662.10  24.522

> anova(lm(seedlings~productivity+temperature,data=seedl))
Analysis of Variance Table
Response: seedlings
             Df Sum Sq Mean Sq F value  Pr(>F)
productivity  1 154.08 154.084  6.2834 0.01851 *
temperature   1   3.18   3.180  0.1297 0.72156
Residuals    27 662.10  24.522
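The "explained variation" arithmetic above can be reproduced from the printed sequential sums of squares (values copied from the first anova() table; rounding gives the 0.192 reported by summary()):

```python
# sequential sums of squares from anova(lm(seedlings ~ temperature + productivity))
ss_temperature = 42.80
ss_productivity = 114.46
ss_residual = 662.10

ss_total = ss_temperature + ss_productivity + ss_residual     # 819.36
r_squared = (ss_temperature + ss_productivity) / ss_total     # ~0.192
```

Note that although the individual sums of squares depend on the order of the predictors, their total and the residual do not, so R-squared comes out the same for both orderings.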
12. General linear models
– ANOVA and Regression are equivalent
– same idea of testing variability explained by a model
– fitting model by least squares
13. Variance: mean of squared differences from the mean
– get the differences
– square them
[Figure: one observation, its group mean, and the overall mean; the deviations partition as:]
– difference from the overall mean, squared = TOTAL square
– difference from the group mean, squared = ERROR square
– difference of the group mean from the overall mean, squared = GROUP square
14. Sums of squares in regression
SS_TOT = Σ (Yi − Ȳ)²    total square: individual values of Y around the mean of Y
SS_REG = Σ (Ŷi − Ȳ)²    regression square: fitted values around the mean of Y
SS_e   = Σ (Yi − Ŷi)²   error square: individual values around the fitted values
– SS_e is the square that is minimized
– Ŷi are the individual fitted values of Y, calculated as Ŷ = a + bX
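These three sums of squares are easy to verify numerically. A minimal sketch (data made up) that fits a + bX by least squares and checks that SS_TOT = SS_REG + SS_e:

```python
# made-up data for a simple linear regression
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)

# least-squares slope and intercept
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
    / sum((x - xbar) ** 2 for x in xs)
a = ybar - b * xbar
fitted = [a + b * x for x in xs]

ss_tot = sum((y - ybar) ** 2 for y in ys)                      # total square
ss_reg = sum((f - ybar) ** 2 for f in fitted)                  # regression square
ss_e = sum((y - f) ** 2 for y, f in zip(ys, fitted))           # error square (minimized)
```

For ordinary least squares with an intercept, the partition SS_TOT = SS_REG + SS_e holds exactly.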
15. General linear models
– ANOVA and Regression are equivalent
– same idea of testing variability explained by a model
– fitting model by least squares
–> both types of predictors (numeric, factor)
can be combined
– you can use any wild combination of predictor types,
interactions, nestedness, random effects...
– one more semester:
(P. Šmilauer: Modern Regression Methods, KBE/785E)
– simplest case – analysis of covariance
– 1 numeric predictor
– 1 categorical predictor
– no interaction
– model – parallel lines
17. Analyzing many predictors:
If you have too many predictors
(e.g. measurements of everything in a field observation),
do not include everything in your model!
–> fit a Minimal adequate model
– backward selection
– include everything in the first model,
remove all non-significant terms
– forward selection
– start with the null model
– add individual terms
– one by one (due to collinearity)
– based on p-value or AIC
– analyze the final model
18. AIC – Akaike information criterion
AIC = n log ( SSE / n ) + 2 k + C
k – number of model parameters
(i.e. model df)
SSE – residual sum of squares (RSS)
n – number of observations
C – constant (can be ignored)
Quantifies the information accounted for by a predictor
– lower AIC suggests a better fit,
absolute values of AIC are not informative
– allows comparisons between models with a different
number of df – penalization of complicated models
Can be combined with an F-test of significance
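The AIC values printed by add1() on the following slides can be reproduced with this formula (the same form R's extractAIC() uses for linear models, up to a constant); the RSS values below are copied from that output, with n = 21 people:

```python
import math

# AIC for a linear model, up to a constant: n*log(SSE/n) + 2k
def aic(n, sse, k):
    return n * math.log(sse / n) + 2 * k

aic_null = aic(21, 5318.7, 1)   # mass ~ 1    -> ~118.224 in the add1() table
aic_sex = aic(21, 1907.7, 2)    # mass ~ sex  -> ~98.692 in the add1() table
```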
19. Minimal adequate models – Forward selection
Question: What does the mass of the human body depend on?
Sampling design: 21 randomly chosen inhabitants of České Budějovice.
Recorded variables: body mass, sex, body height, hair colour, vegetarian (1/0),
hours spent weekly on physical exercise:

mass sex height colour vegetarian hours
95 M 185 blonde 1 3
96 M 165 blonde 0 2
91 M 178 blonde 1 1
82 M 186 blonde 0 2
87 M 196 black 1 4
75 M 178 black 1 6
81 M 186 black 0 2
84 M 187 black 0 6
95 M 196 red 1 1
100 M 201 red 0 8
69 M 169 red 0 12
52 F 156 blonde 0 1
58 F 168 blonde 0 8
62 F 178 blonde 0 5
61 F 168 blonde 1 6
45 F 155 black 0 4
55 F 164 black 1 3
71 F 181 black 0 1
83 F 185 red 1 2
62 F 175 red 0 4
64 F 171 red 1 2
20. Minimal adequate models – Forward selection
Question: What does the mass of the human body depend on?
Sampling design: 21 randomly chosen inhabitants of České Budějovice.
For each person, the following were recorded:
– body mass,
– sex,
– body height,
– hair colour,
– whether he/she is vegetarian,
– number of hours spent weekly on physical exercise.
Start with null model:
> lm.0<-lm(mass~+1, data=BM)
> add1(lm.0, .~.+sex*height*colour*vegetarian*hours, test="F")
Single term additions
Model:
mass ~ +1
Df Sum of Sq RSS AIC F value Pr(F)
<none> 5318.7 118.224
sex 1 3410.9 1907.7 98.692 33.9710 1.295e-05 ***
height 1 3194.1 2124.6 100.953 28.5649 3.704e-05 ***
colour 2 191.1 5127.6 121.455 0.3354 0.7194
vegetarian 1 224.8 5093.9 119.317 0.8384 0.3713
hours 1 98.6 5220.0 119.830 0.3591 0.5561
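The F value reported for adding sex can be reproduced by hand: the drop in RSS (over 1 df) divided by the residual mean square of the larger model (21 observations, 2 parameters, so 19 residual df). A small sketch using the RSS values from the table above:

```python
# RSS from the add1() table: null model vs. mass ~ sex
rss_null, rss_sex = 5318.7, 1907.7

# F = (drop in RSS per df added) / (residual MS of the larger model)
f_sex = (rss_null - rss_sex) / 1 / (rss_sex / 19)   # ~33.97, as printed
```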
21. next step
> lm.1<-update(lm.0, .~.+sex)
> add1(lm.1, .~.+sex*height*colour*vegetarian*hours, test="F")
Single term additions
Model:
mass ~ sex
Df Sum of Sq RSS AIC F value Pr(F)
<none> 1907.7 98.692
height 1 791.27 1116.5 89.441 12.7570 0.00218 **
colour 2 297.60 1610.1 99.131 1.5711 0.23655
vegetarian 1 139.13 1768.6 99.102 1.4160 0.24952
hours 1 289.43 1618.3 97.237 3.2193 0.08959 .
next step
> lm.2<-update(lm.1, .~.+height)
> add1(lm.2, .~.+sex*height*colour*vegetarian*hours, test="F")
Single term additions
Model:
mass ~ sex + height
Df Sum of Sq RSS AIC F value Pr(F)
<none> 1116.47 89.441
colour 2 245.787 870.68 88.220 2.2583 0.13681
vegetarian 1 45.693 1070.77 90.564 0.7254 0.40620
hours 1 192.420 924.05 87.469 3.5400 0.07714 .
sex:height 1 192.466 924.00 87.468 3.5410 0.07710 .
stop here (based on p) or include interaction (based on AIC)
22. Analysis of final model
> anova(lm.0, lm.1, lm.2, test="F")
Analysis of Variance Table
Model 1: mass ~ +1
Model 2: mass ~ sex
Model 3: mass ~ sex + height
Res.Df RSS Df Sum of Sq F Pr(>F)
1 20 5318.7
2 19 1907.7 1 3410.9 54.992 7.094e-07 ***
3 18 1116.5 1 791.3 12.757 0.00218 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> summary(lm.2)
Call:
lm(formula = mass ~ sex + height, data = BM)
Residuals:
Min 1Q Median 3Q Max
-8.559 -5.865 -2.027 3.041 20.865
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -41.8184 28.9782 -1.443 0.166170
sexM 16.9264 4.1986 4.031 0.000783 ***
height 0.6062 0.1697 3.572 0.002180 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.876 on 18 degrees of freedom
Multiple R-squared: 0.7901, Adjusted R-squared: 0.7668
F-statistic: 33.87 on 2 and 18 DF, p-value: 7.914e-07
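The fitted model can be turned into a prediction rule. A sketch using the coefficients printed by summary(lm.2) above (the helper name and the example heights are made up):

```python
# mass = -41.8184 + 16.9264 * (sex is male) + 0.6062 * height(cm),
# coefficients copied from summary(lm.2)
def predict_mass(sex, height):
    return -41.8184 + 16.9264 * (sex == "M") + 0.6062 * height

mass_m180 = predict_mass("M", 180)   # about 84.2 kg
mass_f165 = predict_mass("F", 165)   # about 58.2 kg
```

The two sexes share one slope (0.6062 kg/cm) and differ only in the intercept, i.e. the parallel-lines model of the analysis of covariance.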
23. Conclusion: Body mass depends significantly on sex and body height;
these predictors have additive effects.
Men are on average heavier than women, and mass increases with height.
But be careful here, because sex and height are related to each other!
24. How to plot the figure

plot(mass~height, data=BM, type="n", ylab="Body mass", xlab="Body height")
### plots an empty plot, i.e. the axes (with appropriate ranges, so the data fit in –
### THIS IS IMPORTANT) and labels; this is specified by type="n"
points(mass~height, data=BM[BM$sex=="M",], pch=16)
### adds full points for males to the empty plot
points(mass~height, data=BM[BM$sex=="F",], pch=1)
### empty points for females
males.pred<-data.frame(sex="M", height=150:205)
### generates a range of the height predictor values for which the fitted
### values for males should be generated (later used by the predict function)
females.pred<-data.frame(sex="F", height=150:205)
### same for females
lines(150:205, predict(lm.2, newdata=males.pred))
### adds a solid line: the regression fit for males (y values predicted from the model)
lines(150:205, predict(lm.2, newdata=females.pred), lty=2)
### adds a dashed line: the regression fit for females
legend(x="bottomright", legend=c("Males", "Females"), pch=c(16,1), lty=c(1,2),
inset=0.05, bty="n")
### adds a legend to the plot
25. Overall conclusion
Statistics: numbers and formulas
– summary statistics – how big and how variable are the data
– hypothesis testing
– p – are the relationships larger than random?
– choose the test based on data type and data arrangement
Logic of discovery
– observation vs. experiment
– statistical vs. causal relationship
– avoid all possible bias
– random selection, proper control treatment
– enough replicates
26. Choosing a test: type of dependent variable × type of predictor

Type of dependent variable:
– Continuous (e.g. 0.3, 4, 7, 5.2 etc.)
– Ordinal (e.g. 1=little, 2=medium, 3=a lot)
– Categories: frequencies or percentages (e.g. germinated: 18, not germinated: 32)

Predictor: categories –> comparison of means
– continuous dependent: 2 groups: t-test (paired or not);
  >2 groups: one-way ANOVA; 2 or more predictors: two/more-way ANOVA
– ordinal dependent: 2 groups (not paired): Mann-Whitney test;
  2 groups (paired): Wilcoxon test; >2 groups: Kruskal-Wallis test
– categorical dependent: 1 grouping variable: goodness of fit (18 : 32);
  >1 grouping variable: contingency table (e.g. A/B × C/D: 18, 32 / 26, 24)

Predictor: continuous –> linear relationship
– continuous dependent: 2 variables, one cause and one effect: simple regression;
  2 variables, no cause / effect: Pearson correlation;
  >2 variables, more causes and one effect: multiple regression
– ordinal dependent: 2 variables: Spearman correlation

Predictors of both types, continuous dependent: General linear models

Summary statistics
– How big? Mean, median...
– How variable? Variance, quartile range, standard deviation, coef. of variation...
– How accurate is the estimate? Standard error, confidence interval
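The decision table above can be written as a small lookup; a hedged sketch (the key names and phrasing are made up, the test choices come from the table):

```python
# (predictor type, dependent type) -> tests from the decision table
TEST_TABLE = {
    ("categorical", "continuous"): "t-test / one-way ANOVA / two-way ANOVA",
    ("categorical", "ordinal"): "Mann-Whitney / Wilcoxon / Kruskal-Wallis",
    ("categorical", "categorical"): "goodness of fit / contingency table",
    ("continuous", "continuous"): "regression / Pearson correlation",
    ("continuous", "ordinal"): "Spearman correlation",
    ("mixed", "continuous"): "general linear models",
}

def choose_test(predictor, dependent):
    # look up the family of tests for a given combination of variable types
    return TEST_TABLE[(predictor, dependent)]
```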