Introduction to statistics ii


This document discusses approaches for analyzing microarray gene expression data with few observations. It suggests assuming a canonical distribution based on prior knowledge, such as the log-normal distribution often seen in this type of data. The parameters of that distribution, such as the mean and variance, can then be estimated from the available observations. A t-statistic can be used to test whether a gene has the same distribution parameters under two conditions, and thus to identify differentially expressed genes. Quality control plots of the data are recommended to validate the distribution assumptions.


Background

μ, σ²

• Few observations made by a black box

• What is the distribution behind the black box?

• E.g., with what probability will it output a number
bigger than 5?

Approach

• Easy to determine with many observations

• With few observations:

• Assume a canonical distribution based on prior
knowledge

• Determine parameters of this distribution using
the observations, e.g., mean, variance
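
The approach above can be sketched in a few lines. This is a minimal illustration, assuming a Normal canonical distribution and a made-up set of five observations (neither comes from the slides); it plugs the estimated mean and standard deviation into the assumed distribution and reads off the probability of an output bigger than 5. It ignores the uncertainty in the estimates themselves, which matters when observations are few.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical observations from the black box (illustrative, not from the slides).
observations = [4.1, 5.3, 3.8, 4.9, 5.6]

mu_hat = mean(observations)      # sample mean, estimate of mu
sigma_hat = stdev(observations)  # sample SD with the n - 1 denominator

# Assumed canonical distribution: Normal(mu_hat, sigma_hat).
fitted = NormalDist(mu_hat, sigma_hat)

# Probability that the black box outputs a number bigger than 5.
p_above_5 = 1.0 - fitted.cdf(5.0)

print(f"mean={mu_hat:.3f}  sd={sigma_hat:.3f}  P(X > 5)={p_above_5:.3f}")
```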

Estimating the variance σ²

Chi-Square if the original distribution was Normal
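
For reference, the standard result behind this slide can be written out as follows. The symbols n (number of observations), x_i (the observations) and x̄ (the sample mean) are notation introduced here, not taken from the slide.

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad
\frac{(n-1)\,s^2}{\sigma^2} \sim \chi^2_{n-1}
\quad \text{if } x_1,\dots,x_n \text{ are independent draws from } \mathcal{N}(\mu, \sigma^2).
```

With n = 3 observations the chi-square distribution has only 2 degrees of freedom, so the variance estimate is very imprecise.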

Microarray Data
• Many genes, 25000

• 2 conditions (or more), many replicates within
each condition

• Which genes are differentially expressed
between the two conditions?

More Specifically
• For a particular gene
– Each condition is a black box
– Say 3 observations from each black box

• Do both black boxes have the same
distribution?
– Assume same canonical distribution
– Do both have the same parameters?

Which Canonical Distribution
• Use data with many replicates

• 418.0294, 295.8019, 272.1220, 315.2978, 294.2242,
379.8320, 392.1817, 450.4758, 335.8242, 265.2478,
196.6982, 289.6532, 274.4035, 246.6807, 254.8710,
165.9416, 281.9463, 246.6434, 259.0019, 242.1968

• Distribution??

Distribution of log raw intensities
across genes on a single array

The QQ plot of log scale intensities
(i.e., actual vs simulated from normal)

QQ Plot against a Normal Distribution
• 10 + 10 replicates in
two groups

• Single group QQ plot

• Combined 2 groups QQ
plot

• Combined log-scale QQ
plot
Shapiro-Wilk Test
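
The QQ comparison can be approximated without plotting. The sketch below takes the 20 replicate values listed earlier, pairs the sorted data with quantiles of a fitted Normal, and reports the correlation of those pairs on the raw and log scales; a correlation closer to 1 means the points lie closer to a straight line. The plotting positions (i - 0.5)/n and the use of a correlation as a summary are choices made here for illustration. This is not the Shapiro-Wilk test named on the slide, which is a separate formal test.

```python
import math
from statistics import NormalDist, mean, stdev

# The 20 replicate intensities listed on the slide.
intensities = [
    418.0294, 295.8019, 272.1220, 315.2978, 294.2242,
    379.8320, 392.1817, 450.4758, 335.8242, 265.2478,
    196.6982, 289.6532, 274.4035, 246.6807, 254.8710,
    165.9416, 281.9463, 246.6434, 259.0019, 242.1968,
]

def qq_points(values):
    """Return (theoretical, observed) quantile pairs against a fitted Normal."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i - 0.5) / n: one common convention, assumed here.
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

def qq_correlation(values):
    """Pearson correlation of the QQ points; near 1 suggests a straight line."""
    pairs = qq_points(values)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

raw_r = qq_correlation(intensities)
log_r = qq_correlation([math.log2(v) for v in intensities])
print(f"QQ correlation, raw scale: {raw_r:.4f}")
print(f"QQ correlation, log scale: {log_r:.4f}")
```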

Which Canonical Distribution

• Assume log normal distribution

Benford’s Law
• Frequency distribution of first significant digit

Pr(d ≤ x < d+1) = log10(1+d) − log10(d) for d = 1, …, 9, where x is the number scaled to lie in [1, 10) and log10(x) is uniformly distributed in [0, 1]
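
As a small check of the formula, the following sketch tabulates the first-digit probabilities it implies. The function name `benford_probability` is introduced here for illustration.

```python
import math

def benford_probability(d):
    """Probability that the first significant digit equals d under Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("first significant digit must be between 1 and 9")
    return math.log10(d + 1) - math.log10(d)

probs = {d: benford_probability(d) for d in range(1, 10)}

for d, p in probs.items():
    print(f"digit {d}: {p:.4f}")

# The nine probabilities telescope to log10(10) - log10(1) = 1.
print(f"total: {sum(probs.values()):.4f}")
```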

Differential Expression

Group 1: μ₁, σ₁²
Group 2: μ₂, σ₂²

• Is μ₁ = μ₂?
• Is σ₁ = σ₂? Is variance a function of mean?

SD increases linearly with Mean

SD vs Mean across 3 replicates plotted for all genes

SD is flat now, except for very low values

Another reason to work on the log scale

SD vs Mean across 3 replicates computed for all
genes after log-transformation
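
One way to see why the log transform flattens the SD is to assume the replicate noise is multiplicative. The sketch below uses a fixed, made-up set of three noise factors shared by every gene (a deliberate simplification; real replicates would have random factors) and shows that the SD scales with the expression level on the raw scale but is identical across levels on the log2 scale.

```python
import math
import statistics

# Hypothetical multiplicative noise factors for 3 replicates (illustrative only).
FACTORS = [0.8, 1.0, 1.25]

def replicate_sd(true_level, log_scale=False):
    """SD across 3 replicates of a gene whose noise multiplies its true level."""
    values = [true_level * f for f in FACTORS]
    if log_scale:
        values = [math.log2(v) for v in values]
    return statistics.stdev(values)

for level in (100.0, 1000.0, 10000.0):
    raw_sd = replicate_sd(level)
    log_sd = replicate_sd(level, log_scale=True)
    print(f"level={level:>8.0f}  raw SD={raw_sd:10.3f}  log2 SD={log_sd:.4f}")
```

On the log scale the level becomes an additive offset, so it drops out of the SD. This says nothing about the very low intensities, where the slides note the SD is not flat.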

Differential Expression

Group 1: μ₁, σ₁²
Group 2: μ₂, σ₂²

• Is μ₁ = μ₂?
• Is σ₁ = σ₂? Sort-of YES

The T-Statistic
Flattened Normal or T-Distribution
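
A minimal sketch of a two-sample t-statistic for one gene follows. It assumes the equal-variance (pooled) form, which fits the "σ₁ = σ₂? Sort-of YES" answer above, though the slides do not say which variant they use. The function name and the log2 intensities are hypothetical.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(group1, group2):
    """Return (t, degrees of freedom) for the equal-variance two-sample t-test."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least 2 observations")
    # Pool the two unbiased variance estimates, weighted by degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(group1) - mean(group2)) / standard_error, n1 + n2 - 2

# Hypothetical log2 intensities for one gene, 3 replicates per condition.
condition1 = [8.1, 8.4, 8.2]
condition2 = [9.0, 9.3, 9.1]

t, df = pooled_t_statistic(condition1, condition2)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Under the null hypothesis μ₁ = μ₂ and normally distributed data, this statistic follows a t-distribution with n₁ + n₂ − 2 degrees of freedom, which has heavier tails than the Normal when replicates are few.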

The curve fit here may be a better estimate

Lots of false positives can be avoided here

Not much difference here

SD vs Mean across 3 replicates computed for all
genes after log-transformation


Alignment of raw reads in Avadis NGS

Strand Life Sciences Pvt Ltd



Introduction to statistics ii

1. Statistics for Microarray Data

2. Background μ, σ² • Few observations made by a black box • What is the distribution behind the black box? • E.g., with what probability will it output a number bigger than 5?

3. Approach • Easy to determine with many observations • With few observations: – Assume a canonical distribution based on prior knowledge – Determine parameters of this distribution using the observations, e.g., mean, variance

4. Estimating the mean

5. Estimating the variance σ² • Chi-Square if the original distribution was Normal
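The two estimation slides can be illustrated with a minimal sketch. The slide bodies did not survive extraction, so this is not necessarily the deck's own formulation: it assumes the usual sample mean and the unbiased sample variance (dividing by n − 1). Under a Normal black box, (n − 1)·s²/σ² follows a Chi-Square distribution with n − 1 degrees of freedom. The observation values and function names are hypothetical.

```python
import math

def sample_mean(xs):
    """Estimate mu as the arithmetic mean of the observations."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Unbiased estimate of sigma^2 (divides by n - 1, not n)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (n - 1)

# Hypothetical outputs of the black box (three observations).
observations = [4.2, 5.1, 6.3]

mean_hat = sample_mean(observations)
var_hat = sample_variance(observations)

# Standard error of the estimated mean: sqrt(s^2 / n).
se_mean = math.sqrt(var_hat / len(observations))

print(f"mean={mean_hat:.3f} variance={var_hat:.3f} se={se_mean:.3f}")
```

With only three observations both estimates are noisy, which is the motivation for assuming a canonical distribution in the first place.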

6. Microarray Data • Many genes, 25000 • 2 conditions (or more), many replicates within each condition • Which genes are differentially expressed between the two conditions?

7. More Specifically • For a particular gene – Each condition is a black box – Say 3 observations from each black box • Do both black boxes have the same distribution? – Assume same canonical distribution – Do both have the same parameters?

8. Which Canonical Distribution • Use data with many replicates • 418.0294, 295.8019, 272.1220, 315.2978, 294.2242, 379.8320, 392.1817, 450.4758, 335.8242, 265.2478, 196.6982, 289.6532, 274.4035, 246.6807, 254.8710, 165.9416, 281.9463, 246.6434, 259.0019, 242.1968 • Distribution??

9. What is a QQ Plot

10. Distribution of log raw intensities across genes on a single array

11. The QQ plot of log scale intensities (i.e., actual vs simulated from normal)

12. QQ Plot against a Normal Distribution • 10 + 10 replicates in two groups • Single group QQ plot • Combined 2 groups QQ plot • Combined log-scale QQ plot • Shapiro-Wilk Test
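A QQ plot against a Normal distribution can be sketched using only the Python standard library. The sketch below uses the 20 replicate values from slide 8 and pairs the sorted observations with theoretical Normal quantiles. Note the assumptions: the deck's plot compares actual values against values simulated from a Normal, whereas this sketch uses exact quantiles; the plotting positions (i + 0.5)/n are one common convention; and the correlation of the QQ points is used here only as a rough straightness summary, not as the Shapiro-Wilk test.

```python
import math
from statistics import NormalDist, mean, stdev

# The 20 replicate intensities listed on slide 8.
replicates = [
    418.0294, 295.8019, 272.1220, 315.2978, 294.2242,
    379.8320, 392.1817, 450.4758, 335.8242, 265.2478,
    196.6982, 289.6532, 274.4035, 246.6807, 254.8710,
    165.9416, 281.9463, 246.6434, 259.0019, 242.1968,
]

def qq_points(values):
    """Return (theoretical quantile, observed value) pairs for a Normal QQ plot."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    # Plotting positions (i + 0.5) / n are an assumption of this sketch.
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, xs))

def qq_correlation(values):
    """Pearson correlation of the QQ points; closer to 1 means a straighter plot."""
    pts = qq_points(values)
    mt = mean(t for t, _ in pts)
    mo = mean(o for _, o in pts)
    cov = sum((t - mt) * (o - mo) for t, o in pts)
    var_t = sum((t - mt) ** 2 for t, _ in pts)
    var_o = sum((o - mo) ** 2 for _, o in pts)
    return cov / math.sqrt(var_t * var_o)

raw_r = qq_correlation(replicates)
log_r = qq_correlation([math.log2(v) for v in replicates])
print(f"raw scale r={raw_r:.4f}, log scale r={log_r:.4f}")
```

Plotting the pairs returned by `qq_points` gives the QQ plot itself; points lying close to the diagonal support the assumed distribution.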

13. Which Canonical Distribution • Assume log normal distribution

14. Benford’s Law • Frequency distribution of first significant digit • Pr(d ≤ x < d+1) = log10(1+d) − log10(d) for d = 1, ..., 9, when log10(x) is uniformly distributed in [0,1]
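The first-digit probabilities in Benford's Law follow directly from the formula on the slide. A minimal sketch (the function name is hypothetical) that tabulates them for d = 1 to 9:

```python
import math

def benford_probability(d):
    """Probability that the first significant digit equals d (1..9)."""
    if d not in range(1, 10):
        raise ValueError("first significant digit must be between 1 and 9")
    return math.log10(1 + d) - math.log10(d)

probs = {d: benford_probability(d) for d in range(1, 10)}

for d, p in probs.items():
    print(f"digit {d}: {p:.4f}")
```

The nine probabilities sum to log10(10) − log10(1) = 1, and digit 1 is the most frequent, at about 30%.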

15. Differential Expression • Group 1: μ₁, σ₁² • Group 2: μ₂, σ₂² • Is μ₁ = μ₂? • Is σ₁ = σ₂? • Is variance a function of mean?

16. SD increases linearly with Mean • Plot: SD vs Mean across 3 replicates, plotted for all genes

17. SD is flat now, except for very low values • Another reason to work on the log scale • Plot: SD vs Mean across 3 replicates, computed for all genes after log-transformation
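One way to see why the log scale flattens the SD is to compare two genes whose replicates differ only by a multiplicative factor. This is a minimal sketch under that assumption (multiplicative noise); the replicate values are hypothetical and log base 2 is an arbitrary choice.

```python
import math
from statistics import mean, stdev

def mean_and_sd(values):
    """Return (mean, sample SD) across replicates."""
    return mean(values), stdev(values)

# Hypothetical 3 replicates for a low-expressed gene, and a gene
# expressed 10 times higher with the same relative spread.
low_gene = [100.0, 120.0, 90.0]
high_gene = [10 * v for v in low_gene]

raw_low = mean_and_sd(low_gene)
raw_high = mean_and_sd(high_gene)

log_low = mean_and_sd([math.log2(v) for v in low_gene])
log_high = mean_and_sd([math.log2(v) for v in high_gene])

print("raw scale:", raw_low, raw_high)   # SD grows with the mean
print("log scale:", log_low, log_high)   # SD is the same for both genes
```

On the raw scale the SD is 10 times larger for the higher-expressed gene; after the log transform the factor of 10 becomes an additive shift in the mean and the SD is unchanged.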

18. Differential Expression • Group 1: μ₁, σ₁² • Group 2: μ₂, σ₂² • Is μ₁ = μ₂? • Is σ₁ = σ₂? Sort-of YES

19. The T-Statistic

20. The T-Statistic

21. The T-Statistic

22. The T-Statistic • Flattened Normal or T-Distribution
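The formulas on the T-Statistic slides did not survive extraction. As an illustration only, here is a minimal sketch of the standard two-sample t-statistic with a pooled variance, which assumes σ₁ = σ₂ (the "sort-of YES" from slide 18). The log-scale intensities and the function name are hypothetical, and this may differ from the exact form shown in the deck.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(group1, group2):
    """Two-sample t-statistic assuming equal variances.

    Returns (t, degrees_of_freedom).
    """
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2

# Hypothetical log2 intensities for one gene, 3 replicates per condition.
condition_a = [8.1, 8.3, 8.2]
condition_b = [9.0, 9.4, 9.2]

t, df = pooled_t_statistic(condition_a, condition_b)
print(f"t={t:.3f} with {df} degrees of freedom")
```

Under the null hypothesis μ₁ = μ₂, this statistic follows a t-distribution with n₁ + n₂ − 2 degrees of freedom, which has heavier tails than the Normal when replicates are few.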

23. A Problem

24. Plot: SD vs Mean across 3 replicates, computed for all genes after log-transformation • The curve fit here may be a better estimate • Lots of false positives can be avoided here • Not much difference here

25. Thank You
