Consider a binary SNP matrix M in which 1 represents the derived allele and 0 the ancestral allele. The SNP matrix has n haplotypes. The HAF vector of a haplotype h in M, denoted c_h, is obtained by taking the binary haplotype vector and replacing non-zero entries (derived alleles carried by the haplotype) with their respective frequencies in the sample. Define the HAF score of h as $\mathrm{HAF}(h) = \sum_j c_h[j]$, where the sum proceeds over all segregating sites j in the genomic region. Derive an expression for the expected value of HAF(h) for h randomly sampled from a neutrally evolving sample of n individuals.

Solution

SNP (single nucleotide polymorphism): SNPs occur more frequently than any other type of polymorphism. Understanding the structure and frequencies of haplotypes is important for associating genetic polymorphisms with a given trait. It also helps in inferring the genetic genealogy of alleles in a population. SNP haplotypes can be determined without ambiguity when an individual has no more than one heterozygous site in a given genomic region. The Haplotype Allele Frequency (HAF) score assigned to individual haplotypes in a sample naturally captures many of the properties shared by haplotypes carrying a favored allele.

Haplotype: a combination of genetic variants occurring on the same DNA molecule. Each gene in the diploid genome has two sequences, one on each haplotype.

Methods for detecting the genomic signatures of natural selection have been heavily studied, and they have been successful in identifying many selective sweeps. For most of these sweeps, the favored allele remains unknown, making it difficult to distinguish carriers of the sweep from non-carriers. In an ongoing selective sweep, carriers of the favored allele are likely to contain a future most recent common ancestor.

a) Expected value of HAF(h) for h randomly sampled from a neutrally evolving sample of n individuals.
Consider a neutrally evolving sample of n individuals. We assume that all sites are biallelic, and at each site we denote ancestral alleles by 0 and derived alleles by 1. We also assume that all sites are polymorphic in the sample. The HAF vector of a haplotype h, denoted by c, is obtained by taking the binary haplotype vector and replacing non-zero entries (derived alleles carried by the haplotype) with their respective frequencies in the sample. For a parameter $\ell$, we define the $\ell$-HAF score of c as $\ell\text{-HAF}(c) = \sum_j c[j]^{\ell}$, where the sum proceeds over all segregating sites j in the genomic region. The 1-HAF score of a haplotype amounts to the sum of frequencies of all derived alleles carried by the haplotype. The $\ell$-HAF score is equivalent to the $\ell$-norm of c raised to the $\ell$th power.
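The definitions above can be checked on a small example. The following is a minimal sketch, not the original solution's method: the function name haf_scores and the toy matrix M are hypothetical, and "frequency" is assumed to mean the derived allele count in the sample (divide by n if proportions are intended).

```python
def haf_scores(M, ell=1):
    """Return the ell-HAF score of every haplotype (row) of a binary SNP matrix M.

    Assumption: the frequency of a derived allele is its count in the sample.
    """
    n_sites = len(M[0])
    # Derived allele count at each site (column sums of M).
    counts = [sum(row[j] for row in M) for j in range(n_sites)]
    # The HAF vector of a haplotype keeps counts[j] wherever it carries the
    # derived allele; the ell-HAF score sums those entries raised to ell.
    return [
        sum(counts[j] ** ell for j in range(n_sites) if row[j] == 1)
        for row in M
    ]


# Hypothetical toy sample: n = 4 haplotypes, 3 segregating sites.
M = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]

scores = haf_scores(M)            # 1-HAF score of each haplotype
counts = [sum(col) for col in zip(*M)]
mean_haf = sum(scores) / len(M)   # HAF score averaged over haplotypes

# A site with derived count w adds w to each of its w carriers, so the
# scores sum to the sum of squared counts over sites.
assert sum(scores) == sum(w * w for w in counts)
print(scores, mean_haf)
```

The assertion shows why the average 1-HAF score of a randomly chosen haplotype equals (1/n) times the sum of squared derived allele counts. If one additionally assumes the standard neutral site frequency spectrum, in which the expected number of sites with derived count i is θ/i (θ is the scaled mutation rate and is not defined in the text above), this identity would give E[HAF(h)] = (1/n) Σ_{i=1}^{n-1} i² · θ/i = θ(n-1)/2 under the count convention.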