Abstract: This report describes current methods for the selection of informative single nucleotide polymorphisms (SNPs) using data from a dense. Tag SNP selection using particle swarm optimization. How to select tag SNPs in genetic association studies. Prioritize and select SNPs for association studies with multistage. A comparative study of tag SNP selection using clustering. The tag-SNP set that covers all SNPs is therefore a dominating set of the graph G. Taylor (1, 2); (1) Epidemiology Branch and (2) Laboratory of Molecular Carcinogenesis. The power of association studies based on tag SNPs using genotype data is similar to that using haplotype data. Using tag SNPs for a genome-wide association study allows the investigator to maximize information content and minimize sample size without losing power. Therefore, fewer htSNPs and more predictable SNPs yield a better fitness. Power analysis for genome-wide association studies (BMC). Jun 01, 2011: genome-wide association studies (GWASs) have been effective in identifying the genomic regions associated with a disease trait. For comparing alternative tag SNP selection algorithms, we use coalescent simulation to.
Model-based clustering for identifying disease-associated. Acknowledgments: I would like to express my deepest gratitude to my. This power provides the fastest path to discovery and publication. HapMap provides linkage disequilibrium (LD) information on a sample of 3. A general question for LD-based association studies is how much the power to detect an association is compromised when tag SNPs are chosen from data in one population sample and then deployed in another sample. Sample sizes required at different powers of detecting. The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare-variant association.
A tag SNP is a representative single nucleotide polymorphism in a region of the genome with high linkage disequilibrium (the nonrandom association of alleles at two or more loci). Genome-wide association studies (GWASs) aim to detect genetic risk factors for complex human diseases by identifying disease-associated single-nucleotide polymorphisms (SNPs). Optimized tag SNP content and dense marker spacing, mean spacing 1. For a tag SNP selection problem using pairwise r², one can construct a graph G = (V, E) with each vertex v_i representing a SNP s_i. SNP tagging, and then evaluates freely available software for the selection of tag SNPs for genetic association studies. Once the tag SNP statistics are computed, the genomic regions that are in LD with the most. The power of genome-wide association studies can be computed using a set of tag. Recently, several methods have been published to select subsets of. Functionally informative tag SNPs for disease association studies. SNPs hold much promise as a basis for genome-wide disease-gene association. Tag SNP selection for association studies, Stram 2004, Genetic.
Tag SNP selection and its applications in association studies. For a candidate gene study, researchers can choose their tag SNPs. A tool for selecting SNPs for association studies based on. A key strategy to improve the efficiency of association studies is to select a subset of informative SNPs, called tag SNPs, for analysis (Johnson et al.).
One application is to select a subset of the single nucleotide polymorphism (SNP) biomarkers from the whole SNP set that is informative and small enough for subsequent association studies. A novel prediction method for tag SNP selection using. Software for tag single nucleotide polymorphism selection. Selection of SNP subsets for association studies in candidate. These SNPs are usually chosen from haplotype data and are thus called haplotype tag SNPs (htSNPs). Informative SNP selection problem (ISSP): given a sample S of a population P of individuals (either haplotypes or genotypes) on m SNPs, select positions of k (k < m) SNPs such that, for any individual, one can predict non. Jun 2007: HapMap provides LD information on a sample of 3. Sep 15, 2004: (2) the dependence of the performance of tag SNP selection methods upon the density of SNP markers genotyped for the purpose of haplotype discovery and tag SNP selection. Criteria for the selection of single nucleotide polymorphisms in pathway pharmacogenetics. Association studies can determine whether a genetic variant is associated with a disease or trait.
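The prediction side of the informative SNP selection problem can be illustrated with a deliberately simple rule. The Python sketch below assumes phased haplotypes coded 0/1 and predicts a non-selected SNP by majority vote among the haplotypes that share the same tag-SNP allele pattern. The function names and the default guess for unseen patterns are hypothetical; published ISSP methods use more elaborate predictors.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Haplotype = Sequence[int]  # alleles coded 0/1, one entry per SNP


def train_majority_predictor(
    haplotypes: List[Haplotype], tags: List[int], target: int
) -> Dict[Tuple[int, ...], int]:
    """Map each observed tag-allele pattern to the most frequent allele at `target`."""
    counts: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)
    for hap in haplotypes:
        pattern = tuple(hap[t] for t in tags)
        counts[pattern][hap[target]] += 1
    # most_common(1) breaks ties by first occurrence in the data
    return {pattern: c.most_common(1)[0][0] for pattern, c in counts.items()}


def prediction_accuracy(
    haplotypes: List[Haplotype],
    tags: List[int],
    target: int,
    model: Dict[Tuple[int, ...], int],
    default: int = 0,  # hypothetical fallback for tag patterns never seen in training
) -> float:
    """Fraction of haplotypes whose allele at `target` is predicted correctly."""
    correct = 0
    for hap in haplotypes:
        guess = model.get(tuple(hap[t] for t in tags), default)
        correct += int(guess == hap[target])
    return correct / len(haplotypes)
```

For example, with haplotypes [0,0,0], [0,0,0], [1,1,1], [1,1,0], tag SNP 0 and target SNP 2, the pattern (0,) predicts 0 and (1,) predicts 1, giving 3 of 4 correct. Accuracy measured on the training haplotypes is optimistic; a held-out sample would give a fairer estimate.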
Selection of genetic markers: this chapter focuses on single nucleotide polymorphisms (SNPs), the most common form of variation in the human genome. For example, a SNP may replace the nucleotide cytosine (C) with the nucleotide thymine (T) in a certain stretch of DNA. Structured genome-wide association studies with Bayesian. Selection of these tag SNPs poses several challenges, as rare variants tend to be. Currently, typical genome-wide association studies measure hundreds of thousands, or millions, of genetic variants. Efficient association study design via power-optimized tag.
Tag SNP selection for candidate gene association studies using HapMap and gene resequencing data (European Journal of Human Genetics 15(10)). Tagger is a tool for the selection and evaluation of tag SNPs from genotype data such as that from the International HapMap Project. Tag SNP selection via a genetic algorithm. Tag SNP selection using particle swarm optimization (Chuang). In the tag SNP selection problem, our goal is to find a feasible solution with the smallest number of htSNPs. In low- and medium-budget association studies, a limited number of tag SNPs are selected out of a large set of available SNPs previously typed in an initial cohort. Therefore, tag SNP selection is not an issue/option for genome-wide association studies. Effective tagging single-nucleotide polymorphism (SNP)-set selection is crucial to SNP-set analysis in genome-wide association studies (GWAS). Linkage disequilibrium (LD) plays a central role in association studies for identifying the genetic variation responsible for complex human.
Two vertices v_i and v_j are connected if and only if the two corresponding SNPs s_i and s_j are correlated, i.e., their pairwise r² meets the chosen threshold. Accordingly, the scale and cost of genotyping are expected to be greatly reduced. Selection of representative SNP sets for genome-wide association. Selecting tagging SNPs for association studies using power. The advantage is particularly striking when the set of tag SNPs is sparse.
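Finding a minimum dominating set is NP-hard, so a greedy heuristic is a common way to approximate one. The Python sketch below assumes pairwise r² values are supplied as a dict keyed by SNP index pairs (i, j) with i < j, and connects two SNPs when r² reaches a threshold; 0.8 is an assumed, frequently used cutoff, and the function name `greedy_tag_snps` is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def greedy_tag_snps(
    n_snps: int, r2: Dict[Tuple[int, int], float], threshold: float = 0.8
) -> List[int]:
    """Greedy dominating set on the LD graph: edge (i, j) iff r2[(i, j)] >= threshold."""
    # Each SNP covers itself plus its neighbours in the LD graph.
    covers: List[Set[int]] = [{i} for i in range(n_snps)]
    for (i, j), value in r2.items():
        if value >= threshold:
            covers[i].add(j)
            covers[j].add(i)

    uncovered = set(range(n_snps))
    tags: List[int] = []
    while uncovered:
        # Pick the SNP covering the most still-uncovered SNPs (lowest index on ties).
        best = max(range(n_snps), key=lambda v: (len(covers[v] & uncovered), -v))
        tags.append(best)
        uncovered -= covers[best]
    return tags
```

With r² = 0.9 for (0, 1), 0.85 for (1, 2), 0.95 for (3, 4) and 0.3 for (2, 3), the first pass picks SNP 1 (it covers SNPs 0, 1 and 2) and the second picks SNP 3. Every SNP is then a tag or adjacent to one, so the result is a dominating set, but the greedy choice is not guaranteed to be the smallest.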
Despite the advances in genotyping technologies, which have led to a large reduction in genotyping cost, the tag SNP selection problem remains an important problem for computational biologists and geneticists. The LD measure r² has been used for tag SNP selection [1, 12] because the statistical power of association studies depends directly on r²: to a first approximation, testing a marker with LD r² to the causal variant is equivalent to multiplying the effective sample size by r². Many methods have been developed, and new methods for tag SNP selection continue to be developed. The aim of this chapter is not to enumerate and detail all available methods for haplotype block partitioning and tag SNP selection, but rather to focus on how to use the available methods, tools, and resources to facilitate tag SNP selection in association studies. In this case, SNP BTA-60194-no-rs (rs41587782) was in high LD with the representative tag SNP and thus was excluded in the final step of the selection strategy. Tag SNP selection for prediction of tick resistance in. Oct 23, 2005: we investigated the selection and analysis of tag SNPs for genome-wide association studies by specifically examining the relationship between investment in genotyping and statistical power. SNP p-value data and finds all SNPs in high LD with GWAS SNPs, so that selection is from a much larger set of SNPs than the GWAS itself.
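The pairwise r² used as the tagging criterion can be computed from phased haplotypes as r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), where D = p_AB − p_A p_B. A minimal Python sketch, assuming two biallelic SNPs coded 0/1 and observed on the same haplotypes; the function name is hypothetical, and returning 0.0 for a monomorphic SNP is an assumed convention, since r² is undefined there.

```python
from typing import Sequence


def pairwise_r2(snp_a: Sequence[int], snp_b: Sequence[int]) -> float:
    """LD r^2 between two biallelic SNPs from phased haplotypes coded 0/1.

    snp_a[h] and snp_b[h] are the alleles carried by haplotype h.
    """
    n = len(snp_a)
    if n == 0 or n != len(snp_b):
        raise ValueError("allele vectors must be non-empty and of equal length")
    p_a = sum(snp_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(snp_b) / n  # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # assumed convention: a monomorphic SNP carries no LD information
    return d * d / denom
```

Perfectly correlated or perfectly complementary SNPs give r² = 1, and independent SNPs give r² = 0; for example, pairwise_r2([1,1,1,0,0,0,0,0], [1,1,0,0,0,0,0,0]) evaluates to 5/9.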
Tag-SNP selection based on pairwise LD criteria and power. Single nucleotide polymorphism (SNP)-set analysis in genome-wide association studies (GWAS) has emerged as a research hotspot for identifying genetic variants associated with disease susceptibility. To choose the proper sample size and genotyping platform for such studies, power calculations that take into account the genetic model, tag SNP selection, and the population of interest are required.
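A common first approximation for the tag-SNP part of such a power calculation is that testing a tag SNP instead of the causal variant multiplies the non-centrality parameter (NCP) of a 1-df test by r². If the NCP grows linearly with sample size, about n / r² samples are then needed to match the power of n samples at the causal SNP. The sketch below uses only the Python standard library and assumes a two-sided test, a user-supplied NCP for the causal variant, and the conventional genome-wide threshold α = 5 × 10⁻⁸; it does not model the genetic model or population explicitly, and the function names are hypothetical.

```python
from statistics import NormalDist


def power_from_ncp(ncp: float, alpha: float = 5e-8) -> float:
    """Power of a two-sided 1-df test whose statistic is Z + sqrt(ncp), Z ~ N(0, 1)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5
    # Reject when |Z + shift| > z_crit.
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)


def tag_snp_power(ncp_causal: float, r2: float, alpha: float = 5e-8) -> float:
    """Approximate power at a tag SNP: the causal NCP attenuated by r^2."""
    return power_from_ncp(r2 * ncp_causal, alpha)


def samples_needed_at_tag(n_causal: int, r2: float) -> float:
    """Sample size at the tag SNP giving the same NCP as n_causal at the causal SNP."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return n_causal / r2
```

With no association (NCP = 0) the rejection rate equals α, and halving r² halves the NCP, so tag_snp_power(30, 0.5) is well below tag_snp_power(30, 1.0).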
Analysis of epidemiologic studies of genetic effects and gene. Summary: Illumina's tag SNP approach in tandem with the powerful Infinium assay. These tag SNPs are then typed in a larger set of control and affected individuals. Research article, open access: An efficient weighted tag SNP-set analytical method in genome-wide association studies. Bin Yan, Shudong Wang, Huaqian Jia, Xing Liu and Xinzeng Wang. Abstract. Background. Multimarker-LD-based genetic algorithm for tag SNP selection. Genome-wide association studies (GWAS) are meant to find the genetic. At the time of this study, genotypes based on resequencing data were available from the EGP website for 52,387 SNPs in 391 genes from EGP. Tag SNP selection and association studies: over the past few years, numerous disease association studies, both genome-wide and.
Bayesian variable selection regression for genome-wide. Significant genetic association may be interpreted as either (1) direct association, in which the genotyped SNP is the true causal variant conferring disease susceptibility. It is possible to identify genetic variation and association with phenotypes without genotyping every SNP in a chromosomal region. Selecting the smallest subset of tag SNPs that can predict the other SNPs would considerably reduce the complexity of genome-wide or block-based SNP-disease association studies. Each SNP represents a difference in a single DNA building block, called a nucleotide. It combines the simplicity of pairwise tagging methods with the efficiency benefits of multimarker haplotype approaches. SNP and haplotype associations using a two-stage design. R should rise by increasing the number of predicted SNPs or by decreasing the. In this paper, we present an OR application for representative SNP selection that implements our novel simulated annealing (SA)-based feature selection. Tagging SNP-set selection with maximum information based on.
Tag-SNP selection is an important step in designing case-control association studies. Methods for tag SNP selection based on established multivariate statistical techniques may. The goal is to minimize the number of markers selected for genotyping in a particular. In anticipation of cost-effective SNP genotyping technologies and the availability of databases of a large number of candidate SNPs, many investigators are seriously considering genome-wide SNP scans in the hope of performing hypothesis-free disease association studies, as opposed to hypothesis-driven candidate gene or region studies. Power-optimized tag SNP selection: our power-optimized tag SNP selection method is a stepwise greedy procedure to maximize power. We incorporate functional predictions of protein structure, gene regulation, splicing and miRNA binding. A new tag-SNP-set selection method based on LD information is proposed. For both applications, either tag or index SNP selection, the corresponding problem can be formulated as follows. Therefore, it is essential to select only informative SNPs representing the original SNP distributions in the genome. Tag SNP selection for genome. Furthermore, we describe an innovative approach to combine both tag SNP.
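One way to sketch a stepwise greedy procedure that maximizes power is to treat every SNP as equally likely to be causal, approximate the power to detect it through its best tag by attenuating a fixed non-centrality parameter (NCP) by r², and repeatedly add the SNP that most increases the average power. This Python sketch is a hypothetical illustration under those assumptions (fixed NCP of 30, α = 5 × 10⁻⁸, r² in a dict keyed by (i, j) with i < j, missing pairs treated as r² = 0); it is not the published power-optimized method, and all function names are invented for the example.

```python
from statistics import NormalDist
from typing import Dict, List, Tuple

R2Table = Dict[Tuple[int, int], float]


def power_from_ncp(ncp: float, alpha: float) -> float:
    """Two-sided power of a 1-df test whose statistic is Z + sqrt(ncp)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)


def r2_between(r2: R2Table, i: int, j: int) -> float:
    """r^2 lookup; a SNP is perfectly correlated with itself."""
    return 1.0 if i == j else r2.get((min(i, j), max(i, j)), 0.0)


def average_power(n_snps: int, tags: List[int], r2: R2Table, ncp: float, alpha: float) -> float:
    """Mean power over SNPs, each assumed causal with equal probability and
    detected through its best tag (NCP attenuated by that r^2).
    An untagged SNP has r^2 = 0 and contributes only alpha."""
    total = 0.0
    for snp in range(n_snps):
        best = max((r2_between(r2, snp, t) for t in tags), default=0.0)
        total += power_from_ncp(best * ncp, alpha)
    return total / n_snps


def power_greedy_tags(
    n_snps: int, r2: R2Table, k: int, ncp: float = 30.0, alpha: float = 5e-8
) -> List[int]:
    """Stepwise greedy: add the SNP that most increases average power, k times."""
    tags: List[int] = []
    for _ in range(min(k, n_snps)):
        candidates = [c for c in range(n_snps) if c not in tags]
        best = max(candidates, key=lambda c: average_power(n_snps, tags + [c], r2, ncp, alpha))
        tags.append(best)
    return tags
```

For instance, with r² = 0.9 for (0, 1), 0.85 for (1, 2), 0.95 for (3, 4) and 0.3 for (2, 3), the first SNP chosen is SNP 1, because it lends power to three SNPs at once. Unlike a pure coverage criterion, this objective also rewards raising a SNP's best r² from, say, 0.8 to 0.95.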
The use of high-density tag SNP arrays (mainly Illumina Hap300 and Hap550) for genome-wide association studies has virtually revolutionized the field and led to the identification of strong susceptibility loci for several types of malignancies, including breast cancer (Hunter et al.). Imputation-aware tag SNP selection to improve power for large-scale, multi-ethnic association studies. Genevieve L. SNP selection for pharmacogenetic association studies is discussed. Haplotype block partitioning and tag SNP selection using genotype data and their applications to association studies. Kui Zhang, Zhaohui S. In a typical GWAS, an informative subset of the single-nucleotide polymorphisms (SNPs), called tag SNPs, is genotyped in case-control individuals. This work demonstrates that, while there may be limits given current reference panels, improving GWAS scaffold design is an underused means to increase power in association studies. Recent advances in genotyping and molecular techniques have greatly increased knowledge of the structure of the human genome.
Efficiency and power in genetic association studies (Nature). The power of intelligent SNP selection: the Infinium assay provides the freedom to design the most powerful genotyping panels. Here, for any given subset of SNPs within a block, all pairwise r² values between the SNPs in this subset and the SNPs absent from this subset are calculated. A tag SNP is a representative single nucleotide polymorphism (SNP) in a region of the genome with high linkage disequilibrium that represents a group of SNPs called a haplotype. Transferability of tag SNPs in genetic association studies in. Power calculations are important at the study design stage to ensure successful results. Although there is a broad literature on Bayesian variable selection under high- or ultrahigh-dimensional. Laboratory of Molecular Carcinogenesis, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA. But most existing methods of SNP-set analysis are affected by the quality of the SNP set, and a poor-quality SNP set can lead to low power in GWAS.
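The subset evaluation described above can be sketched as follows: for each SNP left out of the candidate subset, take its maximum r² with any selected SNP, and count it as captured when that value reaches a threshold. Assumptions, all illustrative: r² values in a dict keyed by (i, j) with i < j, missing pairs treated as r² = 0, tags drawn from the block's own SNPs, and a 0.8 threshold; the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

R2Table = Dict[Tuple[int, int], float]


def best_tag_r2(block: Iterable[int], tags: List[int], r2: R2Table) -> Dict[int, float]:
    """Max r^2 between each non-tag SNP in the block and any tag SNP."""
    tag_set = set(tags)
    best: Dict[int, float] = {}
    for snp in block:
        if snp in tag_set:
            continue
        best[snp] = max(
            (r2.get((min(snp, t), max(snp, t)), 0.0) for t in tag_set),
            default=0.0,
        )
    return best


def fraction_captured(
    block: List[int], tags: List[int], r2: R2Table, threshold: float = 0.8
) -> float:
    """Share of block SNPs that are tags or have r^2 >= threshold with some tag."""
    best = best_tag_r2(block, tags, r2)
    captured = len(set(tags) & set(block)) + sum(1 for v in best.values() if v >= threshold)
    return captured / len(block)
```

For a block of SNPs 0 to 3 with SNP 1 selected and r² of 0.9, 0.5 and 0.85 between SNP 1 and SNPs 0, 2 and 3, SNPs 0 and 3 are captured and SNP 2 is not, so three of four SNPs (0.75) are covered.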
The differential pattern of MF and TL variation of SNPs was critical to effective tag SNP selection, since the top SNPs were clearly distinct in the histograms of those windows (Fig.). The program can also identify and choose tag SNPs for SNPs not in high LD with any GWAS SNP. Tag SNP selection for association studies: this report describes current methods for the selection of informative single nucleotide polymorphisms (SNPs) using data from a dense network. Our variable selection approach is inherently hierarchical and involves selection at both the SNP-set level and the individual SNP level.
Among the selection methods that have proliferated, those based on pairwise LD measurement are attractive for the purpose of designing association studies. Genome-wide association studies are a promising new tool for deciphering the genetics of complex diseases. The value of gene-based selection of tag SNPs in genome-wide. Tagging SNPs for association studies (Hum Hered 2004). As a result, there is now a need to identify, among all these data, the relevant markers for genetic association studies. Tag SNPs are useful in whole-genome SNP association studies, in. Abstract: Selection of genetic variants is a crucial first step in the rational design of studies aimed at explaining individual differences in susceptibility to complex human diseases or health intervention outcomes. Analysis of two different sets of SNP genotype data from the HapMap is used to judge the practical aspects of using. Dec 01, 2004: Tag SNP selection for association studies (Stram, Daniel O.). Because power gauges the chance of success of an association study, selecting tag SNPs that yield higher power will increase the effectiveness of future association studies, at no. Most existing tagging SNP-set selection methods cannot make full use of the information hidden in common or rare variants associated with diseases.
Title page: Increasing the power of association studies by. A distinction between haplotype block-based and non-block-based approaches yields two classes of procedures. Single nucleotide polymorphisms, frequently called SNPs (pronounced "snips"), are the most common type of genetic variation among people. Increasing power of genome-wide association studies by. Linkage disequilibrium (LD), which refers to the nonrandom association of alleles at different loci in haplotypes (Lewontin 1964), plays a central role in genome-wide association studies for. Two-stage sampling designs for gene association studies.
This reduces the expense and time of mapping genome areas. Several methods have been proposed for selecting sets of genetic markers that characterize the polymorphisms in a region of interest [9]. Targeting the most informative SNP loci supports the most efficient study designs. Consequently, a new generation of genotyping arrays is being developed, designed with tag single nucleotide polymorphisms (SNPs) to improve rare variant imputation.