Multi-threshold (and Multi-marker) Association Study Analysis: MASA
Incorporating prior information into association studies in an optimal manner
MASA is a new association testing method that incorporates prior information to increase power. MASA is closely related to the concept of multiple testing. To obtain p-values corrected for multiple testing, the Bonferroni correction is usually used. However, this is equivalent to treating every test equally. In practice, markers differ: for example, some markers are more likely to be causal because they are proximal to functional elements, and some markers are in tight LD with many putative causal variants. By taking this prior information into account, we can increase power to detect causal variants (Eskin 2008). This is equivalent to varying the significance threshold at each marker depending on the prior information for that marker (multi-thresholding).
Moreover, extending that concept, we can devise a new multi-marker test (Darnell 2012). Our multivariate-normal (MVN)-based test is fundamentally different from traditional tests in that it is applied to all putative causal variants, such as all known variants in HapMap, not only to the collected markers. Again, the multi-thresholding technique is applied to optimally incorporate prior information.
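The multi-thresholding idea described above can be illustrated with a small sketch. This is not the MASA algorithm: MASA chooses thresholds to maximize power, whereas the sketch below simply splits the overall significance level across markers in proportion to a prior weight (a weighted Bonferroni-style allocation). The function names and the prior weights are hypothetical and chosen only for illustration.

```python
# Illustration only: a weighted Bonferroni-style allocation, NOT the
# power-optimizing procedure used by MASA. All names are hypothetical.

def bonferroni_thresholds(num_markers, alpha=0.05):
    """Uniform per-marker thresholds: every test is treated equally."""
    return [alpha / num_markers] * num_markers


def weighted_thresholds(priors, alpha=0.05):
    """Per-marker thresholds proportional to prior weights.

    The thresholds still sum to alpha, so the overall significance
    level is the same as with the uniform Bonferroni correction.
    """
    total = sum(priors)
    if total <= 0:
        raise ValueError("prior weights must sum to a positive value")
    return [alpha * p / total for p in priors]


# Made-up prior weights: the third marker is assumed to be the most
# likely to be causal (e.g. close to a functional element).
priors = [1.0, 1.0, 4.0, 2.0]
uniform = bonferroni_thresholds(len(priors))
weighted = weighted_thresholds(priors)

for i, (u, w) in enumerate(zip(uniform, weighted)):
    print(f"marker {i}: uniform={u:.5f} weighted={w:.5f}")
```

Markers with a larger prior weight receive a more lenient threshold and markers with a smaller weight a stricter one, while the total across all markers is unchanged.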
usage: java -jar Masa.jar [options]
  -cohort <FILE>                   Cohort file (case/control data) in Beagle format
  -maf_threshold <FLOAT>           Remove SNPs of MAF below this threshold both in reference and cohort (default=0.01)
  -marker <FILE>                   Marker file in Beagle marker file format
  -method <FILE>                   Multi-threshold association testing method ('eskin' or 'mvn') (default=eskin)
  -mvn_max_num_proxy <INT>         In MVN method, maximum number of proxies per tested putative causal SNP (default=20)
  -mvn_proxy_r_threshold <FLOAT>   In MVN method, select nearby SNP only if |r| is above this value (default=0.3)
  -out <FILE>                      Output file prefix (default='outFile')
  -permute <NUM>                   Perform permutation <NUM> times instead of assuming independent markers (required for MVN method)
  -prior <FILE>                    Prior information file
  -reference <FILE>                Phased reference data haplotype file in Beagle format
  -relative_risk <FLOAT>           Prior information of target relative risk (default=1.2)
  -seed <INT>                      Random number generator seed (default=0)
  -window <SIZE>                   Number of nearby SNPs to look up tags (default = 100)
java -jar Masa.jar -reference ENCODEbeagle/ENm010.CEU.beagle -marker ENCODEbeagle/ENm010.CEU.marker -cohort cohort.beagle
java -jar Masa.jar -reference ENCODEbeagle/ENm010.CEU.beagle -marker ENCODEbeagle/ENm010.CEU.marker -cohort cohort.beagle -method mvn -permute 10000 -out myoutputfile
usage: java -jar SimulateCohort.jar [options]
  -cohort_param <#CASE #CONTROL #COHORT>   Case size, control size, and number of cohorts
  -maf_threshold <FLOAT>                   Minimum MAF of a causal SNP that will be randomly selected (default=0.1)
  -marker <FILE>                           Marker file in Beagle marker file format
  -out <FILE>                              Output file (default='cohort')
  -reference <FILE>                        Phased reference data haplotype file in Beagle format
  -relative_risk <FLOAT>                   Relative risk of causal SNP to simulate (default=1.2)
  -seed <INT>                              Random number generator seed (default=0)
  -tag <FILE>                              Tag file including rsids of tag SNPs
java -jar SimulateCohort.jar -reference ENCODEbeagle/ENm010.CEU.beagle -marker ENCODEbeagle/ENm010.CEU.marker -tag ENCODEbeagle/ENm010.CEU.tag -cohort_param 2000 2000 10
# Simulation cohort in Beagle format
# Relative risk assumed: 1.500000
# Causal SNP assumed: rs28357162 (MAF: 0.983333, Index: 348)
# Base_position: 27022619
# Num of cases: 1000, Num of controls: 1000
I id IND0 IND0 IND1 IND1 IND2 IND2 IND3 ...
A disease 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ...
M rs2462910 A C A A C C C A C A C C A C ...
M rs774257 G G G G A A A G G G G G G G G ...
M rs774245 A A A A G G G A A A A A A A A ...
M rs774246 A A A A G G G A A A A A A A A ...
.........
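To inspect a simulated cohort file like the one above, a minimal reader can be sketched as follows. It assumes only what the sample output shows: comment lines start with "#", and each data line starts with a record type ("I" for identifiers, "A" for an affection/trait row, "M" for a marker), followed by a name and whitespace-separated values. The function names and the toy input are hypothetical, and the MAF helper assumes biallelic markers with no missing-data codes.

```python
from collections import Counter

# Minimal sketch of a reader for the layout shown in the sample output.
# Function names and the toy data below are hypothetical.


def parse_beagle(lines):
    """Split lines into comments, sample ids, trait rows and marker rows."""
    comments, ids, traits, markers = [], [], {}, {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            comments.append(line)
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # skip lines that do not look like a record
        kind, name, values = fields[0], fields[1], fields[2:]
        if kind == "I":
            ids = values
        elif kind == "A":
            traits[name] = values
        elif kind == "M":
            markers[name] = values
    return comments, ids, traits, markers


def minor_allele_frequency(alleles):
    """Frequency of the rarest allele; 0.0 if the marker is monomorphic.

    Assumes a biallelic marker and no missing-data codes in the row.
    """
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0
    return min(counts.values()) / sum(counts.values())


# Toy input (made up, two individuals with two allele columns each).
sample = [
    "# toy cohort",
    "I id IND0 IND0 IND1 IND1",
    "A disease 2 2 1 1",
    "M rs_toy1 A C A A",
    "M rs_toy2 G G G G",
]

comments, ids, traits, markers = parse_beagle(sample)
mafs = {name: minor_allele_frequency(a) for name, a in markers.items()}
for name, maf in mafs.items():
    print(name, maf)
```

For a real file, the same function can be called on an open file handle, for example `parse_beagle(open("cohort.beagle"))`, where the file name is only a placeholder.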
Gregory Darnell, Dat Duong, Buhm Han, Eleazar Eskin. “Incorporating prior information into association studies.” Bioinformatics (2012) 28 (12): i147-i153. Also in Proceedings of the Twentieth Annual Conference on Intelligent Systems for Molecular Biology (ISMB-2012).
Eleazar Eskin. “Increasing power in association studies by using linkage disequilibrium structure and molecular function as prior information.” Genome Research (2008) 18:653-660.
Buhm Han : buhmhan (AT) broadinstitute (DOT) org
Gregory Darnell : gbd343 (AT) gmail (DOT) com
G.D., D.D., B.H. and E.E. are supported by National Science Foundation grants 0513612, 0731455, 0729049, 0916676 and 1065276, and National Institutes of Health grants K25-HL080079, U01-DA024417, P01-HL30568 and PO1-HL28481. B.H. is supported by the Samsung Scholarship.