Quick Start
The basic use of MODAS generally involves three steps: first, generate pseudo-genotype files; second, use the pseudo-genotype files to screen candidate genomic blocks; third, identify QTLs by SNP-based regional association analysis. In addition, Mendelian randomization (MR) analysis and GWAS visualization can be performed for specified traits. This part lists and briefly describes only the basic commands needed for a quick start; please refer to the tutorial for a more detailed description of the implementation principles and parameters of MODAS.
Input format
Genotype format
MODAS takes genotype files in plink-bed format as input. It also supports converting hapmap format to plink-bed format; for usage, refer to the subcommand genoidx.
Phenotype format
MODAS takes omics phenotype files in csv format as input. The first column contains the names of the inbred lines, and the first row contains the names of the phenotypes.
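As a rough illustration of the layout described above, the following sketch checks a phenotype table using only the Python standard library. The function name check_phenotype_csv, the treatment of blank or NA cells as missing values, and the sample data are assumptions made for this example; none of them come from MODAS itself.

```python
import csv
import io


def check_phenotype_csv(handle):
    """Return (line_names, trait_names) for a phenotype table.

    Assumed layout: the first row holds trait names, the first column
    holds inbred line names, and all other cells are numeric.
    """
    rows = [row for row in csv.reader(handle) if row]  # skip blank lines
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need a header row and at least one line with one trait")
    traits = rows[0][1:]  # the top-left corner cell is ignored
    lines = []
    for row in rows[1:]:
        if len(row) != len(rows[0]):
            raise ValueError(f"row {row[0]!r} has {len(row)} cells, expected {len(rows[0])}")
        for cell in row[1:]:
            # Treating blank and "NA" cells as missing is an assumption.
            if cell.strip() not in ("", "NA"):
                float(cell)  # raises ValueError for non-numeric values
        lines.append(row[0])
    if len(set(lines)) != len(lines):
        raise ValueError("duplicate inbred line names")
    return lines, traits


# Hypothetical example data, not taken from the MODAS sample files.
example = io.StringIO("line,trait_A,trait_B\nB73,1.2,3.4\nMo17,0.8,NA\n")
lines, traits = check_phenotype_csv(example)
print(lines, traits)
```

Running a check like this before starting an analysis can catch ragged rows or non-numeric cells early.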
New features in MODAS2
Identifying stress-responsive Molecular QTL through contrastive PCA and Two-Way ANOVA
MODAS2 uses the contrastive PCA algorithm to separate the stress-response effects of natural variation on molecular traits from its genetic effects. The separated stress-response effects are then used as stress response indices for identifying stress-responsive molecular QTL. After QTL are identified with these indices, the significance of the stress response at each QTL is further assessed by two-way ANOVA, which yields the final set of stress-responsive molecular QTL. MODAS2 uses the subcommand contrast to identify stress-responsive molecular QTL. The command line is as follows:
MODAS contrast -stress_phe example_data/test_salt.phe.csv -control_phe example_data/test_control.phe.csv -gwas -g example_data/example_geno.contrast -genome_cluster example_data/example_contrast.genome_cluster.csv -p 10 -o example
The contrast subcommand generates six files: example.scpca_pc.phe.csv, example.scpca_pc.beta_test.csv, example.scpca_pc.normalized_phe.csv, example.region_gwas_qtl_res.anova.csv, example.region_gwas_qtl_res.anova.sig.csv and example.region_gwas_bad_qtl_res.csv. These contain, respectively, the stress response indices of molecular traits, the statistical test results of the stress response indices, the normalized stress response indices of molecular traits, the stress response index QTL results including two-way ANOVA P values, the stress-responsive molecular QTL results, and the unreliable molecular QTL results.
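To make the idea of contrastive PCA more concrete, here is a minimal sketch of the plain algorithm: it finds directions that carry high variance in a target dataset (here, the stress condition) but low variance in a background dataset (here, the control condition). This is not the MODAS2 implementation, and the variant MODAS2 uses may differ. The function name, the alpha parameter, the matrix layout (rows as inbred lines, columns as molecular traits) and the simulated data are all assumptions.

```python
import numpy as np


def contrastive_pca(target, background, alpha=1.0, n_components=2):
    """Plain contrastive PCA.

    Takes the top eigenvectors of C_target - alpha * C_background, where
    both inputs have shape (samples, features). Returns the directions
    and the centered target data projected onto them.
    """
    target = np.asarray(target, dtype=float)
    background = np.asarray(background, dtype=float)
    t_centered = target - target.mean(axis=0)
    b_centered = background - background.mean(axis=0)
    c_target = np.cov(t_centered, rowvar=False)
    c_background = np.cov(b_centered, rowvar=False)
    contrast = c_target - alpha * c_background
    # eigh is appropriate because the contrast matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(contrast)
    order = np.argsort(eigvals)[::-1][:n_components]  # largest first
    directions = eigvecs[:, order]
    return directions, t_centered @ directions


# Simulated data (hypothetical): feature 0 varies strongly in both
# conditions, feature 1 varies strongly only under stress.
rng = np.random.default_rng(0)
control = rng.normal(size=(2000, 3)) * np.array([3.0, 0.5, 0.5])
stress = rng.normal(size=(2000, 3)) * np.array([3.0, 2.0, 0.5])
directions, scores = contrastive_pca(stress, control, n_components=1)
print(np.round(directions[:, 0], 2))
```

In this toy setting the leading contrastive direction should load mostly on feature 1, the only feature whose variance increases under stress. Ordinary PCA on the stress data alone would instead be dominated by feature 0.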
Multi-Trait QTL Colocalization Analysis Based on Image Matching Algorithms
QTL colocalization analysis is an effective method for integrating functional information across different traits. However, existing methods rarely perform multi-trait QTL colocalization analysis. MODAS2 employs an image matching algorithm to score the colocalization degree between pairs of QTLs, and then uses clustering algorithms to quickly achieve multi-trait QTL colocalization analysis. Multi-trait QTL colocalization analysis can be performed using the coloc subcommand of MODAS2. The command line is as follows:
MODAS coloc -qtl example_data/test_coloc.qtl.csv -g example_data/example_geno.contrast -gwas_dir example_data/gwas_coloc_test/ -p 6 -o example
The coloc subcommand generates three files: example.coloc_res.csv, example.coloc_pairwise.csv and example.dis_res.csv, which contain the results of the co-localized QTL clusters, the pairwise QTL co-localization results, and the similarity results between pairwise QTLs, respectively.
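The "score pairs, then cluster" idea can be sketched as follows. This is only an illustration: the clustering algorithm MODAS2 actually uses is not specified in this section, so the example uses simple connected components over pairs whose score passes a threshold. The function name, the threshold value and the scores are hypothetical.

```python
def cluster_qtls(pair_scores, threshold=0.8):
    """Group QTLs so that any two linked by a score >= threshold share a cluster.

    pair_scores maps (qtl_a, qtl_b) tuples to a similarity score.
    Implemented as connected components with a small union-find.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        root_a, root_b = find(a), find(b)  # also registers both QTLs
        if score >= threshold and root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for qtl in parent:
        clusters.setdefault(find(qtl), []).append(qtl)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise colocalization scores between five QTLs.
scores = {
    ("q1", "q2"): 0.93,
    ("q2", "q3"): 0.85,
    ("q3", "q4"): 0.10,
    ("q4", "q5"): 0.90,
}
print(cluster_qtls(scores))  # [['q1', 'q2', 'q3'], ['q4', 'q5']]
```

Connected components are the simplest choice here; a hierarchical or density-based method would behave differently when one weak link joins two otherwise separate groups.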
Note: Sample data for MODAS2 can be downloaded from Zenodo.
Modules in MODAS
Generate pseudo-genotype files
The pseudo-genotype files are generated by passing the -genome_cluster parameter to the subcommand genoidx. Since genotype files are relatively large, -p is generally added to enable multi-threading and speed up the generation of pseudo-genotype files.
MODAS genoidx -g ./chr_HAMP -genome_cluster -p 10 -o chr_HAMP
Prescreen candidate genomic regions for omics data
Candidate genomic regions for omics data/molecular traits (mTraits) are prescreened with the subcommand prescreen. In this step, the pseudo-genotype files are first used to perform association analysis on the mTraits through a linear model (LM). Then, the significantly associated genomic regions (SAGRs) for each mTrait are re-analyzed through a mixed linear model (MLM). Finally, the SAGRs that pass the MLM screening are used as the candidate genomic regions.
MODAS prescreen -g ./chr_HAMP -genome_cluster ./chr_HAMP.genome_cluster.csv -phe ./E3_log2.normalized_phe.csv -p 20 -o E3_log2
Perform regional association analysis to identify QTLs
QTL identification is performed by the subcommand regiongwas. In this step, the SNPs from the candidate genomic regions are extracted to perform regional association analysis through MLM, and the QTLs for each mTrait are identified according to the results of the SNP-based regional association analysis.
MODAS regiongwas -g ./chr_HAMP -phe ./E3_log2.sig_omics_phe.csv -phe_sig_qtl ./E3_log2.phe_sig_qtl.csv -p 20 -o E3_log2
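The three steps above (genoidx, prescreen, regiongwas) can be chained from a small script. The sketch below only builds and prints the commands; it does not run MODAS. It uses only the flags shown in the examples above. The intermediate file names (<geno>.genome_cluster.csv, <out>.sig_omics_phe.csv and <out>.phe_sig_qtl.csv) are inferred from those example commands and should be treated as an assumption, as should the function name.

```python
import shlex


def quick_start_commands(geno, phe, out, threads=10):
    """Return the genoidx, prescreen and regiongwas commands as argument lists.

    Intermediate file names are inferred from the example commands
    (an assumption), not taken from a documented naming rule.
    """
    p = str(threads)
    return [
        ["MODAS", "genoidx", "-g", geno, "-genome_cluster", "-p", p, "-o", geno],
        ["MODAS", "prescreen", "-g", geno,
         "-genome_cluster", f"{geno}.genome_cluster.csv",
         "-phe", phe, "-p", p, "-o", out],
        ["MODAS", "regiongwas", "-g", geno,
         "-phe", f"{out}.sig_omics_phe.csv",
         "-phe_sig_qtl", f"{out}.phe_sig_qtl.csv",
         "-p", p, "-o", out],
    ]


for cmd in quick_start_commands("chr_HAMP", "E3_log2.normalized_phe.csv", "E3_log2"):
    # Dry run: to execute, pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to confirm that each step reads the files written by the previous one before committing to a long run.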
Perform Mendelian randomization analysis
Mendelian randomization (MR) analysis is performed by the subcommand mr. In this step, the peak SNP of a QTL for each trait is used to infer the causal relationship between trait pairs. MR analysis in MODAS can be performed using either a linear model (LM) or a mixed linear model (MLM), with the parameters -lm and -mlm, respectively.
# lm model
MODAS mr -g chr_HAMP -exposure AMP_kernel_transcriptome_v4_FPKM_correct.sig_eqtl.qqnorm.csv -outcome blup_traits_final.new.csv -qtl AMP_kernel_transcriptome_qtl_res.csv -lm -o AMP_kernel_transcriptome_MR_lm
# mlm model
MODAS mr -g chr_HAMP -exposure AMP_kernel_transcriptome_v4_FPKM_correct.sig_eqtl.qqnorm.csv -outcome blup_traits_final.new.csv -qtl AMP_kernel_transcriptome_qtl_res.csv -mlm -o AMP_kernel_transcriptome_MR_mlm
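For readers new to MR, the sketch below shows the standard single-instrument Wald ratio: the effect of the SNP on the outcome divided by the effect of the SNP on the exposure. This section does not state which estimator MODAS uses internally, so treat this as a textbook illustration rather than a description of the mr subcommand. The function names and the toy data are hypothetical.

```python
def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def wald_ratio(genotype, exposure, outcome):
    """Causal effect estimate of exposure on outcome with one SNP as instrument."""
    beta_snp_exposure = ols_slope(genotype, exposure)
    beta_snp_outcome = ols_slope(genotype, outcome)
    return beta_snp_outcome / beta_snp_exposure


# Toy data (hypothetical): genotype coded 0/1/2 at the peak SNP,
# exposure driven by genotype, outcome equal to half the exposure.
genotype = [0, 0, 1, 1, 2, 2]
exposure = [1.0, 1.2, 3.1, 2.9, 5.0, 5.2]
outcome = [0.5 * e for e in exposure]
print(round(wald_ratio(genotype, exposure, outcome), 3))  # 0.5
```

The ratio is only meaningful when the SNP is clearly associated with the exposure; a slope near zero in the denominator makes the estimate unstable.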
Whole genome-wide association analysis and visualization
GWAS visualization is performed by the subcommand visual. In this step, MODAS takes the specified QTL and trait files as input, performs GWAS, generates whole-genome Manhattan plots, and displays the results in an HTML-based web page.
# visualization
MODAS visual -g chr_HAMP -phe E3_log2.normalized_phe.csv -qtl E3_log2.local_gwas_qtl_res.csv -p 6 -visual