The basic use of MODAS generally includes three steps: first, generate pseudo-genotype files; second, use the pseudo-genotype files to screen candidate genomic blocks; third, identify QTLs by SNP-based regional association analysis. In addition, Mendelian randomization (MR) analysis and GWAS visualization can be performed for specified traits. This part lists only the basic commands, with brief descriptions, for a quick start. Please refer to the tutorial part for a more detailed description of the implementation principle and parameters of MODAS.
MODAS takes genotype files in plink-bed format as input. It also supports the conversion of hapmap format to plink-bed format. For usage, refer to the subcommand
MODAS takes omics phenotype files in csv format as input. The first column contains the names of the inbred lines, and the first row contains the names of the phenotypes.
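The phenotype layout described above can be sketched with Python's standard csv module. This is only an illustration of the table shape: the line names, trait names, the label of the top-left cell ("line") and both helper functions are hypothetical, and MODAS itself may impose further requirements that are not covered here.

```python
import csv
import io


def write_phenotype_csv(handle, trait_names, values_by_line):
    """Write a phenotype table: header row of trait names, one row per inbred line."""
    writer = csv.writer(handle, lineterminator="\n")
    # First row: an ID column label (assumed) followed by the phenotype names.
    writer.writerow(["line"] + list(trait_names))
    for line_name, values in values_by_line.items():
        if len(values) != len(trait_names):
            raise ValueError(f"{line_name}: expected {len(trait_names)} values")
        # First column: the inbred line name, then one value per phenotype.
        writer.writerow([line_name] + list(values))


def read_phenotype_csv(handle):
    """Return (trait_names, {line_name: [float, ...]}) from a phenotype table."""
    rows = list(csv.reader(handle))
    trait_names = rows[0][1:]
    table = {row[0]: [float(v) for v in row[1:]] for row in rows[1:] if row}
    return trait_names, table


# Hypothetical lines and traits, for illustration only.
buffer = io.StringIO()
write_phenotype_csv(
    buffer,
    ["trait_A", "trait_B"],
    {"line_001": [1.25, 3.5], "line_002": [0.75, 2.0]},
)
buffer.seek(0)
traits, table = read_phenotype_csv(buffer)
print(traits, table["line_002"])
```

Reading the file back before running MODAS is a cheap way to catch rows with a missing or extra column.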
Generate pseudo-genotype files
The pseudo-genotype files are generated with the -genome_cluster parameter of the subcommand genoidx. Since the genotype files are relatively large, -p is generally added to enable multi-threading and speed up the generation of pseudo-genotype files.
MODAS.py genoidx -g ./chr_HAMP -genome_cluster -p 10 -o chr_HAMP
Prescreen candidate genomic regions for omics data
Prescreening of candidate genomic regions for omics data/molecular traits (mTraits) is performed by the subcommand prescreen. In this step, the pseudo-genotype files are first used to perform association analysis on the mTraits through a linear model (LM), general linear model (GLM) or mixed linear model (MLM). Then, the significantly associated genomic regions (SAGRs) for each mTrait are used as the candidate genomic regions.
MODAS.py prescreen -g ./chr_HAMP -genome_cluster ./chr_HAMP.genome_cluster.csv -phe ./E3_log2.normalized_phe.csv -gwas_model MLM -p 20 -o E3_log2
Perform regional association analysis to identify QTLs
QTL identification is performed by the subcommand regiongwas. In this step, the SNPs from the candidate genomic regions are extracted to perform regional association analysis through LM, GLM or MLM, and the QTLs for each mTrait are identified according to the results of the SNP-based regional association analysis.
MODAS.py regiongwas -g ./chr_HAMP -phe ./E3_log2.sig_omics_phe.csv -phe_sig_qtl ./E3_log2.phe_sig_qtl.csv -gwas_model MLM -p 20 -o E3_log2
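The three steps above can be chained from a script. The following is a minimal sketch that only assembles the command lines, using the subcommands and parameters shown in this section. The function name is hypothetical, and the intermediate file names (the .genome_cluster.csv, .sig_omics_phe.csv and .phe_sig_qtl.csv suffixes) are assumptions inferred from the example commands rather than documented guarantees, so check them against the files MODAS actually writes.

```python
import shlex


def quickstart_commands(geno_prefix, phe_csv, out_prefix, model="MLM", threads=20):
    """Build the genoidx, prescreen and regiongwas commands as argument lists."""
    # Assumed output names, inferred from the example commands in this section.
    cluster_csv = f"{geno_prefix}.genome_cluster.csv"  # written by genoidx
    sig_phe_csv = f"{out_prefix}.sig_omics_phe.csv"    # written by prescreen
    sig_qtl_csv = f"{out_prefix}.phe_sig_qtl.csv"      # written by prescreen
    p = str(threads)
    return [
        # Step 1: generate pseudo-genotype files (output prefix assumed equal to the input prefix).
        ["MODAS.py", "genoidx", "-g", geno_prefix, "-genome_cluster",
         "-p", p, "-o", geno_prefix],
        # Step 2: prescreen candidate genomic regions with the pseudo-genotypes.
        ["MODAS.py", "prescreen", "-g", geno_prefix, "-genome_cluster", cluster_csv,
         "-phe", phe_csv, "-gwas_model", model, "-p", p, "-o", out_prefix],
        # Step 3: SNP-based regional association analysis to identify QTLs.
        ["MODAS.py", "regiongwas", "-g", geno_prefix, "-phe", sig_phe_csv,
         "-phe_sig_qtl", sig_qtl_csv, "-gwas_model", model, "-p", p, "-o", out_prefix],
    ]


commands = quickstart_commands("./chr_HAMP", "./E3_log2.normalized_phe.csv", "E3_log2")
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) so that the script stops at the first failing step instead of running later steps on missing inputs.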
Perform Mendelian randomization analysis
Mendelian randomization (MR) analysis is performed by the subcommand mr. In this step, the peak SNP of a QTL for each trait is used to infer the causal relationship between trait pairs. MR analysis in MODAS can be performed using either a linear model (LM) or a mixed linear model (MLM), selected with the -lm and -mlm parameters as in the commands below.
# lm model
MODAS.py mr -g chr_HAMP -exposure AMP_kernel_transcriptome_v4_FPKM_correct.sig_eqtl.qqnorm.csv -outcome blup_traits_final.new.csv -qtl AMP_kernel_transcriptome_qtl_res.csv -lm -o AMP_kernel_transcriptome_MR_lm
# mlm model
MODAS.py mr -g chr_HAMP -exposure AMP_kernel_transcriptome_v4_FPKM.sig_eqtl.qqnorm.csv -outcome blup_traits_final.new.csv -qtl AMP_kernel_transcriptome_qtl_res.csv -mlm -o AMP_kernel_transcriptome_MR_mlm
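To give an intuition for how a single peak SNP can serve as the instrument between an exposure trait and an outcome trait, the sketch below computes the textbook Wald ratio: the SNP effect on the outcome divided by the SNP effect on the exposure. It is a generic single-instrument estimator on made-up data, not the estimator implemented in MODAS, whose LM and MLM procedures are described in the tutorial part. All function names and values are hypothetical.

```python
def ols_slope(x, y):
    """Slope of the least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def wald_ratio(snp, exposure, outcome):
    """Single-instrument MR estimate of the exposure -> outcome effect."""
    beta_exposure = ols_slope(snp, exposure)  # SNP effect on the exposure trait
    beta_outcome = ols_slope(snp, outcome)    # SNP effect on the outcome trait
    return beta_outcome / beta_exposure


# Toy data: genotype of the peak SNP coded 0/2 for six inbred lines.
snp = [0, 0, 2, 2, 0, 2]
exposure = [1.0, 1.2, 3.0, 3.2, 0.8, 2.8]
# The outcome is constructed as an exact linear function of the exposure,
# so the estimated effect should recover the slope used here (0.5).
outcome = [0.5 * e + 1.0 for e in exposure]

effect = wald_ratio(snp, exposure, outcome)
print(round(effect, 6))
```

The ratio is only meaningful when the SNP is clearly associated with the exposure; a near-zero denominator makes the estimate unstable.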
Whole genome-wide association analysis and visualization
GWAS visualization is performed by the subcommand visual. In this step, MODAS takes the specified QTL and trait files as inputs, performs GWAS, generates whole-genome Manhattan plots, and displays the results in an HTML-based web page.
# visualization
MODAS.py visual -g chr_HAMP -phe E3_log2.normalized_phe.csv -qtl E3_log2.local_gwas_qtl_res.csv -gwas_model gemma_MLM -p 6 -visual -anno maize_genefunc.txt -o E3_log2_visual