Quick Start

A basic MODAS analysis consists of three steps. First, generate pseudo-genotype files. Second, use the pseudo-genotype files to prescreen candidate genomic regions. Third, identify QTLs by SNP-based regional association analysis. MODAS can also perform Mendelian randomization (MR) analysis and GWAS visualization for specified traits. This section lists only the basic commands, with brief descriptions, to get you started. Refer to the tutorial for a more detailed description of the implementation principles and parameters of MODAS.

Input format

Genotype format

MODAS takes genotype files in PLINK binary (bed) format as input. It can also convert HapMap-format genotypes to PLINK bed format. For usage, refer to the genoidx subcommand.
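
A PLINK binary fileset is made up of three files that share one prefix: .bed (genotype calls), .bim (variant information) and .fam (sample information). The commands in this section pass only the prefix (for example, -g ./chr_HAMP). The sketch below is a minimal pre-flight check. It assumes that -g expects such a prefix, which is what the examples suggest, and it uses a hypothetical helper, missing_plink_files, to confirm that all three files are present.

```python
from pathlib import Path

# A PLINK binary fileset consists of these three files sharing a common prefix.
PLINK_SUFFIXES = (".bed", ".bim", ".fam")


def missing_plink_files(prefix: str) -> list:
    """Return the fileset members missing for `prefix` (hypothetical helper).

    Assumes `prefix` is the value passed to MODAS via -g, e.g. "./chr_HAMP".
    """
    return [prefix + suffix for suffix in PLINK_SUFFIXES
            if not Path(prefix + suffix).is_file()]


if __name__ == "__main__":
    missing = missing_plink_files("./chr_HAMP")
    if missing:
        print("Missing PLINK files:", ", ".join(missing))
    else:
        print("PLINK fileset complete")
```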

Phenotype format

MODAS takes omics phenotype files in CSV format as input. The first column contains the names of the inbred lines, and the first row contains the names of the phenotypes.
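
The sketch below illustrates this layout. It uses Python's standard csv module to write a small phenotype table and then read it back. The line names (L1, L2, L3), the trait names (mTrait_1, mTrait_2) and the values are placeholders. Two details are not covered by the description above and are assumptions in this sketch: the first cell is left empty, and no values are missing.

```python
import csv
import io

# Placeholder data: rows are inbred lines, columns are mTraits.
lines = ["L1", "L2", "L3"]
traits = ["mTrait_1", "mTrait_2"]
values = [[1.2, 0.5], [0.8, 1.9], [2.3, 0.1]]


def write_phenotype_csv(handle, lines, traits, values):
    """Write a table whose first row holds trait names and first column holds line names."""
    writer = csv.writer(handle)
    writer.writerow([""] + traits)  # empty top-left cell (assumed)
    for name, row in zip(lines, values):
        writer.writerow([name] + row)


def read_phenotype_csv(handle):
    """Read the table back into {line_name: {trait_name: value}}."""
    reader = csv.reader(handle)
    header = next(reader)[1:]
    return {row[0]: dict(zip(header, map(float, row[1:]))) for row in reader}


buffer = io.StringIO()
write_phenotype_csv(buffer, lines, traits, values)
buffer.seek(0)
table = read_phenotype_csv(buffer)
print(table["L2"]["mTrait_2"])
```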

Generate pseudo-genotype files

Pseudo-genotype files are generated by the genoidx subcommand with the -genome_cluster option. Because genotype files are usually large, -p is generally added to set the number of threads and speed up generation.

MODAS.py genoidx -g ./chr_HAMP -genome_cluster -p 10 -o chr_HAMP
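
When you script this step, you can derive the thread count from the machine rather than hard-coding 10. The sketch below only assembles the argument list from the flags shown above, using a hypothetical helper called genoidx_command. It also assumes the output is written to `<-o value>.genome_cluster.csv`. That naming is inferred from the file that prescreen reads below (./chr_HAMP.genome_cluster.csv) and is not documented behavior.

```python
import os
import shlex


def genoidx_command(geno_prefix, out_prefix, threads=None):
    """Build the genoidx call shown above and the file it is expected to produce.

    Hypothetical helper; uses only the flags from the example command.
    """
    if threads is None:
        # Leave one core free; os.cpu_count() may return None.
        threads = max(1, (os.cpu_count() or 2) - 1)
    cmd = ["MODAS.py", "genoidx", "-g", geno_prefix, "-genome_cluster",
           "-p", str(threads), "-o", out_prefix]
    # Assumed output name, matching the -genome_cluster input used by prescreen.
    expected_output = f"{out_prefix}.genome_cluster.csv"
    return cmd, expected_output


cmd, expected = genoidx_command("./chr_HAMP", "chr_HAMP", threads=10)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```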

Prescreen candidate genomic regions for omics data

Candidate genomic regions for omics data/molecular traits (mTraits) are prescreened with the prescreen subcommand. In this step, the pseudo-genotypes are first tested for association with each mTrait. The test uses a linear model (LM), general linear model (GLM) or mixed linear model (MLM), selected with -gwas_model. The significantly associated genomic regions (SAGRs) of each mTrait are then taken as its candidate genomic regions.

MODAS.py prescreen -g ./chr_HAMP -genome_cluster ./chr_HAMP.genome_cluster.csv -phe ./E3_log2.normalized_phe.csv -gwas_model MLM -p 20 -o E3_log2
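
The sketch below gives a feel for what the LM option tests. It fits the simplest version of such a model: an ordinary least-squares regression of one mTrait on one marker, returning the effect size and its t statistic. This is a generic illustration and not MODAS's implementation. It includes no covariates (which a GLM typically adds, for example for population structure) and no kinship random effect (as in an MLM). The 0/1/2 marker coding and the function name lm_association are assumptions.

```python
import math


def lm_association(genotype, phenotype):
    """OLS regression of one trait on one marker; returns (beta, t statistic).

    Hypothetical helper. genotype: numeric codes per line (0/1/2 assumed);
    phenotype: trait values for the same lines, in the same order.
    """
    n = len(genotype)
    if n != len(phenotype) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(genotype) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in genotype)
    if sxx == 0:
        raise ValueError("marker is monomorphic; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotype, phenotype))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # Residual variance uses n - 2 degrees of freedom (intercept and slope).
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(genotype, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = math.copysign(math.inf, beta) if se == 0 else beta / se
    return beta, t_stat


# Hypothetical data: six lines, marker coded 0/1/2, one mTrait.
beta, t_stat = lm_association([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.1, 1.9, 3.0, 3.2])
print(f"beta = {beta:.2f}, t = {t_stat:.1f}")
```

A genome-wide scan repeats this test for every marker and trait pair. Each t statistic is then converted to a p-value using a t distribution with n - 2 degrees of freedom.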

Perform regional association analysis to identify QTLs

QTLs are identified with the regiongwas subcommand. In this step, the SNPs within the candidate genomic regions are extracted for regional association analysis using LM, GLM or MLM. The QTLs of each mTrait are then identified from the results of this SNP-based regional association analysis. In the example below, the -phe and -phe_sig_qtl inputs are files written by the prescreen step above (output prefix E3_log2).

MODAS.py regiongwas -g ./chr_HAMP -phe ./E3_log2.sig_omics_phe.csv -phe_sig_qtl ./E3_log2.phe_sig_qtl.csv -gwas_model MLM -p 20 -o E3_log2
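
The extraction part of this step amounts to keeping only the SNPs that fall inside a trait's candidate regions. The sketch below shows that filter applied to PLINK .bim records, which have six columns: chromosome, SNP ID, genetic distance, base-pair position and two alleles. Two things are assumptions made for illustration: the candidate regions are represented as (chromosome, start, end) tuples with inclusive bounds, and the function name snps_in_regions is hypothetical. Neither reflects the layout of the -phe_sig_qtl file.

```python
from collections import defaultdict


def snps_in_regions(bim_lines, regions):
    """Return IDs of SNPs whose position lies inside any candidate region.

    Hypothetical helper. bim_lines: whitespace-separated PLINK .bim records.
    regions: (chrom, start, end) tuples with inclusive bounds (assumed format).
    """
    regions_by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        regions_by_chrom[str(chrom)].append((start, end))
    selected = []
    for line in bim_lines:
        chrom, snp_id, _cm, bp, _a1, _a2 = line.split()
        pos = int(bp)
        if any(start <= pos <= end for start, end in regions_by_chrom.get(chrom, ())):
            selected.append(snp_id)
    return selected


# Hypothetical .bim records and candidate regions.
bim = ["1 snp1 0 1500 A G", "1 snp2 0 5200 C T",
       "2 snp3 0 800 G A", "2 snp4 0 12000 T C"]
regions = [("1", 1000, 2000), (2, 10000, 15000)]
print(snps_in_regions(bim, regions))
```

Association tests are then run only on the selected SNPs. This means far fewer SNPs are tested than in a genome-wide scan.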

Perform Mendelian randomization analysis

Mendelian randomization (MR) analysis is performed with the mr subcommand. In this step, the peak SNP of a QTL for each trait serves as the instrument for inferring the causal relationship between pairs of traits (exposure and outcome). MR analysis in MODAS can use either a linear model (LM) or a mixed linear model (MLM), selected with -lm or -mlm, respectively.

# lm model
MODAS.py mr -g chr_HAMP -exposure AMP_kernel_transcriptome_v4_FPKM_correct.sig_eqtl.qqnorm.csv -outcome blup_traits_final.new.csv -qtl AMP_kernel_transcriptome_qtl_res.csv -lm  -o AMP_kernel_transcriptome_MR_lm

# mlm model
MODAS.py mr -g chr_HAMP -exposure AMP_kernel_transcriptome_v4_FPKM.sig_eqtl.qqnorm.csv -outcome blup_traits_final.new.csv -qtl AMP_kernel_transcriptome_qtl_res.csv -mlm  -o AMP_kernel_transcriptome_MR_mlm
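
With a single instrument SNP, the standard MR estimate is the Wald ratio: the SNP's effect on the outcome divided by its effect on the exposure. Its first-order standard error is se_outcome / |beta_exposure|, which ignores the uncertainty in beta_exposure. The sketch below computes this estimate from per-SNP effect estimates, which could come from LM or MLM fits. The helper name wald_ratio is hypothetical. The mr subcommand's output statistic is not described here, so treat its equivalence to this estimate as an assumption.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument MR estimate, its first-order SE and a two-sided p-value.

    Hypothetical helper; not necessarily the statistic MODAS reports.
    """
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    if se_outcome <= 0:
        raise ValueError("se_outcome must be positive")
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    # Normal approximation: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return estimate, se, p_value


# Hypothetical effects of one peak SNP on gene expression (exposure) and a trait (outcome).
est, se, p = wald_ratio(beta_exposure=0.5, beta_outcome=0.2, se_outcome=0.05)
print(f"causal effect = {est:.2f} (SE {se:.2f}), p = {p:.2g}")
```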

Genome-wide association analysis and visualization

GWAS visualization is performed with the visual subcommand. In this step, MODAS takes the specified QTL and trait files as input and performs GWAS. It then generates genome-wide Manhattan plots and displays the results on an HTML-based web page.

# visualization
MODAS.py visual -g  chr_HAMP -phe E3_log2.normalized_phe.csv -qtl E3_log2.local_gwas_qtl_res.csv -gwas_model gemma_MLM -p 6 -visual -anno maize_genefunc.txt -o E3_log2_visual
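
A Manhattan plot places every SNP on a single genome-wide x axis and plots -log10(p) on the y axis. The sketch below shows the usual coordinate transform, in which each chromosome is offset by the summed lengths of the chromosomes before it. It illustrates the plot type only, since the layout produced by the visual subcommand is not described here. Approximating each chromosome's length by its largest observed SNP position is an assumption, and manhattan_points is a hypothetical name.

```python
import math


def manhattan_points(results):
    """Map (chrom, bp, p_value) tuples to (x, y) Manhattan plot coordinates.

    Hypothetical helper. Chromosomes are laid end to end in the order they first
    appear; each one's length is approximated by its largest SNP position
    (an assumption). p-values must be > 0.
    """
    chrom_len = {}
    for chrom, bp, _p in results:
        chrom_len[chrom] = max(chrom_len.get(chrom, 0), bp)
    offsets, running = {}, 0
    for chrom, length in chrom_len.items():  # dicts preserve insertion order
        offsets[chrom] = running
        running += length
    return [(offsets[chrom] + bp, -math.log10(p)) for chrom, bp, p in results]


# Hypothetical GWAS results: (chromosome, position, p-value).
results = [("1", 100, 0.5), ("1", 300, 1e-8), ("2", 50, 0.01)]
for x, y in manhattan_points(results):
    print(x, round(y, 2))
```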