GRAiN





Gene Regulation and Association Network

Introduction

#GObeyondGO
Gene-set enrichment analysis using annotation catalogs such as the Gene Ontology (GO) is the standard method for the functional interpretation of genomic experiments (e.g., RNA-seq outputs, GWAS loci, or QTL genes). However, functional annotation of crop genomes remains sparse and incomplete, because annotation protocols rely mainly on homology matches to model organisms. This also limits the interpretation of genes whose functions have evolved even though their sequence and protein structure were retained. Here, we developed the GRAiN framework to facilitate the functional interpretation of genomic experiments using gene regulatory networks in rice. The GRAiN server allows users to analyze the functional and regulatory features of a set of genes of interest. Input gene sets can be derived from RNA-seq experiments (e.g., top differentially expressed genes), from GWAS SNPs mapped to genomic loci, from genes within QTL regions, and so on.

The GRAiN algorithm first finds overlaps between the input gene set and network clusters, which were predicted from a large collection of datasets profiling gene expression under abiotic-stress conditions in rice. It then retrieves all clusters that are statistically over-represented in the input gene set, along with the functional and regulatory annotations of those clusters. This information is displayed as an interactive graph: a network containing the clusters enriched in the input set, the GO biological processes and Mapman pathways enriched within those clusters, their potential transcriptional regulators, and the cis-regulatory elements predicted in the promoters of cluster genes. See the manuscript for details on how the different node types are linked to each other.
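The over-representation step can be illustrated with a one-sided hypergeometric test, which is the test the Network Panel description names for overlap significance. The sketch below is a minimal, standard-library Python illustration, not GRAiN's actual code: the gene universe (here, all genes in the network), the function names `hypergeom_upper_tail` and `cluster_pvalues`, and the choice to test only clusters sharing at least one gene with the query are assumptions.

```python
from math import comb


def hypergeom_upper_tail(k: int, universe: int, cluster: int, query: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: `query` genes drawn from `universe`
    genes, of which `cluster` belong to the cluster being tested."""
    upper = min(cluster, query)
    hits = sum(comb(cluster, i) * comb(universe - cluster, query - i)
               for i in range(k, upper + 1))
    return hits / comb(universe, query)


def cluster_pvalues(query_genes, clusters, universe_genes):
    """Test every cluster that shares at least one gene with the query.

    Assumption: the universe is the set of all genes in the network.
    Returns {cluster_id: (overlap, cluster_size, p_value)}."""
    universe = set(universe_genes)
    query = set(query_genes) & universe
    results = {}
    for cluster_id, members in clusters.items():
        members = set(members) & universe
        overlap = len(query & members)
        if overlap == 0:
            continue  # no shared genes, nothing to test
        p = hypergeom_upper_tail(overlap, len(universe), len(members), len(query))
        results[cluster_id] = (overlap, len(members), p)
    return results
```

Each p-value is the chance of seeing at least the observed overlap if the query genes were drawn at random from the universe; the raw p-values still need multiple-testing correction before a cluster is called enriched.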

If no overlap between the input gene set and the network clusters is found on the first try, the GRAiN algorithm expands the input set with the genes' first-order neighbors in the underlying unclustered gene co-regulatory network. It then proceeds with the overlap analysis described above.
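A minimal sketch of this fallback, assuming the unclustered network is available as a Python dict mapping each gene to the set of its neighbors (a hypothetical representation). Following this paragraph, expansion here is triggered when no cluster overlaps the query; GRAiN may instead trigger it when no significant enrichment is found, as the upload instructions suggest. The helper names are illustrative only.

```python
def expand_with_neighbors(genes, adjacency):
    """Return `genes` plus all of their first-order neighbors."""
    expanded = set(genes)
    for gene in genes:
        expanded.update(adjacency.get(gene, ()))
    return expanded


def overlapping_clusters(genes, clusters):
    """IDs of clusters sharing at least one gene with `genes`."""
    genes = set(genes)
    return {cid for cid, members in clusters.items() if genes & set(members)}


def overlap_with_fallback(query_genes, clusters, adjacency):
    """Try the raw query first; expand by one network hop only if nothing overlaps.

    Returns (genes_used, overlapping_cluster_ids, was_expanded)."""
    query = set(query_genes)
    hits = overlapping_clusters(query, clusters)
    if hits:
        return query, hits, False
    expanded = expand_with_neighbors(query, adjacency)
    return expanded, overlapping_clusters(expanded, clusters), True
```

Only one hop is taken, so genes two or more edges away from every query gene are never pulled in.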

Alternatively, if there are no genes to input, users can browse pre-existing gene sets using the search box. In this case, no enrichment analysis is performed and only the network neighborhood is displayed.

Packages used:
Publication:

Data repo on Zenodo:

This app is developed using the Shiny platform.

Using GRAiN



  1. Submitting your query genes
    Click the ‘Gene list upload’ box, paste your query genes into the text box, and hit Submit. The input list should use MSU annotation IDs (LOC_OsXXgXXXXX); if this pattern is not found in the list, the input is rejected. Gene IDs can be separated by commas (,), semicolons (;), or new lines. The easiest way is to copy and paste a column from an Excel sheet. For best results, start with fewer genes (10-100), as GRAiN will expand the list if no enrichment is found on the first pass. If no enrichment is found even in the expanded gene set, try adding more genes to the input.

    Sidenote: If you have a large number of genes to query, try submitting a small subset first, and check whether the genes within the enriched clusters include genes from your original list that were left out of the query. Remember to refresh the app before submitting a new query.
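The input rules in step 1 and the subsampling check suggested in the sidenote can be sketched as follows. This is an illustration under assumptions, not the server's validation code: dropping tokens that are not MSU IDs (rather than rejecting the whole list), de-duplicating, rejecting transcript-level IDs with suffixes such as ".1", and the helper names `parse_gene_list`, `split_query`, and `held_out_in_clusters` are all hypothetical.

```python
import random
import re

# MSU rice locus IDs: "LOC_Os" + 2-digit chromosome + "g" + 5 digits
MSU_LOCUS = re.compile(r"LOC_Os\d{2}g\d{5}")


def parse_gene_list(text: str) -> list:
    """Split on commas, semicolons or new lines; keep unique MSU IDs in input order."""
    tokens = (t.strip() for t in re.split(r"[,;\n]", text))  # strip() also drops '\r'
    genes = [t for t in tokens if MSU_LOCUS.fullmatch(t)]
    if not genes:
        raise ValueError("no MSU locus IDs (LOC_OsXXgXXXXX) found in input")
    return list(dict.fromkeys(genes))  # de-duplicate, preserving order


def split_query(genes, query_size, seed=0):
    """Randomly pick `query_size` genes to submit; return (query, held_out)."""
    rng = random.Random(seed)
    query = rng.sample(genes, min(query_size, len(genes)))
    chosen = set(query)
    held_out = [g for g in genes if g not in chosen]
    return query, held_out


def held_out_in_clusters(held_out, enriched_clusters):
    """Held-out genes that appear in any enriched cluster ({cluster_id: genes})."""
    members = set().union(*enriched_clusters.values())
    return sorted(set(held_out) & members)
```

Held-out genes that turn up in the enriched clusters suggest the small query already captured the relevant network modules.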

  2. Interpreting GRAiN’s results

    1. Network Panel
      If the input is entered correctly, hitting the Submit button displays a network window. Depending on your internet speed, the backend calculations take about 7-12 seconds per query. Different node groups are colored differently in the resulting network. The red circles are the query clusters, i.e., the clusters most significantly enriched in the input genes. Cluster nodes (red), Gene Ontology process terms (light blue), and Mapman pathways (dark blue) are connected to each other if the overlap between the two sets is statistically significant according to a hypergeometric test (q-value < 0.05). Transcription factors (grey) are connected to clusters (red) based on the Jaccard index of overlap between the TFs' predicted targets and the cluster genes. TFs are connected to each other if the mutual information between their network connectivity profiles is high. Use the zoom buttons below the network, or scroll, to display node names and view denser parts of the network.
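The TF-to-cluster link can be illustrated with the Jaccard index, |A ∩ B| / |A ∪ B|, between a TF's predicted targets and a cluster's genes. The sketch below scores and ranks TFs for one cluster. The cutoff GRAiN uses for drawing an edge is not stated here, so the dict-of-sets input, the `top_k=10` default (chosen to mirror the Top Regulators column), and the assumption that those regulators are ranked by Jaccard score are all hypothetical.

```python
def jaccard(a, b) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_regulators(cluster_genes, tf_targets, top_k=10):
    """Rank TFs by Jaccard overlap between their predicted targets and the cluster.

    tf_targets: {tf_id: iterable of predicted target genes} (assumed format).
    Returns up to `top_k` (tf_id, score) pairs with score > 0, best first."""
    scores = {tf: jaccard(targets, cluster_genes) for tf, targets in tf_targets.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(tf, s) for tf, s in ranked[:top_k] if s > 0]
```

Because the Jaccard index divides by the union, a TF with a very large target list is penalized even if it covers the whole cluster, which favors TFs whose targets are specific to that cluster.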


    2. Table Panel

      Hitting the Submit button also displays a table window. This table lists all clusters found enriched in the input gene set or, if no significant enrichment was found for the input genes, in their first-order neighbors. The Size column shows the total number of genes in the cluster. The Overlap column shows the overlap between the input genes and the cluster. The adjusted p-value is the hypergeometric-test p-value corrected for multiple hypothesis testing. The GO process and Mapman pathway columns show the biological processes and pathways over-represented in the cluster. The CRE column shows cis-regulatory elements found in the cluster by de novo analysis of DNA sequence motifs in the 1000 bp promoters of cluster genes. The Top Regulators column shows the top 10 TFs predicted as regulators of genes within the cluster.
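The text does not name the multiple-testing correction behind the adjusted p-value. The Benjamini-Hochberg procedure, which produces q-values, is a common choice and is assumed in the sketch below. It turns hypothetical per-cluster results of the form {cluster_id: (overlap, size, p_value)} into rows resembling the Size, Overlap, and adjusted p-value columns, keeping clusters below the 0.05 threshold mentioned for the network view.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min  # enforces monotonicity across ranks
    return adjusted


def enrichment_table(stats, alpha=0.05):
    """Build table rows from {cluster_id: (overlap, size, p_value)}.

    Keeps clusters whose adjusted p-value is below `alpha`, most significant first."""
    ids = list(stats)
    qvalues = benjamini_hochberg([stats[c][2] for c in ids])
    rows = [
        {"cluster": c, "size": stats[c][1], "overlap": stats[c][0], "adj_p": q}
        for c, q in zip(ids, qvalues)
        if q < alpha
    ]
    return sorted(rows, key=lambda row: row["adj_p"])
```

Which set of tests the correction spans (all clusters, or only those overlapping the query) changes the adjusted values; the sketch corrects over whatever clusters are passed in.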