Annotations for use with MISO and sashimi_plot

This page contains links to GFF annotations for use with MISO and sashimi_plot. The GFF annotation format, and how MISO uses it, are described in detail in the MISO manual.

Exon-centric annotations

These annotations include GFF files (.gff3 extension) that can be used with MISO.

Exon-centric annotations for human and mouse genomes

Version 1 of the human/mouse annotations (compiled 2008):

These contain annotations of:

  1. Skipped exons (SE)
  2. Alternative 3’/5’ splice sites (A3SS, A5SS)
  3. Mutually exclusive exons (MXE)
  4. Tandem 3’ UTRs (TandemUTR)
  5. Retained introns (RI)
  6. Alternative first exons (AFE)
  7. Alternative last exons (ALE)

Version 1 of the annotations for human and mouse genomes was derived by Wang et al. (2008), who used ESTs and several annotation databases (such as Ensembl, UCSC and AceView) to define alternative splicing events. Briefly, a splicing event was considered alternative if several ESTs supported it. Alternative tandem 3’ UTRs (TandemUTR events) were derived from PolyA DB.

Note that Version 1 of the annotations was originally made for mm9 and hg18. The mm10 and hg19 annotations were made by mapping coordinates from mm9 to mm10 and from hg18 to hg19 with UCSC’s liftOver utility.

Warning

The lifted-over Version 1 annotations for mm10/hg19 retain the ID entries from the mm9/hg18 GFF files. However, the genomic coordinates, which are the only part MISO reads, have been lifted over to the newer genome assemblies. The ID value used in the GFF is arbitrary and is ignored by MISO; it only encodes the hierarchy of genes, mRNAs and exons in each gene model. Also note that lifting over is an imperfect process: not all events can be fully lifted over.
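In GFF3 the gene/mRNA/exon hierarchy is expressed through the ID and Parent attributes. The following is a minimal sketch of that linkage, not MISO's own GFF reader. It assumes one Parent per feature and uses a made-up skipped-exon event (IDs ev1, ev1.A, ev1.B are hypothetical). The ID strings serve only as link keys and are never interpreted.

```python
from collections import defaultdict


def parse_attributes(field):
    """Split a GFF3 column-9 string like 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:  # skips empty pieces, e.g. from a trailing ';'
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def build_hierarchy(gff_lines):
    """Return {gene_id: {mrna_id: [(start, end), ...]}} from GFF3 lines.

    Assumption: each mRNA/exon has a single Parent (GFF3 also allows
    comma-separated multiple parents, which this sketch does not handle).
    """
    genes = []
    mrna_to_gene = {}
    exons = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        feature, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        if feature == "gene":
            genes.append(attrs["ID"])
        elif feature == "mRNA":
            mrna_to_gene[attrs["ID"]] = attrs["Parent"]
        elif feature == "exon":
            exons[attrs["Parent"]].append((start, end))
    hierarchy = {gene: {} for gene in genes}
    for mrna, gene in mrna_to_gene.items():
        hierarchy.setdefault(gene, {})[mrna] = sorted(exons.get(mrna, []))
    return hierarchy


# Hypothetical skipped-exon event: isoform A includes the middle exon, B skips it.
example = [
    "chr1\tSE\tgene\t100\t900\t.\t+\t.\tID=ev1;Name=ev1",
    "chr1\tSE\tmRNA\t100\t900\t.\t+\t.\tID=ev1.A;Parent=ev1",
    "chr1\tSE\tmRNA\t100\t900\t.\t+\t.\tID=ev1.B;Parent=ev1",
    "chr1\tSE\texon\t100\t200\t.\t+\t.\tID=ev1.A.up;Parent=ev1.A",
    "chr1\tSE\texon\t400\t500\t.\t+\t.\tID=ev1.A.se;Parent=ev1.A",
    "chr1\tSE\texon\t800\t900\t.\t+\t.\tID=ev1.A.dn;Parent=ev1.A",
    "chr1\tSE\texon\t100\t200\t.\t+\t.\tID=ev1.B.up;Parent=ev1.B",
    "chr1\tSE\texon\t800\t900\t.\t+\t.\tID=ev1.B.dn;Parent=ev1.B",
]
h = build_hierarchy(example)
print(h["ev1"]["ev1.B"])  # → [(100, 200), (800, 900)]
```

One possible use is to spot events that were only partially lifted over. Compare exon counts per mRNA between an original file and its lifted-over counterpart: an mRNA that lost exons is a likely candidate.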

Mapping from alternative events to genes for Version 1 annotations

Version 1 annotations from the links above contain a mapping from alternative events to genes, based on the Ensembl annotation. These are tab-delimited files in which the first column (event_id) is the ID of the event from its GFF file. The second column (gene_id) is a comma-separated list of Ensembl identifiers for the gene(s) the event overlaps. An event can overlap multiple genes because Ensembl sometimes assigns multiple identifiers to the same gene, or because genes overlap or are nested within each other in the annotation. In that case, all of the corresponding Ensembl identifiers are listed. A mapping file is given for each event type (e.g. skipped exons, tandem 3’ UTRs, etc.). Events that cannot be mapped to genes are recorded as NA.
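A minimal sketch of reading one of these mapping files into a dictionary follows. It assumes the file may start with a header line whose first field is event_id. The event IDs and Ensembl IDs in the example are placeholders, not entries from the real files. NA entries become empty lists, so unmapped events are easy to filter.

```python
import csv
import io


def read_event_to_genes(handle):
    """Map each event ID to a list of Ensembl gene IDs (empty list for 'NA')."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank/short rows and a header line, if one is present.
        if len(row) < 2 or row[0] == "event_id":
            continue
        event_id, gene_field = row[0], row[1]
        mapping[event_id] = [] if gene_field == "NA" else gene_field.split(",")
    return mapping


# Placeholder contents; a real file would be opened with open(path, newline="").
example = io.StringIO(
    "event_id\tgene_id\n"
    "eventA\tENSMUSG00000000001\n"
    "eventB\tENSMUSG00000000002,ENSMUSG00000000003\n"
    "eventC\tNA\n"
)
genes = read_event_to_genes(example)
print(genes["eventB"])  # → ['ENSMUSG00000000002', 'ENSMUSG00000000003']
```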

Version 2 (alpha release) of the human/mouse annotations (compiled June 2013):

These contain annotations of:

  1. Skipped exons (SE)
  2. Alternative 3’/5’ splice sites (A3SS, A5SS)
  3. Mutually exclusive exons (MXE)
  4. Retained introns (RI)

Version 2 of the annotations was derived by considering all transcripts annotated in Ensembl genes, knownGenes (UCSC) and RefSeq genes. The exons flanking each alternative exon were chosen using the “common shortest” rule, i.e. taking the shortest stretches of flanking exon that are most common among the gene's annotated transcripts. The code used to generate these annotations is available as part of rnaseqlib.
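The following sketches one possible reading of the “common shortest” rule: among the coordinates that different transcripts give for a flanking exon, keep the most frequent ones, and break ties by choosing the shortest. This interpretation is an assumption and may differ in detail from the rnaseqlib implementation. The function name and the example coordinates are hypothetical.

```python
from collections import Counter


def common_shortest(candidates):
    """Pick one flanking exon from (start, end) pairs, one pair per transcript.

    Hypothetical reading of the rule: prefer the most frequent coordinates,
    then the shortest exon among those; ties in length fall back to the
    smallest (start, end) pair so the result is deterministic.
    """
    counts = Counter(candidates)
    top = max(counts.values())
    most_common = [exon for exon, n in counts.items() if n == top]
    return min(most_common, key=lambda exon: (exon[1] - exon[0], exon))


# Upstream flanking exon as annotated in five hypothetical transcripts.
upstream = [(100, 250), (100, 200), (100, 250), (120, 200), (100, 200)]
print(common_shortest(upstream))  # → (100, 200)
```

Here (100, 250) and (100, 200) are equally common, so the shorter (100, 200) is chosen.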

The annotations contain the following additional GFF attributes for each event’s gene entry:

  • ensg_id: Ensembl ID for the gene the event falls within
  • refseq_id: RefSeq ID for the gene the event falls within
  • gsymbol: Gene symbol for the gene the event falls within
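A minimal sketch of pulling the three attributes listed above out of a Version 2 gene line follows. The attribute names come from this page. The example line, its IDs and its gene symbol are placeholders. Percent-decoding follows the GFF3 convention for escaped characters. Missing attributes come back as None.

```python
from urllib.parse import unquote

GENE_ATTRS = ("ensg_id", "refseq_id", "gsymbol")


def gene_annotation(gff_line):
    """Return ensg_id/refseq_id/gsymbol for a GFF3 'gene' line, else None."""
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] != "gene":
        return None
    attrs = {}
    for pair in cols[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return {name: attrs.get(name) for name in GENE_ATTRS}


# Placeholder gene entry for a hypothetical event.
line = ("chr1\tSE\tgene\t1000\t5000\t.\t+\t.\t"
        "ID=ev2;ensg_id=ENSG00000000000;refseq_id=NM_000000;gsymbol=GENE1")
print(gene_annotation(line))
```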

These annotations are still being tested; comments on them are welcome.

Exon-centric annotations for fly genome

These fly genome annotations were derived by the Graveley lab.

Isoform-centric annotations and reference gene models

We provide GFF3 annotations based on UCSC Table Browser’s version of Ensembl genes for the following genomes:

These can be used with MISO for isoform-centric quantitation, or with sashimi_plot to make plots of RNA-Seq data across gene models.

For convenience, we also provide GFF3 annotations of gene models from Ensembl (release 65), which were simply converted from Ensembl’s GTF to GFF3 format and are otherwise identical to the Ensembl annotation.

Note that these annotations use Ensembl-style chromosome names (e.g. 1), whereas the UCSC-derived Ensembl annotations use UCSC-style chromosome names (e.g. chr1).
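Combining the two sets, or matching them to reads aligned against a differently named reference, requires consistent chromosome names. The following minimal sketch renames the first column of GFF3 lines from Ensembl style to UCSC style. It assumes only primary chromosomes need converting (numbered chromosomes, X, Y and MT to chrM). Unplaced and alternate contigs are named differently in the two sources and are left untouched. Comment and pragma lines are also passed through unchanged.

```python
def ensembl_to_ucsc(chrom):
    """Convert an Ensembl-style chromosome name to UCSC style (simple cases only)."""
    if chrom == "MT":
        return "chrM"
    if chrom.isdigit() or chrom in ("X", "Y"):
        return "chr" + chrom
    return chrom  # contigs and anything unrecognized are left as-is


def rename_gff_line(line):
    """Rewrite the seqid (first column) of a GFF3 feature line to UCSC style."""
    if not line.strip() or line.startswith("#"):
        return line
    seqid, rest = line.split("\t", 1)
    return ensembl_to_ucsc(seqid) + "\t" + rest


print(rename_gff_line("X\tensembl\tgene\t100\t900\t.\t+\t.\tID=g1"))
```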

Alternative 3’ UTR annotations (hybrid)

In addition to the exon-centric tandem 3’ UTR annotations, alternative 3’ UTR annotations for mouse (mm9) were made available by Wencheng Li and Bin Tian. These were derived using the 3’ READS method (Analysis of alternative cleavage and polyadenylation by 3′ region extraction and deep sequencing) and contain two or more 3’ UTR annotations per gene:

Updates

2013:

  • Wed, Jun 26: Released a revised version of the v1.0 annotations in which an ALE event formatting error was fixed. Sanitized the hg19 annotations to ensure start < end. In the v2.0 annotations, fixed an error in a subset of RI event definitions.