
AutoGRAPH - Tutorial




Contents :

1› AutoGRAPH Input
      a) Preinserted datasets
      b) Personal datasets
2› AutoGRAPH Output
      a) Options
      b) Comparative map
      c) Breakpoints, CS and CSO characterization
      d) Infer locations
3› Algorithm description
4› Publication

1› AutoGRAPH input (3-way comparison example):

     a) Preinserted dataset:

1 : Select the number of species to compare.
2 : Select one reference species and its reference chromosome (or all chromosomes), then select species 2 and 3 to be compared.

  b) Personal dataset:

1 : Select the number of species to compare.
2 : Select the input file format (* see below for a description).
3 : Click to get an example of the text input format used to launch AutoGRAPH.
4 : Paste the reference dataset or browse for your file (or see the example -here- of how to integrate 2 resources within one species).
5 : Paste dataset 2 or browse for your file.
6 : Paste dataset 3 or browse for your file.

* Input file format description:

i. AutoGRAPH file format: rows must be formatted as:
- <marker_identifier>   <chromosome>   <position> - (example)
Columns must be separated by space(s) or a tab.
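
As an illustration of the three-column layout described above, here is a minimal Python sketch of a reader for this format. The function name parse_autograph and the sample rows are hypothetical and are not part of AutoGRAPH; positions are read as floats because coordinates may be given in any unit.

```python
def parse_autograph(text):
    """Parse rows of '<marker_identifier> <chromosome> <position>'.

    Hypothetical helper for illustration only, not AutoGRAPH's own parser.
    """
    records = []
    for line in text.splitlines():
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        marker, chromosome, position = fields
        records.append((marker, chromosome, float(position)))
    return records


# Made-up rows, mixing spaces and tabs as separators.
sample = "geneA CFA_34 1200\ngeneB\tCFA_34\t3400.5\n"
rows = parse_autograph(sample)
```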

ii. GFF file format: rows must be formatted as:
- <seqname>   <source>   <feature>   <start_position>   <end_position>   <score>   <strand>  <frame> - (see the description of the GFF format on the Sanger web site)
AutoGRAPH parses input GFF files and stores the data it needs (i.e. seqname, source, feature, start and strand) to build the comparative map.
Regarding GFF semantics, the "feature" item is interpreted as the chromosome ID.
Missing data must be indicated by a dot '.'. Columns must be separated by space(s) or a tab.
Example row: "ENSCAFG00000014677 ENSEMBL_GeneWise CFA_34 37451049 37453338 . - 1".
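
The following Python sketch shows how one such row could be split into the fields that AutoGRAPH is described as keeping, with "feature" read as the chromosome ID. The function name parse_gff_row and the dictionary keys are assumptions made for this example, not AutoGRAPH's actual code.

```python
def parse_gff_row(line):
    """Split one 8-column GFF row and keep the fields used for the map.

    Hypothetical helper for illustration only.
    """
    fields = line.split()  # space(s) or tab as separators
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
    seqname, source, feature, start, end, score, strand, frame = fields
    return {
        "marker": seqname,      # marker/gene identifier
        "source": source,
        "chromosome": feature,  # "feature" holds the chromosome ID here
        "start": int(start),
        "strand": strand,
    }


row = parse_gff_row(
    "ENSCAFG00000014677 ENSEMBL_GeneWise CFA_34 37451049 37453338 . - 1"
)
```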

2› AutoGRAPH output (3-way comparison):



  a) Output options:

1: Mode (1/2): The Mode option lets the user choose which comparative map is displayed. Mode 1 analyzes 1:1 orthologous relationships only; Mode 2 analyzes both 1:1 and 1:0 orthologous anchors.
2: Results displayed: Click on each link to get the results. Three outputs are available: synteny map (mode 1/2), characterization of breakpoints, CS and CSO (mode 1/2), and predictions of 1:0 relationships (mode 2).
3: Dataset 3 options: These options let the user modify the orthology relationships displayed on the synteny map (mode 1 -› 1:1 and mode 2 -› 1:1/1:0) and change the orientation (increasing/decreasing) of dataset 3 in the figure.
4: CS and CSO options:
- Adjacency penalty: A number of genes/markers, set by the user, that is used to identify an interruption in the colinearity of a Conserved Segment (more info).
- Minimum number of markers listed in array results: Select the minimum number of markers/anchors required for a CS(O) to be defined and listed. This information is stored and displayed in the results array at the bottom of the figure and in a flat file that can be downloaded.
5: Dataset 2 options: These options let the user modify the orthology relationships displayed on the synteny map (mode 1 -› 1:1 and mode 2 -› 1:1/1:0) and change the orientation (increasing/decreasing) of dataset 2 in the figure.


  b) Export comparative map in several formats:


Users can save the output comparative map in several formats, including .png, .gif and .jpeg as well as .pdf, .ps, .eps...

  c) Comparative map construction and interpretation :

- Download the output_tutorial to interpret an example output corresponding to the comparison of canine chromosome 34 (CFA 34) with the human and mouse genomes.
- Or click on the images to see how to interpret the results for the same example:

- Step 1 -
- Step 2 -
- Step 3 -
- Step 4 -
- Step 5 -
- Step 6 -
- Step 7 -
- Step 8 -
- Step 9 -
- Step 10 -

  d) Characterization of breakpoints, CS and CSO in array results:

This is an example of the array results produced by AutoGRAPH for canine chromosome 34 as the reference dataset, compared to the human genome (Dataset 2) and the mouse genome (Dataset 3).
Several pieces of information are returned:
- Conserved Segment id (1), content (2), locations (3) and size (4) on the reference and tested genomes.
- Marker/gene density (5) in Conserved Segments and in breakpoint regions (7). Density in a breakpoint region is defined as the number of markers in a 1 Mb window surrounding the midpoint of the breakpoint region on the reference chromosome.
- Breakpoint region locations on the reference chromosome (7).
All of this information can be downloaded as a tab-delimited text file (8).
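
The breakpoint-region density defined above can be sketched in Python as follows. This is only an illustration: the function name, the choice to center the 1 Mb window on the midpoint (0.5 Mb on each side) and the inclusive window bounds are assumptions, and the positions are made up.

```python
def breakpoint_density(marker_positions, bp_start, bp_end, window=1_000_000):
    """Count markers in a window centered on the breakpoint-region midpoint.

    Hypothetical helper; assumes positions in bp on the reference chromosome
    and a window centered on the midpoint with inclusive bounds.
    """
    midpoint = (bp_start + bp_end) / 2
    half = window / 2
    return sum(
        1 for p in marker_positions if midpoint - half <= p <= midpoint + half
    )


# Made-up marker positions and breakpoint region (midpoint at 1,000,000 bp).
positions = [100_000, 900_000, 1_400_000, 2_500_000]
density = breakpoint_density(positions, 900_000, 1_100_000)
```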



  e) Predictions of 1:0 relationships (mode 2):

In mode 2, AutoGRAPH infers the locations of tested markers/genes for 1:0 relationships.


The figure above shows the top of canine chromosome 34 compared to the mouse genome (DATASET 3). It displays a conserved segment that belongs to mouse chromosome 15. (AutoGRAPH 3-way comparison - preinserted data)
1:0 orthologous relationships are marked on the reference genome with (*) and a grey color (ENSCAFG00000010056 and ENSCAFG00000010097 in this example).
AutoGRAPH proposes a genomic location based on colinearity within Conserved Segment(s). The location of the mouse orthologue of canine gene ENSCAFG00000010056 can be inferred on MMU_15 between 31401872 and 31513494 bp (black arrow).
These analyses are available at the bottom of the page and can be downloaded as a tab-delimited text file.
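
One way to picture this inference is to bracket a 1:0 gene by its nearest neighbours that do have a known position on the tested genome. The Python sketch below does this under stated assumptions: the anchors are listed in reference order within a single Conserved Segment, a missing tested position is represented by None, and the function name and data are hypothetical. It is not AutoGRAPH's actual procedure.

```python
def infer_location(anchors, target_id):
    """Return the (low, high) interval on the tested chromosome that brackets
    a marker with no tested position, or None if it cannot be bracketed.

    anchors: list of (marker_id, tested_position_or_None) in reference order,
    all within one Conserved Segment (assumption of this sketch).
    """
    ids = [marker for marker, _ in anchors]
    i = ids.index(target_id)
    # Nearest positioned anchor before and after the target, in reference order.
    left = next((p for _, p in reversed(anchors[:i]) if p is not None), None)
    right = next((p for _, p in anchors[i + 1:] if p is not None), None)
    if left is None or right is None:
        return None
    return (min(left, right), max(left, right))


# Made-up example: geneB has no orthologue position on the tested genome.
segment = [("geneA", 1000), ("geneB", None), ("geneC", 5000)]
interval = infer_location(segment, "geneB")
```

Taking the minimum and maximum of the two flanking positions keeps the interval well formed when the segment is inverted on the tested chromosome.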


3› Algorithm description:

AutoGRAPH compares one reference dataset against one or two tested datasets. In a 2-way comparison, let n be the number of elements (markers/genes/anchors) in the reference dataset (A) and m the number of elements in the tested dataset (B).
For each dataset, the algorithm takes 3 inputs: the marker IDs, their coordinates in any unit (i1, i2, i3, ..., in for the reference and j1, j2, j3, ..., jm for the tested genome) reflecting their position on the chromosomes, and the number of the chromosome to which they belong.
The program assigns relative integers to the coordinates of the reference genome (A) according to their position on the chromosome, so that each marker is characterized by (ID, chr#, integer). The program assigns the same integers to the orthologues on the tested genome (B), which are likewise characterized by (ID, chr#, integer).
In a first step, orthologous anchors are identified by comparing marker IDs from A and B. AutoGRAPH then groups and orders the comparative anchors on A and B by chromosome number and by coordinates, respectively. Any discontinuity of chromosomal assignment in B therefore reveals an inter-chromosomal breakpoint (CS).
In a second step, within each chromosome, an adjacency function determines the runs of integers that are in the same order for the common anchors in A and B. The function calculates the marker contiguity in B, on the assigned integers, as c1 = j2 - j1, c2 = j3 - j2, ..., c(m-1) = jm - j(m-1). Adjacency of markers in B is indicated by a value of 1, which can be positive (same linear order) or negative (reflecting an inversion). Any value other than 1, regardless of sign, identifies anchors that are not consecutive and consequently represents an intra-chromosomal rearrangement or internal breakpoint (CSO).
An adjacency penalty parameter is available to relax the strict requirement of marker consecutiveness when defining CSOs. The adjacency penalty is a threshold value that defines the gap allowed between two anchors.
See the illustration below for a human-dog example.
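
The two steps described in this section can be sketched in Python as follows. This is a reading of the description rather than AutoGRAPH's implementation: it assumes a single reference chromosome, treats each run of consecutive reference anchors mapping to the same tested chromosome as one segment, and uses a hypothetical function name and made-up data.

```python
from itertools import groupby


def conserved_segments(reference, tested):
    """reference: list of (marker_id, position) on one reference chromosome.
    tested: list of (marker_id, chromosome, position).

    Returns a list of (tested_chromosome, ranks, contiguity) per segment.
    Sketch only, under the assumptions stated in the lead-in.
    """
    tested_by_id = {m: (chrom, pos) for m, chrom, pos in tested}
    # Step 1: anchors are the markers found in both datasets; rank them
    # 1..k by their position on the reference chromosome.
    anchors = sorted((pos, m) for m, pos in reference if m in tested_by_id)
    ranked = [
        (rank, m) + tested_by_id[m]  # (rank, id, tested_chrom, tested_pos)
        for rank, (_, m) in enumerate(anchors, start=1)
    ]
    segments = []
    # A change of tested chromosome along the reference order marks an
    # inter-chromosomal breakpoint.
    for chrom, group in groupby(ranked, key=lambda a: a[2]):
        # Step 2: order the anchors by tested coordinate and take the
        # differences between consecutive reference ranks.
        ordered = sorted(group, key=lambda a: a[3])
        ranks = [a[0] for a in ordered]
        contiguity = [b - a for a, b in zip(ranks, ranks[1:])]
        segments.append((chrom, ranks, contiguity))
    return segments


# Made-up data: gX has no orthologue and is therefore not an anchor.
ref = [("g1", 10), ("g2", 20), ("g3", 30), ("g4", 40), ("g5", 50), ("gX", 60)]
tst = [("g1", "chr2", 100), ("g2", "chr2", 200), ("g3", "chr2", 300),
       ("g5", "chr7", 900), ("g4", "chr7", 950)]
result = conserved_segments(ref, tst)
# result == [("chr2", [1, 2, 3], [1, 1]), ("chr7", [5, 4], [-1])]
```

In this made-up case the chr2 segment is colinear (contiguity values of 1) and the chr7 segment is inverted (contiguity value of -1).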

- Adjacency Penalty:

It corresponds to a number of genes/markers (on the tested chromosome), set by the user, that constitutes an interruption in the colinearity of a Conserved Segment.
All genes of a species have precise locations and are consequently (pre)ordered within genomes. The AutoGRAPH algorithm tests the difference in order between 2 adjacent markers on the tested chromosome.
This makes it possible to identify CSO(s) within a CS.
For example, it allows the user to identify:
a) Inversion
If the anchors (orthologous genes) are ordered as follows in 2 species:

On reference species: 1 2 3 4 5 6 7 8 9 10
On tested species: 1 2 3 4 5 10 9 8 7 6
AutoGRAPH adjacency test: 1 1 1 1 1 5 -1 -1 -1 -1
    (2-1) (3-2) (4-3) (5-4) (10-5) (9-10) (8-9) (7-8) (6-7)
An adjacency penalty of 3 identifies 2 CSOs (from 1 to 5 and from 6 to 10) in the CS.

b) An interruption in a CS, that corresponds to 2 CSOs
On reference species: 1 2 3 4 5 6 7 8 9 10 11
On tested species: 1 2 3 7 8 9 10 11 12 13 14
AutoGRAPH adjacency test: 1 1 1 4 1 1 1 1 1 1 1
    (2-1) (3-2) (7-3) (8-7) (9-8) (10-9) (11-10) (12-11) (13-12) (14-13)
In the same way, an adjacency penalty of 3 identifies 2 CSOs (from 1 to 3 and from 4 to 11 on the reference chromosome and from 1 to 3 and from 7 to 14 on the tested chromosome) in the CS.
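
Both examples can be reproduced with a short Python sketch. It assumes that a new CSO starts whenever the absolute difference between two adjacent order numbers on the tested chromosome exceeds the adjacency penalty; the exact comparison used by AutoGRAPH is not stated here, so this rule and the function name split_cso are assumptions. The sketch does not treat a change of sign on its own as a break.

```python
def split_cso(tested_order, penalty):
    """tested_order: order numbers on the tested chromosome, listed in
    reference order. Returns (differences, list_of_CSOs).

    Sketch only: a break is assumed when abs(difference) > penalty.
    """
    if not tested_order:
        return [], []
    diffs = [b - a for a, b in zip(tested_order, tested_order[1:])]
    csos = [[tested_order[0]]]
    for value, diff in zip(tested_order[1:], diffs):
        if abs(diff) > penalty:
            csos.append([value])    # gap too large: start a new CSO
        else:
            csos[-1].append(value)  # still adjacent enough: extend the CSO
    return diffs, csos


# Example a) inversion
diffs_a, csos_a = split_cso([1, 2, 3, 4, 5, 10, 9, 8, 7, 6], penalty=3)
# diffs_a == [1, 1, 1, 1, 5, -1, -1, -1, -1]
# csos_a  == [[1, 2, 3, 4, 5], [10, 9, 8, 7, 6]]

# Example b) interruption
diffs_b, csos_b = split_cso([1, 2, 3, 7, 8, 9, 10, 11, 12, 13, 14], penalty=3)
# diffs_b == [1, 1, 4, 1, 1, 1, 1, 1, 1, 1]
# csos_b  == [[1, 2, 3], [7, 8, 9, 10, 11, 12, 13, 14]]
```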


- Server Agreement:

All results obtained with AutoGRAPH are reserved for your personal use and are not accessible to third parties, other than our internal software management.

4› Publications:

- If you use AutoGRAPH, please cite:
AutoGRAPH: an interactive web server for automating and visualizing comparative genome maps. Bioinformatics. 2007; 23:498-499.

- Fig.2 supplementary:





Apr-26-2017, 17:31 - Questions/problems