Help

Methodology
Home page
Table columns
Protein entries

Methodology:

This web site integrates the results of the computational analysis of several mycobacterial genomes (Mycobacterium tuberculosis, bovis, avium, marinum, leprae, smegmatis and abscessus).

The genomes were retrieved from the UniProt database, and secretion pathways, experimental data, structural data and external database links were compiled for each protein.

First, secretion pathways and transmembrane segments were predicted for each protein using multiple analysis tools (SignalP, TAT, T7S, Cleave, TM-HMM, TM-Uni).

Links to complementary databases were collected (Uniprot, TBDB, Mycobrowser, AlphaFold, GO, EggNog, 3Did, Domine, String).

Various proteomics and transcriptomics results were collected from the literature.

The domain organisation of each protein was predicted using the Pfam and Interpro databases.

Finally, the AlphaFold models for all Mycobacterium tuberculosis proteins were collected from the EBI database. These models and a representative set of PDB structures were exhaustively compared and clustered on the basis of their pairwise 3D similarities. As a result, each AlphaFold model is associated with the list of models and structures whose TMalign superposition score is above 0.5.
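
To illustrate the 0.5 cutoff, here is a minimal Python sketch that keeps only the structures whose score is above the threshold. The tuple layout and the identifiers (modelA, pdbX, ...) are hypothetical and do not reflect the actual data format used by this web site.

```python
# Hypothetical pairwise comparison results: (query, target, TM-score).
TM_SCORE_CUTOFF = 0.5

def similar_structures(pairs, query, cutoff=TM_SCORE_CUTOFF):
    """Return (target, score) tuples for `query` with a score above `cutoff`, best first."""
    hits = [(target, score) for q, target, score in pairs
            if q == query and score > cutoff]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

pairs = [
    ("modelA", "modelB", 0.82),
    ("modelA", "pdbX", 0.47),
    ("modelA", "modelC", 0.55),
    ("modelB", "modelC", 0.91),
]
print(similar_structures(pairs, "modelA"))
```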

Home page:

All information was collected in a global genome table accessible from this web site.

Columns are grouped by theme (General information, Predictions, Proteomics, 3D structures, 3D domains).

A short description of each column can be obtained by hovering over its header.

Columns can be sorted by clicking the column header once or twice.

The left panel lists the table columns that can be displayed or hidden by clicking the corresponding toggle button.

A subset of the data can be selected by opening the "Search" form above the table.

Detailed information on each protein can be displayed by clicking the corresponding table row.

The resulting popup window can be maximized by clicking the small top-right square.

Table columns:

The content of each table column is detailed below:

MTB: Tuberculist identifier

Uniprot: Uniprot identifier

Length: protein sequence length

Homologs: number of homologs in 6 mycobacterial genomes
	(bovis, avium, marinum, leprae, smegmatis, abscessus)

Sec: secretion predictions
	SIG = probable signal peptide predicted by SignalP 4.1
	sig = possible signal peptide predicted by SignalP 4.1
	ARG = probable twin arginine signal predicted by PredTAT server
	arg = possible twin arginine signal predicted by PredTAT server
	T7S = probable T7S pathway predicted by Y...[DE] motif and N-term helix
	t7S = possible T7S pathway predicted by Y...[DE] motif

YEM: prediction and experiment summary
   char 1: 
	Y = T7S pathway predicted by motif Y...[DE] and N-term helix
	y = T7S pathway predicted by motif Y...[DE]
	S = strong signal peptide
	s = weak signal peptide
	R = strong twin arginine signal
	r = weak twin arginine signal
   char 2: 
	E = secretion detected by more than 2 experiments
	e = secretion detected by more than one experiment
   char 3: 
	M = probable trans-membrane protein
    m = possible trans-membrane protein 
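
The three-character YEM code can be decoded with a simple lookup per position. The Python sketch below is only an illustration: it assumes that any character not listed above (for example a placeholder such as '-') means "no annotation", which is not stated in this help page, and the function name decode_yem is hypothetical.

```python
# Meaning of each character position, copied from the YEM description above.
CHAR1 = {
    "Y": "T7S pathway predicted by motif and N-term helix",
    "y": "T7S pathway predicted by motif",
    "S": "strong signal peptide",
    "s": "weak signal peptide",
    "R": "strong twin arginine signal",
    "r": "weak twin arginine signal",
}
CHAR2 = {
    "E": "secretion detected by more than 2 experiments",
    "e": "secretion detected by more than one experiment",
}
CHAR3 = {
    "M": "probable trans-membrane protein",
    "m": "possible trans-membrane protein",
}

def decode_yem(code):
    """Decode a YEM code; unknown or missing characters map to None (assumption)."""
    padded = code.ljust(3)[:3]
    return {
        "prediction": CHAR1.get(padded[0]),
        "experiment": CHAR2.get(padded[1]),
        "membrane": CHAR3.get(padded[2]),
    }

print(decode_yem("Se-"))
```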

TMU: trans-membrane segments annotated in the Uniprot FT fields
    char 2-3: number of trans-membrane segments
    char 5-7: last position of the last trans-membrane segment

TMH: trans-membrane segments predicted by TMHMM
    char 2-3: number of trans-membrane segments
    char 5-7: last position of the last trans-membrane segment
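
The TMU and TMH values are fixed-width fields, so they can be read by character position. The Python sketch below is a hedged example: the sample value "T02:145" is invented for illustration, and the characters at positions 1 and 4 are treated as opaque because their meaning is not described here.

```python
def parse_tm_field(value):
    """Split a fixed-width TMU/TMH value into (segment count, last position).

    Characters 2-3 hold the number of segments and characters 5-7 the last
    position of the last segment. Non-numeric or missing parts give None.
    """
    count_text = value[1:3].strip()  # characters 2-3
    last_text = value[4:7].strip()   # characters 5-7
    count = int(count_text) if count_text.isdigit() else None
    last = int(last_text) if last_text.isdigit() else None
    return count, last

# "T02:145" is a made-up value, not taken from the table.
print(parse_tm_field("T02:145"))
```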

Signalp: SignalP v4.1 score * 100 (true positive cutoff = 45)

TAT: Twin Arginine and Sec peptides prediction by PredTAT server
      (http://www.compgen.org/tools/PRED-TAT/)
      score * 100
      s=sec peptide, m=trans-membrane, t=twin arginine
	  
T7S: T7S pathway predicted by the method described in Daleke 2012
      char 2-3: number of helix positions predicted by PSIPRED
                between positions 45 and 80.
      char 5-7: YDE indicates that the motif
                [ADGISTV][AGKMSV]..[Y]..[AGINQRTV][ED][AEDFILNQSTV]
                is detected between positions 70 and 120.
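
The motif above translates directly into a regular expression. The following Python sketch searches for it in the 70-120 window. It assumes 1-based positions and requires the whole match to lie inside the window, two details that are not specified here. The helix count is not reproduced because it depends on PSIPRED output, and the function name find_yde_motif is hypothetical.

```python
import re

# Motif copied from the T7S column description; "." matches any residue.
T7S_MOTIF = re.compile(r"[ADGISTV][AGKMSV]..Y..[AGINQRTV][ED][AEDFILNQSTV]")

def find_yde_motif(sequence, first=70, last=120):
    """Return the 1-based start of the first motif match inside [first, last], or None."""
    window = sequence[first - 1:last]
    match = T7S_MOTIF.search(window)
    if match is None:
        return None
    return first + match.start()

# Artificial sequence: proline filler with one motif instance starting at position 80.
sequence = "P" * 79 + "AAQQYQQADA" + "P" * 50
print(find_yde_motif(sequence))
```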

Cleave: prediction score and position of trans-membrane cleavage near C-term

Proteomics:
   Exp1: T7S pathway from Daleke 2011 experiment
   Exp2: 2D gel of culture supernatant (MPIIB), broth culture
   Exp3: 2D gel of culture supernatant (MPIIB), short-term culture
   Exp4: secreted and exported proteins according to Malen 2007
   Exp5: 2D-PAGE combined with MALDI-TOF and chromatography (Malen 2007)
   Exp6: 2-DE gel, pI=4-4.7 and Mr=6-20 kDa (Lange 2013)

AlphaFold: 3D structure according to the AlphaFold model database

AlphaClan: number of similar AlphaFold models in Mycobacterium tuberculosis

Pfam1: link to the first Pfam protein domain family
	   
Pfam2: link to the second Pfam protein domain family
	   
Pfam3: link to the third Pfam protein domain family

Taxo1, Taxo2, Taxo3: taxonomic signature of the Pfam1, Pfam2, Pfam3 families with syntax HEBAV123:
	   char 1: H => domain more frequent than expected in humans
	   char 2: E => domain more frequent than expected in eukaryotes
	   char 3: B => domain more frequent than expected in bacteria
	   char 4: A => domain more frequent than expected in archaea
	   char 5: V => domain more frequent than expected in viruses
	   Each letter is lowercased when the domain is less frequent than expected.
	   Each letter is replaced by '~' when the domain is very rare.
	   Each letter is replaced by '-' when the domain is not observed.
	   digit 6: average number of repeats for each protein in the Pfam family
	   digit 7: average length / 50 of Pfam domains
	   digit 8: log10(number of Pfam domains)
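
As an illustration, the Python sketch below decodes a taxonomic signature into a readable form. It assumes that the signature is exactly 8 characters long and that each of the last three values is a single decimal digit. The example signature "heB~-123" is invented, and the function name decode_taxo is hypothetical.

```python
GROUPS = ["humans", "eukaryotes", "bacteria", "archaea", "viruses"]
LETTERS = "HEBAV"

def decode_taxo(signature):
    """Decode an 8-character taxonomic signature (5 letters followed by 3 digits)."""
    if len(signature) != 8:
        raise ValueError("expected 8 characters, got %d" % len(signature))
    frequency = {}
    for group, letter, char in zip(GROUPS, LETTERS, signature[:5]):
        if char == letter:
            frequency[group] = "more frequent than expected"
        elif char == letter.lower():
            frequency[group] = "less frequent than expected"
        elif char == "~":
            frequency[group] = "very rare"
        elif char == "-":
            frequency[group] = "not observed"
        else:
            raise ValueError("unexpected character %r for %s" % (char, group))
    repeats, length_div_50, log10_count = (int(d) for d in signature[5:])
    return {
        "frequency": frequency,
        "average_repeats": repeats,             # digit 6
        "average_length_div_50": length_div_50, # digit 7
        "log10_domain_count": log10_count,      # digit 8
    }

# "heB~-123" is a made-up signature used only for illustration.
print(decode_taxo("heB~-123"))
```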

Protein entries:

Detailed information on each protein can be displayed by clicking the corresponding table row. Popup windows can be maximized by clicking the small top-right square.

Each protein entry is described by the following sections:

Description: protein name obtained from Uniprot

Identifier: Uniprot identifier

Uniprot: accession number

Gene: gene name 

Species: species name

Links:
   TbDB: link to Tuberculosis TB database
   MycoBrowser: link to database on pathogenic mycobacteria
   Uniprot: link to annotated sequence database
   Alphafold: link to EBI database of AlphaFold models
   GO: link to functional classification GO (Gene Ontology)
   EggNog: functional classification from EggNog
   3did: link to domain interaction database 3did
   Domine: link to domain interaction database Domine
   String: link to protein pair correlation database String

Domains:
   Pfam1, Pfam2, Pfam3: links to Pfam domain families
   Interpro: link to protein domain database Interpro

Length: number of amino acids in protein sequence

Sequence: amino acid sequence retrieved from Uniprot database
 
Orthologs: orthologous proteins detected in closely related mycobacterial genomes
   This section lists all orthologous sequences detected in Mycobacterium tuberculosis, bovis, avium, marinum, leprae, smegmatis and abscessus.
   Relative protein similarities are assessed by a matrix of pairwise sequence identity percentages.

Alignment: orthologous protein alignment
   This section contains a multiple sequence alignment of all orthologous proteins.
   This alignment can be downloaded in aligned FASTA format by clicking the "download" link.
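
A downloaded alignment can be used to recompute a pairwise identity matrix similar to the one shown in the Orthologs section. The Python sketch below is one possible way to do it. It assumes that identity is counted over the columns where neither sequence has a gap ('-'), which may differ from the definition used by this web site. The sequence names and residues in the example are invented.

```python
def parse_fasta(text):
    """Parse aligned FASTA text into a dict {name: aligned sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}

def percent_identity(a, b):
    """Identity percentage over columns where both sequences have a residue."""
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    same = sum(1 for x, y in columns if x == y)
    return 100.0 * same / len(columns)

def identity_matrix(alignment):
    """Return a nested dict of pairwise identity percentages."""
    names = list(alignment)
    return {a: {b: round(percent_identity(alignment[a], alignment[b]), 1)
                for b in names}
            for a in names}

# Toy alignment; names and residues are made up.
example = ">tb\nMKV-LA\n>bovis\nMKVQLA\n>leprae\nMRV-LG\n"
matrix = identity_matrix(parse_fasta(example))
print(matrix["tb"])
```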

Location: predictions on protein location
   This section describes the location predictions for all detected orthologous proteins from closely related mycobacterial genomes.
   The meaning of each column is explained in the global table description. 

Structures: similar PDB structures or AlphaFold models
   Representative PDB structures and all AlphaFold models from the EBI database were exhaustively compared and clustered. 
   All structures and models similar to the target AlphaFold model are listed in this section. 
   At the bottom of this section, the detected pairwise 3D similarities are assessed using the structural alignment program TMalign (TM-score, RMS deviation, Sequence identity percentage, Alignment length).