Help
Methodology:
This web site integrates the results of the computational analysis of several Mycobacterium genomes (Mycobacterium tuberculosis, bovis, avium, marinum, leprae, smegmatis and abscessus).
The genomes were retrieved from the Uniprot database, and secretion pathways, experimental data, structural data and external database links were compiled for each protein of each genome.
First, secretion pathways and transmembrane segments were predicted for each protein using several analysis tools (SignalP, TAT, T7S, Cleave, TM-HMM, TM-Uni).
Links to complementary databases were collected (Uniprot, TBDB, Mycobrowser, AlphaFold, GO, EggNog, 3Did, Domine, String).
Various proteomics and transcriptomics results were collected from the literature.
The domain organisation of each protein was predicted using the Pfam and Interpro databases.
Finally, the AlphaFold models for all Mycobacterium tuberculosis proteins were collected from the EBI database. These models and a representative set of PDB structures were exhaustively compared and clustered on the basis of their pairwise 3D similarities. As a result, the entry for each AlphaFold model lists the models and structures whose TMalign superposition score is above 0.5.
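As an illustration of the 0.5 cutoff described above, the following minimal Python sketch lists, for each model, the partners whose score is above the threshold. The input layout (one score per unordered pair) and the identifiers are assumptions made for the example; they do not describe how the site stores its comparisons.

```python
from collections import defaultdict

TM_CUTOFF = 0.5  # threshold quoted in the Methodology text


def similar_structures(pairwise_scores, cutoff=TM_CUTOFF):
    """Map each model or structure to its partners scoring above the cutoff.

    pairwise_scores: dict {(id_a, id_b): score}, one entry per unordered pair
    (hypothetical layout chosen for this sketch).
    """
    neighbours = defaultdict(list)
    for (id_a, id_b), score in pairwise_scores.items():
        if score > cutoff:
            neighbours[id_a].append((id_b, score))
            neighbours[id_b].append((id_a, score))
    # Best-scoring partners first.
    return {key: sorted(hits, key=lambda hit: -hit[1])
            for key, hits in neighbours.items()}


# Made-up identifiers and scores, for illustration only.
scores = {("AF-A", "AF-B"): 0.82, ("AF-A", "1ABC"): 0.55, ("AF-B", "1ABC"): 0.31}
result = similar_structures(scores)
```

In this example "AF-A" keeps both partners, while the pair scoring 0.31 is dropped.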
Home page:
All information was collected in a global genome table accessible from this web site.
Columns are grouped by theme (General information, Predictions, Proteomics, 3D structures, 3D domains).
A short description of each column can be obtained by hovering over its header.
Columns can be sorted by clicking the column header once or twice.
The left panel lists the table columns that can be displayed or hidden by clicking the corresponding toggle button.
The data can be filtered by opening the "Search" form above the table.
Detailed information on each protein can be displayed by clicking the corresponding table row.
The resulting popup window can be maximized by clicking the small top-right square.
Table columns:
The content of each table column is detailed below:
MTB: Tuberculist identifier
Uniprot: Uniprot identifier
Length: protein sequence length
Homologs: number of homologs in 6 other mycobacterial genomes
(bovis, avium, marinum, leprae, smegmatis, abscessus)
Sec: secretion predictions
SIG = probable signal peptide predicted by SignalP 4.1
sig = possible signal peptide predicted by SignalP 4.1
ARG = probable twin arginine signal predicted by PredTAT server
arg = possible twin arginine signal predicted by PredTAT server
T7S = probable T7S pathway predicted by Y...[DE] motif and N-term helix
t7S = possible T7S pathway predicted by Y...[DE] motif
YEM: prediction and experiment summary
char 1:
Y = T7S pathway predicted by motif Y...[DE] and N-term helix
y = T7S pathway predicted by motif Y...[DE]
S = strong signal peptide
s = weak signal peptide
R = strong twin arginine signal
r = weak twin arginine signal
char 2:
E = secretion detected by more than 2 experiments
e = secretion detected by more than one experiment
char 3:
M = probable trans-membrane protein
m = possible trans-membrane protein
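The three-character YEM code can be decoded with a small lookup, as in the Python sketch below. It assumes that any character not listed in the legend above (for example a placeholder such as "-") means that no evidence is available at that position; the help text does not say which placeholder the table uses.

```python
# One legend per character position, copied from the YEM description.
YEM_LEGEND = [
    {
        "Y": "T7S pathway predicted by motif Y...[DE] and N-term helix",
        "y": "T7S pathway predicted by motif Y...[DE]",
        "S": "strong signal peptide",
        "s": "weak signal peptide",
        "R": "strong twin arginine signal",
        "r": "weak twin arginine signal",
    },
    {
        "E": "secretion detected by more than 2 experiments",
        "e": "secretion detected by more than one experiment",
    },
    {
        "M": "probable trans-membrane protein",
        "m": "possible trans-membrane protein",
    },
]


def decode_yem(code):
    """Return one description (or None) for each of the 3 YEM characters."""
    padded = code.ljust(3)[:3]  # tolerate short or over-long cells
    # Assumption: characters absent from the legend mean "no evidence".
    return [legend.get(char) for legend, char in zip(YEM_LEGEND, padded)]


decoded = decode_yem("Se-")  # "Se-" is a made-up example value
```

Here `decoded` holds the descriptions for "S" and "e", followed by `None` for the third character.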
TMU: trans-membrane segments annotated in the Uniprot FT fields
char 2-3: number of trans-membrane segments
char 5-7: last position of the last trans-membrane segment
TMH: trans-membrane segments predicted by TmHMM
char 2-3: number of trans-membrane segments
char 5-7: last position of the last trans-membrane segment
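The fixed character positions of the TMU and TMH cells can be read by slicing, as in this Python sketch. The meaning of characters 1 and 4 is not documented here, so the sketch ignores them, and the sample value "M03:245" is purely hypothetical.

```python
def parse_tm_field(value):
    """Extract (segment_count, last_position) from a TMU or TMH cell.

    Positions are counted from 1, as in the column description:
    characters 2-3 hold the segment count, characters 5-7 the last position.
    Non-numeric or missing fields are returned as None.
    """
    count_text = value[1:3].strip()
    last_text = value[4:7].strip()
    count = int(count_text) if count_text.isdigit() else None
    last = int(last_text) if last_text.isdigit() else None
    return count, last


segments, last_position = parse_tm_field("M03:245")  # hypothetical cell value
```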
Signalp: SignalP v4.1 score * 100 (true positive cutoff = 45)
TAT: Twin Arginine and Sec peptides prediction by PredTAT server
(http://www.compgen.org/tools/PRED-TAT/)
score * 100
s=sec peptide, m=trans-membrane, t=twin arginine
T7S: T7S pathway predicted by the method described in Daleke 2012
char 2-3: number of helix positions predicted by Psipred
between positions 45 and 80.
char 5-7: YDE indicates that the motif
[ADGISTV][AGKMSV]..[Y]..[AGINQRTV][ED][AEDFILNQSTV]
is detected between positions 70 and 120.
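The motif above can be written directly as a regular expression. The Python sketch below searches for it in the 70-120 window; it assumes that the whole motif must lie inside the window and that positions are counted from 1, neither of which is stated explicitly in the column description. The function name and the test sequence are made up for the example.

```python
import re

# Motif copied verbatim from the T7S column description.
T7S_MOTIF = re.compile(r"[ADGISTV][AGKMSV]..[Y]..[AGINQRTV][ED][AEDFILNQSTV]")


def find_yde_motif(sequence, start=70, end=120):
    """Return the 1-based start positions of motif matches inside start..end.

    Assumption: a match counts only if it lies entirely within the window.
    """
    window = sequence[start - 1:end]
    hits = []
    pos = 0
    while True:
        match = T7S_MOTIF.search(window, pos)
        if match is None:
            break
        hits.append(start + match.start())  # convert to 1-based sequence position
        pos = match.start() + 1             # also catch overlapping matches
    return hits


# Artificial sequence with the motif starting at position 80.
example = "L" * 79 + "AALLYLLAEA" + "L" * 40
positions = find_yde_motif(example)
```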
Cleave: prediction score and position of trans-membrane cleavage near C-term
Proteomics:
Exp1: T7S pathway from Daleke 2011 experiment
Exp2: 2D gel culture supernatant (MPIIB) Broth culture
Exp3: 2D gel culture supernatant (MPIIB) short term culture
Exp4: secreted and exported proteins according to Malen 2007
Exp5: 2D-PAGE combined with MALDI-TOF and chromatography (Malen 2007)
Exp6: 2-DE gel pI=4-4.7 and Mr=6-20 kDa (Lange 2013)
AlphaFold: 3D structure according to the AlphaFold model database
AlphaClan: number of similar AlphaFold models in Mycobacterium tuberculosis
Pfam1: link to the first Pfam protein domain family
Pfam2: link to the second Pfam protein domain family
Pfam3: link to the third Pfam protein domain family
Taxo1, Taxo2, Taxo3: taxonomic signature of Pfam1,Pfam2,Pfam3 families with syntax HEBAV123:
char 1: H => domain + more frequent than expected in humans
char 2: E => domain + more frequent than expected in eukaryotes
char 3: B => domain + more frequent than expected in bacteria
char 4: A => domain + more frequent than expected in archaea
char 5: V => domain + more frequent than expected in viruses
Each letter is lowercased when the domain is less frequent than expected.
Each letter is replaced by '~' when the domain is very rare.
Each letter is replaced by '-' when the domain is not observed.
digit 6: average number of repeats for each protein in Pfam family
digit 7: average length / 50 of Pfam domains
digit 8: log10(number of Pfam domains)
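A taxonomic signature can be unpacked character by character, as in the Python sketch below. It assumes that each of the three trailing values is a single digit and uses a made-up signature; the dictionary keys are labels chosen for this example only.

```python
LETTERS = "HEBAV"
GROUPS = ["humans", "eukaryotes", "bacteria", "archaea", "viruses"]


def decode_taxo(signature):
    """Decode an 8-character signature following the HEBAV123 syntax."""
    decoded = {}
    for letter, group, char in zip(LETTERS, GROUPS, signature[:5]):
        if char == letter:
            status = "more frequent than expected"
        elif char == letter.lower():
            status = "less frequent than expected"
        elif char == "~":
            status = "very rare"
        elif char == "-":
            status = "not observed"
        else:
            status = "unknown"
        decoded[group] = status
    # Assumption: digits 6-8 are single characters in the range 0-9.
    repeats, length_code, log_count = (int(char) for char in signature[5:8])
    decoded["average repeats per protein"] = repeats
    decoded["average domain length / 50"] = length_code
    decoded["log10(number of domains)"] = log_count
    return decoded


taxo = decode_taxo("heB~-124")  # hypothetical signature
```

For this signature the domain is over-represented in bacteria only, very rare in archaea and not observed in viruses.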
Protein entries:
Detailed information on each protein can be displayed by clicking the corresponding table row.
Popup windows can be maximized by clicking the small top-right square.
Each protein entry is described by the following sections:
Description: protein name obtained from Uniprot
Identifier: Uniprot identifier
Uniprot: accession number
Gene: gene name
Species: species name
Links:
TbDB: link to Tuberculosis TB database
MycoBrowser: link to database on pathogenic mycobacteria
Uniprot: link to annotated sequence database
Alphafold: link to EBI database of AlphaFold models
GO: link to functional classification GO (Gene Ontology)
EggNog: functional classification from EggNog
3did: link to domain interaction database 3did
Domine: link to domain interaction database Domine
String: link to protein pair correlation database String
Domains:
Pfam1, Pfam2, Pfam3: links to Pfam domain families
Interpro: link to protein domain database Interpro
Length: number of amino acids in protein sequence
Sequence: amino acid sequence retrieved from Uniprot database
Orthologs: orthologous proteins detected in closely related mycobacterial genomes
This section lists all orthologous sequences detected in Mycobacterium tuberculosis, bovis, avium, marinum, leprae, smegmatis and abscessus.
Relative protein similarities are assessed by a matrix of pairwise sequence identity percentages.
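For readers who want to recompute such a matrix from aligned sequences, here is a minimal Python sketch. Several definitions of percent identity exist; this one divides the number of identical columns by the number of columns where both sequences carry a residue, which may differ from the definition used by the site. The sequence names and residues are invented.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def identity_matrix(aligned):
    """Build a nested dict {name: {name: percent identity}}."""
    names = list(aligned)
    return {p: {q: percent_identity(aligned[p], aligned[q]) for q in names}
            for p in names}


# Toy alignment, for illustration only.
toy = {"tb": "MKV-LA", "bovis": "MKVALA", "leprae": "MRV-LG"}
matrix = identity_matrix(toy)
```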
Alignment: orthologous protein alignment
The section contains a multiple sequence alignment of all orthologous proteins.
This alignment can be downloaded in aligned FASTA format by clicking the "download" link.
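Once downloaded, an aligned FASTA file can be loaded with a few lines of Python. The sketch below assumes standard FASTA conventions (header lines starting with ">", sequences possibly wrapped over several lines, "-" for gaps); the identifiers in the sample text are hypothetical.

```python
def read_aligned_fasta(text):
    """Parse aligned FASTA text into a dict {identifier: aligned sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""  # keep the first word as identifier
            records[name] = []
        elif name is not None:
            records[name].append(line)
    alignment = {key: "".join(parts) for key, parts in records.items()}
    # In an aligned file every sequence should have the same length.
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("sequences in an aligned FASTA file must share one length")
    return alignment


sample = """>protA Mycobacterium tuberculosis
MKV-
LA
>protB Mycobacterium bovis
MKVA
LA
"""
alignment = read_aligned_fasta(sample)
```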
Location: predictions on protein location
This section describes the location predictions for all orthologous proteins detected in closely related mycobacterial genomes.
The meaning of each column is explained in the global table description.
Structures: similar PDB structures or AlphaFold models
Representative PDB structures and all AlphaFold models from the EBI database were exhaustively compared and clustered.
All structures and models similar to the target AlphaFold model are listed in this section.
At the bottom of this section, the detected pairwise 3D similarities are assessed using the structural alignment program TMalign (TM-score, RMS deviation, Sequence identity percentage, Alignment length).