PROTEIN What - Istituto Zooprofilattico Sperimentale del Piemonte

IZSTO
Istituto Zooprofilattico
Sperimentale del Piemonte,
Liguria e Valle d’Aosta
VI WORKSHOP OF THE NATIONAL REFERENCE LABORATORY (NRL) FOR COAGULASE-POSITIVE STAPHYLOCOCCI,
INCLUDING S. AUREUS
12-13 December 2013
In silico analysis and relation between staphylococcal
enterotoxins and hypothetical toxins
In silico analysis and relation between SEs and HPs
Guerrino Macori
National Reference Laboratory for Coagulase Positive Staphylococci including S.aureus – Torino
Summary
- Definition of bioinformatics
- What is done, units of information, scale overview
- Databases
- Some practices
• Reverse vaccinology
• Hypothetical proteins and SEs
- Conclusion
What is Bioinformatics/computational biology?
A marriage between biology and informatics
VI Workshop NRL – CPS including S. aureus - Torino, 12-13 December 2013
What is done in bioinformatics?
R&D - Nucleotide and amino acid sequences, protein
domains and protein structures - models
Development of new algorithms for large data sets
Development and implementation of tools that enable efficient access to
and management of different types of information
“A prerequisite to understanding the complete biology of an
organism is the determination of its entire genome sequence”
Fleischmann et al. 1995
Whole Genome sequencing
(linear sequence of DNA base units: A, T, C, G)
Human genome: 3.12 × 10^9 bp
Whole genomes → exponential growth of data → bioinformatics to collect and organize it
2000-2001
Post-genomic era
Is the sequence sufficient to understand the biological function of organisms?
Bioinformatics is needed to analyze genomic data in a rational manner
What “units of information” do we deal with in bioinformatics?
• DNA
• RNA
• PROTEIN
• Sequence
• Structure
• Evolution
• Pathways
• Interactions
• Mutations
Biological data used:
• DNA - Genome
• RNA - Transcriptome
• PROTEIN - Proteome
What “units of information” do we deal with in bioinformatics?
• DNA
• RNA
• PROTEIN
• Simple Sequence Analysis
• Database searching
• Pairwise analysis
• Regulatory Regions
• Gene finding
• Whole Genome Annotations
• Comparative genomics (species and
strains, e.g. older methods such as PFGE)
DNA sequences
>gi|8886401|gb|AF162269.1|
CCCACTCCTCCATCTCACAAACACTTCTCTATACCCAACAATCCCTTTTACAATCCCTGCTCATTTAGTCAA
AATGGTCAAGATTGCTGCTATCATCCTCCTCATGGGCATTCTCGCCAATGCTGCCGCCATCCCTGTCATT
TCAACACCCAAATTACAGAGCCAACCGGCGAGGGCGACCGTGGGGACGTGGCCGAC
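A minimal sketch, not part of the original slides, of the kind of simple sequence analysis listed above: reading a FASTA record like the one shown and computing its GC content. The function names parse_fasta and gc_content are illustrative assumptions, not functions of any particular tool.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty one)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    example = ">toy_record\nATGC\nGGCC\n"  # hypothetical toy input
    for name, sequence in parse_fasta(example):
        print(name, len(sequence), round(gc_content(sequence), 2))
```

Sequence lines are joined across line breaks, so a record wrapped over several lines is handled as a single sequence.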
What “units of information” do we deal with in bioinformatics?
• DNA
• RNA
• PROTEIN
• Splice variants
• Tissue-specific expression
• Structure
• Single gene analysis
• Experimental data on thousands of genes
simultaneously (DNA chips, microarrays,
expression arrays)
What “units of information” do we deal with in bioinformatics?
• DNA
• RNA
• PROTEIN
• Proteome of an organism
• 2D gels
• Mass spectrometry
• Structure: 2D/3D/4D
Protein analysis: scale overview
Organism                   Genome (Mb)   Genes
E. coli                    4.64          4300
S. cerevisiae              13.5          6000
Drosophila melanogaster    165           13600
Arabidopsis thaliana       119           25500
Homo sapiens               3300          30000-40000
S. aureus                  2.84          2700
Protein analysis: scale overview and databases
Transcription and translation
folding
ORF-Finder
• Nucleotide sequences → translation (in any frame) → discovery of ORFs (Open Reading Frames)
• ORF: a translated protein sequence of the right length for an average protein (> 70-100 aa).
• The genome is scanned by software for hypothetical proteins (HPs): possible but not verified
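The scan described above can be sketched as follows. This is a simplified illustration, not the ORF Finder implementation: it assumes the standard genetic code, treats an ORF as the stretch from an ATG to the next in-frame stop codon, and checks all six frames. The names find_orfs and min_aa are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_aa=70):
    """Return (strand, frame, protein) for every ATG-to-stop ORF of at
    least min_aa residues, scanning three frames on each strand."""
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = None  # residues collected since the last start codon
            for i in range(frame, len(seq) - 2, 3):
                aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X = ambiguous codon
                if protein is None:
                    if aa == "M":
                        protein = ["M"]
                elif aa == "*":
                    if len(protein) >= min_aa:
                        orfs.append((strand, frame, "".join(protein)))
                    protein = None
                else:
                    protein.append(aa)
    return orfs


if __name__ == "__main__":
    toy = "ATG" + "GCT" * 3 + "TAA"  # hypothetical toy sequence
    print(find_orfs(toy, min_aa=3))
```

The default threshold of 70 residues mirrors the lower bound quoted on the slide; a predicted ORF is still only a hypothetical protein until it is verified experimentally.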
Protein analysis: scale overview and databases
HPs and Functional SEs domain
Highlight the similarities and differences
of functionally important sites
Derive a structural alignment
Detect evolutionary relationships that cannot
be perceived from the sequence alone
Protein analysis: databases
Database     URL                     Content
GenBank      www.ncbi.nlm.nih.gov    nucleotide sequences
Ensembl      www.ensembl.org         human/mouse genome (and others)
PubMed       www.ncbi.nlm.nih.gov    literature references
NR           www.ncbi.nlm.nih.gov    protein sequences
SWISS-PROT   www.expasy.ch           protein sequences
InterPro     www.ebi.ac.uk           protein domains
OMIM         www.ncbi.nlm.nih.gov    genetic diseases
Enzymes      www.chem.qmul.ac.uk     enzymes
PDB          www.rcsb.org/pdb        protein structures
KEGG         www.genome.ad.jp        metabolic pathways
NCBI databases
Protein sequence databases
• Fewer data than for nucleotide sequences;
• Protein sequences rarely come from direct protein sequencing;
• Most are obtained by translating nucleotide sequences;
www.expasy.org
Practice
• Reverse vaccinology
• in-silico analysis and relation between staphylococcal enterotoxins and hypothetical
toxins: a prediction study for Staphylococcus aureus
Practice
Reverse vaccinology
First genomic approach to the development of a vaccine: reverse vaccinology applied to
Neisseria meningitidis
Start from the whole genome sequence
→ Computer prediction
→ In silico vaccine candidates
→ Express recombinant proteins
→ Immunogenicity testing in animal models
→ Vaccine development
→ Vaccine
1-2 years
Practice
Reverse vaccinology
First genomic approach to the development of a vaccine: reverse vaccinology applied to
Neisseria meningitidis
ORF prediction on the partial genomic sequence (ORF Finder)
Homology searches for all the predicted ORFs (PSI-BLAST, FASTA)
Hits found (function assigned)
- Enzyme, cytoplasmic localization
- Already known Neisseria antigen
No hits found (hypothetical proteins)
Homology to bacterial surface-associated proteins
Localization prediction (PSORT, SignalP, TMPRED)
Predicted localizations:
- Secreted
- Outer membrane
- Inner membrane
- Periplasmic
- Lipoproteins
- Cytoplasmic
Outcome for each ORF: SELECTED or DISCARDED
Practice
in-silico analysis and relation between staphylococcal enterotoxins and hypothetical
toxins: a prediction study for Staphylococcus aureus
Background
Staphylococcus aureus carries a large repertoire of virulence factors, including over 40 secreted
proteins and enzymes that it uses to establish and maintain infections.
• toxic shock syndrome toxin (TSST)
• Panton-Valentine leukocidin (PVL)
• the exfoliative toxins A and B (ETA and ETB)
• the family of staphylococcal enterotoxins, e.g. A and B (SEA and SEB), associated with food poisoning
S. aureus may produce 21 different SEs, excluding variants
Background
S. aureus may produce 21 different SEs
Toxin  Molecular mass (kDa)  Emetic activity  Crystal structure solved  Gene  Accessory genetic element
Classical staphylococcal enterotoxins (SEs)
SEA    27.1                  yes              yes                       sea   ΦMu50a
SEB    28.4                  yes              yes                       seb   pZA10, SaPI3
SEC    27.5–27.6             yes              yes                       sec   SaPIn1, SaPIm1, SaPImw2, SaPIbov1
SED    26.9                  yes              yes                       sed   pIB485-like
SEE    26.4                  yes              no                        see   ΦSab
New types of staphylococcal enterotoxins (SEs)
SEG    27.0                  yes              yes                       seg   egc 1 (νSaβ I); egc 2 (νSaβ III); egc 3; egc 4
SEH    25.1                  yes              yes                       seh   MGEmw2/mssa476 seh/seo
SEI    24.9                  weak             yes                       sei   egc 1 (νSaβ I); egc 2 (νSaβ III); egc 3
SER    27.0                  yes              no                        ser   pIB485-like; pF5
SES    26.2                  yes              no                        ses   pF5
SET    22.6                  weak             no                        set   pF5
Staphylococcal enterotoxin-like proteins (SEls)
SEl U  27.1                  nd               no                        selu  egc 2 (νSaβ III); egc 3
SEl V  nd                    nd               no                        selv  egc 4
SEl J  28.5                  nd               no                        selj  pIB485-like; pF5
SEl K  26.0                  nd               yes                       selk  SaPIbov1, SaPI5
SEl L  26.0                  no               no                        sell  SaPIn1, SaPIm1, SaPImw2, SaPIbov1
SEl M  24.8                  nd               no                        selm  egc 1 (νSaβ I); egc 2 (νSaβ III)
SEl N  26.1                  nd               no                        seln  egc 1 (νSaβ I); egc 2 (νSaβ III); egc 3; egc 4
SEl O  26.7                  nd               no                        selo  egc 1 (νSaβ I); egc 2 (νSaβ III); egc 3; egc 4; MGEmw2/mssa476 seh/seo
SEl P  27.0                  nd               no                        selp  ΦN315, ΦMu3A
SEl Q  25.0                  no               no                        selq  SaPI5
Background:
"Hypothetical proteins":
• A protein that is predicted to be expressed from an Open Reading Frame, but for which there is
no experimental evidence of translation
• A substantial fraction of proteomes
• There is so far no classification: these are proteins predicted from nucleic acid sequences that have not
been shown to exist by experimental protein chemical evidence.
Similarity between 13 well-known deposited S. aureus SEs and 50 HPs, assessed with the
following tools and databases:
SEA - SEB - SEC - SED - SEG - SEI - SEH - SEK - SEL - SEM - SEN - SEO - SEQ
1. ExPASy ProtParam:
computation of various physical and chemical parameters for a given protein sequence -
http://web.expasy.org/protparam/
2. NCBI Conserved Domains:
search for conserved domains within a coding nucleotide sequence -
http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi
3. Protein Data Bank (PDB):
the PDB archive contains information about experimentally determined structures of proteins, and allows users to
visualize and align the most similar known structures - http://www.rcsb.org/pdb/home/home.do
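As an illustration of the kind of physical and chemical parameters mentioned in step 1, the sketch below computes the average molecular weight and the amino acid composition of a protein sequence. It is not ProtParam's code: it assumes standard average residue masses and the 20 standard amino acids only, and the function names are hypothetical.

```python
from collections import Counter

# Average masses (Da) of amino acid residues, i.e. after loss of water.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water molecule is added back for the free termini


def molecular_weight(protein):
    """Average molecular weight (Da) of an unmodified protein sequence."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    unknown = set(protein) - RESIDUE_MASS.keys()
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


def composition(protein):
    """Fraction of each amino acid present in the sequence."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in sorted(counts.items())}


if __name__ == "__main__":
    toy = "MKKLLA"  # hypothetical toy peptide
    print(round(molecular_weight(toy), 1), composition(toy))
```

Dividing the result by 1000 gives the mass in kDa, the unit used for the enterotoxins in the table of the previous slide.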
47 of the 50 HPs have at least one conserved domain
The instability index (I.I.), which provides an estimate of the stability of a protein
in a test tube (ProtParam treats values below 40 as stable),
classified 32 proteins as stable
Within the stable HPs:
6 HPs show conserved-domain
homologies with SEs:
Staphylococcal/Streptococcal toxin,
Oligonucleotide Binding (OB)-fold domain
Staphylococcal/Streptococcal
toxin and β-grasp domain
6 HPs have unknown function
and belong to a family of S. aureus
uncharacterized proteins:
4 sequences match
well-known proteins
with a high E-value
(the E-value relates
the score of an alignment between a user-supplied sequence and a database sequence to the number of alignments with at least that score expected by chance)
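The relation between alignment score and E-value can be sketched with the Karlin-Altschul formula E = K · m · n · e^(−λS), where S is the raw score, m the query length and n the database length. The default values of λ and K below are illustrative assumptions (typical of gapped BLOSUM62 searches); the slides do not state which parameters were used, and real tools also apply length corrections omitted here.

```python
import math


def bit_score(raw_score, lam=0.267, k=0.041):
    """Normalised (bit) score from a raw alignment score.
    lam and k are assumed statistical parameters, not values from the slides."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam=0.267, k=0.041):
    """Expected number of chance alignments scoring at least raw_score
    in a search space of query_len * db_len."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    # Hypothetical numbers: a 250-residue query against 10 million residues.
    for score in (40, 80, 160):
        print(score, round(bit_score(score), 1), e_value(score, 250, 10_000_000))
```

For a fixed search space the E-value falls exponentially as the score rises, so a smaller E-value means that fewer hits of that quality are expected by chance.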
Background
Experimentally determined structures matched by the 4 sequences with the
high E-value (NCBI accession and PDB code are shown)
gi446958341 (1TS2)
gi501167136 (1I4G)
gi446958339 (1Q1L)
gi446958340 (1TS5)
In silico analysis of the functionally important domains and protein families shows that 6 of the 50 HPs are
related to the same family as the SEs. This approach could help to identify many hypothetical proteins
in databases and to predict their possible involvement in the mechanisms of foodborne illness.
Two examples:
biosequence alignment and algorithmic solutions
But we must always remember that:
The methods used (for example algorithms and modeling)
allow you to find the "best" alignment efficiently
but do not guarantee that the result is biologically true;
it must also make biological sense and match the function
In the course of evolution, the sequence of a protein is less conserved than its secondary, tertiary and
quaternary structure.
Two consequences:
Homologous proteins can have very different sequences and therefore produce alignments with a
low similarity score.
If the similarity between two protein sequences is high (statistically significant), it is quite
reasonable to assume that there is a relationship of functional homology between them.
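To make the point about the "best" alignment concrete, here is a minimal sketch of global pairwise alignment by dynamic programming (the Needleman-Wunsch algorithm). The scoring values (match, mismatch, gap) are arbitrary assumptions; the slides do not say which algorithm or scoring scheme was used. The result is optimal only with respect to these scores, which is exactly why it carries no guarantee of biological truth.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b with a linear gap penalty.
    Returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        # Score of aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell holds the best score for the two prefixes.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

Changing the scoring values can change which alignment is reported as optimal, so the choice of scoring scheme is itself a biological assumption.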
Thank you for your attention