UniProt
textfiles, containing complete annotation, cannot be read directly by
biolegato. However, the file format is almost identical to that used by
the EMBL database. The bioLegato Import Foreign Format function uses
readseq to interconvert file formats. Therefore, to read this file
choose File --> Import Foreign Format, and
type
in the name of the file.

P1HOM Version 5/13/91
X-axis: ANP1C_MACAM
Y-axis: ANP2_ANALU
SIMILARITY RANGE: 10 MIN.PERCENT SIMILARITY: 30
SCALE FACTOR: 0.90 COMPRESSION: 4
40 80
U . . . . . . .
H . . . . . . .
F . . . . . . .
I . . . . . . .
P . . . . . . .
P . . . . . . .
OH . . . . . . .
FC . . . . . . .
CE. . . . . . .
40 ........GI............................................................
.N j . . . . .
.PP . . . . . .
. OK . . . . . .
. JG . . . . . .
. EF . . . . . .
. FE . . . . . .
. JK . . . . . .
. M . . . . . .
. L. . . . . .
80 ..................ML..................................................
. NP . . . . .
GGSEARCH performs rigorous
global alignments. That is, the alignment is constructed so that all of
both
proteins must be included in the alignment. Choose Similarity --> GGSEARCH to get
the GGSEARCH
menu:
Click Run. Output is shown below:
The best scores are: n-w bits E(1)
ANP2_ANALU 88 bp ( 88) 455 72.0 4.2e-218
>>ANP2_ANALU 88 bp (88 aa)
n-w opt: 455 Z-score: 365.0 bits: 72.0 E(): 4.2e-218
global/global (N-W) score: 455; 84.1% identity (92.0% similar) in 88 aa overlap (1-87:1-88)
10 20 30 40 50
ANP1C_ MKSVILTGLLFVLLCVDHMT-ASQSVVATQLIPINTALTPAMMEGKVTNPIGIPFAEMSQ
:::.:::::::::::::::. ::::::::::::::::::: ::.:.:.:: :::::::::
ANP2_A MKSAILTGLLFVLLCVDHMSSASQSVVATQLIPINTALTPIMMKGQVVNPAGIPFAEMSQ
10 20 30 40 50 60
60 70 80
ANP1C_ IVGKQVNTPVAKGQTLMPNMVKTYVAGK
::::::: ::: .:::::::::: :.:
ANP2_A IVGKQVNRAVAKDETLMPNMVKTYRAAK
70 80
When similarity is this high, it is easier to view the mismatches, rather than the matches:

The best scores are: n-w bits E(1)
ANP2_ANALU 88 bp ( 88) 455 74.0 1.1e-232
>>ANP2_ANALU 88 bp (88 aa)
n-w opt: 455 Z-score: 375.5 bits: 74.0 E(): 1.1e-232
global/global (N-W) score: 455; 84.1% identity (92.0% similar) in 88 aa overlap (1-87:1-88)
10 20 30 40 50
ANP1C_ MKSVILTGLLFVLLCVDHMT-ASQSVVATQLIPINTALTPAMMEGKVTNPIGIPFAEMSQ
x x X x x x X
ANP2_A MKSAILTGLLFVLLCVDHMSSASQSVVATQLIPINTALTPIMMKGQVVNPAGIPFAEMSQ
10 20 30 40 50 60
60 70 80
ANP1C_ IVGKQVNTPVAKGQTLMPNMVKTYVAGK
XX Xx X x
ANP2_A IVGKQVNRAVAKDETLMPNMVKTYRAAK
70 80
| Try
it: Select ANP11_MACAM, ANP3_MACAM Similarity --> P1HOM COMPRESSION: 2 |
P1HOM Version 5/13/91
X-axis: ANP11_MACAM
Y-axis: ANP3_MACAM
SIMILARITY RANGE: 10 MIN.PERCENT SIMILARITY: 30
SCALE FACTOR: 0.90 COMPRESSION: 2
20 40 60
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
20 ......................................................................
. . . . . . .
. . . . . . .
U . . . . . . .
O . . . . . . .
K . . . . . . .
I . . . . . . .
H . . . . . . .
I . . . . . . .
L . . . . . . .
40 .......Q..............................................................
V. . . . . . .
W . . . . . .
. . . . . . .
. c . . . . . .
. f . . . . . .
. e . . . . . .
. f . . . . . .
. . . . . . .
. . . . . . .
60 ......................................................................
. g. . . . . .
. . . . . . .
. .d . . . . .
. . c . . . . .
. . Z . . . . .
h . . . . . . .
i . . W . . . . .
h . . T . . . . .
i . . V . . . . .
80 ...........................X..........................................
. . Z. . . . .
. . f . . . .
. . . . . . .
. . . . . . .
. . . . . . .
Surprisingly, comparison of the two sequences using GGSEARCH shows
no significant
similarity.
| Try
it: Select ANP11_MACAM, ANP3_MACAM Similarity --> GGSEARCH |
GGSEARCH produces global alignmentsYet, we know from the P1HOM output that there is a long region of similarity. In particular, P1HOM shows that the first 22 amino acids of ANP11_MACAM show a strong match with residues 22 through 44 of ANP3_MACAM. This illustrates an important distinction between global and local alignments. GGSEARCH performs alignments in which the entire length of both sequences must be forced into an alignment. GLSEARCH finds the highest-scoring local alignment between both sequences.
version 35.04 Oct. 7, 2008
Query: bio8564379525050104483.tmp.seq1
1>>>ANP11_MACAM 62 bp - 62 aa
Library: bio8564379525050104483.tmp.seq2 91 residues in 1 sequences
0 residues in 0 sequences (range: 50-77)
Statistics: Altschul/Gish params: n0: 62 Lambda: 0.158 K: 0.019 H: 0.100
Algorithm: Global/Global Needleman-Wunsch (2007) (6.0 April 2007)
Parameters: BL50 matrix (15:-5), open/ext: -10/-2
Scan time: 0.000
!! No sequences with E() < 0.010000
62 residues in 1 query sequences
91 residues in 1 library sequences
Scomplib [35.04]
start: Mon Sep 28 12:36:59 2009 done: Mon Sep 28 12:36:59 2009
Total Scan time: 0.000 Total Display time: 0.010
| Try
it: Select ANP11_MACAM, ANP3_MACAM Similarity --> GLSEARCH |
GLSEARCH produces global query alignments
version 35.04 Oct. 7, 2008
Query: bio3874515111645596706.tmp.seq1
1>>>ANP11_MACAM 62 bp - 62 aa
Library: bio3874515111645596706.tmp.seq2 91 residues in 1 sequences
91 residues in 1 sequences (range: >50)
Statistics: (shuffled [500]) Unscaled normal statistics: mu= 14.4260 var=94.9384 Ztrim: 0
Algorithm: Global/Local Needleman-Wunsch (2007) (6.0 April 2007)
Parameters: BL50 matrix (15:-5), open/ext: -12/-2
Scan time: 0.050
The best scores are: n-w bits E(1)
ANP3_MACAM 91 bp ( 91) 211 50.6 8.2e-91
>>ANP3_MACAM 91 bp (91 aa)
n-w opt: 211 Z-score: 251.7 bits: 50.6 E(): 8.2e-91
global/local score: 211; 58.1% identity (72.6% similar) in 62 aa overlap (1-62:25-86)
10 20 30
ANP11_ SVVATQLIPINTALTPAMMEGKVTNPIGIPFAEMSQ
::::::::::::::: .:: .: : ::: .. .
ANP3_M MKSVILTGLLFVLLCVDHMSSANQSVVATQLIPINTALTLVMMTTRVIYPTGIPAEDIPR
10 20 30 40 50 60
40 50 60
ANP11_ IVGKQVNRIVAKGQTLMPNMVKTYAA
.:. :::. : : ::::.::: :
ANP3_M LVSMQVNQAVPMGTTLMPDMVKFYCLCAPKN
70 80 90
| Try
it: Select ANP11_MACAM, ANP3_MACAM Similarity --> SSSEARCH |
SSEARCH searches a sequence data bank
version 35.04 Oct. 7, 2008
Please cite:
T. F. Smith and M. S. Waterman, (1981) J. Mol. Biol. 147:195-197;
W.R. Pearson (1991) Genomics 11:635-650
Query: bio6278841360627469717.tmp.seq1
1>>>ANP11_MACAM 62 bp - 62 aa
Library: bio6278841360627469717.tmp.seq2 91 residues in 1 sequences
91 residues in 1 sequences
Statistics: (shuffled [500]) MLE statistics: Lambda= 0.2059; K= 0.148
Algorithm: Smith-Waterman (PGopt) (6.0 Mar 2007)
Parameters: BL50 matrix (15:-5), open/ext: -10/-2
Scan time: 0.070
The best scores are: s-w bits E(1)
ANP3_MACAM 91 bp ( 91) 214 66.3 6.1e-17
>>ANP3_MACAM 91 bp (91 aa)
s-w opt: 214 Z-score: 336.6 bits: 66.3 E(): 6.1e-17
Smith-Waterman score: 214; 60.0% identity (75.0% similar) in 60 aa overlap (1-60:25-84)
10 20 30
ANP11_ SVVATQLIPINTALTPAMMEGKVTNPIGIPFAEMSQ
::::::::::::::: .:: .: : ::: .. .
ANP3_M MKSVILTGLLFVLLCVDHMSSANQSVVATQLIPINTALTLVMMTTRVIYPTGIPAEDIPR
10 20 30 40 50 60
40 50 60
ANP11_ IVGKQVNRIVAKGQTLMPNMVKTYAA
.:. :::. : : ::::.::: :
ANP3_M LVSMQVNQAVPMGTTLMPDMVKFYCLCAPKN
70 80 90
| Try
it: Select ANP4_PSEAM, ANP_NOTCO COMPRESSION: 2 |
P1HOM Version 5/13/91The entire matrix is full of diagonals, because Ala-rich subsequences will always show similarity with other Ala-rich sequences.
X-axis: ANP4_PSEAM
Y-axis: ANP_NOTCO
SIMILARITY RANGE: 10 MIN.PERCENT SIMILARITY: 30
SCALE FACTOR: 0.90 COMPRESSION: 2
20 40 60 80
. h e f. ef f h e j. . . .
i hj g dZcbbdaZbcbdZcbdgghicbfbhgi . . .
igeg eZUgYbZcUYhbZUiYdjfdf ZbYhfi . . .
e. e cXeRXWbXTRWbXeTXWdfiaeZZeXf j . . .
i fc e cWTSRWWWSRRWWTSSWWcbZZaYYYXgh . . .
hg g dbVRRRZYVRRSYVRRTgYdafaZXYZbj . . .
j hf d cZfSQQSYTSQSZYTQQVYfaZZZcXahc . . .
gf.dg afdXQdQWXdQQeWZSgSZefYYdbcaig . . .
d b aaeVVPeSZVPPSgZZR UiafYYfced . . .
20 .....i...d.h.ZgaeVePSeeVPfSgaeTVedaYdYffhf............................
| To get around this problem, the
FASTA programs GGSEARCH, GLSEARCH,
SSEARCH and LALIGN all shufffle one of the sequences numerous times and
repeat the search. Rather than shuffling the entire sequence as a single unit, these programs shuffle amino acids or nucleotides in a small sliding window. Shuffling begins at one end and moves down the length of the sequence. What this accomplishes is to create a random sequence which preserves local fluctuations in sequence composition that were found in the original sequence. Local shuffling controls for the hypothesis that the two sequences show similarity because the each have similar local variations in amino acid compositions eg. in DNA, both sequences have an AT-rich followed by a GC-rich region. If the program finds significant similarity between the two unshufffled sequences, but not between one original and the population of shuffled sequences, we can reject the hypothesis that similarity is due to local variations in amino acid or nucleotide composition. |
![]() |
| Try it: Select ANP4_PSEAM, ANP_NOTCO Similarity --> SSEARCH |
SSEARCH searches a sequence data bank
version 35.04 Oct. 7, 2008
Please cite:
T. F. Smith and M. S. Waterman, (1981) J. Mol. Biol. 147:195-197;
W.R. Pearson (1991) Genomics 11:635-650
Query: bio7603365334708550294.tmp.seq1
1>>>ANP4_PSEAM 85 bp - 85 aa
Library: bio7603365334708550294.tmp.seq2 790 residues in 1 sequences
790 residues in 1 sequences
Statistics: (shuffled [500]) MLE statistics: Lambda= 0.1216; K=1.247e+05
Algorithm: Smith-Waterman (PGopt) (6.0 Mar 2007)
Parameters: BL50 matrix (15:-5), open/ext: -10/-2
Scan time: 0.800
!! No sequences with E() < 0.010000
85 residues in 1 query sequences
790 residues in 1 library sequences
We can conclude that these two sequences are analogous, resulting
from convergent evolution, rather than homologous, and descended from a
common ancestor.