ENTREZ: FINDING AND RETRIEVING SEQUENCES


EXAMPLE: Suppose you wanted to find disease resistance genes in legumes. 

Launch Entrez either from the Workspace --> Molecular Biology menu, or by typing 'entrez' at the command line.

To search for GenBank entries containing the word 'disease' in text fields, choose the Nucleotide database, set Field to 'Text word', type 'disease' as the search term, and click on 'Accept'. The word 'disease' will appear in the list below, along with its number of hits.


To refine the search, change Field to 'Organism' and type 'legume'. There is no organism named 'legume', but the selection menu brings up closely related choices, including 'Leguminosae'. Click on 'Leguminosae' to add it to the list below.


This is still a very large number of sequences, and it is likely that the majority of them are ESTs. We can get rid of the ESTs by querying for them and then negating the EST hits. First, change Field to 'Properties', and type in 'EST' as the search term.


Now, choose Options --> Advanced Queries. The Advanced Queries window shows the raw query expression used by Entrez to combine the hits into a single set. In the expression '& ( "EST"[PROP] ) )', change the '&' to '-', i.e. '- ( "EST"[PROP] ) )', and click on the Evaluate button. The '&' operator keeps only the sequences that match both sides, whereas '-' removes the EST hits from the set.
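
The way the terms combine can be sketched in a few lines of Python. This is only an illustration of the '&' and '-' logic described above, not the code Entrez itself uses. The function names are hypothetical, and the field tags WORD and ORGN are assumptions; only the PROP tag appears in the expression shown by the Advanced Queries window.

```python
def field_term(word, tag):
    """Format one term in the style shown above, e.g. "EST"[PROP]."""
    return f'"{word}"[{tag}]'


def build_query(include, exclude=()):
    """Join the included terms with '&', then subtract each excluded term with '-'."""
    if not include:
        raise ValueError("at least one included term is required")
    expr = " & ".join(f"( {term} )" for term in include)
    for term in exclude:
        expr = f"( {expr} ) - ( {term} )"
    return expr


# WORD and ORGN are assumed tags for the 'Text word' and 'Organism' fields.
query = build_query(
    include=[field_term("disease", "WORD"), field_term("Leguminosae", "ORGN")],
    exclude=[field_term("EST", "PROP")],
)
print(query)
```

The exact parenthesization that Entrez writes may differ from this sketch; the point is that the EST term is subtracted from the intersection of the other two terms.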


The results show that 273 sequences match both words, but are NOT ESTs. Any number of words may be added to the Query Refinement window in this fashion. Choose which words you wish to 'AND' together by highlighting the numbers in the right column. In this example, clicking on 'Retrieve 273 Documents' will bring up a list summarizing the sequences:


You can double click on any sequence in the list to view the GenBank entry. For example, the soybean clone listed above has the following entry:


This may be relevant to disease resistance in legumes, because most disease resistance proteins in plants are known to have leucine-rich repeats. To save the GenBank entry, simply choose File --> Export GenBank. Usually when saving GenBank entries, it is best to use the LOCUS name as the filename, with the .gen extension. Hence, this entry would be saved as 'AY193892.gen'.
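
If you save many entries, the naming convention can be applied automatically. The following Python sketch reads the LOCUS name from the first LOCUS line of a GenBank entry and appends the .gen extension. The function name is hypothetical, and the record text in the example is a placeholder, not the real contents of the entry.

```python
def locus_filename(genbank_text, extension=".gen"):
    """Return '<LOCUS name><extension>' for the first LOCUS line in the text."""
    for line in genbank_text.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            if len(fields) < 2:
                raise ValueError("LOCUS line has no name")
            # The LOCUS name is the second whitespace-separated field.
            return fields[1] + extension
    raise ValueError("no LOCUS line found")


# Placeholder record: only the LOCUS name is taken from the example above.
record = "LOCUS       AY193892\nDEFINITION  (placeholder text)\n//\n"
print(locus_filename(record))
```

For the placeholder record this prints 'AY193892.gen', the filename suggested above.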