Bioinformatics Tutorial

Seeking Structure


In this section, you will learn how to use a FASTA sequence as a search input (query) to the Protein Data Bank, the repository of almost all protein models that have been deduced by X-ray crystallography or NMR. Your search will tell you whether anyone has produced an experimental model of your query protein, or whether models are available for any protein of similar sequence. You will also visualize the model using an online graphics tool. Finally, you will learn how to turn a long list of hits into an interactive Custom Report that makes details of each hit easy to find.

What is the structure of an opsin?

By now, perhaps you are curious about the structure of peropsin, but it's not likely that the structure of a protein of unknown function has been determined. It is likely that all opsins are similar in structure, so you can try to find a model of a similar sequence in the database for macromolecular structures, the Protein Data Bank (PDB). It will give you an idea of what kind of protein molecule an opsin is.

In fact, the PDB does not contain molecular structures at all. It is better to say that it contains models of macromolecules. These models are interpretations of data from one of the two main methods of macromolecular structure determination: X-ray crystallography and NMR spectroscopy. When researchers make a model, or as they commonly say, "determine the structure" of a macromolecule, they deposit a file containing the three-dimensional coordinates of all the atoms in the model. This coordinate file, along with an online molecular graphics tool (like the PDB's Jmol Viewer) or a computer graphics program like DeepView, is all that you need to see and study the model on your computer. Next you will retrieve a model from the PDB and view it with an online graphics tool. You will also visit the home of a top-notch computer graphics program that you can download FREE and use on your home computer.
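To make the idea of a coordinate file concrete, here is a minimal Python sketch that pulls atom names and x, y, z coordinates out of text in the classic PDB file format, where each ATOM record keeps its fields in fixed columns. The names Atom, parse_atoms, and sample_record are made up for this illustration, and the sample coordinates are invented rather than taken from any PDB entry.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str       # atom name, e.g. "CA"
    residue: str    # three-letter residue name, e.g. "MET"
    chain: str      # one-character chain identifier
    res_num: int    # residue sequence number
    x: float        # coordinates in angstroms
    y: float
    z: float


def parse_atoms(pdb_text: str) -> list[Atom]:
    """Read ATOM/HETATM records using the fixed columns of the PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append(Atom(
                name=line[12:16].strip(),      # columns 13-16
                residue=line[17:20].strip(),   # columns 18-20
                chain=line[21],                # column 22
                res_num=int(line[22:26]),      # columns 23-26
                x=float(line[30:38]),          # columns 31-38
                y=float(line[38:46]),          # columns 39-46
                z=float(line[46:54]),          # columns 47-54
            ))
    return atoms


def sample_record(serial, name, residue, chain, res_num, x, y, z) -> str:
    """Build one ATOM line with each field in its fixed column (illustration only)."""
    return (f"ATOM  {serial:>5} {name:<4} {residue:>3} {chain}{res_num:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Invented coordinates, not taken from any PDB entry.
sample = "\n".join([
    sample_record(1, " N", "MET", "A", 1, 10.0, 20.0, 30.0),
    sample_record(2, " CA", "MET", "A", 1, 11.0, 21.0, 31.0),
    sample_record(3, " C", "MET", "A", 1, 12.0, 22.0, 32.0),
])

atoms = parse_atoms(sample)
for atom in atoms:
    print(atom)
```

A graphics program does essentially this for every atom in the file before drawing anything; to try the sketch on a real model, pass it the text of a downloaded coordinate file instead of the sample.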

Point your browser to

The PDB home page contains a simple search box at the top. You can search for models using simple keywords or PDB ID codes. A PDB code has four characters, like 1CYO. How would you ever know a model by its code? When a new structure is published, the authors usually give the PDB code in the last reference of the bibliography. With that code, you can go straight to the model you want to see. But more often, your question, like ours, is more general. For such cases, PDB also provides forms for more sophisticated searches. For now, let's just see if any opsin models are available. Type "opsin" into the search box, make sure the PDB ID or keyword is selected, and click Site Search.
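If you ever need to tell an ID code from a keyword in a script of your own, a check like the following may help. It is only a sketch, and it assumes the classic ID format: four characters, the first a digit from 1 to 9 and the rest letters or digits. The function name looks_like_pdb_id is made up for this example.

```python
import re

# Assumption: classic four-character IDs, a digit 1-9 followed by three
# letters or digits (for example 1CYO or 3CAP).
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def looks_like_pdb_id(text: str) -> bool:
    """Return True if the text has the shape of a classic PDB ID code."""
    return PDB_ID_PATTERN.fullmatch(text.strip()) is not None


for term in ("1CYO", "3cap", "opsin", "rhodopsin"):
    kind = "ID code" if looks_like_pdb_id(term) else "keyword"
    print(f"{term}: {kind}")
```

The check only looks at the shape of the text; it cannot tell you whether a model with that code actually exists.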

On 2008/09/22, this search returned only one model, which is quite puzzling, because a search for "rhodopsin" returns 48 models. So it appears that the quicky (quirky?) search tool at the PDB still needs some work. But this shortcoming is a gift for now. You have bagged an experimental model of an opsin; the PDB contains only models derived experimentally, either by X-ray crystallography or NMR spectroscopy. Now take a look at this one.

Click the PDB file code 3CAP above the tiny image of the model.

You have come to the Structure Summary page for this model, which is its home page at the PDB. This page is connected to just about everything you could possibly do with this model. At the PDB, your first goal is always to get to the Structure Summary page for the model you are seeking.

NOTE: Structure Summary does not exactly jump out at you on this page. It's the tab selected over the main part of the entry, and it is a sub-tab of the Structure tab above the left column. Those tabs should be more prominent—they are what distinguishes each of the important pages in the PDB. If you want to know where you are in the PDB, look at the two sets of tabs at the top of the page. The set on the left are main tabs, and the set on the right are sub tabs of the main tabs. Main tabs take you to PDB's major sets of tools, and sub tabs subdivide them. Sub tabs under the Structure tab open LOTS of additional information about the currently chosen model.

In the left column of all PDB pages, you find a set of nested menus (they might vary on different PDB pages). Click Display Molecule to open the PDB display options. If you already own or use one of the listed viewers, like the free program DeepView, you are in business. Click your viewer to download the model and view it in a familiar environment. But first behave as if you are new to all this (perhaps you are), and use a handy viewer that works in your browser.

Click Jmol Viewer. Assuming that your computer has up-to-date Java software, your browser will load the viewer, and it will load the file 3CAP. You should see models of two rhodopsin molecules (with backbones shown as ribbon-like cartoons, one green, one blue) and several ball-and-stick models of smaller molecules. Is rhodopsin a dimer? No, but the crystals of rhodopsin from which this model was derived contained two rhodopsin molecules per asymmetric unit (the smallest portion from which the entire unit cell of the crystal can be constructed by applying the crystal's symmetry operations). PDB files usually show the full contents of the asymmetric unit. If more than one molecule is present, they are referred to as chains in the model.
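You can also see the chains without a viewer. In the classic PDB file format, every ATOM record carries a one-character chain identifier in column 22, so a few lines of Python can list the chains and count the residues in each. This is a rough sketch: the function names are made up for this example, and the sample records are invented placeholders, not lines from 3CAP.

```python
from collections import defaultdict


def residues_per_chain(pdb_text: str) -> dict[str, int]:
    """Count the distinct residues in each chain of PDB-format text."""
    seen = defaultdict(set)
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            chain_id = line[21]        # column 22: chain identifier
            residue_key = line[22:27]  # columns 23-27: residue number + insertion code
            seen[chain_id].add(residue_key)
    return {chain: len(residues) for chain, residues in sorted(seen.items())}


def sample_line(chain: str, res_num: int) -> str:
    """Build a placeholder alpha-carbon record for the given chain and residue."""
    return (f"ATOM  {1:>5}  CA  GLY {chain}{res_num:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


# Invented records: chain A has residues 1-3 (residue 1 appears twice,
# as it would with several atoms per residue), chain B has residues 1-2.
sample = "\n".join([
    sample_line("A", 1), sample_line("A", 1), sample_line("A", 2),
    sample_line("A", 3), sample_line("B", 1), sample_line("B", 2),
])

print(residues_per_chain(sample))
```

Run on a file whose asymmetric unit holds two copies of the same protein, a count like this should report two chains with similar numbers of residues.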

NOTE ON VIEWERS: The viewer embedded in the viewing frame of this page is the widely used Jmol, which you will find in use as a molecular viewer at many web sites. If you take time to get to know this viewer fairly well, you will get more out of the many sites that use it. Like most of the other viewers listed at PDB, Jmol is quite limited in its capacity for analysis of protein structure.

In my humble opinion, the most powerful protein-analysis tool listed at PDB is DeepView. DeepView may be the only protein-structure viewing and analysis tool you will ever need. You will learn about it if you continue into the homology modeling section, later.

Here are some other things you can do to get to know models in a Jmol frame (to get back to the original rendition, reload the page):

To learn more about Jmol, consult the help links at PDB below the display. You can also find extensive help for all viewers listed there. But if you plan serious protein structure work, especially judging model quality and comparing models by superimposing them, get to know DeepView.

Finding Opsin Homologs in the PDB

Next, you will try to find other models in the PDB that are homologous to the human opsins. You will ask the PDB, in effect, to "list all models whose sequences can be aligned with that of human red opsin, in order of sequence similarity." In PDB terminology, the red opsin sequence is the query, and similar models found (hits) are called subjects.

First, open your query file protred.txt (FASTA sequence of red human opsin), and copy the sequence portion only to the clipboard; omit all of the comment line that begins with >.
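If you would rather not select the sequence by hand, the same step can be sketched in Python: skip the comment line that begins with > and join the remaining lines. The function name sequence_from_fasta is made up for this example, and the sample record is an invented placeholder, not the red opsin sequence.

```python
def sequence_from_fasta(fasta_text: str) -> str:
    """Return the residues of the first FASTA record, without its '>' comment line."""
    residues = []
    seen_header = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            if seen_header or residues:
                break                     # a second record begins; stop here
            seen_header = True
            continue                      # omit the comment line itself
        residues.append(line)
    return "".join(residues)


# Invented placeholder record, NOT the human red opsin sequence.
example = """>placeholder comment line
MKTAYIAKQR
QISFVKSHFS
"""

print(sequence_from_fasta(example))

# To use your own query file instead (assuming protred.txt is in the
# current folder):
# with open("protred.txt") as handle:
#     print(sequence_from_fasta(handle.read()))
```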

At the top right of any PDB page, click Search. From the list of search types, click Sequence. On the resulting page, click the button next to use Sequence, and paste your red opsin sequence into the box just below. Note that the search tool is your new friend Blast, and that an E cut-off value of 10 is given as a default. From what you learned earlier, you know that this is not a very restrictive search criterion, so your search should pick up anything remotely similar in sequence to the red human opsin. Click the search button. The search tool is now looking for PDB models whose sequences are similar to the human red opsin sequence. Hits in UniProt are just other proteins, most of whose structures are not known. Hits in the PDB are models, so hits tell you that there are experimental models for one or more proteins that are similar in sequence to your query.

On 2008/09/22, I got 26 subjects, or 26 PDB models whose sequences are homologous to the search sequence. Each is listed with an E-value: the number of hits with a score this good or better that you would expect to find by chance in a database of this size. For values far below 1, the E-value is essentially the probability that the sequence similarity between query and subject is a coincidence. The first result or subject is PDB model 1F88, a model of bovine rhodopsin. The E-value is 6.2 × 10^-74. In other words, while the probability that a coin flip and your call will agree just by chance is 0.5, the probability that the similarity between human red opsin and bovine rhodopsin is just a chance occurrence is about

6.2 × 10^-74 (a decimal point followed by 73 zeros before the first nonzero digit),

which means, to any sane biologist, that these two molecules descended from a common ancestor. There is no chance that, in the history of the universe, two proteins could arrive at sequences this similar by chance. This also means that the structure of the bovine rhodopsin is a sure bet to be very similar to that of the human red opsin, whose structure is unknown (if it were known, this search would have found it).
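To get a feel for these numbers, you can work them out yourself. The sketch below uses the standard relation between a Blast E-value and the probability P of finding at least one chance hit that good, P = 1 - e^(-E), and then converts P into the number of coin flips you would have to call correctly in a row to be equally lucky. The function names are made up for this illustration.

```python
import math


def chance_probability(e_value: float) -> float:
    """P(at least one chance hit this good) = 1 - exp(-E)."""
    # expm1 keeps full precision when E is tiny, where 1 - exp(-E)
    # computed directly would round to zero.
    return -math.expm1(-e_value)


def equivalent_coin_flips(probability: float) -> float:
    """Number of consecutive correct coin-flip calls that is equally improbable."""
    return -math.log2(probability)


p_cutoff = chance_probability(10.0)    # the default E cut-off
p_hit = chance_probability(6.2e-74)    # the 1F88 hit

print(f"E = 10:      P = {p_cutoff:.5f}")
print(f"E = 6.2e-74: P = {p_hit:.3g}")
print(f"Coin flips in a row for the 1F88 hit: {equivalent_coin_flips(p_hit):.0f}")
```

For E = 10 the probability is nearly 1, which is why that cut-off lets almost anything through. For the 1F88 hit, the probability is indistinguishable from the E-value itself, and matching it by luck would take about 243 correct coin-flip calls in a row.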

Now look down the list of the models you found. Most are models of the same substance: bovine rhodopsin (lumirhodopsin, bathorhodopsin, and some others are altered forms that represent rhodopsin in different stages of the visual cycle), but notice that all of these come from Bos taurus, from which the good old barnyard cow got the name Bossy. A few hits are the recently published beta-2-adrenergic receptor, the first G protein-coupled receptor model besides rhodopsin. Perhaps by the time you take this tutorial, there will be more.

Use the results page to answer these questions about the comparison between human red opsin and the bovine rhodopsin in PDB 1F88:

  1. How many corresponding residues, and what percent of the residues, do the two proteins have in common (exact matches)?
  2. How many and what percent of corresponding residues are similar in chemical properties?
  3. How many gaps did the alignment program introduce, and how many residues in each gap, to get best alignment between human red opsin and 1F88?
  4. Find the longest string of exact matches between the two proteins. How many matches does it contain, and what are the beginning and ending residue numbers?
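Once you have answered the questions by eye, you may want to check your counts. The following Python sketch computes exact matches, percent identity, gap lengths, and the longest run of exact matches from the two rows of a pairwise alignment, with "-" marking gap positions. The function name alignment_stats and the toy alignment are invented for illustration; they are not the red opsin and 1F88 sequences. Percent identity is taken over the full alignment length, gaps included, which is how Blast reports it. Counting chemically similar residues (question 2) needs a substitution matrix and is left out here.

```python
def alignment_stats(query_row: str, subject_row: str, query_start: int = 1) -> dict:
    """Summarize a pairwise alignment given as two equal-length rows ('-' = gap)."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")

    identities = 0
    gap_lengths = []             # length of each run of gap columns
    current_gap = 0
    run = 0                      # current run of consecutive exact matches
    best = (0, None, None)       # (length, first query residue, last query residue)
    q_pos = query_start - 1      # number of the most recent query residue

    for q, s in zip(query_row, subject_row):
        if q != "-":
            q_pos += 1
        if q == "-" or s == "-":
            current_gap += 1
            run = 0
            continue
        if current_gap:
            gap_lengths.append(current_gap)
            current_gap = 0
        if q == s:
            identities += 1
            run += 1
            if run > best[0]:
                best = (run, q_pos - run + 1, q_pos)
        else:
            run = 0
    if current_gap:
        gap_lengths.append(current_gap)

    length = len(query_row)
    return {
        "identities": identities,
        "alignment_length": length,
        "percent_identity": 100.0 * identities / length if length else 0.0,
        "gap_lengths": gap_lengths,
        "longest_match_run": best,
    }


# Toy alignment, invented for illustration.
stats = alignment_stats("MKTAYIAK-QRQIS",
                        "MKSAYIAKGQR--S")
for key, value in stats.items():
    print(f"{key}: {value}")
```

Note that a gap in one row immediately followed by a gap in the other would be counted here as a single gap; real alignments rarely contain that pattern, but keep it in mind when comparing with your own count.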

Reports: Simplifying a Search Through Many Hits

Results pages are difficult to deal with if you want to look around on a long (anything more than 10) list of subjects (hits). To make a display that is easier to navigate, in the left column, click Tabulate, and then Custom Report. You can use this Custom Tabular Report form to generate a list of your subjects that includes any features of interest. For now, you will generate a very simple list, but you will quickly see its power.

On the form, click to put checkmarks in these boxes: Descriptor (under Structure Summary), and Source (under Biological Details). Then click Create Report at the bottom of the form.

The custom report appears, with three columns: PDB ID code, model descriptor, and biological source of the protein. The report contains many clickable items. Clicking an ID code takes you to the Structure Summary page for that model. Clicking a column heading sorts the list on that heading. Try this by clicking Source above the third column. Then look down the Source column. This makes it easy to find the non-Bos taurus entries, which include that adrenergic receptor. Anything else?
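If you copy such a report into a script, the same sort-and-scan trick takes only a few lines of Python. This is just a sketch: the rows are typed in by hand and simplified for illustration (they are not the actual report output), and the function names are made up.

```python
# Hand-typed, simplified rows for illustration; not actual report output.
rows = [
    {"pdb_id": "2RH1", "descriptor": "Beta-2 adrenergic receptor", "source": "Homo sapiens"},
    {"pdb_id": "1F88", "descriptor": "Rhodopsin", "source": "Bos taurus"},
    {"pdb_id": "1U19", "descriptor": "Rhodopsin", "source": "Bos taurus"},
]


def sort_report(report_rows, column):
    """Sort the rows on one column, as clicking a column heading does."""
    return sorted(report_rows, key=lambda row: row[column].lower())


def other_sources(report_rows, common_source="Bos taurus"):
    """Keep only the rows whose source differs from the common one."""
    return [row for row in report_rows if row["source"] != common_source]


for row in sort_report(rows, "source"):
    print(f'{row["pdb_id"]}  {row["source"]:<14}{row["descriptor"]}')

print("Not from Bos taurus:", [row["pdb_id"] for row in other_sources(rows)])
```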

Now you know how to search the PDB for models whose sequences are similar to a target or query sequence. Structural biologists use such searches when they have a new protein sequence and want to know its structure. If the structure is known, this search would find it, so if you are interested in the structure of a particular gene product, search PDB with its sequence to see if the structure is already known. If not, any hits with high sequence similarity can tell you the overall fold of the protein. You also got a glimpse of the Custom Report tool, which can make it easy for you to organize and peruse a large number of hits from any search.

Next, how to obtain a model if no experimental model is known.