Structure of the prion repeat region

Last update: 14 June 98

3D structure of the repeat domain: introduction
Cis and trans proline in the 3D prion structure
Evolutionary change of proline in mammalian and avian prions
The avian repeat anomaly
Blast pseudo-homologues of repeat region
So what is the structure of the repeat region?
Structure of flanking regions
Sequence variations in the repeat region
Take me to the pdb coordinates
Review of new experimental papers

3D structure of the repeat domain: introduction

10 May 98 webmaster 
All images and pdb files on this site are copyrighted but fully released and cleared for any non-profit, scientific, public educational/media or for-profit medical use; no further permissions are necessary.

As experimental groups close in on the structure of the prion repeat region, it is timely to pull together theoretical approaches that also illuminate this question. Here, intrinsic symmetry and compositional constraints are integrated with phylogenetic considerations and known protein structural data to suggest a 3D structure of the prion repeat domain and its flanking regions. Although knowledge of the 3D nmr structure of the globular domain did not suggest a function, that of the repeat domain is much more informative.

As earlier studies reported a simple random coil structure for the repeat region, it is first necessary to recollect anomalous structural properties of proline generally, of periodic proline in the protein world, and the location and role of proline in the prion molecule. These anomalies, in conjunction with the absence of essential metal cofactors, have understandable consequences for structural determinations based on recombinant prion apo-protein or synthetic fragments.

In horizontal genomics, an entire genome is sequenced; a couple dozen species will be completed by the end of 1998. Despite knowledge of the primary sequence, tens of thousands of protein-coding genes cannot be assigned any function at this time, including the product of the prion gene. In vertical genomics, a gene of interest is sequenced in thousands of species. This determines homological relationships, hopefully to a distant protein whose function is known (or can be more conveniently studied). Knowledge about gene function propagates rapidly once the function of a new family is figured out anywhere in the chain. The prion gene remains an orphan, but as seen below, pseudo-homologues prove surprisingly helpful.

Structure/function information can be deduced from the rates at which various stretches of a protein are evolving; this is sometimes called quintinary protein structure, the protein's structural change over time. Vertical genomics has become fast and affordable on a large scale. The prion protein is well-suited to this approach as the ORF is small, single copy, with no introns, paralogues or pseudo-genes, and a partial 3D structure at hand. About 80 species have been sequenced, along with dozens of polymorphisms and mutations. This suffices to characterize rate of change as a function of residue position, to determine local equivalents (reduced code), and to suppress noise from lab animal singlets and sequencing error via consensus and ancestral reconstruction. It follows immediately that neither the repeat domain nor its flanking regions could possibly be random coil. More subtly, a window to the actual structure is provided by exploiting the so-called avian anomaly: bird prions have a shorter, more symmetric biphasic repeat than mammals that in the end must achieve the same result.
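
The consensus and rate-of-change idea can be made concrete with a small sketch. What follows is only an illustration under stated assumptions: the toy alignment reuses the octarepeat variants mentioned on this page but the mix of four sequences is hypothetical, and Shannon entropy per column stands in for "rate of change", which the text does not define. The function name `column_profile` is invented here.

```python
from collections import Counter
from math import log2

def column_profile(alignment):
    """Return (consensus, entropies) for equal-length aligned sequences."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus, entropies = [], []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column)
        # Majority residue; ties resolve in favor of the residue seen first
        consensus.append(counts.most_common(1)[0][0])
        # Shannon entropy in bits; an invariant column scores zero
        entropies.append(-sum(n / total * log2(n / total) for n in counts.values()))
    return "".join(consensus), entropies

# Hypothetical toy alignment (NOT a real species set): one singlet G->S
toy = ["PHGGGWGQ", "PHGGGWGQ", "PHGGSWGQ", "PHGGGWGQ"]
consensus, entropies = column_profile(toy)
print(consensus)
print([round(e, 3) for e in entropies])
```

The singlet is outvoted in the consensus while still registering as the only variable column, which is the noise-suppression effect described above.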

Little has been learned about normal prion function from the tens of millions spent annually on animal containment facilities -- perhaps a few tens of thousands could be diverted to sequencing a few more species. The most useful supplemental species to sequence at this time would be additional marsupials, crocodilians, amphibians and teleost fish. It is best to sequence dynamically -- after each round, the next species is chosen after re-analysis of remaining issues and nodes on the phylogenetic tree that could resolve them. This would settle outstanding structural and functional issues and possibly provide the missing link to homologues in soon-to-be completed fruit fly and nematode genomes. A prion gene in a model organism could provide a more favorable experimental system for certain purposes.

Cis and trans proline in the 3D prion structure

25 May 98 webmaster
Proline of course is an imino, not amino, acid. The ring structure prevents hydrogen bonding on the amide nitrogen and makes its occurrence rare in beta-sheet and alpha helix. Instead, proline along with glycine is more commonly found in sharp turns connecting beta strands (beta bends) and in rigid extended structural proteins such as collagen and cuticle. Proline never participates directly in catalysis due to the chemical inertness of its methylene groups, though it may line a substrate pocket or provide rigidity to an active site.

Peptide bonds other than those with proline have a double bond character and two consecutive alpha carbons are generally trans with respect to this plane. Proline still emulates this double bond angle through steric hindrance, with angle omega seldom varying by more than 15 degrees from peptide-planar. While the cis conformation is not sterically forbidden for non-proline amino acids in short peptides, the ratio of trans : cis is nonetheless roughly 1000 : 1. For proline, the cis conformation (relative to the preceding residue) is less unfavorable and the ratio approaches 4 : 1. Although the two conformers are in equilibrium, the activation energy is so high (20 kcal/mole) that unassisted attainment of equilibrium can take minutes at physiological temperatures, much longer or never in large proteins.
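
The ratios and the barrier quoted above translate into energies and time scales with two textbook relations. This is a rough sketch, not a measurement: it assumes a temperature of 310 K, treats the 20 kcal/mole figure as a free energy of activation, and uses the Eyring equation with a transmission coefficient of 1. The function names are invented for illustration.

```python
from math import exp, log

R_KCAL = 1.987204e-3        # gas constant, kcal/(mol*K)
K_B = 1.380649e-23          # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34   # Planck constant, J*s
T_BODY = 310.0              # assumed physiological temperature, K

def free_energy_gap(trans_to_cis_ratio, temperature=T_BODY):
    """Free energy favoring trans, in kcal/mol, from dG = RT ln K."""
    return R_KCAL * temperature * log(trans_to_cis_ratio)

def eyring_half_life(barrier_kcal, temperature=T_BODY):
    """Half-life in seconds of a first-order step over the given barrier."""
    rate = (K_B * temperature / H_PLANCK) * exp(-barrier_kcal / (R_KCAL * temperature))
    return log(2) / rate

gap_general = free_energy_gap(1000)   # non-proline residues, 1000 : 1
gap_proline = free_energy_gap(4)      # proline, about 4 : 1
half_life = eyring_half_life(20.0)    # 20 kcal/mole barrier from the text

print(f"trans favored by {gap_general:.2f} kcal/mol (general), {gap_proline:.2f} (proline)")
print(f"uncatalyzed half-life roughly {half_life:.0f} s")
```

Under these assumptions the trans preference is near 4.3 kcal/mol for ordinary residues and under 1 kcal/mol for proline, and a single uncatalyzed isomerization has a half-life of the order of ten seconds, so a chain with many prolines would plausibly need minutes or more.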

In a full-length properly folded protein, each proline is in one conformation or the other. Not surprisingly (compare disulphide isomerase), there exist chaperone-like enzymes in all organisms, called prolyl isomerases, that lower the energy barrier to interconversion of proline conformers. This happens in the endoplasmic reticulum as the chain is extruded. In a folding nascent chain, a proline in the conformer not featured in the final tertiary structure can slow or prevent formation of properly folded mature protein. A statistical compilation for all prolines in non-specialized proteins at PDB gives 1 proline in 17 cis, compared to 1 in 2000 cis for general amino acids, so a 130-fold excess of cis in proline after adjustment for composition. However, trans remains the most densely populated state. Prolyl isomerases are oddly the target of certain immuno-suppressive chemicals and are sometimes called cyclophilins.

It seems likely that the 3 prolines in the closely packed globular domain of mouse and hamster prion are in the correct native in vivo conformations even though the protein is made in E. coli and denaturing steps have been used in the absence of proline isomerase. This is because trapping of favorable conformations through hydrophobic, helix, and sheet formation may favor only one of the 8 permutations. All 3 prolines turn out to be trans in the mouse nmr structure. The torsion angles (omega) of the peptide bond are about 173 degrees for P137 and P165, somewhat off from planarity (180).

However, the amino terminus is a different matter, as here proline content is very high, periodic, and proper conformation may be stabilized by copper that is missing in nmr structural determinations (indeed, EDTA is added). With 12 prolines in the first 84 post-signal residues, the combinatorics of cis and trans give rise to 4096 conformers -- accordingly nmr shows an unstructured tail. The structure needs to be studied in the presence of micromolar copper and prolyl isomerase. These considerations may also affect recent mass spectrometry studies. Covalently modified arginines may also need to be correctly implemented before structural studies are meaningful in the amino terminal domain; note here they are adjacent to one proline and one residue removed from another. (In marsupials and birds, the first R becomes K while the second is conserved.)

KKRPKPGGGWNTGGSRYPGQGSPGGNRYP
PQGGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGGWGQGG
THSQWNKPSKPKTNMKHVAGAAAAGAVVGGLGG
YMLGSAMSRPLIHFGNDYEDRYYRENMYRYPNQVYYRP
VDQYNNQNTFVHDCVNITVKQHTVTTTTKGENFTETDIKMMERVVEQMCITQYQRESEAYYQRGASVILFSSP
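
The proline count and the conformer combinatorics quoted above can be checked against the first three lines of the sequence just listed. This is a minimal sketch: it assumes each X-Pro bond is independently cis or trans, and the helper name `count_conformers` is invented here.

```python
# First three lines of the post-signal sequence as listed above
segments = [
    "KKRPKPGGGWNTGGSRYPGQGSPGGNRYP",
    "PQGGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGGWGQGG",
    "THSQWNKPSKPKTNMKHVAGAAAAGAVVGGLGG",
]
post_signal = "".join(segments)

def count_conformers(n_prolines):
    """Each X-Pro bond is cis or trans, so n bonds give 2**n combinations."""
    return 2 ** n_prolines

tail = post_signal[:84]          # first 84 post-signal residues
n_pro = tail.count("P")
print(n_pro, "prolines,", count_conformers(n_pro), "cis/trans combinations")
print("globular domain:", count_conformers(3), "combinations for 3 prolines")
```

The same function gives the 8 permutations mentioned for the three prolines of the globular domain.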
The Swiss mouse structure (1AG2) has 3 prolines, at 137, 158, and 165. (UCSF hamster structure goes back to M109 -- not quite back to the CJD prolines 102 and 105.) The Ramachandran plot shows that their bond angles (phi: amide N-C alpha bond rotation, psi: C alpha-C carbonyl bond rotation) are unremarkable (though somewhat in the collagen proline angle patch) within the overall context of bond angles within the globular domain of mouse prion. It would be possible, using a non-redundant set of proteins at PDB with, say, less than 25% homology followed by Procheck run in batch, to see how prion prolines compare to a Ramachandran plot of 'all' prolines in 'all' proteins. [The Ramachandran plot for proline should really be 3 dimensional to simultaneously display omega, the peptide angle.] The diagram below shows the distribution of proline phi-psi angles relative to all phi-psi angles; the red dot shows average angles for globular polyproline II at -74, 145 degrees. [J Mol Biol 1993 229(2):472-493]

Non-proline Cis Peptide Bonds in Proteins.

J Mol Biol 1999 Feb 12;286(1):291-304 
Jabs A, Weiss MS, Hilgenfeld R 
In a non-redundant set of 571 proteins from the Brookhaven Protein Data Base, a total of 43 non-proline cis peptide bonds were identified. Average geometrical parameters of the well-defined cis peptide bonds in proteins determined at high resolution show that some parameters, most notably the bond angle at the amide bond nitrogen, deviate significantly from the corresponding one in the trans conformation. Since the same feature was observed in cis amide bonds in small molecule structures found in the Cambridge Structural Data Base, a new set of parameters for the refinement of protein structures containing non-Pro cis peptide bonds is proposed.

A striking preference was observed for main-chain dihedral angles of the residues involved in cis peptide bonds. All residues N-terminal and most residues C-terminal to a non-Pro cis peptide bond (except Gly) are located in the beta-region of a phi/psi plot. Also, all of the few C-terminal residues (except Gly) located in the alpha-region of the phi/psi plot constitute the start of an alpha-helix in the respective structure. In the majority of cases, an intimate side-chain/side-chain interaction was observed between the flanking residues, often involving aromatic side-chains.

Interestingly, most of the cases found occur in functionally important regions such as close to the active site of proteins. It is intriguing that many of the proteins containing non-proline cis peptide bonds are carbohydrate-binding or processing proteins. The occurrence of these unusual peptide bonds is significantly more frequent in structures determined at high resolution than in structures determined at medium and low resolution, suggesting that these bonds may be more abundant than previously thought. On the basis of our experience with the structure determination of coagulation factor XIII, we developed an algorithm for the identification of possibly overlooked cis peptide bonds that exploits the deviations of geometrical parameters from ideality. A few likely candidates based on our algorithm have been identified and are discussed.

P137 and P158 side chains face the interior; P165 the surface (left); the peptide bond of proline 137 and arg 136 is solidly trans (center); proline 165 terminates the beta-sheet (right) and could impede zipper-like progression distally in conversion to PrP-Sc, though beta bulges do occur. There is no evidence for left-handed collagen-like coils. The torsion angles (omega) of the peptide bond are about 173 degrees for Pro 137 and P165, which could account for their conservation [the other 19 amino acids favor 180 degrees]. Pro 158 occurs within the RYP motif that occurs twice earlier in the molecule before the repeat begins; conceivably it could displace the earlier motif in some conformational mode (domain swapping). All in all, there is nothing anomalous about prolines in the globular domain of nmr-refined mouse prion.

The quality assurance program WhatIf looks at more esoteric proline issues in hamster prion. Poorly puckered prolines not expected to occur in protein structures:

"Normal proline rings show a so-called envelope conformation with the C-gamma atom above the plane of the ring (phi=+72 degrees), or a half-chair conformation with C-gamma below and C-beta above the plane of the ring (phi=-90 degrees). If phi deviates strongly from these values, this is indicative of a very strange conformation for a proline residue, and definitely requires a manual check of the data. "

PRO  ( 158 )   9   26.5 half-chair N/C-delta (18 degrees)
PRO  ( 158 )  12  -16.7 half-chair C-alpha/N (-18 degrees)
"The omega angles for trans-peptide bonds in a structure are expected to give a gaussian distribution with the average around +178 degrees, and a standard deviation around 5.5. In the current structure the standard deviation of this distribution is above 7.0, which indicates that the omega values have been under-constrained. Standard deviation of omega values : 8.727"

Evolutionary change of prion proline

11 May 98 webmaster
Proline has a neutral frequency of occurrence of 4.6% in general protein. Having 12 prolines in the first 84 post-signal residues (14%) is high; having 5 in the 132 residue carboxy terminal domain is unremarkable. The extended invariant periodicity of prolines post-signal is a major clue to structure.
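
How unusual these counts are can be gauged with a simple binomial tail. This is only a back-of-envelope sketch: it assumes residues are drawn independently at the 4.6% background frequency, which real proteins do not strictly obey, and the function name `binomial_tail` is invented here.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

BACKGROUND = 0.046                 # neutral proline frequency from the text

amino_fraction = 12 / 84           # amino-terminal post-signal region
carboxy_fraction = 5 / 132         # carboxy-terminal domain
amino_tail = binomial_tail(84, 12, BACKGROUND)
carboxy_tail = binomial_tail(132, 5, BACKGROUND)

print(f"amino terminus: {amino_fraction:.1%} proline, tail probability {amino_tail:.2e}")
print(f"carboxy domain: {carboxy_fraction:.1%} proline, tail probability {carboxy_tail:.2f}")
```

Under the independence assumption, twelve or more prolines in 84 residues is a rare event, whereas five or more in 132 is entirely ordinary, in line with the contrast drawn above.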

The 4 proline codons, CCX, give rise through change of first codon position to ser, thr, and ala (the latter two transversions) and through change of second codon position to leu, his, gln, and arg (the latter three transversions). After considerations of codon usage in mammals, GC content, enhanced occurrence of transitions, etc., the mutations expected most frequently are in the ratio 36A:28T:26L:24S:20R:15Q:10H.
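
The single-base neighbors of the proline codons can be enumerated directly from the standard genetic code. The sketch below covers only the first part of the paragraph (which residues are reachable and whether the change is a transition or a transversion); it makes no attempt to reproduce the weighted ratios, which depend on codon usage data not given here. The function names are invented for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}

def is_transition(a, b):
    """Transitions stay within purines (A<->G) or within pyrimidines (C<->T)."""
    return (a in PURINES) == (b in PURINES)

def proline_neighbors(position):
    """Map each amino acid reachable from CCN by one base change at the
    given codon position (0 or 1) to 'transition' or 'transversion'."""
    result = {}
    for third in BASES:
        codon = "CC" + third
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            kind = "transition" if is_transition(codon[position], base) else "transversion"
            result[CODE[mutant]] = kind
    return result

first = proline_neighbors(0)    # changes at first codon position
second = proline_neighbors(1)   # changes at second codon position
print(first)
print(second)
```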

However, the single base change point mutations acceptable in the evolutionary sense (PAMs) are found in the ratio 35A:27S:9Q:7T:7R:5H:4L with gly, val, glu, lys, and asn being found after multiple steps. Since leucine, threonine, and arginine are depleted (ratio of PAMs/random mutation) by factors of 7, 4, and 3, they are seldom structurally acceptable as replacements for proline whereas alanine, serine, and glutamine are more neutral.
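
The depletion factors follow from dividing the expected ratio by the accepted (PAM) ratio, residue by residue. This sketch uses the raw quotients because those reproduce the factors of roughly 7, 4, and 3 quoted above; the two ratio sets are not rescaled to a common total, which is an assumption about how the figures were derived.

```python
# Ratios as quoted in the text
expected = {"A": 36, "T": 28, "L": 26, "S": 24, "R": 20, "Q": 15, "H": 10}
observed = {"A": 35, "S": 27, "Q": 9, "T": 7, "R": 7, "H": 5, "L": 4}

def depletion(expected, observed):
    """Expected / observed per residue; values above 1 mean depletion."""
    return {aa: expected[aa] / observed[aa] for aa in expected}

factors = depletion(expected, observed)
ranked = sorted(factors, key=factors.get, reverse=True)
for aa in ranked:
    print(f"{aa}: {factors[aa]:.1f}")
```

Leucine, threonine, and arginine head the ranking, while alanine and serine sit near 1, matching the reading that the latter are close to neutral replacements.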

Consistent with these statistics, P102L and P105L are causative for CJD; the former is a CpG hotspot. Comparable residues have not been tested in birds. Should P102x or P105x for all x also cause CJD, cis proline bonds might be suspected. There are no mutations to proline in prion sequence data, only from proline. Otherwise, there are no changes in prion prolines in any mammal species sequenced nor any polymorphisms: all the prolines are absolute invariants back to the marsupials, modulo understandable qualifications in view of repeat region insertions and deletions. Proline 158 and the RYP motif are further fixed back 310 mya to birds, suggesting them as structural anchors; proline 165 is preserved with a slight transposition; proline 137 is a histidine in birds; and P102 and P105 appear adjacent in birds.

There is no compilation of data in proteins of known structure that stratifies proline mutational or evolutionary data with respect to cis or trans bonds. The expectations would be that mutation to proline would show striking enhancement if the original non-proline residue formed a rare cis bond, that the proline would be trans if the original bond was trans, that mutation from cis proline would be markedly suppressed, favor residues with little steric hindrance, and possibly be the origin of non-proline cis bonds. Cis prolines may be extraordinarily stable over evolutionary time and identifiable in that manner. PDB returns 1250 entries, not all distinct, for 'cis and proline.'

consensus bird DNA probe from post-signal through hydrophobic core
AAGAAGGGCAAAGGCAAACCCAGTGGaGGgGGCTGGGGCACtGGGAGCCACCGCCAGCCCAGCTACCCCCGCCAGCCtGGCTACCCCCAaAATCCcGGcTATCCCCATAATCCgGGGTACCCCCACAACCCgGGGTACCCCCACAACCCtGGCTACCCCCACAACCCcGGCTGGGGACAAGGtTACAACCCATCCAGCGGAGGAAgcTACCACAACCAaAAGCCaTGGAAACCCCCCAAATCcAAgACCAACTTCAAGCACGTGGCcGGGGCaGCAGCgGCGGGTGCcGTGGTGGGgGGcTTGGGGGGCTACGCCATGGGg

ancestral bird protein probe from post-signal through hydrophobic core
KKGKGKPSGGGWGtGSHRQPSYPRQPGYPhNPGYPHNPGYPHNPGYPHNPGYPHNPGYPqNPGWGQGYNPSSGGSYHNQKPWKPPKSKTNFKHVAGAAAAGAVVGGLGGYAMG

ancestral mammal protein probe from post-signal through hydrophobic core
KKRPKPGGGWNTGGSRYPGQGSPGGNRYPPQGGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQPHGGGWGQGGGSHNQWNKPSKPKTNMKHVAGAAAAGAVVGGLGGYMLG
Via DNA replication slippage, repeat regions become pronounced hot spots (the repeat itself is the unit of genetic change, not single nucleotides). Within a given species, insertions followed by deletions (not necessarily of the inserted repeats) have the effect of holding the number of repeats constant over long time scales and also of over-writing the sequence in terms of alignment. (The first and last repeats are not affected through this mechanism.)

Thus chickens experienced a recent repeat insertion of two units internally, giving rise to a perfect tandem triplet of R2R2R2, apparently followed by a recent single unit deletion in one lineage somewhere within the ambiguity zone delineated by underlining in the figure above. The figure below highlights large ambiguity zones in blue [for deletion and insertion end points] that may reflect recent past events and predict forthcoming ones in various avian lineages.

The point here is that despite very different repeat unit sequences, number of repeats, frequent insertional and deletional events, and the immense time span since divergence, consensus avian and mammalian repeat regions still exhibit a remarkable similarity, as seen in hydrophobicity profiles. This, and high-affinity copper binding reported for both structures, suggests that both domains carry out the same functions under the same structural constraints.
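
A hydrophobicity profile of the kind referred to here is a sliding-window average over a per-residue scale. The text does not say which scale was used, so the sketch below assumes the Kyte-Doolittle hydropathy values and a window of 7; the idealized repeats stand in for the full consensus sequences, and the function name is invented for illustration.

```python
# Kyte-Doolittle hydropathy values (assumed choice of scale)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=7):
    """Mean hydropathy of each window; lowercase residues are accepted."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

# Idealized repeats from the text, four units each (illustrative only)
avian = hydropathy_profile("PHNPGY" * 4)
mammal = hydropathy_profile("PHGGGWGQ" * 4)
print([round(x, 2) for x in avian])
print([round(x, 2) for x in mammal])
```

Because the function upper-cases its input, the ancestral probes listed earlier, with their lowercase ambiguity positions, can be passed in unchanged for a fuller comparison.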

From a common ancestor, birds and mammals expanded a different DNA sequence yet arrived at the same function. The deeper avian anomaly remains unresolved: in the common ancestor of birds and mammals, what was the common ancestor of a hexarepeat PHNPGY and an octarepeat PHGGGWGQ, and did this protein bind copper or was a new function acquired through gene expansion? If one postulates that the repeats arose from a shorter and simpler upstream domain in a common ancestor (and suggestive regions do occur), then perhaps separate repeat-generating events took place in these lineages creating different but equivalent solutions to expanding an original single copper-binding domain to polynuclear metal sites found today.

Pseudo-homologues of the repeat region:
Blast searches against reconstructed ancestral sequences

15 May 98 webmaster
Despite many Blast and PDB searches by many people, no bona fide homologues of prion protein have shown up, even using 310-million-year-old reconstructed mammalian and avian sequences to minimize divergence. Specializing to the pre-globular region and focusing on the more regular avian repeat, Blast returns a set consisting almost exclusively of structural proteins such as collagen, cuticle, annexin, tractins, adhesins, actin associates, elastins, and tropoelastins, regardless of how filters for low complexity regions are set.

It is important to recognize that avian prion repeat is biphasic: proline repeats every third residue whereas histidine and the others repeat every sixth residue. Biphasic Blast returns are therefore of the most interest:

Glu-C endoprotease [Staphylococcus aureus]: biphasic, repeats 3,6
Query:    20 PSYPRQPGYPHNPGYPHNPGYPHNPGYPHNPGYPHNPGYPQNP 62
             P+ P  P  P+NP  P+NP  P+NP  P NP  P NP  P NP
Sbjct:   289 PNNPDNPDNPNNPDNPNNPDNPNNPDNPDNPNNPDNPNNPDNP 331

P06914|csp_playo circumsporozoite protein: biphasic, repeats 3,6
Query:     1 PHNPGXPHNPGXPHNPGXPHNPGXPHNPGXPHNPGXPHNPGXPHNPGXPHNPG 53
             P  PG P  PG P  PG P  PG P  PG P  PG P  PG P  PG P  PG
Sbjct:   144 PQGPGAPQGPGAPQGPGAPQGPGAPQGPGAPQGPGAPQGPGAPQGPGAPQGPG 196

(U92813) tractin -- Hirudo medicinalis: biphasic, repeats 3,6
Query:     1 PHNPGXPHNPGXPHNPGXPHNPGXPHNPGXPHNPGXPHNPGXP 43
             PH PG P+ PG P+ PG P+ P  P  P  P  PG P     P
Sbjct:  1192 PHGPGGPYGPGGPYGPGGPYGPWGPGRPLGPGGPGGPEATDGP 1234

U42580 Streptococcus B antigen: biphasic, repeats 3,6
Query:     1 PHNPGXPHNPGXPHNPGXPHNPGXPHNPGXPHNPGXPHNPGXPHNPGXPHNP 52
             P NP  P NP  P NP  P NP  P NP  P NP  P NP  P NP  P NP
Sbjct:    16 PENPEVPENPEVPENPEVPENPEVPENPEVPENPEVPENPEVPENPEVPENP 67

microfilarial sheath protein - nematode: monophasic, repeat 6
Query:    19 QPSYPRQPGYPHNPGYPHNPGYPHNPGYPHNPGYPHNPGYPQNPGW 64
             Q  YP Q GYP + GYP + GYP + GYP + GYP + GYP + G+
Sbjct:     9 QQGYPPQQGYPPQQGYPPQQGYPPQQGYPPQQGYPPQQGYPPQQGY 54

silkworm collagen: monophasic, repeat 3
Query:     9 GGGWGTGSHRQPSYPRQPGYPHNPGYPHNPGYPHNPGYPHNPGYPHNPGYPQNPGWGQGY 68
             GG    G    PS P  PG P +PG P +PG P +PG P +PG P +PG P +PG+    
Sbjct:   242 GGPQQGGQPINPSQPGHPGQPGQPGQPGQPGKPGQPGTPGQPGAPGQPGTPGQPGYPGQN 301
sp|q01443|           sporozoite surface protein ...   140  2.2e-11   3
pir||s42886          collagen - silkworm  (z303.  .   144  6.8e-11   1
gnl|pid|d1           choriogenin h [oryzias lati...   140  2.4e-10   1
gi|167666|           annexin vii [dictyostelium ...   134  1.5e-09   1
pir||s46965          microfilarial sheath - nemat...  134  1.5e-09   1
prf||1814271a        glu-c endoprotease [staph a...   127  1.3e-08   1
pir||s21758          glutamic acid-specific endope.   127  1.3e-08   1
gi|2772914|          precollagen d [mytilus ed...     122  2.3e-08   2
gi|1620100|          pro- and glu-rich, penpev (...   124  3.4e-08   1
pir||a48048          egg envelope protein wf    ...   124  3.4e-08   1
bbs|125129           v8 protease  staph aureus, ...   124  3.4e-08   1
gi|2275260|          tractin [hirudo medicinalis]     120  1.2e-07   1
gi|1684932|          adhesin protein [mycoplasma...   120  1.2e-07   1
gi|1589837|          cuticle preprocollagen [mel...   108  4.8e-06   1
gi|212742|           tropoelastin [gallus gallus]     103  1.7e-05   2
pir||a26601          elastin precursor - chicken      103  1.9e-05   2
Because of convergence, these are dubious homologues and do not indicate a structural role per se for prion protein. The interest lies in the theme of periodic prolines of repeat length 3 or 6, often in conjunction with high glycine content. GYP, PGxP, PxNP motifs are seen frequently. The similarities are even more pronounced if the amides Q and N are taken as closely related. The correct spacing of glycines and prolines is important because only glycines can form the interior core of structural coils and only proline has cis-trans options and wiggle room on the peptide bond angle, otherwise imparting rigidity and periodicity through its limited use of psi-phi space.
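
The biphasic pattern, with proline at period 3 and histidine at period 6, can be tested mechanically on any query or subject sequence. The sketch below is one simple way to do so, run here on an idealized avian hexarepeat rather than on a real database entry; the function name `longest_periodic_run` is invented for illustration.

```python
def longest_periodic_run(seq, residue, period):
    """Longest run of `residue` recurring exactly every `period` positions.
    Returns (start_index, count) with a 0-based start index."""
    seq = seq.upper()
    best = (0, 0)
    for start, aa in enumerate(seq):
        if aa != residue:
            continue
        # Skip positions that merely continue a run already counted
        if start >= period and seq[start - period] == residue:
            continue
        count, pos = 0, start
        while pos < len(seq) and seq[pos] == residue:
            count += 1
            pos += period
        if count > best[1]:
            best = (start, count)
    return best

ideal = "PGYPHN" * 5   # idealized avian hexarepeat, five units
print("P, period 3:", longest_periodic_run(ideal, "P", 3))
print("H, period 6:", longest_periodic_run(ideal, "H", 6))
print("H, period 3:", longest_periodic_run(ideal, "H", 3))
```

A sequence is biphasic in the sense used here when one residue gives a long run at period 3 while another gives a long run only at period 6; a monophasic return shows just one of the two.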

Missing from the databases in this periodic proline context is histidine, that is, PHN repeats; ironically, histidine is the key residue used to ligand copper in other proteins. Glycine is found here instead, suggesting that histidine is the point of departure for understanding the transition from mere collagen-like to collagen-like with copper-binding. The other relevant aspect to histidine is its pK 7, bracketed by physiological pH.

These observations suggest that the known 3D structures of the structural proteins found by Blast, after some judicious adjustments, might illuminate the structure of the avian repeat domain, which is more regular than and arguably a good proxy to mammalian repeat. The idea would be to use as coordinates the proline skeleton along with glycines, methylene-shortened glutamines, and any tyrosine-asparagine stacking before introducing histidine (in place of gly). These changes on a hexamer repeat unit can be accomplished in Swiss-PdbViewer using its mutation feature and rotamer libraries. If a trimeric coil is deemed necessary, it can be generated by 120 degree rotation along the axis. The main problem is that glycine to histidine is an insurmountable internal obstacle for a triple coil.

The mammalian repeat region has very considerable similarities in composition, total extended length, and hydrophobicity pattern, yet the spacing is different: 8-9 instead of 6 in avian (and 9-10 in a marsupial). The protein lineages are unquestionably homologous along their entire span. Militating against indeterminate extensibility are the nearly identical lengths of avian and mammalian repeats (despite occasional variance of ±1 units) and the disease-causing properties of extra repeats in humans. The likeliest explanation is that longer repeats prevent correct topological processing at the endoplasmic reticulum, perhaps with prolines 102 and 105 providing signal-specific kinks similar to those prolines immediately post-signal peptide. Another hypothesis is that a fixed length coil is ideal for some structural purpose in mature protein. Prion protein lacks a fibronectin-type extensibility domain. It is quite possible that chimeric transgenics would function correctly [eg. bird repeat could substitute in mammal prion].

Blast returns nothing closely resembling mammalian PHGGGWGQ repeat, in part because polyglycine runs are favored. These proteins are mainly structural but not as instructive as the avian pseudo-homologues:

sp|p10495|g            glyc-rich cell wall struct ..  137  5.2e-10   1
pir||s54729            rna-binding protein cabeza .   124  1.8e-09   2
pir||s41161            keratin 9, cytoskeletal -      132  2.5e-09   1
gnl||d102              insoluble protein [pinctada.. .119  7.1e-08   2
pir||i59234            octamer binding transcription..117  2.6e-07   1
gi|929566|             fibrillarin [drosophila mel... 113  9.1e-07   1
pir||a44805            eggshell protein - fluke.. .   113  9.2e-07   2
bbs|157676             silk fibroin heavy chain {c... 112  1.2e-06   1
gi|2723362|            lustrin a [haliotis rufes...   110  2.3e-06   1
gi|2253105|            fibrillarin [plasmodium f...   109  3.1e-06   1
bbs|112352             cytokeratin 2, ck 2 [human,... 107  5.9e-06   1
gi|2388658|            insulin receptor substrat...   105  1.1e-05   1
gnl|e334927            attachment region binding p...  80  1.8e-05   2
gi|1122493|            bindin [echinometra oblonga]    77  2.3e-05   2
gi|2914731|            dragline silk protein spidr... 102  2.8e-05   1
pir||mwaxib            myosin heavy chain  - aca...   100  5.2e-05   1
gi|2605798|            minor ampullate silk prot...   100  5.2e-05   1  

So what is the structure of the repeat region?

17 May 98 webmaster 
The main considerations applicable to structural determination are:

-- Periodic primary sequence implies periodic tertiary structure (linear, helix, coiled helix, or ring). Chicken exemplifies a perfect proline periodicity occurring precisely every third residue over 54 amino acids.

-- The avian biphasic repeat, being more regular, is more amenable to structural prediction and may serve as a reliable proxy for mammalian repeat. The idea is to first determine avian repeat structure, then vary and intercalate side-chain substituents while holding the structural skeleton more or less fixed.

-- The structure/function has to accommodate some insertions and deletions of repeat units as well as certain point substitutions. For example, a single deletion is a wide-spread polymorphism in human and several other species and 1-2 extra repeats have no clear association with disease. The situation is similar in birds (see above). However, there are limits: 3 repeats is never seen in any species and 4-9 extra repeats are strongly associated with CJD. It seems that a pair of repeats is the minimal unit providing sufficient copper ligands and two pairs (4 repeats) is the minimal functional unit. The role of imperfect flanking repeats may be to stabilize copper-independent architecture.

-- Glycine serves as a small flexible tightly packing spacer with permissive bond angles; its carbonyl is a potential copper ligand, as in azurin; proline provides coiling capacity, rigidity, and periodicity through its narrow range of phi-psi angles. In tropocollagen, the glycines comprise the central core. The periodicity of glycine is as important as that of proline. Diseases such as Ehlers-Danlos and osteogenesis imperfecta arise from the seemingly mild substitution G to S. [Note in this context the unusual G to S of inbred lab mouse: PHGGSWGQ.PHGGSWGQ]

-- In all known copper-binding proteins (see refs 1, 2), histidine is the primary side chain liganding the metal ion, via either of the ring nitrogens. In superoxide dismutase, both ring nitrogens of a single histidine bridge copper and zinc atoms. Copper needs 4 ligands: other relevant candidates are water, carbonyls, and tyrosine hydroxyl. [See PNAS 1997 94:14225-30 Dec 23 Karlin et al.] There are not enough copper ligands available from side chains alone without histidines doing double duty. Copper is a fairly large atom and the spacing of ligands must respect bond lengths. Copper (and zinc too) is consistently reported by experimentalists to have a strong affinity (10 micromolar) with stoichiometry 2-4 (importantly, more than 1) atoms per overall repeat region (consistent with 1 per two repeat units).

-- Proline forms an extended perfect repeat of length 3 (for up to 54 residues) within the biphasic avian repeat of length 6, with alternating HG and NY pairs stacked, strongly suggesting a left-handed collagen-like helix with 3 residues per turn, all trans bonds. This repeat is sometimes called polyproline II or PPII. The helix advances 3.12 angstroms per residue, far more than alpha helix at 1.50, resulting in an extended coil of diameter 14 angstroms (viewed as super-coiled) with some flexibility due to lack of axial internal hydrogen bonding. However, stacking of Y and N and copper intercalated between consecutive histidines may more than make up for this lack of hydrogen bonding and draw in the helix to a less extended form (3n residues/turn, n = 1,2,...), still left-handed because of phi-psi preferences.

-- Asparagine/tyrosine in avian prion are proposed to correspond to glutamine/tryptophan in mammal, suggesting the known QW motif, typically QNPDGGWG, and a NY counterpart newly proposed here, involving stacking of the amide side chain on the face of the aromatic residue. Assuming a PPII-like coil, the avian secondary periodicity of 6 residues per unit neatly stacks N and Y (respectively QW) in 3D, explaining the pairings, their invariance, and the biphasic repeat. Phenylalanine is never found as a substitution, suggesting that some polarity [indole nitrogen and phenyl hydroxyl] is important in the aromatic moiety of the amide-aromatic pairing. Another possibility is a tyrosine-histidine ring covalent bond as found at the active site of the copper-heme enzyme cytochrome oxidase, Science 280: 1753 1998. [Yeast prion has an irregular repeat region with an uncanny compositional resemblance to first mammalian repeat, PDAGYQQQYN, PQGGYQQ-YN, PQGGYQQQFN, PQGGRG-NYK but is not known to bind metal ion.]

Histidine is stacked with glycine on alternating turns of the coil, or rather not stacked, as glycine having no side chain allows an intercalated copper atom plus liganding to glycine carbonyl oxygen. This gives a novel axial stack of His-Cu-His-Cu-His-... (rotated 120 degrees from the N-Y-N-Y-N-Y-N-Y- and the P-P-P-P-P-P stacks) with interesting potential as a resonant structure. However, the distances are far too long for this to work as PPII per se.

Copper-histidine and zinc-histidine bond lengths are typically 2.1Å, not 9.3Å, as seen in superoxide dismutase (where the geometry is puckered octahedral) and in various compilations of known protein copper sites [see PNAS 1997 94:14225-30 Dec 23 Karlin et al.]. NY pairs are similarly too far apart.

Thus the issue here is the pitch and whether copper binding induces a flattening of this pitch, bringing the necessary ligands into proximity. Alpha helices do not have an integral number of residues per turn, but here exactly 2 repeat units per full turn supply enough ligands at the correct distance and right angles to coordinate correctly with copper and/or zinc, and retain periodicity for regular stacking.

In collagen, monomer supercoiling about a cylindrical axis creates the opportunity for the trimeric structure to form. Collagen has two prolines in the triplet unit (one hydroxylated) and polyproline itself has three, but avian prion has just one and so lacks the full 'discipline' of the phi-psi = (-75, 145).

-- Note that when the pitch is close to zero, the structure looks more like a heme-type ring or hairpin with binding across 'anti-parallel' repeat monomers even though the structure is still a helix. In the absence of copper but presence of prolyl isomerase, the prion repeat conformation may well be extended PPII rather than random coil. Proline is somewhat tolerant of peptide bond torsion angle deviations which can contribute to coiling.

-- Collagen has a monophasic repeat PGxPGx, with the period 3 of glycine essential for the tightly packed 3-stranded parallel coils of tropocollagen, excluding a prion repeat trimer because of the bulky G to H substitution at one of the glycines. A dimer with vacancy or a double helix is conceivable. Hydroxyproline (sometimes glycosylated) is important in collagen-like peptides but here there are adequate polar substitutions with H, Q/N. It seems unlikely that prion hydroxyproline could have been missed.

globular domain dimensions

Species          # repeats   Å/repeat   total length (Å)   # Cu atoms
duck, crane      6.5         18.0       117                3
owl, prion       7.5         18.0       135                3
chickenL         8.0         18.0       144                4
chickenXL        9.0         18.0       162                4
sq. monkey       4           18.0       72                 2
mouse, human     5           18.0       90                 2
bovine, sheep    5           18.0       90                 2
bovineL          6           18.0       108                3
human extra      6-14        18.0       108-252            3-7

-- Prion repeats in this PPII conformation have extraordinary lengths, averaging 137 angstroms in birds, roughly 5 times the mean diameter of the globular domain of mouse prion (18-36Å) and twice the diameter of a membrane bilayer (65-80Å). Mammalian repeats have monophasic proline and lengths come out somewhat shorter using assumptions of the PPII model. Upstream defective repeats, not counted in table, could even out length comparisons between the lineages.
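The lengths in the table come from one multiplication per row. A minimal Python sketch, assuming the table's fixed 18.0 Å per repeat unit (the names `ANGSTROM_PER_REPEAT` and `repeat_length` are illustrative only, and only a few rows are shown):

```python
# Assumption: every repeat unit spans 18.0 angstroms along the coil axis,
# the per-repeat figure used in the table.
ANGSTROM_PER_REPEAT = 18.0


def repeat_length(n_repeats: float) -> float:
    """Total axial length in angstroms of n_repeats repeat units."""
    return n_repeats * ANGSTROM_PER_REPEAT


# Repeat counts copied from a few rows of the table.
species = {
    "duck, crane": 6.5,
    "chickenXL": 9.0,
    "mouse, human": 5,
    "bovineL": 6,
}

lengths = {name: repeat_length(n) for name, n in species.items()}

for name, length in lengths.items():
    print(f"{name:14s} {length:6.1f} A")
```

Changing `ANGSTROM_PER_REPEAT` shows how sensitive the total length is to the assumed pitch of the coil.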

-- Short stretches of PPII are now recognized as fairly common in globular proteins, appearing on their surfaces rather than in interiors, for example neuropeptide tyrosine, potato lectin, plant cell wall glycoproteins, human salivary kallikrein, Sox-5 and other HMG boxes, bactenecin 5, serine proteinases, aspartic proteinases, immunoglobulin constant domains (kinemage), Kunitz trypsin inhibitor, pituitary hormones, beta-endorphin, antifreeze glycoproteins, and avian pancreatic polypeptide. Complement activator c1q has two triple tropocollagen-like coils holding the globular 'head' in position (tulip-with-stem).

-- Internal PPII forms a binding domain for various proteins, notably SH3 (Src homology 3) domains, profilins, cytoskeletal components (myosin I, alpha spectrin), and signal-transduction enzymes (non-receptor tyrosine kinases, phospholipase C). Ironically, SH3 in bovine phosphatidylinositol 3-kinase itself forms congophilic amyloid.

-- The N-terminus indeed may be a random coil when it is not charged with copper (or zinc) though more likely its two states are PPII and contracted PPII double coil. However, in vivo, as the nascent prion chain passes through the endoplasmic reticulum, prolyl isomerase and available metal ions probably cause a single conformation to be populated in mature protein. In other proteins, cis-trans stabilization acts as a cooperative zipper from N to C: a couple of proline bonds in trans attract a stabilizing metal which in turn favors a trans peptide bond in the succeeding proline and so on, the all trans state also being favored on its own and from the amide/aromatic periodic bonding.

-- The random coil found in nmr is then an artefact due to the presence of metal chelating agent and absence of prolyl isomerase; the correct state may be populated to some extent but is lost in averaging over all states. Since the disordered non-native state created by heating tropocollagen is commonly called gelatin, the random coil might be called the gelatin conformer to emphasize its non-native structure relative to the PPII-derived condition. In a similar vein, the model containing both zinc and copper bound might be called the brass hypothesis.

Mammalian and avian repeat regions likely accomplish the same function via slightly different yet analogous structures; the repeat regions are both held to the same length of about 51 amino acids (the avian repeat is shorter but there are more of them). Both have a 'defective' repeat structure at both ends of the repeat that may cap off the PPII coil and transition into the next domain. Repeats found today were probably expanded from the earliest flanking repeats.

The graphic shows one possible 'alignment' that transfers the avian unit to mammal. PHGGGWGQ transitions to PHNPGY by having Q loop back displacing the run of flexible glycines. The fit is a bit of a 'stretch.' The WGQ motif has largely displaced periodic proline as the dominant structural organizing feature in mammal and the repeat length may be somewhat longer. Again, one cupric ion would be bound by a pair of histidines and (carbonyl) glycines in consecutive repeats providing the four copper ligands needed per repeat pair.

-- The single known marsupial sequence has nona-repeats, a feature also found terminally in ferungulates. However, the eutherian-metatherian divergence node cannot be reliably reconstructed with only a single marsupial. Note in all species studied, there are his-gly pairs for metal liganding, amide-aromatic stacking pairs, and prolines for coiling-constrained phi-psi angles, off-180 omega angles, and cis-trans switching.

The Bottom Line:

In view of the above considerations, the 3D structures proposed here for the avian and mammalian prion repeat domains are helical coils (of diameter 12.6Å and 17.2Å with mean pitches about 8.7 and 6.5 degrees, resp.) containing tightly bound copper (perhaps alternating with zinc spaced by histidine liganding through both ring nitrogens) in the central core. It takes two full repeats to make one turn of the helix so as to supply enough metal ligands at the required tetragonal geometry (6+6 = 4 x3) and stacking. Precise positioning of transition metals in an extended protected resonant structure suggests -- remarkably -- that the repeat domain is the core of an enzyme. The substrate evidently is a small inorganic molecule involving oxygen such as superoxide, nitric oxide, or hydroxyl radical because of copper's properties and the lack of a conventional substrate pocket.

Structure of flanking regions

13 June 98 webmaster
If the repeat region binds transition metal, what are the 69 residues flanking the 41 residue repeat regions doing? By aligning flanking regions of the repeat region in a large number of species, some curious sequence patterns emerge that must be clues to 3D structure and function.

Immediately following the signal peptide is a stretch of basic residues interspersed with prolines, called KP1 after its characteristic amino acids. Then comes a garbled pre-repeat region with similar composition and order to repeats themselves, called PG1. (In the DNA slippage model this region is the ancestral generator of the repeats.) Following the repeats comes, in macro-palindromic order, two analogous micro-domains, called PG2 and KP2. There follows a much-studied region of 20 hydrophobic residues, called here AG after its main constituent amino acids. This region has two remarkable micro-palindromic sequences, VAGAAAAGAV and GGLGG.
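The two levels of palindromic order described here are easy to check mechanically. A small Python sketch (the helper names `is_palindrome` and `domain_class` are hypothetical, introduced only for illustration):

```python
def is_palindrome(items) -> bool:
    """True if the sequence reads the same forwards and backwards."""
    items = list(items)
    return items == items[::-1]


def domain_class(name: str) -> str:
    """Strip the trailing copy number, so KP1 and KP2 both become KP."""
    return name.rstrip("12")


# Micro-palindromes at the amino acid level in region AG.
micro = {seq: is_palindrome(seq) for seq in ("VAGAAAAGAV", "GGLGG")}

# Macro-palindromic order of the flanking micro-domains about the repeat.
domain_order = ["KP1", "PG1", "repeat", "PG2", "KP2"]
macro = is_palindrome(domain_class(d) for d in domain_order)

print(micro, macro)
```

The macro check compares domain types only; it says nothing about whether the sequences within KP1 and KP2 (or PG1 and PG2) mirror each other residue by residue.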

Paradoxically, the constituent amino acids of highly conserved domain AG are so bland, so generic, and have so little structural potential that Blast searches filter them out as uninformative. They are not part of the interior hydrophobic core of the main globular domain; the protein already has a membrane GPI anchor; this domain may be a hydrophobic receptor (precise packing material) for stacked aromatic residues of the repeat region or an electron transport component such as quinone or tocopherol. This could lead to amphipathic character: the region could switch between two conformations depending on whether the effector is bound. When the prion protein is partly degraded, it is naturally nicked at this domain boundary, so this conformational switch locked in the wrong position could spell trouble.

On a time scale of 100 million years, KP1 and PG1 show no evolutionary variation whatsoever, apart from an ancient glycine deletion event in non-ferungulates, a species-level serine/asparagine substitution, and a very low singlet rate of 1 per 580 residues (which could be entirely sequencing errors). PG2 is far less constrained though substitutions remain very conservative (eg ST, GSN, GNH, SN), in character with the repeat region composition, and generally associated with older nodes. KP2 shows no changes except for SN, a very blurred pair throughout this protein. The CJD mutations P102L and P105L lie in this domain. The AG domain is completely invariant except for V-to-M in primates at the amino terminus. (Valine is ancestral as can be seen from bird and marsupial. Elk have YLLG as beta strand.) The mutation A117V, affecting the third alanine, lies in this region.

Over much longer time scales of 178 and 310 million years, to divergences of mammals with marsupials and birds respectively, only region AG remains strictly conserved (until into the first beta strand). The other regions experienced short deletions and insertions making alignment problematical, though the compositional character of each region remains intact and 50% identity can be achieved with gapping anchored on residues such as tryptophan.

Hairpin C: One region of the early sequence is highly conserved even at the level of third codon position in DNA/RNA; this region, called hairpin C, was identified by Luck et al. Since that publication, the number of species sequenced has tripled, yet the deduced structure has held up well. Curiously, the hairpin loop begins just at the boundary of PG1 and the repeat region. Colors show the hairpin stem below, dots in the graphic show third codon position:

aaccgctatccacctcagggagggggtggctgg
 N  R  Y  P  P  Q  G  G  G  G  W 
Clearly, the third codon position of a proline cannot remain cytosine (when any base would do) for 100 million years without strong selective pressure. The hairpin must serve some role in translational targeting of the mRNA, regulating the rate or amount of protein made, pausing translation, or in regulating transcription. While peripheral to final protein structure per se, maintaining a hairpin constrains amino acid substitution in this stretch, causing naively considered conservatism at the protein level to be overstated.
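The translation shown under the hairpin sequence, and the third-position bases under discussion, can be reproduced with a few lines of Python. This is a minimal sketch using the standard genetic code; the function names are illustrative and no sequence-analysis library is assumed:

```python
# Standard genetic code, built from the usual TCAG ordering of bases.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(dna: str) -> list:
    """Split a DNA string into complete codons, reading frame 1."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def translate(dna: str) -> str:
    """One-letter protein translation of the complete codons."""
    return "".join(CODON_TABLE[c] for c in codons(dna))


def third_positions(dna: str) -> str:
    """The third (wobble) base of each codon, in order."""
    return "".join(c[2] for c in codons(dna)).lower()


hairpin = "aaccgctatccacctcagggagggggtggctgg"
protein = translate(hairpin)
wobble = third_positions(hairpin)

print(protein)
print(wobble)
```

Running the same two functions over the aligned sequences of many species, column by column, is one way to tabulate how often each third position varies.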

What ideas are available for 3D structure of these flanking domains and do they support structure and function proposed for the repeat domain itself?

KP1: The first few amino acids are part of the distal part of the signal peptide cleavage recognition site. Nearby like positively charged lysines (or arginines) repel each other; proline every second residue is incompatible with known secondary structures. This region tolerates frequent small deletions as long as KP residues continue to predominate. The suggested structure is a bouquet of positive charges held in position by prolines; the function would be solubility of the repeat domain or possibly binding to negative phospholipids of the plasma membrane or to the negative side of the prion globular domain.

PG1: Certain aspects of the repeat are continued here, such as periodic aromatic amino acids and proline. This region may simply extend and stabilize the helical structure of the repeat domain without providing further metal binding sites and provide a spacer to the highly polar KP1.

PG2: The structure and function are probably similar to those of PG1.

KP2: Structurally, this is similar to KP1. While not providing a substrate for signal endopeptidase, it is a second signal to the ER that a major domain boundary has been reached and that the topology of the following hydrophobic domain AG must be correctly arranged. This is sometimes called the stop transfer effector region, or STP. [Yost CS et al. define this as HNQWNKPKTNMKH in Nature 343: p671 1990.] The region completes the large scale approximate palindromic structure KP1-PG1-(repeat)-PG2-KP2: an extended structure with solubility caps that places sheathed copper out into the extracellular matrix.

AG: We seldom think of amino acid palindromes, more usually nucleic acids. The amino acid sequence palindromes of region AG are a different form of symmetry than a direct repeat. If the symmetry is important, it becomes hard to change, rather like a tRNA stem, possibly partly accounting for its conservatism.

How might this symmetry be reflected in the structure? Begin with whatever 3D structure VAGAA possesses. Consider the mirror image taken at the carboxy terminus: this gives VAGAA-AAGAV. However the reflected structure is all D-amino acids and oriented C to N. Now formally swap amide nitrogens with the adjacent carbonyl. This both reverses chain direction and corrects to L-amino acid, yielding VAGAAAAGAV. This process has no effect on atomic centers (other than NH and CO) or peptide bonds -- so creates no steric conflicts -- but interchanges phi and psi angles. Presumably, it remains an energy minimum given that VAGAA was. Inapplicable to proline-containing residues or more general cases where hydrogen bonding from sidechain to backbone is involved, this may lead to a valid structure when, as here, most of the residues are glycine or alanine. Folding back into a hairpin implements this symmetry and suggests a symmetric hydrophobic binding pocket. The CJD mutant A117V disrupts this symmetry yet no mutations are known at other positions. The tentative nmr structure for this region is weakly consistent with these ideas; the graphic shows palindromic symmetry in 2Prp for GGLGG.

Sequence variations in the repeat region


The figure on the left summarizes amino acid variation in 80 species of eutherian mammals. These changes need to be very conservative if the structural and enzymatic proposal here is to remain viable. Note sequencing errors occur, inbred lab and domesticated animals do not necessarily have properly functioning alleles, and 'singlets' need further confirmation (more animals of the same species or additional closely related species).

The first repeat has by far the most point variability. Since most species have 5 repeats and only 4 are needed for 2 metal atoms, the first (usually 9 residues) repeat is not critical for metal binding. Instead, it may extend the helix for a half round to complete a capped environment for the first reactive center. A similar region is found just past the end of the repeat region. Glutamine apparently can (or must) replace histidine in end positions where bridging is not needed, possibly suggesting the metal order is Zn proximal, Cu distal (asparagine and carbonyl oxygens are known ligands of zinc). Marsupials also have the QHHH pattern for second residue; birds terminate differently: RQ-PSYPRQ-PGYPHN

The second repeat is strictly invariant so far. The third repeat shows only G to S in a lab animal. If a rare and possibly dysfunctional allele has been fixed by inbreeding, these mice would be poor choices for studying holoenzyme.

The fourth repeat shows a doubtful tryptophan deletion in rat but provides some support for the acceptability of serine preceding this tryptophan, seen again in repeat 5 along with a following serine. Note the first repeat can have threonine before the tryptophan. Serine and threonine are known ligands of zinc.

Six species of primates have only 4 repeats, seven have 6 repeats, but the other 67 have 5. The first repeat is usually 9 amino acids, the others are octarepeats, with the exception of ferungulates which terminate in a synapomorphic nonarepeat. Repeat regions of DNA are generally subject to frequent strand slippage during replication, resulting in internal deletions and insertions of repeat units.

Despite the variability, all 80 species have repeat regions compatible with the binding of two metal atoms. Most changes involve conservative changes such as a shorter or longer run of glycines. The carbonyl glycine is not completely resolved unless serine carbonyl is manifestly unsuitable. The only questionable cases involve old world monkeys with 4 repeats and a glutamine amide in place of the ring histidine in the first repeat.

The same substitutions are seen in upstream avian hexapeptide repeat: S for G and Q for H and again in the single known marsupial sequence, PQGGGTNWGQ PHPGGSNWGQ PHPGGSSWGQ PHGGSNWGQ.

A table on a separate page shows the protein sequence from the repeat region for 80 eutherian mammals. Latin names are abbreviated on the left, the phylosort column brings phylogenetically related sequences adjacent to each other, the other columns give repeat modules. Because of insertions and deletions, these have to be homologously aligned based on DNA; end points can only be determined up to an ambiguity zone.

Viewing the structure of the repeat region

20 May 98 webmaster

If your browser is not configured for interactively viewing molecular models, you may wish to first download and install free software and plug-ins. The best choices are RasMol and Chime for inline viewing and Kinemage/Prekin and SwissViewer for stand-alone and tinkering. The latter allows drag and drop adjustment of Ramachandran angles as well as concatenation of any number of repeat units.

Inline .pdb     Snapshot       Description
ppII.pdb        ppII.gif       stretch of standard polyproline II coil
ppII_rep.pdb    ppII_rep.gif   stretch of repeat in extended PPII coil
sod_Cu.pdb      sod_Cu.gif     essentials of copper-histidine binding [SOD]
QW.pdb          QW.gif         gln-trp binding motif from squalene oxidase
NPGYPH.pdb      -              avian repeat unit taken from sialidase

Prion protein selectively binds Copper(II) ions.

Stoeckel J, Safar J,  et al.  [UCSF]
Biochemistry. 1998 May 19; 37(20): 7185-7193
Webmaster review 5 June 98

Prion repeat binds Cu++, trp; glycine carbonyl implicated
Prion repeat copper by mass spec
Higher prion repeats in cultured cells
Copper binding to extra-cellular domain of Alzheimer APP
Copper chaperones in yeast

After reminding us of Menkes and Wilson's CNS copper diseases (SOD comes later) and chelator spongiform effects, Stoeckel et al. look at hamster 29-231, somewhat regrettably missing 6 amino acids N-terminally and modified arginines (as well as carbohydrates and GPI anchor).

Comparing circular dichroism with and without CuCl2, they observe minor unfolding and a change in environment for aromatic residues, confirmed by 70% fluorescent quenching and blue-shifting of tryptophan, suggesting a more hydrophobic milieu. The conformational shift is assigned to the repeat region because of a similar effect with 57-91.

They say without elaboration that no spectroscopic support is found for polyproline II helices [despite a minimum at 208 nm], offering no explanation for why Collinge's group had previously concluded this in Feb 97 [ellipticity minimum at 207 nm and weak broad max at 227 nm]. Collinge's group studied RYP PQGGGGWGQ PHGG and RYP PQGGGGWGQ PHGGGWGQ PHGGGWGQ PHGGGWGQ PHGGGWGQ in FEBS Letters 405: 378-384 1997, finding highly solvated tryptophan indicating extended conformation but oddly never considering copper. Neither group experimented with avian prion which is a far stronger candidate for resembling PPII.

My explanation would be that when multiple secondary structures are present simultaneously, the CD spectral superposition cannot be deconvoluted uniquely. Further, if a novel repeat variation [still left-handed] of PPII occurs, say with different pitch, the situation is murkier still. (CD is known to markedly overstate beta sheet on 29-231.) CD spectra are just warmed-over classical optical activity which ultimately must derive from L-amino acids and beta carbon centers of threonine and isoleucine. This handedness translates into secondary handedness of left-handed PPII coils, twisted beta sheet, right-handed alpha helix, and so on. A complete set of 'basis vectors' is necessary to deconvolute a CD spectrum in terms of 'fourier coefficients' of various secondary structures. If a novel coil contributing to the CD spectrum is missing from the basis set, its contribution necessarily gets misassigned.
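The misassignment argument can be made concrete with a toy calculation. The sketch below uses invented four-point 'spectra' (not measured CD data; all vectors and weights are assumptions chosen for easy arithmetic) and fits an observed mixture with a basis set that omits the novel coil. Because the two retained basis vectors are orthonormal, the least-squares coefficients reduce to simple dot products:

```python
# Toy "spectra" sampled at four arbitrary points. These are invented
# numbers for illustration, NOT real CD basis spectra.
helix = [1.0, 0.0, 0.0, 0.0]
sheet = [0.0, 1.0, 0.0, 0.0]
novel = [0.5, 0.5, 1.0, 0.0]  # hypothetical coil overlapping both


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mix(weights, spectra):
    """Weighted superposition of component spectra."""
    n_points = len(spectra[0])
    return [sum(w * s[i] for w, s in zip(weights, spectra))
            for i in range(n_points)]


# True composition: 40% helix, 20% sheet, 40% novel coil.
observed = mix([0.4, 0.2, 0.4], [helix, sheet, novel])

# Fit with the incomplete basis {helix, sheet}. For an orthonormal basis
# the least-squares coefficients are just the projections.
fit_helix = dot(observed, helix)
fit_sheet = dot(observed, sheet)

# Whatever the incomplete basis cannot represent is left as residual.
residual = [o - fit_helix * h - fit_sheet * s
            for o, h, s in zip(observed, helix, sheet)]

print(fit_helix, fit_sheet, residual)
```

In this toy case the fit reports roughly 0.6 helix and 0.4 sheet where the true weights were 0.4 and 0.2: the part of the novel coil that overlaps the known basis vectors is silently credited to them, and only the non-overlapping part shows up as residual.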

The tryptophan fluorescent quenching is now used as a proxy for copper binding, which saturates (specifically for copper) at about 2 inequivalent copper atoms per molecule at not unreasonable concentrations of about 30 micromolar. The possibility remains that other metal ions bind quite well but simply don't affect tryptophan fluorescence particularly. It is also important to study the case where copper and zinc are both present because some superoxide dismutases are alternately bimetallic. The pH dependence unsurprisingly favors titration of repeat histidines. I disagree that quenching implies physical proximity of tryptophan to copper in the final state.

Finally, we get to equilibrium dialysis and copper-67. This again yields two sites and a 14 micromolar dissociation constant. (Maybe someday someone will do the obvious: releasing radiolabelled prion protein variants from cell culture.) Copper is found to facilitate a less-reversible temperature-dependent shift in the CD spectrum (taken as a conventional secondary structure shift to beta). Binding constants and stoichiometry are subject to the limitations of working with an apo-protein fragment.
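To get a feel for what two sites and a 14 micromolar dissociation constant mean, here is a minimal Python sketch of a simple binding isotherm. It assumes two identical, independent sites, which is a simplification (the fluorescence data above suggest the sites are inequivalent), and the function name `copper_bound` is illustrative:

```python
def copper_bound(free_cu_uM: float,
                 n_sites: int = 2,
                 kd_uM: float = 14.0) -> float:
    """Mean number of copper atoms bound per protein molecule.

    Assumption: n_sites identical, non-interacting sites sharing one
    dissociation constant (simple hyperbolic binding isotherm).
    """
    return n_sites * free_cu_uM / (kd_uM + free_cu_uM)


# Occupancy at a few free copper concentrations (micromolar).
for conc in (1.0, 14.0, 30.0, 100.0):
    print(f"{conc:6.1f} uM free Cu -> {copper_bound(conc):.2f} Cu per molecule")
```

Under these assumptions occupancy is exactly half-maximal (one copper per molecule) when free copper equals the dissociation constant, and approaches but never reaches two at higher concentrations.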

Collinge's group also looked at temperature effects and chaotropes, finding much more 'PPII' at very low temperatures and _with_ urea and GuCl, the data affirming a non-random coil as a room-temperature populated state even in the absence of copper. They bring up an analogy with C hordein, a group of barley proteins whose primary structure consists predominantly of an octapeptide repeat motif PQQPFPQQ. However Tatham et al consider the structure as a mix of PPII-like and a stiff coil ('worm-like' chain) with periodic beta I/III turns and trans prolines. [Biochem J 1992 Oct 1;287( Pt 1):183-185]

Nmr is not the tool to determine the final structure because of assignment ambiguity in and within repeats and the paramagnetic aspects of cupric ion; mass spectrometry can determine only bond constraints but not spatial coordinates. It might nail down which glycine is involved, though this is scarcely an issue for birds and ambiguous in mammals. This leaves small molecule xray crystallography as the definitive method, though EPR or EXAFS are routinely used in other settings.

Note that the repeat region has a random coil conformer, a stiff but extended coil in the absence of copper, and a more compact coil in the presence of copper. Conformational shifts induced by copper have not been tied to disease-state normal and rogue conformational changes.

Looking through PDB, Stoeckel et al. find an APGYPH loop in a bacterial sialidase [see below], readily converted to a starting point NPGYPH of chicken. They say misleadingly that H and carbonyl G of sialidase 1EUT can ligand with copper, citing an unrelated azurin paper with such a glycine ligand. Most ironically, this sialidase does have a domain like fungal galactose oxidase but the copper binding there has nothing to do with this hexarepeat.

Next they joined two avian hexarepeats and bound a copper in a plane of two histidines and two carbonyl glycines. This is a fairly strong constraint on possible structures. Mammalian repeat was then bootstrapped off this template, using gly-to-ser in mouse to eliminate the gly upstream of tryp (avoiding the question of whether the carbonyl of serine could substitute). The structure was checked for steric conflict and bad bond angles in ProCheck though there was no energy optimization and the allowed configuration space is still large. A color model of two repeats and one copper is pictured, though the journal let them get away without publishing its coordinates. [The coordinates for APGYPH, their apparent starting point, are provided here as NPGYPH.pdb]

One does not learn from this paper that Miura T et al. had reported 18 months earlier that HGGG contained the Cu ligands using Raman spectroscopy [FEBS Lett 1996 Nov 4;396(2-3):248-252] nor that Hornshaw HP had previously used aromatic fluorescent quenching to infer conformational change upon binding of copper: BBRC 1995 Sep 25;214(3):993-999. There is no reconciliation of differing binding constants.

The model, which on the whole is a good effort, slightly lacks a dyad axis about the copper, suggesting that the tryptophans (and unstacked prolines) are slightly misplaced, violating the precept that periodic sequence gives periodic structure. I predict that viewing the rough local two-fold axis as part of a global screw axis would improve this structure.

It is difficult to say if the model here would extend to 4 repeats without conflict or what it would look like as a backbone (left-handed helix with some internal tightening?) or estimate the inter-copper distance in the 4 repeat extension or if the copper planes would be parallel and interactive.

No models are proposed for repeats lacking copper, end-capping of the structure with pseudo-repeats, or repeats with deleted glycines. The model fails to explain the invariant glutamines [resp., invariant asparagines in birds]; I proposed long ago the QW [resp. NY] stacking motifs.

The paper concludes with some ideas about function, notably that the repeat is a copper sink to reduce copper catalyzed [Fenton] superoxides and notes disease parallels with SOD [ALS] and copper in APP Abeta of Alzheimer, and discussion of extra-repeat CJD and single repeat deletion normal polymorphism.

The bottom line is that 4-5 research groups are rapidly converging on a novel structure for the prion repeat region with bound copper. This structure could be viewed as unprecedented, or simply as a hybrid of known design elements. Copper has very limited and specialized uses in proteins, so its presence and form of display greatly constrain normal function.

However, the repeat domain is not strongly coupled to the globular domain at this point, so the latter's function is not illuminated. Even though the repeat region may shift between conformers, these have not been shown to couple or relate to normal-rogue disease conformational shifts.

The sialidase 1EUT from the bacterium Micromonospora viridifaciens has a non-repeating sequence APGYPH at 547-552 which resembles chicken repeat NPGYPH. Gaskell A et al. write about this galactose-binding sialidase, "the bacterium may have acquired both the immunoglobulin module and the galactose-binding module from eukaryotes, as the enzyme shows a remarkable similarity to a fungal galactose oxidase [Dactylium dendroides/Hypomyces rosellus, 1GOH and 1GOG] which possesses similar domains performing different functions and assembled in a different order." Structure 1995 Nov 15;3(11):1197-1205. The galactose oxidase is a free-radical secreted copper enzyme with a 'novel thioether bond linking Cys 228 and Tyr 272 in a stacking interaction with Trp 290' [Nature 350, 87-90 (1991)], but the hexapeptide is NOT what binds copper.
