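Evaluating protein structure prediction algorithms is difficult and time-consuming. Researchers at Columbia University provide EVA, a web server that evaluates protein structure prediction algorithms in a continuous, automated, large-scale fashion. EVA currently assesses the performance of a range of prediction methods that are available on the web. Each week, the sequences of proteins whose structures have just been determined are automatically submitted to the prediction servers; the results are then evaluated, summarised, and published on the web. Such tests and their results are useful to both developers and users of protein structure prediction methods. A more detailed introduction follows.
Overall workflow: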
--------------------------------------------------------------------------------
Flowchart of EVA. Every day, EVA downloads the newest protein structures from PDB [1]. The structures are added to MySQL databases, sequences are extracted for every protein chain, and these are sent to each prediction server by META-PredictProtein [2]. META-PP collects the results and sends them to EVA. Every week, EVA runs alignment programs that search sequence databases (iterated PSI-BLAST [3], MaxHom [4]) and structure databases (CE [5], ProSub [6]) to determine homologues. Predictions of secondary structure and inter-residue contacts, as well as comparative modelling, are evaluated at the EVA satellites at Columbia University, Rockefeller University, and CNB Madrid. The central EVA site at Columbia collects all the assessments from the satellites and the results from the database searches, and publishes the updated web pages. Finally, all web pages are mirrored at the satellites.
--------------------------------------------------------------------------------
Goals:
CASP addresses the question ‘how well can experts predict protein structure if given sufficient incentive to do so?’. In contrast, the question addressed by EVA is ‘how well could molecular biologists predict protein structure, if they simply take the output from the programs out there?’. Thus, the goals are:
Provide a continuous, fully automated, and statistically significant analysis of structure prediction servers.
As has been shown by many of us, predictions based on small numbers of samples are NOT representative. EVA running for a year could produce a fairly representative picture. Even after running for a month, EVA could produce more reliable estimates than CASP can in 2 years (at least for answering the particular, restricted, but important question ‘how well do servers do?’).
EVA will NOT respond to user requests. It will NOT be a meta-server; rather, it will simply sit there and evaluate servers based on known structures.
EVA will NOT evaluate any server without the consent of the author. (Of course, the hope is that most of you to whom this message goes would co-operate.)
We are seriously concerned about the ‘negative’ side of the freedom of the Web: any newcomer can spend a day hacking out a program that predicts 3D structure and put it on the web, and it will be used.
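The point above about small sample sizes can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that per-protein accuracy varies with a standard deviation of 10 percentage points (a hypothetical figure, not one reported by EVA) and shows how the uncertainty of the mean shrinks as targets accumulate at roughly 20 per week.

```python
import math


def standard_error(per_protein_std: float, n_proteins: int) -> float:
    """Standard error of a mean accuracy estimated from n_proteins targets."""
    if n_proteins < 1:
        raise ValueError("need at least one protein")
    return per_protein_std / math.sqrt(n_proteins)


# ASSUMPTION: a spread of 10 percentage points between proteins (illustrative only).
ASSUMED_STD = 10.0

# Roughly one week, one month, and one year of targets at about 20 per week.
for label, n in (("one week", 20), ("one month", 80), ("one year", 1000)):
    se = standard_error(ASSUMED_STD, n)
    print(f"{label:>9}: n = {n:4d}, standard error = {se:.2f} percentage points")
```

Because the standard error falls with the square root of the number of targets, quadrupling the sample only halves the uncertainty, which is why a continuously running evaluation is attractive.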
Technical details:
Repeat: no use of EVA upon request, only for evaluation purposes.
Targets: provided by the PDB pre-release, i.e. 20+ per week.
Update of evaluation data: once a month (or maybe on a running-average basis).
The detail of the data displayed remains to be discussed (clearly, we could profit from the lessons learned by the Alamos pioneers: Fidelis et al., here!).
The particular programs used for the evaluation remain to be discussed. We hope that the decision on which software to use for evaluation would be supported by ‘the community’.
Anybody who feels that she or he has a better method for evaluation could make this method available to EVA and it will be used in addition from that moment on. (There may be some mechanism to decide after a year that some of the results could be left out from now on, since they do NOT add particularly to the overall evaluation procedure.)
Any developer will have the chance at any point to add (any) comment regarding her (or his) server’s output linked to the EVA results of that server.
There will be a public forum where users (and developers) can deposit their statements about any result (i.e. developers can talk about the results of others, here).
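The ‘running average’ option mentioned above for updating evaluation data could look like the following minimal sketch. The class name, the window size, and the idea of averaging weekly batch scores are all assumptions made for illustration; the text does not specify how EVA would compute it.

```python
from collections import deque


class RunningAverage:
    """Mean over the most recent `window` scores (hypothetical helper)."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # A bounded deque drops the oldest score automatically.
        self._scores = deque(maxlen=window)

    def add(self, score: float) -> float:
        """Record a new score and return the updated running average."""
        self._scores.append(score)
        return sum(self._scores) / len(self._scores)


# Example: a 4-week window over made-up weekly accuracy values.
weekly = RunningAverage(window=4)
for week, score in enumerate([72.0, 75.0, 71.0, 78.0, 80.0], start=1):
    print(f"week {week}: running average = {weekly.add(score):.2f}")
```

Compared with a monthly update, a running average changes a little with every new batch of targets, so a server's displayed score never jumps all at once.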
Scope:
In its first implementation, EVA should address the following fields of protein structure prediction:
Predictions of protein structure in 1D (secondary structure, solvent accessibility)
Predictions of protein structure in 2D (inter-residue distances)
Predictions of protein structure in 3D (homology modelling)
Predictions of protein structure in ‘3D-’, i.e. threading results that may imply 3D predictions but never go the entire way, and may be restricted to stating ‘A is similar to B’.
Predictions of novel folds.
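For the 1D category above, one widely used measure of secondary structure prediction quality is the three-state per-residue accuracy, often called Q3: the percentage of residues whose predicted state (helix H, strand E, or other C) matches the observed state. The sketch below is only an illustration of that measure; the text leaves open which programs and scores EVA will use.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state equals the observed state.

    Both arguments are strings over the alphabet H (helix), E (strand), C (other),
    one character per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Made-up 10-residue example: 8 of 10 states agree.
observed = "CHHHHEEECC"
predicted = "CHHHCEEEEC"
print(f"Q3 = {q3(predicted, observed):.1f}%")
```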
Soon after EVA is running, we hope to extend it to cover other aspects of protein structure, such as membrane regions, signal peptides, cleavage sites, and maybe structural/functional motifs (others to be added to the list).
Framework:
EVA could physically be located in one particular place, or could be shared between the various labs involved in the effort. Currently, we favour running it on one particular machine and installing various WWW mirrors for the output. It may be reasonable to request some financial help to bring it online. At this moment, we envision that EVA would be realised by a few people who are actually willing to spend resources on the issue. Apart from that, we hope that there will be a slightly larger group of people on the ‘board’ of the service, i.e. people who contribute their experience, their tools, their ideas, and their support.