MODELLER基础教程-单模板的同源模建

数据挖掘 · 发表于 2013-9-22 17:21:48

马上注册，结交更多好友，下载更多分子模拟资源。

您需要登录才可以下载或查看，没有帐号？我想注册

x

MODELLER

背景：

我们只有1个蛋白的序列，没有其结构。

寻找最好的pdb作为模板进行模建

Step1

根据它的序列搜索和它序列相似的有结构的蛋白。

首先构建modeller的序列文件TvLDH.ali。

>P1;TvLDH

sequence:TvLDH:::::::0.00:0.00

MSEAAHVLITGAAGQIGYILSHWIASGELYGDRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGFVATTDPKA

AFKDIDCAFLVASMPLKPGQVRADLISSNSVIFKNTGEYLSKWAKPSVKVLVIGNPDNTNCEIAMLHAKNLKPEN

FSSLSMLDQNRAYYEVASKLGVDVKDVHDIIVWGNHGESMVADLTQATFTKEGKTQKVVDVLDHDYVFDTFFKKI

GHRAWDILEHRGFTSAASPTKAAIQHMKAWLFGTAPGEVLSMGIPVPEGNPYGIKPGVVFSFPCNVDKEGKIHVV

EGFKVNDWLREKLDFTEKDLFHEKEIALNHLAQGG*

然后借助于脚本searchSTR.py来寻找模板

frommodeller import *

log.verbose()

env =environ()

#--Prepare the input files

#--Read in the sequence database

sdb =sequence_db(env)

sdb.read(seq_database_file='pdb_95.pir',seq_database_format='PIR',

chains_list='ALL',minmax_db_seq_len=(30, 4000), clean_sequences=True)

#--Write the sequence database in binary form

sdb.write(seq_database_file='pdb_95.bin',seq_database_format='BINARY',

chains_list='ALL')

#--Now, read in the binary database

sdb.read(seq_database_file='pdb_95.bin',seq_database_format='BINARY',

chains_list='ALL')

#--Read in the target sequence/alignment

aln =alignment(env)

aln.append(file='TvLDH.ali',alignment_format='PIR', align_codes='ALL')

#--Convert the input sequence/alignment into

# profile format

prf =aln.to_profile()

#--Scan sequence database to pick up homologous sequences

prf.build(sdb,matrix_offset=-450, rr_file='${LIB}/blosum62.sim.mat',

gap_penalties_1d=(-500, -50),n_prof_iterations=1,

check_profile=False,max_aln_evalue=0.01)

#--Write out the profile in text format

prf.write(file='build_profile.prf',profile_format='TEXT')

#--Convert the profile back to alignment format

aln =prf.to_alignment()

#--Write out the alignment file

aln.write(file='build_profile.ali', alignment_format='PIR')

输入文件：pdb_95.pir #这是一个结构数据库的序列文件，查询的序列文件TvLDH.ali;

利用到的氨基酸相似度矩阵：blosum62.sim.mat。

输出文件：build_profile.ali,build_profile.prf

整个脚本的思路：读结构序列的数据库文件，然后将其转成机器码；

读要查询的序列文件（ali）,将其转成Prf格式。

设置查询条件（1氨基酸相似度的矩阵blosum62.sim.mat 2）

‘’’

The profile.build()command has many options. In this example rr_file is set to use the BLOSUM62similarity matrix (file "blosum62.sim.mat" provided in the MODELLERdistribution). Accordingly, the parameters matrix_offset and gap_penalties_1d areset to the appropriate values for the BLOSUM62 matrix. For this example, wewill run only one search iteration by setting the parameter n_prof_iterationsequal to 1. Thus, there is no need for checking the profile for deviation (check_profile set toFalse). Finally, the parameter max_aln_evalueis set to 0.01, indicating that only sequences with e-values smaller than orequal to 0.01 will be included in the final profile.

‘’’’

注意：ali即为PIR格式，序列比对格式

Prf序列搜索格式

直接用pythonsearchSTR.py就可以运行脚本，不推荐使用python2.3库的mod9.12解释器，因为这需要另外安装python2.3的库。

Step2挑选合适的模板

首先分析build_profile.prf文件，

第11列（倒数第二列）是序列的相似度，越大越好；最后一列是显著性，越小越好。

第二列是对应的pdbcode（这时一组pdb的代表）。

根据显著性来挑选4-10个pdbcode。

这里我们首先挑选了显著性为0的6个pdb出来分析（显著性的指标更加好）。

基于显著性进行排序。

得到(1bdm:A, 5mdh:A, 1b8p:A,1civ:A, 7mdh:A, and 1smk:A).这6个结果。

再从这6个结构当中选择最合适的作为模板（6个结构之间结构相似度序列相似度）。

借助于脚本compare.py

from modellerimport *

env = environ()

aln =alignment(env)

for (pdb, chain)in (('1b8p', 'A'), ('1bdm', 'A'), ('1civ', 'A'),

('5mdh', 'A'), ('7mdh','A'), ('1smk', 'A')):

m = model(env, file=pdb,model_segment=('FIRST:'+chain, 'LAST:'+chain))

aln.append_model(m, atom_files=pdb,align_codes=pdb+chain)

aln.malign()

aln.malign3d()

aln.compare_structures()

aln.id_table(matrix_file='family.mat')

env.dendrogram(matrix_file='family.mat',cluster_cut=-1.0)

在运行这个脚本之前，需要把相关的蛋白下载到你的工作目录中，我利用pymol中的fetch进行下载。

这个脚本不能自动产生log文件，所以需要你主动保存为log文件

pythoncompare.py >compare.log

找相近的，分组找代表的。

?? 不知道这里的（compare.log）39 50.5 55.37是什么意思？？不清楚，

知道的人请和我站内联系，感觉好像是越小越好。

一共聚了3类，第一类有3个第二类有2个，第3类有1；所以第一类是优先的，然后在

从分辨率上考虑的化选1bdm。这里只挑选了1个模板，所以是基于单模板的同源模建。

step3 比对

在进行基于模板的同源模建之前需要进行序列比对。

把模板的序列和要模建的序列进行比对。

利用脚本 align2d.py

from modellerimport *

env = environ()

aln =alignment(env)

mdl = model(env,file='1bdm', model_segment=('FIRST:A','LAST:A'))

aln.append_model(mdl,align_codes='1bdmA', atom_files='1bdm.pdb')

aln.append(file='TvLDH.ali',align_codes='TvLDH')

aln.align2d()

aln.write(file='TvLDH-1bdmA.ali',alignment_format='PIR')

aln.write(file='TvLDH-1bdmA.pap',alignment_format='PAP')

Step4

同源模建，根据得到的比对，进行模建，产生不同的模型；

Generatemodel.py

from modellerimport *

frommodeller.automodel import *

env = environ()

a = automodel(env,alnfile='TvLDH-1bdmA.ali',

knowns='1bdmA', sequence='TvLDH',

assess_methods=(assess.DOPE,assess.GA341))

a.starting_model =1

a.ending_model = 5

a.make()

这里是产生5个模建的结果。

注意这里也不会自动产生log文件，需要你主动生成

python generatemodel.py >genemodes.log

Step5 对产生的模型进行评价，挑选最好的模型

方法1：根据打分进行挑选

DOPE打分越低越好，负的越大越好

GA341打分越高越好

GA341 is not as good asDOPE at distinguishing 'good' models from 'bad' models

molPDF是默认打分函数，越小越好

Yun He wrote:

Currently, I amworking on a model based on a very low sequence-identity template (about 13%).I use an alignment to produce 30 models, and then I sort these models by theirmolpdf or GA341 or DOPE. However, the sorts turn out very different results,some models with higher GA341 scores takes higher DOPE scores. I am confused,which one should I choose?

Nomodel assessment score is perfect, so you should not expect them to beperfectly correlated.

A'good' model will have a GA341 score near 1.0 (larger is better) but a smallDOPE score (DOPE scores, like molpdf, are "energies", so smaller ornegative is better).

Neither DOPE nor GA341 show a lot ofdiscrimination for poor models - e.g. you can't really say that a model with aGA341 score of 0.1 is 'worse' than one with 0.2 (the score was developed totell between good and bad models, not to rank bad models). Anything worse than0.6 or so is a bad model.

对挑选的模板再进行评价；

看看那些区域是合理的，那些区域是不合理的。

借助于脚本evaluate_mode.py

from modellerimport *

frommodeller.scripts import complete_pdb

log.verbose() # request verbose output

env = environ()

env.libs.topology.read(file='$(LIB)/top_heav.lib')# read topology

env.libs.parameters.read(file='$(LIB)/par.lib')# read parameters

# read model file

mdl =complete_pdb(env, 'TvLDH.B99990002.pdb')

# Assess withDOPE:

s =selection(mdl) # all atom selection

s.assess_dope(output='ENERGY_PROFILENO_REPORT', file='TvLDH.profile',

normalize_profile=True,smoothing_window=15)

这里的能量是经过平滑处理的15个氨基酸作为一个窗口。

最后再画氨基酸的能量折线图。

仅仅是参考。

更高级的评价，后面讨论

如有问题，请与我站内联系！

参考

1 http://salilab.org/modeller/tutorial/basic.html

2 http://www.urlshare.cn/cgi-bin/qzshare/cgi_qzshare_urlcheck?appid=2&rappid=2&url=https%3A%2F%2Fsites.google.com%2Fsite%2Fiiserbioinformatics%2Fassignment%2Fbio455-worksheet

(国外网站要用代理看)

woniuzhixing · 发表于 2014-11-20 19:53:07

SEQ_DATABASE_FILE       : pdb.bin
SEQ_DATABASE_FORMAT    : BINARY
CHAINS_LIST             : ALL
CLEAN_SEQUENCES          :          T
MINMAX_DB_SEQ_LEN       :          0    999999
Number of sequences    :          19
Number of residues       :       5090
Length of longest sequence:       695

openf___224_> Open          1MUH.ali
read_al_373E> Protein specified in ALIGN_CODES(i) was not found
            in the alignment file; ALIGN_CODES(    1) =  ALL

Traceback (most recent call last):
  File "C:\basic-example\shyan\build_profile.py", line 23, in <module>
aln.append(file='1MUH.ali', alignment_format='PIR', align_codes='ALL')
  File "C:\Modeller9.14\modlib\modeller\alignment.py", line 79, in append
allow_alternates)
ModellerError: read_al_373E> Protein specified in ALIGN_CODES(i) was not found in the alignment file; ALIGN_CODES(    1) =  ALL
>>>
换了一个未知序列做，上面错在哪里？还有那个搜索的库.pir怎么生成的，从哪里获得？

puzhongji · 发表于 2014-11-19 15:04:38

39 50.5 55.37，感觉这些数字像是进化树上的那些距离，就是离节点的距离。这个问题之前问过群里的药物-design，他在科学网有modeller的教程。还有就是不明白做compare.py的意义。最后选择模板的标准还是分辨率和序列一致度，那么我们为什么要做这一步呢？

MerryKing · 发表于 2017-12-20 09:30:50

你好，想请教一下，关于modeller同源建模时，.ali文件中怎么加入核酸（有缺失）、配体、离子呢？相对应的.py文件该怎么设置呢？谢谢大神~

dejunchem · 发表于 2013-9-23 20:33:04

哈哈，必须顶~~~~

laoman · 发表于 2014-3-6 21:18:51

要开始学modeller了，这篇帖子应该很有用~

zilvbu · 发表于 2014-5-9 15:59:15

写得很详细，学习了

puzhongji · 发表于 2014-11-19 14:48:23

我记得step2以前能保存compare.log

puzhongji · 发表于 2014-11-19 14:50:16

compare.py>compare.log生成log文件

puzhongji · 发表于 2014-11-19 15:05:29

弄错了，原来你就是药物-design

火麒麟90 · 发表于 2015-5-7 23:42:14

谢谢楼主

帐号		自动登录	找回密码
密码			我想注册

MODELLER基础教程-单模板的同源模建

马上注册，结交更多好友，下载更多分子模拟资源。

评分

相关帖子