Standard CS-Rosetta

This tutorial shows how to run the standard CS-Rosetta protocol from Ref. [1] with the setup tools from the CS-Rosetta toolbox. It explains how to set up a basic calculation from backbone chemical shift data only. To run calculations with additional restraints such as RDC, NOE, and PCS data, first learn the basic protocol in this tutorial and then see how to add restraints in Tutorial: Calculations with Restraints.

1. Preparation of Input Files

The inputs for this tutorial are the following files:

2jrm_trim.tab backbone chemical shifts in TALOS file format
2jrm_trim.fasta corresponding FASTA sequence for structure calculation
2jrm_trim.frags3.dat.gz fragment library with 3mer fragments
2jrm_trim.frags9.dat.gz fragment library with 9mer fragments

These files are obtained as shown in Tutorial: Fragment Picking with Chemical Shift Data. For convenience, these input files are also stored in the toolbox folder csrosetta3/tutorials/inputs. Either run the tutorials directly in the toolbox folder or link this folder into your current folder like this:

$ which setup_target   # to figure out where the csrosetta toolbox is installed on your system
/net/software/csrosetta3/com_alias/setup_target   # for instance this might be the answer 
                                                  # showing you that /net/software/csrosetta3 is the location of the toolbox
$ mkdir run_tutorials  #create a clean folder
$ cd run_tutorials       
$ ln -s /net/software/csrosetta3/tutorials/inputs
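
The toolbox location does not have to be read off by eye. The following is a minimal sketch, assuming the <toolbox>/com_alias/setup_target layout from the example output above. The function names toolbox_root and link_tutorial_inputs are hypothetical helpers, not part of the toolbox:

```shell
# toolbox_root PATH: derive the toolbox folder from the path of setup_target.
# Assumption: setup_target lives in <toolbox>/com_alias/, as in the example above.
toolbox_root() {
    # strip the file name, then the com_alias directory
    dirname "$(dirname "$1")"
}

# link_tutorial_inputs: link <toolbox>/tutorials/inputs into the current folder.
# Hypothetical wrapper around the manual steps shown above.
link_tutorial_inputs() {
    exe=$(command -v setup_target) || {
        echo "setup_target not found on PATH" >&2
        return 1
    }
    ln -s "$(toolbox_root "$exe")/tutorials/inputs" inputs
}
```

Calling link_tutorial_inputs inside a fresh run_tutorials folder should then be equivalent to the ln -s command above, provided the installation follows the assumed layout.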

With this setup you can either follow the steps below by hand or run the provided tutorial script:

$ /net/software/csrosetta3/tutorials/tutorial_abrelax.sh 

2. Automated Setup of CS-Rosetta calculation

Use the automated setup from the CS-Rosetta toolbox to combine all input data files into a Setup.

$ cd csrosetta3/tutorials/inputs
$ setup_target -target 2jrm_trim -method abrelax -frags 2jrm_trim.frags*dat.gz -fasta 2jrm_trim.fasta -cs 2jrm_trim.tab

If you want, you can also add the native PDB structure (with the correct trimming) to the Setup to allow on-the-fly computation of RMSDs (useful for benchmarks only).

$ renumber_pdb 2jrmA.pdb -fasta 2jrm_trim.fasta > 2jrm_trim.pdb
$ setup_target -target 2jrm_trim -method abrelax -native 2jrm_trim.pdb

Note that up to this point, calling setup_target has not produced any files in your working directory. Instead, the Setup is stored in a central location specified by the environment variable CS3_BENCH_TARGETLIB. This variable should be set if the CS-Rosetta toolbox is properly installed. The default location is $HOME/cs_targetlib. To check the setting, run

$ echo $CS3_BENCH_TARGETLIB
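
If the variable turns out to be empty, the lookup can be written so that it falls back to the documented default. This is only a sketch: targetlib_dir is a hypothetical helper, and whether the toolbox itself falls back to $HOME/cs_targetlib when the variable is unset is an assumption.

```shell
# targetlib_dir: print the Setup library location.
# Uses CS3_BENCH_TARGETLIB if it is set and non-empty; otherwise falls back
# to the default location named in this tutorial (assumed fallback behaviour).
targetlib_dir() {
    printf '%s\n' "${CS3_BENCH_TARGETLIB:-$HOME/cs_targetlib}"
}
```

For example, ls "$(targetlib_dir)" would list the stored Setups under that assumption.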

You should now have created a Setup with the appropriate input files. Let's check:

$ list_setup
---------------------------------------------------------------------------
---------             CS-Rosetta 3.0   (toolbox)                -----------
---------------------------------------------------------------------------
Setup( 2jrm_trim | abrelax_standard )
$ display_setup -method abrelax -target 2jrm_trim
---------------------------------------------------------------------------
---------             CS-Rosetta 3.0   (toolbox)                -----------
---------------------------------------------------------------------------
AbrelaxMethod:  abrelax
Setup( 2jrm_trim | abrelax_standard )
============================================================
LOADED: Method options from existing setup 'abrelax_standard'...
- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
Method: abrelax
Choosen Options: 
   cs nmr_data/2jrm_trim.tab
   fasta fragments/2jrm_trim.fasta
   frags fragments/2jrm_trim.frags3.dat.gz fragments/2jrm_trim.frags9.dat.gz
   native native/2jrm_trim.pdb
------------------------------------------------------------

which shows that all our input files are now loaded into the Setup and that we are ready to generate a Run. A Run is generated with the command setup_run. You can specify further options that are specific to a particular Run, such as the number of decoys or the length of the simulation. To list the available options, run

$ setup_run -method abrelax -h

To generate a CS-Rosetta calculation that produces 10k structures use the command

$ setup_run -method abrelax -target 2jrm_trim -dir run_abrelax -nstruct 10000 -job interactive -relax [ -cycle_factor 10 ]
....[ many lines of output omitted ] ... 
Method  abrelax  has been setup in run_abrelax/2jrm_trim ...
Enjoy!

If successful, you should see the last two lines of the output shown above. You can use the option -cycle_factor to shorten or lengthen individual trajectories. If the option is omitted, the default setting for abrelax is 10.

3. Running CS-Rosetta

$ cd run_abrelax/2jrm_trim/run
$ source production.interactive.job -n 12

would start ROSETTA using 12 processors. Twelve processors are few for generating the 10-50k structures that are often required, so it is advisable to find out which queuing system is running on your machine. By choosing -job slurm as an option of setup_run, for instance, you generate a job script for SLURM and can start your jobs on 512 cores like this

$ sbatch -n 512 production.slurm.job

If ROSETTA runs properly it should eventually start to store structures in a file called decoys.out.

$ grep SCORE decoys.out | grep -v score | wc -l

allows you to count the number of decoys produced so far.
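
For long runs it can be handy to wrap this pipeline and compare the count with the requested -nstruct. The sketch below reuses the exact grep pipeline from above; count_decoys and decoy_progress are hypothetical helper names, not toolbox commands.

```shell
# count_decoys FILE: number of decoys in a silent file.
# Same pipeline as above: keep SCORE lines, drop the header line(s) that
# contain the word "score", and count what is left.
count_decoys() {
    grep SCORE "$1" | grep -v score | wc -l | tr -d ' '
}

# decoy_progress FILE NSTRUCT: print "done/total (percent%)".
# NSTRUCT is the value passed to setup_run via -nstruct.
decoy_progress() {
    n=$(count_decoys "$1")
    echo "$n/$2 ($(( 100 * n / $2 ))%)"
}
```

With the Run from this tutorial, decoy_progress decoys.out 10000 would report how far the calculation has come.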

4. Processing Output

The decoys.out file contains all generated structures in silent file format. In the following we describe a number of typical commands to extract data from a silent file. See also Tutorial: (Re-)Scoring of Structures for more details.

$ extract_scores decoys.out score rms chem_shifts > scores.txt #yields scores as plain numbers in 3 columns
$ extract_decoys decoys.out -score 10 > low_10.out #yields the 10 structures that are lowest by ROSETTA energy
$ pack_pdbs -silent low_10.out > final.pdb #yields a multi-model PDB file
$ cat final.pdb | unpack_pdbs -remark ROSETTA-TAG #yields individual PDBs with Rosetta tags as PDB file names
$ cat final.pdb | unpack_pdbs #yields individual PDBs renamed: model_01.pdb, model_02.pdb, ...
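
Once scores.txt exists, ordinary text tools are enough for a quick look at the results. The following sketch assumes that scores.txt holds whitespace-separated columns in the order requested on the command line (score, rms, chem_shifts) and may or may not start with a header line; lowest_score_row is a hypothetical helper.

```shell
# lowest_score_row FILE: print the row with the lowest value in column 1.
# Assumption: column 1 is the ROSETTA score. Rows whose first field is not
# a plain number (e.g. a header line) are skipped.
lowest_score_row() {
    awk '$1 ~ /^-?[0-9.]+$/ {
             if (!seen || $1 + 0 < best) { best = $1 + 0; row = $0; seen = 1 }
         }
         END { if (seen) print row }' "$1"
}
```

Running lowest_score_row scores.txt then shows the score, RMSD, and chemical shift score of the lowest-energy decoy on one line, under the column-order assumption stated above.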

5. Tutorial Script

The tutorial is provided as a script in csrosetta3/tutorials/tutorial_abrelax.sh. Just type

$ tutorial_abrelax.sh

and all steps above will run automatically. (To speed things up, the script sets -cycle_factor to 0.01 and -nstruct to 10.)


References

sampling protocol:

application:

Comments

Dear Oliver,

Is there a way to specify the pH value of a simulation? Or to set the side chains with correct charges?

Best,

Yisong

Hi Yisong,

There is no direct way to set the pH value.
You would have to change the Residuetype variants used for various pH-sensitive residues.

Best,
Oliver

Hi Oliver,

Thank you for your reply!
I found the residue type params files in the rosetta_source directory. But I still have two questions:
1. How do I change the residuetype variants?
2. I found the params files (for protonation states) are only for "fullatom" models; should I use the default residuetype in "centroid" stage then change residuetype for "fullatom" "relax" stage or should I make a new params file for "centroid" model and change residuetype in the beginning? If I should change residuetype in the beginning, should I also adjust somehow when picking fragments?

Best,

Yisong

Dear Oliver,

I am also very interested in the option of specifying pH values as well as temperatures in the MD, e.g. to look at protein folding and unfolding.

Thanks and regards,
Ann.

Hi Akwan,

CS-Rosetta is mainly for structure calculation, not to model a physical process like protein folding in detail. If you have data obtained at different temperatures and this data indeed reflects structural changes you should obtain different structures.

Best,
Oliver

Hi Oliver,

Thanks for your reply. Yes, we have measured the chemical shifts at high temperatures (45 C) and low pH (2.5). I am wondering if I should be changing the MD parameters and/or protonation states of the protein to get representative structures.

In particular, I am quite interested in doing something along the lines of the following paper by Kilambi and Gray (2012)
http://graylab.jhu.edu/publications/2012BiophysJ_pKa.pdf

Regards,
Ann.

Hi Ann,

The effects of subtleties like high temperatures and low pH on the structure calculation will be small if you have sufficient data that defines the structure. I would definitely try the simplest thing first and take the CS-Rosetta protocol as is.
Further refinement is of course always possible. You can change the score function used, as described in the manual here and here for the basic abinitio protocol or here for RASREC.

Best,
Oliver

Hi Oliver,

Thanks for your insights. We tried the standard CS-Rosetta protocol but the structures didn't converge very well. However, I think it was most probably due to the 8 cysteines in our protein which form 4 disulphide bonds. The Cb and Ca shifts are substantially different for oxidised vs reduced Cys (e.g. coil Cb for oxCys is ~41 ppm but only ~27 ppm for redCys).

Would it be possible for the fragment search to consider the presence of oxidised Cys, or for users to define them, and use the appropriate secondary shifts? We already included the disulphide restraints by specifying the disulphide bond connectivities using the "-in:fix_disulf" option but I don't think this helped the fragment search.

If we used the chemical shifts as they are, then the fragment search terminates with an error (no fragments can be found for most fragments containing any of the oxidised Cys). We have a partial workaround: subtracting from our Cys chemical shifts the difference between the reported oxidised Cys shifts and the reduced Cys shifts as found in the BMRB. However, the distribution of secondary shifts for oxCys and redCys in different types of secondary structures can be different. We'd appreciate your advice about how best to handle this.

Regards,
Ann.

Dear Ann,

It is true that the Cbeta chemical shifts of oxidized and reduced Cys are quite distinct and can be used to (nearly) unambiguously identify the oxidation state. We have handled this by assigning a different name, "c", to oxi-CYS ("C" for red-CYS) in the chemical shift input. The programs will then calculate the secondary chemical shift for Cys based on its oxidation state.

For your info, the random coil chemical shift table for CYS used by our chemical shift methods is listed below:

VARS RESCODE RESNAME HA CA CB C N HN
FORMAT %4s %s %8.3f %8.3f %8.3f %8.3f %8.3f %8.3f

CYS C 4.550 56.900 28.900 174.600 118.800 8.230
cys c 4.710 55.400 43.700 174.600 118.600 8.230

Best,
Oliver
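
To see what the two table rows mean in practice, here is a small sketch that computes a Cys CB secondary shift as observed minus random-coil value, using the CB entries from the table above (28.900 ppm for "C", 43.700 ppm for "c"). The helper name cys_cb_secondary is hypothetical, and the sketch only illustrates the arithmetic, not how the CS-Rosetta programs do it internally.

```shell
# cys_cb_secondary OBSERVED_CB STATE: secondary CB shift in ppm.
# STATE is "C" (reduced Cys) or "c" (oxidized Cys), as in the table above.
cys_cb_secondary() {
    case "$2" in
        C) rc=28.900 ;;   # random-coil CB, reduced Cys
        c) rc=43.700 ;;   # random-coil CB, oxidized Cys
        *) echo "unknown Cys state: $2" >&2; return 1 ;;
    esac
    awk -v obs="$1" -v rc="$rc" 'BEGIN { printf "%.3f\n", obs - rc }'
}
```

For an observed CB of 41.0 ppm, the oxidized reference gives a secondary shift of -2.700 ppm, whereas the reduced reference gives 12.100 ppm, which suggests why labelling an oxidized Cys as "C" can mislead the fragment search.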

Dear Oliver,

Sorry for the slow response but that's great to know :) Do I need to use little "c" in the sequence header of the .tab file or only for the chemical shift entries or both?

We are having some Linux issues atm so can't run my local copy of cs-rosetta. I submitted the updated .tab to http://condor.bmrb.wisc.edu/bbee/rosetta/index.php but it doesn't seem to be giving me any output if I use little "c".

Thanks and regards,
Ann.

Use little 'c' in the tab files at both places.
When you run the simulation, however, you have to use 'C' for all Cysteines. This is probably why the BMRB server doesn't respond.

Sorry for the awkward requirements. We will try to streamline this in future versions.
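
Until then, switching a sequence back to upper-case 'C' for the simulation can be scripted. This is a sketch under the assumption that the sequence used for the simulation is kept in a FASTA file; cys_to_upper is a hypothetical helper and not part of the toolbox.

```shell
# cys_to_upper: read FASTA on stdin, write it to stdout with every 'c'
# replaced by 'C' in sequence lines only. Header lines (starting with '>')
# are passed through unchanged.
cys_to_upper() {
    awk '/^>/ { print; next } { gsub(/c/, "C"); print }'
}
```

For example, cys_to_upper < with_oxidized_cys.fasta > for_simulation.fasta (both file names are placeholders) leaves the header untouched and converts only the residue codes.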