CS-Rosetta

This chapter describes the basic CS-Rosetta protocol as published in Ref. [1]. In the literature, variants of this protocol are often labeled CS-PCS-Rosetta, CS-RDC-Rosetta, etc. These names refer to combinations of the basic CS-Rosetta protocol with various types of experimental data[2][3]. The labeling makes sense in the context of the respective publications but can be misleading in practice, because including experimental data does not change the way the software has to be used. This Manual therefore distinguishes protocols by how conformational space is explored, not by whether experimental data are used or which kind. Accordingly, the CS-Rosetta protocol is characterized as follows:

  • the fragment library is picked using chemical shift data
  • the structure calculation uses the abrelax sampling protocol
  • many thousands of independent trajectories are started from an extended conformation

Experimental restraints can be added to this protocol in a straightforward manner[2][3][4].

Preparation of Input Files

The input for the structure calculation consists of the protein sequence (FASTA format) and a suitable fragment library. The fragment library is obtained from the chemical shift data with the provided fragment picker. Unstructured regions can delay or hinder convergence, so it is advisable to trim the target sequence to exclude flexible tails. Unless better information is available, treat as flexible all residues for which Random Coil Index (RCI) analysis of the backbone chemical shifts yields a low order parameter (S2 < 0.7). Remove the flexible tails by preparing truncated input files (chemical shifts and FASTA sequence) before fragment picking is performed. Suitable trimming scripts are provided in the Toolbox, and the corresponding tutorial walks through the entire preparation process.
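The trimming rule can be illustrated with a small Python sketch. It is not the Toolbox trimming script: it assumes you already have one RCI-derived S2 value per residue, aligned with the sequence, and it removes only the N- and C-terminal stretches below the 0.7 cutoff while keeping internal low-S2 loops. The helper names rigid_core and trim_fasta are hypothetical, and the chemical shift table still has to be truncated consistently, which this sketch does not do.

```python
from typing import Sequence, Tuple

# Hypothetical helpers; residues with S2 below this cutoff count as flexible.
S2_CUTOFF = 0.7


def rigid_core(s2: Sequence[float], cutoff: float = S2_CUTOFF) -> Tuple[int, int]:
    """Return the 1-based (first, last) residue between the flexible tails."""
    rigid = [i for i, value in enumerate(s2) if value >= cutoff]
    if not rigid:
        raise ValueError("no residue reaches the S2 cutoff")
    # Only the termini are cut; low-S2 residues inside the core are kept.
    return rigid[0] + 1, rigid[-1] + 1


def trim_fasta(name: str, sequence: str, s2: Sequence[float]) -> str:
    """Return a FASTA record that contains only the trimmed sequence."""
    if len(sequence) != len(s2):
        raise ValueError("sequence and S2 list must have the same length")
    first, last = rigid_core(s2)
    trimmed = sequence[first - 1:last]
    return f">{name} residues {first}-{last}\n{trimmed}\n"
```

Reporting the retained residue range in the FASTA header makes it easier to renumber the chemical shift file to match.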

Running CS-Rosetta

Running CS-Rosetta requires a compiled installation of Rosetta 3.x.
Use the Automated Setup applications to create a Setup for your target protein and then generate all necessary run scripts from it.
The basic command to generate the Setup is:

setup_target.py -method abrelax \
  -frags frags.score.3mers.gz frags.score.9mers.gz \
  -fasta 1ubi.fasta \
  -cs cs.tab \
  -target_dir ./setup_files

This command stores the setup files in the local folder ./setup_files. To generate the actual Run from the Setup, type:

setup_run.py -method abrelax \
  -target_dir ./setup_files \
  -dir ./rosetta \
  -job interactive

which will generate the folder ./rosetta. To start the simulation, type cd ./rosetta; source production.interactive.job. This command, however, starts only a single process, which is far too little for a serious structure calculation; in practice the simulation should run on multiple processors. To start it on 20 processors, type source production.interactive.job -n 20, which works if an MPI (Message Passing Interface) implementation is installed on your computer.

Even better (and usually mandatory in computing centers) is to use a proper queuing system to spawn the Rosetta calculations. To aid with this, a number of generic job scripts are provided, which can be selected via the -job option of setup_run.py. For SLURM, for instance, pass -job slurm to setup_run.py and then type sbatch -n 200 production.slurm.job to start a job with 200 processes. Note, however, that local installations of queuing systems can differ widely, and users have to find out how jobs are run on their local system. For more information see Automated Setup, Benchmarking and Queuing Systems.

Note: Please feel free to contact us or leave comments on this page when you encounter other queuing systems. We are happy to assist and also aim to collect generic job scripts that cover the most commonly encountered systems.
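When many targets have to be prepared, the commands above can be assembled from a script. The Python sketch below uses only the flags shown in this chapter. The helpers setup_commands and launch_command are hypothetical and only build argument lists and shell lines; nothing is executed until you pass them on yourself.

```python
from typing import List


def setup_commands(fasta: str, cs: str, frags3: str, frags9: str,
                   setup_dir: str, run_dir: str, job: str) -> List[List[str]]:
    """Return argv lists for setup_target.py followed by setup_run.py."""
    return [
        # Step 1: create the Setup (same flags as the command shown above).
        ["setup_target.py", "-method", "abrelax",
         "-frags", frags3, frags9,
         "-fasta", fasta,
         "-cs", cs,
         "-target_dir", setup_dir],
        # Step 2: generate the Run for the chosen job type (interactive, slurm, ...).
        ["setup_run.py", "-method", "abrelax",
         "-target_dir", setup_dir,
         "-dir", run_dir,
         "-job", job],
    ]


def launch_command(job: str, nprocs: int) -> str:
    """Return the shell line that starts the run from inside the run directory."""
    if job == "interactive":
        # 'source' is a shell builtin, so this line must be run by a shell such as bash.
        return f"source production.interactive.job -n {nprocs}"
    if job == "slurm":
        return f"sbatch -n {nprocs} production.slurm.job"
    raise ValueError(f"no launch line known for job type {job!r}")
```

Each argv list can be passed to subprocess.run(argv, check=True). The launch line has to be run by a shell inside the run directory, for example subprocess.run(["bash", "-c", line], cwd=run_dir, check=True). Other queuing systems would need their own branch in launch_command.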

Restarting CS-Rosetta calculations

If a CS-Rosetta calculation using MPI is stopped prematurely, it can be restarted without any further changes. Decoys already stored in the output file are kept and counted toward the overall goal (set with -nstruct).
Note, however, that progress in individual unfinished trajectories will be lost. For small structures (time columns in the silent file decoys.out:

silent_data.py decoys.out time
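To check how far a restarted run has progressed, you can count the decoys already in decoys.out against the -nstruct goal. The Python sketch below does this and also averages the time column. It assumes the usual silent-file layout: a SCORE: header line ending in description, then one SCORE: line per decoy whose last field is the decoy tag. The helpers read_score_lines and progress are hypothetical and not part of the CS-Rosetta toolbox, and the unit of the time column is not specified here.

```python
from statistics import mean
from typing import Dict, Iterable, List


def read_score_lines(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return one {column: value} dict per decoy SCORE line (assumed layout)."""
    header: List[str] = []
    records: List[Dict[str, str]] = []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue  # skip SEQUENCE, REMARK and coordinate lines
        fields = line.split()[1:]  # drop the 'SCORE:' token
        if fields and fields[-1] == "description":
            header = fields  # (re)define the column names
        elif header and len(fields) == len(header):
            records.append(dict(zip(header, fields)))
    return records


def progress(lines: Iterable[str], nstruct: int) -> Dict[str, float]:
    """Count finished decoys, decoys still missing, and the mean 'time' value."""
    records = read_score_lines(lines)
    tags = {r["description"] for r in records}  # unique decoy tags
    times = [float(r["time"]) for r in records if "time" in r]
    return {
        "finished": len(tags),
        "remaining": max(nstruct - len(tags), 0),
        "mean_time": mean(times) if times else float("nan"),
    }

# Usage (hypothetical): with open("decoys.out") as fh: print(progress(fh, nstruct=10000))
```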

Post-Processing of CS-Rosetta structure calculations

The structures calculated by Rosetta are stored in silent file format in the file decoys.out, together with their scores and chemical shift scores. Each structure is identified by a unique tag, so structures selected by their scores can be retrieved individually. Low-energy structures can be extracted using extract_decoys and converted to PDB format using extract_pdbs. Rescoring with score_jd2 can be used to calculate scores against further restraint data for cross-validation, or to perform RMSD calculations against specific reference structures. The application r_rmsf can be used to superimpose structures, detect converged subregions of the structure, and perform simple RMSD clustering.
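Because every decoy carries a unique tag, selecting low-energy structures amounts to sorting the SCORE lines by a score column and keeping the tags. The Python sketch below does this. It assumes the usual silent-file layout (a SCORE: header ending in description, one SCORE: line per decoy with the tag last) and that the total energy is in a column named score; other columns, such as a chemical shift score, can be chosen via the column parameter, whose exact names depend on your run. The helper lowest_scoring is hypothetical. The resulting tags would then be passed to extract_decoys or extract_pdbs, and this sketch does not reproduce what those tools do.

```python
from typing import Iterable, List, Tuple


def lowest_scoring(lines: Iterable[str], n: int,
                   column: str = "score") -> List[Tuple[str, float]]:
    """Return the n (tag, value) pairs with the smallest value in `column`."""
    header: List[str] = []
    scored: List[Tuple[str, float]] = []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]  # drop the 'SCORE:' token
        if fields and fields[-1] == "description":
            header = fields  # column names of the following SCORE lines
            continue
        if header and len(fields) == len(header):
            row = dict(zip(header, fields))
            scored.append((row["description"], float(row[column])))
    scored.sort(key=lambda item: item[1])  # lowest (best) value first
    return scored[:n]
```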

For more information see the Tutorials and the Application Reference.


References

sampling protocol: