RASREC CS-Rosetta

This tutorial shows how to run RASREC CS-Rosetta calculations as published in References [1], [2] and [3], but without any restraints other than backbone chemical shift data. To run calculations with additional restraints such as RDC, NOE and PCS data, first learn the basic protocol in this tutorial and then learn how to add restraints in Tutorial: Calculations with Restraints. Thanks to the setup tools, this tutorial is very similar to Tutorial: Standard CS-Rosetta. There are two main differences: the final decoys are now found in a subfolder called fullatom_pool (see Section 4), and instead of the keyword abrelax you now choose rasrec with the -method option of the setup tools.

1. Preparation of Input Files

The inputs for this tutorial are the following files:

2jrm_trim.tab              backbone chemical shifts in TALOS file format
2jrm_trim.fasta            corresponding FASTA sequence for structure calculation
2jrm_trim.frags3.dat.gz    fragment library with 3mer fragments
2jrm_trim.frags9.dat.gz    fragment library with 9mer fragments

These files are obtained as shown in Tutorial: Fragment Picking with Chemical Shift Data. For convenience, these input files are also stored in the toolbox folder csrosetta3/tutorials/inputs. Either run the tutorials directly in the toolbox folder, or link this folder into your current folder like this:

$ which setup_target   # to figure out where the csrosetta toolbox is installed on your system
/net/software/csrosetta3/com_alias/setup_target   # for instance this might be the answer 
                                                  # showing you that /net/software/csrosetta3 is the location of the toolbox
$ mkdir run_tutorials    # create a clean folder
$ cd run_tutorials       
$ ln -s /net/software/csrosetta3/tutorials/inputs
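If you want to script the steps above, the toolbox root can be derived from the location of setup_target instead of being read off by eye. This is a minimal sketch, assuming (as in the example output) that setup_target is an executable on your PATH located in <toolbox>/com_alias/. The function name cs3_toolbox_root is hypothetical and not part of the toolbox:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CS-Rosetta toolbox root, assuming that
# setup_target is an executable on PATH living in <toolbox>/com_alias/.
cs3_toolbox_root() {
  local exe
  # command -v prints the full path of setup_target if it is found on PATH
  exe=$(command -v setup_target) || {
    echo "setup_target not found on PATH" >&2
    return 1
  }
  # strip the file name, then the com_alias directory
  dirname "$(dirname "$exe")"
}

# Usage (same effect as the manual steps above):
#   mkdir run_tutorials && cd run_tutorials
#   ln -s "$(cs3_toolbox_root)/tutorials/inputs"
```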

Now you can either work through this tutorial step by step, or run the provided tutorial script:

$ /net/software/csrosetta3/tutorials/tutorial_rasrec.sh 

2. Automated Setup of RASREC CS-Rosetta calculation

2.1 Generate a Setup from Input Files

Use the automated setup from the CS-Rosetta toolbox to combine all input data files into a Setup.

$ cd csrosetta3/tutorials/inputs
$ setup_target -target 2jrm_trim -method rasrec -frags 2jrm_trim.frags*dat.gz -fasta 2jrm_trim.fasta -cs 2jrm_trim.tab

If you want, you can also add the native PDB structure (with the correct trimming) to the Setup to allow on-the-fly computation of RMSDs (useful for benchmarks only).

$ renumber_pdb 2jrmA.pdb -fasta 2jrm_trim.fasta > 2jrm_trim.pdb
$ setup_target -target 2jrm_trim -method rasrec -native 2jrm_trim.pdb
************************************************************
Target          2jrm_trim   ---  Path: run_tutorials/tutorial_targets/2jrm_trim_2
------------------------------------------------------------
************************************************************
Setup( 2jrm_trim_2 | rasrec_standard )
Setup( 2jrm_trim_2 | rasrec_standard )
============================================================
STORED: method options as new setup 'rasrec_standard'...
- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
Method: rasrec
Choosen Options: 
   cs nmr_data/2jrm_trim.tab
   fasta fragments/2jrm_trim.fasta
   frags fragments/2jrm_trim.frags3.dat.gz fragments/2jrm_trim.frags9.dat.gz
------------------------------------------------------------

Note that, up to this point, calling setup_target has not produced any files in your working directory. Instead, the Setup is stored in a central location specified by the environment variable CS3_BENCH_TARGETLIB. This variable should be set if the CS-Rosetta toolbox is properly installed; the default location is $HOME/cs_targetlib. To check the setting, run

$ echo $CS3_BENCH_TARGETLIB
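A small helper that prints the effective Setup location can make this check scriptable. It is a sketch that assumes the toolbox falls back to the documented default $HOME/cs_targetlib when CS3_BENCH_TARGETLIB is unset; the function name cs3_targetlib is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print where Setups are stored. Uses $CS3_BENCH_TARGETLIB
# if it is set, otherwise the documented default $HOME/cs_targetlib
# (assumed fallback behaviour of the toolbox).
cs3_targetlib() {
  if [ -z "${CS3_BENCH_TARGETLIB:-}" ]; then
    # an unset variable usually hints at an incomplete installation
    echo "warning: CS3_BENCH_TARGETLIB is not set; is the toolbox installed?" >&2
    echo "$HOME/cs_targetlib"
  else
    echo "$CS3_BENCH_TARGETLIB"
  fi
}

# Usage: ls "$(cs3_targetlib)"
```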

You should now have created a Setup with the appropriate input files. Let's check:

$ list_setup
---------------------------------------------------------------------------
---------             CS-Rosetta 3.0   (toolbox)                -----------
---------------------------------------------------------------------------
Setup( 2jrm_trim | abrelax_standard )
Setup( 2jrm_trim | rasrec_standard )

2.2 Transfer a Setup from a Previously Generated Setup

Note that Setup( 2jrm_trim | abrelax_standard ) will only appear in the output of list_setup if you have also run the previous Tutorial: Standard CS-Rosetta.
In this case, you can also create a Setup for method rasrec from the previously generated abrelax Setup.

$ setup_target -target 2jrm_trim -method rasrec -transfer_method abrelax
============================================================
LOADED: Method options from existing setup 'abrelax_standard'...
- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
Method: rasrec
Choosen Options: 
   cs nmr_data/2jrm_trim.tab
   fasta fragments/2jrm_trim.fasta
   frags fragments/2jrm_trim.frags3.dat.gz fragments/2jrm_trim.frags9.dat.gz
   native native/2jrm_trim.pdb
------------------------------------------------------------

Setup( 2jrm_trim | rasrec_standard )

============================================================
STORED: method options as new setup 'rasrec_standard'...
- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
Method: rasrec
Choosen Options: 
   cs nmr_data/2jrm_trim.tab
   fasta fragments/2jrm_trim.fasta
   frags fragments/2jrm_trim.frags3.dat.gz fragments/2jrm_trim.frags9.dat.gz
   native native/2jrm_trim.pdb
------------------------------------------------------------

The output of setup_target shows exactly which files are loaded in your Setup, but you can also check this later with display_setup:

$ display_setup -method rasrec -target 2jrm_trim
---------------------------------------------------------------------------
---------             CS-Rosetta 3.0   (toolbox)                -----------
---------------------------------------------------------------------------
Method:  rasrec
Setup( 2jrm_trim | rasrec_standard )
============================================================
LOADED: Method options from existing setup 'rasrec_standard'...
- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
Method: abrelax
Choosen Options: 
   cs nmr_data/2jrm_trim.tab
   fasta fragments/2jrm_trim.fasta
   frags fragments/2jrm_trim.frags3.dat.gz fragments/2jrm_trim.frags9.dat.gz
   native native/2jrm_trim.pdb
------------------------------------------------------------

2.3 Generating a RASREC Run Directory from a Setup

A Run is generated with the command setup_run. You need to know which queuing system you are working with; in the following, I assume SLURM, which uses sbatch to queue a job. You can choose which job scripts to generate with the -job option. If this option is not set, a job script is generated from each of the provided templates. Currently, templates for SLURM, MOAB and interactive jobs are available.
You can specify further options that are specific to a particular Run, such as the number of decoys or the length of the simulation. To find out the available options, run

$ setup_run -method abrelax -h
[... omitted method independent options (see application reference for full details ) ...]
run-options:
  extra options selected when generating a "RUN" with setup_run

  -cycle_factor CYCLE_FACTOR
                        the length of each abinitio stage (in terms of monte-
                        carlo cycles) can be increase/decreased using this
                        flag
  -cst_map_mode {historical,simple,simple_short,aadep,aadep_padonly,aadep_mid,aadep_mid_sd,aadep_mid_sdfix}
  -nocst_mapping        if restraints given with option -restraints contain
                        non-centroid atoms a second file is automatically
                        generated by remapping sidechain atoms to CB --
                        alternatively use this flag, remap manually, and load
                        cst-files with -centroid_stage_restraints
  -cst_mapping          if restraints given with option -restraints contain
                        non-centroid atoms a second file is automatically
                        generated by remapping sidechain atoms to CB --
                        alternatively use this flag, remap manually, and load
                        cst-files with -centroid_stage_restraints
  -normalize            normalize scores by collecting statistics during first
                        rounds of simulation

Most of these options concern the handling of additional restraint data and are therefore irrelevant here. Option -cycle_factor can be used to shorten or lengthen individual trajectories; if the option is omitted, the default setting for rasrec is 2.

$ setup_run -method rasrec -target 2jrm_trim -dir run_rasrec -job slurm
....[ many lines of output omitted ] ... 
Method  rasrec  has been setup in run_rasrec/2jrm_trim ...
Enjoy!

If the setup was successful, you should see the last two lines of the output shown above.

3. Starting the RASREC calculation

3.1 Starting a Test calculation

The automated setup tool generates two subdirectories under run_rasrec/2jrm_trim, called run and test. The test folder contains an exact copy of the files, but with the added flag -run:test_cycles, which lets you test your Setup quickly. This is particularly useful at large computing centers, where the wait time in the queuing system can be rather long; such centers usually provide a test queue, or serve tiny jobs significantly faster than large ones. In these cases it is good practice to first check with the test Run that everything works before submitting the production job in the run directory. To start the test Run:

$ cd run_rasrec/2jrm_trim/test
$ sbatch -n 16 test.slurm.job 

The test run should quickly generate a number of subdirectories like this:

$ ls -1rt
batch_000001
batch_000002
batch_000003
batch_000004
centroid_pool_stage1
batch_000005
batch_000006
batch_000008
batch_000007
centroid_pool_stage2
batch_000009
batch_000010
batch_000011
batch_000012
centroid_pool
batch_000013

In particular, if no centroid_pool subdirectory is generated, something is going wrong. Note, however, that stages 1 and 2 are omitted for proteins with a low beta-strand content. Your jobs are most likely fine if they successfully pass through one or two centroid stages.
If a fullatom_pool subdirectory is also generated, the jobs are definitely fine. However, if your test run generates centroid stages 1-2 and stalls afterwards, it might still run fine in production mode. The reason might be that the very short test cycles produce garbage structures for which the beta-strand features extracted for the stage 3 broken-chain sampling are impossible to realize. In such cases, verify that the broken-chain sampling was indeed the reason for the stalled run and not a problem with the input files (wrong atom names in restraint files are the usual culprit).
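The rules of thumb above can be turned into a quick check of the test directory. This is only a sketch: the function name rasrec_test_status and its status messages are hypothetical, and it only looks for the pool subdirectories named in this tutorial (fullatom_pool* and centroid_pool*), so it cannot tell a stalled run from one that is still in progress.

```shell
#!/usr/bin/env bash
# Hypothetical health check for a RASREC test run:
#  - any fullatom_pool* directory  -> fine
#  - any centroid_pool* directory  -> most likely fine
#  - neither                       -> suspicious (returns 1)
rasrec_test_status() {
  local dir=${1:-.} d n_centroid=0
  for d in "$dir"/fullatom_pool*; do
    if [ -d "$d" ]; then
      echo "fine: found $(basename "$d")"
      return 0
    fi
  done
  # count all centroid pools (centroid_pool_stage1, _stage2, centroid_pool)
  for d in "$dir"/centroid_pool*; do
    [ -d "$d" ] && n_centroid=$((n_centroid + 1))
  done
  if [ "$n_centroid" -gt 0 ]; then
    echo "likely fine: $n_centroid centroid pool(s), no fullatom pool yet"
  else
    echo "suspicious: no centroid pool yet"
    return 1
  fi
}

# Usage: rasrec_test_status run_rasrec/2jrm_trim/test
```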

3.2 Starting a Production calculation

$ cd run_rasrec/2jrm_trim/run
$ sbatch -n 512 production.slurm.job 

The RASREC calculation is finished when the subdirectory fullatom_pool_stage8 has been created. Note that stages 5 and 6 are omitted; stages 7 and 8 correspond to stages 5 and 6 in Ref. [4]. In addition to the stage8 folder, you will find this output (or something very similar) at the end of the job's stderr log:

ERROR: quick exit from job-distributor due to flag jd2::mpi_nowait_for_remaining_jobs --- this is not an error 
ERROR:: Exit from: src/protocols/jd2/archive/ArchiveManager.cc line: 654
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD 
with errorcode 911.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 42924 on
node node020 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

As the message itself says ("ERROR ... this is not an error"), this is not an error but the normal way for the job to complete. The reason for this contradictory message is that hard-crashing the RASREC simulation saves considerable computer time compared with waiting for all running trajectories to complete. If you do not see this message, your jobs may have been stopped prematurely. This can have several causes, such as preemption by the queuing system, exceeding a time limit set in the queuing system, or random crashes.
To restart the RASREC simulation, simply issue sbatch -n 512 production.slurm.job again; it will proceed from the last checkpointed result. You can even change the number of processes used for the simulation each time you restart.
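The two completion criteria described above (the fullatom_pool_stage8 subdirectory and the "this is not an error" message) can be combined into a check that decides whether a restart is needed. This is a minimal sketch: the function name rasrec_finished is hypothetical, and because the stderr log file name depends on your queuing-system settings, it is passed in explicitly. Resubmit only after the previous job has left the queue, to avoid two jobs writing into the same run directory.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeeds only if the run directory contains
# fullatom_pool_stage8 AND the given stderr log contains the normal
# "this is not an error" exit message.
rasrec_finished() {
  local rundir=$1 errlog=$2
  [ -d "$rundir/fullatom_pool_stage8" ] &&
    grep -q 'this is not an error' "$errlog" 2>/dev/null
}

# Usage (LOG is a placeholder for your job's stderr file):
#   cd run_rasrec/2jrm_trim/run
#   rasrec_finished . "$LOG" || sbatch -n 512 production.slurm.job
```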


References

sampling protocol: