This tutorial shows how to run RASREC CS-Rosetta calculations as published in Refs. [1], [2], and [3], but without any restraints other than backbone chemical shift data. To run calculations with additional restraints such as RDC, NOE, and PCS data, first learn the basic protocol in this tutorial and then learn how to add restraints in Tutorial: Calculations with Restraints. Thanks to the setup tools, this tutorial is very similar to Tutorial: Standard CS-Rosetta. There are two main differences: the final decoys are now found in a subfolder called fullatom_pool (see Section 4), and you choose the keyword rasrec instead of abrelax for the -method option of the setup tools.
The inputs for this tutorial are the following files:
File | Content |
2jrm_trim.tab | backbone chemical shifts in TALOS file format |
2jrm_trim.fasta | corresponding FASTA sequence for structure calculation |
2jrm_trim.frags3.dat.gz | fragment library with 3mer fragments |
2jrm_trim.frags9.dat.gz | fragment library with 9mer fragments |
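Before you run setup_target, a quick check that all inputs are present can save a round trip through the queue. The following is a minimal POSIX sh sketch that assumes the four file names from the table above. check_inputs is a hypothetical helper and is not part of the toolbox.

```shell
# Sketch (POSIX sh): verify that the four tutorial inputs exist in a folder.
# check_inputs is a hypothetical helper, not a CS-Rosetta toolbox command.
check_inputs() {
    dir=${1:-.}
    missing=0
    for f in 2jrm_trim.tab 2jrm_trim.fasta \
             2jrm_trim.frags3.dat.gz 2jrm_trim.frags9.dat.gz; do
        # -f also follows symlinks, so a linked inputs folder works
        if [ ! -f "$dir/$f" ]; then
            echo "missing input: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (in run_tutorials, after linking the inputs folder):
#   check_inputs inputs && echo "all inputs present"
```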
These files are obtained as shown in Tutorial: Fragment Picking with Chemical Shift Data. For convenience these input files are also stored in the toolbox-folder csrosetta3/tutorials/inputs. Either run the tutorials directly in the toolbox folder or link this folder into your current folder like this:
$ which setup_target   # to figure out where the csrosetta toolbox is installed on your system
/net/software/csrosetta3/com_alias/setup_target
# for instance, this might be the answer, showing you that /net/software/csrosetta3 is the location of the toolbox
$ mkdir run_tutorials   # create a clean folder
$ cd run_tutorials
$ ln -s /net/software/csrosetta3/tutorials/inputs
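You can also derive the toolbox location automatically instead of reading it off by hand. The following is a minimal sketch. It assumes, as in the example answer above, that setup_target lives in <toolbox>/com_alias/. cs3_root is a hypothetical helper name.

```shell
# Sketch (POSIX sh): derive the toolbox root from the setup_target path.
# Assumption: setup_target is found at <toolbox>/com_alias/setup_target.
# cs3_root is a hypothetical helper name.
cs3_root() {
    exe=$(command -v setup_target) || {
        echo "setup_target not found in PATH" >&2
        return 1
    }
    # first dirname strips /setup_target, second strips /com_alias
    dirname "$(dirname "$exe")"
}

# Usage (inside run_tutorials):
#   root=$(cs3_root) && ln -s "$root/tutorials/inputs"
```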
Now you can either work through this tutorial step by step, or run the provided tutorial script:
$ /net/software/csrosetta3/tutorials/tutorial_rasrec.sh
Use the automated setup from the CS-Rosetta toolbox to combine all input data files into a Setup.
$ cd csrosetta3/tutorials/inputs
$ setup_target -target 2jrm_trim -method rasrec -frags 2jrm_trim.frags*dat.gz -fasta 2jrm_trim.fasta -cs 2jrm_trim.tab
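The pattern 2jrm_trim.frags*dat.gz is expanded by your shell, not by setup_target. If nothing matches, a POSIX shell passes the pattern through literally. The following sketch shows what the pattern expands to in a given folder. frag_files is a hypothetical helper name.

```shell
# Sketch (POSIX sh): list the files matched by the -frags pattern.
# An unmatched glob stays literal, so only existing paths are printed.
# frag_files is a hypothetical helper name.
frag_files() {
    for f in "${1:-.}"/2jrm_trim.frags*dat.gz; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage (in the inputs folder); expect the 3mer and the 9mer library:
#   frag_files .
```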
Optionally, you can also add the native PDB structure (with the correct trimming) to the Setup to allow on-the-fly computation of RMSDs (useful for benchmarks only).
$ renumber_pdb 2jrmA.pdb -fasta 2jrm_trim.fasta > 2jrm_trim.pdb
$ setup_target -target 2jrm_trim -method rasrec -native 2jrm_trim.pdb
************************************************************
Target 2jrm_trim --- Path: run_tutorials/tutorial_targets/2jrm_trim_2
------------------------------------------------------------
************************************************************
Setup( 2jrm_trim_2 | rasrec_standard )
Setup( 2jrm_trim_2 | rasrec_standard )
============================================================
STORED: method options as new setup 'rasrec_standard'...
- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
Method: rasrec
Choosen Options:
cs nmr_data/2jrm_trim.tab
fasta fragments/2jrm_trim.fasta
frags fragments/2jrm_trim.frags3.dat.gz fragments/2jrm_trim.frags9.dat.gz
------------------------------------------------------------
Note that up to this point, calling setup_target has not produced any files in your working directory. Instead, the Setup is stored in a central location specified by the environment variable CS3_BENCH_TARGETLIB. This variable should be set if the CS-Rosetta toolbox is properly installed; the default location is $HOME/cs_targetlib. To check the setting, run:
$ echo $CS3_BENCH_TARGETLIB
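If you script these steps, you may want a warning when the variable is unset. The following is a minimal sketch. The fallback to $HOME/cs_targetlib restates the default location given above. It is an assumption and does not reimplement the toolbox logic. targetlib_dir is a hypothetical helper name.

```shell
# Sketch (POSIX sh): print where Setups are stored.
# Falling back to $HOME/cs_targetlib mirrors the documented default
# location; this is an assumption, not the toolbox's own logic.
# targetlib_dir is a hypothetical helper name.
targetlib_dir() {
    if [ -z "${CS3_BENCH_TARGETLIB:-}" ]; then
        echo "warning: CS3_BENCH_TARGETLIB is not set; check the toolbox installation" >&2
        printf '%s\n' "$HOME/cs_targetlib"
    else
        printf '%s\n' "$CS3_BENCH_TARGETLIB"
    fi
}

# Usage:
#   ls "$(targetlib_dir)"
```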
You should now have created a Setup with the appropriate input files. Let's check:
$ list_setup
---------------------------------------------------------------------------
--------- CS-Rosetta 3.0 (toolbox) -----------
---------------------------------------------------------------------------
Setup( 2jrm_trim | abrelax_standard )
Setup( 2jrm_trim | rasrec_standard )
Note that Setup( 2jrm_trim | abrelax_standard ) will only appear in the output of list_setup if you have also run the previous Tutorial: Standard CS-Rosetta. In that case, you can also create a Setup for method rasrec from the previously generated abrelax Setup:
$ setup_target -target 2jrm_trim -method rasrec -transfer_method abrelax
============================================================
LOADED: Method options from existing setup 'abrelax_standard'...
- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
Method: rasrec
Choosen Options:
cs nmr_data/2jrm_trim.tab
fasta fragments/2jrm_trim.fasta
frags fragments/2jrm_trim.frags3.dat.gz fragments/2jrm_trim.frags9.dat.gz
native native/2jrm_trim.pdb
------------------------------------------------------------
Setup( 2jrm_trim | rasrec_standard )
============================================================
STORED: method options as new setup 'rasrec_standard'...
- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
Method: rasrec
Choosen Options:
cs nmr_data/2jrm_trim.tab
fasta fragments/2jrm_trim.fasta
frags fragments/2jrm_trim.frags3.dat.gz fragments/2jrm_trim.frags9.dat.gz
native native/2jrm_trim.pdb
------------------------------------------------------------
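In a script, you may want to use -transfer_method only when the abrelax Setup actually exists. The following is a minimal sketch that scans list_setup output for a line such as Setup( 2jrm_trim | abrelax_standard ). It assumes that the spacing matches the list_setup output shown above. has_setup is a hypothetical helper that reads that output on stdin.

```shell
# Sketch (POSIX sh): succeed if list_setup output (on stdin) lists the
# given target/setup pair, e.g. "Setup( 2jrm_trim | abrelax_standard )".
# Assumes the exact spacing shown in the list_setup output above.
# has_setup is a hypothetical helper name.
has_setup() {
    grep -F -q "Setup( $1 | $2 )"
}

# Usage:
#   if list_setup | has_setup 2jrm_trim abrelax_standard; then
#       setup_target -target 2jrm_trim -method rasrec -transfer_method abrelax
#   fi
```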
You can see exactly which files are loaded in your Setup from the output of setup_target, but you can also check (for instance at a later time) by using display_setup.
$ display_setup -method rasrec -target 2jrm_trim
---------------------------------------------------------------------------
--------- CS-Rosetta 3.0 (toolbox) -----------
---------------------------------------------------------------------------
Method: rasrec
Setup( 2jrm_trim | rasrec_standard )
============================================================
LOADED: Method options from existing setup 'rasrec_standard'...
- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -
Method: abrelax
Choosen Options:
cs nmr_data/2jrm_trim.tab
fasta fragments/2jrm_trim.fasta
frags fragments/2jrm_trim.frags3.dat.gz fragments/2jrm_trim.frags9.dat.gz
native native/2jrm_trim.pdb
------------------------------------------------------------
A Run is generated with the command setup_run. You need to know which queuing system you are working with; in the following, I assume SLURM, which uses sbatch to queue a job. You can choose which job scripts to generate with the -job option; if this option is not set, a job script is generated from every provided template. Currently, templates for SLURM, MOAB, and interactive jobs are available.
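If you are unsure which queuing system your cluster uses, the submission commands on your PATH give a hint: sbatch belongs to SLURM and msub to MOAB. The following is a minimal sketch. Only slurm is confirmed in this tutorial as a value for -job. The names moab and interactive printed below are assumptions. detect_scheduler is a hypothetical helper name.

```shell
# Sketch (POSIX sh): guess the queuing system from submission commands.
# Only "slurm" is confirmed as a -job value in this tutorial; "moab" and
# "interactive" are assumed labels. detect_scheduler is hypothetical.
detect_scheduler() {
    if command -v sbatch >/dev/null 2>&1; then
        echo slurm
    elif command -v msub >/dev/null 2>&1; then
        echo moab
    else
        echo interactive
    fi
}

# Usage:
#   [ "$(detect_scheduler)" = slurm ] && echo "use -job slurm and submit with sbatch"
```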
You can specify further options that are specific to a particular Run, such as the number of decoys or the length of the simulation. To find out the available options, run:
$ setup_run -method abrelax -h
[... omitted method independent options (see application reference for full details) ...]
run-options:
  extra options selected when generating a "RUN" with setup_run

  -cycle_factor CYCLE_FACTOR
                        the length of each abinitio stage (in terms of monte-carlo cycles) can be increase/decreased using this flag
  -cst_map_mode {historical,simple,simple_short,aadep,aadep_padonly,aadep_mid,aadep_mid_sd,aadep_mid_sdfix}
  -nocst_mapping        if restraints given with option -restraints contain non-centroid atoms a second file is automatically generated by remapping sidechain atoms to CB -- alternatively use this flag, remap manually, and load cst-files with -centroid_stage_restraints
  -cst_mapping          if restraints given with option -restraints contain non-centroid atoms a second file is automatically generated by remapping sidechain atoms to CB -- alternatively use this flag, remap manually, and load cst-files with -centroid_stage_restraints
  -normalize            normalize scores by collecting statistics during first rounds of simulation
Most of these options concern the handling of additional restraint data and are therefore not relevant here. The option -cycle_factor can be used to shorten or lengthen individual trajectories; if it is omitted, rasrec uses a default of 2.
$ setup_run -method rasrec -target 2jrm_trim -dir run_rasrec -job slurm
....[ many lines of output omitted ] ...
Method rasrec has been setup in run_rasrec/2jrm_trim ...
Enjoy!
If the setup succeeded, you should see the last two lines of the output shown above.
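You can also automate this check by piping the setup_run output through a small filter. The following is a minimal sketch. It only looks for the two phrases from the final lines shown above, so it is a heuristic and not an official success signal. setup_run_ok is a hypothetical helper name.

```shell
# Sketch (POSIX sh): succeed if setup_run output (on stdin) contains the
# two closing phrases shown above. A heuristic, not an official status.
# setup_run_ok is a hypothetical helper name.
setup_run_ok() {
    out=$(cat)
    printf '%s\n' "$out" | grep -q -F 'has been setup in' &&
        printf '%s\n' "$out" | grep -q -F 'Enjoy!'
}

# Usage:
#   setup_run -method rasrec -target 2jrm_trim -dir run_rasrec -job slurm 2>&1 \
#       | tee setup_run.log | setup_run_ok && echo "setup OK"
```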
The automated setup tool generates two subdirectories under run_rasrec/2jrm_trim, called run and test. The test folder contains an exact copy of the files, but with the added flag -run:test_cycles, which lets you test your Setup quickly. This is particularly useful at large computing centers, where the wait time in the queuing system can be rather long, while usually either a test queue is provided or tiny jobs are served significantly faster than large ones. In these cases, it is good practice to first check with the test Run that everything works before submitting the production job in the run directory. To start the test Run:
$ cd run_rasrec/2jrm_trim/test
$ sbatch -n 16 test.slurm.job
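Before trusting a test Run, you can confirm that its folder really carries the -run:test_cycles flag and that the production folder does not. This tutorial does not say which file holds the flag, so the following sketch simply searches the folder recursively. uses_test_cycles is a hypothetical helper name.

```shell
# Sketch (POSIX sh + grep -r): succeed if any file below DIR contains the
# -run:test_cycles flag. Which file holds the flag is not stated in this
# tutorial, hence the recursive search. uses_test_cycles is hypothetical.
uses_test_cycles() {
    # -e is needed because the pattern starts with a dash
    grep -r -q -F -e '-run:test_cycles' "$1"
}

# Usage (from run_rasrec/2jrm_trim):
#   uses_test_cycles test && ! uses_test_cycles run && echo "flags as expected"
```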
The test Run should quickly generate a number of subdirectories like this:
$ ls -1rt
batch_000001
batch_000002
batch_000003
batch_000004
centroid_pool_stage1
batch_000005
batch_000006
batch_000008
batch_000007
centroid_pool_stage2
batch_000009
batch_000010
batch_000011
batch_000012
centroid_pool
batch_000013
In particular, if no centroid_pool subdirectory is generated, something is going wrong. Note, however, that stages 1 and 2 are omitted for proteins with a low beta-strand content. Your jobs are most likely fine if they successfully pass through one or two centroid stages.
If a fullatom_pool subdirectory is also generated, the jobs are definitely fine. However, if your test Run generates centroid stages 1 and 2 and stalls afterwards, it might still run fine in production mode. A possible reason is that the very short test cycles produce garbage structures for which the beta-strand features extracted for the stage3 broken-chain sampling are impossible to realize. In such cases, verify that the broken-chain sampling was indeed the reason for the stalled Run and not a problem with the input files (wrong atom names in restraint files are the usual culprit).
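While the test Run is in progress, you can summarize its state from the pool directories it has created. The following is a minimal sketch that simply restates the rules above: fullatom_pool means the jobs are fine, centroid_pool means the centroid stages were passed, and neither means something may be wrong. pool_status is a hypothetical helper name.

```shell
# Sketch (POSIX sh): one-word summary of a Run directory, restating the
# rules above. pool_status is a hypothetical helper name.
pool_status() {
    dir=${1:-.}
    if [ -d "$dir/fullatom_pool" ]; then
        echo fullatom   # jobs are fine
    elif [ -d "$dir/centroid_pool" ]; then
        echo centroid   # centroid stages passed, fullatom stages pending
    else
        echo none       # if this persists, something is going wrong
    fi
}

# Usage (in run_rasrec/2jrm_trim/test):
#   pool_status .
```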
Once the test Run behaves as expected, start the production Run:
$ cd run_rasrec/2jrm_trim/run
$ sbatch -n 512 production.slurm.job
The RASREC calculation is finished when the subdirectory fullatom_pool_stage8 has been created. Note that stages 5 and 6 are omitted; stages 7 and 8 correspond to stages 5 and 6 in Ref. [4]. In addition to the stage8 folder, you will find this output (or something very similar) at the end of the job's stderr log:
ERROR: quick exit from job-distributor due to flag jd2::mpi_nowait_for_remaining_jobs --- this is not an error
ERROR:: Exit from: src/protocols/jd2/archive/ArchiveManager.cc line: 654
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 911.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 42924 on
node node020 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
As the message itself says ("ERROR .... this is not an error"), this is not an error but the normal way for the job to complete. The reason for this seemingly contradictory message is that hard-crashing the RASREC simulation saves considerable computer time compared with waiting for all running trajectories to complete. If you do not see this message, your job may have been stopped prematurely. This can have several causes, such as preemption by the queuing system, exceeding a time limit set in the queuing system, or random crashes.
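The two signals described above are the fullatom_pool_stage8 folder and the "this is not an error" line in the stderr log. The following sketch combines them into one status check. It assumes that the fullatom pools are named fullatom_pool_stageN, like the stage8 folder; this tutorial only confirms the name fullatom_pool_stage8. The name of the stderr log depends on your job script, so you pass it in. latest_fullatom_stage and run_status are hypothetical helper names.

```shell
# Sketch (POSIX sh): summarize a production Run.
# Assumptions: pools are named fullatom_pool_stageN (only stage8 is
# confirmed above); LOG is the job's stderr log, whose name depends on
# your job script. Helper names are hypothetical.

# Print the highest N among DIR/fullatom_pool_stageN, or 0 if none exist.
latest_fullatom_stage() {
    latest=0
    for d in "${1:-.}"/fullatom_pool_stage*; do
        [ -d "$d" ] || continue          # unmatched glob stays literal
        n=${d##*_stage}
        case $n in
            ''|*[!0-9]*) continue ;;     # ignore non-numeric suffixes
        esac
        if [ "$n" -gt "$latest" ]; then latest=$n; fi
    done
    echo "$latest"
}

# run_status DIR LOG
run_status() {
    stage=$(latest_fullatom_stage "$1")
    if [ "$stage" -ge 8 ]; then
        echo finished
    elif ! grep -q -F 'this is not an error' "$2" 2>/dev/null; then
        echo "stopped prematurely? latest fullatom stage: $stage; resubmit to restart"
    else
        echo "not finished; latest fullatom stage: $stage"
    fi
}

# Usage (log name is hypothetical):
#   run_status run_rasrec/2jrm_trim/run run_rasrec/2jrm_trim/run/rasrec.err
```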
To restart the RASREC simulation, simply issue
$ sbatch -n 512 production.slurm.job
again from the run directory; it will proceed from the last checkpointed result. You can even change the number of processes used for the simulation each time you restart.
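If your jobs are preempted regularly, a small wrapper can resubmit only when the Run has not yet reached fullatom_pool_stage8. The following is a minimal sketch. SBATCH is a hypothetical override that defaults to sbatch, so you can dry-run the logic with a stand-in command. Call the wrapper only after the previous job has ended, or you will submit the Run twice. resubmit_if_unfinished is a hypothetical helper name.

```shell
# Sketch (POSIX sh): resubmit the production job unless the Run finished.
# SBATCH is a hypothetical override (default: sbatch); it is left
# unquoted on purpose so extra submit options can be included.
# Only call this after the previous job has ended.
resubmit_if_unfinished() {
    rundir=$1
    nprocs=${2:-512}
    if [ -d "$rundir/fullatom_pool_stage8" ]; then
        echo "already finished: $rundir" >&2
        return 0
    fi
    # the process count may differ from the previous submission
    (cd "$rundir" && ${SBATCH:-sbatch} -n "$nprocs" production.slurm.job)
}

# Usage:
#   resubmit_if_unfinished run_rasrec/2jrm_trim/run 256
```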