RNAspace

User manual: web site

See the Help page of the public site rnaspace.org.
Note that this documentation reflects the settings of the public site, for example the size limit on genomic input sequences and the number of days that results are stored.

User manual: command line

The rnaspace_cli.py script lets you run RNAspace from the command line, without the web interface.

1. General description

All the information entered in the Load page (sequence, domain, name, email) and the Predict page (gene finders to launch with their parameters, combine) of the web interface is provided here in a run configuration file.
This file must also set the output format and the directory location for the ncRNA predictions.
Note that at the end of the run, an email is sent to the user. If the website is running, the link given in the email lets you explore the predicted ncRNAs interactively.

The following diagram illustrates this workflow.

[Figure: command-line workflow]

The syntax of the script is as follows:
python rnaspace_cli.py [Options] RUN_CONFIGURATION_FILE

Options:

-r RNASPACE_CONF, --rnaspace=RNASPACE_CONF
RNAspace configuration file (by default the file cfg/rnaspace.cfg)
-p PREDICTORS_CONF_DIR, --predictors_conf_dir=PREDICTORS_CONF_DIR
Directory that contains the predictors conf files (by default the directory cfg/predictors)

Help Options:

-h interactive
enter the interactive help
-h full
display all the help pages. It can be huge, so it is better to save it in
a file: python rnaspace_cli.py -h full > help_rnaspace.txt
-h gene-finder
display the list of the available gene finders
-h organisms
display the list of the available organisms
-h output-formats
display the list of available export formats
-h GENE_FINDER_NAME
display the help of GENE_FINDER_NAME

Example

Consider an installation with the gene finder configuration files in the /etc/rnaspace/gene-finder/ directory and the RNAspace configuration file at /etc/rnaspace/rnaspace.cfg.
You can run the script with these options:
python rnaspace_cli.py -r /etc/rnaspace/rnaspace.cfg -p /etc/rnaspace/gene-finder RUN_CONFIGURATION_FILE
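
If the script has to be launched from another Python program, the command above can be assembled as an argument list. The following is only a minimal sketch: the helper names build_cli_command and run_cli and the file name run.cfg are hypothetical, and only the -r and -p options documented above are used.

```python
import subprocess


def build_cli_command(run_config, rnaspace_conf=None, predictors_conf_dir=None,
                      python="python", script="rnaspace_cli.py"):
    """Build the argument list for rnaspace_cli.py (hypothetical helper)."""
    command = [python, script]
    if rnaspace_conf is not None:
        # -r: RNAspace configuration file
        command += ["-r", rnaspace_conf]
    if predictors_conf_dir is not None:
        # -p: directory that contains the predictor configuration files
        command += ["-p", predictors_conf_dir]
    # The run configuration file is the final positional argument.
    command.append(run_config)
    return command


def run_cli(command):
    """Launch the command and return its exit status."""
    return subprocess.call(command)


if __name__ == "__main__":
    cmd = build_cli_command("run.cfg",  # hypothetical run configuration file
                            rnaspace_conf="/etc/rnaspace/rnaspace.cfg",
                            predictors_conf_dir="/etc/rnaspace/gene-finder")
    print(" ".join(cmd))
```

Passing a list rather than a single string avoids shell quoting problems when paths contain spaces.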

2. Run configuration file

The run configuration file is an INI file that describes the gene finders to execute. It is very similar to the other configuration files.

A run configuration file contains a Main section that describes the gene finders to execute and the input sequence(s) to use. Each input sequence must be described in a section of its own. Optionally, the configuration file can contain one section per gene finder to specify the options to use with that gene finder.
For each input sequence, masking options are available. You can mask coding regions with an annotation file, non-transcribed regions with a transcripts file, low-complexity regions using mdust, and repeated regions using RepeatMasker. The coding-region and non-transcribed-region masking options each require a section that gives the input file and the parameters.
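
The section dependencies described above can be checked before launching a run. The sketch below is an illustration written with Python 3's standard configparser module, not part of RNAspace; how rnaspace_cli.py itself parses the file is not shown here. The function name check_run_config is hypothetical, and two points are assumptions: that several input sequences are listed separated by commas, and that text after a "#" preceded by a space is a comment.

```python
import configparser


def check_run_config(text):
    """Return a list of problems found in a run configuration (empty if none)."""
    # Assumption: "#" preceded by whitespace starts an inline comment.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",),
                                       interpolation=None)
    parser.read_string(text)
    if not parser.has_section("Main"):
        return ["section [Main] is missing"]
    problems = []
    main = parser["Main"]
    for option in ("input", "run"):
        if not main.get(option):
            problems.append("[Main] does not set '%s'" % option)
    # Assumption: several input sequences are listed separated by commas.
    inputs = [n.strip() for n in main.get("input", "").split(",") if n.strip()]
    for name in inputs:
        if not parser.has_section(name):
            problems.append("input section [%s] is missing" % name)
            continue
        # Each masking option that names a section needs that section to exist.
        for option in ("mask_coding_regions", "mask_transcript_regions"):
            target = parser[name].get(option)
            if target and not parser.has_section(target):
                problems.append("[%s] %s refers to missing section [%s]"
                                % (name, option, target))
    return problems


if __name__ == "__main__":
    sample = "[Main]\ninput = sample\nrun = YASS\n"
    for problem in check_run_config(sample):
        print(problem)
```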

To simplify writing the run configuration file, a file named rnaspace_cli.cfg is provided with RNAspace.
This file contains the configuration of all the gene finders provided with the platform. The gene finder parameters are set to their default values, so they could be omitted, but listing them makes these defaults easy to change.
You can use this file as is, changing only the run option of the Main section to execute just the gene finders you want.
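
Changing only the run option can also be scripted. The following is a minimal sketch using Python 3's standard configparser module; the function name select_gene_finders is hypothetical, and the inline "#" comment handling is an assumption. Note that configparser discards comments when it writes the file back, so editing a copy of rnaspace_cli.cfg by hand may be preferable if the comments matter.

```python
import configparser
import io


def select_gene_finders(template_text, gene_finders):
    """Return the template with [Main] run replaced by the given gene finders."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",),
                                       interpolation=None)
    # Keep option names case-sensitive; by default they would be lowercased.
    parser.optionxform = str
    parser.read_string(template_text)
    parser.set("Main", "run", ",".join(gene_finders))
    output = io.StringIO()
    parser.write(output)
    return output.getvalue()


if __name__ == "__main__":
    template = "[Main]\ninput = sample\nrun = YASS,BLAST,Comparative\n"
    print(select_gene_finders(template, ["YASS"]))
```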

Example of a command-line run configuration file

# Mandatory section
[Main]
email = email@domain.com
input = sample # a section named "sample" must exist
run = YASS,BLAST,Comparative # the gene finders to execute, separated by commas
combine = True # run the combine step after the gene finders have been executed
export_output = outputs.gff # file to which the results will be exported
export_format = GFF # export the results in GFF format

# describes the input sequence. Mandatory.
[sample]
path = sample.fna # path to the FASTA file
name = sample # input name
domain = bacteria # domain of the input sequence
mask_coding_regions = annotations # mask using an annotation file. A section named "annotations" must exist
mask_transcript_regions = transcripts # mask using a transcripts file. A section named "transcripts" must exist
mdust = True # run mdust on the sequence
repeatmasker = True # run RepeatMasker on the sequence

# optional options
# species =
# strain =
# replicon =

[annotations]
path = NC_000913.gff # path to the annotation file to use
x = 20 # number of bp left unmasked at the beginning and the end of the sequence

[transcripts]
path = sample_RNA.fastq # path to the transcripts file to use
x = 20 # number of bp left unmasked at the beginning and the end of the sequence
p = 100 # megablast parameter: identity (%)
w = 18 # megablast parameter: word size (bp)

# specify some options for YASS gene finder
# options that are not specified get the default values.
[YASS]
alignments = 2
e = yes
r = both

# Specify the comparative pipeline
[Comparative]
conservation_soft = yass
aggregation_soft = CG-seq
inference_soft = caRNAc
organisms = Acholeplasma_laidlawii_PG_8A [1 sequences]

# Specify options for yass program
[yass]
p = "###-#@-##@##,###--#-#--#-###"
c = double

# Specify options for CG-seq program
[CG-seq]
L = 400
F = True

# Note that a gene finder without a section of its own uses the default
# values for all its options.
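
To see which of the gene finders listed in the run option will use default values only, the file can be inspected as in the sketch below. It relies on Python 3's standard configparser module; the function name split_by_section is hypothetical, and treating text after " #" as a comment is an assumption based on the example above.

```python
import configparser


def split_by_section(text):
    """Split the gene finders of [Main] run into (configured, defaults_only)."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",),
                                       interpolation=None)
    parser.read_string(text)
    # Tolerate spaces around the commas of the run option.
    names = [n.strip() for n in parser["Main"]["run"].split(",") if n.strip()]
    # A gene finder is "configured" when a section carries its name.
    configured = [n for n in names if parser.has_section(n)]
    defaults_only = [n for n in names if not parser.has_section(n)]
    return configured, defaults_only


if __name__ == "__main__":
    example = "[Main]\nrun = YASS,BLAST\n\n[YASS]\nalignments = 2\n"
    configured, defaults_only = split_by_section(example)
    print("with a section:", configured)
    print("default values only:", defaults_only)
```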