User manual: using the web site
See the Help page of the public site
rnaspace.org
Note that this documentation reflects the configuration of the public site, for
example the size limit on genomic input sequences and the number of days that
results are stored.
User manual: using the command line
The rnaspace_cli.py script allows you to use RNAspace from the command line, without
using the web interface.
1. General description
All the information provided on the Load page (sequence, domain, name, email) and on the
Predict page (gene finders to launch with their parameters, combine) is instead provided
in a run configuration file.
This file must also set the output format and the directory location
for the ncRNA predictions.
Note that at the end of the run, an email is sent to the user.
If the website is running, following the link given in the email lets you interactively
explore the predicted ncRNAs.
Here is a diagram to illustrate this functioning.
The syntax of the script is as follows.
python rnaspace_cli.py [Options] RUN_CONFIGURATION_FILE
Options:
  -r RNASPACE_CONF, --rnaspace=RNASPACE_CONF
      RNAspace configuration file (by default, the file cfg/rnaspace.cfg)
  -p PREDICTORS_CONF_DIR, --predictors_conf_dir=PREDICTORS_CONF_DIR
      Directory that contains the predictor configuration files (by default, the directory cfg/predictors)
Help Options:
  -h interactive
      enter the interactive help
  -h full
      display all the help pages. The output can be very long, so it is better to save it in
      a file: python rnaspace_cli.py -h full > help_rnaspace.txt
  -h gene-finder
      display the list of available gene finders
  -h organisms
      display the list of available organisms
  -h output-formats
      display the list of available export formats
  -h GENE_FINDER_NAME
      display the help of GENE_FINDER_NAME
Example
Consider an installation with the gene finder configuration files
in the /etc/rnaspace/gene-finder/ directory and
the RNAspace configuration file at /etc/rnaspace/rnaspace.cfg.
You can then run the script with these options:
python rnaspace_cli.py -r /etc/rnaspace/rnaspace.cfg -p /etc/rnaspace/gene-finder RUN_CONFIGURATION_FILE
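When runs are launched from another script, the same command line can be assembled programmatically. The following is a minimal Python sketch, not part of RNAspace: the helper name build_command and the file name my_run.cfg are hypothetical, and only the -r and -p options documented above are used. It is written for Python 3, which may differ from the interpreter used to run rnaspace_cli.py itself.

```python
import subprocess


def build_command(run_config, rnaspace_conf=None, predictors_dir=None,
                  python="python", script="rnaspace_cli.py"):
    """Return the argument list for one rnaspace_cli.py run.

    Only the documented -r and -p options are added; when they are
    omitted, the script falls back to its own defaults.
    """
    cmd = [python, script]
    if rnaspace_conf is not None:
        cmd += ["-r", rnaspace_conf]
    if predictors_dir is not None:
        cmd += ["-p", predictors_dir]
    cmd.append(run_config)  # RUN_CONFIGURATION_FILE comes last
    return cmd


def launch(cmd):
    """Run the command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


# "my_run.cfg" is a hypothetical run configuration file.
command = build_command(
    "my_run.cfg",
    rnaspace_conf="/etc/rnaspace/rnaspace.cfg",
    predictors_dir="/etc/rnaspace/gene-finder",
)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces. Call launch(command) to actually start the run.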
2. Run configuration file
The run configuration file is an INI file that describes the gene finders to execute. It is very
similar to the other configuration files.
A run configuration file contains a Main section that lists the gene finders to execute and the
input sequence(s) to use. Each input sequence must be described in a section of its own.
Optionally, the configuration file can contain one section per gene finder to execute,
specifying the options to use with that gene finder.
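As an illustration of this layout, the sketch below builds a minimal run configuration (a Main section plus one input section) with Python's standard configparser module. The option names are those used in the example configuration file later in this section. The helper name minimal_run_config and its default values are assumptions made for the example, and the exact set of options that RNAspace requires is not confirmed here.

```python
import configparser
import io


def minimal_run_config(email, fasta_path, finders, section="sample",
                       domain="bacteria"):
    """Return the text of a run configuration with one input sequence."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep option names exactly as written
    cfg["Main"] = {
        "email": email,
        "input": section,            # must match the section name below
        "run": ",".join(finders),    # gene finders, comma separated
        "combine": "True",
        "export_output": "outputs.gff",
        "export_format": "GFF",
    }
    cfg[section] = {
        "path": fasta_path,
        "name": section,
        "domain": domain,
    }
    buffer = io.StringIO()
    cfg.write(buffer)
    return buffer.getvalue()


print(minimal_run_config("email@domain.com", "sample.fna", ["YASS", "BLAST"]))
```

Gene finders that have no section of their own are left out here, so they would run with their default options.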
Masking options are available for each input sequence. You can mask coding regions with an
annotation file, non-transcript regions with a transcripts file, low-complexity regions using mdust
and repeated regions using repeatMasker. The coding-region and non-transcript masking options
each require a section that gives the input file and the masking parameters.
To simplify writing the run configuration file, a file named rnaspace_cli.cfg is
provided with RNAspace.
This file contains the configuration of all the gene finders provided with the platform. The
gene finder parameters are set to their default values, so they could have been omitted, but
listing them makes it easy to change these values.
You can use the file as is, changing only the run option of the Main section to execute just the
gene finders you want.
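Changing only the run option can also be scripted. The sketch below is one possible approach, not an RNAspace tool: it edits the text line by line so that every other line of the template, comments included, is preserved. The function name set_run_option and the output file name my_run.cfg are hypothetical.

```python
import re


def set_run_option(text, finders):
    """Return the configuration text with the run option of [Main] replaced.

    Only the "run = ..." line inside the Main section is rewritten (any
    trailing comment on that line is dropped); all other lines are kept.
    """
    out = []
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped[1:-1]  # entering a new section
        elif section == "Main" and re.match(r"run\s*=", stripped):
            line = "run = " + ",".join(finders)
        out.append(line)
    return "\n".join(out) + "\n"


# Usage sketch: derive a run file from the provided template.
# with open("rnaspace_cli.cfg") as handle:
#     template = handle.read()
# with open("my_run.cfg", "w") as handle:   # hypothetical output name
#     handle.write(set_run_option(template, ["YASS", "BLAST"]))
```

A line-based edit is used instead of rewriting the file with a configuration parser because a parser would discard the comments of the template.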
Example of a command-line run configuration file
# Mandatory section
[Main]
email = email@domain.com
input = sample # a section named "sample" must exist
run = YASS,BLAST, Comparative # the gene finders to execute, separated by commas
combine = True # run the combine step after the gene finders have executed
export_output = outputs.gff # file to which the results will be exported
export_format = GFF # export the results in GFF format
# Describes the input sequence. Mandatory.
[sample]
path = sample.fna # path to the FASTA file
name = sample # input name
domain = bacteria # domain of the input sequence
mask_coding_regions = annotations # mask coding regions using an annotation file; a section named "annotations" must exist
mask_transcript_regions = transcripts # mask non-transcript regions using a transcripts file; a section named "transcripts" must exist
mdust = True # run mdust on the sequence
repeatmasker = True # run repeatMasker on the sequence
# optional options
# species =
# strain =
# replicon =
[annotations]
path = NC_000913.gff # path to the annotation file to use
x = 20 # number of bp left unmasked at the beginning and the end of the sequence
[transcripts]
path = sample_RNA.fastq # path to the transcripts file to use
x = 20 # number of bp left unmasked at the beginning and the end of the sequence
p = 100 # megablast parameter: identity (%)
w = 18 # megablast parameter: word size (bp)
# Specify some options for the YASS gene finder.
# Options that are not specified take their default values.
[YASS]
alignments = 2
e = yes
r = both
# Specify the comparative pipeline
[Comparative]
conservation_soft = yass
aggregation_soft = CG-seq
inference_soft = caRNAc
organisms = Acholeplasma_laidlawii_PG_8A [1 sequences]
# Specify options for yass program
[yass]
p = "###-#@-##@##,###--#-#--#-###"
c = double
# Specify options for CG-seq program
[CG-seq]
L = 400
F = True
# Note that gene finders that do not have a section of their own use the default
# values for all their options.
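The example relies on cross-references between sections: the input option names a sequence section, and the masking options name further sections. A file can be checked for dangling references before launching a run. The sketch below is an independent helper, not part of RNAspace. It assumes that "#" starts an inline comment, that several inputs would be separated by commas, and that Python 3's configparser reads the file the same way RNAspace does, none of which is confirmed by this manual.

```python
import configparser

# Options of an input section whose value names another section.
MASK_OPTIONS = ("mask_coding_regions", "mask_transcript_regions")


def check_run_config(text):
    """Return a list of problems found in the run configuration text."""
    cfg = configparser.ConfigParser(
        interpolation=None,              # keep "%" literal
        inline_comment_prefixes=("#",),  # assumption: "#" starts a comment
    )
    cfg.optionxform = str  # keep option names such as "L" and "F" as written
    cfg.read_string(text)

    if not cfg.has_section("Main"):
        return ["missing mandatory [Main] section"]

    problems = []
    # Assumption: several inputs would be listed separated by commas.
    raw_inputs = cfg["Main"].get("input", "")
    inputs = [name.strip() for name in raw_inputs.split(",") if name.strip()]
    if not inputs:
        problems.append("[Main] has no input option")
    for name in inputs:
        if not cfg.has_section(name):
            problems.append('input section "%s" is missing' % name)
            continue
        for option in MASK_OPTIONS:
            target = cfg[name].get(option)
            if target and not cfg.has_section(target):
                problems.append('[%s] %s refers to missing section "%s"'
                                % (name, option, target))
    return problems
```

An empty list means that every section named by input, mask_coding_regions and mask_transcript_regions is present. The check does not validate gene finder names or option values.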