Initialise a database
The first step of a cg/wgMLST analysis is to initialise a database from a list of genes, with one reference sequence for each gene. Depending on the analysis, the list describes:
- cgMLST: a list of genes corresponding to the core genome of a species.
- wgMLST: a list of genes corresponding to the whole genome of a species or a clone.
Import from cgmlst.org
You can automatically import a cgMLST resource from cgmlst.org.
wgMLST import -h
Usage: wgMLST import [OPTIONS] DATABASE [SPECIES]...
Creates a wgMLST DATABASE from an online resource.
The research can be filtered by adding a SPECIES name.
Options:
-f, --force Overwrite already existing DATABASE
--prompt / --no-prompt Do not prompt if multiple choices are found,
fail instead.
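When the import is scripted, for example as one step of a batch pipeline, --no-prompt makes an ambiguous species search fail instead of waiting for input. The Python sketch below only assembles the command line from the options listed in the help above and runs it through subprocess; the helper names build_import_command and run_import, the database filename abaumannii.db and the species words are hypothetical examples, not part of wgMLST.

```python
import shlex
import shutil
import subprocess


def build_import_command(database, species=(), force=False, prompt=False):
    """Assemble argv for `wgMLST import` (hypothetical helper, not part of the tool).

    Only options shown by `wgMLST import -h` are used.
    """
    cmd = ["wgMLST", "import"]
    if force:
        cmd.append("--force")  # overwrite an already existing DATABASE
    # --no-prompt: fail instead of asking when several matches are found
    cmd.append("--prompt" if prompt else "--no-prompt")
    cmd.append(database)
    cmd.extend(species)  # SPECIES may be given as several words
    return cmd


def run_import(database, species=(), force=False):
    """Run the import, failing early if wgMLST is not on PATH."""
    cmd = build_import_command(database, species, force=force)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("wgMLST executable not found on PATH")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command instead of running it; call run_import() to execute.
    print(shlex.join(build_import_command("abaumannii.db", ["acinetobacter", "baumannii"])))
```

Run as a script, this prints `wgMLST import --no-prompt abaumannii.db acinetobacter baumannii`.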
Create from an external scheme
The cg/wgMLST database can also be created from a scheme: a list of genes provided as a multi-FASTA file (optionally gzip-compressed, .gz) containing one nucleotide sequence per gene. For example:
>ACICU_RS02500
TTATTTCTTCACAACAGATGGTGCAATTGGGTCGGCAGTGATATAGCCAACTGCTGCTGC
...
GTGGTTAGAAGCAGTGGTCAT
>ACICU_RS11305
CGCACCTAATGGAAGAAAAGGGATCCCCGTAAACCATTTTAAAATATCGCGACGTGTTGG
...
TTTGGAATTGATGCAGAAATTAAATCTTAA
>ACICU_RS08820
ATGGCTTATCAAACTTTAGAACAGCTACAGCAGTCTAAAGCCAAGCTTCACGAAACTGTG
...
TCGCAGTTACGTTAA
Warning
Unlike other cg/wgMLST tools, the scheme file must contain only one allele for each gene.
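Before creating the database, the scheme file can be checked against these requirements: nucleotide sequences in a (possibly gzipped) multi-FASTA file, with a single allele per gene. The Python sketch below is a hypothetical standalone checker, not part of wgMLST. It assumes the gene name is the first word of each FASTA header and that valid bases are A, C, G, T and N, so other IUPAC ambiguity codes are reported as problems.

```python
import gzip
import sys
from collections import Counter

# Assumption: only these bases are accepted; other IUPAC codes are reported.
NUCLEOTIDES = set("ACGTN")


def open_scheme(path):
    """Open a FASTA file as text, decompressing it if the name ends in .gz."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_fasta(handle):
    """Yield (gene, sequence) pairs; the gene is the first word of the header."""
    gene, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if gene is not None:
                yield gene, "".join(chunks).upper()
            header = line[1:].split()
            gene, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if gene is not None:
        yield gene, "".join(chunks).upper()


def check_scheme(records):
    """Return human-readable problems found in a list of (gene, sequence) records."""
    problems = []
    for gene, n in Counter(g for g, _ in records).items():
        if n > 1:
            problems.append(f"{gene}: {n} sequences, only one allele is allowed")
    for gene, seq in records:
        bad = sorted(set(seq) - NUCLEOTIDES)
        if bad:
            problems.append(f"{gene}: non-nucleotide characters {bad}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_scheme(sys.argv[1]) as handle:
        issues = check_scheme(list(read_fasta(handle)))
    print("\n".join(issues) if issues else "scheme looks valid")
```

Saved as a script, it would be run as `python check_scheme.py scheme.fasta.gz` (both filenames are placeholders).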
A scheme can be obtained as follows:
- cgMLST
  - From a scheme published in a scientific article but not available on cgmlst.org.
  - From the gene annotation of the reference genome of the species. After adding your strains to the database, you can restrict it to the core genome by removing genes that are not present in at least 95% of the strains (see validate).
- wgMLST
  - From the gene annotation of a genome closely related to your strains.
  - From a pangenome analysis of your strains, e.g. with Roary.
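The 95% rule for reducing a reference annotation to a core genome can be stated precisely: a gene is kept if it is detected in at least 95% of the strains, i.e. `count * 100 >= 95 * n_strains`. The Python sketch below applies that rule to a presence table; the presence structure (strain name mapped to the set of detected genes) and the core_genes function are hypothetical, and exporting such a table from the database is not shown (within wgMLST this filtering is handled as described under validate).

```python
from collections import Counter


def core_genes(presence, min_percent=95):
    """Return genes detected in at least `min_percent` % of strains.

    `presence` maps each strain name to the set of genes found in it.
    Integer arithmetic avoids floating-point edge cases at the threshold.
    """
    n_strains = len(presence)
    counts = Counter(gene for genes in presence.values() for gene in set(genes))
    return {gene for gene, n in counts.items() if n * 100 >= min_percent * n_strains}


if __name__ == "__main__":
    # Toy data: 20 strains; geneA is in all of them, geneB in 19, geneC in 18.
    presence = {f"strain{i}": {"geneA"} for i in range(20)}
    for i in range(19):
        presence[f"strain{i}"].add("geneB")
    for i in range(18):
        presence[f"strain{i}"].add("geneC")
    print(sorted(core_genes(presence)))  # ['geneA', 'geneB']
```

With 20 strains the threshold is 19 strains, so geneB (19/20) is kept and geneC (18/20) is removed.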
wgMLST create --help
Usage: wgMLST create [OPTIONS] DATABASE COREGENE
Creates a wgMLST DATABASE from a template COREGENE.
Options:
-f, --force Overwrite already existing DATABASE
-c, --concatenate Automatically concatenates genes with duplicated sequences
-r, --remove Automatically removes genes with duplicated sequences
-s, --species TEXT Name of the species
-V, --version TEXT Version of the database
Warning
If the same sequence appears more than once in your scheme (under different gene names), choose how to handle it with the -c/--concatenate or -r/--remove option.
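To choose between -c and -r, it can help to first list which genes share an identical sequence. The Python sketch below is a hypothetical helper, independent of wgMLST: it groups gene names by sequence, compares sequences case-insensitively, and assumes only exact duplicates matter (reverse complements are not checked). It does not reproduce what -c or -r do internally.

```python
from collections import defaultdict


def duplicated_sequences(records):
    """Group gene names that share an identical sequence.

    `records` is an iterable of (gene, sequence) pairs, e.g. parsed from the
    scheme FASTA. Returns one list of gene names per duplicated sequence.
    """
    by_sequence = defaultdict(list)
    for gene, seq in records:
        by_sequence[seq.upper()].append(gene)
    return [genes for genes in by_sequence.values() if len(genes) > 1]


if __name__ == "__main__":
    scheme = [("geneA", "ATGAAATAA"), ("geneB", "atgaaataa"), ("geneC", "ATGCCCTAA")]
    for genes in duplicated_sequences(scheme):
        print("identical sequence shared by:", ", ".join(genes))
```

On the toy scheme above, this prints `identical sequence shared by: geneA, geneB`.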