Check quality of the database

After loading all your strains into the database, check the allele calling quality before exporting results.

Note

You can get information about the data currently in the database with the stats command.

wgMLST stats -h
Usage: wgMLST stats [OPTIONS] DATABASE

Extract stats from a wgMLST DATABASE.

Validate strains

To find strains with potential problems, such as a bad assembly or the wrong species, use the strain command with the -c option.

wgMLST strain -h
Usage: wgMLST strain [OPTIONS] DATABASE

Extracts a list of strains from a wgMLST DATABASE.

Options:
-m, --mincover INTEGER  Minimun number of strain found to keep a gene
                        (default:0)
-k, --keep              Keep only gene with different allele (omit missing).
-d, --duplicate         Conserve duplicate gene (default remove).
-V, --inverse           Keep only gene that do not meet the filter
                        of mincover or keep options.
-c, --count             Count the number of gene present in the database for
                        each strains.
-o, --output FILENAME   Export strain list to (default=stdout).

Note

If some strains show a low number of genes compared to the others, you can remove them with the remove command.
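As a rough illustration of spotting such strains, the sketch below flags every strain whose gene count falls below a fraction of the median count. It assumes the output of the strain command with -c was saved as tab-separated lines of strain name and gene count; that layout, the function name flag_low_count_strains and the 0.9 cutoff are all assumptions for the example, not part of wgMLST.

```python
from statistics import median


def flag_low_count_strains(lines, min_fraction=0.9):
    """Return strains whose gene count is below min_fraction of the median.

    Assumes each line looks like "<strain>\t<gene count>" (hypothetical layout).
    """
    counts = {}
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        try:
            counts[parts[0]] = int(parts[1])
        except ValueError:
            continue  # skip a header or any row without a numeric count
    if not counts:
        return []
    threshold = min_fraction * median(counts.values())
    return sorted(name for name, n in counts.items() if n < threshold)


# Made-up counts, only to show the call
example = ["strainA\t1900", "strainB\t1950", "strainC\t1200", "strainD\t1920"]
print(flag_low_count_strains(example))  # prints ['strainC']
```

The median is used rather than the mean so that one very poor assembly does not drag the reference value down. The cutoff should be adjusted to your dataset.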

Note

As with the gene and export commands, you can filter the genes you want to keep for the search.

By default, only duplicate genes are removed.

Validate genes

As with strains, it can be useful to save the list of genes to keep for the rest of the analysis with the gene command.

wgMLST gene -h
Usage: wgMLST gene [OPTIONS] DATABASE

Extracts a list of genes from a wgMLST DATABASE.

Options:
-m, --mincover INTEGER  Minimun number of strain found to keep a gene
                        (default:0)
-k, --keep              Keep only gene with different allele (omit missing).
-d, --duplicate         Conserve duplicate gene (default remove).
-V, --inverse           Keep only gene that do not meet the filter of
                        mincover or keep options.
-o, --output FILENAME   Export GENE list to (default=stdout).

Note

The list of genes that pass your threshold can be used later to export sequences.

Warning

An important parameter is the -m option, which defines the minimum number of strains in which a gene must be found to be kept.

If you are interested in core genes, you can set this number to 95% of the strains in the database. For example, if you have 100 strains in your database, set this parameter to 95.
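The arithmetic behind this value can be sketched as follows. The helper name core_gene_mincover is hypothetical, and rounding up when 95% is not a whole number is an assumption made here so that a kept gene is always present in at least the requested share of strains.

```python
def core_gene_mincover(n_strains, percent=95):
    """Smallest number of strains covering at least `percent` % of n_strains."""
    if n_strains < 0 or not 0 <= percent <= 100:
        raise ValueError("n_strains must be >= 0 and percent within 0..100")
    # Integer ceiling division, which avoids floating-point rounding surprises
    return -((-n_strains * percent) // 100)


print(core_gene_mincover(100))  # prints 95
print(core_gene_mincover(250))  # prints 238 (237.5 rounded up)
```

The result is the value to pass to -m, for instance wgMLST gene -m 95 -o core_genes.txt DATABASE, where core_genes.txt is only an example file name.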

Remove strains or genes

After checking the database, if some strains or genes need to be removed, you can use the remove command.

wgMLST remove -h
Usage: wgMLST remove [OPTIONS] DATABASE [GENES_OR_STRAINS]...

Removes STRAINS or GENES from a wgMLST DATABASE.

Options:
--strains / --genes    Choose the item you wish to remove  [default: strains]
-f, --file FILENAME    File list of genes or strains to removed on the wgMLST
                       database.
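When many items have to be removed, the list passed to -f can be generated from a script. The sketch below writes one name per line after dropping blanks and duplicates. The one-name-per-line layout is an assumption about what the -f option expects, and write_name_list is a hypothetical helper, not part of wgMLST.

```python
def write_name_list(names, path):
    """Write unique, non-empty names to `path`, one per line (assumed layout).

    Returns the sorted list of names that was written.
    """
    unique = sorted({name.strip() for name in names if name.strip()})
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("".join(f"{name}\n" for name in unique))
    return unique
```

Under that assumption, the resulting file could then be used as wgMLST remove --strains -f to_remove.txt DATABASE, where to_remove.txt is an example file name.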