API Documentation
python Mlst Local Search Tool
Whole Genome MLST
Core classes and functions to work with Whole Genome MLST data.
-
pymlst.wg.core.open_wg(file=None, ref='ref') A context manager function that wraps the creation of a WholeGenomeMLST object.
Context managers allow you to instantiate objects using the with keyword, eliminating the need to manage exceptions and commit/close processes yourself.
- Parameters
file – The path to the database file to work with.
ref – The name that will be given to the reference strain in the database.
- Yields
A WholeGenomeMLST object.
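The commit/close behaviour described above can be illustrated with a generic context manager. This is a minimal sketch of the pattern only, not pymlst's implementation: open_db is a hypothetical helper, and the use of sqlite3 is an assumption made so that the example runs on its own.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def open_db(path=":memory:"):
    """Hypothetical helper: yield a connection, then commit or roll back."""
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()        # reached only if the with-block raised nothing
    except Exception:
        conn.rollback()      # undo partial work, then re-raise the error
        raise
    finally:
        conn.close()         # always release the connection


with open_db() as db:
    db.execute("CREATE TABLE strains (name TEXT)")
    db.execute("INSERT INTO strains VALUES ('ref')")
    rows = db.execute("SELECT name FROM strains").fetchall()
```

The caller never calls commit() or close() directly; leaving the with block triggers both.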
-
class pymlst.wg.core.DatabaseWG(file, ref) A core-level class to manipulate the genomic database.
Warning
Should not be instantiated directly; see WholeGenomeMLST instead.
-
add_infos(repository, species, version) Adds information about the cgMLST schema used in this database.
-
get_gene_sequences(gene) Returns all the sequences for a specific gene and lists the strains that reference them.
-
count_genes_per_souche(valid_shema) Returns the number of distinct genes per strain.
The counted genes are restricted to the ones given in valid_shema.
-
-
class pymlst.wg.core.WholeGenomeMLST(file, ref) Whole Genome MLST Python representation.
Example of usage:
    with open_wg('database.db') as db:
        db.create(open('genome.fasta'))
        db.add_strain(open('strain_1.fasta'))
        db.add_strain(open('strain_2.fasta'))
-
__init__(file, ref)
- Parameters
file – The path to the database file to work with.
ref – The name that will be given to the reference strain in the database.
-
create(coregene, concatenate=False, remove=False) Creates a whole genome MLST database from a core genome fasta file.
- Parameters
coregene – The fasta Path containing the reference core genome.
concatenate – Whether we should concatenate genes with identical sequences.
remove – Whether we should remove genes with identical sequences.
For instance, if concatenate is set to True, two genes g1 and g2 that have the same sequence will be stored as a single gene named g1;g2.
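The effect of concatenate and remove on genes that share a sequence can be sketched in plain Python. The helper merge_identical is hypothetical and is not part of pymlst. Two points are assumptions of this sketch: remove=True drops every gene of a duplicated group, and with neither flag set the input is returned unchanged.

```python
def merge_identical(genes, concatenate=False, remove=False):
    """Hypothetical helper. genes is a list of (name, sequence) pairs."""
    by_seq = {}
    for name, seq in genes:
        by_seq.setdefault(seq, []).append(name)   # group names by sequence

    result = []
    for seq, names in by_seq.items():
        if len(names) == 1 or not (concatenate or remove):
            # Unique sequence, or no option set: keep the genes as they are.
            result.extend((name, seq) for name in names)
        elif concatenate:
            # Store the duplicates as one gene named "g1;g2".
            result.append((";".join(names), seq))
        # remove=True: assumed here to drop all genes sharing the sequence.
    return result


genes = [("g1", "ATG"), ("g2", "ATG"), ("g3", "TTT")]
concatenated = merge_identical(genes, concatenate=True)
removed = merge_identical(genes, remove=True)
```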
-
add_infos(repository, species, version) Adds information about the cgMLST schema stored in the database.
- Parameters
repository – Source of the cgMLST data
species – Name of the species
version – Version of the database
-
get_infos(output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>) Gets information about the cgMLST schema stored in the database.
-
add_strain(genome, strain=None, identity=0.95, coverage=0.9) Adds a genome strain to the database.
How it works:
A BLAT search is performed on each contig of the strain to find sub-sequences matching the core genes.
The identified sub-sequences are extracted and added to the database, where each one is associated with a sequence ID.
An MLST entry is created, referencing the sequence, the gene it belongs to, and the strain it was found in.
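The last two steps above (storing a sequence under an ID, then recording an MLST entry) can be pictured with a toy in-memory model. ToyDatabase and add_match are hypothetical names, the real storage layout is not shown in this page, and the BLAT search itself is replaced here by hard-coded matches.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatabase:
    """Hypothetical stand-in for the database, for illustration only."""
    sequences: dict = field(default_factory=dict)   # sequence -> sequence ID
    mlst: list = field(default_factory=list)        # (seq_id, gene, strain)

    def add_match(self, strain, gene, sequence):
        # Reuse the existing ID when this sequence was already stored.
        seq_id = self.sequences.setdefault(sequence, len(self.sequences) + 1)
        # The MLST entry links the sequence, its gene and the strain.
        self.mlst.append((seq_id, gene, strain))
        return seq_id


db = ToyDatabase()
first = db.add_match("strain_1", "g1", "ATGC")
second = db.add_match("strain_2", "g1", "ATGC")   # same sequence, same ID
third = db.add_match("strain_2", "g2", "TTGA")
```

In this model, two strains carrying the same allele of a gene point at the same sequence ID, which is what makes profile comparison cheap.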
-
add_reads(fastqs, strain=None, identity=0.95, coverage=0.9, reads=10) Adds raw reads of a strain to the database.
How it works:
A KMA search is performed on the reads (fastq) of the strain to find sub-sequences matching the core genes.
The identified sub-sequences are extracted and added to the database, where each one is associated with a sequence ID.
An MLST entry is created, referencing the sequence, the gene it belongs to, and the strain it was found in.
- Parameters
fastqs – The reads we want to add, as a list of fastq files.
strain – The name that will be given to the new strain in the database.
identity – Sets the minimum identity used by KMA for the sequence search (in percent).
coverage – Sets the minimum accepted coverage for found sequences.
reads – Sets the minimum read coverage required to keep a result.
-
remove_gene(genes, file=None) Removes genes from the database.
- Parameters
genes – Names of the genes to remove.
file – A file containing a gene name per line.
-
remove_strain(strains, file=None) Removes entire strains from the database.
- Parameters
strains – Names of the strains to remove.
file – A file containing a strain name per line.
-
extract(extractor, output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>) Takes an extractor object and writes the extraction result to the given output.
- Parameters
extractor – An Extractor object describing the way data should be extracted.
output – The output that will receive extracted data.
-
-
class pymlst.wg.core.Extractor A simple interface to ease the process of creating new extractors.
-
pymlst.wg.core.find_recombination(genes, alignment, output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>) Counts the number of versions of each gene.
- Parameters
genes – List of genes (output of TableExtractor using export='gene').
alignment – fasta file alignment (output of SequenceExtractor using align=True).
output – The output to write the results to.
-
pymlst.wg.core.find_subgraph(distance, threshold=50, output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, export='list') Searches for groups of strains separated by a distance threshold.
- Parameters
threshold – Minimum distance to maintain for group extraction.
distance – Distance matrix file (output of TableExtractor with export='distance').
output – The output to write the results to.
export – Sets the export type.
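One way to read "groups of strains separated by a distance threshold" is as connected components of a graph in which two strains are linked when their distance does not exceed the threshold. The sketch below follows that reading. It is an assumption, not a description of how find_subgraph is implemented, and group_strains is a hypothetical name.

```python
def group_strains(names, matrix, threshold=50):
    """Hypothetical helper: connected components under a distance cutoff."""
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative (path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for j in range(i + 1, len(names)):
            if matrix[i][j] <= threshold:          # assumed inclusive cutoff
                parent[find(a)] = find(names[j])   # merge the two groups

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


names = ["A", "B", "C", "D"]
matrix = [
    [0, 10, 200, 300],
    [10, 0, 40, 250],
    [200, 40, 0, 500],
    [300, 250, 500, 0],
]
groups = group_strains(names, matrix, threshold=50)
```

Note that A and C end up in the same group even though their direct distance is 200, because both are close to B.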
Set of methods to extract different types of results from wgMLST
-
class pymlst.wg.extractors.SequenceExtractor(file=None, reference=False) Extracts core gene sequences into a fasta file.
-
class pymlst.wg.extractors.MsaExtractor(file=None, realign=False) Computes a multiple sequence alignment (MSA) and extracts the aligned sequences.
-
__init__(file=None, realign=False)
- Parameters
file – Path of the file containing the core genes to extract
realign – Realign genes with the same length
-
-
class pymlst.wg.extractors.TableExtractor(mincover=0, keep=False, duplicate=False, inverse=False) Extracts a cgMLST distance matrix, MLST profiles, or gene and strain lists from a wgMLST database.
-
class pymlst.wg.extractors.TableExtractorCommand(*args, **kwargs) Options supported by TableExtractor.
-
class pymlst.wg.extractors.GeneExtractor(**kwargs) Extracts a list of genes from a wgMLST database.
-
class pymlst.wg.extractors.StatsExtractor Extracts statistics (number of strains, core genes and sequences) from a wgMLST database.
-
class pymlst.wg.extractors.StrainExtractor(count=False, **kwargs) Extracts a list of strains from a wgMLST database.
-
class pymlst.wg.extractors.DistanceExtractor(mincover=0, keep=False, duplicate=False, inverse=False) Extracts a distance matrix from a wgMLST database.
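As an illustration of what a cgMLST distance matrix contains, the sketch below counts, for each pair of strains, the genes whose alleles differ. Everything here is an assumption made for the example: the function names are hypothetical, genes missing from either strain are simply skipped, and the mincover, keep, duplicate and inverse options are not modelled.

```python
def allele_distance(profile_a, profile_b):
    """Number of shared genes carrying different alleles (hypothetical)."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for gene in shared if profile_a[gene] != profile_b[gene])


def distance_matrix(profiles):
    """profiles maps a strain name to a {gene: allele ID} dictionary."""
    strains = sorted(profiles)
    matrix = [
        [allele_distance(profiles[a], profiles[b]) for b in strains]
        for a in strains
    ]
    return strains, matrix


profiles = {
    "s1": {"g1": 1, "g2": 1, "g3": 1},
    "s2": {"g1": 1, "g2": 2, "g3": 3},
    "s3": {"g1": 1, "g2": 2},          # g3 was not found in this strain
}
strains, matrix = distance_matrix(profiles)
```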
Classical MLST
Core classes and functions to work with Classical MLST data.
-
pymlst.cla.core.open_cla(file=None, ref=1) A context manager function that wraps the creation of a ClassicalMLST object.
Context managers allow you to instantiate objects using the with keyword, eliminating the need to manage exceptions and commit/close processes yourself.
- Parameters
file – The path to the database file to work with.
ref – The name that will be given to the reference strain in the database.
- Yields
A ClassicalMLST object.
-
class pymlst.cla.core.DatabaseCLA(file, ref) A core-level class to manipulate the genomic database.
Warning
Should not be instantiated directly; see ClassicalMLST instead.
-
add_infos(repository, species, mlst, version) Adds information about the MLST schema used in this database.
-
add_sequence(sequence, gene, allele) Adds a new sequence associated with a gene and an allele.
-
add_mlst(sequence_typing, gene, allele) Adds a new sequence typing associated with a gene and an allele.
-
-
class pymlst.cla.core.ClassicalMLST(file, ref) Classical MLST Python representation.
Example of usage:
    with open_cla('database.db') as db:
        db.create(open('profile.txt'),
                  [open('gene1.fasta'),
                   open('gene2.fasta'),
                   open('gene3.fasta')])
        db.multi_search(open('genome.fasta'))
-
__init__(file, ref)
- Parameters
file – The path to the database file to work with.
ref – The name that will be given to the reference strain in the database.
-
create(profile, alleles) Creates a classical MLST database from an MLST profile and a list of alleles.
- Parameters
profile – The MLST profile
alleles – A list of alleles files.
The MLST profile should be a TXT file in the following format:
MLST Profile TXT
ST    gene1    gene2    gene3    …
1     1        1        1        …
2     3        3        2        …
3     1        2        1        …
4     1        1        3        …
…     …        …        …        …
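The profile layout above maps naturally onto a lookup table from allele combinations to Sequence Type. The sketch below parses such a file and looks up one combination. It assumes the columns are tab-separated, which this page does not state, and read_profile is a hypothetical helper rather than a pymlst function.

```python
import csv
import io


def read_profile(handle):
    """Hypothetical parser: return (gene names, {allele tuple: ST})."""
    reader = csv.reader(handle, delimiter="\t")   # assumed tab-separated
    header = next(reader)
    genes = header[1:]                            # first column is the ST
    table = {}
    for row in reader:
        if not row:
            continue                              # skip blank lines
        alleles = tuple(int(value) for value in row[1:1 + len(genes)])
        table[alleles] = int(row[0])
    return genes, table


text = (
    "ST\tgene1\tgene2\tgene3\n"
    "1\t1\t1\t1\n"
    "2\t3\t3\t2\n"
    "3\t1\t2\t1\n"
    "4\t1\t1\t3\n"
)
genes, table = read_profile(io.StringIO(text))
st = table.get((3, 3, 2))   # ST for gene1=3, gene2=3, gene3=2
```

An allele combination that does not appear in the profile yields None with table.get, which corresponds to a strain with no known Sequence Type.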
-
add_infos(repository, species, mlst, version) Adds information about the MLST schema stored in the database.
- Parameters
repository – Source of the MLST data
species – Name of the species
mlst – Name of the MLST schema
version – Version of the database
-
get_infos(output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>) Gets information about the MLST schema stored in the database.
-
search_st(genome, identity=0.9, coverage=0.9, fasta=None) Searches the Sequence Type number of a strain.
- Parameters
-
multi_search(genomes, identity=0.9, coverage=0.9, fasta=None, output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>) Searches the Sequence Type number of one or more strains from assembled genomes.
- Parameters
genomes – Tuple of one or more strain genomes given as input Path
output – An output for the sequence type search results.
identity – Sets the minimum identity used by BLAT for the sequence search (in percent).
fasta – A file in which to export the gene allele results in fasta format.
coverage – Sets the minimum accepted coverage for found sequences.
-
search_read(fastqs, identity=0.9, coverage=0.95, reads=10, fasta=None) Searches the Sequence Type from raw reads of one strain.
- Parameters
fastqs – List of fastq files containing raw reads
identity – Sets the minimum identity used by KMA for the sequence search (in percent).
coverage – Sets the minimum accepted gene coverage for found sequences.
reads – Sets the minimum read coverage required to keep a mapping.
fasta – A file in which to export the gene allele results in fasta format.
-
multi_read(fastqs, identity=0.9, coverage=0.95, reads=10, paired=True, fasta=None, output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>) Searches the Sequence Type number of one or more strains from raw reads.
- Parameters
fastqs – Tuple of one or more strain raw reads given as input
output – An output for the sequence type search results.
identity – Sets the minimum identity used by KMA for the sequence search (in percent).
reads – Sets the minimum read coverage required to keep a mapping.
paired – Whether the raw reads are paired-end or single-end.
fasta – A file in which to export the gene allele results in fasta format.
coverage – Sets the minimum accepted coverage for found sequences.
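The paired option implies that fastq files have to be grouped per strain before the search. A minimal sketch of such a grouping follows. It assumes that the two files of a pair are listed consecutively and that an odd number of files is an error in paired mode; neither point is confirmed by this page, and group_reads is a hypothetical helper.

```python
def group_reads(fastqs, paired=True):
    """Hypothetical helper: split a flat file list into per-strain groups."""
    if not paired:
        # Single-end data: one file per strain.
        return [(path,) for path in fastqs]
    if len(fastqs) % 2:
        raise ValueError("paired reads require an even number of files")
    # Paired-end data: files are assumed to come as consecutive pairs.
    return [tuple(fastqs[i:i + 2]) for i in range(0, len(fastqs), 2)]


files = ["s1_R1.fastq", "s1_R2.fastq", "s2_R1.fastq", "s2_R2.fastq"]
pairs = group_reads(files, paired=True)
singles = group_reads(files, paired=False)
```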
-
-
class pymlst.cla.core.ST_result(genome_name, st_val, alleles) Writes the results of the ST search.
Other Typing
Core classes and functions to work with alternative typing methods.
-
pymlst.pytyper.core.open_typer(method)
- Parameters
method – Defines the typing method to apply. Possible values: fim, spa, clmt.
- Yields
A PyTyper object.
-
class pymlst.pytyper.core.PyTyper(fi, typing) Base class for all PyTyper objects (one per typing method accepted by open_typer).
-
abstract search_genome(genome, identity=0.9, coverage=0.9, fasta=None) Abstract method for searching alleles against a genome.
- Parameters
genome – Path to the fasta genome
identity – Minimum identity threshold (0.9)
coverage – Minimum coverage threshold (0.9)
fasta – Path to a file to write alleles in fasta format (None)
-
abstract create() Initializes the database for a specific typing method: FimH, Spa or Clermont.
Uses an automatically created scheme that is specific to the typing method.
-
abstract multi_search(genomes, identity, coverage, fasta=None, output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>) Performs a batch search on a list of genomes.
- Parameters
genomes – List of paths to the fasta genomes
identity – Minimum identity threshold
coverage – Minimum coverage threshold
fasta – Handle to a file to write alleles in fasta format (None)
output – Write the results to this output (stdout)
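The relationship between an abstract base class and its concrete typers can be sketched with Python's abc module. ToyTyper and ConstantTyper are hypothetical classes written for this example. They only mirror the method names listed above and do not reproduce pymlst's constructors or search logic.

```python
import sys
from abc import ABC, abstractmethod


class ToyTyper(ABC):
    """Hypothetical base class declaring the methods a typer must provide."""

    @abstractmethod
    def search_genome(self, genome, identity=0.9, coverage=0.9, fasta=None):
        """Return the typing result for one genome."""

    @abstractmethod
    def multi_search(self, genomes, identity, coverage, fasta=None,
                     output=sys.stdout):
        """Write one result line per genome to output."""


class ConstantTyper(ToyTyper):
    """Toy subclass: every genome gets the same placeholder type."""

    def search_genome(self, genome, identity=0.9, coverage=0.9, fasta=None):
        return "type-0"

    def multi_search(self, genomes, identity=0.9, coverage=0.9, fasta=None,
                     output=sys.stdout):
        for genome in genomes:
            result = self.search_genome(genome, identity, coverage, fasta)
            output.write(f"{genome}\t{result}\n")


typer = ConstantTyper()
single = typer.search_genome("a.fasta")
```

Instantiating ToyTyper directly raises a TypeError, because its abstract methods have no implementation; only subclasses that override all of them can be created.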
-
abstract
-
class pymlst.pytyper.core.FimH(fi) fimH typing for Escherichia coli.
-
search_genome(genome, identity, coverage, fasta) Abstract method for searching alleles against a genome.
- Parameters
genome – Path to the fasta genome
identity – Minimum identity threshold (0.9)
coverage – Minimum coverage threshold (0.9)
fasta – Path to a file to write alleles in fasta format (None)
-
multi_search(genomes, identity=0.9, coverage=0.9, fasta=None, output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>) Performs a batch search on a list of genomes.
- Parameters
genomes – List of paths to the fasta genomes
identity – Minimum identity threshold
coverage – Minimum coverage threshold
fasta – Handle to a file to write alleles in fasta format (None)
output – Write the results to this output (stdout)
-
-
class pymlst.pytyper.core.Spa(fi) Spa typing for Staphylococcus aureus.
-
search_genome(genome, identity, coverage, fasta) Abstract method for searching alleles against a genome.
- Parameters
genome – Path to the fasta genome
identity – Minimum identity threshold (0.9)
coverage – Minimum coverage threshold (0.9)
fasta – Path to a file to write alleles in fasta format (None)
-
multi_search(genomes, identity=0.9, coverage=0.9, fasta=None, output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>) Performs a batch search on a list of genomes.
- Parameters
genomes – List of paths to the fasta genomes
identity – Minimum identity threshold
coverage – Minimum coverage threshold
fasta – Handle to a file to write alleles in fasta format (None)
output – Write the results to this output (stdout)
-
-
class pymlst.pytyper.core.Clmt(fi) Phylogroup determination using ClermontTyping methods for Escherichia coli.
-
search_genome(genome, identity, coverage, fasta) Abstract method for searching alleles against a genome.
- Parameters
genome – Path to the fasta genome
identity – Minimum identity threshold (0.9)
coverage – Minimum coverage threshold (0.9)
fasta – Path to a file to write alleles in fasta format (None)
-
multi_search(genomes, identity=0.9, coverage=0.99, fasta=None, output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>) Performs a batch search on a list of genomes.
- Parameters
genomes – List of paths to the fasta genomes
identity – Minimum identity threshold
coverage – Minimum coverage threshold
fasta – Handle to a file to write alleles in fasta format (None)
output – Write the results to this output (stdout)
-
-
class pymlst.pytyper.core.TypingResult(genome_name, method) Writes the results of the typing search.
-
__init__(genome_name, method)
- Parameters
genome_name – Name of the genome retrieved from the path provided by the user
method – Typing method used for the analysis
-