Python module

[Figure: simplified class diagram of the inMOTIFin components (inMOTIFin_components.png)]

The package follows object-oriented programming (OOP) principles and is organized into three main units: modules, organizer, and utils, as shown in the simplified class diagram above. The classes highlighted in red are available to the user. The utils submodules are the exception to the OOP design: they provide only functions. The exported classes are explained below.

Controller class

The Controller is the main unit that communicates with the Prepare and Sample modules. Its role is to delegate tasks from the user. When run from the command line, it takes its input from Main (which provides the command line interface and reads arguments through config_args). It is also exposed to the user to enable usage from Python.

Simulation of motif-in-seq is provided by the run_inmotifin() function, simulation of motifs by create_motifs(), and simulation of random sequences by create_backgrounds(). Multimerisation can be achieved with the create_multimers() function. These functions take the Paramsdata dataclasses as their inputs, which store the same information that would otherwise be given in the config file or as command line options.
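
As a minimal sketch of calling the Controller from Python, the function below simulates a few motifs. It relies only on the signatures listed in the class reference on this page (BasicParams, MotifParams, Controller, create_motifs); the parameter values, the function name simulate_motifs_sketch, and the assumption that unspecified parameters fall back to their documented defaults are illustrative and have not been checked against a specific inmotifin release.

```python
def simulate_motifs_sketch(title="demo", seed=42):
    """Simulate motifs through the Controller (requires inmotifin to be installed)."""
    # Imported lazily so the sketch can be loaded without the package present.
    import inmotifin

    # Signatures follow the class reference on this page; values are illustrative.
    basic_params = inmotifin.BasicParams(title=title, seed=seed)
    motif_params = inmotifin.MotifParams(
        number_of_motifs=5,
        length_of_motifs_min=6,
        length_of_motifs_max=10,
    )
    controller = inmotifin.Controller(basic_params)
    # create_motifs() is documented to return None.
    controller.create_motifs(motif_params)
    return controller
```

Calling simulate_motifs_sketch() should work once inmotifin is installed. The same pattern is expected to apply to create_backgrounds() and create_multimers() with their respective parameter dataclasses.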

Beyond these options, the Controller class has two additional functions for more control over motif-in-seq simulation. The first, create_motif_in_seq(), adds motifs into background sequences at specific locations. This differs from run_inmotifin() because the default motif-in-seq creation is a probabilistic process and does not allow full control over which motif goes into which sequence or at which location (except for central locations).

The second, mask_motif_in_seq(), adds background sequences at specific positions in a given sequence, thus masking existing motifs. Unlike the functions above, it creates short background sequences using the given probability for each letter and adds them at the specified locations. Note: this does not guarantee that the added masks do not resemble any existing motif.

Additional classes

Beyond the pre-existing methods for overall insertion, inMOTIFin allows creative combination of its lower-level functionality. To this end, the Motifer, MotifInstancer, Multimerer, Markover, Backgrounder, BackgroundSampler, Shuffler, Grouper, Frequencer, FrequencySampler, Positioner, and Inserter classes provide access to lower-level functionality. Furthermore, the basic functions onehot_to_str, create_reverse_complement, define_complementary_map_motif_array, and create_reverse_complement_motif are available.
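
To illustrate what the basic functions named above compute, here is a pure-Python sketch of reverse complementation and one-hot decoding. These are not the inMOTIFin implementations: the function names ending in _sketch are hypothetical, and the array layout (one row per position, one column per alphabet letter) is an assumption.

```python
from typing import Dict, List

DNA_PAIRS: Dict[str, str] = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement_sketch(seq: str, pairs: Dict[str, str] = DNA_PAIRS) -> str:
    """Reverse the sequence and swap every letter for its complement."""
    return "".join(pairs[letter] for letter in reversed(seq))


def onehot_to_str_sketch(onehot: List[List[int]], alphabet: str = "ACGT") -> str:
    """Decode a one-hot matrix: one row per position, one column per letter."""
    return "".join(alphabet[row.index(1)] for row in onehot)


def reverse_complement_ppm_sketch(
    ppm: List[List[float]],
    alphabet: str = "ACGT",
    pairs: Dict[str, str] = DNA_PAIRS,
) -> List[List[float]]:
    """Reverse positions; each letter takes the probability of its complement."""
    column_of = {letter: i for i, letter in enumerate(alphabet)}
    return [
        [row[column_of[pairs[letter]]] for letter in alphabet]
        for row in reversed(ppm)
    ]
```

For example, reverse_complement_sketch("AACG") gives "CGTT", and a motif position that is certain to be G becomes a position that is certain to be C on the opposite strand.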

Examples

For examples, please refer to the Python Examples page.

Detailed documentation of classes

Controller

class inmotifin.Controller(basic_params)

Organizer of preparation and sampling

Class parameters

reader: Reader

File reader class to read in motifs if necessary

writer: Writer

instance of the writer class

data_for_simulation: Dict[str, Any]

Dictionary of simulated data passed for sampling

summary: Dict[str, Dict[str, int]]

Dictionary of summary information about the sampling

rng: np.random.Generator

Random generator for length (uniform from integers) and motif (Dirichlet) sampling

create_backgrounds(background_params)

Option of creating backgrounds given input parameters

Return type:

None

Parameters
background_params: BackgroundParams

Dataclass storing alphabet, sequence length, sequence number, b_alphabet_prior, background_files, background_type, number_of_shuffle, and markov_order

create_motif_in_seq(background_ids, background_dict, b_alphabets, sequence_probs, positions, motif_ids, motifs, orientations, to_replace=True)

Add motif instances to specific positions into specific backgrounds

Return type:

Tuple[Dict[str, str], Dict[str, ndarray]]

Parameters
background_ids: List[str]

List of background IDs in order of insertion

background_dict: Dict[str, str]

Dictionary of background IDs and sequences

b_alphabets: Dict[str, str]

Dictionary of background alphabet

sequence_probs: Dict[str, np.ndarray]

Dictionary of background alphabet prior probabilities

positions: List[List[Tuple[int]]]

List of list of position tuples in order of insertion per sequence and per motif in the inner list

motif_ids: List[List[str]]

List of list of motif IDs in order of insertion per sequence and per position in the inner list.

motifs: Motifs

Data class for motifs with names (key), PPM, alphabet and alphabet pairs

orientations: List[List[int]]

List of list of motif instance orientations per sequence and per motif in the inner list.

to_replace: bool

Whether to replace background bases with the motif instance. The alternative is to insert between existing bases. Default: True

Return
motif_in_sequences: Dict[str, str]

Dictionary of sequence IDs (with background, motif, position, and orientation) and the corresponding sequences with motifs inserted

probabilistic_motif_in_sequences: Dict[str, np.ndarray]

Dictionary of sequence IDs (with background, motif, position, and orientation) and the corresponding letter probabilities of the sequences with motifs inserted
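
The nested list arguments of create_motif_in_seq() are easy to misalign, so the following sketch shows one consistent way to lay them out and a small check to run before calling the method. The helper check_nested_inputs is hypothetical and not part of inMOTIFin. The example IDs, the reading of each position tuple as (start, end), and the orientation values 0 and 1 are assumptions, since this page does not state the encoding.

```python
background_ids = ["bg_0", "bg_1"]
background_dict = {"bg_0": "ACGTACGTACGT", "bg_1": "TTTTGGGGCCCC"}

# One inner list per background, one entry per inserted motif instance.
positions = [[(2, 6)], [(0, 4), (8, 12)]]
motif_ids = [["motif_a"], ["motif_a", "motif_b"]]
orientations = [[0], [0, 1]]  # placeholder values; encoding is an assumption


def check_nested_inputs(background_ids, background_dict, positions, motif_ids, orientations):
    """Raise ValueError if the nested lists do not line up; return True otherwise."""
    if not (len(background_ids) == len(positions) == len(motif_ids) == len(orientations)):
        raise ValueError("outer lists must have one entry per background ID")
    for bg_id, pos, ids, ori in zip(background_ids, positions, motif_ids, orientations):
        if bg_id not in background_dict:
            raise ValueError(f"unknown background ID: {bg_id}")
        if not (len(pos) == len(ids) == len(ori)):
            raise ValueError(f"inner lists differ in length for {bg_id}")
    return True
```

Here the second background receives two instances, so all three inner lists for "bg_1" have two entries.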

create_motifs(motif_params)

Option of creating motifs given input parameters

Return type:

None

Parameters
motif_params: MotifParams

Dataclass storing dirichlet_alpha, number_of_motifs, length_of_motifs_min, length_of_motifs_max, alphabet and motif_files

create_multimers(multimer_params)

Option of creating multimers given input motifs and rules

Return type:

None

Parameters
multimer_params: MultimerParams

Dataclass storing motif_files, jaspar_db_version and multimerisation_rule_path

mask_motif_in_seq(seq_with_motif, positions, mask_alphabet, mask_alphabet_prior, seq_with_motif_prob=None)

Mask motif instances with background-like sequences

Return type:

Tuple[Dict[str, str], Dict[str, ndarray]]

Parameters
seq_with_motif: Dict[str, str]

Dictionary of sequences with motifs corresponding to positions of masking

positions: Dict[str, List[Tuple[int]]]

Dictionary of list of position tuples corresponding to masking per sequence and per motif in the inner list

mask_alphabet: str

Alphabet of masking

mask_alphabet_prior: np.array

Array of probabilities of each letter in the alphabet of masking

seq_with_motif_prob: Dict[str, np.ndarray]

Dictionary of sequence with motifs probabilities corresponding to positions of masking. Optional. If not provided, no probabilistic output is returned (i.e., masked_probs is None)

Return
masked_sequences: Dict[str, str]

Dictionary of sequences after motifs are masked out

masked_probs: Dict[str, np.ndarray]

Dictionary of sequence probabilities after the motifs are masked
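
The idea behind masking can be sketched in a few lines: letters are drawn from the masking alphabet according to its prior and written over the given spans. This is an illustration only, not the inMOTIFin implementation; mask_positions_sketch is a hypothetical name, and treating each position tuple as a half-open (start, end) span is an assumption.

```python
import random
from typing import List, Sequence, Tuple


def mask_positions_sketch(
    seq: str,
    positions: List[Tuple[int, int]],
    mask_alphabet: str,
    mask_alphabet_prior: Sequence[float],
    rng: random.Random,
) -> str:
    """Overwrite each [start, end) span with letters drawn from the prior."""
    letters = list(seq)
    for start, end in positions:
        # Draw exactly as many letters as the span covers, so length is unchanged.
        letters[start:end] = rng.choices(
            mask_alphabet, weights=mask_alphabet_prior, k=end - start
        )
    return "".join(letters)
```

Because the mask letters are sampled at random, a mask can by chance resemble a motif, which is the caveat noted in the overview of mask_motif_in_seq().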

run_inmotifin(motif_params, background_params, group_params, freq_params, sampling_params, positions_params)

Prepare and sample

Return type:

None

Parameters
motif_params: MotifParams

Dataclass storing dirichlet_alpha, number_of_motifs, length_of_motifs_min, length_of_motifs_max, alphabet and motif_files

background_params: BackgroundParams

Dataclass storing alphabet, sequence length, sequence number, b_alphabet_prior, background_files, background_type, number_of_shuffle, and markov_order

group_params: GroupParams

Dataclass storing number_of_groups, max_group_size, group_size_binom_p and group_motif_assignment_file

freq_params: FreqParams

Dataclass storing group_frequency_type, group_frequency_range, motif_frequency_type, motif_frequency_range, group_freq_file and motif_freq_file

sampling_params: SamplingParams

Data class with sampling parameters

positions_params: PositionParams

Data class with positioning parameters

run_sampling(sampling_params, positions_params)

Run main simulation module

Return type:

Tuple[Any]

Parameters
sampling_params: SamplingParams

Data class with sampling parameters

positions_params: PositionParams

Data class with positioning parameters

Return
sampled_data: Tuple[Any]

Tuple containing dagsim_graph, data, and no_motif_seq

save_outputs(dagsim_graph, data, no_motif_seq, no_motif_prob, to_draw)

Save outputs of simulation into files

Return type:

None

Parameters
dagsim_graph

Graph output from DagSim

data: Dict[]

Dictionary of sampled data

no_motif_seq: List[str]

List of sequences without motifs

to_draw: bool

Whether to draw dagsim_graph or not

setup_simulation(motif_params, background_params, group_params, freq_params)

Create data for sampling

Return type:

None

Parameters
motif_params: MotifParams

Dataclass storing dirichlet_alpha, number_of_motifs, length_of_motifs_min, length_of_motifs_max, alphabet and motif_files

background_params: BackgroundParams

Dataclass storing alphabet, sequence length, sequence number, b_alphabet_prior, background_files, background_type, number_of_shuffle, and markov_order

group_params: GroupParams

Dataclass storing number_of_groups, max_group_size, group_size_binom_p and group_motif_assignment_file

freq_params: FreqParams

Dataclass storing group_frequency_type, group_frequency_range, motif_frequency_type, motif_frequency_range, group_freq_file and motif_freq_file

simulate_backgrounds(background_params, b_lengths=None)

Simulate backgrounds with background parameters, but can also create multiple different lengths

Return type:

Tuple[Dict[str, str], Dict[str, ndarray]]

Parameters
background_params: BackgroundParams

Dataclass storing alphabet, sequence length, sequence number, b_alphabet_prior, background_files, background_type, number_of_shuffle, and markov_order

b_lengths: List[int]

List of lengths of simulated backgrounds. Order should match with b_numbers.

Return
backgrounds: Dict[str, str]

Dictionary of background sequences

backgrounds_prob: Dict[str, np.ndarray]

Dictionary of background sequence probabilities of letters in each position

Dataclasses for input parameters

class inmotifin.BasicParams(title, workdir=None, seed=None)

Class for keeping track of basic parameters

Class parameters

title: str

Title of the analysis

workdir: str

Working directory for the analysis, default is current directory. Note: it should be a relative path. Absolute paths are not supported.

seed: int

Random seed for reproducibility, default is None

class inmotifin.MotifParams(dirichlet_alpha=None, number_of_motifs=None, length_of_motifs_min=None, length_of_motifs_max=None, m_alphabet=None, m_alphabet_pairs=None, motif_files=None, jaspar_db_version=None)

Class for keeping track of parameters for motifs

Class parameters

dirichlet_alpha: np.ndarray

Dirichlet prior for motif probabilities, default is uniform

number_of_motifs: int

Number of motifs to generate, default is 10

length_of_motifs_min: int

Minimum length of motifs, default is 5

length_of_motifs_max: int

Maximum length of motifs, default is None. If not set, all motifs will have the same length as length_of_motifs_min

m_alphabet: str

Motif alphabet, default is “ACGT”

m_alphabet_pairs: Dict[str, str]

Motif alphabet pairs for complementary bases, default is {“A”: “T”, “C”: “G”, “T”: “A”, “G”: “C”}

motif_files: List[str]

List of motif file(s) to use, default is empty

jaspar_db_version: str

Release name of the JASPAR database version to use when JASPAR IDs are provided in the motif file(s) and fetched from the JASPAR database via pyJASPAR. For further information, see pyJASPAR’s documentation. Example value: ‘JASPAR2024’. Default: None

class inmotifin.MultimerParams(motif_files, multimerisation_rule_path, jaspar_db_version=None)

Class for keeping track of parameters for multimers

Class parameters

motif_files: List[str]

List of motif file(s) to use for multimerisation

jaspar_db_version: str

Version of the JASPAR database to use when Jaspar IDs are provided in the motif file(s)

multimerisation_rule_path: str

Path to the multimerisation rules file

class inmotifin.BackgroundParams(b_alphabet=None, b_alphabet_prior=None, number_of_backgrounds=None, length_of_backgrounds_min=None, length_of_backgrounds_max=None, background_files=None, background_type=None, number_of_shuffle=None, markov_order=None, markov_n_iter=None, markov_algorithm=None, markov_seed=None)

Class for keeping track of parameters for background

Class parameters

b_alphabet: str

Background alphabet, default is “ACGT”

background_files: List[str]

List of background files to use, default is empty

b_alphabet_prior: np.ndarray

Background alphabet prior probabilities, default is uniform

number_of_backgrounds: int

Number of backgrounds to generate, default is 100

length_of_backgrounds_min: int

Minimum length of background sequences, default is 50

length_of_backgrounds_max: int

Maximum length of background sequences, default is None. If not set, all background sequences will have the same length as length_of_backgrounds_min

background_type: str

Options:
1) fasta_iid: fasta files are used as is (default when background_files is not None)
2) random_nucl_shuffled_only: fasta files are used, nucleotides in sequences are shuffled and only the shuffled sequences are used
3) random_nucl_shuffled_addon: fasta files are used, nucleotides in sequences are shuffled and both the shuffled and the original sequences are used
4) iid: fasta files are ignored if provided, b_alphabet_prior specifies nucleotide probabilities (default when background_files is None)
5) markov_fit: fasta files are used to fit a hidden Markov model with the order specified by markov_order. Original sequences are kept
6) markov_sim: fasta files are used to fit a hidden Markov model with the order specified by markov_order. New sequences are sampled

number_of_shuffle: int

Number of times to shuffle the backgrounds

markov_order: int

Order of Markov model to learn from sequences (when provided) and to simulate sequences. Defaults to 0 corresponding to learning independent nucleotide frequencies.

markov_n_iter: int

Number of iterations of Markov model to learn from sequences, default is 100

markov_algorithm: str

Algorithm of the Markov model to learn from sequences. Options: ‘viterbi’ or ‘map’. See the hmmlearn 0.3.3 documentation. Default is ‘viterbi’.

markov_seed: int

Seed for reproducibility for HMM
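
The description of markov_order states that order 0 corresponds to learning independent nucleotide frequencies. As a rough illustration of what that means (this sketch does not use hmmlearn and is not the package's fitting procedure; letter_frequencies is a hypothetical helper), such frequencies can be estimated by simple counting:

```python
from collections import Counter
from typing import Dict, Iterable


def letter_frequencies(sequences: Iterable[str], alphabet: str = "ACGT") -> Dict[str, float]:
    """Estimate independent (order-0) letter frequencies from sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    # Only letters from the alphabet contribute to the normalising total.
    total = sum(counts[letter] for letter in alphabet)
    if total == 0:
        raise ValueError("no letters from the alphabet were found")
    return {letter: counts[letter] / total for letter in alphabet}
```

For the two sequences "AACG" and "TTTT" this gives 0.25 for A, 0.125 each for C and G, and 0.5 for T. Higher orders additionally condition each letter on the preceding ones.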

class inmotifin.GroupParams(number_of_groups=None, max_group_size=None, group_size_binom_p=None, group_motif_assignment_file=None)

Class for keeping track of parameters for groups

Class parameters

number_of_groups: int

Number of groups to generate, default is 1

max_group_size: int

Maximum size of each group, default is infinity

group_size_binom_p: float

Probability of success in the binomial distribution for group size, default is 1

group_motif_assignment_file: List[str]

List of group motif assignment files, default is empty

class inmotifin.FreqParams(group_frequency_type=None, group_frequency_range=None, motif_frequency_type=None, motif_frequency_range=None, group_group_type=None, concentration_factor=None, group_freq_file=None, motif_freq_file=None, group_group_file=None)

Class for keeping track of parameters for group and motif frequencies

Class parameters

group_frequency_type: str

Type of group frequency distribution (“uniform”, “random”)

group_frequency_range: int

The range of the potential differences between a frequent and a rare group

motif_frequency_type: str

Type of motif frequency distribution (“uniform”, “random”)

motif_frequency_range: int

The range of the potential differences between a frequent and a rare motif

group_group_type: str

Type of group-group interaction distribution on the off-diagonal (“uniform”, “random”)

concentration_factor: float

The preference of each group to be selected again when selecting more than one group for insertion. Value between 0 and 1.

group_freq_file: str

File name for group frequency data

motif_freq_file: str

File name for motif frequency data

group_group_file: str

File name for group-group interaction data

class inmotifin.SamplingParams(to_draw=None, number_of_sequences=None, percentage_no_motif=None, orientation_probability=None, num_groups_per_sequence=None, motif_sampling_replacement=None, n_instances_per_sequence=None, lambda_n_instances_per_sequence=None)

Class for keeping track of parameters for sampling

Class parameters

to_draw: bool

Whether to draw the DAG of the sampling, default is False

number_of_sequences: int

Number of sequences to generate, default is 100

percentage_no_motif: float

Percentage of sequences without motif, default is 0

orientation_probability: float

Probability of orientation for the motif, default is 0.5

num_groups_per_sequence: int

Number of groups per sequence, default is 1

motif_sampling_replacement: bool

Whether to sample motifs from groups with replacement, default is True

n_instances_per_sequence: int

Number of instances per sequence, default is 1

lambda_n_instances_per_sequence: int

Lambda for the number of instances per sequence when Poisson distribution is used

class inmotifin.PositionParams(position_type=None, position_means=None, position_variances=None, to_replace=None)

Class for keeping track of parameters for positioning

Class parameters

position_type: str

Type of position distribution, possible values: “central”, “left_central”, “right_central”, “uniform”, “gaussian”. default is “central”

position_means: List[int]

List of means for the position distribution, used when position_type is “gaussian”

position_variances: List[float]

List of variances for the position distribution, used when position_type is “gaussian”

to_replace: bool

Whether to replace the positions when generating new positions. False when position_type is “gaussian”.

Dataclasses for input for sampling

class inmotifin.Motifs(motifs, alphabet, alphabet_revcomp_pairs)

Class for keeping track of motifs

Class parameters

motifs: Dict[str, np.ndarray]

Dictionary of motif IDs and arrays

alphabet: str

Alphabet of motifs

alphabet_revcomp_pairs: Dict[str, str]

Reverse complementary pairs of alphabet, e.g. {“A”: “T”, “C”: “G”, “T”: “A”, “G”: “C”}

motif_ids: List[str]

List of motif IDs (automatically extracted from motif dictionary)

check_alphabet()

Make sure that the correct number of letters are provided

class inmotifin.Backgrounds(backgrounds, b_alphabet, sequence_probs=None)

Class for keeping track of backgrounds

Class parameters

backgrounds: Dict[str, str]

Dictionary of background IDs and sequences

b_alphabet: str

Background alphabet, default is “ACGT”

sequence_probs: Dict[str, np.ndarray]

Position specific background probabilities. Defaults to i.i.d

background_ids: List[str]

List of background IDs (automatically extracted from background dictionary)

class inmotifin.Groups(groups)

Class for keeping track of groups

Class parameters

groups: Dict[str, List[str]]

Dictionary of group IDs and the motifs within each group

class inmotifin.Frequencies(group_freq, motif_freq_per_group, group_group_transition_prob)

Class for keeping track of frequencies

Class parameters

num_groups: int

Number of groups

group_freq: Dict[str, float]

Dictionary of group IDs and their expected occurrence frequencies

motif_freq_per_group: pd.DataFrame

Dataframe of expected frequencies of motifs per group

group_group_transition_prob: pd.DataFrame

Dataframe of expected transition probabilities of group pairs

class inmotifin.Positions(positions, to_replace)

Class for keeping track of positions

Class parameters

positions: List[Tuple[int]]

List of start and end of positions

to_replace: bool

Whether to replace background bases or insert in between existing bases
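
The difference between the two to_replace modes is easiest to see on a plain string. The sketch below is illustrative and not the inMOTIFin Inserter; place_instance_sketch is a hypothetical helper, and start is assumed to be a zero-based index.

```python
def place_instance_sketch(background: str, instance: str, start: int, to_replace: bool = True) -> str:
    """Place a motif instance at `start`, either overwriting or inserting."""
    if to_replace:
        # Overwrite len(instance) background bases; total length is unchanged.
        return background[:start] + instance + background[start + len(instance):]
    # Insert between existing bases; the sequence grows by len(instance).
    return background[:start] + instance + background[start:]
```

With the background "AAAAAAAA" and the instance "CGT" at start 2, replacing yields "AACGTAAA" (still 8 bases), while inserting yields "AACGTAAAAAA" (11 bases).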

Simulating data

class inmotifin.Motifer(params, rng, reader, writer)

Class to generate and select sequence motifs

Class parameters

title: str

Title of the analysis

params: MotifParams

Dataclass storing dirichlet_alpha, number_of_motifs, length_of_motifs_min, length_of_motifs_max, alphabet and motif_files

rng: np.random.Generator

Random generator for length (uniform from integers) and motif (Dirichlet) sampling

reader: Reader

File reader class to read in motifs if necessary

writer: Writer

instance of the writer class

motifs: Motifs

Data class for motifs with names (key) and PPM, alphabet and ids

motif_lengths: List[int]

The number of positions in each motif

create_motifs()

Controller function that reads motifs from file, fetches them by JASPAR ID if no file is available, or simulates them if neither a file nor an ID is available

Return type:

None

get_motifs()

Get motifs

Return type:

Motifs

Return
motifs: Motifs

Motifs dataclass with PWMs and metadata

get_pwms()

Get PWMs of motifs

Return type:

Dict[str, ndarray]

Return
motif_dict: Dict[str, np.ndarray]

Dictionary with the motif IDs and PWMs

read_motifs()

Read motifs from files in csv, jaspar or meme format

Return type:

None

simulate_motifs()

Generate motifs with name and PPM

Return type:

None

simulate_one_motif(length)

Generate motif in PPM format

Return type:

ndarray

Parameters
length: int

The number of positions in the motif

Return
motif: np.ndarray

A single motif PWM in numpy array format
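
The class description mentions uniform integer sampling for motif lengths and Dirichlet sampling for motif probabilities. A standard-library sketch of that idea follows; it is not the Motifer code, which uses a np.random.Generator. The function names are hypothetical, and treating the maximum length as inclusive is an assumption.

```python
import random
from typing import List, Optional, Sequence


def simulate_ppm_sketch(
    length: int,
    dirichlet_alpha: Sequence[float],
    rng: random.Random,
) -> List[List[float]]:
    """Draw one Dirichlet-distributed probability row per motif position."""
    ppm = []
    for _ in range(length):
        # Normalised independent Gamma(alpha_i, 1) draws follow Dirichlet(alpha).
        draws = [rng.gammavariate(alpha, 1.0) for alpha in dirichlet_alpha]
        total = sum(draws)
        ppm.append([d / total for d in draws])
    return ppm


def sample_motif_length(min_len: int, max_len: Optional[int], rng: random.Random) -> int:
    """Uniform integer length; falls back to min_len when max_len is None."""
    return min_len if max_len is None else rng.randint(min_len, max_len)
```

Small alpha values concentrate each row on a few letters (sharper motifs), whereas large values push rows towards the uniform distribution.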

class inmotifin.Multimerer(params, reader, writer, rng)

Prepare multimers given two motifs and a distance

Class parameters

params: MultimerParams

Dataclass storing motif_files, jaspar_db_version and multimerisation_rule_path

multimer_rules: Dict[str, Tuple[List[str], List[int]]]

Dictionary of IDs and tuple of motif ID and pairwise distances

motifs: Motifs

Data class for motifs with names (key) and PPM, alphabet and ids

multimers: Motifs

Data class for multimer motifs with names (key) and PPM, alphabet and ids

reader: Reader

File reader class to read in motifs and distances

writer: Writer

instance of the writer class

rng: np.random.Generator

Random generator for adding epsilon to the equal probability of empty positions

create_a_multimer(motifs, distances, weights=None, random_variance=0.01)

Based on motifs and a rule create a multimer

Return type:

ndarray

Parameters
motifs: List[np.ndarray]

List of motifs that are part of the multimer

distances: List[int]

List of distances in between motifs that are part of the multimer

weights: List[float]

List of weights for each motif that is part of the multimer

random_variance: float

Magnitude of gaussian variance at the in-between positions

Return
multimer: np.ndarray

Multimer motif in numpy array format
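
One plausible reading of the parameters above is that component motifs are concatenated, with the in-between positions filled by near-uniform rows carrying a little Gaussian noise. The sketch below implements that reading only. It is not the Multimerer code, it ignores weights, it does not handle negative distances (overlaps), and the names gap_row and join_motifs_sketch are hypothetical.

```python
import random
from typing import List

Ppm = List[List[float]]


def gap_row(n_letters: int, random_variance: float, rng: random.Random) -> List[float]:
    """Near-uniform row: equal probabilities plus small Gaussian noise, renormalised."""
    sd = random_variance ** 0.5
    noisy = [max(1e-9, 1.0 / n_letters + rng.gauss(0.0, sd)) for _ in range(n_letters)]
    total = sum(noisy)
    return [value / total for value in noisy]


def join_motifs_sketch(
    motifs: List[Ppm],
    distances: List[int],
    rng: random.Random,
    random_variance: float = 0.01,
) -> Ppm:
    """Concatenate PPMs, separating neighbours by `distances[i]` gap rows."""
    if len(distances) != len(motifs) - 1:
        raise ValueError("need exactly one distance per adjacent motif pair")
    multimer: Ppm = [list(row) for row in motifs[0]]
    for motif, distance in zip(motifs[1:], distances):
        n_letters = len(motif[0])
        multimer.extend(gap_row(n_letters, random_variance, rng) for _ in range(distance))
        multimer.extend(list(row) for row in motif)
    return multimer
```

Joining a 2-position motif and a 3-position motif with a distance of 2 gives a 7-position multimer in which every row still sums to 1.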

create_multimers(multimer_rules, random_variance=0.01)

Function to assemble multimers

Return type:

None

get_multimers()

Getter for multimers

Return type:

Motifs

Return
multimers: Motifs

Data class for multimer motifs with names (key) and PPM, alphabet and ids

main()

Main function to read, assemble and save multimers

Return type:

None

read_motifs()

Read motifs from files in csv, jaspar or meme format

Return type:

None

read_multimer_rules()

Read tsv of multimerisation rules

Return type:

Tuple[str, Tuple[List[str], List[int], List[float]]]

save_multimers()

Save multimers in meme format

Return type:

None

set_motifs(motifs)

Setter for motifs when run from within python

Return type:

None

Parameters
motifs: Motifs

Instance of the Motifs dataclass

class inmotifin.Backgrounder(params, reader, writer, rng)

Class to generate or read background sequences

Class parameters

title: str

Title of the analysis

params: BackgroundParams

Dataclass storing alphabet, sequence length, sequence number, b_alphabet_prior, order, background_files, background_type, number_of_shuffle, and markov_order

backgrounds: Backgrounds

Data class for backgrounds

shuffler: Shuffler

Class for shuffling background sequence

reader: Reader

File reader class to read in sequences if necessary

writer: Writer

instance of the writer class

rng: np.random.Generator

Random generator for sampling letters

assign_iid_probs(backgrounds)

Assign position probabilities to sequences based on alphabet priors

Return type:

Dict[str, ndarray]

Parameters
backgrounds: Dict[str, str]

Dictionary with the background IDs and sequences

Return
backgrounds_prob: Dict[str, np.ndarray]

Dictionary of background sequence probabilities of letters in each position

create_backgrounds()

Controller function to read backgrounds or simulate if no file available

Return type:

None

fit_markov(sequences)

Fit hidden Markov model on sequences to get position specific letter probabilities

Return type:

Dict[str, ndarray]

Parameters
sequences: Dict[str, str]

Dict of input sequences

Return
backgrounds: Dict[str, str]

Dictionary of background sequences

backgrounds_prob: Dict[str, np.ndarray]

Dictionary of background sequence probabilities of letters in each position

get_backgrounds()

Getter for simulated backgrounds

Return type:

Backgrounds

Return
backgrounds: Backgrounds

Backgrounds dataclass with sequence and metadata

get_backgrounds_seq()

Getter for simulated backgrounds

Return type:

Dict[str, str]

Return
backgrounds_seq: Dict[str, str]

Dictionary with the background IDs and sequences

markov_backgrounds()

Generates a dictionary of sequences with ids

Return type:

Tuple[Dict[str, str], Dict[str, ndarray]]

Return
backgrounds: Dict[str, str]

Dictionary of background sequences and IDs

background_probs: Dict[str, np.ndarray]

Dictionary of background sequence probabilities of letters in each position

read_backgrounds()

Reads sequences into a dictionary of sequences with ids and dictionary of sequence probabilities (iid)

Return type:

Tuple[Dict[str, str], Dict[str, ndarray]]

Return
backgrounds: Dict[str, str]

Dictionary of background sequences and IDs

background_probs: Dict[str, np.ndarray]

Dictionary of background sequence probabilities of letters in each position

shuffle_backgrounds()

Shuffle the available backgrounds to generate new ones

Return type:

Tuple[Dict[str, str], Dict[str, ndarray]]

Parameters
backgrounds: Dict[str, str]

Dictionary with the background IDs and sequences

Return
backgrounds_seq: Dict[str, str]

Dictionary with the shuffled background IDs and sequences

backgrounds_prob: Dict[str, np.ndarray]

Dictionary of background sequence probabilities of letters in each position

simulate_iid_backgrounds(b_lengths=None)

Generates a dictionary of random sequences within which each position is iid, and assigns IDs to each sequence

Return type:

Tuple[Dict[str, str], Dict[str, ndarray]]

Parameters
b_lengths: List[int]

List of lengths of simulated backgrounds. If None, sampled with b_length_min and b_length_max fetched from params data

Return
backgrounds: Dict[str, str]

Dictionary of background sequences

backgrounds_prob: Dict[str, np.ndarray]

Dictionary of background sequence probabilities of letters in each position
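
As a sketch of what i.i.d. background simulation amounts to, the function below draws every letter independently from the alphabet prior and records that same prior as the per-position probabilities. It is written with the standard library rather than NumPy, the ID scheme is made up for illustration, and simulate_iid_backgrounds_sketch is a hypothetical name, not the Backgrounder method.

```python
import random
from typing import Dict, List, Sequence, Tuple


def simulate_iid_backgrounds_sketch(
    b_lengths: List[int],
    alphabet: str,
    prior: Sequence[float],
    rng: random.Random,
) -> Tuple[Dict[str, str], Dict[str, List[List[float]]]]:
    """Return (sequences, per-position letter probabilities), keyed by a generated ID."""
    backgrounds, backgrounds_prob = {}, {}
    for idx, length in enumerate(b_lengths):
        seq_id = f"background_{idx}"  # hypothetical ID scheme
        backgrounds[seq_id] = "".join(rng.choices(alphabet, weights=prior, k=length))
        # Under i.i.d. sampling every position has the same distribution: the prior.
        backgrounds_prob[seq_id] = [list(prior) for _ in range(length)]
    return backgrounds, backgrounds_prob
```

Passing b_lengths=[5, 8] yields two sequences of 5 and 8 letters, each paired with a probability table of matching length.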

simulate_markov(sequences)

Simulate background using Markov model

Return type:

Tuple[Dict[str, str], Dict[str, ndarray]]

Parameters
sequences: Dict[str, str]

Dict of input sequences

Return
backgrounds: Dict[str, str]

Dictionary of background sequences

backgrounds_prob: Dict[str, np.ndarray]

Dictionary of background sequence probabilities of letters in each position

class inmotifin.Markover(alphabet, order, n_iter, rng, algorithm='map', seed=123)

Class to learn Markov model from input sequences

Class parameters

algorithm: str

Name of the algorithm to be used in HMM: map (default) or viterbi

alphabet_idx_map: Dict[str, int]

Map alphabet characters to integers

idx_alphabet_map: Dict[int, str]

Map integers to alphabet characters

model: hmm.CategoricalHMM

HMM model to fit and sample from

rng: np.random.Generator

Random generator for length (uniform from integers)

calc_position_probabilities(sampled_states)

Given a sequence and a trained model, calculate emission probabilities for each position

Return type:

ndarray

Parameters
sampled_states: List[int]

List of sampled states (in a single sequence)

Return
_np.ndarray

Numpy array of probabilities of each letter per each position

fit_model(sequences)

Function to fit model on sequences

Return type:

None

Parameters
sequences: List[str]

List of sequences to fit model on

get_probabilities(sequences)

Assign position specific probabilities for the input sequences based on fitted model

Return type:

Dict[str, ndarray]

Parameters
sequences: Dict[str, str]

Dictionary of background sequences

Return
backgrounds_prob: Dict[str, np.ndarray]

Dictionary of background sequence probabilities of letters in each position

get_str_sequence(sampled_seq)

Convert sampled sequence to string

Return type:

str

Parameters
sampled_seq: np.ndarray

A list of sampled integers as characters per position

Return
sampled_str: str

Character string representation of the sampled sequence

sample_from_model(len_sample)

Sample sequence from fitted model

Return type:

Tuple[List[int], List[List[int]]]

Parameters
len_sample: int

Length of the sequence to be sampled

Return
sampled_seq: List[int]

List of sampled sequences

sampled_states: List[List[int]]

List of sampled states per position

sample_str_and_prob(len_seq_min, len_seq_max, num_seq)

Sample a sequence and its positional probabilities

Return type:

Tuple[List[str], List[List[ndarray]]]

Parameters
len_seq_min: int

Minimum length of the sequence to be sampled

len_seq_max: int

Maximum length of the sequence to be sampled. If None, it is set equal to len_seq_min. Defaults to None

num_seq: int

Number of sequences to generate

Return
seq_str: List[str]

List of sampled sequences

seq_probs: List[List[np.ndarray]]

List of position-specific letter probabilities

class inmotifin.Shuffler(number_of_shuffle, rng)

Class to shuffle background sequences

Class parameters

number_of_shuffle: int

The number of new sequences to be created from an existing one by shuffling it

rng: np.random.Generator

Random generator for permuting letters

shuffle_seq_random_nucleotide(backgrounds)

Randomly shuffle each letter in the previously read sequences, adding new sequence entries

Return type:

Dict[str, str]

Parameters
backgrounds: Dict[str, str]

Dictionary of backgrounds to update

Return
shuffled: Dict[str, str]

Dictionary of shuffled backgrounds
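
Nucleotide shuffling keeps the letter composition of each sequence while destroying the order. A minimal standard-library sketch follows; it is not the Shuffler implementation, and both the name shuffle_backgrounds_sketch and the naming of the new entries are assumptions.

```python
import random
from typing import Dict


def shuffle_backgrounds_sketch(
    backgrounds: Dict[str, str],
    number_of_shuffle: int,
    rng: random.Random,
) -> Dict[str, str]:
    """Create `number_of_shuffle` letter-permuted copies of every sequence."""
    shuffled = {}
    for seq_id, seq in backgrounds.items():
        for i in range(number_of_shuffle):
            letters = list(seq)
            rng.shuffle(letters)  # in-place permutation of the letters
            shuffled[f"{seq_id}_shuffled_{i}"] = "".join(letters)  # hypothetical ID scheme
    return shuffled
```

Whether the originals are kept alongside the shuffled copies corresponds to the choice between the random_nucl_shuffled_addon and random_nucl_shuffled_only background types.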

class inmotifin.Grouper(params, motif_ids, reader, writer, rng)

Class to select motif-group

Class parameters

title: str

Title of the analysis

params: GroupParams

Dataclass storing number_of_groups, max_group_size, group_size_binom_p and group_motif_assignment_file

motif_ids: List[str]

Motif IDs

reader: Reader

File reader class to read in sequences if necessary

writer: Writer

instance of the writer class

groups: Groups

groups with names (key) and list of motifs within them

rng: np.random.Generator

Random generator for group sizes and motif sampling

assign_motifs_to_groups(group_sizes)

Assign each motif to a group

Return type:

Dict[str, List[str]]

Parameters
group_sizes: List[int]

List of sizes of groups

Return
motif_group_membership: Dict[str, List[str]]

Dictionary of the group IDs and the list of motifs within

create_groups()

Create group sizes and memberships

Return type:

None

get_groups()

Getter for groups

Return type:

Groups

Return
groups: Groups

groups with names (key) and list of motifs within them

membership_assignment(assignees, group_sizes)

General function to assign membership of one list to another

Return type:

Dict[str, List[str]]

Parameters
assignees: Set[str]

Set of instances that should be assigned to groups

group_sizes: List[int]

List of sizes of groups

Return
group_assignee_membership: Dict[str, List[str]]

Dictionary of memberships of assignees within groups. Key: group ID, Value: list of the IDs of the assignees

read_groups()

Read in group sizes and memberships from file

Return type:

None

select_group_sizes_binomial()

Helper function to select sizes of groups

Return type:

List[int]

Return
adjusted_sizes: List[int]

List of sizes of groups

simulate_groups()

Simulate group sizes and memberships

Return type:

None

class inmotifin.Frequencer(params, groups, reader, writer, rng)

Class to generate motif and group background frequencies, that is the selection probability for each group and motif within

Class parameters

title: str

Title of the analysis

params: FreqParams

Dataclass storing group_frequency_type, group_frequency_range, motif_frequency_type, motif_frequency_range, group_freq_file and motif_freq_file

groups: Groups

The groups with ids and assigned motifs

num_groups: int

Number of groups

reader: Reader

File reader class to read in sequences if necessary

writer: Writer

instance of the writer class

frequencies: Frequencies

Data class for frequencies

rng: np.random.Generator

Random generator for random frequency sampling

assign_frequencies()

Read in or simulate group and motif frequencies

Return type:

None

assign_group_frequencies()

Simulate group frequencies

Return type:

Dict[str, float]

Return
group_freq: Dict[str, float]

Dictionary of group IDs and their expected occurrence frequencies

assign_group_group_trans_probs()

Simulate the probability of selecting groupX given previously selected groupY

Return type:

DataFrame

Return
group_group_transition_prob: pd.DataFrame

Pandas dataframe of co-occurrences of group pairs

assign_motif_frequencies()

Simulate motif frequencies within groups

Return type:

DataFrame

Return
motif_group_df: pd.DataFrame

Pandas dataframe of motif frequencies per group

get_frequencies()

Getter for group and motif frequencies

Return type:

Frequencies

Return
frequencies: Frequencies

Data class for frequencies

pairs_random(remaining_prob)

Creating a matrix of group-group and their transition probability: off-diagonals are random but rows sum to 1

Return type:

ndarray

Parameters
remaining_prob: float

The probability remaining after assigning self transition

Return
group_prob_arr: np.ndarray

Array containing probabilities for group transition

pairs_uniform(remaining_prob)

Creating a matrix of group-group and their transition probabilities: off-diagonals are uniform

Return type:

ndarray

Parameters
remaining_prob: float

The probability remaining after assigning self transition

Return
group_prob_arr: np.ndarray

Array containing probabilities for group transition
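The two transition matrices described for pairs_random and pairs_uniform can be sketched as follows. Both sketches assume that the self transition probability `1 - remaining_prob` sits on the diagonal and that there are at least two groups. The function names and the explicit `num_groups` and `rng` arguments are hypothetical; in the package these values are held by the class.

```python
import numpy as np


def uniform_transition_sketch(
        num_groups: int, remaining_prob: float) -> np.ndarray:
    """Self transition on the diagonal, the rest shared equally."""
    # Assumes num_groups >= 2 so that off-diagonal cells exist
    off_diagonal = remaining_prob / (num_groups - 1)
    matrix = np.full((num_groups, num_groups), off_diagonal)
    np.fill_diagonal(matrix, 1.0 - remaining_prob)
    return matrix


def random_transition_sketch(
        num_groups: int,
        remaining_prob: float,
        rng: np.random.Generator) -> np.ndarray:
    """Self transition on the diagonal, random off-diagonals, rows sum to 1."""
    weights = rng.random((num_groups, num_groups))
    np.fill_diagonal(weights, 0.0)
    # Rescale each row so that its off-diagonal cells sum to remaining_prob
    weights *= remaining_prob / weights.sum(axis=1, keepdims=True)
    np.fill_diagonal(weights, 1.0 - remaining_prob)
    return weights


uniform_matrix = uniform_transition_sketch(3, 0.3)
random_matrix = random_transition_sketch(3, 0.3, np.random.default_rng(0))
```

In both cases every row is a probability distribution over the next group, which is what a Markov chain style selection requires.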

read_group_freq()

Read in group frequencies

Return type:

Dict[str, float]

Return
group_freq: Dict[str, float]

Dictionary of group IDs and their expected occurrence frequencies

read_group_group_trans()

Read in group group transitions

Return type:

DataFrame

Return
group_group: pd.DataFrame

Pandas dataframe of co-occurrences of group pairs

read_motif_freq_per_group()

Read in motif frequencies

Return type:

DataFrame

Return
motif_freq: pd.DataFrame

Motif frequencies per group from file

simulate_background_freq(freq_type, freq_range, ids)

Simulate background frequencies

Return type:

Dict[str, float]

Parameters
freq_type: str

Way to generate frequencies. Currently random and uniform are supported. Random refers to random sampling from a range of probabilities given freq_range. Uniform refers to assigning equal probabilities to all items.

freq_range: int

The expected maximum difference between an unlikely and a likely event. E.g. if set to 100, a low probability event can be 100x less likely than a high probability one

ids: List[str]

The IDs of the items to assign frequency to

Return
background_freq: Dict[str, float]

Probability assigned to each element of the given ids

simulate_background_freq_random(difference_width, ids)

Simulate background frequencies by uniform random sampling

Return type:

Dict[str, float]

Parameters
difference_width: int

The expected maximum difference between an unlikely and a likely event. E.g. if set to 100, a low probability event can be 100x less likely than a high probability one

ids: List[str]

The IDs of the items to assign frequency to

Return
background_freq: Dict[str, float]

Probability assigned to each element of the given ids
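One way to obtain frequencies whose largest and smallest values differ by at most a factor of `difference_width` is to draw a weight between 1 and `difference_width` for every ID and then normalise. This is only a sketch under that assumption; the package's sampling scheme is not documented here, and `random_background_freq_sketch` and its `seed` argument are hypothetical. The standard library `random` module is used for brevity, whereas the class itself takes an `np.random.Generator`.

```python
import random
from typing import Dict, List


def random_background_freq_sketch(
        difference_width: int,
        ids: List[str],
        seed: int = 0) -> Dict[str, float]:
    """Random weights in [1, difference_width], normalised to sum to 1."""
    rng = random.Random(seed)
    weights = [rng.uniform(1, difference_width) for _ in ids]
    total = sum(weights)
    return {item: weight / total for item, weight in zip(ids, weights)}


background_freq = random_background_freq_sketch(100, ["g1", "g2", "g3"])
```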

simulate_background_freq_uniform(ids)

Simulate equal background frequencies for all items

Return type:

Dict[str, float]

Parameters
ids: List[str]

The IDs of the items to assign frequency to

Return
background_freq: Dict[str, float]

Probability assigned to each element of the given ids

Sampling from data

class inmotifin.MotifInstancer(motifs, rng)

Class to take in motifs and generate the required number of instances

Class parameters

motifs: Motifs

Data class for motifs with names (key) and PPM

rng: np.random.Generator

Random generator for multinomial instance sampling

get_one_new_instance(motif_index)

Generate exactly one new motif instance

Return type:

str

Parameters
motif_index: str

ID of the motif from which an instance is to be sampled

Return
instance_str: str

Sequence of a motif instance
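Sampling an instance from a PPM amounts to one multinomial draw per motif position. The sketch below assumes a PPM of shape (length, alphabet size) whose rows sum to 1; `sample_instance_sketch` is a hypothetical helper that takes the PPM directly instead of a motif ID.

```python
import numpy as np


def sample_instance_sketch(
        ppm: np.ndarray, alphabet: str, rng: np.random.Generator) -> str:
    """Draw one letter per position according to the PPM rows."""
    letters = []
    for position_probs in ppm:
        # A single multinomial trial yields a one-hot vector
        one_hot = rng.multinomial(1, position_probs)
        letters.append(alphabet[int(np.argmax(one_hot))])
    return "".join(letters)


toy_ppm = np.array([
    [1.0, 0.0, 0.0, 0.0],      # always A
    [0.0, 0.0, 1.0, 0.0],      # always G
    [0.25, 0.25, 0.25, 0.25],  # any letter
])
instance = sample_instance_sketch(toy_ppm, "ACGT", np.random.default_rng(1))
```

The first two letters of `instance` are fixed by the PPM, while the third varies with the random generator.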

orient_motif(current_instance, orientation)

Reverse complementing an instance as necessary

Return type:

str

Parameters
current_instance: str

String of a motif instance

orientation: int

0 or 1, where 0 means keeping the orientation and 1 means reverse complementing the motif instance.

Return
oriented_instance: str

Sequence of an oriented instance

sample_instances(motif_idx_list, orientations)

Accessor function for creating new instances

Return type:

List[str]

Parameters
motif_idx_list: List[str]

List of motif IDs

orientations: List[int]

List of motif instance orientations

Return
instances: List[str]

List of motif instances

class inmotifin.BackgroundSampler(backgrounds, rng)

Class to support sampling functions

Class parameters

backgrounds: Backgrounds

Data class for backgrounds

rng: np.random.Generator

Random generator for selecting a background

get_b_alphabet()

Get background alphabet

Return type:

str

get_b_alphabet_prior()

Get background alphabet prior

Return type:

ndarray

get_background_ids(num_sample)

Get a list of selected background sequence IDs

Return type:

List[str]

Parameters
num_sample: int

Number of samples to select

Return
selected_ids: List[str]

List of sequence IDs

get_backgrounds(num_backgrounds)

Get a list of backgrounds and their probabilities. May contain duplicated entries

Return type:

Tuple[List[str], List[ndarray]]

Parameters
num_backgrounds: int

Number of requested backgrounds

Return
selected_backgrounds: List[str]

List of non-unique comma separated background IDs and sequences

selected_b_probs: List[np.ndarray]

List of corresponding sequence probabilities

get_single_background(selected_id)

Get a selected background sequence

Return type:

Tuple[str, ndarray]

Parameters
selected_id: str

The name of the selected sequence

Return
bckg_seq: str

A sequence given the selected_id

bckg_prob: np.ndarray

A corresponding matrix of probabilities per position per letter

class inmotifin.FrequencySampler(frequencies, num_groups_per_seq, rng)

Class to select motif based on its background frequencies

Class parameters

frequencies: Frequencies

Frequencies data class including probabilities of groups and motifs within them

num_groups_per_seq: int

Number of groups to select in total

rng: np.random.Generator

Random generator for sampling

select_groups()

Select groups based on their frequency and transition probability in a Markov chain fashion: the selection of the next group depends on the previous selected one given the group_group_transition_prob matrix. The first group is selected from the base group frequency list

Return type:

List[str]

Return
selected_ids: List[str]

List of selected group ids
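The Markov chain style selection can be sketched as below: the first group is drawn from the base frequencies and each later group from the transition row of the previously selected group. For self-containment the transition probabilities are passed as a NumPy array whose rows and columns follow the order of the `group_freq` keys, whereas the package stores them in a pandas dataframe. `select_groups_sketch` is a hypothetical name.

```python
from typing import Dict, List

import numpy as np


def select_groups_sketch(
        group_freq: Dict[str, float],
        transition: np.ndarray,
        num_groups_per_seq: int,
        rng: np.random.Generator) -> List[str]:
    """Select groups one after another, each depending on the previous one."""
    ids = list(group_freq)
    base_probs = np.array([group_freq[group] for group in ids])
    current = rng.choice(len(ids), p=base_probs)
    selected = [ids[current]]
    for _ in range(num_groups_per_seq - 1):
        # The row of the previous group gives the next-group probabilities
        current = rng.choice(len(ids), p=transition[current])
        selected.append(ids[current])
    return selected


# With an identity transition matrix the chain never leaves its first group
chain = select_groups_sketch(
    {"g1": 0.5, "g2": 0.5},
    np.array([[1.0, 0.0], [0.0, 1.0]]),
    num_groups_per_seq=4,
    rng=np.random.default_rng(3))
```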

select_motifs_from_groups(group_ids, num_instances_per_seq, w_replacement=True)

Select motifs from the given groups, taking an equal number of motifs from each group. If the motifs cannot be divided equally, loop through the groups and pick one motif from each until the requested number is reached

Return type:

List[str]

Parameters
group_ids: List[str]

List of selected group ids

num_instances_per_seq: int

Number of motifs to select (per sequence)

w_replacement: bool

Whether to select motifs from groups with replacement. Note, if more motifs are requested than available in a group, replacement will be used regardless of this parameter.

Return
selected_motifs: List[str]

List of selected motif IDs
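The split of `num_instances_per_seq` across groups can be illustrated with integer division: every group receives the same base count and the remainder is handed out one by one in group order. This is a sketch of the allocation step only, under the assumption that the first groups receive the extra motifs; `motifs_per_group_sketch` is hypothetical.

```python
from typing import List


def motifs_per_group_sketch(
        group_ids: List[str], num_instances_per_seq: int) -> List[int]:
    """Number of motifs to draw from each group, aligned with group_ids."""
    base, remainder = divmod(num_instances_per_seq, len(group_ids))
    counts = [base] * len(group_ids)
    # Hand out the leftover motifs one per group, in order
    for idx in range(remainder):
        counts[idx] += 1
    return counts


counts = motifs_per_group_sketch(["g1", "g2", "g3"], 7)
```

A list is returned instead of a dictionary because the same group ID may have been selected more than once.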

class inmotifin.Positioner(params, selected_instances, seq_length, reader, rng)

Class to select positions where motif instances are to be inserted

Class parameters

params: PositionParams

Dataclass storing position_type, position_means, position_variances, and to_replace (insertion type)

motif_lengths: List[int]

List of the length of the motif instances to be inserted

seq_length: int

Length of the background sequence

positions: Positions

Start and end indices where the motif instance should be inserted

reader: Reader

Fileops class with reading functionalities

rng: np.random.Generator

Random generator for length (uniform from integers)

check_central_positions(central_position)

Assert that the start and end positions are within bounds

Return type:

None

Parameters
central_position: List[Tuple[int]]

List of positions that should be within the bounds of the length of the background

check_lengths()

Helper function to check whether the motif instances fit into the background sequence. Used when the motifs replace background bases

Return type:

None

check_overlap(current_positions, start_idx, end_idx)

Helper function to assess overlapping motifs

Return type:

bool

Parameters
current_positions: List[Tuple[int]]

Current list of positions

start_idx: int

Start index of the motif

end_idx: int

End index of the motif

Return
_: bool

True if there is overlap
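An overlap test of this kind reduces to an interval comparison. The sketch assumes half-open intervals, that is, `start` is included and `end` is excluded; whether the package treats the end index as inclusive is not stated here. `overlaps_sketch` is a hypothetical name.

```python
from typing import List, Tuple


def overlaps_sketch(
        current_positions: List[Tuple[int, int]],
        start_idx: int,
        end_idx: int) -> bool:
    """True if [start_idx, end_idx) intersects any existing interval."""
    return any(
        start_idx < end and start < end_idx
        for start, end in current_positions)


occupied = [(0, 5), (10, 15)]
fits_between = overlaps_sketch(occupied, 5, 10)
clashes = overlaps_sketch(occupied, 4, 8)
```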

get_positions()

Getter for positions class

Return type:

Positions

Return
positions: Positions

Dataclass of the start and end values of the selected positions

get_to_replace()

Getter for to_replace parameter

Return type:

bool

Return
_: bool

True if the motif instances should replace background bases

select_central_position()

Calculate central position at the middle of the motif

Return type:

None

select_gaussian_inserted()

Sample positions following Gaussian distribution centered around k positions. Only for inserting motif without replacing background bases.

Return type:

None

select_leftcentral_position()

Calculate central position at the left side of the motif

Return type:

None

select_positions()

Main function of the position selector

Return type:

Positions

Return
positions: Positions

Dataclass of selected start and end positions

select_positions_inserted()

Generate positions within the background sequence at which motif instances are inserted without replacing background bases. Note: the start and end of each position are identical because the insertion is non-replacing

Return type:

None

select_positions_replace()

Generate positions within the background sequence at which motif instances replace background bases, ensuring that the motif instances do not overlap each other.

Return type:

None

select_rightcentral_position()

Calculate central position at the right side of the motif

Return type:

None

select_single_position_replacer(l_motif)

Select a single position within background ranges including motif

Return type:

Tuple[int]

Parameters
l_motif: int

Length of the motif to be inserted

Return
start: int

Start coordinate

end: int

End coordinate
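When a motif replaces background bases, its start must leave room for the whole motif. A minimal sketch, assuming a uniform choice over all valid starts and an exclusive end coordinate, is shown below. `single_position_sketch` and its explicit `seq_length` and `rng` arguments are hypothetical, and the standard library generator stands in for the `np.random.Generator` held by the class.

```python
import random
from typing import Tuple


def single_position_sketch(
        seq_length: int, l_motif: int, rng: random.Random) -> Tuple[int, int]:
    """Pick a start so that the motif ends within the background."""
    if l_motif > seq_length:
        raise ValueError("Motif is longer than the background sequence")
    # randint includes both bounds, so the last valid start is reachable
    start = rng.randint(0, seq_length - l_motif)
    return start, start + l_motif


sampled_positions = [
    single_position_sketch(20, 6, random.Random(seed)) for seed in range(50)]
```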

set_positions(positions)

Setter for positions class

Return type:

None

Parameters
positions: Positions

Class with start and end indices where the motif instance should be inserted

class inmotifin.Inserter(to_replace)

Class to add motif instance(s) to sequences

Class parameters

to_replace: bool

Whether the motif instance replaces background bases. The alternative is to insert by extending the background

add_single_instance(sequence, motif_instance, position)

Adds a given motif_instance in a background sequence by replacing existing bases or by increasing the length

Return type:

str

Parameters
sequence: str

String of the sequence used as background

motif_instance: str

motif instance to insert

position: int

the start location where the motif is to be inserted within the background sequence

Return
new_sequence: str

Sequence with instance inserted
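The difference between replacing and inserting can be shown with plain string slicing. In this sketch `to_replace` is passed as an argument for self-containment, whereas the class stores it as an attribute; `add_single_instance_sketch` is a hypothetical name.

```python
def add_single_instance_sketch(
        sequence: str,
        motif_instance: str,
        position: int,
        to_replace: bool) -> str:
    """Place a motif instance at position by replacing or by extending."""
    if to_replace:
        # Overwrite as many background bases as the motif is long
        end = position + len(motif_instance)
        return sequence[:position] + motif_instance + sequence[end:]
    # Keep every background base and lengthen the sequence
    return sequence[:position] + motif_instance + sequence[position:]


replaced = add_single_instance_sketch("AAAAAAAA", "GGG", 2, to_replace=True)
inserted = add_single_instance_sketch("AAAAAAAA", "GGG", 2, to_replace=False)
```

Replacing keeps the sequence length at 8, while inserting extends it to 11.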

add_single_motif_probabilities(sequence, motif, position)

Adds a given motif in a background probability array by replacing existing bases or by increasing the length

Return type:

ndarray

Parameters
sequence: np.ndarray

Letter probabilities of the sequence used as background

motif: np.ndarray

PWM to insert

position: int

the start location where the motif is to be inserted within the background sequence

Return
new_sequence: np.ndarray

Letter probabilities of sequence with motif inserted

create_insert_positions(positions, motif_instances=None, motif_ids=None)

Reverse the positions to insert from the end when bases are not replaced to avoid overwriting the positions of the already inserted motifs. Adjusts the motif list to match the lengths

Return type:

Tuple[List[Tuple[int]], List[str], List[str]]

Parameters
positions: List[Tuple[int]]

List of (start, end) position tuples.

motif_instances: List[str]

List of motif instance sequences

motif_ids: List[str]

List of motif IDs to insert

Return
positions: List[Tuple[int]]

List of (start, end) position tuples in correct order.

motif_instances: List[str]

List of motif instance sequences in correct order

motif_ids: List[str]

List of motif IDs to insert in correct order
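Why insertion proceeds from the end can be seen in a small example: inserting at the largest start first leaves the coordinates of the remaining, smaller starts untouched. The sketch sorts by start in descending order, which is an assumption (the package description speaks of reversing the positions), and `order_for_insertion_sketch` is a hypothetical helper.

```python
from typing import List, Tuple


def order_for_insertion_sketch(
        positions: List[Tuple[int, int]],
        motif_instances: List[str]
) -> Tuple[List[Tuple[int, int]], List[str]]:
    """Order positions and instances from the rightmost start to the leftmost."""
    order = sorted(
        range(len(positions)),
        key=lambda idx: positions[idx][0],
        reverse=True)
    return (
        [positions[idx] for idx in order],
        [motif_instances[idx] for idx in order])


ordered_positions, ordered_instances = order_for_insertion_sketch(
    [(2, 2), (6, 6)], ["CC", "GG"])

extended = "AAAAAAAA"
for (start, _), motif_instance in zip(ordered_positions, ordered_instances):
    # Later coordinates are handled first, so earlier ones stay valid
    extended = extended[:start] + motif_instance + extended[start:]
```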

generate_motif_in_sequence(sequence, motif_instances, positions)

Function to insert all motif_instances into a background

Return type:

str

Parameters
sequence: str

String of a background sequence to insert motif_instances to

motif_instances: List[str]

List of motif instance sequences

positions: List[Tuple[int]]

List of (start, end) position tuples.

Return
motif_in_sequence: str

Sequence with motif instances inserted

generate_probabilistic_motif_in_sequence(b_alphabet, sequence_prob, motifs, motif_ids, orientation_list, positions)

Function to insert motifs into a probabilistic background sequence

Return type:

ndarray

Parameters
b_alphabet: str

Background alphabet, default is “ACGT”

sequence_prob: np.ndarray

Background sequence position-specific probabilities

motifs: Motifs

Data class for motifs with names (key), PPM, alphabet and alphabet pairs

motif_ids: List[str]

List of motif IDs to insert

orientation_list: List[int]

Mask for instances. List of 0s and 1s, where 0 means keeping the orientation, 1 means reverse complementing the motif instance.

positions: List[Tuple[int]]

List of (start, end) position tuples.

Return
probabilistic_motif_in_sequence: np.ndarray

Letter probabilities in sequence with motif instances inserted

set_to_replace(to_replace)

Set whether to replace background letters with motif instances

Return type:

None

Parameters
to_replace: bool

Value whether to replace background letters with motif instances. If false: insert the instances and extend the sequence

Utility classes

class inmotifin.Reader

IO methods for reading motifs, groups and backgrounds

convert_jaspardict_to_ppm(pyjaspar_out)

Convert dictionary of pyjaspar to numpy array

Return type:

ndarray

Parameters

pyjaspar_out: Dict[str, List[float]]

Dictionary in the form of {‘A’: [1,1,1], ‘C’:[1,1,1], etc}

Return

motif_ppm: np.ndarray

Motif as ppm in numpy array format (col: ACGT, row: value per position)
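A conversion of this kind can be sketched with NumPy by stacking the per-letter lists and transposing, so that rows are positions and columns follow the alphabet order. Normalising each row to sum to 1 is an assumption made here to obtain probabilities from counts; `jaspardict_to_ppm_sketch` and its `alphabet` argument are hypothetical.

```python
from typing import Dict, List

import numpy as np


def jaspardict_to_ppm_sketch(
        pyjaspar_out: Dict[str, List[float]],
        alphabet: str = "ACGT") -> np.ndarray:
    """Turn a per-letter dictionary into a (length, alphabet) probability array."""
    # One row per letter, then transpose to one row per position
    counts = np.array(
        [pyjaspar_out[letter] for letter in alphabet], dtype=float).T
    return counts / counts.sum(axis=1, keepdims=True)


toy_ppm_from_dict = jaspardict_to_ppm_sketch(
    {"A": [1, 1, 1], "C": [1, 1, 1], "G": [1, 1, 1], "T": [1, 1, 1]})
```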

fetch_motif_from_jaspar(mfile, jaspar_db_version='JASPAR2024')

Fetch motifs from JASPAR database

Return type:

Tuple[Dict[str, ndarray], str]

Parameters

mfile: str

List of motif JASPAR IDs of interest

jaspar_db_version: str

Version of the JASPAR database. Defaults to JASPAR2024

Return

motifs: Dict[str, np.ndarray]

Motifs with ID and ppm

alphabet: str

Letters of the input alphabet

read_fasta(fasta_files)

Read fasta files into a dictionary of identifiers and sequences

Return type:

Dict[str, str]

Parameters

fasta_files: List[str]

Path to the files

Return

sequences: Dict[str, str]

Dictionary of identifiers and sequences

read_in_motifs(motif_files, jaspar_db_version)

Select reader by identifying the format and read in motif

Return type:

Tuple[Dict[str, ndarray], str]

Parameters

motif_files: List[str]

List of files with motifs in jaspar or meme format

jaspar_db_version: str

Version of the JASPAR database. Used when motif IDs are specified.

Return

my_motifs: Dict[str, np.ndarray]

Dictionary of motifs with ID as key and ppm as value

alphabet: str

Alphabet read from file. Only one is supported per run.

read_jaspar(mfile)

Read motif in jaspar format with Bio.motifs

Return type:

Tuple[Dict[str, ndarray], str]

Parameters

mfile: str

Path to the file

Return

motifs_in: Dict[str, np.ndarray]

Motifs with ID and ppm

alphabet: str

Letters of the input alphabet

read_meme(mfile)

Read motif in meme format

Return type:

Tuple[Dict[str, ndarray], str]

Parameters

mfile: str

Path to the file

Return

motifs_in: Dict[str, np.ndarray]

Motifs with ID and ppm

alphabet: str

Letters of the input alphabet

read_motif(mfile, jaspar_db_version=None)

Read motif from inferred format

Return type:

Dict[str, ndarray]

Parameters

mfile: str

Path to the file

jaspar_db_version: str

Version of the JASPAR database. Used when motif IDs are specified. Defaults to None.

Return

motifs_in: Dict[str, np.ndarray]

Motifs with ID and PWM

read_multimerisation_tsv(multimerisation_rule_path)

Read a TSV with two (optionally three) columns of comma-separated lists: motif IDs (List[str]), distances (List[int]) and, optionally, weights (List[float])

Return type:

Dict[str, Tuple[List[str], List[int], List[float]]]

Parameters

multimerisation_rule_path: str

Path to a TSV with two (optionally three) columns of comma-separated lists: motif IDs (List[str]), distances (List[int]) and, optionally, weights (List[float])

Return

multimer_rules: Dict[str, Tuple[List[str], List[int], List[float]]]

Dictionary of IDs and tuple of motif ID and pairwise distances

read_tsv_to_pandas(pandas_dftsv_path)

Read in tsv exported by pandas or tsv looking like that

Return type:

DataFrame

Parameters

pandas_dftsv_path: str

Path to a TSV file whose first column holds the index values of the pandas dataframe

Return

df_from_tsv: pd.DataFrame

Pandas dataframe from the provided TSV file

read_twocolumn_tsv(twocolumns_tsv_path)

Read tsv of two columns, second is a comma separated list or a single value

Return type:

Dict[str, List[str]]

Parameters

twocolumns_tsv_path: str

Path to a TSV file with two columns. First column must have a single value.

Return

two_columns: Dict[str, List[str]]

Dictionary of the values of the two columns, where the key is the value of the first column and the value of the dictionary is the value(s) of the second column.
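The expected layout of such a two-column TSV can be illustrated with the standard library `csv` module. The sketch reads from an open text handle instead of a path so that it runs without files on disk; `read_twocolumn_tsv_sketch` and the example rows are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_twocolumn_tsv_sketch(handle: TextIO) -> Dict[str, List[str]]:
    """Map the first column to the comma-separated values of the second."""
    two_columns = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip empty or incomplete lines
        two_columns[row[0]] = row[1].split(",")
    return two_columns


example_tsv = io.StringIO("group_0\tm1,m2\ngroup_1\tm3\n")
parsed = read_twocolumn_tsv_sketch(example_tsv)
```

A single value in the second column becomes a list with one element, which keeps the return type uniform.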

class inmotifin.Writer(workdir, title)

IO methods for saving simulated motifs, groups and backgrounds

Class parameters

title: str

Title of the analysis

workdir: str

Directory of the analysis

outfolder: str

A subfolder in the workdir with the same name as the title

data_to_bed(dagsim_data)

Save bed files with motif coordinates.

Return type:

None

Parameters
dagsim_data: Dict[str, Any]

dictionary from the Dagsim output

dict_of_dict_to_json(counts_dict, filename)

Save counts dictionary to json

Return type:

None

Parameters
counts_dict: Dict[str, Dict[str, int]]

Dictionary of names, values and counts

filename: str

Output file name prefix. Json will be appended and saved to outfolder with title prefix.

dict_to_fasta(seq_dict, filename)

Export data in dict format to a fasta file. One entry = one line

Return type:

None

Parameters
seq_dict: Dict[str, str]

Sequence info and actual sequence in a dictionary format

filename: str

Output file name prefix. Fa will be appended and saved to outfolder with title prefix.

dict_to_tsv(data_dict, filename)

Export data in dict format to a tsv file. One entry = one line

Return type:

None

Parameters
data_dict: Dict[Any, List[Any]]

any type of data in a dictionary format

filename: str

File name prefix. TSV will be appended and saved to outfolder with title prefix.

get_outfolder()

Getter for name of folder where output is saved

Return type:

str

Return
outfolder: str

A subfolder in the workdir with the same name as the title

get_title()

Getter for the analysis title

Return type:

str

Return
title: str

Title of the analysis

list_to_file(data_list, filename, file_format='txt')

Export data in list format to a file. One entry = one line

Return type:

None

Parameters
data_list: List[Any]

Any type of data in a list format

filename: str

Output file name (without extension)

file_format: str

Format of the output file. Defaults to txt

motif_to_meme(motifs, alphabet, file_prefix)

Save motif position probability matrices in meme format

Return type:

None

Parameters
motifs: Dict[str, np.ndarray]

the IDs and ppms of the simulated motifs

alphabet: str

Characters in the order of column assignment (eg ACGT)

file_prefix: str

Output file name prefix. Meme will be appended and saved to outfolder with title prefix.

motived_and_plain_to_fasta(dagsim_data, no_motif_for_fasta, no_motif_prob)

Save fasta files with or without motived sequences.

Return type:

None

Parameters
dagsim_data: Dict[str, Any]

dictionary from the Dagsim output

no_motif_for_fasta: List[str]

List of selected (not necessarily unique) background IDs and sequences without inserted motif

no_motif_prob: List[np.ndarray]

List of corresponding sequence probabilities

pandas_to_tsv(dataframe, filename)

Write pandas dataframe to file

Return type:

None

Parameters
dataframe: pd.DataFrame

Data to save

filename: str

File name prefix. TSV will be appended and saved to outfolder with title prefix.

save_dagsim_data(dagsim_data, nomotif_in_seq, nomotif_prob)

Save dagsim data and unmotived sequences to bed and fasta files

Return type:

None

Parameters
dagsim_data: Dict[str, Any]

dictionary from the Dagsim output

nomotif_in_seq: List[str]

List of background ids and sequences without inserted motif

nomotif_prob: List[np.ndarray]

List of corresponding sequence probabilities

save_dictionary_with_numpy_to_npz(numpy_dict, filename)

Save dictionary with numpy arrays into npz format

Return type:

None

Parameters
numpy_dict: Dict[str, np.ndarray]

Dictionary with string keys and any dimensional numpy arrays as values

filename: str

File name which will get the .npz extension.

setup_dirs()

Create workfolder and outfolder if they do not exist yet

Return type:

None

Utility functions

inmotifin.onehot_to_str(alphabet, motif_onehot)

Convert one-hot encoded motif into a string motif

Return type:

str

Parameters

alphabet: List[chr]

Allowed characters in the sequence (eg [A, C, G, T] or ‘ACGT’)

motif_onehot: List[np.array]

One-hot encoded motif

Return

motif: str

Motif in string format
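Decoding a one-hot motif means taking, for every position, the letter at the index of the single 1. The following is a sketch of that idea with a hypothetical function name, not the package's own code.

```python
from typing import List, Sequence

import numpy as np


def onehot_to_str_sketch(
        alphabet: Sequence[str], motif_onehot: List[np.ndarray]) -> str:
    """Pick the letter at the hot index of every position."""
    return "".join(
        alphabet[int(np.argmax(position))] for position in motif_onehot)


decoded = onehot_to_str_sketch(
    "ACGT",
    [np.array([1, 0, 0, 0]), np.array([0, 0, 1, 0]), np.array([0, 0, 0, 1])])
```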

inmotifin.create_reverse_complement(alphabet, motif_instance)

Translate sequence to its reverse complement. Case sensitive

Return type:

str

Parameters

alphabet: Dict[chr, chr]

Pairs of characters and their complementary pairs e.g. {‘A’:’T’, ‘C’:’G’, ‘G’:’C’, ‘T’:’A’}

motif_instance: str

Motif sequence

Return

revcomp: str

Reverse complement of motif sequence
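Reverse complementing a string with a pair dictionary can be sketched in one line: read the sequence backwards and look up the partner of each letter. Because the lookup is case sensitive, a lower-case letter raises a `KeyError` unless it is present in the dictionary. `reverse_complement_sketch` is a hypothetical name.

```python
from typing import Dict


def reverse_complement_sketch(
        alphabet: Dict[str, str], motif_instance: str) -> str:
    """Reverse the sequence and swap each letter for its partner."""
    return "".join(alphabet[letter] for letter in reversed(motif_instance))


dna_pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
revcomp = reverse_complement_sketch(dna_pairs, "AACG")
```

Applying the function twice returns the original sequence.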

inmotifin.define_complementary_map_motif_array(alphabet, alphabet_pairs)

Translate index of alphabet letter pair for column permutation

Return type:

List[int]

Parameters

alphabet: str

Alphabet in the order of motif numpy array columns

alphabet_pairs: Dict[chr, chr]

Pairs of characters and their complementary pairs e.g. {‘A’:’T’, ‘C’:’G’, ‘G’:’C’, ‘T’:’A’}

Return

complementary_idx: List[int]

Index of the partner of the letter

inmotifin.create_reverse_complement_motif(motif, complementary_idx)

Reverse complement a motif PPM using the index of the partner of each letter

Return type:

ndarray

Parameters

motif: np.ndarray

PPM of a motif in shape (len, alphabet)

complementary_idx: List[int]

Index of the partner of the letter

Return

oriented_motif: np.ndarray

PPM of a reverse complemented motif in shape (len, alphabet)
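For a PPM, reverse complementing combines two array operations: reversing the row order (positions) and permuting the columns so that each letter takes the probabilities of its partner. The sketch below pairs a column map with that operation; both function names are hypothetical and the code is an illustration rather than the package's implementation.

```python
from typing import Dict, List

import numpy as np


def complementary_idx_sketch(
        alphabet: str, alphabet_pairs: Dict[str, str]) -> List[int]:
    """Column index of the complementary letter for every alphabet letter."""
    return [alphabet.index(alphabet_pairs[letter]) for letter in alphabet]


def reverse_complement_motif_sketch(
        motif: np.ndarray, complementary_idx: List[int]) -> np.ndarray:
    """Reverse the positions and swap the columns of complementary letters."""
    return motif[::-1, complementary_idx]


pair_idx = complementary_idx_sketch(
    "ACGT", {"A": "T", "C": "G", "G": "C", "T": "A"})
# Motif "AC" as a deterministic PPM of shape (len, alphabet)
motif_ac = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
motif_revcomp = reverse_complement_motif_sketch(motif_ac, pair_idx)
```

The result encodes "GT", the reverse complement of "AC".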