ChemEvolve¶
ChemEvolve is a Python package which provides a suite of chemical evolution tools. ChemEvolve is designed to allow for the rapid development and exploration of a chemical reaction systems with standardized input and outputs. CoreEvolve (currently) utilizes the Simplified Stochastic or Gillispie Algorithm to exactly simulate chemical systems.
ChemEvolve provides a standard format to specify chemical reaction systems (including the rate constants, catalytic effects, and a choice of propensity functions). It also allows users to simulate these reaction systems in spatially explicit models (e.g. solve Reaction-Diffusion Equations). All ouptut files are saved in Tidy Data format, to allow for easy plotting using standard data visualization tools ( e.g. Seaborn , or R )
Requirements¶
ChemEvolve requires Python 2.7.x, with NumPy, pandas, and both matplotlib and Seaborn for plotting and visualization.
Installation¶
To install, first download and install the package using pip,
$ pip install chemevolve
Then just throw this import line into the top of your python script:
import chemevolve as ce
That’s it! You should be good to go!
Tutorial¶
This tutorial is intended to provide some basic instruction for using ChemEvolve to evaluate Chemical Reaction Systems. For clarity this tutorial is seperated into three sections: 1) setting up a chemical reaction system, 2) specifying some initial conditions, 3) time evolving the system, and 4) plotting the output.
Setting up a chemical reaction system (CRS)
This is the most conceptually important part of using ChemEvolve. Setting up the CRS involves specifying the molecules which can exist in the system as well as the reactions the molecules are involved in. Each molecule must be given an ID numer (which is just an integer), as well as a name (which is a string). Each reaction must be assigned a set of reactants, reactant coefficients, products, product coefficients, a rate constant, any catalysts, and a propensity function.
ChemEvolve has three important objects: Reactions, which contain information about an individual chemical reaction, and CRS which contain lists of reactions as well as information about the molecules involved in those reactions, and an array which contains the abundance of each molecule. First lets think about Reaction and CRS objects.
As an illustrative example, let’s consider a system where all the molcules are made up of strings of As and Bs. Let’s consider all possible strings of As and Bs up to length 6. Let’s say all strings can form by concatenating smaller strings, and all strings can fragment at any location. Let’s say that the concatenation reactions happen with a propensity proportional to \(k_l [x_i][x_j]\), where \([x_i]\) is the concentration of one of the reactants and \(k_l\) is the rate constant associated with concatanation reactions. We can also say that dissocation reactions happen with a propensity proportional to \(k_d [x_i]\), where \(k_d\) is now the rate constant associated with dissocation reactions. For simplicity let’s assume that \(k_d\), is the same for all molecules, and \(k_l\) is the same for all assoication reactions.
How do we make this reaction system? Luckily ChemEvolve has a function to do it for us! It’s called
generate_all_binary_reactions
.You can check out the details of the function if you want, but using it is fairly simple, it’s under theBinaryPolyer
module, so we’ll import that. The propensities described above are what ChemEvolve considers ‘standard’ propensity functions, e.g. the reaction propensity depends linearly on the reactants involved and needs only one postive real valued rate constant. So for our example, let’s assume \(k_l = 0.001\), and \(k_d = 1.0\). Then generating a CRS for this system is as easy asimport chemevolve as ce from chemevolve import BinaryPolymer as BinPoly max_length = 6 # Maximum Length of binary polymers k_l = 0.001 # Forward (ligation) Reaction Rate constant k_d = 1.0 # Backward (degradation) Reaction Rate constant CRS = BinPoly.generate_all_binary_reactions(max_length, fconstant = k_l, bconstant = k_d)Specifying initial conditions
Once you’ve generated a CRS (either from a file or a function), you’ll need to specify the initial concentrations of molecules. ChemEvolve stores abundance information in a 3D array. The first two indicies are used to indicate spatial positions, x and y. The third index specifies the molecule of interest. For now, let’s just consider a well mixed system, so that all molecules are located in the same position. It makes sense to call the array storing concentration information,
concentrations
, but you could make it something different if you wanted.N_L = 1 # Size of the lattice (diffusion not yet implemented, fix at 1) total_mass = 10000 num_molecules = len(CRS.molecule_list) # Initialize array concentrations = np.zeros( (N_L, N_L, num_molecules), dtype = np.float64 ).We’ll assume that we start with an equal abundance of monomers in the system and nothing else. Since this is a binary polymer model, there’s only two monomer types and they’ll always be stored in the first two indices of the molecule list. However, we can always find the indices of a molecule, from the
molecule_dict
dictionary assoicated with the CRS. To find the indices of the monomers and the initial conditions to be half of the total mass we’ll doAindex = CRS.molecule_dict['A'] Bindex = CRS.molecule_dict['B'] # Set monomer values concentrations[0,0,Aindex] = total_mass/2.0 concentrations[0,0,Bindex] = total_mass/2.0.Remember, even though we don’t have any spatial extant in this model, we need to index the position, which is
x= 0, y = 0
.Time Evolving and Outputting Data
Now that we’ve got the
CRS
and the initial conditions set, we can time evolve the system using the Gillispie Algorithm! To do this we’ll use a function calledSSA_evolve
. In order to use this functions, we’ll need to know who long we want the system to evolve for, the simulation will run between timestau
andtau_max
. We’ll also need to set the random number generator seed, this is important for making sure different realizations of the time evolution are actually different. Even though the Gillispie Algorithm is stochasitic, it’s powered by pseudo random number generators and we should make sure they are seeded differently, which is why you have to provide it toSSA_evolve
. If we just want to update theconcentrations
array due to the time evolution, that’s all we’ll need. However, if we wantSSA_evolve
to output time series data (and who wouldn’t want that!?) we’ll need to provide it with two more parameters, a string calledoutput_prefix
:, and a float calledt_out
. The string tellsSSA_evolve
where to save our data and what to call it, your time series data will be saved as:output_prefix+ '_time_series_df.csv'
, so you can include any path or naming scheme you want.t_out
specifies how frequently the data should be output. Keep in mind due to the stochastic nature of the Gillispie Algorithm you’ll only output data approximately that frequently. Here’s an example of using this all together.### Time Evolve tau = 0.0 # Set current time tau_max = 1.0 # Total Integration Interval t_output = 0.1 # How frequently should data be output freq_counter = 0.0 # Counts which output is coming next rand_seed = 1337 # Seed for random number generator output_prefix = 'tutorial_data' # This is a string that will procceed all file names assoicated with the outputs of this simulation concentrations = ce.SSA_evolve(tau, tau_max, concentrations, CRS, rand_seed, output_prefix = output_prefix, t_out = t_output)During the computation
SSA_evolve
will save temporary files in your directory, when it is done with the time evolution it will automatically save your data in the file ‘tutorial_data_time_series_df.csv’, since we provided it with the output prefix ‘tutorial_data’.Plotting the output
All the outputs of ChemEvolve come in Tidy Data formats. This is done after the simulation is completed, so if you termiated the computation early the data will not be in the correct format. ChemEvolve has a few functions built in to plot the data, including looking at the molecule distribution, length distribution, and the time series of particular molcules. They are included in the
Plotting module
. Here’s an example of plotting the length distribution, then the molecule distribution, and then the time series abundance of the monomers:from chemevolve import Plotting as Plots Plots.plot_length_distribution(output_prefix + '_time_series_df.csv') Plots.plot_molecule_distribution(output_prefix + '_time_series_df.csv') Plots.plot_time_series(output_prefix + '_time_series_df.csv', ['A', 'B'])
Indices and tables¶
-
InitializeFunctions.
convert_CRS_to_npArrays
(CRS)[source]¶ This function converts ChemEvolve CRS objects into 4 numpy arrays. This allows the information to be easily passed to C.
- Input:
- CRS: Chemical Reaction System Object
- Output:
- constants: np double array with reaction constant values
- propensity_ints: np int32 array with integer identifying which propensity function to use
- reaction_arr: np int32 array with integers indicating the stochimetry of each reaction.
- Each row is a reaction, each column is a molecule. Reactants are negative, products are postive
- catalyst_arr: np double array with the effect of each molecule for each reaction.
- Rows are reactions, columns are molecule, non-zero value indicate catalysis
-
InitializeFunctions.
create_concentration_files
(file_prefix, N_L, molecules, concentrations, coordinates, dimensions=2)[source]¶ Create a concentration file to load as a np array dimensions - int, number of spatial dimensions N_L - int, size of one side of regular lattice molecules - list of strings representing molecules concentrations - list of ints representing abdunace of molecule at a particular site coordinates - list of tuples, coordinates in the lattice
-
InitializeFunctions.
create_concentration_files_old
(file_prefix, N_L, molecules, concentrations, coordinates, dimensions=2)[source]¶ Create a concentration file to load as a np array dimensions - int, number of spatial dimensions N_L - int, size of one side of regular lattice molecules - list of strings representing molecules concentrations - list of ints representing abdunace of molecule at a particular site coordinates - list of tuples, coordinates in the lattice
-
InitializeFunctions.
create_reaction_system
(filename)[source]¶ Reads a saved Reaction System and returns the CRS object
- Arguements:
- filename: the path to the saved Reaction System file (str)
- Returns:
- CRS: Reaction System Object of the CRS class.
-
InitializeFunctions.
get_c_pointers
(concentrations, constants, propensity_ints, reaction_arr, catalyst_arr)[source]¶ This function returns the C pointers to the input arrays. Pointers must be pasted to SSA library functions
- Arguments:
- concentrations: array of doubles containing molecule abundances
- constants: array of np doubles containing reaction constants
- propensity_ints: array of np int32 containing propesity integer codes
- reaction_arr: np int32 array with integers indicating the stochimetry of each reaction.
- Each row is a reaction, each column is a molecule. Reactants are negative, products are postive
- catalyst_arr: np double array with the effect of each molecule for each reaction.
- Rows are reactions, columns are molecule, non-zero value indicate catalysis
- Return:
- concentrations_ptr: points to concentration array
- constants_ptr: points to constants array
- propensity_ints_ptr: points to propensity_ints array
- reaction_arr_ptr: points to reaction_arr array
- catalyst_arr_ptr: points to catalyst_arr
-
InitializeFunctions.
read_concentration_files
(file_prefix)[source]¶ Opens a concentration file (.dat pickle list) and a molecule list file returns the numpy array with concentrations the last index identifying the molecular species the molecules list maps array indexes to molecular identities
-
InitializeFunctions.
read_concentration_files_old
(file_prefix)[source]¶ Opens a concentration file (.npy array) and a molecule list file returns the numpy array with concentrations the last index identifying the molecular species the molecules list maps array indexes to molecular identities
-
PropensityFunctions.
calculate_propensities
(CRS, concentrations, **kwargs)[source]¶ Calculate the propensity of a reaction according to the concentrations and propensity function
- Arguements:
- CRS: CRS object
- concentrations: array of molecule concentrations indexed by (position,ID)
- propensity_function: function option used to calculate reaction propensities
- Return:
- propensity_arr: an array of floats giving the total reaction propensity at each point in the system
-
PropensityFunctions.
replicator_composition_propensity_envMutation
(rxn, CRS, concentrations, mu=0.001)[source]¶ - Replication Propensity function calculates propensity as the concentrations of the replicator and the composition of the enivornment
- This propensity function calcuates the mutation propensity as a function of the resources availible
- Arguements:
- rxn: Reaction object
- CRS: CRS object for system
- concentrations: list of molecule concentrations indexed by ID
- Return:
- Ap: float, propensity of rxn given the current concentrations
-
PropensityFunctions.
standard_propensity
(rxn, CRS, concentrations)[source]¶ Standard Propensity function calculates propensity as the concentrations of the reactants raised to their coefficients
- Arguements:
- rxn: Reaction object
- CRS: CRS object for system
- concentrations: list of molecule concentrations indexed by ID
- Return:
- Ap: float, propensity of rxn given the current concentrations
-
OutputFunctions.
tidy_timeseries
(molecules, prefix, delete_dat=True)[source]¶ This function replaces the seperate time-series .dat files with a tidy dataframe containing all the time series data
- Arguements:
- prefix: string which is the prefix used to save the time-series data files
- delete_dat (optional): if False the .npy files will be saved, otherwise they will be deleted, default: True
-
class
CoreClasses.
CRS
(molecule_list=[], molecule_dict={}, reactions=[])[source]¶ A chemical reaction system comprising a list of molecules, and allowed reactions molecule_list contains strings, indexing molecules by their ID molecule_dict maps molecule strings to IDs (indices) reactions contains a list of reaction objects
- Attributes:
- molecule_list: a list of molecule strings, indexed by their ID (which is an int)
- molecule_dict: a dictionary that maps molecule strings to IDs
- reactions: a list of all reaction objects allowed in the Reaction System, indexed by their ID
-
class
CoreClasses.
Reaction
(ID, reactants=[], reactant_coeff=[], products=[], product_coeff=[], constant=1.0, catalysts=[], catalyzed_constants=[], prop=”)[source]¶ Reaction object containing an ID and reaction members (IDs not strings) as well as reaction coefficients and rate constants reactants, and products lists should contain integers which are the IDs of the reactant molecules and the product molecules reactant_coeff and product_coeff contain ints which are the reaction coefficients. The indices must match the corresponding ID lists
- Attributes:
- ID: unique int which indentifies the reaction
- reactants: a list of molecule IDs which stores the reactants
- reactant_coeff: a list of integers which contains the reactant coefficients (if reactant_coeff is not provided all reactants are assumed to have coefficients of 1)
- products: a list of molecule IDs which stores the products
- product_coeff: a list of integers which contains the product_coeff coefficients
- constant: a floating point number, the reaction rate constant
- catalysts: a list of molecule IDs which specify which molecules catalyze this reaction
- catalyzed_constants: a list of floating point numbers giving the catalytic effect of each molecule
-
Plotting.
plot_length_distribution
(filename, savename=None)[source]¶ Plots the time averaged length distribution of molecules in the entire system. Shows plot unless a savename is specified
-
Plotting.
plot_molecule_distribution
(filename, savename=None)[source]¶ Plots the time averaged molecule distribution of molecules in the entire system. Shows plot unless a savename is specified
-
Plotting.
plot_time_series
(filename, print_molcules, savename=None)[source]¶ Plot the time series of molecule names in print_molcules, if savename provide figure will be saved
-
BinaryPolymer.
check_mass
(original_mass, CRS, concentrations)[source]¶ Checks conservation of mass Arguements
- original_mass: integer mass of the original system
- CRS: CRS object
- concentrations: array of molecule abundances
-
BinaryPolymer.
generate_all_binary_reactions
(max_length, fconstant=1.0, bconstant=None)[source]¶ Generates all possible reactions between binary molecules up to size max_length assigns all reactions the same constant and standard prorpenisty Arguements:
- max_length: the maximum length of polymers
- fconstant: reaction rate constant for forward reactions
- bconstant: reaction rate constant for reverse reactions, if None given, it will be assigned to be equal to fconstant
-
BinaryPolymer.
generate_random_rate_polymerization_reactions
(output_name, max_length, fconstant=1.0, bconstant=None, fdis_type=’exp’, bdis_type=’exp’)[source]¶ Generates all possible polymerization reactions for molecules up to size max_length assigns all reactions a random constant and standard prorpenisty Arguements:
- output_name: filename to save as output (should be .txt)
- max_length: the maximum length of polymers
- fconstant: mean reaction rate constant for forward reactions
- bconstant: mean reaction rate constant for reverse reactions, if None given, it will be assigned to be equal to fconstant
-
BinaryPolymer.
generate_wim_RAF
(max_length, fconstant=1.0, bconstant=None, f_cat=1.0)[source]¶ Generates all possible reactions between binary molecules up to size max_length assigns all reactions the same constant and standard prorpenisty Arguements:
- max_length: the maximum length of polymers
- fconstant: reaction rate constant for forward reactions
- bconstant: reaction rate constant for reverse reactions, if None given, it will be assigned to be equal to fconstant
Grant Support¶
This project is supported in part by a grant provided by the Templeton World Charity Foundation as part of the Power Of Information Initiative.