API reference

Symbolic Regressor

class gplearn.genetic.SymbolicRegressor(population_size=1000, generations=20, tournament_size=20, stopping_criteria=0.0, const_range=(-1.0, 1.0), init_depth=(2, 6), init_method='half and half', function_set=('add', 'sub', 'mul', 'div'), metric='mean absolute error', parsimony_coefficient=0.001, p_crossover=0.9, p_subtree_mutation=0.01, p_hoist_mutation=0.01, p_point_mutation=0.01, p_point_replace=0.05, max_samples=1.0, feature_names=None, warm_start=False, low_memory=False, n_jobs=1, verbose=0, random_state=None)[source]

A Genetic Programming symbolic regressor.

A symbolic regressor is an estimator that begins by building a population of naive random formulas to represent a relationship. The formulas are represented as tree-like structures with mathematical functions being recursively applied to variables and constants. Each successive generation of programs is then evolved from the one that came before it by selecting the fittest individuals from the population to undergo genetic operations such as crossover, mutation or reproduction.
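
A minimal usage sketch on toy data (the settings, data and target formula here are illustrative only, not recommendations):

    import numpy as np
    from gplearn.genetic import SymbolicRegressor

    rng = np.random.RandomState(0)
    X = rng.uniform(-1., 1., (100, 2))
    y = X[:, 0] ** 2 - X[:, 1] + 0.5   # hidden ground-truth relationship

    est = SymbolicRegressor(population_size=500, generations=10,
                            function_set=('add', 'sub', 'mul'),
                            random_state=0)
    est.fit(X, y)
    print(est._program)        # the best evolved formula
    print(est.predict(X[:5]))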

Parameters:
population_size : integer, optional (default=1000)

The number of programs in each generation.

generations : integer, optional (default=20)

The number of generations to evolve.

tournament_size : integer, optional (default=20)

The number of programs that will compete to become part of the next generation.

stopping_criteria : float, optional (default=0.0)

The metric value required to stop evolution early.

const_range : tuple of two floats, or None, optional (default=(-1., 1.))

The range of constants to include in the formulas. If None then no constants will be included in the candidate programs.

init_depth : tuple of two ints, optional (default=(2, 6))

The range of tree depths for the initial population of naive formulas. Individual trees will randomly choose a maximum depth from this range. When combined with init_method=’half and half’ this yields the well-known ‘ramped half and half’ initialization method.

init_method : str, optional (default=’half and half’)
  • ‘grow’ : Nodes are chosen at random from both functions and terminals, allowing for smaller trees than init_depth allows. Tends to grow asymmetrical trees.
  • ‘full’ : Functions are chosen until the init_depth is reached, and then terminals are selected. Tends to grow ‘bushy’ trees.
  • ‘half and half’ : Trees are grown through a 50/50 mix of ‘full’ and ‘grow’, making for a mix of tree shapes in the initial population.
function_set : iterable, optional (default=(‘add’, ‘sub’, ‘mul’, ‘div’))

The functions to use when building and evolving programs. This iterable can include strings naming the built-in functions outlined below, as well as your own functions built using the make_function factory from the functions module (see the configuration sketch after the list of built-ins).

Available individual functions are:

  • ‘add’ : addition, arity=2.
  • ‘sub’ : subtraction, arity=2.
  • ‘mul’ : multiplication, arity=2.
  • ‘div’ : protected division where a denominator near-zero returns 1., arity=2.
  • ‘sqrt’ : protected square root where the absolute value of the argument is used, arity=1.
  • ‘log’ : protected log where the absolute value of the argument is used and a near-zero argument returns 0., arity=1.
  • ‘abs’ : absolute value, arity=1.
  • ‘neg’ : negative, arity=1.
  • ‘inv’ : protected inverse where a near-zero argument returns 0., arity=1.
  • ‘max’ : maximum, arity=2.
  • ‘min’ : minimum, arity=2.
  • ‘sin’ : sine (radians), arity=1.
  • ‘cos’ : cosine (radians), arity=1.
  • ‘tan’ : tangent (radians), arity=1.
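
For example, a configuration sketch passing a larger set of built-in function names (custom functions are covered under User-Defined Functions below):

    from gplearn.genetic import SymbolicRegressor

    est = SymbolicRegressor(function_set=('add', 'sub', 'mul', 'div',
                                          'sqrt', 'log', 'sin', 'cos'))
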
metric : str, optional (default=’mean absolute error’)

The name of the raw fitness metric. Available options include:

  • ‘mean absolute error’.
  • ‘mse’ for mean squared error.
  • ‘rmse’ for root mean squared error.
  • ‘pearson’ for Pearson’s product-moment correlation coefficient.
  • ‘spearman’ for Spearman’s rank-order correlation coefficient.

Note that ‘pearson’ and ‘spearman’ will not directly predict the target but could be useful as value-added features in a second-step estimator. This approach generates one engineered feature at a time; using the SymbolicTransformer instead allows creation of multiple features at once.

parsimony_coefficient : float or “auto”, optional (default=0.001)

This constant penalizes large programs by adjusting their fitness to be less favorable for selection. Larger values penalize programs more heavily, which can control the phenomenon known as ‘bloat’. Bloat is when evolution increases the size of programs without a significant increase in fitness, which is costly for computation time and makes for a less understandable final result. This parameter may need to be tuned over successive runs.

If “auto” the parsimony coefficient is recalculated for each generation using c = Cov(l, f) / Var(l), where Cov(l, f) is the covariance between program size l and program fitness f in the population, and Var(l) is the variance of program sizes.
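
As a sketch of the “auto” recalculation (an illustration of the formula only, not gplearn’s internal code; the two arrays are hypothetical per-program statistics for one generation):

    import numpy as np

    program_lengths = np.array([12., 7., 25., 9., 18.])
    program_fitnesses = np.array([0.41, 0.55, 0.38, 0.52, 0.44])

    # c = Cov(l, f) / Var(l)
    cov_lf = np.cov(program_lengths, program_fitnesses)[0, 1]
    var_l = np.var(program_lengths, ddof=1)  # ddof=1 matches np.cov's default
    c = cov_lf / var_l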

p_crossover : float, optional (default=0.9)

The probability of performing crossover on a tournament winner. Crossover takes the winner of a tournament and selects a random subtree from it to be replaced. A second tournament is performed to find a donor. The donor also has a subtree selected at random and this is inserted into the original parent to form an offspring in the next generation.

p_subtree_mutation : float, optional (default=0.01)

The probability of performing subtree mutation on a tournament winner. Subtree mutation takes the winner of a tournament and selects a random subtree from it to be replaced. A donor subtree is generated at random and this is inserted into the original parent to form an offspring in the next generation.

p_hoist_mutation : float, optional (default=0.01)

The probability of performing hoist mutation on a tournament winner. Hoist mutation takes the winner of a tournament and selects a random subtree from it. A random subtree of that subtree is then selected and this is ‘hoisted’ into the original subtree’s location to form an offspring in the next generation. This method helps to control bloat.

p_point_mutation : float, optional (default=0.01)

The probability of performing point mutation on a tournament winner. Point mutation takes the winner of a tournament and selects random nodes from it to be replaced. Terminals are replaced by other terminals and functions are replaced by other functions that require the same number of arguments as the original node. The resulting tree forms an offspring in the next generation.

Note: The above genetic operation probabilities must sum to less than one. The balance of probability is assigned to ‘reproduction’, where a tournament winner is cloned and enters the next generation unmodified.
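
With the default settings, for example, 0.9 + 0.01 + 0.01 + 0.01 = 0.93, leaving a 0.07 probability that a tournament winner is reproduced unchanged.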

p_point_replace : float, optional (default=0.05)

For point mutation only, the probability that any given node will be mutated.

max_samples : float, optional (default=1.0)

The fraction of samples to draw from X to evaluate each program on.

feature_names : list, optional (default=None)

Optional list of feature names, used purely for representations in the print operation or export_graphviz. If None, then X0, X1, etc. will be used for representations.

warm_start : bool, optional (default=False)

When set to True, reuse the solution of the previous call to fit and add more generations to the evolution, otherwise, just fit a new evolution.

low_memory : bool, optional (default=False)

When set to True, only the current generation is retained. Parent information is discarded. For very large populations or runs with many generations, this can result in substantial memory use reduction.

n_jobs : integer, optional (default=1)

The number of jobs to run in parallel for fit. If -1, then the number of jobs is set to the number of cores.

verbose : int, optional (default=0)

Controls the verbosity of the evolution building process.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

References

[1] J. Koza, “Genetic Programming”, 1992.
[2] R. Poli, et al. “A Field Guide to Genetic Programming”, 2008.
Attributes:
run_details_ : dict

Details of the evolution process. Includes the following elements:

  • ‘generation’ : The generation index.
  • ‘average_length’ : The average program length of the generation.
  • ‘average_fitness’ : The average program fitness of the generation.
  • ‘best_length’ : The length of the best program in the generation.
  • ‘best_fitness’ : The fitness of the best program in the generation.
  • ‘best_oob_fitness’ : The out-of-bag fitness of the best program in the generation (requires max_samples < 1.0).
  • ‘generation_time’ : The time it took for the generation to evolve.
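
For example, a sketch inspecting this log after fitting (est is any fitted estimator from this module; each key maps to a list with one entry per generation; pandas is a convenience assumption, not a gplearn requirement):

    import pandas as pd

    details = pd.DataFrame(est.run_details_)
    print(details[['generation', 'best_fitness', 'best_length']])
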
fit(self, X, y, sample_weight=None)

Fit the Genetic Program according to X, y.

Parameters:
X : array-like, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]

Target values.

sample_weight : array-like, shape = [n_samples], optional

Weights applied to individual samples.

Returns:
self : object

Returns self.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

predict(self, X)[source]

Perform regression on test vectors X.

Parameters:
X : array-like, shape = [n_samples, n_features]

Input vectors, where n_samples is the number of samples and n_features is the number of features.

Returns:
y : array, shape = [n_samples]

Predicted values for X.

score(self, X, y, sample_weight=None)

Returns the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.
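
A worked numeric sketch of this definition (toy values chosen for illustration):

    import numpy as np

    y_true = np.array([3.0, -0.5, 2.0, 7.0])
    y_pred = np.array([2.5, 0.0, 2.0, 8.0])

    u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares = 1.5
    v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares = 29.1875
    r2 = 1 - u / v                             # ~0.949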

Parameters:
X : array-like, shape = (n_samples, n_features)

Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True values for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:
score : float

R^2 of self.predict(X) w.r.t. y.

Notes

The R2 score used when calling score on a regressor will use multioutput='uniform_average' from version 0.23 to keep consistent with metrics.r2_score. This will influence the score method of all the multioutput regressors (except for multioutput.MultiOutputRegressor). To specify the default value manually and avoid the warning, please either call metrics.r2_score directly or make a custom scorer with metrics.make_scorer (the built-in scorer 'r2' uses multioutput='uniform_average').

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
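
For example, a sketch of both the flat and the nested parameter forms (the pipeline step name ‘sr’ is an arbitrary choice):

    from sklearn.pipeline import Pipeline
    from gplearn.genetic import SymbolicRegressor

    est = SymbolicRegressor()
    est.set_params(population_size=2000, p_crossover=0.8)

    # Nested form: <component>__<parameter>
    pipe = Pipeline([('sr', SymbolicRegressor())])
    pipe.set_params(sr__generations=30)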

Symbolic Classifier

class gplearn.genetic.SymbolicClassifier(population_size=1000, generations=20, tournament_size=20, stopping_criteria=0.0, const_range=(-1.0, 1.0), init_depth=(2, 6), init_method='half and half', function_set=('add', 'sub', 'mul', 'div'), transformer='sigmoid', metric='log loss', parsimony_coefficient=0.001, p_crossover=0.9, p_subtree_mutation=0.01, p_hoist_mutation=0.01, p_point_mutation=0.01, p_point_replace=0.05, max_samples=1.0, feature_names=None, warm_start=False, low_memory=False, n_jobs=1, verbose=0, random_state=None)[source]

A Genetic Programming symbolic classifier.

A symbolic classifier is an estimator that begins by building a population of naive random formulas to represent a relationship. The formulas are represented as tree-like structures with mathematical functions being recursively applied to variables and constants. Each successive generation of programs is then evolved from the one that came before it by selecting the fittest individuals from the population to undergo genetic operations such as crossover, mutation or reproduction.
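
A minimal usage sketch on a toy binary problem (all settings and data are illustrative only):

    import numpy as np
    from gplearn.genetic import SymbolicClassifier

    rng = np.random.RandomState(0)
    X = rng.uniform(-1., 1., (200, 3))
    y = (X[:, 0] + X[:, 1] ** 2 > 0).astype(int)  # toy binary target

    clf = SymbolicClassifier(random_state=0)
    clf.fit(X, y)
    print(clf.predict(X[:5]))        # class labels
    print(clf.predict_proba(X[:5]))  # shape (5, 2), ordered as clf.classes_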

Parameters:
population_size : integer, optional (default=1000)

The number of programs in each generation.

generations : integer, optional (default=20)

The number of generations to evolve.

tournament_size : integer, optional (default=20)

The number of programs that will compete to become part of the next generation.

stopping_criteria : float, optional (default=0.0)

The metric value required to stop evolution early.

const_range : tuple of two floats, or None, optional (default=(-1., 1.))

The range of constants to include in the formulas. If None then no constants will be included in the candidate programs.

init_depth : tuple of two ints, optional (default=(2, 6))

The range of tree depths for the initial population of naive formulas. Individual trees will randomly choose a maximum depth from this range. When combined with init_method=’half and half’ this yields the well-known ‘ramped half and half’ initialization method.

init_method : str, optional (default=’half and half’)
  • ‘grow’ : Nodes are chosen at random from both functions and terminals, allowing for smaller trees than init_depth allows. Tends to grow asymmetrical trees.
  • ‘full’ : Functions are chosen until the init_depth is reached, and then terminals are selected. Tends to grow ‘bushy’ trees.
  • ‘half and half’ : Trees are grown through a 50/50 mix of ‘full’ and ‘grow’, making for a mix of tree shapes in the initial population.
function_set : iterable, optional (default=(‘add’, ‘sub’, ‘mul’, ‘div’))

The functions to use when building and evolving programs. This iterable can include strings naming the built-in functions outlined below, as well as your own functions built using the make_function factory from the functions module.

Available individual functions are:

  • ‘add’ : addition, arity=2.
  • ‘sub’ : subtraction, arity=2.
  • ‘mul’ : multiplication, arity=2.
  • ‘div’ : protected division where a denominator near-zero returns 1., arity=2.
  • ‘sqrt’ : protected square root where the absolute value of the argument is used, arity=1.
  • ‘log’ : protected log where the absolute value of the argument is used and a near-zero argument returns 0., arity=1.
  • ‘abs’ : absolute value, arity=1.
  • ‘neg’ : negative, arity=1.
  • ‘inv’ : protected inverse where a near-zero argument returns 0., arity=1.
  • ‘max’ : maximum, arity=2.
  • ‘min’ : minimum, arity=2.
  • ‘sin’ : sine (radians), arity=1.
  • ‘cos’ : cosine (radians), arity=1.
  • ‘tan’ : tangent (radians), arity=1.
transformer : str, optional (default=’sigmoid’)

The name of the function through which the raw decision function is passed. This function will transform the raw decision function into probabilities of each class.

This can also be replaced by your own functions as built using the make_function factory from the functions module.
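
For example, a sketch of a custom transformer (the steeper sigmoid is hypothetical; any arity-1 function built with make_function that maps raw values into (0, 1) could be supplied):

    import numpy as np
    from gplearn.functions import make_function
    from gplearn.genetic import SymbolicClassifier

    def _steep_sigmoid(x):
        # Hypothetical squashing function; argument clipped to avoid overflow
        return 1.0 / (1.0 + np.exp(-2.0 * np.clip(x, -50., 50.)))

    steep_sigmoid = make_function(function=_steep_sigmoid,
                                  name='steep_sigmoid', arity=1)

    clf = SymbolicClassifier(transformer=steep_sigmoid)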

metric : str, optional (default=’log loss’)

The name of the raw fitness metric. Available options include:

  • ‘log loss’, also known as binary cross-entropy loss.
parsimony_coefficient : float or “auto”, optional (default=0.001)

This constant penalizes large programs by adjusting their fitness to be less favorable for selection. Larger values penalize programs more heavily, which can control the phenomenon known as ‘bloat’. Bloat is when evolution increases the size of programs without a significant increase in fitness, which is costly for computation time and makes for a less understandable final result. This parameter may need to be tuned over successive runs.

If “auto” the parsimony coefficient is recalculated for each generation using c = Cov(l, f) / Var(l), where Cov(l, f) is the covariance between program size l and program fitness f in the population, and Var(l) is the variance of program sizes.

p_crossover : float, optional (default=0.9)

The probability of performing crossover on a tournament winner. Crossover takes the winner of a tournament and selects a random subtree from it to be replaced. A second tournament is performed to find a donor. The donor also has a subtree selected at random and this is inserted into the original parent to form an offspring in the next generation.

p_subtree_mutation : float, optional (default=0.01)

The probability of performing subtree mutation on a tournament winner. Subtree mutation takes the winner of a tournament and selects a random subtree from it to be replaced. A donor subtree is generated at random and this is inserted into the original parent to form an offspring in the next generation.

p_hoist_mutation : float, optional (default=0.01)

The probability of performing hoist mutation on a tournament winner. Hoist mutation takes the winner of a tournament and selects a random subtree from it. A random subtree of that subtree is then selected and this is ‘hoisted’ into the original subtree’s location to form an offspring in the next generation. This method helps to control bloat.

p_point_mutation : float, optional (default=0.01)

The probability of performing point mutation on a tournament winner. Point mutation takes the winner of a tournament and selects random nodes from it to be replaced. Terminals are replaced by other terminals and functions are replaced by other functions that require the same number of arguments as the original node. The resulting tree forms an offspring in the next generation.

Note: The above genetic operation probabilities must sum to less than one. The balance of probability is assigned to ‘reproduction’, where a tournament winner is cloned and enters the next generation unmodified.

p_point_replace : float, optional (default=0.05)

For point mutation only, the probability that any given node will be mutated.

max_samples : float, optional (default=1.0)

The fraction of samples to draw from X to evaluate each program on.

feature_names : list, optional (default=None)

Optional list of feature names, used purely for representations in the print operation or export_graphviz. If None, then X0, X1, etc. will be used for representations.

warm_start : bool, optional (default=False)

When set to True, reuse the solution of the previous call to fit and add more generations to the evolution, otherwise, just fit a new evolution.

low_memory : bool, optional (default=False)

When set to True, only the current generation is retained. Parent information is discarded. For very large populations or runs with many generations, this can result in substantial memory use reduction.

n_jobs : integer, optional (default=1)

The number of jobs to run in parallel for fit. If -1, then the number of jobs is set to the number of cores.

verbose : int, optional (default=0)

Controls the verbosity of the evolution building process.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

References

[1] J. Koza, “Genetic Programming”, 1992.
[2] R. Poli, et al. “A Field Guide to Genetic Programming”, 2008.
Attributes:
run_details_ : dict

Details of the evolution process. Includes the following elements:

  • ‘generation’ : The generation index.
  • ‘average_length’ : The average program length of the generation.
  • ‘average_fitness’ : The average program fitness of the generation.
  • ‘best_length’ : The length of the best program in the generation.
  • ‘best_fitness’ : The fitness of the best program in the generation.
  • ‘best_oob_fitness’ : The out-of-bag fitness of the best program in the generation (requires max_samples < 1.0).
  • ‘generation_time’ : The time it took for the generation to evolve.
fit(self, X, y, sample_weight=None)

Fit the Genetic Program according to X, y.

Parameters:
X : array-like, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]

Target values.

sample_weight : array-like, shape = [n_samples], optional

Weights applied to individual samples.

Returns:
self : object

Returns self.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

predict(self, X)[source]

Predict classes on test vectors X.

Parameters:
X : array-like, shape = [n_samples, n_features]

Input vectors, where n_samples is the number of samples and n_features is the number of features.

Returns:
y : array, shape = [n_samples]

The predicted classes of the input samples.

predict_proba(self, X)[source]

Predict probabilities on test vectors X.

Parameters:
X : array-like, shape = [n_samples, n_features]

Input vectors, where n_samples is the number of samples and n_features is the number of features.

Returns:
proba : array, shape = [n_samples, n_classes]

The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

score(self, X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that each label set be correctly predicted for each sample.

Parameters:
X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self

Symbolic Transformer

class gplearn.genetic.SymbolicTransformer(population_size=1000, hall_of_fame=100, n_components=10, generations=20, tournament_size=20, stopping_criteria=1.0, const_range=(-1.0, 1.0), init_depth=(2, 6), init_method='half and half', function_set=('add', 'sub', 'mul', 'div'), metric='pearson', parsimony_coefficient=0.001, p_crossover=0.9, p_subtree_mutation=0.01, p_hoist_mutation=0.01, p_point_mutation=0.01, p_point_replace=0.05, max_samples=1.0, feature_names=None, warm_start=False, low_memory=False, n_jobs=1, verbose=0, random_state=None)[source]

A Genetic Programming symbolic transformer.

A symbolic transformer is a supervised transformer that begins by building a population of naive random formulas to represent a relationship. The formulas are represented as tree-like structures with mathematical functions being recursively applied to variables and constants. Each successive generation of programs is then evolved from the one that came before it by selecting the fittest individuals from the population to undergo genetic operations such as crossover, mutation or reproduction. The final population is searched for the fittest individuals with the least correlation to one another.
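
A minimal usage sketch, stacking the engineered features alongside the originals for a downstream estimator (Ridge is an arbitrary illustrative choice):

    import numpy as np
    from gplearn.genetic import SymbolicTransformer
    from sklearn.linear_model import Ridge

    rng = np.random.RandomState(0)
    X = rng.uniform(-1., 1., (300, 5))
    y = X[:, 0] * X[:, 1] + X[:, 2]

    gp = SymbolicTransformer(n_components=5, random_state=0)
    gp.fit(X, y)
    X_new = gp.transform(X)        # shape (300, 5)
    X_aug = np.hstack((X, X_new))  # original plus engineered features
    Ridge().fit(X_aug, y)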

Parameters:
population_size : integer, optional (default=1000)

The number of programs in each generation.

hall_of_fame : integer, or None, optional (default=100)

The number of fittest programs to compare from when finding the least-correlated individuals for the n_components. If None, the entire final generation will be used.

n_components : integer, or None, optional (default=10)

The number of best programs to return after searching the hall_of_fame for the least-correlated individuals. If None, the entire hall_of_fame will be used.

generations : integer, optional (default=20)

The number of generations to evolve.

tournament_size : integer, optional (default=20)

The number of programs that will compete to become part of the next generation.

stopping_criteria : float, optional (default=1.0)

The metric value required to stop evolution early.

const_range : tuple of two floats, or None, optional (default=(-1., 1.))

The range of constants to include in the formulas. If None then no constants will be included in the candidate programs.

init_depth : tuple of two ints, optional (default=(2, 6))

The range of tree depths for the initial population of naive formulas. Individual trees will randomly choose a maximum depth from this range. When combined with init_method=’half and half’ this yields the well-known ‘ramped half and half’ initialization method.

init_method : str, optional (default=’half and half’)
  • ‘grow’ : Nodes are chosen at random from both functions and terminals, allowing for smaller trees than init_depth allows. Tends to grow asymmetrical trees.
  • ‘full’ : Functions are chosen until the init_depth is reached, and then terminals are selected. Tends to grow ‘bushy’ trees.
  • ‘half and half’ : Trees are grown through a 50/50 mix of ‘full’ and ‘grow’, making for a mix of tree shapes in the initial population.
function_set : iterable, optional (default=(‘add’, ‘sub’, ‘mul’, ‘div’))

The functions to use when building and evolving programs. This iterable can include strings naming the built-in functions outlined below, as well as your own functions built using the make_function factory from the functions module.

Available individual functions are:

  • ‘add’ : addition, arity=2.
  • ‘sub’ : subtraction, arity=2.
  • ‘mul’ : multiplication, arity=2.
  • ‘div’ : protected division where a denominator near-zero returns 1., arity=2.
  • ‘sqrt’ : protected square root where the absolute value of the argument is used, arity=1.
  • ‘log’ : protected log where the absolute value of the argument is used and a near-zero argument returns 0., arity=1.
  • ‘abs’ : absolute value, arity=1.
  • ‘neg’ : negative, arity=1.
  • ‘inv’ : protected inverse where a near-zero argument returns 0., arity=1.
  • ‘max’ : maximum, arity=2.
  • ‘min’ : minimum, arity=2.
  • ‘sin’ : sine (radians), arity=1.
  • ‘cos’ : cosine (radians), arity=1.
  • ‘tan’ : tangent (radians), arity=1.
metric : str, optional (default=’pearson’)

The name of the raw fitness metric. Available options include:

  • ‘pearson’ for Pearson’s product-moment correlation coefficient.
  • ‘spearman’ for Spearman’s rank-order correlation coefficient.
parsimony_coefficient : float or “auto”, optional (default=0.001)

This constant penalizes large programs by adjusting their fitness to be less favorable for selection. Larger values penalize programs more heavily, which can control the phenomenon known as ‘bloat’. Bloat is when evolution increases the size of programs without a significant increase in fitness, which is costly for computation time and makes for a less understandable final result. This parameter may need to be tuned over successive runs.

If “auto” the parsimony coefficient is recalculated for each generation using c = Cov(l, f) / Var(l), where Cov(l, f) is the covariance between program size l and program fitness f in the population, and Var(l) is the variance of program sizes.

p_crossover : float, optional (default=0.9)

The probability of performing crossover on a tournament winner. Crossover takes the winner of a tournament and selects a random subtree from it to be replaced. A second tournament is performed to find a donor. The donor also has a subtree selected at random and this is inserted into the original parent to form an offspring in the next generation.

p_subtree_mutation : float, optional (default=0.01)

The probability of performing subtree mutation on a tournament winner. Subtree mutation takes the winner of a tournament and selects a random subtree from it to be replaced. A donor subtree is generated at random and this is inserted into the original parent to form an offspring in the next generation.

p_hoist_mutation : float, optional (default=0.01)

The probability of performing hoist mutation on a tournament winner. Hoist mutation takes the winner of a tournament and selects a random subtree from it. A random subtree of that subtree is then selected and this is ‘hoisted’ into the original subtree’s location to form an offspring in the next generation. This method helps to control bloat.

p_point_mutation : float, optional (default=0.01)

The probability of performing point mutation on a tournament winner. Point mutation takes the winner of a tournament and selects random nodes from it to be replaced. Terminals are replaced by other terminals and functions are replaced by other functions that require the same number of arguments as the original node. The resulting tree forms an offspring in the next generation.

Note: The above genetic operation probabilities must sum to less than one. The balance of probability is assigned to ‘reproduction’, where a tournament winner is cloned and enters the next generation unmodified.

p_point_replace : float, optional (default=0.05)

For point mutation only, the probability that any given node will be mutated.

max_samples : float, optional (default=1.0)

The fraction of samples to draw from X to evaluate each program on.

feature_names : list, optional (default=None)

Optional list of feature names, used purely for representations in the print operation or export_graphviz. If None, then X0, X1, etc. will be used for representations.

warm_start : bool, optional (default=False)

When set to True, reuse the solution of the previous call to fit and add more generations to the evolution, otherwise, just fit a new evolution.

low_memory : bool, optional (default=False)

When set to True, only the current generation is retained. Parent information is discarded. For very large populations or runs with many generations, this can result in substantial memory use reduction.

n_jobs : integer, optional (default=1)

The number of jobs to run in parallel for fit. If -1, then the number of jobs is set to the number of cores.

verbose : int, optional (default=0)

Controls the verbosity of the evolution building process.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

References

[1] J. Koza, “Genetic Programming”, 1992.
[2] R. Poli, et al. “A Field Guide to Genetic Programming”, 2008.
Attributes:
run_details_ : dict

Details of the evolution process. Includes the following elements:

  • ‘generation’ : The generation index.
  • ‘average_length’ : The average program length of the generation.
  • ‘average_fitness’ : The average program fitness of the generation.
  • ‘best_length’ : The length of the best program in the generation.
  • ‘best_fitness’ : The fitness of the best program in the generation.
  • ‘best_oob_fitness’ : The out-of-bag fitness of the best program in the generation (requires max_samples < 1.0).
  • ‘generation_time’ : The time it took for the generation to evolve.
fit(self, X, y, sample_weight=None)

Fit the Genetic Program according to X, y.

Parameters:
X : array-like, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]

Target values.

sample_weight : array-like, shape = [n_samples], optional

Weights applied to individual samples.

Returns:
self : object

Returns self.

fit_transform(self, X, y, sample_weight=None)[source]

Fit to data, then transform it.

Parameters:
X : array-like, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]

Target values.

sample_weight : array-like, shape = [n_samples], optional

Weights applied to individual samples.

Returns:
X_new : array-like, shape = [n_samples, n_components]

Transformed array.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self

transform(self, X)[source]

Transform X according to the fitted transformer.

Parameters:
X : array-like, shape = [n_samples, n_features]

Input vectors, where n_samples is the number of samples and n_features is the number of features.

Returns:
X_new : array-like, shape = [n_samples, n_components]

Transformed array.

User-Defined Functions

gplearn.functions.make_function(function, name, arity, wrap=True)[source]

Make a function node, a representation of a mathematical relationship.

This factory function creates a function node, one of the core nodes in any program. The resulting object can be called with NumPy vectorized arguments and will return a resulting vector based on a mathematical relationship.

Parameters:
function : callable

A function with signature function(x1, *args) that returns a NumPy array of the same shape as its arguments.

name : str

The name for the function as it should be represented in the program and its visualizations.

arity : int

The number of arguments that the function takes.

wrap : bool, optional (default=True)

When running in parallel, pickling of custom functions is not supported by Python’s default pickler. This option will wrap the function using cloudpickle allowing you to pickle your solution, but the evolution may run slightly more slowly. If you are running single-threaded in an interactive Python session or have no need to save the model, set to False for faster runs.
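
For example, a sketch of a hypothetical protected exponential node (the clipping threshold is an illustrative guard against overflow):

    import numpy as np
    from gplearn.functions import make_function

    def _protected_exp(x):
        # Clip the argument so np.exp cannot overflow
        return np.exp(np.clip(x, -50., 50.))

    exp_fn = make_function(function=_protected_exp, name='exp', arity=1)

    # The resulting node can be mixed with built-in names in a function_set:
    # SymbolicRegressor(function_set=('add', 'sub', 'mul', 'div', exp_fn))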

User-Defined Fitness Metrics

gplearn.fitness.make_fitness(function, greater_is_better, wrap=True)[source]

Make a fitness measure, a metric scoring the quality of a program’s fit.

This factory function creates a fitness measure object which measures the quality of a program’s fit and thus its likelihood to undergo genetic operations into the next generation. The resulting object can be called with NumPy vectorized arguments and returns a floating point score quantifying the quality of the program’s representation of the true relationship.

Parameters:
function : callable

A function with signature function(y, y_pred, sample_weight) that returns a floating point number, where y is the target vector, y_pred is the values predicted by the genetic program, and sample_weight is the sample weight vector.

greater_is_better : bool

Whether a higher value from function indicates a better fit. In general this would be False for metrics indicating the magnitude of the error, and True for metrics indicating the quality of fit.

wrap : bool, optional (default=True)

When running in parallel, pickling of custom metrics is not supported by Python’s default pickler. This option will wrap the function using cloudpickle allowing you to pickle your solution, but the evolution may run slightly more slowly. If you are running single-threaded in an interactive Python session or have no need to save the model, set to False for faster runs.
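
For example, a sketch of a MAPE-style error metric (the 0.001 floor is an illustrative guard against division by zero):

    import numpy as np
    from gplearn.fitness import make_fitness

    def _mape(y, y_pred, w):
        # Mean absolute percentage error, floored to avoid division by zero
        diffs = np.abs((np.maximum(0.001, y) - np.maximum(0.001, y_pred))
                       / np.maximum(0.001, y))
        return 100. * np.average(diffs, weights=w)

    mape = make_fitness(function=_mape, greater_is_better=False)

    # The metric can then be supplied to an estimator, e.g.:
    # SymbolicRegressor(metric=mape)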