API reference

Symbolic Regressor

class gplearn.genetic.SymbolicRegressor(*, population_size=1000, generations=20, tournament_size=20, stopping_criteria=0.0, const_range=(-1.0, 1.0), init_depth=(2, 6), init_method='half and half', function_set=('add', 'sub', 'mul', 'div'), metric='mean absolute error', parsimony_coefficient=0.001, p_crossover=0.9, p_subtree_mutation=0.01, p_hoist_mutation=0.01, p_point_mutation=0.01, p_point_replace=0.05, max_samples=1.0, feature_names=None, warm_start=False, low_memory=False, n_jobs=1, verbose=0, random_state=None)[source]

A Genetic Programming symbolic regressor.

A symbolic regressor is an estimator that begins by building a population of naive random formulas to represent a relationship. The formulas are represented as tree-like structures with mathematical functions being recursively applied to variables and constants. Each successive generation of programs is then evolved from the one that came before it by selecting the fittest individuals from the population to undergo genetic operations such as crossover, mutation or reproduction.
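A minimal usage sketch (illustrative, not part of the formal reference): evolve a program for a toy target and inspect the winning formula. The data, target and hyperparameters below are arbitrary choices; _program is the fitted estimator's internal attribute holding the best program.

    import numpy as np
    from gplearn.genetic import SymbolicRegressor

    rng = np.random.RandomState(0)
    X = rng.uniform(-1, 1, (200, 2))
    y = X[:, 0] ** 2 - X[:, 1] + 0.5   # hypothetical ground-truth relationship

    est = SymbolicRegressor(population_size=1000, generations=10,
                            stopping_criteria=0.01, random_state=0)
    est.fit(X, y)
    print(est._program)        # string form of the best evolved program
    print(est.predict(X[:5]))  # predictions from that program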

Parameters
population_size : integer, optional (default=1000)

The number of programs in each generation.

generations : integer, optional (default=20)

The number of generations to evolve.

tournament_size : integer, optional (default=20)

The number of programs that will compete to become part of the next generation.

stopping_criteria : float, optional (default=0.0)

The metric value required to stop evolution early.

const_range : tuple of two floats, or None, optional (default=(-1., 1.))

The range of constants to include in the formulas. If None then no constants will be included in the candidate programs.

init_depth : tuple of two ints, optional (default=(2, 6))

The range of tree depths for the initial population of naive formulas. Individual trees will randomly choose a maximum depth from this range. When combined with init_method=’half and half’ this yields the well-known ‘ramped half and half’ initialization method.

init_method : str, optional (default=’half and half’)
  • ‘grow’ : Nodes are chosen at random from both functions and terminals, allowing for smaller trees than init_depth allows. Tends to grow asymmetrical trees.

  • ‘full’ : Functions are chosen until the init_depth is reached, and then terminals are selected. Tends to grow ‘bushy’ trees.

  • ‘half and half’ : Trees are grown through a 50/50 mix of ‘full’ and ‘grow’, making for a mix of tree shapes in the initial population.

function_set : iterable, optional (default=(‘add’, ‘sub’, ‘mul’, ‘div’))

The functions to use when building and evolving programs. This iterable can include strings naming the built-in functions outlined below, or your own functions built using the make_function factory from the functions module.

Available individual functions are:

  • ‘add’ : addition, arity=2.

  • ‘sub’ : subtraction, arity=2.

  • ‘mul’ : multiplication, arity=2.

  • ‘div’ : protected division where a near-zero denominator returns 1., arity=2.

  • ‘sqrt’ : protected square root where the absolute value of the argument is used, arity=1.

  • ‘log’ : protected log where the absolute value of the argument is used and a near-zero argument returns 0., arity=1.

  • ‘abs’ : absolute value, arity=1.

  • ‘neg’ : negative, arity=1.

  • ‘inv’ : protected inverse where a near-zero argument returns 0., arity=1.

  • ‘max’ : maximum, arity=2.

  • ‘min’ : minimum, arity=2.

  • ‘sin’ : sine (radians), arity=1.

  • ‘cos’ : cosine (radians), arity=1.

  • ‘tan’ : tangent (radians), arity=1.

metric : str, optional (default=’mean absolute error’)

The name of the raw fitness metric. Available options include:

  • ‘mean absolute error’.

  • ‘mse’ for mean squared error.

  • ‘rmse’ for root mean squared error.

  • ‘pearson’ for Pearson’s product-moment correlation coefficient.

  • ‘spearman’ for Spearman’s rank-order correlation coefficient.

Note that ‘pearson’ and ‘spearman’ will not directly predict the target but could be useful as value-added features in a second-step estimator. This approach allows the user to generate one engineered feature at a time; using the SymbolicTransformer instead would allow creation of multiple features at once.

parsimony_coefficient : float or “auto”, optional (default=0.001)

This constant penalizes large programs by adjusting their fitness to be less favorable for selection. Larger values penalize programs more heavily, which can control the phenomenon known as ‘bloat’. Bloat occurs when evolution increases the size of programs without a significant increase in fitness, which is costly for computation time and makes for a less understandable final result. This parameter may need to be tuned over successive runs.

If “auto” the parsimony coefficient is recalculated for each generation using c = Cov(l, f) / Var(l), where Cov(l, f) is the covariance between program size l and program fitness f in the population, and Var(l) is the variance of program sizes.
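In NumPy terms, that calculation looks roughly like the sketch below (the degrees-of-freedom conventions used in the library's internals may differ):

    import numpy as np

    lengths = np.array([12., 7., 30., 18., 9.])         # hypothetical program sizes l
    fitness = np.array([0.80, 0.90, 0.40, 0.60, 0.85])  # hypothetical raw fitnesses f

    c = np.cov(lengths, fitness)[0, 1] / np.var(lengths)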

p_crossover : float, optional (default=0.9)

The probability of performing crossover on a tournament winner. Crossover takes the winner of a tournament and selects a random subtree from it to be replaced. A second tournament is performed to find a donor. The donor also has a subtree selected at random and this is inserted into the original parent to form an offspring in the next generation.

p_subtree_mutation : float, optional (default=0.01)

The probability of performing subtree mutation on a tournament winner. Subtree mutation takes the winner of a tournament and selects a random subtree from it to be replaced. A donor subtree is generated at random and this is inserted into the original parent to form an offspring in the next generation.

p_hoist_mutation : float, optional (default=0.01)

The probability of performing hoist mutation on a tournament winner. Hoist mutation takes the winner of a tournament and selects a random subtree from it. A random subtree of that subtree is then selected and this is ‘hoisted’ into the original subtree’s location to form an offspring in the next generation. This method helps to control bloat.

p_point_mutation : float, optional (default=0.01)

The probability of performing point mutation on a tournament winner. Point mutation takes the winner of a tournament and selects random nodes from it to be replaced. Terminals are replaced by other terminals and functions are replaced by other functions that require the same number of arguments as the original node. The resulting tree forms an offspring in the next generation.

Note: The above genetic operation probabilities must sum to less than one. The balance of probability is assigned to ‘reproduction’, where a tournament winner is cloned and enters the next generation unmodified.
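A quick sanity check of that constraint with the default settings (a sketch, not library code):

    p_crossover, p_subtree, p_hoist, p_point = 0.9, 0.01, 0.01, 0.01
    p_reproduction = 1.0 - (p_crossover + p_subtree + p_hoist + p_point)
    assert p_reproduction > 0.0   # the remaining ~0.07 goes to reproduction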

p_point_replace : float, optional (default=0.05)

For point mutation only, the probability that any given node will be mutated.

max_samples : float, optional (default=1.0)

The fraction of samples to draw from X to evaluate each program on.

feature_names : list, optional (default=None)

Optional list of feature names, used purely for representations in the print operation or export_graphviz. If None, then X0, X1, etc. will be used for representations.

warm_start : bool, optional (default=False)

When set to True, reuse the solution of the previous call to fit and add more generations to the evolution; otherwise, just fit a new evolution.

low_memory : bool, optional (default=False)

When set to True, only the current generation is retained. Parent information is discarded. For very large populations or runs with many generations, this can result in substantial memory use reduction.

n_jobs : integer, optional (default=1)

The number of jobs to run in parallel for fit. If -1, then the number of jobs is set to the number of cores.

verbose : int, optional (default=0)

Controls the verbosity of the evolution building process.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

References

1. J. Koza, “Genetic Programming”, 1992.

2. R. Poli, et al., “A Field Guide to Genetic Programming”, 2008.

Attributes
run_details_ : dict

Details of the evolution process, recorded per generation (see the sketch following this list). Includes the following elements:

  • ‘generation’ : The generation index.

  • ‘average_length’ : The average program length of the generation.

  • ‘average_fitness’ : The average program fitness of the generation.

  • ‘best_length’ : The length of the best program in the generation.

  • ‘best_fitness’ : The fitness of the best program in the generation.

  • ‘best_oob_fitness’ : The out of bag fitness of the best program in the generation (requires max_samples < 1.0).

  • ‘generation_time’ : The time it took for the generation to evolve.
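Since each element maps to a per-generation list, the dict can be tabulated directly. A sketch, assuming pandas is available and est is a fitted estimator such as the one above:

    import pandas as pd

    details = pd.DataFrame(est.run_details_)   # one row per generation
    print(details[['generation', 'best_fitness', 'best_length']])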

fit(X, y, sample_weight=None)

Fit the Genetic Program according to X, y.

Parameters
X : array-like, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]

Target values.

sample_weight : array-like, shape = [n_samples], optional

Weights applied to individual samples.

Returns
self : object

Returns self.

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : dict

Parameter names mapped to their values.

predict(X)[source]

Perform regression on test vectors X.

Parameters
X : array-like, shape = [n_samples, n_features]

Input vectors, where n_samples is the number of samples and n_features is the number of features.

Returns
y : array, shape = [n_samples]

Predicted values for X.

score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
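Spelled out in NumPy terms, the definition above reads as follows (a sketch, not the scikit-learn implementation):

    import numpy as np

    def r2_sketch(y_true, y_pred):
        u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
        v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
        return 1.0 - u / v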

Parameters
X : array-like of shape (n_samples, n_features)

Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead, with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True values for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns
score : float

\(R^2\) of self.predict(X) w.r.t. y.

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params : dict

Estimator parameters.

Returns
self : estimator instance

Estimator instance.

Symbolic Classifier

class gplearn.genetic.SymbolicClassifier(*, population_size=1000, generations=20, tournament_size=20, stopping_criteria=0.0, const_range=(-1.0, 1.0), init_depth=(2, 6), init_method='half and half', function_set=('add', 'sub', 'mul', 'div'), transformer='sigmoid', metric='log loss', parsimony_coefficient=0.001, p_crossover=0.9, p_subtree_mutation=0.01, p_hoist_mutation=0.01, p_point_mutation=0.01, p_point_replace=0.05, max_samples=1.0, class_weight=None, feature_names=None, warm_start=False, low_memory=False, n_jobs=1, verbose=0, random_state=None)[source]

A Genetic Programming symbolic classifier.

A symbolic classifier is an estimator that begins by building a population of naive random formulas to represent a relationship. The formulas are represented as tree-like structures with mathematical functions being recursively applied to variables and constants. Each successive generation of programs is then evolved from the one that came before it by selecting the fittest individuals from the population to undergo genetic operations such as crossover, mutation or reproduction.
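A minimal usage sketch (illustrative; the dataset and hyperparameters are arbitrary choices, not prescribed by this reference):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from gplearn.genetic import SymbolicClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = SymbolicClassifier(parsimony_coefficient=0.01, random_state=1)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))      # mean accuracy
    print(clf.predict_proba(X_test[:3]))  # one row per sample, one column per class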

Parameters
population_size : integer, optional (default=1000)

The number of programs in each generation.

generations : integer, optional (default=20)

The number of generations to evolve.

tournament_size : integer, optional (default=20)

The number of programs that will compete to become part of the next generation.

stopping_criteria : float, optional (default=0.0)

The metric value required to stop evolution early.

const_range : tuple of two floats, or None, optional (default=(-1., 1.))

The range of constants to include in the formulas. If None then no constants will be included in the candidate programs.

init_depth : tuple of two ints, optional (default=(2, 6))

The range of tree depths for the initial population of naive formulas. Individual trees will randomly choose a maximum depth from this range. When combined with init_method=’half and half’ this yields the well-known ‘ramped half and half’ initialization method.

init_method : str, optional (default=’half and half’)
  • ‘grow’ : Nodes are chosen at random from both functions and terminals, allowing for smaller trees than init_depth allows. Tends to grow asymmetrical trees.

  • ‘full’ : Functions are chosen until the init_depth is reached, and then terminals are selected. Tends to grow ‘bushy’ trees.

  • ‘half and half’ : Trees are grown through a 50/50 mix of ‘full’ and ‘grow’, making for a mix of tree shapes in the initial population.

function_set : iterable, optional (default=(‘add’, ‘sub’, ‘mul’, ‘div’))

The functions to use when building and evolving programs. This iterable can include strings naming the built-in functions outlined below, or your own functions built using the make_function factory from the functions module.

Available individual functions are:

  • ‘add’ : addition, arity=2.

  • ‘sub’ : subtraction, arity=2.

  • ‘mul’ : multiplication, arity=2.

  • ‘div’ : protected division where a near-zero denominator returns 1., arity=2.

  • ‘sqrt’ : protected square root where the absolute value of the argument is used, arity=1.

  • ‘log’ : protected log where the absolute value of the argument is used and a near-zero argument returns 0., arity=1.

  • ‘abs’ : absolute value, arity=1.

  • ‘neg’ : negative, arity=1.

  • ‘inv’ : protected inverse where a near-zero argument returns 0., arity=1.

  • ‘max’ : maximum, arity=2.

  • ‘min’ : minimum, arity=2.

  • ‘sin’ : sine (radians), arity=1.

  • ‘cos’ : cosine (radians), arity=1.

  • ‘tan’ : tangent (radians), arity=1.

transformer : str, optional (default=’sigmoid’)

The name of the function through which the raw decision function is passed. This function will transform the raw decision function into probabilities of each class.

The transformer can also be replaced by your own function, built using the make_function factory from the functions module.

metric : str, optional (default=’log loss’)

The name of the raw fitness metric. Available options include:

  • ‘log loss’, also known as binary cross-entropy loss.

parsimony_coefficient : float or “auto”, optional (default=0.001)

This constant penalizes large programs by adjusting their fitness to be less favorable for selection. Larger values penalize programs more heavily, which can control the phenomenon known as ‘bloat’. Bloat occurs when evolution increases the size of programs without a significant increase in fitness, which is costly for computation time and makes for a less understandable final result. This parameter may need to be tuned over successive runs.

If “auto” the parsimony coefficient is recalculated for each generation using c = Cov(l, f) / Var(l), where Cov(l, f) is the covariance between program size l and program fitness f in the population, and Var(l) is the variance of program sizes.

p_crossover : float, optional (default=0.9)

The probability of performing crossover on a tournament winner. Crossover takes the winner of a tournament and selects a random subtree from it to be replaced. A second tournament is performed to find a donor. The donor also has a subtree selected at random and this is inserted into the original parent to form an offspring in the next generation.

p_subtree_mutation : float, optional (default=0.01)

The probability of performing subtree mutation on a tournament winner. Subtree mutation takes the winner of a tournament and selects a random subtree from it to be replaced. A donor subtree is generated at random and this is inserted into the original parent to form an offspring in the next generation.

p_hoist_mutation : float, optional (default=0.01)

The probability of performing hoist mutation on a tournament winner. Hoist mutation takes the winner of a tournament and selects a random subtree from it. A random subtree of that subtree is then selected and this is ‘hoisted’ into the original subtree’s location to form an offspring in the next generation. This method helps to control bloat.

p_point_mutation : float, optional (default=0.01)

The probability of performing point mutation on a tournament winner. Point mutation takes the winner of a tournament and selects random nodes from it to be replaced. Terminals are replaced by other terminals and functions are replaced by other functions that require the same number of arguments as the original node. The resulting tree forms an offspring in the next generation.

Note: The above genetic operation probabilities must sum to less than one. The balance of probability is assigned to ‘reproduction’, where a tournament winner is cloned and enters the next generation unmodified.

p_point_replace : float, optional (default=0.05)

For point mutation only, the probability that any given node will be mutated.

max_samples : float, optional (default=1.0)

The fraction of samples to draw from X to evaluate each program on.

class_weight : dict, ‘balanced’ or None, optional (default=None)

Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))
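The formula is easy to verify by hand; a sketch with hypothetical labels:

    import numpy as np

    y = np.array([0, 0, 0, 1])   # hypothetical labels: class 1 is rare
    weights = len(y) / (len(np.unique(y)) * np.bincount(y))
    # array([0.6667, 2.0]): the rare class is up-weighted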

feature_names : list, optional (default=None)

Optional list of feature names, used purely for representations in the print operation or export_graphviz. If None, then X0, X1, etc. will be used for representations.

warm_start : bool, optional (default=False)

When set to True, reuse the solution of the previous call to fit and add more generations to the evolution; otherwise, just fit a new evolution.

low_memory : bool, optional (default=False)

When set to True, only the current generation is retained. Parent information is discarded. For very large populations or runs with many generations, this can result in substantial memory use reduction.

n_jobs : integer, optional (default=1)

The number of jobs to run in parallel for fit. If -1, then the number of jobs is set to the number of cores.

verbose : int, optional (default=0)

Controls the verbosity of the evolution building process.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

References

1. J. Koza, “Genetic Programming”, 1992.

2. R. Poli, et al., “A Field Guide to Genetic Programming”, 2008.

Attributes
run_details_ : dict

Details of the evolution process. Includes the following elements:

  • ‘generation’ : The generation index.

  • ‘average_length’ : The average program length of the generation.

  • ‘average_fitness’ : The average program fitness of the generation.

  • ‘best_length’ : The length of the best program in the generation.

  • ‘best_fitness’ : The fitness of the best program in the generation.

  • ‘best_oob_fitness’ : The out of bag fitness of the best program in the generation (requires max_samples < 1.0).

  • ‘generation_time’ : The time it took for the generation to evolve.

fit(X, y, sample_weight=None)

Fit the Genetic Program according to X, y.

Parameters
X : array-like, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]

Target values.

sample_weight : array-like, shape = [n_samples], optional

Weights applied to individual samples.

Returns
self : object

Returns self.

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : dict

Parameter names mapped to their values.

predict(X)[source]

Predict classes on test vectors X.

Parameters
X : array-like, shape = [n_samples, n_features]

Input vectors, where n_samples is the number of samples and n_features is the number of features.

Returns
y : array, shape = [n_samples]

The predicted classes of the input samples.

predict_proba(X)[source]

Predict probabilities on test vectors X.

Parameters
X : array-like, shape = [n_samples, n_features]

Input vectors, where n_samples is the number of samples and n_features is the number of features.

Returns
proba : array, shape = [n_samples, n_classes]

The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns
score : float

Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params : dict

Estimator parameters.

Returns
self : estimator instance

Estimator instance.

Symbolic Transformer

class gplearn.genetic.SymbolicTransformer(*, population_size=1000, hall_of_fame=100, n_components=10, generations=20, tournament_size=20, stopping_criteria=1.0, const_range=(-1.0, 1.0), init_depth=(2, 6), init_method='half and half', function_set=('add', 'sub', 'mul', 'div'), metric='pearson', parsimony_coefficient=0.001, p_crossover=0.9, p_subtree_mutation=0.01, p_hoist_mutation=0.01, p_point_mutation=0.01, p_point_replace=0.05, max_samples=1.0, feature_names=None, warm_start=False, low_memory=False, n_jobs=1, verbose=0, random_state=None)[source]

A Genetic Programming symbolic transformer.

A symbolic transformer is a supervised transformer that begins by building a population of naive random formulas to represent a relationship. The formulas are represented as tree-like structures with mathematical functions being recursively applied to variables and constants. Each successive generation of programs is then evolved from the one that came before it by selecting the fittest individuals from the population to undergo genetic operations such as crossover, mutation or reproduction. The final population is searched for the fittest individuals with the least correlation to one another.
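A minimal usage sketch (illustrative; the dataset, downstream model and settings are arbitrary choices): generate engineered features and stack them beside the originals for a linear model.

    import numpy as np
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge
    from gplearn.genetic import SymbolicTransformer

    X, y = load_diabetes(return_X_y=True)

    gp = SymbolicTransformer(n_components=10, generations=10, random_state=0)
    gp_features = gp.fit_transform(X, y)  # shape (n_samples, 10)

    new_X = np.hstack((X, gp_features))   # engineered features beside the originals
    print(Ridge().fit(new_X, y).score(new_X, y))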

Parameters
population_size : integer, optional (default=1000)

The number of programs in each generation.

hall_of_fame : integer, or None, optional (default=100)

The number of fittest programs to consider when searching for the least-correlated individuals for the n_components. If None, the entire final generation will be used.

n_components : integer, or None, optional (default=10)

The number of best programs to return after searching the hall_of_fame for the least-correlated individuals. If None, the entire hall_of_fame will be used.

generations : integer, optional (default=20)

The number of generations to evolve.

tournament_size : integer, optional (default=20)

The number of programs that will compete to become part of the next generation.

stopping_criteria : float, optional (default=1.0)

The metric value required to stop evolution early.

const_range : tuple of two floats, or None, optional (default=(-1., 1.))

The range of constants to include in the formulas. If None then no constants will be included in the candidate programs.

init_depth : tuple of two ints, optional (default=(2, 6))

The range of tree depths for the initial population of naive formulas. Individual trees will randomly choose a maximum depth from this range. When combined with init_method=’half and half’ this yields the well-known ‘ramped half and half’ initialization method.

init_method : str, optional (default=’half and half’)
  • ‘grow’ : Nodes are chosen at random from both functions and terminals, allowing for smaller trees than init_depth allows. Tends to grow asymmetrical trees.

  • ‘full’ : Functions are chosen until the init_depth is reached, and then terminals are selected. Tends to grow ‘bushy’ trees.

  • ‘half and half’ : Trees are grown through a 50/50 mix of ‘full’ and ‘grow’, making for a mix of tree shapes in the initial population.

function_set : iterable, optional (default=(‘add’, ‘sub’, ‘mul’, ‘div’))

The functions to use when building and evolving programs. This iterable can include strings naming the built-in functions outlined below, or your own functions built using the make_function factory from the functions module.

Available individual functions are:

  • ‘add’ : addition, arity=2.

  • ‘sub’ : subtraction, arity=2.

  • ‘mul’ : multiplication, arity=2.

  • ‘div’ : protected division where a near-zero denominator returns 1., arity=2.

  • ‘sqrt’ : protected square root where the absolute value of the argument is used, arity=1.

  • ‘log’ : protected log where the absolute value of the argument is used and a near-zero argument returns 0., arity=1.

  • ‘abs’ : absolute value, arity=1.

  • ‘neg’ : negative, arity=1.

  • ‘inv’ : protected inverse where a near-zero argument returns 0., arity=1.

  • ‘max’ : maximum, arity=2.

  • ‘min’ : minimum, arity=2.

  • ‘sin’ : sine (radians), arity=1.

  • ‘cos’ : cosine (radians), arity=1.

  • ‘tan’ : tangent (radians), arity=1.

metric : str, optional (default=’pearson’)

The name of the raw fitness metric. Available options include:

  • ‘pearson’ for Pearson’s product-moment correlation coefficient.

  • ‘spearman’ for Spearman’s rank-order correlation coefficient.

parsimony_coefficient : float or “auto”, optional (default=0.001)

This constant penalizes large programs by adjusting their fitness to be less favorable for selection. Larger values penalize programs more heavily, which can control the phenomenon known as ‘bloat’. Bloat occurs when evolution increases the size of programs without a significant increase in fitness, which is costly for computation time and makes for a less understandable final result. This parameter may need to be tuned over successive runs.

If “auto” the parsimony coefficient is recalculated for each generation using c = Cov(l, f) / Var(l), where Cov(l, f) is the covariance between program size l and program fitness f in the population, and Var(l) is the variance of program sizes.

p_crossover : float, optional (default=0.9)

The probability of performing crossover on a tournament winner. Crossover takes the winner of a tournament and selects a random subtree from it to be replaced. A second tournament is performed to find a donor. The donor also has a subtree selected at random and this is inserted into the original parent to form an offspring in the next generation.

p_subtree_mutation : float, optional (default=0.01)

The probability of performing subtree mutation on a tournament winner. Subtree mutation takes the winner of a tournament and selects a random subtree from it to be replaced. A donor subtree is generated at random and this is inserted into the original parent to form an offspring in the next generation.

p_hoist_mutation : float, optional (default=0.01)

The probability of performing hoist mutation on a tournament winner. Hoist mutation takes the winner of a tournament and selects a random subtree from it. A random subtree of that subtree is then selected and this is ‘hoisted’ into the original subtree’s location to form an offspring in the next generation. This method helps to control bloat.

p_point_mutation : float, optional (default=0.01)

The probability of performing point mutation on a tournament winner. Point mutation takes the winner of a tournament and selects random nodes from it to be replaced. Terminals are replaced by other terminals and functions are replaced by other functions that require the same number of arguments as the original node. The resulting tree forms an offspring in the next generation.

Note: The above genetic operation probabilities must sum to less than one. The balance of probability is assigned to ‘reproduction’, where a tournament winner is cloned and enters the next generation unmodified.

p_point_replace : float, optional (default=0.05)

For point mutation only, the probability that any given node will be mutated.

max_samples : float, optional (default=1.0)

The fraction of samples to draw from X to evaluate each program on.

feature_names : list, optional (default=None)

Optional list of feature names, used purely for representations in the print operation or export_graphviz. If None, then X0, X1, etc. will be used for representations.

warm_start : bool, optional (default=False)

When set to True, reuse the solution of the previous call to fit and add more generations to the evolution; otherwise, just fit a new evolution.

low_memory : bool, optional (default=False)

When set to True, only the current generation is retained. Parent information is discarded. For very large populations or runs with many generations, this can result in substantial memory use reduction.

n_jobs : integer, optional (default=1)

The number of jobs to run in parallel for fit. If -1, then the number of jobs is set to the number of cores.

verbose : int, optional (default=0)

Controls the verbosity of the evolution building process.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

References

1. J. Koza, “Genetic Programming”, 1992.

2. R. Poli, et al., “A Field Guide to Genetic Programming”, 2008.

Attributes
run_details_ : dict

Details of the evolution process. Includes the following elements:

  • ‘generation’ : The generation index.

  • ‘average_length’ : The average program length of the generation.

  • ‘average_fitness’ : The average program fitness of the generation.

  • ‘best_length’ : The length of the best program in the generation.

  • ‘best_fitness’ : The fitness of the best program in the generation.

  • ‘best_oob_fitness’ : The out of bag fitness of the best program in the generation (requires max_samples < 1.0).

  • ‘generation_time’ : The time it took for the generation to evolve.

fit(X, y, sample_weight=None)

Fit the Genetic Program according to X, y.

Parameters
X : array-like, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]

Target values.

sample_weight : array-like, shape = [n_samples], optional

Weights applied to individual samples.

Returns
self : object

Returns self.

fit_transform(X, y, sample_weight=None)[source]

Fit to data, then transform it.

Parameters
X : array-like, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]

Target values.

sample_weight : array-like, shape = [n_samples], optional

Weights applied to individual samples.

Returns
X_new : array-like, shape = [n_samples, n_components]

Transformed array.

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : dict

Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params : dict

Estimator parameters.

Returns
self : estimator instance

Estimator instance.

transform(X)[source]

Transform X according to the fitted transformer.

Parameters
X : array-like, shape = [n_samples, n_features]

Input vectors, where n_samples is the number of samples and n_features is the number of features.

Returns
X_new : array-like, shape = [n_samples, n_components]

Transformed array.

User-Defined Functions

gplearn.functions.make_function(*, function, name, arity, wrap=True)[source]

Make a function node, a representation of a mathematical relationship.

This factory function creates a function node, one of the core nodes in any program. The resulting object can be called with NumPy vectorized arguments and returns a vector based on a mathematical relationship.

Parameters
function : callable

A function with signature function(x1, *args) that returns a NumPy array of the same shape as its arguments.

name : str

The name for the function as it should be represented in the program and its visualizations.

arity : int

The number of arguments that the function takes.

wrap : bool, optional (default=True)

When running in parallel, pickling of custom functions is not supported by Python’s default pickler. This option will wrap the function using cloudpickle, allowing you to pickle your solution, but the evolution may run slightly more slowly. If you are running single-threaded in an interactive Python session or have no need to save the model, set to False for faster runs.
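A sketch of the factory in use: a protected exponential whose clipping scheme is an assumption of this example, not something gplearn prescribes.

    import numpy as np
    from gplearn.functions import make_function

    def _protected_exp(x):
        # clip to avoid overflow; the bound of 100 is an arbitrary choice
        return np.exp(np.clip(x, -100., 100.))

    exp_fn = make_function(function=_protected_exp, name='exp', arity=1)

    # The resulting node can be mixed with the built-in names, e.g.:
    # SymbolicRegressor(function_set=('add', 'sub', 'mul', 'div', exp_fn), ...)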

User-Defined Fitness Metrics

gplearn.fitness.make_fitness(*, function, greater_is_better, wrap=True)[source]

Make a fitness measure, a metric scoring the quality of a program’s fit.

This factory function creates a fitness measure object which measures the quality of a program’s fit and thus its likelihood to undergo genetic operations into the next generation. The resulting object can be called with NumPy vectorized arguments and returns a floating point score quantifying the quality of the program’s representation of the true relationship.

Parameters
function : callable

A function with signature function(y, y_pred, sample_weight) that returns a floating point number, where y is the input target vector, y_pred is the predicted values from the genetic program, and sample_weight is the sample weight vector.

greater_is_better : bool

Whether a higher value from function indicates a better fit. In general this would be False for metrics indicating the magnitude of the error, and True for metrics indicating the quality of fit.

wrap : bool, optional (default=True)

When running in parallel, pickling of custom metrics is not supported by Python’s default pickler. This option will wrap the function using cloudpickle, allowing you to pickle your solution, but the evolution may run slightly more slowly. If you are running single-threaded in an interactive Python session or have no need to save the model, set to False for faster runs.
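A sketch of the factory in use: a mean absolute percentage error metric, where lower is better and so greater_is_better=False (the metric choice is illustrative).

    import numpy as np
    from gplearn.fitness import make_fitness

    def _mape(y, y_pred, w):
        # weighted mean absolute percentage error; assumes y contains no zeros
        diffs = np.abs((y - y_pred) / y)
        return 100. * np.average(diffs, weights=w)

    mape = make_fitness(function=_mape, greater_is_better=False)

    # Usable wherever a metric is accepted, e.g.:
    # SymbolicRegressor(metric=mape, ...)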