duelpy.stats

Utilities for the implementation of PB-MAB (preference-based multi-armed bandit) algorithms.

Package Contents

Classes

PreferenceEstimate

An estimation of a preference matrix based on samples.

class PreferenceEstimate(num_arms: int, confidence_radius: duelpy.stats.confidence_radius.ConfidenceRadius = TrivialConfidenceRadius())

An estimation of a preference matrix based on samples.

Consider this example:

>>> preference_estimate = PreferenceEstimate(
...     num_arms=3,
...     confidence_radius=lambda num_samples: 1/(num_samples + 1)
... )

We use a simple function of the sample count as the confidence radius for easy illustration. Note that the resulting intervals are not statistically meaningful. In practice you probably want to use something like HoeffdingConfidenceRadius.
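
For comparison, a Hoeffding-style radius can be sketched as a plain function of the sample count. This is only an illustration of the underlying bound, not the constructor or behavior of duelpy's HoeffdingConfidenceRadius, which is not shown in this section; the name `hoeffding_radius` and the parameter `failure_probability` are hypothetical.

```python
import math


def hoeffding_radius(num_samples: int, failure_probability: float = 0.05) -> float:
    """Two-sided Hoeffding radius for the mean of outcomes in [0, 1].

    Hypothetical helper for illustration; not part of duelpy.
    """
    if num_samples == 0:
        # No data yet: let the interval cover the whole range [0, 1].
        return 1.0
    # Solve 2 * exp(-2 * n * r**2) = failure_probability for r.
    return math.sqrt(math.log(2.0 / failure_probability) / (2.0 * num_samples))


# The radius shrinks proportionally to 1 / sqrt(num_samples).
for n in (4, 100, 10_000):
    print(n, round(hoeffding_radius(n), 4))
```

Unlike the illustrative radius above, this one is tied to a failure probability, so the resulting interval contains the true win probability with probability at least 1 - failure_probability for a fixed sample count.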

In the beginning, nothing is known yet.

>>> preference_estimate.get_mean_estimate_matrix()
array([[0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5]])
>>> preference_estimate.get_upper_estimate_matrix()
array([[0.5, 1. , 1. ],
       [1. , 0.5, 1. ],
       [1. , 1. , 0.5]])
>>> preference_estimate.get_lower_estimate_matrix()
array([[0.5, 0. , 0. ],
       [0. , 0.5, 0. ],
       [0. , 0. , 0.5]])

If we enter a sampled win, the estimated win probability of the winning arm against its opponent increases and the complementary probability decreases accordingly.

>>> preference_estimate.enter_sample(0, 1, first_won=True)
>>> preference_estimate.get_mean_estimate_matrix()
array([[0.5, 1. , 0.5],
       [0. , 0.5, 0.5],
       [0.5, 0.5, 0.5]])

When entering more samples, the probability keeps adjusting. Let’s make it three wins out of four.

>>> preference_estimate.enter_sample(0, 1, first_won=False)
>>> preference_estimate.enter_sample(0, 1, first_won=True)
>>> preference_estimate.enter_sample(0, 1, first_won=True)
>>> preference_estimate.get_mean_estimate_matrix()
array([[0.5 , 0.75, 0.5 ],
       [0.25, 0.5 , 0.5 ],
       [0.5 , 0.5 , 0.5 ]])

Meanwhile the confidence intervals have adjusted as well:

>>> preference_estimate.get_upper_estimate_matrix()
array([[0.5 , 0.95, 1.  ],
       [0.45, 0.5 , 1.  ],
       [1.  , 1.  , 0.5 ]])
>>> preference_estimate.get_lower_estimate_matrix()
array([[0.5 , 0.55, 0.  ],
       [0.05, 0.5 , 0.  ],
       [0.  , 0.  , 0.5 ]])

And if we tighten the confidence radius, they get changed yet again:

>>> preference_estimate.set_confidence_radius(lambda num_samples: 1/(6 * num_samples + 1))
>>> preference_estimate.get_upper_estimate_matrix()
array([[0.5 , 0.79, 1.  ],
       [0.29, 0.5 , 1.  ],
       [1.  , 1.  , 0.5 ]])
>>> preference_estimate.get_lower_estimate_matrix()
array([[0.5 , 0.71, 0.  ],
       [0.21, 0.5 , 0.  ],
       [0.  , 0.  , 0.5 ]])
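
The bounds in the outputs above are consistent with taking the mean estimate plus or minus the radius and clipping the result to [0, 1]. The following is a minimal sketch of that arithmetic, inferred from the example output rather than taken from the duelpy source; `confidence_interval` is a hypothetical helper.

```python
def confidence_interval(wins, losses, radius_fn):
    """Return (lower, upper) bounds for a win probability.

    Assumption: bounds are mean +/- radius, clipped to [0, 1].
    """
    num_samples = wins + losses
    # With no samples, fall back to the uninformed estimate 0.5.
    mean = wins / num_samples if num_samples > 0 else 0.5
    radius = radius_fn(num_samples)
    return max(0.0, mean - radius), min(1.0, mean + radius)


def loose(num_samples):
    return 1 / (num_samples + 1)


def tight(num_samples):
    return 1 / (6 * num_samples + 1)


# Three wins and one loss, as in the example above.
print(confidence_interval(3, 1, loose))  # roughly (0.55, 0.95)
print(confidence_interval(3, 1, tight))  # roughly (0.71, 0.79)
```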

We can now also sample a complete preference matrix from a beta distribution:

>>> preference_estimate.sample_preference_matrix(
...     random_state=np.random.RandomState(42)
... )
array([[0.5       , 0.72606244, 0.4978376 ],
       [0.27393756, 0.5       , 0.44364733],
       [0.5021624 , 0.55635267, 0.5       ]])

Parameters
  • num_arms – The number of arms in the estimated preference matrix.

  • confidence_radius – The confidence radius to use when computing confidence intervals.

set_confidence_radius(self, confidence_radius: duelpy.stats.confidence_radius.ConfidenceRadius) → None

Set the confidence radius to the given parameter.

Parameters

confidence_radius – The confidence radius to be set as the new confidence_radius.

enter_sample(self, first_arm_index: int, second_arm_index: int, first_won: bool) → None

Enter the result of a sampled duel.

Parameters
  • first_arm_index – The index of the first arm of the duel.

  • second_arm_index – The index of the second arm of the duel.

  • first_won – Whether the first arm won the duel.
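
The bookkeeping behind entering samples can be sketched with a matrix of win counts. This is an assumption about one reasonable implementation, not the duelpy source; the class `WinCounter` and its attribute names are hypothetical.

```python
class WinCounter:
    """Hypothetical stand-in that tracks duel outcomes per ordered arm pair."""

    def __init__(self, num_arms):
        # wins[i][j] counts how often arm i won against arm j.
        self.wins = [[0] * num_arms for _ in range(num_arms)]

    def enter_sample(self, first_arm_index, second_arm_index, first_won):
        if first_won:
            self.wins[first_arm_index][second_arm_index] += 1
        else:
            self.wins[second_arm_index][first_arm_index] += 1

    def get_num_samples(self, first_arm_index, second_arm_index):
        # Duels are counted regardless of the arm order.
        return (
            self.wins[first_arm_index][second_arm_index]
            + self.wins[second_arm_index][first_arm_index]
        )

    def get_mean_estimate(self, first_arm_index, second_arm_index):
        num_samples = self.get_num_samples(first_arm_index, second_arm_index)
        if num_samples == 0:
            # Nothing is known yet, so neither arm is favored.
            return 0.5
        return self.wins[first_arm_index][second_arm_index] / num_samples


counter = WinCounter(num_arms=3)
for first_won in (True, False, True, True):
    counter.enter_sample(0, 1, first_won=first_won)
print(counter.get_mean_estimate(0, 1))  # 0.75
```

Storing wins per ordered pair keeps the two directions complementary: the estimate for arm 1 against arm 0 is 0.25 here without any extra state.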

get_mean_estimate(self, first_arm_index: int, second_arm_index: int) → float

Get the estimate of the win probability of first_arm_index against second_arm_index.

Parameters
  • first_arm_index – The first arm of the duel.

  • second_arm_index – The second arm of the duel.

Returns

The estimated probability that first_arm_index wins against second_arm_index.

Return type

float

get_confidence_interval(self, first_arm_index: int, second_arm_index: int) → Tuple[float, float]

Get the bounds of the confidence interval on the win probability.

Parameters
  • first_arm_index – The first arm of the duel.

  • second_arm_index – The second arm of the duel.

Returns

The lower and upper bound of the confidence estimate for the probability that first_arm_index wins against second_arm_index.

Return type

Tuple[float, float]

get_upper_estimate(self, first_arm_index: int, second_arm_index: int) → float

Get the upper estimate of the win probability of first_arm_index against second_arm_index.

Parameters
  • first_arm_index – The first arm of the duel.

  • second_arm_index – The second arm of the duel.

Returns

The upper bound of the confidence estimate for the probability that first_arm_index wins against second_arm_index.

Return type

float

get_lower_estimate(self, first_arm_index: int, second_arm_index: int) → float

Get the lower estimate of the win probability of first_arm_index against second_arm_index.

Parameters
  • first_arm_index – The first arm of the duel.

  • second_arm_index – The second arm of the duel.

Returns

The lower bound of the confidence estimate for the probability that first_arm_index wins against second_arm_index.

Return type

float

get_num_samples(self, first_arm_index: int, second_arm_index: int) → int

Get the number of times a duel between first_arm_index and second_arm_index was sampled.

Parameters
  • first_arm_index – The first arm of the duel.

  • second_arm_index – The second arm of the duel.

Returns

The number of times a duel between the two arms was sampled, regardless of the arm order.

Return type

int

get_radius_matrix(self) → numpy.array

Seed the confidence radius cache and return it.

Returns

A numpy matrix containing the current confidence radius values.

Return type

np.array

get_mean_estimate_matrix(self) → duelpy.stats.preference_matrix.PreferenceMatrix

Get the current mean estimates as a PreferenceMatrix.

Returns

The current mean estimate.

Return type

PreferenceMatrix

get_upper_estimate_matrix(self) → duelpy.stats.preference_matrix.PreferenceMatrix

Get the current upper estimates as a PreferenceMatrix.

Returns

The current upper estimate.

Return type

PreferenceMatrix

get_lower_estimate_matrix(self) → duelpy.stats.preference_matrix.PreferenceMatrix

Get the current lower estimates as a PreferenceMatrix.

Returns

The current lower estimate.

Return type

PreferenceMatrix

get_pessimistic_copeland_score_estimates(self) → numpy.array

Get pessimistic estimates for every arm’s Copeland score.

This counts only wins whose probability is above 50% in the pessimistic (lower) estimate. Those wins are “certain”, assuming the confidence interval is correct.

get_optimistic_copeland_score_estimates(self) → numpy.array

Get optimistic estimates for every arm’s Copeland score.

This counts every win that is considered possible within the confidence interval.
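
The two Copeland score estimates can be illustrated by counting entries of the lower and upper estimate matrices. This sketch assumes unnormalized scores (the number of opponents beaten) and a strict comparison against 0.5 in both cases; whether duelpy normalizes the scores or treats ties differently is not stated here. The function names are hypothetical.

```python
def pessimistic_copeland_scores(lower):
    """Count, per arm, the opponents it beats even under the lower estimate."""
    num_arms = len(lower)
    return [
        sum(1 for j in range(num_arms) if j != i and lower[i][j] > 0.5)
        for i in range(num_arms)
    ]


def optimistic_copeland_scores(upper):
    """Count, per arm, the opponents it could still beat under the upper estimate."""
    num_arms = len(upper)
    return [
        sum(1 for j in range(num_arms) if j != i and upper[i][j] > 0.5)
        for i in range(num_arms)
    ]


# Bounds from the class example after tightening the confidence radius.
lower = [[0.5, 0.71, 0.0], [0.21, 0.5, 0.0], [0.0, 0.0, 0.5]]
upper = [[0.5, 0.79, 1.0], [0.29, 0.5, 1.0], [1.0, 1.0, 0.5]]

print(pessimistic_copeland_scores(lower))  # [1, 0, 0]
print(optimistic_copeland_scores(upper))   # [2, 1, 2]
```

Under these assumptions, arm 0 certainly beats arm 1, arm 1 can no longer beat arm 0, and every duel involving the unsampled arm 2 remains open.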

sample_preference_matrix(self, random_state: numpy.random.RandomState) → duelpy.stats.preference_matrix.PreferenceMatrix

Sample a preference matrix based on a Beta distribution.

The result is a PreferenceMatrix object initialized from a sampled preference matrix. In this matrix, each pairwise preference is drawn from a Beta distribution parameterized by the results of prior duels.

Parameters

random_state – A numpy random state.

Returns

A PreferenceMatrix object initialized from a preference matrix sampled from a Beta distribution.

Return type

PreferenceMatrix
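
The sampling step can be sketched with the standard library. The exact parameterization is an assumption: this version draws each pairwise probability from Beta(wins + 1, losses + 1), which corresponds to a uniform prior, and mirrors the draw so that the two directions sum to one. The function below is hypothetical and uses `random.Random` in place of a numpy random state.

```python
import random


def sample_preference_matrix(wins, rng):
    """Draw one preference matrix from per-pair Beta distributions.

    wins[i][j] is the number of duels arm i won against arm j.
    Assumption: Beta(wins + 1, losses + 1), i.e. a uniform prior.
    """
    num_arms = len(wins)
    matrix = [[0.5] * num_arms for _ in range(num_arms)]
    for i in range(num_arms):
        for j in range(i + 1, num_arms):
            probability = rng.betavariate(wins[i][j] + 1, wins[j][i] + 1)
            matrix[i][j] = probability
            # Keep the matrix consistent: p(j beats i) = 1 - p(i beats j).
            matrix[j][i] = 1.0 - probability
    return matrix


# Three wins for arm 0 and one win for arm 1, no duels involving arm 2.
wins = [[0, 3, 0], [1, 0, 0], [0, 0, 0]]
sampled = sample_preference_matrix(wins, random.Random(42))
for row in sampled:
    print([round(value, 3) for value in row])
```

Pairs without samples are drawn from Beta(1, 1), which is uniform on [0, 1], so their entries vary widely between draws, while well-sampled pairs concentrate around their mean estimate.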

__str__(self) → str

Produce a string representation of the estimate.