
Utilities for the implementation of PB-MAB algorithms.

Package Contents



An estimation of a preference matrix based on samples.

class PreferenceEstimate(num_arms: int, confidence_radius: duelpy.stats.confidence_radius.ConfidenceRadius = TrivialConfidenceRadius())

An estimation of a preference matrix based on samples.

Consider this example:

>>> preference_estimate = PreferenceEstimate(
...     num_arms=3,
...     confidence_radius=lambda num_samples: 1/(num_samples + 1)
... )

We use a simple hand-written confidence radius for easy illustration. The resulting intervals are not statistically meaningful; in practice you probably want something like HoeffdingConfidenceRadius.

In the beginning, nothing is known yet.

>>> preference_estimate.get_mean_estimate_matrix()
array([[0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5]])
>>> preference_estimate.get_upper_estimate_matrix()
array([[0.5, 1. , 1. ],
       [1. , 0.5, 1. ],
       [1. , 1. , 0.5]])
>>> preference_estimate.get_lower_estimate_matrix()
array([[0.5, 0. , 0. ],
       [0. , 0.5, 0. ],
       [0. , 0. , 0.5]])

If we enter a sampled win, the estimated probability of that arm increases and the inverse probability decreases accordingly.

>>> preference_estimate.enter_sample(0, 1, first_won=True)
>>> preference_estimate.get_mean_estimate_matrix()
array([[0.5, 1. , 0.5],
       [0. , 0.5, 0.5],
       [0.5, 0.5, 0.5]])

When entering more samples, the probability keeps adjusting. Let’s make it three wins out of four.

>>> preference_estimate.enter_sample(0, 1, first_won=False)
>>> preference_estimate.enter_sample(0, 1, first_won=True)
>>> preference_estimate.enter_sample(0, 1, first_won=True)
>>> preference_estimate.get_mean_estimate_matrix()
array([[0.5 , 0.75, 0.5 ],
       [0.25, 0.5 , 0.5 ],
       [0.5 , 0.5 , 0.5 ]])

Meanwhile the confidence intervals have adjusted as well:

>>> preference_estimate.get_upper_estimate_matrix()
array([[0.5 , 0.95, 1.  ],
       [0.45, 0.5 , 1.  ],
       [1.  , 1.  , 0.5 ]])
>>> preference_estimate.get_lower_estimate_matrix()
array([[0.5 , 0.55, 0.  ],
       [0.05, 0.5 , 0.  ],
       [0.  , 0.  , 0.5 ]])

If we tighten the confidence radius, the intervals change yet again:

>>> preference_estimate.set_confidence_radius(lambda num_samples: 1/(6 * num_samples + 1))
>>> preference_estimate.get_upper_estimate_matrix()
array([[0.5 , 0.79, 1.  ],
       [0.29, 0.5 , 1.  ],
       [1.  , 1.  , 0.5 ]])
>>> preference_estimate.get_lower_estimate_matrix()
array([[0.5 , 0.71, 0.  ],
       [0.21, 0.5 , 0.  ],
       [0.  , 0.  , 0.5 ]])

We can now also sample a complete preference matrix from a beta distribution:

>>> preference_estimate.sample_preference_matrix(
...     random_state=np.random.RandomState(42)
... )
array([[0.5       , 0.72606244, 0.4978376 ],
       [0.27393756, 0.5       , 0.44364733],
       [0.5021624 , 0.55635267, 0.5       ]])

Parameters

  • num_arms – The number of arms in the estimated preference matrix.

  • confidence_radius – The confidence radius to use when computing confidence intervals.
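As a rough illustration of what a statistically motivated confidence radius might look like, the sketch below builds a Hoeffding-style radius as a plain callable, in the same way the class example passes a lambda. The helper name `make_hoeffding_radius` and its `failure_probability` parameter are hypothetical; duelpy's own HoeffdingConfidenceRadius may be parameterized differently.

```python
import math
from typing import Callable


def make_hoeffding_radius(failure_probability: float) -> Callable[[int], float]:
    """Return a callable mapping a sample count to a Hoeffding-style radius.

    Hypothetical helper for illustration, not part of duelpy.
    """

    def radius(num_samples: int) -> float:
        if num_samples == 0:
            # Without samples nothing is known, so the interval spans [0, 1].
            return 1.0
        # Two-sided Hoeffding bound: P(|mean - p| >= r) <= 2 * exp(-2 * n * r^2).
        return math.sqrt(math.log(2 / failure_probability) / (2 * num_samples))

    return radius


radius = make_hoeffding_radius(failure_probability=0.05)
few, many = radius(10), radius(1000)  # the radius shrinks as samples accumulate
```

Unlike the toy lambda in the class example, this radius shrinks at a rate of one over the square root of the number of samples, which is what Hoeffding's inequality supports.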

set_confidence_radius(self, confidence_radius: duelpy.stats.confidence_radius.ConfidenceRadius) -> None

Set the confidence radius to the given parameter.

Parameters

confidence_radius – The new confidence radius.

enter_sample(self, first_arm_index: int, second_arm_index: int, first_won: bool) -> None

Enter the result of a sampled duel.

Parameters

  • first_arm_index – The index of the first arm of the duel.

  • second_arm_index – The index of the second arm of the duel.

  • first_won – Whether the first arm won the duel.
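The bookkeeping behind entering samples can be sketched with a plain win-count table. This is a minimal illustration under the assumption that only pairwise win counts are stored; the class name `DuelCounts` is hypothetical and this is not duelpy's actual implementation.

```python
from typing import List


class DuelCounts:
    """Hypothetical win-count bookkeeping, not duelpy's implementation."""

    def __init__(self, num_arms: int) -> None:
        # wins[i][j] counts how often arm i won a duel against arm j.
        self.wins: List[List[int]] = [[0] * num_arms for _ in range(num_arms)]

    def enter_sample(
        self, first_arm_index: int, second_arm_index: int, first_won: bool
    ) -> None:
        if first_won:
            self.wins[first_arm_index][second_arm_index] += 1
        else:
            self.wins[second_arm_index][first_arm_index] += 1

    def get_num_samples(self, first_arm_index: int, second_arm_index: int) -> int:
        # Duels are counted regardless of the arm order.
        return (
            self.wins[first_arm_index][second_arm_index]
            + self.wins[second_arm_index][first_arm_index]
        )

    def get_mean_estimate(self, first_arm_index: int, second_arm_index: int) -> float:
        num_samples = self.get_num_samples(first_arm_index, second_arm_index)
        if num_samples == 0:
            # No information yet: fall back to an even chance.
            return 0.5
        return self.wins[first_arm_index][second_arm_index] / num_samples


counts = DuelCounts(num_arms=3)
for first_won in (True, False, True, True):
    counts.enter_sample(0, 1, first_won=first_won)
```

After these four duels, `counts.get_mean_estimate(0, 1)` is 0.75 and `counts.get_mean_estimate(1, 0)` is 0.25, matching the mean estimate matrix in the class example.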

get_mean_estimate(self, first_arm_index: int, second_arm_index: int) -> float

Get the estimate of the win probability of first_arm_index against second_arm_index.

Parameters

  • first_arm_index – The first arm of the duel.

  • second_arm_index – The second arm of the duel.

Returns

The estimated probability that first_arm_index wins against second_arm_index.

Return type

float

get_confidence_interval(self, first_arm_index: int, second_arm_index: int) -> Tuple[float, float]

Get the bounds of the confidence interval on the win probability.

Parameters

  • first_arm_index – The first arm of the duel.

  • second_arm_index – The second arm of the duel.

Returns

The lower and upper bounds of the confidence interval for the probability that first_arm_index wins against second_arm_index.

Return type

Tuple[float, float]
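The matrices in the class example are consistent with an interval of the form "mean estimate plus or minus the confidence radius at the current sample count, clipped to [0, 1]". The sketch below reproduces that arithmetic; the function `confidence_interval` is a hypothetical standalone helper, and the clipping rule is inferred from the example output rather than taken from the library source.

```python
from typing import Callable, Tuple


def confidence_interval(
    mean_estimate: float,
    num_samples: int,
    confidence_radius: Callable[[int], float],
) -> Tuple[float, float]:
    """Hypothetical helper: mean plus/minus radius, clipped to [0, 1]."""
    radius = confidence_radius(num_samples)
    lower = max(0.0, mean_estimate - radius)
    upper = min(1.0, mean_estimate + radius)
    return lower, upper


def toy_radius(num_samples: int) -> float:
    # The same toy radius as in the class example.
    return 1 / (num_samples + 1)


# Three wins out of four duels: 0.75 plus/minus 0.2, roughly (0.55, 0.95).
lower, upper = confidence_interval(0.75, 4, toy_radius)
# No samples yet: the interval covers every probability.
empty_lower, empty_upper = confidence_interval(0.5, 0, toy_radius)
```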

get_upper_estimate(self, first_arm_index: int, second_arm_index: int) -> float

Get the upper estimate of the win probability of first_arm_index against second_arm_index.

Parameters

  • first_arm_index – The first arm of the duel.

  • second_arm_index – The second arm of the duel.

Returns

The upper bound of the confidence interval for the probability that first_arm_index wins against second_arm_index.

Return type

float

get_lower_estimate(self, first_arm_index: int, second_arm_index: int) -> float

Get the lower estimate of the win probability of first_arm_index against second_arm_index.

Parameters

  • first_arm_index – The first arm of the duel.

  • second_arm_index – The second arm of the duel.

Returns

The lower bound of the confidence interval for the probability that first_arm_index wins against second_arm_index.

Return type

float

get_num_samples(self, first_arm_index: int, second_arm_index: int) -> int

Get the number of times a duel between first_arm_index and second_arm_index was sampled.

Parameters

  • first_arm_index – The first arm of the duel.

  • second_arm_index – The second arm of the duel.

Returns

The number of times a duel between the two arms was sampled, regardless of the arm order.

Return type

int


Seed the confidence radius cache and return it.

Returns

A numpy matrix containing the current confidence radius values.

Return type



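get_mean_estimate_matrix(self) -> duelpy.stats.preference_matrix.PreferenceMatrix

Get the current mean estimates as a PreferenceMatrix.

Returns

The current mean estimate.

Return type

duelpy.stats.preference_matrix.PreferenceMatrix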


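get_upper_estimate_matrix(self) -> duelpy.stats.preference_matrix.PreferenceMatrix

Get the current upper estimates as a PreferenceMatrix.

Returns

The current upper estimate.

Return type

duelpy.stats.preference_matrix.PreferenceMatrix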


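get_lower_estimate_matrix(self) -> duelpy.stats.preference_matrix.PreferenceMatrix

Get the current lower estimates as a PreferenceMatrix.

Returns

The current lower estimate.

Return type

duelpy.stats.preference_matrix.PreferenceMatrix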


Get pessimistic estimates for every arm’s Copeland score.

This only counts wins that have a probability above 50% in the pessimistic (lower) estimate. Those wins are “certain”, assuming the confidence interval is correct.


Get optimistic estimates for every arm’s Copeland score.

This counts every win that is considered possible within the confidence interval.
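Both Copeland score estimates can be sketched as the same counting rule applied to different matrices: the lower estimates for the pessimistic score and the upper estimates for the optimistic one. The helper `copeland_scores` is hypothetical, and returning an unnormalized count with a strict "above 0.5" threshold is an assumption; the library may normalize or treat ties differently.

```python
from typing import List, Sequence


def copeland_scores(estimates: Sequence[Sequence[float]]) -> List[int]:
    """Count, for every arm, the opponents it beats with probability above 0.5.

    Hypothetical helper; the unnormalized count is an assumption.
    """
    num_arms = len(estimates)
    return [
        sum(
            1
            for opponent in range(num_arms)
            if opponent != arm and estimates[arm][opponent] > 0.5
        )
        for arm in range(num_arms)
    ]


# Lower and upper estimate matrices taken from the class example.
lower = [[0.5, 0.71, 0.0], [0.21, 0.5, 0.0], [0.0, 0.0, 0.5]]
upper = [[0.5, 0.79, 1.0], [0.29, 0.5, 1.0], [1.0, 1.0, 0.5]]

pessimistic = copeland_scores(lower)  # only wins that are certain
optimistic = copeland_scores(upper)   # every win that is still possible
```

For these matrices the pessimistic scores are `[1, 0, 0]` and the optimistic scores are `[2, 1, 2]`: only the win of arm 0 over arm 1 is certain, while every duel involving the unsampled arm 2 could still go either way.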

sample_preference_matrix(self, random_state: numpy.random.RandomState) -> duelpy.stats.preference_matrix.PreferenceMatrix

Sample a preference matrix based on a Beta distribution.

The result is a PreferenceMatrix object initialized from a sampled preference matrix. Each pairwise preference in that matrix is drawn from a Beta distribution parameterized by the results of prior duels.

Parameters

random_state – A numpy random state.

Returns

A PreferenceMatrix object initialized from a preference matrix sampled from a Beta distribution.

Return type

duelpy.stats.preference_matrix.PreferenceMatrix
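The sampling step can be sketched with the standard library alone. This is a minimal illustration, assuming each pair is drawn from Beta(wins + 1, losses + 1) and the reverse entry is set to the complement; the exact Beta parameters duelpy uses are not stated here, and the standalone function below is hypothetical.

```python
import random
from typing import List


def sample_preference_matrix(
    wins: List[List[int]], rng: random.Random
) -> List[List[float]]:
    """Draw each pairwise preference from a Beta distribution.

    Hypothetical sketch: the Beta(wins + 1, losses + 1) parameters are an assumption.
    """
    num_arms = len(wins)
    matrix = [[0.5] * num_arms for _ in range(num_arms)]
    for i in range(num_arms):
        for j in range(i + 1, num_arms):
            # wins[i][j] is the number of duels arm i won against arm j.
            matrix[i][j] = rng.betavariate(wins[i][j] + 1, wins[j][i] + 1)
            # Keep the matrix consistent: P(j beats i) = 1 - P(i beats j).
            matrix[j][i] = 1.0 - matrix[i][j]
    return matrix


# Arm 0 won three of four duels against arm 1; no other pair was sampled.
wins = [[0, 3, 0], [1, 0, 0], [0, 0, 0]]
sampled = sample_preference_matrix(wins, random.Random(42))
```

As in the class example, the diagonal stays at 0.5 and mirrored entries sum to one. Pairs without samples are drawn from Beta(1, 1), which is the uniform distribution on [0, 1].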



Produce a string representation of the estimate.