A comprehensive Python library implementing a variety of contextual and non-contextual multi-armed bandit algorithms—including LinUCB, Epsilon-Greedy, Upper Confidence Bound (UCB), Thompson Sampling, KernelUCB, NeuralLinearBandit, and DecisionTreeBandit—designed for reinforcement learning applications that require decision-making under uncertainty with or without contextual information.
-
Contextual Algorithms:
- LinUCB: Balances exploration and exploitation using linear regression with upper confidence bounds.
- Epsilon-Greedy: Explores randomly with probability epsilon and exploits the best-known option otherwise.
- KernelUCB: Uses kernel methods to capture non-linear relationships in the context space.
- NeuralLinearBandit: Combines neural networks for feature extraction with linear models for prediction.
- DecisionTreeBandit: Employs decision trees to model complex relationships between context and rewards.
-
Non-Contextual Algorithms:
- Upper Confidence Bound (UCB): Selects arms based on upper confidence bounds of estimated rewards.
- Thompson Sampling: Uses Bayesian methods to balance exploration and exploitation.
Install from PyPI
pip install contextual-bandits-algos
-
Clone the Repository
git clone https://github.com/singhsidhukuldeep/contextual-bandits.git
cd contextual-bandits
-
Install the Requirements
pip install -r requirements.txt
-
Install the Package
pip install .
-
Run the Examples
python examples/example_linucb.py
python examples/example_epsilon_greedy.py
python examples/example_ucb.py
python examples/example_thompson_sampling.py
python examples/example_kernelucb.py
python examples/example_neurallinear.py
python examples/example_tree_bandit.py
To run multiple algorithms in a single script:
python examples/example_usage.py
Run all example *.py files
cd examples
for f in *.py; do echo "$f"; python "$f"; done
Below are detailed descriptions of each algorithm, indicating whether it is contextual or non-contextual and how it works, along with illustrative example sketches.
LinUCB
- Type: Contextual
- Description: The LinUCB algorithm is a contextual bandit algorithm that uses linear regression to predict the expected reward for each arm given the current context. It balances exploration and exploitation by adding an upper confidence bound to the estimated rewards (a sketch follows this list).
- Model: Assumes that the reward is a linear function of the context features.
- Exploration: Incorporates uncertainty in the estimation by adding a confidence interval (scaled by `alpha`).
- Exploitation: Chooses the arm with the highest upper confidence bound.
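The packaged script `examples/example_linucb.py` shows the library's own usage; as an independent illustration, here is a minimal NumPy sketch of the disjoint LinUCB update against a synthetic linear reward model (the `alpha`, dimensions, and simulator values are assumptions made for the demo):

```python
# Minimal disjoint LinUCB sketch (illustrative only, not the package API).
import numpy as np

rng = np.random.default_rng(0)
n_arms, d, alpha, T = 3, 5, 1.0, 2000

# Hidden per-arm weights, used only to simulate rewards.
true_theta = rng.normal(size=(n_arms, d))

# Per-arm statistics: A = I + sum(x x^T), b = sum(r * x).
A = np.stack([np.eye(d) for _ in range(n_arms)])
b = np.zeros((n_arms, d))

total_reward = 0.0
for t in range(T):
    x = rng.normal(size=d)                        # observed context
    scores = np.empty(n_arms)
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        theta_hat = A_inv @ b[a]                  # ridge estimate of the arm weights
        bonus = alpha * np.sqrt(x @ A_inv @ x)    # upper-confidence exploration term
        scores[a] = theta_hat @ x + bonus
    a = int(np.argmax(scores))                    # play the optimistic arm
    r = true_theta[a] @ x + rng.normal(scale=0.1) # simulated reward
    A[a] += np.outer(x, x)                        # rank-one update of statistics
    b[a] += r * x
    total_reward += r

print(f"average reward: {total_reward / T:.3f}")
```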
Epsilon-Greedy
- Type: Contextual
- Description: The Epsilon-Greedy algorithm selects a random arm with probability `epsilon` (exploration) and the best-known arm with probability `1 - epsilon` (exploitation). It uses the context to predict rewards for each arm (see the sketch after this list).
- Model: Uses linear models or other estimators to predict rewards based on context.
- Exploration: With probability `epsilon`, selects a random arm.
- Exploitation: With probability `1 - epsilon`, selects the arm with the highest predicted reward.
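Independent of the package's API (see `examples/example_epsilon_greedy.py` for that), here is a short sketch of contextual epsilon-greedy in which each arm keeps a ridge-style linear reward estimate; `epsilon` and the synthetic environment are assumptions for illustration:

```python
# Minimal contextual epsilon-greedy sketch (illustrative only, not the package API).
import numpy as np

rng = np.random.default_rng(1)
n_arms, d, epsilon, T = 3, 5, 0.1, 2000

true_theta = rng.normal(size=(n_arms, d))          # hidden simulator weights
A = np.stack([np.eye(d) for _ in range(n_arms)])   # I + sum(x x^T) per arm
b = np.zeros((n_arms, d))                          # sum(r * x) per arm

total_reward = 0.0
for t in range(T):
    x = rng.normal(size=d)
    if rng.random() < epsilon:                     # explore: random arm
        a = int(rng.integers(n_arms))
    else:                                          # exploit: best predicted reward
        preds = [np.linalg.solve(A[k], b[k]) @ x for k in range(n_arms)]
        a = int(np.argmax(preds))
    r = true_theta[a] @ x + rng.normal(scale=0.1)  # simulated reward
    A[a] += np.outer(x, x)
    b[a] += r * x
    total_reward += r

print(f"average reward: {total_reward / T:.3f}")
```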
Upper Confidence Bound (UCB)
- Type: Non-Contextual
- Description: The UCB algorithm selects arms based on upper confidence bounds of the estimated rewards, without considering any context. It is suitable when no contextual information is available (a sketch follows this list).
- Model: Estimates the average reward for each arm.
- Exploration: Adds a confidence term to the average reward to explore less-tried arms.
- Exploitation: Chooses the arm with the highest upper confidence bound.
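A minimal sketch of the classic UCB1 rule on simulated Bernoulli arms (the arm means and horizon are assumptions for the demo; the library's own usage is in `examples/example_ucb.py`):

```python
# Minimal UCB1 sketch (illustrative only, not the package API).
import numpy as np

rng = np.random.default_rng(2)
true_means = np.array([0.2, 0.5, 0.7])          # assumed Bernoulli arm means
n_arms, T = len(true_means), 5000

counts = np.zeros(n_arms)                       # pulls per arm
means = np.zeros(n_arms)                        # empirical mean reward per arm

for t in range(1, T + 1):
    if t <= n_arms:
        a = t - 1                               # play each arm once to initialize
    else:
        bonus = np.sqrt(2.0 * np.log(t) / counts)
        a = int(np.argmax(means + bonus))       # optimism in the face of uncertainty
    r = float(rng.random() < true_means[a])     # Bernoulli reward
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]      # incremental mean update

print("pull counts:", counts.astype(int), "estimated means:", means.round(2))
```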
Thompson Sampling
- Type: Non-Contextual
- Description: Thompson Sampling is a Bayesian algorithm that selects arms based on samples drawn from the posterior distributions of each arm's reward probability (see the sketch below).
- Model: Assumes Bernoulli-distributed rewards for each arm.
- Exploration & Exploitation: Balances both by sampling from the posterior distributions.
- Updates: Updates the posterior distributions based on observed rewards.
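A minimal sketch of Bernoulli Thompson Sampling with Beta posteriors on simulated arms (assumed values throughout; the library's own usage lives in `examples/example_thompson_sampling.py`):

```python
# Minimal Bernoulli Thompson Sampling sketch (illustrative only, not the package API).
import numpy as np

rng = np.random.default_rng(3)
true_means = np.array([0.2, 0.5, 0.7])          # assumed Bernoulli arm means
n_arms, T = len(true_means), 5000

alpha = np.ones(n_arms)                         # Beta prior: 1 + observed successes
beta = np.ones(n_arms)                          # Beta prior: 1 + observed failures

for t in range(T):
    samples = rng.beta(alpha, beta)             # one posterior draw per arm
    a = int(np.argmax(samples))                 # play the arm with the best draw
    r = float(rng.random() < true_means[a])     # Bernoulli reward
    alpha[a] += r                               # posterior update
    beta[a] += 1.0 - r

print("posterior means:", (alpha / (alpha + beta)).round(2))
```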
KernelUCB
- Type: Contextual
- Description: KernelUCB uses kernel methods to capture non-linear relationships between contexts and rewards. It extends the UCB algorithm to a kernelized context space (a sketch follows this list).
- Model: Uses a kernel function (e.g., RBF kernel) to compute similarity between contexts.
- Exploration: Adds an exploration term based on the uncertainty in the kernel space.
- Exploitation: Predicts the expected reward using kernel regression.
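As an illustration rather than the package's API (`examples/example_kernelucb.py` covers that), here is a per-arm kernel-ridge/GP-style sketch with an RBF kernel; the kernel width, `eta`, `lam`, and the non-linear simulator are all assumptions for the demo:

```python
# Minimal per-arm kernelized UCB sketch with an RBF kernel (illustrative only).
import numpy as np

rng = np.random.default_rng(4)
n_arms, d, eta, lam, T = 3, 2, 1.0, 1.0, 300

def rbf(X, Y, gamma=1.0):
    # Pairwise RBF kernel between rows of X and rows of Y.
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# Hidden non-linear reward functions, one per arm (simulation only).
centers = rng.normal(size=(n_arms, d))
def true_reward(a, x):
    return np.exp(-np.sum((x - centers[a]) ** 2)) + rng.normal(scale=0.05)

data = [([], []) for _ in range(n_arms)]        # (contexts, rewards) per arm

for t in range(T):
    x = rng.uniform(-1, 1, size=d)
    scores = np.empty(n_arms)
    for a in range(n_arms):
        X, y = data[a]
        if not X:                               # unplayed arm: force exploration
            scores[a] = np.inf
            continue
        X, y = np.asarray(X), np.asarray(y)
        K = rbf(X, X) + lam * np.eye(len(X))    # regularized kernel matrix
        k = rbf(X, x[None, :])[:, 0]            # similarities to the new context
        K_inv_k = np.linalg.solve(K, k)
        mean = y @ K_inv_k                      # kernel-ridge reward prediction
        var = max(1.0 - k @ K_inv_k, 0.0)       # GP-style predictive uncertainty
        scores[a] = mean + eta * np.sqrt(var)
    a = int(np.argmax(scores))
    r = true_reward(a, x)
    data[a][0].append(x)
    data[a][1].append(r)

print("pulls per arm:", [len(X) for X, _ in data])
```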
NeuralLinearBandit
- Type: Contextual
- Description: NeuralLinearBandit uses a neural network to learn a representation of the context and then applies linear regression on the learned features (see the sketch after this list).
- Model: Combines a neural network for feature extraction with a linear model for reward prediction.
- Exploration: Adds an exploration bonus based on the uncertainty of the linear model.
- Exploitation: Uses the predicted rewards from the linear model.
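In the full algorithm the feature extractor is a trained neural network; to keep this illustration short and dependency-free, the sketch below substitutes a fixed random two-layer tanh map and runs a LinUCB-style linear model on the resulting features (see `examples/example_neurallinear.py` for the library's own usage; all parameter values here are assumptions):

```python
# Simplified neural-linear sketch (illustrative only, not the package API).
# A trained network would normally produce the features; a frozen random
# two-layer tanh map stands in for it, and LinUCB runs on the mapped features.
import numpy as np

rng = np.random.default_rng(5)
n_arms, d, hidden, alpha, T = 3, 5, 16, 1.0, 2000

W1 = rng.normal(size=(hidden, d)) / np.sqrt(d)        # frozen "network" weights
W2 = rng.normal(size=(hidden, hidden)) / np.sqrt(hidden)

def features(x):
    return np.tanh(W2 @ np.tanh(W1 @ x))              # stand-in feature extractor

true_theta = rng.normal(size=(n_arms, hidden))        # hidden simulator weights
A = np.stack([np.eye(hidden) for _ in range(n_arms)]) # linear-head statistics
b = np.zeros((n_arms, hidden))

total_reward = 0.0
for t in range(T):
    z = features(rng.normal(size=d))                  # context -> feature vector
    scores = np.empty(n_arms)
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        theta_hat = A_inv @ b[a]
        scores[a] = theta_hat @ z + alpha * np.sqrt(z @ A_inv @ z)
    a = int(np.argmax(scores))
    r = true_theta[a] @ z + rng.normal(scale=0.1)     # simulated reward
    A[a] += np.outer(z, z)
    b[a] += r * z
    total_reward += r

print(f"average reward: {total_reward / T:.3f}")
```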
DecisionTreeBandit
- Type: Contextual
- Description: The DecisionTreeBandit algorithm uses decision trees to model the relationship between context and rewards, allowing it to capture non-linear patterns (a sketch follows this list).
- Model: Fits a decision tree regressor for each arm based on the observed contexts and rewards.
- Exploration: Relies on the decision tree's predictions; may require additional mechanisms for exploration.
- Exploitation: Selects the arm with the highest predicted reward.
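A minimal sketch using one scikit-learn `DecisionTreeRegressor` per arm; the epsilon-greedy exploration step and the periodic refitting schedule are assumptions added for the demo, in line with the note above that the tree's predictions alone may not explore enough (the library's usage is in `examples/example_tree_bandit.py`):

```python
# Minimal decision-tree bandit sketch (illustrative only, not the package API).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(6)
n_arms, d, epsilon, refit_every, T = 3, 4, 0.1, 50, 1500

def true_reward(a, x):
    # Non-linear simulator: each arm responds to a different context coordinate.
    return float(np.sin(3 * x[a]) + rng.normal(scale=0.1))

X_hist = [[] for _ in range(n_arms)]            # contexts seen per arm
y_hist = [[] for _ in range(n_arms)]            # rewards seen per arm
models = [None] * n_arms                        # one tree regressor per arm

for t in range(T):
    x = rng.uniform(-2, 2, size=d)
    if rng.random() < epsilon or any(m is None for m in models):
        a = int(rng.integers(n_arms))           # explore / warm up unplayed arms
    else:
        preds = [m.predict(x[None, :])[0] for m in models]
        a = int(np.argmax(preds))               # exploit the trees' predictions
    r = true_reward(a, x)
    X_hist[a].append(x)
    y_hist[a].append(r)
    # Refit the chosen arm's tree on its accumulated data every few pulls.
    if models[a] is None or len(y_hist[a]) % refit_every == 0:
        models[a] = DecisionTreeRegressor(max_depth=5).fit(
            np.asarray(X_hist[a]), np.asarray(y_hist[a]))

print("observations per arm:", [len(y) for y in y_hist])
```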