
Thompson sampling

What is Thompson sampling?

Thompson sampling is an algorithm that balances exploration and exploitation to choose the actions that maximize the rewards earned. It is also known as Probability Matching or Posterior Sampling.

Exploration essentially involves performing an action multiple times. The results of that exploration (which can be rewards or penalties) then help determine which actions to perform next, with the aim of maximizing the reward earned and improving future performance.

In Thompson sampling, exploration decreases as more information is gathered. This helps us get the most information in the fewest trials possible. The algorithm explores more when little data is available, and less when more data is at our disposal.

What is the Multi-Armed Bandit Problem?

The multi-armed bandit problem, though it may get you thinking about highwaymen, actually refers to multiple slot machines with varied payback percentages or a single slot machine with multiple arms, and each arm has a different payback percentage.

The goal is to create a policy for picking an action (in this problem, picking an action refers to choosing an arm to pull) at every step to maximize the total reward earned at the end of a pre-decided time period.

It is a Markov decision process (MDP) problem.
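A multi-armed bandit can be simulated in a few lines. The payback probabilities below are made up for illustration; in a real setting the player would not know them:

```python
import random

# Hypothetical payback probabilities for three slot-machine arms.
# These are unknown to the player, who only observes pull outcomes.
TRUE_PAYBACKS = [0.15, 0.30, 0.55]

def pull(arm: int) -> int:
    """Simulate pulling an arm: return 1 (reward) or 0 (penalty)."""
    return 1 if random.random() < TRUE_PAYBACKS[arm] else 0
```

The policy's job is to discover, purely from these 0/1 outcomes, that the third arm pays best, while wasting as few pulls as possible on the weaker arms.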

What is the intuition behind Thompson sampling in the multi-armed bandit problem?

  • Initially, all the slot machines are assumed to have the same uniform prior probability of yielding a reward.
  • Based on the reward or penalty observed after each action, the distribution of success probabilities for each machine is updated.
  • Based on these updated probabilities, further actions are chosen and further observations made.
  • After enough observations, each machine has a success distribution associated with it. This helps the player choose machines in the right way to earn the highest possible reward.
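The steps above can be sketched with the common Beta-Bernoulli formulation, where each machine's success probability gets a Beta posterior. The payback probabilities here are hypothetical, chosen only for illustration:

```python
import random

random.seed(42)  # for reproducibility

# Hypothetical payback probabilities, unknown to the algorithm.
true_paybacks = [0.15, 0.30, 0.55]
n_arms = len(true_paybacks)

# Step 1: start every machine from a uniform prior, Beta(1, 1).
successes = [1] * n_arms  # alpha parameters
failures = [1] * n_arms   # beta parameters

def choose_arm() -> int:
    # Steps 2-3: draw a success probability from each machine's current
    # posterior and play the machine with the highest draw.
    samples = [random.betavariate(successes[a], failures[a])
               for a in range(n_arms)]
    return max(range(n_arms), key=lambda a: samples[a])

def update(arm: int, reward: int) -> None:
    # Fold the observed reward or penalty back into the posterior.
    if reward:
        successes[arm] += 1
    else:
        failures[arm] += 1

total_reward = 0
for _ in range(5000):
    arm = choose_arm()
    reward = 1 if random.random() < true_paybacks[arm] else 0
    update(arm, reward)
    total_reward += reward
# Steps 4-5: after enough rounds the posteriors concentrate, and the
# best machine (index 2 here) is chosen almost every time.
```

Early on, all three posteriors are wide, so every arm gets sampled (exploration); as evidence accumulates, draws from the best arm's posterior win nearly every round (exploitation).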

What are the applications of Thompson sampling?

Thompson sampling can be used for a range of purposes. It can help predict stock prices, to some extent, based on currently available price data. It can be used to predict delays at traffic signals. On OTT platforms and eCommerce portals, it can help item-based recommender engines display the content or products that users are most likely to click on and watch or purchase. It can also assist in automating the transportation and delivery of items.


About Engati

Engati powers 45,000+ chatbot & live chat solutions in 50+ languages across the world.

We aim to empower you to create the best customer experiences you could imagine. 

So, are you ready to create unbelievably smooth experiences?

Check us out!

Thompson sampling

October 14, 2020
