Code&Data Insights


[Machine Learning] Reinforcement Learning - Upper Confidence Bound(UCB) | Thompson Sampling

paka_corn 2023. 6. 20. 09:47

[ Reinforcement Learning ]

 

Reinforcement Learning : a field of machine learning in which a program, called an agent, improves gradually through experience while performing tasks in a given environment.

 

 

=> The agent interacts with the environment: it observes the current state, selects and executes an action, and receives a reward.

 

=> During learning, the agent tries different actions that might lead to rewards and adjusts its behavior based on the reward feedback, improving step by step.
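The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the post: the `Corridor` environment is a made-up toy (positions 0 to 4, reward 1 for reaching the right end), and the update rule is tabular Q-learning with epsilon-greedy action selection, which is one common choice among many.

```python
import random


class Corridor:
    """Hypothetical toy environment: positions 0..4, reward 1.0 on reaching position 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; the walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning (an assumed learning rule, chosen for illustration)."""
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.length)]  # q[state][action]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Explore with probability epsilon (or on ties), otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Adjust the estimate toward the reward plus the discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action for every non-terminal position (1 means "move right").
greedy_policy = [0 if left > right else 1 for left, right in q_table[:-1]]
print(greedy_policy)
```

Each pass through the inner loop is one round of "observe state, act, receive reward, adjust", which is the cycle the agent repeats until its behavior stops improving.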

 

 

 

 

[ Upper Confidence Bound ] 

Upper Confidence Bound(UCB) : an algorithm that balances exploration and exploitation. For each action it keeps an estimate of the average reward plus a bonus that grows with the uncertainty of that estimate (actions tried only a few times get a larger bonus), and it selects the action whose upper confidence bound is highest.

 

=> Deterministic Algorithm 

=> Requires an update at every round (difference between UCB and Thompson Sampling) 

=> Used in click prediction, internet search engines, and recommendation systems.
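As a rough sketch of how this works in practice, the code below simulates a small click-style problem with the UCB1 variant. The post does not say which UCB variant it means, so the bonus term `sqrt(2 * ln(t) / n)` is an assumption, and the function name `ucb1` and the click rates are made up for the example.

```python
import math
import random


def ucb1(true_rates, rounds, seed=0):
    """Simulate UCB1 on Bernoulli arms; returns how often each arm was played.

    true_rates are hypothetical click probabilities, used only to simulate rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    counts = [0] * n_arms   # number of times each arm was selected
    sums = [0.0] * n_arms   # total reward collected from each arm
    for t in range(1, rounds + 1):
        if t <= n_arms:
            # Play every arm once so each average is defined.
            arm = t - 1
        else:
            # Average reward plus an uncertainty bonus that shrinks with more plays.
            scores = [
                sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
                for i in range(n_arms)
            ]
            arm = max(range(n_arms), key=scores.__getitem__)
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        # UCB updates its statistics immediately, at every round.
        counts[arm] += 1
        sums[arm] += reward
    return counts


plays = ucb1([0.1, 0.3, 0.8], rounds=2000)
print(plays)
```

Given the same history of rewards, the selection step always picks the same arm, which is why UCB is called deterministic. The only randomness here comes from the simulated rewards.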

 

 

 

 

 

[ Thompson Sampling ]

Thompson Sampling : a reinforcement learning algorithm that balances exploration and exploitation while searching for the best action. It maintains a probability distribution over each action's expected reward, draws a sample from each distribution, and selects the action with the highest sample.

 

=> Probabilistic Algorithm 

=> Can accommodate delayed feedback (difference between UCB and Thompson Sampling) 

 

=> Because actions are chosen from samples rather than point estimates, the uncertainty of the value estimates drives exploration: poorly known actions are still tried occasionally, and the choices concentrate on the best action as the distributions narrow.
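A minimal sketch of Thompson Sampling for yes/no rewards (for example, click or no click) is shown below. It assumes a Beta distribution per arm starting from a uniform Beta(1, 1) prior, which is a common setup but not something the post specifies. The `batch_size` parameter is a hypothetical addition to imitate delayed feedback: results are held back and applied to the distributions only once a full batch has arrived.

```python
import random


def thompson_sampling(true_rates, rounds, batch_size=1, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling; returns plays per arm.

    true_rates are hypothetical success probabilities, used only to simulate rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    wins = [0] * n_arms     # observed successes per arm
    losses = [0] * n_arms   # observed failures per arm
    counts = [0] * n_arms   # number of times each arm was selected
    pending = []            # feedback that has not been applied yet
    for _ in range(rounds):
        # Draw one sample from each arm's Beta posterior and play the best sample.
        samples = [
            rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(n_arms)
        ]
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_rates[arm] else 0
        counts[arm] += 1
        pending.append((arm, reward))
        # Delayed feedback: update the posteriors only when a batch is complete.
        if len(pending) >= batch_size:
            for played_arm, result in pending:
                wins[played_arm] += result
                losses[played_arm] += 1 - result
            pending.clear()
    return counts


plays = thompson_sampling([0.1, 0.3, 0.8], rounds=2000, batch_size=50)
print(plays)
```

Even while updates are held back, the random sampling keeps spreading plays across arms, so the algorithm continues to explore between batches. That is the property behind the "can accommodate delayed feedback" point above.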

 
