Code&Data Insights
[Machine Learning] Reinforcement Learning - Upper Confidence Bound(UCB) | Thompson Sampling
paka_corn 2023. 6. 20. 09:47
[ Reinforcement Learning ]
Reinforcement Learning : a field of machine learning in which a computer program, known as an agent, learns from experience and gradually improves while performing tasks in a specific environment.
=> The agent interacts with the environment: it perceives the current state, selects and executes an action, and receives a reward.
=> During learning, the agent tries different actions that could lead to rewards and adjusts its behavior based on the reward feedback, improving incrementally.
[ Upper Confidence Bound ]
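Upper Confidence Bound (UCB) : an algorithm that balances exploration and exploitation by taking uncertainty into account. For each action it keeps an estimate of the average reward plus a confidence margin that shrinks as the action is tried more often, and it selects the action whose upper bound (estimate plus margin) is highest.
=> Deterministic Algorithm: given the same history of rewards, it always selects the same action.
=> Requires an update at every round, because the next choice depends on the statistics from the latest reward (a difference from Thompson Sampling)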
=> Used in click prediction, internet search engines, and recommendation systems.
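The selection rule described above can be sketched in a few lines. This is only a minimal illustration, assuming the common UCB1 form of the bound (average reward plus sqrt(2 ln t / n)) and a simulated ad-click setting; the names `ucb1_select`, `run_ucb1`, and `click_probs` are made up for this example and do not come from any library.

```python
import math
import random

def ucb1_select(counts, sums, t):
    """Return the arm with the highest upper confidence bound at round t (1-based)."""
    # Play every arm once first, since the bound is undefined for n = 0.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    # No randomness here: the same counts and sums always give the same arm.
    return max(range(len(scores)), key=scores.__getitem__)

def run_ucb1(click_probs, rounds, seed=0):
    """Simulate UCB1 on arms with hypothetical click probabilities."""
    rng = random.Random(seed)
    counts = [0] * len(click_probs)   # how often each arm was played
    sums = [0.0] * len(click_probs)   # total reward collected per arm
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, sums, t)
        reward = 1.0 if rng.random() < click_probs[arm] else 0.0
        # The statistics are updated right away, before the next selection.
        counts[arm] += 1
        sums[arm] += reward
    return counts

if __name__ == "__main__":
    print(run_ucb1([0.1, 0.9], rounds=2000))
```

The returned list shows how many times each arm was played; the arm with the higher click probability should receive most of the plays once its bound has tightened.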
[ Thompson Sampling ]
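Thompson Sampling : a reinforcement learning algorithm that balances exploration and exploitation while seeking the optimal action. It maintains a probability distribution over each action's expected reward, draws a sample from each distribution, and selects the action with the highest sample.
=> Probabilistic Algorithm: the same history can lead to different choices, because the selection depends on random samples.
=> Can accommodate delayed feedback: the choices still vary between updates, so the distributions can be updated in batches rather than after every round (a difference from UCB)
=> In an uncertain environment, the spread of each distribution reflects the uncertainty of the value estimate, so uncertain actions still get sampled (exploration) while actions with confidently high estimates are chosen more often (exploitation).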
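A minimal sketch of the idea, assuming 0/1 rewards (for example, click or no click) and a Beta distribution per action starting from a uniform Beta(1, 1) prior. The `batch_size` parameter is a hypothetical way to imitate delayed feedback: rewards are held back and applied only once a batch is full. All function and variable names are illustrative.

```python
import random

def thompson_select(successes, failures, rng):
    """Draw one sample per arm from its Beta posterior and pick the largest."""
    samples = [
        rng.betavariate(s + 1, f + 1)  # Beta(1, 1) prior plus observed counts
        for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=samples.__getitem__)

def run_thompson(click_probs, rounds, batch_size=1, seed=0):
    """Simulate Thompson Sampling with feedback applied in batches."""
    rng = random.Random(seed)
    k = len(click_probs)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    pending = []  # (arm, reward) pairs whose feedback has not arrived yet
    for _ in range(rounds):
        arm = thompson_select(successes, failures, rng)
        pulls[arm] += 1
        reward = rng.random() < click_probs[arm]
        pending.append((arm, reward))
        # Delayed feedback: update the posteriors only when the batch is full.
        if len(pending) >= batch_size:
            for a, r in pending:
                if r:
                    successes[a] += 1
                else:
                    failures[a] += 1
            pending.clear()
    return pulls

if __name__ == "__main__":
    print(run_thompson([0.1, 0.9], rounds=2000, batch_size=50))
```

Even while a batch is pending, the sampled values differ from round to round, so the algorithm keeps trying different arms instead of repeating one choice until the next update.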