[Machine Learning] Reinforcement Learning - Upper Confidence Bound(UCB)

[Machine Learning] Reinforcement Learning - Upper Confidence Bound(UCB) | Thompson Sampling

paka_corn 2023. 6. 20. 09:47

[ Reinforcement Learning ]

Reinforcement Learning : a field of machine learning in which a program, called an agent, learns from experience and gradually improves while performing tasks in a specific environment.

=> The agent interacts with the environment: it perceives the current state, selects and executes an action, and receives a reward.

=> During learning, the agent tries different actions that could lead to rewards and adjusts its behavior based on the rewards it receives, so it improves incrementally.
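The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TwoArmEnv` class, its win rates, and the epsilon-greedy rule used to pick actions are all hypothetical choices made for this sketch and are not part of the original post.

```python
import random


class TwoArmEnv:
    """Hypothetical toy environment: one dummy state, two actions with fixed win rates."""

    def __init__(self, win_rates=(0.3, 0.7), seed=0):
        self.win_rates = win_rates
        self.rng = random.Random(seed)
        self.state = 0  # there is only one state in this toy example

    def step(self, action):
        # Return the next state and a reward of 1 (win) or 0 (loss).
        reward = 1 if self.rng.random() < self.win_rates[action] else 0
        return self.state, reward


def run_agent(env, n_steps=2000, epsilon=0.1, seed=1):
    """Epsilon-greedy agent (an assumed rule, used only to show the loop)."""
    rng = random.Random(seed)
    n_actions = len(env.win_rates)
    counts = [0] * n_actions    # how often each action was tried
    values = [0.0] * n_actions  # running average reward per action
    state = env.state           # perceive the current state
    total_reward = 0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: random action
        else:
            action = max(range(n_actions), key=lambda a: values[a])  # exploit
        state, reward = env.step(action)       # execute action, receive reward
        counts[action] += 1
        # Adjust the estimate from the reward feedback (incremental mean).
        values[action] += (reward - values[action]) / counts[action]
        total_reward += reward
    return values, counts, total_reward


if __name__ == "__main__":
    values, counts, total_reward = run_agent(TwoArmEnv())
    print("estimated values:", values)
    print("times each action was chosen:", counts)
```

Over many steps the agent's estimates should approach the environment's win rates, and the better action should be chosen most of the time.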

[ Upper Confidence Bound ]

Upper Confidence Bound (UCB) : an algorithm that balances exploration and exploitation by taking uncertainty into account. For each action it computes an upper confidence bound, that is, the estimated average reward plus an uncertainty bonus that shrinks as the action is tried more often, and it selects the action with the highest bound.

=> Deterministic Algorithm

=> Requires an update after every round, so the reward must be observed before the next action is selected (a difference from Thompson Sampling)

=> Used in click prediction, internet search engines, and recommendation systems.
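As a rough sketch of how UCB selects actions, the code below implements UCB1, one common variant, on a simulated problem. The post does not say which UCB formula it has in mind, so the bonus term `sqrt(2 * ln(t) / n)`, the function name `ucb1`, and the reward probabilities are assumptions made for illustration.

```python
import math
import random


def ucb1(reward_probs, n_rounds=5000, seed=0):
    """Run UCB1 on a simulated problem; reward_probs are hypothetical win rates."""
    rng = random.Random(seed)  # randomness is used only by the simulated rewards
    n_arms = len(reward_probs)
    counts = [0] * n_arms      # number of times each arm was selected
    sums = [0.0] * n_arms      # total reward collected from each arm

    def upper_bound(arm, t):
        mean = sums[arm] / counts[arm]
        bonus = math.sqrt(2 * math.log(t) / counts[arm])  # uncertainty bonus
        return mean + bonus

    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once so each bound is defined
        else:
            # Deterministic choice: the arm with the highest upper bound.
            arm = max(range(n_arms), key=lambda a: upper_bound(a, t))
        reward = 1 if rng.random() < reward_probs[arm] else 0
        # The statistics are updated every round, before the next selection.
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    print(ucb1([0.1, 0.3, 0.8]))  # selections per arm
```

Note that the selection step itself uses no randomness: given the same reward history, the rule always picks the same arm, which is what makes UCB a deterministic algorithm.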

[ Thompson Sampling ]

Thompson Sampling : an algorithm that balances exploration and exploitation while seeking the optimal action. It maintains a probability distribution over each action's expected reward, draws one sample from each distribution, and selects the action whose sample is highest.

=> Probabilistic Algorithm

=> Can accommodate delayed feedback: because actions are chosen by random sampling, it keeps exploring even when updates arrive in batches (a difference from UCB)

=> Actions whose value estimates are still uncertain have wide distributions, so they continue to be selected occasionally; as evidence accumulates the distributions narrow and the choice shifts toward exploitation.
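A minimal sketch of Thompson Sampling for win/loss rewards is shown below. It assumes a Beta distribution over each action's win rate, starting from a uniform Beta(1, 1) prior; the post does not specify the distribution, so that choice, the function name `thompson_sampling`, and the reward probabilities are assumptions made for illustration.

```python
import random


def thompson_sampling(reward_probs, n_rounds=5000, seed=0):
    """Beta-Bernoulli Thompson Sampling on hypothetical win rates."""
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    wins = [0] * n_arms    # rounds where the arm returned reward 1
    losses = [0] * n_arms  # rounds where the arm returned reward 0
    for _ in range(n_rounds):
        # Draw one sample per arm from its Beta(wins + 1, losses + 1) distribution.
        samples = [rng.betavariate(wins[a] + 1, losses[a] + 1)
                   for a in range(n_arms)]
        # Select the arm with the highest sampled value.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulated reward; a real system would observe this from users.
        if rng.random() < reward_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return wins, losses


if __name__ == "__main__":
    wins, losses = thompson_sampling([0.1, 0.3, 0.8])
    print("selections per arm:", [w + l for w, l in zip(wins, losses)])
```

Because the samples are random, two runs with the same history can pick different arms, which is the sense in which the algorithm is probabilistic.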
