[Deep NLP] Word Embeddings | Word2Vec
Word Embeddings
· Word Vectors
- Simple approach : one-hot vectors
=> Does NOT represent word meaning
=> Similarity/distance between every pair of one-hot vectors is the same
=> Better approach : ‘Word Embeddings’!
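To see this concretely, here is a minimal NumPy sketch (the tiny vocabulary is made up for illustration): every pair of distinct one-hot vectors has dot product 0 and the same distance, so "king" is no closer to "queen" than to "banana".

```python
import numpy as np

# Hypothetical 4-word vocabulary, one-hot encoded
vocab = ["king", "queen", "apple", "banana"]
one_hot = np.eye(len(vocab))

# Every pair of distinct one-hot vectors is equally (dis)similar
print(one_hot[0] @ one_hot[1])                  # king . queen   -> 0.0
print(one_hot[2] @ one_hot[3])                  # apple . banana -> 0.0
print(np.linalg.norm(one_hot[0] - one_hot[1]))  # sqrt(2)
print(np.linalg.norm(one_hot[2] - one_hot[3]))  # sqrt(2)
```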
· Word2Vec
: how likely is a word w to show up near another word?
- Extract the learned weights as the word embeddings
- Use readily available text as the training set; no hand-labeled supervision is needed
(self-supervised learning)
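As an illustration of this workflow, a minimal sketch using the gensim library (the two-sentence corpus is made up; real training needs far more text). The raw text itself supplies the labels, and the learned weights come back as the embeddings:

```python
from gensim.models import Word2Vec

# Raw tokenized text is the whole training set: no hand labels needed
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
]

# sg=0 selects the CBOW variant; window=2 gives a +-2-word context
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# The learned input-layer weights are extracted as the word embeddings
print(model.wv["king"].shape)                # (50,)
print(model.wv.similarity("king", "queen"))  # cosine similarity of embeddings
```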
Word2Vec Models
: use words and their contexts to set up a prediction task
- Train ANN to guess a word given its context
=> CBOW (Continuous Bag of Words) model : guess the word in the middle
- Train ANN to guess the context given a word
=> Skip-gram model : guess the surrounding words
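A small sketch of the two setups on a made-up sentence: from the same ±2 window, CBOW forms one (context → center) training instance, while Skip-gram forms one (center → neighbor) instance per context word.

```python
tokens = ["the", "quick", "brown", "fox", "jumps"]
center, window = 2, 2  # center word "brown", +-2-word context

context = tokens[center - window:center] + tokens[center + 1:center + 1 + window]

# CBOW: guess the word in the middle from its context
print((context, tokens[center]))
# (['the', 'quick', 'fox', 'jumps'], 'brown')

# Skip-gram: guess each surrounding word from the middle word
print([(tokens[center], c) for c in context])
# [('brown', 'the'), ('brown', 'quick'), ('brown', 'fox'), ('brown', 'jumps')]
```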
· CBOW Model
: use a shallow neural network with only 3 layers (input, hidden, output)
- Input : Given context words
- Output : guess the word in the middle
- Training the CBOW Model
(0) Creating the Dataset
: the label is already in the data (self-supervised task)
** assume context words : ±2 word window
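A sketch of step (0) on a made-up corpus: slide the ±2 window over the tokens and record (context words, center word) pairs; the label (the center word) is already in the data.

```python
def build_cbow_dataset(tokens, window=2):
    """Build (context words, center word) pairs with a +-window context."""
    data = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:  # skip windows with no context at all
            data.append((context, center))
    return data

tokens = "the quick brown fox jumps over the lazy dog".split()
for context, center in build_cbow_dataset(tokens)[:3]:
    print(context, "->", center)
# ['quick', 'brown'] -> the
# ['the', 'brown', 'fox'] -> quick
# ['the', 'quick', 'fox', 'jumps'] -> brown
```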
(1) Feeding the Model – xi
=> Each instance is fed in one at a time (feed-forward + back-prop)
=> Words are fed in as one-hot vectors
(2) Weight matrix – W
=> The weight matrix W between the input & hidden layers is a V x N matrix
=> W : shared by all context words
=> V = size of the vocab / N = size of the embedding we want (= number of neurons in the hidden layer)
=> C = size of the context (set by the ± window size)
=> Weight W is initially random, but modified via backprop
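A quick NumPy check of steps (1)-(2) (the sizes V and N are made up): because the input is one-hot, xW simply selects one row of W, so each row of W ends up being one word's embedding.

```python
import numpy as np

V, N = 6, 4                       # hypothetical vocab size and embedding size
rng = np.random.default_rng(0)
W = rng.normal(size=(V, N))       # initially random; learned via backprop

x = np.zeros(V)
x[3] = 1.0                        # one-hot vector for the word with index 3

# Multiplying a one-hot vector by W is just a row lookup
print(np.allclose(x @ W, W[3]))   # True
```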
(3) Feedforward – compute h = (1/C) Σᵢ xᵢW
=> Calculate the output of each of the N hidden nodes for each context word
=> No activation function
=> Take the average of the summed projections (dot products)
=> W' : N x V weight matrix between the hidden & output layers
(4) Compute probabilities – compute ŷ = softmax(hW')
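Steps (3)-(4) as a NumPy sketch (same made-up sizes as above; selecting rows of W replaces the explicit one-hot multiplications): average the C context projections into h, then project through W' and apply softmax.

```python
import numpy as np

V, N = 6, 4
rng = np.random.default_rng(1)
W = rng.normal(size=(V, N))       # input -> hidden weights
W_out = rng.normal(size=(N, V))   # hidden -> output weights (W')

context_ids = [0, 2, 5]           # indices of the C context words

# (3) Feedforward: h = (1/C) * sum of context embeddings, no activation
h = W[context_ids].mean(axis=0)

# (4) Softmax over the whole vocabulary gives y_hat
scores = h @ W_out
y_hat = np.exp(scores - scores.max())
y_hat /= y_hat.sum()
print(y_hat.shape, y_hat.sum())   # (6,) ~1.0
```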
(5) Compute Network Error
- error = target output – predicted output
(6) Backpropagate errors to adjust W and W'
=> In W, we only want to update the weights of the context words
=> Weight update is only done when xi = 1
=> (input is one-hot vector : 0 or 1)
=> Iterate feedforward/backprop until error is minimized
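Putting steps (5)-(6) together, a minimal training-loop sketch in NumPy (cross-entropy loss is assumed, which makes the output-layer error simply ŷ - y; the toy dataset and sizes are made up). Note that only the rows of W belonging to the context words receive updates, exactly because the one-hot inputs zero out every other row's gradient:

```python
import numpy as np

V, N, lr = 6, 4, 0.1
rng = np.random.default_rng(2)
W = rng.normal(size=(V, N))        # input -> hidden
W_out = rng.normal(size=(N, V))    # hidden -> output (W')

dataset = [([0, 2, 5], 3), ([1, 3], 2)]   # (context ids, target id) pairs

for epoch in range(100):                   # iterate until the error is small
    for context_ids, target_id in dataset:
        C = len(context_ids)
        h = W[context_ids].mean(axis=0)    # (3) feedforward
        scores = h @ W_out
        y_hat = np.exp(scores - scores.max())
        y_hat /= y_hat.sum()               # (4) softmax

        err = y_hat.copy()
        err[target_id] -= 1.0              # (5) error = y_hat - y

        grad_h = W_out @ err               # backprop into the hidden layer
        W_out -= lr * np.outer(h, err)     # (6) update W'
        W[context_ids] -= lr * grad_h / C  # update only the context rows of W
```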