[Deep NLP] Word Embeddings | Word2Vec
Word Embeddings
· Word Vectors
- Simple approach : one-hot vectors
=> Does NOT represent word meaning
=> Similarity/distance between every pair of one-hot vectors is the same
=> Better approach : ‘Word Embeddings’!
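To see this concretely, here is a minimal NumPy sketch (the tiny vocabulary is made up for illustration): every pair of distinct one-hot vectors has dot product 0 and the same distance, so "king" is no closer to "queen" than to "banana".

```python
import numpy as np

# Hypothetical 4-word vocabulary, one-hot encoded
vocab = ["king", "queen", "apple", "banana"]
one_hot = np.eye(len(vocab))

# Every pair of distinct one-hot vectors is equally (dis)similar
print(one_hot[0] @ one_hot[1])                  # king . queen   -> 0.0
print(one_hot[2] @ one_hot[3])                  # apple . banana -> 0.0
print(np.linalg.norm(one_hot[0] - one_hot[1]))  # sqrt(2)
print(np.linalg.norm(one_hot[2] - one_hot[3]))  # sqrt(2)
```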
· Word2Vec
: how likely is a word w to show up near another word?
- Extract the learned weights as the word embeddings
- Use readily available text as the training set; no hand-labeled supervision is needed
(self-supervised learning)
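As an illustration of this workflow, a minimal sketch using the gensim library (the two-sentence corpus is made up; real training needs far more text). The raw text itself supplies the labels, and the learned weights come back as the embeddings:

```python
from gensim.models import Word2Vec

# Raw tokenized text is the whole training set: no hand labels needed
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
]

# sg=0 selects the CBOW variant; window=2 gives a +-2-word context
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# The learned input-layer weights are extracted as the word embeddings
print(model.wv["king"].shape)                # (50,)
print(model.wv.similarity("king", "queen"))  # cosine similarity of embeddings
```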
Word2Vec Models
: use words and their contexts to set up a prediction task
- Train ANN to guess a word given its context
=> CBOW (Continuous Bag of Words) model : guess the word in the middle
- Train ANN to guess the context given a word
=> Skip-gram model : guess the surrounding words
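A small sketch of the two setups on a made-up sentence: from the same ±2 window, CBOW forms one (context → center) training instance, while Skip-gram forms one (center → neighbor) instance per context word.

```python
tokens = ["the", "quick", "brown", "fox", "jumps"]
center, window = 2, 2  # center word "brown", +-2-word context

context = tokens[center - window:center] + tokens[center + 1:center + 1 + window]

# CBOW: guess the word in the middle from its context
print((context, tokens[center]))
# (['the', 'quick', 'fox', 'jumps'], 'brown')

# Skip-gram: guess each surrounding word from the middle word
print([(tokens[center], c) for c in context])
# [('brown', 'the'), ('brown', 'quick'), ('brown', 'fox'), ('brown', 'jumps')]
```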
· CBOW Model
: use a shallow neural network with only 3 layers (input, hidden, output)
- Input : Given context words
- Output : guess the word in the middle
- Training the CBOW Model
(0) Creating the Dataset
: the label is already in the data (self-supervised task)
** assume context words : ±2 word window
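A sketch of step (0) on a made-up corpus: slide the ±2 window over the tokens and record (context words, center word) pairs; the label (the center word) is already in the data.

```python
def build_cbow_dataset(tokens, window=2):
    """Build (context words, center word) pairs with a +-window context."""
    data = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:  # skip windows with no context at all
            data.append((context, center))
    return data

tokens = "the quick brown fox jumps over the lazy dog".split()
for context, center in build_cbow_dataset(tokens)[:3]:
    print(context, "->", center)
# ['quick', 'brown'] -> the
# ['the', 'brown', 'fox'] -> quick
# ['the', 'quick', 'fox', 'jumps'] -> brown
```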
(1) Feeding the Model – xi
=> Each instance is fed in one at a time (feed-forward + back-prop)
=> Words are fed in as one-hot vectors
(2) Weight matrix – W
=> The weight matrix W between the input & hidden layers is a V x N matrix
=> W : shared by all context words
=> V = size of the vocab / N = size of the embedding we want (= number of neurons in the hidden layer)
=> C = size of the context (set by the ± window size)
=> Weight W is initially random, but modified via backprop
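A quick NumPy check of steps (1)-(2) (the sizes V and N are made up): because the input is one-hot, xW simply selects one row of W, so each row of W ends up being one word's embedding.

```python
import numpy as np

V, N = 6, 4                       # hypothetical vocab size and embedding size
rng = np.random.default_rng(0)
W = rng.normal(size=(V, N))       # initially random; learned via backprop

x = np.zeros(V)
x[3] = 1.0                        # one-hot vector for the word with index 3

# Multiplying a one-hot vector by W is just a row lookup
print(np.allclose(x @ W, W[3]))   # True
```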
(3) Feedforward – compute h = (1/C) Σᵢ xᵢW
=> Calculate the output of each of the N hidden nodes for each context word
=> No activation function
=> Take the average of the summed projections (dot products)
=> W' : N x V weight matrix between the hidden & output layers
(4) Compute probabilities – compute ŷ = softmax(hW')
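Steps (3)-(4) as a NumPy sketch (same made-up sizes as above; selecting rows of W replaces the explicit one-hot multiplications): average the C context projections into h, then project through W' and apply softmax.

```python
import numpy as np

V, N = 6, 4
rng = np.random.default_rng(1)
W = rng.normal(size=(V, N))       # input -> hidden weights
W_out = rng.normal(size=(N, V))   # hidden -> output weights (W')

context_ids = [0, 2, 5]           # indices of the C context words

# (3) Feedforward: h = (1/C) * sum of context embeddings, no activation
h = W[context_ids].mean(axis=0)

# (4) Softmax over the whole vocabulary gives y_hat
scores = h @ W_out
y_hat = np.exp(scores - scores.max())
y_hat /= y_hat.sum()
print(y_hat.shape, y_hat.sum())   # (6,) ~1.0
```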
(5) Compute Network Error
- error = target output – predicted output
(6) Backpropagate errors to adjust W and W'
=> In W, we only want to update the weights of the context words
=> Weight update is only done when xi = 1
=> (input is one-hot vector : 0 or 1)
=> Iterate feedforward/backprop until error is minimized
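Putting steps (5)-(6) together, a minimal training-loop sketch in NumPy (cross-entropy loss is assumed, which makes the output-layer error simply ŷ - y; the toy dataset and sizes are made up). Note that only the rows of W belonging to the context words receive updates, exactly because the one-hot inputs zero out every other row's gradient:

```python
import numpy as np

V, N, lr = 6, 4, 0.1
rng = np.random.default_rng(2)
W = rng.normal(size=(V, N))        # input -> hidden
W_out = rng.normal(size=(N, V))    # hidden -> output (W')

dataset = [([0, 2, 5], 3), ([1, 3], 2)]   # (context ids, target id) pairs

for epoch in range(100):                   # iterate until the error is small
    for context_ids, target_id in dataset:
        C = len(context_ids)
        h = W[context_ids].mean(axis=0)    # (3) feedforward
        scores = h @ W_out
        y_hat = np.exp(scores - scores.max())
        y_hat /= y_hat.sum()               # (4) softmax

        err = y_hat.copy()
        err[target_id] -= 1.0              # (5) error = y_hat - y

        grad_h = W_out @ err               # backprop into the hidden layer
        W_out -= lr * np.outer(h, err)     # (6) update W'
        W[context_ids] -= lr * grad_h / C  # update only the context rows of W
```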