[Deep NLP] Word Embeddings | Word2Vec

paka_corn 2023. 12. 26. 04:17

Word Embeddings

·    Word Vectors

-       Simple approach : one-hot vectors

=> Does NOT represent word meaning

=> Similarity/distance between every pair of one-hot vectors is the same (see the sketch below)

=> Better approach : ‘Word Embeddings’!
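
A minimal numpy sketch of why one-hot vectors fail (the four-word vocabulary is made up): the cosine similarity between any two distinct one-hot vectors is always 0, so "king" looks exactly as related to "queen" as to "banana".

```python
import numpy as np

# Toy vocabulary: each word is a one-hot vector of length V (here V = 4)
vocab = ["king", "queen", "apple", "banana"]
one_hot = np.eye(len(vocab))

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Every distinct pair scores 0 -- one-hot vectors carry no notion of meaning
print(cosine(one_hot[0], one_hot[1]))  # king vs queen  -> 0.0
print(cosine(one_hot[0], one_hot[3]))  # king vs banana -> 0.0
```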

 

·      Word2Vec

: how likely is a word w to show up near another word?

-       Extract the learned weights as the word embeddings

-       Uses readily available text as the training set; no hand-labeled supervision is needed

(self-supervised learning, illustrated below)
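
To see this in practice, here is a minimal sketch using gensim's Word2Vec class; the two-sentence corpus is invented, and sg=0 selects the CBOW variant discussed below.

```python
from gensim.models import Word2Vec

# Raw, unlabeled sentences are the whole training set (self-supervised)
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

# sg=0 selects CBOW; window=2 means a +-2-word context
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

vec = model.wv["cat"]                        # the learned 50-dim embedding
print(model.wv.most_similar("cat", topn=2))  # nearest words in embedding space
```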

 

 

Word2Vec Models

: train an ANN to predict words from contexts, or contexts from words

-       Train an ANN to guess a word given its context

=> CBOW (Continuous Bag-of-Words) model : guess the word in the middle

-       Train an ANN to guess the context given a word

=> Skip-gram model : guess the surrounding words

 

·      CBOW Model

: use a shallow neural network with only 3 layers (input, hidden, output)

-       Input : Given context words

-       Output : guess the word in the middle


-       Training the CBOW Model

(0)   Creating the Dataset

: the label is already in the data (self-supervised task)

 

** assume context words : a ±2-word window (see the sketch below)
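
A small Python sketch of dataset creation under this ±2 assumption; the sentence is an arbitrary example, and each target word doubles as its own label.

```python
# Build (context, target) training pairs from raw text with a +-2-word window.
# The target word itself is the label, so no manual annotation is needed.
def make_cbow_pairs(tokens, window=2):
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs

tokens = "the quick brown fox jumps over the lazy dog".split()
for context, target in make_cbow_pairs(tokens)[:3]:
    print(context, "->", target)
# ['quick', 'brown'] -> the
# ['the', 'brown', 'fox'] -> quick
# ['the', 'quick', 'fox', 'jumps'] -> brown
```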

 

(1)   Feeding the Model – x_i

=> Each instance is fed one at a time (feed-forward + back-prop)

=> Words are fed as one-hot vectors

 

 

(2)   Weight matrix – W

=> The weight matrix W between the input & hidden layers is a V x N matrix

=> W : shared by all context words

=> V = size of the vocab / N = size of the embedding we want (= number of neurons in the hidden layer)

=> C = size of the context (± the window size)

=> W is initialized randomly, then adjusted via backprop (see the sketch below)
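
A minimal numpy sketch of the two weight matrices, assuming illustrative sizes V = 5000 and N = 100 (both made up):

```python
import numpy as np

V = 5000   # vocabulary size
N = 100    # embedding size = number of hidden neurons

rng = np.random.default_rng(0)
W = rng.uniform(-0.5, 0.5, size=(V, N))        # input -> hidden, shared by all context words
W_prime = rng.uniform(-0.5, 0.5, size=(N, V))  # hidden -> output

# After training, row W[i] is the learned embedding of word i
```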

 

 

(3)   Feedforward – compute h = (1/C) Σ_i (x_i W)

=> Calculate the output of each of the N hidden nodes for each context word

=> No activation function

=> Take the average of the summed dot products (each x_i W simply selects one row of W)

W’ = N x V weight matrix (between the hidden & output layers); see the sketch below
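
Putting steps (1)–(3) together, a toy numpy sketch of the feedforward pass; all sizes and word indices here are made up for illustration.

```python
import numpy as np

V, N, C = 10, 4, 4          # tiny vocab, embedding size, context size (+-2 window)
rng = np.random.default_rng(0)
W = rng.uniform(-0.5, 0.5, (V, N))
W_prime = rng.uniform(-0.5, 0.5, (N, V))

context_ids = [1, 2, 4, 5]  # indices of the C context words

# Because each x_i is one-hot, x_i @ W just selects row i of W, so the
# hidden layer is the plain average of the context rows (no activation)
h = W[context_ids].mean(axis=0)  # shape (N,)

u = h @ W_prime                  # raw scores over the vocabulary, shape (V,)
```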

 

 

(4)   Compute probabilities – compute ŷ = softmax(u), where u = h W’ are the output scores
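
A standard, numerically stable softmax in numpy (the input scores are arbitrary); it turns the V raw scores into a probability for every vocabulary word.

```python
import numpy as np

def softmax(u):
    # Subtract the max for numerical stability; the output sums to 1
    e = np.exp(u - np.max(u))
    return e / e.sum()

u = np.array([2.0, 1.0, 0.1])
print(softmax(u))  # ~ [0.659 0.242 0.099] -- one probability per vocab word
```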

 

(5)   Compute Network Error

-       error e = ŷ − y (predicted output minus target output, per output node)

 

 

(6)   Backpropagate errors to adjust W and W’

=> In W, we only want to update the weights (rows) of the context words

=> A weight update is done only where x_i = 1

=> (the input is a one-hot vector : 0 or 1)

=> Iterate feedforward/backprop until the error is minimized (a full training-step sketch follows below)
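
The sketch below strings steps (1)–(6) together as one toy numpy training step. It assumes cross-entropy loss (whose gradient at the output is exactly e = ŷ − y), an arbitrary learning rate of 0.05, and made-up sizes; it is an illustration of the technique, not the original implementation.

```python
import numpy as np

def cbow_train_step(W, W_prime, context_ids, target_id, lr=0.05):
    """One feedforward + backprop step for a single (context, target) pair."""
    C = len(context_ids)

    # --- feedforward (steps 1-3) ---
    h = W[context_ids].mean(axis=0)   # hidden layer: average of context rows
    u = h @ W_prime                   # raw scores over the vocab
    y_hat = np.exp(u - u.max())
    y_hat /= y_hat.sum()              # softmax probabilities (step 4)

    # --- error (step 5): predicted minus one-hot target ---
    e = y_hat.copy()
    e[target_id] -= 1.0

    # --- backprop (step 6) ---
    dh = W_prime @ e                  # gradient reaching the hidden layer
    W_prime -= lr * np.outer(h, e)    # update hidden -> output weights
    for i in context_ids:             # only context-word rows of W change,
        W[i] -= lr * dh / C           # i.e. only where x_i = 1

V, N = 10, 4
rng = np.random.default_rng(0)
W = rng.uniform(-0.5, 0.5, (V, N))
W_prime = rng.uniform(-0.5, 0.5, (N, V))

for _ in range(100):                  # iterate until the error is small
    cbow_train_step(W, W_prime, context_ids=[1, 2, 4, 5], target_id=3)
```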
