class: misk-title-slide

<br><br><br><br><br><br>
# .font140[Word Embeddings]

---
# Language modeling

<br>

The relative likelihood of words and phrases...

<br><br>

.center[

.font200.blue[John is currently teaching a] <font color="red"><u>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</u></font>

]

---
# Language modeling

.pull-left[

Significant growth in algorithms:

- [word2vec](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)
- [GloVe](https://nlp.stanford.edu/pubs/glove.pdf)
- [BERT](https://github.com/google-research/bert)
- [ELMo](https://allennlp.org/elmo)
- [ULMFiT](https://arxiv.org/abs/1801.06146)
- [GPT-2](https://github.com/openai/gpt-2)

]

.pull-right[

Many use cases:

- Keyboard auto-complete
- Speech recognition
- Chat bot Q&A
- Translation
- Text generation
- POS tagging

]

<br>

.blue.center.bold[Advancement in this area has been driven by the move toward vector space representations of terms.]

---
# Vector space representation

.pull-left[

.bold[One-hot encoding] treats each word as a single distinct item, capturing neither word order nor relationships to other words.

<br>

<img src="images/embeddings-one-hot.png" width="1351" style="display: block; margin: auto;" />

]

.pull-right[

.bold[Word embeddings] capture the relationships terms have with other terms, creating a much richer feature representation.

<br>

<img src="images/embeddings-vector-representation.png" width="795" style="display: block; margin: auto;" />

<br><br><br><br><br>

.font70.right[Images: [Bhaskar Mitra](https://www.slideshare.net/BhaskarMitra3/neural-text-embeddings-for-information-retrieval-wsdm-2017?from_action=save)]

]

---
# Notions of similarity

.pull-left-40[

Consider the following phrases:

- "seattle map"
- "seattle weather"
- "seahawks jerseys"
- "seahawks highlights"
- "seattle seahawks wilson"
- "seattle seahawks sherman"
- "seattle seahawks browner"
- "seattle seahawks ifedi"
- "denver map"
- "denver weather"
- "broncos jerseys"
- "broncos highlights"
- "denver broncos lynch"
- "denver broncos sanchez"
- "denver broncos miller"
- "denver broncos marshall"

]

--

.pull-right-60[

<br><br><br>

<img src="images/embeddings-notions-of-similarity.png" width="1389" style="display: block; margin: auto;" />

<br><br><br><br>

.font70.right[Image: [Bhaskar Mitra](https://www.slideshare.net/BhaskarMitra3/neural-text-embeddings-for-information-retrieval-wsdm-2017?from_action=save)]

]

---
# Notions of similarity

.pull-left-40.code70[

```r
# illustrative binary context vectors for each term
seattle  <- c(0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1)
seahawks <- c(1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
denver   <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
broncos  <- c(0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0)

# bind the term vectors into a matrix, one row per term
m1 <- rbind(seattle, seahawks, denver, broncos)

# pairwise cosine similarities between the rows
similarities <- text2vec::sim2(m1)
Matrix::tril(similarities)
## 4 x 4 Matrix of class "dtrMatrix"
##            seattle  seahawks    denver   broncos
## seattle  1.0000000         .         .         .
## seahawks 0.5714286 1.0000000         .         .
## denver   0.2857143 0.0000000 1.0000000         .
## broncos  0.0000000 0.2857143 0.5714286 1.0000000
```

.content-box-gray.center[Measured with cosine similarity]

]

.pull-right-60[

<br><br><br>

<img src="images/embeddings-notions-of-similarity.png" width="1389" style="display: block; margin: auto;" />

<br><br><br><br>

.font70.right[Image: [Bhaskar Mitra](https://www.slideshare.net/BhaskarMitra3/neural-text-embeddings-for-information-retrieval-wsdm-2017?from_action=save)]

]
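
---
# Notions of similarity

To make the cosine measure concrete, here is a minimal sketch that computes it directly from the dot-product definition; the `cosine()` helper is our own illustration, not part of text2vec:

```r
# cosine similarity: the dot product of two vectors divided by the
# product of their Euclidean (L2) norms
cosine <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

cosine(seattle, seahawks)  # 0.571..., matches the sim2() output above
cosine(seattle, broncos)   # 0; these vectors share no contexts
```

Terms that appear in the same contexts point in similar directions, and that co-occurrence signal is exactly what embeddings exploit.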

---
# How do we get embeddings from this?

<br><br><br><br><br>

.center.bold.font180[Let's check out a simple example in... Excel 😳!]

---
# Example

Embeddings for "king" based on a 2014 Wikipedia dump and Gigaword 5 (newswire data):

```r
king_embeddings
##   [1] -0.323070 -0.876160  0.219770  0.252680  0.229760  0.738800 -0.379540 -0.353070 -0.843690 -1.111300 -0.302660
##  [12]  0.331780 -0.251130  0.304480 -0.077491 -0.898150  0.092496 -1.140700 -0.583240  0.668690 -0.231220 -0.958550
##  [23]  0.282620 -0.078848  0.753150  0.265840  0.342200 -0.339490  0.956080  0.065641  0.457470  0.398350  0.579650
##  [34]  0.392670 -0.218510  0.587950 -0.559990  0.633680 -0.043983 -0.687310 -0.378410  0.380260  0.616410 -0.882690
##  [45] -0.123460 -0.379280 -0.383180  0.238680  0.668500 -0.433210 -0.110650  0.081723  1.156900  0.789580 -0.212230
##  [56] -2.321100 -0.678060  0.445610  0.657070  0.104500  0.462170  0.199120  0.258020  0.057194  0.534430 -0.431330
##  [67] -0.343110  0.597890 -0.584170  0.068995  0.239440 -0.851810  0.303790 -0.341770 -0.257460 -0.031101 -0.162850
##  [78]  0.451690 -0.916270  0.645210  0.732810 -0.227520  0.302260  0.044801 -0.837410  0.550060 -0.525060 -1.735700
##  [89]  0.475100 -0.704870  0.056939 -0.713200  0.089623  0.413940 -1.336300 -0.619150 -0.330890 -0.528810  0.164830
## [100] -0.988780
```

---
# Example

<img src="03-word-embeddings_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto 0 auto auto;" />

<img src="03-word-embeddings_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto 0 auto auto;" />

<img src="03-word-embeddings_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto 0 auto auto;" />

<img src="03-word-embeddings_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto 0 auto auto;" />

---
# Example

.code150.center[

```
## 7 x 7 Matrix of class "dtrMatrix"
##        man water king woman  boy girl queen
## man   1.00     .    .     .    .    .     .
## water 0.36  1.00    .     .    .    .     .
## king  0.51  0.26 1.00     .    .    .     .
## woman 0.83  0.35 0.37  1.00    .    .     .
## boy   0.79  0.29 0.47  0.77 1.00    .     .
## girl  0.73  0.28 0.36  0.85 0.92  1.00    .
## queen 0.47  0.28 0.75  0.51 0.47  0.52  1.00
```

]
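
---
# Example

Pre-trained vectors like these ship as plain text files. Below is a minimal sketch of how you might load them in R and reproduce a similarity matrix like the one above; the file name `glove.6B.100d.txt` (from https://nlp.stanford.edu/projects/glove/) and the loading details are assumptions, not part of the original walkthrough:

```r
# assumes glove.6B.100d.txt has been downloaded and unzipped locally;
# each line is a word followed by its 100 embedding values
glove   <- data.table::fread("glove.6B.100d.txt", header = FALSE, quote = "")
vectors <- as.matrix(glove[, -1, with = FALSE])
rownames(vectors) <- glove$V1

# the 100-dimensional vector for "king", as printed earlier
vectors["king", ]

# pairwise cosine similarities for a handful of terms
terms <- c("man", "water", "king", "woman", "boy", "girl", "queen")
Matrix::tril(round(text2vec::sim2(vectors[terms, ]), 2))
```

The famous analogy arithmetic falls out of the same geometry: `vectors["king", ] - vectors["man", ] + vectors["woman", ]` lands nearer to `queen` than to unrelated words.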

---
# Resources to learn more about word embeddings

- [Why do we use word embeddings in NLP?](https://towardsdatascience.com/why-do-we-use-embeddings-in-nlp-2f20e1b632d2)
- [The illustrated word2vec](http://jalammar.github.io/illustrated-word2vec/)
- [Sebastian Ruder's series on word embeddings](https://ruder.io/word-embeddings-1/index.html)
- [Distributed Representations of Words and Phrases and their Compositionality](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)
- [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/pdf/1301.3781.pdf)
- [A Neural Probabilistic Language Model](http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf)
- [Speech and Language Processing by Dan Jurafsky and James H. Martin is a leading NLP resource; word2vec is covered in Chapter 6](https://web.stanford.edu/~jurafsky/slp3/)
- [Chris McCormick](http://mccormickml.com/) has written some great blog posts about word2vec

---
# Embeddings for predictive models

.center.font120[Embeddings learned inside a predictive model are trained to maximize predictive accuracy]

<br>

<img src="images/embeddings-weights-classification.png" width="624" style="display: block; margin: auto;" />
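
---
# Embeddings for predictive models

To make the idea concrete, here is a minimal keras sketch of an embedding layer learned jointly with a binary classifier; the vocabulary size, sequence length, and layer sizes are illustrative assumptions, not settings from these slides:

```r
library(keras)

# each input is a sequence of 100 word indices from a 10,000-term
# vocabulary; the embedding layer maps every index to a trainable
# 16-dimensional vector before classification
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 10000, output_dim = 16, input_length = 100) %>%
  layer_flatten() %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = "accuracy"
)
```

Because the embedding weights are updated by backpropagation along with the rest of the network, the geometry they learn is tuned to whatever distinctions help the downstream prediction.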

---
class: clear, center, middle, hide-logo
background-image: url(images/any-questions.jpg)
background-position: center
background-size: cover

---
# Back home

<br><br><br><br>
[.center[
<i class="fas fa-home fa-10x"></i>
]](https://github.com/misk-data-science/misk-dl)

.center[https://github.com/misk-data-science/misk-dl]