class: misk-title-slide

<br><br><br><br><br><br>
# .font140[Word Embeddings]

---
# Language modeling

<br>

The relative likelihood of words and phrases...

<br><br>

.center[

.font200.blue[John is currently teaching a] <font color="red"><u>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</u></font>

]

---
# Language modeling

.pull-left[

Significant growth in algorithms:

- [word2vec](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)
- [GloVe](https://nlp.stanford.edu/pubs/glove.pdf)
- [BERT](https://github.com/google-research/bert)
- [ELMo](https://allennlp.org/elmo)
- [ULMFiT](https://arxiv.org/abs/1801.06146)
- [GPT-2](https://github.com/openai/gpt-2)

]

.pull-right[

Many use cases:

- Keyboard auto-complete
- Speech recognition
- Chat bot Q&A
- Translation
- Text generation
- POS tagging

]

<br>

.blue.center.bold[Advancement in this area has been driven by the move toward vector space representations of terms.]

---
# Vector space representation

.pull-left[

.bold[One-hot encoding] treats each word as a single distinct item, capturing neither word order nor relationships to other words.

<br>

<img src="images/embeddings-one-hot.png" width="1351" style="display: block; margin: auto;" />

]

.pull-right[

.bold[Word embeddings] capture the relationships terms have with other terms, creating a much richer feature representation.

<br>

<img src="images/embeddings-vector-representation.png" width="795" style="display: block; margin: auto;" />

<br><br><br><br><br>

.font70.right[Images: [Bhaskar Mitra](https://www.slideshare.net/BhaskarMitra3/neural-text-embeddings-for-information-retrieval-wsdm-2017?from_action=save)]

]

---
# Notions of similarity

.pull-left-40[

Consider the following phrases:

- "seattle map"
- "seattle weather"
- "seahawks jerseys"
- "seahawks highlights"
- "seattle seahawks wilson"
- "seattle seahawks sherman"
- "seattle seahawks browner"
- "seattle seahawks ifedi"
- "denver map"
- "denver weather"
- "broncos jerseys"
- "broncos highlights"
- "denver broncos lynch"
- "denver broncos sanchez"
- "denver broncos miller"
- "denver broncos marshall"

]

--

.pull-right-60[

<br><br><br>

<img src="images/embeddings-notions-of-similarity.png" width="1389" style="display: block; margin: auto;" />

<br><br><br><br>

.font70.right[Image: [Bhaskar Mitra](https://www.slideshare.net/BhaskarMitra3/neural-text-embeddings-for-information-retrieval-wsdm-2017?from_action=save)]

]

---
# Notions of similarity

.pull-left-40.code70[

```r
# illustrative binary context vectors for each term
seattle  <- c(0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1)
seahawks <- c(1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
denver   <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
broncos  <- c(0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0)

# bind the term vectors into a matrix, one row per term
m1 <- rbind(seattle, seahawks, denver, broncos)

# pairwise cosine similarities between the rows
similarities <- text2vec::sim2(m1)
Matrix::tril(similarities)
## 4 x 4 Matrix of class "dtrMatrix"
##            seattle  seahawks    denver   broncos
## seattle  1.0000000         .         .         .
## seahawks 0.5714286 1.0000000         .         .
## denver   0.2857143 0.0000000 1.0000000         .
## broncos  0.0000000 0.2857143 0.5714286 1.0000000
```

.content-box-gray.center[Measured with cosine similarity]

]

.pull-right-60[

<br><br><br>

<img src="images/embeddings-notions-of-similarity.png" width="1389" style="display: block; margin: auto;" />

<br><br><br><br>

.font70.right[Image: [Bhaskar Mitra](https://www.slideshare.net/BhaskarMitra3/neural-text-embeddings-for-information-retrieval-wsdm-2017?from_action=save)]

]
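
---
# Notions of similarity

To make the cosine measure concrete, here is a minimal sketch that computes it directly from the dot-product definition; the `cosine()` helper is our own illustration, not part of text2vec:

```r
# cosine similarity: the dot product of two vectors divided by the
# product of their Euclidean (L2) norms
cosine <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

cosine(seattle, seahawks)  # 0.571..., matches the sim2() output above
cosine(seattle, broncos)   # 0; these vectors share no contexts
```

Terms that appear in the same contexts point in similar directions, and that co-occurrence signal is exactly what embeddings exploit.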

---
# How do we get embeddings from this?

<br><br><br><br><br>

.center.bold.font180[Let's check out a simple example in... Excel 😳!]

---
# Example

Embeddings for "king" based on a 2014 Wikipedia dump and Gigaword 5 (newswire data):

```r
king_embeddings
##   [1] -0.323070 -0.876160  0.219770  0.252680  0.229760  0.738800 -0.379540 -0.353070 -0.843690 -1.111300 -0.302660
##  [12]  0.331780 -0.251130  0.304480 -0.077491 -0.898150  0.092496 -1.140700 -0.583240  0.668690 -0.231220 -0.958550
##  [23]  0.282620 -0.078848  0.753150  0.265840  0.342200 -0.339490  0.956080  0.065641  0.457470  0.398350  0.579650
##  [34]  0.392670 -0.218510  0.587950 -0.559990  0.633680 -0.043983 -0.687310 -0.378410  0.380260  0.616410 -0.882690
##  [45] -0.123460 -0.379280 -0.383180  0.238680  0.668500 -0.433210 -0.110650  0.081723  1.156900  0.789580 -0.212230
##  [56] -2.321100 -0.678060  0.445610  0.657070  0.104500  0.462170  0.199120  0.258020  0.057194  0.534430 -0.431330
##  [67] -0.343110  0.597890 -0.584170  0.068995  0.239440 -0.851810  0.303790 -0.341770 -0.257460 -0.031101 -0.162850
##  [78]  0.451690 -0.916270  0.645210  0.732810 -0.227520  0.302260  0.044801 -0.837410  0.550060 -0.525060 -1.735700
##  [89]  0.475100 -0.704870  0.056939 -0.713200  0.089623  0.413940 -1.336300 -0.619150 -0.330890 -0.528810  0.164830
## [100] -0.988780
```

---
# Example

<img src="03-word-embeddings_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto 0 auto auto;" />

<img src="03-word-embeddings_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto 0 auto auto;" />

<img src="03-word-embeddings_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto 0 auto auto;" />

<img src="03-word-embeddings_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto 0 auto auto;" />

---
# Example

.code150.center[

```
## 7 x 7 Matrix of class "dtrMatrix"
##        man water king woman  boy girl queen
## man   1.00     .    .     .    .    .     .
## water 0.36  1.00    .     .    .    .     .
## king  0.51  0.26 1.00     .    .    .     .
## woman 0.83  0.35 0.37  1.00    .    .     .
## boy   0.79  0.29 0.47  0.77 1.00    .     .
## girl  0.73  0.28 0.36  0.85 0.92  1.00    .
## queen 0.47  0.28 0.75  0.51 0.47  0.52  1.00
```

]
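
---
# Example

Pre-trained vectors like these ship as plain text files. Below is a minimal sketch of how you might load them in R and reproduce a similarity matrix like the one above; the file name `glove.6B.100d.txt` (from https://nlp.stanford.edu/projects/glove/) and the loading details are assumptions, not part of the original walkthrough:

```r
# assumes glove.6B.100d.txt has been downloaded and unzipped locally;
# each line is a word followed by its 100 embedding values
glove   <- data.table::fread("glove.6B.100d.txt", header = FALSE, quote = "")
vectors <- as.matrix(glove[, -1, with = FALSE])
rownames(vectors) <- glove$V1

# the 100-dimensional vector for "king", as printed earlier
vectors["king", ]

# pairwise cosine similarities for a handful of terms
terms <- c("man", "water", "king", "woman", "boy", "girl", "queen")
Matrix::tril(round(text2vec::sim2(vectors[terms, ]), 2))
```

The famous analogy arithmetic falls out of the same geometry: `vectors["king", ] - vectors["man", ] + vectors["woman", ]` lands nearer to `queen` than to unrelated words.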

---
# Resources to learn more about word embeddings

- [Why do we use word embeddings in NLP?](https://towardsdatascience.com/why-do-we-use-embeddings-in-nlp-2f20e1b632d2)
- [The illustrated word2vec](http://jalammar.github.io/illustrated-word2vec/)
- [Sebastian Ruder's series on word embeddings](https://ruder.io/word-embeddings-1/index.html)
- [Distributed Representations of Words and Phrases and their Compositionality](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)
- [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/pdf/1301.3781.pdf)
- [A Neural Probabilistic Language Model](http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf)
- [Speech and Language Processing by Dan Jurafsky and James H. Martin is a leading NLP resource; word2vec is covered in Chapter 6](https://web.stanford.edu/~jurafsky/slp3/)
- [Chris McCormick](http://mccormickml.com/) has written some great blog posts about word2vec

---
# Embeddings for predictive models

.center.font120[Embeddings learned inside a predictive model are trained to maximize predictive accuracy]

<br>

<img src="images/embeddings-weights-classification.png" width="624" style="display: block; margin: auto;" />
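
---
# Embeddings for predictive models

To make the idea concrete, here is a minimal keras sketch of an embedding layer learned jointly with a binary classifier; the vocabulary size, sequence length, and layer sizes are illustrative assumptions, not settings from these slides:

```r
library(keras)

# each input is a sequence of 100 word indices from a 10,000-term
# vocabulary; the embedding layer maps every index to a trainable
# 16-dimensional vector before classification
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 10000, output_dim = 16, input_length = 100) %>%
  layer_flatten() %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = "accuracy"
)
```

Because the embedding weights are updated by backpropagation along with the rest of the network, the geometry they learn is tuned to whatever distinctions help the downstream prediction.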

---
class: clear, center, middle, hide-logo
background-image: url(images/any-questions.jpg)
background-position: center
background-size: cover

---
# Back home

<br><br><br><br>
[.center[
<i class="fas fa-home fa-10x"></i>
]](https://github.com/misk-data-science/misk-dl)

.center[https://github.com/misk-data-science/misk-dl]