Natural Language Processing with TensorFlow
Teach language to machines using Python's deep learning library
Thushan Ganegedara
BIRMINGHAM - MUMBAI
Natural Language Processing with TensorFlow
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Acquisition Editor: Frank Pohlmann
Project Editor: Radhika Atitkar
Content Development Editor: Chris Nelson
Technical Editor: Bhagyashree Rai
Copy Editor: Tom Jacob
Proofreader: Safis Editing
Indexer: Rekha Nair
Graphics: Tom Scaria
Production Coordinator: Nilesh Mohite
First published: May 2018
Production reference: 2310518
Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.
ISBN 978-1-78847-831-1
www.packtpub.com
mapt.io
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Why subscribe?
• Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
• Learn better with Skill Plans built especially for you
• Get a free eBook or video every month
• Mapt is fully searchable
• Copy and paste, print, and bookmark content
PacktPub.com
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Contributors
About the author
Thushan Ganegedara is currently a third-year Ph.D. student at the University of Sydney, Australia. He specializes in machine learning and has a liking for deep learning. He lives dangerously and runs algorithms on untested data. He also works as the chief data scientist for AssessThreat, an Australian start-up. He received his BSc (Hons) from the University of Moratuwa, Sri Lanka. He frequently writes technical articles and tutorials about machine learning. Additionally, he strives for a healthy lifestyle by including swimming in his daily schedule.
I would like to thank my parents, my siblings, and my wife for the faith they had in me and the support they have given, as well as all my teachers and my Ph.D. advisor for the guidance he provided me.
About the reviewers
Motaz Saad holds a Ph.D. in computer science from the University of Lorraine. He loves data and likes to play with it. He has over 10 years of professional experience in NLP, computational linguistics, data science, and machine learning. He currently works as an assistant professor at the Faculty of Information Technology, IUG.
Dr Joseph O'Connor is a data scientist with a deep passion for deep learning. His company, Deep Learn Analytics, a UK-based data science consultancy, works with businesses to develop machine learning applications and infrastructure from concept to deployment. He was awarded a Ph.D. from University College London for his work analyzing data from the MINOS high-energy physics experiment. Since then, he has developed ML products for a number of companies in the private sector, specializing in NLP and time series forecasting. You can find him at http://deeplearnanalytics.com/.
Packt is searching for authors like you
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Table of Contents

Preface

Chapter 1: Introduction to Natural Language Processing
    What is Natural Language Processing?
    Tasks of Natural Language Processing
    The traditional approach to Natural Language Processing
        Understanding the traditional approach
            Example – generating football game summaries
        Drawbacks of the traditional approach
    The deep learning approach to Natural Language Processing
        History of deep learning
        The current state of deep learning and NLP
        Understanding a simple deep model – a Fully-Connected Neural Network
    The roadmap – beyond this chapter
    Introduction to the technical tools
        Description of the tools
        Installing Python and scikit-learn
        Installing Jupyter Notebook
        Installing TensorFlow
    Summary

Chapter 2: Understanding TensorFlow
    What is TensorFlow?
        Getting started with TensorFlow
        TensorFlow client in detail
        TensorFlow architecture – what happens when you execute the client?
        Cafe Le TensorFlow – understanding TensorFlow with an analogy
    Inputs, variables, outputs, and operations
        Defining inputs in TensorFlow
            Feeding data with Python code
            Preloading and storing data as tensors
            Building an input pipeline
        Defining variables in TensorFlow
        Defining TensorFlow outputs
        Defining TensorFlow operations
            Comparison operations
            Mathematical operations
            Scatter and gather operations
            Neural network-related operations
    Reusing variables with scoping
    Implementing our first neural network
        Preparing the data
        Defining the TensorFlow graph
        Running the neural network
    Summary

Chapter 3: Word2vec – Learning Word Embeddings
    What is a word representation or meaning?
    Classical approaches to learning word representation
        WordNet – using an external lexical knowledge base for learning word representations
            Tour of WordNet
            Problems with WordNet
        One-hot encoded representation
        The TF-IDF method
        Co-occurrence matrix
    Word2vec – a neural network-based approach to learning word representation
        Exercise: is queen = king – he + she?
        Designing a loss function for learning word embeddings
    The skip-gram algorithm
        From raw text to structured data
        Learning the word embeddings with a neural network
        Formulating a practical loss function
        Efficiently approximating the loss function
        Implementing skip-gram with TensorFlow
    The Continuous Bag-of-Words algorithm
        Implementing CBOW in TensorFlow
    Summary

Chapter 4: Advanced Word2vec
    The original skip-gram algorithm
        Implementing the original skip-gram algorithm
        Comparing the original skip-gram with the improved skip-gram
    Comparing skip-gram with CBOW
        Performance comparison
        Which is the winner, skip-gram or CBOW?
    Extensions to the word embeddings algorithms
        Using the unigram distribution for negative sampling
        Implementing unigram-based negative sampling
        Subsampling – probabilistically ignoring the common words
        Implementing subsampling
        Comparing the CBOW and its extensions
    More recent algorithms extending skip-gram and CBOW
        A limitation of the skip-gram algorithm
        The structured skip-gram algorithm
        The loss function
        The continuous window model
    GloVe – Global Vectors representation
        Understanding GloVe
        Implementing GloVe
    Document classification with Word2vec
        Dataset
        Classifying documents with word embeddings
        Implementation – learning word embeddings
        Implementation – word embeddings to document embeddings
        Document clustering and t-SNE visualization of embedded documents
        Inspecting several outliers
        Implementation – clustering/classification of documents with K-means
    Summary

Chapter 5: Sentence Classification with Convolutional Neural Networks
    Introducing Convolutional Neural Networks
        CNN fundamentals
        The power of Convolutional Neural Networks
    Understanding Convolutional Neural Networks
        Convolution operation
            Standard convolution operation
            Convolving with stride
            Convolving with padding
            Transposed convolution
        Pooling operation
            Max pooling
            Max pooling with stride
            Average pooling
        Fully connected layers
        Putting everything together
    Exercise – image classification on MNIST with CNN
        About the data
        Implementing the CNN
        Analyzing the predictions produced with a CNN
    Using CNNs for sentence classification
        CNN structure
            Data transformation
            The convolution operation
            Pooling over time
        Implementation – sentence classification with CNNs
    Summary

Chapter 6: Recurrent Neural Networks
    Understanding Recurrent Neural Networks
        The problem with feed-forward neural networks
        Modeling with Recurrent Neural Networks
        Technical description of a Recurrent Neural Network
    Backpropagation Through Time
        How backpropagation works
        Why we cannot use BP directly for RNNs
        Backpropagation Through Time – training RNNs
        Truncated BPTT – training RNNs efficiently
        Limitations of BPTT – vanishing and exploding gradients
    Applications of RNNs
        One-to-one RNNs
        One-to-many RNNs
        Many-to-one RNNs
        Many-to-many RNNs
    Generating text with RNNs
        Defining hyperparameters
        Unrolling the inputs over time for Truncated BPTT
        Defining the validation dataset
        Defining weights and biases
        Defining state persisting variables
        Calculating the hidden states and outputs with unrolled inputs
        Calculating the loss
        Resetting state at the beginning of a new segment of text
        Calculating validation output
        Calculating gradients and optimizing
        Outputting a freshly generated chunk of text
    Evaluating text results output from the RNN
    Perplexity – measuring the quality of the text result
    Recurrent Neural Networks with Context Features – RNNs with longer memory
        Technical description of the RNN-CF
        Implementing the RNN-CF
            Defining the RNN-CF hyperparameters
            Defining input and output placeholders
            Defining weights of the RNN-CF
            Variables and operations for maintaining hidden and context states
            Calculating output
            Calculating the loss
            Calculating validation output
            Computing test output
            Computing the gradients and optimizing
        Text generated with the RNN-CF
    Summary

Chapter 7: Long Short-Term Memory Networks
    Understanding Long Short-Term Memory Networks
        What is an LSTM?
        LSTMs in more detail
        How LSTMs differ from standard RNNs
    How LSTMs solve the vanishing gradient problem
        Improving LSTMs
            Greedy sampling
            Beam search
            Using word vectors
            Bidirectional LSTMs (BiLSTM)
    Other variants of LSTMs
        Peephole connections
        Gated Recurrent Units
    Summary

Chapter 8: Applications of LSTM – Generating Text
    Our data
        About the dataset
        Preprocessing data
    Implementing an LSTM
        Defining hyperparameters
        Defining parameters
        Defining an LSTM cell and its operations
        Defining inputs and labels
        Defining sequential calculations required to process sequential data
        Defining the optimizer
        Decaying learning rate over time
        Making predictions
        Calculating perplexity (loss)
        Resetting states
        Greedy sampling to break unimodality
        Generating new text
        Example generated text
    Comparing LSTMs to LSTMs with peephole connections and GRUs
        Standard LSTM
            Review
            Example generated text
        Gated Recurrent Units (GRUs)
            Review
            The code
            Example generated text
        LSTMs with peepholes
            Review
            The code
            Example generated text
        Training and validation perplexities over time
    Improving LSTMs – beam search
        Implementing beam search
        Examples generated with beam search
    Improving LSTMs – generating text with words instead of n-grams
        The curse of dimensionality
        Word2vec to the rescue
        Generating text with Word2vec
        Examples generated with LSTM-Word2vec and beam search
        Perplexity over time
    Using the TensorFlow RNN API
    Summary

Chapter 9: Applications of LSTM – Image Caption Generation
    Getting to know the data
        ILSVRC ImageNet dataset
        The MS-COCO dataset
    The machine learning pipeline for image caption generation
    Extracting image features with CNNs
    Implementation – loading weights and inferencing with VGG-16
        Building and updating variables
        Preprocessing inputs
        Inferring VGG-16
        Extracting vectorized representations of images
        Predicting class probabilities with VGG-16
    Learning word embeddings
    Preparing captions for feeding into LSTMs
    Generating data for LSTMs
    Defining the LSTM
    Evaluating the results quantitatively
        BLEU
        ROUGE
        METEOR
        CIDEr
        BLEU-4 over time for our model
    Captions generated for test images
    Using TensorFlow RNN API with pretrained GloVe word vectors
        Loading GloVe word vectors
        Cleaning data
        Using pretrained embeddings with TensorFlow RNN API
            Defining the pretrained embedding layer and the adaptation layer
            Defining the LSTM cell and softmax layer
            Defining inputs and outputs
            Processing images and text differently
            Defining the LSTM output calculation
            Defining the logits and predictions
            Defining the sequence loss
            Defining the optimizer
    Summary

Chapter 10: Sequence-to-Sequence Learning – Neural Machine Translation
    Machine translation
    A brief historical tour of machine translation
        Rule-based translation
        Statistical Machine Translation (SMT)
        Neural Machine Translation (NMT)
    Understanding Neural Machine Translation
        Intuition behind NMT
        NMT architecture
            The embedding layer
            The encoder
            The context vector
            The decoder
    Preparing data for the NMT system
        At training time
        Reversing the source sentence
        At testing time
    Training the NMT
    Inference with NMT
    The BLEU score – evaluating the machine translation systems
        Modified precision
        Brevity penalty
        The final BLEU score
    Implementing an NMT from scratch – a German to English translator
        Introduction to data
        Preprocessing data
        Learning word embeddings
        Defining the encoder and the decoder
        Defining the end-to-end output calculation
        Some translation results
    Training an NMT jointly with word embeddings
        Maximizing matchings between the dataset vocabulary and the pretrained embeddings
        Defining the embeddings layer as a TensorFlow variable
    Improving NMTs
        Teacher forcing
        Deep LSTMs
    Attention
        Breaking the context vector bottleneck
        The attention mechanism in detail
        Implementing the attention mechanism
            Defining weights
            Computing attention
        Some translation results – NMT with attention
        Visualizing attention for source and target sentences
    Other applications of Seq2Seq models – chatbots
        Training a chatbot
        Evaluating chatbots – Turing test
    Summary

Chapter 11: Current Trends and the Future of Natural Language Processing
    Current trends in NLP
        Word embeddings
            Region embedding
            Probabilistic word embedding
            Ensemble embedding
            Topic embedding
        Neural Machine Translation (NMT)