My Recommendations for Getting Started with NLP
Resources for getting started with natural language processing.
I have been studying natural language processing (NLP) since 2013, back when manual feature engineering was very popular in the world of machine learning. We have come a long way since then. For my Ph.D., I specialized in information retrieval and machine learning techniques, particularly as they apply to social computing and computational linguistics, while also developing approaches for efficient information extraction from large-scale text data. I am fortunate to have experience with classical machine learning applied to NLP and to have witnessed firsthand the explosion of deep learning in the field.
Lots of students have been asking me to prepare a guide for getting started with natural language processing. This blog post is my attempt to help, drawing on research, exposure to the field, and personal experience. Although it is not a step-by-step guide, the resources I share here can help you create your own NLP learning path based on your needs. It is a collection of educational resources I have come across over the years, along with my experience studying them and where each one is most applicable.
The list is not exhaustive by any means, but it should provide a great starting point for anyone interested in getting started with NLP. You don’t need to consume all the content; just choose the resources that fit your current needs. For instance, maybe you already have some theoretical foundation and only need best practices for developing NLP systems in production. In that case, you can jump straight to the recommendations for getting hands-on experience with NLP techniques. I am only covering content that I have studied personally, and I am sure there are other wonderful resources out there that I have missed, so feel free to comment if you have any recommendations.
📘 Speech and Language Processing
by Dan Jurafsky and James H. Martin
Studying the fundamentals is vital when learning any subject. I am a huge advocate of this approach because it has worked for me. I have been following this book for a while, and it is now in its third edition. The material is exceptionally well written and offers a strong theoretical foundation in NLP, which makes it a great starting point for anyone getting started with the field. Even though I have read the book, I still consult it often because it is regularly updated with the latest developments in the field. If you like this book, you will also find these lectures useful, as they cover many of the fundamental topics in the book.
📘 Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax
by Emily M. Bender
Emily Bender is one of my favorite linguistics researchers. Her work has influenced my own research tremendously and has allowed me to adopt a more rigorous approach to NLP research. NLP is heavily influenced by linguistics, and Emily advocates for using insights from linguistics to inform developments in NLP. Her book provides an exceptional introduction to the linguistic concepts used in NLP. It is a must-read for any NLP student.
📘 Linguistic Structure Prediction
by Noah A. Smith
This book focuses on bridging natural language processing and machine learning, covering statistical and computational approaches to modeling linguistic structure. It assumes some prior exposure to machine learning; if you are not too familiar with the topic, you can check out the list of machine learning recommendations I made here. To make the most of this book, I advise taking at least an introductory machine learning course.
📘 Introduction to Natural Language Processing
by Jacob Eisenstein
This is one of my favorite NLP books because of its focus on linguistic concepts and applications. It covers methods such as beam search, maximum likelihood estimation, and matrix factorization, among others, and then explains how these methods are used to address a wide range of tasks like classification, part-of-speech tagging, relation extraction, and language modeling. It is more advanced than the other textbooks listed here and assumes knowledge of machine learning and of mathematical subjects like multivariate calculus and linear algebra. For that background, the book itself recommends Mathematics for Machine Learning.
📘 Neural Network Methods in Natural Language Processing (Synthesis Lectures on Human Language Technologies)
by Yoav Goldberg
If you are just starting your journey into NLP, you have probably been exposed to modern NLP methods like RNNs and other deep learning models. If you are looking for a comprehensive theoretical overview of neural networks and how they are used in NLP, this is the book for you. The references in this book have been instrumental in my own research.
🌐 Modern Deep Learning Techniques Applied to Natural Language Processing
by Soujanya Poria and Elvis Saravia
On the topic of modern methods for NLP, I would also recommend this open resource I put together with Soujanya Poria. It walks you through some of the more recent developments in NLP, ranging from word embeddings to attention mechanisms to reinforcement learning.
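To give a feel for what "attention mechanism" means in practice, here is a minimal sketch of scaled dot-product attention for a single query vector, written in plain Python so it runs without any deep learning framework. It is an illustration of the general technique, not code from the resource above; the function names are hypothetical, and real implementations operate on batched tensors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(query, keys, values):
    """Attend from one query vector over lists of key and value vectors.

    Each score is dot(query, key) / sqrt(d_k); the softmax of the scores
    gives the attention weights, and the output is the weighted sum of values.
    """
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


# The query is most similar to the first key, so the first value dominates.
out, w = scaled_dot_product_attention(
    [1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
print(w, out)
```

The division by the square root of the key dimension keeps the dot products from growing with vector size, which would otherwise push the softmax toward a near one-hot distribution.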
📺 CS224N: Natural Language Processing with Deep Learning | Winter 2019
by Christopher Manning and Abigail See
If you recently got started with NLP, you have probably come across this popular NLP course. All the lectures and slides are public, and you can find them on the course website. The course is heavily focused on deep learning methods for NLP, so the first lecture starts directly with word vectors before moving on to more advanced topics like convolutional networks and Transformers. If you are interested in classical NLP methods, you may want to consult one of the books mentioned at the beginning. In fact, I strongly recommend doing so, as that knowledge is valuable in practice for building real-world NLP systems.
As of October 2021, there is a new set of recordings for the course, available here.
The theory is great, but regardless of whether you are an NLP researcher or engineer, you still have to complement it with hands-on practice. These are some books I have found exceptionally useful for getting practice on topics like language modeling and text classification.
📘 Natural Language Processing with PyTorch: Build Intelligent Language Applications Using Deep Learning
by Delip Rao and Brian McMahan
The book is based on PyTorch, and it is great for getting hands-on practice building language applications with deep learning. It also includes content and code for traditional concepts and methods like TF-IDF and semantics, to name a few. If you are a PyTorch developer, you will find this book easy to follow.
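As a taste of the traditional methods mentioned above, here is a minimal TF-IDF sketch in plain Python. It is not taken from the book: it assumes documents are already tokenized, uses term frequency normalized by document length, and uses the unsmoothed idf(t) = log(N / df(t)). Libraries such as scikit-learn use smoothed variants by default, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per pre-tokenized document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / number of documents containing t)   (unsmoothed)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term at most once per document

    weights = []
    for doc in documents:
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
for w in tf_idf(docs):
    print(w)
```

Because "the" appears in every document, its idf is log(1) = 0 and it receives zero weight everywhere, while rarer words like "dog" and "ran" receive the highest weights in their documents.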
📘 Natural Language Processing in Action
by Hobson Lane, Cole Howard, and Hannes Hapke
This is another exceptional book, and one of my favorites for getting hands-on practice with all things NLP. It guides you from building your first vocabulary from a corpus all the way up to building a chatbot. There are a lot of code examples in this book, so if you are into coding, it could be a good fit for you.
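Building a vocabulary is usually the first step in any of these pipelines. The sketch below shows one common way to do it in plain Python; it is not the book's code. The regex tokenizer, the `min_count` cutoff, and the `<unk>` token for out-of-vocabulary words are illustrative assumptions, and the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Deliberately simple tokenizer: lowercase, then keep runs of letters, digits, apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(corpus, min_count=1, unk_token="<unk>"):
    """Map each token seen at least min_count times to an integer id.

    Id 0 is reserved for unknown tokens. Remaining ids are assigned by
    descending frequency, breaking ties alphabetically, so results are deterministic.
    """
    counts = Counter(token for text in corpus for token in tokenize(text))
    kept = sorted(
        (t for t, c in counts.items() if c >= min_count),
        key=lambda t: (-counts[t], t),
    )
    vocab = {unk_token: 0}
    for token in kept:
        vocab[token] = len(vocab)
    return vocab


def encode(text, vocab, unk_token="<unk>"):
    """Convert text to a list of ids, mapping unseen tokens to the unknown id."""
    return [vocab.get(token, vocab[unk_token]) for token in tokenize(text)]


corpus = ["The cat sat on the mat.", "The dog sat."]
vocab = build_vocabulary(corpus)
print(vocab)
print(encode("The bird sat", vocab))  # "bird" is out of vocabulary
```

Raising `min_count` shrinks the vocabulary by sending rare words to `<unk>`, a common trade-off between model size and coverage.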
📘 Practical Natural Language Processing
by Sowmya Vajjala, Bodhisattwa Majumder, Anuj Gupta, and Harshit Surana
In terms of hands-on practice for NLP, I am thoroughly enjoying this book published this year. It covers everything from a wide range of practical NLP applications to best practices for deploying NLP systems. Even though I am only halfway through it, I had to include it, as there are many NLP engineers out there who want to learn how to build NLP systems more effectively and understand the techniques needed to do so.
🤗 Natural Language Processing with Transformers
by Lewis Tunstall, Leandro von Werra, and Thomas Wolf
This book by the awesome folks at Hugging Face focuses on modern NLP, specifically Transformers, which have taken the field by storm. The book was in early release at the time of writing this update, but it is already packed with hands-on material and practical ML/NLP tips along the way. While it focuses mostly on Hugging Face libraries like Datasets and Transformers, this is one for the bookshelf for sure.
🐙 Libraries
When you are ready to start developing NLP-based solutions, I can recommend checking out Hugging Face Transformers and spaCy for some of the latest NLP techniques and best practices.
⭐️ Bonus
Here are a few other resources and projects that could help you to stay informed about the field of NLP:
That’s it for my recommendations on how to get started with NLP. It’s important that you choose the content that best fits your needs. I have tried to explain each item, and I hope that helps you create your own learning path. These are some of the best resources I have come across, and I have found them very useful for expanding my knowledge and even teaching these concepts, not to mention applying them to research ideas and building NLP systems that range from semantic search engines to emotion classifiers.
I try to regularly maintain this guide. To get regular updates on new ML and NLP resources, follow me on Twitter.