Machine Learning Made Easy: April 2019

Thursday, April 18, 2019

Introduction to NLP Part 2: Regular Expression in Python

A regular expression is a sequence of characters that defines a search pattern for text. Regular expressions are often used to extract information from both structured and unstructured text corpora. Almost all programming languages have a well-defined library of functions for this purpose. In this post we look at some of the common functions used in Python, along with some scenario-based use cases. The broad objectives of this post are to:

Get familiar with functions used for search
Explore the 're' library
Use the expressions on a list and a data frame to:

Search text
Replace text
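
As a rough illustration of the search and replace objectives above, here is a minimal sketch that applies one pattern to a plain Python list using the standard 're' library. The sample sentences and the date pattern are assumptions made up for this illustration; they are not taken from the linked notebook.

```python
import re

# Hypothetical sample data, invented for this sketch.
sentences = [
    "Order 123 shipped on 2019-04-18",
    "No numbers here",
    "Call 555-0100 before 2019-04-06",
]

# Pattern for dates written as YYYY-MM-DD (four digits, two digits, two digits).
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

# Search: keep only the items that contain at least one date.
with_dates = [s for s in sentences if date_pattern.search(s)]

# Extract: collect every date found in each item (empty list if none).
dates = [date_pattern.findall(s) for s in sentences]

# Replace: mask every date with a placeholder string.
masked = [date_pattern.sub("<DATE>", s) for s in sentences]

print(with_dates)
print(dates)
print(masked)
```

For a data frame column, pandas string methods such as Series.str.contains and Series.str.replace (with regex=True) accept the same pattern syntax, so the same idea carries over.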

Link to download the Python notebook (ipynb) file:

https://drive.google.com/file/d/1G87XQbALi-EU6koFdz4u2MuFdQ_xm7hY/view?usp=sharing

Link to download the HTML version:
https://drive.google.com/file/d/1EOudA7eL1Rk0TyeUwiqvGSRUgD0DpRWX/view?usp=sharing

Saturday, April 6, 2019

Introduction to NLP Part 1: Tokenization, Lemmatization and Stop Word Removal

In this post we look at how to handle text data in Python. Any text analysis activity has three main components:

Tokenization
Lemmatization/Stemming
Stop Word Removal

We work through a small text example to understand how to perform the above three steps using the nltk library. I performed all the operations after downloading the nltk packages with the following line of code:

nltk.download()

The above line of code does not appear in the attached Python notebook or HTML version, but it is advisable to run it once after import nltk. The nltk.download() call can take some time (a few hours) to download all the relevant packages to your machine. After that you can run the entire Python script.
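
To give a feel for the three steps before opening the notebook, here is a minimal sketch using nltk components that work without any downloaded data. It is not the code from the notebook: it assumes a made-up sentence, uses a small hand-written stop word set in place of the nltk stop word corpus, and uses Porter stemming rather than lemmatization, since the WordNet lemmatizer needs the downloaded packages. Stop words are removed before stemming here, purely to keep the sketch simple.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Hypothetical example sentence, invented for this sketch.
text = "The cats are running in the gardens"

# Hand-written stop word set (an assumption); the nltk stop word corpus
# would normally be used once the data has been downloaded.
stop_words = {"the", "are", "in"}

# Step 1: tokenization (a regex-based tokenizer that needs no downloaded data).
tokens = [t.lower() for t in wordpunct_tokenize(text)]

# Step 2: stop word removal.
content_tokens = [t for t in tokens if t not in stop_words]

# Step 3: stemming (swapped in here for lemmatization).
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in content_tokens]

print(tokens)
print(content_tokens)
print(stems)
```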

Download Link: https://drive.google.com/drive/folders/12LrZTI5qT-vzz6ce5dpXZ2ucdUsfa9S_?usp=sharing

Download the ipynb file and the HTML version to follow the flow.
