Text Analytics with Python - Dipanjan Sarkar

E-Book (pdf, English): Text Analytics with Python by Dipanjan Sarkar

Leverage Natural Language Processing (NLP) in Python and learn how to set up your own robust environment for performing text analytics. The second edition of this book will show you how to use the latest state-of-the-art frameworks in NLP, coupled with Machine Learning and Deep Learning to solve real-world case studies leveraging the power of Python.

This edition has gone through a major revamp, introducing several major changes and new topics based on recent trends in NLP. A dedicated chapter on Python for NLP covers the fundamentals of working with strings and text data and introduces the current state-of-the-art open-source frameworks in NLP. Another dedicated chapter covers feature engineering and representation methods for text data, including both traditional statistical models and newer deep learning-based embedding models. Techniques for parsing and processing text data have also been improved with some new methods.

Among popular NLP applications, for text classification we also cover methods for tuning and improving our models. Text summarization has gone through a major overhaul in the context of topic models, where we showcase how to build, tune and interpret topic models on an interesting dataset of NIPS conference papers. Similarly, we cover text similarity techniques with a real-world example of movie recommenders. Sentiment analysis is covered in depth with both supervised and unsupervised techniques, including both machine learning and deep learning models for supervised sentiment analysis. Semantic analysis gets its own dedicated chapter, where we also showcase how you can build your own Named Entity Recognition (NER) system from scratch. To conclude, we have a completely new chapter on the promise of Deep Learning for NLP, where we showcase a hands-on example of deep transfer learning.

While the overall structure of the book remains the same, the entire code base, modules, and chapters will be updated to the latest Python 3.x release.

----------------------------------

Key selling points

- Implementations are based on Python 3.x and popular state-of-the-art open-source libraries in NLP

- Covers Machine Learning and Deep Learning for advanced text analytics and NLP

- Showcases diverse NLP applications including Classification, Clustering, Similarity, Recommenders, Topic Models, Sentiment Analysis and Semantic Analysis

About the author
Dipanjan (DJ) Sarkar is a Data Scientist at Red Hat, a published author and a consultant and trainer. He has consulted and worked with several startups as well as Fortune 500 companies like Intel. He primarily works on leveraging data science, advanced analytics, machine learning and deep learning to build large-scale intelligent systems. He holds a master of technology degree with specializations in Data Science and Software Engineering. He is also an avid supporter of self-learning and massive open online courses. He has recently ventured into the world of open-source products to improve the productivity of developers across the world.

Dipanjan has been an analytics practitioner for several years, specializing in machine learning, natural language processing, statistical methods and deep learning. With a passion for data science and education, he also acts as an AI Consultant and Mentor at various organizations like Springboard, where he helps people build their skills in areas like Data Science and Machine Learning. He is also a key contributor and Editor for Towards Data Science, a leading online journal focusing on Artificial Intelligence and Data Science. Dipanjan has also authored several books on R, Python, Machine Learning, Social Media Analytics, Natural Language Processing and Deep Learning.

Dipanjan's interests include learning about new technology, financial markets, disruptive start-ups, data science, artificial intelligence and deep learning. In his spare time he loves reading, gaming, watching popular sitcoms and football and writing interesting articles on https://medium.com/@dipanzan.sarkar and https://www.linkedin.com/in/dipanzan. He is also a strong supporter of open-source and publishes his code and analyses from his books and articles on GitHub at https://github.com/dipanjanS.

Contents

Chapter 1: Natural Language Basics

Chapter Goal: Introduces readers to the basics of NLP and text processing

No. of pages: 40 - 50

Sub-Topics:

1. Language Syntax and Structure

2. Text formats and grammars

3. Lexical and Text Corpora resources

4. Deep dive into the WordNet corpus

5. Parts of speech, stemming and lemmatization

Chapter 2: Python for Natural Language Processing

Chapter Goal: Shows readers how to set up their own Python environment for NLP, covers the basics of handling text data with Python, and surveys popular open-source frameworks for NLP

No. of pages: 20 - 30

Sub-Topics:

1. Set up Python for NLP

2. Handling strings with Python

3. Regular Expressions with Python

4. A quick look at nltk, gensim, spacy, scikit-learn and keras

Chapter 3: Processing and Understanding Text

Chapter Goal: This chapter covers the techniques and capabilities needed for processing and parsing text into easy-to-understand formats. We also look at how to segment and normalize text.

No. of pages: 35 - 40

Sub-Topics:

1. Sentence and word tokenization

2. Text tagging and chunking

3. Text Parse Trees

4. Text normalization

5. Text spell checks and removal of redundant characters

6. Synonyms and Synsets

Chapter 4: Feature Engineering for Text Data

Chapter Goal: This chapter covers important strategies to extract meaningful features from unstructured text data. This includes traditional techniques as well as newer deep learning-based methods.

No. of pages: 40 - 50

Sub-Topics:

1. Feature engineering strategies for text data

2. Bag of words model

3. TF-IDF model

4. Bag of N-grams model

5. Topic Models

6. Word embedding-based models (word2vec, GloVe)

Chapter 5: Text Classification

Chapter Goal: Introduces readers to the concept of classification as a supervised machine learning problem and looks at a real-world example of classifying text documents

No. of pages: 30 - 40

Sub-Topics:

1. Classification basics

2. Types of classifiers

3. Feature generation of text documents

4. Binary and multi-class classification models

5. Building a text classifier on real-world data with machine learning

6. Some coverage of deep learning-based classifiers

7. Evaluating Classifiers

Chapter 6: Text Summarization and Topic Modeling

Chapter Goal: Introduces readers to the concepts of text summarization, n-gram tagging analysis and topic models, and works through hands-on implementations on some real-world datasets

No. of pages: 40 - 45

Sub-Topics:

1. Text summarization concepts

2. Dimensionality reduction

3. N-gram tagging models

4. Topic modeling using LDA and LSA

5. Generate topics from real-world data…

Title: Text Analytics with Python

Subtitle: A Practitioner's Guide to Natural Language Processing

Author: Dipanjan Sarkar

EAN: 9781484243541

Format: E-Book (pdf)

Publisher: Apress

Genre: IT & Internet

Publication date: 21.05.2019

Digital copy protection: Watermark

File size: 17.41 MB

Number of pages: 674