List the steps to build an NLP pipeline.

NLP Pipeline

Building an NLP pipeline involves the following steps:

1. Segmentation: It breaks a paragraph into separate sentences. For example, consider the paragraph "The sky is clear; the stars are twinkling at night." The segments are: a) The sky is clear. b) The stars are twinkling at night.

2. Tokenization: It breaks a sentence into separate words, or tokens. For the second segment above, the tokenizer generates the following result: "The", "stars", "are", "twinkling", "at", "night"

3. Stop Word Removal: Common words such as was, in, is, and, and the are called stop words and can be removed at this stage. For example, the words remaining after stop word removal are "stars", "twinkling", "night".

4. Stemming: It is the process of reducing a word to its word stem. A stem yields new words when affixes are added to it. For example, celebrates, celebrated, and celebrating all originate from the single root word "celebrate".

5. Lemmatization: Lemmatization is quite similar to stemming. It groups the different inflected forms of a word under a single base form, called the lemma. The main difference between stemming and lemmatization is that lemmatization produces a root word that has a meaning on its own, whereas a stem need not be a valid word. For example, in lemmatization, the words intelligence, intelligent, and intelligently have the root word intelligent, which has a meaning.

6. Dependency Parsing: Dependency parsing determines how the words in a sentence are related to each other.

7. Part of Speech (POS) tags: POS tags include noun, verb, adverb, and adjective. A tag indicates how a word functions within a sentence, both in meaning and grammatically. A word can have more than one part of speech, depending on the context in which it is used. For example, in "Google something on the internet", Google is used as a verb, although it is a proper noun.

8. Named Entity Recognition (NER): NER is the process of detecting named entities such as a person name, movie name, organization name, or location. Example: Steve Jobs introduced the iPhone at the Macworld Conference in San Francisco, California. Here, Steve Jobs is a person name, and San Francisco and California are locations.

9. Chunking: Chunking collects individual pieces of information and groups them into larger units of the sentence, such as phrases.
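
The first five steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a production pipeline: the function names, the small stop word set, the suffix list used for stemming, and the tiny lemma dictionary are all assumptions made for this example. Real projects would typically rely on an NLP library for these steps, and for the later steps (dependency parsing, POS tagging, NER, chunking) as well.

```python
import re

# Assumption: a tiny hand-picked stop word list, for illustration only.
STOP_WORDS = {"the", "is", "are", "was", "in", "and", "at", "a", "an"}

# Assumption: a toy lemma dictionary; real lemmatizers use a full vocabulary.
LEMMAS = {"stars": "star", "twinkling": "twinkle", "is": "be", "are": "be"}


def segment(paragraph):
    """Step 1: split a paragraph into sentences on '.', '!', '?' or ';'."""
    parts = re.split(r"[.!?;]+", paragraph)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Step 2: split a sentence into word tokens (letters and apostrophes)."""
    return re.findall(r"[A-Za-z']+", sentence)


def remove_stop_words(tokens):
    """Step 3: drop tokens that appear in the stop word list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def stem(word):
    """Step 4: naive suffix stripping (a toy rule, not the Porter algorithm)."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Step 5: dictionary lookup; unknown words are returned lowercased."""
    return LEMMAS.get(word.lower(), word.lower())


paragraph = "The sky is clear; the stars are twinkling at night."
sentences = segment(paragraph)
tokens = tokenize(sentences[1])
content_words = remove_stop_words(tokens)
stems = [stem(t) for t in content_words]
lemmas = [lemmatize(t) for t in content_words]

print(sentences)
print(tokens)
print(content_words)
print(stems)
print(lemmas)
```

Note that the toy stemmer maps "twinkling" to "twinkl", which is not a dictionary word, while the lemma lookup returns "twinkle". This mirrors the difference between stemming and lemmatization described in steps 4 and 5.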

