June 24, 2022

Syntactic Analysis by NLP

Druti Bansal

Duration: 10 min

The Parser Concept

A parser carries out the parsing process. It is a software component that takes input data (text), checks it for valid syntax against a formal grammar, and converts it into a structural representation. The resulting data structure can be a parse tree, an abstract syntax tree, or another hierarchical structure.

The primary functions of a parser include:

To report any syntax errors.
To recover from commonly occurring errors so that the rest of the input can be processed.
To build a parse tree.
To build a symbol table.
To produce intermediate representations (IR).

Parsing Techniques

Parsing techniques fall into two categories, distinguished by how the derivation is built:

Top-Down parsing

In top-down parsing, the parser builds the parse tree from the start symbol and tries to transform the start symbol into the input. The most common form of top-down parsing is recursive descent, which uses recursive procedures to process the input. Its main drawback is backtracking: when a chosen production fails to match, the parser must undo its work and try another alternative.
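The idea can be sketched with a small backtracking recursive descent parser. This is a minimal illustration, not a production parser: the toy grammar, the word list, and the function names (parse, expand, top_down_parse) are all assumptions made up for this example, and the approach would loop forever on a left-recursive grammar.

```python
# Toy grammar (hypothetical, for illustration only).
# Keys are non-terminals; anything that is not a key is treated as a terminal word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["sleeps"]],
}


def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way `symbol` can match tokens[pos:]."""
    if symbol not in GRAMMAR:
        # Terminal: it must equal the next input token.
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    # Non-terminal: try each production in turn. Moving on to the next
    # alternative after a failed one is the backtracking step.
    for production in GRAMMAR[symbol]:
        yield from expand(production, tokens, pos, [symbol])


def expand(production, tokens, pos, children):
    """Match the symbols of one production from left to right."""
    if not production:
        yield tuple(children), pos
        return
    for subtree, next_pos in parse(production[0], tokens, pos):
        yield from expand(production[1:], tokens, next_pos, children + [subtree])


def top_down_parse(sentence):
    """Return a parse tree as nested tuples, or None if the sentence is rejected."""
    tokens = sentence.split()
    for tree, end in parse("S", tokens, 0):
        if end == len(tokens):  # accept only if all input was consumed
            return tree
    return None


print(top_down_parse("the dog sees a cat"))
print(top_down_parse("dog the"))
```

Each tree is a nested tuple whose first element is the non-terminal label. For "the dog sleeps", the parser first tries VP -> V NP, fails because no input is left for the NP, and then falls back to VP -> V.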

Bottom-Up parsing

In bottom-up parsing, the parser begins with the input symbols and tries to build the parse tree upward until it reaches the start symbol.
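A simple way to picture this is a greedy shift-reduce loop. The sketch below is only an illustration under stated assumptions: the rule list and the function name shift_reduce are invented for this example, and the grammar was chosen so that always reducing as early as possible happens to work. Practical bottom-up parsers use lookahead or parsing tables to decide between shifting and reducing.

```python
# Toy rules (hypothetical): each entry is (left-hand side, right-hand side).
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
    ("Det", ("the",)),
    ("Det", ("a",)),
    ("N", ("dog",)),
    ("N", ("cat",)),
    ("V", ("sees",)),
]


def shift_reduce(sentence):
    """Greedy shift-reduce recognizer. Returns (accepted, trace of actions)."""
    stack, buffer, trace = [], sentence.split(), []
    while True:
        for lhs, rhs in RULES:
            # Reduce: the top of the stack matches the right-hand side of a rule.
            if tuple(stack[-len(rhs):]) == rhs:
                stack[-len(rhs):] = [lhs]
                trace.append(f"reduce {lhs}")
                break
        else:
            # No reduction applies: shift the next word, or stop if none is left.
            if not buffer:
                break
            stack.append(buffer.pop(0))
            trace.append(f"shift {stack[-1]}")
    # Accept only if the whole input was reduced to the start symbol.
    return stack == ["S"], trace


accepted, trace = shift_reduce("the dog sees a cat")
print(accepted)
print(trace)
```

The trace starts from the words and ends with the reduction to S, which is the reverse of the direction taken by a top-down parser.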

Derivation Concept

To obtain the input string, we need a sequence of production rules. A derivation is such a sequence of production rule applications. During parsing, we must decide at each step which non-terminal to replace and which production rule to replace it with.

Different Types of Derivation

There are two types of derivation, which differ in how they choose the non-terminal to replace with a production rule.

Left-Most Derivation

In the left-most derivation, the sentential form of the input is scanned and replaced from left to right: at each step, the left-most non-terminal is expanded. The sentential form in this case is called the left-sentential form.

Right-Most Derivation

In the right-most derivation, the sentential form of the input is scanned and replaced from right to left: at each step, the right-most non-terminal is expanded. The sentential form in this case is called the right-sentential form.
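The difference between the two derivations is easiest to see side by side. The sketch below uses a made-up toy grammar in which every non-terminal has exactly one production, so the only choice left is which non-terminal to expand next. The names PRODUCTIONS and derive are assumptions for this example.

```python
# Toy grammar (hypothetical): one production per non-terminal.
PRODUCTIONS = {
    "S": ["NP", "VP"],
    "NP": ["she"],
    "VP": ["V", "Obj"],
    "V": ["eats"],
    "Obj": ["fish"],
}


def derive(start, leftmost=True):
    """Return the list of sentential forms from `start` down to the final string."""
    form = [start]
    steps = [" ".join(form)]
    while True:
        # Positions of all non-terminals in the current sentential form.
        positions = [i for i, sym in enumerate(form) if sym in PRODUCTIONS]
        if not positions:
            return steps  # only terminals remain
        # Expand the left-most or the right-most non-terminal.
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = PRODUCTIONS[form[i]]
        steps.append(" ".join(form))


print(" => ".join(derive("S", leftmost=True)))
print(" => ".join(derive("S", leftmost=False)))
```

Both derivations start at S and end with the same string; only the order in which the non-terminals are replaced differs.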

Concept of the Parse Tree

A parse tree is a graphical representation of a derivation. The root node of the parse tree is the start symbol of the derivation. In every parse tree, the leaf nodes are terminals and the interior nodes are non-terminals. A useful property of parse trees is that reading the leaves from left to right yields the original input string.
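These properties can be checked on a small hand-built tree. The nested-tuple encoding, the example tree, and the helper names leaves and interior_labels are assumptions chosen for this sketch; libraries such as NLTK provide their own tree classes.

```python
# A hand-built parse tree (hypothetical example) as nested tuples:
# the first element of each tuple is a non-terminal label, the rest are children.
tree = ("S", ("NP", "she"), ("VP", ("V", "eats"), ("Obj", "fish")))


def leaves(node):
    """Collect the terminals (leaf nodes) from left to right."""
    if isinstance(node, str):
        return [node]
    label, *children = node
    return [leaf for child in children for leaf in leaves(child)]


def interior_labels(node):
    """Collect the non-terminal labels (interior nodes), root first."""
    if isinstance(node, str):
        return []
    label, *children = node
    return [label] + [name for child in children for name in interior_labels(child)]


print(" ".join(leaves(tree)))
print(interior_labels(tree))
```

Joining the leaves gives back the input string, and the first interior label is the start symbol at the root.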

Constituency Grammar or Phrase Structure

Phrase structure grammar, introduced by Noam Chomsky, is based on the constituency relation, which is why it is also called constituency grammar. It stands in contrast to dependency grammar, which is based on the dependency relation.

Before presenting an example, we need to cover the fundamentals of the constituency relation and constituency grammar.

All the related frameworks view sentence structure in terms of the constituency relation.
The constituency relation derives from the subject-predicate division of Latin and Greek grammars.
The basic clause structure is understood in terms of the noun phrase (NP) and the verb phrase (VP).

We will use the sentence “This tree is illustrating the constituency relation” to see how syntactic analysis works with the help of code.

Code

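import nltk
from nltk import pos_tag, word_tokenize, RegexpParser

# Note: word_tokenize and pos_tag need the NLTK tokenizer and tagger data,
# which can be fetched with nltk.download() if they are not installed yet.

# String to parse
to_parse = "This tree is illustrating the constituency relation"

# Tag every token in the sentence with its part of speech
tagged_parts = pos_tag(word_tokenize(to_parse))

# Chunk grammar that defines the phrases to extract
grammar = r"""
    NP: {<DT>?<JJ>*<NN>}   # noun phrase: optional determiner, adjectives, noun
    P: {<IN>}              # preposition
    V: {<V.*>}             # any verb tag
    PP: {<P> <NP>}         # prepositional phrase
    VP: {<V> <NP|PP>*}     # verb phrase
"""

# Build a chunk parser from the grammar
parser = RegexpParser(grammar)

# Chunk the tagged sentence and print the resulting tree
output = parser.parse(tagged_parts)
print("\nAfter extracting the chunks\n\n", output, "\n")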
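To see what a chunk rule such as NP: {<DT>?<JJ>*<NN>} is doing, it helps to apply the same pattern by hand with a plain regular expression over the tag sequence. The sketch below is a simplified stand-in and not NLTK's implementation: the part-of-speech tags are written by hand (the tagger's actual output may differ), and the names tagged, NP_PATTERN, and np_chunks are assumptions for this example.

```python
import re

# Hand-written tags for the example sentence (an assumption for illustration).
tagged = [
    ("This", "DT"), ("tree", "NN"), ("is", "VBZ"), ("illustrating", "VBG"),
    ("the", "DT"), ("constituency", "NN"), ("relation", "NN"),
]

# The NP rule as a regular expression over a string of tags:
# an optional determiner, any number of adjectives, then one noun.
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN>")


def np_chunks(tagged_tokens):
    """Return the words of every non-overlapping NP match, left to right."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged_tokens)
    chunks = []
    for match in NP_PATTERN.finditer(tag_string):
        # Each tag starts with "<", so counting "<" converts string
        # offsets back into token positions.
        first = tag_string.count("<", 0, match.start())
        length = match.group().count("<")
        chunks.append([word for word, _ in tagged_tokens[first:first + length]])
    return chunks


print(np_chunks(tagged))
```

Because the rule allows only a single <NN> per chunk, "the constituency" and "relation" come out as two separate noun phrases. Changing the rule to end in <NN>+ would be one way to group consecutive nouns together.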