Parser

Published on February 2017 | Categories: Documents | Downloads: 24 | Comments: 0 | Views: 229
Grammars

Recursive-Descent Parsing

Top-Down Predictive Parsing

Parsing
Compiler Design

CSE 504

1. Grammars
2. Recursive-Descent Parsing
3. Top-Down Predictive Parsing
Version: 1.3 20:16:32 2012/02/14 Compiled at 07:22 on 2012/02/22 Compiler Design Parsing CSE 504 1 / 37

Parsing

A.k.a. Syntax Analysis
Recognize sentences in a language.
Discover the structure of a document/program.
Construct (implicitly or explicitly) a tree (called a parse tree) to represent the structure.
The parse tree is used to guide translation.

Grammars
The syntactic structure of a language is defined using grammars.
Grammars (like regular expressions) specify a set of strings over an alphabet.
Efficient recognizers (automata) can be constructed to determine whether a string is in the language.
Language hierarchy:
  Finite Languages (FL): specified by enumeration
  Regular Languages (RL ⊃ FL): specified by regular expressions
  Context-Free Languages (CFL ⊃ RL): specified by context-free grammars

Regular Languages

Languages represented by regular expressions
Languages recognized by finite automata
Examples:
  √ {a, b, c}
  √ {ε, a, b, aa, ab, ba, bb, ...}
  √ {(ab)^n | n ≥ 0}
  × {a^n b^n | n ≥ 0}
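As a rough illustration of the examples above, the following sketch uses Python's `re` module for the three regular languages and a hand-written check for {a^n b^n}, which no regular expression can describe. The names `in_language` and `is_anbn` are made up for this example.

```python
import re

# Regular languages from the slide, written as regular expressions.
FINITE = re.compile(r"[abc]")     # {a, b, c}
ANY_AB = re.compile(r"[ab]*")     # {ε, a, b, aa, ab, ba, bb, ...}
AB_STAR = re.compile(r"(ab)*")    # {(ab)^n | n >= 0}


def in_language(pattern, s):
    """True if the whole string s matches the compiled pattern."""
    return pattern.fullmatch(s) is not None


def is_anbn(s):
    """Recognize {a^n b^n | n >= 0} by comparing against n a's then n b's."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n
```

The last function needs to remember an unbounded count n, which is exactly what a finite automaton cannot do.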

Context-Free Grammars

Terminal Symbols: Tokens
Nonterminal Symbols: set of strings made up of tokens
Productions: Rules for constructing the set of strings associated with nonterminal symbols.
  Example: Stmt −→ while Expr do Stmt
Start symbol: a nonterminal symbol that represents the set of all strings in the language.

Grammars
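Notation where recursion is explicit.
Examples:
  {ε, a, b, aa, ab, ba, bb, ...} = L((a | b)∗):
    S −→ ε
    S −→ E S
    E −→ a
    E −→ b
  Notational shorthand:
    S −→ ε | E S
    E −→ a | b
  {a^n b^n | n ≥ 0}:
    S −→ ε
    S −→ a S b
  {w | no. of a's in w = no. of b's in w}: not expressible as a regular expression, but expressible as a CFG:
    S −→ ε | a S b S | b S a S
  (A language such as {a^n b^n c^n | n ≥ 0}, by contrast, is not expressible as a CFG.)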

The First Useful Example

E −→ E + E
E −→ E − E
E −→ E ∗ E
E −→ E / E
E −→ ( E )
E −→ id

L(E) = {id, id + id, id − id, ..., id + (id ∗ id) − id, ...}

Context-Free Grammars: Notations
Production: rule with a nonterminal symbol on the left-hand side, and a (possibly empty) sequence of terminal or nonterminal symbols on the right-hand side.
Notations:
  Terminals: lower case letters, digits, punctuation
  Nonterminals: upper case letters
  Arbitrary Terminals/Nonterminals: X, Y, Z
  Strings of Terminals: u, v, w
  Strings of Terminals/Nonterminals: α, β, γ
  Start Symbol: S

Derivations
Grammar:
  E −→ E + E
  E −→ id

E derives id + id:
  E =⇒ E + E
    =⇒ E + id
    =⇒ id + id

αAβ =⇒ αγβ iff A −→ γ is a production in the grammar.
α =⇒∗ β if α derives β in zero or more steps.
  Example: E =⇒∗ id + id
Sentence: A sequence of terminal symbols w such that S =⇒+ w (where S is the start symbol).
Sentential Form: A sequence of terminal/nonterminal symbols α such that S =⇒∗ α.

Derivations
Grammar:
  E −→ E + E
  E −→ id

Leftmost derivation: Leftmost nonterminal is replaced first:
  E =⇒ E + E
    =⇒ id + E
    =⇒ id + id
  Written as E =⇒∗lm id + id

Rightmost derivation: Rightmost nonterminal is replaced first:
  E =⇒ E + E
    =⇒ E + id
    =⇒ id + id
  Written as E =⇒∗rm id + id



Parse Trees
A Parse Tree is a graphical representation of a derivation.
Grammar:
  E −→ E + E
  E −→ id

Leftmost derivation:   E =⇒ E + E =⇒ id + E =⇒ id + id
Rightmost derivation:  E =⇒ E + E =⇒ E + id =⇒ id + id

Both derivations correspond to the same parse tree:

        E
      / | \
     E  +  E
     |     |
     id    id

A Parse Tree succinctly captures the structure of a sentence.

Ambiguity
A Grammar is ambiguous if there are multiple parse trees for the same sentence.

Grammar:
  E −→ E + E
  E −→ E ∗ E
  E −→ id
Sentence: id + id ∗ id

Two parse trees for the sentence:

        E                      E
      / | \                  / | \
     E  +  E                E  ∗  E
     |   / | \            / | \   |
     id E  ∗  E          E  +  E  id
        |     |          |     |
        id    id         id    id
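To make the ambiguity concrete, the sketch below counts the parse trees of a token sequence under the grammar E −→ E + E | E ∗ E | id by trying every operator as the root. The function name `count_parse_trees` and the use of `"*"` for ∗ are assumptions of this example, not part of the slides.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of tokens under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

For the sentence `id + id * id` the count is greater than one, which is what makes the grammar ambiguous; a single `id` has exactly one tree.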

Disambiguation
Express preference for one parse tree over others.
Example: id + id ∗ id
The usual precedence of ∗ over + means that the tree in which id ∗ id is grouped first is preferred:

        E                      E
      / | \                  / | \
     E  +  E                E  ∗  E
     |   / | \            / | \   |
     id E  ∗  E          E  +  E  id
        |     |          |     |
        id    id         id    id

    Preferred

Parsing
Construct a parse tree for a given string.
Grammar:
  S −→ ( S ) S
  S −→ a
  S −→ ε

Parse tree for (a)a:

        S
     / /  \ \
    (  S   )  S
       |      |
       a      a

Parse tree for (a)(a):

        S
     / /  \ \
    (  S   )  S
       |    / / \ \
       a   (  S  )  S
              |     |
              a     ε

A Procedure for Parsing

Grammar:
  S −→ a

Algorithm:
  parse_S() {
    switch (input token) {
      case TOKEN_A:
        consume(TOKEN_A);
        return;
      default:
        /* Parse Error */
    }
  }

A Procedure for Parsing (Contd.)
Grammar:
  S −→ a
  S −→ ε

Algorithm:
  parse_S() {
    switch (input token) {
      case TOKEN_A:     /* Production 1 */
        consume(TOKEN_A);
        return;
      case TOKEN_EOF:   /* Production 2 */
        return;
      default:
        /* Parse Error */
    }
  }

A Procedure for Parsing (Contd.)
Grammar:
  S −→ ( S ) S
  S −→ a
  S −→ ε

Algorithm:
  parse_S() {
    switch (input token) {
      case TOKEN_OPEN_PAREN:    /* Production 1 */
        consume(TOKEN_OPEN_PAREN);
        parse_S();
        consume(TOKEN_CLOSE_PAREN);
        parse_S();
        return;
      case TOKEN_A:             /* Production 2 */
        consume(TOKEN_A);
        return;
      case TOKEN_CLOSE_PAREN:
      case TOKEN_EOF:           /* Production 3 */
        return;
      default:
        /* Parse Error */
    }
  }
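The procedure above can be turned into a small runnable program. The following is a minimal sketch in Python, assuming single-character tokens and a `$` end marker; the class `Parser`, the helper `parse`, and the tuple representation of parse trees are illustrative choices, not part of the slides.

```python
class ParseError(Exception):
    pass


EOF = "$"  # assumed end-of-input marker


class Parser:
    def __init__(self, text):
        self.tokens = list(text) + [EOF]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def consume(self, expected):
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_S(self):
        tok = self.peek()
        if tok == "(":              # Production 1: S -> ( S ) S
            self.consume("(")
            inner = self.parse_S()
            self.consume(")")
            rest = self.parse_S()
            return ("S", "(", inner, ")", rest)
        if tok == "a":              # Production 2: S -> a
            self.consume("a")
            return ("S", "a")
        if tok in (")", EOF):       # Production 3: S -> ε
            return ("S",)
        raise ParseError(f"unexpected {tok!r}")


def parse(text):
    parser = Parser(text)
    tree = parser.parse_S()
    parser.consume(EOF)             # the whole input must be used up
    return tree
```

Each call to `parse_S` inspects only the next token to pick a production, so no backtracking is needed for this grammar.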

Predictive Parsing: Restrictions

May not be able to choose a unique production:
  S −→ a B d
  B −→ b
  B −→ bc
In general, we may need a backtracking parser: Recursive Descent Parsing
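One way to picture backtracking for this grammar is to let each symbol report every input position at which it could end, and to try the alternatives in turn. The sketch below does this with Python generators; `derive`, `derive_sequence`, and `accepts` are hypothetical names, and the approach assumes a grammar without left recursion.

```python
# Grammar from the slide: S -> a B d, B -> b | bc.
GRAMMAR = {
    "S": [["a", "B", "d"]],
    "B": [["b"], ["b", "c"]],
}


def derive(symbol, text, pos):
    """Yield every position where `symbol` can end when started at `pos`."""
    if symbol not in GRAMMAR:               # terminal: must match the input
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for production in GRAMMAR[symbol]:      # nonterminal: try each alternative
        yield from derive_sequence(production, text, pos)


def derive_sequence(symbols, text, pos):
    """Yield every end position for a sequence of grammar symbols."""
    if not symbols:
        yield pos
        return
    for mid in derive(symbols[0], text, pos):
        # If the rest fails, the loop resumes with the next choice (backtracking).
        yield from derive_sequence(symbols[1:], text, mid)


def accepts(text):
    return any(end == len(text) for end in derive("S", text, 0))
```

On input `abcd` the first choice B −→ b fails at `d`, and the parser falls back to B −→ bc.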

FIRST and FOLLOW
Grammar: S −→ ( S ) S | a | ε

FIRST(X) = First symbol of any string that can be derived from X.
  FIRST(S) = { (, a, ε }
FOLLOW(A) = First symbol that, in some derivation of a sentence in the language, appears immediately after A.
  FOLLOW(S) = { ), EOF }

[Figure: a parse tree rooted at S containing a subtree rooted at C; a is the first leaf under C and b is the leaf immediately after C's subtree, so a ∈ FIRST(C) and b ∈ FOLLOW(C).]

FIRST and FOLLOW

Grammar:
  S −→ A S B        A −→ a
  S −→ ε            B −→ b

FIRST(X): First terminal in some α such that X =⇒∗ α.
FOLLOW(A): First terminal in some β such that S =⇒∗ αAβ.

  FIRST(S) = { a, ε }      FOLLOW(S) = { b, EOF }
  FIRST(A) = { a }         FOLLOW(A) = { a, b }
  FIRST(B) = { b }         FOLLOW(B) = { b, EOF }

Definition of FIRST
Grammar:
  S −→ A S B        A −→ a
  S −→ ε            B −→ b

FIRST(α) is the smallest set such that:
  α = a, a terminal:
      a ∈ FIRST(α)
  α = A, a nonterminal:
      A −→ ε ∈ G  =⇒  ε ∈ FIRST(α)
      A −→ β ∈ G, β ≠ ε  =⇒  FIRST(β) ⊆ FIRST(α)
  α = X1 X2 ... Xk, a string of terminals and nonterminals:
      FIRST(X1) − {ε} ⊆ FIRST(α)
      FIRST(Xi) − {ε} ⊆ FIRST(α) if ∀j < i, ε ∈ FIRST(Xj)
      ε ∈ FIRST(α) if ∀j ≤ k, ε ∈ FIRST(Xj)

Applied to the grammar:
  FIRST(A) ⊇ FIRST(a) ⊇ {a}
  FIRST(B) ⊇ FIRST(b) ⊇ {b}
  FIRST(S) ⊇ {ε}, and FIRST(S) ⊇ FIRST(A S B) ⊇ FIRST(A) ⊇ {a}
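Because FIRST is defined as the smallest set satisfying these properties, it can be computed by starting from empty sets and applying the properties until nothing changes. The sketch below does this for the example grammar; the dictionary encoding of the grammar (an empty list standing for ε) and the function names are assumptions made for illustration.

```python
EPS = "ε"

# Grammar from the slide: S -> A S B | ε, A -> a, B -> b.
GRAMMAR = {
    "S": [["A", "S", "B"], []],
    "A": [["a"]],
    "B": [["b"]],
}


def first_of_string(symbols, first):
    """FIRST of a string X1 ... Xk, given the FIRST sets of the nonterminals."""
    result = set()
    for x in symbols:
        x_first = first[x] if x in first else {x}   # a terminal is its own FIRST
        result |= x_first - {EPS}
        if EPS not in x_first:
            return result
    result.add(EPS)     # every Xj can derive ε (or the string is empty)
    return result


def compute_first(grammar):
    """Grow the sets until they stop changing (a least fixed point)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first
```

Calling `compute_first(GRAMMAR)` should reproduce the FIRST sets listed for S, A, and B on the slides.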

Definition of FOLLOW

Grammar:
  S −→ A S B        A −→ a
  S −→ ε            B −→ b

FOLLOW(A) is the smallest set such that:
  A = S, the start symbol:
      EOF ∈ FOLLOW(S)    (Book notation: $ ∈ FOLLOW(S))
  B −→ αAβ ∈ G:
      FIRST(β) − {ε} ⊆ FOLLOW(A)
  B −→ αA, or B −→ αAβ with ε ∈ FIRST(β):
      FOLLOW(B) ⊆ FOLLOW(A)

Applied to the grammar:
  FOLLOW(S) ⊇ {EOF}, and FOLLOW(S) ⊇ FIRST(B) − {ε} ⊇ {b}
  FOLLOW(A) ⊇ FIRST(S B) − {ε} ⊇ {a, b}
  FOLLOW(B) ⊇ FOLLOW(S) ⊇ {b, EOF}
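The FOLLOW properties can be applied in the same iterate-until-stable style. This sketch takes the FIRST sets as given on the slides rather than recomputing them; the grammar encoding and the names `first_of_string` and `compute_follow` are illustrative assumptions.

```python
EPS, EOF = "ε", "EOF"

# Grammar from the slide; an empty list stands for ε.
GRAMMAR = {
    "S": [["A", "S", "B"], []],
    "A": [["a"]],
    "B": [["b"]],
}
# FIRST sets as listed on the slides.
FIRST = {"S": {"a", EPS}, "A": {"a"}, "B": {"b"}}


def first_of_string(symbols):
    """FIRST of a string of symbols, using the FIRST table above."""
    result = set()
    for x in symbols:
        x_first = FIRST.get(x, {x})      # a terminal is its own FIRST
        result |= x_first - {EPS}
        if EPS not in x_first:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)                       # EOF ∈ FOLLOW(start)
    changed = True
    while changed:
        changed = False
        for lhs, productions in grammar.items():
            for rhs in productions:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:       # only nonterminals have FOLLOW
                        continue
                    beta_first = first_of_string(rhs[i + 1:])
                    new = beta_first - {EPS}     # FIRST(β) − {ε}
                    if EPS in beta_first:        # β is empty or can derive ε
                        new |= follow[lhs]       # add FOLLOW of the left-hand side
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow
```

`compute_follow(GRAMMAR, "S")` should agree with the FOLLOW sets shown for S, A, and B.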

Parsing Table
Grammar:
  S −→ A S B        A −→ a
  S −→ ε            B −→ b

Algorithm:
  parse_S() {
    switch (input token) {
      case TOKEN_A:     /* S −→ A S B */
        parse_A(); parse_S(); parse_B();
        return;
      case TOKEN_B:
      case TOKEN_EOF:   /* S −→ ε */
        return;
      default:
        /* Parse Error */
    }
  }

Parsing Table:
                 Input Symbol
  Nonterminal | a            | b        | EOF
  S           | S −→ A S B   | S −→ ε   | S −→ ε

Table-driven Parsing

Grammar:
  S −→ A S B        A −→ a
  S −→ ε            B −→ b

Parsing Table:
                 Input Symbol
  Nonterminal | a            | b        | EOF
  S           | S −→ A S B   | S −→ ε   | S −→ ε
  A           | A −→ a       |          |
  B           |              | B −→ b   |

Nonrecursive Parsing

Instead of recursion, use an explicit stack along with the parsing table.
Data objects:
  Parsing Table: M(A, a), a two-dimensional array, with dimensions indexed by nonterminal symbols (A) and terminal symbols (a).
  A Stack of terminal/nonterminal symbols
  Input stream of tokens
The above data structures are manipulated using a table-driven parsing program.

Table-driven Parsing: a Sketch
stack initialized to hold the start symbol.
while (! stack.isEmpty()) {
    X = stack.top();
    if (X is a terminal symbol) {
        stack.pop();
        consume(X);
    } else /* X is a nonterminal */
        if (M[X, input token] = X −→ Y1 Y2 ... Yk) {
            stack.pop();
            for i = k downto 1 do
                stack.push(Yi);
        } else
            /* Syntax Error */
}
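The sketch above can be made runnable with the parsing table for S −→ A S B | ε, A −→ a, B −→ b hard-coded as a dictionary. This is one possible rendering in Python, assuming single-character tokens; the `TABLE` layout, the `parse` function, and the choice to return the list of productions used are illustrative.

```python
EOF = "EOF"

# Parsing table M[nonterminal][input symbol]; an empty list stands for ε.
TABLE = {
    "S": {"a": ["A", "S", "B"], "b": [], EOF: []},
    "A": {"a": ["a"]},
    "B": {"b": ["b"]},
}


def parse(tokens, start="S"):
    """Return the list of productions used, or raise SyntaxError."""
    tokens = list(tokens) + [EOF]
    pos = 0
    stack = [start]
    used = []
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top not in TABLE:                  # terminal: must match the input
            if top != lookahead:
                raise SyntaxError(f"expected {top!r}, got {lookahead!r}")
            pos += 1
        elif lookahead in TABLE[top]:         # nonterminal: expand by the table
            rhs = TABLE[top][lookahead]
            used.append((top, tuple(rhs)))
            stack.extend(reversed(rhs))       # push Yk ... Y1 so Y1 ends on top
        else:
            raise SyntaxError(f"no rule for {top!r} on {lookahead!r}")
    if tokens[pos] != EOF:
        raise SyntaxError("trailing input")
    return used
```

The sequence of productions returned for `aabb` corresponds to a leftmost derivation of that string.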

Constructing Parsing Table

Grammar:
  S −→ A S B        A −→ a
  S −→ ε            B −→ b

  First(S) = { a, ε }      Follow(S) = { b, EOF }
  First(A) = { a }         Follow(A) = { a, b }
  First(B) = { b }         Follow(B) = { b, EOF }

Parsing Table:
                 Input Symbol
  Nonterminal | a            | b        | EOF
  S           | S −→ A S B   | S −→ ε   | S −→ ε
  A           | A −→ a       |          |
  B           |              | B −→ b   |

A Procedure to Construct Parsing Tables
FIRST(X): First terminal in some α such that X =⇒∗ α.
FOLLOW(A): First terminal in some β such that S =⇒∗ αAβ.

Algorithm:
  table_construct(G) {
    for each A −→ α ∈ G {
      for each a ∈ FIRST(α) such that a ≠ ε
        add A −→ α to M[A, a];
      if ε ∈ FIRST(α)
        for each b ∈ FOLLOW(A)
          add A −→ α to M[A, b];
    }
  }
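As a rough Python rendering of the construction procedure, the sketch below fills the table for S −→ A S B | ε, A −→ a, B −→ b, taking the FIRST and FOLLOW sets from the slides as given. The data layout (a dictionary keyed by (nonterminal, terminal) pairs holding lists of right-hand sides) and the helper `is_ll1` are assumptions for illustration.

```python
EPS, EOF = "ε", "EOF"

GRAMMAR = [                 # (A, α) pairs; α = [] stands for ε
    ("S", ["A", "S", "B"]),
    ("S", []),
    ("A", ["a"]),
    ("B", ["b"]),
]
# FIRST and FOLLOW sets as listed on the slides.
FIRST = {"S": {"a", EPS}, "A": {"a"}, "B": {"b"}}
FOLLOW = {"S": {"b", EOF}, "A": {"a", "b"}, "B": {"b", EOF}}


def first_of_string(symbols):
    """FIRST of a string of symbols, using the FIRST table above."""
    result = set()
    for x in symbols:
        x_first = FIRST.get(x, {x})      # a terminal is its own FIRST
        result |= x_first - {EPS}
        if EPS not in x_first:
            return result
    result.add(EPS)
    return result


def table_construct(grammar):
    """M[(A, a)] is the list of right-hand sides selected for A on input a."""
    table = {}
    for lhs, rhs in grammar:
        alpha_first = first_of_string(rhs)
        for a in alpha_first - {EPS}:
            table.setdefault((lhs, a), []).append(rhs)
        if EPS in alpha_first:
            for b in FOLLOW[lhs]:
                table.setdefault((lhs, b), []).append(rhs)
    return table


def is_ll1(table):
    """No conflicts: every filled cell holds exactly one production."""
    return all(len(entries) <= 1 for entries in table.values())
```

Keeping a list per cell, rather than a single production, makes conflicts visible: a cell with two entries means the grammar is not LL(1).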

Parsing Table: Another Example

Grammar:
  S −→ ( S ) S
  S −→ a
  S −→ ε

FIRST(S) = { '(', a, ε }
FOLLOW(S) = { ')', EOF }

Parsing Table:
                 Input Symbol
  Nonterminal | a        | (              | )        | EOF
  S           | S −→ a   | S −→ ( S ) S   | S −→ ε   | S −→ ε

LL(1) Grammar: A grammar whose recursive descent parsing table has no conflicts (i.e., each cell has at most one entry).

Recursive Descent Parsing: Restrictions
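Grammar cannot be left-recursive.
Example: E −→ E + E | a

Algorithm:
  parse_E() {
    switch (input token) {
      case TOKEN_A:   /* Production 1 */
        parse_E(); consume(TOKEN_PLUS); parse_E();
        return;
      case TOKEN_A:   /* Production 2 */
        consume(TOKEN_A);
        return;
    }
  }

Both productions are selected on TOKEN_A, and Production 1 calls parse_E() again before consuming any input, so the procedure never terminates.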

Removing Left Recursion

A −→ A a
A −→ b

L(A) = {b, ba, baa, baaa, baaaa, ...}

        ⇓

A −→ b A′
A′ −→ a A′
A′ −→ ε
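The rewrite shown above follows a general pattern for immediate left recursion: A −→ A α | β becomes A −→ β A′ and A′ −→ α A′ | ε. A minimal sketch of that pattern, assuming productions are given as lists of symbols and that the new nonterminal can be named by appending a prime (the function name is hypothetical):

```python
def remove_left_recursion(nonterminal, productions):
    """Rewrite A -> A α | β as A -> β A', A' -> α A' | ε.

    `productions` is a list of right-hand sides (lists of symbols);
    an empty list stands for ε. Only immediate left recursion is handled.
    """
    # Right-hand sides that start with A, with the leading A stripped off (the α's).
    recursive = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    # All remaining right-hand sides (the β's).
    others = [rhs for rhs in productions if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}
    new_nt = nonterminal + "'"
    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }
```

Applied to A −→ A a | b, this yields the three productions shown on the slide. The result of this mechanical rewrite is not guaranteed to be LL(1) for every grammar.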

Removing Left Recursion: Another Example

E −→ E + E
E −→ id

        ⇓

E −→ id E′
E′ −→ + id E′
E′ −→ ε

Parsing with LL(1) Grammars: Example 1
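Parsing Table:
  Nonterminal | a            | b        | EOF
  S           | S −→ A S B   | S −→ ε   | S −→ ε
  A           | A −→ a       |          |
  B           |              | B −→ b   |

Parse of a a b b (top of stack at the right):
  Stack       | Input Stream | Rule          | Derivation
  $ S         | a a b b $    | S −→ A S B    | S =⇒ A S B
  $ B S A     | a a b b $    | A −→ a        |   =⇒ a S B
  $ B S a     | a a b b $    |               |
  $ B S       | a b b $      | S −→ A S B    |   =⇒ a A S B B
  $ B B S A   | a b b $      | A −→ a        |   =⇒ a a S B B
  $ B B S a   | a b b $      |               |
  $ B B S     | b b $        | S −→ ε        |   =⇒ a a B B
  $ B B       | b b $        | B −→ b        |   =⇒ a a b B
  $ B b       | b b $        |               |
  $ B         | b $          | B −→ b        |   =⇒ a a b b
  $ b         | b $          |               |
  $           | $            |               |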

Parsing with LL(1) Grammars: Example 2

Parsing table:

                  Input Symbol
Nonterminal   id            +                EOF
E             E → id E′
E′                          E′ → + id E′     E′ → ε

Parse of the input id + id$ (top of the stack is at the right):

Stack       Input Stream    Rule
$E          id + id$        E → id E′
$E′ id      id + id$
$E′         + id$           E′ → + id E′
$E′ id +    + id$
$E′ id      id$
$E′         $               E′ → ε
$           $

Derivation: E ⇒ id E′ ⇒ id + id E′ ⇒ id + id
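
The stack-driven parse traced in this example can be sketched in Python. This is a minimal illustration, not the course's implementation: the names TABLE, NONTERMINALS and ll1_parse are hypothetical, and the table entries are taken from the rules used in the trace (E → id E′, E′ → + id E′, E′ → ε), with E′ written as "E'".

```python
EOF = "$"

# Predictive parsing table: (nonterminal, lookahead) -> right-hand side.
# An empty list stands for the epsilon production.
TABLE = {
    ("E", "id"): ["id", "E'"],
    ("E'", "+"): ["+", "id", "E'"],
    ("E'", EOF): [],
}
NONTERMINALS = {"E", "E'"}


def ll1_parse(tokens, start="E"):
    """Return the rules applied, in order, or raise SyntaxError."""
    stack = [EOF, start]              # top of the stack is the end of the list
    tokens = list(tokens) + [EOF]
    pos = 0
    applied = []
    while stack:
        top = stack.pop()
        look = tokens[pos]            # one token of lookahead
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                raise SyntaxError(f"no rule for {top} on {look!r}")
            applied.append((top, tuple(rhs)))
            stack.extend(reversed(rhs))   # leftmost symbol ends up on top
        elif top == look:
            pos += 1                      # match a terminal (or EOF)
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    return applied


rules = ll1_parse(["id", "+", "id"])
for lhs, rhs in rules:
    print(lhs, "->", " ".join(rhs) or "epsilon")
```

The rules are reported in the order of the leftmost derivation, which is why the same run yields both the stack trace and the derivation shown on the slide.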


LL(1) Derivations

L: Left-to-right scan of the input
L: Leftmost derivation
(1): look ahead 1 token at each step

Alternative characterization of LL(1) grammars: whenever A → α | β ∈ G,
  1. FIRST(α) ∩ FIRST(β) = { }, and
  2. if α ⇒* ε then FIRST(β) ∩ FOLLOW(A) = { }.

Corollary: No ambiguous grammar is LL(1).
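
The two conditions can be checked mechanically once FIRST and FOLLOW are known. The sketch below is one possible way to do it in Python, under assumptions that are not part of the slides: a grammar is a dict mapping each nonterminal to a list of right-hand sides (lists of symbols), the empty list is the ε production, and the helper names first_of, first_follow and is_ll1 are hypothetical. Condition 2 is applied symmetrically to both alternatives.

```python
EPS = ""   # marker for the empty string inside FIRST sets


def first_of(seq, first):
    """FIRST of a sequence of symbols, given FIRST sets of the nonterminals."""
    out = set()
    for sym in seq:
        f = first.get(sym, {sym})     # a terminal's FIRST set is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                      # every symbol in seq can derive epsilon
    return out


def first_follow(grammar, start, eof="$"):
    first = {a: set() for a in grammar}
    follow = {a: set() for a in grammar}
    follow[start].add(eof)
    changed = True
    while changed:                    # iterate until nothing grows
        changed = False
        for a, alts in grammar.items():
            for rhs in alts:
                new = first_of(rhs, first) - first[a]
                if new:
                    first[a] |= new
                    changed = True
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue
                    tail = first_of(rhs[i + 1:], first)
                    add = tail - {EPS}
                    if EPS in tail:
                        add |= follow[a]
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return first, follow


def is_ll1(grammar, start):
    first, follow = first_follow(grammar, start)
    for a, alts in grammar.items():
        for i, alpha in enumerate(alts):
            for beta in alts[i + 1:]:
                fa, fb = first_of(alpha, first), first_of(beta, first)
                if fa & fb:                            # condition 1
                    return False
                if EPS in fa and fb & follow[a]:       # condition 2
                    return False
                if EPS in fb and fa & follow[a]:       # condition 2, mirrored
                    return False
    return True


example2 = {"E": [["id", "E'"]], "E'": [["+", "id", "E'"], []]}
ambiguous = {"E": [["E", "+", "E"], ["id"]]}
print(is_ll1(example2, "E"))    # True
print(is_ll1(ambiguous, "E"))   # False
```

The second grammar, E → E + E | id, is ambiguous, and the check rejects it because both alternatives have id in their FIRST sets, which is consistent with the corollary.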


Other Parsing Algorithms

LR, LALR, SLR, ...
  Table-driven "bottom-up" parsers (they build parse trees from the leaves to the root).
  Parsing time is linear in the length of the input string.
  The set of LR(k) grammars includes all LL(k) grammars (but some grammars are not LR(k) for any k).
Operator precedence parsers
Chart parsers (used in Natural Language Processing)
Cocke-Kasami-Younger & Earley parsers: can handle arbitrary context-free grammars (but the parsers may take quadratic or cubic time in the length of the input).
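
To illustrate the last item, here is a minimal Cocke-Kasami-Younger recognizer in Python. It is a sketch under stated assumptions, not material from the slides: the grammar must already be in Chomsky normal form, it is passed as two hypothetical dicts (unary for rules A → a, binary for rules A → B C), and the example grammar is a normal-form version of the ambiguous grammar E → E + E | id, with helper nonterminals R and P introduced for the conversion.

```python
def cyk_recognize(tokens, unary, binary, start):
    """Return True if `start` derives `tokens`.

    unary:  {terminal: set of A such that A -> terminal}
    binary: {(B, C): set of A such that A -> B C}
    """
    n = len(tokens)
    if n == 0:
        return False              # this sketch does not handle the empty string
    # table[i][length] = nonterminals deriving tokens[i : i + length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][1] = set(unary.get(tok, ()))
    for length in range(2, n + 1):            # substring length
        for i in range(n - length + 1):       # substring start
            for split in range(1, length):    # where to cut the substring
                for b in table[i][split]:
                    for c in table[i + split][length - split]:
                        table[i][length] |= binary.get((b, c), set())
    return start in table[0][n]


# E -> E R | id,  R -> P E,  P -> +
unary = {"id": {"E"}, "+": {"P"}}
binary = {("E", "R"): {"E"}, ("P", "E"): {"R"}}

print(cyk_recognize(["id", "+", "id"], unary, binary, "E"))   # True
print(cyk_recognize(["id", "+"], unary, binary, "E"))         # False
```

The three nested loops over length, start and split point are the source of the cubic running time; in exchange, the recognizer accepts a grammar that is ambiguous and therefore not LL(1).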
