Academics &Innovations: 2013

PRINCIPLES OF COMPILER DESIGN

UNIT-I Lexical Analysis

What is a compiler? [AUT Nov. / Dec. 2011] [AU Nov. / Dec. 2009]

A Compiler is a program that reads a program written in one language – the source language and translates it into an equivalent program in another language- the target language. An important role of compiler is to report errors in the source program that it detects during the translation process.

What is an interpreter?

An interpreter is another kind of language processor. Instead of producing a target program as a translation, an interpreter appears to directly execute the operations specified in the source program on inputs supplied by the user.

What is a hybrid compiler? Give an example.

A language processors which combine compilation and interpretation is called as hybrid compiler. Eg. Java Language processors.

A java Source program may first compiled into an intermediate form called bytecodes. The bytecodes are then interpreted by a virtual machine. A benefit of this is that bytecodes compiled on one machine can be interpreted on another machines. To achieve faster processing, just-in-time compilers, translate the bytecodes into machine language immediately before they run the intermediate program to process the input.

What are the two parts of a compilation? Explain briefly.

Analysis and Synthesis are the two parts of compilation.

The analysis part is the front end of the compiler. It breaks up the source program into constituent pieces and creates an intermediate representation of the source program.It collects information about source program and stores it ina data structure called a symbol table.

The synthesis part is the back end of the compiler. It constructs the desired target program from the intermediate representation and the information in the symbol table.

List the phases of the compiler

Lexical Analyzer
Syntactic Analyzer
Semantic Analyzer
Intermediate Code Generation
Machine-Independent Code Optimization
Code Generation
Machine-Dependent Code Optimization

List the phases that constitute the front end of a compiler.

The front end consists of those phases or parts of phases those depends primarily on the source language and are largely independent of the target machine. These include

· Lexical and Syntactic analysis

· The creation of symbol table

· Semantic analysis

· Generation of intermediate code

A certain amount of code optimization can be done by the front end as well. Also includes error handling that goes along with each of these phases.

Mention the back-end phases of a compiler.

The back end of compiler includes those portions that depend on the target machine and generally those portions do not depend on the source language, just the intermediate language. These include

· Code optimization

· Code generation, along with error handling and symbol- table operations.

List the subparts or phases of analysis part.

Analysis consists of three phases:

· Linear Analysis.

· Hierarchical Analysis.

· Semantic Analysis.

What are the classifications of a compiler?

Compilers are classified as:

· Single- pass

· Multi-pass

· Load-and-go

· Debugging or optimizing

What is a symbol table? [AU Nov/Dec 2007]

A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly. Whenever an identifier is detected by a lexical analyzer, it is entered into the symbol table. The attributes of an identifier cannot be determined by the lexical analyzer.

What is lexical analysis?

Linear analysis is one in which the stream of characters making up the source program is read from left to right and grouped into tokens that are sequences of characters having a collective meaning called lexemes. For each lexemes, lexical analyser produces a token of form <token-name, attribute-value>. The lexical analysis is also called as linear analysis or scanning.

What is syntax analysis?

It is the second phase of compiler. It is also called as Parsing. The parser uses the first components of tokens produced by the lexical analyzer to crteate a tree-like intermediate representation(i.e) a grammatical structure of a token stream. A syntax tree in which each interior node represents an operation and the children of the node represents the arguments of the operation.

What is the use of Syntax directed translation engines?

Syntax directed translation engines produce collections of routines that walk the parse tree, and generating the intermediate code.

What is the function of the semantic analyzer? [AU Nov / Dec 2011]

A semantic analyzer checks the source program for semantic errors and collects the type information for the code generation. Type-checking is an important part of semantic analyzer

Ex: newval := oldval + 12

The type of the identifier newval must match with type of the expression (oldval+12)

What are the functions of Intermediate code generator?

A compiler may produce an explicit intermediate codes representing the source program. These intermediate codes are generally machining (architecture independent). But the level of intermediate codes is close to the level of machine codes.

Ex:

newval := oldval * fact + 1

id1 := id2 * id3 + 1

MULT id2,id3,temp1 Intermediates Codes (Quadraples)

ADD temp1,#1,temp2

MOV temp2,,id1

What is the function of the code generator?

Produces the target language in a specific architecture.The target program is normally is a re locatable object file containing the machine codes.

Ex:( assume that we have an architecture with instructions whose at least one of its operands is a machine register)

MOVE id2,R1

MULT id3,R1

ADD #1,R1

MOVE R1,id1

What is the function of the code optimizer?

The code optimizer optimizes the code produced by the intermediate code generator in the terms of time and space.

Ex:

MULT id2,id3,temp1

ADD temp1,#1,id1

Mention some of the cousins of a compiler. [AU Apr/May 2004, 2005][AU Nov/Dec 2013]

Cousins of the compiler are:

§ Preprocessors

§ Two Pass Assembly

§ Assemblers

§ Loaders and Link-Editors

What is the function of pretty printers?

A Pretty printer analyzes a program and prints it in such a way that the structure of the program is clearly visible.

How the Grouping of Phases into Passes is organized?

In an implementation, activities from several phases may be grouped together into a pass that reads an input file and writes an output file. (Ex.) The front-end phases of lexical,syntax,semantic and intermediate code generation might be grouped together into one pass. Code optimization might be optional pass. Then the back-end pass consisting of code generation for a particular target machine.

What are the various compiler construction Tools available? [AU Nov/Dec 2004]

1. Parser generators

2. Scanner generators

3. Syntax – directed translation engine

4. Code-generator generators

5. Data Flow analysis engine

6. Compiler-construction toolkits

Define compiler-compiler. [AUT Nov/Dec 2010]

Systems to help with the compiler-writing process are often been referred to as compiler-compilers, compiler-generators or translator-writing systems.

Largely they are oriented around a particular model of languages, and they are suitable for generating compilers of languages similar model.

What are the functions of preprocessor? [AU Nov/Dec 2007] [AU Nov/Dec 2004] [AUT Nov / Dec 2011]

· Macro processing

· File inclusion

· “Rational” preprocessors

· Language Extensions

Define preprocessor. [AU May/Jun 2007]

Preprocessors produce input to the compilers. It performs the following functions.

· Macro processing

· File inclusion

· “Rational” preprocessors

· Language Extensions

Differentiate analysis and synthesis. [AUT Nov/Dec 2010]

Analysis:

It is one Part of compilation.

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

It includes Linear Analysis, Syntax Analysis and Semantic Analysis.

Synthesis:

It is another part of compilation.

The synthesis part constructs the desired target program from the intermediate representation.

It includes intermediate code generation, code generation and code optimization.

Represent the following expression into lexical analysis phase and semantic analysis phase: Average = (mark1 + mark2) * 1.5 / 2. [AUT Nov/Dec 2010]

Compare compiler and interpreter. [AU Nov. / Dec. 2012] [AU May / June 2013]

A Compiler is a program that translates code of a programming language in machine code, also called object code. The object code can be executed directly on the machine where it was compiled.

An Interpreter reads the statements of a program, analyzes them and then executes them on the virtual machine by calling the corresponding instructions of the library.

What is meant by a cross – compiler? [AU Nov. / Dec. 2012] [AU May / June 2013]

A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is running. Cross compiler tools are used to generate executables for embedded system or multiple platforms.

What is the role of lexical analyzer?[ AU Nov/Dec 2013]

The main task is to read the input characters of the source program, group them into lexemes, and produce as output a sequence of tokens for each lexeme in the source program. The atream of tokens is sent to the parser for syntax analysis. It is common for lexical analyzer to interact with the symbol table as well.

Differentiate tokens, patterns, lexeme.[AU Nov / Dec 2010] [AU May / June 2013]

Ø Tokens- Sequence of characters that have a collective meaning.

Ø Patterns- There is a set of strings in the input for which the same token is produced as output. This set of strings is described by a rule called a pattern associated with the token

Ø Lexeme- A sequence of characters in the source program that is matched by the pattern for a token.Ex. const a = 4.98; here a is a lexeme for the token “identifier”.

List the operations on languages.

§ Union - L U M ={s | s is in L or s is in M}

§ Concatenation – LM ={st | s is in L and t is in M}

§ Kleen Closure – L* (zero or more concatenations of L)

§ Positive Closure – L+ ( one or more concatenations of L)

Write a regular expression for an identifier.

An identifier is defined as a letter followed by zero or more letters or digits. The regular expression for an identifier is given as

letter (letter | digit)*

Mention the various notational shorthands for representing regular expressions.

i. One or more instances (+)

ii. Zero or one instance (?)

iii. Character classes ([abc] where a,b,c are alphabet symbols denotes the regular expressions a | b | c.)

iv. Non regular sets

What is the function of a hierarchical analysis?

Hierarchical analysis is one in which the tokens are grouped hierarchically into nested collections with collective meaning. Also termed as Parsing.

What does a semantic analysis do? [AU Nov / Dec 2011]

Semantic analysis is one in which certain checks are performed to ensure that components of a program fit together meaningfully.

Mainly performs type checking.

List the various error recovery strategies for a lexical analysis. [AU Apr/May 2005]

Possible error recovery actions are:

Ø Panic mode recovery

Ø Deleting an extraneous character

Ø Inserting a missing character

Ø Replacing an incorrect character by a correct character

Ø Transposing two adjacent characters

What are the components of Context Free Grammar?

A finite set of terminals (in our case, this will be the set of tokens)

Ø A finite set of non-terminals (syntactic-variables)

Ø A finite set of productions rules in the following form

· A ® a where A is a non-terminal and

a is a string of terminals and non-terminals (including the empty string)

A start symbol (one of the non-terminal symbol)

Example:

E ® E + E | E – E | E * E | E / E | - E

E ® ( E )

E ® id

What is the function of the syntactic analyzer?

A Syntax Analyzer creates the syntactic structure (generally a parse tree) of the given program.
A syntax analyzer is also called as a parser.
A parse tree describes a syntactic structure.

A Parse tree pictorially shows the start symbol of a grammar derives a string in the language.

Ex. AàXYZ

Define input buffering. [AUT Nov/Dec 2010]

It stores the input and is divided into two N-character halves. N – number of character on one disk block. Ex. 1024 or 4096. It is used to reduce the amount of overhead required to process an input character.

Write the possible error recovery techniques.

1. Deleting an extraneous character

2. Inserting a missing character

3. Replacing an incorrect character by a correct character

4. Transposing two adjacent character

What is a Sentinel? [AU Nov / Dec 2010]

The Sentinel is a special character that cannot be part of the source program. Ex. eof. It is used to perform the test to identify the end of the source file.

How will you specify the patterns?

Regular expressions are used to describe tokens. Regular expressions are an important notation for specifying patterns .Each pattern matches a set of strings, so regular expressions will serve as names for set of strings.

What is a Language? What are the operations that cane be applied on it?

Language denotes any set of strings over some fixed alphabet. A language denoted by a regular expression is said to be a regular set. The possible operations on languages are Union, Concatenation, Kleene Closure, Positive Closure.

What is a regular definition?

If å is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form
d₁à r₁

d₂à r₂

d₃à r₃

…..

d_nà r_n

where each d_i is a distinct name, and each r_i is a regular expression over the symbols in å U{d_1, d2, d3,……. Di-1}.

Which constructs of a program should be recognized by the lexical analyzer, and which ones by the syntax analyzer?

1. Both of them do similar things; But the lexical analyzer deals with simple non-recursive constructs of the language.

2. The syntax analyzer deals with recursive constructs of the language.

3. The lexical analyzer simplifies the job of the syntax analyzer.

4. The lexical analyzer recognizes the smallest meaningful units (tokens) in a source program.

5. The syntax analyzer works on the smallest meaningful units (tokens) in a source program to recognize meaningful structures in our programming language.

What are the issues in Lexical Analysis? [AU May/Jun 2007, 2009, 2013]

The separation of lexical analysis from syntax analysis often allows us to simplify one or the other of these phases.
Compiler efficiency is improved
Compiler portability is enhanced

What is a DFA?

A DFA represents a finite state machine that recognizes a RE. For example, the following DFA:

recognizes (abc⁺)⁺.

Differentiate between NFA and DFA. [AU Nov / Dec 2012]

NFA: A transition function “move” that maps state-symbol pairs to set of states. It has ε- transition. For each state S and input symbol ‘a’, there is more than one edge labeled ‘a’ leaving S.

DFA: It has no ε- transition. For each state S and input symbol ‘a’, there is atmost one edge labeled ‘a’ leaving S.

Write a regular expression for the set of all strings of 0’s and 1’s with an

even number of 0’s and an odd number of 1’s. [AU Nov / Dec 2012]

(1*01*01*)* (1*10*10*)*.

What is LEX? [AU May / June 2013]

Lex is a computer program that generates lexical analyzers ("scanners" or "lexers"). Lex reads an input stream specifying the lexical analyzer and outputs source code implementing the lexer in the C programming language.

52. Write the difference between regular grammar and context free grammar. [AU May / June 2013]

In formal language theory, a context-free grammar (CFG) is a formal grammar in which every production rule is of the form

V -> w

where V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals (w can be empty). A formal grammar is considered "context free" when its production rules can be applied regardless of the context of a nonterminal. It does not matter which symbols the nonterminal is surrounded by, the single nonterminal on the left hand side can always be replaced by the right hand side. The languages generated by context-free grammars are known as the context-free languages.

A regular grammar is a formal grammar (N, Σ, P, S) such that all the production rules in P are of one of the following forms:

· B -> a - where B is a non-terminal in N and a is a terminal in Σ

· B ->aC - where B and C are in N and a is in Σ

· B -> ε - where B is in N and ε denotes the empty string, i.e. the string of length 0.

Write regular expression for language to recognize ‘identifier’. [AU May / June 2013]

The regular definition for recognizing identifiers in C programming language is given by

letter -> a|b|...z|A|B|...|Z|

digit -> 0|1|...|9

identifier -> letter(letter|digit)*

This definition will generate identifiers of the form identifier:

[_a-zA-Z][_a-zA-Z0-9]*

16 Mark Questions

1. Discuss in detail the various phases of a compiler for translating a source statement to object code.(or) What are the Six phases of the compilers? Explain with an example.

2. What are compiler construction tools? Describe about their features.(or) Why do we need compiler construction tools? Discuss in detail about some compiler construction tools.

3. Explain the needs of grouping of phases of compiler.

4.Write in detail about the cousins of the compiler.

5. Draw the DFA for the regular expression (a/b)*abb and find minimized DFA.

6. (a)Write in detail about tool for generating a Lexical Analyser (b) Give a lexical analyser specification for a model computer.

7. Depict the language processing system and describe about the cousins of the compiler.

8. Discuss in detail the efficiency issues concerned with the buffering of input during lexical analysis. (or) Discuss about the input buffering schemes in lexical analysis.

9. Construct a nondeterministic finite automata using Thompson’s construction for the regular expression (a/b)*abb (a/b)*

10. Write the role of lexical analyzer. (or) Explain in detail about the role of Lexical analyzer with the possible error recovery actions.

11. What is the need for lexical analysis? Discuss the issues in lexical analysis.

12. Discuss about the errors encountered during the lexical analysis phase.

13. Discuss about the implementation of lexical analyzer.

14. Write the significance of symbol table during the lexical analysis phase.

15. Construct a minimum state DFA for the regular expression (a/b)*abb(a/b).

16. Prove that the given two regular expressions are equivalent by showing that their minimum state DFA’s are the same (a*/b*) and ((ε/a) b*)*.

UNIT II Syntax Analysis and Run-time Environments

Define parser or State the role of parser. [AUT Nov/Dec 2010, Nov/Dec 2013]

Hierarchical analysis is one in which the tokens are grouped hierarchically into nested collections with collective meaning. Also termed as Parsing.

It gets the string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the source language.

Mention the basic issues in parsing.

There are two important issues in parsing.

a. Specification of syntax

b. Representation of input after parsing.

Why lexical and syntax analyzers are separated out?

Reasons for separating the analysis phase into lexical and syntax analyzers:

Simpler design.
Compiler efficiency is improved.
Compiler portability is enhanced.

Define a context free grammar. [AU Nov / Dec 2009, Nov/Dec 2013]

A context free grammar G is a collection of the following

1. V is a set of non terminals

2. T is a set of terminals

3. S is a start symbol

4. P is a set of production rules

5. G can be represented as G = (V,T,S,P)

6. Production rules are given in the following form

7. Non terminal → (V U T)*

Briefly explain the concept of derivation.

Derivation from S means generation of string w from S. For constructing derivation two things are important.

ii. Choice of non terminal from several others.

iii. Choice of rule from production rules for corresponding non terminal. Instead of choosing the arbitrary non terminal one can choose

iv. either leftmost derivation – leftmost non terminal in a sentinel form

v. or rightmost derivation – rightmost non terminal in a sentinel form

Define ambiguous grammar. [AU Nov/Dec 2007]

A grammar G is said to be ambiguous if it generates more than one parse tree for some sentence of language L(G).i.e. both leftmost and rightmost derivations are same for the given sentence.

What is an operator precedence parser?

A grammar is said to be operator precedence if it possess the following properties:

No production on the right side is ε.
There should not be any production rule possessing two adjacent non terminals at the right hand side.

List the properties of LR parser.

a. LR parsers can be constructed to recognize most of the programming languages for which the context free grammar can be written.

b. The class of grammar that can be parsed by LR parser is a superset of class of grammars that can be parsed using predictive parsers.

c. LR parsers work using non backtracking shift reduce technique yet it is efficient one.

Mention the types of LR parser.

Ø SLR parser- simple LR parser

Ø LALR parser- lookahead LR parser

Ø Canonical LR parser

What are the problems with top down parsing? [AU May/Jun 2009]

The following are the problems associated with top down parsing:

1. Backtracking

2. Left recursion

3. Left factoring

4. Ambiguity

Write the algorithm for FIRST and FOLLOW.

FIRST

1. If X is terminal, then FIRST(X) IS {X}.

2. If X → ε is a production, then add ε to FIRST(X).

3. If X is non terminal and X → Y1,Y2..Yk is a production, then place a in FIRST(X) if for some i , a is in FIRST(Yi) , and ε is in all of FIRST(Y1),…FIRST(Yi-1);

FOLLOW

· Place $ in FOLLOW(S),where S is the start symbol and $ is the input right endmarker.

· If there is a production A → αBβ, then everything in FIRST(β) except for ε is placed in FOLLOW(B).

· If there is a production A → αB, or a production A→ αBβ where FIRST(β) contains ε , then everything in FOLLOW(A) is in FOLLOW(B).

List the advantages and disadvantages of operator precedence parsing.[AU May/Jun 2007]

Advantages

This type of parsing is simple to implement.

Disadvantages

· The operator like minus has two different precedence(unary and binary).Hence it is hard to handle tokens like minus sign.

· This kind of parsing is applicable to only small class of grammars.

What is dangling else problem?

Ambiguity can be eliminated by means of dangling-else grammar which is show below:

Stmt | if expr then stmt

| if expr then stmt else stmt

| other

Write short notes on YACC.

YACC is an automatic tool for generating the parser program. YACC stands for Yet Another Compiler Compiler which is basically the utility available from UNIX. Basically YACC is LALR parser generator. It can report conflict or ambiguities in the form of error messages.

What is meant by handle pruning?

A rightmost derivation in reverse can be obtained by handle pruning.

If w is a sentence of the grammar at hand, then w = γn, where γn is the nth right-sentential form of some as yet unknown rightmost derivation

S = γ0 => γ1…=> γn-1 => γn = w

What is meant by viable prefixes?

The set of prefixes of right sentential forms that can appear on the stack of a shift-reduce parser are called viable prefixes. An equivalent definition of a viable prefix is that it is a prefix of a right sentential form that does not continue past the right end of the rightmost handle of that sentential form.

Define handle. [AU Nov /Dec 2012]

A handle of a string is a substring that matches the right side of a production, and whose reduction to the nonterminal on the left side of the production represents one step along the reverse of a rightmost derivation. A handle of a right – sentential form γ is a production A->β and a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ. That is , if S =>αAw =>αβw, then A->β in the position following α is a handle of αβw.

What are kernel & non-kernel items?

Kernel items, whish include the initial item, S'→ .S, and all items whose dots are not at the left end.

Non-kernel items, which have their dots at the left end.

What is phrase level error recovery?

Phrase level error recovery is implemented by filling in the blank entries in the predictive parsing table with pointers to error routines. These routines may change, insert, or delete symbols on the input and issue appropriate error messages. They may also pop from the stack.

What is meant by derivation?

A sequence of replacements of non-terminal symbols is called a derivation of id+id from E.

In general a derivation step is

aAb Þ bga if there is a production rule Ag® in our grammar

where a and b are arbitrary strings of terminal and non-terminal symbols

a1 Þ a2 Þ ... Þ an (an derives from a1 or a1 derives an )

Þ : derives in one step

Þ : derives in zero or more steps

Þ : derives in one or more steps

State the difference between sentential and sentence of a context free grammar G.

S Þ a - If a contains non-terminals, it is called as a sentential form of G.

If a does not contain non-terminals, it is called as a sentence of G.

What is L(G)?

Ø L(G) is the language of G (the language generated by G) which is a set of sentences.

Ø A sentence of L(G) is a string of terminal symbols of G.

Ø If S is the start symbol of G then

Ø w is a sentence of L(G) iff S Þ w where w is a string of terminals of G.

Ø If G is a context-free grammar, L(G) is a context-free language.

What is the difference between left most and right most derivation?

Ø If we always choose the left-most non-terminal in each derivation step, this derivation is called as left-most derivation.

Ø E Þ -E Þ -(E) Þ -(E+E) Þ -(id+E) Þ -(id+id)

Ø If we always choose the right-most non-terminal in each derivation step, this derivation is called as right-most derivation.

E Þ -E Þ -(E) Þ -(E+E) Þ -(E+id) Þ -(id+id)

When the Grammar is called as an ambiguous grammar?

A grammar produces more than one parse tree for a sentence is called as an ambiguous grammar.

What is a Left recursive grammar?

A grammar is left recursive if it has a non-terminal A such that there is a derivation. A Þ Aa for some string a the left-recursion may appear in a single step of the derivation (immediate left- recursion), or may appear in more than one step of the derivation.

How will you eliminate the Left Recursion in Grammars?

A ® A a | b where b does not start with A

ß eliminate immediate left recursion

A ® b A’

A’ ® a A’ | e an equivalent grammar

In general,

A ® A a1 | ... | A am | b1 | ... | bn where b1 ... bn do not start with A

ß eliminate immediate left recursion

A ® b1 A’ | ... | bn A’

A’ ® a1 A’ | ... | am A’ | e an equivalent grammar

Give an Example for an Immediate Left Recursion and how it will be eliminated?

Example :1

E ® E+T | T

T ® T*F | F

F ® id | (E)

eliminate immediate left recursion

E ® T E’

E’ ® +T E’ | e

T ® F T’

T’ ® *F T’ | e

F ® id | (E)

Example 2:

S ® Aa | b

A ® Ac | Sd | f

- Order of non-terminals: S, A

for S:

- we do not enter the inner loop.

- there is no immediate left recursion in S.

for A:

- Replace A ® Sd with A ® Aad | bd

So, we will have A ® Ac | Aad | bd | f

- Eliminate the immediate left-recursion in A

A ® bdA’ | fA’

A’ ® cA’ | adA’ | e

So, the resulting equivalent grammar which is not left-recursive is:

S ® Aa | b

A ® bdA’ | fA’

A’ ® cA’ | adA’ | e

Why we need Left Factoring? [AU Nov / Dec 2011]

A predictive parser (a top-down parser without backtracking) insists that the grammar must be left-factored.

grammar è a new equivalent grammar suitable for predictive parsing

stmt ® if expr then stmt else stmt |

if expr then stmt

when we see if, we cannot now which production rule to choose to re-write stmt in the derivation.

Give an Example for left Factoring?

A ® abB | aB | cdg | cdeB | cdfB

A ® aA’ | cdg | cdeB | cdfB

A’ ® bB | B

A ® aA’ | cdA’’

A’ ® bB | B

A’’ ® g | eB | fB

Write an algorithm that will do the Left Factoring.

For each non-terminal A with two or more alternatives (production rules) with a common non-empty prefix, let say

A ® ba1 | ... | ban | g1 | ... | gm

convert it into

A ® aA’ | g1 | ... | gm

A’ ® b1 | ... | bn

List out the important features of Predictive Parsing? [AU Nov/Dec 2007]

Predictive Parsing

• no backtracking

• efficient

• needs a special form of grammars (LL(1) grammars).

• Recursive Predictive Parsing is a special form of Recursive Descent parsing without backtracking.

Non-Recursive (Table Driven) Predictive Parser is also known as LL(1) parser.

Give a simple example for Recursive-Descent Parsing (uses Backtracking)

Backtracking is needed.

It tries to find the left-most derivation

S ® aBc

B ® bc | b

S S

input: abc

a B c a B c

b c b

What is a Non-Recursive Predictive Parsing -- LL(1) Parser?

Non-Recursive predictive parsing is a table-driven parser.

It is a top-down parser.

It is also known as LL(1) Parser.

What are the components of the Parsing Table?

parsing table

– a two-dimensional array M[A,a]

– each row is a non-terminal symbol

– each column is a terminal symbol or the special symbol $

– each entry holds a production rule

What is a FIRST (a)?

FIRST (a) is a set of the terminal symbols which occur as first symbols in strings derived from a where a is any string of grammar symbols.

if a derives to e, then e is also in FIRST(a) .

What is a FOLLOW (A)?

FOLLOW (A) is the set of the terminals which occur immediately after (follow) the non-terminal A in the strings derived from the starting symbol.

– a terminal a is in FOLLOW(A) if S Þ aAab

$ is in FOLLOW (A) if S Þ aA

How will you compute FIRST for Any String X ? [AU Nov / Dec 2011]

If X is a terminal symbol è FIRST(X)={X}

If X is a non-terminal symbol and X ® e is a production rule

è e is in FIRST(X).

If X is a non-terminal symbol and X ® Y1Y2..Yn is a production rule

è if a terminal a in FIRST(Yi) and e is in all FIRST(Yj) for j=1,...,i-1 then a is in FIRST(X).

è if e is in all FIRST(Yj) for j=1,...,n then e is in FIRST(X).

If X is e

è FIRST(X)={e}

If X is Y1Y2..Yn

è if a terminal a in FIRST(Yi) and e is in all FIRST(Yj) for j=1,...,i-1 then a is in FIRST(X).

è if e is in all FIRST(Yj) for j=1,...,n then e is in FIRST(X).

Write an Example to calculate the FIRST(X).

E ® TE’

E’ ® +TE’ | e

T ® FT’

T’ ® *FT’ | e

F ® (E) | id

FIRST(F) = {(,id} FIRST(TE’) = {(,id}

FIRST(T’) = {*, e} FIRST(+TE’ ) = {+}

FIRST(T) = {(,id} FIRST(e) = {e}

FIRST(E’) = {+, e} FIRST(FT’) = {(,id}

FIRST(E) = {(,id} FIRST(*FT’) = {*}

FIRST(e) = {e}

FIRST((E)) = {(}

FIRST(id) = {id}

How will you Compute FOLLOW for non-terminals ?

If S is the start symbol è $ is in FOLLOW(S)

if A ® aBb is a production rule

èeverything in FIRST(b) is FOLLOW(B) except e

f ( A ® aB is a production rule ) or ( A ® aBb is a production rule and e is in FIRST(b) ) è everything in FOLLOW(A) is in FOLLOW(B).

We apply these rules until nothing more can be added to any follow set

Give an Example to compute the FOLLOW?

E ® TE’

E’ ® +TE’ | e

T ® FT’

T’ ® *FT’ | e

F ® (E) | id

FOLLOW(E) = { $, ) }

FOLLOW(E’) = { $, ) }

FOLLOW(T) = { +, ), $ }

FOLLOW(T’) = { +, ), $ }

FOLLOW(F) = {+, *, ), $ }

What do we have to do it if the resulting parsing table contains multiply defined entries?

If we didn’t eliminate left recursion, eliminate the left recursion in the grammar.

If the grammar is not left factored, we have to left factor the grammar.

If its (new grammar’s) parsing table still contains multiply defined entries, that grammar is ambiguous or it is inherently not a LL(1) grammar.

What are the properties of LL(1) grammar?

A grammar G is LL(1) if and only if the following conditions hold for two distinctive production rules A ® a and A ® b

Both a and b cannot derive strings starting with same terminals.
At most one of a and b can derive to e.
If b can derive to e, then a cannot derive to any string starting with a terminal in FOLLOW(A).

List out the possible error may occur in the predictive parsing (LL(1) parsing)

– if the terminal symbol on the top of stack does not match with the current input symbol.

– if the top of stack is a non-terminal A, the current input symbol is a, and the parsing table entry M[A,a] is empty.

What should the parser do in an error case?

– The parser should be able to give an error message (as much as possible meaningful error message).

– It should be recovered from that error case, and it should be able to continue the parsing with the rest of the input.

List the various error recovery techniques?

Ø Panic-Mode Error Recovery

Ø Phrase-Level Error Recovery

Ø Error-Productions

Ø Global-Correction

What will be performed in Panic-Mode and Phrase-Level Error Recovery?

Panic-Mode Error Recovery - Skipping the input symbols until a synchronizing token is found.

Phrase-Level Error Recovery - Each empty entry in the parsing table is filled with a pointer to a specific error routine to take care that error case.

What is an Error Production?

Error-Productions

– If we have a good idea of the common errors that might be encountered, we can augment the grammar with productions that generate erroneous constructs.

– When an error production is used by the parser, we can generate appropriate error diagnostics.

– Since it is almost impossible to know all the errors that can be made by the programmers, this method is not practical.

How panic-mode error recovery is performed in LL(1) parsing?

– All the empty entries are marked as synch to indicate that the parser will skip all the input symbols until a symbol in the follow set of the non-terminal A which on the top of the stack. Then the parser will pop that non-terminal A from the stack. The parsing continues from that state.

– To handle unmatched terminal symbols, the parser pops that unmatched terminal symbol from the stack and it issues an error message saying that that unmatched terminal is inserted.

What is a Shift Reduce parsing?

Bottom-up parsing is also known as shift-reduce parsing because its two main actions are shift and reduce.

– At each shift action, the current symbol in the input string is pushed to a stack.

– At each reduction step, the symbols at the top of the stack (this symbol sequence is the right side of a production) will replaced by the non-terminal at the left side of that production.

– There are also two more actions: accept and error.

What are the Conflicts encountered During Shift-Reduce Parsing ?

Stack contents and the next input symbol may not decide action:

– shift/reduce conflict: Whether make a shift operation or a reduction.

– reduce/reduce conflict: The parser cannot decide which of several reductions to make.

What is a non LR(K) grammar?

If a shift-reduce parser cannot be used for a grammar, that grammar is called as non-LR(k) grammar.

L: left to right R: right-most k: lookhead scanning derivation

An ambiguous grammar can never be a LR grammar.

What are the types of Parser?

Predictive Parser
Shift Reduce Parser
Operator-Precedence Parser-simple, but only a small class of grammars.
LR-Parsers

– covers wide range of grammars.

• SLR – simple LR parser

• LR – most general LR parser

• LALR – intermediate LR parser (lookhead LR parser)

SLR, LR and LALR work same, only their parsing tables are different.

What is an operator grammar? Give an Example.

Ø In an operator grammar, no production rule can have:

– e at the right side

– two adjacent non-terminals at the right side.

Ex:

E®AB E®EOE E®E+E |

A®a E®id E*E |

B®b O®+|*|/ E/E | id

not operator grammar not operator grammar operator grammar

Waht are the precedence relations between the pair of terminals in operator-precedence parsing?

In operator-precedence parsing, we define three disjoint precedence relations between certain pairs of terminals.

a <. b b has higher precedence than a

a =· b b has same precedence as a

a .> b b has lower precedence than a

What is the the canonical LR(0) collection?

– Sets of LR(0) items will be the states of action and goto table of the SLR parser.

– A collection of sets of LR(0) items (the canonical LR(0) collection) is the basis for constructing SLR parsers.

What is an LR(0) item? Give an Example.

An LR(0) item of a grammar G is a production of G a dot at the some position of the right side.

Ex: A ® aBb Possible LR(0) Items: A ® .aBb

(four different possibility) A ® a.Bb

A ® aB.b

A ® aBb.

What is an Augmented Grammar?

Augmented Grammar:

G’ is G with a new production rule S’®S where S’ is the new starting symbol.

How will you find the Closure (I)?

If I is a set of LR(0) items for a grammar G, then closure(I) is the set of LR(0) items constructed from I by the two rules:

Initially, every LR(0) item in I is added to closure(I).
If A ® a.Bb is in closure(I) and Bg® is a production rule of G; then B®.g will be in the closure(I). We will apply this rule until no more new LR(0) items can be added to closure(I).

What is a SLR(1)Parser?

– An LR parser using SLR(1) parsing tables for a grammar G is called as the SLR(1) parser for G.

– If a grammar G has an SLR(1) parsing table, it is called SLR(1) grammar (or SLR grammar in short).

– Every SLR grammar is unambiguous, but every unambiguous grammar is not a SLR grammar.

What are the conflicts will encounter in SLR Parser?

– If a state does not know whether it will make a shift operation or reduction for a terminal, we say that there is a shift/reduce conflict.

– If a state does not know whether it will make a reduction operation using the production rule i or j for a terminal, we say that there is a reduce/reduce conflict.

– If the SLR parsing table of a grammar G has a conflict, we say that that grammar is not SLR grammar.

–

Eliminate the Left Recursion from the following Grammars? [AU May/Jun 2007] [AU May / June 2013]

AàAc / Aad / bd / c (or) AàAc / Aad / bd / ε

Solution:

A à bdA’ / A’

A’ à cA’ / adA’ / c

What is predictive parser? [AU Nov/Dec 2007]

A predictive parser is an efficient way of implementing recursive descent parsing [Top down parsing without backtracking] by handling the stack of activation records. It has an input, a stack, a parsing table and an output.

Is the grammar S à SS | (S) | ( ) ambiguous. [AUT Nov/Dec 2010]

No. There is no two different parse tree for the same string.

Determine the FIRST(S) for the following grammar. [ AUT Nov/Dec 2010]

S à ScB | B

B à e | efg | efCg

Cà SdC | S

Removing left recursion: S à BS’

S’à cBS’ | ε

FIRST(S) = FIRST(B) = { e }.

Translate a*-(b+c) into postfix code. [ AUT Nov/Dec 2010]

ANS: abc+-*

Is the following grammar G LL(1)? E à A | B Aàa|c Bàb|c [AUT

Nov/Dec 2010]

There is no multiple entries in the parsing table. So it is not LL(1).

Derive FIRST and FOLLOW for the following grammar. [AUT

Nov/Dec 2010]

S à 0 | 1 | AS0 | BS0 Aà ε Bà ε

FIRST(S) = {0, 1, ε } FOLLOW(S) = FOLLOW(A) = FOLLOW(B) = {0, $}

FIRST(A) = { ε }

FIRST(B) = { ε }

What is an annotated parse tree? [AU May / June 2013]

A parse tree showing the attribute values at each node is called an annotated parse tree.

Write the difference between Top down parsing and Bottom up parsing. [AU May / June 2013]

Top-down parsers: starts constructing the parse tree at the top (root) of the parse tree and move down towards the leaves. Easy to implement by hand, but work with restricted grammars.

examples:

- Predictive parsers (e.g., LL(k))

Bottom-up parsers: build the nodes on the bottom of the parse tree first. Suitable for automatic parser generation, handle a larger class of grammars.

examples:

– shift-reduce parser (or LR(k) parsers)

Both are general techniques that can be made to work for all languages (but not all grammars!)

Define handle and handle pruning process. [AU Nov / Dec 2012]

Handle: A ‘handle’ of a string is a substring that matches the right side of a production and whose reduction to the non-terminal on the left side of the production represents one step along the reverse of a rightmost derivation.

Handle Pruning: Reducing β to A in αβω is called handle pruning,i.e., removing the children of A from the parse tree. A rightmost derivation in reverse can be obtained by handle pruning.

Write the LR parsing algorithm. [AU Nov / Dec 2012]

LR parser consists of an input, an output, a stack, a driver program and a parsing table that has two functions:

Ø action

Ø goto

The driver program is same for all LR parsers. Only the parsing table changes from one parser to another.

The parsing program reads character from an input buffer one at time. The program uses a stack to store a string of the form S₀X₁S₁X₂S₂…….X_mS_m, where S_m is on top. Each S_i is a symbol called state and each X_i is a grammar symbol.

What is Run-time organization?

To implement the abstractions embodied in the source language, a compiler creates and manages a run-time environment in concert with the operating system and the target machine. It has static area for the object code and the static data objects created at compile time. It also has dynamic stack and heap areas for managing objects created and destroyed as the target program executes.

What is Control stack?

Procedure calls and returns are usually managed by a run-time stack called control stack. We can use a stack because procedure calls or activation nest in time (i.e.,) if p calls q, then this activation of q is nested within this activation of p.

What is Stack allocation?

Storage for local variables can allocated on a run-time stack for languages that allow or require local variables to become inaccessible when their procedures end. For such languages, each live activation has an activation record on the control stack, with the root of activation tree at the bottom, and entire sequence of activation record on the stack corresponding to the path in activation tree to activation where control currently resides. The later activation has record at the top of the stack.

16 Mark Questions

1. Determine the FIRST and FOLLOW for the Grammar G with productions

S->0/1/AS0/BS0 A->ε B->ε. Construct a LL(1) parsing table and prove or disprove if the grammar is LL(1)

2. Explain how error recovery is done in predictive parsing with suitable example.

3. Construct Predictive Parsing table for the following grammar S->(L) / a L->L,S/S

Check whether the following sentences belong to the above grammar or not.

i.) (a,a) ii.) (a,(a,a))

4. Construct a CLR parser for the grammar: S->CC, C->cC / d

5. Give the canonical collection of sets of LR(0) items for the grammar given below:

E’ ->E

E -> E + T / T

T-> T * F / F

F-> (E) / id

6. Perform LR parsing and derive the input a(a,a(a,a))) using the below given grammar:

S’->S

S->a(L)

S->a

L->S,L

L->S

7. Explain the working of top down parsing and bottom up parsing?

8. Describe the error detection and recovery process involved in the lexical phase.

9. Write an algorithm for generating LR item sets and constructing SLR Parsing table.

10. Write about LALR parser.

11. What are the different strategies that a parser can employ to recover from syntax errors?

12. Construct the CLR parsing table form

S->AA

A->Aa/b

13. Write Operator-precedence parsing algorithm.

14. Compare LR, SLR, Canonical LR, LALR parser with their advantages.

15. Design a predictive parsing table for the grammar

E-> E+T/T

T-> T*F/F

F->(E)/id

and show the moves for input string id*id+id.

16. Design an operator precedence parser for the grammar

E-> E+T/T

T-> T*F/F

F->(E)/id

and draw the precedence function graph.

17. How the Storage Organization is performed in Run-Time Environment?

18.What are the available Storage allocation strategies available?

UNIT III Intermediate Code Generation

What are the different intermediate languages available? [AU Nov/Dec 2007][AU May / June 2013, Nov/Dec 2013]

Syntax trees can be used as an intermediate language.
Postfix notation can be used as an intermediate language.
Three-address code (Quadraples) can be used as an intermediate language

we will use quadraples to discuss intermediate code generation
quadraples are close to machine instructions, but they are not actual machine instructions.

Some programming languages have well defined intermediate languages.

· java – java virtual machine

· prolog – warren abstract machine

· In fact, there are byte-code emulators to execute instructions in these intermediate languages.

What are various representation of Three address statement? [AU Nov/Dec 2007]

Quadruples:A Quadruple is a record structure with four fields, op,arg1,arg2 and result. The op field contains an internal code for the operator. Eg. X^’= Y+Z is represented by placing + in op, y in arg1, z in arg2 and x in result.

Triples: It is a record with only three fields:op, arg1,arg2.Eg. X op Y

Indirect Triples: It will list pointers to the triples rather than the triples itself.

What are the properties of three address code?

§ Three address instruction has at most one operator in addition to the assignment symbol. The compiler has to decide the order in which operations are to be done.

§ The compiler must generate a temporary name to hold the value computed by each instruction.

Some three address instructions can have less than three operands

List the forms of three-address instruction forms.

· Assignment instruction of form, x= y op z.

· Assignment instruction of form, x= op y.

· Copy instruction of form, x= y.

· An unconditional jump goto L.

· Conditional jump of form, if x goto L and ifFalse x goto L.

· Conditional jump such as if x relop y goto L.

· Procedure calls such as

Param x1

Param x2

….

Param xn

Call p,n

· Indexed copy instruction of form, x= y[i] and x[i]=y.

· Address and pointer assisnments of form, x= &y, x= *y and *x= y.

What is record in type expression?

A record is a data structure with named fields. A type expression can be formed by applying the record type constructor to the field names and their types. Record types will be implemented by applying the constructor record to a symbol table containing entries for the fields.

How to use a name of X for the field within a record does not conflict with the other uses of name outside the record?

The three uses of X in the following declaration is distinct and do not conflict with each other.

float X;

record { float X; float Y;} p;

record { int tag; float X; float Y;} q;

A subsequent assignment X= p.X+q.X; sets variable X to the sum of fields named X in the records p and q. The relative address of X in p differs from the relative address of X in q.

How the Elements of arrays can be accessed in One Dimensional array?

location of A[i] è baseA+(i-low)*width

base A+(i-low)*width

can be re-written as i*width + (baseA-low*width)

should be computed at run-time can be computed at compile-time

So, the location of A[i] can be computed at the run-time by evaluating the formula i*width+c where c is (baseA-low*width) which is evaluated at compile-time.

How the Elements of arrays can be accessed in two Dimensional array?

The location of A[i1,i2] is baseA+ ((i1-low1)*n2+i2-low2)*width

baseA is the location of the array A.

low1 is the index of the first row

low2 is the index of the first column

n2 is the number of elements in each row

width is the width of each array element

Again, this formula can be re-written as

((i1*n2)+i2)*width + (baseA-((low1*n1)+low2)*width)

should be computed at run-time can be computed at compile-time

How to implement statements using control flow?

Statements can be translated ny inheriting a label next, where next marks the first instruction after the codefor this statement. The conditional S if(B)S1 can be translated by attaching a new label marking the beginning of code for S1 and passing a new label and S.next for the true and false exits, respectively of B.

How to generate jumping code for Boolean expressions?

Jumping code is useful because a Boolean expression is used for control flow as in if(B)S. Boolean values can be computed by jumping to t=true or t=false, where t is the temporary name. Using labels for jumps, a Boolean expression can be translated by inheriting the labels corresponding to its true or false exits. The constants true and false translate into the jump to the true and false exits, respectively.

How the Translation of Switch-Statements is performed?

The intended translation of switch is code to,

1. Evaluate the expression E.

2. Find the value Vj in the list of cases.

3. Execute the statement Sj associated with the value found.

What is meant by back patching? [AU Nov /Dec 2012] [AU May / June 2013]

By generating a series of branching statement with the targets of the jumps temporarily left unspecified. Each such statement will be put on a list of go to statements whose labels will be filled in when the proper label can be determined. This type of subsequent filling in of labels as back patching.

Define Backpatching. [AU May/Jun 2007, 2009] [AUT Nov/Dec 2010]

It is a Technique to solve the problem of replacing symbolic names into goto statements by the actual target address. The Backpatching can be done for,

· One-Pass code generation

· Boolean expression

· Flow-of-control statements

· Break,Continue and Goto statements.

Write the syntax directed translation for procedure call.

Sà call id (Elist)

{ for each item p on queue do

emit (‘param’ p);

emit (‘call’ id.place) }

15. What are the two representations to express intermediate languages?

§ Graphical representation.

§ Linear representation.

Define triple and give one example. [AU Nov/Dec 2007]

A triple is a data structure with 3 fields like operator , argument-1 , argument-2.

Example: a=b*-c

	operator	Argument-1	Argument-2
(0)	uminus	c
(1)	*	b	(0)
(2)	=	a	(1)

Define quadruple and give one example. [AU Nov/Dec 2007]

A quadruple is a data structure with 4 fields like operator , argument-1 , argument-2 and result.

Example: a=b*-c

	Operator	Argument-1	Argument-2	Result
(0)	Uminus	c		T₁
(1)	*	b	T₁	T₂
(2)	=	T₂		A

18. What is the merit of quadruples?

If we move a statement computing a variable , ‘A’ then the statements using ‘A’ requires no change. That is ‘A’ can be computed anywhere before the utilization of A.

19. Generate three address code for “if A=B then 1 else 0”, using numerical method.

· if A=B goto (4)

· T₁=0

· goto (5)

· T₁=1

· ……

What are the benefits of intermediate code generation? [AU May/Jun 2009]

A Compiler for different machines can be created by attaching different back end to the existing front ends of each machine.
A Compiler for different source languages can be created by proving different front ends for corresponding source languages t existing back end.
A machine independent code optimizer can be applied to intermediate code in order to optimize the code generation.

What are the various types of intermediate code representation? [AU Nov/Dec 2007] [AU Nov / Dec 2011][AU Nov/Dec 2013]

There are mainly three types of intermediate code representations.

Ø Syntax tree

Ø Postfix notation

Ø Three address code - Quadruple, Triple and Indirect triple

Give the syntax-directed definition for if-else statement.

1. S -> if E then S1

E.true := new_label()

E.false :=S.next

S1.next :=S.next

S.code :=E.code | | gen_code(E.true ‘: ‘) | | S1.code

2. S -> if E then S1 else S2

E.true := new_label()

E.false := new_label()

S1.next :=S.next

S2.next :=S.next

Mention the functions that are used in backpatching for One-pass code generation .

Ø mklist(i) creates the new list. The index i is passed as an argument to this function where I is an index to the array of quadruple.

Ø merge_list(p1,p2) this function concatenates two lists pointed by p1 and p2. It returns the pointer to the concatenated list.

Ø backpatch(p,i) inserts i as target label for the statement pointed by pointer p.

What are the properties of intermediate languages? [AU May/Jun 2007]

Retargeting is facilitated; a computer for a different machine can be created by detaching a back end for the new machine to an existing front end.

Differentiate triple representation from indirect representation in the three address format. [AUT Nov/Dec 2010]

Triples: It is a record structure with three fields: op, arg1 and arg2. The field’s arg1 and arg2 are either pointers to the symbol table or pointers into the triple structure. It is used to avoid temporary names into the symbol table.

Indirect triples: listing pointers to triples rather than listing the triples themselves are called indirect triples. It saves some space compared to quadruples.

Translate the expression (a+b)*(c+d)+(a+b+c) into quadruples. [AUT Nov/Dec 2010]

	Op	Arg1	Arg2	Result
(0)	+	A	B	t1
(1)	+	C	D	t2
(2)	+	t1	C	t3
(3)	*	t1	t2	t4
(4)	+	t4	t3	t5

Compare a triple and Quadruple. [AU Nov / Dec 2011]

Quadruples: A Quadruple is a record structure with four fields: op, arg1, arg2 and result. The op field contains an internal code for the operator.

Translate a*-(b+c) into postfix code. [AUT Nov/Dec 2010]

Postfix notation is abc+-*

Write the difference between Syntax Tree and DAG. [AU May / June 2013]

A syntax tree is a graph structure. Nodes in a syntax tree correspond to statements. Edges in a syntax tree represent relations between statements. For example, lets take the case of a very simple executable statement: 1 + 2. It contains two sub-statements, the constant expressions 1 and 2 and the operator +. The syntax tree for this statement will have the root node representing a binary operation with the operator + and two child nodes representing the constant expressions 1 and 2.

Directed acyclic graph (DAG), is a directed graph with no directed cycles. That is, it is formed by a collection of vertices and directed edges, each edge connecting one vertex to another, such that there is no way to start at some vertex v and follow a sequence of edges that eventually loops back to v again.

30. Why is SDT needed? AU May / June 2013]

Syntax-directed translation (SDT) refers to a method of compiler implementation where the source language translation is completely driven by the parser, i.e., based on the syntax of the language The parsing process and parse trees are used to direct semantic analysis and the translation of the source program. Almost all modern compilers are syntax-directed. SDT can be a separate phase of a compiler or we can augment our conventional grammar with information to control the semantic analysis and translation. Such grammars are called attribute grammars.

31. How the Procedure or Function calls can be translated?

The Procedure calls can be translated by using the concepts like,

· Function types

· Symbol tables

· Type checking

· Function calls

16 mark Questions

1. Describe the method of generating syntax-directed definition for Control statements.

2. Give the semantic for declarations in a procedure.

3. Explain the generation of intermediate code for declaration statement.

4. Discuss the methods for translating Boolean expression.

5. Brief intermediate code generation for basic block, control flow and Boolean expression.

6. Give the syntax-directed definition for flow of control statements.

7. How back patching can be used to generate the code for Boolean expression and flow of control statements.

8. Write the short note on procedure calls.

9. Explain about the different type of three address statements.

10. Explain in detail how three address code are generated and implemented.

11. Write syntax directed translation for desktop calculator implementation. Using the SDT compute the value of 19+4*15.

12. Write short notes on Quadruples, Triples and Indirect triples.

13. Discuss about back patching techniques with example.

14. Detail the translation scheme for flow control statements and procedure call

15. Discuss about back patching of Boolean expression.

UNIT IV Code Generation

1. What is Code generation?

Code generation is the final phase of a compiler. The code generator maps the intermediate representation produced by the front end or if there is a code optimization phase by a code optimizer, into the target program.

2. What are the issues in the design of a code generator? [AU May/Jun 2009] [AUT Nov/Dec 2010]

Intermediate code in which the source program was represented when it is given as a input to code generator. Target machine language Mapping names in the source program to addresses of data objects in run time memory is done cooperatively by the front end and the code generator.

· Instruction selection

· Register allocation

· Choice of evaluation order

· Approaches to code generation.

3. What are the problems identified while using registers?

During Register allocation, select the set of variables that will reside in registers at a point in the program. During a subsequent register assignment phase, pick the specific register that a variable will reside in.

4. What is an activation record?

Information needed during an execution of a procedure is kept in a block of storage called an activation record. Storage for names local to the procedure also appears in the activation record.

What is a basic block?[AU Nov/Dec 2007] [AU May / June 2013]

A basic block is a maximal sequence of consecutive three-address statements in which flow of control can only enter at the first statement of the block and leave at the last statement without halting or branching except possibly at the last statement in the basic block.

How do we allocate the space for the generated target code and the data object of our source programs?

• The places of the data objects that can be determined at compile time will be allocated statically.

• But the places for the some of data objects will be allocated at run-time.

• The allocation of de-allocation of the data objects is managed by the run-time support package.

– run-time support package is loaded together with the generate target code.

– the structure of the run-time support package depends on the semantics of the programming language (especially the semantics of procedures in that language).

• Each execution of a procedure is called as activation of that procedure

What is an activation tree? [AU Nov/Dec 2007] [AU Nov / Dec 2011]

We can use a tree (called activation tree) to show the way control enters and leaves activations.

In an activation tree:

– Each node represents an activation of a procedure.

– The root represents the activation of the main program.

– The node a is a parent of the node b iff the control flows from a to b.

– The node a is left to to the node b iff the lifetime of a occurs before the lifetime of b.

What is the flow of control in a program? [AU Nov/Dec 2007] [AU May / June 2013]

A flow graph is a graphical representation of a program in which the nodes of a graph are basic blocks and the edge of the graph shows how control can flow among the blocks.

What is the use of Control stack?

A stack (called control stack) can be used to keep track of live procedure activations.

– An activation record is pushed onto the control stack as the activation starts.

– That activation record is popped when that activation ends.

When node n is at the top of the control stack, the stack contains the nodes along the path from n to the root.

What do you meant by scope of the variable?

The scope rules of the language determine which declaration of a name applies when the name appears in the program.

An occurrence of a variable (a name) is:

– local: If that occurrence is in the same procedure in which that name is declared.

– non-local: Otherwise (ie. it is declared outside of that procedure)

Example:

procedure p;

var b:real;

procedure p;

var a: integer; a is local

begin a := 1; b := 2; end; b is non-local

begin ... end;

What are the contents of the activation record? [AU May / June 2012]

return value

actual parameters

optional control link

optional access link

saved machine status

local data

Temporaries

Who allocates an activation record of a procedure?

– Some part of the activation record of a procedure is created by that procedure immediately after that procedure is entered.

– Some part is created by the caller of that procedure before that procedure is entered.

Who deallocates an activation record of a procedure?

– Callee de-allocates the part allocated by Callee.

– Caller de-allocates the part allocated by Caller

What do you meant by dangling reference problem? Give an example.

Whenever storage is de-allocated, the problem of dangling references may occur.

Example:

main () {

int *p;

p = dangle();

}

int *dangle*()

{

int i=2;

return &i;

}

What are the various parameters passing mechanism?

· Call-by-value

· Call-by-reference

· Call-by-name (used by Algol)]

What are the two types of transformation that can be applied to basic blocks?[AU May/Jun 2007] [AU Nov / Dec 2011]

· Structure preserving Transformation

Ø Common sub expression elimination

Ø Dead code elimination

Ø Renaming of temporary variables

Ø Interchange of two independent adjacent statement

· Algebraic Transformation

Translate the following expression in to three address code sequence. d: = (a-b)+(a-c)+(a-c).

t := a-b

u :=a-c

v :=t+u

d := v+u

Generate the code for the above three address code sequence.

MOV a, R0

SUB b, R0

MOV a R1

SUB c, R1

ADD R1, R0

ADD R1,R0

MOV R0,d

What is a DAG? State its uses? [AU May/Jun 2007] [AUT Nov/Dec 2010][AU Nov/Dec 2013]

Directed Acyclic Graphs are useful data structures for implementing transformations on basic blocks. A DAG gives the picture of how the values are computed by each statement in the basic block is used in subsequent statements of the block.

Write an algorithm to construct a DAG?

Input: Basic Block

Output: A DAG for the basic block containing the following information

A label for each node. For leaves the label is an identifier and for interior nodes an operator symbol. For each node a list of attached identifier

Method:

Ø If node (y) is undefined, create a leaf labeled y.

Ø Determine if there is a node labeled op, whose left child is node(y) and whose right child is node (z).

Ø If not create such node.

Ø Delete x from the list of attached identifiers of node(x)

What is the use of Next – use information? [May / June 2012]

If the name in a register is no longer needed, then the register can be assigned to some other name. This idea can be applied in number of contexts.

Construct the dag for the following basic block.

d := b*c

e : = a+b

b := b*c

a := e-d

Write the steps for constructing LEADERS in Basic block. [AU May/Jun 2009]

i. First statement is a leader.

ii. Any statement that is the target of the conditional or unconditional goto is a leader.

iii. Any statement that immediately follows a goto or conditional goto statement is a leader.

State the importance of Next-use information. [AUT Nov/Dec 2010]

Next-use information is a collection of all the names that are used for next subsequent statements in a block. This information is generally collected for register allocation and temporary names or variables allocation.

What do you by instruction cost? [AU Nov / Dec 2011]

The cost of an instruction to be one plus the cost associated with the source and destination adress modes. This cost coresponds to the length of the instruction.

Define Register allocation and Register assignment?

Register allocation: It is the process of deciding which IR values to keep in registers. Graph coloring is an effective technique for doing register allocation in compilers.

Register assignment: It is the process of deciding which register should hold a given IR value.

Mention the issues to be considered while applying the techniques for code optimization.

Ø The semantic equivalence of the source program must not be changed.

Ø The improvement over the program efficiency must be achieved without changing the algorithm of the program.

What are the basic goals of code movement?

To reduce the size of the code i.e. to obtain the space complexity.
To reduce the frequency of execution of code i.e. to obtain the time complexity.

What do you mean by machine dependent and machine independent optimization?

Ø The machine dependent optimization is based on the characteristics of the target machine for the instruction set used and addressing modes used for the instructions to produce the efficient target code.

Ø The machine independent optimization is based on the characteristics of the programming languages for appropriate programming structure and usage of efficient arithmetic properties in order to reduce the execution time.

List the different storage allocation strategies.

The strategies are:

Ø Static allocation

Ø Stack allocation

Ø Heap allocation

What are the contents of activation record? [AU May/Jun 2007] [AUT Nov/Dec 2010]

The activation record is a block of memory used for managing the information needed by a single execution of a procedure. Various fields’ f activation record are:

Ø Temporary variables

Ø Local variables

Ø Saved machine registers

Ø Control link

Ø Access link

Ø Actual parameters

Ø Return values

What is code motion? [AU May/Jun 2007] [AUT Nov/Dec 2010] [AU May / June 2012]

Code motion is an optimization technique in which amount of code in a loop is decreased. This transformation is applicable to the expression that yields the same result independent of the number of times the loop is executed. Such an expression is placed before the loop.

What are the various ways to pass a parameter in a function?

Ø Call by value

Ø Call by reference

Ø Copy-restore

Ø Call by name

What is dynamic scope rule? [AUT Nov/Dec 2011]

Under dynamic scope, a new activation inherits the existing bindings of nonlocal names to storage. There are two approaches to implement dynamic scope:

a. Deep access

b. Shallow access.

Give any four applications of DAG. [AU May / June 2013]

Ø We can detect common sub expressions automatically.

Ø We can determine the identifiers which have their vlues used in the block.

Ø We can determine the statements which compute values that could be used outside the block.

16 mark questions

1. Explain various issues envolved in the design of code generation.(or) Discuss about the issues in the design of code generation.

2.Explain the code generation algorithm.

3. Explain the process of constructing DAG representation of Basic block for the below given code segment.

{

prod=0;

i=1;

{

prod = prod + a[i]*b[i];

i=i+1;

}

while(i<=10)

}

4. Explain in detail about primary structure-preserving transformation on Basic blocks.

5. Explain the construction of DAG for the Basic blocks in detail.

6. Discuss the runtime storage management of a code generator.

7. Describe a simple Code Generator. Explain how it generates code by an example.

8. Discuss about the data structure used for constructing symbol table.

9. Describe in detail the source language issues.

10. Discuss the various strategies for recovering from syntax error.

11. Explain the methods used to access nonlocal names.

UNIT V Code Optimization

1. What is meant by code optimization?[AU Nov/Dec 2013]

The unnecessary instructions in object code is eliminated or replacement of one sequence of instruction by a faster sequence of instructions that does the same thing is called as code improvement or code optimization. It has two types,

· Local code optimization – Code improvement within a basic blocks

· Global code optimization - Code improvement across the basic blocks.

2. What is an optimizing compiler?

Compilers that apply code-improving transformations are called optimizing compiler. It should preserve the semantics of the original program.

What are the properties of optimizing compiler? [AU May / June 2013]

The source code should be such that it should produce minimum amount of target code. There should not be any unreachable code. Dead code should be completely removed from source language. The optimizing compilers should apply following code improving transformations on source language.

i) Common sub-expression elimination

ii) Dead code elimination

iii)Code movement

iv)Strength reduction

What is copy propagation?

A copy statement, u=v, assigns one variable v to another, u. We can replace all uses of u by v, thus eliminating both the assignments and u.

What is a Function – Preserving Transformation?

There are number of ways in which a compiler can improve a program without changing the function it computes. Common sub Expression Elimination, Copy propagation, dead code elimination and constant folding are example for the above transformation.

Which type of expression is called as common sub expression?

An occurrence of an expression E is called a common sub expression if E was previously computed, and the values of variables in E have not changed since the previous computation.

What is meant by constant folding? [AU May June 2013]

The substitution of values for names whose values are constant is known as constant folding. (or) Deducing at compile time that the value of an expression is a constant and using the constant instead is known as constant folding.

What is happening at code motion?

The transformation takes an expression that yields the same result independent of the number of times a loop is executed and places that expression out of the loop.

What is induction variable?

It is a variables that take on a linear sequence of values each time around the loop. Some of these are used only to count iterations, and they often can be eliminated, thus reducing the time it takes to go around the loop.

What is strength reduction?

The transformation of replacing an expensive operation ,such as multiplication by a cheaper one , such as addition is known as strength reduction.

What is Dominators?

A node in the flow graph dominates another if every path to the latter must go through the former. A proper dominator is a dominator other than the node itself. Each node except the entry node has an immediate dominator – that one of its proper dominators that is dominated by all the other proper dominators.

What are the classification of Edges?

When we construct a depth-first spanning tree, all the edges of the flow graph can be classified into,

· Advancing edges – those that go from ancestor to proper descendant

· Retreating edges – those from descendant to ancestor

· Cross edges – others

13. What is meant by Back Edges?

A back edge is one whose head dominates its tail. Every back edge is a retreating edge, regardless of which depth-first spanning tree for its flow graph is chosen.

What is reducible flow graph?

If every retreating edge is a back edge, regardless of which depth-first spanning tree is chosen, then the flow graph is said to be reducible. The majority of flow graphs are reducible(i.e.,) those whose only control-flow statements are the usual loop-forming and branching statements are certainly reducible.

What is natural loops? How can be it constructed? What are its properties?

A natural loop is a set of nodes with a header node that dominates all the nodes in the set and has at least one back edge entering that node. Given any back edge, we can construct its natural loop by taking the head of the edge plus all nodes that can reach the tail of the edge without going through the head. It has two essential properties,

· It must have single-entry node, called header. This entry node dominates all nodes in the loop, or it would not be the sole entry to the loop

· There must be a back edge that enters the loop header. Otherwise, it is not possible for the flow of control to return to the header directly from the loop. (i.e.,) There really is no loop.

What is Data-flow analysis?

It defines the value at each point in the program. Statements of the program have associated transfer functions that relate the value before the statement to the value after. Statements with more than one predecessor must have their value defined by combining the values at the predecessors, using a meet operator.

What does the data-flow analysis framework contains?

A data-flow analysis framework (D,V,^,F) consists of

A direction of the data flow D, which is either FORWARDS or BACKWARDS.
A semi-lattice, which includes a domain of values V and a meet operator ^.
A family F of transfer functions from V to V. This family must include functions suitable for the boundary conditions, which are constant transfer functions for the special nodes ENTRY and EXIT in any flow graph.

What is Reaching Definitions in data flow framework?

It has values that are sets of statements in the program that define values for one or more variables. The transfer function for the block kills definitions of variables that are definitely redefined in the block and adds in those definitions of variables that occur within the block. The confluence operator is union, since definitions reach a point if they reach any predecessor of that point.

What is meant by Live Variables?

It computes the variables that are live at each point. The framework is similar to reaching definition, except the transfer function runs backward. A variable is live at the beginning of the block if it is either used before the definition in the block or is live at the end of the block and not redefined in the block.

What is Available expressions?

To discover global common sub expressions, we determine the available expressions at each point (i.e.,)expressions that have been computed and neither of the expression’s arguments were redefined after the last computation. The data-flow framework is similar to reaching definitions , but the confluence operator is intersection rather than union.

Give an example to represent code motion and reduction in strength. [AUT Nov/Dec 2010]

Reduction in strength: Replace expensive operations by equivalent cheaper ones.

X² è X * X

Code motion: while( i < = n-2)

{

…..

}

Changed to t = n-2;

While(i<=t)

{

…

}

Identify the induction variables in the following segment of basic block and eliminate to optimize the code. [AUT Nov/Dec 2010]

B1: j=j-1

t4=4*j

t5=a[t4]

If t5 > j goto B1

Induction variable: j

After optimization

T4 = 0

B1: t4 := t4 + 4

T5 = a[t4]

If t5>76 goto B1

Define Peephole optimization. [AU Nov / Dec 2012]

Peephole optimization is a kind of optimization performed over a very small set of instructions in a segment of generated code. The set is called a "peephole" or a "window". It works by recognizing sets of instructions that can be replaced by shorter or faster set of instructions.

24. What are the characteristic of a peephole optimization?

Ø Redundant – instruction elimination

Ø Flow- of – control optimization

Ø Algebraic simplification

Ø Use of machine idioms

Give example for ‘removal of loop invariant’. [AU May / June 2013]

LOOP INVARIANT COMPUTATIONS:

It’s the construct where a same computation is performed, every time when loop is executed. Elimination Process-: We first identify the invariant computations then move them outside the loop without changing the meaning of actual program.

Example

x
:=  init

for ...

      a := 
...

x := x È{a}

y := x Çz

od:In
this example:

· z is a loop invariant

· x and y are induction variables

Therefore, we can replace the above code with the following one:

x
:=  init

y
:= initÇz

for ...

a :=

x := x È{a}

if a Îz then

y := y È {a}

16 mark questions

Discuss in detail about peephole optimization. (or) Discuss the peephole optimization techniques with example.
What are the principle sources of optimization? Write notes on peephole optimization.
Explain various code optimization techniques in detail.
Describe the principal sources of optimization.
Optimize the following code using various optimization techniques.

i=1;s=0

for(i=1;i<=3;i++)

for(j=1;j<=3;j++)

c[i][j]=c[i][j][j]+a[i][j]+b[i][j].

What are the various loops in flow graphs.
The natural loop of a back edge n->h was defined to be h plus the set of nodes that can reach n without going through h. Show that h dominates all the nodes in the natural loop of n->h.
Suppose V is the set of complex numbers. Which of the following operations can serve as the meet operation for a semilattice on V?

a) Addition: (a+ib)^(c+id)=(a+b)+i(c+d).

b) Multiplication: (a+ib)^(c+id)=(ac-bd)+i(ad+bc)

UNIVERSITY QUESTION PAPERS

B.E/B.Tech. DEGREE EXAMINATION, NOVEMBER/DECEMBER 2013.

Seventh Semester

Computer Science and Engineering

080230045 - PRINCIPLES OF COMPILER DESIGN

(Regulation 2008)

Answer ALL questions

PART- A- (10*2=20 marks)

1. What is difference between compiler and assembler?

2. What are cousins of the compiler?

3. What is the role of Lexical analyser?

4. Draw the NFA for regular expression ab(a/b)*ab*.

5. Define context free grammar. Give an example.

6. What are role of parser?

7. What are intermediate languages?

8. Give some forms of intermediate languages.

9. What is DAG?

10. What is meant by Code Optimization?

PART- B- (5*16=80 marks)

11. (a).What are six phases of compilers? Explain with an example.

(b)What are compiler construction tools? Describe about their features.

12. (a)Draw the DFA for the regular expression (a/b)*abb and find minimized DFA.

(b).(i). Write in detail about tool for generating a Lexical Analyser. (8)

(ii).Give a lexical analyser specification for a model compiler. (8)

13. (a)Design a predictive parsing table for the grammar

E-> E+T/T

T-> T*F/F

F->(E)/id

and show the moves for input string id*id+id.

(b) Design an operator precedence parser for the grammar

E-> E+T/T

T-> T*F/F

F->(E)/id

and draw the precedence function graph.

14. (a)Detail the translation scheme for flow control statements and procedure call

(b).Discuss about back patching of Boolean expression.

15. (a)What are the principle sources of optimization? Write notes on peephole optimization.

(b)Describe a simple Code Generator. Explain how it generates code by an example.

B.E/B.Tech DEGREE EXAMINATION, MAY/JUNE 2012

Sixth semester

Computer science and Engineering

CS2352/Cs62/10144 Cs602-PRINCIPLES OF COMPILER DESIGN

(Regulation 2008)

Answer ALL questions

PART- A

1. Mention few cousins of compiler.

2. What are the possible error recovery actions in lexical analyzer?
3. Define an ambiguous grammar.
4. What is dangling reference?
5. Why are quadruples preferred over triples in an optimizing compiler?
6. List out the motivations for back patching.
7. Define flow graph.
8. How to perform register assignment for outer loops?
9. What is the use of algebraic identities in optimization of basic blocks?
10. List out two properties of reducible flow graph?

PART B

11. (a) (i) What are the various phases of the compiler? Explain each phase in detail.
(ii) Briefly explain the compiler construction tools.
                                                OR
(b) (i) What are the issues in lexical analysis?
(ii)Elaborate in detail the recognition of tokens.
12. (a) (i) Construct the predictive parser for the following grammar.
S->(L)/a
L->L,S/S
(ii)Describe the conflicts that may occur during shift reduce parsing.
                                                OR
(b) (i)Explain the detail about the specification of a simple type checker.
(ii)How to subdivide a run-time memory into code and data areas. Explain
13. (a) (i) Describe the various types of three address statements.
(ii)How names can be looked up in the symbol table? Discuss.
                                                OR
(b) (i)Discuss the different methods for translating Boolean expressions in detail.
(ii) Explain the following grammar for a simple procedure call statement S->call id (enlist).
14. (a) (i) Explain in detail about the various issues in design of code generator.
(ii)Write an algorithm to partition a sequence of three address statements into basic blocks.
                                                OR
(b) (i)Explain the code-generation algorithm in detail.
(ii)Construct the dag for the following basic block.
d: =b*c
e: = a +b
b: =b*c
a: =e-d
15. (a) (i) Explain the principal sources of optimization in detail.
(ii)Discuss the various peephole optimization techniques in detail.
                                                OR
(b) (i) How to trace data-flow analysis of structured program?
(ii) Explain the common sub expression elimination, copy propagation, and
transformation for moving loop invariant computations in detail.

B.E/B.Tech DEGREE EXAMINATION, NOVEMBER/DECEMBER 2011.

Sixth Semester

Computer Science and Engineering

CS 2352 - PRINCIPLES OF COMPILER DESIGN

(Regulation 2008)

Maximum : lOO marks

Answer ALL questions.

PART A

1. What is the role of lexical analyzer?

2. Give the transition diagram for an identifier.

3. Define handle pruning.

4. Mention the two rules for type Checking.

5. Construct the syntax tree for the following assignment statement:

6. What are the types of three address statements?

7. Define basic blocks and flow graphs.

8. What is DAG?

9. List out the Criteria for Code improving transformations.

10. When does dangling reference Occur?

PART B

11. (a).(i) Describe the Värious phäses Of Compiler amd trace it with the program Segment (position: = initial + rate * 60]. (10)

(ii) State the Complier Construction tools. Explain them. (6)

(b). (i) Explain briefly about input buffering in reading the source program for finding the tokens. (8)

(ii) Construct the minimized DFA for the regular expression (0+1)*(0+1) (8)

12. (a).Construct a Canonical parsing table for the grammar given below.

Also explain the algorithm used. (16)

E->E+T E->T T->T*F T->F

F ->(E) F ->id

(b).What are the different Storage allocation Strategies? Explain. (16)

13.(a).(i) Write down the translation Scheme to generate Code for assignment statement. Use the Scheme for generating three address Code for the assignment statement g:a+b-c*d (8)

(ii) Describe the various methods of implementing three-address

statements. (8)

(b),(i) How Can Back patching be used to generate code for Boolean expressions and flow of Control statements? (10)

(ii) Write a short note on procedures Calls. (6)

14.(a).(i) Discuss the issues in the design of Code generator. (10)

(ii). Explain the Structure-preserving transformations for basic blocks. (6)

(b).(i) Explain in detail about the simple code generator. (8)

(ii) Discuss briefly about the Peephole Optimization. (8)

15.(a).Describe in detail the principal sources Of Optimization. (16)

(b).(i) Explain in detail Optimization Of basic blocks with example. (8)

(ii) Write about Data flow analysis Of structural programs. (8)

B.E.,B.Tech. DEGREE EXAMINATION, APRIL/MAY 2011

Sixth Semester

Computer Science and Engíneen'ng

CS 2352 _ PRINCIPLES OF COMPILER DESIGN

(Regulation 2008)

Time : Three hours Maximum : 100 marks

Answer ALL questions

PART A

l. What is an interpreter?

2. Define token and lexeme.

3. What is handle pruning?

4. What are the limitations of static allocation?

5. List out the benefits of using machine-independent intermediate forms.

6. What is a syntax tree? Draw the syntax tree for the following statement:

7. List out the prirnary structure preserving transformations on basic block.

8. What is the purpose of next-use information?

9. Define dead-code elimination.

10. What is loop Optimization?

PART B

11.a.(i). Describe the Various phases of Complier and trace the program segment a:=b + C *4 for all phases. (10)

(ii) Explain in detail about compiler construction tools. (6)

b.(i) Discuss the role of lexical analyzer in detail. (8)

(ii)Draw the transition diagram for relational operators and unsigned numbers in Pascal. (8)

12.a.(i) Explain the error recovery strategies in syntax analysis. (6)

(ii) Construct a SLR construction table for the following grammar.

E->E+T

E->T

T->T*F

T->F

F ->(E)

F ->id

b.(i)Distinguish between the source text of a procedure and its activation at run time. (8)

(ii) Discuss the various storage allocation strategies in detail. (8)

13.a.(i).Define three-address Code. Describe the various methods of implementing three-address statements with an example. (8)

(ii)Give the translation scherne for Converting the assignments into three address code. (8)

b.(i) Discuss the various methods for translating Boolean expression. (8)

(ii)Explain the process of generating the code for a Boolean expression in a single pass using back patching. (8)

14.a.(i)Write in detail about the issues in the design of a code generator (10)

(ii) Define basic block. Write an algorithm to partition a sequence of three-address statements into basic blocks. (6)

b.(i)How to generate a code for a basic block from its dag representation? Explain. (6)

(ii)Briefly explain about simple code generator. (10)

15.a.(i)Write in detail about function-preserving transformations. (8)

(ii) Discuss briefly about Peephole optimization. (8)

b.(i)Write an algorithm to construct the natural loop of a back edge. (6)

(ii) Explain in detail about code-improving transformations. (10)

Academics &Innovations

Thursday 19 December 2013

PCD 2 marks Question and Answers - 2013(even sem)

Example