Formal Languages and Compilers
Compilers
Master's Degree Course in Computer Engineering
A.Y. 2015/2016
DEI – Politecnico di Bari
Floriano Scioscia
Compiler history
• The name "compiler" was introduced in 1950. Translation was seen as a "compilation" of a sequence of machine-language subprograms, selected from a library.
• 1957: the FORTRAN team at IBM, led by John Backus, is credited with creating the first complete compiler.
• 1960: COBOL was one of the first languages compiled for multiple architectures.
• A compiler is itself a program written in some language. The earliest compilers were written in assembly.
• 1962: the first self-compiled compiler (i.e. capable of compiling its own source code) was developed for the Lisp language by Hart and Levin at MIT.
• Creating a self-compiling compiler introduces a bootstrapping problem: the very first compiler for that language must either be written in another language or be compiled by having the compiler run as an interpreter (as Hart and Levin did with their Lisp compiler).
• Early 1970s: the use of high-level languages to write compilers received a boost, as Pascal and C compilers were written in the same languages.
• Ambitious "optimizations" were adopted to generate efficient machine code: this was essential for the first computers, which had limited resources. Efficient resource usage is still highly important for modern compilers.
Compiler implementation goals
Correctness
• Compilers allow programmers to discover lexical and logical errors.
• Compilation techniques help improve security (e.g. the Java Bytecode Verifier).
Intellectual property protection
Efficiency
• Reduce time and memory usage for data and code at both compile time and run time.
• Support language expressiveness.
Providing a "development environment"
• Provide a fast turn-around time.
• Enable separate compilation.
• Allow source code debugging.
“Compilers” as programming language translators only?
Beyond translating high-level programming languages into lower-level ones, compilers have further uses:
• TeX and LaTeX use compilers to translate text and formatting markup into a typeset document.
• PostScript (generated from LaTeX, Word, etc.) is a programming language interpreted by a printer or a PostScript viewer to produce a human-readable form of a document.
• Mathematica, MATLAB and others are interactive systems mixing programming with mathematics. They use compilation techniques to manage the problem specification, its internal representation and its solution.
• Verilog and VHDL support the design of VLSI circuits. A silicon compiler specifies the layout and composition of VLSI circuit masks using standard building blocks: just like typical compilers, silicon compilers understand and apply design rules which determine the feasibility of a circuit design.
• Interactive tools often need a programming language to support automatic analysis and modification of a system.
What is a compiler?
• It is a program which reads a sentence in one language and translates it into sentences in another language.
[Diagram: Source program (usually a program written in a high-level language) → COMPILER → Object program (usually the equivalent program in machine code for a specific architecture); the compiler also reports errors.]
Role of compilers
Support the use of high-level programming languages
• Increase programmers’ productivity
• Easier code maintainability
• Higher code portability
Exploit opportunities provided by low-level architecture details
• Instruction selection
• Addressing modes
• Pipeline
• Cache usage
• Instruction-level parallelism
Compilers are needed to bridge the gap between high-level and low-level languages
• Architecture changes → compiler changes
• Significant performance differences
Compiler development
Compilers are software systems of large size and complexity.
Compiler development requires knowledge about:
• Programming tools (compilers, debuggers)
• Program-generation tools (LEX/YACC, Flex/Bison)
• Software libraries (sets, collections)
• Simulators
Knowledge of compiler architecture improves the effectiveness
of software designers/developers.
Compiler requirements
• Correctness of generated code
• Efficiency of generated code (output runs fast)
• Efficiency of the compiler (compiler runs fast)
• Compile time must be proportional to code size
• Separate compilation
• Accurate syntax error diagnostics
• Good interoperability with the debugger
• Accurate anomaly detection
• Allow cross-language calls
• Predictable optimizations
Compiler architecture
• Compilers typically include two main phases: analysis and synthesis.
• In the analysis phase an intermediate representation of the source
program is created.
• This phase includes:
– lexical analyzer,
– syntax analyzer,
– semantic analyzer,
– intermediate code generator.
• Starting from the intermediate representation, in the synthesis phase
the equivalent target program is created.
• This phase includes:
– code generator,
– optimizer.
Compiler phases (1/2)
[Diagram: Source program → Scanner → Parser → Semantic checker → Intermediate code generator → Machine-independent code optimizer → Code generator → Machine-dependent code optimizer → Target program; the symbol table and the error handlers interact with all phases.]
Compiler phases (2/2)
• Each phase transforms the source program from one representation into another.
• Error handlers and the symbol table are used in all phases.
• The symbol table contains information on all symbolic elements, such as name, scope, type (if present), etc.
[Diagram: COMPILER PHASES. FRONT END (Analysis): sequence of characters → LEXICAL ANALYZER → sequence of tokens → SYNTAX ANALYZER → parse tree → SEMANTIC ANALYZER → abstract syntax tree with attributes → INTERMEDIATE CODE GENERATOR → intermediate code. BACK END (Synthesis): intermediate code → ARCHITECTURE-INDEPENDENT OPTIMIZER → optimized intermediate code → CODE GENERATOR → machine code → ARCHITECTURE-DEPENDENT OPTIMIZER → optimized machine code.]
Compiler classification criteria (1/2)
Number of steps
• It is the number of times the source code is read during compilation.
• Single-pass compilers – multi-pass compilers.
Optimization
• No optimization
• Optimization in space
• Optimization in time
• Optimization in power consumption
Generated target format
• Assembly language
• Relocatable binary
• Memory image
Compiler classification criteria (2/2)
Generated object language
• pure machine code
– Compilers generate code for a particular machine instruction set, without presuming the presence of any operating system or function library. This approach is rare, used only in system implementation.
• augmented machine code
– Compilers generate code for a particular machine instruction set, augmented by operating system and support routines: in order to execute such object code, the target machine must have an operating system and a collection of run-time support routines (I/O, memory allocation, etc.) which must be combined with the object code. The degree of correspondence between code and hardware can vary widely.
• virtual machine code
– Compilers generate virtual machine code exclusively. This approach is attractive because the object code can be executed regardless of the underlying hardware. If the virtual machine is kept simple, its interpreter can be written easily.
– This approach penalizes execution speed, typically by a factor between 3:1 and 10:1. A "Just in Time" (JIT) compiler can translate sections of virtual machine code to native code to speed up execution.
Benefits of virtual machine code
The use of virtual machine code can be beneficial for several purposes:
– simplifying the compiler by providing suitable primitives (e.g. method calls, string manipulation, etc.);
– compiler portability;
– decreasing the size of the generated code, since the instruction set is designed for a particular programming language (e.g. JVM bytecode from Java).
• In order to generate virtual machine code, almost all compilers need, to varying degrees, to interpret some operations.
Generated target format (1/2)
Assembly Language (Symbolic) Format
• A text file containing the assembly source code is produced: some decisions in code generation (jump instruction targets, address structure, etc.) are left to the assembler.
– This is a good approach for didactic projects.
– It supports the generation of assembly code for "cross-compilation" (compilation occurs on a machine different from the one on which the code will be executed).
– Generating assembly code simplifies debugging and understanding of a compiler (as the generated code can be inspected).
– Rather than a specific assembly language, C can be used as a "universal assembly" language: C is more machine-independent than any particular assembly language. Nevertheless, some features of a program (such as run-time data representation) are not accessible from C code, but are easily accessible in assembly language.
Generated target format (2/2)
Relocatable Binary Format
• The code can be generated in a binary format with references to external and local instructions and with data addresses not yet bound.
• The addresses are assigned relative to the start of the module or to a symbolic unit name.
– A linkage step allows adding support libraries and other separately compiled routines and produces an executable, absolute binary form of the program.
Memory-Image (Absolute Binary) Format
• The compiled code can be loaded into memory and executed immediately.
– This is the fastest method, but the possibility of using libraries can be limited and the program must be recompiled for each run.
– Memory-image compilers are useful for students, who frequently debug and change the code, and when compilation costs are larger than execution costs.
Single-phase compilers
[Diagram: a single-phase compiler combines the lexical analyzer, the syntax and semantic analyzer, and the code generator.]
Two-phase compilers (1/4)
• Current compilers split the compilation process into two main phases, the front end and the back end. Each can require reading the source code.
• In the front-end step, the compiler translates source code into an intermediate language (usually internal to the compiler). Hence the front end depends on the source language, but not on the target machine.
• In the back-end step, a preliminary optimization of the intermediate code sometimes occurs, then the object code is generated and optimized. The back end is therefore independent of the source language, but it depends on the target machine.
[Diagram: Source language → Front end → Intermediate language → Back end → Object language; errors are reported.]
Two-phase compilers (2/4)
Phases and passes
• A single pass can be enough for several phases (interleaved during the pass): for example, analysis and generation of intermediate code can be done in a single pass “driven” by the syntax analyzer.
• Decreasing the number of passes reduces the required time, but it increases the required memory.
• The intermediate language (IL) can be of:
– high level, implying that the source language operators are still present in the IL;
– low level, with the source language operators translated into other, simpler or more specialized ones.
Example: the statement   if cond then branch1 else branch2
– high-level IL:   ifop cond branch1 branch2
– low-level IL:
    jump iffalse cond label2
    branch1
    jump label_exit
    label2: branch2
    label_exit:
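As an illustration of the lowering step sketched in the example, the following minimal Python sketch (not from the slides; node encoding and label-naming scheme are assumptions) shows how a high-level "ifop" node could be rewritten into jump-based low-level IL.

# Sketch (hypothetical) of lowering the high-level IL node
# "ifop cond branch1 branch2" into jump-based low-level IL.
_labels = 0
def new_label(prefix="label"):
    global _labels
    _labels += 1
    return f"{prefix}{_labels}"

def lower_if(cond, then_code, else_code):
    """then_code / else_code: lists of already-lowered instructions."""
    else_label, exit_label = new_label(), new_label("label_exit")
    return ([f"jump_iffalse {cond} {else_label}"]    # skip branch1 if cond is false
            + then_code
            + [f"jump {exit_label}", f"{else_label}:"]
            + else_code
            + [f"{exit_label}:"])

print("\n".join(lower_if("cond", ["branch1"], ["branch2"])))
# jump_iffalse cond label1
# branch1
# jump label_exit2
# label1:
# branch2
# label_exit2: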
Two-phase compilers (3/4)
Consequences of two-phase compilation:
• Building a compiler for a new processor (retargeting) is simpler;
• Compilers with multiple front ends can be designed.
Front end and back end can be reused separately for new compilers.
Two-phase compilers (4/4)
[Diagrams: Retargeting – the same front end for PASCAL is paired with back ends for processors P1, P2, P3 and P4, producing object code for each processor. Multiple front-end – front ends for PASCAL, C and ADA share back ends for processors P1, P2, P3 and P4, each combination producing object code for the corresponding processor.]
Lexical analyzer a.k.a. scanner (1/2)
• It processes the preprocessor directives (include, define, etc.).
• It transforms the source code into a compact and uniform structure (a sequence of tokens, i.e. lexical elements).
• It removes unnecessary information from the source code (comments, whitespace).
• It identifies lexical errors.
• It allows describing tokens effectively through regular expression notation.
• A token describes a set of strings having the same role (e.g. identifiers, operators, keywords, numbers, delimiters, etc.).
• Tokens, described by regular expressions, are the smallest elements (not further decomposable) of a language, such as keywords (for, while), variable names (goofy), operators (+, -, <<).
Lexical analyzer a.k.a. scanner (2/2)
• A lexical analyzer (a.k.a. scanner) reads the source text one character at a time and returns the tokens of the source program.
Example: newval := oldval + 12 yields the following tokens:
    newval   identifier
    :=       assignment operator
    oldval   identifier
    +        add operator
    12       number
• Information on identifiers is stored in a symbol table.
• Regular expressions are used to describe tokens.
• A Deterministic Finite-state Automaton can be used to implement a scanner, as in the sketch below.
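A minimal sketch (not part of the original slides) of a regular-expression-based scanner in Python for the example above; the token names and regular expressions are illustrative assumptions.

import re

# Illustrative token specification: each token class is described by a regular expression.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),              # e.g. 12
    ("ASSIGN_OP",  r":="),               # assignment operator
    ("ADD_OP",     r"\+"),               # add operator
    ("IDENTIFIER", r"[A-Za-z_]\w*"),     # e.g. newval, oldval
    ("SKIP",       r"[ \t\n]+"),         # whitespace: discarded
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def scan(source):
    """Return the sequence of (token class, lexeme) pairs of the source text."""
    tokens = []
    for match in MASTER_RE.finditer(source):
        kind = match.lastgroup
        if kind != "SKIP":               # whitespace is removed, like comments
            tokens.append((kind, match.group()))
    return tokens

print(scan("newval := oldval + 12"))
# [('IDENTIFIER', 'newval'), ('ASSIGN_OP', ':='), ('IDENTIFIER', 'oldval'),
#  ('ADD_OP', '+'), ('NUMBER', '12')]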
Syntax analyzer a.k.a. parser
• Syntax analysis is the process of constructing the derivation of a sentence with respect to a given grammar.
• Therefore a parser is an algorithm working on a string: if the string belongs to the language generated by the grammar, parsing produces a derivation, otherwise it stops and reports:
– the place within the string where the error occurred;
– the type of error (diagnosis).
• Parsing takes as input the sequence of tokens produced by the previous lexical analysis step and performs syntactic checks against a grammar. The output of this step is a parse tree.
• Parsing:
– looks for syntax errors;
– groups tokens into grammar sentences.
Syntax analyzer (CFG) (1/2)
• The syntax of a programming language is specified by means of a Context-Free Grammar (CFG).
• Rules of a CFG are recursive.
• To express a CFG, BNF (Backus-Naur Form) notation is often used:
    assgstmt   ::= identifier := expression
    expression ::= identifier
    expression ::= number
    expression ::= expression + expression
• A parser checks whether each program statement obeys the rules of the language CFG.
• If so, the parser creates a parse tree for the program statement.
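As a sketch (not from the slides), the BNF above can be checked by a hand-written recursive-descent recognizer. Since expression ::= expression + expression is left-recursive, the sketch rewrites it as a loop (expression ::= operand { + operand }), an assumption made only to keep the example simple; token kinds follow the scanner sketch shown earlier.

# Recursive-descent recognizer for:
#   assgstmt   ::= identifier := expression
#   expression ::= operand { + operand }   (left recursion rewritten as a loop)
#   operand    ::= identifier | number
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens        # list of (kind, lexeme) pairs from the scanner
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()} at token {self.pos}")
        self.pos += 1

    def assgstmt(self):
        self.expect("IDENTIFIER")
        self.expect("ASSIGN_OP")
        self.expression()

    def expression(self):
        self.operand()
        while self.peek() == "ADD_OP":   # handles expression + expression iteratively
            self.expect("ADD_OP")
            self.operand()

    def operand(self):
        if self.peek() in ("IDENTIFIER", "NUMBER"):
            self.pos += 1
        else:
            raise SyntaxError(f"expected identifier or number, found {self.peek()}")

tokens = [("IDENTIFIER", "newval"), ("ASSIGN_OP", ":="),
          ("IDENTIFIER", "oldval"), ("ADD_OP", "+"), ("NUMBER", "12")]
Parser(tokens).assgstmt()    # no exception: the statement obeys the grammar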
Syntax Analyzer (CFG) (2/2)
• A parser usually creates the syntactic structure as a parse tree for each specific construct of a programming language. Below is the parse tree of an assignment statement; likewise, parse trees are produced by the parser for loop statements, as we will see.
Parse tree of newval := oldval + 12:
    assgstmt
        identifier (newval)
        :=
        expression
            expression
                identifier (oldval)
            +
            expression
                number (12)
• In a parse tree, all terminals are leaves; all other nodes are nonterminals.
Parsing techniques
• Based on the way the parse tree is created, different parsing techniques exist. They can be classified in two groups:
Top-Down Parsing: descending or “predictive” parsing
• The construction of the parse tree starts from the root and proceeds toward the leaves.
• Efficient top-down parsers can be easily written by hand.
• Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing: Left-to-right scan, Left-most derivation).
Bottom-Up Parsing: ascending or “shift-reduce” parsing
• The construction of the parse tree starts from the leaves and proceeds toward the root.
• Efficient bottom-up parsers are usually created with the assistance of software tools.
• Bottom-up parsing is also known as shift-reduce parsing.
• Operator-Precedence Parsing – easy to implement.
• LR Parsing: Left-to-right scan, Right-most derivation.
Semantic analysis
• Semantic analysis checks the meaning of statements in the source code. Typical checks in this step include type checking and verifying that identifiers have been declared before being used. The outcome of this step is the Abstract Syntax Tree (AST).
• Semantic analysis:
– looks for semantic errors;
– gathers information on types by doing type checking.
• Example:
    newval := oldval + 12
The type of the newval identifier must match the type of the expression (oldval + 12).
• A semantic analyzer looks for semantic errors in the source program and collects information needed for the subsequent code generation and optimization.
Semantic analyzer
• Semantic information typically cannot be represented with a simple parse tree. Specific attributes (e.g. data types and values in expressions) must be associated with constructs (more precisely, with terminals and nonterminals of the grammar).
• Productions of a CFG must be supplemented by annotating them with rules (semantic rules) and/or code fragments (semantic actions). Rules define how to compute the value of the attributes of the nodes in the production. Code fragments are executed when the production is used during syntax analysis.
• The execution of semantic actions, in the order determined by parsing, produces a syntax-directed translation of the program statements as a result.
Type checking
• Type checking is an important part of semantic analysis:
– it checks the static semantics of each node of the parse tree;
– it checks that the construct is legal (all involved identifiers must be declared, types must be correct, etc.);
– if the construct is semantically correct, type checking annotates the node, adding information about the type or the related symbol table entry, thus allowing the creation of the AST for the construct;
– if a semantic error is discovered, a proper error message is issued.
• Type checking depends purely on the semantic rules of the source language, i.e. it is independent of the target language.
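A minimal sketch (an illustrative assumption, not the slides' implementation) of the type check performed on the assignment example newval := oldval + 12; the tuple-based AST encoding and the symbol table layout are hypothetical.

# Hypothetical AST nodes: ("assign", name, expr), ("add", e1, e2), ("id", name), ("num", value)
SYMBOL_TABLE = {"newval": "int", "oldval": "int"}   # filled when declarations are processed

def type_of(node):
    """Return the type of an expression node, checking static semantics."""
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "id":
        if node[1] not in SYMBOL_TABLE:
            raise TypeError(f"identifier '{node[1]}' used before declaration")
        return SYMBOL_TABLE[node[1]]
    if kind == "add":
        t1, t2 = type_of(node[1]), type_of(node[2])
        if t1 != t2:
            raise TypeError(f"type mismatch in '+': {t1} vs {t2}")
        return t1
    raise TypeError(f"unknown construct {kind}")

def check_assign(node):
    _, name, expr = node
    if SYMBOL_TABLE.get(name) != type_of(expr):
        raise TypeError(f"type of '{name}' does not match the expression type")
    return True

# newval := oldval + 12
check_assign(("assign", "newval", ("add", ("id", "oldval"), ("num", 12))))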
Syntax Analyzer vs Lexical Analyzer
Which program constructs must be recognized by a lexical analyzer and
which by a syntax analyzer?
• They both do the same thing, but the lexical analyzer only deals with
non-recursive language constructs.
• The syntax analyzer deals with recursive language constructs.
• The lexical analyzer simplifies the work for the syntax analyzer.
• The lexical analyzer recognizes the smallest elements (the tokens) of
the source program.
• The syntax analyzer works on tokens, recognizing the language
structures.
Intermediate code generator (1/2)
• The abstract syntax tree is the first form of intermediate representation (IR) of the source program.
• A low-level IR can be obtained explicitly from it; it is similar to machine code and can be regarded as a kind of program for a virtual machine.
• This intermediate code is known as three-address code. It is easy to generate and to translate to machine code, both for arithmetic expressions and for statements.
Intermediate representations of the statement do i = i + 1; while (a[i] < v);
[Figure: the abstract syntax tree and the corresponding three-address code of the statement.]
Intermediate code generator (2/2)
• An element can be translated if it is semantically correct.
• Translation requires that the run-time “meaning” of the construct is captured.
• For example, the AST of a while loop contains two subtrees, one for the control expression, the other for the body of the loop.
• Nothing in the AST shows that the loop “iterates”. This meaning is captured when the AST of the while loop is translated. Translation depends on the semantics of the source language.
• In the annotation of the AST by means of semantic rules and actions, the notion of checking the value of the control expression and possibly executing the body of the loop becomes explicit.
• In other words, a CFG, besides defining the syntax of a programming language, can be used to support translation through the technique of syntax-directed translation.
• To do so, the AST (and therefore the statements and expressions of a programming language) must be transformed from infix to postfix notation (where operators appear after their operands). A simple example with arithmetic expressions follows.
Translation from infix to postfix notation
• A generic arithmetic expression can be described visually as a binary tree recursively containing an operator in the parent node and the two operands in the two child nodes. The tree in the picture (a "-" node with children A and B) represents the expression A-B or B-A, according respectively to the so-called left or right infix notation, in which the left (respectively, right) child is visited first.
• By applying the left (right) prefix notation, the tree corresponds to the string -AB (resp. -BA). Finally, by applying the left (right) postfix notation, the tree corresponds to the string AB- (resp. BA-).
• Prefix and postfix notations lend themselves to unambiguous storage and execution of arithmetic expressions. Conversely, the infix notation suffers from interpretation ambiguity (hence it requires parentheses), and is therefore less often adopted.
• Postfix notation allows the execution of the equivalent arithmetic expression if, reading the string from left to right, every operator is applied recursively to the two operands which immediately precede it. These two operands are the left and the right one if the notation is left, or the right and the left one if the notation is right.
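The left-to-right evaluation rule described above is exactly what a stack-based evaluator does. A small sketch (not from the slides) for left postfix notation with binary operators:

# Stack-based evaluation of a left postfix string: reading from left to right,
# each operator is applied to the two operands that immediately precede it.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "/": lambda a, b: a / b}

def eval_postfix(tokens, env):
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()     # top of the stack: the operand written last
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        elif tok in env:            # variable: look up its value
            stack.append(env[tok])
        else:                       # numeric literal
            stack.append(float(tok))
    return stack.pop()

# A - B in left postfix notation is "A B -"
print(eval_postfix("A B -".split(), {"A": 7, "B": 2}))   # 5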
Translation of an expression in postfix notation
• Considering the source expression
    ALPHA = Beta * (Gamma - Delta) / (Omega + Psi)
the syntax analyzer will create the equivalent parse tree and then its AST:
    =
        ALPHA
        /
            *
                Beta
                -
                    Gamma
                    Delta
            +
                Omega
                Psi
which is transformed by means of a post-order traversal into the corresponding left postfix notation
    ALPHA Beta Gamma Delta - * Omega Psi + / =
easily translatable into a sequence of three-address instructions.
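A post-order traversal producing the left postfix string above might look like the following sketch (the tuple-based AST encoding is an assumption, not the slides' data structure).

# Hypothetical AST encoding: leaves are strings, interior nodes are (operator, left, right).
AST = ("=", "ALPHA",
       ("/", ("*", "Beta", ("-", "Gamma", "Delta")),
             ("+", "Omega", "Psi")))

def to_left_postfix(node):
    """Post-order traversal: left subtree, right subtree, then the operator."""
    if isinstance(node, str):        # operand leaf
        return [node]
    op, left, right = node
    return to_left_postfix(left) + to_left_postfix(right) + [op]

print(" ".join(to_left_postfix(AST)))
# ALPHA Beta Gamma Delta - * Omega Psi + / =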
Three-address code
• By transforming the AST into three-address instructions, a compiler can produce an explicit representation of the intermediate code.
• This code is generally independent of the target architecture. It takes its name from its instructions, which have the form x = y op z, where op is a binary operator, y and z are operands and x is the location of the result of the operation.
• A three-address instruction can perform a single operation, typically a computation, a comparison or a jump.
• The level of the intermediate code is generally close to machine code.
Example 1:   ALPHA Beta Gamma Delta - * Omega Psi + / =
    minus   Delta, , t1
    add     Gamma, t1, t2
    mult    Beta, t2, t3
    add     Omega, Psi, t1
    div     t3, t1, t2
    mov     t2, , ALPHA
Example 2:   newval := oldval * fact + 1   (i.e. id1 := id2 * id3 + 1)
    MULT    id2, id3, temp1
    ADD     temp1, #1, temp2
    MOV     temp2, , id1
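A sketch (an illustrative assumption, not the slides' generator) of how a single stack-based pass over a left postfix string can emit three-address instructions similar to Example 2; the literal marker "#" is omitted for simplicity.

# Generate three-address code from a left postfix expression: operands are pushed,
# each binary operator pops two operands and emits "OP left, right, tempN".
OPCODES = {"+": "ADD", "-": "SUB", "*": "MULT", "/": "DIV"}

def gen_three_address(postfix_tokens):
    code, stack, temp = [], [], 0
    for tok in postfix_tokens:
        if tok in OPCODES:
            right, left = stack.pop(), stack.pop()
            temp += 1
            result = f"temp{temp}"
            code.append(f"{OPCODES[tok]}  {left}, {right}, {result}")
            stack.append(result)
        elif tok in (":=", "="):                 # assignment: store into the target
            value, target = stack.pop(), stack.pop()
            code.append(f"MOV   {value}, , {target}")
        else:                                    # operand (identifier or literal)
            stack.append(tok)
    return code

# id1 := id2 * id3 + 1   (left postfix: id1 id2 id3 * 1 + :=)
print("\n".join(gen_three_address("id1 id2 id3 * 1 + :=".split())))
# MULT  id2, id3, temp1
# ADD   temp1, 1, temp2
# MOV   temp2, , id1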
Optimizer
• Optimization aims to improve the code in order to reduce the running time and the required memory (this phase can be very complex).
Example of optimized intermediate code (the three instructions of Example 2 above are reduced to two):
    MULT    id2, id3, temp1
    ADD     temp1, #1, id1
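The transformation shown above (folding the final MOV into the preceding ADD so that temp2 disappears) can be obtained with a tiny peephole pass. The sketch below is an assumption, not the slides' optimizer; it works on the textual three-address form of the examples and uses a deliberately crude liveness test.

# Peephole optimization on three-address code: if "OP a, b, tX" is immediately
# followed by "MOV tX, , dst" and tX is not used afterwards, write OP's result
# directly into dst and drop the MOV.
def peephole(code):
    out, i = [], 0
    while i < len(code):
        if i + 1 < len(code):
            op, args = code[i].split(None, 1)
            nxt = code[i + 1].split()
            result = args.split(",")[-1].strip()
            if (nxt[0] == "MOV" and nxt[1].rstrip(",") == result
                    and not any(result in later for later in code[i + 2:])):
                dst = nxt[-1]
                out.append(f"{op} {args.rsplit(',', 1)[0]}, {dst}")
                i += 2                      # the MOV has been absorbed
                continue
        out.append(code[i])
        i += 1
    return out

code = ["MULT id2, id3, temp1",
        "ADD temp1, #1, temp2",
        "MOV temp2, , id1"]
print(peephole(code))
# ['MULT id2, id3, temp1', 'ADD temp1, #1, id1']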
Program synthesis (1/2)
• Little of the nature of the target machine needs to be manifest during the generation and optimization of intermediate code.
• Detailed information on the target machine architecture (available instructions, characteristics, etc.) is instead used in the final phase of target machine code generation.
• In simple non-optimizing compilers, the translator generates the target code directly, without using an IR.
• In more sophisticated compilers, a high-level (source-oriented) IR is generated first, then the code is translated to a low-level (target-oriented) IR.
• This approach allows a clear separation of the dependencies derived from the source and from the target.
Program synthesis (2/2)
• The IR code is analyzed and transformed into equivalent IR code optimized for a specific architecture.
• Actually, the name “optimization” is misleading: the produced code is not always the best possible translation of the IR code:
– Some optimizations cannot be applied in some circumstances, as they are undecidable problems. For example, the problem of the removal of “dead” code cannot be solved in general.
– Other optimizations are simply too expensive. This concerns NP-hard problems, which are believed to be inherently exponential. Register allocation to variables is an example of an NP-hard problem.
• Optimization can be complex: it can involve several steps, which may need to be executed many times.
• Optimization can slow down translation. Nevertheless, a well-designed optimizer can increase the program execution speed significantly, by moving operations or deleting unneeded ones.
Code generation
• IR code is mapped to machine code by the code generator, which produces the target language for a particular architecture.
• The target program is typically a relocatable binary object containing machine code.
• Example (assume an architecture in which at most one operand of each instruction is a register):
    MOVE    id2,R1
    MULT    id3,R1
    ADD     #1,R1
    MOVE    R1,id1
• This phase uses detailed information on the target machine and includes optimizations tied to the specific machine, such as register allocation and code scheduling.
• The code generator can be rather complex, as many particular cases have to be considered in order to produce good target code.
• Automated code generators can be used. The basic approach is defining templates matching low-level IR instructions with target instructions.
• A famous compiler using automated code generation is GNU C, a strongly optimizing compiler exploiting description files containing machine architecture descriptions for more than ten CPU architectures and at least two languages (C and C++).
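A sketch (illustrative only, not the slides' generator) of a naive code generator mapping the optimized three-address code of the earlier example to the two-address register machine used above; register allocation is reduced to a single register R1, an assumption made for brevity.

# Naive code generation: map three-address intermediate code to a two-address
# target in which one operand of each instruction is the register R1.
TARGET_OP = {"MULT": "MULT", "ADD": "ADD"}     # minimal opcode table for the example

def generate(three_address):
    asm, reg_holds = [], None                  # reg_holds: name currently cached in R1
    for instr in three_address:
        op, rest = instr.split(None, 1)
        a, b, dst = [x.strip() for x in rest.split(",")]
        if op == "MOV":
            asm.append(f"MOVE  R1,{dst}" if reg_holds == a else f"MOVE  {a},{dst}")
        else:
            if reg_holds != a:                 # load the first operand if not already in R1
                asm.append(f"MOVE  {a},R1")
            asm.append(f"{TARGET_OP[op]}  {b},R1")
            reg_holds = dst                    # R1 now holds the temporary result
    return asm

code = ["MULT id2, id3, temp1",
        "ADD  temp1, #1, temp2",
        "MOV  temp2, , id1"]
print("\n".join(generate(code)))
# MOVE  id2,R1
# MULT  id3,R1
# ADD  #1,R1
# MOVE  R1,id1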
Symbol table(s)
• The compiler uses one or more symbol tables as data structures to store information about identifiers and constructs of the source language. This information is collected during the analysis phase and shared by all compilation steps.
• Each time an identifier is used, the symbol table allows accessing information about it, collected when its declaration was processed.
• It may seem more natural that the scanner – being the first component to process lexemes – associates a lexeme with an element of the symbol table. Actually, in many cases it is the parser that does so, because it can distinguish among the different declarations and occurrences of an identifier.
• Frequently, each program scope (e.g. in C, each code block) is associated with a specific symbol table.
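A minimal sketch (not from the slides) of per-scope symbol tables built on hash tables (Python dictionaries), following the last point: each block gets its own table that chains to the enclosing scope.

# One symbol table per scope (e.g. per C code block), implemented with hash tables.
class SymbolTable:
    def __init__(self, parent=None):
        self.symbols = {}            # name -> attributes (type, etc.)
        self.parent = parent         # enclosing scope, None for the global scope

    def declare(self, name, **attrs):
        if name in self.symbols:
            raise NameError(f"'{name}' already declared in this scope")
        self.symbols[name] = attrs

    def lookup(self, name):
        scope = self
        while scope is not None:     # search the current scope, then the enclosing ones
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise NameError(f"'{name}' used before declaration")

globals_ = SymbolTable()
globals_.declare("oldval", type="int")
block = SymbolTable(parent=globals_)       # entering a new code block
block.declare("newval", type="int")
print(block.lookup("oldval"))              # found in the enclosing scope: {'type': 'int'}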
Tools for compiler design
• General-purpose software development tools + specialized
tools.
• Tools for the automatic design of compiler components:
– scanner generators: produce lexical analyzers;
– parser generators: produce syntax analyzers;
– syntax-directed translator engines: produce routine collections which
traverse the parse tree and generate intermediate code;
– automatic code generators: they translate intermediate language to
machine language through template matching rules;
– data-flow engines: data-flow analyzers optimize the code by means of
information about the way data are propagated among the different parts
of a program.
Useful formalisms for compiler design
• Lexical analysis
– Regular Grammars
– Finite State Automata
– Regular Expressions
• Syntax analysis
– Context-Free Grammars (although not every element in a programming language can be treated as a context-free grammar)
– CFG with O(n³) or lower complexity
– Push Down Automata
• Semantic analysis
– Syntax-directed translation
– Attribute grammars
– More sophisticated approaches
• Code generation
– Pattern matching
– Heuristics
– Ad-hoc solutions
• Symbol table
– Hashing functions