Practical class: regular expressions and Automata

Master’s Degree Course in Computer Engineering
Formal Languages and Compilers
PRACTICAL CLASS: Regular expressions & automata
Eliana Bove – [email protected]
A.Y. 2015/2016
DEI – Politecnico di Bari
Recall: regular expressions
A regular expression r, defined over an alphabet Σ and a set of
metacharacters {+, *, (, ), ., Ø, ε} assumed not to belong to Σ, is a
string in the alphabet Σ ∪ {+, *, (, ), ., Ø, ε} such that one of the
following conditions holds:
1. r = Ø
2. r = ε
3. r = x, where x is any character ∈ Σ
4. r = (s+t), where s and t are regular expressions on Σ and + is the
union operator
5. r = s.t, where s and t are regular expressions on Σ and . is the
concatenation operator
6. r = s*, where s is a regular expression on Σ and * is the Kleene
closure operator
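Since the six cases form a recursive definition, the language L(r) can be computed case by case. The sketch below is an illustration only: the class names are my own, not the course's, and because L(s*) is infinite as soon as L(s) contains a non-empty string, lang returns only the strings of L(r) up to a length bound n.

```python
from dataclasses import dataclass
from typing import Union as OneOf

# One class per case of the definition (class names are illustrative).
@dataclass(frozen=True)
class EmptySet:          # r = Ø
    pass

@dataclass(frozen=True)
class Eps:               # r = ε
    pass

@dataclass(frozen=True)
class Sym:               # r = x, with x ∈ Σ
    x: str

@dataclass(frozen=True)
class Union:             # r = (s+t)
    s: "Regex"
    t: "Regex"

@dataclass(frozen=True)
class Concat:            # r = s.t
    s: "Regex"
    t: "Regex"

@dataclass(frozen=True)
class Star:              # r = s*
    s: "Regex"

Regex = OneOf[EmptySet, Eps, Sym, Union, Concat, Star]

def lang(r: Regex, n: int) -> set:
    """Return the strings of L(r) whose length is at most n."""
    if isinstance(r, EmptySet):
        return set()                                   # L(Ø) = {}
    if isinstance(r, Eps):
        return {""}                                    # L(ε) = {ε}
    if isinstance(r, Sym):
        return {r.x} if len(r.x) <= n else set()       # L(x) = {x}
    if isinstance(r, Union):
        return lang(r.s, n) | lang(r.t, n)             # L(s) ∪ L(t)
    if isinstance(r, Concat):                          # L(s)L(t)
        return {u + v for u in lang(r.s, n) for v in lang(r.t, n) if len(u + v) <= n}
    if isinstance(r, Star):                            # {ε} ∪ L(s) ∪ L(s)L(s) ∪ ...
        base, result, frontier = lang(r.s, n), {""}, {""}
        while frontier:                                # stop when no new short strings appear
            frontier = {u + v for u in frontier for v in base if len(u + v) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")
```

For example, lang(Concat(Star(Sym("a")), Sym("b")), 3) gives {"b", "ab", "aab"}.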
Practical class: Regular expressions and automata – Eliana Bove
Regular expressions:
exercise 1
Exercise 1: C language identifiers
C language identifiers → names that refer to variables, functions,
labels, and other user-defined entities.
An identifier is composed of one or more characters and must comply with
the following rules:
• the first character must be a letter or an underscore ( _ );
• the subsequent characters can be letters, digits or underscores.
Give a definition through regular expressions.
Exercise 1: C language identifiers (solution)
• letter_ → A | B | … | Z | a | b | … | z | _
• digit → 0 | 1 | … | 9
• id → letter_ (letter_ | digit)*
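Read as a recognizer, the definition checks the first character against letter_ and every remaining character against letter_ | digit. A minimal Python sketch (the function name is_id is illustrative); like the regular definition, it checks only the lexical shape, so reserved words such as int are accepted too.

```python
import string

LETTER_ = set(string.ascii_letters) | {"_"}   # letter_ → A | … | Z | a | … | z | _
DIGIT = set(string.digits)                      # digit → 0 | 1 | … | 9

def is_id(s: str) -> bool:
    """True iff s belongs to L(letter_ (letter_ | digit)*)."""
    if not s or s[0] not in LETTER_:          # exactly one leading letter_
        return False
    return all(c in LETTER_ or c in DIGIT for c in s[1:])   # then (letter_ | digit)*
```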
Regular expressions:
extensions
Extensions of regular expressions
Expression   Meaning
a*           a repeated 0 or more times
a?           a repeated 0 or 1 time
a+           a repeated 1 or more times
a{1,3}       a repeated from 1 to 3 times
a{3,}        a repeated at least 3 times
[abc]        any character of a, b or c
[0-9]        any of the characters in the range
[^a-z]       any character outside the range
.            any character
^            beginning of line
$            end of line
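These operators carry over almost unchanged to Python's re module, which is handy for experimenting. The sketch below pairs each table entry with one string it matches in full and one it rejects; details vary between regex engines (in Python, . excludes the newline unless re.DOTALL is set, and ^/$ match at every line only with re.MULTILINE).

```python
import re

# (pattern from the table, a string it matches in full, a string it rejects)
EXAMPLES = [
    (r"a*", "", "b"),
    (r"a?", "a", "aa"),
    (r"a+", "aaa", ""),
    (r"a{1,3}", "aaa", "aaaa"),
    (r"a{3,}", "aaaa", "aa"),
    (r"[abc]", "c", "d"),
    (r"[0-9]", "7", "x"),
    (r"[^a-z]", "Q", "q"),
    (r".", "#", "\n"),          # Python's '.' excludes newline unless re.DOTALL is set
]

def check_examples() -> None:
    for pattern, good, bad in EXAMPLES:
        assert re.fullmatch(pattern, good), (pattern, good)
        assert not re.fullmatch(pattern, bad), (pattern, bad)

# ^ and $ match at every line boundary only when re.MULTILINE is given
TEXT = "int x;\nfloat y;\n"
first_words = re.findall(r"^\w+", TEXT, flags=re.MULTILINE)   # word at the start of each line
line_ends = re.findall(r";$", TEXT, flags=re.MULTILINE)       # semicolon at the end of each line
```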
Regular expressions:
exercise 2
Exercise 2: C language identifiers
Solve Exercise 1 using the extensions in the table.
Exercise 2: C language identifiers (solution)
• letter_ → [A-Za-z_]
• digit → [0-9]
• id → letter_ (letter_ | digit)*
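The extended definition maps directly onto a Python re pattern. A sketch, with illustrative names: fullmatch tests a whole string, while findall scans a line of source text for every identifier-shaped lexeme.

```python
import re

LETTER_ = r"[A-Za-z_]"                                   # letter_ → [A-Za-z_]
DIGIT = r"[0-9]"                                         # digit → [0-9]
ID = re.compile(rf"{LETTER_}(?:{LETTER_}|{DIGIT})*")    # id → letter_ (letter_ | digit)*

def is_identifier(s: str) -> bool:
    # fullmatch: the whole string must be an id, not just a prefix of it
    return ID.fullmatch(s) is not None

# findall scans left to right and returns every non-overlapping id
tokens = ID.findall("int _count = x1 + 2y;")
```

Here tokens is ["int", "_count", "x1", "y"]: in 2y the scan skips the digit and reports y alone, whereas a real lexer tries all token patterns at each position and would recognize 2 as a number first.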
Regular expressions:
exercise 3
Exercise 3: numbers in C
Unsigned integer and floating-point constants are strings such as 5280,
0.0123, 6.336E4 or 1.89E-4. Define them through regular expressions.
Exercise 3: numbers in C (solution)
• digit → [0-9]
• digits → digit digit*
• optionalFraction → . digits | ε
• optionalExponent → ( E (+ | - | ε) digits ) | ε
• number → digits optionalFraction optionalExponent
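Because the parts of number occur in a fixed order, each either mandatory or optional, a scanner can be hand-written by reading the input left to right, one part at a time. A minimal Python sketch (names are illustrative); like the definition, it accepts only an upper-case E and no leading sign.

```python
DIGIT_CHARS = "0123456789"

def match_number(s: str) -> bool:
    """Recognize digits optionalFraction optionalExponent, reading s left to right."""
    pos = 0

    def digits() -> bool:
        # digits → digit digit*: consume one or more digits
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] in DIGIT_CHARS:
            pos += 1
        return pos > start

    if not digits():                           # integer part is mandatory
        return False
    if pos < len(s) and s[pos] == ".":         # optionalFraction → . digits | ε
        pos += 1
        if not digits():
            return False
    if pos < len(s) and s[pos] == "E":         # optionalExponent → (E (+ | - | ε) digits) | ε
        pos += 1
        if pos < len(s) and s[pos] in "+-":
            pos += 1
        if not digits():
            return False
    return pos == len(s)                       # nothing may be left over
```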
Regular expressions:
exercise 4
Exercise 4: numbers in C
Solve Exercise 3 using the regular expression extensions in the
table.
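Exercise 4: numbers in C (solution)
• digit → [0-9]
• digits → digit+
• number → digits (\. digits)? (E [+-]? digits)?
(the dot is escaped as \. because, in the extended notation, a bare . matches any character)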
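The extended definition is a valid Python re pattern once digits is expanded to [0-9]+. The sketch also shows why the dot must be escaped: with a bare ., the pattern wrongly accepts strings such as 5x3.

```python
import re

# number → digits (\. digits)? (E [+-]? digits)?, with digits → [0-9]+
NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?")

# Same pattern with an unescaped dot: '.' now matches ANY character.
SLOPPY = re.compile(r"[0-9]+(?:.[0-9]+)?(?:E[+-]?[0-9]+)?")

def is_number(s: str) -> bool:
    return NUMBER.fullmatch(s) is not None
```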
Regular expressions:
exercise 5
Exercise 5: SQL
Many languages are case-sensitive, so keywords can be written in just
one way, and the regular expressions describing their lexeme patterns are
very simple. Other languages, such as SQL, are case-insensitive: keywords
can be written in lower case, upper case or any mix of the two. For
example, the SELECT keyword can also be written as select, Select or sElEcT.
Use regular expressions to define how keywords are represented in a
case-insensitive language. In particular, show how to write the keyword
«select» in SQL.
Exercise 5: SQL (solution)
• select → [Ss][Ee][Ll][Ee][Cc][Tt]
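Writing one bracket class per letter is mechanical, so the pattern can be generated from the keyword. A small Python sketch (the helper name case_insensitive is hypothetical); Python's re.IGNORECASE flag achieves the same effect without rewriting the pattern.

```python
import re

def case_insensitive(keyword: str) -> str:
    """Build a pattern like [Ss][Ee][Ll][Ee][Cc][Tt] for a keyword made of letters."""
    return "".join(f"[{c.upper()}{c.lower()}]" for c in keyword)

SELECT = re.compile(case_insensitive("select"))

def is_select(word: str) -> bool:
    return SELECT.fullmatch(word) is not None
```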
Regular expressions:
exercises 6 and 7
Exercise 6
Define, through regular expressions, all strings over {a, b} that do not
contain the substring abb.
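Exercise 6 (solution)
• string_ab → b*(a+b?)*
(+ and ? are the extension operators from the table: after each run of a's, at most one b can follow)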
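An answer like this is easy to get subtly wrong, so it is worth checking exhaustively on short strings. The Python sketch below (the helper name is illustrative) compares the pattern with the specification "abb does not occur" on every string over {a, b} up to a given length.

```python
import re
from itertools import product

STRING_AB = re.compile(r"b*(a+b?)*")     # + and ? are the extension operators

def check_up_to(max_len: int) -> int:
    """Compare the pattern with 'does not contain abb' on all strings over {a, b}."""
    checked = 0
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            assert (STRING_AB.fullmatch(s) is not None) == ("abb" not in s), s
            checked += 1
    return checked
```

check_up_to(10) tests all 2047 strings of length 0 to 10.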
Exercise 7
Define, through regular expressions, all strings of lower-case letters in
which each of the five vowels appears and the vowels occur in the order a, e, i, o, u.
Exercise 7 (solution)
• consonants → [bcdfghjklmnpqrstvwxyz]
• string → consonants* a (consonants | a)* e (consonants | e)* i (consonants | i)* o (consonants | o)* u (consonants | u)*
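The same definition, written as a Python re pattern (the names are illustrative), makes it easy to try words: facetious and abstemious match, while education does not, because its vowels are out of order.

```python
import re

CONSONANTS = "[bcdfghjklmnpqrstvwxyz]"
# consonants* a (consonants | a)* e (consonants | e)* i (consonants | i)* o (consonants | o)* u (consonants | u)*
VOWELS_IN_ORDER = re.compile(
    rf"{CONSONANTS}*a(?:{CONSONANTS}|a)*e(?:{CONSONANTS}|e)*"
    rf"i(?:{CONSONANTS}|i)*o(?:{CONSONANTS}|o)*u(?:{CONSONANTS}|u)*"
)

def has_vowels_in_order(word: str) -> bool:
    return VOWELS_IN_ORDER.fullmatch(word) is not None
```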
Regular expressions:
further practice
Further practice:
1. Define, through regular expressions, all lower case strings in which
letters are in ascending lexicographic order.
2. Given the alphabet Σ = {0,1,2}, define, through regular expressions, all
the strings on Σ not containing repetitions.
3. Define, through regular expressions, comments spanning one or more lines
(/* comment */). Warning: the end-comment sequence */ is not allowed inside
the comment, unless it is enclosed in double quotes (" ").
Useful link to check the correctness of your
expressions: http://regex101.com/#pcre
Automata:
regexp → NFA
Algorithm: regular expression → NFA (nondeterministic finite-state automaton)
Input: regular expression r on an alphabet Σ
Output: NFA N accepting the language L(r)
Method: to build an NFA from a regular expression, we use two rules:
BASIS (to build an NFA for each subexpression containing no operators) and
INDUCTION (to build the NFA of a larger expression by combining the NFAs of
its subexpressions).
BASIS:
• For the empty string ε, build an NFA with initial state i, final state f and an ε-transition from i to f:
[Figure: start → i --ε--> f]
• For each subexpression consisting of a single symbol a ∈ Σ, build an NFA
with initial state i, final state f and an a-transition from i to f:
[Figure: start → i --a--> f]
INDUCTION:
Suppose N(s) and N(t) are two NFAs for regular expressions s and t, respectively
• For regular expression r=s|t (union of two regular expressions), build the
composite NFA of N(s) and N(t) by inserting a new initial state i, a new final
state f and 4 ε-transitions: two go from i to the initial states of N(s) and N(t)
and two from the final states of N(s) and N(t) to f.
[Figure: start → i; i --ε--> N(s) --ε--> f; i --ε--> N(t) --ε--> f]
N(r) accepts the language L(r) = L(s) ∪ L(t).
• For the regular expression r = st (concatenation of two regular
expressions), build the composite NFA from N(s) and N(t) by merging
the final state of N(s) and the initial state of N(t) into a single state;
the initial state of N(s) becomes the initial state of N(r), and the
final state of N(t) becomes the final state of N(r).
[Figure: start → i --N(s)--> merged state --N(t)--> f]
N(r) accepts the language L(r) = L(s)L(t).
• For the regular expression r = s* (Kleene closure of a regular expression),
build the NFA from N(s) by adding a new initial state i, a new final state
f and 4 ε-transitions: one from i to f, one from i to the initial state of
N(s), one from the final state of N(s) to f and one from the final state
of N(s) to the initial state of N(s); the last one implements the loop.
[Figure: start → i --ε--> N(s) --ε--> f; i --ε--> f; ε-loop from the final state of N(s) back to its initial state]
N(r) accepts the language L(r) = L(s)*.
Properties:
1. Each step of the algorithm introduces at most two new states
2. N(r) has exactly one initial state and one final state
3. No edge enters the initial state
4. No edge leaves the final state
5. Each state of N(r) other than the final state has either one outgoing
a-transition, with a ∈ Σ, or at most two outgoing ε-transitions
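The BASIS and INDUCTION rules translate almost one to one into code. The sketch below is a minimal Python version under my own assumptions: fragments are (start, final, edges) records, ε is encoded as the label None, states are numbered by a global counter, and expressions are built by calling the combinators directly rather than by parsing a string. accepts simulates the resulting NFA with ε-closures.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional, Set, Tuple

Edge = Tuple[int, Optional[str], int]     # (from, label, to); label None means ε
_state_ids = count()                      # global counter for fresh state names

@dataclass
class NFA:
    start: int
    final: int
    edges: List[Edge]

def _fresh() -> int:
    return next(_state_ids)

# BASIS
def eps() -> NFA:
    i, f = _fresh(), _fresh()
    return NFA(i, f, [(i, None, f)])

def sym(a: str) -> NFA:
    i, f = _fresh(), _fresh()
    return NFA(i, f, [(i, a, f)])

# INDUCTION
def union(s: NFA, t: NFA) -> NFA:
    i, f = _fresh(), _fresh()
    return NFA(i, f, s.edges + t.edges + [
        (i, None, s.start), (i, None, t.start), (s.final, None, f), (t.final, None, f)])

def concat(s: NFA, t: NFA) -> NFA:
    # merge the initial state of N(t) into the final state of N(s)
    def rename(q: int) -> int:
        return s.final if q == t.start else q
    return NFA(s.start, t.final, s.edges + [(rename(p), a, rename(q)) for p, a, q in t.edges])

def star(s: NFA) -> NFA:
    i, f = _fresh(), _fresh()
    return NFA(i, f, s.edges + [
        (i, None, f), (i, None, s.start), (s.final, None, f), (s.final, None, s.start)])

def states(n: NFA) -> Set[int]:
    return {n.start, n.final} | {p for p, _, _ in n.edges} | {q for _, _, q in n.edges}

def accepts(n: NFA, word: str) -> bool:
    """Simulate N on word, keeping the set of states reachable so far."""
    def closure(current: Set[int]) -> Set[int]:
        stack, seen = list(current), set(current)
        while stack:
            p = stack.pop()
            for src, label, dst in n.edges:
                if src == p and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen
    current = closure({n.start})
    for c in word:
        current = closure({dst for src, label, dst in n.edges if src in current and label == c})
    return n.final in current
```

For example, concat(concat(concat(star(union(sym("a"), sym("b"))), sym("a")), sym("b")), sym("b")) builds an 11-state NFA for (a|b)*abb, and star(concat(union(eps(), sym("a")), star(sym("b")))) builds an 11-state NFA for ((ε|a)b*)*, the expression of Exercise 1.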
Automata:
exercise 1
Exercise 1:
Transform the following regular expression into an NFA: ((ε|a)b*)*
Exercise 1 (solution):
1. First step: empty string ε
start → 1; transition 1 --ε--> 2
2. Second step: string a
start → 3; transition 3 --a--> 4
3. Third step: string b
start → 5; transition 5 --b--> 6
4. Fourth step: ε|a (new initial state 0, new final state 7)
start → 0; transitions 0 --ε--> 1, 0 --ε--> 3, 1 --ε--> 2, 3 --a--> 4, 2 --ε--> 7, 4 --ε--> 7
5. Fifth step: b* (new initial state 9, new final state 8)
start → 9; transitions 9 --ε--> 5, 9 --ε--> 8, 5 --b--> 6, 6 --ε--> 5, 6 --ε--> 8
6. Sixth step: (ε|a)b* (the final state 7 of ε|a is merged with the initial state 9 of b*, keeping the name 7)
start → 0; transitions 0 --ε--> 1, 0 --ε--> 3, 1 --ε--> 2, 3 --a--> 4, 2 --ε--> 7, 4 --ε--> 7, 7 --ε--> 5, 7 --ε--> 8, 5 --b--> 6, 6 --ε--> 5, 6 --ε--> 8; final state 8
7. Seventh step: ((ε|a)b*)* (new initial state S, new final state E)
start → S; all transitions of the sixth step, plus S --ε--> 0, S --ε--> E, 8 --ε--> 0, 8 --ε--> E; final state E
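One way to double-check the result is to simulate the final automaton. The Python sketch below transcribes the transitions of the final NFA (states S, 0 to 8, E; None stands for ε) and checks that it accepts every string over {a, b}, as it should, because ((ε|a)b*)* denotes the same language as (a|b)*. The encoding and helper names are illustrative.

```python
from itertools import product

# Final NFA for ((ε|a)b*)*, transcribed from the seventh step (None = ε)
EDGES = [
    ("S", None, "0"), ("S", None, "E"),
    ("0", None, "1"), ("0", None, "3"), ("1", None, "2"), ("3", "a", "4"),
    ("2", None, "7"), ("4", None, "7"),
    ("7", None, "5"), ("7", None, "8"),
    ("5", "b", "6"), ("6", None, "5"), ("6", None, "8"),
    ("8", None, "0"), ("8", None, "E"),
]
START, FINAL = "S", "E"

def eps_closure(states: set) -> set:
    stack, closure = list(states), set(states)
    while stack:
        p = stack.pop()
        for src, label, dst in EDGES:
            if src == p and label is None and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure

def accepts(word: str) -> bool:
    current = eps_closure({START})
    for c in word:
        current = eps_closure({dst for src, label, dst in EDGES if src in current and label == c})
    return FINAL in current

def accepts_everything_up_to(max_len: int) -> bool:
    """((ε|a)b*)* equals (a|b)*, so every string over {a, b} must be accepted."""
    return all(accepts("".join(w)) for n in range(max_len + 1) for w in product("ab", repeat=n))
```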
Automata:
further practice
Further practice:
Transform the following regular expressions into NFAs.
1. (a|b)*
2. (a*|b*)*
3. (a|b)*abb(a|b)*
Automata:
NFA → DFA
Algorithm: NFA → DFA (deterministic finite-state automaton)
Input: NFA N
Output: DFA D accepting the same language as N
Method: build a transition table Dtran for D, where every state of D is a
set of states of N. Dtran is built so that D simulates "in parallel" all
the moves N can make on an input string. The algorithm uses the following
operations:
1. ε-closure(s) → set of states of N (including s itself) reachable
from state s of N through edges labeled ε alone.
2. ε-closure(T) → set of states of N reachable from some state in the set T
through edges labeled ε alone (the union of ε-closure(s) over all states s in T).
3. move(T,a) → set of states of N to which there is a transition on the
input symbol a from some state s in T
Initially, ε-closure(s0), where s0 is the initial state of N, is the only
state in Dstates, and it is unmarked. The algorithm then builds the set of
states of D (Dstates) and the transition table Dtran:
while (there is an unmarked state T in Dstates) {
    mark T;
    for (each input symbol a) {
        U = ε-closure(move(T, a));
        if (U ∉ Dstates)
            add U as an unmarked state to Dstates;
        Dtran[T, a] = U;
    }
}
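The loop above can be sketched in Python as follows. Assumptions (mine, not the slides'): NFA edges are (from, label, to) triples with None for ε, unmarked states are processed in first-in first-out order, and when move(T, a) is empty the empty set is kept as an ordinary (dead) DFA state, just as the pseudocode would do.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Edge = Tuple[int, Optional[str], int]          # (from, label, to); label None means ε
DState = FrozenSet[int]                        # a DFA state is a set of NFA states

def eps_closure(T: Set[int], edges: List[Edge]) -> DState:
    """ε-closure(T), computed with a stack."""
    closure, stack = set(T), list(T)
    while stack:
        t = stack.pop()
        for src, label, u in edges:
            if src == t and label is None and u not in closure:
                closure.add(u)
                stack.append(u)
    return frozenset(closure)

def move(T: DState, a: str, edges: List[Edge]) -> Set[int]:
    """States reachable from some state in T through one a-transition."""
    return {dst for src, label, dst in edges if src in T and label == a}

def subset_construction(s0: int, alphabet: str, edges: List[Edge]):
    """Return (Dstates in discovery order, Dtran); Dstates[0] is the initial state of D."""
    dstates: List[DState] = [eps_closure({s0}, edges)]
    unmarked = [dstates[0]]
    dtran: Dict[Tuple[DState, str], DState] = {}
    while unmarked:                             # while there is an unmarked T in Dstates
        T = unmarked.pop(0)                     # mark T
        for a in alphabet:
            U = eps_closure(move(T, a, edges), edges)
            if U not in dstates:
                dstates.append(U)               # add U as an unmarked state
                unmarked.append(U)
            dtran[(T, a)] = U
    return dstates, dtran
```

For the NFA of a|b with edges 0 -ε-> 1, 0 -ε-> 3, 1 -a-> 2, 3 -b-> 4, 2 -ε-> 5, 4 -ε-> 5, the initial DFA state is {0, 1, 3}, and the construction also produces {2, 5}, {4, 5} and the empty dead state.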
Computing ε-closure(T):
push all states of T onto a stack;
initialize ε-closure(T) to T;
while (stack is not empty) {
    pop t, the top element, off stack;
    for (each state u with an edge from t to u labeled ε) {
        if (u ∉ ε-closure(T)) {
            add u to ε-closure(T);
            push u onto stack;
        }
    }
}
In the worst case, NFA → DFA conversion is exponential: a DFA state is a
set of NFA states, so an NFA with n states can yield up to 2^n DFA states.
For the lexical patterns of real programming languages this behavior is
not observed in practice (the NFA and the DFA have approximately the same
number of states).
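The exponential worst case is easy to observe on a standard textbook family, chosen here for illustration and not taken from the slides: (a|b)*a(a|b)^(n-1), the strings whose n-th symbol from the end is a. The NFA has n + 1 states, yet the subset construction reaches 2^n DFA states. For brevity the sketch builds the NFA by hand, without ε-transitions.

```python
from typing import Dict, FrozenSet, Set, Tuple

def nth_from_end_nfa(n: int) -> Dict[Tuple[int, str], Set[int]]:
    """NFA (no ε-moves) for (a|b)*a(a|b)^(n-1): states 0..n, initial 0, final n."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}     # state 0 loops and guesses where the a is
    for i in range(1, n):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta

def count_dfa_states(delta: Dict[Tuple[int, str], Set[int]], start: int = 0, alphabet: str = "ab") -> int:
    """Subset construction without ε-moves; returns the number of reachable DFA states."""
    first: FrozenSet[int] = frozenset({start})
    seen = {first}
    todo = [first]
    while todo:
        T = todo.pop()
        for a in alphabet:
            U = frozenset(q for p in T for q in delta.get((p, a), ()))
            if U not in seen:
                seen.add(U)
                todo.append(U)
    return len(seen)

sizes = {n: count_dfa_states(nth_from_end_nfa(n)) for n in range(1, 9)}
```

The NFA grows by one state per unit of n, while the number of reachable DFA states doubles.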
Automata:
exercise 2
Exercise 2:
The graph shows an NFA accepting strings represented by the following
regular expression: (a | b)* abb. The input alphabet is {a, b}. Transform
the NFA into a DFA.
1. First step:
• NFA initial state 0 → A = ε-closure(0) = {0, 1, 2, 4, 7}
• DFA state → A
• Dtran[A, a] = ε-closure(move(A, a)) = ε-closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8} = B
• Dtran[A, b] = ε-closure(move(A, b)) = ε-closure({5}) = {1, 2, 4, 5, 6, 7} = C
2. Second step:
• DFA state → B
• Dtran[B, a] = ε-closure(move(B, a)) = ε-closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8} = B
• Dtran[B, b] = ε-closure(move(B, b)) = ε-closure({5, 9}) = {1, 2, 4, 5, 6, 7, 9} = D
3. Third step:
• DFA state → C
• Dtran[C, a] = ε-closure(move(C, a)) = ε-closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8} = B
• Dtran[C, b] = ε-closure(move(C, b)) = ε-closure({5}) = {1, 2, 4, 5, 6, 7} = C
4. Fourth step:
• DFA state → D
• Dtran[D, a] = ε-closure(move(D, a)) = ε-closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8} = B
• Dtran[D, b] = ε-closure(move(D, b)) = ε-closure({5, 10}) = {1, 2, 4, 5, 6, 7, 10} = E
5. Fifth step:
• DFA state → E
• Dtran[E, a] = ε-closure(move(E, a)) = ε-closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8} = B
• Dtran[E, b] = ε-closure(move(E, b)) = ε-closure({5}) = {1, 2, 4, 5, 6, 7} = C

              Input symbol
DFA state     a      b
A             B      C
B             B      D
C             B      C
D             B      E
E             B      C
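The whole table can be reproduced mechanically. The figure with the NFA is not part of this text, so the Python sketch below assumes the usual construction for (a | b)*abb with states 0 to 10 and final state 10; these edges agree with every ε-closure computed in the steps above. DFA states are named A, B, C, ... in the order they are discovered.

```python
# NFA for (a | b)*abb with states 0..10 and final state 10 (None = ε).
# Assumption: the figure is not reproduced here, so these edges are the usual
# construction for this expression; they agree with every ε-closure computed above.
EDGES = [
    (0, None, 1), (0, None, 7), (1, None, 2), (1, None, 4),
    (2, "a", 3), (4, "b", 5), (3, None, 6), (5, None, 6),
    (6, None, 1), (6, None, 7), (7, "a", 8), (8, "b", 9), (9, "b", 10),
]
FINAL = 10

def eps_closure(T):
    closure, stack = set(T), list(T)
    while stack:
        t = stack.pop()
        for src, label, u in EDGES:
            if src == t and label is None and u not in closure:
                closure.add(u)
                stack.append(u)
    return frozenset(closure)

def build_dfa(alphabet="ab"):
    """Subset construction; DFA states are named A, B, C, ... in discovery order."""
    start = eps_closure({0})
    names = {start: "A"}
    queue = [start]
    table = {}
    while queue:
        T = queue.pop(0)
        for a in alphabet:
            U = eps_closure({dst for src, label, dst in EDGES if src in T and label == a})
            if U not in names:
                names[U] = chr(ord("A") + len(names))
                queue.append(U)
            table[(names[T], a)] = names[U]
    accepting = {name for S, name in names.items() if FINAL in S}
    return names, table, accepting

def dfa_accepts(table, accepting, word):
    state = "A"
    for c in word:
        state = table[(state, c)]
    return state in accepting

names, table, accepting = build_dfa()
```

Under this assumption, E is the only accepting state (it contains 10), and the resulting table coincides with the one above.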
Diagram of the resulting DFA:
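After NFA → DFA conversion, the output DFA may be minimized further. In this
example, states A and C are both non-accepting and have the same transitions
(B on a, C on b), so they can be merged. E has the same transitions as well,
but it cannot be merged with them because it is accepting.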
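Merging A and C is a special case of DFA minimization. A common systematic method, not covered in the slides, is partition refinement (Moore's algorithm): start by separating accepting from non-accepting states, then repeatedly split any group whose members move into different groups on some symbol. The minimal Python sketch below applies it to the table above; the only states it groups together are A and C.

```python
# Dtran of the DFA from Exercise 2; E is the only accepting state.
DTRAN = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
ACCEPTING = {"E"}

def minimize(dtran, accepting, alphabet="ab"):
    """Partition refinement: map each state to the id of its equivalence class."""
    group = {s: int(s in accepting) for s in dtran}     # start from {non-accepting}, {accepting}
    while True:
        # two states stay together only if they are in the same group and,
        # on every symbol, move to the same group
        signature = {s: (group[s],) + tuple(group[dtran[s][a]] for a in alphabet) for s in dtran}
        ids = {}
        refined = {s: ids.setdefault(signature[s], len(ids)) for s in dtran}
        if len(ids) == len(set(group.values())):          # no group was split: stable
            return refined
        group = refined

def classes(mapping):
    blocks = {}
    for state, g in mapping.items():
        blocks.setdefault(g, set()).add(state)
    return sorted(sorted(b) for b in blocks.values())
```

Here classes(minimize(DTRAN, ACCEPTING)) yields the blocks {A, C}, {B}, {D} and {E}, so the minimized DFA has four states.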
Automata:
further practice
Further practice:
1. Transform the following NFA into a DFA.
2. Transform the following NFA into a DFA.