Practical class: regular expressions and automata
Master’s Degree Course in Computer Engineering
Formal Languages and Compilers
PRACTICAL CLASS: Regular expressions & automata
Eliana Bove – [email protected]
A.Y. 2015/2016, DEI – Politecnico di Bari

Recall: regular expressions

A regular expression r, defined upon an alphabet Σ and a set of metacharacters {+, *, (, ), ., Ø, ε} supposed not to be in Σ, is a string over the alphabet Σ ∪ {+, *, (, ), ., Ø, ε} such that one of the following conditions holds:
1. r = Ø
2. r = ε
3. r = x, where x is any character in Σ
4. r = (s+t), where s and t are regular expressions on Σ and + is the union operator
5. r = s.t, where s and t are regular expressions on Σ and . is the concatenation operator
6. r = s*, where s is a regular expression on Σ and * is the Kleene closure operator

Regular expressions: exercise 1

Exercise 1: C language identifiers
C language identifiers are names that refer to variables, functions, labels and other user-defined entities. An identifier is composed of one or more characters and must comply with the following rules:
• the first character must be a letter or an underscore ( _ );
• the subsequent characters can be letters, digits or underscores.
Give a definition through regular expressions.

Solution:
• letter_ → A | B | … | Z | a | b | … | z | _
• digit → 0 | 1 | … | 9
• id → letter_ (letter_ | digit)*

Regular expressions: extensions

Extensions of regular expressions

Expression   Meaning
a*           a repeated 0 or more times
a?           a repeated 0 or 1 time
a+           a repeated 1 or more times
a{1,3}       a repeated from 1 to 3 times
a{3,}        a repeated at least 3 times
[abc]        any one of the characters a, b or c
[0-9]        any one of the characters in the range
[^a-z]       any character outside the range
.            any character
^            beginning of line
$            end of line

Regular expressions: exercise 2

Exercise 2: C language identifiers
Solve Exercise 1 using the extensions in the table.

Solution:
• letter_ → [A-Za-z_]
• digit → [0-9]
• id → letter_ (letter_ | digit)*

Regular expressions: exercise 3

Exercise 3: numbers in C
Unsigned integer or floating-point numbers are strings like 5280, 0.0123, 6.336E4 or 1.89E-4. Provide a definition through regular expressions.

Solution:
• digit → [0-9]
• digits → digit digit*
• optionalFraction → . digits | ε
• optionalExponent → ( E (+ | - | ε) digits ) | ε
• number → digits optionalFraction optionalExponent

Regular expressions: exercise 4

Exercise 4: numbers in C
Solve Exercise 3 using the regular expression extensions in the table.

Solution:
• digit → [0-9]
• digits → digit+
• number → digits (\. digits)? (E [+-]? digits)?
(The decimal point is written \. because, in the extended notation, an unescaped . matches any character.)

Regular expressions: exercise 5

Exercise 5: SQL
Many languages are case sensitive, thus keywords can be written in just one way, and the regular expressions describing their lexeme patterns are very simple.
Some languages, like SQL, are instead case insensitive, thus keywords can be written in lower case, upper case or any combination of the two. For example, the SELECT keyword can also be written as select, Select or sElEcT. Use regular expressions to define how keywords are represented in a case-insensitive language. In particular, show how to write the keyword «select» in SQL.

Solution:
• select → [Ss][Ee][Ll][Ee][Cc][Tt]

Regular expressions: exercises 6 and 7

Exercise 6
Define through regular expressions all the strings of a and b not containing the substring abb.

Solution:
• string_ab → b*(a+b?)*
(Here a+ is the extended "1 or more times" operator, not the union operator.)

Exercise 7
Define through regular expressions all the strings of lower case letters containing the 5 vowels in order.

Solution:
• consonants → [bcdfghjklmnpqrstvwxyz]
• string → consonants* a (consonants | a)* e (consonants | e)* i (consonants | i)* o (consonants | o)* u (consonants | u)*

Regular expressions: further practice

Further practice:
1. Define, through regular expressions, all lower case strings in which letters are in ascending lexicographic order.
2. Given the alphabet Σ = {0,1,2}, define, through regular expressions, all the strings on Σ not containing repetitions.
3. Define, through regular expressions, comments on one or more lines (/* comment */). Warning: inside the comment string the end-comment character sequence is not admitted, unless it is enclosed within " ".

Useful link to check the correctness of your expressions: http://regex101.com/#pcre
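The solutions to the regular expression exercises can be checked mechanically. The following is a minimal sketch using Python's standard `re` module; the constant names and the helpers `accepts` and `check_no_abb` are hypothetical names chosen for illustration, and the decimal point in the number pattern is escaped as an assumption about the intended meaning (a literal dot).

```python
import re
from itertools import product

# Exercise 2: letter_ (letter_ | digit)*
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")
# Exercise 4: digits (. digits)? (E [+-]? digits)?  (dot escaped: literal point)
NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?")
# Exercise 5: case-insensitive keyword "select"
SELECT = re.compile(r"[Ss][Ee][Ll][Ee][Cc][Tt]")
# Exercise 6: strings of a and b without the substring abb
NO_ABB = re.compile(r"b*(a+b?)*")
# Exercise 7: lower case strings containing the 5 vowels in order
C = "[bcdfghjklmnpqrstvwxyz]"
VOWELS_IN_ORDER = re.compile(
    f"{C}*a({C}|a)*e({C}|e)*i({C}|i)*o({C}|o)*u({C}|u)*"
)


def accepts(pattern, text):
    """True if the whole string belongs to the language of the pattern."""
    return pattern.fullmatch(text) is not None


def check_no_abb(max_len=8):
    """Compare NO_ABB with a direct substring test on all short strings."""
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if accepts(NO_ABB, s) != ("abb" not in s):
                return False
    return True


if __name__ == "__main__":
    for sample in ["5280", "0.0123", "6.336E4", "1.89E-4"]:
        print(sample, accepts(NUMBER, sample))
    print("exercise 6 agrees with brute force:", check_no_abb())
```

`fullmatch` is used instead of `search` because the exercises define whole strings, not substrings. The brute-force comparison in `check_no_abb` only covers strings up to a fixed length, so it is a sanity check rather than a proof.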
Automata: regexp → NFA

Algorithm: regular expression → NFA (nondeterministic finite-state automaton)
Input: regular expression r on an alphabet Σ
Output: NFA N accepting the language L(r)
Method: to build an NFA starting from a regular expression, we use two rules: BASIS (to build separate NFAs for each subexpression) and INDUCTION (to build the final NFA by combining other NFAs).

BASIS:
• For the empty string ε, build an NFA with initial state i, final state f and an ε-transition from i to f.
  [Diagram: start → i --ε--> f]
• For each subexpression a in the alphabet Σ, build an NFA with initial state i, final state f and an a-transition from i to f.
  [Diagram: start → i --a--> f]

INDUCTION: Suppose N(s) and N(t) are two NFAs for regular expressions s and t, respectively.
• For the regular expression r=s|t (union of two regular expressions), build the composite NFA of N(s) and N(t) by inserting a new initial state i, a new final state f and 4 ε-transitions: two go from i to the initial states of N(s) and N(t), and two go from the final states of N(s) and N(t) to f.
  [Diagram: start → i; i --ε--> N(s) --ε--> f; i --ε--> N(t) --ε--> f]
  N(r) accepts the language L(r) = L(s) ∪ L(t).
• For the regular expression r=st (concatenation of two regular expressions), build the composite NFA from N(s) and N(t) by merging into a single state the final state of N(s) and the initial state of N(t), taking the initial state of N(s) as the initial state of N(r) and the final state of N(t) as the final state of N(r).
  [Diagram: start → i, N(s) followed by N(t), ending in f]
  N(r) accepts the language L(r) = L(s)L(t).
• For the regular expression r=s* (Kleene closure of a regular expression), build the NFA from N(s) by adding a new initial state i, a new final state f and 4 ε-transitions: one from i to f, one from i to the initial state of N(s), one from the final state of N(s) to f and one from the final state of N(s) to the initial state of N(s); the last one is needed to implement the loop.
  [Diagram: start → i; i --ε--> N(s) --ε--> f; i --ε--> f; final state of N(s) --ε--> initial state of N(s)]
  N(r) accepts the language L(r) = L(s)*.

Properties:
1. Each step of the algorithm introduces at most two new states
2. N(r) has one initial state and one final state
3. No edge enters the initial state
4. No edge exits from the final state
5. Each state of N(r) has either one outgoing a-transition, with a in Σ, or at most two outgoing ε-transitions

Automata: exercise 1

Exercise 1: Transform the following regular expression into an NFA: ((ε|a)b*)*

Solution:
1. First step: empty string ε
   [Diagram: start → 1 --ε--> 2]
2. Second step: string a
   [Diagram: start → 3 --a--> 4]
3. Third step: string b
   [Diagram: start → 5 --b--> 6]
4. Fourth step: ε|a
   [Diagram: new initial state 0 and new final state 7; ε-transitions 0→1, 0→3, 2→7, 4→7]
5. Fifth step: b*
   [Diagram: new initial state 9 and new final state 8; ε-transitions 9→5, 6→8, 9→8, 6→5]
6. Sixth step: (ε|a)b*
   [Diagram: the NFAs of steps 4 and 5 joined by merging final state 7 of N(ε|a) with initial state 9 of N(b*); the merged state is labeled 7; initial state 0, final state 8]
7. Seventh step: ((ε|a)b*)*
   [Diagram: new initial state S and new final state E; ε-transitions S→0, 8→E, S→E, 8→0]

Automata: further practice

Further practice: Transform the following regular expressions into NFAs.
1. (a|b)*
2. (a*|b*)*
3. (a|b)*abb(a|b)*

Automata: NFA → DFA

Algorithm: NFA → DFA (deterministic finite-state automaton)
Input: NFA N
Output: DFA D accepting the same language as N
Method: build a transition table Dtran for D, in which every state of D is a set of states of N. The construction of Dtran simulates "in parallel" all the possible moves of N on an input string.

The algorithm uses the following operations:
1. ε-closure(s): the set of states of N (including s itself) reachable from state s of N through edges labeled ε.
2. ε-closure(T): the set of states reachable from some state in the set T of states of N through edges labeled ε (this operation returns the union of ε-closure(s) over every state s in T).
3. move(T, a): the set of states of N reachable through a transition on the input symbol a from some state s in T.

The algorithm starts with the ε-closure(s0) operation on the initial state s0 of N; the result is the only state initially in Dstates, and it is unmarked. Then it goes on with the construction of the set of states of D (Dstates) and of the transition table Dtran:

    while (there is an unmarked state T in Dstates) {
        mark T;
        for (each input symbol a) {
            U = ε-closure(move(T, a));
            if (U ∉ Dstates)
                add U as an unmarked state to Dstates;
            Dtran[T, a] = U;
        }
    }
Computing ε-closure(T):

    push all states of T onto a stack;
    initialize ε-closure(T) to T;
    while (stack is not empty) {
        pop t, the top element, off stack;
        for (each state u with an edge from t to u labeled ε) {
            if (u ∉ ε-closure(T)) {
                add u to ε-closure(T);
                push u onto stack;
            }
        }
    }

In the worst case, the number of states of the DFA is exponential in the number of states of the NFA, but for the lexical patterns of real programming languages this behavior typically does not occur (the NFA and the DFA have approximately the same number of states).

Automata: exercise 2

Exercise 2: The graph shows an NFA accepting strings represented by the following regular expression: (a | b)* abb. The input alphabet is {a, b}. Transform the NFA into a DFA.
[NFA diagram]

Solution:
1. First step: initial state
• A = ε-closure(0) = {0, 1, 2, 4, 7}
• DFA state A
• Dtran[A, a] = ε-closure(move(A, a)) = ε-closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8} = B
• Dtran[A, b] = ε-closure(move(A, b)) = ε-closure({5}) = {1, 2, 4, 5, 6, 7} = C
2. Second step:
• DFA state B
• Dtran[B, a] = ε-closure(move(B, a)) = ε-closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8} = B
• Dtran[B, b] = ε-closure(move(B, b)) = ε-closure({5, 9}) = {1, 2, 4, 5, 6, 7, 9} = D
3. Third step:
• DFA state C
• Dtran[C, a] = ε-closure(move(C, a)) = ε-closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8} = B
• Dtran[C, b] = ε-closure(move(C, b)) = ε-closure({5}) = {1, 2, 4, 5, 6, 7} = C
4. Fourth step:
• DFA state D
• Dtran[D, a] = ε-closure(move(D, a)) = ε-closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8} = B
• Dtran[D, b] = ε-closure(move(D, b)) = ε-closure({5, 10}) = {1, 2, 4, 5, 6, 7, 10} = E
5. Fifth step:
• DFA state E
• Dtran[E, a] = ε-closure(move(E, a)) = ε-closure({3, 8}) = {1, 2, 3, 4, 6, 7, 8} = B
• Dtran[E, b] = ε-closure(move(E, b)) = ε-closure({5}) = {1, 2, 4, 5, 6, 7} = C

Resulting transition table:

DFA state   Input symbol
            a    b
A           B    C
B           B    D
C           B    C
D           B    E
E           B    C

Diagram of the resulting DFA:
[DFA diagram]

After the NFA → DFA conversion, the output DFA may be minimized further. In this example, we notice that states A and C have the same values for the transition function and that neither contains the NFA state 10 reached after reading abb, so they can be merged.

Automata: further practice

Further practice:
1. Transform the following NFA into a DFA.
[NFA diagram]
2. Transform the following NFA into a DFA.
[NFA diagram]
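The NFA → DFA conversion worked through in Exercise 2 can be reproduced with a short Python sketch. The NFA diagram is not reproduced in the text, so the edge sets `EPS` and `MOVES` below are an assumption: they were chosen to be consistent with the ε-closures and move sets listed in the worked steps. The function names (`eps_closure`, `move`, `subset_construction`, `name_table`) are hypothetical names for illustration.

```python
from collections import deque

# ASSUMED NFA for (a|b)*abb, reconstructed from the worked steps above
EPS = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}  # ε-edges
MOVES = {(2, "a"): {3}, (4, "b"): {5},                   # labeled edges
         (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10}}


def eps_closure(states):
    """States reachable from `states` through ε-edges only (stack-based)."""
    closure = set(states)
    stack = list(states)
    while stack:
        t = stack.pop()
        for u in EPS.get(t, ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return frozenset(closure)


def move(states, symbol):
    """States reachable from some state in `states` on `symbol`."""
    result = set()
    for s in states:
        result |= MOVES.get((s, symbol), set())
    return result


def subset_construction(start, alphabet):
    """Return (Dstates in discovery order, Dtran) for the NFA above."""
    first = eps_closure({start})
    dstates = [first]
    dtran = {}
    unmarked = deque([first])
    while unmarked:
        T = unmarked.popleft()          # "mark" T by removing it from the queue
        for a in alphabet:
            U = eps_closure(move(T, a))
            if U not in dstates:        # an empty U would become a dead state
                dstates.append(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return dstates, dtran


def name_table(dstates, dtran, alphabet):
    """Rename the state sets A, B, C, ... in discovery order."""
    names = {T: chr(ord("A") + i) for i, T in enumerate(dstates)}
    return {names[T]: {a: names[dtran[(T, a)]] for a in alphabet}
            for T in dstates}


if __name__ == "__main__":
    dstates, dtran = subset_construction(0, "ab")
    for state, row in name_table(dstates, dtran, "ab").items():
        print(state, row["a"], row["b"])
```

Processing unmarked states in first-in, first-out order makes the discovery order match the letters A to E used in the exercise; any other order yields the same DFA with different names.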