# CSCE 212 Computer Architecture - Computer Science & E CSCE 531 Compiler Construction Lecture 4 (new improved) Subset construction NFA DFA Topics FLEX Subset Construction Equivalence relations, partitions Readings:

CSCE 531 Spring 2018 1 January 30, 2018 Overview Last Time Thompson Construction: Regular Expression NFA Todays Lecture

Rubular regular expressions online Subset Construction: NFA DFA Minimization of DFA well: distinguished states Lex/Flex References 2 Chapter 2

http://www.gnu.org/software/flex/manual/html_mono/flex.html C introduction Examples CSCE 531 Spring 2018 Rubular.com 3 CSCE 531 Spring 2018 RegEx Quick Reference http://rubular.com/r/11Z1trB1JI

4 CSCE 531 Spring 2018 Python Regular Expressions https://docs.python.org/3.4/library/re.html Pythex: a Python regular expression editor https://pythex.org/ POSIX:

5 https://www.regular-expressions.info/posix.html CSCE 531 Spring 2018 Python Regular Expressions import re import time filename="../../513/513F16.txt" f = open(filename, "r") regexp=re.compile(r'Degree:\s*([A-Z].*)|[1-9][0-9]?.*([A-Z][a-zAZ-]*)') #,\s*([A-Z][a-z]*)') degreeRE=re.compile(r'Degree:\s*([A-Za-z].*)')

nameRE=re.compile(r'[1-9][0-9]?\s*([A-Z][a-zA-Z-]*\,\s*[A-Za-z-]+)') CSCE 531 Spring 2018 6 https://docs.python.org/3.4/library/re.html name="unassigned" for line in f: Python Example Continued Continued m=regexp.match(line)

if m != None: m2 = nameRE.match(line) if m2 != None: name=m2.group(1) m3 = degreeRE.match(line) if m3 != None: print (name, "\t", m3.group(1)) f.close() 7 CSCE 531 Spring 2018 Example 3.14 (a | b)*abb

8 CSCE 531 Spring 2018 Transition Table 9 CSCE 531 Spring 2018 Deterministic Finite Automata (DFA) A deterministic finite automata (DFA) is an NFA such that: 1. There are no moves on input , and 2. For each state s and input symbol a , there is exactly

one edge out of s labeled a . 10 CSCE 531 Spring 2018 NFA transitions versus DFA 1. Not every state an input has a transition (s, a) might be emptys, a) might be empty) might be empty (s, a) might be empty1, a) might be empty) =

2. For a given state and input there can be more than one transition (s, a) might be empty0, a) might be empty) = { 0, 1} 3. Epsilon transitions allowed in NFAs 11 None in this diagram CSCE 531 Spring 2018 Every DFA is an NFA

A DFA is just an NFA that satisfies: 1. No epsilon moves; formally (s, a) might be emptys, ) = 2. For each state s and input symbol a, (s, a) might be emptys, a) might be empty) is a single state; formally | (s, a) might be emptys, a) might be empty) | = 1 So, every language accepted by an DFA is also accepted by an NFA. Earlier we showed every language denoted by a regular expression has an NFA that accepts it. Today we show every language recognized by an NFA has a DFA that recognizes it also. 12 CSCE 531 Spring 2018 So why use an NFA and why use a DFA?

NFA more expressive power DFA more efficient recognizer 13 CSCE 531 Spring 2018 Algorithm 3.18 : Simulating a DFA. INPUT: An input string x terminated by an end-of-file character eof. A DFA D with start state s0 , accepting states F , and transition function move. OUTPUT: Answer yes if D accepts x ; no otherwise. METHOD: Apply the algorithm in Fig. 3.27 to the input string x . The function move( s; c) gives the state to

which there is an edge from state s on input c . The function nextChar returns the next character of the input string x . 14 CSCE 531 Spring 2018 Simulating DFA = recognizing L(DFA) 15 CSCE 531 Spring 2018 Algorithm 3.20 : The subset construction

Algorithm 3.20 : The subset construction of a DFA from an NFA. INPUT: An NFA N . OUTPUT: A DFA D accepting the same language as N . METHOD: 16 Our algorithm constructs a transition table Dtran for D . Each state of D is a set of NFA states, and we construct Dtran so D will simulate in parallel all possible moves N can make on a given input string.

CSCE 531 Spring 2018 Fig 3.31 Operations on NFA states 17 CSCE 531 Spring 2018 Fig. 3.32 the Subset Contruction initially, - closure ( s 0 ) is the only state in Dstates, and it is unmarked; while ( there is an unmarked state T in Dstates ) { mark T ; for ( each input symbol a ) { U = - closure (move( T;a) );

if ( U is not in Dstates ) add U as an unmarked state to Dstates; Dtran[T;a] = U ; } } 18 CSCE 531 Spring 2018 Fig 3.33 Computing -closure of a set 19 CSCE 531 Spring 2018

NFA DFA via Subset Construction d0 = -closure(0) = {0, 1, 2, 4, 7} 20 CSCE 531 Spring 2018 State of DFA a d0 = -closure(0) closure(0) = {0, 1, 2, 4, 7}

d1 = -closure(0) closure(move(d0, a)) =-closure(0) closure( 3, 8} ={3, 8, 6, 7, 1, 2, 4} d1={3, 8, 6, 7, 1, 2, 4} -closure(0) closure(move(d1, a)) =-closure(0) closure( 3, 8} ={3, 8, 6, 7, 1, 2, 4} b d2 = -closure(0) closure(move(d0, b)) =-closure(0) closure( 5 } ={5, 6, 1, 7, 2, 4}

d2={5, 6, 1, 2, 4} 21 CSCE 531 Spring 2018 22 CSCE 531 Spring 2018 23 CSCE 531 Spring 2018 Algorithm 3.22 : Simulating an NFA

INPUT: An input string x terminated by an end-of- le character eof. An NFA N with start state s0 , accepting states F , and transition function move. OUTPUT: Answer yes if N accepts x ; no otherwise. METHOD: The algorithm keeps a set of current states S, those that are reached from s0 following a path labeled by the inputs read so far. If c is the next input character, read by the function nextChar(), then we first compute move( S; c) and then close that set using - closure (). 24 CSCE 531 Spring 2018

Simulating an NFA 25 CSCE 531 Spring 2018 26 CSCE 531 Spring 2018 DFA Minimization Algorithm (Authors Slide) Discover sets of equivalent states Represent each such set with just one state Two states are equivalent (indistinguishable)if and only if: The set of paths leading to them are equivalent

, transitions on lead to equivalent states (DFA) -transitions to distinct sets states must be in distinct sets A partition P of S Each s S is in exactly one set pi P The algorithm iteratively partitions the DFAs states 27 CSCE 531 Spring 2018 Details of the algorithm (Authors Slide) Details of the algorithm Group states into maximal size sets, optimistically

Iteratively subdivide those sets, as needed States that remain grouped together are equivalent Initial partition, P0 , has two sets: {F} & {Q-F} (D =(Q,,,q0,F)) Splitting a set (partitioning a set by a) Assume qa, & qb s, and (qa,a) = qx, & (qb,a) = qy If qx & qy are not in the same set, then s must be split qa has transition on a, qb does not a splits s One state in the final DFA cannot have two transitions on a 28

CSCE 531 Spring 2018 Minimization Algorithm P { F, {Q-F}} while ( P is still changing) T{} The partitioning step is: If (S) contains states in 2 equivalent sets of states. for each set S P for each a partition S by a

into S1, and S2 Add S1 and S2 to T if T P then S PT 29 CSCE 531 Spring 2018 Building Faster Scanners from the DFA (Authors slide) Table-driven recognizers waste a lot of effort

Read (& classify) the next character Find the next state Assign to the state variable Trip through case logic in action() Branch back to the top We can do better Encode state & actions in the code Do transition tests locally Generate ugly, spaghetti-like code Takes (many) fewer operations per input character 30 CSCE 531 Spring 2018

Symbol Tables #define ENDSTR 0 #define MAXSTR 100 #include struct nlist { /* basic table entry */ char *name; struct nlist *next; /*next entry in chain */ int val; }; #define HASHSIZE 100

static struct nlist *hashtab[HASHSIZE]; /* pointer table */ 31 CSCE 531 Spring 2018 The Hash Function /* PURPOSE: Hash determines hash value based on the sum of the character values in the string. USAGE: n = hash(s); DESCRIPTION OF PARAMETERS: s(array of char) string to be hashed AUTHOR: Kernighan and Ritchie LAST REVISION: 12/11/83 */

hash(char *s) /* form hash value for string s */ { int hashval; for (hashval = 0; *s != '\0'; ) hashval += *s++; return (hashval % HASHSIZE); } 32 CSCE 531 Spring 2018 The lookup Function /*PURPOSE: Lookup searches for entry in symbol table and

returns a pointer USAGE: np= lookup(s); DESCRIPTION OF PARAMETERS: s(array of char) string searched for AUTHOR: Kernighan and Ritchie LAST REVISION: 12/11/83*/ struct nlist *lookup(char *s) /* look for s in hashtab */ { struct nlist *np; for (np = hashtab[hash(s)]; np != NULL; np = np->next) if (strcmp(s, np->name) == 0) return(np); /* found it */ return(NULL); /* not found */

} 33 CSCE 531 Spring 2018 The install Function PURPOSE: Install checks hash table using lookup and if entry not found, it "installs" the entry. USAGE: np = install(name); DESCRIPTION OF PARAMETERS: name(array of char) name to install in symbol table AUTHOR: Kernighan and Ritchie, modified by Ron Sobczak LAST REVISION: 12/11/83

*/ 34 CSCE 531 Spring 2018 struct nlist *install(char *name) /* put (name) in hashtab */ { struct nlist *np, *lookup(); char *strdup(), *malloc(); int hashval; if ((np = lookup(name)) == NULL) { /* not found */ np = (struct nlist *) malloc(sizeof(*np)); if (np == NULL)

return(NULL); if ((np->name = strdup(name)) == NULL) return(NULL); hashval = hash(np->name); np->next = hashtab[hashval]; hashtab[hashval] = np; } return(np); } 35 CSCE 531 Spring 2018