cs164: Introduction to Programming Languages and Compilers

cs164: Introduction to Programming Languages and Compilers

Lecture 11 Implementing Small Languages internal vs. external DSLs, hybrid small DSLs Ras Bodik Shaon Barman Thibaud Hottelier Hack Your Language! CS164: Introduction to Programming Languages and Compilers, Spring

2012 UC Berkeley 1 Where are we? Lectures 10-12 are exploring small languages both design and implementation Lecture 10: regular expressions well finish one last segment today Lecture 11: implementation strategies how to embed a language into a host language

Lecture 12: problems solvable with small languages 2 Today Semantic differences between regexes and Res Internal DSLs Hybrid DSLs External DSLs 3

Answer to 2nd challenge question from L10 Q: Give a JavaScript scenario where tokenizing depends on the context of the parser. That is, lexer cannot tokenize the input entirely prior to parsing. A: In this code fragment, / / could be divs or a regex: e / f / g 4 Recall from L10: regexes vs REs

Regexes are implemented with backtracking This regex requires exponential time to discover that it does not match the input string X==============. regex: X(.+)+X REs are implemented by translation to NFA NFA may be translated to DFA. Resulting DFA requires linear time, ie reads each char once 5 The String Match Problem

Consider the problem of detecting whether a pattern (regex or RE) matches an (entire) string match(string, pattern) --> yes/no The regex and RE interpretations of any pattern agree on this problem. That is, both give same answer to this Boolean question Example: X(.+)+X It does not matter whether this regex matches the string X===X with X(.)(..)X or with X(.)(.)(.)X, assigning different values to the + in the regex.

While there are many possible matches, all we 6 Lets now focus on when regex and RE differ Can you think of a question that where they give a different answer? Answer: find a substring 7 Example from Jeff Friedls book Imagine you want to parse a config file:

filesToCompile=a.cpp b.cpp The regex for this command line format: [a-zA-Z]+=.* Now lets allow an optional \n-separated 2nd line: filesToCompile=a.cpp b.cpp \<\n> d.cpp e.h We extend the original regex correspondingly: [a-zA-Z]+=.*(\\\n.*)?

This regex does not match our two-line input. What compiler textbooks dont teach you The textbook string matching problem is simple: Does a regex r match the entire string s? a clean statement suitable for theoretical study here is where regexes and FSMs are equivalent In real life, we face the sub-string matching problem: Given a string s and a regex r, find a substring in s matching r.

- tokenization is a series of substring matching problems Substring matching: careful about semantics Do you see the language design issues? There may be many matching substrings. We need to decide which substring to return. It is easy to agree where the substring should start: the matched substring should be the leftmost match

They differ in where the string should end: - there are two schools: RE and regex (see next slide) Where should the matched string end? Declarative approach: longest of all matches conceptually, enumerate all matches and return longest Operational approach: define behavior of *, | operators e* match e as many times as possible while allowing the

remainder of the regex t o match filesToCompile=a.cpp b.cpp \<\n>(greedy d.cpp semantics) e.h e|e select leftmost choice allowing [a-zA-Z]+ = .* (while \\ \n

.* )? remainder to match These are important differences We saw a non-contrived regex can behave differently personal story: I spent 3 hours debugging a similar regex despite reading the manual carefully The (greedy) operational semantics of * does not guarantee longest match (in case you need it) forces the programmer to reason about

backtracking It may seem that backtracking is nice to reason about because its local: no need to consider the 12 Where in history of re did things go wrong? Its tempting to blame perl but the greedy regex semantics seems older there are other reasons why backtracking is used

Hypothesis 1:creators of re libs knew not that NFA can can be the target language for compiling regexes find all matches simultaneously (no backtracking) be implemented efficiently (convert NFA to DFA) Hypothesis 2: their hands were tied Ken Thompsons algorithm for re-to-NFA was patented With backtracking came the greedy semantics Regular Expressions Concepts

Syntax tree-directed translation (re to NFA) recognizers: tell strings apart NFA, DFA, regular expressions = equally powerful but \1 (backreference) makes regexes more pwrful Syntax sugar: e+ to e.e* Compositionality: be weary of greedy semantics Metacharacters: characters with special meaning 14

Internal Small Languages a.k.a. internal DSLs 15 Embed your DSL into a host language The host language is an interpreter of the DSL Three levels of embedding where we draw lines is fuzzy (ones lib is your framework) 1) Library 2) Framework (parameterized library)

3) Language 16 DSL as a library When DSL is implemented as a library, we often dont think of it as a language even though it defines own abstractions and operations Example: network sockets Socket f = new Socket(mode) f.connect(ipAddress) f.write(buffer) f.close()

17 The library implementation goes very far rfig: formatting DSL embedding into Ruby. see slide 8 in http://cs164fa09.pbworks.com/f/01-rfig-tutorial.pdf 18

The animation in rfig, a Ruby-based language slide!('Overlays', 'Using overlays, we can place things on top of each other.', 'The pivot specifies the relative positions', 'that should be used to align the objects in the overlay.', overlay('0 = 1', hedge.color(red).thickness(2)).pivot(0, 0), staggeredOverlay(true, # True means that old objects disappear 'the elements', 'in this', 'overlay should be centered', nil).pivot(0, 0), cr, pause, # pivot(x, y): -1 = left, 0 = center, +1 = right

staggeredOverlay(true, 'whereas the ones', 'here', 'should be right justified', nil).pivot(1, 0), nil) { |slide| slide.label('overlay').signature(8) } 19 DSL as a framework It may be impossible to hide plumbing in a procedure these are limits to procedural abstraction Framework, a library parameterized with client code typically, you register a function with the library

library calls this client callback function at a suitable point ex: an action to perform when a user clicks on DOM node 20 Example DSL: jQuery Before jQuery var nodes = document.getElementsByTagName('a'); for (var i = 0; i < nodes.length; i++) { var a = nodes[i]; a.addEventListener('mouseover', function(event) { event.target.style.backgroundColor=orange'; }, false ); a.addEventListener('mouseout', function(event)

{ event.target.style.backgroundColor=white'; }, false ); } jQuery abstracts iteration and events jQuery('a').hover( function() { jQuery(this).css('backgroundcolor', 'orange'); }, 21 Embedding DSL as a language Hard to say where a framework becomes a language not too important to define the boundary precisely

Rules I propose: its a language if 1) its abstractions include compile- or run-time checks --- prevents incorrect DSL programs ex: write into a closed socket causes an error 2) we use syntax of host language to create (an illusion) of a dedicated syntax ex: jQuery uses call chaining to pretend it modifes a single object: 22 rake rake: an internal DSL, embedded in Ruby

Author: Jim Weirich functionality similar to make has nice extensions, and flexibility, since it's embedded ie can use any ruby commands even the syntax is close (perhaps better): embedded in Ruby, so all syntax is legal Ruby http://martinfowler.com/articles/rake.html 23 Example rake file task :codeGen do

# do the code generation end task :compile => :codeGen do # do the compilation end task :dataLoad => :codeGen do # load the test data end task :test => [:compile, :dataLoad] do # run the tests end 24

Ruby syntax rules Ruby procedure call 25 How is rake legal ruby? Deconstructing rake (teaches us a lot about Ruby): task :dataLoad => :codeGen do # load the test data end task :test => # run the tests end

[:compile, :dataLoad] do 26 Two kinds of rake tasks File task: dependences between files (as in make) file 'build/dev/rake.html' => 'dev/rake.xml' do | t| require 'paper' maker = PaperMaker.new t.prerequisites[0], t.name maker.run

end 27 Two kinds of tasks Rake task: dependences between jobs task :build_refact => [:clean] do target = SITE_DIR + 'refact/' mkdir_p target, QUIET require 'refactoringHome' OutputCapturer.new.run {run_refactoring} end 28

Rake can orthogonalize dependences and rules task :second do #second's body end task :first do #first's body end task :second => :first 29 General rules

Sort of like make's %.c : %.o BLIKI = build('bliki/index.html') FileList['bliki/*.xml'].each do |src| file BLIKI => src end file BLIKI do #code to build the bliki end 30 Parsing involved: DSL in a GP language GP: general purpose language

31 Parsing involved: GP in a DSL language GP: general purpose language 32 External DSL Own parser, own interpreter or compiler Examples we have seen: 33

Reading Read the article about the rake DSL 34 Acknowledgements This lecture is based in part on Martin Fowler, Using the Rake Build Language Jeff Friedl, Mastering Regular Expressions 35

Recently Viewed Presentations

  • Chapter 13: Visions of Canadian Identity

    Chapter 13: Visions of Canadian Identity

    During the construction of the CP Railway, Chinese people were welcomed to Canada to help build it. The Canadian Government introduced the Chinese Immigration Act in 1885 which imposed a head tax of $50 for any Chinese person that wanted...
  • CS412 Computer Networks

    CS412 Computer Networks

    CS 313 Introduction to Computer Networking & Telecommunication Medium Access Control Sublayer Chi-Cheng Lin, Winona State University * Topics Introduction Channel Allocation Problem Multiple Access Protocols CDMA * Introduction Broadcast networks Key issue: who gets to use the channel when...
  • Attachment - nclmoodle.org.uk

    Attachment - nclmoodle.org.uk

    Dollard and Miller (1950) suggested that the attachment was due to drive reduction (due to biological needs) Hunger and cold have a strong motivating affect on the child, driving the child to satisfy its need by eating or seeking warmth....
  • Strategic Planning Celebration 2013 - SJECCD

    Strategic Planning Celebration 2013 - SJECCD

    Number of New Jobs by Sector2011 to 2022. 2011 Jobs Professional, Scientific, and Technical Services Ambulatory Health Care Services Educational Services (Private)
  • Sparse Multi-task Learning for Detecting Influential Nodes in ...

    Sparse Multi-task Learning for Detecting Influential Nodes in ...

    References: Wang, Yingze, Guang Xiang, and Shi-Kuo Chang. "Sparse Multi-Task Learning for Detecting Influential Nodes in an Implicit Diffusion Network." AAAI. 2013. "Learning with Sparsity for Detecting Influential Nodes in Implicit Information Diffusion Networks", Yingze Wang, 2014.
  • Maine East High School

    Maine East High School

    Interns have access to a variety of assessment kits and rating scales extra (e.g., WISC-V, WIAT- III, Vineland-II, TOWL4) There are also intervention resources available in the intern office (e.g., The C.A.T. Project and StrongTeens)
  • &quot;Embroidery&quot; - Mcphee

    "Embroidery" - Mcphee

    Symbolism is the whole idea behind the embroidery they are working on. The beautiful bright flowers and the look on the man's face slowly being destroyed is an example. One of them accidently ruined the piece which made them attempt...
  • Root Cause Analysis How adaptive leaders use root

    Root Cause Analysis How adaptive leaders use root

    Root Cause Analysis How adaptive leaders use root cause analysis to collaboratively solve student achievement needs DC Data Summit July 14, 2016