A taste of Haskell - Tufts University


cs242, Kathleen Fisher. Reading: Beautiful Concurrency; The Transactional Memory / Garbage Collection Analogy. Thanks to Simon Peyton Jones for these slides.

Multi-cores are coming! For 50 years, hardware designers delivered 40-50% increases per year in sequential

program performance. Around 2004, this pattern failed because power and cooling issues made it impossible to keep increasing clock frequencies. Now hardware designers are using the extra transistors that Moore's law is still delivering to put more processors on a single chip. If we want to improve performance, concurrent programs are no longer optional: concurrent programming is essential to exploit a multi-core. Yet the state of the art in concurrent programming is 30 years old: locks and condition variables.

(In Java: synchronized, wait, and notify.) Locks and condition variables are fundamentally flawed: it's like building a skyscraper out of bananas. This lecture describes significant recent progress: bricks and mortar instead of bananas.

Libraries build layered concurrency abstractions.

[Diagram: applications build layered libraries of concurrency abstractions on top of concurrency primitives, which sit on the hardware.]

Locks and condition variables (a) are hard to use and (b) do not compose.

[Diagram: the same layered libraries built instead on atomic blocks (three primitives: atomic, retry, orElse) over the hardware.]

Atomic blocks are much easier to use, and do compose.

A 10-second review of what goes wrong with locks:
- Races: forgotten locks lead to inconsistent views.
- Deadlock: locks acquired in the wrong order.
- Lost wakeups: a forgotten notify on a condition variable.
- Diabolical error recovery: exception handlers must restore invariants and release locks.
These are serious problems.

But even worse... Consider a (correct) Java bank Account class:

    class Account {
      float balance;
      synchronized void deposit(float amt) {
        balance += amt;
      }
      synchronized void withdraw(float amt) {
        if (balance < amt) throw new OutOfMoneyError();
        balance -= amt;
      }
    }

Now suppose we want to add the ability to transfer funds from one account to another. Simply calling withdraw and deposit to implement transfer causes a race condition:

    class Account {
      float balance;
      synchronized void deposit(float amt) { balance += amt; }
      synchronized void withdraw(float amt) {
        if (balance < amt) throw new OutOfMoneyError();
        balance -= amt;
      }
      void transfer_wrong1(Acct other, float amt) {
        other.withdraw(amt);   // race condition: wrong sum of balances
        this.deposit(amt);
      }
    }

Synchronizing transfer can instead cause deadlock:

    class Account {
      float balance;
      synchronized void deposit(float amt) { balance += amt; }
      synchronized void withdraw(float amt) {
        if (balance < amt) throw new OutOfMoneyError();
        balance -= amt;
      }
      synchronized void transfer_wrong2(Acct other, float amt) {
        // can deadlock with a parallel reverse-transfer
        this.deposit(amt);
        other.withdraw(amt);
      }
    }

Scalable double-ended queue: one lock per cell. There is no interference if the two ends are far enough apart, but watch out when the queue is 0, 1, or 2 elements long!

How hard is a concurrent queue to implement in each coding style?

    Coding style                     Difficulty of concurrent queue implementation
    Sequential code                  Undergraduate
    Locks and condition variables    Publishable result at an international conference [1]
    Atomic blocks                    Undergraduate

[1] "Simple, fast, and practical non-blocking and blocking concurrent queue algorithms."

Atomic blocks are like database transactions:

    atomic { ...sequential code... }

To a first approximation, just write the sequential code and wrap atomic around it.
- All-or-nothing semantics: atomic commit (the A in ACID).
- The atomic block executes in isolation (the I in ACID), with optimistic concurrency.
- Cannot deadlock (there are no locks!).
- Atomicity makes error recovery easy (e.g. throwing an exception inside the sequential code).

One possible implementation of atomic { ... }:

Execute without taking any locks, logging each read and write into a thread-local transaction log (e.g. read y; read z; write x 10; write z 42). Writes go to the log only, not to memory. At the end, the transaction validates the log: if the log is valid, the transaction atomically commits its changes to memory; if not, it re-runs from the beginning, discarding its changes.
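As a concrete (if toy) picture of such a log, here is a minimal single-threaded Haskell sketch; the Addr and Memory types, the log-entry constructors, and the validate/commit functions are illustrative assumptions, not the real runtime machinery.

    import qualified Data.Map as M

    type Addr   = String            -- hypothetical variable names
    type Memory = M.Map Addr Int    -- a snapshot of shared memory

    data LogEntry = Read Addr Int   -- value seen when the variable was first read
                  | Write Addr Int  -- value to be written at commit
      deriving Show

    -- A log is valid if every value it read still matches memory.
    validate :: Memory -> [LogEntry] -> Bool
    validate mem = all ok
      where ok (Read a v)  = M.lookup a mem == Just v
            ok (Write _ _) = True

    -- Committing applies the buffered writes to memory.
    commit :: Memory -> [LogEntry] -> Memory
    commit = foldl apply
      where apply m (Write a v) = M.insert a v m
            apply m (Read _ _)  = m

    main :: IO ()
    main = do
      let mem = M.fromList [("x", 0), ("y", 1), ("z", 2)]
          txn = [Read "y" 1, Read "z" 2, Write "x" 10, Write "z" 42]
      if validate mem txn
        then print (commit mem txn)                    -- atomically publish the writes
        else putStrLn "conflict: re-run the transaction"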

Realising STM in Haskell: logging memory effects is expensive, but Haskell already partitions the world into immutable values (zillions and zillions) and mutable locations (some or none), and only the latter need to be logged. (Haskell programmers are brutally trained from birth to use memory effects sparingly.) The type system controls where I/O effects happen.

The monad infrastructure is ideal for constructing transactions and implicitly passing the transaction log; that bill has already been paid. And because simply reading or writing a mutable location is already expensive in Haskell (it involves a procedure call), the transaction overhead is not as large as in an imperative language.

Consider a simple Haskell program:

    main = do { putStr (reverse "yes"); putStr "no" }

Effects are explicit in the type system:

    (reverse "yes") :: String   -- No effects
    (putStr "no")   :: IO ()    -- Effects okay

The main program is a computation with effects: main :: IO ().

    newRef   :: a -> IO (Ref a)
    readRef  :: Ref a -> IO a
    writeRef :: Ref a -> a -> IO ()

Recall that Haskell uses the newRef, readRef, and writeRef functions within the IO monad to manage mutable state:

    main = do { r <- newRef 0; incR r; s <- readRef r; print s }

    incR :: Ref Int -> IO ()
    incR r = do { v <- readRef r; writeRef r (v+1) }

Reads and writes are 100% explicit. The type system disallows (r + 6), because r :: Ref Int.
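(In GHC's libraries these primitives correspond to newIORef, readIORef, and writeIORef from Data.IORef; the following runnable rendering of the same counter is shown only for orientation.)

    import Data.IORef

    incR :: IORef Int -> IO ()
    incR r = do { v <- readIORef r; writeIORef r (v+1) }

    main :: IO ()
    main = do { r <- newIORef 0; incR r; s <- readIORef r; print s }   -- prints 1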

The fork function spawns a thread; it takes an action as its argument.

    fork :: IO a -> IO ThreadId

    main = do { r <- newRef 0; fork (incR r); incR r; ... }

    incR :: Ref Int -> IO ()
    incR r = do { v <- readRef r; writeRef r (v+1) }

This is a race: the two read-modify-write sequences can interleave, so an increment may be lost.
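(For reference, GHC spells fork as forkIO in Control.Concurrent; here is a minimal runnable sketch of the race using IORef. The threadDelay is an assumption of this sketch, a crude way to let the forked thread get a chance to run.)

    import Control.Concurrent (forkIO, threadDelay)
    import Data.IORef

    incR :: IORef Int -> IO ()
    incR r = do { v <- readIORef r; writeIORef r (v+1) }   -- read-modify-write: not atomic

    main :: IO ()
    main = do
      r <- newIORef 0
      _ <- forkIO (incR r)      -- runs concurrently with the increment below
      incR r
      threadDelay 100000        -- crude wait for the forked thread (illustration only)
      readIORef r >>= print     -- may print 1 or 2, depending on the interleaving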

Idea: add a function atomic that executes its argument computation atomically.

    atomic :: IO a -> IO a   -- almost

    main = do { r <- newRef 0; fork (atomic (incR r)); atomic (incR r); ... }

Worry: what prevents us from calling incR outside atomic, which would allow data races between code inside atomic and code outside?

Solution: introduce a type for transactional variables (TVar) and a new monad (STM) to track transactions, and ensure that TVars can only be modified inside transactions.

    atomic    :: STM a -> IO a
    newTVar   :: a -> STM (TVar a)
    readTVar  :: TVar a -> STM a
    writeTVar :: TVar a -> a -> STM ()

    incT :: TVar Int -> STM ()
    incT r = do { v <- readTVar r; writeTVar r (v+1) }

    main = do { r <- atomic (newTVar 0); fork (atomic (incT r)); atomic (incT r); ... }

Notice that:
- You can't fiddle with TVars outside an atomic block [good].
- You can't do I/O or manipulate regular imperative variables inside an atomic block, e.g. atomic (if x ...).

(The function atomic is called atomically in the actual implementation.)

...and, best of all, transactions compose:

    incT :: TVar Int -> STM ()
    incT r = do { v <- readTVar r; writeTVar r (v+1) }

    incT2 :: TVar Int -> STM ()
    incT2 r = do { incT r; incT r }

    foo :: IO ()
    foo = ...atomic (incT2 r)...

Composition is THE way to build big programs that work.
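(A runnable rendering of this example against GHC's actual API, Control.Concurrent.STM, where atomic is spelled atomically; the threadDelay is an assumption of this sketch, a crude way to let the forked thread finish.)

    import Control.Concurrent (forkIO, threadDelay)
    import Control.Concurrent.STM

    incT :: TVar Int -> STM ()
    incT r = do { v <- readTVar r; writeTVar r (v+1) }

    incT2 :: TVar Int -> STM ()
    incT2 r = do { incT r; incT r }      -- still runs as a single transaction

    main :: IO ()
    main = do
      r <- newTVarIO 0
      _ <- forkIO (atomically (incT r))
      atomically (incT2 r)
      threadDelay 100000                 -- crude wait for the forked thread
      atomically (readTVar r) >>= print  -- prints 3: no lost updates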

The type guarantees that an STM computation is always executed atomically (e.g. incT2). Simply glue STMs together arbitrarily, then wrap the result in atomic to produce an IO action. Transactions can also throw and catch exceptions:

    throw :: Exception -> STM a
    catch :: STM a -> (Exception -> STM a) -> STM a

Three new ideas: retry, orElse, always.

    withdraw :: TVar Int -> Int -> STM ()
    withdraw acc n = do { bal <- readTVar acc;
                          if bal < n then retry else return ();
                          writeTVar acc (bal-n) }

    retry :: STM ()

retry means: abort the current transaction and re-execute it from the beginning. The implementation avoids a busy wait by using the reads in the transaction log (i.e. acc) to wait simultaneously on all the variables the transaction has read. No condition variables!

A retrying thread is woken up automatically when acc is written, so there is no danger of forgotten notifies. Nor is there any danger of forgetting to re-test the condition when woken, because the transaction re-runs from the beginning.

retry can appear anywhere inside an atomic block, including nested deep within a call. For example,

    atomic (do { withdraw a1 3; withdraw a2 7 })

waits until a1 >= 3 AND a2 >= 7, without any change to the withdraw function. Contrast:

    atomic (a1 > 3 && a2 > 7) { ...stuff... }

which breaks the abstraction inside ...stuff...

Suppose we want to transfer 3 dollars from either account a1 or a2 into account b:

    atomic (do { withdraw a1 3 `orElse` withdraw a2 3; deposit b 3 })

Here orElse tries its first argument; if that retries, it runs the second instead; and then the rest of the transaction (the deposit) runs.

    orElse :: STM a -> STM a -> STM a

    transfer :: TVar Int -> TVar Int -> TVar Int -> STM ()

    atomic (transfer a1 a2 b `orElse` transfer a3 a4 b)

    transfer a1 a2 b = do { withdraw a1 3 `orElse` withdraw a2 3; deposit b 3 }

The function transfer calls orElse, but calls to transfer can themselves still be composed with orElse.
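(Putting the pieces together as a runnable sketch against GHC's Control.Concurrent.STM; the initial balances are made-up values, chosen so that the first withdraw retries and the orElse falls through to a2.)

    import Control.Concurrent.STM
    import Control.Monad (when)

    withdraw :: TVar Int -> Int -> STM ()
    withdraw acc n = do
      bal <- readTVar acc
      when (bal < n) retry        -- block until the balance is sufficient
      writeTVar acc (bal - n)

    deposit :: TVar Int -> Int -> STM ()
    deposit acc n = do
      bal <- readTVar acc
      writeTVar acc (bal + n)

    main :: IO ()
    main = do
      a1 <- newTVarIO 1           -- not enough in a1 ...
      a2 <- newTVarIO 10          -- ... so orElse falls through to a2
      b  <- newTVarIO 0
      atomically (do { withdraw a1 3 `orElse` withdraw a2 3; deposit b 3 })
      bal <- atomically ((,,) <$> readTVar a1 <*> readTVar a2 <*> readTVar b)
      print bal                   -- prints (1,7,3)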

A transaction is a value of type STM a. Transactions are first-class values. Build a big transaction by composing little transactions: in sequence, using orElse and retry, inside procedures.... Finally seal up the transaction with

    atomic :: STM a -> IO a

STM supports nice equations for reasoning:
- orElse is associative (but not commutative);
- retry `orElse` s  =  s;
- s `orElse` retry  =  s.
(These equations make STM an instance of the Haskell type class MonadPlus, a monad with some extra operations and properties.)
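(These identities can be checked directly against GHC's Control.Concurrent.STM; a tiny sketch:)

    import Control.Concurrent.STM

    main :: IO ()
    main = do
      x <- atomically (retry `orElse` return (1 :: Int))   -- behaves like return 1
      y <- atomically (return (2 :: Int) `orElse` retry)   -- behaves like return 2
      print (x, y)                                         -- prints (1,2)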

The route to sanity is to establish invariants that are assumed on entry, and guaranteed on exit, by every atomic block. We want to check these guarantees, but we don't want to test every invariant after every atomic block. Hmm.... Only test when something read by the invariant has changed... rather like retry.

    always :: STM Bool -> STM ()

    newAccount :: STM (TVar Int)
    newAccount = do { v <- newTVar 0;
                      always (do { cts <- readTVar v; return (cts >= 0) });
                      return v }

Any transaction that modifies the account will check the invariant (no forgotten checks). If the check fails, the transaction is aborted.

The argument to always is an arbitrary boolean-valued STM computation:

    always :: STM Bool -> STM ()

The function always adds a new invariant to a global pool of invariants. Conceptually, every invariant is checked as every transaction commits. But the implementation checks only those invariants that read TVars written by the transaction, ...and garbage-collects invariants that

are checking dead TVars.

Everything so far is intuitive and arm-wavey. But what happens if it's raining, and you are inside an orElse, and you throw an exception that contains a value that mentions...? We need a precise specification! One exists: see "Composable Memory Transactions" for details. (Contrast locks, which give you no way to wait for complex conditions at all.)

A complete, multiprocessor implementation of STM exists as of GHC 6. Experience to date: even for the most mutation-intensive programs, the Haskell STM implementation is as fast as the previous MVar implementation. (The MVar version paid heavy costs for usually-unused exception handlers.) We need more experience using STM in practice, though! You can play with it: the reading assignment contains a complete STM program.

There are similar proposals for

adding STM to Java and other mainstream languages:

    class Account {
      float balance;
      void deposit(float amt) {
        atomic { balance += amt; }
      }
      void withdraw(float amt) {
        atomic {
          if (balance < amt) throw new OutOfMoneyError();
          balance -= amt;
        }
      }
      void transfer(Acct other, float amt) {
        atomic {   // can compose withdraw and deposit
          other.withdraw(amt);
          this.deposit(amt);
        }
      }
    }

Unlike Haskell, type systems in mainstream languages don't control where effects occur. What happens if code outside a transaction conflicts with code inside a transaction?
- Weak atomicity: non-transactional code can see inconsistent memory states. The programmer should avoid such situations by placing all accesses to shared state inside transactions.
- Strong atomicity: non-transactional code is guaranteed to see a consistent view of shared state. This guarantee may cause a performance hit.
For more information, see "Enforcing Isolation and Ordering in STM".

At first, atomic blocks look insanely expensive. A naive implementation (c.f. databases):
- Every load and store instruction logs information into a thread-local log.
- A store instruction writes the log only.
- A load instruction consults the log first.
- Validate the log at the end of the block. If validation succeeds, atomically commit to shared memory; if it fails, restart the transaction.

[Chart: normalised execution time for operations on a red-black tree, 1 thread, 6:1:1 lookup:insert:delete mix with keys 0..65535. Sequential baseline 1.00x; coarse-grained locking 1.13x; fine-grained locking 2.57x; traditional STM 5.69x. See "Optimizing Memory Transactions" for more information.]

Direct-update STM:
- Allows transactions to make updates in place in the heap.
- Avoids reads needing to search the log to see earlier writes that the transaction has made.
- Makes successful commit operations faster, at the cost of extra work on contention or when a transaction aborts.

Compiler integration:
- Decompose transactional memory operations into primitives.
- Expose these primitives to compiler optimization (e.g. to hoist concurrency-control operations out of a loop).

Runtime system integration:
- Integrate transactions with the garbage collector to scale to atomic blocks containing 100M memory accesses.

[Chart: normalised execution time for the same red-black-tree workload (1 thread, 6:1:1 lookup:insert:delete mix). Sequential baseline 1.00x; coarse-grained locking 1.13x; fine-grained locking 2.57x; traditional STM 5.69x; direct-update STM 2.04x; direct-update STM + compiler integration 1.46x.]

[Chart: microseconds per operation vs. number of threads for coarse-grained locking, fine-grained locking, traditional STM, and direct-update STM + compiler integration, showing the last of these scaling to multicore.]

A naive STM implementation is hopelessly inefficient. There is a lot of research going on in the compiler and architecture communities to optimize STM. This work typically assumes transactions are

smallish and have low contention. If these assumptions are wrong, performance can degrade drastically. We need more experience with real workloads and various optimizations before we can say for sure that STM can be implemented efficiently enough to be useful.

The essence of shared-memory concurrency is deciding where critical sections should begin and end. This is a hard problem:
- Too small: application-specific data races (e.g., a thread may see the deposit but not the

withdraw if transfer is not atomic).
- Too large: progress is delayed because other threads are denied access to resources they need.

Consider the following program (initially, x = y = 0):

    Thread 1:
      // atomic {                  //A0
      atomic { x = 1; }            //A1
      atomic { if (y==0) abort; }  //A2
      // }

    Thread 2:
      atomic {                     //A3
        if (x==0) abort;

        y = 1;
      }

Successful completion requires A3 to run after A1 but before A2. So adding a critical section (by uncommenting A0) changes the behaviour of the program (from terminating to non-terminating).

Worry: could the system thrash by continually colliding and re-executing? No: a transaction can be forced to re-execute only if another succeeds in committing. That gives a strong progress guarantee.

But: a particular thread could starve. [Diagram: Threads 1, 2, and 3; one thread's transaction keeps being invalidated by the others' commits.]

In languages like ML or Java, the whole language effectively lives in the IO monad: it is baked in, and there is no need to mark anything in the type system because I/O is everywhere. In Haskell, the programmer can choose when to live in the IO monad and when to live in the realm of pure functional programming.

Interesting perspective: it is not Haskell that lacks imperative features, but rather the other languages that lack the ability to have a statically distinguishable pure

subset. This separation facilitates concurrent programming.

[Diagram: a plane with axes from Useless to Useful and from Dangerous to Safe. Arbitrary effects sit in the useful-but-dangerous corner; no effects sit in the safe-but-useless corner; Nirvana is both useful and safe.]

Plan A (everyone else): default = any effect; plan = add restrictions. Examples: regions, ownership types, Vault, Spec#, Cyclone.

Plan B (Haskell): default = no effects; plan = selectively permit effects. Types play a major role. Two main approaches: domain-specific languages (SQL, XQuery, Google map/reduce) and wide-spectrum functional languages + controlled effects (e.g. Haskell). Value-oriented programming.

[Diagram: both plans march toward Nirvana; Plan A regards Plan B with envy and borrows its ideas, e.g. Software Transactional Memory (retry, orElse).]

One of Haskell's most significant contributions is to take purity seriously and relentlessly pursue Plan B. Imperative languages will embody growing (and checkable) pure subsets. -- Simon Peyton Jones

Atomic blocks (atomic, retry, orElse) dramatically raise the level of abstraction for concurrent programming. It is like using a high-level language instead

of assembly code. Whole classes of low-level errors are eliminated. It is not a silver bullet:
- you can still write buggy programs;
- concurrent programs are still harder to write than sequential ones;
- it is aimed only at shared-memory concurrency, not message passing.
There is a performance hit, but it seems acceptable (and things can only get better as the research community focuses on the question).
