Deferred Runtime Pipelining for contentious multicore ...

Deferred Runtime Pipelining for contentious multicore ...

Deferred Runtime Pipelining for contentious multicore transactions Shuai Mu Sebastian Angel Dennis Shasha 1 Multicore programming today Dangerous atomics Confusing semaphores Too many locks

2 Yet developers use transactions to interact with databases and distributed systems Simpler, less error-prone, efficient enough... 3 What keeps multicore devs in the dark ages? multicore txs have very high overheads (historically) 4 Three main points in this talk STM is not crazy expensive anymore

Recent proposals (STO [EuroSys 16], TDSL [PLDI 16]) use type information Competitive with fine-grained locking for many workloads DB community has for decades leveraged workload knowledge We extend STO to support more workloads efficiently New concurrency control protocol called DRP DRP is inspired by work in DB, but avoids static analysis Can handle arbitrary transactions defined at runtime 5 Three main points in this talk STM is not crazy expensive anymore DB community has for decades leveraged workload knowledge We extend STO to support many more workloads efficiently

6 Software Transactional Objects [EuroSys 16] void transfer(TArray& bal, TBox& num, int src, int dst, int amt) { TRANSACTION { Begin transaction bal[src] = bal[src] - amt; bal[dst] = bal[dst] + amt; Read operations num = num + 1;

} RETRY(true); } Write operations Try to commit STO uses OCC (or TL2) to execute transactions 1. 2. 3. 4. Reads performed without locks and writes buffered locally Locks are acquired on all objects in write set Reads are certified: check values read are still valid (abort otherwise) Install writes and release locks

7 STO inherits the performance profile of OCC no aborts + low overhead = life is good we want to be here! many aborts = wasted work throughput workload contention (probability of conflicts) 8 Three main points in this talk STM is not crazy expensive anymore

DB community has for decades leveraged workload knowledge We extend STO to support many more workloads efficiently 9 Transaction chopping [Shasha et al., TODS 95] Tx 1 Tx 2 R; W; W; R; R; R; W; R; W; Assume they conflict Static analysis Tx 1

Tx 2 W; R; W; R; R; R; W; R; W; Time R; W; W; R; R; R; W; R; W; W; R; W; R; R; R; W; R; W; Time 10 Runtime Pipelining [Xie et al., SOSP 15] Static analysis + runtime checks finer chopping more concurrency under contention

Tx 1 R; W; W; R; R; R; W; R; W; Tx 2 W; R; W; R; R; R; W; R; W; Time 11 Runtime Pipelining is good at high contention

Runtime Pipelining (RP) throughput OCC workload contention (probability of conflicts) 12 Problem: hard to port RP to STO Transactions are defined at runtime and may have unknown read/write sets void credit(TArray& bal, int clients, int amt) { TRANSACTION { for (int i = 0; i < clients; i++) { bal[i] = bal[i] + amt; } }

} 13 Three main points in this talk STM is not crazy expensive anymore DB community has for decades leveraged workload knowledge We extend STO to support many more workloads efficiently 14 Strawman Give each object a unique rank (for example its memory address) Acquire locks in increasing order of rank

Release locks when all operations on a rank are done Locks currently held TRANSACTION { bal[src] = bal[src] - amt; bal[dst] = bal[dst] + amt; } // rank 1 // rank 2 15 Strawman Give each object a unique rank (for example its memory address) Acquire locks in increasing order of rank Release locks when all operations on a rank are done

Locks currently held TRANSACTION { bal[src] = bal[src] - amt; bal[dst] = bal[dst] + amt; } // rank 1 // rank 2 1 16 Strawman Give each object a unique rank (for example its memory address) Acquire locks in increasing order of rank

Release locks when all operations on a rank are done But what if program order, control flow, orLocks data currently held TRANSACTION dependencies { disagree with ranks? bal[src] = bal[src] - amt; bal[dst] = bal[dst] + amt; // rank 1 // rank 2 2

} Tx 1: Tx 2: R; W; R; W; R; W;

R; W; 17 Strawman Give each object a unique rank (for example its memory address) Acquire locks in increasing order of rank Release locks when all operations on a rank are done Locks currently held TRANSACTION { bal[dst] = bal[dst] + amt; bal[src] = bal[src] - amt; } // rank 2

// rank 1 2 18 Strawman Give each object a unique rank (for example its memory address) Acquire locks in increasing order of rank Release locks when all operations on a rank are done Locks currently held TRANSACTION { bal[dst] = bal[dst] + amt; bal[src] = bal[src] - amt; } Cannot abort either!

// rank 2 // rank 1 cannot lock 1 (not in rank order!) This is an in-place update to the shared state (not a local update as in OCC) And we have already release its lock. oops. 19 Deferred Runtime Pipelining (DRP) During execution: Asynchronously read operations + Buffer write logic Similar to lazy evaluation

Similar to OCC (but for logic instead of values) After call to commit: Acquire locks in rank order + Enqueue write logic onto objects in a pipeline Similar to RP (but enqueues logic instead of performing operations) 20 Example of DRP (actual syntax is less verbose) TRANSACTION { auto bal_dst = bal.async_read(dst);

bal.enqueue(dst, { bal_dst + amt }); auto bal_src = bal.async_read(src); bal.enqueue(src, {bal_src amt}); Locks currently held // rank 2 // rank 2 // rank 1 // rank 1 } Thread-local state Shared global state

bal[dst]: ({bal_dst + amt}) 21 Example of DRP (actual syntax is less verbose) TRANSACTION { auto bal_dst = bal.async_read(dst); bal.enqueue(dst, { bal_dst + amt }); auto bal_src = bal.async_read(src); bal.enqueue(src, {bal_src amt}); Locks currently held // rank 2 // rank 2 // rank 1

// rank 1 } Thread-local state Shared global state bal[dst]: ({bal_dst + amt}) bal[src]: ({bal_src - amt}) 22 Example of DRP (actual syntax is less verbose) TRANSACTION {

auto bal_dst = bal.async_read(dst); bal.enqueue(dst, { bal_dst + amt }); auto bal_src = bal.async_read(src); bal.enqueue(src, {bal_src amt}); Locks currently held // rank 2 // rank 2 // rank 1 // rank 1 } We know write set! Thread-local state

Shared global state bal[dst]: ({bal_dst + amt}) bal[src]: ({bal_src - amt}) 23 Example of DRP (actual syntax is less verbose) TRANSACTION { auto bal_dst = bal.async_read(dst); bal.enqueue(dst, { bal_dst + amt }); auto bal_src = bal.async_read(src); bal.enqueue(src, {bal_src amt}); Locks currently held

// rank 2 // rank 2 1 // rank 1 // rank 1 } Thread-local state bal[dst]: ({bal_dst + amt}) Shared global state append

bal[src]: ({bal_src - amt}) bal[src]: ({bal_src - amt}) 24 Example of DRP (actual syntax is less verbose) TRANSACTION { auto bal_dst = bal.async_read(dst); bal.enqueue(dst, { bal_dst + amt }); auto bal_src = bal.async_read(src); bal.enqueue(src, {bal_src amt}); Locks currently held // rank 2

// rank 2 2 // rank 1 // rank 1 } Thread-local state bal[dst]: ({bal_dst + amt}) bal[src]: ({bal_src - amt}) Shared global state append

bal[src]: ({bal_src - amt}) bal[dst]: ({bal_dst + amt}) 25 Wait a second This requires the developer to rewrite transactions... And some txs cant be expressed with async reads + deferred writes 26 DRP also supports legacy transactions Legacy transactions execute with an OCC-ish protocol New protocol allows async + legacy transactions to coexist

Cool and very efficient mechanism. See paper. DRP automatically detects if a transaction is legacy or not! Read set of legacy tx = not empty Read set of async transaction = empty (only promises) 27 Evaluation questions (more in the paper) How does DRP perform on standard benchmarks? How does DRP perform as contention varies? 28 How does DRP perform on standard

benchmarks? Silo [SOSP 13] multicore database ported to STO and DRP TPC-C benchmark new order, payment, delivery, stock level, order status STAMP: suite of multicore applications See paper or ask me about it 29 TPC-C workload Runtime Pipelining (RP) throughput OCC

STO workload contention (probability of conflicts) (transaction chopping) 30 Evaluation questions (more in the paper) How does DRP perform on standard benchmarks? How does DRP perform as contention varies? 31

TPC-C with varying contention (32 threads) STO (transaction chopping) TPC-C has 5 transaction types. We only make 3 of them async. The other 2 types abort a lot. Workload contention increases 32 Summary DRP expands the workloads that can benefit from using STO DRP works transparently with async and legacy transactions DRP guarantees opacity and avoids deadlock and aborts

33 STAMP Results (32 threads) STO uses workload-specific optimizations called predicates (DRP does not implement these) 34 Impact of async txs at high contention Percentage of async transactions 35

Actual syntax void transfer(TArray& bal, TBox& num, int src, int dst, int amt) { TRANSACTION { // the next two lines explicitly use deferred interfaces to buffer intentions Intention* bal_src = bal.defer_at(src); bal.defer_update(src, new Intention([&](int& val){ return bal_src->result - amt; }, {bal_src})); // the next three lines use syntactic sugar based on C++'s operator // overloading and implicit type conversion to achieve the same effects auto bal_dst = bal[dst]; bal[dst] = bal_dst + amt; num += 1; }

} 36 Lots more in the paper DRP guarantees opacity and deadlock freedom DRP can be implemented incrementally Details on how to implement DRP into STO with low overhead 37 Avoiding rank mismatch If we know read/write set Acquire all locks ahead of time (predeclaration locking) If you ever need to acquire a lock out of rank order, acquire all prior locks

Ask programmer to predefine accesses ACCESS([&bal[dst], 2], [&bal[src], 2], [&num, 1]); TRANSACTION { bal[dst] = bal[dst] + amt; // rank 2 bal[src] = bal[src] - amt; // rank 1 num = num + 1; // rank 3 } This is a nightmare: error-prone, hard to reason about, etc. 38

Recently Viewed Presentations

  • Quality of Service in Event Broker Networks by

    Quality of Service in Event Broker Networks by

    Quality of Service in Event Broker Networks by Shruti P. Mahambre [email protected] Advisor - Prof. Umesh Bellur
  • U.S. General Services Administration GSA SmartPay 3 (SP3)

    U.S. General Services Administration GSA SmartPay 3 (SP3)

    -- expressed in terms of basis points (bps) or cents per transaction.-- the minimum refund that must be paid to agencies for use of the product / service. SP3 Contractors must provide the minimum refund rate or higher when the...
  • Drag&amp;Drop Macro: Easter Eggs - Lingualicious

    Drag&Drop Macro: Easter Eggs - Lingualicious

    This is a game I've used recently as an end-of-term treat for all my classes. It has a fairly straightforward aim of reaching the end before the other teams, or finishing furthest ahead by the end of the lesson.
  • Chapter 13

    Chapter 13

    Chapter 13. Section Views. Objectives. Use cutaway, or section, views as a method for showing the features of a part that are normally hidden when presented on a multiview drawing. Decide when a section view is necessary. Objectives (cont'd.)
  • Fla This is a song, music  and dance

    Fla This is a song, music and dance

    Flamenco This is a song, music and dance style Strongly influenced by the Gitanos Flamenco culture originated in andalusia At the time of the Spanish Inquisition, 1499, Moors, Gitanos and Jews were fleeing Spain.
  • Healthcare Clothing - UMT

    Healthcare Clothing - UMT

    Rinse time 3 minutes Recipe for Stone Wash Name Quantity Weight of stone 5kg (Standard weight) Antistain 25ml (0.5% on the wt of garment) Chemwet 25ml (0.5% on the wt of garment) Stone Wash Procedure Temperature @ 140 ┬░F Process...
  • Computer Modeling Fundamentals - PC&#92;|MAC

    Computer Modeling Fundamentals - PC\|MAC

    Graphics Window. The active modeling area where parts and assemblies are created and edited. Quick Access Toolbar. Graphics. Window. Application Menu. Ribbon. Computer Modeling Fundamentals. PLTW Gateway® Unit 1 - Lesson 1.5 - Designing For Production. Show students how to...
  • Microfinance: Challenges and Opportunities

    Microfinance: Challenges and Opportunities

    The second wave of microfinance is microsavings, increasingly becoming mainstreamed into microfinance operations as MFIs and other actors professionalise and transition to formal, regulated institutions. The Third Wave The Third Wave of microfinance rounds out the financial services offering to...