Experiments with Data Center Congestion Control Research
Wei Bai
APNet 2017, Hong Kong

The opinions in this talk do not represent the official policy of HKUST and Microsoft.

Data Center Congestion Control Research
  2009: TCP retransmissions
  2010: DCTCP, ICTCP
  2011: MPTCP
  2012: PDQ
  2013: pFabric
  2014: PASE, Fastpass, CP
  2015: DCQCN, TIMELY, PIAS

This talk is about our experience with the PIAS project.
  Joint work with Li Chen, Kai Chen, Dongsu Han, Chen Tian, Hao Wang
  HotNets 2014, NSDI 2015 and ToN 2017

Outline
  PIAS mechanisms
  Implementation efforts for NSDI submission
  Efforts after NSDI
  Takeaway from PIAS experience

Flow Completion Time (FCT) is Key
  Data center applications
    Desire low latency for short messages
    App performance & user experience
  Goal of DCN transport: minimize FCT

PIAS Key Idea
  PIAS performs Multi-Level Feedback Queue (MLFQ) to emulate Shortest Job First (SJF)
    Priority 1 (high), Priority 2, ..., Priority K (low)
  In general, with PIAS, short flows finish in the higher priority queues while large ones finish in the lower priority queues, emulating SJF. This is effective for heavy-tailed DCN traffic.

How to implement MLFQ?
  Implementing MLFQ at the switch directly is not scalable
    Requires the switch to keep per-flow state
  Decoupling MLFQ
    Stateless priority queueing at the switch (a built-in function)
    Stateful packet tagging at the end host
      K priorities (Priority 1, Priority 2, ..., Priority K) separated by K-1 thresholds
      Each threshold is the point at which a flow is demoted from one priority to the next lower one

Threshold vs Traffic Mismatch
  DCN traffic is highly dynamic
  Threshold fails to catch traffic variation -> mismatch
  [Figure: a 10MB flow and a 20KB flow placed in the high and low priority queues under three thresholds: ideal (20KB), too small (10KB), too big (1MB); labels: High, Low, ECN]
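
The end-host tagging rule and the mismatch cases above can be illustrated with a minimal sketch. It is not the PIAS kernel module: the function names, the 1460-byte packet size, and the choice to demote a flow exactly when its bytes sent reach a threshold are assumptions made for illustration.

```python
from bisect import bisect_right


def priority_for(bytes_sent, thresholds):
    """Return the 1-based priority (1 = highest) for the next packet of a flow.

    thresholds is an ascending list of K-1 demotion thresholds in bytes,
    giving K priorities. Assumption: a flow is demoted as soon as the bytes
    it has already sent reach a threshold.
    """
    return bisect_right(thresholds, bytes_sent) + 1


def tag_flow(flow_size, thresholds, mss=1460):
    """Return the priority tag of every packet of a flow of flow_size bytes."""
    tags = []
    sent = 0
    while sent < flow_size:
        tags.append(priority_for(sent, thresholds))
        sent += min(mss, flow_size - sent)
    return tags


# Two priorities, one threshold: the three cases from the figure.
ideal = tag_flow(20_000, [20_000])          # 20KB flow, threshold = 20KB
too_small = tag_flow(20_000, [10_000])      # 20KB flow, threshold = 10KB
too_big = tag_flow(10_000_000, [1_000_000])  # 10MB flow, threshold = 1MB
```

With the ideal threshold every packet of the 20KB flow stays in priority 1. With the 10KB threshold the second half of the short flow is demoted and queues behind large flows. With the 1MB threshold the first 1MB of the large flow shares priority 1 with short flows.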

PIAS in 1 Slide
  PIAS packet tagging
    Maintain flow states and mark packets with priority
  PIAS switch
    Enable strict priority queueing and ECN
  PIAS rate control
    Employ Data Center TCP (DCTCP) to react to ECN

Outline: Implementation efforts for NSDI submission

Implementation Stages
  ECN-based transport
    DCTCP at the end host
    ECN marking at the switch
  MLFQ scheduling
    Packet tagging module at the end host
    Priority queueing at the switch
  Evaluation
    Measure FCT using realistic traffic

Integrate DCTCP into Linux Kernel
  DCTCP was not integrated into Linux in 2014
  Linux patch provided by the authors
  PASS

ECN Marking at the Switch
  Switch hardware: Pica8 P-3295 1G switch
  Switch OS: PicOS 2.1.0
  No ECN and RED in this document
  Why does our switch not support ECN?
  Switch model (bottom to top)
    Switching chip hardware
    Switching chip interfaces (all hardware features)
    Switch OS (some hardware features)
  Solution (with help from Dongsu)
    Use Broadcom shell to configure ECN/RED

DCTCP Performance Validation
  TCP incast experiment
    TCP RTOmin: 200ms
    Static switch buffer allocation
  Expected results
    DCTCP greatly outperforms TCP
  Actual results
    DCTCP delivers similar / worse performance
    Some flows experience 3s timeout delays

Result Analysis
  Why do flows experience 3s timeouts?
    HZ = 1 second
    Many SYN packets get dropped
    ECN bits of SYN packets are 00 (Non-ECT)
  Root cause
    Non-ECT packets: SYN, FIN, pure ACK packets
    The switch drops Non-ECT packets if the queue length exceeds the marking threshold
  Solution
    Modify all TCP packets to ECT using iptables

Packet Tagging Module
  A loadable kernel module
  Shim layer between TCP/IP and Qdisc
    Stack (top to bottom): Application (user space); TCP, IP, Packet Tagging, Qdisc, NIC Driver (kernel space)
  Netfilter hooks to intercept packets
  Keep per-flow state in a hash table with linked lists
    [Figure: hash table with five buckets (Linked List 1 to Linked List 5) chaining Flow 1 to Flow 5]

Kernel Programming
  Likely to cause kernel panic
  After implementing a small feature, test it!
  Use printk to get some useful information
  Common errors
    Spinlock functions (e.g., spin_lock_irqsave and spin_lock)
    vmalloc and kmalloc (different types of memory)
  Pair programming
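
The per-flow state table from the Packet Tagging Module slide (a hash table whose buckets are linked lists) can be sketched in user space as follows. This is only an illustration of the data structure, not the kernel module: the class name, the 5-tuple key layout, and the bucket count are hypothetical, and the locking a kernel version would need around each bucket is left out.

```python
class FlowTable:
    """Hash table with chaining that tracks bytes sent per flow."""

    def __init__(self, num_buckets=256):
        # Each bucket is a chain of [flow_key, bytes_sent] entries.
        self.buckets = [[] for _ in range(num_buckets)]

    def _chain(self, flow_key):
        return self.buckets[hash(flow_key) % len(self.buckets)]

    def add_bytes(self, flow_key, nbytes):
        """Add nbytes to a flow's counter, creating the entry if needed.

        Returns the flow's total bytes sent so far.
        """
        chain = self._chain(flow_key)
        for entry in chain:
            if entry[0] == flow_key:
                entry[1] += nbytes
                return entry[1]
        chain.append([flow_key, nbytes])
        return nbytes

    def remove(self, flow_key):
        """Delete a flow's state (e.g., when the connection closes)."""
        chain = self._chain(flow_key)
        for index, entry in enumerate(chain):
            if entry[0] == flow_key:
                del chain[index]
                return True
        return False


# Hypothetical 5-tuple keys: (src IP, dst IP, src port, dst port, protocol).
table = FlowTable(num_buckets=1)  # one bucket forces both flows into one chain
flow_a = ("10.0.0.1", "10.0.0.2", 40000, 5001, "tcp")
flow_b = ("10.0.0.1", "10.0.0.3", 40001, 5001, "tcp")
table.add_bytes(flow_a, 1460)
table.add_bytes(flow_b, 1460)
total_a = table.add_bytes(flow_a, 1460)
```

The byte count returned by add_bytes is the value a tagging rule would compare against the demotion thresholds.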

Priority Queueing at Switch
  Easy to configure using PicOS / Broadcom shell
  Undesirable interaction with ECN/RED
    Each queue is essentially a link with varying capacity -> dynamic queue length threshold
    Existing ECN/RED solutions (queue/port/shared buffer pool) only support static thresholds
  Our choice: per-port ECN/RED
    Cannot preserve the scheduling policy
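
A small sketch may help show why per-port ECN/RED cannot preserve the scheduling policy, and why Non-ECT packets were dropped in the incast experiment. It models a single static marking threshold rather than a full RED profile; the function names, the queue lengths, and the threshold of 30 packets are made-up values for illustration.

```python
def enqueue_action(is_ect, occupancy, threshold):
    """Decide what an ECN-enabled queue does with an arriving packet.

    Below the threshold the packet is forwarded untouched. Above it, an
    ECT packet is marked, while a Non-ECT packet (SYN, FIN, pure ACK)
    is dropped.
    """
    if occupancy <= threshold:
        return "forward"
    return "mark" if is_ect else "drop"


def occupancy_seen(queues, queue_id, per_port):
    """Occupancy compared with the threshold: whole port or a single queue."""
    return sum(queues) if per_port else queues[queue_id]


# Packets buffered in [high-priority queue, low-priority queue] of one port.
queues = [0, 100]
threshold = 30

# A data packet arriving at the empty high-priority queue:
per_queue = enqueue_action(True, occupancy_seen(queues, 0, per_port=False), threshold)
per_port = enqueue_action(True, occupancy_seen(queues, 0, per_port=True), threshold)

# A Non-ECT SYN arriving at the same queue under per-port marking:
syn = enqueue_action(False, occupancy_seen(queues, 0, per_port=True), threshold)
```

Under per-port marking the high-priority packet is marked because of the low-priority backlog, so short flows are slowed down by large ones even though their own queue is empty.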

Evaluation
  Flow Completion Time (FCT)
    T(receive the last ACK) - T(send the first packet)
  The TCP sender does not know the time to receive the last ACK in practice
  Measure FCT at the receiver side
    The receiver sends a request to the sender to get the desired amount of data
    T(receive the whole response) - T(send the request)
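
The receiver-side measurement can be sketched as below. This is a toy version that runs both ends over a local socket pair, not the traffic generator used in the evaluation: the 8-byte request format and all function names are assumptions.

```python
import socket
import struct
import threading
import time


def recv_exact(conn, nbytes):
    """Read exactly nbytes from conn, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < nbytes:
        chunk = conn.recv(min(65536, nbytes - len(buf)))
        if not chunk:
            raise ConnectionError("peer closed before sending all data")
        buf.extend(chunk)
    return bytes(buf)


def serve_one_request(conn):
    """Sender side: read an 8-byte size request, reply with that many bytes."""
    (size,) = struct.unpack("!Q", recv_exact(conn, 8))
    conn.sendall(b"\0" * size)


def measure_fct(conn, size):
    """Receiver side: T(receive the whole response) - T(send the request)."""
    start = time.perf_counter()
    conn.sendall(struct.pack("!Q", size))
    recv_exact(conn, size)
    return time.perf_counter() - start


def demo(size=100_000):
    """Run sender and receiver over a local socket pair; return FCT in seconds."""
    receiver, sender = socket.socketpair()
    worker = threading.Thread(target=serve_one_request, args=(sender,))
    worker.start()
    try:
        return measure_fct(receiver, size)
    finally:
        worker.join()
        receiver.close()
        sender.close()
```

Calling demo() returns the measured FCT in seconds. Measuring at the receiver adds the time for the request to reach the sender, but it only uses events the application can observe.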

Outline: Efforts after NSDI

Implementation Efforts
  Improve traffic generator
    Use persistent TCP connections
    Better user interfaces
    Used in other papers (e.g., ClickNP)
  Improve packet tagging module
    Identify message boundaries in TCP connections
    Monitor TCP send buffer occupancy using jprobe hooks
    Evaluation on Linux kernel 3.18

Some Research Questions
  How to do ECN marking with multiple queues?
    Our solution (per-port ECN/RED) violates the scheduling policy in order to keep good throughput and latency
  How does the switch manage its buffer?
    Incast only happens with static buffer allocation

Research Efforts
  ECN marking with multiple queues (2015-2016)
    MQ-ECN [NSDI16]: dynamically adjust per-queue queue length thresholds
    TCN [CoNEXT16]: use sojourn time as the signal
  Buffer management (2016-2017)
    BCC [APNet17]: buffer-aware congestion control for extremely shallow-buffered data centers
      One more shared buffer ECN/RED configuration

Outline: Takeaway from PIAS experience

Takeaway
  Start to do implementation when you start a project.
  A good implementation not only makes the paper stronger, but also unveils many research problems.

Cloud & Mobile Group at MSRA
  Research Area: Cloud Computing, Networking, Mobile Computing
  We are now looking for full-time researchers and research interns
  Feel free to talk to me or send emails to my manager Thomas Moscibroda

Thank You