Embedded Systems in Silicon TD5102 MIPS design Datapath and Control Henk Corporaal http://www.ics.ele.tue.nl/~heco/courses/EmbSystems Technical University Eindhoven DTI / NUS Singapore 2005/2006 Topics Building a datapath A single cycle processor datapath HC TD5102 all instruction actions in one (long) cycle A multi-cycle processor datapath support a subset of the MIPS-I instruction-set each instructions takes multiple (shorter) cycles Exception support 2 Datapath and Control FSM or Microprogramming
Registers & Memories Multiplexors Buses ALUs Control HC TD5102 Datapath 3 The Processor: Datapath & Control Simplified MIPS implementation to contain only: Generic Implementation: memory-reference instructions: lw, sw arithmetic-logical instructions: add, sub, and, or, slt control flow instructions: beq, j use the program counter (PC) to supply instruction address get the instruction from memory read registers use the instruction to decide exactly what to do All instructions use the ALU after reading the registers Why?
HC TD5102 memory-reference? arithmetic? control flow? 4 More Implementation Details Abstract / Simplified View: Data PC Address Instruction memory Instruction Register # Registers Register # ALU Address Data memory Register # Data Two types of functional units: HC TD5102
elements that operate on data values (combinational) elements that contain state (sequential) 5 State Elements Unclocked vs. Clocked Clocks used in synchronous logic when should an element that contains state be updated? falling edge cycle time rising edge HC TD5102 6 An unclocked state element The set-reset (SR) latch output depends on present inputs and also on past inputs R Q Q S Truth table: HC TD5102 R 0 0 1
1 S 0 1 0 1 Q Q 1 0 ? state change 7 Latches and Flip-flops Output is equal to the stored value inside the element (don't need to ask for permission to look at the value) Change of state (value) is based on the clock Latches: whenever the inputs change, and the clock is asserted Flip-flop: state changes only on a clock edge (edge-triggered methodology) A clocking methodology defines when signals can be read and written wouldn't want to read a signal at the same time it was being written HC TD5102 8 D-latch
Two inputs: the data value to be stored (D) the clock signal (C) indicating when to read & store D Two outputs: the value of the internal state (Q) and it's complement C Q _ Q D HC TD5102 D C Q 9 D flip-flop Output changes only on the clock edge D D C D latch Q D
Q D latch _ C Q Q _ Q C D C Q HC TD5102 10 Our Implementation An edge triggered methodology Typical execution: read contents of some state elements, send values through some combinational logic, write results to one or more state elements State element 1 Combinational logic State element
2 Clock cycle HC TD5102 11 Register File 3-ported: one write, two read ports Read reg. #1 Read data 1 Read reg.#2 Read data 2 Write reg.# Write data Write HC TD5102 12 Register file: read ports Register file built using D flip-flops Read register number 1 Register 0 Register 1 M Register n 1 u x
Read data 1 Register n Read register number 2 M u Read data 2 x Implementation of the read ports HC TD5102 13 Register file: write port Note: we still use the real clock to determine when to write W r ite 0 R e g is te r n um b e r C R e g iste r 0 1 D n -to -1 d e co d e r C n 1 R e g iste r 1
D n C R eg is te r n 1 D C R e gis te r n R e g ister d ata HC TD5102 D 14 Simple Implementation Include the functional units we need for each instruction Instruction address MemWrite PC Instruction Add Sum Instruction memory Address a. Instruction memory 5 Register numbers 5 5 Data
b. Program counter 3 Read register 1 Read register 2 Registers Write register Write data c. Adder ALU control Write data Read data Data memory Data Sign extend 32 MemRead a. Data memory unit Read data 1 16 b. Sign-extension unit Zero
ALU ALU result Read data 2 RegWrite a. Registers HC TD5102 b. ALU Why do we need this stuff? 15 Building the Datapath Use multiplexors to stitch them together PCSrc M u x Add Add ALU result 4 Shift left 2 Registers PC Read address Instruction Instruction memory Read register 1 Read Read
data 1 register 2 Write register Write data RegWrite 16 HC TD5102 ALUSrc Read data 2 Sign extend M u x 32 3 ALU operation Zero ALU ALU result MemWrite MemtoReg Address Read data Data Write memory data M u x
MemRead 16 Our Simple Control Structure All of the logic is combinational We wait for everything to settle down, and the right thing to be done ALU might not produce right answer right away we use write signals along with clock to determine when to write Cycle time determined by length of the longest path S tate element 1 Combinational logic State element 2 Clock cycle We are ignoring some details like setup and hold times ! HC TD5102 17 Control Selecting the operations to perform (ALU, read/write, etc.)
Controlling the flow of data (multiplexor inputs) Information comes from the 32 bits of the instruction Example: add $8, $17, $18 000000 op HC TD5102 Instruction Format: 10001 rs 10010 rt 01000 rd 00000 shamt 100000 funct ALU's operation based on instruction type and function code 18 Control: 2 level implementation 31 6
Control 2 26 instruction register Opcode bit 00: lw, sw 01: beq 10: add, sub, and, or, slt Control 1 Funct. HC TD5102 2 ALUop 5 6 3 ALUcontrol 000: and 001: or 010: add 110: sub 111: set on less than ALU 0 19 Datapath with Control 0 M u x Add 4
Instruction  PC Instruction memory Instruction  Instruction  Zero ALU ALU result Address Read register 1 Instruction  Instruction  1 Shift left 2 RegDst Branch MemRead MemtoReg Control ALUOp MemWrite ALUSrc RegWrite Instruction  Read address ALU Add result
0 M u x 1 Read data 1 Read register 2 Registers Read Write data 2 register 0 M u x 1 Write data 16 Sign extend Write data Read data Data memory 1 M u x 0 32
ALU control Instruction  HC TD5102 20 ALU Control1 What should the ALU do with this instruction example: lw $1, 100($2) HC TD5102 2 1 op rs rt 100 16 bit offset ALU control input 000 001 010 110 111 35 AND OR
add subtract set-on-less-than Why is the code for subtract 110 and not 011? 21 ALU Control1 Must describe hardware to compute 3-bit ALU control input given instruction type 00 = lw, sw 01 = beq, 10 = arithmetic function code for arithmetic ALU Operation class, computed from instruction type Describe it using a truth table (can turn into gates): intputs HC TD5102 ALUOp ALUOp1 ALUOp0 0 0 X 1 1 X 1 X 1 X 1 X
1 X F5 X X X X X X X Funct field F4 F3 F2 F1 X X X X X X X X X 0 0 0 X 0 0 1 X 0 1 0 X 0 1 0 X 1 0 1 outputs Operation F0 X X 0 0 0 1 0 010 110 010 110 000 001 111 22 ALU Control1
Simple combinational logic (truth tables) ALUOp ALU control block ALUOp0 ALUOp1 F3 F (5 0) F2 F1 Operation2 Operation1 Operation Operation0 F0 HC TD5102 23 Deriving Control2 signals Input 6-bits 9 control (output) signals Memto- Reg Mem Mem Instruction RegDst ALUSrc Reg Write Read Write Branch ALUOp1 ALUp0 R-format 1 0 0 1 0 0 0 1 0
0 1 1 1 1 0 0 0 0 lw X 1 X 0 0 1 0 0 0 sw X 0 X 0 0 0 1 0 1 beq Determine these control signals directly from the opcodes: R-format: 0 lw: 35 sw: 43 beq: 4 HC TD5102 24 Control 2
Inputs Op5 Op4 PLA example implementation Op3 Op2 Op1 Op0 Outputs R-format Iw sw beq RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOpO HC TD5102 25 Single Cycle Implementation Calculate cycle time assuming negligible delays except: memory (2ns), ALU and adders (2ns), register file access (1ns)
PCSrc Add 4 RegWrite Instruction [25 21] PC Read address Instruction [31 0] Instruction memory Instruction [20 16] 1 M u Instruction [15 11] x 0 RegDst Instruction [15 0] Read register 1 Read register 2 Read data 1 Read Write data 2 register Write Registers data 16 Sign 32 extend Shift
left 2 ALU Add result 1 M u x 0 MemWrite ALUSrc 1 M u x 0 ALU control Zero ALU ALU result MemtoReg Address Write data Read data Data memory 1 M u x 0 MemRead
Instruction [5 0] ALUOp HC TD5102 26 Single Cycle Implementation Memory (2ns), ALU & adders (2ns), reg. file access (1ns) Fixed length clock: longest instruction is the lw which requires 8 ns Variable clock length (not realistic, just as exercise): HC TD5102 R-instr: Load: Store: Branch: Jump: 6 ns 8 ns 7 ns 5 ns 2 ns Average depends on instruction mix (see pg 374) 27
Where we are headed Single Cycle Problems: what if we had a more complicated instruction like floating point? wasteful of area: NO Sharing of Hardware resources One Solution: use a smaller cycle time have different instructions take different numbers of cycles a multicycle datapath: Instruction register PC Address IR ALU Registers Memory data register MDR HC TD5102 A Register # Instruction Memory or data
Data Data ALUOut Register # B Register # 28 Multicycle Approach We will be reusing functional units Add registers after every major functional unit Our control signals will not be determined solely by instruction HC TD5102 ALU used to compute address and to increment PC Memory used for instruction and data e.g., what should the ALU do for a subtract instruction? Well use a finite state machine (FSM) or microcode for control 29 Review: finite state machines
Finite state machines: a set of states and next state function (determined by current state and the input) output function (determined by current state and possibly input) Current state Inputs Next-state function Next state Clock Output function HC TD5102 Outputs Well use a Moore machine (output based only on current state) 30 Multicycle Approach Break up the instructions into steps, each step takes a cycle At the end of a cycle
store values for use in later cycles (easiest thing to do) introduce additional internal registers Notice: we distinguish HC TD5102 balance the amount of work to be done restrict each cycle to use only one major functional unit processor state: programmer visible registers internal state: programmer invisible registers (like IR, MDR, A, B, and ALUout) 31 Multicycle Approach PC 0 M u x 1 Address Memory MemData Write data Instruction  Read register 1 Instruction 
Read Read data 1 register 2 Registers Write Read register data 2 Instruction  Instruction  Instruction register Instruction  Memory data register HC TD5102 0 M u x 1 0 M u x 1 A B Sign extend 32 Zero
ALU ALU result ALUOut 0 4 Write data 16 0 M u x 1 1M u 2x 3 Shift left 2 32 Multicycle Approach Note that previous picture does not include: branch support jump support Control lines and logic Tclock > max (ALU delay, Memory access, Regfile access)
See book for complete picture HC TD5102 33 Five Execution Steps Instruction Fetch Instruction Decode and Register Fetch Execution, Memory Address Computation, or Branch Completion Memory Access or R-type instruction completion Write-back step INSTRUCTIONS INSTRUCTIONSTAKE TAKEFROM FROM33--55CYCLES! CYCLES! HC TD5102 34 Step 1: Instruction Fetch
Use PC to get instruction and put it in the Instruction Register Increment the PC by 4 and put the result back in the PC Can be described succinctly using RTL "Register-Transfer Language" IR = Memory[PC]; PC = PC + 4; Can we figure out the values of the control signals? What is the advantage of updating the PC now? HC TD5102 35 Step 2: Instruction Decode and Register Fetch Read registers rs and rt in case we need them Compute the branch address in case the instruction is a branch Previous two actions are done optimistically!! RTL: A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC+(sign-extend(IR[15-0])<< 2); HC TD5102 We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic) 36
Step 3 (instruction dependent) ALU is performing one of four functions, based on instruction type Memory Reference: ALUOut = A + sign-extend(IR[15-0]); R-type: ALUOut = A op B; Branch: if (A==B) PC = ALUOut; Jump: PC = PC[31-28] || (IR[25-0]<<2) HC TD5102 37 Step 4 (R-type or memory-access) Loads and stores access memory MDR = Memory[ALUOut]; or Memory[ALUOut] = B; R-type instructions finish Reg[IR[15-11]] = ALUOut; The write actually takes place at the end of the cycle on the edge
HC TD5102 38 Write-back step Memory read completion step Reg[IR[20-16]]= MDR; What about all the other instructions? HC TD5102 39 Summary execution steps Steps taken to execute any instruction class Step name Instruction fetch Action for R-type instructions Instruction decode/register fetch Action for memory-reference Action for instructions branches IR = Memory[PC] PC = PC + 4 A = Reg [IR[25-21]] B = Reg [IR[20-16]] ALUOut = PC + (sign-extend (IR[15-0]) << 2) Execution, address computation, branch/ jump completion ALUOut = A op B ALUOut = A + sign-extend
(IR[15-0]) Memory access or R-type completion Reg [IR[15-11]] = ALUOut Load: MDR = Memory[ALUOut] or Store: Memory [ALUOut] = B Memory read completion HC TD5102 if (A ==B) then PC = ALUOut Action for jumps PC = PC [31-28] II (IR[25-0]<<2) Load: Reg[IR[20-16]] = MDR 40 Simple Questions How many cycles will it take to execute this code? lw $t2, 0($t3) lw $t3, 4($t3) beq $t2, $t3, L1 add $t5, $t2, $t3 sw $t5, 8($t3) L1: ... HC TD5102
#assume not taken What is going on during the 8th cycle of execution? In what cycle does the actual addition of $t2 and $t3 takes place? 41 Implementing the Control Value of control signals is dependent upon: Use the information we have accumulated to specify a finite state machine (FSM) HC TD5102 what instruction is being executed which step is being performed specify the finite state machine graphically, or use microprogramming Implementation can be derived from specification 42 FSM: high level view Start/reset Instruction fetch, decode and register fetch Memory access instructions HC TD5102 R-type
instructions Branch instruction Jump instruction 43 Memory address computation ( Op 2 = 'L or W ') (Op = ALUSrcA = 0 ALUSrcB = 11 ALUOp = 00 (O p ') 'S W ALUSrcA = 1 ALUSrcB = 10 ALUOp = 00 p = 'S W ')
Memory access e) Branch completion Execution 6 = yp R -t 'B How many state bits will we need? (Op = 'LW') 1 = Start MemRead ALUSrcA = 0 IorD = 0 IRWrite ALUSrcB = 01 ALUOp = 00 PCWrite PC Source = 00 (O p
0 (Op = 'J') Graphical Specification of FSM Instruction decode/ register fetch Instruction fetch PCWrite PC Source = 10 Finite State Machine for Control PCWrite PCWriteCond IorD MemRead Implementation: MemWrite IRWrite Control logic MemtoReg PCSource ALUOp Outputs ALUSrcB ALUSrcA RegWrite RegDst NS3 NS2 NS1 NS0 Instruction register opcode field
HC TD5102 S0 S1 S2 S3 Op0 Op1 Op2 Op3 Op4 Op5 Inputs State register 45 Op4 opcode PLA Implementation Op5 (see fig C.14) Op3 Op2 Op1 Op0 S3
current state S2 S1 S0 If I picked a horizontal or vertical line could you explain it ? What type of FSM is used? datapath control PCWrite PCWriteCond IorD MemRead MemWrite IRWrite MemtoReg PCSource1 PCSource0 ALUOp1 ALUOp0 ALUSrcB1 ALUSrcB0 ALUSrcA RegWrite RegDst NS3 NS2 NS1 NS0 next state HC TD5102
46 ROM Implementation ROM = "Read Only Memory" values of memory locations are fixed ahead of time A ROM can be used to implement a truth table if the address is m-bits, we can address 2m entries in the ROM our outputs are the bits of data that the address points to ROM n bits m bits address 0 0 0 0 0 1 0 1 0 0 1 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 1 1 0 0 0 0
data 0 1 1 0 1 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 1 0 1 m is the "heigth", and n is the "width" HC TD5102 47 ROM Implementation HC TD5102 How many inputs are there? 6 bits for opcode, 4 bits for state = 10 address lines (i.e., 210 = 1024 different addresses) How many outputs are there? 16 datapath-control outputs, 4 state bits = 20 outputs ROM is 210 x 20 = 20K bits (very large and a rather unusual size) Rather wasteful, since for lots of the entries, the outputs are
the same i.e., opcode is often ignored 48 ROM Implementation Cheaper implementation: Exploit the fact that the FSM is a Moore machine ==> Control outputs only depend on current state and not on other incoming control signals ! Next state depends on all inputs Break up the table into two parts 4 state bits tell you the 16 outputs, 24 x 16 bits of ROM 10 bits tell you the 4 next state bits, 210 x 4 bits of ROM Total number of bits: 4.3K bits of ROM HC TD5102 49 ROM vs PLA PLA is much smaller can share product terms (ROM has an entry (=address) for every product term only need entries that produce an active output
can take into account don't cares Size of PLA: (#inputs #product-terms) + (#outputs #product-terms) HC TD5102 For this example: (10x17)+(20x17) = 460 PLA cells PLA cells usually slightly bigger than the size of a ROM cell 50 Exceptions Unexpected events External: interrupt Internal: exception e.g. Overflow, Undefined instruction opcode, Software trap, Page fault How to handle exception? HC TD5102
e.g. I/O request Jump to general entry point (record exception type in status register) Jump to vectored entry point Address of faulting instruction has to be recorded ! 51 Exceptions Changes needed: see fig. 5.48 / 5.49 / 5.50 Extend PC input mux with extra entry with fixed address: C000000hex Add EPC register containing old PC (well use the ALU to decrement PC with 4) Cause register (one bit in our case) containing: 0: undefined instruction 1: ALU overflow Add 2 states to FSM HC TD5102 extra input ALU src2 needed with fixed value 4 undefined instr. state #10 overflow state #11 52
Reduces energy consumption in AHUs, mid-walls, evaporators, condensors, cassettes, consols, and hideaways by keeping them in pristine condition for 12 months of the year. Provides Facilities Managers with accurate feedback on the results of the O3 coil cleaning programmes.
Let's Talk About the Attack! Legal: The "meet and greet". ... White attack uses her free hand to grasp the defender's arm and get around her. Note: ... "The head of the stick dropping below the 10 and 2 o'clock...
A Model for Church Growth ... Prayer Warriors, IOCC, Visitation, Correspondence a. Activity Frequency/Commitment Level. Activities for Group Peace & Unity - Social - Individuals helping the Group during Group Activities. Special Care Visitation - Correspondence
The aim is to make a material with the appearance and clarity of standard glass but with effective protection from small arms. usually made from a combination of two or more types of glass, one hard and one soft. ......
[HW-182P] Touch firmware development: Talking HID to Windows [APP-185T] Make great Metro style apps that are touch-optimized using HTML5 [APP-186T] Build advanced touch apps in Windows 8 [PLAT-192C] Bring pen and touch input to your Metro style apps with ink...
Team has neither control nor influence, and should not take on this problem. Vision: The model of how the organization would like to be perceived by those internal and those external to the organization. Appendix 2 - QI Terminology
Welcome to CASCON 2006 Workshop on Model Fusion October 17, 2006 Some people to thank Organizers: Kim Letkeman (IBM), Jourgen Dingel (Queens), Zinovy Diskin (Queens), Marsha Chechik (Toronto), Steve Easterbrook (Toronto), Sebastian Uchitel (Imperial College) Additional speakers: Jon Whittle (George...
Ready to download the document? Go ahead and hit continue!