Machine-Level Programming I: Basics 15-213/18-213: Introduction to Computer

Machine-Level Programming I: Basics 15-213/18-213: Introduction to Computer

Machine-Level Programming I: Basics 15-213/18-213: Introduction to Computer Systems Instructor: EJ Kim Slides are taken from the Authors and may have been modified for the class. Authors: Randal E. Bryant and David R. OHallaron Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 1 Today: Machine Programming I: Basics History of Intel processors and architectures C, assembly, machine code Assembly Basics: Registers, operands, move Arithmetic & logical operations Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 2 Intel x86 Evolution: Milestones Name Date Transistors MHz 8086 1978 29K 5-10 First 16-bit Intel processor. Basis for IBM PC & DOS 1MB address space

386 1985 275K 16-33 First 32 bit Intel processor , referred to as IA32 Added flat addressing, capable of running Unix Pentium 4E 2004 125M 2800-3800 First 64-bit Intel x86 processor, referred to as x86-64 Core 2 2006 291M 1060-3500 First multi-core Intel processor Core i7 2008 731M 1700-3900

Four cores (our shark machines) Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 3 Intel x86 Processors, cont. Machine Evolution 386 Pentium Pentium/MMX PentiumPro Pentium III Pentium 4 Core 2 Duo Core i7 1985 1993 1997 1995 1999 2001 2006 2008 0.3M 3.1M 4.5M 6.5M 8.2M 42M 291M 731M Added Features Instructions to support multimedia operations Instructions to enable more efficient conditional operations Transition from 32 bits to 64 bits

More cores Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 4 2015 State of the Art Core i7 Broadwell 2015 Desktop Model 4 cores Integrated graphics 3.3-3.8 GHz 65W Server Model 8 cores Integrated I/O 2-2.6 GHz 45W Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 5 x86 Clones: Advanced Micro Devices (AMD) Historically AMD has followed just behind Intel A little bit slower, a lot cheaper Then Recruited top circuit designers from Digital Equipment Corp. and other downward trending companies Built Opteron: tough competitor to Pentium 4 Developed x86-64, their own extension to 64 bits Recent Years Intel got its act together

Leads the world in semiconductor technology AMD has fallen behind Relies on external semiconductor manufacturer Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 6 Intels 64-Bit History 2001: Intel Attempts Radical Shift from IA32 to IA64 Totally different architecture (Itanium) Executes IA32 code only as legacy Performance disappointing 2003: AMD Steps in with Evolutionary Solution x86-64 (now called AMD64) Intel Felt Obligated to Focus on IA64 Hard to admit mistake or that AMD is better 2004: Intel Announces EM64T extension to IA32 Extended Memory 64-bit Technology Almost identical to x86-64! All but low-end x86 processors support x86-64 But, lots of code still runs in 32-bit mode Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 7 Our Coverage IA32 The traditional x86 x86-64 The standard shark> gcc hello.c

shark> gcc m64 hello.c Presentation Book covers x86-64 Web aside on IA32 We will only cover x86-64 Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 8 Today: Machine Programming I: Basics History of Intel processors and architectures C, assembly, machine code Assembly Basics: Registers, operands, move Arithmetic & logical operations Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 9 Definitions Architecture: (also ISA: instruction set architecture) The parts of a processor design that one needs to understand or write assembly/machine code. Examples: instruction set specification, registers. Microarchitecture: Implementation of the architecture. Examples: cache sizes and core frequency. Code Forms: Machine Code: The byte-level programs that a processor executes Assembly Code: A text representation of machine code

Example ISAs: Intel: x86, IA32, Itanium, x86-64 ARM: Used in almost all mobile phones Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 10 Assembly/Machine Code View CPU Addresses Registers Data PC Condition Codes Instructions Memory Code Data Stack Programmer-Visible State PC: Program counter Address of next instruction Called RIP (x86-64) Register file Heavily used program data Memory Byte addressable array

Code and user data Stack to support procedures Condition codes Store status information about most recent arithmetic or logical operation Used for conditional branching Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 11 Turning C into Object Code Code in files p1.c p2.c Compile with command: gcc Og p1.c p2.c -o p Use basic optimizations (-Og) [New to recent versions of GCC] Put resulting binary in file p text C program (p1.c p2.c) Compiler (gcc Og -S) text Asm program (p1.s p2.s) Assembler (gcc or as) binary Object program (p1.o p2.o) Linker (gcc or ld) binary Static libraries (.a) Executable program (p)

Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 12 Compiling Into Assembly C Code (sum.c) long plus(long x, long y); void sumstore(long x, long y, long *dest) { long t = plus(x, y); *dest = t; } Generated x86-64 Assembly sumstore: pushq movq call movq popq ret %rbx %rdx, %rbx plus %rax, (%rbx) %rbx Obtain (on shark machine) with command gcc Og S sum.c Produces file sum.s Warning: Will get very different results on non-Shark machines (Andrew Linux, Mac OS-X, ) due to different versions of gcc and different compiler settings.

Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 13 Assembly Characteristics: Data Types Integer data of 1, 2, 4, or 8 bytes Data values Addresses (untyped pointers) Floating point data of 4, 8, or 10 bytes Code: Byte sequences encoding series of instructions No aggregate types such as arrays or structures Just contiguously allocated bytes in memory Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 14 Assembly Characteristics: Operations Perform arithmetic function on register or memory data Transfer data between memory and register Load data from memory into register Store register data into memory Transfer control Unconditional jumps to/from procedures Conditional branches Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 15 Object Code Code for sumstore 0x0400595: 0x53 0x48 0x89

0xd3 0xe8 0xf2 0xff 0xff 0xff 0x48 0x89 0x03 0x5b 0xc3 Assembler Translates .s into .o Binary encoding of each instruction Nearly-complete image of executable code Missing linkages between code in different files Total of 14 bytes Each instruction 1, 3, or 5 bytes Starts at address 0x0400595 Linker Resolves references between files Combines with static run-time libraries E.g., code for malloc, printf Some libraries are dynamically linked Linking occurs when program begins execution Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition

16 Machine Instruction Example C Code *dest = t; Store value t where designated by dest movq %rax, (%rbx) Assembly Move 8-byte value to memory Quad words in x86-64 parlance Operands: 0x40059e: 48 89 03 t: Register %rax dest: Register %rbx *dest: Memory M[%rbx] Object Code 3-byte instruction Stored at address 0x40059e Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 17 Disassembling Object Code Disassembled 0000000000400595 400595: 53

400596: 48 89 400599: e8 f2 40059e: 48 89 4005a1: 5b 4005a2: c3 : d3 ff ff ff 03 push mov callq mov pop retq %rbx %rdx,%rbx 400590 %rax,(%rbx) %rbx Disassembler objdump d sum Useful tool for examining object code Analyzes bit pattern of series of instructions Produces approximate rendition of assembly code Can be run on either a.out (complete executable) or .o file Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 18 Today: Machine Programming I: Basics

History of Intel processors and architectures C, assembly, machine code Assembly Basics: Registers, operands, move Arithmetic & logical operations Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 19 x86-64 Integer Registers %rax %eax %r8 %r8d %rbx %ebx %r9 %r9d %rcx %ecx %r10 %r10d %rdx

%edx %r11 %r11d %rsi %esi %r12 %r12d %rdi %edi %r13 %r13d %rsp %esp %r14 %r14d %rbp %ebp %r15 %r15d

Can reference low-order 4 bytes (also low-order 1 & 2 bytes) Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 20 general purpose Origin Some History: IA32 Registers (mostly obsolete) %eax %ax %ah %al accumulate %ecx %cx %ch %cl counter %edx %dx %dh

%dl data %ebx %bx %bh %bl base %esi %si source index %edi %di destination index %esp %sp %ebp %bp 16-bit virtual registers

(backwards compatibility) Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition stack pointer base pointer 21 Moving Data Moving Data movq Source, Dest: Operand Types Immediate: Constant integer data Example: $0x400, $-533 Like C constant, but prefixed with $ Encoded with 1, 2, or 4 bytes Register: One of 16 integer registers Example: %rax, %r13 But %rsp reserved for special use Others have special uses for particular instructions %rax %rcx %rdx %rbx %rsi %rdi %rsp %rbp %rN Memory: 8 consecutive bytes of memory at address given by register Simplest example: (%rax) Various other address modes

Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 22 movq Operand Combinations Source movq Dest Src,Dest C Analog Imm Reg movq $0x4,%rax Mem movq $-147,(%rax) temp = 0x4; Reg Reg movq %rax,%rdx Mem movq %rax,(%rdx) temp2 = temp1; Mem Reg movq (%rax),%rdx *p = -147; *p = temp;

temp = *p; Cannot do memory-memory transfer with a single instruction Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 23 Simple Memory Addressing Modes Normal (R) Mem[Reg[R]] Register R specifies memory address Aha! Pointer dereferencing in C movq (%rcx),%rax Displacement D(R) Mem[Reg[R]+D] Register R specifies start of memory region Constant displacement D specifies offset movq 8(%rbp),%rdx Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 24 Example of Simple Addressing Modes void swap (long *xp, long *yp) { long t0 = *xp; long t1 = *yp; *xp = t1; *yp = t0; }

Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition swap: movq movq movq movq ret (%rdi), %rax (%rsi), %rdx %rdx, (%rdi) %rax, (%rsi) 25 Understanding Swap() Memory void swap (long *xp, long *yp) { long t0 = *xp; long t1 = *yp; *xp = t1; *yp = t0; } Register %rdi %rsi %rax %rdx Value xp yp t0

t1 Registers %rdi %rsi %rax %rdx swap: movq movq movq movq ret Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition (%rdi), %rax (%rsi), %rdx %rdx, (%rdi) %rax, (%rsi) # # # # t0 = *xp t1 = *yp *xp = t1 *yp = t0 26 Understanding Swap() Memory Registers %rdi

0x120 %rsi 0x100 Address 123 0x118 0x110 %rax 0x108 %rdx swap: movq movq movq movq ret 0x120 456 (%rdi), %rax (%rsi), %rdx %rdx, (%rdi) %rax, (%rsi) Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition

# # # # 0x100 t0 = *xp t1 = *yp *xp = t1 *yp = t0 27 Understanding Swap() Memory Registers %rdi 0x120 %rsi 0x100 %rax 123 Address 123 0x118 0x110 0x108

%rdx swap: movq movq movq movq ret 0x120 456 (%rdi), %rax (%rsi), %rdx %rdx, (%rdi) %rax, (%rsi) Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition # # # # 0x100 t0 = *xp t1 = *yp *xp = t1 *yp = t0 28 Understanding Swap() Memory Registers

%rdi 0x120 %rsi 0x100 %rax 123 %rdx 456 swap: movq movq movq movq ret Address 123 0x120 0x118 0x110 0x108 456 (%rdi), %rax (%rsi), %rdx %rdx, (%rdi) %rax, (%rsi)

Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition # # # # 0x100 t0 = *xp t1 = *yp *xp = t1 *yp = t0 29 Understanding Swap() Memory Registers %rdi 0x120 %rsi 0x100 %rax 123 %rdx 456 swap:

movq movq movq movq ret Address 456 0x120 0x118 0x110 0x108 456 (%rdi), %rax (%rsi), %rdx %rdx, (%rdi) %rax, (%rsi) Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition # # # # 0x100 t0 = *xp t1 = *yp *xp = t1 *yp = t0 30

Understanding Swap() Memory Registers %rdi 0x120 %rsi 0x100 %rax 123 %rdx 456 swap: movq movq movq movq ret Address 456 0x120 0x118 0x110 0x108 123

(%rdi), %rax (%rsi), %rdx %rdx, (%rdi) %rax, (%rsi) Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition # # # # 0x100 t0 = *xp t1 = *yp *xp = t1 *yp = t0 31 Simple Memory Addressing Modes Normal (R) Mem[Reg[R]] Register R specifies memory address Aha! Pointer dereferencing in C movq (%rcx),%rax Displacement D(R) Mem[Reg[R]+D] Register R specifies start of memory region Constant displacement D specifies offset

movq 8(%rbp),%rdx Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 32 Complete Memory Addressing Modes Most General Form D(Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]+ D] D: Rb: Ri: S: Constant displacement 1, 2, or 4 bytes Base register: Any of 16 integer registers Index register: Any, except for %rsp Scale: 1, 2, 4, or 8 (why these numbers?) Special Cases (Rb,Ri) Mem[Reg[Rb]+Reg[Ri]] D(Rb,Ri)Mem[Reg[Rb]+Reg[Ri]+D] (Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]] Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 33 Carnegie Mellon Address Computation Examples %rdx 0xf000 %rcx 0x0100

Expression Address Computation Address 0x8(%rdx) 0xf000 + 0x8 0xf008 (%rdx,%rcx) 0xf000 + 0x100 0xf100 (%rdx,%rcx,4) 0xf000 + 4*0x100 0xf400 0x80(,%rdx,2) 2*0xf000 + 0x80 0x1e080 Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 34 Today: Machine Programming I: Basics

History of Intel processors and architectures C, assembly, machine code Assembly Basics: Registers, operands, move Arithmetic & logical operations Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 35 Carnegie Mellon Address Computation Instruction leaq Src, Dst Src is address mode expression Set Dst to address denoted by expression Uses Computing addresses without a memory reference E.g., translation of p = &x[i]; Computing arithmetic expressions of the form x + k*y k = 1, 2, 4,x) or 8 long long m12(long m12(long x) { { return return x*12; x*12; } }

Example Converted to ASM by compiler: leaq leaq salq salq (%rdi,%rdi,2), (%rdi,%rdi,2), %rax %rax # # t t <<- x+x*2 x+x*2 $2, # $2, %rax %rax # return return t<<2 t<<2 Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 36 Carnegie Mellon Some Arithmetic Operations Two Operand Instructions: Format Computation addq Src,Dest Dest = Dest + Src subq Src,Dest Dest = Dest Src imulq Src,Dest Dest = Dest * Src salq Src,Dest Dest = Dest << Src Also called shlq sarq Src,Dest Dest = Dest >> Src Arithmetic shrq Src,Dest Dest = Dest >> Src Logical xorq Src,Dest Dest = Dest ^ Src

andq Src,Dest Dest = Dest & Src orq Src,Dest Dest = Dest | Src Watch out for argument order! No distinction between signed and unsigned int (why?) Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 37 Carnegie Mellon Some Arithmetic Operations One Operand Instructions incq decq negq notq Dest Dest Dest Dest Dest Dest Dest Dest = = = = Dest + 1 Dest 1 Dest ~Dest

See book for more instructions Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 38 Carnegie Mellon Arithmetic Expression Example long long arith arith (long (long x, x, long long y, y, long long z) z) { { long long t1 t1 = = x+y; x+y; long long t2 t2 = = z+t1; z+t1; long long t3 t3 = = x+4; x+4; long long t4

t4 = = y y * * 48; 48; long long t5 t5 = = t3 t3 + + t4; t4; long long rval rval = = t2 t2 * * t5; t5; return return rval; rval; } } arith: leaq addq leaq salq leaq imulq ret (%rdi,%rsi), %rax %rdx, %rax (%rsi,%rsi,2), %rdx $4, %rdx

4(%rdi,%rdx), %rcx %rcx, %rax Interesting Instructions leaq: address computation salq: shift imulq: multiplication But, only used once Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 39 Carnegie Mellon Understanding Arithmetic Expression Example arith: long long arith arith (long (long x, x, long long y, y, long long z) z) { { long long t1 t1 = = x+y; x+y; long long t2

t2 = = z+t1; z+t1; long long t3 t3 = = x+4; x+4; long long t4 t4 = = y y * * 48; 48; long long t5 t5 = = t3 t3 + + t4; t4; long long rval rval = = t2 t2 * * t5; t5; return return rval; rval; } } leaq addq leaq

salq leaq imulq ret Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition (%rdi,%rsi), %rax %rdx, %rax (%rsi,%rsi,2), %rdx $4, %rdx 4(%rdi,%rdx), %rcx %rcx, %rax Register Use(s) %rdi Argument x %rsi Argument y %rdx Argument z %rax t1, t2, rval %rdx t4

%rcx t5 # t1 # t2 # t4 # t5 # rval 40 Machine Programming I: Summary History of Intel processors and architectures Evolutionary design leads to many quirks and artifacts C, assembly, machine code New forms of visible state: program counter, registers, ... Compiler must transform statements, expressions, procedures into low-level instruction sequences Assembly Basics: Registers, operands, move The x86-64 move instructions cover wide range of data movement forms Arithmetic C compiler will figure out different instruction combinations to carry out computation Bryant and OHallaron, Computer Systems: A Programmers Perspective, Third Edition 41

Recently Viewed Presentations

  • Character Design for Animation and Games Week 5a:

    Character Design for Animation and Games Week 5a:

    Multiple Texture Maps: Color. The COLOR MAP, also called Diffuse or Albedo, is a powerful way to communicate detail if painted with the same lavish care we use in 2D concepts.When combined with specular, bump , and opacity maps, the...
  • Configuring CIFS - PiPo e2H - Soluciones TIC Avanzadas

    Configuring CIFS - PiPo e2H - Soluciones TIC Avanzadas

    A CIFS server is a "logical" file server that utilizes the CIFS protocol to transfer files to and from CIFS clients. To follow best practices, create the CIFS server on the prepared VDM. This allows the CIFS server to be...
  • Lecture 1: Course Introduction and Overview

    Lecture 1: Course Introduction and Overview

    Instruction Level Parallelism and Dynamic Execution March 11, 2002 Prof. David E. Culler Computer Science 252 Spring 2002 ... loop-level parallelism to exploit parallelism among iterations of a loop Vector is one way If not vector, then either dynamic via...
  • How to work with Complex Text - Alabama Learning Exchange

    How to work with Complex Text - Alabama Learning Exchange

    This is an exercise to take participants through the process of using complex text in their classroom. They need to wear their "teacher-learner" hat which is different than going through as a learner. It will prepare them for the planning...
  • DR. KIRA HUDSON BANKS & JENNIFER BLOME THE

    DR. KIRA HUDSON BANKS & JENNIFER BLOME THE

    I treat people fairly or as equals. Think about how I was raised or socialized and acknowledge whether or not that conditioning has shaped my views or behavior towards other races. Just admit your biases. Do not be ashamed of...
  • Quantitative Strategic Design of the HTS Compound Collection

    Quantitative Strategic Design of the HTS Compound Collection

    Can I get better at identifying high pi clusters? improve modelling and estimation of probability values Acknowledgements Stephen Pickett Darren Green Jameed Hussain Andrew Leach Andy Whittington Design of a Compound Screening Collection Gavin Harper Cheminformatics, Stevenage In the Past...
  • Diapositive 1

    Diapositive 1

    À la suite du décès accidentel du Dr. Camille Marcoux, reconnu pour son dévouement à la Basse-Côte-Nord et sa contribution au développement de la région, ses parents et amis ont mis sur pied une fondation en son honneur en 1973...
  • Visualizing Stats Canada Data - Carleton University

    Visualizing Stats Canada Data - Carleton University

    Accessing Canada's Open Data through APIs. Lucia Costanzo MA, MSc, MLIS. Welcome to the Visualizing Stats Canada Data using Tableau workshop! My name is Lucia Costanzo and I assist students, staff and faculty at the University of Guelph with analyzing...