Memory Continued Associate Cache Dr. John P. Abraham UTRGV CPU is unaware of cache CPU issues a memory address and expects a result back. Each RAM address does not need a label, simply wired to each block. The request is intercepted by the cache hardware that sits between the CPU and the memory. 64 byes blocks usually.
Cache is not unique to memory; there is a cache between hard drive and RAM (4 KB page size) Intel processor I7 with multicore Has Level 1 instruction cache and level 1 data cache for each core (32K), takes about 4 cycles. instructions dont need to be written back to RAM data may need to be written back, so use a dirty bit (next slide) Has level 2 cache for each core (256K), takes about 11 cycles. Also has a common level 3 cache for the entire CPU (8 M), takes about
40 cycles. RAM is generally 16 G, takes about 50 to 200 cycles. Associative Memory (cache) Each block is identified by a key or tag (recall from slide 1, RAM does not have need a tag). This tag goes along with the contents of the memory. https://miniwebtool.com/log-base-2-calculator/ good for cache calculations. Given RAM size of 128 K, cache memory size 16 K, block size 256 bytes and byte addressable. To address 128 K memory it requires 17 bits. So tag and offset should be 17 bits
Offset for 256 bytes is 8 bits The remaining bits (17-8=9) will be used for the Tag in a fully associative. So in addition to the contents of the memory, each line of cache should contain tag information, a dirty bit, and a valid bit Valid bit and dirty bit When computer boots cache is not populated with valid data. So the bit should be set to 0. When a block is brought in the valid bit should be set to 1. This
should happen also when processes are swapped in. When a data is updated, the new value can be written directly to cache and memory immediately (write-through), or it can just update the cache anticipating further changes; and set the dirty bit to 1. Before evicting the line to bring another line, if it has a dirty bit, it should be written to the RAM (this is called Write-Back) Another fully associative example Given RAM of 4GB, cache 4MB, and block size 1KB.
Bits required to address RAM is 32 Total number bits required to address cache is 22
Number of cache lines is 22-10 = 12 so we need this many comparisons. Required bits for block offset is 10 Tag for fully associative is 32-10 = 22 The tag will indicate where in RAM this address can be found. Tag__22___Offset 10 Cache hardware need to do parallel comparisons of 2^12 and do OR operations on all of them to determine if there is a Hit or Miss. Same memory but different block size
RAM 4 GB. Block Size 64 bytes. Requires 32 bits 64 bytes require 6 bits 26 6 Tag Offset 4-way associative cache using the same numbers Given RAM of 4GB, cache 4MB, and block size 1KB. Bits required to address RAM is 32 Total number bits required to address cache is 22
Number of cache lines is 22-10 = 12, but these are divided into 4 lines per set, giving us 2^10 sets Required bits for block offset is 10 12 10 10 Tag SetNo Offset Reads from the RAM - Cache hit and miss
CPU issues an address, cache hardware intercepts it and extracts tag field. Compares the cache tags (if direct mapping it is easy, only need to check one line). If found and if the valid bit is on, we have a hit. Return the value to the CPU Otherwise it is a miss. Cache must get the block from the RAM. Before retrieving the data one line must be evicted. If it is direct mapped, we know which one. But before evicting, check dirty bit, if it is on, write the content back to the RAM and then replace If not direct mapped, use an algorithm such as LRU, random, FIFO or LIFO to determine which line to be evicted then check dirty bit, and such.
Writes to the RAM CPU issues the memory address and data to be written Compare address with cache tag, same as previous slide If cache has the block, then use one of the following: Write-through. Update the cache and the RAM immediately Write-back. Update the cache only and turn on the dirty bit. Only writes to the RAM when that line is evicted. If cache does not have the block use one of the following:
Write-allocate: Read the block from RAM into the cache and update the cache with CPU instruction and set dirty bit to 1. Only written to RAM when the block is evicted. No-Write-allocate (or write around): Send the write on through to memory, do not load into cache. Write-through may be better with write-hits and no-write allocate with write misses. Your Textbook Appendix B What I covered is mostly from Appendix B. Cache performance: Avg mem access time = Hit time + miss rate x miss penalty
The access time can be further dived into instruction hit and data hit. In out of order executions, instruction may not be in the cache Six basic cache optimizations 1. Larger block sizes reduce miss rate 2. Larger cache continued 3. Higher associativity
Continued 4. Multilevel caches 5. Give priority to reads before writes 6. Avoid address translation during indexing of cache This is due to virtual machines and virtual caches. Not explained here in this class. Chapter 2: Memory Hierarchy Design We already discussed this.
Text gave 10 optimizations to improve cache performance, many of these wont make sense to you until I cover pipelining and ILP. Book also discussed memory protection and virtual machines when using cache. Coherency of cache when multiple cores or CPUs are used also discussed as data will be distributed to several caches.
Synonyms brutish vile cruel Antonyms humane kind effete (adj.) - worn out; barren Although worn down by age and a life of hard work, the man was far from effete. exhausted spent and sterile vital vigorous Synonyms Antonyms shard (noun)...
Family branding: Challenge in Developing New Products Need to shorten products "time to market" Reduce time between idea generation and product availability Reduce market risk and internal risk A short "time to market" can create a competitive advantage Challenge in...
>5 <5. Binary Search Tree (BST) A binary search tree is a binary tree that is ordered and has no duplicate values.In other words, for every node:. All nodes in the left subtree have values that are less than the...
CPU comunicaciones: basada en MP de 16 bits, comunicaciones disponibles RS-232, RS-485, Fibra Optica Bus frontal + módulo MMI (+ Display gráfico + 6 teclas de control) PANEL DE ALARMAS 96 Alarmas configurables (32 de protección, 48 de control, 16...
Claims and characters. You are going to be divided into groups. Each group will be assigned a character. Your group will make a broad claim about your character's motivations in ACT 3. You will provided quotes (textual evidence) to support...
Adaptive Leadership CHAPTER 13: SEE YOURSELF AS A SYSTEM continued. Photo courtesy of referenceforbusiness.com. When you combine all of these together, or as the book describes as, "situational insights," along with the insights into yourself as a system, you will...
Ready to download the document? Go ahead and hit continue!