Configurable Processors: A New Age of (Not So Hard)ware and ...

Advanced Processor Architectures for Embedded Systems
Witawas Srisa-an
CSCE 496: Embedded Systems Design and Implementation

Objectives
- Discuss ASICs, FPGA-based systems, and general-purpose processors (GPPs)
- Analyze the operating requirements for today's embedded processors
- Observe the architectural differences between state-of-the-art processors for embedded systems and high-performance general-purpose processors
  - Tensilica Xtensa
  - Stretch S5000

Embedded Processors Requirements

- operate in memory-constrained environments
- must be energy efficient
- must be low cost
- may have to be good at a common set of tasks
  - matrix multiplication, encryption, filtering (FIR), network packet processing, etc.

Implications
- low memory footprint
  - simplified instruction set
  - 16-bit, 24-bit
  - may not need support for VM
  - may lack hardware MMUs
- energy efficient
  - less complex (smaller number of transistors)
  - simple pipeline stages
  - less cache memory on chip
  - simple floating-point units
  - larger transistors and slower clocks
  - integrated function-specific components for common tasks

Implications (cont.)
- low cost
  - share IP cores to reduce development cost
    - ARM, MIPS, etc.
  - use older semiconductor process technologies (e.g., 250 nm instead of 90 nm)
- task specific
  - built-in DSP unit
  - wide data bus (more data per movement)
  - may need support for adding functions to the cores
  - may need field reconfigurability

Rationales
from The Death of Micro-Processors, Nick Tredennick and Brion Shimamoto, Embedded Systems Programming

Rationales (cont.)
- Studies have shown that custom hardware components often require much less energy to complete their tasks than the same tasks running on general-purpose processors. [1]
- An ASIC is custom logic for a particular application. Custom logic can be orders of magnitude more efficient than microprocessor-based solutions. [2]

[1] Lach et al., Power-Efficient Adaptable Wireless Sensor Networks, Proceedings of the International Conference on Military and Aerospace Programmable Logic Devices (MAPLD), September 2003.
[2] Tredennick and Shimamoto, The Death of Micro-Processors, Embedded Systems Programming.

Application Specific ICs (ASICs)
- provide custom design solutions for particular problems
- are fixed solutions that need wide market acceptance to bring the cost down
- require extensive knowledge of hardware design
- are not field-reconfigurable
- can have a large non-recurring engineering (NRE) cost

ASICs (cont.)

Technology   Mask cost
90 nm        $1,000,000
180 nm       $250,000
250 nm       $120,000
350 nm       $60,000

Source: Wayne Wolf, FPGA-Based System Design, Prentice Hall, 2004

FPGA-Based Systems
- Field-programmable gate arrays (FPGAs)
  - are slower and require more power than custom designs
  - are more expensive, but involve no wait between completing a design and having a working chip
  - are great for prototyping
  - are also reusable

FPGAs
- SRAM-based: volatile
  - Altera Flex, Stratix, Cyclone, Apex
- Antifuse: one-time programmable
  - Actel
- EEPROM: non-volatile
  - Altera Max
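
The mask costs tabulated above, set against the higher per-unit price of FPGAs, imply a break-even production volume. The following is a minimal sketch, assuming the mask set is the only NRE cost and that an FPGA design carries none; the unit prices passed as `asic_unit_cost` and `fpga_unit_cost` are hypothetical placeholders, not figures from the source.

```python
import math

# Mask costs (USD) per process node in nm, as quoted in the table above.
MASK_COST_USD = {90: 1_000_000, 180: 250_000, 250: 120_000, 350: 60_000}


def break_even_volume(mask_cost, asic_unit_cost, fpga_unit_cost):
    """Return the smallest unit count at which the ASIC is no more expensive.

    Simplifying assumptions (not from the source): the mask set is the only
    NRE cost, the FPGA has no NRE cost, and unit prices do not vary with
    volume. Returns None when the ASIC never catches up.
    """
    saving_per_unit = fpga_unit_cost - asic_unit_cost
    if saving_per_unit <= 0:
        return None
    return math.ceil(mask_cost / saving_per_unit)


if __name__ == "__main__":
    # Hypothetical unit prices, chosen only for illustration.
    for node_nm, mask_cost in sorted(MASK_COST_USD.items()):
        volume = break_even_volume(mask_cost, asic_unit_cost=5, fpga_unit_cost=25)
        print(f"{node_nm} nm: ASIC breaks even at about {volume:,} units")
```

With these placeholder prices, a 90 nm mask set needs 50,000 units to pay for itself, while a 350 nm set needs only 3,000, which is consistent with low-cost parts staying on older process technologies.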

ASIC Design Approaches
- Custom VLSI
  - designs are fabricated on a manufacturing line
    - takes months
    - mask cost is also high
  - operate much faster and consume less power than FPGA equivalents
  - can be cheaper if manufactured in large volume

ASIC Design Approaches (cont.)
- Structured ASIC
  - is based on a pre-designed logic fabric structurally embedded in the platform
  - fills the market gap between high-density FPGAs and standard-cell ASICs
  - can greatly reduce development time and cost
    - reduces non-recurring engineering (NRE) cost

Structured ASICs
- View Altera demo

Integrating ASICs with GPPs
- Today's embedded systems can have complex software layers
  - OS
  - Virtual machine
  - Applications
- It is often preferable to pair GPPs with ASICs acting as co-processors

Integrating ASICs with GPPs (cont.)
- So, we can have GPPs perform basic tasks and ASICs (co-processors) speed up compute-intensive functions
  - sounds simple, but in reality it is quite complex
- basic handshaking is needed between the ASICs and the main processor
  - data exchange
    - shared memory
  - requires OS and architecture support
    - synchronous or asynchronous calls
    - cache coherency issues

ASICs and GPPs (cont.)
- An example is using a hardware co-processor for cryptography
  - should the co-processor calls be
    - synchronous: the main processor blocks on the call and waits for the response
    - or asynchronous: the calling process is blocked and swapped out
      - needs interrupt support
      - needs to maintain context

ASICs and GPPs (cont.)
- Co-processor shares the bus with the main CPU
  - is a source of bus contention
  - can cause cache coherency issues
    - data held in the main CPU cache may have been updated in memory by the co-processor, leaving the cached copy stale
    - flush (or invalidate) the cache accordingly
  - should be equipped with DMA to relieve the main CPU from copying data

Extending GPPs
- Tensilica Xtensa reconfigurable processor cores

  - support native 16-bit and 24-bit instructions for higher code density
  - users can add or remove components (MMU, multipliers, FPUs)
  - users can reconfigure the cache organization
  - users can select the bus width (32, 64, or 128 bits)
  - user-defined instruction extension language
    - users can create custom instructions to speed up commonly used functions
    - users can instantiate custom registers of different sizes

Tensilica Xtensa
(figure; source not preserved)

Tensilica Xtensa (cont.)
- We will not go into great detail about the Xtensa.
- However, we will study the Stretch S5000 engine, which is based on the Xtensa core.

Design Time Solutions
- Up to now, we have only talked about design-time solutions!
  - logic designs are done in house
  - not very reconfigurable after the chip is made
    - even with FPGAs, someone has to come up with a new hardware design to change it
  - the Xtensa needs about 1 hour to synthesize an instruction extension
- What if we want to configure on the fly?
  - each application brings in CPU-intensive functions
  - these functions are not known in advance
- Can we leave it up to the software developers to design fast co-processors?
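
Whoever designs the co-processor, software still has to choose between the two call styles raised earlier for the cryptography example. The following is a minimal sketch of that choice, simulated entirely in software: `crypto_coprocessor` is a hypothetical stand-in for the hardware (a worker thread plays the co-processor, and the XOR is not real cryptography), and the callback stands in for the completion interrupt.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def crypto_coprocessor(block: bytes) -> bytes:
    """Hypothetical stand-in for a hardware crypto engine (XOR is NOT real crypto)."""
    time.sleep(0.01)  # pretend the hardware needs some time
    return bytes(b ^ 0x5A for b in block)


def call_synchronous(block: bytes) -> bytes:
    """Synchronous style: the caller is blocked until the result is ready."""
    return crypto_coprocessor(block)


def call_asynchronous(pool, block, on_done):
    """Asynchronous style: submit the request and return immediately.

    on_done plays the role of the completion interrupt handler; the returned
    future is the saved context that ties the reply to the request.
    """
    future = pool.submit(crypto_coprocessor, block)
    future.add_done_callback(lambda f: on_done(f.result()))
    return future


def demo():
    completed = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        call_asynchronous(pool, b"\x00\x01", completed.append)
        other_work = sum(range(1000))  # the CPU stays busy in the meantime
    # Leaving the with-block waits for the worker, so the callback has run.
    return completed, other_work
```

The synchronous version is simpler but wastes the main processor while it waits; the asynchronous version keeps it busy at the price of the interrupt and context bookkeeping listed on the slide.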

Run-Time Configuration

(R)evolution of Processors
Ice Hard, Rock Hard, Playdough Hard
- Hardwired, GPP
  - performs well in most conditions, but not in extreme conditions
- GPP with FPGAs
  - custom designs perform well in some extreme conditions
  - requires extensive knowledge of hardware design
- GPP with embedded programmable logic
  - reconfiguration triggered by software
- Contains ASIC (Application Specific IC) designs
  - increases time to market
  - takes time to reconfigure

Software Hotspots
- In DSP, 80% of the processing load is spent in 20% of the code

  - hand-tuned assembly that can take thousands of cycles to execute
  - less portable
- The remaining 80% of the code implements complex system functions
  - runs well on most GPPs

Software Hotspots Example
- A 16-QAM modem (19.2 Kbaud) implemented entirely in software
  - takes 177,000 instruction cycles to execute on a TI C6711
  - FPGA co-processor: a few cycles

Solving Hotspots
[Figure: design options (processor + FPGA, multiple DSPs, FPGA, DSP, CPU) placed against a "Flexibility & TTM" axis. SCP = Software Configurable Processor]
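
The 80/20 observation under Software Hotspots also bounds what offloading a hotspot can achieve. The following is a minimal sketch using Amdahl's law (which the slides do not name), assuming the hotspot accounts for 80% of execution time; the acceleration factors are illustrative, not measured.

```python
def overall_speedup(hotspot_fraction: float, hotspot_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only the hotspot is accelerated.

    hotspot_fraction: share of the original run time spent in the hotspot (0..1)
    hotspot_speedup:  how many times faster the hotspot runs once offloaded
    """
    if not 0.0 <= hotspot_fraction <= 1.0:
        raise ValueError("hotspot_fraction must be between 0 and 1")
    if hotspot_speedup <= 0:
        raise ValueError("hotspot_speedup must be positive")
    remaining = 1.0 - hotspot_fraction
    return 1.0 / (remaining + hotspot_fraction / hotspot_speedup)


if __name__ == "__main__":
    # Illustrative acceleration factors for a hotspot taking 80% of the time.
    for factor in (2, 10, 100, 1000):
        print(f"hotspot {factor}x faster -> overall {overall_speedup(0.8, factor):.2f}x")
```

Under that assumption, even an effectively instantaneous co-processor caps the overall gain at about 5x, because the remaining 20% of the time still runs on the GPP.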
