

## CS4803PGC Design and Programming of Game Console Spring 2012

Prof. Hyesoon Kim





## Full/Empty Ascending/Descending

- Descending (addresses grow downward)
- Ascending (addresses grow upward)
- Full/Empty: the stack pointer can point either to the last item pushed (a full stack) or to the next free slot (an empty stack)



*(Figure: stack layouts drawn against increasing memory address)*

| Stack Type       | Push  | Pop   |
|------------------|-------|-------|
| Full Descending  | STMFD | LDMFD |
| Full Ascending   | STMFA | LDMFA |
| Empty Descending | STMED | LDMED |
| Empty Ascending  | STMEA | LDMEA |
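To make the full/empty and ascending/descending distinction concrete, here is a small C model of the four stack types. It is a sketch under simplifying assumptions: "addresses" are array indices (a real ARM stack moves SP by 4 bytes per word), there is no overflow checking, and the names `Stack`, `stack_push`, and `stack_pop` are hypothetical. In ARM's addressing-mode naming, STMFD is the same instruction as STMDB (decrement before) and LDMFD the same as LDMIA (increment after), which is exactly the "move first, then store" / "load first, then move" ordering below.

```c
#include <stdbool.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Toy stack over an array; indices stand in for word addresses. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    int sp;          /* index the stack pointer refers to */
    bool full;       /* true: SP -> last item; false: SP -> next free slot */
    bool descending; /* true: grows toward lower indices */
} Stack;

void stack_init(Stack *s, bool full, bool descending) {
    s->full = full;
    s->descending = descending;
    /* Empty stack: place SP so the first push lands at the far end. */
    if (descending)
        s->sp = full ? STACK_WORDS : STACK_WORDS - 1;
    else
        s->sp = full ? -1 : 0;
}

void stack_push(Stack *s, uint32_t v) {
    int step = s->descending ? -1 : 1;
    if (s->full) {   /* full: move SP first, then store (FD: STMFD = STMDB) */
        s->sp += step;
        s->mem[s->sp] = v;
    } else {         /* empty: store first, then move SP (ED: STMED = STMDA) */
        s->mem[s->sp] = v;
        s->sp += step;
    }
}

uint32_t stack_pop(Stack *s) {
    int step = s->descending ? -1 : 1;
    uint32_t v;
    if (s->full) {   /* full: load at SP, then move back (FD: LDMFD = LDMIA) */
        v = s->mem[s->sp];
        s->sp -= step;
    } else {         /* empty: move back first, then load (ED: LDMED = LDMIB) */
        s->sp -= step;
        v = s->mem[s->sp];
    }
    return v;
}
```

Swapping the two booleans in `stack_init` gives each of the four rows of the table; push and pop always apply opposite SP movements in opposite order, which is why each push mnemonic pairs with the pop mnemonic of the same type.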







#### Use of R15

- R15: PC
  - PC may be used as a source operand
  - A register-specified shift cannot use R15 as a source operand.
- The PC runs ahead
  - Because of pipelining, the PC is always ahead of the instruction being executed
  - In ARM state, the PC reads as the address of the current instruction + 8
    - Imagine a 3-stage pipeline (fetch, decode, execute): while an instruction is in the last stage, the PC already points at the instruction being fetched, two instructions (8 bytes) later.
- When R15 is a source, the address of the current instruction + 8 is supplied as the operand.
- When R15 is a destination
  - With the S bit set (e.g., `MOVS pc, lr`), the SPSR is also copied to the CPSR, so the PC and CPSR are restored together, as when returning from an interrupt.
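The PC + 8 rule shows up directly in branch targets: an ARM-state `B`/`BL` (bits 27..25 = 101) stores a signed 24-bit word offset that is added to the value read from R15, not to the branch's own address. The sketch below assumes ARM state (in Thumb state the PC reads as the current instruction + 4) and does not check that `instr` really is a branch; the function names are hypothetical.

```c
#include <stdint.h>

/* Value an ARM-state instruction at `addr` reads from R15:
 * the pipeline has already fetched two instructions ahead. */
uint32_t r15_read_value(uint32_t addr) {
    return addr + 8;
}

/* Target of an ARM-state B/BL located at `addr`: sign-extend the 24-bit
 * word offset, scale it to bytes, and add it to the PC value (addr + 8). */
uint32_t branch_target(uint32_t addr, uint32_t instr) {
    int32_t imm24 = (int32_t)(instr & 0x00FFFFFFu);
    if (imm24 & 0x00800000)        /* sign-extend from bit 23 */
        imm24 -= 0x01000000;
    return r15_read_value(addr) + (uint32_t)(imm24 * 4);
}
```

For example, `0xEAFFFFFE` is `B .` (branch to itself): its offset is -2 words, i.e. -8 bytes, which exactly cancels the +8 of the running-ahead PC.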









## **Exception generation time**

Prefetch abort: raised by an instruction fetch

*(Figure: Fetch / Decode / Execute pipeline, with PC+4 and PC+8 marked)*

Data abort: raised by a data memory access during execution

*(Figure: Fetch / Decode / Execute pipeline)*
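Because each exception is detected at a different point in the pipeline, the link register holds a different offset from the instruction the handler must return to. The table in the sketch below summarizes the ARM-state (ARMv4/v5) values as documented in the ARM Architecture Reference Manual: a prefetch abort leaves LR = aborted instruction + 4 (return with `SUBS pc, lr, #4`), a data abort leaves LR = aborted instruction + 8 (`SUBS pc, lr, #8`), and IRQ/FIQ leave LR = next instruction + 4. The enum and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    EXC_UNDEF, EXC_SWI, EXC_PREFETCH_ABORT, EXC_DATA_ABORT, EXC_IRQ, EXC_FIQ
} Exception;

/* Bytes by which LR (in the exception mode) exceeds the address
 * the handler should return to (ARM state). */
uint32_t lr_return_adjust(Exception e) {
    switch (e) {
    case EXC_UNDEF:
    case EXC_SWI:            return 0; /* MOVS pc, lr: continue after it */
    case EXC_PREFETCH_ABORT: return 4; /* SUBS pc, lr, #4: retry the fetch */
    case EXC_DATA_ABORT:     return 8; /* SUBS pc, lr, #8: retry the access */
    case EXC_IRQ:
    case EXC_FIQ:            return 4; /* SUBS pc, lr, #4: resume */
    }
    return 0;
}

/* Address execution resumes at, given the LR seen by the handler. */
uint32_t return_address(Exception e, uint32_t lr) {
    return lr - lr_return_adjust(e);
}
```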









## **Controlling Interrupts**

```
void event_EnableIRQ(void)
{
    __asm {
        MRS r1, CPSR
        BIC r1, r1, #0x80   // clear bit 7 (I): enable IRQ
        MSR CPSR_c, r1
    }
}

void event_DisableIRQ(void)
{
    __asm {
        MRS r1, CPSR
        ORR r1, r1, #0x80   // set bit 7 (I): disable IRQ
        MSR CPSR_c, r1
    }
}
```

CPSR layout:

| Bits  | 31-28 | 27-8   | 7 | 6 | 5 | 4-0  |
|-------|-------|--------|---|---|---|------|
| Field | NZCV  | unused | I | F | T | mode |

Bit 7 (I): when set, IRQ (normal interrupts) is disabled

Bit 6 (F): when set, FIQ (fast interrupts) is disabled
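The enable/disable routines above are just a read-modify-write of bit 7. The host-side C sketch below models the same bit operations on a plain `uint32_t` (touching the real CPSR needs `MRS`/`MSR` in a privileged mode, so this is only a model); the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPSR_I_BIT     (1u << 7)  /* 1 = IRQ disabled */
#define CPSR_F_BIT     (1u << 6)  /* 1 = FIQ disabled */
#define CPSR_MODE_MASK 0x1Fu      /* bits 4..0 */

/* Same bit operation as BIC r1, r1, #0x80 */
uint32_t cpsr_enable_irq(uint32_t cpsr)  { return cpsr & ~CPSR_I_BIT; }
/* Same bit operation as ORR r1, r1, #0x80 */
uint32_t cpsr_disable_irq(uint32_t cpsr) { return cpsr | CPSR_I_BIT; }

bool cpsr_irq_enabled(uint32_t cpsr) { return (cpsr & CPSR_I_BIT) == 0; }
bool cpsr_fiq_enabled(uint32_t cpsr) { return (cpsr & CPSR_F_BIT) == 0; }
uint32_t cpsr_mode(uint32_t cpsr)    { return cpsr & CPSR_MODE_MASK; }
```

For example, `0xD3` is Supervisor mode (`0x13`) with both IRQ and FIQ disabled; clearing bit 7 gives `0x53`, which enables IRQ while FIQ stays masked.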





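IRQ handler entry and exit:

```
SUB   lr, lr, #4              ; on IRQ entry LR = return address + 4, so adjust it
STMFD sp!, {reglist, lr}      ; save working registers and the return address
...                           ; handler body
LDMFD sp!, {reglist, pc}^     ; restore; ^ with PC in the list also copies SPSR to CPSR
```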





- SoC for embedded systems
- Single-chip DSP
- Embedded applications running an RTOS
- Mass storage (HDD and DVD)
- Speech coders
- Automotive control
  - Cruise control, ABS, etc.
- Hands-free interfaces
- Modems and soft modems
- Audio decoding
  - Dolby AC3 digital
  - MPEG MP3 audio
- Speech recognition and synthesis




## ARM946E-S (ARM9 in Nintendo DS): Instructions

- Data processing instructions
- Load and store instructions
- Branch instructions
- Coprocessor instructions
  - Coprocessor data processing
  - Coprocessor register transfer
  - Coprocessor data transfer
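These instruction groups can be told apart from a few fixed opcode bits of a 32-bit ARM-state word, mainly bits 27..25. The C sketch below is a simplified classifier, not a full decoder: it ignores the condition field (including the cond = 1111 unconditional space) and labels every bits[27:25] = 000/001 encoding as data processing, even though multiplies, halfword loads/stores, and some miscellaneous instructions also live there. The enum and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    CLS_DATA_PROCESSING,   /* also covers multiply/misc encodings here */
    CLS_LOAD_STORE,        /* single (LDR/STR) and multiple (LDM/STM) */
    CLS_BRANCH,            /* B / BL */
    CLS_COPROC_DATA_PROC,  /* CDP */
    CLS_COPROC_REG_XFER,   /* MCR / MRC */
    CLS_COPROC_DATA_XFER,  /* LDC / STC */
    CLS_SWI
} InstrClass;

InstrClass classify(uint32_t instr) {
    uint32_t op = (instr >> 25) & 0x7u;   /* bits 27..25 */
    switch (op) {
    case 0: case 1:         return CLS_DATA_PROCESSING;
    case 2: case 3: case 4: return CLS_LOAD_STORE;
    case 5:                 return CLS_BRANCH;
    case 6:                 return CLS_COPROC_DATA_XFER;
    default:                /* op == 7: bit 24 separates SWI from CDP/MCR/MRC */
        if (instr & (1u << 24))
            return CLS_SWI;
        return (instr & (1u << 4)) ? CLS_COPROC_REG_XFER : CLS_COPROC_DATA_PROC;
    }
}
```

For instance, `0xE3A00001` (`MOV r0, #1`) classifies as data processing and `0xEE010F10` (`MCR p15, 0, r0, c1, c0, 0`) as a coprocessor register transfer.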









#### **ARM11 MPCore Processor**











#### **New Features in ARM11**

- Improved memory accesses
  - Non-blocking (hit-under-miss) operation
- Load/store and ALU pipelines are decoupled
- Out-of-order completion:
  - An instruction that does not depend on the result of a previous instruction can complete before it. Is this good or bad?
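One way to see the "good or bad?" trade-off is a toy timing model: instructions issue in order, one per cycle, but an instruction only waits for its own source registers, so independent work can finish while an earlier load is still missing in the cache. The sketch below is a hypothetical scoreboard model, not ARM11's actual pipeline; it ignores write-after-write and write-after-read hazards and assumes register numbers below `NREGS`. The good side is the shorter total time; the bad side is that results no longer arrive in program order, so hardware needs dependency tracking like the `ready[]` array here.

```c
#include <stddef.h>

#define NREGS 16

typedef struct {
    int dst;         /* destination register, or -1 for none */
    int src1, src2;  /* source registers, or -1 */
    int latency;     /* cycles from issue to result (large for a cache miss) */
} Instr;

/* In-order issue, out-of-order completion. Writes each instruction's
 * completion cycle into done[]; returns the last completion cycle. */
int run_ooo_completion(const Instr *prog, size_t n, int *done) {
    int ready[NREGS] = {0};  /* cycle at which each register's value is available */
    int cycle = 0, last = 0;
    for (size_t i = 0; i < n; i++) {
        int t = cycle;       /* earliest issue cycle, then stall on sources */
        if (prog[i].src1 >= 0 && ready[prog[i].src1] > t) t = ready[prog[i].src1];
        if (prog[i].src2 >= 0 && ready[prog[i].src2] > t) t = ready[prog[i].src2];
        done[i] = t + prog[i].latency;
        if (prog[i].dst >= 0) ready[prog[i].dst] = done[i];
        if (done[i] > last) last = done[i];
        cycle = t + 1;       /* next instruction issues no earlier than next cycle */
    }
    return last;
}

/* Blocking model for comparison: each instruction waits for the previous one. */
int run_in_order_completion(const Instr *prog, size_t n, int *done) {
    int t = 0;
    for (size_t i = 0; i < n; i++) {
        done[i] = t + prog[i].latency;
        t = done[i];
    }
    return t;
}
```

With a missing load (`r0`, 10 cycles), an independent add (`r2 = r3 + r4`), and a dependent add (`r5 = r0 + r2`), the independent add completes at cycle 2, long before the load, and the sequence finishes at cycle 11 instead of 12.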



| ARM9E™           | ARM10E™          | Intel®           | ARM11™                       |
|------------------|------------------|------------------|------------------------------|
| ARMv5TE(J)       | ARMv5TE(J)       | ARMv5TE          | ARMv6                        |
| 5                | 6                | 7                | 8                            |
| (ARM926EJ)       | (ARM1026EJ)      | No               | Yes                          |
| No               | No               | No               | Yes                          |
| No               | No               | Yes              | Available as coprocessor     |
| No               | Static           | Dynamic          | Dynamic                      |
| No               | Yes              | Yes              | Yes                          |
| Scalar, in-order | Scalar, in-order | Scalar, in-order | Scalar, in-order             |
| None             | ALU/MAC, LSU     | ALU, MAC, LSU    | ALU/MAC, LSU                 |
| No               | Yes              | Yes              | Yes                          |
| Synthesizable    | Synthesizable    | Custom chip      | Synthesizable and hard macro |
| Up to 250MHz     | Up to 325MHz     | 200MHz to >1GHz  | 350MHz to >1GHz              |

Figure 5. ARM Architecture Feature Comparisons

#### Thumb-2 ISA







- Thumb-2 is a superset of the Thumb instruction set.
- Thumb-2 introduces 32-bit instructions that are intermixed with the 16-bit instructions. The Thumb-2 instruction set covers almost all the functionality of the ARM instruction set.
- Thumb-2 is backwards compatible with the ARMv6 Thumb instruction set.
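Mixing 16-bit and 32-bit instructions means the decoder must tell their lengths apart from the first halfword alone. In the Thumb-2 encoding, a halfword whose top five bits are `0b11101`, `0b11110`, or `0b11111` starts a 32-bit instruction; anything else is a complete 16-bit instruction. The C sketch below applies that rule; the function names are hypothetical and the stream is assumed to be in fetch order.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of the Thumb-2 instruction whose first halfword is hw. */
int thumb2_instr_size(uint16_t hw) {
    unsigned top5 = hw >> 11;
    return (top5 == 0x1D || top5 == 0x1E || top5 == 0x1F) ? 4 : 2;
}

/* Count instructions in a halfword stream; a 32-bit instruction consumes
 * two halfwords (a truncated final instruction is still counted). */
size_t thumb2_count_instrs(const uint16_t *code, size_t nhalf) {
    size_t i = 0, count = 0;
    while (i < nhalf) {
        i += (size_t)thumb2_instr_size(code[i]) / 2;
        count++;
    }
    return count;
}
```

For example, the stream `0x2001` (`MOVS r0, #1`), `0xF000 0xF800` (a 32-bit `BL`), `0x4770` (`BX lr`) holds four halfwords but three instructions. Note that `0xE7FE` (top bits `0b11100`, a 16-bit unconditional branch) is still 16-bit.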









#### **SIMD in ARM**

- NEON: ARM's SIMD engine
- 128-bit SIMD
- NEON instructions perform "packed SIMD" processing:
  - Registers are treated as vectors of elements of the same data type
  - Data types can be signed/unsigned 8-bit, 16-bit, 32-bit, or 64-bit integers, or single-precision floating point
  - Instructions perform the same operation in all lanes
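A portable C model of one packed-SIMD operation makes the "same operation in all lanes" point concrete: a 128-bit register viewed as sixteen 8-bit lanes, added lane by lane. This models the behavior of `VADD.I8 Qd, Qn, Qm` (exposed as `vaddq_u8` in `arm_neon.h`); the type and function names are hypothetical, and the loop is plain C rather than real NEON code. The key property is that a carry out of one lane is discarded instead of propagating into the next lane.

```c
#include <stdint.h>

/* A 128-bit Q register viewed as 16 lanes of unsigned 8-bit data. */
typedef struct { uint8_t lane[16]; } U8x16;

/* Lane-wise add: each lane wraps modulo 256 independently,
 * so overflow in one lane never affects its neighbors. */
U8x16 vadd_u8x16(U8x16 a, U8x16 b) {
    U8x16 r;
    for (int i = 0; i < 16; i++)
        r.lane[i] = (uint8_t)(a.lane[i] + b.lane[i]);
    return r;
}
```

Adding 250 to lanes holding 0..15 gives 250..255 in lanes 0..5, then 0, 1, 2, ... from lane 6 onward: lane 6 overflows to 0 and lane 7 still holds 1, with no carry between them.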











### **Usage model of NEON**

- Watch any video in any format
- Edit and enhance captured video (e.g., video stabilization)
- Anti-aliased rendering and compositing
- Game processing
- Process multi-megapixel photos quickly
- Voice recognition
- Powerful multichannel hi-fi audio processing









## **ARM BUS**



## Advanced Microcontroller Bus Architecture (AMBA)













#### **AMBA**

- AHB (Advanced High-performance Bus)
  - Newer standard
  - Connects high-performance system modules
  - Burst-mode data transfers and split transactions
  - Pipelined
- ASB (Advanced System Bus)
  - Older standard
  - Connects high-performance system modules
  - Pipelined
  - Multiple bus masters
- APB (Advanced Peripheral Bus)
  - A simpler interface for low-performance peripherals
  - Low power
  - Latched address, simple interface
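The "latched address, simple interface" of APB shows in its transfer timing. In the AMBA 2 APB protocol, every transfer takes two cycles: a SETUP cycle (PSEL high, PENABLE low, address and control stable) followed by an ENABLE cycle (PENABLE high), after which the bus returns to IDLE or starts the next SETUP. The C sketch below models that state machine, assuming AMBA 2 timing with no wait states (later APB revisions add PREADY-based wait states); the type and function names are hypothetical.

```c
#include <stdbool.h>

typedef enum { APB_IDLE, APB_SETUP, APB_ENABLE } ApbState;

typedef struct {
    ApbState state;
    bool psel;     /* peripheral select: high in SETUP and ENABLE */
    bool penable;  /* high only in the ENABLE cycle */
} Apb;

/* Advance one clock; `transfer_pending` says whether the bridge
 * (the APB master) has another transfer queued. */
void apb_step(Apb *b, bool transfer_pending) {
    switch (b->state) {
    case APB_IDLE:
    case APB_ENABLE:
        b->state = transfer_pending ? APB_SETUP : APB_IDLE;
        break;
    case APB_SETUP:
        b->state = APB_ENABLE;  /* SETUP always lasts exactly one cycle */
        break;
    }
    b->psel = (b->state != APB_IDLE);
    b->penable = (b->state == APB_ENABLE);
}
```

Because the address is held stable across both cycles, a peripheral can latch it in SETUP and act in ENABLE without any pipelining logic.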









#### **AMBA Revisions**









#### **Bus Arbitration**









#### **AMBA Arbitration**

- A bus transaction is initiated by a bus master which requests access from a central arbiter.
- The arbiter decides priorities when there are conflicting requests.
- The design of the arbiter is a system-specific issue.
- The ASB only specifies the protocol:
  - The master issues a request to the arbiter
  - When the bus is available, the arbiter issues a grant to the master.
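Since the arbitration policy is left to the system designer, here are two common choices as a C sketch: fixed priority (lowest-numbered requesting master wins) and round robin (search starts just after the last granted master, so no requester starves). Both policies, and the convention that bit *i* of `requests` is master *i*'s request line, are assumptions for illustration; the function names are hypothetical.

```c
#include <stdint.h>

/* Fixed priority: lower index = higher priority.
 * Returns the granted master, or -1 if nobody requests. */
int arbitrate_fixed(uint32_t requests) {
    for (int i = 0; i < 32; i++)
        if (requests & (1u << i))
            return i;
    return -1;
}

/* Round robin over `nmasters` (1..32) masters; `last` is the previously
 * granted master (use -1 before the first grant). */
int arbitrate_round_robin(uint32_t requests, int last, int nmasters) {
    for (int k = 1; k <= nmasters; k++) {
        int i = (last + k) % nmasters;
        if (requests & (1u << i))
            return i;
    }
    return -1;
}
```

With masters 1 and 2 both requesting, fixed priority always grants master 1, while round robin alternates between them depending on who was granted last.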







## **Bus Pipelining**

- A memory access consists of several cycles (including arbitration)
- Since a single access does not use every part of the bus in every cycle, pipelining (overlapping the phases of consecutive accesses) can increase performance
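A toy timing model shows the gain. Assume every access has `phases` one-cycle phases (for example arbitration, address, data), each using a different part of the bus. Without pipelining, accesses run back to back; with pipelining, a new access can enter the first phase every cycle, so after the pipeline fills one access completes per cycle. AHB, for instance, overlaps the address phase of the next transfer with the data phase of the current one. The function names and the one-cycle-per-phase assumption are hypothetical.

```c
/* Unpipelined: n accesses of `phases` cycles each, strictly sequential. */
unsigned long cycles_unpipelined(unsigned long n, unsigned long phases) {
    return n * phases;
}

/* Pipelined: fill the pipeline once, then finish one access per cycle. */
unsigned long cycles_pipelined(unsigned long n, unsigned long phases) {
    return n == 0 ? 0 : phases + (n - 1);
}
```

Four 3-phase accesses take 12 cycles unpipelined but only 6 pipelined.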









## **Split Transactions**

- A transaction is split into two transactions:
  - Request-transaction
  - Reply-transaction
- Both transactions have to compete for the bus by arbitration



http://www.imit.kth.se/courses/2B1447/Lectures/2B1447\_L4\_Buses pdf
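The benefit of splitting depends on how long the slave takes. In the toy model below (all parameters and names hypothetical), a non-split read holds the bus through the slave's latency, while a split read occupies the bus only for the request and the reply, paying arbitration twice because both halves must win the bus separately. The freed latency cycles can serve other masters, but for a fast slave the extra arbitration makes splitting a net loss.

```c
/* Cycle costs of one read, in a toy model. */
typedef struct {
    unsigned request;  /* cycles to send address/command */
    unsigned latency;  /* cycles the slave needs before data is ready */
    unsigned reply;    /* cycles to return the data */
    unsigned arb;      /* arbitration cycles per bus acquisition */
} ReadCost;

/* Non-split: the master keeps the bus while the slave works. */
unsigned bus_busy_held(ReadCost c) {
    return c.arb + c.request + c.latency + c.reply;
}

/* Split: request and reply are separate, separately arbitrated transactions;
 * the bus is free for other masters during c.latency. */
unsigned bus_busy_split(ReadCost c) {
    return (c.arb + c.request) + (c.arb + c.reply);
}
```

With a 10-cycle slave latency and one cycle for everything else, the bus is busy for 13 cycles without splitting but only 4 with it; with zero latency the split version costs 4 cycles against 3.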







## **Burst Messages**

- Overheads can be reduced if the requests are sent as a burst
- Overheads:
  - Arbitration, addressing, acknowledgement
- Better efficiency, but be careful with long bursts: they hold the bus and delay other masters



http://www.imit.kth.se/courses/2B1447/Lectures/2B1447\_L4\_…
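The efficiency gain from bursts can be estimated with a simple model (an assumption for illustration): each bus transaction pays a fixed `overhead` of cycles for arbitration, addressing, and acknowledgement, plus one cycle per data word. Single transfers pay the overhead per word; a burst pays it once. The function names are hypothetical.

```c
/* Each word sent as its own transaction. */
unsigned cycles_single(unsigned words, unsigned overhead) {
    return words * (overhead + 1);
}

/* All words sent as one burst: overhead paid once. */
unsigned cycles_burst(unsigned words, unsigned overhead) {
    return words == 0 ? 0 : overhead + words;
}

/* Fraction of bus cycles that actually carry data. */
double efficiency(unsigned words, unsigned cycles) {
    return cycles == 0 ? 0.0 : (double)words / cycles;
}
```

With 3 cycles of overhead, 8 words take 32 cycles as single transfers (25% efficiency) but 11 as a burst (about 73%). The caution about long requests follows from the same model: a 64-word burst holds the bus for 67 uninterrupted cycles while other masters wait, which is one reason buses bound burst length (AHB, for example, defines fixed-length bursts of 4, 8, and 16 beats).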







### **Bus Bridges**

- Bus bridges are used to separate high-performance devices from low-performance devices
- All communication between the high-performance bus and the low-performance devices goes through the bridge
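From a master's point of view, the bridge is just another slave in the high-performance bus's address map: accesses that decode to the peripheral range are handed to the bridge, which then acts as the master on the peripheral bus (in AMBA 2, the APB bridge is the only APB master). The C sketch below shows that address decode; the memory map is invented for illustration and does not describe any particular chip, and the names are hypothetical.

```c
#include <stdint.h>

typedef enum { ROUTE_AHB_SLAVE, ROUTE_APB_VIA_BRIDGE, ROUTE_DECODE_ERROR } Route;

/* Hypothetical memory map, for illustration only. */
#define RAM_END   0x0FFFFFFFu  /* RAM at 0x00000000..RAM_END on the fast bus */
#define APB_BASE  0x80000000u  /* peripherals behind the bridge */
#define APB_END   0x8FFFFFFFu

/* Decide where an access issued on the high-performance bus goes. */
Route route_access(uint32_t addr) {
    if (addr <= RAM_END)
        return ROUTE_AHB_SLAVE;          /* served directly on the fast bus */
    if (addr >= APB_BASE && addr <= APB_END)
        return ROUTE_APB_VIA_BRIDGE;     /* bridge re-issues it on the slow bus */
    return ROUTE_DECODE_ERROR;           /* no slave mapped at this address */
}
```

Keeping all slow peripherals behind one decoded range means their long, simple transfers never occupy the high-performance bus protocol directly.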