ARM Pipelining

Pipelining is a design technique that improves the efficiency of instruction processing in computer and microcontroller processors. It keeps the CPU continuously fetching, decoding, and executing instructions, a process known as the F&E (fetch and execute) cycle.

RISC (reduced instruction set computer) processors use pipelining to execute instructions. Pipelining in ARM boosts execution speed by fetching the next instruction while earlier instructions are being decoded and executed.
As a result, the memory system and CPU can run continuously.
Each ARM family has a distinct pipeline architecture.

3-stage Pipelining

A pipeline is the mechanism a RISC processor uses to execute instructions. Using a pipeline speeds up execution by fetching the next instruction while other instructions are being decoded and executed. One way to view the pipeline is to think of it as an automobile assembly line, with each stage carrying out a particular task to manufacture the vehicle.

Fetch loads an instruction from memory.
Decode identifies the instruction to be executed.
Execute processes the instruction and writes the result back to the register.
Overlapping these stages across different instructions increases the speed of execution.
Once the pipeline is full, the core can complete an instruction every cycle, resulting in higher throughput.
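The throughput gain from overlapping the stages can be estimated with a short calculation. The following is a minimal sketch that assumes an ideal pipeline in which every stage takes one cycle and there are no stalls or flushes; the function names are hypothetical and chosen only for illustration.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    """Cycles needed when each instruction finishes all stages before the next starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    """Cycles needed in an ideal pipeline (assumption: no stalls, no flushes).

    The first instruction needs n_stages cycles; each later one completes
    one cycle after its predecessor.
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    for n in (1, 3, 100):
        print(n, cycles_unpipelined(n, 3), cycles_pipelined(n, 3))
```

Under these assumptions, the cost per instruction approaches one cycle as the instruction stream gets longer, which is the sense in which the core executes an instruction every cycle.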

The figure illustrates ARM pipelining using a simple example. It shows a sequence of three instructions being fetched, decoded, and executed by the processor. Each instruction takes a single cycle to complete after the pipeline is filled. The three instructions are placed into the pipeline sequentially. In the first cycle, the core fetches the ADD instruction from memory. In the second cycle, the core fetches the SUB instruction and decodes the ADD instruction. In the third cycle, both the SUB and ADD instructions are moved along the pipeline: the ADD instruction is executed, the SUB instruction is decoded, and the CMP instruction is fetched. This procedure is called filling the pipeline. The pipeline allows the core to execute an instruction every cycle.

As the pipeline length increases, the amount of work done at each stage is reduced, which allows the processor to attain a higher operating frequency. This in turn increases the performance. The system latency also increases, because it takes more cycles to fill the pipeline before the core can execute an instruction. The increased pipeline length also means there can be data dependency between certain stages. You can write code to reduce this dependency by using instruction scheduling.
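The ADD, SUB, CMP walk-through can be reproduced with a small simulation. This is an illustrative sketch, not a model of any real core: it assumes one cycle per stage and no stalls, and `pipeline_trace` is a hypothetical helper name.

```python
STAGES = ("Fetch", "Decode", "Execute")


def pipeline_trace(instructions, stages=STAGES):
    """Return one dict per cycle mapping stage name -> instruction (or None)."""
    if not instructions:
        return []
    total_cycles = len(instructions) + len(stages) - 1
    trace = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(stages):
            # Instruction i occupies the stage at position `depth` in cycle i + depth.
            index = cycle - depth
            row[stage] = instructions[index] if 0 <= index < len(instructions) else None
        trace.append(row)
    return trace


if __name__ == "__main__":
    for number, row in enumerate(pipeline_trace(["ADD", "SUB", "CMP"]), start=1):
        cells = "  ".join(f"{stage}={row[stage] or '-'}" for stage in STAGES)
        print(f"cycle {number}: {cells}")
```

In the third row of the trace the pipeline is full: CMP is in Fetch, SUB in Decode, and ADD in Execute. Passing a longer tuple of stage names models a deeper pipeline, which takes more cycles to fill.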

Below are the features of the 3-stage ARM pipeline.

An instruction takes three cycles to pass through the pipeline.
It uses the basic F&E cycle to achieve its throughput.
With only these three stages, the ARM7 has the lowest throughput of the ARM families discussed here.
It works with 32-bit data.

ARM7 Pipeline Characteristics

An instruction has not been processed until it passes completely through the execute stage.
In ARM state, the PC always points to the address of the executing instruction plus 8 bytes.
In Thumb state, the PC always points to the address of the executing instruction plus 4 bytes.
The ARM core flushes its pipeline when it executes a branch instruction or branches by directly modifying the PC.
An instruction in the execute stage completes even if an interrupt has been raised.
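The PC offsets above follow from the 3-stage pipeline: while one instruction executes, the core is already fetching the instruction two slots ahead. The sketch below illustrates this, assuming 4-byte ARM instructions and 2-byte Thumb instructions as on the ARM7TDMI; the function and constant names are hypothetical.

```python
ARM_INSTRUCTION_BYTES = 4    # ARM state: 32-bit instructions
THUMB_INSTRUCTION_BYTES = 2  # Thumb state: 16-bit instructions (assumption: ARM7TDMI-style Thumb)


def pc_seen_during_execute(instruction_address: int, thumb: bool = False) -> int:
    """PC value read by an instruction while it is in the execute stage.

    In a 3-stage pipeline the PC addresses the instruction being fetched,
    which is two instructions ahead of the one being executed.
    """
    size = THUMB_INSTRUCTION_BYTES if thumb else ARM_INSTRUCTION_BYTES
    return (instruction_address + 2 * size) & 0xFFFFFFFF  # wrap to 32 bits


if __name__ == "__main__":
    print(hex(pc_seen_during_execute(0x8000)))              # ARM state
    print(hex(pc_seen_during_execute(0x8000, thumb=True)))  # Thumb state
```

This offset matters when writing PC-relative code by hand, because the value read from the PC is not the address of the instruction that reads it.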

5-Stage Pipelining

3 stage pipeline vs 5 stage pipeline

The pipeline differs between ARM family members. For example, the ARM9 core increases the pipeline length to five stages. The ARM9 adds a memory stage and a writeback stage, which allows it to process on average 1.1 Dhrystone MIPS per MHz, an increase in instruction throughput of around 13% compared with an ARM7. The maximum core frequency attainable using an ARM9 is also higher.
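The two figures quoted above can be combined in a short calculation. This is a rough sketch: the ARM7 value is only the one implied by the 13% figure in the text, not a measured number, and the clock frequencies in the example are hypothetical.

```python
ARM9_DMIPS_PER_MHZ = 1.1     # figure quoted in the text
ARM9_GAIN_OVER_ARM7 = 0.13   # "around 13%" from the text


def implied_arm7_dmips_per_mhz() -> float:
    """ARM7 Dhrystone MIPS per MHz implied by the two figures above (derived, not measured)."""
    return ARM9_DMIPS_PER_MHZ / (1 + ARM9_GAIN_OVER_ARM7)


def dmips(dmips_per_mhz: float, clock_mhz: float) -> float:
    """Total Dhrystone MIPS at a given clock frequency."""
    return dmips_per_mhz * clock_mhz


if __name__ == "__main__":
    arm7 = implied_arm7_dmips_per_mhz()
    print(f"implied ARM7: {arm7:.2f} DMIPS/MHz")
    # Hypothetical clock frequencies, chosen only to show the arithmetic.
    print(f"ARM7 at 60 MHz:  {dmips(arm7, 60):.1f} DMIPS")
    print(f"ARM9 at 200 MHz: {dmips(ARM9_DMIPS_PER_MHZ, 200):.1f} DMIPS")
```

The per-MHz gain and the higher attainable clock frequency multiply, so the overall difference between two cores can be much larger than 13%.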

Figure: 5-stage ARM pipelining

Pipelining in the ARM9 is similar to the ARM7, but the ARM9 pipeline has five stages.
An instruction takes five cycles to pass through the pipeline.
Fetch: fetches the instruction from memory.
Decode: decodes the instruction fetched in the previous cycle.
ALU: executes the instruction decoded in the previous cycle.
LS1 (Memory): loads or stores the data specified by a load/store instruction.
LS2 (Write): extracts the data loaded by a byte or halfword load instruction and zero- or sign-extends it.
Throughput is 10 to 13 percent higher than on the ARM7 because of the additional stages and improved efficiency.
The ARM9 core frequency is also higher than the ARM7 core frequency.
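The extension step described for LS2 can be illustrated in a few lines. This is a sketch of zero and sign extension to a 32-bit register value in general, not a description of the ARM9 hardware; `extend_loaded_value` is a hypothetical name.

```python
def extend_loaded_value(raw: int, width_bits: int, signed: bool) -> int:
    """Extend a loaded byte (8 bits) or halfword (16 bits) to a 32-bit register value."""
    if width_bits not in (8, 16):
        raise ValueError("width_bits must be 8 or 16")
    mask = (1 << width_bits) - 1
    value = raw & mask  # keep only the loaded bits; upper bits are zero
    sign_bit = 1 << (width_bits - 1)
    if signed and value & sign_bit:
        value |= 0xFFFFFFFF & ~mask  # copy the sign bit into the upper bits
    return value


if __name__ == "__main__":
    print(hex(extend_loaded_value(0x80, 8, signed=False)))    # zero-extended byte
    print(hex(extend_loaded_value(0x80, 8, signed=True)))     # sign-extended byte
    print(hex(extend_loaded_value(0x8001, 16, signed=True)))  # sign-extended halfword
```

On ARM, the unsigned case corresponds to loads such as LDRB and LDRH, and the signed case to LDRSB and LDRSH.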
