## Run-Time FPGA Partial Reconfiguration for Image Processing Applications





#### Shaon Yousuf

Ph.D. Student NSF CHREC Center, University of Florida

#### Dr. Ann Gordon-Ross

Assistant Professor of ECE NSF CHREC Center, University of Florida

#### Introduction

- Run-time reconfiguration is an important feature of SRAM-based FPGAs that allows functionality to be changed dynamically
  - Enables benefits such as flexibility, hardware reuse and reduced power consumption
- Drawbacks of run-time reconfiguration
  - Entire fabric is reconfigured even for slight design changes
    - System execution stalls completely
  - Time to load a design onto the fabric from external memory (reconfiguration time) increases with bitstream size
- Run-time reconfiguration is enhanced by run-time partial reconfiguration (PR), which mitigates these drawbacks
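The dependence of reconfiguration time on bitstream size can be illustrated with a rough sketch, assuming the time is dominated by transferring the bitstream through a configuration port of fixed throughput. The function name, both bitstream sizes, and the port throughput are illustrative assumptions, not measurements from this work.

```python
def reconfiguration_time_ms(bitstream_bytes: int, port_bytes_per_s: float) -> float:
    """Estimate reconfiguration time as bitstream size / port throughput."""
    if port_bytes_per_s <= 0:
        raise ValueError("port throughput must be positive")
    return 1000.0 * bitstream_bytes / port_bytes_per_s


# Hypothetical numbers, for illustration only.
FULL_BITSTREAM_BYTES = 2_000_000    # assumed full-device bitstream
PARTIAL_BITSTREAM_BYTES = 200_000   # assumed bitstream for a single region
PORT_BYTES_PER_S = 100e6            # assumed configuration-port throughput

full_ms = reconfiguration_time_ms(FULL_BITSTREAM_BYTES, PORT_BYTES_PER_S)
partial_ms = reconfiguration_time_ms(PARTIAL_BITSTREAM_BYTES, PORT_BYTES_PER_S)
print(f"full: {full_ms:.1f} ms, partial: {partial_ms:.1f} ms")
```

Under this simple model the time scales linearly with bitstream size, which is why reconfiguring only part of the fabric shortens the stall.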





#### Partial Reconfiguration (PR)

- PR allows a portion of an FPGA to be reconfigured dynamically by dividing the FPGA into two types of regions
  - Static region contains static portion of the design (Static Modules)
  - Partially reconfigurable region (PRR) loaded with a partial reconfiguration module (PRM)
- PR benefits in addition to full reconfiguration benefits
  - Only the reconfigured PRR is stalled while the static region and other PRRs continue operating
  - Smaller bitstream sizes
    - Reduced power consumption
    - Reduced memory requirements
    - Reduced time to reconfigure





#### PR Challenges and Motivations




### Contribution

- PR architecture benefits for a JPEG encoder system
  - JPEG encoder/decoder systems are a key enabling technology for low-power and high-performance image transmission for on-line satellite communication
- The JPEG encoder PR architecture provides increased flexibility as compared to a non-PR architecture
- Leveraging the JPEG encoder PR architecture we propose a PR architecture for a JPEG encoder/decoder (codec) system
  - The proposed PR codec architecture will provide significant benefits in terms of resource savings and power savings as well as flexibility
- The study of the PR architecture of the JPEG systems can be adapted to realize potential benefits for similar application types










## JPEG Encoder/Decoder Process

- JPEG encoding process for color images is divided into four main steps
  - Color Space Transformation RGB to YCbCr
  - Forward Discrete Cosine Transform (FDCT)
  - Quantization
  - Entropy Encoding
- JPEG decoder performs these steps in reverse
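The first three encoding steps can be sketched in software as follows. This is a minimal illustration, not the hardware design presented here: the color conversion uses the JFIF coefficients, the DCT is the direct (slow) textbook form rather than a fast hardware-friendly variant, and the flat quantization table in the usage lines is an assumption chosen for readability.

```python
import math

N = 8  # JPEG operates on 8x8 blocks


def rgb_to_ycbcr(r, g, b):
    """JFIF-style RGB -> YCbCr conversion for 8-bit samples."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr


def fdct_8x8(block):
    """Direct 2-D forward DCT of an 8x8 block of level-shifted samples."""
    def c(k):
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / 16)
                            * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * acc
    return out


def quantize(coeffs, table):
    """Divide each coefficient by its table entry and round to an integer."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(N)]
            for u in range(N)]


# Usage: a constant block has energy only in the DC coefficient.
flat_block = [[8.0] * N for _ in range(N)]
flat_table = [[16] * N for _ in range(N)]   # assumed table, illustration only
coeffs = fdct_8x8(flat_block)
quantized = quantize(coeffs, flat_table)
```

Entropy encoding then operates on the quantized block; a decoder would apply dequantization, the inverse DCT, and the inverse color transform in the opposite order.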










- Pipeline controller module
  - Ability to replace with an updated module
  - Ability to replace with a different controller module



- RGB2YCbCr and FDCT module
  - Ability to replace with an updated module
  - Ability to skip color space transformation by replacing with a module that only does DCT
  - Ability to swap in different DCT types



- Entropy encoder modules (Zigzag, Run length, Huffman, Header Generator, Byte Stuffer)
  - Ability to update each individual module
  - Ability to employ different entropy encoding schemes
  - Ability to replace Huffman code tables and update header accordingly
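To illustrate what three of these entropy-stage modules do, here is a simplified software sketch of the zigzag scan, run-length coding of AC coefficients, and byte stuffing. The function names are hypothetical, and the run-length step is deliberately simplified: it always appends an end-of-block pair and does not split long zero runs into ZRL symbols as baseline JPEG requires.

```python
N = 8


def zigzag_order(n=N):
    """Return (row, col) pairs in zigzag scan order for an n x n block."""
    def key(rc):
        r, c = rc
        s = r + c
        # Odd diagonals are walked with the row increasing,
        # even diagonals with the row decreasing.
        return (s, r if s % 2 else -r)

    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def run_length_ac(zigzagged):
    """Encode AC coefficients as (zero_run, value) pairs ending with (0, 0)."""
    pairs, run = [], 0
    for value in zigzagged[1:]:        # index 0 is the DC coefficient
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append((0, 0))               # end-of-block marker
    return pairs


def byte_stuff(data: bytes) -> bytes:
    """Insert 0x00 after every 0xFF so coded data cannot mimic a marker."""
    out = bytearray()
    for b in data:
        out.append(b)
        if b == 0xFF:
            out.append(0x00)
    return bytes(out)


# Usage on a mostly-zero quantized block.
block = [[0] * N for _ in range(N)]
block[0][0], block[0][1], block[2][0] = 5, 3, -2
scanned = [block[r][c] for r, c in zigzag_order()]
pairs = run_length_ac(scanned)
```

Because each stage has a narrow input and output, each maps naturally onto a separately replaceable module.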



- Quantization module
  - Ability to replace with an updated module
  - Ability to change quantization matrix tables to control image quality


#### Advantages of JPEG Encoder PR Architecture

- Provides flexibility by allowing different modules to be replaced
- More interesting benefits arise when the encoder architecture is combined with a decoder architecture

## JPEG Codec PR Architecture






- Benefits
  - Resource savings
    - Same hardware resources shared between encoder and decoder
  - Power savings
    - PR module bitstreams stored in memory and loaded on demand (decoding or encoding) as opposed to both occupying actual hardware resources
  - Increased flexibility
    - Encoder and decoder PR modules can be updated as needed or replaced with one of another type as per application requirements

#### Architecture limitations

- For a PR module loaded into a particular region
  - The loaded PR module's size and resource requirements (slices, FIFOs, BRAMs, DSPs) cannot exceed the maximum available in the PR region
  - PR module port connections, both incoming and outgoing, cannot exceed the PR region's maximum incoming and outgoing port connections, respectively
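These two constraints amount to a per-resource comparison between a module and its target region. A minimal sketch, assuming a simple record of resource and port counts; the type name, field names, and all numbers below are hypothetical and not taken from the presented design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    slices: int
    brams: int       # FIFO/RAMB16 blocks
    dsps: int        # DSP48 blocks
    ports_in: int    # incoming port connections
    ports_out: int   # outgoing port connections


def fits(module: Resources, region: Resources) -> bool:
    """True if the module needs no more of any resource than the region has."""
    names = ("slices", "brams", "dsps", "ports_in", "ports_out")
    return all(getattr(module, n) <= getattr(region, n) for n in names)


# Hypothetical region and modules, for illustration only.
region = Resources(slices=2000, brams=8, dsps=4, ports_in=32, ports_out=32)
small_module = Resources(slices=1500, brams=6, dsps=4, ports_in=24, ports_out=16)
wide_module = Resources(slices=1500, brams=6, dsps=4, ports_in=40, ports_out=16)

print(fits(small_module, region), fits(wide_module, region))
```

A module that fits on logic resources can still be rejected on port count alone, as `wide_module` shows.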





# Experimental Setup

#### Software

- Xilinx ISE 9.204 with PR patch 12 installed
- Synthesis options
  - Optimization Goal: Speed
  - Optimization Effort: Normal

#### Hardware

- Xilinx Virtex-4 LX60





## Results – Architecture Specifications

- Input image specifications
  - Color images only (3 components, RGB input)
  - Supported resolution 800x600
- JPEG Encoder system
  - Baseline JPEG encoding (ITU-T T.81 | ISO/IEC 10918-1)
  - Automatic generation of a standard JFIF v1.01 header
  - Design operates above 100 MHz
  - Hardcoded Huffman tables and two programmable quantization tables (one for luminance, one for chrominance) at 50% quality settings










NSF Center for High-Performance Reconfigurable Computing







## Results: Resource Requirements

- JPEG Encoder Architecture
  - Total Slices = 5,531
  - Total DSP48s = 9
  - Total FIFO/RAMB16s = 27

- JPEG Encoder PR Architecture
  - Total Slices = 5,678
  - Total DSP48s = 9
  - Total FIFO/RAMB16s = 27

[Figure: DSP48 Requirements]

[Figure: FIFO/RAMB16s Requirements]





## Results: PR vs Non-PR Architectures

- Encoder architecture slice requirement = 5,531
- Predicted decoder architecture slice requirement ~ 5,600
- Predicted total slice requirement for non-PR codec architecture ~ 11,200
- Predicted PR codec architecture slice requirement ~ 6,100
- Predicted resource savings from PR codec architecture = ((11,200 - 6,100) × 100) / 11,200 ~ 45%
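The savings figure is simple arithmetic on the predicted slice counts, reproduced here as a short check. The function and constant names are illustrative; the two slice counts are the predictions quoted above.

```python
def percent_savings(non_pr_slices: int, pr_slices: int) -> float:
    """Relative slice savings of the PR design, in percent."""
    return 100.0 * (non_pr_slices - pr_slices) / non_pr_slices


NON_PR_CODEC_SLICES = 11_200   # predicted non-PR codec slice requirement
PR_CODEC_SLICES = 6_100        # predicted PR codec slice requirement

savings = percent_savings(NON_PR_CODEC_SLICES, PR_CODEC_SLICES)
print(f"{savings:.1f}%")
```

Since both inputs are predictions, the result is best read as an estimate of roughly 45%.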





#### Conclusions and Future Work

#### Conclusions

- We created a JPEG encoder PR architecture for image encoding
- The architecture provides increased flexibility and potential power savings with a slice macro overhead of only 4%
- The architecture forms a base for a JPEG codec PR architecture
- The JPEG codec PR architecture is predicted to benefit from increased flexibility and power savings, as well as area savings of as much as 45% relative to a non-PR codec architecture

#### Future Work

- Complete proposed JPEG codec PR architecture
- Extend this work to develop PR architectures for MPEG/H.264 encoder and decoder systems









This work was supported in part by the I/UCRC Program of the National Science Foundation under Grant No. EEC-0642422. We also gratefully acknowledge tools provided by Xilinx.



