2013
Kitsos, Paris; Voros, Nikolaos; Dagiuklas, Tasos; Skodras, Athanassios
A high speed FPGA implementation of the 2D DCT for Ultra High Definition video coding
Proceedings Article
In: 2013 18th International Conference on Digital Signal Processing (DSP), pp. 1-5, 2013, ISSN: 1546-1874.
Tags: 2D DCT, distributed arithmetic, FPGA implementation, VHDL, video coding
@inproceedings{Kitsos2013b,
title = {A high speed {FPGA} implementation of the {2D} {DCT} for {Ultra High Definition} video coding},
author = {Paris Kitsos and Nikolaos Voros and Tasos Dagiuklas and Athanassios Skodras},
doi = {10.1109/ICDSP.2013.6622742},
issn = {1546-1874},
year = {2013},
date = {2013-07-01},
booktitle = {2013 18th International Conference on Digital Signal Processing (DSP)},
pages = {1--5},
abstract = {This paper presents two high-performance FPGA architectures for computing the 2D DCT in Ultra High Definition video coding systems. Both architectures use Distributed Arithmetic instead of traditional multipliers to perform the necessary multiplications. The first architecture uses 105 clock cycles to transform an 8×8 block and reaches a rate of up to 206 Msamples/s at 338.5 MHz, while the second requires 65 cycles per 8×8 block and achieves 252 Msamples/s at 256 MHz. Both architectures have been implemented in VHDL and realized on a Xilinx Virtex-7 FPGA.},
keywords = {2D DCT, distributed arithmetic, FPGA implementation, VHDL, video coding},
pubstate = {published},
tppubtype = {inproceedings}
}
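The throughput figures in the abstract follow directly from the clock frequency and the number of cycles per 8×8 block, which is why they must be read as millions of samples per second. A minimal Python sketch, assuming one 64-sample block completes every N clock cycles at a sustained rate (the names `BLOCK_SAMPLES` and `throughput_msps` are illustrative, not from the paper):

```python
# Throughput check for the two DCT architectures described in the abstract.
# Assumption: one 8x8 block (64 samples) is completed every
# `cycles_per_block` clock cycles, so rate = f_clk * 64 / cycles.

BLOCK_SAMPLES = 8 * 8  # samples in one 8x8 block


def throughput_msps(f_clk_mhz: float, cycles_per_block: int) -> float:
    """Return the sustained throughput in Msamples/s."""
    return f_clk_mhz * BLOCK_SAMPLES / cycles_per_block


arch1 = throughput_msps(338.5, 105)  # first architecture
arch2 = throughput_msps(256.0, 65)   # second architecture
print(f"Architecture 1: {arch1:.1f} Msamples/s")  # Architecture 1: 206.3 Msamples/s
print(f"Architecture 2: {arch2:.1f} Msamples/s")  # Architecture 2: 252.1 Msamples/s
```

The second design runs at a lower clock yet delivers more samples per second because its cycle count per block drops from 105 to 65.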
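The abstract states only that Distributed Arithmetic replaces the multipliers. As a generic illustration, not the authors' architecture, the following Python model computes an inner product Σ c_k·x_k the DA way: a 2^K-entry table of coefficient partial sums is addressed by one bit from each input per step, and the table outputs are shifted and accumulated, with the sign bit's contribution subtracted. The function names, the 8-bit input width and the integer cosine coefficients (scaled by 128) are assumptions chosen for the example.

```python
import math

# Software model of a bit-serial distributed-arithmetic (DA) inner product.
# Assumptions (illustrative, not from the paper): inputs are B-bit
# two's-complement integers and coefficients are pre-scaled integers,
# so the DA result can be compared exactly with a direct dot product.


def build_da_lut(coeffs: list[int]) -> list[int]:
    """Entry m holds the sum of coeffs[k] for every bit k set in address m."""
    size = 1 << len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (m >> k) & 1)
        for m in range(size)
    ]


def da_inner_product(xs: list[int], lut: list[int], bits: int) -> int:
    """Compute sum(coeffs[k] * xs[k]) with table reads, shifts and adds only."""
    acc = 0
    for b in range(bits):  # one iteration per input bit, LSB first
        # Gather bit b of every input into one table address (a "bit slice").
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k
        partial = lut[addr] << b
        # The MSB of a two's-complement word carries negative weight.
        acc += -partial if b == bits - 1 else partial
    return acc


# Hypothetical example: one 8-point DCT basis row, cosines scaled by 128.
coeffs = [round(128 * math.cos((2 * n + 1) * math.pi / 16)) for n in range(8)]
lut = build_da_lut(coeffs)  # 256 entries for 8 coefficients
xs = [-128, 5, 17, 127, -1, 0, 64, -77]  # 8-bit signed samples
assert da_inner_product(xs, lut, bits=8) == sum(c * x for c, x in zip(coeffs, xs))
```

In hardware, the table maps to a ROM or LUT-RAM and each loop iteration to one clock cycle of a shift-accumulator, so no multiplier is needed; how the paper's two designs organize their tables to reach 105 and 65 cycles per block is not stated in the abstract.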