Advances in video compression: a glimpse of the long-awaited disruption

Thomas Guionnet; Marwa Tarchouli; Sebastien Pelurson; Mickael Raulet

Authors

Thomas Guionnet, Ateme
Marwa Tarchouli
Sebastien Pelurson
Mickael Raulet

Keywords:

Video compression, machine learning

Abstract

The consumption of video content on the internet is increasing steadily, and so is video quality. Cisco [1] estimates that by 2023, two-thirds (66 percent) of installed flat-panel TV sets will be UHD, up from 33 percent in 2018. The bitrate for 4K video is more than double the HD video bitrate, and about nine times the SD bitrate. In response to the ever-growing demand for high-quality video, compression technology improves steadily. Video compression is a highly competitive and successful field of research and industrial application. Billions of people are affected, from TV viewers and streaming addicts to professionals, from gamers to families. Video compression is used for contribution, broadcasting, streaming, cinema, gaming, video surveillance, social networks, videoconferencing, military applications, and more.

The video compression field dates back to the early 1980s. Since then, it has seen continuous improvement and strong attention from the business side: the video encoder market is projected to exceed USD 2.2 billion by 2025 [2]. Among the many well-known milestones of video compression, MPEG-2 was a tremendous success in the 1990s and the enabler of digital TV. MPEG-2 has been used on cable TV, satellite TV and DVD. In the early 2000s, AVC/H.264 became a key component of HD TV, on traditional networks as well as on internet and mobile networks. AVC/H.264 is also used in HD Blu-ray discs. Ten years later, in the 2010s, HEVC (H.265) was the enabler of 4K/UHD, HDR and WCG. Finally, VVC (H.266) was issued in 2020. Although it is a young codec, not yet widely deployed, it is perceived as an enabler for 8K [3] and as strong support for the ever-growing demand for high-quality video over the internet.

Each codec generation decreases the bitrate by approximately a factor of two. This comes, however, at the cost of increased complexity. The reference VVC encoder is about 10 times more complex than the reference HEVC encoder. Interestingly, the technology does not change radically between codec generations. Instead, the same principles and ideas are re-used and pushed further. Of course, there are new coding tools, but the overall structure remains the same. Let us consider a simple example: intra coding, which consists of encoding a block of a frame independently of previous frames. In MPEG-2, intra block coding is performed without prediction from neighboring blocks. In AVC/H.264, intra blocks are predicted from neighboring blocks, with 9 possible modes. In HEVC, the prediction principle is carried over, with 35 possible modes, while VVC pushes further to 67 possible prediction modes. Having more prediction modes allows better predictions, hence better compression (even though the mode signaling cost increases), at the cost of more complexity for the encoder, which must decide among a larger set of possibilities.
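The mode decision described above can be illustrated with a minimal sketch. It assumes only three toy modes (vertical, horizontal, DC) and a sum-of-absolute-differences cost; the function names are hypothetical, and real encoders use many more modes and a rate-distortion cost that also accounts for mode signaling.

```python
from typing import Dict, List, Sequence, Tuple

Block = List[List[int]]


def predict(top: List[int], left: List[int], mode: str) -> Block:
    """Build an n x n prediction from the row above and the column to the left."""
    n = len(top)
    if mode == "vertical":    # copy the row above downwards
        return [list(top) for _ in range(n)]
    if mode == "horizontal":  # copy the left column rightwards
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":          # flat block at the rounded mean of the neighbors
        dc = (sum(top) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError(f"unknown mode: {mode}")


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_mode(
    block: Block,
    top: List[int],
    left: List[int],
    modes: Sequence[str] = ("vertical", "horizontal", "dc"),
) -> Tuple[str, int]:
    """Try every candidate mode and keep the one with the lowest SAD."""
    costs: Dict[str, int] = {m: sad(block, predict(top, left, m)) for m in modes}
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


# A block made of vertical stripes is predicted perfectly by the vertical mode.
top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [[10, 20, 30, 40] for _ in range(4)]
print(best_mode(block, top, left))
```

The exhaustive loop in `best_mode` is what grows with each codec generation: a larger mode set gives more chances of a good prediction, but the encoder has more candidates to evaluate.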

The encoding structure we are dealing with is the block-based hybrid video coding scheme. One natural question is how far we can push this model. In other words, can we steadily improve the compression performance of this model decade after decade, by pushing the parameters and adding more local coding tools, or are we converging to a limit? At each codec generation, the question has been raised, and answered by the next generation. New competing models have been proposed. For example, in the early 2000s, 3D motion-compensated wavelet filtering was studied as a means of efficiently compacting video energy [4]. The technology was promising, but never surpassed AVC/H.264, which was emerging at that time.

Nowadays, the recognized industry benchmark in terms of video compression performance is VVC. Can we go beyond VVC performance? The answer is already known, and it is yes. Indeed, the JVET standardization group, which is responsible for VVC, is currently conducting explorations. The Ad-Hoc Group 12 (AHG12) is dedicated to the enhancement of VVC. Coding efficiency gains of around 15% are observed, only two years after VVC finalization [5]. So, we may continue the process for at least another decade.

However, a new contender is emerging: artificial intelligence, or more precisely, machine learning and deep learning. In another Ad-Hoc Group, AHG11, JVET is exploring how machine learning can be the basis of new coding tools. This also brings coding efficiency gains of about 12% [6]. Hence the question: will the future of video compression include machine learning? At this stage, we would like to point out two new facts.

First, considering the “traditional” methods explored by AHG12, there is a coding tool which seems to have stopped bringing gains: frame partitioning. Partitioning is a fundamental tool for video compression. It defines how precisely the encoder can adapt to local content characteristics. The more flexible it is, the better the coding efficiency. All the subsequent coding tools depend on the ability to partition the frame efficiently. AVC has 16x16-pixel blocks, with some limited sub-partitioning. HEVC implements a much more flexible quadtree-based partitioning from 64x64-pixel blocks. VVC combines quadtree partitioning with binary and ternary tree partitioning, from 128x128-pixel blocks, for even more flexibility. During the exploration following HEVC standardization, enhancing the partitioning alone brought up to 15% coding efficiency gains. Similarly, in the AHG12 context, new extended partitioning strategies were proposed. However, only marginal gains were reported [7]. Does that mean we are finally approaching a limit?
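The idea of quadtree partitioning can be sketched as follows. This is an illustrative toy, not the HEVC or VVC procedure: it assumes a square frame stored as a list of rows, and it splits a block whenever its sample range exceeds a hypothetical `threshold`, whereas real encoders decide splits by rate-distortion optimization.

```python
from typing import List, Tuple

Frame = List[List[int]]
Leaf = Tuple[int, int, int]  # (x, y, size) of a block that is not split further


def block_range(frame: Frame, x: int, y: int, size: int) -> int:
    """Difference between the largest and smallest sample in the block."""
    vals = [frame[j][i] for j in range(y, y + size) for i in range(x, x + size)]
    return max(vals) - min(vals)


def quadtree(
    frame: Frame, x: int, y: int, size: int, min_size: int, threshold: int
) -> List[Leaf]:
    """Recursively split a block into four quadrants while it is not flat enough."""
    if size <= min_size or block_range(frame, x, y, size) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves: List[Leaf] = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree(frame, x + dx, y + dy, half, min_size, threshold)
    return leaves


# An 8x8 flat frame with a single bright sample in the top-left corner:
# only the quadrant containing the detail is refined down to 2x2 blocks.
frame = [[0] * 8 for _ in range(8)]
frame[0][0] = 255
leaves = quadtree(frame, 0, 0, 8, min_size=2, threshold=10)
print(leaves)
```

Flat regions end up in large blocks and detailed regions in small ones, which is the local adaptation the paragraph above refers to. Binary and ternary splits, as in VVC, would add further split shapes to the same recursion.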

The second fact is the development of end-to-end deep learning video compression. This strategy is highly disruptive. In short, the whole block-based hybrid coding scheme is replaced by a set of deep learning networks, such as auto-encoders. Schemes of this type compete with state-of-the-art still-image coders [8]. For video applications, they match HEVC performance [9][10]. This level of performance has been reached in only five years, an unprecedentedly fast progression. One may easily extrapolate that, even if the progression slows down, state-of-the-art video compression performance will soon belong to the end-to-end strategy. Therefore, we may very well be at a turning point in the history of video codecs.

The goal of this paper is to analyze the benefits and limitations of deep learning-based video compression methods, and to investigate practical aspects such as rate control, delay, memory consumption and power consumption. In the first part, the deep learning strategies are analyzed, with a focus on tool-based, end-to-end, and super-resolution-based strategies. In the second part, the practical limitations for industrial applications are studied. In the third part, a technology is proposed, namely overlapping patch-based end-to-end video compression, to overcome memory consumption limitations. Finally, experimental results are provided and discussed.
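The abstract does not detail the proposed patch-based method, so the following is only a generic sketch of tiling a frame into overlapping patches, under the assumption that each patch is then processed independently so that peak memory depends on the patch size rather than on the frame size. The helper names, the patch size and the overlap value are all hypothetical.

```python
from typing import List, Tuple


def patch_starts(length: int, patch: int, overlap: int) -> List[int]:
    """Start offsets of patches covering [0, length) along one dimension.

    Consecutive patches share at least `overlap` samples; the last patch is
    shifted back so that it ends exactly at the frame border.
    """
    if not 0 <= overlap < patch:
        raise ValueError("overlap must satisfy 0 <= overlap < patch")
    if patch >= length:
        return [0]
    stride = patch - overlap
    starts = list(range(0, length - patch, stride))
    starts.append(length - patch)
    return starts


def patch_grid(width: int, height: int, patch: int, overlap: int) -> List[Tuple[int, int]]:
    """Top-left corners (x, y) of all patches, in raster order."""
    xs = patch_starts(width, patch, overlap)
    ys = patch_starts(height, patch, overlap)
    return [(x, y) for y in ys for x in xs]


# Hypothetical setting: a 1920x1080 frame cut into 256x256 patches with 16 shared samples.
grid = patch_grid(1920, 1080, patch=256, overlap=16)
print(len(grid), grid[:3])
```

The overlap gives a reconstruction stage some shared samples to blend across patch borders, which is a common way to hide seams; how the paper actually exploits the overlap is not stated in this abstract.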
