Scalable Quality Extension for the Opus Codec

Internet-Draft	Scalable Quality Extension	March 2025
Valin	Expires 18 September 2025

2. Scalable Quality Extension

The Opus codec was designed to operate at sampling frequencies up to 48 kHz, with an audio bandwidth of up to 20 kHz. The CELT mode, which is used for high-bitrate coding, uses vector quantization with a mostly implicit bit allocation that is dictated by the bitstream definition. Opus can allocate up to 8 bits per MDCT bin in some of the bands.

While the Opus capabilities listed above are sufficient to achieve perceptually transparent audio coding, there is demand for codecs that scale beyond those limits. This includes the current market for 24-bit/96 kHz codecs, but also any application where the intended recipient is not (only) a human being, e.g., ultrasonic applications.

This document proposes a scalable quality extension layer that both increases the resolution of the existing Opus quantizers below 20 kHz and defines a way of coding audio above 20 kHz, with a sampling rate of 96 kHz. The extension is designed to be forward and backward compatible with [RFC6716]. All extra bits are carried using the Opus extension mechanism defined in [opus-extension], and a 96 kHz decoder is designed to be able to decode a regular 48 kHz RFC 6716 stream, and vice versa.

The code corresponding to this draft (work in progress) is available on the exp_qext24 branch of the Opus repository at https://gitlab.xiph.org/xiph/opus/ .

2.1. Extended resolution

To reduce the coding error, we need to increase the resolution of three different quantizers: the fine energy quantizer (scalar), the band pyramid vector quantizer (PVQ), and the band-splitting angle quantizer. We also introduce a new cubic quantizer that scales to higher bit depths than PVQ. To preserve compatibility, all of the bits extending the Opus resolution are stored in the extension payload.

2.1.1. Fine energy quantizer

For each band, we can increase the resolution of the fine energy quantizer by adding extra bits. The extra bits are added in the same way that the regular fine energy quantizer adds resolution on top of the coarse energy quantizer.
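As an illustration, the sketch below applies one uniform refinement stage in the style of the RFC 6716 fine energy quantizer: the current quantization cell is split into 2^bits sub-cells, the index of the sub-cell containing the residual is coded, and the sub-cell center is subtracted. Stages can be chained, each one working on the residual and cell width left by the previous one. The function name `refine_energy` and the floating-point arithmetic are assumptions for illustration; the implementation in the Opus repository may differ in details such as fixed-point scaling and rounding.

```python
import math

def refine_energy(error, cell, bits):
    """One uniform refinement stage over a cell of width `cell` centered on 0.

    `error` is the residual left by the previous stage (expected in
    [-cell/2, cell/2]). Returns the coded index q in [0, 2**bits - 1],
    the reconstruction offset, and the new residual.
    """
    levels = 1 << bits
    q = math.floor((error / cell + 0.5) * levels)
    q = min(max(q, 0), levels - 1)             # keep q inside the coded range
    offset = ((q + 0.5) / levels - 0.5) * cell  # center of the selected sub-cell
    return q, offset, error - offset
```

For example, a residual of 0.37 (in units of the coarse step) refined first with 2 bits and then with 3 extra bits uses `refine_energy(0.37, 1.0, 2)` followed by `refine_energy(residual, 0.25, 3)`; after the second stage the residual is bounded by 0.25/16.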

2.1.2. PVQ

From a PVQ codebook of size K (i.e., with K pulses) in N dimensions, we can create an extended codebook of size u*K, where u is always odd and is selected as u=2^b-1, with b being the extra depth. Let y_i be the (integer) value for dimension i of the size-K codeword and z_i be the corresponding value for the size-u*K codeword. We define a refinement r_i = z_i - u*y_i, where |r_i| < u (in the N=2 special case, |r_i| < (u+1)/2). Only the refinement r_i needs to be coded, since the regular Opus bitstream already includes y_i. The last refinement value r_{N-1} does not need to be coded, since its value can be inferred from the other values and the knowledge that the sum of the absolute values of z_i is u*K. The only exception is when y_{N-1}=0, in which case a single sign bit is coded, but the magnitude is still inferred.
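The sketch below illustrates the refinement relationship and how a decoder can infer the last coordinate. It assumes plain Python integer lists for y (the base codeword with K pulses) and z (the extended codeword with u*K pulses); the function names are hypothetical, and the entropy coding of r_i is omitted.

```python
def pvq_refinements(z, y, u):
    """r_i = z_i - u*y_i; only r_0..r_{N-2} need to be transmitted."""
    r = [zi - u * yi for zi, yi in zip(z, y)]
    assert all(abs(ri) < u for ri in r), "z is not a valid refinement of y"
    return r

def pvq_reconstruct(y, r_coded, u, k, sign_bit=0):
    """Rebuild z from y, r_0..r_{N-2} and, only when y_{N-1} == 0, one sign bit."""
    z = [u * yi + ri for yi, ri in zip(y[:-1], r_coded)]
    mag = u * k - sum(abs(zi) for zi in z)  # sum of |z_i| must equal u*K
    if y[-1] != 0:
        # |r| < u <= |u*y|, so z_{N-1} has the same sign as y_{N-1}
        sign = 1 if y[-1] > 0 else -1
    else:
        sign = -1 if sign_bit else 1        # the one case where a sign bit is coded
    return z + [sign * mag]
```

For instance, with y = [2, -1, 0], K = 3 and b = 2 (u = 3), the extended codeword z = [5, -3, -1] gives r = [-1, 0, -1]; only r_0, r_1 and one sign bit are needed to recover z.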

Even though |r_i| < u, smaller values of r_i are more likely, so we benefit from entropy coding r_i. We assume that the probability of |r_i| < (u+1)/2 is 7/8 and use that probability for decoding a "large" flag. If large=0, we decode b bits and subtract u/2 to get r_i. If large=1, we decode a sign bit, followed by a (b-1)-bit integer, to which we add u/2+1 before applying the sign.
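The mapping between r_i and the coded fields can be sketched as follows. This assumes that "u/2" denotes C-style integer division, i.e. (u-1)/2 for odd u, and that a sign bit of 1 means negative; neither convention is stated explicitly above. The range-coder calls themselves are not modeled, only the fields and their nominal cost in bits. Function names are hypothetical.

```python
import math

def split_refinement(r, b):
    """Map a refinement r (|r| < u, u = 2**b - 1) to (large, sign, payload)."""
    u = (1 << b) - 1
    half = u // 2                          # "u/2" read as integer division: (u-1)/2
    if abs(r) < (u + 1) // 2:              # small case, assumed probability 7/8
        return 0, None, r + half           # payload sent as b raw bits
    sign = 1 if r < 0 else 0               # assumed convention: 1 means negative
    return 1, sign, abs(r) - (half + 1)    # payload sent as b-1 raw bits

def join_refinement(large, sign, payload, b):
    """Inverse of split_refinement."""
    half = ((1 << b) - 1) // 2
    if not large:
        return payload - half
    mag = payload + half + 1
    return -mag if sign else mag

def refinement_cost_bits(r, b):
    """Nominal cost: -log2 P(large flag) plus the raw bits that follow."""
    large, _, _ = split_refinement(r, b)
    if not large:
        return -math.log2(7 / 8) + b
    return -math.log2(1 / 8) + 1 + (b - 1)
```

Under this model a small refinement costs about b + 0.19 bits and a large one b + 3 bits, which is why the split pays off when most refinements are small.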

2.1.3. Angle quantizer

When using mid-side stereo or when splitting a band, we code an angle representing the arctangent of the ratio of the magnitudes of two sub-vectors. The standard Opus encoder can code angles with up to 8 bits. Similarly to the PVQ refinement, we pick u = 2^b-1, where u is the number of (equidistant) extra quantization levels added between each pair of adjacent original levels. We code a uniformly distributed integer (uint) symbol between 0 and u-1, where 0 is almost at the midpoint to the previous (lower) quantization level, u-1 is almost at the midpoint to the next (higher) level, and (u+1)/2 lines up exactly with the quantization level originally selected in the standard Opus layer.

2.1.4. Cubic quantizer

The existing Opus PVQ only scales up to 32-bit codebooks. For cases where there is no PVQ in the base Opus layer, we define a new cubic quantizer. Whereas the PVQ codebook is defined as a reflected simplex warped onto the unit sphere, the cubic quantizer warps the shell of an N-dimensional cube onto the same unit sphere. Cubic codewords specify which face of the cube the vector lies on by coding the dimension and sign of the largest component (using 1+log2(N) bits). Each face of an N-dimensional hypercube shell is a full (N-1)-dimensional cube and can be coded with N-1 scalar values from 0 to Q-1 ((N-1)*log2(Q) bits). We use an even Q (Q=2^b) for non-transient bands (B=1) and an odd Q (Q=2^b-1) for transient bands (B>1).
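The text above does not spell out the reconstruction levels on each face or the exact warp onto the sphere, so the following sketch makes two assumptions: face-coordinate index q reconstructs to the cell center -1 + (2q+1)/Q, and the warp is a plain radial normalization. Both are illustrative choices, not the draft's definition, and the function names are hypothetical. The sketch shows face selection (dimension and sign of the largest component) and the nominal codeword size.

```python
import math

def cubic_bits(n, q):
    """Nominal codeword size: face (1 + log2 N) plus N-1 coordinates (log2 Q each)."""
    return 1 + math.log2(n) + (n - 1) * math.log2(q)

def cubic_quantize(x, q):
    """Return (face dimension, sign, indices) for vector x on a Q-level cube shell."""
    k = max(range(len(x)), key=lambda i: abs(x[i]))  # largest component picks the face
    sign = 1 if x[k] < 0 else 0
    peak = abs(x[k])
    idx = []
    for i, v in enumerate(x):
        if i == k:
            continue
        t = v / peak if peak > 0 else 0.0            # coordinate on the face, in [-1, 1]
        idx.append(min(max(math.floor((t + 1) * q / 2), 0), q - 1))
    return k, sign, idx

def cubic_dequantize(k, sign, idx, n, q):
    """Rebuild the cube-shell point, then project it radially onto the unit sphere."""
    coords = iter(idx)
    y = []
    for i in range(n):
        if i == k:
            y.append(-1.0 if sign else 1.0)              # the face coordinate is +/-1
        else:
            y.append(-1.0 + (2 * next(coords) + 1) / q)  # assumed cell-center level
    norm = math.sqrt(sum(v * v for v in y))
    return [v / norm for v in y]
```

Under this assumed reconstruction rule, an odd Q places a level exactly at 0 on each face coordinate, while an even Q does not.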

2.2. Extended frequency range

To extend the audio bandwidth, we need to define more frequency bands. Because psychoacoustic considerations no longer apply above 20 kHz, all new bands are defined to have a width of 2 kHz. Therefore, when encoding 48 kHz content, we add 2 extra bands (20-24 kHz), and when encoding 96 kHz content, we add 14 extra bands (20-48 kHz). A flag is encoded to specify whether 2 or 14 bands are added. The decoder uses that flag to know how many bands to decode, regardless of whether it is decoding at 48 or 96 kHz.
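The band count follows directly from splitting the range above 20 kHz into 2 kHz bands up to the Nyquist frequency. In this sketch, `wide` is a hypothetical name for the decoded flag, and band edges are expressed in Hz rather than in the MDCT bin units an implementation would use.

```python
def extension_band_edges(wide, start_hz=20000, width_hz=2000):
    """Edges (Hz) of the bands added above 20 kHz.

    wide=False: 48 kHz content, bands up to 24 kHz (2 bands).
    wide=True:  96 kHz content, bands up to 48 kHz (14 bands).
    """
    top_hz = 48000 if wide else 24000
    return [(lo, lo + width_hz) for lo in range(start_hz, top_hz, width_hz)]
```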

2.3. Bit allocation

The allocation of the extra bit depth b is explicitly signaled one band at a time, using a resolution of 1/4 bit of depth, between 0 and a band-dependent cap C, where C=12 for bands up to 20 kHz and C=14 for the added bands. For band i, we use entropy coding to give a higher probability to three different cases: b_i=0, b_i=C, and b_i=b_{i-1}. When b_{i-1} is either 0 or C, two of these cases coincide and their probabilities are merged. The ICDF for the general case is {120, 112, 70, 0}, where the first symbol means b_i=0, the second means b_i=C, the third means b_i=b_{i-1}, and the last symbol means that b_i is equal to 1 plus a uint value coded from 0 to C-1. For b_{i-1}=0, we use the ICDF {64, 50, 0}, and for b_{i-1}=C, we use {110, 60, 0}, where the last symbol always means that a uint value is coded. We start with b_{-1}=0.
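The following sketch shows how an encoder could choose the allocation symbol for a band, and what probabilities the ICDF tables imply. Two details are assumptions not stated in the text: that the tables use a total of 128 (a 7-bit `ftb` argument to the Opus range coder's `ec_enc_icdf()`/`ec_dec_icdf()`), and that for b_{i-1}=C the first two symbols keep the order b_i=0, b_i=C. Function names are hypothetical.

```python
GENERAL_ICDF = (120, 112, 70, 0)  # b=0, b=C, b=prev, escape
PREV_ZERO_ICDF = (64, 50, 0)      # b=0 (=prev), b=C, escape
PREV_CAP_ICDF = (110, 60, 0)      # b=0, b=C (=prev), escape (order assumed)

def icdf_probabilities(icdf, ftb=7):
    """Per-symbol probabilities of an ICDF table with total 2**ftb (ftb assumed)."""
    ft = 1 << ftb
    probs, prev = [], ft
    for v in icdf:
        probs.append((prev - v) / ft)
        prev = v
    return probs

def allocation_symbol(b, prev, cap):
    """Return (icdf, symbol, uint) for depth b given the previous band's depth."""
    if prev == 0:
        icdf, shortcuts = PREV_ZERO_ICDF, {0: 0, cap: 1}
    elif prev == cap:
        icdf, shortcuts = PREV_CAP_ICDF, {0: 0, cap: 1}
    else:
        icdf, shortcuts = GENERAL_ICDF, {0: 0, cap: 1, prev: 2}
    if b in shortcuts:
        return icdf, shortcuts[b], None
    return icdf, len(icdf) - 1, b - 1  # escape: uint in [0, C-1], b = 1 + uint
```

Encoding a whole band list then amounts to calling `allocation_symbol` with `prev` starting at 0 and updated to each band's b_i in turn.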

Given b_i, the number of extra energy bits is given by (b_i+3)/4. The number of 1/8 bits (BITRES) allocated for PVQ refinement and/or cubic codebook bits is given by ((W-1)*C*b_i*8 + 2)/4, where W is the number of bins in the band and C is the number of channels (not to be confused with the cap C in the previous paragraph).
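The two formulas can be evaluated with integer arithmetic as below, assuming C-style integer division (equivalent to floor division for these non-negative values). The number of channels is named `channels` to avoid the clash with the cap C; the function name is hypothetical.

```python
def extension_bits(b_i, width, channels):
    """Extra energy bits and PVQ/cubic budget (in 1/8 bit) for one band."""
    energy_bits = (b_i + 3) // 4                                # = ceil(b_i / 4)
    shape_bitres = ((width - 1) * channels * b_i * 8 + 2) // 4  # 1/8-bit units (BITRES = 3)
    return energy_bits, shape_bitres
```

For example, b_i=5 in an 8-bin stereo band gives 2 extra energy bits and 140 eighth-bits (17.5 bits) of shape refinement, i.e. 5/4 bit for each of the 7 coded bins in each channel.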

2.4. Time-domain processing at 96 kHz

CELT includes two time-domain filter pairs that require updating for 96 kHz: the preemphasis/deemphasis filters and the pitch prefilter/postfilter. The CELT deemphasis filter is currently defined as D(z)=1/(1 - a1*z^-1) for a 48 kHz signal, where a1=27853/32768. To obtain approximately the same response in the 0-20 kHz range at a sampling rate of 96 kHz, we instead use D(z)=g*(1 - b1*z^-1)/(1 - a1*z^-1), where g=5415/8192, b1=7209/32768, and a1=30245/32768.
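The claim that both deemphasis filters have approximately the same response below 20 kHz can be checked numerically by evaluating each transfer function on the unit circle. The sketch uses only the coefficients given above; the function names and the 100 Hz evaluation grid are arbitrary choices for illustration.

```python
import cmath
import math

# Coefficients from the text (Q15 and Q13 fractions)
A1_48 = 27853 / 32768
G_96 = 5415 / 8192
B1_96 = 7209 / 32768
A1_96 = 30245 / 32768

def deemph_48(f_hz):
    """D(z) = 1 / (1 - a1 z^-1) at 48 kHz, evaluated at z = exp(j 2 pi f / fs)."""
    zinv = cmath.exp(-2j * math.pi * f_hz / 48000)
    return 1 / (1 - A1_48 * zinv)

def deemph_96(f_hz):
    """D(z) = g (1 - b1 z^-1) / (1 - a1 z^-1) at 96 kHz."""
    zinv = cmath.exp(-2j * math.pi * f_hz / 96000)
    return G_96 * (1 - B1_96 * zinv) / (1 - A1_96 * zinv)

def worst_mismatch_db(f_max_hz=20000, step_hz=100):
    """Largest magnitude difference (dB) between the two filters over 0..f_max."""
    return max(abs(20 * math.log10(abs(deemph_96(f)) / abs(deemph_48(f))))
               for f in range(0, f_max_hz + 1, step_hz))
```

The mismatch stays below 0.5 dB over 0-20 kHz, with the largest deviation appearing near the top of that range.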

For the pitch prefilter/postfilter, we use zero-insertion upsampling of the 48 kHz filters, which results in the same frequency response below 24 kHz and a "folded" (mirrored) image above 24 kHz. For example, if for a pitch period T (in 48 kHz samples) the 48 kHz postfilter is P(z)=1/(1 - a0*z^-(T-1) - a1*z^-T - a2*z^-(T+1)), then for the same pitch, the 96 kHz filter becomes P(z)=1/(1 - a0*z^-(2T-2) - a1*z^-2T - a2*z^-(2T+2)).
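Zero-insertion upsampling replaces z with z^2, which doubles every delay. The sketch below checks numerically that this preserves the magnitude response below 24 kHz and mirrors it around 24 kHz. The tap gains and the pitch period used in the check are arbitrary hypothetical values, not values taken from Opus; the function names are also hypothetical.

```python
import cmath
import math

def comb_response(f_hz, fs_hz, delays, gains):
    """|P| at f for P(z) = 1 / (1 - sum_k gains[k] * z^-delays[k])."""
    zinv = cmath.exp(-2j * math.pi * f_hz / fs_hz)
    return abs(1 / (1 - sum(g * zinv ** d for d, g in zip(delays, gains))))

def delays_48k(t):
    return [t - 1, t, t + 1]

def delays_96k(t):
    # Zero-insertion upsampling maps z -> z^2, doubling every delay
    return [2 * (t - 1), 2 * t, 2 * (t + 1)]
```

Comparing `comb_response(f, 48000, delays_48k(t), gains)` with `comb_response(f, 96000, delays_96k(t), gains)` gives identical values for f below 24 kHz, and the 96 kHz response at 48 kHz - f equals the 48 kHz response at f.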

3. Format

Symbol(s)	PDF/Description
96 kHz flag	{1, 1}/2
Intensity stereo	uint
Dual stereo	{1, 1}/2
Intra coarse energy	{7, 1}/8
Coarse energy (high bands)
Bit allocation	Section 2.3
Fine energy (low bands)	Section 2.1.1
PVQ refinement	Section 2.1.2, Section 2.1.3
Fine energy (high bands)	Section 2.3
PVQ and cubic codebook (high bands)	Section 2.1.4
