Internet-Draft | moq-mi | March 2025 |
Cenzano-Ferret & Frindell | Expires 6 September 2025 |
This protocol can be used to send and receive video and audio over Media over QUIC Transport [MOQT], using LOC [loc] packaging.¶
This note is to be removed before publishing as an RFC.¶
The latest revision of this draft can be found at https://afrind.github.io/draft-cenzano-media-interop/draft-cenzano-moq-media-interop.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-cenzano-moq-media-interop/.¶
Discussion of this document takes place on the Media Over QUIC Working Group mailing list (mailto:[email protected]), which is archived at https://mailarchive.ietf.org/arch/browse/moq/. Subscribe at https://www.ietf.org/mailman/listinfo/moq/.¶
Source for this draft and an issue tracker can be found at https://github.com/afrind/draft-cenzano-media-interop.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 6 September 2025.¶
Copyright (c) 2025 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
This protocol specifies a simple mechanism for sending media (video and audio) over LOC [loc] for both live-streaming and video conference (VC) style use cases.¶
moq-mi allows encoding parameters (e.g., frame rate, resolution, or codec) to be updated in the middle of a track.¶
The protocol refers to [loc] to define the specific media wire format.¶
The publisher selects a namespace of their choosing, and sends an ANNOUNCE message for this namespace.¶
Within the publisher namespace, the publisher offers media tracks named videoX and audioX, where X is an integer starting at 0.¶
So if the publisher issues 2 audio tracks and 1 video track, the available track names are video0, audio0, and audio1.¶
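The naming rule above can be sketched as a small helper. This is only an illustration in Python; the function name `track_names` and its parameters are hypothetical and not defined by moq-mi.

```python
def track_names(num_video: int, num_audio: int) -> list[str]:
    """Return moq-mi track names: videoX and audioX, with X starting at 0."""
    if num_video < 0 or num_audio < 0:
        raise ValueError("track counts must be non-negative")
    video = [f"video{i}" for i in range(num_video)]
    audio = [f"audio{i}" for i in range(num_audio)]
    return video + audio


# One video track and two audio tracks, as in the example above.
print(track_names(num_video=1, num_audio=2))
```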
The subscriber treats all tracks in the same namespace as part of the same synchronization group (their timestamps are aligned to the same timeline).¶
For the video track, the publisher begins a new group at the start of each IDR (so object 0 is always an IDR keyframe), and each group contains a single subgroup. Each object has the format described in Section 2.4.¶
For the audio track, the publisher begins a new group with each audio object, and each group contains a single subgroup. Each object has the format described in Section 2.4.¶
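The group mapping for video and audio might be sketched as follows. This is a minimal illustration, not part of the protocol: the class names are hypothetical, and it assumes group IDs start at 0 and increase by 1, and that a video track cannot begin with a non-IDR frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectLocation:
    group_id: int
    object_id: int


class VideoGrouper:
    """Starts a new group at every IDR frame, so object 0 is always an IDR."""

    def __init__(self) -> None:
        self._group_id = -1  # no group opened yet
        self._object_id = 0

    def next_location(self, is_idr: bool) -> ObjectLocation:
        if is_idr:
            # Assumption: group IDs increase by 1 per IDR.
            self._group_id += 1
            self._object_id = 0
        elif self._group_id < 0:
            raise ValueError("the first video frame must be an IDR")
        else:
            self._object_id += 1
        return ObjectLocation(self._group_id, self._object_id)


class AudioGrouper:
    """Starts a new group with every audio object."""

    def __init__(self) -> None:
        self._group_id = -1

    def next_location(self) -> ObjectLocation:
        self._group_id += 1
        return ObjectLocation(self._group_id, 0)


video = VideoGrouper()
for is_idr in (True, False, False, True):
    print(video.next_location(is_idr))
```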
TODO: Datagram forwarding preference could be used, but it has problems if an audio frame does not fit in a single UDP payload.¶
To avoid fractional numbers and the rounding errors that come with them, timestamps are expressed with two integers:¶
- timestamp numerator (e.g., PTS, DTS, duration)
- timebase
To convert a timestamp into seconds, divide the numerator by the timebase: timestamp(s) = timestamp numerator / timebase¶
Example:¶
PTS = 11, timebase = 30¶
PTS(s) = 11/30 ≈ 0.3667 s¶
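The conversion can be kept exact by staying in rational arithmetic until a floating-point value is really needed. The sketch below uses Python's standard `fractions` module; the helper names `to_seconds` and `rescale` are hypothetical and only illustrate the numerator/timebase idea.

```python
from fractions import Fraction


def to_seconds(numerator: int, timebase: int) -> Fraction:
    """Convert a (numerator, timebase) timestamp into exact seconds."""
    if timebase <= 0:
        raise ValueError("timebase must be a positive integer")
    return Fraction(numerator, timebase)


def rescale(numerator: int, from_timebase: int, to_timebase: int) -> Fraction:
    """Express the same instant as a numerator in another timebase."""
    if from_timebase <= 0 or to_timebase <= 0:
        raise ValueError("timebases must be positive integers")
    return Fraction(numerator * to_timebase, from_timebase)


pts_seconds = to_seconds(11, 30)      # exactly 11/30 of a second
print(pts_seconds, float(pts_seconds))
print(rescale(11, 30, 90000))         # the same instant in a 90 kHz timebase
```

Keeping the value as a fraction means two timestamps with different timebases can be compared without rounding.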
MoQ-MI uses MOQT extension headers to provide metadata that identifies and augments the media information found in the object payload.¶
The media type header extension defines the media type inside the object payload (see section IANA in MOQ TODO). It MUST be present in all objects.¶
Value | Media type |
---|---|
0x0 | Video H264 in AVCC |
0x1 | Audio Opus bitstream |
0x2 | UTF-8 text |
0x3 | Audio AAC-LC in MPEG4 |
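A receiver might model the table above as an enumeration and reject values it does not know. This is a sketch only: the enum member names and the `parse_media_type` helper are hypothetical, and only the numeric values come from the table.

```python
from enum import IntEnum


class MediaType(IntEnum):
    """Values of the media type header extension, taken from the table above."""

    VIDEO_H264_AVCC = 0x0
    AUDIO_OPUS = 0x1
    TEXT_UTF8 = 0x2
    AUDIO_AAC_LC_MPEG4 = 0x3


def parse_media_type(value: int) -> MediaType:
    """Map a decoded header value to a MediaType, rejecting unknown values."""
    try:
        return MediaType(value)
    except ValueError:
        raise ValueError(f"unknown moq-mi media type: {value:#x}") from None


print(parse_media_type(0x3).name)
```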
This extension header provides the video metadata needed to consume the video carried in the object payload. The following structure specifies the data inside this extension header.¶
{
  Seq ID (i)
  PTS Timestamp (i)
  DTS Timestamp (i)
  Timebase (i)
  Duration (i)
  Wallclock (i)
}
It MUST be present in all objects where "media type header extension" is equal to "Video H264 in AVCC" (0x0).¶
PTS Timestamp: Indicates PTS in timebase.¶
TODO: Varints cannot easily represent negative values, so PTS could be challenging to encode at the start of the stream (priming).¶
DTS Timestamp: Not needed if B-frames are NOT used; in that case it should have the same value as PTS.¶
TODO: Varints cannot easily represent negative values, so DTS could be challenging to encode at the start of the stream (priming).¶
Wallclock: Epoch time in ms when capture of this frame started. It will be 0 if not set.¶
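Serializing the video metadata fields could look like the sketch below. It assumes that each `(i)` field is a QUIC variable-length integer as defined in RFC 9000, Section 16 (the integer encoding MOQT uses); the function names `encode_varint` and `encode_video_metadata` are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Encode an integer as a QUIC variable-length integer (RFC 9000, Section 16)."""
    if value < 0:
        raise ValueError("varints cannot carry negative values")
    if value < 1 << 6:
        return value.to_bytes(1, "big")                              # prefix 00
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")                   # prefix 01
    if value < 1 << 30:
        return (value | 0x8000_0000).to_bytes(4, "big")              # prefix 10
    if value < 1 << 62:
        return (value | 0xC000_0000_0000_0000).to_bytes(8, "big")    # prefix 11
    raise ValueError("value does not fit in a 62-bit varint")


def encode_video_metadata(seq_id: int, pts: int, dts: int, timebase: int,
                          duration: int, wallclock_ms: int) -> bytes:
    """Concatenate the six fields in the order given by the structure above."""
    fields = (seq_id, pts, dts, timebase, duration, wallclock_ms)
    return b"".join(encode_varint(field) for field in fields)


metadata = encode_video_metadata(
    seq_id=0, pts=0, dts=0, timebase=30, duration=1,
    wallclock_ms=0x0195456C8BFF,
)
print(len(metadata), metadata.hex(" "))
```

With these inputs the value is 13 bytes long, which matches the header value length 0x0D shown in the first video object example in this document.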
This extension header provides the extradata needed to start decoding the video stream.¶
It MUST be present in every object 0 (start of group) where "media type header extension" is equal to "Video H264 in AVCC" (0x0) AND the encoding parameters have been updated (or at the very start of the stream).¶
{ Extradata (..) }
This extension header provides the audio metadata needed to consume the audio carried in the object payload. The following structure specifies the data inside this extension header.¶
It MUST be present in all objects where "media type header extension" is equal to "Audio Opus bitstream" (0x1).¶
{
  Seq ID (i)
  PTS Timestamp (i)
  Timebase (i)
  Sample Freq (i)
  Num Channels (i)
  Duration (i)
  Wall Clock (i)
}
PTS Timestamp: Indicates PTS in timebase.¶
TODO: Varints cannot easily represent negative values, so PTS could be challenging to encode at the start of the stream (priming).¶
Sample Freq: Sample frequency used in the original signal (before encoding).¶
Num Channels: Number of channels in the original signal (before encoding).¶
{ Seq ID (i) }
{
  Seq ID (i)
  PTS Timestamp (i)
  Timebase (i)
  Sample Freq (i)
  Num Channels (i)
  Duration (i)
  Wall Clock (i)
}
PTS Timestamp: Indicates PTS in timebase.¶
TODO: Varints cannot easily represent negative values, so PTS could be challenging to encode at the start of the stream (priming).¶
Sample Freq: Sample frequency used in the original signal (before encoding).¶
Num Channels: Number of channels in the original signal (before encoding).¶
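On the receiving side, the seven audio metadata fields can be read back one varint at a time. The sketch below again assumes QUIC variable-length integers (RFC 9000, Section 16); `decode_varint`, `AudioMetadata`, and `decode_audio_metadata` are hypothetical names.

```python
from dataclasses import dataclass


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one QUIC varint at buf[pos]; return (value, next position)."""
    if pos >= len(buf):
        raise ValueError("no bytes left to decode")
    first = buf[pos]
    length = 1 << (first >> 6)  # top two bits select 1, 2, 4, or 8 bytes
    if pos + length > len(buf):
        raise ValueError("truncated varint")
    value = first & 0x3F
    for byte in buf[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, pos + length


@dataclass
class AudioMetadata:
    seq_id: int
    pts: int
    timebase: int
    sample_freq: int
    num_channels: int
    duration: int
    wallclock_ms: int


def decode_audio_metadata(buf: bytes) -> AudioMetadata:
    """Read the seven fields in the order given by the structure above."""
    fields = []
    pos = 0
    for _ in range(7):
        value, pos = decode_varint(buf, pos)
        fields.append(value)
    if pos != len(buf):
        raise ValueError("unexpected trailing bytes in audio metadata")
    return AudioMetadata(*fields)


# Header value bytes taken from the AAC-LC example object in this document.
raw = bytes([0x00, 0x00, 0x80, 0x00, 0xBB, 0x80, 0x80, 0x00, 0xBB, 0x80,
             0x02, 0x44, 0x00, 0xC0, 0x00, 0x01, 0x95, 0x45, 0x6C, 0x3B, 0xE0])
print(decode_audio_metadata(raw))
```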
{
  0x00 (Object ID)(i),
  0x03 (Extension Count)(i),
  0x0A (Header type: Media type header type)(i)
  0x00 (header value: Media type)(i)
  0x0B (Header type: H264 in AVCC metadata)(i)
  0x0D (Header value length)(i)
  0x00 (Header value: Seq ID)(i)
  0x00 (Header value: PTS Timestamp)(i)
  0x00 (Header value: DTS Timestamp)(i)
  0x1E (Header value: Timebase)(i)
  0x01 (Header value: Duration)(i)
  0xC0, 0x00, 0x01, 0x95, 0x45, 0x6C, 0x8B, 0xFF (Header value: Wallclock)(i)
  0x0D (Header type: H264 in AVCC extradata)(i)
  Header value length (i)
  Header value: H264 in AVCC extradata (..)
  Object Payload Length (i),
  Object Payload bytes (..),
}¶
{
  0x01 (Object ID)(i),
  0x02 (Extension Count)(i),
  0x0A (Header type: Media type header type)(i)
  0x00 (header value: Media type)(i)
  0x0B (Header type: H264 in AVCC metadata)(i)
  0x0D (Header value length)(i)
  0x01 (Header value: Seq ID)(i)
  0x00 (Header value: PTS Timestamp)(i)
  0x00 (Header value: DTS Timestamp)(i)
  0x1E (Header value: Timebase)(i)
  0x01 (Header value: Duration)(i)
  0xC0, 0x00, 0x01, 0x95, 0x45, 0x6C, 0x3B, 0xE0 (Header value: Wallclock)(i)
  Object Payload Length (i),
  Object Payload bytes (..),
}¶
{
  0x00 (Track Alias)(i),
  0x00 (Group ID)(i),
  0x00 (Object ID)(i),
  0x00 (Publisher Priority)(8),
  0x02 (Extension Count)(i),
  0x0A (Header type: Media type header type)(i)
  0x03 (header value: Media type)(i)
  0x13 (Header type: Audio AAC-LC in MPEG4)(i)
  0x15 (Header value length)(i)
  0x00 (Header value: Seq ID)(i)
  0x00 (Header value: PTS Timestamp)(i)
  0x80, 0x00, 0xBB, 0x80 (Header value: Timebase)(i)
  0x80, 0x00, 0xBB, 0x80 (Header value: Sample freq)(i)
  0x02 (Header value: Num channels)(i)
  0x44, 0x00 (Header value: Duration)(i)
  0xC0, 0x00, 0x01, 0x95, 0x45, 0x6C, 0x3B, 0xE0 (Header value: Wallclock)(i)
  Object Payload Length (i),
  Object Payload bytes (..),
}¶
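The extension header block in these examples might be walked as sketched below. The sketch rests on an assumption inferred from the examples rather than stated in this document: an even header type (such as 0x0A) is followed by a single varint value, while an odd header type (such as 0x0B, 0x0D, or 0x13) is followed by a varint length and that many bytes. The function names are hypothetical.

```python
from typing import Dict, Tuple, Union


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one QUIC varint (RFC 9000, Section 16); return (value, next position)."""
    if pos >= len(buf):
        raise ValueError("no bytes left to decode")
    first = buf[pos]
    length = 1 << (first >> 6)
    if pos + length > len(buf):
        raise ValueError("truncated varint")
    value = first & 0x3F
    for byte in buf[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, pos + length


def parse_extensions(buf: bytes, pos: int = 0) -> Tuple[Dict[int, Union[int, bytes]], int]:
    """Parse 'Extension Count' followed by that many extension headers."""
    count, pos = decode_varint(buf, pos)
    headers: Dict[int, Union[int, bytes]] = {}
    for _ in range(count):
        header_type, pos = decode_varint(buf, pos)
        if header_type % 2 == 0:
            # Assumption: even types carry a single varint value.
            value, pos = decode_varint(buf, pos)
            headers[header_type] = value
        else:
            # Assumption: odd types carry a length-prefixed byte string.
            length, pos = decode_varint(buf, pos)
            if pos + length > len(buf):
                raise ValueError("truncated extension header value")
            headers[header_type] = buf[pos:pos + length]
            pos += length
    return headers, pos


# Extension block of the second video object example above,
# starting at the Extension Count field.
example = bytes([0x02, 0x0A, 0x00, 0x0B, 0x0D,
                 0x01, 0x00, 0x00, 0x1E, 0x01,
                 0xC0, 0x00, 0x01, 0x95, 0x45, 0x6C, 0x3B, 0xE0])
headers, end = parse_extensions(example)
print(headers[0x0A], headers[0x0B].hex(" "), end)
```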
TODO: This section needs to be updated with links to LOC.¶
Payload MUST be H264 in AVC1 bitstream format, as described in Section 5.3 of [ISO14496-15:2019], using a 4-byte length field.¶
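A consumer of this payload could split it into NAL units as sketched below, assuming each NAL unit is preceded by a 4-byte big-endian length as required above. The helper names are hypothetical; the NAL unit type is read from the low 5 bits of the first NAL byte, as in H.264.

```python
def split_avcc_nal_units(payload: bytes, length_size: int = 4) -> list[bytes]:
    """Split a length-prefixed (AVCC) payload into individual NAL units."""
    nal_units = []
    pos = 0
    while pos < len(payload):
        if pos + length_size > len(payload):
            raise ValueError("truncated NAL unit length field")
        nal_len = int.from_bytes(payload[pos:pos + length_size], "big")
        pos += length_size
        if pos + nal_len > len(payload):
            raise ValueError("truncated NAL unit")
        nal_units.append(payload[pos:pos + nal_len])
        pos += nal_len
    return nal_units


def nal_unit_type(nal: bytes) -> int:
    """Return the H.264 nal_unit_type (low 5 bits of the first byte)."""
    if not nal:
        raise ValueError("empty NAL unit")
    return nal[0] & 0x1F


# Synthetic payload with two tiny NAL units (illustrative bytes, not real video).
payload = bytes([0, 0, 0, 2, 0x67, 0x42]) + bytes([0, 0, 0, 3, 0x65, 0x88, 0x84])
for nal in split_avcc_nal_units(payload):
    print(nal_unit_type(nal), len(nal))
```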
Payload MUST be Opus packets, as described in Section 3 of [RFC6716].¶
Payload MUST be text bytes in UTF-8, as described in [RFC3629].¶
Payload MUST be an AAC frame (syntax element raw_data_block()), as described in Section 4.4.2.1 of [ISO14496-3:2009].¶
[ISO14496-15:2019] "Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format", ISO ISO14496-15:2019, International Organization for Standardization, October, 2022.¶
[ISO14496-3:2009] "Information technology — Coding of audio-visual objects", ISO ISO14496-3:2009, International Organization for Standardization, September, 2009.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
TODO Security¶
This document has no IANA actions.¶
TODO acknowledge.¶