MPEG-4 Part 3
MPEG-4 Part 3 or MPEG-4 Audio (formally ISO/IEC 14496-3) is the third part of the ISO/IEC MPEG-4 international standard developed by the Moving Picture Experts Group. It specifies audio coding methods. The first version of ISO/IEC 14496-3 was published in 1999.
MPEG-4 Part 3 consists of a variety of audio coding technologies - from lossy speech coding (HVXC, CELP), general audio coding (AAC, TwinVQ, BSAC), lossless audio compression (MPEG-4 SLS, Audio Lossless Coding, MPEG-4 DST), a Text-To-Speech Interface (TTSI), Structured Audio (using SAOL, SASL, MIDI) and many additional audio synthesis and coding techniques.
MPEG-4 Audio does not target a single application such as real-time telephony or high-quality audio compression. It applies to every application which requires the use of advanced sound compression, synthesis, manipulation, or playback.
MPEG-4 Audio is a new type of audio standard that integrates numerous different types of audio coding: natural sound and synthetic sound, low bitrate delivery and high-quality delivery, speech and music, complex soundtracks and simple ones, traditional content and interactive content.
Versions
Edition | Release date | Latest amendment | Standard | Description |
---|---|---|---|---|
First edition | 1999 | 2001 | ISO/IEC 14496-3:1999 | also known as "MPEG-4 Audio Version 1" |
 | 2000 | | ISO/IEC 14496-3:1999/Amd 1:2000 | also known as "MPEG-4 Audio Version 2", an amendment to the first edition |
Second edition | 2001 | 2005 | ISO/IEC 14496-3:2001 | |
Third edition | 2005 | 2008 | ISO/IEC 14496-3:2005 | |
Fourth edition | 2009 | 2010 and under development | ISO/IEC 14496-3:2009 | |
Subparts
MPEG-4 Part 3 contains the following subparts:
- Subpart 1: Main (list of Audio Object Types, Profiles, Levels, interface to ISO/IEC 14496-1, MPEG-4 Audio transport stream, etc.)
- Subpart 2: Speech coding - HVXC (Harmonic Vector eXcitation Coding)
- Subpart 3: Speech coding - CELP (Code Excited Linear Prediction)
- Subpart 4: General Audio Coding (GA) (Time/Frequency Coding) - AAC, TwinVQ, BSAC
- Subpart 5: Structured Audio (SA)
- Subpart 6: Text to Speech Interface (TTSI)
- Subpart 7: Parametric Audio Coding - HILN (Harmonic and Individual Line plus Noise)
- Subpart 8: Technical description of parametric coding for high quality audio (SSC, Parametric Stereo)
- Subpart 9: MPEG-1/MPEG-2 Audio in MPEG-4
- Subpart 10: Technical description of lossless coding of oversampled audio (MPEG-4 DST - Direct Stream Transfer)
- Subpart 11: Audio Lossless Coding (ALS)
- Subpart 12: Scalable Lossless Coding (SLS)
MPEG-4 Audio Object Types
MPEG-4 Audio includes a system for handling a diverse group of audio formats in a uniform manner. Each format is assigned a unique Audio Object Type to represent it. The Object Type is used to distinguish between different coding methods. It directly determines the MPEG-4 tool subset required to decode a specific object. The MPEG-4 profiles are based on the object types, and each profile supports a different list of object types.
Object Type ID | Audio Object Type | First public release date | Description |
---|---|---|---|
1 | AAC Main | 1999 | contains AAC LC |
2 | AAC LC (Low Complexity) | 1999 | Used in the "AAC Profile". MPEG-4 AAC LC Audio Object Type is based on the MPEG-2 Part 7 Low Complexity profile (LC) combined with Perceptual Noise Substitution (PNS) (defined in MPEG-4 Part 3 Subpart 4). |
3 | AAC SSR (Scalable Sample Rate) | 1999 | MPEG-4 AAC SSR Audio Object Type is based on the MPEG-2 Part 7 Scalable Sampling Rate profile (SSR) combined with Perceptual Noise Substitution (PNS) (defined in MPEG-4 Part 3 Subpart 4). |
4 | AAC LTP (Long Term Prediction) | 1999 | contains AAC LC |
5 | SBR (Spectral Band Replication) | 2003 | used with AAC LC in the "High Efficiency AAC Profile" (HE-AAC v1) |
6 | AAC Scalable | 1999 | |
7 | TwinVQ | 1999 | audio coding at very low bitrates |
8 | CELP (Code Excited Linear Prediction) | 1999 | speech coding |
9 | HVXC (Harmonic Vector eXcitation Coding) | 1999 | speech coding |
10 | (Reserved) | | |
11 | (Reserved) | | |
12 | TTSI (Text-To-Speech Interface) | 1999 | |
13 | Main synthesis | 1999 | contains Wavetable synthesis and Algorithmic Synthesis and Audio Effects |
14 | Wavetable synthesis | 1999 | contains General MIDI |
15 | General MIDI | 1999 | |
16 | Algorithmic Synthesis and Audio Effects | 1999 | |
17 | ER AAC LC | 2000 | Error Resilient |
18 | (Reserved) | | |
19 | ER AAC LTP | 2000 | Error Resilient |
20 | ER AAC Scalable | 2000 | Error Resilient |
21 | ER TwinVQ | 2000 | Error Resilient |
22 | ER BSAC (Bit-Sliced Arithmetic Coding) | 2000 | It is also known as "Fine Granule Audio" or fine grain scalability tool. It is used in combination with the AAC coding tools and replaces the noiseless coding and the bitstream formatting of MPEG-4 Version 1 GA coder. Error Resilient |
23 | ER AAC LD (Low Delay) | 2000 | Error Resilient, used with CELP, ER CELP, HVXC, ER HVXC and TTSI in the "Low Delay Profile" (commonly used for real-time conversation applications) |
24 | ER CELP | 2000 | Error Resilient |
25 | ER HVXC | 2000 | Error Resilient |
26 | ER HILN (Harmonic and Individual Lines plus Noise) | 2000 | Error Resilient |
27 | ER Parametric | 2000 | Error Resilient |
28 | SSC (SinuSoidal Coding) | 2004 | |
29 | PS (Parametric Stereo) | 2004 and 2006 | used with AAC LC and SBR in the "HE-AAC v2 Profile". PS coding tool was defined in 2004 and Object Type defined in 2006. |
30 | MPEG Surround | 2007 | also known as MPEG Spatial Audio Coding (SAC), it is a type of spatial audio coding (MPEG Surround was also defined in ISO/IEC 23003-1 in 2007) |
31 | (Reserved) | | |
32 | MPEG-1/2 Layer-1 | 2005 | |
33 | MPEG-1/2 Layer-2 | 2005 | |
34 | MPEG-1/2 Layer-3 | 2005 | also known as "MP3onMP4" |
35 | DST (Direct Stream Transfer) | 2005 | lossless audio coding, used on Super Audio CD |
36 | ALS (Audio Lossless Coding) | 2006 | lossless audio coding |
37 | SLS (Scalable Lossless Coding) | 2006 | two-layer audio coding with lossless layer and lossy General Audio core/layer (e.g. AAC) |
38 | SLS non-core | 2006 | lossless audio coding without lossy General Audio core/layer (e.g. AAC) |
39 | ER AAC ELD (Enhanced Low Delay) | 2008 | Error Resilient |
40 | SMR (Symbolic Music Representation) Simple | 2008 | note: Symbolic Music Representation is also the MPEG-4 Part 23 standard (ISO/IEC 14496-23:2008) |
41 | SMR Main | 2008 | |
42 | USAC (Unified Speech and Audio Coding) (no SBR) | under development | |
43 | SAOC (Spatial Audio Object Coding) | 2010 | note: Spatial Audio Object Coding is also the MPEG-D Part 2 standard (ISO/IEC 23003-2:2010) |
44 | LD MPEG Surround | 2010 | This object type conveys Low Delay MPEG Surround Coding side information (that was defined in MPEG-D Part 2 - ISO/IEC 23003-2) in the MPEG-4 Audio framework. |
45 | USAC | under development (it will be also defined in MPEG-D Part 3 - ISO/IEC 23003-3) | |
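As an illustration of how the Object Type ID is used in practice, the following minimal sketch reads the ID from the start of an AudioSpecificConfig, the decoder configuration structure of MPEG-4 Audio. The bit layout (a 5-bit field, with the value 31 acting as an escape to a further 6-bit field offset by 32) is background knowledge about the standard rather than something stated in the table above, and the names `AUDIO_OBJECT_TYPES` and `read_audio_object_type` are hypothetical. The name table is deliberately only a subset.

```python
# Subset of the Object Type ID table above (illustrative, not exhaustive).
AUDIO_OBJECT_TYPES = {
    1: "AAC Main",
    2: "AAC LC",
    3: "AAC SSR",
    4: "AAC LTP",
    5: "SBR",
    29: "PS",
    36: "ALS",
    37: "SLS",
    39: "ER AAC ELD",
}


def read_audio_object_type(config: bytes) -> int:
    """Return the Audio Object Type ID stored at the start of `config`.

    Assumption: the ID occupies the first 5 bits; the value 31 is an
    escape meaning "ID = 32 + the next 6 bits".
    """
    if not config:
        raise ValueError("empty configuration")
    aot = config[0] >> 3  # top 5 bits of the first byte
    if aot == 31:
        if len(config) < 2:
            raise ValueError("truncated configuration")
        # 6-bit extension: low 3 bits of byte 0, then top 3 bits of byte 1
        ext = ((config[0] & 0x07) << 3) | (config[1] >> 5)
        aot = 32 + ext
    return aot


def describe(config: bytes) -> str:
    """Map the parsed ID to a name from the (partial) table."""
    aot = read_audio_object_type(config)
    return AUDIO_OBJECT_TYPES.get(aot, f"object type {aot}")


print(describe(bytes([0x12, 0x10])))  # first 5 bits are 00010, i.e. ID 2
print(describe(bytes([0xF8, 0x80])))  # escape value 31, then extension 4
```

The escape is what allows IDs above 31, such as ALS (36) or SLS (37), to be signalled even though the basic field is only 5 bits wide.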
Audio Profiles
The MPEG-4 Audio standard defines several profiles. These profiles are based on the object types, and each profile supports a different list of object types. Each profile may also have several levels, which limit some parameters of the tools present in a profile. These parameters are usually the sampling rate and the number of audio channels decoded at the same time.
Audio Profile | Audio Object Types | First public release date |
---|---|---|
AAC Profile | AAC LC | 2003 |
High Efficiency AAC Profile | AAC LC, SBR | 2003 |
HE-AAC v2 Profile | AAC LC, SBR, PS | 2006 |
Main Audio Profile | AAC Main, AAC LC, AAC SSR, AAC LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI, Main synthesis | 1999 |
Scalable Audio Profile | AAC LC, AAC LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI | 1999 |
Speech Audio Profile | CELP, HVXC, TTSI | 1999 |
Synthetic Audio Profile | TTSI, Main synthesis | 1999 |
High Quality Audio Profile | AAC LC, AAC LTP, AAC Scalable, CELP, ER AAC LC, ER AAC LTP, ER AAC Scalable, ER CELP | 2000 |
Low Delay Audio Profile | CELP, HVXC, TTSI, ER AAC LD, ER CELP, ER HVXC | 2000 |
Natural Audio Profile | AAC Main, AAC LC, AAC SSR, AAC LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI, ER AAC LC, ER AAC LTP, ER AAC Scalable, ER TwinVQ, ER BSAC, ER AAC LD, ER CELP, ER HVXC, ER HILN, ER Parametric | 2000 |
Mobile Audio Internetworking Profile | ER AAC LC, ER AAC Scalable, ER TwinVQ, ER BSAC, ER AAC LD | 2000 |
HD-AAC Profile | AAC LC, SLS | 2009 |
ALS Simple Profile | ALS | 2010 |
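The relationship between profiles and object types can be treated as a simple set-containment check. The sketch below copies a few rows from the table above into a dictionary and lists the profiles whose object-type set covers what a stream needs. The names `PROFILES` and `profiles_supporting` are hypothetical, only some rows are included, and levels (sampling rate and channel limits) are ignored.

```python
# A few rows of the profile table above (illustrative subset).
PROFILES = {
    "AAC Profile": {"AAC LC"},
    "High Efficiency AAC Profile": {"AAC LC", "SBR"},
    "HE-AAC v2 Profile": {"AAC LC", "SBR", "PS"},
    "Speech Audio Profile": {"CELP", "HVXC", "TTSI"},
    "Low Delay Audio Profile": {
        "CELP", "HVXC", "TTSI", "ER AAC LD", "ER CELP", "ER HVXC",
    },
    "HD-AAC Profile": {"AAC LC", "SLS"},
}


def profiles_supporting(object_types):
    """Return the profiles (in table order) that contain every given object type."""
    needed = set(object_types)
    return [name for name, supported in PROFILES.items() if needed <= supported]


print(profiles_supporting({"AAC LC", "SBR"}))
# ['High Efficiency AAC Profile', 'HE-AAC v2 Profile']
```

A real conformance check would also compare the stream against the level limits of the chosen profile, which this sketch leaves out.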
Audio storage and transport
Type | Standard | Description |
---|---|---|
Multiplex | ISO/IEC 14496-1 | MPEG-4 Multiplex scheme (M4Mux) |
Multiplex | ISO/IEC 14496-3 | Low Overhead Audio Transport Multiplex (LATM) |
Storage | ISO/IEC 14496-3 (informative) | Audio Data Interchange Format (ADIF) - only for AAC |
Storage | ISO/IEC 14496-12 | MPEG-4 file format (MP4) / ISO base media file format |
Transmission | ISO/IEC 14496-3 (informative) | Audio Data Transport Stream (ADTS) - only for AAC |
Transmission | ISO/IEC 14496-3 | Low Overhead Audio Stream (LOAS), based on LATM |
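To make the ADTS entry in the table more concrete, here is a minimal sketch that decodes a few fields of the 7-byte fixed part of an ADTS frame header. The field layout (12-bit syncword, 2-bit profile stored as object type minus one, 4-bit sampling frequency index, 3-bit channel configuration, 13-bit frame length) and the sampling-rate index table are background knowledge about the format, not taken from the text above. The function name `parse_adts_header` is hypothetical, and the sketch ignores the optional CRC and the remaining header fields.

```python
# Sampling frequency index table (indices 0..12); other values are reserved
# or signal an explicit rate and are reported as None here.
SAMPLE_RATES = [
    96000, 88200, 64000, 48000, 44100, 32000, 24000,
    22050, 16000, 12000, 11025, 8000, 7350,
]


def parse_adts_header(header: bytes) -> dict:
    """Decode selected fields from the first 7 bytes of an ADTS frame."""
    if len(header) < 7:
        raise ValueError("an ADTS header needs at least 7 bytes")
    # The syncword is twelve 1-bits: all of byte 0 and the top nibble of byte 1.
    if header[0] != 0xFF or (header[1] & 0xF0) != 0xF0:
        raise ValueError("ADTS syncword not found")
    sf_index = (header[2] >> 2) & 0x0F
    return {
        # protection_absent == 0 means a CRC follows the header
        "has_crc": (header[1] & 0x01) == 0,
        # the 2-bit profile field stores the Audio Object Type ID minus 1
        "audio_object_type": (header[2] >> 6) + 1,
        "sample_rate": SAMPLE_RATES[sf_index] if sf_index < len(SAMPLE_RATES) else None,
        "channel_configuration": ((header[2] & 0x01) << 2) | (header[3] >> 6),
        # 13-bit length of the whole frame in bytes, header included
        "frame_length": ((header[3] & 0x03) << 11) | (header[4] << 3) | (header[5] >> 5),
    }


# Hand-built header: AAC LC, 44.1 kHz, 2 channels, 371-byte frame, no CRC.
example = bytes([0xFF, 0xF1, 0x50, 0x80, 0x2E, 0x7F, 0xFC])
print(parse_adts_header(example))
```

Because the profile field is only 2 bits wide, this header can signal object types 1 to 4 only, which is consistent with the table's remark that ADTS is used only for AAC.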
There is no standard for transport of elementary streams over a channel, because the broad range of MPEG-4 applications has delivery requirements that are too wide to easily characterize with a single solution.
The capabilities of a transport layer and the communication between transport, multiplex, and demultiplex functions are described in the Delivery Multimedia Integration Framework (DMIF) in ISO/IEC 14496-6. A wide variety of delivery mechanisms exist below this interface, e.g., MPEG transport stream, Real-time Transport Protocol (RTP), etc.
Transport in Real-time Transport Protocol is defined in RFC 3016 (RTP Payload Format for MPEG-4 Audio/Visual Streams), RFC 3640 (RTP Payload Format for Transport of MPEG-4 Elementary Streams), RFC 4281 (The Codecs Parameter for "Bucket" Media Types) and RFC 4337 (MIME Type Registration for MPEG-4).
LATM and LOAS were defined for natural audio applications, which do not require sophisticated object-based coding or other functions provided by MPEG-4 Systems.
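The codecs parameter from RFC 4281, listed above, is where Audio Object Type IDs most often surface for application developers. For MPEG-4 Audio in an MP4 file the value commonly takes the form `mp4a.40.N`, where `40` is the hexadecimal object type indication for MPEG-4 Audio and `N` is the decimal Audio Object Type ID. The sketch below builds and parses such strings; the helper names are hypothetical, and the string form is stated here from general knowledge of the RFC rather than from the text above.

```python
def mp4a_codecs_param(audio_object_type: int) -> str:
    """Build a codecs value such as 'mp4a.40.2' for an MPEG-4 Audio object type."""
    if audio_object_type < 1:
        raise ValueError("Audio Object Type IDs start at 1")
    return f"mp4a.40.{audio_object_type}"


def parse_mp4a_codecs_param(value: str) -> int:
    """Return the Audio Object Type ID from a value such as 'mp4a.40.29'."""
    parts = value.split(".")
    if len(parts) != 3 or parts[0] != "mp4a" or parts[1] != "40":
        raise ValueError(f"not an MPEG-4 Audio codecs value: {value!r}")
    return int(parts[2])


# AAC LC (2), SBR as used in HE-AAC (5) and PS as used in HE-AAC v2 (29)
for aot in (2, 5, 29):
    print(f'audio/mp4; codecs="{mp4a_codecs_param(aot)}"')
```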
Bifurcation in the AAC technical standard
The Advanced Audio Coding in MPEG-4 Part 3 (MPEG-4 Audio) Subpart 4 was enhanced relative to the previous standard MPEG-2 Part 7 (Advanced Audio Coding), in order to provide better sound quality for a given encoding bitrate.
It is assumed that any Part 3 and Part 7 differences will be ironed out by the ISO standards body in the near future to avoid the possibility of future bitstream incompatibilities. At present there are no known player or codec incompatibilities due to the newness of the standard.
The MPEG-2 Part 7 standard (Advanced Audio Coding) was first published in 1997 and offers three default profiles: Low Complexity profile (LC), Main profile and Scalable Sampling Rate profile (SSR).
The MPEG-4 Part 3 Subpart 4 (General Audio Coding) combined the profiles from MPEG-2 Part 7 with Perceptual Noise Substitution (PNS) and defined them as Audio Object Types (AAC LC, AAC Main, AAC SSR).
HE-AAC
High-Efficiency Advanced Audio Coding is an extension of AAC LC using spectral band replication (SBR), and Parametric Stereo (PS). It is designed to increase coding efficiency at low bitrates by using partial parametric representation of audio.
AAC-SSR
AAC Scalable Sample Rate was introduced by Sony to the MPEG-2 Part 7 and MPEG-4 Part 3 standards. It was first published in ISO/IEC 13818-7, Part 7: Advanced Audio Coding (AAC) in 1997. The audio signal is first split into 4 bands using a 4 band polyphase quadrature filter bank. Then these 4 bands are further split using MDCTs with a size k of 32 or 256 samples. This is similar to normal AAC LC which uses MDCTs with a size k of 128 or 1024 directly on the audio signal.
The advantage of this technique is that short block switching can be done separately for every PQF band. High frequencies can therefore be encoded using a short block to enhance temporal resolution, while low frequencies can still be encoded with high spectral resolution. However, due to aliasing between the 4 PQF bands, coding efficiency around the band edges at (1, 2, 3) * fs/8 is worse than with normal MPEG-4 AAC LC.
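The arithmetic behind these figures can be sketched in a few lines. Splitting a signal sampled at fs into 4 equal bands gives each band a width of fs/8, so the interior band edges fall at 1, 2 and 3 times fs/8, and a per-band MDCT of size k across 4 bands yields as many spectral coefficients in total as a single MDCT of size 4k. The function names below are hypothetical, and the 48 kHz sampling rate is just an example value.

```python
def ssr_band_edges(fs_hz: float, bands: int = 4):
    """Return (low, high) frequency limits in Hz for each PQF band."""
    width = fs_hz / (2 * bands)  # Nyquist range fs/2 divided into equal bands
    return [(i * width, (i + 1) * width) for i in range(bands)]


def equivalent_full_band_size(k: int, bands: int = 4) -> int:
    """Total coefficients when every band uses an MDCT of size k."""
    return k * bands


edges = ssr_band_edges(48_000)
interior = [high for _, high in edges[:-1]]
print(interior)                        # [6000.0, 12000.0, 18000.0]
print(equivalent_full_band_size(256))  # 1024, the AAC LC long block size
print(equivalent_full_band_size(32))   # 128, the AAC LC short block size
```

The interior edges are the frequencies around which the text above reports reduced coding efficiency.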
MPEG-4 AAC-SSR is very similar to ATRAC and ATRAC-3.
Why AAC-SSR was introduced
The idea behind AAC-SSR was not only the advantage listed above, but also the possibility of reducing the data rate by removing 1, 2 or 3 of the upper PQF bands. A very simple bitstream splitter can remove these bands and thus reduce the bitrate and sample rate.
Example:
- 4 subbands: bitrate = 128 kbit/s, sample rate = 48 kHz, f_lowpass = 20 kHz
- 3 subbands: bitrate ~ 120 kbit/s, sample rate = 48 kHz, f_lowpass = 18 kHz
- 2 subbands: bitrate ~ 100 kbit/s, sample rate = 24 kHz, f_lowpass = 12 kHz
- 1 subband: bitrate ~ 65 kbit/s, sample rate = 12 kHz, f_lowpass = 6 kHz
Note: although possible, the resulting quality is much worse than typical for this bitrate. A normal 64 kbit/s AAC LC encoder instead achieves a bandwidth of 14–16 kHz by using intensity stereo and reduced noise-to-mask ratios (NMRs). This degrades audible quality less than transmitting only 6 kHz of bandwidth with perfect quality.
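The example figures above also show why dropping bands is an inefficient way to save bitrate: the upper bands carry comparatively few bits. The short sketch below works this out from the approximate values in the example list; the names `EXAMPLE` and `savings_per_dropped_band` are hypothetical, and the bitrates are the rounded figures quoted above, not measurements.

```python
# (subbands kept, approx. bitrate in kbit/s, sample rate in kHz, lowpass in kHz)
EXAMPLE = [
    (4, 128, 48, 20),
    (3, 120, 48, 18),
    (2, 100, 24, 12),
    (1, 65, 12, 6),
]


def savings_per_dropped_band(rows):
    """Return (band removed, kbit/s saved) pairs, starting from the top band."""
    ordered = sorted(rows, reverse=True)  # most subbands first
    return [(a[0], a[1] - b[1]) for a, b in zip(ordered, ordered[1:])]


for band, saved in savings_per_dropped_band(EXAMPLE):
    print(f"removing band {band} saves about {saved} kbit/s")
```

With these numbers, removing the top band saves about 8 kbit/s and the next one about 20 kbit/s, while reaching roughly half the original bitrate requires cutting the audio bandwidth to 6 kHz.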
BSAC
Bit Sliced Arithmetic Coding is an MPEG-4 standard (ISO/IEC 14496-3 subpart 4) for scalable audio coding. BSAC uses an alternative noiseless coding to AAC, with the rest of the processing being identical to AAC. This support for scalability allows for nearly transparent sound quality at 64 kbit/s and graceful degradation at lower bit rates. BSAC coding is best performed in the range of 40 kbit/s to 64 kbit/s, though it operates in the range of 16 kbit/s to 64 kbit/s. The AAC-BSAC codec is used in Digital Multimedia Broadcasting (DMB) applications.
Licensing
In 2002, the MPEG-4 Audio Licensing Committee selected the Via Licensing Corporation as the Licensing Administrator for the MPEG-4 Audio patent pool.
See also
- TwinVQ - one of the object types defined in MPEG-4 Audio version 1
- MPEG-4 Part 2
- MPEG-4 Part 14 container format (MP4)
- Digital rights management
- Advanced Audio Coding (AAC)
External links
- Apple: MPEG-4: AAC
- "HE-AAC" (VideoLAN WIKI)
- EBU subjective listening tests on low-bitrate audio codecs
- AAC radio stations - Online radio stations in AAC format
- Tuner2 - Directory of radio stations in AAC+ format at various bitrates
- RadioFeeds UK & Ireland - Page containing plenty of terrestrial stations webcasting in AAC+ format.
- http://www.rjamorim.com/test/64test/results.html A page comparing codecs including He-AAC @64 kbit/s by listening tests.
- Official MPEG web site
- RFC 3016 - RTP Payload Format for MPEG-4 Audio/Visual Streams
- RFC 3640 - RTP Payload Format for Transport of MPEG-4 Elementary Streams
- RFC 4281 - The Codecs Parameter for "Bucket" Media Types
- RFC 4337 - MIME Type Registration for MPEG-4