
How to concatenate MP3 audio files (almost) properly, using command-line tools

The problem

Situations commonly arise where we need to concatenate a bunch of MP3 files together, to make one larger file. If you rip an audiobook CD, for example, you could end up with fifty-odd files, each a couple of minutes long. Or you might just want to merge all the tracks of an album to make one album-sized MP3 file.

There are various freely-available tools for doing this kind of concatenation, but none (so far as I can tell) does it very well, especially with variable bit-rate (VBR) MP3 encoding. Some have nice user interfaces, which is fine until you want to do a batch conversion on a whole heap of files, or until you want to do something slightly beyond what the user interface can cope with.

This article describes how to concatenate MP3 files and end up with something that is (almost) correct, using a command-line method that works on Linux and Cygwin. It also discusses the technical issues that make working on MP3 files more difficult than it ought to be.

Tools

mp3val is a very powerful, open-source MP3 file fixer. We'll need this to correct the errors that result from the brute-force concatenation we'll use. For Linux, you'll probably find mp3val in the standard repositories. I was not able to find a Cygwin binary of mp3val, so I compiled one myself.

Principles

The MP3 format is frame-based. That is, an MP3 file contains many individual, self-contained chunks of audio. If that were all they contained, it would be relatively easy to concatenate MP3 files — just paste them together using cat:

$ cat file1.mp3 file2.mp3 > bigfile.mp3
This will almost work for some MP3 files. The problem is that MP3 files typically contain a whole heap of other stuff (tags, lyrics, album art) which has no business appearing in the middle of a file.
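To see how much of that "other stuff" sits at the front of a typical file, here's a minimal sketch, assuming bash plus the standard head, tr and od utilities. It reads the ID3v2 tag header at the start of a file and reports the tag's total size; the function name id3v2_tag_size is hypothetical, and the sketch deliberately ignores the optional ID3v2.4 footer.

```shell
#!/bin/bash
# Report the size in bytes of the ID3v2 tag at the very start of an MP3 file,
# or 0 if the file doesn't begin with one. Sketch only: it ignores the
# optional 10-byte ID3v2.4 footer and any tags further into the file.
id3v2_tag_size() {
  local f=$1
  # An ID3v2 tag starts with the three ASCII bytes "ID3"
  # (tr drops NUL bytes, which bash can't hold in a variable anyway)
  if [ "$(head -c 3 "$f" | tr -d '\0')" != "ID3" ]; then
    echo 0
    return
  fi
  # Bytes 6-9 hold the tag body size as a "syncsafe" integer:
  # four bytes, of which only the low 7 bits of each are used
  local b
  read -r -a b < <(od -An -tu1 -j6 -N4 "$f")
  # Add the 10-byte tag header itself
  echo $(( ((b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]) + 10 ))
}
```

For example, id3v2_tag_size track01.mp3 (a hypothetical file name) prints the number of bytes of tag data that cat will copy, verbatim, into the middle of the joined file. With embedded album art this can easily run to tens of kilobytes.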

That this crude concatenation sometimes works at all is a tribute to the robustness of modern media players, which are good at skipping defective audio frames and finding valid ones. Nevertheless, it causes problems: the playback duration is misreported, and tagging tools may misbehave.

To understand why this is the case, we need to know something about how MP3 files are encoded.

Until quite recently, MP3 files were nearly always encoded using constant bit-rate (CBR) methods. That means that each second of audio was encoded with the same amount of file data. It's always possible to work out the approximate playback duration of a CBR MP3 file, in seconds, just by dividing the file size by the amount of data that encodes one second. This calculation fails for variable bit-rate (VBR) encoding, where the bit-rate of the encoder can vary from frame to frame. VBR is almost ubiquitous these days, because it achieves a dramatic reduction in file size for a relatively small loss of audio quality. Working out the playback duration from the bit-rate also fails when we don't know the file size, as in streaming situations, for example.
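The CBR calculation is simple enough to do in the shell. Here's a minimal sketch, assuming the bit rate (in kbit/s) is already known, for example from the file's metadata; the function name cbr_duration is hypothetical. Because it counts every byte, tags and album art included, it will slightly overestimate.

```shell
#!/bin/bash
# Estimate the playback time, in whole seconds, of a constant bit-rate MP3.
# Assumption: the whole file is audio at the given bit rate (kbit/s);
# tag bytes are counted too, so the estimate errs on the long side.
cbr_duration() {
  local file=$1 kbps=$2
  local bytes
  bytes=$(wc -c < "$file")
  # bits divided by bits-per-second gives seconds (integer arithmetic truncates)
  echo $(( bytes * 8 / (kbps * 1000) ))
}
```

For instance, a 1,920,000-byte file at 128 kbit/s comes out at 120 seconds. There is no equivalent one-liner for VBR, because there's no single bit rate to divide by.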

To get around these problems, MP3 files use various tags and headers to encode the playback duration. I say 'tags and headers' deliberately, because a tag is different to a header. Tags sit outside the audio frames and can be discarded without affecting the audio, but headers are part of the audio stream itself, so a crude concatenation preserves every one of them. Stripping these headers won't necessarily help either, because many players need them to report duration properly.

Of particular importance in this context are the Xing and VBRI headers, of which the former is by far the more common. If the Xing header says that the file is ten minutes long, then most players will report it as ten minutes, however much data the file actually contains. Players vary in how they deal with data that exceeds the reported playback duration: they generally play it, but the time display may be misleading, and seek controls will probably not work.
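A quick way to see the leftover headers in a crudely concatenated file is to search it for their four-byte signatures: "Xing", "Info" (the name LAME uses for the same header in CBR files) and "VBRI". Here's a rough heuristic sketch, assuming GNU grep; the function name vbr_header_counts is hypothetical. These byte strings can also turn up by chance in audio data or tag text, so the counts are an indication, not proof.

```shell
#!/bin/bash
# Heuristic: count the Xing/Info/VBRI header signatures anywhere in a file.
# A file made by plain cat typically shows one per input file.
# Caveat: the same bytes can occur by chance in audio or tag data.
vbr_header_counts() {
  # -a: treat binary data as text; -o: print each match on its own line
  LC_ALL=C grep -a -o -E 'Xing|Info|VBRI' "$1" | sort | uniq -c
}
```

Running vbr_header_counts bigfile.mp3 before and after repairing the file with mp3val gives a rough before-and-after comparison.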

Happily, the mp3val utility can recalculate the values in the Xing header by examining the actual audio frames. It will also discard all ID3 tags except the first in the file.

A script

So, here is a simple Linux/Cygwin shell script to carry out the operations described above.
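#!/bin/bash
# mp3concat: join MP3 files with cat, then repair the result with mp3val

outfile=$1

# Need an output file and at least one input file
if [ $# -lt 2 ]; then
  echo "Usage: mp3concat {outfile} {files...}" >&2
  exit 1
fi

# Refuse to touch an existing file, because cat >> would append to it
if [ -e "$outfile" ]; then
  echo "mp3concat: won't overwrite $outfile" >&2
  exit 1
fi

# Everything after the output file name is an input file
shift

# Brute-force concatenation: tags and Xing/VBRI headers are copied too
for file in "$@"; do
  echo "Appending $file to $outfile..."
  cat "$file" >> "$outfile" || exit 1
done

# Rebuild the headers and keep only the first ID3 tag;
# -f fixes errors, -nb removes the backup (.bak) file mp3val would leave
echo "Fixing headers..."
mp3val -f -nb "$outfile"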
Save it as, for example, /usr/local/bin/mp3concat, make it executable with chmod +x, and invoke it like this:

$ mp3concat output_file.mp3 file1.mp3 file2.mp3...
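The script joins files in exactly the order they appear on the command line. A shell glob such as track*.mp3 expands in lexical order, so track10.mp3 comes before track2.mp3 unless the numbers are zero-padded. Here's a minimal sketch of putting the names into natural order first, assuming GNU sort (for -V) and bash 4 or later (for mapfile), and file names without embedded newlines; natural_sort and the track*.mp3 pattern are hypothetical.

```shell
#!/bin/bash
# Print the given file names in "natural" order: track2 before track10.
natural_sort() {
  printf '%s\n' "$@" | sort -V
}

# Collect the sorted names into an array, safe for names containing spaces
mapfile -t files < <(natural_sort track*.mp3)
# Check the order before concatenating
printf '%s\n' "${files[@]}"
```

Once the order looks right, mp3concat audiobook.mp3 "${files[@]}" joins them.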

Limitations

This concatenation method won't give you gapless playback of an album, even if you concatenate a whole album into one file. Depending on the player you use, you might find that you get shorter gaps, because the player won't pause to read a new file at each track. But true gapless playback is impossible using standard MP3 files. The reason is that each MP3 frame holds a fixed number of audio samples (1152 for MPEG-1 Layer III), so if a track's length is not an exact multiple of the frame size, the last frame must be padded out with silence; encoders typically add a short delay at the start of each track, too. Those players that purport to offer gapless playback typically make use of non-standard tags which record this information very precisely (for example, the encoder delay and padding values that the LAME encoder writes into its Xing/Info header). Concatenating files together will, by its nature, preserve the padded silences in the MP3 files. If you want gapless playback, rip the CD into a single file in one pass.
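The amount of padding is easy to work out. Here's a rough sketch, assuming MPEG-1 Layer III (1152 samples per frame) at 44.1 kHz, and ignoring encoder delay, which adds further silence at the start; the function name frame_padding is hypothetical.

```shell
#!/bin/bash
# How many samples of silence are needed to fill the last frame of a track.
# Assumptions: MPEG-1 Layer III (1152 samples per frame); encoder delay,
# which real encoders add at the start, is ignored here.
frame_padding() {
  local samples=$1
  local per_frame=1152
  # Round up to a whole number of frames
  local frames=$(( (samples + per_frame - 1) / per_frame ))
  echo $(( frames * per_frame - samples ))
}

samples=$(( 180 * 44100 ))   # a three-minute track at 44.1 kHz
pad=$(frame_padding "$samples")
echo "padding: $pad samples (about $(( pad * 1000 / 44100 )) ms)"
```

Run as-is, it prints "padding: 432 samples (about 9 ms)", with the millisecond figure rounded down. Each concatenated track carries its own padding, so the short silences accumulate at every track boundary.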

Probably a minor point, but it's worth bearing in mind that there's no general way to merge the ID3 tags from multiple files into a single tag. mp3val keeps the first ID3 tag block and discards the rest. So if you concatenate all the tracks of an album, you'll end up with a single album-length file that carries the first track's tag, including its track number, which makes no sense at all. It's unlikely that issues of this kind will have much practical significance.

Copyright © 1994-2013 Kevin Boone. Updated Feb 12 2013