HD clips with ffmpeg for Youtube and Vimeo

In this article I am going to show you how to convert a sequence of images (PNGs in this case) into a clip that can be uploaded to and watched on Youtube or Vimeo in HD. ffmpeg – the tool we are going to use – is driven from the command line. Its large number of parameters and settings, which often depend on each other or are mutually exclusive, is quite intimidating at first sight (and at second as well). It took me quite a while to produce a clip that stayed below 100 MB, kept good quality and did not show obscure issues. But in the course of this odyssey I learned the basics of digital video and, at the end of the day, the settings which led to my desired outcome even started to make (some) sense!

First of all I created a thousand PNGs (named image000.png, image001.png, …, image999.png) using ggplot2 in R. If you don’t know ggplot2 yet, check out my introductory article on it and the code I used for the article (about stock quote scatter plots changing in time).

Okay, here we go (the new lines are added just for clarity):
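As a rough sketch, a call consistent with the parameters discussed in this article can be assembled as a Python argument list. The profile value `high`, the size `1280x720` (the usual 720p frame size) and the output name `clip.mp4` are my assumptions for illustration; the helper `build_ffmpeg_command` is hypothetical as well.

```python
import shlex


def build_ffmpeg_command(pattern="image%03d.png", output="clip.mp4",
                         in_rate=10, out_rate=30, size="1280x720",
                         profile="high", crf=20):
    """Return an ffmpeg argument list for turning numbered PNGs into an H.264 clip."""
    return [
        "ffmpeg",
        "-framerate", str(in_rate),  # input option, so it has to precede -i
        "-i", pattern,               # numbered input images
        "-s:v", size,                # assumed 720p frame size
        "-c:v", "libx264",           # H.264 encoder
        "-profile:v", profile,       # assumed value, not given in the text
        "-crf", str(crf),            # quality level
        "-pix_fmt", "yuv420p",       # pixel format
        "-r", str(out_rate),         # output frame rate
        output,                      # assumed file name
    ]


command = build_ffmpeg_command()
# Print the call as a single shell line for inspection.
print(shlex.join(command))
```

To actually encode, the list could be handed to `subprocess.run(command, check=True)`, provided ffmpeg is installed and the PNGs are in the working directory.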

An input frame rate (-framerate) of 10 and an output frame rate (-r) of 30 lead to a clip with 30fps but only 10 new images per second, so every image is shown for three consecutive frames. 30fps is the preferred video frame rate for Youtube and Vimeo. If the input frame rate is not specified, ffmpeg assumes 25 images per second for an image sequence, and that would be too fast here.
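The arithmetic behind these two rates can be checked quickly. The following is just a small sketch with a hypothetical helper `clip_timing`, using the thousand images from above; the value 25 is ffmpeg's default input rate for image sequences.

```python
def clip_timing(n_images, in_rate, out_rate):
    """Return (duration in seconds, output frame count, frames shown per image)."""
    duration = n_images / in_rate              # how long the images last
    output_frames = round(duration * out_rate)  # frames written to the clip
    frames_per_image = out_rate / in_rate       # how often each image repeats
    return duration, output_frames, frames_per_image


# 1000 PNGs read at 10 images per second, written at 30 fps.
duration, output_frames, frames_per_image = clip_timing(1000, 10, 30)
print(duration, output_frames, frames_per_image)

# The same images read at the default of 25 images per second.
default_duration, _, _ = clip_timing(1000, 25, 30)
print(default_duration)
```

So the clip runs for 100 seconds with 3000 frames, each image staying on screen for three frames, whereas the default input rate would squeeze the same images into 40 seconds.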

The images to be used (-i) are described by a pattern made of a fixed string part and a number part ‘%03d’, which means three digits, padded on the left with 0. The numbering also determines the order in which the images appear in the clip.
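The ‘%03d’ placeholder follows the printf convention, which Python's `%` operator shares, so the file names ffmpeg will look for can be previewed like this (a small illustration, not part of the ffmpeg call itself):

```python
pattern = "image%03d.png"

# Expand the pattern for a few frame numbers.
names = [pattern % i for i in (0, 1, 42, 999)]
print(names)

# The width 3 is a minimum: larger numbers are not truncated.
print(pattern % 1000)
```

This is also why the PNGs have to be written with leading zeros in the first place: a file called image1.png would not match the pattern.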

The size of the video (-s:v) is also important, so that Youtube and Vimeo will create and offer an HD (720p in this case) version of the clip.

-c:v specifies that we intend to use the libx264 encoder, which encodes the clip into the H.264 format. H.264 offers pretty much the best trade-off between compression and visual quality. If you don’t know the difference between a codec (H.264) and its container (MP4), how they relate and what they do for a living, then check out this answer on stackoverflow.com. The answer is given by slhck who also helped me with ffmpeg – thanks!

The -profile:v parameter specifies which set of H.264 features the encoder may use. More advanced features can improve compression but decrease the number of players able to decode and display the clip.

-pix_fmt specifies the pixel format, meaning how the colors are represented. I have no idea what ‘yuv420p’ actually means, but it works.

crf is short for “constant rate factor” and controls the level of visual quality. 23 is the default setting and considered “pretty much good enough for almost all cases”, but only the premium quality for my guests, so I took 20, which is better. For 8-bit x264 the scale reaches from 0 to 51, with 0 being lossless and 51 the lossiest. Why didn’t I choose 0 then? Because the file size increases with the visual quality level.
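One way to find a good crf value is to encode the same images several times and compare file sizes. The sketch below only builds the calls; the helper `crf_commands`, the output names `clip_crf<N>.mp4` and the candidate values are assumptions for illustration, and the range check assumes 8-bit libx264, where crf runs from 0 to 51.

```python
def crf_commands(crf_values, pattern="image%03d.png"):
    """Map each crf value to an ffmpeg argument list with its own output file."""
    commands = {}
    for crf in crf_values:
        # Assumed valid range for 8-bit libx264.
        if not 0 <= crf <= 51:
            raise ValueError(f"crf {crf} is outside 0..51")
        commands[crf] = [
            "ffmpeg", "-framerate", "10", "-i", pattern,
            "-c:v", "libx264", "-crf", str(crf),
            "-pix_fmt", "yuv420p", "-r", "30",
            f"clip_crf{crf}.mp4",  # hypothetical output name
        ]
    return commands


for crf, cmd in crf_commands([18, 20, 23]).items():
    print(crf, " ".join(cmd))
```

After running each call, comparing the sizes of the resulting files next to a visual check shows how much a lower crf really costs for this kind of material.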

UPDATED 28 March 2013:

I figured out that in this use case the input frame rate has to be specified using the ‘framerate’ key instead of ‘r’. Using ‘r’ will lead to a clip which is “more or less” what you want, but parts of the clip will appear shaky and input frames (the PNGs) will just go missing. Pretty weird, but exchanging ‘r’ for ‘framerate’ as the first parameter resolves the issue.

2 thoughts on “HD clips with ffmpeg for Youtube and Vimeo”

  1. Hey there! Thanks for the mention.

    Regarding yuv420p, it’s the format of how each pixel is stored. The color is converted from RGB colorspace (your input) to a YUV colorspace, which is what video codecs usually work with.

    The human visual system is not able to perceive colors as well as textural information (due to the ratio of cones vs. rods in our retina). This is why you can reduce the amount of data that needs to be encoded if you simply reduce the resolution of the color planes (U & V), but leave the luma information (Y) intact. This process is called chroma subsampling. YUV 4:2:0 means that U and V are sampled at half the horizontal resolution and only on every other line.

    The p in yuv420p means that the actual data is stored in planes rather than packing each pixel’s luma and chroma information into a set of bytes next to each other. Read more about this on fourcc.
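To put numbers on the comment above, here is a small sketch of how much raw data one frame takes in 24-bit RGB compared with yuv420p. It assumes 8 bits per sample and even frame dimensions; the function names are made up for this example.

```python
def rgb24_frame_bytes(width, height):
    """Bytes for one frame with 3 bytes (R, G, B) per pixel."""
    return width * height * 3


def yuv420p_plane_bytes(width, height):
    """Bytes of the Y, U and V planes; chroma is halved in both directions."""
    luma = width * height
    chroma = (width // 2) * (height // 2)  # assumes even width and height
    return luma, chroma, chroma


y, u, v = yuv420p_plane_bytes(1280, 720)
rgb = rgb24_frame_bytes(1280, 720)
print(y, u, v, y + u + v, rgb)
```

For a 1280x720 frame the three planes add up to 1,382,400 bytes against 2,764,800 bytes for RGB, so the encoder starts out with half the raw data before any actual compression happens.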
