Calculating VMAF and PSNR with FFmpeg

FFmpeg is a great tool for video processing: it lets us manipulate videos in almost any way we like. Depending on the concrete use case, however, assembling the right command can be challenging. In this blog post, I will explain how to calculate the Video Multi-Method Assessment Fusion (VMAF) and Peak Signal-to-Noise Ratio (PSNR) scores in a single FFmpeg command, and get the results in JSON format or as a text file.

The scenario

Let us assume that we have created multiple encodes from our high-quality source video and would like to check whether their quality is sufficient for delivering the video to end users. This is, for example, a mandatory step for per-title encoding, which I explained in one of my previous blog posts. The most straightforward way to judge the quality of a video is to watch it and assign a subjective quality score. For instance, viewers can rate each encode on a scale from 1 to 5, with 1 being bad and 5 being very good; averaging these ratings yields the Mean Opinion Score (MOS). However, this requires several people to watch every single encode, which obviously only scales to a certain degree. Even Netflix and YouTube cannot afford to put thousands of people in front of a TV to rate their encodes. Hence, we need a more sophisticated approach to determining video quality. Ideally, this process is completely automated and does not require any human interaction.
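To make the MOS concrete: it is simply the arithmetic mean of the individual ratings an encode receives. The minimal sketch below assumes the 1-to-5 scale described above; the function name and the ratings are hypothetical.

```python
from statistics import fmean


def mean_opinion_score(ratings: list[int]) -> float:
    """Average individual 1-5 viewer ratings into a Mean Opinion Score (MOS)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    return fmean(ratings)


# Hypothetical ratings from five viewers who watched the same encode.
print(mean_opinion_score([4, 5, 3, 4, 4]))  # 4.0
```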

Video Quality Metrics

Video quality metrics such as VMAF and PSNR promise to do exactly that: they compare the encoded video against the source video and derive a quality score. Although I will not go into the details of either metric, I want to highlight some key facts and common pitfalls:

  • To compute a PSNR or a VMAF score, the source video and the encoded video need to have the same resolution.
  • Both metrics work on a per-frame basis: the two videos are compared frame by frame, and the per-frame scores are then pooled into an overall score. Usually, this is the arithmetic mean, but alternatives such as the harmonic mean are also valid.
  • Some encoders insert an additional black frame at the beginning of the video. This shifts all following frames and leaves the source and the encoded video with a different number of frames, which skews the results.
  • While PSNR is based on the mean squared error between frames, VMAF is a machine-learning model that was trained on actual MOS scores. Hence, VMAF scores usually correlate more closely with the perceived quality of a video.
  • The default VMAF model is trained for viewing on a 1080p TV. There are two more models: one for mobile devices (trained for a Samsung S5 with a resolution of 1920×1080) and one for 4K devices.
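The frame-based nature of these metrics can be sketched in a few lines. This is a simplified illustration, not how FFmpeg or libvmaf implement it: each frame is assumed to be a flat list of 8-bit luma samples, PSNR is computed per frame as 10·log10(255²/MSE), and the per-frame scores are then pooled with the arithmetic and the harmonic mean. The frame-count check mirrors the black-frame pitfall above. All names and sample values are hypothetical.

```python
import math
from statistics import fmean, harmonic_mean

Frame = list[int]  # one frame as a flat list of 8-bit luma samples (simplification)


def frame_psnr(reference: Frame, distorted: Frame, max_value: int = 255) -> float:
    """PSNR in dB for a single frame, derived from the mean squared error."""
    if len(reference) != len(distorted):
        raise ValueError("frames must have the same resolution")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10 * math.log10(max_value**2 / mse)


def per_frame_psnr(reference: list[Frame], distorted: list[Frame]) -> list[float]:
    """Compare two videos frame by frame; an extra or missing frame is an error."""
    if len(reference) != len(distorted):
        raise ValueError(
            f"frame count mismatch: {len(reference)} reference vs {len(distorted)} encoded"
        )
    return [frame_psnr(r, d) for r, d in zip(reference, distorted)]


def pool(scores: list[float]) -> dict[str, float]:
    """Pool per-frame scores into one overall score in two common ways."""
    return {"arithmetic": fmean(scores), "harmonic": harmonic_mean(scores)}


# Two tiny hypothetical "frames" of four samples each.
reference = [[100, 100, 100, 100], [50, 60, 70, 80]]
encoded = [[101, 99, 100, 100], [52, 58, 70, 80]]
scores = per_frame_psnr(reference, encoded)
print([round(s, 2) for s in scores])  # [51.14, 45.12]
print(pool(scores))
```

The harmonic mean is pulled down more strongly by low values than the arithmetic mean, so it reacts more to a few badly degraded frames.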

Metric Calculation with FFmpeg

Now that we know why we need metrics like VMAF and PSNR, we can take a closer look at how to calculate them. Without further delay, here is the FFmpeg command:

ffmpeg -i encode.mp4 -i reference.mp4 -filter_complex "[0:v]scale=1920x1080:flags=bicubic[main]; [1:v]scale=1920x1080:flags=bicubic,format=pix_fmts=yuv420p,fps=fps=25/1[ref]; [main][ref]libvmaf=psnr=1:phone_model=1:log_fmt=json:log_path=out.json" -f null -

Okay, so that is a lot to take in. Let us go through it part by part. We start off easily by specifying the paths to our encoded and reference videos. Then, we assemble our filter graph. First, we tell FFmpeg to take the video track of our first input (“[0:v]”) and scale it to 1920×1080 using the bicubic scaler. The resulting stream is labeled “main”. You can use a different scaling algorithm (e.g. lanczos or bilinear) at this point, but it should be the same for the encode and the reference. Next, we process the reference: we scale it to 1080p as well, and in addition set its pixel format to yuv420p and its frame rate to 25 fps. The result is labeled “ref”. Depending on your needs, you can omit the pixel format and fps filters, or add them to the filter chain of the encoded video as well. Just make sure that your encode and your reference end up with the same format and the same number of frames.
Finally, we feed our two labeled streams, “main” and “ref”, into the libvmaf filter. The order matters: libvmaf treats the first input as the distorted video and the second one as the reference. We tell it to compute the PSNR score as well (“psnr=1”) and to use the phone model for the VMAF calculation; if you want to use the default model, simply omit the “phone_model=1” option. “log_fmt=json” together with “log_path=out.json” makes libvmaf write its detailed per-frame log as JSON to out.json, while “-f null -” discards the video output, since we only need the scores. Note that “psnr” and “phone_model” are options of the libvmaf filter from the libvmaf 1.x era; newer FFmpeg releases built against libvmaf 2.x deprecate them in favor of the “feature” and “model” options, so check “ffmpeg -h filter=libvmaf” to see what your build supports. That is it. Besides the JSON log, libvmaf prints a summary, which with libvmaf 1.x looks like this:

Exec FPS: 9.794471
VMAF score = 80.45
PSNR score = 39.6
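Because the scenario involves several encodes of the same source, it is convenient to script this step. The following sketch is one possible approach, not part of FFmpeg: it rebuilds the filter graph above in Python, runs `ffmpeg` once per encode via `subprocess`, and reads back each JSON log. It assumes that `ffmpeg` is on the PATH and accepts the libvmaf 1.x-style options used in this post (`psnr`, `phone_model`) plus `log_path`. The log layout differs between libvmaf versions, so the keys it reads (`frames`, `metrics`, `vmaf`, and `psnr` or `psnr_y`) are assumptions to check against your own out.json. All function and file names are hypothetical.

```python
import json
import shlex
import subprocess
from pathlib import Path
from statistics import fmean


def vmaf_command(encode: str, reference: str, log_path: str,
                 size: str = "1920x1080", fps: str = "25/1",
                 phone_model: bool = True) -> list[str]:
    """Build the ffmpeg argument list for a combined VMAF + PSNR run.

    log_path is embedded in the filter graph, so it should not contain
    characters that are special there, such as ':', ',' or ';'.
    """
    vmaf_options = ["psnr=1", "log_fmt=json", f"log_path={log_path}"]
    if phone_model:
        vmaf_options.append("phone_model=1")  # drop this for the default model
    graph = (
        f"[0:v]scale={size}:flags=bicubic[main];"
        f"[1:v]scale={size}:flags=bicubic,format=pix_fmts=yuv420p,fps=fps={fps}[ref];"
        f"[main][ref]libvmaf={':'.join(vmaf_options)}"
    )
    # First input: the encode ("main"); second input: the reference ("ref").
    return ["ffmpeg", "-i", encode, "-i", reference,
            "-filter_complex", graph, "-f", "null", "-"]


def summarize(log: dict) -> dict[str, float]:
    """Pool the per-frame VMAF and PSNR values from a libvmaf JSON log."""
    metrics = [frame["metrics"] for frame in log["frames"]]
    if not metrics:
        raise ValueError("log contains no frames")
    # Assumed key names: "psnr" in libvmaf 1.x logs, "psnr_y" (luma) in 2.x logs.
    psnr_key = "psnr_y" if "psnr_y" in metrics[0] else "psnr"
    vmaf = [m["vmaf"] for m in metrics]
    return {
        "frames": len(metrics),
        "vmaf_mean": fmean(vmaf),
        "vmaf_min": min(vmaf),  # the worst frame, which the mean can hide
        "psnr_mean": fmean(m[psnr_key] for m in metrics),
    }


def score_encodes(encodes: list[str], reference: str,
                  log_dir: str = ".") -> dict[str, dict[str, float]]:
    """Run ffmpeg once per encode and summarize each encode's JSON log."""
    results = {}
    for encode in encodes:
        log_path = (Path(log_dir) / f"{Path(encode).stem}_vmaf.json").as_posix()
        cmd = vmaf_command(encode, reference, log_path)
        print(shlex.join(cmd))  # echo the exact command being executed
        subprocess.run(cmd, check=True)
        with open(log_path, encoding="utf-8") as f:
            results[encode] = summarize(json.load(f))
    return results
```

For example, `score_encodes(["encode_1080p.mp4", "encode_720p.mp4"], "reference.mp4")` would return one summary per encode. Reporting the minimum next to the mean helps spot encodes whose average looks fine but that contain a few badly degraded frames.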

Conclusion

This concludes our dive into the world of FFmpeg and video quality metrics. If you want to find out more about encoding, feel free to check out our website.
