Optimizing RetinaFace Performance for Real-Time Detection

A few deliberate settings — backbone choice, input size, batching and GPU use — turn RetinaFace from accurate into accurate and fast.

Out of the box, RetinaFace favours accuracy, which is the right default. But when you move toward video or large batches, speed starts to matter, and a handful of well-chosen settings makes a dramatic difference. The trick is knowing which settings actually change the result.

Pick the right backbone

RetinaFace can run on a heavier backbone that maximises accuracy or a lighter one that maximises speed. For offline work on still images where every face counts, the heavier option is worth it. For real-time video or anything latency-sensitive, the lighter backbone gives up a little accuracy for a large gain in throughput. Match the backbone to the job rather than always reaching for the most accurate one.

Control the input size

Input resolution is usually the single biggest factor in detection speed. Larger inputs find smaller faces but cost more time; smaller inputs are faster but can miss distant faces. Because the network's work grows roughly with the number of pixels, halving each side of the input cuts that work to about a quarter. Find the smallest input size that still catches the faces you care about, and you will often double your speed with no meaningful loss.
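One way to reason about "the smallest input size that still catches the faces you care about" is to work backwards from face size. The sketch below is only an illustration: the function names are made up for this article, and the default of 16 pixels for the smallest detectable face and the rounding to a multiple of 32 are assumptions you should replace with figures measured on your own model.

```python
import math


def smallest_useful_long_side(long_side_px, smallest_face_px,
                              min_detectable_px=16, multiple=32):
    """Smallest long-side length that keeps the smallest face of interest
    at or above min_detectable_px after resizing.

    min_detectable_px and multiple are assumed values, not RetinaFace constants.
    """
    if long_side_px <= 0 or smallest_face_px <= 0 or min_detectable_px <= 0:
        raise ValueError("all sizes must be positive")
    # Shrink only as far as the smallest face allows; never upscale.
    scale = min(1.0, min_detectable_px / smallest_face_px)
    # Round up so the face never ends up smaller than intended.
    target = math.ceil(long_side_px * scale / multiple) * multiple
    return min(target, long_side_px)


def relative_pixel_cost(new_long_side, old_long_side):
    """Approximate share of the original pixel count, aspect ratio preserved."""
    return (new_long_side / old_long_side) ** 2
```

For a 1920-pixel-wide frame where the smallest face you care about is 80 pixels across and you assume the detector needs 20, smallest_useful_long_side(1920, 80, 20) gives 480, and relative_pixel_cost(480, 1920) gives 0.0625, roughly a sixteenth of the original pixels.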

Use the GPU when it counts

For single images the CPU is usually adequate, but for video or heavy batches a GPU transforms performance. Detection is highly parallel work, exactly what a GPU is built for. If you have one available and you are processing a stream or a large volume of images, moving detection onto the GPU is the most effective hardware change you can make.
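How you select the GPU depends on the RetinaFace implementation you use. As a minimal sketch, assuming a PyTorch-based implementation (an assumption, since several frameworks are in use), a helper like the hypothetical pick_device below falls back to the CPU when no GPU or no PyTorch install is found.

```python
def pick_device(prefer_gpu=True):
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu".

    Assumes a PyTorch-based detector; other frameworks select devices differently.
    """
    if not prefer_gpu:
        return "cpu"
    try:
        import torch  # imported lazily so the helper works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The returned string is what you would then hand to your model and input tensors when moving them to a device.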

Batch your work

Processing images one at a time wastes capacity, especially on a GPU. Grouping several images into a batch lets the hardware work on them together and raises overall throughput considerably. Increase the batch size until you approach your memory limit, then step back slightly. The sweet spot depends on your hardware, so test a few sizes against your own data.
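The grouping itself needs very little code. In the sketch below, detect_batch is a placeholder for whatever batched call your RetinaFace implementation offers, not a real library function; it is assumed to take a list of images and return one result per image, in order.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, batch
        yield batch


def detect_in_batches(images: Iterable[T],
                      detect_batch: Callable[[List[T]], List[R]],
                      batch_size: int) -> List[R]:
    """Run a batched detector over all images, keeping results in input order."""
    results: List[R] = []
    for batch in batched(images, batch_size):
        results.extend(detect_batch(batch))
    return results
```

To find the sweet spot, run detect_in_batches with a few batch sizes on the same sample and compare total time, stopping before you hit the memory limit.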

Resize before you detect

If your source images are far larger than they need to be, downscaling them before detection saves time at almost no cost to results, as long as the faces stay comfortably above the smallest size the detector can find. This pairs naturally with input-size tuning and is especially valuable when you are working through thousands of high-resolution photos.
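Downscaling has one practical catch: the detector then reports boxes in the coordinates of the smaller image. The sketch below covers only the arithmetic, with illustrative function names; the resize itself is left to whichever image library you already use, and boxes are assumed to be (x1, y1, x2, y2) tuples.

```python
def downscaled_size(width, height, max_side):
    """Return (new_width, new_height, scale) so the longer side fits max_side.

    Images already within max_side are left unchanged (scale 1.0).
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("sizes must be positive")
    scale = min(1.0, max_side / max(width, height))
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))
    return new_width, new_height, scale


def boxes_to_original(boxes, scale):
    """Map boxes found on the downscaled image back to original pixel coordinates."""
    return [tuple(value / scale for value in box) for box in boxes]
```

For a 4000 by 3000 photo with max_side set to 1000, downscaled_size returns (1000, 750, 0.25), and a box of (100, 50, 200, 150) found on the small image maps back to (400.0, 200.0, 800.0, 600.0) on the original. Landmark coordinates can be rescaled the same way.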

Measure, then tune

Resist the urge to optimise blind. Time a representative sample with your current settings, change one variable, and measure again. Speed work is full of surprises, and the only reliable guide is your own numbers on your own hardware and images. A few minutes of measurement saves hours of guesswork and tells you precisely where the time is going.
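A small timing harness is enough to get started. In this sketch, detect stands in for your own detection call (it is a placeholder, not a library function), and the warm-up and repeat counts are arbitrary starting values.

```python
import statistics
import time


def time_detector(detect, samples, warmup=2, repeats=5):
    """Time detect(sample) over a list of samples and summarise per-image latency."""
    if not samples or repeats < 1:
        raise ValueError("need at least one sample and one repeat")
    # Warm-up runs absorb one-off costs such as model loading and caching.
    for sample in samples[:warmup]:
        detect(sample)
    timings = []
    for _ in range(repeats):
        for sample in samples:
            start = time.perf_counter()
            detect(sample)
            timings.append(time.perf_counter() - start)
    median = statistics.median(timings)
    return {
        "median_s": median,
        "mean_s": statistics.fmean(timings),
        "fps": 1.0 / median if median > 0 else float("inf"),
    }
```

Run it once with your current settings, change one variable, and run it again on the same samples. If detection runs on a GPU, make sure detect waits for the result before returning, because GPU work is often launched asynchronously and the timer would otherwise stop too early.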

Putting it together

For real-time detection, a typical fast configuration pairs the lighter backbone with a modest input size, GPU execution and sensible batching. For maximum-accuracy offline work, keep the heavier backbone and a larger input, and let it take the time it needs. Most projects land somewhere between, and the settings above let you place RetinaFace exactly where you want it on that spectrum.
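The two ends of that spectrum can be written down as presets. Everything in this sketch is illustrative: the field names, the "light" and "heavy" labels and the numbers are assumptions chosen for the example, not RetinaFace parameters, so map them onto whatever options your implementation exposes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DetectorSettings:
    backbone: str    # "light" or "heavy": placeholder labels, not model names
    max_side: int    # longest input side in pixels
    device: str      # "cpu" or "cuda"
    batch_size: int


# Fast end: lighter backbone, modest input, GPU, some batching.
REAL_TIME = DetectorSettings(backbone="light", max_side=640,
                             device="cuda", batch_size=4)

# Accurate end: heavier backbone, larger input, no pressure on speed.
OFFLINE_ACCURATE = DetectorSettings(backbone="heavy", max_side=1600,
                                    device="cpu", batch_size=1)

# Most projects sit in between: start from a preset and adjust one field.
in_between = replace(REAL_TIME, max_side=960)
```

Keeping the settings in one immutable object also makes the measure-then-tune loop easier, since each timing run can be labelled with the exact configuration it used.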

Understanding RetinaFace Landmarks and Confidence Scores