We introduce the first single-shot motion blending model, which achieves seamless transitions by temporally conditioning the generation process.
Our method incorporates a Spatially-Adaptive Denormalization (SPADE)-inspired conditioning scheme using skeleton-aware convolutions, enabling the blending of multiple input human skeletal motions into coherent animations in a single generative pass.
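As a rough illustration of SPADE-style temporal conditioning, the sketch below predicts per-frame scale and shift parameters from a temporal condition signal and uses them to modulate normalized motion features. All names and shapes are hypothetical, and plain 1D convolutions stand in for the skeleton-aware convolutions used in the paper; this is not the authors' implementation.

```python
# Minimal sketch of a SPADE-style temporal conditioning block (hypothetical
# names/shapes, not the paper's exact architecture). The temporal condition
# signal -- e.g., a per-frame code indicating which source motion is active --
# is mapped by 1D convolutions to per-frame scale (gamma) and shift (beta)
# that modulate the normalized motion features.
import torch
import torch.nn as nn

class TemporalSPADE(nn.Module):
    def __init__(self, feat_channels: int, cond_channels: int, hidden: int = 128):
        super().__init__()
        # Parameter-free normalization, as in SPADE.
        self.norm = nn.InstanceNorm1d(feat_channels, affine=False)
        # Shared conv over the condition signal, then separate gamma/beta heads.
        self.shared = nn.Sequential(
            nn.Conv1d(cond_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.to_gamma = nn.Conv1d(hidden, feat_channels, kernel_size=3, padding=1)
        self.to_beta = nn.Conv1d(hidden, feat_channels, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # feats: (batch, feat_channels, frames); cond: (batch, cond_channels, frames)
        h = self.shared(cond)
        return self.norm(feats) * (1 + self.to_gamma(h)) + self.to_beta(h)
```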
We evaluate our method on multiple datasets using both standard metrics, such as the Fréchet Inception Distance (FID), and a novel metric based on the L2 error of velocity and acceleration, which we introduce to assess the smoothness of blended motion. The results demonstrate that our method produces realistic motion blending, offering a new solution to the problem of animation blending.
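One plausible reading of the velocity/acceleration metric is sketched below, using first and second finite differences of joint positions; the function name and the exact formulation (e.g., whether errors are taken against a reference motion or only around blend boundaries) are assumptions, and the paper's definition may differ.

```python
# Hypothetical sketch of a velocity/acceleration L2 smoothness metric.
import numpy as np

def smoothness_l2(pred: np.ndarray, ref: np.ndarray) -> dict:
    """pred, ref: joint positions of shape (frames, joints, 3)."""
    # Velocity and acceleration as first and second finite differences over time.
    vel_err = np.linalg.norm(np.diff(pred, 1, axis=0) - np.diff(ref, 1, axis=0), axis=-1)
    acc_err = np.linalg.norm(np.diff(pred, 2, axis=0) - np.diff(ref, 2, axis=0), axis=-1)
    return {"velocity_l2": float(vel_err.mean()),
            "acceleration_l2": float(acc_err.mean())}
```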
@inproceedings{tselepi2025blending,
author = {Tselepi, Eleni and Thermos, Spyridon and Potamianos, Gerasimos},
title = {Controllable Single-shot Animation Blending with Temporal Conditioning},
booktitle = {Proceedings of the International Conference on Computer Vision AI4VA Workshop},
year = {2025},
}