The project is a collaboration between Tencent’s Hunyuan team, the Hong Kong University of Science and Technology and Tsinghua University in Beijing, one of mainland China’s top two universities.
Tencent said it will release the full code for the model in April, but a demo is already available on GitHub. Researchers showcased some of its capabilities there, with one result showing how an image of a bird with the prompt “flap the wings” turned into a short MP4 file of a rainbow-coloured avian twitching one of its wings.
Another image of a girl standing outdoors with the simple one-word prompt “storm” turned into an animation with lightning flashing in the background.
Follow-Your-Click aims to address a shortcoming of other image-to-video models on the market, which tend to animate entire scenes rather than specific objects in a picture, according to an academic paper by researchers from the three organisations. Those models also require users to give elaborate descriptions of how and where they want the image to move.
“Our framework has simpler yet precise user control and better generation performance than previous methods,” the researchers said in the paper published on Wednesday on arXiv, an online scientific paper repository.
In the field of text- and image-to-video generation, Silicon Valley-based Pika Labs, co-founded by Guo Wenjing, a Chinese PhD candidate at Stanford University, is another rising star. The start-up has raised US$55 million across seed and Series A funding rounds from some of the biggest names in tech.
Follow-Your-Click joins Tencent’s open-source text-to-video generation and editing toolbox VideoCrafter2, which the tech giant released in January. VideoCrafter2 is an updated version of VideoCrafter1, released in October 2023, but it is limited to videos just two seconds long.