Kling 2.0, a major upgrade to the state-of-the-art AI video generator released by the Chinese tech firm Kuaishou, hit the market last week to a flood of jaw-dropping reactions from creators, who quickly burned through hundreds of dollars testing its capabilities.
“AI video quality just 10x’d overnight. I’m speechless,” tweeted AI filmmaker PJ Ace, who claimed to have already spent $1,250 in credits exploring the tool’s limits. “I’ve never seen motion this fluid or prompts this accurate.” The post garnered over 757,000 views, highlighting the buzz around this release.
(Post by PJ Ace, @PJaccetturo, April 15, 2025.)
The new version marks a significant leap forward from Kling 1.6, offering enhanced prompt understanding, more fluid character movement, and improved visual aesthetics that users describe as looking “filmed, not generated.” Most notably, Kling 2.0 can generate videos up to 2 minutes long, leaving competitors like OpenAI’s Sora in the dust when it comes to extended narrative possibilities.
“Overall, Kling does maintain the top spot on the leaderboard,” YouTuber Tim Simmons, who specializes in reviewing generative AI models, said in his review. He considers it the clear winner in image-to-video generation, with the competition closer in direct text-to-video generation.
This new version arrives in an increasingly crowded AI video-generation market. Competitors include Runway, which is known for high-fidelity outputs and recently released its Gen-4 model focused on cinematic results, and Google’s Veo 2, with its robust text-to-video capabilities and aesthetically pleasing results.
So far, the model has yet to be featured on Artificial Analysis’ Video Generator Leaderboard, which ranks the best generative video models. However, its predecessor, Kling 1.6, already leads in image-to-video and ranks second in text-to-video based on blind tests.

Kling 2.0 features a multi-elements editor, allowing users to add, swap, or delete video content using text or image inputs.
The platform also introduces two specialized components that give creators more control over their outputs: Kling 2.0 Master for video generation and Kolors 2.0 for image creation (not to be confused with another open-source Chinese AI image generator released under the same “Kolors” name).

The tool’s focus on cinematic quality makes it particularly attractive to filmmakers, marketers, and content creators. The model is extremely resource-intensive, with generations taking hours on the free plan and up to 16 minutes for nearly 5 seconds of video on online platforms.
Pricing starts at $29 per month for the standard plan, which includes Professional mode, 8-second videos, and an allowance of 30 videos per day. A free plan offers 6 daily generations with 4-second limits and watermarks. The Professional plan, at $89 a month, delivers high resolution, advanced motion controls, and priority processing.
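To put the standard plan in perspective, here is a rough back-of-the-envelope sketch of the lowest possible cost per clip. It assumes a 30-day month and that a subscriber uses the full daily allowance at the maximum clip length every day. Both are our assumptions, not Kling's billing rules, and the function name is made up for illustration.

```python
# Figures quoted in the article for the standard plan.
STANDARD_MONTHLY_USD = 29.0
STANDARD_VIDEOS_PER_DAY = 30
STANDARD_SECONDS_PER_VIDEO = 8

# Assumption (not from Kling): a 30-day billing month with full daily usage.
DAYS_PER_MONTH = 30


def cost_floor(monthly_usd, videos_per_day, seconds_per_video, days=DAYS_PER_MONTH):
    """Return (USD per video, USD per second of footage) at full utilization."""
    videos_per_month = videos_per_day * days
    per_video = monthly_usd / videos_per_month
    per_second = per_video / seconds_per_video
    return per_video, per_second


per_video, per_second = cost_floor(
    STANDARD_MONTHLY_USD, STANDARD_VIDEOS_PER_DAY, STANDARD_SECONDS_PER_VIDEO
)
print(f"Cost floor per video:  ${per_video:.4f}")
print(f"Cost floor per second: ${per_second:.4f}")
```

Under those assumptions the floor works out to roughly 3 cents per 8-second clip. Real-world costs will be higher for anyone who does not exhaust the daily allowance.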
Testing the model
We tried the new model in five categories—dynamism, illustration, text-to-video, structural coherence, and multi-subject coherence. Here’s what we found.
Dynamism
All video generators handle still scenes well, but typically struggle with rapid movement, intricate scenes, and dynamic setups. This mirrors real-life video or animation: pause your TV during a “Tom & Jerry” chase or an action-packed war scene, and you’ll spot weird frames everywhere.
We tested the model with a still image of a man flying through a city and asked it to generate the scene.
Kling 2.0 proved extremely sensitive to minor prompt changes. Our first attempt used: “Dynamic tracking shot: A man is flying at extremely high speeds in a bustling city street. The camera follows closely behind, capturing the rush of buildings and traffic whizzing by, enhancing the sense of speed and exhilaration after he takes a sharp turn.”
Unfortunately, the result looked as if the subject were being vacuumed backwards down the street, likely because of our choice of words in the prompt.
So we removed just one word: “behind.” That altered the result, producing a much better video showing the subject flying forward, facing the camera.
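For readers who want to track this kind of prompt iteration, a word-level diff makes it obvious how small the change was. The sketch below uses Python's standard difflib module. The revised prompt is our reconstruction, assuming that "behind" was the only token dropped, and the helper name is hypothetical.

```python
import difflib
import re

ORIGINAL_PROMPT = (
    "Dynamic tracking shot: A man is flying at extremely high speeds in a "
    "bustling city street. The camera follows closely behind, capturing the "
    "rush of buildings and traffic whizzing by, enhancing the sense of speed "
    "and exhilaration after he takes a sharp turn."
)

# Assumption: the second attempt differs only by the removed word "behind".
REVISED_PROMPT = ORIGINAL_PROMPT.replace("closely behind,", "closely,")


def removed_tokens(before, after):
    """List the tokens present in `before` that were dropped in `after`."""
    # Split into words and standalone punctuation so "behind," becomes two tokens.
    tokenize = lambda text: re.findall(r"\w+|[^\w\s]", text)
    a, b = tokenize(before), tokenize(after)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    removed = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(a[i1:i2])
    return removed


print(removed_tokens(ORIGINAL_PROMPT, REVISED_PROMPT))
```

Keeping a log of such diffs alongside the generated clips helps connect a change in output to the exact wording that caused it.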
Kling captured the key scene elements, namely dynamic and fast-paced movement, though the subject’s body morphed weirdly when changing direction, and some elements lacked uniform structure. Other models like Google’s Veo 2 trade dynamism for realism, creating slower, more static, but more coherent scenes.
Illustration
Prompt: “360-degree horizontal pan: A bustling city intricately built around a massive tree, filled with houses and bridges. The camera smoothly moves from the front to the back of the tree, capturing children playing, people engaging in daily activities, and flying cars landing on branches and taking off, all under a warm, inviting atmosphere.”
The model excels with imaginative styles like comics and illustrations, but struggles with minor details. It prioritizes coherence over detail, respecting the main prompt elements with smooth camera movement and a fluid scene.
Object structure remains solid without the wiggling seen in other generators, though some kids lose coherence and flying cars occasionally disappear. Both are small details outside the composition’s main structure: a tree and the bustle around it.
Still, this test produced the best results we’ve seen from any video generator.
Text-to-video
Prompt: “A blonde woman in a red dress and an Asian man in black suit chat inside of a Starbucks. Medium shot.”
Text-to-video presents unique challenges for AI generators. The model must create an initial frame (essentially a text-to-image task) and use that as a reference for all subsequent frames. Ideally, you’d want a specialized image generator for that first frame—and ideally for the last frame too if you want the best coherence.
Kling 2.0 doesn’t particularly shine here—but it’s not bad either. The scene has the characteristic airbrushed style common to many image generators, but bodies maintain proper structure, fingers appear accurate, and there aren’t noticeable artifacts disrupting the scene.
It’s an improvement over Kling 1.6, but not what the model was designed for.
Structural coherence
Prompt: “Aerial view: shot of an intricate, abstract architectural structure rotating.”
While Kling may struggle with small details in crowded scenes, it excels at maintaining coherence and detail in single-subject shots.
We shared an image of an intricate piece and asked the model to make it rotate. Kling 2.0 handled this nearly flawlessly—the lighting remained consistent, movement was uniform, no artifacts appeared, and the structure maintained its integrity.
This capability makes it potentially valuable for 3D modeling, enabling object and scene previews from different angles.
Multi-subject coherence
Prompt: “Five gray wolf pups frolicking and chasing each other around a remote gravel road, surrounded by grass. The pups run and leap, chasing each other, and nipping at each other, playing.”
This remains the Achilles’ heel of all video models, Kling 2.0 included. Ever since OpenAI showed Sora failing to generate a pack of baby animals playing together, all video generators have attempted this challenge with mixed results. No model consistently achieves perfect outcomes.
Kling 2.0 generated a vivid, realistic-enough scene, but the wolves merge into each other, appearing and disappearing between frames. Judged on coherence alone, there is not much difference between Kling 2.0 and Kling 1.6.
One notable improvement: the irregularities mostly occur in the background, with foreground animals maintaining better coherence most of the time.
Kling 2.0 can be accessed via Kling AI, Freepik, Pollo AI and other providers.