That's not The Rock

Testing the top image editing models across 100 recursive Dwaynes.

Image editing models are fantastic, but they generally degrade if you edit the edit of the edit's edit.

Earlier this year, u/Foreign_Builder_2238 demonstrated this by asking ChatGPT to endlessly recreate a photo of Dwayne 'The Rock' Johnson, instructing it to "Create an exact replica of this image, don't change a thing." I decided to put the current top image editing models (Nano Banana Pro, SeeDream 4, Qwen, and friends) through that same recursive abuse to see how they unravel when looping their own outputs 100 times.
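
The setup is just a loop: hand each model its own previous output along with the same replication prompt, 100 times over. Here's a minimal sketch of that loop in Python, where edit_image() is a hypothetical stand-in for whichever model API is being tested:

```python
from pathlib import Path

PROMPT = "Create an exact replica of this image, don't change a thing."


def edit_image(image_bytes: bytes, prompt: str) -> bytes:
    """Hypothetical wrapper around the image editing model under test.

    Swap in a real API call (Nano Banana Pro, SeeDream 4, Qwen, etc.) here.
    """
    raise NotImplementedError("plug the model API call in here")


def recursive_edit(start_image: Path, out_dir: Path, steps: int = 100) -> None:
    """Feed the model its own output, over and over, saving each generation."""
    out_dir.mkdir(parents=True, exist_ok=True)
    current = start_image.read_bytes()
    for step in range(1, steps + 1):
        current = edit_image(current, PROMPT)
        (out_dir / f"generation_{step:03d}.png").write_bytes(current)
```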

The findings were both interesting and ridiculous, with each of the models tested behaving in a slightly different way. As a measure, I calculated the structural similarity index (SSIM) of each generated image against the original; these are graphed for each model below. Note that SSIM is a flawed measure here: minor shifts in Dwayne's position will destroy the score even when the image is otherwise coherent.
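
For reference, these scores can be reproduced with scikit-image. A rough sketch, assuming each generation has been saved to disk by the loop above (the resize step just makes sure both frames have the same dimensions before comparison):

```python
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity


def ssim_vs_original(original_path: str, generated_path: str) -> float:
    """SSIM of a generated frame against the original Dwayne photo."""
    original = np.asarray(Image.open(original_path).convert("RGB"))
    generated = Image.open(generated_path).convert("RGB")
    generated = np.asarray(generated.resize((original.shape[1], original.shape[0])))
    # channel_axis=-1 tells scikit-image these are RGB images; a score of 1
    # means structurally identical, and it falls away from there.
    return structural_similarity(original, generated, channel_axis=-1)
```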

The SSIM trend is interesting, but what matters more, I think, is how many recursions it takes before the subject simply isn't The Rock anymore. So, I’ve tracked the "That's Not The Rock" (TNTR) score for each model—the exact generation where Dwayne ceases to be Dwayne. Since this is subjective, I've also included a way for you to cast your own vote on when he loses his Rock-ness.

Note: most models were only tested once; it's possible we'd see entirely different results on a retest.

GPT-Image1

The original Reddit post was made seven months before this writing; let's check in on how things have changed.

Well, that was boring. Recursive generations almost immediately degrade to static, recovering only slightly before falling to pieces. This was the worst-performing model in the test, and the same behaviour was seen across multiple runs.
GPT-Image1 SSIM Degradation

Recursion 50 for GPT-Image1

GPT-Image1-mini

Fortunately, the results for GPT-Image1-Mini were a lot more interesting.

This was the only model that continued to create coherent images without degrading into noise. Mini clearly isn't just a 'smaller' GPT-Image1; there must be more substantial differences between the models. While it had the lowest peak SSIM and TNTR scores, it was definitely the most fun.
GPT-Image1-mini SSIM Degradation

Recursion 50 for GPT-Image1-mini

Nano Banana Pro

Currently touted as the SOTA among image editing models, how does it hold up?

Far more structurally consistent, though differences in colour become increasingly exaggerated over the generations. Dwayne is quite noisy by step 10 and devolves from there into fractals, and eventually into something reminiscent of a broken LCD panel.
Nano Banana Pro SSIM Degradation

Recursion 50 for Nano Banana Pro

SeeDream 4

Currently the runner-up to Nano Banana Pro on the leaderboards. The pattern here trends to red, with several sudden jumps in image coherence.

The only model tested that got… hairy?
SeeDream 4 SSIM Degradation
The drift to the right at image 4 destroys the SSIM score early on, making the chart less useful in this case.

Recursion 50 for SeeDream 4

Qwen Image Edit

A highly trainable model that has recently been gaining popularity following the release of a few nice fine-tunes. Note: while writing this up I noticed that the test here uses a previous generation of Qwen Image Edit; the more recent Qwen Image Edit 2509 is yet to be tested.

The best-performing model according to the numbers, with the highest peak SSIM and a very smooth descent into blotchy green noise, distinguished only by suddenly refreshed facial hair and lips.
Qwen Image Edit SSIM Degradation

Recursion 50 for Qwen Image Edit

Nano Banana

The previous generation of the Nano Banana model behaves quite differently to the latest version.

The model with the most significant Dwaynian motion, with The Rock gradually sliding to the right.
Nano Banana SSIM Degradation
The sideways motion breaks the SSIM quite quickly.

Recursion 50 for Nano Banana

Flux Kontext Pro

The model that proved the usefulness of editing models; it is over a year old now.

Pretty good! It performs the best of the models tested from an average SSIM perspective, but the results still trend to noise and abrupt white dudes.
Flux Kontext Pro SSIM Degradation

Recursion 50 for Flux Kontext Pro

All Models Compared

All Models: SSIM Degradation Comparison
Comparative performance across all 7 image editing models
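
For anyone who wants to recreate the comparison chart, here is a minimal matplotlib sketch. The scores_by_model dict is an assumption of mine (each model name mapped to its list of per-generation SSIM scores), not something from the original runs:

```python
import matplotlib.pyplot as plt


def plot_ssim_comparison(scores_by_model: dict[str, list[float]]) -> None:
    """Plot each model's SSIM-vs-original across the recursive generations."""
    for model, scores in scores_by_model.items():
        plt.plot(range(1, len(scores) + 1), scores, label=model)
    plt.xlabel("Recursion step")
    plt.ylabel("SSIM vs original")
    plt.title("SSIM degradation across recursive edits")
    plt.legend()
    plt.show()
```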

Results breakdown

Model summary

Model             Peak SSIM  Avg SSIM  Cost per image  SWG  TNTR
GPT-Image1        0.444      0.017     $0.04           -    -
GPT-Image1-mini   0.289      0.135     $0.02           -    -
Nano Banana Pro   0.858      0.278     $0.14           -    -
Nano Banana       0.655      0.262     $0.039          -    -
SeeDream 4        0.756      0.265     $0.03           -    -
Qwen Image Edit   0.932      0.251     $0.03           -    -
Flux Kontext Pro  0.907      0.314     $0.04           -    -
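
The peak and average columns are just the best and mean SSIM over each model's run; with the scores_by_model dict from the chart sketch above, they can be computed in one pass:

```python
# Assumed derivation: peak = best single-generation SSIM, avg = mean over the run.
summary = {
    model: {"peak_ssim": max(scores), "avg_ssim": sum(scores) / len(scores)}
    for model, scores in scores_by_model.items()
}
```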

Other findings

GPT-Image1-mini is consistently weird

I was most interested in GPT-Image1-mini, since this was the only model that produced consistently coherent images rather than degrading to static. I ran this model five times to see how the variations introduced during the process affected the final result. The final frames were surprisingly similar, with only one run failing to produce a variation of a demonic white dude.

Progression video

Image coherence jumps

While most models trend to noise, several of them showed sudden jumps in image coherence. SeeDream 4 was particularly prone to this, showing four distinct coherence jumps.

Flux sudden jump example
SeeDream sudden jump example 1
SeeDream sudden jump example 2
SeeDream sudden jump example 3

So what did I learn?

I never really set out to learn anything with this; it was only ever a fun experiment. It also isn't very surprising that recursive generation is lossy: some loss is expected when you ask a generative model to reproduce an image, and that delta compounds over time. What is interesting, though, is the breadth of variation in how these models degrade.

If there's a takeaway, it's that none of the frontier image editing models are safe from loss in recursive image editing. So beware of editing the edit of the edit; just write a better prompt the first time.

I'd love to know more about why these models behave so differently. If you can enlighten me, send me a DM.