I understand the 'fun factor' but at this point I really wonder what this pelica...

simonw · 2026-04-16T19:35:57 1776368157

That's why I did the flamingo on a unicycle.

For a delightful moment this morning I thought I might have finally caught a model provider cheating by training for the pelican, but the flamingo convinced me that wasn't the case.

furyofantares · 2026-04-16T20:07:14 1776370034

It is completely wild to me that you prefer Qwen's flamingo. I think it's really bad and Opus' is pretty good.

simonw · 2026-04-16T20:09:08 1776370148

The Opus one doesn't even have a bowtie.

furyofantares · 2026-04-16T20:40:45 1776372045

The Opus one looks like a flamingo, and looks like it's riding the unicycle. Sitting on the seat. Feet on the pedals.

The Qwen one looks like a 3-tailed, broken-winged, beakless (I guess? Is that offset white thing a beak? Or is it chewing on a pelican feather like it's a piece of straw?) monstrosity not sitting on the seat, with its one foot off the pedal (the other chopped off at the knee) of a malmanufactured wheel that has bonus spokes that are longer than the wheel.

But yeah, it does have a bowtie and sunglasses that you didn't ask for! Plus it says "<3 Flamingo on a Unicycle <3", which perhaps resolves all ambiguity.

bigyabai · 2026-04-16T21:59:01 1776376741

Let's not oversell Opus' output. The Qwen flamingo is flawed but could be easily fixed with 1-2 prompts if you're really upset with it. The Opus SVG is not any better than something that I could make in Inkscape with 3 minutes and sufficient motivation. Calling Opus' flamingo "programmer art" would be an insult to programmers.

monksy · 2026-04-16T21:39:26 1776375566

Game over opus

akavel · 2026-04-16T20:11:43 1776370303

r/LocalLlama is now doing a horse in a racing car:

https://redd.it/1slz38i

prodigycorp · 2026-04-16T19:47:08 1776368828

To me the opus flamingo is waaaay better than the qwen one. qwen has the better pelican, though.

dude250711 · 2026-04-16T19:50:54 1776369054

Is a flamingo on a unicycle not merely a special case of a pelican on a bicycle?

solarkraft · 2026-04-17T01:48:22 1776390502

If I (commercially) made models I’d put specific care into producing SVGs of various animals doing (riding) various things ... I find it interesting how confident you seem to be that they’re not.

simonw · 2026-04-17T05:06:08 1776402368

Google Gemini featured a bunch of examples of exactly that in their release video for 3.1 Pro: https://x.com/JeffDean/status/2024525132266688757

BoorishBears · 2026-04-16T21:38:12 1776375492

This is a gag that's long outlived its humor, but we're in a space so driven by hype there are people who will unironically take some signal from it. They'll swear up and down they know it's for fun, but let a great pelican come out and see if they don't wave it as proof the model is great alongside their carwash test.

luyu_wu · 2026-04-16T23:06:11 1776380771

Consider reading the article, which addresses all of the points you raise.

It's directly stated in the post that the entire test is meant to be humorous, not taken seriously, only that it has vaguely followed model performance to date. The author also writes that this new result shows that trend has broken.

gistscience · 2026-04-17T06:51:30 1776408690

Yeah, I can imagine these popular benchmarks get special treatment in the training of new models. I wonder how they would perform for "Elephant riding a car" or "Lion sleeping in a bed".

stephbook · 2026-04-16T21:23:43 1776374623

They're certainly aware of the test, but a turtle doing a kickflip on a skateboard? I seriously doubt they train their models for that.

https://x.com/JeffDean/status/2024525132266688757

If anything, the disastrous Opus 4.7 pelican shows us they don't pelicanmaxx.

bitwize · 2026-04-16T21:38:59 1776375539

I think I found the leaked Claude Mythos version of the turtle benchmark: https://www.youtube.com/watch?v=l82XWTKLZuk