
The Seven-Petal Flower Test: ChatGPT’s Enduring Symmetry Struggle (and Why It’s Getting Better)
The seven-petal flower test isn’t a serious benchmark like GPQA or SWE-bench, but it’s delightfully human. It reminds us that even as frontier models crush complex reasoning tasks, basic perceptual ones can still trip them up in surprising ways.













