Testing Claude, ChatGPT and Gemini for medical image analysis: brain anatomy

https://preview.redd.it/grr4398usree1.jpg?width=6200&format=pjpg&auto=webp&s=c027be3215349a65ee1b29f1c1b75142bfa72f88

https://preview.redd.it/9ep7za8usree1.jpg?width=6200&format=pjpg&auto=webp&s=8bc7a9895ed7354a821ab63f7ad68c60fafcb9af

https://preview.redd.it/0fub3b8usree1.jpg?width=6200&format=pjpg&auto=webp&s=0b54e7cb95e4ce38289f96f1f5bd00704be8cf42

https://preview.redd.it/3odtl88usree1.jpg?width=6200&format=pjpg&auto=webp&s=6bee5590765b1cf624ba1c2c9456efc843990941

https://preview.redd.it/g0n3z88usree1.jpg?width=6200&format=pjpg&auto=webp&s=23c6b09463a13b6da44047cf5e118e52358ae12e

https://preview.redd.it/tjhs8a8usree1.jpg?width=6200&format=pjpg&auto=webp&s=6ebc4cf1a923a302951628e9af30ad6eeeb9019d

https://preview.redd.it/7glqca8usree1.jpg?width=6200&format=pjpg&auto=webp&s=e5d7689dbc9b52722e78251c29b1b841e0a8dc5b

https://preview.redd.it/jiicy98usree1.jpg?width=6200&format=pjpg&auto=webp&s=94523a49b3d7d9e8cbe51ad0dfcc8dbbcc6c6a6d

https://preview.redd.it/mzcfba8usree1.jpg?width=6200&format=pjpg&auto=webp&s=4232fb38886404a336e1cdbc1b699b512e5f1d3c

https://preview.redd.it/2rief356tree1.jpg?width=6200&format=pjpg&auto=webp&s=b4280515b8380fb237ef45b642b44421fd2048e9

I gave the three models the same task: analyze spatial transcriptomics data from a mouse brain slice and identify brain regions/nuclei based on the expression pattern of an [unknown] gene. All models received exactly the same series of prompts and were asked to think step by step. After the first prompt:

- Claude 3.5 Sonnet (free version) correctly identified all the regions. When I asked it to be more specific about the nuclei it saw, it still gave a satisfactory answer, misidentifying just one nucleus as a "possible part".

- ChatGPT o1 gave an almost correct response, though it also listed several regions that had no detectable expression of the gene. When I asked it to look at the image more carefully and revise its answer, it insisted on the same incorrect regions. It seems to have confused the brainstem clusters with the midbrain/raphe nuclei.

- Gemini 1.5 Flash at first gave a seemingly random list of areas, most of which were incorrect. However, after I asked it to rethink its answer, it gave a much better response and identified all the areas correctly, though not as precisely as Claude.

Then I showed them another image of the same brain slice, this time showing Acta2 expression. Acta2 is a vascular marker, so in the brain it appears as a diffuse, widespread pattern with occasional "rings" (blood vessels) and no large clusters. This time the task was to propose candidate genes that could produce this expression pattern. Claude was the only one to immediately recognize a vascular structure; ChatGPT and Gemini were confused by the diffuse expression and proposed completely unrelated genes. Further hints like "look closely at the shape" did not improve their answers, so in the end Claude showed the best performance of the three.

I repeated the test twice on each model to make sure the results were consistent. I also tested ChatGPT-4o, but its performance was not dramatically different from o1. Once again, I am impressed with Claude. I don't know how many gigabytes of mouse brain images it was trained on, but WOW.

P.S. Sorry for all the technical/anatomical terms; I know it's boring.