Abstract: Auto-graders based on large language models (LLMs), such as Claude 3.5 Sonnet, show promise in educational technology. To test their capabilities, we conducted an experiment in which four ...