
I tried that once, specifically to see whether we could get some kind of productivity gain out of it.

I was using local LLMs in the 4B to 14B range; I tried Phi, Gemma, Qwen, and Llama. The idea was to prompt the LLM with the question, the answer key/rubric, and the student answer. Putting the student answer at the end meant the shared prefix (question plus rubric) could be prompt-cached across students, which made it much faster.
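
To make the caching point concrete, here's a minimal sketch of the prompt layout, assuming an OpenAI-compatible local server (llama.cpp's llama-server, Ollama, vLLM, etc.); the URL and model name are placeholders, not what I actually ran:

    from openai import OpenAI

    # Any OpenAI-compatible local server works; URL/model are placeholders.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

    def build_grading_prompt(question: str, rubric: str, student_answer: str) -> str:
        # Static parts first, student answer last: servers that cache the
        # KV state of a shared prefix only re-process the final segment.
        return (
            f"Question:\n{question}\n\n"
            f"Answer key / rubric:\n{rubric}\n\n"
            f"Student answer:\n{student_answer}\n\n"
            "Grade the student answer against the rubric."
        )

    def grade(question: str, rubric: str, student_answer: str) -> str:
        resp = client.chat.completions.create(
            model="qwen2.5-7b-instruct",  # placeholder
            messages=[{"role": "user",
                       "content": build_grading_prompt(question, rubric, student_answer)}],
            temperature=0,
        )
        return resp.choices[0].message.content

With one rubric and many students, only the final "Student answer" segment changes between calls, which is what makes the prefix cache pay off.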

It was okay but not good, there were a lot of things I tried:

* Endlessly messing with the prompt.

* A few examples of grading.

* Messing with the rubric to give more specific instructions.

* Average of K.

* Think step by step, then give a grade (this and average-of-K are sketched after the list).
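
The last two items can be combined: sample K step-by-step gradings at nonzero temperature and aggregate. A rough sketch, reusing the client and build_grading_prompt from above; the "Grade: X/Y" output format and the regex are my assumptions for illustration, not something the models reliably produced:

    import re
    import statistics

    GRADE_RE = re.compile(r"Grade:\s*([0-9]+(?:\.[0-9]+)?)\s*/\s*([0-9]+)")

    def grade_once(question, rubric, student_answer):
        resp = client.chat.completions.create(
            model="qwen2.5-7b-instruct",  # placeholder
            messages=[{
                "role": "user",
                "content": build_grading_prompt(question, rubric, student_answer)
                + "\nThink step by step, then end with a line 'Grade: X/Y'.",
            }],
            temperature=0.7,  # nonzero so the K samples actually differ
        )
        m = GRADE_RE.search(resp.choices[0].message.content)
        return float(m.group(1)) / float(m.group(2)) if m else None

    def grade_average_of_k(question, rubric, student_answer, k=5):
        # Median is more robust than the mean to the occasional
        # unexplained zero.
        grades = [g for g in (grade_once(question, rubric, student_answer)
                              for _ in range(k)) if g is not None]
        return statistics.median(grades) if grades else None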

It was janky, and I'll chalk it up to local LLMs at the time being somewhat too stupid for this to be reasonable. They basically didn't follow the rubric very well. Qwen in particular was very strict, giving zeros regardless of the partial marks described in the answer key, as I recall.

I'm sure it could work with the right type of question, the right prompt, and a good GPU, but it wasn't as trivially easy as I had thought at the time.


I would try it now with GPT-5.1.


