March 23, 2026

What Teachers Need to Get Right Before Asking AI to Grade

A concise look at rubric design mistakes that cause bad AI grading: weak context, hidden assumptions, generic criteria, missing attribution, and sycophantic behavior.

Tags: rubric-design, ai-literacy, educators

A grading rubric is where AI grading either becomes useful or goes off the rails

If you ask an LLM to grade without a strong rubric, it will still produce something polished. That is exactly the problem.

Core lesson

Before asking AI to evaluate student work, define what should be evaluated, for whom, and at what level. If the rubric is weak, the model fills the gaps with assumptions.

Here are five common failures.

1. Not enough context leads to invented standards

If the model sees only a student response and a broad instruction like "grade this," it has to guess what matters.

That often means it rewards things you did not ask for or penalizes things you never taught.

```txt
Weak:
"Grade this response to the reading."

Stronger:
"Using this 8th grade rubric, evaluate only:
1. claim clarity
2. textual evidence
3. explanation quality
Do not evaluate grammar unless it affects meaning."
```

2. If you let the LLM choose the criteria, it will

Teachers sometimes hand over too much authority without meaning to:

  • "Tell me what this student did well"
  • "Score this fairly"
  • "Give feedback on the assignment"

Those prompts sound reasonable, but they let the model decide what counts. That is rubric design by accident.
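One way to tighten the middle prompt above — a sketch, assuming the teacher attaches their own rubric:

```txt
Weak:
"Score this fairly."

Stronger:
"Score this using only the three criteria in the attached rubric.
For each criterion, quote the rubric language you applied.
If the rubric does not cover something, say so rather than judging it."
```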

HBG workflow rule

The model should apply criteria, not invent them.

3. A rubric must match the student, not just the subject

A high school literary analysis rubric should not be reused for a 4th grade paragraph, and a college seminar rubric should not be applied to an English learner draft without adjustment.

If the rubric does not reflect the student's grade level, assignment type, or instructional context, the AI will generate feedback that sounds smart but is educationally wrong.

Mismatch warning

Generic rubrics produce generic judgments. Personalized rubrics produce feedback that is much more likely to be fair, usable, and teachable.
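Here is a sketch of how the same criterion might be phrased at two levels; the wording is illustrative, not taken from a published rubric:

```txt
High school literary analysis:
"The thesis makes a debatable claim and is sustained
across body paragraphs."

4th grade paragraph:
"The topic sentence tells the main idea, and at least
two details from the text support it."
```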

4. You need attribution for how the rubric was built

When AI helps draft a rubric, it should remain clear which parts came from the model and which were edited, tightened, or replaced by the educator.

That distinction matters for defensibility. If a rubric or a grade is later appealed, teachers need to show what they accepted, what they changed, and where professional judgment was applied.
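One lightweight way to preserve that record — a hypothetical annotation convention, not a required format:

```txt
Criterion 1 — Claim clarity         [AI draft, accepted as-is]
Criterion 2 — Textual evidence      [AI draft, edited: added
                                     "from the assigned text"]
Criterion 3 — Grade-level wording   [Teacher-added]
```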

5. Sycophancy can quietly misdirect the workflow

LLMs often try to be agreeable. If a teacher says, "This is probably an A paper, right?" the model may lean toward confirming that framing instead of evaluating independently.

That is sycophancy: the model optimizing for agreement rather than accuracy.

In grading, that can push AI use in the wrong direction by reinforcing an early hunch instead of checking the work against the rubric.
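Reframing the same request neutrally removes the cue the model would otherwise agree with:

```txt
Leading:
"This is probably an A paper, right?"

Neutral:
"Apply the rubric before seeing my opinion. Report a score for
each criterion with the evidence behind it. Do not assign a
letter grade; I will do that."
```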

A better starting point

Before using AI for grading, make sure the rubric does these things:

  • names the exact criteria
  • fits the assignment and grade level
  • limits what the model should and should not evaluate
  • preserves attribution between AI draft work and educator edits
  • gives the model less room to guess and flatter

If you get the rubric right first, the rest of the AI workflow gets much easier.
