Scoring Rubric

How we score every song

Every lyric forged or evaluated in SongForgeAI is measured across 12 metrics in three tiers. Scoring is deliberately hard. A 50 is average. A 90+ is historically rare territory.

12 metrics across Craft, Expression, and Impact
Weighted composite: Expression counts most (40%)
Anti-inflation rules prevent meaningless high scores
Every score includes per-metric reasoning and evidence
Calibrated against the best lyrics ever written

Craft (25%)

Can this person write? Mechanics, structure, rhyme, and word choice.

Expression (40%)

Does it say something worth hearing? Specificity, originality, truth, and voice.

Impact (35%)

Will anyone remember it tomorrow? Transcendence, arc, stickiness, and genre fit.

The 12 Metrics

1. Prosody & Musicality
What we measure

Meter, stress patterns, consonant and vowel clusters, intentional silence, and breath points. Does the lyric feel good in the mouth?

What good looks like

Natural rhythmic flow that a singer can inhabit without fighting the phrasing. Stressed syllables land on strong beats.
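The idea that stressed syllables should land on strong beats can be made concrete. What follows is a simplified illustration, not SongForgeAI's scorer. It assumes each syllable has already been marked stressed or unstressed and sits on one beat slot in a simple alternating strong/weak meter. The function `stress_alignment` and its `offset` and `strong_every` parameters are hypothetical names chosen for this sketch.

```python
def stress_alignment(stresses: str, offset: int = 0, strong_every: int = 2) -> float:
    """Return the share of stressed syllables that land on strong beats.

    Hypothetical helper for illustration only.
    stresses:     one character per syllable, "1" = stressed, "0" = unstressed.
    offset:       beat slots before the first syllable (use 1 when the line
                  opens on a weak-beat pickup).
    strong_every: a strong beat falls every `strong_every` slots
                  (2 = alternating strong/weak).
    """
    if set(stresses) - {"0", "1"}:
        raise ValueError("stresses must contain only '0' and '1'")
    stressed = [i for i, mark in enumerate(stresses) if mark == "1"]
    if not stressed:
        return 0.0  # no stressed syllables, so nothing can align
    # Count stressed syllables whose beat slot is a strong one.
    on_strong = sum(1 for i in stressed if (i + offset) % strong_every == 0)
    return on_strong / len(stressed)


# "TWIN-kle, TWIN-kle, LIT-tle STAR": every stress falls on a strong beat.
print(stress_alignment("1010101"))               # 1.0
# An iambic line placed without its pickup fights the beat...
print(stress_alignment("0101010101"))            # 0.0
# ...and lands cleanly once the first syllable is treated as a pickup.
print(stress_alignment("0101010101", offset=1))  # 1.0
```

A real prosody judgment also weighs breath points, silence, and sound clusters, which a ratio like this cannot capture.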

2. Structural Architecture
What we measure

Song shape, arc, verse progression, chorus return, and bridge revelation. Does the structure serve the story?

What good looks like

Each section has a clear job. Verses build, choruses resolve, the bridge shifts perspective. Nothing feels arbitrary.

3. Rhyme Intelligence
What we measure

Rhyme in service of the craft: internal rhyme, slant rhyme, and strategic non-rhyme. Does the rhyme scheme feel intentional rather than forced?

What good looks like

Rhymes land with purpose. A mix of perfect, slant, and internal rhyme that never bends meaning to satisfy a sound.

4. Economy of Language
What we measure

Every word earning its place. No filler, no padding, no lines that exist only to set up a rhyme.

What good looks like

You cannot remove a word without losing something. Every syllable carries weight or music.

5. Lyrical Specificity
What we measure

Concrete imagery, sensory detail, proper nouns, time anchors. The opposite of abstract generalities.

What good looks like

The song lives in a real place with real objects. "Tangerines and someone else's smile" instead of "memories of you."

6. Imagery Originality
What we measure

Fresh metaphors, defamiliarized objects, governing images that haven't been written to death.

What good looks like

Images that surprise on first read and deepen on second. No shattered hearts, no oceans of tears, no wings of freedom.

7. Emotional Truth
What we measure

The ring test: does it ring true? Earned emotion, unforced vulnerability, no borrowed sentiment.

What good looks like

The emotion arrives through specificity and honesty, not through telling the listener what to feel.

8. Voice & POV Integrity
What we measure

Narrator consistency, perspective clarity, and a credible speaker. Does this sound like one person talking?

What good looks like

A distinct human presence. Word choices, diction, and references that belong to one coherent narrator.

9. The Transcendent Line
What we measure

The unrepeatable line. Not necessarily the cleverest; the truest. The line someone would quote.

What good looks like

At least one line that stops a listener cold. The kind of line people screenshot and share.

10. Emotional Arc
What we measure

Does the song move from state A to state B? Revelation, release, recalibration. Not just emotion, but emotional motion.

What good looks like

The listener ends the song in a different place than they started. Something shifted.

11. Memorability
What we measure

The one-hour test: could you recall this 60 minutes after hearing it once? Hooks, refrains, and sticky phrases.

What good looks like

Lines that lodge in the brain involuntarily. A chorus you catch yourself humming without trying.

12. Genre Authenticity
What we measure

Does this honor its genre while extending it? Genre fluency without genre cliche.

What good looks like

A country song that sounds like country but doesn't sound like every country song. Respect and surprise.

Why scores are hard to game

We built anti-inflation into the scoring system so that high scores actually mean something.

Gravity Rule

The default is 50, not 80. Every point above average must be earned with specific evidence from the lyrics.

Burden of Proof

Scores above 80 require the scorer to cite specific lines and explain why they justify the number.

Antagonist Ceiling

A dedicated adversarial voice tries to lower every score. If it finds a real weakness, the score drops.

Historical Context

Scores are anchored against the best lyrics ever written. A 90+ means the song stands alongside recognized classics.
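The four rules above are described in prose only. Here is a minimal sketch of how the first three could be enforced on a single metric score. The thresholds 50 and 80 come from the rules themselves. Everything else is a hypothetical illustration, not the product's implementation: the `MetricScore` record, its `reasoning` and `cited_lines` fields, clamping as the enforcement mechanism, and modeling the antagonist's finding as a ceiling value.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

GRAVITY_DEFAULT = 50  # Gravity Rule: the starting point is average, not 80
PROOF_THRESHOLD = 80  # Burden of Proof: scores above this need an explanation


@dataclass
class MetricScore:  # hypothetical record for one metric
    metric: str
    score: int                                            # proposed score, 0-100
    reasoning: str = ""                                   # why this number
    cited_lines: list[str] = field(default_factory=list)  # evidence from the lyric


def apply_anti_inflation(s: MetricScore, antagonist_ceiling: Optional[int] = None) -> int:
    """Return the score after the Gravity, Burden of Proof, and Antagonist checks."""
    score = max(0, min(100, s.score))
    # Gravity Rule: without cited evidence, nothing above average is credited.
    if score > GRAVITY_DEFAULT and not s.cited_lines:
        score = GRAVITY_DEFAULT
    # Burden of Proof: above 80 also needs an explanation
    # (cited lines are already required by the check above).
    if score > PROOF_THRESHOLD and not s.reasoning.strip():
        score = PROOF_THRESHOLD
    # Antagonist Ceiling: a real weakness found by the adversarial pass caps the score.
    if antagonist_ceiling is not None:
        score = min(score, antagonist_ceiling)
    return score


draft = MetricScore("The Transcendent Line", 88, cited_lines=["<quoted line>"])
print(apply_anti_inflation(draft))  # 80: a line is cited, but there is no explanation
```

Because each check can only lower a score, the order of the checks never lets a score rise. This matches the intent that high numbers must survive every test.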

Grade Scale

S+ (95-100): Historically rare. Canon-level craft.
S (90-94): Exceptional. Every line earns its place.
A+ (85-89): Outstanding. Minor imperfections only.
A (80-84): Strong. Craft is evident throughout.
B+ (75-79): Good. Solid work with room to grow.
B (70-74): Competent. The foundation is there.
C+ (65-69): Developing. Moments of promise.
C (55-64): Average. Functional but unremarkable.
D (40-54): Below average. Significant gaps.
F (0-39): Needs fundamental rework.
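The grade scale is a simple lookup from score bands to letter grades. Here is a minimal sketch. The function name `grade_for` is hypothetical. Treating a fractional composite (for example 94.5) as belonging to the highest band whose lower bound it has reached is an assumption, since the scale lists integer ranges only.

```python
# Lower bound of each band, checked from the top down.
GRADE_SCALE = [
    (95, "S+"), (90, "S"), (85, "A+"), (80, "A"), (75, "B+"),
    (70, "B"), (65, "C+"), (55, "C"), (40, "D"), (0, "F"),
]


def grade_for(score: float) -> str:
    """Map a 0-100 score to its letter grade (hypothetical helper)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    for lower_bound, grade in GRADE_SCALE:
        if score >= lower_bound:
            return grade
    raise AssertionError("unreachable: the F band starts at 0")


print(grade_for(58))  # C
```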

How the composite score works

Each metric is scored from 0 to 100. The composite is a weighted average of the three tier scores:

Composite = 0.25 × Craft + 0.40 × Expression + 0.35 × Impact
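Here is a sketch of the composite calculation. The tier weights come from the rubric. The metric-to-tier grouping follows the tier descriptions (Craft: mechanics, structure, rhyme, word choice; Expression: specificity, originality, truth, voice; Impact: transcendence, arc, stickiness, genre fit). Computing each tier score as the plain, equal-weight mean of its four metrics is an assumption, as are the names `TIER_METRICS`, `TIER_WEIGHTS`, and `composite`.

```python
from __future__ import annotations

from statistics import mean

TIER_WEIGHTS = {"Craft": 0.25, "Expression": 0.40, "Impact": 0.35}

TIER_METRICS = {
    "Craft": ["Prosody & Musicality", "Structural Architecture",
              "Rhyme Intelligence", "Economy of Language"],
    "Expression": ["Lyrical Specificity", "Imagery Originality",
                   "Emotional Truth", "Voice & POV Integrity"],
    "Impact": ["The Transcendent Line", "Emotional Arc",
               "Memorability", "Genre Authenticity"],
}


def composite(scores: dict[str, float]) -> float:
    """Weighted composite of the 12 metric scores (each 0-100).

    Assumes each tier score is the unweighted mean of its four metrics.
    Raises KeyError if any of the 12 metrics is missing.
    """
    total = 0.0
    for tier, weight in TIER_WEIGHTS.items():
        tier_score = mean(scores[m] for m in TIER_METRICS[tier])
        total += weight * tier_score
    return round(total, 2)


# Strong craft (80), middling expression (60), weak impact (40).
example = {m: 80 for m in TIER_METRICS["Craft"]}
example.update({m: 60 for m in TIER_METRICS["Expression"]})
example.update({m: 40 for m in TIER_METRICS["Impact"]})
print(composite(example))  # 58.0 = 0.25*80 + 0.40*60 + 0.35*40
```

The example shows the weighting at work. A song with polished mechanics but little to say scores 58, because Expression and Impact together carry 75% of the composite.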

See it in action

Every song you forge or evaluate gets a full 12-metric breakdown with reasoning per metric.