Trust and Fairness (public-safe)
Status: first-draft
Audience: public
Last-reviewed: 2026-04-29
Sources: docs/Blueprint.md Fairness Engine §
This page explains the fairness engine — the moat at the center of READYPLAY. It's safe to share with visitors because the math is the public-good story: we don't hide it, we publish it.
The problem
Pickup sports run on opinion. One harsh reviewer can drop a player's grade unfairly. One generous reviewer can inflate a friend forever. A scoring system that treats every grade as equal becomes a popularity-and-grudges engine within weeks.
The principle
Don't erase reviews — give them context. Every grade is recorded. But each reviewer's weight in the average is shaped by how their grades compare to everyone else who watched the same player in the same game.
A reviewer who's consistently far from the room loses weight. They're still in the room, still on the record — just not driving the consensus alone.
How it works
After every finished game with reviews, the engine computes:
1. Grade values
Letter grades map to numbers:
| Grade | Value |
|---|---|
| A | 4.0 |
| B | 3.0 |
| C | 2.0 |
| D | 1.0 |
| F | 0.0 |
2. Consensus gap (per reviewer, per reviewee, per game)
For each player being reviewed, the engine computes the team consensus — the average of every teammate grade for that player in that game. Then for each reviewer, it records:
gap = abs(reviewerOwnGrade - teamAverage)
That gap accumulates into the reviewer's rolling fairness profile across every game they've ever reviewed.
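Steps 1 and 2 can be sketched in Swift. This is a minimal illustration, not the actual `FairnessService` API; the function names are assumptions.

```swift
// Map a letter grade to its numeric value (A = 4.0 ... F = 0.0).
func gradeValue(_ grade: String) -> Double? {
    ["A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0][grade]
}

// Team consensus: the average of every teammate grade for one player in one game.
func teamAverage(_ grades: [Double]) -> Double {
    grades.reduce(0, +) / Double(grades.count)
}

// Consensus gap for one reviewer against that consensus.
func consensusGap(ownGrade: Double, teamAverage: Double) -> Double {
    abs(ownGrade - teamAverage)
}
```

For example, if three teammates gave A, B, B, the consensus is about 3.33, and the reviewer who gave the A carries a gap of about 0.67 into their rolling profile.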
3. Outlier detection
A single review is flagged as an outlier when:
abs(ownGrade - teamAverage) >= 1.5
(That's a one-and-a-half letter-grade gap from the room.) Outlier rate accumulates into the same fairness profile.
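With the documented 1.5 threshold, the check itself is a one-liner (helper name is illustrative):

```swift
// A review is an outlier when it sits 1.5+ letter grades from the room.
let outlierThreshold = 1.5

func isOutlier(ownGrade: Double, teamAverage: Double) -> Bool {
    abs(ownGrade - teamAverage) >= outlierThreshold
}
```

An F against a C consensus (gap 2.0) is flagged; an A against a B consensus (gap 1.0) is not.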
4. Reviewer weight
The reviewer's weight in future weighted averages comes from:
reviewerWeight = max(0.35, 1.0 - (averageConsensusGap * 0.18) - (outlierRate * 0.25))
A reviewer who never goes against the room sits near 1.0. A reviewer who's frequently far out drops toward the floor of 0.35. The floor exists on purpose — even an outlier reviewer keeps a voice.
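The weight formula uses the documented constants directly; a Swift sketch (function name assumed):

```swift
// Reviewer weight from the rolling fairness profile. The 0.35 floor keeps
// even a heavy outlier from being silenced entirely.
func reviewerWeight(averageConsensusGap: Double, outlierRate: Double) -> Double {
    max(0.35, 1.0 - (averageConsensusGap * 0.18) - (outlierRate * 0.25))
}
```

A reviewer with a small average gap of 0.2 and no outliers carries weight 0.964; one with an average gap of 3.0 and a 100% outlier rate would compute to 0.21 but is clamped to the 0.35 floor.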
5. Weighted player grade
A player's displayed grade is a weighted average of every grade they've received:
weightedGrade = Σ (gradeValue * reviewerWeight) / Σ (reviewerWeight)
A grade from a high-fairness reviewer pulls more than a grade from a frequent outlier. Both are present.
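The weighted average can be sketched as follows (names are illustrative, not the shipped API):

```swift
// Displayed grade: each received grade, weighted by its reviewer's fairness weight.
func weightedGrade(_ reviews: [(gradeValue: Double, reviewerWeight: Double)]) -> Double {
    let numerator = reviews.reduce(0.0) { $0 + $1.gradeValue * $1.reviewerWeight }
    let denominator = reviews.reduce(0.0) { $0 + $1.reviewerWeight }
    return numerator / denominator
}
```

For instance, an A from a full-weight reviewer plus an F from a floor-weight (0.35) reviewer yields 4.0 / 1.35, roughly 2.96: the outlier's F still counts, but it cannot drag the grade to a simple-average 2.0.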
6. Two named patterns
- Harshness: if a reviewer's average given grade stays much lower than platform norms and team consensus, their `harshnessIndex` rises.
- Favoritism: if a reviewer repeatedly rates the same friend much higher than consensus across many games, their `favoritismIndex` rises.
These don't change the math directly; they're surfaced for product decisions (e.g., warning labels, friend-detection nudges, future moderation tools).
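One plausible accumulation for the harshness signal is a running average of the signed gap (own grade minus consensus). This is a hypothetical sketch only; the canonical definitions live in `docs/Blueprint.md`, and the type and formula here are assumptions.

```swift
// Hypothetical: track signed gaps (ownGrade - teamAverage) per review.
struct ReviewerPattern {
    var signedGaps: [Double] = []

    // Harshness grows when the reviewer sits below consensus on average;
    // a reviewer at or above consensus scores zero.
    var harshnessIndex: Double {
        guard !signedGaps.isEmpty else { return 0 }
        let mean = signedGaps.reduce(0, +) / Double(signedGaps.count)
        return max(0, -mean)
    }
}
```

A favoritism signal would accumulate the same way, but per reviewer-reviewee pair rather than per reviewer.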
Why this matters
- A pickup app without this is a popularity contest with worse UX than Yelp.
- A pickup app with this becomes a place where a peer-verified record means something: when a player's grade is high, it's because the room agrees, not because they happen to be friends with one loud rater.
- Reviews are anonymous to the reviewee — you never see who graded you. They are not anonymous to the engine, which is what makes the fairness math possible.
What the engine doesn't do
- It doesn't punish you for one tough game. The math operates over rolling history.
- It doesn't erase outlier reviews. They count, just at lower weight.
- It doesn't try to detect cheating. That's a separate roadmap item ("anti-cheat scoring pattern detection").
- It doesn't model relationships beyond same-game consensus gap. Friend graphs, team chemistry, and longer-term trust signals are future work.
Where this is implemented
- The math lives in `Shared/Services/FairnessService.swift` and `Shared/Models/FairnessProfile.swift` (see `docs/Blueprint.md` for the canonical model definitions).
- Every player has a `FairnessProfile` record updated after each completed game.
- Tests in `Tests/ReviewWeightingTests.swift` lock the consensus-gap math, outlier detection, and the reviewer-weight floor.
How to talk about it
The right framing for a visitor or league commissioner:
Every grade you give counts. But the engine watches whether you're with the room or against it. Over time, raters who consistently disagree with the consensus carry less weight — not because their opinion is wrong, but because the platform leans on agreement to mean something.
Don't oversell it as "AI moderation" or "automated fairness." It's transparent statistics, applied to peer reviews, with documented constants. That's the honest pitch.