Data posture

What we surface, what we don't, and why.

Why don't I see a collection count on result rows?

We currently surface the cube classification (critic / DJ / collector axes), a three-bit verdict that says which audiences engaged with a master, but we do not display a raw collection count on result rows.

The collector axis is sourced from mirror.collection_items_current, a sampled panel of Discogs users' public collections. The panel currently covers about 3,843 users, roughly 0.04% of the broader Discogs collector universe. That is large enough for a useful classification signal (collector-positive vs. collector-zero is a meaningful binary across millions of masters), but too narrow to display as a bare per-master count that a customer can reason about. "Owned by 84" reads as a global ownership claim; the underlying datum is "84 of our sampled panel", which answers a different question entirely.
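
To make the distinction concrete, here is a minimal Python sketch of how a three-bit verdict can be built from per-axis presence checks. The bit layout, the MasterSignals type, and the assumption that an axis is "present" whenever its count is above zero are illustrative only, not the production cube_quadrant encoding. Note that a panel count of 84 and a panel count of 1 produce the same collector bit.

```python
from dataclasses import dataclass

# Hypothetical bit positions; the real cube_quadrant encoding may differ.
CRITIC = 0b100
DJ = 0b010
COLLECTOR = 0b001


@dataclass(frozen=True)
class MasterSignals:
    """Per-master raw signals (field names mirror the Internals table)."""
    critic_count: int
    dj_count: int
    owner_count: int  # sampled panel only, never a global count


def cube_verdict(s: MasterSignals) -> int:
    """Collapse three per-axis presence checks into a 3-bit verdict."""
    verdict = 0
    if s.critic_count > 0:
        verdict |= CRITIC
    if s.dj_count > 0:
        verdict |= DJ
    if s.owner_count > 0:  # collector-positive vs. collector-zero
        verdict |= COLLECTOR
    return verdict


def describe(verdict: int) -> str:
    """Render a verdict as the audience names whose bits are set."""
    axes = ((CRITIC, "critic"), (DJ, "DJ"), (COLLECTOR, "collector"))
    names = [name for bit, name in axes if verdict & bit]
    return " / ".join(names) or "none"
```

For example, `cube_verdict(MasterSignals(critic_count=0, dj_count=3, owner_count=84))` returns `0b011`, which `describe` renders as `"DJ / collector"`; the 84 itself never reaches the verdict.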

Once hosaka users register and track their own collections, raw counts will return — labelled honestly as "X hosaka users own this" — and the per-record signal will become meaningful in absolute terms.

What's still in the API response?

The OpenAPI response shape is unchanged. The owner_count field is still present in /api/v1/search, /api/v1/masters/{id}, and /api/v1/masters/batch responses so existing paid API consumers keep working.

If you're consuming owner_count programmatically: treat it as a sampled-panel signal, not a global count. It's useful as a relative ranking input (records with higher panel ownership are, on average, more widely collected) but not as an absolute claim about how many people own a record.
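One way to follow that guidance client-side is to convert owner_count into a rank within the current result set instead of displaying it. The sketch below assumes each result is a mapping with `id` and `owner_count` keys; check the OpenAPI schema for the actual response shape. `panel_percentiles` is a hypothetical helper, not part of any hosaka SDK.

```python
from bisect import bisect_left, bisect_right
from typing import Any, Dict, Hashable, Iterable, Mapping


def panel_percentiles(results: Iterable[Mapping[str, Any]]) -> Dict[Hashable, float]:
    """Map each result id to a mid-rank percentile of its panel owner_count.

    Scores are relative to this result set only: a higher score means more
    panel ownership than the other results fetched. Ties share a score.
    """
    rows = [(r["id"], int(r["owner_count"])) for r in results]
    counts = sorted(count for _, count in rows)
    n = len(counts)
    scores: Dict[Hashable, float] = {}
    for result_id, count in rows:
        below = bisect_left(counts, count)            # results with a lower count
        ties = bisect_right(counts, count) - below    # results with the same count
        scores[result_id] = (below + 0.5 * ties) / n  # mid-rank percentile in (0, 1)
    return scores
```

Because the score depends only on ordering within the results you fetched, it makes no claim about absolute ownership: two masters with a panel count of 84 simply rank above one with 3.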

Internals

For full transparency on the substrate:

| Signal | Source | Coverage |
|---|---|---|
| cube_quadrant (classification) | Per-axis presence checks | Full catalogue (~2.5M masters) |
| owner_count (panel) | mirror.collection_items_current | ~3,843 users sampled (~0.04% of Discogs) |
| dj_count | DJ-set tracklists (seen) | RRR, KEXP, NTS, etc.; partial AU/US/UK coverage |
| critic_count | Editorial mentions (mirror) | Partial; expanding |
| formats | Discogs CC0 | Full catalogue |

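We hold ourselves to surfacing only signals we can defend as meaningfully sampled or globally measured. The cube classification qualifies on both counts: it is computed across the full catalogue, and its collector axis needs only a collector-positive/collector-zero binary, which the panel supports. Raw panel counts currently don't qualify.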

Roadmap

  • First-party collections: once hosaka has its own user accounts, owner_count returns as a labelled hosaka-internal count.
  • Work-grain aggregation: today owner_count is keyed per master_id, which means Dark Side of the Moon's various pressings (UK Harvest, US Capitol, MoFi reissue, 2003 SACD, 50th anniversary, …) are counted separately. Work-grain aggregation via ISWC is on the ledger roadmap and will produce a single canonical "Dark Side of the Moon" rather than N pressings.
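
Once a master-to-work mapping exists, aggregation should count distinct owners rather than summing per-master owner_count values; otherwise a panel user who owns two pressings is counted twice. A minimal sketch, assuming ownership arrives as (user_id, master_id) rows and that `master_to_work` is a hypothetical lookup (an ISWC-keyed table or similar):

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Mapping, Set, Tuple


def owners_per_work(
    ownership: Iterable[Tuple[Hashable, Hashable]],
    master_to_work: Mapping[Hashable, Hashable],
) -> Dict[Hashable, int]:
    """Count distinct owners per work rather than per master_id.

    ownership: (user_id, master_id) rows from a collection table.
    master_to_work: hypothetical master_id -> work key mapping; masters
    without a mapping are kept as their own work.
    """
    owners: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for user_id, master_id in ownership:
        work = master_to_work.get(master_id, master_id)
        owners[work].add(user_id)  # a set, so one user with two pressings counts once
    return {work: len(users) for work, users in owners.items()}
```

With rows `[("u1", 101), ("u1", 102), ("u2", 102), ("u3", 200)]` and masters 101 and 102 both mapped to a placeholder key `"work:dsotm"`, the result is `{"work:dsotm": 2, 200: 1}`, whereas summing the per-master counts would report 3 for that work.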

If you have a use case that depends on absolute ownership numbers today, talk to us. There are a few paths forward (Discogs-direct partnership data, MLC Public Database, etc.), but none that we'd ship behind a crate.0xhoneyjar.xyz URL without the right labelling.