H and H Coffee Factory — Knowledge Base Schema

This document overrides all LLM Wiki skill defaults. Read it first.

Architecture — knowledge-base/ is the single source of truth

This repository is a one-person experimental research project. The wiki is the single canonical source of truth for all research-grade data: people, companies, brands, places, events, documents, accession records, artifact catalog items, and gallery definitions. The Jekyll site renders from a projection of the wiki into _data/ (built by scripts/accessions_build_jekyll_data.exs and related generators).

                            ┌──────────────────────────────────────┐
                            │   knowledge-base/  (SOURCE OF TRUTH) │
                            │                                      │
                            │   people/  brands/  companies/       │
                            │   places/  events/  documents/       │
                            │   accessions/  artifacts/            │
                            │   galleries/                         │
                            │                                      │
                            │   raw-sources/  raw-archives/        │
                            └──────────────┬───────────────────────┘
                                           │ (Elixir generators)
                                           ▼
                            ┌──────────────────────────────────────┐
                            │   _data/  _brands/  (PROJECTION)     │
                            │                                      │
                            │   _data/accessions.yml               │
                            │   _data/events.yml                   │
                            │   _data/galleries/*/items/*.yml      │
                            │   _data/galleries/*/order.yml        │
                            │   _brands/*.md                       │
                            └──────────────┬───────────────────────┘
                                           │ (Jekyll build)
                                           ▼
                                      _site/  (rendered output)

Implication: do not edit projection output (_data/accessions.yml, _data/events.yml, _data/galleries/*/items/, _data/galleries/*/order.yml, or _brands/) by hand. Edit the canonical record under knowledge-base/ and re-run the generator.

Directory

knowledge-base/

Topics — narrative research

Compiled pages live in topic directories:

| Directory | Contents |
| --- | --- |
| people/ | Biographical pages: founders, family members, employees |
| brands/ | Brand histories: product lines, trademarks, packaging. Canonical source for the _brands/*.md Jekyll collection |
| companies/ | Organizational pages: coffee companies (Hoffmann-Hayman, Western Coffee Co., etc.) |
| events/ | Historical milestones, openings, closings, transitions. Files with timeline: true in frontmatter are projected to _data/events.yml for the Jekyll /history/ page. Rich research synthesis pages without timeline: true stay KB-only. |
| documents/ | Synthesis pages for specific primary documents |
| places/ | Locations: factories, offices, distribution points |

Topics — structured collection data

These directories carry one file per record. Frontmatter is the structured data; body is research notes / interpretation.

| Directory | Contents | Projects to |
| --- | --- | --- |
| accessions/ | One file per accession record (e.g. HH-AD-2014-0001.md). Schema in docs/history/2026-04-30-h-and-h-accession-and-loan-readiness-design.md. Frontmatter mirrors _data/accession_records/ v2 schema (accession_id, object_title, category, acquisition_source, acquired_date, acquisition_reference, possession_status, location, condition, loan_ready, notes). | _data/accessions.yml (Jekyll consumes) |
| artifacts/ | One file per catalog item, keyed by clip_id (e.g. HH-REF-2023-0001.md, HH-FACT-0000-0001.md). Frontmatter mirrors the catalog item schema (clip_id, title, alt, image_basename, image_path, url, gallery). Body is for research notes specific to the artifact. | _data/galleries/<gallery>/items/<clip_id>.yml |
| galleries/ | One file per gallery (e.g. reference.md, factory.md). Sequence/ordering + gallery-level metadata. | _data/galleries/<gallery>/order.yml |
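
The frontmatter/body split described above can be sketched as follows. This is a minimal illustration in Python, not the project's Elixir generator: split_record is a hypothetical helper, and it only separates the two parts without parsing the YAML.

```python
def split_record(text: str) -> tuple[str, str]:
    """Split a record file into (frontmatter, body).

    Assumes the file starts with a '---' line and that a second '---'
    line closes the frontmatter, as in the examples in this document.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("record does not start with a frontmatter fence")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            frontmatter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).strip()
            return frontmatter, body
    raise ValueError("frontmatter fence is never closed")


record = """---
clip_id: HH-REF-2023-0001
gallery: reference
---

Research notes go here.
"""
frontmatter, body = split_record(record)
print(frontmatter)  # the structured data, still as YAML text
print(body)         # the research notes
```

A real generator would hand the frontmatter string to a YAML parser; that step is left out here to keep the sketch dependency-free.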

Source Buckets

Raw sources live in raw-sources/ organized by type. Note: raw-sources/ is for primary historical sources only (clippings, ads, USPTO filings, etc.). Artifact catalog items and accession records have their own top-level directories (artifacts/, accessions/) — see “Topics — structured collection data” above.

| Bucket | Contents |
| --- | --- |
| newspapers/ | Clippings, death notices, news articles |
| advertisements/ | Ad copy, promotional records, trade materials (including audio/video transcription discs) |
| images/ | Primary-source scans (USPTO filings, postcards, letterheads, lithographic prints) where the image itself is the document |
| research/ | Secondary sources, research notes, existing docs/ content |

Primary source vs. artifact documentation

The distinction matters: a primary source is the historical document itself (a 1923 newspaper clipping, a 1922 USPTO filing). An artifact is a physical object that survives from the period; an artifact documentation photograph is a 2014–2026 photo of that object. The photograph is contemporary documentation, not a period source. Examples:

  • 1923 SA Light clipping (scan) → newspapers/ bucket, primary source
  • 1922 USPTO trademark Official Gazette clipping (image of the filing) → images/ bucket, primary source
  • A 1920s H and H Blend tin → an artifact
  • A 2014 photo of that 1920s tin → artifacts/ directory (top-level, not a raw-sources/ bucket), documents an artifact
  • A 2026 photo of the 601 Delaware factory exterior → artifacts/ directory, field documentation

Artifacts — catalog under artifacts/

The artifact catalog lives at knowledge-base/artifacts/<clip_id>.md, one file per item. The layout is flat: gallery is a frontmatter field, not a subdirectory, which keeps cross-cutting search simple.

Gallery taxonomy (mirrors the legacy _data/galleries/ structure for now):

| Gallery | Contents | Approx. count |
| --- | --- | --- |
| branding_newspaper | Display-ad scans | 51 |
| collection | Items the museum owns | 185 |
| factory | 601 Delaware site documentation | 103 |
| newspaper | Newspaper-clip scans | 299 |
| not_our_h_and_h | Look-alikes / non-H&H reference | 17 |
| reference | H&H items documented but not owned | 80 |
| wanted | Items being sought | 9 |

Each artifact file has frontmatter mirroring the catalog item schema, plus optional cross-links to other wiki pages. The body is for research notes specific to that artifact.

---
clip_id: HH-REF-2023-0001
type: artifact
gallery: reference
title: "Large bulk-size H and H Blend Coffee tin (lid missing): 'H & H' monogram side panel and 'We roast It / others praise It' slogan front-face cartouche, early-1920s Hoffmann-Hayman branding"
alt: "Color photograph documented 2023-05-20 of a heavily worn large square-cross-section bulk-size H and H Blend Coffee tin…"
image_basename: 2023-05-20-h-and-h-blend-large-bulk-tin-monogram-side
image_path: /assets/images/thumbnail/2023-05-20-h-and-h-blend-large-bulk-tin-monogram-side.jpg
url: /assets/images/gallery/2023-05-20-h-and-h-blend-large-bulk-tin-monogram-side.jpg
date_documented: 2023-05-20
brands: [h-and-h-blend]
period_referenced: "early-1920s"
---

# Large bulk-size H and H Blend Coffee tin (early-1920s)

Research notes about variant, attribution, comparable items, etc.

The Elixir generator projects knowledge-base/artifacts/*.md → _data/galleries/<gallery>/items/<clip_id>.yml (Jekyll consumes the projection).
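
The path mapping in that projection can be illustrated with a short Python sketch. The function names are hypothetical and the real generator is an Elixir script; only the two path patterns come from this document.

```python
from pathlib import PurePosixPath


def canonical_path(clip_id: str) -> PurePosixPath:
    """Canonical record: flat layout, one file per clip_id."""
    return PurePosixPath("knowledge-base") / "artifacts" / f"{clip_id}.md"


def projection_path(gallery: str, clip_id: str) -> PurePosixPath:
    """Projected file: gallery comes from frontmatter, not from a directory."""
    return PurePosixPath("_data") / "galleries" / gallery / "items" / f"{clip_id}.yml"


print(canonical_path("HH-REF-2023-0001"))
print(projection_path("reference", "HH-REF-2023-0001"))
```

Note that the gallery appears only on the projection side: moving an item between galleries means changing one frontmatter field, not moving the canonical file.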

Cross-references from narrative topic pages

When a topic page (people, brands, companies, events, places) cites an artifact, reference the clip_id in frontmatter:

artifacts:
  - HH-FACT-0000-0001   # 1932 G.W. Mitchell construction photo of the Hayman factory exterior
  - HH-COLL-0000-0042   # 1lb H and H Blend tin (front face)
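
A cross-reference list like this one can be checked mechanically. The sketch below is an assumption-laden illustration in Python: the clip_id pattern is inferred from the examples in this document (the real rule may be stricter), and check_artifact_refs is a hypothetical helper, not part of the existing generators.

```python
import re

# Inferred from HH-REF-2023-0001, HH-FACT-0000-0001, HH-COLL-0000-0042.
CLIP_ID = re.compile(r"HH-[A-Z]+-\d{4}-\d{4}")


def check_artifact_refs(refs, known_clip_ids):
    """Return (malformed, dangling) entries from a page's artifacts: list."""
    malformed = [r for r in refs if not CLIP_ID.fullmatch(r)]
    dangling = [
        r for r in refs
        if CLIP_ID.fullmatch(r) and r not in known_clip_ids
    ]
    return malformed, dangling


# known would normally be the file stems under knowledge-base/artifacts/.
known = {"HH-FACT-0000-0001", "HH-COLL-0000-0042"}
malformed, dangling = check_artifact_refs(
    ["HH-FACT-0000-0001", "HH-COLL-0000-0099", "hh-coll-42"], known
)
print(malformed)  # ['hh-coll-42']
print(dangling)   # ['HH-COLL-0000-0099']
```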

When to add a raw-sources/ entry for an artifact

Most artifacts do not belong in raw-sources/. That registry is for primary historical sources. Add a raw-sources/ row only when the artifact itself functions as a primary source (e.g., a maker’s mark or embossment that’s the only surviving documentation of a fact). Otherwise: the artifact file under artifacts/<clip_id>.md is its own canonical record.

Accessions

Provenance records — one file per acquisition. Schema documented in docs/history/2026-04-30-h-and-h-accession-and-loan-readiness-design.md.

---
accession_id: "HH-AD-2014-0001"
type: accession
object_title: "1960 vintage print advertisement: The Toy House World, Saint Paul, Minnesota (context ephemera)"
category: "AD"
nomenclature_term: "advertisement"
acquisition_source: "ebay"        # ebay | vendor | donation | other
acquired_date: "2014-07-07"
acquisition_reference: "261521640532"
possession_status: "in_collection" # in_collection | on_loan_out | on_loan_in | transferred | deaccessioned | missing | unknown
location: "Private collection, San Antonio, Texas"
condition: "unknown"               # excellent | good | fair | poor | unknown
loan_ready: false
artifacts:                         # optional — clip_ids of the artifact catalog items this accession produced
  - HH-REF-2014-0001
notes: >
  Reference/context item; draft needs object scan before loan packaging.
---

# 1960 vintage print advertisement — Toy House World

(optional research notes about the acquisition, condition history, etc.)

Raw evidence files (receipt PDFs, eBay screenshots, antique-mall receipts) live at records/ (top-level, unchanged from the existing pattern). Each accession references its evidence via acquisition_reference (eBay transaction_id, antique-mall receipt filename, etc.).

The Elixir generator projects knowledge-base/accessions/*.md → _data/accessions.yml for Jekyll, also writing museum-ready CSV exports.
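
The enumerated values listed in the example's comments lend themselves to a simple validation pass. The following Python sketch is only an illustration of that idea: validate_accession is a hypothetical function, and treating a missing field as an error is an assumption, since this document does not say which fields are required.

```python
# Allowed values copied from the comments in the example frontmatter.
ALLOWED = {
    "acquisition_source": {"ebay", "vendor", "donation", "other"},
    "possession_status": {
        "in_collection", "on_loan_out", "on_loan_in",
        "transferred", "deaccessioned", "missing", "unknown",
    },
    "condition": {"excellent", "good", "fair", "poor", "unknown"},
}


def validate_accession(record: dict) -> list[str]:
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    for field, allowed in ALLOWED.items():
        value = record.get(field)  # assumption: a missing field is an error
        if value not in allowed:
            errors.append(f"{field}: {value!r} is not one of {sorted(allowed)}")
    if not isinstance(record.get("loan_ready"), bool):
        errors.append("loan_ready: expected true or false")
    return errors


good = {
    "accession_id": "HH-AD-2014-0001",
    "acquisition_source": "ebay",
    "possession_status": "in_collection",
    "condition": "unknown",
    "loan_ready": False,
}
bad = dict(good, acquisition_source="auction", loan_ready="no")
print(validate_accession(good))  # []
for problem in validate_accession(bad):
    print(problem)
```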

The purchase: frontmatter in _posts/*.md is not removed; posts continue to carry display-time provenance copy. The accession record in knowledge-base/accessions/ is canonical; posts cite it by accession_id (a frontmatter field), and the generator can validate consistency.

Galleries

Gallery-level configuration (sequence/order, gallery title, description):

---
gallery: reference
title: Reference
description: "Photographs of H and H Coffee items found online, not in our collection"
order_strategy: manual
sequence:
  - HH-REF-2023-0001
  - HH-REF-2023-0002
  - HH-REF-0000-0001
  # …
---

The Elixir generator projects knowledge-base/galleries/<gallery>.md → _data/galleries/<gallery>/order.yml.

Reliability

  • Primary historical documents: reliability: high
  • Secondary sources and research notes: reliability: mixed unless verified against primary sources

Frontmatter Conventions

Mandatory: title, type, updated, sources

Optional:

  • tags: — era (1890s, 1900s … 1960s), document type (founder, brand, advertisement), subject (hoffmann, hayman, western-coffee)
  • period: — date range for historical entries (e.g. 1899–1920)
  • reliability: high | mixed | unverified
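
These conventions can be expressed as a small check. This is a hedged Python sketch, assuming the frontmatter has already been parsed into a dict; frontmatter_problems and the sample values are hypothetical.

```python
MANDATORY = ("title", "type", "updated", "sources")
RELIABILITY = {"high", "mixed", "unverified"}


def frontmatter_problems(fm: dict) -> list[str]:
    """List convention violations for one page's parsed frontmatter."""
    problems = [f"missing mandatory field: {key}" for key in MANDATORY if key not in fm]
    # reliability is optional, so only check it when present.
    if "reliability" in fm and fm["reliability"] not in RELIABILITY:
        problems.append(f"reliability: unexpected value {fm['reliability']!r}")
    return problems


# Hypothetical page: no sources field and an unsupported reliability value.
page = {
    "title": "Example page",
    "type": "brand",
    "updated": "2026-05-15",
    "reliability": "probable",
}
for problem in frontmatter_problems(page):
    print(problem)
```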

PDF Handling

Attempt text extraction with pdftotext -layout <file> <output.txt> first. If the PDF is image-based (extraction yields only header metadata), read it visually.

Register the PDF path in raw-sources/index.md.
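
The extract-then-fall-back step could be scripted roughly as below. This Python sketch assumes pdftotext (from Poppler or Xpdf) is on the PATH; the 200-character threshold in looks_image_based is an arbitrary illustration of "extraction yields only header metadata", not a rule from this schema.

```python
import subprocess


def extract_text(pdf_path: str, txt_path: str) -> None:
    """Run pdftotext in layout mode; raises CalledProcessError on failure."""
    subprocess.run(["pdftotext", "-layout", pdf_path, txt_path], check=True)


def looks_image_based(extracted: str, min_chars: int = 200) -> bool:
    """Heuristic: almost no non-whitespace text suggests a scanned PDF.

    min_chars is an assumed threshold; tune it against real clippings.
    """
    non_whitespace = "".join(extracted.split())
    return len(non_whitespace) < min_chars
```

In use, a caller would run extract_text, read the output file, and switch to visual reading when looks_image_based returns True.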

Inbox Processing

After a source from work/inbox/ is ingested, move it to knowledge-base/raw-archives/<bucket>/ and rename using the item’s publication or creation date:

  • Newspapers: YYYY-MM-DD_slug.pdf (date = publication date)
  • Advertisements: YYYY-MM-DD_slug.{pdf,mp3,mp4,m4a,…} (date = publication or session date; audio/video formats allowed)
  • Images (primary-source scans): YYYY-MM-DD_slug.<ext> (date = creation/capture date if known, otherwise acquisition date)
  • Artifacts (object/field photographs): do not move to raw-archives/. Catalog the item as knowledge-base/artifacts/<clip_id>.md; the generator projects it to _data/galleries/<gallery>/items/, which is not edited by hand. The binary lives in assets/images/gallery/ (canonical) and assets/images/thumbnail/ (thumb). The wiki then references the clip_id.

Update the path or clip_id in raw-sources/index.md after moving / cataloging.
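
The YYYY-MM-DD_slug naming rule above can be sketched in Python. The slug rule here (lowercase, hyphen-separated) is inferred from the 1961-08-01_hh-master-chef-radio-broggi-track-1 example; archive_name and the sample title are hypothetical.

```python
import re
from datetime import date


def archive_name(item_date: date, title: str, ext: str) -> str:
    """Build a raw-archives filename: YYYY-MM-DD_slug.ext."""
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{item_date.isoformat()}_{slug}.{ext.lstrip('.')}"


# Hypothetical clipping, for illustration only.
print(archive_name(date(1923, 1, 15), "Example Clipping: Page 4", ".pdf"))
# 1923-01-15_example-clipping-page-4.pdf
```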

Audio / video advertising material

Audio and video advertising material belongs in advertisements/. The MP4/MP3/M4A binary lives in knowledge-base/raw-archives/advertisements/ alongside a *.transcript.md companion file. Frontmatter slug uses the recording session date (YYYY-MM-DD_…) even if multiple takes share that date. See 1961-08-01_hh-master-chef-radio-broggi-track-1.* for the established pattern.

Generators

All projections from knowledge-base/ to _data/ and _brands/ are written by Elixir scripts under scripts/, each wired into Jekyll’s :after_reset hook via a sibling _plugins/regenerate_*.rb:

| Generator | Reads | Writes | Plugin |
| --- | --- | --- | --- |
| accessions_validate_and_export.exs | knowledge-base/accessions/*.md | _data/accessions.yml, museum CSVs | _plugins/regenerate_accessions_data.rb |
| accessions_build_jekyll_data.exs | (delegates to above) | _data/accessions.yml | (same) |
| artifacts_build_jekyll_data.exs | knowledge-base/artifacts/*.md | _data/galleries/<gallery>/items/<clip_id>.yml | _plugins/regenerate_artifacts_data.rb |
| galleries_build_jekyll_data.exs | knowledge-base/galleries/*.md | _data/galleries/<gallery>/order.yml | _plugins/regenerate_galleries_data.rb |
| brands_build_jekyll_collection.exs | knowledge-base/brands/*.md (only files with jekyll_filename:) | _brands/*.md (Jekyll collection stubs) | _plugins/regenerate_brands_collection.rb |
| events_build_jekyll_data.exs | knowledge-base/events/*.md (only files with timeline: true) | _data/events.yml | _plugins/regenerate_events_data.rb |

Each plugin honors its <name>_data.regenerate_on_build / .regenerate_only_when_stale config in _config.yml and the matching SKIP_<NAME>_REGEN / FORCE_<NAME>_REGEN env overrides.
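
How those settings might combine is sketched below in Python; the actual plugins are Ruby and their logic is not shown in this document. Everything about precedence here is an assumption made for illustration: SKIP wins over FORCE, FORCE wins over config, any non-empty env value counts as set, and both config keys default to false.

```python
def should_regenerate(name: str, config: dict, env: dict, is_stale: bool) -> bool:
    """Decide whether a generator runs on this build (assumed precedence)."""
    upper = name.upper()
    if env.get(f"SKIP_{upper}_REGEN"):    # assumption: skip beats everything
        return False
    if env.get(f"FORCE_{upper}_REGEN"):   # assumption: force beats config
        return True
    section = config.get(f"{name}_data", {})
    if not section.get("regenerate_on_build", False):
        return False
    if section.get("regenerate_only_when_stale", False):
        return is_stale
    return True


config = {
    "accessions_data": {
        "regenerate_on_build": True,
        "regenerate_only_when_stale": True,
    }
}
print(should_regenerate("accessions", config, {}, is_stale=False))
print(should_regenerate("accessions", config, {"FORCE_ACCESSIONS_REGEN": "1"}, is_stale=False))
```

If the real plugins resolve conflicts differently, only the order of the first two checks needs to change.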

Files that stay in _data/ (not migrated)

These remain in _data/ because they are Jekyll-display-only, auto-generated (from raw eBay reports or as projection output from knowledge-base/), or raw evidence archives:

| File | Why it stays |
| --- | --- |
| navigation.yml | Site nav config, pure Jekyll display |
| ui-text.yml | UI strings, pure Jekyll display |
| story_taxonomy.yml | Controlled vocab for _pages/artifact-index.md filters, content-coupled to a Jekyll template |
| acquisitions.yml, ebay_purchase_history.yml | Auto-generated by scripts/combine_ebay_purchase_history.exs from _data/2014-2023-ebayReports/ and _data/2023-2026-ebayReports/ raw archives |
| _data/galleries/<gallery>/items/, _data/galleries/<gallery>/order.yml | Projection output from knowledge-base/artifacts/ and knowledge-base/galleries/ (Steps 3–4) |
| _data/accessions.yml | Projection output from knowledge-base/accessions/ (Step 2) |
| _data/events.yml | Projection output from knowledge-base/events/ (Step 6) |
| _data/2014-2023-ebayReports/, _data/2023-2026-ebayReports/ | Raw eBay export archives (evidence) |

Framework Note

knowledge-base/ is the canonical store; Jekyll consumes a projection. The wiki can be migrated to a different static-site framework without losing the underlying research data — only the generators need to change.

Changelog

  • v3.1 (2026-05-15) — Step 6 audit complete: _data/events.yml joins the projection family (generated from knowledge-base/events/*.md with timeline: true opt-in). Vestigial _data/items.yaml and _data/crystalvac_jars.yaml deleted (no Jekyll consumers). navigation.yml, ui-text.yml, story_taxonomy.yml, acquisitions.yml, and ebay_purchase_history.yml documented as stays-in-data. Generator + plugin table moved to single canonical location.
  • v3 (2026-05-15) — knowledge-base/ becomes the single source of truth for all research-grade data. Added accessions/, artifacts/, galleries/ as top-level topic directories (canonical records, one file per record). _data/accession_records/, _data/galleries/*/items/, _data/galleries/*/order.yml, and _brands/*.md become Jekyll-side projections generated from knowledge-base/. Removed the v2 raw-sources/artifacts/ bucket (artifacts are now top-level, not under raw-sources). Documented the generator chain.
  • v2 (2026-05-15) — Added artifacts/ bucket under raw-sources/; distinguished primary-source scans from artifact documentation; documented _data/galleries/ as authoritative for imaged items. Superseded by v3.
  • v1 (2026-05-15) — Initial schema with newspapers/, advertisements/, images/, research/ buckets.