Bootstrap Config and Site Manifest Loading

How scraper finds site manifest directories before building dynamic site commands.

Topics: scraper, bootstrap, config, sites, manifests, site, worker, api, sites-manifest-dir

Scraper loads site manifests during a bootstrap phase that happens before the full Cobra command tree is built. This matters because dynamic site commands such as site js-demo run seed are discovered from the loaded site manifests. If scraper does not know where the site directories are yet, those commands do not exist.

Why This Is A Bootstrap Concern

Normal Cobra flags are parsed after the command tree already exists. That is too late for scraper's site verbs, because the site command tree is created by scanning loaded manifests and JS verb files.

So scraper uses a two-phase startup model:

raw CLI args
-> bootstrap manifest-dir discovery
-> load site manifests into registry
-> build Cobra command tree
-> normal Cobra parsing and execution
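
The phases above can be sketched in plain Go. This is a minimal illustration under stated assumptions, not scraper's actual code: the names bootstrapDirs, loadRegistry, buildCommands, and the registry type are hypothetical, the loader is a stub that never touches disk, and the command tree is modelled as a list of strings rather than real Cobra commands.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// registry maps a site name to its verb names (hypothetical shape).
type registry map[string][]string

// bootstrapDirs is phase 1: scan raw args for --sites-manifest-dir values
// before any command tree exists. Simplified: space-separated form only.
func bootstrapDirs(args []string) []string {
	var dirs []string
	for i := 0; i+1 < len(args); i++ {
		if args[i] == "--sites-manifest-dir" {
			dirs = append(dirs, args[i+1])
			i++ // skip the value that was just consumed
		}
	}
	return dirs
}

// loadRegistry is phase 2. A real loader would read manifests from disk;
// this stub pretends any non-empty directory list yields one site,
// "js-demo", with a single "seed" verb.
func loadRegistry(dirs []string) registry {
	reg := registry{}
	if len(dirs) > 0 {
		reg["js-demo"] = []string{"seed"}
	}
	return reg
}

// buildCommands is phase 3: dynamic command paths are derived from the
// registry, so they only exist if phase 2 found something.
func buildCommands(reg registry) []string {
	cmds := []string{"worker run", "api serve"} // static commands always exist
	for site, verbs := range reg {
		for _, verb := range verbs {
			cmds = append(cmds, "site "+site+" run "+verb)
		}
	}
	sort.Strings(cmds)
	return cmds
}

func main() {
	dirs := bootstrapDirs(os.Args[1:])
	cmds := buildCommands(loadRegistry(dirs))
	// Phase 4 would hand the finished tree to the normal flag parser.
	fmt.Println("available commands:", cmds)
}
```

Without any manifest directory the stub registry stays empty, so the site command path is never created; that mirrors why the flag has to be read before the tree is built.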

Where Site Directories Can Come From

Scraper merges site manifest directories from three sources, in this order:

  1. app config file
  2. environment variable
  3. bootstrap CLI flags

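The sources are merged rather than overridden: a later source does not replace an earlier one, it only affects where its directories land in the combined list. All paths are normalized and de-duplicated.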
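
A minimal sketch of such a merge, assuming that normalization means filepath.Clean and that de-duplication keeps the first occurrence of a path. Both are assumptions made for illustration (the real implementation may resolve absolute paths or order duplicates differently), and mergeManifestDirs is a hypothetical name.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// mergeManifestDirs concatenates the sources in the order given
// (config, then env, then flags), cleans each path, and drops repeats.
// Assumption: the first occurrence of a duplicate path is the one kept.
func mergeManifestDirs(sources ...[]string) []string {
	seen := map[string]bool{}
	var merged []string
	for _, source := range sources {
		for _, dir := range source {
			if dir == "" {
				continue // ignore empty entries
			}
			norm := filepath.Clean(dir)
			if seen[norm] {
				continue
			}
			seen[norm] = true
			merged = append(merged, norm)
		}
	}
	return merged
}

func main() {
	fromConfig := []string{"/home/me/code/scraper-sites", "/opt/shared-scraper-sites"}
	fromEnv := []string{"/opt/shared-scraper-sites/"} // same dir, trailing slash
	fromFlags := []string{"/tmp/extra-sites"}
	fmt.Println(mergeManifestDirs(fromConfig, fromEnv, fromFlags))
}
```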

Config File

The default app config path is resolved through the standard Glazed app-config resolution for scraper.

Example ~/.scraper/config.yaml:

sitesManifestDirs:
  - /home/me/code/scraper-sites
  - /opt/shared-scraper-sites

Environment Variable

Use:

SCRAPER_SITES_MANIFEST_DIRS

The value is parsed with filepath.SplitList(...), so on typical Unix systems it looks like:

export SCRAPER_SITES_MANIFEST_DIRS="/path/to/sites-a:/path/to/sites-b"
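
For reference, this is roughly how such a value splits with the standard library. filepath.SplitList uses the OS-specific list separator (':' on Unix, ';' on Windows). The helper name dirsFromEnv and the choice to skip empty entries are assumptions for this sketch.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// dirsFromEnv splits a PATH-style list into directories.
// Assumption: empty entries (for example from "a::b") are skipped.
func dirsFromEnv(value string) []string {
	var dirs []string
	for _, dir := range filepath.SplitList(value) {
		if dir != "" {
			dirs = append(dirs, dir)
		}
	}
	return dirs
}

func main() {
	dirs := dirsFromEnv(os.Getenv("SCRAPER_SITES_MANIFEST_DIRS"))
	fmt.Printf("%d manifest dir(s) from environment: %v\n", len(dirs), dirs)
}
```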

CLI Flag

Use one or more repeated flags:

scraper \
  --sites-manifest-dir ./sites \
  --sites-manifest-dir ../extra-sites \
  site js-demo run seed --help

The bootstrap parser also accepts the --sites-manifest-dir=/path form.
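
A bootstrap scanner that accepts both forms could look like the sketch below. It is an illustration only: scanManifestDirs is a hypothetical name, and the behavior of silently ignoring every other argument (including --help and flags it does not know) is an assumption about how such a scanner stays out of the way of normal parsing.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

const manifestDirFlag = "--sites-manifest-dir"

// scanManifestDirs walks the raw args and collects manifest directories
// from both "--sites-manifest-dir PATH" and "--sites-manifest-dir=PATH".
// Everything else is left for the later, normal parsing phase.
func scanManifestDirs(args []string) []string {
	var dirs []string
	for i := 0; i < len(args); i++ {
		arg := args[i]
		switch {
		case arg == manifestDirFlag:
			// Space-separated form: the value is the next argument, if any.
			if i+1 < len(args) {
				dirs = append(dirs, args[i+1])
				i++
			}
		case strings.HasPrefix(arg, manifestDirFlag+"="):
			// Equals form: the value follows the "=".
			dirs = append(dirs, strings.TrimPrefix(arg, manifestDirFlag+"="))
		}
	}
	return dirs
}

func main() {
	fmt.Println(scanManifestDirs(os.Args[1:]))
}
```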

Common Commands

Run a site verb from the repo's default sites/ directory:

go run ./cmd/scraper --sites-manifest-dir ./sites site js-demo run seed --help

Run the worker against the same manifest set:

go run ./cmd/scraper \
  --sites-manifest-dir ./sites \
  worker run \
  --sites-dir /tmp/scraper-sites \
  --engine-db /tmp/engine.db

Serve the API with the same manifest set:

go run ./cmd/scraper \
  --sites-manifest-dir ./sites \
  api serve \
  --sites-dir ./state/sites \
  --engine-db ./state/engine.db

Troubleshooting

| Problem | Cause | Solution |
|---|---|---|
| site js-demo run seed is missing | scraper did not load the manifest directories during bootstrap | Pass --sites-manifest-dir, set SCRAPER_SITES_MANIFEST_DIRS, or configure ~/.scraper/config.yaml |
| --help should work on a site verb but bootstrap fails first | The bootstrap parser should only extract manifest dirs | Verify you are on a build that includes the manual bootstrap scanner |
| Commands work in tests but not with go run ./cmd/scraper | Tests pass explicit manifest dirs while the real CLI does not | Add a bootstrap source for the real CLI invocation |
| Worker/API sees no sites | The root command was built without any manifest directories | Start those commands with the same bootstrap site-dir inputs as the site command |

See Also

  • scraper help scraper-architecture-overview
  • scraper help scraper-runtime-model
  • scraper help scraper-adding-a-declarative-site
  • scraper help scraper-new-developer-onboarding