How scraper finds site manifest directories before building dynamic site commands.
Scraper loads site manifests during a bootstrap phase that happens before the full Cobra command tree is built. This matters because dynamic site commands such as site js-demo run seed are discovered from the loaded site manifests. If scraper does not know where the site directories are yet, those commands do not exist.
Normal Cobra flags are parsed after the command tree already exists. That is too late for scraper's site verbs, because the site command tree is created by scanning loaded manifests and JS verb files.
So scraper uses a two-phase startup model:
raw CLI args
-> bootstrap manifest-dir discovery
-> load site manifests into registry
-> build Cobra command tree
-> normal Cobra parsing and execution
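The dependency between the two phases can be sketched in plain Go. This is a minimal illustration, not scraper's actual code: `registry`, `loadRegistry`, and `buildCommands` are hypothetical names, the loader is a stub that does not read any files, and Cobra is left out so the example stays self-contained.

```go
package main

import (
	"fmt"
	"sort"
)

// registry is a hypothetical stand-in for the site registry:
// site name -> verb names discovered from manifests and JS verb files.
type registry map[string][]string

// loadRegistry stands in for phase 1. A real loader would scan the manifest
// directories; this stub registers one fixed site if any directory is given.
func loadRegistry(dirs []string) registry {
	reg := registry{}
	if len(dirs) > 0 {
		reg["js-demo"] = []string{"seed"}
	}
	return reg
}

// buildCommands stands in for phase 2: static commands always exist,
// site commands exist only for sites that phase 1 loaded.
func buildCommands(reg registry) []string {
	cmds := []string{"worker run", "api serve"}
	for site, verbs := range reg {
		for _, verb := range verbs {
			cmds = append(cmds, "site "+site+" run "+verb)
		}
	}
	sort.Strings(cmds)
	return cmds
}

func main() {
	// With no manifest directories, the site command is simply absent.
	fmt.Println(buildCommands(loadRegistry(nil)))
	// With a directory known at bootstrap time, it appears in the tree.
	fmt.Println(buildCommands(loadRegistry([]string{"./sites"})))
}
```

The point of the sketch is the ordering: `buildCommands` cannot produce a site command unless the directories were already known when `loadRegistry` ran.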
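Scraper merges site manifest directories from three sources, in this order:
1. the app config file (sitesManifestDirs)
2. the SCRAPER_SITES_MANIFEST_DIRS environment variable
3. repeated --sites-manifest-dir command-line flags
Later sources do not override earlier ones; their paths are appended after them. Paths are normalized and de-duplicated.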
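The merge behavior described above might look roughly like the following. This is a sketch under assumptions: `mergeManifestDirs` is a hypothetical name, and `filepath.Clean` is used as the normalization step only for illustration (the real implementation may, for example, resolve absolute paths instead).

```go
package main

import (
	"fmt"
	"path/filepath"
)

// mergeManifestDirs appends the sources in the order given, cleans each
// path, and keeps only the first occurrence of each cleaned path.
func mergeManifestDirs(sources ...[]string) []string {
	seen := map[string]bool{}
	var out []string
	for _, src := range sources {
		for _, p := range src {
			if p == "" {
				continue // skip empty entries
			}
			clean := filepath.Clean(p) // assumed normalization step
			if seen[clean] {
				continue // already contributed by an earlier source
			}
			seen[clean] = true
			out = append(out, clean)
		}
	}
	return out
}

func main() {
	fromConfig := []string{"/opt/shared-scraper-sites"}
	fromEnv := []string{"/path/to/sites-a", "/opt/shared-scraper-sites/"}
	fromFlags := []string{"./sites"}
	fmt.Println(mergeManifestDirs(fromConfig, fromEnv, fromFlags))
}
```

In this sketch a directory that appears in more than one source keeps the position of its first occurrence, which is one plausible reading of "later sources win only in append position".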
The default app config path is resolved through the standard Glazed app-config resolution for scraper.
Example ~/.scraper/config.yaml:
sitesManifestDirs:
- /home/me/code/scraper-sites
- /opt/shared-scraper-sites
Use:
SCRAPER_SITES_MANIFEST_DIRS
The value is parsed with filepath.SplitList(...), so on typical Unix systems it looks like:
export SCRAPER_SITES_MANIFEST_DIRS="/path/to/sites-a:/path/to/sites-b"
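For reference, this is roughly how a value in that form splits with the standard library. `manifestDirsFromEnv` is a hypothetical helper name, and dropping empty entries is an assumption about the desired behavior, not something the document states.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// manifestDirsFromEnv splits value on the OS path-list separator
// (':' on Unix, ';' on Windows) and drops empty entries.
func manifestDirsFromEnv(value string) []string {
	var dirs []string
	for _, p := range filepath.SplitList(value) {
		if p != "" {
			dirs = append(dirs, p)
		}
	}
	return dirs
}

func main() {
	dirs := manifestDirsFromEnv(os.Getenv("SCRAPER_SITES_MANIFEST_DIRS"))
	fmt.Println(len(dirs), dirs)
}
```

Because the separator is OS-specific, the same variable on Windows would use `;` between entries rather than `:`.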
Use one or more repeated flags:
scraper \
--sites-manifest-dir ./sites \
--sites-manifest-dir ../extra-sites \
site js-demo run seed --help
The bootstrap parser also accepts the --sites-manifest-dir=/path form.
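A manual scanner that accepts both forms could be sketched as follows. This is an illustration under assumptions, not scraper's actual bootstrap parser: `scanManifestDirs` is a hypothetical name, and stopping at a bare `--` is a common CLI convention assumed here rather than documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

const flagName = "--sites-manifest-dir"

// scanManifestDirs walks the raw CLI args and collects the flag's values
// in both the space-separated and the "=" form. Every other argument is
// ignored, so unknown flags and subcommands cannot fail the bootstrap.
func scanManifestDirs(args []string) []string {
	var dirs []string
	for i := 0; i < len(args); i++ {
		arg := args[i]
		switch {
		case arg == "--":
			return dirs // assumed convention: the rest is positional
		case arg == flagName:
			if i+1 < len(args) {
				i++ // consume the value that follows the flag
				dirs = append(dirs, args[i])
			}
		case strings.HasPrefix(arg, flagName+"="):
			dirs = append(dirs, strings.TrimPrefix(arg, flagName+"="))
		}
	}
	return dirs
}

func main() {
	args := []string{
		"--sites-manifest-dir", "./sites",
		"--sites-manifest-dir=../extra-sites",
		"site", "js-demo", "run", "seed", "--help",
	}
	fmt.Println(scanManifestDirs(args))
}
```

Matching on the exact flag name or the `name=` prefix keeps similar flags such as `--sites-dir` from being picked up by mistake.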
Run a site verb from the repo's default sites/ directory:
go run ./cmd/scraper --sites-manifest-dir ./sites site js-demo run seed --help
Run the worker against the same manifest set:
go run ./cmd/scraper \
--sites-manifest-dir ./sites \
worker run \
--sites-dir /tmp/scraper-sites \
--engine-db /tmp/engine.db
Serve the API with the same manifest set:
go run ./cmd/scraper \
--sites-manifest-dir ./sites \
api serve \
--sites-dir ./state/sites \
--engine-db ./state/engine.db
| Problem | Cause | Solution |
|---|---|---|
| site js-demo run seed is missing | scraper did not load the manifest directories during bootstrap | Pass --sites-manifest-dir, set SCRAPER_SITES_MANIFEST_DIRS, or configure ~/.scraper/config.yaml |
| --help should work on a site verb but bootstrap fails first | The bootstrap parser should only extract manifest dirs | Verify you are on a build that includes the manual bootstrap scanner |
| Commands work in tests but not with go run ./cmd/scraper | Tests pass explicit manifest dirs while the real CLI does not | Add a bootstrap source for the real CLI invocation |
| Worker/API sees no sites | The root command was built without any manifest directories | Start those commands with the same bootstrap site-dir inputs as the site command |
scraper help scraper-architecture-overview
scraper help scraper-runtime-model
scraper help scraper-adding-a-declarative-site
scraper help scraper-new-developer-onboarding