Step-by-step guide for adding a scraper site with site.yaml, JavaScript verbs and scripts, and no site-specific Go code.
If your site does not need custom native Go modules or special runtime hooks, the preferred path is now declarative: define the site with a site.yaml manifest, keep the scraping behavior in JavaScript, and let the existing engine handle execution, retries, queue policies, and persistence.
This tutorial shows the no-Go path for adding a site.
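Use the declarative path when:
- the site needs no custom native Go modules, and
- the site needs no special runtime hooks.
Do not use this path yet when:
- the site needs custom native Go modules, or
- the site needs special runtime hooks.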
For those cases, follow scraper help scraper-adding-a-site and keep the site defined in Go.
Create a directory under sites/<site>/ with:
- site.yaml
- scripts/
- verbs/
- migrations/
- fixtures/
No site-specific Go wrapper is needed on the normal declarative path; the site's behavior lives entirely in YAML, SQL, and JavaScript.
site.yaml
The manifest is the declarative envelope for the site.
Minimal example:
name: example
databaseFileName: example.db
scriptsRoot: scripts
verbsRoot: verbs
sqlMigrationsRoot: migrations
modules:
- default-registry
Optional queue policy example:
queuePolicies:
- queue: site:example:http
  maxInFlight: 2
  rateLimit:
    ratePerSecond: 1
    burst: 2
Manifest fields are validated strictly: typos and unknown keys fail fast at load time.
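The Go sketch below illustrates fail-fast loading on a reduced manifest struct. It is a stand-in, not scraper's loader. Go's standard library has no YAML decoder, so the sketch decodes the JSON form of the manifest with json.Decoder.DisallowUnknownFields. If you use gopkg.in/yaml.v3, the analogous switch is Decoder.KnownFields(true). The siteManifest struct is hypothetical and may not match scraper's real type.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// siteManifest covers the fields from the minimal site.yaml example.
// Hypothetical struct: scraper's real manifest type may differ.
type siteManifest struct {
	Name              string   `json:"name"`
	DatabaseFileName  string   `json:"databaseFileName"`
	ScriptsRoot       string   `json:"scriptsRoot"`
	VerbsRoot         string   `json:"verbsRoot"`
	SQLMigrationsRoot string   `json:"sqlMigrationsRoot"`
	Modules           []string `json:"modules"`
}

// loadManifest rejects unknown keys instead of silently ignoring them.
// Note: encoding/json matches keys case-insensitively, so a misspelled key
// fails here but a key that differs only in letter case does not.
func loadManifest(data []byte) (*siteManifest, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	var m siteManifest
	if err := dec.Decode(&m); err != nil {
		return nil, fmt.Errorf("invalid manifest: %w", err)
	}
	return &m, nil
}

func main() {
	good := []byte(`{"name": "example", "databaseFileName": "example.db", "modules": ["default-registry"]}`)
	m, err := loadManifest(good)
	fmt.Println(m.Name, err) // example <nil>

	typo := []byte(`{"name": "example", "scriptRoot": "scripts"}`)
	_, err = loadManifest(typo)
	fmt.Println(err) // invalid manifest: json: unknown field "scriptRoot"
}
```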
Scraper must know where your sites/<site>/ directory lives before it can build dynamic commands like site <site> run <verb>.
Use one or more of these bootstrap paths:
- the --sites-manifest-dir /path/to/sites flag
- the SCRAPER_SITES_MANIFEST_DIRS environment variable
- the sitesManifestDirs key in ~/.scraper/config.yaml
Example config:
sitesManifestDirs:
- /absolute/path/to/my-sites
You can use multiple site directories. Scraper merges config, env, and CLI bootstrap values before building the Cobra command tree.
The entrypoint still lives in verbs/ and uses the normal submit-verb model.
Typical responsibilities:
- ctx.values
The verb should not perform the crawl itself; it seeds the workflow graph.
Put the durable execution logic in scripts/.
Use the existing JS runtime APIs:
- ctx.input
- ctx.dep(...)
- ctx.emit(...)
- ctx.writeRecord(...)
- ctx.writeArtifact(...)
See:
- scraper help scraper-js-api-reference
- sites/jsdemo/scripts/
- sites/hackernews/scripts/
If the site needs queryable projections, add numbered SQL files in migrations/.
Examples:
- sites/jsdemo/migrations/001_init.sql
- sites/hackernews/migrations/001_init.sql
Keep the first migration small and focused on the output your workflow actually writes.
Do not stop at unit tests for a parser or one helper function. Add a command-path or scheduler-path test that proves:
Good examples:
- pkg/cmd/site_test.go
- pkg/cmd/bootstrap_test.go
Before committing:
go test ./pkg/cmd/... -count=1
go test ./... -count=1
If you added or changed help pages:
go run ./cmd/scraper --sites-manifest-dir ./sites help scraper-adding-a-declarative-site
sites/ directories instead of recompiling the binary.
See also:
- scraper help scraper-adding-a-site
- scraper help scraper-runtime-model
- scraper help scraper-js-api-reference
- scraper help scraper-architecture-overview
- scraper help scraper-bootstrap-config-and-site-manifest-loading