Step-by-step guide for adding a new Go-native site when declarative manifests are not enough.
Adding a site means fitting your scraper into the existing engine shape rather than inventing a new execution path. The durable engine is already responsible for scheduling, retries, HTTP execution, DB lifecycle, and queue policy. Your job is to describe the site's behavior in a site definition with submit verbs, scripts, migrations, fixtures, and tests.
This tutorial now covers the fallback Go-native path only. Use it when the site truly needs custom Go-owned behavior beyond what site.yaml plus JavaScript can express.
If your site can be expressed with site.yaml plus JavaScript, start with:
```sh
scraper help scraper-adding-a-declarative-site
```

Before starting, make sure you understand:

```sh
scraper help scraper-architecture-overview
scraper help scraper-runtime-model
scraper help scraper-js-api-reference
```

It also helps to read one simple site and one complex site under the repo-level sites/ directory:

- sites/jsdemo/ (simple)
- sites/nereval/ (complex)

Before writing any Go, start from a declarative site under sites/<site>/ and ask what is still missing.
You only need this Go-native path when at least one of these is true:
If none of those are true, stop and use scraper help scraper-adding-a-declarative-site instead.
If you do need Go, create a normal Go package under pkg/sites/<site>/ that contributes the extra runtime behavior, then point it at the declarative content under sites/<site>/.
Typical additions in a Go-native site are:
Every site should have at least one operator entrypoint that seeds the first durable work. This entrypoint is the submit verb: it lives in verbs/ and uses `__verb__`.
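The real submit verb is a JavaScript file, so the following is only a Go model of its contract. It uses a hypothetical `SeedOp` type and a hypothetical `start_url` value. The `values` map stands in for ctx.values. It rests on one assumption, drawn from the engine owning HTTP execution and the DB lifecycle: the verb validates operator input and returns the first durable ops, and leaves fetching and DB writes to the durable scripts the engine schedules later.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// SeedOp is a HYPOTHETICAL stand-in for a durable unit of work; the engine's
// real op type and the JS ctx.emit(...) payload shape are not modeled here.
type SeedOp struct {
	Script string            // durable script to run, e.g. "seed.js"
	Input  map[string]string // what the script would see as ctx.input
}

// SeedOps models a submit verb: validate operator values, then return the
// first durable ops. It does no HTTP and no DB writes.
func SeedOps(values map[string]string) ([]SeedOp, error) {
	start := strings.TrimSpace(values["start_url"]) // hypothetical input name
	if start == "" {
		return nil, fmt.Errorf("missing required value %q", "start_url")
	}
	u, err := url.Parse(start)
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return nil, fmt.Errorf("start_url must be an absolute http(s) URL, got %q", start)
	}
	return []SeedOp{{Script: "seed.js", Input: map[string]string{"url": u.String()}}}, nil
}
```

Rejecting bad input here means a typo fails at submit time rather than as a retried durable op.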
Use js-demo as the smallest pattern:

- sites/jsdemo/verbs/seed.js

Use hackernews for an HTTP-based site:

- sites/hackernews/verbs/seed.js

Use nereval when the site needs more complex input:

- sites/nereval/verbs/seed.js

The submit verb should:

- ctx.values

It should not:
Your durable site behavior lives in scripts/. Start with the smallest graph that proves the site can run in the engine.
Typical shapes:
- seed.js, extract_frontpage.js (single-page sites such as hackernews)
- seed.js, extract_list.js, extract_detail.js (list and detail sites such as nereval)
- scripts/lib/ for modules shared between scripts

Use the helper modules already exposed by the runtime:

- require("site-db")
- require("scraper-db")

Use the runtime context carefully:

- ctx.input
- ctx.dep(...)
- ctx.emit(...)
- ctx.writeRecord(...) and ctx.writeArtifact(...)

See scraper help scraper-js-api-reference for the complete API.
For examples, see:

- sites/hackernews/scripts/extract_frontpage.js
- sites/slashdot/scripts/extract_frontpage.js
- sites/nereval/scripts/extract_list.js
- sites/nereval/scripts/extract_detail.js

If the site needs queryable output, define it in migrations/. The site DB should contain projection tables, not engine workflow state.
Migration files are numbered SQL scripts (e.g. 001_init.sql). The migration manager (pkg/sites/migrate/) discovers and applies them in order when scraper site migrate <site> runs, or automatically when the worker opens the site DB. Each migration runs in a transaction and is tracked so it only applies once.
Examples:
- sites/jsdemo/migrations/001_init.sql
- sites/nereval/migrations/001_init.sql

Keep the first migration small and query-oriented. Add only the tables that the first end-to-end workflow actually writes.
The site still needs to be reachable during bootstrap so the root command can build dynamic site verbs.
There are two normal ways to do that:
--sites-manifest-dir, SCRAPER_SITES_MANIFEST_DIRS, or ~/.scraper/config.yaml)

If you skip this step, your code may compile but the CLI will not expose the site verbs.
Use fixtures first. They make parser tests fast, deterministic, and reviewable.
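In this repo the parsers are JavaScript scripts. If your Go package carries its own parsing helpers, the same fixture-first approach applies. The sketch below assumes hypothetical story markup (`<a class="story" href="...">`) and hypothetical names `ParseStories` and `ParseFixture`. The regular expression is only adequate for a frozen fixture; a production parser should use a real HTML parser.

```go
package main

import (
	"html"
	"io/fs"
	"regexp"
)

// HYPOTHETICAL markup pattern; adjust to the real page structure.
var storyLink = regexp.MustCompile(`<a class="story" href="([^"]+)">([^<]+)</a>`)

// Story is one extracted item.
type Story struct{ URL, Title string }

// ParseStories extracts stories from a saved page, unescaping HTML entities.
func ParseStories(page []byte) []Story {
	var out []Story
	for _, m := range storyLink.FindAllSubmatch(page, -1) {
		out = append(out, Story{
			URL:   html.UnescapeString(string(m[1])),
			Title: html.UnescapeString(string(m[2])),
		})
	}
	return out
}

// ParseFixture reads a fixture from any fs.FS, so tests never touch the network.
func ParseFixture(fsys fs.FS, name string) ([]Story, error) {
	page, err := fs.ReadFile(fsys, name)
	if err != nil {
		return nil, err
	}
	return ParseStories(page), nil
}
```

In a test, `ParseFixture(os.DirFS("sites/<site>/fixtures"), "frontpage.html")` keeps parsing offline and deterministic. Compare its result against a hand-checked expected slice.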
Good fixture sets usually include:
Current examples:
- sites/hackernews/fixtures/frontpage.html
- sites/slashdot/fixtures/frontpage.html
- sites/nereval/fixtures/

Do not stop at parser unit tests. Add at least one test that exercises:

- site <site> run <verb>
- engine status
- worker run

The strongest current examples are in pkg/cmd/site_test.go. A test like that proves the site works in the actual engine path rather than only in a hand-rolled helper.
Before committing:
```sh
gofmt -w pkg/sites/<site> pkg/cmd/site_test.go
go test ./... -count=1
```
If the site adds or changes help pages, also verify:
```sh
go run ./cmd/scraper --sites-manifest-dir ./sites help <your-slug>
```
| Problem | Cause | Solution |
|---|---|---|
| The CLI does not show the site | The site manifests were not available during bootstrap, or Go-native registration is incomplete | Check the bootstrap manifest dirs first, then review the root-command construction path |
| site <site> run <verb> exists but does nothing useful | The submit verb emitted no initial ops | Review the verbs/ file first |
| The worker runs but the site DB stays empty | The durable scripts never write projections or they errored out early | Add assertions around ctx.dep(...), artifact extraction, and site-db writes |
| Pagination or detail fan-out duplicates work | The script emits duplicate child ops | Add workflow-local dedup checks or explicit dedup keys before changing the engine |
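For the duplicate fan-out row, workflow-local dedup usually means normalizing each child key and skipping keys already seen. The sketch below makes three assumptions: child work is identified by URL, only scheme, host case, and fragment are normalized, and the seen set is plain memory. It does not show how the real runtime would persist that set across ops, for example in site-db or through explicit dedup keys on emitted ops. `DedupKey` and `UniqueChildren` are hypothetical names.

```go
package main

import (
	"net/url"
	"strings"
)

// DedupKey normalizes a child URL so trivially different spellings collide.
// Which parts are safe to normalize is site-specific; this ASSUMES only scheme
// and host case and the fragment are irrelevant.
func DedupKey(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	u.Scheme = strings.ToLower(u.Scheme)
	u.Host = strings.ToLower(u.Host)
	u.Fragment = ""
	u.RawFragment = ""
	return u.String(), nil
}

// UniqueChildren returns the URLs that should become child ops, skipping any
// whose key is already in seen, and records the new keys in seen.
func UniqueChildren(urls []string, seen map[string]bool) ([]string, error) {
	var out []string
	for _, raw := range urls {
		key, err := DedupKey(raw)
		if err != nil {
			return nil, err
		}
		if seen[key] {
			continue
		}
		seen[key] = true
		out = append(out, raw)
	}
	return out, nil
}
```

Because `seen` is shared across calls, a detail URL that appears on two list pages is emitted only once. Fixing this in the site script keeps the engine's queue policy untouched, as the table recommends.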
- scraper help scraper-new-developer-onboarding: Suggested first-day path through the existing sites
- scraper help scraper-bootstrap-config-and-site-manifest-loading: How manifest directories are discovered before dynamic site commands exist
- scraper help scraper-runtime-model: Why submit verbs and durable scripts are separate
- scraper help scraper-js-api-reference: Complete JavaScript API reference
- scraper help scraper-architecture-overview: Broader system map