Adding a Declarative Site

Step-by-step guide for adding a scraper site with site.yaml, JavaScript verbs and scripts, and no site-specific Go code.

If your site does not need custom native Go modules or special runtime hooks, the preferred path is now declarative: define the site with a site.yaml manifest, keep the scraping behavior in JavaScript, and let the existing engine handle execution, retries, queue policies, and persistence.

This tutorial shows the no-Go path for adding a site.

When To Use This Path

Use the declarative path when:

  • your site can be described with roots for scripts, verbs, migrations, and optional help
  • queue policies are pure metadata
  • the site only needs the standard runtime modules already supported by the manifest loader

Do not use this path yet when:

  • the site requires a one-off Go-native module
  • the site needs custom runtime registrars
  • the site needs custom CLI wiring that cannot be shared

For those cases, follow scraper help scraper-adding-a-site and keep the site Go-defined.

Step 1 — Create The Site Directory

Create a directory under sites/<site>/ with:

  • site.yaml
  • scripts/
  • verbs/
  • migrations/
  • optional fixtures/

No site-specific Go wrapper is needed for the normal declarative path. The site behavior lives entirely in YAML, SQL, and JavaScript.

Step 2 — Write site.yaml

The manifest is the declarative envelope for the site.

Minimal example:

name: example
databaseFileName: example.db
scriptsRoot: scripts
verbsRoot: verbs
sqlMigrationsRoot: migrations
modules:
  - default-registry

Optional queue policy example:

queuePolicies:
  - queue: site:example:http
    maxInFlight: 2
    rateLimit:
      ratePerSecond: 1
      burst: 2
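
To make the rateLimit numbers concrete: one common way to read ratePerSecond and burst is as a token bucket, where burst caps how many requests may start back to back and ratePerSecond sets the refill speed. The sketch below assumes that interpretation. The engine's actual limiter may behave differently, and TokenBucket and tryTake are hypothetical names used only for illustration. maxInFlight looks like a separate cap on concurrent work and is not modeled here.

```javascript
// Illustrative token bucket; NOT the engine's implementation.
// Assumption: ratePerSecond is the refill rate and burst is the bucket capacity.
class TokenBucket {
  constructor(ratePerSecond, burst) {
    this.ratePerSecond = ratePerSecond;
    this.burst = burst;
    this.tokens = burst; // start full so an initial burst is allowed
    this.lastMs = 0;
  }

  // Try to start one request at time nowMs (milliseconds on a monotonic clock).
  tryTake(nowMs) {
    const elapsedSec = (nowMs - this.lastMs) / 1000;
    this.tokens = Math.min(this.burst, this.tokens + elapsedSec * this.ratePerSecond);
    this.lastMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// With ratePerSecond: 1 and burst: 2, as in the policy above:
// three attempts at t=0ms, then one more at t=1000ms.
const bucket = new TokenBucket(1, 2);
const decisions = [0, 0, 0, 1000].map((t) => bucket.tryTake(t));
console.log(decisions); // [ true, true, false, true ]
```

Under this reading, two requests go out immediately, the third waits, and one more becomes available after a second.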

Manifest fields are validated strictly: typos and unknown keys cause the load to fail immediately.
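
As a rough illustration of what strict validation means in practice, the sketch below rejects any top-level key that is not in a known set. It is not the loader's code: the real validation happens inside the engine, the key list is taken only from the examples in this tutorial and is probably incomplete, and unknownManifestKeys is a hypothetical helper name.

```javascript
// Illustration of strict key checking; the real loader is part of the engine.
// Assumption: this key list comes only from the examples in this tutorial.
const KNOWN_KEYS = new Set([
  "name",
  "databaseFileName",
  "scriptsRoot",
  "verbsRoot",
  "sqlMigrationsRoot",
  "modules",
  "queuePolicies",
]);

// Return the top-level keys of a parsed manifest that are not recognized.
function unknownManifestKeys(manifest) {
  return Object.keys(manifest).filter((key) => !KNOWN_KEYS.has(key));
}

// "scriptRoot" (missing the "s") is the kind of typo strict validation catches.
const parsed = {
  name: "example",
  databaseFileName: "example.db",
  scriptRoot: "scripts",
};
const unknown = unknownManifestKeys(parsed);
console.log(unknown); // [ 'scriptRoot' ]
```

A lenient loader would silently ignore scriptRoot and fall back to a default, which is exactly the class of bug that failing fast avoids.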

Step 3 — Make The Site Discoverable During Bootstrap

Scraper must know where your sites/<site>/ directory lives before it can build dynamic commands like site <site> run <verb>.

Choose one of these bootstrap paths:

  • pass --sites-manifest-dir /path/to/sites
  • set SCRAPER_SITES_MANIFEST_DIRS
  • add the directory to ~/.scraper/config.yaml

Example config:

sitesManifestDirs:
  - /absolute/path/to/my-sites

You can use multiple site directories. Scraper merges config, env, and CLI bootstrap values before building the Cobra command tree.
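
The merge can be pictured as combining three lists of directories into one. The sketch below is only a mental model: the ordering (config, then env, then CLI) and the de-duplication are assumptions, not documented behavior, and mergeSiteDirs is a hypothetical helper. See scraper help scraper-bootstrap-config-and-site-manifest-loading for the actual rules.

```javascript
// Hypothetical helper; merge order and de-duplication are assumptions.
// Each argument is an already-parsed list of directory paths.
function mergeSiteDirs(fromConfig, fromEnv, fromCli) {
  const seen = new Set();
  const merged = [];
  for (const dir of [...fromConfig, ...fromEnv, ...fromCli]) {
    // Skip empty entries and directories that were already listed.
    if (dir !== "" && !seen.has(dir)) {
      seen.add(dir);
      merged.push(dir);
    }
  }
  return merged;
}

const dirs = mergeSiteDirs(
  ["/absolute/path/to/my-sites"],           // from ~/.scraper/config.yaml
  ["/srv/team-sites"],                      // from SCRAPER_SITES_MANIFEST_DIRS (example value)
  ["/absolute/path/to/my-sites", "./sites"] // from --sites-manifest-dir
);
console.log(dirs);
// [ '/absolute/path/to/my-sites', '/srv/team-sites', './sites' ]
```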

Step 4 — Write The Submit Verbs

The entrypoint still lives in verbs/ and uses the normal submit-verb model.

Typical responsibilities:

  • read values from ctx.values
  • create the initial durable ops
  • optionally set a target op

The verb should not perform the crawl directly. It seeds the workflow graph.
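
The split between seeding and crawling can be sketched as a pure function that turns ctx.values into a list of op descriptors and does no network work itself. Everything about the descriptor shape here (kind, queue, input), the value names (baseUrl, pages), and the buildSeedOps helper is hypothetical. How a verb actually submits ops is defined by the runtime, so consult scraper help scraper-js-api-reference and the verbs in sites/jsdemo/ for the real calls.

```javascript
// Hypothetical shape for seed ops; field names (kind, queue, input) are
// illustrative assumptions, not the engine's schema.
function buildSeedOps(values) {
  const pages = Number(values.pages || 1);
  const ops = [];
  for (let page = 1; page <= pages; page++) {
    ops.push({
      kind: "fetch-listing",      // hypothetical script name
      queue: "site:example:http", // queue name from the policy example
      input: { url: `${values.baseUrl}?page=${page}` },
    });
  }
  return ops;
}

// Stand-in for the ctx a verb receives; only ctx.values is used.
const ctx = { values: { baseUrl: "https://example.com/list", pages: 2 } };
const seedOps = buildSeedOps(ctx.values);
console.log(seedOps.map((op) => op.input.url));
// [ 'https://example.com/list?page=1', 'https://example.com/list?page=2' ]
```

Note that nothing is fetched here: the verb only describes the first layer of work, and the durable scripts do the rest.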

Step 5 — Write The Durable Scripts

Put the durable execution logic in scripts/.

Use the existing JS runtime APIs:

  • ctx.input
  • ctx.dep(...)
  • ctx.emit(...)
  • ctx.writeRecord(...)
  • ctx.writeArtifact(...)

See:

  • scraper help scraper-js-api-reference
  • sites/jsdemo/scripts/
  • sites/hackernews/scripts/
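
To show how these APIs might fit together, here is a sketch of a script body exercised against a stand-in ctx so that it runs outside the engine. The argument shapes for ctx.dep, ctx.emit, ctx.writeRecord, and ctx.writeArtifact are assumptions made for illustration, as are makeFakeCtx, parseListing, and the op and record fields. The reference page and the example sites listed above are authoritative.

```javascript
// Stand-in ctx for running outside the engine. The argument shapes below are
// assumptions; check scraper-js-api-reference for the real signatures.
function makeFakeCtx(input, deps) {
  const log = { emitted: [], records: [], artifacts: [] };
  const ctx = {
    input,
    dep: (name) => deps[name],
    emit: (op) => log.emitted.push(op),
    writeRecord: (kind, row) => log.records.push({ kind, row }),
    writeArtifact: (name, body) => log.artifacts.push({ name, body }),
  };
  return { ctx, log };
}

// Hypothetical durable script: read an upstream result, persist rows, and
// emit one follow-up op per item.
function parseListing(ctx) {
  const page = ctx.dep("fetch"); // output of an upstream op (assumed name)
  ctx.writeArtifact("listing.json", JSON.stringify(page));
  for (const item of page.items) {
    ctx.writeRecord("item", { id: item.id, title: item.title });
    ctx.emit({ kind: "fetch-item", input: { id: item.id, from: ctx.input.url } });
  }
  return page.items.length;
}

const fake = makeFakeCtx(
  { url: "https://example.com/list?page=1" },
  { fetch: { items: [{ id: 1, title: "a" }, { id: 2, title: "b" }] } }
);
const count = parseListing(fake.ctx);
console.log(count, fake.log.records.length, fake.log.emitted.length, fake.log.artifacts.length);
// 2 2 2 1
```

Keeping the script body as a function of ctx makes this kind of stand-in test cheap, but it does not replace the end-to-end test described in Step 7.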

Step 6 — Add Migrations

If the site needs queryable projections, add numbered SQL files in migrations/.

Examples:

  • sites/jsdemo/migrations/001_init.sql
  • sites/hackernews/migrations/001_init.sql

Keep the first migration small and focused on the output your workflow actually writes.

Step 7 — Add At Least One End-To-End Test

Do not stop at unit tests for a parser or one helper function. Add a command-path or scheduler-path test that proves:

  1. the submit verb emits work
  2. the worker can execute the work
  3. the site DB receives the expected projection or artifacts

Good examples:

  • pkg/cmd/site_test.go
  • pkg/cmd/bootstrap_test.go

Step 8 — Validate

Before committing:

go test ./pkg/cmd/... -count=1
go test ./... -count=1

If you added or changed help pages:

go run ./cmd/scraper --sites-manifest-dir ./sites help scraper-adding-a-declarative-site

Practical Advice

  • Start with the smallest possible workflow graph.
  • Add queue policies only where the site actually needs protection.
  • Keep the first manifest small and boring.
  • If you feel pressure to stuff custom runtime logic into the manifest, that is a signal the site may still need a Go-native extension path.
  • Prefer keeping example/default sites in normal sites/ directories, so that changing them does not require recompiling the binary.

See Also

  • scraper help scraper-adding-a-site
  • scraper help scraper-runtime-model
  • scraper help scraper-js-api-reference
  • scraper help scraper-architecture-overview
  • scraper help scraper-bootstrap-config-and-site-manifest-loading