Schema + Indexer E2E

This guide describes an operational workflow for blue/green schema apply, full and partitioned reindex, retention management, and safe delete‑stale.

Audience: operators and engineers running indexing in production
Scope: engine APIs and CLI tasks; network‑safe, redaction‑aware

Related: Schema, Indexer, CLI, Observability, DX, Troubleshooting

Overview

Blue/green keeps downtime minimal and rollbacks safe:

- Apply schema changes to a new physical collection (versioned name)
- Rebuild data (full or partitioned) into that new physical
- Validate correctness and performance
- Atomically swap the alias to promote the new physical
- Retain prior physicals for a short window; drop beyond retention

If you create a physical, import into it, and call Client#upsert_alias directly, retention cleanup will NOT run. Old physicals remain until explicitly deleted. Use Schema.apply! (or search_engine:schema:apply[…]) to run the full blue/green lifecycle including retention.
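To make the difference between a manual alias swap and the full lifecycle concrete, here is a minimal in-memory sketch. `FakeCluster`, its methods, and the `products_v1`-style names are hypothetical stand-ins for illustration only; they are not the library's API and do not follow its physical naming scheme.

```ruby
# In-memory stand-in for a search cluster. Every name here is hypothetical.
class FakeCluster
  attr_reader :collections, :aliases

  def initialize
    @collections = [] # physical collection names, oldest first
    @aliases = {}     # logical name => physical name
  end

  def create_physical(name)
    @collections << name
  end

  # Re-pointing the alias is a single assignment, so readers see either
  # the old target or the new one, never a mix of both.
  def upsert_alias(logical, physical)
    @aliases[logical] = physical
  end

  # Drops physicals beyond the retention window; never drops the alias target.
  def enforce_retention(logical, keep:)
    current = @aliases[logical]
    old = @collections.select { |c| c.start_with?("#{logical}_") && c != current }
    doomed = old.first([old.size - keep, 0].max)
    @collections -= doomed
    doomed
  end
end

cluster = FakeCluster.new
%w[products_v1 products_v2 products_v3].each { |name| cluster.create_physical(name) }
cluster.upsert_alias("products", "products_v2")

# Manual path: swap only. Nothing is cleaned up afterwards.
cluster.upsert_alias("products", "products_v3")
manual_leftovers = cluster.collections.dup

# Full lifecycle: the same swap followed by retention (keep one prior physical).
dropped = cluster.enforce_retention("products", keep: 1)
```

After the manual swap all three physicals still exist; only the retention step removes `products_v1`, and it leaves `products_v2` available for rollback.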

See: Schema, Indexer, CLI

Schema lifecycle: diff → apply → validate → swap → rollback

Physical naming: “<logical>YYYYMMDD_HHMMSS###” (timestamp + sequence). Logical name is stable and exposed via alias (e.g., products).
Diff compares compiled vs live (aliased physical): fields, types, and selected options. Intentionally immutable aspects (e.g., field type narrowing) are flagged for safe migration.
Apply creates a new physical collection; reindex populates it; failures leave the alias untouched for inspection.
Validation before swap: sample queries, mapping sanity, counts/order checks on key paths. Prefer dry_run! and the offline client (or stub client) for exploratory checks.
Alias swap: traffic moves atomically to the new physical via an upsert; old physicals remain per retention.
Rollback: re‑point alias to prior retained physical if available; consider documents indexed after the swap window.
Observability hooks (predictive): search_engine.schema.diff, search_engine.schema.apply, search_engine.alias.swap (swap often occurs within schema.apply). See Observability.
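The diff step can be pictured as a comparison of two field maps. The sketch below assumes a simplified representation (field name mapped to a hash of type and options); the `diff_fields` helper and the sample fields are hypothetical, and the real diff may cover more options than shown here.

```ruby
# Hypothetical field maps: name => { type:, optional: }.
# This is not the library's internal representation.
def diff_fields(compiled, live)
  added   = compiled.keys - live.keys
  removed = live.keys - compiled.keys
  changed = (compiled.keys & live.keys).reject { |name| compiled[name] == live[name] }
  { added: added, removed: removed, changed: changed }
end

compiled = {
  "name"  => { type: "string", optional: false },
  "price" => { type: "float",  optional: false },
  "brand" => { type: "string", optional: true }
}
live = {
  "name"  => { type: "string", optional: false },
  "price" => { type: "int32",  optional: false },
  "sku"   => { type: "string", optional: false }
}

result = diff_fields(compiled, live)
# A non-empty diff means a new physical is needed before the alias moves.
needs_apply = !result.values.all?(&:empty?)
```

Here `brand` is added, `sku` is removed, and `price` changes type, so an apply would be required.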

Indexer overview: sources, Mapper DSL, partitions, hooks, dispatcher

Sources: feed batched rows
- ActiveRecord (ORM) for in‑app queries and scopes
- SQL for tuned reads and keyset iteration
- Lambda/stream for external APIs and services
Mapper DSL: transforms rows to documents (field renames, coercions, join lookups, defaults), validates required fields, and emits mapping metrics.
Partitions: choose a stable key (e.g., shop, shard, updated_at bucket). Full rebuild runs once over all data. Partitioned rebuild reduces risk and isolates failures.
Hooks: before_partition/after_partition for enrichment, throttling, metrics; must be idempotent.
Dispatcher: fans out partitions inline or via background jobs, with back‑pressure and retries; batch size and timeouts are tunable.
Events (predictive): search_engine.indexer.enqueue, search_engine.indexer.batch_import, search_engine.indexer.retry, search_engine.indexer.complete. Also see existing events indexer.partition_start, indexer.partition_finish, indexer.batch_import.
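The interaction between partitions, hooks, and retries can be sketched with a small inline dispatcher. `InlineDispatcher`, its constructor options, and the hook signatures are assumptions made for illustration; the library's dispatcher and retry policy may differ.

```ruby
# Hypothetical inline dispatcher, not the library's implementation.
class InlineDispatcher
  def initialize(max_attempts: 3, before_partition: nil, after_partition: nil)
    @max_attempts = max_attempts
    @before_partition = before_partition
    @after_partition = after_partition
  end

  # Runs each partition in isolation and returns { partition => :ok | :failed }.
  def run(partitions, &import)
    partitions.to_h { |partition| [partition, run_partition(partition, &import)] }
  end

  private

  def run_partition(partition)
    attempts = 0
    begin
      attempts += 1
      @before_partition&.call(partition) # runs again on every retry
      yield partition
      @after_partition&.call(partition)
      :ok
    rescue StandardError
      retry if attempts < @max_attempts
      :failed # one bad partition does not abort the others
    end
  end
end

started = []
dispatcher = InlineDispatcher.new(
  max_attempts: 2,
  before_partition: ->(partition) { started << partition }
)

results = dispatcher.run(%w[shop_1 shop_2 shop_3]) do |partition|
  raise "simulated outage" if partition == "shop_2"
end
```

In this run the `before_partition` hook fires twice for `shop_2` because of the retry, which is why hooks need to be idempotent; the other partitions complete regardless of the failure.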

See: Indexer, CLI, Observability

CLI tasks & expected outcomes

Use these tasks to drive the lifecycle. They respect configured timeouts, batch sizes, redaction, and dispatch mode. See CLI.

rails 'search_engine:schema:apply[products]'

- What: create new physical → reindex → alias swap → retention cleanup
- Preconditions: collection registered; API key valid; network reachable
- Expected: summary with created/updated fields, new/previous physical, retention results

rails 'search_engine:index:rebuild[products]'

- What: full rebuild or partition fan‑out via dispatcher; batches streamed with retries
- Preconditions: schema exists and alias resolves; sources accessible
- Expected: summary of batches/partitions processed; retry counts, durations
- Tip: model API supports optional dependency preflight
  - SearchEngine::Book.index_collection(pre: :ensure) (ensure presence only)
  - SearchEngine::Book.index_collection(pre: :index) (ensure + fix drift)

rails 'search_engine:index:delete_stale[products]'

- What: delete documents not in the current source‑of‑truth snapshot via your stale rules (declared inside index and OR‑merged into a filter_by)
- Preconditions: at least one stale rule resolves to a non‑empty filter; strict mode or dry‑run configured as needed
- Expected: summary of candidates and deletions (or preview in dry‑run)

Apply flow

Partitioned rebuild flow

Retention & delete‑stale

Retention: keep a limited number of recent physicals (configurable globally and per collection) for debug/rollback. Old physicals beyond the window are dropped; the alias target is never dropped. See Schema.
Delete‑stale: identify documents that should no longer exist based on a source‑of‑truth snapshot. Declare stale rules that compile into a filter_by string (e.g., archived flag, partition + archived, date thresholds). Strict mode prevents catch‑alls.
Guardrails: dry‑run preview, maximum delete thresholds, and sampling pre‑checks before destructive steps. Logs are redacted and include a short filter hash.
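As an illustration of how several stale rules might compile into one filter, the sketch below OR-merges rule fragments and derives a short hash for logging. The helpers `merge_stale_rules` and `filter_hash`, the field names, and the 8-character SHA-256 prefix are all assumptions; only the `||` and `&&` operators come from the Typesense filter_by syntax.

```ruby
require "digest"

# Hypothetical helper: OR-merges stale rule fragments into one filter_by string.
# Blank fragments are ignored; strict mode refuses to continue with no filter,
# so a delete can never silently match everything.
def merge_stale_rules(rules, strict: true)
  fragments = rules.map { |rule| rule.to_s.strip }.reject(&:empty?)
  if fragments.empty?
    raise ArgumentError, "no stale rule resolved to a filter" if strict
    return nil
  end
  fragments.map { |fragment| "(#{fragment})" }.join(" || ")
end

# Short tag for logs in place of the raw filter (assumed format).
def filter_hash(filter)
  Digest::SHA256.hexdigest(filter)[0, 8]
end

rules = ["archived:=true", "shop_id:=42 && archived:=true", ""]
filter = merge_stale_rules(rules)
tag = filter_hash(filter)
```

With these rules, `filter` is `(archived:=true) || (shop_id:=42 && archived:=true)`. Wrapping each fragment in parentheses keeps an `&&` inside one rule from binding across the `||` boundaries.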

See: Indexer, CLI

Safety thresholds

Configure strict mode and optional estimation to avoid large, accidental purges. Under strict settings, the task exits with a non‑success status when a threshold is violated.
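A threshold check of this kind could look like the following sketch. The `check_delete_threshold` helper, its parameter names, and the 10% default are assumptions for illustration, not documented configuration.

```ruby
# Hypothetical guard: compares the estimated share of documents to delete
# against a maximum ratio. Names and the default limit are assumptions.
def check_delete_threshold(candidates:, total:, max_ratio: 0.10, strict: true)
  ratio = total.zero? ? 0.0 : candidates.to_f / total
  return { ok: true, ratio: ratio } if ratio <= max_ratio

  message = format("delete-stale would remove %.1f%% of documents (limit %.1f%%)",
                   ratio * 100, max_ratio * 100)
  # Non-strict mode reports the violation but still allows the run.
  { ok: !strict, ratio: ratio, message: message }
end

small = check_delete_threshold(candidates: 50, total: 10_000)
large = check_delete_threshold(candidates: 4_000, total: 10_000)

# A task wrapper would translate the result into its exit status.
exit_code = large[:ok] ? 0 : 1
```

Deleting 50 of 10,000 documents passes, while 4,000 of 10,000 exceeds the limit and yields a non-success exit code under strict mode.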

Safety, performance & tuning

Idempotency: imports and hooks must tolerate retries and duplicate batches.
Retries & back‑pressure: transient failures back off; 413 splits batches; timeouts are configurable. Avoid hot partitions by balancing partition keys.
Tuning knobs: batch size, concurrency, dispatch mode, and request timeouts.
Query behavior: schema changes can alter ranking/filtering; coordinate with presets and curation for consistent results.
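The retry behavior described above (back off on transient failures, split the batch on 413) can be sketched as follows. The error classes, `import_with_retries`, and its parameters are hypothetical; the actual client raises its own error types and applies its own policy.

```ruby
# Hypothetical error types standing in for whatever the client raises.
class PayloadTooLarge < StandardError; end
class TransientError < StandardError; end

# Imports a batch; halves it on 413 and backs off on transient failures.
# Returns the sizes of the batches that were eventually imported.
def import_with_retries(batch, max_attempts: 3, base_delay: 0.0, &import)
  attempts = 0
  begin
    attempts += 1
    import.call(batch)
    [batch.size]
  rescue PayloadTooLarge
    raise if batch.size <= 1 # a single oversized document cannot be split
    mid = batch.size / 2
    opts = { max_attempts: max_attempts, base_delay: base_delay }
    import_with_retries(batch[0...mid], **opts, &import) +
      import_with_retries(batch[mid..-1], **opts, &import)
  rescue TransientError
    raise if attempts >= max_attempts
    sleep(base_delay * (2**(attempts - 1))) # exponential backoff
    retry
  end
end

flaky = true
sizes = import_with_retries((1..10).to_a) do |docs|
  raise PayloadTooLarge if docs.size > 4 # simulated server limit
  if flaky
    flaky = false
    raise TransientError # fails once, then succeeds on retry
  end
end
```

Because a retried or split batch may reach the server more than once, the import itself has to be idempotent, for example by upserting on a stable document id.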

See: Deferred Typesense Features, Observability

Debugging & DX

Use dry_run! on representative relations to validate mapping and query readiness against the new schema without I/O.
Use explain to surface selection, grouping, joins, and potential conflicts; use to_curl to reproduce requests (redacted).
Doctor task: search_engine:doctor validates environment and connectivity.

See: DX, Observability, CLI

Troubleshooting

Insufficient privileges for schema tasks → see Schema permissions
Alias not found or mispointed → see Alias swap
Indexer job failure (batch too large, timeout) → see Indexer and CLI
Delete‑stale threshold exceeded → see Safety thresholds

Schema permissions

Ensure the API key can create collections, update aliases, and delete collections. Validate with search_engine:doctor; confirm host/port/protocol and timeouts. For CI, inject keys via ENV. See CLI and Troubleshooting.
