This guide describes an operational workflow for blue/green schema apply, full and partitioned reindex, retention management, and safe delete‑stale.
  • Audience: operators and engineers running indexing in production
  • Scope: engine APIs and CLI tasks; network‑safe, redaction‑aware
Related: Schema, Indexer, CLI, Observability, DX, Troubleshooting

Overview

Blue/green keeps downtime minimal and rollbacks safe:
  • Apply schema changes to a new physical collection (versioned name)
  • Rebuild data (full or partitioned) into that new physical
  • Validate correctness and performance
  • Atomically swap the alias to promote the new physical
  • Retain prior physicals for a short window; drop beyond retention
If you create a physical, import into it, and call Client#upsert_alias directly, retention cleanup will NOT run. Old physicals remain until explicitly deleted. Use Schema.apply! (or search_engine:schema:apply[…]) to run the full blue/green lifecycle including retention.
See: Schema, Indexer, CLI

Schema lifecycle: diff → apply → validate → swap → rollback

  • Physical naming: “<logical>YYYYMMDD_HHMMSS###” (timestamp + sequence). Logical name is stable and exposed via alias (e.g., products).
  • Diff compares compiled vs live (aliased physical): fields, types, and selected options. Intentionally immutable aspects (e.g., field type narrowing) are flagged for safe migration.
  • Apply creates a new physical collection; reindex populates it; failures leave the alias untouched for inspection.
  • Validation before swap: sample queries, mapping sanity, counts/order checks on key paths. Prefer dry_run! and the offline client (or stub client) for exploratory checks.
  • Alias swap: traffic moves atomically to the new physical via an upsert; old physicals remain per retention.
  • Rollback: re‑point the alias to a prior retained physical, if one is still available. Documents indexed after the swap exist only in the new physical, so plan to re‑import them after rolling back.
  • Observability hooks (predictive): search_engine.schema.diff, search_engine.schema.apply, search_engine.alias.swap (swap often occurs within schema.apply). See Observability.
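The lifecycle above can be sketched with an in-memory stand-in. This is a minimal illustration, not the engine's implementation: `FakeCluster`, its method names, the underscore-separated physical name, and the count-based validation are all assumptions made for the example.

```ruby
# Hypothetical in-memory model of blue/green promotion. Nothing here is part
# of the engine API; it only illustrates the order of operations.
class FakeCluster
  attr_reader :collections, :aliases

  def initialize
    @collections = {} # physical name => array of documents
    @aliases = {}     # logical name  => physical name
  end

  # Create a new versioned physical and fill it; the alias is not touched.
  # The "_" separator is an assumption made for this sketch.
  def build_physical(logical, stamp, docs)
    physical = "#{logical}_#{stamp}"
    @collections[physical] = docs.dup
    physical
  end

  # Promote only when validation passes; on failure the alias stays as it was.
  def promote(logical, physical, expected_count:)
    return false unless @collections.fetch(physical).size == expected_count

    @aliases[logical] = physical # one assignment stands in for the atomic alias upsert
    true
  end

  # Drop the oldest non-live physicals beyond `keep`; never the alias target.
  def enforce_retention(logical, keep:)
    current = @aliases[logical]
    olds = @collections.keys
                       .select { |name| name.start_with?("#{logical}_") && name != current }
                       .sort # timestamped names sort chronologically
    doomed = olds.first([olds.size - keep, 0].max)
    doomed.each { |name| @collections.delete(name) }
    doomed
  end
end
```

A failed validation leaves the previous alias target serving traffic, which is the property that makes the new physical safe to inspect and rebuild.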

Indexer overview: sources, Mapper DSL, partitions, hooks, dispatcher

  • Sources: feed batched rows
    • ActiveRecord (ORM) for in‑app queries and scopes
    • SQL for tuned reads and keyset iteration
    • Lambda/stream for external APIs and services
  • Mapper DSL: transforms rows to documents (field renames, coercions, join lookups, defaults), validates required fields, and emits mapping metrics.
  • Partitions: choose a stable key (e.g., shop, shard, updated_at bucket). Full rebuild runs once over all data. Partitioned rebuild reduces risk and isolates failures.
  • Hooks: before_partition/after_partition for enrichment, throttling, metrics; must be idempotent.
  • Dispatcher: fans out partitions inline or via background jobs, with back‑pressure and retries; batch size and timeouts are tunable.
  • Events (predictive): search_engine.indexer.enqueue, search_engine.indexer.batch_import, search_engine.indexer.retry, search_engine.indexer.complete. Also see existing events indexer.partition_start, indexer.partition_finish, indexer.batch_import.
See: Indexer, CLI, Observability
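To make the partition, hook, and retry interplay concrete, here is a minimal inline sketch. `rebuild_partitions` and its keyword arguments are hypothetical names for this example and do not reflect the engine's dispatcher API; the caller's block stands in for the actual import.

```ruby
# Hypothetical sketch: process each partition independently with bounded retries.
# A failure in one partition is recorded and does not stop the others.
def rebuild_partitions(partitions, max_attempts: 3, before: nil, after: nil)
  results = {}
  partitions.each do |key|
    attempts = 0
    begin
      attempts += 1
      before&.call(key)     # may run again on retry, so hooks must be idempotent
      imported = yield(key) # caller imports the partition and returns a document count
      after&.call(key)
      results[key] = { status: :ok, attempts: attempts, imported: imported }
    rescue StandardError => e
      retry if attempts < max_attempts
      results[key] = { status: :failed, attempts: attempts, error: e.message }
    end
  end
  results
end
```

Because each partition carries its own attempt counter and result, a failed partition can be re-run on its own without repeating the ones that already succeeded.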

CLI tasks & expected outcomes

Use these tasks to drive the lifecycle. They respect configured timeouts, batch sizes, redaction, and dispatch mode. See CLI.
rails 'search_engine:schema:apply[products]'
    • What: create new physical → reindex → alias swap → retention cleanup
    • Preconditions: collection registered; API key valid; network reachable
    • Expected: summary with created/updated fields, new/previous physical, retention results
rails 'search_engine:index:rebuild[products]'
    • What: full rebuild or partition fan‑out via dispatcher; batches streamed with retries
    • Preconditions: schema exists and alias resolves; sources accessible
    • Expected: summary of batches/partitions processed; retry counts, durations
    • Tip: model API supports optional dependency preflight
      • SearchEngine::Book.index_collection(pre: :ensure) (ensure presence only)
      • SearchEngine::Book.index_collection(pre: :index) (ensure + fix drift)
rails 'search_engine:index:delete_stale[products]'
    • What: delete documents not in the current source‑of‑truth snapshot via your stale rules (declared inside index and OR‑merged into a filter_by)
    • Preconditions: at least one stale rule resolves to a non‑empty filter; strict mode or dry‑run configured as needed
    • Expected: summary of candidates and deletions (or preview in dry‑run)

Apply flow


Partitioned rebuild flow


Retention & delete‑stale

  • Retention: keep a limited number of recent physicals (configurable globally and per collection) for debug/rollback. Old physicals beyond the window are dropped; the alias target is never dropped. See Schema.
  • Delete‑stale: identify documents that should no longer exist based on a source‑of‑truth snapshot. Declare stale rules that compile into a filter_by string (e.g., archived flag, partition + archived, date thresholds). Strict mode prevents catch‑alls.
  • Guardrails: dry‑run preview, maximum delete thresholds, and sampling pre‑checks before destructive steps. Logs are redacted and include a short filter hash.
See: Indexer, CLI
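As an illustration of how stale rules might compile into a single filter_by string, the sketch below OR-merges rule fragments and rejects catch-alls in strict mode. The names `compile_stale_filter`, `filter_fingerprint`, and `StaleFilterError`, the lambda-based rule shape, and the "must contain a field clause" heuristic are assumptions for this example, not the engine's DSL.

```ruby
require "digest"

# Hypothetical error type for this sketch.
class StaleFilterError < StandardError; end

# Each rule is a callable that returns a filter fragment (or nil/"" to opt out).
# Fragments are parenthesized and OR-merged into one filter_by string.
def compile_stale_filter(rules, context = {}, strict: true)
  fragments = rules.map { |rule| rule.call(context).to_s.strip }.reject(&:empty?)
  raise StaleFilterError, "no stale rule produced a filter" if fragments.empty?

  # Rough strict-mode heuristic (an assumption): every fragment must contain a
  # "field:" clause, which rejects bare catch-alls such as "*".
  if strict && fragments.any? { |fragment| !fragment.include?(":") }
    raise StaleFilterError, "catch-all fragment rejected in strict mode"
  end

  fragments.map { |fragment| "(#{fragment})" }.join(" || ")
end

# Short, stable hash of the filter so logs can reference it without printing it.
def filter_fingerprint(filter)
  Digest::SHA256.hexdigest(filter)[0, 8]
end
```

For example, an "archived" rule combined with a per-shop rule would yield `(archived:=true) || (shop_id:=7 && archived:=true)` when the context supplies `shop_id: 7`.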

Safety thresholds

Configure strict mode and optional estimation to avoid large, accidental purges. Under strict settings, the task exits with a non‑success status when a threshold is violated.
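A threshold check of this kind could look like the following sketch. The function name, the ratio-based limit, and the default of 20% are assumptions chosen for illustration; the engine's actual settings may be expressed differently.

```ruby
# Hypothetical guard: compare the estimated delete count against a maximum
# share of the collection and derive a process exit code from it.
def check_delete_threshold(candidates:, total:, max_ratio: 0.2, strict: true)
  ratio = total.zero? ? 0.0 : candidates.to_f / total
  within_limit = ratio <= max_ratio
  {
    ok: within_limit,
    ratio: ratio,
    # Non-strict mode reports the violation but still exits successfully.
    exit_code: within_limit || !strict ? 0 : 1
  }
end
```

A caller would run the estimate first, print the result, and pass `exit_code` to `exit` so that CI pipelines stop before a destructive delete.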

Safety, performance & tuning

  • Idempotency: imports and hooks must tolerate retries and duplicate batches.
  • Retries & back‑pressure: transient failures back off; 413 splits batches; timeouts are configurable. Avoid hot partitions by balancing partition keys.
  • Tuning knobs: batch size, concurrency, dispatch mode, and request timeouts.
  • Query behavior: schema changes can alter ranking/filtering; coordinate with presets and curation for consistent results.
See: Deferred Typesense Features, Observability
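The "413 splits batches" behavior can be sketched as a recursive halving strategy. `PayloadTooLarge` and `import_with_split` are hypothetical names for this example; the block passed in stands in for whatever call actually sends a batch.

```ruby
# Hypothetical error raised by the import block when the server answers 413.
class PayloadTooLarge < StandardError; end

# Try to import the whole batch; on a 413-style error, split it in half and
# import each half recursively. Returns the number of documents imported.
def import_with_split(batch, &import)
  return 0 if batch.empty?

  import.call(batch)
  batch.size
rescue PayloadTooLarge
  raise if batch.size == 1 # a single oversized document cannot be split further

  mid = batch.size / 2
  import_with_split(batch.take(mid), &import) + import_with_split(batch.drop(mid), &import)
end
```

Halving keeps the number of extra requests logarithmic in the batch size, and a single document that is still too large surfaces as an error instead of looping forever.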

Debugging & DX

  • Use dry_run! on representative relations to validate mapping and query readiness against the new schema without I/O.
  • Use explain to surface selection, grouping, joins, and potential conflicts; use to_curl to reproduce requests (redacted).
  • Doctor task: search_engine:doctor validates environment and connectivity.
See: DX, Observability, CLI

Troubleshooting

Schema permissions

Ensure the API key can create collections, update aliases, and delete collections. Validate with search_engine:doctor; confirm host/port/protocol and timeouts. For CI, inject keys via ENV. See CLI and Troubleshooting.
Related links: Schema, Indexer, CLI, Observability, DX, Troubleshooting, Deferred Typesense Features