Migration Pipeline

The default pgferry flow is built to load data first and add expensive constraints later.

| Stage | What happens | Full | schema_only | data_only |
|-------|--------------|------|-------------|-----------|
| 1 | Load and validate config | Yes | Yes | Yes |
| 2 | Introspect the source schema | Yes | Yes | Yes |
| 3 | Create target schema and tables | Yes | Yes | No |
| 4 | Run before_data hooks | Yes | No | Yes |
| 5 | Stream data with COPY | Yes | No | Yes |
| 6 | Run after_data hooks | Yes | No | Yes |
| 7 | Optional validation | Yes | No | Yes |
| 8 | SET LOGGED for full migrations using UNLOGGED tables | Yes | No | No |
| 9 | Add primary keys | Yes | Yes | No |
| 10 | Add indexes with bounded parallelism | Yes | Yes | No |
| 11 | Run before_fk hooks | Yes | Yes | No |
| 12 | Optional orphan cleanup | Yes | No | No |
| 13 | Add foreign keys | Yes | Yes | No |
| 14 | Reset sequences | Yes | Yes | Yes |
| 15 | Optional unsigned checks | Yes | Yes | No |
| 16 | Optional trigger emulation | Yes | Yes | No |
| 17 | Run after_all hooks | Yes | Yes | Yes |

This is the default. pgferry creates tables, loads data, then adds constraints and post-load objects.

schema_only = false
data_only = false

Use this unless you have a specific operational reason to split schema creation and data load.

schema_only = true

Use this when you want to inspect or adjust the target schema before any data is loaded. It skips:

  • before_data
  • data COPY
  • after_data
  • row-count validation
  • SET LOGGED
  • orphan cleanup

data_only = true

Use this when the target schema already exists from a prior schema_only run or from your own DDL. It skips:

  • table creation
  • primary keys and indexes
  • before_fk
  • orphan cleanup
  • foreign keys
  • unsigned checks
  • trigger creation

Operational note: before COPY starts, pgferry verifies that the target role can run ALTER TABLE ... DISABLE TRIGGER ALL and later re-enable those triggers on the selected target tables. If that preflight fails, pgferry aborts before copying any data.

This matters because data_only loads into a schema where foreign keys and other triggers may already exist. pgferry temporarily disables those triggers during the load so parallel COPY can proceed without immediate FK enforcement.
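The exact preflight statements are not documented here, but the per-table toggles it exercises can be sketched as follows. `trigger_toggle_sql` is a hypothetical helper for illustration, not pgferry's actual code:

```python
def trigger_toggle_sql(tables):
    """Build the pairs of statements a data_only load wraps around COPY:
    disable all triggers (including FK triggers) before the load, then
    re-enable them afterwards. Sketch only; pgferry's real quoting and
    schema handling may differ."""
    disable = [f'ALTER TABLE "{t}" DISABLE TRIGGER ALL;' for t in tables]
    enable = [f'ALTER TABLE "{t}" ENABLE TRIGGER ALL;' for t in tables]
    return disable, enable
```

Running the DISABLE statement inside a transaction that is then rolled back is one way a preflight can prove the role has the privilege without leaving triggers off.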

pgferry migrate schema-migration.toml # schema_only = true
pgferry migrate data-migration.toml # data_only = true
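A split run might use two minimal configs like the sketch below. Only the keys shown in this guide are used; a real config also needs source and target connection settings, which are omitted here:

```toml
# schema-migration.toml: create tables, keys, indexes, and FKs; load no data
schema_only = true

# data-migration.toml: load data into the already-created schema
data_only = true
```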

Tradeoff: a data_only load is slower than the default full pipeline because data is loaded with indexes and foreign keys already present.

Default parallel reads:

  • Fastest mode.
  • Tables or chunks can run in parallel.
  • Best when source writes are already paused or consistency across tables does not matter.

single_tx:

  • Supported on MySQL, MariaDB, and MSSQL.
  • Reads the source inside one read-only transaction for a stable point-in-time view.
  • Gives up parallel source reads for consistency.

Use single_tx when the source stays live during migration and the tables must agree with each other at one point in time.

pgferry chunks a table only when it has a single-column numeric primary key.

Chunkable:

  • MySQL integer PKs such as tinyint, smallint, mediumint, int, bigint
  • MariaDB integer PKs such as tinyint, smallint, mediumint, int, bigint
  • SQLite integer primary keys
  • MSSQL numeric integer primary keys

Not chunkable:

  • composite primary keys
  • non-numeric primary keys such as UUID or text
  • tables with no primary key

  1. pgferry finds MIN(pk) and MAX(pk).
  2. It splits the range into chunks of roughly chunk_size.
  3. Each chunk becomes a bounded SELECT ... WHERE pk >= lower AND pk < upper.
  4. Completed chunks can be checkpointed individually when resume = true.
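Steps 1 through 3 above can be sketched as follows. `chunk_ranges` is a hypothetical helper for illustration, not pgferry's actual code:

```python
def chunk_ranges(min_pk, max_pk, chunk_size):
    """Split the inclusive PK range [min_pk, max_pk] into half-open
    [lower, upper) ranges of roughly chunk_size rows each. Each pair
    maps to one bounded SELECT ... WHERE pk >= lower AND pk < upper."""
    lower = min_pk
    while lower <= max_pk:
        upper = lower + chunk_size
        yield (lower, upper)
        lower = upper
```

Note the ranges are bounded by PK value, not row count, so sparse PKs produce chunks with fewer rows than chunk_size.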

When resume = true, pgferry stores progress in pgferry_checkpoint.json next to the config file.

Use pgferry checkpoint status migration.toml when you want to inspect that file as human-readable progress instead of opening the JSON directly.

The checkpoint records:

  • completed table copies
  • completed chunk copies
  • migration shape compatibility, including hooks, identifier rules, type mapping, and table layout

resume = true
unlogged_tables = false

Do not combine resume = true with on_schema_exists = "recreate" or "use", or with schema_only = true.

validation = "row_count" runs after after_data hooks and before post-load DDL like indexes and foreign keys.

This is intentional:

  • table data is already present
  • hook-driven cleanup or transforms can run first
  • expensive post-load objects are not built yet if validation finds a mismatch
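The comparison itself amounts to matching per-table counts. A sketch, using a hypothetical `row_count_mismatches` helper rather than pgferry's actual implementation:

```python
def row_count_mismatches(source_counts, target_counts):
    """Compare per-table row counts from source and target and return
    the tables that disagree, mapped to (expected, actual). Sketch of
    what a validation = "row_count" pass checks."""
    mismatches = {}
    for table, expected in source_counts.items():
        actual = target_counts.get(table, 0)
        if actual != expected:
            mismatches[table] = (expected, actual)
    return mismatches
```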

pgferry validate migration.toml

This command is outside the migrate pipeline. It loads the same TOML config, reconnects to the source and target, introspects the selected tables, and reruns the configured validation mode only.

It skips:

  • schema creation
  • data COPY
  • hooks
  • checkpoints
  • post-load DDL like indexes, foreign keys, and sequence reset

Use it when you want to compare the current source state against an already-loaded target after cutover or on a later check.

  • Use the full pipeline for most rehearsals and first production attempts.
  • Split into schema_only and data_only only when you need manual schema review or a more controlled cutover sequence.
  • Treat data_only as privilege-sensitive on PostgreSQL. Managed services and restricted roles often block trigger control on pre-existing tables.
  • Prefer resume = true for large runs where restarts are expensive.
  • Prefer validation = "row_count" for the final rehearsals before cutover.