Migration Pipeline
The default pgferry flow loads data first and adds expensive constraints afterward.
Stage order
| Stage | What happens | Full | schema_only | data_only |
|---|---|---|---|---|
| 1 | Load and validate config | Yes | Yes | Yes |
| 2 | Introspect the source schema | Yes | Yes | Yes |
| 3 | Create target schema and tables | Yes | Yes | No |
| 4 | Run before_data hooks | Yes | No | Yes |
| 5 | Stream data with COPY | Yes | No | Yes |
| 6 | Run after_data hooks | Yes | No | Yes |
| 7 | Optional validation | Yes | No | Yes |
| 8 | SET LOGGED for full migrations using UNLOGGED tables | Yes | No | No |
| 9 | Add primary keys | Yes | Yes | No |
| 10 | Add indexes with bounded parallelism | Yes | Yes | No |
| 11 | Run before_fk hooks | Yes | Yes | No |
| 12 | Optional orphan cleanup | Yes | No | No |
| 13 | Add foreign keys | Yes | Yes | No |
| 14 | Reset sequences | Yes | Yes | Yes |
| 15 | Optional unsigned checks | Yes | Yes | No |
| 16 | Optional trigger emulation | Yes | Yes | No |
| 17 | Run after_all hooks | Yes | Yes | Yes |
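The stage toggles above map onto a handful of config keys. The sketch below pulls together only the keys named on this page; the values are illustrative, and the `chunk_size` value in particular is an assumption, not a recommended default:

```toml
# Stage-related keys from this page, shown together. Values are illustrative.
schema_only = false            # default: run the full pipeline
data_only = false
source_snapshot_mode = "none"  # or "single_tx" for a point-in-time read
chunk_size = 100000            # assumption: illustrative chunk size
resume = true                  # checkpoint completed tables and chunks
unlogged_tables = false        # required for safe resume
validation = "row_count"       # optional post-load row-count validation
```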
Full migration
This is the default: pgferry creates tables, loads data, then adds constraints and post-load objects.
```toml
schema_only = false
data_only = false
```

Use this unless you have a specific operational reason to split schema creation and data load.
schema_only
```toml
schema_only = true
```

Use this when you want to inspect or adjust the target schema before any data is loaded. It skips:

- `before_data` hooks
- data COPY
- `after_data` hooks
- row-count validation
- `SET LOGGED`
- orphan cleanup
data_only
```toml
data_only = true
```

Use this when the target schema already exists from a prior schema_only run or from your own DDL. It skips:

- table creation
- primary keys and indexes
- `before_fk` hooks
- orphan cleanup
- foreign keys
- unsigned checks
- trigger creation
Operational note: before COPY starts, pgferry verifies that the target role can run `ALTER TABLE ... DISABLE TRIGGER ALL` and later re-enable those triggers on the selected target tables. If that preflight fails, pgferry aborts before copying any data.
This matters because data_only loads into a schema where foreign keys and other triggers may already exist. pgferry temporarily disables those triggers during the load so parallel COPY can proceed without immediate FK enforcement.
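Conceptually, the preflight and load follow the pattern below. The table name is illustrative; pgferry runs the equivalent for each selected target table:

```sql
-- Sketch of the trigger handling around a data_only COPY (illustrative table name).
ALTER TABLE public.orders DISABLE TRIGGER ALL;  -- preflight must prove this is permitted
-- ... parallel COPY into public.orders runs here, without FK trigger enforcement ...
ALTER TABLE public.orders ENABLE TRIGGER ALL;   -- re-enabled once the load finishes
```

This is why restricted roles on managed PostgreSQL services can fail the preflight: `DISABLE TRIGGER ALL` requires ownership-level privileges on the table.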
Two-phase workflow
```shell
pgferry migrate schema-migration.toml   # schema_only = true
pgferry migrate data-migration.toml     # data_only = true
```

Tradeoff: this two-phase flow is slower than the default full pipeline because data is loaded with indexes and foreign keys already present.
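The two config files can be minimal. A sketch, under the assumption that every other setting matches your single-file config:

```toml
# schema-migration.toml (sketch)
schema_only = true

# data-migration.toml (sketch)
data_only = true
```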
Snapshot strategy
source_snapshot_mode = "none"
- Fastest mode.
- Tables or chunks can run in parallel.
- Best when source writes are already paused or consistency across tables does not matter.
source_snapshot_mode = "single_tx"
- Supported on MySQL, MariaDB, and MSSQL.
- Reads the source inside one read-only transaction for a stable point-in-time view.
- Gives up parallel source reads for consistency.
Use `single_tx` when the source stays live during migration and the tables must agree with each other at one point in time.
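pgferry's exact statements are not documented on this page, but on a MySQL source a `single_tx` read conceptually corresponds to:

```sql
-- One read-only transaction gives every table read the same point-in-time view.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION READ ONLY, WITH CONSISTENT SNAPSHOT;
SELECT * FROM customers;  -- snapshot established here...
SELECT * FROM orders;     -- ...and shared by every later read
COMMIT;
```

Because all reads share one transaction, they cannot fan out across parallel source connections, which is the consistency-for-parallelism tradeoff described above.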
Chunking
pgferry chunks a table only when it has a single-column numeric primary key.
Chunkable tables
- MySQL integer PKs such as `tinyint`, `smallint`, `mediumint`, `int`, `bigint`
- MariaDB integer PKs such as `tinyint`, `smallint`, `mediumint`, `int`, `bigint`
- SQLite integer primary keys
- MSSQL numeric integer primary keys
Not chunkable
- composite primary keys
- non-numeric primary keys such as UUID or text
- tables with no primary key
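For instance, none of the following tables would be chunked; each is copied as a single stream instead. The DDL is illustrative:

```sql
CREATE TABLE order_items (order_id INT, line_no INT,
                          PRIMARY KEY (order_id, line_no));  -- composite PK
CREATE TABLE sessions   (token CHAR(36) PRIMARY KEY);        -- non-numeric PK
CREATE TABLE audit_log  (event TEXT, at DATETIME);           -- no PK at all
```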
What chunking changes
- pgferry finds `MIN(pk)` and `MAX(pk)`.
- It splits the range into chunks of roughly `chunk_size`.
- Each chunk becomes a bounded `SELECT ... WHERE pk >= lower AND pk < upper`.
- Completed chunks can be checkpointed individually when `resume = true`.
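As a worked example: with `MIN(pk) = 1`, `MAX(pk) = 25000`, and `chunk_size = 10000`, the copy would be issued as roughly three bounded reads. The table name and the exact boundary arithmetic are assumptions for illustration:

```sql
SELECT * FROM orders WHERE pk >= 1     AND pk < 10001;
SELECT * FROM orders WHERE pk >= 10001 AND pk < 20001;
SELECT * FROM orders WHERE pk >= 20001 AND pk < 25001;
```

Each range is half-open, so no row is read twice and no row is skipped at a chunk boundary.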
Resume behavior
When `resume = true`, pgferry stores progress in `pgferry_checkpoint.json` next to the config file.
Run `pgferry checkpoint status migration.toml` to inspect that file as human-readable progress instead of opening the JSON directly.
What the checkpoint protects
- completed table copies
- completed chunk copies
- migration shape compatibility, including hooks, identifier rules, type mapping, and table layout
What you need for safe resume
```toml
resume = true
unlogged_tables = false
```

Do not combine `resume = true` with `on_schema_exists = "recreate"` or `"use"`, or with `schema_only = true`.
Validation timing
`validation = "row_count"` runs after `after_data` hooks and before post-load DDL like indexes and foreign keys.
This is intentional:
- table data is already present
- hook-driven cleanup or transforms can run first
- expensive post-load objects are not built yet if validation finds a mismatch
Standalone validate
```shell
pgferry validate migration.toml
```

This command runs outside the migrate pipeline. It loads the same TOML config, reconnects to the source and target, introspects the selected tables, and reruns only the configured validation mode.
It skips:
- schema creation
- data COPY
- hooks
- checkpoints
- post-load DDL like indexes, foreign keys, and sequence reset
Use it when you want to compare the current source state against an already-loaded target after cutover or on a later check.
Practical operating advice
- Use the full pipeline for most rehearsals and first production attempts.
- Split into `schema_only` and `data_only` only when you need manual schema review or a more controlled cutover sequence.
- Treat `data_only` as privilege-sensitive on PostgreSQL. Managed services and restricted roles often block trigger control on pre-existing tables.
- Prefer `resume = true` for large runs where restarts are expensive.
- Prefer `validation = "row_count"` for the final rehearsals before cutover.