iqtoolkit journal
How to Use iqtoolkit/pgtools: HOT Update Optimization Checklist (Part 2)
This is the second post in our series on using iqtoolkit/pgtools for PostgreSQL performance optimization.
In Part 1, we covered what HOT updates are, how to detect them, and how to apply fillfactor recommendations.
In this post you'll learn:
- How to validate that your fillfactor changes actually improved HOT behavior
- Why HOT% might still be low after tuning
- Strategies for investigating index-related issues
- When (and when not) to reach for more aggressive rewrite operations
- Setting up continuous monitoring with pgtools
Validating Changes in Production
After applying ALTER TABLE ... SET (fillfactor=...) and a rewrite operation, you need solid data to confirm improvement.
The Validation Window
Here's the key detail: the counters in pg_stat_user_tables are cumulative.
Running VACUUM (FULL), CLUSTER, pg_squeeze, or pg_repack rewrites the table's storage but keeps its OID, so n_tup_upd and n_tup_hot_upd carry over from before the change. To get a clean baseline, reset the table's counters explicitly, or snapshot the values before the rewrite and compare deltas afterwards.
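For example, a minimal reset for a single table (mytable is a placeholder name):
-- Clear the per-table cumulative counters so post-rewrite numbers start from zero.
SELECT pg_stat_reset_single_table_counters('mytable'::regclass);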
The validation process:
- Note the timestamp before your rewrite operation
- Run the rewrite (VACUUM (FULL), CLUSTER, pg_squeeze, or pg_repack)
- Let your application run for the same duration as before (e.g., if you collected stats over 24 hours, wait 24 hours again)
- Re-run the pgtools checklist and compare
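If you'd rather not reset counters, a snapshot-and-delta comparison works just as well. This is a minimal sketch; the snapshot table name hot_baseline_snapshot is made up for illustration:
-- Before the rewrite: capture the current counters.
CREATE TABLE hot_baseline_snapshot AS
SELECT now() AS captured_at, schemaname, relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables;

-- After the observation window: compare deltas against the snapshot.
SELECT c.relname,
       c.n_tup_upd - s.n_tup_upd AS upd_delta,
       c.n_tup_hot_upd - s.n_tup_hot_upd AS hot_delta,
       ROUND(100.0 * (c.n_tup_hot_upd - s.n_tup_hot_upd)
             / NULLIF(c.n_tup_upd - s.n_tup_upd, 0), 2) AS hot_ratio_since_snapshot
FROM pg_stat_user_tables c
JOIN hot_baseline_snapshot s USING (schemaname, relname);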
Example Validation Query
After your change, you can run:
SELECT
schemaname,
relname AS tablename,
n_tup_upd,
n_tup_hot_upd,
ROUND(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 2) AS hot_ratio,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||relname)) AS size
FROM pg_stat_user_tables
WHERE schemaname = 'public'
AND n_tup_upd > 1000 -- Focus on meaningful workloads
ORDER BY n_tup_hot_upd DESC;
What you're looking for:
- hot_ratio should improve compared to before the change
- If it doesn't improve, that signals a different root cause (see troubleshooting below)
Troubleshooting: Why Didn't HOT% Improve?
So you lowered fillfactor, ran the rewrite, and HOT% is still low. Here are the most common culprits.
1. Updates Are Changing Indexed Columns
This is the most common reason HOT updates aren't happening.
The Problem:
If you're running UPDATE mytable SET indexed_column = value ..., the update changes a column that appears in an index, so PostgreSQL cannot use HOT and must insert new entries into every index on the table. That's a physical constraint, not a tuning opportunity.
How to Find It:
Use the pgtools index analysis script to identify which columns in your indexes are being frequently updated:
psql -d your_db -Xq -f analysis/index_column_update_analysis.sql \
| jq -r '.analysis[] | select(.update_frequency > 10) | {table, index, columns}'
Or, if you're using application-level tracking, correlate your UPDATE statements with your table indexes.
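If you'd rather check directly in the database, you can list each index's definition and eyeball it against the SET clauses of your frequent UPDATEs (a minimal sketch; 'mytable' is a placeholder name):
-- Show every index on the table, including indexed columns and partial-index predicates.
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename = 'mytable';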
The Fix:
- Consider if the index is necessary. If updates always change a certain indexed column but queries rarely use that column, drop the index.
- Use a partial index if only a subset of rows need it (e.g., WHERE status = 'active').
- Defer the column update to a separate, less-frequent operation if your business logic allows.
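A sketch of the partial-index option, with made-up table and index names. Keep in mind that a column referenced in the index's WHERE clause still counts against HOT when it changes; the win here is a smaller index and less maintenance for rows outside the predicate:
-- Replace a broad index with one that covers only the rows queries filter on.
DROP INDEX IF EXISTS mytable_status_amount_idx;
CREATE INDEX mytable_active_amount_idx
    ON mytable (amount)
    WHERE status = 'active';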
2. Fillfactor Is Still Too High (or You Skipped the Rewrite)
A lower fillfactor only applies to pages written after the change. If you set fillfactor=80 but never rewrote the table, the old pages are still packed at 100%.
The Fix:
- Run the rewrite operation (VACUUM (FULL), CLUSTER, pg_squeeze, or pg_repack) after setting fillfactor.
- Choose the least disruptive approach for your workload:
  - pg_squeeze: non-blocking external tool (recommended first option; uses replication slots, minimal performance impact).
  - pg_repack: non-blocking external tool (alternative; uses triggers, which have more overhead than replication slots).
  - VACUUM (FULL): simplest built-in option, but locks the table (use during maintenance windows only).
  - CLUSTER: also locks the table, but additionally optimizes physical order on disk (good if you have a natural sort key).
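A minimal sketch of the built-in route (hypothetical table name mytable; VACUUM (FULL) takes an ACCESS EXCLUSIVE lock, so reserve it for a maintenance window):
-- Leave 20% free space in newly written pages for HOT updates.
ALTER TABLE mytable SET (fillfactor = 80);
-- Rewrite the table so existing pages also honor the new fillfactor.
VACUUM (FULL) mytable;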
3. Vacuum Isn't Keeping Up
If dead tuples accumulate faster than VACUUM can reclaim them, page fragmentation gets worse, and HOT updates still can't find free space.
Diagnosis:
SELECT
schemaname,
relname AS tablename,
n_dead_tup,
n_live_tup,
ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_live_tup > 10000
ORDER BY dead_ratio DESC;
If dead_ratio is above ~10% on update-heavy tables, autovacuum is likely falling behind.
The Fix:
- Tune autovacuum parameters for that table:
ALTER TABLE mytable SET (
  autovacuum_vacuum_scale_factor = 0.01,   -- vacuum at 1% dead tuples
  autovacuum_analyze_scale_factor = 0.005  -- analyze at 0.5% changed tuples
);
- Consider manual VACUUM runs on a schedule during off-peak hours.
- Monitor autovacuum worker activity and make sure the workers aren't resource-starved.
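To confirm autovacuum is actually visiting the table, a quick check against the standard statistics view (mytable is a placeholder name):
-- When was the table last vacuumed, and how often?
SELECT relname, last_vacuum, last_autovacuum, vacuum_count, autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'mytable';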
4. Row Width Is Expanding (LOB Columns, JSON, Text)
Large columns can make it impossible to fit an updated row on the same page, even with lower fillfactor.
Diagnosis:
SELECT
relname AS tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||relname)) AS total_size,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||relname) / (n_live_tup + 1)) AS avg_row_size
FROM pg_stat_user_tables
WHERE schemaname = 'public'
AND n_live_tup > 0
ORDER BY pg_total_relation_size(schemaname||'.'||relname) / (n_live_tup + 1) DESC;
The Fix:
- Extract large columns to a separate table (consider a JSONB column in a "details" table instead of inline).
- Compress data if you're storing serialized text (rely on TOAST/JSONB compression, or compress on the application side into a bytea column).
- Use TOAST settings to force large values out-of-line:
ALTER TABLE mytable ALTER COLUMN large_col SET STORAGE EXTERNAL;
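To see which columns dominate the row width before deciding what to move out, a sketch using the planner statistics (requires a recent ANALYZE; 'mytable' is a placeholder name):
-- Average stored width per column, widest first.
SELECT attname, avg_width, null_frac
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'mytable'
ORDER BY avg_width DESC;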
Advanced Monitoring: Setting Up Continuous Tracking
Once you've tuned a table, don't set it and forget it.
Create a Baseline View
Create a view that captures your "optimized" tables and their target HOT ratios:
CREATE VIEW hot_optimization_baseline AS
SELECT
s.schemaname,
s.relname AS tablename,
-- fillfactor comes from pg_class.reloptions (falls back to the default 100)
COALESCE(substring(array_to_string(c.reloptions, ',') FROM 'fillfactor=([0-9]+)')::int, 100) AS fillfactor_setting,
75 AS target_hot_ratio, -- Adjust based on your SLA
s.n_tup_upd,
s.n_tup_hot_upd,
ROUND(100.0 * s.n_tup_hot_upd / NULLIF(s.n_tup_upd, 0), 2) AS current_hot_ratio,
CASE
WHEN ROUND(100.0 * s.n_tup_hot_upd / NULLIF(s.n_tup_upd, 0), 2) >= 75 THEN 'GOOD'
WHEN ROUND(100.0 * s.n_tup_hot_upd / NULLIF(s.n_tup_upd, 0), 2) >= 50 THEN 'FAIR'
ELSE 'NEEDS ATTENTION'
END AS status
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
WHERE s.schemaname = 'public'
AND s.relname IN (
'payments', 'orders', 'inventory' -- Your optimized tables
);
Alert on Regression
Query this baseline periodically (daily or weekly) and alert if status drops below your target:
psql -d your_db -Xq -c "
SELECT tablename, current_hot_ratio, status
FROM hot_optimization_baseline
WHERE current_hot_ratio < 75 AND n_tup_upd > 0;
" | mail -s "HOT Optimization Alert" dba-team@company.comReal-World Case Study: The Payments Table
Let's walk through a complete optimization cycle:
Initial State:
- payments table: 50M rows, fillfactor=100, ~500k updates/day
- HOT ratio: 12%
- Index on (customer_id, created_at)
- Index on (status, amount)
Analysis:
- Ran pgtools, found HOT recommendations
- Checked for indexed column updates: UPDATE payments SET status = ... happens frequently
- Checked dead tuples: 8% dead ratio, so autovacuum was keeping up
Strategy:
- Status is indexed, so we can't make it HOT in all cases
- But we can improve fillfactor and let autovacuum be more aggressive
- Set fillfactor = 75 (more conservative than the default 100)
- Ran pg_squeeze on Sunday evening (non-blocking rewrite with minimal performance impact)
- Tuned autovacuum for that table
Result (after 1 week):
- HOT ratio improved from 12% to 43%
- Index size growth slowed by ~30%
- Update latency dropped slightly
- Not perfect (updates still change status), but a significant improvement
When Not to Optimize for HOT
Not every table benefits from HOT tuning. Save your effort for:
- Update-heavy tables (>10k updates/day)
- Tables where most updates don't touch indexed columns
- Tables where index bloat is a known pain point
Skip HOT optimization for:
- Tables that are append-only or rarely updated
- OLAP-style tables (batch inserts, rare updates)
- Temporary or staging tables
- Situations where you're already using partial indexes or partitioning for other reasons
Next Steps
- Run pgtools on your workloads to identify candidates
- Validate baseline metrics before making changes
- Test fillfactor changes in staging first
- Monitor continuously after production rollout
- Iterate based on real workload patterns
The investment in careful monitoring often pays dividends in reduced index bloat and more predictable performance.
Coming up: In Part 3, we'll dive into using pgtools for index bloat analysis and recovery strategies for severely bloated indexes.