Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Extending pt-archiver with a Partition-Aware Plug-in for Fast Retention Policy Enforcement

June 16, 2026

Author

Corrado Pandiani

Insight for DBAs

Insight for Developers

MySQL

Share this Post:

Managing data retention policies is one of the most common operational tasks in MySQL.

Applications continuously generate transactional, audit, logging, telemetry, and event data. Over time, these tables can grow to billions of rows, causing:

Larger backups
Longer recovery times
Reduced buffer pool efficiency
Slower index maintenance
Increased storage costs
Degraded query performance

To address these problems, organizations typically implement retention policies based on dates or timestamps. Examples include deleting events older than 90 days or purging session data older than 30 days and so forth. The deleted data can then eventually be archived somewhere else, like in another DBMS or on external files.

One of the most widely used tools for implementing these policies in MySQL ecosystems is pt-archiver, part of the Percona Toolkit.

This article provides a review of what pt-archiver is and how to use it, but in particular it focuses on the fact this tool is not partitioning aware, and this can make the deletion phase more costly. The article shows how to extend pt-archiver with a Perl plugin to make it aware of partitioning.

What is pt-archiver?

pt-archiver is a command-line utility from Percona Toolkit designed to:

Archive rows from MySQL tables
Purge rows from MySQL tables
Move data between tables into the local database or a remote one
Export rows into files

In a few words: implementing retention policies safely.

The tool processes rows incrementally in chunks, avoiding massive transactions and reducing impact on production systems.

Example:

pt-archiver \
  --source h=localhost,D=mydb,t=events \
  --where "created_at &lt; '2026-05-01'" \
  --purge \
  --limit 1000 \
  --commit-each

pt-archiver \

--source h=localhost,D=mydb,t=events \

--where "created_at < '2026-05-01'" \

--purge \

--limit 1000 \

--commit-each

This command:

Scans rows matching the WHERE condition
Processes them in chunks of 1000 rows
Commits every chunk
Deletes matching rows from the source table

pt-archiver provides several advantages compared to ad-hoc DELETE statements.

Instead of running:

DELETE FROM events
WHERE created_at &lt; '2026-05-01';

1 2	DELETE FROM events WHERE created_at < '2026-05-01';

which may:

Lock rows for a long time
Generate massive undo/redo logs
Create replication lag
Exhaust transaction logs

pt-archiver processes rows incrementally to make the process overhead less impactful for the database performance.

pt-archiver implementation permits flexible archival strategies

Rows can be copied to another table on a remote host, exported to files or removed completely

More details: ps://docs.percona.com/percona-toolkit/pt-archiver.html

Example: Copy rows to a remote archive table

The following example archives rows older than 90 days from a local table into an archive table hosted on a remote MySQL server:

pt-archiver \
  --source h=localhost,D=sales,t=orders,u=archiver,p=secret \
  --dest h=archive-server,D=archive,t=orders_archive,u=archiver,p=secret \
  --where "created_at &lt; '2026-05-01'" \
  --limit 1000 \
  --commit-each \
  --progress 10000 \
  --statistics

pt-archiver \

--source h=localhost,D=sales,t=orders,u=archiver,p=secret \

--dest h=archive-server,D=archive,t=orders_archive,u=archiver,p=secret \

--where "created_at < '2026-05-01'" \

--limit 1000 \

--commit-each \

--progress 10000 \

--statistics

In this example:

–source defines the source table
–dest defines the remote archive destination
–where selects rows eligible for archival
–limit controls batch size
–commit-each commits every batch independently to reduce transaction overhead

–-progress reports progress every 10,000 rows

If rows should be removed from the source table after being copied, add –purge

Example: Export rows to a file

The following example exports rows older than one year into a text file:

pt-archiver \
  --source h=localhost,D=sales,t=orders,u=archiver,p=secret \
  --where "created_at &lt; NOW() - INTERVAL 1 YEAR" \
  --file '/tmp/orders_archive_%Y-%m-%d.txt' \
  --output-format csv \
  --limit 1000 \
  --commit-each \
  --progress 10000 \
  --statistics

pt-archiver \

--source h=localhost,D=sales,t=orders,u=archiver,p=secret \

--where "created_at < NOW() - INTERVAL 1 YEAR" \

--file '/tmp/orders_archive_%Y-%m-%d.txt' \

--output-format csv \

--limit 1000 \

--commit-each \

--progress 10000 \

--statistics

In this example:

–file specifies the output file
–-output-format csv exports rows in CSV format
Date placeholders in the filename are expanded automatically

Rows can optionally be deleted from the source table by adding –purge

This allows pt-archiver to be used both for data retention and for offline archival workflows.

The Hidden Cost of DELETE Statements

Although pt-archiver is much safer than massive DELETE operations, it still fundamentally relies on DELETE statements.

This is a critical point.

Even when there are proper indexes, the rows are processed in chunks, and transactions are small; the large-scale DELETE operations remain expensive.

Deleting rows is expensive in InnoDB because it involves:

Locating rows via indexes
Modifying clustered indexes
Modifying secondary indexes
Generating undo logs
Generating redo logs
Purge thread processing
Replication event generation
Page fragmentation

When deleting billions of rows, the overhead becomes enormous.

Indexes help for sure, but only partially.

Consider:

DELETE FROM events
WHERE created_at &lt; '2024-01-01';

1 2	DELETE FROM events WHERE created_at < '2024-01-01';

If created_at is indexed, MySQL can efficiently locate rows.

However, locating rows efficiently is only part of the cost. The actual delete operations still require all those things we mentioned above.

At considerable scale, this becomes expensive.

Why RANGE Partitioning is Superior for Retention Policies

For time-based retention policies, partitioning is often dramatically more efficient. In particular, RANGE partitioning is very useful for these cases.

Example:

CREATE TABLE events (
    id BIGINT NOT NULL,
    created_at DATETIME NOT NULL,
    payload JSON,
    PRIMARY KEY(id, created_at)
)

PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p202604 VALUES LESS THAN (TO_DAYS('2026-05-01')),
    PARTITION p202605 VALUES LESS THAN (TO_DAYS('2026-06-01')),
    PARTITION p202606 VALUES LESS THAN (TO_DAYS('2026-07-01'))
);

CREATE TABLE events (

id BIGINT NOT NULL,

created_at DATETIME NOT NULL,

payload JSON,

PRIMARY KEY(id, created_at)

)

PARTITION BY RANGE (TO_DAYS(created_at)) (

PARTITION p202604 VALUES LESS THAN (TO_DAYS('2026-05-01')),

PARTITION p202605 VALUES LESS THAN (TO_DAYS('2026-06-01')),

PARTITION p202606 VALUES LESS THAN (TO_DAYS('2026-07-01'))

);

With partitioning, dropping old data becomes:

ALTER TABLE events DROP PARTITION p202604;

1	ALTER TABLE events DROP PARTITION p202604;

This operation is dramatically faster than running a DELETE.

Dropping a partition:

Removes an entire physical partition
Avoids row-by-row DELETE
Avoids undo generation for each row
Avoids secondary index maintenance per row
Minimizes redo generation
Is nearly metadata-only

This can remove millions or billions of rows in a matter of seconds without the same large cost of DELETE.

The Problem: pt-archiver is Not Partition-Aware

Unfortunately, pt-archiver does not automatically understand partitioning strategies.

Even if the table is partitioned or the retention policy perfectly matches partition boundaries, pt-archiver still executes DELETE statements.

Example:

pt-archiver \
  --where "created_at &lt; NOW() - INTERVAL 90 DAY" \
  --purge

pt-archiver \

--where "created_at < NOW() - INTERVAL 90 DAY" \

--purge

Internally, this still produces DELETE … instead of ALTER TABLE … DROP PARTITION …

This means organizations may lose the major operational benefits of partitioning, or they need to implement custom scripts for managing the selection of rows to copy using pt-archiver and then use DROP PARTITION separately from the tool. That is doable, and to be honest, not too complicated, but why not make pt-archiver aware of partitioning for some specific use cases?

Extending pt-archiver with Pulg-ins

Fortunately, pt-archiver supports Perl plug-ins.

A plug-in can do plenty of things. Like: inspect runtime conditions, interact with MySQL, override behaviors, and execute custom logic

This gives us an opportunity to implement partition-aware retention handling.

The plug-in can:

Inspect partition definitions
Analyze the WHERE condition
Determine which partitions are fully expired
Execute ALTER TABLE DROP PARTITION
Prevent row-by-row DELETE processing

This approach combines the scheduling/orchestration power of pt-archiver with the efficiency of partition pruning.

Plug-in Design

Our plug-in will:

Connect using the pt-archiver DB handle
Inspect INFORMATION_SCHEMA.PARTITIONS
Identify partitions older than the retention cutoff
Issue DROP PARTITION statements
Log actions
Skip DELETE processing

Assumptions:

The table is RANGE partitioned
Partitions are DATETIME based using the TO_DAYS() function to define ranges
Partition naming convention contains dates
Retention policy aligns with partition boundaries; if the plugin cannot determine a specific boundary, pt-archiver does nothing

Full Perl Plug-in for pt-archiver

package pt_archiver_partition_drop;

use strict;
use warnings;

sub new {
    my ($class, %args) = @_;
    my $self = {
        dbh        =&gt; $args{dbh},
        db         =&gt; $args{db},
        tbl        =&gt; $args{tbl},
        statistics =&gt; {},
    };

    bless $self, $class;
    return $self;
}

sub statistics {
    my ($self) = @_;
    return $self-&gt;{statistics};
}


sub before_begin {
    my ($self) = @_;
    my $dbh = $self-&gt;{dbh} or die "Missing dbh from pt-archiver\n";
    my $db  = $self-&gt;{db}  or die "Missing db from pt-archiver plugin args\n";
    my $tbl = $self-&gt;{tbl} or die "Missing tbl from pt-archiver plugin args\n";
    my $where  = _get_cmdline_option('where');
    my $dryrun = $ENV{PT_PARTITION_DROP_DRY_RUN} ? 1 : 0;

    die "Missing --where from original command line\n" unless $where;

    print "PLUGIN before_begin called\n";
    print "DB=$db TABLE=$tbl\n";
    print "WHERE=$where\n";
    print "PLUGIN_DRY_RUN=$dryrun\n";

    my ($column, $cutoff_date) = _parse_where($where);

    my $partitions = _get_partitions($dbh, $db, $tbl);

    if (!@$partitions) {
        print "Table `$db`.`$tbl` is not partitioned. Refusing DELETE.\n";
        exit(0);
    }

    my $partition_expr = $partitions-&gt;[0]-&gt;{expression};
    die "Missing PARTITION_EXPRESSION\n"
        unless defined $partition_expr &amp;&amp; length $partition_expr;

    print "Partition expression: $partition_expr\n";

    my $cutoff_value = _evaluate_cutoff(
        $dbh,
        $partition_expr,
        $column,
        $cutoff_date,
    );

    print "Cutoff date: $cutoff_date\n";
    print "Cutoff boundary value: $cutoff_value\n";

    my $matched;

    for my $p (@$partitions) {
        next if !defined $p-&gt;{description};
        next if uc($p-&gt;{description}) eq 'MAXVALUE';

        if ($p-&gt;{description} == $cutoff_value) {
            $matched = $p;
            last;
        }
    }


    if (!$matched) {
        print "No exact partition boundary matches cutoff $cutoff_value. Refusing DELETE.\n";
        exit(0);
    }

    print "Matched boundary partition: $matched-&gt;{name}, position $matched-&gt;{position}\n";

    my @drop;

    for my $p (@$partitions) {
        next if !defined $p-&gt;{description};
        next if uc($p-&gt;{description}) eq 'MAXVALUE';

        if ($p-&gt;{position} &lt;= $matched-&gt;{position}) {
            push @drop, $p-&gt;{name};
            print "Eligible for DROP: $p-&gt;{name}, boundary $p-&gt;{description}\n";
        }
    }

    if (!@drop) {
        print "No partitions eligible for DROP. Refusing DELETE.\n";
        exit(0);
    }

    my $sql = sprintf(
        "ALTER TABLE %s.%s DROP PARTITION %s",
        _quote_ident($db),
        _quote_ident($tbl),
        join(", ", map { _quote_ident($_) } @drop),
    );

    print "SQL: $sql\n";

    if ($dryrun) {
        print "PT_PARTITION_DROP_DRY_RUN enabled. Not executing DROP PARTITION.\n";
    }
    else {
        $dbh-&gt;do($sql);
        print "Dropped partitions: " . join(", ", @drop) . "\n";
    }

    $self-&gt;{statistics}-&gt;{partitions_dropped} = scalar @drop;

    exit(0);
}


sub _parse_where {
    my ($where) = @_;

    $where =~ s/^\s+|\s+$//g;

    die "Only WHERE format supported: created_at &lt; 'YYYY-MM-DD'\n"
        unless $where =~ /^`?([A-Za-z0-9_]+)`?\s*&lt;\s*'(\d{4}-\d{2}-\d{2})'\s*$/;

    return ($1, $2);
}

sub _evaluate_cutoff {
    my ($dbh, $partition_expr, $column, $cutoff_date) = @_;

    my $expr = $partition_expr;
    $expr =~ s/`//g;

    die "Partition expression does not reference column `$column`: $partition_expr\n"
        unless $expr =~ /\b\Q$column\E\b/i;

    $expr =~ s/\b\Q$column\E\b/'$cutoff_date'/ig;

    die "Unsafe generated expression: $expr\n"
        unless $expr =~ /^[A-Za-z0-9_\s\(\)\+\-\*\/,\.'":]+$/;

    my $sql = "SELECT $expr";

    print "Boundary evaluation SQL: $sql\n";

    my ($value) = $dbh-&gt;selectrow_array($sql);

    die "Cannot evaluate cutoff expression: $sql\n"
        unless defined $value;

    return $value;
}

sub _get_partitions {
    my ($dbh, $db, $tbl) = @_;

    my $sql = q{
        SELECT
            PARTITION_NAME,
            PARTITION_DESCRIPTION,
            PARTITION_EXPRESSION,
            PARTITION_ORDINAL_POSITION
        FROM INFORMATION_SCHEMA.PARTITIONS
        WHERE TABLE_SCHEMA = ?
          AND TABLE_NAME = ?
          AND PARTITION_NAME IS NOT NULL
        ORDER BY PARTITION_ORDINAL_POSITION
    };

    my $sth = $dbh-&gt;prepare($sql);
    $sth-&gt;execute($db, $tbl);
    my @partitions;

    while (my $row = $sth-&gt;fetchrow_hashref()) {
        push @partitions, {
            name        =&gt; $row-&gt;{PARTITION_NAME},
            description =&gt; $row-&gt;{PARTITION_DESCRIPTION},
            expression  =&gt; $row-&gt;{PARTITION_EXPRESSION},
            position    =&gt; $row-&gt;{PARTITION_ORDINAL_POSITION},
        };
    }

    return \@partitions;
}


sub _get_cmdline_option {

    my ($name) = @_;

    my $opt = "--$name";

    for (my $i = 0; $i &lt; @ARGV; $i++) {
        if ($ARGV[$i] eq $opt &amp;&amp; defined $ARGV[$i + 1]) {
            return $ARGV[$i + 1];
        }

        if ($ARGV[$i] =~ /^\Q$opt\E=(.*)$/) {
            return $1;
        }
    }

    if (open my $fh, '&lt;', "/proc/$$/cmdline") {
        local $/;
        my $raw = &lt;$fh&gt;;
        close $fh;

        my @cmd = split /\0/, $raw;

        for (my $i = 0; $i &lt; @cmd; $i++) {
            if ($cmd[$i] eq $opt &amp;&amp; defined $cmd[$i + 1]) {
                return $cmd[$i + 1];
            }

            if ($cmd[$i] =~ /^\Q$opt\E=(.*)$/) {
                return $1;
            }
        }
    }

    return undef;
}



sub _quote_ident {

    my ($ident) = @_;

    die "Invalid identifier: $ident\n"
        unless defined $ident &amp;&amp; $ident =~ /^[A-Za-z0-9_]+$/;

    return "`$ident`";
}

1;

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

231

232

233

234

235

236

237

238

239

240

241

242

243

244

package pt_archiver_partition_drop;

use strict;

use warnings;

sub new {

my ($class, %args) = @_;

my $self = {

dbh => $args{dbh},

db => $args{db},

tbl => $args{tbl},

statistics => {},

};

bless $self, $class;

return $self;

}

sub statistics {

my ($self) = @_;

return $self->{statistics};

}

sub before_begin {

my ($self) = @_;

my $dbh = $self->{dbh} or die "Missing dbh from pt-archiver\n";

my $db = $self->{db} or die "Missing db from pt-archiver plugin args\n";

my $tbl = $self->{tbl} or die "Missing tbl from pt-archiver plugin args\n";

my $where = _get_cmdline_option('where');

my $dryrun = $ENV{PT_PARTITION_DROP_DRY_RUN} ? 1 : 0;

die "Missing --where from original command line\n" unless $where;

print "PLUGIN before_begin called\n";

print "DB=$db TABLE=$tbl\n";

print "WHERE=$where\n";

print "PLUGIN_DRY_RUN=$dryrun\n";

my ($column, $cutoff_date) = _parse_where($where);

my $partitions = _get_partitions($dbh, $db, $tbl);

if (!@$partitions) {

print "Table `$db`.`$tbl` is not partitioned. Refusing DELETE.\n";

exit(0);

}

my $partition_expr = $partitions->[0]->{expression};

die "Missing PARTITION_EXPRESSION\n"

unless defined $partition_expr && length $partition_expr;

print "Partition expression: $partition_expr\n";

my $cutoff_value = _evaluate_cutoff(

$dbh,

$partition_expr,

$column,

$cutoff_date,

);

print "Cutoff date: $cutoff_date\n";

print "Cutoff boundary value: $cutoff_value\n";

my $matched;

for my $p (@$partitions) {

next if !defined $p->{description};

next if uc($p->{description}) eq 'MAXVALUE';

if ($p->{description} == $cutoff_value) {

$matched = $p;

last;

}

if (!$matched) {

print "No exact partition boundary matches cutoff $cutoff_value. Refusing DELETE.\n";

exit(0);

}

print "Matched boundary partition: $matched->{name}, position $matched->{position}\n";

my @drop;

for my $p (@$partitions) {

next if !defined $p->{description};

next if uc($p->{description}) eq 'MAXVALUE';

if ($p->{position} <= $matched->{position}) {

push @drop, $p->{name};

print "Eligible for DROP: $p->{name}, boundary $p->{description}\n";

}

if (!@drop) {

print "No partitions eligible for DROP. Refusing DELETE.\n";

exit(0);

}

my $sql = sprintf(

"ALTER TABLE %s.%s DROP PARTITION %s",

_quote_ident($db),

_quote_ident($tbl),

join(", ", map { _quote_ident($_) } @drop),

);

print "SQL: $sql\n";

if ($dryrun) {

print "PT_PARTITION_DROP_DRY_RUN enabled. Not executing DROP PARTITION.\n";

}

else {

$dbh->do($sql);

print "Dropped partitions: " . join(", ", @drop) . "\n";

}

$self->{statistics}->{partitions_dropped} = scalar @drop;

exit(0);

}

sub _parse_where {

my ($where) = @_;

$where =~ s/^\s+|\s+$//g;

die "Only WHERE format supported: created_at < 'YYYY-MM-DD'\n"

unless $where =~ /^`?([A-Za-z0-9_]+)`?\s*<\s*'(\d{4}-\d{2}-\d{2})'\s*$/;

return ($1, $2);

}

sub _evaluate_cutoff {

my ($dbh, $partition_expr, $column, $cutoff_date) = @_;

my $expr = $partition_expr;

$expr =~ s/`//g;

die "Partition expression does not reference column `$column`: $partition_expr\n"

unless $expr =~ /\b\Q$column\E\b/i;

$expr =~ s/\b\Q$column\E\b/'$cutoff_date'/ig;

die "Unsafe generated expression: $expr\n"

unless $expr =~ /^[A-Za-z0-9_\s\+\-\*\/,\.'":]+$/;

my $sql = "SELECT $expr";

print "Boundary evaluation SQL: $sql\n";

my ($value) = $dbh->selectrow_array($sql);

die "Cannot evaluate cutoff expression: $sql\n"

unless defined $value;

return $value;

}

sub _get_partitions {

my ($dbh, $db, $tbl) = @_;

my $sql = q{

SELECT

PARTITION_NAME,

PARTITION_DESCRIPTION,

PARTITION_EXPRESSION,

PARTITION_ORDINAL_POSITION

FROM INFORMATION_SCHEMA.PARTITIONS

WHERE TABLE_SCHEMA = ?

AND TABLE_NAME = ?

AND PARTITION_NAME IS NOT NULL

ORDER BY PARTITION_ORDINAL_POSITION

};

my $sth = $dbh->prepare($sql);

$sth->execute($db, $tbl);

my @partitions;

while (my $row = $sth->fetchrow_hashref()) {

push @partitions, {

name => $row->{PARTITION_NAME},

description => $row->{PARTITION_DESCRIPTION},

expression => $row->{PARTITION_EXPRESSION},

position => $row->{PARTITION_ORDINAL_POSITION},

};

}

return \@partitions;

}

sub _get_cmdline_option {

my ($name) = @_;

my $opt = "--$name";

for (my $i = 0; $i < @ARGV; $i++) {

if ($ARGV[$i] eq $opt && defined $ARGV[$i + 1]) {

return $ARGV[$i + 1];

}

if ($ARGV[$i] =~ /^\Q$opt\E=(.*)$/) {

return $1;

}

if (open my $fh, '<', "/proc/$$/cmdline") {

local $/;

my $raw = <$fh>;

close $fh;

my @cmd = split /\0/, $raw;

for (my $i = 0; $i < @cmd; $i++) {

if ($cmd[$i] eq $opt && defined $cmd[$i + 1]) {

return $cmd[$i + 1];

}

if ($cmd[$i] =~ /^\Q$opt\E=(.*)$/) {

return $1;

}

return undef;

}

sub _quote_ident {

my ($ident) = @_;

die "Invalid identifier: $ident\n"

unless defined $ident && $ident =~ /^[A-Za-z0-9_]+$/;

return "`$ident`";

}

Create the file named pt_archiver_partition_drop.pm into the /usr/local/share/perl5 path.

Also set the environment variable PERL5LIB to let pt-archiver where to find the Perl package

export PERL5LIB=/usr/local/share/perl5

1	export PERL5LIB=/usr/local/share/perl5

Example Usage

First, create the partitioned table events and insert some fake data.

DROP TABLE IF EXISTS events;


CREATE TABLE events (
  id BIGINT NOT NULL,
  created_at DATETIME NOT NULL,
  payload JSON DEFAULT NULL,
  PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
  PARTITION p202604 VALUES LESS THAN (TO_DAYS('2026-05-01')),
  PARTITION p202605 VALUES LESS THAN (TO_DAYS('2026-06-01')),
  PARTITION p202606 VALUES LESS THAN (TO_DAYS('2026-07-01')),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);

INSERT INTO events (id, created_at, payload) VALUES

-- p202604
(1,  '2026-04-01 08:00:00', JSON_OBJECT('event', 'login',    'user', 'alice')),
(2,  '2026-04-03 09:15:00', JSON_OBJECT('event', 'view',     'page', 'home')),
(3,  '2026-04-05 10:30:00', JSON_OBJECT('event', 'click',    'button', 'signup')),
(4,  '2026-04-08 11:45:00', JSON_OBJECT('event', 'search',   'term', 'mysql')),
(5,  '2026-04-10 12:00:00', JSON_OBJECT('event', 'purchase', 'amount', 100)),
(6,  '2026-04-14 13:20:00', JSON_OBJECT('event', 'logout',   'user', 'alice')),
(7,  '2026-04-18 14:35:00', JSON_OBJECT('event', 'download', 'file', 'report.pdf')),
(8,  '2026-04-22 15:50:00', JSON_OBJECT('event', 'upload',   'file', 'image.png')),
(9,  '2026-04-26 16:05:00', JSON_OBJECT('event', 'click',    'button', 'buy')),
(10, '2026-04-30 23:59:59', JSON_OBJECT('event', 'month_end')),

-- p202605

(11, '2026-05-01 00:00:00', JSON_OBJECT('event', 'login',    'user', 'bob')),
(12, '2026-05-03 08:10:00', JSON_OBJECT('event', 'view',     'page', 'pricing')),
(13, '2026-05-06 09:20:00', JSON_OBJECT('event', 'search',   'term', 'percona')),
(14, '2026-05-09 10:30:00', JSON_OBJECT('event', 'purchase', 'amount', 250)),
(15, '2026-05-12 11:40:00', JSON_OBJECT('event', 'logout',   'user', 'bob')),
(16, '2026-05-16 12:50:00', JSON_OBJECT('event', 'download', 'file', 'backup.sql')),
(17, '2026-05-20 13:00:00', JSON_OBJECT('event', 'upload',   'file', 'data.csv')),
(18, '2026-05-24 14:10:00', JSON_OBJECT('event', 'click',    'button', 'subscribe')),
(19, '2026-05-28 15:20:00', JSON_OBJECT('event', 'view',     'page', 'docs')),
(20, '2026-05-31 23:59:59', JSON_OBJECT('event', 'month_end')),

-- p202606

(21, '2026-06-01 00:00:00', JSON_OBJECT('event', 'login',    'user', 'carol')),
(22, '2026-06-03 08:05:00', JSON_OBJECT('event', 'search',   'term', 'partitioning')),
(23, '2026-06-06 09:15:00', JSON_OBJECT('event', 'view',     'page', 'dashboard')),
(24, '2026-06-09 10:25:00', JSON_OBJECT('event', 'purchase', 'amount', 500)),
(25, '2026-06-12 11:35:00', JSON_OBJECT('event', 'logout',   'user', 'carol')),
(26, '2026-06-16 12:45:00', JSON_OBJECT('event', 'login',    'user', 'dave')),
(27, '2026-06-20 13:55:00', JSON_OBJECT('event', 'download', 'file', 'archive.zip')),
(28, '2026-06-24 14:05:00', JSON_OBJECT('event', 'upload',   'file', 'video.mp4')),
(29, '2026-06-28 15:15:00', JSON_OBJECT('event', 'click',    'button', 'checkout')),
(30, '2026-06-30 23:59:59', JSON_OBJECT('event', 'month_end')),

-- pmax
(31, '2026-07-01 00:00:00', JSON_OBJECT('event', 'login',    'user', 'eve')),
(32, '2026-07-05 08:30:00', JSON_OBJECT('event', 'view',     'page', 'future')),
(33, '2026-07-10 09:45:00', JSON_OBJECT('event', 'search',   'term', 'maxvalue')),
(34, '2026-08-01 10:00:00', JSON_OBJECT('event', 'purchase', 'amount', 750)),
(35, '2026-09-01 11:15:00', JSON_OBJECT('event', 'retained_future'));

DROP TABLE IF EXISTS events;

CREATE TABLE events (

id BIGINT NOT NULL,

created_at DATETIME NOT NULL,

payload JSON DEFAULT NULL,

PRIMARY KEY (id, created_at)

)

PARTITION BY RANGE (TO_DAYS(created_at)) (

PARTITION p202604 VALUES LESS THAN (TO_DAYS('2026-05-01')),

PARTITION p202605 VALUES LESS THAN (TO_DAYS('2026-06-01')),

PARTITION p202606 VALUES LESS THAN (TO_DAYS('2026-07-01')),

PARTITION pmax VALUES LESS THAN MAXVALUE

);

INSERT INTO events (id, created_at, payload) VALUES

-- p202604

(1, '2026-04-01 08:00:00', JSON_OBJECT('event', 'login', 'user', 'alice')),

(2, '2026-04-03 09:15:00', JSON_OBJECT('event', 'view', 'page', 'home')),

(3, '2026-04-05 10:30:00', JSON_OBJECT('event', 'click', 'button', 'signup')),

(4, '2026-04-08 11:45:00', JSON_OBJECT('event', 'search', 'term', 'mysql')),

(5, '2026-04-10 12:00:00', JSON_OBJECT('event', 'purchase', 'amount', 100)),

(6, '2026-04-14 13:20:00', JSON_OBJECT('event', 'logout', 'user', 'alice')),

(7, '2026-04-18 14:35:00', JSON_OBJECT('event', 'download', 'file', 'report.pdf')),

(8, '2026-04-22 15:50:00', JSON_OBJECT('event', 'upload', 'file', 'image.png')),

(9, '2026-04-26 16:05:00', JSON_OBJECT('event', 'click', 'button', 'buy')),

(10, '2026-04-30 23:59:59', JSON_OBJECT('event', 'month_end')),

-- p202605

(11, '2026-05-01 00:00:00', JSON_OBJECT('event', 'login', 'user', 'bob')),

(12, '2026-05-03 08:10:00', JSON_OBJECT('event', 'view', 'page', 'pricing')),

(13, '2026-05-06 09:20:00', JSON_OBJECT('event', 'search', 'term', 'percona')),

(14, '2026-05-09 10:30:00', JSON_OBJECT('event', 'purchase', 'amount', 250)),

(15, '2026-05-12 11:40:00', JSON_OBJECT('event', 'logout', 'user', 'bob')),

(16, '2026-05-16 12:50:00', JSON_OBJECT('event', 'download', 'file', 'backup.sql')),

(17, '2026-05-20 13:00:00', JSON_OBJECT('event', 'upload', 'file', 'data.csv')),

(18, '2026-05-24 14:10:00', JSON_OBJECT('event', 'click', 'button', 'subscribe')),

(19, '2026-05-28 15:20:00', JSON_OBJECT('event', 'view', 'page', 'docs')),

(20, '2026-05-31 23:59:59', JSON_OBJECT('event', 'month_end')),

-- p202606

(21, '2026-06-01 00:00:00', JSON_OBJECT('event', 'login', 'user', 'carol')),

(22, '2026-06-03 08:05:00', JSON_OBJECT('event', 'search', 'term', 'partitioning')),

(23, '2026-06-06 09:15:00', JSON_OBJECT('event', 'view', 'page', 'dashboard')),

(24, '2026-06-09 10:25:00', JSON_OBJECT('event', 'purchase', 'amount', 500)),

(25, '2026-06-12 11:35:00', JSON_OBJECT('event', 'logout', 'user', 'carol')),

(26, '2026-06-16 12:45:00', JSON_OBJECT('event', 'login', 'user', 'dave')),

(27, '2026-06-20 13:55:00', JSON_OBJECT('event', 'download', 'file', 'archive.zip')),

(28, '2026-06-24 14:05:00', JSON_OBJECT('event', 'upload', 'file', 'video.mp4')),

(29, '2026-06-28 15:15:00', JSON_OBJECT('event', 'click', 'button', 'checkout')),

(30, '2026-06-30 23:59:59', JSON_OBJECT('event', 'month_end')),

-- pmax

(31, '2026-07-01 00:00:00', JSON_OBJECT('event', 'login', 'user', 'eve')),

(32, '2026-07-05 08:30:00', JSON_OBJECT('event', 'view', 'page', 'future')),

(33, '2026-07-10 09:45:00', JSON_OBJECT('event', 'search', 'term', 'maxvalue')),

(34, '2026-08-01 10:00:00', JSON_OBJECT('event', 'purchase', 'amount', 750)),

(35, '2026-09-01 11:15:00', JSON_OBJECT('event', 'retained_future'));

Now you can run the following command to delete all rows before the 1st of May, which, by the way, matches the entire first partition in the table.

pt-archiver \
  --source h=localhost,D=mydb,t=events,m=pt_archiver_partition_drop \
  --where "created_at &lt; '2026-05-01'" \
  --purge

pt-archiver \

--source h=localhost,D=mydb,t=events,m=pt_archiver_partition_drop \

--where "created_at < '2026-05-01'" \

--purge

Notice the Perl plugin must be indicated with the m option in the DSN string.

In practice:

pt-archiver initializes
The plug-in runs
Partitions are dropped
No DELETE statements are executed

Here is what you get from the execution of the above command:

PLUGIN before_begin called
DB=mydb TABLE=events
WHERE=created_at &lt; '2026-05-01'
PLUGIN_DRY_RUN=0
Partition expression: to_days(`created_at`)
Boundary evaluation SQL: SELECT to_days('2026-05-01')
Cutoff date: 2026-05-01
Cutoff boundary value: 740102
Matched boundary partition: p202604, position 1
Eligible for DROP: p202604, boundary 740102
SQL: ALTER TABLE `mydb`.`events` DROP PARTITION `p202604`
Dropped partitions: p202604

PLUGIN before_begin called

DB=mydb TABLE=events

WHERE=created_at < '2026-05-01'

PLUGIN_DRY_RUN=0

Partition expression: to_days(`created_at`)

Boundary evaluation SQL: SELECT to_days('2026-05-01')

Cutoff date: 2026-05-01

Cutoff boundary value: 740102

Matched boundary partition: p202604, position 1

Eligible for DROP: p202604, boundary 740102

SQL: ALTER TABLE `mydb`.`events` DROP PARTITION `p202604`

Dropped partitions: p202604

You can simply verify the table has been managed correctly:

SELECT * FROM mydb.events;

SHOW CREATE TABLE mydb.events;

Now TRUNCATE the table and recreate the data and try now to specify the where conditions that match a RANGE that is not the first in the list of the boundaries.

pt-archiver \
  --source h=localhost,D=mydb,t=events,m=pt_archiver_partition_drop \
  --where "created_at &lt; '2026-06-01'" \
  --purge

pt-archiver \

--source h=localhost,D=mydb,t=events,m=pt_archiver_partition_drop \

--where "created_at < '2026-06-01'" \

--purge

You should get:

PLUGIN before_begin called
DB=mydb TABLE=events
WHERE=created_at &lt; '2026-06-01'
PLUGIN_DRY_RUN=0
Partition expression: to_days(`created_at`)
Boundary evaluation SQL: SELECT to_days('2026-06-01')
Cutoff date: 2026-06-01
Cutoff boundary value: 740133
Matched boundary partition: p202605, position 2
Eligible for DROP: p202604, boundary 740102
Eligible for DROP: p202605, boundary 740133
SQL: ALTER TABLE `mydb`.`events` DROP PARTITION `p202604`, `p202605`
Dropped partitions: p202604, p202605

PLUGIN before_begin called

DB=mydb TABLE=events

WHERE=created_at < '2026-06-01'

PLUGIN_DRY_RUN=0

Partition expression: to_days(`created_at`)

Boundary evaluation SQL: SELECT to_days('2026-06-01')

Cutoff date: 2026-06-01

Cutoff boundary value: 740133

Matched boundary partition: p202605, position 2

Eligible for DROP: p202604, boundary 740102

Eligible for DROP: p202605, boundary 740133

SQL: ALTER TABLE `mydb`.`events` DROP PARTITION `p202604`, `p202605`

Dropped partitions: p202604, p202605

In this case, two partitions have been identified and dropped.

Truncate the table and recreate the data again. Try now to provide a WHERE condition that does not match any of the boundaries in the RANGE.

pt-archiver \
  --source h=localhost,D=mydb,t=events,m=pt_archiver_partition_drop \
  --where "created_at &lt; '2026-04-25'" \
  --purge

pt-archiver \

--source h=localhost,D=mydb,t=events,m=pt_archiver_partition_drop \

--where "created_at < '2026-04-25'" \

--purge

You get the following:

PLUGIN before_begin called
DB=mydb TABLE=events
WHERE=created_at &lt; '2026-04-25'
PLUGIN_DRY_RUN=0
Partition expression: to_days(`created_at`)
Boundary evaluation SQL: SELECT to_days('2026-04-25')
Cutoff date: 2026-04-25
Cutoff boundary value: 740096
No exact partition boundary matches cutoff 740096. Refusing DELETE.

PLUGIN before_begin called

DB=mydb TABLE=events

WHERE=created_at < '2026-04-25'

PLUGIN_DRY_RUN=0

Partition expression: to_days(`created_at`)

Boundary evaluation SQL: SELECT to_days('2026-04-25')

Cutoff date: 2026-04-25

Cutoff boundary value: 740096

No exact partition boundary matches cutoff 740096. Refusing DELETE.

As expected, the tool now refuses to execute anything if it doesn’t find an exact match.

Operational Benefits

This approach provides major advantages.

Dropping partitions is vastly faster than deleting rows, and minimal binary logging is needed, compared to billions of row deletes. There is no massive transactional overhead for managing undo logs and purging. You get then a better InnoDB Buffer Pool stability because of less page churn.

In the end, retention jobs are completed quickly and consistently in a predictable way and at the minimal cost.

Important Caveats

Partition Boundaries Must Match Retention Policy

If partitions contain mixed retention windows, DROP PARTITION may remove too much data. For this reason, ensure correct partition design.

Recommended:

daily partitions
weekly partitions
monthly partitions

aligned with business retention requirements.

Metadata Locks

ALTER TABLE DROP PARTITION still acquires metadata locks.

Test carefully in production.

Backup Awareness

Ensure dropped partitions are no longer needed before removal or use pt-archiver to also copy the data into a remote server or dump the data into a CSV file before running the DROP PARTITION.

Possible Enhancements

The plug-in can be extended further.

Potential improvements:

Support for daily partitions
Support for UNIX timestamp partitions
Dry-run reporting
Automatic partition creation
Push Slack notifications
Export Prometheus metrics
Safety checks for replicas
GTID-aware orchestration
Integration with pt-online-schema-change workflows

These are just some ideas I had meanwhile doing my tests. What you can do by implementing a Perl plugin is only limited by your imagination and your real needs.

Conclusion

pt-archiver remains an excellent tool for implementing retention policies and archival workflows.

However, DELETE-based purging becomes increasingly expensive at scale, even with proper indexing and chunked processing.

For large time-series or historical datasets, RANGE partitioning is often a dramatically superior strategy.

The challenge is that pt-archiver does not natively leverage partition-level operations.

Fortunately, its Perl plug-in architecture allows advanced users to extend its behavior and implement partition-aware cleanup logic.

By combining:

pt-archiver orchestration
MySQL RANGE partitioning
Custom Perl plug-ins

Organizations can achieve:

Faster retention enforcement
Lower operational overhead
Smaller replication impact
Dramatically improved scalability

For large MySQL deployments, this hybrid approach can turn multi-hour purge operations into near-instant metadata operations.

The use case presented in this article is limited to a specific scenario, but you can reuse it or customize it if you have a different kind of RANGE partitioning, for example, not using TO_DAYS().

Take this as just an example of how you can extend pt-archiver. What you can do for real is driven by your needs and/or only limited by your imagination.

More info about extending pt-archiver:
https://docs.percona.com/percona-toolkit/pt-archiver.html#extending