Subrange repairs in Cassandra

Steven Lacerda
Apr 21, 2020


The subrange repair command takes a start token and an end token:

nodetool repair -st <start_token> -et <end_token> <options> <keyspace> <table>

The first thing to understand is that subrange repairs cannot span token boundaries. You can see what tokens a node owns as the primary or partitioner range via nodetool ring:

$ nodetool ring
Datacenter: SearchAnalytics
==========
Address Rack Status State Load Owns Token
8192980903823075642
10.101.35.252 rack1 Up Normal 3.1 GiB ? -9144028497730107534
10.101.36.29 rack1 Up Normal 3.7 GiB ? -9040309921064747559
10.101.36.29 rack1 Up Normal 3.7 GiB ? -7928172198680695968
10.101.36.29 rack1 Up Normal 3.7 GiB ? -7779485228575103453
10.101.36.16 rack1 Up Normal 4.29 GiB ? -6659387401131168306
10.101.36.29 rack1 Up Normal 3.7 GiB ? -5001499423575491015
10.101.36.29 rack1 Up Normal 3.7 GiB ? -4836161519518188735
10.101.36.16 rack1 Up Normal 4.29 GiB ? -4549431037104360096
10.101.36.16 rack1 Up Normal 4.29 GiB ? -4144046692937582676
10.101.36.16 rack1 Up Normal 4.29 GiB ? -3355867918154707689
10.101.36.29 rack1 Up Normal 3.7 GiB ? -3231601002377561805
10.101.35.252 rack1 Up Normal 3.1 GiB ? -2935776010680986435
10.101.36.29 rack1 Up Normal 3.7 GiB ? -926178401474037256
10.101.35.252 rack1 Up Normal 3.1 GiB ? -681563722651910912
10.101.36.16 rack1 Up Normal 4.29 GiB ? -509669029094950376
10.101.35.252 rack1 Up Normal 3.1 GiB ? 681850053019709972
10.101.36.16 rack1 Up Normal 4.29 GiB ? 3015510142856779001
10.101.36.29 rack1 Up Normal 3.7 GiB ? 4439185777656285256
10.101.35.252 rack1 Up Normal 3.1 GiB ? 5286596102575829058
10.101.35.252 rack1 Up Normal 3.1 GiB ? 6408462580022980409
10.101.35.252 rack1 Up Normal 3.1 GiB ? 6590428700301090619
10.101.36.16 rack1 Up Normal 4.29 GiB ? 7473066361305869170
10.101.36.16 rack1 Up Normal 4.29 GiB ? 7514654700679932416
10.101.35.252 rack1 Up Normal 3.1 GiB ? 8192980903823075642
Warning: “nodetool ring” is used to output all the tokens of a node.
To view status related info of a node use “nodetool status” instead.
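Reading that output, each node owns the half-open range from the previous token (exclusive) up to its own token (inclusive). A minimal Python sketch of that pairing, using a few token values from the ring above (the `<keyspace> <table>` placeholders are just illustrative):

```python
# Tokens copied from the ring output above (a subset, for illustration).
tokens = [
    -9144028497730107534,
    -9040309921064747559,
    -7928172198680695968,
    -7779485228575103453,
]

# Each token "closes" the half-open range (previous_token, token].
ranges = [(tokens[i - 1], tokens[i]) for i in range(1, len(tokens))]

for start, end in ranges:
    print(f"nodetool repair -st {start} -et {end} <keyspace> <table>")
```

Each printed command stays inside exactly one owned range, which is the constraint the rest of this post is about.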

With subrange repair you cannot cross those token range boundaries. Thus, although node .29 owns three ranges that lie directly next to each other:

10.101.36.29 rack1 Up Normal 3.7 GiB ? -9040309921064747559
10.101.36.29 rack1 Up Normal 3.7 GiB ? -7928172198680695968
10.101.36.29 rack1 Up Normal 3.7 GiB ? -7779485228575103453

You still cannot do something like:

nodetool repair -st -9040309921064747559 -et -7779485228575103453 foo

That will result in an error similar to the following, because the requested range crosses a token boundary into another range:

java.lang.RuntimeException: Repair job has failed with the error message: [2020-04-20 19:41:58,501] Repair command #1890 failed with error Requested range (-9040309921064747559,-7779485228575103453] intersects a local range ((-8900428075539446602,-8900643771843359711]) but is not fully contained in one; this would lead to imprecise repair. keyspace: foo

What you need to do instead is repair within the boundary:

10.101.36.29 rack1 Up Normal 3.7 GiB ? -9040309921064747559
10.101.36.29 rack1 Up Normal 3.7 GiB ? -7928172198680695968
10.101.36.29 rack1 Up Normal 3.7 GiB ? -7779485228575103453
nodetool repair -st -9040309921064747559 -et -7928172198680695968 foo

The above will work, because you are repairing within a single token range rather than spanning several.
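A common follow-on is to break one owned range into smaller pieces; as long as every piece stays inside the same token boundary, each one is a valid subrange repair. A hedged Python sketch (the chunk count of 4 and the keyspace foo are illustrative, not prescriptive):

```python
# One token range owned by node .29, from the ring output above.
start, end = -9040309921064747559, -7928172198680695968
n = 4  # number of subranges to split into (illustrative)

step = (end - start) // n
bounds = [start + i * step for i in range(n)] + [end]

# Each consecutive pair of bounds is a repairable subrange that never
# leaves the original (start, end] token boundary.
for st, et in zip(bounds, bounds[1:]):
    print(f"nodetool repair -st {st} -et {et} foo")
```

Smaller subranges mean smaller Merkle trees and less streaming per repair session, at the cost of more sessions.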

An additional note: this is a ring architecture, so when you get to the last token, the range wraps from last to first:

10.101.35.252 rack1 Up Normal 3.1 GiB ? -9144028497730107534
// bunch of other ranges
10.101.35.252 rack1 Up Normal 3.1 GiB ? 8192980903823075642

So, the command when reaching the last token, 8192980903823075642, would be:

nodetool repair -st 8192980903823075642 -et -9144028497730107534 foo
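That wrap-around pair can be generated the same way as all the others by rotating the sorted token list. A small Python sketch, using the smallest and largest tokens from the ring above plus one in between:

```python
# Sorted tokens from the ring output above (a subset, for illustration).
tokens = [
    -9144028497730107534,  # smallest token in the ring
    -926178401474037256,
    8192980903823075642,   # largest token in the ring
]

# Pair each token with its predecessor; the predecessor of the smallest
# token is the largest one, which yields the wrap-around range.
pairs = list(zip([tokens[-1]] + tokens[:-1], tokens))

print(pairs[0])  # the wrap-around range (largest, smallest]
```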

Subrange repairs also provide you with options for repairing with -full or -inc. Always use -full, because incremental repair still has challenges and its defects have not yet been ironed out. As of DSE 5.1.3 or Cassandra 3.11.x, full is the default. There does, however, seem to be a defect: when you run repairs without specifying -full, the log prints incremental: true even though it's running a full repair (note: this only happens when you do not specify a keyspace):

~$ nodetool repair -st -926178401474037256 -et -681563722651910912
INFO: Neither --inc or --full repair options were provided. Running full repairs on tables with MVs, CDC or that were never incrementally repaired: [dropped_messages, range_latency_histograms_global, net_stats, partition_size_histograms_summary, write_latency_histograms, cell_count_histograms, node_slow_log, write_latency_histograms_global, write_latency_histograms_summary, range_latency_histograms_summary, key_cache, write_latency_histograms_ks, read_latency_histograms_summary, thread_pool, partition_size_histograms, range_latency_histograms, read_latency_histograms, sstables_per_read_histograms, read_latency_histograms_global, thread_pool_messages, cell_count_histograms_summary, sstables_per_read_histograms_summary, range_latency_histograms_ks, read_latency_histograms_ks, sstables_per_read_histograms_ks]
[2020-04-23 14:54:06,110] Replication factor is 1. No repair is needed for keyspace 'dse_perf'
[2020-04-23 14:54:06,144] Starting repair command #9 (47897c60-8572-11ea-966a-13ee9400f2d0), repairing keyspace dfs_fs with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], runAntiCompaction: false, # of ranges: 1, pull repair: false)

If you specify a keyspace, it seems that it prints correctly with incremental: false:

$ nodetool repair -st -926178401474037256 -et -681563722651910912 keyspace1
INFO: Neither --inc or --full repair options were provided. Running full repairs on tables with MVs, CDC or that were never incrementally repaired: [standard1, names, test, zip_to_zip_delivery_matrix, counter1, price_sets]
[2020-04-23 13:13:17,491] Starting repair command #8 (3244b990-8564-11ea-966a-13ee9400f2d0), repairing keyspace keyspace1 with repair options (parallelism: parallel, primary range: false, incremental: false, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], runAntiCompaction: false, # of ranges: 1, pull repair: false)
[2020-04-23 13:13:19,819] Repair session 32486310-8564-11ea-966a-13ee9400f2d0 for range [(-926178401474037256,-681563722651910912]] finished (progress: 100%)

I think something that confuses people is -full versus -pr. These are not opposites; in fact, you can use -full and -pr in the same command:

$ nodetool repair -st -926178401474037256 -et -681563722651910912 --full -pr keyspace1
[2020-04-23 14:57:44,675] Starting repair command #16 (c9cbbd00-8572-11ea-966a-13ee9400f2d0), repairing keyspace keyspace1 with repair options (parallelism: parallel, primary range: true, incremental: false, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], runAntiCompaction: false, # of ranges: 1, pull repair: false)
[2020-04-23 14:57:48,530] Repair session c9cf8d90-8572-11ea-966a-13ee9400f2d0 for range [(-926178401474037256,-681563722651910912]] finished (progress: 100%)

If you run -pr, you are just saying that you want to repair the partitioner range that the node owns; it's the same as repairing the ranges the node owns as primary in the ring. Let's look at an example:

$ nodetool ring
Datacenter: SearchAnalytics
==========
Address Rack Status State Load Owns Token
8192980903823075642
10.101.35.252 rack1 Up Normal 3.1 GiB ? -9144028497730107534
10.101.36.29 rack1 Up Normal 3.7 GiB ? -9040309921064747559

In the above, if I run:

$ nodetool repair -st -9144028497730107534 -et -9040309921064747559 -h 10.101.35.252

I'm repairing the node's primary range for that token range, so this would be the same as -pr for that range. However, if I did the following:

$ nodetool repair -st -9144028497730107534 -et -9040309921064747559 -h 10.101.36.29

Then I would be repairing the node's non-primary range, so basically a repair without -pr.

Thus, -pr is moot when running subrange repairs, because you are already specifying both the host to run the repair on and the exact range.

Lastly, you can repair a range within a token boundary that a node does not own. That will repair the replica range on the node, not its partitioner (primary) range.

For general repairs, running -st/-et against the node that owns the range as its primary is generally a good balance between consistency and performance.
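As a closing illustration, here is a rough Python sketch that parses nodetool ring style output into primary ranges per node and emits one full subrange repair command for each. It assumes the column layout shown above (address first, token last) and leaves out chunking and error handling:

```python
# Parse "nodetool ring"-style lines (address ... token) into primary ranges.
RING_OUTPUT = """\
10.101.35.252 rack1 Up Normal 3.1 GiB ? -9144028497730107534
10.101.36.29 rack1 Up Normal 3.7 GiB ? -9040309921064747559
10.101.36.16 rack1 Up Normal 4.29 GiB ? -6659387401131168306
"""

# Map each token to the node that owns it as its primary range.
owners = {}
for line in RING_OUTPUT.splitlines():
    fields = line.split()
    owners[int(fields[-1])] = fields[0]

tokens = sorted(owners)
# Rotate so the wrap-around range (largest, smallest] comes first.
for start, end in zip([tokens[-1]] + tokens[:-1], tokens):
    # Run each command against the node that owns `end` (its primary range).
    print(f"# on {owners[end]}:")
    print(f"nodetool repair -st {start} -et {end} --full <keyspace>")
```

In a real script you would feed in the live ring output and likely split each range into smaller chunks before repairing.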
