I don't believe your erosity estimate is robust enough or accurate enough to be used as a metric.

First, Tennessee is not shaped like a circle, and 10 is not a large number.

And while the total perimeter of a large number of circles does approach a limit of sqrt(N) * 2*pi*R, where R is the radius of the containing circle, it might not do so monotonically. 7 small circles can be placed in a hexagon shape with rounded vertices, which would have a relatively small outer bounding circle. Add an 8th and the bounding circle will expand quite a bit; remove the 7th and the shrinkage will be small. Going from 6 to 7 had little cost, while going from 7 to 8 had a large one.

This can be illustrated with an idealized-Iowa-like state which has infinite granularity but where boundaries are restricted to North-South and East-West lines. We draw equal sized districts (the population density is non-varying).

P is the total perimeter of the districts, so the shared (green) perimeters are counted twice, and the outer black perimeter is counted once. This allows us to use sqrt(N) for our estimate, rather than sqrt(N)-1. This has the advantage that sqrt(1) is non-zero, while sqrt(1)-1 is zero.

So the perimeter of our single district state is 4.00, while for a 2-district state it is 6.00. But E, the estimate of the 2-district perimeter, is sqrt(2/1)*4.00 or 5.66. We grossly underestimated our total perimeter. This is not that surprising. Our districts are non-compact by Iowa standards, with a length twice their width.

We continue to 3 districts, where P is 7.33. Our estimate is based on the 2-district case. That is, E(N+1) = P(N)*sqrt((N+1)/N), so E(3) = 6.00 * sqrt(3/2) = 7.35. In this case our estimate is good. We have one very elongated district (L:W = 3:1) and two somewhat compact districts (L:W = 4:3). This really says that 3 districts are about as good or about as bad as 2 districts. If we based our estimate for 3 districts on the 1-district case, our estimate would be 4.00*sqrt(3/1) or 6.93, which would have been too optimistic.

But for the 4-district case P is 8.00, while our estimate of P(3)*sqrt(4/3) = 8.47 is much higher. Here our districts are all maximally compact.
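For anyone who wants to check the arithmetic so far, here is a minimal Python sketch of the chained estimate E(N+1) = P(N)*sqrt((N+1)/N). The function name and the dictionary are just for illustration, and I am assuming the quoted 7.33 is exactly 22/3.

```python
import math

def estimate_next(p_n: float, n: int) -> float:
    """Estimate the total perimeter for n+1 districts from the n-district total."""
    return p_n * math.sqrt((n + 1) / n)

# Actual total perimeters quoted for the first (equal-area) example.
# Assumption: the quoted 7.33 is exactly 22/3.
actual = {1: 4.0, 2: 6.0, 3: 22 / 3, 4: 8.0}

for n in (1, 2, 3):
    e = estimate_next(actual[n], n)
    print(f"E({n + 1}) = {e:.2f}   P({n + 1}) = {actual[n + 1]:.2f}")
```

The estimate comes in low for 2 districts, close for 3, and high for 4, which is the non-monotonic behaviour described above.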

As we continue on, the estimate for 5 districts is quite low, followed by slightly high, slightly low, slightly high, and then quite high when we once again reach a compact symmetric case for 9 districts.

The first example really wasn't how we were subdividing regions. In this 2nd example, we nibble off smaller regions one at a time.

Somewhat surprising is the two-district case, where we do better than estimated, and much better than where we split into two equal areas. But it is less surprising when we recognize that both regions are fairly compact, with only a corner missing from an almost square larger district. And of course smaller areas have smaller perimeters (P1/P2 = sqrt(A1/A2) for two similar polygons). The small region, with 1/9 of the total area, has a perimeter of 1/sqrt(9) or 1/3 of that of the total area.
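The similar-polygon scaling can be checked in a couple of lines. This is only a sketch: the function name is made up, and it assumes the small region is a square similar to the whole state, whose perimeter is 4.00.

```python
import math

def scaled_perimeter(p_total: float, area_fraction: float) -> float:
    """Perimeter of a similar shape covering area_fraction of the original area."""
    return p_total * math.sqrt(area_fraction)

# A district with 1/9 of the area has 1/3 of the perimeter of the whole state.
print(f"{scaled_perimeter(4.0, 1 / 9):.2f}")
```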

Adding a 3rd district is not quite as compact as expected based on the two-district case, since our larger region is beginning to become less compact. Our estimate based on two districts was 6.53; we only managed 6.67. But an estimate based on 1 district is 6.93, and when we create 3 equal-area districts the perimeter is 7.33.

Adding a 4th district is quite efficient, as we only have to chop off the panhandle of our large region. Our 4 districts are as compact as 3 equal-area districts can be, and better than the 4-district symmetric equal-area case of our first set of examples. That is, we can get a smaller perimeter by having greater variation in region size.

As we continue on, we are worse or better than the estimate based on the previous case, E(N+1) = P(N)*sqrt((N+1)/N), depending on whether we are able to cut off a panhandle of the larger district or the new district has to be cut out of its body. In all cases other than N=8, we are better than the equivalent equal-area version for the same number of districts.

It is instructive to look at the 4-district case in more detail. An estimate based on 1 district would be that we would have a perimeter of 8.00. We managed to do better, with 7.33. And based on the 4-district case, we should be able to do 9 districts with a perimeter of 7.33*sqrt(9/4) = 11.00. We managed only 12.00, even with maximally compact districts.

What went wrong? We essentially gamed the system by creating districts of grossly different sizes, and then were unable to maintain that when subdividing the larger region.

If we use your proposed estimate method, which only measures interior perimeters, our estimate is even worse. The interior perimeter of the 4-district case is 1.67. The estimate for the 9 district case would be 1.67 * (sqrt(9) - 1)/(sqrt(4) - 1) = 3.33. But in reality it is 4.00.

If we adjust these numbers so they are equivalent to my values for total perimeter, the estimate for the 9-district case would be 10.67, which is even further than my estimate of 11.00 from the actual value of 12.00. Your estimate is based on how well you did in creating the first three small districts, with no knowledge of the larger district other than one side and an idea of its area (because we knew that it could be divided into 6 more areas).
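Here is a short Python sketch that puts the two 9-district estimates side by side. The function names are mine, and I am assuming a unit-square state (outer perimeter 4.00) and that 7.33 is exactly 22/3. The conversion from interior length to total perimeter counts shared edges twice and the outer boundary once, as defined earlier.

```python
import math

OUTER = 4.0  # outer boundary of the state (assumed unit square)

def total_estimate(p_n: float, n: int, m: int) -> float:
    """Scale a total perimeter from n districts to m districts."""
    return p_n * math.sqrt(m / n)

def interior_estimate(i_n: float, n: int, m: int) -> float:
    """Scale an interior boundary length from n districts to m districts."""
    return i_n * (math.sqrt(m) - 1) / (math.sqrt(n) - 1)

def interior_to_total(interior: float) -> float:
    """Shared edges are counted twice, the outer boundary once."""
    return OUTER + 2 * interior

p4 = 22 / 3              # total perimeter of the 4-district case (7.33)
i4 = (p4 - OUTER) / 2    # interior boundary length (1.67)

print(f"total-perimeter estimate:   {total_estimate(p4, 4, 9):.2f}")
print(f"interior estimate:          {interior_estimate(i4, 4, 9):.2f}")
print(f"interior estimate as total: {interior_to_total(interior_estimate(i4, 4, 9)):.2f}")
print("actual:                     12.00")
```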

We could do better, by making an estimate for dividing the larger area into 6 districts. The total perimeter for the 4-district case is 7.33, 4.00 of which is the total perimeter of the 3 small districts, and 3.33 for the 4th district. The estimate for the total perimeter after dividing the larger area into 6 districts is 3.33 * sqrt(6/1) = 8.16. Add in the 4.00 for the original 3 districts, and the estimate is 12.16, which is just above the actual 12.00. We went from a slightly non-compact area (length:width of 3:2) to 6 maximally compact districts.
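The piecewise version of the estimate is easy to sketch as well. Again the function name is only illustrative, and 3.33 is taken as 22/3 - 4 on the assumption that the quoted figures are rounded thirds.

```python
import math

def piecewise_estimate(fixed: float, region_perimeter: float, k: int) -> float:
    """Keep the finished districts' perimeter fixed and scale only the
    region that is still to be divided into k districts."""
    return fixed + region_perimeter * math.sqrt(k)

small_districts = 4.0          # total perimeter of the 3 small districts
large_district = 22 / 3 - 4.0  # perimeter of the remaining large district (3.33)

print(f"{piecewise_estimate(small_districts, large_district, 6):.2f}")
```

This comes out just above the actual 12.00, where the whole-state estimate of 11.00 was a full unit too low.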

In your Tennessee example, a better estimate of dividing the eastern edge of the state into 3 districts could be obtained from using that area alone, and not basing it in part on the rest of the state which had a mix of district sizes between smaller ones in the Memphis and Nashville areas, and the more rural areas between the mountains and the Mississippi.

If we use county-county links to measure erosity, this is problematic, since it would require determining links between counties in Virginia, North Carolina, and Georgia, and those in Eastern Tennessee. There may be scaling problems because of county size styles. And how well we divide eastern Tennessee is probably only marginally related to how easy it is to travel across the mountains between Tennessee and North Carolina. A better approach would be to calculate the internal erosity among all counties in the area being divided, and estimate downward.

There are 25 counties in your 3-district Eastern region, 6 counties in your 2-district greater Nashville region, and 14 counties in your Western-Memphis region. The inter-county link counts for these 3 regions are 50, 9, and 26 respectively.

We can estimate the erosity if each of these county areas were consolidated into N regions:

Erosity(N)/County_Links = (sqrt(N) - 1) / (sqrt(ncounties) - 1)

For the eastern region:

Erosity(3) = 50 * (sqrt(3) - 1) / (sqrt(25) - 1) = 9.15

For the greater Nashville region:

Erosity(2) = 9 * (sqrt(2) - 1) / (sqrt(6) - 1) = 2.57

For the western-Memphis region:

Erosity(2) = 26 * (sqrt(2)-1) / (sqrt(14) - 1) = 3.93

These are reasonably close for the first two, if we ignore your proposed county splits (10 and 3). As is to be expected, we missed badly in the 3rd, because there we simply chopped part of Shelby County from the region.

Add these to the 5-region erosity for a final estimate:

42 + 9.15 + 2.57 + 3.93 = 57.65
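The whole calculation fits in a few lines of Python if anyone wants to try other county counts or link counts. The function and variable names are only illustrative; the numbers are the ones quoted above, with 42 taken as the 5-region erosity.

```python
import math

def erosity_estimate(county_links: int, n_counties: int, n_regions: int) -> float:
    """Scale a region's inter-county link count down to n_regions districts."""
    return county_links * (math.sqrt(n_regions) - 1) / (math.sqrt(n_counties) - 1)

# (inter-county links, counties, districts) for each region, as quoted above
regions = {
    "Eastern": (50, 25, 3),
    "Greater Nashville": (9, 6, 2),
    "Western-Memphis": (26, 14, 2),
}
BASE = 42  # 5-region erosity quoted above

total = BASE
for name, (links, counties, districts) in regions.items():
    e = erosity_estimate(links, counties, districts)
    total += e
    print(f"{name}: {e:.2f}")
print(f"Final estimate: {total:.2f}")
```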

BTW, why are links preferred over perimeter?