Ok, this is going to sound stupid at first glance, but hear me out.
One issue with most any system of drawing districts is that "nice" looking districts are in practice a Republican gerrymander, as you'll often just have handful of strong D urban districts, and a bunch of lean / strong R suburban / rural districts.
One way to fix this could be to require that each district have the same mix of urban/suburban/rural voters as the state as a whole, though then the results would really depend on how you (or the Census) defines this term.
Another, more agnostic, policy, would be to force the districts to have equal area, as well as equal population. Thus, all the districts will have equal population density (by construction) and should hopefully have a good mix of urban, suburban, and rural populations. Of course, if drawn by a person, you could still have all sorts of abuses (let me throw this desert or this National Park into this otherwise urban district, etc.), even if you throw out obvious problems (like water area).
But there's a way to do this algorithmically, at least in principle.
Take your state, pick some cardinal direction (let's say north), and divide your state into a northern portion and a southern portion. Move the dividing line until you've split the state into two halves of equal population. Now, this division will not (unless you're very lucky) result in halves of equal area.
But we can repeat this for any number of cardinal directions. Eventually, we can build up a function f(d), the proportion of the state that is on the "d" side of the line when I divide the state into two halves of equal population. Barring some really weird population distributions, this should be a continuous function of d (in practice if we're working with voting blocks, it won't be of course, but it should be close enough).
By construction, f(N) = 1 - f(S). That is, if I had picked South instead of North as my cardinal direction, that's essentially equivalent, and I'm going to get the exact same line dividing my state into two portions of equal population. Obviously, the proportion of the state that's on the north side of that line is just 1 minus the proportion that's on the south side of the line, as all of the state is on one side of the line or the other.
Thus, there's some direction where f(d) is greater than or equal to 1/2, and there's some point where f(d) is less than or equal to 1/2. Since f(d) is continuous (or close enough, for practical purposes), there's some direction d where f(d) = 1/2, and we can divide the state into two halves that have both equal area and equal population.
There may be multiple directions d (even apart from the trivial 180-degree flip) where this is true; you can pick some other criterion to break those ties (the one with the shortest length of the dividing boundary, for instance).
For states with 2^n districts, as opposed to just 2, you can just iterate this process on each of the districts until you have enough.
(1) For non-convex states, this process is not guaranteed to give you contiguous districts. Not sure if this would be a problem in practice, as most states are pretty convex.
(2) This process is decidedly not guaranteed to work if the number of districts needed is not a power of 2. The proof rested on the fact that north and south are equivalent if I'm dividing the state in half, which is no longer true if I'm dividing it into one-third and two-thirds portions. It's pretty easy to find a counterexample where the process doesn't work (e.g. a circular state with some radially symmetric but non-uniform population distribution).
This is a harder problem, but one that has a couple possible solutions:
(A) I can of course approximate 1/p by some binary fraction (e.g. 1/3 \approx 85/256), but (a) you can't slice things too fine without running into continuity problems due to the finite size of voting blocs and (b) there's then a lot of freedom in how to allocate my binary fraction approximation into different districts (e.g. which 85 portions do I put into district A, which 85 portions do I put in district B, and which 86 portions do I put into district C)?
(B) Perhaps there's some way to generalize the algorithm that works to divide a state into some odd prime number of districts, p. Perhaps picking p-1 different directions? Though it's not immediately obvious to me how (or if) this would work.
TL;DR In any event, despite these issues, I'd like to have a try at doing this for some different states (e.g. New Hampshire). Anyone know of a good tool that would work for this? Dave's tool doesn't show surface area, so I expect I'll have to get into the nitty-gritty of the GIS here.
This is grossly similar in concept to the splitline algorithm.
Their initial algorithm would subdivide an area with N districts into two areas of
[N/2] and [(N+1)/2] districts, so 27 would divide into 13 and 14, and 28 would divide into 14 and 14.
The basic premise of using the shortest line is that it would lead the smallest internal boundary length. The problem is that recursive subdivision does not necessarily lead to that result collectively. At one time they discussed that a state could hold a contest where the winner would have the smallest internal boundaries, and splitline would simply set an upper limit. While someone might produce a shorter length, it would be unlikely to have a direct political bias.
My conjecture is that maintaining a more balanced number of internal vertices would result in a better global solution (e.g. districts which are pentagons have an average internal angle of 108 degrees, while triangles average 60 degrees, pentagons will tend to be more circle-like).
Splitline sometimes favors creation of needle-like triangles. A median line will tend to pass through higher density areas. In Colorado, the initial split is a north-south line through the Denver area. But a second split is not an east-west split, but rather a needle-like triangle from the Wyoming border down into the Denver area, with much of the population concentrated in the tip.
In Illinois you get a number of cuts chopping off pieces of Chicago. Eventually you will get some larger divisions down-state but mostly you have districts aligning in Chicago with a northwest-southeast orientation, reflecting some of the earlier cuts.
I had thought about a division of an area with N districts into two areas of M and N-M districts, where
1 <= M < N, and choosing the division that produced the best areal balance. Perhaps instead of area, you could use some geometric compactness measure.
A weakness of shortest line is that it may focus too much on local shapes. The shortest cut across California begins at Santa Monica because it the closest point to the Nevada border (and you would expect that a shortest line be normal to the Nevada border), plus there are lots of people in the area.
Imagine that you are applying your algorithm to California. Your first cut will be in the LA area to get enough population on both sides of the line, and will tilt northward to get enough area. Southern California is wider than northern California, so it might not have to reach to Oregon.
Perhaps an east-west line between San Francisco and Los Angeles will work for a second division. But then the northern coastal areas are going to be split into long then strips.
Back to mechanics. You will want to use a gnomic projection, which ensures that straight lines on your map correspond to great circles on a sphere.
If your atoms are census blocks, each has:
(1) population, which may be 0;
(2) an internal point, which is typically the centroid of the block, but may be shifted to be internal to the census block, and for your purposes is sufficient to represent the population;
(3) a land and water area.
It is relatively trivial for an initial cardinal angle of North, to sort the blocks by a y coordinate (after the gnomic projection) and find the population median, and calculate the total area of the subareas.
For a dividing line of angle d, you don't have to do a full rotation. You can use:
x' = x
y' = y + x sin(d)
For a given angle, sin(d) is a constant.
In effect for a given angle, you don't have to slide the dividing line parallel to the initial line, but can slide it somewhat obliquely, since that line will also be parallel, and you aren't determining distance, but merely whether a point is to the left or right of the line.
You might be able to simply step d by some increment, and then perhaps refine near cross-over points.
I am not confident that f(d) is continuous.
A problem with using census blocks is that there is a ginormous amount of them. Larger states have more than one million census blocks.
An alternative would be to use block groups, which are collections of census blocks with a couple of useful features:
(1) They are roughly 1000 persons in size, so they can aggregate lots of census blocks in sparsely populated areas;
(2) They may have some degree of demographic homogeneity, since they are intended to produce useful statistical data from the ACS, thus districts may have a little bit of COI coherence.
If you use block groups, you probably want to use the center of population (perhaps this is already done by the Census Bureau) for location.
QGIS is quite useful for converting spreadsheet data to maps.
Regarding the power of 2 problem:
I had at one time thought at one time apportioning representatives to whole units like counties, and then using some automated process to divide the residual population.
Think of California, and let's say Los Angeles County is equivalent to 12.437 districts. We apportion 12 districts to Los Angeles, and then assign the residual population of 0.437 districts throughout the county (0.437/12.437) * population of each atom.
Then divide the residual from all counties on a statewide basis.
But let's say we create 1024*53 districts in California. We can apportion those to each county, and perhaps a secondary apportionment to individual cities. Then create districts within each city and county.
And finally aggregate the districts pairwise in 10 rounds to create the final 53 districts.