Mac,
I've been more than happy to share my thoughts with Ron and the powers that be.
The Golf Digest list is dramatically different from the other publications'. One thing that has never been clear to me is how much of that is driven by the system (the formula) vs. the relative make-up of the panel. It'd be fine if the list were a fair representation of the collective view of the panel, but I think the formula is too rigid to account for that.
The number cruncher in me thinks it would be relatively easy to figure out what the "right" weights should be for each category: just ask the panelists to submit an overall score along with the category submission, run a regression, and see what shakes out. Golfweek could do this too with their wealth of data (it surprises me that they never do much with this information).
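For concreteness, here's a minimal sketch of that regression, assuming a table of ballots with one row per rater/course submission. The file name and the category columns are placeholders, not GD's actual field names.

```python
# Sketch of the weight-recovery idea: regress each panelist's overall
# score on their category scores and read the fitted coefficients as
# the "revealed" weights. Column names here are placeholders.
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical ballot data: one row per (rater, course) submission.
ballots = pd.read_csv("ballots.csv")
categories = ["shot_values", "design_variety", "memorability",
              "aesthetics", "conditioning", "ambience"]

X = ballots[categories]          # the category scores raters submit
y = ballots["overall_score"]     # the overall score they'd also submit

model = LinearRegression().fit(X, y)

# The coefficients are the weights the overall judgments imply,
# which can then be compared to the weights the magazine actually uses.
for cat, w in zip(categories, model.coef_):
    print(f"{cat}: {w:.2f}")
print(f"R^2: {model.score(X, y):.2f}")
```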
It might be more interesting to turn it around. "Force" the raters to select the better of two courses in a long series of one-to-one comparisons. After that exercise, have them associate attributes with specific courses, qualitatively, not quantitatively. These should be a mix of architectural and non-architectural attributes; the latter should include everything from clubhouse food to their playing companions. (They don't need to be given a choice of every attribute for each course.) Run them through the head-to-heads and attribute linking quickly; don't give them enough time to think about any of it too much.
After collecting the results, infer the "ideal" attributes. Instead of asking people to score the attributes and the overall -- i.e., what raters are supposed to rate -- this approach should get at what raters really rate. The raters would rate the courses in simple fashion and only afterwards ascribe attributes; the analysis would show which attributes matter and how much.
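To make that inference step concrete, here's one way it could be run (a sketch only): treat each head-to-head pick as a binary outcome and regress it on the difference between the two courses' attribute profiles, Bradley-Terry style. The attribute names and the data layout below are invented for illustration.

```python
# Sketch of the inference step: a Bradley-Terry-style logistic regression.
# Outcome: was course A picked over course B in a head-to-head?
# Features: difference between the two courses' attribute profiles
# (the share of raters who linked each attribute to each course).
# All names and the data layout are invented for illustration.
import pandas as pd
from sklearn.linear_model import LogisticRegression

attributes = ["width", "short_par_4s", "green_contours", "views",
              "walkability", "clubhouse_food", "good_company"]

# tags: one row per course, columns = share of raters (0 to 1)
# who associated that attribute with the course.
tags = pd.read_csv("attribute_tags.csv", index_col="course")

# duels: one row per head-to-head.
# Columns: course_a, course_b, a_won (1 if course_a was picked).
duels = pd.read_csv("head_to_heads.csv")

X = tags.loc[duels["course_a"], attributes].to_numpy() \
    - tags.loc[duels["course_b"], attributes].to_numpy()
y = duels["a_won"].to_numpy()

model = LogisticRegression().fit(X, y)

# Large positive coefficients = attributes that actually drive the picks;
# the non-architectural ones can then be identified and screened out.
for attr, w in zip(attributes, model.coef_[0]):
    print(f"{attr}: {w:+.2f}")
```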
The results likely would be quite illuminating. As a bonus, non-architectural attributes could at last be screened out of the final result. I haven't thought it all the way through, but I think this approach would not be troubled by small sample sizes, at least not nearly to the same degree as the current approach: the analysis of attributes isn't constrained by a need for a minimum number of scores across the courses.
Speaking of which, and riffing on your post, another thought: small sample sizes bedevil these rankings. GD tries to solve this problem by using stringent criteria and by "grooming" their pool of raters. That way they don't need every rater to see every course; their hope is that raters rate homogeneously, like interchangeable automatons.
As already discussed, that's not actually how things happen. A workaround for some people on this board is to single out an individual whose opinion they agree with / respect and use their ratings / recommendations. But GD has this huge pool of raters: surely some of them, if freed from the narrow constraints of formulae, would share the individual reader's preferences.
So why not present the views of rater subsets that match the views of the individual reader? The data and analysis discussed above could feed into a collaborative filtering mechanism to do just that.
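As a sketch of what that mechanism might look like (the matrix layout, course names, the cutoff of 25 "taste twins" and the minimum-overlap rule are all made up): score the raters by how closely their ballots match the reader's own, then rank everything by what the closest matches think.

```python
# Sketch of the collaborative-filtering idea: find the panelists whose
# ratings most resemble a reader's own, then rank courses the reader
# hasn't seen by what that like-minded subset thinks of them.
import numpy as np
import pandas as pd

# ratings: raters as rows, courses as columns, NaN where a rater hasn't seen a course.
ratings = pd.read_csv("ratings_matrix.csv", index_col="rater")

def bespoke_ranking(reader_scores: pd.Series, k: int = 25) -> pd.Series:
    """Rank every course by the average score of the k raters most like the reader."""
    # Only compare over the courses the reader has actually scored.
    common = ratings[reader_scores.index]

    def similarity(row: pd.Series) -> float:
        mask = row.notna()
        if mask.sum() < 3:                     # need some overlap to trust the match
            return np.nan
        a, b = row[mask], reader_scores[mask]
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    sims = common.apply(similarity, axis=1).dropna()
    kindred = sims.nlargest(k).index           # the reader's "taste twins"
    # Average what that subset thinks of every course, including ones the reader hasn't seen.
    return ratings.loc[kindred].mean().sort_values(ascending=False)

# e.g. a reader who has scored a handful of courses on a 1-10 scale:
me = pd.Series({"Course A": 9, "Course B": 6, "Course C": 8, "Course D": 7, "Course E": 4})
print(bespoke_ranking(me).head(20))
```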
This would produce the world's first truly bespoke rankings system. It could even be customized to the situation; e.g., "greatest" courses to play alone.
Okay, ramble over.