Patrick_Mucci

Re:You must be kidding !
« Reply #50 on: May 06, 2006, 07:30:50 PM »
Jim Nugent,

What you and a few others seem to miss is that we're not segregating the long hitters into 10% increments.

The issue is, that they all hit it long.
That the distance problem is systemic, not isolated.

I only watched the Tournament briefly today, but when Furyk and Van Pelt hit 5-iron into the 217 yard par 3 and Goosen hits a 6-iron, that should tell you that distance is out of control.

Nicklaus in his best days never routinely hit 6 iron 217, or 5 iron either.

I believe Jim Awtrey recently commented that too many people are focused on the distance problem in the context of the driver, overlooking the distance problem with irons.

217 used to be a long iron and the trajectory those clubs traveled on was a lot flatter than a 6-iron.

Thus the intended interfacing of the architecture with the player is rendered useless, especially on older golf courses.

Jim Nugent

Re:You must be kidding !
« Reply #51 on: May 07, 2006, 01:50:30 AM »
Patrick -- I agree with what you just said, and think I did in my post.  Technology is sending the ball into outer space.  Lots of great courses are no longer competitive.  

David was making a somewhat different point.  He was asking if 1) there is a correlation between distance and tour success, and 2) distance is more important now than it used to be.  

I didn't answer the second, but I tried to answer the first.  Distance and tour success seem to me poorly correlated.  The big majority of long hitters don't have much tour success.  Only a few do.  Among those who do have tour success, about half are short or medium hitters.  

Obviously something must be done about today's long ball, if we want to keep playing major tournaments on the great classic courses.  


DMoriarty

Re:You must be kidding !
« Reply #52 on: May 07, 2006, 02:48:10 AM »
Quote
. . . Distance and tour success seem to me poorly correlated.  The big majority of long hitters don't have much tour success. . . .

The fact that many long hitters are not successful is really beside the point.  All it proves is that distance doesn't guarantee success.  No one says it did.  A better way to determine correlation is to examine whether the successful players are longer.  You pay this lip service below, but I don't agree with your conclusions . . .

Quote
Only a few do.  Among those who do have tour success, about half are short or medium hitters.  

Only a few longer hitters have tour success?  Perhaps we should get a few definitions straight here . . . just what do you mean by longer hitters?  tour success?   Looking at the statistics, I have no idea what you mean.  

Of the top 40 money winners last year . . .

11 of 40 (27.5%) averaged-- averaged-- over 300 yards.
24 of 40 (60%) averaged over 290 yards.  
31 of 40 (77.5%) averaged over 285 yards.  
2 of 40 (5%) averaged less than 280 yards.  

What would you call a strong correlation between distance and success?  50% over 300 yards?  100% over 290 yards?  

Remember we are talking about averages.  

Jim Nugent

Re:You must be kidding !
« Reply #53 on: May 07, 2006, 06:40:12 AM »
David, what you've established is that tour players hit the ball further than most golfers.  No surprise there.  Just as interior linemen in pro football are bigger/stronger than the average guy.  If you want to compare pros to everyone else, you are surely right.  Now, yesterday, and probably always.    

But that is not the question, is it?  The question is, on the PGA tour does distance correlate with success?  

Well, around 40% of the top 10, 20, 30 and 40 money winners last year were not even in the top 100 in driving distance.  Compared to their peers, they were short hitters.  

And of the top 40 in driving distance last year, only around 10 by my eyeball count were in the top 40 money winners.  

So a very large percentage of money winners did not hit a long ball last year, compared to other pros.  And nearly all the real long hitters did not make much money at all.  

Like I said, that seems like a weak correlation to me.  


DMoriarty

Re:You must be kidding !
« Reply #54 on: May 07, 2006, 03:32:11 PM »
Jim, it is certainly not just that the Pros are longer; it is also that short hitters have a much harder time making it as Pros.  In other words, as Patrick pointed out, they are all really long.  

But even on the tour, being longer is a big benefit.  

Longest 40 Drivers' average earnings, $1.67 million
Shortest 40 Drivers' average earnings, $743,000
Tour Average Earnings, around $1.12 million

How's that for an advantage?  

Other evidence of the correlation?  

The Top 10 money winners averaged close to 7 yards more off the tee than the Tour average.

The Top 10 money winners also averaged about 10 yards more per drive than the bottom 40 money winners.

The Top 40 money winners averaged around 4 yards more per drive than the Tour Average, and around 8 more yards than the bottom 40 money winners.  

While there are exceptions, distance pays.  



Marc Haring

Re:You must be kidding !
« Reply #55 on: May 07, 2006, 03:42:28 PM »
Just checked out the PGA tour website and Trevor Immelman is averaging 305 yards off the tee and is 61st on the stats for the week.

Jim Nugent

Re:You must be kidding !
« Reply #56 on: May 07, 2006, 04:34:07 PM »
David, your analysis shows some of the problems arguing with statistics.  Of the shortest 40 drivers last year, many are older, in the twilight of their careers.  Here are a few names I saw:  Corey Pavin, Loren Roberts, John Cook, Larry Mize, Kirk Triplett, Jay Haas, Jeff Sluman, Mark O'Meara.  Are they winning less money because they don't hit the ball far enough, or because no one on earth can play tournament golf after 45 the way they did when they were 30?

Here's the next flaw in the analysis.  How many tournaments each player played.  I don't want to take the time to do the averages.  But I do see that guys like Roberts, Haas, Mize and Triplett did not play anywhere near as much as most other players.  Again no surprise.  They are not really full-time players on the tour.  Of course their earnings will be less.

I think you are partly right.  There is something of a correlation.   But it is not strong, and has tons of exceptions: at least 40% and probably more.  IMO the stats make real clear that winning golf still requires a whole lot more than just flogging.  

Patrick_Mucci

Re:You must be kidding !
« Reply #57 on: May 07, 2006, 07:59:57 PM »
In watching a little of the tournament today, on a 490 yard hole, under wet, cold conditions, a player hit a driver, 8-iron to a few feet from the hole.

490 yards under WET, COLD conditions and he hit driver, 8-iron.

To whoever said that there isn't a distance problem:

You must be kidding !
« Last Edit: May 07, 2006, 08:00:25 PM by Patrick_Mucci »

Jordan Wall

Re:You must be kidding !
« Reply #58 on: May 07, 2006, 11:48:26 PM »
If they were that long, wouldn't they be able to reach 650 yard holes in two, in wet and cold conditions?

It is a problem, yes, but I doubt any pro will hit a 300 yard drive and then a 190 yard 8-iron.

DMoriarty

Re:You must be kidding !
« Reply #59 on: May 08, 2006, 01:46:29 AM »
Quote
David, your analysis shows some of the problems arguing with statistics.  Of the shortest 40 drivers last year, many are older, in the twilight of their careers.  Here are a few names I saw:  Corey Pavin, Loren Roberts, John Cook, Larry Mize, Kirk Triplett, Jay Haas, Jeff Sluman, Mark O'Meara.  Are they winning less money because they don't hit the ball far enough, or because no one on earth can play tournament golf after 45 the way they did when they were 30?

If I am not mistaken, didn't you offer up the likes of Brett Wetterich, Scott Gutschewski, John Elliot, Will MacKenzie, and various other inexperienced players struggling to make it on Tour to support your theory that there was no correlation between distance and success?  Interesting that now you want to weed through my statistics and throw out a bunch of established veterans that you think might skew the statistics.  Hmmm . . .

But if you want . . . . Forget about the bottom 40.  Look at the tour averages compared to the long hitters. The long hitters earned almost half-again the amount of the average player.  A half million dollars more.  That seems pretty significant to me!  

Quote
Here's the next flaw in the analysis.  How many tournaments each player played.  I don't want to take the time to do the averages.  But I do see that guys like Roberts, Haas, Mize and Triplett did not play anywhere near as much as most other players.  Again no surprise.  They are not really full-time players on the tour.  Of course their earnings will be less.

Not near as much?   These are all full time players with enough drives to qualify to be listed in the official stats (unlike long hitting winner Ernie Els.) The top 40 longest drivers averaged about 81 rounds.  The 40 shortest drivers averaged about 73 rounds.   I'll bet if we looked closer we would see that this 10% difference is largely explained by cuts made, rather than tournaments entered.  

But even if we adjust the statistics for dollars per round, the longest 40 drivers still made over twice as much money per round played.  
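The per-round adjustment can be checked with a few lines of Python. This is only a rough sketch: it uses the group averages quoted in this thread ($1.67 million and $743,000 in earnings, 81 and 73 rounds) and assumes that dividing one group average by the other is a fair stand-in for average earnings per round, which would not hold exactly for individual players.

```python
# Figures quoted earlier in this thread (2005 season group averages).
longest_40_avg_earnings = 1_670_000   # dollars
shortest_40_avg_earnings = 743_000    # dollars
longest_40_avg_rounds = 81
shortest_40_avg_rounds = 73

# Assumption: average earnings divided by average rounds approximates
# earnings per round; true per-player averages could differ somewhat.
longest_per_round = longest_40_avg_earnings / longest_40_avg_rounds
shortest_per_round = shortest_40_avg_earnings / shortest_40_avg_rounds
ratio = longest_per_round / shortest_per_round

print(f"Longest 40:  ${longest_per_round:,.0f} per round")
print(f"Shortest 40: ${shortest_per_round:,.0f} per round")
print(f"Ratio: {ratio:.2f}")
```

On these inputs the ratio comes out just above 2, so the adjustment for rounds played barely narrows the gap.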

Quote
I think you are partly right.  There is something of a correlation.   But it is not strong, and has tons of exceptions: at least 40% and probably more.  IMO the stats make real clear that winning golf still requires a whole lot more than just flogging.  

Actually I think it is a pretty strong correlation, considering how many factors go into successful golf.  Of the top 40 money winners, 72.5% were in the top 1/2 of drivers, averaging at least 288.4 yards.  

It is easy to focus on a few impressive outliers like Jim Furyk and miss the overwhelming trend to the contrary.   But there just aren't that many Jim Furyks out there, and there are fewer and fewer every year.  

Bryan Izatt

Re:You must be kidding !
« Reply #60 on: May 08, 2006, 03:04:45 AM »
If you want to talk correlation, you should do a proper correlation calculation.  Taking the top 59 earners (leaving out Darren Clarke because there doesn't seem to be an average driving distance for him) this year, up to and including the Wachovia, the following scattergram results:



Visually there is no discernible relationship between money earned (success) and driving distance.  

The correlation coefficient (r) is 0.0224 (a perfect correlation would have an r of 1.000).  The correlation coefficient for this sample of the 59 most successful players this year has a statistical significance of 12% (where 99% would be statistically significant).

The statistical conclusion is that for the top 59 most successful PGA Tour players through the Wachovia tournament this year, there is absolutely no correlation between success and driving distance.
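For readers who want to run this kind of calculation themselves, here is a minimal sketch of a Pearson correlation in plain Python. The function name `pearson_r` and the five data points are illustrative assumptions only; the numbers are placeholders, not Tour statistics, and the actual 59-player data set is not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# PLACEHOLDER values for illustration only (not real Tour data).
distance_yards = [278.0, 285.5, 291.0, 296.5, 303.0]
earnings_usd = [1_900_000, 1_200_000, 2_400_000, 1_000_000, 1_700_000]

r = pearson_r(distance_yards, earnings_usd)
print(f"r = {r:.4f}, r squared = {r * r:.4f}")
```

Swapping in the real driving-distance and earnings columns would give the r discussed in this thread; the function fails with a division by zero if either column has no variation at all.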

From the scattergram, you could infer that there are only a handful of successful players that drive the ball further than 300 yards on average.  Equally there is a handful of successful players that drive the ball less than 280 yards.  I think it would be a safe bet to say that the majority band (between 280 and 300) is about 25 yards longer than it was 15 or 50 years ago.  If you apply that to 14 driving holes per course, the central pool of players has gained about 350 yards off the tee per course.  I suppose that's why the formerly 6900 yard championship courses are now 7300 yards.  Would it destroy the integrity of classic courses to add that 25 yards times 14 holes, by moving back tees, to recalibrate to modern distances?

When I was watching the highlights on the news, they mentioned two shots - a 190 yard 3 iron from Immelman, I think, and a 145 yard 8 iron by Furyk.  You can always find anecdotal examples to support either side of this debate.
« Last Edit: May 08, 2006, 12:21:07 PM by Bryan Izatt »

JLahrman

Re:You must be kidding !
« Reply #61 on: May 08, 2006, 09:17:31 AM »
Bryan,

The problem with your correlation is that it compares average driving distance to total money won.  While any bivariate correlation will always leave out some important considerations, a much more relevant comparison would be to correlate the average driving distance to the average percent of a tournament purse won by each player.  This would address two issues:

1.  The differences between purse sizes in tournaments.
2.  The fact that some players play far more than others.

Have you given us the r or the r-squared?

I'm assuming your 12% significance is done by comparing the correlation to zero, which is the only way I know to test the significance of a correlation.  Using 5% as the significance level, I would interpret your figure to mean that there is really no correlation.

Please let me know if I've misunderstood anything about your post...
« Last Edit: May 08, 2006, 09:54:47 AM by JAL »

Bryan Izatt

Re:You must be kidding !
« Reply #62 on: May 08, 2006, 12:30:19 PM »
Joel,

I managed to drop a zero in reporting the r above.  It should have been 0.0224, not 0.224.  I've edited it to correct it.  So, yes, I'm saying, based on this data, that there is no correlation.

To your point that there could be better data to support or refute the correlation of success and driving distance, of course that's true.  I was using the data that was readily available on the Tour web site.  To try to address part of your point I did a correlation of driving distance and earnings per start to try to accommodate the variation in the number of tournaments played (which ranged from 7 to 14 for this sample group).  The correlation coefficient rose to 0.07.  Still not nearly significant.

This is the r, not the r-squared.

JLahrman

Re:You must be kidding !
« Reply #63 on: May 08, 2006, 04:00:03 PM »
I'm sure data availability is an issue.  And using the earnings per start is a much better indicator in my view, yet still doesn't show any correlation.

DMoriarty

Re:You must be kidding !
« Reply #64 on: May 08, 2006, 05:26:31 PM »
Bryan,  Thanks for the chart.  I have a number of comments.

1.  I think you will agree that all of last year is a better sample if only because a huge chunk of this year's money is still on the table.  

2.  Looking at the top 59 earners really misses the point.  It just isn't a wide enough sample if we are looking for a correlation between distance and success on Tour.  None of the unsuccessful guys even made it into your sample, so your chart tells us nothing about them.

3.  Lastly and perhaps most importantly, I don't think you are using the term "statistically significant" correctly.   Surely one does not need an r value close to 1.000 for the data to be considered statistically significant.  What about situations (like this one) where many different factors combine to produce a result?  Your "statistical significance" definition would make meaningful statistical analysis of such a situation impossible.  

I don't recall exactly how to determine "statistical significance," but I believe it has something to do with the amount of variation over a large sample size.  In other words, if you have a large enough sample size, even data with tiny or partially inconsistent variation may be statistically significant.  

Here is a chart for 2005, listing all the golfers with enough rounds to make the driving distance list, and their monetary winnings.  



I had Excel add a linear trend line (and formula) and calculate the r and r squared.  My numbers are quite a bit different than yours . . .

2005
r     =     0.2869
rsq  =     0.0823

Given the number of variables involved in winning money on tour, this seems like a pretty strong correlation to me.   For example, if we compare it to 1980 (following the USGA's lead) we see that the correlation in 2005 is quite a bit stronger, especially if we compare the "r squared,"  which I believe is the proper one to use when comparing a changing correlation.  

1980
r     =     0.1633
rsq  =     0.0266

This to me looks like pretty convincing evidence that  

1.  There was a correlation between driving distance and winnings in 2005; and

2.  This correlation is much stronger than it was in 1980.

Does anyone know how to quickly do a "t test" or something to figure out whether our difference is "statistically significant"?

« Last Edit: May 08, 2006, 06:12:26 PM by DMoriarty »

JLahrman

Re:You must be kidding !
« Reply #65 on: May 08, 2006, 11:27:13 PM »
David,

The concept of statistical significance is really useless on correlations.  The only measure of statistical significance shows whether or not the correlation is significantly different from zero.

Testing can be done on a data set using chi-square, t-, Mann-Whitney, etc. tests to attempt to find significant differences.  This cannot be done on correlations.

Normally, I would like to see a correlation of at least .5 between two variables on an interval scale before I would consider the correlation meaningful (I'm purposely avoiding using the word significant).  If the idea is that distance is truly the linchpin for performance on Tour, in my opinion both of the correlations you show are low.  However, there is no way to test for any significant differences between your two r values or two r squared values.

While I think it's a bit of a random approach, my idea for a t-test for significant differences would be to take the ten longest drivers from 1980 and 2005 (ten being the random number - it could be anything we wanted), finding and listing the percent of the purse each player won in each tournament he played, and running an independent samples t-test between the two data sets.  This would simultaneously circumvent the issues associated with differing numbers of tournaments played and inflation.
« Last Edit: May 08, 2006, 11:34:54 PM by JAL »

Jim Nugent

Re:You must be kidding !
« Reply #66 on: May 08, 2006, 11:53:28 PM »
David, besides JAL's comment, seems to me another point weakens the correlation.  You are looking for trends for the entire tour.  Well, leave out just three players -- Tiger, Mick and Vijay -- and I think your trend line might change dramatically.  Without doing any calculations, I'm guessing your trend line would have around zero slope.  If so, doesn't that mean that for around 98.5% of all players, the correlation is near zero?  

Bryan Izatt

Re:You must be kidding !
« Reply #67 on: May 09, 2006, 12:50:20 AM »
David,

I'm impressed with your determination in creating the data.  It was kind of tedious to match up the driving distance to the earnings, so I stopped at 59.  Sure, using the full sample set would provide a more reliable answer.  How many did you use?

It is possible that outliers in the data can significantly alter the correlation coefficient.  Your chart has 4 outliers (such as Tiger) above $4M in earnings.  You might try the correlation again to see the effect of removing those outliers.

The easiest way to determine the significance is to consult a table found in most statistics texts that shows the r values at various levels of significance depending on the degrees of freedom.  The degrees of freedom are two less than the number of samples you have.  Assuming you have 200 or so samples an r of 0.138 is significant at 95% while an r of 0.181 is significant at 99%.  Since the r you calculated is greater than either of these r values, then the correlation is significant at least to 99%.  
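As an alternative to looking the value up in a table, the same check can be sketched in Python by converting r to a t statistic, t = r * sqrt((n - 2) / (1 - r^2)). The helper names below are made up for illustration, n = 203 is the sample size mentioned elsewhere in this thread, and the p-value uses a normal approximation to the t distribution, which is an assumption that is only reasonable for samples of a couple of hundred or more.

```python
import math
from statistics import NormalDist


def r_to_t(r, n):
    """t statistic for testing a Pearson r against zero (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def approx_two_sided_p(t):
    """Two-sided p-value using a normal approximation to the t distribution.

    Assumption: the sample is large enough (a few hundred) for the
    normal curve to stand in for the t distribution.
    """
    return 2 * (1 - NormalDist().cdf(abs(t)))


n = 203  # players on both lists, as stated in this thread
for r in (0.138, 0.181, 0.2869):
    t = r_to_t(r, n)
    print(f"r = {r:.4f}  t = {t:.2f}  approx p = {approx_two_sided_p(t):.5f}")
```

The first two r values are the table cutoffs quoted above, so their approximate p-values should land near .05 and .01; the 2005 r of 0.2869 should come out well below both.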

However, the r-squared is an indication of how much of the variation in the earnings is explained by the independent variable - driving distance.  Your calculation shows it to be .08 or 8%. So, driving distance only explains 8% of the variation in earnings, i.e. it doesn't explain much.  That's visually evident from the scattergram.
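The step from r to "share of variation explained" is just squaring. A short sketch, using the two r values posted in this thread for 2005 and 1980 (the variable names are illustrative):

```python
# r values as posted in this thread.
r_by_year = {2005: 0.2869, 1980: 0.1633}

for year, r in r_by_year.items():
    r_squared = r * r
    # r squared is the fraction of variance in earnings accounted for
    # by a straight-line fit on driving distance.
    print(f"{year}: r = {r:.4f}, r squared = {r_squared:.4f} "
          f"({r_squared:.1%} explained, {1 - r_squared:.1%} unexplained)")
```

This is where the "8%" for 2005 comes from, against a little under 3% for 1980.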

The least squares line is obviously not a very good predictor of earnings (or success) given the scatter.

You might try the earning per start to address the issues Joel sees in using this data.

To calculate the significance of the difference in the correlations between 1980 and 2005 you could use the following process from G. David  Garson of NCSU.

"Significance of the difference between two correlations from two independent samples

To compute the significance of the difference between two correlations from independent samples, such as a correlation for males vs. a correlation for females, follow these steps:

   1. Use the table of z-score conversions or convert the two correlations to z-scores, as outlined above. Note that if the correlation is negative, the z value should be negative.
   2. Estimate the standard error of difference between the two correlations as:

      SE = SQRT[1/(n1 - 3) + 1/(n2 - 3)]

      where n1 and n2 are the sample sizes of the two independent samples
   3. Divide the difference between the two z-scores by the standard error.
   4. If the z value for the difference computed in step 3 is 1.96 or higher, the difference in the correlations is significant at the .05 level. Use a 2.58 cutoff for significance at the .01 level.
"

Assuming you used 200 players from each year, the methodology above results in a z value of 1.27. This is not significant at the .05 level, therefore the difference is not really statistically significant.
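The four steps quoted above can be sketched in Python as follows. This assumes the "z-score conversion" in step 1 is Fisher's r-to-z transformation, which is the inverse hyperbolic tangent of r; the function names are illustrative, and the sample sizes of 200 per year are the same assumption made in the paragraph above.

```python
import math


def fisher_z(r):
    """Fisher r-to-z transformation (assumed to be the conversion in step 1)."""
    return math.atanh(r)


def z_for_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    # Step 2: standard error of the difference between the two z-scores.
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    # Step 3: difference between the z-scores divided by the standard error.
    return (fisher_z(r1) - fisher_z(r2)) / se


# r values posted in this thread; n = 200 per year is an assumption.
z = z_for_difference(0.2869, 200, 0.1633, 200)
print(f"z = {z:.2f}")
# Step 4: compare against the 1.96 (.05) and 2.58 (.01) cutoffs.
print("significant at .05" if abs(z) >= 1.96 else "not significant at .05")
```

With these inputs the statistic lands at roughly 1.3, short of the 1.96 cutoff, so the conclusion is the same: the 1980 and 2005 correlations are not significantly different at the .05 level.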

I think therefore that your conclusions are a little strong.  You could try taking out the outliers and using earnings per start to see if that helps demonstrate any stronger relationship.


DMoriarty

Re:You must be kidding !
« Reply #68 on: May 09, 2006, 01:23:10 AM »
If you guys held the USGA to the same standards of proof as you hold the USGA's detractors, then maybe we wouldn't be in the mess we are in now.

Quote
Normally, I would like to see a correlation of at least .5 between two variables on an interval scale before I would consider the correlation meaningful (I'm purposely avoiding using the word significant).  If the idea is that distance is truly the linchpin for performance on Tour, in my opinion both of the correlations you show are low.  However, there is no way to test for any significant differences between your two r values or two r squared values.

JAL,

I just can't see how this is correct.  No one I know has ever claimed that distance is the only factor which determines success.  Rather, the claim is twofold:

1.  Distance has become a much more important factor than it used to be.  
2.  Distance has become too important, compared to the other factors.  

The first claim is potentially verifiable from the statistics, and it looks to me that the statistics do indeed verify this.   Distance is more closely correlated to money than it used to be.  

The second claim is much more subjective and requires that we decide how much of a correlation is too much.   I think the truth of the first claim provides pretty good evidence of the second, but I may have a lower tolerance for the changes in the game than you.  

In either case, to dismiss the correlation because it is not equal to or greater than 0.5 is just too draconian.  I just don't think there is anything magical about r >= 0.5 from a statistics standpoint.   If it were a magical threshold, multiple variables would make any meaningful statistical analysis of multiple-cause events impossible.  I just don't think this is the case.  

Quote
While I think it's a bit of a random approach, my idea for a t-test for significant differences would be to take the ten longest drivers from 1980 and 2005 (ten being the random number - it could be anything we wanted), finding and listing the percent of the purse each player won in each tournament he played, and running an independent samples t-test between the two data sets.  This would simultaneously circumvent the issues associated with differing numbers of tournaments played and inflation.

I think you overestimate the degree of variance in tournaments played.  I am only using stats from those who had enough rounds to qualify in the driving stats.   Also, I'd be willing to bet that the top money guys actually started fewer tournaments.  If so, this works for my point, not against it.  

As for inflation, I don't think it matters much, if at all.   Run the correlation with money rank instead of the dollar figures, and you will see that there was a much stronger correlation in 2005.  

I had Excel run a ttest on both sets of data using all the drives and got extraordinary low values. Something like 2.34E-30.
_________________________________

Quote
David, besides JAL's comment, seems to me another point weakens the correlation.  You are looking for trends for the entire tour.  Well, leave out just three players -- Tiger, Mick and Vijay -- and I think your trend line might change dramatically.  Without doing any calculations, I'm guessing your trend line would have around zero slope.  If so, doesn't that mean that for around 98.5% of all players, the correlation is near zero?

Let me get this straight . . . we are trying to test for a correlation between distance and success, and you want to throw out the best three players on tour because they are too long and too successful?  For what are we trying to correlate again?  

If you want to throw some guys out, why not throw out Furyk, DiMarco, and Funk.  Would that make any sense at all?  What do you suppose would happen to the chart then?

If you think the top 3 just made too much money, then knock them down to Furyk's level and you get substantially the same results as posted.  But eliminating guys because you think they are too long and too successful just isn't going to wash.  

DMoriarty

Re:You must be kidding !
« Reply #69 on: May 09, 2006, 02:12:31 AM »
Quote
I'm impressed with your determination in creating the data.  It was kind of tedious to match up the driving distance to the earnings, so I stopped at 59.  Sure, using the full sample set would provide a more reliable answer.  How many did you use?

All of them listed on both lists. 203, I think. Some, like Ernie Els (a long hitting big winner,) were not on the driving list because they did not have enough rounds.  

Quote
It is possible that outliers in the data can significantly alter the correlation coefficient.  Your chart has 4 outliers (such as Tiger) above $4M in earnings.  You might try the correlation again to see the effect of removing those outliers.

I don't know that we should consider them outliers in the sense of somehow not reflecting what we are trying to study.  They were just incredibly successful and very long.  There is a fixed pool of money, and these guys just got a bigger chunk of that.  Leaving them off would bias the lines against the big hitters.  Furyk and the short hitting successful guys are much more of an anomaly, percentage-wise.

But to humor you guys, here is the chart comparing distance to money rank, getting rid of the perceived outlier problem and inflation problem in one step.  



   r = -0.2273
rsq = 0.0517

It makes a difference but the difference is not huge.

Notice also that I've changed my scale.  It is easier to see that there indeed is a visible trend if we stand back a little.  We were too close to the Monet.  

Quote
Since the r you calculated is greater than either of these r values, then the correlation is significant at least to 99%.

Instead of using a table, I calculated p values in Excel.  The numbers were very low, indicating a probability of a correlation at something greater than 99.9999999%.

Quote
So, driving distance only explains 8% of the variation in earnings, i.e. it doesn't explain much.  That's visually evident from the scattergram.
 

I don't understand the logic behind your conclusion that 8% is "not much." Driving distance is 8% responsible for relative success among a sampling of the best players in the world.  Before you dismiss that number as "not much," shouldn't you at least consider how that percentage has changed over the years?  Going from 2% to 8% seems like a change with some pretty significant ramifications to me.
« Last Edit: May 09, 2006, 02:28:53 AM by DMoriarty »

Bryan Izatt

Re:You must be kidding !
« Reply #70 on: May 09, 2006, 03:31:41 AM »
David,

Quote
If you guys held the USGA to the same standards of proof as you hold the USGA's detractors, then maybe we wouldn't be in the mess we are in now.

Huh????  Are you suggesting we don't statistically critique the USGA studies? I think I did.  Are you suggesting that the USGA would listen to any of us?

Curious that Ernie wasn't on the driving list.  On the money list he showed as playing 11 tournaments.  Should have had somewhere around 80 driving measurements.  Another conspiracy afoot here.

Quote
I don't know that we should consider them outliers in the sense of somehow not reflecting what we are trying to study.  They were just incredibly successful and very long.  There is a fixed pool of money, and these guys just got a bigger chunk of that.  

They are outliers in the statistical sense.  If you read up on how to do statistics you'll see that dealing with outliers is important if you want to have a meaningful trend line.  I'm sure you noticed that of the top 5 earners, three were long and two were short and thus look like outliers.  Using rankings is one way to deal with it.  Why didn't you use rankings for distance too?  What was the slope of the line when you removed the 5 outliers and used the money earned vs distance correlation?
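
To make the rankings idea concrete, here is a rough sketch of a rank (Spearman) correlation, which is just the ordinary correlation computed on ranks for both variables. The distances and earnings are invented for illustration, with one huge earner thrown in; none of these numbers come from the Tour data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Ordinary (Pearson) correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_r(xs, ys):
    """Rank correlation: Pearson r computed on the ranks of both variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# HYPOTHETICAL data: driving distance (yards) and earnings ($ millions).
distance = [270, 275, 280, 285, 290, 295, 300, 305]
earnings = [0.9, 1.1, 0.8, 1.3, 1.0, 1.4, 1.2, 10.0]

r_raw = pearson_r(distance, earnings)
r_rank = spearman_r(distance, earnings)
print(f"Pearson r on raw values: {r_raw:.3f}")
print(f"Spearman r on ranks:     {r_rank:.3f}")
```

Because the $10 million earner only counts as "rank 8" once everything is ranked, his influence on the result is capped, which is the whole reason for ranking both axes.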

Quote
Notice also that I've changed my scale.  It is easier to see that there indeed is a visible trend if we stand back a little.

I didn't find the scale change very helpful.  And changing money ranking to the x-axis suggests that it is the independent variable.  Why don't you replot it using money and distance rankings, put money on the y-axis where it belongs and put the best fit least squares line on it?  And, if you're going to refer to 1980 can you post the 1980 scattergram too.

Quote
Driving distance is 8% responsible for relative success among a sampling of the best players in the world.  Before you dismiss that number as "not much," shouldn't you at least consider how that percentage has changed over the years?  Going from 2% to 8% seems like a change with some pretty significant ramifications to me.

Well, if the other 92% doesn't seem like a lot to you, so be it.  I kind of thought there might be other factors that might do a better job of describing success.  Putts, GIR, sand saves, scrambling, fairways hit.  In the context of a total explanation of success, 8% is not much.

As to comparing to 1980, you are conveniently ignoring the analysis I provided which shows the difference in the correlations is not significant.
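
For reference, one standard way to test whether two correlations from independent samples differ is Fisher's z-test. The sketch below uses the 2% and 8% r-squared figures mentioned in this thread, converted to r. The sample sizes are placeholders I made up (the thread does not give them), and this is not necessarily the same calculation as the analysis referred to above.

```python
import math
from statistics import NormalDist


def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> float:
    """Two-sided p value for the difference between two independent correlations.

    Each r is converted with Fisher's z-transform; the difference is then
    compared against a normal distribution.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than 3 observations")
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z2 - z1) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


r_1980 = math.sqrt(0.02)   # r-squared of 2% quoted in the thread
r_2005 = math.sqrt(0.08)   # r-squared of 8% quoted in the thread
n_1980 = 200               # ASSUMPTION: hypothetical sample size
n_2005 = 200               # ASSUMPTION: hypothetical sample size

p_diff = compare_correlations(r_1980, n_1980, r_2005, n_2005)
print(f"p value for the difference: {p_diff:.3f}")
```

With these assumed sample sizes the p value comes out above the usual 0.05 cutoff. Larger samples would shrink it, so the real player counts matter.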

DMoriarty

Re:You must be kidding !
« Reply #71 on: May 09, 2006, 04:22:29 AM »
Quote
David,

Huh????  Are you suggesting we don't statistically critique the USGA studies? I think I did.  Are you suggesting that the USGA would listen to any of us?

This wasn't directed at you but rather at those who are looking to give weight to Rugge's absurd anecdotes.

Quote
Curious that Ernie wasn't on the driving list.  On the money list he showed as playing 11 tournaments.  Should have had somewhere around 80 driving measurements.  Another conspiracy afoot here.

Look at the statistics.  Most of the guys on the list have 70 or 80 rounds, which is much more than Ernie had.  He just didn't play enough rounds.  


Quote
They are outliers in the statistical sense.  If you read up on how to do statistics you'll see that dealing with outliers is important if you want to have a meaningful trend line.  I'm sure you noticed that of the top 5 earners, three were long and two were short and thus look like outliers.  Using rankings is one way to deal with it.  Why didn't you use rankings for distance too?  What was the slope of the line when you removed the 5 outliers and used the money earned vs distance correlation?

I am not all that concerned with the trend line. I am concerned with the correlation.  The "outliers" did not make a huge difference in the correlation. I put the trend line on the chart to demonstrate that there was actually a trend up, but took it off in the next chart because, as you pointed out, it is of limited value.  I have, and the correlation is still there.  Why don't you use the rankings and double-check it?   I removed 6 outliers (3 and 3) and the r was something like .24.  

Quote
Notice also that I've changed my scale.  It is easier to see that there indeed is a visible trend if we stand back a little.

Quote
I didn't find the scale change very helpful.  And changing money ranking to the x-axis suggests that it is the independent variable.  Why don't you replot it using money and distance rankings, put money on the y-axis where it belongs and put the best fit least squares line on it?  And, if you're going to refer to 1980 can you post the 1980 scattergram too.

I didn't mean to switch the axes, but you get the idea.  The correlation is there, and the choice of axes is not going to change the r value.  
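
A quick sketch of that point, using invented numbers rather than the Tour data: swapping which variable goes on which axis leaves r exactly the same, although the slope of the fitted line does change.

```python
import math


def sums_of_squares(xs, ys):
    """Return (sxx, syy, sxy), the centered sums used by both r and the slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def pearson_r(xs, ys):
    """Correlation coefficient; symmetric in xs and ys."""
    sxx, syy, sxy = sums_of_squares(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs; NOT symmetric."""
    sxx, _, sxy = sums_of_squares(xs, ys)
    return sxy / sxx


# HYPOTHETICAL data: driving distance (yards) and money rank (1 = best).
distance = [272.0, 280.0, 285.0, 291.0, 296.0, 301.0, 310.0]
money_rank = [150.0, 90.0, 120.0, 60.0, 100.0, 30.0, 45.0]

r_a = pearson_r(distance, money_rank)   # distance on the x-axis
r_b = pearson_r(money_rank, distance)   # money rank on the x-axis
slope_rank_on_distance = slope(distance, money_rank)
slope_distance_on_rank = slope(money_rank, distance)

print(f"r either way: {r_a:.4f} and {r_b:.4f}")
print(f"slopes: {slope_rank_on_distance:.4f} vs {slope_distance_on_rank:.4f}")
```

The two slopes multiply out to r squared, so the axes matter for the trend line even though they do not matter for the correlation.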

Quote
Well, if the other 92% doesn't seem like a lot to you, so be it.  I kind of thought there might be other factors that might do a better job of describing success.  Putts, GIR, sand saves, scrambling, fairways hit.  In the context of a total explanation of success, 8% is not much.

Who says that the other 92% doesn't seem like a lot?  As for other things doing a "better job of describing success," I have no idea what you mean.   My understanding is that it just doesn't work that way.  If it used to be 2%, then 8% is quite a lot.  

Quote
As to comparing to 1980, you are conveniently ignoring the analysis I provided which shows the difference in the correlations is not significant.

If you don't think the change from 1980 matters, then consider my reference to the 2% a hypothetical used to demonstrate how 8 percent could matter quite a bit.

I am not ignoring your analysis, but rather still pondering it.  I can't make sense of it yet, so I have not commented.  I will if I figure it out.  

As for 2003, just what percentage (rsq) do you think would be important?  25%?  50%?  75%?
« Last Edit: May 09, 2006, 04:29:42 AM by DMoriarty »

JLahrman

Re:You must be kidding !
« Reply #72 on: May 09, 2006, 08:50:58 AM »
David,

A couple of notes here before this thread gets too tough for me to follow:

Yes, the use of .5 for a correlation is an arbitrary figure.  Such is life with correlations.  Depending on the data and what you are trying to show, .5 may be plenty high or not nearly enough - I use it as a general benchmark but certainly not a rule.  Bryan notes the idea behind statistical significance of correlations, but this only indicates that the correlation is significantly different from zero.  It doesn't say anything about how much of a correlation we should see before relying on it too much, and I doubt that there are standards for the specific relationship we are trying to show.  Your 28% figure may be high enough, and from the standpoint that it is higher than the correlation from the 1980 data it is definitely interesting.  My point is that, given the low correlation, distance is far from being the principal determinant.  Maybe it is more important than it used to be (which I believe is your point), but still the correlation is not that high.

Secondly, you are right about not worrying about inflation of tour purses.  I hadn't thought that out very well.

It would be interesting to take a whole smorgasbord of statistics and run it through a CHAID or some other decision tree analysis to see if we can make any sense of it.

Looking back, I'm signing off of this thread.  I think valid points are being made but it is getting too difficult to follow for me.
« Last Edit: May 09, 2006, 09:01:19 AM by JAL »

Bryan Izatt

Re:You must be kidding !
« Reply #73 on: May 09, 2006, 11:35:20 AM »

Quote
I am not all that concerned with the trend line. I am concerned with the correlation.  The "outliers" did not make a huge difference in the correlation. I put the trend line on the chart to demonstrate that there was actually a trend up, but took it off in the next chart because, as you pointed out, it is of limited value.  I have, and the correlation is still there.  Why don't you use the rankings and double-check it?   I removed 6 outliers (3 and 3) and the r was something like .24.

Here is a quote from Statsoft that explains why you should be concerned about outliers:

"Outliers. Outliers are atypical (by definition), infrequent observations. Because of the way in which the regression line is determined (especially the fact that it is based on minimizing not the sum of simple distances but the sum of squares of distances of data points from the line), outliers have a profound influence on the slope of the regression line and consequently on the value of the correlation coefficient. A single outlier is capable of considerably changing the slope of the regression line and, consequently, the value of the correlation, as demonstrated in the following example. Note, that as shown on that illustration, just one outlier can be entirely responsible for a high value of the correlation that otherwise (without the outlier) would be close to zero. Needless to say, one should never base important conclusions on the value of the correlation coefficient alone (i.e., examining the respective scatterplot is always recommended)."

When I said that the regression line was of limited value, it was limited because there was so much scatter in the data that using the line as a predictor would not be useful.  The slope of the line is informative in trying to understand if changes in driving distance lead to small or large changes in earnings.  It's one thing to say that driving distance explains a lot about earnings (success) on the Tour (which the data doesn't);  it's another to say that increases in driving distance lead to large (or small) gains in earnings (if you don't provide the slopes of the regression lines).
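
The Statsoft point about a single outlier can be sketched with made-up numbers. The ten base points below were chosen to have almost no correlation, and the eleventh point is an invented extreme value; none of this is the Tour data, it only illustrates the mechanism.

```python
import math


def fit(xs, ys):
    """Return (r, slope) for the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# HYPOTHETICAL base data with essentially no relationship between x and y.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [5, 3, 6, 4, 5, 6, 4, 5, 3, 6]

r_without, slope_without = fit(xs, ys)

# Add one extreme point far from the rest (also hypothetical).
r_with, slope_with = fit(xs + [30], ys + [25])

print(f"without outlier: r = {r_without:.3f}, slope = {slope_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}, slope = {slope_with:.3f}")
```

In this contrived case both the slope and r jump once the single extreme point is included, which is why the scatterplot should be looked at alongside the numbers.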

Is there a different slope in the 1980 line vs the 2005 line? And remember that the outliers have a large effect on the slope, but less on the r-squared.
 

Quote
Who says that the other 92% doesn't seem like a lot?  As for other things doing a "better job of describing success," I have no idea what you mean.   My understanding is that it just doesn't work that way.  If it used to be 2%, then 8% is quite a lot.  

What I mean is that if you analyzed $ vs putting for instance, you might get an r-squared of 20% and that would suggest that putting accounts for more of the variation in earnings than does driving distance.

The 2% to 8% change you report is not quite a lot, as I tried to point out in the following.  It's not statistically significant.


Quote
As to comparing to 1980, you are conveniently ignoring the analysis I provided which shows the difference in the correlations is not significant.

If you don't think the change from 1980 matters, then consider my reference to the 2% a hypothetical used to demonstrate how 8 percent could matter quite a bit.

I am not ignoring your analysis, but rather still pondering it.  I can't make sense of it yet, so I have not commented.  I will if I figure it out.  

As for 2003, just what percentage (rsq) do you think would be important?  25%?  50%?  75%?

Is the r-squared of 0.02 what you got by analyzing the 1980 data or is it hypothetical?

Did you mean 2005 in the last sentence?  I don't have a particular r-squared in mind as being "important".  8%, or 6% if you remove the outliers, is not convincing.  The data indicate that there is some covariance between the two variables.  The r-squared indicates that driving distance explains a very small percentage of the variation in earnings.  The analysis I provided you suggests there is no significance to the difference in the correlations between 1980 and 2005.  You haven't provided the least squares fit line for the two years so it's impossible to say (even though the covariance is weak) how much an increase in driving distance contributes to earnings.  Is the line steep or flat?  

Perhaps you could send me your Excel spreadsheet and I could review it myself.
 


DMoriarty

Re:You must be kidding !
« Reply #74 on: May 09, 2006, 11:35:47 AM »
JAL

This thread has been tough for me to follow for a while now.  

Quote
It would be interesting to take a whole smorgasbord of statistics and run it through a CHAID or some other decision tree analysis to see if we can make any sense of it.

I agree this would be interesting and your saying so reminds me of what my main point was when I got into this thread.   I got involved in response to someone citing a recent Dick Rugge quote as compelling evidence that we do not have a growing distance gap.  

That is my main frustration.  Instead of doing meaningful and useful analysis, the USGA is bombarding us with trite platitudes and tangential anecdotes to explain away some of the most compelling problems in the game.   I don't think it should be you and me who try to wade through the historical statistics; it should be the USGA.  Unfortunate, to say the least.

« Last Edit: May 09, 2006, 11:36:52 AM by DMoriarty »