Surveys often make much of comparisons between industries, such as the average number of lawyers for every billion dollars of revenue, but when are those differences meaningful? If the average for retail is 3 lawyers per billion and the average for consumer products is 3.5, is it legitimate to call out that gap of half a lawyer?
As another example, one study might trumpet that the average total compensation of lawyers in manufacturing is “much less than” that of lawyers in technology, while another announces that the average settlement paid in patent infringement cases last year was “12 percent higher than” the average this year. The question is whether 12 percent is negligible. Those who manage law departments (or law firms) should understand and appreciate whether the differences journalists and publicists seize on are meaningful in a statistical sense: if many surveys or studies were done the same way, would the highlighted gap between averages hold up? This article explains one method for finding out, the t-test.
Let’s try out our statistical tool on real compensation data: Major, Lindsey & Africa’s 2013 report on U.S. law departments. We can compare the average total compensation of lawyers who are not general counsel in two industries to see whether the gap between those averages deserves special attention.
Major, Lindsey’s report has data on 163 lawyers in manufacturing company law departments and 81 lawyers in financial services. The average total compensation was $189,835 for manufacturing and $194,103 for financial services. Is the gap of $4,267 meaningful, or could it simply be that this particular set of 244 lawyers happened to come out that way, and another random set of about the same number of in-house counsel from the same industries would look materially different? The answer lies in the two collections of data points: their respective means, variances, and sample sizes (as reflected in the “degrees of freedom”). Bring out the t-test and we will find out!
To start, you need to appreciate these three pieces of information that the test relies on: average, variance and degrees of freedom.
The averages of the sets of data (data analysts call an average the “mean”) are crucial. Getting an average is simple: add up all the comp figures of the lawyers and divide that sum by the number of lawyers. We presented those averages above, along with the result of subtracting the lower from the higher ($4,267).
Next, variance describes how spread out, how dispersed, a set of numbers is. More technically, variance summarizes how far each piece of data, such as the total compensation of each lawyer in the survey, stands from the average (mean). Thus, if one lawyer’s comp was $250,000, but the mean over all lawyers was $251,000, that lawyer’s (data point’s) deviation was $1,000.
When software squares all the deviations in a sample of data, adds them together, and divides that sum by one less than the number of data points, you have the variance, which is a good indication of dispersion. (The standard deviation of the data, much used in statistics, is the square root of the variance: the number that, multiplied by itself, equals the variance.) In short, the variance puts a number on how variable each of the sets whose means you are comparing is.
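To make those calculations concrete, here is a minimal sketch in Python (our choice of language, not the survey’s) using only the standard library; the five compensation figures are invented for illustration, not drawn from the Major, Lindsey data.

```python
import statistics

# Hypothetical total comp figures for five lawyers (illustrative only)
comp = [250_000, 180_000, 210_000, 305_000, 160_000]

mean = statistics.mean(comp)      # sum of the figures divided by their count
var = statistics.variance(comp)   # squared deviations from the mean, summed,
                                  # then divided by one less than the count
sd = statistics.stdev(comp)       # square root of the variance

print(f"mean = {mean:,.0f}  variance = {var:,.0f}  std dev = {sd:,.0f}")
```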
Third, you need to know how many elements there are in each data set. That number is important so that the t-test can take into account degrees of freedom. Think of degrees of freedom as approximately, but a bit smaller than, the number of data points in the group that provides the averages; for a single sample, it is one less than the number of data points.
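For readers who want the arithmetic behind that adjustment: with a single sample of n data points, the degrees of freedom are n − 1. When a t-test compares two samples whose variances differ, software commonly uses the Welch–Satterthwaite approximation instead, which is what typically produces a fractional figure like the 89.065 reported later in this article:

$$\nu \approx \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$

Here s₁² and s₂² are the two sample variances, n₁ and n₂ are the two sample sizes, and ν is the degrees of freedom the software reports.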
Surveys will tell you averages, but they usually do not pair the average with a variance (or standard deviation). They generally tell you how many participants took part in the survey, but may not tell you how many were in each subset that produced the averages.
An exemplary survey states the variances of its data sets; otherwise, a reader cannot understand or calculate the variability. Fortunately, for this article we have the variances: for the manufacturing group, 2,470,835,865; for financial services, 2,803,001,809. The variances are enormous because the deviations being squared can run to tens of thousands of dollars. [Note that there were wider swings in total comp (a larger dispersion around the mean) among the financial services lawyers.]
We admit we have sneaked in another statistical concept, that of populations versus samples. All the in-house counsel of all manufacturing companies in the United States would be the population; we have only a small sample from that large group. The wonder of statistics is that it enables us to make inferences about a population based on a representative sample from it. Put another way, if we collected compensation figures from randomly selected groups of U.S. lawyers in manufacturing and financial services a thousand times over and calculated the mean for each industry each time, what would the pattern of those thousand pairs of means be? Would the initial finding of a $4,267 difference turn out to be representative, so that we could make much of it?
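A short simulation makes that thought experiment tangible. The sketch below is ours, not the survey’s: it assumes, purely for illustration, that both industries share a single normal compensation distribution (mean $190,000, standard deviation $50,000 – roughly the square roots of the variances reported above), draws samples of 163 and 81 lawyers a thousand times, and counts how often the gap between the two sample means reaches $4,267.

```python
import random
import statistics

random.seed(42)  # make the illustration reproducible

# Illustrative assumption: one shared normal population of total comp
POP_MEAN, POP_SD = 190_000, 50_000
N_MANUF, N_FIN = 163, 81       # the two sample sizes from the survey
OBSERVED_GAP = 4_267           # the gap between the two reported means

gaps = []
for _ in range(1_000):
    manuf = [random.gauss(POP_MEAN, POP_SD) for _ in range(N_MANUF)]
    fin = [random.gauss(POP_MEAN, POP_SD) for _ in range(N_FIN)]
    gaps.append(abs(statistics.mean(manuf) - statistics.mean(fin)))

share = sum(g >= OBSERVED_GAP for g in gaps) / len(gaps)
print(f"Gaps of ${OBSERVED_GAP:,} or more in same-population draws: {share:.0%}")
```

Under these made-up but plausible numbers, roughly half of the same-population draws produce a gap at least that large – a preview of the unimpressive p-value the t-test reports below.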
Below we can see the plot of the data points for the two sets, with each industry’s mean drawn as a vertical line. The unaided eye cannot reach a confident conclusion as to whether the two industries have averages similar enough that sampling error could account for the difference.
We will skip over the requirement for the t-test that the comp data should conform reasonably to a normal distribution – a bell-curve appearance if we plot how many instances there are of each total compensation amount. (The term “normal” does not imply anything about “expected” or “natural” – it is statistical jargon for a specific distribution of data that occurs frequently in practice.) In fact, the data for the manufacturing lawyers does look to be distributed fairly normally, as seen in the histogram below with a kernel density line superimposed. In any case, we have enough data that the test tolerates modest departures from normality.
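Readers who want to run the same visual check on their own comp data could do so along these lines; the sketch uses matplotlib and scipy, and the data array is a normally distributed stand-in because the survey’s raw figures are not public.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
comp = rng.normal(190_000, 50_000, size=163)  # stand-in for real comp figures

fig, ax = plt.subplots()
ax.hist(comp, bins=20, density=True, alpha=0.5)  # histogram scaled to a density

xs = np.linspace(comp.min(), comp.max(), 200)
ax.plot(xs, gaussian_kde(comp)(xs))              # kernel density line superimposed

ax.set_xlabel("Total compensation ($)")
ax.set_ylabel("Density")
plt.show()
```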
Software combines the means, variances, and sample sizes and performs the t-test. With our Major, Lindsey data, it outputs what is called the t statistic, -0.48282, and the degrees of freedom, 89.065. (In the old days, statisticians consulted long tables to translate those two values into the next value, the p-value; now, software tells you.)
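Here is a minimal sketch, using Python’s scipy library, of how software converts a t statistic and degrees of freedom into a p-value; the two input values are the ones reported above, and the call one would run on the raw compensation arrays (which the report does not publish) appears in the closing comment.

```python
from scipy import stats

t_stat, df = -0.48282, 89.065   # values reported by the t-test above

# Two-sided p-value: the chance, if the null hypothesis were true, of a
# t statistic at least this far from zero in either direction
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(f"p-value = {p_value:.4f}")   # approximately 0.6304

# With the raw data one would run a Welch two-sample t-test directly:
# stats.ttest_ind(manufacturing_comp, finance_comp, equal_var=False)
```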
The p-value turns out to be 0.6304. That number tells us that if the two samples had come from a single population – the same distribution of compensation regardless of industry – a gap in means at least as large as the one observed would show up about 63 percent of the time from sampling variation alone. The single-population assumption, by the way, is what statisticians call the null hypothesis: that there is in reality no difference in mean compensation between the industries from which the samples were drawn. A p-value of 0.05 means there is a 5 percent chance that two samples from the same population (null hypothesis true) could have produced a difference in means as big as or bigger than the one observed. With such a small probability, statisticians “reject the null hypothesis,” because it is just too unlikely that a difference in means that big could have been generated from a single population. Since the p-value from this t-test is much higher than 0.05, we cannot reject the null hypothesis: the two means are not statistically distinguishable.
Financial services in-house counsel may indeed make more than their manufacturing counterparts; but based on this large data set and the statistical t-test, that difference shouldn’t be highlighted.
So, as we are bombarded in the legal industry by surveys and the comparisons they announce between averages, we should keep the t-test in mind. Given the respective means, variances, and degrees of freedom of the surveyor’s data set, how much faith can we place in the differences it highlights? Moreover, those who publish survey results should include variances so that readers can apply the potent scrutiny of statistics.
Published September 8, 2015.