Cutting through the bullshit.

Sunday, 13 May 2007

We won't be fooled again

As everybody knows, there’s just one thing that’s more boring than grammar, and that’s statistics. Or maybe it was the other way around? I forget. Anyway, it was in a previous incarnation that I was a grammarian.

Over the last couple of months or so a few things I’ve heard and read here and there have reminded me that a lot of people still aren’t sure what labour force statistics mean. I’ve left some comments around, but I thought I’d try to draw together some basic information that might help people who read this blog to understand where these numbers come from, what they mean, and how they’re used. It’s a dirty job, but somebody has to do it. The whole point of this is to make things clear to people who don’t ordinarily think about statistical issues. I’m grateful to Ablokeimet, who has some considerable expertise in stats, and Meezan, who doesn’t, for their comments and corrections. I’ve actually been sitting on this draft for a few weeks waiting for comments from others on my mailing list, but since none have come through, it must be absolutely intelligible. If you don’t think so, or have any specific queries, please post a comment or email me and I’ll try to respond either directly or by making appropriate revisions, or both.

I suppose it should come as no surprise that a lot of people aren’t too clear on how to interpret statistics. And I think this is a shame, because, as I keep saying, people use statistics to lie and deceive, but statistics are not necessarily inherently mendacious. As Ablokeimet pointed out in a response to my comment on leftwrites, ‘Statistics are far too important to let the Government fiddle with them…The real masters of society want figures they can trust.’ I hasten to add that not every country’s statistical agency is as accommodating of their ruling class’s requirement for robust statistics as, for example, the Australian Bureau of Statistics. In Pakistan, for example, everybody thinks the government tampers with statistics before they’re released. The long delays between collection and publication and other factors certainly make this seem plausible. It may even be true. But a lot of the statistics we read about in the media are credible and can tell us things we need to know, if we understand what they mean. Among the things they can tell us is whether the politician using them is trying to hoodwink us.

So what I want to do is to reveal, for the first time, the deepest, darkest secrets of the statisticians’ craft – secrets so carefully guarded by generations of statisticians that you must often go as far as the end of the book to find them, or even read the glossary on the website!

To begin with, it may be useful to know that the kind of statistics I’m talking about – ‘social’ or ‘population’ statistics – come from two basic kinds of sources – administrative byproduct data and surveys. Administrative data can be very robust and accurate in a jurisdiction that collects it rigorously. If, however, births and deaths are reported haphazardly, for example, then such ‘vitals’ data is useless, or worse.

Then there are two basic kinds of survey: the census and the sample survey. A census aims to enumerate every unit in the relevant population. If it’s reasonably successful in doing so, then the figures that come from it are quite reliable. There is no ‘standard error’ associated with them, so you can confidently extract data on small geographical areas and so forth. The disadvantage of a census, however, is that it is very expensive to survey, say, all 300 million odd people in the US. So, for one thing, they usually only happen once every five or ten years. If you’re lucky. Pakistan’s last decennial census took place in 1987. And for another, they can’t gather data with the fine detail possible when you can ask several carefully worded and sequenced questions. To gather a few basic labour force concepts, the Labour Force Survey in Australia administers a questionnaire of 93 questions, while the 2006 Census asked only thirteen questions to collect labour force data.

Because we know that individuals in society are not completely unique in every respect, it’s possible to surmise general characteristics by observing a proportion of the members of the population of interest, which is what a sample survey does. That means that each unit observed represents a particular number of others. The process can be refined by ‘stratifying’ the sample by factors such as geographical location, sex, and age. By multiplying each observation by its ‘weight’, which is derived from the proportion of the total population represented by the sample, and adding up the results, we arrive at an ‘estimate’. An ‘estimate’ in this sense is not just a guess. It’s the weighted sum of the observations.
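To make the weighting idea concrete, here is a minimal sketch. The strata, the population counts and the six-person sample are all made up for illustration, and real surveys use more elaborate weighting than this.

```python
from collections import Counter

# Hypothetical population counts per stratum (assumed known from elsewhere).
population = {"urban": 8000, "rural": 2000}

# Hypothetical sample: (stratum, is_employed) for each person observed.
sample = [
    ("urban", True), ("urban", True), ("urban", False), ("urban", True),
    ("rural", True), ("rural", False),
]

# Each sampled person stands for population / sample size people in their stratum.
sample_sizes = Counter(stratum for stratum, _ in sample)
weights = {stratum: population[stratum] / n for stratum, n in sample_sizes.items()}


def estimate_employed(sample, weights):
    """Weighted sum: add up the weight of every sampled person who is employed."""
    return sum(weights[stratum] for stratum, employed in sample if employed)


estimated_employed = estimate_employed(sample, weights)
print(weights)             # how many people each sampled person represents
print(estimated_employed)  # the 'estimate' of employed persons
```

Each urban respondent here stands for 2,000 people and each rural respondent for 1,000, so the estimate is just the sum of those weights over the employed respondents.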

The disadvantage of a sample survey, of course, is that it introduces the possibility that the units actually selected for observation may not turn out to be quite as representative as we might have hoped. So there is an element of doubt. Fortunately, the level of doubt is quantifiable. That’s why we have stuff like standard errors and confidence intervals and ‘margins of error’. Essentially, the reported estimate is the midpoint, more or less, of a range of values. The broader the range, the higher the level of confidence that the true value lies within it. So for example, if the reported estimate is 10%, it probably means something like ‘We are 85% sure that it is really between 9% and 11% and 95% sure that it’s between 8.5% and 11.5%’ or ‘We think the proportion is 10%. There’s a 15% chance that the true value is less than 9% or more than 11%, but only a 5% chance that it’s below 8.5% or above 11.5%’. In other words, the 85% ‘confidence interval’ is 9%-11% and the 95% ‘confidence interval’ is 8.5%-11.5%.
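For anyone who wants to see where such ranges come from, here is a rough sketch using the usual normal approximation. The standard error of 0.7 percentage points is an assumption chosen for illustration, so the intervals it gives are only roughly like the ones quoted above, not identical to them.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level):
    """Two-sided interval around the estimate, assuming a normal distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * standard_error, estimate + z * standard_error


estimate = 10.0        # reported estimate, in per cent
standard_error = 0.7   # assumed for illustration, in percentage points

low85, high85 = confidence_interval(estimate, standard_error, 0.85)
low95, high95 = confidence_interval(estimate, standard_error, 0.95)

print(f"85% interval: {low85:.1f}% to {high85:.1f}%")
print(f"95% interval: {low95:.1f}% to {high95:.1f}%")
```

The same estimate and the same standard error give a wider range when you ask for more confidence, which is the point of the paragraph above.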

The real problems arise when you want to compare small populations. If you wanted to compare the number of women who are employed with the number of men who are employed, that would not be a problem when using a sample survey. You could even disaggregate the data by sex, age, and state to compare, for example, the number of unemployed men aged 25-29 in Victoria with the corresponding population in Queensland. But if you were interested in numbers of farriers in the Northern Territory, or practically anything at all in a particular postcode area, because the numbers representing those populations in the actual sample are very small, the standard error would be so high that you couldn’t trust the estimate at all. This is not an issue with Census data, although statistical agencies won’t release data on populations so well defined that it becomes possible to identify characteristics of individuals.
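A small sketch of why small groups are the problem. It assumes simple random sampling, which real labour force surveys do not use, and the 5% proportion and the sample counts are made up, so treat the numbers as indicative only.

```python
import math


def relative_standard_error(proportion, sample_count):
    """Approximate standard error as a fraction of the estimate itself,
    assuming simple random sampling (an assumption, not the real design)."""
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_count)
    return standard_error / proportion


# The same 5% proportion, estimated from ever smaller numbers of respondents.
for sample_count in (30_000, 3_000, 300, 30):
    rse = relative_standard_error(0.05, sample_count)
    print(f"{sample_count:>6} respondents: relative standard error about {100 * rse:.0f}%")
```

As the number of respondents behind an estimate falls, the standard error grows relative to the estimate, until the estimate tells you very little.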

So that’s kind of a quick summary of where the numbers come from. There’s a lot more to say about it and any statistics textbook, or a popular introduction, will say some of that. But I want to move on to the kind of stuff that I think can really contribute to making the numbers meaningful.

The first thing you need to notice about a number, and it doesn’t matter whether it comes from admin data, or a census, or a sample survey, is whether it’s an absolute number or a proportion. In national level population statistics, the former are usually expressed in thousands and the latter in percentages. Proportions reveal important relations, but if all you have is proportions, it can conceal significant matters of scale. More importantly, if you have the original numbers, you can always calculate the proportions and there is often potential to work out relations that are not explicitly reported. When I see proportions and have no way of recovering the absolute numbers they’re based on, it makes me uncomfortable and suspicious.

For example, you may recall a few weeks ago I was discussing an analysis of some survey results that correlated an ‘AntiSemitism Index’ (ASI) with an ‘AntiIsrael Index’ (AII). The authors of the paper under consideration, Kaplan and Small, found a very strong correlation. The higher the AII, the higher the proportion with a high ASI. In other words, it appeared that those who had more anti-Israel attitudes were also likely to have a lot of anti-Semitic attitudes. What I found when I looked at the original data was this:

While it is quite true that just over 56% of those with an AII of 4 [the maximum for that index] recorded an ASI greater than 5, Kaplan and Small don’t mention that the total number with an AII of 4 is 57, or just 1.14% of the sample of 5004. When the numbers get that small, questions of accuracy start to arise.
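The arithmetic behind that worry can be sketched as follows. The sample figures (57 of 5004, and roughly 56%) are the ones quoted above; the standard error formula assumes simple random sampling, which is my simplification and not something the paper states.

```python
import math

sample_size = 5004   # whole sample, as quoted
aii_4_count = 57     # respondents with an AII of 4, as quoted
proportion = 0.56    # 'just over 56%' of them had an ASI greater than 5

share_of_sample = aii_4_count / sample_size

# Approximate standard error of a proportion based on only 57 respondents,
# assuming simple random sampling (an assumption for illustration).
standard_error = math.sqrt(proportion * (1 - proportion) / aii_4_count)
margin_95 = 1.96 * standard_error

print(f"Share of sample with AII of 4: {100 * share_of_sample:.2f}%")
print(f"Approximate 95% margin on the 56%: +/- {100 * margin_95:.0f} percentage points")
```

On those assumptions the 56% could plausibly be out by a dozen or so percentage points either way, which is what ‘questions of accuracy’ means in practice.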

The next thing to notice is what the number is counting. I’ve been talking about ‘units’. It matters what the units are. Among the kinds of units that tend to be relevant in social analysis are the person, the family, the household, the dwelling, the job, the enterprise, etc. There was a discussion on leftwrites reacting to the announcement that the Prime Minister had been crowing, “I do believe that after a year in which 276,000 new jobs were created, it is reasonably [sic] to assume that one of the fundamentals to jobs has been the abolition of the unfair dismissal laws.”

In reality, there were probably more than 276,000 jobs created. The estimate of 276,000 was the increase in the number of employed persons. Clearly, some of those persons may have had more than one job and some of those who were already employed might have taken on, or lost, or left, second jobs. There is probably a correlation between the number of jobs and the number of employed persons, but it’s not necessarily straightforward, and in any case, it is decidedly not the same thing. I don’t regard this as a particularly egregious example, since the bastard probably ended up understating what he thought he was talking about. Of course from the perspective of social analysis, we are more concerned about the number of employed persons than the number of jobs. So he was actually reporting the number of interest, but not describing it correctly.

Like most statistics, out of context, a number like this is just propaganda. The specific context that we need to understand the increase in the number of employed persons is the number of persons who have left the labour force through death, retirement, and the like, and the number who have entered, after leaving school, etc., over that period. The only real gain in employment is the difference between the increase in the number of employed persons reported and the net increase in the labour force. That gain for the year to March 2007, the twelve months since the introduction of the draconian WorkChoices legislation, was 46,400 – not as impressive a figure as 276,000.

It might clarify this if we look at the actual numbers.

Employed persons, March 2007

Employed persons, March 2006

Increase in number of employed persons, March 2006 – March 2007: 276,000

Persons in the labour force, March 2007

Persons in the labour force, March 2006

Increase in number of persons in the labour force, March 2006 – March 2007

Increase in number of employed persons over and above the increase in the size of the labour force: 46,400


(Source ABS, Labour force, Australia. ABS Cat. No. 6202.0, March 2007)

But beyond that, while the increase in the number of employed persons in Australia in the year to March 2007 was higher than the increase over the twelve months to March 2006, 151,600, it doesn’t compare that favourably with the previous twelve months to March 2005, 334,400. As for the real increase of 46,400, it was a big improvement on the 10,000 for the previous year and the 30,000 the year before that. But in the year to March 2004, it was 63,900.
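As a rough check on these figures, the sketch below works backwards from the numbers quoted in the text to the growth in the labour force they imply. The implied figures are my own arithmetic, not numbers taken from the ABS publication, and some of the quoted gains may be rounded.

```python
# Figures quoted in the text: increase in employed persons and the 'real' gain
# (increase in employed persons over and above the increase in the labour force).
years = {
    "year to March 2007": {"employed_increase": 276_000, "real_gain": 46_400},
    "year to March 2006": {"employed_increase": 151_600, "real_gain": 10_000},
    "year to March 2005": {"employed_increase": 334_400, "real_gain": 30_000},
}

implied_labour_force_increase = {}
for year, figures in years.items():
    # real gain = employed increase - labour force increase, so rearrange:
    implied = figures["employed_increase"] - figures["real_gain"]
    implied_labour_force_increase[year] = implied
    share = 100 * figures["real_gain"] / figures["employed_increase"]
    print(f"{year}: labour force grew by about {implied:,}; "
          f"real gain was {share:.0f}% of the headline increase")
```

Most of each headline increase is simply the labour force getting bigger.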

In a more telling example, lenin reported that there was an item in Metro claiming that ‘Almost half the black children in Britain are being raised by single parents, new Government figures reveal.’ The data supporting this assertion was that, ‘The biggest percentage of lone-parent households is among black ethnic groups. Forty-eight per cent of black Caribbean families have one parent, as do 36 per cent of black African households.’ So, what’s wrong with this picture? Leaving aside the confusion between families and households, which is something you expect in the media, how do we know that the 48% of families with one parent account for 48% of the children? Some families have more children than others.

As a matter of fact, the Office for National Statistics publication Social Trends 2007, which appears to be the source of the data reported by Metro, shows that couple families tend to have more children than one parent families. It’s not actually inconceivable that the proportion of Black children in one parent families is ‘almost half’, but the data presented definitely don’t support that conclusion because they enumerate not children, i.e. persons, but families. In other words, they took an estimate of the number of different sized boxes of apples, so to speak, and presented it as an estimate of the number of apples. For what it’s worth, I don’t think Metro was trying to deceive its readers. They probably didn’t notice themselves that they had confused two different units of enumeration to arrive at a bogus conclusion. The point is not to get sucked in when they do it, whether it’s deliberate or not.
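Here is the boxes-and-apples problem in a few lines. Only the 48% comes from the Metro item; the numbers of children per family are invented for illustration and are not from Social Trends.

```python
# Out of every 100 families, 48 have one parent (the quoted figure).
one_parent_families = 48
couple_families = 52

# Hypothetical average numbers of children per family (made up).
children_per_one_parent_family = 1.5
children_per_couple_family = 2.0

children_with_one_parent = one_parent_families * children_per_one_parent_family
children_with_couple = couple_families * children_per_couple_family

share_of_children = children_with_one_parent / (children_with_one_parent + children_with_couple)
print(f"Share of families with one parent: {one_parent_families}%")
print(f"Share of children in one parent families: {100 * share_of_children:.0f}%")
```

With these made-up family sizes, 48% of families holds only about 41% of the children. Different family sizes would give a different answer, which is exactly why the family figure can’t stand in for the child figure.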

The next thing to be aware of is that classifications matter. The whole point of statistics is to be able to group populations on the basis of some perceived similarity and observe whether there are correlations with other phenomena. For example, you might be interested in knowing whether people aged 15 to 19 are more or less likely to be employed than people 20-24. If so, you’d probably be in luck, because these are standard five year age ranges. But if you wanted to know about people aged 16-22, the data would be harder to find. If the survey collected age in ranges rather than in single years or by date of birth, it might just not be available at all. I’ll come back to classification in a minute.

Finally, definitions matter. Statisticians have to define concepts very carefully so they know in any given case which category a unit belongs in. And the definitions don’t always correspond with common parlance. For example, I mentioned just before that Metro had confused households and families. A straightforward definition of a household would be ‘All the people who usually reside in the same private dwelling’. Obviously, this raises a lot of questions, like how we define ‘usual residence’ and ‘private dwelling’, but I’m going to take that as read for the time being. Significantly, this definition allows the possibility that a household can comprise a lone person.

A family, a much more complicated concept, is a group of related people who usually reside in the same private dwelling. This is fairly close to the common parlance definition, although you might want to include people who don’t live together. However, for statistical purposes, it’s not good enough. Rather than defining the concept precisely, I’m just going to give an example of how it would work. Imagine a household comprising a couple, their 20 year old daughter and her child. For statistical purposes, the couple is one family, the daughter and her child, another. And these two families exemplify two of the categories of the classification of Family type – a Couple family without children and a One parent family.

To get to the point, a household may comprise one person or a group of people, who may or may not be related. A family is necessarily a group of people and must be related in particular ways. There may be more than one family in a household, even if all the household members are related. And it can get even more counterintuitive than that, as in the example, where there was a couple family without children, even though their own biological daughter was right there in the same household, but allocated to a different family.
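One way to picture the relationship between the two units is as a nested structure. This is only an illustrative sketch of the example household described above; the class and field names are my own, and it does not implement any agency's actual family coding rules.

```python
from dataclasses import dataclass, field


@dataclass
class Family:
    family_type: str     # a category of the Family type classification
    members: list[str]   # a family is always a group of related people


@dataclass
class Household:
    families: list[Family] = field(default_factory=list)
    other_members: list[str] = field(default_factory=list)  # people in no family

    def persons(self):
        """Everyone who usually resides in the dwelling."""
        in_families = [m for family in self.families for m in family.members]
        return in_families + self.other_members


# The example household: four related people, but two statistical families.
example = Household(families=[
    Family("Couple family without children", ["partner A", "partner B"]),
    Family("One parent family", ["daughter, aged 20", "daughter's child"]),
])

# A lone person is a household, but not a family.
lone_person = Household(other_members=["lone person"])

print(len(example.families), len(example.persons()))
print(len(lone_person.families), len(lone_person.persons()))
```

Counting households, families and persons from the same structure gives three different numbers, which is why it matters which one a statistic is reporting.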

Now it’s one thing to look in the glossary on the website or at the back of a statistical publication and that’s definitely where you start. But unfortunately, statisticians aren’t always totally honest. For example, you often find a definition of household that includes some reference to the members eating together. The idea is to be able to distinguish ‘boarders’ who eat with the rest of the household, from ‘lodgers’, who make their own arrangements. I won’t go into why this is a bogus distinction at this stage, but the point is that in reality, there is probably no attempt made to establish whether or not they actually do eat together. Unrelated children can count as family members. And so forth. Ultimately, to understand what the real concept behind a number is you have to know what questions were asked and stuff like that.

But what I really wanted to talk about in some detail was Labour force status (LFS). There’s a lot of confusion about the concepts. People bandy about the Unemployment rate, as I mentioned before, but not many people really know what it means.

‘Labour force status’ (LFS) is a variable that applies to a specific population of persons: civilian adults. Apart from children and the military, there may be other exclusions, like people residing in institutions, like prisons and monasteries, but these tend to be associated with survey methodology, for example if the sample excludes institutions. Who counts as an adult for these purposes varies from country to country. The age tends to be associated with something like the age at which it becomes legal to employ a child, or the age they can legally leave school, or the like. It is usually all persons aged 15 and over. Note that ‘15 and over’ is not the same thing as ‘over 15’. It’s the same thing as ‘over 14’. But in the US and the UK, it’s 16 and over, in Pakistan, it’s 10 and over. In Bangladesh, it’s 5 and over!

LFS classifies the relevant population of persons into three categories: ‘Employed’, ‘Unemployed’, and ‘Not in the Labour force’ (NILF, or in Canada, ‘Out of the labor force’ (OLF)). To make any sense of statistics like the Unemployment rate, it’s essential to know how each of these categories is defined. You’ll probably find these dark secrets shocking. All the more so because they are accepted everywhere and have been for quite some time. The basic source is Ralf Hussmanns, Farhad Mehran and Vijay Verma, 1990, Surveys of economically active population, employment, unemployment and underemployment: An ILO manual on concepts and methods.

The standard definitions of employed and unemployed as enunciated by the 13th International Conference of Labour Statisticians in 1982 (pp. 242-243 of Hussmanns et al.) are quite long and complex. But in a nutshell, a person is employed if they work ONE hour or more in the reference period (the period about which the survey asks, usually a week) ‘for pay or profit’ (in cash or in kind), or indeed without pay in a family business or on a farm. In fact, you still count as employed even if you didn’t work at all in the reference week, if you had a job and you were off on holiday, compo, strike, or for certain other reasons.

A person is unemployed if they were not employed, as defined, during the reference period, and were available to start work, and were actively looking for work. This is the population to which employers can point and say, ‘If you don’t like the pay or conditions here, there are plenty of others who’ll be glad to have your job.’ In this way, they help to discipline the employed labour force and impose downward pressure on the general level of wages. For these purposes, people who are not working but are not both available to start work and actively making themselves known to potential employers are irrelevant. If they are not available, because they’re sick, or looking after children, or any other reason, they are no use to potential employers at the relevant time. And if they are not actively looking for work, employers won’t know about them, even if they’re available.

The employed and the unemployed combine to form the ‘Labour force’. Everybody else in the civilian adult population in scope is NILF – full time students (who don’t have jobs), homemakers, retired persons, etc. Also, people who are looking for work, but not available to start at the relevant time and people who would start work if offered a job, but who have given up actively looking – ‘discouraged jobseekers’ – are classified as NILF.
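The three-way classification can be sketched as a simple decision rule. This is a simplification of the nutshell definitions above, not the full international definition, and the function and parameter names are mine.

```python
def labour_force_status(hours_worked, has_job_but_absent, available, actively_looking):
    """Classify one in-scope person for the reference period.

    hours_worked: hours worked for pay or profit, or unpaid in a family
    business or farm, during the reference period.
    has_job_but_absent: had a job but was away (holiday, compo, strike, etc.).
    """
    if hours_worked >= 1 or has_job_but_absent:
        return "Employed"
    if available and actively_looking:
        return "Unemployed"
    return "Not in the labour force"


# One hour of work is enough to count as employed.
print(labour_force_status(1, False, True, True))
# Not working, available and actively looking: unemployed.
print(labour_force_status(0, False, True, True))
# A discouraged jobseeker: available, but has given up looking.
print(labour_force_status(0, False, True, False))
# Looking, but not available to start at the relevant time.
print(labour_force_status(0, False, False, True))
```

Note that both availability and active job search are needed to count as unemployed. Failing either test puts a person outside the labour force altogether.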

It’s important to point out that these populations are typically identified through household based Labour Force Surveys and the like. Not everybody who counts as unemployed for these purposes is eligible for unemployment benefits, whatever they’re called. For one thing, the activity test tends to be slightly different. But more importantly, some who are statistically unemployed would not be eligible for benefits because of spouse’s or parents’ income, or other factors. Furthermore, those working few hours or for low pay may retain some level of eligibility for unemployment benefits in some jurisdictions. So you don’t actually expect the count of benefit recipients to match the estimate of unemployed persons, although every once in a while somebody notices that they don’t correspond and raises a fuss in the media. Like they do every few years when somebody reveals the bizarre ‘one hour rule’.

So, now we’ve defined five different populations:

The civilian population aged 15 (or whatever) and over.

1. The labour force, comprising

a. Employed persons and

b. Unemployed persons

2. Persons not in the labour force

This basic division of the civilian adult population into the Labour force and NILF makes sense for the intended economic purposes. But from the perspective of social analysis, it might be more helpful to forget about the Labour force entirely and just distinguish employed persons from not employed persons, like this.

The civilian population aged 15 (or whatever) and over.

1. Employed persons

2. Persons not employed

a. Unemployed persons

b. Persons not in the labour force

You might even want to exclude those persons working short hours or without pay who are currently defined as employed and count them with the unemployed and NILF.

The civilian population aged 15 (or whatever) and over.

1. Fully employed persons

2. Persons not fully employed

a. Persons working short hours or without pay

b. Unemployed persons

c. Persons not in the labour force

Labour force statistics are widely misunderstood and the media contribute to perpetuating the misperceptions. The statistical agencies that collect and disseminate the data could do more to correct them, but to their credit, none of this stuff is really secret – you can find all the definitions and explanatory material you need on the website or in any publication.

Now there are three main indicators that are typically calculated from these populations:

· The Unemployment Rate is the proportion of the labour force that is unemployed; in other words, it is the number of unemployed persons divided by the number in the labour force, i.e. unemployed plus employed.

· The (Labour Force) Participation Rate is the proportion of the adult civilian population that is in the labour force, so the labour force divided by the civilian population.

· The Employment-Population Ratio is the proportion of the adult civilian population that is employed, so the employed population divided by the civilian population.
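The three indicators are simple ratios. In the sketch below the population counts are hypothetical round numbers, not real estimates for any country.

```python
# Hypothetical estimates, in thousands of persons.
employed = 9_500
unemployed = 500
not_in_labour_force = 5_000

labour_force = employed + unemployed
civilian_population = labour_force + not_in_labour_force

unemployment_rate = 100 * unemployed / labour_force
participation_rate = 100 * labour_force / civilian_population
employment_population_ratio = 100 * employed / civilian_population

print(f"Unemployment rate:           {unemployment_rate:.1f}%")
print(f"Participation rate:          {participation_rate:.1f}%")
print(f"Employment-population ratio: {employment_population_ratio:.1f}%")
```

The thing to notice is the denominators: the unemployment rate is a share of the labour force, while the other two are shares of the whole civilian adult population.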

Most economic statistics come from administrative data and surveys of business establishments. Even though labour force statistics come from household surveys, however, they are essentially economic indicators. The unemployment rate, in particular, which is the most commonly reported indicator, is not a social indicator. It is a measure of what labour statisticians call ‘labour market slack’. It is the proportion of the labour force – ‘the currently economically active population’ – that is, the ‘employed’ plus the ‘unemployed’ as defined, who are in the latter category: working less than one hour per week, available to start work, and actively looking for work. In other words, it measures the reserve army of labour rather than, say, the proportion of people who don’t have an income. The reason it’s important to employers and governments is that they are interested in knowing how many people are available to fill shortfalls in labour and making themselves known to employers. It’s a measure of the discrepancy between the supply of and demand for labour. When the unemployment rate is high, there’s a buyer’s market for labour power.

Specifically, it is not an indicator of the proportion of the population who lack means of livelihood. Because of the ‘one hour rule’ and the inclusion of unpaid family workers in defining Employment and the availability and activity tests in the definition of Unemployment, along with a few other less significant factors, the concepts just don’t align with the concerns of social analysts regarding income and so forth.

Usually, I find a more interesting statistic to be the ‘non-employed/population ratio’ – that is, the difference between the Employment/Population Ratio and 100% - the proportion of the civilian population aged 15 and over who are not employed (not to be confused, of course, with ‘unemployed’). In other words, it’s the proportion of the adult civilian population who are either unemployed or NILF. Of course, this is not really an indicator of livelihood, either. But it’s usually easy to find, or to calculate from readily available estimates, and it’s heaps better than the Unemployment rate for this purpose. The reasons it’s not a livelihood indicator are that, for one thing, there are other sources of livelihood than employment, principally investments and benefits. For another, most jobs do not pay enough in one hour per week to really constitute a livelihood. Usually Hours worked data are readily available, so you can calculate more refined indicators, like the proportion of the adult civilian population working full time, or, say, 15 hours per week or more. Other refinements might be to include discouraged jobseekers or other NILF populations with the strictly unemployed to calculate a different kind of ‘non employment rate’.
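A sketch of the alternative indicators, again with hypothetical numbers. The hours bands are my own choice for illustration, not a standard output classification.

```python
# Hypothetical estimates, in thousands of persons.
employed = 9_500
unemployed = 500
not_in_labour_force = 5_000
civilian_population = employed + unemployed + not_in_labour_force

employment_population_ratio = 100 * employed / civilian_population

# Two equivalent ways of getting the non-employed/population ratio.
non_employed_ratio = 100 - employment_population_ratio
same_ratio = 100 * (unemployed + not_in_labour_force) / civilian_population

# Hypothetical breakdown of employed persons by weekly hours worked.
employed_by_hours = {"1-14": 1_000, "15-34": 2_000, "35 and over": 6_500}
working_15_plus = employed_by_hours["15-34"] + employed_by_hours["35 and over"]
share_working_15_plus = 100 * working_15_plus / civilian_population

print(f"Non-employed/population ratio: {non_employed_ratio:.1f}%")
print(f"Working 15 hours or more:      {share_working_15_plus:.1f}% of the population")
```

In this made-up population an unemployment rate of 5% sits alongside more than a third of adults who are not employed at all.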

And while I’m clarifying Labour force concepts, there is often confusion about underemployment. There are two kinds of underemployment – visible (or ‘time-related’) and invisible (or ‘inadequate employment situations’). The international definition of visible underemployment covers persons employed for less than ‘normal hours’ (often 35 hours per week) who want to work more hours and are either available or actively looking. (In Australia, those who report that they’re available to work more hours are underemployed whether they’re actively looking or not.) Significantly, if you already have a full-time job that doesn’t pay enough and you worked your full hours in the reference period, you are not underemployed even if you are available to work more hours and are looking. Although it is obviously the principal concern of many who talk about underemployment, only a few countries attempt to measure invisible underemployment, like not working in the occupation or industry you trained for (and after all, who is?).
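The visible underemployment test can be sketched as below. It follows the summary in the paragraph above rather than the full text of either definition; the function, its parameters and the `australian_rule` switch are my own simplification.

```python
def visibly_underemployed(hours_worked, wants_more_hours, available,
                          actively_looking, normal_hours=35,
                          australian_rule=False):
    """Apply the time-related underemployment test to an employed person."""
    if hours_worked >= normal_hours:
        # Worked full hours: not underemployed, however low the pay.
        return False
    if not wants_more_hours:
        return False
    if australian_rule:
        # Availability is enough, whether or not the person is looking.
        return available
    return available or actively_looking


# Part-timer who wants more hours and is looking, but not available right now.
print(visibly_underemployed(20, True, False, True))
print(visibly_underemployed(20, True, False, True, australian_rule=True))
# Full-timer on low pay who wants, and is looking for, more work.
print(visibly_underemployed(38, True, True, True))
```

The last case is the one that surprises people: a full-time worker who can’t make ends meet is not counted as underemployed.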

Finally, I hasten to add that if anyone is thinking of trying to find the working class in labour force statistics, don’t imagine that it’s straightforward. The standard classification of ‘Status in employment’ has four categories. ‘Own account workers’ have their own unincorporated businesses and don’t employ anybody else. ‘Employers’ also have their own unincorporated businesses and do have employees. ‘Contributing family workers’ are the ones who work for nothing in family businesses and farms. Everybody else is an ‘Employee’. That includes many senior executives and managers. The wording and sequencing of questions in labour force surveys and other survey instruments that collect Status in employment data usually preclude systematic identification of another category widely acknowledged to be of interest – ‘Owner manager of own incorporated enterprise’. In case it’s not obvious, the rationale is that the incorporated enterprise is a separate legal entity that employs its owner. The International Classification of Status in Employment (ICSE), by the way, incorporates a number of subcategories that can capture the owner managers and many other more exotic arrangements like members of cooperatives and work gangs. But even if there were surveys that collected status in employment in sufficient detail to support output to the full ICSE, it still wouldn’t unambiguously identify persons by class.
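The four-category classification can be sketched as a decision rule. The parameter names are hypothetical, and real surveys derive these categories from a sequence of questions rather than from tidy yes/no flags like these.

```python
def status_in_employment(works_in_own_business, business_incorporated,
                         has_employees, unpaid_in_family_business):
    """Assign an employed person to one of the four standard categories."""
    if unpaid_in_family_business:
        return "Contributing family worker"
    if works_in_own_business and not business_incorporated:
        return "Employer" if has_employees else "Own account worker"
    # Everybody else, including owner managers of incorporated enterprises.
    return "Employee"


# A sole trader with no staff.
print(status_in_employment(True, False, False, False))
# An owner manager of an incorporated enterprise with fifty staff.
print(status_in_employment(True, True, True, False))
# A senior executive of somebody else's company.
print(status_in_employment(False, False, False, False))
```

The last two both come out as ‘Employee’, which is why this category is no guide to class.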

It’s possible to get data on occupation that can help to refine the concept. Data on ‘non-managerial employees’, however, while probably closer to what we would be interested in, is still not nearly good enough. For one thing, it’s still a count of Employees, as defined, and more and more actual employees are technically classified as ‘subcontractors’ or ‘consultants’ and would therefore fall into the category of ‘Own account workers’ rather than ‘Employees’. For another, a lot of non-managerial jobs are being reclassified as ‘managerial’, as recently occurred with nurses in the US. Finally, the working class, as Marxists understand it, includes segments of the Unemployed and NILF populations that are even more difficult to identify from labour force statistics.

So now you know.


  1. Ernie, please

    the participation rate has been edging up for the past 5 years. Stop trying to turn honey into shit without a reason. These are good times. Enjoy them while they last

  2. Like they always say, Anonymous, there's no recession in Vaucluse. The point of this post was to try to explain where widely reported labour force statistics come from and what they mean. If you followed the explanation, you'd be able to surmise that an increase in the Participation rate is not necessarily a good thing. It can arise, for example, from changes to welfare entitlements that force the disabled and lone parents to seek work, moving them into the labour force, but hardly improving their quality of life, particularly with cuts to childcare and the like.

    Anyway, these are not good times for everybody, and if you think they are, you probably aren't among those for whose benefit I tried to clarify these issues.