Unlike sampling units, which appear to be a widely accepted and undisputed term, “much of the discussion on the ‘unit of analysis’ confuses the issues of what should form the basis for coding with what should form the basis for measuring or counting the amount of disclosure” (Milne and Adler, 1999, p. 243; see, e.g., Gray et al.’s, 1995b, assertion that “Sentences are to be preferred if one is seeking to infer meaning… [but] Pragmatically, pages are… easier… unit to measure by hand”, p. 84; also Guthrie et al., 2008, for similar arguments). Part of this confusion may stem from researchers’ use of different, and at times opposing, terms (e.g. Milne and Adler’s (1999) coding units are what Krippendorff (2004) describes as context units and Neuendorf (2002) as analysis units, whilst Milne and Adler’s measurement units are Krippendorff’s recording/coding units and Neuendorf’s (2002) data collection units; see also Walden and Schwartz [1997]; Unerman [1998, as cited by O’Dwyer, 1999, pp. 229-232]; O’Dwyer [1999], for alternative interpretations). In this paper, Krippendorff’s (2004) terminology is adopted, in accordance with a number of other prominent CA theorists (Grey et al., 1949; Berelson, 1952; Osgood, 1959; Stone et al., 1966; Weber, 1985; 1990).

Most CA studies refer to a single unit of analysis, most frequently the recording unit for volumetric studies and the context unit for index studies.

This is not surprising: in index studies, where the recording unit is most frequently the presence or absence of specific information, it is usually the context unit that needs attention (see, e.g., Patten and Crampton, 2004, pp. 39-40, for an illustrative account of how sentences were used as a basis for coding for their index). In volumetric studies, by contrast, the recording units seem to be of more importance. Since context units, unlike sampling and recording units, “are not counted, need not be independent of each other, can overlap, and may be consulted in the description of several recording units” (Krippendorff, 2004, p. 101), researchers do not often explicitly discuss their choice of context unit (but see Zéghal and Ahmed, 1990), and meaning is coded perhaps by section heading, phrase or sentence (Buhr, 1994; Campbell, 2004).
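To make the contrast concrete, the presence-or-absence recording unit of index studies can be sketched as follows, with the sentence serving as the context unit that is consulted. This is a minimal illustration under stated assumptions: the checklist items and keywords are hypothetical, and keyword matching is only a crude stand-in for the judgements of trained coders.

```python
# Hypothetical checklist: item -> keywords signalling that item (an assumption,
# not taken from any cited study; real index studies rely on human coders).
CHECKLIST = {
    "emissions": ("emission", "co2", "carbon"),
    "employee_safety": ("injury", "safety"),
    "community": ("donation", "community"),
}


def disclosure_index(sentences):
    """Score each item 1 if any sentence (the context unit) mentions it, else 0.

    The recording unit is presence/absence, so repeated mentions add nothing.
    """
    scores = {}
    for item, keywords in CHECKLIST.items():
        present = any(k in s.lower() for s in sentences for k in keywords)
        scores[item] = 1 if present else 0
    return scores


report = [
    "We reduced CO2 emissions by 5% this year.",
    "Our community programme funded three schools.",
]
scores = disclosure_index(report)
print(scores)  # {'emissions': 1, 'employee_safety': 0, 'community': 1}
print(sum(scores.values()) / len(scores))  # unweighted index: 2 of 3 items
```

Because only presence is recorded, a volumetric measure of the same report (words, sentences or page area) would carry information that this index discards.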

There seems to be no logical limit to the size of context units. As Krippendorff (2004) notes, “sentences are the minimal context units for individual words, but sentences may not be enough” (p. 101); at times, for example when deciding whether a commentary has a positive or negative context, “analysts might need to examine even larger context units, such as a paragraph or a whole speech.... The best content analyses define their context units as large as is meaningful (adding to their validity) and as small as feasible (adding to their reliability)” (ibid., pp. 101-102). In this regard, Milne and Adler (1999) demonstrate that “sentences are far more reliable than any other unit of analysis” (p. 243) and further assert that “Most social and environmental content analyses in fact use sentences as the basis for coding decisions” (ibid.).

Naturally, researchers need to define their context units clearly before they begin recording. Abbott and Monsen (1979) seem to consider a prominent type of error in CA to be “the formulation of categories that do not reflect all the issues actually contained in the report that are of policy interest” (p. 506). As Holsti (1969a) points out, “categories should reflect the purposes of the research, be exhaustive, be mutually exclusive, independent, and be derived from a single classification principle” (p. 95, emphasis in original). In CSR research it seems particularly useful to establish clear rules as to what constitutes CSR and what does not, a problem that largely stems from the variety of definitions of the field, which are “generally too exclusive… or too all-embracing” (Gray et al., 1995b, fn4, p. 89)9. As Milne and Adler (1999) empirically attest for inter-rater reliability, “by far the greatest proportion of disagreements concerned whether a particular sentence was or was not a social disclosure regardless of the coder. If coders had agreed to a sentence being a social disclosure (regardless of which theme) they were relatively unlikely to disagree over which theme, what sort of evidence and what type of news characteristics the sentence contained” (p. 252).
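The two decisions that Milne and Adler separate (whether a sentence is a social disclosure at all, and if so which theme it belongs to) can be scored separately, as in the minimal sketch below. It uses simple percentage agreement on hypothetical codings, not data from any cited study; chance-corrected coefficients such as Krippendorff’s alpha are commonly preferred when reliability is formally reported.

```python
from typing import Optional, Sequence

Label = Optional[str]  # None = "not a social disclosure", otherwise the theme


def agreement_breakdown(coder_a: Sequence[Label], coder_b: Sequence[Label]):
    """Split simple percentage agreement into the two coding decisions."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must rate the same sentences")
    pairs = list(zip(coder_a, coder_b))
    # Decision 1: is the sentence a social disclosure at all?
    same_status = sum((a is None) == (b is None) for a, b in pairs)
    # Decision 2: given that both coders call it a disclosure, same theme?
    both_disclosure = [(a, b) for a, b in pairs if a is not None and b is not None]
    same_theme = sum(a == b for a, b in both_disclosure)
    return {
        "disclosure_agreement": same_status / len(pairs),
        "theme_agreement_given_disclosure": (
            same_theme / len(both_disclosure) if both_disclosure else float("nan")
        ),
    }


# Hypothetical codings of six sentences by two coders.
a = ["environment", None, "workplace", "community", None, "environment"]
b = ["environment", "community", "workplace", None, None, "environment"]
print(agreement_breakdown(a, b))  # disclosure agreement 4/6, theme agreement 3/3
```

In this made-up example every disagreement concerns disclosure status, which is the pattern Milne and Adler report.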

In the literature, four major ‘themes’ (what Holsti, 1969a, refers to as the manifest subject of information, not inductively generated) are employed for CSR: marketplace (consumers, creditors), workplace (employees), community, and environment, although there will always be a need to develop an ‘other’ category (Gray et al., 1995b). This classification was employed in the coding spreadsheet of the BA study, which is depicted for illustration purposes in Figure 1. This spreadsheet was based, among others, on the Ernst and Ernst (1978) and Gray et al. (1995b) CSD classifications, as well as on those of the GRI (2002), which Erusalimsky et al. (2006) consider, along with AA1000, to “perhaps represent the very base, the entry-level for analysing reporting in an even vaguely serious manner” (p. 19)10.

This variety of definitions is also reflected in the number of alternative terms that have at times been offered even to name the field, including: Corporate Social Reporting (Gray et al., 1988; 1995a); Social Accounting (Gray, 2002; Gray et al., 1997); Social and Environmental Accounting (Mathews, 1997; Gray, 2006); Social and Environmental Accounting and Auditing (Owen, 2004); Social and Environmental Accountability (Parker, 2005); Social Responsibility Disclosure (Trotman, 1979; Neu et al., 1998); Corporate Social and Ethical Reporting (Adams, 2002); Ethical Reporting (Adams, 2004);

Although, as Epstein (2004) notes, these “are all terms used to describe the measurement and reporting of an organization’s social, environmental, and economic impacts, as well as society’s impacts on that organization, including both positive and negative impacts”, and although the adoption of a specific term for CSR may sufficiently serve the short-term needs of a ‘pragmatically-oriented’ research paper, it should be noted that this also adds to the long-term confusion over the terminology and the (much sought [Parker, 2005]) conceptual framework of the area. Naturally, the adoption of different definitions, and of the context categories that result from them, further impedes meaningful comparison of findings across papers.

Note that this was a multiple classification protocol, in that each coding unit was classified into more than one category, on the premise that each entry may have more than one attribute. It should be further noted, though, that the gain in semantic precision from adopting this multiple classification does not outweigh the potential loss of logical distinctiveness and exclusiveness, since “not all entries need have the same attribute to the same extent” (Weber, 1990). This is particularly the case when latent rather than manifest classification schemes are examined, as discussed further in section 6.

Further, it seems reasonable to take a ‘pragmatic’ rather than a ‘classical’ approach and to suggest that the categories of each study should also reflect its research objectives: in the BA study, the categories reflect the focus on the CSD effects of aviation accidents and, consequently, on Health and Safety disclosures, whilst for Deegan et al. (2002) the categories reflect their focus on Human Resources.

Holsti (1969a) considers “reflect[ing] the investigator’s research question… the most important requirement of categories” (p. 95) and warns that “Unless theory and technique are intimately related, even the minimal requirements of validity cannot be met” (p. 94). This need not, however, apply to general CSR surveys, where a more ‘standardised’ approach may increase comparability and support cumulative research (Berelson, 1952).

Further consideration of the CSR subject categories is beyond the scope of this study.

However, as indicated in Figure 1, when reviewing and deciding on the context units, all the alternative ways of classifying CSD need to be specified and clearly defined. For the BA study, further distinctions were included on the type of disclosure (positive or negative), the possible underlying corporate strategy (3 substantive and 3 symbolic), whether the CSD was mandatory or voluntary, and whether it was narrative or non-narrative. This resulted in an overall total of (19 choices of theme) × (7 choices of strategy) × (3 choices of type) × (2 mandatory/voluntary) × (2 narrative/non-narrative) = 1,596 possible choices for coding each individual CSD. These distinctions will be discussed more extensively in the related findings section.
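The figure of 1,596 is simply the product of the number of options on each of the five coding dimensions. The sketch below reproduces it and enumerates the full code space, for instance to pre-build spreadsheet columns. Only the dimension sizes come from the text; the individual labels for themes, strategies and types are hypothetical placeholders.

```python
from itertools import product
from math import prod

# Dimension sizes as reported for the BA coding spreadsheet; the labels are
# placeholders, since only the counts are given in the text.
dimensions = {
    "theme": [f"theme_{i}" for i in range(1, 20)],       # 19 choices
    "strategy": [f"strategy_{i}" for i in range(1, 8)],  # 7 choices
    "type": [f"type_{i}" for i in range(1, 4)],          # 3 choices
    "regulation": ["mandatory", "voluntary"],            # 2 choices
    "form": ["narrative", "non-narrative"],              # 2 choices
}

n_codes = prod(len(options) for options in dimensions.values())
print(n_codes)  # 1596

# Every combination a coder could assign to a single CSD.
codes = list(product(*dimensions.values()))
print(codes[0])  # ('theme_1', 'strategy_1', 'type_1', 'mandatory', 'narrative')
```

Enumerating the space in this way also makes it easy to check that no combination is accidentally missing from a coding sheet.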

Recording/coding units

Recording/coding units are defined by Holsti (1969a) as “the specific segment of content that is characterized by placing it in a given category” (p. 116). As Krippendorff (2004) elaborates:

Whilst, for the basis of coding, researchers seem to concede that “Sentences are to be preferred if one is seeking to infer meaning” (Gray et al., 1995b, p. 84; see also Milne and Adler, 1999; Unerman, 2000), for the basis of measurement “once the content has been coded… quantification may be done in a number of ways” (Milne and Adler, 1999, p. 243). Indeed, researchers have employed a variety of approaches to measurement, often justifying their choice by empirical evidence (Grey et al., 1949; Patten, 1992; Deegan and Gordon, 1996; Deegan and Rankin, 1996; Hackston and Milne, 1996; Williams, 1999; Campbell, 2000) that supports the suggestion “that measurement error between various quantification techniques is likely to be quite negligible” (Milne and Adler, 1999, p. 243). However, as illustrated below, each unit has distinct advantages and disadvantages that need to be considered when selecting units and interpreting results.

The four types of recording units considered here are words, sentences, proportion of pages and page size data11. A summary of a number of issues of concern drawn from the literature regarding these units is provided in Table 3.

Wiseman (1982) and Patten (2002b) also counted lines, as a complement to an ‘index’ CA. Lines have also been employed by, e.g., Bowman and Haire (1975; 1976) and Trotman and Bradley (1981), but in order to estimate the proportion of the total discussion on all issues. Davey (1982, cited in Guthrie and Mathews, 1985, pp. 258-259) interestingly determined the volume of disclosures by treating each word as five characters plus a one-character space (six characters in total), in essence a character-based quantification, similar to the one adopted by Tinker and Neimark (1987).

Although it should be acknowledged that characters in particular could bring extra precision to measurement, as Milne and Adler (1999) note for words, this “seems unlikely to add to understanding” (p. 243). It is further assumed that the arguments for the potential use of these units are subsumed in the discussion of words and sentences. Further, Burritt and Welch (1997) counted passages/thematic units, an approach that is, however, highly contested, since it grants equal weight to issues discussed in a single sentence and to others discussed over whole paragraphs, and since reliability is very difficult to attain with such units (Holsti, 1969).
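Davey’s convention of six characters per word, described above, amounts to dividing a character count by six. The sketch below assumes that every character, including punctuation and single spaces between words, is counted and that runs of whitespace from page layout are collapsed; Davey’s actual counting rules may have differed.

```python
def davey_word_equivalents(text: str, chars_per_word: int = 6) -> float:
    """Disclosure volume in 'standard words' of five characters plus one space.

    Assumption: all characters count, with layout whitespace collapsed to
    single spaces so that line breaks and indents do not inflate the total.
    """
    normalised = " ".join(text.split())
    return len(normalised) / chars_per_word


passage = "The company planted 2,000 trees."
print(len(" ".join(passage.split())))   # 32 characters
print(davey_word_equivalents(passage))  # about 5.33 standard words (5 actual words)
```

The gap between 5.33 standard words and 5 actual words illustrates the small extra precision that character-based counts offer, which, as noted above, seems unlikely to add to understanding.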

Words, sentences and proportion of pages

As illustrated in Table 2, a number of CSR studies have employed words or sentences as the recording unit. As further illustrated in Table 3, these two approaches share a number of benefits and limitations, and their inter-relation was empirically validated as early as 1947 (Dollard and Mowrer, 1947). Neither approach accounts for differences in typeface within the document (Hackston and Milne, 1996) or for repetitions in the information (Patten, 2002a); however, neither is affected by variations in the general font size of different documents (Tilt and Symes, 1999), by the presence of margins or blank pages (Gray et al., 1995b), or by whether the sources are in electronic form (particularly internet or .pdf files) or on microfiche (Campbell, 2004), and both generally seem to “lend themselves to a more controllable analysis” (Gao et al., 2005).
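The independence of word and sentence counts from font size, margins and layout follows from the fact that both are computed on the text itself. The minimal sketch below rests on simplifying assumptions: words are whitespace-delimited tokens, and sentences end at ‘.’, ‘!’ or ‘?’ followed by whitespace, a heuristic that over-counts abbreviations such as ‘e.g.’ and is no substitute for a coder’s judgement.

```python
import re


def count_words(text: str) -> int:
    """Words as whitespace-delimited tokens (a simplifying assumption)."""
    return len(text.split())


def count_sentences(text: str) -> int:
    """Naive sentence count: split after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


disclosure = ("We cut water use by 12%. Two employees were injured on site! "
              "Further details are provided in the safety report.")
print(count_words(disclosure), count_sentences(disclosure))  # 20 3

# The same text reflowed onto narrow lines (as in a different layout or font
# size) yields identical counts, unlike a page-area measure.
reflowed = disclosure.replace(" ", "\n")
print(count_words(reflowed), count_sentences(reflowed))  # 20 3
```

By the same token, neither function can see typeface differences or tell whether a sentence repeats earlier information, which mirrors the shared limitations noted above.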

Compared to sentences, words seem to have the advantage of being “the smallest unit of measurement for analysis and can be expected to provide the maximum robustness in assessing the quantity of disclosure” (Wilmshurst and Frost, 2000, p. 16). As Krippendorff (2004) similarly argues, “To ensure agreement among different analysts in describing the coding/recording units of a content analysis, it is desirable to define these units of description as the smallest units that bear all the information needed in the analysis, words being perhaps the smallest meaningful units of text… and the safest recording unit for written documents” (pp. 100, 104). Further, using words as the recording unit allows tables to be included in the analysis (but see Hackston and Milne’s, 1996, approximation of one table line to one sentence, which allows tables to be captured when sentences are the recording unit as well).
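The two ways of bringing tables into a count can be contrasted in a short sketch: counting the words in every cell, versus Hackston and Milne’s approximation of one table line per sentence. The table contents are hypothetical, and two choices are assumptions of this sketch rather than rules from the cited studies: header rows are excluded, and numeric cells count as words.

```python
from typing import List

Table = List[List[str]]  # rows of cell text, header row excluded (assumption)


def table_word_count(table: Table) -> int:
    """Words in a table: whitespace-delimited tokens in every cell."""
    return sum(len(cell.split()) for row in table for cell in row)


def sentence_equivalents(narrative_sentences: int, table: Table) -> int:
    """Narrative sentences plus one 'sentence' per table line."""
    return narrative_sentences + len(table)


# Hypothetical two-line health and safety table beside four narrative sentences.
safety_table = [
    ["Lost-time injuries", "12", "9"],
    ["Fatalities", "0", "0"],
]
print(table_word_count(safety_table))         # 7
print(sentence_equivalents(4, safety_table))  # 6
```

The word count depends on how many cells and tokens a table has, whereas the line-based approximation weights every row equally, whatever its width.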

A number of studies, though, have questioned the usefulness of the additional measurement detail gained by employing words rather than sentences. Researchers note that the “tedious exactitude” (Patterson and Woodward, 2006, pp. 21-22) of words “seems unlikely to add to understanding” (Milne and Adler, 1999, p. 243) and put forward arguments for the use of sentences: these are also “easily recognizable syntactically defined units of text” (Krippendorff, 2004, p. 105); they may be quantified with greater measurement accuracy (Unerman, 2000) and are thus subject to less inter-coder variation (Ingram and Frazier, 1980; Deegan et al., 2002); and overall they seem able “to provide complete, reliable and meaningful data for further analysis” (Milne and Adler, 1999, p. 243).

A strong argument, however, against employing either words or sentences as recording units “is that this will result in any non-narrative CSR disclosures (such as photographs or charts) being ignored” (Unerman, 2000, p. 675). As Beattie and Jones (1997) have argued with particular regard to graphs, approximately 80% of leading US and UK companies use them in their Annual Reports; they are more user-friendly than tables; graphs, especially in colour, attract the reader’s attention; and “the reader’s ability to remember visual information is normally superior to that for remembering numerical or textual information” (Beattie and Jones, 1997, p. 34, a justification supported by Leivian, 1980). Photographs have also been used to present and highlight what companies wish to portray (Preston et al., 1996), and the role of graphic representation in corporate external financial reporting appears to be increasingly recognised by national regulatory bodies such as the Canadian Institute of Chartered Accountants (Beattie and Jones, 1997). It is thus evident that this information should not be excluded from CA studies (see also similar arguments by Berelson, 1952; Stone et al., 1966).

In an attempt to capture this valuable non-narrative information, a number of researchers employ proportions of a page as the recording unit. Researchers frequently lay an A4 grid of twenty-five rows of equal height and four columns of equal width (but see, e.g., the different A4 grids used by Guthrie and Parker, 1989; Hackston and Milne, 1996; and Newson and Deegan, 2002) across each CSR disclosure, “with volume being counted as the number of cells on the grid taken up by a disclosure” (Unerman, 2000, p. 676). The main benefit of this approach, other than capturing information presented in pictorial, tabular, graphic or large-typeface form, is that it generates detailed measurements and comparable findings across reports of the same and different companies.
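The 25 × 4 grid gives 100 cells per A4 page, so a cell count converts directly into a proportion of a page. The sketch below counts the cells touched by a rectangular disclosure, such as a photograph, given its position in millimetres on a 210 × 297 mm page. Treating a disclosure as a rectangle and counting every partly covered cell are assumptions of this sketch; studies differ in how partially filled cells are handled.

```python
import math

A4_WIDTH_MM, A4_HEIGHT_MM = 210.0, 297.0
ROWS, COLS = 25, 4  # the frequently used grid; other studies used other grids


def grid_cells(x0, y0, x1, y1, rows=ROWS, cols=COLS):
    """Count grid cells touched by a rectangle with corners (x0, y0), (x1, y1).

    Coordinates are in mm from the top-left corner of the page. Any partly
    covered cell is counted (an assumption of this sketch).
    """
    if not (0 <= x0 < x1 <= A4_WIDTH_MM and 0 <= y0 < y1 <= A4_HEIGHT_MM):
        raise ValueError("rectangle must lie on the page and have positive area")

    def pos(value, size, n):
        # Fractional grid position; rounding guards against floating-point
        # noise when an edge lies exactly on a grid line.
        return round(value * n / size, 9)

    n_cols = math.ceil(pos(x1, A4_WIDTH_MM, cols)) - math.floor(pos(x0, A4_WIDTH_MM, cols))
    n_rows = math.ceil(pos(y1, A4_HEIGHT_MM, rows)) - math.floor(pos(y0, A4_HEIGHT_MM, rows))
    return n_cols * n_rows


def page_proportion(cells, rows=ROWS, cols=COLS):
    """Express a cell count as a proportion of one page (100 cells here)."""
    return cells / (rows * cols)


# Hypothetical photograph spanning 10-100 mm across and 20-60 mm down the page.
cells = grid_cells(10, 20, 100, 60)
print(cells, page_proportion(cells))  # 10 0.1
print(grid_cells(0, 0, 210, 297))     # 100 (a full page)
```

Because the grid is applied to the printed page, the result depends on font size and layout, which is the trade-off against the text-based word and sentence counts discussed earlier.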

