Friday, October 06, 2006

I’ve recently completed a quick study exploring the nature of technical handbooks. Its purpose was to begin unpacking the meaning of the word “handbook.” This genre has largely escaped critical study despite its continued importance for professionals such as architects and engineers.

There are various definitions of handbooks. My main concern is not how librarians or publishers define and describe them, but how actual users describe them. One source of such descriptions is published reviews of handbooks. These reviews are typically written by engineers and other technical professionals for their colleagues.

Sample

This study focused on the contents of reviews of technical handbooks. The sampling frame consisted of all book reviews available in the ProQuest ABI/Inform Trade and Industry database. The sample consists of reviews that contained the keywords “handbook” and “engineer” and that explicitly reviewed a technical handbook intended for engineers or other technical professionals.

A total of 188 reviews were selected for analysis. The shortest review is 47 words. The longest is over 1,300 words. The average review length is 350 words. The reviews represent a wide range of technical handbooks, such as:

  • Composite Materials Handbook

  • Concrete Technology

  • EMI/EMC Computational Modeling Handbook

  • Handbook of Computer Simulation in Radio Engineering, Communications and Radar

  • Handbook of Engineering Electromagnetics

  • Handbook of Filter Media

  • Handbook of Material Weathering

  • Handbook of Polypropylene and Polypropylene Composites

  • Handbook of Powder Science and Technology

  • Materials Handbook: A Concise Desktop Reference

  • McGraw-Hill Machining and Metalworking Handbook

  • Perry’s Chemical Engineers Handbook

  • The Pilot Plant Real Book: A Unique Handbook for the Chemical Process Industry

  • Standard Handbook of Consulting Engineering Practice

These reviews come from a variety of trade journals, such as:

  • Chemical Engineering Progress

  • Civil Engineering: Magazine of the South African Institution of Civil Engineering

  • Electrical Apparatus

  • Electromagnetic News Report

  • Information Intelligence Online Newsletter

  • Mechanical Engineering

  • Microwave Journal

  • Modern Machine Shop

  • Pit and Quarry

  • Plastics Engineering

Analysis

I saved a digital version of each review and removed all header material. I then processed the reviews using TextSTAT, a text analysis program available from the Free University of Berlin (http://www.niederlandistik.fu-berlin.de/textstat/software-en.html). The TextSTAT analysis yielded raw frequency counts for each word that appeared in the sample corpus.
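The post does not describe TextSTAT's internals. The following is only a minimal Python sketch of the kind of raw word count it produces. The tokenizer pattern and the `count_words` helper are hypothetical. Two choices are assumptions: hyphenated forms such as "e-mail" or "IS-95" are kept as single tokens, and case is preserved. TextSTAT's own rules may differ.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not TextSTAT's actual rules):
# runs of letters/digits, optionally joined by hyphens or apostrophes,
# so "e-mail", "IS-95" and "scale-up" each count as one token. Case is kept.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*")


def count_words(reviews):
    """Return raw frequency counts for every token across all review texts."""
    counts = Counter()
    for text in reviews:
        counts.update(TOKEN_PATTERN.findall(text))
    return counts
```

Feeding in the header-stripped review texts would give a table of raw counts like the one TextSTAT reports.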

The raw counts alone are not meaningful. A word may occur frequently in the corpus simply because it is common in English generally, so each count must be weighed against the word’s overall prevalence in the language. The measure I use is the base-10 logarithm of the ratio of a word’s corpus frequency to its frequency in written English:

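G = log10(frequency_corpus / frequency_language)

Both frequencies are expressed per million words. A raw corpus count is converted to this scale by multiplying it by 1,000,000/65,501, where 65,501 is the size of the corpus in words. A G of 1 means a word is ten times more frequent in the reviews than in general English. A negative G means it is less frequent.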
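As a worked illustration, the calculation can be sketched in Python. The function name `g_value` is hypothetical. The corpus size of 65,501 words comes from the table headings in this post. Returning `None` for a zero English frequency reflects the fact that G is undefined in that case.

```python
import math

CORPUS_SIZE = 65_501  # total words in the review corpus


def g_value(corpus_count, english_per_million, corpus_size=CORPUS_SIZE):
    """Base-10 log ratio of corpus frequency to general English frequency.

    The raw corpus count is first rescaled to occurrences per million words
    so that both frequencies are on the same scale. Returns None when the
    English frequency is 0, because the ratio is then undefined.
    """
    if english_per_million <= 0:
        return None
    corpus_per_million = corpus_count * 1_000_000 / corpus_size
    return math.log10(corpus_per_million / english_per_million)
```

Take the values for “piping” (38 occurrences in the corpus, 1 per million in English). `g_value(38, 1)` comes out at about 2.76, which matches the figure in List 2.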

To determine general English word frequencies, I used the frequency lists available on the companion website for Leech, Rayson, and Wilson’s Word Frequencies in Written and Spoken English (http://www.comp.lancs.ac.uk/ucrel/bncfreq/). I then computed G for the 840 terms that appeared more than twice in the entire corpus, with one exception. Of these terms, 62 are very uncommon in written English (fewer than one occurrence per million words). Leech, Rayson, and Wilson list their frequency as 0, so G cannot be calculated for them: the ratio would require dividing by zero.
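The selection step can be sketched the same way. The `split_terms` helper below is hypothetical. It keeps terms seen at least three times (that is, more than twice) and sets aside those whose English frequency is 0. Treating a term that is missing from the frequency list as having frequency 0 is an assumption; the study does not say how such terms were handled.

```python
def split_terms(corpus_counts, english_freqs, min_count=3):
    """Split terms seen more than twice into (scorable, zero-frequency) groups.

    corpus_counts: dict of term -> raw count in the review corpus
    english_freqs: dict of term -> frequency per million words in English
    A term absent from english_freqs is treated as frequency 0 (assumption).
    """
    scorable, zero_freq = {}, {}
    for term, count in corpus_counts.items():
        if count < min_count:
            continue  # appeared only once or twice: excluded from the analysis
        freq = english_freqs.get(term, 0)
        if freq > 0:
            scorable[term] = (count, freq)  # G can be computed
        else:
            zero_freq[term] = count  # G undefined (division by zero)
    return scorable, zero_freq
```

The zero-frequency group is the pool from which a list like List 1 is drawn.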

Results

My analysis included 840 terms. Setting aside the 62 terms with an English frequency of 0, the G values ranged from a high of 2.76 to a low of -1.90. The values were approximately normally distributed (mean = 0.64, SD = 0.77) with minimal distortion (kurtosis = -0.16, skewness = 0.22).
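Summary figures like these can be computed with Python’s `statistics` module. The following is a minimal sketch. The study does not state whether it used the sample or population standard deviation, or which skewness and kurtosis formulas it used. This version assumes the sample SD and simple moment-based skewness and excess kurtosis (0 for a normal distribution). Statistical packages that apply small-sample corrections may therefore report slightly different values. The `describe` name is hypothetical.

```python
import statistics


def describe(values):
    """Mean, sample SD, and moment-based skewness and excess kurtosis."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    # Central moments (divide by n) used for the shape statistics
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "sd": sd,
        "skewness": m3 / m2 ** 1.5,      # 0 for a symmetric distribution
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis: 0 for a normal distribution
    }
```

Applied to the list of G values, this yields the four numbers reported above, subject to the formula differences just noted.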

I isolated three groups of terms that offer some insight into how technical handbooks are understood by their audiences.

  • The first list consists of terms that appear frequently in the corpus yet have sub-threshold frequencies in written English.

  • The second consists of terms that appear unexpectedly often. These are terms with a G value more than two standard deviations above the mean (a cutoff of 2.17).

  • The third consists of terms that appear unexpectedly rarely. These are terms with a G value more than two standard deviations below the mean (a cutoff of -0.89).

The difference in length between lists two and three reflects the positive skew of the distribution.
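The three-way split can be expressed as a short function. This is a sketch with a hypothetical `classify` helper, which computes the cutoffs as the mean plus or minus two standard deviations. The rounded values reported above (mean = 0.64, SD = 0.77) give cutoffs of about 2.18 and -0.90. These differ slightly from 2.17 and -0.89, which presumably come from unrounded statistics.

```python
def classify(g_values, zero_freq_counts, mean, sd, k=2.0):
    """Sort terms into the three lists described above.

    g_values: dict of term -> G (terms with a nonzero English frequency)
    zero_freq_counts: dict of term -> corpus count (English frequency listed as 0)
    k: number of standard deviations used for the cutoffs
    """
    high_cut = mean + k * sd
    low_cut = mean - k * sd
    # List 1: sub-threshold terms, most frequent in the corpus first
    list1 = sorted(zero_freq_counts, key=lambda t: -zero_freq_counts[t])
    # Lists 2 and 3: unusually high or unusually low G, highest G first
    list2 = sorted((t for t, g in g_values.items() if g > high_cut),
                   key=lambda t: -g_values[t])
    list3 = sorted((t for t, g in g_values.items() if g < low_cut),
                   key=lambda t: -g_values[t])
    return list1, list2, list3, (low_cut, high_cut)
```

The sort orders match the tables below: List 1 by corpus count, and Lists 2 and 3 by descending G.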

List 1: Frequent corpus terms with sub-threshold English frequencies

Term             Corpus count
                 (of 65,501 words)
Brookfield       55
McGraw-Hill      35
modeling         28
parts            27
hardcover        25
molding          25
CDMA             24
Wiley            23
cryogenic        22
fibers           20
Dekker           20
Grossel          20
media            19
ASM              19
ASTM             17
Boca             16
EMC              15
models           15
molds            15
nondestructive   15
polymerization   15
made             14
RF               14
e-mail           12
flowmeters       12
SPE              12
Artech           11
CA               11
optimization     10
ASTs             10
dryers           10
machining        10
sizing           10
UST              10
ASME             9
designing        9
steels           9
USTs             9
elastomers       8
OnDisc           8
AIChE            7
exchangers       7
extrusion        7
FSU              7
troubleshooting  7
pertaining       6
actuators        6
Bashore          6
Begell           6
CNC              6
coupling         6
criteria         6
Elsevier         6
Gating           6
IS-95            6
Loctite          6
molded           6
practicing       6
scale-up         6
wastewater       6

List 2: Terms where the z-score of G exceeds 2 (G > 2.17)

Term             Corpus count        English freq.         G
                 (of 65,501 words)   (per million words)
handbook         517                 6                     3.16
piping           38                  1                     2.76
fluid            35                  1                     2.73
manufacturing    61                  2                     2.67
bookshelf        27                  1                     2.62
composites       27                  1                     2.62
CRC              25                  1                     2.58
anonymous        224                 12                    2.45
appendices       16                  1                     2.39
covered          45                  3                     2.36
terms            15                  1                     2.36
mechanical       297                 20                    2.36
combustion       44                  3                     2.35
electromagnetic  44                  3                     2.35
corrosion        43                  3                     2.34
academia         14                  1                     2.33
filler           32                  3                     2.21
engineering      533                 51                    2.20
mixing           62                  6                     2.20
sensors          20                  2                     2.18
according        10                  1                     2.18
polypropylene    10                  1                     2.18
studies          10                  1                     2.18

List 3: Terms where the z-score of G is below -2 (G < -0.89)

Term             Corpus count        English freq.         G
                 (of 65,501 words)   (per million words)
still            6                   749                   -0.91
have             98                  13,655                -0.96
just             9                   1,296                 -0.97
know             13                  1,883                 -0.98
very             7                   1,230                 -1.06
when             8                   2,143                 -1.24
with             17                  6,575                 -1.40
his              7                   4,334                 -1.61
he               7                   8,470                 -1.90