Tom Knapp

“Tom Knapp's books and articles are at the top of my list for statistical literacy.” — Milo Schield (2012).

For more information, see his home page: http://tomswebpage.net/

Tom is Professor Emeritus of Education and Nursing at the University of Rochester and The Ohio State University.

Latest Unpublished Book:

Published Articles:

Short Papers:

BOOKS:

Random (2012)

Preface:

What is the meaning of the word “random”? What is the difference between random sampling and random assignment? If and when a researcher finds that some data are missing in a particular study, under what circumstances can such data be regarded as “missing at random”? These are just a few of the many questions that are addressed in this monograph. I have divided it into 20 sections--one section for each of 20 questions--a feeble attempt at humor by appealing to an analogy with that once-popular game. But it is my sincere hope that when you get to the end of the monograph you will have a better understanding of this crucial term than you had at the beginning.

To give you some idea of the importance of the term, the widely-used search engine Google returns a list of approximately 1.4 billion web pages when given the prompt “random.” Many of them are duplicates or near-duplicates, and some of them have nothing to do with the meaning of the term as treated in this monograph (for example, the web pages that are concerned with the rock group Random), but many of those pages do contain some very helpful information about the use of “random” in the scientific sense with which I am concerned.

I suggest that you pay particular attention to the connection between randomness and probability (see Section 2), especially the matter of which of those is defined in terms of the other. The literature is quite confusing in that respect.

There are very few symbols and no formulas, but there are LOTS of important concepts. A basic knowledge of statistics, measurement, and research design should be sufficient to follow the narrative (and even to catch me when I say something stupid). Thanks for stopping by, and enjoy!

Table of Contents

  1. What is the meaning of the word “random”?
  2. Which comes first, randomness or probability?
  3. Are randomness and chance the same thing?
  4. Is randomness a characteristic of a process or a product?
  5. What is a random-number generator?
  6. Where can you find tables of random numbers?
  7. What are tests of randomness?
  8. What is “random sampling” and why is it important?
  9. What is the difference between random sampling and random assignment?
  10. What is the randomized response method in survey research?
  11. Under what circumstances can data be regarded as missing at random?
  12. What is a random variable?
  13. What is the difference between random effects and fixed effects in experimental research?
  14. Can random sampling be either with replacement or without replacement?
  15. What is stratified random sampling and how does it differ from stratified random assignment (blocking)?
  16. What is the difference between stratified random sampling and quota sampling?
  17. What is random error?
  18. Does classical reliability theory necessarily assume random error?
  19. References

Percentages: The Most useful Statistics Ever Invented (2009)

Table of Contents

  1. The basics
  2. Interpreting percentages
  3. Percentages and probability
  4. Sample percentages vs. population percentages
  5. Statistical inferences for differences between percentages and ratios of percentages
  6. Graphing percentages
  7. Percentage overlap of two frequency distributions
  8. Dichotomizing continuous variables: Good idea or bad idea?
  9. Percentages and reliability
  10. Wrap-up
  11. References

Preface

You know what a percentage is. 2 out of 4 is 50%. 3 is 25% of 12. Etc. But do you know enough about percentages? Is a percentage the same thing as a fraction or a proportion? Should we take the difference between two percentages or their ratio? If their ratio, which percentage goes in the numerator and which goes in the denominator? Does it matter? What do we mean by something being statistically significant at the 5% level? What is a 95% confidence interval? Those questions, and much more, are what this book is all about.

In his fine article regarding nominal and ordinal bivariate statistics, Buchanan (1974) provided several criteria for a good statistic, and concluded: “The percentage is the most useful statistic ever invented…” (p. 629). I agree, and thus my choice for the title of this book. In the ten chapters that follow, I hope to convince you of the defensibility of that claim.

The first chapter is on basic concepts (what a percentage is, how it differs from a fraction and a proportion, what sorts of percentage calculations are useful in statistics, etc.) If you’re pretty sure you already understand such things, you might want to skip that chapter (but be prepared to return to it if you get stuck later on!).

In the second chapter I talk about the interpretation of percentages, differences between percentages, and ratios of percentages, including some common mis-interpretations and pitfalls in the use of percentages.

Chapter 3 is devoted to probability and its explanation in terms of percentages. I also include in that chapter a discussion of the concept of “odds” (both in favor of, and against, something). Probability and odds, though related, are not the same thing (but you wouldn’t know that from reading much of the scientific and lay literature).

Chapter 4 is concerned with a percentage in a sample vis-à-vis the percentage in the population from which the sample has been drawn. In my opinion, that is the most elementary notion in inferential statistics, as well as the most important. Point estimation, interval estimation (confidence intervals), and hypothesis testing (significance testing) are all considered.

The following chapter goes one step further by discussing inferential statistical procedures for examining the difference between two percentages and the ratio of two percentages, with special attention to applications in epidemiology.

The next four chapters are devoted to special topics involving percentages. Chapter 6 treats graphical procedures for displaying and interpreting percentages. It is followed by a chapter that deals with the use of percentages to determine the extent to which two frequency distributions overlap. Chapter 8 discusses the pros and cons of dichotomizing a continuous variable and using percentages with the resulting dichotomy. Applications to the reliability of measuring instruments (my second most favorite statistical concept--see Knapp, 2009) are explored in Chapter 9. The final chapter attempts to summarize things and tie up loose ends.

There is an extensive list of references, all of which are cited in the text proper. You may regard some of them as “old” (they actually range from 1919 to 2009). I like old references, especially those that are classics and/or are particularly apt for clarifying certain points. [And I’m old too.]

Enjoy!

Learning Statistics through Playing Cards (1996 Sage; 2003, 2012)

A one-of-a-kind volume, Learning Statistics Through Playing Cards uniquely utilizes a simple deck of playing cards to explain the important concepts in statistics. Covering many of the topics included in introductory college statistics courses, author Thomas R. Knapp escorts the student through populations and variables, parameters, percentages, probability and sampling, sampling distribution, estimation, hypothesis testing, and two-by-two tables. Each chapter ends with a series of exercises designed to help the student actually manipulate the concept under discussion (the answers are provided at the back of the text). Also included is an annotated bibliography that directs the student toward further readings. This simple approach to teaching the elementary principles of statistics and probabilities makes this an exceptional supplementary text for undergraduates and first-year graduates in the social, behavioral, and health sciences.

The Reliability of Measuring Instruments (2009)

[Tom is continually revising his reliability book. You can read the latest version, download it, or whatever, by visiting his website.]

Preface:

Can you say “reliability” without saying “validity”? (Can you say “Rosencrantz” without saying “Guildenstern”?) I hope so, because this book is all about reliability, except for five appendices in which I discuss validity and for occasional comments in the text proper regarding the difference between reliability and validity. But isn't validity more important than reliability? Of course; a reliable instrument that doesn't measure what you want it to measure is essentially worthless. The problem is that the validity of a measurement device ultimately relies on the subjective judgment of experts in the field (all of the current emphasis on construct validity to the contrary notwithstanding), and my primary purpose in writing this book is to pursue those statistical features of measuring instruments that tell you whether or not, or to what extent, such instruments are consistent.

There are 14 chapters in the book. Chapter 1 is an introductory treatment of the concept of reliability, with special attention given to its many synonyms and nuances. The following chapter addresses the associated concept of measurement error, with an extended discussion of “randomness.” Chapter 3 is devoted to classical reliability theory and is the most technical section of the book, but if you think back to your high school mathematics you will recognize the similarity to plane geometry, with its counterpart definitions, axioms, and theorems. (It is assumed that you are also familiar with descriptive statistics such as means, variances, and correlation coefficients, and with the basic principles of inferential statistics.)

Chapters 4 and 5 treat, respectively, the concept of attenuation and the interpretation of individual measurements. In Chapter 6 I try to summarize the literature regarding the reliability of difference scores of various types and the controversies concerning some of those types.

The matter of the reliability of individual test items is explored in Chapter 7. Discussion of the internal consistency reliability of the total score on a test that consists of more than one item (the usual case) follows naturally in Chapter 8, where the primary emphasis is on coefficient alpha (Cronbach's alpha). That chapter (Chapter 8) also includes a brief section in which I point out the methodological equivalence of internal consistency reliability and both inter-rater and intra-rater reliability.

Chapter 9 on intraclass correlations is my favorite chapter. Although their principal application has been to the reliability of ratings, they come up in all sorts of interesting contexts, including those concerned with the unit of analysis and the independence of observations.

Relative agreement vs. absolute agreement and ordinal vs. interval measurement provide the focus of Chapter 10. Most discussions of instrument reliability are concerned with the relative agreement between two equal-status operationalizations of a particular construct, but some are devoted exclusively to absolute agreement. Likert-type scales and other instruments that do not have equal units require special considerations. (Some of this material was originally included in various other chapters in previous editions of this book.)

Chapter 11 is concerned mostly with statistical inferences from samples of “measurees” to populations of “measurees,” but some attention is also given to statistical inferences from samples of “measurers” to populations of “measurers.”

In Chapter 12 I try to bring everything together by applying classical reliability theory to a set of data that were generated in a study of alternative ways of measuring height. (The data, which have been graciously provided to me by Dr. Jean K. Brown, Dean, School of Nursing, University at Buffalo, State University of New York, are in Appendix A.)

The following chapter (Chapter 13) deals with a variety of special topics regarding instrument reliability. And a final chapter (Chapter 14) attempts to extend the concept of reliability of measuring instruments to the reliability of claims.

There is an appendix (Appendix B) on the validity of measuring instruments in general, an appendix (Appendix C) on the reliability and validity of birth certificates and death certificates, an appendix (Appendix D) on the reliability and validity of height and weight measurements, an appendix (Appendix E) on the reliability and validity of the four gospels, and an appendix (Appendix F) on the reliability and validity of claims regarding the effects of secondhand smoke. A list of references completes the work.

The book is replete with examples of various measurement situations (real and hypothetical), drawn from both the physical sciences and the social sciences. Measurement is at the heart of all sciences. Without reliable (and valid) instruments science would be impossible.

You may find my writing style to be a bit breezy. I can't help that; I write just like I talk (and nobody talks like some academics write!). I hope that my informal style has not led me to be any less rigorous in my arguments regarding the reliability of measuring measurements. If it has, I apologize to you and ask you to read no further if or when that happens. You may also feel that many of the references are old. Since I am a proponent of the “classical” approach to reliability, their inclusion is intentional.

I would like to thank Dr. Brown and Dr. Shlomo S. Sawilowsky (Wayne State University) for their very helpful comments regarding earlier manuscript versions of the various chapters in this book.

But don't hold them accountable for any mistakes that might remain. They're all mine.

Table of Contents:

    Preface

  1. What do we mean by the reliability of a measuring instrument?
    [Terminology Illustrative examples Necessity vs. sufficiency Additional reading]
  2. Measurement error
    [Attribute vs. variable When is something random? Obtained score, true score, and error score Dunn's example Continuous vs. discrete variables The controversial true score Some more thoughts about randomness Additional reading]
  3. Reliability theory (abridged, with examples)
    [The basic concepts The first few axioms, definitions, and theorems A hypothetical example A different approach Some other concepts and terminology The key theorem A caution concerning parallelism and reliability Truman Kelley on parallelism and reliability Examples (one hypothetical, one real) Hypothetical data Real data Additional reading]
  4. Attenuation
    [What happens, and why The “correction” What can go wrong? How many ways are there to get a particular correlation between two variables? The effect of attenuation on other statistics Additional reading Chapter 5 The interpretation of individual measurements. Back to our hypothetical example, and a little more theory How to interpret an individual measurement Point estimation Interval estimation Hypothesis testing Compounded measurement error Additional reading]
  5. The reliability of difference scores
    [Types of difference scores The general case Measure-remeasure differences Between-object differences Change scores Simple change Controversy regarding the measurement of simple change Modified change Percent change Weighted change Residual change Other difference scores that are not change scores Inter-instrument differences Inter- and intra-rater differences Our flow meter example (revisited) Additional reading]
  6. The reliability of a single item
    [Single-item examples X, T, and E for single dichotomous items Some approaches to the estimation of the reliability of single items The Knapp method (and comparison to the phi coefficient) The Guttman method Percent agreement and Cohen's kappa Spearman-Brown in reverse Visual analog(ue) scales Additional reading]
  7. The internal consistency of multi-item tests
    [A little history Kuder and Richardson Cronbach How many items? Factor analysis and internal consistency reliability Inter-item and item-to-total correlations Other approaches to internal consistency Inter-rater reliability and intra-rater reliability Additional reading]
  8. Intraclass correlations
    [The most useful one The one that equals Cronbach’s alpha Additional reading
  9. Two vexing problems
    [Absolute vs. relative agreement Mean and median absolute differences Ordinal vs. interval measurement Kendall’s tau-b Goodman & Kruskal’s gamma Williams’ method Back to John and Mary Additional reading]
  10. Statistical inferences regarding instrument reliability
    [Parallel forms reliability coefficients Test-retest reliability coefficients Intraclass correlations Coefficient alpha Cohen's kappa Reliability and power Sample size for reliability studies The effect of reliability on confidence intervals in general Our flow meter example (re-revisited) Random samples vs. “convenience” samples Additional reading]
  11. A very nice real-data example
    [Background and the study itself Over-all parallelism Over-all reliability The 82 measurers Tidbits]
  12. Special topics
    [Some other conceptualizations of reliability Generalizability theory Item response theory Structural equation modeling Norm-referenced vs. criterion-referenced reliability Unit-of-analysis problems Weighting Missing-data problems Some miscellaneous educational testing examples Some more esoteric contributions]
  13. The reliability of claims
  1. The very nice data set
  2. The validity of measuring instruments
  3. The reliability and validity of birth and death certificates
  4. The reliability and validity of height and weight measurements
  5. The reliability and validity of the four gospels
  6. The reliability and validity of claims regarding the effects of secondhand smoke
  7. References