Tom Knapp


Gary Klass Tom Knapp Othmar Winkler Judea Pearl Herbert I. Weisberg J. Cornfield Wayne Winston Conrad Carlberg Jeffrey Bennett John Brignell Dennis Haack




"Tom Knapp's books and articles are at the top of my list for statistical literacy."  Milo Schield (2012).  For more information, see his home page:
Tom is Professor Emeritus of Education and Nursing at the University of Rochester and The Ohio State University.

Latest Unpublished Book:

Quantitative Research Methods 2016.   [Highly recommended!  An excellent collection of important ideas.  Ed.]

Published Articles:

Significance Test, Confidence Interval, Both or Neither? 2013.

n: Answers to "How Many?" questions. 2012.

Using Pearson Correlations to Teach or Learn Statistics. 2012.

Treating Ordinal Scales as Ordinal Scales. Nursing Research, 1993.

Treating Ordinal Scales as Interval Scales: An Attempt to Resolve the Controversy. Nursing Research, 1990.

Instances of Simpson's Paradox. MAA College Mathematics Journal, 16(3), 209–211, 1985.

Knapp, T. R. (1977). The unit-of-analysis problem in applications of simple correlation analysis to educational research. Journal of Educational [and Behavioral] Statistics, 2, 171–186.

Knapp, T. R. (1982). The unit and context of the analysis for research in educational administration. Educational Administration Quarterly, 18(1), 1–13.


Short Papers:

Bias. 2013.

Causality. 2013.

Change. 2013.

Dichotomization. 2013.

Likert and Visual Analog Scales. 2013.

N versus (N-1) Re-visited. 2013.

To pool or not to pool. 2013.


Random (2012)
What is the meaning of the word "random"? What is the difference between random sampling and random assignment? If and when a researcher finds that some data are missing in a particular study, under what circumstances can such data be regarded as "missing at random"? These are just a few of the many questions that are addressed in this monograph. I have divided it into 20 sections--one section for each of 20 questions--a feeble attempt at humor by appealing to an analogy with that once-popular game. But it is my sincere hope that when you get to the end of the monograph you will have a better understanding of this crucial term than you had at the beginning.

To give you some idea of the importance of the term, the widely used search engine Google returns a list of approximately 1.4 billion web pages when given the prompt "random". Many of them are duplicates or near-duplicates, and some of them have nothing to do with the meaning of the term as treated in this monograph (for example, the web pages that are concerned with the rock group Random), but many of those pages do contain some very helpful information about the use of "random" in the scientific sense with which I am concerned.

I suggest that you pay particular attention to the connection between randomness and probability (see Section 2), especially the matter of which of those is defined in terms of the other. The literature is quite confusing in that respect.

There are very few symbols and no formulas, but there are LOTS of important concepts. A basic knowledge of statistics, measurement, and research design should be sufficient to follow the narrative (and even to catch me when I say something stupid). Thanks for stopping by, and enjoy!

Table of Contents

Section 1: What is the meaning of the word "random"?

Section 2: Which comes first, randomness or probability?

Section 3: Are randomness and chance the same thing?

Section 4: Is randomness a characteristic of a process or a product?

Section 5: What is a random-number generator?

Section 6: Where can you find tables of random numbers?

Section 7: What are tests of randomness?

Section 8: What is "random sampling" and why is it important?

Section 9: What is the difference between random sampling and random assignment?

Section 10: What is the randomized response method in survey research?

Section 11: Under what circumstances can data be regarded as missing at random?

Section 12: What is a random variable?

Section 13: What is the difference between random effects and fixed effects in experimental research?

Section 14: Can random sampling be either with replacement or without replacement?

Section 15: What is stratified random sampling and how does it differ from stratified random assignment (blocking)?

Section 16: What is the difference between stratified random sampling and quota sampling?

Section 17: What is random error?

Section 18: Does classical reliability theory necessarily assume random error?
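Sections 8, 9, and 14 turn on distinctions that are easy to blur. The following is a minimal Python sketch (not taken from the monograph; the function names and the 100-person population are hypothetical) of random sampling with and without replacement, followed by random assignment of the sampled people to two groups. Random sampling decides who gets into the study; random assignment decides who gets which treatment.

```python
import random


def random_sample(population, n, with_replacement=False, seed=None):
    """Draw a simple random sample of size n from a list of population units."""
    rng = random.Random(seed)
    if with_replacement:
        # Every draw is made from the full population, so a unit can be drawn more than once.
        return [rng.choice(population) for _ in range(n)]
    # Without replacement: every subset of n distinct units is equally likely.
    return rng.sample(population, n)


def random_assignment(subjects, groups=("treatment", "control"), seed=None):
    """Randomly allocate subjects (however they were obtained) to groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Deal the shuffled subjects out to the groups in turn, like cards.
    return {g: shuffled[i::len(groups)] for i, g in enumerate(groups)}


population = [f"person{i}" for i in range(1, 101)]

# Random sampling: WHO is studied (the basis for generalizing to the population).
sample = random_sample(population, 10, seed=1)

# Random assignment: WHO GETS WHAT (the basis for comparing the groups causally).
arms = random_assignment(sample, seed=2)

print(arms["treatment"])
print(arms["control"])
```

A study can have either one without the other: a convenience sample can still be randomly assigned, and a random sample can be studied without any assignment at all.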



Percentages: The Most Useful Statistics Ever Invented (2009)
Table of Contents

Chapter 1: The basics

Chapter 2: Interpreting percentages

Chapter 3: Percentages and probability

Chapter 4: Sample percentages vs. population percentages

Chapter 5: Statistical inferences for differences between percentages and ratios of percentages

Chapter 6: Graphing percentages

Chapter 7: Percentage overlap of two frequency distributions

Chapter 8: Dichotomizing continuous variables: Good idea or bad idea?

Chapter 9: Percentages and reliability

Chapter 10: Wrap-up




You know what a percentage is. 2 out of 4 is 50%. 3 is 25% of 12. Etc. But do you know enough about percentages? Is a percentage the same thing as a fraction or a proportion? Should we take the difference between two percentages or their ratio? If their ratio, which percentage goes in the numerator and which goes in the denominator? Does it matter? What do we mean by something being statistically significant at the 5% level? What is a 95% confidence interval? Those questions, and many more, are what this book is all about.
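The arithmetic in that paragraph, and the question of whether the order of a ratio matters, can be checked in a few lines. A minimal Python sketch (the helper name `percentage` is hypothetical) using the two examples given there:

```python
def percentage(part, whole):
    """Express part/whole as a percentage (a proportion multiplied by 100)."""
    return 100 * part / whole


p1 = percentage(2, 4)    # 2 out of 4  -> 50.0
p2 = percentage(3, 12)   # 3 out of 12 -> 25.0

# A difference between percentages is in "percentage points"; reversing the order only flips the sign.
difference = p1 - p2     # 25.0
# A ratio of percentages depends on which one goes in the numerator.
ratio_1_to_2 = p1 / p2   # 2.0  ("twice as large")
ratio_2_to_1 = p2 / p1   # 0.5  ("half as large")

print(difference, ratio_1_to_2, ratio_2_to_1)
```

Both ratios describe the same pair of percentages, so a reported ratio is interpretable only once you know which percentage is on top.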

In his fine article regarding nominal and ordinal bivariate statistics, Buchanan (1974) provided several criteria for a good statistic, and concluded: “The percentage is the most useful statistic ever invented…” (p. 629). I agree, and thus my choice for the title of this book. In the ten chapters that follow, I hope to convince you of the defensibility of that claim.

The first chapter is on basic concepts (what a percentage is, how it differs from a fraction and a proportion, what sorts of percentage calculations are useful in statistics, etc.). If you’re pretty sure you already understand such things, you might want to skip that chapter (but be prepared to return to it if you get stuck later on!).

In the second chapter I talk about the interpretation of percentages, differences between percentages, and ratios of percentages, including some common mis-interpretations and pitfalls in the use of percentages.

Chapter 3 is devoted to probability and its explanation in terms of percentages. I also include in that chapter a discussion of the concept of “odds” (both in favor of, and against, something). Probability and odds, though related, are not the same thing (but you wouldn’t know that from reading much of the scientific and lay literature).
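To make the probability/odds distinction concrete, here is a minimal Python sketch (helper names hypothetical; the card example is mine, not the book's) using exact fractions. Odds in favor are p/(1 - p), and odds against are the reciprocal, (1 - p)/p.

```python
from fractions import Fraction


def odds_in_favor(p):
    """Odds in favor of an event with probability p: p / (1 - p)."""
    return p / (1 - p)


def odds_against(p):
    """Odds against the event: (1 - p) / p, the reciprocal of the odds in favor."""
    return (1 - p) / p


def probability_from_odds(odds):
    """Convert odds in favor back to a probability: odds / (1 + odds)."""
    return odds / (1 + odds)


# Drawing a heart from a standard 52-card deck.
p = Fraction(13, 52)            # probability 1/4, i.e., 25%
in_favor = odds_in_favor(p)     # Fraction(1, 3): "1 to 3" in favor
against = odds_against(p)       # Fraction(3, 1): "3 to 1" against

print(float(p), float(in_favor))  # 0.25 vs. 0.333...: not the same number
```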

Chapter 4 is concerned with a percentage in a sample vis-à-vis the percentage in the population from which the sample has been drawn. In my opinion, that is the most elementary notion in inferential statistics, as well as the most important. Point estimation, interval estimation (confidence intervals), and hypothesis testing (significance testing) are all considered.
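A minimal Python sketch of those three inferential tasks for a single sample percentage. It assumes a simple random sample and the large-sample normal approximation: the interval is the Wald interval, only one of several methods (Wilson and exact binomial intervals are common alternatives), and the test is a one-sample z test. The function name and the 60-of-100 data are hypothetical.

```python
import math


def proportion_inference(successes, n, p0=None, z_crit=1.96):
    """Point estimate, approximate 95% Wald CI, and optional z test for a population proportion."""
    p_hat = successes / n                          # point estimate
    se = math.sqrt(p_hat * (1 - p_hat) / n)        # estimated standard error
    result = {"estimate": p_hat, "ci": (p_hat - z_crit * se, p_hat + z_crit * se)}
    if p0 is not None:
        # Under H0 the standard error is computed from the hypothesized proportion p0.
        se0 = math.sqrt(p0 * (1 - p0) / n)
        z = (p_hat - p0) / se0
        # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
        result["z"] = z
        result["p_value"] = math.erfc(abs(z) / math.sqrt(2))
    return result


# Hypothetical sample: 60 "yes" answers out of 100 respondents; H0: population percentage = 50%.
res = proportion_inference(60, 100, p0=0.5)
low, high = res["ci"]
print(f"{100 * res['estimate']:.1f}% (95% CI {100 * low:.1f}% to {100 * high:.1f}%), "
      f"z = {res['z']:.2f}, p = {res['p_value']:.4f}")
```

Here the interval runs from roughly 50.4% to 69.6%, and the test of H0: 50% gives z = 2.0 with a two-sided p of about .0455.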

The following chapter goes one step further by discussing inferential statistical procedures for examining the difference between two percentages and the ratio of two percentages, with special attention to applications in epidemiology.
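For the two-group case, here is a minimal Python sketch of the two statistics most used in epidemiology: the difference between two percentages (risk difference) and their ratio (relative risk). It assumes independent random samples and uses common large-sample formulas: a Wald interval for the difference and an interval for the ratio computed on the log scale. These are not necessarily the procedures the book recommends; the function name and counts are hypothetical.

```python
import math


def compare_two_percentages(a, n1, c, n2, z_crit=1.96):
    """a of n1 exposed and c of n2 unexposed people have the outcome.

    Returns the risk difference (in percentage points) and the risk ratio,
    each as (estimate, lower, upper) with an approximate 95% CI.
    """
    p1, p2 = a / n1, c / n2
    rd = p1 - p2
    se_rd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    rr = p1 / p2
    # The ratio's sampling distribution is skewed, so its CI is built on the log scale.
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    return {
        "risk_difference": tuple(100 * v for v in (rd, rd - z_crit * se_rd, rd + z_crit * se_rd)),
        "risk_ratio": (rr,
                       math.exp(math.log(rr) - z_crit * se_log_rr),
                       math.exp(math.log(rr) + z_crit * se_log_rr)),
    }


# Hypothetical study: 30% of 100 exposed vs. 15% of 100 unexposed develop the disease.
out = compare_two_percentages(30, 100, 15, 100)
print(out["risk_difference"])   # about 15 percentage points
print(out["risk_ratio"])        # a ratio of about 2.0
```

The same data thus support two quite different-sounding statements: the exposed group's percentage is 15 points higher, or it is twice as high.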

The next four chapters are devoted to special topics involving percentages. Chapter 6 treats graphical procedures for displaying and interpreting percentages. It is followed by a chapter that deals with the use of percentages to determine the extent to which two frequency distributions overlap. Chapter 8 discusses the pros and cons of dichotomizing a continuous variable and using percentages with the resulting dichotomy. Applications to the reliability of measuring instruments (my second most favorite statistical concept--see Knapp, 2009) are explored in Chapter 9. The final chapter attempts to summarize things and tie up loose ends.
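One way to see the cost of dichotomizing (the Chapter 8 topic) is to simulate it. The following minimal Python sketch (hypothetical data, not from the book) draws two normally distributed variables with a population correlation of .6, splits one of them at its median, and compares the correlations. For a normal variable split at the median, theory predicts the correlation shrinks by a factor of about .80 (the square root of 2/pi), so the dichotomized value should come out near .48.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_split(values):
    """Dichotomize: 1 if at or above the (upper) median, else 0."""
    median = sorted(values)[len(values) // 2]
    return [1 if v >= median else 0 for v in values]


rng = random.Random(42)
n = 10_000
x = [rng.gauss(0, 1) for _ in range(n)]
# y = 0.6 * x + noise with variance 0.64, so Var(y) = 1 and the population correlation is 0.6.
y = [0.6 * xi + 0.8 * rng.gauss(0, 1) for xi in x]

r_continuous = pearson_r(x, y)
r_dichotomized = pearson_r(median_split(x), y)
print(round(r_continuous, 3), round(r_dichotomized, 3))
```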

There is an extensive list of references, all of which are cited in the text proper. You may regard some of them as “old” (they actually range from 1919 to 2009). I like old references, especially those that are classics and/or are particularly apt for clarifying certain points. [And I’m old too.]


Learning Statistics Through Playing Cards (1996 Sage; 2003, 2012)

A one-of-a-kind volume, Learning Statistics Through Playing Cards uniquely utilizes a simple deck of playing cards to explain the important concepts in statistics. Covering many of the topics included in introductory college statistics courses, author Thomas R. Knapp escorts the student through populations and variables, parameters, percentages, probability and sampling, sampling distributions, estimation, hypothesis testing, and two-by-two tables. Each chapter ends with a series of exercises designed to help the student actually manipulate the concept under discussion (the answers are provided at the back of the text). Also included is an annotated bibliography that directs the student toward further readings. This simple approach to teaching the elementary principles of statistics and probability makes this an exceptional supplementary text for undergraduates and first-year graduates in the social, behavioral, and health sciences.
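In the spirit of that approach (though not taken from the book), here is a minimal Python sketch that treats a 52-card deck as a population, the color of a card as the variable, and the percentage of red cards (50%) as the parameter. Repeated 13-card hands then approximate the sampling distribution of the sample percentage. All names are hypothetical.

```python
import random

SUITS = ["hearts", "diamonds", "clubs", "spades"]
RANKS = ["A"] + [str(k) for k in range(2, 11)] + ["J", "Q", "K"]
deck = [(rank, suit) for suit in SUITS for rank in RANKS]   # the population: 52 cards


def is_red(card):
    """The variable of interest: is the card red (a heart or a diamond)?"""
    return card[1] in ("hearts", "diamonds")


# The parameter: percentage of red cards in the population.
parameter = 100 * sum(is_red(c) for c in deck) / len(deck)   # 50.0


def sample_percentage_red(n, rng):
    """Deal an n-card hand (sampling without replacement) and return its percentage of red cards."""
    hand = rng.sample(deck, n)
    return 100 * sum(is_red(c) for c in hand) / n


rng = random.Random(2012)
# Approximate the sampling distribution of the statistic with 1,000 thirteen-card hands.
estimates = [sample_percentage_red(13, rng) for _ in range(1000)]
mean_estimate = sum(estimates) / len(estimates)
print(parameter, round(mean_estimate, 1), min(estimates), max(estimates))
```

Individual hands vary a lot, but the estimates center on the parameter, which is the basic idea behind estimation and hypothesis testing.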

The Reliability of Measuring Instruments (2009)
[Tom is continually revising his reliability book. You can read the latest version, download it, or whatever, by visiting his website.]


Can you say "reliability" without saying "validity"? (Can you say "Rosencrantz" without saying "Guildenstern"?) I hope so, because this book is all about reliability, except for five appendices in which I discuss validity and for occasional comments in the text proper regarding the difference between reliability and validity. But isn't validity more important than reliability? Of course; a reliable instrument that doesn't measure what you want it to measure is essentially worthless. The problem is that the validity of a measurement device ultimately relies on the subjective judgment of experts in the field (all of the current emphasis on construct validity to the contrary notwithstanding), and my primary purpose in writing this book is to pursue those statistical features of measuring instruments that tell you whether or not, or to what extent, such instruments are consistent.

There are 14 chapters in the book. Chapter 1 is an introductory treatment of the concept of reliability, with special attention given to its many synonyms and nuances. The following chapter addresses the associated concept of measurement error, with an extended discussion of "randomness". Chapter 3 is devoted to classical reliability theory and is the most technical section of the book, but if you think back to your high school mathematics you will recognize the similarity to plane geometry, with its counterpart definitions, axioms, and theorems. (It is assumed that you are also familiar with descriptive statistics such as means, variances, and correlation coefficients, and with the basic principles of inferential statistics.)
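Classical reliability theory starts from X = T + E: an obtained score is a true score plus a random error that is uncorrelated with the true score. Reliability is then the proportion of obtained-score variance that is true-score variance, Var(T)/Var(X), and for two parallel forms (same true scores, independent errors of equal variance) it also equals the correlation between the forms. A minimal Python simulation under those assumptions (the score scale and variances are hypothetical):

```python
import math
import random


def variance(v):
    """Population variance (divide by N)."""
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


rng = random.Random(1)
n = 20_000
true_scores = [rng.gauss(50, 10) for _ in range(n)]    # T, with variance 100
form_a = [t + rng.gauss(0, 5) for t in true_scores]    # X = T + E, error variance 25
form_b = [t + rng.gauss(0, 5) for t in true_scores]    # parallel form: same T, independent E

theoretical = 100 / (100 + 25)                         # Var(T) / Var(X) = 0.8
from_variances = variance(true_scores) / variance(form_a)
parallel_forms_r = pearson_r(form_a, form_b)
print(theoretical, round(from_variances, 3), round(parallel_forms_r, 3))
```

In real data T is never observed, which is why the parallel-forms correlation (or one of its substitutes) is what actually gets computed.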

Chapters 4 and 5 treat, respectively, the concept of attenuation and the interpretation of individual measurements. In Chapter 6 I try to summarize the literature regarding the reliability of difference scores of various types and the controversies concerning some of those types.
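Attenuation is easy to express numerically. Measurement error in X and Y pulls their observed correlation below the correlation between their true scores by the factor sqrt(r_xx × r_yy), and Spearman's classical correction divides that factor back out. A minimal Python sketch (the function name and the numbers are hypothetical):

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Spearman's correction: estimated correlation between the true scores of X and Y.

    r_xy: observed correlation; r_xx, r_yy: reliability coefficients of X and Y.
    """
    return r_xy / math.sqrt(r_xx * r_yy)


print(correct_for_attenuation(0.40, 0.80, 0.50))   # 0.40 / sqrt(0.40), about 0.632

# One thing that can go wrong: if the reliabilities are underestimated (or through
# sampling error), the "corrected" value can exceed 1.
print(correct_for_attenuation(0.60, 0.50, 0.60))   # about 1.095
```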

The matter of the reliability of individual test items is explored in Chapter 7. Discussion of the internal consistency reliability of the total score on a test that consists of more than one item (the usual case) follows naturally in Chapter 8, where the primary emphasis is on coefficient alpha (Cronbach's alpha). That chapter (Chapter 8) also includes a brief section in which I point out the methodological equivalence of internal consistency reliability and both inter-rater and intra-rater reliability.
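Coefficient alpha compares the sum of the item variances with the variance of the total score: alpha = k/(k - 1) × (1 - Σ item variances / total-score variance), where k is the number of items. (For dichotomously scored items this reduces to Kuder-Richardson formula 20.) A minimal Python sketch with a small hypothetical data set:

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha. scores: one row per respondent, one column per item."""
    k = len(scores[0])                               # number of items
    items = list(zip(*scores))                       # transpose to one tuple per item
    sum_item_var = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical responses of 5 people to a 3-item scale.
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [3, 3, 4],
    [5, 4, 5],
    [1, 2, 2],
]
print(round(cronbach_alpha(scores), 3))   # 0.956
```

Population variances are used throughout; sample variances (dividing by N - 1) give the same alpha because the divisor cancels in the ratio.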

Chapter 9 on intraclass correlations is my favorite chapter. Although their principal application has been to the reliability of ratings, they come up in all sorts of interesting contexts, including those concerned with the unit of analysis and the independence of observations.
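There are several intraclass correlations (Shrout and Fleiss, for example, distinguish six), so the following is only one of them and not necessarily the one the book singles out as most useful: the one-way random-effects ICC for a single measurement, (MS_between - MS_within) / (MS_between + (k - 1) MS_within), computed from a one-way analysis of variance of n measured objects with k measurements each. A minimal Python sketch with hypothetical ratings:

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for a single measurement.

    ratings: one row per object being measured, one column per measurement (rater or occasion).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    # Between-object and within-object sums of squares from a one-way ANOVA.
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for r, m in zip(ratings, row_means) for x in r)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical data: 4 objects, each measured 3 times.
ratings = [
    [9, 10, 11],
    [6, 7, 8],
    [12, 13, 14],
    [3, 4, 5],
]
print(round(icc_oneway(ratings), 3))   # 0.936
```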

Relative agreement vs. absolute agreement and ordinal vs. interval measurement provide the focus of Chapter 10. Most discussions of instrument reliability are concerned with the relative agreement between two equal-status operationalizations of a particular construct, but some are devoted exclusively to absolute agreement. Likert-type scales and other instruments that do not have equal units require special considerations. (Some of this material was originally included in various other chapters in previous editions of this book.)
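The difference between relative and absolute agreement shows up immediately with two hypothetical raters, one of whom scores every object exactly 10 points higher than the other. In this minimal Python sketch the Pearson correlation (relative agreement) is perfect, while the mean absolute difference (one index of absolute agreement) shows that the two raters never agree.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def mean_absolute_difference(x, y):
    """Average of |x_i - y_i|: 0 only when the two sets of scores agree exactly."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


rater_1 = [10, 20, 30, 40, 50]
rater_2 = [20, 30, 40, 50, 60]   # same ordering, every score 10 points higher

print(pearson_r(rater_1, rater_2))                  # 1.0
print(mean_absolute_difference(rater_1, rater_2))   # 10.0
```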

Chapter 11 is concerned mostly with statistical inferences from samples of "measurees" to populations of "measurees", but some attention is also given to statistical inferences from samples of “measurers” to populations of “measurers”.

In Chapter 12 I try to bring everything together by applying classical reliability theory to a set of data that were generated in a study of alternative ways of measuring height. (The data, which have been graciously provided to me by Dr. Jean K. Brown, Dean, School of Nursing, University at Buffalo, State University of New York, are in Appendix A.)

The following chapter (Chapter 13) deals with a variety of special topics regarding instrument reliability. And a final chapter (Chapter 14) attempts to extend the concept of reliability of measuring instruments to the reliability of claims.

There is an appendix (Appendix B) on the validity of measuring instruments in general, an appendix (Appendix C) on the reliability and validity of birth certificates and death certificates, an appendix (Appendix D) on the reliability and validity of height and weight measurements, an appendix (Appendix E) on the reliability and validity of the four gospels, and an appendix (Appendix F) on the reliability and validity of claims regarding the effects of secondhand smoke. A list of references completes the work.

The book is replete with examples of various measurement situations (real and hypothetical), drawn from both the physical sciences and the social sciences. Measurement is at the heart of all sciences. Without reliable (and valid) instruments science would be impossible.

You may find my writing style to be a bit breezy. I can't help that; I write just like I talk (and nobody talks like some academics write!). I hope that my informal style has not led me to be any less rigorous in my arguments regarding the reliability of measuring instruments. If it has, I apologize to you and ask you to read no further if or when that happens. You may also feel that many of the references are old. Since I am a proponent of the "classical" approach to reliability, their inclusion is intentional.

I would like to thank Dr. Brown and Dr. Shlomo S. Sawilowsky (Wayne State University) for their very helpful comments regarding earlier manuscript versions of the various chapters in this book.

But don't hold them accountable for any mistakes that might remain. They're all mine.



Chapter 1: What do we mean by the reliability of a measuring instrument? Terminology; Illustrative examples; Necessity vs. sufficiency; Additional reading

Chapter 2: Measurement error. Attribute vs. variable; When is something random?; Obtained score, true score, and error score; Dunn's example; Continuous vs. discrete variables; The controversial true score; Some more thoughts about randomness; Additional reading

Chapter 3: Reliability theory (abridged, with examples). The basic concepts; The first few axioms, definitions, and theorems; A hypothetical example; A different approach; Some other concepts and terminology; The key theorem; A caution concerning parallelism and reliability; Truman Kelley on parallelism and reliability; Examples (one hypothetical, one real); Hypothetical data; Real data; Additional reading

Chapter 4: Attenuation. What happens, and why; The "correction"; What can go wrong?; How many ways are there to get a particular correlation between two variables?; The effect of attenuation on other statistics; Additional reading

Chapter 5: The interpretation of individual measurements. Back to our hypothetical example, and a little more theory; How to interpret an individual measurement; Point estimation; Interval estimation; Hypothesis testing; Compounded measurement error; Additional reading

Chapter 6: The reliability of difference scores. Types of difference scores; The general case; Measure-remeasure differences; Between-object differences; Change scores; Simple change; Controversy regarding the measurement of simple change; Modified change; Percent change; Weighted change; Residual change; Other difference scores that are not change scores; Inter-instrument differences; Inter- and intra-rater differences; Our flow meter example (revisited); Additional reading

Chapter 7: The reliability of a single item. Single-item examples; X, T, and E for single dichotomous items; Some approaches to the estimation of the reliability of single items; The Knapp method (and comparison to the phi coefficient); The Guttman method; Percent agreement and Cohen's kappa; Spearman-Brown in reverse; Visual analog(ue) scales; Additional reading

Chapter 8: The internal consistency of multi-item tests. A little history; Kuder and Richardson; Cronbach; How many items?; Factor analysis and internal consistency reliability; Inter-item and item-to-total correlations; Other approaches to internal consistency; Inter-rater reliability and intra-rater reliability; Additional reading

Chapter 9: Intraclass correlations. The most useful one; The one that equals Cronbach’s alpha; Additional reading

Chapter 10: Two vexing problems. Absolute vs. relative agreement; Mean and median absolute differences; Ordinal vs. interval measurement; Kendall’s tau-b; Goodman & Kruskal’s gamma; Williams’ method; Back to John and Mary; Additional reading

Chapter 11: Statistical inferences regarding instrument reliability. Parallel forms reliability coefficients; Test-retest reliability coefficients; Intraclass correlations; Coefficient alpha; Cohen's kappa; Reliability and power; Sample size for reliability studies; The effect of reliability on confidence intervals in general; Our flow meter example (re-revisited); Random samples vs. "convenience" samples; Additional reading

Chapter 12: A very nice real-data example. Background and the study itself; Over-all parallelism; Over-all reliability; The 82 measurers; Tidbits

Chapter 13: Special topics. Some other conceptualizations of reliability; Generalizability theory; Item response theory; Structural equation modeling; Norm-referenced vs. criterion-referenced reliability; Unit-of-analysis problems; Weighting; Missing-data problems; Some miscellaneous educational testing examples; Some more esoteric contributions

Chapter 14: The reliability of claims


Appendix A: The very nice data set

Appendix B: The validity of measuring instruments

Appendix C: The reliability and validity of birth and death certificates

Appendix D: The reliability and validity of height and weight measurements

Appendix E: The reliability and validity of the four gospels

Appendix F: The reliability and validity of claims regarding the effects of secondhand smoke





This site was last updated 09/04/16