Biostatistics for Medical and Biomedical Practitioners

About this ebook

Biostatistics for Practitioners: An Interpretative Guide for Medicine and Biology deals with several aspects of statistics that are indispensable for researchers and students across the biomedical sciences.

The book features a step-by-step approach, focusing on standard statistical tests, as well as discussions of the most common errors.

The book is based on the author’s 40+ years of teaching statistics to medical fellows and biomedical researchers across a wide range of fields.

  • Discusses how to use the standard statistical tests in the biomedical field, as well as how to make statistical inferences (t-test, ANOVA, regression, etc.)
  • Includes non-standard tests, including equivalence or non-inferiority testing, extreme value statistics, cross-over tests, and simple time series procedures such as the runs test and Cusums
  • Introduces procedures such as multiple regression, Poisson regression, meta-analysis and resampling statistics, and provides references for further study
Language: English
Release date: Sep 3, 2015
ISBN: 9780128026076
Author

Julien I. E. Hoffman

Julien I. E. Hoffman, M.D., F.R.C.P. (London) was born and educated in Salisbury (now Harare) in Southern Rhodesia (now Zimbabwe). He received a BSc (Hons) in 1945 from the University of the Witwatersrand in South Africa, and his M.B., B.Ch. degree there in 1949. After working in the Departments of Medicine in Johannesburg General Hospital and in the Central Middlesex Hospital in London, he worked for the Medical Research Council at the Royal Postgraduate School in Hammersmith, London. He then spent two years training in Pediatric Cardiology at Boston Children’s Hospital, followed by 15 months as a Fellow at the Cardiovascular Research Institute (CVRI) at the University of California in San Francisco (UCSF). In 1962 he joined the faculty of the Albert Einstein College of Medicine in New York, and moved in 1966 to UCSF as Associate Professor of Pediatrics and member of the CVRI. He spent 50% of his time in the care of children with heart disease and 50% doing research into the pathophysiology of the coronary circulation. His interest in statistics began while taking his science degree. In England, he took a short course run by Bradford Hill. On returning to Johannesburg he was assigned to perform statistical analyses for other members of the Department of Medicine. Learning was by trial and error, helped by Dr J. Kerrich, head of the University’s Statistics Department. Hoffman began teaching statistics to medical students in 1964, and in San Francisco conducted an approved course for Fellows and Residents for over 30 years. He was a member of the Biostatistics group for approving and coordinating statistics at UCSF. For many years he was a statistical consultant for the journal Circulation Research, and was intermittently statistical consultant to several other medical journals.


    Biostatistics for Medical and Biomedical Practitioners

    Julien I.E. Hoffman

    Tiburon, California, USA

    Table of Contents

    Cover image

    Title page

    Copyright

    About the Author

    Preface

    Acknowledgments

    Part 1. Basic Aspects of Statistics

    Chapter 1. Basic Concepts

    Introduction

    Basic Uses of Statistics

    Data

    General Approach to Study Design

    A Brief History of Statistics

    Chapter 2. Statistical Use and Misuse in Scientific Publications

    Early Use of Statistics

    Current Tests in Common Use

    Statistical Misuse

    Basic Guides to Statistics

    Chapter 3. Some Practical Aspects

    Statistics Programs

    Variables

    Measurement Scales

    Displaying Data Sets

    Accuracy of Measurement

    Notation

    Operators

    Weights

    Statistics Books

    Chapter 4. Exploratory Descriptive Analysis

    Basic Concepts

    Advanced and Alternative Concepts

    Appendix

    Chapter 5. Basic Probability

    Introduction

    Types of Probability

    Basic Principles and Definitions

    Conditional Probability

    Bayes' Theorem

    Part 2. Continuous Distributions

    Chapter 6. Normal Distribution

    Introduction

    Normal or Gaussian Curve

    Populations and Samples

    Description of the Distribution Shape

    Determining Normality

    Ungrouped Data

    How Important Is Normality?

    Chapter 7. Statistical Limits and the Central Limit Theorem

    Central Limit Theorem

    Tolerance Limits

    Reporting Results

    Chapter 8. Other Continuous Distributions

    Continuous Uniform Distribution

    Exponential Distribution

    Logarithmic Distribution

    Weibull Distribution

    Chi-Square Distribution

    Variance Ratio (F) Distribution

    Chapter 9. Outliers and Extreme Values

    Outliers

    Extreme Values

    Appendix

    Part 3. Hypothesis Testing

    Chapter 10. Hypothesis Testing

    Hypotheses

    Significance

    Chapter 11. Hypothesis Testing

    Basic Concepts

    Advanced and Alternative Concepts

    Part 4. Discrete and Categorical Distributions

    Chapter 12. Permutations and Combinations

    Permutations

    Combinations

    Chapter 13. Hypergeometric Distribution

    Introduction

    General Formula

    Fisher's Exact Test

    Multiple Groups

    Chapter 14. Categorical and Cross-Classified Data

    Basic Concepts

    Advanced Concepts

    Chapter 15. Categorical and Cross-Classified Data

    Paired Samples: McNemar's Test

    Testing Ordered Categorical Data: Kolmogorov–Smirnov Tests

    Concordance (Agreement) between Observers

    Intraclass Correlation

    Chapter 16. Binomial and Multinomial Distributions

    Basic Concepts

    Advanced or Alternative Concepts

    Appendix

    Chapter 17. Proportions

    Introduction

    Proportions and Binomial Theorem

    Confidence Limits

    Sample and Population Proportions

    Sample Size

    Comparing Proportions

    Pooling Samples

    Chapter 18. The Poisson Distribution

    Introduction

    Relationship to the Binomial Distribution

    Goodness of Fit to a Poisson Distribution

    The Ratio of the Variance to the Mean of a Poisson Distribution

    Setting Confidence Limits

    The Square Root Transformation

    Cumulative Poisson Probabilities

    Differences between Means of Poisson Distributions

    Determining the Required Sample Size

    Appendix

    Chapter 19. Negative Binomial Distribution

    Introduction

    Probability of r Successes

    Overdispersed Distribution

    Uses of the Negative Binomial

    Part 5. Probability in Epidemiology and Medical Diagnosis

    Chapter 20. Some Epidemiological Considerations

    Basic Concepts

    Advanced Concepts

    Chapter 21. Probability, Bayes' Theorem, Medical Diagnostic Evaluation, and Screening

    Bayes' Theorem Applied

    Sensitivity and Specificity

    Likelihood Ratios

    Cutting Points

    Receiver Operating Characteristic Curves

    Some Comments on Screening Tests

    Part 6. Comparing Means

    Chapter 22. Comparison of Two Groups

    Basic Concepts

    Advanced Concepts

    Appendix

    Chapter 23. t-Test Variants

    Crossover Trials

    Equivalence and Noninferiority Testing

    Chapter 24. Multiple Comparisons

    Introduction

    Bonferroni Correction and Equivalent Tests

    Group Sequential Boundaries

    Sequential Analysis

    Adaptive Methods

    Chapter 25. Analysis of Variance I. One-Way

    Basic Concepts

    Advanced Concepts

    Chapter 26. Analysis of Variance II. More Complex Forms

    Basic Concepts

    Advanced and Alternative Concepts

    Appendix

    Part 7. Regression and Correlation

    Chapter 27. Linear Regression

    Basic Concepts

    Advanced or Alternative Concepts

    Appendix

    Chapter 28. Variations Based on Linear Regression

    Transforming the Y Variate

    Inverse Prediction

    Line of Best Fit Passes through Zero

    Errors in the X Variate

    Break Points

    Resistant Lines

    Appendix

    Chapter 29. Correlation

    Basic Concepts

    Advanced and Alternative Concepts

    Appendix

    Chapter 30. Multiple Regression

    Basic Concepts

    Advanced Concepts and Examples

    Chapter 31. Serial Measurements

    Introduction

    Serial Correlation

    Control Charts

    Cumulative Sum Techniques (Cusums)

    Serial Measurements

    Chapter 32. Dose–Response Analysis

    General Principles

    Quantal Dose–Response Curves

    Chapter 33. Logistic Regression

    Introduction

    Single Explanatory Variable

    Multiple Explanatory Variables

    Appropriateness of Model

    Chapter 34. Poisson Regression

    Introduction

    Suitability of Poisson Regression

    Detecting Overdispersion

    Correcting for Overdispersion

    Part 8. Miscellaneous Topics

    Chapter 35. Survival Analysis

    Basic Concepts

    Advanced Concepts

    Chapter 36. Meta-analysis

    Introduction

    Forest Graphs

    Funnel Plots

    Radial Plots

    L'Abbé Plots

    Criticisms of Meta-analysis

    Chapter 37. Resampling Statistics

    Introduction

    Bootstrap

    Permutations

    Jackknife

    Monte Carlo Methods

    Chapter 38. Study Design

    Sampling Problems

    Historical Controls

    Randomization

    Clinical Trials

    Placebo Effect

    Alternatives to Randomized Clinical Trials

    Part 9. End Texts

    Answers to Problems

    Glossary

    Index

    Copyright

    Academic Press is an imprint of Elsevier

    125 London Wall, London EC2Y 5AS, UK

    525 B Street, Suite 1800, San Diego, CA 92101-4495, USA

    225 Wyman Street, Waltham, MA 02451, USA

    The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK

    Copyright © 2015 Elsevier Inc. All rights reserved.

    No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

    This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

    Notices

    Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

    Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

    To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

    ISBN: 978-0-12-802387-7

    British Library Cataloguing-in-Publication Data

    A catalogue record for this book is available from the British Library

    Library of Congress Cataloging-in-Publication Data

    A catalog record for this book is available from the Library of Congress

    For information on all Academic Press publications visit our website at http://store.elsevier.com/

    Publisher: Mica Haley

    Acquisition Editor: Rafael Teixeira

    Editorial Project Manager: Mariana Kuhl

    Production Project Manager: Julia Haynes

    Designer: Victoria Pearson

    Typeset by TNQ Books and Journals

    www.tnq.co.in

    Printed and bound in the United States of America

    About the Author

    Julien I. E. Hoffman was born in 1925 in Salisbury (now named Harare) in Southern Rhodesia (now named Zimbabwe), where he was educated. He entered the Faculty of Medicine at the University of the Witwatersrand in Johannesburg, South Africa, and obtained a BSc Hons in Anatomy and Physiology in 1945 and a medical degree (MB, BCh) in 1949. After working for almost 3 years at the Central Middlesex Hospital in London, England, as a house officer and registrar in Medicine, he returned to Johannesburg as a senior registrar in the Department of Medicine for 18 months, and then returned to England to work for the Medical Research Council. In 1957, he went to Boston Children's Hospital to study congenital heart disease, and this was followed by 15 months as a Fellow in the Cardiovascular Research Institute at the University of California in San Francisco.

    In 1962, he joined the faculty at the Albert Einstein College of Medicine in New York City as an Assistant Professor of Pediatrics and Internal Medicine, and in 1966, returned to the University of California in San Francisco as Associate Professor of Pediatrics and member of the Cardiovascular Research Institute. He was a clinical pediatric cardiologist, taking care of children with heart disease, but spent about 50% time running a research laboratory to study the pathophysiology of the coronary circulation.

    His interest in statistics began while taking his Science degree. After going to England, he took short courses in Biostatistics from Bradford Hill and his colleagues. On returning to South Africa, as the only department member to know anything about statistics, he was assigned to perform statistical analyses for other members of the department. This was a period of learning by trial and error, helped by Dr J Kerrich, head of the University's Division of Statistics.

    When he went to San Francisco as a Fellow, he was assigned to give statistics lectures to the Fellows and Residents, and when returned to San Francisco in 1966, he gave an officially sanctioned course in Biostatistics to research Fellows and Residents. These lectures were given annually for over 30  years. He was a member of the Biostatistics group, a semiformal group that supervised statistical teaching and consultation. He also was a statistical consultant to the journal Circulation Research, and was often assigned manuscripts by other journals, mostly from the American Heart Association, to check on the statistical procedures used.

    Preface

    In my 40 years' experience of advising medical research workers about how to analyze their studies, certain problems arose frequently. For example, many investigators wanted to compare two Poisson distributions, yet some introductory books on Biostatistics give little attention to the Poisson distribution, even though it is an important distribution that answers the question of how often a rare event occurs, such as the number of deliveries per hour in a delivery room. Few people can navigate the minefield of multiple comparisons, which arises when several different groups are compared and is often handled incorrectly by performing multiple t-tests, yet most elementary texts do not deal with this problem adequately. Problems of repeated measures analysis, in which several measurements are made in each member of the group and thus are not independent, occur frequently in medical research but are not often discussed. Tolerance tests are often needed to set ranges of normal values so that a single measurement can be assessed as likely to be normal or abnormal (such as a single fasting blood glucose concentration). Because most basic books do not discuss this problem, most people incorrectly set confidence limits that should apply only to mean values. In fact, one of the incentives for this book was the lack of books in introductory Biostatistics that could be understood relatively easily, but that nevertheless were advanced enough that most investigators would not need to buy additional books or hunt through unfamiliar journals for appropriate tests.

    The book is intended to help physicians and biologists who might have had a short course on Statistics several years ago, but have forgotten all but a few of the terms and concepts and have not used their knowledge of statistics for reading the literature critically or for designing experiments. The general aim is to extend their knowledge of statistics, to indicate when various tests are applicable, what their requirements are, and what can happen when they are used inappropriately.

    This book has four components.

    1. It covers the standard statistical approaches for making descriptions and inferences—for example, mean and standard deviation, confidence limits, hypothesis testing, t-tests, chi-square tests, binomial, Poisson, and normal distributions, analysis of variance, linear regression and correlation, logistic regression, and life tables—to help readers understand how the tests are constructed, how to look for and avoid using inappropriate tests, and how to interpret the results. Examples of injudicious use of these tests are given. Although some basic formulas are presented, these are not essential for understanding what the tests do and how they should be interpreted.

    2. Some chapters include a section on advanced methods that should be ignored on a first reading but provide information when needed, and others have an appendix where some simple algebraic proofs are given. As this is not intended to be a mathematically rigorous book, most mathematical proofs are omitted, but a few are important teaching tools in their own right and should be studied. However, knowledge of mathematics (and differential calculus in particular) beyond elementary algebra is not required to use the material provided in this book. The equations indicate what the tests are doing.

    3. Scattered throughout the chapters are variations on tests that are often needed but not frequently found in basic texts. These sections are often labeled Alternative Methods, and they should be read and understood because they often provide simpler and more effective ways of approaching statistical inference. These include:

    a. Robust statistics for dealing with grossly abnormal distributions, both univariate and bivariate.

    b. Extending McNemar's test, a test for comparing paired counts, to more than two categories; for example, if matched pairs of patients are given one of two treatments and the results recorded as improved, the same, or worse, how should these be analyzed?

    c. Equivalence or noninferiority testing, to determine if a new drug or vaccine is equivalent to or not inferior to those in standard use.

    d. Finding the break point between two regression lines. For example, if the lactate:pyruvate ratio remains unchanged when systemic oxygen delivery is reduced below normal until some critical point is reached when the ratio starts to rise, how do we determine the critical oxygen delivery value?

    e. Competing risks analysis used when following the survival of a group of patients after some treatment, say replacement of the mitral valve, and allowing for deaths from noncardiac causes.

    f. Tolerance testing to determine if a single new measurement is compatible with a normative group.

    g. Crossover tests, in which for a group of subjects each person receives two treatments, thus acting as his or her own control.

    h. Use of weighted kappa statistics for evaluating how much two observers agree on a diagnosis.

        Some of these analyses can be found only in journals or advanced texts, and collecting them here may save investigators from having to search in unfamiliar sources to find them.

    4. Some chapters describe more complex inferences and their associated tests. The average investigator is not likely to use any of these tests without consultation with a statistician, but does need to know that these techniques exist and, if even vaguely, what to look for and how to interpret the results of these tests when they appear in publications. These subjects include:

    a. Poisson regression (Chapter 34), in which a predicted count, for example, the number of carious teeth, is determined by how many subjects have zero, 1, 2, etc., carious teeth.

    b. Resampling methods (Chapter 37), in which computer-intensive calculations allow the determination of the distributions and confidence limits for the mean, median, standard deviation, correlation coefficients, and many other parameters without needing to assume a particular distribution (a minimal sketch of the bootstrap is given after this list).

    c. The negative binomial distribution (Chapter 19) that allows investigation of distributions that are not random but in which the data are aggregated. If we took samples of seawater and counted the plankton in each sample, a random distribution of plankton would allow us to fit a standard distribution such as a binomial or Poisson distribution. If, however, some samples had excessive numbers of plankton and others had very few, a negative binomial distribution may be the way to evaluate the distribution.

    d. Meta-analysis (Chapter 36), in which the results of several small studies are aggregated to provide a larger sample, is widely used; an example is combining several small studies of the effects of beta-adrenergic blockers on the incidence of a second myocardial infarction. The pitfalls of doing such an analysis are seldom made clear in basic statistics texts.

    e. Every investigator should be aware of multiple and nonlinear regression techniques (Chapter 30) because they may be important in planning experiments. They are also used frequently in publications, but usually without mentioning their drawbacks.
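
    The following is a minimal sketch, not taken from the book, of the bootstrap idea mentioned in item (b): resample the observed data with replacement many times and read confidence limits for a statistic (here the median) directly off the resampled distribution, without assuming any particular underlying distribution. The data, the percentile method, and the use of numpy are assumptions made only for illustration.

    # Bootstrap 95% confidence limits for a median, using simulated skewed data
    import numpy as np

    rng = np.random.default_rng(1)
    sample = rng.lognormal(mean=4.6, sigma=0.4, size=50)   # made-up, deliberately skewed data

    boot_medians = np.array([
        np.median(rng.choice(sample, size=sample.size, replace=True))
        for _ in range(10_000)
    ])
    lo, hi = np.percentile(boot_medians, [2.5, 97.5])      # percentile-method limits
    print(f"sample median {np.median(sample):.1f}, 95% bootstrap CI ({lo:.1f}, {hi:.1f})")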

    With the general availability of personal computers and statistical software, it is no longer necessary to detail computations that should be done by computer programs. There are many simple free online programs that calculate most of the commonly used statistical descriptions (mean, median, standard deviation, skewness, interquartile distance, slope, correlation, etc.) as well as commonly used inferential tests (t-test, chi-square, ANOVA, Poisson probabilities, binomial probabilities, life tables, etc.), along with their associated graphics. More complex tests require commercial programs. There are free online programs for almost all the tests described in this book, and hyperlinks are provided for these.

    Problems are given in appropriate chapters. They are placed after a procedure is described so that the reader can immediately practice what has been studied and make sure that the message is understood; the procedures are ones that the average reader should be able to perform without statistical consultation. Although the problems are simple and could be done by hand, it is better to use one of the recommended online calculators because they save time and do not make arithmetic errors. This frees up time for the reader to consider what the results mean.

    The simpler arithmetic techniques, however, are still described in this book because they lead to better understanding of statistical methods, and show the reader where various components of the calculation come from, and how the components are used and interpreted. In place of tedious instructions for doing the more complex arithmetic procedures, there is a greater concentration on the prerequisites for doing each test and for interpreting the results. It is easier than ever for the student to think about what the statistical tests are doing and how they contribute to solving the problem. On the other hand, we need to resist the temptation to give a cookbook approach to solving problems without giving some understanding of their bases, even though this may involve some elementary algebra. As Good and Hardin (2009) wrote: "Don't be too quick to turn on the computer. Bypassing the brain to compute by reflex is a sure recipe for disaster."

    Many people who learn statistics devise and carry out their own experiments, and for them a good knowledge of statistics is essential. There are many excellent statistical consultants, but not enough of them to advise every investigator who is designing an experiment. An investigator should be able to develop efficient experiments in most fields, and should reserve consultation for only the more complex of these. But more numerous and important are those who do not intend to do their own research. Even readers who are not research workers are still responsible for assessing the merit of the articles that they read to gain new knowledge. It is no longer good enough for such readers to know only technical information about the drugs used or the techniques for measuring pressure and flow, or to understand the physiologic and biochemical changes that take place in the disease in question. It is just as essential for them to learn to read critically, and to know how to evaluate the statistical tests used. Who has not been impressed by the argument that because Fisher's exact test showed a probability of 0.003, the two groups were different, or that because the probability was 0.08 the two groups were not different? Who has heard of Fisher's exact test? Of McNemar's test? Of the Kolmogorov–Smirnov two-group test? Of Poisson regression? And if readers have not heard of them, how can they know whether the tests should have been used, or have been used and interpreted correctly? With the pace at which new knowledge is appearing and being incorporated into medical practice, everyone engaged in medical research or practice needs to be well grounded in statistics. Statistical thinking is not an addendum to the scientific method, but an integral component of it.

    What can you expect to achieve after studying this book?

    1. A better appreciation of the role of variation in determining average values, and how to take account of this variation in assessing the importance of measured values or the difference between these values in two or more groups.

    2. A better understanding of the role of chance in producing unexpected results, and how to make decisions about what the next steps should be.

    3. An appreciation of Bayes' theorem, that is, how to improve estimates of probability by adding in potential explanatory variables, and the importance of the population prevalence in making clinical predictions.

    4. An ability to perform simple statistical tests (e.g., t-tests, chi-square, linear regression and correlation, McNemar's test, simple analysis of variance, calculate odds ratios and their confidence limits, and calculate simple life tables), as well as understanding their limitations.

    5. An appreciation that there are many statistical techniques that can be used to address specific problems, such as meta-analysis, bioassay methods, nonlinear and multiple regression, negative binomial distributions, Poisson regression, and time series analysis. It is unlikely that after studying this book you will be able to perform these analyses on your own, but you should know that such methods exist and that a statistical consultant can help you to choose the right analytic method.

    Statistical procedures are technologies to help investigators interpret their data, and are not ends in themselves. Like most technologies, Statistics is a good servant but a bad master. Francis Galton (1889) wrote: "Some people hate the very name of statistics, but I find them full of beauty and interest. Whenever they are not brutalized but delicately handled by the higher methods, and are warily interpreted, their power of dealing with complicated phenomena is extraordinary."

    References

    Galton F.I. Natural Inheritance. London: Macmillan and Company; 1889 p. 62.

    Good P.I, Hardin J.W. Common Errors in Statistics (and How to Avoid Them). Hoboken, NJ: John Wiley & Sons; 2009 p. 274.

    Acknowledgments

    Many people have helped me eradicate errors and clumsy phrasing. I owe a debt of gratitude to Joseph P. Archie, Jr, Raul Domenech, Stanton Glantz, Gus Vlahakes, and Calvin Zippin. My special thanks to my wife for help with editing.

    I also wish to thank the staff of Elsevier for their hard work, with particular thanks to Rafael Teixeira, Mariana Kühl Leme, and Julia Haynes.

    Part 1

    Basic Aspects of Statistics

    Outline

    Chapter 1. Basic Concepts

    Chapter 2. Statistical Use and Misuse in Scientific Publications

    Chapter 3. Some Practical Aspects

    Chapter 4. Exploratory Descriptive Analysis

    Chapter 5. Basic Probability

    Chapter 1

    Basic Concepts

    Abstract

    This chapter introduces the major aspects of statistical approaches to scientific investigation. It describes populations and samples, and variables and parameters; considers the main ways in which statistical approaches are used; describes the principles of designing a scientific study; and ends by giving a brief history of the development of statistical thought.

    Keywords

    Scientific investigation; Scientific study; Statistical approaches; Statistical thought

    Contents

    Introduction

    Populations and Samples

    The Effects of Variability

    Variables and Parameters

    Basic Uses of Statistics

    Description

    Statistical Inference

    Data

    Models

    General Approach to Study Design

    A Brief History of Statistics

    References

    Introduction

    Statistics is a crucial part of the scientific method for testing theories derived from empirical studies. In most scientific studies the investigator starts with an idea, examines the literature to determine what is known and not known about the subject, formulates a hypothesis, decides what and how measurements should be made, collects and analyzes the results, and draws conclusions from them. Excellent introductions to these processes are provided by Altman (1992) and Hulley et al. (2007).

    One of the main aims of scientific study is to elucidate causal relationships between a stimulus and a response: if A, then B. Because responses to a given stimulus usually vary, we need to deal with variation and the associated uncertainty. In thinking about the height of adult males in the USA, there is some typical number that characterizes height, for example, 70 in or 178 cm, but there is variation around that typical value. Although fasting normal blood glucose is about 100 mg/dl, not every normal person will have this exact blood glucose concentration. If we measure the length of 2-in-long 20-gauge needles with great accuracy, there will be variation in length from needle to needle. It is an essential role of statistical thought to deal with the uncertainty caused by variability. One of the major ways to assess uncertainty is by doing an appropriate statistical test, but the test is merely the instrument that produces a result that needs to be interpreted. Performing a statistical test without the correct interpretation may lead to incorrect conclusions.

    Populations and Samples

    It is essential to distinguish between measurements in an entire group with a given characteristic, known as a population, and those in a subset known as a sample. The slope of a straight line (for example, relating height to blood pressure in a population of children) is symbolized by β in the population, but by b in a sample from that population.

    The Effects of Variability

    How does variability interfere with drawing valid conclusions from the data obtained in a study? A recent article reported that in the year 2000, adult Dutch males were 4.9 cm taller than their US counterparts (Komlos and Lauderdale, 2007). Several questions can be asked about this result. First, how were the measurements made, and with what accuracy? This is not a statistical question, but one that requires knowledge of the field of study. Second, how likely is that difference in height between the samples to be a true measure of the height difference between the populations?

    One way to answer this question would be to measure the whole population. All the adult males in the Netherlands constitute one population, and all adult males in the USA form another population. In theory, heights could be measured in the whole population of adult males in both countries, but this would be an enormous undertaking in time, resources, and money. It is much easier to select a sample (a subset of the population) of adult males from each country and use the resultant measurements to make inferences about national heights. We need a precise definition of the population being studied. For example, national heights depend on the year of measurement: In 1935, adult males in the Netherlands and the USA had equal average heights that were both less than those measured in the year 2000. Population 1935 and Population 2000 are two different populations.

    If the measurements were made in the whole population, the results would be unambiguous, and no statistical inference would be needed: all adult males in the Netherlands in the year 2000 would be, on average, 4.9 cm taller than their US counterparts. But if the measurements are actually made in a sample from each country, another question arises. How well does the difference in the averages of the two samples reflect the true difference in the averages of the two populations? Is it possible that the sample from the Netherlands contained a disproportionate number of tall subjects, and that the two population average heights differ by less than 4.9 cm? Indeed, it is possible, either because of bias in selecting subjects or simply by chance. The term random means governed by chance, so that any member of the population has an equal chance of being included in the sample; if the sample taken was of basketball players, then their heights clearly would not represent the heights in the general population. Issues of randomization and selection bias are dealt with in detail in Chapter 38. What statistical analysis can do is to allow us to infer from the samples to the population, and in this instance to indicate, based on our samples, the likely values for the population averages.

    The same considerations apply to experiments. In a population of people with diabetes mellitus, fasting blood glucose concentrations might range from 110 to 275 mg/dl. If we select two samples of 10 patients each from this population, it is unlikely that the two samples will have the same concentrations; almost certainly, one group will have some concentrations higher or lower than any concentrations in the other group, and the mean concentrations in the two groups are unlikely to be the same. If the group with the lower mean blood glucose concentration had been given a drug that in reality had no effect on fasting blood glucose concentration, not knowing that both samples came from the same population would lead to the false conclusion that the drug had caused the fasting blood glucose concentration to decrease. It is to guard against this type of error (and many other types) that we need to think about statistical inference.
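
    The point is easy to demonstrate by simulation. The sketch below, which is not from the book, draws two samples of 10 from a single artificial "diabetic population" and compares their mean fasting glucose; any difference between the two sample means arises from sampling alone. The glucose range comes from the text above, but the shape of the simulated distribution and the use of numpy are assumptions made only for illustration.

    # Two samples drawn from the same simulated population usually differ by chance alone
    import numpy as np

    rng = np.random.default_rng(7)
    population = rng.uniform(110, 275, size=100_000)   # assumed shape; glucose range from the text

    sample_a = rng.choice(population, size=10, replace=False)
    sample_b = rng.choice(population, size=10, replace=False)
    print(f"mean A = {sample_a.mean():.1f} mg/dl, mean B = {sample_b.mean():.1f} mg/dl")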

    Variables and Parameters

    A characteristic that has different values in different people, times, or things is a variable. Variables such as weight, height, age, or blood pressure can be measured in appropriate units. Others, such as eye color, gender, or presence or absence of illness, are attributes, and typically we count the number of items possessing each attribute. Sometimes we deal with one variable at a time, for example, weight gain of groups of young rats on different diets; these lead to univariate statistical descriptions and testing. At other times we try to relate one variable to another, such as age and height in growing children, and then we are dealing with bivariate statistics. Finally, many different variables might be involved, and then we are dealing with multivariate statistics.

    Measured variables have several components. If we measure the heights of different people, there will be differences due to individual genetic and environmental (nutritional) influences. These differences are due to the biological processes that control height and may be the signals that we are examining. On the other hand, if we measured the height of one particular person several times, we might get variations that are due to the degree to which the subject stretches. The person's true height is not changing over a few seconds, but there is variability of the measurement process itself. This source of variability is sometimes known as noise, and we usually do our best to eradicate or minimize it. Finally, the signal might not represent the true value. For example, if each person measured wore shoes with 1-in heels, their heights would all be 1-in greater than they should be. This represents a consistent bias, and as far as possible such biases should be detected and eliminated. These ideas produce a model:

    Measured value = True value + Bias + Noise (random error).

    The more we can reduce the second and third sources of error (bias and noise), the closer will be the measured and true values. Statistics is therefore a method for separating the signal from the noise in which it is embedded. Statistical tests help to allow for noise, but do not eliminate bias unless special steps are taken.

    Noise includes not only measurement error, but also legitimate variation. Thus in the example of adult male heights, the true value sought is the average in each country. Any one person can be above or below that average for reasons of genetics or nutrition, so that the model is slightly different. It is:

    Measured height in subject A = Average national height ± individual difference between the height of subject A and the average. This individual difference from the average is known as error, often symbolized by ε.

    This is a form of noise that is biological, not due to measurement error, but it plays the same role in making it difficult to determine the true value of the desired signal.

    The term parameter has different meanings in different fields of study. In statistics, it is generally used to indicate a numerical characteristic of a population, such as the mean μ (average), the slope of a relationship β, or the proportion of positive results π. The individual sample values from the population are variables; the characteristic number is the parameter.

    Basic Uses of Statistics

    Description

    The first use is descriptive: how many people survived therapy A? What was the average length of the femur at any given age in children with growth hormone deficiency with and without treatment? How close is the relation of height to blood pressure? There are many ways of making these descriptions, most of them relatively simple. The results obtained are termed point estimates.

    These descriptions concern a specific set of observations, and are unlikely to be identical for another similar set of data. What is important is how much other similar sets of data from the same population might vary from that data set. One way of determining this is to draw many samples from the population and determine how their means vary, but there is a simpler way to obtain the same answer. Elementary statistical theory provides limits within which 99%, 95%, 90% (or any other percentage desired) of the means of all future sets of similar data will fall. Thus the sample average (or mean) height of 97 children of a particular age may be 39 in and the standard deviation (a measure of variability) may be 1.2 in. This point estimate of the mean and the observed standard deviation then are used to predict that 95% of all similar sized groups of children of that age will have mean heights varying between 37.3 and 40.7 in; these are referred to as 95% confidence limits or intervals. A confidence interval is a range of values based on the sample observations that allows us to predict with a given probability (usually 95%) where the true characteristic value of the population is likely to occur. The interval is determined by using information from the size of the sample (a small sample is less reliable than a large one and gives a bigger interval) and from the variability of the sample data as indicated by the standard deviation. Therefore an estimate of mean height in a single sample (the point estimate) allows us to predict the limits within which the means of other samples of the same size are likely to lie, and therefore within what range the population mean is likely to be. Setting these confidence limits is an essential extension of descriptive analysis. How to estimate the population mean from the sample mean will be discussed in later chapters.
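
    As an illustration of the arithmetic behind such limits, the short sketch below computes a conventional 95% confidence interval for a population mean from a sample mean, standard deviation, and sample size, using the t distribution. The numbers and the use of the scipy library are assumptions for illustration; they are not the book's worked example.

    # 95% confidence interval for a population mean: mean +/- t * SD / sqrt(n)
    import math
    from scipy import stats

    n, mean, sd = 25, 120.0, 15.0                  # assumed summary statistics
    t_crit = stats.t.ppf(0.975, df=n - 1)          # two-sided 95% critical value
    half_width = t_crit * sd / math.sqrt(n)
    print(f"95% CI: {mean - half_width:.1f} to {mean + half_width:.1f}")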

    Point estimates are specific for a particular sample and are as accurate as measurement and calculation can make them. They give a specific answer, just as an assayer testing an ore sample can tell how much gold is present. However, the confidence limits not only depend on the model used but also convey the degree of uncertainty that exists. Given the data, there is a 95% probability of being right in setting the limits for that mean value in other samples from the same population, but that goes along with a 5% chance of being wrong. Limits may be widened to 99%, or 99.5%, or 99.9999%, but certainty is never achieved. Could the randomly chosen sample of adult males have by chance selected the tallest members of the population so that the estimate of mean height, instead of being 175 cm, was 185 cm? Yes, indeed such an aberrant sample could have been chosen, even if it is very unlikely, and if it had been chosen our estimates of the mean population height would be seriously in error. That is the uncertainty that we have to live with.

    Statistical Inference

    When we ask whether the mean heights of treated and untreated children with growth hormone deficiency are different, we invoke the second use of statistics: statistical inference, which is done with the aid of hypothesis tests. In general, we generate hypotheses about the possible effects of treatment, and then deploy statistical techniques to test the appropriateness of these hypotheses. Often the null hypothesis is selected, that is, we test the possibility that the difference between the groups (taking growth hormone or not) has not caused a change in the outcome (mean height). If the tests indicate that the differences observed would be unlikely to occur by chance, then we may decide to reject the null hypothesis. One of the advantages of the statistical method is that it can give an approximate probability of correctly rejecting the null hypothesis. It does so by calculating a P value that may be defined as the probability of finding the observed difference, or an even more extreme difference (say in mean heights), if the null hypothesis is true. This subject is dealt with in depth in Chapters 10 and 11.
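
    As a concrete illustration (not from the book), the sketch below runs a two-sample t-test on simulated heights for treated and untreated groups and prints the resulting P value, the probability of a difference in means at least as large as the one observed if the null hypothesis of equal population means were true. The data, group sizes, and use of the scipy library are assumptions made only for illustration.

    # Two-sample t-test on simulated data: the P value is the probability of a
    # difference at least this extreme if the two population means are equal.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    treated = rng.normal(loc=41.0, scale=1.5, size=30)     # simulated heights, treated group
    untreated = rng.normal(loc=40.0, scale=1.5, size=30)   # simulated heights, untreated group

    t_stat, p_value = stats.ttest_ind(treated, untreated)
    print(f"t = {t_stat:.2f}, P = {p_value:.4f}")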

    Statistical methods also allow us to investigate relationships among variables that may be thought of as either causative (independent) or responsive (dependent). Thus if we give a drug to patients and observe a fall in blood pressure, we believe that the drug (independent cause) produced the fall in pressure (dependent response).

    We investigate relationships in two main ways. In the survey method, we collect examples from one or more populations and determine if the independent and dependent variables are related. In the survey there may or may not be a comparison group. Without comparison groups it is a descriptive study. With comparison groups it is an analytical study (Grimes and Schulz, 2002). For example, the role of fluoride in preventing dental caries was examined in cities that did or did not supplement their water supplies with fluoride; the amount of caries was much lower in people living in the cities with higher water fluoride concentrations (Grimes and Schulz, 2002). The advantages of an observational study are that large populations can be studied and that individuals do not have to be manipulated and assigned to groups; the disadvantage is that factors other than the fluoride might have caused the observed differences in the incidence of dental caries. Factors extraneous to but related to the factors being studied are termed confounding factors. For example, the incidence of dental caries might really be due to deficiency of another factor (X) that is inversely correlated with fluoride. If we do not know about X, or do not measure it, then we would rightly conclude that increased fluoride in the water was associated with decreased incidence of caries, but would be incorrect in stating that the increased amount of fluoride caused the decreased incidence of caries because that decrease was really caused by a deficiency of substance X. X is the confounding factor. Confounding variables are not necessarily unimportant, but they confuse the relationship that is being examined.

    Had subjects been allocated at random to a high- or a low-fluoride group, there would have been fewer possible confounding factors, but there would have been far fewer people examined (because of the cost), and the investigators would have had to make sure that no other sources of fluoride were introduced.

    There are three main types of analytical studies. One is the cohort study, in which people exposed to an agent are compared with people who are not so exposed. For example, a survey might compare a large group of people who took aspirin regularly with another group (matched for age, gender, and anything else that seems to be important) that did not take aspirin (exposed vs nonexposed), and after some years the investigators determine how many in each group had had a myocardial infarction (outcome). This might be a prospective study, but could also be retrospective if the outcomes were examined from a database started 10 years ago. A second type, the case-control study, starts with the outcome and then looks back at exposure; for example, taking 100 patients who had had a myocardial infarction and another 100 with no infarction (appropriately matched for age and gender) and looking back to see how many in each group had taken regular aspirin. The case-control study is often used when investigating rare diseases or outcomes. The third type is the cross-sectional study, in which exposure and outcome are determined at the same time in the study population. For example, in a group of hospitalized patients, high-density lipoprotein (HDL) concentrations are measured in those who have had a myocardial infarction and in a comparable group who have not. If the first group has low HDL concentrations on average and the second group does not, then there is an association between HDL concentration and myocardial infarction. This type of study is relatively easy and cheap to do, but it does not determine whether having a low HDL is a cause of myocardial infarction.

    The other type of relationship is that determined by experiment in which the independent variable is deliberately manipulated. For example, two groups of people with normal blood pressures are selected, and one group is deliberately exposed to increased stress or a high-salt diet. If the blood pressures increase in this group but not in the control group, then it is possible that stress or the salt content of the diet is a cause of hypertension, and the mechanisms by which this occurs can then be investigated.

    These statistical inferences indicate the probability that the differences observed could occur due to chance alone. Frequently, statistical procedures begin with the assumption of the null hypothesis, for example, that the means of two populations are identical, and then test whether the observed sample differences could easily have occurred by chance. If differences of the magnitude observed could easily have occurred by chance, then the null hypothesis should not be rejected and there is no good reason to postulate a causative effect.

    Setting confidence limits and making statistical inferences can be done efficiently only after an effective description of the data. Therefore the initial part of this book is devoted to descriptive methods on which all subsequent developments are based.

    Data

    Statistics involves thinking about numbers, but sometimes these are not what they seem. For example, a university proudly announces that its ranking has improved because last year it accepted only 5% of all applicants. This might mean that it is indeed a prestigious university so that many candidates apply, but it might also mean that the university indulges in misleading advertising so that many who have no chance of admission are persuaded to apply. A university department might assert that it has greatly improved its teaching because 97% of the class passed with flying colors, but that might conceal the fact that several students who were not doing well were advised to withdraw from the course. Therefore whether the university or the department has improved depends not on the numbers provided (often by people who have a stake in the impression given) but on what the data actually mean. This is true of all data, and it is as well to consider what the numbers mean before beginning to analyze them.

    Models

    All statistical tests are based on a particular mathematical model. A model is a representation of reality, but it should never be mistaken for the reality that underlies it. As Box and Draper wrote, "Essentially, all models are wrong, but some are useful" (Box and Draper, 1987).

    All statistical tests use models, either explicitly or implicitly. For example, consider the heights of a group of people in the USA and a group of Central African pygmies. In each group the heights will vary, but on average the pygmies will be much shorter than the Americans. The model that we use is to regard the pygmies as having an average height of μP cm and the Americans as having an average height of μA cm. Any one pygmy may differ from the pygmy average by a difference εi, and any one American may differ from the American average by difference εj. An assumption is that each of these sets of differences is normally distributed (Chapter 6). To compare the two sets of heights to determine if they really are different from each other, a test such as the t-test is used in which this model is required. If the population distributions fit the model and are indeed normally distributed, then the t-test is an efficient way of comparing the two sets of heights. If, on the other hand, the population heights are far from normally distributed, then using a test that demands a statistical model with normal distributions can produce misleading conclusions. Therefore whenever a statistical test is used, the model being invoked must be considered and the data must be evaluated to find out if they reasonably fit the model. If the model chosen is incorrect, then the test based on that model may give incorrect results.
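
    A minimal sketch, not from the book, of the kind of check this implies: before relying on a test whose model assumes normal distributions, examine whether the sample plausibly fits that model. Here a Shapiro-Wilk test is applied to simulated data; a histogram or normal quantile plot serves the same purpose, and the choice of test and the numbers are assumptions made only for illustration.

    # Check whether samples plausibly fit a normal-distribution model
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    roughly_normal = rng.normal(170, 8, size=40)      # simulated heights
    very_skewed = rng.exponential(scale=10, size=40)  # simulated, strongly skewed variable

    for name, data in [("normal-ish", roughly_normal), ("skewed", very_skewed)]:
        stat, p = stats.shapiro(data)
        print(f"{name}: Shapiro-Wilk P = {p:.3f}")    # a small P suggests a poor fit to the normal model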

    General Approach to Study Design

    After studying this book, the reader should be able to design simple studies and tell if someone else's studies have been correctly designed. The following discussion summarizes what should be done, and gives some guidance about selecting the appropriate analyses.

    The first requisite for any study is to ask a focused question. It is of little use to ask "What happens if we place a normal human adult in a temperature of 35 °C (95 °F) for 3 weeks?" Many hundreds of anatomic, physiologic, neurologic, biochemical, and molecular biologic changes probably occur. You may be able to measure only some of these, and for all you know may not have measured the most important changes. Furthermore, if you could measure all the possible changes, the mass of results would be very difficult to interpret. That is not to say that the effects of persistent high ambient temperatures are not important and should not be studied, but rather that the study should be designed to answer specific questions. For example, asking what mechanisms achieve adequate heat loss under these circumstances, and looking at possible factors such as changes in blood volume, renal function, and heat exchange through the skin, are valid and important questions.

    This requisite applies to any study. The study might be a laboratory experiment on norepinephrine concentrations in rats fed different diets, a clinical trial of two treatments for a specific form of cancer, a retrospective search of population records for the relation between smoking and bladder cancer, or a survey of how prospective votes for Democrats, Republicans, and Independents are related to race and gender.

    The next decision is what population to sample. If the question is whether a new antihypertensive drug is better than previous agents, decide what the target population will be. All hypertensives or only severe hypertensives? Males and females, or only one gender? All ages or only over 65 years? All races or only Afro-Americans? With or without diabetes mellitus? With or without prior coronary artery disease? And so on, depending on the subject to be studied. These are not statistical questions, but they influence the statistical analysis and interpretation of the data. Therefore inclusion and exclusion criteria must be unambiguously defined, and the investigator and those who read the results must be clear that the results apply at best only to a comparable group.

    Define what will be measured, and how. Will it be a random blood pressure, or one taken at 8 am every day, or a daily average? Will you use a standard sphygmomanometer or one with a zero-muddling device? Will you measure peripheral blood pressure or measure central blood pressure by using one of the newer applanation devices? Again these are not statistical questions, but they will affect the final calculations and interpretation. The number of possible variables in this comparatively simple study is large, and this explains in part why different studies with different variables often reach different conclusions.

    Consider how to deal with confounders. It is rare to find a perfect one-to-one relationship between two variables in a biomedical study, and there is always the possibility that other factors will affect the results. These other factors are confounders. If we know about them, our study might be made more efficient by allowing for them; for example, including patients with diabetes mellitus when examining the outcome of stenting a stenosed coronary artery. Therefore in planning to study the outcome of stent implantation, we might want to incorporate potential confounders in the study; for example, diabetes mellitus, hypertension, obesity, elevated LDL concentrations, renal function, racial group, and age distribution. If we can arrange our study so that each group has subgroups each with an identical pattern of confounders, analysis will be easier and more effective. On the other hand, with too many subgroups, either the total numbers will be huge or else each subgroup may have insufficient numbers to allow for secure interpretation. If for practical reasons such balancing of confounders cannot be done, an approach such as Cox regression (Chapter 35) or propensity analysis (Chapter 38) might allow for the influence of each of these other factors. Either approach would be better than not considering these confounders at all. Finally, there are likely to be confounders that we do not know about. The only way to try to allow for these is to make sure that the various groups are chosen at random, so that it is likely that unknown confounders will be equally represented in all the groups.

    The term simple random sampling means that each member of the target population has an equal chance of being selected, and that selection of one member has no effect on the selection of any other member. Randomization is the process of taking a given sample and dividing it into subgroups by random selection. As stated by Armitage and Remington (1970): "Randomization may be thought of as a way of dealing with all the unrecognized factors that may influence responses, once the recognizable factors, if any, have been allowed for by some sort of systematic balancing. It does not ensure that groups are absolutely alike in all relevant aspects; nothing can do that. It does ensure that they are all unlikely to differ on the average by more than a moderate amount in any given characteristic, and it enables the statistician to assess the extent to which an observed difference in response to different treatments can be explained by the hazards of random allocation." The hope is that any unrecognized but relevant factors will be equalized among the groups and therefore not confound the results. More details about randomization are given in Chapter 38.
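
    A minimal sketch (with hypothetical subject labels) of simple randomization as described above: shuffle the recruited subjects and split them into two treatment groups, so that every subject has an equal chance of ending up in either group.

    # Randomly allocate 20 recruited subjects to two treatment groups of 10
    import random

    random.seed(2024)                                      # fixed seed so the allocation can be reproduced
    subjects = [f"subject_{i:02d}" for i in range(1, 21)]  # hypothetical identifiers
    random.shuffle(subjects)

    group_a, group_b = subjects[:10], subjects[10:]
    print("Drug A:", group_a)
    print("Drug B:", group_b)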

    When making these decisions, the type of statistical analysis that will be used needs to be specified before starting the study. Will there be two groups, one given the new drug A and one a standard drug B? Will there be a crossover experiment, giving group 1 drug A first and drug B second, whereas group 2 gets drug B first and drug A second? Is there to be one group, giving drug A first and drug B second? The answer to these questions will in part determine how to design the details of the study, how to select subjects, and what form of analysis to use. The more effective the design, the more informed will be the final interpretation.

    In deciding about the statistical approach to be used, define what will be measured and how it will be done. Dichotomous outcomes (yes if the pressure falls >10 mm Hg, no if it does not) and continuous ratio-scale outcomes require different analytical approaches. Decide in advance what to do if any subject cannot complete the study.
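
    A small Python sketch (with invented data and a hypothetical 10 mm Hg cut-off) shows how the choice of outcome scale changes the analysis; the particular tests used here are common choices for the two situations, not the only ones.

        from scipy import stats

        # Invented falls in systolic pressure (mm Hg) for two hypothetical groups
        fall_drug_a = [12, 18, 9, 15, 22, 7, 14, 11, 16, 13]
        fall_drug_b = [6, 10, 4, 12, 8, 5, 9, 7, 11, 3]

        # Continuous outcome: compare the mean falls with a two-sample t test
        t_stat, p_continuous = stats.ttest_ind(fall_drug_a, fall_drug_b)

        # Dichotomous outcome: "responder" = fall >10 mm Hg, compared with
        # Fisher's exact test on the resulting 2 x 2 table
        responders_a = sum(x > 10 for x in fall_drug_a)
        responders_b = sum(x > 10 for x in fall_drug_b)
        table = [[responders_a, len(fall_drug_a) - responders_a],
                 [responders_b, len(fall_drug_b) - responders_b]]
        odds_ratio, p_dichotomous = stats.fisher_exact(table)

        print(p_continuous, p_dichotomous)

    Dichotomizing a continuous measurement discards information, so the two approaches generally give different results and require different sample sizes.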

    Although the outcomes are not yet available, make some guesses about what to expect, because this leads to the calculation of sample size. For a new antihypertensive drug there will be some prior information from animal or anecdotal human studies, and the important question is what magnitude of effect to expect. No one would do a study to detect a decrease of 1 mm Hg in blood pressure, but what will you do if you expected a 15 mm Hg decrease and the preliminary results show a decrease of ≤5 mm Hg? Do you have a stopping rule? Did you factor this possibility into your calculation of sample size?
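
    To show how such a guess feeds into a sample-size calculation, here is a minimal Python sketch using the conventional normal-approximation formula for comparing two means. The 15 mm Hg expected difference, the 20 mm Hg standard deviation, and the 5% two-sided significance level with 80% power are assumptions made only for this example.

        from math import ceil
        from scipy.stats import norm

        alpha = 0.05    # two-sided significance level (assumed)
        power = 0.80    # 1 - beta (assumed)
        delta = 15.0    # expected fall in blood pressure, mm Hg (assumed)
        sigma = 20.0    # standard deviation of the fall, mm Hg (assumed)

        z_alpha = norm.ppf(1 - alpha / 2)   # about 1.96
        z_beta = norm.ppf(power)            # about 0.84

        # Conventional formula for two equal groups:
        # n per group = 2 * (z_alpha + z_beta)^2 * sigma^2 / delta^2
        n_per_group = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
        print(ceil(n_per_group))   # about 28 per group under these assumptions

    Halving the expected difference to 7.5 mm Hg would roughly quadruple the required sample size, which is why an honest prior guess, and a plan for what to do if it proves too optimistic, matter so much.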

    Begin analysis of any data set with simple preliminary exploration and description before plunging into hypothesis testing.

    A Brief History of Statistics

    For centuries, governments collected demographic data about manpower, births and deaths, taxes, and other details. Early examples are the Chinese census of AD 2 under the Han dynasty, which counted 57.67 million people in 12.36 million households, and William the Conqueror's survey in 1085 of all the properties in England, recorded in the Domesday Book. Beyond compiling such lists, however, no manipulation of data was done until John Graunt (1620–1674) published his Natural and Political Observations Made upon the Bills of Mortality in 1662, perhaps the first major systematic contribution to what was termed political arithmetic. Graunt not only collected data systematically, but also analyzed them.

    In 1631 (Lewin, 2010) the term statist was used, perhaps for the first time, to describe a person interested in political arithmetic who "desires to look through a Kingdome." The term statistics, however, seems to have been used first by a professor of politics in Göttingen in 1749, when he wanted a word for numerical information about the state: the number of kilometers of roads, the number of people, the numbers of births and deaths, the number of bushels of wheat grown, and so on (Yule and Kendall, 1937). (Kaplan and Kaplan (2006) attribute the term to a professor in Breslau in Prussia.) Today these are termed economic and vital statistics. The items in a group are also often referred to as statistics, but there is usually no difficulty deciding whether the term refers to items in a group or to the field of study.

    One of the origins of statistics is probability theory. Even before the Roman empire, people were interested in the odds that occur in games of chance, and these were systematized by Cardano (1501–1576) in Liber de Ludo Aleae (Book of Dice Games), written in 1560 but not published until 1663 (as described in delightful books by Weaver (1963) and Kaplan and Kaplan (2006)). However, the first specific mathematical theory of probability originated in 1654 with a gambling problem that puzzled Antoine Gombaud, Chevalier de Méré, Sieur de Baussay. There is a game of chance in which the house (the gambling establishment) offers to bet even money that a player will throw at least one 6 in four throws of a die. On average, whoever backs that event will win 671 times for every 625 times he loses. What concerned the Chevalier de Méré, however, was that the corresponding bet on throwing at least one double 6 in 24 throws of a pair of dice was not favorable. After solving this problem (try it yourself, then see Chapter 5), he checked his solution with Blaise Pascal (1623–1662), the great French mathematician. Pascal confirmed his answer, and then went on to investigate other probabilities related to gambling. He exchanged letters with Pierre de Fermat (1601–1665), and other mathematicians were then drawn into a field that grew rapidly. Although the problem above was a real one, it was probably known well before Pascal became involved, and the story may be apocryphal (Ore, 1960).
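
    For readers who want to verify the quoted figures, both probabilities can be checked in a few lines of Python (be warned that the second calculation gives away the answer to the puzzle deferred to Chapter 5).

        # Exact probabilities behind de Méré's two bets
        p_one_six = 1 - (5 / 6) ** 4        # at least one 6 in 4 throws of a die
        p_double_six = 1 - (35 / 36) ** 24  # at least one double 6 in 24 throws

        print(p_one_six)     # 671/1296, about 0.518: favorable to whoever backs it
        print(p_double_six)  # about 0.491: slightly unfavorable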

    Egon Pearson (1895–1980) emphasized that big advances in statistics were nearly always made by the combination of a real problem to be solved and people with the imagination and technical ability to solve it (Pearson, 1973). For example, about 100 years after Pascal, mathematicians were concerned about the accuracy with which astronomical observations could be made. At this time Newton's theories were well known, and astronomers were making accurate measurements of the heavenly bodies, not just out of curiosity but because navigation, commerce, and military actions depended critically on accurate knowledge of time and position, including the errors of making these measurements (Stigler, 1986). Many of the developments were made in response to specific practical questions about the orbits of celestial objects or the best estimates of length and weight. In the first half of the eighteenth century, mathematicians began to investigate the theory of errors (or variability); major contributions came later from Gauss (1777–1855), and this phase culminated when Legendre (1752–1833) introduced the method of least squares in 1805.

    The introduction of statistical methods into the nonphysical sciences came relatively late, and began with data collection and analyses, many performed by those we now term epidemiologists. Adolphe Quetelet (1796–1874), an astronomer, studied populations (births, deaths, and crime rates); although he did not invent the method of least squares for assessing errors (variability), he used it in his work. In 1852, William Farr (1807–1883) studied fatalities during the cholera epidemic of 1848/1849 and demonstrated that the fatality rate was inversely related to elevation above sea level; his revealing figure of fatality rate plotted against elevation showed a conical shape with a wide base and a narrow top. Florence Nightingale (1820–1910) analyzed deaths in the Crimean campaign, and in 1858 published what might have been the first modified pie chart (called a coxcomb chart) to demonstrate that most deaths were due to preventable diseases rather than to battle injuries (Joyce, 2008). Because of her revolutionary work, in 1858 she became the first woman to be elected a Fellow of the Statistical Society of London.

    Gustav Theodor Fechner (1801–1887) was apparently the first to use statistical methods in experimental biology, published in his book on experimental psychology (Elemente der Psychophysik) in 1860. Then Francis Galton (1822–1911) began his famous studies on heredity with the publication in 1869 of his book Hereditary Genius. He not only analyzed people in innumerable ways, but also did experiments on plants. He appears to have been the first person to take an interest in variability, his predecessors being interested mainly in mean values (Pearson, 1973). In 1889 he published a book, Natural Inheritance, which influenced mathematicians such as Francis Ysidro Edgeworth (1845–1926) and Karl Pearson (1857–1936) to develop better methods of dealing with the variable data and peculiar distributions so often found in biology.

    A notable advance came when William Sealy Gosset (1876–1937) published his article The Probable Error of a Mean (Student, 1908). Gosset, who had studied mathematics and chemistry at Oxford, was employed by the Guinness brewery to analyze data about the brewing of beer. Up to that time statistical tests typically dealt with large data sets, often with over 1000 measurements, and Gosset realized that tests were needed for the more practical problems that involve small numbers of measurements (Boland, 1984). He worked on this in 1906 while on sabbatical leave, during which he was closely associated with Karl Pearson. His publication in 1908 of what came to be called the t-test was a landmark. Because at that time the Guinness Company did not allow its employees to publish the results of any studies (for fear of leaking industrial secrets), Gosset published his study under the pseudonym Student to avoid association with his employer.

    One final root of modern statistics deserves special mention. In agricultural experiments efficient experimental design is particularly important. Most crops have one growing season per year, and their growth can be influenced by minor variations in the composition, environment, and drainage of the soil in which they grow. Experiments comparing different cultivars or fertilizers therefore have to be designed carefully so that the greatest amount of information can be extracted from the results; it would be inefficient to test one fertilizer one year, another the next year, and so on. It was this impetus that led to the extensive development of statistical design and analysis in agriculture. As early as 1771 Arthur Young (1741–1820) published his Course of Experimental Agriculture, which laid out a very modern approach to experiments (Young, 1771), and in 1849 James Johnson published his book Experimental Agriculture to emphasize the importance of experimental design (Owen, 1976). In 1843 John Lawes, an entrepreneur and scientist, founded an agricultural research institute at his estate, Rothamsted Manor, to investigate the effects of fertilizers on crop yields.
