tall structures are all designed-built-tested in a computer before they are actually
built. Statistical tools are central to the design and operation of a simulation system.
Many engineering services are designed as optimized systems. Take the example
of a gasoline pumping station. How many pumps should the station have? If you
have too few pumps, your customers have to wait, which gives you a bad
reputation. If you have too many pumps, the station is not cost-effective,
as some pumps would lie idle for much of the day. This is a problem in
optimization.
Formal definitions
Example 1.4
Suppose you want to find the average age of the people living in Dhaka
city. All the people living in Dhaka city form the population for this study.
Definition: Population is the set of all the individuals, items, or data for which a
statistical study is conducted.
Statistical methods are particularly useful for studying, analyzing, and learning
about populations. A population does not have to be large. If you are
interested in finding the average grade of a class, the population size is 30 to 50.
On the other hand, if you are interested in the average resistance of the resistors
coming off a production line, the population is effectively infinite.
A statistical study does not have to be limited to a single variable. For
example, in the problem discussed in Example 1.4, besides the age of each
individual, other quantities such as locality and income may also be
of interest.
If the population is large, it may not be practical to
investigate every element of the population. In that case, a small sample is taken
from the population; the statistical study is conducted on the sample; and the
results obtained from the sample are extrapolated to the population. (This is called
statistical inference, a topic that we will discuss in detail later in the semester.)
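The sample-then-extrapolate idea can be sketched in a few lines of Python. The population here is synthetic (ages drawn at random with a fixed seed), which is purely an assumption for illustration and not real data about Dhaka; the names `population` and `sample` and the sizes used are hypothetical.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 synthetic ages between 0 and 90 (not real data).
population = [rng.randint(0, 90) for _ in range(100_000)]

# Take a simple random sample of 500 elements without replacement.
sample = rng.sample(population, 500)

population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # estimate computed from the sample

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The sample mean is used as an estimate of the population mean; how far the estimate can be trusted is the subject of statistical inference.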
Statistics has two major branches: descriptive statistics and inferential statistics.
Descriptive statistics describe the basic features of the data gathered
in a study. They provide simple summaries
of the population or sample and of the measurements. Together with simple graphical
analysis, they form the basis of virtually every quantitative analysis of data.
estimation, hypothesis testing, and prediction. We will discuss all these tools later
in the semester.
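As a small illustration of descriptive statistics, the sketch below summarizes a made-up set of class grades using Python's standard `statistics` module. The list `grades` is hypothetical data chosen only to show the kind of summary involved.

```python
import statistics

# Hypothetical sample: grades of ten students (made-up numbers).
grades = [62, 71, 71, 75, 78, 80, 84, 85, 90, 94]

summary = {
    "n": len(grades),
    "mean": statistics.mean(grades),
    "median": statistics.median(grades),
    "stdev": statistics.stdev(grades),  # sample standard deviation
    "min": min(grades),
    "max": max(grades),
}

for name, value in summary.items():
    print(name, value)
```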
Exercises
3. A Gallup youth poll was conducted to determine the topics that teenagers
most want to discuss with their parents. The findings show that 46% would
like more discussion about the family’s financial situation, 37% would like to
talk about school, and 30% would like to talk about their religion. The survey
was based upon a national sample of 505 teenagers.
(a) Describe the population of interest.
(b) Identify the variable of interest.
(c) Describe the sample.
(d) Give an example of an inference that the pollsters might make.
vessels in the eye, was tried on 224 patients. Of those, only 14% went blind
in one year. In a control group of similar untreated patients, 42% went blind
in one year. Therefore, to determine whether the laser treatment was
effective, research physicians wished to compare the proportions of patients
going blind in one year for two different populations.
(a) Describe the population of interest.
(b) Identify the variable of interest.
(c) Describe the sample.
(d) Give an example of an inference that the research physicians might make.
Types of data
For the sake of analysis, statistical data are categorized in many ways. At the top
level, data are categorized as qualitative data and quantitative data.
Qualitative data may be further classified as nominal data and ordinal data.
Nominal data are the weakest level of measurement. These data are normally verbal
in nature, like religious affiliation, gender, brands of cars, etc.
Ordinal data may be verbal or numerical, and have some hierarchy. But the
hierarchy is qualitative in nature. For example, we know that the class
standings of students are: freshman, sophomore, junior, and senior. We know that
a senior is senior to a junior; a junior is senior to a sophomore, and so on.
Even if the data are represented numerically, the numbers serve as
labels only; they have no numerical value. For example, you may be
asked to rate the service of a cell phone company from 1 to 10. If you rate the
company 4 and your friend rates the company 8, these ratings are qualitative in
nature. In other words, the rating ‘8’ cannot be considered twice the rating ‘4’.
We can at best say that ‘8’ > ‘4’; but by how much, we cannot measure. The
bottom line is that any arithmetic operation on ‘8’ and ‘4’ (addition,
subtraction, multiplication, division) would be meaningless.
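The class-standing example can be sketched in Python to show which operations make sense on ordinal data. The helper names `rank` and `is_senior_to` are hypothetical, introduced only for this illustration.

```python
# Class standings listed from lowest to highest (taken from the text above).
STANDINGS = ["freshman", "sophomore", "junior", "senior"]


def rank(standing):
    """Position of a standing in the hierarchy; only the order is meaningful."""
    return STANDINGS.index(standing)


def is_senior_to(a, b):
    """True if standing a ranks above standing b."""
    return rank(a) > rank(b)


print(is_senior_to("senior", "junior"))
print(is_senior_to("sophomore", "junior"))

# Sorting is meaningful for ordinal data because it uses only the order.
students = ["junior", "freshman", "senior", "sophomore"]
print(sorted(students, key=rank))
```

Comparison and sorting use only the order of the categories. Dividing or subtracting the rank numbers would run without error, but the result would not measure "how much more senior" one standing is than another.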
Quantitative data may be further classified as interval data and ratio data.
Interval data are on a scale of measurement where the distance between any two
adjacent units of measurement (or 'intervals') is the same but the zero point is
arbitrary. Scores on an interval scale can be added and subtracted but cannot be
meaningfully multiplied or divided.
Ratio data are at the highest level of measurement. It is possible to perform all
arithmetic operations on ratio data.
The subtle difference between interval data and ratio data is the location and
handling of zero. In the case of interval data, the location of 0 is not important at all.
The analysis is done with respect to an arbitrary zero. For example, suppose you are
testing a large number of resistors whose value is supposed to be 5.5 ohms,
and the results range from 5.2 to 5.7 ohms. You may not be interested in the
actual value of each resistor, but rather in its difference from 5.5 ohms. In
this case the zero would be set at 5.5. Calendar years are another example: the time interval
between the starts of 1981 and 1982 is the same as that between the starts of 1983 and
1984, namely 365 days. The zero point, year 1 AD, is arbitrary; time did not begin
then (there was no 0 AD, as 0 had not been invented by then!). Other examples of
interval scales include the heights of tides and the measurement of longitude; in both, 0
is set arbitrarily and has no physical meaning.
On the other hand, 0 in ratio data has a physical meaning; in other words, it is
interpretable. Weight is an example: a weight of 0 means there is no weight at all.
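A short Python sketch can make the role of zero concrete. It uses the calendar-year example for interval data and weights for ratio data; the origin shift of 622 years and the two weights are hypothetical values chosen only for illustration.

```python
# Interval data: calendar years. The zero point of the calendar is arbitrary.
years = [1981, 1982, 1983, 1984]
shift = 622  # hypothetical change of origin, just to move the zero point

shifted = [y - shift for y in years]

# Differences survive a change of origin...
diff_original = years[1] - years[0]
diff_shifted = shifted[1] - shifted[0]
print(diff_original, diff_shifted)

# ...but ratios do not, so ratios carry no meaning on an interval scale.
ratio_original = years[3] / years[0]
ratio_shifted = shifted[3] / shifted[0]
print(ratio_original == ratio_shifted)

# Ratio data: weights in kilograms. Zero means "no weight", so ratios are meaningful.
weight_a, weight_b = 40.0, 80.0
times_heavier = weight_b / weight_a
print(times_heavier)
```

Moving the origin leaves differences unchanged but changes ratios, which is why only addition and subtraction are meaningful for interval data, while ratio data support all arithmetic operations.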
Discrete data are data whose values / observations are distinct and separate,
i.e. they can be counted (1, 2, 3, ...). Examples might include the number of
transistors on a VLSI circuit; the number of patients in a doctor’s surgery; the
number of flaws in one meter of cloth.
Continuous data are data whose values / observations can take any value within a finite
or infinite interval. Continuous data are measured rather than counted. Examples
include height, weight, temperature, the amount of sugar in an orange, and the time
required to run a mile.
Exercises
20 batches of green tomatoes were divided into two groups; one group was
precooled with the new method, and the other with conventional method. The
water flow (in gallons) required to effectively cool each batch was recorded.
(a) Identify the population, samples, and the type of statistical inference to
be made for this problem.
(b) Propose how the sample data could be used to compare the
cooling effectiveness of the two systems.
10. Environmental engineers are studying the patterns of extinction in the New
Zealand bird population. The following characteristics were determined for
each bird species that inhabited New Zealand at the time of Maori colonization. Classify
each variable.
(a) Flight capability (volant or flightless)
(b) Habitat type (aquatic, ground terrestrial, aerial terrestrial)
(c) Nesting site (ground, cavity within ground, tree, cavity above ground)
(d) Nest density (birds per nest)
(e) Diet (fish, vertebrates, vegetables, invertebrates)
(f) Body mass (grams)
(g) Egg length (millimeters)
(h) Extinct status (extinct, absent from island, present)
11. A new type of computed axial tomography (CAT) scanning method has been
developed to identify lung cancer. Medical physicists believe that CAT
scanning is more sensitive than regular X-rays in pin-pointing small tumors.
To test this hypothesis, a clinical trial of 50,000 nationwide smokers was
conducted to compare the effectiveness of CAT scanning and X-rays for
detecting lung cancer. Each participant smoker is randomly assigned to one
of two screening methods, CAT scanning or chest X-ray and their progress
tracked over time. In addition to the type of screening method used, the
physicists recorded the age at which the scanning method first detects a
tumor for each smoker.
(a) Identify the two variables measured for each smoker.
(b) Classify each variable by type.
(c) What is the inference that will ultimately be drawn from the clinical
result?
12. All highway bridges in the United States are inspected periodically for
structural deficiency by the Federal Highway Administration. Data from the
administration are compiled into the National Bridge Inventory (NBI).
Several of the nearly 100 variables maintained by the NBI are listed below.
Classify each variable.
(a) Length of the maximum span (feet)
(b) Number of vehicle lanes
(c) Toll bridge (yes or no)
(d) Average daily traffic
(e) Condition of deck (good, fair, or poor)
(f) Bypass or detour length (miles)
(g) Route type (interstate, US, state, county, or city)
Sets are used to group objects together. Often, the objects in a set have similar
properties. For example, all the students of this class form a set. There are
several ways to describe a set. A common way is to list the elements within curly
brackets, as in {a, b, c, d}.
Example 1.5
The set V of all vowels in the English language can be written as
V = {a, e, i, o, u}.
Example 1.6
The set O of odd positive integers less than 10 can be expressed as
O = {1, 3, 5, 7, 9}.
Two sets are equal if and only if they have the same elements. The order and
repetition of elements do not matter. Therefore, the sets
{1, 3, 5}, {5, 3, 1}, and {1, 1, 3, 3, 5, 5} are equal.
Another way to describe a set is by using set-builder notation. The set O above can be
stated as
O = {x | x is an odd positive integer less than 10}
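Python's built-in `set` type mirrors these ideas closely, so the examples above can be tried directly. This is only an illustrative sketch; the name `O_built` is hypothetical.

```python
# Listing the elements explicitly (Examples 1.5 and 1.6).
V = {"a", "e", "i", "o", "u"}
O = {1, 3, 5, 7, 9}

# Order and repetition do not matter: these are all the same set.
print({1, 3, 5} == {5, 3, 1} == {1, 1, 3, 3, 5, 5})

# Set-builder style: {x | x is an odd positive integer less than 10}.
O_built = {x for x in range(1, 10) if x % 2 == 1}
print(O_built == O)
```

The set comprehension plays the role of the set-builder description: it states a condition that each element must satisfy instead of listing the elements.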
A null set, denoted by ∅, is a set with no elements. It is also called an empty set.
A universal set U of a set A is defined as the superset from which the set A has been
drawn. For the set V of vowels above, for example, the universal set U can be taken to
be the set of all letters of the English alphabet.
Sets can also be represented using a Venn diagram. The universal set is normally
denoted by a rectangle, and other sets are drawn within the rectangle as shown
in figure 1.
Definition: Let A and B be two sets. The union of the sets A and B, denoted by
A ∪ B, is defined as the set that contains all the elements that are in A or in
B (or in both).
Symbolically,
A ∪ B = {x | x ∈ A ∨ x ∈ B}
[Figure 3. Venn diagram]
The Venn diagram for union is shown in figure 2.
Example 1.7
Let A = {1, 3, 4, 6} and B = {3, 5, 6, 7}. A ∪ B = {1, 3, 4, 5, 6, 7}.
Definition: Let A and B be two sets. The intersection of the sets A and B, denoted
by A ∩ B, is defined as the set that contains all the elements that are in both A
and B.
Symbolically,
A ∩ B = {x | x ∈ A ∧ x ∈ B}
[Figure: Venn diagram of A ∪ B within the universal set U]
The Venn diagram for intersection is shown in figure 3.
Example 1.8
Let A = {1, 3, 4, 6} and B = {3, 5, 6, 7}. A ∩ B = {3, 6}.
Definition: Two sets A and B are disjoint if their intersection is an empty set. In
other words A ∩ B = ∅.
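Examples 1.7 and 1.8 can be reproduced with Python sets, both with the built-in operators and directly from the symbolic definitions. The universal set `U` and the set `C` below are hypothetical choices made only for this sketch.

```python
A = {1, 3, 4, 6}
B = {3, 5, 6, 7}

# Built-in operators: | is union, & is intersection.
union = A | B
intersection = A & B
print(sorted(union))         # [1, 3, 4, 5, 6, 7]
print(sorted(intersection))  # [3, 6]

# The same sets written out from the symbolic definitions,
# using a hypothetical universal set U = {1, ..., 9}.
U = set(range(1, 10))
union_def = {x for x in U if x in A or x in B}
intersection_def = {x for x in U if x in A and x in B}
print(union_def == union, intersection_def == intersection)

# Disjoint sets have an empty intersection.
C = {2, 8}  # hypothetical set sharing no element with A
print(A.isdisjoint(B))  # False, because A and B share 3 and 6
print(A.isdisjoint(C))  # True
```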
[Figure: Venn diagram of A ∩ B within the universal set U]
[Figure: Venn diagram of A − B within the universal set U]
Identity                                                      Name
$\overline{\overline{A}} = A$                                 Complementation law
$\overline{A \cap B} = \overline{A} \cup \overline{B}$        De Morgan's law
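Reading the two identities above as the complementation law (the complement of the complement of A is A) and De Morgan's law for an intersection, they can be checked numerically on a small case. This is a sketch on one example, not a proof; the universal set `U` and the helper `complement` are assumptions made for the illustration.

```python
U = set(range(1, 11))  # hypothetical universal set {1, ..., 10}
A = {1, 3, 4, 6}
B = {3, 5, 6, 7}


def complement(S):
    """Elements of the universal set U that are not in S."""
    return U - S


# Complementation law: the complement of the complement of A is A.
print(complement(complement(A)) == A)

# De Morgan's law: complement of (A ∩ B) equals complement(A) ∪ complement(B).
print(complement(A & B) == complement(A) | complement(B))
```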
Permutation and combination are counting techniques. Both are used when you have to
choose r items from a total of n items. If the order of the chosen items is important,
the total number of possibilities is called the number of permutations, and is given by
$^nP_r = \frac{n!}{(n - r)!}$
If the order of the chosen items is not important, the total
number of possibilities is called the number of combinations, and is given by
$^nC_r = \frac{n!}{(n - r)!\, r!}$
Besides nCr, another way to denote the combination is $\binom{n}{r}$.
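The two counting formulas can be checked in Python against a direct enumeration. The values n = 5 and r = 3 and the item labels are hypothetical, picked only to keep the enumeration small; `math.perm` and `math.comb` require Python 3.8 or later.

```python
import math
from itertools import combinations, permutations

n, r = 5, 3      # hypothetical values for illustration
items = "ABCDE"  # n distinct items

# Counts from the factorial formulas.
n_perm = math.factorial(n) // math.factorial(n - r)
n_comb = math.factorial(n) // (math.factorial(n - r) * math.factorial(r))
print(n_perm, n_comb)  # 60 10

# The standard library provides the same counts directly.
print(math.perm(n, r), math.comb(n, r))

# Direct enumeration: ordered selections versus unordered selections.
print(len(list(permutations(items, r))), len(list(combinations(items, r))))
```

Each unordered selection of r items can be arranged in r! different orders, which is why the number of permutations is r! times the number of combinations (here 60 = 6 × 10).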