
6 Preliminary Concepts

tall structures are all designed-built-tested in a computer before they are actually
built. Statistical tools are central in design and operation of a simulation system.

Many engineering services are designed as optimized systems. Take the example
of a gasoline station. How many pumps should there be? If you have too few
pumps, your customers have to wait, which gives you a bad reputation. If you
have too many pumps, the station is not cost effective, as some pumps lie idle
for much of the day. This is a problem in optimization.

A last example of the application of statistics in engineering is in the field
of speech processing. Speech processing is appearing in more and more fields.
Some examples of its application are consumer commerce, access security, aids
for people with disabilities, and voice command of machinery. To see why
statistics is needed in this field, a wave-form of the audio of ‘hello’ is shown
in figure 1.2. The figure shows that, even though all three recordings are of
the same person speaking at different times, the wave-forms are quite
different. The variation is larger still across different individuals, and
larger again when differences due to gender are added. A machine that has to
recognize each of these recordings as ‘hello’ therefore has to rely on a
statistical model.

These are just some of the applications of statistics in engineering. As we
move along in the course, we will encounter more and more examples of
statistics in engineering.

Formal definitions

We are now in a better position to give a formal definition of statistics.

Definition: Statistics may be defined as a branch of applied mathematics
concerned with the collection and interpretation of quantitative data and the use
of probability theory to estimate the behavior of physical systems.

Before we elaborate further on this definition, we need to explain the concepts
of variable, population, and sample. We introduce these concepts through an
example.

Example 1.4
Suppose you are interested in finding the average age of the people living in
Dhaka city. All the people living in Dhaka city constitute the population for
this study.

Lecture Notes on Probability and Statistics, Independent University, Bangladesh

Definition: Population is the set of all the individuals, items, or data for which a
statistical study is conducted.

Statistical methods are particularly useful for studying, analyzing, and learning
about populations. A population does not have to be large. If you are
interested in finding the average grade of a class, the population size may be
only 30 to 50. On the other hand, if you are interested in the average
resistance of the resistors produced on a production line, the population is
effectively infinite.

In studying a population, we focus on one or more characteristics or properties of
the population. For the problem in Example 1.4, we may be interested in the age,
income, and education of people living in a particular locality of Dhaka city.
Such characteristics of a population are called variables.

Definition: A variable is a measurable factor, characteristic, or attribute of an
individual or a population.

A statistical study need not be limited to a single variable. In the problem
discussed in this example, besides the age of each individual, other quantities
such as locality and income may also be of interest.

For statistical studies, if the population size is large, it may not be practical
to investigate all the elements of the population. In that case, a small sample
is taken from the population; the statistical study is conducted on the sample;
and the results obtained from the sample are extrapolated to the population.
(This is called statistical inference, a topic that we will discuss in detail
later in the semester.)

Definition: A sample is a subset of the population.
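
The relationship between a population and a sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ages are simulated with a random number generator (they are not real data for Dhaka or any other city), and the population size of 100,000 and sample size of 500 are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 simulated ages (illustrative only, not real data).
population = [random.randint(0, 90) for _ in range(100_000)]

# A sample is a subset of the population: 500 individuals drawn without replacement.
sample = random.sample(population, 500)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean age: {population_mean:.2f}")
print(f"sample mean age:     {sample_mean:.2f}")
```

The sample mean is computed from only 500 values, yet it is typically close to the population mean; making that statement precise is the job of statistical inference.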

Statistics has two major branches: descriptive statistics and inferential statistics.

Descriptive statistics are used to describe the basic features of the data gathered
from an experimental study in various ways. They provide simple summaries of
the sample (or population) and of the measurements. Together with simple
graphical analysis, they form the basis of virtually every quantitative analysis
of data.

As we just discussed, in the case of a large population, the study is conducted
on a sample. The tools used to extrapolate the results of the sample to the
population are called inferential statistics. The tools included here are point
estimation, interval estimation, hypothesis testing, and prediction. We will
discuss all these tools later in the semester.

Exercises

1. A manufacturer of vacuum cleaners has determined that an assembly line is
operating satisfactorily if less than 2% of the cleaners produced per day are
defective. If 2% or more of the cleaners are defective, the line must be shut
down and proper adjustments have to be made. To check every cleaner as it
comes off the line would be costly and time-consuming. The manufacturer
decides to choose 30 cleaners at random from a specific day’s production and
test them for defects.
(a) Describe the population of interest.
(b) Identify the variable of interest.
(c) Describe the sample.
(d) Give an example of inference that the manufacturer might make.

2. An insurance company would like to determine the proportion of all medical
doctors who have been involved in one or more malpractice lawsuits. The
company selects 500 doctors at random from a professional directory and
determines the number in the sample who have ever been involved in a
malpractice lawsuit.
(a) Describe the population of interest.
(b) Identify the variable of interest.
(c) Describe the sample.
(d) Give an example of inference that the company might make.

3. A Gallup youth poll was conducted to determine the topics that teenagers
most want to discuss with their parents. The findings show that 46% would
like more discussion about the family’s financial situation, 37% would like to
talk about school, and 30% would like to talk about their religion. The survey
was based upon a national sample of 505 teenagers.
(a) Describe the population of interest.
(b) Identify the variable of interest.
(c) Describe the sample.
(d) Give an example of inference that might be made.

4. A medical report describes a new method of treating a major form of
blindness in elderly people. The process, using lasers to seal abnormal blood
vessels in the eye, was tried on 224 patients. Of those, only 14% went blind
in one year. In a control group of similar untreated patients, 42% went blind
in one year. Therefore, to determine whether the laser treatment was
effective, research physicians wished to compare the proportions of patients
going blind in one year for two different populations.
(a) Describe the population of interest.
(b) Identify the variable of interest.
(c) Describe the sample.
(d) Give an example of inference that might be made.

Types of data

For the sake of analysis, statistical data are categorized in many ways. At the
top level of classification, data are categorized as qualitative data and
quantitative data.

Definition: Qualitative data is data describing the attributes or properties that an
object possesses. The properties are categorized into classes that may be assigned
numeric values. However, there is no significance to the data values themselves;
they simply represent attributes of the object concerned.

Qualitative data may further be classified as nominal data and ordinal data.

Nominal data are the weakest data measurements. These data are normally verbal
in nature, like religious affiliation, gender, brands of cars, etc.

Ordinal data may be verbal or numerical, and have some hierarchy, but the
hierarchy is qualitative in nature. For example, we know that the class
standings of students are freshman, sophomore, junior, and senior. We know that
a senior ranks above a junior, a junior ranks above a sophomore, and so on.
Even if the data are represented numerically, the numbers serve as
classification only; they have no numerical value. For example, you may be
asked to rate the service of a cell phone company from 1 to 10. If you rate the
company 4 and your friend rates the company 8, these ratings are qualitative in
nature. In other words, the rating ‘8’ cannot be considered as twice the rating
‘4’. We can at best say that ‘8’ > ‘4’; by how much, we cannot measure. The
bottom line is that any arithmetic operation on ‘8’ and ‘4’ would be
meaningless.


Definition: Quantitative data is data expressing a certain quantity, amount or
range. Usually, there are measurement units associated with the data, e.g. meters
in the case of the height of a person. It makes sense to set boundary limits to such
data, and it is also meaningful to apply arithmetic operations to the data.

Quantitative data may be further classified as interval data and ratio data.

Interval data are on a scale of measurement where the distance between any two
adjacent units of measurement (or 'intervals') is the same but the zero point is
arbitrary. Scores on an interval scale can be added and subtracted but cannot be
meaningfully multiplied or divided.

Ratio data are at the highest level of measurement. It is possible to perform all
arithmetic operations on ratio data.

The subtle difference between interval data and ratio data is the location and
handling of zero. For interval data, the location of 0 is not important at all;
the analysis is done with respect to an arbitrary zero. For example, suppose you
are testing a large number of resistors whose value is supposed to be 5.5 ohms,
and the results range from 5.2 to 5.7 ohms. You may not be interested in the
actual value of each resistor, but rather in its difference from 5.5 ohms. In
this case the zero would be set at 5.5. Calendar years are another example: the
time interval between the starts of years 1981 and 1982 is the same as that
between 1983 and 1984, namely 365 days. The zero point, year 1 AD, is arbitrary;
time did not begin then (there was no 0 AD, as 0 had not been invented by then!).
Other examples of interval scales include the heights of tides and the
measurement of longitude, where 0 is set arbitrarily and has no physical meaning.

On the other hand, the 0 in ratio data has a physical, interpretable meaning.
For example, a weight of 0 means the absence of weight, so a weight of 10 kg is
twice a weight of 5 kg.
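
The resistor example can be made concrete with a small Python sketch. The three readings below are hypothetical values chosen inside the 5.2 to 5.7 ohm range mentioned in the text; they are not measured data. The sketch shows that moving the zero to 5.5 ohms leaves differences unchanged but changes ratios, which is why ratios carry no meaning on an interval scale.

```python
import math

# Hypothetical resistor readings in ohms (illustrative values only).
readings = [5.2, 5.4, 5.7]
nominal = 5.5  # the arbitrary zero used in the text

# Interval-scale view: each reading expressed as a deviation from 5.5 ohms.
deviations = [r - nominal for r in readings]

# Differences are the same whichever zero is used ...
diff_raw = readings[2] - readings[0]
diff_dev = deviations[2] - deviations[0]
print(math.isclose(diff_raw, diff_dev))  # True

# ... but ratios depend on where the zero is placed.
ratio_raw = readings[2] / readings[0]
ratio_dev = deviations[2] / deviations[0]
print(round(ratio_raw, 3), round(ratio_dev, 3))
```

The raw resistances form ratio data, since 0 ohms has a physical meaning, while the deviations from 5.5 ohms form interval data.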

Another very important way to classify quantitative data is as discrete or
continuous data.

Discrete data are data whose values or observations are distinct and separate,
i.e. they can be counted (1, 2, 3, ...). Examples might include the number of
transistors on a VLSI circuit; the number of patients in a doctor’s surgery; the
number of flaws in one meter of cloth.

Continuous data are data whose values or observations can take any value within
a finite or infinite interval. You can count, order and measure continuous data.
Examples include height, weight, temperature, the amount of sugar in an orange,
and the time required to run a mile.

It is very important to determine whether data are discrete or continuous,
because different sets of statistical tools are needed to handle these two kinds
of data.

Exercises

5. Pesticides applied to an extensively grown crop can result in inadvertent air
contamination. Environmental agencies are investigating thion residues of the
insecticide chlorpyrifos used in dormant orchards in a particular location.
Ambient air specimens were collected daily at an orchard site during an
intensive period of spraying (a total of 13 days) and the thion level (ng/m³)
was measured each day.
(a) Identify the population of interest to the researchers.
(b) Identify the sample.

6. A team of civil and environmental engineers studied the ground motion
characteristics of 15 earthquakes that occurred around the world between
1940 and 1995. Three (of many) variables measured on each earthquake were
the type of ground motion (short, long, or forward directive), earthquake
magnitude on the Richter scale, and peak ground acceleration in feet per
second squared. One of the goals of the study was to estimate the inelastic
spectra of any ground motion cycle. Do the data for the 15 earthquakes
represent a population or a sample?

7. Electrical engineers recognize that high neutral current in computer power
systems is a potential problem. To determine the extent of the problem, a
survey of the computer power system load currents at 146 US sites was
taken. The survey revealed that less than 10% of the sites had high neutral to
full-load current ratios.
(a) Identify the population of interest.
(b) Identify the sample.

8. Researchers have developed a new precooling method for preparing
vegetables for market. The system employs an air and water mixture
designed to yield effective cooling with a much lower water flow than
conventional hydrocooling. To compare the effectiveness of the two systems,
20 batches of green tomatoes were divided into two groups; one group was
precooled with the new method, and the other with the conventional method. The
water flow (in gallons) required to effectively cool each batch was recorded.
(a) Identify the population, samples, and the type of statistical inference to
be made for this problem.
(b) Propose a way in which the sample data could be used to compare the
cooling effectiveness of the two systems.

9. A study reports on the effects of a tropical cyclone on the quality of drinking
water on remote islands. Water samples (size 500 ml) were collected
approximately 4 weeks after a cyclone hit the island. The following variables
were recorded for each water sample. Classify each of the data.
(a) Villages where samples were recorded
(b) Type of water supply (river intake, stream, deep tubewell)
(c) Acidity (pH level, scale 1 to 14)
(d) Turbidity level (nephelometric turbidity units, NTU)
(e) Temperature (degree Celsius)
(f) Number of fecal coliforms per 100 milliliters
(g) Free chlorine-residual (milligram per liter)
(h) Presence of hydrogen sulphide (yes or no)

10. Environmental engineers are studying the patterns of extinction in the New
Zealand bird population. The following characteristics were determined for
each bird species that inhabited New Zealand at the time of Maori
colonization. Classify each variable.
(a) Flight capability (volant or flightless)
(b) Habitat type (aquatic, ground terrestrial, aerial terrestrial)
(c) Nesting site (ground, cavity within ground, tree, cavity above ground)
(d) Nest density (birds per nest)
(e) Diet (fish, vertebrates, vegetables, invertebrates)
(f) Body mass (grams)
(g) Egg length (millimeters)
(h) Extinct status (extinct, absent from island, present)

11. A new type of computed axial tomography (CAT) scanning method has been
developed to identify lung cancer. Medical physicists believe that CAT
scanning is more sensitive than regular X-rays in pin-pointing small tumors.
To test this hypothesis, a nationwide clinical trial of 50,000 smokers was
conducted to compare the effectiveness of CAT scanning and X-rays for
detecting lung cancer. Each participating smoker is randomly assigned to one
of two screening methods, CAT scanning or chest X-ray, and their progress is
tracked over time. In addition to the type of screening method used, the
physicists recorded the age at which the scanning method first detects a
tumor for each smoker.
(a) Identify the two variables measured for each smoker.
(b) Classify the type of each variable.
(c) What is the inference that will ultimately be drawn from the clinical
trial?

12. All highway bridges in the United States are inspected periodically for
structural deficiency by the Federal Highway Administration. Data from the
administration are compiled into the National Bridge Inventory (NBI).
Several of the nearly 100 variables maintained by the NBI are listed below.
Classify each variable.
(a) Length of the maximum span (feet)
(b) Number of vehicle lanes
(c) Toll bridge (yes or no)
(d) Average daily traffic
(e) Condition of deck (good, fair, or poor)
(f) Bypass or detour length (miles)
(g) Route type (interstate, US, state, county, or city)

Review of mathematics – Set Theory

Sets are used to group objects together. Often, the objects in a set have similar
properties. For example, all the students of this class form a set. There are
several ways to describe a set. A common way is to list the elements within
curly brackets, as in {a, b, c, d}.

Example 1.5
The set V of all vowels in the English language can be written as
V = {a, e, i, o, u}.

Example 1.6
The set O of odd positive integers less than 10 can be expressed as
O = {1, 3, 5, 7, 9}.

Two sets are equal if and only if they have the same elements; the order and
repetition of elements do not matter. Therefore, the sets {1, 3, 5}, {5, 3, 1},
and {1, 1, 3, 3, 5, 5} are equal.
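
Python's built-in set type follows the same rule, so the three sets above can be compared directly. This is a minimal sketch; the variable names a, b, and c are chosen here for illustration.

```python
a = {1, 3, 5}
b = {5, 3, 1}            # same elements, different order
c = {1, 1, 3, 3, 5, 5}   # repeated elements collapse when the set is built

print(a == b == c)  # True
print(len(c))       # 3
```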


Another way to describe a set is by using set-builder notation. The set O above
can be stated as
O = {x | x is an odd positive integer less than 10}
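
Set-builder notation has a close analogue in Python's set comprehensions. The sketch below builds the set O of Example 1.6 from its defining property and compares it with the listed form; the names O_listed and O_built are illustrative.

```python
# Listed form, as in Example 1.6.
O_listed = {1, 3, 5, 7, 9}

# Set-builder form: x ranges over the positive integers less than 10,
# and only the odd ones are kept.
O_built = {x for x in range(1, 10) if x % 2 == 1}

print(O_built == O_listed)  # True
```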

Membership of an element in a set is denoted by the symbol ∈. For example,
3 ∈ O means that 3 is an element of the set O.

The cardinality of a set A, denoted as |A| or n(A), is the number of elements
in A.

A set B is a subset of a set A if every element of B is also an element of A.
Assuming A = {1, 2, 3, 4, 5, 6}, the set B = {3, 4, 5} is a subset of A.
Symbolically, this is written as B ⊂ A. It can also be said that A is a superset of B,
which is written symbolically as A ⊃ B. A set is always a subset of itself.
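
The subset and superset relations can be checked with Python's set comparison operators, using the sets A and B from the text. Note that Python distinguishes a subset (`<=`) from a proper subset (`<`), a distinction the notes do not make explicitly.

```python
A = {1, 2, 3, 4, 5, 6}
B = {3, 4, 5}

print(B <= A)         # True: every element of B is in A
print(B.issubset(A))  # True: method form of the same test
print(A >= B)         # True: A is a superset of B
print(A <= A)         # True: a set is always a subset of itself
print(A < A)          # False: but never a proper subset of itself
```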

A null set, denoted with ∅ is a set with no elements. It is also called an empty set.

A universal set U of a set A is defined as a superset from which the set A has
been derived. For example, for the set V of vowels above, the universal set U
can be taken to be the set of all letters of the English alphabet.

Sets can also be represented using Venn diagrams. The universal set is normally
denoted with a rectangle, and other sets are drawn within the rectangle as shown
in figure 3.

[Figure 3. Venn diagram of a set A inside the universal set U.]

Definition: Let A and B be two sets. The union of the sets A and B, denoted by
A ∪ B, is defined as the set that contains all the elements which are in A, in
B, or in both.

Symbolically,

A ∪ B = {x | x ∈ A ∨ x ∈ B}

The Venn diagram for union is shown in figure 4.

Example 1.7
Let A = {1, 3, 4, 6} and B = {3, 5, 6, 7}. A ∪ B = {1, 3, 4, 5, 6, 7}.

[Figure 4. Union of two sets: the region A ∪ B within the universal set U.]

Definition: Let A and B be two sets. The intersection of the sets A and B, denoted
by A ∩ B, is defined as the set that contains all the elements which are in both
A and B.

Symbolically,

A ∩ B = {x | x ∈ A ∧ x ∈ B}

The Venn diagram for intersection is shown in figure 5.

Example 1.8
Let A = {1, 3, 4, 6} and B = {3, 5, 6, 7}. A ∩ B = {3, 6}.

Definition: Two sets A and B are disjoint if their intersection is an empty set. In
other words A ∩ B = ∅.

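Looking at the Venn diagram for union (figure 4), the elements of A ∩ B are
counted once in n(A) and once again in n(B), so they must be subtracted once.
The cardinality of A ∪ B is therefore

n(A ∪ B) = n(A) + n(B) − n(A ∩ B) (1)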
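
Equation (1) can be checked numerically with the sets of Examples 1.7 and 1.8. The following is a minimal Python sketch; `|` and `&` are Python's built-in union and intersection operators for sets.

```python
A = {1, 3, 4, 6}
B = {3, 5, 6, 7}

union = A | B          # A ∪ B
intersection = A & B   # A ∩ B

print(sorted(union))         # [1, 3, 4, 5, 6, 7]
print(sorted(intersection))  # [3, 6]

# Equation (1): n(A ∪ B) = n(A) + n(B) − n(A ∩ B), here 6 = 4 + 4 − 2.
print(len(union) == len(A) + len(B) - len(intersection))  # True

# A and B share elements, so they are not disjoint.
print(A.isdisjoint(B))  # False
```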

[Figure 5. Intersection of two sets: the region A ∩ B within the universal set U.]


Definition: Let A and B be two sets. The difference of A and B, denoted by A − B,
is the set containing those elements that are in A but not in B. The difference
of A and B is also called the complement of B with respect to A.

This can be stated symbolically as

A − B = {x | x ∈ A ∧ x ∉ B}

The Venn diagram for the difference set is shown in figure 6. The cardinality of
A − B is obtained as

n(A − B) = n(A) − n(A ∩ B) (2)
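
Equation (2) can be checked in the same way. The sketch below reuses the sets A and B of Example 1.7 (the notes do not work this case themselves) and uses Python's `-` operator for set difference.

```python
A = {1, 3, 4, 6}
B = {3, 5, 6, 7}

difference = A - B  # elements in A but not in B
print(sorted(difference))  # [1, 4]

# Equation (2): n(A − B) = n(A) − n(A ∩ B), here 2 = 4 − 2.
print(len(difference) == len(A) - len(A & B))  # True

# The difference is not symmetric: B − A is a different set.
print(sorted(B - A))  # [5, 7]
```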

[Figure 6. Difference of two sets: the region A − B within the universal set U.]

Definition: For a set A, the complementary set A′ (also written Aᶜ or Ā) is
defined as the set of elements x such that x ∉ A ∧ x ∈ U.

Some important set identities are shown in the table below.

Identity                                 Name
A ∪ ∅ = A                                Identity laws
A ∩ U = A
A ∪ U = U                                Domination laws
A ∩ ∅ = ∅
A ∪ A = A                                Idempotent laws
A ∩ A = A
(A′)′ = A                                Complementation law
A ∪ B = B ∪ A                            Commutative laws
A ∩ B = B ∩ A
(A ∪ B) ∪ C = A ∪ (B ∪ C)                Associative laws
(A ∩ B) ∩ C = A ∩ (B ∩ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)          Distributive laws
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
(A ∪ B)′ = A′ ∩ B′                       DeMorgan’s laws
(A ∩ B)′ = A′ ∪ B′
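
Several of these identities can be spot-checked in Python. The sketch below is an illustration, not a proof: it tests the identities on one particular choice of sets. The universal set U = {1, ..., 10} and the set C are assumptions introduced here, and `complement` is a hypothetical helper that computes the complement with respect to U.

```python
U = set(range(1, 11))   # assumed universal set {1, ..., 10}
A = {1, 3, 4, 6}
B = {3, 5, 6, 7}
C = {2, 3, 7, 9}        # assumed third set, for the distributive laws


def complement(s):
    """Complement of s with respect to the universal set U."""
    return U - s


checks = {
    "identity": (A | set() == A) and (A & U == A),
    "domination": (A | U == U) and (A & set() == set()),
    "complementation": complement(complement(A)) == A,
    "distributive": (A & (B | C) == (A & B) | (A & C))
    and (A | (B & C) == (A | B) & (A | C)),
    "De Morgan (union)": complement(A | B) == complement(A) & complement(B),
    "De Morgan (intersection)": complement(A & B) == complement(A) | complement(B),
}

for name, holds in checks.items():
    print(f"{name}: {holds}")
```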

Review of mathematics – Permutation and Combination

These are counting techniques, used when you have to choose r items from a
total of n items. If the order of the chosen items is important, the total
number of possibilities is called permutation, and is denoted by

nPr = n! / (n − r)!

If the order of the chosen items is not important, the total number of
possibilities is called combination, and is denoted by

nCr = n! / ((n − r)! r!)

Besides nCr, another way to denote the combination is the binomial coefficient
symbol, written with n above r inside parentheses.
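
The two formulas can be evaluated and cross-checked in Python. The values n = 5 and r = 3 and the item labels "abcde" are arbitrary choices made for this sketch. `math.perm` and `math.comb` are standard library functions available from Python 3.8, and `itertools` is used to enumerate the arrangements explicitly.

```python
import math
from itertools import combinations, permutations

n, r = 5, 3  # illustrative values

# Direct evaluation of the formulas from the text.
n_P_r = math.factorial(n) // math.factorial(n - r)
n_C_r = math.factorial(n) // (math.factorial(n - r) * math.factorial(r))
print(n_P_r, n_C_r)  # 60 10

# The standard library provides the same quantities directly (Python 3.8+).
print(math.perm(n, r), math.comb(n, r))  # 60 10

# Enumerating the selections confirms the counts.
items = "abcde"
print(len(list(permutations(items, r))))  # 60 ordered selections
print(len(list(combinations(items, r))))  # 10 unordered selections
```

Each unordered selection of r items can be arranged in r! ways, which is why nPr = nCr × r! (here 60 = 10 × 6).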
