Chapter 12
Statistical Analysis
In this Chapter
Introduction
Univariate Statistics
Multivariate Statistics
Introduction
You can perform basic statistical analyses on any numeric data in
the workspace using the following two commands found in the
Analysis submenu on the Workspace Menu:
Exploration
Univariate Statistics
Most exploration analyses deal with more than a single variable.
However, it is important that you understand the behaviour of each
variable independently of the others before any multivariate data
analysis takes place. The univariate statistical functions in
Gemcom for Windows allow you to do this.
You can perform the following univariate statistical functions on
data in an extraction file:
Note that before you select this command, you will have to create
an extraction file. Input data to all of these functions is obtained
directly from the workspaces via this extraction file. This lets you
define, before the analysis, sets of selection criteria to extract the
data or subset of data that you require for the statistical analysis. It
also lets you use data sets created by other Gemcom systems such
as PC-MINE or ORE-CONTROL.
The univariate statistics function creates three output files that are
used by QuickGraf to create graphical representations of the data:
NORMAL.GRF.
LOG.GRF.
All the files are text files and are located in the GCDBaa\GRAPHS
subdirectory.
Output from univariate statistics can be directed to the screen,
printer or text files, as well as to QuickGraf for viewing and
printing histograms, frequency plots and probability plots.
Value Statistics
The Value Statistics area is the next area in the dialog box. This
area displays the following information taken from the extraction file:
Number of Values
Number of Values (<= 0)
Minimum Value
Maximum Value
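
The four values listed above can be reproduced outside the program with a few lines of Python. This is only a sketch: the function name, the dictionary keys and the sample grades are invented for illustration and are not part of Gemcom for Windows.

```python
def value_statistics(values):
    """Summarise extracted values in the same four terms as the Value Statistics area."""
    if not values:
        raise ValueError("no values were extracted")
    return {
        "number_of_values": len(values),
        # Values at or below zero matter later because their logs are undefined.
        "number_of_values_le_zero": sum(1 for v in values if v <= 0),
        "minimum_value": min(values),
        "maximum_value": max(values),
    }


# Made-up sample grades, used only to exercise the function.
grades = [0.0, 0.12, 0.85, 1.4, -0.05, 2.3]
summary = value_statistics(grades)
print(summary)
```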
Histogram Definition
The Histogram Definition area, in the lower right-hand corner of
the dialog box, is the fourth area. Data entered in this area is used
to create the frequency distribution analysis.
The frequency distribution analysis provides you with details of the
variations within the data set being analyzed. To calculate the
frequency distribution, you must define the upper and lower limits
of the data to be analyzed, and you must define a set of ranges of
values, known as class intervals.
When you define the frequency distribution analysis, you can
impose limits on the data set that you are analyzing. You do this by
applying a lower and upper bound value. Enter the following
parameters to define the histogram to be created.
Lower and Upper Bounds. Enter the lower bound of the first
class interval and the upper bound of the last class interval that
you want to be displayed in the frequency distribution analysis
and any histograms you might plot. The default values are the
same as the default upper and lower cut-off values in the Data
Selection Transformation area.
If the lower bound of the first class interval is greater than the
lower cut-off value of the data selection transformation limits,
all values falling between the lower cut-off value and the lower
bound of the class interval will be included in the first class
interval.
If the upper bound of the last class interval is less than the
upper cut-off value of the data selection transformation limits,
all values falling between the upper bound of the class interval
and the upper cut-off value will be included in the last class
interval.
If you do not want this to occur, make sure that the lower bound
of the first class interval is equal to the lower cut-off value, and
the upper bound of the last class interval is equal to the upper
cut-off value.
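
The way values between a cut-off and a histogram bound are folded into the first or last class interval can be sketched as follows. The sketch assumes equal-width class intervals, inclusive cut-offs, and intervals that include their lower edge; the manual does not state these details, and the function and parameter names are hypothetical.

```python
def frequency_counts(values, lower_cutoff, upper_cutoff,
                     lower_bound, upper_bound, n_classes):
    """Count values per class interval, folding out-of-bound values into the end classes."""
    width = (upper_bound - lower_bound) / n_classes
    counts = [0] * n_classes
    for v in values:
        # Values outside the cut-offs are not part of the selected data at all.
        if v < lower_cutoff or v > upper_cutoff:
            continue
        index = int((v - lower_bound) // width)
        # Below the lower bound -> first class; above the upper bound -> last class.
        index = max(0, min(index, n_classes - 1))
        counts[index] += 1
    return counts


# Made-up data: cut-offs of 0 to 5, histogram bounds of 1 to 4, three classes.
sample = [0.2, 0.6, 1.5, 2.5, 3.5, 4.8, 7.0]
counts = frequency_counts(sample, 0.0, 5.0, 1.0, 4.0, 3)
print(counts)
```

Here 0.2 and 0.6 land in the first class and 4.8 in the last, while 7.0 is dropped because it lies beyond the upper cut-off. Setting the bounds equal to the cut-offs removes the folding.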
Mean. This is the arithmetic mean, the sum of all of the values
in the data set divided by the total number of samples (n).
Median. This is the middle value of the data set when all
the samples are sorted into ascending order.
Natural Log Mean. This is the mean of the natural logs of all
the values in the data set (the sum of the natural logs divided
by the number of samples). Note that this value is only
calculated when there are no samples less than or equal to
zero in the data set.
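
As an illustration of the three definitions above, the following sketch uses Python's standard statistics module. It assumes the median of an even number of samples is the average of the two middle values, which the manual does not specify; the function name and result keys are hypothetical.

```python
import math
import statistics


def basic_statistics(values):
    """Return the mean, median and natural log mean of a list of values."""
    result = {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "natural_log_mean": None,
    }
    # The log mean is only defined when every value is strictly positive.
    if all(v > 0 for v in values):
        result["natural_log_mean"] = statistics.fmean([math.log(v) for v in values])
    return result


positive = basic_statistics([1.0, math.e, math.e ** 2])
with_zero = basic_statistics([0.0, 1.0, 2.0])
print(positive)
print(with_zero)
```

The second call leaves the natural log mean empty because one of the values is zero.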
This manual does not discuss the use of classical statistics in detail.
Please refer to the numerous statistical reference books available
for more detailed discussions about these values and their
meanings.
Dec Count. The decreasing count: the number of values in the
class interval and in all the class intervals following it.
Dec Mean. The decreasing mean: the mean of all the values in the
class interval and in all the class intervals following it.
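
The decreasing count and decreasing mean can be accumulated by walking the class intervals from last to first. This is a sketch under the assumption that the values have already been grouped by class interval in ascending order; the function name and the sample groups are made up.

```python
def decreasing_stats(class_values):
    """Return (dec_counts, dec_means) for values grouped by ascending class interval."""
    dec_counts = []
    dec_means = []
    running_count = 0
    running_sum = 0.0
    # Walk from the last class interval back to the first, accumulating as we go.
    for members in reversed(class_values):
        running_count += len(members)
        running_sum += sum(members)
        dec_counts.append(running_count)
        dec_means.append(running_sum / running_count if running_count else None)
    dec_counts.reverse()
    dec_means.reverse()
    return dec_counts, dec_means


# Made-up values for three class intervals.
groups = [[0.2, 0.8], [1.5], [2.5, 2.5]]
dec_counts, dec_means = decreasing_stats(groups)
print(dec_counts)
print(dec_means)
```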
Histograms
Histograms are block diagrams that show the count of the number of
samples in each class interval. Histograms are usually plotted with the
class intervals along the horizontal axis of the graph, and the count or
relative count along the vertical axis.
You can display two types of histograms automatically:
Log scaled histogram. The X axis of the graph shows the class
intervals displayed with a logarithmic scale. The Y axis of the
graph shows the sample count with a normal scale.
[Figure: Normal Histogram. Frequency Count (vertical axis) against Copper (Percent) (horizontal axis, normal scale).]
[Figure: Log scaled histogram. Frequency Count (vertical axis) against Copper (Percent) (horizontal axis, logarithmic scale).]
Frequency Graphs
Frequency graphs are line graphs that show the relative count,
cumulative count or decreasing count of values in each class
interval. Frequency graphs are usually plotted with the class
intervals along the horizontal axis of the graph, and with the count,
relative count, cumulative count or decreasing count along the
vertical axis of the graph.
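
The quantities plotted on a frequency graph can all be derived from the per-class counts. The sketch below assumes the relative count is expressed as a fraction of the total; the manual may present it as a percentage instead. The function name and sample counts are hypothetical.

```python
from itertools import accumulate


def frequency_columns(counts):
    """Return (relative, cumulative, decreasing) counts for per-class counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no values in any class interval")
    relative = [c / total for c in counts]
    # Cumulative count: this class plus every class before it.
    cumulative = list(accumulate(counts))
    # Decreasing count: this class plus every class after it.
    decreasing = [total - before for before in [0] + cumulative[:-1]]
    return relative, cumulative, decreasing


relative, cumulative, decreasing = frequency_columns([3, 1, 2, 4])
print(relative, cumulative, decreasing)
```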
You can automatically display two types of frequency graphs:
Probability Graphs
Probability graphs show the probability of values falling within a
particular class interval. The probability plot is created by plotting
the class intervals against the cumulative relative frequency, using
an axis with a probability scale for the frequency.
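
One common way to build a probability scale is to convert each cumulative relative frequency into the matching quantile of the standard normal distribution. The manual does not say how QuickGraf constructs its axis, so the following is a sketch under that assumption, with hypothetical names and made-up counts.

```python
from statistics import NormalDist


def probability_plot_points(upper_bounds, counts):
    """Pair each class upper bound with the normal quantile of its cumulative frequency."""
    total = sum(counts)
    standard_normal = NormalDist()
    points = []
    running = 0
    for upper, count in zip(upper_bounds, counts):
        running += count
        fraction = running / total
        # Fractions of exactly 0 or 1 lie at infinity on a probability scale.
        if 0.0 < fraction < 1.0:
            points.append((upper, standard_normal.inv_cdf(fraction)))
    return points


points = probability_plot_points([1.0, 2.0, 3.0], [1, 2, 1])
print(points)
```

A data set that follows a normal distribution plots as an approximately straight line on such a scale.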
There are two types of probability graphs:
Multivariate Statistics
You can use the multivariate statistics function in Gemcom for
Windows to analyze the relationships between multiple variables.
These functions provide you with the following tools:
STATS.DAT.
All the files are text files and are located in the GCDBaa\GRAPHS
subdirectory.
Data for the multivariate statistics is obtained directly from the
workspaces. You can perform multivariate statistics on up to ten
fields with numeric data types (real, double or angle) from one table
at a time. You can use data from any type of workspace and define
subsets of data by applying filters and other selection criteria to the
workspace.
Output from the multivariate statistics can be directed to the
screen, printer or text files, as well as to QuickGraf for viewing
and plotting scattergrams and linear regression models.
When you select Workspace > Analysis > Multivariate
Statistics from Workspace, the Multivariate Statistics
Preparation dialog box will appear. This dialog box consists of
three tabs, which contain the parameters defining the data set that
will be used to calculate your multivariate statistics, and three
buttons. If you have performed multivariate statistics using your
current workspace before, the default parameters displayed in each
of the tabs will be the parameters you entered last time.
Page 2280
Data. The parameters under this tab specify which table and
fields will be used to perform the multivariate statistics
function.
Location. This tab allows you to limit the records chosen for
the function to a particular area by defining a bounding box
using northing, easting and elevation coordinates.
Filter. This tab allows you to limit the records chosen for the
functions by specifying upper and lower bounds or matching
strings for the data to be used.
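
The bounding-box test described for the Location tab amounts to a range check on each coordinate. A minimal sketch, assuming inclusive limits and a (northing, easting, elevation) ordering chosen here purely for illustration:

```python
def inside_box(point, lower, upper):
    """Return True if every coordinate of point lies within [lower, upper]."""
    return all(lo <= c <= hi for c, lo, hi in zip(point, lower, upper))


# Made-up limits: (northing, easting, elevation).
lower_corner = (5000.0, 2000.0, 300.0)
upper_corner = (5200.0, 2100.0, 350.0)

inside = inside_box((5100.0, 2050.0, 310.0), lower_corner, upper_corner)
outside = inside_box((5300.0, 2050.0, 310.0), lower_corner, upper_corner)
print(inside, outside)
```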
Data
The parameters on the Data tab (see Figure 12-10) allow you to
specify the fields and tables to be used in the calculations. Enter
the following parameters:
Use FROM. Select this option to use the data in the FROM
field as the reference position.
Use TO. Select this option to use the data in the TO field as
the reference position.
Location
You can use the parameters in this tab to define the physical area
from which data for the calculation is to be taken. Enter the lower
Filter
The parameters you enter in the Filter tab will determine which
records from the physical bounding box specified in the Location
tab will be used for the calculations. You can specify lower and
upper bounds or matching strings for fields from up to three tables:
the Header table, the table to be used (if different from the Header
table), and the cross-reference table (if selected, and if different
from the Header table).
Enter the following parameters as necessary for each field you wish
to use to limit the selection criteria:
Field. Select the name of the field to use to limit record selection.
Axis. If the field you selected is a coordinate field, select the axis
(X, Y or Z) for which to enter lower and upper bounds.
Lower Bound and Upper Bound. If the field you selected is a
numeric field, enter a lower and upper bound for the data to be
selected.
Matching String. If the field you selected is a character field,
enter a string to define which data is selected. You can use the
wildcard characters * and ? in your string.
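
The numeric bounds and wildcard matching described above behave much like the following sketch, which uses Python's fnmatch module for the * and ? wildcards. It assumes inclusive bounds and case-sensitive matching, neither of which the manual states; note also that fnmatch treats square brackets as a character set, which may differ from the program. The field names and records are invented.

```python
from fnmatch import fnmatchcase


def record_passes(record, numeric_bounds, string_patterns):
    """Return True if a record satisfies every numeric bound and matching string."""
    for field, (lower, upper) in numeric_bounds.items():
        if not (lower <= record[field] <= upper):
            return False
    for field, pattern in string_patterns.items():
        # '*' matches any run of characters, '?' matches exactly one character.
        if not fnmatchcase(record[field], pattern):
            return False
    return True


# Made-up records with a character field and a numeric field.
records = [
    {"HOLE-ID": "DDH-001", "CU": 0.80},
    {"HOLE-ID": "DDH-014", "CU": 3.10},
    {"HOLE-ID": "RC-002", "CU": 0.95},
]
selected = [r for r in records
            if record_passes(r, {"CU": (0.5, 2.0)}, {"HOLE-ID": "DDH-0??"})]
print(selected)
```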
Statistics Calculation
Once you have defined the selection criteria by entering parameters
in all three tabs according to your requirements, you can extract
the subset of data from the workspace and perform the multivariate
statistics calculation. Data will only be extracted for records that
have data present in all of the fields defined for the analysis. If any
of the fields in a record being considered are missing (as denoted by
the special values Not Entered, Insufficient Sample, Not Sampled,
Not Calculated, or Error), then none of the values for the record
will be used.
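
The rule that a record is dropped entirely when any selected field is missing can be sketched as below. Representing the special values as text strings is an assumption made for this example only; the manual does not describe how they are stored internally.

```python
# Special values named in the text, represented here as plain strings (an assumption).
MISSING = {"Not Entered", "Insufficient Sample", "Not Sampled",
           "Not Calculated", "Error"}


def complete_records(records, fields):
    """Keep only records that have a usable value in every selected field."""
    rows = []
    for record in records:
        values = [record.get(field) for field in fields]
        # One missing field is enough to reject the whole record.
        if any(v is None or v in MISSING for v in values):
            continue
        rows.append(tuple(values))
    return rows


# Made-up assay records with two fields.
assays = [
    {"CU": 0.8, "AU": 1.2},
    {"CU": "Not Sampled", "AU": 0.4},
    {"CU": 1.1, "AU": "Error"},
    {"CU": 0.3, "AU": 0.9},
]
rows = complete_records(assays, ["CU", "AU"])
print(rows)
```

Only the first and last records survive, so every variable in the analysis is computed from the same set of samples.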
You can use the three buttons at the right-hand side of the dialog
box to perform the following functions:
Statistics Table.
Use the Display option buttons found in the lower left-hand corner
to determine which data table is displayed within the dialog box, as
outlined in the section below.
Statistics Table
The multivariate statistics function will calculate a set of classical
statistics for the data in each of the fields selected under the Data
tab as outlined on page 2280. A sample statistics table is shown in
Figure 12-13.
The following functions, which fall into the two groups shown, are
calculated:
Natural log mean. This is the mean of the natural logs of all
the values in the data set (the sum of the natural logs divided
by the number of samples). Note that this value is only
calculated when there are no samples less than or equal to
zero in the data set.
This manual does not discuss the use of classical statistics in detail.
Please refer to the numerous statistical reference books available
for more detailed discussions about these values and their
meanings.
parameters that define the way the graph is displayed from within
QuickGraf.
For details about working with graphs, see Chapter 23: QuickGraf.
Scattergrams
Scattergrams are X-Y scatter graphs that show the relationship
between two variables. Scattergrams are initially displayed using
normally scaled axes.
Linear Regression
You can use QuickGraf's modelling function to fit a straight line
regression curve to the data set displayed on the scattergram. The
straight line can be in one of two forms:
Y = A + BX
X = A + BY
In both cases:
X
Y
A
B
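
A straight line of the X = A + BY form, and the same fit with the roles of the two variables swapped, can be computed by ordinary least squares. The manual does not state which fitting method QuickGraf uses, so least squares is an assumption here, and the function name and data are hypothetical.

```python
from statistics import fmean


def fit_line(predictor, response):
    """Least-squares fit of response = a + b * predictor; returns (a, b)."""
    mean_p = fmean(predictor)
    mean_r = fmean(response)
    spread = sum((p - mean_p) ** 2 for p in predictor)
    if spread == 0:
        raise ValueError("predictor values are all identical")
    covariation = sum((p - mean_p) * (r - mean_r)
                      for p, r in zip(predictor, response))
    b = covariation / spread
    a = mean_r - b * mean_p
    return a, b


# Made-up, exactly linear data: y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
a_y_on_x, b_y_on_x = fit_line(x, y)   # Y as a function of X
a_x_on_y, b_x_on_y = fit_line(y, x)   # X = A + BY
print(a_y_on_x, b_y_on_x, a_x_on_y, b_x_on_y)
```

With scattered data the two fits generally give different lines, because each minimises the errors in a different variable.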