ACeL
Amity University
In the modern world of computers and information technology, the importance of statistics is well recognised by all disciplines. Statistics originated as a science of statehood and slowly and steadily found applications in agriculture, economics, commerce, biology, medicine, industry, planning, education and so on. Today there is hardly any walk of human life where statistics cannot be applied.
Preface
The importance of Business Statistics, as a field of study and practice, is being increasingly realized in schools, colleges, universities, and commercial and industrial organizations, both in India and abroad. It is a technical and practical subject, and learning it means familiarizing oneself with many new terms and concepts. As this Study Material is intended to serve beginners in the field, I have kept it simple. It is meant for students of the BFIA course of Amity University, is student oriented, and is written in a teach-yourself style.
The primary objective of this Study Material is to facilitate a clear understanding of the subject of Business Statistics. It contains a wide range of theoretical and practical questions varying in content, length and complexity. Most of the illustrations and exercise problems have been taken from various university examinations, and there are enough illustrations to help the reader grasp the subject. Every effort has been made to ensure accuracy in the formulae and in the answers to the exercise questions. For the convenience of students I have also included multiple questions and a case study.
I hope that this Study Material will prove useful to both students and teachers. Its contents are divided into eight chapters covering various aspects of the syllabus of BFIA and other related courses. At the end, three assignments related to the subject matter are provided.
I have taken a considerable amount of help from various books, journals and media. I express my gratitude to all those who have devoted their lives to knowledge, especially Statistics, from whom I could learn and on the basis of whose teaching I am now trying to pass my knowledge on to others through this material. It is by God's loving grace that He brought me into this world and blessed me with loving and caring parents, my respected father Mr. Manohar Lal Arora and my loving mother Mrs. Kamla Arora, who have supported me in this Study Material. Words may not be enough to express my deep sense of gratitude and indebtedness to Dr. Shipra Maitra, Director (Amity College of Commerce & Finance, Amity University, Noida), for her benevolent guidance, constructive criticism and constant encouragement throughout the period I have been involved in this Study Material. I am thankful to my beloved wife Mrs. Deepti Arora, without whose constant encouragement, advice and material sacrifice this achievement would have been a far-off dream.
Table of Contents
Preface
CHAPTER ONE: INTRODUCTION TO STATISTICS
1.1 Introduction
1.2 Meaning of Statistics
1.3 Origin and Growth of Statistics
1.4 Definitions
  1.4.1 Definition by Florence Nightingale
  1.4.2 Definitions by A.L. Bowley
  1.4.3 Definition by Croxton and Cowden
  1.4.4 Definition by Horace Secrist
  1.4.5 Definition by Professor Secrist
  1.4.6 Definition by Croxton and Cowden
1.5 Characteristics of Statistics
  1.5.1 Statistics are aggregate of facts
  1.5.2 Statistics are numerically expressed
  1.5.3 Statistics are affected to a marked extent by multiplicity of causes
  1.5.4 Statistics are collected in a systematic order
  1.5.5 Statistics must be collected for a predetermined purpose
  1.5.6 Statistics should be placed in relation to each other
1.6 Functions of Statistics
  1.6.1 Condensation
  1.6.2 Comparison
  1.6.3 Forecasting
  1.6.4 Estimation
  1.6.5 Tests of Hypothesis
1.7 Scope of Statistics
  1.7.1 Statistics and Industry
  1.7.2 Statistics and Commerce
  1.7.3 Statistics and Agriculture
  1.7.4 Statistics and Economics
  1.7.5 Statistics and Education
  1.7.6 Statistics and Planning
  1.7.7 Statistics and Medicine
  1.7.8 Statistics and Modern applications
1.8 Limitations of statistics
  1.8.1 Statistics is not suitable to the study of qualitative phenomenon
  1.8.2 Statistics does not study individuals
  1.8.3 Statistical laws are not exact
  1.8.4 Statistics table may be misused
  1.8.5 Statistics is only one of the methods of studying a problem
1.9 Distrust of Statistics
1.10 Uses of Statistics
  1.10.1 To present the data in a concise and definite form
  1.10.2 To make it easy to understand complex and large data
  1.10.3 For comparison
  1.10.4 In forming policies
  1.10.5 Enlarging individual experiences
  1.10.6 In measuring the magnitude of a phenomenon
1.11 Types of Statistics
1.12 Common Mistakes Committed in Interpretation of Statistics
Chapter One: End Chapter Quizzes
CHAPTER TWO: PRIMARY AND SECONDARY DATA
2.1 Primary Data
2.2 Sources of Primary Data
  2.2.1 Direct personal investigations
  2.2.2 Indirect oral investigations
  2.2.3 Information through correspondence
  2.2.4 Mailed questionnaire method
  2.2.5 Schedule to be filled in by the enumerator
2.3 Secondary Data
2.4 The nature of secondary sources of information
2.5 Sources of Secondary data
  2.5.1 Internal sources of secondary data
  2.5.2 External sources of secondary information
  2.5.3 Examples of Sources of External Secondary Data
2.6 The problems of secondary sources
2.7 Difference between Primary & Secondary Data
Chapter Two: End Chapter Quizzes
CHAPTER THREE: MEASURES OF DISPERSION
3.1 Meaning
3.2 Definitions
3.3 Types of Dispersion
  3.3.1 Absolute Dispersion
  3.3.2 Relative Dispersion
3.4 Features of an ideal measure of dispersion
3.5 Methods of measuring Dispersion
  3.5.1 Range
  3.5.2 Quartile Deviations
  3.5.3 Mean Deviation
  3.5.4 Standard Deviation (S.D.)
  3.5.5 Co-efficient of Variation (C.V.)
Chapter Three: End Chapter Quizzes
CHAPTER FOUR: MEASURES OF SKEWNESS
4.1 Skewness
4.2 Definitions
4.3 Difference between Skewness and Dispersion
4.4 Tests of Skewness
4.5 Methods of measurement of Skewness
Chapter Four: End Chapter Quizzes
CHAPTER FIVE: CORRELATION
5.1 Introduction
5.2 Definitions
5.3 Coefficient of Correlation
5.4 Types of Correlation
5.5 Degrees of Correlation
5.6 Techniques in Determining Correlation
  5.6.1 Rating Scales
5.7 Methods of Determining Correlation
  5.7.3 Spearman's Rank Correlation Coefficient
Chapter Five: End Chapter Quizzes
baseball player, on base percentages of a baseball player, salary rates, standardized test results.
1.2 Meaning of Statistics
The word 'Statistics' is derived from the Latin word 'Status', which means a "political state." Clearly, statistics is closely linked with the administrative affairs of a state, such as facts and figures regarding the defence force, population, housing, food, financial resources, etc. What is true of a government is also true of industrial administration units, and even of one's personal life.
The word statistics has several meanings. In the first place, it is a plural noun which describes a collection of numerical data such as employment statistics, accident statistics, population statistics, statistics of births and deaths, of income and expenditure, of exports and imports, etc. It is in this sense that the word 'statistics' is used by a layman or a newspaper.
Secondly, the word statistics, as a singular noun, is used to describe a branch of applied mathematics whose purpose is to provide methods of dealing with collections of data and extracting information from them in compact form by tabulating, summarizing and analyzing the numerical data or a set of observations. The various methods used are termed statistical methods, and the person using them is known as a statistician. A statistician is concerned with the analysis and interpretation of the data and with drawing valid, worthwhile conclusions from them. It is in this second sense that we are writing this guide on statistics.
Lastly, the word statistics is used in a specialized sense. It describes the various numerical quantities that are produced by applying statistics (in the second sense) to statistics (in the first sense). Averages, standard deviations, etc. are all statistics in this specialized third sense.
1.3 Origin and Growth of Statistics:
The words 'Statistics' and 'Statistical' are both derived from the Latin word Status, which means a political state. The theory of statistics as a distinct branch of scientific method is of comparatively recent growth. Research, particularly into the mathematical theory of statistics, is proceeding rapidly and fresh discoveries are being made all over the world.
1.4 Definitions:
Statistics has been defined differently by different authors over a period of time. In the olden days statistics was confined only to state affairs, but in modern days it embraces almost every sphere of human activity. Therefore a number of old definitions, which were confined to a narrow field of enquiry, have been replaced by newer definitions that are much more comprehensive and exhaustive. Secondly, statistics has been defined in two different ways: as statistical data and as statistical methods. The following are some of the definitions of statistics as numerical data.
1. Statistics are the classified facts representing the conditions of people in a state. In particular they are the facts which can be stated in numbers, in tables of numbers, or in any tabular or classified arrangement.
2. Statistics are measurements, enumerations or estimates of natural phenomena, usually systematically arranged, analysed and presented so as to exhibit important interrelationships among them.
1.4.1 Definition by Florence Nightingale
Statistics is "the most important science in the whole world: for upon it depends the practical application of every other science and every art: the one science essential to all political and social administration, all education, all organization based on experience, for it only gives results of our experience."
1.4.2 Definitions by A.L. Bowley:
"Statistics are numerical statements of facts in any department of enquiry placed in relation to each other." - A.L. Bowley
"Statistics may be called the science of counting" is one of the definitions due to Bowley. Obviously this is an incomplete definition, as it takes into account only the aspect of collection and ignores other aspects such as analysis, presentation and interpretation. Bowley gives another definition, which states that "statistics may be rightly called the science of averages." This definition is also incomplete: averages play an important role in understanding and comparing data, but statistics provides many other measures as well.
1.4.3 Definition by Croxton and Cowden:
"Statistics may be defined as the science of collection, presentation, analysis and interpretation of numerical data from the logical analysis." It is clear that the definition of statistics by Croxton and Cowden is the most scientific and realistic one. According to this definition there are four stages:
1. Collection of data: This is the first step and the foundation upon which the entire investigation rests. Careful planning is essential before collecting the data. There are different methods of collecting data, such as census, sampling, primary, secondary, etc., and the investigator should make use of the correct method.
2. Presentation of data: The mass of data collected should be presented in a suitable, concise form for further analysis. The collected data may be presented in tabular, diagrammatic or graphic form.
3. Analysis of data: The data presented should be carefully analysed in order to make inferences from them, using tools such as measures of central tendency, dispersion, correlation, regression, etc.
4. Interpretation of data: The final step is drawing conclusions from the data collected. A valid conclusion must be drawn on the basis of the analysis. A high degree of skill and experience is necessary for interpretation.
1.4.4 Definition by Horace Secrist:
"Statistics may be defined as the aggregate of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to a reasonable standard of accuracy, collected in a systematic manner, for a predetermined purpose and placed in relation to each other." The above definition seems to be the most comprehensive and exhaustive.
1.4.5 Definition by Professor Secrist:
The word 'statistics' in the first sense is defined by Professor Secrist as follows: "By statistics we mean aggregate of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standard of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other." This definition gives all the characteristics of statistics, which are: aggregate of facts; affected by multiplicity of causes; numerically expressed; estimated according to reasonable standards of accuracy; collected in a systematic manner; collected for a predetermined purpose; placed in relation to each other.
1.4.6 Definition by Croxton and Cowden:
The word 'statistics' in the second sense is defined by Croxton and Cowden as follows: "The collection, presentation, analysis and interpretation of the numerical data." This definition clearly points out four stages in a statistical investigation, namely:
1) Collection of data
2) Presentation of data
3) Analysis of data
4) Interpretation of data
become statistics, there must be more than one fact. However the data may relate to production, sales, employment, birth, death etc.
1.5.2 Statistics are numerically expressed:
Only those statements which can be expressed numerically are statistics. Statistics does not deal with qualitative statements such as "students of MBA are intelligent." On the other hand, a statement such as "the sales of Escorts Ltd. are Rs. 354 crores" is a statistical fact stated numerically.
1.5.3 Statistics are affected to a marked extent by multiplicity of causes:
Statistical data are affected to a great extent by various causes. For instance, the production of wheat depends upon the quality of seed, rainfall, quality of soil, fertilizer used, method of cultivation, etc.
1.5.4 Statistics are collected in a systematic order:
Statistical data are collected in a systematic manner. This means the investigator has to chalk out a plan keeping in view the objective of data collection, and determine the statistical unit, the technique of data collection and so on.
1.5.5 Statistics must be collected for a predetermined purpose:
The objective of data collection must be predetermined and well established. A mere statement of purpose is insufficient.
1.5.6 Statistics should be placed in relation to each other:
Statistical data must be comparable. This is possible only when the data are homogeneous.
1.6 Functions of Statistics:
There are many functions of statistics. Let us consider the following five important functions.
1.6.1 Condensation:
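Generally speaking, by the word 'condense' we mean to reduce or to lessen. Condensation aims at making a huge mass of data easier to understand by providing only a few summary observations. If only the individual marks obtained in an examination by a particular class in a Chennai school are given, no purpose will be served. If instead we are given the average mark in that examination, it definitely serves the purpose better. Similarly, the range of marks is another measure of the data. Thus, statistical measures help to reduce the complexity of the data and consequently to understand any huge mass of data.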
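As a small illustration of condensation, the sketch below reduces a list of examination marks to two summary figures, the average and the range. The marks are hypothetical values invented for this example; they do not come from any real class.

```python
from statistics import mean

# Hypothetical marks of ten students in one examination (illustrative data only).
marks = [35, 42, 58, 61, 47, 73, 66, 54, 49, 65]

average_mark = mean(marks)              # one figure summarising the whole class
mark_range = max(marks) - min(marks)    # gap between the highest and lowest mark

print(f"Number of students: {len(marks)}")
print(f"Average mark: {average_mark}")
print(f"Range of marks: {mark_range}")
```

Two numbers now stand in for ten, which is the sense in which statistical measures condense data.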
1.6.2 Comparison:
Classification and tabulation are the two methods that are used to condense the data. They help us to compare data collected from different sources. Grand totals, measures of central tendency, measures of dispersion, graphs and diagrams, the coefficient of correlation, etc. provide ample scope for comparison. Even if we have only one group of data, we can make comparisons within it. If the rice production (in tonnes) in Tanjore district is known, then we can compare one region with another region within the district. Or if the rice production (in tonnes) of two different districts within Tamil Nadu is known, then a comparative study can also be made. As statistics is an aggregate of facts and figures, comparison is always possible, and in fact comparison helps us to understand the data better.
1.6.3 Forecasting:
By the word forecasting, we mean to predict or to estimate beforehand. Given the rainfall data of a particular district in Tamil Nadu for the last ten years, it is possible to predict or forecast the rainfall for the near future. In business also, forecasting plays a dominant role in connection with production, sales, profits, etc. The analysis of time series and regression analysis play an important role in forecasting.
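One simple way to sketch such a forecast is to fit a straight-line trend by least squares and extend it by one year. The rainfall figures below are hypothetical, and the helper name fit_trend is introduced only for this illustration; a real forecast would normally need a more careful time-series model.

```python
# Hypothetical annual rainfall (mm) for ten consecutive years (illustrative data only).
years = list(range(1, 11))
rainfall = [900, 920, 880, 950, 970, 940, 990, 1010, 980, 1030]


def fit_trend(x, y):
    """Fit the least-squares line y = a + b*x and return (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # slope: average change in rainfall per year
    a = mean_y - b * mean_x    # intercept
    return a, b


intercept, slope = fit_trend(years, rainfall)
forecast_year_11 = intercept + slope * 11   # extend the trend one year ahead

print(f"Trend line: rainfall = {intercept:.1f} + {slope:.1f} * year")
print(f"Forecast for year 11: {forecast_year_11:.1f} mm")
```

The forecast is only as good as the assumption that the past linear trend continues.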
1.6.4 Estimation:
One of the main objectives of statistics is to draw inferences about a population from the analysis of a sample drawn from that population. The four major branches of statistical inference are:
1. Estimation theory
2. Tests of Hypothesis
3. Non-Parametric tests
4. Sequential analysis
In estimation theory, we estimate the unknown value of a population parameter on the basis of the sample observations. Suppose we are given a sample of the heights of one hundred students in a school. Based upon the heights of these 100 students, it is possible to estimate the average height of all students in that school.
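The height example can be sketched as follows. The 100 heights are simulated, so they are an assumption of this illustration and not real measurements, and the interval uses the normal value 1.96 as an approximation that is reasonable for a sample of this size.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the illustration is reproducible

# Hypothetical sample: heights (cm) of 100 students, simulated for illustration only.
heights = [random.gauss(150, 8) for _ in range(100)]

n = len(heights)
sample_mean = mean(heights)                  # point estimate of the school's average height
standard_error = stdev(heights) / sqrt(n)    # estimated standard error of the sample mean

# Approximate 95% interval estimate for the population mean.
lower = sample_mean - 1.96 * standard_error
upper = sample_mean + 1.96 * standard_error

print(f"Point estimate: {sample_mean:.1f} cm")
print(f"Approximate 95% interval: {lower:.1f} cm to {upper:.1f} cm")
```

The point estimate gives a single value for the unknown average, while the interval indicates how far that value might plausibly be from the true one.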
1.6.5 Tests of Hypothesis:
A statistical hypothesis is a statement about the probability distribution characterising a population, which is tested on the basis of the information available from the sample observations. In the formulation and testing of hypotheses, statistical methods are extremely useful. Whether crop yield has increased because of the use of a new fertilizer, or whether a new medicine is effective in eliminating a particular disease, are examples of hypotheses, and these are tested with proper statistical tools.
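The fertilizer question can be sketched with a permutation test. This technique is chosen here only because it needs nothing beyond the standard library; the text does not prescribe it. The yields are hypothetical. Under the hypothesis that the fertilizer makes no difference, every split of the ten plots into two groups of five is equally likely, so the code counts how many splits favour the first group at least as strongly as the observed one.

```python
from itertools import combinations

# Hypothetical wheat yields (quintals per plot); illustrative data only.
new_fertilizer = [24, 27, 26, 29, 28]
old_fertilizer = [21, 23, 25, 22, 24]

pooled = new_fertilizer + old_fertilizer
observed_sum = sum(new_fertilizer)
group_size = len(new_fertilizer)

# Because the pooled total is fixed, a larger group sum is equivalent to a
# larger difference between the two group means, so integer sums suffice.
n_splits = 0
count_extreme = 0
for indices in combinations(range(len(pooled)), group_size):
    n_splits += 1
    if sum(pooled[i] for i in indices) >= observed_sum:
        count_extreme += 1

p_value = count_extreme / n_splits   # one-sided p-value

print(f"Splits examined: {n_splits}")
print(f"One-sided p-value: {p_value:.4f}")
```

A small p-value means the observed advantage would rarely arise by chance alone, which counts as evidence against the hypothesis of no difference.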
1.7 Scope of Statistics:
Statistics is not a mere device for collecting numerical data, but a means of developing sound techniques for handling and analysing them and for drawing valid inferences from them. Statistics is applied in every sphere of human activity, social as well as physical, such as Biology, Commerce, Education, Planning, Business Management, Information Technology, etc. It is almost impossible to find a single department of human activity where statistics cannot be applied. We now discuss briefly the applications of statistics in other disciplines.
1.7.1 Statistics and Industry:
Statistics is widely used in many industries. Control charts are widely used to maintain a certain quality level. In production engineering, statistical tools such as inspection plans and control charts are of extreme importance in finding out whether a product conforms to specifications or not. In inspection plans we have to resort to some kind of sampling, a very important aspect of Statistics.
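The idea behind a control chart can be sketched in a simplified form: limits are set three standard deviations either side of the mean of measurements taken while the process was running well, and later measurements outside those limits are flagged. The diameters below are hypothetical, and the function name out_of_control is introduced only for this illustration. Control charts used in practice usually work with subgroups and tabulated constants, so this is an assumption-laden sketch and not a standard procedure.

```python
from statistics import mean, stdev

# Hypothetical diameters (mm) of parts sampled while the process was running well.
baseline = [10.02, 9.98, 10.01, 9.99, 10.00, 10.03, 9.97, 10.01, 9.99, 10.00]

centre_line = mean(baseline)
sigma = stdev(baseline)
upper_control_limit = centre_line + 3 * sigma
lower_control_limit = centre_line - 3 * sigma


def out_of_control(measurement):
    """Return True if a measurement falls outside the 3-sigma limits."""
    return not (lower_control_limit <= measurement <= upper_control_limit)


# Hypothetical later measurements to be checked against the limits.
new_measurements = [10.01, 9.96, 10.12]
flags = [out_of_control(m) for m in new_measurements]

for m, flagged in zip(new_measurements, flags):
    print(f"{m:.2f} mm -> {'investigate' if flagged else 'within limits'}")
```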
1.7.2 Statistics and Commerce:
Statistics is the lifeblood of successful commerce. No businessman can afford either to understock or to overstock his goods. At the outset he estimates the demand for his goods and then adjusts his output or purchases accordingly. Thus statistics is indispensable in business and commerce. As many multinational companies have entered the Indian economy, the size and volume of business are increasing. On one side competition is stiffening, while on the other tastes are changing and new fashions are emerging. In this connection, market surveys play an important role in showing present conditions and in forecasting likely changes in the future.
1.7.3 Statistics and Agriculture:
Analysis of variance (ANOVA), one of the statistical tools developed by Professor R.A. Fisher, plays a prominent role in agricultural experiments. In tests of significance based on small samples, it can be shown that statistical methods are adequate for testing the significance of the difference between two sample means. In analysis of variance, we are concerned with testing the equality of several population means.
For example, suppose five fertilizers are each applied to five plots of wheat and the yield of wheat on each plot is recorded. In such a situation, we are interested in finding out whether the effects of these fertilizers on the yield are significantly different or not; in other words, whether the samples are drawn from the same normal population or not. The answer to this problem is provided by the technique of ANOVA, which is used to test the homogeneity of several population means.
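The fertilizer example can be sketched as a one-way analysis of variance computed by hand. The yields below are hypothetical, with five fertilizers labelled A to E and five plots each; the labels and numbers are assumptions made for illustration. The code splits the total variation into a part between fertilizers and a part within fertilizers and forms their ratio, the F statistic.

```python
from statistics import mean

# Hypothetical wheat yields: five fertilizers, five plots each (illustrative data only).
yields = {
    "A": [20, 22, 19, 21, 23],
    "B": [25, 27, 26, 24, 28],
    "C": [22, 21, 23, 20, 24],
    "D": [30, 29, 31, 28, 32],
    "E": [21, 20, 22, 23, 19],
}

all_values = [y for group in yields.values() for y in group]
grand_mean = mean(all_values)
k = len(yields)        # number of fertilizers
n = len(all_values)    # total number of plots

# Variation of the fertilizer means around the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in yields.values())
# Variation of individual plots around their own fertilizer mean.
ss_within = sum((y - mean(g)) ** 2 for g in yields.values() for y in g)
# Total variation, which equals ss_between + ss_within.
ss_total = sum((y - grand_mean) ** 2 for y in all_values)

df_between = k - 1
df_within = n - k
f_statistic = (ss_between / df_between) / (ss_within / df_within)

print(f"F = {f_statistic:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

The computed F is then compared with the tabulated F value for the same degrees of freedom; a larger computed value suggests that the population means are not all equal.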
1.7.4 Statistics and Economics:
Statistical methods are useful in measuring numerical changes in complex groups and interpreting collective phenomena. Nowadays statistics is used abundantly in any economic study. Both in economic theory and practice, statistical methods play an important role. Alfred Marshall said, "Statistics are the straw out of which I, like every other economist, have to make the bricks." It may also be noted that statistical data and statistical techniques are immensely useful in solving many economic problems concerning wages, prices, production, distribution of income and wealth and so on. Statistical tools like index numbers, time series analysis, estimation theory and testing of statistical hypotheses are extensively used in economics.
1.7.5 Statistics and Education:
Statistics is widely used in education. Research has become a common feature in all branches of activity. Statistics is necessary for the formulation of policies to start new courses, consideration of facilities available for new courses etc. There are many people engaged in research work to test past knowledge and evolve new knowledge. These are possible only through statistics.
1.7.6 Statistics and Planning:
Statistics is indispensable in planning. In the modern world, which can be termed the world of planning, almost all the organisations in the government are seeking the help of planning for efficient working, for the formulation of policy decisions and for their execution. In order to achieve these goals, statistical data relating to production, consumption, demand, supply, prices, investments, income, expenditure etc., and various advanced statistical techniques for processing, analysing and interpreting such complex data, are of importance. In India statistics plays an important role in the planning commissions at both the central and state government levels.
1.7.7 Statistics and Medicine:
In medical sciences, statistical tools are widely used. To test the efficacy of a new drug or medicine, the t-test is used; to compare the efficacy of two drugs or two medicines, the t-test for two samples is used. More and more applications of statistics are at present used in clinical investigation.
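As a rough illustration of the two-sample t-test mentioned above, the sketch below computes the pooled (equal-variance) t statistic for two small samples. The figures are hypothetical, and the function name `pooled_two_sample_t` is an assumption made for this example; the equal-variance form is also an assumption, since the text does not say which variant is meant.

```python
import math
from statistics import mean, variance


def pooled_two_sample_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance (variance() is the sample variance)
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_error, df


# Hypothetical reductions in blood pressure under two drugs
drug_1 = [12, 9, 11, 14, 10, 13]
drug_2 = [8, 7, 10, 9, 6, 8]

t_value, df = pooled_two_sample_t(drug_1, drug_2)
print("t =", round(t_value, 3), "with", df, "degrees of freedom")
```

The computed t would then be compared with the tabulated t value for the same degrees of freedom to judge whether the two drugs differ significantly.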
1.7.8 Statistics and Modern applications:
Recent developments in the fields of computer technology and information technology have enabled statistics to integrate their models and thus make statistics a part of the decision-making procedures of many organisations. There are many software packages available for solving problems in design of experiments, forecasting, simulation etc. SYSTAT, a software package, offers more scientific and technical graphing options than any other desktop statistics package. SYSTAT supports all types of scientific and technical research in various diversified fields, as follows:
1. Archeology: Evolution of skull dimensions
2. Epidemiology: Tuberculosis
3. Statistics: Theoretical distributions
4. Manufacturing: Quality improvement
5. Medical research: Clinical investigations
6. Geology: Estimation of uranium reserves from ground water
1.8 Limitations of statistics: Statistics, with all its wide application in every sphere of human activity, has its own limitations. Some of them are given below.
1.8.1 Statistics is not suitable to the study of qualitative phenomena: Since statistics is basically a science and deals with a set of numerical data, it is applicable only to those subjects of enquiry which can be expressed in terms of quantitative measurements. As a matter of fact, qualitative phenomena like honesty, poverty, beauty, intelligence etc. cannot be expressed numerically, and statistical analysis cannot be directly applied to them. Nevertheless, statistical techniques may be applied indirectly by first reducing the qualitative expressions to accurate quantitative terms. For example, the intelligence of a group of students can be studied on the basis of their marks in a particular examination.
1.8.2 Statistics does not study individuals: Statistics does not give any specific importance to individual items; in fact it deals with an aggregate of objects. Individual items, taken individually, do not constitute statistical data and do not serve any purpose for a statistical enquiry.
1.8.3 Statistical laws are not exact: It is well known that the mathematical and physical sciences are exact. Statistical laws, however, are not exact; they are only approximations. Statistical conclusions are not universally true. They are true only on an average.
1.8.4 Statistics may be misused: Statistics must be used only by experts; otherwise, statistical methods are the most dangerous tools in the hands of the inexpert. The use of statistical tools by inexperienced and untrained persons might lead to wrong conclusions. Statistics can be easily misused by quoting wrong figures. As King aptly says, "Statistics are like clay of which one can make a God or Devil as one pleases."
1.8.5 Statistics is only one of the methods of studying a problem: Statistical methods do not provide a complete solution to a problem, because problems have to be studied taking the background of the country's culture, philosophy or religion into consideration. Thus the statistical study should be supplemented by other evidence.
1.9 Distrust of Statistics
It is often said that "statistics can prove anything." There are three types of lies: lies, damned lies and statistics, wicked in the order of their naming. A Paris banker said, "Statistics is like a miniskirt, it covers up essentials but gives you the ideas." Thus by "distrust of statistics" we mean lack of confidence in statistical statements and methods. The following reasons account for such views about statistics. Figures are convincing and, therefore, people easily believe them. They can be manipulated in such a manner as to establish foregone conclusions. The wrong representation of even correct figures can mislead a reader. For example, John earned $4000 in 1990-1991 and Jem earned $5000. Reading this, one would form the opinion that Jem is decidedly a better worker than John. However, if we carefully examine the statement, we might reach a different conclusion, as Jem's earning period is unknown to us. Thus while working with statistics one should not only avoid outright falsehoods but also be alert to detect possible distortion of the truth.
classifying and tabulating raw data for processing and further tabulation for end users.
1.10.2 To make it easy to understand complex and large data : This is done by presenting the data in the form of tables, graphs, diagrams etc., or by condensing the data with the help of means, dispersion etc.
1.10.3 For comparison : Tables, measures of means and dispersion can help in
schedule, based on the relevant sales figures. It is used in forecasting future demands.
1.10.5 Enlarging individual experiences : Complex problems can be well understood by statistics, as the conclusions drawn by an individual are more definite and precise than mere statements on facts.
1.10.6 In measuring the magnitude of a phenomenon : Statistics has made it possible to count the population of a country, the industrial growth, the agricultural growth, the educational level (of course in numbers).
1.11 Types of Statistics
As mentioned earlier, for a layman or people in general, statistics means numbers: numerical facts, figures or information. The branch of statistics wherein we record and analyze observations for all the individuals of a group or population and draw inferences about the same is called "Descriptive statistics" or "Deductive statistics". On the other hand, if we choose a sample and, by statistical treatment of it, draw inferences about the population, then this branch of statistics is known as "Statistical Inference" or "Inductive Statistics". In our discussion, we are mainly concerned with two ways of representing descriptive statistics: numerical and pictorial.
1. Numerical statistics are numbers. But some numbers are more meaningful than others, such as the mean, standard deviation etc.
2. When the numerical data is presented in the form of pictures (diagrams) and graphs, it is called pictorial statistics. This kind of statistics makes confusing and complex data or information easy, simple and straightforward, so that even the layman can understand it without much difficulty.
1.12 Common Mistakes Committed in Interpretation of Statistics
1.12.1 Bias: Bias means prejudice or preference of the investigator, which creeps in consciously or unconsciously in proving a particular point.
1.12.2 Generalization: Sometimes, on the basis of the little data available, one may jump to a conclusion, which leads to erroneous results.
1.12.3 Wrong conclusion: The characteristics of a group, if attached to an individual member of that group, may lead us to draw absurd conclusions.
1.12.4 Incomplete classification: If we fail to give a complete classification, the influence of various factors may not be properly understood.
1.12.5 There may be a wrong use of percentages.
1.12.6 Technical mistakes may also occur.
1.12.7 An inconsistency in definition can even exist.
1.12.8 Wrong causal inferences may sometimes be drawn.
1.12.9 There may also be a misuse of correlation.
Chapter One: End Chapter Quizzes
1. The statement, "Statistics is both a science and an art", was given by
a- R. A. Fisher b- Tippett c- L. R. Connor d- A. L. Bowley
2.
a- Singular
3. stated by
b- W. I. King d- A. L. Boddington
4.
5. Who stated that there are three kinds of lies: lies, damned lies and statistics?
b- Disraeli d- G. W. Snedecor
6.
a- a single value
7.
8.
a- positive
9.
10. Who originally gave the formula for the estimation of errors of the type
a- L. R. Connor b- W. I. King c- A. L. Bowley d- A. L. Boddington
2.1 Primary Data
The foundation of statistical investigation lies in data, so utmost care must be taken while collecting data. If the collected data are inaccurate or inadequate, the whole analysis and interpretation will also become misleading and unreliable. The method of collection of data depends upon the nature, object and scope of the statistical enquiry on the one hand and the availability of time and money on the other. Data, or facts, may be derived from several sources. Data can be classified as primary data and secondary data. Primary data is data gathered for the first time by the researcher. So if the investigator himself collects the data for the purpose of his enquiry and uses the data, it is called collection of primary data. These data are original in nature. According to Horace Secrist, "by primary data are meant those data which are original, that is, those in which little or no grouping has been made, the instances being recorded or itemized as encountered. They are essentially raw material."
personally contacts the informants and collects the data. This method of data collection is suitable where the field of enquiry is limited or the nature of inquiry is confidential.
cases where informants are reluctant to give information, so information is gathered from those who possess information on the problem under investigation. The informants are called witnesses. This method of investigation is normally used by enquiry committees and commissions.
2.2.3 Information through correspondence : Under this method, the investigator appoints local agents or correspondents in different parts of the field of enquiry. They send information on specific issues to the investigator on a regular basis. This method is generally adopted by various television news channels, newspapers and periodicals.
2.2.4 Mailed questionnaire method : Under this method, a questionnaire is prepared by the investigator containing questions on the problem under investigation. These questionnaires are mailed to various informants, who are requested to return them by mail after answering the questions. A covering letter is also enclosed requesting the informants to reply before a specific date.
2.2.5 Schedule to be filled in by the enumerator : Under this method, enumerators are appointed areawise. They contact the informants and the information is filled in by them in the schedules. The enumerators should be honest, painstaking and tactful, as they have to deal with people of different natures.
Hence, the data obtained from published or unpublished sources are known as secondary data. There are many advantages in searching for and analyzing such data before attempting the collection of primary data. In some cases, the secondary data itself may be sufficient to solve the problem. Usually the cost of gathering secondary data is much lower than the cost of organizing primary data. Moreover, secondary data has several supplementary uses. It also helps to plan the collection of primary data, in case that becomes necessary. Blair has rightly defined secondary data as "those already in existence and which have been collected for some other purpose than the answering of the question at hand."
Secondary data is of two kinds, internal and external. Secondary data, whether internal or external, is data already collected by others, for purposes other than the solution of the problem on hand. Business firms always have a great deal of internal secondary data with them. Sales statistics constitute the most important component of secondary data in marketing and the researcher uses them extensively. All the output of the MIS of the firm generally constitutes internal secondary data. This data is readily available; the market researcher gets it without much effort, time and money.
No marketing research study should be undertaken without a prior search of secondary sources (also termed desk research). There are several grounds for making such a bold statement.
Secondary data may be available which is entirely appropriate and wholly adequate to draw conclusions and answer the question or solve the problem. Sometimes primary data collection simply is not necessary.
It is far cheaper to collect secondary data than to obtain primary data. For the same level of research budget a thorough examination of secondary sources can yield a great deal more information than can be had through a primary data collection exercise.
The time involved in searching secondary sources is much less than that needed to complete primary data collection.
Secondary sources of information can yield more accurate data than that obtained through primary research. This is not always true, but where a government or international agency has undertaken a large scale survey, or even a census, this is likely to yield far more accurate results than custom designed and executed surveys when these are based on relatively small sample sizes.
It should not be forgotten that secondary data can play a substantial role in the exploratory phase of the research, when the task at hand is to define the research problem and to generate hypotheses. The assembly and analysis of secondary data almost invariably improves the researcher's understanding of the marketing problem, the various lines of inquiry that could or should be followed and the alternative courses of action which might be pursued.
Secondary sources help define the population. Secondary data can be extremely useful both in defining the population and in structuring the sample to be taken. For instance, government statistics on a country's agriculture will help decide how to stratify a sample and, once sample estimates have been calculated, these can be used to project those estimates to the population.
Financial data: An organisation has a great deal of data within its files on the cost of producing, storing, transporting and marketing each of its products and product lines. Such data has many uses in marketing research including allowing measurement of the efficiency of marketing operations. It can also be used to estimate the costs attached to new products under consideration, of particular utilisation (in production, storage and transportation) at which an organisation's unit costs begin to fall.
Transport data: Companies that keep good records relating to their transport operations are well placed to establish which are the most profitable routes, and loads, as well as the most cost effective routing patterns. Good data on transport operations enables the enterprise to perform trade-off analysis and thereby establish whether it makes economic sense to own or hire vehicles, or the point at which a balance of the two gives the best financial outcome.
Storage data: The rate of stockturn, stockhandling costs, assessing the efficiency of certain marketing operations and the efficiency of the marketing system as a whole. More sophisticated accounting systems assign costs to the cubic space occupied by individual products and the time period over which the product occupies the space. These systems can be further refined so that the profitability per unit, and rate of sale, are added. In this way, the direct product profitability can be calculated.
sources is futile. Consequently, only a specified search is made with no real expectation of sources. Cursory researches become a self-fulfilling prophecy. Dillon et al. give the following advice: "You should never begin a half-hearted search with the assumption that what is being sought is so unique that no one else has ever bothered to collect it and publish it. On the contrary, assume there are scrolling secondary data that should help provide definition and scope for the primary research effort." The same authors support their advice by citing the large numbers of organisations that provide marketing information, including national and local government agencies, quasi-government agencies, trade associations, universities, research institutes, financial institutions, specialist suppliers of secondary marketing data and professional marketing research enterprises. Dillon et al. further advise that searches of printed sources of secondary data begin with referral texts such as directories, indexes, handbooks and guides. These sorts of publications rarely provide the data in which the researcher is interested but serve in helping him/her locate potentially useful data sources.
The main sources of external secondary data are:
(1) Government (federal, state and local)
(2) Trade associations
(3) Commercial services
(4) National and international institutions.
Government statistics: These may include all or some of the following: population censuses; social surveys and family expenditure surveys; import/export statistics.
Trade associations: Trade associations differ widely in the extent of their data collection and information dissemination activities. However, it is worth checking with them to determine what they do publish. At the very least one would normally expect that they would produce a trade directory and, perhaps, a yearbook.
Commercial services: Published market research reports and other publications are available from a wide range of organisations which charge for their information. Typically, marketing people are interested in media statistics and consumer information which has been obtained from large scale consumer or farmer panels. The commercial organisation funds the collection of the data, which is wide ranging in its content, and hopes to make its money from selling this data to interested parties.
National and international institutions: Bank economic reviews, university research reports, journals and articles are all useful sources to contact. International agencies such as World Bank, IMF, IFAD, UNDP, ITC, FAO and ILO produce a plethora of secondary data which can prove extremely useful to the marketing researcher.
published, statistics and figures are available on the internet either free or for a fee.
The yellow pages of telephone directories and stand-alone yellow pages have become an established source of elementary business information. Tata Press, which first launched a stand-alone yellow pages directory for Mumbai City, and GETIT yellow pages have been leading in this field. Today, yellow pages publications are available for all cities and major towns in the country. New Horizons, a joint venture between the Living Media group of publications and Singapore Telecom, has been publishing stand-alone directories for specific businesses. Business India data base of the Business India publications had been publishing the Delhi Pages directory.
The Thomas Register is the world's most powerful industrial buying guide. It ensures a fast, frictionless flow of information between buyers and sellers of industrial goods and services. This purchasing tool is now available in India. The Thomas Register of Indian Manufacturers, or TRIM, is India's first dedicated manufacturer-to-manufacturer register. It features 120,000 listings of 40,000 industrial manufacturers and industrial service categories. It is available in print, on CD and on the internet.
The Source Directory brought out by Mumbai-based Source Publishers is another example. It covers contact information on advertising agencies and related services and products, music companies, market research agencies, marketing and sales promotion consultants, publications, radio stations, cable and satellite stations and telemarketing services, among others. It currently has editions for metro cities.
The Industrial Product Finder (IPF): IPF details the many applications of new products and tells what is available and from whom. Most manufacturers of industrial products ensure that a description of their product is published in IPF before it hits the market.
Phone data service: Agencies providing phone data services have also come up in major cities in recent times. Melior Communication, for example, offers a tele-data service. Basic data on a number of subjects/products can be had through a call to the agency. The service is termed Tell me Business through phone service. Its main aim, like that of yellow pages, is to bring buyers and sellers of products together. It also provides some elementary databank support to researchers.
careful handling. Such figures may refer to any one of the following: the land an individual owns, the land an individual owns plus any additional land he/she rents, the land an individual owns minus any land he/she rents out, all of his land or only that part of it which he actually cultivates. It should be noted that definitions may change over time, and where this is not recognised erroneous conclusions may be drawn. Geographical areas may have their boundaries redefined, units of measurement and grades may change and imported goods can be reclassified from time to time for purposes of levying customs and excise duties.
Measurement error: When a researcher conducts fieldwork she/he is possibly able to estimate inaccuracies in measurement through the standard deviation and standard error, but these are sometimes not published in secondary sources. The only solution is to try to speak to the individuals involved in the collection of the data to obtain some guidance on the level of accuracy of the data. The problem is sometimes not so much error but differences in levels of accuracy required by decision makers. When the research has to do with large investments in, say, food manufacturing, management will want to set very tight margins of error in making market demand estimates. In other cases, having a high level of accuracy is not so critical. For instance, if a food manufacturer is merely assessing the prospects for one more flavour for a snack food already produced by the company then there is no need for highly accurate estimates in order to make the investment decision.
Source bias: Researchers have to be aware of vested interests when they consult secondary sources. Those responsible for their compilation may have reasons for wishing to present a more optimistic or pessimistic set of results for their organization. It is not unknown, for example, for officials responsible for estimating food shortages to exaggerate figures before sending aid requests to potential donors. Similarly, and with equal frequency, commercial organizations have been known to inflate estimates of their market shares.
Reliability: The reliability of published statistics may vary over time. It is not uncommon, for example, for the systems of collecting data to have changed over time but without any indication of this to the reader of published statistics. Geographical or administrative boundaries may be changed by government, or the basis for stratifying a sample may have altered. Other aspects of research methodology that affect the reliability of secondary data are the sample size, response rate, questionnaire design and modes of analysis.
Time scale: Most censuses take place at 10 year intervals, so data from this and other published sources may be out-of-date at the time the researcher wants to make use of the statistics. The time period during which secondary data was first compiled may have a substantial effect upon the nature of the data. For instance, the significant increase in the price obtained for Ugandan coffee in the mid-90s could be interpreted as evidence of the effectiveness of the rehabilitation programme that set out to restore coffee estates which had fallen into a state of disrepair. However, more knowledgeable coffee market experts would interpret the rise in Ugandan coffee prices in the context of large scale destruction of the Brazilian coffee crop, due to heavy frosts, in 1994, Brazil being the largest coffee producer in the world.
Whenever possible, marketing researchers ought to use multiple sources of secondary data. In this way, these different sources can be cross-checked as confirmation of one another. Where differences occur an explanation for these must be found or the data should be set aside.
In secondary data, the information relates to a past period. Hence it lacks timeliness and is therefore of limited value. Primary data is more useful, as it shows the latest information.
Secondary data is obtained from an organization other than the one directly concerned with the current research project. It was collected and analyzed by that organization to meet the requirements of its own research objectives. Primary data is gathered by the researcher specifically to meet the research objective of the current project.
Secondary data, though old, may be the only possible source of the desired data on subjects for which primary data cannot be had at all. For example, survey reports or secret records already collected by a business group can offer information that cannot be obtained from original sources.
The form in which secondary data are accumulated and delivered may not suit the exact needs and particular requirements of the current research study. Many a time, alterations or modifications to fit the exact needs of the investigator may not be sufficient. To that extent the usefulness of secondary data will be lost. Primary data is completely tailor-made and there is no problem of adjustment.
Secondary data is available easily, quickly and inexpensively. Primary data takes a lot of time to collect and the unit cost of such data is relatively high.
2.
3. respondents
a- b- c- d-
4. Statistical data are collected for
a- collecting data without any purpose b- a given purpose c- any purpose d- none of the above
5. Method of complete enumeration is applicable for
a- Knowing the production b- Knowing the population c- d-
6. A statistical population may consist of
a- an infinite number of items b- a finite number of items c- either of (a) and (b) d- none of (a) and (b)
7. Which of the following examples does not constitute an infinite population?
a- Population consisting of odd numbers b- Population of weights of newly born babies c- Population of heights of 15-year-old children d- Population of heads and tails in tossing a coin successively
8. Which of the following can be classified as a hypothetical population?
a- All labourers of a factory b- Female population of a factory c- Population of real numbers between 0 and 100 d- Students of the world
9. A study based on complete enumeration is known as
a- sample survey b- pilot survey c- census survey d- none of the above
d- universally true
which the original data exist, e.g., kg, rupee, years etc.
3.3.2 Relative Dispersion : Absolute dispersion does not allow comparison between two series, especially when the statistical unit is not the same. Hence, absolute dispersion has to be converted into a relative measure of dispersion. Relative dispersion is measured in ratio form. It is also called the coefficient of dispersion.
The measures of central tendency (i.e. means) indicate the general magnitude of the data and locate only the center of a distribution of measures. They do not establish the degree of variability, or the spread or scatter of the individual items and their deviation from (or difference with) the means.
i) According to Neiswanger, "Two distributions of statistical data may be symmetrical and have common means, medians and modes and identical frequencies in the modal class. Yet with these points in common they may differ widely in the scatter or in their values about the measures of central tendencies."
ii) Simpson and Kafka said, "An average alone does not tell the full story. It is hardly fully representative of a mass, unless we know the manner in which the individual items scatter around it. A further description of a series is necessary, if we are to gauge how representative the average is."
From this discussion we now focus our attention on the scatter or variability, which is known as dispersion. Let us take the following three sets.
Students   Group X   Group Y   Group Z
1          50        45        30
2          50        50        45
3          50        55        75
Mean       50        50        50

Thus, the three groups have the same mean, i.e. 50. In fact the medians of groups X and Y are also equal. If one were to say that the students of the three groups are of equal capability, it would be a totally wrong conclusion. Close examination reveals that in group X every student's mark equals the mean, in group Y the marks are very close to the mean, but in the third group Z the marks are widely scattered. It is thus clear that a measure of central tendency alone is not sufficient to describe the data.
3.4 Features of an ideal measure of dispersion
An ideal measure of dispersion must possess the following features:
1. Simple to understand
2. Easy to compute
3. Well defined measure
4. Based on all the items of data
5. Capable of algebraic treatment
6. Should not be affected by the extreme items.
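The comparison of the three groups X, Y and Z discussed above can be checked with a short Python sketch. It uses only the marks given in the text; the variable names and the simple "spread" figure (largest mark minus smallest mark) are choices made for this illustration.

```python
from statistics import mean, median

# Marks of the three students in each group, as given in the text
groups = {
    "X": [50, 50, 50],
    "Y": [45, 50, 55],
    "Z": [30, 45, 75],
}

summary = {}
for name, marks in groups.items():
    spread = max(marks) - min(marks)   # gap between highest and lowest mark
    summary[name] = (mean(marks), median(marks), spread)
    print("Group", name, "mean:", mean(marks),
          "median:", median(marks), "spread:", spread)
```

All three groups share the same mean, yet the spread grows from group X to group Z, which is exactly what an average alone fails to show.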
3.5.1 Range
In any statistical series, the difference between the largest and the smallest values is called the range.
Coefficient of Range : The relative measure of the range. It is used in the comparative study of dispersion.
Coefficient of Range = (L - S) / (L + S), where L is the largest value and S is the smallest value.

Example (Individual Series) Find the range and the coefficient of range of the following items: 110, 117, 129, 197, 190, 100, 100, 178, 255, 790.
Solution
R = L - S = 790 - 100 = 690
Coefficient of Range = (L - S) / (L + S) = (790 - 100) / (790 + 100) = 690/890 = 0.775

Example (Discrete Series) Find the range and the coefficient of range of the following items:
x    8    10   12   13   14   17
f    3    8    12   10   6    4
Solution
R = L - S = 17 - 8 = 9
Coefficient of Range = (L - S) / (L + S) = (17 - 8) / (17 + 8) = 9/25 = 0.36

Example (Continuous Series) Find the range and the coefficient of range of the following items:
X (Marks)      0-10   10-20   20-30   30-40   40-50
F (Students)   5      8       12      6       4
Solution
Range = L - S = 50 - 0 = 50
Coefficient of Range = (L - S) / (L + S) = (50 - 0) / (50 + 0) = 50/50 = 1
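The three range examples above can be reproduced with a small Python sketch. The function name `range_and_coefficient` is an assumption made for this illustration. For the discrete series only the x values matter, and for the continuous series the lower limit of the first class and the upper limit of the last class are taken as S and L, as in the worked solution.

```python
def range_and_coefficient(values):
    """Return (range, coefficient of range) = (L - S, (L - S) / (L + S))."""
    largest, smallest = max(values), min(values)
    value_range = largest - smallest
    return value_range, value_range / (largest + smallest)


# Individual series
individual = [110, 117, 129, 197, 190, 100, 100, 178, 255, 790]
ind_range, ind_coeff = range_and_coefficient(individual)

# Discrete series: frequencies do not affect the range, only the x values do
discrete_x = [8, 10, 12, 13, 14, 17]
dis_range, dis_coeff = range_and_coefficient(discrete_x)

# Continuous series: use the extreme class limits 0 and 50
class_limits = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
extremes = [class_limits[0][0], class_limits[-1][1]]
con_range, con_coeff = range_and_coefficient(extremes)

print(ind_range, round(ind_coeff, 3))
print(dis_range, round(dis_coeff, 2))
print(con_range, con_coeff)
```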
If we concentrate on two extreme values ( as in the case of range ), we dont get any idea about the scatter of the data within the range ( i.e. the two extreme values ). If we discard these two values the limited range thus available might be more informative. For this reason the concept of interquartile range is developed. It is the range which includes middle 50% of the distribution. Here 1/4 ( one quarter of the lower end and 1/4 ( one quarter ) of the upper end of the observations are excluded.
Now the lower quartile (Q1) is the 25th percentile and the upper quartile (Q3) is the 75th percentile. It is interesting to note that the 50th percentile is the middle quartile (Q2), which is in fact what you have studied under the title "Median". Thus, symbolically,
Interquartile range = Q3 - Q1
If we divide (Q3 - Q1) by 2 we get what is known as the semi-interquartile range or quartile deviation:
Q.D. = (Q3 - Q1) / 2, where Q1 = First quartile and Q3 = Third quartile
Relative or Coefficient of Q.D.: To find the coefficient of Q.D., we divide the semi-interquartile range by half the sum of the two quartiles. Symbolically:
Coefficient of Q.D. = [(Q3 - Q1) / 2] / [(Q3 + Q1) / 2] = (Q3 - Q1) / (Q3 + Q1)
Example (Individual Series)
Find the quartile deviation and its co-efficient from the following items:
X (marks): 5, 8, 10, 12, 15, 9, 11, 12, 15, 20
Solution
S. No.    X (Marks)    X in ascending order
1            5            5
2            8            8
3           10            9
4           12           10
5           15           11
6            9           12
7           11           12
8           12           15
9           15           15
10          20           20
Q1 = (N + 1)/4 th item, where N = No. of items in the data
Q1 = (10 + 1)/4 = 11/4 = 2.75th item
2.75th item = 2nd item + (3rd item - 2nd item) × 75/100 = 8 + (9 - 8) × 0.75 = 8 + 0.75 = 8.75
Q3 = 3 (N + 1)/4 th item = 3 (10 + 1)/4 = 33/4 = 8.25th item
8.25th item = 8th item + (9th item - 8th item) × 25/100 = 15 + (15 - 15)/4 = 15 + 0 = 15
Q.D. = (Q3 - Q1) / 2 = (15 - 8.75) / 2 = 3.125
Coefficient of Q.D. = (Q3 - Q1) / (Q3 + Q1) = (15 - 8.75) / (15 + 8.75) = 6.25 / 23.75 = 0.26
Example (Discrete Series)
Find the quartile deviation and its co-efficient from the following data:
Solution
Central size of items (x)    Frequency (f)    c.f.
2                              2                2
3                              3                5
4                              5               10
5                              6               16
6                              8               24
7                             12               36
8                             16               52
9                              7               59
10                             5               64
11                             4               68
N = 68
Q1 = (N + 1)/4 th item = (68 + 1)/4 = 69/4 = 17.25th item
The 17.25th item lies in c.f. 24, against which the value of x is 6. Therefore Q1 = 6.
Q3 = 3 (N + 1)/4 th item = 3 (68 + 1)/4 = (3 × 69)/4 = 51.75th item
The 51.75th item lies in c.f. 52, against which the value of x is 8. Therefore Q3 = 8.
Q.D. = (Q3 - Q1) / 2 = (8 - 6) / 2 = 1
Coefficient of Q.D. = (Q3 - Q1) / (Q3 + Q1) = (8 - 6) / (8 + 6) = 2 / 14 = 0.143
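The individual-series procedure above (locate the (N + 1)/4 th and 3(N + 1)/4 th items and interpolate between neighbouring values) can be sketched in Python as follows. The helper names `item_at` and `quartile_deviation` are illustrative assumptions, and the sketch follows the textbook positioning rule rather than the quantile conventions of any particular library.

```python
def item_at(sorted_values, position):
    """Value at a 1-based, possibly fractional position, by linear interpolation."""
    lower = int(position)            # whole part, e.g. 2 for the 2.75th item
    fraction = position - lower      # fractional part, e.g. 0.75
    if lower < 1:
        return sorted_values[0]
    if lower >= len(sorted_values):
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)


def quartile_deviation(values):
    """Return (Q1, Q3, Q.D., coefficient of Q.D.) for an individual series."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = item_at(ordered, (n + 1) / 4)
    q3 = item_at(ordered, 3 * (n + 1) / 4)
    return q1, q3, (q3 - q1) / 2, (q3 - q1) / (q3 + q1)


marks = [5, 8, 10, 12, 15, 9, 11, 12, 15, 20]
q1, q3, qd, coefficient = quartile_deviation(marks)
print(q1, q3, qd, round(coefficient, 2))
```

Statistical packages often use slightly different positioning rules for quartiles, so their results may differ a little from the (N + 1)/4 convention used in this material.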
3.5.3 Mean Deviation
Average deviation (mean deviation) is "the average amount of variation (scatter) of the items in a distribution from either the mean or the median or the mode, ignoring the signs of these deviations" (Clark and Senkade).
Individual Series
Steps:
(1) Find the mean or median or mode of the given series.
(2) Using any one of the three, find the deviations (differences) of the items of the series from it, i.e. xi - x̄, xi - Me or xi - Mo, where Me = Median and Mo = Mode.
(3) Find the absolute values of these deviations, i.e. ignore their positive (+) and negative (-) signs: | xi - x̄ |, | xi - Me | or | xi - Mo |.
(4) Find the sum of these absolute deviations, i.e. Σ | xi - x̄ |, Σ | xi - Me | or Σ | xi - Mo |.
(5) Divide this sum by the number of items n to obtain the mean deviation, e.g. M.D. = Σ | xi - Me | / n.
Note that:
(i) Generally, the M.D. obtained from the median is the best for practical purposes.
(ii) Co-efficient of M.D. = M.D. divided by the average (mean, median or mode) from which the deviations were taken.
Merits and Demerits of Mean Deviation
Merits
1. This method is based on all the items of the data.
2. The mean deviation is relatively less affected by the extreme items.
3. It is a better technique of dispersion in relation to range and quartile deviation.
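The steps for an individual series can be sketched in Python. As an illustration only, the sketch reuses the ten marks from the quartile deviation example and takes deviations from the median; the function name `mean_deviation` is an assumption of this sketch.

```python
from statistics import median


def mean_deviation(values, centre):
    """Average of the absolute deviations of the values from the chosen centre."""
    return sum(abs(v - centre) for v in values) / len(values)


marks = [5, 8, 10, 12, 15, 9, 11, 12, 15, 20]
me = median(marks)                 # step (1): median of the series
md = mean_deviation(marks, me)     # steps (2) to (5)
coefficient = md / me              # co-efficient of M.D. about the median
print(me, md, round(coefficient, 3))
```

Passing the mean or the mode as `centre` instead gives the mean deviation about that average.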
Example Calculate Mean deviation and its co-efficient for the following salaries: $ 1030, $ 500, $ 680, $ 1100, $ 1080, $ 1740. $ 1050, $ 1000, $ 2000, $ 2250, $ 3500 and $ 1030.
Calculations :
i) Median (Me) = Size of = Size of 11th item. Therefore, Median ( Me) = 8 ii) M. D. =
Example ( Continuous series ) Calculate the mean deviation and the coefficient of mean deviation from the following data using the mean. Difference in ages between boys and girls of a class.
Diff. in years     0-5    5-10    10-15    15-20    20-25    25-30    30-35    35-40
No. of students    449    705     507      281      109       52       16        4
Calculation: 1) X
2) M. D.
Thus, s.d. (x) = sqrt( Σ fi (xi - x̄)² / n ), where n = Σ fi
Merits:
(1) It is rigidly defined and based on all observations.
(2) It is amenable to further algebraic treatment.
(3) It is less affected by sampling fluctuations than other measures of dispersion.
(4) It is less erratic.
Demerits:
(1) It is difficult to understand and calculate.
(2) It gives greater weight to extreme values.
and s. d. ( x ) = Then V ( x ) =
and
3.5.5 Co-efficient of Variation (C.V.)
To compare the variations (dispersion) of two different series, relative measures of standard deviation must be calculated. This is known as the co-efficient of variation or the co-efficient of s.d. Its formula is
C.V. = (s.d. / x̄) × 100
Thus it is defined as the ratio of the s.d. to the mean.
Remark: It is given as a percentage and is used to compare the consistency or variability of two or more series. The higher the C.V., the higher the variability, and the lower the C.V., the higher the consistency of the data.
Example
Calculate the standard deviation and its co-efficient from the following data.
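Item     A    B    C    D    E    F    G    H    I    J
Value   10   12   16    8   25   30   14   11   13   11
Solution
No.      xi     (xi - x̄)     (xi - x̄)²
A        10       -5            25
B        12       -3             9
C        16       +1             1
D         8       -7            49
E        25      +10           100
F        30      +15           225
G        14       -1             1
H        11       -4            16
I        13       -2             4
J        11       -4            16
n = 10   Σ xi = 150   Σ (xi - x̄) = 0   Σ (xi - x̄)² = 446
Calculations:
i) x̄ = Σ xi / n = 150 / 10 = 15
ii) s.d. = sqrt( Σ (xi - x̄)² / n ) = sqrt( 446 / 10 ) = sqrt( 44.6 ) = 6.68
iii) C.V. = (s.d. / x̄) × 100 = (6.68 / 15) × 100 = 44.5% (approximately)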
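The standard deviation and co-efficient of variation for the ten values A to J can be verified with a short Python sketch. It assumes the convention used in this material's grouped example, namely dividing the sum of squared deviations by n (not by n - 1); the function name `std_dev_and_cv` is illustrative.

```python
import math


def std_dev_and_cv(values):
    """Return (mean, s.d., C.V. in percent), dividing by n for the variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    sd = math.sqrt(variance)
    return mean, sd, 100 * sd / mean


values = [10, 12, 16, 8, 25, 30, 14, 11, 13, 11]
mean, sd, cv = std_dev_and_cv(values)
print(mean, round(sd, 2), round(cv, 1))
```

When two series are compared, the one with the smaller C.V. is the more consistent, as stated in the remark above.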
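Example
Calculate the s.d. of the marks of 100 students.
Marks    No. of students (fi)    Midvalues (xi)    fi xi    fi xi²
0-2           10                      1              10        10
2-4           20                      3              60       180
4-6           35                      5             175       875
6-8           30                      7             210      1470
8-10           5                      9              45       405
n = 100   Σ fi xi = 500   Σ fi xi² = 2940
Solution
1) x̄ = Σ fi xi / n = 500 / 100 = 5
2) s.d. = sqrt( Σ fi xi² / n - x̄² ) = sqrt( 2940 / 100 - 5² ) = sqrt( 29.4 - 25 ) = sqrt( 4.4 ) = 2.10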
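For a frequency distribution, the same calculation can be sketched from the midvalues and frequencies in the table above. The function name `grouped_std_dev` is illustrative, and the shortcut form s.d. = sqrt( Σ fi xi² / n - x̄² ) is assumed, with n = Σ fi.

```python
import math


def grouped_std_dev(midvalues, frequencies):
    """Return (n, mean, s.d.) of a frequency distribution given class midvalues."""
    n = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(midvalues, frequencies))
    sum_fx2 = sum(f * x * x for x, f in zip(midvalues, frequencies))
    mean = sum_fx / n
    variance = sum_fx2 / n - mean ** 2   # mean of squares minus square of mean
    return n, mean, math.sqrt(variance)


midvalues = [1, 3, 5, 7, 9]
frequencies = [10, 20, 35, 30, 5]
n, mean, sd = grouped_std_dev(midvalues, frequencies)
print(n, mean, round(sd, 2))
```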
1. Which of the following is not a measure of dispersion?
a- mean deviation
b- quartile deviation
c- standard deviation
d- average deviation from mean
2. Which of the following is a unit less measure of dispersion?
a- standard deviation
b- mean deviation
c- coefficient of variation
d- range
3. Which one of the given measures of dispersion is considered best?
a- standard deviation
b- range
c- variance
d- coefficient of variation
4. For comparison of two different series, the best measure of dispersion is
a- range
b- mean deviation
c- standard deviation
d- none of the above
5. Out of all measures of dispersion, the easiest one to calculate is
a- standard deviation
b- range
c- variance
d- quartile deviation
6. Mean deviation is minimum when deviations are taken from
a- mean
b- median
c- mode
d- zero
7. Sum of squares of the deviations is minimum when deviations are taken from
a- mean
b- median
c- mode
d- zero
8. Which measure of dispersion is least affected by extreme values ? a. b. c. d. 9. is called a. b. c. d. 10. variance absolute deviation standard deviation mean deviation range mean deviation standard deviation quartile deviation
c. d.
Although the two distributions have the same means and standard deviations, they are not identical. Where do they differ? They differ in symmetry. The left-hand distribution is a symmetrical one, whereas the distribution on the right-hand side is asymmetrical or skewed. For a symmetrical distribution, the values at equal distances on either side of the mode have equal frequencies. Thus the mode, median and mean all coincide. Its curve rises slowly, reaches a maximum (peak) and falls equally slowly (Fig. 1). But for a skewed distribution, the mean, mode and median do not coincide. Skewness is positive or negative according to whether the mean and median lie on the right or on the left of the mode. A positively skewed distribution curve (Fig. 2) rises rapidly, reaches the maximum and falls slowly. In other words, the tail as well as the median are on the right-hand side. A negatively skewed distribution curve (Fig. 3) rises slowly, reaches its maximum and falls rapidly. In other words, the tail as well as the median are on the left-hand side.
Size         1    2    3    4    5    6    7
Frequency   12   13   14   15   14   13   12

Size         1    2    3    4    5    6    7
Frequency    4    6   12   10    8    7    3

Size         1    2    3    4    5    6    7
Frequency    3    7    8   10   12    6    4
4.3 Difference between Skewness and Dispersion
Dispersion refers to the spread or variation of items in a series, while skewness refers to the direction of variation in a series. Thus, with skewness we measure the lack of symmetry in the distribution. Skewness may be positive or negative, depending on whether the value of the mode lies on the right or on the left side of the distribution.
4.4 Tests of Skewness
A distribution is skewed if:
1. The values of mean, median and mode do not coincide. The more the difference between them, the more is the skewness.
2. Quartiles are not equidistant from the median, i.e. (Q3 - Me) ≠ (Me - Q1).
3. The sum of positive deviations from the median is not equal to the sum of the negative deviations.
4. Frequencies are not equally distributed at points of equal deviation from the mode.
5. When the data are plotted on a graph they do not give the normal bell-shaped form.
4.5 Methods of measurement of Skewness
1. First measure of skewness
It is given by Karl Pearson.
Measure of skewness: Skp = Mean - Mode = x̄ - Mo
Co-efficient of skewness: J = (x̄ - Mo) / s.d.
Pearson has suggested the use of the following formula if it is not possible to determine the mode (Mo) of a distribution. For a moderately skewed distribution, (Mean - Mode) = 3 (Mean - Median) approximately, so that Skp = 3 (x̄ - Me). Thus
J = 3 (x̄ - Me) / s.d.
Note:
i) Although a co-efficient of skewness generally lies within ±1, Karl Pearson's co-efficient lies within ±3.
ii) If J = 0, then there is no skewness.
iii) If J is positive, the skewness is also positive.
iv) If J is negative, the skewness is also negative.
Unless otherwise indicated, you must use only Karl Pearson's formula.
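As a rough illustration of Karl Pearson's median-based co-efficient, J = 3 (x̄ - Me) / s.d., the sketch below applies it to the ten values used in the co-efficient of variation example. The function name `pearson_skewness` is an assumption of this sketch, and the s.d. is computed by dividing by n (the population form, `statistics.pstdev`).

```python
from statistics import mean, median, pstdev


def pearson_skewness(values):
    """Karl Pearson's co-efficient of skewness using the median: 3 (mean - median) / s.d."""
    return 3 * (mean(values) - median(values)) / pstdev(values)


values = [10, 12, 16, 8, 25, 30, 14, 11, 13, 11]
j = pearson_skewness(values)
print(round(j, 3))
```

The result is positive because the two large values (25 and 30) pull the mean above the median, i.e. the tail lies on the right-hand side.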
Example Find Karl Pearsons coefficient of skewness from the following data:
Marks above        0     10     20     30     40     50     60     70     80
No. of students   150    140    100     80     80     70     30     14      0
Note: You will generally find different values of J when it is calculated by Karl Pearson's and by Bowley's formula. But the value of J by Bowley's formula always lies within ±1.
Example
The following table gives the frequency distribution of 291 workers of a factory according to their average monthly income in 1945-55.
Income group ($)    No. of workers
Below 50                 1
50 - 70                 16
70 - 90                 39
90 - 110                58
110 - 130               60
130 - 150               46
150 - 170               22
170 - 190               15
190 - 210               15
210 - 230                9
230 & above             10
Solution
Income group     f      c.f.
Below 50         1        1
50 - 70         16       17
70 - 90         39       56
90 - 110        58      114
110 - 130       60      174
130 - 150       46      220
150 - 170       22      242
170 - 190       15      257
190 - 210       15      272
210 - 230        9      281
230 & above     10      291
n = Σ f = 291
= Size of
item
= =
2. For a negatively skewed distribution, the correct inequality is
a- mode < median
b- mean < median
c- mean < mode
d- none of the above
3. In case of a positively skewed distribution, the relation between mean, median and mode that holds is
a- median > mean > mode
b- mean > median > mode
c- mean = median = mode
d- none of the above
4. For a positively skewed frequency curve, the inequality that holds is
a- Q1 + Q3 > 2Q2
b- Q1 + Q2 > 2Q3
c- Q1 + Q3 > Q2
d- Q3 - Q1 > Q2
5. If a moderately skewed distribution has mean 30 and mode 36, the median of the distribution is
a- 10
b- 35
c- 20
d- zero
6. First and third quartile of a frequency distribution are 30 and 75. Also its coefficient of skewness is 0.6. The median of the frequency distribution is
a- 40
b- 39
c- 41
d- 38
7. For a negatively skewed distribution, the correct relation between mean, median and mode is
a- mean = median = mode
b- median < mean < mode
c- mean < median < mode
d- mode < mean < median
8. In the case of a positively skewed distribution, the extreme values lie in the
a- left tail
b- right tail
c- middle
d- anywhere
9. The extreme values in a negatively skewed distribution lie in the
a- middle
b- right tail
c- left tail
d- whole curve
10. Which of the following statements is true for measures of deviation?
a- mean deviation does not follow algebraic rule
b- range is a crudest measure
c- coefficient of variation is a relative measure
d- all the above statements
isn't perfect. People of the same height vary in weight, and you can easily think of two people you know where the shorter one is heavier than the taller one. Nonetheless, the average weight of people 5'5'' is less than the average weight of people 5'6'', and their average weight is less than that of people 5'7'', etc. Correlation can tell you just how much of the variation in people's weights is related to their heights. Although this correlation is fairly obvious, your data may contain unsuspected correlations. You may also suspect there are correlations, but not know which are the strongest. An intelligent correlation analysis can lead to a greater understanding of your data.
Correlation means the study of the existence, magnitude and direction of the relation between two or more variables. Correlation is very important in technology and in statistics. The famous astronomer Bravais, Prof. Sir Francis Galton, Karl Pearson (who used this concept in Biology and in Genetics), Prof. Neiswanger and many others have contributed to this great subject.
5.2 Definitions:
"An analysis of the covariation of two or more variables is usually called correlation." (A. M. Tuttle)
"Correlation analysis attempts to determine the degree of relationship between variables." (Ya Lun Chou)
"The effect of correlation is to reduce the range of uncertainty of one's prediction." (Tippett)
The second caveat is that the Pearson correlation technique works best with linear relationships: as one variable gets larger, the other gets larger (or smaller) in direct proportion. It does not work well with curvilinear relationships (in which the relationship does not follow a straight line). An example of a curvilinear relationship is age and health care. They are related, but the relationship doesn't follow a straight line. Young children and older people both tend to use much more health care than teenagers or young adults. Multiple regression (also included in the Statistics Module) can be used to examine curvilinear relationships, but it is beyond the scope of this article.
Correlation Example
Let's assume that we want to look at the relationship between two variables, height (in inches) and self esteem. Perhaps we have a hypothesis that how tall you are affects your self esteem (incidentally, I don't think we have to worry about the direction of causality here -- it's not likely that self esteem causes your height!). Let's say we collect some information on twenty individuals (all male -- we know that the average height differs for males and females so, to keep this example simple, we'll just use males). Height is measured in inches. Self esteem is measured based on the average of 10 1-to-5 rating items (where higher scores mean higher self esteem). Here's the data for the 20 cases (don't take this too seriously -- I made this data up to illustrate what a correlation is):
Person    Height    Self Esteem
1           68         4.1
2           71         4.6
3           62         3.8
4           75         4.4
5           58         3.2
6           60         3.1
7           67         3.8
8           68         4.1
9           71         4.3
10          69         3.7
11          68         3.5
12          67         3.2
13          63         3.7
14          62         3.3
15          60         3.4
16          63         4.0
17          65         4.1
18          67         3.8
19          63         3.4
20          61         3.6
Now, let's take a quick look at the histogram for each variable:
And, here are the descriptive statistics:
Variable       Mean     StDev       Variance    Sum     Minimum   Maximum   Range
Height         65.4     4.40574     19.4105     1308    58        75        17
Self Esteem    3.755    0.426090    0.181553    75.1    3.1       4.6       1.5
Finally, we'll look at the simple bivariate (i.e., two-variable) plot:
You should immediately see in the bivariate plot that the relationship between the variables is a positive one (if you can't see that, review the section on types of relationships) because if you were to fit a single straight line through the dots it would have a positive slope or move up from left to right. Since the correlation is nothing more than a quantitative estimate of the relationship, we would expect a positive correlation. What does a "positive relationship" mean in this context? It means that, in general, higher scores on one variable tend to be paired with higher scores on the other and that lower scores on one variable tend to be paired with lower scores on the other. You should confirm visually that this is generally true in the plot above.
5.4 Types of Correlation
5.4.1 Positive and negative correlation
A) If two variables change in the same direction (i.e. if one increases the other also increases, or if one decreases the other also decreases), then this is called a positive correlation. For example: advertising and sales.
B) If two variables change in the opposite direction (i.e. if one increases the other decreases, and vice versa), then the correlation is called a negative correlation. For example: T.V. registrations and cinema attendance.
5.4.2 Linear and non-linear correlation
The nature of the graph gives us an idea of the type of correlation between two variables. If the graph is a straight line, the correlation is called a "linear correlation", and if the graph is not a straight line, the correlation is non-linear or curvi-linear. For example, if variable x changes by a constant quantity, say 20, then y also changes by a constant quantity, say 4. The ratio between the two always remains the same (1/5 in this case). In the case of a curvi-linear correlation this ratio does not remain constant.
and in the same proportion, the correlation between the two is perfect positive. According to Karl Pearson the coefficient of correlation in this case is +1. On the other hand, if the variables change in the opposite direction and in the same proportion, the correlation is perfect negative. Its coefficient of correlation is -1. In practice we rarely come across these types of correlations.
relations between them or change in variable does not lead to a change in the other variable, then we can firmly say that there is no correlation or absurd correlation between the two variables. In such a case the coefficient of correlation is 0.
5.5.3 Limited degrees of correlation: If two variables are not perfectly correlated, nor is there a perfect absence of correlation, then we term the correlation as limited correlation. It may be positive, negative or zero but lies within the limits ±1.
High degree, moderate degree or low degree are the three categories of this kind of correlation. The following table reveals the effect ( or degree ) of coefficient or correlation.
Degrees Absence of correlation Positive Zero Negative 0
type called partial correlation. The latter is useful when you want to look at the relationship between two variables while removing the effect of one or two other variables. Like all statistical techniques, correlation is only appropriate for certain kinds of data. Correlation works for quantifiable data in which numbers are meaningful, usually quantities of some sort. It cannot be used for purely categorical data, such as gender, brands purchased, or favorite color. Following are the techniques for determining the correlation :-
provide precise measurements. When working with rating scales, correlations provide general indications.
We use the symbol r to stand for the correlation. Through the magic of mathematics it turns out that r will always be between -1.0 and +1.0. If the correlation is negative, we have a negative relationship; if it's positive, the relationship is positive. You don't need to know how we came up with this formula unless you want to be a statistician. But you probably will need to know how the formula relates to real data -- how you can use the formula to compute the correlation. Let's look at the data we need for the formula. Here's the original data with the other necessary columns:
Person    Height (x)    Self Esteem (y)    x*y       x*x      y*y
1            68             4.1            278.8     4624     16.81
2            71             4.6            326.6     5041     21.16
3            62             3.8            235.6     3844     14.44
4            75             4.4            330       5625     19.36
5            58             3.2            185.6     3364     10.24
6            60             3.1            186       3600      9.61
7            67             3.8            254.6     4489     14.44
8            68             4.1            278.8     4624     16.81
9            71             4.3            305.3     5041     18.49
10           69             3.7            255.3     4761     13.69
11           68             3.5            238       4624     12.25
12           67             3.2            214.4     4489     10.24
13           63             3.7            233.1     3969     13.69
14           62             3.3            204.6     3844     10.89
15           60             3.4            204       3600     11.56
16           63             4              252       3969     16
17           65             4.1            266.5     4225     16.81
18           67             3.8            254.6     4489     14.44
19           63             3.4            214.2     3969     11.56
20           61             3.6            219.6     3721     12.96
Sum =      1308            75.1           4937.6    85912    285.45
The first three columns are the same as in the table above. The next three columns are simple computations based on the height and self esteem data. The bottom row consists of the sum of each column. This is all the information we need to compute the correlation. Here are the values from the bottom row of the table (where N is 20 people) as they are related to the symbols in the formula:
Now, when we plug these values into the formula given above, we get the following (I show it here tediously, one step at a time):
So, the correlation for our twenty cases is .73, which is a fairly strong positive relationship. I guess there is a relationship between height and self esteem, at least in this made up data!
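The result can be reproduced from the column sums of the table alone. The sketch below assumes the usual computational form of the product-moment formula, r = (N Σxy - Σx Σy) / sqrt( [N Σx² - (Σx)²] [N Σy² - (Σy)²] ); the function name `pearson_r_from_sums` is illustrative.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Product-moment correlation computed from column sums only."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Bottom row of the height / self esteem table (N = 20 people).
r = pearson_r_from_sums(
    n=20,
    sum_x=1308,       # sum of heights
    sum_y=75.1,       # sum of self esteem scores
    sum_xy=4937.6,    # sum of x*y
    sum_x2=85912,     # sum of x*x
    sum_y2=285.45,    # sum of y*y
)
print(round(r, 2))
```

Working from sums keeps the arithmetic short, at the cost of some rounding sensitivity when the two terms in each bracket are large and nearly equal.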
i) If all points lie on a rising straight line, the correlation is perfectly positive and r = +1 (see fig. 1).
ii) If all points lie on a falling straight line, the correlation is perfectly negative and r = -1 (see fig. 2).
iii) If the points lie in a narrow strip, rising upwards, the correlation is of a high degree and positive (see fig. 3).
iv) If the points lie in a narrow strip, falling downwards, the correlation is of a high degree and negative (see fig. 4).
v) If the points are spread widely over a broad strip, rising upwards, the correlation is of a low degree and positive (see fig. 5).
vi) If the points are spread widely over a broad strip, falling downwards, the correlation is of a low degree and negative (see fig. 6).
vii) If the points are spread (scattered) without any specific pattern, the correlation is absent, i.e. r = 0 (see fig. 7).
Though this method is simple and gives a rough idea about the existence and the degree of correlation, it is not reliable. As it is not a mathematical method, it cannot measure the degree of correlation.
where N = Number of pairs of observation Note : r is also known as product-moment coefficient of correlation.
OR r =
Example Calculate the coefficient of correlation between the heights of father and his son for the following data.
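Height of father (cm):   165   166   167   168   167   169   170   172
Height of son (cm):      167   168   165   172   168   172   169   171
Solution
Father (xi)   Son (yi)   x = xi - x̄   y = yi - ȳ    xy     x²     y²
165           167           -3           -2           6      9      4
166           168           -2           -1           2      4      1
167           165           -1           -4           4      1     16
168           172            0            3           0      0      9
167           168           -1           -1           1      1      1
169           172            1            3           3      1      9
170           169            2            0           0      4      0
172           171            4            2           8     16      4
Σxi = 1344    Σyi = 1352    Σx = 0       Σy = 0      Σxy = 24   Σx² = 36   Σy² = 44
Calculation:
x̄ = 1344 / 8 = 168 and ȳ = 1352 / 8 = 169
Now, r = Σxy / sqrt( Σx² × Σy² ) = 24 / sqrt( 36 × 44 ) = 24 / 39.80 = 0.6 (approximately)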
Since r is positive and equal to 0.6, the correlation is positive and moderate (i.e. direct and reasonably good).
Example
From the following data compute the coefficient of correlation between x and y.
Example
If the covariance between x and y is 12.3 and the variances of x and y are 16.4 and 13.8 respectively, find the coefficient of correlation between them.
Solution:
Given: Covariance = cov. (x, y) = 12.3
Variance of x (σx²) = 16.4
Variance of y (σy²) = 13.8
Now, r = cov. (x, y) / (σx σy) = 12.3 / sqrt( 16.4 × 13.8 ) = 12.3 / 15.04 = 0.82 (approximately)
R = 1 - 6 ΣD² / [N (N² - 1)]
where R = Rank correlation coefficient, D = Difference between the ranks of two items, N = The number of observations.
Note: -1 ≤ R ≤ 1.
i) When R = +1: perfect positive correlation or complete agreement in the same direction.
ii) When R = -1: perfect negative correlation or complete agreement in the opposite direction.
Computation:
i. Give ranks to the values of items. Generally the item with the highest value is ranked 1 and then the others are given ranks 2, 3, 4, ... according to their values in decreasing order.
ii. Find the difference D = R1 - R2, where R1 = Rank of x and R2 = Rank of y. Note that ΣD = 0 (always).
iii. Calculate D² and then find ΣD².
iv. Apply the formula.
Note: In some cases, there is a tie between two or more items. In such a case each item is given the average of the ranks the tied items would otherwise occupy. If two items have ranks 4th and 5th respectively, then they are given (4 + 5) / 2 = 4.5th rank each. If three items are of equal rank, say 4th, then they are given (4 + 5 + 6) / 3 = 5th rank each. If m is the number of items of equal rank, the factor (m³ - m) / 12 is added to ΣD². If there is more than one such case, this factor is added as many times as the number of such cases. Then
R = 1 - 6 [ ΣD² + Σ (m³ - m) / 12 ] / [N (N² - 1)]
Student No.    D = R1 - R2    D² = (R1 - R2)²
1                 -2               4
2                  2               4
3                  3               9
4                  0               0
5                 -2               4
6                 -3               9
7                 -5              25
8                  2               4
9                 -1               1
10                 6              36
N = 10           ΣD = 0          ΣD² = 96
Calculation of R:
R = 1 - 6 ΣD² / [N (N² - 1)] = 1 - (6 × 96) / (10 × 99) = 1 - 576 / 990 = 1 - 0.582 = 0.418
Example
Calculate R of 6 students from the following data.
Marks in Stats:      40   42   45   35   36   39
Marks in English:    46   43   44   39   40   43
Solution:
Marks in Stats    Marks in English    R1    R2     D = R1 - R2    D² = (R1 - R2)²
40                46                  3     1         2              4
42                43                  2     3.5      -1.5            2.25
45                44                  1     2        -1              1
35                39                  6     6         0              0
36                40                  5     5         0              0
39                43                  4     3.5       0.5            0.25
N = 6                                                ΣD = 0         ΣD² = 7.50
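The ranking and tie handling for this example can be sketched in Python. The sketch assumes the usual tie-corrected form of Spearman's formula, in which (m³ - m) / 12 is added to ΣD² for every group of m tied values; the function names are illustrative and not part of the study material.

```python
from collections import Counter


def average_ranks(values):
    """Rank values with the highest as 1; tied values share the average rank."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for value in values:
        first_position = ordered.index(value) + 1   # best rank in the tied group
        tie_count = ordered.count(value)            # size of the tied group
        ranks.append(first_position + (tie_count - 1) / 2)
    return ranks


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_rank_correlation(xs, ys):
    """Return (sum of D squared, R) using the tie-corrected rank formula."""
    n = len(xs)
    r1, r2 = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    corrected = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return sum_d2, 1 - 6 * corrected / (n * (n ** 2 - 1))


stats_marks = [40, 42, 45, 35, 36, 39]
english_marks = [46, 43, 44, 39, 40, 43]
sum_d2, r = spearman_rank_correlation(stats_marks, english_marks)
print(sum_d2, round(r, 3))
```

Here the only tie is the pair of 43s in English (m = 2), which adds (2³ - 2) / 12 = 0.5 to ΣD² before the formula is applied.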
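Example
The value of Spearman's rank correlation coefficient for a certain number of pairs of observations was found to be 2/3. The sum of the squares of the differences between the corresponding ranks was 55. Find the number of pairs.
Solution:
We have R = 2/3 and ΣD² = 55. Substituting in R = 1 - 6 ΣD² / [N (N² - 1)]:
2/3 = 1 - (6 × 55) / [N (N² - 1)]
330 / [N (N² - 1)] = 1/3
N (N² - 1) = 990 = 10 × 99
Therefore N = 10, i.e. there are 10 pairs of observations.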
Example A panel of two judges A and B graded dramatic performance by independently awarding marks as follows:
Solution
Inserting x = 38, we get
y - 33 = 0.74 (38 - 33)
y - 33 = 0.74 × 5
y - 33 = 3.7
y = 3.7 + 33
y = 36.7 = 37 (approximately)
Therefore, Judge B would have given 37 marks to the 8th performance.
2. Correlation coefficient was invented in the year
a- 1910
b- 1890
c- 1908
d- none of the above
3. The unit of correlation coefficient is
a- kg/cc
b- per cent
c- non-existing
d- none of the above
4. [question missing in source]
5. [question text missing in source]
a- the signs of the deviations
b- the magnitude of deviation
c- both (a) and (b)
d- none of (a) and (b)
6. [question missing in source]
7. [question missing in source]
8. Another name of autocorrelation is
a- biserial correlation
b- serial correlation
c- Spearman's correlation
d- none of the above
9. [expression missing in source] means that
[options missing in source]
10. The correlation between the two variables is unity, there is
a- perfect correlation
b- perfect positive correlation
c- perfect negative correlation
d- no correlation