
CONTENT

ACKNOWLEDGEMENT
OBJECTIVES
INTRODUCTION
TASK SPECIFICATION
PROBLEM SOLVING
FURTHER EXPLORATION
REFLECTION

ACKNOWLEDGEMENT
First of all, I would like to say Alhamdulillah for giving me the strength and health to do this project work and finish it. I would also like to thank my parents and my sister for providing everything related to my project: their advice, which I needed most in carrying out this project, and facilities such as the internet, books and a printer. They supported and encouraged me to complete this task so that I would not procrastinate, as I have done before. Then I would like to thank my teacher, Mr. Saw Seong Moh, for guiding me throughout this project. Although I was lazy at first and ran into difficulties, he taught me patiently until I understood what I was supposed to do. He also waited patiently for my project because I did not hand it in on time. I would also like to thank my friends, who have always supported me. We all had the same difficulties in doing this project and almost went mad, yet we managed to get through it. Even though this is an individual project, we cooperated, especially in discussing and sharing ideas, to ensure that we did it right and finished it completely. Last but not least, I thank any party involved, either directly or indirectly, in this project work.

I would like to say THANK YOU for supporting me and giving me advice.

OBJECTIVES
Develop mathematical knowledge in a way which increases students' interest and confidence,

Apply mathematics to everyday situations and begin to understand the part that mathematics plays in the world in which we live,

Improve thinking skills and promote effective mathematical communication,

Assist students to develop positive attitudes and personalities, and intrinsic mathematical values such as accuracy, confidence and systematic reasoning,

Stimulate learning and enhance effective learning.


INTRODUCTION

Vision 2020 aims to produce balanced human capital in terms of physical, emotional, spiritual and intellectual development, in accordance with the national education philosophy. In order to develop the intellectual aspect, every individual should have the ability to analyze data.


The picture above shows students in secondary school sitting their SPM examination. The school examination secretary collects the marks for each subject to determine the average grade for the subject and the average grade for the school, which give a picture of the school's performance. Data representation reflects the general characteristics of data, allowing us to compare, and thus to predict and plan for the future. Data analysis is a process used to transform, remodel and revise certain information (data) with a view to reaching a conclusion for a given situation or problem. It can be done by different methods according to needs and requirements. For example, a school principal may want to know whether there is a relationship between students' performance on the district writing assessment and their socioeconomic levels. In other words, do students who come from lower socioeconomic backgrounds perform lower, as we are led to believe? Or are there other variables responsible for the variance in writing performance? A simple correlation analysis will help describe the students' performance and help explain the relationship between performance and socioeconomic level.

Analysis does not have to involve complex statistics. Data analysis in schools involves collecting data and using that data to improve teaching and learning. Interestingly, principals and teachers have it pretty easy: in most cases, the collection of data has already been done. Schools regularly collect attendance data, transcript records, discipline referrals, quarterly or semester grades, norm- and criterion-referenced test scores, and a variety of other useful data. Rather than complex statistical formulas and tests, it is generally simple counts, averages, percentages and rates that educators are interested in.

TASK SPECIFICATION
PART 1:
1. List the importance of data analysis in daily life.
2. (a) Specify
      (i) three types of measures of central tendency
      (ii) at least two types of measure of dispersion
   (b) For each type of measure of central tendency stated in (a), give examples of their uses in daily life.

PART 2:
1. Get your class marks of any subject in one examination/test. Attach the mark sheet.
2. Calculate the
   (a) mean
   (b) median
   (c) mode
   (d) standard deviation
   for the above marks.
3. Construct a frequency distribution table as in Table 1 which contains at least five class intervals of equal size. Choose a suitable class size.

   Table 1
   Class Intervals (Marks)    Frequency

   (a) From Table 1 find the
       (i) mean
       (ii) mode
       (iii) median (at least two methods)
       (iv) standard deviation (at least two methods)
       (v) interquartile range (at least two methods)
   (b) Based on your answer from 3(a) above, state the most appropriate measure of central tendency that reflects the performance of your class. Give your reason.
   (c) Measure of dispersion is a measurement used to determine how far the values of data in a set of data are spread out from its average value. Explain the advantages of using standard deviation compared to interquartile range as the better measure of dispersion.
4. Ungrouped data and grouped data have been used to obtain the mean and standard deviation in questions (2) and (3) respectively.
   (a) Determine which type of data gives a more accurate representation. Give your reasons.
   (b) State the conditions when grouped data and ungrouped data are preferred.

PART 3:
Based on your grouped data, answer the following questions.
1. Your teacher will add 3 marks for each student in your class for completing all their assignments. Make a conjecture for the new values of the following:
   (a) mean
   (b) mode
   (c) median
   (d) interquartile range
   (e) standard deviation
   Verify your answer using ICT.
2. A new student has just enrolled in your class. The student scored 97% in his/her former school. If the student's mark is taken into account in the analysis of your school examination/test, calculate the new mean and the new standard deviation.

PROBLEM SOLVING

PART 1:

1. IMPORTANCE OF DATA ANALYSIS IN DAILY LIFE

There are many benefits of data analysis; the most important ones are as follows. Data analysis helps in structuring the findings from different sources of data collection, such as survey research. It is also very helpful in breaking a macro problem into micro parts. Data analysis acts like a filter when it comes to acquiring meaningful insights from a huge data set. Every researcher has to sort through the large amount of data that he or she has collected before reaching a conclusion on the research question. Data analysis provides a meaningful basis for critical decisions. It also helps to create a complete dissertation proposal.

One of the most important uses of data analysis is that it helps keep human bias away from research conclusions through proper statistical treatment. With the help of data analysis, a researcher can filter both qualitative and quantitative data for an assignment or writing project. Thus, data analysis is of utmost importance for both the research and the researcher. To put it another way, data analysis is as important to a researcher as diagnosing the patient's problem is to a doctor before giving any treatment.


Data analysis is a process used to transform, remodel and revise certain information (data) with a view to reaching a conclusion for a given situation or problem. It can be done by different methods according to needs and requirements. For example, a school principal may want to know whether there is a relationship between students' performance on the district writing assessment and their socioeconomic levels. In other words, do students who come from lower socioeconomic backgrounds perform lower, as we are led to believe? Or are there other variables responsible for the variance in writing performance? A simple correlation analysis will help describe the students' performance and help explain the relationship between performance and socioeconomic level.

Analysis does not have to involve complex statistics. Data analysis in schools involves collecting data and using that data to improve teaching and learning. Interestingly, principals and teachers have it pretty easy: in most cases, the collection of data has already been done. Schools regularly collect attendance data, transcript records, discipline referrals, quarterly or semester grades, norm- and criterion-referenced test scores and a variety of other useful data. Rather than complex statistical formulas and tests, it is generally simple counts, averages, percentages and rates that educators are interested in.

Data plays an important role in the information we receive on a daily basis from environmental print, newspapers, television, magazines, the Internet, etc. Other areas of mathematics are deeply embedded in this strand of the curriculum. When working through data analysis activities, students naturally draw upon other mathematical skills such as understanding of number, operations, patterning, and various problem-solving strategies. Students view various forms of data in many other areas of the curriculum, such as prediction charts in Science, population graphs in Social Studies, or informational text in Language Arts. For students, the process of data analysis is not only interesting, but constitutes real problem solving linked to many aspects of their environment.

2. (a)(i) Three types of measure of central tendency.

Measures of Central Tendency


Introduction
A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. As such, measures of central tendency are sometimes called measures of central location. They are also classed as summary statistics. The mean (often called the average) is most likely the measure of central tendency that you are most familiar with, but there are others, such as the median and the mode.

The mean, median and mode are all valid measures of central tendency, but under different conditions some measures become more appropriate to use than others. In the following sections, we will look at the mean, mode and median, and learn how to calculate them and under what conditions they are most appropriate.

Mean (Arithmetic)
The mean (or average) is the most popular and well-known measure of central tendency. It can be used with both discrete and continuous data, although it is most often used with continuous data. The mean is equal to the sum of all the values in the data set divided by the number of values in the data set. So, if we have n values in a data set and they have values x1, x2, ..., xn, the sample mean, usually denoted by $\bar{x}$ (pronounced "x bar"), is:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

This formula is usually written in a slightly different manner using the Greek capital letter $\Sigma$, pronounced "sigma", which means "sum of...":

$$\bar{x} = \frac{\sum x}{n}$$

You may have noticed that the above formula refers to the sample mean. So, why have we called it a sample mean? This is because, in statistics, samples and populations have very different meanings and these differences are very important, even if, in the case of the mean, they are calculated in the same way. To acknowledge that we are calculating the population mean and not the sample mean, we use the Greek lower case letter "mu", denoted as $\mu$:

$$\mu = \frac{\sum x}{n}$$

The mean is essentially a model of your data set: a single value that summarises it. You will notice, however, that the mean is often not one of the actual values that you have observed in your data set. However, one of its important properties is that it minimises error in the prediction of any one value in your data set. That is, it is the value that produces the lowest amount of error from all other values in the data set. An important property of the mean is that it includes every value in your data set as part of the calculation. In addition, the mean is the only measure of central tendency where the sum of the deviations of each value from the mean is always zero.

When not to use the mean
The mean has one main disadvantage: it is particularly susceptible to the influence of outliers. These are values that are unusual compared to the rest of the data set by being especially small or large in numerical value. For example, consider the wages of staff at a factory below:

Staff     1     2     3     4     5     6     7     8     9     10
Salary    15k   18k   16k   14k   15k   15k   12k   17k   90k   95k

The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests that this mean value might not be the best way to reflect the typical salary of a worker, as most workers have salaries in the $12k to $18k range. The mean is being skewed by the two large salaries. Therefore, in this situation, we would like to have a better measure of central tendency. As we will find out later, the median would be a better measure of central tendency in this situation. Another time when we usually prefer the median over the mean (or mode) is when our data is skewed (i.e., the frequency distribution for our data is skewed). Consider the normal distribution, as this is the one most frequently assessed in statistics: when the data is perfectly normal, the mean, median and mode are identical. Moreover, they all represent the most typical value in the data set. However, as the data becomes skewed, the mean loses its ability to provide the best central location for the data because the skewed data is dragging it away from the typical value. The median, however, best retains this position and is not as strongly influenced by the skewed values. This is explained in more detail in the skewed distribution section later in this guide.
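The effect of the two large salaries can be checked with a short Python sketch. It uses only the ten salaries from the table above and the standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Salaries (in thousands of dollars) of the ten factory staff in the table.
salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean_salary = mean(salaries)      # pulled upward by the 90k and 95k salaries
median_salary = median(salaries)  # middle of the sorted values

print(f"mean   = {mean_salary}k")
print(f"median = {median_salary}k")
```

With these values the mean is 30.7k while the median is 15.5k, which sits inside the $12k to $18k range where most of the staff fall.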


Median
The median is the middle score for a set of data that has been arranged in order of magnitude. The median is less affected by outliers and skewed data. In order to calculate the median, suppose we have the data below:

65 55 89 56 35 14 56 55 87 45 92

We first need to rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89 92

Our median mark is the middle mark, in this case 56 (the 6th value). It is the middle mark because there are 5 scores before it and 5 scores after it. This works fine when you have an odd number of scores, but what happens when you have an even number of scores? What if you had only 10 scores? Well, you simply have to take the middle two scores and average the result. So, if we look at the example below:

65 55 89 56 35 14 56 55 87 45

We again rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89

Only now we have to take the 5th and 6th scores in our data set (55 and 56) and average them to get a median of 55.5.
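The two cases (odd and even number of scores) can be sketched in Python as follows. The helper name `middle_value` is hypothetical, and the result is compared with `statistics.median` from the standard library as a cross-check.

```python
from statistics import median

odd_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]   # 11 scores
even_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]      # 10 scores


def middle_value(scores):
    """Return the median: sort, then take the middle score(s)."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle score exists.
        return ordered[mid]
    # Even count: average the two middle scores.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(middle_value(odd_scores))   # median of the 11 scores
print(middle_value(even_scores))  # median of the 10 scores
```

The hand-written helper and the library function agree on both data sets (56 and 55.5).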

Mode
The mode is the most frequent score in our data set. On a bar chart or histogram it is represented by the highest bar. You can, therefore, sometimes consider the mode as being the most popular option. An example of a mode is presented below:


Normally, the mode is used for categorical data where we wish to know which is the most common category, as illustrated below:

We can see above that the most common form of transport, in this particular data set, is the bus. However, one of the problems with the mode is that it is not unique, so it leaves us with problems when we have two or more values that share the highest frequency, such as below:


We are now stuck as to which mode best describes the central tendency of the data. This is particularly problematic when we have continuous data because we are then unlikely to have any one value that is more frequent than the others. For example, consider measuring the weight of 30 people (to the nearest 0.1 kg). How likely is it that we will find two or more people with exactly the same weight (e.g., 67.4 kg)? The answer is that it is very unlikely: many people might be close, but with such a small sample (30 people) and a large range of possible weights, you are unlikely to find two people with exactly the same weight to the nearest 0.1 kg. This is why the mode is very rarely used with continuous data. Another problem with the mode is that it will not provide us with a very good measure of central tendency when the most common mark is far away from the rest of the data in the data set, as depicted in the diagram below:


In the above diagram the mode has a value of 2. We can clearly see, however, that the mode is not representative of the data, which is mostly concentrated around the 20 to 30 value range. To use the mode to describe the central tendency of this data set would be misleading.

Skewed Distributions and the Mean and Median
We often test whether our data is normally distributed because this is a common assumption underlying many statistical tests. An example of a normally distributed set of data is presented below:


When you have a normally distributed sample you can legitimately use either the mean or the median as your measure of central tendency. In fact, in any symmetrical distribution with a single peak the mean, median and mode are equal. However, in this situation, the mean is widely preferred as the best measure of central tendency because it is the measure that includes all the values in the data set in its calculation, and any change in any of the scores will affect the value of the mean. This is not the case with the median or mode. However, when our data is skewed, for example, as with the right-skewed data set below:


we find that the mean is being dragged in the direction of the skew. In these situations, the median is generally considered to be the best representative of the central location of the data. The more skewed the distribution, the greater the difference between the median and mean, and the greater the emphasis that should be placed on using the median as opposed to the mean. A classic example of the above right-skewed distribution is income (salary), where higher earners give a false representation of the typical income if it is expressed as a mean and not a median. If the data are assumed to be normal but tests of normality show that they are not, it is customary to use the median instead of the mean. However, this is more a rule of thumb than a strict guideline. Sometimes, researchers wish to report the mean of a skewed distribution if the median and mean are not appreciably different (a subjective assessment), and if it allows easier comparisons with previous research.

Summary of when to use the mean, median and mode

Please use the following summary table to find the best measure of central tendency for each type of variable.

Type of Variable                 Best measure of central tendency
Nominal                          Mode
Ordinal                          Median
Interval/Ratio (not skewed)      Mean
Interval/Ratio (skewed)          Median


(ii) Two types of measure of dispersion.

Measure of Dispersion

Introduction
Measures of average such as the median and mean represent the typical value for a dataset. Within the dataset the actual values usually differ from one another and from the average value itself. The extent to which the median and mean are good representatives of the values in the original dataset depends upon the variability or dispersion in the original data. Datasets are said to have high dispersion when they contain values considerably higher and lower than the mean value. In figure 1 the numbers of different-sized tutorial groups in semester 1 and semester 2 are presented. In both semesters the mean and median tutorial group size is 5 students; however, the groups in semester 2 show more dispersion (or variability in size) than those in semester 1. Dispersion within a dataset can be measured or described in several ways, including the range, inter-quartile range and standard deviation.


The Range
The range is the most obvious measure of dispersion and is the difference between the lowest and highest values in a dataset. In figure 1, the size of the largest semester 1 tutorial group is 6 students and the size of the smallest group is 4 students, resulting in a range of 2 (6-4). In semester 2, the largest tutorial group size is 7 students and the smallest tutorial group contains 3 students, therefore the range is 4 (7-3). The range is simple to compute and is useful when you wish to evaluate the whole of a dataset. The range is useful for showing the spread within a dataset and for comparing the spread between similar datasets. An example of the use of the range to compare spread within datasets is provided in table 1. The scores of individual students in the examination and coursework component of a module are shown.


To find the range in marks, the highest and lowest values need to be found from the table. The highest coursework mark was 48 and the lowest was 27, giving a range of 21. In the examination, the highest mark was 45 and the lowest 12, producing a range of 33. This indicates that there was wider variation in the students' performance in the examination than in the coursework for this module. Since the range is based solely on the two most extreme values within the dataset, if one of these is either exceptionally high or low (sometimes referred to as an outlier) it will result in a range that is not typical of the variability within the dataset. For example, imagine in the above example that one student failed to hand in any coursework and was awarded a mark of zero, but sat the exam and scored 40. The range for the coursework marks would now become 48 (48-0), rather than 21; however, the new range is not typical of the dataset as a whole and is distorted by the outlier in the coursework marks. In order to reduce the problems caused by outliers in a dataset, the inter-quartile range is often calculated instead of the range.

Range = maximum value - minimum value

Interquartile range = Q3 - Q1


The Inter-quartile Range
The inter-quartile range is a measure that indicates the extent to which the central 50% of values within the dataset are dispersed. It is based upon, and related to, the median. In the same way that the median divides a dataset into two halves, it can be further divided into quarters by identifying the upper and lower quartiles. The lower quartile is found one quarter of the way along a dataset when the values have been arranged in order of magnitude; the upper quartile is found three quarters of the way along the dataset. Therefore, the upper quartile lies halfway between the median and the highest value in the dataset whilst the lower quartile lies halfway between the median and the lowest value in the dataset. The inter-quartile range is found by subtracting the lower quartile from the upper quartile. For example, the examination marks for 20 students following a particular module are arranged in order of magnitude.

The median lies at the mid-point between the two central values (10th and 11th) = half-way between 60 and 62 = 61
The lower quartile lies at the mid-point between the 5th and 6th values = half-way between 52 and 53 = 52.5
The upper quartile lies at the mid-point between the 15th and 16th values = half-way between 70 and 71 = 70.5

The inter-quartile range for this dataset is therefore 70.5 - 52.5 = 18, whereas the range is 80 - 43 = 37. The inter-quartile range provides a clearer picture of the overall dataset by ignoring the outlying values. Like the range, however, the inter-quartile range is a measure of dispersion that is based upon only two values from the dataset. Statistically, the standard deviation is a more powerful measure of dispersion because it takes into account every value in the dataset. The standard deviation is explored in the next section of this guide.

The Standard Deviation
The standard deviation is a measure that summarises the amount by which every value within a dataset varies from the mean. Effectively it indicates how tightly the values in the dataset are bunched around the mean value. It is the most widely used measure of dispersion since, unlike the range and inter-quartile range, it takes into account every value in the dataset. When the values in a dataset are tightly bunched together the standard deviation is small. When the values are spread apart the standard deviation will be relatively large. The standard deviation is usually presented in conjunction with the mean and is measured in the same units.

In many datasets the values deviate from the mean value due to chance, and such datasets are said to display a normal distribution. In a dataset with a normal distribution most of the values are clustered around the mean while relatively few values tend to be extremely high or extremely low. Many natural phenomena display a normal distribution. For datasets that have a normal distribution the standard deviation can be used to determine the proportion of values that lie within a particular range of the mean value. For such distributions approximately 68% of values are less than one standard deviation (1SD) away from the mean value, approximately 95% of values are less than two standard deviations (2SD) away from the mean and approximately 99.7% of values are less than three standard deviations (3SD) away from the mean. Figure 3 shows this concept in diagrammatical form.

If the mean of a dataset is 25 and its standard deviation is 1.6, then about 68% of the values in the dataset will lie between MEAN-1SD (25-1.6=23.4) and MEAN+1SD (25+1.6=26.6), and about 99.7% of the values will lie between MEAN-3SD (25-4.8=20.2) and MEAN+3SD (25+4.8=29.8). If the dataset had the same mean of 25 but a larger standard deviation (for example, 2.3) it would indicate that the values were more dispersed. The frequency distribution for a more dispersed dataset would still show a normal distribution, but when plotted on a graph the shape of the curve would be flatter, as in figure 4.
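The proportions of a normal distribution lying within 1, 2 and 3 standard deviations of the mean can be checked numerically. The sketch below is illustrative only: it uses `statistics.NormalDist` from the Python standard library, and the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")

# The worked example from the text: mean 25, standard deviation 1.6.
example = NormalDist(mu=25, sigma=1.6)
share_23_4_to_26_6 = example.cdf(26.6) - example.cdf(23.4)
print(f"between 23.4 and 26.6: {share_23_4_to_26_6:.4f}")
```

The proportions come out at roughly 0.68, 0.95 and 0.997, and the interval 23.4 to 26.6 gives the same proportion as 1SD on the standard curve, since the rule does not depend on the particular mean or standard deviation.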

Population and sample standard deviations
There are two different calculations for the standard deviation. Which formula you use depends upon whether the values in your dataset represent an entire population or whether they form a sample of a larger population. For example, if all student users of the library were asked how many books they had borrowed in the past month then the entire population has been studied, since all the students have been asked. In such cases the population standard deviation should be used. Sometimes it is not possible to find information about an entire population and it might be more realistic to ask a sample of 150 students about their library borrowing and use these results to estimate library borrowing habits for the entire population of students. In such cases the sample standard deviation should be used.

Formulae for the standard deviation

Whilst it is not necessary to learn the formula for calculating the standard deviation, there may be times when you wish to include it in a report or dissertation. The standard deviation of an entire population is known as $\sigma$ (sigma) and is calculated using:

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

where x represents each value in the population, $\mu$ is the mean value of the population, $\Sigma$ is the summation (or total), and N is the number of values in the population. The standard deviation of a sample is known as S and is calculated using:

$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

where x represents each value in the sample, $\bar{x}$ is the mean value of the sample, $\Sigma$ is the summation (or total), and n-1 is the number of values in the sample minus 1.

Calculating the standard deviation using Excel
Excel has functions to calculate the population and sample standard deviations. The appropriate commands are entered into the formula bar towards the top of the spreadsheet and the corresponding cells in the spreadsheet are updated to show the result. For an example of calculating the population standard deviation, imagine you wish to know how fuel-efficient a new car that you have just purchased is. You calculate how many kilometres you have done per litre on your first five trips. This information is presented as column A of the spreadsheet (figure 5). As you have only made 5 trips you do not have any further information and you are therefore measuring the whole population at this point in time. The command to find the population standard deviation in Excel is =STDEVP(VALUES) and in this case the command is =STDEVP(A2:A6), which gives an answer of 0.49. Basing your results on the population standard deviation, and assuming that your first 5 trips in your new car have been typical of your usual journeys, you can be 99% confident that your new car will do between 14.75 (MEAN-3SD) and 17.69 (MEAN+3SD) kilometres per litre.

The same data can be used to demonstrate how to calculate the sample standard deviation in Excel. In this case, imagine that the data in column A represent the kilometres per litre found for a sample of 5 new cars tested by the manufacturer. The sample standard deviation is calculated using =STDEV(VALUES) and in this case the command is =STDEV(A2:A6), which produces an answer of 0.55. The sample standard deviation will always be greater than the population standard deviation when they are calculated for the same dataset. This is because the formula for the sample standard deviation has to take into account the possibility of there being more variation in the true population than has been measured in the sample. Based on their sample of 5 cars, and therefore using the sample standard deviation, the manufacturers could state with 99% confidence that similar cars will do between 14.57 (MEAN-3SD) and 17.87 (MEAN+3SD) kilometres per litre.

These examples show the quick method of calculating standard deviations using a cell range. Each of the commands can also be written out in a longer format with the individual kilometres/litre entered. For example, entering =STDEV(16.13,16.40,15.81,17.07,15.69) produces an identical result to =STDEV(A2:A6). However, if one of the values in column A was found to be incorrect and adjusted, the cell range method would automatically update the calculation of the standard deviation, whereas the longer format would require manual adjustment of the command.

The range, inter-quartile range and standard deviation are all measures that indicate the amount of variability within a dataset. The range is the simplest measure of variability to calculate but can be misleading if the dataset contains extreme values. The inter-quartile range reduces this problem by considering the variability within the middle 50% of the dataset. The standard deviation is the most comprehensive measure of variability since it takes into account how every value in the dataset varies from the mean. However, care must be taken when calculating the standard deviation to consider whether the entire population or a sample is being examined and to use the appropriate formula.
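The two Excel results can be reproduced outside a spreadsheet. The following Python sketch is one possible check, assuming the five kilometres-per-litre values listed in the long-format command above; `pstdev` and `stdev` from the standard `statistics` module play the roles of STDEVP and STDEV respectively.

```python
from statistics import mean, pstdev, stdev

# Kilometres per litre for the five trips (column A of the spreadsheet).
km_per_litre = [16.13, 16.40, 15.81, 17.07, 15.69]

avg = mean(km_per_litre)
population_sd = pstdev(km_per_litre)  # divides by N, like =STDEVP(A2:A6)
sample_sd = stdev(km_per_litre)       # divides by n - 1, like =STDEV(A2:A6)

print(f"mean          = {avg:.2f}")
print(f"population SD = {population_sd:.2f}")
print(f"sample SD     = {sample_sd:.2f}")
print(f"mean +/- 3 population SD: {avg - 3 * population_sd:.2f} to {avg + 3 * population_sd:.2f}")
print(f"mean +/- 3 sample SD:     {avg - 3 * sample_sd:.2f} to {avg + 3 * sample_sd:.2f}")
```

Rounded to two decimal places this gives a mean of 16.22, a population standard deviation of 0.49 and a sample standard deviation of 0.55, in line with the figures quoted for Excel.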


PART 2:
1. Class marks of any subject in one examination/test

Mathematics test.

STUDENT   MARKS      STUDENT   MARKS
1         66         17        29
2         39         18        75
3         65         19        54
4         71         20        37
5         81         21        84
6         82         22        77
7         74         23        49
8         87         24        66
9         73         25        52
10        12         26        89
11        88         27        50
12        78         28        80
13        32         29        79
14        60         30        70
15        65         31        61
16        76         32        42
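For the ungrouped marks, the mean, median, mode and standard deviation can be obtained directly with ICT. The Python sketch below is one way to do so, not the method used in the original working. It assumes the 32 marks listed above and treats the class as a whole population, so it uses `pstdev` (divide by N); that choice is an assumption.

```python
from statistics import mean, median, multimode, pstdev

# Mathematics test marks for students 1 to 32, in mark-sheet order.
marks = [66, 39, 65, 71, 81, 82, 74, 87, 73, 12, 88, 78, 32, 60, 65, 76,
         29, 75, 54, 37, 84, 77, 49, 66, 52, 89, 50, 80, 79, 70, 61, 42]

avg = mean(marks)                 # sum of marks divided by 32
mid = median(marks)               # average of the 16th and 17th sorted marks
modes = sorted(multimode(marks))  # every mark sharing the highest frequency
sd = pstdev(marks)                # population standard deviation (assumption)

print(f"mean   = {avg}")
print(f"median = {mid}")
print(f"modes  = {modes}")
print(f"SD     = {sd:.2f}")
```

Note that `multimode` returns more than one value here, because 65 and 66 each occur twice; this is the non-uniqueness of the mode discussed in Part 1.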


Frequency Table.

Marks      Frequency
1-10       0
11-20      1
21-30      1
31-40      3
41-50      3
51-60      3
61-70      6
71-80      9
81-90      6
91-100     0
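The frequency table can be rebuilt from the raw marks as a check on the tallying. This is a minimal sketch, assuming class intervals of width 10 starting at 1 (1-10, 11-20, ...) as in the table; the helper name `class_interval` is hypothetical.

```python
from collections import Counter

marks = [66, 39, 65, 71, 81, 82, 74, 87, 73, 12, 88, 78, 32, 60, 65, 76,
         29, 75, 54, 37, 84, 77, 49, 66, 52, 89, 50, 80, 79, 70, 61, 42]


def class_interval(mark, width=10):
    """Return the (lower, upper) class limits containing an integer mark from 1 to 100."""
    lower = ((mark - 1) // width) * width + 1
    return (lower, lower + width - 1)


counts = Counter(class_interval(m) for m in marks)

# One row per class from 1-10 up to 91-100, including empty classes.
frequency_table = [((lo, lo + 9), counts.get((lo, lo + 9), 0))
                   for lo in range(1, 100, 10)]

for (lo, hi), f in frequency_table:
    print(f"{lo}-{hi}: {f}")
```

The counts produced this way match the frequency column above and add up to the 32 students in the class.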

(2)(a) Mean

MARKS     MIDPOINT, x   FREQUENCY, f   fx
1-10      5.5           0              0
11-20     15.5          1              15.5
21-30     25.5          1              25.5
31-40     35.5          3              106.5
41-50     45.5          3              136.5
51-60     55.5          3              166.5
61-70     65.5          6              393
71-80     75.5          9              679.5
81-90     85.5          6              513
91-100    95.5          0              0
TOTAL                   32             2036

Mean, $\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{2036}{32} = 63.625$
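The mean of the grouped data can be checked with a few lines of Python. This is a minimal sketch that assumes every mark in a class is represented by the class midpoint, as in the table; the variable names are chosen for this example.

```python
# Class midpoints and frequencies taken from the frequency table.
midpoints = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5, 65.5, 75.5, 85.5, 95.5]
frequencies = [0, 1, 1, 3, 3, 3, 6, 9, 6, 0]

total_f = sum(frequencies)                                      # sum of f
total_fx = sum(f * x for f, x in zip(frequencies, midpoints))   # sum of fx
mean = total_fx / total_f

print(f"sum f = {total_f}, sum fx = {total_fx}, mean = {mean}")
```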

(b) Median

The median is the value at the centre of a set of data arranged in order.

Method 1: Using Formula.

The median mark for the 32 students can be obtained by using the formula

$\text{Median} = L + \left(\dfrac{\frac{N}{2} - F}{f_m}\right) C$

Hint:
L = lower boundary of median class
N = total frequency
F = cumulative frequency before median class
fm = frequency of median class
C = class interval size

MARKS     LOWER BOUNDARY   UPPER BOUNDARY   FREQUENCY, f   CUMULATIVE FREQUENCY
1-10      0.5              10.5             0              0
11-20     10.5             20.5             1              1
21-30     20.5             30.5             1              2
31-40     30.5             40.5             3              5
41-50     40.5             50.5             3              8
51-60     50.5             60.5             3              11
61-70     60.5             70.5             6              17
71-80     70.5             80.5             9              26
81-90     80.5             90.5             6              32
91-100    90.5             100.5            0              32

Median class: N/2 = 32/2 = 16, and the 16th value lies in the class 61-70 (cumulative frequency 17).

L = 60.5, C = 70.5 - 60.5 = 10, fm = 6, N = 32, F = 11

$\text{Median} = 60.5 + \left(\dfrac{16 - 11}{6}\right)(10) = 68.83$
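The steps of the median formula can be traced in Python. This is a sketch only, assuming the usual grouped-data formula Median = L + ((N/2 - F) / fm) x C with the symbols defined in the hint; the variable names mirror those symbols and are otherwise arbitrary.

```python
from itertools import accumulate

# Lower class boundaries and frequencies from the table.
lower_boundaries = [0.5, 10.5, 20.5, 30.5, 40.5, 50.5, 60.5, 70.5, 80.5, 90.5]
frequencies = [0, 1, 1, 3, 3, 3, 6, 9, 6, 0]
C = 10                      # class interval size
N = sum(frequencies)        # total frequency

# Running total of the frequencies (the cumulative frequency column).
cumulative = list(accumulate(frequencies))

# Median class: the first class whose cumulative frequency reaches N/2.
i = next(k for k, cf in enumerate(cumulative) if cf >= N / 2)

L = lower_boundaries[i]                  # lower boundary of median class
F = cumulative[i - 1] if i > 0 else 0    # cumulative frequency before it
fm = frequencies[i]                      # frequency of median class

median = L + (N / 2 - F) / fm * C
print(f"L = {L}, F = {F}, fm = {fm}, median = {median:.2f}")
```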

(c) Mode

The modal class is 71-80, the class with the highest frequency (9).

(d) Standard deviation

Method 1:

MARKS     MIDPOINT, x   FREQUENCY, f   fx       fx²
1-10      5.5           0              0        0
11-20     15.5          1              15.5     240.25
21-30     25.5          1              25.5     650.25
31-40     35.5          3              106.5    3780.75
41-50     45.5          3              136.5    6210.75
51-60     55.5          3              166.5    9240.75
61-70     65.5          6              393      25741.5
71-80     75.5          9              679.5    51302.25
81-90     85.5          6              513      43861.5
91-100    95.5          0              0        0
TOTAL                   32             2036     141028

$\sigma = \sqrt{\dfrac{\sum fx^2}{\sum f} - \bar{x}^2} = \sqrt{\dfrac{141028}{32} - 63.625^2} = \sqrt{358.9844} \approx 18.95$

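Method 2:

MARKS     MIDPOINT, x   FREQUENCY, f   x - x̄      (x - x̄)²    f(x - x̄)²
1-10      5.5           0              -58.125    3378.5156   0
11-20     15.5          1              -48.125    2316.0156   2316.0156
21-30     25.5          1              -38.125    1453.5156   1453.5156
31-40     35.5          3              -28.125    791.0156    2373.0469
41-50     45.5          3              -18.125    328.5156    985.5469
51-60     55.5          3              -8.125     66.0156     198.0469
61-70     65.5          6              1.875      3.5156      21.0938
71-80     75.5          9              11.875     141.0156    1269.1406
81-90     85.5          6              21.875     478.5156    2871.0938
91-100    95.5          0              31.875     1016.0156   0
TOTAL                   32                                    11487.5

$\sigma = \sqrt{\dfrac{\sum f(x - \bar{x})^2}{\sum f}} = \sqrt{\dfrac{11487.5}{32}} = \sqrt{358.9844} \approx 18.95$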
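The two methods can be compared with a short Python sketch. It assumes that Method 1 is the shortcut form, the square root of (sum of fx² / sum of f - mean²), and that Method 2 is the definition form, the square root of (sum of f(x - mean)² / sum of f). Both forms are assumptions about what the two tables are meant to compute, and the variable names are made up for this example.

```python
import math

# Class midpoints and frequencies from the frequency table.
midpoints = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5, 65.5, 75.5, 85.5, 95.5]
frequencies = [0, 1, 1, 3, 3, 3, 6, 9, 6, 0]

n = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Method 1 (assumed): shortcut form using the fx^2 column.
sum_fx2 = sum(f * x ** 2 for f, x in zip(frequencies, midpoints))
sd_method_1 = math.sqrt(sum_fx2 / n - mean ** 2)

# Method 2 (assumed): definition form using deviations from the mean.
sum_f_dev2 = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints))
sd_method_2 = math.sqrt(sum_f_dev2 / n)

print(f"sum fx^2 = {sum_fx2}, sum f(x - mean)^2 = {sum_f_dev2}")
print(f"Method 1: {sd_method_1:.4f}   Method 2: {sd_method_2:.4f}")
```

Both forms are algebraically equivalent, so the two printed values should agree; a mismatch usually points to an arithmetic slip in one of the table columns.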


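(e) Interquartile range

Method 1: Using formula

$Q_1 = L + \left(\dfrac{\frac{N}{4} - F}{f_m}\right) C, \qquad Q_3 = L + \left(\dfrac{\frac{3N}{4} - F}{f_m}\right) C$

Q1 class: (1/4)(32) = 8, and the 8th value lies in the class 41-50.

L = 40.5, C = 10, fm = 3, N = 32, F = 5

$Q_1 = 40.5 + \left(\dfrac{8 - 5}{3}\right)(10) = 50.5$

Q3 class: (3/4)(32) = 24, and the 24th value lies in the class 71-80 (cumulative frequency 26).

L = 70.5, C = 10, fm = 9, N = 32, F = 17

$Q_3 = 70.5 + \left(\dfrac{24 - 17}{9}\right)(10) = 78.2778$

Interquartile range = Q3 - Q1 = 78.2778 - 50.5 = 27.7778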


APPROPRIATE MEASURE OF CENTRAL TENDENCY.

Of the measures of central tendency above, the mean is the most suitable, because the minimum value of the raw data is not extreme and the data appear to be clustered. The mode and the median, by contrast, do not take all the values in the data into account, which reduces their accuracy as measures of central tendency.


MEASURE OF DISPERSION.

A measure of dispersion determines how far the values in a set of data are spread out from the average value.

a) (1) Interquartile range

Method 1: Using Formula

Q1 class: (1/4)(32) = 8, and the 8th value lies in the class 41-50.

L = 40.5, F = 5, fm = 3, C = 10, N = 32

$Q_1 = 40.5 + \left(\dfrac{8 - 5}{3}\right)(10) = 50.5$

Q3 class: (3/4)(32) = 24, and the 24th value lies in the class 71-80 (cumulative frequency 26).

L = 70.5, F = 17, fm = 9, C = 10, N = 32

$Q_3 = 70.5 + \left(\dfrac{24 - 17}{9}\right)(10) = 78.2778$

Therefore, interquartile range = Q3 - Q1 = 78.2778 - 50.5 = 27.7778
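The quartile calculation can be sketched as a small Python function. This is an illustration under the assumption that both quartiles use the same interpolation formula as the median, with N/2 replaced by N/4 for Q1 and 3N/4 for Q3; the function name `grouped_quantile` is hypothetical and not taken from any library.

```python
def grouped_quantile(lower_boundaries, frequencies, class_size, fraction):
    """Estimate a quantile of grouped data as L + ((fraction*N - F) / fm) * C."""
    n = sum(frequencies)
    target = fraction * n   # position of the required value, e.g. N/4 for Q1
    cumulative = 0          # F: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        # The quantile class is the first non-empty class reaching the target.
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * class_size
        cumulative += f
    raise ValueError("frequency table is empty")


lower_boundaries = [0.5, 10.5, 20.5, 30.5, 40.5, 50.5, 60.5, 70.5, 80.5, 90.5]
frequencies = [0, 1, 1, 3, 3, 3, 6, 9, 6, 0]

q1 = grouped_quantile(lower_boundaries, frequencies, 10, 0.25)
q3 = grouped_quantile(lower_boundaries, frequencies, 10, 0.75)
iqr = q3 - q1
print(f"Q1 = {q1:.4f}, Q3 = {q3:.4f}, interquartile range = {iqr:.4f}")
```

Passing 0.5 as the fraction gives the median, so one function covers all three positions.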

(2) Standard deviation

Method 1:

MARKS     MIDPOINT, x   FREQUENCY, f   fx       fx²
1-10      5.5           0              0        0
11-20     15.5          1              15.5     240.25
21-30     25.5          1              25.5     650.25
31-40     35.5          3              106.5    3780.75
41-50     45.5          3              136.5    6210.75
51-60     55.5          3              166.5    9240.75
61-70     65.5          6              393      25741.5
71-80     75.5          9              679.5    51302.25
81-90     85.5          6              513      43861.5
91-100    95.5          0              0        0
TOTAL                   32             2036     141028

$\sigma = \sqrt{\dfrac{\sum fx^2}{\sum f} - \bar{x}^2} = \sqrt{\dfrac{141028}{32} - 63.625^2} = \sqrt{358.9844} \approx 18.95$

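Method 2:

MARKS     MIDPOINT, x   FREQUENCY, f   x - x̄      (x - x̄)²    f(x - x̄)²
1-10      5.5           0              -58.125    3378.5156   0
11-20     15.5          1              -48.125    2316.0156   2316.0156
21-30     25.5          1              -38.125    1453.5156   1453.5156
31-40     35.5          3              -28.125    791.0156    2373.0469
41-50     45.5          3              -18.125    328.5156    985.5469
51-60     55.5          3              -8.125     66.0156     198.0469
61-70     65.5          6              1.875      3.5156      21.0938
71-80     75.5          9              11.875     141.0156    1269.1406
81-90     85.5          6              21.875     478.5156    2871.0938
91-100    95.5          0              31.875     1016.0156   0
TOTAL                   32                                    11487.5

$\sigma = \sqrt{\dfrac{\sum f(x - \bar{x})^2}{\sum f}} = \sqrt{\dfrac{11487.5}{32}} = \sqrt{358.9844} \approx 18.95$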

b) Advantages of using standard deviation

The standard deviation gives a measure of the dispersion of the data about the mean. A direct analogy is the interquartile range, which gives a measure of dispersion about the median. However, the standard deviation is generally more useful than the interquartile range because it includes all the data in its calculation. The interquartile range depends entirely on just two values and ignores all the other observations in the data. Because the standard deviation uses every value, its accuracy is reduced if an extreme value is present in the data. Since the marks do not contain any extreme values, the standard deviation gives a better measure than the interquartile range.


PART 3:

a. The new marks for the 32 students

STUDENT   MARKS     STUDENT   MARKS
1         69        17        32
2         42        18        78
3         68        19        57
4         74        20        40
5         84        21        90
6         85        22        80
7         77        23        52
8         90        24        69
9         76        25        55
10        15        26        92
11        91        27        53
12        81        28        83
13        35        29        82
14        63        30        73
15        68        31        64
16        79        32        45


New frequency distribution table:

MARKS     LOWER BOUNDARY   MIDPOINT, x   FREQUENCY, f   CUMULATIVE FREQUENCY   fx       fx²
1-10      0.5              5.5           0              0                      0        0
11-20     10.5             15.5          1              1                      15.5     240.25
21-30     20.5             25.5          0              1                      0        0
31-40     30.5             35.5          3              4                      106.5    3780.75
41-50     40.5             45.5          2              6                      91       4140.5
51-60     50.5             55.5          4              10                     222      12321
61-70     60.5             65.5          6              16                     393      25741.5
71-80     70.5             75.5          7              23                     528.5    39901.75
81-90     80.5             85.5          7              30                     598.5    51171.75
91-100    90.5             95.5          2              32                     191      18240.5
TOTAL                                    32                                    2146     155538

Mean

$\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{2146}{32} = 67.0625$

Mode

The modal classes are 71-80 and 81-90.

Median

Method 1: Formula

Median class: N/2 = 32/2 = 16, and the 16th value lies in the class 61-70.

L = 60.5, F = 10, fm = 6, C = 10, N = 32

$\text{Median} = 60.5 + \left(\dfrac{16 - 10}{6}\right)(10) = 70.5$

Class interval

The class interval size remains the same, C = 10.

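Interquartile range

Method 1: Formula

Q1 class: (1/4)(32) = 8, and the 8th value lies in the class 51-60 (cumulative frequency 10).

L = 50.5, F = 6, fm = 4, C = 10, N = 32

$Q_1 = 50.5 + \left(\dfrac{8 - 6}{4}\right)(10) = 55.5$

Q3 class: (3/4)(32) = 24, and the 24th value lies in the class 81-90 (cumulative frequency 30).

L = 80.5, F = 23, fm = 7, C = 10, N = 32

$Q_3 = 80.5 + \left(\dfrac{24 - 23}{7}\right)(10) = 81.9286$

Interquartile range = Q3 - Q1 = 81.9286 - 55.5 = 26.4286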

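Standard deviation

Variance, $\sigma^2 = \dfrac{\sum fx^2}{\sum f} - \bar{x}^2 \approx 363.2$

Standard deviation, $\sigma = \sqrt{363.2} \approx 19.06$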


b. The new student scored 97%

MARKS     LOWER BOUNDARY   MIDPOINT, x   FREQUENCY, f   CUMULATIVE FREQUENCY   fx       fx²
1-10      0.5              5.5           0              0                      0        0
11-20     10.5             15.5          1              1                      15.5     240.25
21-30     20.5             25.5          0              1                      0        0
31-40     30.5             35.5          3              4                      106.5    3780.75
41-50     40.5             45.5          2              6                      91       4140.5
51-60     50.5             55.5          4              10                     222      12321
61-70     60.5             65.5          6              16                     393      25741.5
71-80     70.5             75.5          7              23                     528.5    39901.75
81-90     80.5             85.5          7              30                     598.5    51171.75
91-100    90.5             95.5          3              33                     286.5    27360.75
TOTAL                                    33                                    2241.5   164658.25

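Mean

$\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{2241.5}{33} = 67.924$

Standard deviation

Variance, $\sigma^2 = \dfrac{\sum fx^2}{\sum f} - \bar{x}^2 \approx 375.9$

Standard deviation, $\sigma = \sqrt{375.9} \approx 19.39$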
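The effect of the extra mark on the totals can be sketched in Python. This is an illustration only: it assumes the new mark of 97 is represented by the midpoint 95.5 of its class (91-100), and it rebuilds the totals for the 32 students from the midpoints and frequencies before adding the new student. The variable names are chosen for this example.

```python
import math

# Midpoints and frequencies for the 32 students (new marks).
midpoints = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5, 65.5, 75.5, 85.5, 95.5]
frequencies = [0, 1, 0, 3, 2, 4, 6, 7, 7, 2]

# Running totals before the new student joins.
n = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
sum_fx2 = sum(f * x ** 2 for f, x in zip(frequencies, midpoints))

# The mark of 97 falls in class 91-100, so it is counted at midpoint 95.5.
x_new = 95.5
n += 1
sum_fx += x_new
sum_fx2 += x_new ** 2

mean = sum_fx / n
variance = sum_fx2 / n - mean ** 2   # still in squared marks
sd = math.sqrt(variance)             # standard deviation, in marks

print(f"N = {n}, sum fx = {sum_fx}, sum fx^2 = {sum_fx2}")
print(f"mean = {mean:.3f}, variance = {variance:.2f}, SD = {sd:.2f}")
```

Keeping the variance and its square root as separate lines makes it easy to see which of the two figures is being quoted.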

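The modal classes are not affected by adding the new mark. The median and the interquartile range change only slightly, because one extra mark in the highest class shifts the positions N/2, N/4 and 3N/4 by less than one place.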

FURTHER EXPLORATION

The achievement of Mr. Ma's class is not very different from that of my class. The mean scores are fairly close, although my class's mean score is 67.924 and Mr. Ma's is 76.79. The standard deviations of the two classes, meanwhile, are very different. Both classes studied very hard to get the best score, and yet the difference shows clearly. Even if the scores change, the difference will remain clear.

REFLECTION

While conducting this project, I found a lot of information. I have learnt how statistics appears in our daily life and why it is important. Apart from that, this project encouraged me and the other students who took it to work together and share our knowledge. It also encouraged students to gather information from the internet, improve their thinking skills and promote effective mathematical communication.

Not only that, I learned some moral values and put them into practice. This project taught me to be responsible for the work given to me, to complete it on time and not to procrastinate. It also made me feel more confident in doing my work and taught me not to give up easily when I could not find the solution to a question, which I used to do. I also learned to be more disciplined with time: I was given about two or three weeks to complete this project, and I passed it to my teacher a little late. I also enjoyed doing this project. I spent time with my friends to complete it, and it tightened our friendship.

Last but not least, I propose that this project should be continued because it brings a lot of moral values to students and also tests students' understanding of Additional Mathematics.

THE ESSENCE OF MATHEMATICS IS NOT TO MAKE SIMPLE THINGS COMPLICATED, BUT TO MAKE COMPLICATED THINGS SIMPLE. ~S. GUDDER