University of Cambridge ESOL Examinations 1 Hills Road Cambridge CB1 2EU United Kingdom Tel +44 1223 553997 Fax +44 1223 553621 Email ESOLHelpdesk@CambridgeESOL.org
www.CambridgeESOL.org
Cambridge ESOL 2010
Contents
1 Introduction 1
  1.1 What is BULATS? 1
  1.2 What levels of language ability does BULATS test? 1
  1.3 How are BULATS results reported? 2
  1.4 Who is BULATS suitable for? 2
  1.5 What topics and situations are covered? 2
  1.6 How should candidates prepare for BULATS? 4
2 BULATS Online and CD-ROM Reading and Listening tests 5
3 BULATS Standard test 7
4 BULATS Online Writing test 10
5 BULATS Writing test (paper-based) 11
6 BULATS Online Speaking test 12
7 BULATS Speaking test (face-to-face) 13
8 Test results 14
9 Statistical characteristics of the test 16
  9.1 How accurately do the BULATS tests measure? 16
  9.2 Are different versions of the tests equivalent? 19
  9.3 Are the computer-based test and the Standard test equivalent? 20
  9.4 On-going validation of BULATS 20
10 The development of BULATS 22
  10.1 The history of BULATS 22
  10.2 The development of the BULATS computer-based tests 22
  10.3 The revision of BULATS in 2002 23
  10.4 The Production Cycle for question papers 23
BULATS Test Specification: A Guide for Clients i
1 Introduction

1.1 What is BULATS?

BULATS is a suite of language tests specifically for the use of companies and organisations which need a reliable way of assessing the language ability of groups of employees or trainees. BULATS is designed to test the language proficiency of employees who need to use a foreign language in their work, and of students and employees on language courses, or on professional/business courses where foreign language ability is an important element of the course.

BULATS provides:
- relevant, useful and reliable language tests in work contexts
- reports on candidates' performance in terms of internationally understood standards
- test administration to suit the client company's individual requirements
- rapid turnaround of test results
- information to help the interpretation of test results
- advice to companies on appropriate strategies for language testing, assessing language needs (language auditing) and training.
BULATS is a multilingual service, offering tests in English, French, German and Spanish. The following tests are offered:
- BULATS Online Reading and Listening test
- BULATS CD-ROM Reading and Listening test
- BULATS Standard test (a paper-based reading and listening test)
- BULATS Online Speaking test (English only)
- BULATS Speaking test (a face-to-face Speaking test)
- BULATS Online Writing test (English only)
- BULATS Writing test (a paper-based writing test)

Tests can be combined to test all four language skills, or they can be used independently. Tests of English are produced by the University of Cambridge ESOL Examinations, French by the Alliance Française, German by the Goethe-Institut and Spanish by the Universidad de Salamanca. These four institutions are members of the Association of Language Testers in Europe. BULATS is co-ordinated by the University of Cambridge ESOL Examinations, which has been producing language examinations since 1913 and delivers over 1.5 million tests a year in over 130 countries. Cambridge ESOL is part of Cambridge Assessment, one of the world's largest assessment agencies.

1.2 What levels of language ability does BULATS test?

BULATS tests are suitable for all learners who need a language test with a business or workplace focus. There is no pass mark; rather, candidates are placed in one of six levels or bands. These are expressed as levels on the Council of Europe's Common European Framework of Reference for Languages (CEFR).
The CEFR provides a series of levels of language ability from Beginner to Upper Advanced and is the standard framework used in Europe for comparing candidates who have sat different tests in different languages. These levels describe ability in terms of what people can do in real situations. For example, a typical BULATS computer-based test or Standard test candidate who receives a CEFR level B2 would be expected to be able to understand most business reports and non-routine letters with the aid of a dictionary. A list of the functional-situational Can-do statements at each level can be printed on the back of all BULATS candidate reports, and examples of these can be found in section 6.

Common European Framework of Reference (CEFR) Levels

Level   Level description     Cambridge ESOL certificated examinations at these levels
C2      Upper Advanced        CPE
C1      Advanced              CAE, BEC Higher
B2      Upper Intermediate    FCE, BEC Vantage
B1      Intermediate          PET, BEC Preliminary
A2      Elementary            KET
A1      Beginner              -
Candidates taking the Standard test or a computer-based Reading and Listening test are placed into a CEFR band based on their overall score on the module. They also receive scores for Listening, and for Reading and Language Knowledge. Candidates taking a Speaking test or a Writing test are placed into a CEFR band based on examiner judgement of their performance. Different levels of performance within a CEFR band are also reported.

1.3 How are BULATS results reported?

The exact format of the reporting of candidates' test results is decided by the organisation they work for or study with. Typically, candidates receive a Test Report Form (TRF), which includes information about their level. On the reverse of the form there is a summary of the Can-do statements and a guide to the interpretation of scores. Group reports can also be produced for organisations which have entered several candidates. See section 8 for more details.
1.4 Who is BULATS suitable for?

BULATS is suitable for any learner who needs to use English, French, German or Spanish at work. The test is designed to be suitable for a wide range of people at work, from technicians to secretaries and managers, who may work in banking, education, manufacturing, administration, research, marketing or other sectors. It does not require any previous business experience, so it is also suitable for candidates who may need to use the foreign language in a work context in the future.

1.5 What topics and situations are covered?

To ensure that BULATS tests are representative of the language used in business situations, a wide range of different functions and situations is covered. Below are some of the areas that candidates can expect to meet in a BULATS test.
Personal information
- Asking for and giving personal details (name, occupation, etc.)
- Asking about and describing jobs and responsibilities
- Asking about and describing an organisation and its structure

The office, general business environment and routine
- Arranging appointments/meetings
- Planning future events and tasks
- Asking for and giving permission
- Giving and receiving instructions
- Predicting and describing future possibilities
- Asking for and giving opinions
- Agreeing and disagreeing
- Making, accepting and rejecting suggestions
- Expressing needs and wants
- Discussing problems
- Making recommendations
- Justifying decisions and past actions
Entertainment of clients, free time, relationships with colleagues and clients
- Discussing interests and leisure activities
- Inviting, accepting and refusing offers and invitations
- Thanking and expressing appreciation
- Apologising and accepting apologies

Travel
- Making enquiries, reservations, requests and complaints

Health
- Health and safety rules in the workplace
- Leisure activities, interests and sports

Buying and selling
- Understanding and discussing prices and delivery dates, offers and agreements

Products and services
- Asking for and giving information about a product or service
- Making comparisons, expressing opinions, preferences, etc.
- Making and receiving complaints

Results and achievements
- Descriptions and explanations of company performance and results, trends, events and changes

Other topic areas
- A number of other topics in areas of general interest, such as food and drink, education (training, courses), consumer goods, shopping and prices, politics and current events, places, weather, etc. may be included.
1.6 How should candidates prepare for BULATS?

BULATS tests candidates' ability to use foreign languages in real-life situations, so the best way to prepare for BULATS is to practise using the language in realistic situations. Advice to candidates on how to prepare for BULATS is given in the BULATS Information for Candidates Handbook. This is available from Agents for a small fee, or can be downloaded free of charge from the BULATS website.

The skills tested in the Online and CD-ROM Reading and Listening tests are the same as in the Standard test. Candidates should familiarise themselves with the format of the computer-based test and the way they need to answer questions. A demonstration version of the computer-based test is available on the BULATS website (www.bulats.org).

Teaching resources for tutors who are helping candidates to prepare for BULATS are available on the Cambridge ESOL website at www.CambridgeESOL.org/teach/bulats/

Cambridge ESOL has also developed two types of BULATS Online Course:
- The blended learning course, which is delivered as a mix of online and classroom learning.
- The self-study course, which is delivered entirely online.

More information on these courses can be found here: http://bulats.org/BULATS-Training-and-Learning-Courses/Overview.html
2 BULATS Online and CD-ROM Reading and Listening tests

The BULATS Online Reading and Listening test and the BULATS CD-ROM Reading and Listening test are computer-based tests, delivered via one of two modes: online or CD-ROM. The task types in both modes are identical, and the candidate experience is the same. The main differences between the CD-ROM test and the online test are technical and administrative.

The tests assess candidates' ability to use the foreign language by presenting questions via a computer. Questions appear on screen and candidates answer them by clicking on a particular option or by typing in words or phrases. The tests are adaptive, which means they change in level of difficulty to match the language level of the candidate: if a candidate gets questions right, the program presents more difficult questions; if they get questions wrong, it presents easier ones. The tests are supported by a large secure bank of tasks, which allows a quick and accurate assessment of a candidate's language skills.

There are eight types of question, assessing Reading and Listening skills, including grammar and vocabulary knowledge. The tests start by testing a candidate's Reading proficiency before moving to a second section which tests Listening. The task types can come in any order within each part of the test. As the test is adaptive, its length depends on the candidate's level of ability, but it is usually approximately 60 minutes long.

Tasks in the Online and CD-ROM Reading and Listening tests

Reading and Language Knowledge
- Multiple choice. Reading to understand e.g. notices, messages, timetables, adverts, leaflets, graphs.
- Multiple choice. Reading a longer text for understanding, e.g. a newspaper or magazine article, advert, leaflet.
- Multiple choice cloze. Medium length gapped text focusing on lexis and lexico-grammar.
- Open cloze. Medium length gapped text focusing on grammar and lexico-grammar.
- Multiple choice. Gapped sentences focusing on grammar and vocabulary, e.g. semantic precision, collocations, fixed phrases, linking words.

Listening
- Multiple choice. Understanding short conversations or monologues. Candidates listen and select the correct answer.
- Multiple choice. Understanding short conversations or monologues. Candidates listen and select the correct picture or graphic.
- Multiple choice. Listening to extended speech for detail and inference. One monologue and one dialogue.

Candidates can hear the listening recordings twice.
How are the results reported?

Candidates receive an overall BULATS score and scores for Listening, and for Reading and Language Knowledge. Each score is out of 100. The CEFR levels and the corresponding BULATS scores are shown in the table below.

CEFR level   BULATS score   Level description
C2           90–100         Upper Advanced
C1           75–89          Advanced
B2           60–74          Upper Intermediate
B1           40–59          Intermediate
A2           20–39          Elementary
A1           10–19          Beginner
Candidates scoring 0–9 in the online test are reported as pre-A1.
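The mapping from overall score to reported band described above is a simple cut-off lookup. The sketch below illustrates it; the function and constant names are our own, not part of BULATS, and the cut-offs are taken from the table above.

```python
# Band cut-offs from the table above (online Reading and Listening test).
# Names below are illustrative only, not part of any BULATS software.
BAND_CUTOFFS = [(90, "C2"), (75, "C1"), (60, "B2"), (40, "B1"), (20, "A2"), (10, "A1")]

def cefr_band(score):
    """Return the CEFR band reported for an overall BULATS score (0-100)."""
    for cutoff, band in BAND_CUTOFFS:
        if score >= cutoff:
            return band
    return "pre-A1"   # scores of 0-9 in the online test

print(cefr_band(74))   # B2
print(cefr_band(5))    # pre-A1
```

Note that each band's cut-off is its lowest included score, so a score of exactly 60 is reported as B2, not B1.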
3 BULATS Standard test

The Standard test lasts 110 minutes and tests listening and reading skills, and knowledge of grammar and vocabulary. The sections, task format and focus, and the number of questions in each part or section are given below.

Listening

Part 1 (10 questions). Graphical and written prompts with a short conversation or monologue. Candidates have to choose from 3 options. The main focus is listening for specific information.

Part 2 (12 questions). Forms and notes with gaps for missing information. Candidates have to listen to phone messages, orders, etc. and complete the missing information. The main focus is listening for specific information and completing notes and forms.

Part 3 (10 questions). Written prompts with short recorded sections or snippets, usually monologues. Candidates have to match the text to the most appropriate prompt. The main focus is listening for global meaning.

Part 4 (18 questions). Multiple-choice questions with extended speech in the form of monologues or dialogues. Candidates have to choose from 3 options. The main focus is listening for specific information.

Subtotal: 50 questions
Reading and Language Knowledge

Part 1 Section 1 (7 questions). Graphical and written prompts, for example notices, messages, timetables, adverts, leaflets, graphs, etc. Candidates have to choose from 3 options. The main focus is reading for specific information.

Part 1 Section 2 (6 questions). Sentences where one word is gapped. Candidates have to choose from 4 options. The main focus is knowledge of grammar and vocabulary.

Part 1 Section 3 (6 questions). A longer text, for example a newspaper or magazine article, advert, leaflet, etc. with multiple-choice questions. Candidates have to choose from 3 options. The main focus is reading for specific information.

Part 1 Section 4 (5 questions). A medium length text with 5 words gapped. Candidates have to provide the missing words for the gaps. The main focus is knowledge of grammar.

Part 2 Section 1 (7 questions). Four short texts on a similar theme or topic with prompts. Candidates have to choose the prompt which refers to each text. The main focus is reading for specific information.

Part 2 Section 2 (5 questions). A medium length text with 5 multiple-choice gapped items. Candidates have to choose from 4 options. The main focus is knowledge of vocabulary.

Part 2 Section 3 (5 questions). A medium length text with 5 words gapped. Candidates have to provide the missing words for the gaps. This task is the same format as Part 1 Section 4. The main focus is knowledge of grammar.

Part 2 Section 4 (6 questions). Sentences where one word is gapped. Candidates have to choose from 4 options. This task is the same format as Part 1 Section 2. The main focus is knowledge of grammar and vocabulary.

Part 2 Section 5 (6 questions). A longer text, for example a newspaper or magazine article, report, etc. with multiple-choice questions. Candidates have to choose from 4 options. The main focus is reading for specific information and general meaning.

Part 2 Section 6 (7 questions). A medium length text with a wrong word in some of the lines. Candidates have to identify and correct the wrong word or indicate that the line does not contain a wrong word. The main focus is grammar.

Subtotal: 60 questions
Total: 110 questions
How are the results reported?

Candidates receive an overall BULATS score and scores for Listening, and for Reading and Language Knowledge. Each score is out of 100. The CEFR levels and the corresponding BULATS scores are shown in the table below.

CEFR level   BULATS score   Level description
C2           90–100         Upper Advanced
C1           75–89          Advanced
B2           60–74          Upper Intermediate
B1           40–59          Intermediate
A2           20–39          Elementary
A1           0–19           Beginner
A1 includes candidates who are working towards A1.
BULATS Test Specification: A Guide for Clients | | 3 BULATS Standard test 9
4 BULATS Online Writing test

The BULATS Online Writing test is a separate, stand-alone test and can be taken on its own or in conjunction with other BULATS tests. It is available in English. There are two parts to the BULATS Online Writing test. These are described below:

Part 1: Short message, email or letter (50–60 words); approx. 15 minutes. Candidates write a short message, email or letter using information given in written input.

Part 2: Report or letter (180–200 words); approx. 30 minutes. Candidates write a short report or letter following brief instructions. For this part, candidates choose a task from two alternatives.
How are candidates assessed?

Candidates are assessed by trained examiners. They are assessed on:
- accuracy and appropriacy of language
- organisation of ideas
- task achievement.

Examiners undergo a process of training and certification. This helps to ensure that different examiners award standardised scores that are reliable across candidates at different levels of ability.

How are the results reported?

The test report will indicate which CEFR level the candidate has achieved. A strong performance within a level is shown by the word "high". For example, a strong B1 level candidate will receive a result of B1 High.
5 BULATS Writing test (paper-based)

The BULATS Writing test is a separate, stand-alone test and can be taken on its own or in conjunction with other BULATS tests. It is available in English, French, German and Spanish. There are two parts to the BULATS Writing test. These are described below:

Part 1: Short message, email or letter (50–60 words); approx. 15 minutes. Candidates write a short message, email or letter using information given in written input.

Part 2: Report or letter (180–200 words); approx. 30 minutes. Candidates write a short report or letter following brief instructions. For this part, candidates choose a task from two alternatives.
How are candidates assessed?

Candidates are assessed by trained examiners. They are assessed on:
- accuracy and appropriacy of language
- organisation of ideas
- task achievement.

Examiners undergo a process of training and certification. This helps to ensure that different examiners award standardised scores that are reliable across candidates at different levels of ability.

How are the results reported?

The test report will indicate a candidate's CEFR level. The report will also state whether they are high, middle, or low within that band.
6 BULATS Online Speaking test

The BULATS Online Speaking test is a separate, stand-alone test and can be taken on its own or in conjunction with other BULATS tests. It is available in English. The BULATS Online Speaking test has five parts. These are described below:

Part 1: Interview; approx. 3 minutes. The candidate answers eight questions about him/herself, his/her work, background, future plans and interests.

Part 2: Reading Aloud; approx. 2 minutes. The candidate reads aloud eight sentences.

Part 3: Presentation; approx. 2 minutes. The candidate is given a work-related topic (e.g. The Perfect Office) to talk about for one minute. The candidate is given prompts about the topic and 40 seconds in which to prepare.

Part 4: Presentation with Graphic; approx. 2 minutes. The candidate is given one or more graphics (e.g. pie charts, line graphs) with a business focus (e.g. Company Exports) to talk about for one minute. The candidate has one minute in which to prepare.

Part 5: Communication Activity; approx. 3 minutes. The candidate gives his/her opinion on five questions related to one scenario (e.g. Planning a Conference).

How are candidates assessed?

Candidates are assessed by trained examiners. For Parts 1, 3, 4 and 5, candidates are assessed on:
- task achievement
- management of discourse
- pronunciation
- use of grammar and vocabulary
- extent of hesitation.

For Part 2, candidates are assessed on:
- overall intelligibility
- pronunciation of individual sounds
- stress and intonation.

Examiners undergo a process of training and certification. This helps to ensure that they award standardised marks that are reliable across candidates at different levels of ability and over time.

How are the results reported?

The test report will indicate which CEFR level the candidate has achieved. A strong performance within a level is shown by the word "high".
For example, a strong B2 level candidate will receive a result of B2 High.
7 BULATS Speaking test (face-to-face)

The BULATS face-to-face Speaking test is a separate, stand-alone test and can be taken on its own or in conjunction with other BULATS tests. It is available in English, French, German and Spanish. The BULATS face-to-face Speaking test is in three parts. These are described below:

Part 1: Interview; approx. 4 minutes. The examiner asks candidates questions about themselves, their work and interests.

Part 2: Presentation; approx. 4 minutes. The examiner gives candidates a sheet with three topics on it. Candidates choose a topic and have one minute to prepare a short presentation. They speak on the topic for one minute. Afterwards, the examiner asks candidates one or two questions about their presentation.

Part 3: Information exchange and discussion; approx. 4 minutes. The examiner gives candidates a sheet with a role-play situation. Candidates ask the examiner questions to get the required information. This leads to a discussion on a related topic.
How are candidates assessed?

The test is conducted and marked by one examiner. The Speaking test is recorded and the recording is sent to a second examiner, who assesses the candidate's speaking ability independently. Examiners use a set of scales to assess candidates' ability in the language being tested. These scales focus on particular areas of language ability. They are:
- accuracy of language
- range of grammar and vocabulary
- pronunciation
- management of discourse.

Examiners undergo a process of training and certification. This helps to ensure that they award standardised marks that are reliable across candidates at different levels of ability and over time.

How are the results reported?

The test report will indicate a candidate's CEFR level. The report will also state whether they are high, middle, or low within that band.
8 Test results

There are two types of test report:
- The group report, which can be provided to organisations which have entered a number of candidates for a BULATS session. This report lists candidates and their scores in each part, their overall score, and their CEFR level.
- The candidate report, which can be printed onto pre-printed BULATS stationery. This report is normally referred to as the Test Report Form or TRF.

Both kinds of report are normally provided by the Agent or organisation which arranges the test session. Test reports can be printed in English, French, German or Spanish.

Below is an example of the front of a Test Report Form (TRF) in English for a candidate who took the BULATS Online Reading and Listening test. Apart from some minor details this is identical to the TRF for the CD-ROM Reading and Listening test and the Standard test: the candidate's name, company and the date of the test are on the front, with the candidate's overall CEFR level and BULATS score (out of 100). In addition, the candidate's BULATS scores for Listening, and for Reading and Language Knowledge (both out of 100), are provided. Note that as Listening and Reading and Language Knowledge are not weighted equally, the overall score is not necessarily the mean of the section scores; for example, a candidate who receives a Listening score of 50 and a Reading and Language Knowledge score of 70 will not necessarily receive an overall score of 60.
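The effect of unequal weighting can be illustrated with a short sketch. The actual component weights used by BULATS are not published, so the weights below are purely hypothetical; the sketch only shows why a weighted overall score need not equal the simple mean of the two section scores.

```python
# HYPOTHETICAL weights for illustration only: BULATS does not publish
# the real weighting of its components.
LISTENING_WEIGHT = 0.45
RLK_WEIGHT = 0.55   # Reading and Language Knowledge

def overall_score(listening, rlk):
    """Weighted combination of the two section scores (both out of 100)."""
    return round(LISTENING_WEIGHT * listening + RLK_WEIGHT * rlk)

# With these illustrative weights, Listening 50 and RLK 70 give 61,
# not the simple mean of 60.
print(overall_score(50, 70))
```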
Writing and Speaking Test Report Forms have a very similar format and contain information about how strongly candidates performed within a CEFR level. Below is an example of what is usually found on the reverse of a BULATS Test Report Form. The first table shows a list of Can-do statements corresponding to different CEFR levels. Can-do statements are functionally/situationally oriented statements that have been shown to describe what we would expect a candidate at a specific CEFR level to be able to do in the language they are being tested in. For example, a candidate who receives a CEFR level of B2 is expected to be able to understand most reports and non-routine letters with the help of a dictionary. Also included on the reverse is an explanation of the scores and further information regarding the scoring of the Speaking and Writing tests.
9 Statistical characteristics of the test

Two key qualities of an exam are validity and reliability. Validity relates to the usefulness of a test for a purpose: does it enable well-founded inferences about candidates' ability? Can performance in the test be interpreted in terms of ability to perform in the real world? Reliability relates to the accuracy of the measurement of the exam: does it rank-order candidates similarly in repeated uses? Can we expect a candidate to achieve the same score in two versions of the same test, or in the computer-based and the Standard tests? This section presents evidence for the validity and reliability of BULATS.

9.1 How accurately do the BULATS tests measure?

It is important for candidates and exam users to be confident that an examination produces scores that are accurate, in that (1) the scores within one test are not significantly different for candidates who are at the same ability, and (2) if a candidate sat two versions of the same test (and no increase in candidate ability occurred between the tests) he or she would get the same or nearly the same score on both tests. Language testers use the concept of Measurement Error to describe this. Error does not mean that the test contains mistakes, but rather that candidates' scores are not completely consistent across the test or between different versions of the test.

Imagine a group of candidates who are all at the same level of language proficiency. If they sat a test they would not all get exactly the same score, no matter how accurate or long the test was. This difference in scores could be due to a number of factors, such as different levels of motivation, misinterpretation of a question, or some candidates meeting questions that tested a particular area of language they were weak at. This difference in scores is an example of Measurement Error.
Reliability

A common way of measuring the error and consistency of test scores is to use a reliability coefficient called Cronbach's Alpha. Conceptually, this divides the test into halves, correlates candidates' scores on one half of the test with their scores on the other half, and then adjusts the correlation to take account of the full number of items in the test as a whole; Cronbach's Alpha is equivalent to the average of all possible such split-half estimates. In theory, reliability coefficients can range from 0 to 1, but in practice we can expect them to be between 0.6 and 0.95, with a higher number indicating a more reliable test.

BULATS agents are requested to return candidate answer sheets from the Standard test to Cambridge ESOL, where they are used to calculate the reliability of the BULATS Standard test. Table 1 shows the reliability (Cronbach's Alpha) for the most recent versions of Standard BULATS, based on a random sample of the live BULATS population.
Table 1: Reliability (Cronbach's Alpha) for the most recent versions of Standard BULATS, by component and as a whole

Version   Sample size   Listening reliability   Reading and Language Knowledge reliability   Overall reliability
EN21      520           0.93                    0.95                                         0.97
EN22      468           0.92                    0.92                                         0.96
EN23      959           0.92                    0.93                                         0.96
EN24      1446          0.95                    0.94                                         0.97
EN25      789           0.91                    0.92                                         0.95
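Cronbach's Alpha can be computed directly from a matrix of item-level scores using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The sketch below is our own minimal illustration with made-up data, not BULATS code, using only the Python standard library.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's Alpha for a candidate-by-item score matrix.

    item_scores: one row per candidate, one score per item in each row.
    Uses population variances consistently throughout.
    """
    k = len(item_scores[0])                       # number of items
    columns = list(zip(*item_scores))             # transpose: one column per item
    sum_item_vars = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

# Illustrative data only: 4 candidates answering 3 dichotomously scored items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(scores), 2))   # 0.75
```

In live reliability work the matrix would contain one row per sampled candidate and one column per item on the answer sheet; the coefficient approaches 1 as items rank-order candidates consistently.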
However, correlations depend on the rank ordering of candidates: consistent rank ordering is easier to achieve with a group of candidates with a wide range of abilities. Measures of reliability such as Cronbach's Alpha are therefore as dependent on the spread of ability in the candidate population as on the accuracy of the test. This means that direct comparison of reliability across different tests, with different populations and ranges of item difficulty, can be misleading. In judging the adequacy of the reliability of a test we need to take into account the type of candidates taking the test and the purpose of the test.

The BULATS computer-based test is adaptive, which means that candidates receive items appropriate to what the test calculates as the candidate's ability. Different candidates will therefore not be presented with the same items, so split-half methods of calculating reliability, such as Cronbach's Alpha, cannot be used. An analogous measure, the Rasch reliability, is used instead; rather than raw scores, this reliability measure uses candidates' ability estimates. Table 2 below shows the reliability (Rasch) of computer-based BULATS, based on a sample of the live BULATS population.

Table 2: Reliability (Rasch) of computer-based BULATS, by component and as a whole

Sample size   Listening reliability   Reading and Language Knowledge reliability   Overall reliability
1407          0.92                    0.89                                         0.94
The reliability of the overall test is 0.94, which is very high. At first glance it may be surprising that the reliability of the Listening sub-section (0.92) is higher than that of the Reading and Language Knowledge sub-section (0.89), since the Listening section is shorter. However, the Listening section also shows a higher standard deviation of ability estimates, which helps to increase the reliability: reliability improves as scores become more widely spread from the mean and the range increases.

Standard Error of Measurement

Another way of describing the accuracy of a test is in terms of candidates' individual scores and the likely variation in those scores from their real or "true" scores; that is, their scores if the test contained no Measurement Error whatsoever. (A true score can be defined as the mean score a candidate would obtain if he or she took the test repeatedly.) This is what the Standard Error of Measurement (SEM) provides.
The transformation of raw scores to BULATS scale scores is non-linear. Therefore the form of SEM that is most meaningful to report is the Conditional Standard Error of Measurement, which relates to a particular score in the test. In the case of BULATS, Conditional SEM is most useful when used to estimate the error associated with each band cut-off, that is, the lowest score in a band. The Conditional SEM will vary slightly according to test version and band cut-off, depending on the precise difficulty of the items. However, the values reported in the table below for a sample calibration version are typical.

Table 3: Conditional SEM in BULATS Standard scores for a sample calibration version

At band cut-off   Overall SEM   Listening / Reading and Language Knowledge SEM
5                 +/-3          +/-4
4                 +/-4          +/-5
3                 +/-4          +/-5
2                 +/-4          +/-5
1                 +/-3          +/-4
The table above shows that candidates at Band 1 or 5 are likely to get a score within 3 points of their true score. They are almost certain to get a BULATS score within 6 points (two Standard Errors of Measurement) of their true score. For candidates at Bands 2 to 4 these numbers are 4 and 8 respectively. It is always possible that a candidate will be at the borderline between two levels, but for the majority of candidates who take the BULATS Standard test, the Standard Errors of Measurement reported above show that they will receive a band that is an accurate reflection of their true ability. It is extremely unlikely for candidates to receive a band more than one band higher or lower than their ability warrants. If we want to compare candidates' scores, either those of two different individuals or the same candidate's performance over time, it is necessary to take into account the Standard Error of Measurement of both scores; this is higher than the SEM of a single score. We can calculate from the table above that a difference of 7 BULATS points between candidates probably indicates a real difference in language ability, and candidates whose scores differ by 14 BULATS points are almost certainly at different levels of language ability. However, when comparing candidate performance over time it is also necessary to take into account another aspect of Measurement Error known as Regression to the Mean. This is a statistical phenomenon whereby candidates who score well below or well above the mean will tend to score nearer the mean if they sit the test again, regardless of any improvement in language ability. Regression to the Mean is a feature of all tests. The SEM of the computer-based tests can be seen in the table below.

Table 4: SEM (Rasch) of BULATS scores in CB BULATS Version 6.1 (sample size = 1407)

Overall SEM | Reading and Language Knowledge SEM | Listening SEM
+/-5 | +/-6 | +/-7
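The 7-point and 14-point thresholds follow from the fact that the error variances of two independent scores add, so their SEMs combine in quadrature. The sketch below illustrates this with an assumed SEM of 5 points per score; the function name is our own.

```python
import math

def sem_of_difference(sem_a, sem_b):
    """SEM of the difference between two independent scores:
    error variances add, so the SEMs combine in quadrature."""
    return math.sqrt(sem_a ** 2 + sem_b ** 2)

# With an illustrative SEM of 5 points per score, the combined SEM of
# a difference is about 7 points: a gap of one combined SEM probably
# reflects a real ability difference, and a gap of two combined SEMs
# (about 14 points) almost certainly does.
diff_sem = sem_of_difference(5, 5)
```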
As with the table detailing the Standard test SEM, the table above shows that the majority of candidates will receive a band that is an appropriate reflection of their true language proficiency level. It is highly improbable that a candidate will receive a band more than one level different from their true language proficiency level.

Reliability in the Speaking and Writing tests
BULATS Speaking and Writing tests contain authentic tasks, in that they resemble tasks candidates might be expected to perform in a business environment, such as writing a report or giving a short presentation. These types of task require assessment by trained examiners. Reliability in the BULATS Speaking and Writing tests therefore centres on ensuring that examiners mark consistently over time (intra-rater reliability) and in line with other examiners (inter-rater reliability). This is maintained through the examiner certification and training process detailed below. The Writing test assesses writing skills in relation to the workplace and takes 45 minutes. The test is assessed by up to two trained language specialists. Writing examiners undergo a rigorous training programme in order to qualify and are required to re-certify every two years. In addition, sample scripts are regularly monitored by examiner monitors to ensure that examiners in the field are marking to accepted standards.
The Speaking test assesses speaking skills in the workplace. For the online test, the candidate's responses are recorded and marked by up to five trained examiners. For the face-to-face test, the test is recorded and assessed by the examiner conducting it and then by a second trained examiner; provision is made for a third examiner to assess the interview in cases where examiners 1 and 2 differ in their grading by more than two sub-bands. Oral examiners undergo a rigorous training programme in order to qualify and are required to re-certify every two years. In addition, sample performances are regularly monitored by examiner monitors to ensure that examiners in the field are marking to accepted standards.

9.2 Are different versions of the tests equivalent?
Organisations often use BULATS over an extended period of time and are therefore likely to use more than one version of the Standard or computer-based test. It is essential that candidates and exam users are confident that different versions of the test are equivalent, in that they produce scores and bands at the same level of language proficiency. The equivalence of different versions of the tests is promoted by the Examination Production Cycle, in which item writers are trained and items are trialled and checked for suitability and difficulty. This process is explained in detail in section 10.4. Care is taken to ensure that the trial sample is representative of the BULATS candidate population in terms of first language background and language level. This allows us to check whether any of the items show bias, that is, whether they are particularly difficult for a specific group of candidates for non-linguistic reasons. Any such items are excluded from the final test. Items which prove adequate then enter an item bank, from which new BULATS versions are constructed to conform to set targets of Item Difficulty and Discrimination.
Item Difficulty is a measure of the likelihood that a candidate of a given ability will answer the item correctly. Discrimination is a measure of the capacity of an item to distinguish between weak and strong candidates. Item banking also allows tests to be constructed containing items that test a representative sample of the grammatical structures, functions and topics associated with language used in a business environment. For each new Standard BULATS version, transformation tables are produced which convert raw scores to BULATS standardised scores and BULATS bands. These tables are produced separately for Reading and Language Knowledge, for Listening, and for all items together, so as to provide the component BULATS scores and the overall BULATS score and band. Items for the online and CD-ROM versions come directly from the calibrated item bank.
9.3 Are the computer-based test and the Standard test equivalent?
Some organisations use both the computer-based test and the Standard test. In these situations it is essential that exam users are confident that both tests produce comparable scores. Whenever a new computer-based test is designed, its items are taken from calibrated items in the item bank. To investigate the effect of the mode of the test (computer- or paper-based), a sample of candidates with a range of first languages is asked to take both the computer-based test and a Standard test. Their results are correlated, and bands and scores in each test are compared to ensure that candidates receive similar scores in the computer-based and Standard tests, taking into account the SEM for both tests (see Jones 2000). Below is a table showing the correlations of scores in recent computer-based and Standard versions of the test. The measurement error of the tests, as discussed earlier, causes this correlation to be underestimated; adjusting for this is known as Correction for Attenuation. The table shows that the mode of the test (paper- or computer-based) has, in most cases, little effect on a candidate's band and overall score. For the minority of candidates who are uncomfortable using a computer, however, we cannot expect the scores to be the same in each mode, although the overall band should be very close.
Correlation (corrected for attenuation)
Band score: 0.86
Overall BULATS score: 0.95
Sample size: 62
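Correction for attenuation is Spearman's classical formula: the observed correlation is divided by the square root of the product of the two tests' reliabilities. The values below are illustrative only, not the actual BULATS figures.

```python
import math

def correct_for_attenuation(observed_r, reliability_x, reliability_y):
    """Spearman's correction for attenuation: estimates the correlation
    between true scores from the observed correlation and the
    reliability of each test."""
    return observed_r / math.sqrt(reliability_x * reliability_y)

# Hypothetical example: an observed correlation of 0.87 between two
# tests, each with reliability 0.92, corrects to roughly 0.95.
corrected = correct_for_attenuation(0.87, 0.92, 0.92)
```

Because both reliabilities are below 1, the corrected correlation is always at least as high as the observed one, which is why the table reports the corrected figure as the better estimate of mode equivalence.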
9.4 On-going validation of BULATS
Cambridge ESOL places emphasis on maintaining quality through regular monitoring of candidate and examination performance. Re-appraisal and, where necessary, revision to ensure that examinations provide the most accurate, fair and useful means of assessment are key strengths of the organisation. This work is supported by the Research and Validation Group at Cambridge ESOL, the largest dedicated research team of any UK-based provider of English language assessment. A number of projects related to BULATS and Business English are currently in progress. Volume 17 in the Studies in Language Testing series, Issues in Testing Business English by Barry O'Sullivan, deals with the recent revision of the Business English Certificate (BEC) examinations and outlines Cambridge ESOL's understanding of the Business English construct and model of communicative ability. On occasion, organisations and candidates are asked to help by providing data and feedback for research projects; their co-operation is welcomed. More information on Cambridge ESOL and its research and validation work can be found on the Cambridge ESOL website: www.CambridgeESOL.org
Cambridge ESOL produces Research Notes, a quarterly journal dealing with current issues in language testing and Cambridge ESOL examinations. Issues can be accessed on the website, where articles related to BULATS or other themes can be searched for. A selection of recent articles on BULATS and Business English is given below.

BULATS: A case study comparing computer-based and paper-and-pencil tests. Neil Jones. Research Notes Issue 3 (November 2000)
CB BULATS: Examining the reliability of a computer-based test using the test-retest method. Ardeshir Geranpayeh. Research Notes Issue 5 (July 2001)
Revising the BULATS Standard Test. Ed Hackett. Research Notes Issue 8 (May 2002)
Some theoretical perspectives on testing language for business. Barry O'Sullivan. Research Notes Issue 8 (May 2002)
Analysing domain-specific lexical categories: evidence from the BEC written corpus. David Horner and Peter Strutt. Research Notes Issue 15 (February 2004)
Using simulation to inform item bank construction for the BULATS computer adaptive test. Louise Maycock. Research Notes Issue 27 (February 2007)
Using the CEFR to inform assessment criteria development for Online BULATS speaking and writing. Lucy Chambers. Research Notes Issue 38 (November 2009)
CB BULATS: Examining the reliability of a computer-based test. Laura Cope. Research Notes Issue 38 (November 2009)
10 The development of BULATS
10.1 The history of BULATS
The BULATS Standard test was first launched in 1997 in a limited number of countries. Since then the network has grown to more than 300 Agents in over 40 countries. The Speaking and Writing tests became available in 1998 and the computer-based test was launched in 1999. In mid-2002 a revised version of the Standard test was launched; this version was slightly longer than the original and reported results in a slightly different way. At the same time the computer-based test was revised and the software upgraded. The computer-based test continues to be upgraded with the regular release of updated versions. In 2008 an online version of the computer-based test was launched; it is adaptive in the same way as the CD-ROM version but offers increased flexibility, as no installation is needed. To add to the test provision, online Speaking and Writing tests have recently been developed which are easy and cost-effective to administer.
10.2 The development of the BULATS computer-based tests
The first trial version of the BULATS computer-based test was released in 1998 in English, closely followed by a full range of tests in all languages, available from 1999. The test is adaptive and is supported by a large bank of encrypted, secure tasks. An adaptive algorithm chooses items as the test progresses according to how the candidate performed on previous items. This allows candidates to face items at an appropriate level of difficulty and provides a more accurate assessment of candidate ability than a non-adaptive test with a similar number of items. The software used in the BULATS computer-based test was developed for Cambridge ESOL by Homerton College, University of Cambridge, an acknowledged centre for the development of educational technology. Development has continued with the production of new versions of the test which combine revised item banks and improved software.
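The general idea of an adaptive algorithm can be sketched as follows. This is a deliberately simplified illustration under the Rasch model, with an ad-hoc step-size update; it is not the actual BULATS algorithm, and all names are our own.

```python
import math
import random

def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the Rasch model."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def adaptive_test(bank, answer, n_items=10):
    """Minimal adaptive loop: at each step, present the unused item
    whose difficulty is closest to the current ability estimate, then
    nudge the estimate up or down by a shrinking step.

    bank   : list of item difficulties (logits)
    answer : callable(difficulty) -> True/False, the candidate's response
    """
    ability, step = 0.0, 1.0
    unused = list(bank)
    for _ in range(min(n_items, len(unused))):
        item = min(unused, key=lambda d: abs(d - ability))
        unused.remove(item)
        ability += step if answer(item) else -step
        step *= 0.7      # smaller adjustments as evidence accumulates
    return ability

# Simulated candidate of true ability +1.0 answering probabilistically:
random.seed(1)
bank = [d / 2 for d in range(-6, 7)]    # difficulties -3.0 .. +3.0
estimate = adaptive_test(
    bank, lambda d: random.random() < rasch_probability(1.0, d))
```

Because each item is chosen near the current estimate, most items land close to the candidate's level, which is why an adaptive test measures more precisely than a fixed test of the same length.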
Many of the changes are in response to customers' requirements to make the application more customisable. These changes have included an export facility so customers can export data to other applications, and changes to the registration screen to allow more than one ID number and other company-specific information. Alongside this, there have been improvements to security to maintain the integrity of the test, and improvements to software design to ensure the test runs effectively on a wide range of hardware specifications. The computer-based test is designed to run in both stand-alone and networked mode, and a comprehensive manual is available in six languages. Online BULATS was launched in 2008; it is also an adaptive test and is available in four languages. Online delivery means the tests can be administered with no installation; tests can be taken at any computer which has a sound card and is accessed via Internet Explorer. Results from the test are available immediately and allow flexible reporting, so that scores can be reported for groups or individuals. Online Speaking and Writing tests were launched in 2010. These allow candidates to take a speaking or writing test online on any computer which has a sound card and is accessed via Internet Explorer. The candidate responses are recorded and uploaded; agents can then arrange for them to be marked on screen via the online system. These tests are easy to administer and allow great flexibility for agents, examiners and candidates. The new online BULATS Speaking test measures the same CEFR levels, and to the same precision, as the face-to-face Speaking test. However, it is important to note that the online and face-to-face Speaking tests are different types of test: they contain different task types, have different assessment scales and measure different aspects of spoken performance.
Support for all computer-based products is offered through a dedicated helpline, the local agent or office, and our website.
10.3 The revision of BULATS in 2002
As with any test, candidature and usage change over time, and BULATS is reviewed regularly to ensure the tests are fair, make the best use of modern technology and meet customers' needs and expectations. In the first few years of the test a more detailed picture emerged of the BULATS population and the needs of BULATS clients. A major study, which included a questionnaire to existing BULATS Agents, was carried out. As a result of this and other concurrent validation studies it was decided to revise some elements of the Standard test. The main revisions were:
The length of the Listening section was increased from 30 to 50 items. As part of this, two sections of the Listening test were revised so that candidates hear each Listening text only once.
A number of changes to both the format and the task types were made in the newly titled Reading and Language Knowledge section. Reading tasks and Language Knowledge tasks were alternated to avoid disproportionately negative effects of possible lack of time and fatigue on the grammar and vocabulary tasks, which had previously been at the end of the paper.
Amended tasks were extensively trialled to ensure the measurement characteristics of the test were maintained or improved and that validity and fairness issues had been accounted for. Feedback on the revised format has been positive. A more detailed treatment of the revision process can be found in Issue 8 of Cambridge ESOL's publication Research Notes, available online at www.CambridgeESOL.org/rs_notes/
10.4 The Production Cycle for question papers
Cambridge ESOL employs teams of item writers to produce examination material, and throughout the writing and editing process strict guidelines are followed to ensure that the materials conform to the test specifications. Topics or contexts of language use which might introduce a bias against any group of candidates of a particular background (e.g. on the basis of sex or ethnic origin) are avoided. After selection and editing, the items are pretested. Pretesting plays an important role, as it allows questions and materials with known measurement characteristics to be banked so that new versions of question papers can be produced as and when required. The pretesting process helps to ensure that all versions conform to test requirements in terms of content and level of difficulty. For English Reading and Listening, items are pretested during the live administration of the online test or by following the procedure for foreign language tests described below. A sample of online candidates is given a small number of extra items during their tests; these items play no part in the candidates' results and do not increase the length of the test significantly. The extra items are taken by many different candidates and the data compared with items of known measurement characteristics, so that the new items are calibrated and linked to a common scale of difficulty. For foreign language versions, pretest items are compiled into pretest papers and these are supplied to candidates. The tests include anchor items, which are carefully chosen on the basis of their known measurement characteristics; their inclusion means that all new items can be linked to a common scale of difficulty. These pretest papers are despatched to a wide variety of organisations which have offered to administer the pretests to candidates of a suitable level.
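The role of anchor items in linking new items to a common difficulty scale can be sketched with the simplest linking method, a mean shift. This is an illustrative simplification under assumed data, not Cambridge ESOL's calibration procedure; the function and item names are hypothetical.

```python
def link_to_bank(pretest_difficulties, anchor_ids, bank_difficulties):
    """Mean-shift linking: place newly pretested items on the item
    bank's difficulty scale using anchor items whose bank values are
    already known.

    pretest_difficulties : dict item_id -> difficulty on the pretest scale
    anchor_ids           : ids present in both the pretest and the bank
    bank_difficulties    : dict item_id -> difficulty on the bank scale
    """
    shift = sum(bank_difficulties[i] - pretest_difficulties[i]
                for i in anchor_ids) / len(anchor_ids)
    return {i: d + shift for i, d in pretest_difficulties.items()}

# Hypothetical example: anchors A1 and A2 come out 0.5 logits easier
# on the pretest scale than in the bank, so every new item (N1, N2)
# is shifted up by 0.5 to land on the bank's scale.
pretest = {"A1": -1.0, "A2": 0.0, "N1": 0.8, "N2": 1.5}
bank = {"A1": -0.5, "A2": 0.5}
linked = link_to_bank(pretest, ["A1", "A2"], bank)
```

Because the anchors appear on both scales, the difference between their two sets of estimates identifies the offset between the scales, and applying that offset to the new items places them on the common scale of difficulty.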
After the completed pretests are returned to the Pretesting Unit at Cambridge ESOL, a score for each candidate is provided to the centre. The items are marked and analysed, and those found to be suitable are put into an item bank. BULATS question papers then go through an additional process of Standards Fixing, which confirms the measurement characteristics of the tasks and items. Materials for the productive tests (Speaking and Writing) are trialled directly with candidates to assess their suitability for inclusion in the item bank. For further information, or to enquire about participating in BULATS Pretesting, please email: BULATSPretesting@CambridgeESOL.org