
Preparing to review the literature systematically

Working with bibliographic records to plan your literature review[1]


Stephen Gourlay, Kingston Business School

The purpose of this short guide is to introduce tools and techniques for analysing records downloadable from the Web of Science.[2] The focus is on questions you are likely to ask in preparing to select materials for a literature review, such as:

What keywords can I use for my search?
Who are the key authors?
Which journals were these papers published in?
Which authors, which papers, which journals, and which non-journals, are cited most frequently?
Which papers have a number of citations in common?
Are citations associated with each other?
Which citations tend to be cited together?

To follow the techniques described here you will need to have access to the Web of Science (part of the Web of Knowledge), to have downloaded and installed Bibexcel and AntConc, and to have access to Excel and Notepad (or a similar text editor). If you want to visualise some of the output you will also need a program such as Pajek or *ORA (apart from Excel and Notepad, these are all free programs). Information about downloading these programs is at the end of this guide. In this version some of the steps are illustrated with pictures of the programs and the action.

Downloading records from the Web of Science


If you want to analyse citations, it is important to search in the Web of Science (including the citation indexes) and not at the Web of Knowledge level (see Fig 1), because currently (May 2010) the cited references of records can only be downloaded from the Web of Science.

Fig 1: Web of Science

[1] v.3.12 October 2010 (minor amendments). My thanks to Olle Persson for help with Bibexcel.
[2] Records from other databases may be used in the same way, but will also need preparation before they can be used. Such preparation is not covered in this guide; you need to explore this yourself. If you use Zotero to maintain your bibliography you can export records in the .ris format, which Bibexcel can convert.

Open the Web of Science. Do your search, then output the records. For the output you need to select which records to include (usually all of them) and then choose Full Record, plus Cited Reference (Step 1 in Fig 2). Under the Save drop-down menu, choose Save to Plain Text.

Fig 2: Saving Web of Science records

Save this to a folder that Bibexcel can access easily (Step 2). (When using Bibexcel, it is easiest to put the data files in a folder below the folder that the program is in. Bibexcel is a very small program and can be run from, e.g., a folder in your My Documents folder.) The examples in this guide are taken from a sample of 100 records obtained by searching for papers in the Journal of Business Ethics.
Working with large Web of Science data sets

The Web of Science limits you to 500 records per download file. To analyse more records than this, download the records in separate files, then join them together into one file before creating the .doc file in Bibexcel. To join two or more files you will need a plain text editor: Wordpad can be used (make sure you save the files as .txt files), or you can download a plain text editor such as Notepad++ or NoteTab.

1) Open the file you want to begin editing and inspect it. You will see that the beginning of the Web of Science records looks like this:

FN ISI Export Format
VR 1.0

Each bibliographic record begins with PT (followed by a space and then a letter) and ends with ER and a blank line. At the end of the file are the letters EF.
2) Select the first two lines (as above) and delete them.
3) Go to the end of the file and delete the line that says EF (but not the blank line that follows it). Save this file.
4) Open the next file that you want to copy records from. Delete the same lines as above (Steps 2 and 3), then select and copy all the text, and paste it into the first file (saved at Step 3).
5) Repeat Step 4 as many times as necessary until all the records are in one file.
6) Finally, go to the end of the file containing all the records, add EF at the end, and save.
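If you have many files to join, the manual steps above can also be scripted. The following is a minimal Python sketch, not part of the Bibexcel workflow: it assumes UTF-8 plain-text exports with the layout described above (an FN/VR header, records running from PT to ER, and a closing EF), and the function name and file names are hypothetical.

```python
from pathlib import Path


def merge_wos_exports(input_paths, output_path):
    """Join several Web of Science plain-text exports into one file.

    Follows the manual procedure: the 'FN' and 'VR' header lines and the
    closing 'EF' line are dropped from every input file, the remaining
    record lines (PT ... ER plus blank lines) are concatenated, and a
    single 'EF' line is written at the very end.
    Returns the number of records (lines starting with 'PT ').
    """
    merged = []
    for path in input_paths:
        # 'utf-8-sig' also strips a byte-order mark, if the export has one
        text = Path(path).read_text(encoding="utf-8-sig")
        for line in text.splitlines():
            if line.startswith(("FN ", "VR ")) or line.rstrip() == "EF":
                continue  # file-level header/footer lines, not record data
            merged.append(line)
    merged.append("EF")
    Path(output_path).write_text("\n".join(merged) + "\n", encoding="utf-8")
    return sum(1 for line in merged if line.startswith("PT "))
```

For example, merge_wos_exports(["savedrecs1.txt", "savedrecs2.txt"], "all_records.txt") would produce one file ready for the Bibexcel conversion described next; the returned record count is a quick check that nothing was lost.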

Preparing the Web of Science records for Bibexcel


Bibexcel cannot directly use the records produced by the Web of Science, but it has a utility to convert the file:

1) Select (highlight) the file name in Bibexcel under Select file here[3] (see Fig 3).
2) Click on Edit doc file / Replace line feed with carriage return[4].
3) Result: a new file with the extension .tx2 is created.

Fig 3: Initial file preparation in Bibexcel

Select the .tx2 file, and:
1) Click on Misc / Convert to dialog format / Convert from Web of Science.
2) Result: a .doc file containing the records in a form Bibexcel can work with.

This .doc file is the main record file and the starting point for performing analyses in Bibexcel. It is a good idea to familiarise yourself with its structure: in Bibexcel, select the file, then click on View file to see the file contents in The List. You can see, e.g., that the author field begins with AU, the title field with TI, and the citation list with CD. Each line ends with a | and each record with | |. In some fields (e.g. AU, CD) items are separated by a delimiter, often the semi-colon (;). Once you have prepared the .doc file you are ready to use Bibexcel to answer the questions listed above. (NB do not load this file into Word! If you need to edit it, use Notepad, Wordpad or a similar plain text editor.)
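If you prefer to check the raw data with a script before (or instead of) converting it, the sketch below reads a Web of Science plain-text export directly into records. It is not a reader for Bibexcel's .doc file. It assumes the export layout of two-letter field tags, with continuation lines indented by three spaces; note that in the raw export the cited references are tagged CR, whereas the .doc file shows them under CD. The function name is hypothetical.

```python
from pathlib import Path


def read_wos_records(path):
    """Read a Web of Science plain-text export into a list of records.

    Each record is a dict mapping a two-letter field tag (AU, TI, SO, DE,
    CR, ...) to a list of lines. Continuation lines, which start with
    three spaces instead of a tag, are appended to the current field.
    """
    records, current, tag = [], None, None
    for line in Path(path).read_text(encoding="utf-8-sig").splitlines():
        if not line.strip():
            continue  # blank lines separate records
        if line.startswith("   ") and current is not None and tag:
            current[tag].append(line.strip())  # continuation of previous field
            continue
        tag, value = line[:2], line[3:].strip()
        if tag == "PT":  # PT opens a new record
            current = {"PT": [value]}
        elif tag == "ER":  # ER closes it
            if current is not None:
                records.append(current)
            current, tag = None, None
        elif current is not None:  # FN, VR and EF fall outside records
            current.setdefault(tag, []).append(value)
    return records
```

A title that wraps over several lines can be rebuilt with " ".join(record["TI"]); the keyword field can be rebuilt the same way and then split on the semi-colon.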

What keywords can I use for my search?


Identifying keywords (or search terms) is an important early step in a systematic literature study. Keywords help you manage and focus searches in databases, and you can report which keywords you used for your search (which helps anyone else to replicate it). There are several ways to identify potentially useful keywords:

List the keywords used in the database.
Identify the most frequently used words in the title or abstract.
Look at a concordance of words (words in context) from the title.

[3] Generally throughout this guide I have put program commands or options, or program labels for parts of the screen, in bold text.
[4] The / indicates that what follows is another command or option, i.e. Edit .doc file / Replace line feed means that you first select the option Edit .doc file and then the option Replace line feed.

What keywords have been used in the database?

If you look at the .doc file you will see a field beginning with DE; this contains the keywords used in the database. You can obtain a list as follows (see Fig. 4):
1) Select the .doc file where it says Select file here.
2) In the Old Tag box type: DE
3) Under Select field to be analysed choose Any ; separated field (as this describes the DE field).
4) Click on PREP to make an .out file containing the keywords.

Fig 4: Preparing a keyword list

5) Result: an .out file looking something like this:

1   business ethics
1   communication ethics
1   organizational culture
1   issue decision making
2   power holders
2   corporatism
2   economic influence

The numbers refer to the records in the .doc file (this is a consistent pattern in .out files).
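What PREP does with a ;-separated field can be sketched in a few lines of Python: each record's field is split on the delimiter and every item is listed next to its record number. This is an illustration under assumptions (the input is one field string per record, in record order), not Bibexcel's own code; the function name is hypothetical.

```python
def prep_separated_field(field_values, sep=";"):
    """Rough equivalent of PREP with 'Any ; separated field'.

    field_values holds one string per record (e.g. the DE field), in
    record order. Returns (record_number, item) pairs, numbering records
    from 1 as the .out file does. Empty items are skipped.
    """
    rows = []
    for recno, value in enumerate(field_values, start=1):
        for item in value.split(sep):
            item = item.strip()  # drop the space that follows each ';'
            if item:
                rows.append((recno, item))
    return rows
```

A record with an empty keyword field simply contributes no rows, which is why some record numbers may be missing from an .out file.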

How often has each keyword been used?

It can be helpful to find out which keywords occur most frequently, as this suggests which terms authors prefer:
1) Select the .out file (Select file here).
2) In the Frequency distribution box, select Whole string from the drop-down list (see Fig 5).
3) Select Sort descending.
4) Click on Start.

Fig 5: Getting frequencies

5) Result: a .cit file looking something like this:

19   ethics
14   business ethics
5    corporate citizenship
5    corporate social responsibility
4    corporate governance
4    shareholder activism
4    socially responsible investment
3    corporate social responsibility (CSR)
3    social responsibility

Here the numbers indicate how often each word or phrase appears in the data set. It's clear that two terms dominate, but that a number of other phrases are also used.
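The frequency distribution step is, in essence, a count of identical strings sorted from most to least frequent. Below is a minimal Python sketch over (record number, item) pairs like those in the .out file; the function name and the optional case-folding switch are assumptions of this sketch, not Bibexcel options.

```python
from collections import Counter


def frequency_distribution(rows, casefold=False):
    """Count how often each item occurs in (record_number, item) rows.

    Like 'Whole string' + 'Sort descending': returns (count, item) pairs,
    most frequent first, with ties sorted alphabetically. casefold=True
    treats 'Ethics' and 'ethics' as the same item.
    """
    counts = Counter(item.casefold() if casefold else item for _, item in rows)
    return sorted(((n, item) for item, n in counts.items()),
                  key=lambda pair: (-pair[0], pair[1]))
```

Because whole strings are compared, "corporate social responsibility" and "corporate social responsibility (CSR)" are counted separately, as in the .cit file above.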
Identifying frequently used words in the title

You may want to create your own list of keywords to supplement those created by authors or database compilers. A good place to begin is with the article titles. Create a file containing the contents of the Title (TI) field (as separate words, not the whole line):
1) Select the .doc file where it says Select file here.
2) In the Old Tag box type: TI
3) Under Select field to be analysed choose Blank-separated words (e.g. a title) (as this describes the Title field).
4) Click on PREP to make an .out file containing the title words. Bibexcel has an inbuilt list of stop-words (words you would normally not want to include, such as 'and' and 'the'). Assuming you want to use this, respond No to the first question posed, and Yes to the rest.
5) Result: an .out file looking something like this:

1   Organizational
1   factors
1   encouraging
1   ethical
2   Illusions
2   corporate
2   power
2   Revisiting
2   relative
3   Firm
3   newness

The numbers refer to the records in the .doc file. To get a frequency count for each word:
1) Select the .out file (Select file here).
2) In the Frequency distribution box, select Whole string from the drop-down list.
3) Select Sort descending.
4) Click on Start.
5) Result: a .cit file with the most frequently used words at the top of the list. In this example (a sample of 100 papers from the Journal of Business Ethics) we find:

32   ethics
23   ethical
20   business
17   corporate
13   social

(Bibexcel ignores capitalization: e.g. ethics and Ethics both occur, and count as the same word. Sometimes it will use the capitalized form (e.g. Research) and sometimes not.)
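The title-word count can be approximated in Python as below. This sketch differs from Bibexcel in two deliberate ways: it uses a small illustrative stop-word list (not Bibexcel's built-in one), and it splits on any character that is not a letter, digit, apostrophe or hyphen, so punctuation such as a colon does not stick to a word. Like Bibexcel, it ignores capitalization. The names are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; Bibexcel's built-in
# list is longer and is not reproduced here).
STOP_WORDS = {"a", "an", "and", "as", "at", "by", "for", "from",
              "in", "of", "on", "the", "to", "with"}


def title_word_frequencies(titles, stop_words=STOP_WORDS):
    """Count title words case-insensitively, most frequent first.

    Returns (count, word) pairs, like the lines of a .cit file.
    """
    counts = Counter()
    for title in titles:
        # A word is a run of letters/digits/underscores, apostrophes or hyphens
        for word in re.findall(r"[\w'-]+", title):
            word = word.lower()
            if word not in stop_words:
                counts[word] += 1
    return sorted(((n, w) for w, n in counts.items()),
                  key=lambda pair: (-pair[0], pair[1]))
```

The input is simply a list of title strings, for example the Whole field intact output for TI with the record numbers removed.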
Exploring a concordance of title words

Words are often used in pairs (or larger combinations). One way to view the title words in their context (a concordance) is to use AntConc in combination with Bibexcel and Excel (or a similar editor). In Bibexcel, create a file containing the titles of the articles:
1) Select the .doc file.
2) Put TI in the Old Tag box.
3) In the box Select field to be analysed select Whole field intact from the drop-down list.
4) Click on PREP.
5) Result: an .out file containing the article titles (and the record numbers).

You only want the titles (not the record numbers). Open the .out file in Excel, select the column containing the titles, copy it into a new file, and save that as a .txt file. Run AntConc. Open the titles list file; its name will appear in the Corpus Files list[5]. Click on the Word List tab, then on Start. Assuming you have already set up a stop-word list, words like 'a' and 'and' will not be included (see Appendix 2 for setting up a stop-word list). AntConc lists the rank of each word, its number of occurrences, and the word. For example:

1   33   ethics
2   23   ethical
3   21   business
4   17   corporate
5   13   social

These are the same words as before in Bibexcel, although some of the counts differ slightly.

AntConc permits a variety of other kinds of analysis of these words, particularly by showing them in the context of other words. Click on a word in the Word List output and you will be taken to the Concordance tab, where the word is shown in the titles in which it appears. Using KWIC Sort (bottom left of the AntConc screen), you can use the Level 1 drop-down menu to sort the list on a number of words to the left, or to the right, of the word at the centre of the display. E.g. choose 1L in the drop-down box and click on Sort: you now have a list organized alphabetically by the word preceding your focus word.

You can also identify phrases using the N-Grams or Clusters tab (this is the same tab, labelled N-Grams[6] or Clusters depending on the options selected within it). If the tab shows N-Grams, select it, then click on the N-Grams tick mark to remove it. Enter a word in the Search Term box and click on Start; the result will be a list of phrases containing that word. You can control the number of words in the phrase (change the Cluster Size) and whether the word is anywhere in the phrase (the default), on the left, or on the right (the Search Term Position tick boxes). The phrases are displayed in rank order by frequency of occurrence (this too can be changed). Searching for ethical, setting the Cluster Size maximum to 3 and the Search Term Position to Left, produces the following results:

1   3   ethical decision
2   2   ethical climate
3   2   ethical decision making
4   2   ethical institutions
5   2   ethical issues
6   2   Ethical issues

The results are in rank order, with the number of occurrences in the middle column. You can now clearly identify phrases that can function as keywords in a search, and the frequency with which they occur.

[5] A corpus is a collection of documents for analysis.
[6] N-Gram is a term used in linguistics and computing to refer to a sequence of n items. A word n-gram of size 2 (sometimes called a bigram or bi-gram) is simply two words: in the three-word sentence "It is hot." there are two word n-grams, "It is" and "is hot". Whether n-grams will be useful depends on your needs. As the n-grams function includes words like 'a', 'and', 'the' etc., searching for Clusters will probably be more generally useful in text (qualitative) data analysis.

You can perform a similar analysis on the abstract. Since this contains more information than the title (for example, an abstract may describe the context, the findings etc.), you may find additional useful terms. To do this, use Bibexcel to create a file containing the content of the Abstract (AB) field:
1) Select the .doc file where it says Select file here.
2) In the Old Tag box type: AB
3) Under Select field to be analysed choose Whole field intact if you want to keep the text, or Blank-separated words ... to have a list of words (excluding stop-words).
4) Click on PREP to make an .out file containing the abstract text.
5) Result: an .out file containing all the Abstract text.

Then repeat the analysis performed above for title words, if you think it worthwhile. In addition to these techniques, you could use Bibexcel to produce a list of words that occur together (co-occurrence). There is an example of co-occurrence analysis below (co-citation) that you can adapt to do this.
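AntConc's Clusters search with the Search Term Position set to Left can be imitated with a short Python function: find each occurrence of the search term and collect the phrases of 2 to 3 words that start with it. This is a sketch under assumptions, not AntConc's algorithm: the term is matched case-insensitively, but phrases are counted with their original capitalization (mirroring the sample output, which lists ethical issues and Ethical issues separately). The function name is hypothetical; it works equally on titles or abstract text.

```python
import re
from collections import Counter


def left_clusters(texts, term, min_size=2, max_size=3):
    """Count phrases of min_size..max_size words that begin with `term`."""
    counts = Counter()
    target = term.lower()
    for text in texts:
        words = re.findall(r"[\w'-]+", text)
        for i, word in enumerate(words):
            if word.lower() != target:
                continue
            for size in range(min_size, max_size + 1):
                if i + size <= len(words):  # phrase must fit in this text
                    counts[" ".join(words[i:i + size])] += 1
    # (count, phrase) pairs, most frequent first, ties alphabetical
    return sorted(((n, p) for p, n in counts.items()),
                  key=lambda pair: (-pair[0], pair[1]))
```

Phrases never run across two titles, because each text is processed separately.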

Who are the key authors?


There are several ways of identifying key authors from bibliographic data. One is to ask: which authors in my sample have published most frequently? Another is to ask: which authors in my sample have been cited most often? Identifying key authors in terms of publication frequency can be done in the same manner as identifying keywords. Using Bibexcel, create a file containing the content of the author (AU) field:
1) Select the .doc file where it says Select file here.
2) In the Old Tag box type: AU
3) Under Select field to be analysed choose Any ; separated field (as author names are separated by a ; in the AU field).
4) Click on PREP.
5) Result: an .out file containing the authors' names.

You may need to take this .out file into Excel (or a suitable text editor) to clean the data (see Appendix 1 for instructions on how to use Excel for this purpose). In this dataset, for example, it seems odd that there is an author called ADLER PA and also one called ALDER PA; perhaps they are different people, but perhaps there is an error in the data. You need to do some research to satisfy yourself that there is no error, or else to change the data accordingly.[7] Once you are sure the .out file is suitable for analysis, to obtain a count of authors:
1) Select the .out file where it says Select file here.
2) In the Frequency distribution area, select Whole string from the list.
3) Tick Sort descending.
4) Click Start.
5) Result: a .cit file showing author frequency by name.

(To identify which authors are cited most often, see below, under Which authors, which papers, which journals, and which non-journals, are cited most frequently?)

[7] Data from the ISI Citation Indexes often contain variations, e.g. in the number of author initials recorded, or the actual initials recorded; there may be misspellings of names; and journal title abbreviations vary. As a result you may need to clean the records to ensure that errors or variants in the record data are corrected and the data are consistent throughout the file. This cleaning is important for a publishable bibliometric analysis, but for rougher work such as is the focus of this guide you can probably tolerate minor inaccuracies.

Which journals were these papers published in?


Using Bibexcel, create a file containing the content of the Journal title (SO) field:
1) Select the .doc file where it says Select file here.
2) In the Old Tag box type: SO
3) Under Select field to be analysed choose Whole field intact.
4) Click on PREP to make an .out file containing the journal names.
5) Result: an .out file listing the contents of the SO field.

To get a count of which titles appear most frequently, create a .cit file. Once you are sure the .out file is suitable for analysis:
1) Select the .out file where it says Select file here.
2) In the Frequency distribution area, select Whole string from the list.
3) Tick Sort descending.
4) Click Start.
5) Result: a .cit file of titles by frequency.

Which authors, which papers, which journals, and which non-journals, are cited most frequently?
The starting point for answering all these questions is the same: you need to create a file containing the cited records data, and then manipulate that. First create the .out file containing the cited records:
1) Select the .doc file where it says Select file here.
2) In the Old Tag box type: CD
3) Under Select field to be analysed choose Any ; separated field (records in the CD field are separated by a semi-colon).
4) Click on PREP.
5) Result: an .out file listing cited references, looking something like this:

1   ADLER PA, 1994, P377, HDB QUALITATIVE RES
1   ALBERT S, 1985, V7, P263, RES ORGAN BEHAV
1   ALDER PA, 1987, MEMBERSHIP ROLES FIE
1   BARNEY JB, 1986, V11, P656, ACAD MANAGE REV
1   BARON MW, 1995, KANTIAN ETHICS APOLO
1   BEISER FC, 1992, P26, CAMBRIDGE COMPANION
1   BOROWSKI PJ, 1998, V17, P1623, J BUS ETHICS
1   BOWEN SA, 2002, V2, P270, J PUBLIC AFFAIRS
1   BOWEN SA, 2004, V16, P65, J PUBLIC RELAT RES

To find out which sources have been cited most frequently you need to work with the whole of the citation information:
1) Select the .out file where it says Select file here.
2) Where it says Frequency distribution, select Cited reference from the drop-down box.
3) Click on Start.
4) Result: a .cit file listing citations by frequency, for example:

6   MITCHELL RK, 1997, V22, P853, ACAD MANAGE REV
6   FRIEDMAN M, 1962, CAPITALISM FREEDOM
5   JENSEN MC, 1976, V3, P305, J FINANC ECON
5   TREVINO LK, 1986, V11, P601, ACAD MANAGE REV
5   FERRELL OC, 1985, V49, P87, J MARKETING
5   RAWLS J, 1971, THEORY JUSTICE
5   WILLIAMSON OE, 1975, MARKETS HIERARCHIES
5   ROKEACH M, 1973, NATURE HUMAN VALUES

If you only want to look at citations of journal articles, then at step 2 above select Cited journal whole string; if you only want to look at citations of non-journal materials, select Cited non-journal. Bibexcel assumes that a cited document with a volume number is published in a journal (Persson et al 2009, p. 13), which, while generally satisfactory, may need checking (depending on your needs).

To find out which authors are cited most frequently, you have to separate out the author part of the citation record. This is easily done with Bibexcel:
1) Select the .out file where it says Select file here.
2) Where it says Frequency distribution, select Cited author from the drop-down box.
3) Click on Sort descending.
4) Click on Remove duplicates. An author might be cited more than once in the same publication (see BOWEN SA in the example above), but since you only want to count a cited author once for each paper in which they are cited, you need to remove duplicates.
5) Click on Start.
6) Result: a .cit file of cited author frequency. In this example the first few lines are as follows:

26   DONALDSON T
23   FRIEDMAN M
19   TREVINO LK
19   KOHLBERG L
19   CARROLL AB
17   2002
17   GRUNIG JE
16   SINGHAPAKDI A

Donaldson has been cited 26 times; Friedman 23 times; Trevino, Kohlberg and Carroll 19 times. The fact that 2002 appears suggests the list of citations needs editing to remove citations where there is a date instead of an author name. Simply open the .out file in any text editor, identify non-authors that have made it into the author field for some reason, and delete them. Then repeat the procedure to make a .cit file.
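The cited-author and journal/non-journal steps rest on two simple rules, which the sketch below makes explicit: the cited author is the part of the reference before the first comma, and a reference counts as a journal item if it has a volume element (e.g. V22), following the rule quoted from Persson et al (2009). Each author is counted once per citing record, and entries that are only digits (like the 2002 above) are dropped. This is an illustrative Python sketch, not Bibexcel's implementation; the function names are hypothetical.

```python
import re
from collections import Counter

# A volume element such as 'V22' marks a journal item; like the rule it
# mirrors, this may misclassify some references.
VOLUME = re.compile(r"V\d+$")


def cited_author(ref):
    """'MITCHELL RK, 1997, V22, P853, ACAD MANAGE REV' -> 'MITCHELL RK'."""
    return ref.split(",")[0].strip()


def is_journal(ref):
    """True if any element after the author looks like a volume number."""
    return any(VOLUME.match(part.strip()) for part in ref.split(",")[1:])


def cited_author_counts(refs_by_record):
    """Count each cited author once per citing record, most cited first."""
    counts = Counter()
    for refs in refs_by_record:
        authors = {cited_author(r) for r in refs}  # set = remove duplicates
        counts.update(a for a in authors if a and not a.isdigit())
    return sorted(((n, a) for a, n in counts.items()),
                  key=lambda pair: (-pair[0], pair[1]))
```

The input is one list of reference strings per citing record, i.e. the CD .out file grouped by record number.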

Which papers have a number of citations in common?


If different papers have a number of citations in common, this might indicate something worth investigating. (Since people cite for many reasons, it is not possible to say what having common citations indicates without further investigation.) Finding out which pairs of records (and thus authors of papers) have citations in common involves what bibliometrics calls bibliographic coupling, and is easily accomplished with Bibexcel.

First, prepare an .out file containing the cited references:
1) Select the .doc file.
2) Type CD in Old Tag.
3) Select Any ; separated field.
4) Press PREP.
5) Result: an .out file with each line containing the whole citation.

Second, create the file showing how many citations each pair of records has in common:
1) Select the .out file.
2) Click: Analyze / Shared units/coupling / shared units.
3) Result: a .cou file looking something like this:

20   16   24
18   1    26
16   15   36
15   14   53
10   14   14

Column 1 shows how many citations are shared by the pair of records indicated in columns 2 and 3 (e.g. records 16 and 24 have 20 citations in common). To make this easier to understand, you can replace the record numbers with the records' author names. First make a new .out file containing the authors' names:
1) Select the .doc file.
2) Type AU in Old Tag.
3) Select Whole field intact.
4) Press PREP.
5) Result: an .out file with each line containing the authors, one line for each record.

Second:
1) Select the .cou file.
2) Enter the FULL path of the .out file in the Type new filename here box, e.g. C:\Documents and Settings\Stephen\My Documents\bibexcel\tk\myfile.out (if you don't provide the full path of the .out file, Bibexcel will not be able to find it and will give you a Run time error message, and then close!). The easy way to do this:
a) Open the file explorer (My Computer) and navigate to the folder where the .out file is. The file path is shown where it says Address.
b) Click on (highlight) the file path and copy it.
c) Return to Bibexcel and paste the file path into the Type new filename here box.
d) At the end of the file path, type \ followed by the last part of the file name (e.g. \myfile.out).
3) Click Add data/Classify / Add labels to freq-docnr-docnr and then choose to replace the document (record) numbers (making freq-label-label), or to keep them (making freq-docnr-label-docnr-label).
4) Result: a .cdd (or .ad2) file (depending on which replacement option you chose) looking something like this:
19   Isaac RG; Wilson LK; Pitt DC                      Isaac RG; Wilson LK; Pitt DC
11   Al-Khatib JA; Robertson CJ; Lascu DN              Winter SJ; Stylianou AC; Giacalone RA
6    Adam AM; Rachman-Moore D                          Valentine S; Fleischman G
5    Cox P; Brammer S; Millington A                    Garriga E; Mele DN
5    Michelson G; Wailes N; van der Laan S; Frost G    Sparkes R; Cowton CJ

The .ad2 file retains the record numbers as well as the author names.
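Bibliographic coupling amounts to counting, for every pair of records, how many cited references the two lists share. Below is a minimal Python sketch, assuming each record's references are already available as a list of strings (e.g. from the CD .out file, grouped by record number). It compares whole strings exactly, so the data cleaning discussed earlier matters, and it skips pairs with nothing in common. The names are hypothetical.

```python
from itertools import combinations


def bibliographic_coupling(refs_by_record, min_shared=1):
    """Count shared cited references for every pair of records.

    refs_by_record[i] is the list of cited references of record i+1.
    Returns (shared, record_a, record_b) rows, largest first, in the
    freq-docnr-docnr layout of a .cou file. Records are numbered from 1.
    """
    ref_sets = [set(refs) for refs in refs_by_record]
    rows = []
    for (a, refs_a), (b, refs_b) in combinations(enumerate(ref_sets, start=1), 2):
        shared = len(refs_a & refs_b)  # references cited by both records
        if shared >= min_shared:
            rows.append((shared, a, b))
    rows.sort(key=lambda r: (-r[0], r[1], r[2]))
    return rows
```

Every pair is compared, so the work grows with the square of the number of records; for a few hundred records this is still quick.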

Are citations associated with each other? Which papers / authors tend to be cited together?
Authors writing about related topics often do not have unique sets of citations; instead, when they cite one particular source, they are likely to cite another particular source. We can find out from these bibliographic records how citations are associated, and thus what kinds of sources dominate a field. This involves performing co-citation analysis. There are four steps to performing a co-citation analysis on cited authors' names.

First, create an .out file containing the citations:
1) Select the .doc file.
2) Type CD in Old Tag.
3) Select Any ; separated field.
4) Press PREP.
5) Result: an .out file with each line containing the whole citation.

Second, count how often each author has been cited:
1) Select the .out file where it says Select file here.
2) Where it says Frequency distribution, select Cited author from the drop-down box.
3) Click on Sort descending.
4) Click on Remove duplicates.
5) Click on Start.
6) Result: a .cit file containing frequencies of cited authors.


Third, create a new .out file containing just the author names:
1) Select the .out file (containing the full citation information).
2) Where it says Frequency distribution, select Cited author from the drop-down box.
3) Click on Remove duplicates (an author might have been cited more than once in a paper, so their name will occur more than once in a bibliographic record; to analyse author co-citation you need to remove duplicate author names from each record).
4) Click on Make new outfile.
5) Result: a .oux file containing the author names.

Fourth, select the highly cited authors in order to make the co-citation analysis. (Normally you focus on highly cited authors, since co-citation is most evident for them, rather than look at all possible co-citations.)
1) Select the .cit file, then click on View file. The file should be visible in The List. Now click on the first line of the file (in The List) and select (highlight) further lines (Shift + down-arrow key) until you have selected several lines. It's up to you what cut-off point to choose.
2) Click: Analyse / Co-occurrence / Select units via listbox. This will remove all the entries from The List, except those you have highlighted.
3) Click on the .oux file in Select file here, but do not view it!
4) Click: Analyse / Co-occurrence / Make pairs via listbox.
5) Result: a .coc file containing co-citation information. Depending on which choices you made after step (4) above, the file might look like this:

8   DONALDSON T$20   FRIEDMAN M$13
7   CARROLL AB$12    FRIEDMAN M$13
5   CARROLL AB$12    DONALDSON T$20
5   DEGEORGE RT$9    DONALDSON T$20
5   DONALDSON T$20   RAWLS J$9

Here, Donaldson T and Friedman M are cited together 8 times in this data set; Donaldson is cited 20 times, and Friedman 13 times (you can choose not to have the citation frequency of each individual item in this output).
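The four co-citation steps can be condensed into one illustrative Python function: remove duplicate authors within each record, keep the top_n most cited authors (the cut-off you choose in The List), and count how many records cite each pair. The output imitates the .coc layout shown above. This is a sketch under assumptions, not Bibexcel's code; top_n and the function name are hypothetical, and ties at the cut-off are broken alphabetically.

```python
from collections import Counter
from itertools import combinations


def author_cocitation(authors_by_record, top_n=10):
    """Co-citation counts among the most frequently cited authors.

    authors_by_record[i] lists the cited authors of record i+1. Returns
    rows formatted like the .coc sample: 'freq AUTHOR A$count AUTHOR B$count'.
    """
    per_record = [set(authors) for authors in authors_by_record]  # remove duplicates
    cited = Counter(a for authors in per_record for a in authors)
    ranked = sorted(cited.items(), key=lambda pair: (-pair[1], pair[0]))
    keep = {author for author, _ in ranked[:top_n]}  # the highly cited authors
    pairs = Counter()
    for authors in per_record:
        # every pair of kept authors cited by this record is one co-citation
        for a, b in combinations(sorted(authors & keep), 2):
            pairs[(a, b)] += 1
    rows = sorted(pairs.items(), key=lambda p: (-p[1], p[0]))
    return [f"{n} {a}${cited[a]} {b}${cited[b]}" for (a, b), n in rows]
```

Sorting each record's authors before pairing means that a pair always appears in the same alphabetical order, so "DONALDSON T / FRIEDMAN M" and "FRIEDMAN M / DONALDSON T" are counted together.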
Visualising the co-citation output

Textual representation of co-citations (as above) may be difficult to use. For example, from the .coc file fragment above we can see that Donaldson is paired with Friedman, Carroll, Degeorge and Rawls, but this information occurs on separate lines. We can use network mapping programs to produce a diagram showing the network of relations amongst these authors. Pajek is one of several free network visualization programs, and is easy to use with Bibexcel because Bibexcel has routines to create the Pajek input files (the following is drawn from Persson et al 2009; see the paper for more details). To create a network map of the co-citation data in Bibexcel:
1) Perform the analysis as above, i.e. create a .coc file and a .cit file.
2) Select the .coc file and click on To pajek / Create net-file from coc-file (answer No at the prompt, as the results of co-citation analysis are not directed arcs).
3) Result: a .net file is created.
4) Select the .cit file and click on To pajek / Create vec-file.
5) Result: a .vec file is created.
6) Select the .coc file and click on Analyse / Co-occurrence / Cluster pairs.
7) Result: 3 files containing information about cluster pairs in the .coc file: these are .per, .pe2 and .pe3 files.
8) Select the .pe2 file and click on To pajek / Create clu-file.

Now run Pajek, and:
1) Under Networks click on the folder icon, navigate to the folder containing the Pajek files you've just created, and open the .net file (this should be the only one Pajek identifies at this point).
2) Under Partitions click on the folder icon, and open the .clu file.
3) Under Vectors click on the folder icon, and open the .vec file.
4) Click on Draw / Draw-partition-vector.
5) Result: a new window with the initial network map or diagram of the co-citations.
6) Clicking on Layout / Energy / Kamada-Kawai / Free produces a more easily understandable layout.
7) Experiment! For this test data set one result is:

Fig: Co-citation map 1 (Pajek)

To produce this result I experimented with Options / Size..., and the vertices (i.e. the circles) were marked using vector values, the same values as in Bibexcel's .cit file. Interpreting these network diagrams fully requires further work. However, it appears from this diagram that there is a single network of authors/citations, dominated by Donaldson. This is not surprising given the focused nature of the sample (100 records from one journal). So you might expect Donaldson's ideas to play a role in papers in this journal (what role remains to be identified). If you want to publish in the journal, you'd perhaps be well advised to cite Donaldson!

You can also load the .net file into *ORA, another network visualization program. Like Pajek, this is a complex program, the full use of which requires an understanding of network analysis concepts. Basic *ORA use is quite intuitive, however. In the map display it is quite easy to reduce the amount of information displayed interactively (e.g. Hide links with weights less than, then choose a value) to produce a simplified version of the map.

Fig: Co-citation map 2 (*ORA)

Here, for example, links with weight <3 have been hidden, leaving Williamson, Trevino and Kohlberg isolated from the other authors, which suggests they are less central to the network. The program also provides a variety of forms of output about the network in addition to the maps. For example, selecting Measure Charts produces a great many measures, such as Boundary spanner, Potential: network, a number of measures of centrality, and so on (understanding just what these concepts mean requires further investigation of the program and of these types of measures). (Some of these, and other relevant measures, may be available in Pajek, but I have not explored either program properly.)
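The .net and .vec files passed to Pajek are plain text, so they can also be written by a script. The sketch below assumes Pajek's basic formats: a *Vertices section listing numbered, quoted labels, an *Edges section of "from to weight" lines (undirected, matching the No answer about directed arcs above), and a vector file of one value per vertex. The min_weight parameter is a hypothetical addition that drops weak links, similar in spirit to hiding links below a weight in *ORA. Function and parameter names are hypothetical.

```python
def write_pajek(pairs, counts, net_path, vec_path, min_weight=1):
    """Write a Pajek network (.net) and vector (.vec) file.

    pairs:  {(author_a, author_b): co-citation count}
    counts: {author: times cited}, used as the vertex-size vector.
    Edges with a weight below min_weight are left out, but their
    authors remain as (possibly isolated) vertices.
    """
    names = sorted({name for pair in pairs for name in pair})
    index = {name: i for i, name in enumerate(names, start=1)}  # Pajek numbers from 1
    with open(net_path, "w", encoding="utf-8") as net:
        net.write(f"*Vertices {len(names)}\n")
        for name in names:
            net.write(f'{index[name]} "{name}"\n')
        net.write("*Edges\n")  # undirected links
        for (a, b), weight in sorted(pairs.items()):
            if weight >= min_weight:
                net.write(f"{index[a]} {index[b]} {weight}\n")
    with open(vec_path, "w", encoding="utf-8") as vec:
        vec.write(f"*Vertices {len(names)}\n")
        for name in names:  # same order as the .net vertices
            vec.write(f"{counts.get(name, 0)}\n")
```

Because the vertex list is built from all pairs, raising min_weight leaves weakly linked authors on the map as isolated points, much like Williamson, Trevino and Kohlberg in the *ORA example.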

In conclusion
This guide shows how you can analyse bibliographic records, particularly those obtained from the Web of Science, to help guide your search and selection choices. The analyses can also be used to describe your literature review data collection and selection processes thoroughly. If you lack the resources to conduct a systematic literature review, as most of us do, you can at least review the literature systematically: describing the characteristics of your sample, justifying the selections made, and providing sufficient information about your methods to permit others to replicate this part of your review. In addition, you have learned some of the rudiments of bibliometrics, and will perhaps want to pursue this skill further (see below).

All instructions and software mentioned in this guide were tested in April 2010. If you find any mistakes please email me: gourlaysn@kingston.ac.uk If you have difficulties using the software, read the manuals and help-files (and I may be able to help).

References and resources


Persson, O., Danell, R., & Schneider, J. W. 2009, How to use Bibexcel for various types of bibliometric analysis, in Åström, F. et al (eds) Celebrating scholarly communication studies. International Society for Scientometrics and Informetrics, e-zine Special issues Vol. 05-S, pp. 9-24 (http://www.issi-society.info/ollepersson60/ollepersson60.pdf accessed 14 April 2010).
Free software links

Bibexcel: http://www8.umu.se/inforsk/Bibexcel/
AntConc: http://www.antlab.sci.waseda.ac.jp/software.html
Pajek: http://pajek.imfm.si/doku.php
*ORA: http://www.casos.cs.cmu.edu/projects/ora/
(Another free text analysis program like AntConc is TextSTAT, perhaps simpler to use, but you cannot create a stop-word list: http://neon.niederlandistik.fu-berlin.de/en/textstat/ .) (Links correct on 18 April 2010. You should always check for updated software.)
Further information

For further information on informetrics, bibliometrics and scientometrics see e.g.:
http://en.wikipedia.org/wiki/Informetrics (this has links to the other terms)
http://www.ischool.utexas.edu/~palmquis/courses/biblio.html (a short outline of bibliometrics)
http://www.norslis.net/2004/Bib_Module_KUL.pdf (a lengthy bibliometrics course guide)
http://users.fmg.uva.nl/lleydesdorff/software.htm (bibliometrics course materials; more free software for bibliometric analyses)

Some examples of bibliometric analysis


Acedo, F.J., Barroso, C. & Galan, J.L., 2006. The resource-based theory: Dissemination and main trends. Strategic Management Journal, 27(7), 621.
Cornelius, B., Landström, H. & Persson, O., 2006. Entrepreneurial Studies: The Dynamic Research Front of a Developing Social Science. Entrepreneurship Theory and Practice, 30(3), 375-398.
Danell, R., 2000. Stratification among Journals in Management Research: A Bibliometric Study of Interaction between European and American Journals. Scientometrics, 49(1), 23-38.
De Bakker, F.G.A., Groenewegen, P. & Den Hond, F., 2005. A Bibliometric Analysis of 30 Years of Research and Theory on Corporate Social Responsibility and Corporate Social Performance. Business & Society, 44(3), 283-317.
Gu, Y., 2004. Global knowledge management research: A bibliometric analysis. Scientometrics, 61(2), 171-190.
Smith, G.M., 1977. Key Books in business and management studies: a bibliometric analysis. Aslib Proceedings, 29(5), 174-188.

Appendix 1: Standardizing the .out file list in Excel (for author names)
Open the .out file in Excel (the example assumes Excel 2007) and select the column containing the authors' names.

To separate the initials from the whole name: click the Data tab, then Text to Columns. Make sure the Original data type is Delimited (then click Next), identify Space as the delimiter (then click Next), and click Finish. The initials will now be in a separate column.

To check whether any surnames occur with more than one variant of the initials: sort the data on the surname (making sure you sort the whole spreadsheet, not just the surname column). Then inspect the data to see whether the same surname occurs with both one and two initials, or whether there are other reasons to suspect that someone's name occurs twice.

To reduce the initials to one (i.e. the first): assuming column C holds the initials, place the cursor in cell D1, enter =LEFT(C1) and press Return. This should put the first initial into D1. Then copy the formula down to all cells in that column. You now need to replace the formulas with their actual values: select the whole column, copy it, choose Paste Special, select Values, and paste these into another column.

To re-create the surname + first initial string: in an empty column, select the first cell and type =CONCATENATE(A1," ",B1) (where A1 stands for the surname cell and B1 for the single-initial cell; substitute the columns your data actually occupies). This creates a new value consisting of surname + space + initial. Copy this formula to the other cells in the column, then convert it from formulas to values (as above).

To finish off: delete any redundant columns so that the spreadsheet consists only of the first column (the record numbers) and the surname + initial column. Sort this again on the first column (i.e. return it to the order it was in when you imported it). Make sure only the relevant columns and rows are saved, and save it in the same format as the original .out file (for simplicity, just replace the .out file).

The file is now ready for use in Bibexcel again (make sure you keep it in tab-delimited format). This procedure can be used for working on other text columns.
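For readers who prefer a script to the Excel steps, the same standardization can be sketched in Python. This is a minimal sketch, not part of Bibexcel: it assumes the .out file is tab-delimited, with the record number in the first field and an author name written as "Surname Initials" (e.g. "Gourlay SN") in the second, and all function names are hypothetical. Unlike Text to Columns on Space, it splits at the last space so that multi-word surnames such as "De Bakker FGA" stay whole; a multi-word surname with no initials at all would still be mis-split.

```python
from collections import defaultdict


def split_name(name):
    """Split 'Surname Initials' at the last space, e.g. 'De Bakker FGA' -> ('De Bakker', 'FGA').

    Assumption: the initials follow the surname after a single space. A name
    with no space is treated as a surname with no initials.
    """
    name = name.strip()
    if " " not in name:
        return name, ""
    surname, initials = name.rsplit(" ", 1)
    return surname, initials


def first_initial_key(name):
    """Reduce a name to surname + space + first initial (the =LEFT then =CONCATENATE steps)."""
    surname, initials = split_name(name)
    return f"{surname} {initials[0]}" if initials else surname


def surname_variants(names):
    """Return surnames that occur with more than one spelling of the initials, for inspection."""
    groups = defaultdict(set)
    for name in names:
        surname, _ = split_name(name)
        groups[surname].add(name.strip())
    return {s: sorted(v) for s, v in groups.items() if len(v) > 1}


def standardize_lines(lines):
    """Standardize the second tab-separated field of each line, keeping the original row order."""
    result = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1:
            fields[1] = first_initial_key(fields[1])
        result.append("\t".join(fields))
    return result


def standardize_out_file(in_path, out_path, encoding="utf-8"):
    """Read a tab-delimited .out file and write a standardized, still tab-delimited copy."""
    with open(in_path, encoding=encoding) as src:
        lines = [line for line in src if line.strip()]
    with open(out_path, "w", encoding=encoding) as dst:
        dst.write("\n".join(standardize_lines(lines)) + "\n")
```

Running `surname_variants` first reproduces the "inspect the sorted list" step: for `["Persson O", "Persson OE", "Danell R"]` it returns `{'Persson': ['Persson O', 'Persson OE']}`. If your files are not UTF-8, pass a different `encoding` to `standardize_out_file`.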


Appendix 2: Creating a stop-word list in AntConc


Text files contain many words, such as "a", "the", "an" and "and", that are usually of little interest for coding or for identifying common words and concepts. You may decide to create a list of these so that they are excluded from word counts. Such a list is called a stop list. Assuming you have loaded some files:
1) click on the Word List tab;
2) click on Start;
3) result: a frequency list of the words in the documents.
4) Save this output: File / Save output to text file.
5) Open this file in Excel: you should see that the words are all listed in one column. Select the words to include in the stop list (usually those at the top of the list) and copy them. Open a new text document (or a new sheet in Excel), paste the words in, and save it with a suitable name, e.g. stoplist.txt, in the AntConc program folder. (You can save it anywhere; putting it here is just for convenience.)

If you later find other words you want to put in the list, you can either enter them manually in the window where the words occur, or edit and then reload the stop list file. You can edit this file to add or delete words.

To use this in AntConc:
1) click on Tool Preferences / Word List;
2) under Word Range List options, click on the Use a stoplist button, open the stoplist.txt file using the Open button, and then click Apply.

Fig : Setting up a stop-word list
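The AntConc procedure above (build a frequency list, pick the most frequent words, save them one per line, then exclude them from counts) can also be sketched in Python for readers who want to script it. This is not how AntConc works internally: it assumes a word is a run of letters, lower-cased, and that the stop list is a plain text file with one word per line. The `top_n` cut-off and all function names are hypothetical, and the candidate words should still be reviewed by hand, as the guide suggests, since some frequent words may be ones you want to keep.

```python
import re
from collections import Counter

# Assumed token definition: runs of letters (Unicode-aware), lower-cased.
# AntConc's own token definition is configurable, so its counts may differ.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def word_frequencies(texts):
    """Build a word frequency list over all texts, similar to the Word List tab."""
    counts = Counter()
    for text in texts:
        counts.update(TOKEN_RE.findall(text.lower()))
    return counts


def candidate_stoplist(counts, top_n=20):
    """Return the top_n most frequent words as stop-list candidates to review by hand."""
    return [word for word, _ in counts.most_common(top_n)]


def save_stoplist(words, path="stoplist.txt"):
    """Write the stop list as plain text, one word per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(words) + "\n")


def load_stoplist(path="stoplist.txt"):
    """Read a one-word-per-line stop list, ignoring blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def filtered_frequencies(counts, stoplist):
    """Return a copy of the frequency list with the stop words removed."""
    return Counter({word: n for word, n in counts.items() if word not in stoplist})
```

A typical sequence would be `counts = word_frequencies(texts)`, then `save_stoplist(candidate_stoplist(counts, 50))`, then editing stoplist.txt by hand, and finally `filtered_frequencies(counts, load_stoplist())`.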
