JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of
content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms
of scholarship. For more information about JSTOR, please contact support@jstor.org.
American Association for Public Opinion Research and Oxford University Press are collaborating with JSTOR
to digitize, preserve and extend access to The Public Opinion Quarterly.
http://www.jstor.org
The Content Characteristics of Best-Selling Novels
BY JOHN HARVEY
The subject of this paper is the puzzle of
literary success. Discarding the subjective approach, the author has, in the research here
described, sought to isolate content characteristics which differentiate best- from poor-sellers.
Although perfect prediction of book sales will
require continued inquiry, he finds that certain content variables do appear to be associated with sales figures.
John Harvey is Librarian at Parsons College
in Fairfield, Iowa.
"WHAT makes a best seller?" is a perennial question to which countless answers have been offered by publishers, booksellers, librarians,
and authors. However, none of their hypotheses and opinions have
demonstrated why one book sells better than another, nor is it yet possible to predict the sales of any book on the basis of tested criteria.
The study on which this article is based confined itself to best-selling novels.1
The popularity of these novels might be explained in terms
of their timeliness, their authors, their sales promotion, or the prestige
attached to having read them. By contrast, the present study concentrated on the content of novels in an attempt to isolate content characteristics differentiating best sellers from similar novels which failed to
sell so well.
PREVIOUS APPROACHES TO THE PROBLEM
There have been numerous approaches to the "why?" of best sellers. The historical approach is best illustrated by Frank Luther Mott's
history of American best sellers, which contains two chapters on causation.2 Mott found sensationalism, themes of religion and adventure, and
a sentimental treatment of subject matter recurring frequently in best
sellers. His major thesis, however, was the frequently encountered one
that there is no "best seller formula" and consequently that lists of important variables are of little value.
1 Harvey, John F., The Content Characteristics of Best-Selling Novels, unpublished Ph.D.
dissertation, Graduate Library School, University of Chicago, 1949.
2 Mott, Frank Luther, Golden Multitudes; The
Story of Best Sellers in the U.S., New York:
Macmillan, 1947.
CONTENTS OF BEST SELLERS
Kappel and J. V. M. Berreman are more important because these pitfalls were avoided and reliable analyses obtained. Joseph Kappel studied
the literary quality of the best sellers and book club selections of the last
twenty years.8 He used Book Review Digest plus and minus ratings
of reviews as his measure of literary quality and obtained from them annual index numbers showing the degree of favorability shown by the
reviews. These index numbers were obtained for all Book of the Month
Club and Literary Guild selections, for a random sample of each year's
new books, and for all best sellers (both fiction and non-fiction) of the
years 1926 through 1941. The annual favorability index numbers were
then plotted on charts to show comparisons among the groups of books
for each year. These charts showed that from a literary standpoint, best
sellers as well as Book of the Month Club and Literary Guild selections
ranked above the random sample (or average) each year. Kappel's
study thus tended to suggest that neither the best sellers nor the book
clubs have reduced the general quality of books in recent years.
ferences in its publicity factors, but he could not equally predict differences between the sales of the top best sellers (the "super" best sellers)
and those lower on the best seller lists. In short, his publicity factors
explained only gross differences in sales.
Berreman devoted but little attention to content differences. On
the basis of reviews, he grouped sixty novels into twelve classes by setting, theme, and treatment of theme. He concluded that content probably affected the sale of poor sellers which were well-marketed as well
as best sellers which sold well despite feeble efforts to market them.
He also pointed out the probable usefulness of content in any attempt
to improve the correlations with sale shown in Table 1.
TABLE 1
BERREMAN'S CORRELATIONS BETWEEN PUBLICITY FACTORS AND SALE
Columns: Publicity Factors; 1935 Random Sample From Publishers Lists (N, r); 1933-1938 Best Seller Titles (N, r)
Rows: Advertising; Author prestige; Review wordage; Review favorability
Entries as printed (row and column positions not recoverable): +.28, 77, +.79, 228, .65, 71, .48, 77, 77, .52, 50, 50, .10, .09, .12, 50
a. Action: 15 different variables
b. Emotion: 50 different variables
c. Personalities of the Major Characters: 750 different variables
d. Plot Themes: 350 different variables
e. Romanticization: 50 different variables
f. Simplicity: 32 different variables
Action. Action referred to any change or movement of the plot,
setting, or characters. An action was shown whenever the plot themes
changed, part of the setting moved (as for instance a door slammed),
the scene of action changed, or a character moved. Such measures as the
following were used: frequency of movement by each major character
per 100 character lines, the frequency with which the reader's attention
was shifted from character to character, and the frequency per page
with which the scene shifted to a new location.
Emotion. The pattern of emotion referred to the amount, kind,
and use of the emotion expressed in each novel. It included the amount
of certain basic emotions shown by each character, the frequency of
change from one emotion to another, and the percentage of the lines
with an emotional charge.
Personalities of the Major Characters. This group of variables included many different aspects of each character's life: for example, the
character's age, occupation, education, nationality, social status, social
traits, morals, and prominence in the novel.
Plot Themes. The pattern of the plot themes referred to the inevitably differing themes used in different novels. The list of themes
was taken from the table of contents of Elbert Lenrow's Readers Guide
to Prose Fiction, which contains approximately 350 different themes.11
The coverage of this list is indicated by the following outline: entertainment and escape, the individual and his personal environment (people and their personal problems), and the individual and his social
environment (social, political, economic, vocational, religious, and
philosophical problems).
Romanticization. Examples of variables in this category included
extremes of each character's wealth or poverty, the contrast between income level at the beginning and end of the novel, the physical attrac-
11 Lenrow, Elbert, Readers Guide to Prose Fiction, New York: Appleton-Century, 1940, pp.
vii-xi.
OF THE METHOD OF SAMPLING THE NOVELS
Factors
1. Lines of central male character's level 1 emotion
   a. Unhappiness
   b. Love
   c. Anger
2. Number of major characters toward whom central male character's attitude is
   a. Affectionate
   b. Respectful
3. Sentimental theme
Percentages as printed (row and column positions not recoverable): 97%, 100, 98, 96%, 100, 99, 92%, 100, 96%, 99, 97%, 100, 97, 96, 100, 100, 100, 100, 89, 89, 93, 100, 100, 100, 90, 100, 89, 93, 97, 90
Explanation
This table shows the results of rating 30 new sample pages from each of five novels for six
of the variables found to be good discriminators. The percentages show the proportion of lines,
pages, or characters which were rated the same way, or produced the same ratings, for the new
sample as for the original sample. In other words, the table shows the percentage of agreement
between the new and original samples for each of the six factors. Totals of 4850 lines and 29
characters were rated on 150 new sample pages.
Reliability of the Variables. The objectivity of the present description of content was also proven for the above six variables in tests of
their reliability. The tests indicated whether these variables were being
measured objectively or subjectively: whether they could be measured
consistently by the same and other raters or whether they were ill-defined and the measurement was erratic.
The reliability of the ratings could be tested by two methods: by
examining the consistency of the original rater, and by comparing the
observations of several raters. Both methods were employed in the present tests. The test of internal consistency (testing the percentage of
lines classified the same way several months later by the original rater) indicated a fairly high degree of agreement between the two ratings.
This is summarized in Table 3.
TABLE 3
RELIABILITY OF SIX IMPORTANT VARIABLES

             Original Rater                      Four Raters
(1)          (2)       (3)         (4)          (5)       (6)         (7)
Variables    Total     Agreement   %            Total     Agreement   %
             Ratings   Ratings     Agreement    Ratings   Ratings     Agreement
1a           2000      1983         99%         28        28          100%
1b           2000      1996        100          28        24           86
1c           2000      1981         99          28        26           93
2              35        31         88          28        27           96
3              46        40         87          28        25           89
4              90        83         92          28        26           93
Code
1. Central male character's level 1 emotion
a. Unhappiness (level 1)
b. Love (level 1)
c. Anger (level 1)
2. Number of major characters toward whom central male character's attitude is affectionate
3. Number of major characters toward whom central male character's attitude is respectful
4. Sentimental theme
Explanation
Columns 2 and 5 show the total agreement or "correct" ratings plus the total disagreement
ratings; disagreement ratings included both errors of commission (when a variable was added
where it did not belong) and omission (where a variable was not found in its proper place).
Columns 3 and 6 show the number of agreement ratings.
Columns 4 and 7 show the percentage which columns 3 and 6 are, of columns 2 and 5, respectively.
The original rater re-rated forty pages for variables 1a, 1b, and 1c, ninety pages for variable
4, and eight novels involving the attitudes of fourteen major characters toward eighty-one other
major characters for variables 2 and 3. The four student raters rated seven pages for each
of the variables.
In columns 2 and 3 the ratings should be read in this manner: for variables 1a, 1b, and 1c,
the figures represent numbers of lines rated; for variables 2 and 3 the figures represent the
numbers of major characters rated; for variable 4 the figures represent numbers of sample
pages.
In columns 5 and 6 the ratings should be read in this manner: for all six variables the
figures represent the total numbers of pages on which each variable was rated by the summed
ratings of all four raters.
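
The percentage columns of Table 3 are simple ratios of agreement ratings to total ratings. A minimal sketch of that arithmetic follows, using the four-rater counts (columns 5 and 6) as they can be read from the table. The function name and the rounding to the nearest whole per cent are assumptions made for illustration, not the author's stated procedure.

```python
def percent_agreement(agreements: int, total: int) -> int:
    """Return agreement ratings as a whole-number percentage of total ratings.

    Rounding to the nearest whole per cent is an assumption for this sketch.
    """
    if total <= 0:
        raise ValueError("total ratings must be positive")
    return round(100 * agreements / total)


# (total ratings, agreement ratings) for the four raters, columns 5 and 6 of
# Table 3 as read from the table; 28 = four raters times seven pages.
four_rater_rows = {
    "1a": (28, 28),
    "1b": (28, 24),
    "1c": (28, 26),
    "2": (28, 27),
    "3": (28, 25),
    "4": (28, 26),
}

percentages = {
    variable: percent_agreement(agree, total)
    for variable, (total, agree) in four_rater_rows.items()
}

for variable, pct in percentages.items():
    print(f"variable {variable}: {pct} per cent agreement")
```

The same function applies to the original rater's columns, although a printed figure may differ by one point from a value rounded this way.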
of the novels analyzed early in the study, Bottome's Mortal Storm. Definitions were then explained to three additional raters and their ratings
obtained for the seven pages. Table 3 indicates that the six variables
showed a high degree of reliability.
Pearson's Chi-Square (χ²) test of the dispersion of individual scores
about an expected score was also applied to the observations of the four
raters. This test measured the likelihood that the differences between
raters in the scores for each variable were actually significant. If the
scores were not significantly different, then the variations were due to
chance factors alone.
The mean score for all four raters was used as the "expected" score
for each variable. The mean was used for this purpose because there was
no expected score in the sense that one rating or one rater was necessarily correct and all others incorrect. For each of the six variables tested,
results indicated that there was a probability of from .60 to 1.00 that
the differences shown between the raters were due to chance. When
the variables were lumped together and their total tested, results were
equally favorable and the ratings reliable. There was a probability of
.80 to .90 that the differences were due only to chance factors; these
variables, therefore, were rated reliably in this test.
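
The chi-square test described above can be sketched as follows, taking the mean of the four raters' scores as the expected score. The individual rater scores are not reported in the article, so the scores below are hypothetical. The use of three degrees of freedom (four raters minus one) is likewise an assumption, and the closed-form tail probability applies only to that case.

```python
import math


def chi_square_about_mean(scores):
    """Chi-square statistic for scores dispersed about their own mean."""
    expected = sum(scores) / len(scores)
    if expected <= 0:
        raise ValueError("the expected (mean) score must be positive")
    return sum((score - expected) ** 2 / expected for score in scores)


def chi_square_tail_df3(statistic):
    """Upper-tail probability of a chi-square variate with 3 degrees of freedom.

    Closed form valid for 3 degrees of freedom only (assumed here for four raters).
    """
    if statistic < 0:
        raise ValueError("the statistic cannot be negative")
    return (math.erfc(math.sqrt(statistic / 2))
            + math.sqrt(2 * statistic / math.pi) * math.exp(-statistic / 2))


# Hypothetical scores given by four raters to one variable on the same pages.
hypothetical_scores = [7, 6, 5, 6]

statistic = chi_square_about_mean(hypothetical_scores)
p_chance = chi_square_tail_df3(statistic)

print(f"chi-square = {statistic:.3f}")
print(f"probability that the differences are due to chance = {p_chance:.2f}")
```

A high probability, as with these hypothetical scores, corresponds to the article's reading that the raters did not differ significantly.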
THE NOVELS
neutralized, or held constant in the sampling operation. It was, therefore, required that the poor sellers used in the sample be as well endowed with publicity advantages as were the best sellers; in other words,
best and poor sellers were equated or matched on publicity factors so
that the influence of content on sale could be studied directly. In addition to the influence of publicity factors, it seemed probable that such
factors as the novels' themes, wordages, publication dates, and perhaps
even favorability of reviews were also involved. Consequently, these
factors were matched for best and poor sellers in the same way that
publicity factors were matched.
This matching of novels constituted a "matched sample" from
the best and poor seller universes. Use of the matched sample suggested
the desirability of matching and analyzing best and poor sellers, title by
title. Accordingly, each best seller was closely matched with a poor
seller to form a pair for purposes of analysis.
Selecting the Matched Sample: Publicity Factors. The closest feasible matching of publicity factors required use of the weights which
Berreman developed for his formula. Berreman did not explain his
weights in detail, nor how he arrived at his values, nor did he define
his categories. It was therefore necessary to adapt his procedure (as it
could be understood) to the present study, and in some cases even to
modify the original weights. Each novel was scored on each publicity
factor separately and a weight assigned; then all the weights were
summed for the novel and the total taken as the novel's final score on
publicity. A criterion was adopted for each pair of novels which stated
that the total of the poor seller scores on publicity must always equal,
if not exceed, that of the best seller. Pairs in which this criterion was
not met were eliminated before the analysis started. The publicity factors, sources of information on them, and the weighting used in the
present study are given in Table 4; total possible score is 9.5 points.
TABLE 4
SCORE CARD FOR WEIGHTING ON PUBLICITY FACTORS

Factor; Source; Weighting

[Opening entry incomplete as printed.] Source: Publishers Weekly. Weightings printed: 1.0, 3.0
   Literary Guild; periods 1927-1936, 1937-1943, 1944-1946. Weightings printed: 0.0, 1.0, 3.0
B. Pulitzer Prize. Source: World Almanac. Weighting: 2.0
C. Pre-publication and first month advertising in the following journals: New York Herald Tribune Sunday book review section; New York Times Sunday book review section; Publishers Weekly; Saturday Review of Literature. Source: Direct measurement.
   1. Total of more than eight pages: 1.5
   2. Total of five to seven and nine-tenths pages: 1.0
   3. Total of three to four and nine-tenths pages: 0.5
D. Author popularity: Inclusion of author on Hackett lists during the preceding ten years for best selling novels with similar themes. Source: Hackett: Fifty Years of Best Sellers. Weighting: 1.0
E. Wordage of reviews in the following journals: Nation; New York Herald Tribune Sunday book review section; New York Times Sunday book review section; New Yorker; Saturday Review of Literature; Springfield Republican; Time. Source: Direct measurement.
   1. Total of more than 5500 words: 1.0
   2. [entry incomplete as printed]: 0.5
F. [entry incomplete as printed]
poor seller in eleventh place, just off the list. Such a pairing would
scarcely serve to compare a best with a poor seller because their sales
would have been so nearly the same. It was accordingly decided that
the best and poor seller in each pair should be separated by at least a
4:1 ratio in sale.
In the absence of any reliable figures on actual sale, an approximation was obtained by summing the monthly decimal scores for each
novel in the Publishers Weekly and converting them to whole numbers;
the total thus obtained (hereafter referred to as the PW score) served
as an estimate of the sale of each novel in relation to other novels. These
monthly decimals or percentages were collected for the duration of
each novel's stay on the lists, but not exceeding two years following its
publication date; the two year limit gave every novel the same chance
to make its score.
Examples of the Pairing and Analysis Procedures. An example of
this scoring for a pair of novels would be in order here. One pair was
Langley's A Lion Is in the Streets (the best seller), and Warren's All
the King's Men (the poor seller). Langley's book appeared on the Publishers Weekly charts for six months, from June through November
of 1945. During this period its monthly percentages were 43, 65, 65, 44,
37, and 26. When these scores are summed, the total, 280, is the final
PW score for this novel. Warren's novel appeared on the charts for
only one month, September, 1946; its score for this month was 20 per
cent. Therefore, the best seller scoring 280 and the poor seller scoring
20 make a ratio of 14:1 which is sufficiently
high to meet the criterion.
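
The PW score arithmetic in this example can be sketched as follows. The function names are illustrative only, and the caller is assumed to pass only those monthly percentages that fall within two years of the novel's publication date.

```python
def pw_score(monthly_percentages):
    """Sum a novel's monthly Publishers Weekly percentages into its PW score."""
    return sum(monthly_percentages)


def meets_sales_ratio(best_seller_score, poor_seller_score, minimum_ratio=4.0):
    """Check that the best seller outscored the poor seller by at least 4:1."""
    return best_seller_score >= minimum_ratio * poor_seller_score


# Monthly percentages quoted in the text for the Langley-Warren pair.
langley = pw_score([43, 65, 65, 44, 37, 26])  # June through November, 1945
warren = pw_score([20])                       # September, 1946

ratio = langley / warren

print(f"Langley PW score: {langley}")
print(f"Warren PW score: {warren}")
print(f"ratio: {ratio:g}:1, criterion met: {meets_sales_ratio(langley, warren)}")
```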
To clarify the pairing procedure, Table 5 shows the raw scores of
the two novels on all of the matching factors. Comparison of the raw
scores in Table 5 to the weights in Table 4 will allow calculation of the
scores on the publicity factors. Langley scored as follows: advertising,
1.0; and reviewing, 0.5; while Warren scored 2.0 on the Pulitzer prize,
0.5 on advertising, 1.0 on reviewing, and 0.5 on references. These totals
are then 1.5 for Langley and 4.0 for Warren, thereby meeting the
criterion that the poor seller score equal if not exceed that of the best
seller. In general, the Langley-Warren
pair was well matched on all factors. Figure 1 summarizes the gamut which all the pairs were forced to
run and the steady decline in the number of pairs still acceptable after
each requirement was put forth.
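
The publicity scoring of the Langley-Warren pair can be sketched as follows. The weights are those quoted in the text; the dictionary keys and function names are illustrative labels, not the author's terms.

```python
def publicity_score(weights):
    """Sum the weights a novel earned on the separate publicity factors."""
    return sum(weights.values())


def pair_is_acceptable(best_seller_score, poor_seller_score):
    """The poor seller's publicity score must equal or exceed the best seller's."""
    return poor_seller_score >= best_seller_score


# Weights quoted in the text for each novel.
langley_weights = {"advertising": 1.0, "reviewing": 0.5}
warren_weights = {
    "pulitzer_prize": 2.0,
    "advertising": 0.5,
    "reviewing": 1.0,
    "references": 0.5,
}

langley_total = publicity_score(langley_weights)
warren_total = publicity_score(warren_weights)

print(f"Langley (best seller): {langley_total}")
print(f"Warren (poor seller): {warren_total}")
print(f"pair acceptable: {pair_is_acceptable(langley_total, warren_total)}")
```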
FIGURE 1
SUMMARY OF MATCHING PROCEDURE WHICH YIELDED TWENTY-TWO PAIRS OF NOVELS
[Bar chart; vertical axis: number of pairs, scaled from 20 to 220.]
TABLE 5
SCORES OF LANGLEY'S A Lion Is in the Streets AND WARREN'S All the King's Men

Publicity Factors
Book club selection. Langley: Not a selection. Warren: Not a selection.
Pulitzer prize. Langley: No. Warren: Won prize for 1946.
Advertising pages:
   Publishers Weekly. Langley: 4 pages. Warren: 2 pages.
   New York Herald Tribune. Langley: 1 1/2 pages. Warren: 3/4 pages.
   New York Times. Langley: 1 page. Warren: 1 page.
   Saturday Review of Literature. Langley: 1 page. Warren: 1 page.
   Total pages. Langley: 7 1/2. Warren: 4 3/4.
Author prestige (on previous Hackett lists). Langley: Never. Warren: Never.
Review wordage:
   Nation. Langley: 0 words. Warren: 1400 words.
   New York Herald Tribune. Langley: 1200 words. Warren: 900 words.
   New York Times. Langley: 1050 words. Warren: 1 50 words.
   New Yorker. Langley: 200 words. Warren: 100 words.
   Saturday Review of Literature. Langley: 950 words. Warren: 850 words.
   Springfield Republican. Langley: 500 words. Warren: 800 words.
   Time. Langley: 0 words. Warren: 750 words.
   Total words. Langley: 3900.
Serialization. Langley: Never.
Condensation. Langley: Never.
Author reputation in another field. Langley: None.
References to novel:
   New York Times Index. Langley: 0 references.
   Readers Guide. Langley: 3 references.
   Total references. Langley: 3.

Non-Publicity Factors
Hackett list. Langley: 6th in 1945.
Major theme. Langley: Huey Long biography.
From 1930-1946 period. Langley: May, 1945.
Five-year interval. Langley: 15 month interval.
Wordage in novel. Langley: 150,000 words.
Order of publication. Langley: Published in 1945. Warren: Published in 1946.
4:1 ratio in PW score. Langley: 280 PW score. Warren: 20 PW score.
Review favorability (+ and other ratings), counts as printed (positions not recoverable): 5 reviews, 0 reviews, 7 reviews, 3 reviews, 0 reviews, 0 reviews, 1 review, 3 reviews
The Analysis Procedure. The analysis procedure consisted principally of coding the variables on each sample page, totaling and transferring them to coding sheets, and transcribing the final scores (usually
expressed as frequencies per 100 words or a percentage) to the result
sheets. It was by comparing the final scores of each pair on each variable
on the result sheets and then applying the 50 per cent criterion that the
retention or elimination of the variable was judged at frequent intervals
during the analysis.21
FINDINGS: THE IMPORTANT VARIABLES
TABLE 6
VARIABLES STILL IMPORTANT AT THE END OF THE ANALYSIS
Column heading: Final Score (a)
Style
1. Readability (Flesch)
2. Average chapter wordage
3. Recency of events in the novel
Score entries as printed (assignment to variables not recoverable): 0, 1, 11, 8, 12, 5, 4, 9, 6, 10
21 The 50 per cent criterion stated that, to be retained for further analysis, a variable should
have scored significantly higher or lower on the best seller in at least half of the pairs already
analyzed.
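Column heading: Final Score (a)
4. Sentimental theme
5. Moralizing theme
Non-Content Variables (c)
15. Prior publication date
16. Wordage of the novel
Score entries as printed (assignment to variables not recoverable): 9, 7, 8, 15, 5, 0, 1, 9, 6, 11, 11, 6, 9, 10, 3, 4, 3, 3, 4, 15, 6, 9, 3, 13, 6, 7, 8, 7, 1, 7, x8, 9, 0, 9, 6, 4, 12, 12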
a These scores are to be interpreted as follows: a score of 1-12-9 means that in one pair of
novels the best seller's score was significantly larger than that of the poor seller, in twelve pairs
the scores were even, and in nine pairs the poor seller's score was significantly larger than that
of the best seller.
b Specific emotions were scored in five different ways: level 2 indicated very strong emotion
with the character in an extreme emotional state; level 1 indicated emotion of some strength
but not at an extremely high level; total emotion of each kind (unhappiness, love, anger, etc.)
contained a summation of the amount of emotion shown in both levels 1 and 2; all emotion
at level 1 and all emotion at level 2 were summed; and finally the totals for level 1 and level 2
were summed to obtain a score for all emotion displayed in the 30 page sample of each novel.
c Variables 15 and 16 were segregated from the other variables because they were only partially related to content; they contained both content and non-content elements and therefore
did not meet the rigorous interpretation of a content variable used in this study.
Regarding prior publication, it was hypothesized that, of two novels on the same theme,
the first to be published probably "soaked up" all of the market for that particular theme or
treatment of it and left the second novel with little or no market at all. This involved the content of the novel in that it involved its theme, but publication date itself had nothing to do
with content.
Wordage involved the amount of the content in a novel, but non-content factors were present
because a large wordage apparently encouraged readers to buy with the idea that they were
getting more for their money, without regard to the ingredients of the novel. The readers
seemed to believe that their pleasure would last longer with a long novel than with a short one.
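
The score notation of note a and the 50 per cent criterion of note 21 can be sketched as follows. The pair outcomes below are hypothetical. Reading the criterion as "higher in at least half of the pairs, or lower in at least half" is an interpretation of note 21, not a procedure the article spells out, and the function names are illustrative.

```python
from collections import Counter

VALID_OUTCOMES = {"best", "even", "poor"}


def tally(outcomes):
    """Count pairs won by the best seller, even pairs, and pairs won by the poor seller."""
    unknown = set(outcomes) - VALID_OUTCOMES
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")
    counts = Counter(outcomes)
    return counts["best"], counts["even"], counts["poor"]


def format_score(score):
    """Render a tally in the notation of note a, for example '1-12-9'."""
    return "-".join(str(count) for count in score)


def meets_fifty_percent_criterion(score):
    """Retain a variable if one direction holds in at least half of the pairs.

    This reading of note 21 is an assumption made for illustration.
    """
    best, even, poor = score
    pairs = best + even + poor
    return pairs > 0 and 2 * max(best, poor) >= pairs


# Hypothetical outcomes for one variable across twenty-two pairs.
hypothetical_outcomes = ["best"] * 12 + ["even"] * 6 + ["poor"] * 4

score = tally(hypothetical_outcomes)
retained = meets_fifty_percent_criterion(score)

print(format_score(score), "retained" if retained else "eliminated")
```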
OF THE GROUPS OF VARIABLES AT THE BEGINNING AND END OF THE ANALYSIS*

Groups             Number of Variables at     Number of Variables
                   Beginning of Analysis      at End of Analysis
Action              15                         0
Emotion             54                        10
Personalities       75                         2
Plot themes        350                         0
Romanticization     50                         1
Simplicity          32
* Since several of the variables belonged to more than one group, the sums of the two columns equal more than the sums of the different variables included.
FINDINGS: THE FORMULAS
Variables (a)      Percentage of Correct Prediction (b)
1-4-15             82%
4-7-15             82
1-4-7-9-15         82
1-4-7-15           80
1-3-4-7-15         80
4-15               77
1-4-7-9            77
4-7-9-16           77
4-9-15             75
1-3-7-15           75
a Code
1. Readability
3. Recency of the events in the novel
4. Sentimental theme
7. Central male character's affectionate attitude toward other characters
9. Central male character's total of level 1 emotion
15. Prior publication date
16. Wordage of the novel
b All discriminations in this table were significant at or below the five per cent confidence level.