
Principles of Mathematics for Economics¹

Simone Cerreia-Vioglio
Department of Decision Sciences and IGIER, Università Bocconi

Massimo Marinacci
AXA-Bocconi Chair, Department of Decision Sciences and IGIER, Università Bocconi

Elena Vigna
Dipartimento Esomas, Università di Torino and Collegio Carlo Alberto

5 September 2016

¹ This manuscript is a very preliminary version of a textbook that will be published by Springer
International Publishing (ISBN 978-3-319-44713-1). It is for the personal use of Bocconi students who
are attending first-year mathematics courses. We thank Gabriella Chiomio and Claudio Mattalia,
who thoroughly translated a first version of the manuscript, as well as Alexandra Fotiou, Giacomo
Lanzani and Kelly Gail Strada for excellent research assistance, and Margherita Cigola, Guido Osimo,
and Lorenzo Peccati for some very useful comments that helped us to improve the manuscript. We
are especially indebted to Pierpaolo Battigalli, Erio Castagnoli (with whom this project started),
Itzhak Gilboa, Fabio Maccheroni, Luigi Montrucchio, and David Schmeidler for the discussions that
over the years shaped our views on economics and mathematics.
Contents

I Structures 1

1 Sets and numbers: an intuitive introduction 3


1.1 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Subsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.3 Properties of the operations . . . . . . . . . . . . . . . . . . . . . . . . 8
1.1.4 A naive remark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2 Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3 Structure of the integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.1 Divisors and algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.2 Prime numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4 Order structure of R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.4.1 Maxima and minima . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.4.2 Supremum and infimum . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.4.3 Density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.5 Powers and logarithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.5.1 Powers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.5.2 Logarithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.6 Numbers, fingers and circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.7 The extended real line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.8 The birth of the deductive method . . . . . . . . . . . . . . . . . . . . . . . . 38

2 Cartesian structure and R^n 41


2.1 Cartesian products and R^n . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.2 Operations in R^n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.3 Order structure on R^n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.4.1 Static choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.4.2 Intertemporal choices . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.5 Pareto optima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.5.2 Maxima and maximals . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.5.3 Pareto frontier and Edgeworth box . . . . . . . . . . . . . . . . . . . . 53

3 Linear structure 59
3.1 Vector subspaces of R^n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2 Linear independence and dependence . . . . . . . . . . . . . . . . . . . . . . . 62
3.3 Linear combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.4 Generated subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.5 Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.6 Bases of subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

4 Euclidean structure 75
4.1 Absolute value and norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.1.1 Inner product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.1.2 Absolute value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.1.3 Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.2 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

5 Topological structure 85
5.1 Distances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.2 Neighborhoods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.3 Taxonomy of the points of R^n with respect to a set . . . . . . . . . . . . . . . 90
5.3.1 Interior, exterior and boundary points . . . . . . . . . . . . . . . . . . 90
5.3.2 Limit (accumulation) points . . . . . . . . . . . . . . . . . . . . . . . . 93
5.4 Open and closed sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.5 Set-theoretical stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.6 Compact sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.7 Closure and convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

6 Functions 105
6.1 The concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.2.1 Static choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.2.2 Intertemporal choices . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.3 General properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.3.1 Preimages and level curves . . . . . . . . . . . . . . . . . . . . . . . . 117
6.3.2 Algebra of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.3.3 Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.4 Classes of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.4.1 Injective, surjective, and bijective functions . . . . . . . . . . . . . . . 126
6.4.2 Inverse functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
6.4.3 Bounded functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
6.4.4 Monotonic functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.4.5 Concave and convex functions (preview) . . . . . . . . . . . . . . . . . 138
6.4.6 Separable functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.5 Elementary functions on R . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.5.1 Polynomial functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.5.2 Exponential and logarithmic functions . . . . . . . . . . . . . . . . . . 142
6.5.3 Trigonometric and periodic functions . . . . . . . . . . . . . . . . . . . 144
6.6 Maxima and minima of a function (preview) . . . . . . . . . . . . . . . . . . . 149
6.7 Domains and restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
6.8 Grand finale: preferences and utility . . . . . . . . . . . . . . . . . . . . . . . 153
6.8.1 Preferences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
6.8.2 Paretian utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6.8.3 Existence and lexicographic preference . . . . . . . . . . . . . . . . . . 156

7 Cardinality 159
7.1 Actual infinite and potential infinite . . . . . . . . . . . . . . . . . . . . . . . 159
7.2 Bijective functions and cardinality . . . . . . . . . . . . . . . . . . . . . . . . 160
7.3 A Pandora’s box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

II Discrete analysis 169

8 Sequences 171
8.1 The concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
8.2 The space of sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
8.3 Application: intertemporal choices . . . . . . . . . . . . . . . . . . . . . . . . 175
8.4 Images and classes of sequences . . . . . . . . . . . . . . . . . . . . . . . . . . 176
8.5 Limits: introductory examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
8.6 Limits and asymptotic behavior . . . . . . . . . . . . . . . . . . . . . . . . . . 179
8.6.1 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
8.6.2 Limits from above and from below . . . . . . . . . . . . . . . . . . . . 182
8.6.3 Divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
8.6.4 Topology of R and general definition of limit . . . . . . . . . . . . . . 183
8.7 Properties of limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
8.7.1 Monotonicity and convergence . . . . . . . . . . . . . . . . . . . . . . 187
8.7.2 Heron’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
8.7.3 The Bolzano-Weierstrass Theorem . . . . . . . . . . . . . . . . . . . . 191
8.8 Algebra of limits and fundamental limits . . . . . . . . . . . . . . . . . . . . . 194
8.8.1 (Many) certainties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
8.8.2 Some basic limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
8.8.3 Indeterminate forms for the limits . . . . . . . . . . . . . . . . . . . . 198
8.8.4 Summarizing tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
8.8.5 But how many indeterminate forms are there? . . . . . . . . . . . . . . 202
8.9 Convergence criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
8.10 The Cauchy condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
8.11 Napier’s constant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
8.12 Orders of convergence and of divergence . . . . . . . . . . . . . . . . . . . . . 213
8.12.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
8.12.2 Little-o algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
8.12.3 Asymptotic equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . 217
8.12.4 Characterization and decay . . . . . . . . . . . . . . . . . . . . . . . . 221
8.12.5 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
8.12.6 Scales of infinities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
8.12.7 The De Moivre-Stirling formula . . . . . . . . . . . . . . . . . . . . . . 224
8.12.8 Distribution of prime numbers . . . . . . . . . . . . . . . . . . . . . . 225
8.13 Sequences in R^n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

9 Series 229
9.1 The concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
9.1.1 Three classical series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
9.1.2 Intertemporal utility with infinite horizon . . . . . . . . . . . . . . . . 233
9.2 Elementary properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
9.3 Series with positive terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
9.3.1 Comparison convergence criterion . . . . . . . . . . . . . . . . . . . . . 234
9.3.2 Ratio convergence criterion: prelude . . . . . . . . . . . . . . . . . . . 238
9.3.3 Ratio criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
9.3.4 A first series expansion . . . . . . . . . . . . . . . . . . . . . . . . . . 241
9.4 Series with terms of any sign . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
9.4.1 Absolute convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
9.4.2 Alternating series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244

10 Discrete calculus 245


10.1 Preamble: limit points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
10.2 Discrete calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
10.2.1 Finite differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
10.2.2 Asymptotic behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
10.3 Convergence in mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
10.3.1 In medio stat virtus . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
10.3.2 Creatio ex nihilo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
10.4 Convergence criteria for series . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
10.4.1 Root criterion for convergence . . . . . . . . . . . . . . . . . . . . . . . 258
10.4.2 The power of the root criterion . . . . . . . . . . . . . . . . . . . . . . 260
10.5 Infinite patience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262

III Continuity 265

11 Limits of functions 267


11.1 Introductory examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
11.2 Functions of a single variable . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
11.2.1 Two-sided limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
11.2.2 One-sided limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
11.2.3 Relations between one-sided and two-sided limits . . . . . . . . . . . . 279
11.2.4 Grand finale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
11.2.5 Post scriptum: horizontal and vertical asymptotes . . . . . . . . . . . 281
11.3 Functions of several variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
11.4 Properties of limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
11.5 Algebra of limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
11.5.1 Indeterminacies for limits . . . . . . . . . . . . . . . . . . . . . . . . . 291
11.6 Elementary limits and important limits . . . . . . . . . . . . . . . . . . . . . 293
11.6.1 Elementary limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
11.6.2 Important limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
11.7 Orders of convergence and of divergence . . . . . . . . . . . . . . . . . . . . . 295
11.7.1 Little-o algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
11.7.2 Asymptotic equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . 299
11.7.3 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
11.7.4 The usual bestiary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302

12 Continuous functions 303


12.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
12.2 Discontinuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
12.3 Operations and composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
12.4 Zeros and equilibria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
12.4.1 Zeros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
12.4.2 Equilibria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
12.5 Weierstrass' Theorem (preview) . . . . . . . . . . . . . . . . . . . . . . . . . . 315
12.5.1 Intermediate value theorem . . . . . . . . . . . . . . . . . . . . . . . . 317
12.6 Limits and continuity of operators . . . . . . . . . . . . . . . . . . . . . . . . 319
12.7 Uniform continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321

IV Linear and nonlinear analysis 325

13 Linear functions and operators 327


13.1 Linear functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
13.1.1 Definition and first properties . . . . . . . . . . . . . . . . . . . . . . . 327
13.1.2 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
13.1.3 Monotonicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
13.2 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
13.2.1 Operations among matrices . . . . . . . . . . . . . . . . . . . . . . . . 333
13.2.2 Product of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
13.3 Linear operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
13.3.1 Definition and first properties . . . . . . . . . . . . . . . . . . . . . . . 339
13.3.2 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
13.3.3 Matrices and operations . . . . . . . . . . . . . . . . . . . . . . . . . . 344
13.4 Rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
13.4.1 Linear operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
13.4.2 Rank of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
13.4.3 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
13.4.4 Gaussian elimination procedure . . . . . . . . . . . . . . . . . . . . . . 354
13.5 Invertible operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
13.5.1 Invertibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
13.5.2 Inverse matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
13.6 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
13.6.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
13.6.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
13.6.3 Laplace’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
13.6.4 Inverses and determinants . . . . . . . . . . . . . . . . . . . . . . . . . 374
13.6.5 Kronecker’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
13.7 Square linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
13.8 General linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
13.9 Solving systems: Cramer’s method . . . . . . . . . . . . . . . . . . . . . . . . 386
13.10 Grand finale: Hahn-Banach et similia . . . . . . . . . . . . . . . . . . . . . . 389

14 Concave functions 393


14.1 Convex sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
14.1.1 Definition and basic properties . . . . . . . . . . . . . . . . . . . . . . 393
14.1.2 Back to high school: polytopes . . . . . . . . . . . . . . . . . . . . . . 396
14.2 Concave functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
14.3 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
14.3.1 Concave functions and convex sets . . . . . . . . . . . . . . . . . . . . 404
14.3.2 Affine functions and affine sets . . . . . . . . . . . . . . . . . . . . . . 407
14.3.3 Jensen’s inequality and continuity . . . . . . . . . . . . . . . . . . . . 409
14.4 Quasi-concave functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
14.5 Diversification principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
14.6 Grand finale: Cauchy’s equation . . . . . . . . . . . . . . . . . . . . . . . . . 418
14.6.1 Remarkable variants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
14.6.2 Compounding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
14.7 Fireworks: the skeleton of convexity . . . . . . . . . . . . . . . . . . . . . . . 423
14.7.1 Convex hull . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
14.7.2 Extreme points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423

15 Homogeneous functions 427


15.1 Preamble: cones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
15.2 Homogeneity and returns to scale . . . . . . . . . . . . . . . . . . . . . . . . . 428
15.2.1 Homogeneous functions . . . . . . . . . . . . . . . . . . . . . . . . . . 428
15.2.2 Average functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
15.2.3 Homogeneity and quasi-concavity . . . . . . . . . . . . . . . . . . . . . 432
15.3 Homotheticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
15.3.1 Semicones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
15.3.2 Homotheticity and utility . . . . . . . . . . . . . . . . . . . . . . . . . 434

V Optima 437

16 Optimization problems 439


16.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
16.1.1 The beginner’s luck . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
16.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
16.1.3 Consumption and production . . . . . . . . . . . . . . . . . . . . . . . 448
16.2 Existence: Weierstrass' Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 453
16.2.1 Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453
16.2.2 Proof 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
16.2.3 Proof 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456
16.3 Existence: Tonelli’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
16.3.1 Coercivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
16.3.2 Tonelli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
16.3.3 Supercoercivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
16.4 Local extremal points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
16.5 Concavity and quasi-concavity . . . . . . . . . . . . . . . . . . . . . . . . . . 467
16.5.1 Maxima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467
16.5.2 Minima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469
16.6 Consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
16.7 Least squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
16.7.1 Linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
16.7.2 Descriptive statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
16.8 Operator optima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479

17 Projections and approximations 483


17.1 Projection Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
17.2 Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
17.3 Return to Riesz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487
17.4 Least squares and projections . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
17.5 A finance illustration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
17.5.1 Portfolios and contingent claims . . . . . . . . . . . . . . . . . . . . . 489
17.5.2 Market value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
17.5.3 Law of one price . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
17.5.4 Pricing rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
17.5.5 Pricing kernels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
17.5.6 Arbitrage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494

VI Differential calculus 497

18 Derivatives 499
18.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
18.1.1 Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
18.2 Geometric interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
18.3 Derivative function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
18.4 Unilateral derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
18.5 Derivability and continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
18.6 Derivatives of elementary functions . . . . . . . . . . . . . . . . . . . . . . . . 512
18.7 Algebra of derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
18.8 The chain rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
18.9 Derivative of inverse functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
18.10 Formulary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522
18.11 Differentiability and linearity . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
18.11.1 Differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
18.11.2 Differentiability and derivability . . . . . . . . . . . . . . . . . . . . . 525
18.11.3 Differentiability and continuity . . . . . . . . . . . . . . . . . . . . . . 526
18.11.4 Continuously differentiable functions . . . . . . . . . . . . . . . . . . . 527
18.12 Derivatives of higher order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
18.13 Post scriptum: a discrete angle . . . . . . . . . . . . . . . . . . . . . . . . . . 528

19 Differential calculus in several variables 531


19.1 Partial derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
19.1.1 Derivative operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
19.1.2 Ceteris paribus: marginal analysis . . . . . . . . . . . . . . . . . . . . 539
19.2 Differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
19.2.1 Differentiability and derivability . . . . . . . . . . . . . . . . . . . . . 543
19.2.2 Total differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
19.2.3 Chain rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
19.3 Partial derivatives of higher order . . . . . . . . . . . . . . . . . . . . . . . . . 550
19.4 Incremental and approximation viewpoints . . . . . . . . . . . . . . . . . . . 555
19.4.1 Directional derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
19.4.2 Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
19.4.3 The two viewpoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559
19.5 Differential of operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 560
19.5.1 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 560
19.5.2 Chain rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563
19.5.3 Proof of the chain rule . . . . . . . . . . . . . . . . . . . . . . . . . . . 567

20 Differential methods 571


20.1 Extremal and critical points . . . . . . . . . . . . . . . . . . . . . . . . . . . . 571
20.1.1 Preamble . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 571
20.1.2 Fermat’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 572
20.1.3 Unconstrained optima: incipit . . . . . . . . . . . . . . . . . . . . . . . 576
20.2 Mean Value Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577
20.3 Continuity properties of the derivative . . . . . . . . . . . . . . . . . . . . . . 580
20.4 Monotonicity and derivability . . . . . . . . . . . . . . . . . . . . . . . . . . . 582
20.5 Sufficient conditions for local extremal points . . . . . . . . . . . . . . . . . . 586
20.5.1 Local extremal points . . . . . . . . . . . . . . . . . . . . . . . . . . . 586
20.5.2 Search of local extremal points . . . . . . . . . . . . . . . . . . . . . . 588
20.5.3 Unconstrained optima: scalar case . . . . . . . . . . . . . . . . . . . . 590
20.5.4 Global extremal points . . . . . . . . . . . . . . . . . . . . . . . . . . . 592
20.6 De l’Hospital’s Theorem and rule . . . . . . . . . . . . . . . . . . . . . . . . . 593
20.6.1 Indeterminate forms 0/0 and ∞/∞ . . . . . . . . . . . . . . . . . . . . 593
20.6.2 Other indeterminate forms . . . . . . . . . . . . . . . . . . . . . . . . 596

21 Approximation 599
21.1 Taylor’s polynomial approximation . . . . . . . . . . . . . . . . . . . . . . . . 599
21.1.1 Polynomial expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
21.1.2 Taylor’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
21.1.3 Taylor and limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 606
21.2 Omnibus proposition for local extremal points . . . . . . . . . . . . . . . . . . 607
21.3 Omnibus procedure of search of local extremal points . . . . . . . . . . . . . . 610
21.3.1 Twice differentiable functions . . . . . . . . . . . . . . . . . . . . . . . 610
21.3.2 Infinitely differentiable functions . . . . . . . . . . . . . . . . . . . . . 610
21.4 Taylor’s expansion: vector functions . . . . . . . . . . . . . . . . . . . . . . . 611
21.4.1 Quadratic forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
21.4.2 Taylor’s expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615
21.4.3 Second-order conditions . . . . . . . . . . . . . . . . . . . . . . . . . . 617
21.4.4 Unconstrained optima: vector functions . . . . . . . . . . . . . . . . . 621
21.5 Asymptotic expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 622
21.5.1 Asymptotic scales and expansions . . . . . . . . . . . . . . . . . . . . 622
21.5.2 Asymptotic expansions and analytic functions . . . . . . . . . . . . . . 626
21.5.3 Hille’s formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 629

22 Concavity and differentiability 631


22.1 Scalar functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631
22.1.1 Decreasing marginal effects . . . . . . . . . . . . . . . . . . . . . . . . 631
22.1.2 Tests of concavity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 636
22.1.3 Chords and tangents . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640
22.2 Vector functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641
22.3 Sufficiency of the first order condition . . . . . . . . . . . . . . . . . . . . . . 643
22.4 Superdifferentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644
22.5 Appendix: monotonicity of operators . . . . . . . . . . . . . . . . . . . . . . . 649

23 Implicit functions 651


23.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 651
23.2 A local perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 653
23.2.1 Implicit Function Theorem . . . . . . . . . . . . . . . . . . . . . . . . 653
23.2.2 Level curves and marginal rates . . . . . . . . . . . . . . . . . . . . . . 657
23.2.3 Quadratic expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . 660
23.2.4 Implicit vector functions . . . . . . . . . . . . . . . . . . . . . . . . . . 662
23.2.5 Implicit operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664
23.3 A global perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 665
23.3.1 Implicit functions and comparative statics . . . . . . . . . . . . . . . . 670
23.3.2 Existence and uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . 673
23.3.3 Properties of implicit functions . . . . . . . . . . . . . . . . . . . . . . 676
23.4 A glocal perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 679
23.5 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 680
23.5.1 Projections and shadows . . . . . . . . . . . . . . . . . . . . . . . . . . 680
23.5.2 Proof of the Implicit Function Theorem . . . . . . . . . . . . . . . . . 681

24 Study of functions 683


24.1 Inflection points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683
24.2 Asymptotes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 685
24.3 Study of functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 690

VII Differential optimization 697

25 Unconstrained optimization 699


25.1 Unconstrained problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699
25.2 Coercive problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699
25.3 Concave problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702
25.4 Relationship among problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 704
25.5 Weakening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 705
25.6 No illusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707

26 Equality constraints 709


26.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 709
26.2 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 709
26.3 One constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710
26.3.1 A key lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710
26.3.2 Lagrange’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 714
26.4 The method of elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
26.5 The consumer problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 721
26.6 Cogito ergo solvo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 725
26.7 Several constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 726

27 Inequality constraints 733


27.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 733
27.2 Resolution of the problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 736
27.2.1 Kuhn-Tucker’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 739
27.2.2 The method of elimination . . . . . . . . . . . . . . . . . . . . . . . . 740
27.3 Cogito et solvo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 743
27.4 Concave optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 744
27.5 Appendix: proof of a key lemma . . . . . . . . . . . . . . . . . . . . . . . . . 749

28 General constraints 755


28.1 A general concave problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 755
28.2 Analysis of the black box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 756
28.2.1 Variational inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . 756
28.2.2 A general first order condition . . . . . . . . . . . . . . . . . . . . . . 758
28.2.3 Divide et impera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 761
28.3 Resolution of the general concave problem . . . . . . . . . . . . . . . . . . . . 762

29 Parametric optimization problems 765


29.1 Preamble: correspondences . . . . . . . . . . . . . . . . . . . . . . . . . . . . 765
29.1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 765
29.1.2 Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 766
29.2 Parametric optimization problems . . . . . . . . . . . . . . . . . . . . . . . . 767
29.3 Basic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 769
29.4 Envelope theorems I: fixed constraint . . . . . . . . . . . . . . . . . . . . . . . 770
29.5 Envelope theorems II: variable constraint . . . . . . . . . . . . . . . . . . . . 772
29.6 Marginal interpretation of multipliers . . . . . . . . . . . . . . . . . . . . . . . 773

VIII Integration 775

30 Riemann’s integral 777


30.1 Plurirectangles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 778
30.2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 780
30.2.1 Positive functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 780
30.2.2 General functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 785
30.2.3 Everything holds together . . . . . . . . . . . . . . . . . . . . . . . . . 788
30.3 Criteria of integrability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 790
30.4 Classes of integrable functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 794
30.4.1 Step functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 794
30.4.2 Analytic approach and geometric approach . . . . . . . . . . . . . . . 797
30.4.3 Continuous functions and monotonic functions . . . . . . . . . . . . . 798
30.5 Properties of the integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 801
30.6 Fundamental theorems of integral calculus . . . . . . . . . . . . . . . . . . . . 808
30.6.1 Primitive functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 808
30.6.2 Formulary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 811
30.6.3 The First Fundamental Theorem of Calculus . . . . . . . . . . . . . . 812
30.6.4 The Second Fundamental Theorem of Calculus . . . . . . . . . . . . . 813
30.7 Properties of the indefinite integral . . . . . . . . . . . . . . . . . . . . . . . . 816
30.8 Change of variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 819
30.9 Functions integrable in closed form . . . . . . . . . . . . . . . . . . . . . . . . 823
30.10 Improper integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 827
30.10.1 Unbounded intervals of integration: generalities . . . . . . . . . . . . . 827
30.10.2 Unbounded intervals of integration: properties and criteria . . . . . . 834
30.10.3 Gauss’s integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 838
30.10.4 Unbounded functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 839

31 Parameter-dependent integrals 841


31.1 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 841
31.2 Variability: Leibniz’s rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 844
31.3 Improper integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 846

32 Stieltjes’ integral 847
32.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 848
32.2 Integrability criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 848
32.3 Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 850
32.4 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 852
32.5 Step integrators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 853
32.6 Integration by parts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 856
32.7 Change of variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 856

33 Moments 859
33.1 Densities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 859
33.2 Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 860
33.3 The problem of moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 861

33.4 Moment generating function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 862

IX Appendices 865

A Permutations 867
A.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 867
A.2 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 868
A.3 Anagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 869
A.4 Newton’s binomial formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . 870

B Notions of trigonometry 873


B.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 873
B.2 Concerto d’archi (string concert) . . . . . . . . . . . . . . . . . . . . . . . . . 875
B.3 Perpendicularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 877

C Elements of intuitive logic 879


C.1 Propositions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 879
C.2 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 879
C.3 Logical equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 881
C.4 Deduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 883
C.4.1 Direct . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 884
C.4.2 Reductio ad absurdum . . . . . . . . . . . . . . . . . . . . . . . . . . . 885
C.4.3 Summing up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 886
C.5 The logic of scientific inquiries . . . . . . . . . . . . . . . . . . . . . . . . . . 887
C.6 Predicates and quantifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 889
C.6.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 889
C.6.2 Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 890
C.6.3 Example: linear dependence and independence . . . . . . . . . . . . . 891

D Mathematical induction 893


D.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 893
D.2 The harmonic Mengoli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 895

E Cast of characters 897


Part I

Structures

Chapter 1

Sets and numbers: an intuitive introduction

1.1 Sets
A set (or aggregate) is a collection of distinguishable objects. There are two ways to describe
a set: by listing its elements directly, or by specifying a property that its elements have in
common. The second way is more common than the first one; for instance,

$$\{11, 13, 17, 19, 23, 29\} \tag{1.1}$$

can be described as the set of the prime numbers between 10 and 30. The chairs of your
kitchen form a set of objects, the chairs, that have in common the property of being part
of your kitchen. The chairs of your bedroom form another set, as the letters of the Latin
alphabet form a set, distinct from the set of the letters of the Greek alphabet (and from the
set of chairs or from the set of numbers considered above).

Sets are usually denoted by capital letters: A, B, C, and so on; their elements are denoted
by small letters: a, b, c, and so on. To denote that an element a belongs to the set A we
write

$$a \in A$$
where $\in$ is the symbol of belonging. Instead, to denote that an element $a$ does not belong
to the set $A$ we write $a \notin A$.

Off the record remark (O.R.). The concept of set, apparently introduced in 1847 by
Bernhard Bolzano, is for us a primitive concept, not defined through other notions. The
situation is similar to the one we have in Euclidean geometry, in which points and lines are
primitive concepts (with an intuitive geometric meaning that readers may give them).

1.1.1 Subsets
The chairs of your bedroom are a subset of the chairs of your home: a chair that belongs to
your bedroom also belongs to your home. In general, a set $A$ is a subset of a set $B$ when all
the elements of $A$ are also elements of $B$. In this case we write $A \subseteq B$. Formally,


Definition 1 Given two sets $A$ and $B$, we say that $A$ is a subset of $B$, in symbols $A \subseteq B$, if
all the elements of $A$ are also elements of $B$, that is, if $x \in A$ implies $x \in B$.

For instance, denote by $A$ the set (1.1), that is,

$$A = \{11, 13, 17, 19, 23, 29\}$$

and let
$$B = \{11, 13, 15, 17, 19, 21, 23, 25, 27, 29\} \tag{1.2}$$
be the set of the odd numbers between 10 and 30. We have $A \subseteq B$.
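The inclusion in this example can be checked mechanically. The following is a minimal sketch, not part of the original text, that uses Python's built-in `set` type; the variable names are chosen here only for illustration.

```python
# The sets (1.1) and (1.2), written as Python sets.
A = {11, 13, 17, 19, 23, 29}
B = {11, 13, 15, 17, 19, 21, 23, 25, 27, 29}

# Definition of subset: every element x of A is also an element of B.
a_subset_of_b = all(x in B for x in A)

# The built-in operator <= performs the same inclusion test.
same_answer = (a_subset_of_b == (A <= B))

# The converse inclusion fails: 15 belongs to B but not to A.
b_subset_of_a = B <= A

print(a_subset_of_b, b_subset_of_a)
```

The first check spells out the definition element by element, while `A <= B` is the idiomatic shortcut.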

Graphically, the relation $A \subseteq B$ can be illustrated as

[Figure: Venn diagram of $A \subseteq B$]

by using the so-called Venn diagrams to represent graphically the sets $A$ and $B$: it is an
ingenuous, yet effective, way to visualize sets.

When we have both $A \subseteq B$ and $B \subseteq A$ (that is, $x \in A$ if and only if $x \in B$) the two
sets $A$ and $B$ are said to be equal; in symbols $A = B$. For example, let $A$ be the set of
the solutions of the quadratic equation $x^2 - 3x + 2 = 0$ and let $B$ be the set formed by the
numbers 1 and 2. Since $x^2 - 3x + 2 = (x - 1)(x - 2)$, it is easy to see that $A = B$.
When $A \subseteq B$ and $A \neq B$, we write $A \subset B$ and say that $A$ is a proper subset of $B$.
The sets $A = \{a\}$ that consist of a unique element are called singletons. They are a
peculiar, but altogether legitimate, class of sets.$^1$

N.B. Though the two symbols $\in$ and $\subseteq$ are conceptually well distinct and must not be
confused, there exists an interesting relation between them. Indeed, consider the set formed
by a unique element $a$, that is, the singleton $\{a\}$. Through such a singleton, we can establish
the relation
$$a \in A \text{ if and only if } \{a\} \subseteq A$$
between $\in$ and $\subseteq$.
$^1$ Note that $a$ and $\{a\}$ are not the same thing; $a$ is an element and $\{a\}$ is a set, even if it is formed by only
one element. For instance, the set $A$ of the nations of the Earth with a flag of only one colour had (until
2011) only one element, Libya, but $A$ is not Libya itself: Tripoli is not the capital of $A$.

1.1.2 Operations
There are three basic operations among sets: union, intersection, and difference. As we will
see, they take any two given sets and, starting from them, form a new set.

The first operation that we consider is the intersection of two sets $A$ and $B$. As the
term “intersection” suggests, with this operation we select all the elements that belong
simultaneously to the sets $A$ and $B$.

Definition 2 Given two sets $A$ and $B$, their intersection $A \cap B$ is the set of all the elements
that belong both to $A$ and $B$, that is, $x \in A \cap B$ if $x \in A$ and $x \in B$.

The operation can be illustrated graphically in the following way:

For example, let $A$ be the set of the left-handers and $B$ the set of the right-handers in Italy.
The intersection $A \cap B$ is the set of the ambidextrous Italians. If, instead, $A$ is the set of the
petrol cars and $B$ the set of the methane cars, the intersection $A \cap B$ is the set of the bi-fuel
cars that run on both petrol and methane.

It can happen that two sets have no elements in common. For example, let

$$C = \{10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30\} \tag{1.3}$$

be the set of the even numbers between 10 and 30. It has no elements in common with the
set B in (1.2). In this case we talk of disjoint sets, with no elements in common. Such a
notion gives us the opportunity to introduce a fundamental set.

Definition 3 The empty set, denoted by $\emptyset$, is the set without elements.

As a first use of the notion, note that two sets $A$ and $B$ are disjoint when they have
empty intersection, that is, $A \cap B = \emptyset$. For example, for the sets $B$ and $C$ in (1.2) and (1.3),
we have $B \cap C = \emptyset$.
We write $A \neq \emptyset$ when the set $A$ is not empty, that is, it contains at least one element.
Conventionally, we consider the empty set as a subset of any set, that is, $\emptyset \subseteq A$ for every set
$A$.
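Intersections, disjoint sets, and the empty set can be illustrated with the sets (1.1), (1.2), and (1.3). This is a small sketch, not part of the original text, again based on Python's built-in sets; the variable names are illustrative.

```python
A = {11, 13, 17, 19, 23, 29}                       # primes, as in (1.1)
B = {11, 13, 15, 17, 19, 21, 23, 25, 27, 29}       # odd numbers, as in (1.2)
C = {10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30}   # even numbers, as in (1.3)

A_and_B = A & B    # intersection: elements that belong to both A and B
B_and_C = B & C    # B and C have no elements in common

# In Python the empty set is written set(); the literal {} is an empty dictionary.
empty = set()

are_disjoint = B.isdisjoint(C)   # True exactly when the intersection is empty
empty_is_subset = empty <= A     # the empty set is a subset of any set

print(A_and_B, B_and_C, are_disjoint, empty_is_subset)
```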

It is immediate that $A \cap B \subseteq A$ and that $A \cap B \subseteq B$. The next result is more subtle and
establishes a useful property that links $\subseteq$ and $\cap$.

Proposition 4 $A \cap B = A$ if and only if $A \subseteq B$.

Proof “If”. Let $A \subseteq B$. We want to prove that $A \cap B = A$. In order to show that two
sets are equal, we always need to prove separately the two opposite inclusions: in this case,
$A \cap B \subseteq A$ and $A \subseteq A \cap B$.
The first inclusion $A \cap B \subseteq A$ is easily proven to be true. Indeed, let $x \in A \cap B$.$^2$ Then,
by definition, $x$ belongs both to $A$ and to $B$. In particular, $x \in A$ and this is enough to
conclude that $A \cap B \subseteq A$.
Let us prove the second inclusion: $A \subseteq A \cap B$. Let $x \in A$. Since, by hypothesis, $A \subseteq B$,
each element of $A$ also belongs to $B$, so $x \in B$. Hence, $x$ belongs both to $A$ and
to $B$, i.e., $x \in A \cap B$, and this proves that $A \subseteq A \cap B$.
We have shown that both the inclusions $A \cap B \subseteq A$ and $A \subseteq A \cap B$ hold; we can therefore
conclude that $A \cap B = A$, which completes the proof of the “If” part.

“Only if”. Let $A \cap B = A$. Let $x \in A$. As by hypothesis $A \cap B = A$, it follows that
$x \in A \cap B$. In particular, this means that $x$ belongs to $B$. Hence $A \subseteq B$, as claimed.
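The statement of Proposition 4 can also be sanity-checked by brute force on a small example. The sketch below is not a proof and not part of the original text: it enumerates all pairs of subsets of a small universe `S`, chosen arbitrarily, and verifies that the two conditions always agree. The helper `subsets` is a name introduced here for illustration.

```python
from itertools import combinations

def subsets(universe):
    """Return all subsets of a finite collection, as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

S = {1, 2, 3, 4}            # small universe, chosen arbitrarily
all_subsets = subsets(S)    # 2**4 = 16 subsets

# For every pair (A, B): "A ∩ B = A" holds exactly when "A ⊆ B" holds.
prop4_holds = all(((A & B) == A) == (A <= B)
                  for A in all_subsets
                  for B in all_subsets)

print(len(all_subsets), prop4_holds)
```

A check of this kind covers only the finitely many cases enumerated; the general statement rests on the proof above.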

The next operation we consider is the union. Here again the term “union” already
suggests how in this operation all the elements of both sets are collected together.

Definition 5 Given two sets $A$ and $B$, their union $A \cup B$ is the set of all the elements that
belong to $A$ or to $B$, that is, $x \in A \cup B$ if $x \in A$ or $x \in B$.$^3$

Note that an element can belong to both sets (unless the sets are disjoint). For example,
if $A$ is again the set of the left-handers and $B$ is the set of the right-handers in Italy, the
union set contains all the Italians with at least one hand, and there are individuals (the
ambidextrous) who belong to both sets.
It is immediate to show that $A \subseteq A \cup B$ and that $B \subseteq A \cup B$. It then follows that

$$A \cap B \subseteq A \cup B$$

Graphically the union is represented in the following way:

$^2$ In proving an inclusion between sets, say $C \subseteq D$, throughout the book we will tacitly assume that $C \neq \emptyset$
since the inclusion is trivially true when $C = \emptyset$. For this reason our inclusion proof will show that $x \in C$ (i.e.,
$C \neq \emptyset$) implies $x \in D$.
$^3$ The conjunction “or” has the inclusive sense of the Latin “vel” ($x$ belongs to $A$ or to $B$ or to both) and
not the exclusive sense of “aut” ($x$ belongs either to $A$ or to $B$, but not to both). Indeed, Giuseppe Peano
gave the symbol $\cup$ the meaning “vel” when he first introduced it, along with the intersection symbol $\cap$ and
the membership symbol $\varepsilon$, which he interpreted as the Latin “et” and “est”, respectively (see the “signorum
tabula” in his 1889 work Arithmetices principia, nova methodo exposita, a seminal work on the foundations
of mathematics).

[Figure: Venn diagram of $A \cup B$]

The last operation that we consider is the difference.

Definition 6 Given two sets $A$ and $B$, their difference $A - B$ is the set of all the elements
that belong to $A$, but not to $B$, that is, $x \in A - B$ if both $x \in A$ and $x \notin B$.

The difference set$^4$ $A - B$ is therefore obtained by eliminating from $A$ all the elements
that belong (also) to $B$. Graphically:

[Figure: Venn diagram of $A - B$]

For example, let us go back to the sets $A$ and $B$ identified in (1.1) and (1.2). Then,

$$B - A = \{15, 21, 25, 27\}$$

that is, $B - A$ is the set of the non-prime odd numbers between 10 and 30. Note that: (i)
when $A$ and $B$ are disjoint, we have $A - B = A$ and $B - A = B$, (ii) $A \subseteq B$ is equivalent
to $A - B = \emptyset$ since, by removing from $A$ all the elements that belong also to $B$, the set $A$ is
deprived of all its elements, that is, we remain with the empty set.
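The union and the difference, together with remarks (i) and (ii), can be reproduced on the sets (1.1), (1.2), and (1.3). The following is an illustrative sketch, not part of the original text; in Python the operators `|` and `-` denote union and difference of sets.

```python
A = {11, 13, 17, 19, 23, 29}                       # as in (1.1)
B = {11, 13, 15, 17, 19, 21, 23, 25, 27, 29}       # as in (1.2)
C = {10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30}   # as in (1.3)

union_AB = A | B    # elements that belong to A or to B (or to both)
diff_BA = B - A     # elements of B that do not belong to A
diff_AB = A - B     # empty, because A is a subset of B (remark (ii))
diff_BC = B - C     # equal to B, because B and C are disjoint (remark (i))

# The intersection is always contained in the union.
inter_in_union = (A & B) <= (A | B)

print(sorted(diff_BA), diff_AB, diff_BC == B, inter_in_union)
```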

In many applications there is a general set of reference, an all inclusive set, of which
various subsets are considered. For example, for demographers this set can be the entire
$^4$ The set difference $A - B$ is often denoted by $A \setminus B$.

population of a country, of which they can consider various subsets according to the demographic
properties that are of interest (for instance, age is a common demographic variable
through which the population can be subdivided into subsets).
The general set of reference is called universal set or, more commonly, space. There is no
standard notation for this set (which is often clear from the context), which we denote
temporarily by $S$. Given any of its subsets $A$, the difference $S - A$ is denoted by $A^c$ and
is called the complement set, or simply the complement, of $A$. The difference operation is
called complementation when it involves the universal set.

Example 7 If $S$ is the set of all citizens of a country and $A$ is the set of all citizens that are
at least 65 years old, the complement $A^c$ is constituted by all citizens that are (strictly) less
than 65 years old.

It is immediate to verify that, for every $A$, we have $A \cup A^c = S$ and $A \cap A^c = \emptyset$. We also
have:

Proposition 8 If $A$ is a subset of a space $S$, we have $(A^c)^c = A$.

Proof Since we have to verify an equality between sets (as in the proof of Proposition 4),
we have to consider separately the two inclusions $(A^c)^c \subseteq A$ and $A \subseteq (A^c)^c$.
If $a \in (A^c)^c$, then $a \notin A^c$ and therefore $a \in A$. It follows that $(A^c)^c \subseteq A$.
Vice versa, if $a \in A$, then $a \notin A^c$ and therefore $a \in (A^c)^c$; hence $A \subseteq (A^c)^c$.

Finally, we can prove without difficulty that $A - B = A \cap B^c$. Indeed, $x \in A - B$ means
that $x \in A$ and $x \notin B$, that is, $x \in A$ and $x \in B^c$.
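The properties of the complement can be checked on a concrete space. In the sketch below, which is not part of the original text, the universal set `S` is assumed to be the integers from 10 to 30, and `complement` is a helper name introduced here for illustration.

```python
S = set(range(10, 31))                           # assumed space: integers 10, ..., 30
A = {11, 13, 17, 19, 23, 29}                     # as in (1.1)
B = {11, 13, 15, 17, 19, 21, 23, 25, 27, 29}     # as in (1.2)

def complement(X, space=S):
    """Complement of X relative to the universal set `space`."""
    return space - X

A_c = complement(A)

covers_space = (A | A_c) == S                    # A ∪ A^c = S
no_overlap = (A & A_c) == set()                  # A ∩ A^c = ∅
double_complement = complement(A_c) == A         # (A^c)^c = A
difference_rule = (B - A) == (B & complement(A)) # difference as intersection with a complement

print(covers_space, no_overlap, double_complement, difference_rule)
```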

1.1.3 Properties of the operations


Proposition 9 The operations of union and intersection are:

(i) commutative, that is, for any two sets $A$ and $B$, we have $A \cap B = B \cap A$ and $A \cup B = B \cup A$;

(ii) associative, that is, for any three sets $A$, $B$, and $C$, we have $A \cup (B \cup C) = (A \cup B) \cup C$
and $A \cap (B \cap C) = (A \cap B) \cap C$.

We leave to the reader the simple proof. Property (ii) allows us to write $A \cup B \cup C$
and $A \cap B \cap C$ and, therefore, to extend without ambiguity the operations of union and
intersection to an arbitrary (finite) number of sets:
$$\bigcup_{i=1}^{n} A_i \quad \text{and} \quad \bigcap_{i=1}^{n} A_i$$

It is possible to extend such operations also to infinitely many sets. If $A_1, A_2, \ldots, A_n, \ldots$ is an
infinite collection of sets, the union
$$\bigcup_{n=1}^{\infty} A_n$$

is the set of the elements that belong to at least one of the $A_n$, that is,
$$\bigcup_{n=1}^{\infty} A_n = \{a : a \in A_n \text{ for at least one index } n\}$$

The intersection
$$\bigcap_{n=1}^{\infty} A_n$$

is the set of the elements that belong to every $A_n$, that is,

$$\bigcap_{n=1}^{\infty} A_n = \{a : a \in A_n \text{ for every index } n\}$$

Example 10 Let $A_n$ be the set of the even numbers $\leq n$. For example, $A_3 = \{0, 2\}$ and
$A_6 = \{0, 2, 4, 6\}$. We have $\bigcap_{n=1}^{\infty} A_n = \{0\}$, since 0 is the only even number such that $0 \in A_n$
for each $n \geq 1$. Moreover, $\bigcup_{n=1}^{\infty} A_n = \{2n : n = 0, 1, 2, \ldots\}$, that is, $\bigcup_{n=1}^{\infty} A_n$ is the set
of all even natural numbers.
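A computer cannot enumerate infinitely many sets, but a finite truncation already shows the pattern of this example. The sketch below is not part of the original text; it assumes a truncation level `N = 20`, chosen arbitrarily, and introduces the function name `A_n` for illustration.

```python
def A_n(n):
    """The set of even natural numbers not exceeding n."""
    return {k for k in range(0, n + 1) if k % 2 == 0}

N = 20                                      # assumed truncation level
family = [A_n(n) for n in range(1, N + 1)]  # A_1, A_2, ..., A_N

intersection = set.intersection(*family)    # elements that belong to every A_n
union = set.union(*family)                  # elements that belong to at least one A_n

print(A_n(3), A_n(6), intersection, sorted(union))
```

Raising `N` leaves the intersection equal to `{0}` and adds further even numbers to the union, in line with the infinite case.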

We turn to the relations between the operations of intersection and union. Note the
symmetry between properties (1.4) and (1.5), in which $\cap$ and $\cup$ are exchanged.

Proposition 11 The operations of union and intersection are distributive, that is, given
any three sets $A$, $B$, and $C$, we have

$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C) \tag{1.4}$$

and
$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C). \tag{1.5}$$

Proof We prove only (1.4). We have to consider separately the two inclusions
$A \cap (B \cup C) \subseteq (A \cap B) \cup (A \cap C)$ and $(A \cap B) \cup (A \cap C) \subseteq A \cap (B \cup C)$.
If $x \in A \cap (B \cup C)$, then $x \in A$ and $x \in B \cup C$, that is, (i) $x \in A$ and (ii) $x \in B$ or
$x \in C$. It follows that $x \in A \cap B$ or $x \in A \cap C$, i.e., $x \in (A \cap B) \cup (A \cap C)$, and therefore
$A \cap (B \cup C) \subseteq (A \cap B) \cup (A \cap C)$.
Vice versa, if $x \in (A \cap B) \cup (A \cap C)$, then $x \in A \cap B$ or $x \in A \cap C$, that is, $x$ belongs
to $A$ and to at least one of $B$ and $C$ and therefore $x \in A \cap (B \cup C)$. It follows that
$(A \cap B) \cup (A \cap C) \subseteq A \cap (B \cup C)$.
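Both distributive laws, including (1.5), whose proof is omitted above, can be sanity-checked by enumeration. The sketch below is not part of the original text and is not a proof: it tests all triples of subsets of a three-element universe, chosen arbitrarily. The helper `subsets` is a name introduced here.

```python
from itertools import combinations, product

def subsets(universe):
    """Return all subsets of a finite collection, as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

S = {1, 2, 3}                                   # small universe, chosen arbitrarily
triples = list(product(subsets(S), repeat=3))   # 8 * 8 * 8 = 512 triples (A, B, C)

# (1.4): A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
law_1_4 = all((A & (B | C)) == ((A & B) | (A & C)) for A, B, C in triples)

# (1.5): A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
law_1_5 = all((A | (B & C)) == ((A | B) & (A | C)) for A, B, C in triples)

print(len(triples), law_1_4, law_1_5)
```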

We now introduce a concept that plays an important role in many applications.

Definition 12 A family
$$\{A_1, A_2, \ldots, A_n\} = \{A_i\}_{i=1}^{n}$$
of subsets of a set $A$ is a partition of $A$ if the subsets are pairwise disjoint, that is, $A_i \cap A_j = \emptyset$
for every $i \neq j$, and if their union coincides with $A$, that is, $\bigcup_{i=1}^{n} A_i = A$.

Example 13 Let $A$ be the set of all citizens of a country. Its subsets $A_1$, $A_2$, and $A_3$
formed, respectively, by the citizens of school or pre-school age (from 0 to 17 years old), by
the citizens of working age (from 18 to 64 years old) and by the elders (from 65 years old
on) constitute a partition of the set $A$.
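The two requirements in the definition of partition translate directly into a test. The sketch below is not part of the original text. It assumes, purely for illustration, that citizens are identified by their age in completed years from 0 to 100, and it takes the age brackets as 0 to 17, 18 to 64, and 65 or more, so that no age falls in two brackets. The function name `is_partition` is introduced here.

```python
def is_partition(family, whole):
    """True if the sets in `family` are pairwise disjoint and their union is `whole`."""
    family = list(family)
    pairwise_disjoint = all(family[i].isdisjoint(family[j])
                            for i in range(len(family))
                            for j in range(i + 1, len(family)))
    covers = set().union(*family) == set(whole)
    return pairwise_disjoint and covers

ages = set(range(0, 101))      # assumed population, identified by age 0, ..., 100
A1 = set(range(0, 18))         # school or pre-school age: 0 to 17
A2 = set(range(18, 65))        # working age: 18 to 64
A3 = set(range(65, 101))       # elders: 65 and over

good = is_partition([A1, A2, A3], ages)

# If the middle bracket also contained age 65, it would overlap with A3.
overlapping = is_partition([A1, set(range(18, 66)), A3], ages)

print(good, overlapping)
```

The second call fails the test because the brackets are no longer pairwise disjoint, even though their union still covers the whole population.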

We conclude with the so-called De Morgan’s laws for complementation: they illustrate
the relationship between the operations of intersection, union, and complementation.

Proposition 14 Given two subsets $A$ and $B$ of a space $S$, we have $(A \cup B)^c = A^c \cap B^c$ and
$(A \cap B)^c = A^c \cup B^c$.

Proof We prove only the first law, leaving the second one to the reader. As usual, in order
to prove an equality between sets, we have to consider separately the two inclusions that
compose it. (i) $(A \cup B)^c \subseteq A^c \cap B^c$. If $x \in (A \cup B)^c$, then $x \notin A \cup B$, that is, $x$ belongs
neither to $A$ nor to $B$. It follows that $x$ belongs simultaneously to $A^c$ and to $B^c$ and,
therefore, to their intersection. (ii) $A^c \cap B^c \subseteq (A \cup B)^c$. If $x \in A^c \cap B^c$ then $x \notin A$ and
$x \notin B$; therefore, $x$ does not belong to their union.

De Morgan’s laws show that, when considering complements, the operations $\cup$ and $\cap$ are
essentially interchangeable. Often these laws are written in the equivalent form

$$A \cup B = (A^c \cap B^c)^c \quad \text{and} \quad A \cap B = (A^c \cup B^c)^c$$
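De Morgan's laws and their equivalent form can be verified on a concrete example. The sketch below is not part of the original text; the space `S` (the integers from 10 to 30) and the helper name `c` for complementation are assumptions made here for illustration.

```python
S = set(range(10, 31))               # assumed space: integers 10, ..., 30
A = {11, 13, 17, 19, 23, 29}         # primes between 10 and 30
B = set(range(10, 31, 2))            # even numbers between 10 and 30

def c(X):
    """Complement of X relative to the space S."""
    return S - X

law_union = c(A | B) == (c(A) & c(B))          # (A ∪ B)^c = A^c ∩ B^c
law_intersection = c(A & B) == (c(A) | c(B))   # (A ∩ B)^c = A^c ∪ B^c

# Equivalent form: union and intersection expressed through complements.
union_via_complements = (A | B) == c(c(A) & c(B))
intersection_via_complements = (A & B) == c(c(A) | c(B))

print(law_union, law_intersection,
      union_via_complements, intersection_via_complements)
```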

1.1.4 A naive remark


In this book we will usually define sets by means of the properties of their elements. Such
a “naive” notion of a set is sufficient for our purposes. The naiveté of this approach is
highlighted by the classical paradoxes that, between the end of the nineteenth century and
the early twentieth century, were discovered by Cesare Burali Forti and Bertrand Russell. Such
paradoxes arise by considering sets of sets, that is, sets whose elements are sets themselves.
As in Burali Forti, using the naive notion of a set we define “the set of all sets”, that is, the
set whose elements share the property of being sets. If such a universal set “$U$” existed, we
could also form the set $\{B : B \subseteq U\}$ that consists of $U$ and all of its subsets. Yet, as shown
in Cantor’s Theorem 257, such a set does not belong to $U$, which contradicts the supposed
universality of $U$. Among the bizarre features of a universal set there is the fact that it
belongs to itself, i.e., $U \in U$, a completely unintuitive property (as observed by Russell, “the
human race, for instance, is not a human”).
As suggested by Russell, let us consider the set $A$ formed by all sets that are not members
of themselves (e.g., the set of red oranges belongs to $A$ because its elements are red oranges
and, obviously, none of them is the entire collection of all of them). If $A \notin A$, namely if $A$
does not belong to itself, then $A \in A$ because it is a set that satisfies the property of not
belonging to itself. On the other hand, if $A \in A$, namely if $A$ contains itself, then $A \notin A$
because, by definition, the elements of $A$ do not contain themselves. In conclusion, we reach
the absurdity $A \notin A$ if and only if $A \in A$. It is the famous paradox of Russell.
These logical paradoxes (often called antinomies) can be addressed within a non-naive set
theory, in particular that of Zermelo-Fraenkel. In the practice of mathematics, all the more
in an introductory textbook, these foundational aspects can be safely ignored (their study
would require an ad hoc, highly non-trivial, course). But it is important to be aware of these
paradoxes because the methods that have been developed to address them have affected the
practice of mathematics, as well as that of the empirical sciences.

1.2 Numbers
To quantify the variables of interest in economic applications (for example, the prices and
quantities of goods traded in some market) we need an adequate set of numbers. This is the
subject of the present section.
The natural numbers
$$0, 1, 2, 3, \ldots$$
do not need any introduction; their set will be denoted by the symbol $\mathbb{N}$.
The set $\mathbb{N}$ of natural numbers is closed with respect to the fundamental operations of
addition and multiplication:

(i) $m + n \in \mathbb{N}$ when $m, n \in \mathbb{N}$;

(ii) $m \cdot n \in \mathbb{N}$ when $m, n \in \mathbb{N}$.

On the contrary, $\mathbb{N}$ is not closed with respect to the fundamental operations of subtraction
and division: for example, neither $5 - 6$ nor $5/6$ is a natural number. It is therefore clear
that $\mathbb{N}$ is inadequate as a set of numbers to quantify all economic quantities: the budget of
a company is a first obvious example in which the closure with respect to subtraction is
crucial (otherwise, how can we quantify losses?).

The integer numbers (or relative integers)$^5$

$$\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots$$

form a first extension, denoted by the symbol $\mathbb{Z}$, of the set $\mathbb{N}$. It leads to a set that is closed
with respect to addition and multiplication, as well as to subtraction. Indeed, by setting
$m - n = m + (-n)$,$^6$ we have

(i) $m \pm n \in \mathbb{Z}$ when $m, n \in \mathbb{Z}$;

(ii) $m \cdot n \in \mathbb{Z}$ when $m, n \in \mathbb{Z}$.

Formally, the set $\mathbb{Z}$ can be written in terms of $\mathbb{N}$ as

$$\mathbb{Z} = \{m - n : m, n \in \mathbb{N}\}$$

Proposition 15 $\mathbb{N} \subseteq \mathbb{Z}$.
5
In ancient India positive numbers and negative numbers were distinguished by writing them, respectively,
in red and in black. This convention is in contrast to the one banks follow according to which a checking
account with negative balance is “in the red”.
$^6$ The difference $m - n$ is simply the sum of $m$ with the negative $-n$ of $n$. Concerning this aspect, recall
the notion of algebraic sum.

Proof Let $m \in \mathbb{N}$. We have $m = m - 0 \in \mathbb{Z}$, since $0 \in \mathbb{N}$.

We are left with a fundamental operation with respect to which $\mathbb{Z}$ is not closed: division.
For example, $1/3$ is not an integer number. To remedy this important shortcoming of the
integers (if we want to divide 1 cake among 3 guests, how can we quantify their portions
if only $\mathbb{Z}$ is available?), we need a further enlargement to the set of the rational numbers,
denoted by the symbol $\mathbb{Q}$, and given by
$$\mathbb{Q} = \left\{ \frac{m}{n} : m, n \in \mathbb{Z} \text{ with } n \neq 0 \right\}$$
In other words, the set of the rational numbers consists of all the fractions with integer
numbers in the numerator and in the denominator (not equal to zero).

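Proposition 16 $\mathbb{Z} \subseteq \mathbb{Q}$.

Proof Let $m \in \mathbb{Z}$. We have $m = m/1 \in \mathbb{Q}$, since $1 \in \mathbb{Z}$.

The set of rational numbers is closed with respect to all the four fundamental operations:$^7$

(i) $m \pm n \in \mathbb{Q}$ when $m, n \in \mathbb{Q}$;

(ii) $m \cdot n \in \mathbb{Q}$ when $m, n \in \mathbb{Q}$;

(iii) $m/n \in \mathbb{Q}$ when $m, n \in \mathbb{Q}$ with $n \neq 0$.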
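The successive enlargements from the natural numbers to the integers and to the rationals can be illustrated with exact arithmetic. The sketch below is not part of the original text; it uses Python integers and the standard-library class `fractions.Fraction` as stand-ins for integers and rational numbers, and the sample values are chosen arbitrarily.

```python
from fractions import Fraction

m, n = 5, 6

# Subtraction leads outside the natural numbers: 5 - 6 = -1 is negative.
difference_is_natural = (m - n) >= 0

# Division leads outside the integers: 5/6 is a fraction with denominator 6.
quotient = Fraction(m, n)
quotient_is_integer = quotient.denominator == 1

# The four operations applied to two rationals give rationals again.
q1, q2 = Fraction(1, 3), Fraction(-7, 4)
results = [q1 + q2, q1 - q2, q1 * q2, q1 / q2]   # q2 is nonzero, so division is allowed
all_rational = all(isinstance(r, Fraction) for r in results)

print(m - n, quotient, results, all_rational)
```

Exact fractions are used instead of floating-point numbers so that no rounding is involved.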

O.R. Each rational number that is not periodic, that is, that has a finite number of decimals,
has two decimal representations. For example, $1 = 0.\overline{9}$ because
$$0.\overline{9} = 3 \cdot 0.\overline{3} = 3 \cdot \frac{1}{3} = 1$$
In an analogous way, $2.5 = 2.4\overline{9}$, $51.2 = 51.1\overline{9}$, and so on. On the contrary, periodic rational
numbers and irrational numbers have a unique decimal representation (which is infinite).
This is not a simple curiosity: if $0.\overline{9}$ were not equal to 1, we could state that $0.\overline{9}$ is the
number that immediately precedes 1 (without any other number in between), which would
violate a notable property that we will discuss shortly.
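The equality between 1 and the periodic decimal with period 9 can be made plausible by looking at truncations. The sketch below is not part of the original text: it computes, with exact fractions, the gap between 1 and the decimal 0.99...9 with $k$ nines. The function name `truncated_nines` is introduced here for illustration.

```python
from fractions import Fraction

def truncated_nines(k):
    """Exact value of 0.99...9 with k nines after the decimal point."""
    return sum(Fraction(9, 10**j) for j in range(1, k + 1))

# Gap between 1 and the truncation, for a few values of k.
gaps = [1 - truncated_nines(k) for k in (1, 2, 5, 10)]

print(gaps)
```

Each gap equals $10^{-k}$, so it can be made smaller than any positive number by taking $k$ large enough; this is consistent with the periodic decimal being equal to 1, although the computation itself only covers finitely many digits.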

The set of rational numbers seems, therefore, to be equipped with everything that can be useful.
Some simple observations on multiplication, however, will bring us some surprising
findings. If $q$ is a rational number, as is well known, the notation $q^n$, with $n \geq 1$, means

$$\underbrace{q \cdot q \cdots q}_{n \text{ times}}$$

We agree that $q^0 = 1$ for every $q \neq 0$. By itself the notation $q^n$, called power of base $q$
and exponent $n$, is just a simple way to write more compactly the repeated multiplication
$^7$ The names of the four fundamental operations are addition, subtraction, multiplication, and division,
while the names of their results are, respectively, sum, difference, product, and quotient (the addition of 3
and 4 has 7 as sum, and so on).

of the same factor. Nevertheless, given a rational $q > 0$, it is natural to consider the inverse
path, that is, to determine the positive “number”, denoted by $q^{\frac{1}{n}}$ (sometimes by $q^{1/n}$) or,
equivalently, by $\sqrt[n]{q}$, and called root of order $n$ of $q$, such that
$$\left( q^{\frac{1}{n}} \right)^n = q$$
For example,$^8$ $\sqrt{25} = 5$ as $5^2 = 25$. To understand the importance of roots, we can consider
the following simple geometric figure:

By Pythagoras’ Theorem, the length of the hypotenuse is $\sqrt{2}$. To quantify elementary
geometric entities, we thus need square roots. Here we have a surprise, tragic to some.$^9$

Theorem 17 $\sqrt{2} \notin \mathbb{Q}$.
Proof Suppose, by contradiction, that $\sqrt{2} \in \mathbb{Q}$. Then there exist $m, n \in \mathbb{Z}$, with $n \neq 0$, such that
$m/n = \sqrt{2}$, and therefore
$$\left( \frac{m}{n} \right)^2 = 2 \tag{1.6}$$
We can assume that $m/n$ is already reduced to its lowest terms, i.e., that $m$ and $n$ have no
factors in common.$^{10}$ This means that $m$ and $n$ cannot both be even numbers (otherwise, 2
would be a common factor).
Formula (1.6) implies
$$m^2 = 2n^2 \tag{1.7}$$
and therefore $m^2$ is even. As the square of an odd number is odd, $m$ is also even (otherwise,
if $m$ were odd, $m^2$ would also be odd). Therefore, there exists an integer $k \neq 0$ such that

$$m = 2k \tag{1.8}$$

From (1.7) and (1.8) it follows that $4k^2 = 2n^2$, that is,

$$n^2 = 2k^2$$
8 p p
The square root 2 q is simply denoted by q, skipping the index 2.
9
For the Pythagorean philosophy, in which the proportions (that is, the rational numbers) were central,
the discovery of the non-rationality of square roots was a traumatic event. We refer the curious reader to K.
von Fritz, “The discovery of incommensurability by Hippasus of Metapontum”, Annals of Mathematics, 46,
242–264, 1945.
10
For example, $14/10$ is not reduced to its lowest terms because the numerator and the denominator have
in common the factor 2. On the contrary, $7/5$ is reduced to its lowest terms.

Therefore $n^2$ is even, and so $n$ itself is even. In conclusion, both $m$ and $n$ are even, but this
contradicts the fact that $m/n$ is reduced to its lowest terms. This contradiction proves that
$\sqrt{2} \notin \mathbb{Q}$.
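
The statement can be probed numerically, although no finite search replaces the proof. The following Python sketch looks for integers $m, n$ satisfying $m^2 = 2n^2$, as in (1.7), and finds none in the range examined. The function name `has_rational_sqrt2_up_to` and the search bound are illustrative assumptions, not part of the text.

```python
import math

def has_rational_sqrt2_up_to(max_n: int) -> bool:
    """Return True if some 1 <= n <= max_n admits an integer m with m*m == 2*n*n."""
    for n in range(1, max_n + 1):
        target = 2 * n * n
        m = math.isqrt(target)  # largest integer m with m*m <= target
        if m * m == target:
            return True
    return False

# No denominator up to 10,000 yields a fraction m/n whose square is exactly 2.
found = has_rational_sqrt2_up_to(10_000)
```

The search only confirms the theorem for the denominators tried; the proof by contradiction above is what rules out every denominator at once.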

This magnificent result is one of the great theorems of Greek mathematics. Proved by
the Pythagorean school between the sixth and the fifth century B.C., it was a turning point in
the history of mathematics. Leaving aside the philosophical aspects, from the mathematical
point of view it shows the need for a further enlargement of the set of numbers in order to
quantify basic geometric entities (as well as basic economic quantities, as will become clear in
the sequel).
To introduce, at an intuitive level, this final enlargement,11 consider the classical real
line:

It is easy to see how on this line we can represent the rational numbers:

The rational numbers do not exhaust, however, the real line. For example, roots like
$\sqrt{2}$, or other non-rational numbers, such as $\pi$, must also find their representation on the real
line:12

We denote by R the set of all the numbers that can be represented on the real line; they are
called real numbers.
The set $\mathbb{R}$ has the following properties in terms of the fundamental operations (here $a$, $b$
and $c$ are generic real numbers):

(i) $a + b \in \mathbb{R}$ and $a \cdot b \in \mathbb{R}$;

(ii) $a + b = b + a$ and $a \cdot b = b \cdot a$;

(iii) $(a + b) + c = a + (b + c)$ and $(a \cdot b) \cdot c = a \cdot (b \cdot c)$;

(iv) $a + 0 = a$ and $b \cdot 1 = b$;

(v) $a + (-a) = 0$ and $b \cdot b^{-1} = 1$ provided $b \neq 0$;

(vi) $a \cdot (b + c) = a \cdot b + a \cdot c$.
11
For a rigorous treatment we refer, for example, to the first chapter of W. Rudin, Principles of mathematical
analysis, McGraw-Hill, 1976.

12
Though intuitive, it is actually a postulate (of continuity of the real line).

Clearly, $\mathbb{Q} \subseteq \mathbb{R}$; but $\mathbb{Q} \neq \mathbb{R}$: there are many real numbers, called irrationals, that are
not rational. Many roots and the numbers $\pi$ and $e$ are examples of irrational numbers. It
is actually possible to prove that most real numbers are irrational. Although a rigorous
treatment of this topic would take us too far, the next simple result is already a clear
indication of how rich the set of the irrational numbers is.

Proposition 18 Given any two rational numbers $a < b$, there exists an irrational number
$c \in \mathbb{R}$ such that $a < c < b$.

Proof For each natural $n \in \mathbb{N}$, let

$$c_n = a + \frac{\sqrt{2}}{n}$$

We have $c_n > a$ for every $n$, and it is easy to check that every $c_n$ is irrational (otherwise
$\sqrt{2} = n(c_n - a)$ would be rational, contradicting Theorem 17). Moreover,

$$c_n < b \iff n > \frac{\sqrt{2}}{b - a}$$

Let therefore $n \in \mathbb{N}$ be any natural number such that $n > \sqrt{2}/(b - a)$ (such $n$ exists because
of the Archimedean property of the real numbers, which we will soon see in Proposition 38).
Since $a < c_n < b$, the proof is complete.
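
The choice of $n$ in the proof can be made explicit. The sketch below, in Python, computes the smallest natural $n$ with $n > \sqrt{2}/(b - a)$ using exact integer arithmetic, so that no floating-point approximation of $\sqrt{2}$ is needed. The function name `index_for_irrational_between` is a hypothetical helper introduced only for illustration.

```python
import math
from fractions import Fraction

def index_for_irrational_between(a: Fraction, b: Fraction) -> int:
    """Smallest natural n with n > sqrt(2)/(b - a), for rationals a < b."""
    if not a < b:
        raise ValueError("need a < b")
    gap = b - a                            # positive rational p/q in lowest terms
    p, q = gap.numerator, gap.denominator
    # The condition n > sqrt(2)*q/p is met by taking the integer part of
    # sqrt(2)*q (which is isqrt(2*q*q)), dividing by p, and adding 1.
    return math.isqrt(2 * q * q) // p + 1

# For a = 1/3 and b = 1/2 this gives n = 9, so c = 1/3 + sqrt(2)/9 lies in (1/3, 1/2).
n = index_for_irrational_between(Fraction(1, 3), Fraction(1, 2))
```

With $a = 1/3$ and $b = 1/2$ the resulting irrational number is approximately $0.4905$.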

In conclusion, R is the set of numbers that we will consider in the rest of the book. It
turns out to be adequate for most economic applications.13

1.3 Structure of the integers


Let us now analyze some basic — yet not trivial — properties of integers. The main result
we will present is the Fundamental Theorem of Arithmetic, which shows the central role
prime numbers play in the structure of the set of integers.

1.3.1 Divisors and algorithms


In this first section we will present some preliminary notions which will be needed for the
following section regarding prime numbers. In so doing we will encounter and get acquainted
with the notion of algorithm, which is of paramount importance for applications.
We begin by introducing in a rigorous fashion some notions, the essence of which the
reader may have learned in elementary school. An integer $n$ is divisible by an integer $p \neq 0$
if there is a third integer $q$ such that $n = pq$. In symbols we write $p \mid n$, which is read as “$p$
divides $n$”.

Example 19 The integer 6 is divisible by the integer 2, that is $2 \mid 6$, as the integer 3 is such
that $6 = 2 \cdot 3$. Furthermore, 6 is divisible by 3, that is $3 \mid 6$, as the integer 2 is such
that $6 = 2 \cdot 3$. ▲
13
An important further enlargement, which we do not consider, is the set C of complex numbers.

The reader may have learned in elementary school how to divide two integers by using
remainders and quotients. For example, if $n = 7$ and $m = 2$, we have $n = 3 \cdot 2 + 1$, with 3 as
the quotient and 1 as the remainder. The next simple result formalizes the above procedure
and shows that it holds for any pair of integers (something that young learners take for
granted, but from now on we will take nothing for granted).

Proposition 20 Given any two integers $m$ and $n$, with $m$ strictly positive,14 there is one
and only one pair of integers $q$ and $r$ such that

$$n = qm + r$$

with $0 \leq r < m$.

Proof Two distinct properties are stated in the proposition: the existence of the pair $(q, r)$,
and its uniqueness. Let us start by proving its existence. We will only consider the case
in which $n \geq 0$ (you need only change the sign if $n < 0$). Consider the set $A =
\{p \in \mathbb{N} : p \leq n/m\}$. Since $n \geq 0$, $A$ is non-empty, as it contains at least the integer zero. Let
$q$ be the largest element of $A$. By definition, $qm \leq n < (q + 1)m$. Setting $r = n - qm$, we
have
$$0 \leq n - qm = r < (q + 1)m - qm = m$$

We have thus shown the existence of the desired pair $(q, r)$.
Let us now consider uniqueness. By contradiction, let $(q', r')$ and $(q'', r'')$ be two different
pairs such that
$$n = q'm + r' = q''m + r'' \tag{1.9}$$

with $0 \leq r', r'' < m$. Since $(q', r')$ and $(q'', r'')$ are different we have either $q' \neq q''$ or $r' \neq r''$
or both. If $q' \neq q''$, without loss of generality, we can suppose that $q' < q''$; that is,

$$q' + 1 \leq q'' \tag{1.10}$$

since $q'$ and $q''$ are integers. It follows from (1.9) that $(q'' - q')m = r' - r''$. Since
$(q'' - q')m \geq 0$, $r' < m$ and $r'' \geq 0$, we have that $0 \leq r' - r'' < m$. Hence,

$$(q'' - q')m = r' - r'' < m$$

which implies that $q'' - q' < 1$, that is, $q'' < q' + 1$, which contradicts (1.10). We can
conclude that, necessarily, $q' = q''$. This leaves open only the possibility that $r' \neq r''$. But,
since $q' = q''$, we have that

$$0 = (q'' - q')m = r' - r'' \neq 0,$$

a contradiction. Hence, the assumption of having two different pairs $(q', r')$ and $(q'', r'')$ is
false.
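
As a small illustration of Proposition 20, the following Python sketch computes the pair $(q, r)$. It relies on the fact that Python's floor division `//` returns the largest integer $q$ with $qm \leq n$, which mirrors the choice of $q$ in the existence part of the proof. The function name `divide_with_remainder` is an assumption made for this example.

```python
def divide_with_remainder(n: int, m: int) -> tuple[int, int]:
    """Return the unique (q, r) with n == q*m + r and 0 <= r < m, for m > 0."""
    if m <= 0:
        raise ValueError("m must be strictly positive")
    q = n // m       # largest integer q such that q*m <= n
    r = n - q * m    # remainder, which then satisfies 0 <= r < m
    return q, r

# The example from the text: 7 = 3*2 + 1.
q, r = divide_with_remainder(7, 2)
```

Because floor division rounds towards minus infinity, the same function also covers negative $n$; for instance, $-7 = (-4) \cdot 2 + 1$.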
14
An integer $m$ is said to be strictly positive if $m > 0$, that is, $m \geq 1$.

Greatest common divisor


Given two strictly positive integers $m$ and $n$, their greatest common divisor, denoted by
$\gcd(m, n)$, is the largest divisor both numbers share. The next result, which was proven by
Euclid in his Elements, shows exactly what was taken for granted in grade school, namely,
that any pair of integers has a unique greatest common divisor.

Theorem 21 (Euclid) Any pair of strictly positive integers has one and only one greatest
common divisor.

Proof Like Proposition 20, this is also an existence and uniqueness result. Uniqueness is
obvious; let us prove existence. Let $m$ and $n$ be any two strictly positive integers. By
Proposition 20, there is a unique pair $(q_1, r_1)$ such that

$$n = q_1 m + r_1 \tag{1.11}$$

with $0 \leq r_1 < m$. If $r_1 = 0$, then $\gcd(m, n) = m$, and the proof is concluded. If $r_1 > 0$, we
iterate the procedure by applying Proposition 20 to $m$ and $r_1$. We thus have a unique pair $(q_2, r_2)$
such that
$$m = q_2 r_1 + r_2 \tag{1.12}$$
where $0 \leq r_2 < r_1$. If $r_2 = 0$, then $\gcd(m, n) = r_1$. Indeed, (1.12) implies $r_1 \mid m$. Furthermore,
by (1.11) and (1.12), we have that
$$\frac{n}{r_1} = \frac{q_1 m + r_1}{r_1} = \frac{q_1 q_2 r_1 + r_1}{r_1} = q_1 q_2 + 1$$

and so $r_1 \mid n$. Thus $r_1$ is a divisor of both $n$ and $m$. We now need to show that it is the
greatest of those divisors. Suppose $p$ is a strictly positive integer such that $p \mid m$ and $p \mid n$.
By definition, there are two strictly positive integers $a$ and $b$ such that $n = ap$ and $m = bp$.
We have that
$$0 < \frac{r_1}{p} = \frac{n - q_1 m}{p} = a - q_1 b$$
Hence $r_1/p$ is a strictly positive integer, which implies that $r_1 \geq p$. To sum up, $\gcd(m, n) =
r_1$ if $r_2 = 0$. If this is the case, the proof is concluded.
If $r_2 > 0$, we iterate the procedure once more by applying Proposition 20 to $r_1$ and $r_2$. We thus
have a unique pair $(q_3, r_3)$ such that

$$r_1 = q_3 r_2 + r_3$$

where $0 \leq r_3 < r_2$. If $r_3 = 0$, proceeding as above we can show that $\gcd(m, n) = r_2$,
and the proof is complete. If $r_3 > 0$, we iterate the procedure. Iteration after iteration, a
strictly decreasing sequence of non-negative integers $r_1 > r_2 > \cdots > r_k$ is generated. A strictly
decreasing sequence of non-negative integers can only be finite: there is a $k \geq 1$ such that $r_k = 0$.
Proceeding as above we can show that $\gcd(m, n) = r_{k-1}$ (with the convention $r_0 = m$), which
completes the proof of existence of $\gcd(m, n)$.

From a methodological standpoint, the above argument is a good example of a constructive
proof, since it is based on an algorithm (known as Euclid’s Algorithm) which
determines with a finite number of iterations the mathematical entity whose existence is
stated (here, the greatest common divisor). The notion of algorithm is of paramount importance
because, when an algorithm is available, it makes mathematical entities computable.
In principle an algorithm can be automated by means of an appropriate computer program
(for example, Euclid’s Algorithm allows us to automate the search for the greatest common
divisor).

Euclid’s Algorithm is the first algorithm we encounter and it is of such importance in
number theory that it deserves to be reviewed in greater detail. Given two strictly positive
integers $m$ and $n$, the algorithm unfolds in the following $k \geq 1$ steps:
Step 1 $n = q_1 m + r_1$
Step 2 $m = q_2 r_1 + r_2$
Step 3 $r_1 = q_3 r_2 + r_3$
.......
Step k $r_{k-2} = q_k r_{k-1}$ (that is, $r_k = 0$)
The algorithm stops at step $k$, where $r_k = 0$. In this case $\gcd(m, n) = r_{k-1}$, as we saw
in the previous proof.
Example 22 Let us consider the strictly positive integers 3801 and 1708. Their greatest
common divisor is not apparent at first sight. Fortunately we can calculate it by means of
Euclid’s Algorithm. We proceed as follows:
Step 1 $3801 = 2 \cdot 1708 + 385$
Step 2 $1708 = 4 \cdot 385 + 168$
Step 3 $385 = 2 \cdot 168 + 49$
Step 4 $168 = 3 \cdot 49 + 21$
Step 5 $49 = 2 \cdot 21 + 7$
Step 6 $21 = 3 \cdot 7$
In six steps we have found that $\gcd(3801, 1708) = 7$. ▲
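
Since the text stresses that an algorithm can be automated, here is a minimal Python sketch of Euclid's Algorithm that also records each division step. The function name `euclid_steps` and the tuple layout `(dividend, quotient, divisor, remainder)` are choices made for this illustration only.

```python
def euclid_steps(n: int, m: int) -> tuple[int, list[tuple[int, int, int, int]]]:
    """Run Euclid's Algorithm on strictly positive n, m.

    Returns the greatest common divisor together with the list of steps,
    each step being (dividend, quotient, divisor, remainder).
    """
    if n <= 0 or m <= 0:
        raise ValueError("both integers must be strictly positive")
    steps = []
    while m != 0:
        q, r = divmod(n, m)       # n == q*m + r with 0 <= r < m
        steps.append((n, q, m, r))
        n, m = m, r               # continue with the divisor and the remainder
    return n, steps

# Reproduces the six steps of the worked example.
gcd_value, steps = euclid_steps(3801, 1708)
for dividend, quotient, divisor, remainder in steps:
    print(f"{dividend} = {quotient} * {divisor} + {remainder}")
```

The last recorded step has remainder zero, and the divisor of that step is the greatest common divisor.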
The quality of an algorithm depends on the number of steps, or iterations, that are
required to reach the solution. The fewer the iterations, the more powerful the algorithm is.
The following remarkable property, proven by Gabriel Lamé, holds for Euclid’s Algorithm.
Theorem 23 (Lamé) Given two integers $m$ and $n$, the number of iterations needed for
Euclid’s Algorithm is less than or equal to five times the number of digits of $\min\{m, n\}$.
For example, if we go back to the numbers 3801 and 1708, the number of relevant digits
is 4. Lamé’s Theorem guarantees in advance that Euclid’s Algorithm would have required
at most 20 iterations. It took us only 6 steps, but thanks to Lamé’s Theorem we already
knew, before starting, that it would not have taken too much effort (and thus it was worth
giving it a shot without running the risk of getting stuck in a grueling number of iterations).
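
Lamé's bound can be compared with the actual number of iterations. The Python sketch below counts the division steps and computes the bound of five times the number of decimal digits of the smaller integer; the function names are hypothetical, and the larger integer is assumed to be divided by the smaller one in the first step.

```python
def euclid_iterations(n: int, m: int) -> int:
    """Number of division steps of Euclid's Algorithm, dividing the larger by the smaller first."""
    n, m = max(n, m), min(n, m)
    count = 0
    while m != 0:
        n, m = m, n % m
        count += 1
    return count

def lame_bound(n: int, m: int) -> int:
    """Five times the number of decimal digits of min{m, n}."""
    return 5 * len(str(min(n, m)))

# The example of the text: 6 iterations against a guaranteed maximum of 20.
used, bound = euclid_iterations(3801, 1708), lame_bound(3801, 1708)

# Consecutive Fibonacci numbers are the classical slow case: 144 and 89 need 10 steps.
fib_used, fib_bound = euclid_iterations(144, 89), lame_bound(144, 89)
```

Pairs of consecutive Fibonacci numbers are a standard example in which the bound is attained or nearly attained, which is why they are used here.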

1.3.2 Prime numbers


Among the natural numbers, a prominent position is held by prime numbers, which the
reader has most likely encountered in secondary school.

Definition 24 A natural number $n \geq 2$ is said to be prime if it is divisible only by 1 and
itself.

A natural number which is not prime is called composite. Let us denote the set of prime
numbers by $\mathbb{P}$. Obviously, $\mathbb{P} \subseteq \mathbb{N}$ and $\mathbb{N} - \mathbb{P}$ is the set of composite numbers. The reader can
easily verify that the following naturals

$$\{2, 3, 5, 7, 11, 13, 17, 19, 23, 29\}$$

are the first ten prime numbers.


The importance of prime numbers becomes more apparent if we note how composite
numbers (strictly greater than 1) can be expressed as a product of primes. For example, the
composite number 12 can be written as

$$12 = 2^2 \cdot 3$$

while the composite number 60 can be written as

$$60 = 2^2 \cdot 3 \cdot 5$$

In general, the prime factorization (or decomposition) of a composite number $n$ can be
written as
$$n = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k} \tag{1.13}$$
where $p_i \in \mathbb{P}$ and $n_i \in \mathbb{N}$ for each $i = 1, \ldots, k$, with

$$p_1 < p_2 < \cdots < p_k \quad \text{and} \quad n_1 > 0, \ldots, n_k > 0$$

Example 25 (i) For $n = 12$ we have $p_1 = n_1 = 2$, $p_2 = 3$ and $n_2 = 1$; in this case $k = 2$.
(ii) For $n = 60$ we have $p_1 = n_1 = 2$, $p_2 = 3$, $n_2 = 1$, $p_3 = 5$ and $n_3 = 1$; in this case $k = 3$.
(iii) For $n = 200$ we have
$$200 = 2^3 \cdot 5^2$$
hence $p_1 = 2$, $n_1 = 3$, $p_2 = 5$ and $n_2 = 2$; in this case $k = 2$.
(iv) For $n = 522$ we have

$$522 = 2 \cdot 3^2 \cdot 29$$

hence $p_1 = 2$, $n_1 = 1$, $p_2 = 3$, $n_2 = 2$, $p_3 = 29$ and $n_3 = 1$; in this case $k = 3$. ▲
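
The factorizations in the example can be recomputed mechanically. The following Python sketch uses plain trial division, which is adequate for small numbers only; it is not an efficient factorization method. The function name `prime_factorization` and the dictionary representation (prime mapped to exponent) are assumptions of this illustration.

```python
def prime_factorization(n: int) -> dict[int, int]:
    """Return {p_i: n_i} with n equal to the product of p_i ** n_i, for n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:                 # divide out p as many times as possible
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:                             # what remains is a prime factor
        factors[n] = factors.get(n, 0) + 1
    return factors

# The four cases of the example.
examples = {n: prime_factorization(n) for n in (12, 60, 200, 522)}
```

Each dictionary lists the primes $p_i$ as keys and the exponents $n_i$ as values, in the notation of (1.13).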

What we have just seen raises two questions: whether every natural number admits a prime
factorization (we have only seen a few specific examples up to now) and whether such factorization
is unique. The next result, the Fundamental Theorem of Arithmetic, resolves both matters
by showing that every natural number greater than 1 admits one and only one prime factorization.
In other words, every such number can be expressed uniquely as a product of prime numbers.
Prime numbers are thus the “atoms” of $\mathbb{N}$: they are “indivisible” (as they are divisible
only by 1 and themselves) and by means of them any other natural number can be expressed
uniquely. The importance of this result, which shows the centrality of prime numbers, can
be seen in its name. Its first proof can be found in the famous Disquisitiones Arithmeticae,
published in 1801 by Carl Friedrich Gauss, although Euclid was already aware of the result
in its essence.

Theorem 26 (Fundamental Theorem of Arithmetic) Any natural number $n > 1$ admits
one and only one prime factorization as in (1.13).

Proof Let us start by showing the existence of this factorization. We will proceed by
contradiction. Suppose there are natural numbers that do not have a prime factorization
as in (1.13). Let $n > 1$ be the smallest among them. Obviously, $n$ is a composite number.
There are then two natural numbers $p$ and $q$ such that $n = pq$ with $1 < p, q < n$. Since $n$
is the smallest number that does not admit a prime factorization, the numbers $p$ and $q$ do
admit such factorization. In particular, we can write
$$p = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k} \quad \text{and} \quad q = q_1^{n'_1} q_2^{n'_2} \cdots q_s^{n'_s}$$

Thus, we have that

$$n = pq = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k} \, q_1^{n'_1} q_2^{n'_2} \cdots q_s^{n'_s}$$
By collecting the terms $p_i$ and $q_j$ appropriately, $n$ can be rewritten as in (1.13). Hence, $n$
admits a prime factorization, which contradicts our assumptions on $n$, thus concluding the
proof of the existence.
Let us proceed by contradiction to prove uniqueness as well. Suppose that there are
natural numbers that admit more than one factorization. Let $n > 1$ be the smallest among
them: then $n$ admits at least two different factorizations, so that we can write
$$n = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k} = q_1^{n'_1} q_2^{n'_2} \cdots q_s^{n'_s}$$

Since $q_1$ is a divisor of $n$, it must be a divisor of at least one of the factors $p_1 < \cdots < p_k$.15
For example, let $p_1$ be one such factor. Since both $q_1$ and $p_1$ are primes, we have that $q_1 = p_1$.
Hence
$$p_1^{n_1 - 1} p_2^{n_2} \cdots p_k^{n_k} = q_1^{n'_1 - 1} q_2^{n'_2} \cdots q_s^{n'_s} < n$$
which contradicts the minimality of $n$, as the number $p_1^{n_1 - 1} p_2^{n_2} \cdots p_k^{n_k}$ also admits multiple
factorizations. The contradiction proves the uniqueness of the prime factorization.

From a methodological viewpoint it must be noted that this proof of existence is carried
out by contradiction and, as such, cannot be constructive. Indeed, such proofs are based on
the law of excluded middle (a property is true if and only if it is not false) and the truth
of a statement is established by showing its non-falseness. This often allows for such proofs
to be short and elegant but, although logically air-tight,16 they are almost metaphysical as
they do not provide a procedure for constructing the mathematical entities whose existence
15
This mathematical fact, although intuitive, requires a mathematical proof. This is indeed the content of
Euclid’s Lemma, which we do not prove. This lemma allows us to conclude that if a prime $p$ divides a product
of strictly positive integers, then it must divide at least one of them.
16
Unless one rejects the law of excluded middle, as many eminent mathematicians have done (although it
constitutes a minority view and a very subtle methodological issue, the analysis of which is surely premature).

they establish. In other words, they do not provide an algorithm with which such entities
can be determined.
To sum up, we invite the reader to compare this proof of existence with the constructive
one provided for Theorem 21. This comparison should clarify the differences between the two
fundamental types of proofs of existence, constructive/direct and non-constructive/indirect.

It is not a coincidence that the proof of the existence in the Fundamental Theorem of
Arithmetic is not constructive. Indeed, designing algorithms which allow us to factorize
a natural number $n$ into prime numbers (the so-called factorization tests) is exceedingly
complex. After all, constructing algorithms which can assess whether $n$ is prime or composite
(the so-called primality tests) is already extremely cumbersome and it is to this day an active
research field (so much so that an important result in this field dates to 2002).17
In order to grasp the complexity of the problem it suffices to observe that, if $n$ is composite,
there are two natural numbers $a, b > 1$ such that $n = ab$. Hence, $a \leq \sqrt{n}$ or $b \leq \sqrt{n}$
(otherwise, $ab > n$), and so there is a divisor of $n$ among the natural numbers between 2
and $\sqrt{n}$. In order to verify whether $n$ is prime or composite, we can merely divide $n$ by all
natural numbers between 2 and $\sqrt{n}$: if none of them is a divisor of $n$, we can safely conclude
that $n$ is a prime number; otherwise, $n$ is composite. This procedure
requires at most $\sqrt{n}$ steps.
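
The procedure just described translates directly into code. The Python sketch below tests primality by dividing $n$ by every natural number from 2 up to the integer part of $\sqrt{n}$; the function name `is_prime` is an assumption, and the method is the naive one of the text, not a modern primality test.

```python
import math

def is_prime(n: int) -> bool:
    """Naive primality test: try every divisor d with 2 <= d <= sqrt(n)."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):  # isqrt(n) is the integer part of sqrt(n)
        if n % d == 0:
            return False                   # found a divisor, so n is composite
    return True

# Recovers the first ten primes listed earlier in the section.
first_primes = [n for n in range(2, 30) if is_prime(n)]
```

The loop performs at most about $\sqrt{n}$ divisions, which is exactly why the approach becomes hopeless for numbers with a hundred digits.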
With this in mind, suppose we want to test whether the number $10^{100} + 1$ is prime or
composite (it is a number with 101 digits, so it is big, but not huge). The procedure requires
at most $\sqrt{10^{100} + 1}$ operations, that is, at most $10^{50}$ operations (approximately). Suppose we
have an extremely powerful computer which is able to carry out $10^{10}$ (ten billion) operations
per second. Since there are 31,536,000 seconds in a year, that is, approximately $3 \cdot 10^7$
seconds, our computer would be able to carry out approximately $3 \cdot 10^7 \cdot 10^{10} = 3 \cdot 10^{17}$
operations in one year. In order to carry out the operations our procedure might require,
our computer would need
$$\frac{10^{50}}{3 \cdot 10^{17}} = \frac{1}{3} \cdot 10^{33}$$
years. We had better get started...
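
The back-of-the-envelope estimate can be checked in a few lines of Python. The constant and function names below are illustrative assumptions; the speed of $10^{10}$ operations per second is the one assumed in the text.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000 seconds in a (non-leap) year
OPS_PER_SECOND = 10**10                 # the hypothetical computer of the text

def years_needed(operations: int) -> float:
    """Years required to carry out the given number of operations."""
    return operations / (SECONDS_PER_YEAR * OPS_PER_SECOND)

# About 10**50 trial divisions for a number near 10**100.
estimate = years_needed(10**50)
```

Without rounding the number of seconds in a year, the estimate is roughly $3.17 \cdot 10^{32}$ years, of the same order as the rounded figure $\frac{1}{3} \cdot 10^{33}$.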

It should be noted that, if the prime factorization of two natural numbers $n$ and $m$ is
known, we can easily determine their greatest common divisor. For example, from

$$3801 = 3 \cdot 7 \cdot 181 \quad \text{and} \quad 1708 = 2^2 \cdot 7 \cdot 61$$

it easily follows that $\gcd(3801, 1708) = 7$, which confirms the result of Euclid’s Algorithm.
Given how difficult it is to factorize natural numbers, the observation is hardly useful from
a computational standpoint. Thus, it is a good idea to hold on to Euclid’s Algorithm, which
thanks to Lamé’s Theorem is able to produce the greatest common divisors with reasonable
efficiency, without having to conduct any factorization.
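
To make the comparison concrete, the following Python sketch computes the greatest common divisor from the two prime factorizations, taking each common prime with the smaller of its two exponents, and checks the result against the standard library function `math.gcd`. The helpers `factorize` and `gcd_from_factorizations` are hypothetical names, and the factorization is done by naive trial division.

```python
import math
from collections import Counter

def factorize(n: int) -> Counter:
    """Prime factorization of n >= 2 by trial division, as a Counter {prime: exponent}."""
    factors: Counter = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] += 1
            n //= p
        p += 1
    if n > 1:
        factors[n] += 1
    return factors

def gcd_from_factorizations(m: int, n: int) -> int:
    """Multiply the common primes, each raised to the smaller of its two exponents."""
    common = factorize(m) & factorize(n)   # Counter intersection keeps minimum counts
    return math.prod(p**e for p, e in common.items())

value = gcd_from_factorizations(3801, 1708)
```

The factorization route reaches the same answer, but only after the costly step of factorizing both numbers, which Euclid's Algorithm avoids altogether.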
17
One of the reasons why the study of factorization tests is an active research field is that the difficulty
in factorizing natural numbers is exploited by modern cryptography to build unbreakable codes (see Section
6.4).

But how many are there?


Given the importance of prime numbers, it comes naturally to ask oneself how many there
are. The next celebrated result of Euclid shows that these are infinitely many. After Theorem
17, it is the second remarkable gem of Greek mathematics we have the pleasure to meet in
these few pages.

Theorem 27 (Euclid) There are infinitely many prime numbers.

Proof The proof is carried out by contradiction. Suppose that there are only finitely many
prime numbers and denote them by $p_1 < p_2 < \cdots < p_n$. Define

$$q = p_1 p_2 \cdots p_n$$

and set $m = q + 1$. The natural number $m$ is larger than any prime number, hence it is a
composite number. By the Fundamental Theorem of Arithmetic, it is divisible by at least
one of the prime numbers $p_1$, $p_2$, ..., $p_n$. Let us denote this divisor by $p$. Both natural
numbers $m$ and $q$ are thus divisible by $p$. It follows that also their difference, that is the
natural number $1 = m - q$, is divisible by $p$, which is impossible since $p > 1$. Hence, the
assumption that there are finitely many prime numbers is false.
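
The construction in the proof can be tried on an actual finite list of primes. The Python sketch below forms $m = p_1 p_2 \cdots p_n + 1$ and finds its smallest prime factor, which can never belong to the list because $m$ leaves remainder 1 when divided by each $p_i$. The function names are assumptions made for this illustration.

```python
import math

def euclid_number(primes: list[int]) -> int:
    """The number m = p_1 * p_2 * ... * p_n + 1 used in the proof."""
    return math.prod(primes) + 1

def smallest_prime_factor(m: int) -> int:
    """Smallest divisor of m greater than 1 (it is necessarily prime), for m >= 2."""
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m  # no divisor up to sqrt(m): m itself is prime

primes = [2, 3, 5, 7, 11, 13]
m = euclid_number(primes)                     # 30031
remainders = [m % p for p in primes]          # every remainder equals 1
new_prime = smallest_prime_factor(m)          # 59, which is not in the list
```

For a real finite list the number $m$ may itself be prime (for instance $2 \cdot 3 + 1 = 7$) or composite (as here, $30031 = 59 \cdot 509$); in both cases it reveals a prime missing from the list.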

In conclusion, we have looked at some basic notions in number theory, the branch of
mathematics which deals with the properties of integers. It is one of the most fascinating
and complex fields of mathematics, and it bears incredibly deep results, which are often easy
to state, but very hard to prove. A classic example is Fermat’s (famous) Last Theorem,
whose statement is quite simple: if $n \geq 3$, there cannot exist three strictly positive integers
$x$, $y$, and $z$ such that $x^n + y^n = z^n$. Thanks to Pythagoras’ Theorem we know that for
$n = 2$ such triplets of integers do exist (for example, $3^2 + 4^2 = 5^2$); Fermat’s Last Theorem
states that $n = 2$ is indeed the only case in which this remarkable property holds. Stated
by Fermat, the theorem was first proven in 1994 by Andrew Wiles after more than three
centuries of unfruitful attempts.

1.4 Order structure of R


We now turn our attention to the set $\mathbb{R}$ of the real numbers, which is central for applications.
An important property of $\mathbb{R}$ is the possibility of ordering its elements through the inequality
$\geq$. The intuitive meaning of such inequality is clear: given two real numbers $a$ and $b$, we
have $a \geq b$ when $a$ is at least as great as $b$.
Consider the following properties of the inequality $\geq$:

(i) reflexivity: $a \geq a$;

(ii) antisymmetry: if $a \geq b$ and $b \geq a$, then $a = b$;

(iii) transitivity: if $a \geq b$ and $b \geq c$, then $a \geq c$;

(iv) completeness (or totality): for every pair $a, b \in \mathbb{R}$, we have $a \geq b$ or $b \geq a$ (or both);

(v) additive independence: if $a \geq b$, then $a + c \geq b + c$ for every $c \in \mathbb{R}$;

(vi) multiplicative independence: let $a \geq b$; then

$ac \geq bc$ if $c > 0$

$ac = bc = 0$ if $c = 0$

$ac \leq bc$ if $c < 0$

(vii) separation:18 given two sets of real numbers $A$ and $B$, if $a \geq b$ for every $a \in A$ and
$b \in B$, then there exists $c \in \mathbb{R}$ such that $a \geq c \geq b$ for every $a \in A$ and $b \in B$.

The first three properties have an obvious interpretation. Completeness guarantees that
any two real numbers can always be ordered. Additive independence ensures that the initial
ordering between two real numbers $a$ and $b$ is not altered by adding to both the same real
number $c$. Multiplicative independence considers, instead, the stability of such ordering with
respect to multiplication.
Finally, separation makes it possible to separate two sets ordered by $\geq$ (that is, such that each
element of one of the two sets is greater than or equal to each element of the other one)
through a real number $c$, called separating element.19 Separation is a fundamental property
of “continuity” of the real numbers and it is what mainly distinguishes them from the rational
numbers (for which such property does not hold, as remarked in the last footnote) and makes
them the natural environment for mathematical analysis.

The strict form $a > b$ of the “weak” inequality $\geq$ indicates that $a$ is strictly greater than
$b$. In terms of $\geq$, we have $a > b$ if and only if $b \not\geq a$, that is, the strict inequality can be
defined as the negation of the weak inequality (of opposite direction). The reader can verify
that transitivity and independence (both additive and multiplicative) hold also for the strict
inequality $>$, while the other properties of the inequality $\geq$ do not hold for $>$.

The order structure, characterized by properties (i)–(vii), is fundamental in $\mathbb{R}$. Before
starting its study, we introduce by means of $\geq$ and $>$ some fundamental subsets of $\mathbb{R}$:

(i) the closed bounded intervals $[a, b] = \{x \in \mathbb{R} : a \leq x \leq b\}$;

(ii) the open bounded intervals $(a, b) = \{x \in \mathbb{R} : a < x < b\}$;

(iii) the half-closed (or half-open) bounded intervals $(a, b] = \{x \in \mathbb{R} : a < x \leq b\}$ and
$[a, b) = \{x \in \mathbb{R} : a \leq x < b\}$.

Other important intervals are:


18
Sometimes the property of separation of real numbers is called axiom of completeness (or of continuity
or also of Dedekind ). In this textbook we do not adopt this terminology to avoid confusion with property
(iv) of completeness or totality.
19
The property of separation holds also for $\mathbb{N}$ and $\mathbb{Z}$, but not for $\mathbb{Q}$. For example, the sets
$A = \{q \in \mathbb{Q} : q < \sqrt{2}\}$ and $B = \{q \in \mathbb{Q} : q > \sqrt{2}\}$ do not have a rational separating element
(as the reader can verify in light of Theorem 17 and of what we will see in Section 1.4.3).

(iv) the unbounded intervals $[a, \infty) = \{x \in \mathbb{R} : x \geq a\}$ and $(a, \infty) = \{x \in \mathbb{R} : x > a\}$, and
their analogues $(-\infty, a]$ and $(-\infty, a)$.20 In particular, the positive half-line $[0, \infty)$ is
often denoted by $\mathbb{R}_+$, while $\mathbb{R}_{++}$ denotes $(0, \infty)$, that is, the positive half-line without
the origin.

The use of the adjectives open, closed, and unbounded will become clear in Chapter 5.
To ease notation, in what follows $(a, b)$ will denote both an open bounded interval and the
unbounded ones $(a, \infty)$, $(-\infty, b)$ and $(-\infty, \infty) = \mathbb{R}$. Analogously, $(a, b]$ and $[a, b)$ will denote
both the half-closed bounded intervals and the unbounded ones $(-\infty, b]$ and $[a, \infty)$.

1.4.1 Maxima and minima


Definition 28 Let $A \subseteq \mathbb{R}$ be a non-empty set. A number $h \in \mathbb{R}$ is called upper bound of $A$
if it is greater than or equal to each element of $A$, that is, if 21

$$h \geq x \quad \forall x \in A$$

while it is called lower bound of $A$ if it is smaller than or equal to each element of $A$, that
is, if
$$h \leq x \quad \forall x \in A$$

For example, if $A = [0, 1]$, the number 3 is an upper bound and the number $-1$ is a lower
bound since $-1 \leq x \leq 3$ for every $x \in [0, 1]$. In particular, the set of upper bounds of $A$ is
the interval $[1, \infty)$ and the set of the lower bounds is the interval $(-\infty, 0]$.
We will denote by $A^*$ the set of upper bounds of $A$ and by $A_*$ the set of lower bounds.
In the example just seen, $A^* = [1, \infty)$ and $A_* = (-\infty, 0]$.

A few simple remarks. Let $A$ be any set.

(i) Upper bounds and lower bounds do not necessarily belong to the set $A$: the upper
bound 3 and the lower bound $-1$, for the set $[0, 1]$, are an example of this.

(ii) Upper bounds and lower bounds might not exist. For example, for the set of even
numbers
$$\{0, 2, 4, 6, \ldots\} \tag{1.14}$$
there is no real number which is greater than all its elements: hence, this set does not
have upper bounds. Analogously, the set

$$\{0, -2, -4, -6, \ldots\} \tag{1.15}$$

has no lower bounds, while the set of integers $\mathbb{Z}$ is a simple example of a set without
upper and lower bounds.
20
When there is no danger of confusion, we will write simply $\infty$ instead of $+\infty$. The symbol $\infty$, introduced
in mathematics by John Wallis in the 17th century, resembles a curve called lemniscate and a kind of hat or
halo (a symbol of force) placed on the head of some tarot card figures: in any case, it is definitely not a flattened
8.
21
The universal quantifier $\forall$ reads “for every”. Therefore, “$\forall x \in A$” reads “for every element $x$ that belongs
to the set $A$”.

(iii) If $h$ is an upper bound, so is any $h' > h$; analogously, if $h$ is a lower bound, so is any $h'' < h$.
Therefore, if they exist, upper bounds and lower bounds are not unique.

Through upper bounds and lower bounds we can give a first classification of sets of the
real line.

Definition 29 A non-empty set $A \subseteq \mathbb{R}$ is called:

(i) bounded from above if it has an upper bound, that is, $A^* \neq \emptyset$;

(ii) bounded from below if it has a lower bound, that is, $A_* \neq \emptyset$;

(iii) bounded if it is bounded both from above and from below.

For example, the closed interval $[0, 1]$ is bounded, since it is bounded both from above
and from below, while the set (1.14) of even numbers is bounded from below, but not from
above (indeed, it has no upper bounds).22 Analogously, the set (1.15) is bounded from above,
but not from below.
Note that this classification of sets is not exhaustive: there exist sets that do not fall in
any of the types (i)–(iii) of the previous definition. For example, $\mathbb{Z}$ has neither an upper
bound nor a lower bound in $\mathbb{R}$, and therefore it is not of any of the types (i)–(iii). Such sets
are called unbounded.

We now introduce a fundamental class of upper and lower bounds.

Definition 30 Given a non-empty set $A \subseteq \mathbb{R}$, an element $\hat{x}$ of $A$ is called maximum of $A$
if it is the greatest element of $A$, that is, if

$$\hat{x} \geq x \quad \forall x \in A$$

and it is called minimum of $A$ if it is the smallest element of $A$, that is, if

$$\hat{x} \leq x \quad \forall x \in A$$

The key feature of this definition is the condition that the maximum and minimum belong
to the set $A$ at hand. It is immediate to see how maxima and minima are, respectively, upper
bounds and lower bounds. Indeed, they are nothing but the upper bounds and lower bounds
that belong to the set $A$. For such a reason, maxima and minima can be seen as the “best”
among the upper bounds and the lower bounds. Many economic applications are, indeed,
based on the search for maxima or minima of suitable sets of alternatives.

Example 31 The closed interval $[0, 1]$ has minimum 0 and maximum 1. ▲

Unfortunately, maxima and minima are fragile notions: sets often do not admit them.
22
By using Proposition 38, the reader can formally prove that, indeed, the set of even numbers is not
bounded from above.

Example 32 The half-closed interval $[0, 1)$ has minimum 0, but it has no maximum. Indeed,
suppose by contradiction that there exists a maximum $\hat{x} \in [0, 1)$, so that $\hat{x} \geq x$ for every
$x \in [0, 1)$. Set
$$\tilde{x} = \frac{1}{2} \hat{x} + \frac{1}{2} \cdot 1$$
Since $\hat{x} < 1$, we have $\hat{x} < \tilde{x}$. But it is obvious that $\tilde{x} \in [0, 1)$, which contradicts the fact
that $\hat{x}$ is the maximum of $[0, 1)$. ▲
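
The midpoint construction of the example can be tried on concrete numbers. The Python sketch below uses exact rational arithmetic; the function name `larger_element` is a hypothetical helper, and the starting point $9/10$ is an arbitrary choice.

```python
from fractions import Fraction

def larger_element(x_hat: Fraction) -> Fraction:
    """Given x_hat in [0, 1), return the midpoint between x_hat and 1."""
    if not 0 <= x_hat < 1:
        raise ValueError("x_hat must belong to [0, 1)")
    return Fraction(1, 2) * x_hat + Fraction(1, 2)

# Starting from any candidate maximum we always find a larger element of [0, 1).
candidate = Fraction(9, 10)
improved = larger_element(candidate)      # 19/20

# Repeating the step never leaves the interval and never stops increasing.
x = candidate
for _ in range(50):
    nxt = larger_element(x)
    assert x < nxt < 1
    x = nxt
```

No matter how close to 1 the candidate is, the midpoint is strictly larger and still smaller than 1, which is the contradiction used in the example.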

By reasoning in a similar way, we see that:

(i) the half-closed interval $(0, 1]$ has maximum 1, but it has no minimum;

(ii) the open interval $(0, 1)$ has neither minimum, nor maximum.

When they exist, maxima and minima are unique:

Proposition 33 A set $A \subseteq \mathbb{R}$ has at most one maximum and one minimum.

Proof Let $\hat{x}_1, \hat{x}_2 \in A$ be two maxima of $A$. We show that $\hat{x}_1 = \hat{x}_2$. Since $\hat{x}_1$ is a maximum,
we have $\hat{x}_1 \geq x$ for every $x \in A$. In particular, since $\hat{x}_2 \in A$, we have $\hat{x}_1 \geq \hat{x}_2$. Analogously,
$\hat{x}_2 \geq \hat{x}_1$ because also $\hat{x}_2$ is a maximum. Therefore, by antisymmetry, $\hat{x}_1 = \hat{x}_2$. In a similar way, we can prove
the uniqueness of the minimum.

The maximum of a set $A$ is denoted by $\max A$, and its minimum by $\min A$. For example,
for $A = [0, 1]$ we have $\max A = 1$ and $\min A = 0$.

1.4.2 Supremum and infimum


Since maxima and minima are key for applications (and not only there), their fragility is a
substantial problem. To mitigate this we look for a “surrogate”, a conceptually similar, but
less fragile, notion which is available also when maxima or minima are absent.
Let us consider first maxima.23 We begin by noting that the maximum, when it exists,
is the smallest (least) upper bound, that is,

$$\max A = \min A^* \tag{1.16}$$

Let $\hat{x} \in A$ be the maximum of $A$. If $h$ is an upper bound of $A$, we have $h \geq \hat{x}$, since $\hat{x} \in A$.
On the other hand, $\hat{x}$ is also an upper bound, and we thus obtain (1.16).

Example 34 The set of upper bounds of $[0, 1]$ is the interval $[1, \infty)$. In this example, the
equality (1.16) takes the form $\max [0, 1] = \min [1, \infty)$. ▲

Thus, when it exists, the maximum is the smallest upper bound. But the smallest upper
bound, that is, $\min A^*$, might exist also when the maximum does not exist. For example,
consider $A = [0, 1)$: the maximum does not exist, but the smallest upper bound exists and
it is 1, i.e., $\min A^* = 1$.
23
As already mentioned, in economics maxima play a fundamental role.

All of this suggests that the smallest upper bound is the surrogate for the maximum that we are looking for. Indeed, in the example just seen, the point 1 is, in the absence of a maximum, its closest approximation.
Reasoning in a similar way, the greatest lower bound, i.e., $\max A_*$ (where $A_*$ is the set of the lower bounds of $A$), is the natural candidate to be the surrogate for the minimum when the latter does not exist. Motivated by what we have just seen, we give the following definition.

Definition 35 Given a non-empty set $A \subseteq \mathbb{R}$, one calls supremum of $A$ the least upper bound of $A$, that is, $\min A^*$, and infimum the greatest lower bound of $A$, that is, $\max A_*$.

Thanks to Proposition 33, both the supremum and the infimum of $A$ are unique, when they exist. We denote them by $\sup A$ and $\inf A$. For example, for $A = (0,1)$ we have $\inf A = 0$ and $\sup A = 1$.
As already remarked, when $\inf A \in A$, it is the minimum of $A$, and when $\sup A \in A$, it is the maximum of $A$.
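The claim that 1 is the least upper bound of $[0,1)$ can be illustrated with a minimal Python sketch. The function name `element_above` is hypothetical (it is not part of the text): given any candidate bound $h < 1$, it returns a point of $[0,1)$ that exceeds $h$, so that no number smaller than 1 can be an upper bound.

```python
from fractions import Fraction

def element_above(h):
    """Return a point of A = [0, 1) strictly greater than h, for any h < 1."""
    h = Fraction(h)
    if h >= 1:
        raise ValueError("every h >= 1 is an upper bound of [0, 1)")
    # The midpoint between max(h, 0) and 1 lies in [0, 1) and exceeds h.
    return (max(h, Fraction(0)) + 1) / 2

for h in [Fraction(-3), Fraction(1, 2), Fraction(999, 1000)]:
    w = element_above(h)
    print(h, "is not an upper bound of [0, 1): it is exceeded by", w)
```

Since 1 itself is an upper bound and does not belong to the set, the sketch is consistent with $\sup [0,1) = 1$ together with the absence of a maximum.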

Although suprema and infima may exist when maxima and minima do not, they do not always exist.

Example 36 Consider the set $A$ of the even numbers in (1.14). In this case the set $A^*$ of its upper bounds is empty, $A^* = \emptyset$, and so $A$ has no supremum. More generally, if $A$ is not bounded from above, we have $A^* = \emptyset$ and the supremum does not exist. In a similar way, the sets that are not bounded from below have no infima.[24]

To be a useful surrogate, suprema and infima must exist for a large class of sets; otherwise, if also their existence were problematic, they would be of little help as surrogates.[25] Fortunately, the next important result shows that suprema and infima do indeed exist for a large class of sets (with sets of the kind seen in the last example being the only troublesome ones).

Theorem 37 (Least Upper Bound Principle) Each non-empty set $A \subseteq \mathbb{R}$ has a supremum if it is bounded from above, and it has an infimum if it is bounded from below.

Proof We limit ourselves to proving the first statement. To say that $A$ is bounded from above means that it admits an upper bound, i.e., that its set of upper bounds is non-empty, $A^* \neq \emptyset$. Since $a \leq h$ for every $a \in A$ and every $h \in A^*$, by the separation property there exists a separating element $c \in \mathbb{R}$ such that $a \leq c \leq h$ for every $a \in A$ and every $h \in A^*$. Since $c \geq a$ for every $a \in A$, we have that $c$ is an upper bound of $A$, so that $c \in A^*$. But, since $c \leq h$ for every $h \in A^*$, it follows that $c = \min A^*$, that is, $c = \sup A$. This proves the existence of the supremum of $A$.

Except for the sets that are not bounded from above, all the other sets in $\mathbb{R}$ admit a supremum. Analogously, except for the sets that are not bounded from below, all the other sets in $\mathbb{R}$ have an infimum. Suprema and infima are thus excellent surrogates that exist, and so help us, for a large class of subsets of $\mathbb{R}$.
Note that a simple, but useful, consequence of the previous theorem is that bounded sets have both supremum and infimum.

[24] If $A$ does not admit a supremum, we write $\sup A = +\infty$ and, when it does not admit an infimum, $\inf A = -\infty$. Moreover, by convention, we set $\sup \emptyset = -\infty$ and $\inf \emptyset = +\infty$. This is motivated by the fact that each real number must be considered simultaneously an upper bound and a lower bound of $\emptyset$: then it is natural to conclude that $\sup \emptyset = \inf \emptyset^* = \inf \mathbb{R} = -\infty$ and $\inf \emptyset = \sup \emptyset_* = \sup \mathbb{R} = +\infty$.
[25] The utility of a surrogate depends on how well it approximates the original, as well as on its availability.

1.4.3 Density
The order structure is also useful to clarify the relations among the sets $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, and $\mathbb{R}$. First of all, we make rigorous a natural intuition: however large a real number is, there always exists a greater natural number. This is the so-called Archimedean property of real numbers.

Proposition 38 For each real number $a \in \mathbb{R}$, there exists a natural number $n \in \mathbb{N}$ such that $n \geq a$.

Proof By contradiction, assume that there exists $a \in \mathbb{R}$ such that $a \geq n$ for all $n \in \mathbb{N}$. Then $\mathbb{N}$ is bounded from above and so, by the Least Upper Bound Principle, $\sup \mathbb{N}$ exists and belongs to $\mathbb{R}$. Recall that, by the definition of sup,

$\sup \mathbb{N} \geq n \quad \forall n \in \mathbb{N}$ (1.17)

At the same time, again by the definition of sup, we have $\sup \mathbb{N} - 1 < n$ for some $n \in \mathbb{N}$ (otherwise, $\sup \mathbb{N} - 1$ would be an upper bound of $\mathbb{N}$, thus violating the fact that $\sup \mathbb{N}$ is the least of these upper bounds). We can conclude that $\sup \mathbb{N} < n + 1 \in \mathbb{N}$, which contradicts (1.17).
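For a concrete number, the natural number promised by the Archimedean property can be exhibited directly. The following is only an illustrative sketch: the name `natural_at_least` is hypothetical, and 0 is counted among the natural numbers here (an assumption about the convention in use).

```python
import math
from fractions import Fraction

def natural_at_least(a):
    """Return a natural number n (0, 1, 2, ...) such that n >= a."""
    # math.ceil gives the smallest integer >= a; negative values are replaced by 0.
    return max(0, math.ceil(a))

for a in [3.2, -7.5, Fraction(22, 7), 10**12 + 0.5]:
    print(a, "<=", natural_at_least(a))
```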

The next property shows a fundamental difference between the structures of $\mathbb{N}$ and $\mathbb{Z}$, on the one side, and of $\mathbb{Q}$ and $\mathbb{R}$, on the other side. If we take an integer, we can talk in a very natural way of predecessor and successor. In particular, if $m \in \mathbb{Z}$, its predecessor is the integer $m - 1$, while its successor is the integer $m + 1$ (for example, the predecessor of 317 is 316 and its successor is 318). In other words, $\mathbb{Z}$ has a discrete “rhythm”.
In contrast, we cannot talk of predecessors and successors in $\mathbb{Q}$ or in $\mathbb{R}$. Consider first $\mathbb{Q}$. Given a rational number $q = m/n$, let $q' = m'/n'$ be any rational such that $q' > q$. Set

$q'' = \frac{1}{2} q' + \frac{1}{2} q$

The number $q''$ is rational, since

$q'' = \frac{1}{2} \frac{m'}{n'} + \frac{1}{2} \frac{m}{n} = \frac{1}{2} \frac{m'n + mn'}{nn'}$

and one has

$q < q'' < q'$ (1.18)

Therefore, there is no smallest rational number greater than $q$. Analogously, it is easy to see that there is no greatest rational number smaller than $q$. Rational numbers, hence, do not admit predecessors and successors.
In a similar way we show that, given any two real numbers $a < b$, there exists a real number $c$ such that $a < c < b$. Indeed,

$a < \frac{1}{2} a + \frac{1}{2} b < b$

Real numbers, therefore, also do not admit predecessors and successors. The rhythm of both rational and real numbers is “tight”, without discrete interruptions (which are intervals). This property of $\mathbb{Q}$ and $\mathbb{R}$ is called density. Unlike $\mathbb{N}$ and $\mathbb{Z}$, which are discrete sets, $\mathbb{Q}$ and $\mathbb{R}$ are dense sets.[26]
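The midpoint argument can be replayed with exact rational arithmetic. In this minimal sketch (the helper names `midpoint` and `squeeze_towards` are hypothetical), every candidate "successor" of $q$ is replaced by a rational strictly closer to $q$, so no candidate can be the smallest rational greater than $q$.

```python
from fractions import Fraction

def midpoint(q, q_prime):
    """The rational q'' = q'/2 + q/2, which lies strictly between q and q'."""
    return Fraction(q_prime) / 2 + Fraction(q) / 2

def squeeze_towards(q, q_prime, steps):
    """Repeatedly replace the candidate q' by the midpoint of q and q'."""
    candidates = []
    for _ in range(steps):
        q_prime = midpoint(q, q_prime)
        candidates.append(q_prime)
    return candidates

q = Fraction(1, 3)
for candidate in squeeze_towards(q, Fraction(1, 2), 5):
    print(q, "<", candidate)
```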

We conclude with an important density relationship between $\mathbb{Q}$ and $\mathbb{R}$. We already observed how most real numbers are not rational. Nevertheless, rational numbers are a “dense” (and therefore very significant) subset of the real numbers because, as we show next, in the inequality $a < c < b$, with $a, b \in \mathbb{R}$, we can always choose $c$ to be a rational number: between any two real numbers we can always “insert” a rational number.

Proposition 39 Given any two real numbers $a < b$, there exists a rational number $q \in \mathbb{Q}$ such that $a < q < b$.

This property can be stated by saying that Q is dense in R.


In the proof of the result we use the notion of integer part $[a]$ of a real number $a \in \mathbb{R}$, which is the greatest integer $n \in \mathbb{Z}$ such that $n \leq a$. For example, $[\pi] = 3$, $[5/2] = 2$, $[\sqrt{2}] = 1$, $[-\pi] = -4$, and so on. The reader can verify that

$[a + 1] = [a] + 1$ (1.19)

since, for each $n \in \mathbb{Z}$, we have $n \leq a$ if and only if $n + 1 \leq a + 1$. Moreover, $[a] < a$ when $a \notin \mathbb{Z}$.

Proof Let $a, b \in \mathbb{R}$, with $a < b$. For simplicity, we distinguish three cases.

Case 1: Let $a + 1 = b$. If $a \in \mathbb{Q}$ the result follows from (1.18). Let $a \notin \mathbb{Q}$, and therefore $a + 1 \notin \mathbb{Q}$. We have

$[a] \leq a < [a] + 1 = [a + 1] < a + 1$ (1.20)

and therefore $q = [a] + 1$ is the rational number we were looking for.

Case 2: Let $b - a > 1$, i.e., $a < a + 1 < b$. From Case 1 it follows that there exists $q \in \mathbb{Q}$ such that $a < q < a + 1 < b$.

Case 3: Let $b - a < 1$. By the Archimedean property of real numbers, there exists $0 \neq n \in \mathbb{N}$ such that

$n \geq \frac{1}{b - a}$

so that $nb - na = n(b - a) \geq 1$. Then, by what we have just seen in Cases 1 and 2, there exists $q \in \mathbb{Q}$ such that $na < q < nb$. Therefore,

$a < \frac{q}{n} < b$

which completes the proof because $q/n \in \mathbb{Q}$.
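The proof is essentially constructive, and its recipe can be sketched in a few lines of Python. The function name `rational_between` is hypothetical, and the sketch takes a small shortcut with respect to the three cases: it always picks $n$ with $n(b-a) > 1$ and then uses the integer part of $na$. Floats are themselves rationals, so inputs such as `math.sqrt(2)` only stand in for genuinely irrational numbers.

```python
import math
from fractions import Fraction

def rational_between(a, b):
    """Return a rational q with a < q < b, for inputs a < b."""
    a, b = Fraction(a), Fraction(b)      # floats are converted exactly
    if not a < b:
        raise ValueError("need a < b")
    n = math.floor(1 / (b - a)) + 1      # Archimedean step: n * (b - a) > 1
    m = math.floor(n * a) + 1            # integer part of n*a, plus one
    # By construction n*a < m <= n*a + 1 < n*b, hence a < m/n < b.
    return Fraction(m, n)

print(rational_between(Fraction(1, 3), Fraction(1, 2)))
print(rational_between(math.sqrt(2), 1.5))
```

Using `Fraction` keeps all comparisons exact, which avoids rounding issues when $a$ and $b$ are very close.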
[26] In his famous argument against plurality, Zeno of Elea remarks that a “plurality” is infinite because “... there will always be other things between the things that are, and yet others between those others.” (trans. Raven). Zeno thus identifies density as the characterizing property of an infinite collection. With (twenty-five centuries of) hindsight, we can say that he is neglecting the integers. Yet, it is stunning how he was able to identify a key property of infinite sets.

1.5 Powers and logarithms


1.5.1 Powers
Given $n \in \mathbb{N}$, we have already recalled the meaning of $q^n$ with $q \in \mathbb{Q}$ and of $q^{1/n}$ with $0 < q \in \mathbb{Q}$. In a similar way we define $a^n$ with $a \in \mathbb{R}$ and $a^{1/n}$ with $0 < a \in \mathbb{R}$. More generally, we set

$a^{-n} = \frac{1}{a^n}$ and $a^{m/n} = (a^m)^{1/n}$

for $m, n \in \mathbb{N}$ and $0 < a \in \mathbb{R}$. We have, therefore, defined the power $a^r$ with real positive base and rational exponent. Sometimes we write $\sqrt[n]{a^m}$ instead of $a^{m/n}$.
Given $0 < a \in \mathbb{R}$, we now want to extend this notion to the case $a^x$ with $x \in \mathbb{R}$, i.e., with real exponent. Before doing this, we make two important observations.

(i) We have defined $a^r$ only for $a > 0$, in order to avoid dangerous and embarrassing misunderstandings. Think, for example, of $(-5)^{3/2}$. It could be rewritten as $\sqrt[2]{(-5)^3} = \sqrt[2]{-125}$ or as $(\sqrt[2]{-5})^3$, which do not exist (among the real numbers). But it could also be written as $(-5)^{3/2} = (-5)^{6/4}$ which, in turn, can be expressed as either $\sqrt[4]{(-5)^6} = \sqrt[4]{15{,}625}$ or $(\sqrt[4]{-5})^6$. The former exists and is approximately equal to 11.180339, but the latter does not exist.

(ii) Let us consider the root $\sqrt{a} = a^{1/2}$. It is well known that each positive number has two algebraic roots, for example $\sqrt{9} = \pm 3$. The unique positive value of the root is called, instead, arithmetical root. For example, $3$ and $-3$ are the two algebraic roots of 9, while 3 is its unique arithmetical root. In what follows the (even order) roots will always be in the arithmetical sense (and therefore with a unique value). It is, by the way, the standard convention: for example, in the classical solution formula

$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$

of the quadratic equation $ax^2 + bx + c = 0$, the root is in the arithmetical sense (otherwise, we should not write $\pm$ because the root would be automatically double).
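Both observations can be seen at work in Python's standard `math` module, which follows the same conventions. This is only an illustrative sketch; the variable names are arbitrary.

```python
import math

# (ii) math.sqrt returns the arithmetical root: the unique non-negative value.
root = math.sqrt(9)                       # 3.0, never -3.0

# (i) A negative base with a fractional exponent has no real value:
# math.pow refuses to compute (-5)^(3/2).
try:
    math.pow(-5, 3 / 2)
    negative_base_accepted = True
except ValueError:                        # "math domain error"
    negative_base_accepted = False

# Rewriting 3/2 as 6/4 and raising to the 6th power first gives a real number.
fourth_root = ((-5) ** 6) ** (1 / 4)      # fourth root of 15625

print(root, negative_base_accepted, fourth_root)
```

The two different answers obtained for the "same" expression $(-5)^{3/2}$ are exactly the ambiguity that the restriction $a > 0$ is meant to avoid.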

We now extend the notion of power to the case $a^x$, with $0 < a \in \mathbb{R}$ and $x \in \mathbb{R}$. Since, unfortunately, the details of this extension are tedious, we limit ourselves to saying that, if $a > 1$, the power $a^x$ is the supremum of the set of all the values $a^q$ when the exponent $q$ varies among the rational numbers such that $q \leq x$. Formally,

$a^x = \sup \{a^q : q \leq x \text{ with } q \in \mathbb{Q}\}$ (1.21)

In a similar way we define $a^x$ for $0 < a < 1$. We have the following properties that, by (1.21), follow from the analogous properties that hold when the exponent is rational.

Lemma 40 Let $a, b > 0$ and $x, y \in \mathbb{R}$. We have $a^x > 0$ for every $x \in \mathbb{R}$. Moreover:

(i) $a^x a^y = a^{x+y}$ and $a^x / a^y = a^{x-y}$;

(ii) $(a^x)^y = a^{xy}$;

(iii) $a^x b^x = (ab)^x$ and $a^x / b^x = (a/b)^x$;

(iv) if $x > y$, then

$a^x > a^y$ if $a > 1$

$a^x < a^y$ if $a < 1$

$a^x = a^y = 1$ if $a = 1$
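The definition of $a^x$ as a supremum over rational exponents can be explored numerically. In the sketch below, the helper names are hypothetical and the rational exponents $q \leq x$ are simply the decimal truncations of $x$ (an assumption made for convenience; any rationals below $x$ would do). For $a = 2$ and $x = \sqrt{2}$ the values $a^q$ increase towards the library value of $2^{\sqrt{2}}$; a few properties of the lemma are spot-checked in floating point as well.

```python
import math
from fractions import Fraction

def rational_exponents_below(x, digits):
    """Rationals q <= x obtained by truncating x to 1, 2, ..., digits decimals."""
    exact = Fraction(x)  # a float is itself a rational number, converted exactly
    return [Fraction(math.floor(exact * 10**k), 10**k) for k in range(1, digits + 1)]

def powers_from_below(a, x, digits=8):
    """Values a**q for the rationals q above; for a > 1 they increase towards a**x."""
    return [a ** float(q) for q in rational_exponents_below(x, digits)]

x, y = math.sqrt(2), 0.7
approximations = powers_from_below(2.0, x)
for value in approximations:
    print(value)
print("library value:", 2.0 ** x)

# Spot checks of properties (i), (ii) and (iv) for the base a = 2
product_rule = math.isclose(2.0**x * 2.0**y, 2.0 ** (x + y))
power_rule = math.isclose((2.0**x) ** y, 2.0 ** (x * y))
monotone = 2.0**x > 2.0**y          # x > y and a > 1
print(product_rule, power_rule, monotone)
```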

Among the bases $a > 0$, the most important is the number $e$ (which will be introduced in Chapter 8). As we will see, the power $e^x$ has truly remarkable properties.

1.5.2 Logarithms
The operations of addition and multiplication are commutative: $a + b = b + a$ and $ab = ba$. Therefore, they have only one inverse operation, respectively the subtraction and the division:

(i) if $a + b = c$, then $b = c - a$ and $a = c - b$;

(ii) if $ab = c$, then $b = c/a$ and $a = c/b$, with $a, b \neq 0$.

The power operation $a^b$, with $a > 0$, is not commutative: $a^b$ might well be different from $b^a$. Therefore, it has two distinct inverse operations.
Let $a^b = c$. The first inverse operation (given $c$ and $b$, find $a$) is called root with index $b$ of $c$:

$a = \sqrt[b]{c} = c^{1/b}$

The second one (given $c$ and $a$, find $b$) is called logarithm with base $a$ of $c$:

$b = \log_a c$

Note that, together with $a > 0$ and $c > 0$, one must also have $a \neq 1$, because $1^b = c$ is impossible except when $c = 1$.

The logarithm is a fundamental notion, ubiquitous in mathematics and in all its applications. As we have just seen, it is a simple notion: the number $b = \log_a c$ is nothing but the exponent that must be given to $a$ in order to get $c$, that is,

$a^{\log_a c} = c$

The properties of the logarithms derive easily from the properties of the powers seen in Lemma 40.

Lemma 41 Let $a, c, d > 0$, with $a \neq 1$. We have:

(i) $\log_{1/a} c = -\log_a c$;

(ii) $\log_{a^k} c = \frac{1}{k} \log_a c$ for every $0 \neq k \in \mathbb{R}$;

(iii) $\log_a (cd) = \log_a c + \log_a d$;

(iv) $\log_a (c/d) = \log_a c - \log_a d$;

(v) $\log_a c^k = k \log_a c$ for every $k \in \mathbb{R}$;

(vi) $\log_a c = \log_b c / \log_b a$ (change of base).


Proof (i) If $(1/a)^b = c$, then $a^{-b} = c$. (ii) If $(a^k)^b = c$, then $a^{kb} = c$, and therefore the exponent that must be given to $a^k$ in order to get $c$ is $1/k$ of the exponent that must be given to $a$. (iii) Let $a^x = c$, $a^y = d$, and $a^z = cd$: given that $cd = a^x a^y = a^{x+y}$, the statement holds. (iv) The proof is similar to the previous one. (v) Let $a^x = c$ and $a^y = c^k$: given that $c^k = (a^x)^k = a^{kx}$, the statement follows.[27] (vi) Let $a^x = c$, $b^y = c$, and $b^z = a$: we have $a^x = (b^z)^x = b^{zx} = c = b^y$ and therefore $zx = y$, that is, $x = y/z$.

In view of the change of base property (vi), it is possible to take as base of the logarithms always the same number, say 10, because

$\log_a c = \frac{\log_{10} c}{\log_{10} a}$

As for the powers $a^x$, also for the logarithms the most common base is the number $e$. In such a case we simply write $\log x$ instead of $\log_e x$. Because of its importance, $\log x$ is called the natural logarithm of $x$, which leads to the notation $\ln x$ sometimes used in place of $\log x$.

The next result shows the close connections between logarithms and powers, which can
be actually seen as inverse notions.

Proposition 42 Given $a > 0$, $a \neq 1$, we have

$\log_a a^x = x \quad \forall x \in \mathbb{R}$

and

$a^{\log_a x} = x \quad \forall x > 0$

We leave the simple proof to the reader. To check their understanding of the material of this section, the reader can also verify that $b^{\log_a c} = c^{\log_a b}$ for all strictly positive numbers $a \neq 1$, $b$, and $c$.
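The properties of logarithms stated in this section can be spot-checked numerically. The sketch below is not a proof: it evaluates each identity at one arbitrary choice of values, in floating point, with a hypothetical helper `log_base` built on the natural logarithm through the change of base formula.

```python
import math

def log_base(a, c):
    """Logarithm with base a of c, for a > 0, a != 1 and c > 0."""
    if a <= 0 or a == 1 or c <= 0:
        raise ValueError("need a > 0, a != 1 and c > 0")
    return math.log(c) / math.log(a)   # change of base through natural logs

a, b, c, d, k, x = 2.0, 10.0, 8.0, 5.0, 3.0, 1.7
checks = {
    "(i)": math.isclose(log_base(1 / a, c), -log_base(a, c)),
    "(ii)": math.isclose(log_base(a**k, c), log_base(a, c) / k),
    "(iii)": math.isclose(log_base(a, c * d), log_base(a, c) + log_base(a, d)),
    "(iv)": math.isclose(log_base(a, c / d), log_base(a, c) - log_base(a, d)),
    "(v)": math.isclose(log_base(a, c**k), k * log_base(a, c)),
    "(vi)": math.isclose(log_base(a, c), log_base(b, c) / log_base(b, a)),
    "log of power": math.isclose(log_base(a, a**x), x),
    "power of log": math.isclose(a ** log_base(a, c), c),
    "exercise": math.isclose(b ** log_base(a, c), c ** log_base(a, b)),
}
for name, ok in checks.items():
    print(name, ok)
```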

1.6 Numbers, fingers and circuits


The most natural way to write numbers makes use of the “decimal notation”. Ten graphic symbols have been chosen,

0, 1, 2, 3, 4, 5, 6, 7, 8, 9 (1.22)

called digits. Using positional notation, any natural number can be written by means of digits which represent, from right to left respectively, units, tens, hundreds, thousands, etc.
[27] For example, with $a \neq 1$, $\log_a x^2 = 2 \log_a x$ for $x > 0$. Note that $\log_a x^2$ exists for each $x \neq 0$, while $2 \log_a x$ exists only for $x > 0$.

For example, in this manner, 4357 means 4 thousands, 3 hundreds, 5 tens and 7 units. The natural numbers are thus expressed by powers of 10, each of which causes a digit to be added: writing 4357 is the abbreviation of

$4 \cdot 10^3 + 3 \cdot 10^2 + 5 \cdot 10^1 + 7 \cdot 10^0$

In order to employ positional notation, it is fundamental to adopt the 0 to signal an empty slot: for example, when writing 4057 the zero signals the absence of the hundreds, that is,

$4 \cdot 10^3 + 0 \cdot 10^2 + 5 \cdot 10^1 + 7 \cdot 10^0$

Non-integers are represented in a completely analogous fashion, being articulated by powers of $1/10 = 10^{-1}$: for example, 0.501625 is the abbreviation of

$5 \cdot 10^{-1} + 0 \cdot 10^{-2} + 1 \cdot 10^{-3} + 6 \cdot 10^{-4} + 2 \cdot 10^{-5} + 5 \cdot 10^{-6}$

The choice of decimal notation is due to the mere fact that we have ten fingers, but it is obviously not the only possible one. Some Native American tribes used to count on their hands using the eight spaces between their fingers rather than the ten fingers themselves. They would have chosen only 8 digits, which could easily have been

0, 1, 2, 3, 4, 5, 6, 7

and they would have articulated the integers along the powers of 8, that is 8, 64, 512, 4096, ... They would have written our decimal number 4357 as

$1 \cdot 4096 + 0 \cdot 512 + 4 \cdot 64 + 0 \cdot 8 + 5 = 1 \cdot 8^4 + 0 \cdot 8^3 + 4 \cdot 8^2 + 0 \cdot 8^1 + 5 \cdot 8^0 = 10405$

and the decimal 0.515625 as

$4 \cdot 0.125 + 1 \cdot 0.015625 = 4 \cdot 8^{-1} + 1 \cdot 8^{-2} = 0.41$
In general, given a base $b$ and a set of digits

$C_b = \{c_0, c_1, \ldots, c_{b-1}\}$

used to represent the integers between $0$ and $b - 1$, every natural number $n$ is written in the base $b$ as

$d_k d_{k-1} \cdots d_1 d_0$

where $k$ is an appropriate natural number and

$n = d_k b^k + d_{k-1} b^{k-1} + \cdots + d_1 b + d_0$

with $d_i \in C_b$ for each $i = 0, \ldots, k$.
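The positional representation translates directly into a short conversion routine. The sketch below is illustrative (the function names are hypothetical): digits are returned as integers between $0$ and $b - 1$ rather than as graphic symbols, repeated division by $b$ produces the digits $d_0, d_1, \ldots$, and Horner's scheme rebuilds $n$ from them. A third helper handles the digits after the point of a number in $[0, 1)$.

```python
from fractions import Fraction

def to_base(n, b):
    """Digits d_k, ..., d_1, d_0 of the natural number n in base b (as integers)."""
    if n < 0 or b < 2:
        raise ValueError("need n >= 0 and b >= 2")
    digits = []
    while True:
        n, r = divmod(n, b)      # r is the next digit, from right to left
        digits.append(r)
        if n == 0:
            break
    return digits[::-1]

def from_base(digits, b):
    """Value d_k b^k + ... + d_1 b + d_0, computed with Horner's scheme."""
    value = 0
    for d in digits:
        value = value * b + d
    return value

def fraction_to_base(x, b, max_digits):
    """First digits after the point of 0 <= x < 1 in base b (stops if x is exhausted)."""
    x = Fraction(x)
    digits = []
    for _ in range(max_digits):
        if x == 0:
            break
        x *= b
        d = int(x)               # the integer part is the next digit
        digits.append(d)
        x -= d
    return digits

print(to_base(4357, 8))                          # octal digits of 4357
print(from_base(to_base(4357, 8), 8))            # back to 4357
print(fraction_to_base(Fraction(33, 64), 8, 10)) # octal digits of 33/64
```

Exact fractions are used for the part after the point because most finite decimal expansions do not terminate in other bases, and floating point rounding would obscure that.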


For example, let us consider the duodecimal base, with digits

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, |, •

We have used the symbols | and • for the two additional digits we need compared to the decimal notation. The duodecimal number 9|0•2 can be converted to decimal notation as

9|0•2 = $9 \cdot 12^4 + \text{|} \cdot 12^3 + 0 \cdot 12^2 + \text{•} \cdot 12 + 2$
      = $9 \cdot 12^4 + 10 \cdot 12^3 + 0 \cdot 12^2 + 11 \cdot 12 + 2$
      = $204038$

using the conversion table

Duod.  0  1  2  3  4  5  6  7  8  9   |   •
Dec.   0  1  2  3  4  5  6  7  8  9  10  11

One can note that the duodecimal notation 9|0•2 requires fewer digits than the decimal 204038, that is, five instead of six. On the other hand, the duodecimal notation requires 12 symbols to be used as digits, instead of 10. It is a typical trade-off one faces in choosing the base in which to represent numbers: larger bases make it possible to represent numbers with fewer digits, but require a larger set of digits. The solution to the trade-off, and the resulting choice of base, depends on the characteristics of the application of interest.
For example, in electronic engineering, it is important to have a set of digits which is as simple as possible, with only two elements, as computers and electrical appliances naturally have only two digits at their disposal (open or closed circuit, positive or negative polarity). For this reason, the base 2 is incredibly common, as it is the most efficient base in terms of the complexity of the digit set $C_2$, which only consists of the digits 0 and 1 (which are called bits, from binary digits).
In binary notation, the integers can be written as

Dec.  0  1  2   3   4    5    6    7    8     9     10    11    16
Bin.  0  1  10  11  100  101  110  111  1000  1001  1010  1011  10000

where, for example, in binary notation

$1011 = 1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0$

and in decimal notation

$11 = 1 \cdot 10^1 + 1 \cdot 10^0$
The considerable reduction in the digit set $C_2$ made possible by the base 2 comes at a cost: the large number of bits required to represent numbers in binary notation. For example, while 16 consists of two decimal digits, the corresponding binary 10000 requires five bits; while 201 requires three digits, the corresponding binary 11001001 requires eight bits; while 2171 requires four digits, the corresponding binary 100001111011 requires twelve bits, and so on. Very quickly, binary notation requires a number of bits that only a computer is able to process.
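The growth in the number of bits can be checked with Python's built-in binary formatting. This is a small sketch; the helper name `bits_needed` is hypothetical, while `format(n, "b")` and `int.bit_length` are standard.

```python
def bits_needed(n):
    """Number of binary digits of the natural number n (at least one)."""
    return max(1, n.bit_length())

table = {n: format(n, "b") for n in (16, 201, 2171)}
for n, binary in table.items():
    print(n, "->", binary, f"({len(str(n))} decimal digits, {bits_needed(n)} bits)")

# A binary sum: 1011 + 1001, that is, 11 + 9 = 20
total = format(0b1011 + 0b1001, "b")
print("1011 + 1001 =", total)
```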

From a purely mathematical perspective, the choice of base is merely conventional, and going from one base to another is easy (although tedious).[28] Bases 2 and 10 are nowadays the most important ones, but many others have been used in the past, such as 20 (the number of fingers and toes, a trace of which is still found in the French language, where “quatre-vingts”, or “four-twenties”, stands for eighty and “four-twenty-ten” stands for ninety), as well as 16 (the number of spaces between fingers and toes) and 60 (which is convenient because it is divisible by 2, 3, 4, 5, 6, 10, 12, 15, 20 and 30; a significant trace of this system remains in how we divide hours and minutes and how we measure angles).

[28] Operations on numbers written in a non-decimal notation are not particularly difficult either. For example, 11 + 9 = 20 can be calculated in a binary way as

     1011 +
     1001 =
    10100

It is sufficient to remember that the “carrying” must be done at 2 and not at 10.

The positional notation has been used to perform manual calculations since the dawn of time (just think about computations carried out with the abacus), but it is a relatively recent conquest in terms of writing, made possible by the fundamental innovation of the zero, and has been exceptionally important in the development of mathematics and its countless applications: commercial, scientific, and technological. Born in India (apparently around the 5th century AD), the positional notation was developed during the early Middle Ages in the Arab world (especially thanks to the works of Al-Khwarizmi), from which the name “Arabic numerals” for the digits (1.22) derives, and arrived in the Western world thanks to Italian merchants between the 11th and 12th centuries. In particular, the son of one of those merchants, Leonardo da Pisa (also known as Fibonacci), was the most important medieval mathematician: he authored a famous treatise in 1202, the Liber Abaci, the most acclaimed among the first essays in Europe regarding positional notation. Until then non-positional Roman numerals were used

I, II, III, IV, V, ..., X, ..., L, ..., C, ..., M, ...

which made even trivial operations overly complex (try to sum up CXL and MCL, and then 140 and 1150).
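To appreciate the difference, one can let a program do the translation. The following minimal sketch (the name `roman_to_int` and the parsing rule are illustrative assumptions, not a historical algorithm) reads a Roman numeral with the usual subtractive rule, so that the sum can then be carried out in positional notation.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(numeral):
    """Value of a Roman numeral, using the subtractive rule (IV = 4, XL = 40, ...)."""
    total = 0
    for i, symbol in enumerate(numeral):
        value = ROMAN_VALUES[symbol]
        # A symbol placed before a larger one is subtracted instead of added.
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total

print(roman_to_int("CXL"), "+", roman_to_int("MCL"), "=",
      roman_to_int("CXL") + roman_to_int("MCL"))
```

Note that even this simple reading requires looking at neighbouring symbols, whereas in positional notation each digit contributes independently of the others.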
Let us conclude with the incipit of the first chapter of Liber Abaci and the extraordinary innovation the book brought to the Western world:

Novem figure indorum he sunt

9, 8, 7, 6, 5, 4, 3, 2, 1

Cum his itaque novem figuris, et cum hoc signo 0, quod arabice zephirum appellatur, scribitur quilibet numerus, ut inferius demonstratur. [...] ut in sequenti cum figuris numeris super notatis ostenditur.

MI     MMXXIII   MMMXXII   MMMXX   MMMMMDC   MMM
1001   2023      3022      3020    5600      3000

... Et sic in reliquis numeris est procedendum.[29]


[29] “The nine Indian symbols are ... With these nine symbols and with the symbol 0, which the Arabs call zephyr, any number can be written as shown below. [...] the above numbers are shown below in symbols ... And in this way you continue for the following numbers.” Interestingly, Roman numerals continued to be used in bookkeeping for a long time because they are more difficult to manipulate (just add a 0 to an Arabic numeral in a balance sheet...).

1.7 The extended real line


In the theory of limits that we will study later in the book, it is very useful to consider the extended real line. It is obtained by adding to the real line the two ideal points $+\infty$ and $-\infty$. We obtain in such a way the set

$\mathbb{R} \cup \{-\infty, +\infty\}$

denoted by the symbol $\overline{\mathbb{R}}$ (sometimes by $[-\infty, +\infty]$). The order structure of $\mathbb{R}$ can be naturally extended to $\overline{\mathbb{R}}$ by setting $-\infty < a < +\infty$ for each $a \in \mathbb{R}$.

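The operations defined in $\mathbb{R}$ can be partially extended to $\overline{\mathbb{R}}$. In particular, besides the usual rules of calculation in $\mathbb{R}$, on the extended real line the following further rules hold:

(i) addition with a real number:

$a + \infty = +\infty, \quad a - \infty = -\infty \quad \forall a \in \mathbb{R}$ (1.23)

(ii) addition between infinities of the same sign:

$+\infty + \infty = +\infty$ and $-\infty - \infty = -\infty$

(iii) multiplication with a non-zero number:[30]

$a \cdot (+\infty) = +\infty$ and $a \cdot (-\infty) = -\infty \quad \forall a > 0$

$a \cdot (+\infty) = -\infty$ and $a \cdot (-\infty) = +\infty \quad \forall a < 0$

(iv) multiplication of infinities:

$+\infty \cdot (+\infty) = -\infty \cdot (-\infty) = +\infty$
$+\infty \cdot (-\infty) = -\infty \cdot (+\infty) = -\infty$

with, in particular,

$(+\infty)^a = +\infty$ if $a > 0$ and $(+\infty)^a = 0$ if $a < 0$

(v) division:

$\frac{a}{+\infty} = \frac{a}{-\infty} = 0 \quad \forall a \in \mathbb{R}$

(vi) power of a real number:

$a^{+\infty} = +\infty$ if $a > 1$
$a^{+\infty} = 0$ if $0 < a < 1$
$a^{-\infty} = 0$ if $a > 1$
$a^{-\infty} = +\infty$ if $0 < a < 1$

(vii) power between infinities:

$(+\infty)^{+\infty} = +\infty$
$(+\infty)^{-\infty} = 0$

[30] A real number is often called scalar.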

While the addition of infinities with the same sign is a well-defined operation (for example, the sum of two positive infinities is again a positive infinity), the addition of infinities of different sign is not defined. For example, the result of $+\infty - \infty$ is not defined. This is a first example of an indeterminate operation in $\overline{\mathbb{R}}$. In general, the following operations are indeterminate:

(i) addition of infinities with different sign:

$+\infty - \infty$ and $-\infty + \infty$ (1.24)

(ii) multiplication between 0 and infinity:

$\pm\infty \cdot 0$ and $0 \cdot (\pm\infty)$ (1.25)

(iii) divisions with denominator equal to zero or with numerator and denominator that are both infinities:

$\frac{a}{0}$ and $\frac{\pm\infty}{\pm\infty}$ (1.26)

with $a \in \overline{\mathbb{R}}$;

(iv) the powers:

$1^{\pm\infty}, \quad 0^0, \quad (+\infty)^0$ (1.27)

The indeterminate operations (i)-(iv) are called forms of indetermination and will play an important role in the theory of limits. Note that, by setting $a = 0$, formula (1.26) takes the form

$\frac{0}{0}$
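Floating point arithmetic (IEEE 754, as exposed by Python's `math.inf`) implements a very similar extended line, and can be used to experiment with these rules. The sketch below is only an analogy, with one caveat stated in the code: the floating point power function assigns the value 1 to the powers that are treated here as indeterminate, and division by zero raises an error instead of returning a value.

```python
import math

inf = math.inf

# Determinate rules: each comparison evaluates to True.
determinate = {
    "a + inf": 5.0 + inf == inf,
    "-inf - inf": -inf - inf == -inf,
    "a * inf with a < 0": -3.0 * inf == -inf,
    "inf * (-inf)": inf * -inf == -inf,
    "a / inf": 5.0 / inf == 0.0,
    "a ** inf with a > 1": 2.0 ** inf == inf,
    "a ** inf with 0 < a < 1": 0.5 ** inf == 0.0,
    "a ** -inf with a > 1": 2.0 ** -inf == 0.0,
    "inf ** -inf": inf ** -inf == 0.0,
}

# Indeterminate forms: the result is "not a number" (nan).
indeterminate = {
    "inf - inf": math.isnan(inf - inf),
    "0 * inf": math.isnan(0.0 * inf),
    "inf / inf": math.isnan(inf / inf),
}

# Division by zero is rejected outright by Python.
try:
    5.0 / 0.0
    division_by_zero_allowed = True
except ZeroDivisionError:
    division_by_zero_allowed = False

# Caveat: these powers are NOT flagged as indeterminate by floating point arithmetic.
ieee_power_conventions = (1.0 ** inf, 0.0 ** 0.0, inf ** 0.0)   # each equals 1.0

print(determinate, indeterminate, division_by_zero_allowed, ieee_power_conventions)
```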

O.R. As we have observed, the most natural geometric image of $\mathbb{R}$ is the (real) line: to each point there corresponds a number and, vice versa, to each number there corresponds a point. If we take a closed (and obviously bounded) segment, we can “transport” all the numbers from the real line to the segment, as the following figure shows:[31]

[31] We refer to the proof of Proposition 249 for the analytic expression of the bijection shown here.

[Figure: the real line transported onto a bounded segment; axes $x$ and $y$ with origin $O$]

All the real numbers that found a place on the real line also find a place on the segment, extremes excluded (maybe packed, but they really all fit). Two points are left, the extremes of the segment, to which it is natural to associate, respectively, $+\infty$ and $-\infty$. The geometric image of $\overline{\mathbb{R}}$ is therefore a closed segment.

1.8 The birth of the deductive method


The deductive method, upon which mathematics is based, was born between the VI and the V century B.C. and, in that period, came to dominate Greek mathematics. As we have seen throughout the chapter, mathematical properties are stated in theorems, whose truth is established by a logical argument, their proof, which is based on axioms and definitions.
It is a revolutionary innovation in the history of human thought, celebrated in several Dialogues of Plato and codified in the Elements of Euclid. It places reason as the sole guide for scientific (and non-scientific) investigations. A mathematical property (for example, that the sum of the squares of the catheti is equal to the square of the hypotenuse) is true because it can be logically proved and not because it is empirically verified in concrete examples or because a nice drawing makes the intuition clear or because some “authority” reveals its truth.
Little is known about the birth of the deductive method: the surviving documentation is scarce. Reason emerged in the Ionian Greek colonies (first in Miletus with Thales and Anaximander) to guide the first scientific investigations of physical phenomena. It was, however, in Magna Graecia that reason first tackled abstract matters. An intriguing hypothesis, proposed by Arpad Szabo,[32] underlines the importance of the Eleatic philosophy, which flourished at Elea in the V century B.C. and has in Parmenides and Zeno its most famous exponents. In Parmenides’ famous doctrine of the Being, a turning point in intellectual history that the reader might have encountered in some high school philosophy course, it is logic that permits the study of the Being, that is, of the world of truth (ἀλήθεια). This study is impossible for the senses, which can only guide us among the appearances that characterize the world of opinion (δόξα). In particular, only reason can master arguments by contradiction, which have no empirical substratum, but are the pure result of reason. Such arguments, developed (according to Szabo) by the Eleatic school and at the center of its dialectics (which culminated in the famous paradoxes of Zeno), enabled for example the Eleatic philosopher Melissus of Samos to state that the Being “always was what it was and always will be. For if it had come into being, necessarily before it came into being there was nothing. But, if there was nothing, in no way could something come into being from nothing”.[33]
True knowledge is thus theoretic: only the eye of the mind can see the truth, while empirical analysis necessarily stops at the appearance. The anti-empirical character of the Eleatic school could have been decisive in the birth of the deductive method, at least in creating a favorable intellectual environment. Naturally, it is not possible to exclude a causality opposite to the one proposed by Szabo: the deductive method could have been developed inside mathematics and could have then influenced philosophy, and in particular the Eleatics.[34] Indeed, the irrationality of $\sqrt{2}$, established by the Pythagorean school (the other great pre-Socratic school of Magna Graecia), is a first decisive triumph of such a method in mathematics: only the eye of the mind could see such a property, which is devoid of any “empirical” intuition. It is the eye of the mind that explains the inescapable error incurred by every empirical measurement of the hypotenuse of a right triangle with catheti of unitary length: however accurate this measurement is, it will always be a rational approximation of the true irrational distance, $\sqrt{2}$, with a consequent approximation error (that, by the way, will probably vary from measurement to measurement).
In any case, between the VI and the V century B.C. two pre-Socratic schools of Magna Graecia were the cradle of an incredible intellectual revolution. In the III century B.C. another famous Magna Graecia scholar, Archimedes of Syracuse, led this revolution to its maximum splendor in the classical world (and beyond). We close with Plato’s famous (probably fictional) description of two protagonists of this revolution, Parmenides and Zeno.[35]

They came to Athens ... the former was, at the time of his visit, about 65 years old, very white with age, but well favoured. Zeno was nearly 40 years of age, tall and fair to look upon: in the days of his youth he was reported to have been beloved by Parmenides.

[32] See “The Beginnings of Greek Mathematics”, Reidel Publishing Company, 1978. Elea was a town of Magna Graecia, around 140 kilometers south of Naples.
[33] In his book “The Presocratic philosophers”, Routledge, 1982, J. Barnes calls this beautiful fragment the theorem of ungenerability (trans. Allhoff, Smith, and Vaidya in “Ancient philosophy”, Blackwell, 2008). In a less transparent way (but it was part of the first logical argument ever reported) Parmenides had written in his poem “And how might what is be then? And how might it have come into being? For if it came into being, it is not, nor if it is about to be at some time” (trans. Barnes). We refer to G. Calogero “Studi sull’Eleatismo”, La Nuova Italia, 1977, for a classic work on Eleatic philosophy, and to the book by J. Barnes as well as to the recent W. James, “Presocratics”, Routledge, 2014, for general introductions to the Presocratics.
[34] For instance, arguments by contradiction could have been developed within the Pythagorean school through the odd-even dichotomy for natural numbers that is central in the proof of the irrationality of $\sqrt{2}$. This is what Maria Cardini Timpanaro argues, contra Szabo, in her comprehensive “Pitagorici”, La Nuova Italia, 1964. See also pp. 258-259 in G. Vlastos, “Studies in Greek philosophy”, v. 1, Princeton University Press, 1996. Interestingly, the archaic Greek enigmas were formulated in contradictory terms (their role in the birth of dialectics is emphasized by G. Colli in “La nascita della filosofia”, Adelphi, 1975).
[35] In Plato’s dialogue “Parmenides” (trans. Jowett reported in Barnes ibid.). A caveat: over the centuries (actually, over the millennia) the strict Eleatic anti-empirical stance (understandable, back then, in the excitement of a new approach) has inspired a great deal of metaphysical thinking. Reason without empirical motivation and discipline becomes, at best, sterile.


Chapter 2

Cartesian structure and $\mathbb{R}^n$

2.1 Cartesian products and $\mathbb{R}^n$


Suppose we want to classify a wine according to two characteristics, ageing and alcoholic content. For example, suppose one reads on a label: 2 years of ageing and 12 degrees. We can write

$(2, 12)$

On another label one reads: 1 year of ageing and 10 degrees. In this case we can write

$(1, 10)$

The pairs $(2, 12)$ and $(1, 10)$ are called ordered pairs and in them we distinguish the first element, the ageing, from the second one, the alcoholic content. In an ordered pair the position is, therefore, crucial.
Let $A_1$ be the set of the possible years of ageing and let $A_2$ be the set of the possible alcoholic contents. We can write

$(2, 12) \in A_1 \times A_2, \quad (1, 10) \in A_1 \times A_2$

We denote by $a_1$ a generic element of $A_1$ and by $a_2$ a generic element of $A_2$. For example, in $(2, 12)$ we have $a_1 = 2$ and $a_2 = 12$.

Definition 43 Given two sets $A_1$ and $A_2$, the Cartesian product $A_1 \times A_2$ is the set of all the ordered pairs $(a_1, a_2)$ with $a_1 \in A_1$ and $a_2 \in A_2$.
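For finite sets the Cartesian product can be listed explicitly. A small sketch with Python's standard `itertools.product`; the particular values chosen for the two sets are illustrative assumptions.

```python
from itertools import product

A1 = {1, 2}        # possible years of ageing (illustrative values)
A2 = {10, 12}      # possible alcoholic contents (illustrative values)

# All ordered pairs (a1, a2) with a1 in A1 and a2 in A2
cartesian = set(product(A1, A2))
print(sorted(cartesian))

# Position matters: (2, 12) belongs to A1 x A2, while (12, 2) does not.
print((2, 12) in cartesian, (12, 2) in cartesian)
```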

In the example, we have $A_1 \subseteq \mathbb{N}$ and $A_2 \subseteq \mathbb{N}$, i.e., the elements of $A_1$ and $A_2$ are natural numbers. More generally, we can assume that $A_1 = A_2 = \mathbb{R}$, so that the elements of $A_1$ and $A_2$ are any real numbers, although with a possibly different interpretation according to their position. In this case $A_1 \times A_2 = \mathbb{R} \times \mathbb{R} = \mathbb{R}^2$ and the pair $(a_1, a_2)$ can be represented by a point in the plane.

An ordered pair of real numbers $(a_1, a_2) \in \mathbb{R}^2$ is called a vector.

Among the subsets of R2 , of particular importance are:

(i) (a1 ; a2 ) 2 R2 : a1 = 0 , that is, the set of the ordered pairs of the form (0; a2 ); it is
the vertical axis (or axis of the ordinates).

(ii) (a1 ; a2 ) 2 R2 : a2 = 0 , that is, the set of the ordered pairs of the form (a1 ; 0); it is
the horizontal axis (or axis of the abscissae).

(iii) (a1 ; a2 ) 2 R2 : a1 0 and a2 0 , that is, the set of the ordered pairs (a1 ; a2 ) with
both components that are positive; it is the …rst quadrant of the Cartesian plane (also
called positive orthant). In a similar way we can de…ne the other quadrants:

y
3

II I
1

0
O x
-1

III IV
-2
-3 -2 -1 0 1 2 3 4 5

(iv) (a1 ; a2 ) 2 R2 : a21 + a22 = 1 and (a1 ; a2 ) 2 R2 : a21 + a22 1 , that is, respectively
the circumference and the circle with center at the origin and radius equal to 1.

Above we have classified wines using two characteristics, ageing and alcoholic content.
We now consider a slightly more complicated product, for example a portfolio of assets.
We suppose that there exist four different assets that can be purchased on the market. A
portfolio is then described by an ordered quadruple

$$(a_1, a_2, a_3, a_4)$$

where $a_1$ is the amount of money invested in the first asset, $a_2$ is the amount of money
invested in the second asset, and so on. For example,

$$(1000, 1500, 1200, 600)$$

denotes a portfolio in which 1000 euros have been invested in the first asset, 1500 in the
second one, and so on. The position is crucial: the portfolio

$$(1500, 1200, 1000, 600)$$

is very different from the previous one, although the amounts of money invested in the
different assets are the same.
Since amounts of money are numbers that are not necessarily integers, and possibly negative
(in case of short sales), it is natural to take $A_1 = A_2 = A_3 = A_4 = \mathbb{R}$, where $A_i$ is the set of
the possible amounts of money that can be invested in asset $i = 1, 2, 3, 4$. We have

$$(a_1, a_2, a_3, a_4) \in A_1 \times A_2 \times A_3 \times A_4 = \mathbb{R}^4$$

In particular,
$$(1000, 1500, 1200, 600) \in \mathbb{R}^4$$
In general, if we consider $n$ sets $A_1, A_2, \ldots, A_n$ we can give the following definition.

Definition 44 Given $n$ sets $A_1, A_2, \ldots, A_n$, their Cartesian product

$$A_1 \times A_2 \times \cdots \times A_n$$

denoted by $\prod_{i=1}^n A_i$ (sometimes by $\times_{i=1}^n A_i$), is the set of all the ordered $n$-tuples $(a_1, a_2, \ldots, a_n)$
with $a_1 \in A_1, a_2 \in A_2, \ldots, a_n \in A_n$.

We call $a_1, a_2, \ldots, a_n$ the components (or elements) of $a$. When $A_1 = A_2 = \cdots = A_n = A$, we write
$$A_1 \times A_2 \times \cdots \times A_n = A \times A \times \cdots \times A = A^n$$
In particular, if $A_1 = A_2 = \cdots = A_n = \mathbb{R}$ the Cartesian product is denoted by $\mathbb{R}^n$, which
therefore is the set of all the (ordered) $n$-tuples of real numbers. In other words,

$$\mathbb{R}^n = \underbrace{\mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R}}_{n \text{ times}}$$

An element
$$x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$$
is called a vector.¹ The Cartesian product $\mathbb{R}^n$ is called the ($n$-dimensional) Euclidean space.
For $n = 1$, $\mathbb{R}$ is represented by the real line; for $n = 2$, $\mathbb{R}^2$ is represented by the plane;
and so on. As for $\mathbb{R}$ and $\mathbb{R}^2$, the vectors $(a_1, a_2, a_3)$ in $\mathbb{R}^3$ admit a graphic representation:

[Figure: a vector $(a_1, a_2, a_3)$ in $\mathbb{R}^3$, drawn with axes $x$, $y$, $z$ and origin $O$.]

This is no longer possible in $\mathbb{R}^n$ when $n \geq 4$. The graphic representation may help the
intuition, but from a theoretical and computational viewpoint it has no importance because
the vectors of $\mathbb{R}^n$, with $n \geq 4$, are completely well-defined entities. They actually turn out
to be fundamental in economic applications, as we will see in Section 2.4.

Notation. We will denote the components of a vector by the same letter used for the vector
itself, along with an ad hoc index: for example $a_3$ is the third component of the vector $a$,
$y_7$ the seventh component of the vector $y$, and so on.

2.2 Operations in $\mathbb{R}^n$
Let us consider two vectors in $\mathbb{R}^n$,

$$x = (x_1, x_2, \ldots, x_n), \qquad y = (y_1, y_2, \ldots, y_n)$$

We define the vector sum $x + y$ as

$$x + y = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)$$

For example, for the two vectors $x = (7, 8, 9)$ and $y = (2, 4, 7)$ in $\mathbb{R}^3$, we have

$$x + y = (7 + 2, 8 + 4, 9 + 7) = (9, 12, 16)$$

Note that $x + y \in \mathbb{R}^n$: through the operation of addition we built a new element of $\mathbb{R}^n$.

Let now $\alpha \in \mathbb{R}$ and $x \in \mathbb{R}^n$. We call product of the vector $x$ by the scalar $\alpha$ the vector $\alpha x$
defined as
$$\alpha x = (\alpha x_1, \alpha x_2, \ldots, \alpha x_n)$$
¹ For real numbers we use the letter $x$ instead of $a$.

For example, for $\alpha = 2$ and $x = (7, 8, 9) \in \mathbb{R}^3$, we have

$$2x = (2 \cdot 7, 2 \cdot 8, 2 \cdot 9) = (14, 16, 18)$$

Even in this case, we have $\alpha x \in \mathbb{R}^n$. In other words, also with the operation of multiplication
by scalars, we built a new element of $\mathbb{R}^n$.
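The two operations translate directly into code. The sketch below assumes that vectors are stored as Python tuples; the function names `vec_add` and `scalar_mul` are illustrative choices, not notation from the text.

```python
def vec_add(x, y):
    """Componentwise sum of two vectors with the same number of components."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of components")
    return tuple(xi + yi for xi, yi in zip(x, y))


def scalar_mul(alpha, x):
    """Product of the vector x by the scalar alpha."""
    return tuple(alpha * xi for xi in x)


print(vec_add((7, 8, 9), (2, 4, 7)))   # (9, 12, 16)
print(scalar_mul(2, (7, 8, 9)))        # (14, 16, 18)
```

Both functions return a tuple of the same length as their vector argument, mirroring the fact that the results are again elements of the same space.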

Notation. We set $-x = (-1)x = (-x_1, -x_2, \ldots, -x_n)$ and $x - y = x + (-1)y$. We will also
set $\mathbf{0} = (0, 0, \ldots, 0)$, where boldface distinguishes the vector $\mathbf{0}$ of zeros from the scalar $0$. The
vector $\mathbf{0}$ is called the zero vector.

We have introduced in $\mathbb{R}^n$ two operations, addition and multiplication by scalars, that
extend to vectors the corresponding operations for real numbers. Let us see their properties.
We start with addition.

Proposition 45 Let $x, y, z \in \mathbb{R}^n$. The operation of addition satisfies the following properties:

(i) $x + y = y + x$ (commutativity),

(ii) $(x + y) + z = x + (y + z)$ (associativity),

(iii) $x + \mathbf{0} = x$ (existence of the neutral element for addition),

(iv) $x + (-x) = \mathbf{0}$ (existence of the opposite of any vector).

Proof We prove (i), leaving the other properties to the reader. We have

$$x + y = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n) = (y_1 + x_1, y_2 + x_2, \ldots, y_n + x_n) = y + x$$

as desired.

We now consider the multiplication by scalars.

Proposition 46 Let $x, y \in \mathbb{R}^n$ and $\alpha, \beta \in \mathbb{R}$. The operation of multiplication by scalars
satisfies the following properties:

(i) $\alpha (x + y) = \alpha x + \alpha y$ (distributivity for the addition of vectors),

(ii) $(\alpha + \beta) x = \alpha x + \beta x$ (distributivity for the addition of scalars),

(iii) $1x = x$ (existence of the neutral element for the multiplication by scalars),

(iv) $\alpha (\beta x) = (\alpha \beta) x$ (associativity).

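Proof We prove (ii): the other properties are left to the reader. We have

$$
\begin{aligned}
(\alpha + \beta) x &= ((\alpha + \beta) x_1, (\alpha + \beta) x_2, \ldots, (\alpha + \beta) x_n) \\
&= (\alpha x_1 + \beta x_1, \alpha x_2 + \beta x_2, \ldots, \alpha x_n + \beta x_n) \\
&= (\alpha x_1, \alpha x_2, \ldots, \alpha x_n) + (\beta x_1, \beta x_2, \ldots, \beta x_n) = \alpha x + \beta x
\end{aligned}
$$

as claimed.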
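The properties of the two operations can be spot-checked numerically. The Python sketch below is not a proof: it only verifies the identities on a few integer vectors and scalars chosen here for illustration (integers keep the arithmetic exact), with helper names that are assumptions of this example.

```python
def vec_add(x, y):
    """Componentwise sum of two vectors of equal length."""
    return tuple(xi + yi for xi, yi in zip(x, y))


def scalar_mul(alpha, x):
    """Product of the vector x by the scalar alpha."""
    return tuple(alpha * xi for xi in x)


# Illustrative data: three vectors of R^3 and two scalars.
x, y, z = (7, 8, 9), (2, 4, 7), (-1, 0, 3)
alpha, beta = 2, -5
zero = (0, 0, 0)

checks = {
    "commutativity": vec_add(x, y) == vec_add(y, x),
    "associativity": vec_add(vec_add(x, y), z) == vec_add(x, vec_add(y, z)),
    "neutral element": vec_add(x, zero) == x,
    "opposite": vec_add(x, scalar_mul(-1, x)) == zero,
    "distributivity (vectors)": scalar_mul(alpha, vec_add(x, y))
    == vec_add(scalar_mul(alpha, x), scalar_mul(alpha, y)),
    "distributivity (scalars)": scalar_mul(alpha + beta, x)
    == vec_add(scalar_mul(alpha, x), scalar_mul(beta, x)),
    "neutral scalar": scalar_mul(1, x) == x,
    "associativity (scalars)": scalar_mul(alpha, scalar_mul(beta, x))
    == scalar_mul(alpha * beta, x),
}

print(all(checks.values()))   # True
```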

As we will see better in the next chapter (Section 3.3), the operations of addition and
multiplication by scalars allow us to define the important notion of linear combination of
vectors. In particular, a vector $x \in \mathbb{R}^n$ will be said to be a linear combination of the vectors
$\{x^i\}_{i=1}^m$ of $\mathbb{R}^n$ if there exist $m$ real numbers (coefficients) $\{\alpha_i\}_{i=1}^m$ such that
$x = \alpha_1 x^1 + \cdots + \alpha_m x^m$.
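As a small illustration, a linear combination can be computed component by component. The Python sketch below assumes vectors stored as tuples; the function name `linear_combination` and the sample data are hypothetical.

```python
def linear_combination(coefficients, vectors):
    """Return alpha_1 * x^1 + ... + alpha_m * x^m as a tuple."""
    if not vectors or len(coefficients) != len(vectors):
        raise ValueError("need one coefficient per vector, and at least one vector")
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("all vectors must have the same number of components")
    # The k-th component of the result combines the k-th components of the vectors.
    return tuple(
        sum(a * v[k] for a, v in zip(coefficients, vectors)) for k in range(n)
    )


# 2 * (7, 8, 9) - 1 * (2, 4, 7)
print(linear_combination([2, -1], [(7, 8, 9), (2, 4, 7)]))   # (12, 12, 11)
```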

The last operation in $\mathbb{R}^n$ that we consider is the inner product. Given two vectors $x$ and
$y$ in $\mathbb{R}^n$, their inner product, denoted by $x \cdot y$, is defined as

$$x \cdot y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$

that is, in more compact notation,²

$$x \cdot y = \sum_{i=1}^n x_i y_i$$

Other common notations for the inner product are $(x, y)$ and $\langle x, y \rangle$.
For example, for the vectors $x = (1, -1, 5, -3)$ and $y = (-2, 3, \pi, -1)$ of $\mathbb{R}^4$, we have

$$x \cdot y = 1 \cdot (-2) + (-1) \cdot 3 + 5 \cdot \pi + (-3) \cdot (-1) = 5\pi - 2$$

The inner product is an operation that differs from addition and scalar multiplication in a
structural aspect: while the latter operations determine a new vector of $\mathbb{R}^n$, the result of the
inner product is a scalar. The next result gathers the main properties of the inner product
(we leave to the reader the simple proof).
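The definition can be sketched in a few lines of Python. The example reads the vectors of the text as $x = (1, -1, 5, -3)$ and $y = (-2, 3, \pi, -1)$, which is an assumption about the signs; the function name `inner` is illustrative.

```python
import math


def inner(x, y):
    """Inner product x . y = x_1 y_1 + ... + x_n y_n (a scalar, not a vector)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of components")
    return sum(xi * yi for xi, yi in zip(x, y))


x = (1, -1, 5, -3)
y = (-2, 3, math.pi, -1)

value = inner(x, y)
# Floating-point comparison with the closed form 5*pi - 2.
print(math.isclose(value, 5 * math.pi - 2))   # True
```

Since $\pi$ is represented by a floating-point approximation, the comparison uses `math.isclose` rather than exact equality.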

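Proposition 47 Let $x, y, z \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}$. We have:

(i) $x \cdot y = y \cdot x$ (commutativity),

(ii) $(x + y) \cdot z = (x \cdot z) + (y \cdot z)$ (distributivity),

(iii) $\alpha x \cdot z = \alpha (x \cdot z)$ (distributivity).

Note that the two distributive properties can be summarized in the single property
$(\alpha x + \beta y) \cdot z = \alpha (x \cdot z) + \beta (y \cdot z)$, for all scalars $\alpha, \beta \in \mathbb{R}$.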

² Given $n$ real numbers $r_i$, their sum $r_1 + r_2 + \cdots + r_n$ is denoted by $\sum_{i=1}^n r_i$. Analogously, their product
$r_1 r_2 \cdots r_n$ is denoted by $\prod_{i=1}^n r_i$.

2.3 Order structure on $\mathbb{R}^n$

The order structure of $\mathbb{R}^n$ is based on the order structure of $\mathbb{R}$, but with some important
novelties. We begin by defining the order $\geq$ on $\mathbb{R}^n$: given two vectors $x = (x_1, x_2, \ldots, x_n)$ and
$y = (y_1, y_2, \ldots, y_n)$ in $\mathbb{R}^n$, we write
$$x \geq y$$
when $x_i \geq y_i$ for every $i = 1, 2, \ldots, n$. In particular, we have $x = y$ if and only if we have
both $x \geq y$ and $y \geq x$.
In other words, $\geq$ orders two vectors by considering all their components and by applying
to them the order $\geq$ on $\mathbb{R}$ studied in Section 1.4. For example, $x = (0, 3, 4) \geq y = (0, 2, 1)$.
When $n = 1$, the order $\geq$ reduces to the classical one on $\mathbb{R}$.

The study of the basic properties of the inequality $\geq$ reveals a first important novelty:
when $n \geq 2$, the order $\geq$ does not satisfy completeness. Indeed, consider for example
$x = (0, 1)$ and $y = (1, 0)$ in $\mathbb{R}^2$: we have neither $x \geq y$ nor $y \geq x$. We say, therefore, that
$\geq$ on $\mathbb{R}^n$ is a partial order (which becomes complete when $n = 1$).
It is easy to find vectors in $\mathbb{R}^n$ that are not comparable. The following figure shows the
vectors of $\mathbb{R}^2$ that are $\geq$ or $\leq$ than the vector $x = (1, 2)$; the darker area represents the points
smaller than $x$, the lighter area those greater than $x$, and the two white areas represent the
points that are not comparable with $x$.

[Figure: the plane with axes $x$ and $y$, origin $O$, and the regions of points comparable and not comparable with the vector $(1, 2)$.]
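A minimal Python sketch of the componentwise order, assuming vectors are stored as tuples; the function names `geq` and `comparable` are illustrative choices, not notation from the text.

```python
def geq(x, y):
    """True if x >= y, i.e. x_i >= y_i for every component i."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of components")
    return all(xi >= yi for xi, yi in zip(x, y))


def comparable(x, y):
    """True if x >= y or y >= x."""
    return geq(x, y) or geq(y, x)


print(geq((0, 3, 4), (0, 2, 1)))     # True
print(comparable((0, 1), (1, 0)))    # False: the order is only partial
print(comparable((3, 3), (1, 2)))    # True: (3, 3) >= (1, 2)
```

For tuples of length one the function `comparable` always returns `True`, in line with the completeness of the order on the real line.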

Apart from completeness, it is easy to verify that $\geq$ on $\mathbb{R}^n$ continues to enjoy the properties
seen for $n = 1$:

(i) reflexivity: $x \geq x$,

(ii) transitivity: if $x \geq y$ and $y \geq z$, then $x \geq z$,

(iii) independence: if $x \geq y$, then $x + z \geq y + z$ for every $z \in \mathbb{R}^n$,

(iv) separation: given two sets $A$ and $B$ in $\mathbb{R}^n$, if $a \geq b$ for every $a \in A$ and $b \in B$, then
there exists $c \in \mathbb{R}^n$ such that $a \geq c \geq b$ for every $a \in A$ and $b \in B$.

Another notion that becomes surprisingly delicate when $n \geq 2$ is that of strict inequality.
Indeed, given two vectors $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ of $\mathbb{R}^n$, two cases can
happen:

• All the components of $x$ are $\geq$ than the corresponding components of $y$, with some of
them strictly greater; i.e., $x_i \geq y_i$ for each index $i = 1, 2, \ldots, n$, with $x_i > y_i$ for at least
one index $i$.

• All the components of $x$ are $>$ than the corresponding components of $y$; i.e., $x_i > y_i$
for each $i = 1, 2, \ldots, n$.

In the first case we have a strict inequality, in symbols $x > y$; in the second case a strong
inequality, in symbols $x \gg y$.

Example 48 For $x = (1, 3, 4)$ and $y = (0, 1, 2)$ in $\mathbb{R}^3$, we have $x \gg y$. For $x = (0, 3, 4)$ and
$y = (0, 1, 2)$, we have $x > y$, but not $x \gg y$, because $x$ has only two components out of three
strictly greater than the corresponding components of $y$. ▲
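The three notions of inequality can be told apart mechanically. The Python sketch below assumes vectors stored as tuples of equal length; the names `geq`, `strictly_greater` and `strongly_greater` are illustrative.

```python
def geq(x, y):
    """Weak inequality x >= y: every component of x is >= that of y."""
    return all(xi >= yi for xi, yi in zip(x, y))


def strictly_greater(x, y):
    """Strict inequality x > y: x >= y with at least one strict component."""
    return geq(x, y) and any(xi > yi for xi, yi in zip(x, y))


def strongly_greater(x, y):
    """Strong inequality x >> y: every component of x is > that of y."""
    return all(xi > yi for xi, yi in zip(x, y))


y = (0, 1, 2)
print(strongly_greater((1, 3, 4), y))   # True
print(strictly_greater((0, 3, 4), y))   # True
print(strongly_greater((0, 3, 4), y))   # False: the first components coincide
print(strictly_greater(y, y))           # False, although geq(y, y) is True
```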

Given two vectors $x, y \in \mathbb{R}^n$, we have

$$x \gg y \Longrightarrow x > y \Longrightarrow x \geq y$$

The three notions of inequality among vectors in $\mathbb{R}^n$ are, therefore, more and more
stringent:

(i) a weak notion, $\geq$, that permits the equality between the two vectors;

(ii) an intermediate notion, $>$, that requires at least one strict inequality among the
components;

(iii) a strong notion, $\gg$, that requires strict inequality among all the components of the
two vectors.

When $n = 1$, both $>$ and $\gg$ reduce to the classical $>$ on $\mathbb{R}$ seen in Section 1.4. Moreover,
the same symbols “reversed”, i.e., $\leq$, $<$, and $\ll$, are used in the opposite case.

An especially important case is the comparison between a vector $x$ and the vector $\mathbf{0}$. We
say that the vector $x$ is:

(i) positive if $x \geq \mathbf{0}$, i.e., if all the components of $x$ are positive;

(ii) strictly positive if $x > \mathbf{0}$, i.e., if all the components of $x$ are positive and at least one
of them is strictly positive;

(iii) strongly positive if $x \gg \mathbf{0}$, i.e., if all the components of $x$ are strictly positive.

N.B. The notation and terminology that we have introduced are not the only possible ones.
For example, some authors use $\geqq$, $\geq$, and $>$ in place of $\geq$, $>$, and $\gg$; other authors call
“non-negative” the vectors that we call positive, and so on. ▽

Together with the lack of completeness of $\geq$, the presence of the two different notions of
strict inequality is the main novelty that we have in $\mathbb{R}^n$, when $n \geq 2$, with respect to the
special case $\mathbb{R}$, i.e., $n = 1$, of Section 1.4.

We conclude this section by generalizing the intervals introduced in $\mathbb{R}$ (Section 1.4).
Given $a, b \in \mathbb{R}^n$, we have:

(i) the bounded closed interval

$$[a, b] = \{x \in \mathbb{R}^n : a \leq x \leq b\} = \{x \in \mathbb{R}^n : a_i \leq x_i \leq b_i\}$$

(ii) the bounded open interval

$$(a, b) = \{x \in \mathbb{R}^n : a \ll x \ll b\} = \{x \in \mathbb{R}^n : a_i < x_i < b_i\}$$

(iii) the bounded half-closed (or half-open) intervals

$$(a, b] = \{x \in \mathbb{R}^n : a \ll x \leq b\} \quad \text{and} \quad [a, b) = \{x \in \mathbb{R}^n : a \leq x \ll b\}$$

We also have

(iv) unbounded intervals $[a, \infty) = \{x \in \mathbb{R}^n : x \geq a\}$ and $(a, \infty) = \{x \in \mathbb{R}^n : x \gg a\}$, and
their analogues $(-\infty, a]$ and $(-\infty, a)$. In particular, the interval $[\mathbf{0}, \infty) = \{x \in \mathbb{R}^n : x \geq \mathbf{0}\}$
is often denoted by $\mathbb{R}^n_+$, while $\mathbb{R}^n_{++}$ denotes the interval $(\mathbf{0}, \infty) = \{x \in \mathbb{R}^n : x \gg \mathbf{0}\}$. In a
similar way we define the intervals $\mathbb{R}^n_- = \{x \in \mathbb{R}^n : x \leq \mathbf{0}\}$ and $\mathbb{R}^n_{--} = \{x \in \mathbb{R}^n : x \ll \mathbf{0}\}$.

N.B. (i) The intervals in $\mathbb{R}^n$ can be expressed as Cartesian products of intervals in $\mathbb{R}$; for
example,
$$[a, b] = \prod_{i=1}^n [a_i, b_i]$$

(ii) In the notions of intervals just introduced we used the inequalities $\leq$ or $\ll$. By
replacing them with the inequality $<$, we obtain other possible intervals that, however, are
not that relevant for our purposes. ▽
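Membership in these intervals reduces to componentwise checks, which is the content of the product representation of $[a, b]$. The Python sketch below assumes vectors stored as tuples of equal length; the function names and the sample endpoints are illustrative.

```python
def in_closed_interval(x, a, b):
    """x in [a, b]: a_i <= x_i <= b_i for every component i."""
    return all(ai <= xi <= bi for xi, ai, bi in zip(x, a, b))


def in_open_interval(x, a, b):
    """x in (a, b): a_i < x_i < b_i for every component i."""
    return all(ai < xi < bi for xi, ai, bi in zip(x, a, b))


a, b = (0, 0), (1, 2)
print(in_closed_interval((1, 0.5), a, b))   # True: the point lies on the boundary
print(in_open_interval((1, 0.5), a, b))     # False: the first component equals b_1
print(in_open_interval((0.5, 1), a, b))     # True
```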

2.4 Applications
2.4.1 Static choices
Let us consider a consumer who has to choose how many kilograms of apples and of potatoes
to buy at the market. For convenience, we assume that these goods are infinitely divisible,
so that the consumer can buy any positive real quantity (for example, 3 kg of apples and
$\pi$ kg of potatoes). In this case, $\mathbb{R}_+$ is the set of the possible quantities of apples that can be
bought, and the same for potatoes. Therefore, the set of the bundles of apples and potatoes
that the consumer can buy is
$$\mathbb{R}^2_+ = \mathbb{R}_+ \times \mathbb{R}_+ = \{(x_1, x_2) : x_1, x_2 \in \mathbb{R}_+\}$$
Graphically, this is the first quadrant of the plane. In general, if a consumer chooses among
more than two goods, say $n$ goods, the set of the bundles is represented by the Cartesian
product
$$\mathbb{R}^n_+ = \mathbb{R}_+ \times \mathbb{R}_+ \times \cdots \times \mathbb{R}_+ = \{(x_1, x_2, \ldots, x_n) : x_i \in \mathbb{R}_+ \text{ for } i = 1, 2, \ldots, n\}$$

In production theory, a vector in $\mathbb{R}^n_+$ represents, instead, a possible configuration of $n$
inputs for the producer. In this case the vector $x = (x_1, x_2, \ldots, x_n)$ indicates that the producer
has at his disposal $x_1$ units of the first input, $x_2$ units of the second input, ..., and $x_n$ units
of the last one.

2.4.2 Intertemporal choices

We have seen how, in consumer theory, a vector $x = (x_1, x_2, \ldots, x_n)$ can be interpreted as a
bundle in which $x_i$ is the quantity of the $i$-th good for $i = 1, 2, \ldots, n$. But there is another
possible interpretation in which there is a single good and $x = (x_1, x_2, \ldots, x_n)$ indicates the
quantity of such good available in different periods, with $x_i$ being the quantity of the good
available in the $i$-th period. For example, if the good is apples, $x_1$ is the quantity of apples
in period 1, $x_2$ is the quantity of apples in period 2, and so on, until $x_n$, which is the quantity
of apples in the $n$-th period.
In this case, $\mathbb{R}^n_+$ denotes the space of all the intertemporal bundles of a given good over
$n$ periods; we often use the more evocative notation $\mathbb{R}^T$, where $T$ is the number of periods
and $x_t$ is the quantity of the good in period $t$, with $t = 1, 2, \ldots, T$.³ A fundamental example
is the one in which the good is money, so that

$$x = (x_1, x_2, \ldots, x_t, \ldots, x_T) \in \mathbb{R}^T$$

represents the quantity of money in different periods: in this case $x$ is a cash flow. For
example, the current account of a family records each day the balance between revenues
(wages, incomes, etc.) and expenditures (purchases, rents, etc.): setting $T = 365$, the
resulting cash flow is
$$x = (x_1, x_2, \ldots, x_{365})$$
Therefore, $x_1$ is the balance of the current account on January 1, $x_2$ is the balance on January
2, and so on until $x_{365}$, which is the balance at the end of the year.

Instead of a single good over several periods, we can consider a bundle of several goods
over several periods. Similarly, in an intertemporal problem of production, we will have
vectors of inputs over several periods. Such situations are modeled by means of matrices,
a simple notion that will be studied in Chapter 13. Many economic applications focus,
however, on the single good case, and therefore $\mathbb{R}^T$ is a very important space in the theory
of intertemporal choices.

2.5 Pareto optima

2.5.1 Definition
The concept of maximum of a subset of $\mathbb{R}$ (Definition 30) can be equivalently reformulated
as follows:

Lemma 49 Given a set $A \subseteq \mathbb{R}$, a point $\hat{x} \in A$ is maximum of $A$ if and only if there is no
$x \in A$ such that $x > \hat{x}$.

Indeed, requiring that all the points of $A$ be $\leq \hat{x}$ amounts to requiring that none of them
be $> \hat{x}$. A similar reformulation can be given for minima.

We now turn our attention to subsets of $\mathbb{R}^n$ and the partial order $\geq$. We can extend the
notion of maximum in the following way.

³ The notation $t = 1, 2, \ldots, T$ is equivalent to $t \in \{1, 2, \ldots, T\}$, like the notation $i = 1, 2, \ldots, n$ is equivalent
to $i \in \{1, 2, \ldots, n\}$. Choosing one of them is a matter of convenience.

Definition 50 Given a set $A \subseteq \mathbb{R}^n$, a point $\hat{x} \in A$ is called maximum of $A$ if $\hat{x} \geq x$ for
every $x \in A$.

In an analogous way we can define the minimum. Moreover, the analogue of Proposition
33 holds: the maximum (minimum) of a set $A \subseteq \mathbb{R}^n$, if it exists, is unique (the proof is
similar).
Unfortunately, the notions of maximum and minimum are of little interest in applications
because often subsets of $\mathbb{R}^n$ do not have maxima or minima since the order $\geq$ is only partial
in $\mathbb{R}^n$ (as seen in Section 2.3). It is much more profitable to follow, instead, the order of
ideas sketched in Lemma 49. Indeed, the characterization established there is equivalent to
the usual definition of maximum in $\mathbb{R}$, but it becomes more general in $\mathbb{R}^n$. This motivates
the next definition, of great importance in economic applications.

Definition 51 Let $A \subseteq \mathbb{R}^n$. A point $\hat{x} \in A$ is called maximal (or a Pareto optimum) of $A$
if there is no $x \in A$ such that $x > \hat{x}$.

In a similar way we can define minimals, which are also called Pareto optima.⁴ Say
that a point $x \in A$ is dominated by another point $y \in A$ if $x < y$, that is, if $x_i \leq y_i$ for
each index $i$, with $x_i < y_i$ for at least one index $i$ (Section 2.3). A dominated point is thus
outperformed by another point available in the set. For instance, if they represent bundles
of goods, a dominated bundle $x$ is obviously no better an alternative than the dominant one
$y$. In terms of dominance, we can say that a point $a$ of $A$ is maximal if it is not dominated
by any other point in $A$. That is, $a$ is not outperformed by any other alternative available
in $A$. Maximality is thus the natural extension of the notion of maximum when dealing
(as is often the case in applications) with alternatives that are multi-dimensional (and so
represented by vectors of $\mathbb{R}^n$).
In the rest of the section we focus on maxima and maximals, the most relevant in economic
applications, leaving to the reader the dual properties that hold for minima and minimals.
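For a finite set, the maximals can be found by discarding every dominated point. The following Python sketch assumes points stored as tuples of equal length; the function names and the sample set `A` are hypothetical, and the quadratic pairwise scan is chosen for clarity rather than efficiency.

```python
def dominates(y, x):
    """True if y > x: y_i >= x_i for every i, with y_i > x_i for at least one i."""
    return all(yi >= xi for yi, xi in zip(y, x)) and any(
        yi > xi for yi, xi in zip(y, x)
    )


def maximals(points):
    """Points of the finite set that are not dominated by any other point."""
    pts = list(points)
    return [x for x in pts if not any(dominates(y, x) for y in pts)]


# Hypothetical finite subset of R^2.
A = [(1, 3), (2, 2), (3, 1), (1, 1), (2, 1)]
print(maximals(A))   # [(1, 3), (2, 2), (3, 1)]
```

In this example $(1, 1)$ and $(2, 1)$ are dominated by $(2, 2)$, while the three remaining points are pairwise not comparable.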

2.5.2 Maxima and maximals

Lemma 49 shows that the notions of maximum and maximal are equivalent in $\mathbb{R}$. This is no
longer true in $\mathbb{R}^n$ when $n > 1$: the notion of maximum becomes (much) stronger than that
of maximal.

Lemma 52 Given a set $A \subseteq \mathbb{R}^n$, its maximum, if it exists, is the unique maximal of $A$.

Proof Let $\hat{x} \in A$ be the maximum of $A$. Clearly, $\hat{x}$ is a maximal. We need to show that it
is the unique maximal. Let $x \in A$ with $x \neq \hat{x}$. Since $\hat{x}$ is the maximum of $A$, we have $\hat{x} \geq x$.
Since $x \neq \hat{x}$, we have $\hat{x} > x$. Therefore, $x$ is not a maximal.

The set in the next figure has a maximum, point $a$, which thanks to this lemma is therefore
also the unique maximal.
⁴ Optima, like angels, have no gender. Even if it were preferable to talk about Pareto maxima and minima,
unfortunately the tradition does not distinguish between them, calling them both Pareto optima. Their nature
is then clarified by the context.

Lemma 52 has, therefore, established that the maximum of a set, when it exists, is the unique
maximal; that is,
$$\text{maximum} \Longrightarrow \text{maximal}$$

But the converse is false: there exist maximals that are not maxima; that is,

$$\text{maximal} \;\not\!\!\Longrightarrow \text{maximum}$$

Example 53 The next figure shows a set $A$ of $\mathbb{R}^2$ that has no maxima, but infinitely many
maximals.

[Figure: a set $A$ in the plane with origin $O$; a point $a$ lies on the dark edge of $A$.]

It is easy to see that any point $a \in A$ on the dark edge is maximal: there is no $x \in A$ such
that $x > a$. On the other hand, $a$ is not a maximum: we have $a \geq x$ only for the points
$x \in A$ that are comparable with $a$, which are represented in the shaded part of $A$:

[Figure: the set $A$ with the points comparable with $a$ shaded.]

Nothing can be said, instead, for the points that are not comparable with $a$ (the non-shaded
part of $A$). The lack of maxima for this set is due to the fact that not all the elements
of the set are comparable, so, in the final analysis, to the fact that the order $\geq$ is only partial in $\mathbb{R}^n$ when
$n > 1$. ▲

The set $A$ of the example illustrates another fundamental difference between maxima and
maximals in $\mathbb{R}^n$ with $n > 1$: the maximum of a set, if it exists, is unique, while a maximal
might not be unique (indeed, very often, it is not).

In conclusion, because of the incompleteness of the order $\geq$ on $\mathbb{R}^n$, maxima are much less
important than maximals, which are the key notion in $\mathbb{R}^n$. That said, maximals might also
not exist: the 45° straight line is a simple subset of $\mathbb{R}^2$ without maximals and minimals.⁵
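The difference between the two notions can be seen on small finite sets. The Python sketch below uses hypothetical sets `B` (which has a maximum) and `A` (which has none); the function names are illustrative and points are stored as tuples of integers.

```python
def geq(x, y):
    """Weak inequality x >= y, component by component."""
    return all(xi >= yi for xi, yi in zip(x, y))


def maximum(points):
    """The point that is >= every point of the set, or None if there is none."""
    pts = list(points)
    for candidate in pts:
        if all(geq(candidate, x) for x in pts):
            return candidate
    return None


def maximals(points):
    """Points x for which no y in the set satisfies y > x (y >= x and y != x)."""
    pts = list(points)
    return [x for x in pts if not any(geq(y, x) and y != x for y in pts)]


B = [(1, 1), (2, 3), (0, 2)]
A = [(1, 3), (2, 2), (3, 1), (1, 1)]

print(maximum(B), maximals(B))   # (2, 3) [(2, 3)]
print(maximum(A), maximals(A))   # None [(1, 3), (2, 2), (3, 1)]
```

The set `B` illustrates the lemma (its maximum is its only maximal), while `A` has three maximals and no maximum.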

2.5.3 Pareto frontier and Edgeworth box

Maximals are fundamental in economics, where they are (often) called Pareto optima. The
set of these points is of particular importance.

Definition 54 The set of the maximals of a set $A \subseteq \mathbb{R}^n$ is called the Pareto (or efficient)
frontier of $A$.

⁵ This set is the graph of the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x$, as we will see in Chapter 6.

In the last example, the dark edge is the Pareto frontier of the set $A$:

[Figure: the set $A$ of Example 53 in the plane with origin $O$, with its dark edge (the Pareto frontier) highlighted.]

As a first economic application, assume for example that the different vectors of a set
$A \subseteq \mathbb{R}^n$ represent the profits that $n$ individuals can get. The Pareto optima represent the
situations from which it is not possible to move away without reducing the profit of at least
one of the individuals. In other words, the $n$ individuals would not object to restricting $A$ to the
set of its Pareto optima (nobody loses), that is, to its Pareto frontier; a conflict of interests
arises, instead, when a point on the frontier has to be selected.
The concept of Pareto optimum, simple but ingenious, has the great merit of allowing us
to narrow down, with a unanimous consensus, a set $A$ of alternative possibilities, and so to
identify the true “critical” subset, the Pareto frontier, which is often much smaller than the
original set $A$.⁶
A magnificent illustration of this key aspect of Pareto optimality is the famous Edgeworth
box.⁷ Consider two agents, Albert and Barbara, who have to divide between them unitary
quantities of two infinitely divisible goods (for example, a kilogram of flour and a liter of
wine). We want to model the problem of division (probably determined by a bargaining
between them) and to see if, thanks to Pareto optimality, we can say something non-trivial
about it.
Each pair $x = (x_1, x_2)$ with $x_1 \in [0, 1]$ and $x_2 \in [0, 1]$ is a possible allocation of the two
goods to one of the two agents: in other words, the Cartesian product $[0, 1] \times [0, 1]$ describes
them all. The two agents must agree on the allocations $(a_1, a_2)$ of Albert and $(b_1, b_2)$ of
Barbara. Clearly,
$$a_1 + b_1 = a_2 + b_2 = 1 \tag{2.1}$$
⁶ For Pareto optimality it is key that the agents only consider their own alternatives (bundles of goods,
profits, etc.), without minding those of their peers; in other words, that they do not feel envy or similar
social emotions. To see why, think of a tribe of “envious” members, whose head decides to double the food rations of
half of the members of the tribe, leaving unchanged those of the other members. The new allocation would
provoke lively protests by the “unchanged” members even though nothing changed for them.
⁷ Since we will use notions that we will introduce in Chapter 6, the reader may want to read this application
after having read that chapter.

To complete the description of the problem, we have to say which are the desiderata of
the two agents. To this end, we suppose that they have identical utility functions $u_a, u_b :
[0, 1] \times [0, 1] \to \mathbb{R}$, and that, for simplicity, they are of the Cobb-Douglas type $u_a(x_1, x_2) =
u_b(x_1, x_2) = \sqrt{x_1 x_2}$ (see Example 174). The indifference curves can be “packed” in the
following way:

[Figure: the Edgeworth box with the indifference curves of the two agents.]

This is the classic Edgeworth box. By condition (2.1), we can think of a point $(x_1, x_2) \in
[0, 1] \times [0, 1]$ as the allocation of Albert. We can actually identify each possible division
between the two agents with the allocations $(x_1, x_2)$ of Albert; indeed, the allocations of
Barbara $(1 - x_1, 1 - x_2)$ are univocally determined once those of Albert are known.

Each allocation $(x_1, x_2)$ has utility $u_a(x_1, x_2)$ for Albert and $u_b(1 - x_1, 1 - x_2)$ for Barbara. Let

$$A = \{(u_a(x_1, x_2), u_b(1 - x_1, 1 - x_2)) \in \mathbb{R}^2_+ : (x_1, x_2) \in [0, 1] \times [0, 1]\}$$

be the set of all the utility profiles of the two agents determined by the division of the two
goods.
By looking at the Edgeworth box, the reader will be easily convinced that the Pareto
frontier of $A$, i.e., the set of the Pareto optima of $A$, is given by the diagonal

$$D = \{(d, d) \in \mathbb{R}^2_+ : d \in [0, 1]\}$$

of the box, that is, by the locus of the tangency points of the indifference curves (called
contract curve). To prove this rigorously, we need the next simple result.

Lemma 55 Given $x_1, x_2 \in [0, 1]$, we have

$$1 - \sqrt{x_1 x_2} \geq \sqrt{(1 - x_1)(1 - x_2)} \tag{2.2}$$

with equality if and only if $x_1 = x_2$.



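Proof Since $x_1, x_2 \in [0, 1]$, we have:

$$
\begin{aligned}
1 - \sqrt{x_1 x_2} \geq \sqrt{(1 - x_1)(1 - x_2)} &\iff \left(1 - \sqrt{x_1 x_2}\right)^2 \geq (1 - x_1)(1 - x_2) \\
&\iff \frac{x_1 + x_2}{2} \geq \sqrt{x_1 x_2} \iff \left(\frac{x_1 + x_2}{2}\right)^2 \geq x_1 x_2 \iff (x_1 - x_2)^2 \geq 0
\end{aligned}
$$

Since the last inequality is always true, we conclude that (2.2) holds. Moreover, these
equivalences imply that
$$1 - \sqrt{x_1 x_2} = \sqrt{(1 - x_1)(1 - x_2)} \iff (x_1 - x_2)^2 = 0$$

which holds if and only if $x_1 = x_2$.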
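The lemma can also be checked numerically on a grid, as a sanity check and not as a substitute for the proof. The Python sketch below reads (2.2) as $1 - \sqrt{x_1 x_2} \geq \sqrt{(1 - x_1)(1 - x_2)}$; the grid step and the tolerance `TOL` for floating-point comparisons are assumptions of this example.

```python
import math

TOL = 1e-12  # tolerance for floating-point round-off (an assumption)


def gap(x1, x2):
    """Left-hand side minus right-hand side of inequality (2.2)."""
    return (1 - math.sqrt(x1 * x2)) - math.sqrt((1 - x1) * (1 - x2))


# Grid of points 0, 0.05, ..., 1 in each coordinate.
grid = [k / 20 for k in range(21)]

# The inequality holds at every grid point ...
inequality_holds = all(gap(x1, x2) >= -TOL for x1 in grid for x2 in grid)

# ... and the gap vanishes exactly at the grid points with x1 = x2.
equality_iff_diagonal = all(
    (abs(gap(x1, x2)) <= TOL) == (i == j)
    for i, x1 in enumerate(grid)
    for j, x2 in enumerate(grid)
)

print(inequality_holds, equality_iff_diagonal)   # True True
```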

Having established this lemma, we can now prove rigorously what the graph suggested.

Proposition 56 A profile $(u_a(x_1, x_2), u_b(1 - x_1, 1 - x_2)) \in A$ is a Pareto optimum of $A$ if
and only if $(x_1, x_2) \in D$.

Proof We start by showing that, for any division of goods $(x_1, x_2) \notin D$, that is, such that
$x_1 \neq x_2$, there exists $(d, d) \in D$ such that

$$(u_a(d, d), u_b(1 - d, 1 - d)) > (u_a(x_1, x_2), u_b(1 - x_1, 1 - x_2)) \tag{2.3}$$

For Albert, we have

$$u_a(\sqrt{x_1 x_2}, \sqrt{x_1 x_2}) = \sqrt{x_1 x_2} = u_a(x_1, x_2)$$

and, therefore, $(\sqrt{x_1 x_2}, \sqrt{x_1 x_2})$ is for him indifferent to $(x_1, x_2)$. By Lemma 55, for Barbara
we have
$$u_b(1 - \sqrt{x_1 x_2}, 1 - \sqrt{x_1 x_2}) = 1 - \sqrt{x_1 x_2} > \sqrt{(1 - x_1)(1 - x_2)} = u_b(1 - x_1, 1 - x_2)$$
where the inequality is strict since $x_1 \neq x_2$. Therefore, setting $d = \sqrt{x_1 x_2}$, (2.3) holds.
It follows that the divisions $(x_1, x_2)$ outside of the diagonal are not Pareto optima. It
remains to show that the divisions on the diagonal are so. Let $(d, d) \in D$ and suppose, by
contradiction, that there exists $(x_1, x_2) \in [0, 1] \times [0, 1]$ such that

$$(u_a(x_1, x_2), u_b(1 - x_1, 1 - x_2)) > (u_a(d, d), u_b(1 - d, 1 - d)) \tag{2.4}$$

Without loss of generality,⁸ suppose that

$$u_a(x_1, x_2) > u_a(d, d) \quad \text{and} \quad u_b(1 - x_1, 1 - x_2) \geq u_b(1 - d, 1 - d)$$

that is,
$$\sqrt{x_1 x_2} > \sqrt{dd} = d \quad \text{and} \quad \sqrt{(1 - x_1)(1 - x_2)} \geq \sqrt{(1 - d)(1 - d)} = 1 - d$$

Therefore,
$$1 - \sqrt{x_1 x_2} < 1 - d \leq \sqrt{(1 - x_1)(1 - x_2)}$$
which contradicts (2.2). It follows that there is no $(x_1, x_2) \in [0, 1] \times [0, 1]$ for which (2.4)
holds. This completes the proof.

⁸ A similar argument holds when $u_a(x_1, x_2) \geq u_a(d, d)$ and $u_b(1 - x_1, 1 - x_2) > u_b(1 - d, 1 - d)$.

By Proposition 56, we can say that if the agents maximize their Cobb-Douglas utilities,
the bargaining will be solved in a division of the goods on the diagonal of the Edgeworth
box, i.e., such that each agent has an equal quantity of both goods.
Naturally, Proposition 56 cannot tell us anything about which of the points of the diagonal
is, then, effectively determined by the bargaining. The Pareto frontier $D$ is, however, a small
subset of $A$: through the notion of Pareto optimum we have been able to say something
highly non-trivial about the problem of division.
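The two halves of the argument for the Edgeworth box can be replayed numerically on a grid of divisions. The Python sketch below is only a sanity check under stated assumptions: the common utility is taken to be $\sqrt{x_1 x_2}$, comparisons use a floating-point tolerance `TOL`, and all names (`u`, `profile`, `dominates`) are illustrative.

```python
import math

TOL = 1e-12  # tolerance for floating-point comparisons (an assumption)


def u(x1, x2):
    """Cobb-Douglas utility sqrt(x1 * x2), assumed common to both agents."""
    return math.sqrt(x1 * x2)


def profile(x1, x2):
    """Utility profile (Albert, Barbara) when Albert receives (x1, x2)."""
    return (u(x1, x2), u(1 - x1, 1 - x2))


def dominates(p, q):
    """True if p > q: no component smaller, at least one strictly larger."""
    return all(pi >= qi - TOL for pi, qi in zip(p, q)) and any(
        pi > qi + TOL for pi, qi in zip(p, q)
    )


grid = [k / 10 for k in range(11)]

# First half: an off-diagonal division (x1, x2) is dominated by the
# diagonal division (d, d) with d = sqrt(x1 * x2).
off_diagonal_dominated = all(
    dominates(profile(math.sqrt(x1 * x2), math.sqrt(x1 * x2)), profile(x1, x2))
    for i, x1 in enumerate(grid)
    for j, x2 in enumerate(grid)
    if i != j
)

# Second half: no division on the grid dominates a diagonal division.
diagonal_undominated = all(
    not dominates(profile(x1, x2), profile(d, d))
    for d in grid
    for x1 in grid
    for x2 in grid
)

print(off_diagonal_dominated, diagonal_undominated)   # True True
```

The dominating point $d = \sqrt{x_1 x_2}$ is computed directly rather than searched for on the grid, since in general it is not a grid point.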
Chapter 3

Linear structure

In this chapter we study more in depth the linear structure of $\mathbb{R}^n$ which was introduced
in Section 2.2. The study of such a fundamental structure of $\mathbb{R}^n$, which we will continue
in Chapter 13 on linear functions, is part of linear algebra. The theory of finance is a
fundamental application of linear algebra, as we will see in Section 17.5.

3.1 Vector subspaces of $\mathbb{R}^n$

Propositions 45 and 46 have shown how the operations of addition and multiplication by
scalars on $\mathbb{R}^n$ satisfy the following properties, for every $x, y, z \in \mathbb{R}^n$ and every $\alpha, \beta \in \mathbb{R}$,

(v1) $x + y = y + x$ (commutativity)

(v2) $(x + y) + z = x + (y + z)$ (associativity)

(v3) $x + \mathbf{0} = x$ (existence of the neutral element for addition)

(v4) $x + (-x) = \mathbf{0}$ (existence of the opposite)

(v5) $\alpha (x + y) = \alpha x + \alpha y$ (distributivity)

(v6) $(\alpha + \beta) x = \alpha x + \beta x$ (distributivity)

(v7) $1x = x$ (existence of the neutral element for the multiplication by scalars)

(v8) $\alpha (\beta x) = (\alpha \beta) x$ (associativity)

For this reason, as the reader will learn in more advanced courses, $\mathbb{R}^n$ is an example of a
vector space, which, in general, is a set where we can define two operations of addition and
multiplication by scalars that satisfy properties (v1)–(v8).¹ For example, in Chapter 13 we
will see another example of vector space, the space of matrices.
We call vector subspaces of $\mathbb{R}^n$ its subsets that behave well with respect to the two
operations:
We call vector subspaces of Rn its subsets that behave well with respect to the two
operations:
¹ The notion of vector space (first proposed by Giuseppe Peano in 1888) is central in mathematics, but it
is necessary to go beyond $\mathbb{R}^n$ to fully understand it. For this reason the reader will study in depth this notion
in more advanced courses.


Definition 57 A non-empty subset $V$ of $\mathbb{R}^n$ is called vector subspace if it is closed with
respect to the operations of addition and multiplication by scalars.²

We leave to the reader the easy check that the two operations satisfy in $V$ properties
(v1)–(v8). In this regard, note that the origin belongs to each vector subspace $V$ (i.e., $\mathbf{0} \in V$)
since $0x = \mathbf{0}$ for every vector $x \in V$.

The following characterization is useful for determining whether a subset of $\mathbb{R}^n$ is a vector
subspace.

Proposition 58 A non-empty subset $V$ of $\mathbb{R}^n$ is a vector subspace if and only if

$$\alpha x + \beta y \in V \tag{3.1}$$

for every $\alpha, \beta \in \mathbb{R}$ and every $x, y \in V$.

Proof “Only if”. Let $V$ be a vector subspace and let $x, y \in V$. As $V$ is closed with respect
to multiplication by scalars, we have $\alpha x \in V$ and $\beta y \in V$. It follows that $\alpha x + \beta y \in V$ since
$V$ is closed with respect to addition.
“If”. Putting $\alpha = \beta = 1$ in (3.1), we get $x + y \in V$, while, putting $\beta = 0$, we get
$\alpha x \in V$. Therefore, $V$ is closed with respect to the operations of addition and multiplication
by scalars inherited from $\mathbb{R}^n$.

Putting $\alpha = \beta = 0$, (3.1) implies that $\mathbf{0} \in V$. This confirms that each vector subspace
contains the origin $\mathbf{0}$.

Example 59 There are two legitimate, but trivial, subspaces of $\mathbb{R}^n$: the singleton $\{\mathbf{0}\}$ and
the space $\mathbb{R}^n$ itself. In particular, the reader can check that a singleton $\{x\}$ is a vector
subspace of $\mathbb{R}^n$ if and only if $x = \mathbf{0}$. ▲

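Example 60 Let $m \leq n$ and set

$$M = \{x \in \mathbb{R}^n : x_1 = \cdots = x_m = 0\}$$

For example, if $n = 3$ and $m = 2$, we have $M = \{x \in \mathbb{R}^3 : x_1 = x_2 = 0\}$. The subset $M$ is a
vector subspace. Indeed, let $x, y \in M$ and $\alpha, \beta \in \mathbb{R}$. We have:

$$
\begin{aligned}
\alpha x + \beta y &= (\alpha x_1 + \beta y_1, \ldots, \alpha x_n + \beta y_n) \\
&= (0, \ldots, 0, \alpha x_{m+1} + \beta y_{m+1}, \ldots, \alpha x_n + \beta y_n) \in M
\end{aligned}
$$

In particular, the vertical axis in $\mathbb{R}^2$, which corresponds to $M = \{x \in \mathbb{R}^2 : x_1 = 0\}$, is a
vector subspace of $\mathbb{R}^2$. ▲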
² Recall that a set is closed with respect to an operation when the result of the operation still belongs to
the set.

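Example 61 Let $M$ be the set of all $x \in \mathbb{R}^4$ such that

$$
\begin{cases}
2x_1 - x_2 + 2x_3 + 2x_4 = 0 \\
x_1 - x_2 - 2x_3 - 4x_4 = 0 \\
x_1 - 2x_2 - 2x_3 - 10x_4 = 0
\end{cases}
$$

In other words, $M$ is the set of the solutions of this system of equations. It is a vector
subspace: the reader can check that, given $x, y \in M$ and $\alpha, \beta \in \mathbb{R}$, we have $\alpha x + \beta y \in M$.
Performing the computations,³ we find that the vectors

$$\left(-\frac{10}{3} t, \; -6t, \; -\frac{2}{3} t, \; t\right) \tag{3.2}$$

solve the system for each $t \in \mathbb{R}$, so that

$$M = \left\{\left(-\frac{10}{3} t, \; -6t, \; -\frac{2}{3} t, \; t\right) : t \in \mathbb{R}\right\}$$

is a description of the subspace. ▲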
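The family (3.2) can be verified with exact rational arithmetic. In the Python sketch below the coefficient rows in `SYSTEM` carry the signs as reconstructed from the step-by-step computations of the footnote, so they should be treated as an assumption; the helper names are illustrative.

```python
from fractions import Fraction

# Coefficient rows of the three equations (signs are an assumption,
# reconstructed from the computations in the footnote).
SYSTEM = [
    (2, -1, 2, 2),
    (1, -1, -2, -4),
    (1, -2, -2, -10),
]


def solution(t):
    """The vector (-10/3 t, -6 t, -2/3 t, t) of (3.2), with exact fractions."""
    t = Fraction(t)
    return (Fraction(-10, 3) * t, -6 * t, Fraction(-2, 3) * t, t)


def residuals(x):
    """Left-hand sides of the three equations evaluated at x."""
    return [sum(c * xi for c, xi in zip(row, x)) for row in SYSTEM]


def combine(alpha, x, beta, y):
    """The linear combination alpha * x + beta * y."""
    return tuple(alpha * xi + beta * yi for xi, yi in zip(x, y))


# Every vector of the family solves the system ...
print(all(r == 0 for t in range(-5, 6) for r in residuals(solution(t))))   # True

# ... and so does a linear combination of two solutions.
z = combine(2, solution(1), -7, solution(4))
print(all(r == 0 for r in residuals(z)))   # True
```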

If $V_1$ and $V_2$ are two vector subspaces, we can show that their intersection $V_1 \cap V_2$ is also a vector subspace. More generally, we have the following result.

Proposition 62 The intersection of any collection of vector subspaces of $\mathbb{R}^n$ is itself a vector subspace.

Proof Let $\{V_i\}$ be any collection of vector subspaces of $\mathbb{R}^n$. Since $0 \in V_i$ for every $i$, we have $\bigcap_i V_i \neq \emptyset$. Let $x, y \in \bigcap_i V_i$ and $\alpha, \beta \in \mathbb{R}$. Since $x, y \in \bigcap_i V_i$, we have $x, y \in V_i$ for every $i$ and, therefore, $\alpha x + \beta y \in V_i$ for every $i$ since each $V_i$ is a vector subspace of $\mathbb{R}^n$. Hence, $\alpha x + \beta y \in \bigcap_i V_i$, and so $\bigcap_i V_i$ is a vector subspace of $\mathbb{R}^n$.

Unlike the intersection, the union of vector subspaces is not in general a vector subspace, as the next example shows.
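³ For the sake of completeness, we provide the computations. We consider $x_4$ as a “parameter” and solve the system in $x_1$, $x_2$, and $x_3$; clearly, we will get solutions that depend on the value of the parameter $x_4$:

$$\begin{cases} 2x_1 - x_2 + 2x_3 + 2x_4 = 0 \\ x_1 - x_2 - 2x_3 - 4x_4 = 0 \\ x_1 - 2x_2 - 2x_3 - 10x_4 = 0 \end{cases} \Longrightarrow \begin{cases} 2x_1 - x_2 = -2x_3 - 2x_4 \\ x_1 - x_2 = 2x_3 + 4x_4 \\ x_1 - 2x_2 - 2x_3 - 10x_4 = 0 \end{cases} \Longrightarrow$$

$$\begin{cases} 2(x_2 + 2x_3 + 4x_4) - x_2 = -2x_3 - 2x_4 \\ x_1 + (-2x_3 - 2x_4 - 2x_1) = 2x_3 + 4x_4 \\ x_1 - 2x_2 - 2x_3 - 10x_4 = 0 \end{cases} \Longrightarrow \begin{cases} x_2 = -6x_3 - 10x_4 \\ x_1 = -4x_3 - 6x_4 \\ x_1 - 2x_2 - 2x_3 - 10x_4 = 0 \end{cases} \Longrightarrow$$

$$\begin{cases} x_2 = -6x_3 - 10x_4 \\ x_1 = -4x_3 - 6x_4 \\ (-4x_3 - 6x_4) - 2(-6x_3 - 10x_4) - 2x_3 - 10x_4 = 0 \end{cases} \Longrightarrow \begin{cases} x_2 = -6x_3 - 10x_4 \\ x_1 = -4x_3 - 6x_4 \\ x_3 = -\frac{2}{3}x_4 \end{cases} \Longrightarrow$$

$$\begin{cases} x_2 = -6\left(-\frac{2}{3}x_4\right) - 10x_4 \\ x_1 = -4\left(-\frac{2}{3}x_4\right) - 6x_4 \\ x_3 = -\frac{2}{3}x_4 \end{cases} \Longrightarrow \begin{cases} x_2 = -6x_4 \\ x_1 = -\frac{10}{3}x_4 \\ x_3 = -\frac{2}{3}x_4 \end{cases}$$

In conclusion, the vectors of $\mathbb{R}^4$ of the form (3.2) are the solutions of the system for every $t \in \mathbb{R}$.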

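Example 63 The sets $V_1 = \{x \in \mathbb{R}^2 : x_1 = 0\}$ and $V_2 = \{x \in \mathbb{R}^2 : x_2 = 0\}$ are both vector subspaces of $\mathbb{R}^2$. We have

$$V_1 \cup V_2 = \{x \in \mathbb{R}^2 : x_1 = 0 \text{ or } x_2 = 0\}$$

which is not a vector subspace of $\mathbb{R}^2$. Indeed,

$$(1, 0) \in V_1 \cup V_2 \quad \text{and} \quad (0, 1) \in V_1 \cup V_2$$

but $(1, 0) + (0, 1) = (1, 1) \notin V_1 \cup V_2$. ▲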

3.2 Linear independence and dependence


In this chapter we will adopt the notation $x^i = (x^i_1, \ldots, x^i_n) \in \mathbb{R}^n$, in which the superscript identifies different vectors and the subscripts their components. We use this notation immediately in the next important definition.

Definition 64 A finite set of vectors $\{x^i\}_{i=1}^m$ of $\mathbb{R}^n$ is said to be linearly independent if whenever

$$\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m = 0$$

for some set $\{\alpha_i\}_{i=1}^m$ of real numbers, then

$$\alpha_1 = \alpha_2 = \cdots = \alpha_m = 0$$

The set $\{x^i\}_{i=1}^m$ is, instead, said to be linearly dependent if it is not linearly independent, i.e.,⁴ if there exists a set $\{\alpha_i\}_{i=1}^m$ of real numbers, not all equal to zero, such that

$$\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m = 0$$

Example 65 In $\mathbb{R}^n$ consider the vectors

$$e^1 = (1, 0, 0, \ldots, 0), \quad e^2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad e^n = (0, 0, \ldots, 0, 1)$$

called standard unit vectors (or versors) of $\mathbb{R}^n$. The set $\{e^1, \ldots, e^n\}$ is linearly independent. Indeed

$$\alpha_1 e^1 + \cdots + \alpha_n e^n = (\alpha_1, \ldots, \alpha_n)$$

and therefore $\alpha_1 e^1 + \cdots + \alpha_n e^n = 0$ implies $\alpha_1 = \cdots = \alpha_n = 0$. ▲

Example 66 All the sets of vectors $\{x^i\}_{i=1}^m$ of $\mathbb{R}^n$ that contain the vector $0$ are linearly dependent. Indeed, without loss of generality, set $x^1 = 0$. Given a set $\{\alpha_i\}_{i=1}^m$ of scalars with $\alpha_1 \neq 0$ and $\alpha_i = 0$ for $i = 2, \ldots, m$, we have

$$\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_m x^m = 0$$

which proves the linear dependence of the set $\{x^i\}_{i=1}^m$. ▲
⁴ See Section C.6.3 of the appendix for a careful analysis of this important negation.

Example 67 Two vectors $x^1$ and $x^2$ that are linearly dependent are called collinear. This happens if and only if there exist two scalars $\alpha_1$ and $\alpha_2$, where at least one is different from zero, such that $\alpha_1 x^1 = \alpha_2 x^2$. In other words, if and only if either $x^1 = 0$, or $x^2 = 0$, or there exists $\beta \neq 0$ such that $x^1 = \beta x^2$. ▲

Before going on with examples, we must clarify a point of terminology. Although linear independence and dependence are properties of a set of vectors $\{x^i\}_{i=1}^m$, they are often attributed to the single vectors, and we speak of a “set of linearly independent (dependent) vectors” instead of a “linearly independent (dependent) set of vectors”.

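Example 68 In $\mathbb{R}^3$, the vectors

$$x^1 = (1, 1, 1), \quad x^2 = (3, 1, 5), \quad x^3 = (9, 1, 25)$$

are linearly independent. Indeed

$$\alpha_1 x^1 + \alpha_2 x^2 + \alpha_3 x^3 = \alpha_1 (1, 1, 1) + \alpha_2 (3, 1, 5) + \alpha_3 (9, 1, 25) = (\alpha_1 + 3\alpha_2 + 9\alpha_3,\ \alpha_1 + \alpha_2 + \alpha_3,\ \alpha_1 + 5\alpha_2 + 25\alpha_3)$$

and, therefore, $\alpha_1 x^1 + \alpha_2 x^2 + \alpha_3 x^3 = 0$ means

$$\begin{cases} \alpha_1 + 3\alpha_2 + 9\alpha_3 = 0 \\ \alpha_1 + \alpha_2 + \alpha_3 = 0 \\ \alpha_1 + 5\alpha_2 + 25\alpha_3 = 0 \end{cases}$$

which is a system of equations whose unique solution is $(\alpha_1, \alpha_2, \alpha_3) = (0, 0, 0)$. More generally, to verify whether $k$ vectors

$$x^1 = (x^1_1, \ldots, x^1_n), \quad x^2 = (x^2_1, \ldots, x^2_n), \quad \ldots, \quad x^k = (x^k_1, \ldots, x^k_n)$$

are linearly independent in $\mathbb{R}^n$, it suffices to solve the linear system

$$\begin{cases} \alpha_1 x^1_1 + \alpha_2 x^2_1 + \cdots + \alpha_k x^k_1 = 0 \\ \alpha_1 x^1_2 + \alpha_2 x^2_2 + \cdots + \alpha_k x^k_2 = 0 \\ \qquad \vdots \\ \alpha_1 x^1_n + \alpha_2 x^2_n + \cdots + \alpha_k x^k_n = 0 \end{cases}$$

If $(\alpha_1, \ldots, \alpha_k) = (0, \ldots, 0)$ is the unique solution, then the vectors are linearly independent in $\mathbb{R}^n$. For example, consider in $\mathbb{R}^3$ the two vectors $x^1 = (1, 3, 4)$ and $x^2 = (2, 5, 1)$. The system to solve is

$$\begin{cases} \alpha_1 + 2\alpha_2 = 0 \\ 3\alpha_1 + 5\alpha_2 = 0 \\ 4\alpha_1 + \alpha_2 = 0 \end{cases}$$

It has the unique solution $(\alpha_1, \alpha_2) = (0, 0)$, and so the two vectors $x^1$ and $x^2$ are linearly independent. ▲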
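The test described in Example 68 can be automated. The sketch below is one possible way to do it, not the text's own procedure: it performs Gaussian elimination on the vectors written as rows and counts the non-zero rows that survive (the rank, a notion the text has not introduced). The vectors are linearly independent exactly when this count equals the number of vectors, which is equivalent to the homogeneous system having only the trivial solution. The function names are illustrative.

```python
from fractions import Fraction

def rank(vectors):
    """Number of non-zero rows left after Gaussian elimination (exact arithmetic)."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(ncols):
        # look for a row, at or below position r, with a non-zero entry in this column
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def linearly_independent(vectors):
    """True when no non-trivial linear combination of the vectors is zero."""
    return rank(vectors) == len(vectors)

print(linearly_independent([(1, 1, 1), (3, 1, 5), (9, 1, 25)]))  # True
print(linearly_independent([(1, 3, 4), (2, 5, 1)]))              # True
print(linearly_independent([(1, 3, 4), (2, 6, 8)]))              # False
```

Exact fractions are used instead of floating point so that a pivot is never mistaken for zero (or vice versa) because of rounding.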

Example 69 Consider the vectors

$$x^1 = (2, 1, 1), \quad x^2 = (-1, -1, -2), \quad x^3 = (2, -2, -2), \quad x^4 = (2, -4, -10)$$

To determine whether these vectors are linearly independent in $\mathbb{R}^3$, we solve the system

$$\begin{cases} 2\alpha_1 - \alpha_2 + 2\alpha_3 + 2\alpha_4 = 0 \\ \alpha_1 - \alpha_2 - 2\alpha_3 - 4\alpha_4 = 0 \\ \alpha_1 - 2\alpha_2 - 2\alpha_3 - 10\alpha_4 = 0 \end{cases}$$

As we have seen previously (Example 61), it is solved by the vectors

$$\left(-\frac{10}{3}t,\ -6t,\ -\frac{2}{3}t,\ t\right) \tag{3.3}$$

for each $t \in \mathbb{R}$. Therefore, $(0, 0, 0, 0)$ is not the unique solution of the system, and so the vectors $x^1$, $x^2$, $x^3$, and $x^4$ are linearly dependent. Indeed, by setting for example $t = 1$ in (3.3), the set of four numbers

$$(\alpha_1, \alpha_2, \alpha_3, \alpha_4) = \left(-\frac{10}{3},\ -6,\ -\frac{2}{3},\ 1\right)$$

is a set of real coefficients, with at least one different from zero, such that $\alpha_1 x^1 + \alpha_2 x^2 + \alpha_3 x^3 + \alpha_4 x^4 = 0$. ▲
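The non-trivial combination found in Example 69 can be verified directly. A minimal sketch, with a helper `linear_combination` introduced here only for illustration:

```python
from fractions import Fraction

def linear_combination(coefficients, vectors):
    """Component-wise sum of coefficients[i] * vectors[i]."""
    n = len(vectors[0])
    return [sum(a * v[j] for a, v in zip(coefficients, vectors)) for j in range(n)]

# The four vectors of Example 69 and the coefficients obtained with t = 1.
vectors = [(2, 1, 1), (-1, -1, -2), (2, -2, -2), (2, -4, -10)]
alphas = [Fraction(-10, 3), Fraction(-6), Fraction(-2, 3), Fraction(1)]

combo = linear_combination(alphas, vectors)
print(all(c == 0 for c in combo))  # True: the combination is the zero vector
```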

Subsets retain linear independence.

Proposition 70 The subsets of a linearly independent set are, in turn, linearly independent.

The simple proof is left to the reader, who can also check that if we add a vector (or
more than one) to a linearly dependent set, the set remains linearly dependent.

3.3 Linear combinations


Definition 71 A vector $x \in \mathbb{R}^n$ is said to be a linear combination of the vectors $\{x^i\}_{i=1}^m$ of $\mathbb{R}^n$ if there exist $m$ real coefficients $\{\alpha_i\}_{i=1}^m$ such that

$$x = \alpha_1 x^1 + \cdots + \alpha_m x^m$$

Example 72 Consider the two vectors $e^1 = (1, 0, 0)$ and $e^2 = (0, 1, 0)$ in $\mathbb{R}^3$. A vector of $\mathbb{R}^3$ is a linear combination of $e^1$ and $e^2$ if and only if it has the form $(\alpha_1, \alpha_2, 0)$ for $\alpha_1, \alpha_2 \in \mathbb{R}$. Indeed, $(\alpha_1, \alpha_2, 0) = \alpha_1 e^1 + \alpha_2 e^2$. ▲

The notion of linear combination allows us to establish a remarkable characterization of linear dependence.

Theorem 73 A finite set $S$ of $\mathbb{R}^n$, with $S \neq \{0\}$, is linearly dependent if and only if there exists at least one element of $S$ that is a linear combination of other elements of $S$.

Proof “Only if”. Let $S = \{x^i\}_{i=1}^m$ be a linearly dependent set of $\mathbb{R}^n$. Let $2 \le k \le m$ be the smallest natural number between $2$ and $m$ such that the set $\{x^1, \ldots, x^k\}$ is linearly dependent. At worst, $k$ is equal to $m$ since by hypothesis $\{x^i\}_{i=1}^m$ is linearly dependent. By the definition of linear dependence, there exist therefore $k$ real coefficients $\{\alpha_i\}_{i=1}^k$, with at least one different from zero, such that

$$\alpha_1 x^1 + \alpha_2 x^2 + \cdots + \alpha_k x^k = 0$$

We have $\alpha_k \neq 0$, because otherwise $\{x^1, \ldots, x^{k-1}\}$ would be a linearly dependent set, contradicting the fact that $k$ is the smallest natural number between $2$ and $m$ such that $\{x^1, \ldots, x^k\}$ is a linearly dependent set. Given that $\alpha_k \neq 0$, we can write

$$x^k = -\frac{\alpha_1}{\alpha_k} x^1 - \frac{\alpha_2}{\alpha_k} x^2 - \cdots - \frac{\alpha_{k-1}}{\alpha_k} x^{k-1}$$

and therefore $x^k$ is a linear combination of the vectors $\{x^1, \ldots, x^{k-1}\}$. In other words, the vector $x^k$ of $S$ is a linear combination of other elements of $S$.
“If”. Suppose that the vector $x^k$ of a finite set $S = \{x^i\}_{i=1}^m$ is a linear combination of other elements of $S$. Without loss of generality, assume $k = 1$. There exists a set $\{\alpha_i\}_{i=2}^m$ of real coefficients such that

$$x^1 = \alpha_2 x^2 + \cdots + \alpha_m x^m$$

Define the real coefficients $\{\beta_i\}_{i=1}^m$ as follows

$$\beta_i = \begin{cases} -1 & i = 1 \\ \alpha_i & i \ge 2 \end{cases}$$

By construction, $\{\beta_i\}_{i=1}^m$ is a set of real coefficients, with at least one different from zero, such that $\sum_{i=1}^m \beta_i x^i = 0$. Indeed

$$\sum_{i=1}^m \beta_i x^i = -x^1 + \alpha_2 x^2 + \alpha_3 x^3 + \cdots + \alpha_m x^m = -x^1 + x^1 = 0$$

It follows that $\{x^i\}_{i=1}^m$ is a linearly dependent set.

Example 74 Consider the vectors $x^1 = (1, 3, 4)$, $x^2 = (2, 5, 1)$, and $x^3 = (0, 1, 7)$ in $\mathbb{R}^3$. Since $x^3 = 2x^1 - x^2$, the third vector is a linear combination of the other two. By Theorem 73, the set $\{x^1, x^2, x^3\}$ is linearly dependent (in the proof we have $k = 3$). It is immediate to check that each of the vectors in the set $\{x^1, x^2, x^3\}$ is also a linear combination of the other two, something that, as the next example shows, does not hold in general for sets of linearly dependent vectors. ▲

Example 75 Consider the vectors $x^1 = (1, 3, 4)$, $x^2 = (2, 6, 8)$, and $x^3 = (2, 5, 1)$ in $\mathbb{R}^3$. Since $x^2 = 2x^1$, the second vector is a multiple (and hence a linear combination) of the first vector. By Theorem 73, the set $\{x^1, x^2, x^3\}$ is linearly dependent (in the proof we have $k = 2$). Note that $x^3$ is not a linear combination of $x^1$ and $x^2$, i.e., there are no $\alpha_1, \alpha_2 \in \mathbb{R}$ such that $x^3 = \alpha_1 x^1 + \alpha_2 x^2$. In conclusion, Theorem 73 ensures that, in a set of linearly dependent vectors, some of them are linear combinations of others, but this is not necessarily the case for all the vectors of the set. For example, this happened for all the vectors in the previous example, but not in this example. ▲
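The difference between Examples 74 and 75 can be made concrete with a short computation. The sketch below relies on a standard fact that goes slightly beyond the text: a vector is a linear combination of some others exactly when adding it to them does not increase the rank (the number of non-zero rows left by Gaussian elimination). The helper names are illustrative.

```python
from fractions import Fraction

def rank(vectors):
    """Number of non-zero rows left after Gaussian elimination (exact arithmetic)."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_combination_of_others(k, vectors):
    """True if vectors[k] is a linear combination of the remaining vectors."""
    others = vectors[:k] + vectors[k + 1:]
    return rank(others) == rank(vectors)

example_74 = [(1, 3, 4), (2, 5, 1), (0, 1, 7)]
example_75 = [(1, 3, 4), (2, 6, 8), (2, 5, 1)]

print([is_combination_of_others(k, example_74) for k in range(3)])  # [True, True, True]
print([is_combination_of_others(k, example_75) for k in range(3)])  # [True, True, False]
```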

The next result is an immediate, but fundamental, consequence of Theorem 73.

Corollary 76 A finite set $S$ of $\mathbb{R}^n$ is linearly independent if and only if none of the vectors in $S$ is a linear combination of other vectors in $S$.

3.4 Generated subspaces


Let $S$ be a set of vectors of $\mathbb{R}^n$ and $\{V_i\}$ be the collection of all the vector subspaces that contain $S$. The collection is non-empty because, trivially, $\mathbb{R}^n$ contains $S$ and it is, therefore, an element of the collection. By Proposition 62, the intersection $\bigcap_i V_i$ of all such subspaces is itself a vector subspace of $\mathbb{R}^n$ that contains $S$. Therefore, $\bigcap_i V_i$ is the smallest (with respect to inclusion) vector subspace of $\mathbb{R}^n$ that contains $S$: for each such subspace $V$, we have $\bigcap_i V_i \subseteq V$.

The vector subspace $\bigcap_i V_i$ is very important and is called the vector subspace generated or spanned by $S$, denoted by $\operatorname{span} S$. In other words, $\operatorname{span} S$ is the smallest “enlargement” of $S$ with the property of being a vector subspace.

The next result shows how span S has a “concrete” representation in terms of linear
combinations of S.

Theorem 77 Let $S$ be a subset of $\mathbb{R}^n$. A vector $x \in \mathbb{R}^n$ belongs to $\operatorname{span} S$ if and only if it is a linear combination of vectors of $S$, i.e., if and only if there exist a finite set $\{x^i\}_{i \in I}$ in $S$ and a set $\{\alpha_i\}_{i \in I}$ of real numbers such that

$$x = \sum_{i \in I} \alpha_i x^i$$

Proof “If”. Let $x \in \mathbb{R}^n$ be a linear combination of a finite set $\{x^i\}_{i \in I}$ of vectors of $S$. For simplicity, set $\{x^i\}_{i \in I} = \{x^1, \ldots, x^k\}$. There exists, therefore, a set $\{\alpha_i\}_{i=1}^k$ of real numbers such that $x = \sum_{i=1}^k \alpha_i x^i$. By the definition of a vector subspace, we have $\alpha_1 x^1 + \alpha_2 x^2 \in \operatorname{span} S$ since $x^1, x^2 \in \operatorname{span} S$. In turn, $\alpha_1 x^1 + \alpha_2 x^2 \in \operatorname{span} S$ implies $\alpha_1 x^1 + \alpha_2 x^2 + \alpha_3 x^3 \in \operatorname{span} S$, and by proceeding in this way we get that $x = \sum_{i=1}^k \alpha_i x^i \in \operatorname{span} S$, as claimed.
“Only if”. Let $V$ be the set of all vectors $x \in \mathbb{R}^n$ that can be expressed as linear combinations of vectors of $S$, that is, $x \in V$ if there exist finite sets $\{x^i\}_{i \in I} \subseteq S$ and $\{\alpha_i\}_{i \in I} \subseteq \mathbb{R}$ such that $x = \sum_{i \in I} \alpha_i x^i$. It is easy to see that $V$ is a vector subspace of $\mathbb{R}^n$ containing $S$. It follows that $\operatorname{span} S \subseteq V$ and so each $x \in \operatorname{span} S$ is a linear combination of vectors of $S$.

Before illustrating the theorem with some examples, we state a simple consequence.

Corollary 78 Let $S$ be a subset of $\mathbb{R}^n$. If $x \in \mathbb{R}^n$ is a linear combination of vectors of $S$, then $\operatorname{span} S = \operatorname{span}(S \cup \{x\})$.

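Example 79 Let $S = \{x^1, \ldots, x^k\} \subseteq \mathbb{R}^n$. By Theorem 77 we have

$$\operatorname{span} S = \left\{x \in \mathbb{R}^n : x = \sum_{i=1}^k \alpha_i x^i \text{ with } \alpha_i \in \mathbb{R} \text{ for every } i = 1, \ldots, k\right\} = \left\{\sum_{i=1}^k \alpha_i x^i : \alpha_i \in \mathbb{R} \text{ for every } i = 1, \ldots, k\right\}$$

Example 80 Let $S = \{(1, 0, 0), (0, 1, 0), (0, 0, 1)\} \subseteq \mathbb{R}^3$. We have

$$\operatorname{span} S = \{x \in \mathbb{R}^3 : x = \alpha_1 (1, 0, 0) + \alpha_2 (0, 1, 0) + \alpha_3 (0, 0, 1) \text{ with } \alpha_i \in \mathbb{R} \text{ for every } i = 1, 2, 3\} = \{(\alpha_1, \alpha_2, \alpha_3) : \alpha_i \in \mathbb{R} \text{ for every } i = 1, 2, 3\} = \mathbb{R}^3$$

More generally, let $S = \{e^1, \ldots, e^n\} \subseteq \mathbb{R}^n$. We have

$$\operatorname{span} S = \left\{x \in \mathbb{R}^n : x = \sum_{i=1}^n \alpha_i e^i \text{ with } \alpha_i \in \mathbb{R} \text{ for every } i = 1, \ldots, n\right\} = \{(\alpha_1, \ldots, \alpha_n) : \alpha_i \in \mathbb{R} \text{ for every } i = 1, \ldots, n\} = \mathbb{R}^n$$

Example 81 If $S = \{x\}$, then $\operatorname{span} S = \{\alpha x : \alpha \in \mathbb{R}\}$. For example, let $x = (2, 3) \in \mathbb{R}^2$. We have

$$\operatorname{span} S = \{(2\alpha, 3\alpha) : \alpha \in \mathbb{R}\}$$

i.e., $\operatorname{span} S$ is nothing but the graph of the straight line

$$y = \frac{3}{2} x$$

passing through the origin and the point $x$.

[Figure: the straight line $y = \frac{3}{2}x$ through the origin $O$ and the point $x = (2, 3)$.]

▲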

3.5 Bases
By Theorem 77, the subspace generated by a subset $S$ of $\mathbb{R}^n$ is formed by all the linear combinations of the vectors in $S$. Suppose that $S$ is a linearly dependent set. By Theorem 73, some vectors in $S$ are linear combinations of other elements of $S$. By Corollary 78, such vectors are, therefore, redundant for the generation of $\operatorname{span} S$. Indeed, if a vector $x \in S$ is a linear combination of other vectors of $S$, then by Corollary 78 we have

$$\operatorname{span} S = \operatorname{span}(S - \{x\})$$

where $S - \{x\}$ is the set $S$ without the vector $x$.

A linearly dependent set $S$ thus contains some elements that are redundant for the generation of $\operatorname{span} S$. This does not happen if, on the contrary, $S$ is a linearly independent set: by Corollary 76, no vector of $S$ can then be a linear combination of other elements of $S$. In other words, when $S$ is linearly independent, all its vectors are essential for the generation of $\operatorname{span} S$.
These observations lead us to introduce the notion of basis.

Definition 82 A finite subset $S$ of $\mathbb{R}^n$ is a basis of $\mathbb{R}^n$ if $S$ is a linearly independent set such that $\operatorname{span} S = \mathbb{R}^n$.

If $S$ is a basis of $\mathbb{R}^n$, it therefore holds that:

1. each $x \in \mathbb{R}^n$ can be represented as a linear combination of vectors in $S$;

2. all the vectors of $S$ are essential for this representation: none of them is redundant.

Such “essentiality” of a basis in representing, as linear combinations, the elements of $\mathbb{R}^n$ is evident in the following result.

Theorem 83 A finite subset $S$ of $\mathbb{R}^n$ is a basis of $\mathbb{R}^n$ if and only if each $x \in \mathbb{R}^n$ can be written in only one way as a linear combination of vectors in $S$.

Proof “Only if”. Let $S = \{x^1, \ldots, x^m\}$ be a basis of $\mathbb{R}^n$. By definition, each vector $x \in \mathbb{R}^n$ can be represented as a linear combination of elements of $S$. Given $x \in \mathbb{R}^n$, suppose that there exist two sets of scalars $\{\alpha_i\}_{i=1}^m$ and $\{\beta_i\}_{i=1}^m$ such that

$$x = \sum_{i=1}^m \alpha_i x^i = \sum_{i=1}^m \beta_i x^i$$

Hence,

$$\sum_{i=1}^m (\alpha_i - \beta_i) x^i = 0$$

and, since the vectors in $S$ are linearly independent, it follows that $\alpha_i - \beta_i = 0$ for every $i = 1, \ldots, m$; that is, $\alpha_i = \beta_i$ for every $i = 1, \ldots, m$.
“If”. Let $S = \{x^1, \ldots, x^m\}$ and suppose that each $x \in \mathbb{R}^n$ can be written in a unique way as a linear combination of vectors in $S$. Clearly, by Theorem 77, we have $\mathbb{R}^n = \operatorname{span} S$. It remains to prove that $S$ is a linearly independent set. Suppose that the scalars $\{\alpha_i\}_{i=1}^m$ are such that

$$\sum_{i=1}^m \alpha_i x^i = 0$$

Since we also have

$$\sum_{i=1}^m 0 x^i = 0$$

we conclude that $\alpha_i = 0$ for every $i = 1, \ldots, m$ since, by hypothesis, the vector $0$ can be written in only one way as a linear combination of vectors in $S$.

Example 84 The canonical basis of $\mathbb{R}^n$ is given by the vectors $\{e^1, \ldots, e^n\}$. Each $x \in \mathbb{R}^n$ can be written, in a unique way, as a linear combination of these vectors. In particular,

$$x = x_1 e^1 + \cdots + x_n e^n = \sum_{i=1}^n x_i e^i \tag{3.4}$$

that is, the coefficients of the linear combination are the components of the vector $x$. ▲

Example 85 The canonical basis of $\mathbb{R}^2$ is $\{(1, 0), (0, 1)\}$. But there exist infinitely many other bases of $\mathbb{R}^2$: for example, $S = \{(1, 2), (0, 7)\}$ is another such basis. It is easy to prove the linear independence of $S$. To show that $\operatorname{span} S = \mathbb{R}^2$, consider any vector $x = (x_1, x_2) \in \mathbb{R}^2$. We need to show that there exist $\alpha_1, \alpha_2 \in \mathbb{R}$ such that

$$(x_1, x_2) = \alpha_1 (1, 2) + \alpha_2 (0, 7)$$

i.e., that solve the simple linear system

$$\begin{cases} \alpha_1 = x_1 \\ 2\alpha_1 + 7\alpha_2 = x_2 \end{cases}$$

Since

$$\alpha_1 = x_1, \quad \alpha_2 = \frac{x_2 - 2x_1}{7}$$

solve the system, we conclude that $S$ is indeed a basis of $\mathbb{R}^2$. ▲
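The formulas of Example 85 give the coordinates of any vector of $\mathbb{R}^2$ with respect to the basis $S = \{(1, 2), (0, 7)\}$. The following sketch simply evaluates them and recombines the result as a sanity check; the function names `coordinates` and `recombine` are illustrative.

```python
from fractions import Fraction

def coordinates(x):
    """Coefficients (a1, a2) such that x = a1 * (1, 2) + a2 * (0, 7)."""
    x1, x2 = Fraction(x[0]), Fraction(x[1])
    a1 = x1
    a2 = (x2 - 2 * x1) / 7
    return a1, a2

def recombine(a1, a2):
    """The vector a1 * (1, 2) + a2 * (0, 7)."""
    return (a1 * 1 + a2 * 0, a1 * 2 + a2 * 7)

a1, a2 = coordinates((3, -1))
print(a1, a2)                        # 3 -1
print(recombine(a1, a2) == (3, -1))  # True
```

By Theorem 83, these coefficients are the only ones that represent the given vector in terms of $S$.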

Each vector of $\mathbb{R}^n$ can be expressed (“recovered”) as a linear combination of the vectors of a basis of $\mathbb{R}^n$. In a sense, a basis is therefore the “genetic code” for a vector space, containing all the pieces of information necessary to identify its elements. Since there are several bases of $\mathbb{R}^n$, such pieces of “genetic” information can be enclosed in different sets of vectors. It is therefore important to understand how the different bases are related. This will become clear after the next theorem, whose remarkable implications make it the deus ex machina of the chapter.

Theorem 86 For each linearly independent set $\{x^1, \ldots, x^k\}$ of $\mathbb{R}^n$ with $k \le n$, there exist $n - k$ vectors $\{x^{k+1}, \ldots, x^n\}$ such that the overall set $\{x^i\}_{i=1}^n$ is a basis of $\mathbb{R}^n$.

Due to its importance, we give two different proofs of the result. Both proofs require the following lemma.

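Lemma 87 Let $\{b^1, \ldots, b^n\}$ be a basis of $\mathbb{R}^n$. If

$$x = c_1 b^1 + \cdots + c_n b^n$$

with $c_i \neq 0$, then $\{b^1, \ldots, b^{i-1}, x, b^{i+1}, \ldots, b^n\}$ is a basis of $\mathbb{R}^n$.

Proof Without loss of generality suppose that $c_1 \neq 0$. We prove that $\{x, b^2, \ldots, b^n\}$ is a basis of $\mathbb{R}^n$. As $c_1 \neq 0$, we can write

$$b^1 = \frac{1}{c_1} x - \frac{c_2}{c_1} b^2 - \cdots - \frac{c_n}{c_1} b^n$$

and, therefore, for each choice of the coefficients $\{\alpha_i\}_{i=1}^n \subseteq \mathbb{R}$ we have

$$\sum_{i=1}^n \alpha_i b^i = \sum_{i=2}^n \alpha_i b^i + \alpha_1 \left[\frac{1}{c_1} x - \sum_{i=2}^n \frac{c_i}{c_1} b^i\right] = \frac{\alpha_1}{c_1} x + \sum_{i=2}^n \left(\alpha_i - \frac{\alpha_1 c_i}{c_1}\right) b^i$$

It follows that

$$\operatorname{span}\{x, b^2, \ldots, b^n\} = \operatorname{span}\{b^1, b^2, \ldots, b^n\} = \mathbb{R}^n$$

It remains to show that the set $\{x, b^2, \ldots, b^n\}$ is linearly independent, so that we can conclude that it is a basis of $\mathbb{R}^n$. Let $\{\beta_i\}_{i=1}^n \subseteq \mathbb{R}$ be coefficients for which

$$\beta_1 x + \sum_{i=2}^n \beta_i b^i = 0 \tag{3.5}$$

If $\beta_1 \neq 0$, we have

$$x = \sum_{i=2}^n \left(-\frac{\beta_i}{\beta_1}\right) b^i = 0 b^1 + \sum_{i=2}^n \left(-\frac{\beta_i}{\beta_1}\right) b^i$$

Since $x$ can be written in a unique way as a linear combination of the vectors of the basis $\{b^i\}_{i=1}^n$, one gets that $c_1 = 0$, which contradicts the hypothesis $c_1 \neq 0$. This means that $\beta_1 = 0$ and (3.5) simplifies to

$$0 b^1 + \sum_{i=2}^n \beta_i b^i = 0$$

Since $\{b^1, \ldots, b^n\}$ is a basis, one obtains $\beta_2 = \cdots = \beta_n = 0 = \beta_1$.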

Proof 1 of Theorem 86 We proceed by induction.⁵ We start, therefore, with $k = 1$, i.e., with a singleton $\{x^1\}$. We want to show that there exist $n - 1$ vectors that, added to $x^1$, yield a basis of $\mathbb{R}^n$. Let $\{y^1, \ldots, y^n\}$ be a basis of $n$ elements of $\mathbb{R}^n$ (for example, the one formed by the standard unit vectors). There exist coefficients $\{\alpha_i\}_{i=1}^n \subseteq \mathbb{R}$ such that

$$x^1 = \sum_{i=1}^n \alpha_i y^i \tag{3.6}$$

Since $x^1 \neq 0$, not all $\alpha_i$ are zero (why is $x^1 \neq 0$?). Suppose, for example, that $\alpha_1 \neq 0$. Then, by Lemma 87, $\{x^1, y^2, \ldots, y^n\}$ is a basis of $\mathbb{R}^n$. The case $k = 1$ is thus proved.

⁵ See Appendix D for the induction principle.

Suppose now that the statement of the theorem is true for each set of $k - 1$ vectors; we want to show that it is true for each set of $k$ vectors. Let therefore $\{x^1, \ldots, x^k\}$ be a set of $k$ linearly independent vectors. The subset $\{x^1, \ldots, x^{k-1}\}$ is linearly independent and has $k - 1$ elements. By the induction hypothesis, there exist $n - (k - 1)$ vectors $\{\tilde{y}^k, \ldots, \tilde{y}^n\}$ such that $\{x^1, \ldots, x^{k-1}, \tilde{y}^k, \ldots, \tilde{y}^n\}$ is a basis of $\mathbb{R}^n$. Therefore, there exist coefficients $\{\alpha_i\}_{i=1}^n \subseteq \mathbb{R}$ such that

$$x^k = \sum_{i=1}^{k-1} \alpha_i x^i + \sum_{i=k}^n \alpha_i \tilde{y}^i \tag{3.7}$$

As the vectors $\{x^1, \ldots, x^{k-1}, x^k\}$ are linearly independent, at least one of the coefficients $\{\alpha_i\}_{i=k}^n$ is different from zero. Otherwise we would have $x^k = \sum_{i=1}^{k-1} \alpha_i x^i$ and the vector $x^k$ would be a linear combination of the vectors $\{x^1, \ldots, x^{k-1}\}$, something that by Corollary 76 cannot happen. Let, for example, $\alpha_k \neq 0$. Then, by Lemma 87, $\{x^1, \ldots, x^k, \tilde{y}^{k+1}, \ldots, \tilde{y}^n\}$ is a basis of $\mathbb{R}^n$. This completes the induction.

Proof 2 of Theorem 86 The theorem holds for $k = 1$. Indeed, consider a singleton $\{x\}$,⁶ with $x \neq 0$, and the canonical basis $\{e^1, \ldots, e^n\}$ of $\mathbb{R}^n$. As $x = \sum_{i=1}^n x_i e^i$, there exists at least one index $i$ such that $x_i \neq 0$. By Lemma 87, $\{e^1, \ldots, e^{i-1}, x, e^{i+1}, \ldots, e^n\}$ is a basis of $\mathbb{R}^n$.
Since the statement holds for $k = 1$, suppose, by contradiction, that it fails for some $k$, and let $1 < k \le n$ be the smallest integer for which the property is false. Then there exists a linearly independent set $\{x^1, \ldots, x^k\}$ such that there do not exist $n - k$ vectors of $\mathbb{R}^n$ that, added to $\{x^1, \ldots, x^k\}$, yield a basis of $\mathbb{R}^n$. Given that $\{x^1, \ldots, x^{k-1}\}$ is, in turn, linearly independent, the minimality of $k$ implies that there are vectors $\tilde{x}^k, \ldots, \tilde{x}^n$ such that $\{x^1, \ldots, x^{k-1}, \tilde{x}^k, \ldots, \tilde{x}^n\}$ is a basis of $\mathbb{R}^n$. But then

$$x^k = c_1 x^1 + \cdots + c_{k-1} x^{k-1} + c_k \tilde{x}^k + \cdots + c_n \tilde{x}^n$$

Given that $\{x^1, \ldots, x^k\}$ is linearly independent, one cannot have $c_k = \cdots = c_n = 0$ and, therefore, $c_j \neq 0$ for some index $j \in \{k, \ldots, n\}$. By Lemma 87,

$$\{x^1, \ldots, x^{k-1}, \tilde{x}^k, \ldots, \tilde{x}^{j-1}, x^k, \tilde{x}^{j+1}, \ldots, \tilde{x}^n\}$$

is a basis of $\mathbb{R}^n$. Since it consists of $x^1, \ldots, x^k$ together with $n - k$ further vectors, this is a contradiction.
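Theorem 86 is an existence result, but a completion can also be computed. The sketch below does not follow the exchange argument of the two proofs; it uses a simpler greedy procedure, stated here as an assumption: run through the standard unit vectors and keep each one that is not a linear combination of the vectors collected so far (detected by an increase in the rank). The input is assumed to be linearly independent, and the function names are illustrative.

```python
from fractions import Fraction

def rank(vectors):
    """Number of non-zero rows left after Gaussian elimination (exact arithmetic)."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def complete_to_basis(vectors, n):
    """Extend a linearly independent list of vectors of R^n to a basis of R^n."""
    basis = [tuple(v) for v in vectors]
    for i in range(n):
        e = tuple(1 if j == i else 0 for j in range(n))  # standard unit vector
        if rank(basis + [e]) > len(basis):               # e is not redundant: keep it
            basis.append(e)
    return basis

print(complete_to_basis([(1, 2, 3)], 3))
# [(1, 2, 3), (1, 0, 0), (0, 1, 0)]
```

The greedy choice depends on the order in which the unit vectors are tried, so the completion it returns is just one of infinitely many possible ones.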

The next result is a simple, but important, consequence of Theorem 86.

Corollary 88 (i) Each linearly independent set of $\mathbb{R}^n$ with $n$ elements is a basis of $\mathbb{R}^n$.

(ii) Each linearly independent set of $\mathbb{R}^n$ has at most $n$ elements.

Proof (i) It is enough to set $k = n$ in Theorem 86. (ii) Let $S = \{x^1, \ldots, x^k\}$ be a linearly independent set in $\mathbb{R}^n$. We want to show that $k \le n$. By contradiction, suppose $k > n$. Then, $\{x^1, \ldots, x^n\}$ is in turn a linearly independent set and by assertion (i) is a basis of $\mathbb{R}^n$. Hence, the vectors $\{x^{n+1}, \ldots, x^k\}$ are linear combinations of the vectors $\{x^1, \ldots, x^n\}$, which, by Corollary 76, contradicts the linear independence of the vectors $\{x^1, \ldots, x^k\}$. Therefore, $k \le n$, which completes the proof.
⁶ Note that a singleton $\{x\}$ is linearly independent when $\alpha x = 0$ implies $\alpha = 0$, which is equivalent to requiring $x \neq 0$.

Example 89 By assertion (i), any two linearly independent vectors form a basis of $\mathbb{R}^2$. Going back to Example 85, it is therefore sufficient to verify that the vectors $(1, 2)$ and $(0, 7)$ are linearly independent to conclude that $S = \{(1, 2), (0, 7)\}$ is a basis of $\mathbb{R}^2$. ▲

We can finally state the main result of the section.

Theorem 90 All bases of $\mathbb{R}^n$ have the same number $n$ of elements.

In other words, although the “genetic” information of $\mathbb{R}^n$ can be encoded in different sets of vectors (that is, in different bases), such sets have the same (and finite) number of elements, that is, the same “length”. The number $n$ can, therefore, be seen as the dimension of the space $\mathbb{R}^n$; indeed, it is natural to think that the “greater” a space $\mathbb{R}^n$ is, the more elements its bases have, that is, the greater is the quantity of information needed to specify its elements.
In conclusion, the number $n$ that emerges from Theorem 90 indicates the “dimension” of $\mathbb{R}^n$ and, in a sense, justifies its superscript $n$. In particular, this notion of dimension makes rigorous the intuitive idea that $\mathbb{R}^n$ is a larger space than $\mathbb{R}^m$ when $m < n$. It is larger because more information, i.e., bases of larger cardinality, is needed in order to specify its elements.

Proof Recall that $\mathbb{R}^n$ has a basis of $n$ elements (for example, the canonical basis). By item (ii) of Corollary 88, every other basis of $\mathbb{R}^n$ can have at most $n$ elements. Let $\{x^1, \ldots, x^k\}$ be any other basis of $\mathbb{R}^n$. We show that one cannot have $k < n$, and so conclude that $k = n$. Suppose that $k < n$. By Theorem 86, there exist $n - k$ vectors $\{x^{k+1}, \ldots, x^n\}$ such that the set $\{x^1, \ldots, x^k, x^{k+1}, \ldots, x^n\}$ is a basis of $\mathbb{R}^n$. This, however, contradicts the assumption that $\{x^1, \ldots, x^k\}$ is a basis of $\mathbb{R}^n$, because the vectors $\{x^{k+1}, \ldots, x^n\}$ are not linear combinations of the vectors $\{x^1, \ldots, x^k\}$: $\{x^1, \ldots, x^n\}$ is a linearly independent set. Therefore $k = n$.

3.6 Bases of subspaces


The notions introduced in the previous section for $\mathbb{R}^n$ extend in a natural way to its vector subspaces $V$: we are interested in finite subsets that contain all the essential information.

Definition 91 Let $V$ be a vector subspace of $\mathbb{R}^n$. A finite subset $S$ of $V$ is a basis of $V$ if $S$ is a linearly independent set such that $\operatorname{span} S = V$.

Bases of vector subspaces, too, make it possible to represent each vector of the subspace as a linear combination of basis elements, and this representation is essential, without redundancies.
The results of the previous section can be easily generalized.⁷ We start with Theorem 83.

Theorem 92 Let $V$ be a vector subspace of $\mathbb{R}^n$. A finite subset $S$ of $V$ is a basis of $V$ if and only if each $x \in V$ can be written in a unique way as a linear combination of vectors in $S$.
⁷ We leave to the reader the proofs of the results of this section because they are similar to those of the last section.

Example 93 (i) The horizontal axis $M = \{x \in \mathbb{R}^2 : x_2 = 0\}$ is a vector subspace of $\mathbb{R}^2$. The singleton $\{e^1\} \subseteq M$ is a basis. (ii) The plane through the origin $M = \{x \in \mathbb{R}^3 : x_3 = 0\}$ is a vector subspace of $\mathbb{R}^3$. The set $\{e^1, e^2\} \subseteq M$ is a basis. ▲

Since $V$ is a subset of $\mathbb{R}^n$, it will have at most $n$ linearly independent vectors. In particular, the following generalization of Theorem 86 holds.

Theorem 94 Let $V$ be a vector subspace of $\mathbb{R}^n$ with a basis of $m \le n$ elements. For each linearly independent set of vectors $\{v^1, \ldots, v^k\}$ of $V$, with $k \le m$, there exist $m - k$ vectors $\{v^{k+1}, \ldots, v^m\}$ such that the set $\{v^i\}_{i=1}^m$ is a basis of $V$.

In turn, Theorem 94 leads to the following extension of Theorem 90.

Theorem 95 All bases of a vector subspace of $\mathbb{R}^n$ have the same number of elements.

Although in view of Theorem 90 the result is not surprising, it remains of great elegance because it shows how, despite their diversity, the bases share a fundamental characteristic, their cardinality. This motivates the next definition, which was implicit in the discussion that followed Theorem 90.

Definition 96 The dimension of a vector subspace $V$ of $\mathbb{R}^n$ is the number of elements of any basis of $V$.

By Theorem 95, this number is unique, and is denoted by $\dim V$. It is indeed the notion of dimension that makes this (otherwise routine) section interesting, as the next examples show.

Example 97 In the special case $V = \mathbb{R}^n$ we have $\dim \mathbb{R}^n = n$, which makes rigorous the discussion that followed Theorem 90. ▲

Example 98 (i) The horizontal axis is a vector subspace of dimension one of $\mathbb{R}^2$. (ii) The plane $M = \{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : x_1 = 0\}$ is a vector subspace of dimension two of $\mathbb{R}^3$, that is, $\dim M = 2$. ▲

Example 99 If $V = \{0\}$, that is, if $V$ is the trivial vector subspace formed only by the origin $0$, we set $\dim V = 0$. Indeed, $V$ does not contain linearly independent vectors (why?) and, therefore, it has as basis the empty set $\emptyset$. ▲
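The dimension of a subspace given as $\operatorname{span} S$, with $S$ finite, can be computed by discarding the redundant vectors of $S$, in the spirit of the discussion that opens Section 3.5. The sketch below is one way to do so, under the assumption (a standard fact, not proved in the text) that a vector is redundant exactly when adding it does not increase the rank of the vectors already kept. The names `extract_basis` and `dim_span` are illustrative.

```python
from fractions import Fraction

def rank(vectors):
    """Number of non-zero rows left after Gaussian elimination (exact arithmetic)."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extract_basis(vectors):
    """Keep, in order, the vectors that are not combinations of those already kept."""
    basis = []
    for v in vectors:
        if rank(basis + [v]) > len(basis):
            basis.append(v)
    return basis

def dim_span(vectors):
    """Dimension of the subspace generated by the given vectors."""
    return len(extract_basis(vectors))

print(extract_basis([(1, 3, 4), (2, 6, 8), (2, 5, 1)]))  # [(1, 3, 4), (2, 5, 1)]
print(dim_span([(1, 3, 4), (2, 6, 8), (2, 5, 1)]))       # 2
print(dim_span([(0, 0, 0)]))                             # 0
```

The last line agrees with Example 99: the trivial subspace has the empty set as basis and dimension zero.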
Chapter 4

Euclidean structure

4.1 Absolute value and norm


4.1.1 Inner product
The operations of addition and multiplication by scalars and their properties determine the linear structure of $\mathbb{R}^n$. The operation of inner product and its properties characterize, instead, the Euclidean structure of $\mathbb{R}^n$, which we will study in this chapter.
Recall from Section 2.2 that the inner product $x \cdot y$ of two vectors in $\mathbb{R}^n$ is defined as

$$x \cdot y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n = \sum_{i=1}^n x_i y_i$$

and that it is commutative, $x \cdot y = y \cdot x$, and distributive, $(\alpha x + \beta y) \cdot z = \alpha (x \cdot z) + \beta (y \cdot z)$.

Note, moreover, that

$$x \cdot x = \sum_{i=1}^n x_i^2 \ge 0$$

The sum of the squares of the coordinates of a vector is nothing but the inner product of the vector with itself. This simple observation will be central in this chapter because it will allow us to define the fundamental notion of norm using the inner product. In this regard, note that $x \cdot x = 0$ if and only if $x = 0$: the sum of squares is zero if and only if all the terms are zero.

Before studying the norm we introduce the absolute value, which is the scalar version of the norm and is probably already familiar to the reader.

4.1.2 Absolute value


The absolute value $|x|$ of a number $x \in \mathbb{R}$ is

$$|x| = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{if } x < 0 \end{cases}$$

For example, $|5| = |-5| = 5$. The absolute value satisfies the following elementary properties that the reader can verify:

(i) $|x| \ge 0$ for every $x \in \mathbb{R}$;

(ii) $|x| = 0$ if and only if $x = 0$;

(iii) $|xy| = |x| |y|$ for every $x, y \in \mathbb{R}$;

(iv) $|x + y| \le |x| + |y|$ for every $x, y \in \mathbb{R}$.

Property (iv) is called the triangle inequality. Another property of the absolute value is

$$|x| < c \iff -c < x < c \qquad \forall c > 0 \tag{4.1}$$

We leave to the reader the verification of this simple property, which will be very important in the next chapter.

Recall that we have already agreed to consider only the positive root $\sqrt{x}$ of a positive scalar $x$ (the so-called “arithmetical root”): for example, $\sqrt{25} = 5$. Formally, this is equivalent to taking

$$\sqrt{x^2} = |x| \qquad \forall x \in \mathbb{R} \tag{4.2}$$

as the reader is invited to verify.

4.1.3 Norm
The notion of norm generalizes that of absolute value to $\mathbb{R}^n$. In particular, the (Euclidean) norm of a vector $x \in \mathbb{R}^n$, denoted by $\|x\|$, is given by

$$\|x\| = (x \cdot x)^{\frac{1}{2}} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$

When $n = 1$, the norm reduces to the absolute value; indeed, thanks to (4.2) we have

$$\|x\| = \sqrt{x^2} = |x| \qquad \forall x \in \mathbb{R}$$

For example, if $x = -4$ we have $\|x\| = \sqrt{(-4)^2} = \sqrt{16} = 4 = |-4| = |x|$.

Geometrically, the norm of a vector is nothing but the length of the segment that joins it with the origin, that is, its distance from the origin. For $n = 2$ this length, by Pythagoras’ Theorem, is exactly $\|x\| = \sqrt{x_1^2 + x_2^2}$.

A similar geometric interpretation holds for $n = 3$, but is obviously lost for $n \ge 4$.


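Example 100 (i) if $x = (1, -1) \in \mathbb{R}^2$, then $\|x\| = \sqrt{1^2 + (-1)^2} = \sqrt{2}$;

(ii) if $x = (a, a^2) \in \mathbb{R}^2$, with $a \in \mathbb{R}$, then

$$\|x\| = \sqrt{a^2 + (a^2)^2} = \sqrt{a^2 + a^4} = |a| \sqrt{1 + a^2}$$

(iii) if $x = (a, 2a, -a) \in \mathbb{R}^3$, then $\|x\| = \sqrt{a^2 + (2a)^2 + (-a)^2} = |a| \sqrt{6}$;

(iv) if $x = (2, \pi, \sqrt{2}, 3) \in \mathbb{R}^4$, then

$$\|x\| = \sqrt{2^2 + \pi^2 + \left(\sqrt{2}\right)^2 + 3^2} = \sqrt{4 + \pi^2 + 2 + 9} = \sqrt{15 + \pi^2}$$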
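The closed forms of Example 100 can be compared with a direct numerical evaluation of the definition. A minimal sketch, in which `norm` is an illustrative helper and the value $a = -3$ is an arbitrary choice:

```python
import math

def norm(x):
    """Euclidean norm: square root of the sum of the squared components."""
    return math.sqrt(sum(xi * xi for xi in x))

a = -3.0  # arbitrary test value, negative on purpose to exercise |a|

print(math.isclose(norm((1, -1)), math.sqrt(2)))                        # True
print(math.isclose(norm((a, a ** 2)), abs(a) * math.sqrt(1 + a ** 2)))  # True
print(math.isclose(norm((a, 2 * a, -a)), abs(a) * math.sqrt(6)))        # True
print(norm((-4,)) == abs(-4))                                           # True: n = 1
```

Comparisons use `math.isclose` because floating-point square roots are only approximate; the last line is exact since $\sqrt{16} = 4$ is representable.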

The norm satisfies some elementary properties that extend to $\mathbb{R}^n$ those seen for the absolute value. The next result gathers the simplest ones.

Proposition 101 Let $x, y$ be arbitrary vectors in $\mathbb{R}^n$ and $\alpha \in \mathbb{R}$. Then:

(i) $\|x\| \ge 0$;

(ii) $\|x\| = 0$ if and only if $x = 0$;

(iii) $\|\alpha x\| = |\alpha| \|x\|$.

Proof We verify (ii), leaving the rest to the reader. If $x = 0 = (0, 0, \ldots, 0)$, then $\|x\| = \sqrt{0 + 0 + \cdots + 0} = 0$; vice versa, if $\|x\| = 0$, we have

$$0 = \|x\|^2 = x_1^2 + x_2^2 + \cdots + x_n^2 \tag{4.3}$$

Since $x_i^2 \ge 0$ for each $i = 1, 2, \ldots, n$, from (4.3) it follows that $x_i^2 = 0$ for each $i = 1, 2, \ldots, n$, since a sum of non-negative numbers is zero only if they are all zero. Hence $x_i = 0$ for each $i$, that is, $x = 0$.

Property (iii) extends the property $|xy| = |x| |y|$ of the absolute value. We now state the famous Cauchy-Schwarz inequality, which is a different, and more subtle, extension of this property.

Proposition 102 (Cauchy-Schwarz) For every $x, y \in \mathbb{R}^n$,

$$|x \cdot y| \le \|x\| \|y\| \tag{4.4}$$

Equality holds if and only if the vectors $x$ and $y$ are collinear.¹

¹ Recall that two vectors are said to be collinear if they are linearly dependent (Example 67).

Proof Let $x, y \in \mathbb{R}^n$ be two generic vectors. If either $x = 0$ or $y = 0$, we have $|x \cdot y| = 0 = \|x\| \|y\|$. Moreover, the two vectors are trivially collinear, consistently with the fact that in (4.4) we have equality. Let us assume, thus, that both vectors are different from $0$. We observe that $(x + ty) \cdot (x + ty) = \|x + ty\|^2 \ge 0$ for all $t \in \mathbb{R}$, therefore

$$0 \le (x + ty) \cdot (x + ty) = x \cdot x + 2t (x \cdot y) + t^2 (y \cdot y) = at^2 + bt + c$$

where $a = y \cdot y$, $b = 2 (x \cdot y)$ and $c = x \cdot x$. It is well known in analytic geometry that a parabola $at^2 + bt + c$ with $a > 0$ is greater than or equal to $0$ everywhere only if its discriminant is smaller than or equal to $0$, i.e., only if $\Delta = b^2 - 4ac \le 0$. Therefore

$$0 \ge \Delta = b^2 - 4ac = 4 (x \cdot y)^2 - 4 (x \cdot x)(y \cdot y) = 4 \left[(x \cdot y)^2 - \|x\|^2 \|y\|^2\right] \tag{4.5}$$

Whence

$$(x \cdot y)^2 \le \|x\|^2 \|y\|^2$$

and, by taking square roots of both sides, we obtain the inequality (4.4). It remains to prove that equality holds if and only if the vectors $x$ and $y$ are collinear.
“Only if”. Let us assume that (4.4) holds as equality. Then, by (4.5), it follows that $\Delta = 0$. Thus, there exists a point $\hat{t}$ where the parabola $at^2 + bt + c$ takes the value $0$, i.e.,

$$0 = (x + \hat{t} y) \cdot (x + \hat{t} y) = \|x + \hat{t} y\|^2$$

By Proposition 101, this implies that $x + \hat{t} y = 0$, i.e., $x = -\hat{t} y$.

“If”. If $x$ and $y$ are collinear, then $x = -\hat{t} y$ for some $\hat{t}$. Then, $0 = 0 \cdot 0 = (x + \hat{t} y) \cdot (x + \hat{t} y)$. This implies that the parabola $at^2 + bt + c$, besides being always non-negative, takes the value $0$ at the point $\hat{t}$, and thus the discriminant must be zero. By (4.5), we deduce that (4.4) holds as equality.
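A numerical illustration of Proposition 102 may help fix ideas. The sketch below computes the difference $\|x\| \|y\| - |x \cdot y|$, which (4.4) asserts to be non-negative and which vanishes for collinear vectors; the helper names and the test vectors are illustrative choices.

```python
import math

def dot(x, y):
    """Inner product of two vectors of the same length."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean norm."""
    return math.sqrt(dot(x, x))

def cauchy_schwarz_gap(x, y):
    """The quantity ||x|| ||y|| - |x . y|, non-negative by (4.4)."""
    return norm(x) * norm(y) - abs(dot(x, y))

x = (1, 2, 3)
print(cauchy_schwarz_gap(x, (4, -5, 6)) > 0)                # True: not collinear
print(abs(cauchy_schwarz_gap(x, (-2, -4, -6))) < 1e-9)      # True: y = -2x
```

In the collinear case the gap is compared with a small tolerance rather than with zero, because the square roots are computed in floating point.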

The Cauchy-Schwarz inequality allows us to prove the triangle inequality, thereby completing
the extension to the norm of properties (i)–(iv) of the absolute value.

Corollary 103 For every $x, y \in \mathbb{R}^n$, it holds that

\[ \|x + y\| \le \|x\| + \|y\| \tag{4.6} \]

Inequality (4.6) is the triangle inequality for the norm.

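Proof Squaring both sides, (4.6) becomes

\[ \|x + y\|^2 \le \|x\|^2 + \|y\|^2 + 2 \|x\| \|y\| \]

that is

\[ \sum_{i=1}^n (x_i + y_i)^2 \le \sum_{i=1}^n x_i^2 + \sum_{i=1}^n y_i^2 + 2 \left( \sum_{i=1}^n x_i^2 \right)^{\frac{1}{2}} \left( \sum_{i=1}^n y_i^2 \right)^{\frac{1}{2}} \]

Hence, simplifying,

\[ \sum_{i=1}^n x_i y_i \le \left( \sum_{i=1}^n x_i^2 \right)^{\frac{1}{2}} \left( \sum_{i=1}^n y_i^2 \right)^{\frac{1}{2}} \]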
4.1. ABSOLUTE VALUE AND NORM 79

which holds thanks to the Cauchy-Schwarz inequality.


A vector of unit norm is also called a versor. In the figure the vectors $(\sqrt{2}/2, \sqrt{2}/2)$ and
$(\sqrt{3}/2, 1/2)$ are two versors in $\mathbb{R}^2$:

[Figure: two versors $x$ and $y$ in the plane]

Note that, for each $x \ne 0$, the vector

\[ v = \frac{1}{\|x\|} x \]

is a versor: it is sufficient to "normalize" $x$ by dividing it by its length. In particular,

\[ e^1 = (1, 0, 0, \dots, 0), \quad e^2 = (0, 1, 0, \dots, 0), \quad \dots, \quad e^n = (0, 0, \dots, 0, 1) \]

are the standard (or canonical) versors of $\mathbb{R}^n$ introduced in Chapter 3. To see their special
status, note that in $\mathbb{R}^2$ they are
\[ e^1 = (1, 0) \quad \text{and} \quad e^2 = (0, 1) \]
and lie on the horizontal and on the vertical axes, respectively. In particular, the four versors
$\pm e^1, \pm e^2$ are the versors that belong to the Cartesian axes of $\mathbb{R}^2$:

[Figure: the four versors $+e^1$, $-e^1$, $+e^2$, $-e^2$ on the Cartesian axes of $\mathbb{R}^2$]

In $\mathbb{R}^3$ the standard versors are

\[ e^1 = (1, 0, 0), \quad e^2 = (0, 1, 0) \quad \text{and} \quad e^3 = (0, 0, 1) \]

In this case, too, the six versors $\pm e^1, \pm e^2, \pm e^3$ are the versors that belong to the Cartesian
axes of $\mathbb{R}^3$.
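
The normalization $v = x / \|x\|$ is easy to try out on a computer. The sketch below is purely illustrative: the function names `norm` and `versor` and the sample vector are our own assumptions, not notation from the text.

```python
import math

def norm(x):
    """Euclidean norm of a vector given as a tuple of numbers."""
    return math.sqrt(sum(c * c for c in x))

def versor(x):
    """Return x / ||x||; the zero vector cannot be normalized."""
    length = norm(x)
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(c / length for c in x)

v = versor((3.0, 4.0))   # ||(3, 4)|| = 5
print(v)                 # (0.6, 0.8)
print(norm(v))           # 1 up to rounding
```

The explicit check for the zero vector mirrors the requirement $x \ne 0$ in the text.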

4.2 Orthogonality
Appendix B.3 shows how two vectors $x$ and $y$ of the plane can be seen to be perpendicular
when their inner product is zero, i.e., $x \cdot y = 0$. This suggests the following:

Definition 104 Two vectors $x, y \in \mathbb{R}^n$ are said to be orthogonal (or perpendicular) if

\[ x \cdot y = 0 \]

When $x$ and $y$ are orthogonal, we write $x \perp y$. From the commutativity of the inner
product it follows that $x \perp y$ is equivalent to $y \perp x$.

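Example 105 (i) Two different standard versors are orthogonal. For example, for $e^1$ and
$e^2$ in $\mathbb{R}^3$ we have
\[ e^1 \cdot e^2 = (1, 0, 0) \cdot (0, 1, 0) = 0 \]
(ii) The vectors $(\sqrt{2}/2, \sqrt{6}/2)$ and $(-\sqrt{3}/2, 1/2)$ are orthogonal:
\[ \left( \frac{\sqrt{2}}{2}, \frac{\sqrt{6}}{2} \right) \cdot \left( -\frac{\sqrt{3}}{2}, \frac{1}{2} \right) = -\frac{\sqrt{6}}{4} + \frac{\sqrt{6}}{4} = 0 \]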

The notion of orthogonality is central, as shown by the famous

Theorem 106 (Pythagoras) Let $x, y \in \mathbb{R}^n$. If $x \perp y$, then

\[ \|x + y\|^2 = \|x\|^2 + \|y\|^2 \]

Proof We have

\[ \|x + y\|^2 = (x + y) \cdot (x + y) = \|x\|^2 + x \cdot y + y \cdot x + \|y\|^2 = \|x\|^2 + \|y\|^2 \]

as desired.

The classic Pythagoras' Theorem is the case $n = 2$. Thanks to the notion of orthogonality
we have the general version for $\mathbb{R}^n$.
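
The expansion used in the proof shows that $\|x + y\|^2 - \|x\|^2 - \|y\|^2 = 2 (x \cdot y)$, so the identity holds exactly when the inner product vanishes. A small numerical sketch follows; the helper names and the sample vectors are our own illustrative assumptions.

```python
def dot(x, y):
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))

def add(x, y):
    """Componentwise sum x + y."""
    return tuple(a + b for a, b in zip(x, y))

def pythagoras_defect(x, y):
    """Return ||x + y||^2 - ||x||^2 - ||y||^2, which equals 2 (x . y)."""
    s = add(x, y)
    return dot(s, s) - dot(x, x) - dot(y, y)

x = (1.0, 2.0, 2.0)
y = (2.0, 1.0, -2.0)   # x . y = 2 + 2 - 4 = 0, so x and y are orthogonal
w = (1.0, 0.0, 0.0)    # x . w = 1, so x and w are not orthogonal

print(pythagoras_defect(x, y))  # 0.0
print(pythagoras_defect(x, w))  # 2.0
```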

Orthogonality extends in a natural way to sets of vectors.

Definition 107 A set of vectors $\{x^i\}_{i=1}^k$ of $\mathbb{R}^n$ is said to be orthogonal if its vectors are
pairwise orthogonal.

The set $\{e^1, \dots, e^n\}$ of the fundamental versors is the most classical example of an orthogonal
set.

A remarkable property of orthogonal sets is linear independence.²

Proposition 108 Any orthogonal set that does not contain the zero vector is linearly independent.
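Proof Let $\{x^i\}_{i=1}^k$ be an orthogonal set of $\mathbb{R}^n$ that does not contain the zero vector. Let
$\{\alpha_i\}_{i=1}^k$ be a set of scalars such that $\sum_{i=1}^k \alpha_i x^i = 0$. We have to show that
$\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$. We have

\[
\begin{aligned}
0 &= \left( \sum_{j=1}^k \alpha_j x^j \right) \cdot 0 = \left( \sum_{j=1}^k \alpha_j x^j \right) \cdot \left( \sum_{i=1}^k \alpha_i x^i \right) \\
&= \alpha_1 x^1 \cdot \left( \sum_{i=1}^k \alpha_i x^i \right) + \alpha_2 x^2 \cdot \left( \sum_{i=1}^k \alpha_i x^i \right) + \cdots + \alpha_k x^k \cdot \left( \sum_{i=1}^k \alpha_i x^i \right) \\
&= \alpha_1^2 \|x^1\|^2 + \alpha_1 x^1 \cdot \left( \sum_{i=2}^k \alpha_i x^i \right) + \alpha_2^2 \|x^2\|^2 + \alpha_2 x^2 \cdot \left( \alpha_1 x^1 + \sum_{i=3}^k \alpha_i x^i \right) \\
&\quad + \cdots + \alpha_k^2 \|x^k\|^2 + \alpha_k x^k \cdot \left( \sum_{i=1}^{k-1} \alpha_i x^i \right) \\
&= \sum_{i=1}^k \alpha_i^2 \|x^i\|^2
\end{aligned}
\]

where we have used the hypothesis that the vectors are pairwise orthogonal, i.e., $x^i \cdot x^j = 0$
for every $i \ne j$. Hence, $0 = \sum_{i=1}^k \alpha_i^2 \|x^i\|^2$. Since none of the vectors $x^i$ is zero, we have
$\|x^i\|^2 > 0$ for every $i = 1, 2, \dots, k$. This yields $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$, as desired.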

An orthogonal set composed of vectors of unit norm, i.e., of versors, is called orthonormal.
The set $\{e^1, \dots, e^n\}$ of the standard versors is, for example, orthonormal. In general,
given an orthogonal set $\{x^i\}_{i=1}^k$ of nonzero vectors of $\mathbb{R}^n$, the set

\[ \left\{ \frac{x^i}{\|x^i\|} \right\}_{i=1}^k \]

obtained by dividing each element by its norm is orthonormal. Indeed, we have

\[ \left\| \frac{x^i}{\|x^i\|} \right\| = \frac{1}{\|x^i\|} \|x^i\| = 1 \quad \text{and} \quad \frac{x^i}{\|x^i\|} \cdot \frac{x^j}{\|x^j\|} = \frac{1}{\|x^i\| \|x^j\|} \, x^i \cdot x^j = 0 \]

for every $i \ne j$.

Example 109 Consider the following three orthogonal vectors in $\mathbb{R}^3$:

\[ x^1 = (1, 1, 1), \quad x^2 = (-2, 1, 1), \quad x^3 = (0, -1, 1) \]

Then
\[ \|x^1\| = \sqrt{3}, \quad \|x^2\| = \sqrt{6}, \quad \|x^3\| = \sqrt{2} \]
Dividing each vector by its norm, we get the orthonormal vectors

\[ \frac{x^1}{\|x^1\|} = \left( \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}} \right), \quad \frac{x^2}{\|x^2\|} = \left( -\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}} \right), \quad \frac{x^3}{\|x^3\|} = \left( 0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right) \]

In particular, these three vectors form an orthonormal basis. ▲

² In reading Proposition 108, recall that a set of vectors containing the zero vector is necessarily
linearly dependent (see Example 66).
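
The passage from an orthogonal set to an orthonormal one can be sketched in a few lines of code. The vectors below are those of Example 109 as we read them: the placement of the minus signs is our assumption, fixed by requiring pairwise orthogonality, and the function names are illustrative only.

```python
import math
from itertools import combinations

def dot(x, y):
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean norm."""
    return math.sqrt(dot(x, x))

def normalize_set(vectors):
    """Divide each (nonzero) vector by its norm."""
    return [tuple(c / norm(v) for c in v) for v in vectors]

xs = [(1.0, 1.0, 1.0), (-2.0, 1.0, 1.0), (0.0, -1.0, 1.0)]

# The starting set is orthogonal: all pairwise inner products vanish.
print([dot(u, v) for u, v in combinations(xs, 2)])

us = normalize_set(xs)
# After normalization every vector has norm 1 (up to rounding),
# and the pairwise inner products are still 0 (up to rounding).
print([norm(u) for u in us])
print([dot(u, v) for u, v in combinations(us, 2)])
```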

The orthonormal bases of $\mathbb{R}^n$, in primis the standard one, are the most important among
the bases of $\mathbb{R}^n$ because for them it is easy to determine the coefficients of the linear
combinations that represent the vectors of $\mathbb{R}^n$:

Proposition 110 Let $\{x^1, x^2, \dots, x^n\}$ be an orthonormal basis of $\mathbb{R}^n$. For every $y \in \mathbb{R}^n$, we
have

\[ y = (y \cdot x^1) x^1 + (y \cdot x^2) x^2 + \cdots + (y \cdot x^n) x^n = \sum_{i=1}^n (y \cdot x^i) x^i \]

The coefficients $y \cdot x^i$ are called Fourier coefficients in the given basis.

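Proof Since $\{x^1, x^2, \dots, x^n\}$ is a basis, there exist $n$ scalars $\alpha_1, \alpha_2, \dots, \alpha_n$ such that

\[ y = \sum_{i=1}^n \alpha_i x^i \]

For $j = 1, 2, \dots, n$ the scalar product $y \cdot x^j$ is

\[ y \cdot x^j = \sum_{i=1}^n \alpha_i (x^i \cdot x^j) \]

Since $\{x^1, x^2, \dots, x^n\}$ is orthonormal, we have

\[ x^i \cdot x^j = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases} \]

Hence $y \cdot x^j = \alpha_j$, from which the statement follows.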

For the standard basis $\{e^1, e^2, \dots, e^n\}$, for each $y = (y_1, \dots, y_n) \in \mathbb{R}^n$ we have $y \cdot e^i = y_i$,
and in this way we find again (3.4), i.e.,

\[ y = \sum_{i=1}^n y_i e^i \]

The next example considers a different orthonormal basis.

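Example 111 Consider the orthonormal basis of $\mathbb{R}^3$ of Example 109:

\[ x^1 = \left( \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}} \right), \quad x^2 = \left( -\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}} \right), \quad x^3 = \left( 0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right) \]

Consider, for example, the vector $y = (2, 3, 4)$. Since

\[ x^1 \cdot y = \frac{9}{\sqrt{3}}, \quad x^2 \cdot y = \frac{3}{\sqrt{6}}, \quad x^3 \cdot y = \frac{1}{\sqrt{2}} \]

we have

\[
\begin{aligned}
y &= (x^1 \cdot y) x^1 + (x^2 \cdot y) x^2 + (x^3 \cdot y) x^3 \\
&= \frac{9}{\sqrt{3}} \left( \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}} \right) + \frac{3}{\sqrt{6}} \left( -\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}} \right) + \frac{1}{\sqrt{2}} \left( 0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right)
\end{aligned}
\]
▲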
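
The computation of Fourier coefficients lends itself to a quick numerical check. The sketch below uses the basis of Example 109 as we read it (the signs of the components are our assumption, chosen so that the basis is orthonormal); the names `dot`, `basis`, `coeffs` and `rebuilt` are illustrative.

```python
import math

def dot(x, y):
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))

s3, s6, s2 = math.sqrt(3), math.sqrt(6), math.sqrt(2)
basis = [
    (1 / s3, 1 / s3, 1 / s3),
    (-2 / s6, 1 / s6, 1 / s6),
    (0.0, -1 / s2, 1 / s2),
]
y = (2.0, 3.0, 4.0)

# Fourier coefficients y . x^i in the given orthonormal basis
coeffs = [dot(y, b) for b in basis]

# Rebuild y as the linear combination sum_i (y . x^i) x^i
rebuilt = tuple(sum(c * b[k] for c, b in zip(coeffs, basis)) for k in range(3))

print(coeffs)   # approximately 9/sqrt(3), 3/sqrt(6), 1/sqrt(2)
print(rebuilt)  # approximately (2.0, 3.0, 4.0)
```

No linear system has to be solved: each coefficient is obtained with a single inner product, which is what makes orthonormal bases convenient.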

We close by showing that Pythagoras' Theorem extends to orthogonal sets.

Proposition 112 For an orthogonal set $\{x^i\}_{i=1}^k$ of vectors of $\mathbb{R}^n$ we have

\[ \left\| \sum_{i=1}^k x^i \right\|^2 = \sum_{i=1}^k \|x^i\|^2 \]

Proof We proceed by induction. We already know that the assertion holds for $k = 2$. We
suppose that it holds for $k - 1$, i.e.,

\[ \left\| \sum_{i=1}^{k-1} x^i \right\|^2 = \sum_{i=1}^{k-1} \|x^i\|^2 \tag{4.7} \]

We show that this implies that it holds for $k$. Observe that, setting $y = \sum_{i=1}^{k-1} x^i$, we have
$y \perp x^k$. Indeed,

\[ y \cdot x^k = \left( \sum_{i=1}^{k-1} x^i \right) \cdot x^k = \sum_{i=1}^{k-1} x^i \cdot x^k = 0 \]

By Pythagoras' Theorem and (4.7), we have

\[
\begin{aligned}
\left\| \sum_{i=1}^k x^i \right\|^2 &= \left\| \sum_{i=1}^{k-1} x^i + x^k \right\|^2 = \|y + x^k\|^2 = \|y\|^2 + \|x^k\|^2 \\
&= \left\| \sum_{i=1}^{k-1} x^i \right\|^2 + \|x^k\|^2 = \sum_{i=1}^{k-1} \|x^i\|^2 + \|x^k\|^2 = \sum_{i=1}^k \|x^i\|^2
\end{aligned}
\]

as desired.
Chapter 5

Topological structure

In this chapter we introduce the fundamental notion of distance between points of $\mathbb{R}^n$ and
we study its main properties and the consequences of its presence for $\mathbb{R}^n$.

5.1 Distances
The norm, studied in Section 4.1, allows us to define a distance in $\mathbb{R}^n$. We start with $n = 1$,
when the norm is simply the absolute value $|x|$. Consider two points $x$ and $y$ on the real
line, with $x > y$.

The distance between the two points is $x - y$, which is the length of the segment that joins
them. On the other hand, if we take any two points $x$ and $y$ on the real line, without knowing
their order (i.e., whether $x \le y$ or $x \ge y$), the distance becomes

\[ |x - y| \]

which is the absolute value of their difference. Indeed,

\[ |x - y| = \begin{cases} x - y & \text{if } x \ge y \\ y - x & \text{if } x < y \end{cases} \]

and hence the absolute value of the difference provides the distance between the two points
independently of their order. In symbols, we can write

\[ d(x, y) = |x - y| \qquad \forall x, y \in \mathbb{R} \]

In particular, $d(0, x) = |x|$ and therefore the absolute value, or, equivalently, the norm of a
point $x \in \mathbb{R}$ can be regarded as its distance from the origin.
Let us now consider $n = 2$. We take two vectors $x = (x_1, x_2)$ and $y = (y_1, y_2)$ in $\mathbb{R}^2$:

[Figure: distance between $x = (x_1, x_2)$ and $y = (y_1, y_2)$ in $\mathbb{R}^2$]

The distance between the two vectors $x$ and $y$ is given by the length of the segment that
joins them (in boldface in the figure). By Pythagoras' Theorem, this distance is
\[ d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2} \tag{5.1} \]
since it is the hypotenuse of the right triangle whose catheti are the segments that join $x_i$
and $y_i$ for $i = 1, 2$.

Observe that the distance (5.1) is nothing but the norm of the vector $x - y$ (and also of
$y - x$), i.e.,
\[ d(x, y) = \|x - y\| \]
The distance between two vectors in $\mathbb{R}^2$ is, therefore, given by the norm of their difference.
It is easy to see, applying again Pythagoras' Theorem, that the distance between two vectors
$x$ and $y$ in $\mathbb{R}^3$ is given by
\[ d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2} \]
and therefore we have again
\[ d(x, y) = \|x - y\| \]
At this point we generalize the notion of distance to any $n$.

Definition 113 The (Euclidean) distance $d(x, y)$ between two vectors $x$ and $y$ in $\mathbb{R}^n$ is the
norm of their difference: $d(x, y) = \|x - y\|$.

In particular, $d(x, 0) = \|x\|$: the norm $\|x\|$ of the vector $x \in \mathbb{R}^n$ can therefore be
regarded as its distance from the vector 0, i.e., as we have already said, as the length of the
segment that represents $x$.

We state the following proposition for distances between vectors of $\mathbb{R}^n$, leaving its simple
proof (it is sufficient to apply the definitions and, for (iv), the triangle inequality (4.6) for
the norm) to the reader.

Proposition 114 Let $x, y$ be two arbitrary vectors in $\mathbb{R}^n$. Then:

(i) $d(x, y) \ge 0$;

(ii) $d(x, y) = 0 \iff x = y$;

(iii) $d(x, y) = d(y, x)$;

(iv) $d(x, y) \le d(x, z) + d(z, y)$ for every $z \in \mathbb{R}^n$.

Properties (i)–(iv) are all natural for a notion of distance. (i) says that a distance is
always a positive quantity, which by (ii) is zero only between vectors that are equal, the
distance between distinct vectors being always strictly positive. (iii) says that distance is
a symmetric notion: in measuring a distance between two vectors, it does not matter from
which of the two vectors we begin the measurement. Finally, (iv) is the so-called triangle
inequality: for example, the distance between Milan, x, and Rome, y, cannot exceed the sum
of the distances between Milan and any other place z and between that place z and Rome:
detours cannot save the distance one needs to cover.
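
Properties (i)–(iv) can be illustrated on concrete points. The following is a minimal sketch: the function name `dist` and the three sample points are our own illustrative choices, and a check on a few points is of course not a proof.

```python
import math

def dist(x, y):
    """Euclidean distance d(x, y) = ||x - y|| between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

x = (0.0, 0.0)
y = (3.0, 4.0)
z = (1.0, 1.0)   # an arbitrary "detour" point

print(dist(x, y))                              # 5.0
print(dist(x, x))                              # 0.0: property (ii)
print(dist(x, y) == dist(y, x))                # True: property (iii)
print(dist(x, y) <= dist(x, z) + dist(z, y))   # True: property (iv)
```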

Example 115 (i) If $x = 1/3$ and $y = -1/3$, then

\[ d(x, y) = \left| \frac{1}{3} - \left( -\frac{1}{3} \right) \right| = \left| \frac{2}{3} \right| = \frac{2}{3} \]

(ii) if $x = a$ and $y = a^2$ with $a \in \mathbb{R}$, then $d(x, y) = d(a, a^2) = |a - a^2| = |a| |1 - a|$;

(iii) if $x = (1, -3)$ and $y = (3, -1)$, then

\[ d(x, y) = \sqrt{(1 - 3)^2 + (-3 - (-1))^2} = \sqrt{8} = 2\sqrt{2} \]

(iv) if $x = (a, b)$ and $y = (-a, b)$, then

\[ d(x, y) = \sqrt{(a - (-a))^2 + (b - b)^2} = \sqrt{(2a)^2 + 0} = \sqrt{4a^2} = 2|a| \]

(v) if $x = (0, a, 0)$ and $y = (1, 0, -a)$, then

\[ d(x, y) = \sqrt{(0 - 1)^2 + (a - 0)^2 + (0 - (-a))^2} = \sqrt{1^2 + a^2 + a^2} = \sqrt{1 + 2a^2} \]

▲

5.2 Neighborhoods

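Definition 116 We call (spherical) neighborhood of center $x_0 \in \mathbb{R}^n$ and radius $\varepsilon > 0$, and
denote it by $B_\varepsilon(x_0)$, the set

\[ B_\varepsilon(x_0) = \{ x \in \mathbb{R}^n : d(x, x_0) < \varepsilon \} \]

The neighborhood $B_\varepsilon(x_0)$ is therefore the locus of the points of $\mathbb{R}^n$ that lie at distance strictly
smaller than $\varepsilon$ from $x_0$.
In $\mathbb{R}$ such a neighborhood is the open interval $(x_0 - \varepsilon, x_0 + \varepsilon)$, i.e.,

\[ B_\varepsilon(x_0) = (x_0 - \varepsilon, x_0 + \varepsilon) \]

Indeed,

\[
\begin{aligned}
\{ x \in \mathbb{R} : d(x, x_0) < \varepsilon \} &= \{ x \in \mathbb{R} : |x - x_0| < \varepsilon \} \\
&= \{ x \in \mathbb{R} : -\varepsilon < x - x_0 < \varepsilon \} \\
&= (x_0 - \varepsilon, x_0 + \varepsilon)
\end{aligned}
\]

where we have used (4.1), i.e., $|x| < a \iff -a < x < a$.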

Hence in $\mathbb{R}$ the neighborhoods are intervals. It is easily seen that in $\mathbb{R}^2$ they are discs
(without circumference), in $\mathbb{R}^3$ balls (without surface), etc. Indeed, the points that lie at a
distance less than $\varepsilon$ from $x_0$ form a disc, a ball, etc. "without peel" of center $x_0$.¹

[Figure: neighborhood of $x_0 \in \mathbb{R}$ of radius $\varepsilon$]

[Figure: neighborhood of $x_0 \in \mathbb{R}^2$ of radius $\varepsilon$]

¹ Some textbooks consider as a neighborhood of a point $x_0 \in \mathbb{R}$ any open interval containing $x_0$; in this
textbook, however, we will not do this.

Let us give some examples of neighborhoods. For simplicity of notation, we will write
$B_\varepsilon(x_1, \dots, x_n)$ instead of $B_\varepsilon((x_1, \dots, x_n))$.

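Example 117 (i) We have $B_3(-1) = (-1 - 3, -1 + 3) = (-4, 2)$.

(ii) We have
\[ B_{\frac{3}{2}}(1) = \left( 1 - \frac{3}{2}, 1 + \frac{3}{2} \right) = \left( -\frac{1}{2}, \frac{5}{2} \right) \]

(iii) The notations $B_{-1}(0)$ and $B_0(1)$ are meaningless because we need $\varepsilon > 0$.

(iv) We have
\[
\begin{aligned}
B_3(0, 0) = B_3(0) &= \{ x \in \mathbb{R}^2 : d(x, 0) < 3 \} = \left\{ x \in \mathbb{R}^2 : \sqrt{x_1^2 + x_2^2} < 3 \right\} \\
&= \{ x \in \mathbb{R}^2 : x_1^2 + x_2^2 < 9 \}
\end{aligned}
\]

(v) We have
\[
\begin{aligned}
B_1(1, 1, 1) &= \{ x \in \mathbb{R}^3 : d(x, (1, 1, 1)) < 1 \} \\
&= \left\{ x \in \mathbb{R}^3 : \sqrt{(x_1 - 1)^2 + (x_2 - 1)^2 + (x_3 - 1)^2} < 1 \right\} \\
&= \{ x \in \mathbb{R}^3 : (x_1 - 1)^2 + (x_2 - 1)^2 + (x_3 - 1)^2 < 1 \}
\end{aligned}
\]

For example, $(1/2, 1/2, 1/2) \in B_1(1, 1, 1)$. Indeed

\[ \left( \frac{1}{2} - 1 \right)^2 + \left( \frac{1}{2} - 1 \right)^2 + \left( \frac{1}{2} - 1 \right)^2 = \frac{3}{4} < 1 \]

Verify that, on the contrary, $0 = (0, 0, 0) \notin B_1(1, 1, 1)$. ▲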
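
Membership in a neighborhood amounts to a single strict inequality, which can be sketched as follows. The function names `dist` and `in_ball` are our own illustrative assumptions; the sample points are taken from the examples above as we read them.

```python
import math

def dist(x, y):
    """Euclidean distance between two vectors given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def in_ball(x, center, eps):
    """True if x belongs to B_eps(center), i.e., d(x, center) < eps."""
    if eps <= 0:
        raise ValueError("the radius must be strictly positive")
    return dist(x, center) < eps

# Neighborhood of (1, 1, 1) of radius 1 in R^3
print(in_ball((0.5, 0.5, 0.5), (1, 1, 1), 1))  # True: squared distance 3/4 < 1
print(in_ball((0, 0, 0), (1, 1, 1), 1))        # False: squared distance 3 > 1

# Neighborhood of -1 of radius 3 in R, i.e., the open interval (-4, 2)
print(in_ball((1.0,), (-1.0,), 3))  # True
print(in_ball((2.0,), (-1.0,), 3))  # False: the endpoint 2 is excluded
```

The strict inequality is what excludes the "peel": points at distance exactly equal to the radius do not belong to the neighborhood.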

O.R. A point has infinitely many neighborhoods: one for each value of $\varepsilon > 0$. It is therefore
misleading to talk about the neighborhood of a point as if it were only one. ▼

For some purposes we will have the opportunity to use, exclusively in $\mathbb{R}$, also "half
neighborhoods" of a point $x_0$; precisely:

Definition 118 The interval $[x_0, x_0 + \varepsilon)$, with $\varepsilon > 0$, is called the right neighborhood of
$x_0 \in \mathbb{R}$ of radius $\varepsilon$. The interval $(x_0 - \varepsilon, x_0]$, with $\varepsilon > 0$, is called the left neighborhood of
$x_0$ of radius $\varepsilon$.

With them we can give a useful characterization of the supremum and infimum of a
subset of $\mathbb{R}$, introduced in Section 1.4.2.

Proposition 119 Given a set $A \subseteq \mathbb{R}$, we have $a = \sup A$ if and only if

(i) $a \ge x$ for every $x \in A$,

(ii) for every $\varepsilon > 0$, there exists $x \in A$ such that $x > a - \varepsilon$.

Proof "Only if". If $a = \sup A$, (i) is obviously satisfied. Let $\varepsilon > 0$. Since $\sup A > a - \varepsilon$, the
point $a - \varepsilon$ is not an upper bound of $A$. Therefore, there exists $x \in A$ such that $x > a - \varepsilon$.
"If". Suppose that $a \in \mathbb{R}$ satisfies (i) and (ii). By (i), $a$ is an upper bound of $A$. By (ii),
it is also the least upper bound. Indeed, each $b < a$ can be written as $b = a - \varepsilon$, by setting
$\varepsilon = a - b > 0$. Given $b < a$, by (ii) there exists $x \in A$ such that $x > a - \varepsilon = b$. Therefore, $b$
is not an upper bound of $A$, which implies that there is no upper bound smaller than $a$.

In other words, the point $a \in \mathbb{R}$ is the supremum of $A \subseteq \mathbb{R}$ if and only if it is an upper
bound of $A$ and in each left neighborhood of $a$ there are elements of $A$. An analogous
characterization holds for infima, by replacing upper bounds with lower bounds and left
neighborhoods with right neighborhoods.
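
To see condition (ii) at work, consider the set $A = \{1 - 1/n : n = 1, 2, \dots\}$, whose supremum is 1 although 1 does not belong to $A$. This set is our own illustrative example, not one taken from the text. The sketch below produces, for a given $\varepsilon > 0$, an element of $A$ lying in the left neighborhood $(1 - \varepsilon, 1]$; exact fractions are used to avoid rounding issues, and the function name `witness` is hypothetical.

```python
import math
from fractions import Fraction

def witness(eps):
    """Return an element x = 1 - 1/n of A such that x > 1 - eps."""
    n = math.floor(1 / eps) + 1   # n > 1/eps, hence 1/n < eps
    return 1 - Fraction(1, n)

for eps in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 1000)):
    x = witness(eps)
    # x is below the upper bound 1 and above 1 - eps
    print(eps, x, 1 - eps < x < 1)
```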

5.3 Taxonomy of the points of $\mathbb{R}^n$ with respect to a set


The notion of neighborhood allows one to classify the points of $\mathbb{R}^n$ in various categories,
according to their relations with a given set $A \subseteq \mathbb{R}^n$.

5.3.1 Interior, exterior and boundary points


The first fundamental notion is that of interior point. Intuitively, a point interior to a set is
a point "inside" it, i.e., a point completely surrounded by other points of the set (that is, a point
from which it is possible to move away in any direction while remaining, at least for a while, in the set).

Definition 120 Let $A$ be a subset of $\mathbb{R}^n$. A point $x_0 \in A$ is called an interior point of $A$ if
there exists $\varepsilon > 0$ such that $B_\varepsilon(x_0) \subseteq A$.

In other words, $x_0$ is an interior point of $A$ if there exists at least one neighborhood of $x_0$
completely contained in $A$. This motivates the adjective "interior". A point of $A$ is therefore
interior if it is contained in $A$ together with an entire neighborhood, however small. We
can say that the interior points are the points that belong to $A$ both in the set-theoretical sense
($x \in A$) and in the topological sense (there exists $B_\varepsilon(x) \subseteq A$).

In an analogous way, a point $x_0 \in \mathbb{R}^n$ is called exterior to $A$ if it is interior to the
complement $A^c$ of $A$, i.e., if there exists $\varepsilon > 0$ such that $B_\varepsilon(x_0)$ is contained in $A^c$, that is,
$B_\varepsilon(x_0) \cap A = \emptyset$. A point not in $A$ is therefore exterior when an entire neighborhood of it,
however small, lies outside $A$.

The set of the interior points of $A$ is called the interior of $A$ and is denoted by $\operatorname{int} A$.
By definition, $\operatorname{int} A \subseteq A$.

Example 121 Let $A = (0, 1)$. Each point of $A$ is interior, that is, $\operatorname{int} A = A$. Let indeed
$x \in (0, 1)$ and consider the smallest among the distances of $x$ from the extreme points 0 and
1, i.e., $\min \{ d(0, x), d(1, x) \}$. Take $\varepsilon > 0$ such that

\[ \varepsilon < \min \{ d(0, x), d(1, x) \} \]

Then
\[ B_\varepsilon(x) = (x - \varepsilon, x + \varepsilon) \subseteq (0, 1) \]
and therefore $x$ is an interior point of $A$. Since $x$ was any point of $A$, it follows that $\operatorname{int} A = A$.
▲

Example 122 Let $A = [0, 1]$. We have $\operatorname{int} A = (0, 1)$. Indeed, by proceeding as above, we
see that the points in $(0, 1)$ are all interior, that is, $(0, 1) \subseteq \operatorname{int} A$. It remains to examine the
extreme points 0 and 1. Consider 0. Each of its neighborhoods has the form $(-\varepsilon, \varepsilon)$, with
$\varepsilon > 0$, and hence it contains also points of $A^c$. It follows that $0 \notin \operatorname{int} A$. In an analogous
way one can show that $1 \notin \operatorname{int} A$. We conclude that $\operatorname{int} A = (0, 1)$.
The set of the exterior points of $A$ coincides with the complement set $A^c = (-\infty, 0) \cup
(1, +\infty)$, and therefore $\operatorname{int} A^c = A^c$, as the reader can easily verify. ▲

Definition 123 Let $A$ be a subset of $\mathbb{R}^n$. A point $x_0 \in \mathbb{R}^n$ is called a boundary point for $A$
if it is neither interior nor exterior, i.e., if for every $\varepsilon > 0$ one has that $B_\varepsilon(x_0) \cap A \ne \emptyset$ and
$B_\varepsilon(x_0) \cap A^c \ne \emptyset$.

A point $x_0$ is therefore a boundary point for $A$ if each of its neighborhoods contains both
points of $A$ (because it is not exterior) and points of $A^c$ (because it is not interior). The set
of the boundary points of a set $A$ is called the boundary or frontier of $A$ and is denoted
by $\partial A$. Intuitively, the frontier is the "border" of a set.

Note that the definition of boundary points is residual: a point is a boundary point if it is
not "anything else". This implies that the classification into interior, exterior, and boundary
points is exhaustive: given a set $A$, each point $x_0 \in \mathbb{R}^n$ necessarily falls into one of
these three categories.

Example 124 (i) Let $A = (0, 1)$. Given the residual nature of the definition of boundary
points, to determine $\partial A$ we have first of all to identify the interior and exterior points. We
have seen that $\operatorname{int} A = (0, 1)$, and also that $A^c = (-\infty, 0] \cup [1, +\infty)$, and hence

\[ \operatorname{int} A^c = (-\infty, 0) \cup (1, +\infty) \]

The exterior points to $A$ are therefore those of the set $(-\infty, 0) \cup (1, +\infty)$. It follows that

\[ \partial A = \{ 0, 1 \} \]

i.e., the boundary of $(0, 1)$ consists of the two points 0 and 1. Note that $A \cap \partial A = \emptyset$:
in this example the boundary points do not belong to the set.
(ii) Let $A = [0, 1]$. In Example 122 we have seen that $\operatorname{int} A = (0, 1)$ and that $A^c$ is
the set of the exterior points of $A$. Therefore, $\partial A = \{ 0, 1 \}$. Here we have $\partial A \subseteq A$: the set
contains its own boundary points.
(iii) Let $A = (0, 1]$. The reader can verify that $\operatorname{int} A = (0, 1)$ and that all the points
of $(-\infty, 0) \cup (1, +\infty)$ are exterior. Hence, $\partial A = \{ 0, 1 \}$. In this example, the frontier stays
partly outside and partly inside the set: the boundary point 1 is in $A$, while the boundary
point 0 is not.
(iv) If
\[ A = \{ (x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 \le 1 \} \subseteq \mathbb{R}^2 \]
then all the points such that $x_1^2 + x_2^2 < 1$ are interior, that is,

\[ \operatorname{int} A = \{ (x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 < 1 \} \]

while all the points such that $x_1^2 + x_2^2 > 1$ are exterior. Therefore,

\[ \partial A = \{ (x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1 \} \]

The set $A$ contains all its own boundary points.

(v) Let $A = \mathbb{Q}$ be the set of rational numbers, so that $A^c$ is the set of the irrational
numbers. By Propositions 18 and 39, between any two rational numbers $q < q'$ there exists
an irrational number $a$ such that $q < a < q'$ and between any two irrational numbers $a < b$
there exists a rational number $q \in \mathbb{Q}$ such that $a < q < b$. The reader can verify that this
implies $\operatorname{int} A = \operatorname{int} A^c = \emptyset$, and hence $\partial A = \mathbb{R}$. The example shows that the interpretation
of the boundary as a border is in some cases not suitable. On the other hand,
mathematical notions have their own life and we must be ready to follow them also when
intuition falls short. ▲
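
Items (i)–(iii) share the same interior and the same boundary, whatever the status of the endpoints. This can be made concrete with a small sketch for bounded intervals of the real line; the function names `classify` and `belongs` and their parameters are our own illustrative assumptions.

```python
def classify(x, a, b):
    """Position of x relative to any interval with endpoints a < b.

    The answer does not depend on whether the endpoints belong to the
    interval: (a, b), [a, b], (a, b] and [a, b) all have interior (a, b)
    and boundary {a, b}.
    """
    if a < x < b:
        return "interior"
    if x == a or x == b:
        return "boundary"
    return "exterior"

def belongs(x, a, b, left_closed, right_closed):
    """Set-theoretical membership of x in the interval with endpoints a < b."""
    if a < x < b:
        return True
    return (x == a and left_closed) or (x == b and right_closed)

# The interval (0, 1]: both endpoints are boundary points, only 1 belongs to it.
for point in (0, 0.5, 1, 2):
    print(point, classify(point, 0, 1), belongs(point, 0, 1, False, True))
```

Keeping the topological classification separate from set membership reflects the remark that boundary points may or may not belong to the set.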

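The next lemma generalizes what we saw in items (i)–(iii) of the example.

Lemma 125 Let $A \subseteq \mathbb{R}$ be a nonempty bounded set. Then $\sup A \in \partial A$ and $\inf A \in \partial A$.

Proof We prove that $\alpha = \sup A \in \partial A$ (the proof for the infimum is analogous). Consider
an arbitrary neighborhood $(\alpha - \varepsilon, \alpha + \varepsilon)$ of $\alpha$. We have $(\alpha, \alpha + \varepsilon) \subseteq A^c$, and therefore
$(\alpha - \varepsilon, \alpha + \varepsilon) \cap A^c \ne \emptyset$. Moreover, thanks to Proposition 119, for every $\varepsilon > 0$ there exists
$x_0 \in A$ such that $x_0 > \alpha - \varepsilon$, so that $(\alpha - \varepsilon, \alpha] \cap A \ne \emptyset$, and hence $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A \ne \emptyset$.
Therefore, for every $\varepsilon > 0$ we have $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A \ne \emptyset$ and $(\alpha - \varepsilon, \alpha + \varepsilon) \cap A^c \ne \emptyset$, that
is, $\alpha \in \partial A$.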

We identify an important subclass of the boundary points.

Definition 126 Let $A$ be a subset of $\mathbb{R}^n$. A point $x_0 \in A$ is called isolated if there exists a
neighborhood of $x_0$ that does not contain other points of $A$ except for $x_0$ itself.

Hence, a point $x_0 \in A$ is isolated if there exists a neighborhood $B_\varepsilon(x_0)$ such that $A \cap
B_\varepsilon(x_0) = \{ x_0 \}$. As the terminology suggests, the isolated points are points of the set
"separated" from the rest of the set.

Example 127 Let $A = [0, 1] \cup \{ 2 \}$. It consists of the closed unit interval and, in addition,
the point 2. The latter is isolated. Indeed, if $B_\varepsilon(2)$ is a neighborhood of 2 with $\varepsilon < 1$, then
$A \cap B_\varepsilon(2) = \{ 2 \}$. ▲

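As anticipated, we have

Lemma 128 The isolated points are boundary points.

Proof Let $x_0$ be an isolated point of $A$. Since $x_0$ belongs to each of its neighborhoods, we
have $B_\varepsilon(x_0) \cap A \ne \emptyset$ for every $\varepsilon > 0$. It remains to prove that $B_\varepsilon(x_0) \cap A^c \ne \emptyset$ for every
$\varepsilon > 0$. Let therefore $\varepsilon > 0$. Since $x_0$ is an isolated point of $A$, there exists $\varepsilon_0 > 0$ such that
$(B_{\varepsilon_0}(x_0) \setminus \{ x_0 \}) \subseteq A^c$. Let $\delta = \min \{ \varepsilon, \varepsilon_0 \}$. We have
$B_\delta(x_0) \setminus \{ x_0 \} \subseteq B_{\varepsilon_0}(x_0) \setminus \{ x_0 \} \subseteq A^c$
and $B_\delta(x_0) \setminus \{ x_0 \} \subseteq B_\varepsilon(x_0) \setminus \{ x_0 \}$. Let $y \in B_\delta(x_0) \setminus \{ x_0 \}$. By what we have seen, $y \in A^c$
and $y \in B_\varepsilon(x_0) \setminus \{ x_0 \}$, therefore $y \in A^c \cap B_\varepsilon(x_0)$. It follows that $B_\varepsilon(x_0) \cap A^c \ne \emptyset$. Hence for
every $\varepsilon > 0$ we have $B_\varepsilon(x_0) \cap A \ne \emptyset$ and $B_\varepsilon(x_0) \cap A^c \ne \emptyset$, so $x_0$ is a boundary point for $A$.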

5.3.2 Limit (accumulation) points


Definition 129 Let $A$ be a subset of $\mathbb{R}^n$. A point $x_0 \in \mathbb{R}^n$ is called a limit or accumulation
point for $A$ if each neighborhood of $x_0$ contains at least one point of $A$ distinct from $x_0$.

Hence, $x_0$ is a limit point of $A$ if, for every $\varepsilon > 0$, there exists some $x \in A$ such that²
$0 < \|x_0 - x\| < \varepsilon$. The set of limit points of $A$ is denoted by $A'$ and is called the derived set
of $A$. Note that it is not required that the limit point $x_0$ belongs to $A$.

N.B. Definition 129 can be equivalently expressed by saying that $x_0 \in \mathbb{R}^n$ is a limit point for $A$ if,
for every $\varepsilon > 0$, the neighborhood $B_\varepsilon(x_0)$ of $x_0$ is such that $(B_\varepsilon(x_0) \setminus \{ x_0 \}) \cap A \ne \emptyset$.
▽

First of all, let us state the relations of the limit points with the classification just seen.
Obviously, limit points are never exterior. Moreover:

Lemma 130 Let $A$ be a subset of $\mathbb{R}^n$.

(i) Each interior point of $A$ is a limit point, that is, $\operatorname{int} A \subseteq A'$.

(ii) A boundary point of $A$ is a limit point if and only if it is not isolated.

Proof (i) If $x_0 \in \operatorname{int} A$, there exists a neighborhood $B_{\varepsilon_0}(x_0)$ of $x_0$ such that $B_{\varepsilon_0}(x_0) \subseteq A$.
Let $B_\varepsilon(x_0)$ be any neighborhood of $x_0$. The intersection

\[ B_{\varepsilon_0}(x_0) \cap B_\varepsilon(x_0) = B_{\min \{ \varepsilon_0, \varepsilon \}}(x_0) \]

is in turn a neighborhood of $x_0$ of radius $\min \{ \varepsilon_0, \varepsilon \} > 0$. Hence $B_{\min \{ \varepsilon_0, \varepsilon \}}(x_0) \subseteq A$ and,
in order to complete the proof, it is sufficient to consider any $x \in B_{\min \{ \varepsilon_0, \varepsilon \}}(x_0)$ such that
$x \ne x_0$. Indeed, $x$ belongs to $A$, it belongs also to the neighborhood $B_\varepsilon(x_0)$, and it is distinct from $x_0$.

(ii) "If". Consider a point $x_0$ that is a boundary point, but not an isolated point. By the
definition of boundary points, for every $\varepsilon > 0$ we have $B_\varepsilon(x_0) \cap A \ne \emptyset$. By the definition
of non-isolated points, for every $\varepsilon > 0$ we have $B_\varepsilon(x_0) \cap A \ne \{ x_0 \}$. This implies that for
every $\varepsilon > 0$ we have $(B_\varepsilon(x_0) \setminus \{ x_0 \}) \cap A \ne \emptyset$, i.e., that $x_0$ is a limit point of $A$. "Only if".
Take a point $x_0$ that is both a boundary point and a limit point, i.e., $x_0 \in \partial A \cap A'$. Each
neighborhood $B_\varepsilon(x_0)$ contains at least a point $x \in A$ distinct from $x_0$, that is, $B_\varepsilon(x_0) \cap A \ne
\{ x_0 \}$. It follows that $x_0$ is not isolated.

² The inequality $0 < \|x_0 - x\|$ is equivalent to $x \ne x_0$, i.e., it imposes that $x$ is a point of $A$ distinct from
$x_0$.

In the light of this result, the set $A'$ of the limit points consists of the interior points of
$A$ and the non-isolated boundary points of $A$. Therefore, a point of $A$ is either a limit point or
isolated, tertium non datur.

Example 131 (i) If $A = [0, 1) \subseteq \mathbb{R}$, all the points of the interval $[0, 1]$ and only them are
limit points, that is, $A' = [0, 1]$. Note how 1 is a limit point although it does not belong to
$A$.
(ii) If $A = \{ (x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 \le 1 \}$, all the points of $A$ are limit points, that is,
$A = A'$. ▲

Example 132 Let $A = \{ (x_1, x_2) \in \mathbb{R}^2 : x_1 + x_2 = 1 \}$. $A$ is a straight line in the plane.
We have $\operatorname{int} A = \emptyset$ and $\partial A = A' = A$. Hence, the set $A$ does not have any interior point
(graphically, draw a little disc around a point of $A$: even if it is very small, there is no way
to include it all in $A$), while all its points are both limit points and boundary points.

[Figure: the straight line $x_1 + x_2 = 1$ in the plane]

In the definition of limit point it is required that each of its neighborhoods contains at
least one point of $A$ other than itself. Actually, as we show now, it necessarily contains
infinitely many of them.

Proposition 133 Each neighborhood of a limit point of $A$ contains infinitely many points
of $A$.

Proof Let $x$ be a limit point of $A$. Suppose, by contradiction, that there exists a neighborhood
$B_\varepsilon(x)$ of $x$ containing only a finite number of points $\{ x_1, \dots, x_n \}$ of $A$ other than $x$
itself. Since $\{ x_1, \dots, x_n \}$ is a finite set, the minimum distance

\[ \min_{i = 1, \dots, n} d(x, x_i) \]

exists and is strictly positive, i.e., $\min_{i = 1, \dots, n} d(x, x_i) > 0$. Let $\delta > 0$ be such that
$\delta < \min_{i = 1, \dots, n} d(x, x_i)$. It is evident that $0 < \delta < \varepsilon$, since $\delta < \min_{i = 1, \dots, n} d(x, x_i) < \varepsilon$.
Hence $B_\delta(x) \subseteq B_\varepsilon(x)$. It is also evident, by construction, that for each $i = 1, 2, \dots, n$ we have
$x_i \notin B_\delta(x)$. So, if $x \in A$, we have $B_\delta(x) \cap A = \{ x \}$; if instead $x \notin A$, we have $B_\delta(x) \cap A = \emptyset$.
Independently of whether $x$ belongs to $A$ or not, we have

\[ B_\delta(x) \cap A \subseteq \{ x \} \]

Therefore, the unique point of $A$ that $B_\delta(x)$ can contain is, at most, $x$ itself. But this
contradicts the hypothesis that $x$ is a limit point of $A$.

O.R. The concept of interior point of a set $A$ requires the existence of a neighborhood of
the point that is entirely formed by points of $A$. This means that it is possible to move away
(at least a bit) from the point by following any path that starts from it while remaining inside
$A$ (i.e., it is possible to go for a "little walk" in any direction without showing the passport):
looking at the path in the opposite direction, we can say that it is possible to approach the
point by coming from any direction and by remaining within $A$.
The concept of limit point of a set $A$, which does not require that the point belongs to $A$,
requires instead that we can get as close as we want to the point by "jumping" on points of
the set (i.e., that, as when we cross a river jumping on surfacing stones, we can get as close
as we want to our target on "stones" that all belong to the set). This idea of approaching a
certain point by remaining within a given set will be crucial for the definition of the limit of
a function. ▼

5.4 Open and closed sets


We now introduce the fundamental notions of open set and of closed set. Intuitively, the
concept of open set is the abstraction of the idea of a geometric figure without the border,
while the concept of closed set is the abstraction of a geometric figure with the border (the
concept of boundary is, instead, the abstraction of border³).

Definition 134 A subset $A \subseteq \mathbb{R}^n$ is called open if all its points are interior, i.e., if $\operatorname{int} A =
A$.

³ With the caveat of Example 124-(v).

Example 135 The open intervals $(a, b)$ are open (whence the name). Indeed, let $x \in (a, b)$
be any point of $(a, b)$. We show that it is interior. Let

\[ 0 < \varepsilon < \min \{ d(x, a), d(x, b) \} \]

We have $B_\varepsilon(x) \subseteq (a, b)$ and therefore $x$ is an interior point of $(a, b)$. It follows that $(a, b)$ is
open. ▲

Example 136 The set $A = B_1(0, 0) \setminus \{ (0, 0) \} = \{ x \in \mathbb{R}^2 : 0 < x_1^2 + x_2^2 < 1 \}$ is open.
Graphically, it is the disc without both the "peel" and the origin, that is,

[Figure: the unit disc of the plane without its circumference and without the origin]

Given that the neighborhoods in $\mathbb{R}$ are of the type $(a, b)$, they are all open. The next
result shows that the property of the neighborhoods of being open holds in general in $\mathbb{R}^n$.

Lemma 137 The neighborhoods in $\mathbb{R}^n$ are open.

Proof Let $B_\varepsilon(x_0)$ be a neighborhood of a point $x_0 \in \mathbb{R}^n$. To show that $B_\varepsilon(x_0)$ is open, we
have to show that each of its points is interior. Let $x \in B_\varepsilon(x_0)$. To prove that $x$ is interior
to $B_\varepsilon(x_0)$, let
\[ 0 < \varepsilon' < \varepsilon - d(x, x_0) \tag{5.2} \]

Then $B_{\varepsilon'}(x) \subseteq B_\varepsilon(x_0)$. Indeed, let $y \in B_{\varepsilon'}(x)$. Then

\[ d(y, x_0) \le d(y, x) + d(x, x_0) < \varepsilon' + d(x, x_0) < \varepsilon \]

where the last inequality follows from (5.2). Therefore, $B_{\varepsilon'}(x) \subseteq B_\varepsilon(x_0)$, which completes
the proof.

This proof can be illustrated by the following picture:

[Figure: the neighborhood $B_{\varepsilon'}(x)$ contained in the neighborhood $B_\varepsilon(x_0)$]
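
The choice of the radius in (5.2) can also be explored numerically in the plane. The sketch below is only an illustration: taking half of the available slack is one admissible choice among many, the function names are our own, and sampling random points is a sanity check rather than a proof.

```python
import math
import random

def dist(x, y):
    """Euclidean distance between two vectors given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def inner_radius(x, center, eps):
    """Return a radius r with 0 < r < eps - d(x, center), as required by (5.2)."""
    slack = eps - dist(x, center)
    if slack <= 0:
        raise ValueError("x does not belong to the neighborhood")
    return slack / 2

random.seed(0)
center, eps = (0.0, 0.0), 1.0
x = (0.6, 0.3)
r = inner_radius(x, center, eps)

# Sample points y of the smaller neighborhood of x and check that
# each of them also lies in the larger neighborhood of the center.
for _ in range(1000):
    angle = random.uniform(0.0, 2.0 * math.pi)
    rho = r * random.random()   # distance from x, in [0, r)
    y = (x[0] + rho * math.cos(angle), x[1] + rho * math.sin(angle))
    assert dist(y, center) < eps

print(r)  # a valid radius for a neighborhood of x inside the unit disc
```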

Definition 138 The set

\[ A \cup \partial A \]

formed by the points of $A$ and by its boundary points is called the closure of $A$ and is denoted
by $\overline{A}$.

Clearly, $A \subseteq \overline{A}$. The closure of $A$ is, thus, an "enlargement" of $A$ that includes all its
boundary points, that is, the borders. Naturally, the notion of closure is relevant when the
borders are not already part of $A$.

Example 139 (i) If $A = [0, 1) \subseteq \mathbb{R}$, then $\overline{A} = [0, 1]$.

(ii) If $A = \{ (x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 \le 1 \}$, then $\overline{A} = A$. ▲

Example 140 Given a neighborhood $B_\varepsilon(x_0)$ of a point $x_0 \in \mathbb{R}^n$, we have

\[ \overline{B_\varepsilon(x_0)} = \{ x \in \mathbb{R}^n : d(x, x_0) \le \varepsilon \} \tag{5.3} \]

that is, the closure of a neighborhood is characterized by having "$\le \varepsilon$" instead of "$< \varepsilon$". ▲

We can now introduce the closed sets.

Definition 141 A subset $A$ of $\mathbb{R}^n$ is called closed if it contains all its boundary points, that
is, if $A = \overline{A}$.

Hence, a set is closed when it includes its border.

Example 142 The set $A = [0, 1) \subseteq \mathbb{R}$ is not closed since $A \ne \overline{A}$, while the set $A =
\{ (x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 \le 1 \}$ is closed since $A = \overline{A}$. ▲

Example 143 The closed intervals $[a, b] \subseteq \mathbb{R}$ are closed (whence the name). The unbounded
intervals $(a, \infty)$ and $(-\infty, a)$ are open. The unbounded intervals $[a, \infty)$ and $(-\infty, a]$ are
closed. ▲

Example 144 The set $A = \{ (x_1, x_2) \in \mathbb{R}^2 : x_1 + x_2 = 1 \}$ is closed since $\overline{A} = \partial A = A' = A$.
▲

The notions of open and closed sets are dual, as the next basic result shows.⁴

⁴ In many textbooks a closed set is defined as one whose complement is open, and the consequent property
that each closed set contains its boundary is proved as a theorem. In other words, the definition and the
theorem are switched with respect to the formulation we have chosen.

Theorem 145 A set in $\mathbb{R}^n$ is open if and only if its complement is closed.

Proof "Only if". Let $A$ be open. We show that $A^c$ is closed. Let $x$ be an arbitrary boundary
point of $A^c$, that is, $x \in \partial A^c$. By definition, $x$ is not interior either for $A$ or for $A^c$. Hence,
$x \notin \operatorname{int} A$. But, $A = \operatorname{int} A$, since $A$ is open. Therefore $x \notin A$, that is, $x \in A^c$. It follows that
$\partial A^c \subseteq A^c$, since $x$ was an arbitrary point of $\partial A^c$. Therefore, $\overline{A^c} = A^c$, which proves that $A^c$
is closed.

"If". Let $A^c$ be closed. We show that $A$ is open. Let $x$ be any point of $A$. Since
$x \notin A^c = \overline{A^c}$, $x$ is not a boundary point for $A^c$ and it is therefore interior for $A$ or interior
for $A^c$. But, since $x \notin A^c$ implies $x \notin \operatorname{int} A^c$, we conclude that $x \in \operatorname{int} A$. Hence the point $x$
is interior, which implies that $A$ is open.

Example 146 The …nite sets A = fx1 ; x2 ; :::; xn g of Rn (in particular, the singletons) are
closed. To verify it, observe that the complement Ac is open. Indeed let x 2 Ac and " > 0
such that
" < d (x; xi ) 8i = 1; :::; n
We have B" (x) Ac and hence x is an interior point. It follows that Ac is open. We leave
the reader to verify that int A = ; and @A = A. N

Example 147 The …gure

4
x
2
3

0 -1 2
O x
1
-1
-1

-2
-3 -2 -1 0 1 2 3 4 5

represents the closed set

f(2; 1)g [ f(x1 ; x2 ) 2 R2 : x2 = x21 g [ f(x1 ; x2 ) 2 R2 : (x1 + 1)2 + (x2 + 1)2 1=4g

of R2 . N

Open and closed sets are therefore two sides of the same coin: to state that a set is closed (open) is equivalent to stating that its complement is open (closed). Naturally, there are many sets that are neither open nor closed. We now see a very simple example of this.

Example 148 The set $A = [0, 1) \subseteq \mathbb{R}$ is neither open nor closed. Indeed, $\operatorname{int} A = (0, 1) \neq A$ and $\overline{A} = [0, 1] \neq A$. ▲

There is a case in which the duality of open and closed sets assumes a curious appearance.

Example 149 The empty set $\emptyset$ and the entire $\mathbb{R}^n$ are simultaneously open and closed. By Theorem 145, it is sufficient to show that $\mathbb{R}^n$ is both open and closed. But this is obvious. Indeed, $\mathbb{R}^n$ is open since, trivially, each of its points is interior, and it is closed because $\mathbb{R}^n$ necessarily coincides with its own closure. It is possible to show that $\emptyset$ and $\mathbb{R}^n$ are the only sets with this double personality. ▲

We go back to the notion of closure $\overline{A}$. The next result shows how it can equivalently be seen as the addition to the set $A$ of its limit points $A'$. In other terms, adding the borders is equivalent to adding the limit points.

Proposition 150 We have $\overline{A} = A \cup A'$.

Proof We have to prove that $A \cup A' = A \cup \partial A$. We start by showing that $A \cup A' \subseteq A \cup \partial A$. Since $A \subseteq A \cup \partial A$, we have to prove that $A' \subseteq A \cup \partial A$. Let $x \in A'$. By what we have observed after the proof of Lemma 130, $x$ is an interior point or a boundary point, and hence $x \in A \cup \partial A$.
It remains to show that $A \cup \partial A \subseteq A \cup A'$. Since $A \subseteq A \cup A'$, we have to prove that $\partial A \subseteq A \cup A'$. Let $x \in \partial A$. If $x$ is an isolated point, then by definition $x \in A$. Otherwise, by Lemma 130, $x$ is a limit point for $A$, that is, $x \in A'$. Hence, $x \in A \cup A'$.

From the equivalence just shown it follows, as a corollary, that a set is closed if and only if it contains all its limit points. It is a remarkable equivalence.

Corollary 151 A subset $A$ of $\mathbb{R}^n$ is closed if and only if it contains all its limit points.

Proof Let $A$ be closed. By definition, $\overline{A} = A$ and hence, thanks to Proposition 150, $A \cup A' = A$, that is, $A' \subseteq A$. Vice versa, if $A' \subseteq A$, then obviously $A \cup A' = A$. By Proposition 150, we have $\overline{A} = A \cup A' = A$.

Example 152 The inclusion $A' \subseteq A$ in Corollary 151 can be strict, in which case the set $A \setminus A'$ consists of the isolated points of $A$. For example, let $A = [0, 1] \cup \{-1, 4\}$. Then $A$ is closed and $A' = [0, 1]$. Hence $A'$ is strictly included in $A$ and the set $A \setminus A' = \{-1, 4\}$ consists of the isolated points of $A$. ▲

As we have already observed, it always holds that

$$\operatorname{int} A \subseteq A \subseteq \overline{A} \tag{5.4}$$

The next result shows the topological importance of these inclusions.

Proposition 153 Given any set $A$ in $\mathbb{R}^n$, we have:

(i) $\operatorname{int} A$ is the largest open set contained in $A$;

(ii) $\overline{A}$ is the smallest closed set that contains $A$.

The set of interior points $\operatorname{int} A$ is therefore the largest open set that approximates $A$ “from inside”, while the closure $\overline{A}$ is the smallest closed set that approximates $A$ “from outside”. The relation (5.4) is therefore the best topological sandwich, with lower open slice and upper closed slice, that we can have for the set $A$.^5

It is now easy to prove an interesting and intuitive property of the boundary of a set.

Corollary 154 The boundary of any set in $\mathbb{R}^n$ is a closed set.

Proof Let $A$ be any set in $\mathbb{R}^n$. Since the points exterior to $A$ are interior to its complement, we have

$$(\partial A)^c = \operatorname{int} A \cup \operatorname{int} A^c$$

and hence $\partial A$ is closed because $\operatorname{int} A$ and $\operatorname{int} A^c$ are open and, as we will see in Theorem 156, a union of open sets is open.

The next result, whose proof is left to the reader, shows that the difference between the closure and the interior of a set is given by its boundary points.

Proposition 155 For each subset $A \subseteq \mathbb{R}^n$ we have $\partial A = \overline{A} \setminus \operatorname{int} A$.

The result makes precise the intuition that open sets are sets without borders. Indeed, Proposition 155 implies that $A$ is open if and only if $\partial A \cap A = \emptyset$. On the other hand, by definition, a set is closed if and only if $\partial A \subseteq A$, that is, when it includes the borders.

5.5 Set-theoretical stability


We have seen in Theorem 145 that the set operation of complementation plays a crucial role
for open and closed sets. It is natural to ask what stability properties the open and closed
sets enjoy with respect to the other basic set operations, i.e., intersection and union.
We start by considering this issue for neighborhoods, the simplest open sets. The intersection of two neighborhoods of $x_0$ is still a neighborhood of $x_0$: indeed $B_{\varepsilon_1}(x_0) \cap B_{\varepsilon_2}(x_0)$ is nothing but the smaller of the two, i.e.,

$$B_{\varepsilon_1}(x_0) \cap B_{\varepsilon_2}(x_0) = B_{\min\{\varepsilon_1, \varepsilon_2\}}(x_0)$$

The same is true for intersections of a finite number of neighborhoods. It is, however, no longer true for intersections of infinitely many neighborhoods: for example,

$$\bigcap_{n=1}^{\infty} B_{1/n}(x_0) = \{x_0\} \tag{5.5}$$

i.e., this intersection reduces to the singleton $\{x_0\}$, which is closed, as observed in Example 146. Therefore, the intersection of infinitely many neighborhoods might well not be open.

^5 Clearly, there are also sandwiches with a lower closed slice and an upper open slice, as the reader will see in more advanced courses.

To check (5.5), note that a point belongs to the intersection $\bigcap_{n=1}^{\infty} B_{1/n}(x_0)$ if and only if it belongs to each neighborhood $B_{1/n}(x_0)$. This is certainly true for $x_0$, and therefore $x_0 \in \bigcap_{n=1}^{\infty} B_{1/n}(x_0)$. We show that it is the unique point that satisfies this property. Suppose, by contradiction, that $y \neq x_0$ is such that $y \in \bigcap_{n=1}^{\infty} B_{1/n}(x_0)$. Since $y \neq x_0$, we have $d(x_0, y) > 0$. If we take $n$ sufficiently large, in particular

$$n > \frac{1}{d(x_0, y)}$$

then its reciprocal $1/n$ will be sufficiently small to have

$$0 < \frac{1}{n} < d(x_0, y)$$

Therefore, $y \notin B_{1/n}(x_0)$, which contradicts the assumption that $y \in \bigcap_{n=1}^{\infty} B_{1/n}(x_0)$. It follows that $x_0$ is the only point in the intersection $\bigcap_{n=1}^{\infty} B_{1/n}(x_0)$, i.e., (5.5) holds.
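The argument above is constructive: for any point other than the center it tells us which neighborhood leaves that point out. A minimal sketch of this step follows; the function name `excluding_index` and the sample points are illustrative assumptions, not part of the text.

```python
import math

def excluding_index(x0, y):
    """Return an integer n with n > 1/d(x0, y), so that y is not in B_{1/n}(x0).

    Assumes y differs from x0, as in the proof by contradiction.
    """
    d = math.dist(x0, y)
    if d == 0:
        raise ValueError("y coincides with x0, which belongs to every neighborhood")
    return math.floor(1 / d) + 1  # smallest integer strictly larger than 1/d

# Hypothetical center and a second point at distance 0.25 from it
x0 = (0.0, 0.0)
y = (0.0, 0.25)
n = excluding_index(x0, y)

# y lies outside the open ball of radius 1/n centered at x0
y_is_excluded = math.dist(x0, y) >= 1 / n
```

Here $1/d(x_0, y) = 4$, so the sketch returns $n = 5$, and indeed $1/5 < 0.25$.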

A union of neighborhoods of $x_0$ is, instead, always a neighborhood of $x_0$ or the whole space $\mathbb{R}^n$, even if the union is infinite. Indeed

$$B_{\varepsilon_1}(x_0) \cup B_{\varepsilon_2}(x_0) = B_{\max\{\varepsilon_1, \varepsilon_2\}}(x_0)$$

is nothing but the larger of the two. More generally, in the case of infinitely many neighborhoods $B_{\varepsilon_i}(x_0)$, if $\sup_i \varepsilon_i < +\infty$ we set $\varepsilon = \sup_i \varepsilon_i$, so that

$$\bigcup_{i=1}^{\infty} B_{\varepsilon_i}(x_0) = B_\varepsilon(x_0)$$

For example,

$$\bigcup_{n=1}^{\infty} B_{1/n}(x_0) = B_1(x_0)$$

When, on the contrary, $\sup_i \varepsilon_i = +\infty$, we have

$$\bigcup_{i=1}^{\infty} B_{\varepsilon_i}(x_0) = \mathbb{R}^n$$

For example,

$$\bigcup_{n=1}^{\infty} B_n(x_0) = \mathbb{R}^n$$

In any case, we always get an open set.

Finite intersections of neighborhoods are therefore neighborhoods, while arbitrary unions of neighborhoods are open sets (neighborhoods when the radii are bounded). The next result shows that these stability properties hold for all open sets.

Theorem 156 (i) The intersection of any finite family of open sets is open. (ii) The union of any family (finite or not) of open sets is open.

Proof (i) Let $A = \bigcap_{i=1}^{n} A_i$, with all $A_i$ open sets. Each $x \in A$ belongs to all the $A_i$ and it is interior to all of them (because they are open), i.e., there exist neighborhoods of $x$, $B_{\varepsilon_i}(x) \subseteq A_i$. We call $B$ their intersection, $B = \bigcap_{i=1}^{n} B_{\varepsilon_i}(x)$: it is still a neighborhood of $x$ (with radius $\varepsilon = \min\{\varepsilon_1, \dots, \varepsilon_n\}$) and, even more so, $B \subseteq A_i$ for each $i = 1, 2, \dots, n$. But then $B \subseteq A$ and it is a neighborhood of $x$ all contained in $A$. Therefore, $A$ is open.

(ii) Let $A = \bigcup_{\alpha} A_\alpha$, where $\alpha$ runs over a finite or infinite set. Each $x \in A$ belongs to at least one among the $A_\alpha$'s, call it $A_{\bar{\alpha}}$. Since all the $A_\alpha$'s are open, there exists a neighborhood of $x$ contained in $A_{\bar{\alpha}}$ and hence, even more so, in $A$. Therefore, $x$ is interior to $A$ and, given the arbitrariness of $x$, $A$ is open.

By Theorem 145 and the De Morgan laws, it is easy to prove that dual properties hold for closed sets: closedness is preserved by all intersections, but only by finite unions.

Corollary 157 The union of any finite family of closed sets is closed. The intersection of any family (finite or not) of closed sets is closed.

5.6 Compact sets


This section is short, yet quite important. First of all, we introduce the concept of bounded set. For sets of the real line the notion has already been introduced with Definition 29: a set $A$ in $\mathbb{R}$ is bounded when it is both bounded from below and bounded from above. As the reader can easily verify, this is equivalent to the existence of a constant $K > 0$ such that $-K < x < K$ for every $x \in A \subseteq \mathbb{R}$, that is,

$$|x| < K \qquad \forall x \in A$$

The next definition is the natural extension of this idea to $\mathbb{R}^n$, where the absolute value is replaced by the more general notion of norm.

Definition 158 A subset $A$ of $\mathbb{R}^n$ is bounded if there exists $K > 0$ such that

$$\|x\| < K \qquad \forall x \in A$$

By recalling that $\|x\|$ is the distance of $x$ from the origin, it is easily seen that a set $A$ is bounded if there exists $K > 0$ such that $d(x, 0) < K$ for every $x \in A$. In other words, $A$ is bounded if there exists a neighborhood $B_K(0)$ of the origin that contains it, i.e., $A \subseteq B_K(0)$. It is immediate to see that all the neighborhoods $B_\varepsilon(x)$ are bounded sets, as are their closures (5.3): it is sufficient to take any $K > \|x\| + \varepsilon$, since $\|y\| \le \|x\| + d(x, y)$ for every $y$. On the contrary, the interval $(a, +\infty)$ is a simple example of unbounded set (for this reason it is called unbounded interval).

Using boundedness, we can define a class of closed sets that turns out to be very important for applications.

Definition 159 A subset $A$ of $\mathbb{R}^n$ is called compact if it is closed and bounded.



For example, all the closed and bounded intervals of $\mathbb{R}$ are compact.^6 More generally, the closure $\overline{B_\varepsilon(x_0)}$ of any neighborhood $B_\varepsilon(x_0)$ in $\mathbb{R}^n$ is compact. For example, the set

$$\overline{B_1(0)} = \{(x_1, \dots, x_n) \in \mathbb{R}^n : x_1^2 + \cdots + x_n^2 \le 1\}$$

is compact in $\mathbb{R}^n$. This classical set of $\mathbb{R}^n$ is called closed unit ball. The reason for this terminology is evident in the special case $n = 2$:

[Figure: the closed unit ball in the plane with axes $x_1$ and $x_2$]

Like closedness, compactness is stable under finite unions and arbitrary intersections, as we leave to the reader to prove.^7

Example 160 The finite sets $A = \{x_1, x_2, \dots, x_n\}$, and in particular the singletons, are compact sets. Indeed, in Example 146 we showed that they are closed sets. Since they are obviously bounded, they are compact. ▲

Example 161 Budget sets are a fundamental example of compact sets in consumer theory, as Proposition 670 will show. ▲

5.7 Closure and convergence


In this section we present an important characterization of closed sets by means of sequences, which will be introduced in Chapter 8.^8

Theorem 162 A set $C$ in $\mathbb{R}^n$ is closed if and only if it contains the limit of every convergent sequence of its points. That is, $C$ is closed if and only if

$$\{x_n\} \subseteq C, \ x_n \to x \implies x \in C \tag{5.6}$$

^6 The empty set $\emptyset$, however trivial, is considered a compact set.

^7 With regard to this, the reader can observe that, since the empty set is compact, the intersection of two disjoint compact sets is the empty (compact) set.

^8 Therefore, the section can be skipped at first, and read only after having studied sequences.

Proof “Only if”. Let $C$ be closed. Let $\{x_n\} \subseteq C$ be a sequence such that $x_n \to x$. We want to show that $x \in C$. Suppose, by contradiction, that $x \notin C$. Since $x_n \to x$, for every $\varepsilon > 0$ there exists $n_\varepsilon \ge 1$ such that $x_n \in B_\varepsilon(x)$ for every $n \ge n_\varepsilon$. Since $x_n \in C$ and $x \notin C$, each such $x_n$ is a point of $C$ different from $x$. Therefore, $x$ is a limit point for $C$, which contradicts $x \notin C$ because $C$ is closed and so contains all its limit points.
“If”. Let $C$ be a set for which property (5.6) holds. By contradiction, let $C$ be non-closed. Then there exists at least one boundary point $x$ of $C$ that does not belong to $C$. As it cannot be isolated (otherwise it would belong to $C$), by Lemma 130 $x$ is a limit point for $C$. Each neighborhood $B_{1/n}(x)$ therefore contains a point of $C$: call it $x_n$. The sequence of such $x_n$'s converges to $x \notin C$, contradicting (5.6). Hence, $C$ is closed.

This property is very important: a set is closed if and only if “it is closed with respect to the limit operation”, that is, if by taking limits of sequences we never leave the set. The property is natural in economics: a set is closed if (and only if), whenever it is possible to get arbitrarily close to a point while staying in the set, the point must belong to the set.
In a concrete problem it would be very strange if, with points of the set, one could get arbitrarily close to a point $x$ without being able to reach it: it would be like licking the window of a confectioner without being able to reach the pastries (very close, yet unreachable). For this reason the sets that appear in economic models are almost always closed.

Example 163 Consider the closed interval $C = [a, b]$. We show that it is closed using Theorem 162. Let $\{x_n\} \subseteq C$ be such that $x_n \to x \in \mathbb{R}$. Thanks to Theorem 162, to show that $C$ is closed, it is sufficient to show that $x \in C$. Since $a \le x_n \le b$, a simple application of the comparison criterion shows that $a \le x \le b$, that is, $x \in C$. ▲

Example 164 Consider the rectangle $C = [a, b] \times [c, d]$ in $\mathbb{R}^2$. Let $\{x^k\} \subseteq C$ be such that $x^k \to x \in \mathbb{R}^2$. By Theorem 162, to show that $C$ is closed it is sufficient to show that $x = (x_1, x_2) \in C$. By (8.41), $x^k \to x$ implies $x_1^k \to x_1$ and $x_2^k \to x_2$. Since $x_1^k \in [a, b]$ and $x_2^k \in [c, d]$ for every $k$, again a simple application of the comparison criterion shows that $x_1 \in [a, b]$ and $x_2 \in [c, d]$, that is, $x \in C$. ▲
Chapter 6

Functions

6.1 The concept


Consider a greengrocer who at the vegetable market faces the following table, which lists the unit price of a kilogram of walnuts for the various quantities (measured in kilograms) of walnuts that can be purchased from his wholesaler:

Quantity    Price per kg
10 kg       4 euros
20 kg       3.9 euros
30 kg       3.8 euros
40 kg       3.7 euros

In other words, if the greengrocer buys 10 kg of walnuts he will pay 4 euros per kg, if he buys 20 kg he will pay 3.9 euros per kg, and so on (we are assuming that the larger the quantity purchased, the lower the unit price).
The table is an example of a supply function, which associates to each quantity the corresponding purchase price: $A = \{10, 20, 30, 40\}$ is the set of the quantities and $B = \{4, 3.9, 3.8, 3.7\}$ is the set of their unit prices; the supply function is a rule that associates to each element of the set $A$ an element of the set $B$.
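The walnut table can be written down directly as a finite mapping. The sketch below is only an illustration: it stores the table as a Python dictionary (an assumed representation, with the hypothetical names `supply` and `unit_price`), whose keys form the set $A$ and whose values form the set $B$.

```python
# The supply table: quantity in kg -> unit price in euros per kg
supply = {10: 4.0, 20: 3.9, 30: 3.8, 40: 3.7}

A = set(supply)           # the set of quantities
B = set(supply.values())  # the set of unit prices

def unit_price(quantity):
    """Return the unit price associated with a quantity listed in the table."""
    return supply[quantity]  # raises KeyError for quantities not in A

price_for_20 = unit_price(20)
```

A dictionary cannot hold two values under the same key, so each quantity automatically receives one, and only one, price.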

In general, we have

Definition 165 Given any two sets $A$ and $B$, a function defined on $A$ and with values in $B$, denoted by $f : A \to B$, is a rule that associates to each element of the set $A$ one, and only one, element of the set $B$.

To denote that $f$ associates the element $b \in B$ to the element $a \in A$ we write

$$b = f(a)$$

The rule can be completely arbitrary; what matters is only that it associates to each element $a$ of $A$ only one element $b$ of $B$.^1

The arbitrariness of the rule is the crucial aspect of the notion of function. It is one of the fundamental ideas of mathematics, at which mathematicians arrived relatively recently: the notion of function considered above was introduced in 1829 by Dirichlet after about 150 years of discussions (the first ideas regarding this notion go back to Leibniz at the end of the XVII century).

Note that it is perfectly legitimate that the same element of $B$ is associated to two (or more) different elements of $A$, that is,

[Figure: arrow diagram labeled “Legitimate”]

On the contrary, it cannot happen that several elements of $B$ are associated to the same element of $A$, i.e.,

[Figure: arrow diagram labeled “Illegitimate”]

^1 We have written in italics the most important words: the rule must hold for each element of $A$ and, to each one, it must associate only one element of $B$.

Before considering some examples, we introduce a bit of terminology. The two variables, $a$ and $b$, are traditionally called the independent variable and the dependent variable, respectively. Moreover, the set $A$ is called the domain of the function, while the set $B$ is its codomain.
The codomain is the set in which the function assumes its values, but it does not necessarily consist only of such values: it can also be larger. Concerning this aspect, the next notion is important: given $a \in A$, the element $f(a) \in B$ is called the image of $a$. Given any subset $C$ of the domain $A$, the set

$$f(C) = \{f(x) : x \in C\} \subseteq B \tag{6.1}$$

of the images of the points in $C$ is called the image of $C$. In particular, the set $f(A)$ of all the images of points of the domain is called image (set) of the function $f$. It is denoted by $\operatorname{Im} f$ and it is therefore the subset of the codomain constituted by the elements that are the image of some element of the domain:

$$\operatorname{Im} f = f(A) = \{f(x) : x \in A\} \subseteq B$$

Note that each set that contains $\operatorname{Im} f$ is, indeed, a possible codomain for the function: if $\operatorname{Im} f \subseteq B$ and $\operatorname{Im} f \subseteq C$, then writing both $f : A \to B$ and $f : A \to C$ is fine. The choice of codomain is a mere question of convenience. For example, in this book, we will often consider functions taking real values, that is, $f(x) \in \mathbb{R}$ for each $x$ in the domain of $f$; in this case, the natural choice for the codomain is the entire real line and we will usually write $f : A \to \mathbb{R}$.
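The distinction between image and codomain can be seen on a finite example. The following sketch is illustrative only: the helper `image`, the finite domain and the two candidate codomains `B1` and `B2` are assumptions chosen for the example.

```python
def image(f, C):
    """Return f(C) = {f(x) : x in C} for a finite collection C."""
    return {f(x) for x in C}

def square(x):
    return x * x

A = range(-3, 4)           # finite stand-in for the domain: -3, ..., 3
im_f = image(square, A)    # the image set Im f = f(A)

B1 = set(range(0, 10))     # one admissible codomain
B2 = set(range(-5, 20))    # a larger one, equally admissible

both_are_codomains = im_f <= B1 and im_f <= B2
```

Both `B1` and `B2` contain the image, so either can serve as codomain; only the image is determined by the rule and the domain.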

Example 166 Let $A$ be the set of all countries on Earth and $B$ a set containing some colors (at least four). The function $f : A \to B$ associates to each country the color given to it on a geographic map: $\operatorname{Im} f$ is the set of the colors actually used at least once. ▲

Example 167 The rule that associates to each living human being his date of birth is a function $f : A \to B$, where $A$ is the set of the human beings and, for example, $B$ is the set of the dates of the last 150 years (a codomain sufficiently large to contain all the possible birth dates). ▲

Let us see an example of a rule that does not define a function.

Example 168 Consider the rule that associates to each positive real number $x$ both the positive square root and the negative square root (the so-called algebraic root), that is, $\{-\sqrt{x}, \sqrt{x}\}$. For example, it associates to 4 the elements $\{-2, 2\}$. This rule does not describe a function $f : \mathbb{R}_+ \to \mathbb{R}$ since to each element of the domain different from 0 two different elements of the codomain are associated. ▲
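On a finite domain, the "one, and only one" requirement can be checked mechanically. The sketch below is a hypothetical illustration: a rule is represented as a list of pairs $(a, b)$, and the helper `is_function` (an assumed name) tests whether every element of the domain is paired with exactly one element.

```python
import math

def is_function(pairs, domain):
    """True if every a in `domain` is associated with exactly one b in `pairs`."""
    images = {}
    for a, b in pairs:
        images.setdefault(a, set()).add(b)
    return all(len(images.get(a, ())) == 1 for a in domain)

# Perfect squares, so that the integer square root is exact
domain = [0, 1, 4, 9]

# The algebraic root: both the positive and the negative square root
both_roots = [(x, s * math.isqrt(x)) for x in domain for s in (1, -1)]

# The arithmetic root: only the positive square root
positive_root = [(x, math.isqrt(x)) for x in domain]

algebraic_is_function = is_function(both_roots, domain)
arithmetic_is_function = is_function(positive_root, domain)
```

The algebraic root fails the test because every nonzero element receives two images, whereas the arithmetic root passes it.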

The main classes of functions that we will consider are:

(i) $f : A \subseteq \mathbb{R} \to \mathbb{R}$, real-valued functions of a real variable, called functions of a single variable or scalar functions.^2
(ii) $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$, real-valued functions of $n$ real variables, called functions of several variables or vector functions.
(iii) $f : A \subseteq \mathbb{R} \to \mathbb{R}^m$, vector-valued functions of a real variable, called curves.^3
(iv) $f : A \subseteq \mathbb{R}^n \to \mathbb{R}^m$, vector-valued functions of $n$ real variables, called operators.

We present now some classical examples of functions of one variable.

Example 169 Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = x^3$, for which the rule is to associate to each real number its cube; each real number has a unique cube, and hence the rule defines a function. Graphically:

[Figure: graph of the function $x^3$]

In particular we have $\operatorname{Im} f = f(\mathbb{R}) = \mathbb{R}$. ▲

^2 The terminology “scalar function” has the advantage of brevity, but it is less common and it can have different meanings. Accordingly, the reader must use it with some care. The same is true for the terminology “vector function”.

^3 We will rarely consider functions $f : A \subseteq \mathbb{R} \to \mathbb{R}^m$ (we mention them here for the sake of completeness) and so this specific meaning of the word “curve” will not be relevant for us in the book.

Example 170 Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = x^2$, for which the rule is to associate to each real number its square; each real number has a unique square that is certainly $\ge 0$ and hence this rule, too, defines a function with $\operatorname{Im} f = f(\mathbb{R}) = \mathbb{R}_+$. Graphically:

[Figure: graph of the function $x^2$, with the points of abscissa $-1$ and $1$ marked at height 1]

Observe how in this case the same element can correspond to two different elements of the domain: for example, $f(1) = f(-1) = 1$. ▲
Example 171 (i) Let $f : \mathbb{R}_+ \to \mathbb{R}$ be the function defined by $f(x) = \sqrt{x}$, which associates to each positive real number its (arithmetic) square root. The domain is the positive half-line and $\operatorname{Im} f = \mathbb{R}_+$. Graphically:

[Figure: graph of the function $\sqrt{x}$]

(ii) The function $f : \mathbb{R}_{++} \to \mathbb{R}$ defined by $f(x) = \log_a x$, with $a > 0$ and $a \neq 1$, which associates to each strictly positive real number its logarithm, has as domain $\mathbb{R}_{++}$. Moreover, $\operatorname{Im} f = \mathbb{R}$. Graphically:

[Figure: graph of the function $\log x$]

Example 172 (i) Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = |x|$ for every $x \in \mathbb{R}$. It is called absolute value function of $x$ (or modulus function of $x$). For this function with domain $\mathbb{R}$ we have $\operatorname{Im} f = \mathbb{R}_+$. Graphically:

[Figure: graph of the function $|x|$]

(ii) Let $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ be defined by $f(x) = 1/|x|$ for every $x \in \mathbb{R} \setminus \{0\}$. Graphically:

[Figure: graph of the function $1/|x|$]

Here the domain is $A = \mathbb{R} \setminus \{0\}$, the real line without the origin, while $\operatorname{Im} f = \mathbb{R}_{++}$. ▲

The functions of several variables $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ are of fundamental importance in economic applications. Let us provide some examples.

Example 173 (i) Let the function $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by^4

$$f(x_1, x_2) = x_1 + x_2$$

It associates to each pair $x = (x_1, x_2) \in \mathbb{R}^2$ the sum of its components; for every $x \in \mathbb{R}^2$ such sum is unique, and therefore the rule defines a function with $\operatorname{Im} f = f(\mathbb{R}^2) = \mathbb{R}$.
(ii) The function $f : \mathbb{R}^n \to \mathbb{R}$ defined by

$$f(x_1, x_2, \dots, x_n) = \sum_{i=1}^{n} x_i$$

generalizes to $\mathbb{R}^n$ the function of two variables $f(x_1, x_2) = x_1 + x_2$ (which is the special case $n = 2$). ▲

Example 174 (i) Let $f : \mathbb{R}^2_+ \to \mathbb{R}$ be defined by

$$f(x_1, x_2) = \sqrt{x_1 x_2}$$

It associates to each $x = (x_1, x_2) \in \mathbb{R}^2_+$ the square root of the product of the components; for each $x \in \mathbb{R}^2_+$ this root is unique and, therefore, the rule defines a function with $\operatorname{Im} f = \mathbb{R}_+$.
(ii) The function $f : \mathbb{R}^n_+ \to \mathbb{R}$ defined by

$$f(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} x_i^{\alpha_i}$$

with the exponents $\alpha_i > 0$ of unit sum, that is, $\sum_{i=1}^{n} \alpha_i = 1$, generalizes to $\mathbb{R}^n$ the function of two variables $f(x_1, x_2) = \sqrt{x_1 x_2}$ (which is the special case with $n = 2$ and $\alpha_1 = \alpha_2 = 1/2$). This function is widely used in economics with the name of Cobb-Douglas function. ▲

^4 To be consistent with the notation adopted for vectors, we should write $f((x_1, x_2))$; but, to ease notation, throughout the book we write $f(x_1, x_2)$.
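A direct transcription of the Cobb-Douglas formula may help fix ideas. This is a minimal sketch, not a reference implementation: the function name `cobb_douglas`, the argument names and the input checks are assumptions made for the illustration.

```python
import math

def cobb_douglas(x, alpha):
    """Return the product of x_i ** alpha_i.

    Assumes x has non-negative components and alpha has strictly
    positive components that sum to 1.
    """
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    if any(a <= 0 for a in alpha) or not math.isclose(sum(alpha), 1.0):
        raise ValueError("exponents must be positive and sum to 1")
    if any(xi < 0 for xi in x):
        raise ValueError("components of x must be non-negative")
    return math.prod(xi ** a for xi, a in zip(x, alpha))

# Special case n = 2 with both exponents equal to 1/2: sqrt(x1 * x2)
value = cobb_douglas([4.0, 9.0], [0.5, 0.5])
```

With equal exponents $1/2$ the value coincides with $\sqrt{4 \cdot 9} = 6$, the two-variable function of part (i).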

In economics the operators $f : A \subseteq \mathbb{R}^n \to \mathbb{R}^m$, too, are very important. Let us provide some analytical examples.^5

Example 175 (i) Let the function $f : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by

$$f(x_1, x_2) = (x_1, x_1 x_2) \qquad \forall (x_1, x_2) \in \mathbb{R}^2$$

For example, if $(x_1, x_2) = (2, 5)$, then $f(x_1, x_2) = (2, 2 \cdot 5) = (2, 10) \in \mathbb{R}^2$.

(ii) Let $f : \mathbb{R}^3 \to \mathbb{R}^2$ be defined by

$$f(x_1, x_2, x_3) = (2x_1^2 + x_2 + x_3, \ x_1 - x_2^4) \qquad \forall (x_1, x_2, x_3) \in \mathbb{R}^3$$

For example, if $x = (2, 5, -3)$, then

$$f(x_1, x_2, x_3) = (2 \cdot 2^2 + 5 - 3, \ 2 - 5^4) = (10, -623)$$
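The two operators of Example 175 can be evaluated directly. The sketch below simply transcribes the two rules; the function names `f_i` and `f_ii` are hypothetical labels for parts (i) and (ii).

```python
def f_i(x1, x2):
    """Operator from R^2 to R^2 of part (i): (x1, x1 * x2)."""
    return (x1, x1 * x2)

def f_ii(x1, x2, x3):
    """Operator from R^3 to R^2 of part (ii): (2*x1^2 + x2 + x3, x1 - x2^4)."""
    return (2 * x1**2 + x2 + x3, x1 - x2**4)

value_i = f_i(2, 5)        # (2, 10)
value_ii = f_ii(2, 5, -3)  # (10, -623)
```

Each input vector produces exactly one output vector, which is what makes these rules functions with values in $\mathbb{R}^2$.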

O.R. A function $f : A \to B$ is a kind of machine that transforms each element $a \in A$ into an element $b = f(a) \in B$.

[Figure: a machine with input $a$ and output $b = f(a)$]

If we insert in it any element $a \in A$, it “spits out” $f(a) \in B$. If we insert an element $a \notin A$, the machine will jam and will not produce anything. The image set $\operatorname{Im} f = f(A) \subseteq B$ is simply the “list” of all the elements that can come out from the machine.
In particular, for scalar functions the machine transforms real numbers into real numbers, for vector functions it transforms vectors of $\mathbb{R}^n$ into real numbers, for curves it transforms real numbers into vectors of $\mathbb{R}^m$, and for operators it transforms vectors of $\mathbb{R}^n$ into vectors of $\mathbb{R}^m$.
Observe that the names of the variables are altogether irrelevant: we can indifferently write $b = f(a)$, or $y = f(x)$, or $s = f(t)$, or $\beta = f(\alpha)$, etc., or also $\heartsuit = f(\diamondsuit)$: the names of the variables are simple placeholders and what counts is only the sequence of operations (almost always numerical) that lead from $a$ to $b = f(a)$. Writing $b = a^2 + 2a + 1$ is exactly the same as writing $y = x^2 + 2x + 1$, or $s = t^2 + 2t + 1$, or $\beta = \alpha^2 + 2\alpha + 1$, or even $\heartsuit = \diamondsuit^2 + 2\diamondsuit + 1$: the function (its name is $f$) is identified by the operations “square + double + 1” that allow us to pass from the independent variable to the dependent one. ▼

^5 In Section 14.1 we will see some economic examples of such functions.

We close this introductory section by making rigorous the notion of graph of a function, until now used at an intuitive level. For the parabola $x^2$ the graph

[Figure: graph of the parabola $x^2$, with the points of abscissa $-1$ and $1$ marked at height 1]

is the locus of the points $(x, f(x))$ of the plane, when $x$ varies in the domain of the function. For example, the points $(-1, 1)$, $(0, 0)$ and $(1, 1)$ belong to the graph of the parabola.

Definition 176 The graph $\operatorname{Gr} f$ of the function $f : A \to B$ is the set

$$\operatorname{Gr} f = \{(x, f(x)) : x \in A\} \subseteq A \times B$$

The graph is therefore a subset of the Cartesian product $A \times B$. In particular:

(i) When $A \subseteq \mathbb{R}$ and $B = \mathbb{R}$, the graph is a subset of the plane $\mathbb{R}^2$. Geometrically it is a curved line (without thickness) in $\mathbb{R}^2$ given that to each $x \in A$ there corresponds a unique $f(x)$.

[Figure: a curve in $\mathbb{R}^2$]

(ii) When $A \subseteq \mathbb{R}^2$ and $B = \mathbb{R}$, the graph is a subset of the three-dimensional space $\mathbb{R}^3$, i.e., a surface (without thickness).

[Figure: a surface in $\mathbb{R}^3$]
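For a finite domain the graph of Definition 176 is just a finite set of pairs, which can be listed explicitly. The sketch below assumes a small integer domain and uses the hypothetical helper `graph`; it is meant only to illustrate the definition on the parabola.

```python
def graph(f, A):
    """Return Gr f = {(x, f(x)) : x in A} for a finite domain A."""
    return {(x, f(x)) for x in A}

def parabola(x):
    return x * x

A = range(-2, 3)  # finite stand-in for the domain: -2, ..., 2
gr = graph(parabola, A)

# Each x of the domain appears exactly once as a first coordinate
first_coordinates = sorted(x for x, _ in gr)
```

The points $(-1, 1)$, $(0, 0)$ and $(1, 1)$ mentioned in the text all belong to `gr`, and no first coordinate is repeated, reflecting the uniqueness of $f(x)$.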

6.2 Applications
6.2.1 Static choices
Let us assume that, as in Section 2.4.1, the vectors in $\mathbb{R}^n_+$ have the meaning of bundles of goods. It is natural to think that the consumer will prefer some bundles to others. For example, it is reasonable to assume that, if $x \ge y$ ($x$ is “more abundant” than $y$), $x$ is preferred to $y$. In symbols, we write $x \succsim y$, where the symbol $\succsim$ represents the preference relation of the consumer on the bundles.
In general, we assume that the preference $\succsim$ on the various bundles of goods can be represented by a function $u : \mathbb{R}^n_+ \to \mathbb{R}$, called utility function, such that the bundle $x$ is preferred to $y$ if and only if $u(x) \ge u(y)$, i.e.,

$$x \succsim y \iff u(x) \ge u(y) \tag{6.2}$$

Originally, around 1870, the first marginalists (in particular Jevons, Menger and Walras) interpreted $u(x)$ as the level of welfare/physical satisfaction produced by the bundle $x$. They therefore gave a physiological interpretation of the utility functions, which quantified the emotions that the consumer felt in owning different bundles. In the so-called cardinalist interpretation of the utility functions that goes back to Jeremy Bentham and to his “pain and pleasure calculus”,^6 the utility functions, besides representing the preference $\succsim$, are inherently interesting because they quantify an emotive state of the consumer, his degree of pleasure induced by the bundles. In addition to the comparison $u(x) \ge u(y)$, it is also legitimate to compare the differences

$$u(x) - u(y) \ge u(z) - u(w) \tag{6.3}$$

which indicates that bundle $x$ is preferred to $y$ at least as intensely as bundle $z$ is preferred to $w$. Moreover, since $u(x)$ measures the degree of pleasure that the consumer gets from the bundle $x$, in the cardinalist interpretation it is also legitimate to compare these measures among different consumers, i.e., to make interpersonal comparisons of utility.

The cardinalist interpretation came into question at the end of the XIX century due to the impossibility of measuring in an experimental way the supposed physiological aspects that lie at the basis of utility functions.^7 For this reason, with the works of Vilfredo Pareto at the beginning of the XX century, developed first by Eugen Slutsky in 1915 and then by John Hicks in the 1930s,^8 the ordinalist interpretation of the utility functions prevailed: more modestly, it is assumed that they are only a mere numerical representation of the preference $\succsim$ of the consumer. According to such an interpretation, what counts is only that the ordering $u(x) \ge u(y)$ represents the preference for bundle $x$ over bundle $y$, that is, $x \succsim y$. On the other hand, it is of no interest to know if it also represents the, more or less intense, consumer’s emotions. In other terms, in the ordinalist approach the fundamental notion is the one of preference $\succsim$, while the utility function is a mere numerical representation of it. The comparisons of intensity (6.3) or the interpersonal comparisons of utility no longer have meaning.
At the empirical level, the consumer’s preference $\succsim$ is revealed in the choices among bundles, which are much simpler to observe than emotions or other mental states.

^6 See his Introduction to the Principles of Morals and Legislation, published in 1789.

^7 Around 1901, the famous mathematician Henri Poincaré wrote to Walras: “I can say that one satisfaction is greater than another, since I prefer one to the other, but I cannot say that the first satisfaction is two or three times greater than the other.” Even if he did not have great economic knowledge, Poincaré, with great sensibility, understood the main point.

^8 The interested reader can read G. J. Stigler, The development of utility theory I, II, Journal of Political Economy, 58, 307–327 and 373–396, 1950.

The ordinalist interpretation established itself as the standard one because, besides the superior empirical content just mentioned, the works of Pareto showed that it is sufficient for developing consumer theory. Nevertheless, at an intuitive level many economists continue to use cardinalist categories because of their introspective plausibility. In any case, thanks to the utility functions we can address the problem of a consumer who has to choose a bundle in an assigned set $A$ of $\mathbb{R}^n_+$. The consumer will be guided in such a choice by his utility function $u : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$; namely, $u(x) \ge u(y)$ indicates that the consumer prefers the bundle $x$ of goods to the bundle $y$ or that he is indifferent between the two. The image $\operatorname{Im} u$ represents all the levels of utility that can be obtained by the consumer.
For example,

$$u(x) = \sum_{i=1}^{n} x_i$$

is the utility function of a consumer who orders the bundles simply according to the sum of the quantities of the goods that they contain. The classic Cobb-Douglas utility function is

$$u(x) = \prod_{i=1}^{n} x_i^{\alpha_i}$$

with the exponents $\alpha_i > 0$ such that $\sum_{i=1}^{n} \alpha_i = 1$ (see Example 174). When $\alpha_i = 1/n$ for each $i$, we have

$$u(x) = \prod_{i=1}^{n} (x_i)^{\frac{1}{n}} = \left( \prod_{i=1}^{n} x_i \right)^{\frac{1}{n}}$$

so that the bundles are ordered according to the $n$-th root of the product of the quantities of the goods that they contain.^9

Going back instead to Section 2.4.1, let us consider a producer who has to decide how much output to produce. In such a decision the so-called production function $f : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$ plays a crucial role. The production function describes how much output $f(x)$ is obtained starting from a vector $x \in \mathbb{R}^n$ of inputs. For example,

$$f(x) = \left( \prod_{i=1}^{n} x_i \right)^{\frac{1}{n}}$$

is the Cobb-Douglas production function in which the output is equal to the $n$-th root of the product of the input components.

^9 Note that, by an obvious property of the product, all the bundles with at least one zero component $x_i$ have 0 utility. From an economic viewpoint, it is not really plausible to think that the presence of one zero component has such drastic consequences. For this reason, it is often preferred to define the Cobb-Douglas function only on $\mathbb{R}^n_{++}$, and we will do so.

6.2.2 Intertemporal choices


As in Section 2.4.2, we usually assume that the consumer, on the possible intertemporal consumption profiles $x = (x_1, x_2, \dots, x_T)$, has preferences quantified by an intertemporal utility function $U : A \subseteq \mathbb{R}^T \to \mathbb{R}$. For example, let us assume that the consumer has a utility function $u_t : A \subseteq \mathbb{R} \to \mathbb{R}$, called instantaneous, for the consumption $x_t$ of each period. In this case a possible form of the intertemporal utility function is

$$U(x) = u_1(x_1) + \beta u_2(x_2) + \cdots + \beta^{T-1} u_T(x_T) = \sum_{t=1}^{T} \beta^{t-1} u_t(x_t) \tag{6.4}$$

where $\beta \in (0, 1)$ is a subjective discount factor that depends on how “patient” the consumer is. The more patient the consumer, i.e., the more he is willing to postpone his consumption of a given quantity of the good, the higher the value of $\beta$. The closer $\beta$ gets to 1, the closer we approach the form

$$U(x) = u_1(x_1) + u_2(x_2) + \cdots + u_T(x_T) = \sum_{t=1}^{T} u_t(x_t)$$

in which the consumption in each period is evaluated in an identical way. On the contrary, the closer $\beta$ gets to 0, the closer $U(x)$ gets to $u_1(x_1)$, that is, the consumer becomes extremely impatient and does not attach importance to future consumption.
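The effect of the discount factor in (6.4) is easy to explore numerically. The following is a sketch under stated assumptions: the function name `intertemporal_utility`, the choice of identical instantaneous utilities in every period and the sample consumption profile are all hypothetical.

```python
def intertemporal_utility(x, utilities, beta):
    """Return the discounted sum of utilities[t](x[t]) with weights beta ** t.

    The index t starts at 0 here, so beta ** t plays the role of
    beta^(t-1) in (6.4), where periods are numbered from 1.
    """
    if len(x) != len(utilities):
        raise ValueError("one instantaneous utility per period is required")
    return sum(beta**t * u(xt) for t, (u, xt) in enumerate(zip(utilities, x)))

def identity(c):
    return c  # assumed instantaneous utility, chosen for simplicity

profile = [1.0, 1.0, 1.0]   # hypothetical consumption over T = 3 periods
us = [identity] * 3

patient = intertemporal_utility(profile, us, 0.99)
moderate = intertemporal_utility(profile, us, 0.5)
impatient = intertemporal_utility(profile, us, 0.01)
```

With a discount factor close to 1 the value approaches the undiscounted sum $3$; with a discount factor close to 0 it approaches $u_1(x_1) = 1$, matching the two limiting cases described above.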

6.3 General properties


6.3.1 Preimages and level curves
The notion of preimage is dual to that of image. Let $f : A \to B$. Given a point $y \in B$, its preimage, denoted by $f^{-1}(y)$, is the set

$$f^{-1}(y) = \{x \in A : f(x) = y\}$$

of the elements of the domain whose image is $y$. More generally, given any subset $D$ of the codomain $B$, its preimage $f^{-1}(D)$ is the set

$$f^{-1}(D) = \{x \in A : f(x) \in D\}$$

of the elements of the domain whose images belong to $D$.


The next examples illustrate these notions. For the sake of brevity, we will consider as
sets D only intervals and singletons, but analogous considerations hold for other types of
sets.

Example 177 Consider the function $f : A \to B$ that associates to each person the date of
birth. If $y \in B$ is one such date, $f^{-1}(y)$ is the set of the (living) persons whose date of
birth is $y$; in other words, all the persons in $f^{-1}(y)$ have the same age. N
118 CHAPTER 6. FUNCTIONS

Example 178 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x^3$. We have $\operatorname{Im} f = \mathbb{R}$. For each $y \in \mathbb{R}$,

$$f^{-1}(y) = \left\{ y^{1/3} \right\}$$

For example, $f^{-1}(27) = \{3\}$. The preimage of a closed interval $[a, b]$ is

$$f^{-1}([a, b]) = \left[ a^{1/3}, b^{1/3} \right]$$

For example, $f^{-1}([-8, 27]) = [-2, 3]$. N

Example 179 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x^2$. We have $\operatorname{Im} f = \mathbb{R}_+$. The preimage
of each $y \geq 0$ is

$$f^{-1}(y) = \{-\sqrt{y}, \sqrt{y}\}$$

while that of each $y < 0$ is $f^{-1}(y) = \emptyset$. For simplicity, we denote the preimage of an open
interval $(a, b)$ by $f^{-1}(a, b)$ instead of $f^{-1}((a, b))$. It is

$$f^{-1}(a, b) = \begin{cases} (-\sqrt{b}, -\sqrt{a}) \cup (\sqrt{a}, \sqrt{b}) & \text{if } a \geq 0 \\ \emptyset & \text{if } b \leq 0 \\ (-\sqrt{b}, \sqrt{b}) & \text{if } a < 0 < b \end{cases}$$

Observe that in the last case, when $a < 0 < b$, we have $f^{-1}(a, b) = f^{-1}([0, b))$. This is due to
the fact that the elements between $a$ and 0 have empty preimage. For example, if $D = (-1, 2)$,
then

$$f^{-1}(D) = (-\sqrt{2}, \sqrt{2})$$

Note that

$$f^{-1}(D) = f^{-1}(-1, 2) = f^{-1}([0, 2))$$

that is, the negative elements of $D$ are irrelevant (since they do not belong to the image of
the function). N
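
The definition of preimage can be tried out on a finite domain. The following Python sketch is only illustrative: the helper `preimage` and the finite domain, used as a stand-in for the real line, are assumptions introduced here.

```python
def preimage(f, domain, target):
    """Return {x in domain : f(x) in target} for a finite domain."""
    return {x for x in domain if f(x) in target}

square = lambda x: x * x
domain = range(-3, 4)                       # finite stand-in for R: -3, ..., 3

point = preimage(square, domain, {4})       # preimage of the single value y = 4
negative = preimage(square, domain, {-1})   # a negative value has empty preimage
mixed = preimage(square, domain, {-1, 0, 1, 4})
```

As in Example 179, adding negative values to the target set does not change the preimage of the square function.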

For a function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ of several variables, resorting to an appropriate
topographic term,$^{10}$ the preimage $f^{-1}(k)$ is often called level curve of $f$ in $k$ (or of height $k$, with
$k \in \mathbb{R}$). In other words, the level curve of $f$ in $k$ is the set

$$f^{-1}(k) = \{x \in A : f(x) = k\}$$

This terminology, which expresses the idea that the points of $f^{-1}(k)$ are the points of the
domain in which the function reaches the “level” $k$, is particularly fitting in several economic
applications, as we will see shortly. The level curves are especially used for the functions
$f : \mathbb{R}^2 \to \mathbb{R}$ because in this case it is possible to give a geometric representation that may
prove illuminating.
10 The motivation is the same as the one that leads to representing mountains on a geographic map
through the so-called isohypses, i.e., the ideal lines that connect all the points at the same altitude above
sea level. For functions of two variables, the problem is exactly the same: it is possible to represent a
surface in $\mathbb{R}^3$ through the lines that join the points $(x_1, x_2)$ for which the function assumes the same value $k$.

Example 180 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by $f(x_1, x_2) = x_1^2 + x_2^2$. For every $k \geq 0$, the level
curve $f^{-1}(k)$ is the locus in $\mathbb{R}^2$ of equation

$$x_1^2 + x_2^2 = k$$

i.e., it is the circle with center in the origin and radius $\sqrt{k}$. Graphically, the level curves can
therefore be represented as:

[Figure: level curves of $f$]

while the graph of the function is:

[Figure: graph of $f$, with axes $x_1$, $x_2$, $x_3$]

Two different level curves of the same function cannot have any point in common, that
is,

$$f^{-1}(k_1) \cap f^{-1}(k_2) = \emptyset \tag{6.5}$$

if $k_1 \neq k_2$. Indeed, if there were a point $x$ that belonged to the two curves of levels
$k_1$ and $k_2$, one would have $f(x) = k_1$ and $f(x) = k_2$ with $k_1 \neq k_2$, but this is impossible
because, by definition, a function assumes only one value at each point.

Example 181 Let $f : A \subseteq \mathbb{R}^2 \to \mathbb{R}$ be given by $f(x_1, x_2) = \sqrt{7x_1^2 - x_2}$. For every $k \geq 0$,
the level curve $f^{-1}(k)$ is the locus in $\mathbb{R}^2$ of equation $\sqrt{7x_1^2 - x_2} = k$, that is, $x_2 = -k^2 + 7x_1^2$.
It is a parabola that intersects the vertical axis at $-k^2$. Graphically:

[Figure: the level curves for $k = 0$, $k = 1$, and $k = 2$]

Example 182 The function $f : \mathbb{R}_{++} \times \mathbb{R} \to \mathbb{R}$ given by

$$f(x_1, x_2) = \sqrt{\frac{x_1^2 + x_2^2}{x_1}}$$

is defined only for $x_1 > 0$. Its level curves have equation

$$\sqrt{\frac{x_1^2 + x_2^2}{x_1}} = k$$

that is, $x_1^2 + x_2^2 - k^2 x_1 = 0$, and therefore they are circles passing through the origin and
with centres $(k^2/2, 0)$, all on the horizontal axis. Graphically:

[Figure: level curves of $f$]

Note that, although all such circles have the origin as common point, the “true” level curves
are the circles without the origin (because the function is not defined at $(0, 0)$), and so they
cannot have any point in common. N

O.R. We limit ourselves to functions of two variables. The generic level curve of $f$ has
equation

$$f(x_1, x_2) = k$$

It can be rewritten, in an apparently more complicated form, as

$$\begin{cases} y = f(x_1, x_2) \\ y = k \end{cases}$$

but this recasting exhibits well its geometric meaning:

(i) the equation $y = f(x_1, x_2)$ represents a surface in $\mathbb{R}^3$;

(ii) the equation $y = k$ represents a horizontal plane (it contains the points $(x_1, x_2, k) \in
\mathbb{R}^3$, i.e., all the points of “height” $k$);

(iii) the brace $\{$ geometrically means intersection between the sets defined by the two
equations.

The curve of level $k$ appears therefore as the intersection between the surface that
represents $f$ and a horizontal plane.

[Figure: Level curve of a generic surface]

Hence, the various level curves are obtained by cutting the surface with horizontal
planes (at various levels) and representing the edges of the “slices” obtained in this
way on the plane $(x_1, x_2)$. H

Indifference curves
We see now a classical economic application of the level curves. Given a utility function
$u : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$, the level curves

$$u^{-1}(k) = \{x \in A : u(x) = k\}$$

are called indifference curves. In other words, an indifference curve is formed by all the
bundles $x \in \mathbb{R}^n_+$ that have the same utility $k$, and are therefore indifferent for the consumer.
The set $\{u^{-1}(k) : k \in \mathbb{R}\}$ of all the indifference curves is sometimes called indifference map.
Example 183 Consider the simple Cobb-Douglas utility function $u : \mathbb{R}^2_+ \to \mathbb{R}$ given by
$u(x) = (x_1 x_2)^{1/2}$. For every $k > 0$ we have

$$u^{-1}(k) = \left\{x \in \mathbb{R}^2_+ : (x_1 x_2)^{1/2} = k\right\} = \left\{x \in \mathbb{R}^2_+ : x_1 x_2 = k^2\right\} = \left\{x \in \mathbb{R}^2_+ : x_2 = \frac{k^2}{x_1}\right\}$$

Therefore, the indifference curve of level $k$ is the hyperbola of equation

$$x_2 = \frac{k^2}{x_1}$$

When $k > 0$ varies we get the indifference map $\{u^{-1}(k)\}_k$, i.e.,

[Figure: the indifference curves for $k = 1$, $k = 2$, and $k = 3$]

Note that the property of the indifference curves being disjoint is nothing but a special
case of property (6.5), valid for any family of level curves.

For a production function $f : A \subseteq \mathbb{R}^n_+ \to \mathbb{R}$, the level curves

$$f^{-1}(k) = \{x \in A : f(x) = k\}$$

are called isoquants. In other words, an isoquant is the set of all the input vectors $x \in \mathbb{R}^n_+$
that produce the same output. The set $\{f^{-1}(k) : k \in \mathbb{R}\}$ of all the isoquants is sometimes
called isoquant map.
Finally, for a cost function $c : A \subseteq \mathbb{R}_+ \to \mathbb{R}$, the level curves

$$c^{-1}(k) = \{x \in A : c(x) = k\}$$

are called isocosts. In other words, an isocost is the set of all the levels of output $x \in A$ that
have the same cost. The set $\{c^{-1}(k) : k \in \mathbb{R}\}$ of all the isocosts is sometimes called isocost
map.

Indifference curves, isoquants and isocosts are all examples of level curves, whose
properties they inherit. For example, the fact that two level curves have no points in common
(property (6.5)) implies the analogous classical property of the indifference curves, as already
observed.
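
The hyperbola of Example 183 can be checked numerically. The Python sketch below is a hedged illustration only: the helper names `cobb_douglas` and `indifference_curve`, as well as the sample values of $x_1$, are assumptions made here.

```python
import math

def cobb_douglas(x1, x2):
    """Cobb-Douglas utility u(x) = (x1 * x2) ** (1/2) of Example 183."""
    return math.sqrt(x1 * x2)

def indifference_curve(k, x1_values):
    """Points (x1, k**2 / x1) on the indifference curve of level k > 0."""
    return [(x1, k ** 2 / x1) for x1 in x1_values]

# Hypothetical sample of first components; every resulting bundle has utility 2
curve = indifference_curve(k=2.0, x1_values=[0.5, 1.0, 2.0, 4.0])
utilities = [cobb_douglas(x1, x2) for x1, x2 in curve]
```

Every bundle generated in this way lies on the same indifference curve, so the list `utilities` contains (up to rounding) only the level $k$.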

6.3.2 Algebra of functions


Given two sets $A$ and $B$, we denote by $B^A$ the set of all functions $f : A \to B$.$^{11}$
Given any set $A$, consider the set $\mathbb{R}^A$ of all real-valued functions $f : A \to \mathbb{R}$. In this
set we can define in a natural way some operations that associate to any two functions in
$\mathbb{R}^A$ a new function still in $\mathbb{R}^A$.
11 A
Sometimes we use the notation B instead of B A .

Definition 184 Given any two functions $f$ and $g$ in $\mathbb{R}^A$, the function $f + g$ is the element
of $\mathbb{R}^A$ for which

$$(f + g)(x) = f(x) + g(x) \quad \forall x \in A$$

The sum function $f + g : A \to \mathbb{R}$ is hence built by adding, for each element $x$ of the domain
$A$, the images $f(x)$ and $g(x)$ of $x$ under the two functions.

Example 185 Let $\mathbb{R}^{\mathbb{R}}$ be the set of all the functions $f : \mathbb{R} \to \mathbb{R}$. Consider $f(x) = x$ and
$g(x) = x^2$. The sum function $f + g$ is defined by $(f + g)(x) = x + x^2$. N

In a similar way we define:

(i) the difference function $(f - g)(x) = f(x) - g(x)$ for every $x \in A$;

(ii) the product function $(fg)(x) = f(x) g(x)$ for every $x \in A$;

(iii) the ratio function $(f/g)(x) = f(x)/g(x)$ for every $x \in A$, provided $g(x) \neq 0$.

We have introduced four operations in the set $\mathbb{R}^A$, based on the four basic operations on
the real numbers. It is easy to see that these operations enjoy properties analogous to those
of the basic operations. For example, the addition is commutative, that is, $f + g = g + f$,
and associative, that is, $(f + g) + h = f + (g + h)$.

N.B. (i) In Definition 184 and in that of the other operations the functions have to share
the same domain $A$. For example, if $f(x) = x^2$ and $g(x) = \sqrt{x}$, the sum $f + g$ is meaningless
because, for $x < 0$, the function $g$ is not defined. (ii) The domain $A$ is any set: numbers,
chairs, or other. On the contrary, it is essential that the codomain is $\mathbb{R}$ because it is among
real numbers that we are able to perform the four basic operations. O
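
The pointwise operations can be mirrored directly in code. The following Python sketch is a minimal illustration in which functions are represented as callables; the helper names `add`, `mul` and `ratio` are assumptions made here, not notation from the text.

```python
def add(f, g):
    """Pointwise sum: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def mul(f, g):
    """Pointwise product: (fg)(x) = f(x) * g(x)."""
    return lambda x: f(x) * g(x)

def ratio(f, g):
    """Pointwise ratio: (f/g)(x) = f(x) / g(x), meaningful only where g(x) != 0."""
    return lambda x: f(x) / g(x)

f = lambda x: x          # f(x) = x, as in Example 185
g = lambda x: x ** 2     # g(x) = x^2, as in Example 185
h = add(f, g)            # h(x) = x + x^2
```

Both functions must accept the same arguments, which corresponds to the requirement that they share the domain $A$.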

6.3.3 Composition
Consider two functions $f : A \to B$ and $g : C \to D$, with $\operatorname{Im} f \subseteq C$. Take any point $x \in A$.
Since $\operatorname{Im} f \subseteq C$, the image $f(x)$ belongs to the domain $C$ of the function $g$. We can apply
the function $g$ to the image $f(x)$, obtaining in such a way the element $g(f(x))$ of $D$. Indeed,
the function $g$ has as its argument the image $f(x)$ of $x$.

[Figure: $f$ maps $x \in A$ to $f(x) \in \operatorname{Im} f \subseteq C$, and $g$ maps $f(x)$ to $g(f(x)) \in D$]

We have therefore associated to each element $x$ of the set $A$ the element $g(f(x))$ of the
set $D$. This rule, called composition, starts with the functions $f$ and $g$ and defines a new
function from $A$ to $D$, denoted by $g \circ f$. Formally:

Definition 186 Let $A$, $B$, $C$ and $D$ be four sets and $f : A \to B$ and $g : C \to D$ two
functions. If $\operatorname{Im} f \subseteq C$, the composite (or compound) function $g \circ f : A \to D$ is defined by

$$(g \circ f)(x) = g(f(x)) \quad \forall x \in A$$

Note that the inclusion condition, $\operatorname{Im} f \subseteq C$, is key in making the composition possible.
Let us give some examples.

Example 187 Let $f, g \in \mathbb{R}^{\mathbb{R}}$ be given by $f(x) = x^2$ and $g(x) = x + 1$. In this case
$A = B = C = D = \mathbb{R}$, and the inclusion condition is trivially satisfied. Consider $g \circ f$. Given
$x \in \mathbb{R}$, one has $f(x) = x^2$. The function $g$ has therefore as its argument $x^2$, so that

$$g(f(x)) = g(x^2) = x^2 + 1$$

Hence, the composite function $g \circ f : \mathbb{R} \to \mathbb{R}$ is given by $(g \circ f)(x) = x^2 + 1$.

Consider instead $f \circ g$. Given $x \in \mathbb{R}$, one has $g(x) = x + 1$. The function $f$ has therefore
as its argument $x + 1$, whence

$$f(g(x)) = f(x + 1) = (x + 1)^2$$

The composite function $f \circ g : \mathbb{R} \to \mathbb{R}$ is therefore given by $(f \circ g)(x) = (x + 1)^2$. N


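Example 188 Consider $f : \mathbb{R}_+ \to \mathbb{R}$ given by $f(x) = \sqrt{x}$ and $g : \mathbb{R} \to \mathbb{R}$ given by
$g(x) = x - 1$. In this case $B = C = D = \mathbb{R}$ and $A = \mathbb{R}_+$. The inclusion condition is satisfied
for $g \circ f$ because $\operatorname{Im} f = \mathbb{R}_+ \subseteq \mathbb{R}$, but not for $f \circ g$, because $\operatorname{Im} g = \mathbb{R}$ is not included in $\mathbb{R}_+$,
which is the domain of $f$.
Let us consider $g \circ f$. Given $x \in \mathbb{R}_+$, we have $f(x) = \sqrt{x}$. The function $g$ has therefore as
its argument $\sqrt{x}$, and so

$$g(f(x)) = g(\sqrt{x}) = \sqrt{x} - 1$$

The composite function $g \circ f : \mathbb{R}_+ \to \mathbb{R}$ is given by $(g \circ f)(x) = \sqrt{x} - 1$. N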

Example 189 If in the previous example we consider $\tilde{g} : [1, +\infty) \to \mathbb{R}$ given by $\tilde{g}(x) = x - 1$,
the inclusion condition is satisfied for $f \circ \tilde{g}$, because $\operatorname{Im} \tilde{g} = [0, +\infty) = \mathbb{R}_+$. In particular,
$f \circ \tilde{g} : [1, +\infty) \to \mathbb{R}$ is given by $\sqrt{x - 1}$. As we will see soon in Section 6.7, the function $\tilde{g}$ is
the restriction of $g$ to $[1, +\infty)$. N

Example 190 Let $A$ be the set of all Italian citizens, $f : A \to \mathbb{R}$ the function that to
each of them associates his income for this year, and $g : \mathbb{R} \to \mathbb{R}$ the function that to each
possible income associates the tax that must be paid. The composite function $g \circ f : A \to \mathbb{R}$
establishes the correspondence between each Italian and the tax that he has to pay. For the
tax offices (and also for the citizens) such composite function is of great interest. N
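
The two compositions of Example 187 can be reproduced with a short Python sketch. The helper `compose` is a name introduced here for illustration; it simply applies the inner function first and the outer function second.

```python
def compose(g, f):
    """Return the composite function x -> g(f(x))."""
    return lambda x: g(f(x))

f = lambda x: x ** 2      # f(x) = x^2
g = lambda x: x + 1       # g(x) = x + 1

g_after_f = compose(g, f)   # (g o f)(x) = x^2 + 1
f_after_g = compose(f, g)   # (f o g)(x) = (x + 1)^2
```

Evaluating both composites at the same point shows that the order of composition matters: in general the two functions differ.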

6.4 Classes of functions


In this section we introduce some important classes of functions.

6.4.1 Injective, surjective, and bijective functions


Given two sets $A$ and $B$, a function $f : A \to B$ is called injective (or one-to-one) if

$$x \neq y \implies f(x) \neq f(y) \quad \forall x, y \in A \tag{6.6}$$

that is, if to different elements of the domain $f$ associates different elements of the codomain.
Graphically:

[Figure: an injective function from $A = \{a_1, a_2\}$ to $B = \{b_1, b_2, b_3\}$]

A simple example of an injective function is $f(x) = x^3$. Indeed, two distinct real numbers always
have distinct cubes, that is, $x \neq y$ implies $x^3 \neq y^3$ for every $x, y \in \mathbb{R}$. A classical example
of a non-injective function is $f(x) = x^2$: for instance, to the two distinct points $2$ and $-2$ of
$\mathbb{R}$ there corresponds the same square, that is, $f(2) = f(-2) = 4$.
Note that (6.6) is equivalent to the contrapositive:$^{12}$

$$f(x) = f(y) \implies x = y \quad \forall x, y \in A$$

which requires that two elements of the domain that have the same image be equal.

Given two sets $A$ and $B$, a function $f : A \to B$ is called surjective (or onto) if

$$\operatorname{Im} f = B$$

that is, if for each element $y$ of $B$ there exists at least one element $x$ of $A$ such that $f(x) = y$.
In other words, a function is surjective if each element of the codomain is the image of at
least one point in the domain.

Example 191 The function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^3$ is surjective because each $y \in \mathbb{R}$
is the image of $y^{1/3} \in \mathbb{R}$, that is, $f(y^{1/3}) = y$. On the other hand, the function $f : \mathbb{R} \to \mathbb{R}$
given by $f(x) = x^2$ is not surjective, because no $y < 0$ is the image of a point of the domain.
N
12 Given two properties $p$ and $q$, we have $p \implies q$ if and only if $\neg q \implies \neg p$ ($\neg$ stands for “not”). The
implication $\neg q \implies \neg p$ is the contrapositive of the original implication $p \implies q$. See Appendix C.

By recalling what we said about codomains, we note that a function $f : A \to B$ can
always be written as $f : A \to \operatorname{Im} f$, that is, it can be made surjective (it is sufficient to take
$B = \operatorname{Im} f$). For example, if we write the square function $x^2$ as $f : \mathbb{R} \to \mathbb{R}_+$, it becomes
surjective. Therefore, by choosing the codomain in a suitable way, each function becomes
surjective. This however does not mean that surjectivity is a notion without interest: as we
will see, the set $B$ is often fixed (for various reasons) a priori and it is important to distinguish
the functions that have $B$ as image, that is, the surjective ones, from those whose image is
only contained in $B$.

Finally, given two sets $A$ and $B$, a function $f : A \to B$ is called bijective if it is both
injective and surjective. In this case, we can go “back and forth” between the sets $A$ and $B$
using $f$: from any $x \in A$ we pass to a unique $y = f(x) \in B$, while from any $y \in B$ we go
back to a unique $x \in A$ such that $y = f(x)$.

[Figure: a bijective function from $A = \{a_1, a_2, a_3\}$ to $B = \{b_1, b_2, b_3\}$]

For example, the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^3$ is bijective. In the case of finite
sets we have the following simple, but interesting, result, where $|A|$ denotes the cardinality
of a finite set $A$, that is, the number of elements that belong to it.

Proposition 192 Let $A$ and $B$ be any two finite sets. There exists a bijection $f : A \to B$ if
and only if $|A| = |B|$.

Proof “If”. Let $|A| = |B| = n$ and write $A = \{a_1, a_2, \ldots, a_n\}$ and $B = \{b_1, b_2, \ldots, b_n\}$.
Then define the bijection $f : A \to B$ by $f(a_i) = b_i$ for $i = 1, 2, \ldots, n$.
“Only if”. Let $f : A \to B$ be a bijection. By injectivity, we have $|A| \leq |B|$. Indeed, to
each $x \in A$ there corresponds a distinct $f(x) \in B$. On the other hand, by surjectivity, we
have $|B| \leq |A|$. Indeed, for each $y \in B$, set $C(y) = f^{-1}(y) = \{x \in A : f(x) = y\}$, which is
non-empty because $f$ is surjective. If $y_1 \neq y_2$, we have $C(y_1) \cap C(y_2) = \emptyset$. Hence, setting
$\mathcal{C} = \{C(y) : y \in B\}$, we have $|B| = |\mathcal{C}|$. But it is easy to see that $|\mathcal{C}| \leq |A|$, whence
$|B| \leq |A|$. In conclusion, we have $|A| = |B|$.

As we will see in Chapter 7, paraphrasing a famous quote of David Hilbert, this result is
the door to the paradise of Cantor.
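
For finite sets, injectivity, surjectivity and Proposition 192 can be checked by direct enumeration. The Python sketch below is illustrative only: the helper names and the sample sets are assumptions introduced here.

```python
def is_injective(f, domain):
    """Different elements of the domain must have different images."""
    images = [f(x) for x in domain]
    return len(set(images)) == len(images)

def is_surjective(f, domain, codomain):
    """The image of f must coincide with the whole codomain."""
    return {f(x) for x in domain} == set(codomain)

def is_bijective(f, domain, codomain):
    return is_injective(f, domain) and is_surjective(f, domain, codomain)

A = [-2, -1, 0, 1, 2]                  # hypothetical finite domain
cube = lambda x: x ** 3
square = lambda x: x ** 2
B_cube = {-8, -1, 0, 1, 8}             # image of A under the cube
B_square = {0, 1, 4}                   # image of A under the square
```

On these sets the cube is a bijection between two sets with five elements, whereas the square is surjective onto a set with only three elements and so cannot be injective.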

6.4.2 Inverse functions


Given two sets $A$ and $B$, let $f : A \to B$ be an injective function. Then, to each element $y$
of the image $\operatorname{Im} f$ there corresponds a unique element $x \in A$ such that $f(x) = y$. The function
so determined is called the inverse function of $f$. The inverse function of an injective function $f$
therefore associates to each element of the image of $f$ its unique preimage. Formally:

Definition 193 Let $f : A \to B$ be an injective function. The function $f^{-1} : \operatorname{Im} f \to A$
defined by $f^{-1}(y) = x$ if and only if $f(x) = y$ is called the inverse function of $f$.

We therefore have

$$f^{-1}(f(x)) = x \quad \forall x \in A \tag{6.7}$$

and

$$f(f^{-1}(y)) = y \quad \forall y \in \operatorname{Im} f \tag{6.8}$$

The inverse functions go the opposite way to the original ones: from $x \in A$ we arrive at
$f(x) \in B$, and we go back with $f^{-1}(f(x)) = x$.

[Figure: $f$ maps $x \in A$ to $y \in B$, and $f^{-1}$ maps $y$ back to $x$]

It makes sense to talk about the inverse function only for injective functions, which are
then called invertible. Indeed, if $f$ were not injective and there were therefore two elements
of the domain $x_1 \neq x_2$ with the same image $y = f(x_1) = f(x_2)$, the set of the preimages of
$y$ would not be a singleton (because it would contain at least the two elements $x_1$ and $x_2$)
and the relation $f^{-1}$ would not be a function. When the function $f$ is also surjective, and it
is therefore bijective, we have $f^{-1} : B \to A$. In such a case the domain of the inverse is the
entire codomain of $f$.

Example 194 (i) Let $f : \mathbb{R} \to \mathbb{R}$ be the bijective function $f(x) = x^3$. From $y = x^3$ it
follows $x = y^{1/3}$. The inverse $f^{-1} : \mathbb{R} \to \mathbb{R}$ is given by $f^{-1}(y) = y^{1/3}$, that is, given the
irrelevance of the name of the independent variable, $f^{-1}(x) = x^{1/3}$.
(ii) Let $f : \mathbb{R} \to \mathbb{R}_{++}$ be the bijective function $f(x) = 3^x$. From $y = 3^x$ it follows $x = \log_3 y$.
The inverse $f^{-1} : \mathbb{R}_{++} \to \mathbb{R}$ is given by $f^{-1}(y) = \log_3 y$, that is, $f^{-1}(x) = \log_3 x$. N

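Example 195 Let $f : \mathbb{R} \to \mathbb{R}$ be given by

$$f(x) = \begin{cases} \dfrac{x}{2} & \text{if } x < 0 \\ 3x & \text{if } x \geq 0 \end{cases}$$

From $y = x/2$ it follows $x = 2y$, while from $y = 3x$ it follows $x = y/3$. Therefore,

$$f^{-1}(y) = \begin{cases} 2y & \text{if } y < 0 \\ \dfrac{y}{3} & \text{if } y \geq 0 \end{cases}$$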

Example 196 Let $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ be given by $f(x) = 1/x$. From $y = 1/x$ it follows that
$x = 1/y$, and therefore $f^{-1} : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ is given by $f^{-1}(y) = 1/y$. In this case $f = f^{-1}$.
Note that $\mathbb{R} \setminus \{0\}$ is both the domain of $f^{-1}$ and the image of $f$. N

Example 197 The function $f : \mathbb{R} \to \mathbb{R}$ defined by

$$f(x) = \begin{cases} x & \text{if } x \in \mathbb{Q} \\ -x & \text{if } x \notin \mathbb{Q} \end{cases}$$

even if not appealing, is injective (and surjective) and therefore invertible. N

It is easy to see that, when it exists, the inverse $(g \circ f)^{-1}$ of the composite function $g \circ f$
is

$$f^{-1} \circ g^{-1} \tag{6.9}$$

that is, the composition of the inverses, but in reversed order: indeed, from $y = g(f(x))$
we get $g^{-1}(y) = f(x)$ and finally $f^{-1}(g^{-1}(y)) = x$. By way of analogy, in dressing, first we
put on the underpants ($f$) and then the trousers ($g$); in undressing, first we take off the trousers
($g^{-1}$) and then the underpants ($f^{-1}$).
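
The reversed order in (6.9) can be verified on a concrete pair of invertible functions. In the Python sketch below the functions $f(x) = 2x + 1$ and $g(x) = 3x$ are hypothetical choices made only for illustration, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

# Hypothetical invertible functions, chosen only for illustration
f = lambda x: 2 * x + 1            # f(x) = 2x + 1
f_inv = lambda y: (y - 1) / 2      # its inverse
g = lambda x: 3 * x                # g(x) = 3x
g_inv = lambda y: y / 3            # its inverse

def compose(outer, inner):
    """Return the composite function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))

g_f = compose(g, f)                   # g o f
g_f_inv = compose(f_inv, g_inv)       # inverses composed in reversed order
wrong_order = compose(g_inv, f_inv)   # same order as g o f: not the inverse

x = Fraction(5, 7)
```

Applying `g_f_inv` after `g_f` returns the starting point, while `wrong_order` does not.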

O.R. The graph of the function $f^{-1}$ is the same as that of $f$, once the Cartesian axes
have been exchanged. The simplest way of seeing it is to trace the graph of $f$ on a thin
sheet of paper and to hold it up to the light, rotating the axes by $90^\circ$ so as to exchange
abscissae and ordinates: what appears is the graph of $f^{-1}$.

[Figure, two panels: the function $y = f(x) = \sqrt[3]{x}$ (left) and the function $y = f^{-1}(x) = x^3$ (right)]

Inverses and cryptography The computation of the cube $x^3$ of any scalar $x$ is much
easier than the computation of the cube root $\sqrt[3]{x}$: it is much easier to compute $80^3 = 512{,}000$
(two multiplications suffice) than $\sqrt[3]{512{,}000} = 80$. In other words, the computation of the
cubic function $f(x) = x^3$ is much easier than the computation of its inverse $f^{-1}(x) = \sqrt[3]{x}$.
This computational difference increases significantly as we take higher and higher odd powers
(for example $f(x) = x^5$, $f(x) = x^7$ and so on).
Similarly, while the computation of $e^x$ is fairly easy, that of $\log x$ is much harder (before
the advent of electronic calculators, logarithmic tables were used to aid such computations).
From a merely computational viewpoint (not theoretical, where everything works smoothly),
the inverse function $f^{-1}$ may be very difficult to deal with. The injective functions for which
the computation of $f$ is easy, while that of $f^{-1}$ is complex, are called one-way.$^{13}$
For example, let $A = \{(p, q) \in \mathbb{P} \times \mathbb{P} : p < q\}$, and consider the function $f : A \subseteq \mathbb{P} \times \mathbb{P} \to \mathbb{N}$
defined as $f(p, q) = pq$, which associates to each pair of prime numbers $p, q \in \mathbb{P}$, with $p < q$,
their product $pq$. For example, $f(2, 3) = 6$ and $f(11, 13) = 143$. Thanks to the Fundamental
Theorem of Arithmetic, it is an injective function.$^{14}$ Given two prime numbers $p$ and $q$, the
computation of their product is a trivial multiplication. Instead, given any natural number
$n$ it is quite complex, and it can require a long time, even for a powerful computer, to
determine whether it is the product of two prime numbers. In this regard, the reader may recall the
discussion regarding factorization and primality tests from Section 1.3.2 (to experience the
difficulty firsthand, the reader may try to check whether the number 4343 is the product of
two prime numbers). This makes the computation of the inverse function $f^{-1}$ very complex,
as opposed to the very simple computation of $f$. For this reason, $f$ is a classic example of a
one-way function.
13 The notions of “simple” and “complex”, here used qualitatively, can be made more rigorous (as the
curious reader may discover in cryptography texts).
14 But not surjective: for example, $4 \notin \operatorname{Im} f$ because there are no two different prime numbers whose
product is 4.

Let us now look at a simple application of one-way functions to cryptography. Consider
a user who manages reserved data with an information system accessible by means of a
password. Suppose the password is numerical and that, for the sake of simplicity, it is made
up of any pair of natural numbers. The system has a specific data storage unit in which
it saves the password chosen by the user. When the user inputs this password, the system
verifies whether it coincides with the one stored in its memory.
This scheme has an obvious Achilles’ heel: the system manager can access such data
storage and hence can reveal the password to any third party interested in accessing the
user’s personal data. One-way functions make it possible to mitigate this problem. Indeed,
let $f : A \subseteq \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ be a one-way function which associates a natural number $f(n, m)$ to
any pair of natural numbers $(n, m) \in A$. Instead of memorizing the chosen password, let us
call it $(\bar{n}, \bar{m})$, the system now memorizes its image $f(\bar{n}, \bar{m})$. When the user inserts a password
$(n, m)$ the system computes $f(n, m)$ and compares it with $f(\bar{n}, \bar{m})$. If $f(n, m) = f(\bar{n}, \bar{m})$,
the password is correct, that is, $(n, m) = (\bar{n}, \bar{m})$, and the system allows the user to log in.
Since the function is one-way, the computation of $f(n, m)$ is simple and requires a level
of effort only slightly higher than that needed to compare passwords directly. The memory
will no longer store the password $(\bar{n}, \bar{m})$, but its image $f(\bar{n}, \bar{m})$, and this image will be the
only thing the manager will be able to access. Even if he (or the third party to whom he
gives the information) knows the function $f$, the fact that the computation of the inverse
$f^{-1}$ is very complex (and requires a good deal of time) makes it computationally, and hence
practically, very difficult to recover the password $(\bar{n}, \bar{m})$ from the knowledge of $f(\bar{n}, \bar{m})$. But
without the knowledge of $(\bar{n}, \bar{m})$ it is impossible to access sensitive data.
For example, if instead of any natural number we require the password to consist of a
pair $(p, q)$ of prime numbers, we can use $f(p, q) = pq$ as one-way function. The manager has
access to the product $pq$, for example the number 4343, and it will not be easy to recover
the pair of prime numbers $(p, q)$ that generated the product, and hence the password, in a
reasonably short amount of time.
To sum up, one-way functions make it possible to significantly strengthen the protection
of restricted access systems. The design of better and better one-way functions, which
combine the ease of computation of $f(x)$ with increasingly complex inverses $f^{-1}(x)$, is an
important field of research in cryptography.
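
The password scheme described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a secure implementation: the function names are hypothetical, the password is the pair $(11, 13)$ mentioned in the text, and the inverse is computed by naive trial division, assuming the stored number is the product of two distinct primes.

```python
def f(p, q):
    """The function from the text: f(p, q) = p * q for primes p < q."""
    return p * q

def check_password(attempt, stored_image):
    """Compare the image of the attempt with the stored image, never the password."""
    return f(*attempt) == stored_image

def invert_by_trial_division(n):
    """Naive inverse: find (p, q) with p < q and p * q == n, or None.

    Assumes n is the product of two distinct primes; the number of steps
    grows with the square root of n.
    """
    p = 2
    while p * p < n:
        if n % p == 0:
            return (p, n // p)
        p += 1
    return None

stored = f(11, 13)   # the system stores 143, not the pair (11, 13)
```

For a small product such as 143 the inversion is immediate; the point of a one-way function is that for very large primes this search becomes impractically long, while `check_password` remains a single multiplication.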

6.4.3 Bounded functions


Let $f : A \to \mathbb{R}$ be a function with domain $A$ and codomain the real line. We say that it is:

(i) bounded from above if its image $\operatorname{Im} f$ is a set bounded from above in $\mathbb{R}$, i.e., if there
exists $M \in \mathbb{R}$ such that $f(x) \leq M$ for every $x \in A$;
(ii) bounded from below if its image $\operatorname{Im} f$ is a set bounded from below in $\mathbb{R}$, i.e., if there
exists $m \in \mathbb{R}$ such that $f(x) \geq m$ for every $x \in A$;
(iii) bounded if it is both bounded from above and from below.

For example, the function $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ given by

$$f(x) = \frac{1}{|x|}$$

is bounded from below, since $f(x) \geq 0$ for every $x \in \mathbb{R} \setminus \{0\}$, but not from above, while the
function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = -x^2$ is bounded from above, since
$f(x) \leq 0$ for every $x \in \mathbb{R}$, but not from below.
The next lemma gives us a simple, but very useful, condition of boundedness.

Lemma 198 A function $f : A \to \mathbb{R}$ is bounded if and only if there exists $k > 0$ such that

$$|f(x)| \leq k \quad \forall x \in A \tag{6.10}$$

Proof If $f$ is bounded, there exist $m, M \in \mathbb{R}$ such that $m \leq f(x) \leq M$ for every $x \in A$. Let $k > 0$ be such
that $-k \leq m \leq M \leq k$. Then (6.10) holds. Vice versa, suppose that (6.10) holds. Thanks
to (4.1), which holds also for $\leq$, we have $-k \leq f(x) \leq k$ for every $x \in A$, which implies that $f$ is bounded
both from above and from below.

The function defined by

$$f(x) = \begin{cases} 1 & \text{if } x \geq 1 \\ 0 & \text{if } 0 < x < 1 \\ -2 & \text{if } x \leq 0 \end{cases} \tag{6.11}$$

is bounded since $|f(x)| \leq 2$ for every $x \in \mathbb{R}$.

Thus, we have a first taxonomy of the functions with real values $f : A \to \mathbb{R}$, that is, of
the elements of the space$^{15}$ $\mathbb{R}^A$. Note that such taxonomy is not exhaustive, i.e., there exist
functions that do not satisfy any of the conditions (i)–(iii): this is the case, for example,
when $f(x) = x$. Such functions are called unbounded (their image is an unbounded set).
We denote by $\sup_{x \in A} f(x)$, often shortened as $\sup f$, the supremum of the image of a
function $f : A \to \mathbb{R}$ bounded from above, that is,

$$\sup_{x \in A} f(x) = \sup(\operatorname{Im} f)$$

By the definition of the supremum, a number $M$ is such that $f(x) \leq M$ for every $x \in A$ if
and only if $\sup f \leq M$.
Similarly, we denote by $\inf_{x \in A} f(x)$, often shortened as $\inf f$, the infimum of the image
of a function $f : A \to \mathbb{R}$ bounded from below, that is,

$$\inf_{x \in A} f(x) = \inf(\operatorname{Im} f)$$

By the definition of the infimum, a scalar $m$ is such that $f(x) \geq m$ for every $x \in A$ if and
only if $m \leq \inf f$.
Clearly, a bounded function $f : A \to \mathbb{R}$ has both extrema, and so

$$\inf f \leq f(x) \leq \sup f \quad \text{for every } x \in A$$

In particular, the numbers $m$ and $M$ are such that $m \leq f(x) \leq M$ for every $x \in A$ if and
only if $m \leq \inf f \leq \sup f \leq M$.
Example 199 For the function (6.11) one has that $\sup f = 1$ and $\inf f = -2$. For the
function $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ given by $f(x) = 1/|x|$, which is bounded from below, but not
from above, one has $\inf f = 0$. N
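
The bounds of the step function (6.11) can be illustrated on a finite grid. The Python sketch below is only a numerical check under an explicit assumption: a finite sample of points stands in for the real line, so in general the maximum and minimum over the sample only approximate the supremum and infimum (here they happen to be attained).

```python
def f(x):
    """The bounded step function (6.11)."""
    if x >= 1:
        return 1
    if x > 0:
        return 0
    return -2

# Finite grid on [-5, 5], a hypothetical stand-in for the real line
sample = [i / 10 for i in range(-50, 51)]
values = [f(x) for x in sample]

largest = max(values)    # agrees with sup f on this sample
smallest = min(values)   # agrees with inf f on this sample
```

The check `abs(v) <= 2` for every sampled value mirrors condition (6.10) of Lemma 198 with $k = 2$.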
15 Note the use of the term space to denote a set of reference (in this case the set $\mathbb{R}^A$ of all the real-valued functions on $A$).

6.4.4 Monotonic functions


We introduce now an important class of functions $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$, namely, functions with
real values whose domain $A$ is a subset of $\mathbb{R}^n$.

Monotonic functions on $\mathbb{R}$

We start with the case $n = 1$, which is of particular importance.

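Definition 200 A function $f : A \subseteq \mathbb{R} \to \mathbb{R}$ is said to be:

(i) increasing, if
$$x > y \implies f(x) \geq f(y) \quad \forall x, y \in A \tag{6.12}$$

strictly increasing, if
$$x > y \implies f(x) > f(y) \quad \forall x, y \in A \tag{6.13}$$

(ii) decreasing, if
$$x > y \implies f(x) \leq f(y) \quad \forall x, y \in A \tag{6.14}$$

strictly decreasing, if
$$x > y \implies f(x) < f(y) \quad \forall x, y \in A$$

(iii) constant, if there exists $k \in \mathbb{R}$ such that
$$f(x) = k \quad \forall x \in A$$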

Note that a function is constant if and only if it is both increasing and decreasing. In
other words, constancy is equivalent to having both monotonicity properties. It is for this
reason that we have introduced constancy among the forms of monotonicity. Soon, we will
see that for vector functions the relation between constancy and monotonicity is a bit more
subtle.

Increasing or decreasing functions are generically called monotonic. We thus have strict
monotonicity when the inequality between the images $f(x)$ and $f(y)$ is strict for all points
$x \neq y$ in the domain. In other words, strict monotonicity excludes the possibility that the
function is constant in some region of the domain. Formally:

Proposition 201 An increasing function $f : A \subseteq \mathbb{R} \to \mathbb{R}$ is strictly increasing if and only
if

$$f(x) = f(y) \implies x = y \quad \forall x, y \in A \tag{6.15}$$

that is, if and only if it is injective.



An analogous result holds for strictly decreasing functions. Strictly monotonic functions
are therefore injective, and hence invertible.

Proof “Only if”. Let $f$ be strictly increasing and let $f(x) = f(y)$. Suppose, by contradiction,
that $x \neq y$, so that either $x > y$ or $y > x$. In both cases, by (6.13), we have $f(x) \neq f(y)$, which contradicts
$f(x) = f(y)$. It follows that $x = y$, as desired.
“If”. Let us suppose that (6.15) holds. Let $f$ be increasing. We prove that it is also
strictly increasing. Let $x > y$. By increasing monotonicity, we have $f(x) \geq f(y)$, but we
cannot have $f(x) = f(y)$, because from (6.15) it would follow that $x = y$. We have therefore
$f(x) > f(y)$, as claimed.

Example 202 The functions $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x$ and $f(x) = x^3$ are strictly
increasing, while the function

$$f(x) = \begin{cases} x & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases}$$

is increasing, but not strictly increasing, since it is constant for every $x < 0$. The same holds
for the function defined by

$$f(x) = \begin{cases} x - 1 & \text{if } x \geq 1 \\ 0 & \text{if } -1 < x < 1 \\ x + 1 & \text{if } x \leq -1 \end{cases} \tag{6.16}$$

since it has a constant piece in $[-1, 1]$. N

Note that in (6.12) we can replace $x > y$ by $x \geq y$ without any consequence since we
have $f(x) = f(y)$ if $x = y$. Hence, increasing monotonicity is equivalent to

$$x \geq y \implies f(x) \geq f(y) \tag{6.17}$$

Consider the converse implication,

$$f(x) \geq f(y) \implies x \geq y \tag{6.18}$$

which requires that to larger values of the image correspond larger values of the argument.
Clearly, $f(x) = f(y)$ is equivalent to having both $f(x) \geq f(y)$ and $f(y) \geq f(x)$, which in
turn, by (6.18), imply $x \geq y$ and $y \geq x$, that is, $x = y$. Therefore, from (6.18) it follows that

$$f(x) = f(y) \implies x = y \tag{6.19}$$

In the light of Proposition 201, we conclude that an increasing function that also satisfies the
converse implication (6.18) is strictly increasing. The next result shows that the converse is
also true, establishing in this way an interesting characterization of the strictly increasing
functions; an analogous result holds for the strictly decreasing functions.

Proposition 203 A function $f : A \subseteq \mathbb{R} \to \mathbb{R}$ is strictly increasing if and only if

$$x \geq y \iff f(x) \geq f(y) \quad \forall x, y \in A \tag{6.20}$$



Proof Thanks to what we have seen above, it remains to prove the “Only if” part, i.e., that
a strictly increasing function satisfies (6.20). Since a strictly increasing function is increasing,
the implication

$$x \geq y \implies f(x) \geq f(y)$$

is obvious. To prove (6.20) it remains to show that

$$f(x) \geq f(y) \implies x \geq y$$

Let $f(x) \geq f(y)$ and suppose, by contradiction, that $x < y$. Strictly increasing
monotonicity implies $f(x) < f(y)$, which contradicts $f(x) \geq f(y)$. We have therefore $x \geq y$, as
desired.
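
The monotonicity notions of Definition 200 and the characterization (6.20) can be tested on a finite set of points. The Python sketch below is a hedged illustration: the helper names and the sample points are assumptions, and a check on finitely many points can refute a property on the whole domain but cannot prove it.

```python
from itertools import combinations

def is_increasing(f, points):
    """x > y implies f(x) >= f(y) on the given finite set of points."""
    return all(f(x) >= f(y) for y, x in combinations(sorted(set(points)), 2))

def is_strictly_increasing(f, points):
    """x > y implies f(x) > f(y) on the given finite set of points."""
    return all(f(x) > f(y) for y, x in combinations(sorted(set(points)), 2))

def satisfies_6_20(f, points):
    """x >= y if and only if f(x) >= f(y), for all pairs of points."""
    return all((x >= y) == (f(x) >= f(y)) for x in points for y in points)

points = [-2, -1, 0, 1, 2]               # hypothetical sample of the domain
cube = lambda x: x ** 3                  # strictly increasing
ramp = lambda x: x if x >= 0 else 0      # increasing, constant for x < 0 (Example 202)
```

On this sample the cube passes all three checks, while the ramp is increasing but fails both strict monotonicity and (6.20), consistently with Proposition 203.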

Monotonic functions on $\mathbb{R}^n$
The monotonicity notions seen in the case $n = 1$ generalize in a natural way to the case of
arbitrary $n$, but with some delicate aspects due to the two peculiarities of the case $n \geq 2$,
that is, the incompleteness of $\geq$ and the presence of two notions of strict inequality, $>$ and
$\gg$. For the sake of brevity, we consider increasing monotonicity (analogous notions hold
for decreasing monotonicity). The notion of increasing monotonicity can be extended in
an obvious way: a function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is said to be increasing if

$$x \geq y \implies f(x) \geq f(y) \quad \forall x, y \in A \tag{6.21}$$

Note that this notion does not concern vectors $x$ and $y$ that cannot be compared, such
as for example $(1, 2)$ and $(2, 1)$ in $\mathbb{R}^2$. Analogously, it is possible to introduce the concept of
decreasing function. Moreover, $f$ is constant if there exists $k \in \mathbb{R}$ such that

$$f(x) = k \quad \forall x \in A$$

More delicate is the extension to $\mathbb{R}^n$ of strict monotonicity, given that we have two
distinct concepts of strict inequality. A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ is said to be strictly
increasing if

$$x > y \implies f(x) > f(y) \quad \forall x, y \in A$$

and strongly increasing if it is increasing and

$$x \gg y \implies f(x) > f(y) \quad \forall x, y \in A \tag{6.22}$$

We have a simple hierarchy among these notions:

Proposition 204 For functions $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ it holds that:

$$\text{strictly increasing} \implies \text{strongly increasing} \implies \text{increasing} \tag{6.23}$$

They are therefore increasingly stronger notions of monotonicity. In applications we will
have to choose the most pertinent form for the problem studied.$^{16}$

$^{16}$ The notions of monotonicity for functions of several variables studied here are componentwise, i.e., they
are based on the comparison of the components of the vectors that are the arguments of the functions. Soon, in
Section 19.2, we will see another notion of monotonicity for functions of several variables.

Proof A strongly increasing function is, by definition, increasing. It remains to prove that
strictly increasing implies strongly increasing. Let therefore $f$ be strictly increasing. We
need to prove that $f$ is increasing and satisfies (6.22). If $x \ge y$, we have $x = y$ or $x > y$. In
the first case $f(x) = f(y)$. In the second case $f(x) > f(y)$, and hence $f(x) \ge f(y)$. Thus,
$f$ is increasing. Moreover, if $x \gg y$, a fortiori we have $x > y$, and therefore $f(x) > f(y)$.
The function $f$ is therefore strongly increasing.

The converses of the previous implications do not hold. An increasing function with
constant pieces is an example of an increasing, but not strongly increasing, function. Therefore

$$\text{increasing} \;\not\!\!\implies \text{strongly increasing}$$

Moreover, as the next example shows, there exist functions that are strongly increasing,
but not strictly increasing, that is,

$$\text{strongly increasing} \;\not\!\!\implies \text{strictly increasing}$$

Example 205 The Leontief function $f : \mathbb{R}^2 \to \mathbb{R}$ given by

$$f(x) = \min\{x_1, x_2\}$$

is strongly increasing, but not strictly increasing. For example, $x = (1, 2)$ and $y = (1, 1)$ are
such that $x > y$, but $f(x) = f(y) = 1$. ▲
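
The three vector inequalities and the behaviour of the Leontief function can be checked on a small grid. The following Python sketch is illustrative only: the function names (`geq`, `gt`, `gg`, `implies_on`) and the grid are assumptions introduced here, and a finite check does not replace the argument in the text.

```python
from itertools import product

def geq(x, y):
    """x >= y: every component of x is at least the matching one of y."""
    return all(a >= b for a, b in zip(x, y))

def gt(x, y):
    """x > y: x >= y and x != y."""
    return geq(x, y) and tuple(x) != tuple(y)

def gg(x, y):
    """x >> y: every component of x is strictly larger."""
    return all(a > b for a, b in zip(x, y))

def leontief(x):
    return min(x)

def implies_on(sample, order, f, strict):
    """Check that order(x, y) implies f(x) > f(y) (strict) or f(x) >= f(y)."""
    return all((f(x) > f(y)) if strict else (f(x) >= f(y))
               for x in sample for y in sample if order(x, y))

sample = list(product(range(4), repeat=2))   # grid {0, 1, 2, 3}^2

increasing = implies_on(sample, geq, leontief, strict=False)
strongly = increasing and implies_on(sample, gg, leontief, strict=True)
strictly = implies_on(sample, gt, leontief, strict=True)
print(increasing, strongly, strictly)
```

The pair $x = (1, 2)$, $y = (1, 1)$ of Example 205 belongs to the grid and is one of the pairs that makes the strict check fail.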

N.B. For operators $f : \mathbb{R}^n \to \mathbb{R}^m$ with $m > 1$ the notions of monotonicity studied for the case
$m = 1$ assume a different meaning since the images $f(x)$ and $f(y)$ might not be comparable,
that is, neither $f(x) \ge f(y)$ nor $f(y) \ge f(x)$ holds. For example, if $f : \mathbb{R}^2 \to \mathbb{R}^2$ is such
that $f(0, 1) = (1, 2)$ and $f(3, 4) = (2, 1)$, the images $(1, 2)$ and $(2, 1)$ are not comparable.
For brevity, we do not deal with this issue and we leave to more advanced courses the study
of notions of monotonicity suitable for operators $f : \mathbb{R}^n \to \mathbb{R}^m$ when $m > 1$. ▽

Utility functions
Let $u : A \to \mathbb{R}$ be a utility function defined on a suitable set $A \subseteq \mathbb{R}^n_+$ of bundles of goods. A
transformation $f \circ u : A \to \mathbb{R}$ of $u$, where $f : \operatorname{Im} u \subseteq \mathbb{R} \to \mathbb{R}$, defines another utility function
with the same meaning provided

$$u(x) \ge u(y) \iff (f \circ u)(x) \ge (f \circ u)(y) \qquad \forall x, y \in A \tag{6.24}$$

In other words, the function $f \circ u$ orders the bundles of goods in the same way as $u$, that is,

$$x \succsim y \iff (f \circ u)(x) \ge (f \circ u)(y) \qquad \forall x, y \in A$$

By Proposition 203, $f$ satisfies (6.24) if and only if it is strictly increasing. Therefore, $f \circ u$ is
itself a utility function if and only if $f$ is strictly increasing. To describe such a fundamental
property of invariance of utility functions we say that they are ordinal, that is, unique up
to monotonic (strictly increasing) transformations. This is a property that lies at the basis
of the ordinalist approach, in which utility functions are a mere numerical representation of
the preference $\succsim$, which is the fundamental notion (recall the discussion in Section 6.2.1).

Example 206 Consider the Cobb-Douglas utility function on $\mathbb{R}^n_{++}$ given by

$$u(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n x_i^{a_i}$$

with each $a_i > 0$ and $\sum_{i=1}^n a_i = 1$. Taking $f(x) = \log x$, the transformation

$$f \circ u = \sum_{i=1}^n a_i \log x_i$$

is a utility function equivalent to $u$ on $\mathbb{R}^n_{++}$.$^{17}$ It is the logarithmic version of the Cobb-Douglas
function, often called log-linear utility function.$^{18}$ ▲
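
That the logarithmic transformation leaves the ranking of bundles unchanged can be checked on a few bundles. The Python sketch below is a minimal illustration under stated assumptions: the exponents (0.3, 0.7) and the list of bundles are hypothetical values chosen for this example.

```python
import math
from itertools import product

def cobb_douglas(x, a):
    """u(x) = prod_i x_i ** a_i, for strictly positive x."""
    return math.prod(xi ** ai for xi, ai in zip(x, a))

def log_linear(x, a):
    """(log o u)(x) = sum_i a_i * log(x_i)."""
    return sum(ai * math.log(xi) for xi, ai in zip(x, a))

a = (0.3, 0.7)                                    # hypothetical exponents, summing to 1
bundles = list(product((0.5, 1.0, 2.0, 4.0), repeat=2))

# Both functions must order every pair of bundles in the same way, as in (6.24).
same_order = all(
    (cobb_douglas(x, a) >= cobb_douglas(y, a)) == (log_linear(x, a) >= log_linear(y, a))
    for x in bundles for y in bundles
)
print(same_order)
```

Replacing `math.log` with any other strictly increasing function of the utility level would give the same ranking, which is the ordinal property described above.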

The three notions of monotonicity on $\mathbb{R}^n$ (increasing, strongly increasing, and strictly
increasing) are very important for utility functions $u : A \to \mathbb{R}$. Since their argument $x \in \mathbb{R}^n$
is a bundle of "goods", it is natural to assume that the consumer prefers vectors with larger
amounts of the different goods, that is, "the more, the better". According to how we state
this motto, one of the three forms of monotonicity becomes the appropriate one.
If in a vector $x \in \mathbb{R}^n$ each component, that is, each type of good, is deemed by the
consumer as important, it is natural to assume that $u$ is strictly increasing:

$$x > y \implies u(x) > u(y) \qquad \forall x, y \in A$$

In this case it is sufficient to increase the amount of any of the goods to achieve a greater
utility: "the more of any good is always better".
If, instead, we want to contemplate the possibility that some good can actually be useless
to the consumer, we can only ask for $u$ to be increasing:

$$x \ge y \implies u(x) \ge u(y) \qquad \forall x, y \in A \tag{6.25}$$

Indeed, if a good is "useless" (as wine is for a teetotaller, or for a drunk who has already
had too much of it), the inequality $x \ge y$ might be determined exactly by a larger amount
of this good, keeping all the others unchanged; it is reasonable then that $u(x) = u(y)$, since
the consumer does not get any benefit in passing from $y$ to $x$. In this case "the more of any
good can be better or indifferent".
Finally, the "the more of any good is always better" property implied by strict monotonicity
can be weakened in the sense of strong monotonicity by assuming that "the more of all
the goods is always better", that is,

$$x \gg y \implies u(x) > u(y) \qquad \forall x, y \in A$$

In this case, there is an improvement only when the amounts of all goods increase; it is not
enough to increase the amount of only some goods. Such a form of monotonicity reflects a
form of complementarity among goods, so that an increase of the amounts of only some of
them can turn out to be superfluous for the consumer if the quantities of other goods remain
unchanged. Perfect complementarity à la Leontief is the extreme case, a classical example
being pairs of shoes, right and left.$^{19}$

$^{17}$ Recall that, even if mathematically it can be defined on the entire positive orthant $\mathbb{R}^n_+$, from the economic
viewpoint, it is precisely on $\mathbb{R}^n_{++}$ that the Cobb-Douglas function is interesting (Example 207).
$^{18}$ It is necessary to consider the Cobb-Douglas function on $\mathbb{R}^n_{++}$, and not on the entire positive orthant
$\mathbb{R}^n_+$, in order for the logarithmic transformation to be well defined on strictly positive numbers. While the
Cobb-Douglas function can be defined on the entire positive orthant $\mathbb{R}^n_+$, the log-linear function is defined
only on $\mathbb{R}^n_{++}$. On the other hand, note also what we have observed in the previous footnote.

Example 207 (i) The Cobb-Douglas utility function on $\mathbb{R}^n_{++}$ given by

$$u(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n x_i^{a_i} \tag{6.26}$$

is strictly increasing. By (6.23), it is also strongly increasing and increasing.

(ii) The Leontief utility function on $\mathbb{R}^n_{++}$ given by

$$u(x_1, x_2, \ldots, x_n) = \min\{x_1, \ldots, x_n\}$$

in which the goods are perfect complements, is strongly increasing. By (6.23), it is also
increasing. As we have already seen in Example 205, it is not strictly increasing.
(iii) The reader can check which properties of monotonicity hold if we consider the two
previous utility functions on the entire positive orthant $\mathbb{R}^n_+$ and not just on $\mathbb{R}^n_{++}$. ▲

Observe that consumers with strictly monotonic or strongly monotonic utility functions
are "insatiable", because their utility increases whenever their bundles are increased in a
suitable way. This property of utility functions is sometimes called insatiability, and it
is therefore shared by both strict and strong monotonicity. The only form of monotonicity that
can encompass the possibility of satiety is increasing monotonicity (6.25): as observed for
the drunk consumer, this weaker form of monotonicity allows for the possibility that a given
good, when it exceeds a certain level, does not result in a further increase of utility. However,
utility cannot decrease: if (6.25) holds, utility either increases
or remains constant, but it never decreases. Therefore, if an extra glass of wine results
in a decrease of the drunk's utility, this cannot be modelled by any form of increasing
monotonicity, no matter how weak.

6.4.5 Concave and convex functions (preview)


The class of concave and convex functions is of fundamental importance in economics. The
concept, which will be fully developed in Chapter 14, is anticipated here in the scalar case.
Graphically, a function is concave if the segment (called chord ) that joins any two points
(x; f (x)) and (y; f (y)) of its graph lies below the graph of the function, while it is convex if
the opposite happens, that is, if such chord lies above the graph of the function.
Formally:

Definition 208 A function $f : I \to \mathbb{R}$, defined on an interval $I$ of $\mathbb{R}$, is said to be concave
if
$$f(\lambda x + (1 - \lambda) y) \ge \lambda f(x) + (1 - \lambda) f(y)$$

for every $x, y \in I$ and every $\lambda \in [0, 1]$, while it is said to be convex if

$$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$$

for every $x, y \in I$ and every $\lambda \in [0, 1]$.

Note that the domain must be an interval for the points $\lambda x + (1 - \lambda) y$ to belong to it,
so that the expression $f(\lambda x + (1 - \lambda) y)$ makes sense.

$^{19}$ It is useless to increase the number of right shoes without increasing, in the same measure, that of
the left shoes (and vice versa).

Example 209 The functions $f, g : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2$ and $g(x) = e^x$ are convex,
while the function $h : \mathbb{R}_{++} \to \mathbb{R}$ given by $h(x) = \ln x$ is concave. The function $k : \mathbb{R} \to \mathbb{R}$ given
by $k(x) = x^3$ is neither concave nor convex. ▲

[Figure: four panels. Convex function $f(x) = x^2$; convex function $f(x) = e^x$; concave function $f(x) = \ln x$; non-concave and non-convex function $f(x) = x^3$.]
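
The inequalities of Definition 208 lend themselves to a sampled numerical test. The Python sketch below is only a heuristic under stated assumptions: the helper names, the intervals, the tolerance and the random seed are all choices made for this illustration, and passing the test on a sample does not prove concavity or convexity.

```python
import math
import random

def chord_gap(f, x, y, lam):
    """f(lam*x + (1-lam)*y) minus lam*f(x) + (1-lam)*f(y)."""
    return f(lam * x + (1 - lam) * y) - (lam * f(x) + (1 - lam) * f(y))

def looks_concave(f, lo, hi, trials=2000, tol=1e-9, seed=0):
    """Sampled test of the concavity inequality on [lo, hi]; not a proof."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, lam = rng.uniform(lo, hi), rng.uniform(lo, hi), rng.random()
        if chord_gap(f, x, y, lam) < -tol:
            return False
    return True

def looks_convex(f, lo, hi, **kwargs):
    """f is convex exactly when -f is concave."""
    return looks_concave(lambda t: -f(t), lo, hi, **kwargs)

def cube(t):
    return t ** 3

print(looks_convex(lambda t: t * t, -3, 3), looks_convex(math.exp, -3, 3))
print(looks_concave(math.log, 0.1, 5))
print(chord_gap(cube, 0.0, 2.0, 0.5), chord_gap(cube, -2.0, 0.0, 0.5))
```

For the cubic, the gap has opposite signs on the two pairs of points, so neither inequality of the definition can hold on all of $\mathbb{R}$.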

6.4.6 Separable functions


In economics a very important role is played by the separable functions of several variables,
that is, the functions that can be defined as sums of scalar functions.

Definition 210 A function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$, with $n \ge 2$, is said to be separable if there
exist $n$ scalar functions $g_i : A \subseteq \mathbb{R} \to \mathbb{R}$ such that

$$f(x) = \sum_{i=1}^n g_i(x_i) \qquad \forall x = (x_1, \ldots, x_n) \in A$$

The importance of this class of functions of several variables is due to their great tractability.
The most trivial example is $f(x) = \sum_{i=1}^n x_i$, for which the functions $g_i$ are the identity:
$g_i(x) = x$. Let us give more examples.

Example 211 The function $f : \mathbb{R}^2 \to \mathbb{R}$ given by

$$f(x) = x_1^2 + 4x_2 \qquad \forall x = (x_1, x_2) \in \mathbb{R}^2$$

is separable with $g_1(x_1) = x_1^2$ and $g_2(x_2) = 4x_2$. ▲

Example 212 The function $f : \mathbb{R}^n_{++} \to \mathbb{R}$, called entropy, and given by

$$f(x) = -\sum_{i=1}^n x_i \log x_i \qquad \forall x = (x_1, \ldots, x_n) \in \mathbb{R}^n_{++}$$

is separable with $g_i(x_i) = -x_i \log x_i$. ▲
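
A separable function can be assembled directly from its scalar components. The Python sketch below illustrates this; the helper `separable` is a name introduced here, and the entropy is written with the usual leading minus sign, which is an assumption about the intended formula.

```python
import math

def separable(gs):
    """Build f(x) = sum_i gs[i](x[i]) from a list of scalar functions."""
    def f(x):
        if len(x) != len(gs):
            raise ValueError("x must have one component per scalar function")
        return sum(g(xi) for g, xi in zip(gs, x))
    return f

# Example 211: g1(t) = t^2 and g2(t) = 4t
f_211 = separable([lambda t: t ** 2, lambda t: 4 * t])

def entropy(x):
    """Entropy with a leading minus sign (an assumption in this sketch)."""
    return separable([lambda t: -t * math.log(t)] * len(x))(x)

print(f_211((3, 2)))
print(entropy((0.5, 0.5)), math.log(2))
```

Building the function from the list of the $g_i$ makes the tractability concrete: each component can be studied, or replaced, independently of the others.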

Example 213 The intertemporal utility function (6.4), that is,

$$U(x) = \sum_{t=1}^T \beta^{t-1} u_t(x_t)$$

is separable with $g_t(x_t) = \beta^{t-1} u_t(x_t)$. ▲

Example 214 Separable utility functions are very important in the static case as well. The
utility functions used by the first marginalists were indeed of the form

$$u(x) = \sum_{i=1}^n u_i(x_i) \tag{6.27}$$

In other words, it was assumed that the utility (cardinally intended) of a bundle $x$ is
decomposable into the utility of the quantities $x_i$ of the various goods that compose it. This is
a restrictive assumption that ignores any possible interdependence, for example of
complementarity or substitutability, among the different goods. Due to its remarkable tractability,
however, (6.27) remained for a long time the usual form of utility functions until, at the
end of the nineteenth century, the works of Edgeworth and Pareto showed how to develop
consumer theory for utility functions that are not necessarily separable. ▲

Example 215 If in (6.27) we set $u_i(x_i) = x_i$ for all $i$, we obtain the important special case

$$u(x) = \sum_{i=1}^n x_i$$

where the goods are perfect substitutes. The utility of bundles $x$ depends only on the sum
of the amounts of the different goods, regardless of the specific amounts of the individual
goods. For example, think of $x$ as a bundle of different types of oranges, which differ in origin
and taste, but are identical in terms of nutritional values. In this case, if the utility of the
bundle depends only on its nutritional value, then these different types of oranges are perfect
substitutes. This case is opposite to the case of perfect complements that characterizes the
Leontief utility function. ▲

Example 216 More generally, if in (6.27) we set $u_i(x_i) = \lambda_i x_i$ for all $i$, with $\lambda_i > 0$, we
have

$$u(x) = \sum_{i=1}^n \lambda_i x_i$$

In this case, the goods in the bundle are no longer perfect substitutes; rather, their relevance
depends on their weights $\lambda_i$. Therefore, in order to keep utility constant, each good can
be replaced with another according to a linear trade-off. Intuitively, one unit of good $i$ is
equivalent to $\lambda_i / \lambda_j$ units of good $j$. The notion of marginal rate of substitution (Section
23.2.2) formalizes this idea. ▲
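
The linear trade-off can be verified with a small computation: giving up one unit of good $i$ loses its weight in utility, which is recovered by adding (weight of $i$)/(weight of $j$) units of good $j$. The Python sketch below uses hypothetical weights and a hypothetical bundle; the function names are introduced here for illustration.

```python
def linear_utility(x, weights):
    """u(x) = sum_i weights[i] * x[i], with strictly positive weights."""
    return sum(w * xi for w, xi in zip(weights, x))

def swap_one_unit(x, weights, i, j):
    """Remove one unit of good i and add weights[i] / weights[j] units of good j."""
    y = list(x)
    y[i] -= 1
    y[j] += weights[i] / weights[j]
    return tuple(y)

weights = (2.0, 4.0)          # hypothetical weights
x = (3.0, 5.0)                # hypothetical bundle
y = swap_one_unit(x, weights, i=0, j=1)

print(y, linear_utility(x, weights), linear_utility(y, weights))
```

With these numbers, one unit of the first good is exchanged for half a unit of the second, and the utility level is unchanged.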

Example 217 The logarithmic transformation

$$\log u(x) = \sum_{i=1}^n a_i \log x_i$$

of the Cobb-Douglas utility function, that is, the log-linear utility function (Example 206),
is separable. The example shows that sometimes it is possible to obtain separable versions
of utility functions by using their strictly monotonic transformations. Usually, the separable
versions are the most convenient from the analytical point of view (for example, the
log-linear utility is easier to manipulate than the non-separable version (6.26)). ▲

6.5 Elementary functions on R


This section introduces the so-called "elementary" functions, the important class of functions
that contains most of the functions of interest in applications. Section 30.9 of Chapter
30 will continue their study.

6.5.1 Polynomial functions


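The polynomial function, or polynomial, $f : \mathbb{R} \to \mathbb{R}$ of degree $n \ge 0$ has the form

$$f(x) = a_0 + a_1 x + \cdots + a_n x^n$$

with $a_i \in \mathbb{R}$ for every $0 \le i \le n$ and $a_n \ne 0$. Let $P_n$ be the set of all polynomials of degree
lower than or equal to $n$. Naturally, one has

$$P_0 \subseteq P_1 \subseteq P_2 \subseteq \cdots \subseteq P_n \subseteq \cdots$$

Example 218 $f(x) = x + x^2 \in P_2$, and $f(x) = 3x - 10x^4 \in P_4$. ▲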
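
A polynomial is determined by its list of coefficients, which suggests a simple way to evaluate it. The Python sketch below uses Horner's rule, a standard evaluation scheme that is not discussed in the text; the function names are introduced here, and the second polynomial of Example 218 is read as $3x - 10x^4$.

```python
def poly_eval(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x**n by Horner's rule; coeffs = [a0, ..., an]."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result

def degree(coeffs):
    """Index of the last nonzero coefficient (the zero polynomial is not handled)."""
    return max(i for i, a in enumerate(coeffs) if a != 0)

p = [0, 1, 1]            # x + x^2
q = [0, 3, 0, 0, -10]    # 3x - 10x^4

print(poly_eval(p, 2), degree(p))
print(poly_eval(q, 2), degree(q))
```

A polynomial of degree at most $n$ corresponds to a coefficient list of length $n + 1$, which mirrors the inclusion of each $P_n$ in the next.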


Example 219 A polynomial $f$ has degree zero when there exists $a \in \mathbb{R}$ such that $f(x) = a$
for every $x$. The constant functions can therefore be regarded as polynomials of degree zero. ▲

The set of all polynomials, of any degree, is denoted by $P$; that is, $P = \bigcup_{n \ge 0} P_n$.

6.5.2 Exponential and logarithmic functions

Given $a > 0$, the function $f : \mathbb{R} \to \mathbb{R}$ defined by

$$f(x) = a^x$$

is called the exponential function of base $a$.

In the sequel we will systematically use the number $e$ as base and we will call $f(x) = e^x$
the exponential function, without further specification. Sometimes it is denoted by $f(x) = \exp x$.
Thanks to the properties of the expression $e^x$, the exponential function has a fundamental
role in many fields of analysis. Its graph is:

[Figure: graph of the function $e^x$.]

The negative exponential function $f(x) = -e^x$ is also very important; its graph is:

[Figure: graph of the function $-e^x$.]

The image of the exponential function is the set $(0, \infty)$ of the strictly positive scalars.
Moreover, thanks to Lemma 40-(iv), the exponential function $a^x$ is:

(i) strictly increasing if $a > 1$;

(ii) constant if $a = 1$;

(iii) strictly decreasing if $0 < a < 1$.

Provided $a \ne 1$, the exponential function is strictly monotonic, and therefore injective.
Its inverse has as domain the image $(0, \infty)$ and, by Proposition 42 of Section 1.5, it is the
function $f : (0, \infty) \to \mathbb{R}$ defined as
$$f(x) = \log_a x$$
called logarithmic function of base $a > 0$. Note that, for what we have just observed, $a \ne 1$.

The statements of Proposition 42, i.e., that

$$\log_a a^x = x \qquad \forall x \in \mathbb{R}$$

and
$$a^{\log_a x} = x \qquad \forall x \in (0, \infty)$$
are therefore nothing but the relations (6.7) and (6.8) for the inverse functions, i.e., the
relations $f^{-1}(f(x)) = x$ and $f(f^{-1}(y)) = y$.
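
The two inverse relations between the exponential and the logarithm of base $a$ can be checked numerically, up to floating-point error. In the Python sketch below the helper `log_base`, the base $a = 3$ and the sample points are assumptions made for illustration; the helper relies on the change-of-base formula $\log_a x = \log x / \log a$.

```python
import math

def log_base(a, x):
    """log_a(x) for a > 0, a != 1, x > 0, via the change-of-base formula."""
    if a <= 0 or a == 1 or x <= 0:
        raise ValueError("need a > 0, a != 1 and x > 0")
    return math.log(x) / math.log(a)

a = 3.0
xs = [-2.0, -0.5, 0.0, 1.0, 2.5]      # points of R
ys = [0.1, 1.0, 7.0, 42.0]            # points of (0, infinity)

left_inverse = all(math.isclose(log_base(a, a ** x), x, abs_tol=1e-12) for x in xs)
right_inverse = all(math.isclose(a ** log_base(a, y), y, rel_tol=1e-12) for y in ys)
print(left_inverse, right_inverse)
```

The guard in `log_base` reflects the restrictions in the text: the base must be strictly positive and different from 1, and the argument must be strictly positive.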

In light of the importance of the natural logarithm, we will call $f(x) = \log x = \log_e x$
the logarithmic function without further specification.$^{20}$ Like the exponential function, to which
it is closely linked as we will see soon, the logarithmic function is central in many fields. Its
graph is:

[Figure: graph of the function $\log x$.]

$^{20}$ Another standard notation for $\log x$ is $\ln x$.

We conclude with a result that summarizes the properties of monotonicity of these
elementary functions.

Lemma 220 Both the exponential function $a^x$ and the logarithmic function $\log_a x$ are
increasing if $a > 1$ and decreasing if $0 < a < 1$.

Proof For the exponential function, observe that, when $a > 1$, also $a^h > 1$ for every $h > 0$.
Therefore $a^{x+h} = a^x a^h > a^x$ for every $h > 0$. For the logarithmic function, after observing
that $\log_a k > 0$ if $a > 1$ and $k > 1$, we have

$$\log_a (x + h) = \log_a \left[ x \left( 1 + \frac{h}{x} \right) \right] = \log_a x + \log_a \left( 1 + \frac{h}{x} \right) > \log_a x$$

for every $h > 0$, as desired. The case $0 < a < 1$ is analogous, with the inequalities reversed.

6.5.3 Trigonometric and periodic functions


The trigonometric functions, and more generally the periodic functions, are important in
many applications. We introduce them, referring the reader to the Appendix for a recall of
some elementary notions of trigonometry.$^{21}$

Trigonometric functions
The sine function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = \sin x$ is the first example of a trigonometric
function. For each $x \in \mathbb{R}$ we have

$$\sin(x + 2k\pi) = \sin x \qquad \forall k \in \mathbb{Z}$$

The graph of the sine function is:

[Figure: graph of the sine function.]

$^{21}$ For a more detailed introduction to the topic, we refer the reader to secondary school math textbooks.

The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = \cos x$ is the cosine function. For each $x \in \mathbb{R}$
we have
$$\cos(x + 2k\pi) = \cos x \qquad \forall k \in \mathbb{Z}$$

Its graph is:

[Figure: graph of the cosine function.]

Finally, the function $f : \mathbb{R} \setminus \{\pi/2 + k\pi : k \in \mathbb{Z}\} \to \mathbb{R}$ defined by $f(x) = \tan x$ is the tangent
function. By (B.3),
$$\tan(x + k\pi) = \tan x \qquad \forall k \in \mathbb{Z}$$

The graph is:

[Figure: graph of the tangent function.]

The functions $\sin x$, $\cos x$ and $\tan x$ are strictly monotonic, and hence invertible, respectively
in the intervals $[-\pi/2, \pi/2]$, $[0, \pi]$, and $(-\pi/2, \pi/2)$. Their inverse functions are denoted
respectively by $\arcsin x$ (or $\sin^{-1} x$), $\arccos x$ (or $\cos^{-1} x$), and $\arctan x$ (or $\tan^{-1} x$). In
particular, restricting ourselves to the interval of strict monotonicity of the function $\sin x$,
$[-\pi/2, \pi/2]$, we have
$$\sin x : \left[ -\frac{\pi}{2}, \frac{\pi}{2} \right] \to [-1, 1]$$
Hence, the inverse function of $\sin x$ is
$$\arcsin x : [-1, 1] \to \left[ -\frac{\pi}{2}, \frac{\pi}{2} \right]$$
and its graph is:

[Figure: graph of the function $\arcsin x$.]

Restricting ourselves to the interval $[0, \pi]$ of strict monotonicity of $\cos x$ we have:

$$\cos x : [0, \pi] \to [-1, 1]$$

Therefore, the inverse function of $\cos x$ is

$$\arccos x : [-1, 1] \to [0, \pi]$$

and its graph is:

[Figure: graph of the function $\arccos x$.]

Finally, restricting ourselves to the interval $(-\pi/2, \pi/2)$ of strict monotonicity of $\tan x$
we have:

$$\tan x : \left( -\frac{\pi}{2}, \frac{\pi}{2} \right) \to \mathbb{R}$$

so that the inverse function of $\tan x$ is

$$\arctan x : \mathbb{R} \to \left( -\frac{\pi}{2}, \frac{\pi}{2} \right)$$

and its graph is:

[Figure: graph of the function $\arctan x$.]

It is immediate to see that, for $\theta \in (0, \pi/2)$, one has $0 < \sin\theta < \theta < \tan\theta$.
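
The domains and ranges of the inverse trigonometric functions, and the chain of inequalities between sine, angle and tangent on $(0, \pi/2)$, can be checked on sample points with Python's standard `math` module. The sample points below are arbitrary choices made for this illustration.

```python
import math

xs = [-1.0, -0.5, 0.0, 0.3, 1.0]      # points of [-1, 1]

# arcsin and arccos undo sin and cos on [-1, 1]
asin_ok = all(math.isclose(math.sin(math.asin(x)), x, abs_tol=1e-12) for x in xs)
acos_ok = all(math.isclose(math.cos(math.acos(x)), x, abs_tol=1e-12) for x in xs)

# The values land in the intervals stated in the text
asin_range = all(-math.pi / 2 <= math.asin(x) <= math.pi / 2 for x in xs)
acos_range = all(0 <= math.acos(x) <= math.pi for x in xs)
atan_range = all(-math.pi / 2 < math.atan(t) < math.pi / 2
                 for t in (-1e6, -1.0, 0.0, 1.0, 1e6))

# 0 < sin t < t < tan t on a sample of (0, pi/2)
angles = [k * math.pi / 20 for k in range(1, 10)]
chain_ok = all(0 < math.sin(t) < t < math.tan(t) for t in angles)

print(asin_ok, acos_ok, asin_range, acos_range, atan_range, chain_ok)
```

Note that `math.asin` and `math.acos` raise an error outside $[-1, 1]$, in line with the domains of the inverse functions given above.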

Periodic functions

The trigonometric functions are the most classical periodic functions.

Definition 221 A function $f : \mathbb{R} \to \mathbb{R}$ is said to be periodic if there exists $p \in \mathbb{R}$ such that,
for each $x \in \mathbb{R}$, we have
$$f(x + kp) = f(x) \qquad \forall k \in \mathbb{Z} \tag{6.28}$$

The smallest (if it exists) among such $p > 0$ is called the period of $f$. In particular,
the functions $\sin x$ and $\cos x$ are periodic of period $2\pi$, while the function $\tan x$ has period
$\pi$. Their graphs show the property that characterizes the periodic functions, that is, of
repeating themselves identically on each interval of width $p$.

Example 222 The functions $\sin^2 x$ and $\log \tan x$ are periodic of period $\pi$. ▲

Let us see an example of non-trigonometric periodic function.

Example 223 The function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x - [x]$ is called mantissa function.$^{22}$
For $x > 0$ the mantissa of $x$ is its decimal part; for example $f(2.37) = 0.37$. The mantissa
function is periodic of period 1: by (1.19), $[x + 1] = [x] + 1$ for every $x \in \mathbb{R}$, and therefore

$$f(x + 1) = x + 1 - [x + 1] = x + 1 - ([x] + 1) = x - [x] = f(x)$$

The graph

[Figure: graph of the mantissa function.]

makes plain the periodicity. ▲

$^{22}$ Recall from Proposition 39 that the integer part $[x]$ of a real number $x \in \mathbb{R}$ is the greatest integer
$n \in \mathbb{Z}$ such that $n \le x$.
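
The mantissa function is easy to compute, since the integer part $[x]$ is the floor of $x$. The Python sketch below checks the periodicity on a few sample points; the sample is an arbitrary choice and comparisons are made up to floating-point error.

```python
import math

def mantissa(x):
    """x - [x], where [x] = floor(x) is the greatest integer n with n <= x."""
    return x - math.floor(x)

samples = [-2.75, -0.25, 0.0, 0.5, 2.37]
periodic = all(math.isclose(mantissa(x + 1), mantissa(x), abs_tol=1e-12)
               for x in samples)

print(mantissa(2.37), mantissa(-0.25), periodic)
```

For negative arguments the mantissa is not the decimal part: for instance, the value at $-0.25$ is $0.75$, because $[-0.25] = -1$.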

The reader can verify that periodicity is preserved by the fundamental operations among
functions. That is, if $f$ and $g$ are two periodic functions with the same period $p$, the functions
$f(x) + g(x)$, $f(x) g(x)$ and $f(x)/g(x)$ are also periodic (of period at most $p$).

6.6 Maxima and minima of a function (preview)


At this point, it is useful to introduce the concepts of maximum and minimum of a function.
We will carefully discuss them in Chapter 16. Given any function $f : A \to \mathbb{R}$ with any domain
$A$ and with real values, its image $\operatorname{Im} f$ is a subset of $\mathbb{R}$. Recall that if $\operatorname{Im} f$ is a bounded set,
the function is said to be bounded (Section 6.4.3). If, besides being bounded (and having
therefore supremum and infimum), the set $\operatorname{Im} f$ has also maximum and/or minimum we say
that the function $f$ has maximum and/or minimum, according to the following definition.

Definition 224 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function with real values. An element $\hat{x} \in A$ is
called a (global) maximizer (or maximum point) of $f$ on $A$ if

$$f(\hat{x}) \ge f(x) \qquad \forall x \in A \tag{6.29}$$

The value $f(\hat{x})$ of the function at $\hat{x}$ is called a global maximum (or maximum value) of $f$
on $A$. The maximum of a function $f : A \to \mathbb{R}$ is, if it exists, the value $M \in \mathbb{R}$ such that

$$M = \max(\operatorname{Im} f)$$

In this case we write $M = \max_{x \in A} f(x)$, and a point $x_0 \in A$ such that $f(x_0) = M$ is called
a maximizer of $f$ on $A$.

Thus, the maximum value of $f$ on $A$ is nothing but the maximum of the set $f(A) = \operatorname{Im} f$,
that is,
$$f(\hat{x}) = \max f(A) = \max(\operatorname{Im} f)$$
Thanks to Proposition 33, the maximum value is unique. We denote this unique value by

$$\max_{x \in A} f(x)$$

Analogous definitions hold for the minimum value of $f$ on $A$ and for the minimizer of $f$ on
$A$.
Example 225 Consider the parabola $f(x) = x^2$, whose graph is

[Figure: graph of the parabola $f(x) = x^2$.]

As one can see from the graph, the minimizer of $f$ is 0 and the minimum value is 0. Indeed,
$0 = f(0) \le f(x)$ for every $x \in \mathbb{R}$. ▲
As we have seen, if it exists, the maximum (minimum) of $f$ on $A$ is unique. By contrast,
the maximizer and the minimizer might not be unique; indeed, in general they are not, as
the next example shows.
Example 226 Let $f : \mathbb{R} \to \mathbb{R}$ be the sine function $f(x) = \sin x$ (Section 6.5.3). Since
$\operatorname{Im} f = [-1, 1]$, the unique maximum of $f$ on $\mathbb{R}$ is 1 and the unique minimum of $f$ on $\mathbb{R}$ is
$-1$. Nevertheless there are both infinitely many maximizers (all the points $x = \pi/2 + 2k\pi$
with $k \in \mathbb{Z}$) and infinitely many minimizers (all the points $x = -\pi/2 + 2k\pi$ with $k \in \mathbb{Z}$), as
we can easily see from the graph:

[Figure: graph of the sine function.]

▲
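
The distinction between the maximum value, which is unique, and the maximizers, which need not be, can be seen on a finite sample of the domain. The Python sketch below is an illustration under stated assumptions: the helper `maximizers`, the tolerance and the sample of multiples of $\pi/2$ are choices made here, and a finite sample can only exhibit finitely many of the maximizers.

```python
import math

def maximizers(f, domain, tol=1e-12):
    """Return (max value, points attaining it up to tol) on a finite domain."""
    best = max(f(x) for x in domain)
    return best, [x for x in domain if best - f(x) <= tol]

ks = range(-8, 9)
domain = [k * math.pi / 2 for k in ks]        # finite sample of R
best, argmax = maximizers(math.sin, domain)

# Indices k for which k * pi / 2 is a maximizer in the sample
argmax_ks = [k for k in ks if k * math.pi / 2 in argmax]
print(best, argmax_ks)
```

The maximizers found correspond to the points $\pi/2 + 2k\pi$ that fall in the sample, while the maximum value is a single number.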

6.7 Domains and restrictions


In the first paragraph of the chapter we have defined the domain of a function as the set
on which the function is defined: the domain of a function $f : A \to B$ is $A$. In the various
examples of functions of a real variable presented until now we have identified as domain the
greatest set $A \subseteq \mathbb{R}$ where the function $f$ could be defined. For example, for $f(x) = x^2$ the
domain is all of $\mathbb{R}$, for $f(x) = \sqrt{x}$ the domain is $\mathbb{R}_+$, for $f(x) = \log x$ the domain is $\mathbb{R}_{++}$, and
so on. For a function $f$ of one or several variables we will call natural domain (or domain of
existence) the greatest set on which $f$ can be defined. For example, $\mathbb{R}$ is the natural domain
of $x^2$, $\mathbb{R}_+$ is that of $\sqrt{x}$, $\mathbb{R}_{++}$ is that of $\log x$, and so on.
However, there is nothing special, except for maximality, in the natural domain: a function
can be considered as defined on any subset of the natural domain. For example, we can
consider $x^2$ only for positive values of $x$, in order to have a quadratic function $f : \mathbb{R}_+ \to \mathbb{R}$,
or we can consider $\log x$ only for values of $x$ greater than or equal to 1, in order to have a logarithmic
function $f : [1, +\infty) \to \mathbb{R}$, and so on.
In particular, given a function $f : A \to B$, it is sometimes important to consider
restrictions to subsets.

Definition 227 Let $f : A \to B$ be a function and let $C \subseteq A$. The function $g : C \to B$
defined by
$$g(x) = f(x) \qquad \forall x \in C$$
is called the restriction of $f$ to $C$ and it is denoted by $f_{|C}$.

The restriction $f_{|C}$ can therefore be seen as $f$ restricted to the subset $C$ of $A$. Thanks
to the smaller domain, the function $f_{|C}$ can satisfy properties different from those of the
original function $f$.

Example 228 Let $g : [0, 1] \to \mathbb{R}$ be defined by $g(x) = x^2$. The function $g$ can be seen as
the restriction to the interval $[0, 1]$ of the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2$; that
is, $g = f_{|[0,1]}$. Thanks to its restricted domain, the function $g$ has more (better) properties
than the function $f$. For example: $g$ is strictly increasing, while $f$ is not; $g$ is injective (and
therefore invertible), while $f$ is not; $g$ is bounded, while $f$ is only bounded from below; $g$
has both a (global) maximizer and a minimizer, while $f$ does not have a maximizer. ▲

Example 229 Let $g : (-\infty, 0] \to \mathbb{R}$ be defined by $g(x) = -x$. The function $g$ can be seen as
the restriction to $(-\infty, 0]$ both of $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = |x|$ and of $h : \mathbb{R} \to \mathbb{R}$ given by
$h(x) = -x$. Indeed, a function can be considered the restriction of several functions (rather,
of infinitely many functions) and being able to tell which among them is the most suitable
for a specific purpose is an interesting question in itself. In any case, let us analyze the
differences between $g$ and $f$ and those between $g$ and $h$. The function $g$ is injective, while
$f$ is not; $g$ is monotonic decreasing, while $f$ is not. The function $g$ is bounded from below,
while $h$ is not; $g$ has a global minimizer, while $h$ does not. ▲
Example 230 The function $f(x_1, x_2) = \sqrt{x_1 x_2}$ has as natural domain $\mathbb{R}^2_+ \cup \mathbb{R}^2_-$.
Nevertheless, when we regard it as a utility function of Cobb-Douglas type, its domain is
restricted to $\mathbb{R}^2_+$, since bundles of goods always have positive components. Moreover, since
$f(x_1, x_2) = 0$ even when only one component is zero, which is not really appropriate from
an economic viewpoint, this utility function is often considered only on $\mathbb{R}^2_{++}$. Therefore,
purely economic considerations lead to restricting the domain on which we study the
function $f(x_1, x_2) = \sqrt{x_1 x_2}$. ▲

Example 231 Let $g : [0, +\infty) \to \mathbb{R}$ be defined by $g(x) = x^3$. The function $g$ can be seen
as the restriction to the interval $[0, +\infty)$ of the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^3$, that
is, $g = f_{|[0,+\infty)}$. We observe that $g$ is convex, while $f$ is not; $g$ is bounded from below, while
$f$ is not; $g$ has a minimizer, while $f$ does not. ▲

Example 232 Let $g : (-\infty, 0] \to \mathbb{R}$ be defined by $g(x) = x^3$. The function $g$ can be seen as
the restriction to the interval $(-\infty, 0]$ of the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^3$, that is,
$g = f_{|(-\infty,0]}$. We observe that $g$ is concave, while $f$ is not; $g$ is bounded from above, while $f$
is not; $g$ has a maximizer, while $f$ does not. ▲

Dually to the concept of restriction, we now introduce the concept of
extension of a function (a function "extended" to a domain larger than the initial one).

Definition 233 Let $f : A \to B$ be a function and let $A \subseteq C$. A function $g : C \to B$ such
that
$$g(x) = f(x) \qquad \forall x \in A$$

is called an extension of $f$ to $C$.

It is evident from the definitions just given that restriction and extension are two sides
of the same coin: $g$ is an extension of $f$ if and only if $f$ is a restriction of $g$. In particular,
a function defined on its natural domain $A$ is an extension to $A$ of each restriction of this
function. It is also evident that if a function has an extension, it has infinitely many.$^{23}$

Example 234 The function $g : \mathbb{R} \to \mathbb{R}$ defined by

$$g(x) = \begin{cases} \dfrac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$$

is an extension of the function $f(x) = 1/x$, which has as natural domain $\mathbb{R} \setminus \{0\}$. ▲

Example 235 Let $g : \mathbb{R} \to \mathbb{R}$ be defined by

$$g(x) = \begin{cases} x & \text{for } x \le 0 \\ \log x & \text{for } x > 0 \end{cases}$$

It is an extension of the function $f(x) = \log x$, which has natural domain $\mathbb{R}_{++}$. ▲

$^{23}$ It could happen that a function does not have restrictions or does not have extensions. Indeed, let
$f : A \subseteq \mathbb{R} \to \mathbb{R}$. In the extreme situations, if $A = \{x_0\}$, that is, if the domain of $f$ is a single point, then $f$
does not have restrictions. If instead $A = \mathbb{R}$, $f$ does not have extensions.

6.8 Grand …nale: preferences and utility


6.8.1 Preferences
We close the chapter reconsidering in more depth the notions of preference and utility from
Section 6.2.1. Let us consider a preference relation $\succsim$ defined on a subset $A$ of $\mathbb{R}^n_+$, called
choice set, whose elements are interpreted as the bundles of goods relevant for the choices
of the consumer.
The preference $\succsim$ represents the tastes of the consumer over the bundles. In particular,
$x \succsim y$ means that the consumer prefers the bundle $x$ over the bundle $y$.$^{24}$ It is a basic notion,
which economists take as a given (leaving to psychologists the study of the motivations, more
or less deep, that underlie it). From it, the following two important notions are derived:

(i) we write $x \succ y$ if the bundle $x$ is strictly preferred to $y$, that is, if $x \succsim y$, but not $y \succsim x$;

(ii) we write $x \sim y$ if the bundle $x$ is indifferent to the bundle $y$, that is, if both
$x \succsim y$ and $y \succsim x$.

Note that the relations $\succ$ and $\sim$ are, obviously, mutually exclusive: between two
indifferent bundles there cannot exist strict preference, and vice versa.

On the primitive preference $\succsim$ we consider some axioms.

Reflexivity: $x \succsim x$ for every $x \in A$.

This first axiom reflects the "weakness" of $\succsim$: each bundle is preferred to itself. The next
axiom is more interesting.

Transitivity: $x \succsim y$ and $y \succsim z$ implies $x \succsim z$ for every $x, y, z \in A$.

It is an axiom of rationality that requires that the preferences of the decision maker have
no cycles:
$$x \succsim y \succsim z \succ x$$
Strict preference and indifference inherit these first two properties (with the obvious
exception of reflexivity for the strict preference).

Lemma 236 Let $\succsim$ be reflexive and transitive. Then:

(i) $\sim$ is reflexive and transitive;

(ii) $\succ$ is transitive.

Proof (i) We have $x \sim x$ since, thanks to the reflexivity of $\succsim$, both $x \succsim x$ and $x \precsim x$ hold.
Hence, the relation $\sim$ is reflexive. To prove transitivity, suppose that $x \sim y$ and $y \sim z$.
We show that this implies $x \sim z$. By definition, $x \sim y$ means that $x \succsim y$ and $y \succsim x$, while
$y \sim z$ means that $y \succsim z$ and $z \succsim y$. Thanks to the transitivity of $\succsim$, from $x \succsim y$ and $y \succsim z$
it follows that $x \succsim z$, while from $z \succsim y$ and $y \succsim x$ it follows that $z \succsim x$. We have therefore both $x \succsim z$ and
$z \succsim x$, i.e., $x \sim z$.
(ii) Suppose that $x \succ y$ and $y \succ z$. We show that this implies $x \succ z$. By definition, $x \succ y$ and $y \succ z$ only if
$x \succsim y$ and $y \succsim z$, so that $x \succsim z$ by the transitivity of $\succsim$. Suppose, by
contradiction, that $x \succ z$ does not hold, i.e., that $z \succsim x$. By the transitivity of $\succsim$ and since $y \succsim z$ and $z \succsim x$, it follows that $y \succsim x$, that
is, $x \sim y$ since $x \succsim y$. But $x \sim y$ contradicts $x \succ y$.

$^{24}$ In the weak sense of "prefers or is indifferent".

For each bundle x 2 A, denote by

[x] = fy 2 A : y xg

the collection of the bundles indi¤erent to it. This set is the indi¤ erence class of % determined
by the bundle x.

Lemma 237 If $\succsim$ is reflexive and transitive, we have

$$x \sim y \iff [x] = [y] \tag{6.30}$$

and
$$x \nsim y \iff [x] \cap [y] = \emptyset \tag{6.31}$$

Relations (6.30) and (6.31) express two fundamental properties of the indifference classes.
By (6.30), the indifference class $[x]$ does not depend on the choice of the bundle $x$: each
bundle indifferent to $x$ determines the same indifference class. By (6.31), distinct indifference
classes have no elements in common: they do not intersect.

Proof By the previous lemma, $\sim$ is reflexive and transitive; by its definition, it is also symmetric.
These properties imply (6.30) and (6.31).
Concerning (6.30), suppose that $x \sim y$. We show that this implies $[x] \subseteq [y]$. Let $z \in [x]$, that
is, $z \sim x$. Since $\sim$ is transitive, $z \sim x$ and $x \sim y$ imply that $z \sim y$, that is, $z \in [y]$, which
shows that $[x] \subseteq [y]$. A similar argument shows that $[y] \subseteq [x]$, and therefore we conclude
that $x \sim y$ implies $[y] = [x]$. Since the converse is obvious, (6.30) is proved.
We now move to (6.31). Suppose that $x \nsim y$; we show that $[x] \cap [y] = \emptyset$.
Suppose, by contradiction, that this is not the case and there exists $z \in [x] \cap [y]$. By
definition, we have both $z \sim x$ and $z \sim y$ and hence, by the symmetry and transitivity of $\sim$, we have $x \sim y$,
which contradicts $x \nsim y$. The contradiction shows that $x \nsim y$ implies $[x] \cap [y] = \emptyset$. Since
here also the converse is obvious, the proof is complete.

The collection $\{ [x] : x \in A \}$ of all the indifference classes is denoted by $A/{\sim}$ and is
sometimes called the indifference map. Thanks to the last lemma, $A/{\sim}$ forms a partition of $A$.
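The partition property can be illustrated on a small finite example. The sketch below is only an illustration under stated assumptions: the set of bundles and the preference `weakly_prefers` (ranking bundles by total quantity) are hypothetical choices, not taken from the text.

```python
def indifference_classes(bundles, weakly_prefers):
    """Group bundles into indifference classes [x] = {y : y ~ x}.

    Two bundles are indifferent when each is weakly preferred to the other.
    Comparing with one representative per class suffices only because the
    relation is assumed reflexive and transitive.
    """
    classes = []
    for x in bundles:
        for cls in classes:
            rep = cls[0]
            if weakly_prefers(x, rep) and weakly_prefers(rep, x):
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes


# Hypothetical preference: a bundle is weakly preferred if its total quantity is larger.
def weakly_prefers(x, y):
    return sum(x) >= sum(y)


bundles = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
classes = indifference_classes(bundles, weakly_prefers)

# The classes cover A and are pairwise disjoint, i.e., they form a partition.
covered = sorted(b for cls in classes for b in cls)
print(classes)
print(covered == sorted(bundles))
```

Each bundle lands in exactly one class, which is what (6.30) and (6.31) assert.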

We now return to the study of $\succsim$. The next axiom does not concern the rationality of the
decision maker, but his information.

Completeness: $x \succsim y$ or $y \succsim x$ for every $x, y \in A$.

Completeness requires that the consumer be able to compare any two bundles of goods,
even very different ones. Naturally, to do so the consumer must, at least, have sufficient
information about the two alternatives: it is easy to think of examples where this assumption
is rather strong.

In any case, note that completeness requires, inter alia, that each bundle be comparable
to itself, that is, $x \succsim x$. Thus, completeness implies reflexivity.

Under the completeness assumption, the relations $\succ$ and $\sim$ are both exclusive (as seen
above) and exhaustive.

Lemma 238 Let $\succsim$ be complete. Given any two bundles $x$ and $y$, we always have $x \succ y$ or
$y \succ x$ or $x \sim y$.²⁵

Proof By completeness, we have $x \succsim y$ or²⁶ $y \succsim x$. Suppose, without loss of generality, that
$x \succsim y$. One has $y \succsim x$ if and only if $x \sim y$, while one does not have $y \succsim x$ if and only if
$x \succ y$.

Since we are considering bundles of economic goods (and not of “bads”), it is natural to
assume monotonicity, i.e., that “more is better”. The triad $\geq$, $>$, and $\gg$ leads to three
possible incarnations of this simple principle of rationality:

Monotonicity: $x \geq y$ implies $x \succsim y$ for every $x, y \in A$.

Strict monotonicity: $x > y$ implies $x \succ y$ for every $x, y \in A$.

Strong monotonicity: $\succsim$ is monotonic and $x \gg y$ implies $x \succ y$ for every $x, y \in A$.

The relationships among the three notions are very similar to those seen for the analogous
notions of monotonicity studied (also for utility functions) in Section 6.4.4. For example,
strict monotonicity means that, given a bundle, an increase in the quantity of any good of
the bundle yields a strictly preferred bundle.
Analogous considerations hold for the other notions. In particular, (6.23) takes the
form:
$$\text{strict monotonicity} \implies \text{strong monotonicity} \implies \text{monotonicity}$$

6.8.2 Paretian utility


Although the preference $\succsim$ is the fundamental notion, for reasons of analytical convenience
it is important to find a numerical representation, that is, a function of several variables
$u : A \to \mathbb{R}$ such that, for each pair of bundles $x, y$, we have

$$x \succsim y \iff u(x) \geq u(y) \tag{6.32}$$

The function $u$ is called a (Paretian) utility function; it also represents the strict preference and
indifference:

Lemma 239 We have

$$x \sim y \iff u(x) = u(y) \tag{6.33}$$
and
$$x \succ y \iff u(x) > u(y) \tag{6.34}$$
25
These “or” are intended as “aut”.
26
Intended as “vel”.

Proof Indeed,

$$x \sim y \iff x \succsim y \text{ and } y \succsim x \iff u(x) \geq u(y) \text{ and } u(y) \geq u(x) \iff u(x) = u(y)$$

which proves (6.33).

Now consider (6.34). If $x \succ y$, then $u(x) > u(y)$. Indeed, suppose, by contradiction,
that $u(x) \leq u(y)$; (6.32) implies $x \precsim y$, which contradicts $x \succ y$. It remains to show that
$u(x) > u(y)$ implies $x \succ y$. Since $u(x) \geq u(y)$, by (6.32) we have $x \succsim y$. Arguing again by
contradiction, suppose that $x \succ y$ fails, so that $x \precsim y$; (6.32)
implies $u(x) \leq u(y)$, which contradicts $u(x) > u(y)$. This completes the proof of (6.34).

Expression (6.33) allows us to represent the indifference classes as indifference curves of the
utility function:
$$[x] = \{ y \in A : u(y) = u(x) \}$$
As already observed in Section 6.4.4, the utility function is a mere representation of the
preference relation, which is the basic notion, without any special psychological meaning.
Indeed, we have already seen how each strictly increasing function $f : \operatorname{Im} u \to \mathbb{R}$ defines an
equivalent utility function $f \circ u$, for which it holds that

$$x \succsim y \iff (f \circ u)(x) \geq (f \circ u)(y)$$
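The invariance under strictly increasing transformations can be checked numerically on a finite grid of bundles. The following is a minimal sketch under stated assumptions: the utility `u` (a product of the two quantities) and the transformation $f(t) = \log(1+t)$ are hypothetical choices made only for illustration.

```python
import math
from itertools import product


def u(x):
    # Hypothetical utility on bundles with two goods.
    return x[0] * x[1]


def v(x):
    # v = f o u with f(t) = log(1 + t), strictly increasing on t >= 0.
    return math.log(1 + u(x))


bundles = list(product(range(4), repeat=2))

# u and v rank every pair of bundles in the same way.
same_ranking = all(
    (u(x) >= u(y)) == (v(x) >= v(y)) for x in bundles for y in bundles
)
print(same_ranking)
```

The numerical values of `u` and `v` differ, but the induced ordering of bundles is the same, which is all that a Paretian utility function encodes.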

6.8.3 Existence and lexicographic preference


In light of all this, the central theoretical problem is to establish under which
conditions a preference relation $\succsim$ admits a utility function. It is a subtle problem, and shortly
we will get acquainted with the famous lexicographic preference, which does not admit any
numerical representation.
The next existence theorem solves the problem. To this end we need a last axiom, which
recalls the Archimedean property of the real numbers seen in Section 1.4.3. For simplicity,
we will suppose that the choice set $A$ is the entire $\mathbb{R}^n_+$.²⁷

Archimedean: given any three bundles $x, y, z \in A = \mathbb{R}^n_+$ with $x \succ y \succ z$, there exist
weights $\alpha, \beta \in (0, 1)$ such that

$$\alpha x + (1 - \alpha) z \succ y \succ \beta x + (1 - \beta) z$$

The axiom implies that there exist no infinitely preferred and no infinitely “unpreferred”
bundles. Given the preferences $x \succ y$ and $y \succ z$, for the consumer the bundle $x$ cannot
be infinitely better than $y$, nor can the bundle $z$ be infinitely worse than $y$. Indeed, by
suitably combining the bundles $x$ and $z$ we get both a bundle better than $y$, that is,
$\alpha x + (1 - \alpha) z$, and a bundle worse than $y$, that is, $\beta x + (1 - \beta) z$. This would be impossible
if $x$ were infinitely better than $y$, or if $z$ were infinitely worse than $y$.
In this regard, recall the analogous property of the real numbers: if $x, y, z \in \mathbb{R}$
are three scalars with $x > y > z$, there exist $\alpha, \beta \in (0, 1)$ such that

$$\alpha x + (1 - \alpha) z > y > \beta x + (1 - \beta) z \tag{6.35}$$


27
The axiom can be stated more generally for convex sets, an important notion that we will study in
Chapter 13.

The property does not hold if we consider $+\infty$ and $-\infty$, that is, the extended real line
$\overline{\mathbb{R}} = [-\infty, +\infty]$. In this case, if $y \in \mathbb{R}$ but $x = +\infty$ and/or $z = -\infty$, the scalar $x$ is infinitely
greater than $y$, and/or $z$ is infinitely smaller than $y$, and there do not exist $\alpha, \beta \in (0, 1)$ that
satisfy the inequalities (6.35). Indeed, $\alpha \cdot (+\infty) = +\infty$ and $\beta \cdot (-\infty) = -\infty$ for every $\alpha, \beta \in (0, 1)$,
as seen in Section 1.7.

In conclusion, the Archimedean axiom makes the bundles different in quality yet comparable:
however different, they belong to the same league. Thanks to it we can now
state the existence theorem, whose proof is not simple and is omitted.

Theorem 240 Let $\succsim$ be a preference defined on $A = \mathbb{R}^n_+$. The following conditions are
equivalent:

(i) $\succsim$ is transitive, complete, strictly monotonic and Archimedean;

(ii) there exists a strictly monotonic and continuous function²⁸ $u : A \to \mathbb{R}$ such that (6.32)
holds, that is,
$$x \succsim y \iff u(x) \geq u(y)$$

This is a result of remarkable importance: most economic applications use utility functions,
and the theorem shows which conditions on preferences justify such use.²⁹

To appreciate the importance of Theorem 240, we close the chapter with a famous example
of preferences that do not admit a utility function. Let $A = \mathbb{R}^2_+$ and, given two bundles
$x$ and $y$, let us write $x \succsim y$ if $x_1 > y_1$ or if $x_1 = y_1$ and $x_2 \geq y_2$. The consumer starts by
considering the first coordinate: if $x_1 > y_1$, then $x \succsim y$. If, on the other hand, $x_1 = y_1$, then
he turns his attention to the second coordinate: if $x_2 \geq y_2$, then $x \succsim y$.
The preference mimics the way in which dictionaries order words; for this reason
$\succsim$ is called lexicographic preference. In particular, we have $x \succ y$ if $x_1 > y_1$ or $x_1 = y_1$ and
$x_2 > y_2$, while we have $x \sim y$ if and only if $x = y$. The indifference classes are therefore
singletons, a first remarkable characteristic of this preference.
The lexicographic preference is complete, transitive and strictly monotonic, as the reader
can easily verify. It is not Archimedean, however. Indeed, consider, for example, $x = (1, 0)$,
$y = (0, 1)$, and $z = (0, 0)$. We have $x \succ y \succ z$ and

$$\alpha x + (1 - \alpha) z = (\alpha, 0) \succ y \succ z \qquad \forall \alpha \in (0, 1)$$

so that no weight $\beta \in (0, 1)$ yields $y \succ \beta x + (1 - \beta) z$, which shows that the Archimedean axiom does not hold.
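The failure of the Archimedean axiom in this example can be explored numerically. The sketch below is illustrative only: the helper names and the grid of weights are assumptions, and a finite grid can suggest the result but of course cannot replace the argument above.

```python
def lex_strictly_prefers(x, y):
    """Strict lexicographic preference on pairs: first coordinate, then second."""
    return x[0] > y[0] or (x[0] == y[0] and x[1] > y[1])


def mix(w, x, z):
    """Convex combination w*x + (1 - w)*z, computed coordinate by coordinate."""
    return tuple(w * xi + (1 - w) * zi for xi, zi in zip(x, z))


x, y, z = (1, 0), (0, 1), (0, 0)
weights = [k / 1000 for k in range(1, 1000)]  # a grid inside (0, 1)

# Every mixture of x and z is strictly preferred to y ...
all_mixtures_beat_y = all(lex_strictly_prefers(mix(a, x, z), y) for a in weights)
# ... so no weight on the grid gives a mixture strictly worse than y.
some_mixture_below_y = any(lex_strictly_prefers(y, mix(b, x, z)) for b in weights)

print(all_mixtures_beat_y, some_mixture_below_y)
```

Any positive amount of the first good, however small, dominates whatever $y$ offers in the second good, which is exactly the "infinitely better" behavior that the axiom rules out.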


For this reason Theorem 240 does not apply to the lexicographic preference, which therefore
cannot be represented by a strictly monotonic and continuous utility function. Actually, this
preference does not admit any utility function at all.

Proposition 241 The lexicographic preference does not admit any utility function.
28
Continuity is an important property, to which Chapter 12 is devoted.
29
There exist other results on the existence of utility functions, most of them proved in the 1940s and
1950s.

Proof Suppose, by contradiction, that there exists $u : \mathbb{R}^2_+ \to \mathbb{R}$ that represents the
lexicographic preference. Let $a < b$ be any two positive scalars. For each $x \geq 0$ we have
$(x, a) \prec (x, b)$ and therefore $u(x, a) < u(x, b)$. By Proposition 39, there exists a rational
number $q(x)$ such that $u(x, a) < q(x) < u(x, b)$. The rule $x \mapsto q(x)$ therefore defines a
function $q : \mathbb{R}_+ \to \mathbb{Q}$. It is injective. If $x \neq y$, for example $y < x$, then:

$$u(y, a) < q(y) < u(y, b) < u(x, a) < q(x) < u(x, b)$$

and hence $q(x) \neq q(y)$. But, since $\mathbb{R}_+$ has the same cardinality as $\mathbb{R}$, the injectivity of the
function $q : \mathbb{R}_+ \to \mathbb{Q}$ implies $|\mathbb{Q}| \geq |\mathbb{R}|$, contradicting Theorem 250 of Cantor. This proves
that the lexicographic preference does not admit any utility function.
Chapter 7

Cardinality

7.1 Actual infinite and potential infinite


Ideally, a quantity can be made larger and larger by unit increases, a set can become larger
and larger by adding an extra element to it, and a segment can be subdivided into smaller and
smaller parts (of positive length) by repeatedly cutting it in half. Therefore, potentially, we
have arbitrarily large quantities and sets, as well as arbitrarily small segments. In these
cases, we talk of potential infinite. It is a notion that has played a decisive role in
mathematics since the dawn of Greek mathematics. The $\varepsilon$-$\delta$ arguments upon which the
study of limits is based are a brilliant example of this.¹
When the potential infinite is realized and becomes actual, we have an actual infinite. In set
theory, our main interest here, the actual infinite corresponds to sets consisting of infinitely many
elements. Not in potentia (in power) but in act: a set with a finite number of grains of sand
to which we add more and more new grains is infinite in potentia, but not in act, because,
however large, the number of grains remains finite. Instead, a set that consists of infinitely many
grains of sand is infinite in the actual sense.² It is, of course, a metaphysical notion that
only the eye of the mind can see: (sensible) reality is necessarily finite. Thus the actual infinite,
starting from Aristotle, to whom the distinction between the two notions of infinite dates
back, was regarded with great suspicion (summarized by the Latin saying infinitum actu
non datur). On the other hand, the dangers of a naive approach to the actual infinite, based
purely on intuition, had already been masterfully highlighted in pre-Socratic times by some
of the celebrated paradoxes of Zeno of Elea.
All of this changed, after more than twenty centuries, with the epoch-making work of
Georg Cantor. Approximately between 1875 and 1885, Cantor revolutionized mathematics
by finding the key concept (bijective functions) that allows for a rigorous study of sets, finite
and infinite, thus putting the notion of set at the foundations of mathematics. It is not by
chance that our textbook starts with such a notion. The rest of the chapter is devoted to
the Cantorian study of infinite sets, in particular of their cardinality.

¹ As we will see in Chapters 8 and 11. The potential infinite will come into play when, for example, we
consider $\varepsilon > 0$ arbitrarily small (but always non-zero) or $n$ arbitrarily large (yet finite).
² In a conference held in 1925, David Hilbert described these notions of infinite with the following words:
“Someone who wished to characterize briefly the new conception of the infinite which Cantor introduced
might say that in analysis we deal with the infinitely large and the infinitely small only as limit concepts, as
something becoming, happening, i.e., with the potential infinite. But this is not the true infinite. We meet
the true infinite when we regard the totality of numbers 1, 2, 3, 4, ... itself as a completed unity, or when we
regard the points of an interval as a totality of things which exists all at once. This kind of infinity is known
as actual infinity.” (Translated in P. Benacerraf and H. Putnam, Philosophy of mathematics, Cambridge
University Press, 1964).

7.2 Bijective functions and cardinality


Bijective functions, introduced in the last chapter, are fundamental in mathematics since
they often formalize a criterion of similarity. Cantor bases the study of the
cardinality of infinite sets on such functions.
We start by considering a finite set $A$, that is, a set that contains a finite number of
elements. We call the number of elements of the set $A$ the cardinality (or power) of $A$, and
we usually denote it by $|A|$.

Example 242 The set $A = \{11, 13, 15, 17, 19\}$ of the odd integers between 10 and
20 is finite and $|A| = 5$. ▲

Thanks to Proposition 192, two finite sets have the same cardinality if and only if their
elements can be put in a one-to-one correspondence: for example, if we have seven seats and
seven students, we can pair each seat with a student by making the latter sit on the former.
In particular, we have the following definition.

Definition 243 A set $A$ is finite if it can be put in a one-to-one correspondence with a
subset of the form $\{1, 2, \dots, n\}$ of $\mathbb{N}$. In this case, we write $|A| = n$.

In other words, $A$ is finite if there exist a set $\{1, 2, \dots, n\}$ of natural numbers and a bijective
function $f : \{1, 2, \dots, n\} \to A$. The set $\{1, 2, \dots, n\}$ can be seen as the “prototypical” set of
cardinality $n$, relative to which it is possible to “calibrate” all the other finite sets of the same
cardinality through suitable bijective functions.

For the cardinality of finite sets, the functional viewpoint, based on bijective functions
and on isolating a prototypical set, is not much more than a curiosity. However, it becomes
substantial when we want to extend the notion of cardinality to infinite sets. This was one of
the fundamental intuitions of Georg Cantor, which led to the birth of the theory of infinite
sets. Indeed, the possibility of establishing a one-to-one correspondence among infinite sets
allows for a classification of these sets by “size” and leads to the discovery of properties that
are not always intuitive.

Definition 244 A set $A$ is said to be countable if it can be put in a one-to-one correspondence
with the set $\mathbb{N}$ of the natural numbers. In this case, we write $|A| = |\mathbb{N}|$.

In other words, $A$ is countable if there exists a bijective function $f : \mathbb{N} \to A$, that is, if
the elements of the set can be ordered in a sequence: $a_0, a_1, \dots, a_n, \dots$ (i.e., 0 corresponds to
$a_0$, 1 to $a_1$, and so on). The set $\mathbb{N}$ is therefore the “prototype” for countable sets: any other
set is countable if it is possible to pair in a one-to-one fashion (as with the aforementioned little
cups and teaspoons) its elements with those of $\mathbb{N}$. This is the first category of infinite sets
that we encounter.

Compared with finite sets, countable sets immediately exhibit a remarkable, possibly puzzling,
property: it is always possible to put a countable set into a one-to-one correspondence with
an infinite proper subset of it. In other words, losing elements may not affect cardinality
when dealing with countable sets.
Theorem 245 Each infinite subset of a countable set is also countable.
Proof Let $X$ be a countable set and let $A \subseteq X$ be an infinite proper subset of $X$, i.e.,
$A \neq X$ (if $A = X$ there is nothing to prove). Since $X$ is countable, its elements can be listed as a sequence of distinct elements
$X = \{x_0, x_1, \dots, x_n, \dots\} = \{x_i\}_{i \in \mathbb{N}}$. Let us denote by $n_0$ the smallest integer larger than or
equal to 0 such that $x_{n_0} \in A$ (if, for example, $x_0 \in A$, we have $n_0 = 0$; if $x_0 \notin A$ and $x_1 \in A$,
we have $n_0 = 1$; and so on). Analogously, let us denote by $n_1$ the smallest integer
(strictly) larger than $n_0$ such that $x_{n_1} \in A$. Given $n_0, n_1, \dots, n_j$ ($j \geq 1$), let us define $n_{j+1}$ as
the smallest integer larger than $n_j$ such that $x_{n_{j+1}} \in A$; such an integer exists because $A$ is infinite. Consider now the function
$f : \mathbb{N} \to A$ defined by $f(i) = x_{n_i}$, with $i = 0, 1, \dots, n, \dots$. It is easy to check that $f$ is a
one-to-one correspondence between $\mathbb{N}$ and $A$, and so $A$ is countable.
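The construction of the indices $n_0 < n_1 < \cdots$ in the proof is effectively an algorithm: scan the listing of $X$ and keep the elements that belong to $A$. A minimal sketch, assuming for illustration that $X = \mathbb{N}$ is listed as $x_n = n$ and that $A$ is the set of perfect squares (both are hypothetical choices, as are the function names):

```python
import math
from itertools import count, islice


def enumerate_subset(in_subset):
    """Yield x_{n_0}, x_{n_1}, ... for the listing x_n = n of N.

    The indices n_0 < n_1 < ... are found exactly as in the proof:
    each one is the smallest index after the previous one that lands in A.
    """
    for n in count(0):
        if in_subset(n):
            yield n


def is_square(n):
    root = math.isqrt(n)
    return root * root == n


# f(0), f(1), ..., f(5) for A = perfect squares
first_squares = list(islice(enumerate_subset(is_square), 6))
print(first_squares)  # [0, 1, 4, 9, 16, 25]
```

Because the subset is infinite, the scan never runs out of elements, so every natural number $i$ receives an image $f(i)$.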

The following example should clarify the scope of the previous theorem. The set $E$ of
even numbers is, clearly, a proper subset of $\mathbb{N}$ that we may think contains only “half” of
its elements. Nevertheless, it is possible to establish a one-to-one correspondence with $\mathbb{N}$ by
putting each even number in correspondence with its half:
$$2n \in E \longleftrightarrow n \in \mathbb{N}$$
and therefore $|E| = |\mathbb{N}|$. Galileo already realized this remarkable peculiarity of infinite sets,
which clearly distinguishes them from finite sets, whose proper subsets always have smaller
cardinality.³ In a famous passage of the Discorsi e dimostrazioni matematiche intorno a
due nuove scienze,⁴ published in 1638, he observed that the natural numbers can be put in
a one-to-one correspondence with their squares by setting $n^2 \leftrightarrow n$. The squares, which at
first sight seem to constitute a rather small subset of $\mathbb{N}$, are thus in equal number with the
natural numbers: “in an infinite number, if one could conceive of such a thing, he would be
forced to admit that there are as many squares as there are numbers all taken together”. The
clarity with which Galileo exposes the problem is worthy of his genius. Unfortunately, the
mathematical notions available to him were completely insufficient for further developing
his intuitions. For example, the notion of function, fundamental for the ideas of Cantor,
emerged (in a primitive form) only at the end of the seventeenth century in the works of
Leibniz.

Clearly, the union of a finite number of countable sets is also countable. Much more is
actually true.
³ The mathematical fact considered here is at the basis of several little stories. For example, the Paradise
Hotel has countably many rooms, progressively numbered 1, 2, 3, .... At a certain moment, they are all
occupied when a new guest checks in. At this point, the hotel manager faces a conundrum: how to find a
room for the new guest? Well, after some thought, he realizes that it is easier than he imagined! It is enough
to ask every guest to move to the room coming after the one currently occupied (1 → 2, 2 → 3,
3 → 4, etc.). In this way, room number 1 will become free. He also realizes that it is possible to improve
upon this new arrangement! It is enough to ask everyone to move to the room whose number is twice
that of the room currently occupied (1 → 2, 2 → 4, 3 → 6, etc.). In this way, infinitely many rooms will become
available: all the odd ones.
⁴ The passage is in a dialogue between Sagredo, Salviati, and Simplicio, during the first day.

Theorem 246 The union of a countable collection of countable sets is also countable.

Proof We first prove two auxiliary claims.

Claim 1 $\mathbb{N} \times \mathbb{N}$ is countable.
Proof of Claim 1 Consider the function $f_1 : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ given by $f_1(m, n) = 2^{n+1} 3^{m+1}$. Note
that $f_1(m, n) = f_1(\bar{m}, \bar{n})$ means that $2^{n+1} 3^{m+1} = 2^{\bar{n}+1} 3^{\bar{m}+1}$. By the Fundamental Theorem
of Arithmetic, this implies that $n + 1 = \bar{n} + 1$ and $m + 1 = \bar{m} + 1$, proving that $(m, n) = (\bar{m}, \bar{n})$.
Thus, $f_1$ is injective and $f_1 : \mathbb{N} \times \mathbb{N} \to \operatorname{Im} f_1$ is bijective. At the same time, by Theorem 245
and since $\operatorname{Im} f_1$ is infinite (it indeed contains the set $\{2 \cdot 3, 2^2 \cdot 3, \dots, 2^n \cdot 3, \dots\}$), it follows
that $\operatorname{Im} f_1$ is countable, that is, there exists a bijection $f_2 : \mathbb{N} \to \operatorname{Im} f_1$. The reader can
easily verify that the map $f = f_1^{-1} \circ f_2$ is a bijection from $\mathbb{N}$ to $\mathbb{N} \times \mathbb{N}$, proving that $\mathbb{N} \times \mathbb{N}$
is countable.
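The injectivity argument of Claim 1 can be mirrored in code: the pair $(m, n)$ is recovered from $2^{n+1} 3^{m+1}$ by counting prime factors. The sketch below is only an illustration; the helper name `f1_inverse` and the finite grid used for the check are assumptions.

```python
def f1(m, n):
    """The map of Claim 1: (m, n) -> 2^(n+1) * 3^(m+1)."""
    return 2 ** (n + 1) * 3 ** (m + 1)


def f1_inverse(value):
    """Recover (m, n) from a value in the image of f1 by counting factors 2 and 3."""
    twos = 0
    while value % 2 == 0:
        value //= 2
        twos += 1
    threes = 0
    while value % 3 == 0:
        value //= 3
        threes += 1
    if value != 1 or twos == 0 or threes == 0:
        raise ValueError("not in the image of f1")
    return threes - 1, twos - 1


pairs = [(m, n) for m in range(20) for n in range(20)]
images = [f1(m, n) for m, n in pairs]

# Distinct pairs have distinct images, and each pair is recovered from its image.
print(len(set(images)) == len(pairs))
print(all(f1_inverse(f1(m, n)) == (m, n) for m, n in pairs))
```

Uniqueness of the prime factorization is what guarantees that the two counting loops return a single well-defined pair.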
Claim 2 If $g : \mathbb{N} \to B$ is surjective and $B$ is infinite, then $B$ is countable.
Proof of Claim 2 Define $h_1 : B \to \mathbb{N}$ by $h_1(b) = \min \{ n \in \mathbb{N} : g(n) = b \}$ for all $b \in B$. Since
$g$ is surjective, $\{ n \in \mathbb{N} : g(n) = b \}$ is non-empty for all $b \in B$, thus $h_1$ is well defined. Note
that $b \neq b'$ implies that $h_1(b) \neq h_1(b')$, thus $h_1$ is injective. It follows that $h_1 : B \to \operatorname{Im} h_1$ is
bijective. At the same time, by Theorem 245 and since $\operatorname{Im} h_1$ is infinite ($B$ is infinite), there
exists a bijection $h_2 : \mathbb{N} \to \operatorname{Im} h_1$. The reader can easily verify that the map $h = h_1^{-1} \circ h_2$ is
a bijection from $\mathbb{N}$ to $B$, proving that $B$ is countable.
We are ready to prove the result. Consider the countable collection

$$A_0, A_1, \dots, A_m, \dots \tag{7.1}$$

and define $B = \bigcup_{m=0}^{+\infty} A_m$. Since each $A_m$ is countable, clearly, $B$ is infinite and there exists
a bijection $g_m : \mathbb{N} \to A_m$. Define the map $\hat{g} : \mathbb{N} \times \mathbb{N} \to B$ by the rule $\hat{g}(m, n) = g_m(n)$. In
other words, the first natural number $m$ chooses the set, while the second natural number $n$
chooses the $n$-th element of that set. The map $\hat{g}$ is surjective: any given element $b \in B$
belongs to $A_m$ for some $m$ and is paired with a natural number $n$ by the map $g_m$, that is,
$\hat{g}(m, n) = g_m(n) = b$. Unfortunately, $\hat{g}$ might not be injective, since the sets in (7.1) might
have elements in common. If we consider $g = \hat{g} \circ f$, where $f$ is as in Claim 1, this function
goes from $\mathbb{N}$ to $B$ and is surjective. By Claim 2, it follows that $B$ is countable, proving the
statement.
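The idea of the proof (enumerate all pairs $(m, n)$, then keep only the first occurrence of each element, as in Claim 2) can be sketched as follows. This is an illustration under explicit assumptions: the pairs are enumerated along anti-diagonals, a swapped-in enumeration of $\mathbb{N} \times \mathbb{N}$ that is not the map $f$ of Claim 1, and the family $A_m$ (the multiples of $m + 1$) is a hypothetical example with overlapping sets.

```python
from itertools import count, islice


def pairs():
    """Enumerate N x N along anti-diagonals (not the map f of Claim 1)."""
    for s in count(0):
        for m in range(s + 1):
            yield m, s - m


def g_hat(m, n):
    # Hypothetical family: A_m = multiples of m + 1, listed by g_m(n) = (m + 1) * n.
    return (m + 1) * n


def enumerate_union():
    """List the union of the A_m without repetitions, keeping first occurrences."""
    seen = set()
    for m, n in pairs():
        b = g_hat(m, n)
        if b not in seen:
            seen.add(b)
            yield b


first = list(islice(enumerate_union(), 60))
print(first[:8])
```

Skipping elements already seen plays the role of the map $h_1$ in Claim 2, which assigns to each element the smallest index at which it appears.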

With a similar argument it is possible to prove that the Cartesian product of a finite
number of countable sets is also countable. In particular, the result above implies that the set $\mathbb{Q}$
of the rational numbers is countable.

Corollary 247 Z and Q are countable.

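Proof We first prove that $\mathbb{Z}$ is countable. Define $f : \mathbb{N} \to \mathbb{Z}$ by the rule

$$f(n) = \begin{cases} \dfrac{n}{2} & \text{if } n \text{ is even} \\[4pt] -\dfrac{n+1}{2} & \text{if } n \text{ is odd} \end{cases}$$

The reader can verify that $f$ is indeed bijective, proving that $\mathbb{Z}$ is countable. On the other
hand, the set
$$\mathbb{Q} = \left\{ \frac{m}{n} : m \in \mathbb{Z} \text{ and } n \in \mathbb{N}, \text{ with } n \neq 0 \right\}$$
of rational numbers can be written as a union of infinitely many countable sets:

$$\mathbb{Q} = \bigcup_{n=1}^{+\infty} A_n$$

where
$$A_n = \left\{ \frac{0}{n}, \frac{1}{n}, -\frac{1}{n}, \frac{2}{n}, -\frac{2}{n}, \dots, \frac{m}{n}, -\frac{m}{n}, \dots \right\}$$

Each $A_n$ is countable because it is in a one-to-one correspondence with $\mathbb{Z}$, which, in turn, is
countable. By Theorem 246, it follows that $\mathbb{Q}$ is countable.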
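The bijection between $\mathbb{N}$ and $\mathbb{Z}$ used in the proof can be checked on an initial segment. The sketch assumes the rule $f(n) = n/2$ for even $n$ and $f(n) = -(n+1)/2$ for odd $n$; the inverse `f_inverse` is a helper introduced here for illustration.

```python
def f(n):
    """Map N onto Z: even n go to n/2, odd n go to -(n + 1)/2."""
    return n // 2 if n % 2 == 0 else -((n + 1) // 2)


def f_inverse(z):
    """Inverse map: non-negative integers come from even n, negative ones from odd n."""
    return 2 * z if z >= 0 else -2 * z - 1


print([f(n) for n in range(7)])  # [0, -1, 1, -2, 2, -3, 3]
print(all(f_inverse(f(n)) == n for n in range(1000)))
print(all(f(f_inverse(z)) == z for z in range(-500, 500)))
```

The listing $0, -1, 1, -2, 2, \dots$ alternates between the two "halves" of $\mathbb{Z}$, so that every integer is reached exactly once.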

The property just stated is quite surprising: though the rational numbers appear to be much more
numerous than the natural numbers, there exists a way to put these two classes of numbers
into a one-to-one correspondence. The cardinality of $\mathbb{N}$, and of any countable set, is usually
denoted by $\aleph_0$: $|\mathbb{N}| = \aleph_0$. Therefore, we can write as

$$|\mathbb{Q}| = \aleph_0$$

the remarkable property that $\mathbb{Q}$ is countable.⁵


At this point, we might suspect that all infinite sets are countable. The next result
shows that this is not the case. The set $\mathbb{R}$ of real numbers is infinite, but not countable,
being much richer in terms of elements than $\mathbb{N}$. To establish this fundamental result, we
need a new definition and an interesting result.

Definition 248 A set $A$ has the cardinality of the continuum if it can be put in a one-to-one
correspondence with the set $\mathbb{R}$ of the real numbers. In this case, we write $|A| = |\mathbb{R}|$.

The cardinality of the continuum is often denoted by $c$, that is, $|\mathbb{R}| = c$. Also in this case
there exist subsets that are, prima facie, much smaller than $\mathbb{R}$, but turn out to have the same
cardinality. Let us see an example which will be useful in proving that $\mathbb{R}$ is uncountable.

Proposition 249 The interval $(0, 1)$ has the cardinality of the continuum.

Proof We want to show that $|(0, 1)| = |\mathbb{R}|$. To do this we have to show that the numbers of
$(0, 1)$ can be put in a one-to-one correspondence with those of $\mathbb{R}$. The bijection $f : \mathbb{R} \to (0, 1)$
defined by

$$f(x) = \begin{cases} 1 - \dfrac{1}{2} e^{x} & \text{if } x < 0 \\[4pt] \dfrac{1}{2} e^{-x} & \text{if } x \geq 0 \end{cases}$$

[Figure: graph of $f$ in the $(x, y)$ plane, with the values $1/2$ and $1$ marked on the $y$-axis.]

shows that, indeed, this is the case (as the reader can also formally verify).

⁵ $\aleph$ (aleph) is the first letter of the Hebrew alphabet. In the following section we will formalize, also for
infinite sets, the idea of having the same or greater cardinality; now, we treat these notions intuitively.
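A numerical sanity check of such a bijection can be sketched as follows. It assumes that the map is $f(x) = 1 - \frac{1}{2} e^{x}$ for $x < 0$ and $f(x) = \frac{1}{2} e^{-x}$ for $x \geq 0$; the inverse `f_inverse` is derived here from that assumption and is not part of the text.

```python
import math


def f(x):
    """Assumed bijection from R onto (0, 1), strictly decreasing."""
    return 1 - 0.5 * math.exp(x) if x < 0 else 0.5 * math.exp(-x)


def f_inverse(y):
    """Inverse of f on (0, 1): values up to 1/2 come from x >= 0, the others from x < 0."""
    return -math.log(2 * y) if y <= 0.5 else math.log(2 * (1 - y))


samples = [-5.0, -1.0, -0.25, 0.0, 0.5, 3.0]
values = [f(x) for x in samples]

print(all(0 < y < 1 for y in values))
print(all(abs(f_inverse(f(x)) - x) < 1e-9 for x in samples))
# f is strictly decreasing on the samples, hence injective there.
print(all(a > b for a, b in zip(values, values[1:])))
```

The two branches meet at $f(0) = 1/2$: negative numbers are sent to $(1/2, 1)$ and non-negative numbers to $(0, 1/2]$, so that the whole interval is covered.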

Theorem 250 (Cantor) $\mathbb{R}$ is uncountable, that is, $|\mathbb{R}| > \aleph_0$.

Proof We proceed by contradiction and assume that $\mathbb{R}$ is countable. Hence, there exists
a bijective function $g : \mathbb{N} \to \mathbb{R}$. By Proposition 249, there exists a bijective
function $f : \mathbb{R} \to (0, 1)$. The reader can easily prove that $f \circ g$ is a bijective function from
$\mathbb{N}$ to $(0, 1)$, so that $(0, 1)$ is countable. We now reach a contradiction by showing
that $(0, 1)$ cannot be countable. To this end, we write all the numbers in $(0, 1)$ using their
decimal representation: each $x \in (0, 1)$ is written as

$$x = 0.c_0 c_1 \cdots c_n \cdots$$

with $c_i \in \{0, 1, \dots, 9\}$, always using infinitely many digits (for example, 0.54 is written
0.54000000...). Since we are assuming that $(0, 1)$ is countable, there exists a way to
list its elements as a sequence

$$x_0 = 0.c_{00} c_{01} c_{02} c_{03} \cdots c_{0n} \cdots$$
$$x_1 = 0.c_{10} c_{11} c_{12} c_{13} \cdots c_{1n} \cdots$$
$$x_2 = 0.c_{20} c_{21} c_{22} c_{23} \cdots c_{2n} \cdots$$

and so on. Let us then take the number $\bar{x} = 0.d_0 d_1 d_2 d_3 \cdots d_n \cdots$ whose generic decimal
digit $d_n$ is different from $c_{nn}$ (but without choosing 9 infinitely many times, so as to avoid a
periodic 9 which, as we know, does not exist on its own). The number $\bar{x}$ belongs to $(0, 1)$, but
does not belong to the list written above, since $d_n \neq c_{nn}$ (and therefore it is different
from $x_0$ since $d_0 \neq c_{00}$, from $x_1$ since $d_1 \neq c_{11}$, etc.). We conclude that the list written
above cannot be complete and hence the numbers of $(0, 1)$ cannot be put in a one-to-one
correspondence with $\mathbb{N}$. The interval $(0, 1)$ therefore is not countable, a contradiction.
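The diagonal construction can be illustrated on a finite portion of a list. In the sketch below, the rows of digits and the rule for choosing $d_n$ (take 5, unless the diagonal digit is 5, in which case take 6) are hypothetical choices; any rule that avoids the diagonal digit and does not produce a periodic 9 would do.

```python
def diagonal_digits(listed_digits):
    """Build digits d_n with d_n != c_nn, never using 9 (or 0)."""
    return [5 if row[n] != 5 else 6 for n, row in enumerate(listed_digits)]


# Hypothetical first digits of the first four numbers x_0, x_1, x_2, x_3 of a list.
rows = [
    [1, 4, 1, 5],
    [7, 5, 8, 2],
    [3, 3, 3, 3],
    [5, 0, 0, 0],
]

d = diagonal_digits(rows)
print(d)  # [5, 6, 5, 5]

# The new number differs from the n-th listed number at the n-th digit.
print(all(d[n] != rows[n][n] for n in range(len(rows))))
```

Whatever the list, the number built in this way disagrees with the $n$-th entry at position $n$, so it cannot appear anywhere in the list.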

The set $\mathbb{R}$ of real numbers is, therefore, much richer than $\mathbb{N}$ and $\mathbb{Q}$. The rational numbers
(which have, as we remarked, a “quick rhythm”) are comparatively very few with respect
to the real numbers: they form a kind of very fine dust that overlaps with the real numbers
without covering them all. At the same time, it is dust so fine that between any two real
numbers, no matter how close they are, there are particles of it.
In sum, the real line is a new prototype of infinite set.

It is possible to prove that both the union and the Cartesian product of a finite or
countable collection of sets that have the cardinality of the continuum have, in turn, the
cardinality of the continuum. This has the following remarkable consequence.
Theorem 251 $\mathbb{R}^n$ has the power of the continuum for each $n \geq 1$.
This is another remarkable finding, which is surprising already in the special case of the
plane $\mathbb{R}^2$ that, intuitively, may appear to contain many more points than the real line. It was
in front of results of this type, so surprising for our “finitary” intuition, that Cantor wrote
in a letter to Dedekind “I see it, but I do not believe it”. His key intuition on the use of
bijective functions to study the cardinality of infinite sets opened a new and fundamental
area of mathematics, which is also rich in terms of philosophical implications (mentioned at
the beginning of the chapter).

7.3 A Pandora’s box


The symbols $\aleph_0$ and $c$ are called infinite cardinal numbers. The role played by the natural
numbers in representing the cardinality of finite sets is now played by the cardinal numbers
$\aleph_0$ and $c$ for the infinite sets $\mathbb{N}$ and $\mathbb{R}$. For this reason, the natural numbers are also called
finite cardinal numbers. The cardinal numbers
$$0, 1, 2, \dots, n, \dots, \aleph_0, \text{ and } c \tag{7.2}$$
represent, therefore, the cardinality of the prototype sets
$$\emptyset, \{1\}, \{1, 2\}, \dots, \{1, 2, \dots, n\}, \dots, \mathbb{N}, \text{ and } \mathbb{R}$$
respectively. Looking at (7.2) it is natural to wonder whether $\aleph_0$ and $c$ are the only infinite
cardinal numbers. As we will see shortly, this is far from being true. Indeed, we are about to
uncover a genuine Pandora’s box (from which, however, no evil will emerge, only wonders).
To do this, we first need to generalize to any pair of sets the comparative notion of size we
considered in Definitions 244 and 248.
Definition 252 Two sets $A$ and $B$ have the same cardinality if there exists a bijective
correspondence $f : A \to B$. In this case, we write $|A| = |B|$.
In particular, when $A$ is finite we have $|A| = |\{1, \dots, n\}| = n$, when $A$ is countable we
have $|A| = |\mathbb{N}|$, and when $A$ has the cardinality of the continuum we have $|A| = |\mathbb{R}| = c$.
We denote by $2^A$ the power set of the set $A$, that is, the collection
$$2^A = \{ B : B \subseteq A \}$$
of all its subsets. The notation $2^A$ is justified by the cardinality of the power set, as we next
show.

Proposition 253 If $|A| = n$, then $|2^A| = 2^n$.

Proof Combinatorial analysis shows immediately that $2^A$ contains the empty set, $\binom{n}{1}$ sets
with one element, $\binom{n}{2}$ sets with two elements, ..., $\binom{n}{n-1}$ sets with $n - 1$ elements, and $\binom{n}{n} = 1$
set with all the $n$ elements. Therefore,
$$|2^A| = 1 + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n-1} + \binom{n}{n} = \sum_{k=0}^{n} \binom{n}{k} 1^k 1^{n-k} = (1 + 1)^n = 2^n$$

where the penultimate equality follows from Newton’s binomial formula.
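The counting argument can be verified by brute force for a small set. The sketch below lists all subsets by size using the standard library; the set `A` is a hypothetical example.

```python
from itertools import combinations
from math import comb


def power_set(elements):
    """Return all subsets of a finite collection, grouped by increasing size."""
    items = list(elements)
    return [set(c) for k in range(len(items) + 1) for c in combinations(items, k)]


A = ["a", "b", "c", "d"]
subsets = power_set(A)
n = len(A)

print(len(subsets))                            # 16
print(sum(comb(n, k) for k in range(n + 1)))   # 16
print(2 ** n)                                  # 16
```

The three printed numbers agree: the number of subsets, the sum of the binomial coefficients, and $2^n$.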

Sets can have the same size, but also different sizes. This motivates the following
definition:

Definition 254 A set $A$ has cardinality less than or equal to that of $B$, written $|A| \leq |B|$,
if there exists an injective function $f : A \to B$. A set $A$ has cardinality strictly less than that
of $B$, written $|A| < |B|$, if $|A| \leq |B|$ and $|A| \neq |B|$.

Next we list a few properties of these comparative notions of cardinality.

Proposition 255 If $A$, $B$, and $C$ are any three sets, then:

(i) $|A| \leq |A|$;

(ii) $|A| \leq |B|$ and $|B| \leq |C|$ imply that $|A| \leq |C|$;

(iii) $|A| \leq |B|$ and $|B| \leq |A|$ if and only if $|A| = |B|$;

(iv) $A \subseteq B$ implies that $|A| \leq |B|$.

Example 256 We have $|\mathbb{N}| < |\mathbb{R}|$. Indeed, by Theorem 250 $|\mathbb{N}| \neq |\mathbb{R}|$ and, by assertion (iv),
$\mathbb{N} \subseteq \mathbb{R}$ implies $|\mathbb{N}| \leq |\mathbb{R}|$. ▲

Properties (i) and (ii) say that the order $\leq$ is reflexive and transitive. As for property
(iii), it tells us that $\leq$ and $=$ are related in a natural way. Finally, (iv) confirms the intuitive
idea that smaller sets have a smaller cardinality. Remarkably, this intuition does not carry
over to $<$ (i.e., $A \subsetneq B$ does not imply $|A| < |B|$) because, as we have already noted, a
proper subset of an infinite set may have the same cardinality as the original set (as Galileo
had envisioned).

Proof We start by proving an auxiliary fact. If $f : A \to B$ and $g : B \to C$ are injective,
then $g \circ f$ is injective. Indeed, set $h = g \circ f$. Assume that $h(a) = h(a')$. Denote $b = f(a)$ and
$b' = f(a')$. By the definition of $h$, we have $g(b) = g(b')$. Since $g$ is injective, this implies
$b = b'$, that is, $f(a) = f(a')$. Since $f$ is injective, we conclude that $a = a'$, proving that $h$ is
injective.
(i) Let $f : A \to A$ be the identity, that is, $f(a) = a$ for all $a \in A$. The function $f$ is
trivially injective and the statement follows.
(ii) Since $|A| \leq |B|$, there exists an injective function $f : A \to B$. Since $|B| \leq |C|$, there
exists an injective function $g : B \to C$. Next, note that $h = g \circ f$ is well defined, $h : A \to C$,
and, by the initial part of the proof, we also know it is injective, proving that $|A| \leq |C|$.
(iii) We only prove the “if” part. The “only if” part is the content of the Schroeder-Bernstein
Theorem, which we leave to more advanced courses. By definition and since
$|A| = |B|$, there exists a bijection $f : A \to B$. Since $f$ is bijective, it follows that $f^{-1} : B \to A$
is well defined and bijective. Thus, both $f : A \to B$ and $f^{-1} : B \to A$ are injective, yielding
that $|A| \leq |B|$ and $|B| \leq |A|$.
(iv) Define $f : A \to B$ by the rule $f(a) = a$. Since $A \subseteq B$, $f$ is well defined and clearly
injective, proving the statement.

When a set A is …nite and non-empty, we clearly have jAj < 2A . Remarkably, the
inequality continues to hold for in…nite sets.

Theorem 257 (Cantor) For each set A, …nite or in…nite, we have jAj < 2A .

Proof Consider a set $A$ and the collection of all singletons $C = \{\{a\}\}_{a \in A}$. It is immediate to
see that there is a bijective mapping between $A$ and $C$, that is, $|A| = |C|$, and that $C \subseteq 2^A$. Since
$|C| \le |2^A|$, we conclude that $|A| \le |2^A|$. Next, by contradiction, assume that $|A| = |2^A|$.
Then there exists a bijection between $A$ and $2^A$ which associates to each element $a \in A$ an
element $b = b(a) \in 2^A$ and vice versa: $a \leftrightarrow b$. Observe that each $b(a)$, being an element of
$2^A$, is a subset of $A$. Consider now all the elements $a \in A$ such that the corresponding subset
$b(a)$ does not contain $a$. Call $S$ the subset of these elements, that is, $S = \{a \in A : a \notin b(a)\}$.
Since $S$ is a subset of $A$, $S \in 2^A$. Since we have a bijection between $A$ and $2^A$, there must
exist an element $c \in A$ such that $b(c) = S$. We have two cases:

(i) if $c \in S$, then by the definition of $S$, $b(c)$ does not contain $c$ and therefore $c \notin b(c) = S$;

(ii) if $c \notin S$, then by the definition of $S$, $b(c)$ contains $c$ and therefore $c \in b(c) = S$.

In both cases, we have reached a contradiction, proving $|A| < |2^A|$.
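
The diagonal construction used in the proof can be checked by brute force on a small finite set. The following Python sketch is only an illustration (it proves nothing about infinite sets), and the function names are hypothetical. For every map $b : A \to 2^A$ it builds $S = \{a \in A : a \notin b(a)\}$ and verifies that $S$ is not among the values taken by $b$.

```python
from itertools import combinations, product

def power_set(elements):
    """Return all subsets of `elements` as frozensets."""
    items = list(elements)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def diagonal_set(A, b):
    """Return S = {a in A : a not in b(a)} for a map b from A to 2^A."""
    return frozenset(a for a in A if a not in b[a])

def no_map_is_surjective(A):
    """Check that every map b: A -> 2^A misses its own diagonal set S."""
    A = list(A)
    subsets = power_set(A)
    for images in product(subsets, repeat=len(A)):
        b = dict(zip(A, images))
        if diagonal_set(A, b) in b.values():
            return False
    return True

# All 8**3 = 512 maps from a three-element set to its power set are examined.
print(no_map_is_surjective({0, 1, 2}))
```

Since no map reaches $S$, no map from $A$ to $2^A$ is surjective, and so none is bijective.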

Cantor’s Theorem offers a simple way to make a “cardinality jump” starting from a
given set $A$: it is sufficient to consider the power set $2^A$. For example, $|2^{\mathbb{R}}| > |\mathbb{R}|$, then also
$|2^{2^{\mathbb{R}}}| > |2^{\mathbb{R}}|$, and so on. We can therefore build an infinite sequence of sets of higher
and higher cardinality. In this way, we enrich (7.2), which now becomes
\[ \left\{ 1, 2, \ldots, n, \ldots, \aleph_0, |\mathbb{R}|, |2^{\mathbb{R}}|, |2^{2^{\mathbb{R}}}|, \ldots \right\} \tag{7.3} \]

Here is the Pandora’s box mentioned above, which Theorem 257 has allowed us to uncover.
The breathtaking sequence (7.3) is only the incipit of the theory of infinite sets, whose
study (even the introductory part) would take us too far away.
Before moving on with the book, however, we consider a final famous aspect of the
theory, the so-called continuum hypothesis (which the reader might have already heard of).
By Theorem 257, we know that $|2^{\mathbb{N}}| > |\mathbb{N}|$. On the other hand, by Theorem 250 we also
have $|\mathbb{R}| > |\mathbb{N}|$. The next result (we omit its proof) shows that these two inequalities are
actually not distinct.

Theorem 258 $|2^{\mathbb{N}}| = |\mathbb{R}|$.

Therefore, the power set of $\mathbb{N}$ has the cardinality of the continuum. The continuum
hypothesis states that there is no set $A$ such that
\[ |\mathbb{N}| < |A| < |\mathbb{R}| \]
That is, there does not exist any infinite set of intermediate cardinality between $\aleph_0$ and $c$.
In other words, a set that has cardinality larger than $\aleph_0$ must have at least the cardinality
of the continuum.
The validity of the continuum hypothesis is the first among the celebrated Hilbert problems,
posed by David Hilbert in 1900, and represents one of the deepest questions in mathematics.
By adopting this hypothesis, it is possible to set
\[ \aleph_1 = |\mathbb{R}| \]
and to consider the cardinality of the continuum as the second infinite cardinal number $\aleph_1$
after the first one $\aleph_0 = |\mathbb{N}|$.
The continuum hypothesis can be reformulated in a suggestive way by writing
\[ \aleph_1 = 2^{\aleph_0} \]
That is, the smallest cardinal number greater than $\aleph_0$ is equal to the cardinality of the power
set of $\mathbb{N}$ or, equivalently, of any set of cardinality $\aleph_0$ (like, for example, the rational numbers).
The generalized continuum hypothesis states that, for each $n$, we have
\[ \aleph_{n+1} = 2^{\aleph_n} \]

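All the jumps of cardinality in (7.3), not only the first one from $\aleph_0$ to $\aleph_1$, are thus obtained
by considering the power set. Therefore,
\[ \aleph_2 = 2^{\aleph_1} = |2^{\mathbb{R}}|, \qquad \aleph_3 = 2^{\aleph_2} = |2^{2^{\mathbb{R}}}| \]
and so on. At this point, (7.3) becomes
\[ \{1, 2, \ldots, n, \ldots, \aleph_0, \aleph_1, \aleph_2, \aleph_3, \ldots\} \]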

Summing up, the depth of the problems opened up by the use of bijective functions is
remarkable. As we have seen, the study started by Cantor is, at the same time, rigorous
and intrepid (as is typical of the best mathematics, and at the basis of its beauty). It relies on
the use of bijective functions to capture the fundamental principle of similarity (in terms of
numerosity) among sets.⁶

⁶ The reader who wants to learn more about set theory can consult P. Halmos, Naive set theory, Van
Nostrand, 1960 or P. Suppes, Axiomatic set theory, Van Nostrand, 1960.
Part II

Discrete analysis

Chapter 8

Sequences

8.1 The concept


A numerical sequence is an infinite “list” of real numbers, for example
\[ \{2, 4, 6, 8, \ldots\} \tag{8.1} \]
where each number occupies a place of order, i.e., it follows (except the first one) a real
number and precedes another one. The next definition formalizes this. We denote by $\mathbb{N}_+$
the set of the natural numbers without 0.

Definition 259 A function $f : \mathbb{N}_+ \to \mathbb{R}$ is called a sequence of real numbers.

In other words, a sequence is a function that associates to each natural number $n \ge 1$ a
real number $f(n)$. In (8.1), to each $n$ we associate $f(n) = 2n$, that is,
\[ n \mapsto 2n \tag{8.2} \]
and so we have the sequence of even strictly positive integers. The image $f(n)$ is usually
denoted by $x_n$. With such notation, the sequence of the even strictly positive integers is
$x_n = 2n$ for each $n \ge 1$. The images $x_n$ are called terms (or elements) of the sequence. We
will denote sequences by $\{x_n\}_{n=1}^{\infty}$, or briefly by $\{x_n\}$.¹

There are different ways to define a sequence $\{x_n\}$, that is, to describe the underlying
function $f : \mathbb{N}_+ \to \mathbb{R}$. A first way is to describe it in closed form, i.e., through a formula:
this is what we have done with the sequence of the even numbers using (8.2). Other
defining rules are, for example,
\[ n \mapsto 2n - 1 \tag{8.3} \]
\[ n \mapsto n^2 \tag{8.4} \]
\[ n \mapsto \frac{1}{\sqrt{2^{n-1}}} \tag{8.5} \]

¹ The choice of starting the sequence from $n = 1$ instead of $n = 0$ is a mere convention. In contexts where
it is more suitable to start from $n = 0$, it is perfectly legitimate to consider sequences $\{x_n\}_{n=0}^{\infty}$.


Rule (8.3) gives rise to the sequence of odd strictly positive integers
\[ \{1, 3, 5, 7, \ldots\} \tag{8.6} \]
rule (8.4) to the sequence of the squares
\[ \{1, 4, 9, 16, \ldots\} \]
and rule (8.5) defines the sequence
\[ \left\{ 1, \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{4}}, \frac{1}{\sqrt{8}}, \ldots \right\} \tag{8.7} \]
Another important way to define a sequence is by recurrence (or recursion). Consider
the classical Fibonacci sequence
\[ \{0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, \ldots\} \]
in which each term is the sum of the two terms that precede it, with fixed initial values 0
and 1. For example, in the fourth position we find the number 2, i.e., the sum $1 + 1$ of the
two terms that precede it, in the fifth position we find the number 3, i.e., the sum $1 + 2$ of
the two terms that precede it, and so on. The underlying function $f : \mathbb{N}_+ \to \mathbb{R}$ is, hence,
\[ \begin{cases} f(1) = 0, \quad f(2) = 1 \\ f(n) = f(n-1) + f(n-2) & \text{for } n \ge 3 \end{cases} \tag{8.8} \]
We therefore have two initial values, $f(1) = 0$ and $f(2) = 1$, and a recursive rule that allows
us to calculate the term in position $n$ once the two preceding terms are known. Unlike
the sequences defined through a closed formula, such as (8.3)–(8.5), to obtain the term
$x_n$ we now have to first build, using the recursive rule, all the terms that precede it. For
example, to calculate the term $x_{100}$ in the sequence (8.6) of the odd numbers, it is sufficient
to substitute $n = 100$ in formula (8.3), finding $x_{100} = 199$. On the contrary, to calculate the
term $x_{100}$ in the Fibonacci sequence we first have to build by recurrence the first 99 terms
of the sequence. Indeed, it is true that to determine $x_{100}$ it is sufficient to know the values
of $x_{99}$ and $x_{98}$ and then to use the rule $x_{100} = x_{99} + x_{98}$, but to determine $x_{99}$ and $x_{98}$ we
must first know $x_{97}$ and $x_{96}$, and so on.
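
The contrast between the two kinds of definition can be made concrete with a short Python sketch. It is only an illustration, with hypothetical function names: the odd numbers are computed directly from the closed form (8.3), while the Fibonacci term is obtained by running the recurrence (8.8) from its initial values.

```python
def odd_closed_form(n):
    """Term x_n of the odd numbers via the closed form (8.3): x_n = 2n - 1."""
    return 2 * n - 1

def fibonacci(n):
    """Term x_n of (8.8), with x_1 = 0 and x_2 = 1, built term by term."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    prev, curr = 0, 1  # the initial values x_1 and x_2
    for _ in range(n - 1):
        prev, curr = curr, prev + curr  # one step of the recursive rule
    return prev

print(odd_closed_form(100))                    # 199, by direct substitution
print([fibonacci(n) for n in range(1, 12)])    # the first eleven terms
```

The closed form needs a single evaluation, whereas the loop in `fibonacci` passes through all the preceding terms before reaching $x_n$.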
Therefore, the recursive definition of a sequence consists of one or more initial values
and of a recurrence rule that, starting from them, allows us to build the various terms of the
sequence. The initial values are arbitrary. For example, if in (8.8) we choose $f(1) = 2$ and
$f(2) = 1$ we have the following Fibonacci-type sequence (known as the Lucas sequence)
\[ \{2, 1, 3, 4, 7, 11, 18, 29, 47, \ldots\} \]

We now provide a pair of classic examples of sequences, the first one defined by recurrence
and the second one in closed form.

Example 260 Given any $a, b \in \mathbb{R}$, let $f : \mathbb{N}_+ \to \mathbb{R}$ be defined by
\[ \begin{cases} f(1) = a \\ f(n) = f(n-1) + b & \text{for } n \ge 2 \end{cases} \]
The initial value is $f(1) = a$, starting from which it is possible to build the entire sequence
through the recursive formula $f(n) = f(n-1) + b$. Such a sequence is called arithmetic (or
an arithmetic progression) with first term $a$ and common difference $b$. For example, if $a = 2$
and $b = 4$, we have
\[ \{2, 6, 10, 14, 18, 22, \ldots\} \]
▲

Example 261 The sequence with $x_n = 1/n$, that is,
\[ \left\{ 1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \ldots \right\} \]
is called harmonic,² while the sequence with $x_n = aq^{n-1}$, that is,
\[ \{a, aq, aq^2, aq^3, aq^4, \ldots\} \]
is called geometric (or a geometric progression) with first term $a$ and common ratio $q$. ▲
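
Arithmetic and geometric progressions lend themselves naturally to a lazy, term-by-term construction. The following Python sketch (with hypothetical generator names) builds both by recurrence; for the geometric sequence it uses the rule $x_n = q\,x_{n-1}$, which is equivalent to the closed form $x_n = aq^{n-1}$.

```python
from itertools import islice

def arithmetic(a, b):
    """Yield the arithmetic sequence f(1) = a, f(n) = f(n-1) + b."""
    term = a
    while True:
        yield term
        term += b

def geometric(a, q):
    """Yield the geometric sequence x_n = a * q**(n-1) via x_n = q * x_{n-1}."""
    term = a
    while True:
        yield term
        term *= q

print(list(islice(arithmetic(2, 4), 6)))   # first term 2, common difference 4
print(list(islice(geometric(1, 2), 5)))    # first term 1, common ratio 2
```

Since a sequence is infinite, only a finite prefix can ever be printed; `islice` selects it.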

Clearly, not all sequences can be described in closed or recursive form. The most famous
example is the sequence $\{p_n\}$ of prime numbers: it is infinite by Euclid’s Theorem, but it
does not have a (known) explicit description. In particular:

(i) Given $n$, we do not know any formula that tells us what $p_n$ is; in other words, the
sequence $\{p_n\}$ cannot be defined in closed form (as far as we know).

(ii) Given $p_n$, we do not know any formula that tells us what $p_{n+1}$ is; in other words, the
sequence $\{p_n\}$ cannot be defined by recurrence.

The situation is actually even sadder:

(iii) Given any prime number $p$, we do not know of any formula that gives us a prime
number $q$ greater than $p$; in other words, the knowledge of a prime number does not
give any information on the subsequent prime numbers.

Hence, we do not have a clue on how the prime numbers follow one another, that is,
on the form of the function $f : \mathbb{N}_+ \to \mathbb{R}$ that defines such a sequence. We have, therefore,
to consider all the natural numbers and check, one by one, whether or not they are prime
numbers through the primality tests (Section 1.3.2). Having eternity at our disposal, we
could then construct the sequence $\{p_n\}$ term by term. More modestly, in the short time that
has passed between Euclid and us, tables of prime numbers have been compiled; they establish
the terms of the sequence $\{p_n\}$ up to numbers that may seem very large to us, but that are
nothing relative to the infinity of all the prime numbers.
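
The procedure just described, checking the natural numbers one by one, is easy to sketch in Python. The primality test used below is plain trial division, chosen here only for simplicity (it is an assumption of this sketch, not necessarily one of the tests of Section 1.3.2), and the function names are hypothetical.

```python
from itertools import count, islice

def is_prime(m):
    """Trial-division primality test (simple but slow, for illustration only)."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def primes():
    """Yield p_1, p_2, ... by testing the natural numbers one by one."""
    for m in count(2):
        if is_prime(m):
            yield m

print(list(islice(primes(), 10)))
```

The generator never terminates on its own: each new term $p_n$ requires testing further natural numbers, which is exactly why only finite tables of primes can be compiled.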
² It is called harmonic because $1/2, 1/3, 1/4, \ldots$ are the positions in which we have to put a finger on a
vibrating string to obtain the different notes.

O.R. Concerning observation (iii), for centuries mathematicians have looked for a rule that,
given a prime number $p$, would make it possible to find a greater prime $q > p$, that is, a function
$q = f(p)$. A famous example of a possible such rule is given by the prime numbers of
Mersenne. A prime number is said to be a Mersenne number if it can be written in the form
$2^p - 1$ with $p$ prime. It is possible to prove that if $2^p - 1$ is prime, then so is $p$. For centuries,
it was believed (or hoped) that the much more interesting converse was true, namely: if $p$
is prime, so is $2^p - 1$. This conjecture was definitively disproved in 1536, when Hudalricus
Regius showed that
\[ 2^{11} - 1 = 2047 = 23 \cdot 89 \]
thus finding the first counterexample to the conjecture: $p = 11$ is prime, but $2^{11} - 1$ is not.
In any case, the Mersenne numbers are among the most important prime numbers. In
particular, as of 2016, the greatest prime number known is
\[ 2^{74207281} - 1 \]
which has 22338618 digits and is a Mersenne number (see the Great Internet Mersenne Prime
Search). ▼
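
Both numerical claims can be checked with a few lines of Python. The sketch below is only illustrative; the digit count relies on the standard fact that a positive integer $m$ has $\lfloor \log_{10} m \rfloor + 1$ decimal digits, applied to $2^q$ (which has the same number of digits as $2^q - 1$, since a power of 2 is never a power of 10).

```python
import math

# Regius' counterexample: p = 11 is prime, yet 2**11 - 1 factors.
p = 11
mersenne = 2 ** p - 1
print(mersenne, mersenne == 23 * 89)

# Number of decimal digits of 2**q - 1, computed without building the number.
q = 74207281
digits = math.floor(q * math.log10(2)) + 1
print(digits)
```

Working with logarithms avoids constructing an integer with more than twenty-two million digits.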

We close the section by observing that, given any function $f : \mathbb{R}_+ \to \mathbb{R}$, the restriction
$f|_{\mathbb{N}_+}$ of $f$ to $\mathbb{N}_+$ is a sequence.

8.2 The space of sequences


We denote by $\mathbb{R}^\infty$ the space of all the sequences $x = \{x_n\}$ of real numbers.³ We therefore
denote by $x$ a generic element of $\mathbb{R}^\infty$ which, written in “extended” form, reads
\[ x = \{x_n\} = \{x_1, x_2, \ldots, x_n, \ldots\} \]
The operations seen on the functions in Section 6.3.2 have as a special case the operations
on sequences, that is, on elements of the space $\mathbb{R}^\infty$. In particular, given two sequences
$x = \{x_n\}$ and $y = \{y_n\}$ in $\mathbb{R}^\infty$, we have:

(i) the sequence sum $(x + y)_n = x_n + y_n$ for every $n \ge 1$;

(ii) the sequence difference $(x - y)_n = x_n - y_n$ for every $n \ge 1$;

(iii) the sequence product $(xy)_n = x_n y_n$ for every $n \ge 1$;

(iv) the sequence ratio $(x/y)_n = x_n / y_n$ for every $n \ge 1$, provided $y_n \ne 0$ for every $n \ge 1$.

In view of (i), for convenience of notation, we will denote the sum directly as $\{x_n + y_n\}$
instead of $\{(x + y)_n\}$, and we will do the same for the other operations.⁴

³ Sometimes we have to deal with sets of vectors “of variable length”: for example, if the vectors are
consumption profiles, it may happen that some of them cover 3 periods, others 5, others 12, etc. If there is
no obvious fixed number of periods, as is the case, for example, with lifetimes, the only possibility is
to imagine that all the consumption profiles cover infinitely many periods (i.e., they are sequences in $\mathbb{R}^\infty$),
possibly ending with an infinite number of zeros.

⁴ Note that, if $f, g : \mathbb{N}_+ \to \mathbb{R}$ are the functions underlying the sequences $\{x_n\}$ and $\{y_n\}$, (i) is equivalently
written $(x + y)_n = (f + g)(n) = f(n) + g(n)$ for every $n \ge 1$, and similarly for the other operations (ii)–(iv).

On $\mathbb{R}^\infty$ we have an order structure with characteristics similar to those seen for $\mathbb{R}^n$. In
particular, given $x, y \in \mathbb{R}^\infty$, we write:

(i) $x \ge y$ if $x_n \ge y_n$ for every $n \ge 1$;

(ii) $x > y$ if $x \ge y$ and $x \ne y$, that is, if $x \ge y$ and there exists at least one index $n$
such that $x_n > y_n$;

(iii) $x \gg y$ if $x_n > y_n$ for every $n \ge 1$.

Moreover, (iii) $\implies$ (ii) $\implies$ (i), i.e.,
\[ x \gg y \implies x > y \implies x \ge y \qquad \forall x, y \in \mathbb{R}^\infty \]

The functions $g : A \subseteq \mathbb{R}^\infty \to \mathbb{R}$ defined on subsets of $\mathbb{R}^\infty$ are very important. Thanks to
the order structure of $\mathbb{R}^\infty$, we have a first classification of these functions by monotonicity,
analogous to the one seen for $\mathbb{R}^n$ in Section 6.4.4. A function $g : A \subseteq \mathbb{R}^\infty \to \mathbb{R}$ is said to be:

(i) increasing if
\[ x \ge y \implies g(x) \ge g(y) \qquad \forall x, y \in A \tag{8.9} \]

(ii) strongly increasing if it is increasing and
\[ x \gg y \implies g(x) > g(y) \qquad \forall x, y \in A \]

(iii) strictly increasing if
\[ x > y \implies g(x) > g(y) \qquad \forall x, y \in A \]

The decreasing counterparts of these notions are defined in an analogous way. Moreover,
in particular, $g$ is constant if there exists $k \in \mathbb{R}$ such that
\[ g(x) = k \qquad \forall x \in A \]
For brevity we do not dwell further upon these notions, and we limit ourselves to observing
that strict increasing monotonicity implies the other two properties.

8.3 Application: intertemporal choices


In Section 2.4.2 we have seen how the Euclidean space $\mathbb{R}^T$ can model a problem of intertemporal
choice of the consumer over $T$ periods. However, in many applications it is important
not to fix a priori a finite horizon $T$ for the consumer, but to imagine that he faces an infinite
horizon. In this case, in the sequence $x = \{x_1, x_2, \ldots, x_t, \ldots\}$ the term $x_t$ denotes the
quantity of the good consumed at time $t$, for $t = 1, 2, \ldots$.
This is, of course, an idealization. But it allows us to model in a simple way the intertemporal
choices of agents that are not able to specify the last period $T$ relevant for them
(for example, in some intertemporal choices the final date is that of the death of the agent,
which he does not know a priori).
In analogy to what we have seen in Section 6.2.2, the consumer has preferences on
the possible profiles $x = \{x_1, x_2, \ldots, x_t, \ldots\}$ of intertemporal consumption, the so-called
consumption flows (“consumption streams”), quantified by an intertemporal utility function
$U : \mathbb{R}^\infty_+ \to \mathbb{R}$. For example, if, as in Section 6.2.2, we assume that the consumer has, for the
consumption $x_t$ of each period, a utility function $u_t : \mathbb{R}_+ \to \mathbb{R}$, called instantaneous, then a
possible form of the intertemporal utility function is
\[ U(x) = u_1(x_1) + \beta u_2(x_2) + \cdots + \beta^{t-1} u_t(x_t) + \cdots \]
where $\beta \in (0, 1)$ can be interpreted as a subjective discount factor that, as we have seen,
depends on the degree of patience of the consumer.
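
A discounted intertemporal utility of this kind can be approximated numerically by truncating the consumption stream at a finite horizon. The Python sketch below is only an illustration under explicit assumptions that are not in the text: the same instantaneous utility $u_t = \sqrt{\cdot}$ in every period, a discount factor named `beta` equal to 0.9, and a stream cut after 200 periods.

```python
import math

def discounted_utility(stream, beta, u=math.sqrt):
    """Partial sum u(x_1) + beta*u(x_2) + ... + beta**(T-1)*u(x_T) of a finite prefix."""
    if not 0 < beta < 1:
        raise ValueError("beta must lie in (0, 1)")
    # enumerate starts at 0, so the term for period t = k + 1 carries beta**k
    return sum(beta ** k * u(x) for k, x in enumerate(stream))

constant_stream = [1.0] * 200          # hypothetical stream: one unit per period
print(discounted_utility(constant_stream, beta=0.9))
```

With a constant unit stream the partial sums are those of a geometric series, so the printed value is close to $1/(1 - 0.9) = 10$; a stream with more consumption in every period yields a strictly larger value, in line with the monotonicity properties of $U$.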

The properties of monotonicity of intertemporal utility functions $U : \mathbb{R}^\infty_+ \to \mathbb{R}$ are analogous
to those of functions of several variables. In particular, the function $U$ is increasing
if
\[ x \ge y \implies U(x) \ge U(y) \qquad \forall x, y \in \mathbb{R}^\infty_+ \]
it is strongly increasing if it is increasing and
\[ x \gg y \implies U(x) > U(y) \qquad \forall x, y \in \mathbb{R}^\infty_+ \]
and it is strictly increasing if
\[ x > y \implies U(x) > U(y) \qquad \forall x, y \in \mathbb{R}^\infty_+ \]
Regarding this, a few observations analogous to those made in Section 6.4.4 for utility functions
on $\mathbb{R}^n$ are valid. In particular, here too we have

strictly increasing $\implies$ strongly increasing $\implies$ increasing

8.4 Images and classes of sequences


Observe that in a sequence the same values can appear several times. For example, the
sequence with generic element $x_n = (-1)^n$ is
\[ \{-1, 1, -1, 1, \ldots\} \tag{8.10} \]
in which the two values $-1$ and $1$ are repeated. The constant sequence, with generic element
$x_n = 2$ for every $n \ge 1$,
\[ \{2, 2, 2, \ldots\} \tag{8.11} \]
consists only of the value 2 (the corresponding $f$ is therefore the constant function $f(n) = 2$ for
every $n \ge 1$).
In this respect, an important role is played by the image (or range)
\[ \operatorname{Im} f = \{f(n) : n \ge 1\} \]
of the sequence, which consists exactly of the values that the sequence assumes, disregarding
repetitions. For example, the image of the sequence (8.10) is $\{-1, 1\}$, while
for the constant sequence (8.11) it is the singleton $\{2\}$. The image therefore provides
important information because it indicates which values the sequence actually assumes,
without the repetitions: as we have seen, they can be very few and repeat themselves over
and over again along the sequence. On the other hand, the sequence (8.6) of the odd
numbers does not contain any repetition, and its image consists of all its terms, that is,
$\operatorname{Im} f = \{2n - 1 : n \ge 1\}$.

Through the image, in Section 6.4.3 we have studied various notions of boundedness for
functions. In the special case of sequences (i.e., of functions $f : \mathbb{N}_+ \to \mathbb{R}$) these
general notions assume the following form. A sequence $\{x_n\}$ is:

(i) bounded from above if there exists $k \in \mathbb{R}$ such that $x_n \le k$ for every $n \ge 1$;

(ii) bounded from below if there exists $k \in \mathbb{R}$ such that $x_n \ge k$ for every $n \ge 1$;

(iii) bounded if it is both bounded from above and from below, i.e., if there exists $k > 0$
such that $|x_n| \le k$ for every $n \ge 1$.

For example, the sequence $\{x_n\} = \{(-1)^n\}$ is bounded, while that of the odd numbers
(8.6) is only bounded from below. Note that, as usual, this classification is not exhaustive,
since there exist sequences that are neither bounded from above, nor bounded from below:
for example, $x_n = (-1)^n n$. Such sequences are called unbounded.
Another important class of sequences is that of the monotonic ones, which are defined in a
similar way to what we saw for functions in Section 6.4.4. In particular, a sequence $\{x_n\}$ is:

(i) increasing if
\[ x_{n+1} \ge x_n \qquad \forall n \ge 1 \]
strictly increasing if
\[ x_{n+1} > x_n \qquad \forall n \ge 1 \]

(ii) decreasing if
\[ x_{n+1} \le x_n \qquad \forall n \ge 1 \]
strictly decreasing if
\[ x_{n+1} < x_n \qquad \forall n \ge 1 \]

(iii) constant if it is both increasing and decreasing, i.e., if there exists $k \in \mathbb{R}$ such that
\[ x_n = k \qquad \forall n \ge 1 \]

An increasing or decreasing sequence is called monotonic.⁵ For example, the sequence
(8.6) of the odd numbers is increasing, while the sequence (8.7) is decreasing.

A very important concept concerns the properties eventually enjoyed by a sequence:

Definition 262 We say that a sequence satisfies a property $P$ eventually if, starting from
a certain place of order $n = n_P$, all the terms of the sequence satisfy $P$.

Obviously, the place (or index) $n$ depends on the property $P$: this is indicated by writing
$n = n_P$.

⁵ For sequences the notions of strict monotonicity are not so important.

Example 263 (i) The sequence $\{2, 4, 6, 32, 57, -1, 3, 5, 7, 9, 11, \ldots\}$ is eventually increasing:
indeed, starting from the 6th term, it is increasing.

(ii) The sequence $\{n\}$ is eventually $\ge 1{,}000$: indeed, all the terms of the sequence, starting
from the one of place 1,000, are $\ge 1{,}000$.

(iii) The same sequence is also eventually $\ge 1{,}000{,}000{,}000$, as well as $\ge 10^{123}$.

(iv) The sequence $\{1/n\}$ is eventually smaller than $1/1{,}000{,}000$.

(v) The sequence
\[ \{27, 65, 13, 32, \ldots, 125, 32, 3, 3, 3, 3, 3, 3, 3, 3, \ldots\} \]
is eventually constant. ▲
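
For a finite list of terms, the place from which a property holds can be located mechanically. The Python sketch below is only an illustration: a finite prefix can suggest, but never prove, that a property holds eventually, and the helper name `increasing_from` is hypothetical. It is applied to the terms displayed in item (i), with the sixth term read as $-1$.

```python
def increasing_from(prefix):
    """Smallest 1-based place from which the finite prefix is increasing."""
    place = len(prefix)
    # Walk backwards while each term is at least as large as its predecessor.
    while place > 1 and prefix[place - 2] <= prefix[place - 1]:
        place -= 1
    return place

prefix = [2, 4, 6, 32, 57, -1, 3, 5, 7, 9, 11]
print(increasing_from(prefix))   # the place n_P for the displayed terms
```

On these terms the function returns 6, the place indicated in item (i).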

O.R. To eventually satisfy a property, the sequence, “when young”, can do what it wants;
what matters is that, “when old enough” (that is, from a certain $n$ onward), “it settles down”.
Youthful blunders are forgiven: what is important is that, sooner or later, all the terms of
the sequence satisfy the property. ▼

8.5 Limits: introductory examples


The original purpose of the notion of limit was to formalize rigorously the concept of “how
a sequence behaves as $n$ becomes larger and larger”, that is, asymptotically. In other words,
as in a thriller, we ask ourselves “how it will end”. For the sequences whose terms
represent the values that an economic quantity assumes at subsequent dates, in economics
we talk about “long run behavior”.
We start with some examples, to understand intuitively what we mean by limit of a
sequence. Consider the sequence (8.7)
\[ \left\{ 1, \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{4}}, \frac{1}{\sqrt{8}}, \ldots \right\} \]
By continuing, we can verify that, for larger and larger values of $n$, its terms $x_n = 1/\sqrt{2^{n-1}}$
become closer and closer, we say “tend”, to the value $L = 0$. In this case we say that the
sequence tends to 0 and we write
\[ \lim_{n \to \infty} \frac{1}{\sqrt{2^{n-1}}} = 0 \]
For the sequence (8.6) of the odd numbers
\[ \{1, 3, 5, 7, \ldots\} \]
the terms $x_n = 2n - 1$ of the sequence grow larger and larger for larger and larger values of
$n$. In this case we say that the sequence diverges positively, and we write
\[ \lim_{n \to \infty} (2n - 1) = +\infty \]
Dually, the sequence of the negative odd numbers $x_n = -2n + 1$ diverges negatively: in
symbols
\[ \lim_{n \to \infty} (-2n + 1) = -\infty \]
Finally, consider the sequence $x_n = (-1)^n$:
\[ \{-1, 1, -1, 1, \ldots\} \]
As $n$ varies, it continues to oscillate between the values $-1$ and $1$, never
approaching (eventually) any particular value. In this case, we say that the sequence is
oscillating (or irregular): it does not have a limit.

8.6 Limits and asymptotic behavior


In the introductory examples we have identified three possible asymptotic behaviors of the
terms of a sequence:

(i) convergence to a value $L \in \mathbb{R}$;

(ii) divergence to $+\infty$ or to $-\infty$;

(iii) oscillation.

In the first two cases we say that the sequence is regular: it tends (it approaches asymptotically)
to a value, possibly infinite. In case (iii) we say that the sequence is irregular (or
oscillating). In the rest of the section we formalize the intuitive idea of “tending to a value”.⁶

8.6.1 Convergence
We start with convergence, that is, with case (i).

Definition 264 A sequence $\{x_n\}$ converges to a point $L \in \mathbb{R}$, in symbols $x_n \to L$ or
$\lim_{n \to \infty} x_n = L$, if for every $\varepsilon > 0$ there exists $n_\varepsilon \ge 1$ such that
\[ n \ge n_\varepsilon \implies |x_n - L| < \varepsilon \tag{8.12} \]
The number $L$ is called the limit of the sequence.

The implication (8.12) can be rewritten as
\[ n \ge n_\varepsilon \implies d(x_n, L) < \varepsilon \]
Therefore, a sequence $\{x_n\}$ converges to $L$ when, for each quantity $\varepsilon$, arbitrarily small (but
positive), there exists a place $n_\varepsilon$ (that depends on $\varepsilon$!) starting from which the distance
between the terms $x_n$ of the sequence and the limit $L$ is always smaller than $\varepsilon$. A sequence
$\{x_n\}$ that converges to a point $L \in \mathbb{R}$ is called convergent.

⁶ Often, irregular sequences are called divergent. In order to avoid any confusion with regular sequences
that are not convergent, the latter have the extra specification of being divergent to either $+\infty$ or $-\infty$.

We have said that the position (index) $n_\varepsilon$ depends on $\varepsilon$. Moreover, as should be clear
from Examples 266 and 267, the choice of $n_\varepsilon$ is not unique: if there exists a position $n_\varepsilon$ such
that $|x_n - L| < \varepsilon$ for every $n \ge n_\varepsilon$, the same holds for any subsequent position, which can
itself be chosen as $n_\varepsilon$. The choice of which among these positions to call $n_\varepsilon$ is completely
irrelevant: the definition asks that there exists (at least) one. The two examples that we will
present shortly should clarify the question.

The definition of convergence can also be rewritten in the language of neighborhoods.
Conceptually, this is a very important rewriting, which deserves a separate mention.

Definition 265 A sequence $\{x_n\}$ converges to a point $L \in \mathbb{R}$ if for every neighborhood
$B_\varepsilon(L)$ of $L$ there exists $n_\varepsilon \ge 1$ such that
\[ n \ge n_\varepsilon \implies x_n \in B_\varepsilon(L) \]
that is
\[ n \ge n_\varepsilon \implies L - \varepsilon < x_n < L + \varepsilon \]
In other words, a sequence $\{x_n\}$ tends to a number $L \in \mathbb{R}$ if the sequence eventually
belongs to each neighborhood $B_\varepsilon(L)$ of $L$, no matter how small one takes it. Although
Definition 265 is a mere rewriting of Definition 264, the use of neighborhoods is particularly
effective in clarifying the nature of the definition of convergence.

O.R. The definition requires that “falling eventually inside” happens for every neighborhood
of $L$: it is thus essential that this happens for arbitrarily small neighborhoods (it is easy to
belong to an enormous neighborhood, but difficult to belong to a very small one). ▼

Example 266 Consider the sequence $\{1/n\}$. The natural candidate for its limit is 0. Let
us verify that this is indeed the case. Let $\varepsilon > 0$. We have
\[ \left| \frac{1}{n} - 0 \right| < \varepsilon \iff \frac{1}{n} < \varepsilon \iff n > \frac{1}{\varepsilon} \]
Therefore, if we take as $n_\varepsilon$ any integer greater than $1/\varepsilon$, for example $n_\varepsilon = [1/\varepsilon] + 1$,⁷ then
we have
\[ n \ge n_\varepsilon \implies 0 < \frac{1}{n} < \varepsilon \]
and therefore 0 is actually the limit of the sequence. For example, if $\varepsilon = 10^{-100}$, we have
$n_\varepsilon = 10^{100} + 1$. Note that we could have chosen $n_\varepsilon$ to be any integer greater than $10^{100} + 1$.
▲
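
The choice $n_\varepsilon = [1/\varepsilon] + 1$ can be checked in exact arithmetic. The Python sketch below is only an illustration: it uses the standard library type `Fraction` to avoid rounding errors with a tolerance as small as $10^{-100}$, and the function name `n_eps` is hypothetical.

```python
from fractions import Fraction
import math

def n_eps(eps):
    """Threshold n_eps = [1/eps] + 1 for the sequence 1/n, computed exactly."""
    return math.floor(1 / Fraction(eps)) + 1

eps = Fraction(1, 10 ** 100)
n0 = n_eps(eps)
print(n0 == 10 ** 100 + 1)

# Spot-check the implication n >= n_eps  =>  |1/n - 0| < eps on a few indices.
print(all(Fraction(1, n) < eps for n in range(n0, n0 + 5)))

# One place earlier the inequality fails, since there 1/n equals eps.
print(Fraction(1, n0 - 1) < eps)
```

Only finitely many indices can be tested, of course: the implication for all $n \ge n_\varepsilon$ is guaranteed by the chain of equivalences above, not by the spot-check.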
⁷ Recall that $[\cdot]$ denotes the integer part, introduced in Section 1.4.3.

Example 267 Consider the sequence (8.7), that is, $\{1/\sqrt{2^{n-1}}\}$. Here, too, the natural
candidate for its limit is 0. Let us verify this. Let $\varepsilon > 0$. We have
\[ \left| \frac{1}{\sqrt{2^{n-1}}} - 0 \right| < \varepsilon \iff \frac{1}{2^{\frac{n-1}{2}}} < \varepsilon \iff 2^{\frac{n-1}{2}} > \frac{1}{\varepsilon} \iff n > 1 + 2 \log_2 \frac{1}{\varepsilon} \]
and therefore, by taking $n_\varepsilon$ to be any integer greater than $1 + 2 \log_2 \varepsilon^{-1}$, for example
$n_\varepsilon = 2 + [2 \log_2 \varepsilon^{-1}]$, we have
\[ n \ge n_\varepsilon \implies 0 < \frac{1}{\sqrt{2^{n-1}}} < \varepsilon \]
Therefore, 0 is the limit of the sequence. For example, if $\varepsilon = 10^{-100}$ we have
$n_\varepsilon = 2 + [2 \log_2 10^{100}] = 2 + [200 \log_2 10] = 666$. ▲
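
The threshold for $\varepsilon = 10^{-100}$ can be confirmed numerically. In the Python sketch below (an illustration only, with a hypothetical function name) the input is $1/\varepsilon$ rather than $\varepsilon$, and the inequality $1/\sqrt{2^{n-1}} < \varepsilon$ is tested in the equivalent integer form $2^{n-1} > (1/\varepsilon)^2$, so that no rounding is involved in the check.

```python
import math

def n_eps(eps_inverse):
    """n_eps = 2 + [2*log2(1/eps)], taking 1/eps as the input."""
    return 2 + math.floor(2 * math.log2(eps_inverse))

inv = 10 ** 100            # this is 1/eps for eps = 10**(-100)
n0 = n_eps(inv)
print(n0)

# |x_n - 0| < eps is equivalent to 2**(n-1) > (1/eps)**2, checked with exact integers.
print(2 ** (n0 - 1) > inv ** 2)   # the inequality holds at n = n_eps
print(2 ** (n0 - 2) > inv ** 2)   # and fails one place earlier
```

For this particular $\varepsilon$ the formula happens to return the smallest admissible index, although the definition of convergence does not require minimality.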

We have seen two examples of sequences that converge to 0. Such sequences are called
infinitesimal (or null). Thanks to the next result, the computation of their limits is of
particular importance.

Proposition 268 A sequence $\{x_n\}$ converges to a point $L \in \mathbb{R}$ if and only if $d(x_n, L) \to 0$.

Proof “Only if”. Let $\lim_{n \to \infty} x_n = L$. Consider the (new) sequence of term $y_n = d(x_n, L)$.
We have to prove that $\lim_{n \to \infty} y_n = 0$, i.e., that for every $\varepsilon > 0$ there exists $n_\varepsilon \ge 1$ such that
$n \ge n_\varepsilon$ implies $|y_n| < \varepsilon$. On the other hand, since $y_n \ge 0$, this is equivalent to showing that
\[ n \ge n_\varepsilon \implies y_n < \varepsilon \tag{8.13} \]
Since $x_n \to L$, given $\varepsilon > 0$, there exists $n_\varepsilon \ge 1$ such that $d(x_n, L) < \varepsilon$ for every $n \ge n_\varepsilon$, and
therefore (8.13) holds.

“If”. Suppose that $\lim_{n \to \infty} d(L, x_n) = 0$. Let $\varepsilon > 0$. There exists $n_\varepsilon \ge 1$ such that
$d(L, x_n) < \varepsilon$ for every $n \ge n_\varepsilon$. Therefore, $x_n \in B_\varepsilon(L)$ for every $n \ge n_\varepsilon$, as desired.

We can therefore reduce the study of the convergence of any sequence to the convergence
to 0 of the sequence $\{d(x_n, L)\}_{n \ge 1}$ of real numbers. In other words, to check if $x_n \to L$, it
is sufficient to check if $d(x_n, L) \to 0$.

Example 269 Consider the sequence
\[ x_n = 1 + \frac{(-1)^n}{n} \]
and let us verify that it converges to $L = 1$. We have
\[ d(x_n, 1) = \left| 1 + \frac{(-1)^n}{n} - 1 \right| = \left| \frac{(-1)^n}{n} \right| = \frac{1}{n} \to 0 \]
and therefore, by Proposition 268, $x_n \to 1$. ▲

We close with an important observation: in applying Definition 264 of convergence, we
always have to specify a possible limit $L \in \mathbb{R}$, and then to verify, according to the definition,
whether it is actually the limit. For some sequences it is not obvious how to exhibit a candidate
limit $L$, which makes the application of the definition problematic. We will return to this point.

8.6.2 Limits from above and from below


It can happen that $x_n \to L \in \mathbb{R}$ and that eventually we also have $x_n \ge L$. In this case, $\{x_n\}$
approaches $L$ by remaining to its right. In such a case we say that $\{x_n\}$ tends to $L$ from
above and we write $\lim_{n \to \infty} x_n = L^+$ or $x_n \to L^+$ or, even better, $x_n \downarrow L$. Note that the
notations $x_n \downarrow L$ and $x_n \to L^+$ are more informative than $x_n \to L$: besides saying that $\{x_n\}$
converges to $L$ they both convey the information that this happens from above.
Analogously, if $x_n \to L \in \mathbb{R}$ and eventually $x_n \le L$, we say that $\{x_n\}$ tends to $L$ from
below and we write $\lim_{n \to \infty} x_n = L^-$ or $x_n \to L^-$ or $x_n \uparrow L$.

Example 270 (i) $1/n \downarrow 0$. (ii) $1/\sqrt{2^{n-1}} \downarrow 0$, since $1/\sqrt{2^{n-1}} > 0$. (iii) $\{1 - 1/n\} \uparrow 1$, since
$1 - 1/n < 1$. (iv) $\{1 + (-1)^n/n\}_{n \ge 1} \to 1$, but neither to $1^+$ nor to $1^-$. ▲

We leave to the reader the rigorous definition of limit from above and from below in
terms of right and left neighborhoods of $L$.

8.6.3 Divergence
We now consider divergence, starting with positive divergence. The idea of the
definition is similar, mutatis mutandis, to the previous ones.

Definition 271 A sequence $\{x_n\}$ diverges positively, written $x_n \to +\infty$ or $\lim_{n \to \infty} x_n = +\infty$,
if for every $K \in \mathbb{R}$ there exists $n_K \ge 1$ such that
\[ n \ge n_K \implies x_n > K \]
In other words, a sequence diverges positively when, for each fixed $K > 0$, it is eventually
greater than $K$. Since the constant $K$ can be taken arbitrarily large, this can happen only if
the sequence is not bounded from above.

O.R. The definition requires that the inequality holds for every scalar $K$: it is decisive
that this happens for arbitrarily large values of $K$ (it is easy to be $> K$ when $K$ is small,
increasingly difficult the larger $K$ is). ▼

Example 272 Consider the sequence of the even numbers, $x_n = 2n$, and let us verify that
it diverges positively. Let $K \in \mathbb{R}$. We have
\[ 2n > K \iff n > \frac{K}{2} \]
and so we can choose as $n_K$ any integer greater than $K/2$. For example, if $K = 10^{100}$, we
can put $n_K = 10^{100}/2 + 1$. Therefore $\{x_n\} = \{2n\}$ diverges positively. ▲
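
The choice of $n_K$ in the example can be mirrored in a few lines of Python. The sketch below is only an illustration for integer values of $K$; the function name `n_K` is hypothetical, and the `max` with 1 merely ensures that the returned place is a valid index when $K$ is negative.

```python
def n_K(K):
    """A valid place for x_n = 2n: an integer greater than K/2 (and at least 1)."""
    return max(1, K // 2 + 1)

K = 10 ** 100
print(n_K(K) == 10 ** 100 // 2 + 1)

# From n_K onward the terms 2n exceed K; spot-check a few indices.
print(all(2 * n > K for n in range(n_K(K), n_K(K) + 5)))
```

Any larger integer would serve equally well as $n_K$: the definition only asks for the existence of one such place.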

The definition of negative divergence is analogous.

Definition 273 A sequence $\{x_n\}$ diverges negatively, written $x_n \to -\infty$ or $\lim_{n \to \infty} x_n = -\infty$,
if for every $K \in \mathbb{R}$ there exists $n_K \ge 1$ such that
\[ n \ge n_K \implies x_n < K \]
In such a case, for each fixed $K < 0$ the terms of the sequence are eventually smaller than $K$:
although the constant can take arbitrarily large negative values (in absolute value), there
exists a position beyond which all the terms of the sequence are smaller than the
constant. This characterizes the divergence to $-\infty$ of the sequence.

Intuitively, divergence is a form of “convergence to infinity”. The next simple, but
important, result highlights the strong link between convergence and divergence.

Proposition 274 A sequence $\{x_n\}$, with eventually $x_n > 0$, diverges positively if and only
if the sequence $\{1/x_n\}$ converges to zero.

An analogous result holds for negative divergence. Note that the hypothesis “eventually
$x_n > 0$” is redundant for a sequence that diverges positively, since this kind of sequence
always satisfies this condition.

Proof “If”. Let $1/x_n \to 0$ and let $K > 0$. Setting $\varepsilon = 1/K > 0$, by Definition 264 there exists
$n_{1/K} \ge 1$ such that $|1/x_n| < 1/K$ for every $n \ge n_{1/K}$. Since eventually $x_n > 0$, we can take
$n_{1/K}$ large enough that $0 < 1/x_n < 1/K$, and therefore $x_n > K$, for every $n \ge n_{1/K}$.
By Definition 271 we have $x_n \to +\infty$.
“Only if”. Let $x_n \to +\infty$ and let $\varepsilon > 0$. Setting $K = 1/\varepsilon > 0$, by Definition 271, there
exists $n_{1/\varepsilon}$ such that $x_n > 1/\varepsilon$ for every $n \ge n_{1/\varepsilon}$. Therefore, $0 < 1/x_n < \varepsilon$ for every $n \ge n_{1/\varepsilon}$
and so $1/x_n \to 0$.

O.R. Adding, removing, or altering (or changing in any other way) a finite number of terms
of a sequence does not change its asymptotic behavior: if it is regular, i.e., convergent or
(properly) divergent, it remains so, and with the same limit; if it is oscillating (irregular),
it remains so. This depends on the fact that the limit requires that a property
(either “hitting” an arbitrarily small neighborhood in case of convergence or being greater
than an arbitrarily large number in case of divergence) holds only eventually. ▼

8.6.4 Topology of $\overline{\mathbb{R}}$ and general definition of limit


The topology of the real line can be extended in a natural way to the extended real line,
defining the neighborhoods of the points at infinity in the following way.

Definition 275 A neighborhood of $+\infty$ is a half-line of the type $(K, +\infty]$, with $K \in \mathbb{R}$. A
neighborhood of $-\infty$ is a half-line of the type $[-\infty, K)$, with $K \in \mathbb{R}$.

Therefore, a neighborhood of $+\infty$ is formed by all the numbers greater than
$K$, and a neighborhood of $-\infty$ is formed by all the numbers smaller than $K$. Clearly, for a
neighborhood of $+\infty$, the value of $K$ becomes particularly significant when it is arbitrarily
large, while for a neighborhood of $-\infty$ the value of $K$ becomes particularly significant when
it is of negative sign and arbitrarily large in absolute value.

O.R. A neighborhood $B_\varepsilon(x)$ of a point is the smaller, the smaller $\varepsilon > 0$ is; a neighborhood
$(K, +\infty]$ of $+\infty$ is the smaller, the greater $K$ is (and similarly for the neighborhoods $[-\infty, K)$
of $-\infty$). ▼

Having observed that $(K, +\infty]$ and $[-\infty, K)$ are open in $\overline{\mathbb{R}}$ for every $K \in \mathbb{R}$, we can state
a lemma that will turn out to be useful in defining limits of sequences and functions.

Lemma 276 Let $A$ be a set in $\mathbb{R}$.

(i) $+\infty$ is an accumulation point (limit point) of $A$ if and only if $A$ is not bounded from above.

(ii) $-\infty$ is an accumulation point (limit point) of $A$ if and only if $A$ is not bounded from below.

Proof Since the proof of (ii) is analogous, it is sufficient to show (i). “If”. Let $A$ be unbounded from above, i.e., $A$ does not have an upper bound. Let $(K, +\infty]$ be a neighborhood of $+\infty$. Since $A$ does not have any upper bound, $K$ is not an upper bound of $A$. Therefore there exists $x \in A$ such that $x > K$, i.e., $x \in (K, +\infty] \cap A$ and $x \neq +\infty$. It follows that $+\infty$ is a limit point of $A$ (indeed, each neighborhood of $+\infty$ contains points of $A$ different from $+\infty$).
“Only if”. Let $+\infty$ be a limit point of $A$. We show that $A$ does not have any upper bound. Suppose, by contradiction, that $K \in \mathbb{R}$ is an upper bound of $A$. Since $+\infty$ is a limit point of $A$, the neighborhood $(K, +\infty]$ of $+\infty$ contains a point $x \in A$ such that $x \neq +\infty$. Therefore $K < x$, contradicting the fact that $K$ is an upper bound of $A$.

Example 277 The sets $A$ such that $(a, +\infty) \subseteq A$ for some $a \in \mathbb{R}$ constitute an important class of sets unbounded from above. By Lemma 276, $+\infty$ is a limit point for them. In a similar way, $-\infty$ is a limit point for the sets $A$ such that $(-\infty, a) \subseteq A$ for some $a \in \mathbb{R}$. ▲

Using the topology of $\overline{\mathbb{R}}$ we can give a general definition of limit that extends Definition 265 so as to include also Definitions 271 and 273 of divergence. In the next definition, which unifies all the possible definitions of limit of a sequence, we have:
$$U(L) = \begin{cases} B_\varepsilon(L) & \text{if } L \in \mathbb{R} \\ (K, +\infty] & \text{if } L = +\infty \\ [-\infty, K) & \text{if } L = -\infty \end{cases}$$

Definition 278 A sequence $\{x_n\}$ in $\mathbb{R}$ converges to a point $L \in \overline{\mathbb{R}}$ if for every neighborhood $U(L)$ of $L$ there exists $n_U \geq 1$ such that
$$n \geq n_U \implies x_n \in U(L)$$

If $L \in \mathbb{R}$, we recover Definition 265. If $L = \pm\infty$, thanks to Definition 275 of neighborhood, Definition 278 becomes a reformulation in terms of neighborhoods of Definitions 271 and 273.
The general definition therefore shows the unity of the notions that we have seen, confirming the strong connection between convergence and divergence already underlined by Proposition 274.
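
The unified definition can be explored numerically. The following Python sketch is only an illustration under stated assumptions: the helper names `in_neighborhood` and `first_index_from_which` are hypothetical, the parameter `size` stands for $\varepsilon$ when $L$ is finite and for $K$ when $L = \pm\infty$, and the check runs over a finite horizon, so it can suggest a value of $n_U$ but cannot prove a limit.

```python
import math

def in_neighborhood(x, L, size):
    """Membership of x in U(L), following Definition 275.

    `size` plays the role of eps when L is finite and of K when L is infinite.
    """
    if L == math.inf:
        return x > size          # U(L) = (K, +inf]
    if L == -math.inf:
        return x < size          # U(L) = [-inf, K)
    return abs(x - L) < size     # U(L) = B_eps(L)

def first_index_from_which(term, L, size, horizon=10_000):
    """Smallest n such that term(m) lies in U(L) for all n <= m <= horizon.

    Finite check only: it illustrates n_U, it does not establish convergence.
    """
    n_U = horizon + 1
    for n in range(horizon, 0, -1):
        if not in_neighborhood(term(n), L, size):
            break
        n_U = n
    return n_U

print(first_index_from_which(lambda n: 1 / n, 0.0, 0.01))      # x_n = 1/n,  L = 0
print(first_index_from_which(lambda n: n ** 2, math.inf, 50))  # x_n = n^2,  L = +inf
print(first_index_from_which(lambda n: -n, -math.inf, -3))     # x_n = -n,   L = -inf
```

The same function handles the three kinds of neighborhood, which mirrors how Definition 278 covers convergence and both types of divergence at once.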

O.R. Observe that if $L \in \mathbb{R}$, $n_U$ depends on an arbitrary radius $\varepsilon > 0$ (in particular, as small as we want), and hence we can write $n_U = n_\varepsilon$. If, instead, $L = +\infty$, $n_U$ depends on any real number $K$ (in particular, arbitrarily large) and we can write $n_U = n_K$, with $K > 0$ without loss of generality. Finally, if $L = -\infty$, $n_U$ depends on any negative real number $K$ (in particular, arbitrarily large in absolute value) and, without loss of generality, we can set $n_U = n_K$ with $K < 0$. In short, when $L$ is finite it is decisive that the property holds also for arbitrarily small values of $\varepsilon$; when $L = \pm\infty$, it is instead decisive that the property holds also for $K$ arbitrarily large in absolute value. ▼

8.7 Properties of limits


In this section we study some properties of limits. The first result shows that the limit of a sequence, if it exists, is unique.

Theorem 279 (Uniqueness of the limit) A sequence $\{x_n\}$ converges to at most one limit $L \in \overline{\mathbb{R}}$.

Proof Suppose, by contradiction, that there exist two distinct limits belonging to $\overline{\mathbb{R}}$. Different cases are possible.
We first analyze the case of two distinct finite limits $L', L'' \in \mathbb{R}$, with $L' \neq L''$. Without loss of generality, suppose that $L'' > L'$. Take $\varepsilon > 0$ such that
$$\varepsilon < \frac{L'' - L'}{2}$$
Then
$$B_\varepsilon(L') \cap B_\varepsilon(L'') = \emptyset$$
as the reader can verify.

[Figure: the disjoint bands $(L' - \varepsilon, L' + \varepsilon)$ and $(L'' - \varepsilon, L'' + \varepsilon)$ around $L'$ and $L''$ on the vertical axis.]

On the other hand, by Definition 265, there exists $n'_\varepsilon \geq 1$ such that $x_n \in B_\varepsilon(L')$ for every $n \geq n'_\varepsilon$, and there exists $n''_\varepsilon \geq 1$ such that $x_n \in B_\varepsilon(L'')$ for every $n \geq n''_\varepsilon$. Setting $n_\varepsilon = \max\{n'_\varepsilon, n''_\varepsilon\}$, we therefore have both $x_n \in B_\varepsilon(L')$ and $x_n \in B_\varepsilon(L'')$ for every $n \geq n_\varepsilon$, i.e., $x_n \in B_\varepsilon(L') \cap B_\varepsilon(L'')$ for every $n \geq n_\varepsilon$. But this contradicts $B_\varepsilon(L') \cap B_\varepsilon(L'') = \emptyset$. It follows that $L' = L''$ and therefore the limit is unique.
Let us now analyze the case in which the sequence admits one finite limit $L$ and another one equal to $+\infty$. For every $\varepsilon > 0$ and every $K > 0$, there exist $n_\varepsilon$ and $n_K$ such that
$$L - \varepsilon < x_n < L + \varepsilon \quad \forall n \geq n_\varepsilon \qquad \text{and} \qquad x_n > K \quad \forall n \geq n_K$$
For $n \geq \max\{n_\varepsilon, n_K\}$, we therefore have simultaneously
$$L - \varepsilon < x_n < L + \varepsilon \qquad \text{and} \qquad x_n > K$$
It is now sufficient to take any $K > 0$ with $K \geq L + \varepsilon$ to realize that, for $n \geq \max\{n_\varepsilon, n_K\}$, the two inequalities cannot coexist.
The remaining cases can be treated in an analogous way.

The next result shows that, when a sequence converges to a point $L \in \mathbb{R}$, in each neighborhood of $L$ we find almost all the points of the sequence.

Proposition 280 A sequence $\{x_n\}$ converges to $L \in \mathbb{R}$ if and only if each neighborhood $B_\varepsilon(L)$ of $L$ contains all the terms of the sequence, except at most a finite number of them.

In other words, the sequence eventually belongs to any neighborhood $B_\varepsilon(L)$ of $L$.
Proof Suppose $x_n \to L$. By Definition 265, for every $\varepsilon > 0$ there exists $n_\varepsilon \geq 1$ such that $x_n \in B_\varepsilon(L)$ for every $n \geq n_\varepsilon$. Therefore, except at most the values $x_n$ with $1 \leq n < n_\varepsilon$, all the elements of the sequence belong to $B_\varepsilon(L)$.
Vice versa, given any neighborhood $B_\varepsilon(L)$ of $L$, suppose that all the terms of the sequence belong to it, except at most a finite number of them. Denote by $x_{n_k}$, $k = 1, 2, \ldots, m$, with $n_1 < n_2 < \cdots < n_m$, the terms of the sequence that do not belong to $B_\varepsilon(L)$. Setting $n_\varepsilon = n_m + 1$ (or $n_\varepsilon = 1$ if there are no such terms), we have that $x_n \in B_\varepsilon(L)$ for every $n \geq n_\varepsilon$. As this is true for each neighborhood $B_\varepsilon(L)$ of $L$, by Definition 265 we have that $x_n \to L$.

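The next classical result, concerning the permanence of sign, shows that if a sequence has a non-zero limit $L$, then the terms of the sequence eventually have the same sign as $L$.

Theorem 281 (Permanence of sign) Let $\{x_n\}$ be a sequence that converges to a limit $L \neq 0$. Then, there exists $\bar{n} \geq 1$ such that $x_n$ has the same sign as $L$ for every $n \geq \bar{n}$, that is,
$$x_n L > 0 \qquad \forall n \geq \bar{n}$$
Analogously, if $x_n \to +\infty$ ($-\infty$), then there exists $\bar{n}$ such that $x_n$ is positive (negative) for every $n \geq \bar{n}$.

Proof Suppose $L > 0$ (an analogous argument holds if $L < 0$). Let $\varepsilon \in (0, L)$. By Definition 264 there exists $\bar{n} \geq 1$ such that $|x_n - L| < \varepsilon$, i.e., $L - \varepsilon < x_n < L + \varepsilon$, for every $n \geq \bar{n}$. Since $\varepsilon \in (0, L)$, we have $L - \varepsilon > 0$. Therefore,
$$0 < L - \varepsilon < x_n \qquad \forall n \geq \bar{n}$$
and we conclude that $x_n > 0$ for every $n \geq \bar{n}$, as desired. If $x_n \to +\infty$, by definition, given any $K > 0$ there exists $\bar{n}$ such that $x_n > K > 0$ for every $n \geq \bar{n}$; the case $x_n \to -\infty$ is analogous.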

The Theorem on the permanence of sign expresses a property of limits with respect to the order structure of $\mathbb{R}$. We give another simple result of the same type, leaving the proof to the reader.

Proposition 282 Let $\{x_n\}$ and $\{y_n\}$ be two sequences such that $x_n \to L \in \overline{\mathbb{R}}$ and $y_n \to H \in \overline{\mathbb{R}}$. If eventually $x_n \geq y_n$, then $L \geq H$.

The converse does not hold: for example, let $L = H = 0$, $\{x_n\} = \{-1/n\}$ and $\{y_n\} = \{1/n\}$. We have $L \geq H$, but $x_n < y_n$ for every $n$. However, if we assume $L > H$, the converse holds in the following strict form:

Proposition 283 If $\{x_n\}$ and $\{y_n\}$ are two sequences such that $x_n \to L \in \overline{\mathbb{R}}$ and $y_n \to H \in \overline{\mathbb{R}}$, with $L > H$, then eventually $x_n > y_n$.

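Proof We prove the statement for $L, H \in \mathbb{R}$, leaving the other cases to the reader. Let $0 < \varepsilon < (L - H)/2$. Since $H + \varepsilon < L - \varepsilon$, we have $(H - \varepsilon, H + \varepsilon) \cap (L - \varepsilon, L + \varepsilon) = \emptyset$. Moreover, there exist $n'_\varepsilon, n''_\varepsilon \geq 1$ such that $y_n \in (H - \varepsilon, H + \varepsilon)$ for every $n \geq n'_\varepsilon$ and $x_n \in (L - \varepsilon, L + \varepsilon)$ for every $n \geq n''_\varepsilon$. For $n \geq \max\{n'_\varepsilon, n''_\varepsilon\}$ it follows that $y_n \in (H - \varepsilon, H + \varepsilon)$ and $x_n \in (L - \varepsilon, L + \varepsilon)$, so that $x_n > L - \varepsilon > H + \varepsilon > y_n$ for every $n \geq \max\{n'_\varepsilon, n''_\varepsilon\}$.

The scope of the proposition is remarkable. It allows us, for example, to verify the positive or negative divergence of a sequence through a simple comparison with other divergent sequences. Indeed, if $x_n \geq y_n$ and $x_n$ diverges negatively, then so does $y_n$; if $x_n \geq y_n$ and $y_n$ diverges positively, then so does $x_n$.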

8.7.1 Monotonicity and convergence


The next result gives a simple necessary condition for convergence.

Proposition 284 Each convergent sequence is bounded.

Proof Suppose $x_n \to L$. Setting $\varepsilon = 1$, there exists $n_1 \geq 1$ such that $x_n \in B_1(L)$ for every $n \geq n_1$. Let $M$ be a constant such that $M > \max[1, d(x_1, L), \ldots, d(x_{n_1 - 1}, L)]$. We have $d(x_n, L) < M$ for every $n \geq 1$, i.e., $|x_n - L| < M$ for every $n \geq 1$. This implies that
$$L - M < x_n < L + M$$
and, therefore, the sequence is bounded.

Thanks to Proposition 284, the convergent sequences constitute a subset of the bounded ones. Therefore, if a sequence is not bounded, it cannot be convergent.

In general, the converse of Proposition 284 is false. For example, the sequence $x_n = (-1)^n$ is bounded, but does not converge. However, for an important class of sequences, the monotonic ones, the converse holds: for such sequences, boundedness is both a necessary and a sufficient condition for convergence. Before stating and proving such a result, we need another important theorem:

Theorem 285 Each monotonic sequence in R is regular; in particular,

(i) it converges if it is bounded;

(ii) it diverges positively if it is increasing and not bounded;

(iii) it diverges negatively if it is decreasing and not bounded.

Proof Let $\{x_n\}$ be an increasing sequence in $\mathbb{R}$ (the proof in the case of a decreasing sequence is analogous). The sequence $\{x_n\}$ can be bounded or not bounded from above (it is surely bounded from below since $x_1 \leq x_n$ for every $n \geq 1$).

Let us suppose that it is bounded. We want to prove that it is convergent. Let $E$ be the image of the sequence. By hypothesis, it is a bounded subset of $\mathbb{R}$. By the Least Upper Bound Principle, there exists $\sup E$. Set $L = \sup E$ and let us show that $x_n \to L$. Let $\varepsilon > 0$. From the fact that $L$ is the supremum of $E$, one derives two consequences (recall Proposition 119): (i) $L \geq x_n$ for every $n \geq 1$, (ii) there exists an element $x_{n_\varepsilon}$ of $E$ such that $x_{n_\varepsilon} > L - \varepsilon$. Since $\{x_n\}$ is an increasing sequence, it follows that
$$L \geq x_n \geq x_{n_\varepsilon} > L - \varepsilon \qquad \forall n \geq n_\varepsilon$$
and hence $x_n \in B_\varepsilon(L)$ for every $n \geq n_\varepsilon$, as desired.

If $\{x_n\}$ is not bounded from above, then for every $K > 0$ there exists an element $x_{n_K}$ such that $x_{n_K} > K$. Since $\{x_n\}$ is increasing, we have $x_n \geq x_{n_K} > K$ for every $n \geq n_K$ and therefore it diverges to $+\infty$.
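
The role of the supremum in the proof can be seen on a concrete case. The sketch below is an illustration only: it assumes the example sequence $x_n = 1 - 1/n$ (not taken from the text), whose supremum is $L = 1$, and the names `x` and `thresholds` are hypothetical. For each $\varepsilon$ it finds the first index $n_\varepsilon$ with $x_{n_\varepsilon} > L - \varepsilon$ and then checks, on a finite window, that monotonicity keeps the later terms in $(L - \varepsilon, L]$.

```python
def x(n):
    return 1 - 1 / n   # increasing and bounded above, with supremum L = 1

L = 1.0
thresholds = {}
for eps in (0.1, 0.01, 0.001):
    # first index whose term exceeds L - eps (it exists because L is the supremum)
    n_eps = next(n for n in range(1, 10**6) if x(n) > L - eps)
    # by monotonicity every later term stays in (L - eps, L]; checked on a finite window
    assert all(L - eps < x(n) <= L for n in range(n_eps, n_eps + 1000))
    thresholds[eps] = n_eps

print(thresholds)
```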

Thus, Theorem 285 guarantees that monotonic sequences cannot be irregular.⁸ We are now able to state and prove the result anticipated above on the equivalence of boundedness and convergence for monotonic sequences.

Corollary 286 A monotonic sequence is convergent if and only if it is bounded.

Proof Let $\{x_n\}$ be a monotonic sequence in $\mathbb{R}$. If it is convergent, then by Proposition 284 it is bounded. If it is bounded, then by Theorem 285 it is convergent.

We close with an obvious but useful observation: the results just discussed hold, more generally, for sequences that are eventually monotonic.

8.7.2 Heron’s method


While computing the square $a^2$ of a number $a$ is quite simple, the procedure required to compute the square root $\sqrt{a}$ of a positive number $a$ is significantly harder. Fortunately we can count on Heron’s method, a powerful algorithm also known as the “Babylonian method”. In order to obtain $\sqrt{a}$, given $a > 0$ and $a \neq 1$, we build a recursive sequence $\{x_n\}$ by setting⁹ $x_1 = a$ and
$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right) \qquad \forall n \geq 1$$

Theorem 287 (Heron) $x_n \to \sqrt{a}$.

Thus, Heron’s sequence converges to the square root of $a$. On top of that, the rate of convergence is quite fast, as we shall see.

Proof The sequence is (strictly) decreasing, at least from $n = 2$ onwards, as the following claims show:
⁸ The version for functions of a real variable of this important result is Lemma 881 (which will be used in the study of improper integrals).
⁹ We could also take $x_1 = b$ with $b$ between $\sqrt{a}$ and $a$: in this way the rate of convergence is increased.

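(i) Note that
$$x_n > \sqrt{a} \implies \sqrt{a} < x_{n+1} < x_n$$
Indeed, if $x_n > \sqrt{a}$, it follows that $x_n^2 > a$, i.e., $x_n > a/x_n$, and so
$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right) < \frac{1}{2}(x_n + x_n) = x_n$$
Moreover, from the inequality $(x_n^2 - a)^2 > 0$, which always holds for $x_n \neq \sqrt{a}$, we can progressively deduce that
$$x_n^4 - 2x_n^2 a + a^2 > 0; \qquad \frac{x_n^4 + a^2}{x_n^2} > 2a; \qquad x_n^2 + \frac{a^2}{x_n^2} > 2a$$
$$x_n^2 + \frac{a^2}{x_n^2} + 2a > 4a; \qquad \left(x_n + \frac{a}{x_n}\right)^2 > 4a$$
that is,
$$x_{n+1}^2 = \frac{1}{4}\left(x_n + \frac{a}{x_n}\right)^2 > a \qquad \text{and so} \qquad x_{n+1} > \sqrt{a}$$

(ii) Whenever $a > 1$, $x_1 = a > \sqrt{a}$. By (i), $\sqrt{a} < x_2 < a$ then also holds. Conversely,¹⁰ if $0 < a < 1$, then $x_2 = \frac{1}{2}(a + 1) > \sqrt{a}$. Indeed, by squaring we obtain
$$(a + 1)^2 > 4a; \qquad a^2 + 2a + 1 > 4a; \qquad a^2 - 2a + 1 > 0; \qquad (a - 1)^2 > 0$$
which is trivially true.

To sum up, $x_2$ is always greater than $\sqrt{a}$. From point (i) it thus follows that $\sqrt{a} < x_3 < x_2$, which in turn implies $\sqrt{a} < x_4 < x_3$, and so on. The elements of the sequence, starting from the second one, are thus decreasing and greater than $\sqrt{a}$.

To conclude, the sequence $\{x_n\}$ is decreasing (at least for $n \geq 2$) and bounded from below by $\sqrt{a}$. By Theorem 285 it thus admits a finite limit $L \geq \sqrt{a} > 0$. Taking limits on both sides of the recursion, $L$ must satisfy
$$L = \frac{1}{2}\left(L + \frac{a}{L}\right); \qquad 2L = L + \frac{a}{L}; \qquad L = \frac{a}{L}; \qquad L^2 = a$$
Hence, since $L > 0$, $L = \sqrt{a}$.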
p
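Example 288 (i) Let us compute $\sqrt{2}$, which we know to be approximately $1.4142135$. Heron’s sequence is made up of the following elements:
$$x_1 = 2$$
$$x_2 = \frac{1}{2}\left(2 + \frac{2}{2}\right) = \frac{3}{2} = 1.5$$
$$x_3 = \frac{1}{2}\left(\frac{3}{2} + \frac{2}{3/2}\right) = \frac{17}{12} \simeq 1.4166667$$
$$x_4 = \frac{1}{2}\left(\frac{17}{12} + \frac{2}{17/12}\right) = \frac{577}{408} \simeq 1.4142156$$
$$x_5 = \frac{1}{2}\left(\frac{577}{408} + \frac{2}{577/408}\right) = \frac{665857}{470832} \simeq 1.4142135$$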
¹⁰ If $a = 1$, then clearly $x_n = 1$ for all $n$.
(ii) Let us compute $\sqrt{428356} \simeq 654.48911$:
$$x_1 = 428356; \quad x_2 \simeq 214178.5; \quad x_3 \simeq 107090.24$$
$$x_4 \simeq 53547.115; \quad x_5 \simeq 26777.619; \quad x_6 \simeq 13396.807$$
$$x_7 \simeq 6714.3905; \quad x_8 \simeq 3389.0936; \quad x_9 \simeq 1757.743$$
$$x_{10} \simeq 1000.7198; \quad x_{11} \simeq 714.3838; \quad x_{12} \simeq 656.9999$$
$$x_{13} \simeq 654.4939; \quad x_{14} \simeq 654.4891$$

By taking $x_1 = b = 1000$ we can increase the rate of convergence:
$$x_2 \simeq 714.178; \quad x_3 \simeq 656.9834; \quad x_4 \simeq 654.4938; \quad x_5 \simeq 654.4891$$

(iii) For $\sqrt{0.13} \simeq 0.3605551$ we have
$$x_1 = 0.13; \quad x_2 = 0.565; \quad x_3 \simeq 0.3975442$$
$$x_4 \simeq 0.3622759; \quad x_5 \simeq 0.3605592; \quad x_6 \simeq 0.360555$$

Note that, since $0.13 < 1$, the sequence is decreasing starting from the second element. ▲
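
The computations of Example 288 can be reproduced with a few lines of code. This is a minimal sketch in Python, assuming ordinary floating-point arithmetic is accurate enough for illustration; the function name `heron` and its parameters `steps` and `x1` are hypothetical, introduced only for this example.

```python
import math

def heron(a, steps, x1=None):
    """Return [x_1, ..., x_steps] for the recursion x_{n+1} = (x_n + a/x_n) / 2."""
    if a <= 0:
        raise ValueError("a must be positive")
    x = float(a) if x1 is None else float(x1)  # the text starts from x_1 = a
    terms = [x]
    for _ in range(steps - 1):
        x = 0.5 * (x + a / x)
        terms.append(x)
    return terms

for n, value in enumerate(heron(2, 5), start=1):
    print(n, value)
print("math.sqrt(2) =", math.sqrt(2))

# A starting point closer to the root needs fewer steps, as in part (ii).
print(heron(428356, 5, x1=1000)[-1], math.sqrt(428356))
```

Passing `x1` corresponds to the choice $x_1 = b$ mentioned in the text; leaving it out starts the recursion from $x_1 = a$.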

The intuition behind Heron’s method is truly elegant. It is based on a sequence of rectangles, all of which have area equal to $a$, converging to a square with side equal to $\sqrt{a}$ (thus with area $a$). The longer side of the $n$-th rectangle is equal to $x_n$ and the shorter side is equal to $a/x_n$ (given that the area must equal $a$): for $n + 1$ the longer side shrinks to
$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right) < x_n$$
By iterating the algorithm, $x_n$ and $a/x_n$ become closer and closer: their common value cannot be anything other than $\sqrt{a}$.

[Figure: two rectangles of area $a$, one with sides $x_n$ and $a/x_n$, the other with sides $x_{n+1}$ and $a/x_{n+1}$.]

8.7.3 The Bolzano-Weierstrass Theorem


There exists a partial converse to Proposition 284: the famous Bolzano-Weierstrass Theorem. In order to state it, we must first introduce the notion of subsequence. Let us consider a sequence $\{x_n\}$ in $\mathbb{R}$. Given a strictly increasing sequence $\{n_k\}_{k=1}^{\infty}$ that assumes only strictly positive integer values, i.e., such that
$$n_1 < n_2 < n_3 < \cdots < n_k < \cdots$$
with each $n_k \geq 1$, the sequence $\{x_{n_k}\}_{k=1}^{\infty}$ is called a subsequence of $\{x_n\}$. In other words, the subsequence $\{x_{n_k}\}$ is a sequence built from the original sequence $\{x_n\}$ by taking only the terms of position (index) $n_k$.

Example 289 Consider the sequence
$$1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots, \frac{1}{n}, \ldots \tag{8.14}$$
with generic element $x_n = 1/n$. A subsequence of it is given by
$$1, \frac{1}{3}, \frac{1}{5}, \frac{1}{7}, \ldots, \frac{1}{2k - 1}, \ldots$$
in which $\{n_k\}_{k \geq 1}$ is the sequence of the odd numbers $\{1, 3, 5, \ldots\}$: this subsequence has been built by selecting the elements of odd place in the original one. Another subsequence of (8.14) is given by
$$\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \ldots, \frac{1}{2^k}, \ldots$$
where this time $\{n_k\}_{k \geq 1}$ is formed by the (strictly positive integer) powers of 2, that is,
$$2, 2^2, 2^3, \ldots$$
built by selecting the original elements whose place is a power of 2. ▲
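
The selection of terms by a strictly increasing index sequence $\{n_k\}$ can be written out explicitly. The following Python sketch is only illustrative: the helper `subsequence` and the variable names are hypothetical, and it lists just the first few terms of each subsequence of $x_n = 1/n$.

```python
def x(n):
    return 1 / n  # generic term of sequence (8.14)

def subsequence(term, index, count):
    """First `count` terms x_{n_k}, where n_k = index(k) must be strictly increasing."""
    positions = [index(k) for k in range(1, count + 1)]
    assert all(p < q for p, q in zip(positions, positions[1:])), "n_k must be strictly increasing"
    return [term(n) for n in positions]

odd_places = subsequence(x, lambda k: 2 * k - 1, 4)    # n_k = 1, 3, 5, 7
powers_of_two = subsequence(x, lambda k: 2 ** k, 4)    # n_k = 2, 4, 8, 16

print(odd_places)
print(powers_of_two)
```

The internal check on `positions` enforces the requirement $n_1 < n_2 < \cdots$: an index map that repeats or goes backwards does not define a subsequence.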

Example 290 Consider the oscillating sequence in $\mathbb{R}$ with generic element $x_n = (-1)^n$. A simple subsequence is given by
$$\{1, 1, 1, \ldots, 1, \ldots\} \tag{8.15}$$
in which $\{n_k\}_{k \geq 1}$ is the sequence of the even numbers: this subsequence has been built by selecting the elements of even place in the original one. If we selected those of odd place we would have built the subsequence
$$\{-1, -1, -1, \ldots, -1, \ldots\} \tag{8.16}$$
Taking $\{n_k\}_{k \geq 1} = \{1000k\}$, i.e., selecting only the elements of places 1,000, 2,000, 3,000, ... we still get $\{1, 1, 1, \ldots, 1, \ldots\}$. ▲

A subsequence is obtained simply by discarding some terms (possibly infinitely many) of the sequence, while leaving in place an infinite number of them. If a sequence is regular, each of its subsequences is regular and with the same limit (ubi maior ...).

Proposition 291 A sequence $\{x_n\}$ in $\mathbb{R}$ is regular, with limit $L \in \overline{\mathbb{R}}$, if and only if each of its subsequences is regular with the same limit $L$.

Proof We prove the result for $L \in \mathbb{R}$, leaving the cases $L = \pm\infty$ to the reader. “Only if”. Suppose that $\{x_n\}$ converges to $L$. Let $\varepsilon > 0$. There exists $n_\varepsilon \geq 1$ such that $|x_n - L| < \varepsilon$ for every $n \geq n_\varepsilon$. Let $\{x_{n_k}\}_{k=1}^{\infty}$ be a subsequence of $\{x_n\}$. Since $n_k \geq k$ for every $k \geq 1$, we a fortiori have $|x_{n_k} - L| < \varepsilon$ for every $k \geq n_\varepsilon$, so that $\{x_{n_k}\}$ converges to $L$. “If”. Suppose that each subsequence of $\{x_n\}$ converges to $L$. Assume, by contradiction, that $\{x_n\}$ does not converge to $L$. Then, there exists $\varepsilon_0 > 0$ such that, for every integer $k \geq 1$, there exists $n_k \geq k$ for which $|x_{n_k} - L| \geq \varepsilon_0$. Consider¹¹ the sequence of such $x_{n_k}$. It is a subsequence of $\{x_n\}$ that, by construction, does not converge to $L$: contradiction. Hence, $\{x_n\}$ converges to $L$.

As the last example shows, it can happen that, while the original sequence is irregular, some of its subsequences are convergent. In other words, by selecting the elements in a suitable way, we can sometimes “extract” a convergent trend out of an irregular one. In Example 290 we have an oscillating sequence, from which we have selected a constant subsequence by taking only the elements of even position (or only those of odd position). The next result, the Bolzano-Weierstrass Theorem, shows that this is always possible, provided the sequence is bounded.

Theorem 292 (Bolzano-Weierstrass) Each bounded sequence has (at least) one convergent subsequence.

In other words, from any bounded sequence $\{x_n\}$, even if very irregular, it is possible to extract a convergent subsequence $\{x_{n_k}\}$, i.e., such that there exists $L \in \mathbb{R}$ for which $\lim x_{n_k} = L$. The possibility of always being able to “extract” a convergent behavior from any bounded sequence is a property of great importance.

Example 293 The sequence $x_n = (-1)^n$ is bounded, since its image is the bounded set $\{-1, 1\}$. By the Bolzano-Weierstrass Theorem, it has at least one convergent subsequence. Indeed, such are the constant subsequences (8.15) and (8.16). ▲

The proof of the Bolzano-Weierstrass Theorem is based on the next lemma.

Lemma 294 Each sequence has a monotonic subsequence.

Proof Let $\{x_n\}$ be a sequence in $\mathbb{R}$. We consider two cases.

Case 1: for every $n \geq 1$ there exists $m > n$ such that $x_m \leq x_n$. Set $n_1 = 1$. Let $n_2 > n_1$ be such that $x_{n_2} \leq x_{n_1}$; then let $n_3 > n_2$ be such that $x_{n_3} \leq x_{n_2}$, and so on. We build in this way a decreasing subsequence $\{x_{n_k}\}$, and the lemma is proved in Case 1.
Case 2: there exists a position $n \geq 1$ such that for each $m > n$ we have $x_m > x_n$. Let $I \subseteq \mathbb{N}$ be the set of all the positions with this property. If $I$ is a finite set, then the condition of Case 1 holds for all the positions $n > \max I$. Starting from a position $n_1 > \max I$, we can therefore build, proceeding as in Case 1, a decreasing subsequence $\{x_{n_k}\}$.
¹¹ For the first term we take $k = 1$ and an integer $n_1 \geq 1$ such that $|x_{n_1} - L| \geq \varepsilon_0$; for the second term we take an integer $n_2 > n_1$ such that $|x_{n_2} - L| \geq \varepsilon_0$ (it exists: apply the property with $k = n_1 + 1$); and so on.

Let us suppose that, instead, $I$ is not finite, i.e., that there exist infinitely many positions $n \geq 1$ such that
$$m > n \implies x_m > x_n \tag{8.17}$$
Since they are infinitely many, we denote the elements of $I$ as $I = \{n_1, n_2, \ldots, n_k, \ldots\}$, with $n_1 < n_2 < \cdots < n_k < \cdots$. Thanks to (8.17) we have
$$x_{n_1} < x_{n_2} < \cdots < x_{n_k} < \cdots$$
The subsequence $\{x_{n_k}\}$ is therefore monotonic increasing; this completes the proof in Case 2.

Proof of the Bolzano-Weierstrass Theorem Let fxn g be a bounded sequence. By


Lemma 294, there exists a monotonic subsequence fxnk g. Since this subsequence is bounded
(as a subsequence of a bounded sequence), Theorem 285 shows that it is convergent, as
desired.

O.R. The Bolzano-Weierstrass Theorem states in substance that it is not possible to take infinitely many numbers (the elements of the sequence) in a bounded real interval in such a way that they (or a part of them) are “well separated” one from the other: necessarily they crowd in the proximity of (at least) one point. ▼

For unbounded sequences, it is possible to state something very similar:

Proposition 295 Each unbounded sequence has a divergent subsequence (to $+\infty$ when it is not bounded from above, to $-\infty$ when it is not bounded from below).¹²

Proof If the sequence is unbounded from above, then for every $K > 0$ there exists at least one element of the sequence greater than $K$; indeed there exist infinitely many, since discarding finitely many terms leaves the sequence unbounded from above. Set $n_0 = 0$ and, for $K = 1, 2, \ldots$, denote by $x_{n_K}$ the first term of $\{x_n\}$ with index $n_K > n_{K-1}$ such that $x_{n_K} > K$. Then $\{x_{n_K}\}$ is a subsequence of $\{x_n\}$ (its indices are strictly increasing) and, by definition, it diverges to $+\infty$. The case of sequences not bounded from below can be treated in an analogous way.
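
The construction used in the proof of Proposition 295 can be tried on a concrete unbounded sequence. The sketch below is an illustration under stated assumptions: the example sequence (equal to $n$ at even places and to $0$ at odd places) is not from the text, the name `divergent_subsequence` is hypothetical, and the code explicitly searches for each new index after the previous one so that the selected indices are strictly increasing.

```python
def x(n):
    return n if n % 2 == 0 else 0   # unbounded from above, yet not divergent

def divergent_subsequence(term, count, search_limit=10**6):
    """Indices n_1 < n_2 < ... < n_count with term(n_K) > K for K = 1, ..., count."""
    indices, last = [], 0
    for K in range(1, count + 1):
        # first position after the previous choice whose term exceeds K
        n = next(m for m in range(last + 1, search_limit) if term(m) > K)
        indices.append(n)
        last = n
    return indices

idx = divergent_subsequence(x, 5)
print(idx, [x(n) for n in idx])
```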

In conclusion:

Proposition 296 Each sequence has a regular subsequence.

O.R. We can therefore say that there is no way of taking infinitely many real numbers without at least a part of them crowding somewhere (in proximity of either a finite number or of $+\infty$ or of $-\infty$; i.e., of some point of $\overline{\mathbb{R}}$). ▼
¹² In the case where it is neither bounded from above nor bounded from below it has both a subsequence diverging to $+\infty$ and a subsequence diverging to $-\infty$.

8.8 Algebra of limits and fundamental limits


8.8.1 (Many) certainties
In the calculation of limits it is important to know how the limits behave with respect to the operations on sequences seen in Section 8.2. Besides the theoretical interest, the question is important from the computational viewpoint, because the calculation of more complex limits often reduces, applying the rules that we will see in Proposition 297, to the calculation of more elementary limits or limits that involve the basic ones that we will introduce soon.
The next result, based on the properties of the extended real line, shows that in all the cases covered by the rules of “partial arithmetization” (that is, excluding the forms that we called of indetermination) the limits commute with the basic operations. The writing $x_n \to L \in \overline{\mathbb{R}}$ indicates that the sequence $\{x_n\}$ converges to $L \in \mathbb{R}$ or diverges positively or negatively.

Proposition 297 Let $x_n \to L \in \overline{\mathbb{R}}$ and $y_n \to H \in \overline{\mathbb{R}}$. Then,

(i) $x_n + y_n \to L + H$, provided that $L + H$ is not an indeterminate form (1.24), of the type
$$+\infty - \infty \qquad \text{or} \qquad -\infty + \infty$$

(ii) $x_n y_n \to LH$, provided that $LH$ is not an indeterminate form (1.25), of the type
$$\pm\infty \cdot 0 \qquad \text{or} \qquad 0 \cdot (\pm\infty)$$

(iii) $x_n / y_n \to L/H$, provided that eventually $y_n \neq 0$ and $L/H$ is not an indeterminate form (1.26), of the type¹³
$$\frac{\pm\infty}{\pm\infty} \qquad \text{or} \qquad \frac{a}{0}$$

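Proof (i) Let $x_n \to L$ and $y_n \to H$, with $L, H \in \mathbb{R}$. This means that, for every $\varepsilon > 0$, there exist $n_1$ and $n_2$ such that
$$L - \varepsilon < x_n < L + \varepsilon \qquad \forall n \geq n_1$$
$$H - \varepsilon < y_n < H + \varepsilon \qquad \forall n \geq n_2$$
Summing the inequalities member by member, for $n \geq n_3 = \max\{n_1, n_2\}$ we have
$$L + H - 2\varepsilon < x_n + y_n < L + H + 2\varepsilon$$
and, because $2\varepsilon$ is arbitrary, it follows that $x_n + y_n \to L + H$.

Now let $x_n \to L \in \mathbb{R}$ and $y_n \to +\infty$. This means that, for every $\varepsilon > 0$ and for every $K > 0$, there exist $n_1$ and $n_2$ such that
$$L - \varepsilon < x_n < L + \varepsilon \qquad \forall n \geq n_1$$
$$y_n > K \qquad \forall n \geq n_2$$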
¹³ Note that excluding the indeterminacy $a/0$ is equivalent to requiring that $H \neq 0$.

Summing, we have that, for $n \geq n_3 = \max\{n_1, n_2\}$,
$$x_n + y_n > K + L - \varepsilon$$
and since $K + L - \varepsilon > 0$ is arbitrary, it follows that $x_n + y_n \to +\infty$. The other cases of infinite limit are treated in an analogous way.

(ii) Let $x_n \to L$ and $y_n \to H$, with $L, H \in \mathbb{R}$. This means that, for every $\varepsilon > 0$, there exist $n_1$ and $n_2$ such that
$$L - \varepsilon < x_n < L + \varepsilon \qquad \forall n \geq n_1$$
$$H - \varepsilon < y_n < H + \varepsilon \qquad \forall n \geq n_2$$
Moreover, being convergent, $\{y_n\}$ is bounded (recall Proposition 284): there exists $b > 0$ such that $|y_n| \leq b$ for every $n$. Now, for $n \geq n_3 = \max\{n_1, n_2\}$,
$$|x_n y_n - LH| = |y_n (x_n - L) + L (y_n - H)| \leq |y_n| |x_n - L| + |L| |y_n - H| < \varepsilon (b + |L|)$$
and, thanks to the arbitrariness of $\varepsilon (b + |L|)$, we conclude that $x_n y_n \to LH$.

If $L > 0$ and $H = +\infty$, then in addition to having, for every $\varepsilon \in (0, L)$,
$$L - \varepsilon < x_n < L + \varepsilon \qquad \forall n \geq n_1$$
one also has, for every $K > 0$, $y_n > K$ for $n \geq n_2$. It follows that, for $n \geq n_3 = \max\{n_1, n_2\}$,
$$x_n y_n > (L - \varepsilon) K$$
and, by the arbitrariness of $(L - \varepsilon) K > 0$, we conclude that $x_n y_n \to +\infty$. If $L < 0$ and $H = +\infty$, taking $\varepsilon \in (0, -L)$ we have $x_n y_n < (L + \varepsilon) K$ and therefore $x_n y_n \to -\infty$. The other cases of infinite limits are treated in an analogous way.

We leave assertion (iii) to the reader.

Example 298 Let $x_n = n/(n + 1)$ and $y_n = 1 + (-1)^n / n$. Since $x_n \to 1$ and $y_n \to 1$, we have that $x_n + y_n \to 1 + 1 = 2$ and that $x_n y_n \to 1$. ▲

Example 299 Let $x_n = 2^n$ and $y_n = 1 + (-1)^n / n$. Since $x_n \to +\infty$ and $y_n \to 1$, we have that $x_n + y_n \to +\infty$ and that $x_n y_n \to +\infty$. ▲
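
The limits in Example 298 can be checked numerically. The sketch below is only a sanity check, not a proof; the function names `x` and `y` are hypothetical and floating-point arithmetic is assumed to be adequate for the sample sizes used.

```python
def x(n):
    return n / (n + 1)

def y(n):
    return 1 + (-1) ** n / n

for n in (10, 1000, 100000):
    # the sum approaches 2 and the product approaches 1 as n grows
    print(n, x(n) + y(n), x(n) * y(n))
```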

The following result shows that the case $a/0$ of assertion (iii) with $a \neq 0$ is not an indeterminacy for the algebra of limits, although it is so for the extended real line (as indicated in Section 1.7).

Proposition 300 Let $x_n \to L \in \overline{\mathbb{R}}$, with $L \neq 0$, and $y_n \to 0$. The limit of the sequence $\{x_n / y_n\}$ exists if and only if the sequence $\{y_n\}$ eventually has constant sign. In such a case:¹⁴
¹⁴ By recalling the notation of Section 8.6.2, $y_n \to 0^+$ means that $y_n \to 0$ with the terms $y_n$ eventually positive (in other words, the sequence tends to zero from above), while $y_n \to 0^-$ means that $y_n \to 0$ with the terms $y_n$ eventually negative (in other words, the sequence tends to zero from below).

(i) if $L > 0$ and $y_n \to 0^+$ or if $L < 0$ and $y_n \to 0^-$, then
$$\frac{x_n}{y_n} \to +\infty$$

(ii) if $L > 0$ and $y_n \to 0^-$ or if $L < 0$ and $y_n \to 0^+$, then
$$\frac{x_n}{y_n} \to -\infty$$

Proof Let us prove the “only if” part. We leave to the reader the rest of the proof. Let $L > 0$ (the case $L < 0$ is analogous). Suppose that the sequence $\{y_n\}$ does not have eventually constant sign. Hence, there exist two subsequences $\{y_{n_k}\}$ and $\{y_{n'_k}\}$ such that $y_{n_k} \to 0^+$ and $y_{n'_k} \to 0^-$. Therefore, $x_{n_k} / y_{n_k} \to +\infty$, while $x_{n'_k} / y_{n'_k} \to -\infty$. Since two subsequences of $\{x_n / y_n\}$ have distinct limits, Proposition 291 shows that the sequence $\{x_n / y_n\}$ has no limit.

Example 301 (i) Take $x_n = 1/n - 2$ and $y_n = 1/n$. We have $x_n \to -2$ and $y_n \to 0$. Since $\{y_n\}$ has always (and therefore also eventually) positive sign, the proposition yields $x_n / y_n \to -\infty$.
(ii) Take $x_n = 1/n + 3$ and $y_n = (-1)^n / n$. In this case $x_n \to 3$, but $y_n \to 0$ with alternating signs, that is, $y_n$ does not have eventually constant sign. Thanks to the proposition, the sequence $\{x_n / y_n\}$ has no limit. ▲
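
The two behaviors of Example 301 show up clearly when the ratios are evaluated at large indices. This Python sketch is an illustration only; the names `ratio_i` and `ratio_ii` are hypothetical, and a finite sample can suggest, but not prove, the behavior of the limit.

```python
def ratio_i(n):
    # x_n = 1/n - 2, y_n = 1/n: y_n keeps a positive sign
    return (1 / n - 2) / (1 / n)

def ratio_ii(n):
    # x_n = 1/n + 3, y_n = (-1)^n / n: y_n alternates in sign
    return (1 / n + 3) / ((-1) ** n / n)

for n in (10, 11, 1000, 1001):
    print(n, ratio_i(n), ratio_ii(n))
```

In case (i) the values become large and negative, while in case (ii) they become large in absolute value with alternating sign, consistently with the absence of a limit.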
Proposition 300 does not, unfortunately, say anything for the case $a = 0$, that is, for the indeterminacy $0/0$. In the light of the last two propositions, we have the following indeterminate forms for the limits:
$$+\infty - \infty \qquad \text{or} \qquad -\infty + \infty \tag{8.18}$$
which is often denoted by writing briefly $\infty - \infty$;
$$\pm\infty \cdot 0 \qquad \text{or} \qquad 0 \cdot (\pm\infty) \tag{8.19}$$
which is often denoted by writing briefly $0 \cdot \infty$; and
$$\frac{\pm\infty}{\pm\infty} \qquad \text{or} \qquad \frac{0}{0} \tag{8.20}$$
which are often denoted by writing briefly $\infty/\infty$ and $0/0$. Section 8.8.3 will be devoted to them.
Finally, observe that, aside from the basic operations, the operation of limit commutes also with the power (and the root, which is a special case), the exponential, and the logarithm. Indeed, (12.8) of Chapter 12 will show that this painless commuting happens for all continuous functions, and since the proof of the next result is based precisely on the continuity of the power, exponential, and logarithm functions, we omit it.

Proposition 302 Excluding the indeterminate forms (1.27), that is,
$$1^{\pm\infty}; \qquad 0^0; \qquad (+\infty)^0$$
we have:¹⁵
¹⁵ From now on, since there is no danger of confusion, we will simply write $\lim x_n$ instead of $\lim_{n \to \infty} x_n$. Indeed, the limit of a sequence is defined only for $n \to \infty$ and therefore this extra specification can be considered superfluous.

(i) $\lim x_n^{\alpha} = (\lim x_n)^{\alpha}$ provided that $\alpha \in \mathbb{R}$ and $x_n > 0$;

(ii) $\lim \alpha^{x_n} = \alpha^{\lim x_n}$ provided that $\alpha > 0$;

(iii) $\lim \log_a x_n = \log_a \lim x_n$.

We have therefore also the following indeterminate forms for the limits: $1^{\pm\infty}$, which we will often denote by writing $1^{\infty}$; $0^0$; and $(+\infty)^0$, which we will often denote by writing $\infty^0$.

8.8.2 Some basic limits


We introduce two basic sequences (actually it would be sufficient to consider the first one, since the second one is simply its reciprocal): from their behavior we deduce, thanks to the previous Propositions 297 and 302, many other limits. For the sequence of generic term $x_n = n$ one can verify very easily that
$$\lim n = +\infty$$
since $n > K$ for every $n \geq [K] + 1$.

For the “reciprocal” sequence $x_n = 1/n$, we have
$$\lim \frac{1}{n} = 0$$
since $0 < 1/n < \varepsilon$ for every $n \geq [1/\varepsilon] + 1$.
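
The two thresholds $[K] + 1$ and $[1/\varepsilon] + 1$ are easy to compute and test. The following Python sketch is an illustration; the names `n_for_K` and `n_for_eps` are hypothetical, and `math.floor` is used for the integer part $[\cdot]$ of a positive number.

```python
import math

def n_for_K(K):
    """Index [K] + 1 from which n > K holds (K > 0)."""
    return math.floor(K) + 1

def n_for_eps(eps):
    """Index [1/eps] + 1 from which 0 < 1/n < eps holds (eps > 0)."""
    return math.floor(1 / eps) + 1

K, eps = 7.3, 0.3
print(n_for_K(K), n_for_eps(eps))

# finite check of the two defining inequalities from the threshold onwards
assert all(n > K for n in range(n_for_K(K), n_for_K(K) + 1000))
assert all(0 < 1 / n < eps for n in range(n_for_eps(eps), n_for_eps(eps) + 1000))
```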

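As we have anticipated, from these two elementary limits we deduce, using Propositions 297 and 302, many other ones:

(i) $\lim n^{\alpha} = +\infty$ for every $\alpha > 0$;

(ii) $\lim (1/n)^{\alpha} = \lim n^{-\alpha} = 0^+$ for every $\alpha > 0$; therefore
$$\lim n^{\alpha} = \begin{cases} +\infty & \text{if } \alpha > 0 \\ 1 & \text{if } \alpha = 0 \\ 0^+ & \text{if } \alpha < 0 \end{cases}$$

(iii) we have:
$$\lim \alpha^n = \begin{cases} +\infty & \text{if } \alpha > 1 \\ 1 & \text{if } \alpha = 1 \\ 0^+ & \text{if } 0 < \alpha < 1 \end{cases}$$
$$\lim \log_{\alpha} n = \begin{cases} +\infty & \text{if } \alpha > 1 \\ -\infty & \text{if } 0 < \alpha < 1 \end{cases}$$
and many other limits. For example, we have
$$\lim \left(5n^7 + n^2 + 1\right) = +\infty + \infty + 1 = +\infty$$
as well as
$$\lim \left(n^2 - 3n + 1\right) = \lim n^2 \left(1 - \frac{3}{n} + \frac{1}{n^2}\right) = +\infty \cdot (1 - 0 + 0) = +\infty$$
$$\lim \frac{n^2 - 5n - 7}{2n^2 + 4n + 6} = \lim \frac{n^2 \left(1 - \frac{5}{n} - \frac{7}{n^2}\right)}{n^2 \left(2 + \frac{4}{n} + \frac{6}{n^2}\right)} = \frac{1 - 0 - 0}{2 + 0 + 0} = \frac{1}{2}$$
$$\lim \frac{5 - \frac{1}{n}}{2n^2} = [0 \cdot (5 - 0)] = 0$$
and
$$\lim \frac{n (n + 1)(n + 2)}{(2n - 1)(3n - 2)(5n - 4)} = \lim \frac{n \cdot n \left(1 + \frac{1}{n}\right) \cdot n \left(1 + \frac{2}{n}\right)}{2n \left(1 - \frac{1}{2n}\right) \cdot 3n \left(1 - \frac{2}{3n}\right) \cdot 5n \left(1 - \frac{4}{5n}\right)}$$
$$= \lim \frac{\left(1 + \frac{1}{n}\right)\left(1 + \frac{2}{n}\right)}{30 \left(1 - \frac{1}{2n}\right)\left(1 - \frac{2}{3n}\right)\left(1 - \frac{4}{5n}\right)} = \frac{1 \cdot 1}{30 \cdot 1 \cdot 1 \cdot 1} = \frac{1}{30}$$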
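
The two rational limits computed above can be double-checked with exact arithmetic. This is a sketch only: the function names `r` and `s` are hypothetical, and Python's `fractions.Fraction` is used so that no rounding error enters the comparison with $1/2$ and $1/30$.

```python
from fractions import Fraction

def r(n):
    n = Fraction(n)
    return (n**2 - 5*n - 7) / (2*n**2 + 4*n + 6)

def s(n):
    n = Fraction(n)
    return n * (n + 1) * (n + 2) / ((2*n - 1) * (3*n - 2) * (5*n - 4))

for n in (10, 1000, 10**6):
    # distances from the claimed limits 1/2 and 1/30 shrink as n grows
    print(n, float(abs(r(n) - Fraction(1, 2))), float(abs(s(n) - Fraction(1, 30))))
```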

8.8.3 Indeterminate forms for the limits


In the previous section we have carefully avoided the indeterminate forms (8.18)–(8.20), since in such cases we cannot state anything in general. For example, the limit of the sum of two sequences whose limits are infinite of opposite sign can be finite, infinite, or not exist, as the examples below will show. Such a limit turns out to be “indeterminate” if we only know that the two summands diverge respectively to $+\infty$ and to $-\infty$.
We try to be very explicit. In many cases (all those covered by the “partial arithmetization”) the limit of a sequence obtained through an operation on two other sequences is determined only by the limits of those two sequences (for example, whatever $\{x_n\}$ and $\{y_n\}$ are, if they tend respectively to $5$ and to $-3$, the limit of $\{x_n + y_n\}$ is $5 + (-3) = 2$ and the limit of $\{x_n y_n\}$ is $5 \cdot (-3) = -15$). In some cases instead (those that we have called “of indetermination”) the information on the two limits is not sufficient; there is therefore no shortcut in calculating the limit: one has to roll up one’s sleeves and calculate it each time.

Indeterminate form 1 1
Let us consider the indeterminate form 1 1. For example, the limit of the sum xn + yn of
the sequences xn = n and yn = n2 falls under this form of indetermination, so one cannot
resort to previous results. We have, however,

xn + yn = n n2 = n (1 n)

where n ! +1 and 1 n ! 1 so that, being in the case +1 ( 1), it follows that


xn + yn ! 1. Note how, due to a very simple algebraic manipulation, we have been able
to …nd our way out from this indeterminacy.
8.8. ALGEBRA OF LIMITS AND FUNDAMENTAL LIMITS 199

Now take xn = n2 and yn = n. Also in this case, the limit of the sum xn + yn falls
under the indeterminacy 1 1. By proceeding like we have just done, this time we obtain
lim (xn + yn ) = lim n (n 1) = lim n lim (n 1) = +1
1
Next, take xn = n and yn = n, again of type 1 1. Here again a simple manipulation
n
allows us to …nd a way out:
1 1
lim (xn + yn ) = lim n + n = lim =0
n n
Finally, take xn = n2 +( 1)n n and yn = n2 , which is again of type 1 1 since xn ! +1,
because xn n2 n = n (n 1). Now
lim (xn + yn ) = lim ( 1)n n
does not exist.
Therefore, when we have a case of type $\infty - \infty$, it can happen that the limit under
consideration is $+\infty$, or $-\infty$, or finite, or nonexistent. In other words, anything can
happen. The simple observation that the case at hand is of type $\infty - \infty$ does not allow us to
say anything about the limit of the sum.[16] In the case $\infty - \infty$ we really have to look carefully
at the two sequences and, each time, find a way, which is very often simple, to
avoid the indeterminacy (as we have seen in the small examples discussed above). The same
can be said for the other indeterminate forms.
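
The four examples of type $\infty - \infty$ discussed above can also be explored numerically. The following is only an illustrative sketch in Python (the function name `sums_at` and the dictionary keys are our own choices, not part of the text): it evaluates $x_n + y_n$ for each pair of sequences at a given index $n$. A finite table of values proves nothing about a limit, but it shows the four different behaviors at a glance.

```python
def sums_at(n):
    """Return x_n + y_n for the four pairs of sequences of type inf - inf."""
    return {
        "n + (-n^2)": n + (-n**2),                      # tends to -infinity
        "n^2 + (-n)": n**2 + (-n),                      # tends to +infinity
        "(n + 1/n) + (-n)": (n + 1 / n) + (-n),         # tends to 0
        "(n^2 + (-1)^n n) + (-n^2)": (n**2 + (-1) ** n * n) + (-n**2),  # oscillates
    }


if __name__ == "__main__":
    # Consecutive even and odd indices make the oscillation of the last case visible.
    for n in (10, 11, 1000, 1001):
        print(n, sums_at(n))
```

The last entry changes sign between even and odd indices while growing in absolute value, which is the numerical trace of a limit that does not exist.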

Indeterminate form $0 \cdot \infty$
Let, for example, $x_n = 1/n$ and $y_n = n^3$. The limit of their product has the form $0 \cdot \infty$, and
therefore we cannot let ourselves be guided by the previous results. We have, however,
\[ \lim x_n y_n = \lim \frac{1}{n} \, n^3 = \lim n^2 = +\infty \]
If $x_n = 1/n^3$ and $y_n = n$, then
\[ \lim x_n y_n = \lim \frac{1}{n^3} \, n = \lim \frac{1}{n^2} = 0 \]
If $x_n = n^3$ and $y_n = 7/n^3$, then
\[ \lim x_n y_n = \lim n^3 \, \frac{7}{n^3} = \lim 7 = 7 \]
If $x_n = 1/n$ and $y_n = n(\cos n + 2)$,[17] then
\[ \lim x_n y_n = \lim (\cos n + 2) \]
does not exist.
Again, only the direct calculation of the limit can determine its value: the indeterminate
form can give completely different results.
[16] If instead it were a form of type $\infty + a$, even without knowing how the two sequences are defined, we
would have been able to say that the limit of their sum is $\infty$.
[17] Using the comparison criterion, which we will study soon (Theorem 303), it is easy to prove that
$y_n \to +\infty$.

Indeterminate forms $\infty/\infty$ and $0/0$

Let, for example, $x_n = n$ and $y_n = n^2$. The limit of their quotient has the form $\infty/\infty$, but

\[ \lim \frac{x_n}{y_n} = \lim \frac{n}{n^2} = \lim \frac{1}{n} = 0 \]

On the other hand, exchanging $x_n$ with $y_n$, the indeterminacy $\infty/\infty$ remains, but

\[ \lim \frac{y_n}{x_n} = \lim \frac{n^2}{n} = \lim n = +\infty \]

with a limit completely different from the previous one.[18]

Let us see another example concerning $\infty/\infty$. If $x_n = n^2$ and $y_n = 1 + 2n^2$, we have

\[ \lim \frac{x_n}{y_n} = \lim \frac{n^2}{1 + 2n^2} = \lim \frac{1}{\frac{1}{n^2} + 2} = \frac{1}{2} \]

Still, if $x_n = n^2 (\sin n + 7)$ and $y_n = n^2$, then

\[ \lim \frac{x_n}{y_n} = \lim (\sin n + 7) \]

which does not exist.

Naturally, this holds also for the indeterminacy $0/0$. For example, let $x_n = 1/n$ and
$y_n = 1/n^2$. We have

\[ \lim \frac{x_n}{y_n} = \lim \frac{1/n}{1/n^2} = \lim n = +\infty \]

whereas, exchanging the places of $x_n$ and $y_n$, we have

\[ \lim \frac{y_n}{x_n} = \lim \frac{1/n^2}{1/n} = \lim \frac{1}{n} = 0 \]

Observe the simple relation between the indeterminacies $\infty/\infty$ and $0/0$: if the limit of
the quotient of the sequences $\{x_n\}$ and $\{y_n\}$ falls under the indeterminate form $\infty/\infty$, the
limit of the quotient of the sequences $\{1/x_n\}$ and $\{1/y_n\}$ falls under the indeterminate form
$0/0$, and vice versa.

8.8.4 Summarizing tables


We can summarize the algebra of limits in three tables. In the tables the first row indicates
the limit of the sequence $\{x_n\}$, while the first column indicates the limit of the sequence
$\{y_n\}$.
We start with the limit of the sum: the inner cells give the result for the limit of $\{x_n + y_n\}$;
we have written ?? in case of indeterminacy.

sum          $+\infty$    $L$          $-\infty$
$+\infty$    $+\infty$    $+\infty$    ??
$H$          $+\infty$    $L+H$        $-\infty$
$-\infty$    ??           $-\infty$    $-\infty$

[18] Since $x_n/y_n = 1/(y_n/x_n)$, for the two limits Proposition 274 holds.

We have two indeterminate cases out of nine. We pass to the product: the inner cells give
the result for the limit of $\{x_n y_n\}$:

product      $+\infty$    $L>0$        $0$     $L<0$        $-\infty$
$+\infty$    $+\infty$    $+\infty$    ??      $-\infty$    $-\infty$
$H>0$        $+\infty$    $LH$         $0$     $LH$         $-\infty$
$0$          ??           $0$          $0$     $0$          ??
$H<0$        $-\infty$    $LH$         $0$     $LH$         $+\infty$
$-\infty$    $-\infty$    $-\infty$    ??      $+\infty$    $+\infty$

where there are four indeterminate cases out of twenty-five. Finally, for the quotient we have
the following table, where the inner cells give the result for the limit of $\{x_n/y_n\}$:

quotient     $+\infty$      $L>0$          $0$     $L<0$          $-\infty$
$+\infty$    ??             $0$            $0$     $0$            ??
$H>0$        $+\infty$      $L/H$          $0$     $L/H$          $-\infty$
$0$          $\pm\infty$    $\pm\infty$    ??      $\pm\infty$    $\pm\infty$
$H<0$        $-\infty$      $L/H$          $0$     $L/H$          $+\infty$
$-\infty$    ??             $0$            $0$     $0$            ??

where, in the light of Proposition 300, in the third row we have assumed that $y_n$ tends to
$0$ from above ($y_n \to 0^+$) or from below ($y_n \to 0^-$). In turn, this determines the sign of the
infinity; for example,
\[ \lim \frac{1}{1/n} = \lim n = +\infty \qquad \text{and} \qquad \lim \frac{1}{-1/n} = \lim (-n) = -\infty \]

We have here five indeterminate cases out of twenty-five.


The tables make it clear how in the majority of the cases the previous results are effective,
leaving only relatively few indeterminate cases.
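
The sum and product tables can be encoded as small functions on the extended real line. The sketch below is only illustrative: the names `limit_of_sum`, `limit_of_product` and `count_indeterminate` are ours, `None` plays the role of the ?? entries, and the quotient table is deliberately left out because its third row also depends on the side from which $y_n$ approaches $0$.

```python
import math

INF = math.inf


def limit_of_sum(lx, ly):
    """Limit of x_n + y_n given the limits lx and ly; None marks inf - inf."""
    if math.isinf(lx) and math.isinf(ly) and lx != ly:
        return None
    return lx + ly


def limit_of_product(lx, ly):
    """Limit of x_n * y_n given the limits lx and ly; None marks 0 * inf."""
    if (lx == 0 and math.isinf(ly)) or (ly == 0 and math.isinf(lx)):
        return None
    return lx * ly


def count_indeterminate(rule, values):
    """Count the ?? cells of the table built from the given representative limits."""
    return sum(rule(a, b) is None for a in values for b in values)


if __name__ == "__main__":
    print(count_indeterminate(limit_of_sum, [INF, 2.0, -INF]))
    print(count_indeterminate(limit_of_product, [INF, 3.0, 0.0, -3.0, -INF]))
```

Here the finite values $2$, $3$ and $-3$ are arbitrary representatives of the generic entries $L$ and $H$ of the tables.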

O.R. The case $0^{\infty}$ is not an indeterminacy. It is an abbreviation for $\lim x_n^{y_n}$,
where the base is a sequence (positive, otherwise the power is not defined!) tending to
$0$ (more precisely to $0^+$) and the exponent is a divergent sequence. We can set without
difficulty $0^{+\infty} = 0$: the idea is to multiply $0$ by itself “infinitely many times” and we get a
zero as large as a palace (a “very big zero”, as a famous professor used to say). The form
$0^{-\infty}$ is reciprocal to the previous one and therefore $0^{-\infty} = +\infty$. ▼

8.8.5 But how many indeterminate forms are there?

We mentioned seven indeterminate forms:
\[ \frac{\infty}{\infty}, \quad \frac{0}{0}, \quad 0 \cdot \infty, \quad \infty - \infty, \quad 0^0, \quad \infty^0, \quad 1^\infty \]
They are actually all linked. We could consider, for example, $0 \cdot \infty$ (or any other) as the
fundamental indeterminate form and reduce all the other ones to it:

(i) If $x_n, y_n \to \infty$, their ratio $x_n/y_n$ appears in the form $\infty/\infty$, but it is sufficient to write
the ratio as
\[ x_n \cdot \frac{1}{y_n} \]
to get the form $0 \cdot \infty$.

(ii) If $x_n, y_n \to 0$, their ratio $x_n/y_n$ appears in the form $0/0$, but it is sufficient to write
the ratio as
\[ x_n \cdot \frac{1}{y_n} \]
to get the form $0 \cdot \infty$.

(iii) If $x_n \to +\infty$ and $y_n \to -\infty$, their sum $x_n + y_n$ appears in the form $\infty - \infty$. However,
we can write
\[ x_n + y_n = \left( 1 + \frac{y_n}{x_n} \right) x_n \]
If $y_n/x_n$ does not tend to $-1$, the form is no longer indeterminate and, if $y_n/x_n \to -1$,
the form is of the type $0 \cdot \infty$.

(iv) For the last three cases it is sufficient to consider the logarithm to place oneself in the
case $0 \cdot \infty$:

\[ \log 0^0 = 0 \cdot \log 0 = 0 \cdot (-\infty); \qquad \log \infty^0 = 0 \cdot \log \infty = 0 \cdot \infty; \qquad \log 1^\infty = \infty \cdot \log 1 = \infty \cdot 0 \]

The number $e$, which we will meet shortly, represents the limit of an indeterminate
form (the most valuable) of the type $1^\infty$.

The reader can try to bring back all the forms of indeterminacy to $0/0$ or to $\infty/\infty$.

8.9 Convergence criteria


The computation of limits can be rather tedious and, in many cases, presents some difficulties.
In these cases theorems that provide sufficient conditions for convergence are useful: they
are also called convergence criteria.[19]
[19] In this book “criterion” will always be understood as “sufficient condition”. An alternative customary
term is “test”.

We start with the classical comparison criterion: when two sequences converge to the
same limit, the same is true for any sequence whose terms are “sandwiched” between those
of the two original sequences.[20]

Theorem 303 (Comparison criterion) Let $\{x_n\}$, $\{y_n\}$, and $\{z_n\}$ be three sequences. If,
eventually,
\[ y_n \le x_n \le z_n \tag{8.21} \]
and
\[ \lim y_n = \lim z_n = L \in \mathbb{R} \tag{8.22} \]
then
\[ \lim x_n = L \]

Proof Let $\varepsilon > 0$. From (8.22) it follows, by Definition 265, that there exists $n_1$ such
that $y_n \in B_\varepsilon(L)$ for every $n \ge n_1$, and there exists $n_2$ such that $z_n \in B_\varepsilon(L)$ for every
$n \ge n_2$. Finally, let $n_3$ be the index starting from which one has $y_n \le x_n \le z_n$. Setting
$\bar{n} = \max\{n_1, n_2, n_3\}$, we have $y_n \in B_\varepsilon(L)$, $z_n \in B_\varepsilon(L)$, and $y_n \le x_n \le z_n$ for every $n \ge \bar{n}$,
and therefore
\[ L - \varepsilon < y_n \le x_n \le z_n < L + \varepsilon \qquad \forall n \ge \bar{n} \]
that is, $x_n \in B_\varepsilon(L)$ for every $n \ge \bar{n}$. Hence, $x_n \to L$ as claimed.

The typical use of the result is in proving the convergence of a sequence by showing that
it can be “trapped” between two suitable convergent sequences.

Example 304 Consider the sequence $x_n = n^{-2} \sin^2 n$. Since $-1 \le \sin n \le 1$ for every
$n \in \mathbb{N}$, we have $0 \le \sin^2 n \le 1$ for every $n \ge 1$ and therefore
\[ 0 \le \frac{\sin^2 n}{n^2} \le \frac{1}{n^2} \qquad \forall n \ge 1 \]
If we consider the sequences with $y_n = 0$ and $z_n = 1/n^2$, conditions (8.21) and (8.22) with
$L = 0$ are satisfied. By the comparison criterion, $\lim x_n = 0$. ▲

Example 305 The sequence with $x_n = n^{-1} \sin n$ converges to $0$. Indeed,
\[ -\frac{1}{n} \le \frac{\sin n}{n} \le \frac{1}{n} \qquad \forall n \ge 1 \]
and both sequences $\{1/n\}$ and $\{-1/n\}$ converge to $0$. ▲
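
The two sandwiches of Examples 304 and 305 can be checked index by index. The following sketch (the function name `sandwiched` is our own) only verifies the inequalities on finitely many terms; the convergence itself is what the comparison criterion provides.

```python
import math


def sandwiched(n):
    """Check -1/n <= sin(n)/n <= 1/n and 0 <= sin(n)^2/n^2 <= 1/n^2 at index n."""
    s = math.sin(n)
    first = -1 / n <= s / n <= 1 / n
    second = 0 <= s * s / n**2 <= 1 / n**2
    return first and second


if __name__ == "__main__":
    print(all(sandwiched(n) for n in range(1, 5000)))
    # The bounding sequences shrink, dragging the trapped term with them.
    print(1 / 10**6, abs(math.sin(10**6) / 10**6))
```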

The preceding example suggests that, if $\{x_n\}$ is a bounded sequence and $y_n \to +\infty$ or
$y_n \to -\infty$, then
\[ \frac{x_n}{y_n} \to 0 \]
Indeed, $-\left| M/y_n \right| \le x_n/y_n \le \left| M/y_n \right|$, where $M$ is the supremum of $|x_n|$.

The next two simple and useful theorems introduce some analytical tools that will also be
used for the convergence of series, as we will see in the next chapter.
[20] In Italy, the theorem is sometimes called “the two carabinieri (policemen) theorem”. Indeed, if convict
$\{x_n\}$ is escorted by the two policemen $\{y_n\}$ and $\{z_n\}$ (one on each “side”), then he is forced to go wherever
they go.

Theorem 306 (Ratio criterion) If there exists a number $q < 1$ such that, eventually,

\[ \left| \frac{x_{n+1}}{x_n} \right| \le q \tag{8.23} \]

then the sequence $\{x_n\}$ tends to $0$.

Proof Suppose that the inequality holds starting from $n = 1$: if it held from a certain $n$
onwards, just recall that eliminating a finite number of terms does not alter the limit. From

\[ \left| \frac{x_{n+1}}{x_n} \right| \le q \]

we deduce that $|x_{n+1}| \le q |x_n|$ and, therefore, in particular,

\[ |x_2| \le q |x_1|, \quad |x_3| \le q |x_2| \le q^2 |x_1|, \quad \ldots, \quad |x_n| \le q^{n-1} |x_1|, \quad \ldots \]

which can be rewritten as

\[ -q^{n-1} |x_1| \le x_n \le q^{n-1} |x_1| \qquad \forall n \ge 2 \]

Since $0 < q < 1$, we have $q^{n-1} \to 0$ and the result then follows from the comparison criterion.

Note that the theorem does not simply require that the ratio $|x_{n+1}/x_n|$ be $< 1$, that is,

\[ \left| \frac{x_{n+1}}{x_n} \right| < 1 \]

but that it is “far from it”, i.e., that it is smaller than a number $q$ which, in turn, is itself
lower than $1$. The next example clarifies this observation.

Example 307 The sequence $x_n = (-1)^n \left( 1 + \frac{1}{n} \right)$ does not converge (indeed, the subsequence
extracted from it with the even indices tends to $+1$, whereas that with the odd indices tends
to $-1$), although
\[ \left| \frac{x_{n+1}}{x_n} \right| = \frac{1 + \frac{1}{n+1}}{1 + \frac{1}{n}} = \frac{n^2 + 2n}{n^2 + 2n + 1} < 1 \]
for every $n \ge 1$. ▲
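
Example 307 lends itself to an exact check with rational arithmetic. In the sketch below (the function name `x` mirrors the notation of the example, the range of indices is an arbitrary choice) every ratio of consecutive terms is strictly below $1$, yet the terms with even and odd indices settle near $+1$ and $-1$ respectively.

```python
from fractions import Fraction


def x(n):
    """Term x_n = (-1)^n (1 + 1/n) of Example 307, as an exact rational."""
    return (-1) ** n * (1 + Fraction(1, n))


# |x_{n+1} / x_n| for n = 1, ..., 199
ratios = [abs(x(n + 1) / x(n)) for n in range(1, 200)]

if __name__ == "__main__":
    print(all(r < 1 for r in ratios))     # every ratio is below 1 ...
    print(float(ratios[-1]))              # ... but the ratios creep up towards 1
    print(float(x(1000)), float(x(1001))) # the terms do not settle on one value
```

No single $q < 1$ bounds all the ratios, which is why the ratio criterion does not apply here.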

Note that property (8.23) clearly holds if the ratio $|x_{n+1}/x_n|$ has a limit and the
latter is strictly smaller than $1$, that is,

\[ \lim \left| \frac{x_{n+1}}{x_n} \right| < 1 \tag{8.24} \]

Indeed, let us denote by $L$ this limit and let $\varepsilon > 0$ be such that $L + \varepsilon < 1$. By the definition
of limit, eventually we have
\[ \left| \, \left| \frac{x_{n+1}}{x_n} \right| - L \, \right| < \varepsilon \]
that is, $L - \varepsilon < |x_{n+1}/x_n| < L + \varepsilon$. Therefore, setting $q = L + \varepsilon$, it follows that eventually
$|x_{n+1}/x_n| < q$, which is property (8.23).
Indeed, the limit form (8.24) is the most common way in which the ratio criterion is
applied, as we will see soon in some examples.

Recalling that $x_n \to L$ if and only if $x_n - L \to 0$, we can also state that, with $q < 1$ and
the inequality holding eventually,

\[ \left| \frac{x_{n+1} - L}{x_n - L} \right| < q \implies x_n \to L \]
Thanks to this simple observation, the ratio criterion (and also the root criterion that we will
see soon) applies to the study of the convergence $x_n \to L$, although stated only for $x_n \to 0$.

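The ratio criterion allows us to prove some basic limits:

(i) For any $\alpha > 1$ and $k \in \mathbb{R}$, one has

\[ \lim \frac{n^k}{\alpha^n} = 0 \tag{8.25} \]

Indeed, by setting
\[ x_n = \frac{n^k}{\alpha^n} \]
and by taking the ratio of two consecutive terms (the absolute value is irrelevant given
that here all the terms are positive), we have
\[ \frac{x_{n+1}}{x_n} = \frac{(n+1)^k}{\alpha^{n+1}} \cdot \frac{\alpha^n}{n^k} = \frac{1}{\alpha} \left( \frac{n+1}{n} \right)^k = \frac{1}{\alpha} \left( 1 + \frac{1}{n} \right)^k \to \frac{1}{\alpha} < 1 \]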

(ii) If $k \in \mathbb{R}$ and $y_n \to +\infty$, then

\[ \lim \frac{\log^k y_n}{y_n} = 0 \]
Indeed, setting $y_n = e^{z_n}$ we fall in the previous case. In particular,

\[ \lim \frac{\log^k n}{n} = \lim \frac{\log n}{n} = 0 \]
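
To get a feeling for limit (8.25), one can tabulate $n^k/\alpha^n$ for a specific choice of the parameters. The sketch below assumes, purely for illustration, $k = 10$ and $\alpha = 2$ (integers, so that Python can compute the powers exactly before dividing); the function name `term` is ours. The terms first grow, then collapse once the exponential takes over, and the ratio of consecutive terms approaches $1/\alpha$.

```python
import math


def term(n, k=10, alpha=2):
    """n^k / alpha^n, computed from exact integer powers (k, alpha integers here)."""
    return n**k / alpha**n


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, term(n))
    # The ratio of consecutive terms approaches 1/alpha = 0.5.
    print(term(1001) / term(1000))
    # The special case log(n)/n of point (ii).
    print(math.log(10**6) / 10**6)
```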

What precedes indicates a precise hierarchy among the following classes of divergent
sequences:
\[ \alpha^n \text{ with } \alpha > 1; \qquad n^k \text{ with } k > 0; \qquad \log^k n \text{ with } k > 0 \]
The “strongest” are the exponentials, graded according to the base $\alpha$, then the powers
follow, graded according to the exponent $k$, and, finally, the logarithms, graded according to
the exponent $k$.
For example,
\[ 5^n - 6 \cdot 2^n - n^{123} + 7n^{87} - n^{36} \log n \to +\infty \]
inheriting the behavior of $5^n$, and
\[ \frac{n^4 - 3n^3 + 6n^2 - 4}{5n^4 + 7n^3 + 25n^2 + 342} \to \frac{1}{5} \]
because the numerator inherits the behavior of $n^4$ and the denominator that of $5n^4$.
Soon, in Section 8.12, we will make these observations on limits rigorous, based on the
rate of convergence (or divergence).

Theorem 308 (Root criterion) If there exists a number $q < 1$ such that, at least eventually,
\[ \sqrt[n]{|x_n|} \le q \tag{8.26} \]

then the sequence $\{x_n\}$ tends to $0$.

The strict inequality $q < 1$ is key: the constant sequence $x_n = 1$ does not converge to $0$,
although $\sqrt[n]{|x_n|} \le 1$ for every $n$.

Proof Suppose that (8.26) holds starting with $n = 1$. From

\[ \sqrt[n]{|x_n|} \le q \]

we immediately get that $|x_n| \le q^n$, i.e., that $-q^n \le x_n \le q^n$. Since $0 < q < 1$, we have $q^n \to 0$ and the
result follows from the comparison criterion.

For the root criterion we can also make observations similar to those for the ratio criterion.
In particular, property (8.26) surely holds if the sequence $\left\{ \sqrt[n]{|x_n|} \right\}$ has a limit and the latter
is smaller than $1$, that is, if
\[ \lim \sqrt[n]{|x_n|} < 1 \tag{8.27} \]

This limit form is the most common in which the criterion is applied.

The next simple example shows how both the ratio and the root criteria are sufficient,
yet not necessary, conditions for convergence. However useful, they cannot thus always be
conclusive in determining the convergence of a sequence.

Example 309 The sequence with $x_n = 1/n$ converges to zero. However, we have that
\[ \frac{x_{n+1}}{x_n} = \frac{n}{n+1} \to 1 \]
and so the ratio criterion is not applicable. Furthermore, $\sqrt[n]{n} \to 1$ as

\[ \log \sqrt[n]{n} = \log n^{1/n} = \frac{\log n}{n} \to 0 \]
It follows that
\[ \sqrt[n]{\frac{1}{n}} = \frac{1}{\sqrt[n]{n}} \to 1 \]
hence the root criterion is not applicable either. Neither of the two criteria can thus be useful in
determining the convergence of such a simple sequence. ▲
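
The behavior described in Example 309 is easy to observe numerically. In the following sketch (the function names `ratio` and `root` are our own) both quantities drift towards $1$ while the sequence itself goes to $0$, so no $q < 1$ can bound them eventually.

```python
def ratio(n):
    """x_{n+1} / x_n for x_n = 1/n."""
    return (1 / (n + 1)) / (1 / n)


def root(n):
    """n-th root of |x_n| for x_n = 1/n."""
    return (1 / n) ** (1 / n)


if __name__ == "__main__":
    for n in (10, 1000, 10**6):
        print(n, 1 / n, ratio(n), root(n))
```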

Lastly, note that both sequences $x_n = 1/n$ and $x_n = (-1)^n$ satisfy the condition

\[ \left| \frac{x_{n+1}}{x_n} \right| \to 1 \]

although the first one converges to zero and the second one does not converge at all. Therefore
such a condition does not allow us to draw any conclusion regarding the asymptotic behavior
of the sequence: it can converge or not while satisfying this condition. Similar considerations
hold for the condition $\sqrt[n]{|x_n|} \to 1$: it is enough to look at the sequences $\{n\}$ and $\{1/n\}$. All
this confirms the importance of the condition “$< 1$” for the limit forms (8.24) and (8.27) to have
conclusive power. The next well-known limit further confirms our statement.

Proposition 310 For every $k > 0$ we have that $\lim \sqrt[n]{k} = 1$.

Proof The result is obvious for $k = 1$. Take $k > 1$. For any $n$ let $x_n > 0$ be such that
$(1 + x_n)^n = k$, so that $\sqrt[n]{k} = 1 + x_n$. From Newton’s binomial formula (A.4), we have that
$n x_n \le k$, so that $x_n \to 0$. It follows that $\sqrt[n]{k} \to 1$.
Now take $k < 1$. From what we have seen above, we have that $\sqrt[n]{1/k} \to 1$, hence the
sequence $\left\{ \sqrt[n]{1/k} \right\}$ is bounded (Proposition 284). This in turn implies that the sequence $\left\{ \sqrt[n]{k} \right\}$ is
bounded as well. Thanks to the Corollary, the equality

\[ \sqrt[n]{k} - 1 = \left( 1 - \sqrt[n]{\frac{1}{k}} \right) \sqrt[n]{k} \]

implies that $\sqrt[n]{k} - 1 \to 0$, that is, $\lim \sqrt[n]{k} = 1$.

8.10 The Cauchy condition


A sequence is convergent if it admits a finite limit. To establish the convergence of a given
sequence we should therefore compute its limit, a task that could turn out to be quite difficult.
In addition, the limit is an object which is, in a certain sense, “foreign” to the sequence (in
general it is not a term of the sequence). Therefore, to establish the convergence of a
sequence, we have to rely on a “stranger” that, moreover, can be difficult to compute.
For this reason, it would be preferable to have a definition of convergence that only
makes use of the terms of the sequence. To see how to do this, we are guided by the following
simple intuition: if a sequence converges, then its elements are, at least from a certain point
onward, all very close to the limit. But, if they are all very close to the limit, they are also
very close to one another. What follows simply formalizes this intuition.

Theorem 311 (Cauchy) The sequence $\{x_n\}$ is convergent if and only if for each $\varepsilon > 0$
there exists an integer $n_\varepsilon \ge 1$ such that

\[ |x_n - x_m| < \varepsilon \qquad \forall n, m \ge n_\varepsilon \tag{8.28} \]

Condition (8.28), which can be rewritten as $d(x_n, x_m) < \varepsilon$ for every $n, m \ge n_\varepsilon$, is
called the Cauchy condition: its validity for every $\varepsilon > 0$ is therefore a necessary and
sufficient condition for convergence. Sequences that satisfy this condition are called “Cauchy
sequences”.

Proof “Only if”. If $x_n \to L$ then, by definition, for each $\varepsilon > 0$ there exists $n_\varepsilon \ge 1$ such that
$|x_n - L| < \varepsilon$ for every $n \ge n_\varepsilon$. This implies that for every $n, m \ge n_\varepsilon$

\[ |x_n - x_m| = |x_n - L + L - x_m| \le |x_n - L| + |x_m - L| < \varepsilon + \varepsilon = 2\varepsilon \]

and, since $\varepsilon$ was arbitrarily chosen, the statement follows.

“If”. If $|x_n - x_m| < \varepsilon$ for every $n, m \ge n_\varepsilon$, it easily follows that $|x_n - x_{n_\varepsilon}| < \varepsilon$ for
$n = n_\varepsilon + 1, n_\varepsilon + 2, \ldots$, that is,

\[ x_{n_\varepsilon} - \varepsilon < x_n < x_{n_\varepsilon} + \varepsilon \qquad \text{for } n = n_\varepsilon + 1, n_\varepsilon + 2, \ldots \]

Next, we denote by $A$ (respectively, $B$) the set of the real numbers such that the sequence
is eventually strictly larger (respectively, smaller) than each of them. Formally, we define

\[ A = \{ a \in \mathbb{R} : \exists n_a \in \mathbb{N} \text{ such that } x_n > a \ \ \forall n \ge n_a \} \]

and
\[ B = \{ b \in \mathbb{R} : \exists n_b \in \mathbb{N} \text{ such that } b > x_n \ \ \forall n \ge n_b \} \]
Note that:

(i) $A$ and $B$ are not empty. Indeed, we have $x_{n_\varepsilon} - \varepsilon \in A$ and $x_{n_\varepsilon} + \varepsilon \in B$.

(ii) if $a \in A$ and $b \in B$, then $b > a$. Indeed, since $a \in A$ (respectively, $b \in B$), we have
that there exists $n_a \in \mathbb{N}$ such that $x_n > a$ for every $n \ge n_a$ (respectively, there exists
$n_b \in \mathbb{N}$ such that $b > x_n$ for every $n \ge n_b$). Define $\bar{n} = \max\{n_a, n_b\}$. It follows that
$b > x_{\bar{n}} > a$.

(iii) $\sup A = \inf B$. Indeed, by the Least Upper Bound Principle and the previous two
points, $\sup A$ and $\inf B$ are well defined and such that $\sup A \le \inf B$. Since, by point
(i), $x_{n_\varepsilon} - \varepsilon \in A$ and $x_{n_\varepsilon} + \varepsilon \in B$, we have $x_{n_\varepsilon} - \varepsilon \le \sup A \le \inf B \le x_{n_\varepsilon} + \varepsilon$, and, in
particular, $|\inf B - \sup A| \le 2\varepsilon$. Since $\varepsilon$ can be chosen to be arbitrarily small, we have
$|\inf B - \sup A| = 0$, that is, $\inf B = \sup A$.

Let us call $z$ the common value of $\sup A$ and $\inf B$. We claim that $x_n \to z$. Indeed, by
fixing arbitrarily a number $\eta > 0$, there exist $a \in A$ and $b \in B$ such that $0 \le b - a < \eta$ and
therefore (since $a \le z \le b$ and hence $z - \eta < a$ and $b < z + \eta$)

\[ z - \eta < a < b < z + \eta \]

But, by definition of $A$ and $B$, the sequence is eventually strictly larger than $a$ and eventually
strictly smaller than $b$ and hence, the sequence is such that eventually

\[ z - \eta < x_n < z + \eta \]

which, due to the arbitrary choice of $\eta$, shows that $z$ is the limit of the sequence $\{x_n\}$.

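Example 312 The sequence with generic term $x_n = 1/n$ is a Cauchy sequence. Indeed, let
$\varepsilon > 0$. We have to show that there exists $n_\varepsilon \ge 1$ such that for every $n, m \ge n_\varepsilon$ one has
$|x_n - x_m| < \varepsilon$. Without loss of generality, assume that $n \ge m$. Note that for $n \ge m$ we
have $|x_n - x_m| = \frac{1}{m} - \frac{1}{n}$. Clearly, for every $n \ge m$, $0 \le \frac{1}{m} - \frac{1}{n} < \frac{1}{m}$. Since $\frac{1}{m} < \varepsilon$ is the same
as $m > \frac{1}{\varepsilon}$, by choosing $n_\varepsilon = \left[ \frac{1}{\varepsilon} \right] + 1$, where $[\cdot]$ denotes the integer part, we have
$|x_n - x_m| < \varepsilon$ for every $n \ge m \ge n_\varepsilon$, proving that $x_n = 1/n$ is a Cauchy sequence. ▲

Example 313 The sequence with generic term $x_n = \log n$ is not a Cauchy sequence. Let
us suppose, by contradiction, that for fixed $\varepsilon > 0$ there exists $n_\varepsilon \ge 1$ such that for every
$n, m \ge n_\varepsilon$ we have $|x_n - x_m| < \varepsilon$. First, note that if $n = m + h$ with $h \in \mathbb{N}$, we have
$|x_n - x_m| = \log \frac{m+h}{m} < \varepsilon$ if and only if $h < m(e^\varepsilon - 1)$. Thus, by choosing $h = [m(e^\varepsilon - 1)] + 1$
and $m \ge n_\varepsilon$, we obtain $|x_n - x_m| = \log \frac{m+h}{m} \ge \varepsilon$, which contradicts (since $n, m \ge n_\varepsilon$)
$|x_n - x_m| < \varepsilon$. Therefore, $x_n = \log n$ is not a Cauchy sequence. ▲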
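
The constructions of Examples 312 and 313 can be turned into two small functions. The sketch below is only illustrative (the names `n_eps` and `witness` and the value of `eps` are our own choices): the first returns the index $[1/\varepsilon] + 1$ after which the terms $1/n$ stay within $\varepsilon$ of one another, the second returns the step $h$ for which $\log(m+h) - \log m \ge \varepsilon$, however large $m$ is.

```python
import math


def n_eps(eps):
    """Index of Example 312: the integer part of 1/eps, plus one."""
    return math.floor(1 / eps) + 1


def witness(m, eps):
    """Step h of Example 313: the integer part of m(e^eps - 1), plus one."""
    return math.floor(m * (math.exp(eps) - 1)) + 1


if __name__ == "__main__":
    eps = 0.01
    start = n_eps(eps)
    # Cauchy condition for 1/n, checked on a finite window of indices.
    print(all(abs(1 / n - 1 / m) < eps
              for n in range(start, start + 300)
              for m in range(start, start + 300)))
    # For log n a violating pair (m, m + h) exists beyond every index m.
    for m in (10, 1000, 10**6):
        h = witness(m, eps)
        print(m, h, math.log((m + h) / m) >= eps)
```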

O.R. The previous theorem, sometimes called the general criterion of convergence, does state
a fundamental property of convergent sequences, yet its relevance is due to the structural
property it isolates. This property of the set of real numbers is called completeness (and is
of great conceptual importance).
For example, let us assume (as was the case for Pythagoras) that we only knew the
rational numbers: the space on which we operate is therefore $\mathbb{Q}$. Consider the sequence
whose elements (all rationals) are the decimal approximations of $\pi$:

\[ x_1 = 3, \quad x_2 = 3.1, \quad x_3 = 3.14, \quad x_4 = 3.141, \quad x_5 = 3.1415, \quad \ldots \]

Being a decimal approximation, this sequence satisfies the Cauchy condition

\[ |x_n - x_m| < 10^{-\min\{m-1, \, n-1\}} \]

and this can be made arbitrarily small. The sequence, however, does not converge (recall
that we are acting as if we were only aware of $\mathbb{Q}$, and not of $\mathbb{R}$) to any point of $\mathbb{Q}$: if we
knew $\mathbb{R}$, we could say that it converges to $\pi$ (and it could not converge to anything else since
the limit is unique). Therefore, in $\mathbb{Q}$ the Cauchy condition is necessary, but not sufficient,
for convergence.
Reiterating, there are spaces, such as $\mathbb{R}$, in which the Cauchy condition is sufficient for
convergence and others, such as $\mathbb{Q}$, in which it is not. The first ones are called complete
and the second ones incomplete. Without going into details, $\mathbb{R}$ is the “completion” of $\mathbb{Q}$
in that it is the smallest complete space that contains $\mathbb{Q}$. In other words, $\mathbb{Q}$ is efficiently
enriched with all the “limits” (which do not exist in $\mathbb{Q}$) of the sequences that satisfy the
Cauchy condition, obtaining in this way the complete space $\mathbb{R}$. ▼
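
The decimal approximations of $\pi$ used in the remark above live entirely in $\mathbb{Q}$, so they can be handled with exact rational arithmetic. In the following sketch the digits of $\pi$ are hard-coded in a string, and the names `PI_DIGITS`, `x` and `cauchy_bound_holds` are our own; the code checks the Cauchy bound $|x_n - x_m| < 10^{-\min\{m-1, n-1\}}$ on the first indices.

```python
from fractions import Fraction

PI_DIGITS = "314159265358979"  # leading decimal digits of pi, hard-coded


def x(k):
    """k-th approximation: pi truncated after k - 1 decimal places, as a rational."""
    return Fraction(int(PI_DIGITS[:k]), 10 ** (k - 1))


def cauchy_bound_holds(max_index):
    """Check |x_n - x_m| < 10^(-min(m-1, n-1)) for all 1 <= n, m <= max_index."""
    return all(
        abs(x(n) - x(m)) < Fraction(1, 10 ** (min(m, n) - 1))
        for n in range(1, max_index + 1)
        for m in range(1, max_index + 1)
    )


if __name__ == "__main__":
    print([str(x(k)) for k in range(1, 6)])
    print(cauchy_bound_holds(len(PI_DIGITS)))
```

Every term is a `Fraction`, that is, an element of $\mathbb{Q}$; what is missing in $\mathbb{Q}$ is only the limit.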

8.11 Napier’s constant


A fundamental sequence, which appears in the indeterminate form $1^\infty$, is
\[ \left( 1 + \frac{1}{n} \right)^n \tag{8.29} \]

We will prove soon that it converges to a number (irrational, and actually transcendental)[21]
that is usually denoted by $e$ and is equal to $2.71828\ldots$
The number $e$ is the most natural base of the logarithms and as such it acquires remarkable properties.
From now on we will take without exception $e$ as base of the logarithms and very often as
base of powers.

Theorem 314 The sequence (8.29) is convergent. Its limit is denoted by $e$ and it is called
Napier’s constant.

Since the sequence involves powers, the root criterion is the first possibility to consider.
Unfortunately,
\[ \sqrt[n]{\left( 1 + \frac{1}{n} \right)^n} = 1 + \frac{1}{n} \to 1 \]
and therefore this criterion cannot be applied. The proof is based, instead, on the following
classical inequality.

Lemma 315 If $a > -1$, with $a \ne 0$, one has

\[ (1 + a)^n > 1 + an \tag{8.30} \]

for every integer number $n > 1$.[22]

Proof The proof is done by induction. Inequality (8.30) holds for $n = 2$. Indeed, for each
$a \ne 0$ we have:
\[ (1 + a)^2 = 1 + 2a + a^2 > 1 + 2a \]
Suppose now that (8.30) holds for some $n \ge 2$, i.e.,

\[ (1 + a)^n > 1 + an \]

We want to prove that (8.30) holds for $n + 1$. We have:

\[ (1 + a)^{n+1} = (1 + a)(1 + a)^n > (1 + a)(1 + an) = 1 + a(n + 1) + a^2 n > 1 + a(n + 1) \]

where the first inequality, due to the induction hypothesis, holds only for $a > -1$. This
completes the induction step.

Proof of Theorem 314 Let us set

\[ a_n = \left( 1 + \frac{1}{n} \right)^n; \qquad b_n = \left( 1 + \frac{1}{n} \right)^{n+1} \qquad \forall n = 1, 2, \ldots \]

We proceed by steps.
[21] A non-rational number is called algebraic if it is a root of some polynomial equation with integer
coefficients: for example, $\sqrt{2}$ is algebraic because it is a root of the equation $x^2 - 2 = 0$. Non-algebraic irrational
numbers are called transcendental.
[22] For $n = 1$ the equality holds trivially.

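Step 1: $\{b_n\}$ is decreasing. Indeed, for $n \ge 2$ we have

\[ \frac{b_n}{b_{n-1}} = \frac{\left( 1 + \frac{1}{n} \right)^{n+1}}{\left( 1 + \frac{1}{n-1} \right)^{n}} = \left( 1 + \frac{1}{n} \right) \left[ \frac{1 + \frac{1}{n}}{1 + \frac{1}{n-1}} \right]^n = \left( 1 + \frac{1}{n} \right) \left[ \frac{\frac{n+1}{n}}{\frac{n}{n-1}} \right]^n \]
\[ = \left( 1 + \frac{1}{n} \right) \left[ \frac{(n+1)(n-1)}{n^2} \right]^n = \frac{1 + \frac{1}{n}}{\left( 1 + \frac{1}{n^2 - 1} \right)^n} \]

and, using the inequality $(1 + a)^n > 1 + an$ proved in Lemma 315,[23] we see that
\[ \left( 1 + \frac{1}{n^2 - 1} \right)^n > 1 + \frac{n}{n^2 - 1} > 1 + \frac{n}{n^2} = 1 + \frac{1}{n} \]

whence $b_n / b_{n-1} < 1$.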

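Step 2: $\{a_n\}$ is increasing. Indeed, for $n \ge 2$,

\[ \frac{a_n}{a_{n-1}} = \frac{\left( 1 + \frac{1}{n} \right)^n}{\left( 1 + \frac{1}{n-1} \right)^{n-1}} = \frac{\left( \frac{n+1}{n} \right)^n}{\left( \frac{n}{n-1} \right)^{n-1}} = \frac{\left( \frac{n+1}{n} \right)^n \left( \frac{n-1}{n} \right)^n}{\frac{n-1}{n}} = \frac{\left( \frac{n^2 - 1}{n^2} \right)^n}{1 - \frac{1}{n}} = \frac{\left( 1 - \frac{1}{n^2} \right)^n}{1 - \frac{1}{n}} \]

and, again by the inequality used above,

\[ \left( 1 - \frac{1}{n^2} \right)^n > 1 - \frac{n}{n^2} = 1 - \frac{1}{n} \]

we see that $a_n / a_{n-1} > 1$.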

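Step 3: $b_n > a_n$ for every $n$ and, moreover, $b_n - a_n \to 0$. Indeed

\[ b_n - a_n = \left( 1 + \frac{1}{n} \right)^{n+1} - \left( 1 + \frac{1}{n} \right)^n = \left( 1 + \frac{1}{n} \right)^{n+1} \left( 1 - \frac{1}{1 + \frac{1}{n}} \right) \]
\[ = \left( 1 + \frac{1}{n} \right)^{n+1} \frac{1}{n+1} = \frac{b_n}{n+1} > 0 \]

Given that $b_n \le b_1$, one gets that

\[ 0 < b_n - a_n = \frac{b_n}{n+1} \le \frac{b_1}{n+1} \to 0 \]

Since the two sequences are strictly monotonic and bounded (indeed $a_1 \le a_n < b_n \le b_1$), they are
convergent; since their difference tends to $0$, we conclude that $\sup a_n = \inf b_n = \lim a_n = \lim b_n$.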

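One obtains

\[ a_1 = 2^1 = 2 \qquad\qquad b_1 = 2^2 = 4 \]
\[ a_2 = \left( \tfrac{3}{2} \right)^2 = \tfrac{9}{4} = 2.25 \qquad\qquad b_2 = \left( \tfrac{3}{2} \right)^3 = \tfrac{27}{8} = 3.375 \]
\[ \cdots \]
\[ a_{10} = \left( \tfrac{11}{10} \right)^{10} \simeq 2.59 \qquad\qquad b_{10} = \left( \tfrac{11}{10} \right)^{11} \simeq 2.85 \]
and therefore Napier’s constant lies between $2.59$ and $2.85$. As we have already indicated,
the number $e$ is transcendental and is equal to $2.71828\ldots$
[23] Note that $-1 < \frac{1}{n^2 - 1} \ne 0$ for $n \ge 2$.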
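
The three steps of the proof can be replayed with exact rational arithmetic. The sketch below (the function names `a` and `b` mirror the sequences $a_n$ and $b_n$ of the proof; the ranges of indices are arbitrary) checks on the first terms that $\{a_n\}$ increases, $\{b_n\}$ decreases, and the gap $b_n - a_n$ equals $b_n/(n+1)$, and it reproduces the values tabulated above.

```python
import math
from fractions import Fraction


def a(n):
    """a_n = (1 + 1/n)^n as an exact rational."""
    return (1 + Fraction(1, n)) ** n


def b(n):
    """b_n = (1 + 1/n)^(n+1) as an exact rational."""
    return (1 + Fraction(1, n)) ** (n + 1)


if __name__ == "__main__":
    # Steps 1 and 2: a_n increasing, b_n decreasing, with a_n below b_n.
    print(all(a(n) < a(n + 1) < b(n + 1) < b(n) for n in range(1, 50)))
    # Step 3: the gap is exactly b_n / (n + 1).
    print(all(b(n) - a(n) == b(n) / (n + 1) for n in range(1, 50)))
    for n in (1, 2, 10, 1000):
        print(n, float(a(n)), float(b(n)))
    print(math.e)
```

The convergence is slow: the bracket $[a_n, b_n]$ has width $b_n/(n+1)$, so each additional correct digit of $e$ costs roughly ten times more terms.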

From the fundamental limit just indicated we can deduce many other limits. The limit
(8.29) and the following ones (i)–(v) are other examples of fundamental limits.

(i) If $|x_n| \to +\infty$ (for example $x_n \to +\infty$ or $x_n \to -\infty$), we have

\[ \lim \left( 1 + \frac{k}{x_n} \right)^{x_n} = e^k \]

For $k = 1$ the proof can be done easily considering the integer part of $x_n$. For any $k$,
it is sufficient to set $k/x_n = 1/y_n$, so that
\[ \left( 1 + \frac{k}{x_n} \right)^{x_n} = \left( 1 + \frac{1}{y_n} \right)^{k y_n} = \left[ \left( 1 + \frac{1}{y_n} \right)^{y_n} \right]^k \to e^k \]

(ii) If $a_n \to 0$ and $a_n \ne 0$, then

\[ \lim (1 + a_n)^{1/a_n} = e \]
It is sufficient to set $a_n = 1/x_n$ to find again the previous case (i).

(iii) If $a_n \to 0$ and $a_n \ne 0$, then

\[ \lim \frac{\log (1 + a_n)}{a_n} = 1 \]
It is sufficient to take the logarithm in the previous limit. More generally,

\[ \lim \frac{\log_b (1 + a_n)}{a_n} = \log_b e \qquad \forall \, 0 < b \ne 1 \]

(iv) If $c > 0$, $y_n \to 0$, and $y_n \ne 0$, then

\[ \lim \frac{c^{y_n} - 1}{y_n} = \log c \]

It is sufficient to set $c^{y_n} - 1 = a_n$ (so that also $a_n \to 0$) to see that

\[ \frac{c^{y_n} - 1}{y_n} = \frac{a_n}{\log_c (1 + a_n)} \]

and so we are reduced to the (reciprocal of the) previous case in which the limit is
$1/\log_c e = \log_e c = \log c$.

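(v) If $\alpha \in \mathbb{R}$ and $z_n \to 0$, with $z_n \ne 0$, then

\[ \lim \frac{(1 + z_n)^\alpha - 1}{z_n} = \alpha \]
The statement is obvious for $\alpha = 1$. If $\alpha \ne 1$ we set $a_n = (1 + z_n)^\alpha - 1$, that is,
$1 + a_n = (1 + z_n)^\alpha$, i.e., $\log (1 + a_n) = \alpha \log (1 + z_n)$ (so that also $a_n \to 0$). We have
therefore
\[ \frac{\log (1 + a_n)}{a_n} = \frac{\alpha \log (1 + z_n)}{(1 + z_n)^\alpha - 1} = \alpha \, \frac{\log (1 + z_n)}{z_n} \cdot \frac{z_n}{(1 + z_n)^\alpha - 1} \]
and, given that $\frac{\log (1 + a_n)}{a_n} \to 1$ and $\frac{\log (1 + z_n)}{z_n} \to 1$, the statement follows.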

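We apply what we have just seen to some simple limits. We have

\[ \left( \frac{n + 5}{n} \right)^n = \left( 1 + \frac{5}{n} \right)^n \to e^5 \]
as well as
\[ n^2 \left( \left( 1 + \frac{1}{n^2} \right)^3 - 1 \right) = \frac{\left( 1 + 1/n^2 \right)^3 - 1}{1/n^2} \to 3 \]
and
\[ n \log \left( 1 + \frac{1}{n} \right) = \frac{\log (1 + 1/n)}{1/n} \to 1 \]
and
\[ \frac{2^{1/n} - 1}{1/n} \to \log 2 \]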
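
The four limits just computed can be sanity-checked in floating point. In the sketch below the function name `fundamental_limits` and the index $n = 10^4$ are our own choices; the values are approximations at a single large index, so they should only be close to $e^5$, $3$, $1$ and $\log 2$, not equal to them.

```python
import math


def fundamental_limits(n):
    """Evaluate the four expressions above at index n, paired with their limits."""
    return {
        "(1 + 5/n)^n": ((1 + 5 / n) ** n, math.exp(5)),
        "n^2 ((1 + 1/n^2)^3 - 1)": (n**2 * ((1 + 1 / n**2) ** 3 - 1), 3.0),
        "n log(1 + 1/n)": (n * math.log(1 + 1 / n), 1.0),
        "(2^(1/n) - 1) / (1/n)": ((2 ** (1 / n) - 1) / (1 / n), math.log(2)),
    }


if __name__ == "__main__":
    for name, (value, limit) in fundamental_limits(10**4).items():
        print(f"{name:28s} value = {value:.6f}   limit = {limit:.6f}")
```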

8.12 Orders of convergence and of divergence


8.12.1 Generalities
Some sequences converge to their limit “faster” than others. For example, consider two
sequences $\{x_n\}$ and $\{y_n\}$, both diverging to $+\infty$, say $y_n = n$ and $x_n = n^2$.
Intuitively, the sequence $\{x_n\}$ diverges faster than $\{y_n\}$. If we compare them through their
ratio
\[ \frac{y_n}{x_n} \]
we have
\[ \lim_n \frac{y_n}{x_n} = \lim_n \frac{1}{n} = 0 \]

Even though here the numerator also tends to $+\infty$, the denominator has driven the fraction
on its side, forcing it to zero. Hence, the higher rate of divergence (that is, of convergence
to $+\infty$) of the sequence $\{x_n\}$ reveals itself in the convergence to zero of the ratio $y_n/x_n$.
The ratio seems therefore to be the natural field of comparison for the relative speed of
convergence/divergence of the two sequences.

The next definition formalizes this intuition, important both from the conceptual point
of view and for calculations.

Definition 316 Let $\{x_n\}$ and $\{y_n\}$ be two sequences, with the terms of the first one eventually
different from zero.

(i) If
\[ \frac{y_n}{x_n} \to 0 \]
we say that $\{y_n\}$ is negligible with respect to $\{x_n\}$, and we write

\[ y_n = o(x_n) \]

(ii) If
\[ \frac{y_n}{x_n} \to k \ne 0 \tag{8.31} \]
we say that $\{y_n\}$ is of the same order as, or comparable with, $\{x_n\}$, and we write

\[ y_n \asymp x_n \]

(iii) In particular when $k = 1$, i.e., when

\[ \frac{y_n}{x_n} \to 1 \]
we say that $\{y_n\}$ and $\{x_n\}$ are asymptotic, and we write

\[ y_n \sim x_n \]

This classification is comparative. For example, $\{y_n\}$ is negligible with respect to $\{x_n\}$:
this does not mean that $\{y_n\}$ is in itself negligible, but that it becomes so when it is compared
to $\{x_n\}$. The sequence $\{n^2\}$ is negligible with respect to $\{n^5\}$, but, in the absence of $\{n^5\}$, it
is not negligible at all (it tends to infinity!).
Observe, in addition, that thanks to Proposition 274 we have
\[ \left| \frac{y_n}{x_n} \right| \to +\infty \iff \frac{x_n}{y_n} \to 0 \]
i.e., if and only if $x_n = o(y_n)$. Therefore, also when the ratio diverges we can use the above
classification; no separate analysis is needed.

Terminology The expression $y_n = o(x_n)$ reads “$\{y_n\}$ is little-o of $\{x_n\}$”.

We report immediately some simple properties of these notions.

Lemma 317 Let $\{x_n\}$ and $\{y_n\}$ be two sequences with terms eventually different from zero.
Then,

(i) the relation $\asymp$ of comparability (in particular $\sim$) is both symmetric, that is, $y_n \asymp x_n$ if
and only if $x_n \asymp y_n$, and transitive, that is, $z_n \asymp y_n$ and $y_n \asymp x_n$ imply $z_n \asymp x_n$.

(ii) the relation of negligibility is transitive, that is, $z_n = o(y_n)$ and $y_n = o(x_n)$ imply
$z_n = o(x_n)$.

Proof The symmetry of $\asymp$ follows from

\[ \frac{y_n}{x_n} \to k \ne 0 \iff \frac{x_n}{y_n} \to \frac{1}{k} \ne 0 \]
We leave to the reader the easy proof of the other properties.

Finally, observe that

\[ y_n \asymp x_n \iff \frac{1}{y_n} \asymp \frac{1}{x_n} \]
and, in particular,
\[ y_n \sim x_n \iff \frac{1}{y_n} \sim \frac{1}{x_n} \tag{8.32} \]
provided that $\{x_n\}$ and $\{y_n\}$ are eventually different from zero. In other words, comparability
and asymptotic equivalence are preserved when one passes to the reciprocals.

We now consider the more interesting cases, where both sequences are infinitesimal or
divergent. We start with two infinitesimal sequences $\{x_n\}$ and $\{y_n\}$, that is, $\lim_{n \to \infty} x_n =
\lim_{n \to \infty} y_n = 0$. In this case, the negligible sequences tend faster to zero. Let us consider,
for example, $x_n = 1/n$ and $y_n = 1/n^2$. Intuitively, $y_n$ goes to zero faster than $x_n$. Indeed,
\[ \frac{1/n^2}{1/n} = \frac{1}{n} \to 0 \]

that is, $y_n = o(x_n)$. On the other hand, we have

\[ \frac{\sqrt{n}}{\sqrt{n+1}} = \sqrt{1 - \frac{1}{n+1}} \to 1 \]
and so the infinitesimal sequences $x_n = 1/\sqrt{n}$ and $y_n = 1/\sqrt{n+1}$ are comparable.

Suppose now that the sequences $\{x_n\}_{n=1}^\infty$ and $\{y_n\}_{n=1}^\infty$ are both divergent, positively
or negatively, that is, $\lim_{n \to \infty} x_n = \pm\infty$ and $\lim_{n \to \infty} y_n = \pm\infty$. In this case, negligible
sequences tend more slowly to infinity (independently of the sign), that is, they assume values
greater and greater in absolute value less rapidly. For example, let $x_n = n^2$ and $y_n = n$.
Intuitively, $y_n$ goes to infinity more slowly than $x_n$. Indeed,
\[ \frac{y_n}{x_n} = \frac{n}{n^2} = \frac{1}{n} \to 0 \]
that is, $y_n = o(x_n)$. On the other hand, we have the same if $x_n = n^2$ and $y_n = -n$ because
it is not the sign of the infinity that counts, but the approaching rate.

The meaning of negligibility must therefore be specified according to whether we consider
convergence to zero or to infinity (i.e., divergence). It is important to distinguish accurately
the two cases.

N.B. Setting $x_n = n$ and $y_n = n + k$, with $k > 0$, the sequences $\{x_n\}$ and $\{y_n\}$ are
asymptotic. Indeed, no matter how large $k$ is, the divergence to $+\infty$ of the two sequences
will make negligible, from the asymptotic point of view, the role of $k$. Such a fundamental
viewpoint, central to the theory of sequences, should not make us forget that two asymptotic
sequences are, in general, very different (to fix ideas, set for example $k = 10^{10}$, i.e., 10 billion,
and consider the asymptotic sequences $x_n = n$ and $y_n = n + 10^{10}$). ▽

8.12.2 Little-o algebra


The application of the concept of “little-o” is not always straightforward. Indeed, knowing
that a sequence yn is little-o of another sequence xn does not convey too much information
on the form of yn , apart from being negligible with respect to xn . There exists however an
“algebra” of little-o that allows for manipulating safely the little-o of sums and products of
sequences.

Proposition 318 For every pair of sequences $\{x_n\}$ and $\{y_n\}$ and for every scalar $c \ne 0$, it
holds that:

(i) $o(x_n) + o(x_n) = o(x_n)$;

(ii) $o(x_n) \, o(y_n) = o(x_n y_n)$;

(iii) $c \cdot o(x_n) = o(x_n)$;

(iv) $o(y_n) + o(x_n) = o(x_n)$ if $y_n = o(x_n)$.

The relation $o(x_n) + o(x_n) = o(x_n)$ in (i), bizarre at first sight, simply means that the
sum of two little-o of the same sequence is still a little-o of the same sequence, that is,
it continues to be negligible with respect to that sequence. The analogous re-reading of
the other properties in the proposition facilitates its understanding. Note that (ii) has the
remarkable special case
\[ o(x_n) \, o(x_n) = o(x_n^2) \]
Proof If a sequence is little-o of $x_n$ it can be written as $x_n \varepsilon_n$, where $\varepsilon_n$ is an infinitesimal
sequence. Indeed
\[ \lim \frac{x_n \varepsilon_n}{x_n} = \lim \varepsilon_n = 0 \]
and therefore $x_n \varepsilon_n$ is little-o of $x_n$. The proof will be based on this very useful artifice.
(i) Let us call $x_n \varepsilon_n$ the first of the two little-o to the left of the equality symbol, and
$x_n \eta_n$ the second one, with $\varepsilon_n$ and $\eta_n$ two infinitesimal sequences. Then
\[ \lim \frac{x_n \varepsilon_n + x_n \eta_n}{x_n} = \lim (\varepsilon_n + \eta_n) = 0 \]
which shows that $o(x_n) + o(x_n)$ is $o(x_n)$.
(ii) Let us call $x_n \varepsilon_n$ the little-o of $x_n$ and $y_n \eta_n$ the little-o of $y_n$, with $\varepsilon_n$ and $\eta_n$ two
infinitesimal sequences. Then
\[ \lim \frac{x_n \varepsilon_n \cdot y_n \eta_n}{x_n y_n} = \lim (\varepsilon_n \eta_n) = 0 \]

so that $o(x_n) \, o(y_n)$ is $o(x_n y_n)$.

(iii) Let us call $x_n \varepsilon_n$ the little-o of $x_n$, with $\varepsilon_n$ an infinitesimal sequence. Then
\[ \lim \frac{c \cdot x_n \varepsilon_n}{x_n} = c \lim \varepsilon_n = 0 \]
which shows that $c \cdot o(x_n)$ is $o(x_n)$.
(iv) Let us write $y_n = x_n \varepsilon_n$, with $\varepsilon_n$ an infinitesimal sequence. Then, the little-o of $y_n$
can be written as $y_n \eta_n$, that is, $x_n \varepsilon_n \eta_n$, with $\eta_n$ an infinitesimal sequence. Moreover, we call
$x_n \gamma_n$ the little-o of $x_n$, with $\gamma_n$ an infinitesimal sequence. Then
\[ \lim \frac{x_n \varepsilon_n \eta_n + x_n \gamma_n}{x_n} = \lim (\varepsilon_n \eta_n + \gamma_n) = 0 \]
so that $o(y_n) + o(x_n) = o(x_n)$.

Example 319 Let $\{x_n\}$ be the sequence with $n$-th term $x_n = n^2$. Consider the sequences
with $n$-th term $y_n = n$ and $z_n = 2(\log n - n)$. It is immediate to see that $y_n = o(x_n) = o(n^2)$
and $z_n = o(x_n) = o(n^2)$.

(i) Summing up the two sequences we obtain $y_n + z_n = 2 \log n - n$, which is still $o(n^2)$, in
accordance with (i) proved above.

(ii) Multiplying the two sequences we obtain $y_n z_n = 2n \log n - 2n^2$, which is $o(n^2 \cdot n^2)$,
i.e., $o(n^4)$, in accordance with (ii) proved above (in the special case $o(x_n) \, o(x_n)$). Note
that $y_n z_n$ is not $o(n^2)$.

(iii) Take $c = 3$ and consider $c \cdot y_n = 3n$. It is immediate that $3n$ is still $o(n^2)$, in accordance
with (iii) proved above.

(iv) Consider the sequence $w_n = \sqrt{n} - 1$. It is immediate that $w_n = o(y_n) = o(n)$. Consider
now the sum $w_n + z_n$ (with $z_n$ defined above), which is the sum of an $o(y_n)$ and an
$o(x_n)$, with $y_n = o(x_n)$. We have $w_n + z_n = \sqrt{n} - 1 + 2 \log n - 2n$, which is $o(n^2)$, i.e.,
$o(x_n)$, in accordance with (iv) proved above. Note that $w_n + z_n$ is not $o(y_n)$, even if
$w_n$ is $o(y_n)$. ▲

N.B. (i) To state that a sequence is $o(1)$ simply means that it tends to 0: indeed, $x_n = o(1)$
means that $x_n/1 = x_n \to 0$. (ii) The fourth property in the previous proposition is
particularly important, since it highlights that if $y_n$ is negligible with respect to $x_n$, in the
sum $o(y_n) + o(x_n)$ the little-o $o(y_n)$ is absorbed by $o(x_n)$. ▽
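
The four properties can be checked numerically on the sequences of Example 319. The following is only a sketch: a single large index (here `n = 10**6`, an arbitrary choice) can suggest a limit but never prove it, and the function names are ours.

```python
import math

# Sequences from Example 319.
def x(n): return n**2
def y(n): return n
def z(n): return 2 * (math.log(n) - n)
def w(n): return math.sqrt(n) - 1

n = 10**6  # arbitrary "large" index; a finite n only suggests the limit

sum_ratio = (y(n) + z(n)) / x(n)       # property (i): expected near 0
prod_ratio = (y(n) * z(n)) / x(n)**2   # property (ii): expected near 0
prod_vs_x = (y(n) * z(n)) / x(n)       # tends to -2, so y_n z_n is not o(n^2)
absorbed = (w(n) + z(n)) / x(n)        # property (iv): expected near 0
not_o_y = (w(n) + z(n)) / y(n)         # tends to -2, so w_n + z_n is not o(y_n)

print(sum_ratio, prod_ratio, prod_vs_x, absorbed, not_o_y)
```

The two ratios that stay near $-2$ show why the order of the reference sequence matters: a sum or product can be negligible with respect to one sequence and not with respect to another.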

8.12.3 Asymptotic equivalence


The relation $\sim$ identifies sequences that are equivalent to one another from the asymptotic
point of view. Indeed, it is easy to see that $y_n \sim x_n$ implies that, for $L \in \overline{\mathbb{R}}$,

\[ y_n \to L \iff x_n \to L \tag{8.33} \]

In detail:

(i) if $L \in \mathbb{R}$, $y_n \to L$ if and only if $x_n \to L$;


218 CHAPTER 8. SEQUENCES

(ii) if $L = +\infty$, $y_n \to +\infty$ if and only if $x_n \to +\infty$;

(iii) if $L = -\infty$, $y_n \to -\infty$ if and only if $x_n \to -\infty$.

All this suggests that it is possible to replace $x_n$ by $y_n$ (or vice versa) in the calculation
of limits. This possibility is attractive because it would allow us to replace a complicated
sequence by a simpler one that is asymptotic to it.
To make this intuition precise we start by observing that the asymptotic equivalence $\sim$
is preserved under the fundamental operations.

Lemma 320 If $y_n \sim x_n$ and $z_n \sim w_n$, then

(i) $y_n + z_n \sim x_n + w_n$ provided that there exists $k > 0$ such that eventually^{24}

\[ \left| \frac{x_n}{x_n + w_n} \right| \le k \]

(ii) $y_n z_n \sim x_n w_n$;

(iii) $y_n/z_n \sim x_n/w_n$ provided that eventually $z_n \ne 0$ and $w_n \ne 0$.

Note that for sums, unlike the case of products and ratios, the result does not
hold in general, but only under a significant ad hoc hypothesis.
For this reason assertions (ii) and (iii) are the most interesting and in the sequel we will
concentrate on the asymptotic equivalence of products and ratios, leaving to the reader the
study of sums.
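Proof (i) We have
\begin{align*}
\frac{y_n + z_n}{x_n + w_n} &= \frac{y_n}{x_n + w_n} + \frac{z_n}{x_n + w_n} = \frac{y_n}{x_n}\,\frac{x_n}{x_n + w_n} + \frac{z_n}{w_n}\,\frac{w_n}{x_n + w_n} \\
&= \frac{y_n}{x_n}\,\frac{x_n}{x_n + w_n} + \frac{z_n}{w_n}\left(1 - \frac{x_n}{x_n + w_n}\right) = \left(\frac{y_n}{x_n} - \frac{z_n}{w_n}\right)\frac{x_n}{x_n + w_n} + \frac{z_n}{w_n}
\end{align*}
Since $y_n/x_n \to 1$ and $z_n/w_n \to 1$, we have
\[ \frac{y_n}{x_n} - \frac{z_n}{w_n} \to 0 \]
hence
\[ 0 \le \left| \left(\frac{y_n}{x_n} - \frac{z_n}{w_n}\right)\frac{x_n}{x_n + w_n} \right| = \left| \frac{y_n}{x_n} - \frac{z_n}{w_n} \right| \left| \frac{x_n}{x_n + w_n} \right| \le k \left| \frac{y_n}{x_n} - \frac{z_n}{w_n} \right| \to 0 \]
By the comparison criterion,
\[ \left(\frac{y_n}{x_n} - \frac{z_n}{w_n}\right)\frac{x_n}{x_n + w_n} \to 0 \]
and hence, since $z_n/w_n \to 1$, we have
\[ \frac{y_n + z_n}{x_n + w_n} \to 1 \]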
24 For example, the condition holds if $\{x_n\}$ and $\{w_n\}$ are both eventually positive.

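as desired.
(ii) and (iii) We have
\[ \frac{y_n z_n}{x_n w_n} = \frac{y_n}{x_n}\,\frac{z_n}{w_n} \to 1 \]
and
\[ \frac{y_n/z_n}{x_n/w_n} = \frac{y_n w_n}{z_n x_n} = \frac{y_n}{x_n}\,\frac{w_n}{z_n} \to 1 \]
since $y_n/x_n \to 1$ and $z_n/w_n \to 1$.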
The next simple lemma is very useful: in the calculation of a limit it is good to neglect
what is negligible.

Lemma 321 We have

\[ x_n \sim x_n + o(x_n) \]

Proof It is sufficient to observe that

\[ \frac{x_n + o(x_n)}{x_n} = 1 + \frac{o(x_n)}{x_n} \to 1 \]

Thanks to (8.33), we have therefore

\[ x_n + o(x_n) \to L \iff x_n \to L \]

What is negligible with respect to the sequence $\{x_n\}$, i.e., what is $o(x_n)$, is asymptotically
irrelevant and one can safely ignore it. Together with Lemma 320, this implies, for products
and ratios, that
\[ (x_n + o(x_n))\,(y_n + o(y_n)) \sim x_n y_n \tag{8.34} \]
and
\[ \frac{x_n + o(x_n)}{y_n + o(y_n)} \sim \frac{x_n}{y_n} \tag{8.35} \]
We illustrate these very useful asymptotic equivalences with some examples, that we
invite the reader to read with particular attention.

Example 322 Let us consider the limit

\[ \lim \frac{n^4 - 3n^3 + 5n^2 - 7}{2n^5 + 12n^4 - 6n^3 + 4n + 1} \]

Thanks to (8.35) we have

\[ \frac{n^4 - 3n^3 + 5n^2 - 7}{2n^5 + 12n^4 - 6n^3 + 4n + 1} = \frac{n^4 + o(n^4)}{2n^5 + o(n^5)} \sim \frac{n^4}{2n^5} = \frac{1}{2n} \to 0 \]
▲

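Example 323 Consider the limit

\[ \lim \left(n^2 - 7n + 3\right)\left(2 + \frac{1}{n} - \frac{3}{n^2}\right) \]

By (8.34),^{25}

\[ \left(n^2 - 7n + 3\right)\left(2 + \frac{1}{n} - \frac{3}{n^2}\right) = \left(n^2 + o(n^2)\right)(2 + o(1)) \sim 2n^2 \to +\infty \]
▲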

Example 324 Consider the limit

\[ \lim \frac{n(n+1)(n+2)(n+3)}{(n-1)(n-2)(n-3)(n-4)} \]

By (8.35),

\[ \frac{n(n+1)(n+2)(n+3)}{(n-1)(n-2)(n-3)(n-4)} = \frac{n^4 + o(n^4)}{n^4 + o(n^4)} \sim \frac{n^4}{n^4} = 1 \to 1 \]
▲

Example 325 Consider the limit

\[ \lim e^{-n}\left(7 + \frac{1}{n}\right) \]

By (8.34),
\[ e^{-n}\left(7 + \frac{1}{n}\right) = e^{-n}(7 + o(1)) \sim 7e^{-n} \to 0 \]
▲

By (8.32), we have
\[ \frac{y_n}{z_n} \sim \frac{x_n}{w_n} \iff \frac{z_n}{y_n} \sim \frac{w_n}{x_n} \tag{8.36} \]
provided that the ratios are (eventually) well defined and not zero. Therefore, once we have
established the asymptoticity of the ratios $y_n/z_n$ and $x_n/w_n$, we "automatically" also have
the asymptoticity of their reciprocals $z_n/y_n$ and $w_n/x_n$.
25 For $k \in \mathbb{R}$, with $k \ne 0$, we have $k + o(1) \sim k$. Indeed,
\[ \frac{k + o(1)}{k} = 1 + \frac{1}{k}\,o(1) \to 1 \]

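Example 326 Consider the limit

\[ \lim \frac{e^{5n} - n^7 - 4n^2 + 3n}{6^n + n^8 - n^4 + 5n^3} \]
By (8.35),
\[ \frac{e^{5n} - n^7 - 4n^2 + 3n}{6^n + n^8 - n^4 + 5n^3} = \frac{e^{5n} + o(e^{5n})}{6^n + o(6^n)} \sim \frac{e^{5n}}{6^n} = \left(\frac{e^5}{6}\right)^n \to +\infty \]

If, instead, we consider the reciprocal limit

\[ \lim \frac{6^n + n^8 - n^4 + 5n^3}{e^{5n} - n^7 - 4n^2 + 3n} \]
then, by (8.36),
\[ \frac{6^n + n^8 - n^4 + 5n^3}{e^{5n} - n^7 - 4n^2 + 3n} \sim \left(\frac{6}{e^5}\right)^n \to 0 \]
▲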
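
As a sanity check on Examples 322-324, one can compare each original expression with its simplified asymptotic form: if the two are asymptotic, their ratio should be close to 1 for large $n$. The sketch below assumes the signs of the polynomials as reconstructed from the examples; the function names and the test index are ours.

```python
def ex322(n):
    return (n**4 - 3*n**3 + 5*n**2 - 7) / (2*n**5 + 12*n**4 - 6*n**3 + 4*n + 1)

def ex323(n):
    return (n**2 - 7*n + 3) * (2 + 1/n - 3/n**2)

def ex324(n):
    return (n*(n+1)*(n+2)*(n+3)) / ((n-1)*(n-2)*(n-3)*(n-4))

n = 10**6  # arbitrary large index

r322 = ex322(n) / (1 / (2*n))   # compare with 1/(2n)
r323 = ex323(n) / (2 * n**2)    # compare with 2n^2
r324 = ex324(n) / 1.0           # compare with n^4/n^4 = 1

print(r322, r323, r324)         # all three should be close to 1
```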

In conclusion, a clever use of (8.34)–(8.35) often allows us to simplify substantially
the calculation of limits. But, beyond the calculation, they are illuminating relations from
the conceptual point of view.

8.12.4 Characterization and decay


The next result establishes an enlightening characterization of the asymptotic equivalence.

Proposition 327 It holds that

\[ x_n \sim y_n \iff x_n = y_n + o(y_n) \]

In other words, two sequences are asymptotic when they are equal modulo a component
that is asymptotically negligible with respect to them. This result further clarifies how the
relation $\sim$ can be seen as an asymptotic equality.

Proof "If." From $x_n = y_n + o(y_n)$ it follows that

\[ \frac{x_n}{y_n} = \frac{y_n + o(y_n)}{y_n} = 1 + \frac{o(y_n)}{y_n} \to 1 \]

"Only if." Let $x_n \sim y_n$. Denoting $z_n = x_n - y_n$, one has that

\[ \frac{z_n}{y_n} = \frac{x_n - y_n}{y_n} = \frac{x_n}{y_n} - 1 \to 0 \]

and therefore $z_n = o(y_n)$.

The next result is a nice application of this characterization.



Proposition 328 Let $\{x_n\}$ be a sequence with terms eventually non-zero. Then
\[ \frac{1}{n} \log |x_n| \to k \ne 0 \tag{8.37} \]

if and only if $|x_n| = e^{kn + o(n)}$.

Proof "If." From $|x_n| = e^{kn + o(n)}$ it follows that

\[ \frac{1}{n} \log |x_n| = \frac{1}{n} \log e^{kn + o(n)} = \frac{kn + o(n)}{n} \to k \]
"Only if." Set $z_n = \log |x_n|$. Since $k \ne 0$, from (8.37) it follows that $z_n/(kn) \to 1$, i.e.,
$z_n \sim kn$. From the previous proposition and Proposition 318-(iii) it follows that

\[ |x_n| = e^{z_n} = e^{kn + o(kn)} = e^{kn + o(n)} \]

as claimed.

When k < 0, the condition (8.37) characterizes the sequences that converge to zero at
exponential rate. In that case, we speak about exponential decay. When k > 0, there is
instead an explosive exponential behavior.
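
To see condition (8.37) at work, consider the hypothetical sequence $x_n = n^3 e^{-2n}$ (our choice, not taken from the text). Since $\log|x_n| = 3\log n - 2n = kn + o(n)$ with $k = -2$, the rate $\frac{1}{n}\log|x_n|$ should approach $-2$. The sketch works with logarithms directly, so that very small values of $x_n$ do not underflow.

```python
import math

def log_abs_x(n, k=-2.0, p=3):
    """log|x_n| for the illustrative sequence x_n = n**p * exp(k*n)."""
    return p * math.log(n) + k * n

def rate(n):
    """The quantity (1/n) * log|x_n| appearing in (8.37)."""
    return log_abs_x(n) / n

rates = [rate(10**j) for j in range(1, 7)]
for j, r in enumerate(rates, start=1):
    print(f"n = 10^{j}: rate = {r:.6f}")
```

The polynomial factor $n^3$ only contributes the $o(n)$ term in the exponent, which is why it does not affect the limit of the rate.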

8.12.5 Terminology
Due to their importance, there is a specific terminology for the comparison both of infinitesimal
sequences and of divergent sequences. In particular,

(i) if two infinitesimal sequences $\{x_n\}$ and $\{y_n\}$ are such that $y_n = o(x_n)$, we say that the
sequence $\{y_n\}$ is infinitesimal of higher order with respect to $\{x_n\}$;

(ii) if two divergent sequences $\{x_n\}$ and $\{y_n\}$ are such that $y_n = o(x_n)$, we say that the
sequence $\{y_n\}$ is of lower order of infinity with respect to $\{x_n\}$.

In other words, a sequence is infinitesimal of higher order if it tends to zero faster, while
it is of lower order of infinity if it tends to infinity more slowly. Besides the terminology (which is
not universal), it is important to recall the idea of negligibility that lies at the basis of the
relation $y_n = o(x_n)$.

8.12.6 Scales of infinities

In light of what we have seen on the orders of convergence, we formulate the results, in large
part already obtained with the ratio criterion, for the comparison among exponential
sequences $\{\alpha^n\}$, power sequences $\{n^k\}$, and logarithmic sequences $\{\log^k n\}$.^{26} First of all,
observe that they are of infinite order when $\alpha > 1$ and $k > 0$ and infinitesimal when $0 < \alpha < 1$
and $k < 0$. Moreover, we have:

(i) If $\alpha > \beta$, then $\beta^n = o(\alpha^n)$. Indeed, $\beta^n/\alpha^n = (\beta/\alpha)^n \to 0$.
26
As announced in Section 8.11, here and in the rest of the book we consider only natural logarithms.

(ii) $n^k = o(\alpha^n)$ for every $\alpha > 1$, as already proved with the ratio criterion. We have
$\alpha^n = o(n^k)$ if, instead, $0 < \alpha < 1$ and $k > 0$.

(iii) If $k_1 > k_2$, then $n^{k_2} = o(n^{k_1})$. Indeed, $n^{k_2}/n^{k_1} = 1/n^{k_1 - k_2} \to 0$.

(iv) $\log^k n = o(n)$, as already proved with the ratio criterion.

(v) If $k_1 > k_2$, then $\log^{k_2} n = o(\log^{k_1} n)$. Indeed,

\[ \frac{\log^{k_2} n}{\log^{k_1} n} = \frac{1}{\log^{k_1 - k_2} n} \to 0 \]

The next lemma reports two important comparisons of infinities that show that exponentials
are of lower order of infinity than factorials $n!$. We omit the proof.

Lemma 329 One has that $\alpha^n = o(n!)$, with $\alpha > 0$, and $n! = o(n^n)$.

Note that this implies, by Lemma 317, that $\alpha^n = o(n^n)$. Exponentials are, therefore, of
lower order of infinity also compared with sequences of the type $n^n$.
The different orders of infinity and infinitesimal are sometimes organized through scales.
If we limit ourselves to the infinities (similar considerations hold for the infinitesimals), the
most classical scale of infinities is the logarithmic-exponential one. Taking $x_n = n$ as the
basis, we have the ascending scale
\[ n,\ n^2,\ \dots,\ n^k,\ \dots,\ e^n,\ e^{2n},\ \dots,\ e^{kn},\ \dots,\ e^{n^2},\ \dots,\ e^{n^k},\ \dots,\ e^{e^n},\ \dots \]

and the descending scale

\[ n,\ n^{1/2},\ \dots,\ n^{1/k},\ \dots,\ \log n,\ \sqrt{\log n},\ \dots,\ \sqrt[k]{\log n},\ \dots,\ \log\log n,\ \sqrt{\log\log n},\ \dots,\ \sqrt[k]{\log\log n},\ \dots \]

They give some "samples" for the asymptotic behavior of a sequence $\{x_n\}$ that tends to infinity.
For example, if $x_n \sim \log n$, the sequence $\{x_n\}$ is asymptotically logarithmic; if $x_n \sim n^2$, the
sequence $\{x_n\}$ is asymptotically quadratic, and so on.
Although for brevity we omit the details, Lemma 329 shows that the logarithmic-exponential
scale can be remarkably refined with orders of infinity of the type $n!$ and $n^n$.
Given this, in the applications one seldom considers orders of infinity higher than $e^{e^n}$
and lower than $\log\log n$. On the other hand, $\log\log n$ has an almost imperceptible increase,
it is almost constant:

n            10        10^2      10^3      10^4      10^5      10^6
log log n    0.83403   1.5272    1.9326    2.2203    2.4435    2.6258

while $e^{e^n}$ increases explosively:

n            3                4                 5                 6
e^(e^n)      5.2849 x 10^8    5.1484 x 10^23    2.8511 x 10^64    1.6103 x 10^175

The asymptotic behavior of sequences tending to infinity relevant in the applications usually
ranges between the slowness of $\log\log n$ and the explosiveness of $e^{e^n}$. But, from the
theoretical point of view, the study of the scales of infinity is of great elegance.
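
The two tables above can be reproduced with a few lines of code. This is just a sketch using the standard `math` module; the dictionary names are ours, and $n = 6$ is about as far as double-precision floats allow for $e^{e^n}$ before overflowing.

```python
import math

# Slow end of the scale: log log n.
loglog = {n: math.log(math.log(n)) for n in (10, 10**2, 10**3, 10**4, 10**5, 10**6)}

# Fast end of the scale: e^(e^n); larger n would overflow a float.
expexp = {n: math.exp(math.exp(n)) for n in (3, 4, 5, 6)}

for n, v in loglog.items():
    print(f"n = {n:>7}: log log n = {v:.5f}")
for n, v in expexp.items():
    print(f"n = {n}: e^(e^n) = {v:.4e}")
```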

8.12.7 The De Moivre-Stirling formula


In order to better illustrate how little-o analysis works, we shall present the De Moivre-Stirling
formula. Besides being a quite surprising formula, it is also used in many theoretical
and applied problems in dealing with the asymptotic behavior of $n!$.

Theorem 330 We have that

\begin{align*}
\log n! &= n \log n - n + o(n) \\
&= n \log n - n + \frac{1}{2} \log n + \log \sqrt{2\pi} + o(1)
\end{align*}

Two approximations of $\log n!$ are thus known. The first one, which De Moivre came up
with, is slightly less precise as it has an error term of order $o(n)$. The second approximation
was given by Stirling and is more accurate (its error term is $o(1)$) but also more complex.^{27}

Proof We shall only show the first equality. By setting $x_n = n!/n^n$, in the proof of Lemma
329 we have seen that
\[ \lim \frac{x_{n+1}}{x_n} = \frac{1}{e} \]
From (10.18), we have also that
\[ \lim \sqrt[n]{x_n} = \lim \frac{\sqrt[n]{n!}}{n} = \frac{1}{e} \]
We can thus conclude that $n/\sqrt[n]{n!} = e\,(1 + o(1))$, or $n!/n^n = e^{-n}(1 + o(1))^{-n}$, that is to
say
\[ n! = n^n e^{-n} (1 + o(1))^{-n} \]

Hence $\log n! = n \log n - n - n \log(1 + o(1))$. Since $\log(1 + a_n) \sim a_n$ for $a_n \to 0$, we can
write $n \log(1 + o(1)) \sim n \cdot o(1) = o(n)$.

From the second equality one can conclude that $n! = n^n e^{-n} \sqrt{2\pi n}\, e^{o(1)}$, and so

\[ \frac{n!}{n^n e^{-n} \sqrt{2\pi n}} = e^{o(1)} \to 1 \]

One thus obtains the following remarkable formula

\[ n! \sim n^n e^{-n} \sqrt{2\pi n} \]

which elegantly concludes our argument.
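
The difference in accuracy between the two approximations of $\log n!$ is easy to observe numerically. The sketch below relies on the standard-library fact that `math.lgamma(n + 1)` equals $\log n!$; the helper names are ours. For De Moivre's approximation the meaningful quantity is the error divided by $n$ (an $o(n)$ error), for Stirling's it is the error itself (an $o(1)$ error).

```python
import math

def log_factorial(n):
    return math.lgamma(n + 1)  # lgamma(n + 1) = log(n!)

def de_moivre(n):
    return n * math.log(n) - n

def stirling(n):
    return n * math.log(n) - n + 0.5 * math.log(n) + math.log(math.sqrt(2 * math.pi))

for n in (10, 100, 1000, 10**5):
    exact = log_factorial(n)
    print(n, (exact - de_moivre(n)) / n, exact - stirling(n))
```

Both printed columns shrink toward 0 as $n$ grows, but only the second one measures an absolute error.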


27
Since o (1) =n ! 0, a sequence which is o (1) is also o (n). For this reason an error term of o (1) is better
than one of o (n).

8.12.8 Distribution of prime numbers


The little-o notation was born and first used at the end of the 19th century in the study
of the distribution of prime numbers. We introduced prime numbers in Section 1.3, where
we showed their "atomic" centrality among the other natural numbers by means of the
Fundamental Theorem of Arithmetic. The existence of infinitely many prime numbers was
also proven thanks to a well-known theorem by Euclid, so that we can speak of the sequence
of prime numbers $\{p_n\}$. Nevertheless, in Section 8.1 we noted that it is unfortunately not
possible to explicitly describe such a sequence. This issue leads us to think about the
distribution of prime numbers in $\mathbb{N}$. Let $\pi : \mathbb{N}_+ \to \mathbb{R}$ be the sequence whose n-th term $\pi(n)$
is the number of prime numbers that are less than or equal to $n$. For example

n       1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
pi(n)   0  1  2  2  3  3  4  4  4  4   5   5   6   6   6

and so on. It is naturally not possible to fully describe the sequence $\pi$, as it would be
equivalent to describing the sequence of prime numbers, which we have argued to be impossible.
Nevertheless, we can still ask ourselves whether there is a sequence $\{x_n\}$ which can be
described in closed form and is asymptotically equal to $\pi$: in other words, our question is
whether we can find a reasonably simple sequence that asymptotically approximates $\pi$ well
enough.
Around the year 1800, Gauss and Legendre noticed independently of one another that
the sequence $\{n/\log n\}$ approximated $\pi$ well, as we can check by inspection of the following
table.

n        pi(n)                           n / log n                       pi(n) / (n / log n)

10       4                               4.3                             0.921
10^2     25                              21.7                            1.151
10^3     168                             145                             1.161
10^4     1,229                           1,086                           1.132
10^5     9,592                           8,686                           1.104
10^10    455,052,511                     434,294,482                     1.048
10^15    29,844,570,422,669              28,952,965,460,217              1.031
10^20    2,220,819,602,560,918,840       2,171,472,409,516,250,000       1.023

One can easily see that the ratio

\[ \frac{\pi(n)}{n / \log n} \]

becomes closer and closer to 1 as $n$ increases. Gauss and Legendre's conjecture was that this
was so because $\pi$ is asymptotically equal to $\{n/\log n\}$. Their conjecture remained unproven
for about a century, until it was proven in 1896 independently by two great mathematicians,
Jacques Hadamard and Charles de la Vallée Poussin. The importance of

such a result is testified by its name, which is as simple as it is demanding.^{28}

Theorem 331 (Prime Number Theorem) It holds that

\[ \pi(n) \sim \frac{n}{\log n} \]

Although we are not able to describe the sequence $\pi$, thanks to the Prime Number
Theorem we can say that its asymptotic behavior is similar to that of the simple sequence
$\{n/\log n\}$, that is to say that the number of primes in any given interval of natural numbers $[m, n]$
is approximately
\[ \pi(n) - \pi(m) \approx \frac{n}{\log n} - \frac{m}{\log m} \]
with increasing accuracy. This result, which undoubtedly has a statistical "flavor", is incredibly
elegant, even more so if we consider the following remarkable consequence.

Theorem 332 It holds that

\[ p_n \sim n \log n \tag{8.38} \]

The sequence of prime numbers $\{p_n\}$ is thus asymptotically equivalent to $\{n \log n\}$. The
n-th prime number's value is, approximately, $n \log n$.^{29} For example, by inspecting the prime
number table one can see that for $n = 100$ one has that $p_n = 541$ while its "estimate" is
$n \log n = 460$ (rounding down). Similarly:

n                p_n               n log n           p_n / (n log n)

100              541               460               1.1761
1,000            7,919             6,907             1.1465
10,000           104,729           92,104            1.1371
100,000          1,299,709         1,151,292         1.1289
1,000,000        15,485,863        13,815,510        1.1209
10,000,000       179,424,673       161,180,956       1.1132
100,000,000      2,038,074,743     1,842,068,074     1.1064
1,000,000,000    22,801,763,489    20,723,265,836    1.1003

One can see that the ratio between $p_n$ and its estimate $n \log n$ stays steadily around 1.

Proof From the Prime Number Theorem one has that

\[ \pi(n)\,\frac{\log n}{n} \to 1 \]
28 The non-easy proof of the theorem requires Complex Analysis methods which we do not cover in these
notes. The use of Complex Analysis in the study of prime numbers is due to Bernhard Riemann's deep
intuition. Only in 1949 were two other outstanding mathematicians, Paul Erdös and Atle Selberg, able to prove
this result using Real Analysis methods.
29 In the next subsection we shall clarify the meaning of the adverb "approximately".
8.13. SEQUENCES IN RN 227

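hence, for any $\varepsilon > 0$, there is an $n_\varepsilon$ such that

\[ \left| \pi(n)\,\frac{\log n}{n} - 1 \right| < \varepsilon \qquad \forall n \ge n_\varepsilon \tag{8.39} \]

Since $p_n \to \infty$, there is an $\bar{n}_\varepsilon$ such that $p_n \ge n_\varepsilon$ for $n \ge \bar{n}_\varepsilon$. Hence (8.39) implies that

\[ \left| \pi(p_n)\,\frac{\log p_n}{p_n} - 1 \right| < \varepsilon \qquad \forall n \ge \bar{n}_\varepsilon \]

At the same time, one has that $\pi(p_n) = n$, so that

\[ \left| n\,\frac{\log p_n}{p_n} - 1 \right| < \varepsilon \qquad \forall n \ge \bar{n}_\varepsilon \]

that is,
\[ n\,\frac{\log p_n}{p_n} \to 1 \tag{8.40} \]
from which it follows that
\[ \log \left( n\,\frac{\log p_n}{p_n} \right) \to \log 1 = 0 \]
or, $\log n + \log\log p_n - \log p_n \to 0$. Since $\log p_n \to +\infty$,
\[ \frac{\log n}{\log p_n} + \frac{\log\log p_n}{\log p_n} - 1 = \frac{\log n + \log\log p_n - \log p_n}{\log p_n} \to 0 \]

Yet, $\log\log p_n / \log p_n \to 0$ (can you explain why?), and so

\[ \frac{\log n}{\log p_n} \to 1 \]

Multiplying by (8.40) we get that

\[ \frac{n \log n}{p_n} = n\,\frac{\log p_n}{p_n} \cdot \frac{\log n}{\log p_n} \to 1 \]

thus showing that (8.38) holds.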
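
Both asymptotic statements, $\pi(n) \sim n/\log n$ and $p_n \sim n \log n$, can be explored on a small scale with a sieve of Eratosthenes. The sketch below is only illustrative: the names `primes_up_to`, `prime_count` and `LIMIT` are ours, and the sieve bound of 200,000 is an arbitrary choice that happens to be large enough to contain the first 10,000 primes.

```python
import bisect
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: list of all primes <= limit."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Cross out the multiples of p, starting from p*p.
            is_prime[p*p::p] = bytearray(len(range(p*p, limit + 1, p)))
    return [i for i in range(limit + 1) if is_prime[i]]

LIMIT = 200_000
primes = primes_up_to(LIMIT)

def prime_count(n):
    """pi(n): number of primes <= n (valid only for n <= LIMIT)."""
    return bisect.bisect_right(primes, n)

for n in (10**2, 10**3, 10**4, 10**5):
    print(n, prime_count(n), prime_count(n) / (n / math.log(n)))

for n in (100, 1000, 10000):
    p_n = primes[n - 1]  # the n-th prime
    print(n, p_n, p_n / (n * math.log(n)))
```

Both ratios drift toward 1 very slowly, in line with the tables in the text.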

8.13 Sequences in Rn
We now examine sequences $\{x^k\}$ of vectors in $\mathbb{R}^n$. For them we give the following definition
of limit, which follows closely the one already given for sequences in $\mathbb{R}$. The fundamental
difference is that each element of the sequence is now a vector $x^k = (x^k_1, x^k_2, \dots, x^k_n) \in \mathbb{R}^n$ and
not a scalar.

Definition 333 We say that the sequence $\{x^k\}$ in $\mathbb{R}^n$ tends to $L \in \mathbb{R}^n$, in symbols $x^k \to L$
or $\lim x^k = L$, if for every $\varepsilon > 0$ there exists $n_\varepsilon \ge 1$ such that

\[ k \ge n_\varepsilon \implies \left\| x^k - L \right\| < \varepsilon \]


In other words, $x^k \to L = (L_1, L_2, \dots, L_n)$ if the numerical sequence $\|x^k - L\|$ converges to
zero. Since
\[ \left\| x^k - L \right\| = \sqrt{\sum_{i=1}^n \left( x^k_i - L_i \right)^2} \]

we recognize immediately that

\[ \left\| x^k - L \right\| \to 0 \iff \left| x^k_i - L_i \right| \to 0 \quad \forall i = 1, 2, \dots, n \tag{8.41} \]

that is, if and only if the numerical sequences $\{x^k_i\}$ of the i-th components converge to the
component $L_i$ of the vector $L$.
The convergence of a sequence of vectors therefore reduces to the convergence of the
sequences of the single components, and hence it does not present any difficulty of
understanding or calculation.

N.B. Observe that a sequence in $\mathbb{R}^n$ is nothing but the restriction to $\mathbb{N}_+$ of a vector function
$f : \mathbb{R} \to \mathbb{R}^n$. ▽

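Example 334 Consider the sequence

\[ \left( 1 + \frac{1}{k},\ \frac{1}{k^2},\ \frac{2k + 3}{5k - 7} \right) \]

in $\mathbb{R}^3$. Since
\[ 1 + \frac{1}{k} \to 1, \qquad \frac{1}{k^2} \to 0 \qquad \text{and} \qquad \frac{2k + 3}{5k - 7} \to \frac{2}{5} \]
the sequence converges to the vector $(1, 0, 2/5)$. ▲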

In an analogous way we define the divergences to $+\infty$ and to $-\infty$ when all the components
of the vectors that form the sequence diverge respectively to $+\infty$ or to $-\infty$. When, finally, the
single components have different behaviors (some converge, others diverge or are irregular)
the sequence of vectors does not have a limit. For brevity, we omit the details.

Notation The sequences of vectors are denoted by $\{x^k\}$ instead of $\{x_n\}$ to avoid confusion
with the dimension $n$ of the space $\mathbb{R}^n$ and to be able to indicate the single components $x^k_i$
of each vector $x^k$ of the sequence.
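
The equivalence (8.41) between convergence in norm and componentwise convergence can be observed on the sequence of Example 334. In this sketch (function names ours) the Euclidean distance from the candidate limit $(1, 0, 2/5)$ is computed for growing $k$.

```python
import math

L = (1.0, 0.0, 2 / 5)  # candidate limit from Example 334

def x(k):
    """k-th vector of the sequence in Example 334."""
    return (1 + 1 / k, 1 / k**2, (2 * k + 3) / (5 * k - 7))

def distance(k):
    """Euclidean norm of x^k - L."""
    return math.sqrt(sum((xi - li) ** 2 for xi, li in zip(x(k), L)))

for k in (10, 100, 1000, 10**6):
    print(k, x(k), distance(k))
```

Each component approaches its own limit, and the norm of the difference shrinks accordingly.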
Chapter 9

Series

9.1 The concept

The idea that we want to develop here concerns, roughly, the possibility of summing infinitely
many addends. To provide a rudimentary example, imagine a stick 1 meter long and cut it
in half, obtaining in this way two pieces 1/2 meter long; then cut the second piece in half,
obtaining two pieces 1/4 meter long; cut again the second piece, obtaining two pieces 1/8
meter long, and continue ideally, without ever stopping. This cutting process would result
in infinitely many pieces of length 1/2, 1/4, 1/8, ... into which the original stick of 1 meter
has been divided. It is rather natural to imagine that

\[ \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots + \frac{1}{2^n} + \cdots = 1 \tag{9.1} \]

i.e., that, by reassembling the individual pieces, we would get back the original meter.
In this chapter we will give a precise meaning to equalities like (9.1). Let us imagine,
therefore, a sequence $\{x_n\}$, and that we want to "sum" all the terms, i.e., to carry out the
operation
\[ x_1 + x_2 + \cdots + x_n + \cdots = \sum_{n=1}^{\infty} x_n \]

To give a precise meaning to this new operation of "addition of infinitely many summands",
which is different from the ordinary addition (as we will realize),^{1} we will sum a finite number
of terms, say $n$, and then make $n$ tend to infinity and take the resulting limit, if it exists, as
the value to assign to the series. We are, therefore, thinking of constructing a new sequence

1 We cannot really sum infinitely many summands: all the paper in the world would not suffice, nor would our
entire life, we would not know where to put the line that one traditionally writes under the summands before
adding them, etc.

230 CHAPTER 9. SERIES

$\{s_n\}$, defined by

\begin{align*}
s_1 &= x_1 \tag{9.2} \\
s_2 &= x_1 + x_2 \\
s_3 &= x_1 + x_2 + x_3 \\
&\ \,\vdots \\
s_n &= x_1 + \cdots + x_n
\end{align*}

and to take the possible limit of $\{s_n\}$ as the sum of the series.

Definition 335 The series with terms given by a sequence $\{x_n\}$ of scalars, in symbols
$\sum_{n=1}^{\infty} x_n$, is the sequence $\{s_n\}$ defined in (9.2). The terms $s_n$ of the sequence are called
partial sums of the series.

The series $\sum_{n=1}^{\infty} x_n$ is therefore defined as the sequence $\{s_n\}$ of the partial sums, whose
limit behavior determines its value. In particular, the series $\sum_{n=1}^{\infty} x_n$ is:

(i) convergent, to the sum $S$, in symbols

\[ \sum_{n=1}^{\infty} x_n = S \]

if $\lim s_n = S \in \mathbb{R}$;

(ii) positively divergent, in symbols $\sum_{n=1}^{\infty} x_n = +\infty$, if $\lim s_n = +\infty$;

(iii) negatively divergent, in symbols $\sum_{n=1}^{\infty} x_n = -\infty$, if $\lim s_n = -\infty$;

(iv) irregular (or oscillating) if the sequence $\{s_n\}$ is irregular.

Briefly, we attribute to the series the same character (convergence, divergence, or
irregularity) as that of its sequence of partial sums.^{2}

O.R. Sometimes it is useful to start the series with the index $n = 0$ rather than from $n = 1$.
When the option exists (we will see that this is not the case for some types of series, like the
harmonic series, which for example cannot be defined for $n = 0$), the choice to start a series
from either $n = 0$ or $n = 1$ (or from another value of $n$) is a pure matter of convenience
and the context itself typically suggests the best choice. In any case, this choice does not
alter the character of the series and, therefore, it does not affect the problem of determining
whether the series converges or not. ▼
2 Using the terminology already employed for sequences, a series is sometimes called regular when it is
not irregular, that is, when one of the cases (i)–(iii) holds.
9.1. THE CONCEPT 231

9.1.1 Three classical series


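We illustrate the previous notions with three important series.

Example 336 (Mengoli series) The Mengoli series is given by:

\[ \frac{1}{1 \cdot 2} + \frac{1}{2 \cdot 3} + \cdots + \frac{1}{n(n+1)} + \cdots = \sum_{n=1}^{\infty} \frac{1}{n(n+1)} \]

Since
\[ \frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1} \]
one has that
\begin{align*}
s_n &= \frac{1}{1 \cdot 2} + \frac{1}{2 \cdot 3} + \cdots + \frac{1}{n(n+1)} \\
&= \left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \left(\frac{1}{3} - \frac{1}{4}\right) + \cdots + \left(\frac{1}{n} - \frac{1}{n+1}\right) = 1 - \frac{1}{n+1} \to 1
\end{align*}
Therefore,
\[ \sum_{n=1}^{\infty} \frac{1}{n(n+1)} = 1 \]
and so the Mengoli series converges and has sum 1. ▲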
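
The telescoping identity $s_n = 1 - \frac{1}{n+1}$ can be verified exactly with rational arithmetic. This is a small sketch using the standard `fractions` module; the function name is ours.

```python
from fractions import Fraction

def mengoli_partial_sum(n):
    """s_n = sum of 1/(k(k+1)) for k = 1..n, computed exactly."""
    return sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))

for n in (1, 5, 50):
    s_n = mengoli_partial_sum(n)
    print(n, s_n, s_n == 1 - Fraction(1, n + 1))
```

Using exact fractions rather than floats means the comparison with the closed form is an identity check, not an approximation.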

Example 337 (Harmonic series) The harmonic series is given by:

\[ 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} + \cdots = \sum_{n=1}^{\infty} \frac{1}{n} \]

Let us consider its partial sums taken for indices $n$ that are powers of 2 ($n = 2^k$):
\begin{align*}
s_1 &= 1, \qquad s_2 = 1 + \frac{1}{2} \\
s_4 &= 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} > 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{4} = s_2 + \frac{1}{2} = 1 + 2 \cdot \frac{1}{2} \\
s_8 &= s_4 + \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} > s_4 + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = s_4 + \frac{1}{2} > 1 + 3 \cdot \frac{1}{2}
\end{align*}
Continuing in this way we see that, for every $k \ge 2$,
\[ s_{2^k} > 1 + k \cdot \frac{1}{2} \tag{9.3} \]
The sequence of partial sums is strictly increasing (since the summands are all positive) and
therefore it admits a limit; (9.3) guarantees that it is not bounded from above and therefore
$\lim s_n = +\infty$. Hence,
\[ \sum_{n=1}^{\infty} \frac{1}{n} = +\infty \]

i.e., the harmonic series diverges positively.^{3} ▲

3 In Appendix D.2, we present another proof of the divergence of the harmonic series, due to Pietro Mengoli.
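
The lower bound (9.3) on the partial sums $s_{2^k}$ can be checked directly for small $k$. The sketch below (names ours) uses floating-point sums, which is accurate enough here because the gap between $s_{2^k}$ and $1 + k/2$ is far larger than rounding error.

```python
def harmonic(n):
    """Partial sum s_n = 1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(1 / k for k in range(1, n + 1))

# Triples (k, s_{2^k}, 1 + k/2); the bound is strict from k = 2 onwards.
checks = [(k, harmonic(2**k), 1 + k / 2) for k in range(2, 16)]
for k, s, bound in checks:
    print(k, round(s, 4), bound, s > bound)
```

Doubling the number of terms adds at least 1/2 to the sum, so the partial sums grow without bound, although extremely slowly.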

Example 338 (Geometric series) The geometric series with ratio $q$ is defined as follows:
\[ 1 + q + q^2 + q^3 + \cdots + q^n + \cdots = \sum_{n=0}^{\infty} q^n \]

Its character depends on the value of $q$. In particular, we prove that:

\[ \sum_{n=0}^{\infty} q^n = \begin{cases} +\infty & \text{if } q \ge 1 \\[4pt] \dfrac{1}{1 - q} & \text{if } |q| < 1 \\[4pt] \text{irregular} & \text{if } q \le -1 \end{cases} \]

To verify this, we start by observing that when $q = 1$ we have

\[ s_n = \underbrace{1 + 1 + \cdots + 1}_{n+1 \text{ times}} = n + 1 \to +\infty \]

Let now $q \ne 1$. Since

\begin{align*}
s_n - q s_n &= \left(1 + q + q^2 + q^3 + \cdots + q^n\right) - q\left(1 + q + q^2 + q^3 + \cdots + q^n\right) \\
&= \left(1 + q + q^2 + q^3 + \cdots + q^n\right) - \left(q + q^2 + q^3 + \cdots + q^{n+1}\right) = 1 - q^{n+1}
\end{align*}

we have
\[ (1 - q)\, s_n = 1 - q^{n+1} \]

and therefore, since $q \ne 1$,

\[ s_n = \frac{1 - q^{n+1}}{1 - q} \]
It follows that
\[ \sum_{n=0}^{\infty} q^n = \lim_{n \to +\infty} \frac{1 - q^{n+1}}{1 - q} \]

The study of this limit is divided into several cases:

(i) if $-1 < q < 1$, we have $q^{n+1} \to 0$ and so

\[ s_n \to \frac{1}{1 - q} \]

(ii) if $q > 1$, we have $q^{n+1} \to +\infty$ and so $s_n \to +\infty$;

(iii) if $q = -1$, the partial sums of odd order are equal to zero, while those of even order
are equal to 1. The sequence formed by them is hence irregular;

(iv) if $q < -1$, the sequence $\{q^{n+1}\}$ is irregular and therefore so is $\{s_n\}$. ▲
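
The closed form $s_n = \frac{1 - q^{n+1}}{1 - q}$ and the behavior in the different cases can be checked with exact rational arithmetic. In this sketch the function names are ours, and the particular ratios ($q = 1/2$, $q = -1$, $q = -3$) are arbitrary test values.

```python
from fractions import Fraction

def partial_sum(q, n):
    """s_n = 1 + q + ... + q**n, computed term by term."""
    return sum(q**k for k in range(n + 1))

def closed_form(q, n):
    """(1 - q**(n+1)) / (1 - q), valid for q != 1."""
    return (1 - q**(n + 1)) / (1 - q)

half = Fraction(1, 2)
print(partial_sum(half, 20) == closed_form(half, 20))  # the two expressions agree
print(2 - partial_sum(half, 20))                        # distance from the sum 1/(1 - q) = 2
print([partial_sum(-1, n) for n in range(6)])           # q = -1: sums alternate between 1 and 0
```

For $q = 1/2$ the distance from the limit halves at every step, while for $q = -1$ the partial sums never settle.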

Epicurus in his letter to Herodotus wrote "Once one says that there are infinite parts in
a body or parts of any degree of smallness, it is not possible to conceive how this should be,
and indeed how could the body any longer be limited in size?" The previous examples show
that, indeed, if these "parts", these particles, have strictly positive but different sizes (for
example either $1/(n(n+1))$ or $q^n$, with $q \in (0, 1)$), then the series might converge, and so
the size of the "body" could be defined. Nevertheless, Epicurus was right in the sense that,
if we assume (as it seems he does too) that all the particles have the same size, no matter how
small, then the series
\[ \varepsilon + \varepsilon + \varepsilon + \cdots + \varepsilon + \cdots \]
diverges positively, that is, $\sum_{n=1}^{\infty} \varepsilon = +\infty$, for every $\varepsilon > 0$. Indeed, we have $s_n = n\varepsilon \to +\infty$.
This simple series has an important philosophical counterpart (the properties of series have
often been used, even within philosophy, to try to clarify the nature of the potential infinite).

9.1.2 Intertemporal utility with infinite horizon

Series are applied fruitfully in economics. For example, let us go back to the intertemporal
choices with finite horizon introduced in Section 8.3.
We saw how an intertemporal consumption profile can be represented by a sequence of
the type
\[ x = \{x_1, x_2, \dots, x_t, \dots\} \]
and it can be quantified by an intertemporal utility function $U : A \subseteq \mathbb{R}^\infty \to \mathbb{R}$. In particular,
we mentioned the possible form of $U$, given by
\[ U(x) = u_1(x_1) + \beta u_2(x_2) + \cdots + \beta^{t-1} u_t(x_t) + \cdots \tag{9.4} \]

where $\beta \in (0, 1)$ is the subjective discount factor. In the light of what we have just seen,
(9.4) is the series
\[ \sum_{t=1}^{\infty} \beta^{t-1} u_t(x_t) \tag{9.5} \]

Series therefore allow us to give a correct meaning to the fundamental specification (9.4)
of the intertemporal utility function. Naturally, we are interested in the case in which the
series (9.5) is convergent, because we want the overall utility that the consumer gets
from the intertemporal consumption $\{x_1, x_2, \dots, x_t, \dots\}$ to be finite. Otherwise, how could
we compare, and hence choose, among such profiles if we get infinite utility?^{4}
Using the properties of the geometric series, it is possible to show that the series (9.5)
converges if and only if $\beta < 1$, provided that the utility functions $u_t$ are positive and bounded
by the same constant. In such a case, having assumed $\beta \in (0, 1)$, the intertemporal utility
function
\[ U(x) = \sum_{t=1}^{\infty} \beta^{t-1} u_t(x_t) \tag{9.6} \]

has as domain all of $\mathbb{R}^\infty$, that is, $U(x) \in \mathbb{R}$ for every $x \in \mathbb{R}^\infty$. It allows us to compare all the
possible intertemporal consumption profiles.
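
A truncated version of (9.6) is easy to compute and shows the role of the geometric series. The sketch below makes several assumptions that are not in the text: the same utility function $u_t = u$ in every period, the illustrative choice $u(x) = \sqrt{x}$, a constant consumption stream, and a truncation at 2000 periods. With constant consumption $c$, the series reduces to $u(c) \sum_t \beta^{t-1} = u(c)/(1 - \beta)$.

```python
import math

def discounted_utility(consumption, beta, u=math.sqrt):
    """Truncated version of (9.6): sum of beta**(t-1) * u(x_t) over the given terms.

    `u` is a hypothetical per-period utility, identical in every period.
    """
    # enumerate starts at 0, so beta**t here corresponds to beta**(t-1) for t = 1, 2, ...
    return sum(beta**t * u(x_t) for t, x_t in enumerate(consumption))

beta = 0.9
stream = [4.0] * 2000                      # constant consumption c = 4, truncated horizon
U = discounted_utility(stream, beta)
geometric_value = math.sqrt(4.0) / (1 - beta)

print(U, geometric_value)                  # the two values should essentially coincide
```

The bound $M/(1-\beta)$, with $M$ a common bound on the $u_t$, is what guarantees finiteness in the general case described above.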
4 The mind goes to the famous "bet" of Blaise Pascal. If God did not exist, to believe or not would procure
finite utility; but if God does exist, then to believe gives infinite utility (the eternal beatitude) and not to
believe gives utility equal to minus infinity (the eternal damnation): hence it is reasonable to believe in God.

9.2 Elementary properties


Given that the character of a series is determined by the character of the sequence
of its partial sums, it is evident that subtracting, adding, or modifying a finite number of
terms of a series does not change its character. On the contrary, in general, its sum changes.
In particular, $\sum_{n=1}^{\infty} x_n$ has the same character (but not the same sum) as $\sum_{n=k}^{\infty} x_n$ for
every integer $k > 1$: the difference between the two sums is nothing but the total of the
modifications that have been introduced.
Concerning the fundamental operations we have
\[ \sum_{n=1}^{\infty} c\,x_n = c \sum_{n=1}^{\infty} x_n \qquad \forall c \in \mathbb{R} \]

and, when we do not fall in a form of indeterminacy of the type $\infty - \infty$,

\[ \sum_{n=1}^{\infty} (x_n + y_n) = \sum_{n=1}^{\infty} x_n + \sum_{n=1}^{\infty} y_n \]
The next result is quite obvious, but important. If $\sum_{n=1}^{\infty} x_n$ converges, then $x_n$ necessarily
tends to 0: the summand must eventually become irrelevant to avoid having an exploding
sum.

Theorem 339 If the series $\sum_{n=1}^{\infty} x_n$ converges, then $x_n \to 0$.

Proof Trivially, we have $x_n = s_n - s_{n-1}$ and, given that the series converges, $s_n \to S$ as
well as $s_{n-1} \to S$; therefore $x_n = s_n - s_{n-1} \to S - S = 0$.

Convergence to zero of the sequence $\{x_n\}$ is therefore a necessary condition for convergence
of its series. The fact that this condition is only necessary is demonstrated by the
harmonic series: even if we have $1/n \to 0$, the series $\sum_{n=1}^{\infty} 1/n$ diverges.

Example 340 The series of general term

\[ x_n = \frac{2n^2 - 3n + 4}{17n^2 + 4n + 5} \]
is not convergent because the generic term is asymptotic to $2n^2/17n^2 = 2/17$ and therefore
it does not tend to 0. ▲

9.3 Series with positive terms


9.3.1 Comparison convergence criterion
We examine now the special case of series $\sum_{n=1}^{\infty} x_n$ with all the terms positive, that is,
$x_n \ge 0$.^{5} In such a case, the sequence $\{s_n\}$ of the partial sums is increasing and therefore
the following regularity result holds trivially.
5 Nothing would change if the terms were positive only eventually. Indeed, we can always discard a
finite number of terms without altering the asymptotic behavior of the series. Hence, all the results on the
asymptotic behavior of series with positive terms hold, more generally, for series with eventually positive
terms.
9.3. SERIES WITH POSITIVE TERMS 235

Proposition 341 Each series with positive terms is convergent or positively divergent. In
particular, it is convergent if and only if it is bounded from above.^{6}

The series with positive terms hence inherit the remarkable regularity properties of monotonic
sequences. This gives them a particularly important status among the series. For them,
we now recast the convergence criteria presented in Section 8.9 for the sequences.

Proposition 342 (Comparison criterion) Let $\sum_{n=1}^{\infty} x_n$ and $\sum_{n=1}^{\infty} y_n$ be two series with
positive terms and let $x_n \le y_n$ eventually.

(i) If $\sum_{n=1}^{\infty} x_n$ diverges positively, then so does $\sum_{n=1}^{\infty} y_n$.

(ii) If $\sum_{n=1}^{\infty} y_n$ converges, then so does $\sum_{n=1}^{\infty} x_n$.

Proof Let $n_0 \ge 1$ be such that $x_n \le y_n$ for all $n \ge n_0$, and set $\alpha = \sum_{n=1}^{n_0} (y_n - x_n)$. By
calling $s_n$ and $\sigma_n$ the partial sums of the two series, for $n > n_0$ we have that
\[ \sigma_n - s_n = \alpha + \sum_{k=n_0+1}^{n} (y_k - x_k) \ge \alpha \]

that is, $\sigma_n \ge s_n + \alpha$. Therefore, the statement follows from Proposition 282 (which is the
sequence counterpart of this statement).

Note that (i) is the contrapositive of (ii), and vice versa: indeed, thanks to Proposition
341, for a series with positive terms the negation of convergence is positive
divergence.^{7} Because both are useful we stated both, but they are the same property seen in equivalent
ways.

Example 343 The series

\[ \sum_{n=1}^{\infty} \frac{10^n}{n\,5^{2n+3}} \]
converges. Indeed, since
\[ \frac{10^n}{n\,5^{2n+3}} \le \frac{10^n}{5^{2n+2}} = \frac{10^n}{25^n \cdot 5^2} = \frac{1}{25} \left(\frac{2}{5}\right)^n \]

the convergence of the geometric series with ratio 2/5 guarantees, via the comparison
criterion, the convergence of the series. ▲

Example 344 The series of the reciprocals of the factorials^{8}

\[ \sum_{n=0}^{\infty} \frac{1}{n!} \]

converges. Indeed, observe that

\[ \sum_{n=0}^{\infty} \frac{1}{n!} = 1 + 1 + \sum_{n=2}^{\infty} \frac{1}{n!} = 2 + \sum_{n=1}^{\infty} \frac{1}{(n+1)!} \]

But the series

\[ \sum_{n=1}^{\infty} \frac{1}{(n+1)!} \]
converges because, for every $n \ge 3$,
\[ \frac{1}{(n+1)!} < \frac{1}{n(n+1)} \]
where the latter is the generic term of Mengoli's series, which we know converges. By
the comparison criterion, the convergence of $\sum_{n=0}^{\infty} 1/n!$ follows from the convergence of
$\sum_{n=1}^{\infty} 1/(n+1)!$. We will see later that its sum is Napier's constant $e$. ▲

6 By definition, the series is bounded from above when the sequence of the partial sums is so, i.e., there
exists $k > 0$ such that $|s_n| \le k$ for every $n \ge 1$.
7 Recall that, given two properties $p$ and $q$, the implication $\lnot q \implies \lnot p$ is the contrapositive of the original
implication $p \implies q$.
8 Recall that $0! = 1$. For this reason, we start the series from $n = 0$, which allows us to write $\sum_{n=0}^{\infty} 1/n! = e$:
a more elegant expression than $\sum_{n=1}^{\infty} 1/n! = e - 1$.

Example 345 We call generalized harmonic series the series
$$\sum_{n=1}^{\infty} \frac{1}{n^{\alpha}}$$
with $\alpha \in \mathbb{R}$. If $\alpha = 1$, it reduces to the harmonic series that we know diverges to $+\infty$.

If $\alpha < 1$, it is easy to see that, for every $n > 1$,
$$\frac{1}{n^{\alpha}} > \frac{1}{n} \quad \text{(i.e., } n^{\alpha} < n \text{)}$$
and therefore, by the comparison criterion,
$$\sum_{n=1}^{\infty} \frac{1}{n^{\alpha}} = +\infty$$

If $\alpha = 2$, the generalized harmonic series converges. Indeed, let us observe that
$$\sum_{n=1}^{\infty} \frac{1}{n^2} = 1 + \sum_{n=2}^{\infty} \frac{1}{n^2} = 1 + \sum_{n=1}^{\infty} \frac{1}{(n+1)^2}$$
But the series
$$\sum_{n=1}^{\infty} \frac{1}{(n+1)^2}$$
converges because for every $n \ge 1$
$$\frac{1}{(n+1)^2} < \frac{1}{n(n+1)}$$
which is the generic term of the convergent Mengoli series.$^{9}$ By the comparison criterion, the convergence of $\sum_{n=1}^{\infty} 1/n^2$ is a consequence of the convergence of $\sum_{n=1}^{\infty} 1/(n+1)^2$.

$^{9}$ Indeed, it is true that $\sum_{n=1}^{\infty} 1/n^2 = \pi^2/6$, but here we do not have the tools to prove this remarkable result.
9.3. SERIES WITH POSITIVE TERMS 237

If $\alpha > 2$, then
$$\frac{1}{n^{\alpha}} < \frac{1}{n^2}$$
for every $n > 1$ and therefore we still have convergence.
Finally, it is possible to see, but it is more delicate, that the generalized harmonic series converges also for all values of $\alpha \in (1, 2)$.
To sum up, the generalized harmonic series
$$\sum_{n=1}^{\infty} \frac{1}{n^{\alpha}}$$
converges for $\alpha > 1$, while it diverges for $\alpha \le 1$. ▲
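The dichotomy of Example 345 can be observed numerically. The following is only a sketch: the function name `generalized_harmonic_partial_sum` is an illustrative choice, and finitely many partial sums can suggest, but never prove, convergence or divergence.

```python
def generalized_harmonic_partial_sum(alpha: float, n_terms: int) -> float:
    """Partial sum s_N = 1/1**alpha + 1/2**alpha + ... + 1/N**alpha."""
    return sum(1.0 / n**alpha for n in range(1, n_terms + 1))


# For alpha <= 1 the partial sums keep growing; for alpha = 2 they settle
# below pi**2 / 6 (approximately 1.6449).
for alpha in (0.5, 1.0, 2.0):
    sums = [generalized_harmonic_partial_sum(alpha, N) for N in (10, 1_000, 100_000)]
    print(alpha, sums)
```

For $\alpha = 1$ the growth is visible but very slow, in line with the remark that this is the "last" case of divergence.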

For the generalized harmonic series, the case $\alpha = 1$ is hence the "last" case of divergence: it is sufficient to very slightly increase the exponent, from $1$ to $1 + \varepsilon$ with $\varepsilon > 0$, and the series will converge. This suggests that the divergence is extremely slow, as the reader can check by calculating some of the partial sums.$^{10}$ This intuition is made precise by the next nice result.

Proposition 346 We have
$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \sim \log n \qquad (9.7)$$

In other words, the sequence of the partial sums of the harmonic series is asymptotic to the logarithm.
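A quick numerical look at (9.7), as a sketch only: the helper names below are illustrative, and the loop merely shows the ratio between the partial sum and $\log n$ creeping down towards 1.

```python
import math


def harmonic_partial_sum(n: int) -> float:
    """Partial sum 1 + 1/2 + ... + 1/n, accumulated with math.fsum."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


def ratio_to_log(n: int) -> float:
    """Ratio between the n-th harmonic partial sum and log n (n >= 2)."""
    return harmonic_partial_sum(n) / math.log(n)


for n in (10, 1_000, 100_000):
    print(n, harmonic_partial_sum(n), ratio_to_log(n))
```

The ratio approaches 1 slowly, which is another way of seeing how slow the divergence of the harmonic series is.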

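Example 347 More generally, it is possible to prove that the series
$$\sum_{n=2}^{\infty} \frac{1}{n^{\alpha} \log^{\beta} n}$$
converges for $\alpha > 1$ and any $\beta$, as well as for $\alpha = 1$ and $\beta > 1$, while it diverges for $\alpha < 1$ and any $\beta$, as well as for $\alpha = 1$ and any $\beta \le 1$. ▲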

The comparison criterion has a nice and useful asymptotic version, based on the asymptotic comparison of the terms of the sequences.

Proposition 348 (Asymptotic comparison criterion) Let $\sum_{n=1}^{\infty} x_n$ and $\sum_{n=1}^{\infty} y_n$ be two series with positive and non-zero terms.$^{11}$ If $x_n \sim y_n$, then the series $\sum_{n=1}^{\infty} x_n$ and $\sum_{n=1}^{\infty} y_n$ have the same character.

$^{10}$ A famous professor talked about "cadaverous infinity".
$^{11}$ The hypothesis that the terms are non-zero is necessary to make the ratio $x_n / y_n$ well-defined.

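Example 349 Let
$$x_n = \frac{2n^3 - 3n + 8}{5n^5 - n^4 - 4n^3 + 2n^2 - 12}$$
Since
$$x_n \sim \frac{2n^3}{5n^5} = \frac{2}{5n^2}$$
the series $\sum_{n=1}^{\infty} x_n$ converges. Let, instead,
$$x_n = \frac{n+1}{n^2 - 3n + 4}$$
Since $x_n \sim 1/n$, the series $\sum_{n=1}^{\infty} x_n$ diverges to $+\infty$. ▲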

9.3.2 Ratio convergence criterion: prelude


In the next section we will present the important ratio criterion for convergence. For the impatient reader, we first look at its simplest version.

Proposition 350 (Ratio criterion, elementary limit form) Let $\sum_{n=1}^{\infty} x_n$ be a series with positive and non-zero terms, and suppose that the limit $\lim x_{n+1}/x_n$ exists.

(i) If
$$\lim \frac{x_{n+1}}{x_n} < 1$$
the series converges.

(ii) If
$$\lim \frac{x_{n+1}}{x_n} > 1$$
the series diverges positively.

The criterion is based on the study of the limit of the ratio
$$\frac{x_{n+1}}{x_n}$$
of the terms of the sequence. The condition that the limit $\lim x_{n+1}/x_n$ exists is rather stringent, as we will see in the next section. But, when this condition is satisfied, the elementary limit form of the ratio criterion is easy to apply and turns out to be quite useful.
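Example 351 (i) The series
$$\sum_{n=1}^{\infty} \frac{n^2 + 5n + 1}{n 2^n + 1}$$
converges. Indeed,
$$\frac{x_{n+1}}{x_n} = \frac{\dfrac{(n+1)^2 + 5(n+1) + 1}{(n+1) 2^{n+1} + 1}}{\dfrac{n^2 + 5n + 1}{n 2^n + 1}} = \frac{(n+1)^2 + 5(n+1) + 1}{n^2 + 5n + 1} \cdot \frac{n 2^n + 1}{(n+1) 2^{n+1} + 1}$$
$$= \frac{n^2 + 7n + 7}{n^2 + 5n + 1} \cdot \frac{n 2^n + 1}{(n+1) 2^{n+1} + 1} \sim \frac{n^2}{n^2} \cdot \frac{n 2^n}{(n+1) 2^{n+1}} = \frac{n 2^n}{(n+1) 2^{n+1}} = \frac{1}{2} \cdot \frac{n}{n+1} \to \frac{1}{2}$$
which implies the convergence of the series.

(ii) The series
$$\sum_{n=1}^{\infty} \frac{2 \, n!}{3^n}$$
diverges positively. Indeed,
$$\frac{x_{n+1}}{x_n} = \frac{\dfrac{2 (n+1)!}{3^{n+1}}}{\dfrac{2 \, n!}{3^n}} = \frac{2 (n+1)!}{3^{n+1}} \cdot \frac{3^n}{2 \, n!} = \frac{1}{3} \cdot \frac{(n+1)!}{n!} = \frac{1}{3} (n+1) \to +\infty$$
which implies the divergence of the series. ▲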

If $\lim x_{n+1}/x_n$ exists, but
$$\lim \frac{x_{n+1}}{x_n} = 1$$
then nothing can be said, as the series may converge or diverge positively. This is well illustrated by the series $\sum_{n=1}^{\infty} 1/n$ and $\sum_{n=1}^{\infty} 1/n^2$: although for both we have $\lim (x_{n+1}/x_n) = 1$, the first one diverges, while the second one converges.
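The three possible behaviors of the ratio can be inspected numerically. The sketch below is illustrative only (all function names are assumptions of this example): it evaluates $x_{n+1}/x_n$ exactly with rational arithmetic for the two series of Example 351 and for the two inconclusive series $\sum 1/n$ and $\sum 1/n^2$.

```python
from fractions import Fraction
from math import factorial


def ratio(term, n: int) -> Fraction:
    """Exact ratio x_{n+1} / x_n for a term function returning Fractions."""
    return term(n + 1) / term(n)


def term_i(n: int) -> Fraction:
    """x_n = (n^2 + 5n + 1) / (n 2^n + 1): the ratio tends to 1/2."""
    return Fraction(n * n + 5 * n + 1, n * 2**n + 1)


def term_ii(n: int) -> Fraction:
    """x_n = 2 n! / 3^n: the ratio equals (n + 1)/3 and tends to +infinity."""
    return Fraction(2 * factorial(n), 3**n)


def harmonic(n: int) -> Fraction:
    """x_n = 1/n: the ratio tends to 1 and the series diverges."""
    return Fraction(1, n)


def inverse_square(n: int) -> Fraction:
    """x_n = 1/n^2: the ratio tends to 1 and the series converges."""
    return Fraction(1, n * n)


for n in (10, 100, 1000):
    print(n, [float(ratio(t, n)) for t in (term_i, term_ii, harmonic, inverse_square)])
```

The last two columns both approach 1, which is exactly why the limit form of the criterion is silent in that case.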

9.3.3 Ratio criterion


We now study the ratio criterion in more depth, giving first a more general version.

Proposition 352 (Ratio criterion) Let $\sum_{n=1}^{\infty} x_n$ be a series with, eventually, positive and non-zero terms.$^{12}$

(i) If there exists a number $q < 1$ such that eventually
$$\frac{x_{n+1}}{x_n} \le q \qquad (9.8)$$
then the series converges.

(ii) If instead the ratio is eventually $\ge 1$, the series diverges positively.

The theorem requires that the ratios are (uniformly) smaller than a number $q$ which is itself smaller than 1, and not simply that they are all smaller than 1. Indeed, for the harmonic series the ratios are
$$\frac{\frac{1}{n+1}}{\frac{1}{n}} = \frac{n}{n+1}$$
so all lower than 1, but the series diverges (as the ratios tend to 1, there is no room to insert a number $q$ that is simultaneously greater than all of them and smaller than 1).
Since the convergence of $\sum_{n=1}^{\infty} x_n$ implies $x_n \to 0$ (Theorem 339), the ratio criterion for series can be seen as an extension of the homonymous criterion for sequences. A similar observation holds for the root criterion that we will see soon.

Proof From $x_{n+1} \le q x_n$ we deduce, as in the analogous criterion for sequences, that $0 < x_n \le q^{n-1} x_1$, and the first statement follows from the comparison criterion (Proposition 342) and from the convergence of the geometric series. If instead $x_{n+1}/x_n \ge 1$, i.e., if $x_{n+1} \ge x_n > 0$, $\{x_n\}$ is increasing and therefore it cannot tend to 0.

$^{12}$ The hypothesis that the terms $x_n$ are non-zero ensures that the ratio $x_{n+1}/x_n$ is well defined (recall the analogous condition required for the asymptotic comparison criterion).

It is possible to prove (see Section 10.4) that if $\lim (x_{n+1}/x_n)$ exists, the ratio criterion assumes exactly the tripartite form given in Proposition 350; in particular, if
$$\lim \frac{x_{n+1}}{x_n} = 1$$
the criterion fails and gives no indication about the character of the series.

At the operative level, the tripartite form is the usual one in which we apply the ratio criterion. At the mechanical level, it is sufficient to recall the tripartition of Proposition 350 and the illustrative examples given in the Prelude. But, so as to do mathematics rather than plumbing, it is important to keep in mind the theoretical foundations provided by Proposition 352.
Let us see other examples.
Let us see other examples.
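Example 353 (i) The series $\sum_{n=1}^{\infty} n^k q^n$ converges for every $k \in \mathbb{R}$ and every $0 < q < 1$. Indeed,
$$\frac{(n+1)^k q^{n+1}}{n^k q^n} = \left( \frac{n+1}{n} \right)^k q \to q < 1$$
This shows also that this series diverges positively when $q > 1$.

(ii) The series
$$\sum_{n=0}^{\infty} \frac{x^n}{n!}$$
converges for every $x > 0$. Indeed, for $n \ge 1$ we have
$$\frac{x^{n+1}}{(n+1)!} \cdot \frac{n!}{x^n} = \frac{x}{n+1} \to 0 \qquad \forall x > 0$$

(iii) The series
$$\sum_{n=1}^{\infty} \frac{x^n}{n}$$
converges for every $0 < x < 1$. Indeed,
$$\frac{x^{n+1}}{n+1} \cdot \frac{n}{x^n} = x \, \frac{n}{n+1} \to x$$
which obviously is $< 1$ when $0 < x < 1$. ▲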

We stop here our study of the convergence criteria. Much more can be said: in Section
10.4 we will continue to investigate this topic in some more depth.

9.3.4 A first series expansion

Napier's constant $e$, a very important number, was introduced in the previous chapter as the limit of the sequence $(1 + 1/n)^n$. Surprisingly, it also emerges as the sum of the series $\sum_{n=0}^{\infty} 1/n!$ of the reciprocals of the factorials.

Proposition 354 We have
$$\sum_{n=0}^{\infty} \frac{1}{n!} = e \qquad (9.9)$$

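Proof In Example 344 we have shown that the series converges. Let us calculate its sum. By Newton's formula (A.4), we have
$$\left( 1 + \frac{1}{n} \right)^n = \sum_{k=0}^{n} \binom{n}{k} \frac{1}{n^k} = \sum_{k=0}^{n} \frac{1}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k}$$
On the other hand,
$$\frac{n!}{(n-k)!} = \underbrace{n (n-1) \cdots (n-k+1)}_{k \text{ times}} \le \underbrace{n \cdot n \cdots n}_{k \text{ times}} = n^k$$
Therefore,
$$\frac{n!}{(n-k)!} \frac{1}{n^k} \le 1$$
which implies
$$\left( 1 + \frac{1}{n} \right)^n = \sum_{k=0}^{n} \frac{1}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k} \le \sum_{k=0}^{n} \frac{1}{k!}$$
It follows that
$$e \le \sum_{n=0}^{\infty} \frac{1}{n!} \qquad (9.10)$$
For every $k \ge 1$ we have
$$\lim_{n \to \infty} \frac{n!}{(n-k)!} \frac{1}{n^k} = 1 \qquad (9.11)$$
Indeed,
$$\frac{n!}{(n-k)!} \frac{1}{n^k} = \frac{n (n-1) \cdots (n-k+1)}{n^k} \sim \frac{n^k}{n^k} = 1$$
Let us fix $m \ge 1$. For every $n > m$ we have
$$\left( 1 + \frac{1}{n} \right)^n = \sum_{k=0}^{n} \frac{1}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k} = \sum_{k=0}^{m} \frac{1}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k} + \sum_{k=m+1}^{n} \frac{1}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k} \ge \sum_{k=0}^{m} \frac{1}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k}$$
and therefore, thanks to (9.11),
$$e = \lim_{n \to \infty} \left( 1 + \frac{1}{n} \right)^n \ge \lim_{n \to \infty} \sum_{k=0}^{m} \frac{1}{k!} \frac{n!}{(n-k)!} \frac{1}{n^k} = \sum_{k=0}^{m} \frac{1}{k!} \lim_{n \to \infty} \frac{n!}{(n-k)!} \frac{1}{n^k} = \sum_{k=0}^{m} \frac{1}{k!}$$
Since this holds for every $m$, we have
$$e \ge \lim_{m \to \infty} \sum_{k=0}^{m} \frac{1}{k!} = \sum_{n=0}^{\infty} \frac{1}{n!}$$
that, in conjunction with (9.10), implies (9.9).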

The sum (9.9) can be generalized in a substantial way (we omit the proof).

Theorem 355 For every $x \in \mathbb{R}$, we have
$$e^x = \lim_{n \to \infty} \left( 1 + \frac{x}{n} \right)^n = \sum_{n=0}^{\infty} \frac{x^n}{n!} \qquad (9.12)$$

The equality (9.12) holds for every number $x$ and it reduces to (9.9) in the special case $x = 1$. Note the very remarkable series expression
$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots \qquad (9.13)$$
of the exponential function. We will see soon that $\sum_{n=0}^{\infty} x^n / n!$ is a power series. For this reason, the equality (9.13) is called the power series expansion of the exponential function. It is a result as elegant as it is important, which allows us to "decompose" the exponential function into a sum (although infinite) of elementary functions such as the powers $x^n$.
We will study series expansions in greater generality with the tools of differential calculus, of which series expansions are one of the most remarkable applications.
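The two expressions for $e^x$ in (9.12) can be compared numerically. This is a rough sketch with illustrative function names (`exp_series`, `exp_compound`); it uses the library value `math.exp` only as a reference point.

```python
import math


def exp_series(x: float, n_terms: int) -> float:
    """Partial sum of the series: x**0/0! + x**1/1! + ... (n_terms addends)."""
    total, term = 0.0, 1.0  # term starts at x**0 / 0! = 1
    for n in range(n_terms):
        total += term
        term *= x / (n + 1)  # turns x**n / n! into x**(n+1) / (n+1)!
    return total


def exp_compound(x: float, n: int) -> float:
    """The n-th element (1 + x/n)**n of the sequence that defines e**x."""
    return (1.0 + x / n) ** n


for x in (1.0, -2.0, 3.5):
    print(x, exp_series(x, 30), exp_compound(x, 1_000_000), math.exp(x))
```

With these inputs the series is already accurate after a few dozen terms, while the sequence $(1 + x/n)^n$ approaches the same value much more slowly.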

9.4 Series with terms of any sign


For a more thorough treatment of the topic, we refer the reader to more advanced textbooks.

9.4.1 Absolute convergence


Consider now the general case of series $\sum_{n=1}^{\infty} x_n$ with general terms $x_n$, not necessarily always positive, not even eventually. To study them we consider an auxiliary series with positive terms: series with positive terms leave only to come back through the back door.

Definition 356 The series $\sum_{n=1}^{\infty} x_n$ is said to be absolutely convergent if the series $\sum_{n=1}^{\infty} |x_n|$ of its absolute values is convergent.

The next result shows that the convergence of the series of absolute values, which can be verified with the criteria just discussed, guarantees the convergence of the much wilder, not necessarily positive, original series. We omit the proof.
9.4. SERIES WITH TERMS OF ANY SIGN 243

Theorem 357 If $\sum_{n=1}^{\infty} x_n$ converges absolutely, then $\sum_{n=1}^{\infty} x_n$ converges.

Note that the condition is only sufficient: the series
$$\sum_{n=1}^{\infty} \frac{(-1)^n}{n} = -1 + \frac{1}{2} - \frac{1}{3} + \frac{1}{4} - \frac{1}{5} + \frac{1}{6} + \cdots$$
converges (see Proposition 361), but not absolutely since
$$\sum_{n=1}^{\infty} \left| \frac{(-1)^n}{n} \right| = \sum_{n=1}^{\infty} \frac{1}{n} = +\infty$$

The class of absolutely convergent series is therefore contained in that of convergent series. It is possible to prove that this subclass has fundamental properties of regularity that convergent series which are not absolutely convergent lack. In other words, the absolutely convergent series are, among the convergent ones, those that behave significantly better.

Example 358 Let us take up Example 353 again, to complete it.

(i) By Theorem 357 and the ratio criterion, the series $\sum_{n=1}^{\infty} n^k q^n$ converges for every $k \in \mathbb{R}$ and every $-1 < q < 1$. Indeed, from
$$\frac{|x_{n+1}|}{|x_n|} = \frac{(n+1)^k |q|^{n+1}}{n^k |q|^n} = \left( \frac{n+1}{n} \right)^k |q| \to |q| < 1$$
it follows that it converges absolutely.

(ii) The series $\sum_{n=1}^{\infty} x^n / n!$ converges for every $x \in \mathbb{R}$. Indeed, from
$$\frac{|x_{n+1}|}{|x_n|} = \frac{|x|^{n+1}}{(n+1)!} \cdot \frac{n!}{|x|^n} = \frac{|x|}{n+1} \to 0 \qquad \forall x \in \mathbb{R}$$
it follows that it converges absolutely.

(iii) The series $\sum_{n=1}^{\infty} x^n / n$ converges for every $-1 < x < 1$. Indeed,
$$\frac{|x_{n+1}|}{|x_n|} = \frac{|x|^{n+1}}{n+1} \cdot \frac{n}{|x|^n} = |x| \, \frac{n}{n+1} \to |x|$$
which obviously is $< 1$ when $-1 < x < 1$. Thus, this series too converges absolutely. ▲

Example 359 Let us consider the series
$$\sum_{n=1}^{\infty} \frac{(-1)^n}{n^2}$$
We have $\left| (-1)^n / n^2 \right| = 1/n^2$. Therefore, the given series converges absolutely. ▲

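Example 360 We add two other examples.

(i) The series
$$x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!}$$
converges for every $x \in \mathbb{R}$. Indeed,
$$\frac{|x|^{2n+3}}{(2n+3)!} \cdot \frac{(2n+1)!}{|x|^{2n+1}} = \frac{x^2}{(2n+3)(2n+2)} \to 0 \qquad \forall x \in \mathbb{R}$$
and therefore the series converges absolutely.

(ii) The series
$$1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!}$$
converges for every $x \in \mathbb{R}$. Indeed,
$$\frac{|x|^{2n+2}}{(2n+2)!} \cdot \frac{(2n)!}{|x|^{2n}} = \frac{x^2}{(2n+2)(2n+1)} \to 0 \qquad \forall x \in \mathbb{R}$$
and therefore also this last series converges absolutely. ▲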

9.4.2 Alternating series


We close by considering series that have terms of alternating sign:
$$x_1 - x_2 + x_3 - x_4 + \cdots + (-1)^{n+1} x_n + \cdots = \sum_{n=1}^{\infty} (-1)^{n+1} x_n$$
with every $x_n \ge 0$. We can already say something interesting about them (we omit the proof).

Proposition 361 The series $\sum_{n=1}^{\infty} (-1)^{n+1} x_n$ with alternating sign is convergent if the sequence $\{x_n\}$ is decreasing and infinitesimal.

As we well know by now, the condition $x_n \to 0$ is necessary, but not sufficient, for the convergence of a generic series. For an alternating series though, it becomes also sufficient, provided $\{x_n\}$ is decreasing.$^{13}$

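Example 362 By using this result, we can conclude that the series
$$\sum_{n=1}^{\infty} (-1)^{n+1} \frac{1}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots$$
which is called the alternating harmonic series, and the series
$$\sum_{n=1}^{\infty} \frac{(-1)^n}{n} = -1 + \frac{1}{2} - \frac{1}{3} + \frac{1}{4} - \frac{1}{5} + \frac{1}{6} + \cdots$$
which we have seen above, are convergent.$^{14}$ ▲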
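As a numerical illustration of Example 362, the sketch below (the function name is an assumption of this example) computes partial sums of the alternating harmonic series and compares them with $\log 2$, the value of the sum recalled in the footnote. It also relies on a standard fact about alternating series with decreasing infinitesimal terms, not proved in the text: the error of the $n$-th partial sum is at most the first omitted term, here $1/(n+1)$.

```python
import math


def alternating_harmonic_partial_sum(n: int) -> float:
    """Partial sum 1 - 1/2 + 1/3 - ... + (-1)**(n+1) / n."""
    return math.fsum((-1) ** (k + 1) / k for k in range(1, n + 1))


for n in (10, 11, 1_000, 1_001):
    s_n = alternating_harmonic_partial_sum(n)
    print(n, s_n, abs(s_n - math.log(2)), 1 / (n + 1))
```

Partial sums with even index lie below $\log 2$ and those with odd index lie above it, so the sum is squeezed between consecutive partial sums.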

$^{13}$ Note that $x_n$ is the absolute value of the term $(-1)^{n+1} x_n$ of the sequence.
$^{14}$ The sum of the alternating harmonic series is $\log 2$.
Chapter 10

Discrete calculus

Discrete calculus deals with problems analogous to those of differential calculus, with the difference that sequences, that is, functions $f : \mathbb{N} \setminus \{0\} \to \mathbb{R}$ with discrete domain, are considered instead of functions on $\mathbb{R}$. As we will see, the results of discrete calculus are rougher and less neat than those obtained in differential calculus.$^{1}$ Nevertheless, discrete calculus can be very useful in some applications. More precisely, in this chapter we will show its use in the study of series and sequences, allowing for a deeper analysis of some issues which we have already discussed.

10.1 Preamble: limit points


Let $\{x_n\}$ be a bounded sequence of real numbers, that is, there exists $M > 0$ such that $-M \le x_n \le M$ for every $n$. Let us consider the ancillary sequences $\{y_n\}$ and $\{z_n\}$ defined as
$$y_n = \sup_{k \ge n} x_k \quad \text{and} \quad z_n = \inf_{k \ge n} x_k$$

Example 363 If we consider the sequence $\{(-1)^n\}$ we have that $y_n = 1$ and $z_n = -1$ for every $n$, whereas for the sequence $\{1/n\}$ we have that $y_n = 1/n$ and $z_n = 0$ for every $n$. ▲

It is immediate to verify that
$$-M \le z_n \le y_n \le M \qquad \forall n \qquad (10.1)$$
hence, also the ancillary sequences are bounded. Moreover,
$$n_1 < n_2 \implies \sup_{k \ge n_1} x_k \ge \sup_{k \ge n_2} x_k \quad \text{and} \quad \inf_{k \ge n_1} x_k \le \inf_{k \ge n_2} x_k$$
therefore $\{y_n\}$ is decreasing and $\{z_n\}$ is increasing. Since they are monotone, from Theorem 285 we have that $\{y_n\}$ and $\{z_n\}$ converge. If we denote their limits as $y$ and $z$, that is, $y_n \to y$ and $z_n \to z$, we can write
$$\lim_{n \to +\infty} \sup_{k \ge n} x_k = y \quad \text{and} \quad \lim_{n \to +\infty} \inf_{k \ge n} x_k = z$$

$^{1}$ Some parts of this chapter require a basic knowledge of differential calculus. This chapter can be read seamlessly after Chapter 18.
246 CHAPTER 10. DISCRETE CALCULUS

The limits $y$ and $z$ are respectively referred to as limit superior and limit inferior of $\{x_n\}$ and they are denoted as $\limsup x_n$ and $\liminf x_n$.

Example 364 For the oscillating sequence $\{(-1)^n\}$ we have that
$$\limsup x_n = 1 \quad \text{and} \quad \liminf x_n = -1$$
whereas for the converging sequence $\{1/n\}$ we have that
$$\limsup x_n = \liminf x_n = \lim x_n = 0$$

This example shows two fundamental properties of these limits: they always exist, even if the original sequence has no limit,$^{2}$ and their equality is a necessary and sufficient condition for the convergence of the sequence $\{x_n\}$: $\limsup x_n = \liminf x_n$ if and only if $\lim x_n$ exists. Formally:

Proposition 365 Let $\{x_n\}$ be a bounded sequence of real numbers. We have
$$-\infty < \liminf x_n \le \limsup x_n < +\infty \qquad (10.2)$$
In particular, $\{x_n\}$ converges to $L \in \mathbb{R}$ if and only if $\liminf x_n = \limsup x_n = L$.

Proof Thanks to (10.1), (10.2) follows from Proposition 282. The proof of the second part of the statement is left as an exercise for the reader.
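The ancillary sequences $y_n = \sup_{k \ge n} x_k$ and $z_n = \inf_{k \ge n} x_k$ can be approximated on a computer by truncating the tail at a finite horizon. The sketch below is only a heuristic: `tail_sup_inf` and the `horizon` parameter are assumptions of this example, and a finite window can miss the true supremum or infimum (for $\{1/n\}$ the true infimum 0 is never attained, so the window returns $1/\text{horizon}$ instead).

```python
def tail_sup_inf(x, n: int, horizon: int):
    """Approximate (sup_{k>=n} x_k, inf_{k>=n} x_k) using only n <= k <= horizon."""
    window = [x(k) for k in range(n, horizon + 1)]
    return max(window), min(window)


def alternating(k: int) -> int:
    """The oscillating sequence (-1)**k."""
    return (-1) ** k


def reciprocal(k: int) -> float:
    """The convergent sequence 1/k."""
    return 1 / k


for n in (1, 10, 100):
    print(n, tail_sup_inf(alternating, n, 10_000), tail_sup_inf(reciprocal, n, 10_000))
```

For the oscillating sequence the two approximations stay at 1 and -1 whatever $n$ is, while for $\{1/n\}$ they squeeze together as $n$ and the horizon grow, in agreement with Proposition 365.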

Other interesting properties are
$$\liminf x_n = -\limsup (-x_n) \quad \text{and} \quad \limsup x_n = -\liminf (-x_n) \qquad (10.3)$$
They are duality properties, as they relate the limit superior and limit inferior of the sequence $\{x_n\}$ with those of the opposite sequence $\{-x_n\}$. For instance, this simple duality allows us to easily translate some properties of the limit superior into properties of the limit inferior, and vice versa. This is exactly what happens in the next proof.
Another interesting consequence of the duality is the possibility to rewrite the inequality (10.2) as
$$\liminf x_n \le -\liminf (-x_n)$$

The next result lists some simple, yet useful, properties of the limit superior and the limit inferior. Thanks to the previous result, they imply the similar properties which we have listed for convergent sequences.

Lemma 366 Let $\{x_n\}$ and $\{y_n\}$ be two bounded sequences of real numbers. We have:

(i) $\liminf x_n + \liminf y_n \le \liminf (x_n + y_n)$;

(ii) $\limsup x_n + \limsup y_n \ge \limsup (x_n + y_n)$;

$^{2}$ Since it is bounded, $\{x_n\}$ converges or oscillates, but does not diverge.
10.1. PREAMBLE: LIMIT POINTS 247

(iii) $\liminf x_n \le \liminf y_n$ and $\limsup x_n \le \limsup y_n$ if eventually $x_n \le y_n$.

Proof (i) For every $n$ we have that $\inf_{k \ge n} (x_k + y_k) \ge \inf_{k \ge n} x_k + \inf_{k \ge n} y_k$. Hence, (i) follows from Proposition 282. (ii) follows from (i) and the duality result (10.3):
$$\limsup (x_n + y_n) = -\liminf ((-x_n) + (-y_n)) \le -\liminf (-x_n) - \liminf (-y_n) = \limsup x_n + \limsup y_n$$
The proof of (iii) is left to the reader.

It is possible to give a topological characterization of these limits; in order to do so, we introduce the notion of limit point.

Definition 367 $L \in \mathbb{R}$ is a limit point for a sequence $\{x_n\}$ if every neighborhood of $L$ contains an infinite number of elements of the sequence.

If the sequence converges, there exists a unique limit point: the limit of the sequence. If the sequence does not converge, the limit points are the scalars which are approached by infinitely many elements of the sequence, even if the sequence does not converge to said scalars. Indeed, it can be easily shown that $L$ is a limit point for $\{x_n\}$ if and only if there exists a subsequence $\{x_{n_k}\}$ that converges to $L$.

Example 368 (i) The interval $[-1, 1]$ is the set of limit points of the sequence $\{\sin n\}$, whereas $-1$ and $1$ are the limit points of the sequence $\{(-1)^n\}$. (ii) The scalar $0$ is the unique limit point of the convergent sequence $\{1/n\}$. ▲

The next result shows that the limit points belong to the interval determined by the limit superior and the limit inferior.

Proposition 369 Let $\{x_n\}$ be a bounded sequence of scalars. A value $x \in \mathbb{R}$ is a limit point for the sequence only if $x \in [\liminf x_n, \limsup x_n]$.

Intuitively, the larger the set of limit points, the more the sequence is divergent; in particular, this set reduces to a singleton when the sequence converges. In light of the last result, the difference between superior and inferior limits, that is, the length of $[\liminf x_n, \limsup x_n]$, is a (not particularly precise) indicator of the divergence of a sequence.

Thanks to the duality (10.3), the interval $[\liminf x_n, \limsup x_n]$ can be rewritten as $[\liminf x_n, -\liminf (-x_n)]$. For instance, if $x_n = \sin n$ or $x_n = \cos n$, we have that $[\liminf x_n, -\liminf (-x_n)] = [-1, 1]$.

N.B. Up to this point, we have considered only bounded sequences. Actually, if we allow the limit superior and limit inferior to assume infinity as a value, all these results can be easily extended to generic sequences. For instance, if we consider the sequence $\{n\}$, that diverges to $+\infty$, we have that $\liminf x_n = \limsup x_n = +\infty$; for the sequence $\{-e^n\}$, that diverges to $-\infty$, we have that $\limsup x_n = \liminf x_n = -\infty$, whereas for the sequence $\{(-1)^n n\}$ we have that $\liminf x_n = -\infty$ and $\limsup x_n = +\infty$, so that $[\liminf x_n, \limsup x_n] = \overline{\mathbb{R}}$. We leave the extension of the previous results to generic sequences to the reader. ▽

10.2 Discrete calculus


10.2.1 Finite differences
The (finite) differences
$$\Delta x_n = x_{n+1} - x_n$$
are the discrete counterpart of derivatives:$^{3}$ the smallest step increment starting from $n$ is equal to 1; therefore
$$\Delta x_n = \frac{x_{n+1} - x_n}{1} = \frac{x_{n+1} - x_n}{(n+1) - n} = \frac{\Delta x_n}{\Delta n}$$

Definition 370 The sequence $\{\Delta x_n\} = \{x_{n+1} - x_n\}$ is called sequence of (finite) differences of $\{x_n\}$.

The next result lists the algebraic properties of the differences, that is, their behavior with respect to the fundamental operations. It is the discrete counterpart of the results in Section 18.7.

Proposition 371 Let $\{x_n\}$ and $\{y_n\}$ be two sequences. We have that

(i) $\Delta (x_n + y_n) = \Delta x_n + \Delta y_n$;

(ii) $\Delta (x_n y_n) = x_{n+1} \Delta y_n + y_n \Delta x_n$;

(iii) $\Delta \left( \dfrac{x_n}{y_n} \right) = \dfrac{y_n \Delta x_n - x_n \Delta y_n}{y_n y_{n+1}}$.

On the one hand, (i) guarantees that the difference distributes over addition; on the other hand, (ii) and (iii) show that more complex rules hold for multiplication and division. Properties (ii) and (iii) are respectively called product rule and quotient rule.

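Proof (i) is obvious. (ii) follows from
$$\Delta (x_n y_n) = x_{n+1} y_{n+1} - x_n y_n = x_{n+1} y_{n+1} - x_{n+1} y_n + x_{n+1} y_n - x_n y_n$$
$$= x_{n+1} (y_{n+1} - y_n) + y_n (x_{n+1} - x_n) = x_{n+1} \Delta y_n + y_n \Delta x_n$$
(iii) follows from
$$\Delta \left( \frac{x_n}{y_n} \right) = \frac{x_{n+1}}{y_{n+1}} - \frac{x_n}{y_n} = \frac{x_{n+1} y_n - x_n y_{n+1}}{y_n y_{n+1}} = \frac{x_{n+1} y_n - x_n y_n + x_n y_n - x_n y_{n+1}}{y_n y_{n+1}}$$
$$= \frac{y_n (x_{n+1} - x_n) - x_n (y_{n+1} - y_n)}{y_n y_{n+1}} = \frac{y_n \Delta x_n - x_n \Delta y_n}{y_n y_{n+1}}$$

Monotonicity of sequences is characterized using differences in a simple, yet interesting way.

$^{3}$ See Section 18.13.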
10.2. DISCRETE CALCULUS 249

Lemma 372 A sequence is increasing (decreasing) if and only if $\Delta x_n \ge 0$ ($\le 0$) for every $n \ge 1$.

Therefore, the monotonicity of the original sequence is revealed by the sign of the differences.

Example 373 If $x_n = a^n$, with $a > 0$, we have that
$$\Delta x_n = a^{n+1} - a^n = (a - 1) a^n = (a - 1) x_n$$
Therefore, the sequence $\{a^n\}$ is increasing if and only if $a \ge 1$. ▲

If $a = 2$, for the increasing sequence $\{2^n\}$ we have that $\Delta x_n = x_n$. Conversely, if $\{x_n\}$ is such that
$$\Delta x_n = x_n \qquad (10.4)$$
that is, $x_{n+1} - x_n = x_n$, by a recurrence argument we obtain $x_n = 2^{n-1} x_1$. In particular, if we set $x_1 = 2$ we obtain $x_n = 2^n$. The invariance property (10.4) for finite differences characterizes the sequence $\{2^n\}$, which is therefore the discrete counterpart of the exponential function in differential calculus.
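The sequence of differences of $\{\Delta x_n\}$ is denoted as $\{\Delta^2 x_n\}$ and is called sequence of second differences; in particular:
$$\Delta^2 x_n = x_{n+2} - x_{n+1} - (x_{n+1} - x_n) = x_{n+2} - 2 x_{n+1} + x_n$$
Analogously, we can denote as $\Delta^k x_n$ the differences of $\Delta^{k-1} x_n$, that is,
$$\Delta^k x_n = \Delta \left( \Delta^{k-1} x_n \right) = \Delta^{k-1} x_{n+1} - \Delta^{k-1} x_n = \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} x_{n+i} \qquad (10.5)$$
we leave the proof of the binomial expansion to the reader.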

Example 374 If $x_n = n^2$, we have that
$$\Delta n^2 = (n+1)^2 - n^2 = 2n + 1$$
$$\Delta^2 n^2 = 2 (n+1) + 1 - (2n + 1) = 2$$
and $\Delta^k n^2 = 0$ for every $k > 2$. In general, let $x_n = n^k$ with $k \in \mathbb{N}$. By generalizing what we have just seen for $k = 2$, we can show that
$$\Delta^k n^k = k!$$
We will use an induction argument. When $k = 2$ we know that $\Delta^2 x_n = 2 = 2!$. Assume that for every $2 \le r \le k - 1$ we have $\Delta^r n^r = r!$. Using Newton's binomial, we have
$$\Delta n^k = (n+1)^k - n^k = n^k + \binom{k}{1} n^{k-1} + \binom{k}{2} n^{k-2} + \cdots + 1 - n^k = k n^{k-1} + \binom{k}{2} n^{k-2} + \cdots + 1$$
therefore, using the induction hypothesis $\Delta^r n^r = r!$ for $2 \le r \le k - 1$ and the algebraic properties of the differences, we have
$$\Delta^{k-1} \left( \Delta n^k \right) = \Delta^{k-1} \left( k n^{k-1} + \binom{k}{2} n^{k-2} + \cdots + 1 \right)$$
$$= k \Delta^{k-1} n^{k-1} + \binom{k}{2} \Delta^{k-1} n^{k-2} + \cdots + \Delta^{k-1} 1$$
$$= k (k-1)! + 0 + \cdots + 0 = k!$$
Notice that the zeroes in the last line follow from the induction hypothesis that guarantees that, for $2 \le r \le k$,
$$\Delta^{k-1} n^{k-r} = \Delta^{r-1} \left( \Delta^{k-r} n^{k-r} \right) = \Delta^{r-1} (k-r)! = 0$$
Summing up, $\Delta^k n^k = \Delta^{k-1} \left( \Delta n^k \right) = k!$, as desired. ▲
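The identity $\Delta^k n^k = k!$ and the binomial expansion (10.5) are easy to check on a computer for small $k$. The sketch below uses illustrative names; sequences are stored as Python lists, so the list index plays the role of $n$ up to a shift.

```python
from math import comb, factorial


def diff(seq):
    """Finite differences of a list: [x_2 - x_1, x_3 - x_2, ...]."""
    return [b - a for a, b in zip(seq, seq[1:])]


def iterated_diff(seq, k: int):
    """Apply the difference operator k times."""
    for _ in range(k):
        seq = diff(seq)
    return seq


def binomial_formula(seq, k: int, n: int):
    """Right-hand side of (10.5), with seq[n + i] in the role of x_{n+i}."""
    return sum((-1) ** (k - i) * comb(k, i) * seq[n + i] for i in range(k + 1))


k = 4
powers = [n**k for n in range(1, 12)]  # the sequence n^4 for n = 1, ..., 11
print(iterated_diff(powers, k))        # constant list, every entry equal to 4!
print(iterated_diff(powers, k + 1))    # list of zeros
print(binomial_formula(powers, k, 0), factorial(k))
```

Integer arithmetic keeps everything exact, so the comparison with `factorial(k)` involves no rounding.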

The example shows the analogy between $\Delta$ in discrete calculus and the derivative in "continuous" calculus. Indeed, in the continuous case, it is necessary to differentiate $x^k$ $k$ times in order to obtain a constant and $k + 1$ times to get the constant 0. In the discrete case, we must apply the operator $\Delta$ $k$ times to the sequence $n^k$ in order to obtain a constant and $k + 1$ times to get the constant 0.

Formula (10.5) permits the following beautiful generalization of the series expansion (9.12) of the exponential function.

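Proposition 375 Let $\{y_n\}$ be any sequence of scalars. Then, for each $n \ge 1$,
$$\sum_{k=0}^{\infty} \frac{x^k}{k!} \Delta^k y_n = e^{-x} \sum_{j=0}^{\infty} \frac{x^j}{j!} y_{n+j} \qquad \forall x \in \mathbb{R} \qquad (10.6)$$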

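Proof By (10.5), we have to show that, for each $n$,
$$\sum_{k=0}^{\infty} \frac{x^k}{k!} \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} y_{n+i} = e^{-x} \sum_{j=0}^{\infty} \frac{x^j}{j!} y_{n+j} \qquad \forall x \in \mathbb{R} \qquad (10.7)$$
Fix an integer $j \ge 0$. We show that the coefficients of $y_{n+j}$ on the two sides of (10.7) are equal. Clearly, on the right hand side this coefficient is $e^{-x} x^j / j!$. As to the left hand side, this coefficient is
$$\sum_{k=0}^{\infty} \frac{x^k}{k!} (-1)^{k-j} \binom{k}{j} = \sum_{k=j}^{\infty} \frac{x^k}{k!} (-1)^{k-j} \binom{k}{j}$$
where the equality holds because the binomial coefficients are zero if $k < j$. Therefore, it remains to prove that
$$\sum_{k=j}^{\infty} \frac{x^k}{k!} (-1)^{k-j} \binom{k}{j} = e^{-x} \frac{x^j}{j!} \qquad (10.8)$$
Set $i = k - j$. Then,
$$\sum_{k=j}^{\infty} \frac{x^k}{k!} (-1)^{k-j} \binom{k}{j} = \sum_{i=0}^{\infty} \frac{x^{i+j}}{(i+j)!} (-1)^i \binom{i+j}{j} = \sum_{i=0}^{\infty} \frac{x^{i+j}}{(i+j)!} (-1)^i \frac{(i+j)!}{i! \, j!}$$
$$= \frac{x^j}{j!} \sum_{i=0}^{\infty} x^i \frac{(-1)^i}{i!} = \frac{x^j}{j!} \sum_{i=0}^{\infty} \frac{(-x)^i}{i!} = \frac{x^j}{j!} e^{-x}$$
as desired.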

The series expansion (9.12) is a special case of (10.6). Indeed, let $n = 0$ so that (10.6) becomes
$$\sum_{k=0}^{\infty} \frac{x^k}{k!} \Delta^k y_0 = e^{-x} \sum_{j=0}^{\infty} \frac{x^j}{j!} y_j \qquad (10.9)$$
Assume that $y_j = 1$ for every $j$. Then, $\Delta^0 y_0 = y_0 = 1$ and $\Delta^k y_0 = 0$ if $k \ge 1$. Hence, (10.9) becomes
$$1 = e^{-x} \sum_{j=0}^{\infty} \frac{x^j}{j!}$$
which is the series expansion (9.12).
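Identity (10.6) can be spot-checked numerically by truncating both series. The sketch below makes several assumptions that are not in the text: the function names, the truncation at 40 terms, and the choice of a bounded test sequence $y_j = 1/(j+1)$, for which both truncated sums are numerically stable.

```python
import math


def forward_differences(y, n: int, k_max: int):
    """Return [Δ^0 y_n, Δ^1 y_n, ..., Δ^k_max y_n] by repeated differencing."""
    row = [y(n + j) for j in range(k_max + 1)]
    out = []
    for _ in range(k_max + 1):
        out.append(row[0])  # row[0] is the current-order difference at index n
        row = [b - a for a, b in zip(row, row[1:])]
    return out


def left_side(y, n: int, x: float, k_max: int = 40) -> float:
    """Truncated left-hand side of (10.6)."""
    deltas = forward_differences(y, n, k_max)
    return math.fsum(x**k / math.factorial(k) * d for k, d in enumerate(deltas))


def right_side(y, n: int, x: float, j_max: int = 40) -> float:
    """Truncated right-hand side of (10.6)."""
    tail = math.fsum(x**j / math.factorial(j) * y(n + j) for j in range(j_max + 1))
    return math.exp(-x) * tail


def reciprocal_shifted(j: int) -> float:
    """Bounded test sequence y_j = 1/(j + 1)."""
    return 1.0 / (j + 1)


print(left_side(reciprocal_shifted, 1, 1.5), right_side(reciprocal_shifted, 1, 1.5))
```

With the constant sequence $y_j = 1$ the left side collapses to its first term, which reproduces the special case (10.9).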

10.2.2 Asymptotic behavior


The limit of the ratio
$$\frac{x_n}{y_n}$$
is fundamental, as we have seen in the analysis of the order of convergence. Consider the following example.

Example 376 Take $x_n = n (-1)^n$ and $y_n = n^2$. We have
$$\frac{x_n}{y_n} = \frac{(-1)^n}{n} \to 0$$
If we consider their differences we get
$$\frac{\Delta x_n}{\Delta y_n} = \frac{x_{n+1} - x_n}{y_{n+1} - y_n} = \frac{(-1)^{n+1} (1 + 2n)}{1 + 2n} = (-1)^{n+1}$$
therefore, the ratio $\Delta x_n / \Delta y_n$ does not converge. ▲

Therefore, even if the ratio $x_n / y_n$ converges, the ratio $\Delta x_n / \Delta y_n$ of the differences may not. Conversely, the next result shows that the asymptotic behavior of the ratio $\Delta x_n / \Delta y_n$ determines that of $x_n / y_n$.

Theorem 377 (Cesàro) Let $\{y_n\}$ be a strictly increasing sequence that diverges to infinity, that is, $y_n \uparrow +\infty$, and let $\{x_n\}$ be a generic sequence. It follows that
$$\liminf \frac{\Delta x_n}{\Delta y_n} \le \liminf \frac{x_n}{y_n} \le \limsup \frac{x_n}{y_n} \le \limsup \frac{\Delta x_n}{\Delta y_n} \qquad (10.10)$$

In particular, this inequality implies that, if the (finite or infinite) limit of the ratio $\Delta x_n / \Delta y_n$ exists, we have that
$$\liminf \frac{\Delta x_n}{\Delta y_n} = \liminf \frac{x_n}{y_n} = \limsup \frac{x_n}{y_n} = \limsup \frac{\Delta x_n}{\Delta y_n} \qquad (10.11)$$
that is, $x_n / y_n$ converges to the same limit. Therefore, as stated above, the "regularity" of the asymptotic behavior of the ratio $\Delta x_n / \Delta y_n$ implies the "regularity" of the original ratio $x_n / y_n$. At the same time, if the ratio $x_n / y_n$ presents an "irregular" asymptotic behavior, so will the difference ratio.

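Proof We will only prove the special case (10.11) when $\Delta x_n / \Delta y_n$ admits a limit, finite or infinite. Therefore, let $\Delta x_n / \Delta y_n \to L \in \mathbb{R}$. It follows that, for $\varepsilon > 0$, there exists $n_\varepsilon$ such that
$$L - \varepsilon < \frac{\Delta x_n}{\Delta y_n} < L + \varepsilon$$
for every $n \ge n_\varepsilon$. Since, by hypothesis, $y_{n+1} - y_n > 0$, we have
$$(L - \varepsilon)(y_{n+1} - y_n) < x_{n+1} - x_n < (L + \varepsilon)(y_{n+1} - y_n) \qquad \forall n \ge n_\varepsilon$$
In particular, for every $n > n_\varepsilon$, we obtain
$$(L - \varepsilon)(y_{n_\varepsilon + 1} - y_{n_\varepsilon}) < x_{n_\varepsilon + 1} - x_{n_\varepsilon} < (L + \varepsilon)(y_{n_\varepsilon + 1} - y_{n_\varepsilon})$$
$$(L - \varepsilon)(y_{n_\varepsilon + 2} - y_{n_\varepsilon + 1}) < x_{n_\varepsilon + 2} - x_{n_\varepsilon + 1} < (L + \varepsilon)(y_{n_\varepsilon + 2} - y_{n_\varepsilon + 1})$$
$$\cdots$$
$$(L - \varepsilon)(y_n - y_{n-1}) < x_n - x_{n-1} < (L + \varepsilon)(y_n - y_{n-1})$$
Summing over the previous inequalities, we get
$$(L - \varepsilon)(y_n - y_{n_\varepsilon}) < x_n - x_{n_\varepsilon} < (L + \varepsilon)(y_n - y_{n_\varepsilon})$$
that is, dividing by $y_n$ (which is eventually positive),
$$L - \varepsilon + \frac{x_{n_\varepsilon} - (L - \varepsilon) y_{n_\varepsilon}}{y_n} < \frac{x_n}{y_n} < L + \varepsilon + \frac{x_{n_\varepsilon} - (L + \varepsilon) y_{n_\varepsilon}}{y_n}$$
Since $n_\varepsilon$ is a given integer and $y_n \uparrow +\infty$ for $n \to \infty$, it follows
$$\lim_n \frac{x_{n_\varepsilon} - (L - \varepsilon) y_{n_\varepsilon}}{y_n} = \lim_n \frac{x_{n_\varepsilon} - (L + \varepsilon) y_{n_\varepsilon}}{y_n} = 0$$
Therefore we have
$$L - \varepsilon \le \liminf_n \frac{x_n}{y_n} \le \limsup_n \frac{x_n}{y_n} \le L + \varepsilon$$
Since $\varepsilon > 0$ is arbitrary, it follows
$$\liminf_n \frac{x_n}{y_n} = \limsup_n \frac{x_n}{y_n} = L$$
as desired. If $\Delta x_n / \Delta y_n \to \pm\infty$ we can proceed in a similar way, as the reader can verify.

The previous result can be interpreted as the discrete version of de l'Hospital's Theorem. Just as de l'Hospital's Theorem is useful in finding the limit of functions, in particular if they present indeterminate forms, its discrete analogue by Cesàro proves most useful in finding the limit of sequences that present indeterminate forms.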
10.3. CONVERGENCE IN MEAN 253

Example 378 Consider the sequence
$$\frac{\log (1 + n)}{n} \qquad (10.12)$$
Its limit is the indeterminate form $\infty / \infty$. Consider the sequences defined as $x_n = \log (1 + n)$ and $y_n = n$; the sequence (10.12) can be obtained as $x_n / y_n$. We have
$$\frac{\Delta x_n}{\Delta y_n} = \frac{\log (1 + n + 1) - \log (1 + n)}{1} = \log \left( 1 + \frac{1}{1 + n} \right) \to 0$$
therefore
$$\lim \frac{\log (1 + n)}{n} = 0$$
by Cesàro's Theorem. ▲
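The relation between the ratio $x_n / y_n$ and the ratio of differences can be explored numerically. In the sketch below the function names are illustrative; it evaluates both ratios for the sequences of Example 378 and for the pair $x_n = n(-1)^n$, $y_n = n^2$ of Example 376, where the difference ratio keeps oscillating although $x_n / y_n \to 0$.

```python
import math


def ratio(x, y, n: int) -> float:
    """The ratio x_n / y_n."""
    return x(n) / y(n)


def difference_ratio(x, y, n: int) -> float:
    """The ratio of differences (x_{n+1} - x_n) / (y_{n+1} - y_n)."""
    return (x(n + 1) - x(n)) / (y(n + 1) - y(n))


def log_shifted(n: int) -> float:
    return math.log(1 + n)


def identity(n: int) -> float:
    return float(n)


def signed(n: int) -> int:
    return n * (-1) ** n


def square(n: int) -> int:
    return n * n


for n in (10, 1_000, 1_000_000):
    print(n, ratio(log_shifted, identity, n), difference_ratio(log_shifted, identity, n))

for n in (10, 11, 12, 13):
    print(n, ratio(signed, square, n), difference_ratio(signed, square, n))
```

The second loop shows why the theorem cannot be reversed: the original ratio settles down while the difference ratio alternates between $-1$ and $1$.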

In the next section, we will see how Cesàro’s Theorem allows for a better understanding
of convergence criteria for series (see Section 10.4). To this end, the following remarkable
consequence of Cesàro’s Theorem will be crucial.

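Corollary 379 Let $\{x_n\}$ be a sequence such that, eventually, $x_n > 0$. It follows
$$\liminf \frac{x_{n+1}}{x_n} \le \liminf \sqrt[n]{x_n} \le \limsup \sqrt[n]{x_n} \le \limsup \frac{x_{n+1}}{x_n}$$

Proof Let $\{x_n\}$ be a positive sequence. We have
$$\log \frac{x_{n+1}}{x_n} = \log x_{n+1} - \log x_n \quad \text{and} \quad \log \sqrt[n]{x_n} = \frac{1}{n} \log x_n$$
Considering the sequences $\{\log x_n\}$ and $y_n = n$, (10.10) takes the form
$$\liminf \frac{\Delta \log x_n}{\Delta y_n} \le \liminf \frac{\log x_n}{y_n} \le \limsup \frac{\log x_n}{y_n} \le \limsup \frac{\Delta \log x_n}{\Delta y_n}$$
that is
$$\liminf \log \frac{x_{n+1}}{x_n} \le \liminf \log \sqrt[n]{x_n} \le \limsup \log \sqrt[n]{x_n} \le \limsup \log \frac{x_{n+1}}{x_n}$$
from which (10.18) follows.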

10.3 Convergence in mean


10.3.1 In medio stat virtus
The next result, apart from being particularly elegant, is a deterministic version of the law
of large numbers, one of the main results in probability theory.

Theorem 380 Let $\{x_n\}$ be a sequence that converges to $L \in \mathbb{R}$. We have
$$\frac{x_1 + x_2 + \cdots + x_n}{n} \to L$$

Proof Consider the two sequences defined as $z_n = x_1 + x_2 + \cdots + x_n$ and $y_n = n$. We have
$$\frac{\Delta z_n}{\Delta y_n} = \frac{z_{n+1} - z_n}{y_{n+1} - y_n} = \frac{x_{n+1}}{1} = x_{n+1}$$
Therefore, from (10.10), it follows that
$$\liminf x_{n+1} \le \liminf \frac{z_n}{n} \le \limsup \frac{z_n}{n} \le \limsup x_{n+1}$$
and, since by hypothesis $\liminf x_{n+1} = \limsup x_{n+1} = \lim x_n = L$, it follows
$$\lim \frac{z_n}{n} = \lim \frac{x_1 + x_2 + \cdots + x_n}{n} = L$$
as desired.

When $\{x_n\}$ converges, the sequence
$$\frac{\sum_{i=1}^{n} x_i}{n}$$
of arithmetic means thus always converges to the same limit as $\{x_n\}$, whereas the converse does not hold: the sequence of means may converge while the original one does not.

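Example 381 The sequence $\{(-1)^n\}$ does not converge, whereas
$$\frac{\sum_{i=1}^{n} x_i}{n} \to 0$$
Indeed
$$\frac{x_1 + x_2 + \cdots + x_n}{n} = \begin{cases} 0 & \text{if } n \text{ is even} \\ -\dfrac{1}{n} & \text{if } n \text{ is odd} \end{cases}$$
▲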

Therefore, the sequence of means is more "stable" than the original one. This motivates the following, more general, definition of limit of a sequence, which is named after Cesàro (or "in mean") and is fundamental in probability theory (and in its applications).

Definition 382 We say that a sequence $\{x_n\}$ converges à la Cesàro (or in mean) to $L$, and we write $x_n \overset{C}{\to} L$, when
$$\frac{x_1 + x_2 + \cdots + x_n}{n} \to L$$

From the previous results, it follows that standard convergence to a limit implies convergence à la Cesàro to the same limit. The converse does not hold: we may have convergence à la Cesàro without standard convergence.

Example 383 The sequence $\{(-1)^n\}$ from the last example does not converge, whereas $(-1)^n \overset{C}{\to} 0$. ▲

It is useful to find conditions such that the converse holds, that is, such that the convergence of the sequence of means implies the convergence of the original sequence. These results are called Tauberian theorems. We provide one as an example:

Proposition 384 (Landau) Let $\{x_n\}$ be a sequence such that there exists $k < 0$ such that $n \Delta x_n > k$ for every $n \ge 1$. Then $x_n \overset{C}{\to} L \in \mathbb{R}$ if and only if $x_n \to L$.

In particular, the hypothesis is always satisfied when the sequence $\{x_n\}$ is increasing: an increasing sequence converges to $L$ if and only if it converges à la Cesàro to $L$.

Whenever a sequence does not converge in mean, we can consider the sequence of the means of the means, which, by the previous results, is more likely to converge than the sequence of means: this is called $(C, 2)$ convergence. This idea can be extended to the mean of the mean iterated $k$ times. We won't consider such cases. However, the fundamental principle is that means tend to smooth the behavior of a sequence. In various fashions, often stochastic (an example is the law of large numbers previously mentioned), this principle is of central importance in the applications. In medio stat virtus.

10.3.2 Creatio ex nihilo


The previous analysis has a particularly interesting application to the sequence of partial sums. Indeed, if we consider the limit a la Cesàro of the sequence of partial sums $\{s_n\}$, we can extend the concept of summation of a series: if $s_n \xrightarrow{C} S$ we will write $\sum_{n=1}^{\infty} x_n \overset{C}{=} S$.
Some series that do not converge in the ordinary sense become convergent according to this broader definition. Consider this famous example.

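Example 385 The series, named after Grandi,
$$1 - 1 + 1 - 1 + \cdots = \sum_{n=1}^{\infty} (-1)^{n+1}$$
does not converge. Its partial sums
$$s_1 = 1, \quad s_2 = 0, \quad s_3 = 1, \quad s_4 = 0, \quad s_5 = 1, \quad \ldots$$
lead to the following sequence of means (of partial sums)
$$y_1 = 1, \quad y_2 = \frac{1 + 0}{2} = \frac{1}{2}, \quad y_3 = \frac{2}{3}, \quad y_4 = \frac{2}{4} = \frac{1}{2}, \quad y_5 = \frac{3}{5}, \quad \ldots$$
It is easy to check that $y_n$ is equal to $1/2$ when $n$ is even and
$$y_n = \frac{1/2 + n/2}{n} = \frac{n+1}{2n} = \frac{1}{2} + \frac{1}{2n}$$
when $n$ is odd, so that the means $y_n$ converge to $1/2$. Hence,
$$\sum_{n=1}^{\infty} (-1)^{n+1} \overset{C}{=} \frac{1}{2}$$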

Grandi’s series converges a la Cesàro.
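The formulas of the example can be checked with exact rational arithmetic. This is a minimal Python sketch under our own assumptions (the number of terms `N` and the variable names are hypothetical): it builds the partial sums $s_n$ of Grandi's series, their means $y_n$, and compares each $y_n$ with the closed form stated above.

```python
from fractions import Fraction
from itertools import accumulate

N = 11  # number of terms considered (hypothetical choice)

terms = [(-1) ** (n + 1) for n in range(1, N + 1)]   # 1, -1, 1, -1, ...
partial_sums = list(accumulate(terms))               # s_n: 1, 0, 1, 0, ...

# y_n = (s_1 + ... + s_n) / n, kept as exact fractions
means = [Fraction(total, n)
         for n, total in enumerate(accumulate(partial_sums), start=1)]

for n, y in enumerate(means, start=1):
    # closed form from the example: 1/2 for even n, (n + 1) / (2n) for odd n
    expected = Fraction(1, 2) if n % 2 == 0 else Fraction(n + 1, 2 * n)
    assert y == expected

print([str(y) for y in means[:5]])
```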



Even if this is not his main scientific contribution, the name of Guido Grandi is remembered for his treatment of this series. It is curious to note that, until the mid-nineteenth century, even the greatest mathematicians believed that this series summed to $1/2$. Until then, mathematics had been developing untidily: on the one hand, highly complex theorems were known; on the other, the attention to well-posed definitions and to rigor to which we are now accustomed was lacking.
The monk Guido Grandi proposed the following explanation, which contains two mistakes. First of all, he identified $1 - 1 + 1 - 1 + 1 - 1 + \cdots$ as a geometric series with common ratio $q = -1$ (correct) and therefore having sum
$$\frac{1}{1 - q} = \frac{1}{1 - (-1)} = \frac{1}{2}$$
(wrong, since the geometric series converges only when $|q| < 1$); then, pairing the addenda (wrong, since the associative property does not generally hold for series), he derived the equality $(1 - 1) + (1 - 1) + \cdots = 0 + 0 + \cdots$ in order to conclude
$$0 + 0 + 0 + \cdots = \frac{1}{2}$$
that is, the sum of infinitely many zeros is equal to $1/2$. This led him not to deny the existence of God, but to deem his intervention in the creation irrelevant; even without divine intervention, something can come out of nothing (if you wait long enough): creatio ex nihilo.
Having said this, Grandi can be satisfied with his work: he made several mistakes, yet, well ahead of his time, he provided an answer to a much more general question.

10.4 Convergence criteria for series


The results of this chapter allow us to achieve a better understanding of the convergence criteria for series provided in Section 9.3.⁴ We begin with a useful lemma.

Lemma 386 Let $\{x_n\}$ be a sequence with, eventually, $x_n > 0$. There exists $q < 1$ such that, eventually, $x_{n+1}/x_n \leq q$ if and only if
$$\limsup \frac{x_{n+1}}{x_n} < 1 \tag{10.13}$$
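Proof “Only if”. Suppose that there exists $q < 1$ such that eventually (9.8) holds. There exists $\bar{n}$ such that $x_{n+1}/x_n \leq q$ for every $n \geq \bar{n}$. Therefore, for any such $n$ we have $\sup_{k \geq n} x_{k+1}/x_k \leq q$, which implies
$$\limsup \frac{x_{n+1}}{x_n} = \lim_{n \to \infty} \sup_{k \geq n} \frac{x_{k+1}}{x_k} \leq q < 1$$

“If”. Suppose that (10.13) holds. Since
$$\lim_{n \to \infty} \sup_{k \geq n} \frac{x_{k+1}}{x_k} = L < 1$$
for every $\varepsilon > 0$ there exists $\bar{n}$ such that
$$\left| \sup_{k \geq n} \frac{x_{k+1}}{x_k} - L \right| < \varepsilon \qquad \forall n \geq \bar{n}$$
that is
$$L - \varepsilon < \sup_{k \geq n} \frac{x_{k+1}}{x_k} < L + \varepsilon \qquad \forall n \geq \bar{n}$$
If we choose $\varepsilon$ sufficiently small so that $L + \varepsilon < 1$, by setting $q = L + \varepsilon$ we obtain the desired condition, since $x_{n+1}/x_n \leq \sup_{k \geq n} x_{k+1}/x_k < q$ for every $n \geq \bar{n}$.

⁴ For the sake of brevity, we shall only consider series. Nonetheless, similar considerations hold for sequences (Section 8.9). Example 378 is explanatory.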

Analogously, we can prove that eventually $x_{n+1}/x_n \geq 1$ if
$$\liminf \frac{x_{n+1}}{x_n} > 1 \tag{10.14}$$
and only if
$$\liminf \frac{x_{n+1}}{x_n} \geq 1 \tag{10.15}$$
Therefore, the condition “eventually $x_{n+1}/x_n \geq 1$” implies (10.15) and is implied by (10.14). However, we cannot prove anything more. The constant sequence defined as $x_n = 1$ shows that the aforementioned condition may hold even if (10.14) does not hold, whereas the sequence $\{1/n\}$ shows that (10.15) may hold even if the condition is violated.

The previous analysis leads to the following corollary, which is very useful for computations, where the ratio criterion is expressed in terms of limits.

Corollary 387 Let $\sum_{n=1}^{\infty} x_n$ be a series with, eventually, $x_n > 0$.

(i) If
$$\limsup \frac{x_{n+1}}{x_n} < 1$$
then the series converges.

(ii) If
$$\liminf \frac{x_{n+1}}{x_n} > 1$$
then the series diverges positively.

Notice that, thanks to Lemma 386, point (i) is equivalent to point (i) of Proposition 352. In contrast, point (ii) is weaker than point (ii) of Proposition 352 since condition (10.14) is only sufficient, but not necessary, to have that $x_{n+1}/x_n \geq 1$ eventually.

As shown by the following examples, this specification of the ratio criterion is particularly useful when the limit
$$\lim \frac{x_{n+1}}{x_n}$$
exists, that is, whenever
$$\lim \frac{x_{n+1}}{x_n} = \limsup \frac{x_{n+1}}{x_n} = \liminf \frac{x_{n+1}}{x_n}$$
In this particular case, the ratio criterion takes the useful tripartite form of Proposition 350:

(i) if
$$\lim \frac{x_{n+1}}{x_n} < 1$$
the series converges;

(ii) if
$$\lim \frac{x_{n+1}}{x_n} > 1$$
the series diverges positively, that is, its sum is $+\infty$;

(iii) if
$$\lim \frac{x_{n+1}}{x_n} = 1$$
the criterion fails and it does not determine the behavior of the series.

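As we have seen in Section 8.9, this form of the ratio criterion is the one which is usually used in applications. Examples 351 and 353 have shown cases (i) and (ii). The unfortunate case (iii) is well exemplified by $\sum_{n=1}^{\infty} 1/n$ and $\sum_{n=1}^{\infty} 1/n^2$: in both cases the limit of the ratios is 1, yet the first series diverges and the second converges.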

10.4.1 Root criterion for convergence


The next convergence criterion is, from a theoretical point of view, the most powerful (as the next section will show).

Proposition 388 (Root criterion) Let $\sum_{n=1}^{\infty} x_n$ be a series with positive terms.

(i) If there exists a number $q < 1$ such that, eventually,
$$\sqrt[n]{x_n} \leq q$$
then the series converges.

(ii) If instead $\sqrt[n]{x_n} \geq 1$ for infinitely many values of $n$, then the series diverges.

Proof From $\sqrt[n]{x_n} \leq q$ we immediately have that $0 \leq x_n \leq q^n$ and, by using the comparison criterion and the convergence of the geometric series, we have the thesis. If instead $\sqrt[n]{x_n} \geq 1$ for infinitely many values of $n$, for them $x_n \geq 1$ and so $x_n$ cannot tend to 0.

Let us see its limit form. Arguing as in Lemma 386, point (i) can be equivalently stated as
$$\limsup \sqrt[n]{x_n} < 1$$
As to point (ii), it requires that $\sqrt[n]{x_n} \geq 1$ for infinitely many values of $n$, that is, that there is a subsequence $\{n_k\}$ such that $\sqrt[n_k]{x_{n_k}} \geq 1$ for every $k$. Such a condition holds if
$$\limsup \sqrt[n]{x_n} > 1 \tag{10.16}$$
and only if
$$\limsup \sqrt[n]{x_n} \geq 1 \tag{10.17}$$
The constant sequence $x_n = 1$ shows that point (ii) of Proposition 388 can hold even if (10.16) does not. The sequence $\{(1 - 1/n)^n\}$, on the other hand, shows that point (ii) of Proposition 388 may fail even though (10.17) holds. It is therefore clear that (10.16) implies point (ii) of Proposition 388, which in turn implies (10.17), but that the opposite implications do not hold.

All this brings us to the following limit form, in which point (i) is equivalent to that of Proposition 388, while point (ii) is weaker than its counterpart since, as we have seen above, condition (10.16) is only a sufficient condition for $\sqrt[n]{x_n} \geq 1$ to hold for infinitely many values of $n$.
P
Corollary 389 (Root criterion in limit form) Let $\sum_{n=1}^{\infty} x_n$ be a positive term series.

(i) If $\limsup \sqrt[n]{x_n} < 1$, the series converges.

(ii) If $\limsup \sqrt[n]{x_n} > 1$, the series diverges positively.

Proof If $\limsup \sqrt[n]{x_n} < 1$, we have that $\sqrt[n]{x_n} \leq q$ for some $q < 1$, eventually. The desired conclusion follows from Proposition 388. If $\limsup \sqrt[n]{x_n} > 1$, then $\sqrt[n]{x_n} \geq 1$ for infinitely many values of $n$, and the thesis follows from Proposition 388.

As for the limit form of the ratio criterion, that of the root criterion is also particularly useful when $\lim \sqrt[n]{x_n}$ exists. Under such circumstances the criterion takes the following tripartite form:

(i) if
$$\lim \sqrt[n]{x_n} < 1$$
the series converges;

(ii) if
$$\lim \sqrt[n]{x_n} > 1$$
the series diverges positively;

(iii) if
$$\lim \sqrt[n]{x_n} = 1$$
the criterion fails and it does not determine the behavior of the series.

As for the tripartite form of the ratio criterion, that of the root criterion is its most useful form at a computational level. Nonetheless, we hope the reader will always keep in mind the theoretical background of the criterion, as “ye were not made to live like unto brutes, but for pursuit of virtue and of knowledge”.
Example 390 (i) Let $q > 0$. The series
$$\sum_{n=1}^{\infty} \frac{q^n}{n^n}$$
converges as
$$\sqrt[n]{\frac{q^n}{n^n}} = \frac{q}{n} \to 0$$

(ii) Let $0 \leq q < 1$. The series $\sum_{n=1}^{\infty} n^k q^n$ converges for every $k$: indeed
$$\sqrt[n]{n^k q^n} = q\, n^{k/n} \to q$$
since $n^{k/n} \to 1$ (as $\log n^{k/n} = (k/n) \log n \to 0$). ▲
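The limit in point (ii) can be observed numerically. The following Python sketch is only an illustration under our own assumptions: the function name `nth_root_of_term` and the sample values $q = 0.5$, $k = 3$ are hypothetical. The root is computed through logarithms, mirroring the argument $\log n^{k/n} = (k/n) \log n$ used in the example.

```python
import math

def nth_root_of_term(n, k, q):
    """Return the n-th root of n^k * q^n, computed through logarithms.

    Working with logarithms avoids overflow and underflow for large n.
    Requires n >= 1 and q > 0.
    """
    return math.exp((k * math.log(n) + n * math.log(q)) / n)

q, k = 0.5, 3  # hypothetical sample values with 0 < q < 1
for n in (10, 100, 10_000, 1_000_000):
    # the printed roots decrease toward q as n grows
    print(n, nth_root_of_term(n, k, q))
```

Since the limit $q$ is smaller than 1, the tripartite form of the root criterion gives convergence.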

10.4.2 The power of the root criterion


The ratio and root criteria are based on the behavior of the sequences $\{x_{n+1}/x_n\}$ and $\{\sqrt[n]{x_n}\}$. In this regard, the next result, proven in Section 10.2.2, is quite enlightening.

Proposition 391 For every sequence $\{x_n\}$ with positive terms, we have that
$$\liminf \frac{x_{n+1}}{x_n} \leq \liminf \sqrt[n]{x_n} \leq \limsup \sqrt[n]{x_n} \leq \limsup \frac{x_{n+1}}{x_n} \tag{10.18}$$
If $\lim x_{n+1}/x_n$ exists, we have that
$$\lim \frac{x_{n+1}}{x_n} = \lim \sqrt[n]{x_n} \tag{10.19}$$
and so the two criteria are equivalent in their limit form. However, if $\lim x_{n+1}/x_n$ does not exist, we still obtain from (10.18) that
$$\limsup \frac{x_{n+1}}{x_n} < 1 \implies \limsup \sqrt[n]{x_n} < 1$$
and
$$\liminf \frac{x_{n+1}}{x_n} > 1 \implies \limsup \sqrt[n]{x_n} > 1$$
which shows that the root criterion is more powerful than the ratio criterion in determining convergence: whenever the ratio criterion rules in favor of convergence or of divergence, we would have reached the same conclusion by using the root criterion. The opposite does not hold, as the next example shows: the ratio criterion fails while the root criterion determines that the series in question converges.

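Example 392 Let us consider the sequence⁵
$$x_n = \begin{cases} \dfrac{1}{2^n} & \text{if } n \text{ odd} \\[2mm] \dfrac{1}{2^{n-2}} & \text{if } n \text{ even} \end{cases}$$
and the corresponding series, that is:
$$\frac{1}{2} + 1 + \frac{1}{8} + \frac{1}{4} + \frac{1}{32} + \frac{1}{16} + \frac{1}{128} + \frac{1}{64} + \cdots$$
We have that
$$\frac{x_{n+1}}{x_n} = \begin{cases} \dfrac{1/2^{(n+1)-2}}{1/2^n} = 2 & \text{if } n \text{ odd} \\[3mm] \dfrac{1/2^{n+1}}{1/2^{n-2}} = \dfrac{1}{8} & \text{if } n \text{ even} \end{cases}$$
and
$$\sqrt[n]{x_n} = \begin{cases} \dfrac{1}{2} & \text{if } n \text{ odd} \\[2mm] \dfrac{\sqrt[n]{4}}{2} & \text{if } n \text{ even} \end{cases}$$
so that
$$\limsup \frac{x_{n+1}}{x_n} = 2, \qquad \liminf \frac{x_{n+1}}{x_n} = \frac{1}{8}$$
and
$$\limsup \sqrt[n]{x_n} = \frac{1}{2}$$
The ratio criterion thus fails, while the root criterion tells us that the series converges. ▲

⁵ See Rudin (1976) p. 67.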
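The behavior of ratios and roots in this example can be tabulated directly. This is a minimal Python sketch under our own assumptions (the function name `x` and the ranges of indices are hypothetical choices): it computes $x_{n+1}/x_n$ and $\sqrt[n]{x_n}$ for the first terms, together with a partial sum.

```python
def x(n):
    """Term of the series in Example 392: 1/2^n for odd n, 1/2^(n-2) for even n."""
    return 2.0 ** (-n) if n % 2 == 1 else 2.0 ** (-(n - 2))

ratios = [x(n + 1) / x(n) for n in range(1, 41)]   # alternate between 2 and 1/8
roots = [x(n) ** (1.0 / n) for n in range(1, 41)]  # approach 1/2

partial_sum = sum(x(n) for n in range(1, 61))      # partial sum of the series

print(sorted(set(ratios)))   # the ratios take only two values
print(roots[-2], roots[-1])  # roots for n = 39 (odd) and n = 40 (even)
print(partial_sum)
```

The ratios keep oscillating between a value above 1 and a value below 1, so the ratio criterion is silent, while the roots settle near $1/2$.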

Even though the root criterion is more powerful, the ratio criterion can still be useful as it is generally easier to compute the limit of ratios than that of roots. The root criterion may be more powerful from a theoretical standpoint, yet it is harder to use from a computational perspective.

In light of this, when using the criteria for solving problems, one should first check whether $\lim x_{n+1}/x_n$ exists and, if it does, compute it. In such a case, thanks to (10.19) we also know the value of $\lim \sqrt[n]{x_n}$ and thus we can use the more powerful root criterion. In the unfortunate case in which $\lim x_{n+1}/x_n$ does not exist, so that (10.19) is no longer available, we can either determine $\limsup x_{n+1}/x_n$ and $\liminf x_{n+1}/x_n$ and use the less powerful ratio criterion (which may fail, as we have seen in the previous example), or try to compute $\lim \sqrt[n]{x_n}$ directly, hoping it exists (as in the previous example), so that the root criterion can be used in its handier limit form.

Finally, note that, however powerful it may be, the root criterion (and, a fortiori, the weaker ratio criterion) only gives a sufficient condition for convergence, as the following example shows.

Example 393 The series
$$\sum_{n=1}^{\infty} \frac{1}{n^2}$$
converges. However, by recalling Example 309, we have that
$$\lim \sqrt[n]{\frac{1}{n^2}} = \lim \sqrt[n]{\frac{1}{n}} \cdot \lim \sqrt[n]{\frac{1}{n}} = 1$$
▲

The root criterion is of no help in determining whether the simple series $\sum_{n=1}^{\infty} n^{-2}$ converges. The reason behind such a “shortcoming” is evident in the following simple result, which shows that the condition of the criterion forces the terms of the series to converge to zero at least as fast as a geometric sequence.
Proposition 394 Let $\sum_{n=1}^{\infty} x_n$ be a series with positive terms, with $\limsup \sqrt[n]{x_n} < 1$. For every $q > 0$ such that
$$\limsup \sqrt[n]{x_n} < q < 1$$
we have that, eventually,
$$x_n \leq q^n \tag{10.20}$$

Proof Take $q > 0$ such that $\limsup \sqrt[n]{x_n} < q < 1$. There is an $n_q \geq 1$ such that
$$\sqrt[n]{x_n} \leq q$$
for every $n \geq n_q$. For every such $n$ we have that
$$\sqrt[n]{x_n} \leq q \iff x_n \leq q^n$$
and so (10.20) holds.

Thanks to (10.20), we can say that those convergent series whose terms converge to zero less quickly than every geometric sequence, that is, such that $q^n = o(x_n)$ for every $q \in (0, 1)$, are out of the root criterion's reach. For example, for every natural number $k \geq 2$ we have that
$$\frac{q^n}{1/n^k} \to 0$$
and so $q^n = o(n^{-k})$. In order to determine whether the series $\sum_{n=1}^{\infty} n^{-k}$ converges, the root criterion is thus useless. This is confirmed by the fact that
$$\lim \sqrt[n]{\frac{1}{n^k}} = 1$$
and we are able to understand why the root criterion fails in this instance thanks to Proposition 394.

10.5 Infinite patience

In Section 9.1.2 we introduced, for every $\beta \in (0, 1)$, the intertemporal utility function $U : \mathbb{R}^\infty \to \mathbb{R}$ given by
$$U(x) = \sum_{t=1}^{\infty} \beta^{t-1} u_t(x_t) \tag{10.21}$$
Such a function orders all possible intertemporal consumption profiles $x = (x_1, \ldots, x_t, \ldots) \in \mathbb{R}^\infty$. In particular, the higher the subjective discount factor $\beta$, the more the decision maker cares about future periods, that is, the more patient he is.
One may ask what happens in the limit case $\beta \uparrow 1$,⁶ that is, when the subjective discount factor tends to 1. Intuitively, we are in an “infinite patience” setting, where all periods, present and future, count the same for the decision maker. When the horizon $T$ is finite, the answer is simple:
$$\lim_{\beta \uparrow 1} \sum_{t=1}^{T} \beta^{t-1} u_t(x_t) = \sum_{t=1}^{T} u_t(x_t) \tag{10.22}$$
so that the limit case corresponds to the sum of the utilities of all periods, all with equal unitary weight. When the horizon is infinite the problem becomes far more complex as, by Theorem 339, for the series $\sum_{t=1}^{\infty} u_t(x_t)$ to converge, it must be that $\lim_{t \to \infty} u_t(x_t) = 0$, which is hardly justifiable from an economic perspective.

⁶ For the meaning of $\beta \uparrow 1$ we refer the reader to Section 8.6.2.
Let us consider instead the limit
$$\lim_{\beta \uparrow 1} (1 - \beta) \sum_{t=1}^{\infty} \beta^{t-1} u_t(x_t)$$
where $1 - \beta$ is a normalization factor as
$$(1 - \beta) \sum_{t=1}^{\infty} \beta^{t-1} = 1 \tag{10.23}$$
Such a limit may not exist:

Example 395 Let us consider the sequence $\{x_t\}$ given by
$$\underbrace{0, 0}_{2 \text{ elements}},\ \underbrace{1, 1}_{2 \text{ elements}},\ \underbrace{0, 0, 0, 0}_{4 \text{ elements}},\ \underbrace{1, 1, 1, 1, 1, 1, 1, 1}_{8 \text{ elements}},\ \ldots$$
where every block of 0s and 1s, after the first, has length equal to the sum of the lengths of the previous blocks. One can show that $\lim_{\beta \uparrow 1} (1 - \beta) \sum_{t=1}^{\infty} \beta^{t-1} x_t$ does not exist. ▲
t=1

The next remarkable result, whose nontrivial proof we omit, shows that the existence of the limit is equivalent to convergence in mean.

Theorem 396 (Frobenius-Littlewood) Let $x = \{x_t\} \in \mathbb{R}^\infty$ be a bounded sequence. The limit $\lim_{\beta \uparrow 1} (1 - \beta) \sum_{t=1}^{\infty} \beta^{t-1} x_t$ exists if and only if $\{x_t\}$ converges a la Cesàro. If that is the case,
$$\lim_{\beta \uparrow 1} (1 - \beta) \sum_{t=1}^{\infty} \beta^{t-1} x_t = \lim_{T \to +\infty} \frac{1}{T} \sum_{t=1}^{T} x_t$$
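The equality of the two limits can be illustrated on a simple bounded sequence. The following Python sketch is only an illustration under our own assumptions: the names `abel_mean`, `cesaro_mean` and `x`, the symbol `beta` for the discount factor, the test sequence ($x_t = 1$ for even $t$ and $x_t = 0$ for odd $t$) and the truncation horizon are all hypothetical choices, not taken from the text. For this sequence the discounted mean has the closed form $\beta / (1 + \beta)$, which tends to $1/2$ as $\beta \uparrow 1$, the same value as the limit of the arithmetic means.

```python
def abel_mean(x, beta, horizon):
    """Truncated value of (1 - beta) * sum_{t=1}^{horizon} beta^(t-1) * x(t)."""
    total, weight = 0.0, 1.0          # weight holds beta^(t-1)
    for t in range(1, horizon + 1):
        total += weight * x(t)
        weight *= beta
    return (1.0 - beta) * total

def cesaro_mean(x, horizon):
    """Arithmetic mean (x(1) + ... + x(horizon)) / horizon."""
    return sum(x(t) for t in range(1, horizon + 1)) / horizon

def x(t):
    """Bounded test sequence: 0, 1, 0, 1, ..."""
    return 1.0 if t % 2 == 0 else 0.0

for beta in (0.9, 0.99, 0.999):
    # the discounted means approach 1/2 as beta increases toward 1
    print(beta, abel_mean(x, beta, 50_000))

print(cesaro_mean(x, 50_000))   # 0.5: exactly half of the terms equal 1
```

The truncation at a finite horizon is harmless here because the discarded weights $\beta^{t-1}$ are negligible for the values of `beta` used.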

The theorem suggests defining the function $V : \mathbb{R}^\infty \to \mathbb{R}$ as
$$V(x) = (1 - \beta)\, U(x) \qquad \forall x \in \mathbb{R}^\infty$$
For every $\beta \in (0, 1)$ the function $V$ is equivalent to $U$:
$$V(x) \geq V(y) \iff U(x) \geq U(y) \qquad \forall x, y \in \mathbb{R}^\infty$$
In light of (10.23), $V$ is a normalization of $U$ which assigns value 1 to the constant sequence $x_t = 1$ for every $t$.
Thanks to the Frobenius-Littlewood Theorem, we have that
$$\lim_{\beta \uparrow 1} V(x) = \lim_{T \to +\infty} \frac{1}{T} \sum_{t=1}^{T} u_t(x_t)$$
as long as the limits exist. The infinite patience case is thus captured by the limit of the average utilities
$$\lim_{T \to +\infty} \frac{1}{T} \sum_{t=1}^{T} u_t(x_t) \tag{10.24}$$
that is, by the limit a la Cesàro of the sequence $\{u_t(x_t)\}$. Such a criterion can thus be seen as the limit case for $\beta \uparrow 1$ of the intertemporal utility function $V$.
The role that the sum $\sum_{t=1}^{T} u_t(x_t)$ plays in the finite horizon case (10.22) is thus played in the infinite horizon case by the limit of the average utilities (10.24). This relevant economic application of the Frobenius-Littlewood Theorem allows us to elegantly conclude this chapter.
Part III

Continuity

Chapter 11

Limits of functions

11.1 Introductory examples


The concept of limit arises in the attempt to formalize the idea of “how a function behaves when the independent variable approaches (tends to) a point $x_0$”. To fix ideas, we start with some introductory examples in which we consider scalar functions, and then arrive at a rigorous formalization both in the scalar case and in the more general case of functions of vector variables.

Let us consider the function $f : \mathbb{R} - \{0\} \to \mathbb{R}$ given by
$$f(x) = \frac{\sin x}{x}$$
and analyze its behavior for points closer and closer to $x_0 = 0$, i.e., to the origin. In the next table we find the values that the function assumes at several such points:

x            -0.1     -0.01     -0.001      0.001       0.01      0.1
sin x / x    0.998    0.99998   0.9999998   0.9999998   0.99998   0.998

Inserting other values closer and closer to the origin, we can verify that the corresponding values of $\sin x / x$ get closer and closer to $L = 1$. In this case we say that “the limit of $\sin x / x$ as $x$ tends to $x_0 = 0$ is $L = 1$”; in symbols,
$$\lim_{x \to 0} \frac{\sin x}{x} = 1$$
Observe that in this example the point $x_0 = 0$ where we take the limit does not belong to the domain of the function $f$.
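The table can be reproduced numerically. The following is a minimal Python sketch; the helper name `f` mirrors the text, while the choice of sample points is our own assumption.

```python
import math

def f(x):
    """The function sin(x) / x, defined for x != 0."""
    return math.sin(x) / x

# Points approaching the origin from the left and from the right
for x in (-0.1, -0.01, -0.001, 0.001, 0.01, 0.1):
    print(f"{x:>7}  {f(x):.10f}")
```

Such a table only suggests the value of the limit; the rigorous notion is introduced in the next section.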

Let $f : \mathbb{R} \to \mathbb{R}$ be the function defined by
$$f(x) = \begin{cases} x & \text{for } x \leq 1 \\ 1 & \text{for } x > 1 \end{cases}$$

[Figure: graph of $f$]

How does $f$ behave when one approaches the point $x_0 = 1$? Taking points closer and closer to $x_0 = 1$ we have:

x       0.98   0.99   0.999   0.9999   1.0001   1.001   1.01   1.02
f(x)    0.98   0.99   0.999   0.9999   1        1       1      1

Adding other values, closer and closer to $x_0 = 1$, we can verify that as $x$ gets closer and closer to $x_0 = 1$, $f(x)$ gets closer and closer to $L = 1$. In this case we say that “the limit of $f(x)$ as $x$ tends to $x_0 = 1$ is $L = 1$”, and we write
$$\lim_{x \to 1} f(x) = 1$$
Observe that the value that the function assumes at the point $x_0 = 1$ is $f(1) = 1$, and therefore the limit $L = 1$ is equal to the value $f(1)$ of the function at $x_0 = 1$.

Let $f : \mathbb{R} \to \mathbb{R}$ be the function defined by
$$f(x) = \begin{cases} x & \text{if } x < 1 \\ 2 & \text{if } x = 1 \\ 1 & \text{if } x > 1 \end{cases}$$
Compared to the previous example we have introduced a “jump” at the point $x = 1$, so that the function jumps to the value 2 (we have indeed $f(1) = 2$).

If we study the behavior of $f$ for values of $x$ closer and closer to $x_0 = 1$, we build the same table as before (because the function, except at the point 1, is identical to the one in the previous example), and therefore also in this case we have
$$\lim_{x \to 1} f(x) = 1$$
This time the value that the function assumes at the point 1 is $f(1) = 2$, different from the value $L = 1$ of the limit.

Until now we have approached the point $x_0$ both from the right and from the left, that is, bilaterally (in a two-sided manner). Sometimes this is not possible; rather, one can approach from either the right or the left, that is, unilaterally (in a one-sided manner). Let us consider, for example, the function $f : \mathbb{R} - \{2\} \to \mathbb{R}$ given by $f(x) = 1/(x - 2)$ and let $x_0 = 2$.

“To approach the point $x_0 = 2$ from the right” means to approach it by considering only values $x > 2$:

x       2.0001   2.001   2.01   2.05   2.1   2.2   2.5
f(x)    10,000   1,000   100    20     10    5     2

For values closer and closer to 2 from the right the function assumes values that are larger and larger and not bounded from above. In this case we say that “the function $f$ tends to $+\infty$ as $x$ tends to 2 from the right” and we write
$$\lim_{x \to 2^+} f(x) = +\infty$$
Let us see now what happens by approaching $x_0 = 2$ from the left, that is, by considering values $x < 2$:

x       1.5   1.8   1.9   1.95   1.99   1.999    1.9999
f(x)    -2    -5    -10   -20    -100   -1,000   -10,000

For values closer and closer to 2 from the left the function assumes larger and larger (in absolute value) negative values. In this case we say that “the function $f$ tends to $-\infty$ as $x$ tends to 2 from the left” and we write
$$\lim_{x \to 2^-} f(x) = -\infty$$

Observe that
$$\lim_{x \to 2^+} f(x) \neq \lim_{x \to 2^-} f(x)$$
that is, the “right-hand” and the “left-hand” limits both exist, but they are different. As we will see in Proposition 413, the fact that the one-sided limits are distinct reflects the fact that the two-sided limit of $f(x)$ as $x$ tends to 2 does not exist. Indeed, the equality of the one-sided limits is equivalent to the existence of the two-sided limit. For example, if we modify the function by considering $f(x) = 1/|x - 2|$, we have
$$\lim_{x \to 2} f(x) = \lim_{x \to 2^+} f(x) = \lim_{x \to 2^-} f(x) = +\infty \tag{11.1}$$
Now the two one-sided limits are equal, and they coincide with the two-sided one, which in this case exists (even if infinite).

Considering again the function $f(x) = 1/(x - 2)$, what happens if as $x_0$ we take $+\infty$? In other terms, what happens if we consider increasingly larger values of $x$? Look at the following table:

x       100      1,000      10,000   100,000   1,000,000
f(x)    0.0102   0.001002   0.0001   0.00001   0.000001

For increasingly larger values of $x$ the function assumes values closer and closer to 0. In this case we say that “the function tends to 0 as $x$ tends to $+\infty$” and we write
$$\lim_{x \to +\infty} f(x) = 0$$
Observe how the function assumes values close to 0, but always strictly positive: $f$ approaches 0 “from above”. If we want to emphasize this aspect we write
$$\lim_{x \to +\infty} f(x) = 0^+$$
where $0^+$ suggests that while converging to 0 the values of $f(x)$ remain strictly positive.
What happens if, instead, as $x_0$ we take $-\infty$? We have the following table of values:

x       -100      -1,000      -10,000   -100,000   -1,000,000
f(x)    -0.0098   -0.000998   -0.0001   -0.00001   -0.000001

For negative and increasingly larger (in absolute value) values of $x$, the function assumes values closer and closer to 0. We say that “the function tends to 0 as $x$ tends to $-\infty$” and we write
$$\lim_{x \to -\infty} f(x) = 0$$
If we want to emphasize that the function, in tending to 0, remains negative, we write
$$\lim_{x \to -\infty} f(x) = 0^-$$
Finally, after having seen various types of limits, let us consider a function that has no limit, i.e., that does not exhibit any “definite trend”. Let $f : \mathbb{R} - \{0\} \to \mathbb{R}$ be given by
$$f(x) = \sin \frac{1}{x}$$
At the point $x_0 = 0$ the function does not have a limit: for $x$ closer and closer to $x_0 = 0$, the function continues to oscillate with a tighter and tighter sinusoidal trend:

[Figure: graph of $f(x) = \sin(1/x)$ near the origin]

The point $x_0 = 0$ is, however, the unique point where the function does not have a limit: at all the points of the domain the limit exists. A much more dramatic behavior is displayed by the function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} 1 & \text{for } x \in \mathbb{Q} \\ 0 & \text{for } x \notin \mathbb{Q} \end{cases} \tag{11.2}$$
This remarkable function, called the Dirichlet function, oscillates “obsessively” between the values 0 and 1 because, by the density of the rational numbers in the real numbers, for any pair $x < y$ of real numbers there exists a rational number $q$ such that $x < q < y$. As we will see, the Dirichlet function does not have a limit at any point $x_0 \in \mathbb{R}$.

11.2 Functions of a single variable


11.2.1 Two-sided limits
From the examination of the introductory examples there emerge four possible cases when the limit exists, depending on the finiteness or not of the point $x_0$ and of the value of the limit:
(i) $\lim_{x \to x_0} f(x) = L \in \mathbb{R}$, i.e., both the point $x_0$ and the limit $L$ are real (finite);
(ii) $\lim_{x \to x_0} f(x) = \pm\infty$, i.e., the point $x_0$ is real, but the limit is infinite;
(iii) $\lim_{x \to +\infty} f(x) = L \in \mathbb{R}$ or $\lim_{x \to -\infty} f(x) = L \in \mathbb{R}$, i.e., the point $x_0$ is infinite, but the limit is finite;
(iv) $\lim_{x \to +\infty} f(x) = \pm\infty$ or $\lim_{x \to -\infty} f(x) = \pm\infty$, i.e., both the point $x_0$ and the limit are infinite.
We formalize the notion of limit in these cases. We begin with case (i). First of all let us observe that we can attempt to calculate the limit of a function with domain $A \subseteq \mathbb{R}$ for $x_0 \in \mathbb{R}$ only when $x_0$ is a limit point of $A$. This allows us to give a meaning to the sentence “as $x \in A$ tends to $x_0$”.
Definition 397 Given a function $f : A \subseteq \mathbb{R} \to \mathbb{R}$ and a limit point $x_0 \in \mathbb{R}$ of $A$, we write
$$\lim_{x \to x_0} f(x) = L \in \mathbb{R}$$
if for every $\varepsilon > 0$ there exists $\delta_\varepsilon > 0$ such that, for every $x \in A$,
$$0 < |x - x_0| < \delta_\varepsilon \implies |f(x) - L| < \varepsilon \tag{11.3}$$
The value $L$ is called the limit of the function at $x_0$.
Note that (11.3) can be written as
$$0 < d(x, x_0) < \delta_\varepsilon \implies d(f(x), L) < \varepsilon \tag{11.4}$$
The definition requires that, for any fixed quantity $\varepsilon > 0$, arbitrarily small, there exists a value $\delta_\varepsilon$ such that all the points $x \in A$, other than $x_0$, lying at a distance smaller than $\delta_\varepsilon$ from $x_0$ have images $f(x)$ lying at a distance smaller than $\varepsilon$ from the value $L$ of the limit. Note that the condition $d(x, x_0) > 0$ is equivalent to requiring $x \neq x_0$.

Example 398 Let us show that $\lim_{x \to 2} (3x - 5) = 1$. We have to verify that, for every $\varepsilon > 0$, there exists $\delta_\varepsilon > 0$ such that
$$|x - 2| < \delta_\varepsilon \implies |(3x - 5) - 1| < \varepsilon \tag{11.5}$$
We have $|(3x - 5) - 1| = 3|x - 2| < \varepsilon$ if and only if $|x - 2| < \varepsilon/3$ and therefore, setting $\delta_\varepsilon = \varepsilon/3$ yields (11.5). ▲

Note how the quantity $\delta_\varepsilon$ depends on the value chosen for $\varepsilon$: the smaller the value of $\varepsilon$, the smaller $\delta_\varepsilon$. Naturally, the choice of $\delta_\varepsilon$ is not unique: once we find a value of $\delta_\varepsilon$, all the smaller positive values also work. In Example 398, we could actually choose as $\delta_\varepsilon$ any (positive) value lower than $\varepsilon/3$.

N.B. The value of $\delta_\varepsilon$, besides depending on $\varepsilon$, clearly depends also on $x_0$. ▽
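The choice $\delta_\varepsilon = \varepsilon/3$ of Example 398 can be tested on sample points. This is a minimal Python sketch under our own assumptions (the names `f`, `delta_for` and `implication_holds`, the sampling scheme and the tested values of $\varepsilon$ are hypothetical): it checks the implication (11.5) at finitely many points with $0 < |x - 2| < \delta_\varepsilon$. A finite check of this kind illustrates the definition but is not a proof.

```python
def f(x):
    return 3 * x - 5

def delta_for(eps):
    """The choice made in Example 398: delta = eps / 3."""
    return eps / 3

def implication_holds(eps, samples=1000):
    """Check (11.5) on finitely many points with 0 < |x - 2| < delta."""
    delta = delta_for(eps)
    for i in range(1, samples + 1):
        offset = delta * i / (samples + 1)      # 0 < offset < delta
        for x in (2 - offset, 2 + offset):
            if not abs(f(x) - 1) < eps:
                return False
    return True

print(all(implication_holds(eps) for eps in (1.0, 0.1, 1e-3, 1e-6)))
```

Points farther from 2 than $\delta_\varepsilon$ need not satisfy the conclusion: for instance, with $\varepsilon = 0.1$ the point $x = 2.1$ gives $|f(x) - 1| = 0.3$ up to rounding.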

We now provide an example in which the limit does not exist.

Example 399 Let us reconsider the Dirichlet function (11.2): $\lim_{x \to x_0} f(x)$ does not exist for any $x_0 \in \mathbb{R}$. Indeed, given $x_0 \in \mathbb{R}$, let us suppose, by contradiction, that $\lim_{x \to x_0} f(x)$ exists and is equal to $L \in \mathbb{R}$. Let $0 < \varepsilon < 1/2$. By definition, there exists $\delta = \delta_\varepsilon$ such that
$$x \in (x_0 - \delta, x_0 + \delta) \text{ and } x \neq x_0 \implies |f(x) - L| < \varepsilon < \frac{1}{2}$$
In each neighborhood $(x_0 - \delta, x_0 + \delta)$ there exist both rational points and irrational points distinct from $x_0$ (see Proposition 39), that is, points $x \in (x_0 - \delta, x_0 + \delta)$ for which $f(x) = 1$ and points $x \in (x_0 - \delta, x_0 + \delta)$ for which $f(x) = 0$. But this contradicts $|f(x) - L| < 1/2$ for every $x \in (x_0 - \delta, x_0 + \delta)$ with $x \neq x_0$.¹ Therefore, $\lim_{x \to x_0} f(x)$ does not exist at any point $x_0 \in \mathbb{R}$. ▲

A definition like 397, in which the distances are made explicit, is said to be of “$\varepsilon$-$\delta$” type. In the light of (11.4), the rewriting of Definition 397 in the language of neighborhoods is immediate. To make the notation more expressive, rather than using the letter $B$ we will denote by $U_\delta(x_0)$ a neighborhood of $x_0$ of radius $\delta$ (of the independent variable, i.e., a neighborhood in abscissa) and by $V_\varepsilon(L)$ a neighborhood of $L$ of radius $\varepsilon$ (of the dependent variable, i.e., a neighborhood in ordinate).

Definition 400 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}$ a limit point of $A$. We write
$$\lim_{x \to x_0} f(x) = L \in \mathbb{R}$$
if, for every neighborhood $V_\varepsilon(L)$ of $L$, there exists a neighborhood $U_{\delta_\varepsilon}(x_0)$ of $x_0$ such that
$$x \in U_{\delta_\varepsilon}(x_0) \cap A \text{ and } x \neq x_0 \implies f(x) \in V_\varepsilon(L) \tag{11.6}$$

¹ Indeed, let $x$ and $y$ be respectively a rational and an irrational point in $(x_0 - \delta, x_0 + \delta)$, both distinct from $x_0$. It follows that
$$1 = |1 - 0| = |f(x) - f(y)| \leq |f(x) - L| + |L - f(y)| < \frac{1}{2} + \frac{1}{2} = 1$$
a contradiction.

As for convergence of sequences, the rewriting in the language of neighborhoods is very evocative.² In particular, referring to the topology of the extended real line introduced in Section 8.6.4, we can immediately generalize the definition in such a way as to include also the cases (ii), (iii) and (iv), in analogy with what we have done with Definition 278 for the limits of sequences.

Definition 401 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \overline{\mathbb{R}}$ a limit point of $A$. We write
$$\lim_{x \to x_0} f(x) = L \in \overline{\mathbb{R}}$$
if, for every neighborhood $V_\varepsilon(L)$ of $L$, there exists a neighborhood $U_{\delta_\varepsilon}(x_0)$ of $x_0$ such that
$$x \in U_{\delta_\varepsilon}(x_0) \cap A \text{ and } x \neq x_0 \implies f(x) \in V_\varepsilon(L) \tag{11.7}$$

The difference between Definitions 400 and 401 is obviously minor: where in the first definition we had $\mathbb{R}$, in the second one we have $\overline{\mathbb{R}}$. This simple modification allows us, however, to consider also the cases (ii), (iii) and (iv). In particular:

case (ii) is obtained by setting $x_0 \in \mathbb{R}$ and $L = \pm\infty$;

case (iii) is obtained by setting $x_0 = \pm\infty$ and $L \in \mathbb{R}$;

case (iv) is obtained by setting $x_0 = \pm\infty$ and $L = \pm\infty$.

As an example, we consider explicitly some subcases, leaving to the reader the other ones. We start with the subcase $x_0 \in \mathbb{R}$ and $L = +\infty$ of (ii). In this case Definition 401 is equivalent to the following one, in “$\varepsilon$-$\delta$” form, that is, with the distances made explicit.

Definition 402 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}$ a limit point of $A$. We write
$$\lim_{x \to x_0} f(x) = +\infty$$
if, for every $M > 0$, there exists $\delta_M > 0$ such that, for every $x \in A$, we have
$$0 < |x - x_0| < \delta_M \implies f(x) > M \tag{11.8}$$

In other words, for each constant $M$, no matter how large, there exists $\delta_M > 0$ such that all the points $x \in A$ that lie at distance less than $\delta_M$ from $x_0$ (excluding at most $x_0$) have images $f(x)$ larger than $M$.

Example 403 Let $f : \mathbb{R} - \{2\} \to \mathbb{R}$ be given by $f(x) = 1/|x - 2|$. The point $x_0 = 2$ is a limit point for $\mathbb{R} - \{2\}$ and therefore we can consider $\lim_{x \to 2} f(x)$. Let $M > 0$. Setting $\delta_M = 1/M$, we have
$$0 < |x - x_0| < \delta_M \iff 0 < |x - 2| < \frac{1}{M} \implies \frac{1}{|x - 2|} > M$$
and therefore
$$0 < |x - 2| < \delta_M \implies f(x) > M$$
that is, $\lim_{x \to 2} f(x) = +\infty$. ▲

² In a nutshell, we can say that “there exists a neighbourhood” takes the place of “eventually” as employed for sequences.

Let us now consider case (iii) with $x_0 = +\infty$ and $L \in \mathbb{R}$. Here Definition 401 is equivalent to the following one, still in “$\varepsilon$-$\delta$” form because the distances are made explicit.

Definition 404 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$, with $A$ unbounded from above.³ We write
$$\lim_{x \to +\infty} f(x) = L \in \mathbb{R}$$
if, for every $\varepsilon > 0$, there exists $M_\varepsilon > 0$ such that, for every $x \in A$, we have
$$x > M_\varepsilon \implies |f(x) - L| < \varepsilon \tag{11.9}$$

In this case, for each choice of $\varepsilon > 0$ arbitrarily small, there exists a value $M_\varepsilon$ such that the images of points $x$ greater than $M_\varepsilon$ lie at distance less than $\varepsilon$ from $L$.

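Example 405 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = 1 + e^{-x}$. Thanks to Lemma 276, $+\infty$ is a limit point of $\mathbb{R}$. We can therefore consider the limit $\lim_{x \to +\infty} f(x)$. Let us verify that $\lim_{x \to +\infty} f(x) = 1$. Let $\varepsilon > 0$. We have
$$|f(x) - L| = \left| 1 + e^{-x} - 1 \right| = e^{-x} < \varepsilon \iff -x < \log \varepsilon \iff x > -\log \varepsilon$$
and therefore, setting $M_\varepsilon = -\log \varepsilon$,
$$x > M_\varepsilon \implies |f(x) - L| < \varepsilon$$
that is, $\lim_{x \to +\infty} f(x) = 1$. ▲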
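The threshold $M_\varepsilon = -\log \varepsilon$ of Example 405 can be checked on a few points. The following Python sketch is an illustration under our own assumptions: the names `f`, `threshold` and `implication_holds`, the tested values of $\varepsilon$ (all smaller than 1, so that the threshold is positive) and the sample points beyond the threshold are hypothetical choices.

```python
import math

def f(x):
    return 1.0 + math.exp(-x)

def threshold(eps):
    """The choice made in Example 405: M_eps = -log(eps)."""
    return -math.log(eps)

def implication_holds(eps, steps=(1e-3, 1.0, 10.0, 100.0)):
    """Check (11.9) with L = 1 on a few points x > M_eps."""
    m = threshold(eps)
    return all(abs(f(m + step) - 1.0) < eps for step in steps)

for eps in (0.5, 0.1, 1e-3, 1e-6):
    print(eps, threshold(eps), implication_holds(eps))
```

As $\varepsilon$ decreases, the threshold grows: a tighter tolerance forces us to look farther to the right.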

Finally, we consider case (iv) with $x_0 = L = +\infty$. In this case Definition 401 is equivalent to the following one:

Definition 406 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$, with $A$ unbounded from above. We write
$$\lim_{x \to +\infty} f(x) = +\infty$$
if, for every $M > 0$, there exists $N$ such that, for every $x \in A$, we have
$$x > N \implies f(x) > M \tag{11.10}$$

Example 407 Let $f : \mathbb{R}_+ \to \mathbb{R}$ be given by $f(x) = \sqrt{x}$. Thanks to Lemma 276, $+\infty$ is a limit point of $\mathbb{R}_+$ and we can therefore consider the limit $\lim_{x \to +\infty} f(x)$. Let us verify that $\lim_{x \to +\infty} f(x) = +\infty$. For every $M > 0$ we have
$$f(x) > M \iff \sqrt{x} > M \iff x > M^2$$
Setting $N = M^2$ yields
$$x > N \implies f(x) > M$$
that is, $\lim_{x \to +\infty} f(x) = +\infty$. ▲

³ By Lemma 276, the fact that $A$ is not bounded from above guarantees that $+\infty$ is a limit point of $A$. For example, this is the case when $(a, +\infty) \subseteq A$.

If $A = \mathbb{N}_+$, that is, $f : \mathbb{N}_+ \to \mathbb{R}$ is a sequence, with the last two definitions we recover the notions of convergence and (positive) divergence for sequences. The theory of limits of functions therefore extends the theory of limits of sequences of Chapter 8.

O.R. It is useful to see the concept of limit “in three stages” (as a rocket):
(i) for every neighborhood $V$ of $L$ (in ordinate)
(ii) there exists a neighborhood $U$ of $x_0$ (in abscissa) such that
(iii) all the values of $f$ with $x \in U$, $x \neq x_0$, belong to $V$, i.e., all the images of $f$ in $U \cap A$ (excluding at most $f(x_0)$) belong to $V$: $f(U \cap A - \{x_0\}) \subseteq V$.

[Figure: a neighborhood $V(L)$ of $L$ on the vertical axis and the corresponding neighborhood $U(x_0)$ of $x_0$ on the horizontal axis]

We are often tempted to simplify to two stages: “the values of $x$ close to $x_0$ have images $f(x)$ close to $L$”, that is,
$$\text{for every } U \text{ there exists } V \text{ such that } f(U \cap A - \{x_0\}) \subseteq V$$
Unfortunately, in such a way we say nothing at all because this statement is always true, as the figure shows:

[Figure: a small neighborhood $U(x_0)$ of $x_0$ on the horizontal axis and a large neighborhood $V(L)$ of $L$ on the vertical axis containing all the corresponding images]

In the figure, for every neighborhood $U$ of $x_0$, however small, there always exists a neighborhood $V$ of $L$ (usually quite big) inside which all the values of $f(x)$ with $x \in U - \{x_0\}$ fall. Such $V$ can always be taken as an open interval that contains $f(U - \{x_0\})$. ▼

O.R. As we have said many times, a sequence is a function defined on $\mathbb{N}_+$. This set has only one limit point: $+\infty$. For a sequence the only limit we can talk about is therefore $\lim_{n \to \infty}$, which we sometimes denote simply by $\lim$ since there is no danger of confusion. ▼

11.2.2 One-sided limits


We cannot always talk of two-sided (or bilateral) limits. For example, at the boundary points of the domain of a function the two-sided limits, for obvious reasons, are not defined: the two-sided limit $\lim_{x \to 0} \sqrt{x}$ cannot exist because $\sqrt{x}$ is defined only for $x \geq 0$. However, the two-sided limit may fail to exist also at interior points of the domain. Consider for example the simple function $f : \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} 2 & \text{if } x \geq 1 \\ x & \text{if } x < 1 \end{cases}$$

It is easy to see that the limit $\lim_{x \to 1} f(x)$ does not exist. In these cases one can resort to the weaker notion of one-sided (or unilateral) limit, already met in an intuitive way in the introductory examples of the present chapter. They suggest the different possibilities:

(i) $\lim_{x \to x_0^+} f(x) \in \mathbb{R}$ and $\lim_{x \to x_0^-} f(x) \in \mathbb{R}$;

(ii) $\lim_{x \to x_0^+} f(x) = \pm\infty$ and $\lim_{x \to x_0^-} f(x) = \pm\infty$.

Note how, in the one-sided limits, the point $x_0$ is necessarily in $\mathbb{R}$, while the value of the limit is in $\overline{\mathbb{R}}$.
The next definition includes the two possible general cases (i) and (ii), by suitably modifying Definition 401.

Definition 408 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}$ a limit point of $A$. We write
$$\lim_{x \to x_0^+} f(x) = L \in \overline{\mathbb{R}}$$
if, for every neighborhood $V_\varepsilon(L)$ of $L$, there exists a right neighborhood $U^+_{\delta_\varepsilon}(x_0) = [x_0, x_0 + \delta_\varepsilon)$ of $x_0$ such that
$$x \in U^+_{\delta_\varepsilon}(x_0) \cap A \text{ and } x \neq x_0 \implies f(x) \in V_\varepsilon(L) \tag{11.11}$$
The value $L$ is called the right limit of the function at $x_0$.

Since, excluding $x_0$, the neighborhood $U^+_{\delta_\varepsilon}(x_0)$ reduces to $(x_0, x_0 + \delta_\varepsilon)$, (11.11) can be written more simply as
$$x \in (x_0, x_0 + \delta_\varepsilon) \cap A \implies f(x) \in V_\varepsilon(L)$$
In particular:

case (i) is obtained by setting $L \in \mathbb{R}$;

case (ii) is obtained by setting $L = \pm\infty$.

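In case (i), Definition 408 is equivalent to the following “$\varepsilon$-$\delta$” definition:

Definition 409 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}$ a limit point of $A$. We write
$$\lim_{x \to x_0^+} f(x) = L \in \mathbb{R}$$
if, for every $\varepsilon > 0$, there exists $\delta = \delta_\varepsilon > 0$ such that, for every $x \in A$,
$$x_0 < x < x_0 + \delta_\varepsilon \implies |f(x) - L| < \varepsilon \tag{11.12}$$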

Let us see an example.


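Example 410 Consider $f : \mathbb{R}_+ \to \mathbb{R}$ given by $f(x) = \sqrt{x}$. We claim that $\lim_{x \to 0^+} \sqrt{x} = 0$. Let $\varepsilon > 0$. Then
$$|f(x) - L| = \sqrt{x} < \varepsilon \iff x < \varepsilon^2$$
Setting $\delta_\varepsilon = \varepsilon^2$, we have
$$0 < x < \delta_\varepsilon \implies |f(x) - L| < \varepsilon$$
that is, $\lim_{x \to 0^+} \sqrt{x} = 0$. ▲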
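The choice $\delta_\varepsilon = \varepsilon^2$ of Example 410 can be spot-checked numerically. The following is only a sketch on a finite grid of points, not a proof; the helper names `delta_for` and `check_right_limit` are hypothetical and introduced just for this illustration.

```python
import math

def delta_for(eps: float) -> float:
    """Radius chosen in Example 410: delta_eps = eps**2."""
    return eps ** 2

def check_right_limit(eps: float, samples: int = 1000) -> bool:
    """Check |sqrt(x) - 0| < eps on a grid of points with 0 < x < delta_eps."""
    delta = delta_for(eps)
    # Grid points strictly inside the open interval (0, delta).
    xs = [delta * k / (samples + 1) for k in range(1, samples + 1)]
    return all(abs(math.sqrt(x) - 0.0) < eps for x in xs)

# One entry per tested tolerance eps.
results = {eps: check_right_limit(eps) for eps in (1.0, 0.1, 0.001)}
print(results)
```

A grid check of this kind can only fail to find a counterexample; the implication for every $x \in (0, \delta_\varepsilon)$ is what the algebra in the example establishes.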

Let us consider the subcase $L = +\infty$ of (ii), leaving to the reader the subcase $L = -\infty$. For this case Definition 408 is equivalent to the following “$\varepsilon$-$\delta$” definition:

Definition 411 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}$ a limit point of $A$. We write
$$\lim_{x \to x_0^+} f(x) = +\infty$$
if, for every $M > 0$, there exists $\delta_M > 0$ such that, for every $x \in A$,
$$x_0 < x < x_0 + \delta_M \implies f(x) > M \tag{11.13}$$

In the same way we define the left limits, which are denoted by $\lim_{x \to x_0^-} f(x)$. We leave the details to the reader.
To end this section, let us see an example in which the two one-sided limits (the right one and the left one) exist, but are different.

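Example 412 Let $f : \mathbb{R} \setminus \{2\} \to \mathbb{R}$ be given by $f(x) = 1/(x - 2)$. The point $x_0 = 2$ is a limit point of $\mathbb{R} \setminus \{2\}$ and we can consider the two one-sided limits $\lim_{x \to 2^+} f(x)$ and $\lim_{x \to 2^-} f(x)$. Let $M > 0$. Setting $\delta_M = 1/M$, for every $x > 2$ we have
$$x - x_0 < \delta_M \iff x - 2 < \frac{1}{M} \implies \frac{1}{x - 2} > M$$
and therefore
$$0 < x - 2 < \delta_M \implies f(x) > M$$
that is, $\lim_{x \to 2^+} f(x) = +\infty$. On the other hand, for every $x < 2$ we have
$$x_0 - x < \delta_M \iff 2 - x < \frac{1}{M} \implies \frac{1}{x - 2} < -M$$
and therefore
$$0 < 2 - x < \delta_M \implies f(x) < -M$$
that is, $\lim_{x \to 2^-} f(x) = -\infty$. ▲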
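As a rough numerical companion to Example 412, the sketch below samples points on either side of $x_0 = 2$ within distance $\delta_M = 1/M$ and checks the two implications. The helpers `grid` and `check_one_sided` are hypothetical names, and a finite sample is of course only an illustration of the argument.

```python
def f(x: float) -> float:
    """The function of Example 412, defined for x != 2."""
    return 1.0 / (x - 2.0)

def grid(delta: float, samples: int = 1000) -> list[float]:
    """Offsets t with 0 < t < delta (hypothetical helper)."""
    return [delta * k / (samples + 1) for k in range(1, samples + 1)]

def check_one_sided(M: float) -> tuple[bool, bool]:
    """Return (right_ok, left_ok) for the threshold M with delta_M = 1/M."""
    delta = 1.0 / M
    right_ok = all(f(2.0 + t) > M for t in grid(delta))   # 0 < x - 2 < delta_M
    left_ok = all(f(2.0 - t) < -M for t in grid(delta))   # 0 < 2 - x < delta_M
    return right_ok, left_ok

checks = {M: check_one_sided(M) for M in (1.0, 10.0, 1e4)}
print(checks)
```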

11.2.3 Relations between one-sided and two-sided limits


The first property of limits that we study shows that the two-sided limits (at finite points) exist if and only if the corresponding one-sided limits exist and are equal. In other words, the two-sided limit can be seen as the case in which the two one-sided limits coincide, while, when there is a discrepancy between them (or at least one of them does not exist), the two-sided limit does not exist.

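Proposition 413 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0$ a point for which there exists a neighborhood $B_\varepsilon(x_0)$ such that $B_\varepsilon(x_0) \setminus \{x_0\} \subseteq A$. Then $\lim_{x \to x_0} f(x) = L \in \overline{\mathbb{R}}$ if and only if
$$\lim_{x \to x_0^+} f(x) = \lim_{x \to x_0^-} f(x) = L \in \overline{\mathbb{R}}$$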

Note that $B_\varepsilon(x_0) \setminus \{x_0\}$ is a neighborhood of $x_0$ “with a hole in it”, i.e., without the point $x_0$ itself. The condition $B_\varepsilon(x_0) \setminus \{x_0\} \subseteq A$ requires that there exists at least one such neighborhood with a hole that is included in $A$. Naturally, an obvious and important case where there exists a neighborhood $B_\varepsilon(x_0)$ such that $B_\varepsilon(x_0) \setminus \{x_0\} \subseteq A$ is when $x_0$ is an interior point of $A$.

Going back to the examples just seen, for $f(x) = 1/|x - 2|$ we have $\lim_{x \to 2} f(x) = +\infty$ and hence, by Proposition 413,
$$\lim_{x \to 2} f(x) = \lim_{x \to 2^+} f(x) = \lim_{x \to 2^-} f(x) = +\infty$$
which confirms (11.1). For $f(x) = 1/(x - 2)$ we have instead
$$+\infty = \lim_{x \to 2^+} f(x) \neq \lim_{x \to 2^-} f(x) = -\infty$$
and hence, by Proposition 413, the two-sided limit $\lim_{x \to 2} f(x)$ does not exist.

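Proof We prove the proposition for $L \in \mathbb{R}$, leaving to the reader the case $L = \pm\infty$. Moreover, for simplicity we suppose that $x_0$ is an interior point of $A$.
“If”. We show that if $\lim_{x \to x_0^-} f(x) = \lim_{x \to x_0^+} f(x) = L$, then $\lim_{x \to x_0} f(x) = L$. Let $\varepsilon > 0$. Since $\lim_{x \to x_0^+} f(x) = L$, there exists $\delta'_\varepsilon > 0$ such that for every $x \in (x_0, x_0 + \delta'_\varepsilon) \cap A$ we have $|f(x) - L| < \varepsilon$. On the other hand, since $\lim_{x \to x_0^-} f(x) = L$, there exists $\delta''_\varepsilon > 0$ such that for every $x \in (x_0 - \delta''_\varepsilon, x_0) \cap A$ we have $|f(x) - L| < \varepsilon$. Let $\delta_\varepsilon = \min\{\delta'_\varepsilon, \delta''_\varepsilon\}$. Then
$$x \in (x_0, x_0 + \delta_\varepsilon) \cap A \implies |f(x) - L| < \varepsilon \tag{11.14}$$
and
$$x \in (x_0 - \delta_\varepsilon, x_0) \cap A \implies |f(x) - L| < \varepsilon \tag{11.15}$$
that is
$$x \in (x_0 - \delta_\varepsilon, x_0 + \delta_\varepsilon) \cap A \text{ and } x \neq x_0 \implies |f(x) - L| < \varepsilon$$
Therefore, $\lim_{x \to x_0} f(x) = L$.
“Only if”. We show that if $\lim_{x \to x_0} f(x) = L$, then $\lim_{x \to x_0^-} f(x) = \lim_{x \to x_0^+} f(x) = L$. Let $\varepsilon > 0$. Since $\lim_{x \to x_0} f(x) = L$, there exists $\delta_\varepsilon > 0$ such that
$$x \in (x_0 - \delta_\varepsilon, x_0 + \delta_\varepsilon) \cap A \text{ and } x \neq x_0 \implies |f(x) - L| < \varepsilon \tag{11.16}$$
Since $x_0$ is not a boundary point, both $(x_0 - \delta_\varepsilon, x_0) \cap A$ and $(x_0, x_0 + \delta_\varepsilon) \cap A$ are not empty. Therefore, (11.16) implies both (11.14) and (11.15), that is, $\lim_{x \to x_0^+} f(x) = \lim_{x \to x_0^-} f(x) = L$.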

The reader will have observed that, when $A$ is an interval, Proposition 413 forbids that $x_0$ is a boundary point. Indeed, to fix ideas, let us consider an interval of the real line with extremes $a < b$.⁴ When $x_0 = a = \inf A$, it does not make sense to talk about the one-sided limit $\lim_{x \to a^-} f(x)$, while when $x_0 = b = \sup A$ it does not make sense to talk about the one-sided limit $\lim_{x \to b^+} f(x)$. We will set, instead, $\lim_{x \to a} f(x) = \lim_{x \to a^+} f(x)$ and $\lim_{x \to b} f(x) = \lim_{x \to b^-} f(x)$, since the conditions in the definition of the two-sided limit are perfectly satisfied: for each (two-sided) neighborhood $V$ of $L$ there exists a neighborhood of $x_0$, necessarily one-sided because $x_0$ is a boundary point, such that the images of $f$, except perhaps $f(x_0)$, fall in $V$.
A similar observation can be made if $A$ is a half-line bounded from below (above): in such a case the left (right) limit as $x$ tends to the infimum (supremum) of $A$ is not defined. For example, for $f(x) = \sqrt{x}$ and $x_0 = 0$, we have $x_0 = \inf A$ (even better, $x_0 = \min A$) and the limit $\lim_{x \to 0^-} \sqrt{x}$ is not defined. When $x_0$ is a generic boundary point, for example the point $3$ for the set $A = (0, 3) \cup [5, 19]$, the problem appears in exactly the same terms.
Example 414 Let $f : \mathbb{R}_+ \to \mathbb{R}$ be given by $f(x) = \sqrt{x}$. In Example 410 we have seen that $\lim_{x \to 0^+} f(x) = 0$. By what we have just said, we can also write $\lim_{x \to 0} f(x) = 0$. It is instructive to calculate this bilateral limit also directly, through Definition 397. Let $\varepsilon > 0$. As we have seen in Example 410, we have
$$|f(x) - L| = \sqrt{x} < \varepsilon \iff x < \varepsilon^2$$
Setting $\delta_\varepsilon = \varepsilon^2$, for every $x \in A$, that is, for every $x \geq 0$, we have
$$0 < |x - x_0| < \delta_\varepsilon \iff 0 < x < \delta_\varepsilon \implies |f(x) - L| < \varepsilon$$
and therefore $\lim_{x \to 0} \sqrt{x} = 0$, i.e., it can also be seen as a two-sided limit. ▲

4 In other words, one of the following four cases holds: (i) $A = (a, b)$; (ii) $A = [a, b)$; (iii) $A = (a, b]$; (iv) $A = [a, b]$.

11.2.4 Grand finale


We conclude by observing that in the general Definition 401 of the two-sided limit, which summarizes all the cases of finite or infinite points and finite or infinite limits, the mention of $\varepsilon$ and $\delta_\varepsilon$ is actually superfluous. We can therefore rewrite such a definition in the following “cleaner” way.

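Definition 415 Let $f : A \subseteq \mathbb{R} \to \mathbb{R}$ be a function and $x_0 \in \overline{\mathbb{R}}$ a limit point of $A$. We write
$$\lim_{x \to x_0} f(x) = L \in \overline{\mathbb{R}}$$
if, for every neighborhood $V$ of $L$, there exists a neighborhood $U$ of $x_0$ such that
$$f((U \cap A) \setminus \{x_0\}) \subseteq V$$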

It is this version of the two-sided limit that the reader will find generalized to topological spaces in more advanced courses. We are happy to leave to the reader the analogous general version for one-sided limits.
We observe that also for functions $f : A \subseteq \mathbb{R} \to \mathbb{R}$ it is possible to introduce the notions of limit from above and of limit from below studied for sequences. Clearly, these notions should not be confused with the one-sided limits discussed before.

11.2.5 Post scriptum: horizontal and vertical asymptotes


We will treat in detail the asymptotes in Chapter 24. We anticipate however that:

(i) when at least one of the following two conditions is satisfied:
$$\lim_{x \to x_0^+} f(x) = +\infty \text{ or } -\infty$$
$$\lim_{x \to x_0^-} f(x) = +\infty \text{ or } -\infty$$
the straight line of equation $x = x_0$ is called a vertical asymptote for $f$;

(ii) when
$$\lim_{x \to +\infty} f(x) = L \quad \left(\text{or } \lim_{x \to -\infty} f(x) = L\right)$$
the straight line of equation $y = L$ is called a horizontal asymptote for $f$ at $+\infty$ (or: at $-\infty$).

Example 416 The function $f(x) = \frac{1}{x + 1} + 2$, with graph

[Figure: graph of $f$ in the $(x, y)$ plane]

has horizontal asymptote $y = 2$ and vertical asymptote $x = -1$. ▲
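The two asymptotes of Example 416 can be illustrated by evaluating $f$ far from the origin and close to $x = -1$. This is a minimal numerical sketch; the sample points and variable names are arbitrary choices made for this illustration.

```python
def f(x: float) -> float:
    """The function of Example 416, defined for x != -1."""
    return 1.0 / (x + 1.0) + 2.0

# Far from the origin the values approach the horizontal asymptote y = 2.
far_right = f(1e6)
far_left = f(-1e6)

# Close to x = -1 the values blow up, with opposite signs on the two sides.
near_right = f(-1.0 + 1e-6)   # x -> -1 from the right
near_left = f(-1.0 - 1e-6)    # x -> -1 from the left

print(far_right, far_left, near_right, near_left)
```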

11.3 Functions of several variables


The extension to functions $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ of the definition of limit $\lim_{x \to x_0} f(x) = L$ is very natural. In this case we have $d(x, x_0) = \|x - x_0\|$ and “to approach” $x_0$ means that this distance becomes smaller and smaller. We give directly the vector version of Definition 401 for the notion of limit.⁵

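Definition 417 Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}^n$ a limit point of $A$. We write
$$\lim_{x \to x_0} f(x) = L \in \overline{\mathbb{R}} \tag{11.17}$$
if, for every neighborhood $V_\varepsilon(L)$ of $L$, there exists a neighborhood $U_{\delta_\varepsilon}(x_0)$ of $x_0$ such that
$$x \in U_{\delta_\varepsilon}(x_0) \cap A \text{ and } x \neq x_0 \implies f(x) \in V_\varepsilon(L) \tag{11.18}$$
The value $L$ is called the limit of the function at $x_0$.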

Definition 401 is the special case with $n = 1$. In the “$\varepsilon$-$\delta$” version we have (11.17) if, for every $\varepsilon > 0$, there exists $\delta_\varepsilon > 0$ such that, for every $x \in A$,
$$0 < d(x, x_0) < \delta_\varepsilon \implies d(f(x), L) < \varepsilon \tag{11.19}$$
Note how (11.19) is absolutely identical to (11.4): the distance $d(x, x_0)$, that is, $|x - x_0|$ in the case $n = 1$, in the general case becomes $\|x - x_0\|$.
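Example 418 Let $f : \mathbb{R}^n \to \mathbb{R}$ be given by $f(x) = 1 + \sum_{i=1}^n x_i$. We verify that $\lim_{x \to 0} f(x) = 1$. Let $\varepsilon > 0$. We have
$$d(f(x), 1) = \left| 1 + \sum_{i=1}^n x_i - 1 \right| < \varepsilon \iff \left| \sum_{i=1}^n x_i \right| < \varepsilon$$
Set $\delta_\varepsilon = \varepsilon / n$. Since $\left| \sum_{i=1}^n x_i \right| \leq \sum_{i=1}^n |x_i|$, with $x_0 = 0$ we have
$$d(x, x_0) < \delta_\varepsilon \iff \sqrt{\sum_{i=1}^n x_i^2} < \frac{\varepsilon}{n} \iff \sum_{i=1}^n x_i^2 < \frac{\varepsilon^2}{n^2}$$
$$\implies x_i^2 < \frac{\varepsilon^2}{n^2} \quad \forall i = 1, 2, \ldots, n \implies |x_i| = \sqrt{x_i^2} < \sqrt{\frac{\varepsilon^2}{n^2}} = \frac{\varepsilon}{n} \quad \forall i = 1, 2, \ldots, n$$
$$\implies \sum_{i=1}^n |x_i| < \varepsilon \implies d(f(x), 1) = \left| \sum_{i=1}^n x_i \right| \leq \sum_{i=1}^n |x_i| < \varepsilon$$
that is, $\lim_{x \to 0} f(x) = 1$. ▲

5 We consider only the case $x_0 \in \mathbb{R}^n$, since the study of the analogous notion $x \to \infty$ is particularly delicate for the vector case.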
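The choice $\delta_\varepsilon = \varepsilon/n$ in Example 418 can be probed with random points of the ball of radius $\delta_\varepsilon$ centered at the origin. The sketch below is only a sampling experiment with a fixed seed; `random_point_in_ball` and `check` are hypothetical helper names.

```python
import math
import random

def f(x: list[float]) -> float:
    """f(x) = 1 + x_1 + ... + x_n, as in Example 418."""
    return 1.0 + sum(x)

def random_point_in_ball(n: int, radius: float, rng: random.Random) -> list[float]:
    """A random point with Euclidean norm strictly below `radius`."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    r = 0.999 * radius * rng.random()   # keeps the point strictly inside the ball
    return [r * c / norm for c in v]

def check(n: int, eps: float, trials: int = 2000, seed: int = 0) -> bool:
    """Sample points with d(x, 0) < eps/n and test d(f(x), 1) < eps."""
    rng = random.Random(seed)
    delta = eps / n
    return all(
        abs(f(random_point_in_ball(n, delta, rng)) - 1.0) < eps
        for _ in range(trials)
    )

print(check(3, 0.1), check(10, 1e-3))
```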

To extend the definition of one-sided limit to the vector case we should consider the different directions along which we can approach a point $x_0 \in \mathbb{R}^n$. Leaving to more advanced courses the study of the topic, here we will only consider the extension to $\mathbb{R}^n$ of the scalar two-sided limit given by Definition 417. By contrast, there is no difficulty in extending to vector functions the limits from above and from below (because $L$ is in $\mathbb{R}$ and not in $\mathbb{R}^n$).

The calculation of the limit of a function defined on $\mathbb{R}^n$ is much more delicate than in the scalar case. Intuitively, for a function of several variables $f$ to tend to a value $L$, as $x$ tends to the vector $x_0$, it is necessary that this happens along all possible approaching paths from $x$ to $x_0$. If, therefore, there are two approaching paths along which $f$ does not tend to the same limit value, the function does not have a limit as $x \to x_0$. The following example should clarify the issue.

Example 419 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by
$$f(x_1, x_2) = \frac{\ln(1 + x_1 x_2)}{x_1^2}$$
Let us verify that $\lim_{(x_1, x_2) \to (0, 0)} f(x)$ does not exist. Consider two approaching paths to the vector $(0, 0)$: along the parabola $x_2 = x_1^2$, and along the straight line $x_2 = x_1$. Along the first path we have
$$\lim_{(x_1, x_2) \to (0, 0)} f(x_1, x_2) = \lim_{x_1 \to 0} f(x_1, x_1^2) = \lim_{x_1 \to 0} \frac{\ln(1 + x_1^3)}{x_1^2} = \lim_{x_1 \to 0} \frac{\ln(1 + x_1^3)}{x_1^3} \, x_1 = 0 \tag{11.20}$$
while along the second one
$$\lim_{(x_1, x_2) \to (0, 0)} f(x_1, x_2) = \lim_{x_1 \to 0} f(x_1, x_1) = \lim_{x_1 \to 0} \frac{\ln(1 + x_1^2)}{x_1^2} = 1$$
Since $f$ tends to two different limit values along the two different paths to $(0, 0)$, we conclude that $\lim_{(x_1, x_2) \to (0, 0)} f(x)$ does not exist. Let us prove it rigorously using the last definition. Suppose, by contradiction, that the limit exists, that is,
$$\lim_{(x_1, x_2) \to (0, 0)} f(x_1, x_2) = L$$
Set $\varepsilon = 1/4$. By the definition of limit, there exists $\delta_1 > 0$ such that for $(x_1, x_2) \in B_{\delta_1}(0, 0)$, with $(x_1, x_2) \neq (0, 0)$, one has
$$d(f(x_1, x_2), L) < \frac{1}{4} \tag{11.21}$$
From (11.20), setting
$$g(x) = \frac{\ln(1 + x^3)}{x^2}$$
one gets that $\lim_{x_1 \to 0} g(x_1) = 0$. Therefore, by setting again $\varepsilon = 1/4$, there exists $\delta_2 > 0$ such that for $x_1 \in B_{\delta_2}(0) \subseteq \mathbb{R}$, with $x_1 \neq 0$, we have
$$g(x_1) \in (-\varepsilon, \varepsilon) = \left(-\frac{1}{4}, \frac{1}{4}\right)$$
Now consider the spherical neighborhood $B_{\delta_2}(0, 0) \subseteq \mathbb{R}^2$ of $(0, 0)$. Take a point of the parabola $x_2 = x_1^2$ that belongs to this neighborhood, that is, a point $(\hat{x}_1, \hat{x}_1^2) \in B_{\delta_2}(0, 0)$, with $(\hat{x}_1, \hat{x}_1^2) \neq (0, 0)$. We have $\hat{x}_1 \in B_{\delta_2}(0)$,⁶ and therefore
$$f(\hat{x}_1, \hat{x}_1^2) = g(\hat{x}_1) \in \left(-\frac{1}{4}, \frac{1}{4}\right) \tag{11.22}$$
Analogously, from the limit along the second path above, setting
$$h(x) = \frac{\ln(1 + x^2)}{x^2}$$
one gets $\lim_{x_1 \to 0} h(x_1) = 1$. Therefore, setting again $\varepsilon = 1/4$, there exists $\delta_3 > 0$ such that for $x_1 \in B_{\delta_3}(0) \subseteq \mathbb{R}$, with $x_1 \neq 0$, we have
$$h(x_1) \in (1 - \varepsilon, 1 + \varepsilon) = \left(\frac{3}{4}, \frac{5}{4}\right)$$
Now consider the spherical neighborhood $B_{\delta_3}(0, 0) \subseteq \mathbb{R}^2$ of $(0, 0)$ and take a point of the straight line $x_2 = x_1$ that belongs to it, that is, a point $(\tilde{x}_1, \tilde{x}_1) \in B_{\delta_3}(0, 0)$, with $(\tilde{x}_1, \tilde{x}_1) \neq (0, 0)$. We have $\tilde{x}_1 \in B_{\delta_3}(0)$, so that
$$f(\tilde{x}_1, \tilde{x}_1) = h(\tilde{x}_1) \in \left(\frac{3}{4}, \frac{5}{4}\right) \tag{11.23}$$
Let $\delta = \min\{\delta_1, \delta_2, \delta_3\}$ and consider two points $(\hat{x}_1, \hat{x}_1^2)$ and $(\tilde{x}_1, \tilde{x}_1)$ on the parabola and on the straight line that belong to $B_\delta(0, 0)$ and that are different from the origin $(0, 0)$. By (11.22) and (11.23), we have
$$d\left(f(\hat{x}_1, \hat{x}_1^2), f(\tilde{x}_1, \tilde{x}_1)\right) > \frac{1}{2}$$
On the other hand, by (11.21) we have
$$d\left(f(\hat{x}_1, \hat{x}_1^2), f(\tilde{x}_1, \tilde{x}_1)\right) \leq d\left(f(\hat{x}_1, \hat{x}_1^2), L\right) + d\left(L, f(\tilde{x}_1, \tilde{x}_1)\right) < \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$$
We reached a contradiction, and therefore the limit $\lim_{(x_1, x_2) \to (0, 0)} f(x_1, x_2)$ does not exist. ▲

6 Indeed, $d((\hat{x}_1, \hat{x}_1^2), (0, 0)) < \delta_2$, that is, $\hat{x}_1^2 + \hat{x}_1^4 < \delta_2^2$, implies $\hat{x}_1^2 < \delta_2^2$, whence $d(\hat{x}_1, 0) < \delta_2$.
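The two approaching paths of Example 419 can be followed numerically. In this sketch the function is evaluated with `math.log1p`, which computes $\ln(1 + u)$ accurately for small $u$; the parameter values $t = 10^{-k}$ are an arbitrary choice for illustration.

```python
import math

def f(x1: float, x2: float) -> float:
    """f(x1, x2) = ln(1 + x1*x2) / x1**2, for x1 != 0 and 1 + x1*x2 > 0."""
    return math.log1p(x1 * x2) / x1 ** 2

# Parameter values shrinking towards 0.
ts = [10.0 ** (-k) for k in range(1, 6)]

along_parabola = [f(t, t ** 2) for t in ts]   # path x2 = x1**2, values tend to 0
along_line = [f(t, t) for t in ts]            # path x2 = x1, values tend to 1

for t, p, l in zip(ts, along_parabola, along_line):
    print(f"t={t:.0e}  parabola={p:.6f}  line={l:.6f}")
```

The two columns settle near different values, which is the numerical counterpart of the two distinct path limits computed in the example.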
11.4. PROPERTIES OF LIMITS 285

Finally, we observe that Definition 468 of Chapter 12 will extend the notion of limit to operators. Since this last extension does not present any substantial novelty, we prefer to study it directly together with the notion of continuity of operators (which justifies the study).

11.4 Properties of limits


In this section we present some fundamental properties of limits. In order to state the properties directly in terms of functions of several variables, we will consider finite limit points $x_0$, that is, $x_0 \in \mathbb{R}^n$; we leave to the reader the versions of the properties for the case $x \to \pm\infty$ of scalar functions.

We start with an important characterization of the limits of functions through limits of


sequences.
Proposition 420 Given a function $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ and a limit point $x_0 \in \mathbb{R}^n$ of $A$, we have $\lim_{x \to x_0} f(x) = L \in \overline{\mathbb{R}}$ if and only if
$$x_n \to x_0 \implies f(x_n) \to L$$
for every sequence $\{x_n\}$ in $A$ with $x_n \neq x_0$ for every $n \geq 1$.

Proof Let us consider $L \in \mathbb{R}$, leaving to the reader the case $L = \pm\infty$. “If”. Suppose $f(x_n) \to L$ for every sequence $\{x_n\}$ of points of $A$, with $x_n \neq x_0$ for every $n$, such that $x_n \to x_0$. Assuming by contradiction that $\lim_{x \to x_0} f(x) = L$ is false, there exists $\varepsilon > 0$ such that, for every $\delta > 0$, there exists $x_\delta \in A$ such that $0 < d(x_\delta, x_0) < \delta$ and $d(f(x_\delta), L) \geq \varepsilon$. For every $n$, set $\delta = 1/n$ and let $x_n$ be the corresponding point of $A$ just denoted by $x_\delta$. For the sequence $\{x_n\}$ of points of $A$ so built we have $d(x_0, x_n) < 1/n$ for every $n$, and hence $\lim_{n \to \infty} d(x_0, x_n) = 0$. By Proposition 268, $x_n \to x_0$. But, by construction, $d(f(x_n), L) \geq \varepsilon$ for every $n$, and therefore the sequence $f(x_n)$ does not converge to $L$. Having contradicted the hypothesis, we conclude that $\lim_{x \to x_0} f(x) = L$.

“Only if”. Let us suppose $\lim_{x \to x_0} f(x) = L \in \mathbb{R}$. Let $\{x_n\}$ be a sequence of points of $A$, with $x_n \neq x_0$ for every $n$, such that $x_n \to x_0$. Let $\varepsilon > 0$. There exists $\delta_\varepsilon > 0$ such that for every $x \in A$ with $0 < d(x, x_0) < \delta_\varepsilon$ we have $d(f(x), L) < \varepsilon$. Since $x_n \to x_0$ and $x_n \neq x_0$, there exists $n_\varepsilon \geq 1$ such that $0 < d(x_n, x_0) < \delta_\varepsilon$ for every $n \geq n_\varepsilon$. For every $n \geq n_\varepsilon$ we have therefore $d(f(x_n), L) < \varepsilon$, which implies $f(x_n) \to L$.
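Example 421 We go back to the limit $\lim_{x \to 2} (3x - 5)$ of Example 398. Since $A = \mathbb{R}$, let $\{x_n\}$ be any sequence of real numbers, with $x_n \neq 2$ for every $n$, such that $x_n \to 2$. For example, $x_n = 2 + 1/n$ or $x_n = 2 - 1/n^2$. By the algebra of limits of sequences, we have
$$\lim_{n \to +\infty} (3x_n - 5) = 3 \lim_{n \to +\infty} x_n - 5 = 3 \cdot 2 - 5 = 1$$
For example, in the special case $x_n = 2 + 1/n$ we have
$$\lim_{n \to +\infty} \left[ 3 \left( 2 + \frac{1}{n} \right) - 5 \right] = 3 \lim_{n \to +\infty} \left( 2 + \frac{1}{n} \right) - 5 = 3 \cdot 2 - 5 = 1$$
▲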
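The sequential characterization used in Example 421 lends itself to a quick numerical illustration: evaluating $f$ along the two sequences mentioned in the example shows the images approaching $1$. The names `seq_a`, `seq_b` and the chosen indices are assumptions of this sketch.

```python
def f(x: float) -> float:
    """f(x) = 3x - 5, as in Example 421."""
    return 3.0 * x - 5.0

def seq_a(n: int) -> float:
    """x_n = 2 + 1/n."""
    return 2.0 + 1.0 / n

def seq_b(n: int) -> float:
    """x_n = 2 - 1/n**2."""
    return 2.0 - 1.0 / n ** 2

ns = [10 ** k for k in range(1, 7)]

# Distance of f(x_n) from the candidate limit L = 1 along each sequence.
gaps_a = [abs(f(seq_a(n)) - 1.0) for n in ns]
gaps_b = [abs(f(seq_b(n)) - 1.0) for n in ns]

print(gaps_a)
print(gaps_b)
```

Two sequences do not prove the limit, since Proposition 420 quantifies over every admissible sequence; they only illustrate what the proposition asserts.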

Example 422 Consider the function $f : \mathbb{R}_{++} \to \mathbb{R}$ given by
$$f(x) = \frac{\sqrt{x}}{x}$$
and the limit $\lim_{x \to 0} f(x)$. Since $A = \mathbb{R}_{++}$ and $x_0 = 0$, let $\{x_n\}$ be any sequence of strictly positive real numbers, i.e., $x_n > 0$ for every $n$, such that $x_n \to 0$. For example, $x_n = 1/n$ or $x_n = 1/n^2$. By the algebra of limits of sequences, we have
$$\lim_{n \to +\infty} \frac{\sqrt{x_n}}{x_n} = \lim_{n \to +\infty} \frac{1}{\sqrt{x_n}} = +\infty$$
and so Proposition 420 allows us to conclude that $\lim_{x \to 0} f(x) = +\infty$. ▲

The characterization of limits through sequences is important both from the computa-
tional viewpoint, because the calculation of the limits of functions reduces to the simpler
calculation of limits of sequences, and from the theoretical viewpoint, because in this way
many of the properties seen for limits of sequences extend immediately to limits of functions.
In this section we will rely on the second, more theoretical aspect, to obtain basic properties
of limits of functions. We start with the uniqueness of the limit.

Theorem 423 (uniqueness of the limit) Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}^n$ a limit point of $A$. There exists at most one $L \in \overline{\mathbb{R}}$ such that $\lim_{x \to x_0} f(x) = L$.

In the case $n = 1$ we can also assume $x_0 \in \overline{\mathbb{R}}$.

Proof Let us suppose, by contradiction, that there exist two different limits $L' \neq L''$. Let $\{x_n\}$ be a sequence in $A$, with eventually $x_n \neq x_0$, such that $x_n \to x_0$. By Proposition 420, $f(x_n) \to L'$ and $f(x_n) \to L''$, which contradicts the uniqueness of the limit for sequences. It follows that $L' = L''$.

Here is an alternative proof, which does not use limits of sequences.

Alternative proof By contradiction, let us suppose that there exist two different limits $L_1$ and $L_2$, that is, $L_1 \neq L_2$. We assume therefore that
$$\lim_{x \to x_0} f(x) = L_1$$
and
$$\lim_{x \to x_0} f(x) = L_2$$
with $L_1 \neq L_2$. Without loss of generality, suppose that $L_1 > L_2$. There exists a number $K$ such that $L_1 > K > L_2$. Setting $0 < \varepsilon_1 < L_1 - K$ and $0 < \varepsilon_2 < K - L_2$, the neighborhoods $B_{\varepsilon_1}(L_1) = (L_1 - \varepsilon_1, L_1 + \varepsilon_1)$ and $B_{\varepsilon_2}(L_2) = (L_2 - \varepsilon_2, L_2 + \varepsilon_2)$ are disjoint.

[Figure: the disjoint intervals $(L_1 - \varepsilon_1, L_1 + \varepsilon_1)$ and $(L_2 - \varepsilon_2, L_2 + \varepsilon_2)$ marked on the $y$ axis]

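Since by hypothesis $\lim_{x \to x_0} f(x) = L_1$, given $\varepsilon_1 > 0$ one can find $\delta_1 > 0$ such that
$$x \in (x_0 - \delta_1, x_0 + \delta_1) \cap A, \; x \neq x_0 \implies f(x) \in (L_1 - \varepsilon_1, L_1 + \varepsilon_1) \tag{11.24}$$
Analogously, since by hypothesis $\lim_{x \to x_0} f(x) = L_2$, given $\varepsilon_2 > 0$ one can find $\delta_2 > 0$ such that
$$x \in (x_0 - \delta_2, x_0 + \delta_2) \cap A, \; x \neq x_0 \implies f(x) \in (L_2 - \varepsilon_2, L_2 + \varepsilon_2) \tag{11.25}$$
Taking $\delta = \min\{\delta_1, \delta_2\}$ we have that the neighborhood $(x_0 - \delta, x_0 + \delta)$ of $x_0$ with radius $\delta$ is contained in the two previous neighborhoods, i.e., in $(x_0 - \delta, x_0 + \delta) \cap A$ both (11.24) and (11.25) hold:
$$\forall x \in (x_0 - \delta, x_0 + \delta) \cap A, \; x \neq x_0 \implies f(x) \in (L_1 - \varepsilon_1, L_1 + \varepsilon_1) \text{ and } f(x) \in (L_2 - \varepsilon_2, L_2 + \varepsilon_2)$$
Hence,
$$\forall x \in (x_0 - \delta, x_0 + \delta) \cap A, \; x \neq x_0 \implies f(x) \in (L_1 - \varepsilon_1, L_1 + \varepsilon_1) \cap (L_2 - \varepsilon_2, L_2 + \varepsilon_2)$$
which is a contradiction, since we assumed that
$$(L_1 - \varepsilon_1, L_1 + \varepsilon_1) \cap (L_2 - \varepsilon_2, L_2 + \varepsilon_2) = \emptyset$$
The limit is therefore unique.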

We continue with the version for functions of the permanence of sign Theorem.

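Theorem 424 (permanence of sign) Let $f : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be a function and $x_0 \in \mathbb{R}^n$ (or $x_0 \in \overline{\mathbb{R}}$ when $n = 1$) a limit point of $A$. If $\lim_{x \to x_0} f(x) = L \neq 0$, then there exists a neighborhood $U_\varepsilon(x_0)$ of $x_0$ such that
$$f(x) \cdot L > 0 \quad \forall x \in (U_\varepsilon(x_0) \cap A) \setminus \{x_0\}$$
i.e., such that $f(x)$ and $L$ have the same sign.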



This last important result is very intuitive: if $L \neq 0$, it is always possible to choose a (sufficiently small) neighborhood of $x_0$ such that all its points have an image with the same sign as $L$.

We leave to the reader the “sequential” proof, based on Theorem 281 and on Proposition 420, giving instead an alternative proof that does not use limits of sequences.

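Alternative proof Suppose that $L > 0$. Since $\lim_{x \to x_0} f(x) = L$, taking $\varepsilon = L/2 > 0$, there exists a neighborhood $U_\varepsilon(x_0)$ of $x_0$ such that
$$x \in (U_\varepsilon(x_0) \cap A) \setminus \{x_0\} \implies f(x) \in \left( L - \frac{L}{2}, L + \frac{L}{2} \right) = \left( \frac{L}{2}, \frac{3L}{2} \right)$$
Since $L/2 > 0$, we are done. For $L < 0$ the proof is analogous.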

We now move to the comparison for functions.


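Theorem 425 (Comparison criterion) Let $f, g, h : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be three functions and $x_0 \in \mathbb{R}^n$ (or $x_0 \in \overline{\mathbb{R}}$ when $n = 1$) a limit point of $A$. If
$$g(x) \leq f(x) \leq h(x) \quad \forall x \in A \tag{11.26}$$
and
$$\lim_{x \to x_0} g(x) = \lim_{x \to x_0} h(x) = L \in \overline{\mathbb{R}} \tag{11.27}$$
then
$$\lim_{x \to x_0} f(x) = L$$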

Again we leave to the reader the “sequential” proof, based on Theorem 303 and on
Proposition 420 and provide an alternative proof.

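Alternative proof Given $\varepsilon > 0$ arbitrary, we have to show that there exists $\delta > 0$ such that $f(x) \in (L - \varepsilon, L + \varepsilon)$ for every $x \in (x_0 - \delta, x_0 + \delta) \cap A$ with $x \neq x_0$. Since $\lim_{x \to x_0} g(x) = L$, given $\varepsilon > 0$, there exists $\delta_1 > 0$ such that
$$\forall x \in (x_0 - \delta_1, x_0 + \delta_1) \cap A, \; x \neq x_0 \implies L - \varepsilon < g(x) < L + \varepsilon \tag{11.28}$$
Since $\lim_{x \to x_0} h(x) = L$, given $\varepsilon > 0$, there exists $\delta_2 > 0$ such that
$$\forall x \in (x_0 - \delta_2, x_0 + \delta_2) \cap A, \; x \neq x_0 \implies L - \varepsilon < h(x) < L + \varepsilon \tag{11.29}$$
Now taking $\delta = \min\{\delta_1, \delta_2\}$, we have that in $(x_0 - \delta, x_0 + \delta) \cap A$ both (11.28) and (11.29) hold. Moreover, $g(x) \leq f(x) \leq h(x)$ in $(x_0 - \delta, x_0 + \delta) \cap A$. Therefore, for any $x \in (x_0 - \delta, x_0 + \delta) \cap A$, $x \neq x_0$, we have
$$L - \varepsilon < g(x) \leq f(x) \leq h(x) < L + \varepsilon$$
that is
$$f(x) \in (L - \varepsilon, L + \varepsilon) \quad \forall x \in (x_0 - \delta, x_0 + \delta) \cap A, \; x \neq x_0$$
Since $\varepsilon$ was arbitrary, we have $\lim_{x \to x_0} f(x) = L$, as claimed.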

The interpretation of the result is completely analogous to the version for sequences. Let
us see a simple application, similar, mutatis mutandis, to the one seen in Example 304.

Example 426 Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = e^{x \cos^2 \frac{1}{x}}$ and let $x_0 = 0$. Since
$$0 \leq \cos^2 \frac{1}{x} \leq 1 \quad \forall x \in \mathbb{R}$$
considering the monotonicity of the exponential function, for $x \geq 0$ we have
$$1 = e^{0 \cdot x} \leq e^{x \cos^2 \frac{1}{x}} \leq e^{1 \cdot x} = e^x \quad \forall x \geq 0$$
Setting $g(x) = 1$ and $h(x) = e^x$, conditions (11.26) and (11.27) are satisfied with $L = 1$. Therefore $\lim_{x \to 0} f(x) = 1$. The proof for $x < 0$ is analogous. ▲
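The squeeze used in Example 426 can be observed numerically for small positive $x$. The sketch below checks the bounds $1 \leq f(x) \leq e^x$ at a few sample points, with a tiny tolerance `TOL` (an assumption of this illustration) to absorb floating-point rounding.

```python
import math

TOL = 1e-12  # slack for floating-point rounding (assumption of this sketch)

def f(x: float) -> float:
    """f(x) = exp(x * cos(1/x)**2), for x != 0."""
    return math.exp(x * math.cos(1.0 / x) ** 2)

xs = [10.0 ** (-k) for k in range(1, 8)]

# g(x) = 1 <= f(x) <= h(x) = exp(x) at each sample point x > 0.
squeezed = all(1.0 - TOL <= f(x) <= math.exp(x) + TOL for x in xs)

# Width of the squeezing band h(x) - g(x), which shrinks to 0 as x -> 0+.
band = [math.exp(x) - 1.0 for x in xs]

print(squeezed, band[-1], f(xs[-1]))
```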

As already observed, also for functions, the permanence of sign and the comparison
theorems are properties of the limits with respect to the structure of order. The next
proposition, the analogue for functions of Proposition 282, is yet another simple result of the
same type.

Proposition 427 Let $f, g : A \subseteq \mathbb{R}^n \to \mathbb{R}$ be two functions, $x_0 \in \mathbb{R}^n$ (or $x_0 \in \overline{\mathbb{R}}$ when $n = 1$) a limit point of $A$, and let $\lim_{x \to x_0} f(x) = L \in \mathbb{R}$ and $\lim_{x \to x_0} g(x) = H \in \mathbb{R}$.

(i) If $f(x) \geq g(x)$ in a neighborhood $B_r(x_0)$ of $x_0$, then $L \geq H$.

(ii) If $L > H$, then there exists a neighborhood of $x_0$ in which $f(x) > g(x)$.

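Proof (i) By contradiction, assume that $L < H$. Set $\varepsilon = H - L$, so that $\varepsilon > 0$. The neighborhoods $(L - \varepsilon/4, L + \varepsilon/4)$ and $(H - \varepsilon/4, H + \varepsilon/4)$ are disjoint since $L + \varepsilon/4 < H - \varepsilon/4$. Since $\lim_{x \to x_0} f(x) = L$, there exists $\delta_1 > 0$ such that
$$x \in (x_0 - \delta_1, x_0 + \delta_1), \; x \neq x_0 \implies f(x) \in \left( L - \frac{\varepsilon}{4}, L + \frac{\varepsilon}{4} \right)$$
Analogously, since $\lim_{x \to x_0} g(x) = H$, there exists $\delta_2 > 0$ such that
$$x \in (x_0 - \delta_2, x_0 + \delta_2), \; x \neq x_0 \implies g(x) \in \left( H - \frac{\varepsilon}{4}, H + \frac{\varepsilon}{4} \right)$$
By setting $\delta = \min\{\delta_1, \delta_2\}$, we have
$$x \in (x_0 - \delta, x_0 + \delta), \; x \neq x_0 \implies L - \frac{\varepsilon}{4} < f(x) < L + \frac{\varepsilon}{4} < H - \frac{\varepsilon}{4} < g(x) < H + \frac{\varepsilon}{4}$$
that is, $f(x) < g(x)$ for every $x \in B_\delta(x_0)$ with $x \neq x_0$. This contradicts the hypothesis that $f(x) \geq g(x)$ in a neighborhood $B_r(x_0)$ of $x_0$.
(ii) If we had $f(x) \leq g(x)$ in every neighborhood of $x_0$, then by (i) we would have $L \leq H$.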

Observe that in (i) $L \geq H$ continues to hold also when we have the strict inequality $f(x) > g(x)$. For example, if $f(x) = 1/x$ and $g(x) = 0$, for $x \to +\infty$ we have $L = H = 0$ although $f(x) > g(x)$ for every $x > 0$.

11.5 Algebra of limits


The next result concerns the behavior of limits with respect to the fundamental operations.7
It is the version for functions of the analogous properties of sequences seen in Proposition
297. An analogous result, omitted for brevity, holds for Proposition 302.

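Proposition 428 Given two functions $f, g : A \subseteq \mathbb{R}^n \to \mathbb{R}$ and a limit point $x_0 \in \mathbb{R}^n$ of $A$, suppose that $\lim_{x \to x_0} f(x) = L \in \overline{\mathbb{R}}$ and $\lim_{x \to x_0} g(x) = M \in \overline{\mathbb{R}}$. Then:

(i) $\lim_{x \to x_0} (f + g)(x) = L + M$, provided that $L + M$ is not an indeterminate form (1.24), of the type
$$+\infty - \infty \quad \text{or} \quad -\infty + \infty$$

(ii) $\lim_{x \to x_0} (fg)(x) = LM$, provided that $LM$ is not an indeterminate form (1.25), of the type
$$\pm\infty \cdot 0 \quad \text{or} \quad 0 \cdot (\pm\infty)$$

(iii) $\lim_{x \to x_0} (f/g)(x) = L/M$, provided that $g(x) \neq 0$ in a neighborhood of $x_0$, with $x \neq x_0$, and $L/M$ is not an indeterminate form (1.26), of the type⁸
$$\frac{\pm\infty}{\pm\infty} \quad \text{or} \quad \frac{a}{0}$$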

Proof We prove only (i), leaving to the reader the analogous proof of (ii) and (iii). Let $\{x_n\}$ be a sequence in $A$, with $x_n \neq x_0$ for every $n \geq 1$, such that $x_n \to x_0$. By Proposition 420, $f(x_n) \to L$ and $g(x_n) \to M$. Let us suppose that $L + M$ is not an indeterminate form. By Proposition 297, $(f + g)(x_n) \to L + M$, and therefore, by Proposition 420, it follows that $\lim_{x \to x_0} (f + g)(x) = L + M$.

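Example 429 Let $f, g : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ be given by $f(x) = \sin x / x$ and $g(x) = 1/|x|$. We have $\lim_{x \to 0} \sin x / x = 1$ and $\lim_{x \to 0} 1/|x| = +\infty$, and therefore
$$\lim_{x \to 0} \left( \frac{\sin x}{x} + \frac{1}{|x|} \right) = 1 + \infty = +\infty$$
If, instead, $g(x) = e^x$, we have $\lim_{x \to 0} (\sin x / x + e^x) = 1 + 1 = 2$. ▲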

As in the case of sequences, the case $a/0$ of point (iii) with $a \neq 0$ is not an indeterminacy for the algebra of limits, as the following version for functions of Proposition 300 shows.

Proposition 430 Let $\lim_{x \to x_0} f(x) = L \in \overline{\mathbb{R}}$, with $L \neq 0$, and $\lim_{x \to x_0} g(x) = 0$. The limit $\lim_{x \to x_0} (f/g)(x)$ exists if and only if there is a neighborhood $U(x_0)$ of $x_0 \in \mathbb{R}^n$ where the function $g$ has constant sign, except at most at $x_0$. In such a case:⁹

(i) if $L > 0$ and $g \to 0^+$ or if $L < 0$ and $g \to 0^-$, then
$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = +\infty$$

(ii) if $L > 0$ and $g \to 0^-$ or if $L < 0$ and $g \to 0^+$, then
$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = -\infty$$

7 As in the previous section, we will consider limits at points $x_0 \in \mathbb{R}^n$, leaving to the reader the case $x \to \pm\infty$ for functions of one variable.
8 As in the case of sequences, we observe that excluding the indeterminacy $a/0$ is equivalent to requiring that $M \neq 0$.
9 Here $g \to 0^+$ and $g \to 0^-$ indicate that $\lim_{x \to x_0} g(x) = 0$ with, respectively, $g(x) > 0$ and $g(x) < 0$ for every $x_0 \neq x \in U(x_0)$.

Example 431 Consider $f(x) = x + 5$ and $g(x) = x$. As $x \to 0$ we have $f \to 5$, but in every neighborhood of $0$ the sign of the function $g(x)$ alternates, that is, there is no neighborhood of $0$ where $g$ has constant sign. By Proposition 430, the limit of $(f/g)(x)$ as $x \to 0$ does not exist. ▲
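A short numerical sketch makes the failure in Example 431 visible: the quotient $(x + 5)/x$ takes large positive values just to the right of $0$ and large negative values just to the left. The sample points are an arbitrary choice for this illustration.

```python
def quotient(x: float) -> float:
    """(f/g)(x) = (x + 5) / x, for x != 0, as in Example 431."""
    return (x + 5.0) / x

hs = [10.0 ** (-k) for k in range(1, 6)]

right = [quotient(h) for h in hs]    # x -> 0 from the right: values grow to +infinity
left = [quotient(-h) for h in hs]    # x -> 0 from the left: values fall to -infinity

print(right[-1], left[-1])
```

The two one-sided behaviors disagree, in line with the constant-sign condition on $g$ required by Proposition 430.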

As in the previous section, we have considered limits at points $x_0 \in \mathbb{R}^n$. The reader can verify that the results of this section extend to the case $x \to \pm\infty$ for functions of one variable.

Example 432 Take $f(x) = 1/x - 1$ and $g(x) = 1/x$. As $x \to +\infty$ we have $f \to -1$ and $g \to 0$. Since $g(x) > 0$ for every $x > 0$, and therefore also in any neighborhood of $+\infty$, we have $g \to 0^+$. Thanks to the version for $x \to \pm\infty$ of Proposition 430, one obtains $\lim_{x \to +\infty} (f/g)(x) = -\infty$. ▲

11.5.1 Indeterminacies for limits


The algebra of limits presents indeterminacies analogous to those we have seen in the case of sequences. We review them briefly.

Indeterminacy $\infty - \infty$
Consider, for example, the limit $\lim_{x \to 0} (f + g)(x)$ of the sum of the functions $f, g : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ given by $f(x) = 1/x^2$ and $g(x) = -1/x^4$, which falls under this indeterminacy. We have
$$(f + g)(x) = \frac{1}{x^2} - \frac{1}{x^4} = \frac{1}{x^2} \left( 1 - \frac{1}{x^2} \right)$$
and therefore
$$\lim_{x \to 0} (f + g)(x) = \lim_{x \to 0} \frac{1}{x^2} \cdot \lim_{x \to 0} \left( 1 - \frac{1}{x^2} \right) = -\infty$$
since $(+\infty) \cdot (-\infty)$ is not an indeterminacy. Exchanging the signs between these two functions, that is, setting $f(x) = -1/x^2$ and $g(x) = 1/x^4$, we have again the indeterminacy $\infty - \infty$ at $x_0 = 0$, but this time $\lim_{x \to 0} (f + g)(x) = +\infty$. Clearly, also in the case of functions the indeterminate forms can give completely different results and they must be solved case by case.
Finally note that such $f$ and $g$ give rise to an indeterminacy at $x_0 = 0$, but not at $x_0 \neq 0$. Therefore, it is crucial to specify the point $x_0$ that we are considering.
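The opposite outcomes of the two $\infty - \infty$ examples can be seen by evaluating the sums near $0$. This is a small numerical sketch; the function names `s1` and `s2` are hypothetical labels for the two sums.

```python
def s1(x: float) -> float:
    """(f + g)(x) with f(x) = 1/x**2 and g(x) = -1/x**4."""
    return 1.0 / x ** 2 - 1.0 / x ** 4

def s2(x: float) -> float:
    """(f + g)(x) with the signs exchanged: f(x) = -1/x**2 and g(x) = 1/x**4."""
    return -1.0 / x ** 2 + 1.0 / x ** 4

xs = [10.0 ** (-k) for k in range(1, 5)]

values_s1 = [s1(x) for x in xs]   # increasingly negative as x -> 0
values_s2 = [s2(x) for x in xs]   # increasingly positive as x -> 0

print(values_s1)
print(values_s2)
```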

Indeterminacy $0 \cdot \infty$
For example, let $f, g : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = (x - 3)^2$ and $g(x) = 1/(x - 3)^4$. The limit $\lim_{x \to 3} (fg)(x)$ falls under this indeterminacy. But we have
$$\lim_{x \to 3} (fg)(x) = \lim_{x \to 3} (x - 3)^2 \cdot \frac{1}{(x - 3)^4} = \lim_{x \to 3} \frac{1}{(x - 3)^2} = +\infty$$
On the other hand, considering $f(x) = 1/(x - 3)^2$ and $g(x) = (x - 3)^4$, we have
$$\lim_{x \to 3} (fg)(x) = \lim_{x \to 3} \frac{1}{(x - 3)^2} \cdot (x - 3)^4 = \lim_{x \to 3} (x - 3)^2 = 0$$
Again, only the direct calculation of the limit can determine its value.

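Indeterminacies $\infty/\infty$ and $0/0$


For example, let $f, g : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = 5 - x$ and $g(x) = x^2 - 25$. The limit of their quotient as $x \to 5$ has the form $0/0$, but
$$\lim_{x \to 5} \left( \frac{f}{g} \right)(x) = \lim_{x \to 5} \frac{5 - x}{x^2 - 25} = \lim_{x \to 5} \frac{5 - x}{(x - 5)(x + 5)} = \lim_{x \to 5} \frac{-1}{x + 5} = -\frac{1}{10}$$
On the other hand, taking $f, g : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2$ and $g(x) = x$, as $x \to +\infty$ we have a form of the type $\infty/\infty$ and
$$\lim_{x \to +\infty} \left( \frac{f}{g} \right)(x) = \lim_{x \to +\infty} \frac{x^2}{x} = \lim_{x \to +\infty} x = +\infty$$
while for $x \to -\infty$ we still have a form of the type $\infty/\infty$ and
$$\lim_{x \to -\infty} \left( \frac{f}{g} \right)(x) = \lim_{x \to -\infty} \frac{x^2}{x} = \lim_{x \to -\infty} x = -\infty$$
with results of opposite sign in the two cases: again, one cannot avoid the direct calculation of the limit.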

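For the functions $f$ and $g$ just seen, at the point $x_0 = 0$ we have the indeterminacy $0/0$, but
$$\lim_{x \to 0} \left( \frac{f}{g} \right)(x) = \lim_{x \to 0} \frac{x^2}{x} = \lim_{x \to 0} x = 0$$
while, setting $g(x) = x^4$, we still have a form of the type $0/0$ and
$$\lim_{x \to 0} \left( \frac{f}{g} \right)(x) = \lim_{x \to 0} \frac{x^2}{x^4} = \lim_{x \to 0} \frac{1}{x^2} = +\infty$$
On the other hand, taking $f : \mathbb{R}_+ \to \mathbb{R}$ given by $f(x) = x + \sqrt{x} - 2$ and $g : \mathbb{R} \setminus \{1\} \to \mathbb{R}$ given by $g(x) = x - 1$, we have
$$\lim_{x \to 1} \left( \frac{f}{g} \right)(x) = \lim_{x \to 1} \frac{x + \sqrt{x} - 2}{x - 1} = \lim_{x \to 1} \frac{x - 1 + \sqrt{x} - 1}{x - 1} = \lim_{x \to 1} \left( 1 + \frac{\sqrt{x} - 1}{x - 1} \right)$$
$$= 1 + \lim_{x \to 1} \frac{\sqrt{x} - 1}{(\sqrt{x} - 1)(\sqrt{x} + 1)} = 1 + \lim_{x \to 1} \frac{1}{\sqrt{x} + 1} = 1 + \frac{1}{2} = \frac{3}{2}$$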
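Two of the $0/0$ quotients computed above can be cross-checked numerically by evaluating them close to the limit point. The sketch below is only a sanity check of the algebra; the names `q1`, `q2` and the step sizes are assumptions of this illustration.

```python
import math

def q1(x: float) -> float:
    """(5 - x) / (x**2 - 25), for x != 5 and x != -5."""
    return (5.0 - x) / (x ** 2 - 25.0)

def q2(x: float) -> float:
    """(x + sqrt(x) - 2) / (x - 1), for x >= 0 and x != 1."""
    return (x + math.sqrt(x) - 2.0) / (x - 1.0)

hs = [10.0 ** (-k) for k in range(1, 7)]

# Values on both sides of the limit points x0 = 5 and x0 = 1.
q1_values = [(q1(5.0 + h), q1(5.0 - h)) for h in hs]   # expected to approach -1/10
q2_values = [(q2(1.0 + h), q2(1.0 - h)) for h in hs]   # expected to approach 3/2

print(q1_values[-1], q2_values[-1])
```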
We close with two observations.
11.6. ELEMENTARY LIMITS AND IMPORTANT LIMITS 293

As for sequences, for functions the various indeterminacies can be reduced to one
another.

Also in the case of functions we can summarize what we have seen till now in tables
completely identical to those in Section 8.8.4, to which we refer.

11.6 Elementary limits and important limits


11.6.1 Elementary limits
Using the definitions seen in the previous section, we calculate some elementary limits. Other significant limits, that require the preliminary proof of some properties, will be considered in Section 11.6.
The next examples, which exploit elementary properties of limits, provide limits of integer powers, hyperbolic, exponential, logarithmic and trigonometric functions.

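Example 433 (i) Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x^n$ with $n \geq 1$. For every $x_0 \in \mathbb{R}$, by the elementary properties of limits, we have
$$\lim_{x \to x_0} x^n = x_0^n$$
Moreover, $\lim_{x \to \pm\infty} x^n = +\infty$ if $n$ is even, while $\lim_{x \to +\infty} x^n = +\infty$ and $\lim_{x \to -\infty} x^n = -\infty$ if $n$ is odd.

(ii) Let $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ be given by $f(x) = 1/x^n$ for $n \geq 1$. For every $0 \neq x_0 \in \mathbb{R}$, we have
$$\lim_{x \to x_0} f(x) = \frac{1}{x_0^n}$$
Moreover, $\lim_{x \to \pm\infty} 1/x^n = 0^+$ if $n$ is even, while $\lim_{x \to +\infty} 1/x^n = 0^+$ and $\lim_{x \to -\infty} 1/x^n = 0^-$ if $n$ is odd. Finally, $\lim_{x \to 0^+} 1/x^n = +\infty$ and $\lim_{x \to 0^-} 1/x^n = -\infty$ if $n$ is odd, while $\lim_{x \to 0^+} 1/x^n = \lim_{x \to 0^-} 1/x^n = +\infty$ if $n$ is even.

(iii) Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = \alpha^x$, with $\alpha > 0$. For every $x_0 \in \mathbb{R}$, we have $\lim_{x \to x_0} \alpha^x = \alpha^{x_0}$. Moreover,
$$\lim_{x \to -\infty} \alpha^x = \begin{cases} 0 & \text{if } \alpha > 1 \\ 1 & \text{if } \alpha = 1 \\ +\infty & \text{if } \alpha < 1 \end{cases} \quad \text{and} \quad \lim_{x \to +\infty} \alpha^x = \begin{cases} +\infty & \text{if } \alpha > 1 \\ 1 & \text{if } \alpha = 1 \\ 0 & \text{if } \alpha < 1 \end{cases}$$

(iv) Let $f : \mathbb{R}_{++} \to \mathbb{R}$ be given by $f(x) = \log_a x$, with $a > 0$, $a \neq 1$. For every $x_0 > 0$, we have $\lim_{x \to x_0} \log_a x = \log_a x_0$. Moreover,
$$\lim_{x \to 0^+} \log_a x = \begin{cases} -\infty & \text{if } a > 1 \\ +\infty & \text{if } a < 1 \end{cases} \quad \text{and} \quad \lim_{x \to +\infty} \log_a x = \begin{cases} +\infty & \text{if } a > 1 \\ -\infty & \text{if } a < 1 \end{cases}$$

(v) Let $f, g : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = \sin x$ and $g(x) = \cos x$. For every $x_0 \in \mathbb{R}$, we have $\lim_{x \to x_0} \sin x = \sin x_0$ and $\lim_{x \to x_0} \cos x = \cos x_0$. The limits $\lim_{x \to \pm\infty} \sin x$ and $\lim_{x \to \pm\infty} \cos x$ do not exist. ▲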

11.6.2 Important limits


Thanks to the definitions and to the elementary limits, we can now calculate some non-trivial,
frequently encountered limits.
We already met the first such limit, (11.30), in the introduction to the present chapter.

Proposition 434 Let $f,g:\mathbb{R}\setminus\{0\}\to\mathbb{R}$ be defined by $f(x)=\sin x/x$ and $g(x)=(\cos x-1)/x$. Then
\[
\lim_{x\to 0}\frac{\sin x}{x}=1 \tag{11.30}
\]
and
\[
\lim_{x\to 0}\frac{1-\cos x}{x}=0,\qquad\lim_{x\to 0}\frac{1-\cos x}{x^{2}}=\frac{1}{2} \tag{11.31}
\]

Proof It is easy to see graphically that $0<\sin x<x<\tan x$ for $x\in(0,\pi/2)$ and that $\tan x<x<\sin x<0$ for $x\in(-\pi/2,0)$. Therefore, dividing all the terms by $\sin x$ and observing that $\sin x>0$ when $x\in(0,\pi/2)$ and $\sin x<0$ when $x\in(-\pi/2,0)$, we have in all the cases
\[
1<\frac{x}{\sin x}<\frac{1}{\cos x}
\]
Since $1/\cos x\to 1$ as $x\to 0$, the first limit then follows from the comparison criterion. For the third one, it is sufficient to observe that
\[
\frac{1-\cos x}{x^{2}}=\frac{1-\cos x}{x^{2}}\cdot\frac{1+\cos x}{1+\cos x}=\frac{1-\cos^{2}x}{x^{2}(1+\cos x)}=\frac{\sin^{2}x}{x^{2}}\cdot\frac{1}{1+\cos x}
\]
and that, as $x\to 0$, the first factor tends to $1$ while the second one tends to $1/2$.
Finally, the second limit follows immediately from the third one:
\[
\frac{1-\cos x}{x}=x\cdot\frac{1-\cos x}{x^{2}}\to 0\cdot\frac{1}{2}=0
\]
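As a numerical companion to Proposition 434, the short Python sketch below (our own illustration; the function names are hypothetical) evaluates the three quotients at points approaching $0$ from both sides and checks the sandwich inequality used in the proof at a few sample points. It is a sanity check under floating-point arithmetic, not a substitute for the proof.

```python
import math

def sin_ratio(x):
    """sin(x)/x, expected to approach 1 as x -> 0."""
    return math.sin(x) / x

def cos_ratio_1(x):
    """(1 - cos x)/x, expected to approach 0 as x -> 0."""
    return (1 - math.cos(x)) / x

def cos_ratio_2(x):
    """(1 - cos x)/x^2, expected to approach 1/2 as x -> 0."""
    return (1 - math.cos(x)) / x**2

def sandwich_holds(x):
    """Check 1 < x/sin x < 1/cos x for x in (-pi/2, pi/2), x != 0."""
    return 1 < x / math.sin(x) < 1 / math.cos(x)

for x in (1e-1, 1e-2, 1e-3, -1e-3):
    print(x, sin_ratio(x), cos_ratio_1(x), cos_ratio_2(x), sandwich_holds(x))
```

Very small values of $x$ (say below $10^{-7}$) are deliberately avoided: the subtraction $1-\cos x$ then loses most of its significant digits in floating-point arithmetic.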

From the analogous propositions for sequences we easily deduce (the proofs are essentially
identical) the following ones:

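(i) If $f(x)\to\pm\infty$ as $x\to x_0$, then
\[
\lim_{x\to x_0}\left(1+\frac{k}{f(x)}\right)^{f(x)}=e^{k}
\]
In particular,
\[
\lim_{x\to x_0}\left(1+\frac{1}{f(x)}\right)^{f(x)}=e,\qquad\lim_{x\to\pm\infty}\left(1+\frac{1}{x}\right)^{x}=e
\]

(ii) Let $a>0$ and $f(x)\to 0$ as $x\to x_0$. Then
\[
\lim_{x\to x_0}\frac{a^{f(x)}-1}{f(x)}=\log a
\]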

In particular,
\[
\lim_{x\to 0}\frac{a^{x}-1}{x}=\log a \tag{11.32}
\]
which, when $a=e$, becomes
\[
\lim_{x\to 0}\frac{e^{x}-1}{x}=1
\]

(iii) Let $0<a\neq 1$ and $f(x)\to 0$ as $x\to x_0$. Then
\[
\lim_{x\to x_0}\frac{\log_a(1+f(x))}{f(x)}=\frac{1}{\log a}
\]
In particular,
\[
\lim_{x\to 0}\frac{\log_a(1+x)}{x}=\frac{1}{\log a}
\]
which, when $a=e$, becomes
\[
\lim_{x\to 0}\frac{\log(1+x)}{x}=1
\]

(iv) If $f(x)\to 0$ as $x\to x_0$, we have
\[
\lim_{x\to x_0}\frac{(1+f(x))^{\alpha}-1}{f(x)}=\alpha
\]
In particular,
\[
\lim_{x\to 0}\frac{(1+x)^{\alpha}-1}{x}=\alpha
\]
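The four limits (i)-(iv) lend themselves to a quick numerical check in their "in particular" versions. In the Python sketch below (ours, not from the text) the values `k = 2`, `a = 3`, `alpha = 0.5` and the sample points `x_big`, `x_small` are arbitrary assumptions chosen only for illustration.

```python
import math

k, a, alpha = 2.0, 3.0, 0.5      # illustrative parameters (assumptions)
x_big, x_small = 1e6, 1e-6       # stand-ins for x -> +infinity and x -> 0

# (i) (1 + k/x)^x should be close to e^k for large x.
approx_exp_k = (1 + k / x_big) ** x_big

# (ii) (a^x - 1)/x should be close to log a for small x.
approx_log_a = (a ** x_small - 1) / x_small

# (iii) log_a(1 + x)/x should be close to 1/log a for small x.
approx_inv_log_a = math.log(1 + x_small, a) / x_small

# (iv) ((1 + x)^alpha - 1)/x should be close to alpha for small x.
approx_alpha = ((1 + x_small) ** alpha - 1) / x_small

print(approx_exp_k, math.exp(k))
print(approx_log_a, math.log(a))
print(approx_inv_log_a, 1 / math.log(a))
print(approx_alpha, alpha)
```

Here `math.log(a)` is the natural logarithm, matching the convention $\log=\log_e$ used in the text, while `math.log(y, a)` computes $\log_a y$.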

11.7 Orders of convergence and of divergence


As for sequences, some functions tend to their limit “faster” than others.
For simplicity we confine our discussion to functions of one variable, for which all the
conclusions we reached for sequences hold. We nevertheless give the analogue of Definition 316. Note
the importance of the clause “as $x\to x_0$”, which is the only real novelty with respect
to the case of sequences, where it was not needed because the only limit considered was the one
as $n\to+\infty$. In other words, the clause “$x\to x_0$” replaces the clause “$n\to+\infty$” used
in the asymptotic comparison of sequences.

Definition 435 Given two functions $f,g:A\subseteq\mathbb{R}\to\mathbb{R}$, let $x_0\in\overline{\mathbb{R}}$ be a limit point of $A$ for which there exists a neighborhood $B_{\varepsilon}(x_0)$ such that $g(x)\neq 0$ for every $x\in A\cap B_{\varepsilon}(x_0)$.

(i) If
\[
\lim_{x\to x_0}\frac{f(x)}{g(x)}=0
\]
we say that $f$ is negligible with respect to $g$ as $x\to x_0$; in symbols,
\[
f=o(g)\quad\text{as }x\to x_0
\]

(ii) If
\[
\lim_{x\to x_0}\frac{f(x)}{g(x)}=k\neq 0 \tag{11.33}
\]
we say that $f$ is comparable with $g$ as $x\to x_0$; in symbols,
\[
f\asymp g\quad\text{as }x\to x_0
\]

(iii) In particular, if
\[
\lim_{x\to x_0}\frac{f(x)}{g(x)}=1
\]
we say that $f$ and $g$ are asymptotic (or asymptotically equivalent) to one another as $x\to x_0$ and we write
\[
f(x)\sim g(x)\quad\text{as }x\to x_0
\]

Terminology For functions, too, the expression $f=o(g)$ as $x\to x_0$ reads “$f$ is little-o of $g$, as $x\to x_0$”.

It is easy to see that for functions, too, the relations $\asymp$ and $\sim$ enjoy the same properties
seen in Section 8.12 for sequences:

(i) the relations of comparability and of asymptotic equivalence are symmetric and transitive;

(ii) the relation of negligibility is transitive;

(iii) if the limits $\lim_{x\to x_0}f(x)$ and $\lim_{x\to x_0}g(x)$ are both finite and non-zero, then $f\asymp g$ as $x\to x_0$;

(iv) if $\lim_{x\to x_0}f(x)=0$ and $0\neq\lim_{x\to x_0}g(x)\in\mathbb{R}$, then $f=o(g)$ as $x\to x_0$.

We now consider the cases, the most interesting ones also for functions, in which both functions
converge to zero or diverge to $\pm\infty$. We start with convergence to zero: $\lim_{x\to x_0}f(x)=\lim_{x\to x_0}g(x)=0$. In this case, intuitively, $f$ is negligible with respect to $g$ as $x\to x_0$ if it
tends to zero faster. Let, for example, $x_0=1$, $f(x)=(x-1)^{2}$ and $g(x)=x-1$. We
have
\[
\lim_{x\to 1}\frac{(x-1)^{2}}{x-1}=\lim_{x\to 1}(x-1)=0
\]
that is, $f=o(g)$ as $x\to 1$. On the other hand, as $x\to+\infty$, we have
\[
\lim_{x\to+\infty}\frac{\sqrt{x}}{\sqrt{x+1}}=\lim_{x\to+\infty}\sqrt{1-\frac{1}{x+1}}=1
\]
and therefore the functions $f(x)=\sqrt{x}$ and $g(x)=\sqrt{x+1}$ are comparable (indeed,
asymptotic to one another) as $x\to+\infty$.

Let us now consider two functions that both tend to $\pm\infty$ as $x\to x_0$. In this case, intuitively, $f$
is negligible with respect to $g$ when it tends to infinity more slowly, that is, when its values grow
in absolute value less rapidly than those of $g$. For example, if $f(x)=x$ and $g(x)=x^{2}$, for
$x_0=+\infty$ we have
\[
\lim_{x\to+\infty}\frac{x}{x^{2}}=\lim_{x\to+\infty}\frac{1}{x}=0
\]
and therefore $f=o(g)$ as $x\to+\infty$. When $x\to-\infty$, too, we have
\[
\lim_{x\to-\infty}\frac{x}{x^{2}}=\lim_{x\to-\infty}\frac{1}{x}=0
\]
and therefore $f=o(g)$ also as $x\to-\infty$: in both cases $x$ tends to infinity more slowly than $x^{2}$.
Note that, as $x\to 0$, we have instead $\lim_{x\to 0}x^{2}=\lim_{x\to 0}x=0$, and
\[
\lim_{x\to 0}\frac{x^{2}}{x}=\lim_{x\to 0}x=0
\]
and therefore $g=o(f)$ as $x\to 0$.
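The dependence of negligibility on the point $x_0$ can be made concrete with a few evaluations. The Python sketch below is our own illustration (the helper `ratio` and the chosen sample points are assumptions): it tabulates $f/g$ for $f(x)=x$, $g(x)=x^{2}$ along points moving towards $+\infty$ and $-\infty$, and $g/f$ along points moving towards $0$.

```python
def ratio(num, den, x):
    """Evaluate num(x)/den(x) at a point where den(x) != 0."""
    return num(x) / den(x)

f = lambda x: x
g = lambda x: x**2

# f/g along x -> +infinity and x -> -infinity: values shrink to 0, so f = o(g).
at_plus_inf = [ratio(f, g, 10.0**p) for p in (2, 4, 6)]
at_minus_inf = [ratio(f, g, -(10.0**p)) for p in (2, 4, 6)]

# g/f along x -> 0: values shrink to 0, so here the roles swap and g = o(f).
at_zero = [ratio(g, f, 10.0**(-p)) for p in (2, 4, 6)]

print(at_plus_inf)
print(at_minus_inf)
print(at_zero)
```

The same pair of functions thus yields $f=o(g)$ at $\pm\infty$ and $g=o(f)$ at $0$, which is why the clause “as $x\to x_0$” cannot be dropped.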

For functions, too, the meaning of negligibility must be specified according to whether
we consider convergence to zero or divergence to infinity. Moreover, the point $x_0$ where we
take the limit is essential, and this represents the only meaningful novelty with respect to
what we have seen for sequences.

11.7.1 Little-o algebra


As for sequences, for functions the application of the concept of “little-o” is not always
straightforward. Indeed, knowing that a function $f$ is little-o of another function $g$ as $x\to x_0$
does not give much information on the form of $f$, apart from its being negligible with respect to
$g$. In this case too there exists an “algebra” of little-o that allows us to manipulate safely the
little-o of sums and products of functions. To avoid complicated notation, in what follows
we always assume that the negligibility of the various functions is understood as $x$ tends
to the same point $x_0$, and we therefore omit writing “as $x\to x_0$” (in any case, it
would not make sense to consider sums or products of little-o at different points).

Proposition 436 For every pair of functions $f$ and $g$ and for every scalar $c\neq 0$, it holds
that:

(i) $o(f)+o(f)=o(f)$;

(ii) $o(f)\,o(g)=o(fg)$;

(iii) $c\cdot o(f)=o(f)$;

(iv) $o(g)+o(f)=o(f)$ if $g=o(f)$.

The expression $o(f)+o(f)=o(f)$ in (i), bizarre at first sight, simply means that the sum of
two little-o of the same function is still a little-o of that function, that is, it continues to be
negligible with respect to that function. Re-reading the other properties of
the proposition in the same way makes them easier to understand. Note that (ii) has the remarkable special case
\[
o(f)\,o(f)=o(f^{2})
\]



Proof As for sequences, if a function is little-o of $f$ it can be written as $f\varepsilon$, where $\varepsilon$ is an
infinitesimal. Indeed,
\[
\lim_{x\to x_0}\frac{f\varepsilon}{f}=\lim_{x\to x_0}\varepsilon=0
\]
and therefore $f\varepsilon$ is little-o of $f$. The proof will be based on this very useful device.

(i) Let us call $f\varepsilon$ the first of the two little-o to the left of the equality symbol, and $f\eta$
the second one, with $\varepsilon$ and $\eta$ two infinitesimals as $x\to x_0$. Then
\[
\lim_{x\to x_0}\frac{f\varepsilon+f\eta}{f}=\lim_{x\to x_0}(\varepsilon+\eta)=0
\]
which shows that $o(f)+o(f)$ is $o(f)$.

(ii) Let us call $f\varepsilon$ the little-o of $f$ and $g\eta$ the little-o of $g$, with $\varepsilon$ and $\eta$ two infinitesimals
as $x\to x_0$. Then
\[
\lim_{x\to x_0}\frac{f\varepsilon\cdot g\eta}{fg}=\lim_{x\to x_0}(\varepsilon\eta)=0
\]
so that $o(f)\,o(g)$ is $o(fg)$.

(iii) Let us call $f\varepsilon$ the little-o of $f$, with $\varepsilon$ infinitesimal as $x\to x_0$. Then
\[
\lim_{x\to x_0}\frac{c\cdot f\varepsilon}{f}=c\cdot\lim_{x\to x_0}\varepsilon=0
\]
which shows that $c\cdot o(f)$ is $o(f)$.

(iv) Write $g=f\varepsilon$, with $\varepsilon$ infinitesimal as $x\to x_0$. Then, the little-o of $g$ can be
written as $g\eta$, that is, $f\varepsilon\eta$, with $\eta$ infinitesimal as $x\to x_0$. Moreover, we call $f\delta$ the
little-o of $f$, with $\delta$ infinitesimal as $x\to x_0$. Then
\[
\lim_{x\to x_0}\frac{f\varepsilon\eta+f\delta}{f}=\lim_{x\to x_0}(\varepsilon\eta+\delta)=0
\]
so that $o(g)+o(f)=o(f)$.

Example 437 Let $f(x)$ be the function given by $f(x)=x^{n}$, with $n>2$. Consider the
functions $g(x)=x^{n-1}$ and $h(x)=e^{-x}-3x^{n-1}$. It is immediate to verify that $g=o(f)=o(x^{n})$ and $h=o(f)=o(x^{n})$ as $x\to+\infty$.

(i) Summing the two functions we obtain $g+h=e^{-x}-2x^{n-1}$, which is still $o(x^{n})$ as
$x\to+\infty$, in accordance with (i) proved above.

(ii) Multiplying the two functions we obtain $g\cdot h=x^{n-1}e^{-x}-3x^{2n-2}$, which is $o(x^{n}\cdot x^{n})$,
i.e., $o(x^{2n})$ as $x\to+\infty$, in accordance with (ii) proved above (in the special case
$o(f)\,o(f)$). Note that (since $n>2$) $g\cdot h$ is not $o(x^{n})$.

(iii) Let us set $c=3$, and consider $c\cdot g=3x^{n-1}$. It is immediate to verify that $3x^{n-1}$ is
still $o(x^{n})$ as $x\to+\infty$, in accordance with (iii) proved above.

(iv) Consider the function $l(x)=x+1$. It is immediate to verify that $l=o(g)=o(x^{n-1})$
as $x\to+\infty$. Consider now the sum $l+h$ (with $h$ defined above), which is a sum of
an $o(g)$ and of an $o(f)$, with $g=o(f)$. We have $l+h=x+1+e^{-x}-3x^{n-1}$, which is
$o(x^{n})$ as $x\to+\infty$, i.e., $o(f)$, in accordance with (iv) proved above. Note that $l+h$ is
not $o(g)$, even if $l$ is an $o(g)$. ▲
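The claims of Example 437 can be tried out for a specific exponent. The following Python sketch is our own illustration and fixes $n=3$ and the evaluation point $x=10^{6}$ as assumptions; each variable stores the ratio that, according to the corresponding item, should be close to $0$ (or, for the last two, should fail to be).

```python
import math

n = 3                                   # any n > 2 would do (assumption)
f = lambda x: x**n
g = lambda x: x**(n - 1)
h = lambda x: math.exp(-x) - 3 * x**(n - 1)
l = lambda x: x + 1

x = 1e6                                 # stand-in for x -> +infinity

sum_ratio = (g(x) + h(x)) / f(x)        # (i):   g + h is o(x^n)
prod_ratio = (g(x) * h(x)) / f(x)**2    # (ii):  g * h is o(x^(2n))
scaled_ratio = 3 * g(x) / f(x)          # (iii): 3g is o(x^n)
absorbed_ratio = (l(x) + h(x)) / f(x)   # (iv):  l + h is o(x^n)

prod_vs_f = (g(x) * h(x)) / f(x)        # g * h is NOT o(x^n): ratio is huge
sum_vs_g = (l(x) + h(x)) / g(x)         # l + h is NOT o(g): ratio is near -3

print(sum_ratio, prod_ratio, scaled_ratio, absorbed_ratio)
print(prod_vs_f, sum_vs_g)
```

The last line shows numerically why $l+h$ is not $o(g)$: the term $-3x^{n-1}$ in $h$ is comparable with $g$, so the quotient settles near $-3$ instead of $0$.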

N.B. (i) To state that a function is $o(1)$ as $x\to x_0$ simply means that it tends to $0$ as
$x\to x_0$. Indeed, $f=o(1)$ means that $f/1=f\to 0$ as $x\to x_0$. (ii) The fourth property of
the previous proposition is particularly important, since it highlights that if $g$ is negligible
with respect to $f$, in the sum $o(g)+o(f)$ the little-o $o(g)$ is “absorbed” by $o(f)$. ▽

Let us see some classical examples of functions with different rates of divergence.

Proposition 438 If $k,h>0$, $\alpha>1$ and $a>1$, we have

(i) $x^{k}=o(\alpha^{x})$ as $x\to+\infty$, that is,
\[
\lim_{x\to+\infty}\frac{x^{k}}{\alpha^{x}}=0
\]

(ii) $x^{h}=o(x^{k})$ as $x\to+\infty$ if $h<k$;

(iii) $\log_a x=o(x^{k})$ as $x\to+\infty$, that is,
\[
\lim_{x\to+\infty}\frac{\log_a x}{x^{k}}=0
\]

Note that, by the transitivity property of the negligibility relation, from (i) and (iii) it
follows that
\[
\log_a x=o(\alpha^{x})\quad\text{as }x\to+\infty
\]
Proof Each of the three functions $\alpha^{x}$, $x^{k}$, $\log_a x$ is increasing, so that $f(n)\leq f(x)\leq f(n+1)$,
where $n=[x]$ is the integer part of $x$. It is then
sufficient to use the sequence definition of the limit of a function together with the corresponding results for sequences and the comparison
criterion.
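The three ratios of Proposition 438 can be evaluated for concrete parameters. In the Python sketch below (our own; the values `k = 5`, `h = 2`, `alpha = 2`, `a = 10` are assumptions picked for illustration) the ratios are small at $x=200$, although the first one is still larger than $1$ at $x=10$: negligibility is a statement about the limit, not about every point.

```python
import math

k, h, alpha, a = 5.0, 2.0, 2.0, 10.0    # illustrative parameters with h < k

def power_over_exp(x):
    """x^k / alpha^x, item (i)."""
    return x**k / alpha**x

def power_over_power(x):
    """x^h / x^k with h < k, item (ii)."""
    return x**h / x**k

def log_over_power(x):
    """log_a(x) / x^k, item (iii)."""
    return math.log(x, a) / x**k

for x in (10.0, 50.0, 200.0):
    print(x, power_over_exp(x), power_over_power(x), log_over_power(x))
```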

N.B. To state that a function is an $o(1)$ as $x\to x_0$ simply means that it tends to $0$: indeed,
to state that $f(x)=o(1)$ means that $f(x)/1=f(x)\to 0$. ▽

11.7.2 Asymptotic equivalence


Asymptotic equivalence for functions is analogous to that for sequences. In particular, we
will see that in the calculation of limits it is possible to replace a function by an asymptotically
equivalent one, which often simplifies the calculation substantially.
The development of this argument is parallel to the one seen for sequences in Section
8.12.3. Such parallelism, and the unavoidable repetitiveness that it implies, should not make
us lose sight of the importance of what we will see now. In any case, to limit repetitions,
we will omit some details and comments, as well as the proofs (referring the reader to Section
8.12.3).
Let us start by observing that $f(x)\sim g(x)$ as $x\to x_0$ implies, for given $L\in\overline{\mathbb{R}}$,
\[
\lim_{x\to x_0}f(x)=L\iff\lim_{x\to x_0}g(x)=L
\]
that is, two functions asymptotic to one another as $x\to x_0$ have the same limit as $x\to x_0$.
In particular, we have the following version for functions of Lemma 320.

Lemma 439 Let $f(x)\sim g(x)$ and $h(x)\sim l(x)$ as $x\to x_0$. Then:

(i) $f(x)+h(x)\sim g(x)+l(x)$ as $x\to x_0$, provided that there exist $k>0$ and a neighborhood
$B_{\varepsilon}(x_0)$ of $x_0$ such that
\[
\left|\frac{g(x)}{g(x)+l(x)}\right|\leq k \tag{11.34}
\]
for every $x\in B_{\varepsilon}(x_0)$;

(ii) $f(x)\,h(x)\sim g(x)\,l(x)$ as $x\to x_0$;

(iii) $f(x)/h(x)\sim g(x)/l(x)$ as $x\to x_0$, provided that $h(x)\neq 0$ and $l(x)\neq 0$ at every
point $x\neq x_0$ of a neighborhood $B_{\varepsilon}(x_0)$.

We now give the analogue of the important Lemma 321.

Lemma 440 We have
\[
f(x)\sim f(x)+o(f(x))\quad\text{as }x\to x_0 \tag{11.35}
\]
Therefore,
\[
\lim_{x\to x_0}f(x)=L\iff\lim_{x\to x_0}\bigl(f(x)+o(f(x))\bigr)=L
\]

What is negligible with respect to $f$ as $x\to x_0$, which is what $o(f(x))$ is, is
asymptotically irrelevant and can be neglected. Thanks to Lemma 439, we therefore have
for products and quotients (here too, as for sequences, the most interesting cases):
\[
\bigl(f(x)+o(f(x))\bigr)\bigl(g(x)+o(g(x))\bigr)\sim f(x)\,g(x)\quad\text{as }x\to x_0 \tag{11.36}
\]
and
\[
\frac{f(x)+o(f(x))}{g(x)+o(g(x))}\sim\frac{f(x)}{g(x)}\quad\text{as }x\to x_0 \tag{11.37}
\]
Let us see some examples.

Example 441 Consider the limit
\[
\lim_{x\to+\infty}\frac{2\sqrt{x}+5\sqrt[3]{x^{2}}+x}{3+\sqrt{x^{3}}+3x}=\lim_{x\to+\infty}\frac{2x^{1/2}+5x^{2/3}+x}{3+x^{3/2}+3x}
\]
and let us set $f(x)=x$ and $g(x)=x^{3/2}$. As $x\to+\infty$ we have
\[
2x^{1/2}+5x^{2/3}=o(f)\quad\text{and}\quad 3+3x=o(g)
\]
Expression (11.37) therefore implies
\[
\frac{2x^{1/2}+5x^{2/3}+x}{3+x^{3/2}+3x}\sim\frac{x}{x^{3/2}}=\frac{1}{\sqrt{x}}\to 0\quad\text{as }x\to+\infty
\]
▲

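Example 442 Consider the limit
\[
\lim_{x\to+\infty}\frac{\dfrac{1}{x^{2}}+\dfrac{2}{x^{4}}+\dfrac{1}{e^{x}}}{\dfrac{1}{x^{4}}+\dfrac{1}{x^{8}}+\dfrac{3}{x^{10}}}=\lim_{x\to+\infty}\frac{x^{-2}+2x^{-4}+e^{-x}}{x^{-4}+x^{-8}+3x^{-10}}
\]
As $x\to+\infty$, we have $x^{-8}+3x^{-10}=o(x^{-4})$ and, by Proposition 438-(i), $e^{-x}+2x^{-4}=o(x^{-2})$. Expression (11.37) therefore implies
\[
\frac{x^{-2}+2x^{-4}+e^{-x}}{x^{-4}+x^{-8}+3x^{-10}}\sim\frac{x^{-2}}{x^{-4}}=x^{2}\to+\infty\quad\text{as }x\to+\infty
\]
▲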

Example 443 Consider the limit
\[
\lim_{x\to 0}\frac{1-\cos x}{\sin^{2}x+x^{3}}
\]
Applying first (11.37) and then Lemma 439-(iii), we obtain
\[
\frac{1-\cos x}{\sin^{2}x+x^{3}}\sim\frac{1-\cos x}{\sin^{2}x}\sim\frac{1-\cos x}{x^{2}}\to\frac{1}{2}\quad\text{as }x\to 0
\]
▲

11.7.3 Terminology
Here too, for the comparison of two functions converging to zero and of two functions tending
to $\pm\infty$, there is a specific terminology. In particular,

(i) a function $f$ such that $\lim_{x\to x_0}f(x)=0$ is called infinitesimal as $x\to x_0$;

(ii) a function $f$ such that $\lim_{x\to x_0}f(x)=\pm\infty$ is called infinite (or unbounded, or infinitely
large) as $x\to x_0$;

(iii) if two functions $f$ and $g$ are infinitesimal at $x_0$ and such that $f=o(g)$ as $x\to x_0$, then
$f$ is said to be infinitesimal of higher order at $x_0$ with respect to $g$;

(iv) if two functions $f$ and $g$ are infinite at $x_0$ and such that $f=o(g)$ as $x\to x_0$, then $f$
is said to be infinite of lower order with respect to $g$.

A function is therefore infinitesimal of higher order than another one if it tends to zero
faster, while it is infinite of lower order if it tends to infinity more slowly.

Example 444 (i) The functions defined by $(x-x_0)^{a}$ are infinitesimal as $x\to x_0^{+}$ when
$a>0$ and infinite when $a<0$.
(ii) The functions defined by $\alpha^{x}$ are infinite as $x\to+\infty$ and infinitesimal as $x\to-\infty$ when
$\alpha>1$, and vice versa when $0<\alpha<1$. ▲

11.7.4 The usual bestiary


We recast the results, already provided for sequences, concerning the comparison among exponential
functions $\alpha^{x}$, power functions $x^{k}$, and logarithmic functions $\log^{h}x$. As $x\to+\infty$,
they are infinite when $\alpha>1$, $k>0$ and $h>0$, and infinitesimal when $0<\alpha<1$, $k<0$ and
$h<0$.

(i) If $\alpha>\beta>0$, then $\beta^{x}=o(\alpha^{x})$: indeed, $\beta^{x}/\alpha^{x}=(\beta/\alpha)^{x}\to 0$.

(ii) $x^{k}=o(\alpha^{x})$ for every $\alpha>1$ and $k>0$, as already proved with the ratio criterion. If
instead $0<\alpha<1$ and $k>0$, then $\alpha^{x}=o(x^{k})$.

(iii) If $k_1>k_2>0$, then $x^{k_2}=o(x^{k_1})$: indeed, $x^{k_2}/x^{k_1}=x^{k_2-k_1}\to 0$.

(iv) If $k>0$, then $\log^{h}x=o(x^{k})$.

(v) If $h_1>h_2$, then $\log^{h_2}x=o(\log^{h_1}x)$: indeed,
\[
\frac{\log^{h_2}x}{\log^{h_1}x}=\frac{1}{\log^{h_1-h_2}x}\to 0
\]

We can still add

(vi) $\alpha^{x}=o(x^{x})$ for every $\alpha>0$: indeed, $\alpha^{x}/x^{x}=(\alpha/x)^{x}\to 0$.
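Items (i), (v) and (vi) of the list above can be illustrated with a few evaluations. The Python sketch below is ours, with the parameters `alpha = 3`, `beta = 2`, `h1 = 2`, `h2 = 1` chosen arbitrarily as assumptions. It also shows how different the speeds can be: the exponential ratios collapse already at $x=100$, while the logarithmic ratio in (v) is still above $10^{-3}$ at $x=10^{300}$.

```python
import math

alpha, beta = 3.0, 2.0      # alpha > beta > 0 (illustrative values)
h1, h2 = 2.0, 1.0           # h1 > h2 (illustrative values)

def exp_ratio(x):
    """beta^x / alpha^x = (beta/alpha)^x, item (i)."""
    return (beta / alpha) ** x

def log_ratio(x):
    """log^{h2} x / log^{h1} x = 1 / log^{h1-h2} x, item (v)."""
    return 1.0 / math.log(x) ** (h1 - h2)

def exp_over_xx(x):
    """alpha^x / x^x = (alpha/x)^x, item (vi)."""
    return (alpha / x) ** x

print(exp_ratio(100.0), exp_over_xx(2.0), exp_over_xx(100.0))
print(log_ratio(1e3), log_ratio(1e300))
```

Writing each ratio in the simplified form used in the text, such as $(\alpha/x)^{x}$ rather than $\alpha^{x}/x^{x}$, also avoids overflow of the separate numerator and denominator in floating-point arithmetic.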

The previous results adapt easily to the case in which instead of $x$ we have a function
$f(x)\to+\infty$ as $x\to x_0$. Moreover, such results can be organized in scales of infinities and
infinitesimals, in analogy with what we have seen for sequences. For brevity we omit the
details.
