
Int. Statistical Inst.: Proc.

58th World Statistical Congress, 2011, Dublin (Session CPS055)

Sizing and profiling the Small, Medium and Micro Business Market in South Africa
Galpin, Jacky
University of the Witwatersrand, School of Statistics and Actuarial Science
1 Jan Smuts Avenue
Johannesburg (2000), South Africa
jacky@galpin.co.za

Neethling, Ariane
University of the Free State and University of Stellenbosch
Ariane_Neethling@yahoo.com

Introduction
FinMark Trust is an independent trust, established with funding from the United Kingdom Department for
International Development, with the objective of "making financial markets work for the poor" in Africa. One
aim is to build a picture of the informal business sector, covering its size and characteristics as well as the
role it can play in developing countries. Businesses range from very informal (such as vendors on street
corners and hawkers), through semi-formal (such as those running a garden service or computer repair business),
to more formal registered businesses with a formal office.
The 2010 Finscope™ survey targeted Small, Micro and Medium Enterprises (SMMEs) in South Africa
(SA). A nationally representative sample of business owners aged 16 or older, with fewer than 200 employees, was
drawn. The objectives of the survey were to estimate the size of the small business market in SA, to quantify the
number of people engaged in small business activities, and to profile the businesses.
Sample design
A stratified random sample of 1000 enumerator areas (EAs) was drawn, representative of SA at national,
provincial and geo-type levels (Finscope, 2010). Probability proportional to size sampling was used, with the
estimated number of households per EA in 2009 being used as the measure of size. The dominant race group
(Black, Coloured, Asian, White) of the EA was used as a further stratification variable, to ensure that a
representative sample of all race groups was obtained. Power rule allocation (power = 0.7) was used to ensure an
adequate sample size in each of the strata; this disproportionate allocation procedure is recommended for
surveys with numerous small strata where relatively precise estimates are needed at each stratum level
(Lehtonen & Pahkinen, 1994). The distribution of the EAs is shown in Table 1.
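The power rule allocation can be illustrated with a minimal sketch. It assumes the common form in which the stratum sample size is proportional to the stratum size raised to the chosen power; the stratum sizes below are hypothetical, not the survey's household counts, and simple rounding means the allocated total can differ slightly from the target.

```python
def power_allocation(stratum_sizes, total_sample, power=0.7):
    """Allocate total_sample across strata in proportion to size ** power.

    power=1 gives proportional allocation and power=0 gives equal allocation;
    values in between shift sample towards the smaller strata.
    """
    shares = [size ** power for size in stratum_sizes]
    scale = total_sample / sum(shares)
    return [round(share * scale) for share in shares]


# Hypothetical household counts for four strata (illustrative only).
sizes = [2_000_000, 500_000, 100_000, 20_000]
print(power_allocation(sizes, 1000, power=1.0))  # proportional allocation
print(power_allocation(sizes, 1000, power=0.7))  # power rule allocation
```

Compared with proportional allocation, the power rule gives the smallest stratum a noticeably larger share of the 1000 EAs, at the cost of the largest stratum.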
The dwelling units (DUs) in each chosen EA were listed. A step-size was chosen to yield six segments for
the EA, as the aim was to obtain six interviews with small business owners in each selected EA. From each
starting point successive dwellings in the segment (as marked on a map supplied to the interviewers) were
visited, with contact information being listed, as well as whether any of the household members was involved in
a small business. On finding a dwelling with an SMME, a full interview was conducted, and the interviewer
progressed to the next starting point. This gave the hit rate information for each EA, namely the number of
dwellings with no success before a success was obtained. The number of successful interviews in the EA was
also recorded. A total of 5676 interviews with SMMEs were obtained.


                                        Dominant race group
Province         Geo-area    Blacks  Coloureds  Asians  Whites   Total
Western Cape     Urban           37         43       2      37     119
                 Rural            0         13       0       0      13
                 Total           37         56       2      37     132
Eastern Cape     Urban           34         15       2      18      69
                 Rural           52          4       0       0      56
                 Total           86         19       2      18     125
Northern Cape    Urban           13         15       0       7      35
                 Rural            7          6       0       0      13
                 Total           20         21       0       7      48
Free State       Urban           40          5       0      16      61
                 Rural           16          1       0       0      17
                 Total           56          6       0      16      78
KwaZulu-Natal    Urban           45          7      29      27     108
                 Rural           58          0       0       0      58
                 Total          103          7      29      27     166
North West       Urban           27          3       2      16      48
                 Rural           36          0       0       0      36
                 Total           63          3       2      16      84
Gauteng          Urban          102         14      12      59     187
                 Rural           10          0       0       1      11
                 Total          112         14      12      60     198
Mpumalanga       Urban           26          2       1      14      43
                 Rural           36          0       0       0      36
                 Total           62          2       1      14      79
Limpopo          Urban           12          0       1      10      23
                 Rural           67          0       0       0      67
                 Total           79          0       1      10      90
Overall total per race          618        128      49     205    1000

Table 1: Distribution of the EAs for the stratified random sample


Data were weighted to the population figures based on the EA inclusion probability, the inclusion
probability of a household, and the weight of a person having one or more small businesses. The negative
binomial approach was used to determine the inclusion probability of a household, by taking the number of
failures (households with no small business owners) into account. The final weights were used to estimate the
number of SMMEs. It is estimated that there are just under 6 million small businesses in SA, with 5.6 million
small business owners. The geographical spread of these, relative to the population, is shown in Figure 1.
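The text does not give the formula used in the negative binomial approach. As a hedged sketch, one standard choice under inverse (negative binomial) sampling, where dwellings are visited until a fixed number of successes is reached, is the unbiased hit-rate estimator (r - 1) / (r + f - 1) for r successes and f failures. The function names and the EA figures below are hypothetical and only illustrate how the failure counts enter the estimate.

```python
def inverse_sampling_hit_rate(successes, failures):
    """Estimate the success probability under inverse (negative binomial) sampling.

    Uses (r - 1) / (r + f - 1), which is unbiased when sampling stops at the
    r-th success; it needs at least two successes.
    """
    if successes < 2:
        raise ValueError("the estimator needs at least two successes")
    return (successes - 1) / (successes + failures - 1)


def estimated_smme_households(dwellings_listed, successes, failures):
    """Scale the estimated hit rate up to all listed dwellings in the EA."""
    return dwellings_listed * inverse_sampling_hit_rate(successes, failures)


# Hypothetical EA: 180 listed dwellings, 6 interviews obtained,
# 24 dwellings visited in which no small business owner was found.
rate = inverse_sampling_hit_rate(6, 24)
print(round(rate, 3))
print(round(estimated_smme_households(180, 6, 24), 1))
```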


Figure 1: Geographical distribution of small businesses and population (Finscope, 2010)


Profiling the small businesses
In order to profile the businesses, a Business Sophistication Measure (BSM) was created. Questions used
concerned hard facts, such as where the business operates from (e.g. street corner, no fixed address, house,
office block) as well as access to and use of a number of services such as water, electricity, computers, banking
and insurance. The responses were coded as 1=yes or 0=no, which places all 181 questions on the same scale,
making principal component analysis (PCA) the appropriate technique for creating an index. The data were
weighted up to the estimated population size.
The first principal component explained 14.3% of the variance, and formed the BSM index. The sensitivity
of the analyses was investigated by omitting questions with very few informants in either the yes or no
categories. Two scenarios were investigated, omitting variables with fewer than 10, and fewer than 30
informants in one of the categories. A k-means cluster analysis was used to group informants into similar groups.
Groups with small numbers of informants (i.e. informants who differ from the others) were omitted from the
PCA, in order to check the sensitivity of the index. The results of these investigations showed little sensitivity of the
major principal components to the combinations of variables and informants.
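The construction of an index from the first principal component of weighted yes/no items can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it uses the weighted covariance matrix (the text does not say whether covariance or correlation was used), the function name is hypothetical, and the data are simulated rather than taken from the survey.

```python
import numpy as np


def first_principal_component(X, weights):
    """Weighted PCA: return (scores on PC1, share of variance explained by PC1)."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalise weights to sum to 1
    centred = X - w @ X                               # subtract weighted column means
    cov = (centred * w[:, None]).T @ centred          # weighted covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)   # eigenvalues in ascending order
    loading = eigenvectors[:, -1]
    if loading.sum() < 0:
        loading = -loading                            # orient so the loadings sum is positive
    return centred @ loading, eigenvalues[-1] / eigenvalues.sum()


# Simulated data: 500 informants answering 12 hypothetical yes/no items,
# each item more likely to be "yes" for a higher latent sophistication.
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
thresholds = np.linspace(-1.5, 1.5, 12)
noise = rng.normal(scale=0.8, size=(500, 12))
answers = (latent[:, None] + noise > thresholds).astype(int)
weights = rng.uniform(500, 1500, size=500)            # hypothetical population weights

scores, explained = first_principal_component(answers, weights)
print(f"PC1 explains {explained:.1%} of the variance")
```

A sensitivity check of the kind described above would repeat the call after dropping sparse columns or atypical rows and compare the resulting scores.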
The resulting principal component scores were divided into equi-sized groups, giving an initial ranking of
the businesses. Initially, 20 groups were formed, which would allow for merging of similar groups, to obtain a
smaller number of distinct groups. The choice of 20 initial groups resulted in approximately 284 equivalent
informants per group. (Equivalent informants are obtained by rescaling the sum of the weighted informants
from the population size to the sample size.) The groups could then be examined as to their characteristics, and
similar groups could be merged, and anomalous groups split.
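The rescaling to equivalent informants and the split into equi-sized groups can be sketched as below. The function names, scores and weights are hypothetical; the only figures taken from the text are the sample size of 5676 and the 20 groups, which give 5676 / 20 = 283.8, or approximately 284, equivalent informants per group.

```python
import numpy as np


def equivalent_weights(weights):
    """Rescale population weights so that they sum to the sample size."""
    w = np.asarray(weights, dtype=float)
    return w * len(w) / w.sum()


def equal_weight_groups(scores, weights, n_groups=20):
    """Label informants 1..n_groups by score so each group has a similar weighted size."""
    scores = np.asarray(scores, dtype=float)
    eq = equivalent_weights(weights)
    order = np.argsort(scores, kind="stable")
    # Cumulative weighted share at the midpoint of each informant, strictly in (0, 1).
    cum_share = (np.cumsum(eq[order]) - eq[order] / 2) / eq.sum()
    groups = np.empty(len(scores), dtype=int)
    groups[order] = np.minimum((cum_share * n_groups).astype(int), n_groups - 1) + 1
    return groups, eq


rng = np.random.default_rng(1)
scores = rng.normal(size=5676)                  # stand-in for the PC1 scores
weights = rng.uniform(200, 2000, size=5676)     # hypothetical population weights
groups, eq = equal_weight_groups(scores, weights)
per_group = np.bincount(groups, weights=eq, minlength=21)[1:]
print(per_group.round(1))                       # each close to 5676 / 20 = 283.8
```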
The stability of these groups was investigated using discriminant analysis (DA), using the groups and the
variables used to create the scores. The success in recovering the original 20 groups is shown in Table 2. The
highlighting indicates the number of equivalent informants correctly assigned by the DA, while the last column
gives the percentage correctly classified for each of the 20 individual groups. A total of 54.1% were correctly classified.


Gp   #infs   Correct (DA)     %
 1     320            315    98
 2     284            132    47
 3     255            126    49
 4     261            140    54
 5     296            123    42
 6     285            122    43
 7     291            142    49
 8     281             81    29
 9     281            131    47
10     293             93    32
11     280             99    35
12     277            152    55
13     286            117    41
14     210             80    38
15     352            193    55
16     289            197    68
17     286            163    57
18     282            208    74
19     285            230    81
20     282            259    92

[Group sizes, diagonal counts and percentages only; the off-diagonal counts of the 20 x 20 table are not recoverable from the extracted text.]

Table 2: Comparison of the classification of BSM groups (rows) and DA (columns)


BSM group      A (8)   B (7)   C (7)   D (6)
19              83.4    83.4    83.4    83.4
20              91.5    91.5    91.5    91.5
% correct       75.6    79.8    80.6    82.9

[The rows for the merged groups formed from original groups 1 to 18 are not recoverable from the extracted text.]

Table 3: Combinations of the 20 groups, showing the percentages correctly classified by the DA
Possible combinations of these groups were investigated in order to identify similar BSM categories,
namely those for which the reconstruction by the DA was satisfactory (at least 70% for all groups). Table 3
shows the results for four possible combinations, resulting in 6 to 8 final groups, together with the overall
percentage of correct classification. These groups were then profiled in terms of the variables, to allow the
Finscope experts to assess the coherence of the groups with respect to interpretation in business terms. The
8-group solution (scenario A) was chosen on the grounds of providing reasonable differentiation and business
usefulness.
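Merging adjacent groups and re-checking how well the DA recovers them amounts to collapsing blocks of the classification table. The sketch below shows the arithmetic on a small hypothetical 4-group table (rows: original group, columns: DA assignment); the function names and numbers are illustrative and are not taken from Table 2.

```python
import numpy as np


def merge_groups(confusion, blocks):
    """Collapse a square classification table by summing rows and columns within blocks.

    blocks is a list of lists of 0-based group indices, e.g. [[0, 1], [2], [3]].
    """
    confusion = np.asarray(confusion, dtype=float)
    merged = np.zeros((len(blocks), len(blocks)))
    for i, rows in enumerate(blocks):
        for j, cols in enumerate(blocks):
            merged[i, j] = confusion[np.ix_(rows, cols)].sum()
    return merged


def percent_correct(confusion):
    """Return (per-group percentages on the diagonal, overall percentage correct)."""
    confusion = np.asarray(confusion, dtype=float)
    per_group = 100 * np.diag(confusion) / confusion.sum(axis=1)
    overall = 100 * np.trace(confusion) / confusion.sum()
    return per_group, overall


# Hypothetical table for four groups of 100 informants each.
toy = np.array([[60, 30,  8,  2],
                [25, 50, 20,  5],
                [ 5, 20, 55, 20],
                [ 0,  5, 15, 80]])
print(percent_correct(toy))
print(percent_correct(merge_groups(toy, [[0, 1], [2], [3]])))
```

In this toy case, merging the first two groups raises their recovery from 60% and 50% to 82.5%, and the overall rate from 61.25% to 75%, because confusions between the merged groups no longer count as errors.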
Investigation of variables discriminating between the BSM groups, using Chi-squared automatic interaction detection (CHAID)
Chi-squared automatic interaction detection (CHAID) was used to profile the BSM groups (Kass, 1980;
Hawkins and Kass, 1982). At the first step, CHAID determines which of the predictors best differentiates between
the 8 BSM categories. The most discriminating variable was "do not have any insurance" (1 = no insurance,
0 = some insurance; p = 1.6e-809 for the contingency table). At the second step, CHAID examines each of the new
nodes to determine the most significant predictor for that node. For the BSM groups with some insurance, the
strongest discriminator was "own, lease or hire: internet" (p = 3.6e-96). For the BSM groups without any
insurance, the strongest predictor was "do not use a bank for the business" (p = 1.9e-588). Table 4 shows the
details of the CHAID against the 8 BSM groups, showing the increase in sophistication across the BSM groups.
Percentages above 10 are highlighted.

[Table 4: CHAID tree cross-tabulated against the 8 BSM groups. Root split: no insurance / some insurance.
Second-level splits: bank used for business (no insurance branch) and internet (some insurance branch). Further
splitting variables: business registered, hot running water inside, toilet inside, running water inside and
up-to-date financial records. Columns: BSM1 to BSM8 (row percentages) and the number of informants (Infs) per
terminal node. The 11 terminal nodes contain 2010, 740, 687, 160, 917, 142, 177, 278, 159, 241 and 165
informants (5676 in total). The cell percentages are not recoverable from the extracted text.]

Table 4: Characterisation of the 8 BSM groups
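The first CHAID step, choosing the predictor most strongly associated with the BSM group, can be sketched in simplified form. This is not a full CHAID implementation: it omits category merging, Bonferroni adjustment and survey weights, and it ranks yes/no predictors by the Pearson chi-squared statistic, which gives the same ordering as the p-value when all tables have the same degrees of freedom. The predictor names and data are hypothetical.

```python
import numpy as np


def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a two-way contingency table.

    Assumes every row and column total is positive.
    """
    table = np.asarray(table, dtype=float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return ((table - expected) ** 2 / expected).sum()


def best_binary_split(predictors, groups, n_groups):
    """Return (name of the strongest 0/1 predictor, dict of statistics by name)."""
    stats = {}
    for name, values in predictors.items():
        table = np.zeros((2, n_groups))
        # Count informants in each (answer, group) cell.
        np.add.at(table, (np.asarray(values), np.asarray(groups)), 1)
        stats[name] = chi_squared_statistic(table)
    return max(stats, key=stats.get), stats


rng = np.random.default_rng(2)
groups = rng.integers(0, 8, size=2000)   # hypothetical group labels 0..7
predictors = {
    # Strongly related to group: lower groups mostly answer 1.
    "no_insurance": (rng.random(2000) < 0.9 - 0.1 * groups).astype(int),
    # Unrelated to group.
    "noise": rng.integers(0, 2, size=2000),
}
best, stats = best_binary_split(predictors, groups, 8)
print(best, {name: round(value, 1) for name, value in stats.items()})
```

The second step would repeat the same search separately within each node created by the chosen split.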


It should be noted that, since CHAID is aimed at determining the most predictive variables and splits at each
stage, the p-values can be interpreted as giving a ranking of the usefulness of the variables. So although all
variables used are significant at the 1% level, the ratio of the p-values gives an indication of the relative
predictive power of the variables. The "no insurance" variable is approximately 107 times more predictive than the
next strongest predictor, "have a credit or debit card".
Looking at the number of employees in the businesses in the different BSM groups, shown in Figure 2, it can
be seen that the lower BSM groups essentially have fewer than 10 employees.
[Figure 2: Plot of the number of employees (N_EMPLOY) for each of the 8 BSM groups (BSM_8GP), with
N = 1136, 1141, 1130, 565, 567, 572, 281 and 284 informants per group. Analysis weighted by WTSC.]

Figure 2: Distribution of the number of employees by BSM group

Conclusions
The BSM index has been interrogated by users, and has been accepted as a useful categorization.
References

Finscope (2010). Finscope South Africa Small Business Survey 2010. FinMark Trust, South Africa.
Hawkins, DM and Kass, GV (1982). Automatic Interaction Detection. In: Hawkins, DM (Ed), Topics in Applied
Multivariate Analysis. Cambridge University Press, Cambridge.
Kass, GV (1980). An exploratory technique for investigating large quantities of categorical data.
Journal of the Royal Statistical Society, Series C (Applied Statistics), Vol. 29, No. 2, pp 119-127.
Lehtonen, R and Pahkinen, EJ (1994). Practical Methods for Design and Analysis of Complex Surveys.
John Wiley & Sons, New York.
