By Eloy Caballero
EXPLORING THE LIMITS OF EXCEL
Before the Power Utilities Came Along
ISBN-13: 978-1540538895
ISBN-10: 1540538893
Eloy Caballero writes, mostly in Spanish, about Excel and other spreadsheet
software at:
https://ideasexcel.com/
He is the author of several Excel utilities and add-ins, such as: Random Sample Generator,
MaByCool, SelectEx and others.
Eloy also writes (in Spanish) about science, superstition and communication at:
https://areasubliminal.com/
He has published two celebrated (at least in his native town) non-fiction books in Spanish.
Contents
1 Introduction
3 Operational Limits
4.12.2 DoEvents
5 Conclusions
1 INTRODUCTION
As a result of this, both in the home environment and in the academic and
corporate worlds, Excel is used in practically every written document that involves numbers,
charts, tables, or analysis. It would not be too daring to say that every report that reaches the
desk of a decision-making figure in any of these work or home environments has been made,
adapted, or reconfigured in Excel, giving the tool a critical role in the daily life of almost
any organization.
1.2 Excel Files or Excel Utilities?
The usual terminology of computer software can be confusing when we study the nature
of a standard Excel file or workbook, because by its very nature, even in the hands of a
novice user, any Excel file is truly a computer program. Why? Because even if it contains
no VBA code, the file is a calculation model, however simple, that combines input data,
formulas, and output results.
After only a few months of practice, the average user is typically confident enough to
employ features such as formulas with as many as two or three nested levels, lookup and
index-match variations, validation lists, cell protection, conditional formatting, subtotals,
filters, and in some cases pivot tables and charts. The most daring among those average users,
even if they are not yet capable of programming their own formulas and subroutines, very
quickly develop an interest in the extended VBA capabilities, start becoming familiar with
the Integrated Development Environment (IDE), and begin learning how to add
new functionality to their workbooks by means of add-ins or User Defined Functions (UDFs)
downloaded from the internet and developed by more experienced users.
1.3 Excel Bugs and Excel Limits
Within the daily routine of any corporation, it is very common for an
Excel workbook that starts as a simple table of data to evolve into a file that contains critical
information, for example the data input for an important financial or mathematical model. That
model may provide the output for a critical decision to be made by the person in charge, a
decision that will very likely be of capital importance for the production line. It is only in
quick, informal computer language that such a file is simply referred to as an Excel
file.
And that is how we may find true software applications, developed in Excel by a
single advanced user, being employed in critical steps of a company's decision-making
process without those files ever having been subjected to any checks or verifications by
expert programmers. Neither the file's creator nor their supervisor feels at fault in this
respect, or that they are engaging in any error-prone activity. In fact, if asked about it,
they might just say that they are only getting their Excel file printed for the next
meeting.
In my experience, what lies at the root of the problem is a bad and
careless use of the tool that we call Excel, and of spreadsheet software in general. And yet,
our natural tendency to shake off responsibility leads us to blame the tool itself when errors
occur. Amazingly enough, Excel is charged with the burdens of its supposedly great
advantages: very open turns into too open, very general into too general, and very
accessible into excessively accessible. When problems loom, Excel's acknowledged ease of use
goes from blessing to curse, and the user's attitude goes from appreciation to lament. Excel's
ubiquity among computers all over the world has led to an increasing number of incidents
with serious consequences, particularly in very sensitive industries such as the
world of finance. If you have really been paying attention to the Excel world during the past
few years, then you must have heard of the London Whale[1] or the Tibco shareholders'
$100M loss[2], all due to spreadsheet mistakes. The Eusprig forum[3], whose yearly meetings
I attend as often as I can, is dedicated mostly to studying and documenting errors related to
spreadsheet use, and to developing and recommending strategies to prevent their appearance
through a thorough knowledge of the program's capacities, the setup of checking procedures,
and a correct and safe organization of the input-model-output data structure that every
spreadsheet workbook really is.
Most of the so-called Excel errors that have great public or private repercussions
are really due to the admission, into the production line or into the decision-making
process of the company, of information generated by small Excel applications or models
containing formulas (a programming language) and input data that have not been
properly checked and tested. If a certain cell within the worksheet is supposed to contain the
formula =SUM(A1:A50)/2 and the user has written =SUM(A1:A50)*2, then this is no Excel
error at all but blatant human error. How could we possibly limit the flexibility of the
spreadsheet so that it identifies by itself that the user's original intention for a cell
was division and not multiplication?
This problem will continue to exist as long as Excel keeps being regarded by
management as little more than an office application that helps their subordinates do some
calculations. When the company becomes aware of these problems, the suggested solutions
usually range from setting a certain number of checkpoints at appropriate stages
for those who work with Excel files, through the outsourcing of professional file-testing
services, to the total elimination of Excel from the company and its substitution by an ad-
hoc, in-house piece of software made by the IT department, if there is one, or by an
outside company if there isn't.
Within this frame of reference, and thinking of ourselves as advanced users who have
a good command of Excel's native capabilities and can even program their own little VBA
macros, we will explore Excel's response to these limit situations, so that we may classify
those limits and get to know them well.
In this study, we will only take into account those variables related to Excel itself,
ignoring parameters such as the computer's physical features (RAM, HD, chipset) and the
workload it is dealing with at a given moment. We will assume a standard machine running
only and strictly those processes that are indispensable for Excel to work properly. Unless
otherwise indicated, the examples I am going to work with are built and developed in Excel
2010 and Excel 2013, and the conclusions thus obtained should extend, with the appropriate
exceptions, to all other versions from 2007 to 2016.
The limits thus obtained form a kind of map of the boundaries of the extended
Excel territory. To bring in a certain sense of order, I have divided this
territory into three big regions: first we will explore structural and content-related
limits, second we will see operational limits, and third and finally we will get acquainted with
VBA-related limits. These are the regions of collapse, failure and breakdown, but I do not want
this to convey any negative connotations about Excel. On the contrary, if any conclusion is to
be derived from this exploration of the border regions, it should be that Excel is
a very wide territory, one that is fairly safe and sound to move around in as long as we keep
a safe distance from these borders.
Welcome. The road is going to be bumpy. Fasten your seatbelts and let the trip begin.
2 STRUCTURAL AND CONTENT RELATED LIMITS
2.1 The Cell Level
Though the ongoing debate about Excel's prevailing nature (database tool, analysis
tool, calculation tool, presentation tool) is irrelevant to our purpose, it will be convenient
to lean on certain perspectives at some points, preferring one of these approaches over the
rest for a while, just to better understand what is going on in a particular problem.
Excel needs data to perform analysis on, and usually these data have to be loaded into a
worksheet[4] within a workbook. So, first of all, it is of interest to understand how Excel
stores data. The cell is the basic, fundamental, indivisible information unit of the worksheet in
any spreadsheet software, and there is a first consideration to be made regarding cell contents.
Excel works with several kinds of data, and whether we take these differences into account or
simply put data into cells as we go along, Excel will always interpret the contents of
each cell and assign those contents to one of the internally supported data types. This
interpretation process is not always 100% clear; particularly in the case of strings headed
by a leading zero, things can get pretty messy, and only some drastic measures will help clarify
the situation.
Regardless of how a cell in particular is formatted, we can say that there are four[5]
basic general types of data that can be contained within an Excel worksheet cell. These types
are the following:
As a general rule, Excel will always treat numbers as if they belonged to the Double
type, like the Double variables in the VBA programming language, that is, double-precision
floating point numbers. That being the case, the user can choose to format the cell
containing these numbers in many different ways, including scientific, percentage, fraction,
date, and also lots of custom formats.
2.1.1 Precision
Typing numbers into any cell poses no particular problem for figures of
up to 15 digits, but from the 16th digit onwards, Excel will convert all typed digits to
zeroes, so some information will be lost as we type. The following table contains a set of
keyboard-typed entries and their result as numbers interpreted by Excel (commas mark the
thousands in this table).
So as you can see, from digit number 16 on, though the significant number of digits
is not lost, all precision is ignored. A small nuance might be of interest here. As I have said,
when numbers are typed from the keyboard, Excel will ignore any digit from the 16th place
on, including rounding considerations, and so a manual input such as:
12,345,678,901,234,560
will be stored as:
12,345,678,901,234,500
And so, 60 units of precision will be lost. But Excel will nevertheless regain
rounding sensitivity for the 16th digit when we try to operate with that figure through
formulas. And so, if we write the value:
60
In a different cell, and then sum the two values in a third cell, youll find that the
result is:
12,345,678,901,234,600
So there is no precision, or even sensitivity, for the 16th and following typed digits, but
there is still rounding sensitivity for that 16th place within formula operations. I agree that it is
very unlikely that we will find a situation where this nuance matters, but we agreed in
the introduction that it is good to know the details of the frontier lines of our tool. LibreOffice
Calc, by the way, a competitor spreadsheet software, behaves in exactly the opposite way in
these extreme situations: it rounds typed entries from the 16th digit on, but its formulas show
no sensitivity in that respect.
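These 15-digit effects are not arbitrary: both Excel cells and VBA Doubles store numbers as IEEE 754 double-precision values, the same type a Python float uses, so the underlying boundary can be sketched from any language. The snippet below illustrates the number format itself, not Excel's typing behaviour:

```python
import sys

# An IEEE 754 double guarantees 15 reliably round-trippable decimal digits
assert sys.float_info.dig == 15

# Integers are stored exactly only up to 2**53; beyond that, consecutive
# integers start to collapse onto the same double
limit = 2**53                              # 9,007,199,254,740,992
assert float(limit) != float(limit - 1)    # still distinguishable
assert float(limit) == float(limit + 1)    # first collision
```

Excel's behaviour of zeroing the typed digits past the 15th is its own design choice on top of this format; the format itself is what caps reliable precision at 15 digits.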
If we tried to type a very large number in decimal format using the keyboard, as we
have said, Excel would just truncate to zero every digit from the 16th place on, and admit, in
theory, a maximum of only 255 consecutive digits. The largest positive number that can be
introduced in this way is composed of 255 consecutive 9 digits:
9999999999999999999999999999999999999999999999999999999999999999999999999
Which after hitting enter, Excel will finally store in the worksheet cell and it will do
it, by default, in scientific notation as:
9.99999999999999E+254[6]
An attempt at introducing a number with 256 digits or more will cause Excel to
interpret the keyboard input as text. This is the default behaviour.
However, if we are working in edit mode (writing directly in the formula line), the input
will be interpreted as a number as long as the input chain is 308 characters or shorter.
Example
Try typing a string composed of 256 consecutive 9s in cell A1. Since we are in
editing mode (writing directly in the cell), once we hit Enter, Excel will accept the data as a
number and correspondingly truncate all the figures from the 16th place onwards to zeroes.
The cell will show: 1E+256
Go now to any plain text editor, such as Windows Notepad, and write a 256-character
string composed only of the figure 9. Select it, copy it (CTRL+C), then go
to Excel and paste the contents in cell A2 without editing (just paste, or CTRL+V).
Curiously, Excel will not interpret the input in the same way, and will take it as text.
Figure 3: Paste (CTRL+V) in cell A2
Repeat the same operation, but this time, when pasting in cell A3, do not just select the
cell and paste; make sure you are in edit mode by going to the formula bar input area
and pasting (CTRL+V) the contents there.
Once again, Excel will surprise us when we hit Enter, accepting the input as a
number and, as it does by default, converting it to its scientific version:
1E+256
Go now to Microsoft Word and repeat the same operation: write a string of 256
characters 9 and copy the string (CTRL+C):
Figure 5: From Microsoft Word, repeat the same operation
Back in Excel, paste (CTRL+V) the contents in cell A4 directly, without editing.
Bearing in mind what happened when we performed this operation with Notepad, we naturally
expect Excel to accept the input as text. But, curiously again, Excel will interpret and accept
the input as a number and give us again: 1E+256
Writing the same string into a cell from the IDE, using VBA, will also result in Excel
interpreting it as a number.
Select cell A5. Press Alt+F11, or go to the Developer tab and click Visual Basic. In
the Immediate Window, write (there are 256 9s in the string):
Activecell.Value="9999999999999999999999999999999999999999999999999999999999
Example
Create a new Excel file and try typing a string composed of 310 consecutive 9s in cell
A1 of Sheet1. After hitting Enter, Excel will accept the result as text. In cell A2, type the
formula:
=LEN(A1)
This formula counts the total number of characters in the string and will return,
naturally:
310
So far so good. Now edit the contents of cell A1 by going to the formula bar and
deleting the last 9. When you hit Enter, cell A2 will, still naturally, show the
foreseeable result:
309
Repeat the previous step and again delete the last 9 in cell A1 by editing the
contents in the formula bar.
And now you'll see that the logic has changed. Cell A1 no longer contains a string of
308 consecutive 9s; instead, Excel has automatically reinterpreted the data as a
number: 1E+308
And the formula in cell A2 is now counting only 6 characters. Being in editing mode,
as soon as Excel has been capable of interpreting the input as a number, it has done so,
converting the format, as usual, from decimal to scientific. Does this mean that Excel
has a preference for numbers, or is somehow hungry for them? I cannot tell. But this
example demonstrates that, unless we have prevented it in some forceful manner, such as
adding an apostrophe, if the worksheet has a chance to take the entry as a number, it will do
so, even ignoring the leading zeroes of supposed strings of numbers:
...
Figure 9: A single "zero" has been removed from the string in cell A1 and the interpretation logic of Excel has
completely changed to numeric
The reader may think that I have elaborated too much on a data input method which
is hardly ever used in practice. And it is true that nobody introduces big numbers, or operates
with them, in this old-fashioned way. But we agreed that our goal here is to gain as much
understanding as possible about the limits of Excel. The largest positive number that can be
introduced manually in a cell using scientific notation is a lot bigger than the one we were
able to enter using decimal notation. This number is:
9.99999999999999E+307
Make sure you have formatted the cell to show fifteen decimal places; otherwise
Excel will show the contents as:
1.00E+308
If we tried to type any number bigger than this in scientific notation, Excel would
simply do as it does with decimal entries and accept the input as text.
Example
Type 1.00E+308 in any cell, and the input will be accepted only as text. This is a little
confusing, as you can see. The way Excel automatically rounds and displays results can
lead us to think that 1.00E+308 can be typed and accepted as a number in Excel, but it can't.
The real maximum is 9.99999999999999E+307, shown as 1.00E+308 unless you format the
cell to show fifteen decimal places. That is how Excel behaves under these really
extraordinary and unlikely circumstances, and this is as far as we can possibly go introducing
big numbers manually in Excel. But anyway, do not forget that almost nothing is really
lost in comparative terms, because ideally:
1.00E+308 - 9.99999999999999E+307 = 1E+293
an absolute gap of 1E+293, but a relative difference of only one part in 10^15.
In many cases, the contents of a worksheet cell are not typed directly by the user, but
come as the result of a formula or have been written by VBA code as the outcome of a certain
subroutine.
Let's consider a situation where the maximum typed number that a cell can accept in
scientific notation, 9.99999999999999E+307, is in cell A1. We have made it clear that any
typed number bigger than this, however small the difference, will be interpreted by Excel as
text. Just as a reminder of this, type 1E+308 in cell C1.
Now type the number 7.9769313486231E+307 in cell A2, and then write the
formula =A1+A2 in cell A3. This is the result:
1.79769313486231E+308
As we can see, the formula result is considerably bigger than the maximum numeric input
that we can type. To confirm this, type this very same number in cell C3, and see that Excel
will only accept it as text.
Figure 10: The resulting number surpasses what cell C3 can interpret as such
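Because cell arithmetic is plain IEEE 754 double arithmetic, the headroom between the largest typeable value and the largest representable value can be verified from any language. Here is an illustrative check in Python, using the literals from the example above:

```python
import math
import sys

a = 9.99999999999999e+307    # largest number that can be typed into a cell
b = 7.9769313486231e+307     # the value placed in cell A2 above
total = a + b                # what the formula =A1+A2 computes

# The sum is bigger than anything we could type, yet still a valid double
assert math.isfinite(total)
assert total > a
assert total <= sys.float_info.max
```

So the typing limit is a rule of Excel's input parser, not a limit of the number format: formulas happily produce values in the gap between 9.99999999999999E+307 and the true double maximum.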
Operating with numbers in VBA and then showing results in a spreadsheet cell can
open the door to some new and curious, if unlikely, situations.
When performing spreadsheet calculations, Excel always works internally with 8-byte
double-precision floating point numbers[7] and shows the result according to the cell
formatting. VBA can operate with variables of different types that the user can specifically
set for a special purpose, usually depending on the expected maximum and minimum values.
The main[8] data types that can represent numbers in VBA, and their limits, are the following:
Data type: Interval of validity
Byte: 0 to 255
Integer: -32,768 to 32,767
Long: -2,147,483,648 to 2,147,483,647
Double: -1.79769313486232E+308 to -4.94065645841247E-324 for negative values;
4.94065645841247E-324 to 1.79769313486232E+308 for positive values
Clearly, there is a first consideration to be made, still without leaving VBA and the
IDE. If a variable overflows, that is, if its calculated value falls outside its interval of
validity, an error will occur. If at some point during the execution of the calculations written
in the code a variable defined as Byte reaches a negative value, or a positive value of 256 or
more, a Run-Time VBA error will occur.
We'll put it into an Excel VBA subroutine and see what the outcome of its execution
is. The following screenshot contains the details:
Figure 11: The Byte variable has suffered overflow
As soon as the variable defined as Byte has overflowed, the IDE has returned
a Run-Time error. Just before that, the code has written into cell A1 the biggest value that
a Byte type variable can hold: 255.
And the same kind of reasoning applies to all variable types. Although I will
deal with these VBA-related limits in more detail in chapter 4, it will be worth taking a look
at the same problem from the perspective of the Double type, because it will provide us with
the insight we need to thoroughly understand the limits for maximum numbers in Excel. Now
consider the following code. It will multiply 1 by 1.5 iteratively until an error arises:
Sub limitDouble()
    Dim doLimit As Double
    doLimit = 1
    ' Note: IsError never returns True for a Double variable, so this
    ' loop can only end when the multiplication overflows the Double
    ' type and VBA raises a Run-Time error
    Do While Not IsError(doLimit)
        doLimit = doLimit * 1.5
        ActiveCell.Value = doLimit
        Debug.Print doLimit
    Loop
End Sub
And what we are about to witness upon its execution comes, again, to confirm Excel's
somewhat interesting conduct when it has to deal with these border situations. We are going to
write results in ActiveCell and also in the Immediate Window, by means of the Debug.Print
line. And this is what happens:
Figure 12: From the IDE we can write larger numbers than those typed directly from the keyboard
Yes. In cell A1, which was the ActiveCell at the time of starting the routine, the code
has been able to write the number:
1.4E+308
which, in theory at least, we had stated was impossible for a spreadsheet cell, whether
typed directly from the keyboard or obtained as a formula result. Now let's check that this is,
in fact, a number and not text. Go to cell A2 and write the formula:
=A1/5
2.9E+307
Try to introduce the same number (1.4E+308) in cell C1 from the keyboard, and
you'll see that Excel will accept it only as text. Confirm this by writing a formula similar to
the previous one (=C1/5) in cell C2. You'll get a #VALUE! error.
So, we can confirm again that a cell in any Excel worksheet will accept different
maximum values depending, first, on the input format (decimal, scientific) and, second and
most importantly, on the origin of the data (keyboard, formula result, VBA outcome).
It is important to realize that once VBA has managed to introduce a certain number into
the cell, any Excel formula will be able to operate with that number and return the result
correctly, as long as this result doesn't surpass what we could definitively call the absolute
maximum number that, regardless of the input method, an Excel cell can store, that is, the
maximum number that a VBA Double type can admit, namely:
1.79769313486232E+308
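That absolute maximum is simply the IEEE 754 double maximum, which Excel prints to 15 significant digits. A quick Python sketch confirms both the full-precision value and the rounded figure quoted in the text:

```python
import sys

# The full-precision double maximum...
assert sys.float_info.max == 1.7976931348623157e+308

# ...rounds to the 15-significant-digit figure quoted in the text
assert '%.14e' % sys.float_info.max == '1.79769313486232e+308'
```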
Continuing with the previous example, if we wrote a new formula in cell A3, such
as =A2+A1, this is what we'll get:
Figure 13: Once again, different numeric limits according to their origin
As we can see, once the value 1.4E+308 has been written by VBA, Excel is perfectly
capable of operating with it, dividing it by 5, adding both numbers, and returning in cell A3
the result:
1.7E+308
which is very close to the absolute maximum limit. So close, in fact, that if instead of
=A1/5 in cell A2 we had =A1/4, the situation would be completely different, as we can
see in the following screenshot:
Figure 14: Cell A3 has surpassed the absolute limit that a cell can show as number
Indeed, as we already saw in the case of formula results, the absolute limit has
eventually been overrun and Excel returns a #NUM! error.
As was the case with big numbers, typing data into a cell directly from the
keyboard in editing mode will cause Excel to interpret the data as numbers while it can (in
this case always showing them in scientific notation), and to admit the data as text in the
remaining cases.
The same reflections made in the case of big numbers regarding non-editing mode and
the Notepad and Word copy-and-paste options are applicable here.
In the case of small numbers, we'll proceed gradually and find out the smallest
number that an Excel worksheet cell will show precisely as such. We will edit the contents
of cell A1 and write numbers of the form:
0.000000000000001
As we said before, it turns out that Excel accepts the number and automatically shows
it in scientific format, in this case:
1E-15
A similar string containing 306 zeroes will still be accepted as a number and shown as:
1E-307
Adding just one more zero to the string (307 zeroes, equivalent to 1E-308) will cause
Excel to still accept the input as a number, but to turn it into absolute zero without asking for
any confirmation: 0
One more zero in the chain (308 zeroes, equivalent to 1E-309) will still return absolute
zero: 0
But one more zero (309 zeroes, equivalent to 1E-310) will make things change again:
Excel will no longer interpret the input as a number but as text, and will show the contents of
the cell accordingly:
0.0000000000000000000000000000000000000000000000000000000000000000000000000
Introducing numbers in scientific notation does not imply too great a change with
respect to what we said before. After a sufficiently large number of attempts, we would find
that the smallest number that can be typed into a worksheet cell, and admitted as such by
Excel, is:
2.22507385850721E-308
Any other typed scientific number smaller than this one, however small the
difference (take away only the 1 that occupies the 15th decimal place), will still be
interpreted and admitted by Excel as a number, but it will be turned into:
0.00E+00
which is, again, Excel's typed absolute zero, and this is as far as we can go with
manual inputs, either in decimal or scientific notation.
Example
Now change the contents of cell A2 to 4.494233 and the result in cell A3 will be:
Figure 15: For small numbers the limit is, of course, zero. No overflow errors associated.
Operating with big numbers and surpassing the limits of what a formula was capable
of returning meant a #NUM! error; in the case of small numbers, Excel simply turns the
outcome to zero: 0.
What would happen if this minimum non-zero number that a cell can hold were used
as the divisor of another given number? Would Excel interpret this as a division by zero? Not
exactly. If the numerator is an ordinary number, the quotient will surely be a really big
number, but as long as it doesn't surpass the biggest number that a cell can support, Excel
will operate with it without problem. Otherwise, the result will be a #NUM! error. If the
dividend is 4, it works; let it grow to 5 and it stops working:
4 / 2.22507385850721E-308 = 1.79769313486231E+308
5 / 2.22507385850721E-308 = #NUM!
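The same flip from a huge quotient to overflow can be reproduced with ordinary doubles. In Python the overflow shows up as infinity rather than #NUM!, but the boundary is identical; this is an illustration, not Excel's own arithmetic:

```python
import math

tiny = 2.22507385850721e-308   # the smallest typed number from above

ok = 4 / tiny                  # still below the double maximum
assert math.isfinite(ok)

too_big = 5 / tiny             # crosses the double maximum
assert math.isinf(too_big)     # Python yields inf where Excel shows #NUM!
```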
In the case of big numbers, we saw that operating with Double types in VBA didn't
have special implications, because the maximum limit that such a type could admit was:
1.79769313486232E+308
and this was also the maximum number that a spreadsheet formula could return.
Any number bigger than this meant a #NUM! error in the case of formulas, and a Run-Time
error in the case of operations within a VBA subroutine. Excel's general behaviour is similar
with small numbers, but there are some interesting differences.
Let's see this with an example. Write the following code in a module of a new Excel
file:
Sub limitDoubleSmall()
    Dim doLimit As Double, i As Long
    doLimit = 1: i = 0
    ' Halve the value until the Double type underflows to exactly zero,
    ' writing each step into a new row below the active cell
    Do While Not doLimit = 0
        doLimit = doLimit / 2
        ActiveCell.Offset(i, 0).Value = doLimit
        Debug.Print doLimit
        i = i + 1
    Loop
End Sub
As can be seen, in this case VBA has been capable of operating with numbers down
to an incredible level of smallness, a level that really corresponds to what a VBA Double type
can handle:
4.94065645841247E-324
while the spreadsheet cell, as we can see from the results in Column A, has stopped
appreciating smallness once the following value has been reached:
2.22507385850721E-308
From then on, Excel has converted all values to zero, because when divided by 2 they
are in fact smaller than the smallest number a cell can support.
So, in the case of small numbers, VBA can add a lot of extra range to ordinary
spreadsheet cell values, namely 324 - 308 = 16 orders of magnitude. We will never
be able to make a spreadsheet cell accept any figure below 2.22507385850721E-308 as a
number, but we can operate within VBA and then show results as text within a cell whenever
we want to.
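The halving experiment can be mirrored with Python floats, which share the Double's denormal range; this sketch reproduces the floor the subroutine reaches:

```python
import math

# Halve 1.0 until the double type underflows to exactly zero,
# remembering the last non-zero value reached
x = 1.0
last = x
while x > 0.0:
    last = x
    x = x / 2

# The floor is the smallest positive (denormal) double, 5E-324 = 2^-1074
assert last == 5e-324
assert last == math.ldexp(1.0, -1074)
assert last / 2 == 0.0
```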
We could, for instance, take advantage of VBA to operate numerically with numbers
that, as a mere spreadsheet, Excel could never handle. In the following example we can see
one of these operations, far out of reach for the usual spreadsheet cell formulas, but possible
when done directly in the Immediate Window:
Figure 17: VBA can operate with numbers smaller than those the worksheet cell can handle
But this lower border region of the Excel VBA world, so to say, can exhibit some
anomalous behaviour. Try running the previously given subroutine:
Sub limitDoubleSmall()
Precision: typed entries reach only the 15th decimal place, but there is rounding
sensitivity to the 16th within formula operations.
Big numbers (decimal notation): the largest typed entry is stored as
9.99999999999999E+254; longer chains of digits are interpreted as text.
Small numbers (decimal notation): chains of up to 308 leading zeroes are turned into 0,
but from 309 zeroes on, Excel takes the chains as text.
Small numbers (scientific notation): accepted down to about 1.00E-307; below that,
turned into 0.
All this having been stated about numeric limits, it's worth remarking on the unusual
nature of these situations, and insisting on how unlikely it is that an ordinary Excel user, or
developer, will ever come across one of them. In fact, why don't we bring in some of the
biggest and smallest numbers in nature in order to fully comprehend their unlikelihood?
One of the biggest quantities that can be thought of without completely losing touch
with reality might be the number of protons in the observable universe. The figure, known as
the Eddington number, in acknowledgment of the astrophysicist Sir Arthur Eddington, and
denoted NEdd, is estimated to be about:
1.00E+80
For small numbers, we can bring up the Planck units used in the field of Quantum
Mechanics for the tiniest conceivable pieces that space and time can be cut into: the Planck
length, about 1.6E-35 metres, and the Planck time, about 5.4E-44 seconds.
As we can quickly appreciate, the numeric limits of Excel are ample enough to allow
it to deal even with the remotest numbers of nature in a rather comfortable way.
The code is quite simple. It merely concatenates the character w 40,000 times
in the ActiveCell, writes the number of steps in the cell just below, and then stops. In cell A3
we will write the formula =LEN(A1) and then, having the IDE and the Excel window in the
same view, we will execute the code by pressing F5 or clicking the Run button in the IDE
(the little green arrow).
This is the result:
The writeChar() routine has worked properly. It has completed 40,000 cycles and
apparently written the same number of characters in cell A1. But the number of characters in
cell A1, as measured by the formula in A3, is only 32,767.
At first sight, the messages are contradictory here. It would have been reasonable to
expect that if the maximum number of characters that a cell can support is 32,767 and we have
surpassed this limit, Excel would provide at least some kind of warning message. But in fact,
the situation is so highly improbable that we can hardly blame Excel for this slight blunder,
and should worry more about someone who is trying to make ten-page-long texts fit into a
spreadsheet cell.
So, regardless of how long the text we are trying to put into a cell is, and also
regardless of how the input is made (keyboard entry, database read, or VBA write), Excel
will always cut the text to a maximum of 32,767 characters, without giving any warning
message if the text is longer. If we proceed blindly with data where we know some fields
contain long strings of text, we might end up not even being conscious of what we have lost
in the process.
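When feeding long strings into cells programmatically, a defensive length check avoids this silent loss. The constant below is the documented cell limit; the helper name is our own invention for illustration, not an Excel or library API:

```python
# Maximum characters an Excel cell will retain before silently truncating
EXCEL_CELL_CHAR_LIMIT = 32767

def fits_in_cell(text: str) -> bool:
    """Return True if the string would survive a cell write untruncated."""
    return len(text) <= EXCEL_CELL_CHAR_LIMIT

# The 40,000-character string from the writeChar() example would be cut
assert fits_in_cell("w" * 32767)
assert not fits_in_cell("w" * 40000)
```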
We have already explained why dates, at bottom, do not constitute a separate data type
in the worksheet, though they do in VBA. Dates shown in any Excel worksheet cell are really
integers that, with the appropriate formatting for the particular cell, will be displayed in the
usual way we recognize dates, from mm/dd/yyyy to any of its possible geographical and
sometimes fancy variations. Also, we must not forget that in dealing with spreadsheets,
integers aren't truly integers either: they are only double precision floating point numbers in
disguise.
The range of dates that can be natively shown in a worksheet cell goes
from 01/01/1900 to 31/12/9999 , which respectively correspond to the
integers 1 and 2,958,465. This is the interval of validity for dates.
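Since, for dates from 1 March 1900 onwards, VBA and the worksheet use the same serial numbers, the upper endpoint can be checked directly from the Immediate Window:

```vba
? CLng(DateSerial(9999, 12, 31))
 2958465
```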
Any string typed in this fashion[10] will be automatically shown as a date in the
worksheet, and any integer between the corresponding values 1 and 2,958,465, or any
calculated formula whose result falls within the limits of this interval, can be shown by Excel
as a date just by using the formatting tool in the ribbon or in the contextual menu. Typing
dates outside of this interval will cause Excel to accept the data automatically as text, and thus
the user will notice that the text is, by default, aligned to the left of the cell. The following
screenshot shows a variety of data entries typed from the keyboard, formula results from
adding 15 integer units or days to these dates, and the way they are accepted and shown by
Excel.
What we have referred to as the interval of validity for dates, that is, the interval of dates
that a worksheet cell is capable of showing with date format, lies between entries number 5
and 8 of the previous image. Once typed, the data in that region (Column B) have been given
the default date format by Excel and shown accordingly. Entries number 9 and 10 fall out of
the validity interval and have been accepted directly as text, as have entries number
2, 3 and 4. Input number 1 was intended to fall into the distant past and was typed
as 31/12/99, but instead of being taken as text it has been interpreted by Excel as 31/12/1999.
The contents of Column C are the same as those in Column B (cell C2 contains the
formula =B2, and so on), but the range C2:C11 has been formatted as General, which
means having no specific number format. Within the comfort area, we can appreciate that
what really lies behind a worksheet cell that contains a date is nothing more than an integer,
and ultimately a worksheet number, which we already know to be a double precision
floating point number. For the rest of the cells in the range, applying the General format brings
about no change at all: they continue to be treated as text. If we try to make a simple operation
with these data, just by adding 15 days (Column D) to every single date in Column B, then we
get the result shown in Column E. Outside the comfort area for dates, the result of adding a
text and a number is a #VALUE! error, as should have been expected. Within the comfort
area, which now also includes input number 1, things have run smoothly, except for the case
of input number 8, where adding 15 days to 31/12/9999, instead of resulting in 14/01/10000,
has brought about the illegible and interminable chain of pound symbols: ##########. If we
hover the mouse over cell E9, a message prompts with the following warning: Dates and
times are negative or too large to display. And indeed, as we can see in cell F9, the real
content of cell E9 in General format is not ########## but the integer 2,958,480 which,
falling outside of the valid interval, cannot be shown anymore in the date format. If instead of
adding we subtracted 15 days, a similar situation would result, as we can see in the next
screenshot:
Figure 22: No dates previous to 01/01/1900 can be shown in the worksheet cell
Now the problem with the illegible chain ########## is in cell E6, as should
also have been expected, for the resulting date, 17/12/1899, now falls out of the validity
interval. Cell E6, however, contains the value -14, which in General format can be perfectly
shown, as we can see in cell F6. And what's more, though not capable of being shown in the
worksheet, cell E6 actually contains a date, and VBA can read it as such. Having
cell E6 selected as ActiveCell, go to the Immediate Window and you'll be able to read the
following:
Figure 23: The usual date limits for worksheet cells do not apply in VBA, at least towards the past
So, clearly, some of the usual limitations the worksheet environment imposes on
dates can be overcome using VBA, and we'll elaborate more on the subject in chapter 4.
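A quick check in the Immediate Window shows how much further back VBA can go: VBA accepts dates from 1 January 100 to 31 December 9999, well beyond the worksheet interval (the display format will depend on your regional settings):

```vba
? DateSerial(1899, 12, 17)
? DateSerial(100, 1, 1)
```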
Boolean types can only take two values, TRUE or FALSE, and so Boolean algebra
operates within this simple range of values: no logical result can be different
from TRUE or FALSE, or their numerical equivalents 1 and 0. And here an important
nuance has to be mentioned: Boolean does not mean binary in the Excel frame. Binary is
not a type; it is a numeric base for number representation. The number 19, as expressed in the
usual everyday decimal base, can also be represented in binary as 10011, as it can be
represented in octal as 23, but it cannot be represented in Boolean, because Boolean types can
only be TRUE or FALSE. This is the table for the possible results of the AND operation with
two Boolean values.
Figure 24: Operating with Boolean values
As we can see, TRUE AND TRUE is not 2 in Boolean, but simply TRUE. And yet we
have to be careful, because Excel has only recognised the logical values as arguments of the
logical function =AND(). If we are not careful, Excel has been designed in such a
flexible way that some strange situations are possible. For example, Excel will have no
problem in accepting arithmetic operators for the values TRUE and FALSE, operators that
have no meaning in Boolean terms. Excel will interpret, by default,
that TRUEs and FALSEs are 1s and 0s, and will return curious, sometimes
meaningless, results.
Figure 25: Excel will assume that we meant 1+0 and return 1 as output
In the previous image, instead of returning an error and letting the user know that he
is trying to perform a meaningless operation with Boolean types, Excel is assuming that we
did not mean TRUE and FALSE, but 1 and 0. The opposite situation is not symmetrical, and
so, if we tried to perform a logical operation with numerical values, Excel would
immediately return an error value.
Figure 26: No logical operation is possible with numerical values. Only TRUE and FALSE are allowed as
arguments
Except, perhaps, if we are confronted with a simple comparison using the operators
=, < or >. If this is the case, Excel will assume that our intention is to make such a
comparison and so it will return a logical or Boolean[11] value, TRUE or FALSE. Using these
logical values as formula arguments can be problematic in some specific cases, such as with
the SUMPRODUCT function. Excel will not interpret logical values as 1s and 0s unless we
explicitly use the double negative operator, whose function is precisely this.
Figure 27: The double negative operator turns TRUE into 1 and FALSE into 0
If the double negative operator ( --() ) had not been used, the result of
the SUMPRODUCT formula would have been simply 0.
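The effect is easy to check from the Immediate Window with Evaluate, which computes a worksheet expression from VBA; the array constants here are arbitrary examples:

```vba
? Evaluate("SUMPRODUCT(--({1,2,3}>1),{10,10,10})")
? Evaluate("SUMPRODUCT(({1,2,3}>1),{10,10,10})")
```

The first expression should return 20 (the two TRUEs are coerced to 1s), the second 0, confirming that without the double negative SUMPRODUCT does not coerce the logical values.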
2.2 The Range Level
2.2.1 Evolution Constraints
Once we have explored the limits related to the basic information unit, the cell, we
have to move to the next level, which is the group of cells, usually known as range in Excel.
The most representative range that we can think of is, of course, an entire worksheet, a
rectangular area with 1,048,576 rows and 16,384 columns that makes 17,179,869,184 cells.
Well, after having made an acquaintance with the limits of the cell as the basic container of
information, one is tempted to ask an apparently obvious question regarding the worksheet:
Can I fill an entire Excel worksheet with the simplest possible data, for example the number
1? For the average Excel user this is a totally legitimate question. Prior to version 2007,
Excel offered a 65,536-row by 256-column grid, containing only 16,777,216 cells. In the days
of Excel 2003 it was still possible to go to the Immediate Window, write something like:
? Sheet1.Cells.Count
And still get a meaningful answer: 16,777,216. Back in those days, you
could still enter 1 in cell $A$1, copy it, select all the cells in Sheet1, paste, and get the
16,777,216 cells filled with the value 1 without too much trouble. The resulting file
would have a size of about 100 MB and could be opened with some difficulty. But this is no
longer possible from 2007 on. If you go to the Immediate Window in Excel 2013 and write:
? Sheet1.Cells.Count
You will get an Overflow error, because the result exceeds the capacity of a Long variable.
The more modest ? Selection.Cells.Count, on the other hand, will return a Long type
number, as long as the selection has no more than 2,147,483,647 cells, the maximum
value of a Long.
Let us check this. First we will find out which column has the index number 2,047 by
writing this into the Immediate Window:
Columns(2047).Select
As we will be able to see, column BZS has been selected. Now we can type this into
the Immediate Window:
Columns("A:BZS").Select
However, just adding one more single column to our selection would bring about a
very different result.
Figure 30: But adding one more column will make it impossible for Excel to count
This limitation will disappear when 64-bit versions have been fully implemented
and VBA Long variables become VBA LongLong variables, capable of holding whole
numbers up to 9,223,372,036,854,775,807 (about 9.2E+18).
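As a side note, the object model itself has offered a way around this overflow since Excel 2007: the CountLarge property returns a Variant instead of a Long, so it can report the full grid:

```vba
? Sheet1.Cells.CountLarge
 17179869184
```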
Figure 31: Excel warns of a large operation that can take a long time but still appears to be able to do it.
And then a second message will pop up letting you know that the system doesn't have
enough available resources to perform the task:
Figure 32: But it finally couldn't do it
This process will make Excel tumble and hesitate for a while, but after some time it
will end with an overall collapse of the application. Clearly, the grid of Excel has improved a
lot from 2003 to 2007, expanding its size by a factor of 1,024 (17,179,869,184 / 16,777,216).
But these theoretical capacities are practically limited in the real world of all
Windows systems to the point that a single operation such as filling all the cells in a single
worksheet with number 1 cannot be performed.
Why is that?
Older 32-bit Windows systems are limited by a 2-GB RAM memory limit per
process[12], and in many cases, although most users are now running a 64-bit OS, from
Vista to Windows 8.1, their Excel installations are usually 32-bit, unless they specifically
opted for the 64-bit version at the moment of installation, which is not a common thing so far.
In fact, the 32-bit installation was still recommended until recently because of the many
incompatibilities that the more powerful 64-bit version would imply as far as
ActiveX controls, VBA components or third party add-ins are concerned. So in spite of the
fact that for Windows 8 systems and x86 OS architecture the minimum physical RAM
required is 4 GB, if our Excel installation is 32-bit it will be subject to the 2-GB
virtual memory limit, and any Excel process will collapse when that limit is reached.
It is evident that the more data an Excel file holds, the greater the size of the resulting
file. But the process associated with operating on an Excel file (opening it, saving it, adding
more things to it) will take up much more virtual memory than the file size itself. So we cannot
expect to have an Excel file whose size is 2 GB, or even close, and the question of
how big the file size of a workbook can actually be on any ordinary computer is a legitimate
question whose answer is worth knowing. In order to respond to this question in a general
way, we are going to use a very simple macro that will gradually fill entire columns at a
time with the number 1, until virtual memory usage grows to collapse. Meanwhile we are
going to monitor the whole process using the Windows Task Manager (CTRL+ALT+DEL).
In order to be able to work with an Excel file, we need it fully open; there is no such
thing as a partially open Excel file. Therefore, as we put more and more data into the file, its
size grows and it demands more and more virtual RAM memory. Eventually the 2-GB limit
will be reached, though usually performance is already quite poor and unstable from 1 GB
on, and in the machine I used first (8-GB RAM) collapse usually happened at 1.5 GB.
Example
Open a new Excel file, go to the Developer tab, add a new module and type the
following code:
Sub memCollapsExcel()
    Dim i As Long
    ' Fill entire columns with 1 until memory is exhausted
    For i = 1 To 1000
        Columns(i).Value = 1
    Next i
End Sub
Before executing this macro, go to the Windows Task Manager, click the Processes tab
and have it ready to visualize its evolution as soon as you execute the macro. Now execute
the macro, whose purpose is to fill entire columns at a time with the number 1.
Starting from a moderate value of about 50 MB[13], the Microsoft Excel process will
gradually demand more and more virtual RAM memory as entire columns are filled
with the number 1, and the moment will come, close to the 2-GB value, when Excel
collapses altogether. In fact, Windows 8.1 allocates memory on my computer in such a way
that it reserves a great deal for the rest of the open processes, and collapse always takes place
when Excel reaches around 1.5 GB of memory usage.
I have tried this very same experiment on different computers and with different Excel
versions. Curiously enough, when I ran the same experiment in an older Windows Vista
environment, with a 32-bit Excel 2010 installation and 4 GB of physical RAM, Excel got to the
point of consuming more memory, about 1.7 GB. But it only did so at the cost of exhausting
the graphical resources of the system, with Windows not even capable of showing the
usual Not Enough Resources dialog boxes of Excel, and me not able to get information
as to how many columns were filled. Memory allocation appears to have been different
in Vista. Now, the number of columns that this macro has been able to fill on my Windows
8.1 system before Excel collapses is, in the best case, 173[14], which means little
more than 1% of a single worksheet's theoretical capacity. Expressed in terms of columns:
The following macro will do exactly the same task, only this time the filling process
will be row-oriented:
Sub memCollapsRows()
    Dim i As Long
    ' Same experiment, but filling entire rows instead of columns
    For i = 1 To 100000
        Rows(i).Value = 1
    Next i
End Sub
Executing this macro will cause the same Excel cannot complete this task with
available resources message to appear when the RAM usage of Excel reaches about 1.5 GB
and the number of rows filled with the number 1 is 10,848, which again takes us to roughly
the same percentage:
It is certainly possible for an OS programmer to get a worksheet filled with more data
than the previously mentioned figure, and to set up and configure the Windows Registry in
such a way that the theoretically available capacity of 2 GB is optimized for use.
Taking into account that the need for memory keeps growing and growing, and that
the implementation of fully functional Excel 64-bit versions might take some time for the
reasons mentioned, Microsoft recently announced an improvement in memory capacity for
Excel 32-bit versions. According to Charles Williams[15], this change was introduced in
Office updates in May and June 2016 and it basically doubles the virtual memory for Excel
32-bit, provided you are using a 64-bit Windows; that is, the amount of available RAM goes
from 2 GB to 4 GB. If you are still using a 32-bit Windows, this change increases the virtual
memory up to 3 GB, but you need to implement some additional measures regarding the boot
switch[16].
Considering that I am now using Excel 2013 on a Windows 8.1 64-bit OS, I decided to
check whether these changes had become effective and ran the same experiment once
again. I must confess that I was unconvinced about this, but to my surprise I was able to see
that, sure enough, the 2-GB limit was surpassed and Excel collapsed when the RAM usage was
about 3.3 GB, roughly double the memory at which it had collapsed in my previous
experiments. Regarding the amount of data, Excel was able to fill 323 columns before
collapsing, once again almost twice the value it completed in the old system previous to the
update.
2.2.2.3 A Lot of Unusable Space
According to the previous figures, and admitting slight variations on the calculated
percentages (variations that will depend on the precise configuration and capacities of the
computer we are working with, on how RAM allocation is performed by the OS, and on
the demand of the rest of the tasks the computer is performing at the time), the main
conclusion is that from version 2007 on, Excel, regarded as a mere data container, comes with
an awful lot of space that in the most common configurations (32-bit Office installations) is
not available to the user in practical terms. The RAM memory required to handle such a big
file size will hit the Windows-imposed 2-GB limit, or at best the 3.3-GB limit after June 2016,
and Excel will collapse. In the following table, I have gathered information about what a step
by step process of adding more and more columns of data (value 1) would mean to an Excel
file (.xlsx), both in terms of file size and, more importantly because this is what really limits
the operational capacity, in terms of RAM usage. Do not forget that file size is consistent from
one computer to another, but RAM usage doesn't have to be. It will depend on how that
particular machine allocates memory space between physical RAM and virtual disk memory,
and on what the total demand of the rest of the tasks is at that moment. This table reflects the
results obtained under a Windows 8.1 OS with 8 GB of physical RAM, before the May 2016
update.
The first and more important conclusion we can draw from this table is that RAM
usage is very demanding and far surpasses file size as the graph shows:
Figure 35: RAM usage far exceeds the nominal file size of the workbook. There are several reasons involved,
both Excel- and non-Excel-related, among them the undo actions kept in memory
Another conclusion for my standard system (Win 8.1, 8-GB RAM) is that once we are
operating with more than 16 columns filled with data, that is, around 17 million
data points, performance decreases dramatically and there is an obvious shift in the quality of
the user experience, going from using Excel to suffering the consequences of trying to
handle too much data with Excel. The program stops being responsive and threatens collapse
almost all the time. It might still be possible to perform certain tasks with ranges greater than
these, but only as long as they don't involve putting data into the worksheet, e.g. changing cell
formats and other similar, not too useful, tasks. For example, I could go to the Immediate
Window and write:
Range(Columns(1), Columns(172)).Select
And provided we are using a good computer and the system is not too busy with
other tasks[17], Excel will go to its limits and quite painfully write a total of
180,355,072 number 1s. But if we try to save that file we will receive a Not Enough
Available Resources error again, followed by another error that could look like this:
Figure 36: You may be able to put a lot of data in the workbook, but if it is too much, even saving the file can
be an insurmountable obstacle. The message warns that errors were detected while saving the file and offers to repair it,
but that will not be possible.
So, in spite of the fact that we may have managed to open an Excel file containing
180,000,000 data points, and that we are apparently capable of doing Excel analysis and
operations with the data in this file, this is pure illusion. At a practical level, the moment we
try to do something, however simple, Excel will just collapse, and if none of the previous
messages is shown then it will probably be the following (messages in the image appear in
Spanish):
The same sort of reasoning could be applied to putting formulas, instead of numbers,
into a worksheet. Only we have to take into account that formulas, although they are nothing
more than a different type of data that Excel interprets in order to show cell contents
accordingly, are automatically calculated[18] and consume, on average, significantly
more resources than simple data such as the number 1, which we have been using so far. Let
us see an example. Open a new Excel workbook, go to the Developer tab and insert a new
module with the following code:
Sub memCollapsForm()
    Dim i As Long
    ' Fill entire columns with a volatile formula until memory is exhausted
    For i = 1 To 1000
        Columns(i).FormulaR1C1 = "=RANDBETWEEN(1,1000)"
    Next i
End Sub
And then, having the Windows Task Manager open, execute the macro and see how
collapse is reached, in the same way as it was when the macro wrote only the number 1, but a
lot quicker. Only 26 columns of data will be filled with the =RANDBETWEEN() formula
before the application crashes. Notice also the much more intense use of the computer's CPU
that this volatile formula, which is being recalculated with every loop of the macro, imposes
on our system, taking it to unusually high percentages. It peaked at values close to
95% several times before collapsing. I repeated this experiment with a 16-GB RAM PC after
the June 2016 LAAC update and, although the final outcome was the same, that is, collapse
after intense use of all available CPU resources, there were differences in the number of
columns Excel managed to fill with formulas, which was 59, and in the RAM usage at which
collapse was reached, which was 3.3 GB, in both cases doubling the values previous to
the June 2016 update.
The collapse experience in this case appears a little more dramatic to the user and it
is probably going to include several more error messages, among them a 1004 Debugging
error and finally a general:
It is not impossible to guess what a fully operative 64-bit Excel installation with a
fully capable 512-GB physical RAM memory could do. And even though it is difficult to
predict what the exact RAM usage would be, it is still quite possible to venture that not only
one, but probably up to ten Excel worksheets entirely filled with simple data (1) could be
managed under these conditions.
A lot more could be said about memory usage in Windows, but since I am no expert
in the field at all, I'll say no more about it. Many nuances in memory disposition could
influence Excel's final performance in this fringe area of extremely big sets of data. But for
the ordinary user, all of this ultimately comes down to a single conclusion: the maximum
amount of data that an Excel file can hold is a lot less than what is theoretically possible
according to the standard worksheet size, and it is radically limited by the available memory
that Windows will allow the Excel process to take. This theoretical limit for 32-bit versions of
Excel is 2 GB, or 4 GB for systems complying with LAAC, which in practice, due to
memory fragmentation when handling large blocks of data[19], ends up being smaller and
closer to 1.5 GB or 3.3 GB respectively. So only a tiny 1% to 2% of the apparently usable
space in an Excel worksheet is really available, in the best case, for a usual installation. In
terms of data usage capabilities, Excel 32-bit versions from 2007 on are wasting a vast
theoretical storage and operational space. It is there, but it cannot be used. Hopefully this
situation will improve a lot when fully capable 64-bit versions of Excel are implemented
and running on fully capable 64-bit versions of Windows. Then, the 2-GB memory limit will
be transformed into an astonishing 512 GB[20]. Most likely, that will allow Excel files to
hold ten or more worksheets entirely filled with data.
2.3 The File Level
2.3.1 Sheets, Formulas, Objects and Formats
Memory usage due to simply storing data in Excel files is not only related to the
number of cells containing data, but also to the type of data and the format that each cell
contains. And we shouldn't forget that worksheet cells can also hold a very special type of
data, which is really code, for that is what formulas are. Formulas, and especially array
formulas, are great consumers of memory. In general, we can expect certain logical
rules[21] to be respected, and so the resulting Excel file will be the larger, the more:
A detailed analysis of each of the previous factors would be too messy and would
probably not lead to any interesting additional conclusion. A similar reflection could be made
in the case of the Excel file, regarded as a composite of many sheets. There are, as far as I
know, no limits on the number of sheets a workbook can contain. It'll all eventually come
down to the amount of data all those sheets contain, though sheets by themselves are rightful
Excel objects and also take up memory, as we will see in a vivid example right away in
the next chapter.
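The experiment that follows relies on a short macro that adds thousands of empty sheets to the active workbook. A minimal sketch of such a routine (the name addManySheets is mine) could be:

```vba
Sub addManySheets()
    Dim i As Long
    ' Add 5,447 empty worksheets to the active workbook
    Application.ScreenUpdating = False
    For i = 1 To 5447
        Sheets.Add
    Next i
    Application.ScreenUpdating = True
End Sub
```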
As I will explain later, there is a reason for the explicit 5447 number. You will
find that after waiting for some 45 to 60 seconds, during which Excel has been
unresponsive in every sense of the word, the following result is shown:
Figure 39: Adding empty worksheets to a workbook. How far can Excel handle this?
And apparently a workbook containing 5,448 empty sheets is open and ready to be
used. But this is only a mirage. Just go to the Immediate Window and try to add one more
sheet to the workbook by typing
Sheets.Add
Then hit Enter. Collapse is not yet here, but the result now is the following (the message
in the image is in Spanish, Microsoft Excel dejó de trabajar. Windows está buscando una
solución al problema, and translates as Microsoft Excel stopped working. Windows is
looking for a solution to the problem):
Figure 40: No more than 5447 worksheets are admitted in a workbook.
In fact, once you minimize the VBA Editor it becomes clear that nothing practical can
be done: no editing of the document at all, no selection of cells or ranges, no saving, nothing.
Excel is frozen and stumbling on the borders of collapse. If you are patient enough, you may
still manage to save the file, whose size, by the way, has not been greatly affected by
containing so many empty worksheets: the roughly 160 MB of RAM that the file consumes
while open translates to only 3.6 MB on disk once it is closed.
However, let me warn you that once you have closed the file, you probably will not be able to
open it again with its full functionality. The VBA Editor may still respond to some basic
instructions, as we saw before, so let us try something simpler than adding a new sheet. For
instance, let us count the number of sheets in the file. Type this in the Immediate Window and
press Enter:
The answer, 5,448, corresponds to the initial sheet plus the 5,447 added by the macro. And
in fact, it is not mere chance that I chose the number 5447 in the previous subroutine. The
first examples I ran contemplated an initial number of 100,000 sheets to be added, but I
consistently found that Excel stopped when the macro tried in vain to add the 5,448th sheet and,
in the middle of a poor graphic behaviour, some traces of which you can still appreciate in the
next image, it gave the following error message: Run-Time error 1004, Method Add of
object Sheets failed.
Figure 41: Graphical performance is very poor under these circumstances
Taking all this into consideration, I think that we are not dealing with a RAM
usage problem here. In fact, if you look at the process in the Windows Task Panel,
you will realize that creating a 5,448-sheet workbook can consume, at least in my case,
between 160 and 370 MB of RAM, far away from the 1.5 GB (or 3.3 GB after LAAC) that
would make Excel collapse. File size is not a problem either, even if we consider that the
virtual memory consumed is somewhat greater. As I stated before, I have been able to save the
5447-sheet file and see that the associated file size is only 3.6 MB. But then any operation
with this file is slow and painful and eventually turns out not to be possible. I did the same
exercise decreasing the number of sheets created and found the same usability problem with
4,000 sheets. Saving is still possible and results in a modest 2.0 MB file size, but Excel
operates with a certain delay, and once the file has been closed, it only re-opens to a white
frozen screen.
Why, then, is Excel freezing here, if RAM and file size are not getting anywhere
near the apparent limits? Office specifications for Excel clearly state that the number of
worksheets a workbook can contain is only limited by the available memory[22]. But we
seem to have run into a different problem here. Failure clearly occurs in the treatment VBA
applies to the Add method of the Sheets object which, for some unknown reason, cannot
put more than 5,448 sheet items into a single workbook. I repeated the same process on a
16-GB RAM PC and experienced exactly the same results. So chances are that we have found
an unknown Excel VBA limit. I will keep on experimenting, just in case.
With respect to the difficulties in operating a workbook containing so many
worksheets, virtual memory, again, cannot be the real problem behind them. My only guess is
that the problem originates within the internal XML structure of the workbook. Since the
workbook contains thousands of worksheets, the system has to take into account and load into
RAM thousands of XML relationships ready to be used. The file also includes
thousands of simple XML files, one for each sheet. In all, the resulting Excel file is an
enormous conglomerate of XML sub-files within the different directories, together with an
extremely long list of relationships that slows down performance to an astonishing degree.
See first an image of the files within the worksheets directory:
Figure 42: There are as many sheet.xml files within the general XML structure of the Excel compound as
worksheets the workbook contains
And now a brief sample of the relationships file. There are thousands of lines of code
like the next ones:
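For reference, the entries in the workbook relationships file (xl/_rels/workbook.xml.rels inside the package) follow this general pattern; the Ids and targets below are generic examples, not copied from the actual file:

```xml
<Relationship Id="rId1"
  Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet"
  Target="worksheets/sheet1.xml"/>
<Relationship Id="rId2"
  Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet"
  Target="worksheets/sheet2.xml"/>
```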
Figure 44: I tried to trick Excel into letting me get rid of the only worksheet within a workbook, operating
from the IDE. It was not possible even to hide it.
Charts are one of Excel's strong presentation points, and that is why they are widely
used. They are probably the most commonly used floating object, and even though it is not
likely that we will encounter the frontiers of this feature during our professional
experience, it will not hurt to know the key point: a single chart of whatever kind cannot admit
more than 255 series of data, or refer to more than 255 different sheets. The rest of its
features, such as the number of points a series can contain, are only limited by available
memory. As a case in point, the next image shows Excel's behaviour when charting a
logarithmic graph containing a whole column of data: 1,048,576 numbers. The graph only
took some seconds to pop up, but it soon became apparent that simply resizing it and moving
it around the visual area already posed problems and triggered the No Response message on
the program bar. However, Excel was able to recover after a while.
Figure 45: "No response", said the bar title for some seconds while the chart for 1,048,576 data was being
drawn by Excel
Images can also be inserted in many ways within a workbook, the most common
being when they are left floating on top of a worksheet, or maybe within a previously inserted
floating shape. But it is also possible to set up an image as the background of a particular
worksheet by going to the Page Layout menu tab in the Ribbon and then to Background.
Excel can handle most image formats: bmp, jpg, png, tiff... You will notice that once you
have inserted an image and left it floating over a worksheet, Excel will provide a default name
in the name box in the upper left corner. This name can be changed by editing the contents of
the name box, and will represent the picture in all matters related to its handling via VBA
code. Supposing a picture has just been inserted in a recently created Excel workbook with
one single worksheet, the name given by default will probably be Picture 1.
Figure 46: When a picture is selected, the name box shows its name as a worksheet floating object
The correct syntax for referring to this picture within the IDE, for instance, in order to select it, is as follows:
ActiveSheet.Shapes.Range(Array("Picture 1")).Select
And notice how it is necessary to refer first to the sheet that contains the picture. Objects in Excel do not float over workbooks in general but over concrete sheets within a workbook.
Once we have selected the floating image object, a whole range of possibilities opens, from changing properties to performing actions. The best way to learn about these possibilities is to play directly with the Macro Recorder and see what code is generated by the actions we perform or the properties we change.
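As a minimal sketch of the kind of code the recorder produces (assuming the default name Picture 1 from the example above), the following subroutine resizes and then hides a floating picture:

```vba
Sub TweakPicture()
    ' Assumes a floating picture named "Picture 1" on the active sheet
    With ActiveSheet.Shapes("Picture 1")
        .LockAspectRatio = msoTrue   ' keep proportions while resizing
        .Width = .Width * 0.5        ' halve the displayed width
        .Visible = msoFalse          ' hide it; it can be revealed again later
    End With
End Sub
```

Recording yourself performing these same actions manually yields essentially this code, only more verbose.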
And so it was that thirty or more huge images appeared floating over the third sheet.
Once I got rid of them, the file size diminished to a manageable 18 kB. By the way, there is no
need to resort to this VBA trick since Excel 2007.
In the Home tab of the ribbon, go to Find & Select and then choose Selection Pane. All the objects floating above the active sheet will appear in the pane, and there we will have the chance to show them if they are hidden, or hide them if they are visible. Beside each floating object, on the right, you will see the icon of an open eye if the Visible property of the object is TRUE, or a horizontal line or shut eye if it is FALSE. Two buttons on the upper part of the pane allow for the general hiding and revealing of all the objects floating over the active worksheet, including images, charts, and the like.
Figure 47: Since Excel 2007, it is very easy to manage the visibility of all the objects floating over a worksheet
A selected picture, or group of pictures, can also be compressed after insertion by selecting it and then going to the PICTURE TOOLS tab that appears over the ribbon, on the upper part of the title bar, featured in a different color from the rest of the tab options, and then going to Format/Compress Picture. The chosen options can be applied only to the selected image or group of images, or they can be extended to all the images in the workbook, by checking or unchecking the checkbox that says Apply only to this picture.
All treatment of images in Excel should be done bearing in mind that the program was not designed as a tool for manipulating images. It is, nevertheless, impressive to observe the astonishing number of options and tools that the ribbon provides in order to choose the final appearance of the picture.
An awful lot more could be said about images and other floating objects, but in
regard to their influence on the file size and the possible constraints they can impose on the
limits of Excel, I think we have covered pretty much the basics.
3 OPERATIONAL LIMITS
Once we have examined the limits at the first level, that is, those related to the general functional structure of the software and those related to the amount of data that can be put into the workbook, which we have referred to as structural and content-related limits, it is now time to move on to the next level. Now we will regard Excel as, basically, a set of tools for performing analysis and calculations on data previously loaded into the cells, so we must take a look at the limits of these tools which, in contrast with the ones we saw in the previous chapter, whose character was mainly conformational or structural, are now clearly operational or functional limits.
#REF!: The row or column a cell refers to has been deleted or moved. Remedy: undo the cut or delete action.
#N/A: One or more arguments cannot be found in the places they are expected to be found; typical of lookup functions. Remedy: check all the arguments and make sure the values can be found correctly.
#NULL!: Special message in case the ranges named in a reference do not intersect. Remedy: we may have built the wrong reference; check it.
#VALUE!: When evaluating a formula, Excel cannot convert some function argument into the expected data type, as in =SUM(5+W). Remedy: check all the arguments and make sure they are providing the expected data type.
Blue arrows: Circular reference; a cell depends, directly or indirectly, on its own value. Remedy: check dependencies and get rid of the circularity.
The following image shows a mosaic example of all the formula errors, including the warning message that Excel gives when opening a file in which a circular reference is found.
Figure 50: All the different types of worksheet formula errors and a warning message about circular
references at opening the workbook.
Once the average user starts to feel comfortable using formulas, one of the things that immediately follows is the tendency to gradually complicate formula expressions by nesting functions within functions, in such a way that, although the readability of the final expression is far from clear, the user feels they are making advanced use of the tool, and nobody can convince them of the contrary.
The maximum number of nested functions that Excel will tolerate in a single formula since 2007 is 64[26], and that is probably more than it should be, because the truth of the matter is that the natural obscurity of the formula language, together with the absence of a decent formula audit tool within Excel, makes such an expression very difficult to understand for the analyst, for the auditor and, alas, even for the very author only a few weeks after writing it, even if the number of nested levels is only three or four.
In any case, it is worth examining how Excel deals with this highly unlikely event. Let us, then, purely for illustration purposes, make an attempt at typing a formula with more than 64 nested levels, even if it is devoid of proper practical syntax, and see what happens: how and when Excel reacts, whether in the middle of typing or only after we press Enter, and what kind of warning message it provides. All these points are exemplified in the following image:
Figure 51: Excel's reaction at the user trying to type too long a formula.
3.1.2.2 Maximum Length in Formula Strings and Characters
Even if the number of nested levels that we intend to use in a single formula is no bigger than a prudent two, there is still the possibility that the formula gets longer and longer, simply because many partial terms are being aggregated and we consider it more compact and more meaningful to keep it all within a single cell. In that case, several things should be taken into consideration in order to avoid potentially serious problems.
First, we may want to deal with strings of characters directly within the formula. Excel has no problem identifying and operating with strings; for instance, the most usual string operation is simple concatenation. However, no single string literal within a formula can be longer than 255 characters. The following image depicts the situation clearly. The =LEFT() formula is correct in syntax, but contains a string longer than permitted, and when we hit Enter, Excel warns us against that action.
It is still possible to circumvent this limit by concatenating as many pieces of string as we like, as long as each is shorter than 255 characters, joining them by means of the usual concatenation operator, the ampersand sign, &. And yet, although, as we have already seen, a single cell can hold as many as 32,767 characters, it cannot hold more than 8,192 characters if Excel has to interpret the expression as a formula, that is, if the expression is preceded by an = or a + sign. It is not easy to think of a reason why someone would want to write such a long formula, other than to obfuscate their solutions and to hinder the work of auditors and the future maintenance of the workbook, even for the author himself.
In order to illustrate this point I have created an absurd and artificially long Excel formula, concatenating several strings of text in the middle and accumulating a number of characters within the cell bigger than 8,192. If the cell were not headed by an = sign, Excel would take the entry as text and that would be all, but the = sign forces Excel to interpret the string as a formula. And this is what happens:
Figure 53: 8,192 characters is the longest any worksheet formula can be, if we want Excel to be able to interpret the entry as such.
No formula in Excel will admit more than 255 separate arguments in a single expression. Do not mistake the number of separate arguments for the number of cells in a range provided as a single argument. When a range of many cells is provided as a formula argument, it counts as only one argument, and 254 more arguments could still be included in that formula before Excel would experience any trouble in this matter.
For example, let us fill the entire column A with the number 1 , and then in cell B1,
we will write the formula:
=SUM(A1:A256)
Excel will interpret this correctly and return the outcome 256. But if instead we take an extremely weird approach and write all the arguments of the formula explicitly, one by one, such as:
=SUM(A1;A2;A3;A4;…;A256)
In spite of the fact that typing all these characters seems a very tedious task, we could easily have the following subroutine write the string SUM(A1;A2;…;A256) into cell B1 for us:
Sub writearg256()
    ' Builds the text "SUM(A1;A2;...;A256)" and writes it to cell B1.
    ' Note: ";" is the argument separator in a Spanish-locale Excel UI;
    ' English versions use "," instead.
    Dim i As Long, myargs As String
    For i = 1 To 256
        myargs = myargs & "A" & i
        If i < 256 Then myargs = myargs & ";"
    Next i
    Cells(1, 2).Value = "SUM(" & myargs & ")"
End Sub
And once we have all this text written into the cell, we only have to add the = sign at the beginning, in the edit box of cell B1, and so we will have transformed the long string into a formula that Excel will be forced to interpret as such. The result will be the following warning message:
In the following example, a badly used formula will result in thousands and thousands of unnecessary calculations, driving Excel to collapse through lack of available resources, in this case by sequestering 100% of the CPU. In a newly created workbook, we will fill Column A with natural numbers, the positive integers from 1 to 1,048,576. This task can easily be performed in the following manner: type =ROW() in cell A1, copy the contents of cell A1 and paste them in one go into the entire Column A by selecting the whole column (clicking on the letter that designates the top of the column, that is, including cell A1); since the relative =ROW() formula returns each cell's own row number, we can then freeze the results by pasting the contents as values.
Figure 55: Using formulas extensively in a careless manner can provoke massive sequestering of CPU
resources and eventual collapse
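The same column can also be filled from VBA without any copying and pasting; a minimal sketch, assuming the target is the whole of Column A:

```vba
Sub FillColumnWithRowNumbers()
    ' Fill A1:A1048576 with each cell's own row number...
    With Range("A1:A1048576")
        .Formula = "=ROW()"
        ' ...then replace the formulas with their results,
        ' so no recalculation burden remains in the worksheet
        .Value = .Value
    End With
End Sub
```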
Goal Seek will look for solutions to a certain equation, or model, or formula dependency, in which the whole model has been made dependent on only one unknown parameter. Goal Seek proceeds by an approximation, trial-and-error method equivalent to what in mathematical calculus is known as the bisection method, which is a brute-force algorithm. Goal Seek can be found in the Data tab of the Ribbon, in the What-if analysis group, and it will show the solution (To value) in a cell (Set cell) containing a formula depending on another cell or input value (By changing cell) in which Excel will perform the trial-and-error process, starting from the value that the cell contains at that moment. The following image shows the initial set-up of the utility for a certain simple case. There are three parameters to introduce, and many users find them presented in a somewhat irritatingly confusing dialog box:
Set cell:
To value:
By changing cell:
I will avoid both the discussion about the suitability of the dialog box and superfluous explanations of these, I think, self-explanatory concepts, and go directly to the illustration that an example provides.
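Goal Seek is also exposed to VBA through the Range.GoalSeek method; a minimal sketch, where the cell addresses are merely illustrative:

```vba
Sub RunGoalSeek()
    ' Equivalent to Set cell = F25, To value = 100, By changing cell = C2
    Dim found As Boolean
    found = Range("F25").GoalSeek(Goal:=100, ChangingCell:=Range("C2"))
    If Not found Then MsgBox "Goal Seek may not have found a solution."
End Sub
```

The method returns True when the target value was reached, which makes the failure cases discussed below easy to detect in code.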
Figure 56: Setting Goal Seek to work, but since the domain of the model does not take values lower than 25,
we will never find the solution. The domain of the model should be known in advance
The very nature of this method already warns of possible unsolvable situations, and the previous graphic example is intended to illustrate the first likely problem in this respect. As we can see, the curve is a parabola, and it gives us something we cannot always have: a visualization of the model contained within cell F25. This visualization allows us to see clearly that the variable y simply does not go lower than 25. So no matter how hard we try, how many iterations we force Goal Seek to take, or what starting value we choose for x, Goal Seek will never find a solution, that is, a value of the parameter x for which the value of the variable y is lower than 25, because y just does not go into that domain.
Unless we are working in the field of pure mathematics, a visualization such as the one shown above is very rarely possible in our usually complex models. But this example clearly tells us that unless we have an approximate idea of the general values our model can take, and of the intervals of validity or the expected output, we will be at a disadvantage if we blindly accept what Goal Seek gives as a result.
So, having this in mind, we can now start establishing and classifying the possible situations where Goal Seek will not find a solution even in cases where one or even many solutions exist. Goal Seek will behave well when:
1. The solution sought is within the domain of the function that represents the mathematical model.
2. The starting point is well located relative to possible local extremes.
But in case any of these two conditions are not fulfilled, Goal Seek might easily get
into trouble.
It may seem incredible that something so simple could happen, but our models sometimes get so complex, especially if we build many connected dependencies between different cells, different sheets and sometimes even different workbooks, that if we are not careful enough we run the risk of losing track of those connections. And the resulting domain of the function that our model represents may not be as easy to visualize graphically as the one we saw in the previous example. As a general practice, I recommend always doing a little graphic work, finding the most representative values of the model by giving it the most commonly expected inputs. That might even give you a clear picture of the domain of the model, so as not to look for nonexistent solutions, and of possible local extremes, as we will soon see, so as not to complicate the finding of a solution that does exist. Going back to the previous example, we will try to find a solution for y=20, which does not exist. In that case, Goal Seek will complete the default number of iterations, which is usually 100. This number cannot be greater than 32,767 and is set in File/Options/Formulas:
Figure 57: Setting up the number of iterations
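The same calculation settings can be adjusted from VBA; a sketch, where the values chosen are arbitrary:

```vba
Sub SetIterationLimits()
    ' Equivalent to the File/Options/Formulas iteration settings
    With Application
        .MaxIterations = 1000   ' cannot exceed 32,767
        .MaxChange = 0.001      ' stop when changes fall below this threshold
    End With
End Sub
```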
And then begin seeking. The result, as we already know for this case, will be disappointingly revealing: "Goal Seek may not have found a solution".
Figure 58: Goal Seek has not found a solution. In this case such solution didn't exist because the domain of
the model did not include the target value 20
3.1.4.1.1.2 Wrong Starting Point
In this case, there is a solution to the problem. But the bisection method operates by consecutive approaches to the sought point, dividing intervals as widely as it can in order to optimize the speed of the process (this is called a greedy algorithm) and progressing in the direction that appears to be getting nearer to the aim at the quickest possible rate. If these divisions yield confusing results, for instance if no clearly quicker direction of approach is found, Goal Seek may be unable to opt for any particular direction, because none is getting any nearer to the goal. Thus all the iterations will be spent without leaving the vicinity of the starting point. This is illustrated in the following graphical example:
The model is represented by a cubic curve, and the presence of two local extremes, symmetrically situated relative to the starting point, will get Goal Seek entangled in an unclear situation around x=0 from which it will never be able to escape. And yet the solution for y=400 is there, conspicuously around x=9, and a mere shift in the starting point from x=0 to x=1 will help Goal Seek find it easily.
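The two attempts can be compared directly in VBA; a sketch, where B1 holds x and B2 holds the cubic model (both addresses hypothetical):

```vba
Sub CompareStartingPoints()
    Dim found As Boolean
    Range("B1").Value = 0      ' symmetric starting point: likely to fail
    found = Range("B2").GoalSeek(Goal:=400, ChangingCell:=Range("B1"))
    If Not found Then
        Range("B1").Value = 1  ' a small shift usually unblocks the search
        found = Range("B2").GoalSeek(Goal:=400, ChangingCell:=Range("B1"))
    End If
End Sub
```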
Curiously enough, models apparently similar to each other do not always have the same outcome when subjected to Goal Seek. The following image shows two curves of the same general shape and the same general growth pattern. But the presence of local extremes in one case, and their absence in the other, causes them to produce very different results with Goal Seek. Starting from x=0, the solution will not be found for the curve with extremes, but will easily be found for the other.
Figure 60: Two apparently similar mathematical models that cause Goal Seek to behave radically differently
And what if the equation that represents the model has more than one solution? In that case, Goal Seek will at best find only the one that is closest to the starting point. And that is the reason why it is so important to have some graphical idea of the model the equation represents. Any mathematical model that includes trigonometric functions, for instance, is very likely to have many solutions, and then the only way for the user not to get lost in the jungle of possibilities, or to settle for the first solution that comes across, is to know what they are doing.
And let us imagine an equation that represents a mathematical model for the evolution of a certain parameter y, and that we are interested in monitoring the positive values of x for which y=5 (negative values of x are not a problem because they do not correspond to the behaviour of the variable x in our model). First we would calculate the values of y for a sufficiently large and representative sample of values of x, and we would chart them. This is what we would get:
Figure 61: The model cuts y=5 in four different places, so depending on the starting point, Goal Seek will be
able to find up to 4 solutions
As we can now see, the domain in which positive values of x reach y=5 is limited to the interval from x=0 to x=5 and the number of solutions, as the chart reveals, appears to be 4. In order to find all four solutions we will have to modify the starting point, or initial value of x, that is, the By changing cell in the Goal Seek dialog box. For instance, if the starting point is x=0, Goal Seek will find only the first solution, and it will never get close to the three other solutions we can see beyond it. It is only our graphical understanding of the model that will allow us to try different starting points in order to get all the existing solutions.
Figure 62: The model includes a rounding formula, and this is a potential limitation for Goal Seek, because it
forces a certain precision that may not always be attainable
And when we use Goal Seek in order to find the value of the Base concept for which the total amount will be 8.5, we find this:
Figure 63: Indeed, rounding has made it impossible for Goal Seek to reach the target value after the number
of established iterations.
Exactly the same restrictions we have described for Goal Seek apply to the SOLVER utility which, in the end, is a kind of Goal Seek with extended functionalities, like cell values adjustable to conditions: max, min, etc. By the way, SOLVER will not accept more than 200 of these adjustable cells.
There are limits for each one of the wonderful utilities that Excel is natively provided with: filtering, sorting, conditional formatting, data importing. Though it is true that some of these limits have been so improved since 2003, going from the tens to the thousands, that they have become insignificant or irrelevant.
The irrelevancy can come from hyper-abundance. Having the possibility of mixing 64,000 different formats or styles in a worksheet is fine, but unless you are trying to graphically design a product, in which case you are not using the right software in the first place, what do you want so many formats for? Clarity? And if we speak about filtering or sorting, it usually happens that after filtering or sorting 4 or 5 fields in a table it becomes really difficult to keep track of things, and perhaps a few simple SQL SELECT statements in a database management system would be more effective. So, is it really useful to be able to sort with 64 criteria instead of the usual 3 of 2003 and previous versions? The user will decide.
So, more than writing about the limits of these utilities, which by the look of them are
never going to be a problem, it would be better to write about constraints and rigidities. Do
you expect such things as retro-data-validation or sorting-dependent named-cells in a
spreadsheet software? These are neither Excel limits nor Excel bugs, only unrealistic
expectations about what a spreadsheet should do.
This case happens rather more frequently than expected. We have a simple data table in Excel and, for the sake of clarity (as was the opinion of the user who did this), the cells of the first column have been given names in the Name Box. Those names refer, in one way or another, to the contents of the cell. In the example below, and for the sake of simplicity, I have named each cell exactly like its contents, so that the connection should never be lost. And yet, a simple sorting operation will destroy that connection.
Figure 64: At first sight, it might look logical to name the cells according to their contents
And then imagine we want to sort the table by age, from largest to smallest values. What will then be the content of the cell named Paul? It will be Lisa. So the user had better stop complaining about Excel rigidities and reflect on the practical value of the strategy of naming those cells after their contents if manipulation is to follow later.
Figure 65: But if later we are going to perform sorting, naming the cells according to their contents might not
be a good idea at all
And so, we have to take that into consideration when we name cells in tables that will
later be subjected to sorting operations. Otherwise, the mess can be considerable.
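The mismatch is easy to verify from VBA (the name Paul comes from the example above): a defined name keeps pointing at the same physical cell, whatever sorting does to its contents.

```vba
Sub CheckNamedCell()
    ' After sorting by age, the cell named "Paul" still refers to the same
    ' physical cell, whose content may now be a different person's name
    MsgBox Range("Paul").Value
End Sub
```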
This is one of the best known and most evident limits of Excel. A worksheet having had 1,048,576 rows since Excel 2007, one should not expect the group of utilities in the Data tab of the ribbon under the name Get External Data to be able to deal with databases having more records than the previously mentioned number.
Figure 66: The maximum number of rows in Excel is 1,048,576, and we had better take it into account when getting external data
Let us check Excel's performance at this. Imagine we have a .txt file originating from a database and containing exactly 1,048,577 rows of data, one more than the number that Excel can theoretically handle. We will open a new Excel workbook and go to the Get External Data/From Text utility.
Figure 67: This dialog box is activated when we want to get external data from a text file
When we click Import, a 3-step process will begin, in which we will leave the default options that Excel shows selected (delimited, tabs, import) and then choose to put the data in the existing worksheet. Excel will show this warning message:
So Excel will do as much as it can. All the data that fits in the worksheet will be imported, but the rest will be discarded. There are ways to circumvent this problem with the shiny Power Utilities but, as the subtitle of the book says, that is a very different story.
3.1.5.4 No Retro-Data-Validation
A great deal of Excel troubles are related to poor or untidy input data. Data Validation is great for this: it allows us to control user inputs with high accuracy, forcing the user to choose from among previously accepted values. But Data Validation does not work backwards in time. Imagine we have a source list as data input for a table, as the following image reflects. The items in the Color field can only be chosen from the values in the source list $B$3:$B$5, which we are going to name as a range, sourcelist. But as time passes we realize that a typo was made in the source list: instead of writing Blue, we wrote Bleu, and there are many entries under the Color label which are erroneous. Changing the value in the sourcelist will not change all the wrong values in the Color field of the Units/Product/Color table. This may very well be our intention, but it may not be, and Excel does not know and does not take any default assumptions for granted.
Data Validation has no native retroactive capacities, and we might find ourselves in the uncomfortable situation of having to change all the values in the table manually. In fact, it is not such a big deal if we use Find and Replace, but it is still manual, and we always try to avoid that because it is error-prone.
Besides, Find and Replace is always a clumsy approach, and a VBA solution could be designed to monitor changes within a previously defined sourcelist range and offer to make this find-and-replace operation automatically, by means of a macro, and only within the list of validated values, if the user so chooses.
Actually, this is a stimulating problem; it involves at least two worksheet events that are extremely interesting and, when they operate in combination, difficult to handle, and that is why we are going to solve it for the particular case of the previous example.
Let us change slightly the configuration of the problem as can be seen in the
following image.
Figure 70: If something changes in the source of validation, and we want that change to be reflected in the
validated list, how can we automate this?
The source list will be called myList, and the destination table of values will be simplified to a few cells in a single column, J4:J13. Also, we will use city names instead of colors. First, let us state the problem clearly. What we want is that, in case any cell within the source list myList range is changed, a VBA macro asks the user if they want the change to be reflected in the Data column, or myValidation range. If the user chooses Yes, the values are changed; if not, nothing happens. This is the code associated with the Sheet1 object:
Public itWas As Variant, itIs As Variant

Private Sub Worksheet_SelectionChange(ByVal Target As Range)
    'Whenever a cell inside myList is selected, store its current value
    If Intersect(Target, Range("myList")) Is Nothing Then
        'Selection is outside the source list: nothing is done
    Else
        itWas = Target.Value
    End If
End Sub

Private Sub Worksheet_Change(ByVal Target As Range)
    'Capture the value just introduced and store it in itIs,
    'then call the retro-validation macro with two arguments:
    'what it was before and what it is now
    If Intersect(Target, Range("myList")) Is Nothing Then
        'Change is outside the source list: nothing is done
    Else
        itIs = Target.Value
        Call retroDataVal(itWas, itIs)
    End If
End Sub
Figure 71: This is the code inserted in the Sheet1 object, event worksheet change.
And this is the code of the macro retroDataVal, included in Module1 of the workbook, which will be activated when a change event occurs within the input area of source values.
Sub retroDataVal(past As Variant, present As Variant)
    'Ask the user whether the change has to be reflected on Data
    Dim whatToDo As VbMsgBoxResult, makeChanges As Boolean
    Dim cell As Range
    whatToDo = MsgBox("Validation source changed" & vbCrLf _
        & "To reflect on Data click YES", vbQuestion + vbYesNo, "Replicate Change")
    If whatToDo = vbYes Then
        'If the answer is YES...
        makeChanges = True
    End If
    Application.EnableEvents = False    'suspend events while we edit
    If makeChanges Then
        For Each cell In Range("myValidation").Cells
            If cell.Value = past Then cell.Value = present
        Next cell
    End If
    Application.EnableEvents = True     're-enable events
End Sub
Notice the importance of using the worksheet events Change and SelectionChange in the proper places, one to capture the value just introduced and the other to store the previous value. Also, when the retroDataVal macro is called, it is necessary to suspend temporarily all events in the application:
Application.EnableEvents = False
Once the editing process is done, application events have to be enabled again (True) in order for the utility to keep working properly. Now suppose there is a change in the original source myList, and Madrid becomes Barcelona. This is what happens:
Figure 73: Now we have the automated process we were looking for. The chance to reflect changes in the final
data is offered
In case the user clicks Yes (Sí in the Spanish version), the change will be reflected in the Data column. In case they click No, nothing happens. It is just what we wanted.
3.2 Number of Undo Levels
Sooner or later we all find ourselves in a situation with Excel where we need to undo a number of actions. Excel comes preconfigured with 16 undo levels, which should be enough for most purposes. And yet it is possible to change this by editing the Windows Registry and making the appropriate modifications. Microsoft recommends not setting the number of undo levels to a value higher than 100, because it affects Excel's performance very negatively due to its impact on RAM usage. Changing the Windows Registry is a delicate action that I will not dare to recommend here, nor will I give explicit details on how to do it. But in case it is of vital importance for you as a user to count on more than 16 levels of undo, just follow the guidelines given by Microsoft[32] and save a backup copy of the Windows Registry first, just in case something goes wrong and your whole Windows system gets affected.
Also bear in mind that macro actions cannot be undone. That is why it is always important to be aware of the responsibility implied in developing and using VBA subroutines. Macros can perform a huge variety of tasks, from the simplest to the most complex, but if they have been designed, among other things, to make changes in the data loaded on the worksheet, then the first precaution a good developer should take is to keep a copy of the original data safe somewhere, so that it can always be brought back if necessary.
There are several approaches to achieving a certain level of security with respect to undoing actions performed by macros. The ones I am going to enumerate here are the most obvious, and they are not incompatible with each other.
1. Always save the workbook manually just before executing the macro. In this
way, should something unexpected happen, you will always have the
previous state at hand. Yes, it is quite tedious, I know.
2. Design your macros in such a way that the first thing they do is always to
make a backup copy of the worksheet or worksheets affected. You can keep
the copies in the same workbook you are working on, or you can send them
to a new one.
3. Always separate the input data from the output data in your workbooks,
preferably using different worksheets but if this is not possible or not
visually recommendable, separate clearly input from output areas within the
same worksheet. Now you can apply tactic number 2 only to the range area
which is affected by the macro. This area should logically be the output area.
4. The most radical and complex approach could be to include code so that the
very macro we have designed stores the existing data internally in some
variables or somewhere else in the workbook and contains a subroutine for
exactly undoing the action, a kind of anti-macro. This subroutine could be
executed automatically at the end of the procedure and could show an
optional dialog box, allowing the user to choose between undoing the
actions recently performed and accepting the result if everything looks ok.
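Approach number 2, for instance, can be sketched in a few lines of VBA (the sheet name Output is hypothetical):

```vba
Sub BackupThenRun()
    ' Copy the sheet about to be modified, so its state can be restored
    Dim wsBackup As Worksheet
    Worksheets("Output").Copy After:=Worksheets(Worksheets.Count)
    Set wsBackup = Worksheets(Worksheets.Count)
    wsBackup.Name = "Output_backup_" & Format(Now, "hhmmss")
    ' ...the destructive actions on Worksheets("Output") would follow here...
End Sub
```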
There is no question that VBA makes Excel much stronger than it is by itself. The landscape looks different, and no doubt richer, when contemplated from the point of view of a solid knowledge of the combined capacities of both sides. The IDE even allows the use of Excel as a pure programming environment that only reads values from a worksheet, operates, and then returns the outcome to a worksheet. However, the true power comes from the cooperation of Excel and VBA through the use and manipulation of the Excel Object Model. Everything in Excel (data, forms, shapes, utilities, sheets, ranges, cells) can be treated as a programming object. Its properties can be changed by means of code, and the actions we can perform on objects by ordinary means (contextual menu, ribbon) can also be performed by the corresponding VBA object methods.
But no matter how powerful this appears to be, VBA does not turn Excel into a
magic wand. There are also limitations to consider when using VBA. Some of these limits are
imposed by the constraints of the programming language itself, some by mathematical
impossibilities that not even computers can avoid, and some by inappropriate interaction
between VBA code and the Excel Object Model. We must once again make a distinction
between limits and errors. Programming errors are usually considered under one of the
following two categories: syntax errors and Run-time errors.
4.1 Syntax Errors
Syntax errors happen when carelessness in writing the code leads to what we could
call VBA typos, that is, words that have been incorrectly written and cannot be interpreted at
all by the VBA compiler. Depending on the structure of the code and the exact nature of the
typo, Excel can react in different ways. If the misspelled word is a reserved name, like
AgtiveCell instead of ActiveCell, and no additional measures have been taken, then the
syntax error can pass unnoticed by the compiler and show up at execution as a Run-time
error. Consider the following simple subroutine:
Sub check()
    Dim qvalue As Integer
    qvalue = InputBox("Enter value")
    AgtiveCell.Value = qvalue    '<-- this line contains the typo
End Sub
Execution will cause an InputBox to pop up, and once the value has been entered
and accepted, this is what happens:
Run-time error 424 reports that an object is required. This object is ActiveCell,
which has been inappropriately written as AgtiveCell. If the mistake had been made not in
writing a reserved word, that is, a word that Excel VBA already knows, but in a newly created
word for a variable, the situation would be far worse, because Excel has no way to guess the
badly written word; it would simply treat it as a different variable with no value assigned and
therefore do nothing.
Executing this code will cause the InputBox to show up, but after typing the entry and
clicking the OK button, nothing at all will happen. I said this is a far worse situation because,
at least in the previous one, an error message was displayed, offering clues about how to
proceed. In this case, the user is left completely in the dark. That is why Option
Explicit at the top of each VBA module is very helpful: it stops syntax errors from going
unnoticed, from revealing themselves as Run-time errors at execution or, what is even worse,
from remaining hidden and inhibiting the code from working at all. Option Explicit
forces the VBA compiler to check the syntax before execution and requires explicit variable
declaration. In the previous example, this is what would happen for a mistakenly or badly
written variable:
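That behaviour can be sketched in code. This is my own minimal illustration, assuming a misspelled variable name like qvalve:

```vba
Option Explicit   ' forces every variable to be declared

Sub check()
    Dim qvalue As Integer
    qvalue = InputBox("Enter value")
    ' With Option Explicit, compilation stops here with
    ' "Compile error: Variable not defined" before anything runs
    qvalve = qvalue   ' misspelled: qvalve was never declared
End Sub
```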
4.2 Syntax Errors that Manifest as Run-time Errors
Good as it is, one must bear in mind that Option Explicit is not guaranteed to
detect and locate 100% of typos and writing errors at compile time.
Badly typed variable names and reserved words will always be detected, but typos in
object properties, for instance, very easy to make if we are typing directly or changing a
previously written property on the fly instead of relying on the IntelliSense capability, can
sometimes slip past the compiler barrier and show up at execution as annoying Run-time
errors. A long and pointless discussion could be held about whether these are syntax errors
or Run-time errors in nature. The truth is that their origin is syntactic, but since the compiler
cannot detect them, they only manifest at the execution stage as Run-time errors.
In the following image, using IntelliSense would have prevented the typo that was
made when changing the Value property of the ActiveCell object.
Figure 76: The IntelliSense feature can be of great help in order to avoid typing errors
If this error of syntactic origin is not detected by the user's own eyes when checking
the code, it will slip under the radar of the compiler and reveal itself as a metamorphosed
Run-time error.
Figure 77: But not even IntelliSense can ensure that 100% of typing errors will be avoided. Falue was typed
instead of Value. Evidently, the ActiveCell object doesn't have a Falue property.
4.3 Run-Time Errors
Strictly speaking, Run-time errors are all those that manifest as errors at running
time, including those which, their original nature being syntactic, go undetected and only
show up at execution. For the sake of clarity, and stating again that the debate is irrelevant, I
will consider all errors of syntactic origin as syntax errors, even if, in order to pinpoint some
of them, we need to go beyond the compiler and check many different cases at run time,
as a good programmer should always do, by the way.
Run-time errors are due to bad program planning, poor subroutine organization, and
inadequate checking and auditing of variable scopes and intervals. Not being syntax errors,
they can pass unnoticed by the compiler, but they should not escape the always indispensable
proofing period. There can be many types of Run-time errors. In his 2013 Excel book, John
Walkenbach gives a list of about 80 types, and it would be pointless to describe them here
again. What we will do instead is to analyze the ones that are due to the inherent Excel VBA
limitations.
4.4 Variable Overflow
Usually, the results and intermediate states of calculations within VBA macros will be
stored in variables that we declare within the code. These variables are not ideal
containers that can hold all possible values from the infinitesimally small to the infinite. They
have extremes, minimum and maximum values, and once these limits have been exceeded, the
code will show an error message and will not be able to continue executing. A typical example
is this: we declare a variable as Integer, a type that can only hold values between -32,768
and +32,767. However, we have not realized that a certain operation within the code will turn
up +33,879 as its outcome. The moment the code arrives at this point, execution will be
interrupted and the following message shown:
Figure 78: The variable type cannot hold the value and the result is an Overflow error. Notice the title bar says
only Microsoft Visual Basic
And similar limits can be found for the rest of the variable types: Single, Double,
Decimal, Byte, Currency… A simple search on the internet will give the exact validity
intervals for these types, which any decent programmer should have at hand in order
to avoid disagreeable overflow errors. See chapter 2.1.
Reduce the size of those 6 dimensions to more moderate values and the macro will
run perfectly well, provided that the system can allocate the right amount of memory for the
expected size.
4.5 Extended Data Types in VBA not Supported by The
Worksheet Environment
In most cases, VBA variable types can support a wider range of values than their
worksheet equivalents. In 2.1 we already saw an overview of some numeric limits that
VBA can widen a little, relative to the contents of cells and the results of formulas in the
typical worksheet. But there is a special case where the difference is remarkable: dates.
The difficulty comes from the fact that the Excel worksheet will not be able to show these
extended VBA dates as properly formatted dates. We can work with them as dates only
within the IDE environment and show them as dates in the Immediate Window, but if we need
to show them in a worksheet cell, we will be able to do so only as text.
It's important for the average user to understand that a worksheet cell will never
be able to show any date, with the proper date format, outside of what we have called the
comfort area or interval of validity of dates, which spans from 01/01/1900 to 31/12/9999,
corresponding respectively to the integers 1 and 2,958,465. But as I said somewhere before,
VBA allows date calculations outside the limits of this interval, at least towards the
distant past. And let's face it, any attempt at extrapolating financial trends further
than 31/12/9999 is too long a shot, and much as I trust humanity will have survived and
evolved by that date, and much as I love Excel, I sincerely doubt that spreadsheets will still be
around in those remote future days.
Remember that in the last example about dates in chapter 2, we saw that when a date is
outside of the comfort area for dates and the cell that holds it is properly formatted as a date,
the cell still contains the integer, but Excel merely displays a string in the form ##########.
At the same time, a tip appears if we hover the mouse over the cell, warning that negative or
too-large dates and times are shown in this way. However, something different occurs when
we ask VBA to convert a negative figure from integer to date. Let's see this with an example
that we'll execute from the Immediate Window.
Figure 81: As we already saw in a previous example, dates in VBA cover a longer period towards the
past than the mere worksheet calendar.
And indeed, -14 has been converted to a date by VBA without any problems. In fact, as
long as we operate within the IDE environment we can add, subtract and operate normally
with dates, going back into the past as far as 01/01/100 AD:
Figure 82: The first of January of the year 100 A.D. is the starting date for the VBA internal calendar. That is,
indeed, the remote past. Trajan was ruling the Roman Empire then.
Figure 83: In fact, the year 99 A.D. is not contemplated, but automatically interpreted as 1999.
Figure 84: Do not forget that every date is, at bottom, both in the worksheet and the VBA calendars, an integer
number of the Long type.
There is still the impossibility of presenting this result with a proper date format in a
worksheet cell. Say, for instance, that we want to send the result of the operation
CDate(-14), which we know corresponds to the date 16/12/1899, to cell A1. Let us try to do
it, writing directly in the Immediate Window:
Figure 85: Any attempt to write data out of the validity interval with the date format from the IDE to a
worksheet cell will result in a Run-time error.
Figure 86: At least, we can present the dates outside the interval of validity as a string of text. It looks like a
date, but it is mere text. Operations and calculations within the worksheet environment are not possible with such dates.
Now let us see a more or less real-life example that requires the user to work with
dates from the distant past. From a list of early Byzantine emperors and their crowning and
death[33] dates, a student of History has to calculate the exact number of days each one of
them was in office. The student has already typed the data as a table within a worksheet. As a
separator for dates, the student has used "/", and though it appears that Excel has interpreted
the dates correctly, it is only an appearance: the left cell alignment is already warning that the
cells contain text.
Figure 87: The list of Eastern Roman Emperors and their inauguration and decease dates have been typed.
The dates only appear to be so. They are really text.
No matter how hard the student has tried to format the cells in order to compel Excel
to recognize the data as dates, Excel has remained indifferent to these demands and, the dates
being quite outside the interval of validity, it has accepted the data as text strings. The trouble
is, of course, that the simple subtraction that could provide the information we seek is
impossible, because no mathematical operation can be performed on strings of text:
But now we know we can resort to VBA and create a User Defined Function (UDF)
that can help us out of this trouble and save a lot of menial and error-prone labour counting
days and leap years, watching out for calendar changes from Gregorian to Julian, adding
up, etc. Write this code in a VBA module:
Function OldDaysPassed(fromDate As Variant, toDate As Variant) As Long
    'Days elapsed from fromDate to toDate; CDate interprets the text as a VBA date
    OldDaysPassed = CLng(CDate(toDate) - CDate(fromDate))
End Function
And then use this function to obtain the number of days in office by calculating the
difference between the integers that dates are at bottom:
Figure 89: By using a UDF we manage to make calculations within the VBA environment, where dates are
properly interpreted and then we can send back results to the worksheet cell as numbers.
And using this technique we'll always be able to take advantage of VBA's extended
interval of validity for dates, ranging from 01/01/100 to 31/12/9999. As I said before, it's
probably pointless for any financial or engineering projection to go so far into the future, but
with respect to the past, certain historical problems can be solved in this way.
The VBA calendar does not contain that mistake. Bear this in mind in case you use
VBA to operate with dates within the IDE and then use the data back on the worksheet.
Figure 91: Excel VBA is free from the well-known 29/02/1900 mistake.
In case you are going to work with files containing dates from antiquity and coming
from other platforms (database management systems, different spreadsheets), you had better
pay careful attention to this mistake. LibreOffice Calc, for instance, is free software and is
perhaps Excel's most celebrated competitor in this field. Well, there are several remarkable
differences in the treatment of dates between the two office suites and, I am sorry to say,
contrary to what is normally the case in the rest of their comparative capacities, Excel
comes out as the loser in this respect. I recommend that you install LibreOffice on your
computer and do the following exercise[34]:
Finally, let us see what day 1 of the Excel VBA calendar is, and whether it agrees with
the Excel worksheet calendar, for which day 1 is 01/01/1900, or with the LibreOffice Calc
calendar.
Figure 93: And to conclude with dates, it also has to be mentioned that day 1 of the calendar in VBA
(31/12/1899) doesn't agree with day 1 on the worksheet calendar (01/01/1900).
I saw it coming. VBA agrees with LibreOffice Calc with respect to day 1. Nothing to
make a fuss about, but something to bear in mind just in case.
4.6 VBA Interaction with Excel's Object Model
Writing VBA code doesn't usually mean working in isolation within the IDE,
typing only pristine Basic and avoiding interaction with Excel objects and structures
at all costs. On the contrary, the most common macros and subroutines, and also the more
functional and efficient ones, involve interaction between the code and Excel's objects: cells,
ranges, sheets, charts, shapes, comments. Even though this interaction is not something
mysterious as far as writing the code is concerned, and in fact it can be quickly learned with
the proper attention, there are many subtleties regarding how the programming action is
actually carried out by Excel, and some important restrictions must be taken into account in
order to avoid errors. The result is that some things that might look evident or simple at first
sight may sometimes not be possible at all, or may at other times bring about undesirable
consequences.
A floating object cannot usually be selected unless the worksheet it belongs to is
the ActiveSheet. Otherwise, an "Object not found" error will take place. In the following
image, there is a certain Oval 1 object floating on Sheet1, but the macro has been called
while Sheet2 was the ActiveSheet.
Figure 94: The object named "Oval 1" is floating over Sheet1, so it has not been found on Sheet2,
and therefore a Run-time error has been raised.
Figure 95: Sometimes objects are removed without properly cleaning any possible code associated with them.
There are many ways we can simulate this situation and see how Excel reacts. The
quickest one is to go to the Immediate Window and start invoking objects that simply don't
exist. Among these objects there are some that could exist in theory, or may have existed in
the past and have been deleted, like a certain Sheet8. If this Sheet8 has never been created,
then we get the following error message:
Figure 96: The same sort of thing happens if you try to invoke a nonexistent object from anywhere within the
code.
But the nonexistence of the object we intend to work with is not the only problem that
can arise when trying to manipulate objects in Excel. It could also be the case that an object
like the one called Sheet8 actually exists within the workbook and has been hidden
previously because, as programmers, we do not want this object to be accessible to the user, in
order to prevent improper manipulation of data, undue changes of properties or accidental
loss of information, for instance. In this case we would not get exactly a Run-time 424
Object required error, but a very similar one:
A way out of this situation is to unhide the object for a while, operate with it as
required, and hide it again, all programmatically.
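That unhide-operate-rehide sequence can be sketched as follows; this is an illustrative assumption on my part, taking the hidden object to be a worksheet named Sheet8:

```vba
Sub WorkOnHiddenSheet()
    With ThisWorkbook.Worksheets("Sheet8")
        .Visible = xlSheetVisible        ' unhide for a moment
        .Range("A1").Value = "updated"   ' any required operations
        .Visible = xlSheetVeryHidden     ' hide it again (or xlSheetHidden)
    End With
End Sub
```

Notice that, since many properties can be changed without selecting the object, in simple cases like this one the sheet does not even need to be activated.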
Figure 97: Selecting an object whose Visible property is set to hidden is not possible and will result in a
Run-time object-defined error.
So in order to work with objects, we usually need to select them first, and in
order to select them we have to make sure that their Visible property is not set
to False, that is, that they are not hidden. This requirement is not always so strict:
there are some object properties that can be changed without the object being selected. But if
we have to apply object methods, selection is almost always required.
There is a series of events associated with the Worksheet object, some of which can be
extremely useful for macros that have to watch over certain areas where, in case something
happens (a cell value is changed, or reaches a definite limit, or a new selection is made),
we want a particular action to take place or a certain macro to execute. Of all the potentially
dangerous and risky situations I have encountered in Excel, the correct management
of Worksheet events is the most delicate one and the most prone to result in collapse, infinite
loops and problems of that sort. In order to illustrate this problem we are going to examine a
simple case. Let us consider a list of values that we will call sourcelist. Every time one value
of this list is changed, we want Excel to automatically reflect, in some contiguous cells, both
the value that the cell used to hold and the value it holds after the change has taken place. The
following image illustrates the situation.
Figure 98: When data in the sourcelist range changes, we want the macro to write the previous value in cell
D3 and the new value in cell E3. Interaction between the worksheet change event and the updating of the sheet object
itself will present certain challenges, since the updating triggers a new change event.
For something of the sort to happen automatically, without us clicking any
buttons, the only way to proceed is by using one of the Worksheet events. Let us do it step by
step, and first let us try to record the new value, which will go in cell E3 under the title "It is
now". First of all, in order to capture a newly typed value within a cell, we are going to use the
Target object and its Value property. The following image reflects clearly what we have done.
The macro has been created within the Sheet1 object, and it has been attached to the
Change event of this object. Then, when the value of cell B4 is changed from Red to Yellow,
Target.Value is captured and sent to cell E3, that is, to the object Cells(3, 5). The
macro works perfectly fine. But it turns out that we only want this to happen if the change takes
place within the range of cells that we have named sourcelist. So a very few lines of code
should solve the problem, and we proceed like this:
Private Sub Worksheet_Change(ByVal Target As Range)
    On Error Resume Next
    If Intersect(ActiveCell, Range("sourcelist")) Is Nothing Then
        'Don't do anything
    Else
        Debug.Print Target.Value
    End If
End Sub
Figure 100: The code prints to the Immediate Window as expected.
Now all we have to do is to recover the previous line of code and instead of printing
to the Immediate Window , write again in cell E3 . The new code will be:
Private Sub Worksheet_Change(ByVal Target As Range)
    On Error Resume Next
    If Intersect(ActiveCell, Range("sourcelist")) Is Nothing Then
        'Don't do anything
    Else
        Cells(3, 5) = Target.Value
    End If
End Sub
So we do. And then we change the contents of cell B4 from Brown to Green .
Apparently everything seems to be working fine, but just one more little change in
cell B4, from Green to Orange, and Excel will start showing clear signs of
impending collapse before crashing altogether. In the following image, the messages in
Spanish translate as "Microsoft Excel stopped working" and "Would you like to send
information about the problem?". In case you choose Yes, a .cvr error file will go to
Microsoft to report this unanticipated crash. What has created the problem is this:
the Worksheet.Change event is triggered when we press Enter after having typed Orange in
cell B4, and then the macro executes. But instead of printing results to the Immediate Window,
we are now writing to cell E3, an action which is the exact equivalent of editing the cell and
pressing Enter, thereby triggering the Worksheet.Change event again and causing the
macro to execute. Since the ActiveCell is still within the sourcelist area, the macro will be
triggered again, and again, and that is how Excel VBA has entered an unsuspected and non-
mathematical infinite loop.
Figure 101: A loop trap has been created because the change event of the worksheet object is activated again
and again by the very macro that performs a new change in the contents of a cell. Excel is stuck and will simply stop
working.
When using Worksheet events, especially the Change event, one has to be very
careful not to send results back to the same worksheet by writing to it, thereby
triggering the Change event again and again and thus creating a kind of endless loop trap.
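One common safeguard, a standard pattern rather than the book's own listing, is to suspend event handling while the macro writes, so that the write cannot re-trigger Worksheet_Change:

```vba
Private Sub Worksheet_Change(ByVal Target As Range)
    If Intersect(Target, Range("sourcelist")) Is Nothing Then Exit Sub
    Application.EnableEvents = False   ' the write below fires no Change event
    Cells(3, 5).Value = Target.Value
    Application.EnableEvents = True    ' always restore, or events stay dead
End Sub
```

In production code the restore line is usually protected with an error handler, so that a Run-time error in between cannot leave EnableEvents switched off for the rest of the session.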
4.7 Unpredicted Results in Numeric Operations
Leaving aside the case of variable overflow, the most dangerous situation that a
numeric operation can present within the VBA environment is division by zero. Such a
situation has to be anticipated by the programmer and handled in an elegant way, for
otherwise the result will be a frustrating Run-time error for the end user.
Figure 102: Every division operation within the code should contain an exception to deal with the zero
denominator case.
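A minimal sketch of such an exception, written here as a UDF that returns the worksheet #DIV/0! error value instead of crashing (the function name is my own, illustrative choice):

```vba
Function SafeDivide(numerator As Double, denominator As Double) As Variant
    If denominator = 0 Then
        ' Return a graceful #DIV/0! instead of Run-time error 11
        SafeDivide = CVErr(xlErrDiv0)
    Else
        SafeDivide = numerator / denominator
    End If
End Function
```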
4.8 VBA Infinite Loops
4.8.1 Badly Written Iterative Structure
Iteration, or looping, is one of the most commonly used programming structures for
doing useful things with sets of numbers or collections of objects. A careless approach
to the design of a looping structure can cause VBA to enter an infinite loop from which the
only way out is to kill Excel completely via Ctrl+Alt+Del and the Windows Task
Manager. Certain varieties of looping structures are more prone to this mistake than others.
Typical examples are the Do While…Loop and Do Until…Loop structures, and the most
common error is forgetting to refresh the variable that controls the loop's
progression. Take the following example:
Sub fact2()
    'Calculate factorial of Number
    Dim n2 As Double, i2 As Double, result As Double
    n2 = InputBox("Number")
    i2 = 1
    result = 1
    Do While i2 <= n2
        result = result * i2
        i2 = i2 + 1 'Control statement
    Loop
    MsgBox result
End Sub
Now suppose that we forget the i2 = i2 + 1 'Control statement line. Once it enters the
Do While loop, the macro will never be able to leave it. Excel will stay there, giving a simple
"No response" message in the title bar of the program.
Figure 103: Incorrect management of iterative structures, especially of control assignments, can lead to
infinite loops.
There is no elegant way to get out of these infinite loops once we are experiencing
them at run time, other than going to the Windows Task Manager and terminating Excel
abruptly, and this is a really undesirable situation that leaves the inexperienced user in total
confusion. Needless to say, when we design looping or iterative structures, infinite loops have
to be carefully anticipated so that their severe consequences are avoided at all costs.
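One defensive pattern, my own suggestion rather than something the book prescribes, is to add an iteration cap to any Do While loop, so that a forgotten control statement aborts gracefully instead of hanging Excel:

```vba
Sub fact2Guarded()
    'Factorial with a safety ceiling on the number of iterations
    Dim n2 As Double, i2 As Double, result As Double
    Dim guard As Long
    n2 = CDbl(InputBox("Number"))
    i2 = 1
    result = 1
    Do While i2 <= n2
        result = result * i2
        i2 = i2 + 1                 'Control statement
        guard = guard + 1
        If guard > 1000000 Then     ' arbitrary cap: abort instead of hanging
            MsgBox "Loop aborted: iteration cap reached"
            Exit Do
        End If
    Loop
    MsgBox result
End Sub
```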
4.9 Computational Time
4.9.1 Algorithm Optimization
Solving complex computational problems usually requires the design of
iterative algorithms that demand a good use of resources by the programmer. Resorting to
so-called brute-force algorithms is always the first approach, but it should really be the
exception, not the norm. Even if we have been careful enough not to create an
infinite loop, the time required for a brute-force algorithm to complete the task can far
exceed reasonable values. Algorithm analysis and the estimation of the time consumption of
iterative structures is a whole subject of study by itself, the basics of which cannot be ignored
by any programmer. Paying no attention to algorithm design and always blindly trusting the
power of the machine by resorting to brute-force approaches can lead to very surprising
situations.
Let us illustrate this by a simple example. We need to sum the first n positive whole
numbers, usually called natural numbers. The immediate brute force approach will be:
Sub SumNatNum1()
    Dim limitNumber As Double, i As Double, result As Double
    limitNumber = CDbl(InputBox("Number to sum"))
    'Brute force algorithm
    For i = 1 To limitNumber
        result = result + i
    Next i
    MsgBox ("The sum of the first " & limitNumber & _
        " natural numbers is: " & vbCrLf & result)
End Sub
This code does not lead to an infinite loop or to any other special problems, but it is
utterly inefficient. In order to illustrate this point, we are going to gradually increase the
number to sum and measure the computational time. This is the slightly modified macro that
we will use:
Sub SumNatNum1()
    Dim limitNumber As Double, i As Double, result As Double
    Dim STimer As Single
    limitNumber = CDbl(InputBox("Number to sum"))
    STimer = Timer
    'Brute force algorithm: running time grows with N
    For i = 1 To limitNumber
        result = result + i
    Next i
    MsgBox ("The sum of the first " & limitNumber & _
        " natural numbers is: " & vbCrLf & result & vbCrLf & _
        "And it took " & Timer - STimer & " secs")
End Sub
Everything runs smoothly and takes no time for values less than 250,000[36].
Figure 104
But once we enter the million zone, time begins to count, and if we have to deal
with really big numbers, which is why we use computers in the first place, the inefficiency of
this elementary brute-force algorithm is revealed.
Figure 105
Thirty seconds is too much for doing simple sums, especially when we can use a much
simpler algorithm. It is said that when the German mathematician Carl Friedrich Gauss was a
boy, his maths teacher needed time to correct a pile of exams and posed a problem to the class
that could keep the children busy for a while: add up the first one hundred positive integers,
or natural numbers, he said, expecting this to take them a full hour at least. But little Gauss
completed it in a few seconds by calculating the average of the 100 numbers, which is the
average of the first and the last, (1 + 100)/2 = 50.5, times the number of elements:

50.5 × 100 = 5,050
The complexity of the brute-force algorithm we used first is of order N; that is, the
number of operations to be performed will be proportional to the quantity of numbers to
sum. But in the case of the Gauss algorithm, the complexity is constant, because regardless of
how big the number to sum is, the number of operations will always remain 2. So simple an
algorithm is not always possible, of course. The example has been carefully chosen for
illustration purposes, and I understand some readers could doubt that the method Gauss used
is an algorithm at all; it is merely a pair of concatenated calculations.
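Gauss's shortcut translates into a constant-time version of the earlier macro. A sketch, keeping the same variable names as the brute-force subroutine:

```vba
Sub SumNatNum2()
    Dim limitNumber As Double, result As Double
    limitNumber = CDbl(InputBox("Number to sum"))
    ' Average of first and last element, times the count: two operations,
    ' regardless of how large limitNumber is
    result = (1 + limitNumber) / 2 * limitNumber
    MsgBox "The sum of the first " & limitNumber & _
           " natural numbers is: " & vbCrLf & result
End Sub
```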
Let us then select a real-life computational problem and solve it using two
different real-life algorithms. We will estimate the complexity of both and check their
performance with a practical example in Excel VBA to see how they behave. The example I
have chosen is sorting a set of numerical values. This is a well-known and well-solved
computational problem, and the two algorithms that we are going to compare are
called Insertion-Sort and Bubble-Sort. We will generate a list of 5,000 random numbers
between 1 and 100,000,000 in the first column of a new Excel worksheet and then sort the list
of values and write it in the second column. We will monitor the time spent in the process,
taking into account that the reading and writing steps will be the same for both algorithms,
so the difference in duration will be due only to the sorting process itself.
Starting from the second element of the list, Insertion-Sort finds the place where each
element, one by one, has to go for the list to be sorted up to that precise element. In the worst
case, if the original list is in exactly reversed order, this algorithm will have to do a lot of
work.
Let us see a simple example with an original list of four numbers: 8, 5, 3, and 2.
Insertion-Sort would first look at the second element, number 5, compare it with the previous
one, which is 8, and figure out that it has to go first, in order to get: 5, 8, 3, 2. Then Insertion-
Sort would look at the next element, 3, compare it to the previous one, which is 8, figure out
that it goes before it, continue comparing it to the previous one, which is 5, and figure out
that it also goes before. So now we have: 3, 5, 8, 2. In the final step, number 2 would move all
the way from the last to the first position to get the list perfectly ordered: 2, 3, 5, 8. Let us see
what we have done:
Step    List          Operations
0       8, 5, 3, 2    1
1       5, 8, 3, 2    2
2       3, 5, 8, 2    3
3       2, 3, 5, 8    ---
Total                 6
As we can see, the total running time of this process will depend on the number of
elements, but it will depend more critically on the initial sorting quality of the list. If the list
were originally sorted, the number of operations at each step would be 1 and the total would
coincide with the number of elements. But the interesting analysis is the one we have done, the
one we call the worst-case scenario. And in this case, sorting a list of n elements takes a
running time proportional to:

T(n) ∝ 1 + 2 + 3 + … + n = n(n + 1)/2

This is the expression of the arithmetical series (1 + 2 + 3 + … + n), and it can be proven in
mathematics that n² is an asymptotic upper bound for this expression. The analysis of running
time usually done in the world of algorithms is called asymptotic analysis,
and it estimates what happens to T(n) when n grows to infinity. In this kind of analysis,
leading constants and terms of lesser order than the major one are usually not taken into
consideration. So if, for example, an algorithm has an asymptotic upper bound proportional
to n³, that algorithm will always (for large enough n) be slower than a different algorithm
with an upper bound of n² + n, or slower than the one we have seen, as shown in the next
table, which reflects the number of operations performed by three algorithms with different
upper bounds:
n (elements)    n(n+1)/2    n²    n³
1               1           1     1
2               3           4     8
3               6           9     27
4               10          16    64
5               15          25    125
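The Insertion-Sort listing itself is not reproduced in this excerpt, so here is a minimal VBA sketch of the algorithm over a Variant array; this is my own illustrative version, not the book's listing:

```vba
Sub InsertionSort(nums() As Variant)
    Dim i As Long, j As Long, key As Variant
    For i = LBound(nums) + 1 To UBound(nums)
        key = nums(i)
        j = i - 1
        ' Shift every larger element one slot to the right...
        Do While j >= LBound(nums)
            If nums(j) <= key Then Exit Do
            nums(j + 1) = nums(j)
            j = j - 1
        Loop
        nums(j + 1) = key   ' ...and drop the key into the gap
    Next i
End Sub
```

The Exit Do guard avoids evaluating nums(j) when j has fallen below the lower bound, since VBA's And operator does not short-circuit.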
When we apply this subroutine to a list of 5,000 numbers, we find out that Insertion-
Sort does the job in 1.535 seconds.
Figure 106: Insertion-Sort applied to a list of 5,000 numbers.
Bubble-Sort is perhaps the most widely known simple sorting algorithm, and popular
as it is within VBA programming circles, the truth is that it is perhaps the least efficient of all
sorting algorithms. Let us first see the code that we are going to use. This code is a modified
version of the one given in J. Walkenbach's book Excel 2003 VBA Programming.
Sub bubbleSort()
    'Sort a list of numbers
    Dim NewLista() As Variant, inn As Long, k As Long, sTime As Single
    Dim cell As Range
    sTime = Timer
    inn = Selection.Cells.Count
    ReDim NewLista(1 To inn)
    k = 0
    For Each cell In Selection
        k = k + 1
        NewLista(k) = cell.Value
    Next cell
    Call NumSort(NewLista, False)
    For k = 1 To inn
        Selection.Cells(k).Offset(0, 1) = NewLista(k)
    Next k
    MsgBox (Timer - sTime)
End Sub
Sub NumSort(nums() As Variant, Ascending As Boolean)
    'Pass the numeric array you want to sort by reference and read it back.
    'The array should be declared as an array of Variants.
    'Set Ascending to True to sort ascending, False to sort descending.
    Dim I As Long
    Dim J As Long
    Dim NumInArray As Long, LowerBound As Long
    NumInArray = UBound(nums)
    LowerBound = LBound(nums)
    For I = LowerBound To NumInArray
        For J = LowerBound To NumInArray
            If Ascending = True Then
                If nums(I) < nums(J) Then
                    NumSwap nums(I), nums(J)
                End If
            Else
                If nums(I) > nums(J) Then
                    NumSwap nums(I), nums(J)
                End If
            End If
        Next J
    Next I
End Sub
Private Sub NumSwap(var1 As Variant, var2 As Variant)
    Dim x As Variant
    x = var1
    var1 = var2
    var2 = x
End Sub
If we made an asymptotic analysis of this code as we did with the previous example,
we would find that this algorithm, like the Insertion-Sort algorithm we saw before, also has
an upper bound of order n². And yet at a practical level we can see that it is less efficient
because, as we already said, asymptotic analysis compares only the dominant
terms and disregards leading constants and terms of lower order. And it is evident
that the Bubble-Sort algorithm has to loop twice internally on every step in order to exchange
the position of values, and also check the If statements twice. As a whole, the larger the
initial list of values to be sorted, the greater the difference in duration between the two
subroutines will be. The following table compares the duration (in seconds)
of the two subroutines for different numbers of elements to sort.
Bubble-Sort consistently takes longer than Insertion-Sort, and the time difference
keeps growing with the number of elements to sort, even though it does so at a very slow
rate. For small lists, both subroutines perform decently; for larger lists, Insertion-Sort
performs slightly better.
Figure 107: Neither of them is a good sorting algorithm for long lists, but Bubble-Sort consistently takes
longer than Insertion-Sort.
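For comparison outside VBA, here is a Python sketch that mirrors NumSort's full double loop (the name exchange_sort is mine); note that it tests every (i, j) pair on every pass, which is exactly the extra work described above:

```python
def exchange_sort(a, ascending=True):
    """Mirror of the NumSort logic: loop over every (i, j) pair and swap
    on each comparison hit, doing strictly more work than insertion sort."""
    n = len(a)
    for i in range(n):
        for j in range(n):
            out_of_order = a[i] < a[j] if ascending else a[i] > a[j]
            if out_of_order:
                a[i], a[j] = a[j], a[i]
    return a

print(exchange_sort([4, 2, 5, 1, 3]))             # [1, 2, 3, 4, 5]
print(exchange_sort([1, 2, 3], ascending=False))  # [3, 2, 1]
```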
Sorting a list of 1,048,576 values using the ribbon utility DATA/Sort takes little
more than a second. But attempting the same task with the Insertion-Sort subroutine will take
practically forever and let you see, and suffer, the fuzzy screen of death again.
I strongly recommend that you don't try this, but since it is within the goals of this
book, I am going to try to sort an entire column of numerical values using my Insertion-Sort
algorithm. As soon as execution starts, the title bar of the VBA window says "No response".
Be prepared to wait for minutes and minutes on end, or press CTRL+ALT+DEL to interrupt
the process and use the ribbon DATA/Sort utility to get the problem solved in a jiffy.
Figure 109: The much feared "No response" message in the title bar while Insertion-Sort is trying to handle
more than a million data points. If sorting 5,000 numbers took 1.5 seconds, quadratic growth suggests on the order of
(1,048,576 / 5,000)² × 1.5 ≈ 66,000 seconds, around 18 hours, for this macro to perform the task. During all that
time, the user will sit confused before this semi-blank screen.
4.9.1.2 Algorithm Complexity and CPU Speed
Bear in mind that even if we have found an optimized algorithm for the problem at
hand, the CPU of the actual machine we are using will rarely dedicate as many resources to
the problem as we would like. The Windows OS is responsible for the final assignment
of resources, and this is a limit we cannot control, unless we are experienced OS
programmers, of course.
And sooner or later, occasions will arise when the only possible strategy to tackle a
computational problem successfully is a brute-force algorithm. The following example
performs the simple task of counting the cells in a range selection.
Sub countCells()
    Dim hmCells As Double, sTime As Single, tTime As Single
    Dim cell As Range
    sTime = Timer
    For Each cell In Selection
        hmCells = hmCells + 1
    Next cell
    tTime = Timer - sTime
    Debug.Print tTime & "___" & hmCells
End Sub
The question is: will we be able to count all the cells in a worksheet, which since
Excel 2007 we know to be 1,048,576 × 16,384 = 17,179,869,184, with such a macro?
Let us start by selecting 100 columns, that is, well more than 100 million cells. Since
the task of counting cells with this brute-force algorithm accumulates linearly, once we have
counted the cells in 100 columns we will be able to estimate, by simply multiplying, how long
it would take to count all the cells in the worksheet.
Figure 110: A simple macro to count the number of cells in 100 entire columns.
If counting all the cells in 100 columns took this macro 15 seconds, and since, as we
said, the task accumulates linearly, we can estimate that counting the total number of cells
in the worksheet, which has 16,384 columns, would take:
15 × (16,384 / 100) ≈ 2,458 seconds, that is, around 41 minutes.
Consider now the harmonic series, Hn = 1 + 1/2 + 1/3 + … + 1/n. It is a divergent
series, which means there is no limit for its summation when n tends to infinity. Another way
of looking at this is to say that Hn always grows as n grows, yet its growth is ever decreasing.
A standard problem related to the harmonic series is calculating the value of n, that is, how
many terms it is necessary to sum in order for Hn to reach a certain value, let us say 50. A
first and superficial look at the problem induces the user to think that it is easily solvable.
Consider the following code:
Sub harmonic()
    Dim i As Long, Hn As Double, n As Double
    n = InputBox("Harmonic series to number")
    For i = 1 To n
        Hn = Hn + (1 / i)
    Next i
    Debug.Print Hn
End Sub
In order to get H100, let us execute the macro and enter n = 100 in the InputBox form
which is displayed.
Figure 111
So one guesses that, since summing only the first one hundred terms already reaches
the value Hn = 5.18, getting to 50 should not be far off.
But the real situation is quite different. As we already stated, the growth rate of the
series decreases with n, and it does so so quickly that the problem turns out to be
unsolvable within the Excel VBA frame. In fact, this is a typical computational problem that
has to be tackled with approaches radically different from the usual brute-force algorithm,
and even then only approximate answers are obtained. Summing 1,000,000 terms only takes us
to Hn = 14.39. Going to 100,000,000 only adds up to Hn = 18.99 and consumes more time than
is reasonable. Very soon we realize that getting to Hn = 50 is far beyond the reach of our
computer. With the usual lines of code that give us information about the time consumed
by the algorithm, we get this:
Figure 113: Monitoring the time it takes for the macro to obtain the result of the harmonic series according to
the index number.
Any Excel VBA programmer should take very good care not to let his algorithms fall
into one of these unfathomable time traps. In fact, mathematicians have found, using algebraic
methods, that the value of n for which Hn = 100 is comparable to e^100. And this means that
the harmonic series poses problems not only to Excel or to VBA, but to any modern
programming language that attempts to find out something about its concrete asymptotic
extremes by using brute-force algorithms. Some complex mathematical problems can defy
not only Excel, but even the best programming languages and the strongest computers of the
age, and we should always be watchful in case we have to deal with one of them, and perhaps
be on the lookout for some other, subtler approaches.
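The algebraic shortcut mentioned above can be made concrete. Since Hn ≈ ln(n) + γ, with γ the Euler-Mascheroni constant, the number of terms needed to reach a given value can be estimated by inverting that formula instead of summing. A Python sketch (the function names are mine):

```python
import math

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def harmonic_brute(n):
    # Direct summation, the same brute-force approach as the VBA macro.
    return sum(1.0 / i for i in range(1, n + 1))

def terms_needed(target):
    # Invert H_n ~ ln(n) + gamma: an estimate with no summation at all.
    return math.exp(target - GAMMA)

print(round(harmonic_brute(100), 2))  # 5.19
print(terms_needed(50))               # about 2.9e21 terms just to reach H_n = 50
```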
As we have done in the rest of the examples in this book, we will use
the Timer VBA function to measure how long the subroutine takes to run. The result
of its execution can be seen in the following image.
This writeLoop macro took more than 16 seconds to write 300,000 values in the classical
looping, cell-by-cell fashion.
And as it turned out, this second subroutine is much faster than the first. Instead of 16
seconds, it takes only 0.2 seconds, which means that the second method is about 80 times
faster.
Figure 115: Writing to ranges in a matrix-like manner takes considerably less time than doing it
sequentially, cell by cell. It was to be expected.
In respect to this option, one thing should be mentioned. Excel VBA does not require
an exact correspondence in size between the set of values and the range those
values are going to be written to. If the destination range is bigger than the original
set of values in the VBA matrix, either in rows or in columns, the extra cells will be filled
with the #N/A formula error. If it is smaller, Excel will just fill whatever destination range is
provided and leave the rest of the cells blank. The difference in size will not generate errors, but
at the same time it will not give any clue about the exactitude of the outcome.
Figure 116: If the destination range happens to be larger than the VBA matrix, then the extra cells will be
filled with a #N/A formula error.
For illustration purposes, I have chosen an example that is hard in the details of the
algorithm that performs the action, but simple in its practical application regarding the task
that will provoke the error. Given a set of numbers or letters in the first row of a newly
opened workbook, we will calculate and write down, in consecutive rows, the different
permutations (without repetition) that the set can generate.
I will not enter deeply into the reasoning process that gave birth to the code I
present here, which is my own creation. I leave it to the reader to check that it performs the
task correctly. There are many other methods of getting the permutations of a given set of
numbers, but mine is specially oriented to illustrating the limits of an Excel sheet.
Option Explicit
Sub R_Permutacion()
    'Insertion-Push algorithm to generate permutations in Excel
    'By Eloy Caballero
    '***For more solutions visit***www.ideasexcel.com*********
    Dim DatString() As Variant 'It will hold the chain to permute
    Dim ene As Long, sTimer As Single
    ene = Worksheets("Sheet1").Range("a1").CurrentRegion.Count 'Number of elements: minimum 2
    If ene <= 1 Then
        MsgBox "Minimum 2 elements"
        Exit Sub
    End If
    ReDim DatString(1 To ene)
    Dim i As Long, j As Long, k As Long, m As Long, n As Long
    'Load elements to permute
    For i = 1 To ene
        DatString(i) = Worksheets("Sheet1").Cells(1, i).Value
    Next i
    Dim ciclos As Long
    ciclos = Fkii(ene)
    sTimer = Timer
    'Start Push-Insertion algorithm****************************
    Dim FilCiclo As Long, ColCiclo As Long 'Rows and columns on each loop
    Dim NewElment As Variant 'New element to be pushed
    Dim MPer() As Variant, MPas() As Variant 'Arrays for results and passage
    ReDim MPer(1 To ciclos, 1 To ene)
    ReDim MPas(1 To ciclos, 1 To ene)
    Dim SubNiv As Long 'Sublevel of passage matrix
    'Load first two levels of permutation***********************
    MPer(1, 1) = DatString(2)
    MPer(1, 2) = DatString(1)
    MPer(2, 1) = DatString(1)
    MPer(2, 2) = DatString(2)
    'Start Push-Insertion algorithm
    'As many loops as elements to permute
    For i = 3 To ene
        FilCiclo = Fkii(i)
        ColCiclo = i
        NewElment = DatString(i)
        'From the 2nd loop on, create the
        'passage matrix for each row in the cycle
        For j = 1 To FilCiclo
            SubNiv = 1 + Application.WorksheetFunction.Quotient((j - 1), i)
            For k = 1 To ColCiclo
                MPas(j, k) = MPer(SubNiv, k)
            Next k
        Next j
        'Populate permutation matrix with passage matrix***********
        For m = 1 To ciclos
            For n = 1 To ene
                MPer(m, n) = MPas(m, n)
            Next n
        Next m
        'For each row in the loop, push and insertion****************
        Dim jmod As Long
        For j = 1 To FilCiclo
            'Modulate index j for the correct column cycle
            jmod = j Mod i
            If jmod = 0 Then
                jmod = i
            End If
            'From the last column backward, push down to jmod
            For k = ColCiclo To jmod + 1 Step -1
                MPer(j, k) = MPer(j, k - 1)
            Next k
            'In jmod, insert the new element in the blank generated when pushing
            MPer(j, jmod) = NewElment
        Next j
    Next i
    'Print results in a down-row fashion**********************
    For m = 1 To ciclos
        For n = 1 To ene
            Worksheets("Sheet1").Cells(m + 1, n) = MPer(m, n)
        Next n
    Next m
    MsgBox ciclos & " generated permutations" & "..." & Timer - sTimer
End Sub
'*********************************************************
Function Fkii(arg As Long) As Long
    'Calculate the factorial of a number
    Fkii = Application.WorksheetFunction.Fact(arg)
End Function
Now, as the reader probably already knows, the number of permutations that can be
generated from a set of n elements is given by the expression n!, the factorial of n, that is,
1 × 2 × … × (n − 1) × n. This is a number that grows very rapidly and, in fact, it dramatically
beats the geometric growth of n^m, for any fixed m, once n is sufficiently large.
Figure 117: See how geometric growth of different orders compares with the factorial. It turns
out that n! is greater than any n^m for a sufficiently large n.
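The crossover in Figure 117 is easy to check numerically. This Python sketch (the helper name is mine) finds the first n at which n! overtakes n^m, and confirms that 10! already exceeds the worksheet's 1,048,576 rows:

```python
import math

def crossover(m, limit=1000):
    """Smallest n with n! > n**m, i.e. where the factorial overtakes n^m."""
    for n in range(1, limit):
        if math.factorial(n) > n ** m:
            return n
    return None

print(crossover(2), crossover(3))  # 4 6
print(math.factorial(10))          # 3628800, more than 1,048,576 rows
```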
If worksheet limitations are not taken into account, any attempt to get the permutations
of an original set of 10 or more elements will result in a generated set of permutations bigger
than the number of rows a worksheet can hold (10! = 3,628,800 > 1,048,576), and since we are
printing results in a down-row fashion, Excel will not be able to print beyond the
1,048,576th row and the subroutine will give a Run-time error. Let us see what it looks like.
First, let us permute the list of numbers 1 to 5.
Figure 118: Permuting numbers 1,2,3,4,5 means 120 permutations.
The macro works perfectly well and has generated the 120 permutations in little more
than a hundredth of a second. As we increase the number of elements to permute, performance
develops as shown in the next table:
Elements    Permutations    Seconds
5           120             0.01
6           720             0.08
7           5,040           0.70
8           40,320          6.35
9           362,880         64.88
10          3,628,800       Run-time error 7: Out of memory
This Run-time error 7, which we obtained when the number of elements to
permute is 10, is not always guaranteed to happen at exactly that point. In this case, memory
allocation has failed to assign enough space for the arrays that are supposed to contain the
values of the permutations before the results are written to the worksheet.
All Run-time errors are dangerous, but this one is particularly nasty, and it
requires the developer to keep control of the size of the arrays that can be generated
inside the code so as not to let them grow in an uncontrolled way. Usually this means
setting limits on the allowed sizes and preventing the user from engaging in excessively
large initial situations. In the present case, since our clear intention was that the results
should appear written in a down-row manner within the ActiveSheet, we should not have
allowed initial selections of 10 or more elements.
The following piece of code at the beginning would have sufficed.
'Allow only 9 items max
If ene > 9 Then
    MsgBox "9 elements maximum, please"
    Exit Sub
End If
Otherwise, the inexpert user for whom the utility was intended will suffer the utterly
bad experience of having to deal with the Run-time error and the VBA debugger.
Figure 119: Permuting 10 elements is more than this algorithm can do without demanding too much
memory from Excel VBA.
If this error had not arisen, the macro would have kept writing until it reached
row 1,048,576, after which there are no more rows in the worksheet for the macro to
write on. We can simulate the error that would have arisen in that case with the following
very simple macro, which is to be executed in a newly created workbook,
with Cells(1, 1) as the ActiveCell in Sheet1.
Sub printMoreThanAllowed()
    Dim i As Long
    For i = 1 To 2000000
        ActiveCell.Value = 7
        ActiveCell.Offset(1, 0).Select
    Next i
End Sub
This macro will attempt to fill each cell in the first column of the active worksheet
with the value 7, in a consecutive, orderly, down-row fashion. But when the loop counter
i reaches the value 1,048,577 there will be no more rows to write on, and then a Run-time
error will originate. Before we force this to happen so that we can study the details, let us
recall the kind of error that Excel initiates when we try to do something in a row that does
not exist. To do this, we will go to the Immediate Window and type the following
statement:
Rows(2000000).Select
Evidently, there is no row with index 2,000,000 in any Excel worksheet, so let us
press Enter in the Immediate Window and obtain this:
Will the same situation be reproduced when we try to do an equivalent action from a
VBA subroutine?
Figure 121: A 1004 Object defined Run-time error happens when VBA tries to write in a cell that does not
exist.
Yes. The same error has been generated.
All this explanation was only aimed at illustrating the need for the VBA developer to
keep complete control of the worksheet limits when this object is used as a destination for the
output data of VBA macros.
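One defensive pattern that follows from this is to size-check or split the output before writing anything at all. A generic Python sketch of the idea (the names and the splitting policy are mine, not from the book):

```python
MAX_ROWS = 1_048_576  # worksheet row limit since Excel 2007

def split_for_worksheets(rows, max_rows=MAX_ROWS):
    """Split output rows into worksheet-sized chunks so that no single
    write can ever run past the last worksheet row."""
    return [rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]

chunks = split_for_worksheets(list(range(2_000_000)))
print(len(chunks), len(chunks[0]), len(chunks[1]))  # 2 1048576 951424
```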
4.10 ActiveX Controls and Compatibility Issues
In case you reuse macros dating from the days before Excel 2007, there is
surely something that needs updating. Very typically, the old limits for the number of rows
and columns (65,536 and 256) might have been explicitly hard-coded by a former programmer,
and those limits might cause malfunctioning now. A version-independent reference, for
instance to the whole first column, is instead:
Columns(1).Select
Despite the fact that the list of additional controls for VBA UserForms displayed by
the IDE is very long, the reality is that the majority of ActiveX controls installed on your
Windows system cannot be used in Excel. Some of them even require a special license. Also,
there may be some ActiveX controls that worked in the past but are no longer supported
by the IDE environment.
Figure 122: Apparently, there are lots and lots of additional controls that can be added to the project. Only a
few of them are really at the user's disposal.
There used to be, for instance, a calendar form control called DTPicker, which was
taken out of circulation in Excel 2010 and is no longer available. So in case you have to deal
with an old workbook designed in an Excel 2007 environment and containing that control, you
will have to examine possible alternatives. Maybe there is a new version that can be
downloaded as an add-in component, like MSCOMCT2.OCX for the mentioned DTPicker;
maybe there is no way to obtain that control anymore and you have no other choice but to
perform serious removal surgery on the project.
Figure 123: This macro used to work with Excel 2003, but it includes references to components or modules
that are no longer available. In this case, the Microsoft Calendar 10.0 Control.
This is a real case that happened to me as a consultant. I was unable to get out of the
message loop generated immediately after opening the file, and was therefore unable to gain
access to any worksheet or to do anything with the workbook at all. I was forced to erase all
the code in the module that the IDE was showing as problematic, and only then did the next
error message come about:
Despite the fact that, in order to gain access to the worksheets, I was forced to
completely erase the code in the highlighted VBA module, I only realized afterwards that the
missing calendar control causing the problem did not even make any reference to this module.
It had been inserted directly on a worksheet, just to let the user pick a date more
conveniently, and it was still there, floating as an image but unusable in any other
respect, behaving very much like an unrecognized object. It allowed resizing, but no
properties were identified, so nothing more could be done with it.
Figure 125: Something like a digital carbon copy of the old calendar control was still visible, though only as a
passive image.
This is basically an incompatibility error, and it can also arise when we try to execute
a method or modify a property that an object does not support. For instance, the method
Move has meaning for a worksheet, but it means nothing for a cell or a range.
Figure 126: Do not try to apply methods that are not supported by the object in question.
The opposite situation is also possible, though less likely every day. It is amazing to
see, and it only speaks favorably of Excel's reliability, that a lot of people continue to work
with versions of the spreadsheet software dating from years ago, particularly 2007 and 2010.
But the possibility that you come across a client who still needs a workbook in the old
Excel 2003 format cannot be entirely disregarded.
Probably the most remarkable innovation that Excel 2007 brought was the new four-
letter extensions for the files. Any workbook created with Excel 2007 or later will give a
compatibility warning if we try to save it in the old format. And this is because some
features, like formulas or controls or object properties or methods[37] that are no longer
supported, will not be saved, and that could cause a minor or significant loss of functionality.
Figure 127: Saving files to a lower version of Excel will trigger this message box with a list of warnings
concerning loss of functionality
Figure 128: ActiveX controls can be inserted as floating objects over the worksheet, but the same limitations
apply as those already mentioned for IDE UserForm controls.
For example, the WizCombo control appears to be ready for use, but if we try to place
it on a worksheet we get the warning message:
Figure 129: This is what happens when you try to insert or embed most of the ActiveX controls shown in the
list.
However, this particular control can at least be inserted on a Userform within the
IDE:
Figure 130: The WizCombo Class control cannot be inserted as a floating object over a worksheet, but it can
be embedded in a user form.
Figure 131: Excel cannot explain exactly what or why, but the unspecified error message tells us that
something went wrong when we tried to embed a particular control in the user form.
Old Excel projects containing macros can also have dependencies on external .bas
modules that were meant to be imported from an explicit location that doesn't exist on the
new computer. If that is the case, a "Path not found" error 76 could arise.
Figure 132: The macro intended to save the file to a location that no longer exists in the computer. The result
is a "Path not found" error.
"Hey, in case something goes wrong from now on, just ignore it and go ahead."
Fair enough. This doesn't look professional to Excel VBA outsiders. But the
interaction between the VBA code and the Object Model in Excel makes this statement very
useful at times, when we know that a certain error will arise naturally and, instead of
trapping the error at run time and acting accordingly, there is no danger at all in
ignoring it for what remains of the execution sequence. For instance, performing
actions on floating objects requires selection, and a macro might be meant to do something
to an object only if such an object exists, and simply do nothing otherwise.
Is it better to capture the error and act accordingly instead of just ignoring it? As a
general rule, the answer is yes: it is always better, less risky, and looks more
professional. If I create a macro that is supposed to do something to an object and the object
is not found, it is perhaps better for the user to receive the message "No object on which to
perform the action has been found" than a situation where nothing happens, which can be
confusing. But having stated that, it is equally true that On Error Resume Next comes in very
handy many times and, the essential precautions having been taken, the experienced
programmer will take very good advantage of it. I will just cite one of the examples already
used in this book:
Private Sub Worksheet_Change(ByVal Target As Range)
    On Error Resume Next
    'The comparison below raises an error when the intersection is Nothing;
    'On Error Resume Next silently skips it.
    If (Intersect(ActiveCell, Range("sourcelist"))) = _
            "Nothing" Then
        'Don't do anything
    Else
        Debug.Print Target.Value
    End If
End Sub
This macro performs the action Debug.Print Target.Value only if the intersection
between the ActiveCell and the range sourcelist is not empty; otherwise the comparison
raises an error. In case that intersection were empty, instead of using the statement On Error
Resume Next we could have captured the error and shown a message to the user. But since the
macro is triggered by the Worksheet_Change event, the user would be constantly receiving
error messages every time the ActiveCell was outside the intersection area. This would be
utterly unnecessary and redundant, so in this case the use of On Error Resume Next is plainly
justified. This justification can be defended strongly or weakly, and all those who refer to
themselves as serious programmers will perhaps never accept it. But my opinion is that, when
used wisely, On Error Resume Next can come in very handy for Excel VBA developers without
adding significant risk, and no Excel programmer should be ashamed of resorting to its
use as long as it is done in a sensible way.
Take the example we used in the previous chapter: the macro that attempted to fill the
first column with the value 7, thus overflowing the number of rows that a worksheet
contains. Remember that we got a Run-time error that we could easily have avoided by
using On Error Resume Next:
Sub printMoreThanAllowed()
    On Error Resume Next
    Dim i As Long
    For i = 1 To 2000000
        ActiveCell.Value = 7
        ActiveCell.Offset(1, 0).Select
    Next i
End Sub
The macro would have kept looping until it reached 2,000,000. Errors would have
been generated from i = 1,048,576 on, but they would have been ignored, and only a little
time would have been lost while the macro kept on looping. I agree with all those who protest
that it would have been better to keep track of the i index and never allow it to grow beyond
what is permitted. But in case this is not possible, I think the controlled use of the statement
On Error Resume Next is perfectly legitimate for VBA users and developers.
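For readers who move between languages, the idiom has close relatives elsewhere. Python's contextlib.suppress, for instance, expresses the same "this error is anticipated, skip it" intent, though scoped to a block rather than to the rest of the procedure. A sketch, where the short list stands in for the worksheet column:

```python
from contextlib import suppress

rows = [10, 20, 30]   # stand-in for a short worksheet column
written = []
for i in range(10):   # deliberately loop past the end, as the macro does
    # Anticipated IndexError past the last row: ignore it and keep looping.
    with suppress(IndexError):
        written.append(rows[i])

print(written)  # [10, 20, 30]; the out-of-range iterations were silently skipped
```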
4.12 Excel VBA Oddities
4.12.1 Some Disappointing Features about Excel VBA
There are some very demanding tasks in Excel VBA, which are not infinite loops,
where the program shows very poor performance, with symptoms such as fuzziness, screen
freezing, "No response" messages in the title bar and the like. And even if we have
anticipated this, because we were aware of the complexity of the algorithm, and have
designed some means of information, like showing progress in the Status Bar, in order to let
the user know that Excel is not in a "No response" state but simply working on it, we may
find after all that Excel is incapable of showing this information while it is doing the main
task. The Status Bar is not updated; form controls are not updated. Nothing happens but the
message in the program's upper bar that shows the name of the file next to a "No response"
message. This is very disappointing. The following image contains a "No response" message
in the title bar (Spanish: "No responde").
Figure 133: "No response" message in the title bar. The situation is confusing, but Excel is working on the
task and it may take some time to recover its normal functionality.
4.12.2 DoEvents
Many of these screen-freezing problems can be solved by using the VBA statement
DoEvents, which makes Excel pause the running macro to process pending events, screen
repainting among them. But the apparent advantages of DoEvents come at a very high price
in terms of computational resources and time consumed. Consider the following VBA form:
Figure 134: This is the VBA UserForm we are going to work with.
This is the very same solution we used earlier to solve the problem of summing the
first n natural numbers. But this time, UserForm1, with its input and output controls, will
allow us to have more detailed information about the process. That is the intention of the two
statements:
Label1.Caption = Format(mySum, "##,##")
Label6.Caption = Timer - Stime & " secs"
The code can be triggered by typing the following statement in the Immediate
Window:
UserForm1.Show
In the TextBox we will type 1000000 and then click the button Sum.
Figure 135: Fuzziness and "No response". Excel VBA is at work, but we have no clue about how long it might
take.
And as we can see, the labels we included in the form are not updating and showing
information, yet they should. The UserForm is in a visually fuzzy, frozen state. It is not yet
a state of collapse, but it doesn't allow dragging and warns us with a "No response"
message in the title bar. These controls are not behaving properly, and this is one of the few
things that can legitimately be blamed on Excel as a really unsatisfactory and depressing
feature. This should not happen.
We can force the control labels to behave by including the DoEvents statement, for
which we only have to uncomment the appropriate line of code in the previous example. It
took Excel around 12 seconds to perform the previous macro. Let us see what happens when
the DoEvents statement is activated.
    Label1.Caption = Format(mySum, "##,##")
    Label6.Caption = Timer - Stime & " secs"
    DoEvents 'Now it is uncommented to let it work
Next i
Figure 136
Now the labels are showing progress information, and the user form not only lacks
the dismal "No response" message, it even allows dragging across the screen. There is
no screen freezing, no fuzziness, and the VBA instructions are being followed to the letter. But
instead of 12 seconds, the process now took 185 seconds, more than 15 times what it
took without the DoEvents line. We can guess that Excel was probably optimizing
resources and putting them to work on the important aspects of the macro, disregarding the
display of progress information, and even looking inelegant and fuzzy, in favor of the core
task being performed with priority. DoEvents compels Excel to minutely do what it has been
told but, at the same time, prevents it from taking certain shortcuts and neglecting secondary
tasks that, in spite of making it look awkward, could save a lot of time.
It should be said that the Status Bar, in spite of being a native Excel component
especially designed to show information, is a great consumer of resources too. Let us slightly
modify the previous macro to show the same progress information on the Status Bar:
Private Sub CommandButton1_Click()
    Dim Stime As Single, lastN As Long, i As Long
    Dim mySum As Double
    Stime = Timer
    lastN = CLng(TextBox1.Text)
    For i = 1 To lastN
        mySum = mySum + i
        Application.StatusBar = "Progress is... " & Format(i / _
            lastN, "0.00%") & " Time is..." & Timer - Stime
    Next i
    Application.StatusBar = False 'Give the Status Bar back to Excel
End Sub
The Status Bar responds well, but the process takes 111 seconds. It is quite sad that
Excel cannot manage graphical resources better, and that when we are forced to show
progress information, so as not to leave the user in the dark about what our utility is doing
while it appears fuzzy and shows "No response", we have little choice between the Status
Bar and the DoEvents statement, both excessively expensive in terms of computational time.
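A common mitigation, not discussed in the text but worth noting, is to refresh the progress display only once every k iterations, so that the expensive update is paid rarely. Sketched generically in Python (the names are mine; the reporting callback stands in for Application.StatusBar):

```python
def run_with_progress(total, every=100_000, report=print):
    """Sum 1..total, reporting progress only once every `every` iterations."""
    my_sum = 0
    updates = 0
    for i in range(1, total + 1):
        my_sum += i
        if i % every == 0:  # the costly display update is throttled
            report(f"Progress is... {i / total:.0%}")
            updates += 1
    return my_sum, updates

total_sum, n_updates = run_with_progress(1_000_000, report=lambda s: None)
print(total_sum, n_updates)  # 500000500000 10
```

The same throttling works in the VBA macro: wrap the Application.StatusBar assignment in an `If i Mod 100000 = 0 Then` test, and the display cost is incurred ten times instead of a million.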
By file corruption we understand the state a computer file of any type reaches when,
after having been subjected to manipulations of different kinds and saved to different physical
disks or drives over the years, the application naturally associated with its file extension is
no longer capable of handling its contents correctly, or of showing or even recovering part,
or maybe all, of the information the file originally contained. In Excel this can vary from
losing some data in the cells, losing ActiveX controls, form controls, code or other VBA
components, maybe losing all the VBA code and macros, to perhaps the impossibility of even
opening the file or recovering any of the original information or data at all. File corruption
is a serious matter whose original causes can seldom be traced with certainty, and this
prevents us from making clear diagnoses on how to avoid it in the future. Once a file has
been corrupted, the state is rather persistent, and the only sensible option is usually to extract
as much data from the file as possible, using whatever method we can resort to, and then copy
it to a new file. In Excel, provided the corrupt file can still be opened, we could probably
copy and paste, worksheet by worksheet, to a new file.
Figure 138: All the file extensions that Excel can handle.
Changing a file extension is easy to do, but it is also a delicate matter that can lead to
serious information loss, which is why Windows takes precautions against it. Any attempt
at changing the extension of a file in Windows triggers a system message box warning
that the change might render the file unusable, and asking for confirmation.
Since 2007, VBA functionality depends critically on the file extension. The extensions
capable of storing macros are .xlsm, .xltm for templates, and .xlam for add-ins.
The binary extension .xlsb also preserves macros. If a workbook containing macros
is saved with the Excel default extension, which astonishingly has been .xlsx since
2007[40], Excel at least shows a somewhat confusing warning message that offers the macro-
free option as the default:
Figure 139: A rather confusing message when trying to save a macro workbook.
Excel assumes by default that the user does not want to save the macros, which is
really inappropriate. But that's the way it has been since 2007. If the Yes button is clicked and
the file is misguidedly saved as a macro-free .xlsx file, all VBA-related functionality
will be lost. In the next graphical example, a macro was originally written and assigned to
the button Greetings but, in an unguarded moment, the file was saved by the user with the
proposed default option, .xlsx. When reopened, evidently, no macros remained.
Figure 140: Not paying careful attention at the time of saving can lead to serious loss of typed code.
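In fact, whether a given 2007+ workbook file still carries macros can be checked from outside Excel: in these formats the VBA project is stored in an internal part named xl/vbaProject.bin, so its presence inside the archive is a quick, if unofficial, indicator. A minimal sketch in Python, treating the workbook as the zip archive it really is:

```python
import zipfile

def has_macros(path):
    """Heuristic check: a 2007+ Excel workbook carries macros if the
    archive contains the internal part xl/vbaProject.bin."""
    with zipfile.ZipFile(path) as zf:
        return "xl/vbaProject.bin" in zf.namelist()
```

Such a check is handy for auditing a folder of workbooks before a bulk conversion, when a careless save to .xlsx would silently destroy the code.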
From Excel 2007 onwards, the real format behind any Excel file is the XML
format, which comprises a structure of files and directories internally related by means of
XML code. As John Walkenbach specifies in his 2013 book[41], Excel files are actually
compressed files, readable by the spreadsheet software, and as such they can be
uncompressed and their individual components examined. The easiest approach is simply to
rename the Excel file in the Windows File Explorer, adding .zip to the
whateverfilename.xlsx original name of the file, and then to unzip the resulting
compressed file with any program capable of doing so, like WinZip or 7zip or the
Windows File Explorer itself. Let us prepare a very simple Excel file whose contents can be
seen in the following image: one single worksheet, six cells containing simple data.
Figure 141
Close the file, go to Windows File Explorer and add .zip to its name. Windows
will show a message warning that the file may become unusable because of this change in
the extension, but we will proceed, because we know what we are doing.
Double-click, or use an unzip program, and you will see this:
Figure 143: The inner structure of an Excel file from the XML point of view.
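This internal structure can also be listed programmatically, without renaming and unzipping by hand; a minimal sketch using Python's standard zipfile module, where the file name is hypothetical:

```python
import zipfile

def list_parts(path):
    """Return the names of the internal parts of a 2007+ Excel file,
    which is simply a zip archive of XML (and some binary) parts."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()

# list_parts("whateverfilename.xlsx") typically includes entries such as
# "[Content_Types].xml", "xl/workbook.xml", "xl/worksheets/sheet1.xml", ...
```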
But since the goal of this example is to provoke file corruption, let us actually unzip
the created zip so that we can access and edit the files inside the structure.
These are the contents of the xl directory:
Figure 144: More files and new folders inside the XL folder of the XML structure.
Figure 145
And so we do: erasing one or two characters at random, saving the file afterwards,
compressing the whole XML compound of files and folders back into zip format, and
renaming it again by removing the .zip part, so that the file is left with its original .xlsx
extension. Will Excel be able to open this file, or will it have been corrupted beyond
recovery? This is the message we get when we try to open the workbook:
Figure 147: The message says: "We found a problem with some content in 'File name.zip.xlsx'. Do you want
us to try to recover as much as we can? If you trust the source of this workbook, click Yes."
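Incidentally, the whole unzip-edit-rezip experiment can be scripted in one step; a sketch in Python, where the file and part names are hypothetical and the "damage" is simply truncating a couple of characters from one internal part:

```python
import zipfile

def corrupt_part(src, dst, part, n_chars=2):
    """Copy a 2007+ Excel file (a zip archive) to dst, deliberately dropping
    the last n_chars characters of one internal part to simulate corruption."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == part:
                data = data[:-n_chars]  # the deliberate damage
            zout.writestr(item.filename, data)
```

For example, corrupt_part("Simple.xlsx", "Corrupt.xlsx", "xl/worksheets/sheet1.xml") produces a workbook that Excel will flag exactly as in the figure above.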
And in fact, clicking Yes and trying to recover as much as we can results in nothing,
because with our internal manipulation, however insignificant, we spoiled the only worksheet
the workbook had, and nothing can be done now. The file is corrupt beyond help, and so Excel
confirms:
This corruption process, which we have artificially forced here, can be the equivalent of
many real-life situations that lead to the same undesirable result. And by the way, do not forget
that this zip-and-unzip trick only works for Excel files from 2007 on, never for 2003 and
earlier files with the old .xls and similar three-letter extensions. A similar but more
radical approach to artificially corrupting Excel files is to edit the file with a program
like Notepad, modify one or two characters within the gibberish of code and save the
changes without touching the original Excel extension. The result is the same: the file is
corrupt. I created a simple Excel file for this purpose, containing exactly the same kind of
simple information that I used before: the numbers 1, 2 and 3 in cells A1, A2 and A3, and, in
the same manner, the letters A, B and C in cells B1 to B3. Then I edited the file
with Notepad++.
Figure 149: Excel files and any file can be edited with Notepad++.
The sight of this file in Notepad++ is disquieting enough, because of the non-native code
obfuscation, but we can easily erase just a few characters and save the file without changing
the original .xlsx extension.
Figure 150: But editing with Notepad++ means facing lots of obfuscated code that is, in fact,
incomprehensible gibberish. And yet, deleting or changing any character within these lines will have serious
consequences for the integrity of the file.
The attempt to reopen the file with Excel now warns us of problems found with the
content, but it is still possible to try to recover as much as we can.
Figure 151: We are already familiar with this message. If we click Yes, Excel will try to recover as much as it
can.
If we click Yes, Excel tries to recover as much as it can. When the process is
completed, a new warning message arises.
Figure 152: But there is no guarantee that the recovery process is going to be successful. In this case, the file
was open, but the data in the worksheet cells was lost.
According to this report, only some worksheet properties were removed from the
file, but the truth is that even the simple data that I had typed into range A1:B3 has been
lost beyond help. This is typical of corrupt files.
Accessing the internals of an Excel file, either by taking advantage of its XML structure
or by using a text editor such as Notepad++, makes it possible to find and manipulate
information regarding, among other things, password protection. It is no secret that password
protection of Excel files is not especially strong, not even against the most common and low
level attacks. But breaking password protection does not always have to be an essentially evil
thing. The case might be that a client has an old password protected Excel file that someone
created long ago. The password may have been forgotten or lost and the client desperately
needs to open the file. Let it suffice for the reader to know that it is possible to access this data
quite easily by editing the files of the .zip arrangement in the way that we have described. I
will not give any more details here, but the internet is full of material on this topic, and
I insist that occasions may arise when the Excel user or programmer has to attempt to
break password protection without any malicious purpose behind it.
One final situation that cannot be forgotten when dealing with these general file-level
issues in Excel is the frightening Device I/O error. As far as I can tell, this error has become
less and less frequent since Excel 2007. It is typically related to VBA writing data to, or
reading data from, external devices and the impossibility of doing so: missing or corrupt parts
of code, nonexistent paths to directories and files that are no longer in the expected
location, or perhaps a printing statement in a macro executed on a computer that has no
printer installed. The error code for the Device Input/Output (I/O) error is 57, and the
chances of recovering information from a workbook that suffers this error are virtually null.
Device Input/Output (I/O) errors are not to be taken lightly. Experts say they can also be
caused by power interruptions, hard drive crashes, incorrect or incomplete installation of Excel,
the influence of unverified add-ins, or uncontrolled changes in the Windows Registry. We may
be using, perhaps, a not entirely legal copy of Windows or Excel. Virus infection cannot
be disregarded as a possibility either.
In the old days it was also common to save files on devices such as floppy disks with
only 1.2 MB of capacity. If we opened an Excel file from one of these devices, did
some work on it and increased its size beyond the capacity of the device, then, when we
tried to save the file to the device again, Excel could not do it, and
chances were that the Device Input/Output (I/O) error took place, or perhaps that Excel
presented a warning: The disk is full.
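The disk-full scenario, at least, is easy to guard against from code before attempting a save; a minimal sketch in Python (the function name is mine), using the standard shutil module:

```python
import shutil

def enough_space(target_dir, needed_bytes):
    """Return True if the drive holding target_dir has at least
    needed_bytes of free space for the file we are about to save."""
    return shutil.disk_usage(target_dir).free >= needed_bytes
```

A macro or script that checks this first can fall back to a different drive gracefully, instead of leaving the user to face a disk-full error mid-save.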
Figure 154
Excel would still give us the chance to save the file in a different location. If we
insisted on saving the file to an already full device, modern versions of Excel would interpret
that as a troublesome situation and would still try to save the workbook automatically, in a
location of its own choice within the computer, with the .xlsb extension.
Figure 155: The message says: "Excel encountered errors during save. However, Excel was able to minimally
save your file to C:\Dir....\FileName.xlsb".
Once the saving process is completed, an .xlsx version of the file can then be recovered,
with a final information message reporting the details of the process:
Figure 156
Whatever the cause of corruption has been, as I said, once it happens it is persistent,
and since the corrupt file is almost certainly beyond any hope of full recovery, the priority is
usually to recover as much data from the corrupt file as possible and put it into a new
workbook. We have already mentioned some data recovery approaches in the last paragraphs,
and indeed most of them are launched automatically by Excel itself when problems with the
content are found. Microsoft has support pages with specific information on how to proceed
in such cases[42], and some of the suggested approaches are really picturesque, like saving
the workbook in the SYLK format. It should also be mentioned that opening an existing
Excel workbook from the program's Open dialog box always gives us the chance to try to
repair corrupt files. I will try again with my simple artificially corrupted file, named Simple
to play with corruption.xlsm
Figure 157: Attempting to repair an Excel file: option "Open and repair" in the open dialog box.
There is no assurance that the recovery process will be successful at all, but in any
case, two options are then offered: Repair or Extract Data.
Figure 158: The message says that Excel can perform checks while opening the workbook and attempt to
repair corruption defects. Then it offers options for complete repair or simple data recovery.
The first option is repairing the file, which turns out to be impossible for Excel:
Figure 159: And these tiny characters read: "Excel cannot open the file because the file format or file
extension is not valid"
Figure 160: Excel warns us that formula references cannot be recovered and asks whether we want Excel to
convert the formulas into values. Either option can be chosen.
Since my workbook was so simple and didn't contain any formulas, I will
click Convert to Values and see what happens:
Figure 161: Lamentably, the recovery process, though followed diligently, has not been able to recover any
data at all.
No. Unfortunately, Excel was not able to recover any data at all, and it only
said: Unable to read the file.
Figure 164: The message reads: "Excel cannot update one or more links in this workbook..."
I tried a different trick: the Insert Hyperlink option from the ribbon, linking a cell to
cell A1 of Sheet1 in my corrupt workbook.
Figure 165: I tried inserting a hyperlink to a corrupt workbook, to see if I was able to read information or
data from that file.
When the hyperlink was inserted and I clicked on it, Excel gave me this message:
Potential security concern .
Figure 166: There is a Microsoft general warning message about hyperlinks and their potentially dangerous
content.
Of course I wanted to see where this way would take me, so I clicked Yes and I got a
somewhat familiar message:
Figure 167: Again, the message: "Do you want us to try to recover as much as we can?" It doesn't look
promising but I click Yes anyway.
Figure 168
But, no, no, no. The file is beyond any hope of recovery. And yet, when I clicked OK
in this message box, a surprise popped up: Excel had apparently not given up yet. The
message, in Spanish, said Opening the file, still pretending that Excel was struggling to
open it anew.
I clicked Accept again, but nothing at all happened. This was the end of my recovery
adventure using links.
There was still a more elementary question to be answered. What if I simply undid what I
had done to corrupt the file? Would Excel be capable of opening the workbook
again?
I went to the app.xml file, changed Propertes back to the original
Properties, saved the assembly again, changed the name back to its proper .xlsm
extension and opened it with Excel. Did the workbook open? No, it didn't. As I said before,
once a file is corrupt, the state is tenaciously persistent. And yet, given that the modification
that led to the corruption was so insignificant (the elimination of a single character), I was very
irritated at this failure, so I tackled the problem from another angle.
Look at this simple workbook. It has only one worksheet, called Sheet1, with the
following data:
Figure 170
The name of the file is indicative of the operation we are going to perform. We will
open the file with Notepad++ and erase the first character, which is P. This will result in Excel
file corruption, as it indeed does. Then we will open the corrupt file again with Notepad++,
write the exact character that we erased in the exact same position, save the file and see
whether the corruption has been reversed. It should be, shouldn't it? After all, what we did to
spoil the file we undid a few seconds later, leaving the file exactly as it was when it was an
uncorrupted Excel file. And in this case, I am glad to say, I was able to heal the file and
open it again with Excel, with its contents intact. So there can be sporadic exceptions to
that tenacious persistence of file corruption that I mentioned before.
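This byte-exact undo can be reproduced programmatically, and it also suggests (this is my conjecture, not something the experiment proves) why the earlier zip-level undo failed: re-zipping seldom recreates the archive byte for byte, while editing the raw file in place can be reversed exactly. A sketch in Python, where the offset and file name are arbitrary choices of mine:

```python
def damage_and_heal(path, offset=0):
    """Remove one byte at `offset` from the raw file (corrupting it),
    then reinsert the exact same byte, and report whether the file
    is byte-for-byte identical to the original again."""
    with open(path, "rb") as f:
        original = f.read()
    removed = original[offset:offset + 1]
    with open(path, "wb") as f:
        f.write(original[:offset] + original[offset + 1:])  # corrupt
    with open(path, "rb") as f:
        damaged = f.read()
    with open(path, "wb") as f:
        f.write(damaged[:offset] + removed + damaged[offset:])  # heal
    with open(path, "rb") as f:
        return f.read() == original
```

If the function returns True, the file is exactly what it was before the damage, so Excel has no way of telling that anything ever happened to it.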
So I would like to end this chapter by summarizing the main ideas about file corruption:
Causes of file corruption. It is very difficult to trace the exact causes of file
corruption, but we know that the likelihood of corruption increases with the life span of the
file, the frequency of manipulation, and the number of risky episodes it has gone through,
such as power cuts or the PC falling to the ground. It also increases with logical
incoherencies related to the file's nature: broken links, or objects mentioned in the code but
erased from the workbook. Even apparently unimportant things, like having a macro-enabled
workbook with the .xlsm extension but with no actual macros or modules within it,
introduce a degree of logical incoherency that increases the risk of accidental corruption
when saving the workbook.
Effects of file corruption. Corruption of files, though apparently less and less frequent
over the years, is a very serious issue that involves the almost certain loss of VBA
functionality, ActiveX and form controls and floating objects, and the probable loss of data
stored in worksheets.
Recovering data from corrupt files. Once a file is corrupt, the defect is rather
persistent, and the chance that the file can be cleaned in some manner, though not always
impossible as we have seen, is very remote. Our best hope is to recover as much data as we
can, using some of the techniques mentioned, and put the data safely into a new file. There are
special ad hoc products on the market dedicated to recovering data from corrupt Excel files,
and some people say that LibreOffice Calc can sometimes open corrupt Excel files, and that
it is worth a try before going for a commercial product or losing hope altogether. I can
neither confirm nor deny these testimonies: I have not tried any third-party products, and
my attempt to open a corrupt file with LibreOffice Calc passed through a sequence of
promising messages but ended in a general error.
Microsoft Excel is by far the best spreadsheet software on the market, even more so
when considered alongside the additional capabilities added by the VBA programming
environment. But in spite of all these qualities, Excel is not all-powerful and has some
limitations. These limitations do not usually have a critical nature, but it is good to know
about them in case we come across them in the future.
A lot of what are normally called Excel errors and Excel bugs are really due to
inaccurate data entry, careless use of native functionality, ignorance or misunderstanding
of the actual limits of the tool, and unrealistic expectations about how spreadsheet
software should behave. But even in these extreme situations Excel responds relatively
well, very rarely, if ever, giving an erroneous result or calculation as output, and always
showing messages containing hints to help us understand and solve the issue.
And now this exploration trip of the borders of the Excel world comes to an end. I
hope that you have enjoyed it and that you have learned something new about your favorite
spreadsheet software.
EXCEL WORLD.
Bibliography
Alconchel, M. B. (2005). Matemáticas con Microsoft Excel. Madrid: Ra-Ma.
Billo, E. J. (2007). Excel for Scientists and Engineers. Hoboken, NJ: Wiley-Interscience.
Bullen, S., Bovey, R., & Green, J. (2005). Professional Excel Development. Upper Saddle
River, NJ: Addison-Wesley.
Ediciones ENI. (2000). Excel 2000: Sigue el ejemplo. Barcelona: Ediciones ENI.
Ediciones ENI. (2000). VBA Excel 2000: Manual práctico. Barcelona: Ediciones ENI.
Le Guen, F. (2013). Macros y lenguaje VBA: Aprender a programar con Excel. Barcelona:
Ediciones ENI.
Hawley, D., & Hawley, R. (2004). Excel: los mejores trucos. Madrid: Ediciones Anaya.
Jacobson, R. (2002). Excel 2002: Macros y Visual Basic. Madrid: McGraw-Hill
Profesional.
Jelen, B., & Syrstad, T. (2007). Excel Macros y VBA: Trucos esenciales. Madrid:
Ediciones Anaya.
Jimeno García, M., Míguez Pérez, C., Matas García, A., & Pérez Agudín, J. (2008).
Hacker: Guía práctica. Madrid: Ediciones Anaya.
Mott, J., & Rendell, I. (2008). Spreadsheet Projects in Excel. London: Hodder Education.
Robinson, E. (2006). Excel VBA in Easy Steps. Southam, Warwickshire: Computer Step.
Walkenbach, J. (2004). Excel 2003: Programación con VBA. Madrid: Ediciones Anaya.
Walkenbach, J. (2013). Excel 2013 Power Programming with VBA. Hoboken, NJ: John
Wiley & Sons, Inc.
Index
#DIV/0! 77
#N/A 77
#NULL! 77
#NUM! 76
#REF! 76
#VALUE! 77
#NAME? 76
Application.StatusBar 178
asymptotic analysis 140
brute force algorithm 87, 136, 137, 138, 148, 149, 151
built-in utilities 87
Circular reference 77
Do While...Loop 134
Do Until...Loop 134
DTPicker 163
Eusprig 3
Fix-It 168
floating object 69, 71, 72, 125
Goal Seek 87, 88, 89, 90, 91, 92, 93, 94, 95, 96
greedy algorithm 91
IDE 2, 14, 20, 21, 35, 70, 71, 108, 116, 117, 123, 125, 163, 165, 168, 169, 170
Immediate Window 14, 22, 30, 31, 40, 44, 55, 63, 64, 114, 116, 118, 127, 131, 132,
161, 176
leading zeroes 16
Name Box 97
nested functions 79
overall collapse 48
Overflow 44, 113
RAM allocation 54
SQL 96
Timer 136, 137, 141, 142, 143, 147, 148, 153, 156, 157, 175, 176, 177, 178
UserForms 163
variable scope 113
VBA 1, 2, 5, 7, 8, 14, 18, 19, 20, 21, 23, 24, 27, 29, 30, 31, 32, 34, 37, 40, 44, 45, 47,
49, 61, 63, 65, 67, 70, 72, 75, 76, 96, 101, 102, 106, 108, 109, 110, 113, 114, 116, 117, 118,
119, 120, 121, 123, 124, 125, 129, 133, 134, 139, 141, 142, 145, 146, 150, 151, 152, 153, 154,
155, 160, 162, 163, 165, 172, 173, 174, 175, 177, 179, 181, 189, 200, 203
VBA compiler 108, 110, 172
VBA Editor 63
volatile formula 58
What if analysis 87
WizCombo 169
[1] https://en.wikipedia.org/wiki/2012_JPMorgan_Chase_trading_loss
[2] http://blogs.wsj.com/moneybeat/2014/10/16/spreadsheet-mistake-costs-tibco-shareholders-100-million/
[3] http://www.eusprig.org/index.htm
[4] Not strictly. VBA can be used to get data from outside sources without loading the data in a sheet, but this is
quite unusual for the average Excel user.
[5] You'll find different interpretations. Some authors think that Boolean is a particular type by itself. Others refuse
to consider error messages an independent type of data.
[6] Shown as 1.00E+254 if the number of decimals shown in the cell is less than 15
[7] This includes integers and dates. Dates are apparently integers shown with a certain format, but internally, they
are spreadsheet numbers too and therefore 8-byte double precision floating point numbers.
[8] There are others that I do not cite, for instance currency.
[9] Until precision no longer allows us to distinguish the value from zero. We will ignore the details of infinitesimal
series where, in theory, we would never reach zero.
[10] Apart from the slash sign, or /, the minus sign, or -, will also be accepted by Excel as a day-month-year
separator in the worksheet.
[11] I must insist; we may use the terms logical and Boolean as equivalents within the Excel environment, but
never Binary, which is a numeric base for number representation.
[12] Overcoming this memory limit is possible for OS programmers. The absolute limitation for 32-bit processes is
4 GB, and Windows splits this into 2 GB for applications and 2 GB for the system. Some patches capable of modifying this
limitation can be found on the internet; I have not tried any of them. More about memory usage:
http://blogs.technet.com/b/markrussinovich/archive/2008/11/17/3155406.aspx
[15] https://fastexcel.wordpress.com/2016/11/27/excel-memory-checking-tool-using-laa-to-increase-useable-
excel-memory/
[16] https://support.microsoft.com/en-gb/kb/3160741
[17] The response on an ordinary computer may be a lot more disappointing, with Excel not being able to
accomplish the writing operation, taking ages to do it, or the whole OS graphical resources becoming unresponsive.
Nevertheless, this is a border area and some surprises may arise. Perhaps the inferior computer could outperform the superior
one in some respects, depending on the total demand on the system at that precise moment and on the rest of the factors involved.
[18] Unless we have set up the workbook in the less recommendable manual calculation mode. We will talk
about this later.
[19] http://www.decisionmodels.com/memlimitsc.htm
[20]http://msdn.microsoft.com/en-
us/library/windows/desktop/aa366778%28v=vs.85%29.aspx#physical_memory_limits_windows_8
[21] The logic is consistent in general terms, but can have fuzzy behavior for some particular cases. Two columns
filled with the number 15 will require a little more space than a full column of 15s and another of 16s. Sometimes strings,
though apparently longer in number of characters than some numbers, will occupy less memory space. Practice can be the
[22] http://office.microsoft.com/en-us/excel-help/excel-specifications-and-limits-HP010342495.aspx
[23] There are also hints that Excel can give us regarding its own interpretation of our entries, but that is a
different story and has nothing to do with operational errors of the spreadsheet itself.
[24] Cannot be interpreted by Excel
[25] As I have stated several times in this book, errors are not limits, still less syntax errors.
[26] It used to be only 7 in 2003 and previous versions, and formulas containing only three or four nested levels
were already a nightmare to audit and correct.
[27] http://www.decisionmodels.com/calcsecrets.htm
[28] If I'm not mistaken, in Excel 2007 and earlier the Calculation Options affected not the particular
file but Excel as a whole. This has been an improvement.
[29] It would have been simpler to accumulate on each cell just adding the value from the previous cell to the
cell besides.
[30] https://en.wikipedia.org/wiki/Bisection_method
[31] And combined with any of the other integrated utilities, in fact.
[32] https://support.microsoft.com/en-us/kb/211922
[33] Neither resignation nor abdication were common back in those days.
[34] My recommendation is general, not only for doing this exercise. LibreOffice is an excellent product and can
complement MS Office for some tasks.
[35] And the natural object for such purposes in the spreadsheet environment.
[36] This is no fixed value and will depend on the CPU and RAM of your machine.
[37] https://msdn.microsoft.com/es-es/library/office/mt728944.aspx
[38] https://support.microsoft.com/en-us/kb/3025036
[39] https://msdn.microsoft.com/en-us/library/office/gg251321.aspx
[40] A sad fact that never ceases to amaze me and many other users. The default option, even if you have already
added code to your workbook, is .xlsx, which doesn't support macros. If you click Yes, you may end up losing all your code
in the most unexpected and unfair fashion. I hope they change this in 2016.