This is an ugly dump of the html from this nice writeup… I just didn’t want to lose it…

OK, so it’s a pun on "The 7 Habits of Highly Effective People". But
there are a number of programming techniques that I’ve found virtually instantly
identify a defective program. In addition, there are a number of techniques that
mark a program as deeply suspect, and likely to break. Finally, there are a
number of techniques which represent poor style, some exceptionally poor, which
violate principles of modularity and which result in programs that are clumsy to
construct, difficult to debug, and impossible to maintain.

Many of these are simple well-known programming issues; for example, the use
of constants where a #define is more appropriate. For some reason, I find
that there is a tendency of programmers to assign certain integers, such as control
IDs, to random hard-coded values. This is a serious problem. But there are a number of
techniques that also indicate serious flaws in a program. I will summarize these
here, and clicking on one will link to a discussion of it.

Generally, when I get a program from a client who is having problems, I can solve most of the problems by looking for the
items I describe here. My track record is that I have been shown code on a page, or on a screen, containing one or more of the
techniques I discuss here, and I point to it and say “There’s your problem!”, usually within the first 30 seconds. It sometimes takes another ten or twenty minutes to explain to the programmer what has gone wrong as a consequence, and demonstrate exactly what can go wrong in the future, but thus far I’ve never been wrong in my assessment. On one recent consulting trip, I spotted a
Sleep() inside a WaitFor…, and said “there’s the problem”, and then explained “Your program will fail under the following conditions (a), (b), (c) and the behavior it will exhibit on failure will be (d), (e) and (f)” and they said “Yes, that’s exactly where and how it is failing!”. As it turned out, there was nothing wrong with that part of the program that a complete rewrite wouldn’t solve. And the implications on
the rest of the program architecture were significant, so they now have several weeks’ work to fix the problem, but they now understand why it couldn’t possibly have worked, except by a great miracle.
I’ll talk more about the failure mode details later.

I have a little awk script I run on *.cpp, and it locates these “hotspots”. Usually in less than half an hour I can locate more problems than I have been asked to solve just by studying the code near the hotspots. (I consider this a seriously good revenue generator, since instead of the 1-day or 3-day fix expected, I often get a week or more out of it, since the client, like the one described above, says “Yes, we never did figure out why that was failing…” followed by “go ahead and fix that while you’re in there”

I’ve put some Severity notifications in these descriptions. My rating of
these is

  • Potentially fatal: The use of this
    technique shouldn’t work at all. If it appears to, it is an illusion.
  • Serious: There is an excellent chance that
    either the program is defective to start with, or an apparently simple
    change elsewhere in the program will create a defective program. In some
    situations, the fact that the program works at all should be considered
    miraculous.
  • Potentially Harmful: There is an
    excellent chance that minor changes in the program will result in a
    malfunctioning program, or necessitate gratuitous changes that a proper
    program construction would not have required.
  • Inefficient: this covers most polling
    situations. You will be wasting vast amounts of CPU time and accomplishing
    nothing.
  • Poor style: While the code works, it
    represents poor style of programming. It should be cleaned up to something
    that is manageable.
  • Deeply suspect: While the code works,
    there is almost certainly a better way to accomplish the same task which
    would be simpler, faster, cleaner, or some other suitably positive
    adjective. Generally, there are sometimes valid reasons for using the
    technique specified, but they are fairly rare and easily identified. You
    probably don’t have one of these reasons.

Note that I have not had time to complete all the points in this article, but
it has been sitting in my computer for months and I thought it time to finally
get it out, finished or not!

Threading

Processes

Clipboard

Timing

Synchronization

Memory Allocation

Dialogs, forms, and controls

I/O

Program Structure


AfxExitThread(), _endthread(), _endthreadex(),
or ExitThread()

Severity: potentially fatal

These will result in erroneous programs. The only correct way to exit a
thread is to either exit from the top-level thread function (straight worker
threads in MFC or pure C), to do a PostQuitMessage from within a UI
thread, or do a PostThreadMessage(WM_QUIT,0,0) to a UI thread from
outside it.

Why? Because these functions cause the thread to terminate immediately.
(Well, from the viewpoint of the thread, "immediately"; the DLL_THREAD_DETACH
calls will still be properly made, so it is not as bad as using TerminateThread).
Any pending objects on the stack will not have their destructors called.
Resources will leak. Worse still, if some of those objects are synchronization
objects, such as a CSingleLock or CMultiLock object, the lock will
never be released, which will eventually result in the program blocking indefinitely.

So, you claim, you don’t do any of this? Sure. Right. That’s this week. A
year from now, when you are working deep in the thread logic, and discover a
failure to synchronize, and add one of these, you are now dead. You just don’t
know it yet.

Then there’s the issue of expectations. When a top-level thread function
looks like this

/* static */ UINT CMyClass::threadfunc(LPVOID p)
   {
    while(somecondition)
       { /* thread loop */
        DoSomething();
       } /* thread loop */
    return 0;
   } // CMyClass::threadfunc

and you decide to add something useful to the completion, such as

/* static */ UINT CMyClass::threadfunc(LPVOID p)
   {
    while(somecondition)
       { /* thread loop */
        DoSomething();

       } /* thread loop */
    wnd->PostMessage(UWM_THREAD_COMPLETED);
    clean up thread state
    return 0;
   } // CMyClass::threadfunc

this is based on the assumption that you will actually exit the thread
through the top-level thread function. Any premature exit of the thread handled
at some lower level will bypass this code. So the program will occasionally
malfunction in bizarre, possibly non-reproducible ways. 

Why would you ever want to exit the thread prematurely? Many valid reasons.
The file that the thread is supposed to open isn’t accessible. The network
connection it was managing was disconnected. You had an internal data structure
consistency error.

So how do you exit a thread prematurely? Simple: throw an exception.
For example, I would write

/* static */ UINT CMyClass::threadfunc(LPVOID p)
   {
    while(somecondition)
       { /* thread loop */
        TRY 
          { /* try */
           DoSomething();
          } /* try */
        CATCH(CMyTerminateException, e)
          { /* catch */
           e->Delete();
           break;
          } /* catch */
       } /* thread loop */
    wnd->PostMessage(UWM_THREAD_COMPLETED);
    clean up thread state
    return 0;
   } // CMyClass::threadfunc

To define your own exception, you can just derive from the general MFC CException
class, e.g.,

class CMyException : public CException {
    public:
        CMyException() { }
        virtual ~CMyException() { }
};

If you want to pass parameters in the CMyException, just add
parameters to the constructor, and appropriate member variables. To throw the
exception, just write

throw new CMyException;

In pure C, you should use the __try/__except mechanism and you
can only throw integer exceptions, but the technique is the same.

Note that __try/__finally will guarantee cleanup, but only if the
thread does not terminate; so while the __finally clause is
"guaranteed" to execute, this guarantee applies only if control passes
through it. An ExitProcess or ExitThread (or anything that
conceals them) will bypass this effect.

Note that in MFC, explicitly terminating a thread is always a
potentially fatal operation, because the per-thread handle maps are not cleaned
up, including temporary objects. This will leak storage, as well as leaving
various kernel objects orphaned. AfxEndThread will actually clean up
the MFC state correctly, but has all the other hazards, so don’t do it.

Note also that the documentation of _beginthread is misleading; it
suggests that calling _endthread "helps to ensure proper recovery of
resources allocated for the thread", suggesting that it makes sense for you
to do this. Yet it also states that , "_endthread or _endthreadex
is called automatically when the thread returns from the routine passed as a
parameter", without explaining how it knows to call which function (it is
actually because _beginthread will implicitly call _endthread and _beginthreadex

will actually call _endthreadex, which you can easily see by reading the
source code for the C runtime in

d:\Program Files\Microsoft Visual Studio\vc98\crt\src\thread.c

and

d:\Program Files\Microsoft Visual Studio\vc98\crt\src\threadex.c

(Providing you installed the C runtime source; if you didn’t, you should
have. Go install it now)

What they really mean is that you should not call ExitThread,
which would not clean up these resources. There is no justification for
explicitly calling _endthread or _endthreadex yourself.


_beginthread(), _beginthreadex(), or CreateThread() in
an MFC program

Severity: potentially fatal

MFC requires certain internal state be set up properly for its correct
functioning in a multithreaded environment. This includes creating (and upon
completion, destroying) the per-thread handle maps. If you use these methods,
you bypass MFC’s maintenance of its internal state.

If you call no MFC functions from within the thread, and use no
MFC objects in any form, you might get away with using _beginthread,

_beginthreadex or CreateThread. But that means that using any
MFC function inadvertently opens you to potentially fatal problems. Since there
is essentially no reason to use these, you should use only AfxBeginThread.


CreateThread in a C program

Severity: serious,

potentially fatal

The C runtime comes in two flavors: the single-threaded libraries (normal,
debug, and DLL) and the multithreaded libraries (normal, debug, and DLL). It is essential
that you use the multithreaded C library (in any of its forms). Unfortunately,
the default for a C application is to use the single-threaded library. This
library gains some efficiency at the cost of losing all thread safety. 

If you call _beginthread or _beginthreadex, the functions will
generate a compiler error if you have not gone into Project | Settings | Code
Generation
and selected the multithreaded library. Therefore, you know

that if you get a compilation error, you have to enable the multithreaded
library. If, however, you use CreateThread, this is an API function,
which is always defined. Therefore you can create threads which use the
single-threaded C library from multiple threads.

This technique is perfectly safe if you never call any C library functions
from the thread. However, maintaining this restriction is difficult over the
lifetime of the program. Since _beginthread or, if you need all the
capabilities, _beginthreadex, already provide everything you need to do,
there is little if any reason for using CreateThread.


TerminateThread

Severity: potentially fatal

This is always a bad idea. Always. It is to my deep embarrassment that in my
early writings on threading, in Win32 Programming, I made the mistake of
using this. At least I said that I knew it was a Bad Idea, but I was under a
fair amount of pressure to finish the book, and was not as careful as I should
have been. I have documented the proper ways of terminating a blocked thread in
a companion essay

The problem with calling TerminateThread is that it will instantly
terminate the thread, and will even suppress the DLL_THREAD_DETACH calls
that many DLLs rely on for correct behavior.

Never use this call. If you think you need it, recode your program so you
don’t need it
.


Polling for thread completion

Severity: inefficient

A common failure mode in multithreaded programming is to have some thread
"wait" for another by asking "are we there yet? are we there
yet?". All this accomplishes is to waste a significant amount of CPU time,
since, unless you have a multiprocessor, the thread that is "waiting"
will consume its entire time slice asking for a condition that cannot
possibly be set to true
because during the time that thread is running, the
thread that is being waited for is not running and therefore
cannot possibly change the state!

In a multiprocessor, it means the waiting processor simply wastes a
significant amount of its time waiting for a thread to finish; that time could
be better spent doing something productive, such as running one of the other
threads of your application.

It doesn’t matter how you poll; some people use a Boolean variable, waiting
for it to go TRUE, and some use GetThreadExitCode. Both of these
are equivalent, and wrong.

There are two techniques for dealing with thread completion. The simplest
method, which I will explain why has serious defects in a different
section
of this essay, is to simply do a WaitFor… operation on the
thread handle. The thread handle remains non-signaled as long as the thread is
running (or potentially runnable) and becomes signaled when the thread
terminates. Therefore, the thread that is waiting for the completion uses no CPU
time while it is waiting. However, since this blocks the waiting thread, and
that can be the main GUI thread, this is not generally recommended in the main
thread.

The second method is to have the thread PostMessage a completion
message (user-defined) to the GUI thread. This can be handled by either a view
or the main frame. The thread that receives the message then can enable/disable
controls respond to the completion, and so on.


Blocking the main GUI thread

Severity: deeply suspect

A problem that comes up fairly often in the newsgroups deals with the fact
that the "GUI is non-responsive", usually to something like a
"Cancel" button not being able to cancel a running thread. It is essential
that you not block the GUI thread if you expect features like this to work. It
is generally a Very Bad Idea to block the GUI thread.

The usual excuse is "but I don’t want to have the user start another
thread" or something like that. Fine. This is trivial. Just disable the
controls that would initiate a new thread. For example,

void CMyView::OnStartComputation()
   {
    AfxBeginThread(...);
    threadrunning = TRUE;
   }
void CMyView::OnUpdateStartComputation(CCmdUI * pCmdUI)
   {
    pCmdUI->Enable(!threadrunning);
   }

I then use a user-defined message which the thread posts back to the main GUI
thread just after the thread loop completes, using

wnd->PostMessage(UWM_THREAD_FINISHED);

In the handler for wnd, add

ON_MESSAGE(UWM_THREAD_FINISHED, OnThreadFinished)

or

ON_REGISTERED_MESSAGE(UWM_THREAD_FINISHED, OnThreadFinished)
LRESULT CMyView::OnThreadFinished(WPARAM, LPARAM)
   {
     threadrunning = FALSE;
   }

For a dialog, read my essay on dialog control
management
. What I do is

void CMyDialog::OnStartComputation()
   {
    AfxBeginThread(...);
    threadrunning = TRUE;
    updateControls();
   }
void CMyDialog::updateControls()
   {
    ...
    c_StartComputation.EnableWindow(!threadrunning);
   }

The only difference in the handler for handling the OnThreadFinished
event in a dialog is that I call updateControls() explicitly. 

For more details on handling user-defined messages, see my essay on message
management
.

The use of a blocking call on the main GUI thread, or alternatively a long
computation, produces unexpected and generally unpleasant effects on the user
interface. The failure to put up a wait cursor exacerbates this.


Using PeekMessage anywhere

Severity: serious, potentially
fatal
, inefficient, poor
style
. (It loses in a lot of ways!)

First rule: PeekMessage is obsolete

Second rule: If you think you need PeekMessage,
consult rule 1.

Actually, PeekMessage can be useful, if used with extreme care.
For example, the MFC message pump uses it, and that code demands very careful
study if you ever think you want to use PeekMessage. It is subtle code.
If you cannot be at least as subtle as this code, don’t even consider trying to
use it.

But in most cases, just pretend you never heard of PeekMessage. Most
people who try to use it get it so badly wrong that its very existence has
negative effect on program quality. You are always safe if you apply the two
rules above. While they are a heuristic for testing the validity of a program,
one of my criteria for determining if a program is seriously defective is the
mere presence of this API call. In all cases I have seen this used, it has been
used incorrectly. So as a simple heuristic I can state that nearly everyone gets
it wrong, and therefore, its very presence is a good indicator of a
fundamentally flawed program design.

If you think you have to use it, you are almost certainly better off using
threads. But I’ve seen some pretty bad code when threads are used. So if you
think you need it in a single-threaded program, your structure is probably
wrong, and you should have used multiple threads. And if you think you need it
in a multithreaded application, you’re almost certainly wrong, since there is
nothing meaningful it can do.

For example, here is an adaptation from a piece of code I found on the MFC
newsgroup. It is actually a hybrid of several serious errors I’ve found in
multithreaded programs. I decided to discuss all of these in one synthesized
example.

// The global variable that controls and reports the thread state
BOOL running = FALSE;
CRITICAL_SECTION lock;

void CMyClass::OnInitThread()
  {
   EnterCriticalSection(&lock);
   running = TRUE;
   LeaveCriticalSection(&lock);
   AfxBeginThread(threadfunc, parms);
   // Do not block the GUI thread while the worker is running
   while(running)
     {
      MSG msg;
      if(PeekMessage(&msg, 0, 0, NULL, PM_REMOVE))
        { 
         TranslateMessage(&msg);
         DispatchMessage(&msg);
        }
     }

   DoSomething();
  }
UINT threadfunc(LPVOID p)
   {
    BOOL isrunning;
    while(TRUE)
      {
       EnterCriticalSection(&lock);
       isrunning = running
       LeaveCriticalSection(&lock);
       if(!isrunning)
          break;
       ...do thread thing here...
       if(thread_has_completed_its_work)
         { 
          EnterCriticalSection(&lock);
          running = FALSE;
          LeaveCriticalSection(&lock);
         }
      }
    }

What’s Wrong With This Picture? 

To be blunt, nearly everything. The AfxBeginThread is salvageable, as
is the code represented by the "…do thread thing here…". The test
for the thread having completed its work. The function call for doing something
when the thread has finished. Nothing else is salvageable. This is a horror of
the first order.

First, there is a global variable in use. There is no need for a global
variable. There are a lot of ways of dealing with this, but a global variable is
an indication there is something wrong with the structure. See, for example, my
essays on worker threads, and on callbacks.
These show why you would not ever need a global variable.

Then there is the global thread function. This is not necessary. This can be
a static class member function (again, see my previously-cited essays on worker
threads and callbacks).

The BOOL variable access is carefully synchronized, which is
unnecessary, and it is not declared volatile, which is necessary.

But the worst horror of all is the PeekMessage loop with its
accompanying comment. This is absolutely, positively incorrect program
structure. The failure in thinking here is that the actions which must follow
the thread completion must physically follow the thread invocation, in the same
function in which the thread invocation appears. This is completely nonsensical.
The completion actions are syntactically unrelated to the thread initiation.
They are only required to temporally follow the execution of the thread.

The PeekMessage loop is absurd here. It means that while the thread is
running, half the CPU cycles go to running a thread that is doing absolutely nothing
but consuming CPU cycles in a completely pointless and nonproductive fashion!

The correct solution is to get rid of the PeekMessage loop entirely.
There is no justification for its existence.

One of several correct approaches can be characterized by the code below:

class CMyWndClass : public CSomeWindowBaseClass {
    ... lots of stuff here
    protected:
      volatile BOOL running;
      static UINT threadfunc(LPVOID p);
      void RunThread();
      afx_msg LRESULT OnThreadFinished(WPARAM, LPARAM);
};

void CMyWndClass::OnInitiateThread()
   {
    running = FALSE; 
    updateControls(); // dialogs only
    AfxBeginThread(threadfunc, this);
   } 

That’s all that is required. Note that there is absolutely nothing else
happens in this function! That’s because there is nothing else that needs to
happen.

When this function returns, the built-in message loop resumes. If there is
nothing to do, the message loop blocks on the GetMessage, thus not
consuming any CPU cycles because there is nothing else to do. In the best case,
if this were the only application requiring service on the CPU, 100% of the CPU
cycles would go to accomplishing useful work in the thread, instead of 50% to a
useless PeekMessage loop and 50% to the thread. Read: if your task took
ten minutes under the previous implementation, it takes five minutes under this
one. That performance improvement matters.

Now what does the thread function look like? For details, consult my essay
on worker threads
, but the code is simply expressed as

/* static */ UINT CMyWndClass::threadfunc(LPVOID p)
   {
     ((CMyWndClass *)p)->RunThread();
     return 0; // return value never, ever is of interest
   }

void CMyWndClass::threadfunc()
   {
    while(running)
      { /* thread loop */
       ...do thread thing here...
       if(thread_has_completed_its_work)
         break;
      } /* thread loop */
    PostMessage(UWM_THREAD_FINISHED); // See my essay on Message Management

   }

LRESULT CMyWndClass::OnThreadFinished(WPARAM, LPARAM)
   {
    DoSomething();
    running = FALSE;
    updateControls(); // dialogs only
    return 0;
   }

This is not only efficient, but note that it does not do any synchronization!
Why not? Clearly, the running flag is being accessed by two different
threads: the GUI-level thread can set it to FALSE by the user clicking a Stop
button or selecting a Stop menu item or any other of a variety of ways.
So why don’t we need synchronization?

Synchronization is needed only when
concurrent access can produce incorrect results. The main GUI thread never reads
the value; it only writes it. The worker thread only reads it, but if there is
any timing error it really doesn’t matter in the slightest. The worst case is
that the worker thread might go through the loop one more time if there was a
concurrent access that caused the thread to see a TRUE although
concurrently there was a FALSE being set. So functionally, this is the
same as if the user had hesitated perhaps an extra nanosecond on the click. So
no synchronization is required.

What about the argument that "I don’t want the user to start the
thread(s) again until it(they) finish". Well, the first example code
doesn’t actually prevent that from happening! So it would be possible to start a
new of thread running each time the operation was re-selected. You need some way
to disable the initiation of new threads. The simplest way to do this in a
view-based app that is using menu items or toolbar buttons is to simply have an UPDATE_COMMAND_UI
handler:

void CMyView::OnUpdateStartThreads(CCmdUI * pCmdUI)
   {
     pCmdUI->Enable(!running);
   }

The controls are disabled if the thread is running. For a dialog, or a CFormView,
where you have a button on the dialog or form, the updateControls handler
I describe in my essay on control management would be

void CMyDialog::updateControls()
   {
    ...other stuff here
    c_StartThread.EnableWindow(!running);
   }

Note how much simpler the code is. There are no temporal dependencies that
are enforced by syntactic sequencing. The system is fully asynchronous, requires
a minimum of synchronization (none), consumes no CPU cycles by polling for
messages, and is easier to understand.


Failure to use volatile

Severity: potentially fatal

Any variable that is shred between two threads and can be changed
concurrently must be marked as volatile. This keeps the compiler
from doing certain optimizations, such as code motions, constant propagation and
common subexpression elimination, on a variable whose value might change
spontaneously.

Note that declare a variable as volatile does not make it
thread-safe. But a thread-safe variable, properly handled with synchronization
primitives, can still produce incorrect results in the Release version if
the variable is not declared volatile.

Example:

while(running)
    { /* thread loop */
     ...bunch of stuff here
    } /* thread loop */

looks like a perfectly good loop, where running is, for example, a
member variable of a class and can be modified by some other thread. For
example, you might do

void CMyDialog::OnCancel()
   {
    thread->running = FALSE;
   }

However, the optimizing compiler might detect that within the thread loop,
nothing modifies the running variable. If it believes this, it feels
perfectly free to transform the program as if it had been coded

if(running)
   { /* thread loop */
     compiler_generated_label:
       ...bunch of stuff here
     goto compiler_generated_label;
    } /* thread loop */

Now, you say, that’s ridiculous! That’s a completely different program! Not
so. From the viewpoint of code optimization the two programs are absolutely
identical
. This is because the optimizer assumes single-thread semantics.
And this assumption is completely wrong.

If, however, you declared the member variable as

volatile BOOL running;

then the compiler recognizes that some mechanism it is not aware of can
change the value, and it will not do the transformation. 


Failure to synchronize access to a variable
between threads

Severity: potentially fatal

No variable or structure should be accessed from multiple threads without
being accessed in a thread-safe fashion. Note that any thread-safe access must
also be multiprocessor-safe. 

Note that this applies only to variables where concurrent access can produce
incorrect results. There are situations in which concurrent access can never
result in an incorrect result. One such
example
is discussed in this essay in the section
on the (mis)use of PeekMessage
.

However, for situations in which concurrent access can produce an incorrect
result, you must provide proper synchronization. This means that even trivial actions like

volatile int a;
a++; // not safe!

cannot possibly work. If you look at the generated code, you will find that
it actually takes a couple instructions to increment a, but even if the compiler
generated an inc instruction, that takes two memory cycles, which
means that in a multiprocessor environment the other processor could
simultaneously fetch the initial value. Then both processors would increment the
value (say, from 17 to 18) and the value 18 would be stored back, first by one
processor, then by the other. So executing a++ twice would cause the value to go
from 17 to 18, which doesn’t make much sense.

If you believe this cannot happen, you are wrong. I first found this in an
operating system in 1968. I found that instead of doing the multithread-safe
increment, it did a multithread-unsafe increment. I reported this to my manager,
who reported it to IBM. In addition I fixed it. Suddenly, the MTBF of the system
went up; in retrospect, examining the "indeterminate" crash memory
dumps, we determined that one crash a week happened because there was a thread
preemption between the instruction that tested the value and the instruction
that set the value.

For simple operations, like increment, decrement, simple addition and
subtraction, and a compare-and-exchange, there are Interlocked…
operations, so the correct form of the code would be

volatile long a;
InterlockedIncrement(&a);

There are also InterlockedDecrement, InterlockedExchangeAdd,
and InterlockedCompareAndExchange, among others.

However, generally you need to provide higher-level synchronization on
objects, because very few changes can take place in a single instruction. So you
should use CRITICAL_SECTION or a Mutex, or the MFC classes CCriticalSection
or CMutex to provide the necessary synchronization.

There is a persistent piece of misinformation which I have said back to me
often enough to have realized this has now achieved the status of Programming
Legend: by declaring a variable as volatile access to it is
implicitly synchronized. Nothing could be further from the truth. The truth is
that volatile in no way provides synchronization. What it does do
is inform the other threads that they cannot cache the value, which is not at
all the same! There is absolutely, positively, no synchronization whatsoever
provided by declaring a variable as volatile. You, and you alone, are
responsible for providing synchronization. If, however, you also fail to declare
the variable as volatile, the cached value may be used by another thread,
which would also be incorrect. You must do both.

See my companion essay on synchronization.


Overriding the
Run() method in a CWinThread class

Severity: fatal. No "potentially" about it.
Programs that are written this way invariably do not even work, from day one.

Basically, there is no reason to do this. Or the reasons are so exotic that
you have to be a very advanced user to do this sanely.

Hint: I’ve been writing threaded code for over a decade and not once, not
ever, have I had a need to override the Run() method.

Invariably this is the result of a misunderstanding of the purpose of a UI
thread.

The purpose of a UI thread is to have a message pump running. This is
important for dispatching events such as socket messages and other asynchronous
events from libraries such as SAPI (the speech API). Overriding the Run
method will block these messages. The most common failure mode is to put an
infinite loop in the Run() method. This can be assumed to make no sense.
If you need an infinite loop, use a general worker thread. If you need a UI
thread, you do not need an infinite loop in it.

Whatever you are trying to do, overriding Run is going to be
the wrong way to accomplish it. Putting an infinite loop in the Run
method is always a mistake.



Putting an infinite loop in the InitInstance handler of a UI thread

This makes no sense at all. Ever.

It defeats the purpose of having a UI thread. There is absolutely no way this
could ever make sense in a UI thread. Use a worker thread. It will block the
message pump, which is the only reason you would ever need a UI thread.
Therefore it represents a fundamental design flaw.

Whatever was intended by this, the choice of putting it in a UI thread is
always a mistake.


SendMessage between threads

Severity: serious,
potentially fatal

Never use SendMessage between threads. The result of this can be a
deadlock situation. The deadlock can be unpredictable and therefore difficult to
debug if it appears to be happening. So the easiest way to avoid this is to
never use it.

Use PostMessage to send messages between threads. This also means that
a worker thread cannot use most methods on controls, since they all end up being

SendMessage calls. Here are just a few of the operations you must not
do on a control or window from a thread which does not own it. I don’t intend to
reproduce the entire list here. Assume that if you are doing anything to
a window (or a control, which is just a window) from a thread, except PostMessage,
your program is at serious risk.

Note that I’ve had people say "But I’m not sending any messages
to the controls; I use MFC methods!" which is actually the same as saying
"I send messages to controls". MFC is just a nice syntactic wrapper on
hundreds of SendMessage operations.

Message prefix Example messages MFC equivalents
BM_ BM_GETCHECK CButton::GetCheck
BM_SETCHECK CButton::SetCheck
CB_ CB_ADDSTRING CComboBox::AddString
CB_DELETESTRING CComboBox::DeleteString
CB_FINDSTRING CComboBox::FindString
CB_GETCOUNT CComboBox::GetCount
CBEM_ CBEM_DELETEITEM CComboBoxEx::DeleteItem
CBEM_GETITEM CComboBoxEx::GetItem
CBEM_INSERTITEM CComboBoxEx::InsertItem
DM_ DM_GETDEFID CDialog::GetDefID
DM_SETDEFID CDialog::SetDefID
DTM_ DTM_GETRANGE CDateTimeCtrl::GetRange
DTM_GETSYSTEMTIME CDateTimeCtrl::GetSystemTime
DTM_SETRANGE CDateTimeCtrl::GetRange
EM_ EM_GETLIMITTEXT CEdit::GetLimitText
EM_GETLINE CEdit::GetLine
EM_GETLINECOUNT CEdit::GetLineCount
HDM_ HDM_DELETEITEM CHeaderCtrl::DeleteItem
HDM_GETITEM CHeaderCtrl::GetItem
HDM_INSERTITEM CHeaderCtrl::InsertItem
HDM_SETITEM CHeaderCtrl::SetItem
IPM_ IPM_CLEARADDRESS CIPAddressCtrl::ClearAddress
IPM_GETADDRESS CIPAddressCtrl::GetAddress
IPM_ISBLANK CIPAddressCtrl::IsBlank
IPM_SETADDRESS CIPAddressCtrl::SetAddress
LB_ LB_ADDSTRING CListBox::AddString
LB_DELETESTRING CListBox::DeleteString
LB_GETCURSEL CListBox::GetCurSel
LB_SETCURSEL CListBox::SetCurSel
LVM_ LVM_DELETEITEM CListCtrl::DeleteItem
LVM_EDITLABEL CListCtrl::EditLabel
LVM_GETITEM CListCtrl::GetItem
MCM_ MCM_GETCURSEL CMonthCalCtrl::GetCurSel
MCM_GETMONTHRANGE CMonthCalCtrl::GetMonthRange
MCM_SETRANGE CMonthCalCtrl::SetRange
PBM_ PBM_GETPOS CProgessCtrl::GetPos
PBM_SETRANGE32 CProgressCtrl::SetRange32
PBM_STEPIT CProgressCtrl::Stepit
PSM_ PSM_ADDPAGE CPropertySheet::AddPage
PSM_GETPAGEINDEX CPropertySheet::GetPageIndex
PSM_GETACTIVEPAGE CPropertySheet::GetActivePage
RB_ RB_GETBANDCOUNT CRebarCtrl::GetBandCount
RB_GETBKCOLOR CRebarCtrl::GetBkColor
RB_SETBANDINFO CRebarCtrl::SetBandInfo
SB_ SB_GETTEXT CStatusBar::GetPaneText
CStatusBarCtrl::GetText
SB_SETTEXT CStatusBar::SetPaneText
CStatusBarCtrl::SetText
SBM_ SBM_GETPOS CScrollBar::GetPos
SBM_SETPOS CScrollBar::SetPos
SBM_SETRANGE CScrollBar::SetRange
STM_ STM_GETICON CStatic::GetIcon
STM_GETIMAGE CStatic::GetImage
STM_SETICON CStatic::SetIcon
STM_SETIMAGE CStatic::SetImage
TB_ TB_ISBUTTONENABLED CToolbarCtrl::IsButtonEnabled
TB_PRESSBUTTON CToolbarCtrl::PressButton
TB_SETSTATE CToolBarCtrl::SetState
TBM_ TBM_CLEARSEL CSliderCtrl::ClearSel
TBM_GETNUMTICS CSliderCtrl::GetNumTics
TBM_SETBUDDY CSliderCtrl::SetBuddy
TCM_ TCM_ADJUSTRECT CTabCtrl::AdjustRect
TCM_GETITEM CTabCtrl::GetItem
TCM_SETIMAGELIST CTabCtrl::SetImageList
TTM_ TTM_ADDTOOL CToolTipCtrl::AddTool
TTM_GETTOOLINFO CToolTipCtrl::GetToolInfo
TTM_NEWTOOLRECT CToolTipCtrl::SetToolRect
TVM_ TVM_DELETEITEM CTreeCtrl::DeleteItem
TVM_GETCOUNT CTreeCtrl::GetCount
TVM_GETITEMRECT CTreeCtrl::GetItemRect
UDM_ UDM_GETBUDDY CSpinCtrl::GetBuddy
UDM_SETRANGE32 CSpinCtrl::SetRange32

Believing a thread starts
immediately
/Believing a thread does not start immediately

Severity: potentially fatal

Rule 1: any time you assume that a non-suspended thread starts immediately,
you are wrong
Rule 2: any time you assume there is a delay in starting a non-suspended thread,
you are wrong
Rule 3: there are no exceptions for non-suspended threads. See rules 1 and 2.

An example of the first failure is seen by the number of people who pass a
local variable to a thread:

SOMESTRUCT s;
s.whatever = somevalue;
s.thing = something;
AfxBeginThread(threadfunc, &s);

The first failure is that the code will be something like

UINT threadfunc(LPVOID p)
    SOMESTRUCT * s = (SOMESTRUCT *)p;
    ...stuff
    ... = s->whatever; // invalid access

By the time the thread gets around to accessing the s->whatever field,
the struct is long gone. The creating function returned, its stack frame is
gone, and the space formerly used to hold SOMESTRUCT s is now being used
for some other purpose. It gets even worse if the contents of SOMESTRUCT

were pointers; they now point to someplace, but where? Who knows?
Certainly serious malfunctions will result if they are used to load or store
anything.

I’ve seen a solution posted that looked like this:

UINT threadfunct(LPVOID p)
   {
     SOMESTRUCT s = *(SOMESTRUCT *)p;

and the explanation was that by making a copy, the contents would now be
saved. Sorry to disappoint the poster, but this is every bit as incorrect as the
previous example, and for the same reasons. All you know about a thread, once
you’ve called the ::CreateThread call (by whatever wrapper your
environment demands, such as _beginthread, _beginthreadex, or one
of the forms of AfxBeginThread), is that at some point in the
indeterminate future, the thread will start. And at some other indefinite
point in the future, the thread will end
.

If you need to pass information across a thread boundary, it must be space
that is either statically allocated, or is allocated on the heap, and it must be
guaranteed to remain correct for the entire lifetime of the thread.
Anything less will not work.

Sometimes you have a single thread that does one thing. In this case, it
sometimes can simply use member variables of the class that spawns it. For
example, a thread started by a CView-derived or CDialog-derived
class may use member variables of the class instance. Note that it is your
responsibility to see that this thread has completely terminated before allowing
the CView or CDialog to be destroyed. Rather than use individual
members, it is often best to have a struct/class which contains all the
variables needed for the thread.

Sometimes you need many threads to do certain tasks on behalf of a single
view, document, or other class. In this case, it would be impossible to use a
single variable or set of variables to hold the state. In this case, you must
actually allocate heap memory and pass a pointer to this heap memory to your
thread. The memory must be freed by the thread when it has finished using it.

The complementary problem is the illusion that the thread still exists after
the return from ::CreateThread or its wrappers. For example, here is a
classic error:

CMyThread * t = (CMyThread *)AfxBeginThread(RUNTIME_CLASS(CMyThread), ...);
t->m_bAutoDelete = FALSE;

Whoops! If you think this code can work, rethink what you are doing. It
cannot work. By the time the assignment is done to t, the thread might
have finished, and because m_bAutoDelete was implicitly TRUE to
start, the thread object referred to by t is gone, it has been returned
to the heap, the space has been reallocated for some other purpose, and you just
stored the value "1" somewhere, but you have no idea where! Perhaps you
merely corrupted the heap. Perhaps you have damaged some other data structure.
You have no idea. None whatsoever. And someday your code is going to crash and
burn in a spectacular manner.

This is one of the many reasons that keeping thread object pointers around
beyond the immediate creation of the thread is often useless. If you need to set
member variables, you need to do

CMyThread * t = (CMyThread *)AfxBeginThread(RUNTIME_CLASS(CMyThread), ..., CREATE_SUSPENDED, ...);
t->m_bAutoDelete = FALSE;
t->anythingelse = something;
t->ResumeThread();

Now you can depend on the fact that the thread has not started
running, because it was created suspended. Until you execute ResumeThread,
you are safe. Once ResumeThread is called, the thread could be terminated
before the call returns. (and if you think this is impossible, you have not
thought out what threading means).

One of the common questions which arises under these conditions is "But I
can’t do x until I know the thread is executing!". If that is so, you
need to rethink what you are doing. If you need to do w, start a thread,
wait for the thread to be running, and then do x, you have to stop
thinking in a procedural fashion and think in an event-driven fashion. The
correct approach is to do w, and start the thread. That’s it. That’s as
far as you go. If x needs to be done after the thread starts, why is it
not done in the thread? That’s the first question to ask. If there is
actually a good reason (not just a failure in design) that this must be so, then
the correct behavior is to have the thread PostMessage a notification to
the main GUI thread, or possibly PostThreadMessage if it was spawned by
some other UI thread, and when that message is received, you are now free to do

x.

I’ve seen solutions of the form:

HANDLE h = ::CreateEvent(NULL, TRUE, FALSE, NULL);
AfxBeginThread(threadfunc, (LPVOID)h);
::WaitForSingleObject(h, INFINITE);

where the thread contains

UINT threadfunc(LPVOID p)
    {
     HANDLE h = (HANDLE)p;
     ... thread setup goes here...
     ::SetEvent(h);
     ... rest of thread goes here...
    }

which is a bit clunky; it is not a clean solution. Worse still, I’ve seen the
passing of two event handles; after the creator comes out of the
::WaitForSingleObject
, it then does something and does a ::SetEvent

to tell the thread it can now continue; the thread, after it did the ::SetEvent
to resume its creator, did a ::WaitForSingleObject to wait for a resume
notification. This is really clumsy, and shows a failure to give up
procedural, sequential thought models. Junk code like this, and rewrite it into
something simple. Nearly all the time I see code like this, there is actually a
much simpler way to write it. The few times a better solution was not feasible,
it was because of fundamental design failures elsewhere that could not be easily
rewritten. Poor design is poor design, and there is usually not a single point
where it goes bad, but lots of places that go bad.


ExitProcess() and exit()

Severity: potentially fatal

Programmers have been erroneously trained to "exit the program if there
is an error". Much of this training was simply incorrect, and was often
based upon people who learned how to program when programs were encoded in
pieces of cardboard and run overnight.

In a modern interactive program it is never appropriate to "exit
the program". Exiting the program is an admission of total failure. It is
always an error. Therefore, an ExitProcess call, or as it is disguised by
the C library, exit, is always inappropriate. If you think you have an

"unrecoverable error", you are wrong. All errors are
recoverable. The recovery may mean that a whole lot of menu items, controls,
etc. are now disabled, but the condition is always recoverable with
respect to keeping the program running. To think that such a thing as an "urecoverable"
error that necessitates exiting the program could ever exist is a deep flaw in
thinking.

I once bought a third party database library, CodeBase. It was without
a doubt the worst subroutine library I have had the misfortune to be
subjected to. Its attitude was that if an error occurred, there were two errors:
those caused by incorrect programming, which would pop up and obscenely cryptic MessageBox,

underneath the application because the programmer did not
understand that using a NULL handle made the dialog box a child of the
desktop, and those errors which were caused by its perception that data was
incorrect, in which case it called exit instead of returning an error
code that said "data corrupted". They would not even listen to an
analysis of what was wrong; they gave lame excuses such as "If the index is
bad, we can’t proceed". Fine, but that has nothing to do with the
program
. Any subroutine library that contains an ExitProcess or exit

is so fundamentally erroneous that it can only be ignored as a piece of
legitimate software. Its inclusion in a project probably dooms the project.
After all, do you want to take the call from your customer saying "I
was working in the program and it just went away!"? Do you think you can
reproduce it?

The arrogance that says a subroutine writer has the right to determine that
my program should shut down requires a serious attitude adjustment on the part
of that programmer. No subroutine has the right to terminate a program. Only the
user has the right to terminate a program. Now, the user may choose to do so
because an internal error has rendered the program unusable, but by that time,
there should have been detailed analyses of the failure mode presented to the
user, the error would have been logged in a way that can be presented to a
technical support person, and operations the user might do that would now be
invalid should be disabled.

So you say, "If that error occurs, the user cannot proceed!". Fine. This
means you end up disabling all the menu items and toolbar items except those
which are still viable, such as Exit and About… (the latter is
critical if the user calls tech support!)

If an error occurs in a subroutine, return an error code. Or throw an
exception. But never, ever call ExitProcess. In addition, in a GUI
program, if you call ExitProcess, the process stops. The user gets no
chance to decide if buffers should be saved; the mechanism that asks the
question has been bypassed; objects which may need to be cleaned up (for
example, semaphores, which have an existence independent of the process itself)
are left corrupted; Mutexes which are currently locked will generate WAIT_ABANDONED
notifications, which guarantee that other processes which are depending on them
will also have to enter a nonrecoverable error situation; data which has been
buffered but not yet written to the kernel, such as file buffers and record
buffers, are lost. Because of these hazards, I have classified this as potentially
fatal
.

It doesn’t matter what you need to do to avoid ExitProcess. Do it. It
doesn’t matter if the program is a GUI program or an ordinary console app; never
call ExitProcess (or exit). I have managed to avoid using this
(except for the occasional trivial ten-line test program, and even then I get
nervous) for something like 20 years. It isn’t hard.


PostQuitMessage() and PostMessage(WM_QUIT)

Severity: potentially fatal. If you are
lucky, merely serious

Using these on the main GUI thread is almost always an error. The only
context in which they ever made sense was in the WM_DESTROY (or the WM_NCDESTROY)
handler of a pure C API program. They never made sense in any other context in
pure C programming, and they never make sense in an MFC program.

The way to shut a program down is to PostMessage(WM_CLOSE) to the main
window. This invokes the standard closedown actions, which typically include
such actions as prompting the user to save unsaved buffers. If you get a WM_QUIT
message to the message loop, the message loop shuts down immediately and
control proceeds to ExitInstance. Nothing else in the program associated
with shutdown will take place. This will eventually lead to fatal situations; if
not immediately, certainly later (often before the product is released, and if
not then, usually on the first round of maintenance). 

The only valid place these can make sense is to shut down a worker thread
with a message queue, known somewhat erroneously as a "UI" thread, and
then only if you have a fair degree of confidence that everything else in the
thread is shut down. Usually I will PostMessage a message to the UI
thread asking it to shut itself down, and when it has determined it is in a
clean state, it does a PostQuitMessage to itself.


PostMessage(WM_CLOSE)

The only time an application should shut down is when the user or the
system asks it to shut down. This means that the user has used one of the
numerous means to ask a shutdown of the app (the close box, Alt+F4, the system
menu, the Task Manager), or the user is logging off or the system is shutting
down. Otherwise, this has all the charm of using ExitProcess()
or exit()
and is an error. If the program is seriously broken, you
should still keep it running…just don’t let it do anything.

Sometimes, shutdown is a multiphase operation. The OnClose handler
only initiates the close operation, but does not actually call the superclass OnClose
handler which eventually gets to the built-in DefWindowProc which calls DestroyWindow.
This is deferred until the necessary conditions are met, such as network
connections quiesce, the robotic arm has been retracted to its safe position,
and similar long-period delay operations. In this case, the superclass OnClose
handler will be called when the necessary conditions are met.


Using Events for Mutual
Exclusion

Severity: potentially fatal

Events (those objects created by ::CreateEvent or by using CEvent)
can be used for synchronization, but they are inadequate for doing mutual
exclusion. The reason is that many threads can pass an Event. For mutual
exclusion use a Mutex (created by ::CreateMutex) or CMutex or a CRITICAL_SECTION.
If you are using an Event for mutual exclusion, you are using the incorrect
mechanism and need to change. 


Entering a
Critical Section before a Lengthy Operation

Severity: serious, potentially fatal

A Critical Section of any sort (a section of code which is protected from
concurrent execution on the same data item) should be as short as possible.
Generally there are a very small number of lines of code between the locking and
the unlocking, and furthermore these lines do not usually contain lengthy loops.
The longer you spend in a critical section, the higher the probability that
another thread will block on the same lock. This decreases total throughput of
your program.

But one thing that is always a mistake is to block before a blocking
operation. I have actually seen code that locks a buffer in the following
fashion (this was in the worker thread). 

// in the shared data area:
volatile DWORD length;
----------------------------
length = 0;
while(reading)
   { /* read loop */
    EnterCriticalSection(&bufferlock);
    ReadFile(comport, &buffer[length], bufferlength - length, &bytesRead, NULL);
    length += bytesRead;
    LeaveCriticalSection(&bufferlock);
    mainthread->PostMessage(UWM_DATA_READ);
   } /* read loop */

The main GUI loop was then supposed to read the data by doing

LRESULT CView::OnDataRead(WPARAM, LPARAM)
   {
    EnterCriticalSection(&bufferlock);
    ... use data in buffer...
    length = 0; // reset the length
    LeaveCriticalSection(&bufferlock);
   }

There are two deep and unrecoverable bugs in this code. The programmer had
analyzed it as "I can only have one thread at a time accessing the buffer
and the length, so I have done proper synchronization". However, the
crucial problem was that there was never really a time when the buffer was unlocked.
By the time the PostMessage was processed, the buffer was locked again!
Consider the execution of the loop: it leaves the critical section, posts the
message, loops around, locks the critical section again, and blocks on the ReadFile.
So by the time the message is processed in the main GUI thread, the thread is
blocked when it tries to enter the critical section. When the worker thread
unblocks, it fills in the buffer, releases the lock, posts a message, loops
around, and locks the critical section. Boom!

Now the second fatal bug is the fact that there is no test for buffer
overrun
. That is, if you look at this, it just keeps reading data from the
serial port. Because the main thread can never get a chance to reset the position
to zero, it merely keeps increasing. When the position gets to be
greater than bufferlength, the computation bufferlength – position
gives a negative number, but since the input is a DWORD, this turns out
to be a massively large positive number. Since the buffer is not as long
as the large negative number suggests, the inevitable result is that the I/O
Manager will reject the ReadFile by throwing an access fault
exception. 

The trick here is to never, ever lock the buffer during the I/O operation.
The best method would be to have the thread allocate an input buffer on the
stack, and upon completion, allocate a block of storage to hold the data and PostMessage
a pointer to the heap block to the main GUI thread which handles it. This is
similar to the technique that I describe in my essay on worker threads, except I
wouldn’t use CString * for the data because the data could contain
embedded NUL bytes, which would be incompatible with CString. I might PostMessage
a simple BYTE * pointer in WPARAM and the length in LPARAM,
or I might use a CArray<BYTE, BYTE> *, or, if I were a user of
STL, some STL array structure. At no time would any locking be required.

Alternatively, it is worth observing that if I put a lock simply around the
setting and examination of the buffer length, there would only be a very small
window of locking, and furthermore the locking would essentially be guaranteed
to complete in a nominally brief period of time (excluding effects of thread
priorities). However, I would be disinclined to use this method when a simpler
method already exists. Note that the performance issue doesn’t arise because the
entire allocation and message handling task completes in less than a single
character time at any serial port speed, and would therefore be undetectable.
Again, refer to my essay on "Optimization, Your
Worst Enemy
". Why introduce complexity when it is not needed?


Using SetCurrentDirectory/_chdir

Severity: poor style/potentially
fatal

There are some very rare occasions in which it makes sense to change the
working directory. Every one of them is at the explicit request of the user. It
is never appropriate to simply change it. Possibly the worst example of
the misuse of SetCurrentDirectory (or its C-library equivalent, _chdir)
is in programs that try to do recursive tree walks during wildcard searches. The
problem with this call is that it changes the working directory for the entire
process, meaning that every thread sees the effect of this. Since many threads
open or create files based on the current directory setting, a thread that
creates a file while another thread is changing the directory will end up
creating it in some random place. 

The changing of a directory should only be a consequence of the user
requesting a change to a specific directory. Changing it at any other time, for
any other reason, that does not involve the user explicitly specifying the
directory (often as part as executing one of the file dialogs or a ShBrowseForFolder
dialog), can be thought of as a serious error. The programmer does not have the
right to spontaneously change the current directory. To do so is at the very
least exceedingly poor style, and at worst it can have potentially fatal
consequences if you have multiple threads. 


Polling for process completion

Severity: inefficient

A common failure mode in multithreaded programming is to have some thread
"wait" for another by asking "are we there yet? are we there
yet?". All this accomplishes is to waste a significant amount of CPU time,
since, unless you have a multiprocessor, the thread that is "waiting"

will consume its entire time slice asking for a condition that cannot
possibly be set to true
because during the time that thread is running, the
thread that is being waited for is not running and therefore
cannot possibly change the state! Putting this thread in a separate process is
no different from polling for thread completion in the same process. GetExitCodeProcess
is as pointless as GetExitCodeThread. If you want
to see the process exit code, you do this after you have
determined the process has finished. Since most processes don’t return a
meaningful code, even that is usually pointless.

In a multiprocessor, it means the waiting processor simply wastes a
significant amount of its time waiting for a thread to finish; that time could
be better spent doing something productive, such as running one of the other
threads of your application, or some other application, or a thread of the
application you are waiting for.

There are three techniques for dealing with process completion. The simplest
method, which I will explain why has serious defects in a different
section
of this essay, is to simply do a WaitFor… operation on the
process handle. The process handle remains non-signaled as long as the process
is running (or potentially runnable) and becomes signaled when the process
terminates. Therefore, the thread that is waiting for the completion uses no CPU
time while it is waiting. However, since this blocks the waiting thread, and
that can be the main GUI thread, this is not generally recommended in the main
thread.

The second method is useful when you have a console-style app for which you
have redirected stdout, which you are receiving via an unnamed pipe in
your application. When the pipe is broken, it typically means that the
application has finished, since very few applications would spontaneously close stdout.
It is usually closed implicitly when the process finishes.

The third method is to create a thread that waits for the process to
complete, and posts a user-defined message to the appropriate window (view or
mainframe) to indicate that the process has finished. This is discussed in the
next section and in an accompanying essay.


TryEnterCriticalSection

Severity: inefficient

I find this a pointless operation. The whole idea of a critical section is
that you should block if you can’t enter it. But here is a call that simply
returns FALSE if you can’t enter the critical section. What can you do?
Well, retry the operation. This is very much like polling. If you need a
timeout, use a Mutex and a WaitForSingleObject with a timeout. So
I find that this does not seem to solve any problem that matters, but the few
times I’ve seen it used it results in a merely inefficient program, and in one
case an erroneous program.

And, I should point out, that the following are not equivalent:

::WaitForSingleObject(mutex, timeout);
while(!TryEnterCriticalSection(&lock)) 
       Sleep(deltaT);

In the first form, the wait blocks the thread until either the mutex
handle becomes signaled or the timeout occurs, whichever occurs first. In the
second form, if the TryEnterCriticalSection fails, the thread always

blocks for the deltaT time. If the lock becomes signaled 0.1ms after the
TryEnterCriticalSection, the loop still waits the full deltaT
before testing again. Think about it This is like calling up tech support,
finding there is someone ahead of you in the queue, and waiting five minutes
before calling back. There is an excellent chance you will never, ever get
support.


Blocking the main GUI thread waiting for process
completion

Severity: potentially fatal

This is almost always an error. For one thing, if you are capturing output
from the process, you won’t be able to display it, because even if the I/O is
done in a separate thread, the updating of the main GUI controls is handled by
the GUI threads. So the net result is that the program appears to not work at
all. Never block the main GUI thread if you can help it. Certainly never block
it waiting for process completion.

Note that this is more serious than just blocking a thread and getting a
non-responsive GUI, which is merely annoying to the user. Your whole program can
simply hang. So it goes beyond poor interface strategy and falls into the
potentially fatal class.

As with asynchronous thread notification, you simply disable all controls,
menu items, etc. that are inappropriate while the process is running. You’ve
seen this happen when you run a Build under the VC++ IDE. If you drop down the Build
menu, there are lots of menu items enabled, but Stop Build is disabled.
If you select a build option and the compiler is running, all the other options
are disabled but Stop Build is enabled. This is simple OnUpdateCommandUI
handling.

What I do is have a Boolean that enables/disables the controls. It is set to TRUE

when a process is running and FALSE when a process completes.

class waitInfo {
   public:
      waitInfo(HANDLE p, CWnd * wnd) {process = p; recipient = wnd; }
      HANDLE process;
      CWnd * recipient;
};

...AfxBeginThread(waiter, new waitInfo(procinfo.hProcess, this));
processRunning = TRUE;


/* static */ UINT CMyView::waiter(LPVOID p)
    {
     waitInfo * info = (waitInfo *)p;
     ::WaitForSingleObject(info->process, INFINITE);
    recipeint->PostMessage(UWM_PROCESS_FINISHED, (WPARAM)info->process);
    delete info;
    return 0;
   }

LRESULT CMyView::OnProcessFinished(WPARAM, LPARAM)
   {
    processRunning = FALSE;
    return 0;
   }

void CMyView::OnRunProcess(CCmdUI * pCmdUI)
   {
    pCmdUI->Enable(!processRunning);
   }

Using setjmp/longjmp in a C++ program

Severity: fatal. No "potentially" about it.

These are functions that are such an outrageous kludge that I’m not going to
dignify them with an explanation of how they work internally. They don’t work, never did.
The only reason they gave the appearance of working was the incredibly poor
quality of the PDP-11 C compiler, which ran in 64K of memory and generated lousy
code. The technique couldn’t possibly work in any modern optimizing compiler,
and didn’t. Been there, done that. 

The essence of setjmp was that it marked a place to which control
would transfer when a longjmp was executed. It would return with 0 when
called directly and nonzero (the longjmp value) if it was an abnormal
return.

Today, this is handled with exceptions. setjmp is replaced by TRY/CATCH

and longjmp is replaced by throw. Key here is that in C++, these
are first-class concepts in the language and not subject to the horrendous bugs
that setjmp/longjmp generated in most compilers (the compiler was working
correctly; the longjmp logic would produce erroneous results). But the major
failure mode of longjmp is that it bypasses all destructors on
the stack, with inevitably fatal consequences.

Bottom line: if you have setjmp/longjmp in a C++ program, you have a
fatally flawed program. Remove them and replace them with C++ exception
handling.

Note: If you believe there is any justification for setjmp/longjmp in
a C++ program, you are wrong. Don’t try to justify the situation, fix it.
(Seriously, I’ve had people argue this point with me. It takes less time to fix
the problem than to try to justify why it makes sense).


Using setjmp/longjmp in any GUI program

Severity: potentially fatal (except if the
GUI program is written in C++, in which case it is

always a fatal error)

Yes, I’ve done it. In Win16. With optimizations turned off, because the
brain-dead third-party library we were using demanded that we support it. I
would never support it willingly. There is no excuse for using it in Win32. The
problem in a GUI program is that you can end up with state that is not cleaned
up, operations that do not complete, and a plethora of other problems. And I
cover these in the next section.


Using setjmp/longjmp in any program

Severity: potentially fatal (except if the
program is written in C++, in which case it is always

a fatal error)

If you are programming in C, use Structured Exception Handling. If you want
portability, you cannot tolerate setjmp/longjmp in any case because it is only
barely functional. To make it work correctly, you must declare every local
variable as volatile, and that’s just the start. You must not use the register
qualification on any local variable or parameter. There are limits how setjmp
can be called. There are nonportable limitations on the contexts in which longjmp
can be called. Managing the longjmp buffer is a real royal pain. They
don’t nest without a lot of effort. Given we now have linguistic mechanisms that
get rid of the need for this horror, it is best to avoid it.

Essentially, if I see this in a program, I know the program is working only
by amazing good fortune.


Using the clipboard for anything other than copy or cut

Severity: fatal. This is so incredibly
bogus that it is impossible to classify it as anything else.

No, it isn’t fatal to your program. Programs that violate this rule will
probably continue to work just fine. Just don’t ever be caught in a dark
alley with one of your users!

You, the programmer, do not have the right to modify the clipboard
contents
. Only the user has the right to modify those contents. Thus,
the only time you are permitted to modify the contents of the clipboard
is when the user has requested you to by doing a copy-related or
cut-related action. You may not modify it at any other time. I’ve seen people
use it for interprocess communication. This is absolutely beyond any shadow
of a doubt the WRONG thing to do
. There
is no way to emphasize this strongly enough. Only if the user has an
explicit operation that says "Put this in the clipboard" are you
permitted to change the clipboard contents. It could be a menu item, a popup
menu item, Ctrl+C, Ctrl+X, a pushbutton that says

"copy", or anything else. Then, and only then, are you
permitted to change the clipboard contents.

Imagine the following: your program is running. It fires off another program
to run. It puts something in the clipboard to send to that program. However, I
left out something in this sequence. The real sequence is

  • Your program starts
  • While your program is running, the user opens some other document,
    probably from another application
  • The user does a Cut operation
  • The user closes the document, saving the contents which have had something
    removed
  • Your program drops trash into the clipboard
  • The user brings up another app and does a Paste operation, expecting to
    get the valuable information which has been cut to the clipboard
  • The user puts out a contract on you

Sleep(n)

Severity: potentially fatal

There are valid reasons for using Sleep. However, the most common
usages I find of them are completely wrong-headed and in most cases result in
fatal flaws in the programs.

First, you have to understand what the value of n means: it means,
"Stop this thread for at least n milliseconds". The thread will
resume sometime after that. Perhaps a long time after that. So if you say

"Sleep(5)" the likelihood that your thread will be running 5ms later
is nearly zero. This is because the basic clock is 10 or 15 ms (or in MS-DOS, 55 ms,
but I consider those systems dead), so it cannot possibly run sooner than one
clock tick. But if a higher priority thread is running, it will not be
preempted, so it might be several timeslices before the thread which is now
runnable can actually run. Even if you think you have control of the priorities,
you don’t, because the Balance Set Manager thread may be running, and it can
boost any thread to priority 15 and give it a double timeslice, and in NT, there
are file system and other kernel threads, and Deferred Procedure Calls (DPCs)
that will also preempt. Read my essay "Time is the
simplest thing
" for more details.

Mostly, when I spot a Sleep() call in a program, I ask "Why is
this here?" and the answer is "Because the threads won’t work unless
it is". What this means is the multithreading architecture is irrevocably
flawed and needs to be rethought and perhaps rewritten. Throwing a Sleep()

call may give the illusion that you have solved the problem, but in fact there
is a fundamental problem, and the Sleep() call disguises it. However, on
a faster machine, or a different version of the operating system, or a different
mix of applications, or a different loading factor on your app, or most
especially, on a multiprocessor, you will discover
that the fundamental architecture is flawed. At that point, you don’t have a
choice.

True story: some time ago I was consulting with a client, doing a code
walkthrough of their multithreaded application. I spotted a Sleep() and
knew immediately that this was where the problem was. I didn’t know what
the problem was, but I knew I had found it. After some explanation by their
programmer (which
included "the threading doesn’t work right if it isn’t there") I
understood the code, and said "This will fail under the following
conditions" and proceeded to outline the conditions, and said "and it
will fail in the following ways" and listed a set of ways the program would
fail. They were impressed. I had described exactly the conditions under
which it failed and the manner of the failures, without ever having seen the
program run. It had taken me perhaps 15 seconds to spot the Sleep() call
(they had said "Here’s the function that is failing" so I didn’t have
to look for it). How did I do this so quickly? Because it was by no means the
first such instance I had seen! And in every previous time I had been absolutely
correct.

Another case I see all too often is after a ::CreateProcess call there
is a Sleep() call, "to give the process time to come up". No matter what
value is chosen here, it is wrong. There is no correct value for this
Sleep()
call!
Either it is too short, or too long, but it will never

be "just right". There is a simpler method if you are waiting for a GUI app to
become active, with its window fully displayed. Use the WaitForInputIdle
API call. You can put a timeout on it, but this timeout is usually fairly large
and is intended to indicate that you are in really deep yogurt, somewhere down
in the boysenberries. If the timeout is selected properly, it will be fairly
long, and it means that the process has failed to come up at all.

There are occasional cases in which a Sleep() makes sense. But they
are so rare, and so special-case, that I can’t give a general rule by which you
can determine that it is a valid usage. But keep in mind that most uses are not
valid or sensible. And also it means that any constant you choose for the timing
value is probably wrong, also. In my multithreading lab, I demonstrate a case of
using Sleep(5000) to make the program work. I then force the students to
consider all the ways this can fail, and why any value, 5000, 10000,
1000, etc. will always be wrong. Then I force them to rethink how the
synchronization between the threads is done. When we are finished, we have a correct,

Sleep-free program.

There are times when Sleep(0) is handy to produce interesting
animation effects in multithreaded programs; Sleep(0) forces the current
thread to yield and allows another thread to run. This is in general
inefficient, because it induces a lot of gratuitous context swaps and scheduling
operations, but for some animation effects to illustrate the effects of
multithreading it really makes a nice visual effect. But you have to be willing
to give up a fair amount of CPU resource to use this, and it is very special
case.


Using SleepEx()

Severity: potentially fatal, or

absolutely essential

SleepEx() has all of the charm of Sleep(). It differs only
slightly in that it takes two
parameters; one of which is the sleep interval and one of which is a Boolean
indicating if pending Asynchronous Procedure Calls (APCs) should be taken. The
combination SleepEx(0, TRUE) has the characteristic that it does not
really suspend the thread, but allows any pending APCs to be taken. If there are
any APCs, they will be processed before the SleepEx continues. Thus, if
you are using asynchronous I/O, this is actually not only a useful call, but in
some ways an essential call (You could also use WaitFor…Ex()

calls). So you can feel free to use it in the context of pending asynchronous
callback I/O. If you are not using one of the WaitFor…Ex() calls you must
use SleepEx(n, TRUE).


Polling for time

Severity: inefficient to potentially
fatal

This is one of the most common errors I’ve seen. The idea is to somehow ask
the system about how much time has elapsed and then take some action based on a
change in time. Most of the code I’ve seen to do this is usually bizarre or
convoluted. And mostly seems to be based on some illusions about what time is.
As a prerequisite to doing anything with time, you should read my essay on time.

First, you can’t poll for time. This just wastes CPU time with no guarantees
that anything meaningful will happen.

As pointed out in my essay, the resolution of a time value has nothing
to do with the precision that is used to represent it. For example, a
time value that can be stored in milliseconds. I’ve seen code that goes like
this

SYSTEMTIME t0;
GetSystemTime(&t0);

while(TRUE)
   { /* pause loop */
    SYSTEMTIME now;
    GetSystemTime(&now);
    if(abs(t0.wMilliseconds - now.wMilliseconds) > 1)
      { /* 1 ms passed */
       break;
      } /* 1 ms passed */
   } /* pause loop */

Actually, this code was written because the programmer said "But you
said that the resolution of the Sleep() function was only 10ms, so this
gets around that!". Actually, the system time increments by some number of
milliseconds on each clock tick, and consequently if you write a program that
simply reads the time in a tight loop, the minimum difference you will see is
10ms or 15ms. So this code is identical to Sleep(1) without the
simplicity, and will resume sometime between 1ms and 10ms after the call. So the
above code is fatally flawed. Not only is it inefficient, it doesn’t do what it
claims to do.

An even worse case was one where the programmer wanted to activate something
at 8 a.m.. The code was similar to this:

void DoAtEight()
   {
    while(TRUE)
      { /* Wait loop */
       CTime t = CTime::GetCurrentTime();
       CString s = t.Format(_T("%H:%M:%S"));
       if(s == _T("08:00:00"))
         { /* 8am */
          ...do processing...;
         } /* 8am */
      } /* Wait loop */
   }

Now this code cannot possibly work reliably. First, the programmer observed
that the machine ran somewhat sluggishly and Task Manager showed that the
application was consuming 100% of the CPU time. This should not be surprising;
the code sits there and keeps asking "are we there yet?" "are we
there yet?" and will only be suspended when it has exhausted its timeslice.
But what is worse, this code cannot work. Consider: a timeslice is about 200ms
(the details vary based on the fundamental time tick). So suppose that at
07:59:59.99, the last time read, the thread executing this code is suspended.
There are ten other threads waiting their turn. The next time the waiting thread
is scheduled, it reads the time, and finds that the time is now 08:00:01.13. So
at no time will the string representing the time ever be 08:00:00. Now on some
days this will work, but on other days it will fail. In addition, in the
unlikely chance that the time read was actually 08:00:00, if the processing
takes less than one second, there is an excellent chance that the next time
through the loop the  time might again be 08:00:00, causing the
processing code to be executed two or more times, instead of the once that was
desired. All in all, possibly the worst possible way to accomplish the task.

My proposed solution was to do something like this:

void DoAtEight()
    {
     BOOL processed = FALSE;
     while(TRUE)
       { /* wait loop */
        CTime t = CTime::GetCurrentTime();
        if(t.GetHour() >= 8 && !processed)
          { /* process it */
           processed = TRUE;
           ...do processing...;
          } /* process it */
        else
        if(t.GetHour() < 8)
           processed = FALSE;
        Sleep(60000); // wait for a minute, or a bit more
       } /* wait loop */
     }

This is still not optimal; it wakes up once a minute, performs its test, and
goes back to sleep. It would be even better to use a waitable timer.

void DoAtEight()
   {
    HANDLE timer = ::CreateWaitableTimer(NULL, FALSE, NULL);
    BOOL processed = FALSE;
    while(TRUE)
      { /* wait loop */
       SYSTEMTIME t0;
       ::GetSystemTime(&t0);
       if(t0.wHour > 8)
         { /* adjust to tomorrow */
          ULARGE_INTEGER t;
          ::SystemTimeToFileTime(&t0, (FILETIME *)&t);
          t += 24ui64 * // hours in a day
               60ui64 * // minutes in an hour
               60ui64 * // seconds in a minute
             1000ui64 * // milliseconds in a second
             1000ui64 * // microseconds in a millisecond
               10ui64;  // 100 ns units in a microsecond
          ::FileTimeToSystemTime((FILETIME *)&t, &t0);
         } /* adjust to tomorrow */
       t0.wHour = 8;
       t0.wMinute = 0;
       t0.wSecond = 0;
       t0.wMilliseconds = 0;
       LARGE_INTEGER alarm;
       ::SystemTimeToFileTime(&t0, (FILETIME *)&alarm);
       ::SetWaitableTimer(timer, &alarm, 0, NULL, NULL, FALSE);
       ::WaitForSingleObject(timer, INFINITE);
       ...do processing...
      } /* wait loop */
   }

Use of malloc/free in C++ programs

Severity: poor style,
potentially harmful

There is no reason I can imagine to use malloc or free in a C++
program. If you need them, there are useful alternatives. For example, if you
need a dynamically-allocated buffers of n bytes, the simplest way is not

to write

    BYTE * buffer = malloc(n);

but to write

    BYTE * buffer = new BYTE[n]; 

They perform the same action, so why use an obsolete allocation mechanism.

Why does this matter? Consider the following: I need an array of objects, so
I allocate them as follows:

typedef struct {
     int x;
     int y;
} Coordinate;
Coordinate * coordinates = (Coordinate *)malloc(n * sizeof(Coordinate));

Looks good, right? Works fine, right? Sure. This week. But now I change the
structure to

typedef struct {
   int x;
   int y;
   CString description;
} Coordinate;

Now what happens? The malloc operation does not call the CString
constructor. You have garbage. Your program crashes. 

The correct solution was to have always written the code as

Coordinate * coordinates = new Coordinate[n];

This works, and it will always work correctly, even if you later add
members to the structure that require constructors. Writing code that breaks
when apparently irrelevant changes are made is always a Bad Idea. And since
there is no possible justification for using malloc, why bother writing
incorrect code?

Yes, I’ve heard the argument, "malloc is faster". I have
some advice for anyone who advocates this: Get A Life. You are clueless. An
allocation takes thousands of instructions. The trivial overhead introduced by
using new is unmeasurable. This is a classic example of the
"let’s optimize at the instruction level" when the real costs are at
the architecture level. If you are allocating frequently enough that you think
this trivial overhead can matter, you are very confused; why are you allocating
at all? The real issue is to make programs efficient, and you don’t accomplish
that by worrying about unmeasurable overheads when the real costs (such as the
actual cost of the allocator) swamp it by two to four orders of magnitude.

An even worse situation I’ve seen is the mixing of malloc with
delete
, or even new with free. There is no possible
justification for such insanity. If it exists, you are wrong. Fatally wrong (no
"potentially" here!). Fix the code. Pretend malloc and free are
obsolete and should never, ever be used in a C++ program.


Use of GlobalAlloc/GlobalLock/GlobalUnlock/GlobalFree

Severity: inefficient, deeply
suspect
(except those cases where it is mandated)

These APIs are essentially obsolete, with a very few exceptions. Only when
you have one of these exceptions should you ever consider these as viable
entities. Otherwise, use new/delete. They are relatively
inefficient in terms of both time and space. 

The contexts that I am aware of are the clipboard, where you must provide a
global memory handle This must be allocated by GlobalAlloc. You use GlobalLock

to make its address available, GlobalUnlock when you are done with it,
and you never call GlobalFree on a clipboard handle, under any
circumstances. 

There are certain uses with Blobs in OLE controls that use GlobalAlloc.

The DEVNAMES structure and certain fields of the PRINTDLG (contained
in CPrintDialog) use global memory handles for backward compatibility
with 16-bit Windows.

There may be other uses, but the documentation will always tell you if a
global handle is needed. Otherwise, avoid them entirely.


Using LocalAlloc

Severity: poor style

There is no reason to use LocalAlloc. This is a throwback for
compatibility with old Win16 programs. Ignore its existence. The only reason to
use LocalFree is for the buffer that is returned from FormatMessage.
There may be other uses that are mandated, since I have not read or memorized
all 5,000 or so API calls. But unless the use is mandated by an API call, there
is absolutely no reason to consider this as useful.


Doing a delete of an array object without []

Severity: potentially fatal

When you do a delete on an array of objects, it is crucial that you include
the [] in the syntax, that is, if data is an array of objects,

delete data;    // wrong!
delete [] data; // right

The problem is that the first form only deletes the space occupied by the
array. The second for deletes the space occupied by the array, but first calls
the destructors on every object in the array.

For example, consider the class

class MyData {
    public:
      int x;
      int y;
};

If I write

MyData * data = nwe MyData[n];

then n objects are allocated. If I do 

delete data;

the array is deleted. Everything appears to work. But consider, if I do

class MyData {
    public:
      int x;
      int y;
      CString s;
};

The the call

delete data;

will delete the space, but the CString destructor will not be called
on each element. The effect will be a storage leak. If, however, I write

delete [] data;

then the destructor is called on each element of the array, and proper
object destruction is done for each contained object. Note that the effect of
failing to write the delete correctly is that the addition of an object
requiring a destructor will mean the program stops working properly in some way
(usually a memory leak, but it could be more serious), and you won’t know what
went wrong.


Using non-const static variables in functions

Severity: deeply suspect to
potentially fatal

By "static" variables here I mean a declaration of the form

static type name;

and all variants, when they appear inside a function. An example is

int DoSomething()
   {
    static int count = 0;
    ...
    count++;
    }

This may give the illusion of being modular, but in fact it is a very poor
programming technique. I find that this is a technique which is a throwback to
the more primitive forms of C programming. It is inappropriate in a Windows
programming environment. The reasons deal with functions that need to be
reentrant across class instances and functions which could appear in a
multithreading context.

The worst and most typical form is

LPCTSTR ConvertToString(SomeType * t)
    {
      static TCHAR result[SOME_SIZE_HERE];
      wsprintf(_T("%d:%02d/%d"), t->seconds / 60, t->seconds % 60, t->divisor);
      return result;
    }

A piece of code like this in any program is an invitation to disaster. The
first thing I do when I find it is rip it out and replace it with something
sensible. There is no point in wasting time trying to debug a program that is
this deeply flawed. When I’m working with MFC, I tend to return CString

data types, since the allocation is automatically handled.

When would the above code fail? Consider the case

CString s;
s.Format(_T("%s and not greater than %s"), CovertToString(t1), ConvertToString(t2));

Both ConvertToString operations use the same buffer and return it, so
by the time the format string is processed, the buffer computed by the
conversion of t2 is clobbered by the result of the conversion to t1

(remember that parameters are always evaluated right-to-left).

Consider the case of multithreading, where two threads execute a call to ConvertToString.
The value seen is unpredictable, and on a multiprocessor the string could
potentially change while being read!

The use of const static variables is perfectly acceptable, for
example, 

double ConvertToSomething(double f)
   {
    static const factors[] = {1.0, 1.5, 1.723, 1.891, 1.9843};
    ... do some conversion here, using factors[i] based on 
    ... some criterion appropriate
    return result;
   }

The code of the form

CString cvMonth(int n)
    {
     static const LPCTSTR names[] = {_T("Jan"), _T("Feb"), _T("Mar"), ...etc };
     return names[n];
    }

however, is probably a Very Bad Idea since it binds a particular language
into the source code, which is a losing idea when internationalization is
required. You could do

CString cvMonth(int n)
   {
    static const UINT names[] = { IDS_JAN, IDS_FEB, IDS_MAR, ...etc. };
    CString result;
    result.LoadString(names[i]);
    return result;
   }

But even this is a Bad Idea if you have to deal with any culture that has
more than 12 months in its year, or whose month index would not correspond to
the Western European/American civil conventions that January is the first (zeroth)
month. Use the National Language Support APIs, but that’s a different essay to
be written at some future time.

There is one exception to this: cached values. A cache keeps a precomputed
value, but–and this is critical–it has a way of detecting if the cache is
invalid. You can frequently get nice performance speedups if you use cached
values, but here you have to weigh the cost of cache management with the cost of
not caching. This management includes both the execution costs of determining if
the cache is valid and the architectural costs imposed if explicit cache
invalidation is required (and the concomitant costs of failure to invalidate and
using incorrect data). Here you have to use good judgment, and put lots of
explicit comments explaining exactly why the use of a non-const
static value is a valid thing to do. And even then, I’d be more inclined to keep
the cache in a member variable of a class that was involved in the computation,
rather than in a static variable. The only reason to use static variables is if
the cache maintains validity across multiple objects.


Using new without any reason to

For some reason, people use new when there is absolutely no reason to
do so. The most common failure of thinking is when it comes to passing pointers
to objects. For example, if I have an API that requires a parameter
SOMEDATATYPE
 *, the solution is obvious:

    SOMEDATATYPE data;
    SomeAPI(&data);

This has not stopped programmers from thinking that the type of the argument
has to match the type of a variable, so they will do

    SOMEDATATYPE * data;
    data = new SOMEDATATYPE;
    SomeAPI(data);
    ... use it
    delete data;

It is hard to imagine why the second form could ever be imagined to make
sense. It introduces a gratuitous concept, memory allocation, to a place where
memory allocation is not needed, and requires a corresponding delete.
This not only adds intellectual baggage, it obscures the purpose of the code,
and introduces two possibilities for error. The first is that the new
failed (note the lack of any check for its success!) and the second is that in
the section called "use it" that the code gets very long, and someone puts, in a
later modification, a return FALSE or some similar construct into the
code which bypasses the delete, resulting in a storage leak.

Another example is to create a member variable for some type of window, e.g.,
a dynamically-created CButton:

class whatever : public CFormView {
    ...
    protected:
        CButton * mybutton;
};

then in the code do

    mybutton = new CButton;
    mybutton->Create(...stuff here...);

This has the same problem as above. This introduces a gratuitous need for a
concept that has no relevance, allocation; it requires that you remember to put
into the constructor an assignment

    mybutton = NULL;

and in the destructor

    delete mybutton;

all of which is pointless intellectual overhead. Instead, do

    protected:
        CButton mybutton;

and then you can do

    mybutton.Create(...stuff here...);

Note that concepts such as initialization of the pointer, destruction of the
object, and the use of new are replaced by a trivial concept: declaring a
variable. There is virtually no reason to allocate a CWnd-derived
class dynamically when its extent is the extent of the parent class.

Just about the only place where you do this dynamically is to allocate a
CDialog *-
derived object for a modeless dialog.

Another example is "but what if I need an array of integers?" What if you do?
CArray or std::vector exist, and work quite well. For example,
consider the case where you need an array of integers which represent the
selected items in a multi-select listbox:

    CArray<int, int>selections;
    int n = c_MyListBox.GetSelCount();
    selections.SetSize(n);
    c_MyListBox.GetSelItems(n, selections.GetData());

works nicely, and has the advantage that you don’t need to remember to call
delete.

There are lots of valid reasons to use new. Foremost among them is
passing information across thread boundaries. Information to be passed is put
into a heap-allocated object, and a pointer to that object is passed from one
thread to another. When the receiving thread is finished with the object, it
calls delete.


Creating Controls Dynamically

Severity: deeply suspect,
poor style
(unless there really is no alternative!)

There are lots of valid reasons for creating controls dynamically. One correspondent
pointed out that he had to create controls based on an XML description. I’ve
done them from database schemata. But 99% of the time I see someone creating a
control dynamically, they’re doing it for the wrong reasons. And sometimes when
they do it, they do it in the wrong way.

It is rare to need to create a control dynamically (in spite of the fact
there are lots of valid reasons, they only rarely apply). However, it is also
unnecessary in general to do a new to allocate an object to reference the
control. I commonly see people doing things like

BOOL CMyDialog::OnInitDialog()
   {
    CDialog::OnInitDialog();
    if(sometestorother)
       { /* we could stop */
        CButton stop = new CButton;
        CRect r(175, 54, 205, 65);
        stop->Create("Stop", BS_PUSHBUTTON, r, this, IDC_STOP);
       } /* we could stop */

Just about every line in the above code is about as wrong as you can get. The
problems are

  • You don’t need to do a new  
  • You would have to do a delete but this code stores the pointer in a
    local variable so the window could not be deleted
  • The magic constants shown are complete nonsense
  • It only works for 8-bit English applications
  • It actually assumes the the Create operation works!

Key here is that none of these lines are required, or are even
correct. I would simply create the control in the Dialog editor, size it and
place it where I wanted it, and then do

BOOL CMyDialog::OnInitDialog()
   {
    CDialog::OnInitDialog();
    c_Stop.ShowWindow(sometestorother ? SW_SHOW : SW_HIDE);

Note how much simpler and cleaner this is. I don’t have to compute a
position, I don’t have to allocate anything, I don’t have to delete anything,
and I don’t have to do a Create

If I had to create a control at runtime, there is no reason to do a new
if there is not some dynamic quantity being created. Just declare yourself,
outside the scope of the magic AFX comments, an appropriate variable

CMyCustomControl ctl;

Don’t do a new unless there is a compelling reason to do so
(there can be, but most of the instances I’ve seen where new is used to
create a control variable are just a waste of conceptual energy).

When you need to create the control, just make sure your custom control has a
Create method, and write

CString caption;
caption.LoadString(IDS_STOP);
VERIFY(ctl.Create(caption,and,parms,you,want,here));

To create a control based on a custom window class, you can select the
"custom control" (the little human head icon) from the toolbox. Use it
to place the control. Specify the class name of the control class in the Properties
box. Select the desired styles. The problem with the styles is that you can’t
just use symbolic names like WS_VISIBLE, which is a real pain; you have
to go into winuser.h and figure out the correct hex constants to use.
Sigh. Make sure you have registered the window class before you try to create
the window. You may do this in the InitInstance handler for a
dialog-based app, or in the constructor for a CDialog- or CFormView-derived
class. Or see my essay on self-registering
classes
.

I use VERIFY to make sure that if there were an error, I will see it
at the time the creation is done, instead of having some other part of my
program fail much later for reasons I will have to work hard to understand. In
some cases I might store the result and take other action, but in most cases,
creation failures represent coding errors which, once fixed, no longer exist.
This catches the failure early.

Hints: making dynamic creation easy

Suppose I wanted to create a pushbutton. I would know that, for example, its
left edge, right edge, top edge, and/or bottom edge (choose at least one
horizontal and one vertical) would have to line up in some way with some other
control. Or be a certain percentage distance from a left, right, top, and/or
bottom of the window.

Rather than compute the button size, which must be in pixels, based on some
criterion, or trying to mess with Dialog Box Units, which are a waste of
intellectual effort, I would use one of several tricks.

When I had to do this in a product I delivered, I created a form which had
several hidden controls: a combo box, a radio button, a check box, a static
label, and an edit control. I created these as disabled, invisible controls. I
did not have space on the form to put them (the entire body of the form was a
giant ListBox that contained these controls inside it), so I simply made the ListBox
a little smaller vertically and resized it in the OnInitialUpdate
handler of the form. Thus it overlapped the invisible controls, but I could
still see them in the dialog editor (hint: if you ever discover you are
creating physically overlapping controls, rethink what you are doing.
There is always a better way. I’ll trade off more complex code for ease
of long-term maintenance any day!

When I needed to create an Edit control, I had a variable, c_EditPrototype,
which got the value of the edit control that I put on the dialog. I wrote
something along these lines:

void CMyForm::CreateEdit(int n)
   {
    // Create an edit control on line n of the listbox
    CRect r;
    c_EditPrototype.GetWindowRect(&r);
    CRect line;
    c_Data.GetItemRect(&line);
    int left = ...complex computation of left edge of edit control;
    CRect edit(left, line.top, left + r.Width(), line.top + r.Height());
    CEdit * ctl = new CEdit;
    ctl->Create(styles, edit, &c_Data, controlid++);
    ...lots more stuff here, including saving the CEdit * for later deletion
   }

One key insight here is that I did not have to worry about the font size or
screen resolution. The edit control prototype was the width desired (all edit
controls would be the same width), the height was automatically set when the
dialog was created, taking into account the font size, resolution, etc., so all
I had to do was copy them. 

The buttons were a bit trickier; what I did was create a button with no
caption text. This gave me my minimum width; I then added GetTextExtent
pixels to that to get the new desired size.

Bottom line, be lazy. Let the system do as much for you as it
can. 

If I had wanted a caption, I would have stored the caption in the STRINGTABLE
and loaded it dynamically. I would never hard-code English text into a product
program. If I wanted a button with a non-language caption, I would create it as

c_Button.Create(_T("<<"), BS_PUSHBUTTON, r, this, IDC_REVERSE);

Note the use of _T( ) around the string constant so it is
Unicode-aware.

Positioning

Sometimes it is easy. Suppose I need to create a custom control centered in a
form, just above another control. We have a desired width and height.

CRect client;
GetClientRect(&client);

CRect below;
c_WhateverControl.GetWindowRect(&below); // the control below the one we want
ScreenToClient(&below); // convert to client coordinates
gap = 2 * ::GetSystemMetrics(SM_CYEDGE); // allow nice gap between

int left = (client.Width() - width) / 2;
CRect ctlrect(left, below.top - height - gap,
               left + width, below.top - gap, this, ctlid);

You can compute all the geometry you want, but if constants appear in any
context other than multipliers or divisors, you are in trouble. Note that I even
compute the gap based on the screen driver’s opinion of the height of the ideal
3-D horizontal edge.

If you know where you want it, and how big it will be, you can simply create,
in the dialog editor, a static control, such as a frame or rectangle. Mark it as
invisible. Change its ID from IDC_STATIC to something useful. Create a
control variable for it. For example, I might call it IDC_CUSTOM and the
variable c_CustomFrame.

CRect w;
c_CustomFrame.GetWindowRect(&w);
ScreenToClient(&w);
c_WhateverControl.Create(styles, w, this, ctlid);
c_CustomFrame.DestroyWindow(); // no longer need frame

Note that there is no computation of the size; the size has already
been computed for me by the dialog handler when it instantiated the static
control, and I am creating a custom control in the same area.


Using hardwired
constants for any size or position

Severity: serious

It is a common practice to want to rearrange controls on a dialog at run
time, for example, when the dialog is resized. An unfortunate characteristic of
such code is that it tends to include hardwired constants, which is almost
always an unrecoverable flaw.

This is because dialogs are always expressed in terms of Dialog Box Units,
which are an abstract coordinate system. When a dialog is instantiated, the
system "realizes" the dialog box by computing a function based on the
default system font. From that point on, everything is in terms of pixels. For a
very high resolution (e.g., 1600 Ã—1200) the dialog may take many more pixels
than if the display is running at some lower (e.g., 800 Ã— 600), because it tries
to keep an approximately same-size-on-the-screen effect of a given dialog. The
downside is that any attempt to use a hardwired constant is doomed from the
start..

The proper method is to use one of the basic resolution-sensitive parameters
of the system to scale the dimensions properly. Two of the key parameters are
the border size and the resize border size. For example, I might choose to
separate a set of buttons by a multiple of the resize border size. In a very
high resolution, this size may be doubled over what it is at a lower resolution.
Useful computations I include in my positioning code might be

UINT vsep = 3 * ::GetSystemMetrics(SM_CXBORDER);
UINT colsep = 2 * ::GetSystemMetrics(SM_CXEDGE);

The most useful values I have found for ::GetSystemMetrics that are
handy to know about are shown in the table below. However, there are many other
parameters which can be interesting.

SM_CXBORDER
SM_CYBORDER
Width and height, in pixels, of a window
border. This is equivalent to the SM_CXEDGE value for windows with
the 3-D look.
SM_CXEDGE
SM_CYEDGE
Dimensions, in pixels, of a 3-D border. These
are the 3-D counterparts of SM_CXBORDER and SM_CYBORDER
SM_CXFIXEDFRAME,
SM_CYFIXEDFRAME
Thickness, in pixels, of the frame around the
perimeter of a window that has a caption but is not sizable. SM_CXFIXEDFRAME
is the heightof the horizontal border and SM_CYFIXEDFRAME is the
width of the vertical border.
SM_CXSIZEFRAME,
SM_CYSIZEFRAME
Thickness, in pixels, of the sizing border
around the perimeter of a window that can be resized. SM_CXSIZEFRAME
is the width of the horizontal border, and SM_CYSIZEFRAME is the
height of the vertical border.
SM_CXHTHUMB Width, in pixels, of the thumb box in a
horizontal scroll bar.
SM_CYVTHUMB Height, in pixels, of the thumb box in a
vertical scroll bar.
SM_CYCAPTION Height, in pixels, of a normal caption area.
SM_CXVSCROLL,
SM_CYVSCROLL
Width, in pixels, of a vertical scroll bar; and
height, in pixels, of the arrow bitmap on a vertical scroll bar.

 In those rare cases where you may need to create controls dynamically,
hardwiring any integer value for the width or height is uniformly a blunder. I
have found the most convenient way to create controls dynamically is to lay down
a "prototype" control on the dialog. I mark this control as invisible,
and create a control member variable for it. Then, for example, if I need to
create a control at runtime I use the dimensions of this control to determine
the dimensions of the control I want to create, most often the height.

CStatic c_PrototypeStatic;   // declared in the class by ClassWizard
//...create the control at position x,y holding text 'caption'
CRect r;
c_PrototypeStatic.GetWindowRect(&r);
ScreenToClient(&r);
CClientDC dc(&c_PrototypeStatic);
int width = dc.GetTextExtent(caption).cx;
width += 2 * ::GetSystemMetrics(SM_CXEDGE);
r.right = r.left + width;

CRect w(x, y, x + r.Width(), y + r.Height());
...Create(caption, SS_LEFT, w, this, IDC_STATIC);

Note that I did not show anything to the left of the Create method
because there are lots of ways I could create a window as simply calling

CStatic * wnd = new CStatic;
wnd->Create(...);

or have an array 

wnd[i] = new CStatic;
wnd[i]->Create(...);

or have a variable declared in the class and write

c_MyLabel.Create(...);

Here’s another trick I use a lot: I need, for whatever reason, to create a
window at runtime at a specific location on a dialog. Often this is a custom
control of some sort. To get it properly positioned on the dialog, I do not ever
rely on coordinates that I "wire in". Instead, I place an invisible CStatic
control with an ID other than IDC_STATIC on the dialog, and use its
coordinates to create my desired custom window. For example

CStatic c_Frame; // created by ClassWizard

CMyControl c_Whatever; // added by hand in the protected area

BOOL CMyDialog::OnInitDialog()
   {
    ... lots of stuff here...
    CRect r;
    c_Frame.GetWindowRect(&r);
    ScreenToClient(&r);
    UINT id = c_Frame.GetDlgCtrlId();
    c_Whatever.Create(r, this, id);
    c_Frame.DestroyWindow();
    ... more stuff here...
   } // CMyDialog::OnInitDialog

Note some tricks here. I can actually reuse the control ID because I destroy
the static window (although this is not actually necessary, I find it cleaner).
The Create method shown here is the Create method I have written
for my own control. Your Mileage May Vary and you may have additional
parameters.

Another time I use this trick is when I want to limit the amount the user can
resize a resizable dialog. What I will do is create an invisible static control
that represents the minimum size. This is much cleaner than picking some control
to serve the purpose, because if you rearrange the controls the control you
originally selected may no longer be the rightmost or bottommost. 

To create an OnGetMinMaxInfo handler, you either have to add it by
hand or go to the Class tab in the ClassWizard and change the type from dialog
to general CWnd. It should have been obvious to Microsoft that a resizable
dialog would need this (they have the style flags available to determine this)
but this is one of the many defects of ClassWizard.

void CMyDialog::OnGetMinMaxInfo(MINMAXINFO * lpMMI)
   {
    CRect r;
    c_Frame.GetWindowRect(&r);
    ScreenToClient(&r);
    lpMMI.ptMinTrackSize.x = r.right;
    lpMMI.ptMinTrackSize.y = r.bottom;
   }

Because I have the frame to reference, if I add new controls that I don’t
want covered on a resize, I only have to adjust the size of the hidden frame.

However, note that at all times I have never, ever once used a hardwired
constant to determine either the position or the size of any control, or other
distance, on the dialog.


Using hardwired
constants for control IDs

Severity: potentially (usually, when small
values are used) fatal

It is frequently useful, and sometimes necessary, to be able to create
controls at run time. This is a practice which ought to be avoided.
Typical silly reasons are largely similar to "I only need the progress
control when I’m doing the [read, write, computation]". This is a reason so
silly that I cannot imagine why anyone would ever think it made sense. It is so
trivial to hide a control when you don’t need it (and even create it without the

Visible property checked) and show it when you need it that this never
makes sense. It is usually compounded by the tendency to hardwire
control coordinates
, which is even less excusable. An even sillier excuse is
"Well, if I create it in the dialog editor and then hide it, it takes up
space
". Yes, and the price of gasoline in downtown Pittsburgh on
18-Mar-03 was $1.659/10. That is about as relevant to the
discussion. People who talk this way act as if the infinitesimal amount of space
required to hold the window information actually matters. In fact, the code
required to create and manage the window probably takes up more space! This is
another case of "optimizing" the wrong parameter, with no
justification whatsoever.

A typical sort of insanity tends to look like

m_progressbar = new CProgressCtrl;
m_progressbar->Create(...., 1);

This embodies at least two serious blunders. The use of a dynamically-created
CProgressCtrl is another silly idea. You
could create a CProgressCtrl variable, not a CProgressCtrl *,
and it would work just as well. Without involving the additional complexity of
the allocation, and necessitating the corresponding delete.

But why create the control at all? Place it on the dialog in the dialog
editor! You get a nice control ID already assigned, you can use the ClassWizard
to create a binding to a control variable. Just uncheck the Visible
attribute. Then when you need it, replace the complex lines shown above with a
simple

c_progressbar.ShowWindow(SW_SHOW);

when done with it, do a ShowWindow with SW_HIDE.

Now, when it comes to using hardwired numbers, note the use of the number
"1". There is no explanation of why so many people choose the number

"1", but in a dialog this is fatal. The control which is the OK button
has ID IDOK, which a quick inspection of the header files will show is ID
number 1. Hence, most attempts to bind the control will fail with ASSERT
failures when the binding is tried, because the control with ID 1 is already
assigned.

Typically, for most of us when creating controls, we pick a range of integers
in the range 10000 to 32767. Most importantly, we use a #define to define
the base of this range, so if there is a conflict for some reason, it is trivial
to compensate for it. The reason for the choice of values is that the dialog
editor tends to assign much smaller numbers, and the values in the range greater
than 32767 would often result in a serious conflict with the existing Microsoft
menu item ranges.


Using GetDlgItem

Severity: poor style

This is hardly ever necessary. It is certainly not necessary for getting (as
some have argued with me) the window reference for a control. Just use the
dialog editor to create a control variable. I discuss this in depth in my companion
essay
. Key here is that if you subclass a control later, and override one or
more of its built-in methods, GetDlgItem guarantees that your program
will stop behaving correctly. 

There is rarely an excuse for using it; I write perhaps one a year, and
consider very carefully the alternatives before doing it.


Using UpdateData

Severity: poor style

This is one of those ideas so bad that if it hadn’t already been invented,
nobody would be dumb enough to invent it. It is based on the illusion that you
want memory variables to be exact replications of control state. We all know
from basic design principles that replicating information is usually fatal
unless you are extremely careful about how you manage the state. My experience
with UpdateData was that I found programs harder to write, harder to
debug, and harder to maintain than programs that didn’t use it. Almost every
time I get a dialog that has failure modes such as inconsistent control
behavior, incorrect results depending on what order controls are used in, and
similar problems, I look for UpdateData. I find it. I discover the
programmer was clueless as to how to use it in a way that would guarantee these
problems would not exist, which is not surprising given how poorly designed it
is. So I just rewrite the code to eliminate it (see my essays on Avoiding
UpdateData
, Avoiding GetDlgItem, and Dialog
Control Control Management
). At that point the problems simply disappear.


Writing an OnCommand handler

Severity: poor style,
potentially harmful,
potentially fatal

I have not encountered any situation in which this is a valid approach to
solving any problem. In all cases I have seen it, it is someone who doesn’t
understand either Windows or MFC trying to do something that is much more easily
(and often correctly) done by using normal MFC message dispatching. In eight
years of MFC programming, I have only ever written on OnCommand handler,
and that was to investigate how OnCommand works to explain command
routing. Every example I have found in code, newsgroup postings, etc. was the
result of a beginning programmer being totally clueless. Some if this code
persists for years, confusing the next maintainer who has no idea how to modify
the code. The only solution is to rip this code out and replace it with
something that makes sense.


Writing a
PreTranslateMessage handler

Severity: deeply suspect

This is an overused and usually inappropriately-used technology. Many
programmers use it as a substitute for doing proper subclassing of events. There
are many legitimate reasons to use it, but they are far and away overshadowed by
inappropriate misuse. Generally, the severe warning signs of misuse are

  • Subclass avoidance
    • There is a test for a specific control ID in the PreTranslateMessage
      code
    • The control has not been subclassed
    • The message type is a message that could be handled by subclassing the
      control
  • Testing for Enter or Escape keys in a dialog to avoid
    closing the dialog (see my essay on dialog apps
    for a cleaner solution!)

Appropriate uses include testing for mouse and keyboard events without
attaching those to a specific control ID (for example, handling F1 help),
handling complex tasks which MFC interferes with, such as editing labels in a
tree control contained in a property page (the Enter key to terminate
label editing will also close the property sheet if you don’t do the override),
and, within a subclass handling messages sent to that control.


Writing an OnCtlColor handler

Severity: (exceptionallly) poor style

This seems like it would be a legitimate activity. It rarely is.

OnCtlColor is part of the parent class, and therefore suggests that somehow
the parent class should know what color to use for a child window. This is a
throwback to the days of early 16-bit Windows programming, when the paradigms
had not been properly thought out. It violates every sensible definition of
modularity that exists. I have no idea how a parent control can know how a child
control should paint itself. Years ago, I started writing reflected
WM_CTLCOLOR
handlers for child classes, and it seemed much saner. All I did
was take the incoming WM_CTLCOLOR message and send it to the window whose
handle was provided. That window handled the coloring.  I
have only seen one sensible use of OnCtlColor, when the programmer wanted
to change the color of every child control in a dialog (and I question
why this was a valid thing to do). The correct approach is to subclass
the control, and use the reflected WM_CTLCOLOR_xxx message
by subclassing the child control and creating an =WM_CTLCOLOR

reflected-message handler in that subclass. MFC
will then route the message to the child class, which is where the decisions
belong. Key warning sign: there is a test for both the control message type and
a specific control ID in the body of the handler.


Control Management

Severity: potentially fatal (when done
poorly, that is, distributed throughout the dialog)

If you do the same computation in two places, pretty soon you won’t have the
same computation.

This is why we use subroutines.

Unfortunately, this knowledge seems to be lost on programmers who try to
manipulate controls in a CDialog- or CFormView-derived class.

I blame some of this on Charles Petzold, because I learned from his Win16 book
back in 1989 or so, and
it teaches a number of horrendously bad Windows programming habits, which I then
had to break. Essentially, what I did was say "There is something wrong
with this; how do I do this right in Windows?" and it took me a couple
years to get the paradigms right. 

So I wrote an essay on how I do dialog control management in dialogs and
CFormViews. It works, it is highly effective, and it is the way I have written
control management for about eight years. It consolidates all the control logic
in one place. It would have been nicer if Microsoft had given us ON_UPDATE_COMMAND_UI
handlers in dialogs, an extension that seems blindingly obvious, but they
didn’t. I have written an essay on this
technique. 

Bottom line: when I get a dialog that malfunctions in terms of enabling, I
simply look for all instances of ShowWindow and EnableWindow, and
then consolidate them in one function I call updateControls. I was given
one app which updated several controls, in 23 different places, using 6
different algorithms, all of which were wrong!  Furthermore,
depending on what the user had most recently done, controls that should have
been enabled remained disabled, or controls which should have been disabled
remained enabled. Any programming style that has this sort of result is flawed
beyond hope of redemption. Abandon that style.


Manipulating Global Variables

Severity: poor style,
deeply suspect
, serious

There are very, very few global variables needed in a C++/MFC program.
Generally, these variables represent key global state which is the raison
d’être
of the program. Other global variables, incidental to the actual
computations being performed, are usually a mistake.

Perhaps the worst mistake committed is the use of global variables which are
used or manipulated by a dialog. This results in a dialog which cannot be reused
easily, and introduces possibilities of serious errors during maintenance.

Now let’s get something established at the start here: I come from the school
of "Global Variables Considered Harmful". I mean that literally. My
Ph.D. dissertation advisor and one of my committee members wrote a paper of that
title:

W. Wulf and M. Shaw. Global variables considered
harmful
. Sigplan Notices, 8(2):28–34, February 1973. 17.

Since there is no reason to use global variables (except for
the basic program state being manipulated), my rule is
that no dialog I ever write will use a global or static variable, with the
possible exception of a table labeled as a static const table and
declared in the source module of the dialog.

Note that many programmers show a proclivity to stuff all
sorts of variables into the CWinApp-derived class or the CMainFrame

class, and sprinkle constructs like

((CMyApp *)AfxGetApp())->somevariable

throughout the program. This is functionally identical to the
use of global variables. Unfortunately, they rationalize this as "I’m not
using a global variable, I’m using a member variable of a class!" This is
arrant nonsense. The real issue is the encapsulation of information, and
statements like the one above demonstrate a profound violation of the concept of
encapsulation. Having a document, a view, a dialog, or whatever sharing some
variable in the CWinApp-derived class violates all sorts of principles of
encapsulation, and as such is no better than the use of a global variable.

Dialogs which need to obtain values from, or communicate
values to, their parent can easily do so by using SendMessage. Read my
essay on messaging for more information. This in
particular applies to modeless dialogs. A toolbox-style dialog, that is, one
that is supposed to apply to the current view, I implement by having it SendMessage

to the mainframe:

AfxGetMainWnd()->SendMessage(...);

and have the mainframe do a GetActiveView() to forward
the message.

in the message map
ON_REGISTERED_MESSAGE(UWM_SOME_VALUE_CHANGED, OnSomeValueValued)

...the handler
LRESULT CMainFrame::OnSomeValueChanged(WPARAM wParam, LPARAM lParam)
   {
     CView * view = GetActiveView();
     if(view == NULL)
       return 0; // or other appropriate value indicating failure
     return view->SendMessage(UWM_SOME_VALUE_CHANGED, wParam, lParam);
   }

Note that I can’t do this in the dialog because GetActiveView
is a member of CMainFrame. This also makes the dialog easily reusable,
since the only thing a client needs to know is that it sends a message to the
mainframe. Note my preference for Registered Window Messages, described in my
essay on message management.


Manipulating
or calling members of another window class from within a window class

Severity: poor style, serious

This is a very common practice, and as far as I can tell, it
evolves from the C attitude that "I can do anything I want to any data
structure at any time I feel like it". This is an unhealthy attitude to
bring to OO programming. The key question is, why would you ever need to
manipulate the member variables or call the member methods of sibling or parent window
class? Hint: there are no right answers. The only time you want to
manipulate variables in general (and there are people who are more fanatic than
I on this, who assert that this is even too much) is to set the public members
of a CDialog class before doing a DoModal() call. They argue, and
I philosophically agree with them, that these should not be public
variables, but parameters to the constructor. And if they need to be queried,
access functions should be provided to be called after DoModal() returns.
We see exactly this approach in many MFC classes such as CFileDialog.
However, Microsoft’s ClassWizard is not especially programmer-friendly, and does
not provide us the support we need to do this easily. It also makes sense to be
able to call the methods of child controls. But that’s as far as it makes sense
to go. Reaching into the parent is poor style. Manipulating variables in
the CWinApp is inexcusable. Repeat after me: "modularity is good.
modularity is good. modularity is good!"

The inter-window communication mechanism is messages. In
general, you shouldn’t even need a specifically-stored pointer to a window class
to send a message. A dialog would query state by doing a GetParent()->SendMessage(…)
which is clean because the dialog then becomes entirely self-contained.

In general, windows are independent entities and have no
business knowing about each other in any detail, and certainly don’t need to
know anything about the internal implementations of each other. Communication is
via well-specified interfaces. For example, a dialog may mention a set of public
variables which should be initialized before DoModal is called and a set
of public variables (perhaps the same set) from which values can be obtained
after DoModal is completed. Other than that, nothing about the
implementation needs to be known, particularly such nonsense as indexes
into ListBoxes that don’t exist, which is what UpdateData does (see my
essays on "Avoiding UpdateData" and

"Who Owns the GUI?"). Otherwise, if a
dialog needs to make a real-time query of its parent, it should use SendMessage,
and if it needs to set a value, it should use SendMessage.

Views do not need to know about each other. In fact, it is
inconceivable to me that two views should know any details of each other, even
that they exist. If you do something in a view, the only reason another view
needs to know about it is because the first view has changed something. And
that’s what a CDocument-derived class is for. If a view makes a change,
it notifies the document, and the document undertakes to notify any other views.
Or the view can simply ask the document to notify other views, using UpdateAllViews.
All views (if any others) will be notified, but the view that is doing the
notification doesn’t have to know that any other views even exist.

One objection usually turns out to be something along the
lines of "Well, I can write it syntactically, so why shouldn’t I be allowed
to do it?" Sure, and you can program in assembly code, too, but that
doesn’t mean that it makes sense. Good OO programming is a discipline.

I elaborate a lot more on this set of problems in my
companion essay on the design of dialogs and controls.


Manipulating the controls of a dialog from outside the dialog

Severity: (exceptionally) poor style,
serious

I cannot imagine how this could ever make sense. A dialog is
an object, and the controls represent an internal implementation detail of the
class. So nothing outside that class should know its implementation details. It
is true that the ClassWizard creates the control member variables as public,
but I consider this a deep blunder representing an inadequate design of
ClassWizard (it isn’t the worst or most egregious flaw in ClassWizard). I’ve
seen people trying to reach into modeless dialogs from outside it calling things
like

(CEdit *)(modeless->GetDlgItem(IDC_NAME))->SetWindowText(s);

which is about as bad as you can get; second worst is
something like this, executed by a child dialog:

(CEdit *)(GetParent()->GetDlgItem(IDC_NAME))->GetWindowText(s);

These represent depths of tastelessness in OO programming
that set my teeth on edge. For example, this style presumes that you know the
implementation of the object being manipulated. So if you change the
implementation for some reason, it means that places in your program you never
heard about now have to be changed, violating every known principle of good,
modular design. For example, see my companion essay "Who
owns the GUI?
".

For example, suppose I create a derived class based on a
standard control, such as CEdit or CListBox. I then override
methods of these controls. For example, my ListBox class

that automatically maintains its horizontal scrollbar is described in another
essay. I override InsertString, AddString, ResetContents,
and DeleteString. Now suppose you want to add something to this ListBox.
Instead of notifying the dialog to add something, so it correctly calls CHListBox::InsertString,
you have done

((CListBox *)dlg->GetDlgItem(IDC_SOMELIST))->AddString(s);

which will call CListBox::AddString, not CHListBox::AddString

So, you say, suppose I do it "right" and access the
correct member variable, which is already bound to the correct type? Wouldn’t
this solve the problem?

dlg->SomeList.AddString(s);

That solves one problem, but now it means that the dialog
contents can be modified in ways that are not expected. Suppose I am using an
owner-draw ListBox, and I want to modify the structure that is used. Now I have
to locate all the places inside my program where that operation is performed,
and modify all of them as well! If I did it only in my dialog, I would have the
control localized to exactly one place. This is why I frequently make such
structures protected within the dialog.

The correct specification of this is to have a method
which is invoked to perform the data setting or retrieval; it is up to that
method to decide, within the confines of the dialog, how to implement the
functionality that effects the desired change or retrieval. Nothing outside the
dialog needs to know how this is done. The cleanest method by far is to use SendMessage
as the mechanism. So I would do something like

modeless->SendMessage(UWM_GET_NAME, buffer, size);

where I would write the specification

/*******************************************************************
*                            UWM_GET_NAME
* Inputs:
*        WPARAM: (WPARAM)(LPTSTR) pointer to a buffer to be filled
*        LPARAM: (LPARAM)(DWORD)  the size of the buffer
* Result: LRESULT
*        Logically void, zero, always
* Effect:
*        Fills the buffer with the name supplied by the user
*******************************************************************/
#define UWM_GET_NAME see_my_essay_on_messaging

This defines a clean modular interface between the dialog and
its clients.

Think of messages being a clean way to invoke virtual
methods, and you have the right idea.

However, this leads to another of the horrors: simulating
control messages from outside the dialog by sending messages to the dialog. This
is never sensible. For example, someone wants to get the effect of clicking the
"Do Something" button. They will code

dialog->SendMessage(WM_COMMAND, 
                    (WPARAM)MAKELONG(IDC_DO_SOMETHING, BN_CLICKED), 
                    (LPARAM)dialog->GetDlgItem(IDC_DO_SOMETHING));

(that is, assuming they can actually read the documentation;
most do something far sillier:

dialog->SendMessage(BN_CLICKED, 0, 0)

or something equally bad). This pathetic attempt at
notification is in the same class as manipulating the controls. It is just the
wrong approach. Note that the sender had to know the control ID of the control,
the type of the control, etc. These are all violations of object integrity.

The only correct solution is to use SendMessage to
send a user-defined message to the dialog. Then the dialog decides how that
message is handled. It might call the same functions as the button-click
handler; it might not. It is none of your business, as the initiator, how
this is implemented
.

The correct, and for all practical purposes, the only
correct, method of requesting a dialog, CFormView, or other such window
perform something is to send it a user-defined message. For example, SendMessage
or PostMessage. As an example, 

dialog->SendMessage(UWM_SOME_USER_MESSAGE);

with the possible addition of a WPARAM or LPARAM value.

One of the nicer features of VS7 (Visual Studio .NET 2002 and
Visual Studio .NET 2003) is that you can declare the variables protected.
You should declare all member variables, especially control variables, and all
message handler functions, protected. The only public variables to
a dialog are those you use to pass information into and out of the dialog; and
by "pass into" I mean "those variables whose values must be set before creating
the dialog", and by "pass out of" I mean "those variables whose values contain
information to be retrieved after the dialog has terminated". They are never

used to communicate information to anyone during the lifetime of the dialog, and
particularly they are never accessed by any child dialogs.


Drawing on
the dialog

Severity: insanely poor style, hard to get working
(if ever), easy to break

Many people are tempted to draw things on the dialog, by
overriding the OnPaint handler in their CDialog-derived class.
Assume that for all practical purposes this is always a mistake. As far as I can
tell, the only construable excuse might be to use a bitmap image as the
background for the dialog, and even them I’m deeply suspect of this excuse.

Don’t do it! Don’t even bother trying! Else you will end up
posting questions of the form "I tried to draw … on my dialog, and it ….<description
of numerous unexpected results
>". These usually include random
disappearances, solid disappearances, flickering, ovewriting other controls, and
a few other things I can’t recall at the moment. Not that it matters–the
problems go away when you give up trying to write on the dialog.

If you need to draw on a dialog, create a static control, and
draw in that. This is always the simplest answer. For example, I once needed to
draw arrows that went from box to box on a dialog. To do this, I simple created
a CArrow subclass, and in its OnPaint handler I used 1 bit to
indicate its direction (N/S, E/W–all arrows are only horizontal or vertical),
and two bits to indicate which ends got arrowheads (00 – neither, 01 – N/E, 10 –
S/W, 11 – both ends). The 00 combination gave me "thick lines" so I could draw
L-shaped and other non-simple arrows. Not once did I draw anything on the
surface of the dialog.

I’ve never tried to draw on a dialog, and I’ve never had
problems. Every week or two someone posts a question on the newsgroup that
starts out with that "I tried to draw … on my dialog" and then describes some
bizarre behavior. Patient: "Doctor, it hurts when I do this" Doctor: "Then don’t do that!". The
easiest way to solve a problem is to avoid choosing a way that is going to fail,
and use a way that is known to work, is easier to use, has no known problems,
and is utterly reliable.

Even to do the background of a dialog, I simply put a static
control with the WS_CLIPSIBLINGS style down, and put the background
bitmap in it. I’ve even done it using OnPaint and StretchBlt to
fill the dialog area, which works pretty well for most situations, but looks
really bad if you have a resizable dialog that can get very much smaller or very
much larger than the background bitmap, but this is a defect in StretchBlt
and doing your own painting on the dialog won’t make it go away.


Using GetDC

There is rarely a need to use GetDC. The need is so rare and unusual
that it is safe to assume that if you are writing GetDC( ), you have made
a mistake. Part of the reason for this is that the number of times I see people
write GetDC( ) without writing ReleaseDC is horrendous.

Generally, you don’t need to get a DC at all. Certainly if you think you need
to get a DC for drawing, you should be immediately suspect. If you are doing
"rubber-band" objects (lines, selection rectangles, etc.) you need a DC other
than in the OnPaint handler, and occasionally you may need to compute a
text width, but the rest of the time, ask yourself, "Do I
really need a DC?" If the answer is "Yes, of course, to paint" and you aren’t
doing rubber-banding in an OnMouseMove handler, you have probably made a
fundamental error. If you are not computing some text width
somewhere, the answer again is that you have probably made a fundamental error.

Given that you need a DC, the correct way in MFC to do it is to use the
CClientDC data type, e.g. instead of writing

    CDC * dc = GetDC();
    ... do something with DC
    ReleaseDC(dc);

you should be writing

    CClientDC dc(this);
    ...do something with DC

Note that no release is required because the destructor of the CClientDC
handles the release.


Thinking you have one monitor

Severity: deeply suspect

OK, I confess: I’ve done this far too much. Back in the Bad Old Days of
single monitors, you could tell where something was by simply looking at the
coordinates of the window. If they were negative, you were off-screen somewhere.
If any of them exceeded ::GetSystemMetrics(SM_CXSCREEN) and/or ::GetSystemMetrics(SM_CYSCREEN)
meant you were offscreen in the other direction. I’ve even heard that people
will "move a window offscreen" instead of using the more sensible ::ShowWindow(SW_HIDE)

by setting the coordinates to be large enough in some direction to force the
window offscreen. And the other thing I got wrong was fixing a different bug: if
you store the window placement somewhere, such as in the Registry, and the
screen resolution is changed (usually to a lower value), the window will end up
offscreen. This results in a window being unintentially invisible if its former
coordinates were no longer on-screen given the new resolution, and often it was
nearly impossible to force the window back on-screen. So I’d compare the
coordinates of the window to the coordinates of the screen, and if it was out of
range, I’d force it onscreen.

Alas, it ain’t so any longer. Multiple-monitor support has been with us for
several years, and it is now time to abandon the old ideas and move on with the
rest of the world.

 


Having an #include of the app file in every module

Severity: poor style, inefficient
(development)

Unfortunately, the ClassWizard does this for you. I consider
this one of the numerous serious defects in the ClassWizard. There are virtually
no modules in your program, except the CWinApp-derived module and the CMainFrame-derived
module, that need access to the application object (many components may invoke
methods of the generic CWinApp class such as LoadCursor and such,
but they do not need the CWinApp-derived header file included). The
inclusion of this file makes it impossible to reuse the class in any other app.
In general, I try to immediately remove this #include. Mostly I can just
remove it. However, in many situations all I need to replace it with is

#include "resource.h"

which is all you usually need.

Not only does the presence of this file tempt you to access
member variables and methods of the CWinApp class, which is usually poor
style, but it also means that every time you happen to modify the CWinApp
class, every module that includes its header has to be recompiled, whether the
module needs the definitions or not, which is simply stupid. If you aren’t using
anything in the module, it should not have a dependency. This links to the
notion of "global header files" which is equally sloppy.


Global (or "umbrella") header
files

Severity: unbelievably poor style

There is an extremely unfortunate habit that many programmers
develop, which is to dump every possible definition of every possible variable
into a single header file, with some name like "global.h" or "all.h"
or something like that. This technique has offended me for decades. I always
said this was a really stupid thing to do, because it means that every change
you make recompiles nearly everything. I was generally told I was just flaming.

Now I have proof. If you want to see it, get a copy of Ellen
Borison’s PhD dissertation, 

“Program
Changes and the Cost of Selective Recompilation”, PhD Dissertation, Department
of Computer Science, Carnegie-Mellon University, May, 1989.

In
this she demonstrates beyond the shadow of any doubt that this technique is
always a poor methodology.

A
module should include exactly those header files it requires, and no
more. There are two methods that can be adopted. One is to do all the
#includes
"flat" style, that is, no header file includes any other header file.
A slightly cleaner method is that a header file which requires that some other
header file be included do the
#include

itself, and that all #include
files have some sort of protection against multiple inclusion.

In
addition, there is a tendency to say "well, since I have included these six
files in every compilation unit, I really need to make a single #include that
contains all six files". OK, maybe that makes sense. Just barely. But that
is the start of the "slippery slope" effect, wherein you then add a
seventh, and an eighth, and a twenty-second, and a thirty-fifth, and pretty soon
every header file you have is defined in that header file and no matter what you
do, everything recompiles. This is lousy programming style. It introduces
gratuitous interdependencies that do not exist. It also tempts you to using
sloppy programming style; since you have all those definitions, you just
concatenate accesses to get such grotesquely ugly expressions as

BOOL changed = (CEdit *)((CMyApp *)AfxGetApp())->modeless->(GetDlgItem(IDC_NAME)))->GetModify();

Don’t
laugh. I saw this in one program! It was memorable. Read my sections in this
essay on avoiding global variables
and manipulating dialog
controls
to see why this incorporates some of the grossest programming I
have ever encountered. 

If
you need a global function (one whose behavior is defined entirely by its input
parameters, for example), just create a file to hold it, create a header file to
define it, and include that header file in the very few modules that use the
function. In the presence of modern browsing technology (such as the Class View
of VC++), there is no loss of ease of use, and you don’t end up with
modules which have every sort of mishmash of functions in them. A good
programmer I knew years ago said "You know you are in trouble if you write
a file called "
util.c" that holds all your "utility"

functions. You probably are clueless about program structure". I learned
from that, and have never since written such a file, independent of what name it
might have. That was something like 25 years ago.


Reaching
across classes

Severity: potentially fatal

There
is a perennial set of blunders that start out with a question of the form

"I
want to access the contents of my [document, view, dialog] from the mainframe. I
am doing … and it doesn’t work. What am I doing wrong?"

This
is usually accompanied by some grotesque code example such as

CEdit * data = (CEdit *)(&my_View->m_InputName)

or
something equally appalling. There is no excuse for writing such execrable code.
If you have something that involves a view, put the code in the view. Use
messages to transmit information if necessary. For example, I might write

CView * view = GetCurrentView();
CString * name = NULL;
if(view != NULL)
    name = (CString *)view->SendMessage(UWM_QUERY_NAME);

although it is more likely I would not do anything in the mainframe that
required such precise information from a view. You should neither know nor care
about the type of the view. A view which understands the message will respond to
it. One which does not will ignore it, and you will get NULL or 0 or
FALSE
back. Maintain your abstractions!

In
a sane and responsible world, control variables would have been declared protected;
it is a design defect of the ClassWizard that makes them public. Similarly, a
document must never reach into a view, a view should never reach into a dialog
(except for setting input and retrieving output variables of a modal dialog),
and so on. Just because the language allows some horrible kludge is not a
justification for writing one. 

A
variant of this is the one where the questioner says: "I have this custom
control and it needs to call a method/access a variable of the parent class. I
did the following…" usually accompanied by some ghastly example of the
form

classs CSomething : public CStatic {
     public:
        CMyParent * parent;
        ...
    };

and
an access of the form

parent->SomeMethod();
parent->SomeVariable = ...;
... = parent->SomeVariable;

They
are often perplexed that the code does not compile. Usually some helpful soul on
the newsgroup will point out that they need to #include "MyParent.h"

or some equally bad piece of advice.

This
usually solves the problem about the program compiling. It does not solve the
more fundamental problem, that the person writing the code is totally clueless
about program structure.

A
dialog or child control does not need to know anything about its parent. Ever.
It has a set of clean interfaces that specify its behavior, entirely in terms
of the control or dialog itself
. It is up to the outside world to meet the
constraints of this interface specification. 

A
dialog or control that knows anything at all about its parent is a design so
poor that there is no salvaging it. See my essay on Dialog and Control Design.


Overriding
CWinThread::Run
or CWinApp::Run

Severity: Almost always poor style. Abysmally

poor style

Every time I have seen someone override CWinThread::Run (which is
CWinApp::Run
, because CWinApp is a subclass of CWinThread) it
has represented something that could have been done more readily by another
mechanism. Simpler, faster, easier, and more reliable code would result. It is
almost always the result of failing to understand how MFC works, and kludging in
something based upon having once read a book on pure Win32 programming.

For example, people who create UI threads, that override CWinThread::Run

and put an infinite loop inside, have missed the fact that they should have used
a worker thread, and not a UI thread (a similar mistake is made if they put the
infinite loop in their InitInstance handler and return FALSE when
it exits). There was no reason to use a UI thread!

Another common failure is to create a thread, and override CWinThread::Run
to have a GetMessage or PeekMessage loop. There is no point to
this! That’s what the Run method already does! And it can be safely
assumed that the loop is probably wrong. For example, a PeekMessage loop
which has no blocking condition merely creates a mechanism for consuming 100% of
the CPU.

What is the proper solution?

First, don’t use a UI thread if you meant to use a worker thread. This
eliminates a number of failure modes.

One excuse is "I need to poll for incoming events, but do a loop that is
doing something useful otherwise". There is already a better mechanism for doing
this, the OnIdle handler. What you do in the OnIdle handler is
part of a computation; for example, in one application I got, each invocation of
OnIdle tested a flag to see if a report was being generated. If one was,
it generated one line of the report and returned TRUE. It kept
this up until the report was completed (or cancelled). If the report completed
or was cancelled, the pending-report flag was cleared; if this flag was cleared,
it returned FALSE. This was a way to get "background processing" in a
16-bit Windows app. In Win32, this would actually be better handled by firing
off a thread to create the report.

Another excuse is "I need to process a message to cancel the loop", which is
silly, because there is no need to process a message to cancel a loop. Read my
essay on worker threads, including the section
on thread cancellation.

I’ve seen even weirder structures, so convoluted that even trying to explain
them here would take more effort than I’m willing to expend. They were all
wrong. In many cases, the program simply never worked correctly, because the
whole mechanism that was used was so deeply flawed that any illusion that it
worked was just that: an illusion. Under load conditions, the mechanism
collapsed under its own weight.

I’ve even seen someone put a TranslateMessage/DispatchMessage handler
in the middle of a CWinThread::Run, for a thread that had no windows!
And wondered why this was not handling thread messages (duh! Thread messages
aren’t associated with windows! That’s why they’re called thread

messages). Exactly how it was going to handle them escapes me.

To handle thread messages, create a WM_APP-based message, or better
still, a RegisterWindowMessage value, and use that in an
ON_THREAD_MESSAGE
or ON_REGISTERED_THREAD_MESSAGE entry in the
Message Map. See my essay on Message Management for
more details on this, as well as my essay on UI threads.

As far as I can tell, there is no need to ever override CWinThread::Run.
Assume, the instant you decide to do this, that you are wrong. That assessment
is correct so often that the times that overriding CWinThread::Run would
be correct are measured in events-per-decade. Except that in nearly a decade of
MFC programming, I have yet to find a reason to ever override CWinThread::Run.
And I’ve done some pretty sophisticated UI threads.

Summary

I’m sure that I’ve manage to offend nearly everyone who has
read this with some opinion or the other, because I’ve slammed your favorite way
of doing things. That’s fine. I want you to think about what you are
doing. Programs have the unfortunate characteristic that just because you can

do something in a certain way, you have not proven that is the most effective
way to do things. Often programmers invent idioms which they use over and over
because they once carefully figured out how to reach inside abstractions and
extract what they need; this does not make that programming methodology clean or
effective. Just because "it works" does not make it
"good". 

I work in a world in which I not only write code, but I have
to maintain it a year or two after I wrote it. I cannot build
"fragile" programs that might break. They are often maintained
(perhaps usually maintained) by unskilled labor, which includes myself a year or
two later. I cannot afford to do things that make simple changes complex.

I use multithreading a lot. I have to be exceedingly careful
to get it right; any technique that is error prone is eventually fatal. I may
not see conditions arise during testing that arise in normal use, and there can
be thousands of copies of my program out there being used. Anything that is
deadlock-possible will
deadlock, and this reflects badly on my
clients’ products, and in turn reflects badly on me. I am in the habit of
delivering robust applications that I can tackle modifications on after having
seen tens of thousands of lines of other code for months. Adopting fairly rigid
styles which are, if not impervious to changes under maintenance, are extremely
insensitive to changes under maintenance, is critical (I prefer impervious when
I can achieve it).

So this essay not only captures techniques which I use in my ordinary
programming, but also captures a set of bugs, stylistic errors, and fragilities
I encounter in nearly every program I am given to maintain. It is very sad, but
most programs I get are fairly poorly written (probably because the well-written
ones don’t need the kind of repairs I can give, sort of like healthy people not
needing to see physicians as often as those with medical problems–I remember a
pre-Monty-Python radio program ["I’m Sorry, I’ll Read That Again"] in
which John Cleese said something along the lines of "I’m tired of being
forced to look into people’s filthy mouths! I’d like to fix something nice and
clean, like a broken arm, someday" and the reply "But you’re a dentist".
So I sometimes have that feeling, too). So I’ve learned to look for these
problems when I get symptoms that they would cause, and I find them. And I fix
them. 

But it is a whole lot easier to not commit these crimes against good
programming in the first place.

The views expressed in these essays are those of the
author, and in no way represent, nor are they endorsed by, Microsoft.

Send mail to newcomer@flounder.com
with questions or comments about this web site.
Copyright © 2002-2003 FlounderCraft Ltd.
All Rights Reserved
Last modified: