Of course, this is largely true of TP6, too, but in real mode we can do segment arithmetic with impunity. In Windows' protected mode environment, 32-bit pointers no longer consist of a 16-bit segment and a 16-bit offset, but of a 16-bit selector and a 16-bit offset. The selector is essentially a pointer into a relatively small array of segment descriptors, so that while any 32-bit value represents a valid address in real mode, the same is not true in protected mode. (In fact, since a selector is a 16-bit pointer to an eight-byte descriptor, the odds are 7 to 1 that any given 16-bit value will be an invalid selector. What's more, since all 8192 possible descriptors will probably not be filled in, the odds against an arbitrary 16-bit value mapping to a valid selector are significantly higher than 7 to 1.) While, of course, this hardware pickiness about selector values is what makes invalid pointer bugs so much easier to find under Windows than under DOS, it also complicates random access to huge structures. We can't just increment the segment part of a pointer to slide our 64K window 16 bytes forward; we can only use selectors that correspond to descriptors that Windows has already built.
As it happens, whenever you allocate a block of memory larger than 64K, Windows defines only as many selectors as are necessary to completely 'cover' the block with 64K "tiles". That is, even though byte $10000 within a large structure is logically and (perhaps) physically contiguous with byte $FFFF, we have to treat them almost as if they were in two completely different 64K structures: we cannot use one selector to refer to both bytes. Similarly, we have to be careful that all references to multibyte types (like numbers, or records, or the 640-byte arrays in Figure 1) are completely contained within a single segment. Trying to read or write past the end of a segment will cause a UAE (Unrecoverable Application Error): we either have to make sure no multibyte structures straddle segment boundaries, refer to them byte-at-a-time, or use multiple block moves to move data to and from an intermediate buffer.
Working within this rigid framework of 64K peepholes into large blocks of memory complicates and slows down any code that has to deal with huge arrays, but it certainly shouldn't deter anyone who really deserves their 'programming license'. What does cause trouble is the "__ahincr double whammy": not only does TPW not provide a Pascal binding for __ahincr, the standard Windows constant that you can add to a pointer's selector to step it 64K forward, but the SDK manuals only talk about how to use __ahincr, not how to obtain it, since that is normally handled by the C runtime library, or by the file MACROS.INC that comes with MASM, not the SDK. Since I use TASM, not MASM, I had to ask around until I found someone who could tell me "Oh, __ahincr is just the offset of a routine you import from the KERNEL DLL, just like you'd import any other API function." (Rule Number 1 of Windows programming is Know A Guru: sooner or later, you're bound to run into something that's not in the manuals and that you can't find by trial and error.)
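In other words, the binding amounts to little more than an import and an Ofs() call. Here is a minimal sketch of the idea (not the actual declaration from Listing 1; ordinal 114 is the export number usually given for __AHINCR, and importing by name should work just as well):

  { Import the pseudo-routine from KERNEL; its "offset" is the increment value }
  procedure AHIncr; far; external 'KERNEL' index 114;

  { Step a selector one 64K tile forward }
  function NextTile(Sel: Word): Word;
  begin
    NextTile := Sel + Ofs(AHIncr);
  end;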
As you can see, given __ahincr, the HugeModl unit in Listing 1 is pretty straightforward. The unit supports huge model programming on a variety of levels: it provides enhanced GetMem and FreeMem routines that allow you to allocate blocks larger than $FFFF bytes, and it provides three levels of tools for manipulating large data objects. The lowest level is, of course, a Pascal binding for __ahincr. This is used throughout the unit, and you may find your own uses for it, too. The middle level is a set of functions and macros that can step a pointer by any amount or calculate the offset of any byte within an object, while the top level is a Move routine that can move huge blocks without any apparent concern for the 64K tile boundaries.
The enhanced GetMem and FreeMem routines look exactly the same as the System unit routines they "shadow" except that the block size parameter can be greater than $FFFF. This lets you use the same set of routines for small data as for large data, without having to do anything but put HugeModl in your uses clause(s). GetMem and FreeMem pass 'small' (less than 64K) requests on to the System unit routines, and call GetHuge and FreeHuge to handle the 'large' (at least 64K) requests. (Bear in mind that in "Standard" (286) mode, Windows won't allocate any blocks larger than a megabyte.) The GetHuge routine uses the GlobalAlloc function, which returns a handle to the allocated memory block, and then uses the GlobalLock routine to convert the handle into a pointer. The FreeHuge routine in turn uses the GlobalHandle function to convert the pointer's selector back to a handle, and then uses the GlobalFree call to free the handle. One important thing to note is that FreeHuge (and, transitively, HugeModl.FreeMem) can only free pointers that came from GetHuge/GetMem: you cannot allocate a block and then free some bytes from the end (you'll have to use the Windows GlobalReAlloc function for that), nor can you FreeMem a largish typed constant, say, from the middle of your data segment.
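To make the handle/pointer round trip concrete, a stripped-down sketch of that allocate-and-free logic might look something like the following. (This is my illustration of the call sequence described above, not the code from Listing 1; the gmem_Moveable flag and the minimal error handling are my own choices, and it assumes WinTypes and WinProcs in the uses clause and {$X+} so function results can be discarded.)

  {$X+}
  type
    TPtrRec = record Ofs, Sel: Word; end;       { overlay for picking a pointer apart }

  procedure GetHuge(var P: Pointer; Size: Longint);
  var
    H: THandle;
  begin
    H := GlobalAlloc(gmem_Moveable, Size);      { returns a handle, not a pointer }
    if H = 0 then
      P := nil
    else
      P := GlobalLock(H);                       { locking the handle yields the pointer }
  end;

  procedure FreeHuge(P: Pointer);
  var
    H: THandle;
  begin
    H := THandle(GlobalHandle(TPtrRec(P).Sel)); { selector back to its handle }
    GlobalUnlock(H);
    GlobalFree(H);
  end;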
Of course, once you lay claim to the continent, you have to get over the Appalachians! HugeModl supplies three pointer math routines that can take you safely past the 'mountains at the end of the segment': OfsOf, which adds a long offset to a base pointer; PostInc, which steps a pointer by a long and returns the original value; and PreInc, which steps a pointer by a long and returns the new value. All three add the (32-bit) step to the base pointer's (16-bit) offset, using the low word of the result as the new offset, and using the high word to step the selector by the appropriate number of 64K "tiles". (With the {$X+} enhancements, PreInc and PostInc can be used as procedures that modify the pointer argument. PreInc is better for this than PostInc, as it does two fewer memory accesses and is thus a little faster.)
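The arithmetic is easier to see in Pascal than to describe. The following is a sketch of the calculation all three routines perform (the real HugeModl routines are inline macros and assembler, not this function; it reuses the AHIncr import and TPtrRec overlay from the sketches above and assumes a non-negative step):

  function StepHuge(P: Pointer; Delta: Longint): Pointer;
  var
    NewOfs: Longint;
  begin
    NewOfs := TPtrRec(P).Ofs + Delta;                  { 32-bit sum of old offset and step }
    StepHuge := Ptr(TPtrRec(P).Sel +
                    Word(NewOfs shr 16) * Ofs(AHIncr), { one selector step per 64K carried }
                    Word(NewOfs));                     { low word becomes the new offset }
  end;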
All three pointer math routines are defined both as inline macros and as assembler routines. If the HugeModl unit is compiled with the Huge_InLine symbol $define'd (in the source or on the command line), the routines are expanded as macros; otherwise each operation requires a far call. It's probably best to use the macros, because virtually every routine that takes a huge parameter will need to do some pointer calculations, even though that very ubiquity also means that using the macros can add quite a lot to your object code's size.
Whether you use OfsOf or PostInc/PreInc is largely a matter of taste, though it is often simpler and/or cheaper to step a pointer (using PostInc/PreInc) by the array element's size than it is to keep multiplying a stepped array index by the element size. In either case, you'll quickly find that there is one major drawback to using huge arrays "by hand" instead of letting the compiler do it all for you transparently: the routines in the HugeModl unit all return untyped pointers, and you'll end up having to use a lot of casts if you don't want to use Move for everything. For example, something like for Idx := 1 to ArrayEls do SmallArray[Idx] := Idx; will become Ptr := HugeArrayPtr; for Idx := 1 to ArrayEls do LongPtr(PostInc(Ptr, SizeOf(Long)))^ := Idx;
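Spelled out with declarations, such a sweep might look roughly like this (the names and sizes are my own, not Listing 1's; it assumes HugeModl's GetMem, FreeMem, and PostInc as described above):

  uses HugeModl;

  type
    PLong = ^Longint;

  var
    HugeArrayPtr, P: Pointer;
    Idx, ArrayEls: Longint;

  begin
    ArrayEls := 100000;                                 { well past the 64K limit }
    GetMem(HugeArrayPtr, ArrayEls * SizeOf(Longint));   { HugeModl's GetMem, not System's }
    P := HugeArrayPtr;
    for Idx := 1 to ArrayEls do
      PLong(PostInc(P, SizeOf(Longint)))^ := Idx;       { cast, store, step to the next element }
    FreeMem(HugeArrayPtr, ArrayEls * SizeOf(Longint));
  end.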
Of course, untyped pointers are no problem when you do use the Move routine as it, too, is meant to extend the range of the System unit routine it shadows without breaking any existing code. Thus, the first two arguments are untyped var parameters for the data's source and destination, while the third argument is the number of bytes to copy. (Naturally, unlike the System unit's Move routine, HugeModl's can move any number of bytes from 0 to $7FFFFFFF.)
You may find that reading the code for the huge Move routine will help you to write your own huge model code. It breaks a potentially huge block move, which might span several segments, into a succession of <= 64K moves, each entirely within a single segment. For each submove, it uses the larger of the source and destination offsets to decide how many bytes it can move without walking off the end of the source or destination segments. It then calls a word move routine to move that many bytes, increments the pointers, and decrements the byte count. Since the block move is by words, not bytes, it can easily handle a 64K byte move (when both the source and destination offsets are 0) as a single 32K word move.
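If you'd like to see the shape of the algorithm without wading through the listing, here is a simplified sketch of the tiling loop (mine, not the Listing 1 code: it moves bytes rather than words, caps each chunk at 32K so it can lean on System.Move's Word count, ignores overlapping source and destination, and reuses TPtrRec and StepHuge from the sketches above):

  type
    PByte = ^Byte;

  procedure HugeMove(var Source, Dest; Count: Longint);
  var
    SrcP, DstP: Pointer;
    Chunk, Room: Longint;
  begin
    SrcP := @Source;
    DstP := @Dest;
    while Count > 0 do
    begin
      { The larger offset is closer to the end of its segment, so it limits the chunk }
      Chunk := $10000 - TPtrRec(SrcP).Ofs;
      Room  := $10000 - TPtrRec(DstP).Ofs;
      if Room < Chunk then Chunk := Room;
      if Chunk > Count then Chunk := Count;
      if Chunk > $8000 then Chunk := $8000;          { stay within System.Move's Word count }
      Move(PByte(SrcP)^, PByte(DstP)^, Word(Chunk));
      SrcP := StepHuge(SrcP, Chunk);                 { step both pointers past the moved bytes }
      DstP := StepHuge(DstP, Chunk);
      Dec(Count, Chunk);
    end;
  end;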
Now, while the HugeModl.Move routine can handle structures that straddle segment boundaries, compiler-level pointer expressions like Ptr^ := Fn(Ptr^); cannot. This means that you should only use pointer expressions when you know that they will not cause a UAE by hitting the end of a segment. If the base structure's size is a power of two (words, longs, doubles, &c), you can generally use pointer expressions so long as the array starts an integral number of elements from the high end of the initial segment. That is, an array of words should start on a word boundary (the offset must be even), while an array of doubles should start on a qword boundary (the offset must be a multiple of eight). Since you have to process unaligned data byte-at-a-time or via a buffer you Move data to and from, it may be worth adding blank fields to "short" structures so that their size will be a power of two.
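For instance (my example, not one from Listing 1), padding a six-byte record out to eight bytes guarantees that array elements pack evenly into the 64K tiles and never straddle a boundary:

  type
    TSample = record
      Time:  Longint;   { 4 bytes }
      Value: Integer;   { 2 bytes: at 6 bytes total, some element would straddle a tile }
      Pad:   Integer;   { 2 blank bytes round the size up to 8, a power of two }
    end;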
Since GetHuge always returns a pointer with an offset of 0, you don't have to worry about the alignment of "simple" (fixed-length, headerless) arrays. However, most variable length arrays will have a fixed-length header and a variable-length tail, which can leave the tail unaligned. Rather than using the object-oriented technique of Hax #?? [PC Techniques volume 2, #1], where we declare the header as an object and then declare the actual variable length object as a descendant of the header object that adds a large "template" array, we can retain control over our huge tails' alignment by simply putting a tail pointer in the header. Then, when we allocate the object, we can simply allocate as many extra bytes as we might need to insert between the header and the tail to maintain proper alignment. Thus, with an array of longint tail, we would just allocate three extra bytes and say something like TailPtr := PChar(HdrPtr) + ((Ofs(HdrPtr^) + SizeOf(Hdr) + 3) and $FFFC);
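Put in terms of declarations, the scheme might look roughly like the following (the names, the field layout, and the Ptr() form of the expression are mine, not Listing 1's; it assumes HugeModl's GetMem and range checking off for the template array):

  type
    PLongTail = ^TLongTail;
    TLongTail = array[0..0] of Longint;  { "template" for the variable-length tail }
    PHdr = ^THdr;
    THdr = record
      Count: Longint;                    { example fixed-length header field }
      Tail:  PLongTail;                  { points at the aligned tail }
    end;

  procedure AllocWithTail(var HdrPtr: PHdr; Els: Longint);
  begin
    { Three spare bytes let us round the tail up to a four-byte boundary }
    GetMem(HdrPtr, SizeOf(THdr) + 3 + Els * SizeOf(Longint));
    HdrPtr^.Tail := PLongTail(Ptr(Seg(HdrPtr^),
                      (Ofs(HdrPtr^) + SizeOf(THdr) + 3) and $FFFC));
  end;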
If you've gotten the impression that writing 'huge model' code under TPW is a lot more work than writing normal, '16-bit' code, you're both right and wrong. Yes, you will be making a lot of 'calls' to PostInc/PreInc, and you will be making a lot of casts of their results, but if you already write code that sweeps arrays by stepping pointers, you will probably find that using PostInc/PreInc makes for fewer source lines, which in turn tends to counter the legibility lost in all the calls and casts. Not to mention that huge model Pascal code is a lot easier to write and read than its 32-bit assembler equivalent!
Copyright © 1992 by Jon Shemitz - jon@midnightbeach.com - html markup 8-25-94