What is PPMalloc?
Why is it called PPMalloc?
What is the legal status of PPMalloc?
What platforms is PPMalloc useful for?
Why create PPMalloc; aren't there other allocators already?
How efficient is PPMalloc?
Where do I get examples of how to use PPMalloc?
I found a bug in the PPMalloc package; what do I do about it?
What is the complete set of compile-time options (#defines) for PPMalloc?
How can block overhead be as little as 4 bytes?
Why doesn't GeneralAllocator have a special optimization for small or tiny block sizes?
What are the min and max allocation sizes possible with GeneralAllocator?
GeneralAllocator::kAllocationFlagHigh doesn't seem to work. What's wrong?
How do I tell if GeneralAllocator is initialized or how much core it has?
How do I temporarily disable the small block cache to improve some kinds of fragmentation?
What optimization settings are required for various PPMalloc modules to perform best?
I want to allocate memory on startup (e.g. in global variables), but there are order of initialization issues.
How do I override operator new and delete with GeneralAllocator?
How do I override malloc and free with GeneralAllocator?
I want GeneralAllocator to automatically allocate all available memory on startup but it doesn't seem to be able to do that.
GeneralAllocator in its Shutdown is not freeing the block of memory I gave it on Init.
I tell GeneralAllocatorDebug to use some debug option with SetDefaultDebugDataFlag(s) and it doesn't work.
Why does GeneralAllocatorDebug seem to take long to shut down when exiting my app?
How do I use GeneralAllocator in a managed DLL?
For multiprocessing consoles, do I want a multiprocessor-savvy allocator such as Hoard or SmartHeap?
How do I force the usage of mapped memory on platforms that have it?
Why must MallocAligned have an alignment offset of a multiple of 8 and not something smaller?
Why doesn't GeneralAllocator let you specify an allocator-wide minimum alignment?
How do I swap two allocators?
GeneralAllocator allocates memory in unexpected ways.
How is high memory implemented in GeneralAllocator?
How do I tell the amount of free heap space from GeneralAllocator? There's no simple accessor.
I get an assert or crash in GeneralAllocator. What do I do?
How do I take advantage of large pages in Windows or XBox 360?
How can I debug memory leaks?
How do I debug heap corruption problems?
Why is the GeneralAllocator mutex unlocked when calling user hooks?
I've heard about using page protection to detect memory problems. How do I do that?
Why use a heap manager on XBox 360, PS3 and PC when they have a hardware MMU?
Is GeneralAllocator deterministic?
How do I prevent users from calling global malloc and new?
I am getting a linker error regarding gpEAGeneralAllocator.
The Windows task manager is showing different memory stats than GeneralAllocatorDebug.
Why might I use PPMalloc as opposed to ptmalloc?
How does PPMalloc compare to HeapAgent, BoundsChecker, Purify?
How much overhead does GeneralAllocator Malloc have compared to built-in malloc?
How do I override the malloc and free functions, as this isn't directly supported by C++ as with new/delete?
What's the best way to setup GeneralAllocator for use as the main app heap?
I'm getting incomplete XBox 360 crash dumps when using PPMalloc.
I want MallocAligned to ensure the result is minimally aligned to the requested alignment and no more.
How do I test the system's memory? I suspect I have bad RAM.
What is PPMalloc?
PPMalloc is a suite of memory allocators. At its core is GeneralAllocator, a fully generalized allocator that can replace malloc/free but does a lot more. Other allocators are provided for specialized uses and include SmallObjectAllocator, StackAllocator, FixedAllocator, NonLocalAllocator, etc. The algorithm behind GeneralAllocator is derived from the well-studied dlmalloc allocator. The implementation is significantly improved over dlmalloc.
Why is it called PPMalloc?
It is simply following the naming tradition of dlmalloc, lkmalloc, ptmalloc, and other variants. You can look these up on the internet if you wish. The PP prefix stands for Plenty Powerful.
What is the legal status of PPMalloc?
PPMalloc is usable for all uses within Electronic Arts, both internally and in shipping products for all platforms. All source code was written by a single EA engineer and none of the source code comes from an external source. The primary algorithm of GeneralAllocator is based on the public domain dlmalloc algorithm and it has been confirmed with EA legal that there are no legality issues with this.
What platforms is PPMalloc useful for?
PPMalloc is designed to work on all Electronic Arts platforms for both internal and shipping runtime code. The runtime optimized code will perform very well on all platforms, from handheld devices and embedded systems up to large server systems of multiple gigabytes. PPMalloc has been tested on both 32 bit and 64 bit systems.
The current set of platforms supported by PPMalloc include: PS2, PSP, PS3, GameCube, Revolution, XBox, XBox 360 (a.k.a. Xenon, XBox2), Win32, Win64, WinCE, Linux, MacOSX, Solaris, and BSD Unix.
Why create PPMalloc; aren't there other allocators already?
Quite simply, PPMalloc outperforms all other observed allocators within EA and in general has more functionality as well. PPMalloc outperforms dlmalloc, so even projects using dlmalloc will see performance improvements with PPMalloc. PPMalloc is particularly useful for its debugging features, which provide pointer validation, named blocks, per-block call stack tracing, detailed heap reporting, detailed heap validation, heap recording and playback, allocation hooks, arbitrary tagged data, flexible guard fills, detailed leak reporting, and more.
How efficient is PPMalloc?
Speed-wise, PPMalloc's GeneralAllocator is similar to dlmalloc but slightly faster, assuming you use them configured equivalently. This algorithm is well-studied and provides some of the fastest performance seen in generalized allocators. PPMalloc has other more specialized allocators such as FixedAllocator and StackAllocator which provide fixed-size block allocation and pointer-increment allocation respectively. These latter allocators work much like other similar systems you may have seen.
Memory-wise, PPMalloc's GeneralAllocator is nearly identical to dlmalloc in its efficiency. Due to interesting tricks done with internal block management, there is a nominal overhead of only four bytes per allocation. This is about as low as it gets for a generalized allocator. PPMalloc is a strongly coalescing allocator; all free blocks are immediately merged with surrounding blocks in order to reduce fragmentation. FixedAllocator has zero overhead per individual allocation but wastes memory due to unused blocks in its pool. StackAllocator has zero overhead per individual allocation. HandleAllocator has the same overhead as GeneralAllocator but has heap compaction functionality in order to eliminate fragmentation.
Where do I get examples of how to use PPMalloc modules such as GeneralAllocator?
Example code can be found in these documentation files, the project 'scrap' directory, and most significantly, the unit test code. The simplest example of all is this:
EA::Allocator::GeneralAllocator allocator;
allocator.Malloc(20);
I found a bug in the PPMalloc package; what do I do about it?
It would be best if you reported it to the package owner. You can find out about the package at the EA Package Server at http://packages.eac.ad.ea.com/
What is the complete set of compile-time options (#defines) for PPMalloc?
These options are defined for each module in the header file for that module. Examples of such defines for GeneralAllocator include (but aren't limited to) PPM_NULL_POINTER_FREE_ENABLED, PPM_HOOKS_SUPPORTED, PPM_AUTO_HEAP_VALIDATION_SUPPORTED, and PPM_NEW_CORE_SIZE_DEFAULT.
How can it be that the system overhead for an allocated block in GeneralAllocator can be as little as 4 bytes when an allocated Chunk looks like this (and thus seems to have an overhead of 8 bytes)?:
struct Chunk
{
    uint32_t mnPriorSize;
    uint32_t mnSize;
    char     mUserData[];
};
The reason is that allocated chunks use the mnPriorSize field of the next chunk in memory for user data. It turns out that the mnPriorSize field is only needed when applied to free chunks, so allocated chunks are free to use this field for user data. If all user allocations were an even multiple of 8 plus 4 (i.e. 4, 12, 20, 28, etc.), then allocations would pack very tightly indeed and would always have exactly and only 4 bytes of overhead. Alignment requirements and the user requests of odd-sized blocks (e.g. 13 bytes) can raise the nominal 4 byte overhead above 4.
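The arithmetic above can be checked with a small model. This is only a sketch under stated assumptions: 32 bit size fields, chunk sizes that are multiples of 8, and no minimum chunk size. The names ModelChunkSize and ModelOverhead are hypothetical and are not part of PPMalloc.

```cpp
#include <cstddef>

// Simplified model (an assumption, not PPMalloc source code):
// 4-byte size fields, chunk sizes in multiples of 8, no minimum chunk size.
constexpr std::size_t kModelSizeFieldBytes   = 4; // mnSize of the chunk itself
constexpr std::size_t kModelChunkGranularity = 8;

// Heap bytes consumed by a request. User data may spill into the next
// chunk's mnPriorSize, so only mnSize counts as fixed overhead; the rest
// of the cost is padding up to the next multiple of 8.
constexpr std::size_t ModelChunkSize(std::size_t request)
{
    return (request + kModelSizeFieldBytes + kModelChunkGranularity - 1)
           / kModelChunkGranularity * kModelChunkGranularity;
}

// Overhead = bytes consumed minus bytes the user asked for.
constexpr std::size_t ModelOverhead(std::size_t request)
{
    return ModelChunkSize(request) - request;
}

// Requests of the form 8k + 4 pack with exactly 4 bytes of overhead.
static_assert(ModelOverhead(4)  == 4, "8k + 4 sizes cost 4 bytes");
static_assert(ModelOverhead(28) == 4, "8k + 4 sizes cost 4 bytes");
```

Under this model a 16 byte request costs 8 bytes of overhead and a 13 byte request costs 11, which is how odd sizes raise the figure above the nominal 4.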
Why doesn't GeneralAllocator have a special optimization for small or tiny block sizes?
Some generalized heap allocators have a special optimization for small or tiny block sizes. What they often do is reserve a fixed block of memory for allocations of 4, 8, 12 and 16 byte allocations and allocation requests within this size range come from this block and have zero overhead (unlike GeneralAllocator's 4 or 8 byte overhead).
Small block support within GeneralAllocator would be outside the domain of GeneralAllocator, would bloat the implementation, and would take away user freedom to implement a small block allocation scheme that best suits their needs. GeneralAllocator implements a single general purpose heap with a rich set of functionality for that heap. A small block allocator within GeneralAllocator would mean it is no longer a single generalized heap and would make the heap functionality much more difficult to support.
The solution is to create a shim between the user and GeneralAllocator (which just about all projects already do) and call your own custom small block allocator from the shim. This gives the user the freedom of using their own small block allocator and allows the easy enabling and disabling of that functionality at compile-time or run-time. The examples directory has an example of such a small object allocator.
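Here is a minimal sketch of such a shim, assuming a fixed pool of 16 byte slots for small requests. DemoSmallBlockPool and DemoShim are hypothetical names, and std::malloc/std::free stand in for the calls that a real shim would forward to GeneralAllocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

const std::size_t kDemoSlotSize  = 16;  // largest request served by the pool
const std::size_t kDemoSlotCount = 256; // number of slots in the pool

// Hypothetical small block pool: fixed-size slots with zero per-block overhead.
class DemoSmallBlockPool
{
public:
    DemoSmallBlockPool() : mFreeCount(0)
    {
        for(std::size_t i = 0; i < kDemoSlotCount; ++i)
            mFreeSlots[mFreeCount++] = static_cast<std::uint16_t>(i);
    }
    void* Malloc() // Returns nullptr when the pool is exhausted.
    {
        if(mFreeCount == 0)
            return nullptr;
        return mBuffer + (mFreeSlots[--mFreeCount] * kDemoSlotSize);
    }
    void Free(void* p)
    {
        assert(Owns(p));
        const std::size_t offset = static_cast<std::size_t>(static_cast<unsigned char*>(p) - mBuffer);
        mFreeSlots[mFreeCount++] = static_cast<std::uint16_t>(offset / kDemoSlotSize);
    }
    bool Owns(const void* p) const // True if p lies inside the pool's buffer.
    {
        const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
        const std::uintptr_t b = reinterpret_cast<std::uintptr_t>(mBuffer);
        return (a >= b) && (a < b + sizeof(mBuffer));
    }
private:
    alignas(16) unsigned char mBuffer[kDemoSlotSize * kDemoSlotCount];
    std::uint16_t mFreeSlots[kDemoSlotCount]; // stack of free slot indices
    std::size_t   mFreeCount;
};

// The shim: small requests go to the pool, everything else to the general heap.
class DemoShim
{
public:
    void* Malloc(std::size_t n)
    {
        if(n <= kDemoSlotSize)
        {
            if(void* p = mPool.Malloc())
                return p;
        }
        return std::malloc(n); // stand-in for the general allocator
    }
    void Free(void* p)
    {
        if(!p)
            return;
        if(mPool.Owns(p))
            mPool.Free(p);
        else
            std::free(p); // stand-in for the general allocator
    }
    bool IsSmallBlock(const void* p) const { return mPool.Owns(p); }
private:
    DemoSmallBlockPool mPool;
};
```

Because the routing decision lives in the shim, the small block path can be disabled at compile time or run time by skipping the pool branch in Malloc, without touching the general heap.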
What are the min and max allocation sizes possible with GeneralAllocator?
The minimum allocation size is zero bytes.
The maximum allocation size is about 2GB - 2K on both 32 bit and 64 bit systems. However, there is an option to have the maximum be about 2^63 - 2K on 64 bit systems, at the cost of higher per-block overhead (8 bytes per block).
GeneralAllocator::kAllocationFlagHigh doesn't seem to work. I allocate a block with Malloc(16, GeneralAllocator::kAllocationFlagHigh); and the returned block isn't high.
With PPMalloc's GeneralAllocator, you can request that a block be allocated low or high in the heap. Blocks requested to be high in the heap are meant to be more permanent while those located low are more dynamic. The idea is that you get better heap packing with that design.
With PPMalloc, this functionality is enabled by making sure that PPM_HIGH_SUPPORTED is defined to 1. With console platforms this value defaults to 1, so you probably don't have to do anything for them. Once high allocation is enabled via this define, you need to enable it at runtime with a call to SetOption(GeneralAllocator::kOptionEnableHighAllocation, 1). The reason this is required is that there is a (albeit small) runtime cost to this functionality. From there you simply pass in the kAllocationFlagHigh flag to Malloc.
How do I tell if GeneralAllocator is initialized or how much core it has?
In the name of simplicity, there currently is no function to tell if GeneralAllocator has been initialized (and thus has core). But here is a function that does this for you:
size_t GetCoreSize(GeneralAllocator* pGA)
{
    size_t size = 0;
    const void* const pContext = pGA->ReportBegin(NULL, GeneralAllocator::kBlockTypeCore);
    for(const GeneralAllocator::BlockInfo* pBlockInfo = pGA->ReportNext(pContext);
        pBlockInfo; pBlockInfo = pGA->ReportNext(pContext))
    {
        size += pBlockInfo->mnBlockSize;
    }
    pGA->ReportEnd(pContext);
    return size;
}
How do I temporarily disable the small block cache to improve some kinds of fragmentation?
The small block cache (a.k.a. fast bin cache) improves speed and in some cases improves fragmentation. However, there are some allocation patterns whereby the small block cache worsens fragmentation. Memory allocation patterns can be very complex to analyze, and so there is no simple predictor for this situation. You can however enable and disable the small block cache with the kOptionMaxFastBinRequestSize option, like so:
SetOption(GeneralAllocator::kOptionMaxFastBinRequestSize, 0); // Disables it
SetOption(GeneralAllocator::kOptionMaxFastBinRequestSize, 64); // Enables it
You may want to try enabling and disabling the cache at certain times during runtime. Additionally, following sound memory allocation practices will help alleviate the situation as well.
What optimization settings are required for various PPMalloc modules to perform best?
Some of the PPMalloc modules require function inlining to be enabled to perform at their best. These modules are GeneralAllocator, FixedAllocator, and StackAllocator. With other modules, inlining doesn't significantly matter. The inlined functionality of the above three modules is chosen such that the inlined functions are almost always very small one to three line functions whose inlining not only improves the speed of the function but also adds little or no size to it due to the smallness of the operation. Thus there is almost no reason not to enable inlining for these modules in an optimized build.
I want to be able to allocate memory on startup (e.g. in global variables), but there are order of initialization issues.
If you want to create instances of allocators on startup and have them initialize before other global objects (which may need these allocators), you need to tell the compiler to alter the initialization order. This can usually be done by altering the order of object files passed to the linker, but can also be done manually in the source code with many compilers.
With GCC 3.x or later, it is done at the object level with:
__attribute__ ((init_priority (n)))
where n is a value between 101 and 65535 (65535 is default) and lower numbers mean earlier initialization.
An example would be:
GeneralAllocator gEAGeneralAllocator(arguments) __attribute__ ((init_priority (1000)));
GeneralAllocator* gpEAGeneralAllocator = &gEAGeneralAllocator; // Doesn't need init_priority attribute.
Unfortunately, reports are that not all variations of GCC 3.x support this properly.
With GCC 2.x (e.g. the PS2 and GameCube compilers), the init_priority attribute is broken and cannot be used to solve this problem. In this case you have no choice but to rearrange the link order or to do an on-demand initialization of the allocator.
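On-demand initialization can be sketched with the construct-on-first-use idiom. DemoAllocator, GetDemoAllocator, and DemoGlobal are hypothetical names for illustration, and std::malloc stands in for the real allocator's work.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for an allocator class; not the PPMalloc API.
struct DemoAllocator
{
    DemoAllocator() : mbInitialized(true), mnAllocationCount(0) {}

    void* Malloc(std::size_t n) { ++mnAllocationCount; return std::malloc(n); }
    void  Free(void* p)         { std::free(p); }

    bool        mbInitialized;
    std::size_t mnAllocationCount;
};

// The allocator is constructed the first time this function is called,
// so callers never see it uninitialized, regardless of link order.
DemoAllocator& GetDemoAllocator()
{
    static DemoAllocator sAllocator;
    return sAllocator;
}

// A global object whose constructor allocates. This is safe even if it is
// initialized before every other global in the program.
struct DemoGlobal
{
    DemoGlobal() : mpData(GetDemoAllocator().Malloc(32)) {}
    ~DemoGlobal() { GetDemoAllocator().Free(mpData); }

    void* mpData;
};

DemoGlobal gDemoGlobal;
```

In C++11 and later the initialization of the local static is thread-safe; older compilers may not guarantee that, so on such compilers the first call should happen before any additional threads are started.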
With VC++, it is done at the source file level by adding either of the following to the top of the .cpp file:
#pragma init_seg(compiler)
#pragma init_seg(lib)
An example would be:
#pragma init_seg(lib)
GeneralAllocator gEAGeneralAllocator(arguments);
GeneralAllocator* gpEAGeneralAllocator = &gEAGeneralAllocator;
How do I override operator new and delete with GeneralAllocator? How about for malloc and free?
For overriding new and delete, see EANewDelete.cpp. However, the standard set of functions to override are:
void* operator new(size_t)
void* operator new(size_t, const std::nothrow_t&)
void* operator new[](size_t)
void* operator new[](size_t, const std::nothrow_t&)
void operator delete(void*)
void operator delete(void*, const std::nothrow_t&)
void operator delete[](void*)
void operator delete[](void*, const std::nothrow_t&)
Additionally, when compiling some VC++ code (you especially run into this with MFC), you will find that Microsoft does this in debug builds:
#define new new(__FILE__, __LINE__)
So if you have code that uses new, it gets silently transformed into a different function call. To make sure you aren't allocating memory from one heap (i.e. MS) and freeing it with another (i.e. your heap), you need to make sure that you implement a file/line version yourself so that those redefines result in your heap being used. Here is such an example:
void* operator new (size_t n, const char* pFile, int line);
void* operator new[] (size_t n, const char* pFile, int line);
void operator delete (void* p, const char* pFile, int line);
void operator delete[](void* p, const char* pFile, int line);
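The declarations above might be implemented along the following lines. This is a sketch only: DemoHeap is a hypothetical stand-in that counts live blocks on top of std::malloc, and DemoDelete is a hypothetical helper used because the sketch does not replace the global operator delete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical heap stand-in; a real project would forward to its own allocator.
struct DemoHeap
{
    static void* Malloc(std::size_t n)
    {
        void* const p = std::malloc(n ? n : 1);
        if(p)
            ++sLiveBlocks;
        return p;
    }
    static void Free(void* p)
    {
        if(p)
        {
            assert(sLiveBlocks > 0);
            --sLiveBlocks;
            std::free(p);
        }
    }
    static int sLiveBlocks;
};
int DemoHeap::sLiveBlocks = 0;

// File/line versions of new. The file and line are ignored here; a debug
// heap would typically record them with the block.
void* operator new(std::size_t n, const char* /*pFile*/, int /*line*/)
{
    void* const p = DemoHeap::Malloc(n);
    if(!p)
        throw std::bad_alloc();
    return p;
}
void* operator new[](std::size_t n, const char* /*pFile*/, int /*line*/)
{
    void* const p = DemoHeap::Malloc(n);
    if(!p)
        throw std::bad_alloc();
    return p;
}

// Matching deletes. The compiler calls these only if a constructor throws.
void operator delete(void* p, const char* /*pFile*/, int /*line*/)   { DemoHeap::Free(p); }
void operator delete[](void* p, const char* /*pFile*/, int /*line*/) { DemoHeap::Free(p); }

// Destroys an object created with the file/line new and returns it to DemoHeap.
template <typename T>
void DemoDelete(T* p)
{
    if(p)
    {
        p->~T();
        DemoHeap::Free(p);
    }
}
```

In a real project you would also replace the global operator delete so that a plain delete expression reaches the same heap; the helper is used here only to keep the sketch from touching the global operators.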
The C++ language doesn't allow you to override malloc and free (as opposed to new and delete) at link time in any portable or straightforward way. To override malloc and free requires tricks played with the linker which are system-specific. Application usage of malloc and free can of course be overridden at compile time by #defining malloc and free to be something else that you provide. This technique would work for any source code that you are compiling but would not work for code that comes from some external library and is thus already compiled to use malloc and free directly.
GCC lets you override malloc and related functions via the --wrap linker option, currently documented at http://sourceware.org/binutils/docs-2.16/ld/Options.html#index-_002d_002dwrap-235. To use it you define your function prefixed with "__wrap_" (e.g. __wrap_malloc) and declare an extern version of the function prefixed with "__real_" (e.g. __real_malloc). You then supply a linker argument of the form "--wrap,<func>" (e.g. --wrap,malloc) for each function you want to wrap. When linking through the GCC driver, these arguments are passed to the linker with the -Wl prefix (e.g. -Wl,--wrap,malloc). Typically you would want to wrap malloc, calloc, realloc, memalign, and free (using the following linker arguments: "--wrap,malloc,--wrap,calloc,--wrap,realloc,--wrap,memalign,--wrap,free"). Here is some example code:
#include <stdlib.h>
extern "C" {
    // Resolved by the linker to the original malloc when malloc is wrapped.
    void* __real_malloc(size_t n);
    // Called in place of malloc; forward to your own allocator here.
    void* __wrap_malloc(size_t n)
    {
        return __real_malloc(n);
    }
}
int main(int, char**)
{
    free(malloc(123));
    return 0;
}
I want GeneralAllocator to automatically allocate all available memory on startup but it doesn't seem to be able to do that.
GeneralAllocator does not automatically try to make decisions about how it is to be used. Such decisions might be fine for one team but not for another. However, GeneralAllocator does let you initialize it with a hint of how much memory to start with and you can set this hint to be a very large number if you wish. Also, if GeneralAllocator needs memory but has none, it will attempt to obtain memory from the system (if you don't have this feature disabled).
GeneralAllocator in its Shutdown is not freeing the block of memory I gave it on Init.
The AddCore function (and the Init function) gives you two arguments: bShouldFreeCore and bShouldTrimCore. bShouldFreeCore lets the allocator free the core if during use it becomes unused. bShouldTrimCore lets the allocator free part of the core if it becomes unused. If you set bShouldFreeCore to false, then the allocator will never free the core, including on its shutdown. The reason for this is that very often the reason you set bShouldFreeCore to false is that the core simply cannot be freed (e.g. static memory) or the allocator can't know how to free the core. It may be worth considering adding an extra option to AddCore which allows freeing of the core only on Shutdown, but in the meantime manual freeing of user-supplied core has worked fine for people.
I tell GeneralAllocatorDebug to use some debug option with SetDefaultDebugDataFlag(s) and it doesn't work.
The most common cause of this is that you are using the option id and not the option flag. As noted in the function documentation, if you want to enable kDebugDataIdCallStack, you need to pass in (1 << kDebugDataIdCallStack) as the flag and not kDebugDataIdCallStack by itself. Another cause of this problem is specific to kDebugDataIdCallStack and happens when call stack tracing is simply not supported for the current platform.
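The difference between an id and a flag can be seen in a small sketch. The enumerators and their values below are hypothetical and chosen for illustration only; the real ids are defined in the GeneralAllocatorDebug header.

```cpp
// Hypothetical ids for illustration; these are not the PPMalloc values.
enum DemoDebugDataId
{
    kDemoIdName      = 0,
    kDemoIdPlace     = 1,
    kDemoIdGuard     = 2,
    kDemoIdCallStack = 3
};

// An id is a bit position; the corresponding flag is that bit set.
constexpr unsigned DemoIdToFlag(DemoDebugDataId id)
{
    return 1u << id;
}

// True if the bit for the given id is set in flags.
constexpr bool DemoIsEnabled(unsigned flags, DemoDebugDataId id)
{
    return (flags & DemoIdToFlag(id)) != 0;
}

// Correct: pass the flag (1 << id).
static_assert(DemoIsEnabled(1u << kDemoIdCallStack, kDemoIdCallStack), "flag enables call stack");

// Incorrect: passing the raw id 3 sets bits 0 and 1, which enables two
// unrelated options and leaves call stack tracing off.
static_assert(!DemoIsEnabled(kDemoIdCallStack, kDemoIdCallStack), "raw id does not enable call stack");
```

With these example values, passing the raw id silently turns on the options with ids 0 and 1 instead, which is why the mistake can be hard to spot.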
Why does GeneralAllocatorDebug seem to take long to shut down when exiting my app?
GeneralAllocatorDebug implements delayed freeing of blocks and depending on the settings, there may be thousands or tens of thousands of such blocks that need to be freed on shutdown. Validation checks are run on these blocks as they are freed and depending on other settings these validation checks can in sum take a relatively long time. Additionally, automatic heap validation can kick in frequently during the freeing of these blocks. Normally all of this is a good thing, as GeneralAllocator and GeneralAllocatorDebug are very strongly validating and do a good job of finding heap problems.
If you want to reduce or disable this slowness, you can reduce the delayed free settings, reduce the guard fill settings, reduce the PPM_DEBUG level to 1, and reduce the automatic heap validation level and/or frequency.
How do I use GeneralAllocator in a managed DLL?
There are only issues here if you want to declare a GeneralAllocator object as a global variable. The problem is that the managed C++ DLL startup code doesn't initialize global variables automatically. We don't have a single comprehensive answer to print here at this time (though a little research should rectify this), but you can read about the situation by reading documents such as this: http://support.microsoft.com/default.aspx?scid=kb;en-us;814472
For multiprocessing consoles, do I want a multiprocessor-savvy allocator such as Hoard or SmartHeap?
You probably don't want a multiprocessor-savvy allocator such as Hoard or SmartHeap. The reason is that these allocators are optimized for the case where memory is very frequently being allocated from multiple threads on multiple processors at the same time. An example of such a case might be a heavily stressed commercial web server. The downside to these allocators is that they are slower than conventional allocators and waste much more memory than conventional allocators. A well-written game application would seek to minimize the amount of dynamic memory allocation and would seek to reduce the allocation volume coming from multiple threads. Additionally, multiprocessor-savvy allocators don't generally start becoming beneficial unless the machine has at least four processors.
How do I force the usage of mapped memory on platforms that have it?
As of 5/2005, the platforms that support mapped memory are Windows, Xenon (a.k.a. XBox2), and most Unix variants. The PS3 supports a form of mapped memory which is not entirely understood at this early stage in development.
To force the usage of mapped memory, you can use the GeneralAllocator kAllocationFlagMMap allocation flag. Alternatively, you can fiddle with the GeneralAllocator option flags related to mapped memory:
kOptionMMapThreshold
kOptionMMapMaxAllowed
kOptionMMapTopDown
See the code documentation for up-to-date details on all of the above features.
Why must MallocAligned have an alignment offset of a multiple of 8 and not something smaller?
It is not possible to have aligned memory returned on multiples other than 8 without adding unusual hacks to the system that would either be slow, burn memory, or be unreliable. Such a hack, for example, would be to allocate a bunch of extra memory and fill it with a magic number that isn't likely to be duplicated and use that to tell that you have an unusually-aligned object. Just about any generalized heap (which doesn't implement aligned allocations via wasted memory) with a minimum alignment of 8 would have this same situation, and data types such as uint64_t and double just about require that generalized allocations be of a minimum alignment of 8.
There are the following resolutions:
Why doesn't GeneralAllocator let you specify an allocator-wide minimum alignment?
(Despite the following discussion, PPMalloc now has the PPM_MIN_ALIGN define which controls minimum alignment at compile time).
It can be imagined that it might be useful to set the allocator to always return memory with 32 byte alignment, as this would make all allocations friendly to cache line behaviour on platform X. However, GeneralAllocator does not support this behaviour on the grounds of efficiency and portability. Specifically:
Consider that the EA_ALIGN_OF (or GCC's __alignof__) operator can be used to help code detect alignment of a structure in an automatic way for you in most cases. Consider also that you can make a custom operator new that takes alignment, like this:
#include <new>
void* operator new(size_t size, size_t align)
{ return gAllocator.MallocAligned(size, align); }
void* operator new[](size_t size, size_t align)
{ return gAllocator.MallocAligned(size, align); }
Then users can do this:
Thing* pThing = new(16) Thing;
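If you want to experiment with an alignment-taking operator new without PPMalloc at hand, the following sketch may help. It makes several assumptions: DemoMallocAligned is a hypothetical stand-in for MallocAligned that over-allocates from std::malloc (the "wasted memory" approach, which is not how GeneralAllocator works), and DemoAlign is a hypothetical tag type. The tag is used because, under C++14 and later, a placement operator new taking a plain size_t can clash with the sized global operator delete on some compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical tag type carrying the requested alignment.
struct DemoAlign { std::size_t mValue; };

// Over-allocating aligned malloc. A back-pointer to the raw block is stored
// just below the returned address so the block can be freed later.
void* DemoMallocAligned(std::size_t size, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0); // power of two
    if(alignment < sizeof(void*))
        alignment = sizeof(void*); // keeps the back-pointer itself aligned

    void* const pRaw = std::malloc(size + alignment + sizeof(void*));
    if(!pRaw)
        return nullptr;

    const std::uintptr_t start   = reinterpret_cast<std::uintptr_t>(pRaw) + sizeof(void*);
    const std::uintptr_t aligned = (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);

    reinterpret_cast<void**>(aligned)[-1] = pRaw;
    return reinterpret_cast<void*>(aligned);
}

void DemoFreeAligned(void* p)
{
    if(p)
        std::free(static_cast<void**>(p)[-1]);
}

void* operator new(std::size_t size, DemoAlign align)
{
    void* const p = DemoMallocAligned(size, align.mValue);
    if(!p)
        throw std::bad_alloc();
    return p;
}

// Matching delete; the compiler calls it only if the constructor throws.
void operator delete(void* p, DemoAlign) { DemoFreeAligned(p); }

struct DemoThing { float mValues[4]; };
```

An object is then created with new(DemoAlign{16}) DemoThing and released by calling its destructor followed by DemoFreeAligned. With a real allocator the two Demo functions would be replaced by the allocator's aligned allocation and free calls.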
How do I swap two allocators?
You can swap two allocators via std::swap or via any typical equivalent, as the allocators are designed to support conventional C++ object manipulation constructs. For example:
StackAllocator sa1;
StackAllocator sa2;
std::swap(sa1, sa2);
GeneralAllocator allocates memory in unexpected ways.
If you create a GeneralAllocator instance and allocate a few blocks of memory and then do a memory dump, the memory dump might not look how you initially expected it to look. This can be especially true if you are doing aligned allocations and are doing frees of this memory, and if any of this is being done with GeneralAllocatorDebug as opposed to just GeneralAllocator. The reasons for this are many, and we could spend quite a few pages of text here explaining in detail what is going on, but suffice it to say that there are no known bugs in GeneralAllocator with respect to this and the reasons for unexpected memory layouts can be summarized as:
Another result of this is that sometimes users attempt to manually calculate memory overhead by subtracting the returned pointers of two successive allocations and incorrectly conclude that the overhead is more than it should be.
How is high memory implemented in GeneralAllocator?
For a heap that is a single large core block (e.g. 32MB), all memory that is below a given designated address is low memory and all memory that is above it is high memory. The value of this address changes at runtime as the user allocates memory from low and high memory. For a 32MB heap it starts out at the 16MB point, and as the user allocates a lot of low memory, the dividing point (mpHighFence) moves upward. Then if the user starts allocating high memory, the dividing point moves downward, but is always in between the top of low memory and the bottom of high memory.
The only complication comes in the case where there are multiple core blocks. In this case, the middle of the first core block is the dividing point and high memory is only memory between that middle point and the top of that core block. Low memory is all low memory in that core block and all memory from any other core blocks. If you exhaust the space from the first core block, the dividing point is moved to a position within the next core block with free space. As soon as that happens then all memory that was considered "high" in the original core block is now considered low and only memory above the dividing point in the new core block is considered high. If the user frees a core block with the dividing point in it, a new dividing point is found within another core block.
Memory that straddles the dividing point (i.e. start of block is before it but end of block is after it) is considered to be low memory. The only case this can happen is if all memory in between the low area and the high area has been exhausted and there are no other core blocks.
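The single core block case can be pictured with a toy model. This is a sketch under loud assumptions: it is not GeneralAllocator's implementation, it has no frees or free lists, and it simply treats the dividing point as the midpoint of the block clamped to lie between the top of low memory and the bottom of high memory. DemoTwoEndedArena and kDemoFail are hypothetical names.

```cpp
#include <cstddef>

const std::size_t kDemoFail = static_cast<std::size_t>(-1);

// Toy model of one core block: low allocations grow up from offset 0 and
// high allocations grow down from the top. Offsets are returned, not pointers.
class DemoTwoEndedArena
{
public:
    explicit DemoTwoEndedArena(std::size_t size)
      : mSize(size), mLowTop(0), mHighBottom(size) {}

    std::size_t AllocLow(std::size_t n)
    {
        if(n > mHighBottom - mLowTop)
            return kDemoFail;
        const std::size_t offset = mLowTop;
        mLowTop += n;
        return offset;
    }

    std::size_t AllocHigh(std::size_t n)
    {
        if(n > mHighBottom - mLowTop)
            return kDemoFail;
        mHighBottom -= n;
        return mHighBottom;
    }

    // Dividing point: starts at the midpoint, is pushed upward by low memory
    // and downward by high memory, and always stays between the two regions.
    std::size_t Fence() const
    {
        const std::size_t mid = mSize / 2;
        if(mid < mLowTop)
            return mLowTop;
        if(mid > mHighBottom)
            return mHighBottom;
        return mid;
    }

private:
    std::size_t mSize;
    std::size_t mLowTop;     // first byte above low memory
    std::size_t mHighBottom; // first byte of high memory
};
```

For a 32MB arena in this model the fence starts at 16MB; after 20MB of low allocations it sits at 20MB; a following 4MB high allocation lands at 28MB and leaves the fence where it is.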
How do I tell the amount of free heap space from GeneralAllocator? There's no simple accessor.
Aside from the GeneralAllocator::GetLargestFreeBlock function, and GeneralAllocatorDebug's Metrics functions, there isn't a single function to get the amount of free space. The reason for this is primarily three-fold:
However, you can get a decent practical estimate for the amount of free space by getting the amount of allocated memory and subtracting it from the size of the allocator's heap:
freesize = heapsize - allocator->GetMetrics(GeneralAllocatorDebug::kMetricTypeAll).mnAllocationVolume;
It might be useful to put this functionality into GeneralAllocator and document it as being only an estimate, as things like delayed frees (if you have them enabled) can make the free space look smaller than it is.
Also, aside from the above solutions there is the GetLargestFreeBlock function, which is another memory estimate designed to be very fast but has only so much use.
I get an assert or crash in GeneralAllocator. What do I do?
Most of the time asserts or crashes in GeneralAllocator are due to the user over-running or otherwise corrupting the heap. These asserts or crashes that are due to user-caused heap corruption are often seen in the GeneralAllocator::ClearFastBins function and sometimes occur in release builds and not debug builds. However, the asserts or crashes can occur elsewhere as well. If you are seeing crashes in release but not debug builds, it may be due to the fact that GeneralAllocator packs memory a little more tightly in release builds and just a single byte overrun can cause problems. You can test this by modifying the PPM_DEBUG_PRESERVE_PRIOR value (see the documentation). If an assert occurs while the user is freeing a pointer, the chances are that the pointer has already been freed previously. You can use GeneralAllocatorDebug's kOptionEnablePtrValidation to specifically detect double-frees as they are attempted.
If you are getting a crash in GeneralAllocator, the chances are at least 98% that it is due to user code corrupting the heap. Note that just because some previous memory manager that you used didn't crash doesn't mean your code was good. GeneralAllocator packs memory very efficiently, but is concomitantly more sensitive to user errors. That being said, there is always the possibility that some as-yet undetected problem with GeneralAllocator exists. There have been two significant GeneralAllocator bugs found in 2005 and both were cases where the heap was completely exhausted and the user requested more memory but GeneralAllocator didn't deal with the situation properly (i.e. it didn't simply return a NULL pointer to the user, whereas it should have done so).
See also the entry entitled How do I debug heap corruption problems?
How do I take advantage of large pages in Windows or XBox 360?
Large pages are a feature of XBox 360, Win64, and Win32 (Vista and later). A large page is a (usually) 64K page as opposed to a 4K page. Large pages allow memory accesses to be faster (due to lower system overhead) and should be taken advantage of in larger applications when possible. The way to do this with GeneralAllocator is to supply GeneralAllocator with core memory (e.g. AddCore) that uses large pages allocated via VirtualAlloc.
On XBox360, large pages are 64K and you can request them via VirtualAlloc by ORing MEM_LARGE_PAGES to the VirtualAlloc flAllocationType parameter. This will round up your allocation size to a multiple of the page size and give you the requested type of page.
On Windows platforms, large pages aren't necessarily 64K and you need to call GetLargePageMinimum to determine the minimum large page size. Also beware that on Windows large page requests may fail due to system resources being exhausted, and so you should have a fallback plan.
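A fallback plan for desktop Windows might look like the sketch below. DemoCore, DemoAllocCore, DemoFreeCore, and DemoRoundUpToPage are hypothetical names; the Win32 calls used are VirtualAlloc, VirtualFree, and GetLargePageMinimum. The non-Windows branch uses std::malloc purely so the sketch compiles elsewhere. On Windows the large page request typically also requires the process to hold the "lock pages in memory" privilege, which is one more reason it can fail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#if defined(_WIN32)
    #include <windows.h>
#endif

// Rounds size up to a multiple of pageSize.
inline std::size_t DemoRoundUpToPage(std::size_t size, std::size_t pageSize)
{
    assert(pageSize != 0);
    return ((size + pageSize - 1) / pageSize) * pageSize;
}

struct DemoCore
{
    void*       mpMemory;
    std::size_t mSize;
    bool        mbLargePages;
};

// Tries large pages first and falls back to regular pages on failure.
DemoCore DemoAllocCore(std::size_t size)
{
    DemoCore core = { nullptr, 0, false };
#if defined(_WIN32)
    const SIZE_T largePage = GetLargePageMinimum(); // 0 if large pages are unsupported
    if(largePage)
    {
        const std::size_t rounded = DemoRoundUpToPage(size, largePage);
        core.mpMemory = VirtualAlloc(NULL, rounded, MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES, PAGE_READWRITE);
        if(core.mpMemory)
        {
            core.mSize        = rounded;
            core.mbLargePages = true;
            return core;
        }
    }
    core.mpMemory = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE); // fallback
#else
    core.mpMemory = std::malloc(size); // stand-in so the sketch builds on other platforms
#endif
    if(core.mpMemory)
        core.mSize = size;
    return core;
}

void DemoFreeCore(DemoCore& core)
{
#if defined(_WIN32)
    if(core.mpMemory)
        VirtualFree(core.mpMemory, 0, MEM_RELEASE);
#else
    std::free(core.mpMemory);
#endif
    core.mpMemory = nullptr;
    core.mSize    = 0;
}
```

The returned memory would then be handed to the allocator as user-supplied core, and since the allocator cannot know how the memory was obtained, the caller would free it after the allocator has been shut down.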
Large page functionality is not available on Sony or Nintendo gaming platforms.
GeneralAllocator currently doesn't attempt to allocate large pages, as XBox 360 users normally supply core memory manually to GeneralAllocator instances. With respect to Windows platforms, the operating systems that support large pages aren't available as of this writing.
How can I debug memory leaks?
There are two primary tools for tracking memory leaks:
Additional techniques for tracking down leaks:
If GeneralAllocator reports a memory leak, it is 99% likely that it is in fact a leak and not a problem with the allocator itself. However, improper init/shutdown of the allocator can cause something to look like a leak when it isn't. This can happen, for example, when you shut down the allocator before you are done using the memory you allocated from it.
How do I debug heap corruption problems?
90% of heap corruption is due to the following simple causes:
Other causes of heap corruption include (but are not limited to) the following:
The first thing you want to do is use the ValidateHeap function and possibly the AutoHeapValidation feature. Additionally there is the delayed free feature and the pointer validation feature (kOptionEnablePtrValidation) to help find corruption. Make sure your PPM_DEBUG level is high enough. Another solution for some platforms is to create a page protected heap.
Most of the time asserts or crashes in GeneralAllocator are due to the user over-running or otherwise corrupting the heap. These asserts or crashes that are due to user-caused heap corruption are often seen in the GeneralAllocator::ClearFastBins function and sometimes occur in release builds and not debug builds. However, the asserts or crashes can occur elsewhere as well. If you are seeing crashes in release but not debug builds, it may be due to the fact that GeneralAllocator packs memory a little more tightly in release builds and just a single byte overrun can cause problems. You can test this by modifying the PPM_DEBUG_PRESERVE_PRIOR value (see the documentation). If an assert occurs while the user is freeing a pointer, the chances are that the pointer has already been freed previously. You can use GeneralAllocatorDebug's kOptionEnablePtrValidation to specifically detect double-frees as they are attempted.
If you are getting a crash in GeneralAllocator, the chances are at least 98% that it is due to user code corrupting the heap. Note that just because some previous memory manager that you used didn't crash doesn't mean your code was good. GeneralAllocator packs memory very efficiently, but is concomitantly more sensitive to user errors. That being said, there is always the possibility that some as-yet undetected problem with GeneralAllocator exists.
Here is some example code that might be useful:
    allocator.SetDefaultDebugDataFlag(EA::Allocator::GeneralAllocatorDebug::kDebugDataIdGuard);
    allocator.SetGuardSize(3.f, 128, 1024);
    allocator.SetOption(EA::Allocator::GeneralAllocatorDebug::kOptionEnablePtrValidation, 1);
    allocator.SetAutoHeapValidation(EA::Allocator::GeneralAllocator::kHeapValidationLevelDetail, 16);
    allocator.SetDelayedFreePolicy(EA::Allocator::GeneralAllocatorDebug::kDelayedFreePolicyCount, 1000);
The numbers above can be changed to use more aggressive numbers, though the return on investment gets flatter.
If you can run on a PC or have a lot of free memory, you can try using page protected allocations via EAStompAllocator (PPMalloc\dev\examples\EAStompAllocator.h/cpp).
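To make the guard-fill idea concrete, here is a conceptual sketch of how guard bytes catch an overrun at validation time rather than at the moment of the bad write. It is not PPMalloc's implementation; GuardedMalloc, ValidateGuard, GuardedFree, kGuardFill and kGuardSize are hypothetical names and values chosen for illustration, and the sketch sits on top of the standard malloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical guard parameters for this illustration.
const unsigned char kGuardFill = 0xAB;
const std::size_t   kGuardSize = 16;

// Header stored in front of the user block; aligned so the user pointer
// keeps malloc's usual alignment.
struct alignas(std::max_align_t) GuardHeader {
    std::size_t userSize;
};

// Allocates size bytes followed by kGuardSize bytes filled with kGuardFill.
inline void* GuardedMalloc(std::size_t size) {
    unsigned char* raw =
        static_cast<unsigned char*>(std::malloc(sizeof(GuardHeader) + size + kGuardSize));
    if (!raw)
        return nullptr;
    GuardHeader header;
    header.userSize = size;
    std::memcpy(raw, &header, sizeof(header));
    unsigned char* user = raw + sizeof(GuardHeader);
    std::memset(user + size, kGuardFill, kGuardSize);
    return user;
}

// Returns true if the guard bytes after the user block are still intact.
inline bool ValidateGuard(const void* p) {
    const unsigned char* user = static_cast<const unsigned char*>(p);
    GuardHeader header;
    std::memcpy(&header, user - sizeof(GuardHeader), sizeof(header));
    for (std::size_t i = 0; i < kGuardSize; ++i) {
        if (user[header.userSize + i] != kGuardFill)
            return false; // something wrote past the end of the block
    }
    return true;
}

inline void GuardedFree(void* p) {
    if (p)
        std::free(static_cast<unsigned char*>(p) - sizeof(GuardHeader));
}
```

A single-byte overrun such as writing to index 8 of an 8-byte block leaves the program running, but the next ValidateGuard call on that block returns false. This is why more frequent validation narrows down where the corruption happened.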
Why is the GeneralAllocator mutex unlocked when calling user hooks?
The reason is that keeping the mutex locked while calling a user hook is unsafe, because it can create deadlocks that the user may not be able to avoid. So GeneralAllocator puts the onus on the user to make the decision on locking, and gives the user the Lock function to do it right. If GeneralAllocator were to lock the mutex, the user would have no recourse in a deadlock situation, as the user cannot safely unlock the mutex on behalf of GeneralAllocator.
Consider this situation:
User A calls Malloc, which locks the mutex and calls the hook function.
User B calls system X, which locks its own mutex before calling GeneralAllocator to do something.
The hook function calls system X, which locks its own mutex before calling GeneralAllocator to do something.
In the above situation we have a deadlock because User A has the allocator mutex locked while User B has the X mutex locked, and User A needs the X mutex while User B needs the allocator mutex.
The only way for GeneralAllocator to deal with this in a safe way is to not lock the mutex or to possibly have a user-level option to not lock mutexes on hook callbacks and let the user beware.
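The pattern can be sketched as follows. HookedAllocator and SystemX are hypothetical stand-ins written for this illustration (built on std::mutex and the standard malloc); they are not GeneralAllocator's actual classes or hook API. The point shown is only the ordering: the allocator releases its own mutex before running user code, so a hook that enters another locked system, which in turn calls back into the allocator, cannot deadlock on the allocator mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for an allocator with a post-allocation hook.
class HookedAllocator {
public:
    using Hook = std::function<void(void*, std::size_t)>;

    void SetHook(Hook hook) { mHook = std::move(hook); }

    void* Malloc(std::size_t size) {
        void* p = nullptr;
        {
            std::lock_guard<std::mutex> lock(mMutex); // protects allocator state only
            p = std::malloc(size);
            if (p)
                ++mLiveCount;
        } // the mutex is released here, before any user code runs
        if (mHook)
            mHook(p, size);
        return p;
    }

    void Free(void* p) {
        std::lock_guard<std::mutex> lock(mMutex);
        if (p) {
            std::free(p);
            --mLiveCount;
        }
    }

    std::size_t LiveCount() {
        std::lock_guard<std::mutex> lock(mMutex);
        return mLiveCount;
    }

private:
    std::mutex  mMutex;
    Hook        mHook;
    std::size_t mLiveCount = 0;
};

// Hypothetical "system X": it locks its own mutex, then calls the allocator.
class SystemX {
public:
    explicit SystemX(HookedAllocator& allocator) : mAllocator(allocator) {}

    void Record(std::size_t size) {
        std::lock_guard<std::mutex> lock(mMutex);
        mTotalBytes += size;
        mLastLiveCount = mAllocator.LiveCount(); // takes the allocator mutex
    }

    std::size_t TotalBytes() {
        std::lock_guard<std::mutex> lock(mMutex);
        return mTotalBytes;
    }

    std::size_t LastLiveCount() {
        std::lock_guard<std::mutex> lock(mMutex);
        return mLastLiveCount;
    }

private:
    HookedAllocator& mAllocator;
    std::mutex       mMutex;
    std::size_t      mTotalBytes = 0;
    std::size_t      mLastLiveCount = 0;
};
```

Had Malloc invoked the hook while still holding its non-recursive mutex, the call to LiveCount from inside the hook would never return, even with a single thread.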
I've heard about using page protection to detect memory problems. How do I do that?
This technique works on machine architectures that support page-level memory allocation and mapping. The technique is to not use a general purpose allocator such as GeneralAllocator but instead to directly allocate pages for each user allocation and to align the returned pointer so that the end of the user's requested size is at the end of the page. The page after that is set to be non-read/write and thus any reads or writes the user attempts beyond the requested size are immediately trapped with an exception.
We provide an example of this technique supplied by Kevin Perry in the examples directory present in the PPMalloc package; see examples/EAStompAllocator.h/cpp.
Note that this technique is great for debugging but is perhaps not something you can ship with, as it is a little slow, chews up a lot of address space, and disables other allocator functionality that allocators such as GeneralAllocator have.
Current machines that support this functionality are XBox360 (a.k.a. Xenon)/PowerPC, XBox/x86, Win32/x86, Win64/x86-64, Win64-Itanium. Sony and Nintendo-derived platforms such as PS2, PS3, GameCube, and Revolution do not support this functionality.
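As a rough illustration of the technique (not the EAStompAllocator code itself), here is a sketch using the POSIX calls mmap and mprotect, which is an assumption made so the example can be tried on Linux or macOS; on the Windows-family platforms listed above the equivalent calls would be VirtualAlloc and VirtualProtect. StompBlock, StompAlloc, StompFree and SystemPageSize are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical bookkeeping for one page-protected allocation.
struct StompBlock {
    void*       user;    // pointer handed to the caller
    void*       base;    // start of the whole mapping
    std::size_t mapSize; // data pages plus one guard page
};

inline std::size_t SystemPageSize() {
    return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}

// Maps enough pages for size bytes plus one trailing guard page, and places
// the user block so that it ends exactly where the guard page begins.
inline bool StompAlloc(std::size_t size, StompBlock& out) {
    const std::size_t page = SystemPageSize();
    std::size_t dataPages = (size + page - 1) / page;
    if (dataPages == 0)
        dataPages = 1;
    const std::size_t mapSize = (dataPages + 1) * page;

    void* base = mmap(nullptr, mapSize, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return false;

    unsigned char* guard = static_cast<unsigned char*>(base) + dataPages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) { // guard page: no read, no write
        munmap(base, mapSize);
        return false;
    }

    out.user    = guard - size; // the block ends at the guard page
    out.base    = base;
    out.mapSize = mapSize;
    return true;
}

inline void StompFree(const StompBlock& block) {
    munmap(block.base, block.mapSize);
}
```

Note that the returned pointer is only as aligned as the requested size allows, and every allocation costs at least two pages, which is the address space and memory cost mentioned above.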
Why use a heap manager on XBox 360, PS3 and PC when they have a hardware MMU?
Yes and no. The system memory manager has 4 GB of address space to map, but it still has only n MB of physical memory to map to it. And while XBox, XBox 360, and PS3 have memory mapping, they don't have virtual memory (whereby memory is swapped to a hard drive when physical memory is exhausted). Thus, mapped memory that the system supplies must come from a store of free physical memory, and it must be allocated in page-sized blocks. Additionally, PS3 and XBox 360 run significantly faster (~10-15%) if you allocate mapped memory with 64 K pages as opposed to 4K pages, so any allocations that are mapped allocations will want to use 64K pages, though that potentially burns a lot more unused memory.
But the real issue is that you need to either do all your app's allocations with mapped memory, or reserve some memory for a conventional space-efficient heap and some memory for mapped allocations. If you choose to do all your app's allocations with mapped memory then you can waste a lot of memory, especially if you try to use those 64K pages. If you choose to do all your app's allocations with a conventional heap, you save a lot of space and run fast (because that heap itself was allocated with 64K pages). Chances are that you want a little of both. If you can figure out an ideal amount of memory to dedicate to the main regular heap on startup, then the rest of physical memory can be allocated as mapped memory. And PPMalloc's GeneralAllocator supports this as per above. GeneralAllocator also has a feature whereby, if mappable memory is exhausted, it can try to free some of its regular heap memory back to the system. However, this won't succeed if your regular heap memory is highly sliced up by allocations, so it's unclear that you could rely on it.
Is GeneralAllocator deterministic?
This would require a definition of deterministic, and more specifically, deterministic with respect to what? It seems to me that there are two primary aspects of determinism at hand here:
With respect to structural determinism, PPMalloc is deterministic if you supply it with consistent core. On a console, you generally obtain an N MB block of memory and hand it over to PPMalloc and tell it to use that. This will result in deterministic behaviour. You can also do this with desktop and server platforms such as Win32 and Linux, though people tend not to do this in the shipping version of such applications. With these desktop platforms, people often let PPMalloc get its 'core' memory from the system as it needs it and as the system provides it. This is non-deterministic, particularly on Windows.
With respect to temporal determinism, there are two kinds of determinism:
PPMalloc is deterministic with respect to the first item, but not the second item. If you do ten calls to Malloc, the amount of time each takes can and will differ, sometimes by an order of magnitude. If you are using Win32 and letting it obtain core automatically from the OS, then the time can vary by two or more orders of magnitude in the case that core has been exhausted and new core needs to be obtained from the system. This latter case is rare and shouldn't happen in a well-tuned game, but it is a theoretical possibility. Making a general heap that is fast, space-efficient, and yet temporally deterministic is a difficult exercise. You tend to get only two out of these three features. A strictly bin-based allocator will be fast and more temporally deterministic, but it wastes memory.
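If you want to see this kind of per-call variation for yourself, a small timing harness is enough. The sketch below measures the standard library malloc rather than PPMalloc, purely so that it is self-contained; MeasureMalloc and TimingStats are hypothetical names, and you would substitute your own allocator's call where std::malloc appears.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Fastest and slowest observed allocation times, in nanoseconds.
struct TimingStats {
    long long minNs;
    long long maxNs;
};

// Times each of count allocations of the given size individually.
inline TimingStats MeasureMalloc(std::size_t size, int count) {
    using Clock = std::chrono::steady_clock;
    TimingStats stats = { -1, 0 };
    std::vector<void*> blocks;
    blocks.reserve(static_cast<std::size_t>(count));

    for (int i = 0; i < count; ++i) {
        const Clock::time_point t0 = Clock::now();
        void* p = std::malloc(size); // replace with the allocator under test
        const Clock::time_point t1 = Clock::now();

        const long long ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        if (stats.minNs < 0 || ns < stats.minNs)
            stats.minNs = ns;
        if (ns > stats.maxNs)
            stats.maxNs = ns;
        blocks.push_back(p); // keep blocks live so the heap actually grows
    }

    for (void* p : blocks)
        std::free(p);
    return stats;
}
```

Comparing maxNs against minNs over a few thousand calls gives a quick feel for the spread; the absolute numbers depend heavily on the platform and clock resolution.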
How do I prevent users from using global malloc and new?
The two primary approaches to this are:
The first option is the easiest but VC++ doesn't make it easy to override malloc.
The second version can override malloc easily but some platforms (e.g. XBox 360) disallow writing to code and provide no memory unprotection mechanism that can get around it. You can write to code in a DLL on disk before loading the DLL, but this fails for malloc on XBox 360 because Microsoft doesn't provide a C runtime library in a DLL. A modification of the executable before running it would likely work, though the implementation of this is currently beyond the scope of this FAQ.
I am getting a linker error regarding gpEAGeneralAllocator.
You would get a link error if you are compiling code that references such a pointer. A couple of the specialized PPMalloc allocators (e.g. StackAllocator) reference such a variable. This variable is something that the application defines and initializes, as only the application knows what it wants the value to be set to. It is basically a standard name for the global GeneralAllocator instance. If you don't compile files which use gpEAGeneralAllocator, then you wouldn't have to define such a variable.
The Windows task manager is showing different memory stats than GeneralAllocatorDebug.
GeneralAllocator merely uses the lower level Windows allocation function to get its core memory: VirtualAlloc. So any discrepancies are going to have a logical explanation. GeneralAllocator is not doing anything unusual. In practice, there is no easy way that GeneralAllocator could provide/report the memory usage numbers reported by the task manager, as there are many more things that allocate memory than just the main application's heap, such as:
Almost certainly the above are a major part of discrepancies between Windows task manager values and GeneralAllocator values.
Why might I use PPMalloc as opposed to ptmalloc?
ptmalloc is a variant of dlmalloc which has some built-in multithreading knowledge.
How does PPMalloc compare to HeapAgent, BoundsChecker, Purify?
PPMalloc:
| + | Free and usable without limitations. |
| + | Is always present. Works with the codebase as you develop it. |
| + | Better supported than anything else. You get to email the actual author and get source code. |
| + | Provides just about all the conventional heap validation options that others have, plus some additional. This includes active bad memory write detection when using the page protected allocator (though it's not as flexible as with Purify). Actually, PPMalloc's heap validation is more rigorous than that provided by HeapAgent or BoundsChecker. |
| + | Works on all conceivable platforms and compilers and doesn't break when the compiler or OS changes. |
| − | Doesn't detect overruns as they occur (detects them when the heap or pointer is validated). The page protected allocator can provide some of this detection, though at some cost and limitations. |
| − | Debug functionality works only on PPMalloc heaps. Can't use it on the Windows provided heap, for example. |
HeapAgent:
| + | Has a GUI app to go with the heap validation functionality. |
| + | Probably better than BoundsChecker at heap validation, but does only heap validation. |
| − | Can't work with a user-implemented heap. |
| − | Is tied to compiler revisions. Change your compiler and your HeapAgent may break. |
BoundsChecker:
| + | Does Windows API validation, profiling, and coverage testing as well as doing memory validation. |
| − | Can't work with a user-implemented heap. |
| − | Is tied to compiler revisions. Change your compiler and your BoundsChecker may break. |
Purify:
| + | Works the best of all solutions… if you can get it to work. It instruments the binary code, unlike other solutions which patch malloc calls. Purify is rather like Valgrind (Linux tool). So it sees your bad memory writes as they occur. |
| + | Non-intrusive. You can just point it at an app in its /bin directory. |
| + | Has a working downloadable demo. |
| + | Can detect memory leaks as they occur instead of just when the app is exiting. |
| − | Cost is prohibitive for team-wide use. |
| − | Very flaky. We gave up on it after not being able to get it to work with large projects and trying technical support. |
| − | Very slow. Works only on small projects. It is unusable on large PC games such as Sims 2 and SimCity 4. |
| − | Can't fully work with a user-implemented heap. |
PPMalloc's debug heap functionality is actually fairly similar to HeapAgent and BoundsChecker. They don't do much more than PPMalloc. Purify is another story, as it is more powerful than any of the others. In practice you are most likely to be able to solve whatever problems you are having by simply cranking up the PPMalloc debug options. You will definitely solve them if they are repeatable. See other FAQ entries for information on the options.
Also there are the EACallstack and ExceptionHandler packages which provide additional debug functionality that may be useful. One of the things EACallstack has (in addition to its primary functionality) is the CallstackRecorder, which can be used to record and match refcount mismatches and resulting leaks.
How much overhead does GeneralAllocator Malloc have compared to built-in malloc?
Here is a comparison between the Microsoft C Runtime Library malloc and PPMalloc GeneralAllocator. An application was run which does nothing but allocate memory of a given size until it fails.
Number of allocations that could be done.
| size (bytes) | CRT Debug | CRT Release | PPMalloc Debug | PPMalloc Release |
| 0 | 7,783,292 | 15,578,941 | 31,096,762 | 31,151,737 |
| 4 | 7,783,292 | 15,578,941 | 31,096,762 | 31,151,737 |
| 8 | 7,783,292 | 15,578,941 | 20,731,155 | 31,151,737 |
| 16 | 6,225,129 | 15,578,941 | 15,548,365 | 20,767,806 |
| 32 | 5,188,860 | 10,383,434 | 10,365,576 | 12,460,687 |
| 64 | 3,891,646 | 6,230,066 | 6,219,334 | 6,922,601 |
| 128 | 2,594,427 | 3,461,148 | 3,455,191 | 3,664,894 |
| 256 | 1,556,655 | 1,832,783 | 1,829,197 | 1,887,957 |
| 512 | 864,805 | 944,164 | 942,298 | 958,490 |
| 1024 | 457,836 | 479,236 | 478,384 | 482,943 |
Volume of memory that could be allocated.
| size (bytes) | CRT Debug | CRT Release | PPMalloc Debug | PPMalloc Release |
| 0 | 0 | 0 | 0 | 0 |
| 4 | 31,133,168 | 62,315,764 | 124,387,048 | 124,606,948 |
| 8 | 62,266,336 | 124,631,528 | 165,849,240 | 249,213,896 |
| 16 | 99,602,064 | 249,263,056 | 248,773,840 | 332,284,896 |
| 32 | 166,043,520 | 332,269,888 | 331,698,432 | 398,741,984 |
| 64 | 249,065,344 | 398,724,224 | 398,037,376 | 443,046,464 |
| 128 | 332,086,656 | 443,026,944 | 442,264,448 | 469,106,432 |
| 256 | 398,503,680 | 469,192,448 | 468,274,432 | 483,316,992 |
| 512 | 442,780,160 | 483,411,968 | 482,456,576 | 490,746,880 |
| 1024 | 468,824,064 | 490,737,664 | 489,865,216 | 494,533,632 |
How do I override the malloc and free functions, as this isn't directly supported by C++ as with new/delete?
To do this on GCC-based applications you need to use the --wrap linker option (e.g. -Wl,--wrap=malloc -Wl,--wrap=free when linking through the GCC driver) to tell the linker that you want to wrap malloc and free with your own implementation. With VC++-based applications you should be able to just provide your own malloc and free functions and the linker will ignore the standard library versions.
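Here is a sketch of what the wrapping functions look like with the GNU linker. When an application is linked with -Wl,--wrap=malloc -Wl,--wrap=free, undefined references to malloc and free are resolved to __wrap_malloc and __wrap_free. In this illustration the wrappers forward to a tiny bump allocator (StandInMalloc and StandInFree, hypothetical names) that merely stands in for your GeneralAllocator instance; it never reuses freed memory and is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace {
    // Stand-in heap for this sketch; a real application would forward
    // to its GeneralAllocator instance instead.
    alignas(16) unsigned char gArena[64 * 1024];
    std::size_t gArenaUsed = 0;

    void* StandInMalloc(std::size_t size) {
        // Round up to 16 bytes so every returned pointer stays 16-byte aligned.
        const std::size_t rounded = (size + 15) & ~static_cast<std::size_t>(15);
        if (rounded < size || rounded > sizeof(gArena) - gArenaUsed)
            return nullptr; // overflow or arena exhausted
        void* p = gArena + gArenaUsed;
        gArenaUsed += rounded;
        return p;
    }

    void StandInFree(void*) {
        // A bump allocator cannot free individual blocks; intentionally empty.
    }
}

// Names required by the GNU linker's --wrap option.
extern "C" void* __wrap_malloc(std::size_t size) {
    return StandInMalloc(size);
}

extern "C" void __wrap_free(void* p) {
    StandInFree(p);
}
```

In a real build you would normally wrap calloc and realloc in the same way, since library code may obtain memory through any of them and later release it through free.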
What's the best way to setup GeneralAllocator for use as the main app heap?
For console platforms, a good pattern is to do the following:
For PC or server platforms (e.g. Windows, MacOS, Unix), it's usually best to not init GeneralAllocator with any core memory and to just let it allocate the core memory itself. It may be useful to set some of the options, but you may find you don't need to set any options. However, if you know that your app will be using at least N megabytes at runtime then it's probably a good idea to seed the allocator with that minimum N amount of memory. On Windows you would allocate that memory with VirtualAlloc; on Unix systems you can allocate it with sbrk or simply malloc.
I'm getting incomplete XBox 360 crash dumps when using PPMalloc.
This is due to the fact that the XBox 360 ignores physical memory when generating crash dumps. And by default PPMalloc GeneralAllocator allocates physical memory under XBox 360 (EA_PLATFORM_XENON). There are good reasons for it to allocate physical memory, so the default is there for a reason. A workaround for this problem is to allocate memory yourself for GeneralAllocator via VirtualAlloc (as opposed to XPhysicalAlloc). The downside is that this memory can't be used where the 360 requires physical memory.
I want MallocAligned to ensure the result is minimally aligned to the requested alignment and no more.
This kind of functionality is useful for debugging, whereby such a feature can help guard against mistaken alignment assumptions in code. GeneralAllocator doesn't have a means for the user to specify that the returned pointer is aligned to the user value (e.g. 32) and not coincidentally by more (e.g. 64). This is a feature that could probably be implemented but would take some effort to achieve a fully satisfactory implementation. However, if the user is willing to put up with some memory waste for this debug feature then this functionality can be built on top of a PPMalloc heap or any heap which has an aligned allocation function. See examples/MinimalAlignmentAllocator.h/cpp for an implementation provided by Dave Cope.
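One simple way to build this on top of any aligned allocation function is sketched below; it is not necessarily how the MinimalAlignmentAllocator example does it. The idea is to request memory aligned to twice the desired alignment and then offset the result by the desired alignment, which yields an address that is a multiple of the alignment but never of twice the alignment. MallocMinAligned and FreeMinAligned are hypothetical names, the alignment is assumed to be a power of two, and the sketch uses C++17 std::aligned_alloc (not available with every compiler's runtime) in place of a PPMalloc heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Returns a pointer aligned to exactly `alignment` (a power of two):
// a multiple of alignment, but not of 2 * alignment.
inline void* MallocMinAligned(std::size_t size, std::size_t alignment) {
    const std::size_t twice = alignment * 2;
    // Room for the offset, rounded up because std::aligned_alloc expects
    // the size to be a multiple of the requested alignment.
    const std::size_t total = ((size + alignment + twice - 1) / twice) * twice;
    unsigned char* raw = static_cast<unsigned char*>(std::aligned_alloc(twice, total));
    if (!raw)
        return nullptr;
    return raw + alignment; // raw is a multiple of 2A, so raw + A is an odd multiple of A
}

// The caller passes the same alignment that was used for the allocation.
inline void FreeMinAligned(void* p, std::size_t alignment) {
    if (p)
        std::free(static_cast<unsigned char*>(p) - alignment);
}
```

The cost is the memory waste mentioned above: each allocation carries at least one extra alignment's worth of padding.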
How do I test the system's memory? I suspect I have bad RAM.
On desktop platforms (Windows and Linux) it's best to run Memtest86+. On Macintosh you can run the Mac equivalent of Memtest. On PS3, the system provides a diagnostic feature which users can run. On XBox 360, you can build and run the MemoryTest app that Kevin Perry wrote and maintains on EA's internal Perforce server at //EAOS_SB/kperry/memorytest/. We don't currently have answers for other platforms.