Update: Improve performance by removing redundant memset().
The calloc() function is guaranteed to return zero-filled memory.
Either the libc or the kernel knows how to optimize this automatically, using tricks that depend on the architecture.
This makes calloc() potentially faster than malloc()+memset().
Zeroing the buffer with memset() right after calloc() just repeats work that is already done.
Remove the calls to memset() that follow a calloc() call.
This should never be slower and is expected to be faster, but the gain has not been measured.
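
For reference, a minimal sketch of the pattern this change removes. The struct and function names are hypothetical and are not taken from the tree; they only illustrate the before and after shape of an affected call site.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type, for illustration only. */
struct item {
    int id;
    char name[32];
};

/* Before: memset() zeroes memory that calloc() already returned zeroed. */
struct item *items_new_before(size_t n)
{
    struct item *p = calloc(n, sizeof(*p));

    if (p == NULL)
        return NULL;
    memset(p, 0, n * sizeof(*p)); /* redundant */
    return p;
}

/* After: rely on the zero-fill guarantee of calloc(). */
struct item *items_new_after(size_t n)
{
    return calloc(n, sizeof(struct item));
}
```

Both functions return the same all-zero buffer; the second simply skips the extra pass over it.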