Accidental Complexity in OpenSSL HMAC functions
SSL Documentation Analysis
This question pertains to the usage of the HMAC routines in OpenSSL.
The OpenSSL documentation is a tad on the weak side in certain areas. Profiling has revealed that when I use this function from the man page:
unsigned char *HMAC(const EVP_MD *evp_md, const void *key, int key_len, const unsigned char *d, int n, unsigned char *md, unsigned int *md_len);
40% of my library's runtime is devoted to creating and tearing down HMAC_CTXs behind the scenes.
There are also two additional functions to create and destroy a HMAC_CTX explicitly:
HMAC_CTX_init() initialises a HMAC_CTX before first use. It must be called.
HMAC_CTX_cleanup() erases the key and other data from the HMAC_CTX and releases any associated resources. It must be called when an HMAC_CTX is no longer required.
In the man page, these two function descriptions are prefixed with:
The following functions may be used if the message is not completely stored in memory
My data fits entirely in memory, so I choose the HMAC function -- the one whose signature is shown above.
The context, as described by the man page, is used through the following two functions:
HMAC_Update() can be called repeatedly with chunks of the message to be authenticated (len bytes at data).
HMAC_Final() places the message authentication code in md, which must have space for the hash function output.
The Scope of the Application
My application generates an authenticated (HMAC, which is also used as a nonce), CBC-BF encrypted protocol buffer string. The code will be interfaced with various web servers and frameworks: Windows / Linux as OS; nginx, Apache and IIS as web servers; and Python / .NET and C++ web-server filters.
The description above should clarify that the library needs to be thread safe, and potentially have resumable processing state -- i.e., lightweight threads sharing an OS thread (which might leave thread local memory out of the picture).
How do I get rid of the 40% overhead on each invocation in a (1) thread-safe / (2) resumable way? (2) is optional since I have all of the source data present in one go, and can make sure a digest is created in place without relinquishing control of the thread mid-digest-creation. So,
(1) can probably be done using thread local memory -- but how do I reuse the CTXs? Does the HMAC_Final() call make the CTX reusable?
(2) optional: in this case I would have to create a pool of CTXs.
(3) how does the HMAC function do this? Does it create a CTX in the scope of the function call and destroy it?
Pseudocode and commentary will be useful.
The documentation for the HMAC_Init_ex() function in OpenSSL 0.9.8g says:
HMAC_Init_ex() initializes or reuses a HMAC_CTX structure to use the function evp_md and key key. Either can be NULL, in which case the existing one will be reused.
(Emphasis mine). So this means that you can initialise a HMAC_CTX with HMAC_CTX_init() once, then keep it around to create multiple HMACs, as long as you don't call HMAC_CTX_cleanup() on it and you start off each HMAC with HMAC_Init_ex().
So yes, you should be able to do what you want with a HMAC_CTX in thread-local memory.
If you aren't trying to restrict your dependencies, you could choose a HMAC implementation that is self-contained and requires the user to explicitly control all the aspects that OpenSSL is, in its documentation, vague about. Many such simple C/C++ alternatives exist, but it is up to you to choose and evaluate such an alternative.