Struct hack equivalent in C++

The struct hack, where an array of length 0 is the last member of a struct, is well known from C90 and C99, and with the introduction of flexible array members in C99 we even got a standardized way of writing it with []. Unfortunately, C++ provides no such construct, and (at least with Clang 3.4) compiling a struct with either [0] or [] yields a warning with --std=c++11 -pedantic:

$ cat test.cpp 
struct hack {
  char filler;
  int things[0];
};
$ clang++ --std=c++11 -pedantic test.cpp
test.cpp:3:14: warning: zero size arrays are an extension [-Wzero-length-array]
  int things[0];

and similarly

$ cat test.cpp 
struct fam {
  char filler;
  int things[];
};
$ clang++ --std=c++11 -pedantic test.cpp
test.cpp:3:7: warning: flexible array members are a C99 feature [-Wc99-extensions]
  int things[];

My question, then, is this: say I want a struct in C++ whose last member is an array of variable size. What is the right thing to do, given a compiler that supports both? Should I go with the struct hack [0] (which is a compiler extension), or the FAM [] (which is a C99 feature)? As far as I understand it, either will work, but I am trying to figure out which is the lesser evil.

Also, before people start suggesting keeping an int* to a separately allocated piece of memory in the struct instead, that is not a satisfactory answer. I want to allocate a single piece of memory to hold both my struct and the array elements. Using a std::vector falls into the same category. If you wonder why I don't want to use a pointer instead, R.'s answer to another question gives a good overview.

There have been some similar questions elsewhere, but none give an answer to this particular question:

Answers


You can get more or less the same effect using a member function and a reinterpret_cast:

int* buffer() { return reinterpret_cast<int*>(this + 1); }

This has one major defect: it doesn't guarantee correct alignment. For example, something like:

struct Hack
{
    char size;
    int* buffer() { return reinterpret_cast<int*>(this + 1); }
};

is likely to return a mis-aligned pointer. You can work around this by putting the data in the struct in a union with the type whose pointer you are returning. If you have C++11, you can declare:

struct alignas(alignof(int)) Hack
{
    char size;
    int* buffer() { return reinterpret_cast<int*>(this + 1); }
};

(I think. I've never actually tried this, and I could have some details of the syntax wrong.)
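As a rough check of the alignas version, here is a minimal sketch. The names AlignedHack and buffer_is_aligned are made up for illustration. It relies on the fact that sizeof is always a multiple of the type's alignment, so this + 1 lands on an int boundary as long as the allocation itself is suitably aligned, which ::operator new provides for fundamental alignments.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical name; same layout as the alignas version of Hack above.
struct alignas(alignof(int)) AlignedHack
{
    char size;
    int* buffer() { return reinterpret_cast<int*>(this + 1); }
};

// sizeof is a multiple of the alignment, so `this + 1` is int-aligned.
static_assert(alignof(AlignedHack) >= alignof(int),
              "header must be at least int-aligned");
static_assert(sizeof(AlignedHack) % alignof(int) == 0,
              "buffer offset must be a multiple of alignof(int)");

// Allocates room for the header plus n ints and reports whether the
// trailing buffer pointer is correctly aligned for int.
bool buffer_is_aligned(std::size_t n)
{
    assert(n <= static_cast<std::size_t>(CHAR_MAX)); // size is stored in a char
    void* raw = ::operator new(sizeof(AlignedHack) + n * sizeof(int));
    AlignedHack* h = new (raw) AlignedHack();
    h->size = static_cast<char>(n);
    bool ok = reinterpret_cast<std::uintptr_t>(h->buffer()) % alignof(int) == 0;
    h->~AlignedHack();
    ::operator delete(raw);
    return ok;
}
```

Calling buffer_is_aligned(20) should report true on any implementation where the static_asserts hold.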

This idiom has a second important defect: it does nothing to ensure that the size field corresponds to the actual size of the buffer, and worse, there is no real way of using new here. To correct this somewhat, you can define a class-specific operator new and operator delete:

struct alignas(alignof(int)) Hack
{
    void* operator new( size_t, size_t n );
    void operator delete( void* );
    Hack( size_t n );
    char size;
    int* buffer() { return reinterpret_cast<int*>(this + 1); }
};

The client code will then have to use placement new to allocate:

Hack* hack = new (20) Hack(20);

The client still has to repeat the size, but he cannot ignore it.
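To make the declarations above concrete, here is one possible set of definitions, as a sketch only. SizedHack and sum_of_first are hypothetical names, the constructor is marked noexcept because no matching placement delete is provided (memory would leak if it threw), and writing ints into the trailing storage is assumed to be acceptable on the target implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct alignas(alignof(int)) SizedHack   // hypothetical name
{
    // Placement form: `bytes` is sizeof(SizedHack), `n` is the int count.
    static void* operator new(std::size_t bytes, std::size_t n)
    {
        return ::operator new(bytes + n * sizeof(int));
    }
    // Used by a plain `delete p;`.
    static void operator delete(void* p) { ::operator delete(p); }

    // noexcept: there is no placement delete to clean up after a throw.
    explicit SizedHack(std::size_t n) noexcept : size(static_cast<char>(n))
    {
        for (std::size_t i = 0; i != n; ++i)
            buffer()[i] = 0;
    }

    char size;
    int* buffer() { return reinterpret_cast<int*>(this + 1); }
};

// Allocates a SizedHack with n trailing ints, fills them with 0..n-1,
// and returns their sum.
int sum_of_first(std::size_t n)
{
    assert(n <= 127);                       // size is stored in a char
    SizedHack* hack = new (n) SizedHack(n); // the size appears twice
    for (std::size_t i = 0; i != n; ++i)
        hack->buffer()[i] = static_cast<int>(i);
    int total = 0;
    for (std::size_t i = 0; i != n; ++i)
        total += hack->buffer()[i];
    delete hack;
    return total;
}
```

With n = 20 the buffer holds 0 through 19, so sum_of_first(20) returns 190.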

There are also techniques which can be used to prevent creating instances which aren't allocated dynamically, etc., to end up with something like:

struct alignas(alignof(int)) Hack
{
private:
    void operator delete( void* p )
    {
        ::operator delete( p );
    }
    //  ban all but dynamic lifetime (and also inheritance, member, etc.)
    ~Hack() = default;

    //  ban arrays
    void* operator new[]( size_t ) = delete;
    void operator delete[]( void* p ) = delete;
public:
    Hack( size_t n );
    void* operator new( size_t, size_t n )
    {
        return ::operator new( sizeof(Hack) + n * sizeof(int) );
    }
    char size;
    //  Since dtor is private, we need this.
    void deleteMe() { delete this; }
    int* buffer() { return reinterpret_cast<int*>(this + 1); }
};

Given the fundamental dangers of such a class, it is debatable if so many protective measures are necessary. Even with them, it's really only usable by someone who fully understands all of the constraints, and is carefully paying attention. In all but extreme cases, in very low level code, you'd just make the buffer a std::vector<int> and be done with it. In all but the lowest level code, the difference in performance would not be worth the risk and effort.

EDIT:

As a point of example, g++'s implementation of std::basic_string uses something very similar to the above, with a struct containing a reference count, the current size and the current capacity (three size_t), followed directly by the character buffer. And since it was written long before C++11 and alignas/alignof, something like std::basic_string<double> will crash on some systems (e.g. a Sparc). (While technically a bug, most people do not consider this a critical problem.)


This is C++, so templates are available:

template <int N>
struct hack {
    int filler;
    int thing [N];
};

Casting between different pointers to different instantiations will be the difficult issue, then.
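A small sketch of why the casting is awkward: each value of N produces an unrelated type, so code that should accept any length has to be a template as well. The names sized_hack, sum and sum_of_three are illustrative only.

```cpp
#include <cassert>
#include <type_traits>

template <int N>
struct sized_hack {        // hypothetical name, same shape as hack<N> above
    int filler;
    int thing[N];
};

// Each N gives a distinct type; the pointers do not convert implicitly.
static_assert(!std::is_convertible<sized_hack<3>*, sized_hack<5>*>::value,
              "different instantiations are distinct types");

// A function that accepts any length must itself be a template.
template <int N>
int sum(const sized_hack<N>& h)
{
    int total = 0;
    for (int i = 0; i != N; ++i)
        total += h.thing[i];
    return total;
}

int sum_of_three()
{
    sized_hack<3> h = {0, {1, 2, 3}};
    assert(sizeof(h) >= 4 * sizeof(int)); // filler plus three elements
    return sum(h);
}
```

Note that N must be known at compile time, which is a real limitation compared with the struct hack.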


The first thing that comes to mind is DON'T: don't write C in C++. In 99.99% of cases this hack is not needed, won't make any noticeable improvement in performance over just holding a std::vector, and will complicate your life and that of the other maintainers of the project in which you deploy it.

If you want a standard-compliant approach, provide a wrapper type that dynamically allocates a chunk of memory large enough to contain the hack (minus the array) plus N*sizeof(int) for the equivalent of the array (don't forget to ensure proper alignment). The class would have accessors that map the members and the array elements to the correct location in memory.

Ignoring alignment and boiler plate code to make the interface nice and the implementation safe:

#include <cstdlib>  // std::malloc, std::free
#include <new>      // placement new

template <typename T>
class DataWithDynamicArray {
   void *ptr;
   int* array() {
      // char* to int* needs reinterpret_cast; static_cast would not compile
      return reinterpret_cast<int*>(static_cast<char*>(ptr)+sizeof(T)); // align!
   }
public:
   DataWithDynamicArray(int size) : ptr() {
      ptr = std::malloc(sizeof(T) + sizeof(int)*size); // force correct alignment
      new (ptr) T();  // construct the header in the allocated block
   }
   ~DataWithDynamicArray() {
      static_cast<T*>(ptr)->~T();
      std::free(ptr);
   }
// copy, assignment...
   int& operator[](int pos) {
       return array()[pos];
   }
   T& data() {
      return *static_cast<T*>(ptr);
   }
};

struct JustSize { int size; };
DataWithDynamicArray<JustSize> x(10);
x.data().size = 10;
for (int i = 0; i < 10; ++i) {
    x[i] = i;
}

Now I would really not implement it that way (I would avoid implementing it at all!!), as for example the size should be a part of the state of DataWithDynamicArray...

This answer is provided only as an exercise, to explain that the same thing can be done without extensions, but beware this is just a toy example that has many issues including but not limited to exception safety or alignment (and yet is better than forcing the user to do the malloc with the correct size). The fact that you can does not mean that you should, and the real question is whether you need this feature and whether what you are trying to do is a good design at all or not.
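The alignment that the toy example skips could be handled roughly as follows. This is only a sketch: align_up, array_offset, total_bytes, array_in and OneByte are hypothetical helpers, and it assumes the block comes from malloc (so its start is aligned for T) and that alignments are powers of two.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `offset` up to the next multiple of `alignment` (a power of two).
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Byte offset of the int array that follows a header of type T.
template <typename T>
constexpr std::size_t array_offset()
{
    return align_up(sizeof(T), alignof(int));
}

// Total bytes to allocate for one T followed by `count` ints.
template <typename T>
constexpr std::size_t total_bytes(std::size_t count)
{
    return array_offset<T>() + count * sizeof(int);
}

// Pointer to the array region inside a block laid out as above.
template <typename T>
int* array_in(void* block)
{
    assert(block != nullptr);
    return reinterpret_cast<int*>(static_cast<char*>(block) + array_offset<T>());
}

struct OneByte { char c; };  // hypothetical header type, sizeof == 1

// A one-byte header is padded up to the alignment of int.
static_assert(array_offset<OneByte>() == alignof(int),
              "array must start on an int boundary");
```

In the wrapper, total_bytes<T>(size) would replace the malloc argument and array_in<T>(ptr) would replace the body of array().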


If you really feel the need to use a hack, why not just use

struct hack {
  char filler;
  int things[1];
};

followed by

hack_p = (struct hack*)malloc(sizeof(struct hack)+(N-1)*sizeof(int));

Or don't even bother about the -1 and live with a little extra space.


C++ does not have the concept of "flexible arrays". The only way to have a flexible array in C++ is to use a dynamic array - which leads you to use int* things. You will need a size parameter if you are attempting to read this data from a file so that you can create the appropriate sized array (or use a std::vector and just keep reading until you reach the end of the stream).

The "flexible array" hack keeps spatial locality (that is, the array's memory sits in one contiguous block with the rest of the structure), which you lose when you are forced to use dynamic memory. There isn't really an elegant way around that (e.g. you could allocate a large buffer, but you would have to make it large enough to hold any number of elements you wanted, and if the actual data being read in was smaller than the buffer, there would be wasted space allocated).

"Also, before people start suggesting keeping an int* to a separately allocated piece of memory in the struct instead, that is not a satisfactory answer. I want to allocate a single piece of memory to hold both my struct and the array elements. Using a std::vector also falls into the same category."

That is the way you would do it in C++. You can down-vote it all you want, but the fact remains: a non-standard extension is not going to work when you move to a compiler that does not support it. If you keep to the standard (e.g. avoid using compiler-specific hacks), you are less likely to run into these types of issues.
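For comparison, the standard route described in this answer might look like the sketch below. Record, read_record and parse_record are illustrative names, and the input format (whitespace-separated integers read until the stream ends) is an assumption.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Record {               // hypothetical name
    char filler;
    std::vector<int> things;  // separately allocated, grows as needed
};

// Keeps reading ints until extraction fails or the stream ends.
Record read_record(std::istream& in)
{
    Record r;
    r.filler = 0;
    int value;
    while (in >> value)
        r.things.push_back(value);
    return r;
}

// Convenience wrapper for trying the reader on an in-memory string.
Record parse_record(const std::string& text)
{
    std::istringstream in(text);
    Record r = read_record(in);
    assert(in.eof() || in.fail()); // the loop only stops on one of these
    return r;
}
```

The elements live in a second allocation, which is exactly the locality trade-off discussed above.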


There is at least one advantage for flexible array members over zero length arrays when the compiler is clang.

struct Strukt1 {
    int fam[];
    int size;
};

struct Strukt2 {
    int fam[0];
    int size;
};

Here clang will error if it sees Strukt1 but won't error if it instead sees Strukt2. gcc and icc accept either without errors and msvc errors in either case. gcc does error if the code is compiled as C.

The same applies for this similar but less obvious example:

struct Strukt3 {
    int size;
    int fam[];
};

struct Strukt4 {
    Strukt3 s3;
    int i;
};
