Lords Of Tech

(Sort of) Reflection in C++11

This is yet another trick to achieve reflection-like functionality before Reflection TS, this time needing only C++11. It’s based on using CRTP to fill the object with carefully chosen garbage before initialisation.

In C++, it’s often annoying to implement boring functionality for a class, such as printing some of its members, serialising them, registering them somewhere and so forth. Reflection TS should help once its standardisation is complete. Until then, obscure tricks such as the stateful metaprogramming described in Defect Report 2118 can produce a list of the member types of aggregate-initialisable classes, and code generation can help at the cost of complicating the build system.

While trying to attach names to members that were serialised to JSON using the stateful metaprogramming trick, I found a new way to do it that needs only C++11. It cannot analyse the content of an arbitrary class, but it can access members that are initialised using methods of a CRTP parent. A class like this:

struct SupplyState : Printable<SupplyState> {
  float voltage = printed(" V");
  uint8_t resistanceCompensation;
  double current = printed(" mA");
};

can be used with this:

SupplyState state;
state.voltage = 220.54;
state.current = 10.042;
std::cout << state << std::endl;

to print {220.54 V, 10.042 mA}. Of course, it could also be used to read from a file or the strings could be used as attribute names for XML or JSON serialisation and deserialisation. A very similar trick can be used to allow custom types to learn the address of the object whose members they are.

Similar, but easy

Let’s start with an easier task. We have a standard protocol for communicating with some devices. The function that sends the messages accepts a number that identifies the value, and the value itself. There are plenty of such devices, each with its own set of control messages, so we’d like to implement representations of the devices as conveniently as possible. Here is a way it could be done:

struct DivergentOscillator : Connected {
	Connection<float, 3> voltage = this;
	Connection<float, 7> frequency = this;
};

Doing this is rather easy. The Connection class stores the pointer to the parent object and uses it to access the messaging function.

template <typename T, int operation>
class Connection {
	Connected* parent;
	T value;
public:
	Connection(Connected* parent) : parent(parent) {}
	T operator=(T assigned) {
		parent->sendMessage(operation, assigned);
		value = assigned;
		return assigned;
	}
	operator T&() {
		return value;
	}
};

However, if the types are not custom or there isn’t a fitting assignment, it becomes harder.

Okay, now somewhat harder

Now, how about doing it this way:

struct DivergentOscillator : Connected {
	Connection<float> voltage = connect(3);
	Connection<float> frequency = connect(7);
};

It’s somewhat harder, but not a big deal. It all lies in a proper implementation of the connect() method.

// Inside the Connected class
struct Connector {
	Connected* parent;
	int operation;
	template <typename T>
	operator Connection<T>() {
		// Assuming Connection has an updated constructor
		return Connection<T>(parent, operation);
	}
};
Connector connect(int operation) {
	// Connector is an aggregate, so C++11 needs brace initialisation here
	return Connector{this, operation};
}

Of course, the constructor of Connection could accept the Connector class instead of the Connector class having a conversion operator. I needed to showcase this for the next part.

The problem with general types

Now, going back to the first example:

struct SupplyState : Printable<SupplyState> {
  float voltage = printed(" V");
  uint8_t resistanceCompensation;
  double current = printed(" mA");
};

This seems somewhat similar. The Printable class can store a vector of functions holding lambdas that print the individual members, and the printed() method fills it.

// The conversion operator of the class returned by printed()
template <typename T>
operator T() {
	T retval;
	// Capturing retval by reference is only valid if it is the member itself
	parent->printers.push_back([&](std::ostream& out) {
		out << retval << suffix;
	});
	return retval;
}

However, there’s a catch. The variable named retval may not be created at the address of the member being initialised. Eliding the copy of a named local variable is allowed but never required, not even after the changes in C++17, which guarantee elision only for unnamed temporaries. In my experiments, it seems to be done for larger classes. Is there a way to learn the member’s address anyway?

Learning the return address in class member initialisation

I got stuck here for over a year. But then, while taking a shower, I got enlightened.

Garbage is the key. It’s possible to exploit uninitialised variables. When an object is being initialised, its members are initialised in declaration order, which in practice means from the start of the object towards its end, each one overwriting garbage. The initialisation progress can be snapshotted just before every initialisation. Knowing the changes done by every initialisation is enough to learn the member’s offset. Members initialised differently, padding and uninitialised members can be correctly ignored. The parent can access all of the child’s bytes thanks to CRTP.

A visualisation of memory before and after the member’s initialisation

This is an algorithm describing its operation:

  1. When the printed() function is called, find the last non-garbage byte and save its index along with the lambda that processes the member
  2. Repeat this for every member initialised through printed()
  3. When initialisation is done, find the first non-garbage byte after each saved index; these are the addresses of the members

Implementation necessities

There is a problem with the finalisation step: it needs to run code provided by the parent after the child class’s initialisation. This is not doable for an individual object, but it can be done once for the entire class. The parent class can use a static member variable to make its constructor behave differently during analysis and construct a second instance of the class in order to study it afterwards. All the inferred information can be stored in a static member and used for all instances.

Another problem is the definition of garbage. It can be a magic number, let’s choose 13, since it’s unlikely that any relevant member would start with byte 13. However, other variables can be set to 13 (0x0d) too. This can be avoided by initialising the class twice, with different magic numbers. This could fail only if the class was not always initialised in the same way (in combination with bad luck). Pointers to structures that are dynamically allocated at a different location every time would not break this, because dynamically allocated objects are placed at aligned addresses, so their addresses are always even while both magic numbers are odd (to be precise, this holds on little-endian or 64-bit architectures). Placement new is needed to construct the object in the prepared garbage.

It would also be a problem for classes that violate the single responsibility principle by both holding values and having complex constructors. The problem is that a constructor body can write to a still garbage-filled member after the initialisation of later members is done. This is a bad, error-prone practice anyway.

The actual code

#include <iostream>
#include <vector>
#include <array>
#include <functional>
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <memory>
#include <stdexcept>

template <typename Child>
  class Printable {
    struct MappingElement {
      int size;
      std::array<int, 2> lastInitialisedBefore;
      const char* name;
    };
    struct MappingInfo {
      std::vector<MappingElement> elements;
      const Child* instance;
      int index;
      int parentOffset;
    };
    static MappingInfo*& mappingInfo() {
      static MappingInfo* instance = nullptr;
      return instance;
    }

    enum class InitialisationState: uint8_t {
      UNINITIALISED,
      INITIALISING,
      INITIALISING_AGAIN,
      INITIALISED
    };
    static InitialisationState& initialisationState() {
      static InitialisationState state = InitialisationState::UNINITIALISED;
      return state;
    }

    struct SubPrinter {
      virtual void print(std::ostream& out, Printable<Child>* base) = 0;
      int offset;
      virtual~SubPrinter() = default;
    };

    static std::vector<std::unique_ptr<SubPrinter>>& subPrinters() {
      static std::vector<std::unique_ptr<SubPrinter>> instance;
      return instance;
    }

    static int& parentOffset() {
      static int instance;
      return instance;
    }

    constexpr static int8_t garbageNumber1 = 13;
    constexpr static int8_t garbageNumber2 = -13;

    struct Assigner {
      template <typename Member>
        operator Member() {
          if (Printable::mappingInfo()) {
            MappingInfo& mappingInfo = *Printable::mappingInfo();
            mappingInfo.elements[mappingInfo.index].size = sizeof(Member);
            Printable::InitialisationState state =
                        Printable::initialisationState();
            int8_t garbageNumber =
                  (state == InitialisationState::INITIALISING)
                        ? Printable::garbageNumber1
                        : Printable::garbageNumber2;

            int lastUninitialised = sizeof(Child) - 1;

            // Valgrind will complain
            while (lastUninitialised
                  && reinterpret_cast<const int8_t*>(mappingInfo.instance)
                  [lastUninitialised - 1] == garbageNumber) {
              lastUninitialised--;
           }

            mappingInfo.elements[mappingInfo.index].lastInitialisedBefore
                        [state == InitialisationState::INITIALISING_AGAIN]
                        = lastUninitialised;

            if (state == InitialisationState::INITIALISING) {
              struct: SubPrinter {
                void print(std::ostream& out,
                                          Printable<Child>* base) override {
                  Member *member = reinterpret_cast<Member*>
                              (reinterpret_cast<uint64_t>(base)
                              + SubPrinter::offset + base->parentOffset());
                  out << *member << name;
                }
                const char* name; // The offset will be set later
              }
              subPrinter;
              subPrinter.name =
                        mappingInfo.elements[mappingInfo.index].name;
              subPrinters().emplace_back(
                              new decltype(subPrinter)(subPrinter));
            }

            mappingInfo.index++;
          }
          return Member {};
        }
    };

    static std::vector<std::function<void(Printable*)>>& printers() {
      static std::vector<std::function<void(Printable*)>> instance;
      return instance;
    }

    protected:

      Assigner printed(const char* name) {
        if (Printable::initialisationState() ==
                        InitialisationState::INITIALISING) {
          mappingInfo()->elements.emplace_back();
          mappingInfo()->elements.back().name = name;
        }
        return Assigner();
      }
   template <typename... Args>
    Printable(Args... args) {
      if (initialisationState() == InitialisationState::UNINITIALISED) {
        // Prepare stuff
        initialisationState() = InitialisationState::INITIALISING;
        MappingInfo info;
        mappingInfo() = &info;

        // Create the child class in specially prepared garbage
        constexpr int allocatedSize = sizeof(Child) / sizeof(void*) + 1;
        std::array<std::array<void* , allocatedSize>, 2> allocated;
        // Allocate as void* to have proper padding
        std::array< int8_t*, 2> childBytes;
        struct ChildDestroyer {
                  // We must assure proper destruction of Child,
                  // even if an exception is called
          Child* child = nullptr;
          ~ChildDestroyer() {
            if (child)
              child->~Child();
          }
        };
        std::array<ChildDestroyer, 2>destroyers;
        auto makeChild = [&](int index, int garbageNumber) {
          info.instance = reinterpret_cast<Child*>(&allocated[index]);
          info.index = 0;
          destroyers[index].child = new(&allocated[index]) Child(args...);
          childBytes[index] = reinterpret_cast<int8_t*>
                                    (destroyers[index].child);
        };
        makeChild(0, garbageNumber1);
        parentOffset() = reinterpret_cast<uint64_t>(
                  static_cast<const Printable<Child>*> (info.instance))
                  - reinterpret_cast<uint64_t>(info.instance);

        // Do it again
        initialisationState() = InitialisationState::INITIALISING_AGAIN;
        makeChild(1, garbageNumber2);

        // Check where garbage was left
        for (unsigned int i = 0; i < info.elements.size(); i++) {
          int start = std::max(info.elements[i].lastInitialisedBefore[0],
                        info.elements[i].lastInitialisedBefore[1]);
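          // Stop at the end of the object instead of reading past it
          while (start < static_cast<int>(sizeof(Child))
                        && childBytes[0][start] == garbageNumber1
                        && childBytes[1][start] == garbageNumber2) {
            start++;
          }
          // Reaching the end means no initialised byte was found
          if (start >= static_cast<int>(sizeof(Child)))
                  throw std::logic_error("Reflection failed");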
          subPrinters()[i]->offset = start;
        }

        mappingInfo() = nullptr;
        initialisationState() = InitialisationState::INITIALISED;
      } else if (initialisationState() == InitialisationState::INITIALISING
          || initialisationState() == InitialisationState::INITIALISING_AGAIN) {
        // We need to keep track of what is allocated and what is trash
        void* start = reinterpret_cast<void*>(reinterpret_cast<uint64_t>(this));
        uint8_t garbageNumber = (initialisationState() ==
               InitialisationState::INITIALISING) ? garbageNumber1 :
                garbageNumber2;
        // Fill everything from this subobject up to the end of the child
        size_t length = sizeof(Child) - (reinterpret_cast<uint64_t>(this)
                - reinterpret_cast<uint64_t>(mappingInfo()->instance));
        memset(start, garbageNumber, length);
      }
    }

    friend std::ostream& operator<<(std::ostream& out,
                              Printable<Child>& instance) {
      out << "{";
      for (unsigned int i = 0; i < subPrinters().size(); i++) {
        subPrinters()[i]->print(out, &instance);
        if (i < subPrinters().size() - 1)
          out << ", ";
      }
      out << "}";
      return out;
    }

  };

// USAGE

struct SupplyState: Printable<SupplyState> {
  uint16_t address = printed("");
  float voltage = printed(" V");
  uint8_t voltageCompensation = 13;
  float current = printed(" mA");
  int8_t currentCompensation = -13;
  int8_t currentCorrection = 13;
  double frequency = printed(" Hz");
};

int main() {
  SupplyState state;
  state.address = 13;
  state.voltage = 220.54;
  state.current = 10.042;
  state.frequency = 50.00434;
  std::cout << state << std::endl;
}

The output is:

{13, 220.54 V, 10.042 mA, 50.0043 Hz}

Alternative usage: supplying pointer to parent to member variables

Pre-initialisation garbage can also be used in another way. A parent object can use it to supply its address to member objects, allowing them to use it without needing any explicit code to connect them (unlike in the Connection example). CRTP can be used to make sure that only the space the object is going to occupy is overwritten. With the usual object layout, pointers are always placed at addresses divisible by their size, so each repeated copy of the address starts at the correct byte. The members must leave their parent pointer uninitialised for the value to survive, which is why Connection has no constructor here.

#include <cstdint>
#include <iostream>

struct ConnectedBase {
  void* content;
  template <typename T>
  void sendMessage(int operation, T message) {
    // Should send data into some device rather than print it
    std::cout << "Ordering operation " << operation << " with argument " << message << std::endl;
  }
};

template <typename Child>
struct Connected : ConnectedBase {
  Connected() {
    int left = reinterpret_cast<uint64_t>(static_cast<Child*>(this))
                  + sizeof(Child) - reinterpret_cast<uint64_t>(this); 
    for (unsigned int i = sizeof(ConnectedBase) / sizeof(Child*);
                  i < left / sizeof(Child*); i++)
      reinterpret_cast<ConnectedBase**>(this)[i] = this;
  }
};

template <typename T, int operation>
class Connection {
	ConnectedBase* parent;
	T value;
public:
	T operator=(T assigned) {
		parent->sendMessage(operation, assigned);
		value = assigned;
		return assigned;
	}
	operator T&() {
		return value;
	}
};

struct DivergentOscillator : Connected<DivergentOscillator> {
	Connection<float, 3> voltage;
	Connection<float, 7> frequency;
};

int main() {
  DivergentOscillator oscillator;
  oscillator.voltage = 20;
  oscillator.frequency = 10000;
}

The output of this program is:

Ordering operation 3 with argument 20
Ordering operation 7 with argument 10000
