Lords Of Tech

Writing a JSON library in 2 hours in C++

C++ has a bad reputation for long development times. However, as this task has shown, it does not deserve this reputation.

JSON (short for JavaScript Object Notation) is a human-readable data exchange format. It’s usually used for the same purposes as XML, but its rigid structure makes it more convenient in most cases. Because it is valid JavaScript code, it can be loaded directly as JavaScript data structures. This, however, isn’t true for other languages.

C++ has plenty of libraries for dealing with JSON. However, I needed a small one to bundle with a small project, so that the project would not gain additional dependencies that complicate its use. So I decided to write one.

Requirements

The JSON library needed to have these features:

  • A data structure for representing JSON
  • A parser that will convert strings into the JSON data structure
  • A serialiser that will write the JSON data structure as string
  • No dependencies outside standard C++14 libraries

Performance and convenience of usage were not critical.

The JSON structure

A JSON value is of one of the following types:

  • A null value
  • A boolean value
  • A number
  • A string
  • A hashtable of string-indexed JSON values
  • An array of JSON values

The most straightforward way to implement this was the usual object-oriented code: an interface with getters for all types of content, overridden by classes holding specific types.

Because C++ does not distinguish between parent classes and interfaces, the parent interface provides default implementations of all access methods, and these only throw exceptions. The interface itself can be used to represent the null type, because nothing can be obtained from that type.

All types can be represented by similar C++ structures from the C++11 standard:

  • null is not represented, RTTI is enough to determine it’s null
  • number is represented by double
  • boolean is represented by bool
  • string is represented by std::string
  • object is represented by std::unordered_map<std::string, std::shared_ptr<JSON>>
  • array is represented by std::vector<std::shared_ptr<JSON>>
enum class JSONtype : uint8_t {
  NIL,
  STRING,
  NUMBER,
  BOOL,
  ARRAY,
  OBJECT
};
struct JSON {
  inline virtual JSONtype type() {
    return JSONtype::NIL;
  }
  inline virtual std::string& getString() {
    throw(std::runtime_error("String value is not really string"));
  }
  inline virtual double& getDouble() {
    throw(std::runtime_error("Double value is not really double"));
  }
  inline virtual bool& getBool() {
    throw(std::runtime_error("Bool value is not really bool"));
  }
  inline virtual std::vector<std::shared_ptr<JSON>>& getVector() {
    throw(std::runtime_error("Array value is not really array"));
  }
  inline virtual std::unordered_map<std::string, std::shared_ptr<JSON>>& getObject() {
    throw(std::runtime_error("Object value is not really an object"));
  }
};

Note that because C++ inherits the NULL macro from C, the null type had to be named NIL in the code.

Classes for other types didn’t need much code:

struct JSONstring : public JSON {
  std::string contents_;
  JSONstring(const std::string& from = "") : contents_(from) {}

  inline virtual JSONtype type() {
    return JSONtype::STRING;
  }
  inline virtual std::string& getString() {
    return contents_;
  }
};

The overriding methods should have been tagged as override, but I forgot about it when writing it and didn’t have the related warning enabled.

It took only a few minutes to implement and there was pretty much no way to make a mistake except typos detectable at compile time.
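
As a sketch of what the remaining classes might look like with the `override` tag in place: the base class below is trimmed to the getters the example needs, the member name `contents_` in `JSONdouble` and the virtual destructor are assumptions, and `sumNumbers` is a hypothetical helper added only to show the getters in use.

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

enum class JSONtype : std::uint8_t { NIL, NUMBER, ARRAY };

// Trimmed copy of the base class: only the getters this sketch needs
struct JSON {
  virtual ~JSON() = default;  // assumption: added for safety in this sketch
  virtual JSONtype type() { return JSONtype::NIL; }
  virtual double& getDouble() {
    throw std::runtime_error("Double value is not really double");
  }
  virtual std::vector<std::shared_ptr<JSON>>& getVector() {
    throw std::runtime_error("Array value is not really array");
  }
};

struct JSONdouble : public JSON {
  double contents_;
  JSONdouble(double from = 0) : contents_(from) {}
  // With override, a typo such as getDoubel() fails to compile
  // instead of silently declaring a new, unrelated method
  JSONtype type() override { return JSONtype::NUMBER; }
  double& getDouble() override { return contents_; }
};

struct JSONarray : public JSON {
  std::vector<std::shared_ptr<JSON>> contents_;
  JSONtype type() override { return JSONtype::ARRAY; }
  std::vector<std::shared_ptr<JSON>>& getVector() override { return contents_; }
};

// Hypothetical helper: sums the numbers in an array, skipping other types
double sumNumbers(JSON& array) {
  double total = 0;
  for (auto& element : array.getVector()) {  // throws if not an array
    if (element->type() == JSONtype::NUMBER) total += element->getDouble();
  }
  return total;
}
```

Calling `sumNumbers` on anything other than an array ends in the base-class getter, which throws, so the caller never reads a value of the wrong type.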

Parser

JSON can be parsed with a single run through the string. Its nesting can be conveniently dealt with using recursion. So I wrote a recursive function that parses the input character by character.

I didn’t use many high-level features because they would not shorten the code much, but would make it more error-prone and slower (this might not be true for other programming languages). They were used only for converting strings into numbers and for memory management.

static std::shared_ptr<JSON> parseJSON(std::istream& in) {
  auto readString = [&in] () -> std::string {
    char letter = in.get();
    std::string collected;
    while (letter != '"') {
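      if (letter == '\\') {
        // Read the escaped character once; calling in.get() in every
        // condition would consume a new character each time
        const char escaped = in.get();
        if (escaped == '"') collected.push_back('"');
        else if (escaped == 'n') collected.push_back('\n');
        else if (escaped == '\\') collected.push_back('\\');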
      } else {
        collected.push_back(letter);
      }
      letter = in.get();
    }
    return collected;
  };
  auto readWhitespace = [&in] () -> char {
    char letter;
    do {
      letter = in.get();
    } while (letter == ' ' || letter == '\t' || letter == '\n' || letter == '\r' || letter == ',');
    return letter;
  };

  char letter = readWhitespace();
  // Check the stream state: comparing a char with EOF is unreliable where char is unsigned
  if (!in || letter == 0) return std::make_shared<JSON>();
  else if (letter == '"') {
    return std::make_shared<JSONstring>(readString());
  }
  else if (letter == 't') {
    if (in.get() == 'r' && in.get() == 'u' && in.get() == 'e')
      return std::make_shared<JSONbool>(true);
    else
      throw(std::runtime_error("JSON parser found misspelled bool 'true'"));
  }
  else if (letter == 'f') {
    if (in.get() == 'a' && in.get() == 'l' && in.get() == 's' && in.get() == 'e')
      return std::make_shared<JSONbool>(false);
    else
      throw(std::runtime_error("JSON parser found misspelled bool 'false'"));
  }
  else if (letter == 'n') {
    if (in.get() == 'u' && in.get() == 'l' && in.get() == 'l')
      return std::make_shared<JSON>();
    else
      throw(std::runtime_error("JSON parser found misspelled 'null'"));
  }
  else if (letter == '-' || (letter >= '0' && letter <= '9')) {
    std::string asString;
    asString.push_back(letter);
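    // Peek before consuming, so the character that ends the number stays in the stream.
    // A comma is a separator, not part of a number.
    for (int next = in.peek(); next == '-' || next == '+' || next == 'E' || next == 'e'
              || next == '.' || (next >= '0' && next <= '9'); next = in.peek()) {
      asString.push_back(static_cast<char>(in.get()));
    }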
    std::stringstream parsing(asString);
    double number;
    parsing >> number;
    return std::make_shared<JSONdouble>(number);
  }
  else if (letter == '{') {
    auto retval = std::make_shared<JSONobject>();
    do {
      letter = readWhitespace();
      if (letter == '"') {
        const std::string& name = readString();
        letter = readWhitespace();
        if (letter != ':')
            throw std::runtime_error("JSON parser expected an additional ':' somewhere");
        retval->getObject()[name] = parseJSON(in);
      } else break;
    } while (letter != '}');
    return retval;
  }
  else if (letter == '[') {
    auto retval = std::make_shared<JSONarray>();
    do {
      letter = readWhitespace();
      if (letter != ']' && in) {
        // Any value type may be an array element, not only objects
        in.unget();
        retval->getVector().push_back(parseJSON(in));
      } else break;
    } while (letter != ']');
    return retval;
  } else {
    // Build a std::string first: adding a char to a string literal is pointer arithmetic
    throw(std::runtime_error(std::string("JSON parser found unexpected character ") + letter));
  }
  return std::make_shared<JSON>();
}
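
The number branch leans on stream extraction for the actual conversion. As a sketch of how that step could also reject malformed input, here is a hypothetical helper `toNumber` (not part of the original code) that checks whether the whole collected string was consumed:

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts the collected characters to a double and
// rejects input that stream extraction could not consume completely
double toNumber(const std::string& asString) {
  std::istringstream parsing(asString);
  double number = 0;
  parsing >> number;
  if (parsing.fail() || !parsing.eof())
    throw std::runtime_error("JSON parser found malformed number '" + asString + "'");
  return number;
}
```

The check costs two lines and turns input such as `1-2` into an error instead of a silently truncated value.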

Note that a switch wasn’t used because its case labels match only specific values, not ranges such as the digits '0' to '9'.
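
For comparison, this is roughly what the dispatch would look like with a switch in standard C++. The function `classify` is hypothetical and only names the kind of value that a given first character starts:

```cpp
#include <string>

// Hypothetical classifier for the first character of a JSON value
std::string classify(char letter) {
  switch (letter) {
    case '"': return "string";
    case '{': return "object";
    case '[': return "array";
    case 't': case 'f': return "bool";
    case 'n': return "null";
    // No range syntax in standard C++: every digit needs its own label
    case '-': case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
      return "number";
    default: return "unexpected";
  }
}
```

It works, but the digit labels are noisier than a single range comparison in an if chain.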

Escaping is done in a slightly nonstandard way because I could not find anything official about it, and UTF-8 makes escaping non-control characters unnecessary anyway.
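
For reference, the JSON specification (RFC 8259) defines a fixed set of two-character escapes: `\"`, `\\`, `\/`, `\b`, `\f`, `\n`, `\r` and `\t`, plus `\uXXXX` for arbitrary code points. A sketch of decoding the two-character ones follows; the names `decodeEscape` and `unescape` are hypothetical, and `\uXXXX` is deliberately left out:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: maps the character after a backslash to the
// character it stands for. \uXXXX escapes are not handled in this sketch.
char decodeEscape(char escaped) {
  switch (escaped) {
    case '"':  return '"';
    case '\\': return '\\';
    case '/':  return '/';
    case 'b':  return '\b';
    case 'f':  return '\f';
    case 'n':  return '\n';
    case 'r':  return '\r';
    case 't':  return '\t';
    default:
      throw std::runtime_error(std::string("Unsupported escape \\") + escaped);
  }
}

// Decodes the contents of a JSON string literal (without the surrounding quotes)
std::string unescape(const std::string& raw) {
  std::string result;
  for (std::size_t i = 0; i < raw.size(); i++) {
    if (raw[i] == '\\') {
      if (i + 1 >= raw.size()) throw std::runtime_error("Dangling backslash");
      result.push_back(decodeEscape(raw[++i]));
    } else {
      result.push_back(raw[i]);
    }
  }
  return result;
}
```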

Serialiser

Serialising JSON back to a string is fairly straightforward: all it needs is some recursive tree traversal and string concatenation. I chose to do it with a method implemented in each class that writes its value into an output stream given as one argument, with the indentation depth given as another.

// null
inline virtual void write(std::ostream& out, int = 0) {
  out << "null";
}

// ...
// array
inline void write(std::ostream& out, int depth = 0) {
  out.put('[');
  if (contents_.empty()) {
    out.put(']');
    return;
  }
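  bool first = true;
  for (auto& it : contents_) {
    if (!first) out.put(',');  // JSON separates array elements with commas
    first = false;
    out.put('\n');
    indent(out, depth);
    it->write(out, depth + 1);
  }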
  out.put('\n');
  indent(out, depth);
  out.put(']');
}
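
The `indent` helper is not shown above. A minimal version might look like the sketch below, which assumes two spaces per nesting level. `writeString` and `quoted` are hypothetical additions that show how a string value could be written with the same three escapes the parser’s `readString` understands:

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Assumed behaviour of indent(): two spaces per nesting level
void indent(std::ostream& out, int depth) {
  for (int i = 0; i < depth; i++) out << "  ";
}

// Hypothetical string writer: quotes the value and escapes the same
// three characters that readString() in the parser understands
void writeString(std::ostream& out, const std::string& value) {
  out.put('"');
  for (char letter : value) {
    if (letter == '"') out << "\\\"";
    else if (letter == '\\') out << "\\\\";
    else if (letter == '\n') out << "\\n";
    else out.put(letter);
  }
  out.put('"');
}

// Convenience wrapper for trying the two helpers out
std::string quoted(const std::string& value, int depth = 0) {
  std::ostringstream out;
  indent(out, depth);
  writeString(out, value);
  return out.str();
}
```

Keeping the writer’s escapes in step with the parser’s is what lets a value survive a round trip unchanged.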

Debugging

Up to this point, it had taken me 1 hour and 30 minutes. Once the code compiled, I wrote a short program that used it for testing.

Because memory management was done at a sufficiently high level, there wasn’t much room for low-level errors, but the internal logic was not written correctly.

Basic JSON worked on the first try, but the code wasn’t bug-free. Indentation was not working at all, the parser was getting lost under some conditions and there were some other problems. I could fix them with a bit of looking at suspicious code, debug prints and trial and error.

The code was able to parse structures containing all JSON types and write them back after 2 hours of coding. I never missed any features later, but I did find some small bugs eventually.

Conclusion

This shows that coding in C++ isn’t slow. I am not trying to imply that it doesn’t require a lot of experience to be efficient.

The JSON tool was used as a backend for a class that allows serialising object-oriented data structures with very little code. The source code has been heavily edited since then, but it can be found in the history of its repo.
