The term "SAX" originates from [Simple API for XML](http://en.wikipedia.org/wiki/Simple_API_for_XML). We borrow the term for JSON parsing and generation.

In RapidJSON, `Reader` (a typedef of `GenericReader<...>`) is the SAX-style parser for JSON, and `Writer` (a class template) is the SAX-style generator for JSON.

`Reader` parses JSON from a stream. As it reads characters from the stream, it analyzes them according to the JSON syntax and publishes events to a handler.
For example, here is a JSON.

While a `Reader` parses this JSON, it publishes the following events to the handler sequentially:

String("world", 5, true)

These events can be easily matched with the JSON, but some event parameters need further explanation. Let's see the `simplereader` example, which produces exactly the same output as above:
#include "rapidjson/reader.h"
#include <iostream>

using namespace rapidjson;
using namespace std;

struct MyHandler : public BaseReaderHandler<UTF8<>, MyHandler> {
    bool Null() { cout << "Null()" << endl; return true; }
    bool Bool(bool b) { cout << "Bool(" << boolalpha << b << ")" << endl; return true; }
    bool Int(int i) { cout << "Int(" << i << ")" << endl; return true; }
    bool Uint(unsigned u) { cout << "Uint(" << u << ")" << endl; return true; }
    bool Int64(int64_t i) { cout << "Int64(" << i << ")" << endl; return true; }
    bool Uint64(uint64_t u) { cout << "Uint64(" << u << ")" << endl; return true; }
    bool Double(double d) { cout << "Double(" << d << ")" << endl; return true; }
    bool String(const char* str, SizeType length, bool copy) {
        cout << "String(" << str << ", " << length << ", " << boolalpha << copy << ")" << endl;
        return true;
    }
    bool StartObject() { cout << "StartObject()" << endl; return true; }
    bool Key(const char* str, SizeType length, bool copy) {
        cout << "Key(" << str << ", " << length << ", " << boolalpha << copy << ")" << endl;
        return true;
    }
    bool EndObject(SizeType memberCount) { cout << "EndObject(" << memberCount << ")" << endl; return true; }
    bool StartArray() { cout << "StartArray()" << endl; return true; }
    bool EndArray(SizeType elementCount) { cout << "EndArray(" << elementCount << ")" << endl; return true; }
};

int main() {
    const char json[] = " { \"hello\" : \"world\", \"t\" : true , \"f\" : false, \"n\": null, \"i\":123, \"pi\": 3.1416, \"a\":[1, 2, 3, 4] } ";

    MyHandler handler;
    Reader reader;
    StringStream ss(json);
    reader.Parse(ss, handler);
}
Note that RapidJSON uses templates to statically bind the `Reader` type and the handler type, instead of using a class with virtual functions. This paradigm can improve performance because the handler functions can be inlined.
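
To illustrate the static binding described above, here is a minimal sketch that does not use RapidJSON itself. `PublishInts` and `SumHandler` are hypothetical names invented for this illustration. The only point is that the handler type is a template parameter, so the call to `handler.Int()` is resolved at compile time and no virtual dispatch is involved.

```cpp
// Hypothetical toy publisher, not part of RapidJSON. The handler type is a
// template parameter, so handler.Int() is bound at compile time and can be inlined.
template <typename Handler>
bool PublishInts(const int* values, int count, Handler& handler) {
    for (int i = 0; i < count; i++)
        if (!handler.Int(values[i]))
            return false;   // the handler asked the publisher to stop
    return true;
}

// Any type with a matching Int() member works; no base class or virtual function is needed.
struct SumHandler {
    long sum;
    SumHandler() : sum(0) {}
    bool Int(int i) { sum += i; return true; }
};
```

Passing a `SumHandler` to `PublishInts` instantiates a version of the function specialized for that handler, which is roughly the relationship between `Reader::Parse()` and a user-defined handler.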
As the previous example shows, the user needs to implement a handler, which consumes the events (function calls) from the `Reader`. The handler must contain the following member functions.
class Handler {
    bool Null();
    bool Bool(bool b);
    bool Int(int i);
    bool Uint(unsigned i);
    bool Int64(int64_t i);
    bool Uint64(uint64_t i);
    bool Double(double d);
    bool RawNumber(const Ch* str, SizeType length, bool copy);
    bool String(const Ch* str, SizeType length, bool copy);
    bool StartObject();
    bool Key(const Ch* str, SizeType length, bool copy);
    bool EndObject(SizeType memberCount);
    bool StartArray();
    bool EndArray(SizeType elementCount);
};
`Null()` is called when the `Reader` encounters a JSON null value.

`Bool(bool)` is called when the `Reader` encounters a JSON true or false value.

When the `Reader` encounters a JSON number, it chooses a suitable C++ type mapping and then calls *one* function out of `Int(int)`, `Uint(unsigned)`, `Int64(int64_t)`, `Uint64(uint64_t)` and `Double(double)`. If `kParseNumbersAsStrings` is enabled, the `Reader` always calls `RawNumber()` instead.
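
The choice among the integer events can be pictured with the following sketch. It rests on an assumption about the selection rule (non-negative integers map to the unsigned events, negative integers to the signed ones, and the narrowest width that fits is used); it is not a copy of RapidJSON's code. `IntegerEventFor` is a hypothetical helper that takes the sign and the magnitude of an integer literal.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, not part of RapidJSON. It names the event an integer
// would map to under the assumed rule: unsigned events for non-negative values,
// signed events for negative values, narrowest fitting width first.
std::string IntegerEventFor(bool negative, std::uint64_t magnitude) {
    if (negative) {
        if (magnitude <= 2147483648ULL) return "Int";              // down to -2^31
        if (magnitude <= 9223372036854775808ULL) return "Int64";   // down to -2^63
        return "Double";                                           // too large for int64_t
    }
    if (magnitude <= 4294967295ULL) return "Uint";                 // up to 2^32 - 1
    return "Uint64";
}
```

Under this assumption a handler should be prepared to receive the same logical quantity through different functions, depending on its sign and size.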
`String(const char* str, SizeType length, bool copy)` is called when the `Reader` encounters a string. The first parameter is a pointer to the string. The second parameter is the length of the string (excluding the null terminator). Note that RapidJSON supports the null character `'\0'` inside a string. In that case, `strlen(str) < length`. The last parameter, `copy`, indicates whether the handler needs to make a copy of the string. For normal parsing, `copy = true`. Only when *insitu* parsing is used is `copy = false`. Also beware that the character type depends on the target encoding, which will be explained later.
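
The remark about embedded null characters matters when a handler copies the string. The sketch below uses only the standard library; `CopyWithLength` is a hypothetical helper showing that the explicit length, not `strlen()`, should decide how many characters are kept.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies a (pointer, length) pair as a String() handler
// receives it. Using the explicit length preserves any embedded '\0'.
std::string CopyWithLength(const char* str, std::size_t length) {
    return std::string(str, length);
}

// For comparison: relying on the terminator stops at the first '\0'.
std::size_t LengthByTerminator(const char* str) {
    return std::strlen(str);
}
```

For the three characters `a`, `\0`, `b`, `LengthByTerminator` reports 1 while `CopyWithLength(str, 3)` keeps all three.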
When the `Reader` encounters the beginning of an object, it calls `StartObject()`. An object in JSON is a set of name-value pairs. If the object contains members, it first calls `Key()` for the name of a member, and then calls functions depending on the type of the value. These calls for name-value pairs repeat until `EndObject(SizeType memberCount)` is called. Note that the `memberCount` parameter is just an aid for the handler; the user may not need it.

An array is similar to an object but simpler. At the beginning of an array, the `Reader` calls `StartArray()`. If there are elements, it calls functions according to the types of the elements. Similarly, in the last call `EndArray(SizeType elementCount)`, the parameter `elementCount` is just an aid for the handler.

Every handler function returns a `bool`. Normally it should return `true`. If the handler encounters an error, it can return `false` to notify the event publisher to stop further processing.

For example, when we parse a JSON with `Reader` and the handler detects that the JSON does not conform to the required schema, the handler can return `false` to make the `Reader` stop parsing. The `Reader` will then be in an error state with error code `kParseErrorTermination`.

## GenericReader {#GenericReader}

As mentioned before, `Reader` is a typedef of a template class `GenericReader`:
namespace rapidjson {

template <typename SourceEncoding, typename TargetEncoding, typename Allocator = MemoryPoolAllocator<> >
class GenericReader {
    // ...
};

typedef GenericReader<UTF8<>, UTF8<> > Reader;

} // namespace rapidjson
The `Reader` uses UTF-8 as both source and target encoding. The source encoding is the encoding of the JSON stream. The target encoding is the encoding of the `str` parameter in `String()` calls. For example, to parse a UTF-8 stream and output UTF-16 string events, you can define a reader by:

GenericReader<UTF8<>, UTF16<> > reader;

Note that the default character type of `UTF16` is `wchar_t`. So this `reader` needs to call `String(const wchar_t*, SizeType, bool)` of the handler.

The third template parameter `Allocator` is the allocator type for the internal data structure (actually a stack).

## Parsing {#SaxParsing}

The one and only function of `Reader` is to parse JSON.
template <unsigned parseFlags, typename InputStream, typename Handler>
bool Parse(InputStream& is, Handler& handler);

// with parseFlags = kDefaultParseFlags
template <typename InputStream, typename Handler>
bool Parse(InputStream& is, Handler& handler);
If an error occurs during parsing, it will return `false`. The user can also call `bool HasParseError()`, `ParseErrorCode GetParseErrorCode()` and `size_t GetErrorOffset()` to obtain the error state. In fact, `Document` uses these `Reader` functions to obtain parse errors. Please refer to [DOM](doc/dom.md) for details about parse errors.

`Reader` converts (parses) JSON into events. `Writer` does exactly the opposite. It converts events into JSON.

`Writer` is very easy to use. If your application only needs to convert some data into JSON, it may be a good choice to use `Writer` directly, instead of building a `Document` and then stringifying it with a `Writer`.

In the `simplewriter` example, we do exactly the reverse of `simplereader`.
#include "rapidjson/writer.h"
#include "rapidjson/stringbuffer.h"

using namespace rapidjson;

Writer<StringBuffer> writer(s);

writer.StartObject();
writer.String("world");
writer.Double(3.1416);
for (unsigned i = 0; i < 4; i++)
    writer.Uint(i);
cout << s.GetString() << endl;

{"hello":"world","t":true,"f":false,"n":null,"i":123,"pi":3.1416,"a":[0,1,2,3]}
There are two overloads each of `String()` and `Key()`. One is the same as defined in the handler concept, with 3 parameters. It can handle strings with null characters. The other is the simpler version used in the above example.

Note that the example code does not pass any parameters to `EndArray()` and `EndObject()`. A `SizeType` can be passed, but it will simply be ignored by `Writer`.

You may wonder: why not just use `sprintf()` or `std::stringstream` to build a JSON?

There are various reasons:
1. `Writer` must output a well-formed JSON. If the event sequence is incorrect (e.g. `Int()` just after `StartObject()`), it triggers an assertion failure in debug mode.
2. `Writer::String()` can handle string escaping (e.g. converting code point `U+000A` to `\n`) and Unicode transcoding.
3. `Writer` handles number output consistently.
4. `Writer` implements the event handler concept. It can be used to handle events from `Reader`, `Document` or other event publishers.
5. `Writer` can be optimized for different platforms.

Anyway, using the `Writer` API is even simpler than generating a JSON by ad hoc methods.
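
To give a sense of what ad hoc generation has to get right, here is a minimal sketch of the string escaping that code based on `sprintf()` would need to reimplement. `EscapeJsonString` is a hypothetical helper written for this illustration, not RapidJSON's implementation; it passes bytes at or above `0x20` through unchanged and performs no encoding validation or transcoding.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: returns `in` as a quoted JSON string literal.
// Bytes >= 0x20 other than '"' and '\\' are copied through unchanged.
std::string EscapeJsonString(const std::string& in) {
    std::string out = "\"";
    for (std::string::size_type i = 0; i < in.size(); i++) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\b': out += "\\b";  break;
        case '\f': out += "\\f";  break;
        case '\n': out += "\\n";  break;   // U+000A becomes \n
        case '\r': out += "\\r";  break;
        case '\t': out += "\\t";  break;
        default:
            if (c < 0x20) {
                // Remaining control characters use the \uXXXX form.
                char buf[7];
                std::snprintf(buf, sizeof(buf), "\\u%04X", static_cast<unsigned>(c));
                out += buf;
            } else {
                out += static_cast<char>(c);
            }
        }
    }
    out += '"';
    return out;
}
```

Even this small helper leaves out number formatting, nesting checks and transcoding, which is part of the argument for using `Writer` instead.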

## Template {#WriterTemplate}

`Writer` has a minor design difference from `Reader`. `Writer` is a template class, not a typedef. There is no `GenericWriter`. The following is the declaration.
namespace rapidjson {
template<typename OutputStream, typename SourceEncoding = UTF8<>, typename TargetEncoding = UTF8<>, typename Allocator = CrtAllocator, unsigned writeFlags = kWriteDefaultFlags>
class Writer {
public:
    Writer(OutputStream& os, Allocator* allocator = 0, size_t levelDepth = kDefaultLevelDepth);
    // ...
};
} // namespace rapidjson
The `OutputStream` template parameter is the type of the output stream. It cannot be deduced and must be specified by the user.

The `SourceEncoding` template parameter specifies the encoding to be used in `String(const Ch*, ...)`.

The `TargetEncoding` template parameter specifies the encoding in the output stream.

The `Allocator` is the type of allocator, which is used for allocating the internal data structure (a stack).

The `writeFlags` are a combination of the following bit-flags:

Write flags                   | Meaning
------------------------------|-----------------------------------
`kWriteNoFlags`               | No flag is set.
`kWriteDefaultFlags`          | Default write flags. It is equal to the macro `RAPIDJSON_WRITE_DEFAULT_FLAGS`, which is defined as `kWriteNoFlags`.
`kWriteValidateEncodingFlag`  | Validate encoding of JSON strings.
`kWriteNanAndInfFlag`         | Allow writing of `Infinity`, `-Infinity` and `NaN`.

Besides, the constructor of `Writer` has a `levelDepth` parameter. This parameter affects the initial memory allocated for storing information per hierarchy level.

## PrettyWriter {#PrettyWriter}

While the output of `Writer` is the most condensed JSON without white-spaces, suitable for network transfer or storage, it is not easily readable by humans.

Therefore, RapidJSON provides a `PrettyWriter`, which adds indentation and line feeds to the output.

The usage of `PrettyWriter` is exactly the same as `Writer`, except that `PrettyWriter` provides a `SetIndent(Ch indentChar, unsigned indentCharCount)` function. The default is 4 spaces.

## Completeness and Reset {#CompletenessReset}

A `Writer` can only output a single JSON, which can be of any JSON type at the root. Once the single event for the root (e.g. `String()`), or the last matching `EndObject()` or `EndArray()` event, is handled, the output JSON is well-formed and complete. The user can detect this state by calling `Writer::IsComplete()`.

When a JSON is complete, the `Writer` cannot accept any new events. Otherwise the output would be invalid (i.e. having more than one root). To reuse the `Writer` object, the user can call `Writer::Reset(OutputStream& os)` to reset all internal states of the `Writer` with a new output stream.
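
The completeness rule can be modelled with a nesting counter. The following sketch is a hypothetical stand-alone model (`RootTracker` is not a RapidJSON class, and the real `Writer` may track this differently); it only shows when a single-root document becomes complete and why a reset is needed before reuse. Unlike this model, which rejects extra events by returning `false`, the text above only says that the real `Writer` cannot accept them.

```cpp
// Hypothetical model of the "single root" rule; not RapidJSON's implementation.
class RootTracker {
public:
    RootTracker() : depth_(0), hasRoot_(false) {}

    // A scalar event such as String() or Int().
    bool Scalar() {
        if (IsComplete()) return false;   // a second root would be invalid
        hasRoot_ = true;
        return true;
    }
    // StartObject() or StartArray().
    bool Start() {
        if (IsComplete()) return false;
        hasRoot_ = true;
        ++depth_;
        return true;
    }
    // EndObject() or EndArray().
    bool End() {
        if (depth_ == 0) return false;    // nothing is open
        --depth_;
        return true;
    }
    // Complete once the root has started and every object/array is closed.
    bool IsComplete() const { return hasRoot_ && depth_ == 0; }
    // Forget the previous document so a new root can be accepted.
    void Reset() { depth_ = 0; hasRoot_ = false; }

private:
    unsigned depth_;   // number of currently open objects/arrays
    bool hasRoot_;     // whether the root value has started
};
```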

# Techniques {#SaxTechniques}

## Parsing JSON to Custom Data Structure {#CustomDataStructure}

`Document`'s parsing capability is completely based on `Reader`. In fact, `Document` is a handler which receives events from a reader to build a DOM during parsing.

Users may use `Reader` to build other data structures directly. This avoids building a DOM, thus reducing memory use and improving performance.

In the following `messagereader` example, `ParseMessages()` parses a JSON which should be an object with key-string pairs.
#include "rapidjson/reader.h"
#include "rapidjson/error/en.h"
#include <iostream>
#include <map>
#include <string>

using namespace std;
using namespace rapidjson;

typedef map<string, string> MessageMap;

struct MessageHandler
    : public BaseReaderHandler<UTF8<>, MessageHandler> {
    MessageHandler() : state_(kExpectObjectStart) {}
    bool StartObject() {
        switch (state_) {
        case kExpectObjectStart:
            state_ = kExpectNameOrObjectEnd;
            return true;
        default:
            return false;
        }
    }
    bool String(const char* str, SizeType length, bool) {
        switch (state_) {
        case kExpectNameOrObjectEnd:
            name_ = string(str, length);
            state_ = kExpectValue;
            return true;
        case kExpectValue:
            messages_.insert(MessageMap::value_type(name_, string(str, length)));
            state_ = kExpectNameOrObjectEnd;
            return true;
        default:
            return false;
        }
    }
    bool EndObject(SizeType) { return state_ == kExpectNameOrObjectEnd; }
    bool Default() { return false; } // All other events are invalid.

    MessageMap messages_;
    enum State {
        kExpectObjectStart,
        kExpectNameOrObjectEnd,
        kExpectValue
    } state_;
    string name_;
};

void ParseMessages(const char* json, MessageMap& messages) {
    Reader reader;
    MessageHandler handler;
    StringStream ss(json);
    if (reader.Parse(ss, handler))
        messages.swap(handler.messages_); // Only change it if success.
    else {
        ParseErrorCode e = reader.GetParseErrorCode();
        size_t o = reader.GetErrorOffset();
        cout << "Error: " << GetParseError_En(e) << endl;
        cout << " at offset " << o << " near '" << string(json).substr(o, 10) << "...'" << endl;
    }
}

int main() {
    MessageMap messages;
    const char* json1 = "{ \"greeting\" : \"Hello!\", \"farewell\" : \"bye-bye!\" }";
    cout << json1 << endl;
    ParseMessages(json1, messages);
    for (MessageMap::const_iterator itr = messages.begin(); itr != messages.end(); ++itr)
        cout << itr->first << ": " << itr->second << endl;
    cout << endl << "Parse a JSON with invalid schema." << endl;
    const char* json2 = "{ \"greeting\" : \"Hello!\", \"farewell\" : \"bye-bye!\", \"foo\" : {} }";
    cout << json2 << endl;
    ParseMessages(json2, messages);
}
{ "greeting" : "Hello!", "farewell" : "bye-bye!" }

Parse a JSON with invalid schema.
{ "greeting" : "Hello!", "farewell" : "bye-bye!", "foo" : {} }
Error: Terminate parsing due to Handler error.
 at offset 59 near '} }...'

The first JSON (`json1`) was successfully parsed into `MessageMap`. Since `MessageMap` is a `std::map`, the entries are printed in key-sorted order. This order is different from the JSON's order.
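
The ordering effect comes from `std::map` alone and can be checked without any parsing. In the sketch below, `KeysInIterationOrder` is a hypothetical helper that collects the keys in the order a loop over the map would visit them.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: returns the keys of a std::map in iteration order.
// std::map iterates in ascending key order, regardless of insertion order.
std::vector<std::string> KeysInIterationOrder(const std::map<std::string, std::string>& m) {
    std::vector<std::string> keys;
    for (std::map<std::string, std::string>::const_iterator it = m.begin(); it != m.end(); ++it)
        keys.push_back(it->first);
    return keys;
}
```

Inserting `greeting` before `farewell`, as the JSON does, still yields `farewell` first when iterating. A handler that must preserve the JSON's order would need a sequence container instead.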

In the second JSON (`json2`), `foo`'s value is an empty object. As it is an object, `MessageHandler::StartObject()` will be called. However, at that moment `state_ == kExpectValue`, so that function returns `false` and causes the parsing process to be terminated. The error code is `kParseErrorTermination`.

## Filtering of JSON {#Filtering}

As mentioned earlier, `Writer` can handle the events published by `Reader`. The `condense` example simply sets a `Writer` as the handler of a `Reader`, so it can remove all white-spaces in a JSON. The `pretty` example uses the same relationship, but replaces `Writer` with `PrettyWriter`. So `pretty` can be used to reformat a JSON with indentation and line feeds.

In fact, we can add intermediate layer(s) to filter the contents of JSON via this SAX-style API. For example, the `capitalize` example capitalizes all strings in a JSON.
#include "rapidjson/reader.h"
#include "rapidjson/writer.h"
#include "rapidjson/filereadstream.h"
#include "rapidjson/filewritestream.h"
#include "rapidjson/error/en.h"
#include <cctype>
#include <cstdio>
#include <vector>

using namespace rapidjson;

template<typename OutputHandler>
struct CapitalizeFilter {
    CapitalizeFilter(OutputHandler& out) : out_(out), buffer_() {
    }

    bool Null() { return out_.Null(); }
    bool Bool(bool b) { return out_.Bool(b); }
    bool Int(int i) { return out_.Int(i); }
    bool Uint(unsigned u) { return out_.Uint(u); }
    bool Int64(int64_t i) { return out_.Int64(i); }
    bool Uint64(uint64_t u) { return out_.Uint64(u); }
    bool Double(double d) { return out_.Double(d); }
    bool RawNumber(const char* str, SizeType length, bool copy) { return out_.RawNumber(str, length, copy); }
    bool String(const char* str, SizeType length, bool) {
        buffer_.clear();
        for (SizeType i = 0; i < length; i++)
            buffer_.push_back(static_cast<char>(std::toupper(static_cast<unsigned char>(str[i]))));
        // An empty vector has no front(), so pass an empty literal in that case.
        return out_.String(buffer_.empty() ? "" : &buffer_.front(), length, true); // true = output handler need to copy the string
    }
    bool StartObject() { return out_.StartObject(); }
    bool Key(const char* str, SizeType length, bool copy) { return String(str, length, copy); }
    bool EndObject(SizeType memberCount) { return out_.EndObject(memberCount); }
    bool StartArray() { return out_.StartArray(); }
    bool EndArray(SizeType elementCount) { return out_.EndArray(elementCount); }

    OutputHandler& out_;
    std::vector<char> buffer_;
};

int main(int, char*[]) {
    // Prepare JSON reader and input stream.
    Reader reader;
    char readBuffer[65536];
    FileReadStream is(stdin, readBuffer, sizeof(readBuffer));

    // Prepare JSON writer and output stream.
    char writeBuffer[65536];
    FileWriteStream os(stdout, writeBuffer, sizeof(writeBuffer));
    Writer<FileWriteStream> writer(os);

    // JSON reader parse from the input stream and let writer generate the output.
    CapitalizeFilter<Writer<FileWriteStream> > filter(writer);
    if (!reader.Parse(is, filter)) {
        fprintf(stderr, "\nError(%u): %s\n", (unsigned)reader.GetErrorOffset(), GetParseError_En(reader.GetParseErrorCode()));
        return 1;
    }

    return 0;
}

Note that it is incorrect to simply capitalize the JSON as a string. For example:

Simply capitalizing the whole JSON would produce an incorrect escape character:

The correct result by `capitalize`:

More complicated filters can be developed. However, since the SAX-style API can only provide information about a single event at a time, the user may need to keep track of contextual information (e.g. the path from the root value, storage of other related values). Some processing may be easier to implement in DOM than in SAX.
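
As one possible form of such bookkeeping, the sketch below keeps the chain of object keys leading to the current value. `PathTracker` is a hypothetical handler fragment written for this illustration; it handles only object events and takes simplified parameters, so a real filter would also need to handle arrays and forward every event to its output handler.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical handler fragment: remembers the chain of object keys that leads
// to the value currently being parsed. Array and scalar events are omitted.
struct PathTracker {
    std::vector<std::string> keys;   // one entry per currently open object

    bool StartObject() { keys.push_back(std::string()); return true; }
    bool Key(const char* str, std::size_t length) {
        if (keys.empty()) return false;    // Key() outside an object is an error
        keys.back().assign(str, length);   // replace the key of the innermost object
        return true;
    }
    bool EndObject() {
        if (keys.empty()) return false;    // unmatched EndObject()
        keys.pop_back();
        return true;
    }

    // Joins the keys as "/a/b"; empty when no object is open.
    std::string Path() const {
        std::string path;
        for (std::size_t i = 0; i < keys.size(); i++) {
            path += '/';
            path += keys[i];
        }
        return path;
    }
};
```

With this state available, a filter can decide per event whether a value lies under a path of interest. That is the kind of context a DOM provides for free.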