Aspecto blog

On microservices, OpenTelemetry, and anything in between

Genson-js: a user-friendly JSON Schema generator


In modern web development, we deal with JSON every day.

External APIs and our own microservices usually expose RESTful APIs and use JSON as the primary format.

Since we often use dynamically typed scripting languages like JavaScript and Python on the backend, those JSON payloads may be quite dynamic as well.

By “dynamic”, I mean that the same endpoint may return a different JSON structure depending on request parameters or something else. So we may want to implement some validation of those structures. 

To do that, we need a language to describe those structures, and that language is JSON Schema.
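To make this concrete, here is a small hand-written schema for a hypothetical user payload, together with a deliberately tiny checker. The `SimpleSchema` type, the `userSchema` object, and the `matches` function are illustrative assumptions that cover only `type`, `properties`, `required`, and `items`; a real project would use a full JSON Schema validator instead.

```typescript
// A tiny subset of JSON Schema, enough for this illustration only.
type SimpleSchema = {
    type: 'string' | 'integer' | 'number' | 'boolean' | 'null' | 'array' | 'object';
    properties?: Record<string, SimpleSchema>;
    required?: string[];
    items?: SimpleSchema;
};

// Hand-written schema for a hypothetical user payload.
const userSchema: SimpleSchema = {
    type: 'object',
    properties: {
        userName: { type: 'string' },
        age: { type: 'integer' },
    },
    required: ['userName'],
};

// Returns true when `value` satisfies `schema` (hypothetical helper, not a full validator).
function matches(value: unknown, schema: SimpleSchema): boolean {
    switch (schema.type) {
        case 'string':
            return typeof value === 'string';
        case 'boolean':
            return typeof value === 'boolean';
        case 'null':
            return value === null;
        case 'number':
            return typeof value === 'number';
        case 'integer':
            return typeof value === 'number' && isFinite(value) && Math.floor(value) === value;
        case 'array':
            return Array.isArray(value) && value.every((v) => !schema.items || matches(v, schema.items));
        case 'object': {
            if (typeof value !== 'object' || value === null || Array.isArray(value)) return false;
            const obj = value as Record<string, unknown>;
            const props = schema.properties ?? {};
            // Every required key must be present.
            const hasRequired = (schema.required ?? []).every((key) => key in obj);
            // Every described property that is present must match its own schema.
            const propsMatch = Object.keys(props).every((key) => !(key in obj) || matches(obj[key], props[key]));
            return hasRequired && propsMatch;
        }
        default:
            return false;
    }
}

const validPayloadMatches = matches({ userName: 'smith', age: 40 }, userSchema);
const invalidPayloadMatches = matches({ userName: 'smith', age: 'forty' }, userSchema);
```

In this sketch the first payload passes, while the second is rejected because `age` is a string rather than an integer.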

But creating those schemas manually is a somewhat tedious job, and since developers love to automate things, there are libraries that do just that: generate, or “infer”, JSON schemas.
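To give a feel for what “inferring” means, here is a minimal sketch of the idea. It is not how genson-js or any other library is actually implemented; the `InferredSchema` type and the `inferSchema` function are hypothetical, and the sketch takes shortcuts such as describing array items by looking at the first element only.

```typescript
// Minimal sketch of schema inference; NOT the actual implementation of any library.
type InferredSchema = {
    type: string;
    properties?: Record<string, InferredSchema>;
    required?: string[];
    items?: InferredSchema;
};

function inferSchema(value: unknown): InferredSchema {
    if (value === null) return { type: 'null' };
    if (Array.isArray(value)) {
        // Simplification: describe the items using the first element only.
        return value.length > 0 ? { type: 'array', items: inferSchema(value[0]) } : { type: 'array' };
    }
    if (typeof value === 'object') {
        const obj = value as Record<string, unknown>;
        const properties: Record<string, InferredSchema> = {};
        for (const key of Object.keys(obj)) {
            properties[key] = inferSchema(obj[key]);
        }
        // Simplification: every key that was seen is treated as required.
        return { type: 'object', properties, required: Object.keys(obj) };
    }
    if (typeof value === 'number') {
        const isInteger = isFinite(value) && Math.floor(value) === value;
        return { type: isInteger ? 'integer' : 'number' };
    }
    if (typeof value === 'boolean') return { type: 'boolean' };
    if (typeof value === 'string') return { type: 'string' };
    throw new Error(`Unsupported value of type ${typeof value}`);
}

const inferredUserSchema = inferSchema({ userName: 'smith', languages: ['c++', 'java'], age: 40 });
```

Walking the value recursively like this is the easy part; the harder part, which this sketch skips, is combining what is learned from many different samples into one schema.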

There are a few existing libraries that can infer schemas, mostly written in Java and Python (this list is not complete and doesn’t have any particular order):

All of them have a different set of features, APIs, and use cases.

Genson-js: Our Motivation

At Aspecto, our backend is mostly written in TypeScript. 

To use a library written in another language, we needed to create a separate microservice and call it via REST API.

And that’s exactly what we did.

This solution worked for some time, but soon we started having issues with it.

The reason is that schema generation is a relatively cheap and quick operation, but when you do it as a REST API call, HTTP brings a lot of overhead. 

We tried to do some optimizations, like batching, but it didn’t help much, and we had to deal with high latencies and CPU usage, as we were using this schema generation service quite heavily.  

Since there was no existing JavaScript library that would meet our needs, we decided to build one. 

It is called genson-js.

You can find it on GitHub and npm. It is licensed under Apache-2.0.

Usage

As with any other npm package, to start using it, all you need to do is:

npm i genson-js

I will use TypeScript in the following examples, but you can easily translate them to JavaScript: just use require instead of import.

So let’s have a look at some examples.

Generating a schema for any JSON object is as simple as this:

import { createSchema } from 'genson-js';
const schema = createSchema({
    userName: 'smith',
    languages: ['c++', 'java'],
    age: 40,
});

The resulting schema:

{
  type: 'object',
  properties: {
    userName: { type: 'string' },
    languages: { type: 'array', items: { type: 'string' } },
    age: { type: 'integer' }
  },
  required: [ 'userName', 'languages', 'age' ]
}

You can take a look at more examples in the unit tests.

Apart from inferring schemas, the library allows you to merge one or more schemas so that the resulting schema would be a superset.

This is useful if you have a set of expected data structures and want to validate that the one you receive is not something new, meaning it has no new fields and the types of its fields are what you expect.

To do that you can use mergeSchemas to create a superset schema, and isSubset to do the validation:

import { mergeSchemas, isSubset, ValueType } from 'genson-js';
const merged = mergeSchemas([{ type: ValueType.Number }, { type: ValueType.String }]);
// merged is a superset schema that looks like this:
// { type: ['number', 'string'] }
// returns true, as number is one of the possible types in the merged schema
isSubset(merged, { type: ValueType.Number });
// returns false, as we don't expect to get an array
isSubset(merged, { type: ValueType.Array });

That was quite a simple example, and in this case we could just as well use typeof to do the same thing. However, it gets much more complicated when you deal with large JSON structures.
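For a rough intuition of what “superset” and “subset” mean here, the following sketch reimplements the idea for schemas that carry nothing but a `type`. `TypeOnlySchema`, `mergeTypes`, and `isTypeSubset` are hypothetical names used for illustration and are not part of genson-js, which also has to deal with nested objects and arrays.

```typescript
// Sketch only: schemas here carry just a `type`, unlike real JSON Schemas.
type TypeOnlySchema = { type: string | string[] };

// Normalizes `type` to an array so single and multiple types are handled uniformly.
function typesOf(schema: TypeOnlySchema): string[] {
    return Array.isArray(schema.type) ? schema.type : [schema.type];
}

// Builds a superset schema that accepts every type seen in the inputs.
function mergeTypes(schemas: TypeOnlySchema[]): TypeOnlySchema {
    const seen: string[] = [];
    for (const schema of schemas) {
        for (const t of typesOf(schema)) {
            if (seen.indexOf(t) === -1) seen.push(t);
        }
    }
    return { type: seen.length === 1 ? seen[0] : seen };
}

// True when every type allowed by `sub` is also allowed by `main`.
function isTypeSubset(main: TypeOnlySchema, sub: TypeOnlySchema): boolean {
    const allowed = typesOf(main);
    return typesOf(sub).every((t) => allowed.indexOf(t) !== -1);
}

const knownTypes = mergeTypes([{ type: 'number' }, { type: 'string' }]);
const numberIsKnown = isTypeSubset(knownTypes, { type: 'number' });
const arrayIsKnown = isTypeSubset(knownTypes, { type: 'array' });
```

Here `numberIsKnown` is true and `arrayIsKnown` is false, mirroring the two isSubset calls above. With nested structures the same comparison has to be repeated for every property and every array item, which is where a library earns its keep.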


Lastly, if you just need to compare two schemas, you can use areSchemasEqual:

import { areSchemasEqual, ValueType } from 'genson-js';
areSchemasEqual({ type: ValueType.Number }, { type: ValueType.Number });
// will return true

Performance

I don’t think it’s worth doing proper benchmarking, since we weren’t aiming to build the fastest schema generator.

But to give you a basic understanding of how much time it takes to generate a schema, I’ve created a 1MB JSON file with all kinds of nested objects and used this code to measure it:

import { performance, PerformanceObserver } from 'perf_hooks';
import { createSchema } from '.';
// The 1MB sample file with all kinds of nested objects.
const jsonFile = require('./big-json-file.json');
// Wrap createSchema so that every call emits a 'function' performance entry.
const timedCreateSchema = performance.timerify(createSchema);
function runBenchmarks() {
    for (let i = 0; i < 10; i++) {
        timedCreateSchema(jsonFile);
    }
}
// Print the duration of each timed call, rounded up to whole milliseconds.
const obs = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
        console.log(`${Math.ceil(entry.duration)}ms`);
    }
});
obs.observe({ entryTypes: ['function'] });
runBenchmarks();
obs.disconnect();

And these are the results (I’m using a MacBook Pro with an Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz and Node v12.20.1):

➜  genson-js git:(master) ✗ ts-node src/bench.ts
78ms
37ms
28ms
30ms
28ms
27ms
26ms
26ms
25ms
24ms

As you can see, once the hot functions are JIT-compiled, it takes about 25 to 30ms to generate a schema for 1MB of JSON.

It would take a few seconds to do it with a REST API call, so the difference is very noticeable.

Main Takeaways 

It was interesting and fun to build this library, and it works well for us. Hopefully, someone else can benefit from it too.

The main takeaway for us was that sometimes it’s easier to build a library in your own language than to use an existing solution as a service over RPC.

Whether that holds depends on the complexity and size of the library, but in this case, it was the right decision.

Give our new library a go and if you find it helpful, share this blog post with your team.

We are always happy to receive feedback – let us know.


Developed by Aspecto with ❤️

We are always working to create more libraries and OpenTelemetry Instrumentations. Check out one of our recent posts – OpenTelemetry KafkaJS Instrumentation for Node.js.
