Minimal json schema from json

Script to convert json to minimal json-schema

Recently at work I had to build a tool to prevent misconfiguration of our apps. The configuration file is a simple json file, but for added security, it was decided to do some sanity checks on the values.

So the setup is simple: I have a fairly large json file and I want to output a corresponding json schema to validate the config.

For example, given the following json object:

{
  "network": {
    "poll_frequency": 42,
    "server_url": "https://foo.bar.com"
  },
  "storage": "sqlite"
}

Would give the following json schema:

{
  "type": "object",
  "properties": {
    "network": {
      "type": "object",
      "properties": {
        "poll_frequency": {
          "type": "number"
        },
        "server_url": {
          "type": "string"
        }
      },
      "required": ["poll_frequency", "server_url"]
    },
    "storage": {
      "type": "string"
    }
  },
  "required": ["network", "storage"]
}

Haskell script with stack

With stack, it is possible to run haskell as a script, which is very useful. Some boilerplate first:

#!/usr/bin/env stack
{- stack --resolver lts-6.10 runghc
    --package text
    --package aeson
    --package aeson-pretty
    --package containers
-}

A few notes:

Pragmas and imports

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy as B
import qualified System.Environment as Env
import qualified Data.Aeson as JSON
import Data.Aeson.Encode.Pretty (encodePretty', defConfig, confCompare)
import qualified Data.HashMap.Strict as Map
import qualified Data.HashSet as Set
import qualified Data.Vector as Vector

OverloadedStrings is very basic and makes working with Text and String much easier. I'll go over the imports as they are used in the code. I like to keep them qualified as much as possible to keep things crystal clear.

Decode a given payload

main :: IO ()
main = do
    [specPath, outPath] <- Env.getArgs
    rawSpec <- B.readFile specPath
    case JSON.eitherDecode rawSpec of -- :: Either String JSON.Value
        Left err -> print err
        Right val -> undefined -- coming soon

Aeson provides a function eitherDecode with the type FromJSON a => ByteString -> Either String a. Here, explicitely providing an instance is not required because we're going to directly use a JSON Value and not bother converting to some internal datatype.

Converting a json object to a schema

This is where pattern matching comes very handy.

The function will have a simple type:

convertToSchema :: JSON.Value -> JSON.Value

First, let's have a look at the simple cases. Json schema has some primitive types, like number, null, string and boolean.

convertToSchema (JSON.String _) = JSON.Object $ Map.singleton "type" (JSON.String "string")
convertToSchema (JSON.Number _) = JSON.Object $ Map.singleton "type" (JSON.String "integer")
convertToSchema (JSON.Bool _) = JSON.Object $ Map.singleton "type" (JSON.String "boolean")
convertToSchema JSON.Null = JSON.Object $ Map.singleton "type" (JSON.String "null")

Here, every value with a given type is simply replaced by a schema specifying its type.

Now, for an object, we want to do two things:

convertToSchema (JSON.Object o) =
  let
    keys = JSON.String <$> Map.keys o  -- wrap all keys as json string
    props = JSON.Object $ convertToSchema <$> o  -- recurse over the values of the current object
  in
    JSON.Object $ Map.fromList [
      ("type", JSON.String "object")
    , ("properties", props)
    , ("required", JSON.Array $ V.fromList keys)
    ]

An array is very similar. The main difference is to chose which schema(s) to generate for the items. Also, the minimum number of items will be the number of items we got. This is somewhat arbitrary but it can always be changed later (manually if needs be).

-- utility to remove duplicates in a vector
nubVector v = V.fromList $ Set.toList $ V.foldl' (flip Set.insert) Set.empty v

convertToSchema (JSON.Array arr) =
    let
        items = convertToSchema <$> arr
        uniqueItems = nubVector items
        innerSchema = if length uniqueItems == 1
            then uniqueItems V.! 0
            else JSON.Object $ Map.singleton "oneOf" (JSON.Array uniqueItems)
        minItems = fromIntegral (length arr)
    in
        JSON.Object $ Map.fromList [
        ("type", JSON.String "array")
        , ("minItems", JSON.Number minItems)
        , ("items", innerSchema)
        ]

Pretty printing the schema

That's where aeson-pretty comes in handy:

    -- continued from main
    Right val -> do
        let schema = convertToSchema val
        let pretty = encodePretty' (defConfig {confCompare = compare}) schema
        B.writeFile outPath pretty

Et voilĂ , nothing more to do. The resulting schema is very minimal, but it's a good base to customize later. Since json schemas can be quite verbose, it's a pain to manually write them.

For ease of copy pasting, the complete script can be found below.

#!/usr/bin/env stack
{- stack --resolver lts-6.10 runghc
    --package text
    --package aeson
    --package aeson-pretty
    --package containers
    -- -W
-}
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy as B
import qualified System.Environment as Env
import qualified Data.Aeson as JSON
import Data.Aeson.Encode.Pretty (encodePretty', defConfig, confCompare)
import qualified Data.HashMap.Strict as Map
import qualified Data.Vector as V
import qualified Data.HashSet as Set

main :: IO ()
main = do
  [specPath, outPath] <- Env.getArgs
  rawSpec <- B.readFile specPath
  case JSON.eitherDecode rawSpec of -- :: Either String JSON.Value
    Left err -> print err
    Right val -> do
      let schema = convertToSchema val
      let pretty = encodePretty' (defConfig {confCompare = compare}) schema
      B.writeFile outPath pretty


convertToSchema :: JSON.Value -> JSON.Value
convertToSchema (JSON.Object o) =
  let
    keys = JSON.String <$> Map.keys o
    props = JSON.Object $ convertToSchema <$> o
  in
    JSON.Object $ Map.fromList [
      ("type", JSON.String "object")
    , ("properties", props)
    , ("required", JSON.Array $ V.fromList keys)
    ]

convertToSchema (JSON.Array arr) =
    let
        items = convertToSchema <$> arr
        uniqueItems = nubVector items
        innerSchema = if length uniqueItems == 1
            then uniqueItems V.! 0
            else JSON.Object $ Map.singleton "oneOf" (JSON.Array uniqueItems)
        minItems = fromIntegral (length arr)
    in
        JSON.Object $ Map.fromList [
        ("type", JSON.String "array")
        , ("minItems", JSON.Number minItems)
        , ("items", innerSchema)
        ]

convertToSchema (JSON.String _) = JSON.Object $ Map.singleton "type" (JSON.String "string")
convertToSchema (JSON.Number _) = JSON.Object $ Map.singleton "type" (JSON.String "integer")
convertToSchema (JSON.Bool _) = JSON.Object $ Map.singleton "type" (JSON.String "boolean")
convertToSchema JSON.Null = JSON.Object $ Map.singleton "type" (JSON.String "null")

-- remove duplicates in a vector
nubVector v = V.fromList $ Set.toList $ V.foldl' (flip Set.insert) Set.empty v