Querying an RDBMS directly in JSON

How we went about designing the Hasura platform data APIs — MongoDB like simplicity, but over Postgres.

TLDR

What if this intent:

Fetch top 5 articles by rating for every category

was this query:

Query:

{
    "from"     : "category",
    "columns"  : [
        "name",
        "description",
        { "name"     : "articles",
          "columns"  : ["title", "rating"],
          "order_by" : "-rating",
          "limit"    : 5
        }
    ]
}

Response:

[ 
  { "category"    : "technology",
    "description" : "...",
    "articles"    : [
        { "title"  : "Hello",
          "rating" : 5
        },
        ..
    ]
  },
  ..
]

Why is this useful?

Imagine the server-side programming effort we would save, if we had an existing service that arbitrated a JSON interface to an RDBMS.

Technically, the required input/information for this service is only:

  1. The schema information: To compile JSON queries to SQL and to serialize the SQL response to JSON
  2. Row Level Security annotations: To prevent unauthorized access to data. Eg: one user should not be able to query other users’ data

Building the JSON interface

Consider this schema, for the discussions below:

Every article has only one author, and only one category 
We will discuss the following:

  1. SELECT query and response translation to JSON
  2. SELECT with JOINs, queries and responses
  3. JOIN with WHERE, queries and responses

1. Simple SELECT queries

1.1 Syntax

Except for the WHERE clause, a simple SQL SELECT query can be represented in JSON as follows:

SQL:  

SELECT title, rating  
  FROM article 
  WHERE published = '2015-11-04' AND rating > 4 
  ORDER BY rating DESC 
  LIMIT 10
  OFFSET 10

Desired JSON:

{
    "columns"  : ["title", "rating"],
    "from"     : "article",
    "where"    : ????
    "order_by" : "-rating",
    "limit"    : 10,
    "offset"   : 10
}

A where clause in SQL's SELECT is a boolean expression which can be evaluated to True or False for each row. We now need a syntax for boolean expressions in JSON

Taking inspiration from MongoDB:

This basic boolean expression { "COLUMN" : { "OP" : VALUE } } can now be combined using $and, $or and $not to build increasingly complex boolean expressions.So, the where clause in the above SQL query can now be written in JSON as

{ 
    "$and" : [ { "published": { "$eq": "2015-11-04" } } 
             , { "rating"   : { "$gt": 4 } }
             ]
}

However, this could be further simplified with some syntactic sugar as follows :

{
    "published" : "2015-11-04",
    "rating"    : { "$gt" : 4 }
}

So finally:

SQL:

SELECT title, rating  
  FROM article 
  WHERE published = '2015-11-04' AND rating > 4 
  ORDER BY rating DESC 
  LIMIT 10
  OFFSET 10

Desired JSON:  

{
    "columns"  : ["title", "rating"],
    "from"     : "article",
    "where"    : {
        "published" : "2015-11-04",
        "rating"    : { "$gt" : 4 }
    },
    "order_by" : "-rating",
    "limit"    : 10,
    "offset"   : 10
}

1.2 Response

The response of an SQL SELECT statement is zero or more rows. 
Each row can be mapped to an object in JSON, with the entire result being an array of objects.

SQL:

+---------+------------+
|  title  |   rating   |
+---------+------------+
|  Hello  |      5     |
+----------------------+
|  World  |     4.9    |
+----------------------+

JSON:

[
    {
        "title"  : "Hello",
        "rating" : 5
    },
    {
        "title"  : "World",
        "rating" : 4.9
    }
]

2. SELECT queries with JOINs

Typically, JOINs in SQL are used to de-normalize data (in a sense, resolving data relationships captured by foreign key constraints). For example, to resolve the 'author' information of an article, the query would be as follows :

SELECT title, rating, author.id, author.name  
  FROM article 
  JOIN author
    ON article.author_id = author.id

Instead of coming up with an alternative syntax for JOINs, we’ll approach JOINs more implicitly through the use-case of denormalization.

To begin, instead of looking at the query translation, let’s look at ways to represent denormalized data in JSON.

2.1 Denormalized data in JSON

Example denormalizations for this schema:

Article and it’s author:

{
    "title"  : "Hello",
    "rating" : 5,
    "author" : {
        "name" : "vamshi",
        "id"   : 1
    }
}

Author and his articles:  

{
    "id"       : 1,
    "name"     : "vamshi",
    "articles" : [
        {
            "title"  : "Hello",
            "rating" : 5
        },
        {
            "title"  : "World",
            "rating" : 4.9
        }
    ]
}

As seen above, it seems natural to extend the object and array syntax, to represent relationships. This is already an established way of representing relationships in document stores.

If we call the author of an article an object_relationship, and articles of an author an array_relationship, the comparison with ORMs is as follows :

2.2 Denormalization queries in JSON

Object relationships

Fetch articles and each one's author

{
    "columns"  : [
        "title",
        "rating", 
        { "name"    : "author",
          "columns" : ["id", "name"] 
        }
    ],
    "from"     : "article"
}

Array relationships

Fetch authors and each one's articles

{
    "columns"  : [
        "id",
        "name",
        { "name"    : "articles",
          "columns" : ["title", "rating"] 
        }
    ],
    "from"     : "author",
    "where"    : { "id" : 1 }
}

2.3 Arbitrary depth denormalization

Fetch 1 author, his articles, and the category for each article

Request:  

{
    "columns"  : [
        "id",
        "name",
        { "name"    : "articles",
          "columns" : [
              "title",
              "rating",
               { "name"    : "category"
               , "columns" : ["id", "name", "desc"]
               }
          ]
        }
    ],
    "from"     : "author",
    "limit"    : 1
}

Response:

[{
    "id"       : 1,
    "name"     : "Ramu",
    "articles" : [
        {
            "title"  : "Hello",
            "rating" : 5,
            "category" : { "name" : "general", 
                           "desc" : "..." }
        },
        {
            "title"  : "World",
            "rating" : 4.1,
            "category" : { "name" : "programming", 
                           "desc" : "..." }
        }
    ]
}]

3. JOINs with WHEREs

Denormalization via the notion of relationships in our data, thus mostly mitigate the need for explicit joins. 
Let us push this idea of relationships even further.

3.1. WHERE inside the nested data

What if we allowed a where clause, inside the nested column specification?

{
    "columns"  : [
        "id",
        "name",
        { "name"    : "articles",
          "columns" : ["title", "rating"],
          "where"   : { "rating" : { "$gt" : 4.5 } }
        }
    ],
    "from"     : "author"
}

We’re not fetching authors whose articles have a rating > 4.5, we’re fetching the articles of the authors that have a rating > 4.5 for every author.

We’re filtering the nested array on properties of objects within the array. Going all the way, and adding order-ing, limit-ing we can do some pretty cool stuff.

This functionality is similar to the recently introduced inner_hits in elasticsearch.

3.3. Top level WHERE clause using nested object properties

We can flip the idea above, and use the property of a nested object to filter our top-level result.

Articles whose author relationship holds this property { “name” : “vamshi” }

{
    "columns" : ["title", "rating"]
    "from"    : "article"
    "where"   : {
        "author" : { "name" : "vamshi" }
    }
}


All authors who have written at least one article with rating greater than 4

{
    "columns" : ["id", "name"]
    "from"    : "author"
    "where"   : {
        "articles" : { "rating" : { "$gt" : 4 } }
    }
}

Comparison with GraphQL

While there is a fair amount of overlap between this kind of a JSON query syntax and GraphQL, both aim to solve slightly different fundamental problems.

JSON queries to SQLGraphQLDesigned to be used with an RDBMSBackend agnosticUsing schema information, compiler should be able to execute all JSON queries within this syntaxWrite server side code to fetch data for predefined query ‘templates’A query is just JSON dataGraphQL data format

One subtle yet extremely important difference is that GraphQL is not really a query language in the traditional sense. It is as much of a query language as you want it to be (by writing code on a GraphQL server).

Conclusion

The next parts in this series will address Row Level Security or “permissions”, without which none of this would make sense for direct client access. Subsequently, we’ll tackle schema-altering JSON queries too.

The Hasura platform Data APIs

The Hasura platform’s Data APIs are an implementation of this JSON query compiler on top of Postgres, and applies application-user permissions to ensure safe data access. Take them for a spin here: https://hasura.io


Hasura is an open-source engine that gives you realtime GraphQL APIs on new or existing Postgres databases, with built-in support for stitching custom GraphQL APIs and triggering webhooks on database changes.