GraphQL and Tree Data Structures with Postgres on Hasura GraphQL engine

Modelling a tree data structure on postgres and using GraphQL to add and get data from the tree.

When are tree data structures used ?

Recursive or Tree Data Structures are commonly used to model your schema to store data that has a hierarchy.

Common use-cases of tree data structures include:

Storing a directory structure for an application like Google Drive where you would have a root directory. Within this root directory you may have other directories or files. Every directory at any level of nesting can have any number of files or directories inside it.
Comment threads for blogs or forums where each post or topic can have comments and each comment can in turn have comments (or replies).
Finding common connections or friends for a networking application like LinkedIn or Facebook.

In this blog post we are going to take a look at how you can model your database to build a comment thread feature for your application. Moreover, we will also be using GraphQL to add and delete comments.

Let’s take an example of a very simple blog and work on enabling comments for each post. To elaborate on what we need

Every post can have comments
Every comment can have replies (child comments)
We want to list these comments in an ordered fashion sorted by when they were created.

We are going to go with Hasura GraphQL engine as it provides a Postgres database with instant GraphQL queries.

Getting started

We will use the Hasura GraphQL engine for instantly getting GraphQL APIs over Postgres. Click on the button below to deploy the GraphQL engine to Heroku’s free tier.

Click this button to deploy the GraphQL engine to Heroku

This will deploy the graphql-engine to Heroku. You might need to create a Heroku account if you don’t have one. The graphql-engine will be running at https://your-app.herokuapp.com (replace your-app with your heroku app name).

API Console

Every Hasura cluster comes with an API Console that you can use to build the backend for your application.

The API Console is available at https://your-app.herokuapp.com/console

Creating a table

Head to the Data tab and click on Create Table to create a new table

Let’s call our table post_comments to store the comments for each post. The table will have the following columns

id Integer(auto increment) Primary Key
parent_id Integer Nullable
comment Text
created_at Timestamp default now()
post_id Integer
user_id Integer

Hit the create button to create the table.

This is the table where we will store all the comments for each of the blog posts.

Creating a self reference

Next, let’s define a foreign key constraint on the parent_id column to the id column.

To do this, head to the Modify tab and click on the Edit button next to parent_id. Check the Foreign Key checkbox, select post_comments as the reference table and id as the reference column.

Adding a relationship to fetch the child comments

Next, click on the Relationship tab and click on the Add button under the Suggested Array Relationship column. Name the relationship children_comments

Hit Save to add this relationship.

Fetching and adding comments

The mutation to insert a comment will be

mutation add_comment {
  insert_post_comments(objects: $objects) {
    returning{
      id
      parent_id
      comment
      created_at
      user_id
      post_id
    }
  }
}

The $objects variable will be

{
  "objects": [
    {
      "user_id": 1,
      "post_id": 1,
      "comment": "First comment on post 1",
      "parent_id": null
    }
  ]
}

Similarly, the $objects variable to add a reply to a comment

{
 "objects": [
    {
      "user_id": 1,
      "post_id": 1,
      "comment": "First comment on post 1",
      "parent_id": 1 //Or id of the comment to which this comment is a reply to
    }
  ]
}

Fetching comments

If we are aware of the level of nesting in our comments, then the GraphQL query to fetch comments and all children comments for a post would be

query get_comments_for_post {
  post_comments(
    where: {
      post_id: 1
      parent_id: null
    }
     order_by: ["+created_at"]
  ) {
    id
    parent_id
    comment
    created_at
    user_id
    post_id
    children_comments (
      order_by: ["+created_at"]
    ){
      id
      parent_id
      comment
      created_at
      user_id
      post_id
    }
  }
}

Here, we are fetching all the comments for a post with an id value of 1 and whose parent_id is null. We are then fetching all the replies (children_comments) as a relationship. The +created_at in the order_by field denotes that the comments should be fetched in the ascending order (based on the value of created_at ). Alternatively, - would denote descending. If no symbol is specified in the query, then + is assumed by default.

Similarly, in case you had another level of nesting, the query would be

query get_comments_for_post {
 post_comments(
    where: {
      post_id: 2
      parent_id: null
    }
     order_by: ["+created_at"]
  ) {
    id
    parent_id
    comment
    created_at
    user_id
    post_id
    children_comments (
      order_by: ["+created_at"]
    ){
      id
      parent_id
      comment
      created_at
      user_id
      post_id
      children_comments(
        order_by: ["+created_at"]
      ) {
        id
        parent_id
        comment
        created_at
        user_id
        post_id
      }
    }
  }
}

Working with unknown levels of nesting

Sites like https://news.ycombinator.com/ allow any level of nesting. Which means that every comment can have one more child comments. In this case, fetching our comment like we did above does not work since we do not know how many levels we need to fetch.

One of the ways of handling this is to fetch the complete list of comments for the particular topic (in this case a blog post) and then arrange it in memory on your client.

query get_comments_for_post {
  post_comments(
    where: {
      post_id: 1
    }
     order_by: ["created_at"]
  ) {
    id
    parent_id
    comment
    created_at
    user_id
    post_id
  }
}

You could also have another table that keeps a track of ancestry, something like post_comment_ancestry

comment_id
ancestor_id

Here, for each comment you will store a list of all of its ancestors. For eg: if comment A has two child comments B and C and comment C has a child comment D,

A
| - B
| - C
    | - D

the entry in the post_comment_ancestry table would be

+------------------------+
|comment_id | ancestor_id|
+------------------------+
|     B     |      A     |
|     C     |      A     |
|     D     |      C     |
|     D     |      A     |
+------------------------+

As you can see, comment D has two entries for A and C respectively. Using this table you can fetch a list of all child comments to any arbitrary amount of nesting for a particular comment.