Postgres Containment Operators : Part 2 — Performance comparison with MongoDB

In Part 1 of the series, we looked at what Postgres containment operators are. In Part 2, we see how they perform when compared to MongoDB.

What are we measuring ?

We measure the time taken to execute the following four types of queries,

  1. Simple — Query for a value in a top level key
  2. Embedded Document — Query for a value inside a nested object
  3. Array — Query for the existence of a value inside an array
  4. Multi-key — Query for values across multiple keys

under the following settings,

  1. Postgres without using containment operator
  2. Postgres using containment operator
  3. MongoDB

in both indexed and non-indexed cases.

Measurement Setup

The tests are run on a MacBook Pro (Retina, 13-inch, Late 2013). The versions of Postgres and MongoDB are 9.4.4 and 3.0.7 respectively.

Postgres timing information is obtained by prefixing the queries with EXPLAIN (ANALYZE TRUE, TIMING FALSE). MongoDB timing information is obtained from db.collection.find().explain("executionStats").

Data

Both databases are loaded with 1 million random JSON objects of the form,

{
  "name": "Ochoa",
  "age": 47,
  "spouse": {
    "name": "Santos",
    "age": 35
  },
  "children": [
    "Jannie",
    "Olsen"
  ]
}

The Postgres table the data was loaded into has the schema,

Column | Type  | Modifiers
-------+-------+----------------------------------------------------
 id    |integer| not null default nextval('people_id_seq'::regclass)
 person| jsonb |

Queries

Query Type   | Query                                           |Hits
--------------------------------------------------------------------
Simple       | Find all people with name "John"                |478
Embedded Doc | Find all people with spouse's name "John"       |492
Array        | Find all people having a child named "John"     |1532
Multi-key    | Find all people aged 50 whose spouse is aged 32 |2288

The following queries were used.

1. Postgres (without containment)

SELECT person FROM people WHERE person ->> 'name' = 'John'

SELECT person FROM people WHERE person - > 'spouse' - >> 'name' = 'John'

SELECT person  
FROM people  
WHERE EXISTS (  
        SELECT 1
        FROM jsonb_array_elements_text(person - > 'children') children
        WHERE children = 'John'
        )

SELECT person  
FROM people  
WHERE person - >> 'age' = '50'  
    AND person - > 'spouse' - >> 'age' = '32';

2. Postgres (with containment)

SELECT person  
FROM people  
WHERE person @ > '{"name" : "John"}';

SELECT person  
FROM people  
WHERE person @ > '{"spouse" : {"name" : "John"}}'

SELECT person  
FROM people  
WHERE person @ > '{"children" : ["John"]}'

SELECT person  
FROM people  
WHERE person @ > '{"age" : 50, "spouse": {"age" : 32}}'

3. MongoDB

db.people.find({"name" : "John"})

db.people.find({"spouse.name" : "John"})

db.people.find({"children" : "John"})

db.people.find({"age": 50, "spouse.age" : 32})

Indexes

In case of Postgres (without containment) and MongoDB we created indices for — name, spouse.name, children, age & spouse.age and in the case of Postgres (with containment) we created one GIN (jsonb_path_ops) index.

Results

We present two results — the timing measurements in non-indexed and indexed cases.

Non-Indexed

Indexed

Conclusion

The numbers suggest that Postgres jsonb performance is at par with MongoDB in most cases and betters it in multi-key containment queries. Moreover it can do it with just one index, thanks to GIN. Well done Postgres!


Hasura is an open-source engine that gives you realtime GraphQL APIs on new or existing Postgres databases, with built-in support for stitching custom GraphQL APIs and triggering web hooks on database changes.


Hasura

Hasura

The Hasura GraphQL Engine gives you realtime, high performance GraphQL on any Postgres app. Now supports event triggers for use with serverless.

Read More