I'm currently building an application which makes heavy use of JSONB in PostgreSQL. After a while of searching I was unable to find any solution to people's questions about distinct JSONB keys across an entire table. Here are my findings and I hope it helps someone else.
For the application I"m building, it's highly critical upfront to know which keys exist in the column in order to customize the UI. Data stored in the JSONB column is completely dynamic and has no schema associated, similar to what you'd expect from MongoDB. Due to this restraint, without such query, the application in unable to know which keys it can query.
This is the simplified query I came up with which gives a unique list of every key stored across the entire table. I tested it across with a few million records and returned in a few miliseconds so it more than serves it's performance needs and it can even be easily cached as a flat string array on billions of records.
SELECT
DISTINCT field
FROM (
SELECT jsonb_object_keys(column_name) AS field
FROM table_name
) AS subquery
The above query uses a sub statement to unfold the keys as a queryable array, which then uses DISTINCT
to only bring back unique values from the sub array. Since I'm using Ruby on Rails, I combined it with ActiveRecord in order to give me an array of values to then use in the UI:
result = ActiveRecord::Base.connection.execute(query_string)
fields = result.map { |row| row['field'] }
I managed to build this query using documentation from JSONB Functions and Operations which is quite in depth. I hope this helps someone looking for a similar use case as it seems to be a common question among the fairly new JSONB column types. Comment below with any questions or suggestions for improvement.