Distinct on Postgresql JSON data column

Trying to do distinct on a mode with rails.

2.1.1 :450 > u.profiles.select("profiles.*").distinct


Profile Load (0.9ms)  SELECT DISTINCT profiles.* FROM "profiles" INNER JOIN "integration_profiles" ON "profiles"."id" = "integration_profiles"."profile_id" INNER JOIN "integrations" ON "integration_profiles"."integration_id" = "integrations"."id" WHERE "integrations"."user_id" = $1  [["user_id", 2]]
PG::UndefinedFunction: ERROR:  could not identify an equality operator for type json
LINE 1: SELECT DISTINCT profiles.* FROM "profiles" INNER JOIN "integ...
                        ^
: SELECT DISTINCT profiles.* FROM "profiles" INNER JOIN "integration_profiles" ON "profiles"."id" = "integration_profiles"."profile_id" INNER JOIN "integrations" ON "integration_profiles"."integration_id" = "integrations"."id" WHERE "integrations"."user_id" = $1
ActiveRecord::StatementInvalid: PG::UndefinedFunction: ERROR:  could not identify an equality operator for type json
LINE 1: SELECT DISTINCT profiles.* FROM "profiles" INNER JOIN "integ...
                        ^
: SELECT DISTINCT profiles.* FROM "profiles" INNER JOIN "integration_profiles" ON "profiles"."id" = "integration_profiles"."profile_id" INNER JOIN "integrations" ON "integration_profiles"."integration_id" = "integrations"."id" WHERE "integrations"."user_id" = $1
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/rack-mini-profiler-0.9.1/lib/patches/sql_patches.rb:109:in `prepare'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/rack-mini-profiler-0.9.1/lib/patches/sql_patches.rb:109:in `prepare'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/connection_adapters/postgresql_adapter.rb:834:in `prepare_statement'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/connection_adapters/postgresql_adapter.rb:795:in `exec_cache'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/connection_adapters/postgresql/database_statements.rb:139:in `block in exec_query'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/connection_adapters/abstract_adapter.rb:442:in `block in log'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activesupport-4.0.4/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/connection_adapters/abstract_adapter.rb:437:in `log'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/connection_adapters/postgresql/database_statements.rb:137:in `exec_query'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/connection_adapters/postgresql_adapter.rb:908:in `select'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/connection_adapters/abstract/database_statements.rb:32:in `select_all'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/connection_adapters/abstract/query_cache.rb:63:in `select_all'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/querying.rb:36:in `find_by_sql'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/relation.rb:585:in `exec_queries'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/association_relation.rb:15:in `exec_queries'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/relation.rb:471:in `load'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/relation.rb:220:in `to_a'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/relation.rb:573:in `inspect'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/railties-4.0.4/lib/rails/commands/console.rb:90:in `start'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/railties-4.0.4/lib/rails/commands/console.rb:9:in `start'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/railties-4.0.4/lib/rails/commands.rb:62:in `<top (required)>'
    from bin/rails:4:in `require'
    from bin/rails:4:in `<main>'2.1.1 :451 > 

Getting an error PG::UndefinedFunction: ERROR: could not identify an equality operator for type json

Converting to Hstore is not an option for me in this case. Any work arounds?

Answers


The reason behind this, is that in PostgreSQL (up to 9.3) there is no equality operator defined for json (i.e. val1::json = val2::json will always throw this exception) -- in 9.4 there will be one for the jsonb type.

One workaround is, you can cast your json field to text. But that won't cover all json equalitions. f.ex. {"a":1,"b":2} should be equal to {"b":2,"a":1}, but won't be equal if casted to text.

Another workaround is (if you have a primary key for that table -- which should be) you can use the DISTINCT ON (<expressions>) form:

u.profiles.select("DISTINCT ON (profiles.id) profiles.*")

Note: One known caveat for DISTINCT ON:

The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.


Sorry I'm late on this answer, but it might help others.

As I understand your query, you're only getting possible duplicates on profiles because of the many-to-many join to integrations (which you're using to determine which profiles to access).

Because of that, you can use a new GROUP BY feature as of 9.1:

When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or if the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.

So in your case, you could get Ruby to create the query (sorry, I don't know the Ruby syntax you're using)...

SELECT profiles.* 
FROM "profiles" 
  INNER JOIN "integration_profiles" ON "profiles"."id" = "integration_profiles"."profile_id" 
  INNER JOIN "integrations" ON "integration_profiles"."integration_id" = "integrations"."id" 
WHERE "integrations"."user_id" = $1
GROUP BY "profiles"."id"

I only removed the DISTINCT from your SELECT clause and added the GROUP BY.

By referring ONLY to the id in the GROUP BY, you take advantage of that new feature because all the remaining profiles columns are "functionally dependent" on that id primary key.

Somehow, wonderfully that avoids the need for Postgres to do equality checks on the dependent columns (ie your json column in this case).

The DISTINCT ON solution is also great, and clearly sufficient in your case, but you can't use aggregate functions like array_agg with it. You CAN with this GROUP BY approach. Happy days! :)


If you use PG 9.4 , using JSONB rather than JSON solves this problem Example :

-- JSON datatype test 

create table t1 (id int, val json);
insert into t1 (id,val) values (1,'{"name":"value"}');
insert into t1 (id,val) values (1,'{"name":"value"}');
insert into t1 (id,val) values (2,'{"key":"value"}');
select * from t1 order by id;
select distinct * from t1 order by id;

-- JSONB datatype test 

create table t2 (id int, val jsonb);
insert into t2 (id,val) values (1,'{"name":"value"}');
insert into t2 (id,val) values (1,'{"name":"value"}');
insert into t2 (id,val) values (2,'{"key":"value"}');

select * from t2 order by id;

select distinct * from t2 order by id;

Result of running the above script :

CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
1 | {"name":"value"}
1 | {"name":"value"}
2 | {"key":"value"}

ERROR:  could not identify an equality operator for type json
LINE 1: select distinct * from t1 order by id;
                    ^
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
1 | {"name": "value"}
1 | {"name": "value"}
2 | {"key": "value"}

1 | {"name": "value"}
2 | {"key": "value"}

As you can see PG succeeded to imply DISTINCT on a JSONB column while it fails on a JSON column !

Try also the following to see that actually keys in the JSONB are sorted :

insert into t2 values (3, '{"a":"1", "b":"2"}');
insert into t2 values (3, '{"b":"2", "a":"1"}');
select * from t2;

1 | {"name": "value"}
1 | {"name": "value"}
2 | {"key": "value"}
3 | {"a": "1", "b": "2"}
3 | {"a": "1", "b": "2"}

note that '{"b":"2", "a":"1"}' was inserted as '{"a":"1", "b":"2"}' therefor PG identifies that as the same record :

select distinct * from t2;
3 | {"a": "1", "b": "2"}
2 | {"key": "value"}
1 | {"name": "value"}

Need Your Help

How do I set a blank value for an f.select form field

ruby-on-rails ruby select

I am using the following to allow my users to select their sex in their profile.

How to open multiple pull requests on GitHub

git github pull-request

When I open a pull request on GitHub, all commits since my last request and all new ones are automatically added to this request. I can't seem to control which commits are added and which are not. ...