all posts

Querying an Embedded Map in PostgreSQL with Ecto

Published on 2018-03-09

8m read
elixir

Structs and maps are easy to work with in Elixir, but if they are stored in the database as JSON and accessed via an Ecto Schema, it’s not as clear how to query them. We’re going to explore how to do that, and make it clear and easy.

PostgreSQL has great support for objects stored as JSON. This is useful for those moments when you need to store data that could be variably structured, such as responses from other services’ APIs, or data that frequently travels together within your relational tables.

A common trade-off for mixing scalar column data types (like varchar or integer) with column data types that handle more-complicated objects (like <abbr title=”JavaScript Object Notation”>JSON</abbr>) is that <abbr title=”Object-relational Mapping”>ORMs</abbr> or data mappers sometimes can’t introspect on them for you, which means it becomes much harder to query that data.

Using Ecto’s embedded_schema helps introspect on those known values, but it doesn’t really assist you with querying those fields in SQL. This is where I became extremely greatful for Ecto’s escape hatch: fragment().

Define the Struct or Map in Ecto

Let’s dive into some code as an example:

I have a Vehicle.Photo schema that has several versions of the photo:

  • craigslist_ad
  • facebook_ad
  • facebook_carousel_ad
  • extra_large
  • extra_small
  • large
  • medium
  • original
  • small

We decided to store the versions’ <abbr title=”Uniform Resource Locators”>URLs</abbr> inside a map in the database, because we’re going to use a set of the URLs at the same time inside of an HTML <img srcset />. You can read more about srcset from MDN and how it helps with responsive images.

The Ecto migration looks like this:

def up do
  alter table(:vehicle_photos) do
    add :standard_urls, :map
    add :facebook_urls, :map
    add :craigslist_urls, :map
  end
end

The Ecto schema looks like this:

schema "vehicle_photos" do
  field(:file, PhotoUploader.Type)

  embeds_one :standard_urls, StandardUrls, on_replace: :update do
    field(:extra_large, :string)
    field(:extra_small, :string)
    field(:large, :string)
    field(:medium, :string)
    field(:original, :string)
    field(:small, :string)
  end

  embeds_one :facebook_urls, FacebookUrls, on_replace: :update do
    field(:hero_ad, :string)
    field(:carousel_ad, :string)
  end

  embeds_one :craigslist_urls, CraigslistUrls, on_replace: :update do
    field(:ad, :string)
  end
end

Since this is a known structure, Ecto can introspect on the JSON values and cast and dump them to the appropriate Elixir data types, which is immensely helpful. Here I am achieving that by using embeds_one and specifying the struct. Once pulled from the database, Ecto will decode them.

Other times, you may not be able to do this ahead of time, so the schema might look like this (the api_response field):

schema "vehicle_photos" do
  field(:file, PhotoUploader.Type)
  field(:api_response, :map)
end

Query the JSON

Continuing with the struct example schema, we found out that some of our URLs weren’t being populated like we expected, so I had to find those photos and fix them. How do I query for them since they’re stored in PostgreSQL as JSON? We need to drop down into raw SQL:

def where_photo_urls_have_a_null(query) do
  query
  |> where([_q], fragment(
    """
    (facebook_urls IS NULL) OR
    (facebook_urls->>'ad_version' IS NULL) OR
    (facebook_urls->>'hero_version' IS NULL) OR
    (craigslist_urls->>'ad' IS NULL)
    """
  ))
end

The SQL operator ->> will leverage PostgreSQL’s JSON functions to retrieve the text or integers that are stored in the JSON. You can access them using this syntax: column->>key. In my case, I needed to find if the column was null, or it wasn’t null, then to ask if the JSON object has any keys that are null. This will work regardless of whether you use an embedded struct or a map, because PostgreSQL sees it as the same thing: JSON.

Here’s an example that checks for substrings:

def where_photo_url_wrong(query) do
  query
  |> where([_q], fragment(
    """
    (facebook_urls->>'hero_ad' NOT ILIKE ?) OR
    (facebook_urls->>'carousel_ad' NOT ILIKE ?) OR
    (craigslist_urls->>'ad' NOT ILIKE ?)
    """,
    "%facebook_hero_ad%",
    "%facebook_carousel_ad%",
    "%craigslist_ad%"
  ))
end

Make the Query Composable

Above is all I needed for my use case, but I wondered how I could continue querying those fields in a reusable way. For example, how do I chain these together in an OR statement that uses both of these fragments?

To do that, I’ll need to extract the fragment expressions and put them into a macro so they can be used within Ecto’s functions.

defmodule MyProject.SampleQuery.Fragments do
  import Ecto.Query.API, only: [fragment: 1]

  defmacro photo_urls_have_a_null do
    quote do
      fragment(
        """
        (facebook_urls IS NULL) OR
        (facebook_urls->>'ad_version' IS NULL) OR
        (facebook_urls->>'hero_version' IS NULL) OR
        (craigslist_urls->>'ad' IS NULL)
        """
      )
    end
  end

  defmacro photo_urls_not_contain([hero_ad_value, carousel_ad_value, ad_value]) do
    quote do
      fragment(
        """
        (facebook_urls->>'hero_ad' NOT ILIKE ?) OR
        (facebook_urls->>'carousel_ad' NOT ILIKE ?) OR
        (craigslist_urls->>'ad' NOT ILIKE ?)
        """,
        ^"%#{unquote(hero_ad_value)}%",
        ^"%#{unquote(carousel_ad_value)}%",
        ^"%#{unquote(ad_value)}%"
      )
    end
  end
end

Now that those fragments are extracted, let’s use them:

import MyProject.SampleQuery.Fragments
alias MyProject.Photo

defmodule MyProject.SampleQuery do
  def find_bad_photos(query \\ Photo) do
    query
    |> where([_p], photo_urls_have_a_null())
    |> or_where([_p], photo_urls_not_contain([
      "facebook_hero_ad",
      "facebook_carousel_ad",
      "craigslist_ad"
    ]))
    |> Repo.all
  end
end

Beautiful.

If you’d like to check out the code a little more, you can see this sample Ecto and Phoenix repo with tests.

This article only explains how to query a JSON object in the database and how it works with Ecto querying. If you’re needing to store an array of maps or structs, then check out Jon’s post Why Ecto’s Way of Storing Embedded Lists of Maps Makes Querying Hard.


What do you think of what I said?

Share with me your thoughts. You can tweet me at @bernheisel .