Write One Query

Sometimes is easier to write some Ruby to tackle a problem, but may result in a performance hit. For example, looping through a set of ActiveRecord objects and doing some processing on each object, or even performing an SQL query on every iteration to obtain the desired results. I came across some code recently that was doing just this.

It involved three models with a grandparent > parent > grandchild relationship with your typical 1:M parent to child associations. The grandchild entity included a foreign key status_id which corresponds to a particular status, for example 'Pending', 'Reviewing', or 'Approved'. The goal was to determine whether or not the grandparent had any grandchildren that were in any of these states, so the end result would be this desired hash for each grand_parent:

[
  {
    :grand_parent_id             => 1,
    :has_pending_grandchildren   => true,
    :has_reviewing_grandchildren => false,
    :has_approved_grandchildren  => true
  },
  {
    # ...
  }
]

The purpose of returning this format was to be able to apply some frontend filtering at the grandparent level based on the status of the grandchildren. The original code was doing something along these lines:

# ['Pending','Reviewing', 'Approved', ...]
statuses = Status.pluck(:title)
results = []
GrandParent.all.each do |gp|
  hash = { :grand_parent_id => gp.id }
  statuses.each do |status|
    status_count = GrandChild.find_by_sql("
      SELECT count(gc.id) gc_count
      FROM grand_children gc
      JOIN parents p on p.id = gc.parent_id
      JOIN grand_parents gp on gp.id = p.grand_parent_id
      JOIN statuses s on s.id = gc.status_id
      WHERE gp.id = #{gp.id} AND s.title = #{status}
    ").first
    hash["has_#{status.downcase}_grandchildren".to_sym] = status_count.present? && status_count.gc_count > 0 ? true : false
  end
  results << hash
end

For a small number of records, this would not be an issue, but when you start to look at a 100 grandparent records and thousands of grandchildren, then the performance hit is noticeable. My goal was to bring the logic into a single SQL query to eliminate overhead. I started with a pure SQL statement given a couple status types and used case statements to return 1 or 0 based on the count of a particular status. The syntax is based on using Oracle, but I imagine would be similar across other RDBMS systems.

SELECT
  gp.id grand_parent_id,
  CASE
    sum(CASE s.title WHEN 'Pending' THEN 1 ELSE 0 END)
    WHEN 0 THEN 0
    WHEN NULL THEN 0
    ELSE 1
  END has_pending_grandchildren,
  CASE
    sum(CASE s.title WHEN 'Reviewing' THEN 1 ELSE 0 END)
    WHEN 0 THEN 0
    WHEN NULL THEN 0
    ELSE 1
  END has_reviewing_grandchildren,
  CASE
    sum(CASE s.title WHEN 'Approved' THEN 1 ELSE 0 END)
    WHEN 0 THEN 0
    WHEN NULL THEN 0
    ELSE 1
  END has_approved_grandchildren
  FROM grand_parents gp
  LEFT JOIN parents p on p.grand_parent_id = gp.id
  LEFT JOIN grand_children gc on gc.parent_id = p.id
  LEFT JOIN statuses s on s.id = gc.status_id
  GROUP BY gp.id

As you can see, there is a pattern for determining the whether or not any of the grandchildren are of a particular status. It takes the sum of a particular status and returns 0 if the sum is null or zero, otherwise returns one. The null portion is required in case there are no children since we are still doing a left join. I was now able to build the query using ruby, since status is table drive, we will want to cover all statuses possible. Here's the final solution I came up with:

statuses = Status.pluck(:title)
sql = "SELECT "
statuses.each do |status|
  sql << "CASE "
  sql << "sum(CASE s.title WHEN '#{status}' THEN 1 ELSE 0 END) "
  sql << "WHEN 0 THEN 0 WHEN NULL THEN 0 ELSE 1 END has_#{status.downcase}_grandchildren, "
end
sql << "gp.id grand_parent_id "
sql << "FROM grand_parents gp "
sql << "LEFT JOIN parents p on p.grand_parent_id = gp.id "
sql << "LEFT JOIN grand_children gc on gc.parent_id = p.id "
sql << "LEFT JOIN statuses s on s.id = gs.status_id "
sql << "GROUP BY gp.id"

# We still need to loop thorugh to set boolean values
results = []
GrandParent.find_by_sql(sql).each do |result|
  hash = result.attributes
  hash.keys.each do |k|
    next if k.to_s == "grand_parent_id"
    hash[k] = hash[k] == 0 ? false : true
  end
  results << hash
end

And that's that! In my case I was able to change the load time from about 6s to under a second. Important - this is prone to SQL injection, so status must be sanitized appropriately.