ActiveRecord and Why “Easier” May Not Always be “Better”

During this phase in our bootcamp we have been learning the foundations for building a backend in Ruby. We began with some brief refreshing of Ruby’s syntactical and other differences (for we had been working with JavaScript up until this point). This was followed by a quick jaunt into SQL, just enough to get a feel for what would be happening under the hood when we began working with ActiveRecord.

My grasp of chaining commands in SQL was still quite shaky when we moved on to ActiveRecord, but I already recognized the potential utility of a well-crafted query. With the promised power of SQL queries still bouncing around in my head, I began to undertake a series of labs focused on teaching us ActiveRecord.

One lab which involved tables of students and exams, as well as a join-table connecting them, called for a method that would find the exam with the lowest class-average. I reflexively went about the task by using what muddled understanding I had about chained SQL statements and trying to implement that approach in ActiveRecord. The end result was an approach that worked, but felt really cumbersome and confusing. I dreaded the prospect of going about future tasks similarly, there had to be a better approach.

class Exam < ActiveRecord::Base  has_many :student_exams  has_many :students, through: :student_exams
. . .
def self.lowest_average id = joins(:student_exams)
.min_by {|k, v| v}[0])
find(id) end

(Note: the above code is as it was originally written and is presented to illustrate my journey, and is not meant to be instructive)

A short while after constructing that monstrosity, I was introduced to a much easier approach using ActiveRecord methods and some simple iteration:

def class_average  student_exams.average(:grade).to_fend
def self.lowest_average all.min_by {|exam| exam.class_average} end

This looked so much cleaner and felt way more intuitive to me, and I was overjoyed that I would not have to go through the anguish of my first approach forevermore. However, when I began to test this new approach, I noticed something curious: where previously my lowest_average method had been generating a single SQL query behind the scenes this shiny new model was generating way more, one to grab all of the exams and then another to grab each of exams’ averages. While this wasn’t causing any problems for me at this tiny scale, I wondered what the implications might be of this discrepancy I discovered.

After looking into it for a mere second, I discovered that my discovery was in fact a well known issue called the N+1 Queries Problem. This problem arises in situations where you have (at least) a one-to-many relationship and you access the numerous objects that belong to owner objects using what is called “lazy loading,” that is to say querying for specific things only when it is necessary to access them. My latter, cleaner, reformatted code above is an example of lazy loading. First, it queries the list of exams, and only after it has those does it query the student_exams join table for each exam. My first attempt at the code accidentally avoided this (at the time unknown) problem by essentially constructing a single query that asked the database to do the work of organizing the exams by their class averages.

Fortunately, my convoluted original strategy is not the only solution to this problem! ActiveRecord has several built in methods that can “eagerly load” (the opposite of lazy loading) information with fewer queries, such as .includes, .eager_load, and .preload. The relative benefits and pitfalls of each of these, as well as their individual implementation are beyond the scope of this post, but I will certainly be exploring this topic further as I get closer to working with projects of a scale where efficiency matters.