The Siren Song of ‘has_and_belongs_to_many’ in Rails

Pete Hanner
7 min read · Sep 19, 2019
Beware! Read on to find out why.

As an object-oriented language, Ruby (and by extension, Rails) supports the use of classes to give context and functionality to the various object instances used throughout the program. These classes are related to one another in various ways, and while many exist to model real-world objects, others exist only to store the relationships between those classes. Rails, with its goal of making our programming lives as easy as possible, provides an out-of-the-box way to create these model-linking models (backed by what databases call join tables) quickly and easily. That may sound tempting, but let’s dig into the reasons why the easiest path isn’t always the best.

Join Tables: The Origin Story

Databases and many-to-many relationships

In Rails, relationships between our models are described in terms of which other models they have and belong to, and whether those models are one or many. A one-to-many relationship (a has_many / belongs_to relationship in Rails terms) is fairly simple. Imagine a museum full of paintings. A given artist may have dozens of paintings, but each painting belongs only to the one artist who painted it. When a painting is hung in the museum, a placard is placed below it naming its artist. Similarly, if we were to keep track of the gallery in a Rails program, we would give the paintings table an artist_id column in its migration, then declare belongs_to :artist on Painting and has_many :paintings on Artist. From there, all we have to do is assign a painting to an artist in the code, and ActiveRecord takes care of the model linking behind the scenes.

It’s when we arrive at many-to-many relationships that things get thorny. To continue our painting example: the great masters often worked for many patrons, and each patron might have several painters as clients. If our program wants to track this relationship, we hit a problem. If an Artist wants to track all their Patrons in their own table, we would have to add a new column for every new patron, since a relational database cell is meant to hold a single value. As a popular artist grows in prosperity, we start to see the problem as columns for patron1_id, patron2_id, patron3_id, patron4_id and so on gradually stretch across the page. This is true for all Artists, and the same holds for Patrons on the opposite side of the relationship.

The solution is the above-mentioned join table. This database table has just three columns: its own id, an Artist id, a Patron id (or whatever your many-to-many models happen to be). Without getting too deep into database concepts, this solves our problem by having a separate model that keeps track of every relationship between two other models — which as an added bonus is easier for a computer to search and filter. In Rails, we would just set up an ArtistPatron class and table in our migrations. From there, thanks to ActiveRecord magic again, we can assign or call things like michelangelo.patrons or lorenzo_medici.artists throughout our program, and the join table is updated or referenced automatically behind the scenes.
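To make the idea concrete, here is a minimal plain-Ruby sketch of what a join table does. It is only a stand-in: in a real app the database and ActiveRecord do this work, and the Struct names, helper methods, and sample rows below are illustrative assumptions rather than anything Rails generates.

```ruby
# Plain-Ruby stand-in for two tables plus a join table (illustrative only).
Artist       = Struct.new(:id, :name)
Patron       = Struct.new(:id, :name)
ArtistPatron = Struct.new(:id, :artist_id, :patron_id)

ARTISTS = [Artist.new(1, "Michelangelo"), Artist.new(2, "Botticelli")].freeze
PATRONS = [Patron.new(1, "Lorenzo de' Medici"), Patron.new(2, "Julius II")].freeze

# One row per relationship, so no patron1_id, patron2_id, ... columns.
ARTIST_PATRONS = [
  ArtistPatron.new(1, 1, 1),
  ArtistPatron.new(2, 1, 2),
  ArtistPatron.new(3, 2, 1)
].freeze

# Roughly what artist.patrons does: filter join rows, then look up patrons.
def patrons_of(artist)
  patron_ids = ARTIST_PATRONS.select { |row| row.artist_id == artist.id }.map(&:patron_id)
  PATRONS.select { |patron| patron_ids.include?(patron.id) }
end

# The same lookup from the other side of the relationship.
def artists_of(patron)
  artist_ids = ARTIST_PATRONS.select { |row| row.patron_id == patron.id }.map(&:artist_id)
  ARTISTS.select { |artist| artist_ids.include?(artist.id) }
end

puts patrons_of(ARTISTS[0]).map(&:name).inspect
puts artists_of(PATRONS[0]).map(&:name).inspect
```

Adding a new patron for an artist means adding one row to the join table, never a new column on either side.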

Lisa approves.

Join Tables Made Easy With Rails

Can you hear the sirens calling?

Let’s imagine we’re writing a recipe app (like my wonderful partner Marisa and I were for Module 2 project week at Flatiron), where we want users to be able to add ingredients to their kitchen. Users will have many ingredients in their kitchens, and a given ingredient will appear in many users’ kitchens. Sounds like we need a join table! Let’s walk through the process, keeping in mind that this example glosses over or abridges some steps to focus on the relevant material.

First we set up the migrations:

class CreateUsers < ActiveRecord::Migration[6.0]
  def change
    create_table :users do |t|
      t.string :first_name
      t.string :last_name
      t.string :username
      t.timestamps
    end
  end
end

class CreateIngredients < ActiveRecord::Migration[6.0]
  def change
    create_table :ingredients do |t|
      t.string :name
      t.integer :calories
      t.timestamps
    end
  end
end

class CreateUserIngredients < ActiveRecord::Migration[6.0]
  def change
    create_table :user_ingredients do |t|
      t.integer :ingredient_id
      t.integer :user_id
      t.timestamps
    end
  end
end

Once we’ve got our migrations up and running, then we wire up our relationships in the appropriate model files:

class User < ApplicationRecord
  has_many :user_ingredients
  has_many :ingredients, through: :user_ingredients
  # ...
end

class Ingredient < ApplicationRecord
  has_many :user_ingredients
  has_many :users, through: :user_ingredients
  # ...
end

class UserIngredient < ApplicationRecord
  belongs_to :user
  belongs_to :ingredient
  # ...
end

UGH! That’s so boring and takes so much time! All we really care about is the users and the ingredients, right? If the join table is just tracking the relationship between users and ingredients, isn’t there some way we could get rid of all the busy work in setting it up? Well, as luck would have it, Rails provides that very thing, in the form of the has_and_belongs_to_many association!

Oh hai.

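Life is so much easier now! We still need a join table in the database, but it takes a one-line migration and no model of its own…

class CreateUsers < ActiveRecord::Migration[6.0]
  def change
    create_table :users do |t|
      t.string :first_name
      t.string :last_name
      t.string :username
      t.timestamps
    end
  end
end

class CreateIngredients < ActiveRecord::Migration[6.0]
  def change
    create_table :ingredients do |t|
      t.string :name
      t.integer :calories
      t.timestamps
    end
  end
end

# Creates the ingredients_users table: two foreign keys, no id, no timestamps.
class CreateJoinTableIngredientsUsers < ActiveRecord::Migration[6.0]
  def change
    create_join_table :ingredients, :users
  end
end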

… and only need to worry about two models to update, with even less code up top!

class User < ApplicationRecord
  has_and_belongs_to_many :ingredients
  # ...
end

class Ingredient < ApplicationRecord
  has_and_belongs_to_many :users
  # ...
end

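That’s soooooo much cleaner, and easier for us as programmers to boot! Rails, being the smart cookie that it is, sees that has_and_belongs_to_many declaration and knows to use a join table named ingredients_users (the two table names in alphabetical order) behind the scenes, linking users and ingredients every time we associate any of their object instances. The table itself still has to exist in the database, typically via a create_join_table migration, but we never write a model class for it.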
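One convention worth knowing here: has_and_belongs_to_many looks for its join table under a default name built from the two table names in lexical order. The snippet below is a plain-Ruby sketch of that basic rule only. default_join_table_name is a hypothetical helper, not a Rails method, and it ignores the special cases Rails handles (such as table names that share a common prefix).

```ruby
# Hypothetical helper (not part of Rails): mimics only the basic rule that the
# default join table name is the two table names, sorted, joined with "_".
def default_join_table_name(table_a, table_b)
  [table_a.to_s, table_b.to_s].sort.join("_")
end

# The argument order does not matter; sorting decides which name comes first.
puts default_join_table_name(:users, :ingredients)
puts default_join_table_name(:ingredients, :users)
```

If the default doesn’t suit, the association accepts a join_table: option to name the table explicitly.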

But When Is a Join Table More Than a Join Table?

Lashing yourself to the mast

Our recipe app is going great! Our users can add ingredients to their kitchen and properly keep track of their inventory. Let’s say we’ve even kept working and fleshed out our models by associating ingredients with individual recipes, giving recipes cuisine style tags and a rating system, letting users sort recommended recipes based on ingredients they already have, and so on. Things are looking good, and now it’s time to implement our next feature! We want users to be able to mark certain ingredients as restricted, so any recipes using those ingredients get filtered out for them. This is a relationship between…

…hmmmm… this is a relationship between Users and Ingredients. But we’re already using the join table describing that relationship to keep track of a user’s kitchen items...? Okay, so we’d rather not set up a second join table between the same two models, because then a user’s relationship to an ingredient would live in two places instead of a single source of truth. Why don’t we just add a column to the user_ingredients table that keeps track of whether a user’s relationship to a given ingredient is “in kitchen” or “restricted”?

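Oh... Right. The user_ingredients table and its corresponding model class don’t exist. All we have is a bare join table holding two foreign keys, with no id and no class of its own, so there’s no model to hang that extra attribute on. Because we got rid of it. Because we used has_and_belongs_to_many.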

See why we shouldn’t do this?!? If we had just stuck with good old has_many :through, we wouldn’t be in this pickle! We could just fire up a new migration to add a column to the join table and be on our merry way! Now we have to rebuild the join table as a proper model (or roll our migrations aaaaalllll the way back) and set up the model relationships from scratch.

Because, see, sometimes a join table isn’t just a join table. Sometimes — in fact, quite often — we want to keep track of extra data in that join model. Users might have different types of relationships with different ingredients. Movie tickets link customers to film showings, but also hold information like time, price, and location. Students have many classes and vice versa, but that relationship also has its own attributes like assignments and grades. Our painter-patron relationship from the top of the article also has attributes like what year the patronage began, which of an artist’s works were patronized by whom, and so forth. The examples are endless.
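Here is a plain-Ruby sketch of what that extra data buys us in the recipe app. It is not ActiveRecord code: the Struct names, the status values (:in_kitchen and :restricted), the helper methods, and the sample rows are all assumptions made up for illustration.

```ruby
# A join row that carries its own attribute, not just two foreign keys.
UserIngredient = Struct.new(:user_id, :ingredient_id, :status)
Recipe         = Struct.new(:name, :ingredient_ids)

USER_INGREDIENTS = [
  UserIngredient.new(1, 10, :in_kitchen),
  UserIngredient.new(1, 11, :restricted),
  UserIngredient.new(2, 10, :restricted)
].freeze

RECIPES = [
  Recipe.new("Recipe A", [10]),
  Recipe.new("Recipe B", [10, 11])
].freeze

# Ingredient ids a user holds with a given status.
def ingredient_ids_for(user_id, status)
  USER_INGREDIENTS
    .select { |row| row.user_id == user_id && row.status == status }
    .map(&:ingredient_id)
end

# Drop every recipe that uses at least one of the user's restricted ingredients.
def allowed_recipes(recipes, user_id)
  restricted = ingredient_ids_for(user_id, :restricted)
  recipes.reject { |recipe| (recipe.ingredient_ids & restricted).any? }
end

puts allowed_recipes(RECIPES, 1).map(&:name).inspect
```

With has_many :through, the same status would live as a column on the user_ingredients table and be filtered through the UserIngredient model. A bare has_and_belongs_to_many join table gives us no model to attach it to.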

It’s also hard to know when a join table will stop being a join table and become a model in its own right. When we’re first starting a project and aiming only towards its skateboard form, it might seem like it makes sense to set up a join table the easy way. But as our projects grow, evolve, and add layers of complexity, we might realize our simple join table actually needs to be a full-fledged model to accomplish our goals.

There’s a tongue-in-cheek saying in software engineering that a programmer’s greatest virtue is their laziness, but there are even older proverbs to “look before you leap” and that “a stitch in time saves nine.” The saying about programmer laziness is really about the virtue of being creative in finding efficient solutions that accomplish a great deal without requiring intensive work. Rails’ implementation of has_and_belongs_to_many is an example of a solution that seems quick and easy, but will actually lead to more, harder work most of the time. That’s not the good kind of programmer laziness!

Just like Odysseus heard the alluring call of the sirens but knew well enough to lash himself to the mast lest he dash his ship against the rocks, we as programmers should know well enough to resist the tempting call of shortcuts and easier paths until we’re one hundred percent sure that they suit our situation and won’t end up creating more headaches in the long run. Go forth and code — carefully!

Maybe for my next blog post I can find a Rails metaphor for Scylla and Charybdis.

Pete Hanner

Former paralegal gladly opting for programming instead of law school. Engaged in a years-long, steady migration northward.