The Sweet Spot
On software, engineering leadership, and anything shiny.

Pitfalls to avoid when moving to async systems

I recently published a post on the Carbon Five blog titled “Evented Rails: Decoupling complex domains in Rails with Domain Events” that collects some of my thoughts on moving a Rails app to Domain Events, leveraging the power of Sidekiq (or your job runner of choice) to send async messages between different domains of your app.

This approach always seems nice from the outset, but can hide some painful complexities if you go too far down the rabbit hole. Here is a repost of the latter half of that article, which is worth repeating:

Big win[s of the async model]: speed & scalability

By splitting out domain logic into cohesive units, we’ve designed our systems to farm out their workloads to a larger, scalable pool of workers. Imagine a web request that takes 500ms to return, where 150ms of that time is spent on a round trip to a different service. By decoupling that work from the main request thread and moving it to a background job, we cut the response time to roughly 350ms for our end user, and studies have shown that page speed performance equals money.

Additionally, making our application calls asynchronous allows us to scale the amount of processing power we allocate to our system. We now have the ability to horizontally scale workers according to the type of job, or the domain they are working from. This may result in cost and efficiency savings as we match processing power to the workload at hand.

Big challenge: dealing with asynchronous data flows

Once things go async, we have a fundamentally different data design. For example, say you implemented an HTTP API endpoint that performed some action in the system synchronously, and you have now farmed out the effects of that action to background processes through domain events. While this is great for response times, you can no longer guarantee to the caller that the desired side effect has been performed by the time the server responds.

Asynchronous polling

One option is to implement the Polling pattern. The API returns a request identifier to the caller on the first call, with which the caller can then query the API for the result status. If the result is not ready, the API service responds with a Nack (negative Ack) message, implying that the result data has not arrived yet. As soon as the result is ready, the API returns it.
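A minimal in-memory sketch of this polling flow follows. The class name `PollingApi`, its methods, and the `:pending`/`:done` statuses are hypothetical names chosen for illustration; a real service would back this with a database or job queue rather than a Hash.

```ruby
require 'securerandom'

# Hypothetical in-memory stand-in for an HTTP API that supports polling.
class PollingApi
  def initialize
    @results = {} # request_id => result (nil while still processing)
  end

  # First call: accept the work and hand back an identifier right away.
  # A real implementation would enqueue the payload for a background worker.
  def submit(_payload)
    request_id = SecureRandom.uuid
    @results[request_id] = nil
    request_id
  end

  # Called by the background worker once the side effect has finished.
  def complete(request_id, result)
    @results[request_id] = result
  end

  # Later calls: a "nack" while the result is pending, the result afterwards.
  def status(request_id)
    return { status: :unknown } unless @results.key?(request_id)

    result = @results[request_id]
    result.nil? ? { status: :pending } : { status: :done, result: result }
  end
end

api = PollingApi.new
id = api.submit({ ride: 'delorean' })
first_poll = api.status(id)          # still pending
api.complete(id, 'driver assigned')  # the worker finishes
second_poll = api.status(id)         # result is available
```

The caller decides how often to poll; the trade-off is extra requests in exchange for keeping a plain HTTP request/response interface.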

Pub/Sub all the way down

Another option is to embrace the asynchronous nature of the system wholly and transition the APIs to event-driven, message-based systems instead. In this paradigm, we would introduce an external message broker such as RabbitMQ to facilitate messages within our systems. Instead of providing an HTTP endpoint to perform an action, the API service could subscribe to a domain event from the calling system, perform its side effect, then fire off its own domain event, to which the calling system would subscribe. The advantage of this approach is that it makes more efficient use of the network (reducing chattiness), but we trade off the advantages of using HTTP (the ubiquity of the protocol, performance enhancements like layered caching).

Browser-based clients can also get in on the asynchronous fun with the use of WebSockets to subscribe to server events. Instead of having a browser call an HTTP API, the browser could simply fire a WebSocket event, which the service would process asynchronously (potentially also proxying the message downstream to other APIs with messages), responding via a WebSocket message when the data is done processing.

Big challenge: data consistency

When we choose an asynchronous evented approach, we now have to consider how to model asynchronous transactions. Imagine that one domain process charges a user’s credit card with a third party payment processor and another domain process is responsible for updating it in your database. There are now two processes updating two data stores. A central tenet in distributed systems design is to anticipate and expect failure. Let’s imagine any of the following scenarios happens:

  1. An Amazon AWS partial outage takes down one of your services but not the other.
  2. One of your services becomes backed up due to high traffic, and no longer can service new requests in a timely manner.
  3. A new deployment has introduced a data bug in a downstream API that your teams are rushing to fix, but that will require manual reconciliation with the data in the upstream system.

How will you build your domain and data models to account for failures in each processing step? What would happen if you have one operation occur in one domain that depends on data that has not yet appeared in another part of the system? Can you design your data models (and database schema) to support independent updates without any dependencies? How will you handle the case when one domain action fails and the other completes?

First approach: avoid it by choosing noncritical paths to decouple, first

If you are implementing an asynchronous, evented paradigm for the first time, I suggest you carefully begin decoupling boundaries with domain events only for events that lie outside the critical business domain path. Begin with some noncritical aspect of the system — for example, you may have a third party analytics tracking service that you must publish certain business events to. That would be a nice candidate to decouple from the main request process and move to an async path.
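As a rough illustration of moving a noncritical call off the request path, here is a sketch that uses Ruby's built-in `Queue` and a single worker thread as a stand-in for a job runner such as Sidekiq. The `handle_request` method and the event shape are hypothetical, and an in-memory queue is not durable, so treat this only as a picture of the control flow.

```ruby
# Hypothetical sketch: hand a noncritical analytics call to a background worker.
events = Queue.new
delivered = []

# Stand-in for a job runner: one thread draining the queue until told to stop.
worker = Thread.new do
  while (event = events.pop) != :shutdown
    delivered << event # a real worker would call the analytics service here
  end
end

# The request path only enqueues the event and returns immediately.
def handle_request(events)
  events << { name: 'ride_hailed', at: Time.now.to_i }
  'OK' # the response to the caller does not wait on analytics
end

response = handle_request(events)
events << :shutdown
worker.join
```

If the analytics call fails or is slow, the main request is unaffected, which is what makes this kind of work a low-risk first candidate.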

Second approach: enforce transactional consistency within the same process/domain boundary

Although we won’t discuss specifics in this article, if you must enforce transactional consistency in some part of your system (say, the charging of a credit card with the crediting of money to a user’s account) then I suggest that you perform those operations within the same bounded context and same process, leaning on transactional consistency guarantees provided by your database layer.

Third approach: embrace it with eventual consistency

Alternatively, you may be able to lean on “eventual consistency” semantics with your data. Maybe it’s less important that your data squares away with itself immediately, and more important that the data eventually lines up at some guaranteed point in time. This may be OK for some aspects of your data (e.g. notifications in a news feed) and may not be appropriate for others (e.g. a bank account balance).

You may need to fortify your system to ensure that data eventually becomes consistent. This may involve building out the following pieces of infrastructure.

  1. Messages need to be durable — make sure your job enqueuing system does not drop messages, or at least has a failure mode to re-process them when (not if!) your system fails.
  2. Your jobs should be designed to be idempotent, so they can be retried multiple times and result in the correct outcome.
  3. You should easily be able to recover from bad data scenarios. Should a service go down, it should be able to replay messages, logs, or the consumer should have a queue of retry-able messages it can send.
  4. Eventual consistency means that you may need an external process to verify consistency. You may be doing this sort of verification process in a data warehouse, or in a different software system that has a full view of all the data in your distributed system. Be sure that this sort of verification is able to reveal to you holes in the data, and provide actionable insights so you can fix them.
  5. You will need to add monitoring and logging to measure the failure modes of the system. When errors spike, or messages fail to send (events fail to fire), you need to be alerted. Once alerted, your logging must be good enough to be able to trace the source and the data that each request is firing.
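Points 1 and 2 above can be sketched roughly as follows: a job that records which events it has already applied, plus a bounded retry helper. `CreditAccountJob`, `with_retries`, and the event IDs are hypothetical; in a real system the "processed" set would live in durable storage and be updated in the same transaction as the balance.

```ruby
require 'set'

# Hypothetical job that credits an account at most once per event,
# no matter how many times the job runner delivers or retries it.
class CreditAccountJob
  attr_reader :balances

  def initialize
    @processed = Set.new     # stand-in for a durable "processed events" table
    @balances = Hash.new(0)
  end

  def perform(event_id, account_id, amount)
    return :skipped if @processed.include?(event_id) # already applied

    @balances[account_id] += amount
    @processed << event_id
    :applied
  end
end

# Retry helper with a bounded number of attempts; re-raises on the last failure.
def with_retries(max_attempts)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    retry if attempts < max_attempts
    raise
  end
end

job = CreditAccountJob.new
first = job.perform('evt-1', 'acct-42', 100)
again = job.perform('evt-1', 'acct-42', 100) # duplicate delivery is a no-op

# Simulate two transient failures before the job finally runs.
flaky_result = with_retries(3) do |attempt|
  raise 'transient failure' if attempt < 3

  job.perform('evt-2', 'acct-42', 50)
end
```

Because `perform` is idempotent, the retry helper can be as aggressive as it likes without double-crediting the account.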

The scale of this subject is large and is under active research in the field of computer science. A good book to pick up that discusses this topic is Service-Oriented Design with Ruby on Rails. The popular Enterprise Integration Patterns book also has a great topic on consistency (and is accompanied by a very helpful online guide as well).

Rails, meet Phoenix: Migrating to Phoenix with Rails session sharing

You’ve resolved to build your company’s Next Big Thing in Phoenix and Elixir. That’s great! You’re facing a problem though - all user authentication and access concerns are performed on your Rails system, and the work to reimplement this in Phoenix is significant.

Fortunately for you, there is a great Phoenix plug to share session data between Rails and Phoenix. If you pull this off, you’ll be able to build your new API on your Phoenix app, all while letting Rails handle user authentication and session management.

Before we begin

In this scenario, you want to build out a new API in Phoenix that is consumed by your frontend single-page application, whose sessions are hosted on Rails. We’ll call the Rails app rails_app and your new Phoenix app phoenix_app.

Additionally, each app will use a different subdomain. The Rails app will be deployed at the www.myapp.com subdomain. The Phoenix app will be deployed at the api.myapp.com subdomain.

We are going to take Chris Constantin’s excellent PlugRailsCookieSessionStore plug and integrate it into our Phoenix project. Both apps will be configured with identical cookie domains, encryption salts, signing salts, and security tokens.

In the examples that follow, I’ll be using the latest versions of each framework at the time of writing, Rails 4.2 and Phoenix 1.2.

Our session data is stored on the client in a secure, encrypted, validated cookie. We won’t cover the basics of cookies here, but you can read more about them here.

Our approach will only work if your current Rails system utilizes cookie-based sessions. We will not cover the use case with a database-backed session store in SQL, Redis, or Memcache.

Step 1: Configure Rails accordingly

Let’s set up your Rails app to use a JSON cookie storage format:

# config/initializers/session_store.rb

# Use cookie session storage in JSON format. Here, we scope the cookie to the root domain.
Rails.application.config.session_store :cookie_store, key: '_rails_app_session', domain: ".#{ENV['DOMAIN']}"
Rails.application.config.action_dispatch.cookies_serializer = :json

# These salts are optional, but it doesn't hurt to explicitly configure them the same between the two apps.
Rails.application.config.action_dispatch.encrypted_cookie_salt = ENV['SESSION_ENCRYPTED_COOKIE_SALT']
Rails.application.config.action_dispatch.encrypted_signed_cookie_salt = ENV['SESSION_ENCRYPTED_SIGNED_COOKIE_SALT']

Your app may not be configured with a SESSION_ENCRYPTED_COOKIE_SALT and SESSION_ENCRYPTED_SIGNED_COOKIE_SALT. You may generate a pair with any random values.
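One way to generate such a pair is with Ruby's `SecureRandom`. The 32-byte length below is an arbitrary choice for this sketch, not a requirement of Rails or of the plug.

```ruby
require 'securerandom'

# Generate two independent random salts (64 hex characters each).
encrypted_cookie_salt        = SecureRandom.hex(32)
encrypted_signed_cookie_salt = SecureRandom.hex(32)

# Print them in a form that can be pasted into an environment file.
puts "SESSION_ENCRYPTED_COOKIE_SALT=#{encrypted_cookie_salt}"
puts "SESSION_ENCRYPTED_SIGNED_COOKIE_SALT=#{encrypted_signed_cookie_salt}"
```

Generate the values once and share them between both apps; generating them separately per app would defeat the purpose.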

Some speculate that Rails does not require the two salts by default because the SECRET_KEY_BASE is long enough to not require a salt. In our example, we choose to supply them anyway to be explicit.

Another important value to note here is that we have chosen a key for our session cookie - _rails_app_session. This value will be the shared cookie key for both apps.

Step 2: Configure the plug for Phoenix

Turning our attention to our Phoenix app, in the mix.exs file, add the library dependency:

# mix.exs
defmodule PhoenixApp.Mixfile do
  use Mix.Project

  # snip

  defp deps do
    [
      # snip
      {:plug_rails_cookie_session_store, "~> 0.1"},
      # snip
    ]
  end
end

Then run mix deps.get to fetch the new library.

Now in your lib/phoenix_app/endpoint.ex file, remove the configuration for the existing session store and add the configuration for the Rails session store.

# lib/phoenix_app/endpoint.ex
defmodule PhoenixApp.Endpoint do
  plug Plug.Session,
    # Remove the original cookie store that comes with Phoenix, out of the box.
    # store: :cookie,
    # key: "_phoenix_app_key",
    # signing_salt: "M8emDP0h"
    store: PlugRailsCookieSessionStore,
    # Decide on a shared key for your cookie. Oftentimes, this should
    # mirror your Rails app session key
    key: "_rails_app_session",
    secure: true,
    encrypt: true,
    # Specifies the matching rules on the hostname that this cookie will be valid for
    domain: ".#{System.get_env("DOMAIN")}",
    signing_salt: System.get_env("SESSION_ENCRYPTED_SIGNED_COOKIE_SALT"),
    encryption_salt: System.get_env("SESSION_ENCRYPTED_COOKIE_SALT"),
    key_iterations: 1000,
    key_length: 64,
    key_digest: :sha,
    # Specify a JSON serializer to use on the session
    serializer: Poison
end

We set a DOMAIN environment variable with the value myapp.com. The goal is for these two apps to be able to be deployed at any subdomain that ends in myapp.com, and still be able to share the cookie.

The secure flag marks the cookie as secure, so the browser only sends it over HTTPS connections. It is highly recommended for your site; if you haven’t upgraded to HTTPS, you should do so now!

Our cookies are signed such that their origins are guaranteed to have been computed from our app(s). This is done for free with Rails (and Phoenix’s) session libraries. The signature is derived from the secret_key_base and signing_salt.

The encrypt flag encrypts the contents of the cookie’s value with an encryption key derived from secret_key_base and encryption_salt. This should always be set to true.

key_iterations, key_length and key_digest are configurations that dictate how the signing and encryption keys are derived. These are configured to match Rails’ defaults (see also: defaults). Unless your Rails app has custom configurations for these values, you should leave them be.
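To see why these values have to match on both sides, here is a sketch of the key derivation using Ruby's OpenSSL bindings. To my understanding this mirrors what Rails 4.2 does internally (PBKDF2 with HMAC-SHA1), but treat it as an illustration rather than the exact Rails code path. The secret and salt strings are placeholders and `derive_key` is a hypothetical helper.

```ruby
require 'openssl'

# Placeholder values for illustration only; never hard-code real secrets.
secret_key_base = 'example-secret-key-base'
encryption_salt = 'example encryption salt'
signing_salt    = 'example signing salt'

# Mirrors the options above: 1000 iterations, 64-byte keys, SHA-1 digest.
def derive_key(secret, salt, iterations: 1000, length: 64)
  OpenSSL::PKCS5.pbkdf2_hmac_sha1(secret, salt, iterations, length)
end

encryption_key = derive_key(secret_key_base, encryption_salt)
signing_key    = derive_key(secret_key_base, signing_salt)
```

The derivation is deterministic: the same secret, salt, iteration count, and length always yield the same key. If any one of them differs between Rails and Phoenix, the two apps derive different keys and cannot read each other's cookies.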

Step 3: Configure both apps to read from the new environment variables

Be sure both apps are configured with identical values for DOMAIN, SESSION_ENCRYPTED_COOKIE_SALT and SESSION_ENCRYPTED_SIGNED_COOKIE_SALT in each environment, development and production alike. They also need the same SECRET_KEY_BASE, since the signing and encryption keys are derived from it.

Step 4: Change Phoenix controllers to verify sessions based on session data.

Now when the Phoenix app receives incoming requests, it can simply look up user session data in the session cookie to determine whether the user is logged in, and who that user is.

In this example, our Rails app implements user auth with Devise and Warden. We know that Warden stores the user ID and a segment of the password hash in the warden.user.user.key session variable.

Here’s what the raw session data looks like when the PlugRailsCookieSessionStore extracts it from the cookie:

%{"_csrf_token" => "ELeSt4MBUINKi0STEBpslw3UevGZuVLUx5zGVP5NlQU=",
  "session_id" => "17ec9b696fe76ba4a777d625e57f3521",
  "warden.user.user.key" => [[2], "$2a$10$R/3NKl9KQViQxY8eoMCIp."]}

A controller can then pull the user ID out of that session data:

defmodule PhoenixApp.SomeApiResourceController do
  use PhoenixApp.Web, :controller

  def index(conn, _params) do
    {:ok, user_id} = load_user(conn)

    conn
    |> assign(:user_id, user_id)
    |> render("index.html")
  end

  plug :verify_session

  # If we've found a user, then allow the request to continue.
  # Otherwise, halt the request and return a 401
  defp verify_session(conn, _) do
    case load_user(conn) do
      {:ok, _user_id} -> conn
      {:error, _} -> conn |> send_resp(401, "Unauthorized") |> halt
    end
  end

  defp load_user(conn) do
    # => The Warden user storage scheme: [user_id, password_hash_truncated]
    # [[1], "$2a$10$vnx35UTTJQURfqbM6srv3e"]
    warden_key = conn |> get_session("warden.user.user.key")

    case warden_key do
      [[user_id], _] -> {:ok, user_id}
      _ -> {:error, :not_found}
    end
  end
end

A very naive plug implementation simply renders a 401 if the session key is not found in the session, otherwise it allows the request through.

Step 5: Move session concerns into their own module

Let’s move session parsing out of the controller into its own Session module. Additionally, we include two helpers, current_user/1 and logged_in?/1.

# web/models/session.ex
defmodule PhoenixApp.Session do
  use PhoenixApp.Web, :controller
  def current_user(conn) do
    # Our app's concept of a User is merely whatever is stored in the
    # Session key. In the future, we could then use this as the delegation
    # point to fetch more details about the user from a backend store.
    case load_user(conn) do
      {:ok, user_id} -> user_id
      {:error, :not_found} -> nil
    end
  end

  def logged_in?(conn) do
    !!current_user(conn)
  end

  def load_user(conn) do
    # => The Warden user storage scheme: [user_id, password_hash_truncated]
    # [[1], "$2a$10$vnx35UTTJQURfqbM6srv3e"]
    warden_key = conn |> get_session("warden.user.user.key")

    case warden_key do
      [[user_id], _] -> {:ok, user_id}
      _ -> {:error, :not_found}
    end
  end
end

This leaves the controller looking skinnier, implementing only the Plug. Extracted methods are delegated to the new Session module.

defmodule PhoenixApp.SomeApiResourceController do
  use PhoenixApp.Web, :controller
  alias PhoenixApp.Session

  def index(conn, _params) do
    IO.inspect conn.private.plug_session
    user_id = Session.current_user(conn)

    conn
    |> assign(:user_id, user_id)
    |> render("index.html")
  end

  plug :verify_session

  # Future refinements could extract this into its own Plug file.
  defp verify_session(conn, _) do
    case Session.logged_in?(conn) do
      false -> conn |> send_resp(401, "Unauthorized") |> halt
      _ -> conn
    end
  end
end

Finally, we implement some nice helpers for your APIs:

# web/web.ex

def view do
  quote do
    # snip
    import PhoenixApp.Session
  end
end

This gives you the ability to call logged_in?(@conn) and current_user(@conn) from within your views, should you desire to.

Step 6: Fetching additional information from the backend

Let’s enhance our Session module with the capability to fetch additional information from another resource.

In this case, we’ll model a call to an external User API to fetch extended data about the User, potentially with some sensitive information (that’s why we didn’t want to serialize it into the session).

# web/models/user.ex
defmodule PhoenixApp.User do
  # Gets some user identity information like email, avatar image.
  # For this example, we'll use a random user generator.
  #
  # This example hits an API, but this could just as easily be something that hits
  # the database, or Redis, or some cache.
  def fetch(user_id) do
    %{ body: body } = HTTPotion.get("https://randomuser.me/api?seed=#{user_id}")
    [result | _ ] = body |> Poison.decode! |> Map.get("results")
    result
  end
end

Now our Session can be extended to return the proper User, which may provide more utility to us as we implement our Phoenix feature.

defmodule PhoenixApp.Session do
  use PhoenixApp.Web, :controller
  alias PhoenixApp.User

  def current_user(conn) do
    case load_user(conn) do
      # Changed current_user/1 to now return a User or a nil.
      {:ok, user_id} -> user_id |> User.fetch
      {:error, :not_found} -> nil
    end
  end

  # snip
end

Here are the two apps in action:

[Figure: Flipping between the two apps, logged in and out.]

Heroku deployment gotchas

If you are deploying this to Heroku with the popular Heroku Elixir buildpack, be aware that environment variables required at build time must be listed in the elixir_buildpack.config file in your repository. That includes the new environment variables outlined here.

# elixir_buildpack.config
config_vars_to_export=(SECRET_KEY_BASE SESSION_ENCRYPTED_COOKIE_SALT SESSION_ENCRYPTED_SIGNED_COOKIE_SALT DOMAIN)

Caveats and considerations

CSRF incompatibilities

At the time of this writing, Phoenix and Rails overwrite each others’ session CSRF tokens with incompatible token schemes. This means that you are not able to make remote POST or PUT requests across the apps with CSRF protection turned on. Our current approach will work best with a read-only API, at the moment.

Cookies themselves have their own strengths and drawbacks. We should note that you should be judicious about the amount of data you store in a session (hint: only the bare minimum, and nothing sensitive).

The OWASP guidelines also provide some general security practices around cookie session storage.

Moving beyond session sharing

Even though this scheme may work in the short run, coupling our apps at this level in the long run will result in headaches as the apps are coupled to intricate session implementation details. If, in the long run, you wanted to continue scaling out your Phoenix app ecosystem, you may want to look into the following authentication patterns, both of which move your system toward a microservices architecture.

1) Develop an API gateway whose purpose is to be the browser’s buffer to your internal service architecture. This one gateway is responsible for identity access and control, decrypting session data and proxying requests to an umbrella of internal services (which may be Rails or Phoenix). Internal services may receive user identities in unencrypted form.

2) Consider implementing JWT (JSON Web Token) authentication across your apps, in which all session and authorization claims are stored in the token itself, and encrypted in the client and server. This scheme may still rely on cookies (you may store the token in a cookie, or pass it around in an HTTP header). The benefit of this scheme is the ability for your app(s) to manage identity and authentication claims on their own without having to verify against a third party. The drawback is the difficulty of revoking or expiring sessions.

Each of these approaches is not without overhead and complexity; be sure to do your homework before you proceed.
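To give a feel for the token approach, here is a small sketch of an HS256-style signed token built only from Ruby's standard library. All helper names (`encode_token`, `decode_token`, and so on) are hypothetical, and the sketch signs the claims rather than encrypting them, so anyone holding the token can read its payload. It also skips checks a real implementation needs (for example, validating the header's algorithm), so in practice you would reach for a maintained JWT library.

```ruby
require 'openssl'
require 'json'

# URL-safe Base64 without padding, the encoding JWTs use.
def b64url(data)
  [data].pack('m0').tr('+/', '-_').delete('=')
end

def b64url_decode(str)
  padded = str + '=' * ((4 - str.length % 4) % 4)
  padded.tr('-_', '+/').unpack1('m0')
end

def sign(data, secret)
  OpenSSL::HMAC.digest(OpenSSL::Digest.new('SHA256'), secret, data)
end

# Build a compact token of the form header.payload.signature
def encode_token(claims, secret)
  header = b64url(JSON.generate({ alg: 'HS256', typ: 'JWT' }))
  payload = b64url(JSON.generate(claims))
  signing_input = "#{header}.#{payload}"
  "#{signing_input}.#{b64url(sign(signing_input, secret))}"
end

# Constant-time comparison, to avoid leaking signature bytes via timing.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize

  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Returns the claims Hash, or nil if the signature is bad or the token expired.
def decode_token(token, secret, now: Time.now.to_i)
  header, payload, signature = token.split('.')
  return nil unless header && payload && signature

  expected = b64url(sign("#{header}.#{payload}", secret))
  return nil unless secure_compare(expected, signature)

  claims = JSON.parse(b64url_decode(payload))
  return nil if claims['exp'] && claims['exp'] <= now

  claims
end

secret = 'shared-secret-for-illustration'
token = encode_token({ sub: 2, exp: Time.now.to_i + 3600 }, secret)
claims = decode_token(token, secret)
```

Any app that holds the shared secret can verify the token without calling another service. The flip side is the revocation drawback mentioned above: once issued, the token stays valid until its `exp` passes.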

Conclusion

That’s it! I hope I’ve illustrated a quick and easy way to get a working Phoenix app sharing sessions with Rails app(s), should you decide to prototype one in your existing system. I’ve also pushed up a sample app if you want to cross-reference the code. Good luck!

Evented Rails: Decoupling domains in Rails with Wisper pub/sub events

One common pattern in Domain-Driven Design is the use of publish/subscribe messaging to communicate between domains. When Domain Events are created from within a domain, other domains are able to subscribe to these events and take action within their own domains, respectively.

This is not a common pattern in Rails, particularly because of Ruby’s lack of language support for functional programming paradigms that exist in other languages. However, with a nifty framework and the help of Sidekiq, we can get just a little bit closer.

What is a Domain Event?

A domain event is a recorded fact in the system that tracks an action the system performed, along with the factors/properties that led to its creation.
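As a rough sketch, a domain event can be modeled as a small immutable value object. The `DriverHailed` name and its fields are hypothetical and are not part of Wisper, which only needs an event name and arguments.

```ruby
require 'securerandom'

# Hypothetical event: a record of something that happened, plus its context.
DriverHailed = Struct.new(:event_id, :occurred_at, :passenger_id, :driver_id,
                          keyword_init: true) do
  # Build a frozen event so that it cannot be altered after the fact.
  def self.record(passenger_id:, driver_id:)
    new(event_id: SecureRandom.uuid, occurred_at: Time.now.utc,
        passenger_id: passenger_id, driver_id: driver_id).freeze
  end
end

event = DriverHailed.record(passenger_id: 7, driver_id: 88)
```

Naming the event in the past tense and freezing it reflects that it describes something that has already happened.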

In the following examples, we are going to use the Wisper gem to implement domain events in our sample Delorean app.

Imagine that we are writing an endpoint that our users will hit, indicating that they want to hail a time-traveling cab. Now the logic to hail a cab is rather complicated and lives in an entirely different area of the codebase, perhaps even in another application. How should we call the other code and ensure that our code is cleanly decoupled?

With our Domain-Driven powers, we’ve been smart enough to segregate our code into different subdomains and bounded contexts, denoted by these two Ruby modules Ridesharing and DriverRouting.

Example 1: In-process pub-sub event modeling, with a service object.

A simple way to use Wisper is to implement your service objects as publishers, calling the service from the controller.

module Ridesharing
  class RidesController < ApplicationController
    def post
      # Hail a time-traveling Delorean:
      command = HailDelorean.new
      command.on('hailed') { |driver|
        render text: "Hailed you a cab: #{driver} is arriving!"
      }
      .on('could_not_hail') {
        render text: "Sorry, no dice."
      }
      command.hail!(current_user)
    end
  end
end

Note that the HailDelorean class has powers of event subscriptions now. Our calling code does not have to concern itself with the implementation details of the HailDelorean service - it merely needs to register handlers for the two possible outcomes, hailed and could_not_hail. Here’s how the service class is implemented:

module Ridesharing
  class HailDelorean
    include Wisper::Publisher

    def hail!(user)
      # broadcast() is a Wisper method to fire an event
      driver = find_driver(user)
      if driver
        broadcast('hailed', driver)
      else
        broadcast('could_not_hail')
      end
    end

    def find_driver(user)
      # Here lies slow, complex domain logic
      DriverRouting::FindDriver.new(user)
    end
  end
end
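To make the mechanics concrete, here is a stripped-down, pure-Ruby stand-in for a publisher mixin. This is not Wisper's implementation, only a sketch of the same idea; `TinyPublisher`, `HailCab`, and `AuditLog` are hypothetical names.

```ruby
# A toy publisher mixin: block handlers via #on, object listeners via #subscribe.
module TinyPublisher
  def on(event, &block)
    handlers[event.to_s] << block
    self # returning self allows chaining: publisher.on(...).on(...)
  end

  def subscribe(listener)
    listeners << listener
    self
  end

  private

  def broadcast(event, *args)
    handlers[event.to_s].each { |handler| handler.call(*args) }
    # Listeners receive the event as a method call of the same name.
    listeners.each do |listener|
      listener.public_send(event, *args) if listener.respond_to?(event)
    end
  end

  def handlers
    @handlers ||= Hash.new { |hash, key| hash[key] = [] }
  end

  def listeners
    @listeners ||= []
  end
end

class HailCab
  include TinyPublisher

  def hail!(driver)
    driver ? broadcast('hailed', driver) : broadcast('could_not_hail')
  end
end

# A listener only implements the events it cares about.
class AuditLog
  def self.entries
    @entries ||= []
  end

  def self.hailed(driver)
    entries << "hailed #{driver}"
  end
end

messages = []
command = HailCab.new
command.on('hailed') { |driver| messages << "#{driver} is arriving!" }
       .on('could_not_hail') { messages << 'Sorry, no dice.' }
command.subscribe(AuditLog)
command.hail!('Doc Brown')
```

The publisher never references its handlers by name; it only announces what happened, which is what keeps the calling code and the side effects decoupled.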

Handling side effects in subscriber classes

Other side-effects can subscribe to the HailDelorean events. Let’s say we want to fire an event to Segment analytics tracking. I can create a plain Ruby object that simply needs to implement a method with the same name as the event.

Let’s implement hailed and could_not_hail methods on this subscriber class:

class TrackSegmentAnalytics
  def self.hailed(driver)
    # fire analytics event to Segment
  end

  def self.could_not_hail
    # fire analytics event to Segment
  end
end

And we hook it up by subscribing it to the command handler:

module Ridesharing
  class RidesController < ApplicationController
    def post
      # snip
      command = HailDelorean.new

      # register the subscriber to the triggering action
      command.subscribe(TrackSegmentAnalytics)
      # snip
    end
  end
end

OK, that was a little awkward, doing all that wiring up in the controller. What if we did the wiring globally, within an app initializer?

# config/initializers/domain_event_subscriptions.rb
Wisper.subscribe(TrackSegmentAnalytics, scope: "Ridesharing::HailDelorean")

# alternate form:
Ridesharing::HailDelorean.subscribe(TrackSegmentAnalytics)

This registers a global subscriber for all future instances of HailDelorean.

Example 2: Asynchronous events with subscription handlers and Sidekiq

Here’s the real power of Wisper - we can decouple our application domain responsibilities by modeling effects as subscription objects and do them out-of-band of the primary web request thread.

Note that with the wisper-sidekiq gem, all subscriptions given with an async: true option flag will automatically execute in an external thread as a Sidekiq job. Let’s take advantage of that now.

module Ridesharing
  class RidesController < ApplicationController
    def post
      # Hail a time-traveling Delorean:
      HailDelorean.new.hail(current_user.id)
      render text: 'Hailing a cab, please wait for a response...'
    end
  end

  class HailDelorean
    include Wisper::Publisher

    # broadcast is an instance method, so we publish from an instance.
    def hail(passenger_id)
      broadcast(:hail, passenger_id)
    end
  end
end

module DriverRouting
  # Note that this class is both a subscriber and a publisher
  class FindDriver
    include Wisper::Publisher

    # Async listeners are subscribed as classes, so the event arrives as a
    # class method call. Delegate to an instance so broadcast is available.
    def self.hail(passenger_id)
      new.hail(passenger_id)
    end

    def hail(passenger_id)
      # Do slow, complex hairy routefinding/optimization/messaging behind the scenes:
      driver = find_driver_for(passenger_id)

      if driver
        broadcast('driver_found', passenger_id, driver.id)
      else
        broadcast('driver_not_found', passenger_id)
      end
    end
  end
end

Finally, we add handlers (subscribers) to these domain objects:

module Ridesharing
  class NotifyPassengerWithDriverStatus
    # Handler arguments mirror what FindDriver broadcasts.
    def self.driver_found(passenger_id, driver_id)
      # send them a text message :)
    end

    def self.driver_not_found(passenger_id)
      # send them a text message :(
    end
  end
end

Now let’s link it together with subscriptions:

# config/initializers/domain_event_subscriptions.rb
Ridesharing::HailDelorean.subscribe(DriverRouting::FindDriver, async: true)
DriverRouting::FindDriver.subscribe(Ridesharing::NotifyPassengerWithDriverStatus, async: true)
Wisper.subscribe(AnalyticsListener, scope: ["Ridesharing::NotifyPassengerWithDriverStatus", "DriverRouting::FindDriver"], async: true)

Now our messages between our domains are pulled out of the main request thread, and operate in an asynchronous fashion with Sidekiq as the runner.

Code in our domains is kept clean: note that there are no direct references to the other subdomains within each subdomain. Our app more cleanly segregates the responsibilities between each app, and heavy workloads are naturally balanced as they move to worker threads.

Caveats: Beware of overbuilding

If you are on a small app, you probably should go with approach #1. The weight of indirection can be a cognitive load on development, unless you truly need to build async code in #2. The overhead and conceptual complexities of the approach can only be justified with large codebases, or in apps where a domain-centric view (and segregation) of code is present.

Caveats: Event subscriptions can be a tangled mess

Note that the act of wiring can quickly fan out into a spidery mess of handlers - you could even further decouple your handlers by modeling a global event bus as a publisher, and having each domain tap into the bus’ events and figure out how to handle each event on its own.
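A global event bus along these lines might look like the following sketch. `EventBus` and the event names are hypothetical, and the bus here is synchronous and in-process; an asynchronous version would enqueue a job per handler instead of calling it directly.

```ruby
# Hypothetical global bus: publishers and handlers only share event names.
class EventBus
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(event_name, &handler)
    @subscribers[event_name] << handler
  end

  def publish(event_name, payload = {})
    @subscribers[event_name].each { |handler| handler.call(payload) }
  end
end

bus = EventBus.new
notifications = []
analytics = []

# Each domain decides for itself which events it cares about.
bus.subscribe('driver_found') do |payload|
  notifications << "Driver #{payload[:driver_id]} is on the way"
end
bus.subscribe('driver_found') do |payload|
  analytics << [:driver_found, payload[:passenger_id]]
end
bus.subscribe('driver_not_found') { |_payload| notifications << 'No drivers available' }

bus.publish('driver_found', { passenger_id: 7, driver_id: 88 })
```

With a bus, the wiring moves from a central list of publisher-to-subscriber pairs into each domain, at the cost of making it harder to see at a glance who reacts to what.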

Caveats: transactional consistency!

If you implement this asynchronously, you’ll have to think about how to deal with transactional consistency. Can you design your data models (and database schema) to support independent updates without any dependencies? How will you handle the case when one domain action fails and the other completes?

You may have to roll your own two-phase commit here, the specifics of which I won’t delve into. However, for most of our applications, we may want to skip the asynchronous and keep our events synchronous.

Domain-Driven Design & The Joy of Naming

I want to discuss a topic near and dear to my heart, and what I believe is at the crux of effective software design. It’s not a new functional language, it’s not a fancy new framework, a how-to guide to do microservices, nor a quantum leap in the field of machine learning.

It’s much simpler.

It’s about names.

In the beginning...

Names define us. They define concepts. They imbue a concept with shared understanding. They’re language concepts, but more than that, they’re units of meaning.

Software development is a fundamentally human endeavour. No amount of technical computing breakthroughs will change the fact that software development is still the arduous task of getting a team together full of humans from a kaleidoscope of different cultural and linguistic backgrounds - then throwing them together to build an arbitrarily complex product in a rapidly-shifting competitive landscape.

Not only that, the thing to build is chock-full of systems that interact with other systems of unbounded complexity. Additionally, once your software system is out in the wild, you need to make sure that it was the right thing to build. Is the product you built correctly tuned to your market? Is it generating sufficient revenue?

The landscape is littered with software projects that began ambitiously, but got lost in a towering mess of fragile code. It’s no wonder that developing reliable, successful software is more art than science.

Crossing our linguistic wires

Let’s rewind to a scene from a typical day in the life of your software development team. Think back to the last time you discussed a story with your product owner. How did it unfold?

Let’s imagine a scene at Delorean, the Uber for time travel, where you work. Your team is responsible for the software that processes payments for users hailing rides from your company’s time-traveling ridesharing service.

PO: Our next big project is to update our driver app to show rider locations on the timeline map.

You: And when do these riders show up on the timeline map?

PO: When the driver turns on the app and signals that she’s driving.

You: OK, so that means when the app boots up and the DriverStatus service receives a POST we’ll need to simultaneously fetch references from the HailingUser service based on time locality.

PO: Um… I guess so?

Or how about your last iteration planning meeting, where you discussed the intricacies of a specific story?

PO: In this story, we’re going to add a coupon box to the checkout flow.

You: [Thinking out loud] Hm… would that mean we add a /coupon route to the checkout API?

Teammate: Wait - I think we call them Discounts in the backend. And the checkout flow is technically part of the RideCommerce service.

You: Right - I mean let’s call the route /coupon but it’ll create a Discount object. And in this story, let’s just remember that the checkout API really refers to the RideCommerce service.

PO: I’ll add a note to the story.

The implementing engineer, of course, doesn’t read the note in the story (who has time to, anyways?). In the course of implementation, he gets tripped up in semantics and spends the better part of a half day re-implementing the Checkout flow as an entirely new service, before realizing his mistake in code review and backing out his changes.

Months later, a new colleague is tasked to fix the link in the checkout flow, but files an incomplete fix because she was not aware that Coupons mapped back to Discounts. The bug makes its way to production, where it lies dormant until a most inopportune time…

A better, Domain-Driven way

In Eric Evans’ book Domain-Driven Design, he describes the concept of a Ubiquitous Language - a shared, common vocabulary that the entire team shares when discussing software.

When we say the “entire team”, we mean the combined team of designers, developers, the product owner and any other domain experts that might be at hand.

Your product owner may be your domain expert (and typically is). However, you may have other domain experts such as:

  • Any team that builds reporting or analytics off of your software.
  • Upstream data providers.
  • Anybody further up the reporting chain whose purview includes the software you’re building, or its effects. Think: the Director of Finance, the COO, the head of Customer Support.
  • The users of your software.

Side note: in XP, each team has an “onsite customer” - this is your domain expert!

Developing a Ubiquitous Language with a Glossary

Try this: keep a living document of all the terminology your team uses. This Glossary is exactly what it sounds like - a list of terms and their definitions.

Delorean Team Glossary

  • Coupon: A discount applied to a BookingAmount. A coupon may take the form of a Fixed or a Percentage amount.
    • Fixed-type: A coupon that applies a fixed amount of money - e.g. a $30 USD discount.
    • Percentage-type: A coupon that applies a percentage savings off the total BookingAmount.
  • Driver: An employed driver who drives within the system, picking up passengers and driving Trips for payment.
  • Trip: An itinerary of passenger pick-up and drop-off locations and times.
  • Rider: The passenger that books the trip and is transported by the Driver.
  • Booking: A reservation for a Trip, as booked by the Rider.
  • BookingAmount: The monetary amount of the Trip, accounting for the trip cost, surge pricing, coupons and taxes.
  • Routing Engine: The software system that maps out the driving directions for a driver.
  • Payment: A record of how a user paid.
  • Charge: A financial transaction for a specific dollar amount, for a specific charge method to an institution.
  • Checkout: A workflow in which a Payment is made for a Booking.

From now on, use only the terms defined here in your stories. Be explicit about how you use your language!

I’ve been on many projects where the sloppy usage of a term from project inception led to the usage of that term in the code - codifying that messy, slippery term throughout the life of the project.

Which leads us to our next point:

Refactoring your team to use the right terms

Your Glossary is a living document. Keep it somewhere that is easy to update - a Google Doc or a wiki page. It should be visible for all to see - you should print it out and post it on the walls!

Meanwhile, in a planning meeting:

You: So when a user logs into the app and broadcasts that they’re ready to drive…

PO: You mean Driver. When a Driver logs in.

You: Right. Good catch.

It seems a little silly (after all, you both know only Drivers use the broadcast feature of the app), but the laser focus on using the right words means that your team is always on the same page when talking about things.

Later that afternoon, your teammate taps you on the shoulder:

Teammate: I’m about to implement the Coupon story. I suggest we rename the Discount class to Coupon.

You: Great idea. That way, we aren’t tripped up by the naming mismatches in the future.

Teammate: I do have a question about the coupon, though. Do you think it’s applied to the BookingAmount, or is it added?

PO: [Overhearing conversation] You had it right. It’s applied.

You and your teammate then go and update the glossary, scribbling an addendum on the wall (or updating your wiki):

Delorean Team Glossary

  • Coupon: … Coupons may be applied to BookingAmounts to discount the total cost of the booking.

Refactoring your code to use the right terms

Your teammate and you then walk over to her desk; as a pair you proceed to refactor the existing checkout code. We’ll use Ruby for the sake of this example.

In the beginning, the code looks like this:

class Checkout
  def initialize(booking_amount, discount)
    @booking_amount = booking_amount
    @discount = discount
  end

  def total
    @booking_amount.total - @discount.calculate_amount_for(booking_amount: @booking_amount)
  end
end

class Discount
  STRATEGY_FIXED = 'STRATEGY_FIXED'
  STRATEGY_PERCENTAGE = 'STRATEGY_PERCENTAGE'

  def initialize(amount, strategy)
    @amount = amount
    @strategy = strategy
  end

  def calculate_amount_for(booking_amount:)
    # Implementation...
  end
end
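The body of calculate_amount_for is elided above. Here is one way it might be filled in, based on the Fixed and Percentage definitions in the Glossary. This is only a sketch: the BookingAmount struct is a stand-in, and the choices to work in integer cents, cap a fixed discount at the total, and treat the percentage as a whole number are all my assumptions.

```ruby
# Hypothetical stand-in for the real BookingAmount; only `total` (in cents) is assumed.
BookingAmount = Struct.new(:total)

class Discount
  STRATEGY_FIXED = 'STRATEGY_FIXED'
  STRATEGY_PERCENTAGE = 'STRATEGY_PERCENTAGE'

  def initialize(amount, strategy)
    @amount = amount
    @strategy = strategy
  end

  # Returns the discount in cents for the given booking amount.
  def calculate_amount_for(booking_amount:)
    case @strategy
    when STRATEGY_FIXED
      # Assumption: a fixed discount never exceeds the booking total.
      [@amount, booking_amount.total].min
    when STRATEGY_PERCENTAGE
      # Assumption: @amount is a whole-number percentage, e.g. 20 for 20%.
      booking_amount.total * @amount / 100
    else
      raise ArgumentError, "unknown strategy: #{@strategy}"
    end
  end
end

booking_amount = BookingAmount.new(10_000) # $100.00
fixed = Discount.new(3_000, Discount::STRATEGY_FIXED)
percentage = Discount.new(20, Discount::STRATEGY_PERCENTAGE)

fixed.calculate_amount_for(booking_amount: booking_amount)      # => 3000
percentage.calculate_amount_for(booking_amount: booking_amount) # => 2000
```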

You take a first pass and rename the Discount class to Coupon.

class Coupon
  STRATEGY_FIXED = 'STRATEGY_FIXED'
  STRATEGY_PERCENTAGE = 'STRATEGY_PERCENTAGE'

  def initialize(amount, strategy)
    @amount = amount
    @strategy = strategy
  end

  def calculate_amount_for(booking_amount:)
    # Implementation...
  end
end

Now there’s something funny here - your domain language suggests that a Coupon is applied to a BookingAmount. You pause, because the code reads the opposite - “A Coupon calculates its amount for a BookingAmount”.

You: How about we also refactor the calculate_amount_for method to reflect the language a little better?

Teammate: Yeah. It sounds like the action occurs the other way - the BookingAmount is responsible for applying a Coupon to itself.

In your next refactoring pass, you move the calculate_amount_for method into the BookingAmount, calling it applied_coupon_amount:

class BookingAmount
  # implementation details...

  def applied_coupon_amount(coupon:)
    # Implementation...
  end
end

Finally, you change your Checkout implementation to match:

class Checkout
  def initialize(booking_amount, coupon)
    @booking_amount = booking_amount
    @coupon = coupon
  end

  def total_amount
    @booking_amount.price - @booking_amount.applied_coupon_amount(coupon: @coupon)
  end
end

When you read the implementation in plain English, it reads:

The checkout’s total amount is calculated by subtracting the booking amount’s applied coupon amount from the booking amount price.

Phew! Designing a strong Ubiquitous Language was hard work! In fact, you had spent a goodly amount of time debating and clarifying with your domain experts:

  • Is a Coupon applied to a BookingAmount, or is it discounted from one?
  • Should we call it a Coupon amount, or a Coupon cost?
  • Is the pre-tax, pre-discount amount in the BookingAmount called a price, or a cost?

Whatever you agreed on, that’s what you changed your code to reflect.

Continual refinement

Hm. Something still feels off.

You and your teammate feel your OOP spidey senses going haywire.

Teammate: Hm. I guess that worked, but that’s still not exactly as clean as we wanted it. Isn’t it kind of weird how the Checkout owns the calculation of a discount?

You: Yeah, I see where you’re coming from. That’s just not good OO design. Additionally, if we notice the language our domain experts were using, they didn’t mention that the checkout total was some subtraction of something from another thing. The Checkout’s total simply is the BookingAmount, after application of a Coupon.

Your partner and you take one last step:

class Checkout
  def initialize(booking_amount, coupon)
    @booking_amount = booking_amount
    @booking_amount.apply!(coupon)
  end

  def total_amount
    @booking_amount.amount
  end
end


class BookingAmount
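  # Implementation... (assumed to set @amount and start @coupons as an empty Array)

  def apply!(coupon)
    # Append the single coupon; += would require another Array.
    @coupons << coupon
  end

  def amount
    # Relies on each Coupon exposing an `amount` reader.
    @amount - @coupons.sum(&:amount)
  end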
end

You sit back and read it back, out loud:

The checkout’s total amount is the BookingAmount after a Coupon has been applied.

You both smile. Much better.
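If you want to run the final design yourself, here is one way the elided pieces might be filled in. Treat it as a sketch under a few assumptions of mine: the Coupon struct is a simplified stand-in that only carries a fixed amount, all amounts are integer cents, and the BookingAmount initializer is invented for the example.

```ruby
# Hypothetical minimal Coupon: only a fixed amount, in cents.
Coupon = Struct.new(:amount)

class BookingAmount
  # Assumed initializer: takes the pre-coupon amount in cents.
  def initialize(amount)
    @amount = amount
    @coupons = []
  end

  def apply!(coupon)
    @coupons << coupon
  end

  # The booking amount after all applied coupons.
  def amount
    @amount - @coupons.sum(&:amount)
  end
end

class Checkout
  def initialize(booking_amount, coupon)
    @booking_amount = booking_amount
    @booking_amount.apply!(coupon)
  end

  def total_amount
    @booking_amount.amount
  end
end

checkout = Checkout.new(BookingAmount.new(10_000), Coupon.new(3_000))
checkout.total_amount # => 7000
```

Percentage-type coupons would need more than a bare amount reader, so they are left out of this sketch.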

In closing…

In this brief time we had together,

  • We discussed why names are important - especially in a complex endeavour like software development.
  • We covered why it’s important to arrive at a shared understanding, together as a team, using the same words and vocabulary.
  • We discovered how to build and integrate a Glossary into the daily rhythm of our team.
  • We refactored the code twice - illustrating how to get code in line with the domain language.

And there is much more!

In an upcoming post, we’ll investigate how the Ubiquitous Language applies to a core concept of Domain-Driven Design: the Bounded Context. Why is that important? Because Bounded Contexts give us tools to organize our code - and to do further advanced things like break up monoliths into services.

Knex.js and PostGIS cheat sheet

Here are some code snippets for running Postgres and PostGIS queries with Knex.js.

Execute raw SQL in migration

I often find this useful for fancy SQL, like creating views.

exports.up = function(knex, Promise) {
  return knex.raw(`YOUR RAW SQL`);
};

Add a PostGIS Point type to a table in a migration:

return knex.schema.table('events', function(table) {
  table.specificType('point', 'geometry(point, 4326)');
});

Add a foreign key to another table.

return knex.schema.table('events', function(table) {
  table.integer('device_id').references('id').inTable('devices');
});

Add a multi-column unique index

return knex.schema.table('events', function(table) {
  table.unique(['start_time', 'end_time', 'start_location', 'end_location', 'distance_miles']);
});

Find a collection

knex.select('*')
  .from('participants')
  .where({ name: 'Jason' })
  .andWhere('age', '>=', 20);

Custom operations in SELECT clause

knex('trips')
  .select(knex.raw('miles * passengers as passenger_miles'))
  .select(knex.raw("CONCAT('Hello, ', name) as greeting_message"));

Return PostGIS data from a spatial column:

We use knex-postgis to gain access to PostGIS functions in Postgres. Here, we return a ‘point’ column with ST_AsGeoJSON:

const knexPostgis = require('knex-postgis')(knex);
knex('events').select('*', knexPostgis.asGeoJSON('point'));

See knex-postgis documentation for a list of other PostGIS functions that are supported.