The Sweet Spot
On software, engineering leadership, and anything shiny.

Strange Loop 2015: Notes & Reflections

Going to Strange Loop was a huge check off my conference bucket list (lanyard?). I’d always heard about this slightly weird collision between academia and industry, skewing toward programming languages you haven’t heard of (or, at the very least, have never used in production). I anticipated sitting at the feet of gray-haired wizards and bright-eyed hipsters with Ph.D.s.

The conference did not disappoint. And it was not quite what I expected: I sat less at the feet of geniuses than I talked with them, peer-to-peer, about topics of interest. All around me people were saying “Don’t be afraid to ask questions. Don’t feel stupid - nobody knows everything.” Speakers were tweeting about how much they were learning. It was comforting, because lots of the topics I had come to see were ones I had no. freakin. clue. about.

The following is culled from my notes from different sessions I attended. I will focus on brevity. I will keep it clear. Here we go:

Opening Keynote: “I see what you mean” - Peter Alvaro

  • Instructions, behaviors & outcomes.
  • It “feels good” to write in C (a hardcore 1000-liner)
  • A declarative program (e.g. SQL) works well, but is harder to come
    up with.
  • The declarative world - as described in the work done in Datalog
  • How can we take concepts from Datalog and apply to real-world
    resources like network actors (distributed systems)?
  • It becomes easier to model these systems declaratively when we
    explicitly capture time.
  • Enter Dedalus: extension to Datalog where time is a modeling
    construct.
  • (Showed off usage of the @next and @async annotations.)
  • Computation is rendezvous - the only thing that you know is what YOU
    know at that point in time.
  • Takeaway: Abstractions leak. Model them better (e.g. with time)
  • Inventing languages is dope.
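To make the “declarative world” bullet concrete: a Datalog program only states rules, and an engine evaluates them until no new facts appear (a fixpoint). Below is a hypothetical, naive bottom-up evaluator in JavaScript for the classic reachability program path(X,Y) :- link(X,Y). path(X,Z) :- link(X,Y), path(Y,Z). It is a sketch of plain Datalog semantics only; it has none of Dedalus’s time-modeling constructs.

```javascript
// Naive bottom-up evaluation of a two-rule Datalog program:
//   path(X, Y) :- link(X, Y).
//   path(X, Z) :- link(X, Y), path(Y, Z).
// Facts are stored as "X->Y" strings in a Set so duplicates collapse.
function reachability(links) {
  // Base rule: every link is a path.
  const path = new Set(links.map(([x, y]) => `${x}->${y}`));
  let changed = true;
  // Apply the recursive rule repeatedly until nothing new is derived.
  while (changed) {
    changed = false;
    for (const [x, y] of links) {
      for (const fact of Array.from(path)) {
        const [from, to] = fact.split("->");
        if (from === y && !path.has(`${x}->${to}`)) {
          path.add(`${x}->${to}`);
          changed = true;
        }
      }
    }
  }
  return path;
}

const result = reachability([["a", "b"], ["b", "c"], ["c", "d"]]);
// Logs all six reachable pairs: a->b, a->c, a->d, b->c, b->d, c->d
console.log([...result].sort());
```

The program never says how to loop or in what order to join; the evaluator decides that. That separation is what makes the declarative style attractive, and also why adding time (as Dedalus does) has to be part of the model rather than an implementation detail.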

Have your Causality and your Wall Clocks Too (Jon Moore)

  • Take concept of Lamport clocks and extend them with hybrid clocks.
  • And extend them one further with: Distributed Monotonic Clocks
  • These DMCs use a population protocol (flocking): each actor in the
    system communicates with others, updating its source of truth until
    the group eventually agrees on a median time.
  • DMC components:
    1. Have a reset button by adding an epoch bit
    2. Use flocking (via population protocol) to avoid resets
    3. Accommodates some clockless nodes
    4. Explicitly reflects causality
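For reference, here is a minimal sketch of a hybrid logical clock (HLC), the step between Lamport clocks and DMCs. It follows the published HLC algorithm (Kulkarni et al.), not code from the talk, and it does not model the DMC additions (epoch bit, flocking). The physical clock is injected as a function, an assumption made so the example is deterministic.

```javascript
// Hybrid logical clock: l is the largest physical time seen so far,
// c is a logical counter that orders events when l does not advance.
class HybridClock {
  constructor(physicalNow) {
    this.now = physicalNow; // () => current physical time, e.g. in ms
    this.l = 0;
    this.c = 0;
  }

  // Local event or message send.
  tick() {
    const prevL = this.l;
    this.l = Math.max(prevL, this.now());
    this.c = this.l === prevL ? this.c + 1 : 0;
    return { l: this.l, c: this.c };
  }

  // Receipt of a message stamped { l, c } by another node.
  receive(msg) {
    const prevL = this.l;
    this.l = Math.max(prevL, msg.l, this.now());
    if (this.l === prevL && this.l === msg.l) {
      this.c = Math.max(this.c, msg.c) + 1;
    } else if (this.l === prevL) {
      this.c += 1;
    } else if (this.l === msg.l) {
      this.c = msg.c + 1;
    } else {
      this.c = 0;
    }
    return { l: this.l, c: this.c };
  }
}

// Negative result means a is ordered before b.
function compareHlc(a, b) {
  return a.l !== b.l ? a.l - b.l : a.c - b.c;
}

// Node A's wall clock runs ahead of node B's.
const a = new HybridClock(() => 100);
const b = new HybridClock(() => 90);
const sent = a.tick();       // { l: 100, c: 0 }
const got = b.receive(sent); // { l: 100, c: 1 }: ordered after the send despite B's slower clock
```

The timestamp stays close to wall-clock time (l) while the counter (c) preserves causality, which is the “causality and wall clocks too” trade the talk builds on.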

Building Isomorphic Web Apps with React - Elyse Gordon

  • Vevo needed better SEO for SPAs. The old solution was to snapshot pages and upload them to S3.
  • Beneficial for SEO crawlers
  • React in frontend. Node in backend.
  • Vevo developed the pellet project, a Flux-like framework, to organize
    files.
  • Webpack aliases/shims
  • Server hands off to browser, bootstraps React in client.
  • Alternatives: Relay, Ember

Designing for the Worst Case: Peter Bailis (@pbailis)

  • Designing for worst case often penalizes average case
  • But what if designing for the worst case actually helps avg case?
  • Examples from distributed systems:
    • Worst case of disconnected data centers, packet loss/link loss. Fix by introducing coordination-free protocols. Boom, you’ve now made your network more scalable, more performant, and more resistant to downtime.
    • Worst case: hard to coordinate a distributed transaction between services. What do you do? You implement something like buffered writes out of process.
      • CRDT, RAMP, HAT, Bloom
      • Suddenly, you have fault tolerance
    • Tail latency problem in microservices: the more microservices you query, the higher the probability of hitting a slow server response.
      • Your service’s corner case is your user’s average case
    • HCI: accessibility guidelines in W3C lift standards for all. Make webpages easier to navigate. Side effect of better page performance, higher conversion.
    • Netflix designing CC subtitles also benefits other users.
    • Curb cuts in the real world to help ADA/mobility-assisted folks also benefit normal folks too
  • Designing for the best case has pitfalls too: your notion of best may be hard to hit, or risky. You may want to optimize for a “stable” solution instead (robust optimization).
  • When to design for worst case?
    • common corner cases
    • environmental conditions vary
    • “normal” isn’t normal
  • Worst-case thinking forces a conversation:
    • how do we plan for failures?
    • what is our scale-out strategy?
    • how do we audit failures? data breaches?
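The tail-latency point above is simple probability. If each backend call is independently slow with probability p, a request that fans out to n services avoids every slow response with probability (1 - p)^n. A minimal sketch, assuming independent services (real ones are often correlated):

```javascript
// Probability that at least one of n independent calls is slow,
// when each call is slow with probability p.
function probAnySlow(p, n) {
  return 1 - Math.pow(1 - p, n);
}

// With a 1-in-100 chance of a slow response per service:
for (const n of [1, 10, 100]) {
  console.log(n, probAnySlow(0.01, n).toFixed(3));
}
```

This prints roughly 0.010, 0.096, and 0.634: a 1% corner case per service becomes about a 63% chance of at least one slow dependency at a fan-out of 100, which is how your service’s corner case becomes your user’s average case.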

Ideology by Gary Bernhardt

  • Rumsfeld: known knowns, known unknowns, and unknown unknowns.
  • Ideology is the missing fourth category: the things you do not know you know (unknown knowns)
  • Conflict between typed vs dynamic programmers:
    • Typed: “I don’t need tests, I have types”
    • Dynamic: “I write tests, so I don’t need types”
  • In reality, they are addressing different parts of the problem domain, but they hold different beliefs about the world that are hidden in the shadows:
    • Typed: “Correctness comes solely from types”
    • Dynamic: “Correctness comes solely from example”
  • “I need nulls” -> You believe nulls are the only way to represent absence
  • “Immutable data structures are slow” -> You believe all immutable types are slow
  • “GC is impractical” -> you believe GC algorithms won’t get faster.
  • Recommended reading: Dan Grossman’s CSE 341 material on type systems

Building Scalable, Stateful Services: Caitlin McCaffrey

Sticky connections: always talk to the same machine.

Building sticky connections:

  • persistent connections (downside: the load balancer cannot rebalance load across servers)
  • implement backpressure (e.g. disconnect connections)

Dynamic cluster membership:

  • gossip protocols -> availability
  • consensus systems -> consistency
    (everybody needs to have the same worldview)

Work distribution:

  • random: write anywhere, read from all
  • consistent hashing (e.g. on session ID): hash space -> node. Examples: DynamoDB, Manhattan.
    • cons: can have hotspots and an uneven distribution of resources; cannot move work
  • distributed hash table: statefully store the hash -> node mapping
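Here is a minimal consistent-hashing sketch to illustrate “hash space -> node”: nodes and keys (such as session IDs) are hashed onto the same ring, and a key belongs to the first node point clockwise from it. The hash function (32-bit FNV-1a), the virtual-node count, and the node names are illustrative assumptions, not what Ringpop, Dynamo, or Manhattan actually use.

```javascript
// 32-bit FNV-1a hash: small and deterministic, fine for an illustration.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

class HashRing {
  constructor(nodes, vnodes = 50) {
    // Each node is placed at several points ("virtual nodes") to smooth out
    // the uneven distribution that plain consistent hashing can produce.
    this.points = [];
    for (const node of nodes) {
      for (let v = 0; v < vnodes; v++) {
        this.points.push({ hash: fnv1a(`${node}#${v}`), node });
      }
    }
    this.points.sort((x, y) => x.hash - y.hash);
  }

  // First point at or after the key's hash, wrapping around to the start.
  // (A linear scan keeps the sketch short; a real ring would binary-search.)
  nodeFor(key) {
    const h = fnv1a(key);
    const match = this.points.find((p) => p.hash >= h);
    return (match || this.points[0]).node;
  }
}

const ring = new HashRing(["node-a", "node-b", "node-c"]);
console.log(ring.nodeFor("session-1234"));
```

The property that makes this attractive for stateful services: if node-c leaves, only the keys that lived on node-c move; every other session keeps talking to the same machine.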

Real world:

  • Scuba (Facebook): distributed in-memory DB
  • Ringpop (Uber): Node.js, SWIM gossip protocol, consistent hashing
  • Orleans (MS Research): actor model, gossip, consistent hashing, distributed hash table

Idalin “Abby” Bobé: From Protesting to Programming: Becoming a Tech Activist

  • Tech to resist exploitation
  • Technologists as activists
  • Idalin Bobé -> Changed name to “Abby” to get a job.
  • Pastor Jenkins - magnifying glass vs paper
  • Philadelphia Partnership Program:
    • 1st to college
    • work <> school
  • Difficult to balance.
  • Mills MBA, CS
  • Joined Black Girls Code
    • Apply technology in the right way
  • Ferguson happened
    • Thoughtworkers joined on the ground
    • Hands Up United: www.handsupunited.org
  • “Do not be led by digital metrics” - even though the activists had digital tooling, the tools were being used against activists. Phone calls, chats monitored. Movement tracked.
  • New group starting up in St. Louis named after Roy Clay Sr., a black man who played a strong role in the founding of Silicon Valley.
  • 21st century technologists need 21st century skillsets.
  • Dream Defenders
  • “it is our duty to fight for our freedom/it is our duty to win/we must love and support one another/we have nothing to lose but our chains”

Notes on performance tuning a Puma server

A couple of months ago, I was tuning a Rails app for one of our clients.
This client wanted to know how performant their app would be under load.

To do that, you can do several different things:

  1. Tune the thread/process balance within the VM
  2. Horizontally scale with your cloud platform.

This is a discussion of the former (#1):

1) Set up the test

Drive with a synthetic script

Our application had a synthetic load driver that would run Selenium to
execute various app tasks. This synthetic driver could be parallelized
across many nodes via Rainforest QA, Sauce Labs or BrowserStack.

In our case, I only needed to run our synthetic load script on a single
node in multiple processes, which simulated enough load to anticipate
another order of magnitude of traffic.

Know how to inspect the server under load.

Commands you will want to know:

$ free -m # Find the total amount of free memory on your machine
$ ps uH p <pid> # List out process threads
$ kill -TTIN <puma_master_pid> # Add a puma worker
$ kill -TTOU <puma_master_pid> # Remove a puma worker
$ kill -USR2 <puma_master_pid> # Restart the puma workers and reload config

Generating more load: use external load testing services, or plain tools.

Try using Flood.io or JMeter for performance load.

I tried looking into the puma_auto_tune gem, but it required a higher level of production instrumentation than I was ready to give it.

Analysis: New Relic scalability analysis

New Relic gave us a scalability analysis scatter plot, plotting
throughput against average application response time. In essence, it
allows you to see spikes in response times as correlated to throughput.

Process:

My approach was to use the synthetic script to generate production-like
load on the node and ramp up the # of load actors in 5m increments. Each run would
test one of the following Puma process/thread balances:

Run #1: Single-process, multi threads.
Run #2: Multiple processes, single threaded.
Run #3: Multiple processes, multiple threads.

Aside: how many of these threads/processes should I be using?

Note that your numbers will vary with the execution characteristics of
your app and your server environment. Tweak them for yourself. You’re
designing an experiment.

If you’re curious, our Rails app started out with 4 threads on 2
workers. We made the number of Puma workers and the min/max thread counts
environment variables so we could tweak them easily without deploying.

The strategy was then to look at the perf characteristics of each run in
the scatter plot. If there were any spikes in the graph with the
increase of load, then that would be noted. Even minor features like an
increase in slope would be noted - at that point, the incremental cost
of each request increases with overall system load.
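One way to put a number on “an increase in slope” is to fit a least-squares line to the (throughput, response time) points from the early and late parts of a run and compare the slopes. This is a hypothetical helper, not a New Relic feature, and the sample points are made up for illustration rather than taken from our runs.

```javascript
// Least-squares slope of y against x.
function slope(points) {
  const n = points.length;
  const meanX = points.reduce((s, p) => s + p.x, 0) / n;
  const meanY = points.reduce((s, p) => s + p.y, 0) / n;
  let num = 0;
  let den = 0;
  for (const p of points) {
    num += (p.x - meanX) * (p.y - meanY);
    den += (p.x - meanX) ** 2;
  }
  return num / den;
}

// x: throughput (rpm), y: average response time (ms). Illustrative values only.
const run = [
  { x: 100, y: 120 }, { x: 200, y: 122 }, { x: 300, y: 124 },
  { x: 400, y: 140 }, { x: 500, y: 170 }, { x: 600, y: 210 },
];
const early = slope(run.slice(0, 3)); // extra ms of latency per extra rpm
const late = slope(run.slice(3));
// The 2x threshold is an arbitrary choice for this sketch.
console.log({ early, late, degrading: late > 2 * early });
```

Here the early slope is 0.02 ms/rpm and the late slope is 0.35 ms/rpm, so the incremental cost of each request is climbing with load: exactly the kind of feature worth noting even before an outright spike.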

Results

I don’t have the New Relic data on hand to show now, but in our case we
discovered two things:

  1. The server easily scaled from ~10 -> ~500 rpm with a virtually flat
    line for all runs.
  2. The app exhibited no noticeable performance differences when flipped
    between uniprocess-multithreaded, multiprocess-unithreaded, and
    multiprocess-multithreaded modes. Any performance gains were under a
    tolerable threshold.

How do we parse these results?

  • We note that we didn’t really push the performance threshold on this
    app (it’s not meant to be a public web site and 95% of it is behind a
    login wall to a specialized group of users). Thus, if we pushed the
    concurrent connections even more, we may have seen a more pronounced
    difference.
  • The absence of any major red flags was itself a validation. The
    question we wanted answered coming into this experiment was “how close
    are we to maxing out our single-node EC2 configuration such that we will
    have to begin configuring horizontal scaling?” The answer was: we can
    safely scale further out in the near-term future, and cross the bridge
    of horizontal scaling/bursting when we get there.
  • We did not see statistically significant differences in
    performance for #threads/#processes in Puma. However, if we wanted to
    truly find the optimal performance in our app, we would have turned to
    tools like puma_auto_tune to answer those questions.

Let me know in the comments if you have any questions!

Toolbox: learning Swift and VIPER

The following are some notes I’m compiling as I begin a journey
down the rabbit hole, writing an app in Swift using the VIPER app architecture.

  • I had trouble importing nested source code into Xcode before realizing that I
    needed to import the folder with corresponding Groups. This is done by
    clicking the checkbox “Create Groups for any Added Folders”

    Reference: https://developer.apple.com/library/ios/technotes/iOSStaticLibraries/Articles/configuration.html

    Without doing this, the compiler was not able to build the project.

  • Since there is no easy way to do method swizzling in pure Swift, there is no easy way to mock or stub the way we used to in Ruby. Instead, this is forcing me to rely on plain old Swift types. There are some simple ways to stub, but they end up looking kind of awkward and very wiring-intensive, like this:
import Quick
import Nimble
// Also import the app module under test (its name isn't given in this post).

class NewRidePresenterSpec: QuickSpec {
  override func spec() {
    describe("#startRecordingGpsTrack") {
      // Hand-rolled mock: subclass the real interactor and record the call.
      class MockInteractor: NewRideInteractor {
        var wasCalled: Bool = false

        // Per the note further down, @objc was needed here to avoid a compiler segfault.
        @objc override func startRecordingGpsTrack() {
          wasCalled = true
        }
      }

      let subject = NewRidePresenter()

      it("tells the interactor to start recording") {
        let mockInteractor = MockInteractor()
        subject.interactor = mockInteractor
        subject.startRecordingGpsTrack()

        expect(mockInteractor.wasCalled).to(beTrue())
      }
    }
  }
}
  • Using the vipergen and boa scaffolding generators helped me understand the concepts behind the view.
  • Tip: Build a VIPER module, but don’t build it all at once. Just focus on the Presenter-Interactor-Wireframe component, or the DataStore-Entity-Interactor component. This will keep your head from exploding.
  • Dude. I miss vim. Alcatraz + xvim helped a little…
  • xcodebuild + xcpretty + Guard-shell == some sort of CI feedback loop.
  • Manually creating mocks in Swift = kind of painful. If you override a method of an NSObject subclass in Swift, you must mark the override with the @objc attribute, otherwise the compiler crashes with a segfault.
  • You must contact CircleCI manually if you want to activate an iOS build (it’s still in beta). What are some other good CI tools to use with iOS?

Building GPX stats through FRP principles with Bacon.js

With my current fascination with tracking workouts and location-based-activities, I have been interested in how I might be able to rewrite some of my stats logic with FRP principles.

What is FRP?

FRP, or Functional Reactive Programming, is often defined as “functional programming over values that change over time”. It applies functional composition to streams of data whose values may keep arriving indefinitely. These use cases are served well by FRP, which “(simplifies) these problems by explicitly modeling time”.

GPS - your location, varied over time.

A great application of this would be a workout. Let’s say I wanted to build an app that received realtime updates on a person’s position. Say the app was a Node server that received this JSON blob from a web API as a location update:

{ "lat": 29.192414,
  "lon": 148.113241,
  "ele": 122.1,
  "time": "2015-04-18T13:54:56Z" }

Say that some time later, the API receives this JSON blob:

{ "lat": 29.192424,
  "lon": 148.113251,
  "ele": 123.1,
  "time": "2015-04-18T13:55:26Z" }

From these two data points, we know the user has moved +0.00001 degrees of latitude and +0.00001 degrees of longitude, climbing a total of +1.0 meters, over a period of 30 seconds.

Exercise: Get my instantaneous velocity

If we performed this imperatively, we would write it something like this:

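var locations = [{ /*json*/ }, { /*json*/ } /*, ...*/];
var last = locations[locations.length - 1];
var secondToLast = locations[locations.length - 2];
// The timestamps are ISO 8601 strings, so parse them before subtracting (ms -> s).
var timeDelta = (Date.parse(last.time) - Date.parse(secondToLast.time)) / 1000;
// getDistance(lon1, lat1, lon2, lat2) is a helper, not defined in this snippet, returning meters.
var distanceDelta = getDistance(last.lon, last.lat, secondToLast.lon, secondToLast.lat);
var velocity = distanceDelta / timeDelta; // meters per second
console.log(velocity);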
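Both the imperative and FRP examples call a getDistance helper that the post never defines. Here is one possible implementation, a sketch assuming the haversine great-circle formula on a spherical Earth (mean radius 6,371 km), applied to the two location updates above. It ignores the +1.0 m elevation change.

```javascript
// Great-circle distance in meters between two lon/lat points (haversine formula).
// Argument order matches the getDistance(lon1, lat1, lon2, lat2) calls in the examples.
function getDistance(lon1, lat1, lon2, lat2) {
  const R = 6371000; // mean Earth radius in meters (spherical approximation)
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}

const first = { lat: 29.192414, lon: 148.113241, time: "2015-04-18T13:54:56Z" };
const second = { lat: 29.192424, lon: 148.113251, time: "2015-04-18T13:55:26Z" };

const meters = getDistance(second.lon, second.lat, first.lon, first.lat);
const seconds = (Date.parse(second.time) - Date.parse(first.time)) / 1000;
console.log(meters.toFixed(2), "m in", seconds, "s =", (meters / seconds).toFixed(3), "m/s");
```

For these two points that works out to roughly 1.48 m over 30 s, or about 0.05 m/s: a very slow stroll.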

With FRP, it might look more like this:

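var Bacon = require("baconjs");

// Bacon.fromArray stands in for a live feed of location updates.
var locationStream = Bacon.fromArray([{ /*json*/ }, { /*json*/ } /*, ...some JSON objects that might appear in the future */]);
locationStream.slidingWindow(2, 2) // only emit once a full pair has arrived
              .map(function(pairs) {
                var timeDelta = (Date.parse(pairs[1].time) - Date.parse(pairs[0].time)) / 1000;
                // Same getDistance helper as in the imperative example.
                var distanceDelta = getDistance(pairs[1].lon, pairs[1].lat, pairs[0].lon, pairs[0].lat);
                return distanceDelta / timeDelta;
              })
              .onValue(function(velocity) {
                console.log(velocity);
              });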

There is a key difference that is not easily demonstrated here: the imperative example requires that all the location objects be materialized at once (via a db query, an in-memory store, etc.). It doesn’t account for change over time.

However, the FRP example reacts to values as they appear on the stream: as soon as a new location shows up, a new velocity is computed and emitted.
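To see that push-based behavior without pulling in Bacon.js, here is a hypothetical, hand-rolled stream with just enough surface (push, map, slidingWindow, onValue) to mirror the example above. It is a teaching sketch, not how Bacon.js is implemented, and to keep it short it computes time deltas instead of velocities.

```javascript
// A tiny push-based stream: values flow to subscribers the moment they are pushed.
class Stream {
  constructor() {
    this.subscribers = [];
  }
  push(value) {
    this.subscribers.forEach((fn) => fn(value));
  }
  onValue(fn) {
    this.subscribers.push(fn);
    return this;
  }
  map(f) {
    const out = new Stream();
    this.onValue((v) => out.push(f(v)));
    return out;
  }
  // Emit the last `size` values, but only once `size` values have arrived.
  slidingWindow(size) {
    const out = new Stream();
    const buffer = [];
    this.onValue((v) => {
      buffer.push(v);
      if (buffer.length > size) buffer.shift();
      if (buffer.length === size) out.push(buffer.slice());
    });
    return out;
  }
}

const locations = new Stream();
const deltas = [];
locations
  .slidingWindow(2)
  .map(([prev, next]) => (Date.parse(next.time) - Date.parse(prev.time)) / 1000)
  .onValue((seconds) => deltas.push(seconds));

// Each push immediately yields a new delta; nothing is materialized up front.
locations.push({ time: "2015-04-18T13:54:56Z" });
locations.push({ time: "2015-04-18T13:55:26Z" }); // deltas is now [30]
locations.push({ time: "2015-04-18T13:55:41Z" }); // deltas is now [30, 15]
```

The pipeline is declared once, before any data exists; the values then arrive over time and each one updates the result immediately.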

Some more location-based experiments: rxlocation

I wrote up a library to parse various facts from a changing stream of GPS events, from instantaneous velocity, average velocity, moving/stopped status, etc.

I investigated different reactive frameworks, mainly RxJS and Bacon.js. My takeaways were that RxJS does everything and the kitchen sink, but I got lost trying to reconcile Node streams with RxJS cold streams. Bacon.js just seemed to work for me, out of the box. I’m still learning, so I hope to have a better understanding of the core issues here.

You can check it out here: rxlocation.

Docker, Rails, and Docker Compose in your development workflow

(This post originally appeared on the Carbon Five blog.)

We’ve been trialing the usage of Docker and Docker Compose (previously known as fig) on a Rails project here at Carbon Five. In the past, my personal experience with Docker had been that the promise of portable containerized apps was within reach, but the tooling and development workflow were still awkward - commands were complex, configuration and linking steps were complicated, and the overall learning curve was high.

My team decided to take a peek at the current landscape of Docker tools (primarily boot2docker and Docker Compose) and see how easily we could spin up a new app and integrate it into our development workflow on Mac OS X.

In the end, I’ve found my experience with Docker tools to be surprisingly pleasant; the tooling easily integrates with existing Rails development workflows with only a minor amount of performance overhead. Docker Compose offers a seamless way to build containers and orchestrate their dependencies, and helps lower the learning curve to build Dockerized applications. Read on to find out how we built ours.

Introduction to docker-compose (née Fig).

Docker Compose acts as a wrapper around Docker - it links your containers together and provides syntactic sugar around some complex container linking commands.

We liked Docker Compose for its ability to coordinate and spin up your entire application and dependencies with one command. In the past, frameworks like Vagrant were easy ways to generate a standard image for your development team to use and get started on. Docker Compose offers similar benefits of decoupling the app from the host environment, but also provides the container vehicle for the app to run in all environments - that is, the container you develop in will often be the same container that you deploy to production with.

Docker (with the orchestration tooling provided by Compose) provides us the ability to:

  • Upgrade versions of Ruby or Node (or whatever runtime your app requires) in production with far less infrastructure coordination than normally required.
  • Reduce the number of moving parts in the deployment process. Instead of writing complex Puppet and Capistrano deployment scripts, our deployments will now center around moving images around and starting containers.
  • Simplify developer onboarding by standardizing your team on the same machine images.

In this example, we will run two Docker containers - a Rails container and a MySQL container - and rely on Compose to build, link, and run them.

Installing boot2docker, Docker, and Docker Compose.

Docker runs in a VirtualBox VM through an image called boot2docker. We have to use boot2docker and VirtualBox because Docker relies on Linux kernel features (such as namespaces and cgroups) that the Mac OS X kernel does not provide. Hence, we must run our Docker containers within yet another virtual machine.

  1. Download and install VirtualBox.
  2. Now install boot2docker and Docker Compose.

$ brew install boot2docker docker-compose

  3. Initialize and start up boot2docker.

$ boot2docker init
$ boot2docker start

  4. Configure your Docker host to point to your boot2docker image.

$ $(boot2docker shellinit)

You’ll need to run this in every terminal session that invokes the docker or docker-compose command, so it’s worth adding this line to your .zshrc or .bashrc.

Creating a Dockerfile

Let’s start by creating a Dockerfile for this app. This specifies the base dependencies for our Rails application. We will need:

  • Ruby 2.2 - for our Rails instance
  • NodeJS and NPM - for installation of Karma, jshint, and other JS dependencies.
  • MySQL client - for ActiveRecord tasks
  • PhantomJS - for executing JS-based tests
  • vim - for inspecting and editing files within our container

Create a Dockerfile from within your Rails app directory.

FROM ruby:2.2.0
RUN apt-get update -qq && apt-get install -y build-essential nodejs npm nodejs-legacy mysql-client vim
RUN npm install -g phantomjs

RUN mkdir /myapp

WORKDIR /tmp
COPY Gemfile Gemfile
COPY Gemfile.lock Gemfile.lock
RUN bundle install

ADD . /myapp
WORKDIR /myapp
RUN RAILS_ENV=production bundle exec rake assets:precompile --trace
CMD ["rails","server","-b","0.0.0.0"]

Let’s start by breaking this up line-by-line:

FROM ruby:2.2.0

The FROM directive specifies the library/ruby base image from Docker Hub, and uses the 2.2.0 tag, which corresponds to the Ruby 2.2.0 runtime.

From here on, we are going to be executing commands that will build on this reference image.

RUN apt-get update -qq && apt-get install -y build-essential nodejs npm nodejs-legacy mysql-client vim
RUN npm install -g phantomjs

Each RUN command builds up the image, installing specific application dependencies and setting up the environment. Here we install our app dependencies both from apt and npm.

An aside on how a Docker image is built

One of the core concepts in Docker is the concept of “layers”. Docker runs on operating systems that support layering filesystems such as aufs or btrfs. Changes to the filesystem can be thought of as atomic operations that can be rolled forward or backwards.

This means that Docker can effectively store its images as snapshots of each other, much like Git commits. This also has implications as to how we can build up and cache copies of the container as we go along.

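The Dockerfile can be thought of as a series of rolling incremental changes to a base image - each command builds on top of the line before. When you rebuild, Docker reuses its cached layer for each instruction until it reaches the first one that has changed (for ADD and COPY, that includes changes to the files being copied). That step and every step after it are rebuilt; the steps before it are not, so the image is never rebuilt from scratch unless the first lines change.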

Keep these concepts in mind as we talk about speeding up your Docker build in the following section.
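A toy model can make the caching rule concrete. The sketch below is hypothetical and much simpler than Docker’s real build cache; each step carries a made-up “inputs” fingerprint standing in for a checksum of the files a COPY or ADD brings in, and the instruction list is an abbreviated version of the Dockerfile above. It reports which steps would be rebuilt: everything from the first changed step onward.

```javascript
// Return the instructions that would be rebuilt, given the previous and current builds.
function stepsToRebuild(previous, current) {
  let firstChanged = current.length;
  for (let i = 0; i < current.length; i++) {
    const prev = previous[i];
    if (!prev || prev.instruction !== current[i].instruction || prev.inputs !== current[i].inputs) {
      firstChanged = i;
      break;
    }
  }
  // Each layer is built on its parent, so one change invalidates every later layer.
  return current.slice(firstChanged).map((s) => s.instruction);
}

// Abbreviated Dockerfile; gemfileHash/appHash are stand-ins for file checksums.
const build = (gemfileHash, appHash) => [
  { instruction: "FROM ruby:2.2.0", inputs: "" },
  { instruction: "COPY Gemfile Gemfile", inputs: gemfileHash },
  { instruction: "RUN bundle install", inputs: "" },
  { instruction: "ADD . /myapp", inputs: appHash },
  { instruction: "RUN rake assets:precompile", inputs: "" },
];

// Only app code changed: bundle install stays cached.
console.log(stepsToRebuild(build("g1", "a1"), build("g1", "a2")));
// Gemfile changed: bundle install and everything after it reruns.
console.log(stepsToRebuild(build("g1", "a1"), build("g2", "a1")));
```

This is why the order of instructions matters so much: the slow, rarely-changing steps belong near the top, where everyday code edits never reach them.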

Fast Docker builds by caching your Gemfiles

The following steps install the required Ruby gems for Bundler, within your app container:

WORKDIR /tmp
COPY Gemfile Gemfile
COPY Gemfile.lock Gemfile.lock
RUN bundle install

Note how we sneak the Gemfiles into /tmp, then run bundle install, which downloads and installs the gems into the image’s gem directory rather than into /myapp. This is a cache hack: in the past we would have kept the Gemfiles in with the rest of the application directory in /myapp.

Keeping Gemfiles inline with the app would have meant that the entire bundle install command would have been re-run on each docker-compose build – without any caching – due to the constant change in the code in the /myapp directory.

By separating out the Gemfiles into their own directory, we logically separate the Gemfiles, which are far less likely to change, from the app code, which is far more likely to change. This reduces the number of times we have to wait for a clean bundle install to complete.

HT: Brian Morearty: “How to skip bundle install when deploying a Rails app to Docker”

Adding the app

Finally, we finish our Dockerfile by adding our current app code to the working directory.

ADD . /myapp
WORKDIR /myapp
RUN RAILS_ENV=production bundle exec rake assets:precompile --trace
CMD ["rails","server","-b","0.0.0.0"]

This copies the contents of the app directory on the host into the /myapp directory within the image.

Note that we precompile all our assets before the container boots up - this ensures that the container is preloaded and ready to run, and jibes with the Docker tenet that the same container should run in development, test, and production environments.

Setting up Docker Compose

Now that we’ve defined a Dockerfile for booting our Rails app, we turn to the Compose piece that orchestrates the linking phase between the Rails app and its dependencies - in this case, the DB.

A docker-compose.yml file automatically configures our application ecosystem. Here, it defines our Rails container and its db container:

web:
  build: .
  volumes:
    - .:/myapp
  ports:
    - "3000:3000"
  links:
    - db
  env_file:
    - '.env.web'
db:
  image: library/mysql:5.6.22
  ports:
    - "13306:3306"
  env_file:
    - '.env.db'

A simple:

$ docker-compose up

will spin up both the web and db instances.

One of the most powerful tools of using Docker Compose is the ability to abstract away the configuration of your server, no matter whether it is running as a development container on your computer, a test container on CI, or on your production Docker host.

The directive:

links:
  - db

will add an entry for db into the Rails container’s /etc/hosts, linking the hostname to the correct container. This allows us to write our database.yml like so:

# config/database.yml
development: &default
  host: db

Another important thing to note is the volumes configuration:

# docker-compose.yml
volumes:
  - .:/myapp

This mounts the current directory . on the host Mac to the /myapp directory in the container. This allows us to make live code changes on the host filesystem and see code changes reflected in the container.

Also note that we make use of Compose’s env_file directive, which allows us to specify environment variables to inject into the container at runtime:

env_file:
  - '.env.web'

A peek into .env.web shows:

PORT=3000
PUMA_WORKERS=1
MIN_THREADS=4
MAX_THREADS=16
SECRET_KEY_BASE=<Rails secret key>
AWS_REGION=us-west-2
# ...

Note that the env_file is powerful in that it allows us to swap out environment configurations when you deploy and run your containers. Perhaps your container needs separate configurations on dev than when on CI, or when deployed to staging or on production.
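For intuition about what env_file does: Compose reads each VAR=VAL line into the container’s environment, ignoring blank lines and lines that begin with #. The sketch below is a simplified, hypothetical parser that follows only those rules (no quoting or variable interpolation), applied to a trimmed copy of .env.web with a placeholder secret.

```javascript
// Parse env_file-style text into an object of environment variables.
// Simplified rules: one KEY=VALUE per line, blank lines and # comments ignored,
// everything after the first "=" is the value (no quote handling).
function parseEnvFile(text) {
  const env = {};
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue; // lines without "=" are skipped in this sketch
    env[line.slice(0, eq)] = line.slice(eq + 1);
  }
  return env;
}

const webEnv = parseEnvFile(`
PORT=3000
PUMA_WORKERS=1
MIN_THREADS=4
MAX_THREADS=16
SECRET_KEY_BASE=placeholder
# ...
`);
console.log(webEnv.MAX_THREADS); // "16" (values are always strings)
```

Because the file is just data, swapping configurations per environment means pointing env_file at a different file, with no change to the image itself.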

Creating containers and booting them up.

Now it’s time to assemble the container. From within the Rails app, run:

$ docker-compose build

This builds the image for the web service from your Dockerfile. The db service uses a prebuilt image (library/mysql:5.6.22), which Compose pulls from Docker Hub when it first brings the service up. You will need to re-run the docker-compose build command every time you change the Dockerfile or Gemfile.

Running your app in containers

You can bring up your Rails server and associated containers by running:

$ docker-compose up

This combines the build, link, and start steps for each container. You should see output indicating that both our web and db containers, as configured in the docker-compose.yml file, are booting up.

Development workflow

I was pleasantly surprised to discover that developing with Docker added very little overhead to the development process. In fact, most commands that you would run for Rails simply needed to be prepended with a docker-compose run web.

When you want to run the command on the left, with Docker Compose you would run the one on the right:

  • bundle install -> docker-compose run web bundle install
  • rails s -> docker-compose run web rails s
  • rspec spec/path/to/spec.rb -> docker-compose run web rspec spec/path/to/spec.rb
  • RAILS_ENV=test rake db:create -> docker-compose run -e RAILS_ENV=test web rake db:create
  • tail -f log/development.log -> docker-compose run web tail -f log/development.log

Protips

Here are some nice development tricks I found useful when working with Docker:

  • Add a dockerhost entry to your /etc/hosts file so you can visit dockerhost from your browser.
$ boot2docker ip
192.168.59.104

Then add the IP to your /etc/hosts

192.168.59.104  dockerhost

Now you can pull up your app at dockerhost:3000 in your browser.

  • Debugging containers with docker exec

    Sometimes you need to get inside a container to see what’s really happening. Perhaps you need to test whether a port is truly open, or verify that a process is truly running. This can be accomplished by grabbing the container ID with a docker ps, then passing that ID into the docker exec command:

$ docker ps
CONTAINER ID        IMAGE
301fa6331388        myrailsapp_web:latest
$ docker exec -it 301fa6331388 /bin/bash
root@301fa6331388:/myapp#
  • Showing environment variables in a container with docker-compose run web env
$ docker-compose run web env
AWS_SECRET_KEY=
MAX_THREADS=16
MIN_THREADS=4
AWS_REGION=us-west-2
BUNDLE_APP_CONFIG=/usr/local/bundle
HOME=/root
#...
  • Running an interactive debugger (like pry) in your Docker container

    It takes a little extra work to get Docker to allow interactive terminal debugging with tools like byebug or pry. Should you desire to start your web server with debugging capabilities, you will need to use the --service-ports flag with the run command.

$ docker-compose run --service-ports web

This works due to two internal implementations of docker-compose run:

  • docker-compose run creates a TTY session for your app to connect to, allowing interactive debugging. The default docker-compose up command does not create a TTY session.
  • The run command does not map ports to the Docker host by default. The --service-ports directive maps the container’s ports to the host’s ports, allowing you to visit the container from your web browser.
  • Use slim images when possible in production

Oftentimes, your base image will come supplied with a -slim variant on Docker Hub. This usually means that the image maintainer has supplied a trimmed-down version of the container for you to use with source code and build-time files stripped and removed. You can oftentimes shave a couple hundred megabytes off your resulting image – we did when we switched our ruby image from 2.2.1 to 2.2.1-slim. This results in faster deployment times due to less network I/O from the registry to the deployment target.

Gotchas

  • Remember that your app runs in containers - so every time you do a docker-compose run, Compose spins up a brand-new container for the service you named. Its linked services (like db) are started only if they are not already running; otherwise the new container is linked to the running ones.

    This means that it’s possible that you’ve spun up multiple instances of your app without thinking about it - for example, you may have a web and db container already up from a docker-compose up command, and then in a separate terminal window you run a docker-compose run web rails c. That spins up another web container to execute the command, but then links that container with the pre-launched db container.

  • There is a small but noticeable performance penalty from running through both the VirtualBox VM and Docker. I’ve generally noticed waiting a few extra seconds when starting a Rails environment. My overall experience has been that the penalty has not been large enough to be painful.

Try it out

Give this a shot and let me know how Docker has been working for you. What have your experiences been? What are ways in which you’ve been able to get your Docker workflow smoother? Share in the comments below.

Coming up: integration with CI and deployment.

In upcoming blog posts, we will investigate how to use the power of Docker Compose to test and build your containers in a CI-powered workflow, push to Docker registries, and deploy to production. Stay tuned!