In Part 1 we discussed the use cases for formal specifications and we looked at a simple transaction isolation bug in a financial institution.
This is a distributed system transaction orchestration problem.
In this exercise, imagine we are a bank, and we are serving an API request from our banking mobile app to initiate a bank transfer from an external financial institution to the user’s account.
We are building an API endpoint that asks the other financial institution to move money over to us.
The wrinkle here is that internally, the transfer also needs to be synchronized with a third, internal transaction database (our internal source of truth) to formally recognize the balance in the user’s account.
How do we design this system to ensure the design is resilient to failures and outages?
We illustrate the system design using the happy path. Our mobile client calls an API gateway, which we use as a transaction coordinator.
The API gateway makes 2 calls. First, it calls the external financial institution to initiate the transfer. If the transfer is successful, the API then turns around and pings the internal balance store service to note the transaction was a success.
(Note that this is not representative of a real world bank! This is a contrived, simplified example).
Because we need our API to be synchronous, the API coordinator blocks until both calls succeed before returning a success response to the client.
sequenceDiagram
autonumber
OurMobileClient->>+OurAPIGateway: SubmitTransfer
OurAPIGateway->>+ExternalFinancialInstitution: StartTransfer
ExternalFinancialInstitution->>-OurAPIGateway: SUCCESS
OurAPIGateway->>+OurInternalBalanceStore: UpdateUserBalance
OurInternalBalanceStore->>-OurAPIGateway: SUCCESS
OurAPIGateway->>-OurMobileClient: SUCCESS
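The happy path can be sketched as a thin synchronous coordinator that chains the two calls. All class and method names below are hypothetical stand-ins, not a real banking API:

```python
# A minimal sketch of the synchronous coordinator. The service
# classes and method names here are illustrative assumptions.
class StubService:
    """Stands in for the external institution and the balance store."""
    def __init__(self, ok=True):
        self.ok = ok

    def start_transfer(self, amount):
        return self.ok

    def update_user_balance(self, amount):
        return self.ok

def submit_transfer(external, internal, amount):
    """Block until both downstream calls succeed, then report success."""
    if not external.start_transfer(amount):
        return "FAILED"
    if not internal.update_user_balance(amount):
        return "FAILED"
    return "SUCCESS"

result = submit_transfer(StubService(), StubService(), 5)  # "SUCCESS"
```

The coordinator returns only after both calls come back, which is exactly the blocking behavior in the diagram above.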
Of course, we know that errors can crop up in the real world. If the call to either service borks, we need a way to either retry or fail gracefully. Here, we consider the case where the internal balance store fails.
We consult with the team and decide that if the internal call fails for any reason, we will undo the transfer at the external financial institution with a compensating transaction. We will throw this work onto an external queue as soon as the error occurs.
sequenceDiagram
autonumber
OurMobileClient->>+OurAPIGateway: SubmitTransfer
OurAPIGateway->>+ExternalFinancialInstitution: StartTransfer
ExternalFinancialInstitution->>-OurAPIGateway: SUCCESS
OurAPIGateway->>+OurInternalBalanceStore: UpdateUserBalance
OurInternalBalanceStore->>-OurAPIGateway: FAILED
OurAPIGateway--)BackgroundWorker: Enqueue Reversal
Note over OurAPIGateway,BackgroundWorker: Compensating transaction kicks off after a failure
OurAPIGateway->>-OurMobileClient: FAILED
BackgroundWorker->>+ExternalFinancialInstitution: UndoStartTransfer
ExternalFinancialInstitution->>-BackgroundWorker: SUCCESS
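Under the same assumed (hypothetical) interfaces, the failure path might look like this sketch: the coordinator enqueues a reversal and a background worker drains the queue later:

```python
from queue import Queue

# Hypothetical sketch of the failure path. Service classes and
# method names are illustrative, not a real API.
class ExternalStub:
    def __init__(self):
        self.balance = 10

    def start_transfer(self, amount):
        self.balance -= amount
        return True

    def undo_start_transfer(self, amount):
        self.balance += amount

class FailingInternalStub:
    def update_user_balance(self, amount):
        return False  # simulate the internal balance store failing

reversal_queue = Queue()

def submit_transfer(external, internal, amount):
    if not external.start_transfer(amount):
        return "FAILED"
    if not internal.update_user_balance(amount):
        reversal_queue.put(amount)  # hand off the compensating transaction
        return "FAILED"
    return "SUCCESS"

def drain_reversals(external):
    # Background worker: undo each enqueued transfer.
    while not reversal_queue.empty():
        external.undo_start_transfer(reversal_queue.get())

ext = ExternalStub()
result = submit_transfer(ext, FailingInternalStub(), 5)  # "FAILED"
drain_reversals(ext)                                     # balance restored to 10
```

Note the gap this opens up: between the failed response and the worker's reversal, the external balance is still short by the transfer amount.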
But wait! We see that there’s a bug: a race condition when users “button mash” - hitting an error dialog in the mobile client and immediately retrying their request!
sequenceDiagram
autonumber
OurMobileClient->>+OurAPIGateway: SubmitTransfer
OurAPIGateway->>+ExternalFinancialInstitution: StartTransfer
ExternalFinancialInstitution->>-OurAPIGateway: SUCCESS
OurAPIGateway->>+OurInternalBalanceStore: UpdateUserBalance
OurInternalBalanceStore->>-OurAPIGateway: FAILED
OurAPIGateway--)BackgroundWorker: Enqueue Reversal
OurAPIGateway->>-OurMobileClient: FAILED
OurMobileClient->>+OurAPIGateway: SubmitTransfer
OurAPIGateway->>+ExternalFinancialInstitution: StartTransfer
ExternalFinancialInstitution->>-OurAPIGateway: FAILED
OurAPIGateway->>-OurMobileClient: FAILED
Note over OurMobileClient,OurAPIGateway: User's re-submittal fails because there are no funds in account (race condition)
BackgroundWorker->>+ExternalFinancialInstitution: UndoStartTransfer
ExternalFinancialInstitution->>-BackgroundWorker: SUCCESS
This will fail.
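Before writing any TLA+, the failing interleaving can be sketched in a few lines. This is a toy model that mirrors the unconditional subtraction the spec will use, not real service code:

```python
# Toy model of the race: every retry subtracts from the external
# balance before the reversal worker has had a chance to restore it.
external_balance = 10
transfer_amount = 5
reversal_queue = []

for attempt in range(1 + 3):  # the original try plus 3 button mashes
    external_balance -= transfer_amount     # StartTransfer
    reversal_queue.append(transfer_amount)  # internal call fails each time

overdrafted = external_balance < 0  # True: -10 before any reversal runs
```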
OK, let’s try to model this behavior as a formal TLA+ spec. I’ll write out how the spec would look, and we’ll go through it line by line:
variables
    queue = <<>>,
    reversal_in_progress = FALSE,
    transfer_amount = 5,
    button_mash_attempts = 0,
    external_balance = 10,
    internal_balance = 0;

define
    NeverOverdraft == external_balance >= 0
    EventuallyConsistentTransfer == <>[](external_balance + internal_balance = 10)
end define;

\* This models the API endpoint coordinator
fair process BankTransferAction = "BankTransferAction"
begin
    ExternalTransfer:
        external_balance := external_balance - transfer_amount;
    InternalTransfer:
        either
            internal_balance := internal_balance + transfer_amount;
        or
            \* Internal system error!
            \* Enqueue the compensating reversal transaction.
            queue := Append(queue, transfer_amount);
            reversal_in_progress := TRUE;
            \* The user is impatient! Their transfer must go through. They button mash (up to 3 times)...
            UserButtonMash:
                if (button_mash_attempts < 3) then
                    button_mash_attempts := button_mash_attempts + 1;
                    \* Start from the top and do the external transfer
                    goto ExternalTransfer;
                end if;
        end either;
end process;

\* This models an async task runner that will run a reversal
\* compensating transaction. It uses a queue to process work.
fair process ReversalWorker = "ReversalWorker"
    variable balance_to_restore = 0;
begin
    DoReversal:
        while TRUE do
            await queue /= <<>>;
            balance_to_restore := Head(queue);
            queue := Tail(queue);
            external_balance := external_balance + balance_to_restore;
            reversal_in_progress := FALSE;
        end while;
end process;
Whew, ok! That’s a lot. Let’s go through it line by line:
First up, we declare variables and operators:
\* These are global variables
variables
    queue = <<>>,
    reversal_in_progress = FALSE,
    transfer_amount = 5,
    button_mash_attempts = 0,
    external_balance = 10,
    internal_balance = 0;

define
    NeverOverdraft == external_balance >= 0
    EventuallyConsistentTransfer == <>[](external_balance + internal_balance = 10)
end define;
There are two main blocks here: the variables block and the define block. The variables defined here track values that will be used globally throughout the model. The operators in the define block are properties that the model checker will use to make sure invariants and temporal properties hold true throughout the lifecycle of the model.
It’s important to note the properties defined here in the spec:
NeverOverdraft is checked on every state combination, ensuring that there cannot be a scenario where the external financial institution is asked to transfer more money than is in its account.
EventuallyConsistentTransfer is a temporal property that checks whether the system always eventually converges on the condition that external + internal balance equals $10, the starting amount. We are essentially guaranteeing that we cannot unintentionally create or lose any money between our institutions.
Next up, there are two process blocks being defined here, representing the two internal systems whose interactions we are modeling.
The first process is the API coordinator. Inside this coordinator, each action is marked by a label - note the labels ExternalTransfer, InternalTransfer, and UserButtonMash. These correspond with various phases of our system sequence diagram. Let’s walk through the code:
fair process BankTransferAction = "BankTransferAction"
begin
    ExternalTransfer:
        external_balance := external_balance - transfer_amount;
...
This is fairly self-explanatory - the system is set up to first call the external institution and tell them to withdraw the money. For simplicity’s sake, we assume it is always successful. (It obviously won’t be, and we have the perfect tool to model failure scenarios around that!)
InternalTransfer:
    either
        internal_balance := internal_balance + transfer_amount;
    or
        \* Internal system error!
        \* The system will enqueue the compensating reversal transaction.
        queue := Append(queue, transfer_amount);
        reversal_in_progress := TRUE;
        ...
    end either;
The next label is interesting. We use an either...or control structure to tell the model checker that there is branching logic here (in this case, there is a success case and a failure case). Both these branches will be exhaustively explored.
In the successful case, we observe that the internal API is called successfully and the balance is correctly stored. However, the failure case will have us enqueue a compensating transaction (a “reversal”) that will be processed by an asynchronous worker.
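What “exhaustively explored” means in ordinary code: run the step once per branch and keep every outcome. This is a hypothetical Python sketch of the idea, not how TLC works internally:

```python
# Sketch of exploring both arms of the either...or.
def internal_transfer(branch, internal_balance, queue, amount):
    """One step of the InternalTransfer label, for a chosen branch."""
    if branch == "success":
        return internal_balance + amount, list(queue)
    # failure: enqueue the compensating reversal instead
    return internal_balance, list(queue) + [amount]

# The model checker pursues both branches, not just one.
results = [internal_transfer(b, 0, [], 5) for b in ("success", "failure")]
# results[0] == (5, [])   - success: balance updated
# results[1] == (0, [5])  - failure: reversal enqueued
```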
\* The user is impatient! Their transfer must go through.
\* They button mash (up to 3 times)...
UserButtonMash:
    \* await reversal_in_progress = FALSE;
    if (button_mash_attempts < 3) then
        \* But the UI blocks them from re-submitting until the transaction
        \* has finished being reversed/compensated.
        button_mash_attempts := button_mash_attempts + 1;
        goto ExternalTransfer;
    end if;
Ooh, the user, the user. You can always count on the user to do something unexpected. While the system is enqueuing the compensating transaction, our poor user is confused and retries the original transaction (aka “button mashing” the UI button) in hopes that it will go through. Will it succeed?
Note that the way I’ve built the spec, I’m specifying a finite limit to the number of user retries, if only to make sure the program will eventually terminate.
Finally, observe the goto ExternalTransfer statement. This basically tells the model checker to jump to the ExternalTransfer: label - i.e. the top of the process - and re-execute from there all over again.
(Author’s note: I haven’t finished this yet, but thought I’d push this up as a work in progress. Do you see the error? Are your spidey senses tingling here? More to come!)
If you’ve been writing software for any amount of time, you may be familiar with the many tools we have available to ensure the correctness, consistency and debuggability of our systems. They run the gamut from unit / acceptance / integration tests to QA plans, CI/CD automation and the like. System and language tools like type systems, interactive debuggers and profilers abound. Practices like DevOps, TDD/BDD, and even Agile process itself can be argued to have been invented toward the goal of writing correct, robust, easy-to-maintain systems.
Surely these tools are advanced enough after 70-plus years of computing to help! But no - with the rise of distributed computing, the classes of bugs that emerge get ornery and complex, are usually nondeterministic, and are often beyond the reach of ordinary tools.
But what if I told you there was another option from the world of… math?
What if there was a way to guarantee that our systems and algorithms are performant, run correctly, are reliable against race conditions and the like?
Here’s how it works:
You describe your system (or program) in terms of formal logic statements. You assert specific conditions that must hold throughout the program runtime (invariants). You write this in the form of a proof (that lives outside your actual program).
The tool has a “model checker”, which is a glorified BFS algorithm that explores every reachable state of your specification and lets you know if the invariant conditions hold.
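Here’s a toy, hypothetical version of such a checker - a plain BFS over reachable states that reports the first invariant violation (TLC itself is far more sophisticated):

```python
from collections import deque

def check(initial, next_states, invariant):
    """BFS the reachable state space; return the first state that
    violates `invariant`, or None if it holds everywhere."""
    seen = {initial}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        if not invariant(state):
            return state
        for nxt in next_states(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return None

# Example system: a counter that either increments or resets to zero.
def next_states(n):
    return [n + 1, 0] if n < 4 else [0]

assert check(0, next_states, lambda n: n <= 4) is None  # invariant holds
assert check(0, next_states, lambda n: n <= 3) == 4     # violation found
```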
If they do - congratulations! You’ve verified your system. If they don’t - congratulations! You’ve found a potential bug!
Using the results from the model checker, you can fix the spec to resolve the model checking error. This translates into a real-world fix that you can then roll into your program.
It’s not magical. It’s also a lot of work, and in all fairness, slightly out of the reach of the typical industry programmer. But it’s much more in reach than you think!
I’ll be using a tool called TLA+, writing a sample spec in a derivative syntax called PlusCal. I’ll walk us through a simple example that can be found on the Learn TLA web site.
In the book Designing Data-Intensive Applications, the Transactions chapter illustrates a scenario where read isolation is not correctly implemented in the database, leading to dirty reads - simultaneous queries may be able to read dirty data from complex multi-statement operations - leading to bad outcomes.
Let’s say we are a bank where a user attempts to transfer money between two accounts, and a separate query is being run by an auditor who wants to ensure that the bank software is working correctly and no funny money business is happening:
sequenceDiagram
autonumber
User->>Account1: Add $500
Auditor->>Account1: Query Balance
Auditor->>Account2: Query Balance
User->>Account2: Subtract $500
Alas, our system was implemented a bit naively, and we can see that the application makes two calls to the database, crediting Account1 and debiting Account2 in two separate statements.
Assuming both accounts have initial values of $1000, the User’s transfer completes successfully, moving $500 from Account2 to Account1 and maintaining the correct total (Account1 + Account2 = $2000).
However, the Auditor has had the unfortunate timing to query in between the two user operations and has a different view of the world, seeing that $500 has materialized out of thin air into Account1 (Account1 + Account2 = $2500)!
Let’s model this behavior as a TLA PlusCal algorithm:
variables
    transfer_amount = 500,
    account1 = 1000,
    account2 = 1000;

process User = "user"
begin
    StartUserTransfer:
        account1 := account1 + transfer_amount;
    FinalizeUserTransfer:
        account2 := account2 - transfer_amount;
end process;

process Auditor = "auditor"
begin
    DoAudit:
        assert account1 + account2 = 2000;
end process;
High level explanation - the two process blocks model the two independent activities happening here: the user initiating the transfer and the auditor running the query.
This will blow up! The TLA model checker will compute all possible computation states between the two processes as delineated by the statements inside the StartUserTransfer, FinalizeUserTransfer, and DoAudit labeled statement groups, including when the auditor runs before, during, and after the user’s inter-account transfer.
Just look - the model checker has run and blown up on a series of state transitions that got us to the Very Wrong situation we discussed before. The poor auditor has found that Account1 has $1500 and Account2 has $1000. That’s not good!
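You can reproduce the gist of what the checker found with a toy enumeration of the interleavings (a hypothetical sketch, not TLC):

```python
from itertools import permutations

# Enumerate every interleaving of the user's two steps and the
# auditor's read, and check the auditor's invariant in each.
def audit_ok(schedule):
    account1, account2 = 1000, 1000
    ok = True
    for step in schedule:
        if step == "credit":
            account1 += 500
        elif step == "debit":
            account2 -= 500
        else:  # the auditor's query
            ok = ok and (account1 + account2 == 2000)
    return ok

# Keep the user's program order: credit happens before debit.
schedules = [s for s in permutations(["credit", "debit", "audit"])
             if s.index("credit") < s.index("debit")]
violations = [s for s in schedules if not audit_ok(s)]
# violations == [("credit", "audit", "debit")] - the dirty read
```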
Clearly, this is incorrect. We will need to ensure that the database does not allow other queries to read values from within an in-flight transaction. So here, we say “OK, we’re going to wrap these statements in a TRANSACTION block”. But hold up! We need to move that change in system design into our TLA model.
Recall that the TLA model checker can only test state combinations between each labeled state, meaning that statements grouped inside a label are considered atomic operations. Knowing this, we move both transfers to within the same label to tell the model checker “these two operations happen at the same time, as if they were running in a transaction”.
begin
    DoUserTransfer:
        account1 := account1 + transfer_amount;
        \* Collapse this statement with the one above to make them atomic
        account2 := account2 - transfer_amount;
Run the model checker again - it passes.
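To see why, enumerate the interleavings by hand (a toy sketch, not TLC itself): with both updates fused into one atomic step, no schedule can observe a bad total:

```python
from itertools import permutations

# Both account updates happen in one atomic step, mirroring the
# collapsed DoUserTransfer label.
def audit_ok(schedule):
    account1, account2 = 1000, 1000
    ok = True
    for step in schedule:
        if step == "transfer":  # credit and debit happen together
            account1 += 500
            account2 -= 500
        else:                   # the auditor's query
            ok = ok and (account1 + account2 == 2000)
    return ok

violations = [s for s in permutations(["transfer", "audit"]) if not audit_ok(s)]
# violations == [] - every interleaving satisfies the auditor
```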
This was a fairly high level overview on how to write TLA specs. This is much better explained on the Learn TLA site: please read more there!
For the sake of time, I will direct you to some great resources:
I’d love to run through a real world example of using TLA+ to specify a distributed system, loosely based on a concurrency bug we saw recently at Lyft. Stay tuned!
When you think of a software engineering career path, you may default to the idea that you can climb the career ladder at various product companies and corporations, working directly with stakeholders and leadership to ship products to customers. The types of companies you might consider are early/mid/late-stage startups, established enterprises or Big Tech companies.
What you may not have considered, however, is how a stint in consulting can accelerate your learning curve and teach you lessons that can multiply your effectiveness across any engineering organization you join later in your career.
Now you may have a stereotype of a consultant - maybe of a management consultant that flies out to clients five days a week, works 80-hour weeks, lives out of a suitcase and makes PowerPoint presentations all day. If life as a suit doesn’t seem appetizing, that’s okay. That’s not the consulting I’m talking about!
I spent four years working as a developer at an XP software consultancy and… really loved my time there.
Software consulting is defined by working with a client on a short-to-medium-lived project that has a digital deliverable - a software platform, an updated platform capability, or an MVP to show to first customers.
Consulting may also consist of process deliverables - I’ve been on many a project where the entrenched old guard in some industry realizes that the new upstarts are eating their lunch with software, and that they’d better get moving on innovation. That means teaching folks Agile process, or product management.
Now I can’t claim to know everything about consulting, as my experience is limited to one consultancy in my career. However, I can say that it was a huge springboard for my career because it increased my exposure to people and organizations. The following are some of my learnings and takeaways from my time:
You can always make a tech problem work. It’s the people problems that are the hardest: from stubborn, resistant developers who need to be wooed to your side, to surprise stakeholders appearing right before ship date. The keys to project success are almost never at the execution layer.
As a consultant, you will learn to very quickly read the room and understand where the power structures are. There’s the account manager, who’s stuck their neck out to really get your team in the door. There’s the VP eng, who is somewhat skeptical of your team, but who needs to be shown results. This is no different from inside the walls of a product company, where teams need to know where and through whom the power flows - and properly seek to manage that relationship.
As a consultant, you are an outsider, often met with skepticism if not outright hostility. Finding ways to build trust and rapport with your client partners (read: grumpy engineers or skeptical directors) is super important. To that end, it was important to show my face in the office(s) as much as possible and be seen. Making small talk was key, as was grabbing lunch with the team.
As a consultant, you are always grabbing lunch with people. At a product company, you too will learn that it’s important to build bridges and relationships with the stakeholders and collaborators on your team and outside.
Many of us are hired for our domain expertise or Thought Leadership(tm), which would seem to naturally imply that consultants have a lot of power or sway in what can or should be done. After all, they are expensive!
But wait! That also means consultants are often seen as a threat. After all, who has to maintain the codebase after these consultants build their thing into it? You can never roll in and assume that you have the permission of the entire team to build a new system/introduce a new process/launch a new product the way you think should be done.
Even though we held strongly to our product development principles, we would sometimes bend to the customers’ whims because we understood that one size doesn’t fit all. So if the client balks at writing tests a certain way, or if they really don’t want to name the class that name, or if they have really weird preferences around line breaks and indentation - we let it go.
In consulting, the pace of learning is exhilarating. One month you’re working on a kubernetes migration for a Fortune 500 company, the next month you’re dipping your toes in the latest React library, and the next you’re building an iOS app for a stealth startup (oh, and some coworker keeps talking about OCaml or Haskell or something). It’s easy to get caught up in the temptation to choose the latest shiny for everything you build.
My advice - Choose Boring Technology. More specifically, use the “innovation tokens” mentioned in Dan McKinley’s article and choose one (or maybe two) fun, new things to use. Don’t dump the innovation tax on your clients and customers. This discipline is hard to keep. It will serve you at a product company too, as you learn to weigh the tradeoffs of The New and Shiny against the stability of Boring Tech - and since you’re also responsible there for the long-term maintenance of your systems, the lesson may emerge no matter what.
Billing hourly had the result of forcing me to think about where and what I allocated my hours to, every day. My consultancy had a rule to never bill more than 8 hours / day, and thankfully it was modeled from the top that we really would sign off at the end of the day1. You’re forced to really work on the most important things - pushing your product counterpart to ruthlessly prioritize only the most important things. Then you sign off and you just don’t work at night. No emails.
Quite honestly, this is something I find really hard to do now that I’m at a product company. It’s not easy to put down the computer and not answer emails2.
Gosh, I loved pairing. I know that I’m a weird outlier. But I picked up so many technologies, architecture pointers, vim shortcuts, and other random things from all the pairs I had over the years. I was fortunate to really enjoy my coworkers, and by far, this was my biggest growth multiplier in my technical skills.
Now I’m back in Big Company life, I’m known as the engineer who keeps scheduling time to pair with teammates or folks on different teams. It’s the best way to learn a new domain or system, and to also build trust with the person you’re working with.
As a developer, I naturally shy away from the sales and business development process. As a principal engineer in the consultancy, I was often tasked to go on sales meetings with prospective clients to be the face of engineering and also to vet client systems. I gained newfound appreciation for our partners and business development staff and learned ways to properly put value on the process and art of software development. And even though lots of these BD trips ended up without an agreement, it gave me opportunities to meet staff at other companies and get a little window into how they worked.
So that’s my little spiel about how helpful my consulting background has been now that I’m at a larger company.
By the way - Gerald Weinberg said all this stuff better in Secrets of Consulting. It’s a great read - no matter if you’re in consulting or not!
This must have been more of a high-end software consultancy thing, as we were more of a boutique firm with name recognition. Projects were structured in terms of time & materials, so we never had pressure to work nights and weekends to ship X by Y date (opting instead to cut scope). This let us rest easy at night. ↩
Then there’s the matter of being on call, which is pretty rare in consulting. Geez… I miss that. ↩
I grew up on the stereotypical overachiever fast track. My dad was a Silicon Valley hardware engineer who got me into coding when I was in fifth grade, and my passions kept me scripting, coding and building web sites in middle school and high school. When I graduated from Berkeley with an EECS degree, it was pretty clear I was ready to dive headfirst into the industry.
I joined a fairly large company, filled with smart and friendly people. It was a pretty stable, comfortable place. My coworkers organized lots of social events and there was an obvious deep camaraderie between all.
I was the new grad hire on an older team1. I had a great manager and teammates I could learn a ton from. Totally an ideal place to launch a career.
Except… a year and a half in, I quit. I decided I’d leave the industry for a bit. I was happy at work, but I wasn’t OK.
You see, I had just gone through my first big breakup, one that reverberated deeper than I realized. When that relationship ended, I went through a period of deep soul-searching and realized that I needed some time away.
Around that time, some friends let me know that they were entering a yearlong internship at our Oakland church community. I decided that I’d join them that year. And so it went - I moved out of my comfy Emeryville apartment and into the cramped quarters of our East Oakland community center. It was going to be the start of one of the most transformative experiences of my life.
Instead of daily standups behind big glass vistas of the San Francisco skyline, I woke up to daily meditation and time spent in the urban community garden. Where I took for granted the amenities and services of our big glass skyscraper, I was now the one vacuuming, scrubbing and cleaning the facilities2. Instead of spending most of my day with high-earning tech workers, many days were spent chatting (and sometimes squabbling) with our unhoused friends who lived on the church steps.
I know it’s cliché, but having time to step out of the career hustle was so good for me. It was good for the young man that I was, who needed time to focus on himself and rebuild a grounded identity. It was good for me to spend among friends and trusted community. It was good for me to spend a season focusing my energies outward. It was good for my balance and sense of what was normal to see how folks way, way outside the tech bubble lived, especially in East Oakland as we served in the soup kitchen.
I think that if I had not taken that year off, I would have continued in the hustle - lost deep in the bubble that so many of us in tech ensconce ourselves with.
I fully understand that my time spent in East Oakland that year cannot fully be separated from conversations about gentrification and privilege. After all, I had the financial means to take a year off without worrying about debt. And a year later, I re-joined the industry, easily switching back into my privileged life in tech. To that end, the learning continues.
And yet, that year fundamentally transformed me - it gave me a perspective on life outside of the tech bubble. It gave me friendships that have lasted to this day and sweet memories (and uproarious stories) that will last a lifetime.
At 23 years old, I made a good decision to take a year off. I’d say it was worth it.
Let’s get to the point - I’ve never passed the interview loop at a FAANG1 company, and I’ve tried at least 6 times2 throughout my career - Rejection City!
Now the typical loop at these companies will prioritize mastery over algorithms and data structures. This advantages folks at computer science research institutions, or people who have hours every day to grind leetcode3. Because there are so many applicants passing through the pipeline, there’s a pretty low margin of error for any of these interviews.
These big tech companies are inundated with candidates, and it pays them to be aggressive in how they filter candidates out. This means it’s acceptable to have a very high rate of false negatives, or regrettable rejections (in statistical terms, they optimize for high precision and low recall). It also means that there is an entire cottage industry of cram schools and course materials that would rival anything in the SAT prep world.
It’s perfectly normal to fail out of one of these loops because of nerves, blanking out, or simply bad luck with the type of problem you were given. At this point, I’ve received 6 FAANG rejections so far (and counting). Does it sting? No doubt, especially as I consider myself a fairly competent engineer. (I did have some memorable experiences4 though, and the interviewers I’ve met have all been kind and fair.)
Does my record indicate I’m any less of an engineer? Nope - I know what I’m worth and what I’m capable of. And you know what? I’m OK with it!
That’s what I usually ask students or junior engineers at this point - what’s your superpower? Is it your keen collaborative spirit? Your thorough PR reviews, and responsible custodianship of the health of your systems? Your ability to write a thorough doc or tech spec? Your deep knowledge of important domains of web performance or observability or some deep understanding of the business?
These might not have a chance to shine in your next FAANG interview.
I know. It sucks. It’s their loss they didn’t design their interview loops to let you shine. And if they don’t, take your awesome self and go apply at a different company. The sad thing is that this can be so much better across the entire industry!
I’m a big believer in making interviews work like actual day-to-day coding practices, over demonstrating algorithmic prowess. I’ve been part of some really well designed loops that:
It’s the reason at Lyft we’ve designed our Apprenticeship interview loop to be specifically more collaborative. And each day I hear more and more about people working on well-designed interview loops.
Let’s end with reasons you might have a better career path elsewhere:
It’s a hot market, and people are fighting for talent right and left. Don’t limit yourself to just FAANG (or adjacent). Do your research and find the right company that fits your strengths and skills. Now go forth and interview - good luck and go get ‘em.
And finally - a conversation I came across recently on Twitter (the author is a prolific author in the Ruby community):
🧵This should stop. Such practices are not inclusive unfortunately. Believe it or not but for some folks any form of a "test/quiz" can be triggering. My brain shuts down when somebody tells me to do a test in front of them. This is what trauma does to people. https://t.co/0Q3Aqz3nbh
— solnic (@solnic29a) February 5, 2022
What if there was a radically different way to interview people - in a way that doesn’t trigger test or performance anxiety?
FAANG: Facebook (Meta), Amazon, Apple, Netflix, and Google, but not exclusively limited to this club. Mainly the halo circle of prestigious companies offering top tier comp and staggering stock returns. With Meta’s recent rebranding, now alternatively called “MANGA”. ↩
I’ve applied to at least one FAANG company each time I’m in a job-hunting phase, and never passed hiring committee review – with the exception of a brief internship at Apple, which I’ll confess I didn’t do great at either. But that’s a different blog post altogether. ↩
“Grinding leetcode” - to hit the books like you’re studying for finals. Leetcode users have levels and rankings, and people brag about how quickly they can solve hard problems (and how many hours they spend on the platform). However, consider the time investment required to better oneself on the platform, and the types of people this would exclude. ↩
Once, when interviewing for an internship at Facebook I did manage to learn that Joe Hewitt (of Firebug fame) was working next door and I managed to shuffle out of my room to shake his hand… nice guy. ↩
2014 was my first year as a manager, and it had been brutal. In addition to feeling the (normal) overwhelm of transitioning from technical IC to manager, I was also struggling with performance managing one of my direct reports, who was pushing for a title and compensation bump and running into process and procedural hurdles from the company. I felt trapped and caught in the middle and completely out of my depth, losing sleep night after night wondering how I was going to make this happen.
I worked with senior leadership and our head of HR to work through the logistics of this process (because reasons). This specific case was thorny because things weren’t straightforward on both sides. My report had gone about the process in a way that turned messy, but the company itself hadn’t formally defined a career ladder, so it was kind of on us.
I’d show up multiple days in a row to work with our HR director to push the process forward and keep her apprised of updates in the process. She’d give me input on how I was handling the process, and I’d run that back with engineering leadership to figure out a way forward.
The back and forth was exhausting.
Finally, we got it done. I delivered the news of the promotion and title bump to my direct report, plus the constructive feedback I needed to deliver. I was drained. I walked back to our HR director and let her know the news, expecting a perfunctory acknowledgement.
She thanked me for the news, and on my way out, she stopped me.
“Andrew, you did a good job with this. It wasn’t easy.”
I thanked her for the compliment.
She looked me in the eye. “You’re going to be a CTO one day.”
I nearly laughed in the moment, but thanked her and walked out. What did she know about me? If management was this stressful, no way in hell I wanted to be a CTO.
I thought nothing more of it in the moment, just glad to be done. But in the years to come, I’d go back to that moment in times when I doubted myself. The words “You’re going to be a CTO” weren’t meant to shoehorn me into a specific vision of the future, but to tell me, “I see you have the potential to rise to a level of leadership that you can’t yet see for yourself.”
Truth be told, I didn’t think I was really cut out for leadership. I didn’t think I knew how to handle management, nor handle messy situations well. There was much to critique about how I had handled things. But a few well-placed words at the right time from the right person changed my trajectory and fanned a little ember of self-confidence in years to come.
These days I try to do the same for my mentees and sponsees. I try to have radical candor when giving people feedback. And when I see glimpses of them rising to the occasion, I tell them, I believe in you. You may not know it now, but you will succeed.
What made 2021 so difficult? 2020 was tough enough, but I felt like I ran it all on adrenaline and we survived. 2021 felt like it opened with a glimmer of hope but then it quickly fell apart again.
In some sense, 2021 was a big success. I got promoted at work; I had multiple speaking engagements and was able to move the needle on several important initiatives. I received validation of my work and my leadership. We welcomed our second child at the end of the year.
But in another sense, 2021 was incredibly draining. It was the year the world collectively realized that the pandemic was here to stay, and the psychic toll that took on us was heavy. It required we suffer through the ever-blurring line between personal and work life. It was hellish to figure out how to raise a young kid in these times.
One more: my family learned the news that my mom has late-stage pancreatic cancer. Suddenly, what mattered most came into focus: family came first, and nothing was more important than being close to her for however long we had.
And about our second-born: our first experience with our firstborn broke us - much of it due to our physical distance from family. We knew that the second time around needed to be closer to family - both for their help, but also their encouragement and love.
The conclusion was simple: we immediately made a move from Oakland to San Diego to be near my mom and the rest of our families in Southern California. Fortunately, there was the silver lining in COVID remote work - that such a move was possible without having to risk my job.
However, the increased concerns in my personal life started to nudge into my work life. Working past 6PM wasn’t an option anymore - I was needed at home. Instead, I logged back on later in the evenings to make up the time. I started to decline opportunities I would have jumped at before - conference speaking, or networking events. Side projects and professional reading lay fallow.
At first, it felt really shitty, like I was limiting my career growth due to the pesky realities of personal life. But upon reflection, I realized I was given the gift of focus, and the opportunity to say no.
Much of my early career had been characterized by me saying yes to any opportunity that came my way. The opportunity to jump into engineering management, or the opportunity to lead a big project for a big client, or take a speaking gig or do a conference talk. These things were all well and good. But the cost of saying yes to everything is that you are not in control of your own time, energy and emotional state.
I recently read an article by Steve Magness titled “Own Your Distractions So They Don’t Own You”. In it, the author discusses how our lizard brains fall prey to modern life in the “candy shop”, full of digital distraction. If we live without intentionality, we fritter away our energy and our health, far from our rooted center in healthy relationships.
So back to the work aspect of things. I titled this “Overproduction” because, well, I’ve frankly worked a lot this past year. Much of it has been incredibly fruitful, impactful, and fulfilling. Some of it, if I’m honest, has not been the best use of my time. I have been wondering how things would have turned out if I had been a better delegator, or had used my “no” muscle more.
While I’m on paternity leave, my goal is to reorient myself both personally and professionally. I’ll speak to the latter here: I’m going to do a retrospective for myself on my work life. I need to figure out where I’m going, and what is worth my time, and what isn’t. I have less of it than ever these days, and the time I do have needs to be put to good use. We’ll see where that leads.
Until then, I have a few queued up posts that I’ve been working on that I’ll release weekly. Let me know what you think on Twitter at @andrewhao!
This post originally appeared as a guest article on LeadDev.com titled “How to Expand Your Scope as a Staff Engineer”.
You’ve been a solid senior engineering lead for the past several years at your current company. You’re well respected among your teammates and have a solid track record of shipping impactful products and features. However, you can’t help but shake the feeling that you’re stuck in your career growth and that your prospects are limited where you stand.
Your mandate as a staff engineer is to have a deep impact across multiple teams and the organization, but the road to get there is unclear. Maybe your position on your current team limits the types of projects you can execute, or your manager is too busy to help you grow. Perhaps you’re a new staff hire, and struggling to navigate the landscape of the organization and looking for the most impactful places to operate. Or you may be experiencing the opposite problem: underwater with a flood of small projects that aren’t really large enough to get you where you want to be.
These scenarios all share a common thread - the scope and influence you currently hold is not large enough to tackle the deep, cross-cutting projects you want to lead as your career advances. Let’s discuss how you can get there!
For some of us, the thought of growing our influence may conjure up bad experiences at dysfunctional organizations, where kingdom-building and power games were the norm. For others of us, growing influence feels like a zero-sum game: To grow my scope, I need to be taking it away from someone else. And for some of us, we experience icky feelings of dread that we’ll need to cozy up to people in authority. Building influence or scope feels intimidating, aggressive, or unnatural to our collaborative instincts.
On the other hand, we know that we can’t just sit on our hands. In an ideal world, we want to believe that if we just quietly do the work, we’ll naturally get noticed and people will give us credit. In this ideal world, people would step aside to create opportunities for us when we’re deemed ready. Unfortunately, good work is not always noticed, and your internal ambitions are not always recognized.
The good news is that there is a third way - a way where you can be responsible for your own growth and trajectory, without the power games. Growing your influence can be done by naturally leaning into collaboration – here’s how.
At this point in your career, it is a given that your technical skills are strong. They have served you well up to this point, but they will not (usually) be your primary means of growth in this phase of your career. Instead, it’s your relationships and connections that will serve as the catalyst for your growth.
As a Staff engineer, your responsibility is to understand what’s going on at all levels of the organization, linking leadership strategy to what’s happening on the ground. To that end, you’re going to need relationships and touch points that can give you insight into what other teams are doing. You’re going to need to meet people outside of your circle that can help you see the other parts of the organization that you’re not seeing - and fill out the context you’ve been missing.
Not only that, you’re building out relationships with other teams that you can informally lean on if you need a favor done - or provide help to someone else when they need something from you.
The word network is no doubt loaded with notions of clammy hands, awkward small talk and unwanted inbound LinkedIn messages. Once again - it doesn’t have to be this way! Instead, consider a few updated ideas for the modern, remote work world.
There will be people that are immediately obvious to connect with. For example, you may want to set up recurring 1:1 syncs with leads on adjacent teams in your immediate group. In these syncs, consider filling each other in on your team roadmaps and the common challenges you face. Some of these conversations may be fertile ground for identifying problems that can be solved.
Other people worth connecting with may be peers in adjacent organizations - for example, engineers on platform teams may want to reach out to leads on product teams that are customers. You may want to consider networking with peers who share the same function as you (iOS/Android, frontend, data science, etc). Ask them what challenges they face, and compare notes on any gaps or opportunities you might see to be filled in your respective roles.
Finally, you may want to schedule time with your skip-level manager or a member of the leadership team. Consider asking them questions about the state of the organization, what challenges they face, and what their top priorities are (see Will Larson’s excellent blog post “Staying aligned with authority”).
Many of us aren’t comfortable revealing our career ambitions to others. However, holding back on conversations with your manager or a more senior sponsor along the lines of “I want to grow my scope so I can get a title at the next level” or “I’d like to have a greater role on Project X” will limit their ability to help you. Leaders in organizational authority roles are in the room where decisions are made - and you want them to be aware of your goals so they can position you for that new project or initiative that could help your career break out.
“People who are in the more senior role that you want also have their own goals and career aspirations. While it can be intimidating to ask someone ‘How do I get your job?’, remember that they probably don’t want to hold onto that role forever,” advises Ashley Kasim, a Staff Engineer at Lyft. “Uplifting me is a part of their journey to get to where they want to go too. Now that I’m in my current role, I’m also trying to grow my replacement. It’s about mutual benefit.”
It’s a bit cliche to say, but Dale Carnegie’s advice from How to Win Friends and Influence People still stands today: the best way to build your influence is to freely offer your genuine self. Offer to do a favor for a team that’s feeling crunched. Share your time as a mentor or sponsor for someone who needs it. Celebrate and elevate the wins of teams around you. Make sure you’re really listening to them as they share their wins and their struggles. Remember the details of what they share - from teammates’ names, to the particular challenges they face on specific projects. Do this freely, with no strings attached. Come crunch time, you may be surprised at how easily many will return the favor.
In today’s remote-work environment, it’s important to remember to make personal connections with people. It is too easy to start meetings by diving straight into business while forgetting to connect with the human behind the face on the screen. Personally, I treasure the chance to make small talk. I love learning about people’s vacation plans, or taking a few minutes to rabbit hole in a shared interest, or sharing a laugh over a funny story heard the other day. This breaks the monotony of back-to-back calls and also opens an opportunity for camaraderie, levity and connection. Ultimately, these little actions build trust - the raw currency you need for effective operation at your level.
As you build your network, you’ll also want to identify potential problem spaces that may be opportunities for your growth. Here are a few ideas.
In addition to your 1:1s, position yourself to receive information pushes from different parts of the organization. Join Slack chat rooms of other teams, where you can get a pulse for project status or updates. Add yourself to email distributions where you can receive project updates asynchronously. Attend an all-hands meeting for an entirely different group.
By being everywhere, you may be able to connect the dots on changing product strategy in another group that is upstream from yours, or jump on a new platform that has collaboration potential for you. You are now in a unique position to spot problems and opportunities across the company.
Your job is to synthesize that kind of information and use it to create new innovation opportunities for yourself and your team.
Opportunities can often be found in the seams where one team ends and another begins.
Another way to come up with impactful projects is to do an assessment of your personal strengths and follow them to see if there are opportunities within the company. Maybe you’re a gifted teacher and coach, or you’re a deeply technical data scientist in your team domain. Now take a completely different axis of the business and imagine what it would look like to offer your leadership there.
By now, you no doubt have a large list of potential opportunities or projects to tackle. But wait - you don’t get to tackle them all at once. You need to be strategic about what to advocate for and how to build the case to get the green light.
More likely than not, you’ll have more than enough opportunities to start up new projects, or contribute to high impact opportunities. However, it’s important to make sure those opportunities are the ones that are aligned with the goals of your organization.
Are you on a product team, but you see a glaring need for an infrastructure improvement? Instead of offering to build a grand, universal solution for the whole company, you may consider building out a local proof of concept for your team - then work with the platform team to integrate your work with theirs.
A good question to ask yourself is: if I take on this project, does my organization or group move faster? Some of the reasons for this are pragmatic to your career development - your peers and team leadership are the ones who will be validating your work, and arguably the ones who know you the best. Other reasons are pure Conway’s Law concerns - you will succeed most when you are working with the systems and the teams you’re most familiar with.
However, don’t be afraid to push the boundaries of your org scope. After all, it’s at the seams that opportunities can be found!
When advocating for expanding your scope, you want to create buy-in. I prefer a style of collaboration where we lead with the mindset of solving together. Imagine a situation where you are advocating for a solution that moves into another team’s domain: rather than presenting a finished plan, bring the problem to the other team early and shape the solution with them.
When proposing increased scope, be ready to receive a polite rejection - it’s completely normal, and it gives you the green light to move on to your next project idea. Be aware that if you move into someone else’s scope, you will almost certainly be creating more work for them - so make sure that the thing you offer is a clear win for the other side.
Finally, the tactical part of the picture. Building scope may involve running a few “plays”, or actions, that help you build the scope you are looking for.
In my experience, this is the most common sign of operating at a Staff-plus level - running a multi-team project that ships something complex and important for the larger group. You will want to work with your manager and your network to get on a project that has a surface area at the organization-wide level. These types of projects tend to have multiple phases, require buy-in from teams across the org, and have visible impact to OKRs for the group. As a lead, you will want to be at a level where you are delegating to the team and helping unblock or clarify project timelines, dependencies, and status outward to relevant stakeholders.
It’s important to not be coding in the critical path and end up neglecting your responsibilities around product leadership, technical guidance and helping the team make crucial architecture decisions. That’s why delegation will be your superpower here.
Your manager or a peer in an adjacent org may inform you of a project that is seeking additional help or leadership due to circumstances that are risking its delivery. Consider joining the project as a player-coach - as you help steady the delivery of the project, you’ll be building domain knowledge outside your immediate team and leveling up coworkers on other teams. This domain knowledge - and the relationships you build outside your world - will help you expand your reputation.
Have you built a solution that solves a problem for other teams, such as a machine learning model, a UI library, or an API? You may want to mature this offering into a general solution for other teams to digest and consume. Much has been written on this topic, but in general you will want to try to solicit concrete use cases from one or two early adopters, who you can use to gradually evolve your system from a bespoke solution to an extensible platform.
Is there a skill gap that you see across a functional role? Maybe there’s an opportunity to upskill engineers by leading workshops or starting a community of practice, whether that be around clean coding practices, testability, performance, a new tool, software language, or framework. The best thing is that you don’t have to be an expert to accomplish this - you can pitch this as a way of learning and upskilling together, and you can take the lead in assembling the team.
Perhaps there are ways to improve ways of working by tweaking an agile practice or process that has long stopped working for people. You may notice a gap in how you conduct hiring reviews, or see a need to implement an architecture review process. You may champion new programs, such as starting a hiring pipeline from nontraditional career backgrounds. As is the case with organizational change, make sure you have clear buy-in and support from leadership and other stakeholders before proceeding.
It’s tempting to start from the tactics and think that Staff+ career advancement just means executing and shipping bigger projects. But the reality is that the relational groundwork has to come first - it’s your network, alignment, and trust that make those projects possible.
Take some time to follow some of the prompts listed above and do your own introspective work. Is your manager or sponsor aware of your ambition? With whom do you have relationships within the organization - and where might you need to be? Where might you plug in to receive information flows? How can you be helpful to others around you?
Growing your network, influence and scope is like nurturing a garden - your progress is hidden for a long time while the roots form underground. However, there will come a day when you get to reap the fruits. Take your time, have patience, be kind, and stay strategic. You’ll soon be going places!
This post originally appeared as a guest article on LeadDev.com.
It hasn’t been a good Monday afternoon. A remote team checks something into the Authentication service codebase that breaks the User Profile service owned by your team. By the time the deploy is rolled back, customers haven’t been able to log in to your product for several hours and news outlets have picked up the story. The CEO is on the line, demanding answers. You send hasty apology emails to your customers, then sign off for the day, exhausted.
In the days to come, everybody has an opinion about what could have gone better. Some suggest the teams needed better external documentation. Others suggest that a signoff process should be enforced and project management should get more involved. Another director suggests starting yet another architecture review committee. Although all those sound like good ideas, something nags at you - what’s actually happening here?
When we think of a software system, we often conceptualize it in its dynamic, operationalized form in production. We ask questions like, how many transactions per second can it process under load? Is it meeting SLO targets? But we also need to remember that software systems take a form much like written communication between the developers working on the system. It is just as important for a software system to have several nines of uptime as it is for it to be readable and understandable to the engineer poring over the code, trying to make sense of its shape.
After all, we know that engineers spend more time reading code than writing it. If an engineer makes a change to a system with an incorrect mental model, then defects will emerge. And the team’s mental model is indelibly imprinted in the code, the tests, and the documentation.
Software’s purpose is not just to achieve business goals for the company, but to be easily changeable, elegantly designed, and robustly tested for all current and future members of the team. In other words - one of the primary purposes of software is to guide its readers into constructing a mental model of the world and how it works.
What if we tried to imagine our systems as if they were time capsules - artifacts left behind for future teams and collaborators to sift through and understand? I believe that software systems can be thought of as textual artifacts, messages in a bottle, if you will, to a future teammate or cross-org collaborator, meant to convey the shape and meaning of the system in this current point in time.
To do that, I want to take you on a detour through semiotics, a field of linguistics and communication theory that deconstructs meaning in everything from literature, TV ads, political messaging, and Internet memes. What could that have to do with software development?
Emerging from the work of Ferdinand de Saussure in the early 1900s, semiotics is the study of signs and symbols as they make meaning in cultural communications. Saussure was a linguist interested in how meaning was constructed through language. In Saussure’s model, meaning is constructed by a Signifier, a concrete “thing” in the world, and its corresponding Signified concept (a connoted meaning). Take the example of this image:
Photo by Carlos Quintero on Unsplash
The Signifier is the representation of this rose on your screen, pixel by pixel. By itself, it doesn’t communicate any meaning. Now you, the viewer, see this image and may think to yourself - ah, a rose! and automatically think about the flower (the Signified concept). How did you know that? You have familiarity with this type of flower in your lived experience, having seen roses in flower shops and in gardens around you.
But if you saw this image of a rose on a highway billboard for a jewelry store or in an online Valentine’s Day floral service ad, you may see different layers of meaning. You may understand this rose as signifying the concept of Romance, Love, or Passion, based on your cultural understanding of roses and the role they play in tropes across movies and TV.
But just one second - if you were from a non-Western background (or an intergalactic alien being), this image of a rose may mean nothing to you!
| Signifier (Concrete) | Context | Signified (Concept) |
| --- | --- | --- |
| An image of a rose | On a billboard for a jewelry ad | The concept of “Romance” |
| An image of a rose | In a gardening textbook | The concept of the flower known as the rose |
| An image of a rose | A non-Western context | ? |
Saussure developed the beginnings of a framework for how meaning is communicated and understood through concrete artifacts in the world. This framework of semiotic analysis allows us to deconstruct a message into constituent parts - the actual form of the idea in a concrete form, its metaphorical or connoted meaning, and the role of the receiver as the message is parsed in context.
So what does this all have to do with software development? Semiotics pinpoints the hidden role of cultural assumptions of different viewers in different contexts looking at the same things! Let’s get practical and try to apply some semiotic thinking to highlight how divergent understandings can arise from seemingly innocuous features of our systems.
The first and most obvious place to apply semiotic thinking is in the naming of the concepts made concrete in our software. Go through your system and make a list of the concepts encoded in class names, variable names, functions, and even database tables. You may be surprised at what you observe.
For example…
| Signifier (Concrete) | Context | Signified (Concept) |
| --- | --- | --- |
| The User in the database | The Identity team | A record that maintains core account authentication attributes |
| The User in the database | The Core Product team | A record that maps to the user’s unique presence in the world as they use the core product - for example, a place to store Profile information |
Aha! That’s a key insight - that the perspective of a teammate in the Identity organization leads them to understand the User model slightly differently from a teammate in the Core Product organization.
To tackle this challenge and make these cultural assumptions explicit, draw from disciplines like Domain-Driven Design, where the nuances of language in business contexts are made explicit in code - for example, by giving each team’s meaning of a shared term its own explicitly named model within its bounded context.
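As an illustrative sketch - the class names, fields, and context split below are invented for this example, not taken from any real system - here is one way the two teams’ divergent meanings of “User” could be made explicit as separate bounded-context models in Python:

```python
# Instead of one shared, ambiguous "User" class, each bounded context
# names its own meaning of the concept. (All names here are hypothetical.)
from dataclasses import dataclass


# Identity context: "User" means the credentials used to authenticate.
@dataclass
class AccountCredentials:
    account_id: str
    email: str
    password_hash: str


# Core Product context: "User" means the person's presence in the product.
@dataclass
class MemberProfile:
    account_id: str  # the shared identifier linking the two contexts
    display_name: str
    bio: str


def register(email: str, password_hash: str, display_name: str):
    """Create a record in each context; neither model leaks into the other."""
    creds = AccountCredentials(
        account_id="acct-1", email=email, password_hash=password_hash
    )
    profile = MemberProfile(
        account_id=creds.account_id, display_name=display_name, bio=""
    )
    return creds, profile
```

The point of the split is that the only agreed-upon contract between the two teams is the shared identifier; each team’s assumptions about what a “User” is stay inside its own named model.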
The structure, or form, of our software systems is another concrete feature that communicates meaning. Consider the choices embedded in the architecture, platform, and runtime features of our system.
Many of these choices are not obvious to our new teammates or collaborators, who may have their own cultural or organizational constraints. “Why is feature X built with Y?” they may ask. If you don’t answer these questions up front, they may project their own assumptions incorrectly on your systems.
For example:
| Signifier (Concrete) | Context | Signified (Concept) |
| --- | --- | --- |
| The (undocumented) User Profile API | The Identity team | A protected API endpoint that is meant only for use in manual data migrations. If misused, it will emit invalid events that could corrupt the event store. |
| The (undocumented) User Profile API | The Core Product team | An API endpoint that allows our service to store profile data. |
To solve for this, explicitly write all architectural and structural decisions down.
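One common lightweight way to do this - my suggestion here, not something prescribed by this article - is an Architecture Decision Record (ADR): a short, dated document kept next to the code. A typical sketch, using the hypothetical User Profile API example from above (the number, date, and details are invented):

```markdown
# ADR-014: User Profile API is restricted to manual data migrations

## Status
Accepted (2021-08-10)

## Context
The User Profile API emits events into the event store. Invalid events
written by callers outside the Identity team could corrupt downstream
projections.

## Decision
The endpoint stays protected and undocumented by design; only the
Identity team may call it, and only during manual data migrations.

## Consequences
Product teams needing to store profile data must use the public
profile-write path instead. Revisit this decision if the event store
gains input validation.
```

A future reader who stumbles on the undocumented endpoint now finds the cultural context recorded, rather than projecting their own assumptions onto it.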
Through our crash course in semiotics, we’ve learned to identify structures in our software systems that are open to (mis)interpretation. By thinking ahead about how different teams in different parts of the organization parse and interpret our systems and flows, we can anticipate the ways we might end up with divergent mental models - and develop habits to document the hidden cultural forces that shape our code, our architectural decisions, and our software use cases.
Your software system is a message in a bottle. When a stranger in the future picks it up on some faraway sandy shore, what stories will it tell?
A friend introduced me to a peer from another company last year who was having trouble adjusting to his new role as tech lead on his new team. We sat down at a local coffee shop to talk it through.
“My team just doesn’t know how to do Agile,” the engineer stated. “I propose all these process changes but people are skeptical.”
Having myself been hired by my employer as a team lead, I knew how frustrating that could feel. I pressed on with some more questions about why the process was off.
“They’re pre-assigning stories at the start of the sprint. They estimate with time instead of points. And they’re all working in individual silos,” this engineer sighed.
There was one last question I had to ask - “Is there a specific problem that’s happening as a result of this broken process? Is anything actually wrong?”
There was silence as my conversation partner thought further. “Not really,” he allowed. “They seem to be working out all right.”
“Does your manager think anything’s wrong?” I asked.
“Hm, not now that I think about it. He’s understanding of what I want to accomplish, but the team’s been together for a long time and this is just how they work. People are willing to make the changes, but they’re skeptical of what benefit it would bring,” he acknowledged.
Kent Beck, in an interview on the Software Engineering Daily podcast, describes coming to Facebook in 2011 and having his fundamental assumptions about software engineering challenged (for those who are unfamiliar, Kent Beck is one of the original authors of the Agile Manifesto and among the earliest advocates of XP and TDD):
Interviewer: When you joined Facebook, my understanding is that around that time, Facebook really didn’t have much testing. It’s ironic, because you were the creator of extreme programming. It was highly dependent on the process of writing unit tests and then writing the features. Facebook… was able to be successful, despite the fact that they wrote their features before they wrote their tests.
Kent Beck: The answer that I came to is that… one is, how many of your problems can you test for and how many problems only show up in production? … If you can’t write unit tests for it… The Facebook answer is don’t.
The second part of the answer is tests are a form of feedback and Facebook engineers had many, many other forms of feedback [describes rollout process].
Then the tests that I had written broke almost immediately. They were deleted. That was one of the things that surprised me. If you had a test and it failed, but the site was up, they just delete the test… If you eliminate this noise production, per definition the situation is clearer all of a sudden.
Mr. Beck realized that, despite having literally written the book espousing test-driven development as best practice, his perspective was limited at best, and he needed to recalibrate to the context of his new company.
Many of us carry our belief systems and playbooks with us as we progress in our careers. Who hasn’t watched incoming senior leadership, fresh off success at Company X, attempt to implement that same process at new Company Y, only to see it fizzle out?
Why would that happen? After all, weren’t they hired to replicate their success at their prior position in the new position?
When I joined my current employer two years ago, I had what I thought was a list of unbreakable “rules to work by” and firm conceptions of best practices. In discussions with my to-be manager in the hiring process, I was brought on with a mandate to uplevel the team. And coming from a background in consulting with Very Successful Outcomes™, I had a very specific set of practices and dogmas that should Always Be Followed.
In my first few weeks on the team, I could already see that we were violating every single one of my principles. Work was pre-assigned. Engineers took individual projects. Tests were sparse. The PM asked (in my opinion, pressured!) individual engineers for project dates on a regular basis. Points were individually given and tied to a time scale. Egads!
I could have gone to my manager and had the “Hey, I’m here to change everything up” talk. But I knew it was wiser to wait and observe some more. And sure enough, I was surprised.
I had assumed that it was always advantageous to treat the team as a similarly-skilled set of work executors to be able to build work in the style of XP or Kanban. I felt allergic to the idea that the team might keep individual ownership over a particular project or system. In my mind, that meant a low bus factor.
However, as I observed the team, I realized that ownership and doing the implementation work were not the same. On a team that is gelled and high-performing, an “owner” of a project may also invite other teammates to work with them on it. Ownership meant accountability and leadership, not sole execution.
In another example: in past roles, I had pushed back hard against any marketer or PM who asked me for date updates (“So when’s Project X going to launch?”) or required me, when writing a tech spec, to fill in expected launch time frames. Here, I swallowed my pride and, after much discussion, deliberation (and padding) with the team, came up with dates we could live with.
To my surprise, giving dates didn’t kill me. Dates, at this company, were used as sight lines rather than cudgels. If a project slipped, then it was communicated early and everybody adjusted! I had failed to take into account that my prior experience with deadline-driven development was an unhealthy one, and I needed to recalibrate my experiences with this new team.
There’s a reason so many management gurus espouse going on listening tours for new leaders before starting to execute. They need context, but they also need to see their new teams with fresh eyes.
I’ll leave with this excerpt I found fascinating from an interview with Nick Caldwell (VP Eng Reddit) who talks about making the shift from his time at Microsoft to a small team at Reddit:
Nick: Your natural assumption is to take whatever works in your previous roles and use it as a template… I view process more as a set of tools I carry around with me… you need to spend the first couple weeks just listening to the problems people on the team present to you. Then, you can dig around in your process tool bag, figure out the right tool for the job, and adapt that tool to fit the situation.
What tools, processes, or sacred cows do you bring along with you, and is it time to re-examine those?
In Part 2, we built a bottom-up, Idea Backlog-driven generation engine. But while we now have swarms of ICs working on their own projects, the team still lacks organizing principles and a way to keep moving forward toward the right goalposts. In other words, we still need to solve the same problems that a PM would normally solve:
We thought about what PMs do for teams and broke the role up into four jobs, or “hats”, which four volunteers on the team manage. It looks something like this:
In addition to taking on Project Roles, teammates also rotate through Team Roles that reflect one aspect of product ownership. Let’s go through them one by one:
In one sentence: The Messenger ensures that necessary stakeholders are informed of team progress.
It looks like: Each week, the Messenger posts a team update to a Slack channel describing the achievements of the team that past week. The update will include relevant links to backlog items, tech specs, and experiment results. The update highlights upcoming work and current project blockers.
Messengers are responsible for staying abreast of team members’ work status, aggregating and summarizing it for a larger audience.
Messengers may also represent the team’s work at cross-team meetings, such as product reviews and higher-level syncs.
If the Messenger were not doing their job: Stakeholders would be unaware of the work (and the wins) that the team is accomplishing. The potential for organizational confusion and duplicate work would rise.
In one sentence: The Scrum Master facilitates our planning-oriented Agile ceremonies:
The Scrum Master doesn’t have any explicit decision-making powers, but ensures that these meetings run smoothly. Their laser focus is to help the team develop a beautiful set of backlog items.
It looks like: The Scrum Master works with each teammate to write well-defined user stories. This means that this individual must be well-read on Agile story-writing and understand how to facilitate sprint planning and estimation activities.
(Note - Agile purists will point out that this is not technically the definition of a Scrum Master, but we liked the name and it stuck).
If the Scrum Master were not doing their job: the work necessary to develop a clean work backlog would fall through the cracks, leading to disorganization, uneven story definitions and unclear team metrics.
In one sentence: The Architect makes prioritization decisions for the team.
It looks like: The Architect thinks hard about the overall team roadmap and which initiatives need prioritization (and, conversely, deprioritization), and then makes the calls for the team. The Architect is likely the team member who already operates at a strategic level - most likely a more senior member of the team or someone in management.
This team member, throughout the week, makes prioritization decisions in the backlog, bringing up projects and stories that have immediate impact or urgency.
The Architect is responsible for maintaining relationships with other Product Managers in the organization to stay abreast of strategy and product discussions. This means the Architect must be sure to represent the team in formal product reviews.
On my team, my manager plays the Architect, and it’s clear that their natural relationships with product and engineering leadership make them the most qualified for the role. That doesn’t mean the Architect must be someone in an official management/leadership role, but it certainly makes the job easier.
If the Architect slacked on the job: the team would have no runway for new projects, or would spin their wheels trying to identify the most important work to do. The team might be unaware of broader strategy discussions, or chase after non-impactful projects.
In one sentence: The Dreamer’s job is to facilitate the Idea Generation engine.
It looks like: Our team maintains a separate list of new ideas that we vet together weekly, and the Dreamer’s job is to make sure that list of ideas is prioritized accordingly and well-defined.
In other words, the Dreamer’s job is to make sure the Idea Engine is running smoothly. They are the Scrum Master (the process facilitator) for the intake engine.
The Dreamer constantly reminds the team to input new ideas into the engine, and also facilitates group brainstorms and other idea-generation sessions.
One thing that I’ve noticed is that “idea generation” is not a natural thing unless you are deeply tuned into customer problems. The Dreamers who are most effective are those who can keep the team in tune with real customer pain points.
This might mean collaborating with a UX researcher to summarize findings for the team, or even doing a deep-dive onsite interview with a new customer. It might mean shadowing a support agent for a day to understand issues that customers face, so they can return to the team and broadcast some real challenges that need addressing.
If the Dreamer did not do their job, then the team would run out of impactful projects to implement, or would work on shallow work that doesn’t truly solve a customer problem.
We’ve been running this process for nine months now, and a few findings have emerged:
It’s very possible to operate without a PM! Months of product-owner operation have demonstrated to us that it is possible to run a bottom-up engineering team process. Our overall team output has not wavered much, despite the additional overhead of managing our PM responsibilities.
The team feels empowered to make an impact - the culture of the team is encouraging and positive. Ideas are welcomed and celebrated, and people report feeling empowered to make ideas a reality, in a culture that embraces new, risky, or unproven ideas.
Lacking a seat at the table: Even though our team can operate independently, we often lose out on having a seat at the table when the rest of the PMs gather to formally discuss strategy and have syncs. We sometimes miss the early conversations, leaving us a step behind emerging product strategy. This means we must work hard to develop relationships within the PM org.
Idea Generation is unnatural: We’re still building the Idea Generation muscle into our team memory. For years, most of us have worked by having work handed to us by a PM. When given the opportunity to experiment and try out new things, we need to be in the right mindset to try that out. As a team, we are learning to understand customer pain points better so we can be nudged to experiment and try out new solutions.
Wearing a PM Hat is uncelebrated work: The additional workload of product management is a significant time commitment for our team. Depending on the role, this may take anywhere from 10% to 40% of their time. Many of our engineers (including myself) who have internalized an idea of “real work” to mean “execution” need to recalibrate our expectation of productivity to include the full lifecycle of events - from ideation to execution. As a team, we’re learning to celebrate the journey of building our PM muscles.
Can you really take a PM and split them up into four roles that cleanly - and divide the roles among a distributed team? Not easily. But it can be done. And when it’s done well, it leads to higher engagement and impact on the team.
In our last installment, we saw how the Idea Backlog was a great tool to generate new ideas. Now that we’ve got great, well-defined ideas with clear measurement metrics and juicy opportunity sizes, how do we execute on this work with the team?
The project driver is the individual responsible for the outcome of an experiment or project.
The driver may do things like:
The driver does not:
We conceptualize the lifecycle of a Project from a Project Driver’s perspective into Plan, Build and Measure phases:
In the Plan phase, requirements are still being gathered and relationships are being built. This is the “soft” part of the project - we are trying to convince folks to help us or commit to collaborating with us.
A tech spec may be written and circulated among relevant teammates. Clear experiment hypotheses or measurements for success are defined and documented.
In the Build phase, we actually build the systems and write the code that support our feature. There is a project management angle to this as well - requirements are written as user stories and logged into JIRA or the team’s system of choice.
The team (or teams) build against this backlog.
In the Measure phase, we launch the product and we let it bake, gathering metrics along the way. The product should ideally be launched in a manner that facilitates A/B testing so we can accurately measure the effect of the change.
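One common way to launch in an A/B-friendly manner is deterministic hash-based bucketing, so each user sees a stable variant without having to store assignments anywhere. A minimal sketch (the function and experiment names here are hypothetical illustrations, not any team's actual experimentation infrastructure):

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically assign a user to an experiment variant.

    Hashing (experiment, user_id) keeps a user's assignment stable
    across sessions and independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

Because assignment is a pure function of the user and experiment IDs, the analysis job can later recompute each user’s bucket when measuring the effect of the change.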
Once we release our change and enough time has passed, we report our results to relevant stakeholders and mark our hypotheses as Validated (or not). We then turn around, use these learnings for a second iteration of our hypotheses, and return to the Build phase to sharpen our focus.
Note that not all experiments or product launches are destined to live another day. Some should be reverted or axed. Others will live on as future product ideas or learnings to apply to a different domain.
(Astute observers may see that this process is a rebranded version of the Lean Startup Build-Measure-Learn loop.)
In skills-diverse teams, there will be a natural gap between engineers with different amounts of industry experience. We believe that all engineers, even new grads, can be trained to drive projects of increasing scope.
Let’s take the New Grad as an example:
Although this is his first job out of college, our new grad is given a simple project to drive - an A/B testing experiment that adds a new module to a landing page.
Even though this project’s scope and complexity is small, our new grad will be able to pick up some incredible experience from it:
The New Grad will succeed as a Project Driver if:
On the other end of the spectrum, let’s take our Experienced Engineer as another example:
Even though our engineer is highly skilled, her growth accelerates when she is challenged to ideate at a size and scale larger than she’s taken on before - owning big ideas with big outcomes. Here, her project is the development of a new machine learning platform that personalizes the search experience for the product.
The Experienced Engineer will succeed as a Project Driver if:
While everyone has the opportunity to be a project driver, not every team member should be individually driving a project all the time. Projects should be sequenced so that teams do not have more than a handful of active projects at a time (limited WIP).
We encourage engineers to lead for a season; in other seasons they can be the supporting cast for another engineer’s project. This lets all of our team members take a turn in the driver’s seat. Pun intended.
In our next installment, Part 4, we’ll look at how it all comes together by rotating our team through the four “Hats” of Product Management. Tune in!
In Part 1, I told a story about how the departure of our PM led my manager and me to split the PM roles between ourselves, to shield the team from the change. This spread the two of us way too thin, which made us less effective overall. That clearly wasn’t going to work, and burnout lay around the corner.
We decided to make a major change to the team structure with a process that would focus on individual empowerment. What if all our engineers were empowered - and required - to decide what to work on in a truly radical way? What if we asked them all to think like product owners, and take responsibility for team output?
Instead of being fed work from a product manager, our engineers would be responsible for thinking of new features and projects that drive growth and seeing them through to completion through the entire software development cycle. Terrifying? Absolutely.
In this new model, the ball starts with each individual contributor. Armed with a clear vision of the goal (OKR) and the tools to measure them (metrics and analytics), the team is given leeway and authority to work on any project that can drive impact.
We track this process with an Idea Backlog, a prioritized backlog of ideas that are in various stages of maturity. Ideas may range from:
We ask teammates to continuously brainstorm and generate new ideas to keep the creative cycle going. Each week, we check in with the team and see if people have any new ideas for projects and experiments that can move the needle.
People don’t just think of ideas unprompted! One way we help the team think up ideas is to get them consistently in front of customers. For example, by manning a chat widget on one of our landing pages, our team had to talk to several customers a day and learn about what they were having trouble with - those conversations led to product ideas that eventually became experiments!
In another experiment, I ended up piloting a customer-facing product aimed at restaurant owners and other SMBs. I had long phone conversations with restaurant owners and discovered new use cases and concerns that I had never known about. These became ideas as well.
Other ways we do this are by chatting with our UX researchers and reading customer interviews done by others. No matter whether your company is 5 people or 5,000, there are always ways to break through the inertia and get the team in front of customers to build empathy.
Our Idea Backlog is really just a Kanban board of big ideas; rough and unfiltered. But they need to eventually get vetted. We do this in a weekly check-in where we vet the fitness of each idea.
Each idea backlog item moves from “this is a cool idea” verbal discussion into written form as a 1-pager specification, written by the idea’s author. This specification needs to have:
Yes, it’s painstaking work that often lives outside of most engineers’ comfort zones. This means oftentimes setting up meetings with different teams, other PMs, and product leaders. This means writing specs (ugh). This means getting down and dirty with data analysis, writing queries and hunting down data tables that may be poorly documented or hard to understand.
In other words, this isn’t coding, but it’s a foundational part of the exploratory analysis needed to be a product owner. And we have our engineers all go build that muscle.
This is also time-consuming - idea generation should be factored into team sprints. It’s not uncommon to spend a whole day writing queries and digging through data to form a hypothesis, or finalizing the 1-pager spec to be circulated.
At the beginning of each week, we evaluate our roadmap and see if there is any need to “pull” a new idea from our idea backlog into the roadmap. At this point, the team can decide together whether an idea has enough legs, definition, and impact alignment so as to actually become a formal work project.
The Ideas that rise to the top are the ones that have at least one of these traits:
Thoughtfully designed Ideas will also be:
This usually means that the kinds of ideas that get wings are quick tests - for example, we want to test the listing on the App Store. Or we want to test a quick JSON-LD rich snippet change on a web page that might cause our Google search index rank to rise. Or we deploy an alpha prototype to a small group of trusted prerelease customers.
It doesn’t mean we avoid ambitious, multi-month releases, but those are vetted with more deliberation and care, and require much more unblocking and team alignment. My manager counterpart will often have to do roadmap planning and stakeholder communications with upstream/downstream teams to get buy-in for the very big projects.
And finally, the hallmark of a good idea is that there are clear metrics for success - or failure. Here we take a page from the Lean Startup playbook: we often use an Experiment Canvas to think through our experiment design and build metrics that are clearly quantifiable and time-bound.
Features are launched with metrics dashboards built as part of the feature work. Clear metrics are essential to our ideas because they force us to see the outcome of our work through actual data and give us the courage to roll back changes that are ineffective.
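To make “quantifiable and time-bound” concrete, here is a minimal sketch of what such a success criterion might look like in code (the class and field names are hypothetical illustrations, not our actual Experiment Canvas tooling):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ExperimentMetric:
    """A success criterion that is quantifiable and time-bound."""
    name: str
    baseline: float         # the metric's current value
    target_lift_pct: float  # minimum relative improvement to declare success
    decide_by: date         # the date by which we commit to ship or roll back

    def is_success(self, observed: float, today: date) -> bool:
        # Past the decision window, default to rolling back rather than
        # letting an inconclusive experiment linger.
        if today > self.decide_by:
            return False
        return observed >= self.baseline * (1 + self.target_lift_pct / 100)

# Hypothetical example: "signup conversion must improve at least 5%
# by the decision date, or we roll the change back."
signup = ExperimentMetric("signup_conversion", baseline=0.042,
                          target_lift_pct=5.0, decide_by=date(2020, 6, 1))
```

The point of encoding the deadline alongside the target is that “no decision by the date” defaults to a rollback, which is exactly the courage-to-revert behavior described above.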
Interested in trying out this method? Of course there are limitations, and here are some of them:
The projects that make it out of this cycle get formalized in our roadmap and worked on in upcoming sprints. Better yet - all engineers are responsible for the generation of these ideas, so the team feels an increased ownership over their work.
In Part 3, we’ll discuss how we distribute this work among team members and get the day-to-day work of execution and strategy development rolling across the team!
A former PM colleague once told me, with some amount of jest, “I don’t know why product managers need to exist.”
While it shocked me at first (coming from a product manager, no less), she was emphasizing that the core value of product management is intangible - it’s strategic, it’s relational, and it’s hard to quantify and measure. It’s the stuff that fills the gaps and spaces between a company’s orgs, teams, and processes. A PM’s core function is to be the glue - maintaining alignment and developing strategy and execution plans across different areas of the business.
This post is certainly not about bashing PMs. Let’s get that out of the way - I love PMs, and a good one is worth their weight in gold. Their skill in focusing the team on building the right thing the right way at the right time can be a game-changer for many companies.
But sometimes that role isn’t needed. Companies in certain configurations often have plenty of runway to operate without a formal Product Manager. For example, plenty of founding engineering teams operate without a product owner. In those cases, their engineers develop the skills and product intuition to work like a PM.
Only once an organization scales do the focusing powers of a PM become truly necessary.
However, there are exceptions: engineering teams at all scales can and do operate without PMs. I’ve met world-class engineers at Pinterest and (once upon a time) at Stitch Fix on product teams (often growth) that were formed as 100% engineering teams and - surprise - do not operate with PMs.
You may not need one right now either. To explain what I mean by that, I need to tell a story about my team at Lyft.
When I first joined Lyft, my team operated the same way many standard software teams do. We had a dedicated product manager and a team of engineers, along with design and marketing as shared resources. Our product manager was responsible for the usual things a PM does - prioritization of new projects, creating product specs (including market research and opportunity analysis) and doing all the legwork to communicate between teams.
And it was glorious. All the hard work of communicating with stakeholders all along the org? Done. Prioritization decisions? Instantly made. Deep domain expertise about our product? Batteries included.
Then one day, it all stopped.
Our PM colleague came in one day and announced his departure. Just like that, we were thrown for a loop. What were we to do? A replacement was not coming - instead, the team was to own their output and impact for the business. Gulp.
We improvised. My manager (the team’s engineering manager) and I (the tech lead) decided we’d split PM responsibilities between the two of us. I took on the organizational part of the role and got busy organizing stories and the backlog. He got busy jumping into meetings with leadership and other stakeholders. We held it all together with duct tape and Slack chats and prayed the whole enterprise would last till the end of the year.
The end of the year arrived and quite honestly, we whiffed on our goals. Our team performance against our OKRs was underwhelming. What went wrong?
We looked back at our team output and concluded:
Surprise, surprise. Having lost a PM, we had one less teammate who could be wholly devoted to guiding the team to build the right solutions for the customer. As hard as we worked, we were still the bottlenecks, limited by inexperience and lacking time to deeply think about the product needs of the team.
We needed to empower the team - the entire team - to think like product owners.
In the next post, I’ll talk about how we made some changes to the team - and empowered every engineer to think like a product owner. Read Part 2!
I’ve written a bit on this blog about the highs and lows of my time in engineering management. I’m a team lead nowadays - less managing, more coding, but I still think long and hard about what leadership means and looks like in the tech industry.
Re-reading Andy Grove’s High Output Management recently made me realize that there could be a useful angle for tech leaders who aren’t in direct management but are in leadership roles anyway. Folks in our shoes tend to work closely with first- and second-line managers. Even though Grove’s advice targets that group, plenty of it still applies to the lead role, and understanding the challenges and mindset of your manager compatriots will greatly increase your effectiveness as a lead.
Andy Grove leads with an iconic example of running a breakfast diner, showing how you might measure the output of a business flipping eggs and waffles on the griddle. To him, software production is not so different from an assembly line, and he takes the reader through a tour of his fictional diner business. In the end, he concludes that managers need to develop metrics that let them monitor the output of their own “factories” through three types of indicators: Output Indicators, Quality Indicators, and In-Process Inspections.
What are the indicators your manager could be watching for?
The takeaway for technical leads? Collaborate closely with your management counterpart to develop and contextualize these indicators. Your management counterpart may not always have the day-to-day context of what is happening, so your expertise will set them up for an accurate evaluation of team output and performance. Help explain the progress and status of the team to give the manager confidence that the team is operating to its fullest potential.
This could look like:
Much is written in the book about how a manager’s role is defined by their indirect influence on the output of the organization(s) under them. Given the scarcity of time for most managers, Grove encourages managers to seek out high-leverage activities to maximize their influence.
Oh, and one more key insight. Grove cautions his readers from confusing activity with output. Merely filling your day with low-leverage meetings may not be the best use of your time.
Technical leads have much to learn from this insight as well. What are high-leverage activities for a technical lead? What are low-leverage activities?
| High Leverage | Low Leverage |
|---|---|
| Attending an architecture review board meeting | Working on a “juicy” feature in isolation |
| Leading a backlog grooming session to make sure the work is well-defined for the next iteration or sprint | Working on something just because the tech is cool but with low business value |
| Pairing with another developer to do knowledge transfer | Nitpicking over syntax on someone’s PR |
| Working on external documentation* | Oh, did I mention coding in isolation? |
| Performing an effective, empathetic code review | |
| Discussing tradeoffs and opportunities with other product leaders | |
* Varies by org size and team composition
See that? High leverage activities involve communication and coordination between teams. They often help communicate work streams, set expectations to other stakeholders, or do the plain, boring work of defining work clearly and explicitly.
Low leverage activities oftentimes happen to be the “fun stuff” - a juicy feature that happens to let you refactor your fancy system to use a new pub/sub framework, or try out a new frontend or mobile framework. The “low leverage” part comes when you, the lead, take the work yourself without bringing anyone along with you.
In other words, the leverage from a team lead comes from the strength of the connective tissue they are building within their team and between other teams.
While many of us might view meetings as a necessary evil to minimize, Grove takes a high view of 1:1 meetings. I believe that the effective lead should also do the same.
Since leads often have the most day-to-day context on the team’s work, you are well positioned in 1:1 meetings to place the day-to-day in the context of the strategic for your teammates.
What do I mean by “context of the day’s activities”? If you are doing your job of directing the day’s efforts and helping coordinate the team’s daily tasks, then you have direct visibility into the team’s execution flow. You know who is working on what, and which streams of work remain blocked or undone.
| What to talk about | Why |
|---|---|
| Opinions and feelings about project progress (“How is the project going?”) | You can get a read into your teammates’ heads and address morale or nagging questions |
| Career mentoring and coaching (“How can I help you grow?”) | Your unique view of the project as a peer lets you advise your teammate on how to adapt their skills to seek advancement. Focus on giving actionable feedback with an immediate application. |
| Real-time feedback (“How am I doing?”) | Since you are a peer doing the work alongside your coworker, you can give feedback in near-real-time. The gaffe in a meeting, the code review comment that fell flat, or the bungled feature deployment can all be addressed at your 1:1 without much time passing between incident and resolution. |
The lead doesn’t have direct managerial authority, and in many ways, that is your superpower. Your authority comes in a tactical form, and that oftentimes makes input easier to swallow (than, say, if it came from your boss). You may find that your teammates find it easier to open up to you when they find that you are a safe person to confide in (more on that in a future post).
This also makes you valuable to your managing counterpart, as they can work with you to come up with coaching strategies for new hires or underperforming teammates.
At Lyft, we have a strong culture of inter-team one-on-one meetings. I’ve used these meetings to talk teammates through interpersonal issues, provide a listening ear to discuss the purpose of different strategic initiatives, or simply talk through execution-oriented agenda items like code style and architecture patterns.
By the way, this doesn’t mean I believe managers can’t do these things too! I just mean that tech leads have a natural authority that comes from their tactical involvement in the day-to-day.
Grove encourages managers to convene Operational Review meetings, whose purpose is to review the progress of an initiative or project. They can be held with a cross-team audience or within the team. The purpose of these meetings is to help leadership assess the health of a project or initiative so they can make decisions to keep it on track.
As a tech lead, consider your role in presenting operational reviews to product or technical leadership. While your immediate management counterpart likely already has a solid grasp on the status of your project, you can maximize alignment by communicating status clearly to cross-functional stakeholders.
Imagine a tech lead who is asked to represent the team at a monthly status meeting that convenes leaders in mid- to senior-level management who are interested in the progress on the team’s highly-visible project. Our lead may want to represent the technical execution aspect of the project, ready to answer questions like:
I’m going a little off-script here since this isn’t exactly a Grove takeaway from his book, but from my experience it is just as important for the lead to demonstrate progress to their peers - other leads or managers in other parts of the org. This reduces duplicative work and allows “aha” realizations that one team can use the tech of another team’s work - in other words, it increases leverage.
These peer reviews can take several forms. You may want to convene a cross-functional group of peers (co-leaders at your level) to communicate the status and challenges of your product surface. This can be something as simple as explaining progress on the project, presenting some architectural diagrams and summarizing with some needs and open blockers. Your audience may take time to give their perspective, to offer solutions, or to help problem solve with you.
Alternatively, the tech lead can send out group updates to their peers in a shared Slack channel or email distribution. Peer cohorts can subscribe and glance through the updates periodically.
Example: At Lyft, a group of engineers in our Growth organization gathers in a Friday meeting on a monthly cadence to discuss their product areas. The group represents teams from across nearly the entire product surface. Conversations range from architecture reviews to product presentations with open Q&A and brainstorming. The objective of the meeting is to share information that would otherwise stay siloed, and to find leverage points between the groups where opportunities may have been missed.
Grove talks about the primary responsibility of managers: to make decisions. How should this work? Managers need to provide a venue for input and facilitate discussion; then, after all inputs have been received, the manager can make a decision.
We can apply this to technical leading in two ways. First, it is important for a lead to be a decision-making partner to her management counterpart. Let’s imagine a manager needs to decide whether to make a case to senior leadership to expand headcount on the team. The technical lead can make the tactical case for doing so by explaining how the team’s output is blocked due to a key skill set being missing with specific examples of PRs, code samples, and/or specific backlog items that went unfinished or were underinvested in.
Secondly, the tech lead herself may make decisions. These decisions tend to live at the boundaries of the tactical and the strategic, and may include:
Of course, these decisions do not occur in a vacuum. Like Grove suggests, our lead must be able to convene the group of experts (the team) and have the team give their feedback and opinion.
Tech leads reading Grove’s High Output Management get a cheat sheet into the manager’s mindset, helping their management counterparts get the clearest picture into the performance of the team.
Tech leads can also take on a manager’s mindset, using 1:1s with their teammates to develop each member. They maximize their leverage as technical leaders to build bridges between other groups within their organizations.
Andy Grove: High Output Management. https://www.amazon.com/High-Output-Management-Andrew-Grove/dp/0679762884
(Note: this post has been republished to Medium in the Towards Data Science publication.)
As I’ve been growing my data science and machine learning chops, I’ve thought long and hard about what it means to develop a user-centered model. It seems to me that a lion’s share of ML energies are spent cultivating the data set and performing tricks to get the model architecture right.
After all, ML model development is hard - in addition to developing a data pipeline to arrive at a large data set that’s properly formatted, preprocessed, scaled, and trained on, there are problems of balance, bias, out-of-training performance, and overall model performance.
Developing the right model architecture and data pipeline, though flashy and news-article-worthy, is only one part of the process. And yes, amassing a large data set is fundamentally important to building an effective ML model, but that alone misses some more fundamental challenges in developing ML products:
In my last article and in my PyGotham talk, I argued that the best way to ground the model in the problem domain is to radically keep the user front-and-center in the entire development lifecycle. As it turns out, the field of User-Centered Design has been advocating for this thinking for the past couple of decades.
Central to the discipline of user-centered research are Personas, which are fictitious representations of real users that inform our development work. Developed by programmer Alan Cooper in the 80’s, Personas:
When the whole team internalizes the personas, it helps focus their work. They can now refer to users by name, considering whether each product feature or machine learning capability they develop will impact their customer personally.
Personas, in short, animate our customer and put them first in every product- (and data-) decision that we make.
Imagine a scene playing out in the engineering team at AcmeWidgets.com, where our team is developing a recommendation system for the e-commerce retailer to surface relevant items on product pages. To do this, a crack team of ML engineers and data scientists are gathered to build the system.
The team begins by defining an objective function for their model - in this case, one that maximizes shopping cart value. For example, the model may discover that users have a propensity to splurge on a set of high-ticket-value items in our catalog.
However, things soon go awry. The team notices that their model ends up making low-quality recommendations that sacrifice long-term customer retention for short-term boosts in average order value. The team discovers that their model ends up pushing customers to purchase items that they don’t really want or need, decreasing the long-term loyalty of the customer to the product and the business. In fact, the products that the recommendation engine suggests end up being returned at a far higher rate, incurring operating expenses and costs for the business.
Well, surely that’s just a matter of tweaking the model! The team goes back to the drawing board and this time launches a model with a changed objective function: minimize the return rate of its products. The model is tested and voila, the system is happily chugging along again. Users seem to be happy with their purchases…
…until three months down the line, when they discover that users that interact with this recommendation engine are actually churning and leaving the product at far higher rates than those who do not see the recommendations! As it turns out, these recommendations have upset and angered users so much that they do not even bother returning.
And so the team goes back to the drawing board, dejected and feeling a sense of foreboding of what else they might find in the next iteration of their project.
If we went back and sat down with the team at their project retrospective, we might have heard reflections like:
Of course, many of these things can only be learned from experience! But could there have been a better way forward for the project?
The key issue is that the team was only thinking at the tactical level. They thought - “oh, a recommendation engine should be simple. We’ll just pull in Architecture X for our model, train against objective Y, and ship it when we can attain model performance Z”.
ML practitioners will often tell you that a great ML system blends domain expertise and expert intuition. What this means is that ML models must be designed by team members who have a deep understanding of the business domain, a deep familiarity with the data set, and a deep intuition of the customer’s needs.
How do you get domain expertise? Well, you have to have the right people embedded on the team, solving the right problems, considering the customer at all points in the model development process. How might having user personas have helped us avert this situation?
Imagine that the team, back in the beginning, agreed to consider their customers as living, breathing people in the real world. In fact, their UX colleague did a set of customer interviews that resulted in a set of composite sketches of their customers:
Luke, the office admin
| What | Description |
|---|---|
| Profile | Luke is a 31-year-old male living in Indianapolis who works as an office admin for a small fulfillment business |
| Motivations | Luke needs to keep widgets stocked in the office for the employees to use. Given that the office runs out of widgets regularly, he needs to restock every couple of weeks. He dreads having to log back in to the web site to place another order since he finds it tedious |
| Goals | Seamless, regular order re-fulfillment for the same SKU |
Harmony, the wedding planner
| What | Description |
|---|---|
| Profile | Harmony is a 41-year-old female who lives in Boston and runs an event-planning business that is just getting off the ground. She loves to supply Acme Widgets at her events because they are loved by the guests and provide her business the visibility she needs |
| Motivations | Harmony dreams of building an event-planning empire in her city |
| Goals | Unique, hard-to-find, desirable widgets to make her business stand out |
Tianqi, the Work-From-Home dad
| What | Description |
|---|---|
| Profile | Tianqi is a remote freelancer who lives in Shanghai and enjoys his work-from-home flexibility because it allows him to handle childcare for his family while his partner works full-time |
| Motivations | Tianqi wants an ordered, efficient home life on a shoestring budget |
| Goals | Buy what I want at a budget |
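For teams that want these sketches to be more than documentation, personas can even live in code. Here is a minimal, hypothetical sketch - the names come from the tables above, but the attributes and the cohort-check helper are invented purely for illustration:

```python
from dataclasses import dataclass

# Hypothetical encoding of the personas as first-class objects, so that
# experiment configs and offline tests can refer to them by name.
@dataclass(frozen=True)
class Persona:
    name: str
    goal: str
    price_sensitive: bool
    reorders_same_sku: bool

PERSONAS = [
    Persona("Luke", "seamless recurring re-fulfillment", False, True),
    Persona("Harmony", "unique, hard-to-find widgets", False, False),
    Persona("Tianqi", "buy what I want at a budget", True, False),
]

def cohorts_hurt_by(promo_heavy_ranking: bool) -> list[str]:
    # Toy check: a promotion-heavy ranking likely alienates
    # price-sensitive personas like Tianqi.
    return [p.name for p in PERSONAS if promo_heavy_ranking and p.price_sensitive]
```

Even a toy structure like this makes "which persona does this change hurt?" an answerable question in a code review.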
The reason we might want to consider personas is so we can firsthand understand our customers’ goals and motivations. Harmony wants to discover unique items that pop. Luke just wants to fulfill a recurring order for the office. Tianqi wants the cheapest items, period.
The team, when considering their recommendation algorithm, should be actively referring to Harmony, Luke and Tianqi by name as they develop their model –
TEAMMATE 1: If we choose to use the collaborative filtering model, we have to consider the fact that we have a very small sample size of users who match up to Luke’s use case (power fulfiller). I’m worried that Luke’s cohort is just going to see a lot of garbage recommendations before any meaningful signal emerges, and get turned off by our product.
TEAMMATE 2: Yeah, that’s true. But we know that Harmony’s cohort accounts for over 75% of our sales. There’s lots of opportunity here. What if we went with a bandit approach? That should hopefully minimize the number of bad recommendations that get out in front of our users.
TEAMMATE 1: Not a bad idea. We could learn pretty quickly given the scale that we operate at, and we’d minimize the amount of time we serve bad recommendations to Luke’s cohort.
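The bandit idea the teammates float can be sketched in a few lines. This is a minimal epsilon-greedy sketch, not a production recommender; the item names and reward numbers are invented:

```python
import random

# Epsilon-greedy bandit sketch: mostly exploit the best-known item,
# occasionally explore, so bad recommendations surface as rarely as possible.
def epsilon_greedy(counts, rewards, epsilon=0.1):
    arms = list(counts)
    if random.random() < epsilon:
        return random.choice(arms)  # explore: try a random item
    # exploit: recommend the item with the best average reward so far
    return max(arms, key=lambda a: rewards[a] / counts[a] if counts[a] else 0.0)

counts = {"widget_a": 10, "widget_b": 10}     # impressions served so far
rewards = {"widget_a": 7.0, "widget_b": 2.0}  # e.g. resulting cart value
pick = epsilon_greedy(counts, rewards, epsilon=0.0)  # pure exploitation
```

In practice the epsilon (or a decaying schedule) controls how much recommendation quality you trade for learning speed.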
Or consider another scenario in the development process, where the team has to grapple with negative interactions with the recommender in the wild:
TEAMMATE 1: We’ve received some customer feedback on Twitter that our products are too expensive. Apparently some customers feel like they were duped into purchasing items they didn’t need - the Tianqi cohort. How can we be sure that folks are truly receiving valuable and worthy recommendations?
TEAMMATE 2: Why don’t they feel like their items are valuable?
TEAMMATE 1: For some reason, our recommendation system is creating some sort of buyers’ regret. We might be over-hyping some of our promotions, or we may be pushing some items that have had quality issues.
TEAMMATE 2: Let me start a conversation with Juli, our UX designer, to see if there’s some user research that would confirm or refute this hypothesis. And if there’s some feature we need to re-incorporate into our model to make sure we’re making truly high-quality recommendations, let’s incorporate it. Else if our recommendations aren’t up to snuff, let’s test a new model variant where we don’t show any results at all.

TEAMMATE 1: Got it.
See? That’s the kind of holistic, customer-centric conversation we want to build into the team’s natural modus operandi. And all those things are enabled by customer personas.
Personas are powerful - but they’re not a silver bullet. What personas help us do is imagine our customers as real people, with real goals, motivations, and frustrations. Doing so gives us the vocabulary to discuss them as first-class citizens and to orient our technical solutions around them.
Oh - and by the way, the People & AI Research (PAIR) team at Google has done far more thinking about the process of designing human-centered AI products. Have a read through their best practices guide to learn from their deep experience building ML products.
What do you think? Do you have experience building ML models in customer-centric ways? Reach out and let me know on Twitter at @andrewhao.
Here’s the recording of the talk from PyGotham 2019.
Nighttime at our house is pretty hectic and crazy, and we have sleep battles with our toddler all the time. The question was, just how much were we locking horns? I decided to train a machine learning model to find out.
I discussed how I did this in Part I and Part II of my talk. In the slides, I go over the data pipeline I used to tag and label the data, which if you ask me is pretty interesting. One interesting tool (I won’t talk about it here) is EchoML, which allows you to tag and label different parts of audio files for utterances.
The interesting stuff is what happened after I was able to get the system set up and doing real-time audio detection. Can this dad find some sleep training insights, or will he be forever doomed to tears?
Here’s some real-time sleep data
Cry patterns, measured by minutes per day
I loaded up the data in a Jupyter Notebook (link) and tried to slice the data for some insights.
Note how beyond actual “loudness”, the only variables that correspond with cry patterns are month and day of week. We cannot truly depend on month, since I only had one year of data.
The only interesting finding was that there was a 1% correlation of crying with the day of the week. However, that’s small enough to be within the margin of error for analysis. It could be interesting when subjected to further analysis.
My attempt at a correlation heatmap. Nothing interesting here
Looking for correlations between temperature and humidity and crying - once again, nothing interesting.
At the end of my analysis, I realized that quantifying sleep progress was actually kind of depressing, especially when you see those numbers continue to jump up and down month after month.
So was this project useless? Was I back at square one?
One night Annie and I were staring at the baby monitor during another cryfest, up to our eyeballs in frustration. She looked over at me and said, “I’m going in.”
“What?” I replied, incredulously. “You know we can’t do that. We’ll just reward his crying and his sleep training will be ruined!”
But she went up anyways. It was silent for 20 minutes and all I could hear was the white noise machine.
She came back downstairs, sat down next to me and all she could say was “kids are weird”.
No kidding. “What happened?” I asked.
“I sat down on the floor next to his bed and we just looked at each other. Then I grabbed a blanket and lay down on the floor and pretty soon he lay down too and then he fell asleep.”
“He just wants to be with us, Andrew”
At that moment I had this hot flash of shame. Had I missed something as vitally important as the fact that my son needed us, because I had been so fixated on my way of doing sleep training that I had been holding back comfort and connection?
Of course, it didn’t always work. And yes, “going in” really does reinforce bad sleep habits. But sometimes, that’s just what you do when you need to give your kid (and everyone else in the house) a break.
I’m thankful and glad I embarked on this project. It’s taught me a lot about deep learning, and the process of understanding the internals of TensorFlow models. It’s also shown me how to optimize model training, as well as the importance of labeling data thoroughly and accurately.
However, these days I don’t watch the baby monitor nearly as closely as I used to. I don’t really care for the minute-by-minute information because I understand its limitations.
As I was preparing my talk, I realized that there was an important connection to real-world machine learning.
In the same way that I dismissed my son’s human need for connection for the sake of building my model off of my idea of inputs and data, do we, as ML and engineering practitioners, miss seeing the human needs of the people adjacent to our ML models?
How do we go from here…
There are humans whose inputs are valuable to our models. If we base our models only off of what we measure through quantifiable datasets and pipelines, we run the risk of building systems that fail to serve (or worse - harm) the people they are designed to help.
…to here?
How can we build a consciousness of our customers and the humans that work within? I have a silly idea, but I think it might be a good one: Personas. Read the next blog post, “Integrating Personas in User-Centered ML Model Development”, to learn more.
Five years ago, I first inadvertently found myself in engineering management. My manager sat me down in a room and told me he was leaving, and that he wanted me to take his place. Would I be up for the challenge?
I told him I’d think about it, then went home and started freaking out. He had been so critical to the company, and he was walking off with so much in-house experience, so many relationships, and so much domain knowledge. Was everything going to fall apart without him? Would I be able to fill his shoes? What if I couldn’t?
I came back the next morning and told him I was up for it. A few days later, I was on my own. There was going to be a steep learning curve and plenty of bumps on the road. Sure enough, I made more than a few mistakes.
The company at the time was in a pretty bad spot. Revenue was plateauing, and morale was kind of meh. The organization had gone through a high amount of turnover, and things weren’t looking like they were getting better.
At the time, the project management team had a reputation for being pretty heavy-handed with Project Managing™️ and had clashed with my boss. Hard. I knew where my boss was coming from and agreed with his approach, but his personal style had burned a few bridges.
One of the first things I did as a manager was to start the team doing retrospectives - but apart from the rest of the product delivery organization. I honestly felt that our working relationship was not in a productive place, and I feared that my team would not be able to honestly voice their opinions out in the open. So I told everybody: we would be doing our reflections and retrospectives on our own. Closed doors. It felt like the right thing to do at the time; amid the uncertainty of the transition, I just didn’t trust other people to be able to hear my team out.
The problem with huddling in a defensive posture was that I never learned to trust other people to be present in the room, to give them space to voice their opinions, and to grow as an improving organization. I had also projected my internal beliefs and (mis)understandings onto the rest of the product delivery organization, assuming the worst intentions from them rather than offering them the benefit of the doubt.
Overcome fear by posturing the team with a learning mindset and a framework for honest communication.
Always assume the best intentions from others, especially when relationships haven’t formed yet.
Hiring was (surprise, surprise), one of the most challenging aspects of management. Collectively, our team and I decided on some criteria we wanted for prospective candidates, which kept a pretty high bar. They had to excel at data structures, have a great nose at OO design, understand best practices of software development, and all that jazz.
Of course, these are important skills for a senior software engineer! But once in a while, we’d come up against a candidate that didn’t really have the skills to qualify for the role in our job description but had a fantastic personality or a curiosity to learn.
When trying to make the hiring decision, I oftentimes found myself evaluating the candidate with the mindset of “could this person kick butt tomorrow on my team?” In my mind, this would be the domain of the grizzled software veteran.
But what I wasn’t asking was: “given 6 months on our team, could this person really grow into (the engineer I’m imagining)?”
I did once hire a junior engineer fresh out of boot camp who later went on to become one of our most effective engineers on the team. It was a credit to her teachability, curiosity and collaboration skills that gave her a career trajectory far higher than I originally estimated for her.
Hire for the engineer you may have six months down the line, not necessarily the engineer you see now. Prioritize empathy, curiosity, and teachability - not just career experience.
I had an engineer on my team who was, from my perspective, underperforming by not staying on task for much of the workday. When asked to do work, he’d crank out some code (that was pretty decent), but it would lack tests or would be overlooking some key requirement in the story. He had a reputation that was unbeknownst to him, but obvious to everyone else that worked with him.
For some reason, I kept feeling like I needed to sugarcoat my feedback. In our one-on-ones, I would dance around the topic. “You do good work, but you missed some requirements here and there”. “Good job on delivering X. I hope we can give you a big project Y, but I think that PR stayed out a bit too long. Let’s keep our eye on the ball.”
But it wasn’t working. Fellow engineers would acknowledge his lagging contributions in vague, general terms, not wanting to come off as mean or underhanded. Product owners would turn to me after sprint planning and ask what they could do to help.
The truth of the matter was that the responsibility for his performance was mine, and I needed to give him the critical, crucial feedback that would open his eyes to the reality of his situation.
What I needed to tell him was, “Hey, you’re underperforming and people are noticing. I need you to rein in your web surfing and pitch in with some solid work. I know you’re capable of doing this - let’s focus on testability this week…”. Because if I didn’t give him this feedback now, he’d just end up hurting our team’s morale. I was doing him no favors by shielding him from hard feedback.
Direct, actionable feedback from a place of genuine care is a crucial component of great management (see Kim Scott’s excellent book, Radical Candor).
After about a year and a half in my position, I decided to move back to an individual contributor role - the next four years of my career were happily spent in software consulting at Carbon Five with the most fantastic people I’ve ever worked with. The lessons I’ve learned from my first stint in management were hard lessons to learn, but they’re mistakes I hope I’ll only have to make once.
In a future post I’m going to write about the great experiences I did get to have in software management, and why I think it’s something worth doing.
Several articles have been circulating recently in the Agile community critiquing the current state of the practice (see: “The State of Agile Software in 2018” - Fowler and “Developer Should Abandon Scrum” - Jeffries). At their core, they are really saying the same thing - Agile, as originally intended, was meant to be a living, breathing, freeing process for the team. Compare that to the rigorous imposed frameworks deployed across organizations these days.
Long ago[1], I worked at a company that deployed Scrum across the entire company. This wasn’t some mega-corporate behemoth; it was a mid-stage startup that had been around for a good while. But it was struggling. Teams across the org weren’t working the same way. It was hard to pin down milestones and delivery dates and to coordinate work. Some engineering teams worked in an ad-hoc, Kanban-ish way. Others planned in a more centralized, top-down fashion. Project managers wanted more predictability in our delivery platforms.
And lo, an Agile working group was formed, and they chose to roll out Scrum. And to be honest, the implementation of Scrum we chose wasn’t a bad one. A process was rolled out in which we all used JIRA, had regular planning meetings, standups, retrospectives and all that jazz. But something still wasn’t sitting well with me.
We assigned stories to engineers at our planning meetings. “Oh, Jay is good at the payments system, so he’ll take this story”. Jay mumbles something in agreement. “Esther knows infrastructure items better, so she’ll handle the deployment tasks on this task.” And Esther would do it. But pre-assigning stories would lead to knowledge silos, and creating these false boundaries of what was “acceptable” to work on.
Our project manager - bless her soul - would attempt to answer big questions through Scrum in a way that abused it. We were asked to estimate stories three sprints out so a delivery date could be given to upper management and other stakeholders. Of course, we would always miss these targets. (Once, we were asked to estimate the scope of an entire project by interring the entire team in a room and writing out six months’ worth of stories!)
It was clear that Scrum had brought some wins. Retrospectives really were introducing a feeling of continual improvement and shared accomplishment. Estimating stories was helpful in raising conversations to the fore that would help the team arrive at an accurate estimate. But there were still some pain points - Scrum was still leveraged by our project management structure to organize major efforts toward milestones and deadlines in ways that were pressure-filled and potentially harmful to the team. In short - Agile process was implemented in a top-down manner that emphasized control, predictability and rigidity. And it didn’t feel great[2].
Around the same time, I started attending an XP meetup (hosted by Pivotal Labs SF) and I met a ton of lovely folks who were happy to inform me that, yes, there was a better way to Agile. And in fact, the better way to Agile was to embrace uncertainty and chaos more, to trust teams to do their own thing, and to bake some solid technical practices in with the work, too.
I was sold, and I ascertained that my next gig was going to be at a company that practiced XP. And so when the opportunity arose to work at Carbon Five, my current employer, I jumped.
In my software consulting life at Carbon Five, I work on projects that ramp up quickly in domains fraught with uncertainty and organizations (oftentimes) wrestling with dysfunction. We practice what I call “lower-case xp” in nearly all of our projects. Meaning we don’t force our clients to do ALL the things in by-the-book XP, but we keep the important things. Those things are:
Frequent pair-programming: On stories and tasks that are fraught with unknowns, or for a team with knowledge silos, there’s really nothing better than pairing through and sharing knowledge. This has been the number one way that I know how to level up developers on a team. My teams don’t typically pair 100% of the time - the ratio has been closer to 40% to 60%.
Pull-based work streams: Since we pair program so much, the entire team tends to be very balanced in terms of their knowledge and familiarity of the system. This means nearly anyone on the team should be able to take on any task in the backlog - or find a pair partner who could. This eliminates the need to pre-assign work at the beginning of each iteration.
Cutting scope, pushing deadlines, but never overworking: The team never works more than 8-hour days. We put in focused 8 hours of work, then we’re off. If a project expands in scope due to unforeseen circumstances, technical debt, scope creep or whatnot, then our product owner has a choice to cut scope to meet the deadline, or move the deadline itself.
Short iterations: I have really enjoyed one-week iterations. Two is fine. Anything longer than that, and it feels like a slog.
Craftsmanship practices - batteries included: XP isn’t particularly prescriptive, but it does recommend development practices like CI/CD, and test-driven development. I find that folks who buy into XP also tend to be a certain type of programmer that values software design and testability[3].
Continual improvement: This is arguably the one thread that, no matter what Agile religion you follow, you should always prioritize. In XP, this would be our reflection (or retrospective), where the team has a space to reflect on what went well and what could be better, then make commitments to improve upon those things. There’s nothing more empowering than a team that continually improves, iteration after iteration.
There are lots of ways to Do Agile. My intention isn’t to bash Scrum, although I’ve seen it abused. The fact of the matter is that doing Scrum can lead to lots of wins, because the principles it sits on are still solid.
My preferred method of working is with a lightweight lowercase-xp style of work, that promotes collaboration, bakes in best development practices, and embraces uncertainty.
What about you?
[1] It wasn’t that long ago. [2] Some would say this was a mis-execution of Scrum. I don’t disagree - I actually like the core tenets of Scrum and believe it works well when it’s understood to not be a silver bullet. [3] This is purely my observation, and is not meant to be a generalization!
The last time we met, we had a lively discussion about the ins and outs and joys and terrors of parenting. I talked about how I started building a Raspberry Pi project with a USB mic and wrote a simple parsing script that measured the mean amplitude of recordings of the current state of the nursery. And when that guy wailed, he really WAILED.
Well, that naive approach only got us so far - the system would still trip up on random loud noises in the house. Music, doors closing or opening, or loud conversation would all cause the system to think the kid was crying, but no - it was just ambient noise.
Let’s start our dive into machine learning!
I had already gotten pretty far with Udacity’s Machine Learning course and had vague recollections of AI theory floating around from the cobwebs of undergrad CS courses past. So I had some background in AI and machine learning. But training neural networks was a completely new thing to me.
Luckily, I came across the Simple Audio Recognition Tutorial example right there on the TF homepage. This was exactly what I was looking for. The objective was: given a training set of audio clips of crying babies and “empty rooms”, classify an audio clip as one or the other.
I had oversimplified in my mind what made a great training data set. After all, I figured that I just had to record my kid crying a bunch, then get a few minutes of quiet room sounds, and then we were good, right?
Wrong.
Training data must be exactly matched. Sample rates must be consistent, and audio samples must either be highly randomized or at least normalized to the same rates. To wit, here’s how I gathered my sample data:
I normalized each clip with sox, trimming it to 5 seconds and resampling it to 22050 Hz. I also applied a few amplification filters to overcome the weak pickup on the mic:

$ sox FILENAME FILENAME_OUTPUT trim 0 5 vol 45 dB rate 22050

I then placed each of these samples in the folders corresponding to their label: `crying` and `silence`.
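To give a flavor of how that normalization step might be automated, here is a hypothetical batch version of the sox command above; the directory layout and function names are my own assumptions, not the original script:

```python
import subprocess
from pathlib import Path

# Build the same sox invocation as above: trim to 5 s, boost gain, resample.
def sox_command(src: str, dst: str) -> list:
    return ["sox", src, dst, "trim", "0", "5", "vol", "45", "dB", "rate", "22050"]

# Hypothetical batch driver: normalize every .wav in raw_dir into out_dir.
def normalize_all(raw_dir: str, out_dir: str) -> None:
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for wav in sorted(Path(raw_dir).glob("*.wav")):
        subprocess.run(sox_command(str(wav), str(Path(out_dir) / wav.name)),
                       check=True)
```

Keeping the normalization in one script makes it easy to guarantee every sample shares the same duration and sample rate, which is exactly what the training pipeline demands.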
I adapted the train.py script nearly verbatim from the TF docs. We’ll dissect it here, beginning with the command to begin training:

$ python app/train.py --data_url= --data_dir=./data --wanted_words=silence,crying --sample_rate=22050 --clip_duration_ms=5000 --how_many_training_steps=1000,200 --learning_rate=0.001,0.0001 --train_dir=./training
- `--wanted_words=silence,crying`: specifies which labels should be considered for training purposes.
- `--sample_rate`: the sample rate of the audio files provided.
- `--clip_duration_ms`: the duration of each training clip in milliseconds.
- `--how_many_training_steps`: a comma-separated list of numbers that specify the number of steps per phase.
- `--learning_rate`: the rate at which the system adjusts its current learnings to match new inputs. A higher learning rate means the system can change faster to learn new inputs; a lower rate ensures the stability of the system’s learning. We specify a higher rate for the first phase and lower it in the latter phase as our precision increases.

Let’s run the script!
$ python app/train.py --data_url= --data_dir=./data --wanted_words=silence,crying --sample_rate=22050 --clip_duration_ms=5000 --how_many_training_steps=1000,200 --train_dir=./training
2018-08-27 22:15:33.195894: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Tensor("Placeholder:0", shape=(), dtype=string)
INFO:tensorflow:Training from step: 1
INFO:tensorflow:Step #1: rate 0.001000, accuracy 26.0%, cross entropy 2.674081
INFO:tensorflow:Step #2: rate 0.001000, accuracy 23.0%, cross entropy 1.593786
INFO:tensorflow:Step #3: rate 0.001000, accuracy 64.0%, cross entropy 1.067298
INFO:tensorflow:Step #4: rate 0.001000, accuracy 73.0%, cross entropy 0.843605
...
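Those two comma-separated flags, `--how_many_training_steps` and `--learning_rate`, pair up positionally into training phases. A simplified sketch of that pairing (illustrative code, not `train.py`’s actual internals):

```python
def training_phases(how_many_training_steps, learning_rate):
    # Pair the comma-separated step counts with their learning rates,
    # yielding one (steps, rate) tuple per training phase.
    steps = [int(s) for s in how_many_training_steps.split(",")]
    rates = [float(r) for r in learning_rate.split(",")]
    if len(steps) != len(rates):
        raise ValueError("step list and learning-rate list must match in length")
    return list(zip(steps, rates))

# With the flags above: 1000 steps at 0.001, then 200 more at 0.0001.
phases = training_phases("1000,200", "0.001,0.0001")
```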
The output at each step reveals the current state of the neural network as it trains. Accuracy reflects the correctness of the model against the validation set: during each step of training, samples from the validation set are run through the model and checked against their specified labels.
Cross entropy is, as I understand it, a measure of how far the model’s predicted probabilities diverge from the actual labels (lower is better).
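For the curious, the cross entropy of a single sample boils down to the negative log of the probability the model assigned to the true label. A minimal sketch with illustrative names, not the TF implementation:

```python
import math

def cross_entropy(predicted_probs, true_label):
    # Negative log of the probability assigned to the correct label.
    # A confident, correct prediction scores near 0; worse guesses score higher.
    return -math.log(predicted_probs[true_label])

# A confident prediction on a crying clip scores low...
confident = cross_entropy({"silence": 0.05, "crying": 0.95}, "crying")
# ...while a coin-flip prediction scores higher.
unsure = cross_entropy({"silence": 0.5, "crying": 0.5}, "crying")
```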
Training proceeds for 1200 total steps; on a 2013-era MacBook Pro, this took approximately 6 hours. I had 350 clips of crying and 500 clips of silence. (Too much or not enough? This tired parent says “too much”.)
One more thing: every few hundred steps during training, we would get this sort of output:

```
INFO:tensorflow:Confusion Matrix:
[[ 9  0  0  0]
 [ 0  0  0  0]
 [ 0  0 55  0]
 [ 0  0  1 30]]
```
What’s a confusion matrix? According to this helpful article, it’s another way to visualize the accuracy of a machine learning model.
Here’s how to read this confusion matrix. Given the following labels:
```
# conv_labels.txt
_silence_
_unknown_
silence
crying
```
(Where did these come from? More on that later…)
Imagine these labels running left-to-right across the columns and top-to-bottom down the rows. The columns represent the predicted labels, so the first column represents the samples that were predicted to be `_silence_`.
The rows represent the actual labels. So if we take the top-left number, `9`, that means that in 9 runs of the model where the prediction was `_silence_`, the actual label was `_silence_`. Moving one cell down, that cell represents the number of samples where the predicted result was `_silence_` but the actual label was `_unknown_`. Fortunately for our model, there are `0` results in this cell. So on and so forth. The ideal confusion matrix has a “diagonal line” running from top-left to bottom-right, with `0`s everywhere else, because all predictions would equal actuals.
tl;dr: Confusion matrices are a way to visualize and report the accuracy of a machine learning model. You want a clear and convincing diagonal line in the matrix.
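To make that diagonal-line intuition concrete, here’s a small sketch that scores a confusion matrix by its diagonal (illustrative code, not part of the TF tooling):

```python
def confusion_accuracy(matrix):
    # Fraction of all samples on the diagonal, i.e. predictions
    # that matched their actual label.
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

# The matrix from the training log above: 94 of its 95 samples
# sit on the diagonal.
m = [[9, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 55, 0],
     [0, 0, 1, 30]]
```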
Oh, and here are the results from the training data. I’m using `tensorboard` to visualize the training steps:
Accuracy modeling. Note how quickly the model jumps to be fairly accurate.
Note how quickly cross entropy dives.
We could use these graphs to tune our models if we really cared. In this case, I say it’s good enough (accuracy is up to 99% by the end).
OK, but enough already. We have a trained model and, like Chekhov’s Gun, that means we’ve gotta use it!
Where’s that model? Oh, it needs a few more steps before it can emerge. At this point, TensorFlow has developed a neural network, but the neuron graph (is that the right term?) is not yet in a state usable by applications. To that end, we need to dump the model into a binary format that TensorFlow applications can consume in the future.
Once again, I claim no smarts in all this, but instead point to the TensorFlow script that does this, in `app/freeze.py`:

```shell
$ python app/freeze.py --start_checkpoint=./training/conv.ckpt-1200 --output_file=./graph.pb --clip_duration_ms=5000 --sample_rate=22050 --wanted_words=silence,crying --data_dir=./data
```
What did we specify here?
We pointed `--start_checkpoint` at the checkpoint from step 1200, saved the `--output_file` to `graph.pb`, and noted that each audio sample should be sampled at 22050 Hz and be 5 seconds long. We then specified that the labels we want to classify are `silence` and `crying`. Finally, the data set from the prior run can be found in the `./data` dir.
When we run this script, we get a `graph.pb` protobuf binary file that we can then ship to various TensorFlow programs.
Now here’s the fun part!
Onboard a Raspberry Pi, we are now going to classify live samples from the nursery:
Every 1 minute (with cron), we record audio samples from the system mic in the baby’s nursery. We then massage, crop and downsample it into a WAV file. We then point a script at this WAV file and run the TF graph on it. Running the graph will return a list of labels and their probabilities. We choose the first label with the highest probability, and ship it off to a timeseries API, in this case powered by Keen.io.
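The last step of that pipeline, choosing the winning label, boils down to something like this (a sketch with made-up names; the real script reads the graph’s output tensor):

```python
def top_label(predictions):
    # Given {label: probability} scores from the TF graph, pick the
    # highest-probability label to ship to the timeseries API.
    label, prob = max(predictions.items(), key=lambda kv: kv[1])
    return label, prob

# For a minute where the model is fairly sure the baby is crying:
winner = top_label({"crying": 0.87, "silence": 0.09, "_unknown_": 0.04})
```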
Voila:
A fun graph displaying the timeseries data for this little dude’s episodes. On the top was the original RMS volume graph, and the bottom is the result of the trained TF model. Note how much easier to read and understand the latter graph is.
Much of this code has been adapted from Google’s TensorFlow Audio Recognition tutorial.
My scripts have been collected on this GitHub repository: https://github.com/andrewhao/babblefish
And my repository with audio sampling and archiving: https://github.com/andrewhao/miserymeter
Wow, that was a quick dive through TensorFlow. Note that I didn’t get too deep into the theory of convolutional neural networks, which may be a topic of discussion for another time. Instead, we talked a little bit about the mechanics of building and training a TF model with audio data and a finite set of classes. It was fairly straightforward to then get this script loaded up on a Raspberry Pi and have a dashboard that could finally quantify the pain and suffering of baby… and parent.
The months that went into developing this app were fully and knowingly an escape from the very real stressors of parent life. I want to acknowledge the love, the grit and the patience of my wife in this very trying time. There is so much more to say about babies other than their crying and fussing - life these days is filled with laughter and giggles and joy, too - but these are words that I’ll save for another blog entry on a different blog for another time.