On Engineering Discovery

Navigating changes in a complex environment

Dec 27, 2022

I’ve had the opportunity to build ‘greenfield’ software at organisations of different sizes. I did it at my own startup (that was a failure, more on that in a different post, I guess), at a Series B funded startup, at a VC funded scale-up and now at a very large enterprise. I realise that as the size of the company grows, both the stakes and the complexity of the greenfield software go up. There is so much more to building software than writing code, especially when building something from scratch. This article is an attempt at sharing some things that have worked well for me. YMMV.

Complexity

If you are doing any type of discovery, chances are that you are working on something complex, not merely complicated. There is a difference between complex systems and complicated systems. This HBR article does a far better job of describing the difference than I ever could. I highly recommend reading it. A quote from the article:

Practically speaking, the main difference between complicated and complex systems is that with the former, one can usually predict outcomes by knowing the starting conditions. In a complex system, the same starting conditions can produce different outcomes, depending on the interactions of the elements in the system.

The characteristic of a system where its output depends on the interactions between the elements it is composed of is referred to as its emergent behaviour. The whole is greater than the sum of its parts. There is an entire cross-disciplinary field of science, referred to as complexity theory, that describes this characteristic as being ‘emergent’.

Software architecture is an example of an emergent property of a social system. In our jobs, we work in teams, with other teams, in an org, with other orgs - all of which are social systems. If we ask N groups of people to “design” a software architecture to solve a loosely defined problem, it is very likely that we will get close to ‘N’ different architectures. The architectures are influenced by both the people in the groups and interactions between them. Architecture is an emergent property of a complex social system.

In the context of software engineering, when executing a change, it is important to understand and establish that we are operating in a complex system. With that comes the realisation that there will be second-order effects and non-linear dynamics (typical of complex systems) at play that we cannot possibly think of up front. What this means is that there will always be unknown unknowns and new shit will always come to light. Accepting this reality will positively impact our execution. The scale of the complexity grows with the scale of the social system that you are operating within. This is why it gets incrementally (dare I say exponentially) more difficult to drive effective change as you move from being a startup to a scale-up to a full-blown enterprise.


Strategy

What does change mean? It means that we have decided, for whatever reason, that we are not happy with where we are right now and we are making a bet that being somewhere different would be beneficial for the team, the org or the company. In concrete terms, this might mean that we have decided to build a new product, retire an existing one, replace an existing solution with a new one, migrate from one tech stack to another, etc. Whatever the reality & nuance of your situation may be, it can pretty much be summed up in the canvas below:

Think of the green blobs as unique coordinates on a plane. The “Where we want to be” blob can be anywhere on this plane. What determines the exact location of this blob is a good, well reasoned strategy. An important point to note here is that the “Where we want to be” is a bet. If anyone claims otherwise, they are either kidding themselves or those around them or both. Acknowledging that this is a bet will significantly impact the way we chart a course from where we are to where we want to be.

Discovery

Once we align on where we want to be, we need to chart a course to get there.

A path from ‘where we are’ to ‘where we want to be’ can pass through any number of coordinates (each empty circle representing a coordinate). The way I would like to frame the problem of engineering discovery is figuring out the shortest path from where we are to where we want to be.

It can be the case that with good luck by our side, we take the shortest possible path to where we want to be -

It can also be the case that with bad luck, we end up taking a really long path to where we want to be -

We can agree that depending on luck is not the best way to go about engineering discovery - that’s too risky and does not increase our chances of success. We want to minimise the influence of luck when it comes to discovery.
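To make this framing a little more concrete, here is a minimal Python sketch. It is a toy model under loud assumptions: the plane is a small square grid of coordinates, each step moves to an adjacent coordinate, and the function names and the 5x5 grid are hypothetical choices made purely for illustration. It compares the shortest possible path (found with a breadth-first search) against a path that relies entirely on luck (a random walk).

```python
import random
from collections import deque


def neighbours(point, size):
    """Yield the coordinates adjacent to `point` on a size x size grid."""
    x, y = point
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            yield (nx, ny)


def shortest_path_length(start, goal, size):
    """Breadth-first search: the fewest steps needed to get from start to goal."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        point, steps = queue.popleft()
        if point == goal:
            return steps
        for nxt in neighbours(point, size):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    raise ValueError("goal is not reachable from start")


def random_walk_length(start, goal, size, rng):
    """Pure luck: pick a random adjacent coordinate until we stumble on the goal."""
    point, steps = start, 0
    while point != goal:
        point = rng.choice(list(neighbours(point, size)))
        steps += 1
    return steps


if __name__ == "__main__":
    where_we_are, where_we_want_to_be, grid_size = (0, 0), (4, 4), 5
    print("shortest path:", shortest_path_length(where_we_are, where_we_want_to_be, grid_size))
    for seed in range(3):
        lucky = random_walk_length(where_we_are, where_we_want_to_be, grid_size, random.Random(seed))
        print(f"random walk (seed {seed}):", lucky)
```

On this toy grid the shortest path between opposite corners is 8 steps, and a random walk can never beat it; how much longer it takes depends entirely on the seed, which is the point. Real discovery has no map to run a search over, so the principles below are about getting feedback early enough to behave less like the random walk.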

Guiding Principles

If there is anything I’d like for you to take away, it is that -

If you are doing discovery, you are making things move to validate a bet. The sooner you get validation, the sooner you can decide the next course of action.
There will be many unknown unknowns while doing discovery and new shit will always come to light. The best way to eliminate risk is to disambiguate and the best way to do that is to make progress.

Below, I list out a few principles that have helped me do engineering discovery in my past roles and my current role as well -

The low-hanging fruit trap - When we don’t know any better, or feel the need to make progress, we end up going after the low-hanging fruit. Apart from making us feel good, there is not a lot this will accomplish. As an extreme example, if you are building a new service at a large company with a complex infrastructure landscape that is spread out across multiple cloud vendors and data centres, pushing a commit that contains a hello world endpoint is the least of your problems. Having said that, work that is considered low-hanging fruit is still work that needs to be done. We need to be smart about when we go after it. More on that later.
Identify Limiting Constraints - Both Eli Goldratt (Theory of Constraints fame) & Andy Grove (Intel & High Output Management fame) refer to the importance of identifying a limiting constraint and solving it. Not identifying the limiting constraints is akin to missing the forest for the trees. In the example from the previous point, figuring out where you should be hosting your service is one limiting constraint. For a more thorough treatment of constraints, I can only refer you to the works of the folks I have mentioned here.
Get you a team of people comfortable with ambiguity - An engineering discovery exercise is like an expedition (minus the risk to life). There are plenty of unknowns. A lot of paths the team will go down as part of discovery will lead to dead ends. Getting the right group of people to do the discovery is vital for success. Common traits that I came across in engineers who were involved in successful discoveries: they were comfortable not knowing, were resilient to change, were driven, had their own implicit feedback loops on how they were making progress, and were okay with not succeeding.
Optimise for time to value - If we are making a big bet that once we reach where we want to be, we will be able to deliver more value, then it is prudent to clarify what ‘more value’ means. For example, if the tech stack of an existing product is being upgraded, then getting to feature parity, imo, is an absolute waste of everyone’s time, effort & money. The question we should really be asking ourselves is what the value add is. Once we identify the value add, we use ‘minimise time to value add’ as a guiding principle to make decisions and make progress.
Scrum is not your friend - There is no point doing any rituals or maintaining a 2 week cadence for the sake of maintaining cadence. You are operating in uncharted territory and no agile framework can help you de-risk your goals. It would be naive to think so. Choose a low overhead, high bandwidth way of communicating and working with the team and get cracking.
Flow Interruptions & Context Switches - Because we are dealing with unknowns, we will spend a lot of time getting blocked. Our flow gets interrupted. Interruptions can happen because we hit a dead end, because the person who has tribal knowledge of the system we are dealing with is on leave or is slow to respond, because we are waiting on product input to move forward on a potentially expensive engineering decision, etc. When there are flow interruptions, in the interest of making progress, we will have to context switch. We can switch context to try and solve other limiting constraints, or, as mentioned previously, we can go after the low-hanging fruit. Frequent flow interruptions are something that the team should come to expect. A note to both engineers & managers: if we are not smart about it, the frequent flow interruptions and context switches can place a huge amount of cognitive load on the engineers. It’s not always a fun place to be in. This is where a close knit team with camaraderie between the team mates goes a long, long way.
Time box decisions - Each step in the discovery process will require someone to make some sort of a decision. To make progress and to de-risk the execution of change, it is important to time box decision making. I’ve seen teams fall into the analysis paralysis trap far too often, even with time boxes. Done is better than great. This applies to decisions too.
Fail Forward - You will not always make good decisions. It is better to realise that and factor that into your execution. This will force you to take incremental steps towards ‘where you want to be’, with each incremental step resulting in some sort of feedback or validation of the decisions you have made.
Demo Driven Development - During discovery, it is easy to develop a mindset of not having users we have to deliver value to. Setting up demos at short cadences creates a sense of progress. In a healthy team/org culture, it can also act as a forcing function to execute. Plus it is a great way of getting feedback and validation from people who are not involved in the discovery.
Stakeholder management - In a complex change, there is always going to be stakeholder interest. It is important that the team maintains bidirectional communication with the stakeholders. New shit always comes to light and we don’t want there to be any surprises that can potentially sidetrack us from our goals. Sometimes, despite our best efforts, a needle can be moved only when the right stakeholder gets involved. It has been my observation that large and complex changes often fail because the people executing the change and the stakeholder sponsoring the change were not on the same page.
Decouple yourself from adjacent teams - If you are in the midst of a complex change, chances are, teams you interface with and the systems owned by them are also in the midst of a complex change. We do not want to be put in a situation where we have to solve a constraint for each team that we collaborate with. This will only increase our complexity. Collaboration with other teams can be proxied by how the systems owned by them integrate with your systems. Put on your software architecture hat and try to decouple your system from others. This way, you keep other teams’ complexities from leaking into your boundaries, and yours from leaking into theirs.
Celebrate Small Wins & Milestones - You can spend two quarters (sometimes more) making steady progress but have zero users to show for it. Motivated engineers want to put stuff they’ve built into users’ hands. If this hasn’t happened for a long time, it is easy to get demotivated and lose sight of the why. Celebrating small wins & milestones helps the team by creating a sense of progress and ensuring that everyone still has their eyes on the prize.
Document - As you start executing the change and start moving towards where you want to be, there will be plenty of people who will be interested in what you are up to and why you are doing what you are doing. These can be other managers, business people who have a bone to pick with the change you are driving, staff engineers interested in technical alignment, product or design folks from your team or adjacent teams. Each of them will have a perspective that will likely help the team achieve its goal. It becomes super important that you capture your thoughts in structured documents that cater to different types of audiences. I have found that C4 architectural diagrams, Architecture Decision Records & an Engineering strategy document have worked out reasonably well.

Conclusion

The above listed guiding principles are exactly that. My opinion is that they are necessary but not sufficient for carrying out successful engineering discovery. How has your experience been navigating the phase of engineering discovery for your org? I’d love to hear your thoughts on this topic. Thank you for reading this far. I appreciate you.
