While I agree with the statements made in the original post, I'm afraid this thinking can be used as an excuse for avoiding any attempts at finding proper abstractions. Similarly, the term "premature optimisation" is frequently used by people unable to write efficient code to excuse their lack of skill or their laziness, even though the context and the times in which those words were first used were vastly different and the author meant something else.
IMHO abstraction should not be guided by the desire to remove duplication. Duplication is not even the only (and far from the worst) result of insufficient abstraction.
Insufficient abstraction leads to increased complexity, not just duplication.
Example: just this week I've been working on some code that has to deal with arbitrary ranges of ordered values. Typically when you think of a range, you think of a pair of bounds - the lower and the upper bound. However, the input is allowed to have only half-ranges so that one of the ends might be unbounded. So in the code I inherited there are 3 cases: a range with both lower and upper bounds defined, a range with only a lower bound, and a range with only an upper bound. All code processing those ranges has to deal with that optionality of either end, thus making it way more complex than needed - lots of if ladders or switch statements. And it multiplies very quickly when you deal with more than one range at a time. It is insufficiently abstract, even though it doesn't have any obvious duplication. The proper abstraction would be to transform the half-ranges to full ranges by introducing special open-end items (always smaller or greater than every possible value) which would allow one simple type of range to cover all possible cases.
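A minimal sketch of the open-end idea, not the code from the project described above. The names `NEG_INF`, `POS_INF`, `compareBounds`, `makeRange` and `overlaps` are hypothetical, and the sketch assumes closed (inclusive) bounds and a caller-supplied comparator for the underlying values:

```javascript
// Hypothetical sentinels: NEG_INF sorts below every value, POS_INF above every value.
const NEG_INF = Symbol("NEG_INF");
const POS_INF = Symbol("POS_INF");

// Compare two bounds. `cmp` is the ordinary comparator for the underlying values;
// the sentinel handling is decided once, here, instead of at every call site.
function compareBounds(a, b, cmp) {
  if (a === b) return 0;
  if (a === NEG_INF || b === POS_INF) return -1;
  if (a === POS_INF || b === NEG_INF) return 1;
  return cmp(a, b);
}

// Normalise the three inherited shapes into one: a missing end becomes a sentinel.
function makeRange(lower, upper) {
  return {
    lower: lower === undefined ? NEG_INF : lower,
    upper: upper === undefined ? POS_INF : upper,
  };
}

// Closed ranges overlap when each one starts no later than the other one ends.
function overlaps(r1, r2, cmp) {
  return (
    compareBounds(r1.lower, r2.upper, cmp) <= 0 &&
    compareBounds(r2.lower, r1.upper, cmp) <= 0
  );
}

const byNumber = (x, y) => x - y;
const atLeast10 = makeRange(10, undefined); // only a lower bound
const atMost5 = makeRange(undefined, 5);    // only an upper bound
const from3to12 = makeRange(3, 12);         // both bounds

console.log(overlaps(atLeast10, atMost5, byNumber));   // false
console.log(overlaps(atLeast10, from3to12, byNumber)); // true
console.log(overlaps(atMost5, from3to12, byNumber));   // true
```

With this shape, `overlaps` has no branch on which ends are present; the only place that knows about open ends is `compareBounds`.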
Very good example of how even a small scale piece of logic can have very messy effects down the line.
Both insufficient and wrong abstractions are viral. They infect everything they touch, which can snowball into large parts being more complex, harder to understand and debug and often also slower.
The wrong abstraction is wrong, insufficient abstraction is wrong.
Really the only weapons against complexity we have as programmers are decomposition and abstraction. We have to take things apart (in your example, that would be the meaning of each parameter), and then we put them together in such a way that the details below our abstraction can mostly be ignored.
I say that all with a caveat: I tend to prefer less, insufficient or no abstraction over the wrong one. The former few options can lead to code that is hard to understand as a whole and can be brittle, but the latter drives you into a corner: The only way out is either trying to patch over it or starting from scratch - choose your poison...
Going back not only means starting from scratch but also adapting anything that the code to be changed touches. It's often a lot of work with uncertainty and thanklessness attached to it.
It's not any worse than introducing a new abstraction when there previously wasn't one. At least in a statically typed language one will quickly identify those touch points. When introducing a new abstraction there is no such help.
Not starting from scratch. But developers never do things like reduplication, or merging modules.
Those concepts are so alien that I believe if I used those words in a conversation today, lots of people would pop up insisting on redefining them into meaningless ones.
I'd say this can serve as an example where triplication is better than your abstraction. What are these special open-ended items? How do you need to extend comparison to account for them? Etc. Whereas the three cases are perfectly clear and easy to understand.
The three cases force all code that uses ranges to deal with them. Apply this way of thinking many times for multiple concepts and you end up with a spaghetti of multiply nested if statements that's near impossible to analyze for correctness, because now you have to read all the code instead of just a tiny subset.
> What are these special open-ended items? How do you need to extend comparison to account for them?
The whole point of abstraction is to make those decisions once and isolate the complexity in one place instead of having it spread over N places in the code, forcing everybody to solve the same problems again and again.
Do ranges not support a limited set of operations through which the rest of the code can interact with them, instead of manipulating the endpoints directly?
I would think of the range itself as the abstraction, and then it matters less how it's implemented since any potential problems are local to the implementation and cheap-ish to fix.
Yes, this is correct. The range is the abstraction, and then you can choose how to represent it. Not much difference between the three range cases, and the single case with special endpoints, except that the three cases are more general, as no special values are needed.
Sure, you can probably also do it that way, but this is not how the code was originally written. The original ranges expose the bounds in their public API, and most code just operated on them.
Then I would say that's the problem. Whatever implementation you leak, the problem is the lack of implementation hiding, not that the implementation looks this way or that way.
I agree, your point still holds and if anything becomes stronger, because now we've identified a concept at a higher level of abstraction to replace endpoint operations!
Hard disagree here. Encapsulation is just one aspect of implementing the abstraction. And I'm not even convinced it is the best way here. Leaving the ranges as simple structures with two public fields, but introducing an "infinity" concept is another way to go. No encapsulation but still abstract (although one may argue this is still encapsulation but at a different level - applied to range bounds instead of whole ranges).
Encapsulation of ranges doesn't actually fully solve the problem, but just moves and isolates the complexity to the private implementation of the range concept (likely a class in OOP). E.g. you want to compute if two ranges overlap - you still have to deal with the complexity of 3 cases in each argument, so total 9 cases.
And hiding the bounds is likely going to be a lot more intrusive on the existing caller code (more refactoring).
When your three cases run into someone else’s two you have six, and it can even get much worse than that. Or an object running into itself and getting nine cases.
Intervals are nice for representing acceptable ranges. Half intervals mean greater/less than. If you stick infinities on the ends, everything likely works. You then expose methods or functions for all your operations. From the outside, you don’t have to care if it’s a half interval or not (unless that is what you’re particularly checking). On the inside you don’t really, either.
If you’re messing with intervals in a business setting, it’s worth considering if you need multi intervals, non continuous regions.
These are all great for handling uncertainty. Like if you add two weights that have +/- values, you can have the sum have those and be correct. The math is all well defined and rather easy. Wikipedia has good pages on it.
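As a rough illustration of the weights example, here is a sketch of interval addition. The helper names `fromTolerance` and `addIntervals` are made up for this example, and it assumes closed intervals stored as `{ lo, hi }`; JavaScript's built-in `Infinity` stands in for the open end of a half interval:

```javascript
// Represent a measurement "value +/- tolerance" as a closed interval [lo, hi].
function fromTolerance(value, tolerance) {
  return { lo: value - tolerance, hi: value + tolerance };
}

// The sum of two intervals: the lowest possible sum up to the highest possible sum.
function addIntervals(a, b) {
  return { lo: a.lo + b.lo, hi: a.hi + b.hi };
}

const weightA = fromTolerance(10, 0.5); // 10 +/- 0.5
const weightB = fromTolerance(20, 1);   // 20 +/- 1
const total = addIntervals(weightA, weightB);
console.log(total); // { lo: 28.5, hi: 31.5 }

// A half interval ("at least 5") needs no special case: the open end is Infinity.
const atLeast5 = { lo: 5, hi: Infinity };
const unboundedTotal = addIntervals(total, atLeast5);
console.log(unboundedTotal); // { lo: 33.5, hi: Infinity }
```

Subtraction and multiplication need a little more care with the ends (and with infinities), so this only covers the addition case mentioned above.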
I don't remember who said that, but mathematics is all about building abstractions, not about computation. So many times mathematics helped me make code simpler.
The post I am replying to made no assumptions about the domain the order is defined on. If it is over the reals, sure, you can use -∞ and ∞. If it is over the integers, you can use MIN_VALUE and MAX_VALUE, sacrificing some of your domain (which might be a problem, depending on the context), or you can use Option[Int], which comes with performance issues.
Or, you can use a range which is a sum of three/four cases, and not worry about any of that.
The time dimension is often forgotten when applying these maxims. When we see code, we often fail to consider the journey it's taken to arrive at that point in time, and where it might be headed in future.
In the example you set, it's the right time to apply an abstraction, so it's no longer premature. Perhaps the maxim should be labelled as "premature abstraction", rather than "premature optimisation".
> The proper abstraction would be to transform the half-ranges to full ranges by introducing special open-end items (always smaller or greater than every possible value) which would allow one simple type of range to cover all possible cases.
You wouldn't even need to create anything new—both math and C already provide this abstraction in the form of −∞/-inf and ∞/inf.
I wasn't talking specifically about real (float) numbers, but yes - this is that abstraction. And it generalizes to any type with ordering (can work with integral types as well).
I'm encountering a lot of these types of small abstraction projects in a React project I'm working on. It's a music theory "explorer" app and, maybe unsurprisingly if you know any music theory, getting a good abstraction that doesn't fall victim to lots of weird little edge cases is tricky.
I'm using Tonal which makes it easier, because I can mostly push weirdness into wrappers for individual Tonal calls. It's honestly been a great little challenge because the scope is so small that it doesn't take all that much analysis or thought to see where abstractions break down. Fun little exercise in code design.
IDK anything about music theory but I wonder, if you're having trouble finding a good abstraction to express theory, perhaps the theory is at fault.
I mean, at a high level, theory is the abstraction isn't it?
It's really use case specific in my case - things like having a selected key should be straightforward enough, but some of my components weren't written with that in mind. It's little things like that that make me say it's a good exercise - many of the difficulties are my own fault, which means it's easy to learn from my mistakes.
Music theory in general is a somewhat difficult abstraction due to the multiple ways to interpret different things in different contexts. The same chord progression might be thought of as being in several different keys based on other contextual information for example.
Problems like these often come from the pressure to ship fast, and not writing code 2-3 times to find a good way to express something. If you're going to rush through abstracting things away, I'd rather you duplicated. If you will take time to express it well, then I'd prefer a good abstraction.
Whenever this article is posted it amazes me. People seem to only reply to the title, and ignore the substance of it. The point is not to "not abstract" or "rule of 3". The point is requirements change, features are added, and when an abstraction becomes wrong, tear it out.
> The point is requirements change, features are added, and when an abstraction becomes wrong, tear it out.
I like this phrasing a lot, thanks for this!
I'm still wondering if there's also potential in avoiding the wrong abstractions in the first place. For that we'd need a "cheap" way to decide whether an abstraction is good/bad/something else.
Are there generally applicable, widely accepted principles or research around this? A quick search only revealed random blog posts; nothing I'd consider widely accepted.
I feel there's a beautiful representation of (many) problems that is waiting to be found that makes particular software problems easy to read and understand. When the mental model of the software just makes sense. I don't want to reject other people's models of how programs work but I want to understand them.
Unfortunately, I feel the original problem can be obfuscated by adding ideas to the existing problem who now need to understand your ideas (or mental model) of the problem to understand your code. I need to understand how you think to read your code. And if your way of thinking is more advanced than mine or incomplete or not great, then my work is harder.
Mental models are how I understand software, there is what I call a "critical insight" that makes the code obvious and easy to understand. I don't want to be deciphering and spend days investigating code to understand how to change it or build upon it or use it. I want the APIs to reflect their expected usage and behaviours.
My perspective is that computers are adding and arrangement machines - they add and do operations on numbers and move things around to different locations. My mental model of computers is that it comes down to LOGISTICS and arrangement/ordering problems. Unfortunately, APIs and data structures are nested and ordered and obfuscate the underlying movement of things between places or addition to different things.
Everything that obfuscates the rules of the computation means getting the behaviour you want from the code is harder.
I've often thought about "commutative computation" where we specify what we want to be true to the computer and the computer works out how to arrange all its existing computations to satisfy that additional invariant. I often think of software as a series of behaviours rather than functional or imperative.
Think of a materialised view: we have an existing behaviour of the computer and we want to customise the behaviour. You could work out where you need to insert your code snippet, but that's really hard. Or you could add an invariant to the system that the system now satisfies.
> Unfortunately, I feel the original problem can be obfuscated by adding ideas to the existing problem who now need to understand your ideas (or mental model) of the problem to understand your code.
What do you think are the steps someone goes through when they obfuscate the problem by adding ideas? Like why do you think they do it?
Things get obfuscated because someone's viewing the problems from a different abstraction lens, and they're building a system onto that lens.
Eg.
Iterate through an array:
```javascript
const arr = [1, 2, 3];
for (let i = 0, l = arr.length; i < l; ++i) {
  console.log(arr[i]);
}
```
Let's model it differently using an iterator:
```javascript
const arr = [1, 2, 3];
const arrIter = arr[Symbol.iterator]();
let i = arrIter.next();
while (!i.done) {
  console.log(i.value);
  i = arrIter.next();
}
```
At this level it's still pretty obvious what's going on, but you can still see that there's a level of abstraction between an array access vs calling 'next/value', and that obfuscates what is actually happening at the computation/instruction level.
If I extend this another level then I'm going to start modelling problems using an iterable and not an array/index. New requirements come in and we extend to use an async iterable. Everything still works nicely, but in some scenarios where the actual iterable is just an array, now there's a lot of extra overhead to just do an index lookup.
Using the iterator allows the code to be reused in more scenarios, but there's usually a cost to switching the lens of abstraction so that it fits into a problem modeled differently.
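To make the reuse side of that trade-off concrete, here is a small sketch. The function names `sumIterable`, `sumArray` and `countTo` are made up for illustration, and no performance numbers are claimed since the relative cost depends on the engine:

```javascript
// Written against the iterable protocol: works for anything usable in for...of.
function sumIterable(iterable) {
  let total = 0;
  for (const x of iterable) total += x;
  return total;
}

// Written against array indexing: less general, but it only does index lookups.
function sumArray(arr) {
  let total = 0;
  for (let i = 0, l = arr.length; i < l; ++i) total += arr[i];
  return total;
}

// A generator is iterable but has no indices, so only sumIterable can consume it.
function* countTo(n) {
  for (let i = 1; i <= n; ++i) yield i;
}

console.log(sumIterable([1, 2, 3]));          // 6
console.log(sumIterable(new Set([1, 2, 3]))); // 6
console.log(sumIterable(countTo(3)));         // 6
console.log(sumArray([1, 2, 3]));             // 6
```

The iterable version accepts more kinds of input, while the indexed version stays closer to what the machine actually does for the plain-array case.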
I can tell you what I want to believe of myself that I think I do.
I try to think of the simplest, most elegant, most beautiful solution to the problem, one that needs minimal code, cleverness and complexity and solves the problem with a trivial loop, map, hash lookup, traversal or association.
That usually comes with trying to see the problem in a different light, to reframe the problem as a different kind of problem, which can obfuscate the original problem.
This is part of the reason why I like Tailwind-style utility classes and TypeScript union/intersection types: I am (often) spared the intellectual effort of coming up with a name. I wrote this: https://itnext.io/and-naming-things-tailwind-css-typescript-...
For limited occurrences that’s correct, but if you find yourself having the same `foo | bar | baz` all over the code, you’re going to want to introduce a shortcut term for it. Even just to be able to efficiently talk about it.
The other thing is that unions/intersections are not an abstraction, because they don’t hide any details. The purpose of an abstraction is to separate essential properties of whatever is being modeled (the interface) from current details that may change later, or that client code shouldn’t depend on (the implementation).
In case it's unclear, I agree with you completely. Introducing the right abstraction into a code base can feel like someone switching on the lights. Far more benefit than just DRY.
Conversely, I am currently working with a frontend code base that is using "classic CSS", and it's striking to me how frustrating it is to have to think up what the "semantics" of this and that particular <div> can be said to be, when there very often aren't any.
Yes, one of the reasons I like using mypy for Python and TypeScript for frontend is that it forces me to recognise opportunities for abstractions. If some input/return type is getting really complicated or reappears in many places in the code then likely it’s a good candidate for an abstraction.
1. Organizations must value continuous improvement. We want to avoid two extreme behaviors that sour individuals. First, the lethargic in-bred sterility of: hey, it worked before you got here, and it's fine now. Play-it-off is not wisdom. On the other extreme is frustration gone wrong. Sure, you can see a problem AND be right about it. But whining and constant criticism sour people. Everybody's problem is there are 10,000 things that could be worked on, and resources only for 1000. You better make sure you're customer driven so you pick the right 1000.
2. Duplication is better for the medium term ... if you stay with the problem for a while, you are better able to distill the big picture into a more coherent new abstraction. Here you can cite a problem, cite a solution, and stick to your guns. You are better positioned to impact change without being a whiner. Now, problems are working for you, not against.
More often than not, when I tried to employ the strategy explained in the post, the sunk-cost people would try to shut me down.
Fortunately my current project is different, because the team is very small and we have silos of responsibility, so we don't really get in each other's way that much.
It appears that the largest obstacle here is not the lack of ability, but agency.
In the whole article there is no reference to the business those abstractions or duplications are made for. I mean, the way to decide if it is a "good duplication" is to ask ourselves whether it is a coincidence that the code is the same: 2 business rules having the same implementation does not mean it is a duplication.
> Existing code exerts a powerful influence. Its very presence argues that it is both correct and necessary. We know that code represents effort expended, and we are very motivated to preserve the value of this effort. And, unfortunately, the sad truth is that the more complicated and incomprehensible the code, i.e. the deeper the investment in creating it, the more we feel pressure to retain it
In my experience, this is the failure point in thinking about incremental development. A portion of code's mere existence does not make it untouchable -- I constantly express this to my team. If we are presented with new information (such as a capability) that would function best with a refactor, so be it.
We have some team members who think the opposite -- that any existing implementation must be preserved (I call this the "incumbency fallacy".) It creates a weird motivation to implement quickly/early, as if getting your pull request in is the most important aspect of your work. And yet, what I find is that early implementations often inform how something should be improved after the fact.
Why is this article recognizing only two extremes? Either everyone uses the same instance, or there's NO SHARING AT ALL:
"Re-introduce duplication by inlining the abstracted code back into every caller."
Or maybe if there are, say, 10 places dependent on the shared code, we can make them 5 places dependent on one version and 5 places dependent on another?
Forking and merging is part of business as usual in programming and we should be used to it. We should not be shocked that sometimes you have to fork a function because adding more parameters is not feasible, but nor should we declare sharing is therefore wrong or harmful.
Also, how you design parameters is extremely important. One callback parameter may be worth a hundred "normal" ones.
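One way to read the callback remark, sketched with made-up function names (`formatNamesWithFlags` and `formatNames` are hypothetical): a shared function that grows an option for every caller's variation, next to one that takes a single callback and lets each caller supply its own variation.

```javascript
// Flag-style: every new variation adds another option and another branch.
function formatNamesWithFlags(names, { trim = false, upperCase = false, prefix = "" } = {}) {
  return names.map((name) => {
    let out = name;
    if (trim) out = out.trim();
    if (upperCase) out = out.toUpperCase();
    return prefix + out;
  });
}

// Callback-style: one parameter covers variations the author never anticipated.
function formatNames(names, transform) {
  return names.map((name) => transform(name));
}

const raw = [" ada ", "bob"];

const viaFlags = formatNamesWithFlags(raw, { trim: true, upperCase: true });
const viaCallback = formatNames(raw, (name) => name.trim().toUpperCase());
console.log(viaFlags);    // [ 'ADA', 'BOB' ]
console.log(viaCallback); // [ 'ADA', 'BOB' ]

// A variation the flag version does not support without yet another option.
const initials = formatNames(raw, (name) => name.trim()[0]);
console.log(initials); // [ 'a', 'b' ]
```

The callback version pushes the variation to the caller, which keeps the shared function small at the price of each caller spelling out what it wants.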
The point she is making is about choosing to go back, towards less abstraction, rather than forward. So I expect the answer to your question is that two endpoints are enough to establish both directions and make the intended point.
If midpoints are introduced then comments like yours "but what about..." can always be made until the entire abstraction tower is fully described, and that's not the blog post (or book) the author wanted to write.
I wish that every time someone established two extreme points, everyone were like you, automatically interpolating an entire space of endless possibilities, countless shades of gray. But this is decidedly NOT how we think, because dichotomies are simpler to mentally process, and in fact ultimatums or "single right solution" situations are easiest to process.
Have you ever seen an online argument? If someone is right, and someone is not AS right as they are, they are a "left shill" and vice versa. If you promote solution A, and someone promotes solution B, then they're "wrong". Not establishing points, just "wrong".
So I think establishing two directions is best accomplished not by marking up two extreme points and leaving the rest to the imagination as our imagination is apparently quite poor.
It's more correct to describe the next step in a direction, and let us take things step by step and know that nuance is inherent to our success, not optional.
Of course. The best of all is to let go of all dogmas about coding. But we will not get there by either blog posts or comments, it requires education over time.
However, stuff like "we use and have always used Kafka (read the code!) for messaging, so we're not doing kernel bypass to move data now" is small 'c' culture.
Small 'c' culture is the kind of stuff that, if you abrogate it, a small army of people will come out of the woodwork and browbeat you for it. Browbeating to keep you in line is not engineering. It's nagging.
Tradition, when it's small 'c', is stifling. Don't fall for it.
In our team we have this rule that you shouldn't even think about introducing an abstraction unless there are at least 3 real use cases to consider. You're most likely to create a wrong abstraction if there's only 1 use case; 2 cases may be just a coincidence (2 business rules look similar on the surface but have nothing to do with each other really). 3 is a heuristic, but it saves us from investing too much time in most likely useless abstractions which only get in your way.
I think the rule of 3 tends to lead to reasonable results, but asking "is this really the same piece of knowledge I'm encoding in N places" (as the original formulation of DRY suggests) is going to be a little better. Sometimes it's two places but it's really clear they'll always change together, sometimes it's ten places but each is going to evolve independently (which, to be fair, might well be the determination made when "think[ing] about introducing an abstraction" in your formulation).
To push back a bit on naive misapplication of DRY I've been saying we should call collapsing things that are just coincidentally similar (and likely to change independently) "Huffman coding".
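A small sketch of the "same piece of knowledge" test, using invented business rules (the usernames, product codes and the 20% VAT figure are all hypothetical). The first pair of functions is identical today only by coincidence; the second pair encodes one fact that must always agree:

```javascript
// Coincidentally similar: two unrelated rules that happen to look alike today.
// Collapsing them would couple usernames to product codes for no business reason.
function isValidUsername(s) {
  return s.length >= 3 && s.length <= 20;
}
function isValidProductCode(s) {
  return s.length >= 3 && s.length <= 20;
}

// Same knowledge: both functions depend on the VAT rate and must change together,
// so the rate is stated once. An integer percentage keeps the arithmetic exact here.
const VAT_PERCENT = 20;
function vatPortion(net) {
  return (net * VAT_PERCENT) / 100;
}
function priceWithVat(net) {
  return net + vatPortion(net);
}

console.log(isValidUsername("ada"));   // true
console.log(isValidProductCode("ab")); // false
console.log(vatPortion(100));          // 20
console.log(priceWithVat(100));        // 120
```

If usernames later need to allow 30 characters, only `isValidUsername` changes; if the VAT rate changes, a single constant does.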
Wrong abstraction is a type of premature optimization, it's an anti-pattern that's very common among the senior and supersenior coders that 'already know everything in advance' and that knowledge turns out to be false.
Agree with the big idea, but the problem is - If you are not a computer scientist and current with the latest papers, you certainly have the "wrong" abstraction
So I'd collapse this whole article down to 1 bit - "software development is hard"
>duplication is far cheaper than the wrong abstraction
Isn't that a pretty useless sentence?
Of course duplication is cheaper because aren't the higher costs one of the reasons why it's a wrong abstraction?
> Stephen: Well of course too much is bad for you, that's what "too much" means you blithering twat. If you had too much water it would be bad for you, wouldn't it? "Too much" precisely means that quantity which is excessive, that's what it means. Could you ever say "too much water is good for you"? I mean if it's too much it's too much. Too much of anything is too much. Obviously. Jesus.
Duplication and abstraction aren’t the same, though. Abstraction is a tool for reducing duplication. The point of the post is that if the abstraction is wrong, it’s worse than just leaving the seemingly duplicated code.
I'd disagree. An abstraction is a way to reason about a problem. Often that reduces duplication, but it is a side effect from a better understanding of the problem.
Oh I see what you’re saying. No, “the wrong abstraction” doesn’t intrinsically mean it’s more costly than duplicated code. A lot of people argue that the wrong abstraction is still better than having duplicated code. She’s saying that’s not the case.