Exactly. Nobody's on board with paying at least twice as much for software though. But that's what you get when things change and you have to refactor BOTH your code AND your tests.
But what is your process for determining code is correct, and is it really faster and more reliable than writing tests? Sheer force of will? Running it through your brain a few times? Getting peer review? I often find that, all things being equal, tests are just the fastest way to review my own work, even if I hate writing them sometimes.
To have automated tests does not mean you have well-defined requirements.
I 100% agree with capturing requirements in tests. However, I argue that TDD does not cause that to happen.
I'd even make a stronger statement. Automated tests that don't capture a requirement should be deleted. Those sorts of tests only serve to hinder future refactoring.
A good test for a sort method is one that verifies data is sorted at the end of it. A bad test for a sort method is one that checks to see what order elements are visited in the sorting process. I have seen a lot of the "element order visit" style tests but not a whole lot of "did this method sort the data" style tests.
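To make that concrete, here's a minimal sketch in Python (the `my_sort` function and the test names are hypothetical, just standing in for the sort method under discussion):

```python
def my_sort(items, key=None):
    # Stand-in implementation; the only promise to callers is sorted output.
    return sorted(items, key=key)

# Requirement-level test: "the data is sorted at the end". It survives any
# change of algorithm behind my_sort.
def test_my_sort_returns_sorted_data():
    assert my_sort([3, 1, 2]) == [1, 2, 3]
    assert my_sort([]) == []
    assert my_sort([2, 2, 1]) == [1, 2, 2]

# Implementation-level test: asserts the order in which elements get looked at
# (observed via the key function). It happens to pass today, but it documents
# an accident of the current algorithm rather than a requirement.
def test_my_sort_visits_elements_in_input_order():
    seen = []
    my_sort([3, 1, 2], key=lambda x: (seen.append(x), x)[1])
    assert seen == [3, 1, 2]
```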
Imagine that in the process of implementing the sort method, you decide "I'm going to use a heap sort".
So, you say "Ok, I'll need a `heapify` method and a `siftDown` method, so I'm going to write tests to make sure both of those are working properly." But remember, we started this discussion saying "we need a sort method". So now, if you decide "You know what, heap sort is garbage, let's do tim sort instead!", all of a sudden you've got a bunch of useless tests. In the best case, you can simply delete those tests, but devs often get intimidated about deleting such tests: "What if something else needs the `heapify` method?"
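For illustration, those implementation-coupled tests might look something like this (the heap helpers here are simplified sketches, not anyone's actual code):

```python
def sift_down(heap, start, end):
    # Restore the max-heap property for the subtree rooted at `start`.
    root = start
    while 2 * root + 1 <= end:
        child = 2 * root + 1
        if child + 1 <= end and heap[child] < heap[child + 1]:
            child += 1
        if heap[root] < heap[child]:
            heap[root], heap[child] = heap[child], heap[root]
            root = child
        else:
            return

def heapify(items):
    # Rearrange items in place into a max-heap.
    for start in range(len(items) // 2 - 1, -1, -1):
        sift_down(items, start, len(items) - 1)

# These tests check internals of one particular algorithm. Swap heap sort for
# tim sort and they verify nothing the product needs, yet they linger in the
# suite looking important.
def test_heapify_builds_a_max_heap():
    data = [3, 1, 4, 1, 5]
    heapify(data)
    assert all(data[i] >= data[c]
               for i in range(len(data))
               for c in (2 * i + 1, 2 * i + 2) if c < len(data))

def test_sift_down_moves_small_root_below_its_children():
    data = [1, 5, 3]
    sift_down(data, 0, 2)
    assert data == [5, 1, 3]
```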
And that's exactly the problem I was pointing out with the example. We started the convo saying "tests couple you to an implementation", and that's what's happened here. Our tests are making it seem like heap sort is the implementation we should use, when at the start of this convo we just needed a sorting method.
But now imagine we are talking about something way more complicated and/or less well known than a sorting algorithm. Now it becomes a lot harder to sift out which tests are about implementation details and which are about requirements. Without deleting the "these tests make sure I did a good implementation" tests, future maintainers of the code are left to guess at what's a requirement and what's an implementation detail.
You don't write unit tests for heap sort, you write them for sort. Then you get them to pass using heap sort. Later, you replace heap sort with tim sort, and you can write it quickly and with confidence, because the test suite shows you when you've succeeded.
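A rough sketch of that workflow, with hypothetical names (`sort_v1`, `sort_v2`): the requirement-level checks stay identical while the implementation behind them changes.

```python
import heapq

def sort_v1(items):
    # First implementation: a heap sort built on the standard-library heap.
    heap = list(items)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]

def sort_v2(items):
    # Replacement: delegate to the built-in (Timsort-based) sort.
    return sorted(items)

def check_sort(sort_fn):
    # The same requirement-level checks apply to whichever implementation
    # currently sits behind "sort".
    assert sort_fn([3, 1, 2]) == [1, 2, 3]
    assert sort_fn([]) == []
    assert sort_fn([2, 2, 1]) == [1, 2, 2]

check_sort(sort_v1)  # tests written first, made to pass with heap sort
check_sort(sort_v2)  # later swap to tim sort; the suite confirms success
```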
Public interfaces should change only under extreme circumstances, so needing to refactor legacy tests should be a rare event. Those legacy tests will help ensure that your public interface hasn't changed as it is extended to support changing requirements. You should not be testing private functions, leaving you free to refactor endlessly behind the public interface. What goes on behind the public interface is to be considered a black box. The code will ultimately be tested by virtue of the public interface being tested.
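As a sketch of that split (the module and names are hypothetical):

```python
# config.py (hypothetical): load_settings is the public interface; anything
# prefixed with "_" is a private detail that can be rewritten freely.

def load_settings(text):
    """Parse 'key=value' lines into a dict. This signature is the contract."""
    return dict(_parse_line(line) for line in text.splitlines() if line.strip())

def _parse_line(line):
    # Private helper: no test references it directly, so it can be replaced
    # (regexes, a real parser, whatever) without touching the suite.
    key, _, value = line.partition("=")
    return key.strip(), value.strip()

# test_config.py: exercises only the public interface, treating everything
# behind it as a black box.
def test_load_settings_parses_key_value_pairs():
    assert load_settings("a = 1\nb=2\n") == {"a": "1", "b": "2"}
```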
Assuming any piece of code won't or shouldn't be changed feels wrong. If you're a library developer, you have to put processes in place to account for possible change. If you're not, those public interfaces are just as refactorable as any other code imo. Nothing would be worse than not being able to implement a solution in the best manner because someone decided on an interface a year ago and enshrined it in unit tests.
Much, much worse is users having to deal with things randomly breaking after an update because someone decided they could make it better.
That's not to say you can't seek improvement. The public interface can be expanded without impacting existing uses. If, for example, an existing function doesn't reflect your current view of the world, add a new one rather than try to jerry-rig the old one. If the new solution is radically different, such that you are essentially rewriting your code from scratch, a clean break is probably the better route to go.
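A small sketch of expanding rather than reworking an interface (all names here are made up for illustration):

```python
import warnings

def fetch_report(user_id):
    # Original public function: kept intact so existing callers don't break,
    # but steered toward the newer entry point.
    warnings.warn("fetch_report is superseded by fetch_report_for_range",
                  DeprecationWarning, stacklevel=2)
    return fetch_report_for_range(user_id, days=30)

def fetch_report_for_range(user_id, days):
    # New function reflecting the current view of the world; the old one
    # delegates to it instead of being jerry-rigged with extra flags.
    return {"user_id": user_id, "days": days}  # placeholder body
```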
If you are confident that existing users are no longer touching your legacy code, remove it rather than refactor it.
Things change. You change the code in response. What broke? Without the tests, you don't know.
"Things change" include "you fixed a bug". Bug fixes can create new bugs (the only study I am familiar with says 20-50% probability). Did your bug fix break anything else? How do you know? With good test coverage, you just run the tests. (Yes, the tests are never complete enough. They can be complete enough that they give fairly high confidence, and they can be complete enough to point out a surprising number of bugs.)
Does that make you pay "at least twice"? No. It makes you pay, yes, but you get a large amount of value back in terms of actually working code.
That can actually be an acceptable risk, and quite often it is. There are two conceptually different phases in the SDLC: verification, which proves the implementation works according to spec, and validation, which proves the spec matches business expectations. Automated tests address the first phase, minimizing the risk that when we reach the next phase we will be validating code that wasn't implemented according to spec. If that risk is big enough, accepting the refactoring costs after validation may make a lot of sense.
Is it twice as much? I think unsound architectural practices are the root cause of this issue, not red-green-refactor.
You aren't doing "double the work" even though it seems that way on paper, unless the problem was solved with brittle architectural foundations and tightly coupled tests.
At the heart of this problem, I think, is that most developers don't quite grasp boundary separation intuitively.