Are you saying you bought an electric car with functional 84kwh pack for less than 3 grand? If so I think the outlier is you. That is a better deal than I have seen.
What you are proposing sounds like a nightmare to debug. The high-level perspective of the operation is of course valuable for determining whether an investigation is necessary, but the low-level perspective in the library code is almost always where the relevant details are hiding. Not logging those details means you are in the dark about anything your abstractions are hiding from higher-level code (which is usually a lot).
If it’s not your code how is a log useful vs returning an error?
Even relatively complex operations, like say converting a document into a PDF, basically have only two useful states: either it worked or something specific failed, at which point just tell me that thing.
Now, independent software like web servers or databases can have useful logs because they have completely independent interfaces with the outside world. But I call libraries; they don't call me.
That’s a very simple operation. Try “take these 100 user-generated PDFs and translate all of them”. Oh, “cannot parse unexpected character 0x001”? Cool beans, I wish I knew more.
No, I’m only saying a useless error code and a useless log are both possible. Either could be useful or they could both be useless because the creator was actively malicious etc. Thus, the possibility of a useless error code doesn’t inherently mean a log would improve things.
Really the only thing we can definitively say is that even when both approaches are executed well, it’s harder to use log entries in your code. If something returns an error, that error is tied to a specific call from a specific bit of code, whereas a log entry could in theory be from anything.
Trace can become so voluminous that it is switched on only on an as-needed basis, which can be too late for rare events. Also, the trace level, being more of a debugging tool, tends to be less scrutinized for exposing sensitive data, making it unsuitable for continuous operation in live production.
It’s not that simple. First, this results in exception messages that are a concatenation of multiple levels of error escalation. These become difficult to read and have to be broken up again in reverse order.
Second, it can lose information about at what exact time and in what exact order things happened. For example, cleanup operations during stack unwinding can also produce log messages, and then it’s not clear anymore that the original error happened before those.
Even when you include a timestamp at each level, that’s often not sufficient to establish a unique ordering, unless you add some sort of unique counter.
It gets even more complicated when exceptions are escalated across thread boundaries.
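One way to get the unique ordering mentioned above is a process-wide monotonic counter on each log record. This is only a sketch of the idea in Python; `make_record` and the record fields are made-up names, not any particular library's API:

```python
import itertools
import time

# Hypothetical sketch: a process-wide monotonic sequence number attached
# to every log record, so ordering stays unambiguous even when timestamps
# collide or records arrive out of order across threads.
_seq = itertools.count()

def make_record(message):
    # time.time() can return the same value for two consecutive events;
    # the counter breaks ties deterministically.
    return {"seq": next(_seq), "ts": time.time(), "msg": message}

original = make_record("original error during request")
cleanup = make_record("cleanup during stack unwinding")
# Even if both timestamps are identical, seq gives a total order.
assert original["seq"] < cleanup["seq"]
```

(A real multi-threaded implementation would want the counter increment and record emission to happen atomically; `itertools.count` handles only the increment side.)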
> First, this results in exception messages that are a concatenation of multiple levels of error escalation. These become difficult to read and have to be broken up again in reverse order
Personally I don't mind it... the whole "$outer: $inner" convention naturally lends to messages that still parse in my brain and actually include the details in a pretty natural way. Something like:
"Error starting up: Could not connect to database: Could not read database configuration: Could not open config file: Permission denied"
Tells me the config file for the database has broken permissions. Because the permission denied error caused a failure opening the config file, which caused a failure reading the database configuration, which caused a failure connecting to the database, which caused an error starting up. It's deterministic in that for "$outer: $inner", $inner always caused $outer.
Maybe it's just experience though, in a sense that it takes a lot of time and familiarity for someone to actually prefer the above. Non-technical people probably hate such messages and I don't necessarily blame them.
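That "$outer: $inner" chain falls out naturally from exception wrapping. A minimal sketch in Python, using `raise ... from` chaining; the path and function names are made up for illustration:

```python
# Each layer catches the lower-level failure and wraps it, prefixing its
# own context, so the final message reads outer-to-inner.
def read_config(path):
    try:
        with open(path) as f:
            return f.read()
    except OSError as e:
        raise RuntimeError(f"Could not open config file: {e.strerror}") from e

def connect_db():
    try:
        cfg = read_config("/nonexistent/db.conf")  # hypothetical path
    except RuntimeError as e:
        raise RuntimeError(f"Could not connect to database: {e}") from e
    return cfg  # a real implementation would dial out using cfg

def start_up():
    try:
        connect_db()
    except RuntimeError as e:
        raise RuntimeError(f"Error starting up: {e}") from e

try:
    start_up()
except RuntimeError as e:
    message = str(e)
    print(message)
```

Each wrapping layer adds exactly one "$outer: " prefix, so the printed message starts with "Error starting up: Could not connect to database: Could not open config file: " followed by the OS-level reason.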
Sometimes you don’t have all the relevant details in scope at the point of error. For instance some recoverable thing might have happened first which exercises a backup path with slightly different data. This is not exception worthy and execution continues. Then maybe some piece of data in this backup path interacts poorly with some other backend causing an error. The exception won’t tell you how you got there, only where you got stuck. Logging can tell you the steps that led up to that, which is useful. Of course you need a way to deal with verbose logs effectively, but such systems aren’t exactly rare these days.
> Then maybe some piece of data in this backup path interacts poorly with some other backend causing an error. The exception won’t tell you how you got there, only where you got stuck.
Then catch the exception on the backup path and wrap it in a custom exception that conveys to the handler the fact that you were on the backup path. Then throw the new exception.
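A sketch of that wrapping in Python; `BackupPathError` and the backend calls are hypothetical names, not a real API:

```python
class BackupPathError(Exception):
    """Operation failed while serving from the backup path."""

def fetch_from_backup():
    # Stand-in for the backend that dislikes the backup path's data.
    raise ValueError("unexpected field in backup record")

def fetch(primary_available):
    if primary_available:
        return "primary data"
    try:
        return fetch_from_backup()
    except ValueError as e:
        # The handler now learns *how* we got here (the backup path),
        # not just where we got stuck.
        raise BackupPathError("backend rejected backup-path data") from e
```

A handler catching `BackupPathError` knows the recoverable fallback happened first, and `__cause__` still carries the original low-level error.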
At the extreme end: If my Javascript frontend is being told about a database configuration error happening in the backend when a call with specific parameters is made - that is a SERIOUS security problem.
Errors are massaged for the reader - a database access library will know that a DNS error occurred and that it is (as the first step for debugging) why it cannot connect to the specified datastore. The service-layer caller does not need to know that there is a DNS error; it just needs to know that the specified datastore is uncontactable (and then it can move on to the appropriate resilience strategy: retry that same datastore, fall back to a different datastore, or tell the API that it cannot complete the call at all).
The caller can then decide what to do (typically say "Well, I tried, but nothing's happening, have yourself a merry 500").
It makes no sense for the Service level to know the details of why the database access layer could not connect, no more than it makes any sense for the database access layer to know why there is a DNS configuration error - the database access just needs to log the reasons (for humans to investigate), and tell the caller (the service layer) that it could not do the task it was asked to do.
If the service layer is told that the database access layer encountered a DNS problem, what is it going to do?
Nothing, the best it can do is log (tell the humans monitoring it) that a DB access call (to a specific DB service layer) failed, and try something else, which is a generic strategy, one that applies to a host of errors that the database call could return.
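A minimal Python sketch of that layering, assuming the split described above; the hostname, the exception type, and the hard-coded DNS failure are all illustrative (a real access layer would call something like `socket.getaddrinfo`):

```python
import logging
import socket

log = logging.getLogger("db_access")

class DatastoreUnavailable(Exception):
    """All the service layer needs to know: the datastore can't be reached."""

def resolve(host):
    # Stand-in for a real DNS lookup, failing deterministically for the sketch.
    raise socket.gaierror(f"Name or service not known: {host}")

def connect(host):
    try:
        resolve(host)
    except socket.gaierror as e:
        # The DNS detail is for humans: log it here, don't leak it upward.
        log.error("DNS resolution failed for %s: %s", host, e)
        raise DatastoreUnavailable(host) from None

def handle_request():
    try:
        connect("db.example.internal")  # hypothetical hostname
    except DatastoreUnavailable:
        return 500  # "Well, I tried, but nothing's happening"
    return 200

status = handle_request()
```

The service layer's handling is generic: it reacts the same way whether the underlying cause was DNS, a refused connection, or a timeout, because the access layer already collapsed all of those into one meaningful signal.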
Imagine you have a caching library that handles DB fallback. A cache that should be there but goes missing is arguably an issue.
Should it throw an exception for that to let you know, or should it gracefully fall back so your service stays alive? The middle ground is leaving a log and chugging along; your proposition throws that out of the window.
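A sketch of that middle ground; the class and the dict-backed "DB" are illustrative, not any particular caching library:

```python
import logging

log = logging.getLogger("cache")

class FallbackCache:
    """A cache entry that should exist but is missing gets logged as an
    anomaly, then served from the DB so the service stays alive."""

    def __init__(self, db):
        self.db = db
        self.entries = {}

    def get(self, key):
        if key not in self.entries:
            # Not worth killing the service over, but worth telling a human.
            log.warning("expected cache entry %r missing; falling back to DB", key)
            self.entries[key] = self.db[key]  # graceful fallback
        return self.entries[key]

db = {"user:1": "alice"}
cache = FallbackCache(db)
value = cache.get("user:1")  # miss: logged, then served from the DB
```

The caller never sees the anomaly; the log line is the only place it surfaces, which is exactly what an exception-only design would throw away.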
I guess in those cases standard practice is for lib to return a detailed error yeah.
As far as traces go, trying to solve issues that depend on external systems is indeed a tall order for your code. Isn't that beyond the scope of the thing being programmed?
From my experience working on B2B applications, I am happy that everything is generally spammed to the logs because there would simply be no other reasonable way to diagnose many problems.
It is very, very common that the code that you have written isn't even the code that executes. It gets modified by enterprise anti-virus or "endpoint security". All too often I see "File.Open" calls tell the caller it has access, but what's actually happened is the AV has intercepted the call, blocked it improperly, and returned a 0-byte file that exists (even though there is actually a larger file there) instead of saying the file cannot be opened.
I will never, in a million years, be granted access to attach a debugger to such a client computer. In fact, they will not even initially disclose that they are using anti virus. They will just say the machine is set up per company policy and that your software doesn't work, fix it. The assumption is always that your software is to blame and they give you nearly nothing, except for the logs.
The only way I ever get this solved in a reasonable amount of time is by looking at verbose logs, determining that the scenario they have described is impossible, explaining which series of log messages is not able to occur, yet occurred on their system, and asking them to investigate further. Usually this ends up being closed with a resolution like "Checked SuperProtectPro360 logs and found it was writing internal error logs at the same time the software was in use. Adjusted the monitoring settings and the problem is now resolved."
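The kind of verbose logging that makes this diagnosable might look like the sketch below: record both the stat() size and the bytes actually read, so an interfering "endpoint security" layer shows up in the logs as an impossible-looking mismatch. The function name and log format are made up:

```python
import logging
import os
import tempfile

log = logging.getLogger("io")

def read_all(path):
    # Log enough that "file exists with N bytes but read returned M"
    # is visible in the field, without a debugger attached.
    expected = os.path.getsize(path)
    with open(path, "rb") as f:
        data = f.read()
    log.debug("read %d of %d bytes from %s", len(data), expected, path)
    if len(data) != expected:
        log.warning("short read: %s stats at %d bytes but returned %d",
                    path, expected, len(data))
    return data

# Usage on a normal file (no AV interference) shows a clean match.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
data = read_all(f.name)
os.unlink(f.name)
```

On a machine where an interception layer swaps in an empty file, the warning line is the "cannot happen" evidence you hand back to the customer.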
I don’t really understand what you mean about opening files. Is this just an example of an idempotent action or is there some specific significance here?
Either way logging the input (file name) is notably not sufficient for debugging if the file can change between invocations. The action can be idempotent and still be affected by other changes in the system.
> trying to solve issues that depend on external systems is indeed a tall order for your code. Isn't that beyond the scope of the thing being programmed?
If my program is broken I need it fixed regardless of why it’s broken. The specific example here of a file changing is likely to manifest as flakiness that’s impossible to diagnose without detailed logs from within the library.
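One hedged sketch of what "detailed logs from within the library" could mean here: logging only the file name can't explain flakiness when the file changes between invocations, but a content digest pins down exactly which bytes each call saw. The function and log format are illustrative:

```python
import hashlib
import logging
import tempfile

log = logging.getLogger("lib")

def process(path):
    # A short sha256 prefix in the log distinguishes "same name,
    # different contents" across invocations.
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()[:12]
    log.debug("processing %s (sha256=%s, %d bytes)", path, digest, len(data))
    return digest

with tempfile.NamedTemporaryFile(delete=False, mode="w") as f:
    f.write("version 1")
first = process(f.name)
with open(f.name, "w") as g:
    g.write("version 2")  # file changed between invocations
second = process(f.name)
# The two digests differ, so the logs explain why behavior changed.
assert first != second
```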
I was just trying to think of an example of a non idempotent function. As in it depends on an external IO device.
I will say that error handling and logging in general is one of my weak points, but I made a comment about my approach so far being dbg/pdb based: attaching a debugger and creating breakpoints and prints ad hoc rather than writing them in code. I'm sure there are reasons why it isn't used as much and logging in code is so much more common, but I have faith that it's a path worth specializing in.
Back to the file reading example, for a non-idempotent function. Considering we are using an encapsulating approach we have to split ourselves into 3 roles. We can be the IO library writer, we can be the calling code writer, and we can be an admin responsible for the whole product. I think a common trap engineers fall for is trying to keep all of the "global" context (or as much as they can handle) at all times.
In this case we of course wouldn't be writing the non-idempotent library, so that's not a hat we wear; we don't much care about the innards of the function and its state. Rather, we have a well-defined set of errors that are part of the interface of the function (EINVAL, EACCES, EEXIST).
In this sense we respect the encapsulation boundaries and are provided the information necessary by the library. If we ever need to dive into the actual library code, first the encapsulation is broken and we are dealing with a leaky abstraction, second we just dive into the library code, (or the filesystem admin logs themselves).
It's not precisely the type of responsibility that can be handled at design time and in code anyway; when we code we are wearing the calling-module programmer hat. We cannot think of everything that the sysadmin might need at the time of experiencing an error; we have to trust that they will be sufficiently armed with other tools to gather the information necessary. And thank god for that! Checking /proc/fs, looking at crash dumps, and attaching to processes with dbg will yield far better info than relying on whatever print statements you somehow added to your program.
Anyways at least that's my take on the specific example of glibc-like implementations of POSIX file operations like open(). I'm sure the implications may change for other non-idempotent functions, but at some point, talking about specifics is a bit more productive than talking in the abstract.
The issue with relying on gdb is that you generally cannot do this in production. You can’t practically attach a debugger to a production instance of a service for both performance and privacy reasons, and the same generally applies to desktop and mobile applications being run by your customers. Gdb is mostly for local debugging and the truth is that “printf debugging” is how it often works for production. (Plus exception traces, crash dumps, etc. But there is a lot of debugging based on logging.) Interactive debugging is so much more efficient for local development but capable preexisting logging is so much more efficient for debugging production issues.
I generally agree that I would not expect a core library to do a bunch of logging, at least not onto your application logs. This stuff generally is very stable with a clean interface and well defined error reporting.
But there’s a whole world of libraries that are not as clean, not as stable, and not as well defined. Most libraries in my experience are nowhere near as clean as standard IO
libraries. They often do very complex stuff to simplify for the calling application and have weakly defined error behavior. The more complexity a library contains, the more it likely has this issue. Arguably that is leaky abstraction but it’s also the reality of a lot of software and I’m not even sure that’s a bad thing. A good library that leaks in unexpected conditions might be just fine for many real world purposes.
I guess my experience is more from the role of a startup or even in-house software. So we both develop and operate the software. But in scenarios where you ship the software and it's operated by someone else, it makes sense to have more auditable and restricted logging instead of all-too-powerful ad-hoc debugging.
I also generally find that people looking for “best practices” to follow are trying to avoid that “sitting down to think about the software and its role in the overall system” piece.
at some level it's not really an engineering issue. "bug free" requires that there is some external known goal with sufficient fidelity that it can classify all behaviors as "bug" or "not bug". This really doesn't exist in the vast majority of software projects. It is of course occasionally true that programmers are writing code that explicitly doesn't meet one of the requirements they were given, but most of the time the issue is that nothing was specified for certain cases, so code does whatever was easiest to implement. It is only when encountering those unspecified cases (either via a user report, or product demo, or manual QA) that the behavior is classified as "bug" or "not bug".
I don't see how AI would help with that even if it made writing code completely free. Even if the AI is writing the spec and fully specifies all possible outcomes, the human reviewing it will glance over the spec and approve it, only to change their mind when confronted with the actual behavior or user reports.
What if the AI kept bringing up unspecified cases and all you (the human) had to do all day was respond to it on what the behavior should be in each case? In this model the AI would not specify the outcomes; the specification is whatever you initially specified, and your responses to the AI's questions about the outcomes. At some point you'd decide that you'd answered enough questions (or the AI could not come up with any more unspecified cases), and bugs would be in what remained, but it would still mean substantially more thinking about cases than now.
I don't actually think most people are driving around with high beams on. Modern LED headlights are just brighter, and cars sit higher than they used to, meaning older, lower cars, especially sedans, are right in the path of regular beams. I actually yelled at someone once to turn off their high beams because I was so convinced that's what it was. Turns out they just drive a Tesla, which has blinding lights. I guess there are also probably people with high beams on, but most of the terrible ones aren't high beams; they are just modern.
While I think that's part of it I can assure you there's also a lot of people running around with their brights on. I've seen them switch (after aggressively flashing mine) and the brights are a wider beam. So it is not just the intensity.
My running hypothesis is that the auto-highbeam feature on some cars is to blame. My friend drives around with his on and I definitely notice it doesn't react properly.
> they just drive a tesla, which just have blinding lights
I was thinking of that Deer in the Headlights ad from years ago and then stumbled on Mercedes promoting this... 11 years ago...[0].
I live in a pretty urban area, so maybe that's why I haven't seen so many brights. I've heard the autobright theory and it certainly makes sense though, especially in areas where people have a reason to use their brights.
I believe this technology is generally called adaptive headlights, and it is implemented in quite a few cars. I believe some quirk of US law (which specifies brightness at various points in the beam) actually prevents this feature from being implemented here. Really sucks, because these new LED headlights are really hard to stand at night.
I live in an urban area, but not what I'd call a big city. There are plenty of dark areas so people do put on their brights even though it actually isn't necessary. I see people turn them on anytime there's 100ft without a streetlight. Though I've still seen some of this in major cities like LA, though I feel like the highbeam rate is lower.
Then again, I also see people turn them on in the fog and rain...
But I'm convinced it isn't a singular problem. I'm certain the LEDs are part of the issue, but not the whole story. For example, improperly aligned headlights aren't as big of a problem when the lights aren't as bright, but when they are brighter this matters a lot more. Hell, I've seen brand-new vehicles with misaligned headlights.
Those boards have a lot more on them than just the CPU. At a minimum they have power conditioning and RAM, usually also storage. A lot of what you pay for with an SBC is that routing and layout. If it’s got WiFi as well, you could be paying for the testing that goes into RF microstrips and potentially certifications on EM emissions.
It is, of course, possible to do all that yourself, but the system-on-module exists because this integration has value that people are willing to pay for.
Ah I see. I guess if the design files are available that might be possible. Not sure about component availability though. I don’t remember for sure, but I thought there was something custom about the Broadcom SoC they were using, although that might have been for a different model.
I think the question is “how are the behavior of random spammers on your search page getting picked up by the crawler”? The assumption with cache is that searches of one user were being cached so that the crawler saw them. Other alternatives I can imagine are that your search page is powered by google, so it gets the search terms and indexes the results, or that you show popular queries somewhere. But you have to admit that the crawler seeing user generated search terms points to some deeper issue.
Interesting, I thought Apple Silicon was still ahead on raw numbers, would you mind pointing me at any resources to learn more?
Is that still true when you consider the whole system power consumption vs performance? I was under the impression that Apple's ram and storage solutions give them a small edge here (at the cost of upgradability / repairability)
Apple Silicon has a lead in performance per watt over the competition (not a gigantic one, but a real one nonetheless), but we were talking about the M1, which is 5 years old now and has no appreciable hardware advantage over an AMD or Intel laptop made in the last few years.
The reason an old M1 laptop gets better battery life is almost entirely a software difference.
"raw numbers" always means a lot of things. Apple's CPU benchmarks are neck-and-neck in multicore and usually top-of-class in single-core performance compared to other desktop chips. x86 will draw more power when idling and during bursty workloads, but is typically more efficient during sustained SIMD-style workloads.
The M3 Ultra is putting up some of the saddest OpenCL benches I've ever seen from a 200-300w GPU. The entry-level RTX 5060 Ti runs circles around it with a $400 MSRP and 180w TDP. I truly feel bad for anyone that bought a Mac Studio for AI inference.