Extraneous material in my example would be potentially the rest of a large parag...

MrJohz · on Oct 13, 2024

That doesn't sound like soft or hard wrapping, though, that sounds like semantic wrapping, which is a separate concept entirely. With semantic wrapping you put each sentence (or similar) on a new line, which helps with diffing. But if that sentence runs over e.g. 80 characters, you still need to decide whether you're going to hard wrap or soft wrap that sentence. And in the inverse direction, if you don't do semantic wrapping, you'll have similar issues with diffs regardless of whether you use hard wraps or soft wraps.

So I think that's a good argument for doing semantic wrapping of code and text (I guess semantic wrapping for code is just not writing everything in one long line separated by semicolons), but once you've put in semantic line breaks, you still need to decide how to handle text that spans multiple lines.
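Here is a rough Python sketch of that distinction. Putting one sentence per line is the semantic part, and it can still leave a line wider than 80 characters. That leftover line is where the hard-or-soft decision comes back. The helper names and the naive sentence-splitting regex are illustrative assumptions, not any particular tool's behaviour.

```python
import re


def semantic_lines(paragraph: str) -> list[str]:
    """Hypothetical helper: put each sentence on its own line.

    Naive assumption: a sentence ends at '.', '!' or '?' followed by
    whitespace. Real prose (abbreviations, quotes) needs more care.
    """
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [s for s in sentences if s]


def overlong(lines: list[str], width: int = 80) -> list[str]:
    """Lines that semantic wrapping alone leaves wider than `width`."""
    return [line for line in lines if len(line) > width]


para = ("Short sentence. "
        "This second sentence is deliberately long enough that it runs well "
        "past the usual eighty-column limit all by itself.")
lines = semantic_lines(para)
for line in lines:
    print(line)
# Still to decide for these: hard wrap or soft wrap?
print(overlong(lines))
```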

a1369209993 · on Oct 13, 2024

  > But if that sentence runs over e.g. 80 characters,
  >  you still need to decide
  >  whether you're going to hard wrap or soft wrap that sentence.

No I don't. Semantic wrapping all the way.

MrJohz · on Oct 13, 2024

    > This is a sentence that includes the word "Lopadotemachoselachogaleokranioleipsanodrimhypotrimmatosilphiokarabomelitokatakechymenokichlepikossyphophattoperisteralektryonoptekephalliokigklopeleiolagoiosiraiobaphetraganopterygon" in it. 
    > How should it be wrapped semantically?

This is a pathological case to demonstrate how semantic wrapping does not by itself solve the "hard vs soft" wrapping question. If the answer is that the word should remain as a single word, then you are using soft wraps (or no wraps at all). If the answer is that the word should be split into 80-character chunks, then you're using hard wraps.
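Python's standard `textwrap` module happens to expose exactly this choice as its `break_long_words` flag. The following is a minimal sketch. It assumes an 80-column limit and uses the word with any soft hyphens stripped.

```python
import textwrap

# The word from the comment above, with any soft hyphens removed.
WORD = ("Lopadotemachoselachogaleokranioleipsanodrimhypotrimmatosilphiokarabo"
        "melitokatakechymenokichlepikossyphophattoperisteralektryonoptekephallio"
        "kigklopeleiolagoiosiraiobaphetraganopterygon")
sentence = f'This is a sentence that includes the word "{WORD}" in it.'

# Soft: never split the word; its line simply overflows (an editor would
# wrap it visually at display time instead).
soft = textwrap.wrap(sentence, width=80, break_long_words=False)

# Hard: chop the word so that no stored line exceeds 80 characters.
hard = textwrap.wrap(sentence, width=80, break_long_words=True)

for name, lines in (("soft", soft), ("hard", hard)):
    print(name, [len(line) for line in lines])
```

With `break_long_words=False`, the quoted word gets a 185-character line to itself. With `True`, it is cut into pieces so that no line exceeds 80 characters.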

a1369209993 · on Oct 13, 2024

> How should it be wrapped semantically?

I have no idea what the semantics of that word are, which is information that is required in order to properly semantically wrap it. (Inherently, since conveying such semantics is one of the major points of semantic wrapping.)

However, you included embedded control characters (C2 AD aka 'SOFT HYPHEN'; below replaced with '-') that encode less semantic information than is necessary for proper semantic wrapping, but not none:

Lopado-temacho-selacho-galeo-kranio-leipsano-drim-hypo-trimmato-silphio-karabo-melito-katakechy-meno-kichl-epi-kossypho-phatto-perister-alektryon-opte-kephallio-kigklo-peleio-lagoio-siraio-baphe-tragano-pterygon.

Web browsers use that information to do poor-quality semantic wrapping automatically, whereas actual hard or soft[0] wrapping would produce something like:

  Lopadotemachoselachogaleokranioleipsanod-
  rimhypotrimmatosilphiokarabomelitokatake-
  chymenokichlepikossyphophattoperisterale-
  ktryonoptekephalliokigklopeleiolagoiosir-
  aiobaphetraganopterygon.

Which looks like the following from a partly-semantically-aware perspective:

Lopado-temacho-selacho-galeo-kranio-leipsano-d[BREAK]rim-hypo-trimmato-silphio-karabo-melito-katake[BREAK]chy-meno-kichl-epi-kossypho-phatto-perister-ale[BREAK]ktryon-opte-kephallio-kigklo-peleio-lagoio-sir[BREAK]aio-baphe-tragano-pterygon.
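To make the comparison concrete, the sketch below does three things. It rebuilds the naive 40-column wrap, checks where its cuts fall relative to the soft hyphens, and then runs a greedy wrap that may only break at soft hyphens. The 40-column width comes from the example. The function names are hypothetical. `shy_wrap` is only a rough approximation of the break-at-soft-hyphen behaviour described for browsers, not any browser's actual line-breaking algorithm.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN (UTF-8 bytes C2 AD)

# The word, split at the soft hyphens from the comment above.
SEGMENTS = ["Lopado", "temacho", "selacho", "galeo", "kranio", "leipsano",
            "drim", "hypo", "trimmato", "silphio", "karabo", "melito",
            "katakechy", "meno", "kichl", "epi", "kossypho", "phatto",
            "perister", "alektryon", "opte", "kephallio", "kigklo", "peleio",
            "lagoio", "siraio", "baphe", "tragano", "pterygon."]
shy_word = SHY.join(SEGMENTS)
plain = shy_word.replace(SHY, "")


def allowed_breaks(shy_text: str) -> set[int]:
    """Offsets in the visible text where a soft hyphen permits a break."""
    offsets, pos = set(), 0
    for piece in shy_text.split(SHY)[:-1]:
        pos += len(piece)
        offsets.add(pos)
    return offsets


def naive_wrap(text: str, width: int) -> list[str]:
    """Non-semantic hard wrap: cut every `width` characters, append '-'."""
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    return [c + "-" for c in chunks[:-1]] + chunks[-1:]


def shy_wrap(shy_text: str, width: int) -> list[str]:
    """Greedy wrap that may only break at soft hyphens. A line can still
    overflow if a single segment is wider than `width`."""
    lines, current = [], ""
    for piece in shy_text.split(SHY):
        # The +1 reserves room for the '-' shown when a line is broken.
        if current and len(current) + len(piece) + 1 > width:
            lines.append(current + "-")
            current = piece
        else:
            current += piece
    return lines + [current]


print("\n".join(naive_wrap(plain, 40)))   # same lines as the wrapped block above
cuts = set(range(40, len(plain), 40))      # {40, 80, 120, 160}
print(cuts & allowed_breaks(shy_word))     # set(): no cut lands on a soft hyphen
print("\n".join(shy_wrap(shy_word, 40)))
```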

The fact that you included soft hyphens rather concedes the point that hard and soft[0] wrapping is incorrect[1].

0: Or rather, non-semantic, which is what we're actually arguing over. Technically, semantic wrapping is a subset of hard wrapping, but it's a specific subset that isn't what is expressed by just saying "hard wrapping". Kind of like how birds aren't what anyone means when they just say "dinosaurs".

1: Granted, to be fair, a lot of the time we just don't care. But (contra your original comment) we never need to resort to non-semantic wrapping; we just sometimes (often) decide to be lazy because it doesn't matter.

MrJohz · on Oct 14, 2024

I think this is a valid approach to semantic wrapping, but I don't think this is the only one, and specifically I think it has significant flaws: (1) We've lost grepability unless I write rather complex regexes to handle the possible places where hard line breaks may have been added. (2) We've lost diffability in the sense that if I correct a typo in the word, that correction can cascade through the word and cause multiple lines to show up as changed in the diff when semantically only one part of one word has changed.
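Both objections can be reproduced with a small sketch. As an assumption for illustration, it uses a fixed 40-column hard wrap of the long word as a stand-in for hard wrapping. It first shows that a plain substring search misses a fragment that spans a break, unless the regex tolerates a `-\n` between every pair of characters. It then counts how many lines a line-based diff (`difflib.ndiff`) reports as changed after fixing a one-letter typo. The typo itself is hypothetical.

```python
import difflib
import re

WORD = ("Lopadotemachoselachogaleokranioleipsanodrimhypotrimmatosilphiokarabo"
        "melitokatakechymenokichlepikossyphophattoperisteralektryonoptekephallio"
        "kigklopeleiolagoiosiraiobaphetraganopterygon")


def hard_wrap(text: str, width: int = 40) -> list[str]:
    """Illustrative fixed-width hard wrap with a '-' at every cut."""
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    return [c + "-" for c in chunks[:-1]] + chunks[-1:]


def changed_lines(old: list[str], new: list[str]) -> int:
    """How many old lines a line-based diff reports as removed/changed."""
    return sum(1 for line in difflib.ndiff(old, new) if line.startswith("- "))


# (1) Grepability: the fragment spans a cut, so a plain search misses it...
wrapped = "\n".join(hard_wrap(WORD))
needle = "melitokatakechymeno"
print(needle in wrapped)                                   # False
# ...unless the pattern allows an optional "-\n" between every character.
tolerant = re.compile("(?:-\n)?".join(map(re.escape, needle)))
print(tolerant.search(wrapped) is not None)                # True

# (2) Diffability: fix a one-letter typo near the start of the word.
typo = WORD.replace("selacho", "selachho", 1)              # hypothetical typo
print(changed_lines(hard_wrap(typo), hard_wrap(WORD)))     # 5: every line
print(changed_lines([typo], [WORD]))                       # 1: one soft-wrapped line
```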

Instead, I would prefer a soft semantic wrap: if a single semantic unit (be that a word, a clause, or whatever else) extends beyond, say, 80 characters, we keep it on the same line and let the editor/file viewer handle wrapping. This means that we maintain grepability over words and semantically-connected phrases, and we maintain diffability by avoiding the hard-wrap cascade. To me, this is a much more useful version of semantic wrapping, because it only breaks lines at clause boundaries, not at every possible semantically valid break point.

My goal here isn't to convince you that this version is better than your version of semantic wrapping, only that wrapping based on semantics is an orthogonal concept to hard and soft wrapping, and that even if we choose to take a semantic wrapping approach, we still need to decide what to do with particularly long lines.

(Although I will add to this: I had a colleague who was a deep fan of semantic wrapping, and I just never really got it. I used it for a couple of years, but I've never run into issues with simply soft-wrapping everything. When inserting new clauses or changing text in the middle of a line, every diff tool that I've used has been able to accurately identify which portion of a given paragraph has changed and highlight it. Meanwhile, as a writer and reader, I need to put more effort into reading prose that is written in an odd, stylised format that is very different from the intended paragraph structure. I can see the argument that I've accepted semantic line breaks in code or configuration files, so I should be able to handle it in markdown, but I just find it harder to read and more irritating to write. But assuming someone does want to use semantic line breaks, I still believe that that's an orthogonal choice to deciding between hard and soft wrapping.)
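The claim about diff tools is easy to check. Word-level diff modes (git's `--word-diff`, for example) tokenize a changed line and report only the words that differ. Below is a minimal Python sketch of the same idea using `difflib.SequenceMatcher`, with made-up example sentences.

```python
import difflib


def changed_words(old: str, new: str) -> list[tuple[str, str, str]]:
    """Word-level diff of a single soft-wrapped paragraph line.

    Returns (operation, old words, new words) for each non-equal span.
    """
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(op, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for op, i1, i2, j1, j2 in matcher.get_opcodes() if op != "equal"]


old = "The quick brown fox jumps over the lazy dog, then naps in the sun."
new = "The quick brown fox jumps over the very lazy dog, then naps in the sun."
print(changed_words(old, new))  # [('insert', '', 'very')]
```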

a1369209993 · on Oct 14, 2024

> Instead, I would prefer a soft semantic wrap

So would I, but...

> if a single semantic unit (be that a word, a clause, or whatever else) extends beyond, say, 80 characters, we keep it on the same line and let the editor/file viewer handle wrapping.

...the editor can't do that because it doesn't understand the semantics.

> that wrapping based on semantics is an orthogonal concept to hard and soft wrapping

Yes, that's why I've been saying "hard and/or soft [but in either case nonsemantic] wrapping".

> > > With semantic wrapping you put each sentence (or similar) on a new line [...] But if that sentence runs over e.g. 80 characters, [then...]

... You don't need to fall back on non-semantic wrapping; you can just keep breaking it up into smaller and smaller semantically-meaningful pieces.

(You have to do that 'hard'-ly because the editor doesn't understand the semantics, but that's not "decid[ing] whether you're going to hard wrap or soft wrap", it's being forced to hard wrap as an implementation detail because that's what results in correct wrapping.)

It might not be worth the effort to do that, but you're never forced not to (given not-pathologically-short line length limits like 20 characters).
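One way to read "keep breaking it up into smaller and smaller semantically-meaningful pieces" as an algorithm is to try the coarsest split first (between sentences). Finer splits (clauses, then comma-separated phrases, then words) are applied only to pieces that still don't fit. This sketch rests on loud assumptions. The regexes are crude, hypothetical stand-ins for real semantic boundaries, which the editor can't detect. The word-level fallback packs words greedily rather than putting each word on its own line.

```python
import re
import textwrap

# Hypothetical stand-ins for semantic boundaries, coarsest first. Real
# semantics need a human (or a real parser); these regexes only approximate.
LEVELS = [
    r"(?<=[.!?])\s+",  # between sentences
    r"(?<=[;:])\s+",   # between clauses joined by ';' or ':'
    r"(?<=,)\s+",      # between comma-separated phrases
]


def semantic_break(text: str, width: int, level: int = 0) -> list[str]:
    """Split at the coarsest level that makes each piece fit `width`,
    descending to finer levels only for pieces that are still too long."""
    if len(text) <= width:
        return [text]
    if level == len(LEVELS):
        # Assumed last resort: pack whole words greedily, never split one.
        return textwrap.wrap(text, width, break_long_words=False,
                             break_on_hyphens=False)
    lines = []
    for piece in re.split(LEVELS[level], text):
        lines.extend(semantic_break(piece, width, level + 1))
    return lines


s = "Short one. This sentence, which has an aside, is too long; it also has a second clause."
print("\n".join(semantic_break(s, 30)))
```

Note that the second sentence is split at its semicolon first, and only its overlong first clause is then split further at commas. The second clause fits, so it is left whole.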

MrJohz · on Oct 14, 2024

Hmm, I think we have different definitions of a semantic line wrap. To me, semantic line breaks means that line breaks are used to separate clauses and sentences, such that at least every sentence is on its own line, and every line break represents a semantic clause or sentence gap.

To you, I get the impression that semantic wraps are about ensuring that every wrap/line break happens at a semantically valid place, where semantically valid could be a semantically valid clause, but also a semantically valid intra-word line break.

In that sense, I can see how your strategy would produce the same effects as hard wrapping, albeit with different choices about where to put the wraps. But I think then, like I said, you end up running into the same difficulties that you do with conventional hard wrapping, at least in pathological cases.

a1369209993 · on Oct 14, 2024

> such that at least every sentence is on its own line

Yes, with the obvious possible exception of trivial/degenerate cases like "i++; j--;" in C or "This is a cat. That is a dog." in English.

> and every line break represents a semantic clause or sentence gap.

Specifically, it represents a maximally coarse semantic gap, drilling as shallowly down into subclauses as possible/practical.

> wrap/line break [can happen at ...] also a semantically valid intra-word line break.

Preferably only if that word would already be alone on its overly-long line. E.g.:

  # bad, breaks subordinate clause before superordinate
  That sounds supercalifragilistic-
    expialidocious.
  
  # semantically valid, but ugly (a pathological case)
  That sounds
    supercalifragilisticexpialidocious.
  
  # vertically larger, but probably fine
  # (unless you're feeling incunabulum-y[0])
  That sounds
    supercalifragilistic-
    expialidocious.
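The rule "only if that word would already be alone on its overly-long line" can be sketched as a fallback. A coarser pass calls it only for a word that it has already moved onto its own indented line. That is what avoids the first ("bad") layout. The soft-hyphen break point, the two-space indent, the 25-column limit and the function name are all assumptions, chosen so that the output reproduces the third layout.

```python
SHY = "\u00ad"  # soft hyphen: an invisible, permitted break point


def wrap_long_word(word: str, width: int, indent: str = "  ") -> list[str]:
    """Break a word that already sits alone on an indented continuation
    line, only at its soft hyphens, and only if it doesn't fit whole."""
    visible = word.replace(SHY, "")
    if len(indent) + len(visible) <= width:
        return [indent + visible]  # fits as-is: no intra-word break
    lines, current = [], ""
    for piece in word.split(SHY):
        # +1 reserves room for the '-' shown at each break.
        if current and len(indent) + len(current) + len(piece) + 1 > width:
            lines.append(indent + current + "-")
            current = piece
        else:
            current += piece
    return lines + [indent + current]


head = "That sounds"
word = "supercalifragilistic" + SHY + "expialidocious."
print("\n".join([head] + wrap_long_word(word, width=25)))
# That sounds
#   supercalifragilistic-
#   expialidocious.
```

With a wider limit (e.g. `width=40`), the function returns the word whole on its own line, which gives the second layout instead.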

> you end up running into the same difficulties that you do with conventional hard wrapping, at least in pathological cases.

I've yet to see any evidence that really pathological cases exist. (As opposed to "I'm lazy and can't be arsed" cases, which I'm fairly explicitly not disputing.)

0: http://code.jsoftware.com/wiki/Essays/Incunabulum

a1369209993 · on Oct 14, 2024

> given not-pathologically-short line length limits like 20 characters

Poor phrasing; 20 characters was meant as an example of a limit that is pathologically short.