The sample responses given are fascinating. It seems more difficult than normal to even tell that they were generated by an LLM, since most of us terminally online people have been training our brains' AI-generated-text detection on output from models with a recent training cutoff. Some of the sample responses seem so unlike anything an LLM would say, obviously due to the model's apparent beliefs about certain concepts, but perhaps less obviously due to word choice and sentence structure that make the responses feel slightly 'old-fashioned'.
I used to teach 19th-century history, and the responses definitely sound like a Victorian-era writer. And they of course sound like writing (books and periodicals etc) rather than "chat": as other responders allude to, the fine-tuning or RL process for making them good at conversation was presumably quite different from what is used for most chatbots, and they're leaning very heavily into the pre-training texts. We don't have any living Victorians to RLHF on: we just have what they wrote.
To go a little deeper on the idea of 19th-century "chat": I did a PhD on this period and yet I would be hard-pushed to tell you what actual 19th-century conversations were like. There are plenty of literary depictions of conversation from the 19th century of presumably varying levels of accuracy, but we don't really have great direct historical sources of everyday human conversations until sound recording technology got good in the 20th century. Even good 19th-century transcripts of actual human speech tend to be from formal things like court testimony or parliamentary speeches, not everyday interactions. The vast majority of human communication in the premodern past was the spoken word, and it's almost all invisible in the historical sources.
Anyway, this is a really interesting project, and I'm looking forward to trying the models out myself!
I wonder if the historical format you might want to look at for "chat" is letters? Definitely wordier exchanges, but they at least have the back-and-forth feel, and we often have complete correspondence over long stretches from certain figures.
This would probably get easier towards the start of the 20th century ofc
Good point, informal letters might actually be a better source: AI chat is (usually) a written rather than spoken interaction, after all! And we do have a lot of transcribed collections of letters to train on, although they're mostly from people who were famous or became famous, which certainly introduces some bias.
The question then would be whether to train it to respond to short prompts with longer, correspondence-style "letters", or to leave it up to the user to write a proper letter as a prompt. Now that would be amusing.
Dear Hon. Historical LLM
I hope this letter finds you well. It is with no small urgency that I write to you seeking assistance, believing such an erudite and learned fellow as yourself should be the best one to furnish me with an answer to such a vexing question as this which I now pose to you. Pray tell, what is the capital of France?
While not specifically Victorian, couldn't we learn much from what daily conversations were like by looking at surviving oral cultures, or other relatively secluded communal pockets? I'd also say time and progress are not always equally distributed, and even within geographical regions (as the U.K.) there are likely large differences in the rate of language shifts since then, some possibly surviving well into the 20th century.
Don't we have parliament transcripts? I remember something about Germany (or maybe even Prussia) developing a shorthand system to preserve 1-to-1 what was said.
I mentioned those in the post you’re replying to :)
It’s a better source for how people spoke than books etc, but it’s not really an accurate source for patterns of everyday conversation because people were making speeches rather than chatting.
The time cutoff probably matters, but maybe not as much as the lack of human finetuning from places like Nigeria with somewhat foreign styles of English. I'm not really sure if there is as much of an 'obvious LLM text style' in other languages; it hasn't seemed that way in my limited attempts to speak to LLMs in languages I'm studying.
The model is fine-tuned for chat behavior. So the style might be due to:
- Fine-tuning
- More stylised text in the corpus; English has evolved a lot in the last century.
Diverged as well as standardized. I did some research into "out of pocket" and how it differs in meaning between UK English (paying from one's own funds) and American English (uncontactable), and I recall 1908 being the current thinking on when the divergence happened, via a short story by O. Henry titled "Buried Treasure."
Oh definitely. One thing that immediately caught my eye is that the question asks the model about “homosexual men” but the model starts its response with “the homosexual man” instead, changing the plural to the singular and adding an article. Feels very old-fashioned to me.
The samples push the boundaries of a commercial AI, but they still seem tame and milquetoast compared to common opinions of that era. And the prose doesn't compare. Something is off.