It would be interesting to see how hard it would be to walk these models towards general relativity and quantum mechanics.
Einstein’s paper “On the Electrodynamics of Moving Bodies”, which introduced special relativity, was published in 1905. His work on general relativity was published 10 years later, in 1915. The earliest knowledge cutoff of these models is 1913, in between the two relativity papers.
The knowledge cutoffs are also right in the middle of the early days of quantum mechanics, as various idiosyncratic experimental results were being rolled up into a coherent theory.
> It would be interesting to see how hard it would be to walk these models towards general relativity and quantum mechanics.
Definitely. Even more interesting could be seeing them fall into the same traps of quackery, and come up with things like over-the-counter lobotomies and colloidal silver.
On a totally different note, this could be very valuable for writing period-accurate books, screenplays, games, etc.
There's quite a lot of text in pre-Internet daily newspapers, of which there were once thousands worldwide.
When you're looking at e.g. the 19th century, a huge number are preserved somewhere in some library, but the vast majority don't seem to be digitized yet, given the tremendous amount of work involved.
Given how much better newspaper content tends to be than the average internet forum thread, there might actually be quite a decent amount of text. Obviously still nothing compared to the internet, but vastly more than what comes from published books alone. After all, print newspapers were essentially the internet of their day. Oh, and don't forget pamphlets in the 18th century.
And it's a 4B model. I worry that nontechnical users will dramatically overestimate its accuracy and underestimate hallucinations, which makes me wonder how it could really be useful for academic research.
I think not everyone in this thread understands that. Someone wrote "It's a time machine", followed up by "Imagine having a conversation with Aristotle."
> the issue is there is very little text before the internet,
Hm, there is a lot of text from before the internet, but most of it is not on the internet. That creates a weird gap in some circles: people are rediscovering work from pre-1980s researchers that exists only in books that have never been reissued and that virtually no one knows about.
There are no doubt trillions of tokens of general communication in all kinds of languages tucked away in national archives and private collections.
The National Archives of Spain alone have 350 million pages of documents going back to the 15th century, ranging from correspondence to testimony to charts and maps, but only 10% of it is digitized and a much smaller fraction is transcribed. Hopefully, given how good LLMs are getting, they can accelerate the transcription process and open up all of our historical documents as a huge historical LLM dataset.