The generative part of language models can make for really fun "single-player" games where you're competing with the inventiveness of the model itself, so there's a sense that you're playing a game with infinite hidden complexity.
I appreciate that the site tells you when you create something that's never been discovered before. Especially when it happens with what you thought were pretty common combinations.
Thanks for sharing, Simon! I'll note that by training an adapter layer between this autoencoder's embedding space and OpenAI's, it's possible to recover a significant amount of detail from text-embedding-ada-002's embeddings with this model too [0]. But as the paper authors point out in a reply in another thread, their iterative refinement approach recovers much more detail with a smaller model.
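In case it's useful, here's roughly what that adapter setup looks like. The latent dimension, the single linear layer, and the placeholder data below are all illustrative assumptions, not my exact configuration:

import torch
import torch.nn as nn

# Learn a map from ada-002's embedding space (1536 dims) into the
# autoencoder's latent space. LATENT_DIM is a placeholder; substitute
# the real latent size and your own paired embeddings.
ADA_DIM, LATENT_DIM = 1536, 512

adapter = nn.Linear(ADA_DIM, LATENT_DIM)
optimizer = torch.optim.Adam(adapter.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Paired data: the same texts embedded by both models.
ada_embs = torch.randn(10_000, ADA_DIM)        # placeholder
latent_embs = torch.randn(10_000, LATENT_DIM)  # placeholder

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(adapter(ada_embs), latent_embs)
    loss.backward()
    optimizer.step()

# At inference, map an ada-002 embedding through the adapter, then
# through the autoencoder's decoder to reconstruct text.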
This is definitely an interesting idea I've also pondered before. In my experience (just speaking from intuition), what's "easy" for LMs to predict often doesn't line up with our human expectations of what's "obvious". LLMs will often learn seemingly low-information statistical correlations simply because they help lower the training loss.
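If you want to poke at this yourself, one concrete way to operationalize "easy to predict" is per-token loss under a small open model. A quick sketch (gpt2 is just an arbitrary choice here):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tok("The capital of France is Paris.", return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits

# Shift so each position predicts the next token, then take the
# negative log-likelihood of the token that actually came next.
log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
nll = -log_probs.gather(1, ids[0, 1:, None]).squeeze(1)

# Low loss = "easy" for the model, which often won't match human
# intuitions about which tokens carry the most information.
for token, loss in zip(tok.convert_ids_to_tokens(ids[0, 1:]), nll):
    print(f"{token!r}: {loss.item():.2f}")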
For the many Obsidian users here, I wanted to share an Obsidian demo/plugin I saw recently by Justin Smith [0] that I think faithfully carries a lot of what I liked about this idea over into Obsidian land, complete with a semantic index built with language models.
If you're an Obsidian user, do check out the demo. I can't take credit for any part of building it, but it's really cool to see the idea in action :)
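For anyone wondering what a semantic index like that involves mechanically, the core is usually just embeddings plus nearest-neighbor search. A toy sketch of the idea (I'm not claiming this is how Justin's plugin is built; the model choice and note data are stand-ins):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

notes = [
    "Meeting notes from the design review",
    "Ideas for the garden this spring",
    "Reading list: programming languages",
]
# Embed every note once; in a plugin this would be cached and
# refreshed as notes change.
note_embs = model.encode(notes, normalize_embeddings=True)

def search(query, k=2):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = note_embs @ q  # cosine similarity, since embeddings are normalized
    return [notes[i] for i in np.argsort(-scores)[:k]]

print(search("what languages should I learn?"))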
Hey! Is the source for the notation available? I've fiddled around with CodeMirror to build my own notes app, and I'm curious about your implementation and would love to dig into it.
Hey HN! Fun to see this project from my college years on HN again. To clear my name: I have actually written other languages that are arguably more useful, like Oak [0], which I used to build a bunch of personal software infrastructure [1], including my current daily-driver apps for notes, longform writing, a Twitter client, a personal search engine, and a few other knickknacks.
I find building little toy interpreters and programming languages fantastically fulfilling, creatively energizing, and a really fun way to learn how something so central to our work, yet so often a black box, works on the inside. I've written before about how to get started [2] with these kinds of little toy programming language projects if you're curious to dive in.
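If "toy programming language" sounds intimidating, the starting point can be genuinely tiny. Here's an illustrative sketch (not from my posts) of the heart of a tree-walking interpreter, minus the lexer and parser:

def evaluate(node):
    # Numbers evaluate to themselves.
    if isinstance(node, (int, float)):
        return node
    # Everything else is an (operator, left, right) tuple.
    op, left, right = node
    l, r = evaluate(left), evaluate(right)
    if op == "+": return l + r
    if op == "-": return l - r
    if op == "*": return l * r
    if op == "/": return l / r
    raise ValueError(f"unknown operator: {op}")

# (1 + 2) * (10 - 4) == 18
print(evaluate(("*", ("+", 1, 2), ("-", 10, 4))))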
You know the documentation is good when I can hand it to ChatGPT and it spits out a fully functional program for me after two small tweaks (I had to tell it to add RUMOR HAS IT and to not chain operations).
Write a program in Tabloid that takes 5 numbers and returns the product of the first three numbers plus the product of the last two.
Run it with the numbers 4,5,7,8,9.
----> ChatGPT response:
DISCOVER HOW TO product_with_five_numbers WITH a, b, c, d, e
RUMOR HAS IT
EXPERTS CLAIM ab_product TO BE a TIMES b
EXPERTS CLAIM first_product TO BE ab_product TIMES c
EXPERTS CLAIM de_product TO BE d TIMES e
EXPERTS CLAIM result TO BE first_product PLUS de_product
SHOCKING DEVELOPMENT result
END OF STORY
EXPERTS CLAIM input_a TO BE 4
EXPERTS CLAIM input_b TO BE 5
EXPERTS CLAIM input_c TO BE 7
EXPERTS CLAIM input_d TO BE 8
EXPERTS CLAIM input_e TO BE 9
EXPERTS CLAIM final_result TO BE product_with_five_numbers OF input_a, input_b, input_c, input_d, input_e
YOU WON'T WANT TO MISS 'The result is'
YOU WON'T WANT TO MISS final_result
PLEASE LIKE AND SUBSCRIBE
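For what it's worth, the output checks out: 4*5*7 + 8*9 = 140 + 72 = 212.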
I wonder if a hacky solution might be to have some kind of intermediate model serialize the text (whether from an image of it or from the raw PDF data) into LaTeX. I imagine the LM has seen enough TeX formulas to understand them, but in most PDFs, formulas come out as jumbles of letters.
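Concretely, I'm picturing a pipeline like the sketch below, where to_latex() stands in for a hypothetical intermediate model (or a prompted LLM call), not any real library:

import fitz  # PyMuPDF, one common way to pull raw text from a PDF

def to_latex(raw_text: str) -> str:
    # Hypothetical step: a model trained (or prompted) to reconstruct
    # TeX from the jumbled formula text a PDF extractor produces.
    raise NotImplementedError

doc = fitz.open("paper.pdf")
raw = "\n".join(page.get_text() for page in doc)
cleaned = to_latex(raw)  # hand this to the downstream LM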
That's interesting! I think GPS falls into the category of "works so reliably that I don't perceive it as technology" for me. But yes, GPS is an underrated piece of infrastructure, and even more impressive for being globally available (edit: as in, US tech infra made available beyond the US).
If you want to appreciate GPS, check out this article and explanation of how it works. It's been posted to this website a few times now: https://ciechanow.ski/gps/
Thanks for the repo/issues link! Didn't know those resources existed.
Looks like the immediate issues got taken care of thanks to the HF team, but I'll probably pop over there in the future if I have ideas or notice things that can be improved.