Hacker News | thesephist's comments

Reminds me a lot of Infinite Craft (https://neal.fun/infinite-craft/).

The generative part of language models can make for really fun "single-player" games where you're really competing with the inventiveness of the language model, so there's some sense that you're playing a game with infinite hidden complexity.


the latent space is so deep, you may never run out of possibilities...


I appreciate that the site tells you when you create something that's never been discovered before. Especially when it happens with what you thought were pretty common combinations.


yes you feel very creative and cool! (and slightly disappointed in others for not having come up with this yet!)


Thanks for sharing, Simon! I will note that by training an adapter layer between this autoencoder's embedding space and OpenAI's, it's possible to recover a significant amount of detail from text-embedding-ada-002's embeddings with this model too[0]. But as the paper author's reply in a different thread points out, their iterative refinement approach recovers much more detail with a smaller model.

[0] https://twitter.com/thesephist/status/1698095739899974031
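To give a concrete flavor of the adapter idea: the sketch below fits a simple linear map between two embedding spaces using least squares over paired embeddings of the same texts. This is my own minimal illustration with synthetic data standing in for real model outputs, not the actual adapter from the tweet above (which may well have been nonlinear or trained differently).

```python
import numpy as np

def fit_linear_adapter(source_embs: np.ndarray, target_embs: np.ndarray) -> np.ndarray:
    """Fit W so that source_embs @ W ~= target_embs (ordinary least squares)."""
    W, *_ = np.linalg.lstsq(source_embs, target_embs, rcond=None)
    return W

# Toy demo: random "paired" embeddings stand in for, e.g., ada-002
# embeddings (source) and the autoencoder's embeddings (target).
rng = np.random.default_rng(0)
true_map = rng.normal(size=(64, 128))   # hidden ground-truth relationship
src = rng.normal(size=(1000, 64))       # source-space embeddings
tgt = src @ true_map                    # target-space embeddings
W = fit_linear_adapter(src, tgt)
err = np.abs(src @ W - tgt).max()       # near zero when the relation is linear
```

In practice the two spaces are related only approximately, so the adapter gives a lossy bridge; the point is just that paired embeddings of the same texts are enough supervision to learn the mapping.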


Definitely possible if supported by transformers.js. If I see enough folks wanting it I'll likely add it at some point.


This is definitely an interesting idea I've also pondered before. In my experience (just speaking from intuition), what's "easy" for LMs to predict often doesn't line up with our human expectations for what's "obvious". LLMs will often learn seemingly "low information content" statistical correlations that just help them lower their training loss.


Hey HN! Author here.

For the many Obsidian users here, I wanted to share an Obsidian demo/plugin I saw recently by Justin Smith[0] that I think faithfully carries a lot of what I liked about this idea over into Obsidian, complete with a semantic index powered by language models.

If you're an Obsidian user, do check out the demo. I can't take credit for any part of building it, but it's really cool to see the idea in action :)

[0] https://twitter.com/justindsmith/status/1679978286955532296


Excellent! This was my first question upon reading: whether it could be integrated into my Obsidian database.


Hey. Is the source for Notation available? I fiddled around with CodeMirror to build my own notes app, and I'm curious about your implementation and would love to dig into it.


Hey HN! Fun to see this project from my college years on HN again. To clear my name, I have actually written other languages that are arguably more useful, like Oak [0] which I used to build a bunch of personal software infrastructure [1] including my current daily-driver apps for notes, longform writing, a Twitter client, a personal search engine, and a few other knickknacks.

I find building little toy interpreters and programming languages fantastically fulfilling, creatively energizing, and a really fun way to learn how a tool we use constantly, but often treat as a black box, works inside. I've written before about how to get started [2] with these kinds of little toy programming language projects if you're curious to dive in.

[0] https://oaklang.org/

[1] https://github.com/topics/oaklang

[2] https://thesephist.com/posts/pl/
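To show just how small such a toy can start, here's a minimal sketch of an arithmetic-expression interpreter: tokenize with a regex, then parse and evaluate by recursive descent with two precedence levels. This is my own illustrative example, not Oak's (or any other language's) actual implementation.

```python
import re

def tokenize(src: str):
    # Integers, the four operators, and parentheses.
    return re.findall(r"\d+|[+\-*/()]", src)

def evaluate(src: str) -> int:
    tokens = tokenize(src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def atom():
        # A number or a parenthesized sub-expression.
        tok = eat()
        if tok == "(":
            val = expr()
            eat()  # consume the closing ")"
            return val
        return int(tok)

    def term():
        # Higher precedence: * and /
        val = atom()
        while peek() in ("*", "/"):
            op = eat()
            rhs = atom()
            val = val * rhs if op == "*" else val // rhs
        return val

    def expr():
        # Lower precedence: + and -
        val = term()
        while peek() in ("+", "-"):
            op = eat()
            rhs = term()
            val = val + rhs if op == "+" else val - rhs
        return val

    return expr()

result = evaluate("2 + 3 * (4 - 1)")  # → 11
```

From here, a real toy language adds variables, functions, and a proper AST, but the tokenize/parse/evaluate skeleton stays recognizably the same.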


You know the documentation is good when I can hand it to ChatGPT and it spits out a fully functional program for me after two small tweaks (I had to tell it to add RUMOR HAS IT and to not chain operations).

Write a program in Tabloid that takes 5 numbers and returns the product of the first three numbers plus the product of the last two.

Run it with the numbers 4,5,7,8,9.

----> ChatGPT response:

    DISCOVER HOW TO product_with_five_numbers WITH a, b, c, d, e
    RUMOR HAS IT
        EXPERTS CLAIM ab_product TO BE a TIMES b
        EXPERTS CLAIM first_product TO BE ab_product TIMES c
        EXPERTS CLAIM de_product TO BE d TIMES e
        EXPERTS CLAIM result TO BE first_product PLUS de_product
        SHOCKING DEVELOPMENT result
    END OF STORY

    EXPERTS CLAIM input_a TO BE 4
    EXPERTS CLAIM input_b TO BE 5
    EXPERTS CLAIM input_c TO BE 7
    EXPERTS CLAIM input_d TO BE 8
    EXPERTS CLAIM input_e TO BE 9
    EXPERTS CLAIM final_result TO BE product_with_five_numbers OF input_a, input_b, input_c, input_d, input_e

    YOU WON'T WANT TO MISS 'The result is'
    YOU WON'T WANT TO MISS final_result
    PLEASE LIKE AND SUBSCRIBE
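For anyone squinting at the Tabloid: the program above boils down to (a TIMES b TIMES c) PLUS (d TIMES e). A quick Python translation (my own, not ChatGPT's):

```python
def product_with_five_numbers(a, b, c, d, e):
    # Mirrors the Tabloid function: product of the first three
    # numbers plus the product of the last two.
    return a * b * c + d * e

result = product_with_five_numbers(4, 5, 7, 8, 9)
# 4*5*7 = 140 and 8*9 = 72, so result is 212
```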


This project is from a couple of years ago. Sibling comment is correct that I'm working now.


I wonder if a hacky solution might be to have some kind of intermediate model serialize the text (whether from an image of it or the raw PDF data) into LaTeX. I imagine the LM has seen enough formulas in TeX to understand them, but in most PDFs formulas are just jumbles of letters.


That's interesting! I think GPS falls in the category of "works so reliably that I don't perceive it as technology" for me. But yes, GPS is an underrated piece of infrastructure, even more impressive as it's globally available (edit: as in, US tech infra made available beyond the US).


If you want to appreciate GPS, check out this article and explanation of how it works. It's been posted to this website a few times now: https://ciechanow.ski/gps/


What a fantastic deep dive! Thanks for (re-)posting this; I'd never stumbled upon it here before.


Atomic clocks! In space! That's the epitome of modern technology for me.


Thanks for the repo/issues link! Didn't know those resources existed.

Looks like the immediate issues got taken care of thanks to the HF team, but I'll probably pop over there in the future if I have ideas or notice things that can be improved.

