
Who will pay me for my AI chat histories?

Seriously, make a browser extension that people can turn on and off (no need to be dishonest here), and pay people to upload their AI chats, and possibly all the other content they view.

If Reddit won't let you scrape, pay people to automatically upload the Reddit comments they view normally.

If Claude cuts you off, pay people to automatically upload their Claude conversations.

Am I crazy? Am I hastening dystopia?



> Who will pay me for my AI chat histories?

All the chatbots with free access do that: they pay you by running your arbitrary computations on their servers.


But then only one company is paying me for something that is of equal benefit to all.


Then I would simply use AI to generate chat histories and get paid (:


That is not a problem if the price paid is lower than what generating synthetic data of similar size would cost.
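
To make the break-even condition concrete, here is a rough sketch. The function name and the per-million-token prices are made up for illustration; they are not real rates from any buyer or provider.

```python
def faking_profit(tokens: int, payout_per_mtok: float, generation_cost_per_mtok: float) -> float:
    """Dollars a seller would net by submitting LLM-generated chats.

    All prices are hypothetical and expressed per million tokens.
    A negative result means faking the data loses money.
    """
    return tokens / 1_000_000 * (payout_per_mtok - generation_cost_per_mtok)


# Hypothetical numbers: the buyer pays $1 per million tokens,
# generating the same volume costs $3 per million tokens.
print(faking_profit(2_000_000, payout_per_mtok=1.0, generation_cost_per_mtok=3.0))
```

As long as the result stays negative, there is no money in gaming the scheme with generated chats.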


Great point. Verifying the synthetic data also has a cost. I wonder if it is cheaper than generating it?


They could probably pay you based on loss / some similar metric during training.
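
A minimal sketch of what that could look like, assuming the buyer already has a per-conversation loss measured before training on it, and splits a fixed budget in proportion to that loss. The function name, the seller IDs, and the proportional rule are all my own assumptions, not anything a real buyer is known to do.

```python
def payouts_from_losses(losses: dict[str, float], budget: float) -> dict[str, float]:
    """Split a fixed budget across sellers in proportion to measured loss.

    `losses` maps a (hypothetical) seller ID to the model's loss on that
    seller's data before training on it. Higher loss is treated as
    "more novel", which is an assumption: pure noise also scores high.
    """
    total = sum(losses.values())
    if total <= 0:
        # Nothing measurable to reward; pay nobody.
        return {seller: 0.0 for seller in losses}
    return {seller: budget * loss / total for seller, loss in losses.items()}


print(payouts_from_losses({"alice": 1.0, "bob": 3.0}, budget=100.0))
```

The obvious weakness is that gibberish has high loss too, so this would need some filtering in front of it.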


I've done a lot of post-training and data collection for post-training.

I think if you're not OpenAI/Anthropic sized (in which case you can do better), you're not going to get much value out of it.

It's hard to usefully post-train on wildly varied inputs, and post-training is all most people can afford.

There's too much noise to improve things unless you do a bunch of cleaning and filtering that's also somewhat expensive.

If you constrain the task (for example, use past generations from your own product) you get much further along though.

I've thought about building a Chrome plugin to do something useful for ChatGPT web users doing a task relevant to what my product does, then letting them opt into sharing their logs.

That's probably a bit more tenable for most users since they're getting value, and if your extension can do something like produce prompts for ChatGPT, you'll get data that actually overlaps with what you're doing.


There is an a16z company that does exactly this, called yupp.ai. They need genuine labelling/feedback in return, but you get to either spend credits on expensive APIs or cash out. Likewise, OpenRouter has free endpoints from some providers who will retain your sessions for training.


Two things. First, no one wants your AI chat histories. They want to interact with the LLM themselves. Second, their business models break down when they can't steal content to train on. Paying for training data on a large scale is out of the question.



