Seriously, make a browser extension that people can turn on and off (no need to be dishonest here), and pay people to upload their AI chats, and possibly all the other content they view.
If Reddit won't let you scrape, pay people to automatically upload the Reddit comments they view normally.
If Claude cuts you off, pay people to automatically upload their Claude conversations.
I've done a lot of post-training and data collection for post-training.
I think if you're not OpenAI/Anthropic-sized (in which case you can do better), you're not going to get much value out of it.
It's hard to usefully post-train on wildly varied inputs, and post-training is all most people can afford.
There's too much noise to improve things unless you do a bunch of cleaning and filtering that's also somewhat expensive.
If you constrain the task (for example, by using past generations from your own product), you get much further along, though.
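A rough sketch of what that cleaning and task-constraining step could look like, assuming the logs are plain strings. The function name `filter_logs`, the keyword list, and the `min_chars` threshold are all hypothetical, and a real pipeline would need much more than this (language detection, PII scrubbing, quality scoring):

```python
import hashlib


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so near-identical logs hash the same.
    return " ".join(text.lower().split())


def filter_logs(logs, task_keywords, min_chars=40):
    """Keep logs that are long enough, not duplicates, and on-task.

    task_keywords and min_chars are illustrative assumptions, not tuned values.
    """
    seen = set()
    kept = []
    for log in logs:
        text = normalize(log)
        if len(text) < min_chars:
            continue  # too short to carry useful signal
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate after normalization
        if not any(kw in text for kw in task_keywords):
            continue  # off-task for the product we care about
        seen.add(digest)
        kept.append(log)
    return kept


logs = [
    "hi",
    "Write a SQL query that joins orders and customers by id",
    "write a  sql query that joins orders and customers by ID",
    "Tell me a long story about a dragon who lived in the mountains",
]
print(filter_logs(logs, task_keywords=["sql"]))
```

Even a crude keyword gate like this throws away most of a wildly varied corpus, which is the point: what's left overlaps with the task you actually want to post-train on.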
I've thought about building a Chrome plugin to do something useful for ChatGPT web users doing a task relevant to what my product does, then letting them opt into sharing their logs.
That's probably a bit more tenable for most users since they're getting value, and if your extension can do something like produce prompts for ChatGPT, you'll get data that actually overlaps with what you're doing.
There is an a16z company that does exactly this, called yupp.ai. They need genuine labelling/feedback in return, but you get to either spend credits on expensive APIs or cash out. Likewise, OpenRouter has free endpoints from some providers who will retain your sessions for training.
Two things. First, no one wants your AI chat histories. They want to interact with the LLM themselves. Second, their business models break down when they can't steal content to train on. Paying for training data on a large scale is out of the question.
Am I crazy, am I hastening dystopia?