> 1) If your blog posts are private, why are they on publicly accessible websites? Why not put it behind a paywall of some sort?
If I grow apple trees in front of my house and you come and take all the apples, then turn up at my doorstep trying to sell me apple juice made from the apples you nicked, the fact that I chose not to build a tall fence around my trees doesn't mean you had the right to do it. Public content is free for humans to read, not free for corporations to build paid content generation services on, taken without my knowledge or permission.
> 2) How many novels have bibliographies? How many musicians cite their influences? Citing sources is all well and good in academic papers, but there’s a point at which it just becomes infeasible. The more transformative the work, the harder it is to cite inspiration.
You are making this kind of argument: "How much is a drop of gas? Nothing. Right, could you fill my car drop by drop?"
If we have technology that can charge for producing bullshit on an industrial scale by recombining sampled works of others, we are perfectly capable of keeping track of the sources used for training and generative diarrhoea.
> 3) What about libraries? Should they be licensing every book they have in their collections? Should the people who check the books out have to pay royalties to learn from them?
All of these responses were such high quality that there's really nothing to add. I especially like the apple argument about a product in your front yard. You still have no basis to take them from my front yard.
If there were an equivalent of what a lot of other sites have (gems, gold, ribbons), I'd give you one. I've got a lot of gems, and I'll send you an admittedly teeny heliodor, tourmaline, or peridot at cost if you want one. The gemstone market's junk lately with the economy.
You're both just repeating the "you wouldn't download an apple" argument. In the context of the Internet, you're voluntarily sending the user an apple and expecting them to not do various things to it, which is unreasonable. Nothing is taken. If it were, your website would be completely empty.
Remember, Copying Is Not Theft. Copyright law is just a temporary monopoly meant to economically incentivize you. Nothing more.
BTW, pro-AI countries do differentiate between private and public posts. If it's public, it's legally fair game to train on it. If it's private, you need a license to access it. So it does matter. Also see: https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn
> 3) What about libraries? Should they be licensing every book they have in their collections? Should the people who check the books out have to pay royalties to learn from them?
Yes https://www.bl.uk/plr