Good times. I remember riding our bikes to Toys 'R' Us of all places to buy the game with a buddy. Pedaled back, played through the Orc campaign until 4 a.m. One of my all-time favorites.
Well, good news: these days there's another layer. "Not even GPT-4-level LLM" bots that frustrate you into giving up by circling back to the FAQs over and over.
Library/API conflicts are usually the biggest pain point for me, especially breaking changes. RLlib (currently 2.41.0) and Gymnasium (currently 0.29.0+) have sent me in circles many times because they tend to be out of sync (for multi-agent environments).
My go-to test now is a simple hello-world-type card game like War: competitive multi-agent with RLlib and Gymnasium (PettingZoo tends to cause even more issues).
Claude Sonnet 4.5 was eventually able to figure out a way to resolve it (around 7 fixes), and I let it create an rllib.md with all the fixes and pitfalls; I'm curious whether feeding that file to the next experiment will lead to a one-shot. GPT-5 struggled more, but I haven't tried Codex on this yet, so it's not exactly a fair comparison.
All done with Copilot in agent mode, just prompting, no specs or anything.
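For context, here's a minimal sketch of the kind of environment I mean, assuming RLlib 2.x's MultiAgentEnv and Gymnasium spaces. The class name, observation encoding, and reward scheme are made up for illustration, and the exact spaces attributes (per-agent observation_spaces dicts vs. a single Dict space) are precisely the bits that tend to shift between versions:

  # Minimal two-player "War"-style env sketch for RLlib's multi-agent API.
  # Assumes the newer API stack (agents/possible_agents plus per-agent space dicts);
  # older RLlib versions expect self._agent_ids and Dict spaces instead.
  import random

  import gymnasium as gym
  import numpy as np
  from ray.rllib.env.multi_agent_env import MultiAgentEnv

  class WarCardEnv(MultiAgentEnv):
      def __init__(self, config=None):
          super().__init__()
          self.agents = self.possible_agents = ["player_0", "player_1"]
          # Observation: the agent's own card value (2..14); the single action is a dummy "flip".
          self.observation_spaces = {
              a: gym.spaces.Box(low=2, high=14, shape=(1,), dtype=np.float32)
              for a in self.agents
          }
          self.action_spaces = {a: gym.spaces.Discrete(1) for a in self.agents}
          self.max_rounds = 26

      def _deal(self):
          self.cards = {a: random.randint(2, 14) for a in self.agents}
          return {a: np.array([self.cards[a]], dtype=np.float32) for a in self.agents}

      def reset(self, *, seed=None, options=None):
          self.round = 0
          return self._deal(), {}

      def step(self, action_dict):
          p0, p1 = self.cards["player_0"], self.cards["player_1"]
          rewards = {
              "player_0": float(p0 > p1) - float(p0 < p1),
              "player_1": float(p1 > p0) - float(p1 < p0),
          }
          self.round += 1
          done = self.round >= self.max_rounds
          return self._deal(), rewards, {"__all__": done}, {"__all__": False}, {}

Wiring this into an algorithm config (policy mapping, multi-agent settings) is where the version mismatches usually bite.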
Really happy to see this and will give it a good spin. They seem to be doing things the right way in my subjective opinion:
""" To implement this filter, we begin by ranking URL domains according to the volume of
texts they contribute to the FineWeb (Penedo et al., 2024a) and FineWeb-2 (Penedo et al.,
2025) corpus, as an approximation of web-level English and multilingual data. From this
ranking, we select the top one million English domains and the top one million non-English
domains. Due to domain overlap and the fact that some sites are now offline, the total
number of accessible robots.txt files is smaller than two million. For each domain that
remains reachable, we retrieve its robots.txt file as of January 2025 and examine the
directives relevant to AI training. In particular, we focus on those targeting the AI-specific
user agents listed in Appendix A. Any contents blocked by the current robots.txt is
removed retroactively from the entire 2013-2024 range of the training dataset. We follow
an opt-out policy, that is, if the corresponding robots.txt files are not available, we
consider the data usable for training. The filtering process results in an estimated token
loss of approximately 8% in English data and 4% in multilingual data.
"""
> Any contents blocked by the current robots.txt is removed retroactively from the entire 2013-2024 range of the training dataset
Why not check historical versions of the robots.txt (e.g. on archive.org) and limit the retroactive cutoff to a certain date range, parsing the robots.txt accordingly? That might increase the corpus size while staying within legal and fair-use boundaries.
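If one wanted to try that, the Wayback Machine's public availability API would be a natural starting point. A rough sketch, where the endpoint and JSON shape follow archive.org's documented availability API but the function name and date handling are made up for illustration:

  # Sketch: fetch the robots.txt snapshot closest to a given date via the Wayback
  # Machine, so opt-outs could be applied per time period instead of retroactively.
  import json
  from urllib.parse import urlencode
  from urllib.request import urlopen

  def historical_robots_txt(domain: str, timestamp: str) -> str | None:
      # Returns the archived robots.txt nearest to `timestamp` (YYYYMMDD), if any.
      query = urlencode({"url": f"{domain}/robots.txt", "timestamp": timestamp})
      with urlopen(f"https://archive.org/wayback/available?{query}") as resp:
          snapshot = json.load(resp).get("archived_snapshots", {}).get("closest")
      if not snapshot or not snapshot.get("available"):
          return None
      with urlopen(snapshot["url"]) as resp:
          return resp.read().decode("utf-8", errors="replace")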
Common Crawl already respects the CCBot opt-out every time they do a crawl.
We went a step further because back then (2013 is our oldest training data) LLMs did not exist, so website owners opting out of AI crawlers today might like the option to also remove their past content.
Arguments can be made either way, but we tried to remain on the cautious side at this point.
We also wrote a paper on how this additional removal affects the downstream performance of the LLM: https://arxiv.org/abs/2504.06219 (it affects it surprisingly little).
I'd say take and share. It seems like people these days value pictures not as a snapshot for themselves (memory) but rather as a snapshot to show themselves to others (projection). Or at least there has been some sort of shift.
I worked at one. The BCorp label seemed to do a lot of good in establishing organisational culture and attracting people who were a good fit. The organisation did (and still does) a lot of good.
For a bit more color here: a B Corp designation really is just a marketing tool. Despite what the name implies, it's not some special corporate structure; it's just a certification you pay some company for, plus a pinky promise that you'll be good.
Nonprofits and public benefit corporations at least have some "teeth" to them: they both allow you (in different ways) to do things that aren't directly in line with your fiduciary duties, and that single-minded money chasing is what incentivizes a lot of "bad" corporate behavior.
I saw on Reddit that you already reached out to some people in the OSS space who might have the legal expertise. This actually seems like a very relevant case to me. If a trademark is granted to an open source project, it seems ridiculous to me to apply market-based use criteria.
Tbh, use should already be satisfied by having a GitHub repo or a website and using the registered name.
A lot of posts on HN are about things that should have happened already. Every few days there is a story about a person doing something pretty boring and standard, but they can't because a payment processor or large regulatory body got involved and the computer went "boop boop" and now someone can't have money or continue to invent things. Sorry, pull the slot machine again and see if you get lucky?
I think chess commentators are pretty lost when analyzing games of higher-rated players without engines.
They are good at framing what is going on, going over general plans, and walking through some calculations and potential tactics. But I wouldn't say even really strong players like Leko, Polgar, or Anand would have much greater insight into a Magnus-Fabi game without the engine.