Hacker News: zrm's comments

Note that those two links are using different configs. Here's the link for Threadripper 9995WX:

https://www.phoronix.com/review/amd-threadripper-9995wx-trx5...

That's using the same config as the server systems (allmodconfig) but it has the 9950X listed there and on that config it takes 547.23 seconds instead of 47.27. That puts all of the consumer CPUs as slower than any of the server systems on the list. You can also see the five-year-old 2.9GHz Zen 2 Threadripper 3990X in front of the brand new top-of-the-range 4.3GHz Zen 5 9950X3D because it has more cores.

You can get a pretty good idea of how kernel compiles scale with threads by comparing the results for the 1P and 2P EPYC systems that use the same CPU model. It's generally getting ~75% faster by doubling the number of cores, and that's including the cost of introducing cross-socket latency when you go from 1P to 2P systems.
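
To put a rough number on that: a ~75% gain from doubling the cores is a 1.75x speedup, which works out to about 87.5% parallel efficiency for the doubling. A minimal sketch of the arithmetic; the build times below are hypothetical placeholders chosen to match the ~75% figure, not numbers from the Phoronix charts:

```python
def doubling_stats(time_1p: float, time_2p: float) -> tuple[float, float]:
    """Return (speedup, efficiency) when going from 1P to 2P with twice the cores."""
    speedup = time_1p / time_2p
    # Perfect scaling would give 2.0x, so efficiency is speedup / 2.
    efficiency = speedup / 2.0
    return speedup, efficiency

# Hypothetical build times in seconds (assumed, not measured).
speedup, efficiency = doubling_stats(175.0, 100.0)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
```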


Oh good catches! I must have grabbed the wrong chart from the consumer CPU benchmark, thanks for pointing out the subsequent errors. The resulting relations do make more sense (clock speed certainly helps, but there is wayyyy less of a threading wall than I had incorrectly surmised).

Here is the corrected link for the 9950X review with allmodconfig instead of defconfig for an equal comparison (I couldn't find the defconfig chart in the server review): https://www.phoronix.com/benchmark/result/amd-ryzen-9-9900x-...


AI training and the thing search engines do to make a search index are essentially the same thing. Hasn't the latter generally been regarded as fair use, or else how do search engines exist?


There was a relatively tiny but otherwise identical uproar over Google even before they added infoboxes that reduced the number of people who clicked through.


There was also the lawsuit against Google over the Google Books project, which is not only very similar to how AI ingests copyrighted material, but went further than AI does by intentionally reproducing word-for-word snippets of those works. Google Books was also found to be fair use.


> There was a relatively tiny but otherwise identical uproar over Google even before they added infoboxes that reduced the number of people who clicked through.

But is that because it isn't fair use or because of the virulent rabies epidemic among media company lawyers?


This was normal people, as much as bloggers on the pre-social media early web could be considered normal.


Normal people that aren't media companies were objecting to search engines indexing websites? That seems more likely to have been media companies using the fact that they're media companies to get people riled up over a thing the company is grumpy about.


I don't think regular people pay attention to copyright decisions (they don't even pay attention to the cases that make it to the Supreme Court) but there are plenty of lawyers who don't work for media companies who disagree with the findings. I also think your characterization is ridiculous and pejorative.


They disagree with search engines being fair use?

The general problem is that both the structure of copyright and the legacy media business model were predicated on copying being a capital-intensive process. If a printing press is expensive then reproduction is a good place to collect royalties, because you could go after that expensive piece of equipment if they don't pay. And if a printing press is expensive then a publisher who has one is offering a scarce service in a market with a high barrier to entry.

The internet made copying free and that pretty well devastated the publishing industry, more as a result of the second one than the first. If your product isn't scarce -- if your news reporting is in competition with every blog and social media post -- you're not getting the same margins you used to. But there's no plausible way the incumbents are going to convince people that reporters with a website instead of a printing press need to be excluded from the market so they can have less competition, and that by itself and nothing more means their traditional business model is gone. They're competing for readers and advertisers against Substack and Reddit and the cat's not going back in the bag.

Meanwhile copyright infringement got way easier and that's much more plausible to frame as a problem, so the companies want to sic their lawyers on it, except that the bag is here on the ground and the cat is still over there getting a million hits. There is no obviously good way to solve it (but plenty of bad ways to not solve it) and solving it still wouldn't put things back the way they were anyway.

So their lawyers are constantly under pressure to do something but none of their options are good or effective which means they're constantly demanding things that are oppressive or asinine or, like the anti-circumvention clause in the DMCA, own-goals that tech megacorps use against content creators to monopolize distribution channels. Which is why it's an epidemic. If you can see the target the pressure is on to pull the trigger even when all you have is a footgun.


>They disagree with search engines being fair use?

No, with LLMs being fair use. I'm not going to respond to the rest of your post, which is a paranoid and pejorative screed based on the fallacy that copyright is predicated on copying being hard or intensive when that was never the case. Copying was always easy. It's the creative part that is hard, and that is why copying was made illegal.


> No, with LLMs being fair use.

In which case you're responding to the wrong thread.

> Copying was always easy.

Compare the price of a physically printed book which is in the public domain to the median one that isn't. The prices are only a little lower because the printing and distribution costs are significant.

Now compare the price of ebooks in the public domain with ebooks still under copyright. The latter isn't 40% more or 75% more, it's a billion percent more. Infinitely more. Copying went from being a double-digit percentage of the price to being zero.


The most important part of fair use is whether it harms the market for the original work. Search helps bring more eyes to the original work; LLMs don't.


The fair use test (in US copyright law) is a 4 part test under which impact on the market for the original work is one of 4 parts. Notably, just because a use has massively detrimental harms to a work's market does not in and of itself constitute a copyright violation. And it couldn't be any other way. Imagine if you could be sued for copyright infringement for using a work to criticize that work or the author of that work if the author could prove that your criticism hurt their sales. Imagine if you could be sued for copyright infringement because you wrote a better song or book on the same themes as a previous creator after seeing their work and deciding you could do it better.

Perhaps famously, emulators very clearly and objectively impact the market for game consoles and computers and yet they are also considered fair use under US copyright law.

No one part of the 4 part test is more important than the others. And so far in the US, training and using an LLM has been ruled by the courts to be fair use so long as the materials used in the training were obtained legally.


> And so far in the US, training and using an LLM has been ruled by the courts to be fair use so long as the materials used in the training were obtained legally.

Just like OpenAI is rightfully upset if their LLM output is used to train a competitor’s model and might seek to restrict it contractually, publishers too may soon have EULAs just for reading their books.


OpenAI's hypocrisy on this matter is precisely why hackers should be taking this as the best opportunity we've had in decades to scale back the massive expansions that Disney et al have managed to place on copyright. But instead of taking advantage of the fact that for once someone with serious funding can go toe to toe with the big publishers, and that in doing so they will be hoist by their own petard, a lot of hackers appear to be circling the wagons and suddenly finding that they think this whole "IP" thing is good actually and maybe we should make copyright even stronger.

Surely making copyright even stronger (and even expanding it to cover style as some have argued in response to the Ghibli style stuff) will have no unintended consequences going forward into a future where more and more technology is locked down by major manufacturers with a strong incentive to use and abuse IP law to prevent competition and open alternatives... right?


GenAI art is like counterfeit goods. If left unchecked it will mostly destroy the market for the original.


That's certainly an argument often made about counterfeit goods, and it can certainly be true in cases (and counterfeiting has other problems, namely confusing the origin of a specific good when that matters to the consumer), but it's also not a universal truth either. Were it a universal truth, that would imply generally that open source can't work because anyone can make and distribute copies of the open material, but also it implies that Windows and macOS should not exist because of all the innumerable Linux clones.

Also instructive is the IBM BIOS clone. It is perhaps true that the "IBM Compatibles" killed the market that existed for IBM machines at that moment in time, but it's also true that they opened whole new markets, to the clone makers and the ancillary businesses, and arguably to IBM themselves.

3D printing and Arduino are probably other examples where "counterfeits" might have shrunk the market for the originals (Prusa is notably reducing how open their designs are, and Arduino themselves are not the healthiest, modulo being owned by Qualcomm now), but the market for Arduino projects and ancillary supplies and certainly the market for 3D printers is massively healthy, and arguably both are healthier than if Arduino or Prusa (or really RepRap) were the single and sole providers of their products.

And I think art has an even stronger bulwark in that a lot of the value of a given "art" comes not from the art itself, but from the artist. It's very possible many famous artists' works were actually made by their apprentices, but until someone proves that, the art will continue to have value as an original work of the artist. But art is also a dime a dozen (or less). The internet is full of free or dirt cheap art and today you can go on Fiverr or Mechanical Turk and commission any number of artworks for probably less than your day's wages. But no one is buying tickets to your Fiverr concert. No one buys $1k per plate dinners at DeviantArt gallery showings. But they will pay many thousands of dollars for a piece of artwork that might destroy itself because the person who produced that artwork is named Banksy.


I don't think they are rightfully upset at all. Yeah, no kidding. Everyone becomes pro rent seeker when it suits them. Which is the exact reason we must rein it in.


I misspoke. I should have written “understandably upset”.


1. Character of the use. Commercial. Unfavorable.

2. Nature of the work. Imaginative or creative. Unfavorable.

3. Quantity of use. All of it. Unfavorable.

4. Impact on original market. Direct competition. Royalty avoidance. Unfavorable.

Just because the courts have not done their job properly does not mean something illegal is not happening.


All of these apply to emulators.

* The use is commercial (a number of emulators are paid access, and the emulator case that carved out the biggest fair use space for them was Connectix Virtual Game Station, a very explicitly commercial product)

* The nature of the work is imaginative and creative. No one can argue games and game consoles aren't imaginative and creative works.

* Quantity of use. A perfect emulator must replicate 100% of the functionality of the system being emulated, oftentimes including BIOS functionality.

* Impact on market. Emulators are very clearly in direct competition with the products they emulate. This was one of Sony's big arguments against VGS. But also just look around at the officially licensed mini-retro consoles like the ones put out by Nintendo, Sony and Atari. Those retro consoles are very clearly competing with emulators in the retro space and their sales were unquestionably affected by the existence of those emulators. Royalty avoidance is also in play here since no emulator that I know of pays licensing fees to Nintendo or Sony.

So are emulators a violation of copyright? If not, what is the substantial difference here? An emulator can duplicate a copyrighted work exactly, and in fact is explicitly intended to do so (yes, you can claim it's about the homebrew scene, and you can look at any tutorial on setting up these systems on YouTube to see that's clearly not what people want to do with them). Most of the AI systems are specifically programmed to not output copyrighted works exactly. Imagine a world where emulators had hash codes for all the known retail ROMs and refused to play them. That's what AI systems try to do.
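
For what it's worth, the hypothetical emulator-side check would look something like the sketch below. This is purely illustrative: the blocklist, the `may_load` helper, and the byte strings standing in for ROM images are all made up, and no emulator I know of does this.

```python
import hashlib

# Hypothetical blocklist: SHA-256 digests of known retail ROM images.
# The bytes here are stand-ins, not real ROM data.
KNOWN_RETAIL_ROMS = {hashlib.sha256(b"retail-rom-image").hexdigest()}

def may_load(rom_bytes: bytes) -> bool:
    """Return False if the image's digest is on the blocklist."""
    return hashlib.sha256(rom_bytes).hexdigest() not in KNOWN_RETAIL_ROMS

print(may_load(b"retail-rom-image"))  # False: exact match with a listed image
print(may_load(b"homebrew-demo"))     # True: not on the list
```

An exact-hash check only catches byte-identical copies, so changing a single byte of the image would get past it.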

Just because you have enumerated the 4 points and given pithy one-word arguments for something illegal happening does not mean that it is. Judge Alsup laid out a pretty clear line of reasoning for why he reached the decision he did, with a number of supporting examples [1]. It's only 32 pages, and a relatively easy read. He's also the same judge that presided over the Oracle v. Google cases that found Google's use of the Java APIs to be fair use despite that also meeting all 4 of your descriptions. Given that, you'll forgive me if I find his reasoning a bit more persuasive than your 52 word assertion that something illegal is happening.

[1]: https://fingfx.thomsonreuters.com/gfx/legaldocs/jnvwbgqlzpw/...


>If not, what is the substantial difference here?

Well they are completely different systems functioning in completely different ways and only looking at one of the four factors isn't doing any favors.


I believe we’re in violent agreement here, because my point was that all 4 aspects are equally important and they need to be evaluated as a whole. And further, that the current legal rulings on these systems delve into each of those parts with much more nuance and care than the provided 56 word surface-level examination of the issues.


It seems like you're responding to a question about training by talking about inference. If you train an LLM because you want to use it to do sentiment analysis to flag social media posts for human review, or Facebook trains one and publishes it and others use it for something like that, how is that doing anything to the market for the original work? For that matter, if you trained an LLM and then ran out of money without ever using it for anything, how would that? It should be pretty obvious that the training isn't the part that's doing anything there.

And then for inference, wouldn't it depend on what you're actually using it for? If you're doing sentiment analysis, that's very different than if you're creating an unlicensed Harry Potter sequel that you expect to run in theaters and sell tickets. But conversely, just because it can produce a character from Harry Potter doesn't mean that couldn't be fair use either. What if it's being used for criticism or parody or any of the other typical instances of fair use?

The trouble is there's no automated way to make a fair use determination, and it really depends on what the user is doing with it, but the media companies are looking for some hook to go after the AI companies who are providing a general purpose tool instead of the subset of their "can't get blood from a stone" customers who are using that tool for some infringing purpose.


> AI training and the thing search engines do to make a search index are essentially the same thing.

Well, AI training has annoyed lots of people. It has overloaded websites and done things just because it can, e.g. Facebook sucking up the content of lots of pirated books.

Since this AI race started, our small website is constantly overrun by bots and is not usable by humans because of the load. We NEVER HAD this problem before AI, when it was only being accessed by search engine indexing.


This is largely because search engines are a concentrated market and AI training is getting done by everybody with a GPU.

If Google, Bing, Baidu and Yandex each come by and index your website, they each want to visit every page, but there aren't that many such companies. Also, they've been running their indexes for years so most of the pages are already in them and then a refresh is usually 304 Not Modified instead of them downloading the content again.

But now there are suddenly a thousand AI companies and every one of them wants a full copy of your site going back to the beginning of time while starting off with zero of them already cached.
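
The difference between a warm and a cold cache is easy to see in a toy model of an HTTP conditional request. This is only a sketch: `origin` and `refresh` are hypothetical helpers that simulate the exchange in memory rather than touching the network, and the ETag value is made up.

```python
def origin(request_headers: dict, page: dict) -> tuple[int, dict, bytes]:
    """Toy origin server: answer 304 if the client's ETag still matches."""
    if request_headers.get("If-None-Match") == page["etag"]:
        return 304, {"ETag": page["etag"]}, b""
    return 200, {"ETag": page["etag"]}, page["body"]

def refresh(cache: dict, page: dict) -> int:
    """Revalidate a cached copy; return the number of body bytes transferred."""
    headers = {}
    if "etag" in cache:
        # A crawler that already has the page sends its validator along.
        headers["If-None-Match"] = cache["etag"]
    status, resp_headers, body = origin(headers, page)
    if status == 200:
        cache["etag"] = resp_headers["ETag"]
        cache["body"] = body
    return len(body)

page = {"etag": '"v1"', "body": b"<html>hello</html>"}
established_crawler = {}
first = refresh(established_crawler, page)   # cold cache: full download
second = refresh(established_crawler, page)  # warm cache: 304, empty body
new_crawler_bytes = refresh({}, page)        # every new crawler starts cold
print(first, second, new_crawler_bytes)
```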

Ironically copyright is actually making this worse, because otherwise someone could put "index of the whole web as of some date in 2023" out there as a torrent and then publish diffs against it each month and they could all go download it from each other instead of each trying to get it directly from you. Which would also make it easier to start a new search engine.
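
As a rough sketch of what "a snapshot plus monthly diffs" could look like, assuming a snapshot is just a mapping from URL to page content (the `make_diff` and `apply_diff` helpers and the example.com URLs are hypothetical, not an existing format):

```python
import hashlib

def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

def make_diff(old: dict, new: dict) -> dict:
    """Diff two snapshots (URL -> content) into changed and removed pages."""
    changed = {url: body for url, body in new.items()
               if url not in old or digest(old[url]) != digest(body)}
    removed = [url for url in old if url not in new]
    return {"changed": changed, "removed": removed}

def apply_diff(old: dict, diff: dict) -> dict:
    """Rebuild the newer snapshot from the older one plus a published diff."""
    snapshot = {u: b for u, b in old.items() if u not in diff["removed"]}
    snapshot.update(diff["changed"])
    return snapshot

base = {"https://example.com/a": b"one", "https://example.com/b": b"two"}
next_month = {"https://example.com/a": b"one", "https://example.com/c": b"three"}
diff = make_diff(base, next_month)
print(sorted(diff["changed"]), diff["removed"])
```

Anyone holding the base snapshot only needs the diff, which here contains one new page and one removal rather than the whole site.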


Weird, AI companies insist that AI models are not just indexes but instead something the model has "learned".

So, again, to answer my question, it's certainly not a settled matter of law that AI models and/or their "training" is actually akin to a search engine such that it amounts to a fair use. So how is it that the EFF is reporting it like a fact?


Google doesn't offer copies of existing websites for its own gain (except that lately they do that as well).


In the US some of it could be tariffs. Micron is a US company with some US fabs but most of theirs are in other countries and Samsung and Hynix are both South Korean.


U.S. tariffs inadvertently kept prices low, due to stockpiling of memory when prices were cheap, before tariffs took effect. As that inventory is depleted, new supply chain purchases are much more expensive and subject to tariffs.


That seems like classic Apple, really.


NTFS writing isn't that inexplicable. NTFS is a proprietary filesystem that isn't at all simple to implement and the ntfs-3g driver got there by reverse engineering. Apple doesn't want to enable something by default that could potentially corrupt the filesystem because Microsoft could be doing something unexpected and undocumented.

Meanwhile if you need widespread compatibility nearly everything supports exFAT and if you need a real filesystem then the Mac and Windows drivers for open source filesystems are less likely to corrupt your data.


Apple is likely to be in a position to negotiate NTFS documentation access with Microsoft for a clean-room implementation, with NDAs and everything.

My money is on Apple not having the will to do that.


I'll take ntfs-3g over the best implementation of exFAT in a heartbeat. Refusing to write to NTFS for reliability purposes, and thereby pushing people onto exFAT, is shooting yourself in the foot.


At which point you're asking why Apple doesn't have default support for something like ext4, which is a decent point.

That would both get you easier compatibility between Mac and Linux and solve the NTFS write issue without any more trouble than it's giving people now because then you'd just install the ext4 driver on the Windows machine instead of the NTFS driver on the Mac.


Is it that easy to use on Windows these days? I should give it a try.


> If Mozilla or Google were to make their code freely available on some git forge like GitHub

https://github.com/mozilla-firefox/firefox

https://github.com/chromium/chromium


It's the same language most of the code in Chrome and Firefox is written in.

It's also not clear what you're looking for in terms of cross-platform support. Some languages provide better standard library support for UI elements, but that's the part a browser will be implementing for itself regardless.


Sure, but those browsers started a long time ago and are implementing some of their newer stack in Rust.


> For example, I regularly order via companies that use Shopify. Now, all of the shopify emails are going straight to spam in Gmail, despite constantly marking them as not spam. (These even pass dmarc/spf/dkim etc, so who knows what's going on here.)

There's a pretty good chance this is because Shopify is sending a lot of email users mark as spam, or is using the same mail server as someone who does. Then you marking them as not spam gives them a better score but the sender's reputation is still so bad that it can't break the threshold to stay out of the spam folder.


I mark them as spam. I only want the real notifications and not the "free goodies", "recap" and "others are interested in" mails.


> multiple world routeable IPv4 addresses

It's pretty rare that you would need more than one.

If you're running different types of services (e.g. http, mail, ftp) then they each use their own ports and the ports can be mapped to different local machines from the same public IP address.

The most common one where you're likely to have multiple public services using the same protocol is http[s], and for that you can use a reverse proxy. This is only a few lines of config for nginx or haproxy and then you're doing yourself a favor because adding a new one is just adding a single line to the reverse proxy's config instead of having to configure and pay for another IPv4 address.
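
Conceptually all the reverse proxy is doing is looking at the Host header and picking a backend on the private network. A minimal sketch of that lookup, with made-up hostnames and private addresses; in practice you'd express the same table in the nginx or haproxy config rather than write it yourself:

```python
# Hypothetical hostnames and private backend addresses.
BACKENDS = {
    "blog.example.com": ("192.168.1.10", 8080),
    "wiki.example.com": ("192.168.1.11", 8080),
}

def pick_backend(host_header: str):
    """Map the Host header of an incoming request to a private backend."""
    # Drop an optional ":port" suffix and compare case-insensitively.
    hostname = host_header.split(":", 1)[0].lower()
    return BACKENDS.get(hostname)

print(pick_backend("blog.example.com"))
print(pick_backend("Wiki.Example.com:443"))
```

Adding another site is one more entry in the table, all behind the same public IP.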

And if you want to expose multiple private services then have your clients use a VPN and then it's only the VPN that needs a public IP because the clients just use the private IPs over the VPN.

To actually need multiple public IPs you'd have to be doing something like running multiple independent public FTP servers while needing them all to use the official port. Don't contribute to the IPv4 address shortage. :)


> https://pjm.adobeconnect.com/p63ultsdb2v/

Apparently my browser does not support some content in the file I'm trying to view and I'm instructed to use, among other things, "Firefox undefined or later". Which may or may not be what I was trying to use to begin with.

Though it seems to work anyway, so okay then.


> "Firefox undefined or later"

Honestly, you should really upgrade to at least Firefox Null for the security updates, or even Firefox NaN if you’re okay with being on the bleeding edge.


That PJM training material uses some ancient Adobe product. Works fine, though.

