Hacker News | amarcheschi's comments

Devstral 2 should be above https://mistral.ai/news/devstral-2-vibe-cli

Though I haven't checked other benchmarks, and they only report SWE-bench.


Devstral 2 is free through the API. That has to count for more when judging what makes it better. The price-to-performance ratio is better in practically every way. Does it matter if the performance is slightly worse when it is practically free?

Yes, but if it's actually competitive, that won't last long. Mistral will do the same as Google (cut their free tier by 50x or so) if they ever catch up. Financially, anything else would make no sense.

Of course currently Mistral has an insane free tier, 1 billion tokens for each(?) of their models per month.


"French President Emmanuel Macron was made aware of the fake video on December 14, but despite his pleas to take it down, Meta left it online for several days, saying it did not violate its platform rules"

Maybe those rules should be changed


The rules are fine and do prohibit this; it's their enforcement that's (intentionally) flawed.

Social media moderation has to balance "engagement" with the potential for bad PR or liability for the company. It turns out that content that is against the rules is also the content that generates the most engagement, so enforcing the rules as written is bad for the bottom line.

Thus, for every piece of content that is potentially against the rules, the actual test for removing it is whether the expected engagement outweighs the probability of someone rich or well-connected being inconvenienced by it, and how large that inconvenience would be. Content is only removed when the liability potential exceeds the profit potential.

At the beginning, the reports were ignored because the system determined it was more profitable to leave the content up. I'm not sure what "his pleas to take it down" refers to; it would likely have been just his staff members flagging it with their personal accounts, and those flags carrying very little weight. Eventually either someone managed to talk to a human, or a letter arrived at their legal department, or the content gained enough impressions to become a risk. Any of these would have caused the earlier flags to actually get reviewed by a competent human, at which point they realized what their liability was and quickly removed it.

You should expect to see an apology from their PR department soon and a promise they'll do better next time.


Click on the "America by Design" initiative

"What's the biggest brand in the world? If you said Trump, you're not wrong. But what's the foundation of that brand? One that's more globally recognized than practically anything else.

...

This is President Trump going bigger than President Nixon"

Jesus Christ, man


I about spit my coffee when I saw that. Good grief.

> We've been conditioned to accept that mediocre in government is normal.

Yes, I do now accept that mediocre [sic] in government is normal for the next few years.


An entire generation has grown up post-Nixon and hasn't known a government that worked, so yes... could you blame them?

Furthermore only ~50% of the country has a passport so many haven't even seen how things run elsewhere.


Are you under the impression this is somehow different from the last few years?

Of course. Whatever problems the US government had before, mass firings, loyalty tests, furloughs, and endless other shenanigans have only exacerbated them.

Are you American? Do you think this is normal? Just curious (non-American).

There is a somewhat stubborn idea that a government will always have many inefficiencies baked in, since there’s no real incentive to remove them beyond a generic “that would be nice”.

Mediocre is probably the ceiling.

:( I had to click through because I didn't believe you at first... as someone who used to proudly work with feds, this is yet another low point among many over the past ten years.

Just a few days ago I was doing a low-paid (well, not so low) AI classification task, akin to the Mechanical Turk ones, for a very big company. The platform showed me (involuntarily, since I guess they don't review images before showing them) an AI image depicting a naked man and a naked kid, though it was more Barbie-like than anything else. I didn't really enjoy the view tbh. I contacted them but got no answer back.

If the picture truly was of a child, the company is _required_ to report CSAM to NCMEC. It's taken very seriously. If they're not being responsive, escalate and report it yourself so you don't have legal problems.

See https://report.cybertip.org/.


Even if it's an AI image? I will follow up by contacting them directly rather than through the platform's messaging system, then I'll see what to do if they don't answer.

Edit: I read the information given in the briefing before the task, and it says that offensive content might be displayed. They say to tell them if it happens, but well, I did and got no answer, so weeeell, I'm not so inclined to believe they care about it.


> Even if it's an AI image?

This varies by country, but in many countries it doesn't matter if it is a drawing, AI, or a real image -- they are treated equally for the purposes of CSAM.


That's understandable

The company may not care, but the government definitely does. And if you don’t report it, you could be in serious legal jeopardy. If any fragments of that image are still present on your machine, whether it came from the company or not, you could be held accountable for possessing CSAM.

So screw the company, report it yourself and make sure to cite the company and their lack of a response. There’s a Grand Canyon-sized chasm between “offensive content” and CSAM.


A nude picture of a child is not automatically CSAM.

The child needs to be sexually abused or exploited in it for something to be CSAM.


That's understandable, I still felt uneasy

> It's taken very seriously

Can confirm. The number of people I see in my local news getting arrested for possession that "... came from a cybertip escalated to NCMEC from <BIGCOMPANY>" is... staggering. (And it's almost always Google Drive or Gmail locally, but sometimes there's a curveball.)


Where does this happen?

How can I find work like this?

Sorry, I'm not comfortable sharing the name of the platform given the situation. However, for similar jobs I find that searching the web for the string "serious tasks beermoney reddit" gives you results similar to what I'm talking about.

The explosions were in fact strong enough that innocent people, including children, died https://en.wikipedia.org/wiki/2024_Lebanon_electronic_device...

That doesn't necessarily mean the blast radius was large. The 9 year old was killed while holding the pager.

> Fatima was in the kitchen on Tuesday when a pager on the table began to beep, her aunt said. She picked up the device to bring it to her father and was holding it when it exploded, mangling her face and leaving the room covered in blood, she said.

https://www.nytimes.com/2024/09/18/world/middleeast/lebanon-...


Oh, I didn't know this. Innocent people were still killed and maimed by shrapnel. The other child, aged 11, was killed when his father's pager detonated.

hmm maybe you don't know there's "intentional homicide" and "unintentional homicide", and those two differ extremely in court?

seems like you like being sarcastic, but don't know basic stuff even 15 year olds know


The comment I was answering above was saying that the explosions were so weak that people inches away were unharmed. The doctors in Lebanon would probably disagree.

Monitoring people for... supporting opinions you don't agree with?

There's a specific group of people that have this way of thinking, and I don't even need to explain further because most people will know who I am talking about.

Its SWE-bench score is comparable to other open models and just a few points below the top-notch models, though.

We did it guys, we made physical devices usable only if you treat them as SaaS; otherwise you're SOL when the battery runs out.

that's a pretty lame take, all I did was break down the cost over the lifespan of the device

it's useful to think of a lot of things this way, I also justify clothing purchases on a rough estimate of cost per wear



look at points everywhere for enshittification
