
That's neat. Is it using Apple Foundation Models or something else? I'm very curious about how it's determining folder matches in iOS (I need to do something for images that are already classified/tagged via FastVLM).




Not Apple Foundation Models — unfortunately they’re not capable enough (yet) for understanding content and matching it to folders.

I’m using SBERT-style embedding models for the semantic matching, which works very well in practice.
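As a rough illustration of that matching step (not necessarily how Floxtop implements it), here's a minimal Swift sketch using Apple's NLEmbedding sentence embeddings as a stand-in for an SBERT-style model; the function and folder names are invented:

    import NaturalLanguage

    // Minimal sketch: pick the folder whose label is semantically closest to a
    // file's text summary. NLEmbedding stands in for a custom SBERT-style model.
    func bestFolder(for fileSummary: String, among folders: [String]) -> String? {
        guard let embedding = NLEmbedding.sentenceEmbedding(for: .english) else {
            return nil
        }
        // NLEmbedding cosine distance: lower means more similar.
        return folders.min { lhs, rhs in
            embedding.distance(between: fileSummary, and: lhs, distanceType: .cosine)
                < embedding.distance(between: fileSummary, and: rhs, distanceType: .cosine)
        }
    }

    // Usage: bestFolder(for: "2023 rental income statement",
    //                   among: ["Taxes", "Travel", "Recipes"])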

For non-text content, the app also analyzes images (OCR + object recognition) using Apple’s Vision framework. That part is surprisingly powerful, especially on Apple Silicon.
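For illustration, here's a hedged sketch of what that Vision pass could look like (the function name and the 0.3 confidence cutoff are arbitrary choices, not Floxtop's actual values):

    import Vision

    // Sketch: pull recognized text (OCR) and classification labels out of an image.
    // The resulting strings can feed the same embedding-based folder matching.
    func analyze(imageAt url: URL) throws -> (text: [String], labels: [String]) {
        let textRequest = VNRecognizeTextRequest()
        textRequest.recognitionLevel = .accurate

        let classifyRequest = VNClassifyImageRequest()

        let handler = VNImageRequestHandler(url: url)
        try handler.perform([textRequest, classifyRequest])

        let text = (textRequest.results ?? [])
            .compactMap { $0.topCandidates(1).first?.string }
        let labels = (classifyRequest.results ?? [])
            .filter { $0.confidence > 0.3 }   // keep only reasonably confident labels
            .map { $0.identifier }
        return (text, labels)
    }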

> I need to do something for images that are already classified/tagged via FastVLM

What’s the concrete use case you’re targeting with this?


Classifying real estate / property images. Also using Apple Vision, which ain't half-bad for something on-device, and feeding that metadata along with what FastVLM returns into the Foundation model to turn it into structured output - trying to see how far I can push that. But it feels pretty limited/dated in terms of capabilities vs. leading-edge models.
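As a rough sketch of that last step, assuming the FoundationModels framework's @Generable / LanguageModelSession API; the PropertyImageInfo type, its fields, and the prompt wiring are invented for illustration:

    import FoundationModels

    // Hypothetical structured-output target; the field names are made up.
    @Generable
    struct PropertyImageInfo {
        @Guide(description: "Room or area shown, e.g. kitchen, bathroom, backyard")
        var area: String
        @Guide(description: "Notable features visible in the photo")
        var features: [String]
    }

    // Combine Vision metadata and a FastVLM caption into one prompt and ask the
    // on-device model for structured output.
    func extractInfo(visionLabels: [String], fastVLMCaption: String) async throws -> PropertyImageInfo {
        let session = LanguageModelSession()
        let prompt = """
            Vision labels: \(visionLabels.joined(separator: ", "))
            FastVLM caption: \(fastVLMCaption)
            Describe this property photo as structured data.
            """
        let response = try await session.respond(to: prompt, generating: PropertyImageInfo.self)
        return response.content
    }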

I’ve seen a huge advantage in running everything fully local and private. Not sure if that fits your use case, though. Nearly 90% of Floxtop users choose the app mainly for that privacy focus.


