
That's neat. Is it using Apple Foundation Models or something else? I'm very curious about how it's determining folder matches in iOS (I need to do something for images that are already classified/tagged via FastVLM).




Not Apple Foundation Models — unfortunately they’re not capable enough (yet) for understanding content and matching it to folders.

I’m using SBERT-style embedding models for the semantic matching, which works very well in practice.
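As a rough illustration of that matching step (not necessarily how Floxtop implements it), here's a minimal Swift sketch using Apple's NLEmbedding sentence embeddings as a stand-in for an SBERT-style model; the function and folder names are invented:

    import NaturalLanguage

    // Minimal sketch: pick the folder whose label is semantically closest to a
    // file's text summary. NLEmbedding stands in for a custom SBERT-style model.
    func bestFolder(for fileSummary: String, among folders: [String]) -> String? {
        guard let embedding = NLEmbedding.sentenceEmbedding(for: .english) else {
            return nil
        }
        // NLEmbedding cosine distance: lower means more similar.
        return folders.min { lhs, rhs in
            embedding.distance(between: fileSummary, and: lhs, distanceType: .cosine)
                < embedding.distance(between: fileSummary, and: rhs, distanceType: .cosine)
        }
    }

    // Usage: bestFolder(for: "2023 rental income statement",
    //                   among: ["Taxes", "Travel", "Recipes"])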

For non-text content, the app also analyzes images (OCR + object recognition) using Apple’s Vision framework. That part is surprisingly powerful, especially on Apple Silicon.
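For illustration, here's a hedged sketch of what that Vision pass could look like (the function name and the 0.3 confidence cutoff are arbitrary choices, not Floxtop's actual values):

    import Vision

    // Sketch: pull recognized text (OCR) and classification labels out of an image.
    // The resulting strings can feed the same embedding-based folder matching.
    func analyze(imageAt url: URL) throws -> (text: [String], labels: [String]) {
        let textRequest = VNRecognizeTextRequest()
        textRequest.recognitionLevel = .accurate

        let classifyRequest = VNClassifyImageRequest()

        let handler = VNImageRequestHandler(url: url)
        try handler.perform([textRequest, classifyRequest])

        let text = (textRequest.results ?? [])
            .compactMap { $0.topCandidates(1).first?.string }
        let labels = (classifyRequest.results ?? [])
            .filter { $0.confidence > 0.3 }   // keep only reasonably confident labels
            .map { $0.identifier }
        return (text, labels)
    }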

> I need to do something for images that are already classified/tagged via FastVLM

What’s the concrete use case you’re targeting with this?


Classifying real estate / property images. Also using Apple Vision, which ain't half-bad for something on-device, and feeding that metadata along with what FastVLM returns into the Foundation model to turn it into structured output - trying to see how far I can push that. But it feels pretty limited/dated in terms of capabilities vs. leading-edge models.
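As a rough sketch of that last step, assuming the FoundationModels framework's @Generable / LanguageModelSession API; the PropertyImageInfo type, its fields, and the prompt wiring are invented for illustration:

    import FoundationModels

    // Hypothetical structured-output target; the field names are made up.
    @Generable
    struct PropertyImageInfo {
        @Guide(description: "Room or area shown, e.g. kitchen, bathroom, backyard")
        var area: String
        @Guide(description: "Notable features visible in the photo")
        var features: [String]
    }

    // Combine Vision metadata and a FastVLM caption into one prompt and ask the
    // on-device model for structured output.
    func extractInfo(visionLabels: [String], fastVLMCaption: String) async throws -> PropertyImageInfo {
        let session = LanguageModelSession()
        let prompt = """
            Vision labels: \(visionLabels.joined(separator: ", "))
            FastVLM caption: \(fastVLMCaption)
            Describe this property photo as structured data.
            """
        let response = try await session.respond(to: prompt, generating: PropertyImageInfo.self)
        return response.content
    }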

I’ve seen a huge advantage in running everything fully local and private. Not sure if that fits your use case, though. Nearly 90% of Floxtop users choose the app mainly for that privacy focus.


