The question is simple. I wanted to get a general consensus on if people actually audit the code that they use from FOSS or open source software or apps.
Do you blindly trust the FOSS community? I am trying to get a rough idea here. Sometimes audit the code? Only on mission critical apps? Not at all?
Let’s hear it!


‘AI’ as we currently know it, is terrible at this sort of task. It’s not capable of understanding the flow of the code in any meaningful way, and tends to raise entirely spurious issues (see the problems the curl author has with being overwhealmed for example). It also wont spot actually malicious code that’s been included with any sort of care, nor would it find intentional behaviour that would be harmful or counterproductive in the particular scenario you want to use the program.
Having actually worked with AI in this context alongside github/azure devops advanced security, I can tell you that this is wrong. As much as we hate AI, and as much as people like to (validly) point out issues with hallucinations, overall it’s been very on-point.
Could you let me know what sort of models you’re using? Everything I’ve tried has basically been so bad it was quicker and more reliable to to the job myself. Most of the models can barely write boilerplate code accurately and securely, let alone anything even moderately complex.
I’ve tried to get them to analyse code too, and that’s hit and miss at best, even with small programs. I’d have no faith at all that they could handle anything larger; the answers they give would be confident and wrong, which is easy to spot with something small, but much harder to catch with a large, multi process system spread over a network. It’s hard enough for humans, who have actual context, understanding and domain knowledge, to do it well, and I’ve, personally, not seen any evidence that an LLM (which is what I’m assuming you’re referring to) could do anywhere near as well. I don’t doubt that they flag some issues, but without a comprehensive, human, review of the system architecture, implementation and code, you can’t be sure what they’ve missed, and if you’re going to do that anyway, you’ve done the job yourself!
Having said that, I’ve no doubt that things will improve, programming languages have well defined syntaxes and so they should be some of the easiest types of text for an LLM to parse and build a context from. If that can be combined with enough domain knowledge, a description of the deployment environment and a model that’s actually trained for and tuned for code analysis and security auditing, it might be possible to get similar results to humans.
Its just whatever is built into copilot.
You can do a quick and dirty test by opening copilot chat and asking it something like “outline the vulnerabilities found in the following code, with the vulnerabilities listed underneath it. Outline any other issues you notice that are not listed here.” and then paste the code and the discovered vulns.