Jimmc414 3 days ago
“But it appears 1 or more organizations have successfully jail-broken Fable 5”
This is hardly true, or else it’s true of all frontier models and was only magnified by Fable’s capabilities. The point is that you could hand Fable 5 vulnerable code, ask it to fix it and return a patch plus test cases proving the fix, and exploit-relevant detail falls out as a byproduct of legitimate secure code review work.
I challenge anyone to provide a fix for this “exploit” without compromising Fable’s ability to patch insecure code.
dlahoda 3 days ago
I run orchestrated agents (context warm-up, fork-join, file-based communication, a set of trailofbits skills, a DBA skill, some Sourcegraph indexing, a few of my own, an API skill). None of the skills are specific to our codebase (not even mine), and some are explicitly for Claude.
It finds 90% of bugs using $200 Codex and Gemini subscriptions.
You do not need a service for auditing or review.
Just overlay your CI with AI.
I do not even use SAST (because we use Rust typestates and invariants, Effect TS, and fail-fast).
I have seen 5 audit and review services killed by "just overlay CI with AI on a max sub with the latest model".
basedrum 2 days ago
Do you have a write-up on how to do this?
dlahoda 17 hours ago
Not a full write-up, but:
- configure your CLI to allow running agents in parallel
- ask it to use a tmp dir for all files, and a git worktree copy
- ask it to export everything to files as it goes: progress, errors, reports.
Now set up each agent run (a process with its own context):
- planning-with-files, caveman, ask-questions-if-underspecified (if not in CI)
- these skills must be loaded into each subagent.
After that, run context assembly:
- audit-context-building, trailmark-structural
At this point only the operator runs. No other agents.
Now ask for forks (in parallel), each loading the context built before:
- dba-review
- supply-chain-risk-auditor (ask it to trigger only on lock file changes)
- spec-to-code-compliance
- openai-gh-fix-ci (if auditing a PR)
- graph-evolution, differential-review
- dimensional-analysis
- mutation-testing, property-based-testing (ask it to mutate and proptest around the changes).
None of these skills are loaded into the shared context. Each fork has its own, to avoid pollution and loss of focus.
Once all forks are done, ask for a join:
- fp-check, second-opinion
Ask for a final report.
All skills above are from GitHub and public (not mine, easy to find).
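A minimal Python sketch of the fork-join, file-based flow described above. Everything here is an assumption for illustration: each skill is a stub function standing in for an agent run (no real CLI, model, or published skill is invoked), threads stand in for separate agent processes, the git worktree step is omitted, and the lock file names are examples. What it shows is the shape: the operator builds context once, forks read it and write their own report files, and the join merges those files.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_context(workdir, changed_files):
    # Operator-only phase: assemble shared context once, before any fork starts.
    ctx = workdir / "context.json"
    ctx.write_text(json.dumps({"changed_files": changed_files}))
    return ctx


def make_skill(name):
    # Hypothetical stub "skill": reads the shared context file and writes its
    # findings to its own report file (no shared in-memory state between forks).
    def run(ctx_path, workdir):
        ctx = json.loads(ctx_path.read_text())
        findings = [f"{name}: reviewed {f}" for f in ctx["changed_files"]]
        out = workdir / f"{name}.report.json"
        out.write_text(json.dumps({"skill": name, "findings": findings}))
        return out
    return run


def select_forks(changed_files):
    forks = ["dba-review", "spec-to-code-compliance", "differential-review"]
    # "Trigger only on lock file changes": a simple suffix check.
    # The lock file names are illustrative assumptions.
    lock_names = ("Cargo.lock", "package-lock.json", "pnpm-lock.yaml")
    if any(f.endswith(lock_names) for f in changed_files):
        forks.append("supply-chain-risk-auditor")
    return forks


def join(report_paths):
    # Join phase: read every fork's report file and merge into one report.
    merged = {}
    for p in report_paths:
        data = json.loads(p.read_text())
        merged[data["skill"]] = data["findings"]
    return merged


def audit(changed_files):
    # All intermediate files live in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        ctx = build_context(workdir, changed_files)
        forks = select_forks(changed_files)
        with ThreadPoolExecutor(max_workers=len(forks)) as pool:
            futures = [pool.submit(make_skill(n), ctx, workdir) for n in forks]
            reports = [fut.result() for fut in futures]
        # Join before the temporary directory is deleted.
        return join(reports)


if __name__ == "__main__":
    report = audit(["src/db.rs", "Cargo.lock"])
    for skill, findings in sorted(report.items()):
        print(skill, findings)
```

Keeping each fork's output in its own file, rather than in a shared in-memory structure, lets the forks run without seeing each other's context, which matches the stated goal of avoiding pollution and loss of focus.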
As with so much (LLM) security work, the devil is in the details: "~25 security issues per codebase" means nothing without grounding in the codebase's actual security model, the capabilities exposed to an attacker, etc. I haven't used Aikido's product, but my experience with similar tools is that they tend not to find actual security issues until a proper security model is introduced for grounding.
(I say this as someone who is, broadly, extremely impressed by and interested in the use of LLMs for security research.)
MeetingsBrowser 3 days ago
> logic based vulnerabilities like a ReDoS pattern identified from source without live exploitation, or an admin-only route that's never been exercised
The two classes of vulnerability given as examples are exactly the kind of issue I probably don’t care about, and they are not grounded in an actual security model.
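To make "grounding in a security model" concrete, here is a hypothetical Python sketch: each finding lists the attacker capabilities it presupposes, and a finding is kept only if the security model grants all of them. The finding titles echo the two examples quoted above; the capability names and fields are invented for illustration and are not Aikido's (or any tool's) schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Finding:
    title: str
    # Hypothetical: attacker capabilities the issue presupposes.
    requires: frozenset
    # Whether the issue was confirmed by live exploitation.
    exploited: bool = False


@dataclass
class SecurityModel:
    # Hypothetical: capabilities granted to the attacker we defend against.
    attacker_capabilities: set = field(default_factory=set)

    def in_scope(self, finding):
        # Keep a finding only if every precondition is within attacker reach.
        return finding.requires <= self.attacker_capabilities


def triage(findings, model):
    kept, dropped = [], []
    for f in findings:
        (kept if model.in_scope(f) else dropped).append(f)
    # Confirmed exploits first, then alphabetical for a stable order.
    kept.sort(key=lambda f: (not f.exploited, f.title))
    return kept, dropped


findings = [
    Finding("ReDoS pattern in search filter",
            frozenset({"unauthenticated_http", "controls_matched_input"})),
    Finding("Admin-only route never exercised",
            frozenset({"admin_session"})),
]

public_app = SecurityModel({"unauthenticated_http", "controls_matched_input"})
kept, dropped = triage(findings, public_app)
print([f.title for f in kept])     # ['ReDoS pattern in search filter']
print([f.title for f in dropped])  # ['Admin-only route never exercised']

no_input_control = SecurityModel({"unauthenticated_http"})
print(triage(findings, no_input_control)[0])  # []
```

Under the first model the ReDoS finding survives because the attacker is assumed to control the matched input; remove that capability and nothing is left. The same scanner output is noise or signal depending on the model.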
dwb 2 days ago
I hope it’s an improvement on their current PR review code scanning, which alerts on code that only looks possibly vulnerable in isolation, without looking at the context. I guess I assumed it was an LLM being extremely lazy, but maybe it’s just static analysis. Anyway it’s pretty annoying.
Leynos 2 days ago
It alerts on test fixtures, so I suspect it is doing nothing new.
joshuat 3 days ago
This looks promising, but I find it a little odd to bury the bulk of plan limitations under "fair-usage limits". When the limitations are specifically coupled to plans, it feels less like an FUP and more like plan-specific caps that should be surfaced more directly.
shireboy 3 days ago
We’ve been using Aikido code scanning and pen test tools and have been pretty impressed. Will have to take a look at this.
leetrout 3 days ago
I'm building a competing product and am curious if you'd be up for a conversation about what you've enjoyed best about Aikido and, importantly, what gaps are still not covered.
_def 3 days ago
This is marketed as a defensive tool, but how do you prove that you check against "your" source code?
Shanyao 3 days ago
Looks like a solid bridge between SAST and manual review. Will check it out.
dlahoda 3 days ago
Feels like marketing, sir.
Can you be more specific about what exactly is nice?
The article was not exactly clear amid the noise of its text.
I have private skills like:
- github-context-aggregation (links, comments, descriptions)
- structural-transformation-review (thing topology, category theory, duals)
- layered-context-chunking (2 layers at a time, like db+backend or backend+fe, for more focus)
- product-feature-interactions (product line engineering combinations)
A good mental model for what you are doing is:
- https://github.com/microsoft/conductor/blob/main/examples/pa...
- https://github.com/features/actions
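The private skills are not published, so their actual behavior is unknown. The following is only a hypothetical Python illustration of the two ideas named in parentheses: adjacent-layer chunking (review two layers per pass) and enumerating feature combinations for product-line interaction review. The function, layer, and feature names are made up.

```python
from itertools import combinations


def layer_pairs(layers):
    # Adjacent layers only, so each review pass sees exactly two layers.
    return list(zip(layers, layers[1:]))


def feature_interactions(features, k=2):
    # Every k-way combination of features, each a candidate interaction to review.
    return list(combinations(features, k))


print(layer_pairs(["db", "backend", "fe"]))
# [('db', 'backend'), ('backend', 'fe')]
print(feature_interactions(["sso", "billing", "export"]))
# [('sso', 'billing'), ('sso', 'export'), ('billing', 'export')]
```

The number of combinations grows as C(n, k), so pairwise (k=2) is the cheap default and higher k quickly multiplies the number of review passes.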