The authors of this paper (link in comments) "show that LLM agents can autonomously hack websites," with the agents "performing complex tasks without prior knowledge of the vulnerability. For example, these agents can perform complex SQL union attacks, which involve a multi-step process (38 actions) of extracting a database schema, extracting information from the database based on this schema, and performing the final hack. Our most capable agent can hack 73.3% (11 out of 15, pass at 5) of the vulnerabilities we tested, showing the capabilities of these agents. Importantly, our LLM agent is capable of finding vulnerabilities in real-world websites."
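For anyone unfamiliar with the SQL union attack mentioned in the quote, here is a minimal sketch of the two extraction steps (schema first, then data), assuming a toy in-memory SQLite database. The table names, the `search_vulnerable` and `search_safe` helpers, and the payload strings are all hypothetical and made up for illustration; none of this is taken from the paper or its agent.

```python
import sqlite3


def build_demo_db():
    """Create a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE products (name TEXT, price TEXT);
        CREATE TABLE users (username TEXT, secret TEXT);
        INSERT INTO products VALUES ('widget', '9.99');
        INSERT INTO users VALUES ('admin', 's3cret');
        """
    )
    return conn


def search_vulnerable(conn, term):
    # Vulnerable on purpose: user input is concatenated into the SQL text.
    query = "SELECT name, price FROM products WHERE name = '" + term + "'"
    return conn.execute(query).fetchall()


def search_safe(conn, term):
    # Parameterized query: the input is bound as data, never parsed as SQL.
    query = "SELECT name, price FROM products WHERE name = ?"
    return conn.execute(query, (term,)).fetchall()


conn = build_demo_db()

# Step 1: extract the schema by appending a UNION over SQLite's catalog table.
schema_payload = "x' UNION SELECT name, sql FROM sqlite_master WHERE type='table' --"
schema_rows = search_vulnerable(conn, schema_payload)

# Step 2: use the discovered table and column names to extract the data.
data_payload = "x' UNION SELECT username, secret FROM users --"
leaked_rows = search_vulnerable(conn, data_payload)

# The same payload against the parameterized version matches nothing.
safe_rows = search_safe(conn, data_payload)

print(schema_rows)
print(leaked_rows)
print(safe_rows)
```

The contrast between `search_vulnerable` and `search_safe` is the part that matters for defenders: the attack only works because the input is spliced into the query string.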
We're replacing legacy SAST using LLMs at DryRun Security, so this paper doesn't surprise me, but I think this will be huge for the pen testing and services industry.
Anything that helps detect vulnerabilities prior to production is a good thing. Of course, it needs to become a step in the QA/dev process.
I know there are a couple of startups in the space that have been working on this over the past year, so we'll see a lot of this very soon.
Great article with some pretty interesting implications. I can't say I am overly surprised, and it seems like a fairly natural progression/use of LLM technology in the security field, given that one of the primary failure points of automated tools over the years has consistently been business logic flaws chaining into larger breaches.
Maybe this is going to surprise someone, but AI cannot be "creative" at all. All this article is about is simple network scanners and scan flows with some side attack workflow documentation, just like any other regular pen test workflow.
CEO & Co-Founder, DryRun Security
https://arxiv.org/pdf/2402.06664.pdf