The framework is built on 16 web application backends that the honeypots are tasked with simulating. These fixed environments make evaluation runs reproducible and comparable. Honeyval comprises three evaluation tasks: the main task, in which an AI hacking agent interacts directly with the honeypots, and two control tasks, one for the hacking agent and one for the honeypot, which ensure the reliability of the conclusions drawn from the main task.
In the main task, the hacking agent and the LLM-powered honeypot interact directly. Here, we track the following key metrics: (i) interaction length, as a proxy for the information the honeypot gains about the attacker; (ii) honeypot detection true positive rate (TPR), to measure the stealth of the implemented honeypot; (iii) running cost, to gauge the economic viability of an LLM-powered honeypot against agentic attackers; and (iv) response latency, to gauge the risk of fingerprinting through response speed. Custom metrics can also be added with ease.
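As a rough illustration of how the four metrics above could be aggregated over a batch of runs, here is a minimal Python sketch. It is not Honeyval's actual code: the `EpisodeRecord` type, its field names, and the `summarize` function are hypothetical, and the sketch assumes that interaction length is counted in HTTP requests and that every target in the main task is a honeypot, so the detection TPR is simply the fraction of runs in which the agent flags its target.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EpisodeRecord:
    """One agent-vs-honeypot run. All field names are hypothetical."""
    num_requests: int           # assumed unit of interaction length: HTTP requests sent
    flagged_as_honeypot: bool   # did the agent report the target as a honeypot?
    cost_usd: float             # assumed: LLM spend on the honeypot side, in USD
    latencies_s: list[float]    # per-response latency in seconds


def summarize(episodes: list[EpisodeRecord]) -> dict[str, float]:
    """Aggregate the four main-task metrics over a batch of episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    all_latencies = [t for e in episodes for t in e.latencies_s]
    return {
        "mean_interaction_length": mean(e.num_requests for e in episodes),
        # Assumption: every target is a honeypot, so TPR = fraction of runs flagged.
        "detection_tpr": sum(e.flagged_as_honeypot for e in episodes) / len(episodes),
        "mean_cost_usd": mean(e.cost_usd for e in episodes),
        "mean_latency_s": mean(all_latencies) if all_latencies else float("nan"),
    }
```

Under these assumptions, a stealthier honeypot shows up as a lower `detection_tpr`, while a longer `mean_interaction_length` suggests the honeypot kept the agent engaged for more requests.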
In the control tasks, both the agent and the honeypot are configured exactly as in the main task. This ensures that conclusions transfer across the tasks, so the control tasks can be used to monitor the impact of adaptations made for the main task in both the hacking agent and the honeypot. In the control task for the hacking agent, the agent is evaluated on exploiting real implementations of the web application backends, which benchmarks its hacking capability. In the control task for the honeypot, the honeypot is evaluated on a functional test suite corresponding to the simulated backend application, which benchmarks the simulation accuracy of the system.
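The honeypot control task described above can be pictured as a pass-rate computation over functional test cases. The following Python sketch is illustrative only and does not reflect Honeyval's real test harness: the `Handler` signature, the `FunctionalCase` type, and the status-plus-substring pass criterion are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model of a honeypot: (method, path, body) -> (status code, response body).
Handler = Callable[[str, str, str], tuple[int, str]]


@dataclass
class FunctionalCase:
    """One functional test for a simulated backend. Field names are hypothetical."""
    method: str
    path: str
    body: str
    expected_status: int
    expected_substring: str  # text a correct response is assumed to contain


def simulation_accuracy(handler: Handler, cases: list[FunctionalCase]) -> float:
    """Return the fraction of functional cases the honeypot answers correctly."""
    if not cases:
        raise ValueError("need at least one test case")
    passed = 0
    for case in cases:
        status, body = handler(case.method, case.path, case.body)
        # Assumed pass criterion: matching status code and expected text in the body.
        if status == case.expected_status and case.expected_substring in body:
            passed += 1
    return passed / len(cases)


def toy_honeypot(method: str, path: str, body: str) -> tuple[int, str]:
    """Stand-in honeypot that only knows a single health endpoint."""
    if method == "GET" and path == "/health":
        return 200, '{"status": "ok"}'
    return 404, "not found"
```

Because the same handler would serve both the main task and this control task, a change that improves stealth but lowers the pass rate here would be visible as a drop in simulation accuracy.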
Honeyval is available as an open-source codebase on GitHub. To evaluate your own HTTP honeypot, implement Honeyval's honeypot interface for it. More details are included in the code repository.
Contributing
We welcome new backend applications, exploit goals, agents, honeypot implementations, metrics, and general feedback from the community. Please visit our GitHub repository for details.
@misc{vero2026honeyval,
title = {Honeyval: A Comprehensive Evaluation Framework for LLM-powered HTTP Honeypots},
author = {Vero, Mark and Kaczmarczyck, Fabian and Petrov, Ivan and Shumailov, Ilia and Hayes, Jamie and Heinen, Niels and Fan, Tianqi and Invernizzi, Luca and Vechev, Martin},
year = {2026},
note = {Preprint}
}