Evals are scored via rule checks and LLM-as-judge, with metrics like task accuracy, tool-use precision, latency, and token cost. Suite name, models, judge model, repeats, MCP server command and args ...
HexStrike AI MCP Agents is an advanced MCP server that lets AI agents (Claude, GPT, Copilot, etc.) autonomously run 150+ cybersecurity tools for automated pentesting, vulnerability discovery, bug ...