Ask HN: Qwen3 - is it ready to drive AI agents?
It seems that Qwen3 is not capable of independent reasoning; it lacks the quality needed to power fully autonomous AI agents.

Initially I was quite impressed with its problem-solving capabilities when it output code through the chat interface. It addressed certain problems much better than Claude or Gemini. However, as soon as I switched to Alibaba Cloud's API to provide a Dashscope-based implementation of the cognizer interface of my new generation of AI agents (chain of code), the whole charm was gone.

Qwen3 struggles with structured generation, quite often falling into an infinite loop while spitting out tokens.

It has trouble crossing language boundaries, which is crucial for my agents because they are "thinking in code": writing Kotlin script containing JavaScript, containing SQL, etc. Therefore it will not work well as an automated software engineer.

It is "stubborn": even when the syntax error in the generated code is clearly indicated, it tends to output the same erroneous code again and again instead of testing another hypothesis.

It lacks theory of mind and an understanding of the context and the environment. For example, when asked to check the recent news, it always responds by trying to use a BBC API URL with an unfilled API key as part of the request, while passing this URL to the Files tool instead of the WebBrowser tool, which obviously fails.

And last but not least, censorship: for example, Qwen3 will refuse to search for information on the most recent anti-government protests in China. I wouldn't be surprised if these censorship blockers were partially responsible for the poor quality of cognition in other areas.

Maybe I'm doing something wrong, and you are getting much better results with this model for fully autonomous agents with a feedback loop?
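
For anyone comparing notes, the "stubborn" behaviour is easy to detect mechanically inside a feedback loop. Below is a minimal sketch, not my actual agent code. It assumes a hypothetical `generate` callable standing in for the model call and a hypothetical `check` callable that returns an error message or `None`. Neither is part of the Dashscope API.

```python
from typing import Callable, Optional


def run_with_feedback(
    generate: Callable[[str], str],         # hypothetical: prompt -> generated code
    check: Callable[[str], Optional[str]],  # hypothetical: code -> error text, or None if OK
    task: str,
    max_attempts: int = 5,
) -> Optional[str]:
    """Request code, feed errors back, and stop early if the model repeats itself."""
    seen = set()
    prompt = task
    for _ in range(max_attempts):
        code = generate(prompt)
        # Collapse whitespace so trivially reformatted repeats still count as repeats.
        normalized = " ".join(code.split())
        if normalized in seen:
            return None  # same output again: give up instead of looping
        seen.add(normalized)
        error = check(code)
        if error is None:
            return code
        # Put the failed attempt and its error into the next prompt.
        prompt = (
            f"{task}\n\nPrevious attempt:\n{code}\n\n"
            f"Error:\n{error}\n\nTry a different approach."
        )
    return None
```

The `max_attempts` cap also bounds the loop when every attempt differs but none passes the check. It does not prevent a single call from generating tokens endlessly; that needs a token limit on the request itself.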