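Ask HN: Using AI/LLM APIs makes me want to give up. What am I doing wrong?
I'm trying to automate a few manual processes we have right now, but I still can't get over this hump. What am I doing wrong?
I am using these AI APIs for actual processing-type work, and I am left defeated and somewhat angry, if I'm being honest. These AI companies sell us some galaxy-brain vision of automation, but actually using their services is a disappointing experience.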
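1. The results are never consistent. "Please ensure you extract ALL items" -> [Item1, Item2, Item3, "literally a comment // ...remaining items"] WHAT THE F$#K!! Sometimes it gives me a full list of all items, and sometimes it does that BS. I provided a tool, and half of the time it just grabs the first 3, and maybe the very last one too (ignoring everything in the middle).
2. Because the results are not reliable, I have to do more post-processing. About 60% of the time, even after post-processing, I have to reject the results because they don't meet my confidence threshold.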
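3. The APIs are poorly supported by the vendors.
- iOS has some insane behavior where file extensions are sometimes .jpg, sometimes .JPG, etc. OpenAI's API, for example, will return Bad Request because the extension was not ".jpg", so now I have to add more code to rename files when users upload them.
- The docs say the API supports a list of file formats, but it then rejects the request because the file was not a .PDF, even though the purpose was "assistants" (which the docs say can handle images). No problem, I'll just convert.
- Dealing with files coming from other sources (G Drive, etc.) where the extension is missing but the MIME type is present: again, Bad Request.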
4. We went from "AGI any day now" in 2024 to "Artificial Super Intelligence any day now" today. Can we just relax? Did I fall for a marketing trap?
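For the elided-list problem in point 1, a cheap guard before post-processing might help separate "the model gave up halfway" from "the data is genuinely low quality". The following is only a minimal sketch, assuming the extracted items arrive as a list of strings. The marker pattern and the names `find_elided_items` and `is_complete` are hypothetical, not part of any vendor SDK, and the pattern will produce false positives if real items contain "..." or the word "remaining".

```python
import re

# Assumed markers of a placeholder entry; tune these to your own data.
ELISION_PATTERN = re.compile(r"\.\.\.|…|\b(remaining|rest of the)\b", re.IGNORECASE)


def find_elided_items(items):
    """Return the entries that look like placeholders rather than real data."""
    return [item for item in items if ELISION_PATTERN.search(str(item))]


def is_complete(items, expected_count=None):
    """Reject a result that contains placeholders or has the wrong length.

    expected_count is optional: pass it only when the item count can be
    determined independently of the model (for example, by counting rows).
    """
    if find_elided_items(items):
        return False
    if expected_count is not None and len(items) != expected_count:
        return False
    return True


result = ["Item1", "Item2", "Item3", "// ...remaining items"]
if not is_complete(result):
    # At this point the caller could retry, or split the input into smaller chunks.
    print("Elided entries:", find_elided_items(result))
```

A failed check here is a signal to retry or to send the input in smaller pieces, rather than to pass the partial list on to the confidence-threshold step.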
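The filename workarounds in point 3 (mixed-case extensions, and files that carry a MIME type but no extension) can be collected into one normalization step that runs before any upload. This is a sketch under stated assumptions: the `MIME_TO_EXT` and `EXT_ALIASES` tables and the `normalize_filename` helper are hypothetical and would need to be filled in with whatever formats the target API actually accepts.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical mapping; extend it with the types your uploads actually use.
MIME_TO_EXT = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "application/pdf": ".pdf",
}

# Hypothetical aliases for extensions that mean the same format.
EXT_ALIASES = {".jpeg": ".jpg"}

KNOWN_EXTS = set(MIME_TO_EXT.values())


def normalize_filename(name: str, mime_type: Optional[str] = None) -> str:
    """Return a base filename with a lowercase, recognized extension.

    If the name has no recognized extension, fall back to the MIME type.
    If neither is usable, the base name is returned unchanged.
    """
    path = PurePath(name)
    ext = path.suffix.lower()
    ext = EXT_ALIASES.get(ext, ext)
    if ext in KNOWN_EXTS:
        return path.stem + ext
    fallback = MIME_TO_EXT.get((mime_type or "").lower())
    if fallback:
        # Append instead of replacing, so a name like "Report v1.2" keeps its ".2".
        return path.name + fallback
    return path.name


print(normalize_filename("IMG_0001.JPG"))
print(normalize_filename("scan", "application/pdf"))
```

Checking the suffix against a known set, instead of trusting whatever follows the last dot, is a deliberate choice here: names from sources like G Drive often contain dots that are not extensions.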
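I think LLMs are great for applications like Cursor, or for customer support, where they don't need to give "perfect" responses because a human operator will prompt them further. How many times have you had to deal with stupid output from Cursor? (I'm a power user; I deal with this daily.) RAG is a cool application, and there's no real need for correctness or exactness there, IMO. I've fed it hundreds of my notes, which I reference sometimes. I get different answers each time, but I don't need them to be perfect.
:q!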