Hi HN,

I built this tool to solve the "flakiness" problem in UI testing. Existing AI agents often struggle with precise interactions, while traditional frameworks (Selenium/Playwright) break whenever the DOM changes.

The approach: instead of relying on hard-coded selectors or pure computer vision, I’m using a multi-agent system powered by multimodal LLMs. We pass both the screenshot (pixels) and the browser context (network requests, console logs, etc.) to the model. This lets the agent "see" the UI like a user and accurately map semantic intent ("Click the Signup button") to precise coordinates even if the layout shifts.

The goal is to mimic natural user behavior rather than follow a predefined script. It handles exploratory testing and finds visual bugs that code-based assertions miss.

I’d love feedback on the implementation, or to discuss the challenges of using LLMs for deterministic testing.
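To make the "pixels plus browser context" idea concrete, here is a minimal sketch of what one loop of such an agent could look like. It assumes Playwright for the browser layer; `askVisionModel` is a hypothetical placeholder for the multimodal LLM call, and none of this is the author's actual implementation.

```typescript
import { chromium } from "playwright";

// Hypothetical placeholder for a multimodal LLM call that returns click coordinates.
// The real system presumably uses structured output from its model provider.
async function askVisionModel(
  prompt: string,
  screenshotBase64: string
): Promise<{ x: number; y: number }> {
  throw new Error("wire up your model provider here");
}

async function clickByIntent(url: string, intent: string) {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Collect the extra browser context the post mentions: console logs and network requests
  const consoleLogs: string[] = [];
  const requests: string[] = [];
  page.on("console", (msg) => consoleLogs.push(msg.text()));
  page.on("request", (req) => requests.push(`${req.method()} ${req.url()}`));

  await page.goto(url);

  // Send pixels + context to the model and ask it to map the semantic intent to coordinates
  const screenshot = (await page.screenshot()).toString("base64");
  const prompt = [
    `Intent: ${intent}`,
    `Recent console output:\n${consoleLogs.slice(-20).join("\n")}`,
    `Recent network requests:\n${requests.slice(-20).join("\n")}`,
    "Return the pixel coordinates of the element to click.",
  ].join("\n\n");
  const { x, y } = await askVisionModel(prompt, screenshot);

  // Click by coordinates rather than a selector, so DOM/layout changes don't break the step
  await page.mouse.click(x, y);
  await browser.close();
}

clickByIntent("https://example.com", "Click the Signup button").catch(console.error);
```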
Recently built something where simple domain-specific heuristics crushed a fancy ML approach I assumed would win. This has me thinking about how often we reach for complex tools when simpler ones would work better. Occam's razor moments.

Anyone have similar stories? Curious about cases where knowing your domain beat throwing compute at the problem.
Hi HN! I built VAM Seek because I was frustrated with 1D seek bars – you never know where you're going until you get there.

VAM Seek renders a 2D thumbnail grid next to your video. Click any cell to jump. All frame extraction happens client-side via canvas – no server processing, no pre-generated thumbnails (a rough sketch of the technique follows below).

- 15KB, zero dependencies
- One-line integration
- Works with any <video> element

Live demo: https://haasiy.main.jp/vam_web/deploy/lolipop/index.html

Would love feedback!
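For anyone curious about the general technique, here is a rough, hypothetical sketch of client-side frame capture via canvas with a clickable grid. It is not VAM Seek's actual code, and it assumes a same-origin video source so canvas reads aren't blocked by tainting; `buildSeekGrid`, the probe element, and the cell count are all illustrative choices.

```typescript
// Sketch: seek a hidden clone of the video to evenly spaced timestamps,
// draw each frame to a canvas, and emit clickable thumbnails that jump the main video.
async function buildSeekGrid(video: HTMLVideoElement, container: HTMLElement, cells = 16) {
  // Hidden probe used for seeking so playback of the main video isn't disturbed
  const probe = document.createElement("video");
  probe.src = video.currentSrc;
  probe.muted = true;
  probe.preload = "auto";
  await new Promise<void>((res) =>
    probe.addEventListener("loadedmetadata", () => res(), { once: true })
  );

  const canvas = document.createElement("canvas");
  canvas.width = 160;
  canvas.height = 90;
  const ctx = canvas.getContext("2d")!;

  for (let i = 0; i < cells; i++) {
    const t = (probe.duration * (i + 0.5)) / cells;
    probe.currentTime = t;
    await new Promise<void>((res) =>
      probe.addEventListener("seeked", () => res(), { once: true })
    );

    // Capture the frame and turn it into a clickable thumbnail cell
    ctx.drawImage(probe, 0, 0, canvas.width, canvas.height);
    const img = document.createElement("img");
    img.src = canvas.toDataURL("image/jpeg", 0.7);
    img.addEventListener("click", () => { video.currentTime = t; });
    container.appendChild(img);
  }
}
```

Usage would be something like `buildSeekGrid(document.querySelector("video")!, document.querySelector("#grid")!)`, with the grid layout left to CSS.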