3 · author: Olibier · about 1 month ago · original post
Hi HN, I'm the creator of YoloForge. I built this because I hit a wall with a hobby computer vision project: I needed a custom dataset, and zero-shot tools like Grounding DINO just weren't accurate enough for my specific classes. I decided I'd rather write code for a couple of weeks than draw another box by hand.

I previously experimented with Grounding DINO and SAM3. While they are amazing for generic objects, I found they struggle with specific semantic requests (e.g. specific manufacturing parts, game characters, or distinguishing "a worker" from "a worker without a helmet").

I discovered that Gemini 3 Pro is surprisingly underrated for bounding-box tasks if you prompt it with detailed visual descriptions. It handles semantic understanding significantly better than standard zero-shot detectors.

URL: yoloforge.com

The Workflow:

- Upload a zip of raw images (stored in Cloudflare R2).
- Describe the class/classes in plain English.
- The system generates a .jsonl batch file and sends it to the Gemini Batch API. This allows us to process thousands of images in parallel at 50% of the standard cost.
- You review/correct boxes in the UI and export the YOLO train/val/test dataset.

Technical Challenges:

One hard part was getting valid JSON out of the LLM consistently. I ended up writing a robust parser that uses regex fallback strategies to literally "salvage" valid bounding boxes from malformed responses.

The Stack:

- Frontend: Next.js
- Backend: FastAPI, Celery (for async zip processing and polling the Batch API), Redis
- Storage: Supabase (Auth/DB), Cloudflare R2 (image storage)
- Model: Google Gemini 3 Pro via Batch API

There is a live demo on the landing page (no sign-up required) where you can upload a single image to test the detection logic. But of course the tool really shines with datasets that have thousands of images with multiple classes.

If you have any technical questions, please ask!
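A minimal Python sketch of the regex-fallback idea from the Technical Challenges section; this is not the production parser, and the box_2d/label field names assume Gemini's documented bounding-box output convention and one particular key order:

```python
import json
import re

# Matches objects like {"box_2d": [y0, x0, y1, x1], "label": "helmet"}.
# Field names follow Gemini's bounding-box convention and a common key order;
# adjust the pattern to whatever your prompt asks for.
_BOX_RE = re.compile(
    r'\{\s*"box_2d"\s*:\s*\[\s*(-?\d+)\s*,\s*(-?\d+)\s*,\s*(-?\d+)\s*,\s*(-?\d+)\s*\]'
    r'\s*,\s*"label"\s*:\s*"([^"]*)"\s*\}'
)

def salvage_boxes(raw: str) -> list[dict]:
    """Try strict JSON first; fall back to regex if the response is malformed."""
    try:
        data = json.loads(raw)
        if isinstance(data, list):
            return [d for d in data if "box_2d" in d and "label" in d]
    except json.JSONDecodeError:
        pass
    # Fallback: pull out every individually well-formed box object, ignoring
    # truncation, stray prose, or markdown fences wrapped around the JSON.
    return [
        {"box_2d": [int(y0), int(x0), int(y1), int(x1)], "label": label}
        for y0, x0, y1, x1, label in _BOX_RE.findall(raw)
    ]

# A truncated response still yields the two complete boxes:
broken = '[{"box_2d": [10, 20, 200, 220], "label": "helmet"}, ' \
         '{"box_2d": [5, 5, 90, 80], "label": "worker"}, {"box_2d": [1, 2,'
print(salvage_boxes(broken))
```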
3 · author: joshuafkon · about 1 month ago · original post
America's TFR is 1.67. I wanted to understand what it would actually take to get back to replacement (2.1), so I built a simulator where you can stack policies and see the projected effects. Every policy has cited effect sizes (Cohen, Milligan, Raute, etc.) with confidence levels. You can click any policy title to see the methodology and sources.

The model includes:

- Fiscal tracking (policy costs, deficit impact, GDP effects)
- Diminishing returns when stacking similar interventions (sketched below)
- Immigration with selection mechanisms and generational convergence
- Tax increases and entitlement reform as funding options (with growth drag)
- A few "illiberal" policies for analytical completeness

The honest answer seems to be: it's really hard. Most realistic packages get you to ~1.9-2.0 at enormous cost, and that's assuming the effect estimates transfer to the US context (they might not).

Built with vanilla JS. Feedback welcome, especially on the methodology or effect estimates I got wrong.
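A rough Python sketch of one way "diminishing returns when stacking similar interventions" can be modeled. The site itself is vanilla JS, and the decay factor, category grouping, and effect sizes here are assumptions for illustration, not the simulator's actual calibration:

```python
# Each additional policy in the same category contributes a discounted share
# of its own effect size, so stacking near-duplicates stops paying off.
from collections import defaultdict

BASELINE_TFR = 1.67
DECAY = 0.6  # assumed: 2nd policy in a category keeps 60% of its effect, 3rd keeps 36%, ...

def stacked_tfr(policies: list[dict], baseline: float = BASELINE_TFR) -> float:
    """policies: [{"name": ..., "category": "cash_transfer", "effect": 0.08}, ...]"""
    by_category = defaultdict(list)
    for p in policies:
        by_category[p["category"]].append(p["effect"])

    total = 0.0
    for effects in by_category.values():
        # The largest effect counts in full; each further same-category policy is discounted.
        for rank, effect in enumerate(sorted(effects, reverse=True)):
            total += effect * (DECAY ** rank)
    return baseline + total

print(stacked_tfr([
    {"name": "Child allowance", "category": "cash_transfer", "effect": 0.08},
    {"name": "Baby bonus", "category": "cash_transfer", "effect": 0.05},
    {"name": "Paid parental leave", "category": "leave", "effect": 0.06},
]))  # 1.67 + 0.08 + 0.05*0.6 + 0.06 = ~1.84
```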
16 · author: tsazan · about 1 month ago · original post
OP here. I took the unofficial IKEA US dataset (originally scraped by jeffreyszhou) and converted all 30,511 products into a flat, Markdown-like protocol called CommerceTXT.

Goal: see whether a flat structure is more efficient for LLM context windows.

Results:
- Scale: 30,000 products across 632 categories.
- Efficiency: the text version uses about 24% fewer tokens than the equivalent compressed JSON (roughly 3.6M tokens saved in total).
- Structure: files are organized by folder (e.g. /products/category/), which helps for testing hierarchical retrieval routers.

The link points to the dataset on Hugging Face, which includes the full benchmarks. The parser code is here: https://github.com/commercetxt/commercetxt

Happy to answer any questions about the conversion logic!
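To give a feel for the flattening, here is a rough Python sketch of the kind of transform involved. The field names and line format are illustrative only; the real CommerceTXT spec and directives live in the parser repo linked above:

```python
# Flatten a nested product JSON record into a CommerceTXT-style text block.
import json

product = {
    "name": "BILLY Bookcase",
    "category": "storage-furniture/bookcases",
    "price": {"amount": 59.99, "currency": "USD"},
    "dimensions": {"width_cm": 80, "depth_cm": 28, "height_cm": 202},
    "in_stock": True,
}

def to_flat_text(p: dict) -> str:
    lines = [
        f"NAME: {p['name']}",
        f"CATEGORY: {p['category']}",
        f"PRICE: {p['price']['amount']} {p['price']['currency']}",
        f"DIMENSIONS: {p['dimensions']['width_cm']}x{p['dimensions']['depth_cm']}x{p['dimensions']['height_cm']} cm",
        f"IN_STOCK: {'yes' if p['in_stock'] else 'no'}",
    ]
    return "\n".join(lines)

flat = to_flat_text(product)
print(flat)
# Dropping braces, quotes, and nesting punctuation is where most of the
# token savings versus compressed JSON come from.
print(len(json.dumps(product)), "chars as JSON vs", len(flat), "chars flat")
```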
2 · author: realdexter · about 1 month ago · original post
OP here. I built RepoReaper to solve code context fragmentation in RAG.

Unlike standard chat-with-repo tools, it simulates a senior engineer's workflow: it parses the Python AST for logic-aware chunking, uses a ReAct loop to JIT-fetch missing file dependencies from GitHub, and employs hybrid search (BM25 + vector). It also generates Mermaid diagrams for architecture visualization. The backend is fully async and persists state via ChromaDB.

Link: https://github.com/tzzp1224/RepoReaper
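For the "logic-aware chunking" part, here is a minimal Python sketch of the general idea: split a file on top-level functions and classes via the ast module instead of fixed-size windows. It illustrates the approach, not RepoReaper's actual chunker:

```python
import ast

def chunk_python_source(source: str, path: str = "<memory>") -> list[dict]:
    """Return one chunk per top-level function/class, keeping logical units intact."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start, end = node.lineno, node.end_lineno
            chunks.append({
                "path": path,
                "name": node.name,
                "kind": type(node).__name__,
                "text": "\n".join(lines[start - 1:end]),
            })
    return chunks

sample = '''
import os

def load_config(path):
    return os.path.exists(path)

class Repo:
    def clone(self):
        pass
'''
for c in chunk_python_source(sample, "repo/utils.py"):
    print(c["kind"], c["name"], "->", len(c["text"]), "chars")
```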
2 · author: zaochen1224 · about 1 month ago · original post
I built a small web tool that lets people create Arabic calligraphy without needing design software. Most existing tools are either too complex or very limited, so I wanted something simple and accessible.

Features:
• Write Arabic directly or translate from English
• 11 classic calligraphy styles (Thuluth, Naskh, Kufi, Diwani, etc.)
• Adjust layout, colors, line height, stroke, and rotation
• Export as PNG, JPG, or SVG
• No signup required

I'd appreciate any feedback on performance, UI, or calligraphy accuracy. This is a solo side project and still evolving.

Site: https://arabiccalligraphygenerator.online
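On the calligraphy-accuracy point: correct Arabic rendering generally needs contextual letter shaping plus the bidi algorithm before the text reaches a font. A small Python sketch using the arabic-reshaper and python-bidi packages; whether the site handles it this way is an assumption on the reviewer's part:

```python
# Shape Arabic text for rendering: join letters into contextual forms,
# then reorder into right-to-left visual order.
import arabic_reshaper                  # pip install arabic-reshaper
from bidi.algorithm import get_display  # pip install python-bidi

def shape_for_rendering(text: str) -> str:
    reshaped = arabic_reshaper.reshape(text)  # contextual (joined) letter forms
    return get_display(reshaped)              # bidi: visual RTL ordering

print(shape_for_rendering("بسم الله الرحمن الرحيم"))
```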
2 · author: K-dash · about 1 month ago · original post
Finding open source issues is easy. Deciding which ones are worth your time is not.

I built Contrib.FYI as a simple web app to reduce that decision cost.

Instead of relying on static, curated lists, it uses live GitHub API data and shows issues in chronological order, so discovery stays fresh.

On top of that, it surfaces a few early signals (language, stars, no comments, no linked PRs) to help you avoid opening issues that are already being worked on.

The goal is not to find more issues, but to find better candidates to spend your time on.

Source code is available here: https://github.com/K-dash/contrib-fyi

Feedback is welcome.
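For anyone curious about the raw API side, a small Python sketch of the kind of query this style of discovery can run against the GitHub search API: open "good first issue" issues in Python repos with no comments and no linked PR, newest first. The exact signals and thresholds Contrib.FYI uses are its own; the qualifiers below are standard GitHub issue-search syntax:

```python
import requests

QUERY = 'label:"good first issue" is:issue is:open language:python comments:0 -linked:pr'

def fresh_issues(token: str | None = None, per_page: int = 10) -> list[dict]:
    """Fetch the newest open issues matching QUERY from the GitHub search API."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"  # optional; raises the rate limit
    resp = requests.get(
        "https://api.github.com/search/issues",
        params={"q": QUERY, "sort": "created", "order": "desc", "per_page": per_page},
        headers=headers,
        timeout=10,
    )
    resp.raise_for_status()
    return [
        {"title": item["title"], "url": item["html_url"], "comments": item["comments"]}
        for item in resp.json()["items"]
    ]

for issue in fresh_issues():
    print(issue["comments"], issue["title"], issue["url"])
```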
1 · author: lucienpeng · about 1 month ago · original post
Just pushed an update (v1.1) to Struxs.

We had some users asking for a way to constrain the scope of the visual perception. Specifically, they were processing scanned forms where a field like "Gender" or "Payment Mode" would return inconsistent raw text (e.g. "M", "Male", or just a checked box symbol) depending on the document layout.

To solve this, we added an Enum type to the builder.

You can now visually map a region and strictly define the allowed states (e.g. ["Male", "Female"] or ["Sedan", "SUV", "Truck"]). The engine will now force the visual signal into one of those pre-defined buckets instead of returning ambiguous strings.

It's a small change, but it makes the JSON output deterministic and saves you from writing extra code to normalize the data downstream.

Happy to hear any feedback.
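To show what that saves downstream, here is a rough Python sketch of the normalization step an Enum field replaces. This is illustrative only and is not the Struxs builder API; the alias table and fuzzy cutoff are assumptions:

```python
# Map messy raw extractions onto a fixed set of allowed enum states.
from difflib import get_close_matches

ALLOWED = {"gender": ["Male", "Female"], "vehicle_type": ["Sedan", "SUV", "Truck"]}
ALIASES = {"m": "Male", "f": "Female"}  # a lone checkbox symbol still needs layout context

def to_enum(field: str, raw: str) -> str | None:
    """Force a raw extracted string into one of the field's allowed enum values."""
    allowed = ALLOWED[field]
    lowered = {value.lower(): value for value in allowed}
    cleaned = raw.strip().lower()
    if cleaned in lowered:
        return lowered[cleaned]
    if cleaned in ALIASES and ALIASES[cleaned] in allowed:
        return ALIASES[cleaned]
    match = get_close_matches(cleaned, list(lowered), n=1, cutoff=0.6)
    return lowered[match[0]] if match else None

print(to_enum("gender", "M"))          # -> Male
print(to_enum("vehicle_type", "suv"))  # -> SUV
print(to_enum("vehicle_type", "van"))  # -> None (not an allowed state)
```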
1 · author: khoinp1012 · about 1 month ago · original post
I built kprotect because I wanted a way to protect my sensitive files (SSH keys, env files) that went beyond standard Linux permissions. Even if a process is running as root, it shouldn't be able to read my secrets unless it's part of a trusted execution chain.

How it works: it uses BPF LSM (Linux Security Modules) to intercept file access at the kernel level. Instead of just checking the PID or the binary name, it looks at the entire lineage (the "Chain of Trust"). For example, cat is only allowed to read my SSH keys if the parent process is my-terminal and the grandparent is vscodium.

Key Tech:

- Backend: Rust + Aya (for the eBPF bits).
- Frontend: Tauri + React for the dashboard.
- Security: logs and configs are AES-encrypted to prevent tampering.

It's currently in beta (0.1.0) and requires a kernel (5.10+) with BPF LSM enabled. I'd love to hear feedback on the "Chain of Trust" logic, specifically whether anyone sees edge cases in how I'm verifying the process ancestors.

GitHub: https://github.com/khoinp1012/kprotect
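To make the "Chain of Trust" idea concrete, here is a userspace Python sketch of just the ancestry-matching logic, walking /proc. The real kprotect enforcement happens in-kernel via BPF LSM (Rust + Aya); matching on comm names from /proc is a simplification for illustration:

```python
import os

def ancestry(pid: int) -> list[str]:
    """Return [comm(pid), comm(parent), comm(grandparent), ...] up to init."""
    chain = []
    while pid > 1:
        with open(f"/proc/{pid}/comm") as f:
            chain.append(f.read().strip())
        with open(f"/proc/{pid}/status") as f:
            ppid = next(int(line.split()[1]) for line in f if line.startswith("PPid:"))
        pid = ppid
    return chain

def chain_trusted(pid: int, expected: list[str]) -> bool:
    """expected is ordered from the accessing process outward,
    e.g. ["cat", "my-terminal", "vscodium"]."""
    actual = ancestry(pid)
    return actual[:len(expected)] == expected

# Example: check the current process against a python3 -> bash policy.
print(chain_trusted(os.getpid(), ["python3", "bash"]))
```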