1 point | by moaffaneh | 27 days ago | original post
Hi all, I am designing an AWS-based unstructured document ingestion platform (PDF/DOCX/PPTX/XLSX) for large-scale enterprise repositories, using vision-language models to normalize pages into layout-aware Markdown and then building search/RAG indexes or extracting structured data.

For those who have built something similar recently, what approach did you use to preserve document structure reliably in the normalized Markdown (headings, reading order, nested tables, page boundaries), especially when documents are messy or scanned? Did you do page-level extraction only, or did you use overlapping windows / multi-page context to handle tables and sections spanning pages?

On the indexing side, do you store only chunks + embeddings, or do you also persist richer metadata per chunk (page ranges, heading hierarchy, has_table/contains_image flags, extraction confidence/quality notes, source pointers), and if so, what proved most valuable? How does that help in the agent retrieval process?

What prompt patterns worked best for layout-heavy pages (multi-column text, complex tables, footnotes, repeated headers/footers), and what failed in practice?

How did you evaluate extraction quality at scale beyond spot checks (golden sets, automatic heuristics, diffing across runs/models, table-structure metrics)?

Any lessons learned, anti-patterns, or "if I did it again" recommendations would be very helpful.
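To make the metadata question concrete: one possible shape for a per-chunk record carrying the fields the post lists (page ranges, heading hierarchy, table/image flags, confidence, source pointer). This is only a sketch with hypothetical names, not a recommended schema:

```python
from dataclasses import dataclass, field

@dataclass
class ChunkMetadata:
    """Hypothetical per-chunk record persisted alongside the embedding."""
    chunk_id: str
    source_uri: str                      # pointer back to the original document
    page_start: int                      # first page the chunk touches
    page_end: int                        # last page (chunks may span pages)
    heading_path: list = field(default_factory=list)  # e.g. ["3. Financials", "3.2 Revenue"]
    has_table: bool = False
    contains_image: bool = False
    extraction_confidence: float = 1.0   # 0-1 quality score from the extraction pass
    notes: str = ""                      # free-form extraction quality notes

# Example: a chunk covering a table that spans a page boundary
chunk = ChunkMetadata(
    chunk_id="doc42-c007",
    source_uri="s3://bucket/reports/q3.pdf",
    page_start=12,
    page_end=13,
    heading_path=["3. Financials", "3.2 Revenue by Region"],
    has_table=True,
    extraction_confidence=0.82,
)
print(chunk.page_end - chunk.page_start + 1)  # pages spanned
```

Fields like `heading_path` and `page_start`/`page_end` let an agent cite a precise location at retrieval time, while flags like `has_table` allow filtering or re-ranking before the chunk text is ever read.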
1 point | by rinvi | 28 days ago | original post
I wanted to control the browser from the terminal, so I made buse:

buse browser-1                                 # open chrome
buse browser-1 navigate "https://example.com"
buse browser-2                                 # open a second browser
buse browser-2 search "cat"
buse browser-1 observe                         # returns JSON about the page
buse browser-1 click 16                        # clicks on the learn more link

I've been reading about agentic computer use and I tried to use MCPs and Browserbase, but there was just a lot of friction for me. So I brought it to the CLI instead.

https://github.com/rinvii/buse
1 point | by nocaptable | 28 days ago | original post
I've been with a startup through two funding rounds, and growth has been very healthy.

I recently asked leadership how much I've been diluted, just for financial planning purposes. I assume I've been diluted "a normal amount" and am fine with that; I just need to know the number. Instead I got a non-answer from leadership, which surprised me. So I'm curious:

- How common is this practice in mid-stage startups?
- What is the actual rationale for withholding this information? I get why companies may want to keep the cap table confidential, but an employee's dilution factor seems like the kind of thing that doesn't matter for cap table confidentiality, yet matters a lot to the employee.

Thanks in advance for any color or perspective on this.
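For anyone unfamiliar with the mechanics behind the question: each priced round that issues new shares multiplies every existing holder's ownership by one minus the fraction of the post-round company those new shares represent. A tiny illustration with made-up numbers (nothing here reflects the poster's actual situation):

```python
def diluted_ownership(initial_pct, new_share_fractions):
    """Ownership percentage after successive rounds.

    initial_pct: ownership before the rounds (e.g. 0.50 for 0.50%).
    new_share_fractions: for each round, the fraction of the post-round
    company taken by newly issued shares (new investors + pool top-ups).
    """
    pct = initial_pct
    for frac in new_share_fractions:
        pct *= (1 - frac)  # existing holders keep the remaining fraction
    return pct

# Hypothetical: start at 0.50%, two rounds where new issuance takes 25% and 20%
print(round(diluted_ownership(0.50, [0.25, 0.20]), 3))  # 0.3
```

This is exactly why the single "dilution factor" the poster asks for reveals so little about the cap table: it is just the product of per-round retention fractions, not who owns what.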
1 point | by MarkSweep | 28 days ago | original post
Did you just wake up from a 20-year coma? Did you build a bunch of buzzword-compliant web services back in the early 2000s and want all your SOAP and WSDL to be relevant again? Now you can put the smooth sheen of AI on your pile of angle brackets by exposing your SOAP-based web service as a Model Context Protocol (MCP) server.