Ask HN: Building a relatively advanced LLM agent from scratch?
As we know, OpenAI is not so open.

In 2023, I was playing with transformers and RNNs, and I understood how they worked from top to bottom (e.g. I made my own Keras and could whiteboard small nets), and I could throw things together in Keras or TF pretty quickly.

Then I got a job and never touched any of that again.

Data and compute notwithstanding, how hard would it be to make a pet-project foundation model using the latest techniques? I've heard about MoE and things like that, and I figure we're not just throwing a bunch of layers and dropout into Keras anymore.
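For what it's worth, since the post mentions MoE: here is a minimal sketch of what a mixture-of-experts feed-forward block can look like in Keras/TF (the toolkit named above). This is dense "soft" routing that weights every expert per token; real foundation models use sparse top-k routing with capacity limits and load-balancing losses, all omitted here. The class name SoftMoE, the expert count, the hidden width, and the gelu activation are illustrative choices, not anything from the post.

    # A minimal soft mixture-of-experts feed-forward layer (illustrative sketch).
    import tensorflow as tf

    class SoftMoE(tf.keras.layers.Layer):
        def __init__(self, num_experts=4, d_ff=256, **kwargs):
            super().__init__(**kwargs)
            self.num_experts = num_experts
            self.d_ff = d_ff

        def build(self, input_shape):
            d_model = input_shape[-1]
            # One small feed-forward "expert" per slot.
            self.experts = [
                tf.keras.Sequential([
                    tf.keras.layers.Dense(self.d_ff, activation="gelu"),
                    tf.keras.layers.Dense(d_model),
                ])
                for _ in range(self.num_experts)
            ]
            # Gating network: per-token softmax weights over the experts.
            self.gate = tf.keras.layers.Dense(self.num_experts)

        def call(self, x):
            # x: (batch, seq, d_model)
            weights = tf.nn.softmax(self.gate(x), axis=-1)                 # (b, s, E)
            expert_out = tf.stack([e(x) for e in self.experts], axis=-1)   # (b, s, d, E)
            # Combine expert outputs, weighted per token by the gate.
            return tf.einsum("bsde,bse->bsd", expert_out, weights)

    # Usage: drop it into a transformer block in place of the usual FFN.
    layer = SoftMoE(num_experts=4, d_ff=512)
    y = layer(tf.random.normal([2, 16, 128]))  # -> shape (2, 16, 128)

The point of the sketch is that the building blocks are still ordinary Dense layers; what changed since the stack-layers-and-dropout days is mostly the routing, the scale, and the training recipe around them.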