really want a model that spits out high-quality, info-dense videos that answer my questions better than deep research, with great visualizations, references, memory aids, etc.
this seems just about technically possible, but would definitely be slow. I wonder how many years of progress we need to get the pipeline under 5 seconds of latency.