As you can tell from the previous two posts on Page Street Labs, I have been obsessed with Very Large Parameter (VLP) models lately. I wasn’t always this way. On my personal blog and Twitter feed, I have written plenty about the culture of building models by stacking layers and praying it works. Ever since we figured out that adding more parameters (more layers, specifically) helps, folks have been pushing that limit. Here’s an example from ImageNet: And most of those efforts are B-O-R-I-N-G (...