AI training data is frequently thought of as a publicly available commodity: something that is scraped off the internet, bundled into a big file, and fed to a model like digital food. While scraping certainly plays a role, there’s another, often forgotten, component critical to training data: the data collectors. A significant amount of human labor goes into creating, cleaning, ranking, and enriching datasets. The unsung heroes are the thousands, if not hundreds of thousand...