A retired AI researcher at my local hardware store made me rethink how I train my models

I was at Ace Hardware over on Grand Avenue last Saturday picking up some PEX fittings for a job, and I ran into this older guy in the plumbing aisle who was staring at the shelf for like five minutes. Turns out he used to work on early neural networks back in the 90s, and he just started talking to me about how everyone today overcomplicates things with huge datasets. He told me his old team would sometimes get better results by feeding a model just a few dozen carefully picked examples rather than millions of junk ones. I got to thinking about how I've been spending hours scraping tons of data when maybe I should be more deliberate about quality instead. Have any of you tried cutting way back on your training data size and actually seen improvements in accuracy?

3 comments

val974 26d ago

Huh, that's wild. I used to be all about the "more data is always better" mindset, thinking you just needed a huge pile to train anything decent. But honestly, this makes a ton of sense. I've spent countless hours cleaning up messy datasets full of garbage that probably just confused the model anyway. Maybe the real trick is picking the right handful of examples that cover the important edge cases instead of drowning in noise. It's like teaching someone to fix a leaky pipe by showing them every possible pipe in existence versus just showing them the one that's actually broken. You ever try this with a smaller, high-quality set and see if your results held up over time?

margaretc42 25d ago

Yeah, I trimmed a dataset to about 200 good examples and the model actually got better.

noahwood 26d ago

Quality over quantity almost always wins in my experience with AI training data... I've seen models actually get worse when you feed them a ton of junk data that just makes them learn the wrong patterns. The pipe analogy is spot on actually.