
Finally realized I was overcleaning my data for 2 years lmao

I was running all my training data through like 15 steps of cleaning thinking I was being thorough. Then a friend from a startup in SF showed me their pipeline - they barely clean anything and their model performance was actually better. I had been stripping out too much useful noise. Has anyone else found that less cleaning can sometimes help with AI models?

3 Comments
charlie198
My roommate spent 3 months cleaning data that ended up being a corrupted CSV file.
6
scott.alex
Ugh, that's brutal, I feel for them.
0
shane170
6d ago
lol yeah I went through the same phase, thought I was a data janitor scrubbing every little stain out. Turns out I was basically throwing the baby out with the bathwater and running a model on nothing but bathwater fumes. My buddy from some no-name startup just dumps his raw data in, runs a basic filter, and his stuff outperforms mine every time. It's kind of embarrassing how many hours I wasted on "cleaning" when the noise was actually doing the heavy lifting. What kind of stuff were you stripping out that you now think was useful?
5