Ag's Junk Data Problem

Growing up, I often heard farmers use a great saying: “If you put rotten feed in a silo, you get rotten feed out of a silo.”  The same applies to farm data.  It seems like every week a new product emerges that is designed to collect, analyze, and benchmark farmers’ data in hopes of improving productivity.  In the race to collect farm data, I think the data collectors often forget this old farm expression.  There is no point in collecting and aggregating “junk” data.

What is junk data?  I think of junk data as information generated by improperly calibrated equipment or by sloppy farming operations. For example, if a combine’s yield monitor is set up for a 7-row corn head when the machine actually carries an 8-row head, it calculates the harvested area from the wrong width, and its yield-per-acre figures end up off by roughly 12.5%.  Similarly, a farmer who doesn’t fully clean out his corn planter when switching hybrids may plant a field that is only 95% the hybrid he recorded, introducing a potential 5% inaccuracy.
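To see where a figure like 12.5% comes from, here is a minimal sketch of how a yield monitor might turn harvested bushels into bushels per acre. It assumes 30-inch rows and that harvested area is simply head width times distance traveled; the function names and the 200-bushel pass are hypothetical. With the head set to 7 rows instead of 8, the monitor credits only 7/8 of the true area, so the reported yield runs about 14.3% high. Put the other way, the true yield is 12.5% below what the monitor reports, so the exact percentage depends on which number you measure against.

```python
SQ_FT_PER_ACRE = 43_560
ROW_SPACING_FT = 2.5  # assumption: 30-inch corn rows


def harvested_acres(rows: int, distance_ft: float, row_spacing_ft: float = ROW_SPACING_FT) -> float:
    """Area covered by the head, in acres: head width times distance traveled."""
    return rows * row_spacing_ft * distance_ft / SQ_FT_PER_ACRE


def yield_per_acre(bushels: float, rows_configured: int, distance_ft: float) -> float:
    """Bushels per acre as a monitor would compute it from its configured head width."""
    return bushels / harvested_acres(rows_configured, distance_ft)


# Hypothetical pass: 200 bushels over 2,178 ft with a true 8-row (20 ft) head covers exactly 1 acre.
true_yield = yield_per_acre(200, rows_configured=8, distance_ft=2_178)
# Same grain, same distance, but the monitor thinks the head has only 7 rows.
reported_yield = yield_per_acre(200, rows_configured=7, distance_ft=2_178)

print(f"true: {true_yield:.1f} bu/ac, reported: {reported_yield:.1f} bu/ac")
print(f"reported is {reported_yield / true_yield - 1:.1%} high; "
      f"true is {1 - true_yield / reported_yield:.1%} below reported")
```

The grain itself is measured correctly in this sketch; only the area is wrong. That is why a simple setup mistake can quietly distort every per-acre number the machine records.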

In the past, such inaccuracies affected only the farmer who generated the data.  With the rise of big data analytic tools, that is no longer the case.  A farmer who enters inaccurate yield data into a benchmarking tool skews the data for everyone else.  As a result, a farmer who enters accurate data from properly calibrated machines ends up comparing his results against distorted benchmarks.  Because he doesn’t know the benchmark contains junk, he has no way of knowing that his comparison is skewed.

Farmers should realize that there is a potential liability issue here, too.  Some ag technology provider contracts require farmers to upload only accurate data. I’m sure the day will come when a landowner intentionally uploads inflated yield data to make his farmland look more valuable before a sale.

What we really need is the ability to clean farm data.  When I do my taxes in TurboTax, at the end of the process the software points out where I might have made a mistake entering data.  How does the software know this?  It doesn’t, not for certain.  But by comparing the figures I entered to my previous tax returns and to the millions of other people who filed similar returns, TurboTax is able to ask me: “Are you sure you sold that old car for $100,000?”  At that point I realize I entered an extra zero and fix the return.  Google’s search bar is similarly good at sniffing out mistakes in search terms.  Does your farm data analytic tool do the same if you unknowingly upload junk data?
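One way an analytic tool could ask its own “Are you sure?” question is to compare each new entry against the farmer’s own history and against similar farms, and flag values that sit far outside both. The sketch below is a hypothetical illustration, not how TurboTax or any ag platform actually works: it swaps in a standard robust outlier test, the modified z-score based on the median and median absolute deviation, with 3.5 as a commonly used cutoff. The function names, the threshold, and the sample yields are all assumptions.

```python
from statistics import median


def outlier_score(value: float, reference: list[float]) -> float:
    """Absolute modified z-score of `value` against `reference` (median/MAD based)."""
    med = median(reference)
    mad = median(abs(x - med) for x in reference)
    if mad == 0:
        # All reference values are identical: anything different is maximally suspicious.
        return 0.0 if value == med else float("inf")
    return abs(0.6745 * (value - med) / mad)


def sanity_check(yield_bu_ac: float, own_history: list[float],
                 peer_yields: list[float], threshold: float = 3.5) -> list[str]:
    """Return 'Are you sure?' prompts; an empty list means nothing looked unusual."""
    prompts = []
    if own_history and outlier_score(yield_bu_ac, own_history) > threshold:
        prompts.append(f"Are you sure? {yield_bu_ac:.0f} bu/ac is far from your own past yields.")
    if peer_yields and outlier_score(yield_bu_ac, peer_yields) > threshold:
        prompts.append(f"Are you sure? {yield_bu_ac:.0f} bu/ac is far from similar farms' yields.")
    return prompts


# Hypothetical data: one farmer's past corn yields and yields from comparable farms.
history = [195, 205, 210, 198, 202]
peers = [180, 190, 200, 210, 220, 205, 195]

for entry in (201, 2000):  # 2000 is the "extra zero" typo
    print(entry, sanity_check(entry, history, peers))
```

A median-based test is used here because, unlike an average, it is not dragged around by the very junk entries it is trying to catch. The tool only asks the question; the farmer still decides whether the number is right.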

My advice to farmers is two-fold.  First, calibrate those data collection machines.  Accurate data is valuable.  Junk data is just junk.  Second, look for analytic data aggregators that “clean” data.  If the program you are using now doesn’t do this, my guess is that it will soon.