Clean data means accurate data

by Deborah Jeanne Sergeant

WATERLOO, NY – Staying clean is pretty hard for many farmers to do. But keeping “clean” data is vital for building an accurate yield potential database.

Jodi Putman, field crops specialist with Cornell’s Northwest New York Dairy, Livestock & Field Crops Team, presented “The Processing/Cleaning of Soybean Yield Monitor Data for Standardized Yield Maps Across Farms, Fields and Years” at the recent Soybean & Small Grain Congress.

“A yield potential database has never been developed for soybeans,” Putman said.

In figuring its guidelines for fertility management, Cornell University takes into account soil type. The data are being used in on-farm research studies, assessing nutrient balances, determining yield potential and creating management zones.

But to get accurate data, farms and farm workers need to use the same methods for collecting the information.

“A data cleaning protocol consistent across farms, fields and years is needed to evaluate the quality and credibility of yield monitor data for validity of management decisions made based on the yield data,” Putman said. “Semi-automation is needed for quick and consistent processing of whole-farm data.”

Putman related that although soybeans have been grown in North America since 1765, little data have been collected about them. According to the 2018 Agriculture Overview, New York farmers reported an average yield of 52 bushels/acre.

Raising that average yield will rely on technology, according to Putman: renewable energy sources; sensors that provide actionable data to be processed and acted on; data analytics that put information collected in real time to use; and artificial intelligence and machine learning for applications such as drones, field scouting and autonomous equipment.

“The biggest issue with technology is adoption,” Putman said. “The generations have a different learning curve.”

With that learning curve can come inaccurate data. Putman noted that in a 1999 study by Blackmore and Moore, 32% of observations in grain crops were removed due to errors in the yield data. A 2001 study by Thylen showed that the error percentage ranged from 10–50%, and a third study, by Simbahan in 2004, showed that 13–20% of data had errors.

Regardless of how high the percentage is, it’s clear grain crop data are rife with inaccuracies.

“How many of you have multiple people running the combine or chopper?” Putman asked attendees. “Quite a few of you.”

She cited examples of ways data may be recorded inaccurately. The speed across the field can influence the quality of the data. The start- and end-pass delays – the periods when the harvester is speeding up at the beginning of a pass or slowing down at the end of a row or field – can also vary greatly among equipment operators.

The sensor may record with a delay relative to the geolocation. Equipment may pass over an area already harvested if the operator isn’t perfectly precise. Data-gathering instruments may also lack careful calibration.
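The error sources above are the kinds of observations a cleaning pass removes. A minimal sketch of such filters follows; it is an illustration, not Putman’s actual protocol, and the field names (`speed_mph`, `overlap`), the speed bounds and the delay-point counts are hypothetical stand-ins for whatever a given yield monitor exports.

```python
def clean_pass(observations, min_speed=2.0, max_speed=7.0,
               start_delay=4, end_delay=4):
    """Filter one harvester pass of raw yield monitor points.

    Drops the start- and end-of-pass delay points (logged while the
    harvester is speeding up or slowing down), points at implausible
    ground speeds, and points flagged as overlapping ground that was
    already harvested.
    """
    # Trim the start- and end-of-pass delay points first.
    trimmed = observations[start_delay:len(observations) - end_delay]
    kept = []
    for obs in trimmed:
        if not (min_speed <= obs["speed_mph"] <= max_speed):
            continue  # speed too low/high for a reliable flow reading
        if obs.get("overlap", False):
            continue  # swath overlapped an already-harvested area
        kept.append(obs)
    return kept
```

The same idea extends to the other error sources she listed, such as shifting each point to correct for the sensor’s recording delay before filtering.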

Putman said some farmers assume that if their scale tickets match the yield monitor data, their data must be accurate.

“Variability across the field can cause it to be inaccurate,” Putman warned.

She added that other farmers think data cleaning is too hard or takes too much time. She encourages farmers to manually clean 10 sample fields, tracking the cleaning settings and checking the results against known features of larger fields. The cleaning settings need to be determined each harvest. Then farmers can apply the average cleaning settings to farm-wide “batch cleaning.”
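That workflow – hand-clean a few sample fields, then reuse the averaged settings farm-wide – can be outlined in a few lines. This is only a sketch of the averaging step; the setting names are invented for illustration.

```python
def average_settings(per_field_settings):
    """Average each cleaning setting across the hand-cleaned sample
    fields, producing one settings dict to reuse for farm-wide batch
    cleaning. Assumes every field reports the same setting keys."""
    keys = per_field_settings[0].keys()
    n = len(per_field_settings)
    return {k: sum(s[k] for s in per_field_settings) / n for k in keys}

# Hypothetical settings found while hand-cleaning three sample fields.
samples = [
    {"min_speed": 2.0, "max_speed": 6.5, "start_delay": 3},
    {"min_speed": 2.5, "max_speed": 7.0, "start_delay": 4},
    {"min_speed": 3.0, "max_speed": 7.5, "start_delay": 5},
]
batch_settings = average_settings(samples)
```

Because the right settings shift with conditions, this averaging would be redone each harvest, as she noted.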

“With practice, data cleaning typically takes one to two hours per farm,” Putman said. “Most of the headaches I’ve had have been getting the raw data from the yield monitor.

“All farms that submit data are combined in one large database to derive yield frequency histograms per soil type, and yield potentials per soil type can be set if at least 50 data points are available. What I learned from the corn project also needs to be checked for grain. The data here are whole field, whole farm, including the headlands.”
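The per-soil-type aggregation with its 50-point minimum could be sketched as follows. The choice of the median as the yield potential statistic, and the soil series names in the example, are assumptions for illustration, not part of the protocol described.

```python
from collections import defaultdict
from statistics import median

def yield_potential_by_soil(points, min_points=50):
    """Group cleaned yield points (soil_type, bu/acre) by soil type
    and report a yield potential only for soil types with at least
    `min_points` observations, as the database protocol requires.
    The median stands in for whatever summary statistic is used."""
    by_soil = defaultdict(list)
    for soil, yld in points:
        by_soil[soil].append(yld)
    return {soil: median(ylds)
            for soil, ylds in by_soil.items()
            if len(ylds) >= min_points}
```

Soil types with too few cleaned points simply drop out of the result until more farms and years contribute data.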

So far, she has received and cleaned data from 7,790 acres of soybeans from 2017-19. The average whole-farm yield was 56.5 bushels/acre in 2016 (124 acres), 52.6 in 2017 (503 acres) and 67.6 in 2018 (351 acres).

Putman wants farmers to name their fields, establish boundaries and calibrate equipment pre-harvest. In the field, they need to calibrate equipment again and record the field name, harvest speed, header height, swath width and rows.

“Don’t have multiple combines or choppers in the field,” Putman said.

She believes that by making the right decisions in field management, operators can generate more reliable data.

“Mitigating errors at the source reduces the amount of data loss and the time needed to clean data,” Putman said. “Accuracy of yield data depends on proper calibration and setup of yield monitoring equipment prior to and during the harvest.”

It’s also vital to keep up with data monitoring throughout the harvest.

“Download raw yield monitor data periodically during the season,” Putman said. “The data cleaning protocol requires raw data to be transferred to Ag Leader Format. Save the original files and back them up on thumb drives and your computer. I’ve had people lose their data in the cloud.”

While it may seem a lot of work to record yield data, the information offers valuable insights to farmers.

“Once we have several years of data, we can identify zones by soil type so you can identify your yield potential by soil type,” Putman said.

March 13, 2020 | Western Edition
