Code
library(nlraa)
library(tidyverse)
library(here)
library(janitor)
library(Metrics)
library(cowplot)
library(nlme)
library(kableExtra)
library(purrr)
library(knitr)
library(patchwork)
# glimpse(sm)
<- sm %>%
data clean_names()
March 3, 2024
Farmers need to understand the biology of plants and their responses to fertilizers to maximize yield. In this report, I conduct non-linear least squares on experimental growth data for three grains in Greece to make predictions on their yields. Because many crop and soil processes are better represented by nonlinear models comapred to linear models, nonlinear regression models are used to explore this data.
“We used data from Danalatos et al. (2009), which represent destructive measurements of aboveground biomass accumulation with time for three crops: fiber sorghum (F), sweet sorghum (S), and maize (M), growing in a deep fertile loamy soil of central Greece under two management practices: high and low input conditions, in 2008.” (Archontoulis 2015). The data used in this report was accessed by installing the “nlraa” package and then using library(nlraa).
Archontoulis, S.V. and Miguez, F.E. (2015). Nonlinear Regression Models and Applications in Agricultural Research. Agronomy Journal, Volume 107, Issue 2. Retreived from: https://acsess.onlinelibrary.wiley.com/doi/10.2134/agronj2012.0506
The five variables in the dataset are Day of the Year (DOY), Block, Input, Crop, and Biomass yield in Mg/ha. The data variables are described as follows:
Load libraries, load data, clean/tidy the data
Run nls on one crop (Sorghum)
Use purrr to run NLS models for all 24 combinations of plot
Make a “good looking” table
I use the Beta function:
Y = Ymax (1 + (tc - t)/(tc - tm))(t/tc)^(tc / (tc - tm))
“This model was selected because it captures the decline of biomass toward the end of the growing season and supplementary figure for the beta growth function. Also, the parameters have clear meaning and are very suitable to answer the research questions.” (Archontoulis 2015).
The parameters are defined as:
### build a function
beta <- function(doy, t_e, t_m, y_max){
out = y_max * (1 + (t_e - doy)/ (t_e - t_m)) * (doy/t_e)^(t_e/(t_e - t_m))
return(out)
}
y_max_guess <- 20
t_e_guess <- 240
t_m_guess <- 200
guess <- ggplot(data = data, aes(x = doy, y = yield, shape = crop)) +
geom_point() +
geom_smooth() +
facet_wrap(~input, labeller = labeller(input = c("2" = "High", "1" = "Low"))) +
labs(x = "Day of the Year",
y = " Dry biomass (Mg/ha)",
shape = "Crop") +
theme_bw()
After defining the model, building a function, and making initial guesses, I now run the NLS on one crop: the high input sweet sorghum (S) crop. The selected parameter values, standard errors, and p-values of the estimated parameters are displayed in Table 1.
### Sorghum Fields w/ High Inputs
sm_high <- data %>% filter(crop == "S" & input == "2")
sm_nls = nls(formula = yield ~ beta(doy, t_e, t_m, y_max),
data = sm_high,
start = list(t_e = t_e_guess, t_m = t_m_guess, y_max = y_max_guess),
trace = FALSE)
sm_nls %>%
broom::tidy() %>%
mutate(p.value = ifelse(p.value <= 0.05, "<0.05")) %>%
kbl(digits = 2, align = NULL) %>%
kable_classic()
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
t_e | 281.59 | 2.08 | 135.33 | <0.05 |
t_m | 244.78 | 3.51 | 69.70 | <0.05 |
y_max | 39.82 | 2.25 | 17.71 | <0.05 |
Table 1: The selected parameter values, standard errors, and p-values of the estimated parameters for NLS on the high input sweet sorghum (S) crop.
Now I run the NLS models for all 24 combinations of plot, input level, and crop type using purrr. Table 2 portrays the RMSE and chosen parameter values of the best fitted models for each species.
### define a function for NLS for all
crop <- function(nls_test){
nls(yield ~ beta(doy, t_e, t_m, y_max),
data = nls_test,
start = list(t_e = t_e_guess, t_m = t_m_guess, y_max = y_max_guess))
}
### purrr
yield_all <- data %>%
group_by(block, input, crop) %>%
nest() %>%
mutate(nls_model = map(data,~crop(.x))) %>%
mutate(predictions = map2(nls_model, data, ~predict(.x, newdata = .y))) %>%
mutate(rmse = map2_dbl(predictions, data, ~ Metrics::rmse(.x, .y$yield))) %>%
mutate(smooth = map(nls_model, ~predict(.x, newdata = list(doy = seq(147, 306)))))
rmse_table <- yield_all %>%
group_by(crop) %>%
summarise(rmse = min(rmse))
low_rmse <- yield_all %>%
filter(rmse %in% rmse_table$rmse)
low_rmse_M <- broom::tidy(low_rmse$nls_model[[1]]) %>%
mutate(crop = "Maize(M)")
low_rmse_S <- broom::tidy(low_rmse$nls_model[[2]]) %>%
mutate(crop = "Sweet Sorghum (S))")
low_rmse_F <- broom::tidy(low_rmse$nls_model[[3]]) %>%
mutate(crop = "Fiber Sorgum (F)))")
low_rmse_combined <- bind_rows(low_rmse_M, low_rmse_S, low_rmse_F)
low_rmse_combined <- low_rmse_combined[, c("crop", setdiff(names(low_rmse_combined), "crop"))]
low_rmse_combined %>%
kbl(digits = 2, align = NULL) %>%
kable_classic()
crop | term | estimate | std.error | statistic | p.value |
---|---|---|---|---|---|
Maize(M) | t_e | 252.57 | 1.87 | 135.13 | 0 |
Maize(M) | t_m | 225.23 | 1.93 | 116.92 | 0 |
Maize(M) | y_max | 18.93 | 0.84 | 22.55 | 0 |
Sweet Sorghum (S)) | t_e | 278.35 | 1.82 | 153.18 | 0 |
Sweet Sorghum (S)) | t_m | 245.77 | 3.82 | 64.42 | 0 |
Sweet Sorghum (S)) | y_max | 31.35 | 2.05 | 15.31 | 0 |
Fiber Sorgum (F))) | t_e | 280.43 | 1.84 | 152.49 | 0 |
Fiber Sorgum (F))) | t_m | 245.00 | 3.54 | 69.24 | 0 |
Fiber Sorgum (F))) | y_max | 29.06 | 1.66 | 17.53 | 0 |
Table 2: The RMSE and chosen parameter values of the best fitted models for each species – fiber sorghum (F), sweet sorghum (S), and maize (M).
# Unnest predictions from data and clean maize data
un_df <- yield_all %>%
filter(block==1) %>%
tidyr::unnest(smooth) %>%
mutate(doy=seq(147,306)) %>%
filter(!(doy>263 & crop=="M"))
# Create a dataframe to add corn data
hi_filter <- data %>%
filter(block == 1 & input == 2)
low_filter <- data %>%
filter(block == 1 & input == 1)
# Make graphs
hi_plot <- un_df %>%
filter(block == 1 & input == 2) %>%
ggplot() +
geom_point(data = hi_filter, aes(x = doy, y = yield, shape = crop)) +
geom_line(aes(x = doy, y = smooth, linetype = crop)) +labs(y = " ", x = "DOY", color = " ")+
theme_minimal()
# hi_plot
low_plot<-un_df |>
filter(block==1 & input==1) |>
ggplot()+
geom_point(data=low_filter,aes(x=doy,y=yield,shape=crop))+
geom_line(aes(x=doy,y=smooth,linetype=crop))+
labs(y = "Biomass Yield", x = "DOY", color = "")+
theme_minimal()
# low_plot
combined_plot <- low_plot + hi_plot
combined_plot + plot_layout(guides = "collect") + plot_annotation(tag_levels = "A")
@online{calbert2024,
author = {Calbert, Madison},
title = {Non-Linear {Least} {Squares}},
date = {2024-03-03},
url = {https://madicalbert.github.io/posts/2024-03-03-nonlinear-least-squares/},
langid = {en}
}