Chapter 5: Conclusion

Chapter 5: Conclusion

We conclude this thesis by outlining our agronomic and computational contributions, and listing some limitations of the presented approach. We then outline some avenues for future research in this area.

5.1 Contributions

Agronomic Contributions

For the agronomic part, we determine performance‐critical thresholds of THI on dairy performance for six dairy cow breeds within Switzerland. This includes the non‐linear marginal effects of THI on milk and ECM yield for these breeds. The THI turning points begin as early as the high forties for primiparous cows and in the low fifties for multiparous cows, contingent upon the breed and performance variable. Prior research indicates a greater heat‐resilience in primiparous cows relative to multiparous cows; however, our findings suggest the contrary. Next, we note varying non‐linear marginal effects of THI on different breeds in terms of volumetric milk yield. Yet, these marginal effects align to very similar curves when the volume is adjusted with fat and protein to the ECM yield. Moreover, when the data is divided into two periods, 1982 to 2010 and 2011 to 2023, we observe that the THI turning point is generally lower in the more recent period. Furthermore, while not explicitly addressed in our research question, the incorporation of days in milk as a non‐linear function enables the derivation of average lactation curves for each breed and facilitates the identification of lactation peaks utilizing the same methodology as a secondary outcome. These are not fully assessed in this work; however, they imply that lactation peaks may occur earlier than traditionally assumed.

To the best of our knowledge, even when subsampling, our investigation offers unprecedented granularity across geospatial and temporal dimensions. Furthermore, the data‐driven smooth‐term strategy utilizing GAMs isolates the THI effect while accounting for geospatial, locational, animal, and temporal factors with sparse yet extensive production data, greatly simplifying the methodology by requiring minimal initial assumptions about non‐linearity of THI and the lactation curve. Generally, our research may be regarded as an advancement of Citation: et al., , , , & (). Quantifying the effect of thermal environment on production traits in three breeds of dairy cattle in New Zealand. , 50(3). 327–338. https://doi.org/10.1080/00288230709510301  , employing more recent computational and mathematical innovations, chiefly supported by Citation: , (). Generalized Additive Models: An Introduction with R, Second Edition (2). Chapman; Hall/CRC. https://doi.org/10.1201/9781315370279 .

Implications for Swiss Agriculture

Our results suggest that the thermoneutral zone for Swiss dairy cows is at a lower THI range than expected. When translated to a temporal context, this suggests that dairy cows are adversely affected by heat stress as early as March and continuing through to the late autumn months. These findings underscore the critical importance of well‐chosen farm management strategies, including housing and feeding practices, to mitigate milk yield reductions and maintain animal welfare. Although further validation is necessary, the results for distinct periods suggest diminished heat tolerance in today’s dairy cows. At present, increasing heat resilience is not a primary objective in breeding programs. However, given the anticipated rise in the frequency of hot days, such a breeding objective may be needed, depending on the progress in breeding for milk yield in the upcoming years.

Computational Contributions

For the computational part, arising from the challenges encountered with our sparse data structure as depicted in Figure 3.1 and the targeted estimation strategy delineated in Section 3.2.4, a bottleneck in the mgcv GAM fitting routines is identified. As a remedy, we develop two prototype extensions for gamm4 in order to estimate Generalized Additive Mixed Models with a larger number of samples and random effect factor levels than previously achievable. gamm4b represents a version with bug fixes and performance enhancements based on R. gammJ facilitates the estimation of GAMMs using a modified gamm4 along with a bridge to an adapted version of MixedModels.jl serving as the foundational estimation engine, complemented by a performance‐optimized data covariance matrix decomposition for post‐processing operations.

5.2 Limitations

High-Producing vs Low-Producing Cows

The principal objective of our work is to determine the national average impact of the THI on dairy performance across different breeds. High‐producing dairy cows are more susceptible to heat stress than their low‐producing counterparts Citation: et al., , , & (). Heat stress in lactating dairy cows: A review. , 77(1). 59–91. https://doi.org/10.1016/S0301-6226(01)00330-X ; Citation: et al., , & (). Seasonal heat stress: Clinical implications and hormone treatments for the fertility of dairy cows. , 84(5). 659–666. https://doi.org/10.1016/j.theriogenology.2015.04.021 . This particular aspect has not been incorporated into our current models and therefore, the consideration of production levels within our study remains incomplete. The unusually low peak THI values, alongside the residual patterns in diagnostic plots of the models, suggest that integrating this aspect could potentially enhance the model quality and provide a more lucid understanding of the peak THI values.

Single-Breed Models

As elucidated in Section 3.4.4.4, the principal limitation of gammJ is not in the estimation of models that incorporate millions of samples and numerous random effect factor levels, but rather in the calculation of the data covariance matrix V \mathbf{V} . This constraint hinders our capacity to estimate a multi‐breed model as proposed in Equation 3.2.3, as we foresee the requirement for at least several hundred thousand samples per breed. This conclusion is derived from model evaluations involving Jersey experiments as documented in Table 3.5. While single‐breed models allow for the estimation and visual examination of the THI effects on dairy performance across diverse breeds, they are insufficient for determining the statistical significance of the differences observed between these breeds. In a multi‐breed model, wherein the effect of THI is interacted with the breed, an estimate of the statistical significance of each smooth is obtainable, and moreover, they can be visually compared safely as they originate from the same model. Furthermore, differences between smooth terms can be computationally discerned Citation: , (). Comparing smooths in factor-smooth interactions i. https://fromthebottomoftheheap.net/2017/10/11/difference-splines-i/. ; Citation: , (). Comparing smooths in factor-smooth interactions II. https://fromthebottomoftheheap.net/2017/12/14/difference-splines-ii/. . When comparing smooth terms of distinct single‐breed models, a robust comparison is feasible only if one can substantiate that all terms in a corresponding multi‐breed model are interacted by breed. The justification for such an interaction for all model terms is partially addressed in Section 3.2.3, but the general question is still open.

Subsampling

Section 4.1 and the corresponding results demonstrate the influence of farm and sampling structure on the THI diversity within our subsamples. The central argument underpinning the subsampling strategy outlined in Chapter 4 is the implicit maintenance of farm structure within the model. Consequently, given the imposed limitation on the maximum number of samples due to computational constraints, it becomes pertinent to explore whether the current subsampling strategy remains suitable, since in scenarios the number of farms merely amounts to several hundred. For example, considerations may be made towards relaxing the assumption regarding farm structure and update the sampling strategy accordingly.

Limited Interpretability of P-Values and Spatial Autocorrelation

The field‐derived data is acquired from operational agricultural environments, with the objective of overseeing breeding advancements and facilitating long‐term analytics. This data is not derived from an experimental design, as elaborated in Section 3.1. Data collection includes repeated measurements for each animal and farm. On designated sampling days, data from all animals within a farm is collected concurrently. The geospatial distribution of farms introduces regional dynamics and topological diversity, which are further influenced by meteorological conditions. Therefore, the samples are not independent and identically distributed. This is why we use a mixed model. We partially control for group, spatial and temporal dynamics with animal α \alpha , farm ϕ \phi and spatio‐temporal ι \iota random effect terms in our models. Nonetheless, the issue of autocorrelation is not addressed in the error terms ϵ \epsilon , which are under iid assumption in a mixed model framework, nor is it considered within the model weights because both lme4 and MixedModels.jl lack implementation of such structures. Consequently, although a larger sample size results in significant p‐values, these p‐values lack interpretative validity. The same applies to confidence intervals. Nevertheless, the reliability of the estimated fixed effect coefficients and smooth terms remains unaffected, as they are derived from a substantial dataset comprising hundreds of thousands of samples. However, it is imperative to exercise utmost caution when interpreting confidence intervals and conducting hypothesis testing.

Farm Location as Fixed Effects

Both, random and fixed effects, are applied to account for unobserved heterogeneity. In this context, Citation: , (). Econometric analysis of cross section and panel data. The MIT Press. Retrieved from http://www.jstor.org/stable/j.ctt5hhcfr  [Chapter 10] examines the assumptions underlying both fixed and random effects models. The assumption associated with random effects suggests that unobserved individual heterogeneity remains uncorrelated with the covariates. In contrast, the fixed effects assumption allows for the possibility that these individual‐specific effects may be correlated with the explanatory variables. Drawing from this econometric perspective, it can be asserted that the farm location random effect ϕ \phi demonstrates a certain degree of correlation with the THI smooth function f(THI), thereby contravening the assumption of random effects. Consequently, it may be advocated that the unobserved heterogeneity associated with farm location should be represented through fixed effects. Notably, in Citation: et al., , , , & (). Quantifying the effect of thermal environment on production traits in three breeds of dairy cattle in New Zealand. , 50(3). 327–338. https://doi.org/10.1080/00288230709510301  , individual herds are modeled as fixed effects. A qualitative argument posits that constraining the individual farm coefficients by modeling them as random intercepts could be overly restrictive. The variability in farming practices and conditions may not be adequately captured by a normal distribution of these coefficients. Conversely, an argument in favor of treating individual farms as random intercepts arises from the limited subset of farms utilized in our subsamples. Despite this, farm coefficients are modeled as random effects in all our experiments for a practical reason: Most mixed model libraries do not implement the fixed effect matrix as sparse matrices, which would be a requirement given the number of farms in our dataset. MixedModels.jl does indeed provide support for sparse fixed effect matrices. However, due to time constraints, models where individual farms are treated as fixed effect coefficients have not been tested. Nevertheless, this debate surrounding farms as fixed effects requires more attention.

Additional Weather Variables as Control

Our models do not control for additional weather variables such as radiation, sunshine duration, or precipitation. As indicated in Chapter 2, these factors similarly influence a dairy cow’s heat management. Consequently, the effect of THI not be completely isolated. Therefore, it could be advantageous to conduct experiments that control for these variables, notwithstanding the possibility that they might confound with temperature and relative humidity. An example study which applies this methodology for test‐day milk samples is Citation: et al., , & (). Temperature, productivity, and heat tolerance: Evidence from Swedish dairy production. , 175(1-2). 10. https://doi.org/10.1007/s10584-022-03461-5 .

5.3 Future Research

Finally, we outline seven opportunities for extension of this work into multiple directions.

First, the preceding section underscores the necessity to conduct further experimentation in model selection and subsampling strategies. This involves a meticulous examination of whether single breed models are adequate to facilitate a valid comparison among breeds.

Second, it is essential to address the distinction between high‐ and low‐producing cows. In this context, data splits as straightforward as milk yield above μ+σ2 \mu + \frac{\sigma}{2} for high‐producing and below μσ2 \mu - \frac{\sigma}{2} for low‐producing cows could serve as an initial framework.

Third, other dimensions of model selection merit attention, including the decision between random effects and fixed effects for farm location, additional controls for weather, as well as considerations of autocorrelation. Generally, model selection involving mixed models requires the use of maximum likelihood estimation ML instead of restricted maximum likelihood REML to ensure their comparability with model selection criteria such as the Akaike Information Criterion Citation: , (). Model selection and Akaike’s Information Criterion (AIC): The general theory and its analytical extensions. , 52(3). 345–370. https://doi.org/10.1007/BF02294361 . To date, our extension from Section 3.4.4 has only been tested with REML, as a comprehensive model selection process has not yet been pursued. Our parameter investigation has prioritized the appropriate determination of degrees of freedom for the smooth terms, for which we used estimated degrees of freedom from mgcv as a selection criterion. In addressing autocorrelation aspects with GAMMs, spatial+ by Citation: et al., , & (). Spatial+: A novel approach to spatial confounding. arXiv. https://doi.org/10.48550/arXiv.2009.09420   might be considered. Additionally, incorporating spatial coordinates such as longitude and latitude, as well as the day of the year for temporal aspects, as smooth terms could potentially address autocorrelation concerns. Alternatively, Citation: , (). GLMM FAQ. https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#spatial-and-temporal-correlation-models-heteroscedasticity-r-side-models.   posits that the integration of spatial correlation functions into lme4 is not an immediate priority, but should be reportedly straightforward to implement in the context of linear mixed models. It is reasonable to infer that this assertion extends to the MixedModels.jl library as well.

Fourth, the effect of additional dairy performance variables from THI could be investigated, such as lactose and somatic cell count. Both of these are relevant for quality and health aspects. Moreover, the component yields such as fat and protein could be analyzed individually, in addition to the energy‐corrected milk (ECM) yield to better understand which and how much they contribute to the observed phenomenon of aligned ECM curves for individual breeds.

Fifth, our models do not include interactions between THI and days in milk. Given that GAMs can incorporate multi‐dimensional smooth terms, exploring a two‐dimensional smooth function f(THI,DIM) f(THI, DIM) could enhance both the interpretability and representativeness of the modelling.

Sixth, the models employed in this study do not include the weather exposure experienced by dairy cows prior to parturition. Additionally, they do not consider breeding ancestry or supplementary physiological data available. The data furnished by the three associations, in conjunction with the available meteorological data, would facilitate such expansions.

Seventh, the library mgcv developed by Simon Wood demonstrates high computational efficiency for generalized additive models (GAMs) when applied to datasets characterized by a large number of samples but limited in the diversity of factor levels. The methodologies we have formulated in Section 3.4.4 partially bridge this gap. Nevertheless, further empirical validation and optimizations are necessary to make these innovations ready for broader use in academic and industrial applications.

5.4 Bibliography

Ahmed, Tamminen & Emanuelson (2022)
, & (). Temperature, productivity, and heat tolerance: Evidence from Swedish dairy production. , 175(1-2). 10. https://doi.org/10.1007/s10584-022-03461-5
Bolker (2024)
(). GLMM FAQ. https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#spatial-and-temporal-correlation-models-heteroscedasticity-r-side-models.
Bozdogan (1987)
(). Model selection and Akaike’s Information Criterion (AIC): The general theory and its analytical extensions. , 52(3). 345–370. https://doi.org/10.1007/BF02294361
Bryant, López‐Villalobos, Pryce, Holmes & Johnson (2007)
, , , & (). Quantifying the effect of thermal environment on production traits in three breeds of dairy cattle in New Zealand. , 50(3). 327–338. https://doi.org/10.1080/00288230709510301
De Rensis, Garcia-Ispierto & López-Gatius (2015)
, & (). Seasonal heat stress: Clinical implications and hormone treatments for the fertility of dairy cows. , 84(5). 659–666. https://doi.org/10.1016/j.theriogenology.2015.04.021
Dupont, Wood & Augustin (2020)
, & (). Spatial+: A novel approach to spatial confounding. arXiv. https://doi.org/10.48550/arXiv.2009.09420
Simpson (2017)
(). Comparing smooths in factor-smooth interactions i. https://fromthebottomoftheheap.net/2017/10/11/difference-splines-i/.
Simpson (2017)
(). Comparing smooths in factor-smooth interactions II. https://fromthebottomoftheheap.net/2017/12/14/difference-splines-ii/.
Kadzere, Murphy, Silanikove & Maltz (2002)
, , & (). Heat stress in lactating dairy cows: A review. , 77(1). 59–91. https://doi.org/10.1016/S0301-6226(01)00330-X
Wood (2017)
(). Generalized Additive Models: An Introduction with R, Second Edition (2). Chapman; Hall/CRC. https://doi.org/10.1201/9781315370279
Wooldridge (2010)
(). Econometric analysis of cross section and panel data. The MIT Press. Retrieved from http://www.jstor.org/stable/j.ctt5hhcfr