Generative Modeling of Land Cover Data for Spatial Simulation
Christopher Krapu, Mark Borsuk, Ryan Calder
Land use/land cover (LULC) data are used in climate change studies to understand how urbanization and agricultural conversion impact carbon emissions. Quantifying the impact of land use decisions is challenging because quantities of interest such as carbon storage depend in complex ways on spatial land cover patterns. We use generative models to produce rich counterfactual LULC simulations. Specifically, we explore an autoregressive decoder-only model as a strategy for modeling spatially resolved land cover data. We estimate the information content of gridded land cover data at 0.8 bits/pixel and use this statistic to size and construct an order-agnostic GPT-based model trained on approximately 1 billion tokens derived from the National Land Cover Dataset. The model contains 13.6 million parameters and is trained to perform both unconditional and conditional land cover simulation. When tasked with inpainting 25% of the pixels in the image interior, the model achieves a log loss of 1.04 and a pixel-wise accuracy of 0.61. We further show how this generative model can be used to assess the probability of urbanization in a case study at Fort Hood, Texas.
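The two inpainting metrics quoted above, log loss and pixel-wise accuracy, can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function name `masked_metrics`, the nested-list data layout, and the toy 2x2 grid with three classes are all hypothetical, and the use of the natural logarithm is an assumption, since the abstract does not state the units of the reported log loss. Only pixels flagged in the mask (the ones the model had to fill in) are scored.

```python
import math

def masked_metrics(probs, labels, mask):
    """Mean log loss (natural log, an assumption) and accuracy over masked pixels.

    probs  : rows of per-pixel class-probability lists
    labels : rows of integer class labels (the held-out truth)
    mask   : rows of booleans, True where the pixel was hidden and is scored
    """
    total_nll, correct, n = 0.0, 0, 0
    for prob_row, label_row, mask_row in zip(probs, labels, mask):
        for p, y, m in zip(prob_row, label_row, mask_row):
            if not m:
                continue  # conditioning pixels are not scored
            # Negative log-probability of the true class; clamp to avoid log(0).
            total_nll += -math.log(max(p[y], 1e-12))
            # Predicted class is the argmax (first index wins ties).
            pred = max(range(len(p)), key=p.__getitem__)
            correct += int(pred == y)
            n += 1
    if n == 0:
        raise ValueError("mask selects no pixels")
    return total_nll / n, correct / n

# Hypothetical toy example: a 2x2 grid, 3 land cover classes, 3 masked pixels.
probs = [[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
         [[0.3, 0.3, 0.4], [0.5, 0.25, 0.25]]]
labels = [[0, 1], [0, 2]]
mask = [[True, False], [True, True]]

loss, acc = masked_metrics(probs, labels, mask)
loss_bits = loss / math.log(2)  # same loss expressed in bits per pixel
```

Dividing a natural-log loss by ln 2 expresses it in bits per pixel, which puts it on the same scale as an information-content estimate such as the 0.8 bits/pixel figure. Whether the two reported numbers are directly comparable depends on the units and masking setup used in the paper.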