SARIMA
This project focuses on predicting dengue outbreaks at the administrative subdistrict level (known as khwaengs) in Bangkok. At such a fine spatial scale, case counts are subject to high stochasticity. Therefore, additional processing and modeling steps are required to extract meaningful signals from noisy data.
Subdistrict Selection and Clustering
We first identify the 55 khwaengs (out of 180 total) that account for 67% of all reported dengue cases in Bangkok. Focusing on these higher-incidence areas helps stabilize the signal and reduces noise.
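As an illustration of this selection step, the sketch below ranks spatial units by total reported cases and keeps the smallest set that reaches a target share. It assumes the selection is a simple cumulative-share cutoff, which the text does not state explicitly. The function name `select_high_incidence` and the toy counts are hypothetical.

```python
import numpy as np

def select_high_incidence(total_cases, target_share=0.67):
    """Return indices of the top-ranked units whose cases reach target_share.

    Assumption: units are ranked by total cases and added until the
    cumulative share of all cases first meets the target.
    """
    total_cases = np.asarray(total_cases, dtype=float)
    order = np.argsort(total_cases)[::-1]              # highest incidence first
    cum_share = np.cumsum(total_cases[order]) / total_cases.sum()
    n_keep = int(np.searchsorted(cum_share, target_share)) + 1
    return order[:min(n_keep, len(order))]

# Toy example with five units (hypothetical counts).
toy_totals = [50, 30, 10, 5, 5]
selected = select_high_incidence(toy_totals, target_share=0.67)
```

In the toy example the two largest units already hold 80% of cases, so only they are retained.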
Next, we apply a mixed clustering approach based on inter-khwaeng correlation statistics to assign these 55 khwaengs to 3 main clusters. By construction, khwaengs in the same cluster are highly correlated with one another, strongly enough that we treat epidemic patterns as consistent within each cluster.
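The exact "mixed" clustering procedure is not specified here. As a stand-in, the sketch below uses plain average-linkage hierarchical clustering on a correlation-based distance (1 minus the Pearson correlation), which is one common way to group time series by co-movement. The synthetic series and all variable names are assumptions for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)

# Hypothetical data: 9 units x 60 months, built from three seasonal drivers
# with different phases, plus independent noise.
month_index = np.arange(60)
phases = [0.0, 2.0, 4.0]
drivers = [np.sin(2 * np.pi * month_index / 12 + p) for p in phases]
series = np.array(
    [drivers[i // 3] + 0.3 * rng.standard_normal(60) for i in range(9)]
)

# Correlation-based distance: highly correlated units are "close".
corr = np.corrcoef(series)
dist = 1.0 - corr
np.fill_diagonal(dist, 0.0)

# Average-linkage hierarchical clustering, cut into 3 clusters.
tree = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(tree, t=3, criterion="maxclust")
```

Units that share a driver end up with the same label, which mirrors the property that within-cluster correlation is higher than between-cluster correlation.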

Modeling Seasonality & Peak Intensity
Within each cluster, we renormalize the time series to analyze seasonal trends independently of epidemic size. We find distinct seasonal signatures across the clusters.
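One simple way to separate seasonal shape from epidemic size is to divide each year's monthly counts by that year's total, so every year sums to 1. The sketch below assumes this particular normalization, which may differ from the one actually used. The function `seasonal_profile` and the example counts are hypothetical.

```python
import numpy as np

def seasonal_profile(monthly_cases):
    """Average share of annual cases falling in each calendar month.

    monthly_cases: array of shape (n_years, 12).
    Assumption: each year is normalized by its own annual total.
    """
    monthly_cases = np.asarray(monthly_cases, dtype=float)
    annual_totals = monthly_cases.sum(axis=1, keepdims=True)
    shares = monthly_cases / annual_totals
    return shares.mean(axis=0)

# Two hypothetical years with the same shape but a tenfold size difference.
small_year = np.array([2, 2, 3, 4, 8, 15, 20, 18, 12, 8, 5, 3])
big_year = 10 * small_year
profile = seasonal_profile(np.vstack([small_year, big_year]))
```

Because both years have the same shape, the resulting profile is identical to the normalized profile of either year alone, regardless of epidemic size.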

To model the upcoming year’s epidemic, we apply a SARIMA model to each cluster’s aggregated case count. While SARIMA captures intra-annual dynamics (i.e., seasonality), it does not capture interannual variability in epidemic size. To recover this, we approximate the distribution of peak sizes with a Poisson distribution estimated from historical data and run Monte Carlo simulations that sample from it.
Disaggregating to Subdistricts
The predicted cluster-level cases are then distributed across individual khwaengs within each cluster. This allocation is based on a Gaussian distribution fitted to the historical relative intensities observed in each khwaeng.
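A minimal sketch of this allocation step is shown below. It assumes that "relative intensity" means each khwaeng's yearly share of its cluster's cases, that a normal distribution is fitted per khwaeng from the sample mean and standard deviation, and that sampled shares are clipped at zero and renormalized. These details, the function `allocate`, and the numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical historical shares of cluster cases:
# rows are years, columns are the khwaengs in one cluster.
hist_shares = np.array([
    [0.50, 0.30, 0.20],
    [0.45, 0.35, 0.20],
    [0.55, 0.25, 0.20],
    [0.50, 0.32, 0.18],
])

# Fit one Gaussian per khwaeng from its historical shares.
mu = hist_shares.mean(axis=0)
sigma = hist_shares.std(axis=0, ddof=1)

def allocate(cluster_cases, mu, sigma, rng):
    """Split a cluster-level count across khwaengs using sampled shares."""
    shares = rng.normal(mu, sigma)
    shares = np.clip(shares, 0.0, None)   # a share cannot be negative
    shares = shares / shares.sum()        # shares must sum to 1
    return cluster_cases * shares

allocation = allocate(120.0, mu, sigma, rng)
```

Renormalizing after sampling ensures the khwaeng-level predictions always add back up to the cluster-level forecast.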
For the remaining 125 khwaengs not selected in the initial step, where case counts are sparse and variability is high, we estimate monthly incidence using Poisson models fitted independently for each calendar month, based on historical averages and seasonality.
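For a single low-incidence khwaeng, the per-month Poisson approach can be sketched as below. The rate for each calendar month is taken as the historical mean for that month, which is the maximum-likelihood estimate for a Poisson rate. The history array is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical history for one khwaeng: 4 years x 12 months of case counts.
history = np.array([
    [0, 0, 1, 1, 2, 4, 5, 4, 2, 1, 0, 0],
    [0, 1, 0, 2, 3, 5, 6, 3, 2, 1, 1, 0],
    [0, 0, 0, 1, 2, 3, 4, 4, 3, 1, 0, 0],
    [1, 0, 1, 1, 3, 4, 7, 5, 2, 2, 0, 0],
])

# One independent Poisson model per calendar month:
# the rate is the historical mean for that month.
monthly_rate = history.mean(axis=0)

# Simulate next year's monthly counts and summarize the spread.
sims = rng.poisson(monthly_rate, size=(5000, 12))
lower, upper = np.percentile(sims, [2.5, 97.5], axis=0)
```

The interval between `lower` and `upper` gives a rough 95% range of plausible monthly counts under these assumptions.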