Schistosomiasis, a serious parasitic disease caused by trematode worms of the genus Schistosoma, not only harms health but also holds back economic and social development in endemic areas. According to the World Health Organization (WHO), the disease is prevalent in 78 tropical and subtropical countries, where it affects the lives of more than 700 million people, with at least 240 million actually infected. Schistosomiasis thus remains a major public health hazard in low- and middle-income countries.
Controlling the disease
Schistosomiasis transmission relies on the presence of freshwater snails. Schistosomiasis japonica, once endemic along the Yangtze River in China, is closely associated with the distribution of the snail Oncomelania hupensis. After seven decades of continuous effort, based mainly on snail control and treatment with the drug praziquantel, transmission interruption has been achieved in nine of the 12 previously endemic provinces, and prevalence is at the lowest level ever recorded.
China is now working to eliminate schistosomiasis, with a particular focus on monitoring snail control. A national snail survey, conducted every year in spring and fall, is the main control effort, but it is both labor-intensive and costly.
Effective spatial reassessment
The purpose of the snail survey is to locate snail habitats and to predict areas at high risk of schistosomiasis transmission. Because these surveys are so large, and are supplemented by information from pilot sites, the resulting snail database could be used to predict the probability of snail survival across the Yangtze River Basin.
We started from a simple idea: Do more survey sites mean more accurate predictions for snail probability?
We hypothesized that the original snail distribution data could be resampled to produce a new set of distribution points, and that these new data could then be used to predict snail survival probabilities across the study regions. Finally, prediction performance could be compared across the resampled datasets to select the best one.
This study first developed a spatial reassessment process based on ecological grid cells. Within each cell, the environment for snail survival is treated as the same. If two or more snail sites fell within the same grid cell, we kept only one of them.
This spatial reassessment process gave us a new sample of snail data, which we then used to model the transmission areas of schistosomiasis.
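The one-point-per-cell rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's actual procedure: it assumes site coordinates are already projected into kilometres, uses a plain square grid in place of the ecological grid cell, and keeps the first site encountered in each cell (the text does not say which site is retained). The names thin_sites and detection_rate and the five example sites are hypothetical.

```python
import math

def thin_sites(sites, cell_km):
    """Keep one site per square grid cell with side length cell_km.

    sites: iterable of (x_km, y_km, has_snails) tuples in projected
    coordinates (assumption). The first site seen in a cell is kept.
    """
    kept = {}
    for x_km, y_km, has_snails in sites:
        # Integer index of the grid cell that contains this site.
        cell = (math.floor(x_km / cell_km), math.floor(y_km / cell_km))
        kept.setdefault(cell, (x_km, y_km, has_snails))
    return list(kept.values())

def detection_rate(sites):
    """Share of sites where live snails were found."""
    return sum(1 for _, _, has_snails in sites if has_snails) / len(sites)

# Made-up example sites, for illustration only (not survey data).
sites = [
    (1.0, 1.0, True),
    (2.0, 3.0, False),    # same 5 km cell as the first site
    (7.0, 1.0, True),
    (12.0, 14.0, False),
    (13.0, 11.0, True),   # same 5 km cell as the previous site
]

thinned = thin_sites(sites, cell_km=5.0)
print(len(sites), "sites before,", len(thinned), "after thinning")
print("detection rate after thinning:", round(detection_rate(thinned), 3))
```

Calling thin_sites with a larger cell_km merges more sites into each cell, which is why the number of retained sites falls as the grid distance grows.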
In 2018, a total of 2369 sites across the Yangtze River Basin were sampled, of which 1061 contained live snails (a detection rate of 0.448). We set the grid cell distance to 5 km, 10 km, 50 km, 100 km, and 150 km. Resampling left 1747, 1421, 209, 98, and 44 snail sites, respectively, with snail detection rates of 0.462, 0.471, 0.449, 0.469, and 0.477.
As the grid cells increased in size, each ecological zone grew and the number of resampled snail sites fell, but the snail detection rate remained stable, as shown in the picture below.
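The figures above can be checked with a little arithmetic. The short Python sketch below recomputes the original detection rate from the reported counts and, for each grid distance, shows the share of sites retained and how far the reported detection rate moves from the original. All numbers are taken from the text; the variable names are illustrative.

```python
total_sites = 2369      # sites sampled in 2018
positive_sites = 1061   # sites with live snails

original_rate = positive_sites / total_sites

# Grid distance (km) -> (resampled sites, reported detection rate).
resampled = {
    5: (1747, 0.462),
    10: (1421, 0.471),
    50: (209, 0.449),
    100: (98, 0.469),
    150: (44, 0.477),
}

print(f"original detection rate: {original_rate:.3f}")
for cell_km, (n_sites, rate) in resampled.items():
    retained = n_sites / total_sites   # share of original sites kept
    shift = rate - original_rate       # change in detection rate
    print(f"{cell_km:>3} km: {n_sites:>4} sites "
          f"({retained:.1%} retained), rate shift {shift:+.3f}")

# Largest departure from the original detection rate across all grids.
max_shift = max(abs(rate - original_rate) for _, rate in resampled.values())
```

The largest shift stays below 0.03 even when only 44 sites remain, which is the stability described in the text.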
Machine learning application
After resampling the snail datasets, we combined them with environmental and ecological variables (elevation, distance to water, temperature, and rainfall) to predict which areas are suitable for snail survival.
The predictions are based on the random forest machine learning algorithm, which can make more precise predictions and also lets us compare model performance across the different spatial reassessments of the snail data.
We found that with a grid cell distance of 5 km, the model performs as well as it does with the original snail survey sites (see figure below).
Progress for the future
Progress has been made in establishing a surveillance tool for routine schistosomiasis work. With a 5 km ecological snail-survey zone, we obtained the same results as the prediction based on the original data while decreasing the number of snail survey sites from 2369 to 1747. Machine learning is a useful modelling technique for predicting the risk of schistosomiasis transmission. Together, machine learning and spatial resampling can reduce the investigation workload and improve disease monitoring systems in public health.