How many of us love animals and think of our pets as part of the family? Well, what if I told you that among the 6-8 million animals that enter the rescue shelters every year, nearly 3-4 million (i.e., 50% of the incoming animals) are euthanized. Even more heartbreaking is that 10% – 25% of them are put to death specifically because of shelter overcrowding each year.
The problem of overpopulation of domestic animals continues to rise, leaving shelters faced with the challenge of how to increase adoption rates. Though animal shelters provide incentives such as reduced adoption fees and sterilizing animals before adoption, only a quarter of total animals living in the shelter are adopted.
Among the 6-8 million animals that enter rescue shelters every year, nearly 3-4 million are euthanized
These staggering statistics led us to investigate the length of stay of animals at shelters and the factors influencing the rate of animal adoption. The overarching goal of this study was to use these factors to predict and then minimize how long an animal will stay in a shelter, thereby decreasing the number of animals euthanized due to overcrowding. Several steps must be conducted to accomplish this goal, such as a literature search for the factors, collection of data from databases and animal shelters, and utilizing machine learning algorithms on this data to make predictions on length of stay in the shelters.
To answer the question of what factors influence the length of stay, a thorough literature review was conducted. Several factors were found to influence the length of stay including color, gender, breed, animal type, and age. To make the predictions about the length of stay using these factors, we evaluated using machine learning algorithms and predictive analytics.
Machine learning is just the ability to program computers to learn and improve by itself using training experience. The developed system needs to analyze big data, quickly deliver accurate and repeatable results, and adapt to new data. A system can be trained to make accurate predictions by learning from examples of desired input-output data. In other words, we wanted to utilize a labeled data set with the output (length of stay) already known, so that the computer could learn from it. The next step was to obtain this data from databases and animal shelters across the country.
The data that was collected from the databases and animal shelters included information such as animal type, intake and outcome date, gender, color, breed, and intake and outcome status (behavior of animal entering the shelter and behavior of animal at outcome type). These data sets included information from mostly southern and southwestern states. For the length of stay, the categories included “low”, “medium”, “high”, and “very high” (euthanized). Once the data was collected and cleaned, it was time to input it into the machine learning algorithms.
There are so many different types of algorithms that can be used on a data set to make predictions. The hard part is determining which algorithm will perform the best on the given data set, as the performance of the models depends on the application. Simple classification algorithms such as logistic regression, artificial neural network, gradient boosting, and random forest were used in this study.
Examining the results, the most proficient predictive model was developed by the gradient boosting algorithm for this dataset, followed by the random forest algorithm. The logistic regression algorithm appeared to have the worst performance metrics for all categories of length of stay. What was interesting was that the gradient boosting and random forest algorithms performed well when predicting the very high length of stay or when the outcome was euthanization at around 70-80%.
Looking at the results from the exploratory data above, it was observed that the number of days a dog stays in the shelter decreases as the age increases. This was not expected, as it is predicted that the number of days in a shelter would be lower for younger dogs and puppies. This observation could be due to having more data points for younger dogs.
Results showed that age, size, and color have a significant impact or influence on the length of stay.
Another interesting result from the study was the top features or factors from each machine learning algorithm. Results showed that age (senior, super senior, and puppy), size (large and small), and color (multicolor) have a significant impact or influence on the length of stay.
For future studies, a prescriptive analytics approach will be utilized. Not only is our goal to increase adoption rates of pets in animal shelters, but to also determine the optimal animal shelter location where the animal will have the least amount stay in a shelter and most likely be adopted.