A Hybrid Ensemble method for Pulsar Candidate Classification. (arXiv:1909.05303v1 [astro-ph.IM])
Yuanchao Wang, Zhichen Pan, Jianhua Zheng, Lei Qian, Mingtao Li

In this paper, three ensemble methods (Random Forest, XGBoost, and a Hybrid
Ensemble method) were implemented to classify imbalanced pulsar candidates. To
support these methods, tree models were used to select features from 30
pulsar-candidate features drawn from the literature. The skewness of the
integrated pulse profile, the chi-squared value of the sine-squared fit to the
amended profile, and the best S/N value play important roles in Random Forest,
while the skewness of the integrated pulse profile is one of the most
significant features in XGBoost. More than 20 features were selected according
to their relative scores and then used in all three ensemble methods. In the
Hybrid Ensemble method, we combined Random Forest and XGBoost with
EasyEnsemble. By adjusting the decision threshold, we traded off Recall against
Precision so that the two are approximately equal and as high as possible.
Experiments on the HTRU 1 and HTRU 2 datasets show that the Hybrid Ensemble
method achieves higher Recall than the other two algorithms. On the HTRU 1
dataset, the Recall, Precision, and F-Score of the Hybrid Ensemble method are
$0.967$, $0.971$, and $0.969$, respectively. On the HTRU 2 dataset, the
corresponding values are $0.920$, $0.917$, and $0.918$.
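The abstract does not spell out how features were selected by their relative scores. One plausible reading, sketched here as an assumption rather than the authors' rule, is to normalise each tree-model importance score by the largest score and keep the features above a cutoff. The function name, the `min_relative` cutoff, and the feature names and scores in the usage lines are hypothetical; in practice the scores could come from the `feature_importances_` attribute that scikit-learn and XGBoost tree ensembles expose.

```python
def select_by_relative_score(importances, min_relative=0.1):
    """Keep features whose score is at least `min_relative` times the top score.

    `importances` maps feature name -> importance score (e.g. from a tree model).
    Returns the kept feature names, highest score first.
    """
    if not importances:
        return []
    top = max(importances.values())
    if top <= 0:
        return []
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score / top >= min_relative]


# Toy scores for illustration only (not values from the paper).
toy_scores = {"profile_skewness": 0.30, "sine_fit_chi2": 0.20,
              "best_snr": 0.15, "weak_feature": 0.01}
print(select_by_relative_score(toy_scores))
# ['profile_skewness', 'sine_fit_chi2', 'best_snr']
```

Normalising by the maximum makes the cutoff independent of how a given library scales its importances, so the same `min_relative` can be applied to both Random Forest and XGBoost rankings.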
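EasyEnsemble handles class imbalance by drawing several balanced subsets, each containing all minority-class samples plus an equally sized random sample of the majority class, training a learner on each subset, and combining their outputs. The sketch below shows one way a hybrid of this kind could be wired up. It assumes positives (pulsars) are the minority class and that the ensemble averages the predicted pulsar probabilities, which may differ from the paper's combination rule. `centroid_learner` is a hypothetical toy stand-in so the example runs with the standard library alone; in the paper's setting the `learners` list would instead hold factories wrapping Random Forest and XGBoost models, both of which provide `predict_proba`.

```python
import random
from statistics import mean


def easy_ensemble_subsets(y, n_subsets, seed=0):
    """Return index lists, each with every positive plus as many random negatives.

    Assumes label 1 (pulsar) is the minority class and label 0 the majority.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    return [pos + rng.sample(neg, len(pos)) for _ in range(n_subsets)]


def fit_hybrid(X, y, learners, n_subsets=5, seed=0):
    """Train each learner factory on each balanced subset; average their scores.

    A factory takes (X_sub, y_sub) and returns a function x -> P(pulsar | x).
    """
    models = []
    for idx in easy_ensemble_subsets(y, n_subsets, seed):
        X_sub = [X[i] for i in idx]
        y_sub = [y[i] for i in idx]
        for make in learners:
            models.append(make(X_sub, y_sub))
    return lambda x: mean(model(x) for model in models)


def centroid_learner(X, y):
    """Hypothetical toy learner: scores x by its closeness to the positive centroid."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_pos, c_neg = centroid(1), centroid(0)

    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos)) ** 0.5
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg)) ** 0.5
        return d_neg / (d_pos + d_neg) if d_pos + d_neg > 0 else 0.5

    return score


# Imbalanced toy data: eight negatives near 0, two positives near 5.
X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [5.0], [5.2]]
y = [0] * 8 + [1] * 2
predict = fit_hybrid(X, y, [centroid_learner], n_subsets=3)
print(predict([5.1]), predict([0.2]))
```

Averaging probabilities is only one combination choice; majority voting, or weighting the Random Forest and XGBoost members differently, would fit the same structure.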
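The threshold adjustment described above aims to make Recall and Precision approximately equal while keeping both high. A minimal sketch of such a sweep, assuming the classifier outputs one pulsar score per candidate: try each observed score as the threshold and keep the one with the smallest gap between Precision and Recall, breaking ties by the higher F-score. The function names and the toy scores are hypothetical, and the paper's exact search procedure may differ.

```python
def precision_recall(scores, labels, threshold):
    """Precision and Recall when candidates with score >= threshold are called pulsars."""
    tp = sum(1 for s, t in zip(scores, labels) if s >= threshold and t == 1)
    fp = sum(1 for s, t in zip(scores, labels) if s >= threshold and t == 0)
    fn = sum(1 for s, t in zip(scores, labels) if s < threshold and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def balanced_threshold(scores, labels):
    """Pick the threshold minimising |Precision - Recall|; ties go to higher F-score."""
    best = None
    for thr in sorted(set(scores)):
        p, r = precision_recall(scores, labels, thr)
        f = 2 * p * r / (p + r) if p + r else 0.0
        key = (abs(p - r), -f)
        if best is None or key < best[0]:
            best = (key, thr, p, r)
    return best[1], best[2], best[3]


# Toy scores and labels for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(balanced_threshold(scores, labels))  # (0.6, 0.75, 0.75)
```

Lowering the threshold raises Recall at the cost of Precision and vice versa, so the balanced point sits where the two curves cross.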
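Assuming the F-Score is the usual F1 score, the harmonic mean of Precision $P$ and Recall $R$, the reported values are internally consistent:

$$F = \frac{2PR}{P + R}$$

$$F_{\text{HTRU 1}} = \frac{2 \times 0.971 \times 0.967}{0.971 + 0.967} = \frac{1.877914}{1.938} \approx 0.969$$

$$F_{\text{HTRU 2}} = \frac{2 \times 0.917 \times 0.920}{0.917 + 0.920} = \frac{1.68728}{1.837} \approx 0.918$$

When $P = R$ the harmonic mean equals both, so tuning the threshold until Precision and Recall are nearly equal keeps the F-Score close to each of them.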
