We consider the problem of detecting and mitigating backdoors (Trojans) planted
in deep neural network classifiers through poisoning of data used for training.
We first describe two methods based on maximum classification margin to detect
and mitigate backdoors. These unsupervised methods are agnostic to the backdoor
pattern and backdoor mechanism, and the detection method does not rely on a
small clean dataset. Variations of the detection method are also described,
including for non-malicious bias. Finally, we describe an agnostic method of
reverse engineering (inversion) of backdoors. These methods are based on the analysis of
embedded features (activation space) and are thus applicable to different
application domains, including non-classification problems. This work is in
collaboration with David J. Miller, Zhen Xiang, Hang Wang, and Xi Li.

Biography:
George Kesidis received his MS (1990, neural networks and stochastic
optimization) and PhD (1992, performance evaluation and networking) in EECS
from UC Berkeley. Following eight years as a professor of ECE at the University
of Waterloo, he has been a professor of EE and CSE at the Pennsylvania State
University since 2000. In the past, his research has been supported by DARPA,
DHS, ONR, AFOSR, over a dozen NSF grants, and eight research gifts from
Cisco. His current research interests include cloud computing, caching, and
secure and robust ML/AI with applications. In 2012, he co-founded a start-up in
the AI/ML area.