Papers

Black-box Adversarial Attacks with Limited Queries and Information

Andrew Ilyas*, Logan Engstrom*, Anish Athalye*, Jessy Lin*

https://arxiv.org/abs/1804.08598

Abstract

Current neural network-based classifiers are susceptible to adversarial examples even in the black-box setting, where the attacker only has query access to the model. In practice, the threat model for real-world systems is often more restrictive than the typical black-box model of full query access. We define three realistic threat models that more accurately characterize many real-world classifiers: the query-limited setting, the partial-information setting, and the label-only setting. We develop new attacks that fool classifiers under these more restrictive threat models, where previous methods would be impractical or ineffective. We demonstrate that our methods are effective against an ImageNet classifier under our proposed threat models. We also demonstrate a targeted black-box attack against a commercial classifier, overcoming the challenges of limited query access, partial information, and other practical issues to break the Google Cloud Vision API.

@inproceedings{ilyas2018blackbox,
  author = {Andrew Ilyas and Logan Engstrom and Anish Athalye and Jessy Lin},
  title = {Black-box Adversarial Attacks with Limited Queries and Information},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning, {ICML} 2018},
  year = {2018},
  month = jul,
  url = {https://arxiv.org/abs/1804.08598},
}
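
The query-limited attack in this paper estimates gradients from a bounded number of score queries using natural evolution strategies (NES) and then takes projected gradient steps. The sketch below is a minimal illustration of that idea, not the authors' implementation; query_loss stands in for a hypothetical black-box oracle returning the target-class loss for an image, and all parameter values are placeholders.

import numpy as np

def nes_gradient(query_loss, x, sigma=0.001, n_samples=50):
    # Estimate the gradient of a black-box loss at x with antithetic
    # Gaussian sampling: each sampled direction costs two score queries.
    grad = np.zeros_like(x)
    for _ in range(n_samples // 2):
        u = np.random.randn(*x.shape)
        grad += u * query_loss(x + sigma * u)
        grad -= u * query_loss(x - sigma * u)
    return grad / (n_samples * sigma)

def query_limited_attack(query_loss, x_orig, eps=0.05, step=0.01, n_iters=100):
    # Projected gradient descent on the estimated gradient, staying inside
    # an L-infinity ball of radius eps around the original image in [0, 1].
    x = x_orig.copy()
    for _ in range(n_iters):
        g = nes_gradient(query_loss, x)
        x = x - step * np.sign(g)                   # descend the target-class loss
        x = np.clip(x, x_orig - eps, x_orig + eps)  # project back into the eps-ball
        x = np.clip(x, 0.0, 1.0)                    # keep a valid image
    return x

The paper's partial-information and label-only attacks build on the same estimator with additional machinery (for example, starting from an instance of the target class), which this sketch omits.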

Synthesizing Robust Adversarial Examples

Anish Athalye*, Logan Engstrom*, Andrew Ilyas*, Kevin Kwok

https://arxiv.org/abs/1707.07397

Abstract

Neural network-based classifiers parallel or exceed human-level accuracy on many common tasks and are used in practical systems. Yet, neural networks are susceptible to adversarial examples, carefully perturbed inputs that cause networks to misbehave in arbitrarily chosen ways. When generated with standard methods, these examples do not consistently fool a classifier in the physical world due to viewpoint shifts, camera noise, and other natural transformations. Adversarial examples generated using standard techniques require complete control over direct input to the classifier, which is impossible in many real-world systems.

We introduce the first method for constructing real-world 3D objects that consistently fool a neural network across a wide distribution of angles and viewpoints. We present a general-purpose algorithm for generating adversarial examples that are robust across any chosen distribution of transformations. We demonstrate its application in two dimensions, producing adversarial images that are robust to noise, distortion, and affine transformation. Finally, we apply the algorithm to produce arbitrary physical 3D-printed adversarial objects, demonstrating that our approach works end-to-end in the real world. Our results show that adversarial examples are a practical concern for real-world systems.

@inproceedings{athalye2018synthesizing,
  author = {Anish Athalye and Logan Engstrom and Andrew Ilyas and Kevin Kwok},
  title = {Synthesizing Robust Adversarial Examples},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning, {ICML} 2018},
  year = {2018},
  month = jul,
  url = {https://arxiv.org/abs/1707.07397},
}
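
The general-purpose algorithm referred to in the abstract is Expectation Over Transformation (EOT): instead of optimizing the adversarial loss for a single rendering of the input, it optimizes the expected loss over a chosen distribution of transformations. The sketch below is a minimal white-box illustration with a toy transformation distribution (random brightness scaling plus additive noise), not the paper's implementation; loss_grad is a hypothetical function returning the gradient of the target-class loss with respect to its input.

import numpy as np

def eot_step(x, x_orig, loss_grad, step=0.01, eps=0.05, n_samples=30):
    # One gradient step on a Monte Carlo estimate of the expected loss over
    # a simple transformation distribution: t(x) = a * x + noise.
    g = np.zeros_like(x)
    for _ in range(n_samples):
        a = np.random.uniform(0.8, 1.2)               # random brightness scale
        noise = np.random.normal(0.0, 0.02, x.shape)  # sensor-like noise
        t_x = a * x + noise                           # sampled transformation of x
        g += a * loss_grad(t_x)                       # chain rule: d t(x)/dx = a
    g /= n_samples
    x = x - step * np.sign(g)                         # descend the expected target loss
    x = np.clip(x, x_orig - eps, x_orig + eps)        # stay close to the original
    return np.clip(x, 0.0, 1.0)

The paper applies the same expectation structure with much richer transformations, such as rendering a texture onto a 3D model under random pose, lighting, and camera noise; this sketch only conveys the core expectation-over-transformations idea.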