USS LAB Project Led by XU Wenyuan Won the Top 10 Academic Advances of Zhejiang University in 2017
The project “Dolphin Attacks - Protecting the Security of Speech Recognition Systems against Microphone Hardware Vulnerabilities”, led by XU Wenyuan and JI Xiaoyu from the Ubiquitous System Security Lab (USS LAB) of the EE College, won the “Top 10 Academic Advances of Zhejiang University in 2017”.
Microphones are widely used in smart devices. Dolphin Attacks exploit the hardware vulnerabilities of microphones to manipulate voice controllable systems with inaudible voice commands. Dolphin Attacks provide a breakthrough in understanding the consequences of hardware vulnerabilities at the software and system levels. The ability to defend against Dolphin Attacks can greatly improve the security of smart devices.
Speech recognition (SR) technologies allow machines or programs to identify spoken words and convert them into machine-readable formats. Speech recognition has become an increasingly popular human-computer interaction mechanism because of its accessibility, efficiency, and recent advances in recognition accuracy. As a result, speech recognition systems have turned a wide variety of systems into voice controllable systems (VCS); e.g., Apple Siri and Google Now allow users to initiate phone calls by voice. Thus, it is important to understand the security of these systems: how speech recognition and voice controllable systems behave under intentional and sneaky attacks.
The contributions of the paper include the following. (1) We design DolphinAttack, which can inject covert voice commands into state-of-the-art speech recognition systems by exploiting inaudible sounds and the hardware vulnerabilities of audio circuits. The attack can inject a sequence of inaudible voice commands and lead to unnoticed security breaches in voice controllable systems. (2) We suggest both hardware-based and software-based defense strategies to mitigate the attacks.
The underlying principle of Dolphin Attacks is device nonlinearity. For instance, amplifiers are known to exhibit nonlinearity. By leveraging the nonlinearity of the microphone circuits, low-frequency audio commands modulated onto inaudible ultrasonic carriers can be successfully demodulated, recovered, and, more importantly, interpreted by the speech recognition systems.

We validate DolphinAttack on popular speech recognition systems, including Siri, Google Now, Samsung S Voice, Huawei HiVoice, Cortana, and Alexa. By injecting a sequence of inaudible voice commands, we show a few proof-of-concept attacks, which include activating Siri to initiate a FaceTime call on an iPhone, activating Google Now to switch the phone to airplane mode, visiting a malicious website, sending fake text messages and emails, and even manipulating the navigation system in an Audi automobile. We believe this list is by far not comprehensive. Nevertheless, it serves as a wake-up call to reconsider what functionality and levels of human interaction should be supported in voice controllable systems.

We propose hardware and software defense solutions to mitigate the attacks, and we provide suggestions to enhance the security of voice controllable systems. We validate that it is feasible to detect DolphinAttack by classifying the audio using a support vector machine (SVM), and suggest re-designing voice controllable systems to be resilient to inaudible voice command attacks.
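The demodulation principle above can be illustrated with a short numerical sketch. This is not the paper's implementation, and the carrier frequency, nonlinearity coefficient, and filter cutoff below are illustrative assumptions: a low-frequency "command" tone is amplitude-modulated onto an ultrasonic carrier, and a small quadratic term in the microphone circuit's response shifts a copy of the command back into the audible band.

```python
import numpy as np

# Hypothetical parameters for illustration only.
fs = 192_000          # simulation sample rate (Hz)
t = np.arange(0, 0.1, 1 / fs)
f_voice = 400         # baseband "command" tone (Hz), audible range
f_carrier = 30_000    # ultrasonic carrier (Hz), inaudible

# Amplitude-modulate the command onto the carrier: the transmitted
# signal has energy only near 30 kHz, so humans hear nothing.
voice = np.cos(2 * np.pi * f_voice * t)
transmitted = (1 + voice) * np.cos(2 * np.pi * f_carrier * t)

# Model a weakly nonlinear microphone circuit: out = in + a * in^2.
# The squared term produces sum/difference frequencies, including a
# copy of the command tone at baseband.
received = transmitted + 0.1 * transmitted ** 2

# Ideal low-pass filter via FFT (keep components below 20 kHz),
# standing in for the circuit's band limiting before the ADC.
spectrum = np.fft.rfft(received)
freqs = np.fft.rfftfreq(len(received), 1 / fs)
spectrum[freqs > 20_000] = 0
baseband = np.fft.irfft(spectrum, n=len(received))

# The strongest non-DC component of the filtered signal is the
# recovered command tone, even though the transmitted signal
# contained no audible energy at all.
mag = np.abs(np.fft.rfft(baseband))
peak_freq = freqs[np.argmax(mag[1:]) + 1]  # skip the DC bin
print(f"dominant audible component: {peak_freq:.0f} Hz")  # → 400 Hz
```

Expanding the quadratic term shows why: `0.1 * transmitted**2` contains `0.05 * (1 + voice)**2`, whose `voice` component sits at 400 Hz, while all linear-term energy stays above 20 kHz and is filtered out.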