Using brain signals recorded from epilepsy patients, researchers from University of California, San Francisco (UCSF) have programmed a computer to mimic natural speech — an advancement that they suggest could one day have a profound effect on the ability of patients with speech loss to communicate.
The study, supported by the National Institutes of Health’s Brain Research through Advancing Innovative Technologies (BRAIN) Initiative, was published recently in Nature.
“Speech is an amazing form of communication that has evolved over thousands of years to be very efficient,” says senior author Edward F. Chang, MD, professor of neurological surgery at UCSF, in a media release from NIH.
“Many of us take for granted how easy it is to speak, which is why losing that ability can be so devastating. It is our hope that this approach will be helpful to people whose muscles enabling audible speech are paralyzed.”
Stroke, traumatic brain injury, and neurodegenerative diseases such as Parkinson’s disease, multiple sclerosis, and amyotrophic lateral sclerosis (ALS, or Lou Gehrig’s disease) often result in an irreversible loss of the ability to speak. Some people with severe speech disabilities learn to spell out their thoughts letter-by-letter using assistive devices that track very small eye or facial muscle movements. However, producing text or synthesized speech with such devices is laborious, error-prone, and painfully slow, typically permitting a maximum of 10 words per minute, compared to the 100-150 words per minute of natural speech.
The new system the researchers developed in Chang’s lab suggests that it may be possible to create a synthesized version of a person’s voice that can be controlled by the activity of their brain’s speech centers. In the future, this approach could not only restore fluent communication to individuals with severe speech disability, the authors say, but could also reproduce some of the musicality of the human voice that conveys the speaker’s emotions and personality, a separate release from UC San Francisco explains.
“For the first time, this study demonstrates that we can generate entire spoken sentences based on an individual’s brain activity,” Chang notes, in the UCSF release. “This is an exhilarating proof of principle that with technology that is already within reach, we should be able to build a device that is clinically viable in patients with speech loss.”
In the study, led by Gopala Anumanchipalli, PhD, a speech scientist, and Josh Chartier, a bioengineering graduate student in the Chang lab, five volunteers being treated at the UCSF Epilepsy Center — patients with intact speech who had electrodes temporarily implanted in their brains to map the source of their seizures in preparation for neurosurgery — were asked to read several hundred sentences aloud while the researchers recorded activity from a brain region known to be involved in language production.
Based on the audio recordings of participants’ voices, the researchers used linguistic principles to reverse engineer the vocal tract movements needed to produce those sounds: pressing the lips together here, tightening vocal cords there, shifting the tip of the tongue to the roof of the mouth, then relaxing it, and so on.
This detailed mapping of sound to anatomy allowed the scientists to create a realistic virtual vocal tract for each participant that could be controlled by their brain activity. This comprised two “neural network” machine learning algorithms: a decoder that transforms brain activity patterns produced during speech into movements of the virtual vocal tract, and a synthesizer that converts these vocal tract movements into a synthetic approximation of the participant’s voice, per the UCSF release.
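The two-stage pipeline described above can be sketched in miniature. The snippet below is purely illustrative: the study used trained recurrent neural networks, whereas this sketch stands in simple linear maps, and all dimensions (electrode count, articulatory features, acoustic features) are hypothetical placeholders, not the study's actual values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- illustrative only, not the study's values.
N_ELECTRODES = 256    # ECoG recording channels
N_ARTICULATORY = 33   # kinematic features of the virtual vocal tract
N_AUDIO = 32          # acoustic features (e.g., spectrogram bins)
T = 100               # time steps in the recording

# Stage 1: decoder -- brain activity to vocal tract movements.
# (A linear map stands in for the study's recurrent neural network.)
W_decode = rng.normal(size=(N_ELECTRODES, N_ARTICULATORY)) * 0.1

# Stage 2: synthesizer -- vocal tract movements to acoustic features.
W_synth = rng.normal(size=(N_ARTICULATORY, N_AUDIO)) * 0.1

def decode_articulation(ecog):
    """Map neural activity (T, electrodes) to articulator kinematics."""
    return ecog @ W_decode

def synthesize_audio(kinematics):
    """Map articulator kinematics to acoustic features."""
    return kinematics @ W_synth

ecog = rng.normal(size=(T, N_ELECTRODES))      # simulated brain recording
kinematics = decode_articulation(ecog)         # stage 1
audio_features = synthesize_audio(kinematics)  # stage 2

print(kinematics.shape)      # one articulatory vector per time step
print(audio_features.shape)  # one acoustic vector per time step
```

The key design choice the sketch preserves is the intermediate articulatory representation: rather than mapping brain activity directly to sound, the decoder first recovers vocal tract movements, which the synthesizer then renders as audio.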
The synthetic speech produced by these algorithms was significantly better than synthetic speech directly decoded from participants’ brain activity without the inclusion of simulations of the speakers’ vocal tracts, the researchers suggest. The algorithms produced sentences that were understandable to hundreds of human listeners in crowdsourced transcription tests conducted on the Amazon Mechanical Turk platform.
As is the case with natural speech, the transcribers were more successful when they were given shorter lists of words to choose from, as would be the case with caregivers who are primed to the kinds of phrases or requests patients might utter. The transcribers accurately identified 69% of synthesized words from lists of 25 alternatives and transcribed 43% of sentences with perfect accuracy. With a more challenging 50 words to choose from, transcribers’ overall accuracy dropped to 47%, though they were still able to understand 21% of synthesized sentences perfectly.
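To put those percentages in context, they can be compared against chance performance for each closed-vocabulary size; a listener guessing at random from the word list would be right only 1-in-25 or 1-in-50 of the time. A quick calculation using the reported figures:

```python
# Reported listener word-identification accuracy by word-pool size.
results = {25: 0.69, 50: 0.47}

for pool_size, accuracy in results.items():
    chance = 1.0 / pool_size  # random guessing from the closed list
    print(f"{pool_size}-word pool: {accuracy:.0%} observed vs "
          f"{chance:.0%} chance ({accuracy / chance:.1f}x above chance)")
```

Both conditions land well above chance, which is why accuracy falling from 69% to 47% with the larger pool still represents substantial intelligibility.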
“We still have a ways to go to perfectly mimic spoken language,” Chartier acknowledges, in the UCSF release. “We’re quite good at synthesizing slower speech sounds like ‘sh’ and ‘z’ as well as maintaining the rhythms and intonations of speech and the speaker’s gender and identity, but some of the more abrupt sounds like ‘b’s and ‘p’s get a bit fuzzy. Still, the levels of accuracy we produced here would be an amazing improvement in real-time communication compared to what’s currently available.”
The researchers are currently experimenting with higher-density electrode arrays and more advanced machine learning algorithms that they hope will improve the synthesized speech even further. The next major test for the technology is to determine whether someone who can't speak could learn to use the system without being able to train it on their own voice, and whether the system could generalize to anything the user wishes to say.
Preliminary results from one of the team’s research participants suggest that the researchers’ anatomically based system can decode and synthesize novel sentences from participants’ brain activity nearly as well as the sentences the algorithm was trained on. Even when the researchers provided the algorithm with brain activity data recorded while one participant merely mouthed sentences without sound, the system was still able to produce intelligible synthetic versions of the mimed sentences in the speaker’s voice.
The researchers also found that the neural code for vocal movements partially overlapped across participants, and that one research subject’s vocal tract simulation could be adapted to respond to the neural instructions recorded from another participant’s brain. Together, these findings suggest that individuals with speech loss due to neurological impairment may be able to learn to control a speech prosthesis modeled on the voice of someone with intact speech, the UCSF release continues.
“People who can’t move their arms and legs have learned to control robotic limbs with their brains,” Chartier comments. “We are hopeful that one day people with speech disabilities will be able to learn to speak again using this brain-controlled artificial vocal tract.”
Anumanchipalli adds, “I’m proud that we’ve been able to bring together expertise from neuroscience, linguistics, and machine learning as part of this major milestone towards helping neurologically disabled patients.”
[Source(s): NIH/National Institute of Neurological Disorders and Stroke, University of California – San Francisco, EurekAlert]