I’m writing some of this host with this speech to text app in my excess ability settings on my computer. Since we worked with text to speech applications to render the voices in Bartleby the scrivener, I thought it would be interesting to try things the other way around. I’ve never used speech to text before, just as I’ve never edited a few dozen audio clips together and end to end until this past week. Composing sentences in my mind first is much more difficult and composing them on a keyboard karma which in it, which enables me not only to correct mistakes, but to work with the language in a nonlinear way, and this seems to be the way my mind works with the language most of the time Amo G
Okay, this is becoming too difficult and time consuming.
Just as the experiment of “writing” with my voice forced me to shift my approach to composing with language, our group experiment, using text-to-speech for every character but Bartleby, forced me to listen differently and to think about how voice shapes a reader’s perceptions, not only of character but of an entire story. Removing the expected reshuffled the story thematically; it produced effects/affects that more conventional strategies might not have achieved for an audiobook listener; and it caused me to be more open-minded about robots. The following ramble touches on some of the aforementioned:
These days speech-to-text/text-to-speech most often brings to mind the “accessibility features” I was just playing with: tools that make two-way communication and reading easier for some people unable to speak, see, or manipulate a physical text. But speech-to-text was once primarily a tool of modern business practice: dictation. The “boss” spoke into a machine and a secretary (a copyist of sorts who was the human mechanism for rendering speech to text) listened to it and typed it out. In the 20th century, the clerical workforce was composed primarily of women.* Without consulting any sources I’ll go way out on a limb and suggest that women still make up a large percentage of the white-collar workforce that is not “middle management” or above. Women added a gender differential to the workplace, and the way we perceive, define, and differentiate labor was forever changed. Automation is a form of apparently “workerless” labor; now that it surrounds us, could it be that the way we conceive of it is obliquely but intrinsically connected to the way we understand gender? Siri and Alexa, for example, have been discussed in this respect. (It’s also been argued that some voices affect listeners more favorably than others, which is supposedly why people prefer women’s voices on their GPS.)
As we know, when Melville wrote Bartleby, the office worker and the copyist were traditionally male roles—women had not yet entered the public sphere as white-collar workers. It follows then that Bartleby, as both scrivener and refusenik, was rendered from the get-go as a rather robotic individual in every way, someone who becomes increasingly sapped of a humanity the narrator laments—or fears for?—in the final sentence. Ironically, this loss of humanity is also what seems to accelerate (or at least coincides with) his loss of functionality as a copying machine. Conversely, we like Siri and Alexa because they have the amazing functions of a machine, yet sound sort of human. (Although personally I can’t stand the sassy tone of the UPS lady.) Could it be that, in order for the world (as we know it) to work, robotic and humanistic elements must coexist in our machines somehow? In us? I don’t think Melville had the cyborg in mind, but N. Katherine Hayles might agree. Marx might not approve.
Our group chose to use a chorus of female automatons for our version of Bartleby. This was in large part an effort to work against the text, this time to turn the entirely male world of the narrative inside out. One thing that occurred to me as I discussed this with Lauren, just before she began converting the text to speech, is that this approach also turns a typical feminist reading on its head. Not only were all the human characters rendered as robots and the most robotlike character rendered as human, but women occupied roles both of labor and power over labor. They were running the machine.
These three shifts in perspective produced a very alien/alienating experience of the story, but in a good way. As other people have noted, listening accentuated aspects of the narrative that might have slipped by more easily in an unvoiced text. Listening to a text-to-speech rendering, with all of its imperfections, tone-deaf pronunciations, and incorrect “translations,” highlighted Melville’s language choices and the way we hear “authority” and force in individual words and phrases that could just as easily be deemphasized by “wrong” speech. Voices with the barest hint of emotion, as well as those with non-“English” accents, actually brought out the humor in the story, even in the mundane interactions of minor characters.
Not only did the humor become clearer to me, but oddly enough, so did the idiosyncrasies of Melville’s characters. In many ways, Bartleby is the least sympathetic character, if only because he is the most thinly written character. There’s just not much of him there at all, which is part of the point. Although the narrator’s empathy was something our group had debated, and which drove us to make some unusual post-production choices, as a female robot, the narrator actually seemed a little more human to me. They all did. Melville’s characters are flawed and interesting, because humans are flawed and interesting. Listening to the clips of “tape” I was ham-fistedly editing together, I found myself sympathizing more with these alien voices. Forced to act out Bartleby the Scrivener, they resembled machines that were putting on a play, on the verge of manifesting human traits, but just shy of the kind of emotional capacity we believe draws a line between them and us.
* I’m thinking ahead to the fantasy scene in Vannevar Bush’s “As We May Think” (1945):
At a recent World Fair a machine called a Voder was shown. A girl stroked its keys and it emitted recognizable speech. No human vocal chords entered into the procedure at any point; the keys simply combined some electrically produced vibrations and passed these on to a loud-speaker. In the Bell Laboratories there is the converse of this machine, called a Vocoder. The loudspeaker is replaced by a microphone, which picks up sound. Speak to it, and the corresponding keys move. This may be one element of the postulated system.
The other element is found in the stenotype, that somewhat disconcerting device encountered usually at public meetings. A girl strokes its keys languidly and looks about the room and sometimes at the speaker with a disquieting gaze. From it emerges a typed strip which records in a phonetically simplified language a record of what the speaker is supposed to have said. Later this strip is retyped into ordinary language, for in its nascent form it is intelligible only to the initiated. Combine these two elements, let the Vocoder run the stenotype, and the result is a machine which types when talked to.