Ash-Matic Does Speech-Recognition
I’m a child of science fiction.
It was my father’s fault. Him and his bookshelves full of Asimov and Clarke and Wyndham and Wells, and to this day I still look forward to the myriad promises I found there.
Given the choice, I would trade my phone for a laser gun any day. Wouldn’t we all? And I’d swap my rented flat for a rented room on an interstellar vessel – even a crappy one with a shared bathroom and portholes that are painted shut. And I’d definitely exchange Miss-Matic’s wardrobe for one full of tinfoil dresses.
Golden-age science fiction largely failed to anticipate some things we’ve accomplished, such as the Digital Age in which we currently reside. And while it’s not quite the Age of Galactic Exploration, it is bringing some of science fiction’s promises closer with every passing year.
I was lying on the sofa last night, reading a graphic novel. My laptop was trying to distract me by playing a movie at me. I’d started this movie playing, but it was boring and I wanted it to stop, and my laptop was slightly out of arm’s reach.
It was at this point that I realised that one of my great hopes for the future was that I would be able to talk to my computer.
‘Lappy II,’ I would say. ‘Silence!’
And it would be silent.
With this realisation, my plans for the evening changed. No longer would I enjoy my graphic novel – not while my laptop just sat there like a dumb beast! I would uplift it, teach it to obey my every command, and by the end of the day it would be a god among laptops – maybe just among the only other laptop in the flat – Miss-Matic’s – but a god nonetheless!
And so, I embarked upon a journey of discovery into Windows Speech Recognition.
For those of you who haven’t ventured too far into the Control Panel on Windows, there are many secrets contained therein. Most of these things are secret for a reason, but amongst them, on Windows Vista or 7, a speech-recognition program can be found – which is also kept secret for a reason.
Speech recognition has been a subject of study in the computer sciences for many years. The field is challenging, as there are a number of problems to overcome and factors to consider. Rather than steal Wikipedia’s job by enumerating these from a theoretical standpoint, I will now recount my first-hand experience of this innovative technology.
I started with the tutorial. In this, you progress through a number of screens that give you practical experience of using your voice to progress through a number of screens. In particular, you learn that there are about a thousand different ways you can say ‘Next’, of which the software recognises only one, and you have to find that one by trial-and-error each time. I felt like a Post Office clerk on a busy day. ‘Next, next, next, next, next, next, next…’ – over and over and over.
But that was okay – because – as the tutorial delighted in reminding me – the software learns my speech patterns the more I use it. Of course it’s not going to recognise my voice straight away. It needs to get to know me. If you say, ‘Double-click menu’, and I say, ‘Double-click menu’, the system will break these sounds down into different components, tones, amplitudes and so on – and for you it might read like ‘Duh-bul click men-yoo’, while my own speech patterns might read as ‘bar-star-d thing oh-pen the fuck-king men-yoo dam-mit’.
It just needed to get to know me.
After I’d spent about an hour on that, I had learned to perform a number of functions – like opening things, closing things, getting bored and going for a pee, right-clicking, inserting text, and so on. When I was done with the tutorial I was feeling fairly confident – if a little hoarse – and wanted to give it a test-drive, but in the light of its fairly slow learning curve, I decided to follow the link described as “Train your computer to better understand you”.
This sounded good. After all, what could be better than a computer that understands me? A computer that better understands me – that’s what.
For this task I had to repeat a number of sentences displayed on screen, and presumably my elocution of the respective syllables would be mapped to the words in question, to teach the system accuracy. This stage went fairly well too, although I rapidly realised it was just as much about speech-recognition propaganda as machine-learning. The sentences I had to read aloud were things like:
‘I am now speaking to my computer.’
‘The computer is learning the sound of my voice as I speak.’
‘This will help the computer to better understand me.’
‘Speech recognition can recognise speakers very accurately.’
‘Speech recognition is easy to use.’
‘There are few things more exciting than using speech recognition.’
‘Speech recognition is a more compatible life-partner than my girlfriend.’
and so on.
Maybe this was why I felt so confident and happy after completing the task, ready to give it a go and raise Lappy II up, like a new deity rising from the ashes of old-world gods.
I closed down the tutorial bits and pieces, and sat there looking at a tiny interface that alternately flashed the word ‘Sleeping’ and the phrase ‘Try saying “start listening”’ at me.
So, ‘Start listening,’ I said.
And then I said, ‘Start listening.’
Then, ‘Start… listening.’
Okay, I thought. I’ll go easy on it, since we’re both new at this, and to be fair the phrase ‘Start listening’ had only occurred once or twice in the tutorial – maybe the system just wasn’t used to how I said it yet.
So, I manually activated it, and it flashed the word, ‘Listening’.
Excellent, I thought.
But before I could speak, it said, ‘Pressing End key’.
Hmm, I thought. And maybe I thought it a little loud, because it then said, ‘Moving to start of document’.
Then it said, ‘Caps Lock’.
I wasn’t even talking.
I started to panic, and rapidly closed down all my other applications in case it decided to delete my in-progress assignments with this ouija board bullshit. I started to suspect that the ease of the tutorial sections might have been somewhat contrived.
For a while, the program obeyed the random key-presses requested by whatever other spirits inhabit my flat, but in between all this phantom noise I managed to get it to open Firefox to my Google homepage.
I tried to get it to search for ‘Help with speech recognition software’, but instead it popped up with a tiny text box that I could find no possible way to close using the mousepad, and rapidly filled it in with nonsensical syllables, ‘aa ee a oaa aa…’ and so on – a bit like the noises I make when I spill hot coffee on my crotch.
It was about this time that my dreams of communicating meaningfully with my laptop finally died, and I resigned myself to having to lean towards it to touch it every so often.
Thus ends my speech-recognition experience, but, because I think it is only fair, I will give you some evidence of this unlikely tale – a final word input safely into a Notepad window by the sponsor of today’s post – Windows Speech Recognition:
Into don’t let that the rule that you are looking for you want on Antarctic the rebound on a England and flew at the and the leading you may have that in the cabinet is and I think there is a metre long archive includes a speech recognition.