Speech Recognition


Hello

I need to interpret some voice comments recorded on a voice mail. The 'Speech' in question is a structured comment on the sounds in the voice recording.

There are several LV examples but I cannot get any to work. I use Win XP and LV 7.0.

In the MS package there are some C++ examples which compile and work right out of the box. I assume that this means I have all the ActiveX components required.

Does anyone have a good example of LV voice recognition, or even grammar building?

The speech engine is totally mind-blowing. You can limit its vocabulary to a small set (solitaire words, for example "play the red queen"), which improves the recognition. I would not be surprised to see speech recognition become an important input route for LV. There was a chap the other day trying to get a reading from a DMM when the signal was stable; presumably both hands were busy and he needed a way to trigger the acquisition.
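
To illustrate why a small vocabulary helps, here is a minimal Python sketch. It does not use the MS speech engine at all; it only shows that a slightly misheard phrase can still be snapped to the nearest entry of a short command list. The command list, the `match_command` helper and the 0.6 cutoff are assumptions made up for this illustration.

```python
import difflib

# Hypothetical command set, invented for illustration (not from any SAPI grammar).
COMMANDS = ["play the red queen", "play the black king", "new game", "undo"]


def match_command(heard, commands=COMMANDS, cutoff=0.6):
    """Return the command closest to a noisy transcript, or None if nothing is close."""
    matches = difflib.get_close_matches(heard.lower(), commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# A slightly misheard phrase still lands on the intended command.
print(match_command("play the red clean"))
# A phrase outside the vocabulary is rejected.
print(match_command("what time is it"))
```

With an open vocabulary the misheard phrase would simply be returned as heard; with four allowed commands there is only one plausible candidate.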

Yours Sincerely

John

Link to comment

QUOTE(jbrohan @ Sep 23 2007, 09:10 AM)

The problem with speech recognition is that it is a fairly complicated technique to get to work in any useful way. I only played briefly with it in other applications, without trying to import it into LabVIEW, and it did not feel up to what I would expect from such a tool.

It is simply rather complicated to configure and train appropriately, since human perception of speech is such an involved process and, as probably anyone who knows more than one language can attest, depends very much on environmental influences, of which the language is only one parameter.

Speech recognition has been worked on for more than two decades, with the technology already available in the Windows 3.1 era, and still it hasn't become a meaningful means of human interaction with the computer, let alone a replacement for human interface devices like a mouse or keyboard. This has been partly because of processing power and memory usage, but that can't be the only problem when you consider that computers now have 1000 times as much memory as was common 15 years ago and CPUs run at about 50 times the clock speed of then and are more powerful still, not to mention the availability of multicore and multi-CPU systems.

Not having looked at the MS Speech Recognition API in a long time, I can't really say much about it, but it was already complicated years ago and has probably gained even more possibilities and features since then.

Rolf Kalbermatter

Link to comment

QUOTE(rolfk @ Sep 23 2007, 08:38 PM)

I have to say my first reaction was quite similar to Rolf's.

However, if the possibility exists to train the system externally, and simply use the speech recognition as input, the complexity should stay within more or less acceptable bounds.

That said, all the caveats Rolf has mentioned still apply. But if it's needed, then make sure the training and fine-tuning can be done separately from the LV program itself, otherwise it'll most likely get ugly.

Just my 2c

Shane.

Link to comment

Hi Guys

I don't think that it's as bad as you are suggesting. There is a small C++ example which shows the internal processes of interpreting a spoken phrase from a random speaker. For example, the spoken phrase "1234" goes through several hypotheses like "one to free four", "one to three four" and then "one two three four", and finally "1234". This technology has come a long way. The logic is complicated, but it's well hidden in an ActiveX. The main contender for speech recognition is "Dragon Naturally Speaking", and the same ActiveX can load that engine in place of the MS engine with apparently nothing more than pointing to it. I haven't done this yet.
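
As a rough picture of the last step in that chain (spoken digit words becoming "1234"), here is a small Python sketch. It is not the engine's actual logic; the `words_to_digits` helper and the word table are assumptions for illustration. It shows why the earlier hypotheses cannot be turned into a number while the final one can.

```python
# Hypothetical lookup table, for illustration only.
WORD_TO_DIGIT = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def words_to_digits(phrase):
    """Convert a phrase made only of digit words to a digit string, else None."""
    digits = []
    for word in phrase.lower().split():
        if word not in WORD_TO_DIGIT:
            return None  # a word such as "to" or "free" is not a digit
        digits.append(WORD_TO_DIGIT[word])
    return "".join(digits)


# The hypotheses quoted in the post, in order.
for hypothesis in ["one to free four", "one to three four", "one two three four"]:
    print(hypothesis, "->", words_to_digits(hypothesis))
```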

The example image is from a "Command and Control" grammar about Solitaire.

Vista uses version 5.3 of the SDK, which is integral to the operating system. XP uses 5.1, which you have to download.

The C++ example "Reco" is *nearly* fine for my purposes. My problem is that I don't know the syntax; I jumped straight from C to LV 2.0 and didn't really work in object-oriented C++. The LabVIEW version has the same difficulties: an object-oriented approach to a complex (though well documented) interface. If anyone wants to dive into either the C++ or the LV (preferably 7.x), I'll be happy to cooperate.

John

Link to comment

John,

I've never attempted to use the SR portion of Microsoft's Speech API, just the TTS part, but I'll agree with the others that it's fundamentally much more complex. The only suggestion I would immediately make is to look for shipping examples written for Visual Basic rather than C++. I think the Automation interface steps required to build a working VB example map much closer to one-to-one onto LabVIEW.

Best of success to you,

Dave

Link to comment

Hello Everybody

My name is Dops ,

I'm French and this is my first post on the Lava forum. Sorry for my English, which is very :thumbdown: .

I have never posted on this forum because the level is much higher than my knowledge of LV.

I haven't understood everything you discussed, but maybe my contribution can help someone looking for help with Speech Recognition (SR):

Here's a very basic example of command recognition with MS Agent and SAPI 5.1 that I created last year.

When you say "One two" the computer recognises it and writes "Hello !". If you say "one five" it writes "Byebye ".
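
The behaviour described above boils down to a lookup from recognised phrase to response. Here is a minimal Python sketch of that mapping only; it does not talk to MS Agent or SAPI, and the `RESPONSES` table and `respond` helper are names invented for this illustration.

```python
# Phrase-to-response table mirroring the example described in the post.
RESPONSES = {
    "one two": "Hello !",
    "one five": "Byebye",
}


def respond(recognized):
    """Return the response for a recognised phrase, or None if it is not a command."""
    # Normalise case and spacing so "One  two" and "one two" are treated alike.
    key = " ".join(recognized.lower().split())
    return RESPONSES.get(key)


print(respond("One two"))
print(respond("one five"))
print(respond("one six"))
```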

Everything is well explained in the Agent help.

Just one word about my little experience with SR: it requires a very clean sound, which is difficult to provide in a typical working session, especially with bigger and noisier CPUs.

I can provide a little example with the Dragon Naturally Speaking ActiveX SR engine if needed.

As the attached VIs are LV 8.2, there are also JPG captures for LV 7 users. I will try tonight to create the same in LV 7, or maybe someone will post an optimised LV 7 version.

Best regards and thanks for your work. :worship:

Link to comment

I saw a demo of voice recognition using LabVIEW and Vista last year at the Tech Symposium. It was a bit buggy, but the newer multicore processors and LabVIEW 8.5 may make this closer to reality. There's talk of .NET 3.0; I don't know if this is available for XP.

http://www.ni.com/swf/flv/labview/us/vista/vr/

I believe the person in the video also gave the presentation here in Boston.

Link to comment

QUOTE(LV Punk @ Sep 24 2007, 04:57 PM)

I saw a demo of voice recognition using LabVIEW and Vista last year at the Tech Symposium . It was a bit buggy

That's interesting. I also saw one of the local NI reps trying to demo VR in Vista at a convention last year and failing magnificently. :) It was probably the same presentation.

Link to comment

Hi Dops

Thank you very much for your most informative posting.

It runs but does not recognize anything I say, even with a French accent!

I get an error (in LV 7.0) when it loads the SRModeID into the CharacterEx. What is that long string of letters and numbers? It is not in the Registry!

Is there any callback for when it hears something but does not know what it is? This would assure me that the Speech Recognition system is doing something. At present I can't see what it's up to.

I hope I can get this working. It's so short and will be so neat to implement in my program!

Bonjour de Montréal

John

Link to comment

QUOTE(jbrohan @ Sep 24 2007, 11:22 PM)

Bonjour John !

MS Agent needs you to install the "Microsoft Speech Recognition Engine v4.0" (6 MB).

You can download it and more here: msagents downloads page

(the SDK doc is under the link "downloads for developers")

The first time, I installed all the items on this page (SAPI 4.0) and it ran well on XP. Since I moved to SAPI 5.1, it runs just as well.

In the SDK doc you will find the explanation of the SRModeID and LanguageID.

I'm surprised to discover a different CLSID for the MS Speech Engine at the bottom of this page: installing a speech engine,

whereas "my" (working) CLSID is given on the following page: Accessing a Speech Engine in Your Code

:blink:
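
On the "long string of letters and numbers": assuming the SRModeID is a GUID-style string like the CLSIDs mentioned above, a quick way to rule out a copy/paste error is to check that it is well formed. The sketch below uses Python's standard `uuid` module; the `parse_guid` helper is a made-up name, and the sample value is a placeholder, not a real engine ID.

```python
import uuid


def parse_guid(text):
    """Return a uuid.UUID if text is a well-formed GUID (braces allowed), else None."""
    try:
        return uuid.UUID(text.strip())
    except ValueError:
        return None


# Placeholder value for illustration only; not an actual speech engine ID.
sample = "{01234567-89AB-CDEF-0123-456789ABCDEF}"
print(parse_guid(sample))
print(parse_guid("not a guid"))
```

This only checks the format. Whether a given ID matches an installed engine still has to be looked up in the SDK doc.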

The 8.2 and 7.0 VIs were tested before I posted them.

By default, your agent indicates in a bubble when it is listening to something and signals its trouble when it doesn't understand the sound it heard. So I think you can obtain this information programmatically.

Among the methods and properties (I don't remember which one) there is the possibility of obtaining the accuracy (or confidence?) coefficient of the recognition when it happens.
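
If such a confidence coefficient is available, one common use is to ignore recognitions below a threshold. Here is a minimal Python sketch of that idea; the `Recognition` container, the 0.0 to 1.0 scale and the 0.7 threshold are all assumptions for illustration, not the Agent or SAPI interface.

```python
from dataclasses import dataclass


@dataclass
class Recognition:
    """Hypothetical container for one recognition result."""
    text: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def split_by_confidence(results, threshold=0.7):
    """Separate results into (accepted, rejected) by confidence threshold."""
    accepted = [r for r in results if r.confidence >= threshold]
    rejected = [r for r in results if r.confidence < threshold]
    return accepted, rejected


# Made-up results for illustration.
results = [Recognition("one two", 0.92), Recognition("one five", 0.41)]
accepted, rejected = split_by_confidence(results)
print([r.text for r in accepted])
print([r.text for r in rejected])
```

Keeping the rejected list, rather than dropping it, also gives a visible sign that the recogniser heard something it could not place.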

About your agents: of course you can edit and program them. You can create an agent that doesn't appear but still does its job...

As I speak English like a Spanish cow, I prefer to use the Dragon Naturally Speaking ActiveX, whose engine recognizes French :rolleyes:

Good night ;)

Link to comment
