Wednesday, October 30, 2013

Voice recognition in Java

I want to write voice recognition software in Java. For example, when I say 'hello', the software captures the characteristic values (features) of my voice, records them, and stores the corresponding output value. When someone else says 'hello', it collects their features and compares them with mine: if they match, the input is accepted; if not, it keeps waiting for input. The sound is captured mainly through a microphone.
------ Solution ---------------------------------------- ----
Voice recognition is a specialized technique; in essence it is about analyzing sound waves. If you want to implement it in Java, you first need a basic understanding of how Java handles sound input and output. Below is a simple record-and-playback program. While recording, each block of data read is stored in a byte array buffer and then written to the output. To do recognition, you need to extract information from that array. So what is the structure of the array? This program records in 16-bit stereo, so every two bytes together form one sample value (high byte first, since the format is big-endian), and that value is the loudness of the sound at that instant. Taking 2 bytes as one unit, and since there are two channels, the units come in pairs: one for the left channel and one for the right, stored alternately.

import java.io.*;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.FloatControl;
import javax.sound.sampled.TargetDataLine;
public class RecordAndPlay {
    volatile int divider;
    public RecordAndPlay(){
        Play();
    }
    public static void main(String[] args) {
        new RecordAndPlay();
    }
    // Play audio (captures from the microphone and plays it straight back)
    public void Play() {

        try {
            AudioFormat audioFormat =
//                    new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100F,
//                    8, 1, 1, 44100F, false);
             new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,44100F, 16, 2, 4,
             44100F, true);
            DataLine.Info info = new DataLine.Info(TargetDataLine.class,
                    audioFormat);
            TargetDataLine targetDataLine = (TargetDataLine) AudioSystem.getLine(info);
            targetDataLine.open(audioFormat);
            SourceDataLine sourceDataLine;
            info = new DataLine.Info(SourceDataLine.class, audioFormat);
            sourceDataLine = (SourceDataLine) AudioSystem.getLine(info);
            sourceDataLine.open(audioFormat);
            targetDataLine.start();
            sourceDataLine.start();
            FloatControl fc=(FloatControl)sourceDataLine.getControl(FloatControl.Type.MASTER_GAIN);
            double value=0.2;
            float dB = (float)(Math.log(value==0.0?0.0001:value)/Math.log(10.0)*20.0);
            fc.setValue(dB);
            int nByte = 0;
            final int bufSize=1024;
            byte[] buffer = new byte[bufSize];
            while (nByte != -1) {
                //System.in.read();
                nByte = targetDataLine.read(buffer, 0, bufSize);
                sourceDataLine.write(buffer, 0, nByte);
            }
            sourceDataLine.stop();

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
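
The byte layout described before the program (16-bit samples, left and right channels alternating, high byte first) can be decoded into numeric samples roughly as follows. This is only a minimal sketch: `PcmDecode` and `deinterleave` are hypothetical names that do not appear in the original program, and the code assumes the same format the program opens (16-bit signed, big-endian, 2 channels, 4 bytes per frame).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmDecode {
    /**
     * Splits interleaved 16-bit big-endian stereo PCM into two arrays.
     * Returns {left, right}; trailing bytes that do not fill a frame are ignored.
     */
    static short[][] deinterleave(byte[] buffer, int length) {
        int frames = length / 4; // 4 bytes per frame: 2 for left, 2 for right
        short[] left = new short[frames];
        short[] right = new short[frames];
        ByteBuffer bb = ByteBuffer.wrap(buffer, 0, frames * 4).order(ByteOrder.BIG_ENDIAN);
        for (int i = 0; i < frames; i++) {
            left[i] = bb.getShort();  // first unit of the pair: left channel
            right[i] = bb.getShort(); // second unit of the pair: right channel
        }
        return new short[][] { left, right };
    }

    public static void main(String[] args) {
        // Two hand-made frames: (0x0100, 0xFFFF) and (0x0002, 0x8000)
        byte[] buf = { 0x01, 0x00, (byte) 0xFF, (byte) 0xFF, 0x00, 0x02, (byte) 0x80, 0x00 };
        short[][] ch = deinterleave(buf, buf.length);
        System.out.println(ch[0][0] + " " + ch[1][0]); // prints "256 -1"
        System.out.println(ch[0][1] + " " + ch[1][1]); // prints "2 -32768"
    }
}
```

In the recording loop, something like `deinterleave(buffer, nByte)` could be called after each `targetDataLine.read(...)` to get the left and right sample values for that block.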

------ Solution ------------------------------------- -------
Following up on the recording program from last time, I wrote a program that records and displays the waveform. It works as follows:
1. Click "Start" to begin recording and displaying the waveform; click "Pause" to pause recording.
2. While recording is paused, you can move the scroll bar to view the earlier waveform. Press "Playback" to play the sound back while displaying the waveform; playback starts from the sound corresponding to the leftmost point of the currently displayed waveform. (I ran into a problem here: the last bit of sound does not play. It is rather strange, and I have not found the reason.)
The program is as follows:

import java.awt.*;
import java.awt.event.*;
import java.awt.geom.Line2D;
import javax.swing.*;
import javax.swing.event.*;
import java.util.*;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.FloatControl;
import javax.sound.sampled.TargetDataLine;
/**
 * 2011-6-23 0:53:40
 * @author Administrator
 */
public class ShowWave {
    JFrame frame;
    JPanel pan;
    final int panHeight=400;
    JScrollBar timeLocationScrollBar;
    int point[];
    int number;
    byte bufferAll[];
    int bufferAllIndex;
    int vRate,hRate;
    JButton startButton,pauseButton;
    JButton replayButton,stopReplayButton;
    boolean continueRecorde;
    boolean continueReplay;
    JPanel centerPane,buttonPane;
    JSlider hSlider;
    boolean jsbActive;
    public ShowWave(){
        initData();
        frame=new JFrame("Record and display waveform");
        pan=new JPanel(){
            public void paint(Graphics g){
                g.setColor(Color.WHITE);
                g.fillRect(0, 0, this.getWidth(), this.getHeight());
                g.setColor(Color.red);
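                int x[]=new int[number];
                int y[]=new int[number];
                for(int i=0;i<number;i++){
                    x[i]=i;
                    // flip vertically into a local array so that repeated repaints do not mutate point[]
                    y[i]=panHeight-point[i];
                }
                g.drawPolyline(x, y, number);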
//                Graphics2D g2d=(Graphics2D)g;
//                for(int i=0;i<number-1;i++){
//                    g2d.draw(new Line2D.Double(x[i], point[i], x[i+1], point[i+1]));
//                }
                g.setColor(Color.blue);
                
            }
        };
        pan.setPreferredSize(new Dimension(600,panHeight));
        timeLocationScrollBar=new JScrollBar();
        timeLocationScrollBar.setOrientation(JScrollBar.HORIZONTAL);
        timeLocationScrollBar.setMaximum(0);
        timeLocationScrollBar.setMinimum(0);
        timeLocationScrollBar.setValue(0);
        timeLocationScrollBar.addAdjustmentListener(new AdjustmentListener() {
            public void adjustmentValueChanged(AdjustmentEvent e) {
                if(jsbActive==false){
                    return;
                }
                synchronized(bufferAll){
                    int beginIndex=timeLocationScrollBar.getValue();
                    beginIndex=beginIndex*2*hRate;
                    if(beginIndex==0){
                        number=bufferAllIndex/hRate/2;
                        if(number>600){
                            number=600;
                        }
                    }
                    else{
                        number=600;
                    }
                    for(int i=0;i<number;i++,beginIndex+=2*hRate){
                        int hBit=bufferAll[beginIndex];
                        int lBit=bufferAll[beginIndex+1]&0xFF; // mask so the low byte is not sign-extended
                        point[i]=hBit<<8|lBit;
                        point[i]/=vRate;
                        point[i]+=panHeight/2;
                    }
                    pan.repaint();
                }
            }
        });
        centerPane=new JPanel();
        centerPane.setLayout(new BorderLayout());
        centerPane.add(pan);
        centerPane.add(timeLocationScrollBar,BorderLayout.SOUTH);
        frame.getContentPane().add(centerPane);
        startButton=new JButton("Start");
        startButton.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                continueRecorde=true;
                jsbActive=false;
            }
        });
        pauseButton=new JButton("Pause");
        pauseButton.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                continueRecorde=false;
                jsbActive=true;
            }
        });
        hSlider=new JSlider();
        hSlider.setOrientation(JSlider.HORIZONTAL);
        hSlider.setMaximum(100);
        hSlider.setMinimum(1);
        hSlider.setValue(hRate);
        hSlider.addChangeListener(new ChangeListener() {
            public void stateChanged(ChangeEvent e) {
                hRate=hSlider.getValue();
                int length=bufferAllIndex/hRate/2;
                length-=600;
                int value=(int)((double)timeLocationScrollBar.getValue()/timeLocationScrollBar.getMaximum()*length);
                jsbActive=false;
                timeLocationScrollBar.setMaximum(length);
                jsbActive=true;
                timeLocationScrollBar.setValue(value);
            }
        });
        replayButton=new JButton("Playback");
        replayButton.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                if(continueRecorde||continueReplay){
                    return;
                }
                new Thread(){
                    public void run(){
                        try{
                            continueReplay=true;
                            AudioFormat audioFormat =new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                                    44100F, 16, 1, 2,44100F, true);
                            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
                            SourceDataLine sourceDataLine;
                            sourceDataLine = (SourceDataLine) AudioSystem.getLine(info);
                            sourceDataLine.open(audioFormat);
                            sourceDataLine.start();
                            FloatControl fc=(FloatControl)sourceDataLine.getControl(FloatControl.Type.MASTER_GAIN);
                            double value=2;
                            float dB = (float)(Math.log(value==0.0?0.0001:value)/Math.log(10.0)*20.0);
                            fc.setValue(dB);
                            int beginIndex=timeLocationScrollBar.getValue();
                            beginIndex=beginIndex*2*hRate;
                            int bufSize=1024;
                            byte buffer[]=new byte[bufSize];            
                            while (beginIndex < bufferAllIndex && continueReplay) {
                                synchronized (bufferAll) {
                                    int nByte = bufferAllIndex - beginIndex > bufSize ? bufSize : bufferAllIndex - beginIndex;
                                    System.arraycopy(bufferAll, beginIndex, buffer, 0, nByte);
                                    sourceDataLine.write(buffer, 0, nByte);
//                                    System.out.println(beginIndex+"  "+bufferAllIndex);
                                    beginIndex += nByte;
                                    if(beginIndex/2/hRate<=timeLocationScrollBar.getMaximum()){
                                        timeLocationScrollBar.setValue(beginIndex/2/hRate);
                                    }
                                }
                            }
                            // drain() blocks until the queued audio has been played; flush() would discard it
                            sourceDataLine.drain();
                            sourceDataLine.stop();
                            sourceDataLine.close();
                            continueReplay=false;
                        }catch(Exception ee){ee.printStackTrace();}
                    }
                }.start();
            }
        });
        stopReplayButton=new JButton("Stop playback");
        stopReplayButton.addActionListener(new ActionListener(){
            public void actionPerformed(ActionEvent e) {
                continueReplay=false;
            }
        });
        Box box=Box.createHorizontalBox();
        box.add(Box.createHorizontalGlue());
        box.add(startButton);
        box.add(Box.createHorizontalStrut(10));
        box.add(pauseButton);
        box.add(Box.createHorizontalStrut(10));
        box.add(hSlider);
        box.add(Box.createHorizontalStrut(10));
        box.add(replayButton);
        box.add(Box.createHorizontalStrut(10));
        box.add(stopReplayButton);
        box.add(Box.createHorizontalGlue());
        box.setBorder(BorderFactory.createEmptyBorder(10, 10, 10, 10));
        frame.getContentPane().add(box,BorderLayout.SOUTH);
        frame.pack();
        frame.setLocationRelativeTo(null);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setVisible(true);
        play();
    }

------ For reference only ----------------------------------- ----
This probably can't be explained clearly in a sentence or two...
------ For reference only -------------------------------------- -
Waiting for the OP to implement it!
------ For reference only -------------------------------------- -
Good idea.
Keep it up.
------ For reference only -------------------------------- -------
Impressive. Waiting for the code.
------ For reference only ------------------------ ---------------
Bumping this for you! I hope you figure it out soon. When you do, please share your ideas and post the code so we can all learn from it!
------ For reference only -------------------------------------- -
Learning...
------ For reference only -------------------------------------- -
me too
------ For reference only ------------------------------- --------
There is an open-source Java project in this area; someone did this not long ago... I built something similar in C++, but it was simpler.
------ For reference only ---------------------------------------
Bump.
Following.

------ For reference only ---------------------------------- -----
Everyone is just bumping the thread and no expert is helping. Where have all the experts gone?

------ For reference only ---------------------------------- -----
Doing this in Java is rather troublesome.
------ For reference only ------------------------------------ ---
The OP could look into speech recognition software on Linux.

------ For reference only ------------------------------ ---------

Good stuff.
Learned something.
------ For reference only -------------------------------- -------
I'm just learning. I've never worked with this before and don't know where to start. Does anyone know how to use sphinx2-0.1? I found it online; it is supposed to be a speech library for Java, but after downloading it I don't know how to use it.

------ For reference only ---------------------------------- -----
Could the expert on the 13th floor explain a bit more? Thanks. I'm a little confused after reading it...
------ For reference only -------------------------------------- -
Good, worth following...
------ For reference only -------------------------------------- -
Bump. Learning first!
------ For reference only -------------------------------------- -
Bump, learning.
------ For reference only ------------------------------ ---------

Hey, actually I don't know that much. Because I once wrote programs that controlled sound output on bare hardware, I have some understanding of sampling rate, sample bit depth, and mono versus stereo, and of the structure and meaning of sampled data. These are the foundation for recognition: if you don't know what the data you read means, you have nothing to start from. For the captured data, as in the program I posted, every two bytes form one value, which is the loudness of the sound at one instant, and a series of consecutive values forms a sequence describing how the loudness changes. Being digital, the sampled values are discrete. For example, at a sampling rate of 44100, the sound is sampled 44,100 times per second, one value every 1/44100 s. If you plot these points on a time axis and connect each adjacent pair, you get an approximation of the waveform. That waveform is what you need to analyze.
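
The relationship between bytes, sample values, and time described above can be sketched as follows. This is a rough illustration, not the poster's code: `WaveformBasics` and its methods are hypothetical names, the format is assumed to be the 16-bit signed big-endian mono format that ShowWave opens, and the RMS helper is just one simple way to summarize loudness over a stretch of the waveform.

```java
final class WaveformBasics {
    // Assumed sampling rate, matching the 44100F used in the posted programs
    static final float SAMPLE_RATE = 44100f;

    /** Decodes 16-bit signed big-endian mono PCM: every two bytes become one sample. */
    static short[] toSamples(byte[] pcm, int length) {
        short[] samples = new short[length / 2];
        for (int i = 0; i < samples.length; i++) {
            // high byte keeps its sign, low byte is masked to avoid sign extension
            samples[i] = (short) ((pcm[2 * i] << 8) | (pcm[2 * i + 1] & 0xFF));
        }
        return samples;
    }

    /** Time in seconds at which the sample with the given index was taken. */
    static double timeOfSample(int index) {
        return index / (double) SAMPLE_RATE;
    }

    /** Root-mean-square amplitude of samples[from..to), a simple loudness measure. */
    static double rms(short[] samples, int from, int to) {
        double sum = 0;
        for (int i = from; i < to; i++) {
            sum += (double) samples[i] * samples[i];
        }
        return Math.sqrt(sum / (to - from));
    }

    public static void main(String[] args) {
        short[] s = toSamples(new byte[] { 0x01, 0x00, (byte) 0xFF, (byte) 0xFE }, 4);
        System.out.println(s[0] + " " + s[1]);   // prints "256 -2"
        System.out.println(timeOfSample(44100)); // prints "1.0"
    }
}
```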
------ For reference only -------------------------------------- -

Could you give me some useful material?
------ For reference only ------------------- --------------------
The code is too long, so I have to post it in separate parts.

public void initData(){
        point=new int[600];
        Arrays.fill(point, 0);
        number=600;
        bufferAll=new byte[130*1024*1024];
        bufferAllIndex=0;
        vRate=120;
        hRate=20;//1470
        continueRecorde=false;
        continueReplay=false;
        jsbActive=true;
    }
    public void play() {

        try {
            AudioFormat audioFormat =
//                    new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100F,
//                    8, 1, 1, 44100F, false);
             new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,44100F, 16, 1, 2,
             44100F, true);
            DataLine.Info info = new DataLine.Info(TargetDataLine.class,
                    audioFormat);
            TargetDataLine targetDataLine = (TargetDataLine) AudioSystem.getLine(info);
            targetDataLine.open(audioFormat);
            SourceDataLine sourceDataLine;
            info = new DataLine.Info(SourceDataLine.class, audioFormat);
            sourceDataLine = (SourceDataLine) AudioSystem.getLine(info);
            sourceDataLine.open(audioFormat);
            targetDataLine.start();
            sourceDataLine.start();
            FloatControl fc=(FloatControl)sourceDataLine.getControl(FloatControl.Type.MASTER_GAIN);
            double value=2;
            float dB = (float)(Math.log(value==0.0?0.0001:value)/Math.log(10.0)*20.0);
            fc.setValue(dB);
            int nByte = 0;
            final int bufSize=256;
            byte[] buffer = new byte[bufSize];
            new Thread(){
                public void run(){
                    while(true){
                        if(!continueRecorde){
                            try{
                                Thread.sleep(50);
                            }catch(InterruptedException ie){}
                            continue;
                        }
                        synchronized(bufferAll){
                            if(600*hRate*2<bufferAllIndex){
                                int beginIndex=bufferAllIndex-600*hRate*2;
                                for(int i=0;i<600;i++,beginIndex+=2*hRate){
                                    int hBit=bufferAll[beginIndex];
                                    int lBit=bufferAll[beginIndex+1]&0xFF; // mask so the low byte is not sign-extended
                                    point[i]=hBit<<8|lBit;
                                    point[i]/=vRate;
                                    point[i]+=panHeight/2;
                                }
                                number=600;
                                pan.repaint();
                            }
                            else{
                                int beginIndex=0;
                                number=bufferAllIndex/hRate/2;
                                for(int i=0;i<number;i++,beginIndex+=2*hRate){
                                    int hBit=bufferAll[beginIndex];
                                    int lBit=bufferAll[beginIndex+1]&0xFF; // mask so the low byte is not sign-extended
                                    point[i]=hBit<<8|lBit;
                                    point[i]/=vRate;
                                    point[i]+=panHeight/2;
                                }
                                pan.repaint();
                            }
                            int length=bufferAllIndex/hRate/2;
                            if(length>600){
                                timeLocationScrollBar.setMaximum(length-600);
                                timeLocationScrollBar.setValue(length-600);
                            }
                        }
                        try{
                            Thread.sleep(10);
                        }catch(InterruptedException ie){}
                    }
                }
            }.start();
            while (nByte != -1) {
                if(!continueRecorde){
                    try{
                        Thread.sleep(50);
                    }catch(InterruptedException ie){}
                    continue;
                }
                synchronized(bufferAll){
                    nByte = targetDataLine.read(buffer, 0, bufSize);
                    System.arraycopy(buffer, 0, bufferAll, bufferAllIndex, nByte);
                    bufferAllIndex+=nByte;
                    sourceDataLine.write(buffer, 0, nByte);
                }
//                try{
//                    Thread.sleep(10);
//                }catch(InterruptedException ie){}
            }
            sourceDataLine.stop();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    public static void main(String args[]){
        new ShowWave();
    }
}

Really unreasonable: barely three hundred lines, and that counts as too long.
------ For reference only ------------------------ ---------------
The poster above is impressive!!!
------ For reference only -------------------------------------- -
Just passing by. I hear Kai-fu Lee has worked on voice recognition.
------ For reference only --------------------- ------------------
Really impressive. Learning from this.
------ For reference only ------------ ---------------------------
This is amazing; I don't understand it at all.
------ For reference only ---------------------------------------
For embedded work, C++ is still the better choice.
------ For reference only ------------------------------------- -
Thank you very much... I'd like to close the thread.
------ For reference only ------------------------------ ---------

I read it; nicely written. Where does the output go after recording? If it's convenient, please add me on QQ 313158469 so I can ask you for advice.
Thank you!
------ For reference only -------------------------------------- -
I'm following this too. Extract speech features (samples) -> store them in a library -> (voice command) -> compare against the library (the actual recognition).


Does anyone have ideas on how to extract the voice features?
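
The extract, store, compare flow above might be sketched like this. Everything here is an assumption made for illustration: `NaiveMatcher`, the choice of per-segment RMS energy as the "feature", and the Euclidean distance are not from the thread. Practical recognizers use much richer features and models; this only shows the shape of the pipeline.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NaiveMatcher {
    /** Feature vector: RMS energy of each of `segments` equal slices, scaled so the maximum is 1. */
    static double[] features(short[] samples, int segments) {
        double[] f = new double[segments];
        double max = 0;
        for (int k = 0; k < segments; k++) {
            int from = (int) ((long) k * samples.length / segments);
            int to = (int) ((long) (k + 1) * samples.length / segments);
            double sum = 0;
            for (int i = from; i < to; i++) {
                sum += (double) samples[i] * samples[i];
            }
            f[k] = to > from ? Math.sqrt(sum / (to - from)) : 0;
            max = Math.max(max, f[k]);
        }
        if (max > 0) {
            for (int k = 0; k < segments; k++) f[k] /= max; // remove overall volume
        }
        return f;
    }

    /** Euclidean distance; both vectors are assumed to have the same length. */
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns the label of the stored template closest to the query, or null if the library is empty. */
    static String closest(Map<String, double[]> library, double[] query) {
        String best = null;
        double bestDist = Double.MAX_VALUE;
        for (Map.Entry<String, double[]> e : library.entrySet()) {
            double d = distance(e.getValue(), query);
            if (d < bestDist) {
                bestDist = d;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, double[]> library = new LinkedHashMap<>();
        library.put("loud-then-quiet", features(new short[] { 1000, 1000, 100, 100 }, 2));
        library.put("quiet-then-loud", features(new short[] { 100, 100, 1000, 1000 }, 2));
        short[] query = { 500, 500, 500, 50, 50, 50 }; // same shape, different volume and length
        System.out.println(closest(library, features(query, 2))); // prints "loud-then-quiet"
    }
}
```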


------ For reference only ---------------------------------- -----
To remove noise: drop the bytes that are not speech, for example those with abs() below 80 or some similar threshold.
------ For reference only ---------------------------------------
Thank you very much, 22F.
------ For reference only --------------------------------- ------
33F, any progress? Please share.
------ For reference only -------------------------------------- -
Good stuff, learning from it.
------ For reference only -------------------------- -------------
The code really needs comments.
------ For reference only ----------------- ----------------------
Bumping. I'm working on a Java voice-scoring program; anyone who knows this area, please add me on QQ: 508651685
------ For reference only ---------------------------------------
I tried sphinx4 and got the demo running, but the recognition rate is very low. I suspect my configuration file is not set up properly, but I don't know what is wrong with it. If anyone has done this, please give me some guidance.
------ For reference only ---------------------------------------
Hello, expert. I'd like to ask you a question.
"
While recording, each block of data read is stored in a byte array buffer and then written to the output. To do recognition, you need to extract information from that array. So what is the structure of the array? This program records in 16-bit stereo, so every two bytes together form one sample value, and that value is the loudness of the sound. Taking 2 bytes as one unit, and since there are two channels, the units come in pairs: one for the left channel and one for the right, stored alternately.
"

I read your posts in the forum, but I don't quite understand them.

I'm now writing a program that calculates heart rate. I can record the sound, but once I have the byte array buffer I don't know what to do next. Any guidance would be appreciated, thank you.
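
One possible next step for a question like this, once the bytes have been turned into sample values, is to count loud bursts and convert the count to beats per minute. The sketch below is only an assumption-laden illustration: `BeatCounter`, the fixed threshold, and the refractory window are hypothetical choices that would need tuning on real recordings, and a real heart sound may contain more than one burst per beat.

```java
final class BeatCounter {
    /**
     * Counts bursts whose absolute amplitude exceeds `threshold`.
     * After a burst is counted, further crossings are ignored for
     * `refractorySamples` samples so one beat is not counted many times.
     */
    static int countBeats(short[] samples, int threshold, int refractorySamples) {
        int beats = 0;
        int lastBeat = -refractorySamples; // allows a beat at index 0
        for (int i = 0; i < samples.length; i++) {
            if (Math.abs(samples[i]) > threshold && i - lastBeat >= refractorySamples) {
                beats++;
                lastBeat = i;
            }
        }
        return beats;
    }

    /** Converts a beat count over `sampleCount` samples into beats per minute. */
    static double beatsPerMinute(int beats, int sampleCount, float sampleRate) {
        double seconds = sampleCount / (double) sampleRate;
        return beats * 60.0 / seconds;
    }

    public static void main(String[] args) {
        // Synthetic test signal: 5 s at 1000 Hz with a 20-sample burst once per second
        float sampleRate = 1000f;
        short[] signal = new short[5000];
        for (int start = 100; start < signal.length; start += 1000) {
            for (int i = start; i < start + 20; i++) signal[i] = 10000;
        }
        int beats = countBeats(signal, 5000, 300);
        System.out.println(beats);                                            // prints "5"
        System.out.println(beatsPerMinute(beats, signal.length, sampleRate)); // prints "60.0"
    }
}
```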
