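Contents
  1. Noise-reduction approaches
  2. Convolutional noise?
  3. Effect of noise on cepstral coefficients
    3.1. Gaussian white noise
    3.2. Car noise
    3.3. Cafe noise
  4. Estimating the statistics
    4.1. White noise
    4.2. Car noise
    4.3. Cafe noise
  5. Testing removal of the statistics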
Effect of noise on the statistical properties of speech

Noise-reduction approaches

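  • Signal space: speech enhancement

    The main goal is to remove the noise so that a cleaner waveform is obtained in the time domain.

    • Spectral subtraction
    • Kalman filtering
    • Signal subspace methods
  • Feature space: finding robust speech features

    This works in the feature domain, mostly the Mel domain, which accounts for how the human ear perceives sound. The goal is a more robust feature: whatever the ambient noise or the recording device, the extracted features should stay essentially unchanged. Robustness methods generally operate on the statistical properties of the features, so the work is done at the level of probability distributions.

    • CMS (CMN)
    • Histogram equalization
  • Back end: acoustic model adaptation

    I have not gone deep into this part, and it is somewhat complicated. PMC, for example, uses noise collected in advance (i.e. prior knowledge) to assist recognition, so that the model better matches the noisy conditions. MAP likewise first trains a UBM (the prior knowledge), and the recognition stage then builds on that UBM.

    • MAP adaptation, e.g. from a UBM
    • Maximum likelihood linear regression: MLLR
    • Parallel model combination (PMC)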

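Convolutional noise?

  • Caused by channel differences (i.e. the frequency response of the transmission channel) and by room reverberation
  • Caused by speech enhancement itself: random noise introduced in the frequency-domain stage (e.g. musical noise)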
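
The reason such distortions are called convolutional, and why cepstral features are a natural place to deal with them, can be written out in a few lines. This is the standard textbook argument rather than something taken from the figures in this post; the symbols x (clean speech), h (channel impulse response), y (observed signal) and c (cepstrum) are introduced here for illustration.

```latex
\begin{aligned}
y[n] &= x[n] * h[n]
  && \text{(convolution in the time domain)} \\
Y(\omega) &= X(\omega)\, H(\omega)
  && \text{(product in the frequency domain)} \\
\log\lvert Y(\omega)\rvert &= \log\lvert X(\omega)\rvert + \log\lvert H(\omega)\rvert
  && \text{(sum in the log-spectral domain)} \\
c_y &= c_x + c_h
  && \text{(sum in the cepstral domain)}
\end{aligned}
```

If the channel is roughly constant over an utterance, c_h is a constant offset on every frame, which is what subtracting the cepstral mean is meant to remove.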

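Effect of noise on cepstral coefficients

In each figure, one horizontal axis is the value of the cepstral coefficient, the other is the SNR, and the z-axis is the probability density (PDF) of the coefficient.

A few observations:

  • In the low-order coefficients (dimensions 1-5) the cepstrum is not normally distributed, while the high-order coefficients (dimensions 10-13) are closer to a normal distribution.
  • Normality matters because mean normalization rests on the assumption that the cepstrum of clean speech is quasi-Gaussian, so its odd-order statistics should be 0. Removing the mean from a noisy speech segment therefore helps reduce the effect of the noise on the cepstrum.
  • The distributions become progressively sharper, i.e. the variance keeps shrinking.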
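
The mean-normalization idea in the second bullet can be illustrated on synthetic data. The snippet below is only a sketch: `clean` is a deterministic stand-in for a coefficients-by-frames cepstral matrix, and `channel` is a hypothetical constant offset per coefficient; neither comes from the experiments in this post. It relies on implicit expansion (MATLAB R2016b or later, or Octave).

```matlab
% Minimal CMS sketch on synthetic data (not the features used in this post).
n_coef = 13; n_frames = 500;
clean = sin((1:n_coef)' * (1:n_frames) / 7);   % stand-in cepstra, coefficients x frames
channel = linspace(-2, 2, n_coef)';            % hypothetical constant offset per coefficient
distorted = clean + channel;                   % a constant cepstral offset added to every frame

cms = @(C) C - mean(C, 2);                     % subtract each coefficient's mean over frames

clean_cms = cms(clean);
distorted_cms = cms(distorted);

% After CMS the constant offset is gone, so both versions coincide.
max_residual = max(abs(clean_cms(:) - distorted_cms(:)));
max_mean = max(abs(mean(distorted_cms, 2)));
fprintf('max difference after CMS: %g\n', max_residual);
```

A constant offset is the ideal case; with additive noise the shift varies from frame to frame, so CMS can only be expected to reduce the mismatch, not remove it.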

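Gaussian white noise

Figure: effect of Gaussian white noise on each cepstral coefficient

Car noise

Figure: effect of car noise on each cepstral coefficient

Cafe noise

Figure: effect of cafe noise on each cepstral coefficient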
clc; clear all; close all;
%% Hyperparameters
% MFCC configuration
MFCC_conf = 'E';
SNR = 5;
Channel_noise = 1;

%% Feature extraction on the training set
topTraindir = 'C:\Users\wangj\Documents\MATLAB\Experiment05\data'; % top-level training data folder

tic;
h = waitbar(0,'Training started'); % initialize the progress bar
for k = 1:13
    xi = determine_xi_range(topTraindir,MFCC_conf,k);
    n_xi = length(xi);
    pdf_by_SNR = zeros(n_xi,41);
    for SNR = 0:40
        save_memory_dim = 1;
        [all_Speaker_Data,all_Speaker_ID] = ...
            get_noise_MFCC(topTraindir,save_memory_dim,SNR,MFCC_conf);
        MFCC_by_frame = cell2mat(all_Speaker_Data); % coefficients x frames
        coef_k = MFCC_by_frame(k,:)'; % k-th coefficient over all frames
        pdf_by_SNR(:,SNR+1) = ksdensity(coef_k,xi); % PDF evaluated on the fixed grid xi
    end
    figure(k);
    [X,Y] = meshgrid(0:40,xi);
    mesh(X,Y,pdf_by_SNR);
    set(gca,'xtick',0:10:40,'xticklabel',{'SNR=0','SNR=10','SNR=20','SNR=30','SNR=40'});
    set(gcf,'color','w')
    view([60,50])
    title(['Cafe noise, coefficient ' num2str(k)])

    waitbar(k/13,h,'updated message'); % the message can be changed as progress advances
end

close(h); % close the progress bar
toc;

%% Local functions

% A first pass is needed to fix the range of xi
function xi = determine_xi_range(topTraindir,MFCC_conf,k)
    save_memory_dim = 1;
    SNR = 20; % any SNR will do; the middle value 20 is used here
    [all_Speaker_Data,~] = ...
        get_noise_MFCC(topTraindir,save_memory_dim,SNR,MFCC_conf);
    MFCC_by_frame = cell2mat(all_Speaker_Data);
    MFCC_by_SNR = MFCC_by_frame(k,:)'; % k-th coefficient over all frames
    [~,xi] = ksdensity(MFCC_by_SNR); % evaluation grid taken at SNR = 20
end


function [trainSpeakerData,train_SpeakerID] = get_noise_MFCC(topTraindir,N,SNR,MFCC_conf)
    train_wavdir = get_wavdir(topTraindir,N);
    nTra_Speakers = size(train_wavdir,1);
    nTra_Channels = size(train_wavdir,2);

    trainSpeakerData = cell(nTra_Speakers,nTra_Channels);
    train_SpeakerID = cell(nTra_Speakers, nTra_Channels);
    % fprintf('\nExtracting training-set features...\n\n');
    % tic ;
    for i = 1:nTra_Speakers
        for j = 1:nTra_Channels
            [x,fs] = readsph(train_wavdir{i,j});
            % qq = struct('tn',0.0075,'gz',0.00001);
            % y2 = v_vadsohn(x,fs,'a',qq);
            % x = x(y2==1);
            % [x,~] = Gnoisegen(x,SNR);
            type = 'cafe';
            [x,~] = add_noise_Reality(x,type,SNR);
            trainSpeakerData{i,j} = v_melcepst(x,fs,MFCC_conf)'; % coefficients x frames
            % extract the speaker ID with a regular expression
            pat = '(?<=\\)[FM].{3}[0-9]';
            name = regexpi(train_wavdir{i,j},pat,'match');
            train_SpeakerID{i,j} = name{1};
            % fprintf('\nFeatures extracted for %s',name{1});
        end
    end
    % fprintf('\nFeature extraction finished');
    % toc;
end

function train_wavdir = get_wavdir(topTraindir,N)
%{
Input:
topTraindir: path of the top-level folder
Output:
train_wavdir: paths of the wav files, one row per speaker
%}
    nTra_Channels = 10;
    trainFolder = dir(topTraindir);
    trainFolder = trainFolder(3:end); % skip the first two entries (assumed to be '.' and '..')
    nTra_Speakers = size(trainFolder,1);
    train_wavdir = cell(nTra_Speakers, nTra_Channels);
    for i = 1:nTra_Speakers
        curTraindir = [topTraindir,'\',trainFolder(i).name]; % subfolder of the current speaker
        trainWav = dir([curTraindir,'\*.WAV']); % list the WAV files directly, no change needed
        for j = 1:nTra_Channels
            train_wavname = [curTraindir,'\',trainWav(j).name]; % path of the current wav file
            train_wavdir{i,j} = train_wavname;
        end
    end
    train_wavdir = train_wavdir(1:N,:); % keep only the first N speakers
end

function [x2,fs] = select_noice(type,Nx)
    % noise_path avoids shadowing the built-in dir function
    if type == "cafe"
        noise_path = 'C:\Users\wangj\Desktop\语音识别\different_noise\cafe.wav';
    elseif type == "car"
        noise_path = 'C:\Users\wangj\Desktop\语音识别\different_noise\car.wav';
    elseif type == "white"
        noise_path = 'C:\Users\wangj\Desktop\语音识别\different_noise\white.wav';
    end

    [x,fs] = readwav(noise_path);
    max_length = length(x);

    X = randi(max_length-Nx,1); % random start of an Nx-sample noise segment
    x2 = x(X:X+Nx-1);
end

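function [y,fs] = add_noise_Reality(x,type,snr)

    Nx = length(x); % length of the signal x
    [noise,fs] = select_noice(type,Nx); % pick a noise segment of the same length

    % rescale the noise so that the mixture reaches the requested SNR
    signal_power = 1/Nx*sum(x.*x); % average power of the signal
    noise_power = 1/Nx*sum(noise.*noise); % average power of the noise
    noise_variance = signal_power / ( 10^(snr/10) ); % noise power required for the target SNR
    noise = sqrt(noise_variance/noise_power)*noise; % scale the noise to that power

    % the noise is assumed to be additive here
    y = x+noise; % noisy speech
end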
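
As a sanity check on the scaling used in add_noise_Reality, the same arithmetic can be run on synthetic signals and the resulting SNR measured. This is a sketch under stated assumptions: `x` and `noise` are sinusoidal stand-ins, not the speech and noise recordings used above, and `fs` and `target_snr` are arbitrary illustrative values.

```matlab
% Check of the SNR scaling on synthetic stand-ins for speech and noise.
fs = 8000;                                        % assumed sampling rate, illustration only
t = (0:fs-1)' / fs;
x = sin(2*pi*200*t);                              % stand-in for a speech signal
noise = cos(2*pi*1234*t) + 0.5*sin(2*pi*50*t);    % stand-in for a noise recording
target_snr = 5;                                   % requested SNR in dB

signal_power = mean(x.^2);                        % average power of the signal
noise_power = mean(noise.^2);                     % average power of the unscaled noise
noise_variance = signal_power / 10^(target_snr/10);   % noise power needed for the target SNR
scaled_noise = sqrt(noise_variance / noise_power) * noise;
y = x + scaled_noise;                             % additive mixture

% Measure the SNR actually obtained after scaling.
measured_snr = 10*log10(signal_power / mean(scaled_noise.^2));
fprintf('target %.2f dB, measured %.2f dB\n', target_snr, measured_snr);
```

The measured value matches the target up to rounding because the scale factor is derived directly from the two average powers.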

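Estimating the statistics

  • Effect of additive noise on the statistics of speech
    • The result is not simply the clean-speech statistics plus the noise statistics.
    • In principle, convolutional noise is additive in the cepstral domain (convolution → multiplication in the frequency domain → addition in the cepstral domain). Here the noise is additive, so this is not evident and the effect remains nonlinear.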
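
Why additive noise behaves nonlinearly in this domain can be seen from the log power spectrum. The relation below is the usual approximation, assuming speech and noise are uncorrelated so that the cross term averages out; X, N and Y denote the spectra of clean speech, noise and noisy speech, symbols introduced here for illustration.

```latex
\begin{aligned}
\lvert Y(\omega)\rvert^{2} &\approx \lvert X(\omega)\rvert^{2} + \lvert N(\omega)\rvert^{2} \\
\log\lvert Y(\omega)\rvert^{2} &\approx \log\lvert X(\omega)\rvert^{2}
  + \log\!\left(1 + \frac{\lvert N(\omega)\rvert^{2}}{\lvert X(\omega)\rvert^{2}}\right)
\end{aligned}
```

The second term depends on the local SNR in each band and each frame, so the shift in the log spectrum, and hence in the cepstrum, is not a fixed offset that could be obtained by adding the noise statistics to the clean ones.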

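White noise

  • The mean and the variance drop markedly.
  • The higher-order statistics show no clear pattern.

Car noise

  • For the high-order coefficients the mean clearly shrinks toward 0, but for the 2nd dimension it is amplified.
  • The variance still decreases overall.

Cafe noise

  • The main effect is a clear downward trend in the variance.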
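
The four statistics compared in this section (mean, variance, skewness, kurtosis) can also be written out by hand, which makes explicit what the toolbox functions compute. The sketch below uses a toy vector rather than real MFCCs, and the skewness and excess kurtosis follow the uncorrected moment definitions, which to my knowledge are the defaults of skewness and kurtosis in the Statistics Toolbox.

```matlab
% Moment-based statistics on a toy vector (not real MFCC data).
x = [1 2 3 4 5];

mu = mean(x);                              % first moment
d = x - mu;                                % deviations from the mean
m2 = mean(d.^2);                           % second central moment (divides by N)
m3 = mean(d.^3);                           % third central moment
m4 = mean(d.^4);                           % fourth central moment

sample_var = sum(d.^2) / (numel(x) - 1);   % unbiased variance, same as var(x)
skew = m3 / m2^1.5;                        % 0 for a symmetric distribution
excess_kurt = m4 / m2^2 - 3;               % 0 for a Gaussian distribution

fprintf('mean %.2f, var %.2f, skewness %.2f, excess kurtosis %.2f\n', ...
    mu, sample_var, skew, excess_kurt);
```

For a coefficients-by-frames matrix the same expressions apply row by row, for example with mean(C, 2) in place of mean(x).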

MATLAB code:

%{

New topic: mainly a reproduction of 《基于特征参数归一化的鲁棒语音识别方法综述》
- compute the statistics

%}
clc; clear all; close all;
%% Hyperparameters
% MFCC configuration
MFCC_conf = 'E';
SNR = 20;
Channel_noise = 1;

%% Feature extraction on the training set
topTraindir = 'C:\Users\wangj\Documents\MATLAB\Experiment05\data'; % top-level training data folder

tic;

% clean_cep
SNR = 20;
save_memory_dim = 1;
[clean_cep,noise_cep,trainSpeakerData] = ...
    get_noise_MFCC_tmp(topTraindir,save_memory_dim,SNR,MFCC_conf);

MFCC = cell2mat(clean_cep); % coefficients x frames
MFCC_avg = mean(MFCC');
MFCC_cov = var(MFCC');
MFCC_skewness = skewness(MFCC');
MFCC_kurtosis = kurtosis(MFCC')-3; % excess kurtosis, 0 for a Gaussian

subplot 431;bar(MFCC_avg);xlabel('Mean');title('Clean signal');
subplot 434;bar(MFCC_cov);xlabel('Variance');
subplot 437;bar(MFCC_skewness);xlabel('Skewness');
subplot(4,3,10);bar(MFCC_kurtosis);xlabel('Kurtosis')

% noise_cep
MFCC = cell2mat(noise_cep);
MFCC_avg = mean(MFCC');
MFCC_cov = var(MFCC');
MFCC_skewness = skewness(MFCC');
MFCC_kurtosis = kurtosis(MFCC')-3;

subplot 432;bar(MFCC_avg);xlabel('Mean');title('Noise only');
subplot 435;bar(MFCC_cov);xlabel('Variance');
subplot 438;bar(MFCC_skewness);xlabel('Skewness');
subplot(4,3,11);bar(MFCC_kurtosis);xlabel('Kurtosis')

% noisy MFCC at SNR = 20
MFCC = cell2mat(trainSpeakerData);
MFCC_avg = mean(MFCC');
MFCC_cov = var(MFCC');
MFCC_skewness = skewness(MFCC');
MFCC_kurtosis = kurtosis(MFCC')-3;

subplot 433;bar(MFCC_avg);xlabel('Mean');title('Noisy signal (SNR=20dB)');
subplot 436;bar(MFCC_cov);xlabel('Variance');
subplot 439;bar(MFCC_skewness);xlabel('Skewness');
subplot(4,3,12);bar(MFCC_kurtosis);xlabel('Kurtosis')

set(gcf,'color','w');
toc;

%% Local functions

function [clean_cep,noise_cep,trainSpeakerData] = ...
        get_noise_MFCC_tmp(topTraindir,N,SNR,MFCC_conf)
    train_wavdir = get_wavdir(topTraindir,N);
    nTra_Speakers = size(train_wavdir,1);
    nTra_Channels = size(train_wavdir,2);

    trainSpeakerData = cell(nTra_Speakers,nTra_Channels);
    noise_cep = cell(nTra_Speakers, nTra_Channels);
    clean_cep = cell(nTra_Speakers, nTra_Channels);
    % fprintf('\nExtracting training-set features...\n\n');
    % tic ;
    for i = 1:nTra_Speakers
        for j = 1:nTra_Channels
            [x,fs] = readsph(train_wavdir{i,j});
            % qq = struct('tn',0.0075,'gz',0.00001);
            % y2 = v_vadsohn(x,fs,'a',qq);
            % x = x(y2==1);
            % [x,~] = Gnoisegen(x,SNR);
            clean_cep{i,j} = v_melcepst(x,fs,MFCC_conf)'; % clean speech
            type = 'cafe';
            [x,noise] = add_noise_Reality_tmp(x,type,SNR);
            noise_cep{i,j} = v_melcepst(noise,fs,MFCC_conf)'; % scaled noise alone
            trainSpeakerData{i,j} = v_melcepst(x,fs,MFCC_conf)'; % noisy speech
            % extract the speaker ID with a regular expression
            pat = '(?<=\\)[FM].{3}[0-9]';
            name = regexpi(train_wavdir{i,j},pat,'match');
            train_SpeakerID{i,j} = name{1};
            % fprintf('\nFeatures extracted for %s',name{1});
        end
    end
    % fprintf('\nFeature extraction finished');
    % toc;
end

function train_wavdir = get_wavdir(topTraindir,N)
%{
Input:
topTraindir: path of the top-level folder
Output:
train_wavdir: paths of the wav files, one row per speaker
%}
    nTra_Channels = 10;
    trainFolder = dir(topTraindir);
    trainFolder = trainFolder(3:end); % skip the first two entries (assumed to be '.' and '..')
    nTra_Speakers = size(trainFolder,1);
    train_wavdir = cell(nTra_Speakers, nTra_Channels);
    for i = 1:nTra_Speakers
        curTraindir = [topTraindir,'\',trainFolder(i).name]; % subfolder of the current speaker
        trainWav = dir([curTraindir,'\*.WAV']); % list the WAV files directly, no change needed
        for j = 1:nTra_Channels
            train_wavname = [curTraindir,'\',trainWav(j).name]; % path of the current wav file
            train_wavdir{i,j} = train_wavname;
        end
    end
    train_wavdir = train_wavdir(1:N,:); % keep only the first N speakers
end

function [x2,fs] = select_noice(type,Nx)
    % noise_path avoids shadowing the built-in dir function
    if type == "cafe"
        noise_path = 'C:\Users\wangj\Desktop\语音识别\different_noise\cafe.wav';
    elseif type == "car"
        noise_path = 'C:\Users\wangj\Desktop\语音识别\different_noise\car.wav';
    elseif type == "white"
        noise_path = 'C:\Users\wangj\Desktop\语音识别\different_noise\white.wav';
    end

    [x,fs] = readwav(noise_path);
    max_length = length(x);

    X = randi(max_length-Nx,1); % random start of an Nx-sample noise segment
    x2 = x(X:X+Nx-1);
end

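function [y,noise] = add_noise_Reality_tmp(x,type,snr)

    Nx = length(x); % length of the signal x
    [noise,~] = select_noice(type,Nx); % pick a noise segment of the same length

    % rescale the noise so that the mixture reaches the requested SNR
    signal_power = 1/Nx*sum(x.*x); % average power of the signal
    noise_power = 1/Nx*sum(noise.*noise); % average power of the noise
    noise_variance = signal_power / ( 10^(snr/10) ); % noise power required for the target SNR
    noise = sqrt(noise_variance/noise_power)*noise; % scale the noise to that power

    % the noise is assumed to be additive here
    y = x+noise; % noisy speech
end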

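Testing removal of the statistics

Goal: bring the normalized noisy-speech distribution as close as possible to the normalized clean-speech distribution.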
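
To make that goal concrete, here is a small sketch of mean and variance normalization on synthetic data. It assumes the distortion is a per-coefficient scale and offset (`gain` and `offset` are hypothetical), which is the ideal case for this kind of normalization; real additive noise is not an affine map, so in practice the curves only move closer together. The name `cmvn` is introduced here and is not a function from the scripts in this post.

```matlab
% Mean and variance normalization on synthetic features (ideal affine distortion).
n_coef = 13; n_frames = 400;
clean = sin((1:n_coef)' * (1:n_frames) / 7);   % stand-in features, coefficients x frames
gain = linspace(0.5, 1.5, n_coef)';            % hypothetical per-coefficient scale
offset = linspace(-1, 1, n_coef)';             % hypothetical per-coefficient shift
noisy = gain .* clean + offset;                % distorted features

% Zero mean and unit standard deviation for every coefficient (row).
cmvn = @(C) (C - mean(C, 2)) ./ std(C, 0, 2);

clean_norm = cmvn(clean);
noisy_norm = cmvn(noisy);

% Under an affine distortion the two normalized versions coincide.
max_gap = max(abs(clean_norm(:) - noisy_norm(:)));
row_std = std(noisy_norm, 0, 2);
fprintf('max gap after normalization: %g\n', max_gap);
```

Dividing by the standard deviation, rather than the variance, is what gives every coefficient unit variance after normalization.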

%{

New topic: mainly a reproduction of 《基于特征参数归一化的鲁棒语音识别方法综述》
- start with mean removal

%}
clc; clear all; close all;
%% Hyperparameters
% MFCC configuration
MFCC_conf = 'E';
SNR = 0;
Channel_noise = 1;

%% Feature extraction on the training set
topTraindir = 'C:\Users\wangj\Documents\MATLAB\Experiment05\data'; % top-level training data folder

tic;
k = 2; % row 2 of the feature matrix, taken as the 1st cepstral coefficient
% clean_cep
SNR = 20; % overrides the value set above
save_memory_dim = 1;
[clean_cep,noise_cep,trainSpeakerData] = ...
    get_noise_MFCC_tmp(topTraindir,save_memory_dim,SNR,MFCC_conf);

% mean removed (CMS)
[pdf_by_SNR_1,xi_1] = plot_pdf_2D(clean_cep,k,'w');
[pdf_by_SNR_2,xi_2] = plot_pdf_2D(noise_cep,k,'w');
[pdf_by_SNR_3,xi_3] = plot_pdf_2D(trainSpeakerData,k,'w');

subplot 312;
plot(xi_1,pdf_by_SNR_1,'k');hold on;
plot(xi_2,pdf_by_SNR_2,'b-.');
plot(xi_3,pdf_by_SNR_3,'r');
legend('Clean speech (normalized)','Noise (normalized)','Noisy speech (normalized)','location','northwest')
xlabel('Mean removed (CMS)');
xlim([-5,5]);ylim([0,1.5]);

% no mean removal
[pdf_by_SNR_1,xi_1] = plot_pdf_2D(clean_cep,k,'f');
[pdf_by_SNR_2,xi_2] = plot_pdf_2D(noise_cep,k,'f');
[pdf_by_SNR_3,xi_3] = plot_pdf_2D(trainSpeakerData,k,'f');

subplot 311;
plot(xi_1,pdf_by_SNR_1,'k');hold on;
plot(xi_2,pdf_by_SNR_2,'b-.');
plot(xi_3,pdf_by_SNR_3,'r');
legend('Clean speech (normalized)','Noise (normalized)','Noisy speech (normalized)','location','northwest')
xlabel('No mean removal');
xlim([-5,5]);ylim([0,1.5]);

% mean and variance removed
[pdf_by_SNR_1,xi_1] = plot_pdf_2D(clean_cep,k,'v');
[pdf_by_SNR_2,xi_2] = plot_pdf_2D(noise_cep,k,'v');
[pdf_by_SNR_3,xi_3] = plot_pdf_2D(trainSpeakerData,k,'v');


subplot 313;
plot(xi_1,pdf_by_SNR_1,'k');hold on;
plot(xi_2,pdf_by_SNR_2,'b-.');
plot(xi_3,pdf_by_SNR_3,'r');
legend('Clean speech (normalized)','Noise (normalized)','Noisy speech (normalized)','location','northwest')
xlabel('Variance removed (CMVS)');
xlim([-5,5]);ylim([0,1.5]);
set(gcf,'color','w');

%% Local functions
function [clean_cep,noise_cep,trainSpeakerData] = ...
        get_noise_MFCC_tmp(topTraindir,N,SNR,MFCC_conf)
    train_wavdir = get_wavdir(topTraindir,N);
    nTra_Speakers = size(train_wavdir,1);
    nTra_Channels = size(train_wavdir,2);

    trainSpeakerData = cell(nTra_Speakers,nTra_Channels);
    noise_cep = cell(nTra_Speakers, nTra_Channels);
    clean_cep = cell(nTra_Speakers, nTra_Channels);
    % fprintf('\nExtracting training-set features...\n\n');
    % tic ;
    for i = 1:nTra_Speakers
        for j = 1:nTra_Channels
            [x,fs] = readsph(train_wavdir{i,j});
            % qq = struct('tn',0.0075,'gz',0.00001);
            % y2 = v_vadsohn(x,fs,'a',qq);
            % x = x(y2==1);
            % [x,~] = Gnoisegen(x,SNR);
            clean_cep{i,j} = v_melcepst(x,fs,MFCC_conf)'; % clean speech
            type = 'car';
            [x,noise] = add_noise_Reality_tmp(x,type,SNR);
            noise_cep{i,j} = v_melcepst(noise,fs,MFCC_conf)'; % scaled noise alone
            trainSpeakerData{i,j} = v_melcepst(x,fs,MFCC_conf)'; % noisy speech
            % extract the speaker ID with a regular expression
            pat = '(?<=\\)[FM].{3}[0-9]';
            name = regexpi(train_wavdir{i,j},pat,'match');
            train_SpeakerID{i,j} = name{1};
            % fprintf('\nFeatures extracted for %s',name{1});
        end
    end
    % fprintf('\nFeature extraction finished');
    % toc;
end

function train_wavdir = get_wavdir(topTraindir,N)
%{
Input:
topTraindir: path of the top-level folder
Output:
train_wavdir: paths of the wav files, one row per speaker
%}
    nTra_Channels = 10;
    trainFolder = dir(topTraindir);
    trainFolder = trainFolder(3:end); % skip the first two entries (assumed to be '.' and '..')
    nTra_Speakers = size(trainFolder,1);
    train_wavdir = cell(nTra_Speakers, nTra_Channels);
    for i = 1:nTra_Speakers
        curTraindir = [topTraindir,'\',trainFolder(i).name]; % subfolder of the current speaker
        trainWav = dir([curTraindir,'\*.WAV']); % list the WAV files directly, no change needed
        for j = 1:nTra_Channels
            train_wavname = [curTraindir,'\',trainWav(j).name]; % path of the current wav file
            train_wavdir{i,j} = train_wavname;
        end
    end
    train_wavdir = train_wavdir(1:N,:); % keep only the first N speakers
end

function [x2,fs] = select_noice(type,Nx)
    % noise_path avoids shadowing the built-in dir function
    if type == "cafe"
        noise_path = 'C:\Users\wangj\Desktop\语音识别\different_noise\cafe.wav';
    elseif type == "car"
        noise_path = 'C:\Users\wangj\Desktop\语音识别\different_noise\car.wav';
    elseif type == "white"
        noise_path = 'C:\Users\wangj\Desktop\语音识别\different_noise\white.wav';
    end

    [x,fs] = readwav(noise_path);
    max_length = length(x);

    X = randi(max_length-Nx,1); % random start of an Nx-sample noise segment
    x2 = x(X:X+Nx-1);
end

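function [y,noise] = add_noise_Reality_tmp(x,type,snr)

    Nx = length(x); % length of the signal x
    [noise,~] = select_noice(type,Nx); % pick a noise segment of the same length

    % rescale the noise so that the mixture reaches the requested SNR
    signal_power = 1/Nx*sum(x.*x); % average power of the signal
    noise_power = 1/Nx*sum(noise.*noise); % average power of the noise
    noise_variance = signal_power / ( 10^(snr/10) ); % noise power required for the target SNR
    noise = sqrt(noise_variance/noise_power)*noise; % scale the noise to that power

    % the noise is assumed to be additive here
    y = x+noise; % noisy speech
end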

function [pdf_by_SNR,xi] = plot_pdf_2D(trainSpeakerData,k,flag)
    MFCC = cell2mat(trainSpeakerData); % coefficients x frames
    if flag == "w" % mean removal (CMS)
        MFCC_avg = mean(MFCC');
        MFCC_sub = MFCC' - MFCC_avg;
    elseif flag == "f" % features left as they are
        MFCC_sub = MFCC';
    elseif flag == "v" % mean and variance normalization
        MFCC_avg = mean(MFCC');
        MFCC_sub = MFCC' - MFCC_avg;
        MFCC_sub = MFCC_sub./std(MFCC'); % divide by the standard deviation for unit variance
    end

    % MFCC_sub is frames x coefficients, so coefficient k over all frames is column k
    MFCC_by_SNR = MFCC_sub(:,k);
    [pdf_by_SNR,xi] = ksdensity(MFCC_by_SNR);
end