NiuTrans / mtbookv2
Commit ed87fd41, authored Jan 15, 2021 by 曹润柘
update
parent
dd755fc1
Showing 3 changed files with 109 additions and 75 deletions
Chapter17/Figures/figure-an-end-to-end-voice-translation-model-based-on-transformer.tex
+52
-35
Chapter17/Figures/figure-speech-translation-model-based-on-CTC.tex
+56
-39
Chapter17/chapter17.tex
+1
-1
Chapter17/Figures/figure-an-end-to-end-voice-translation-model-based-on-transformer.tex
View file @ ed87fd41
\begin{tikzpicture}
\tikzstyle
{
layer
}
=[draw,rounded corners=2pt,font=
\scriptsize
,align=center,minimum width=
5
em]
\tikzstyle
{
layer
}
=[draw,rounded corners=2pt,font=
\scriptsize
,align=center,minimum width=
7.1
em]
\tikzstyle
{
word
}
=[font=
\scriptsize
]
%%%%encoder
\node
[layer,fill=red!20]
(en
_
sa) at (0,0)
{
Multi-Head
\\
Attention
}
;
\node
[layer,anchor=south,fill=green!20]
(en
_
ffn) at ([yshift=1.4em]en
_
sa.north)
{
Feed Forward
\\
Network
}
;
\node
[draw,circle,inner sep=0pt, minimum size=1em,anchor=north]
(en
_
add) at ([yshift=-1.4em]en
_
sa.south)
{}
;
\draw
[]
(en
_
add.90) -- (en
_
add.-90);
\draw
[]
(en
_
add.0) -- (en
_
add.180);
\node
[layer,anchor=north,fill=yellow!20]
(en
_
cnn) at ([yshift=-1.4em]en
_
add.south)
{
CNN
}
;
\node
[anchor=south,layer,fill=yellow!20]
(en
_
add1) at ([yshift=1.0em]en
_
sa.north)
{
Add
\&
LayerNorm
}
;
\node
[layer,anchor=south,fill=green!20]
(en
_
ffn) at ([yshift=1.0em]en
_
add1.north)
{
Feed Forward
\\
Network
}
;
\node
[anchor=south,layer,fill=yellow!20]
(en
_
add2) at ([yshift=1.0em]en
_
ffn.north)
{
Add
\&
LayerNorm
}
;
\node
[draw,circle,inner sep=0pt, minimum size=1em,anchor=north,thick]
(en
_
add) at ([yshift=-1.4em]en
_
sa.south)
{}
;
\draw
[thick]
(en
_
add.90) -- (en
_
add.-90);
\draw
[thick]
(en
_
add.0) -- (en
_
add.180);
\node
[layer,anchor=north,fill=yellow!20]
(en
_
cnn) at ([yshift=-1.0em]en
_
add.south)
{
CNN
}
;
\node
[anchor=east,font=\scriptsize,align=center]
(en
_
pos) at ([xshift=-2em]en
_
add.west)
{
Positional encoding
}
;
\node
[anchor=north,font=\scriptsize,align=center]
(en
_
input) at ([yshift=-1em]en
_
cnn.south)
{
Source-language speech features
\\
(FBank/MFCC)
}
;
\draw
[->,thick]
(en
_
input.90) -- ([yshift=-0.1em]en
_
cnn.-90);
\draw
[->,thick]
([yshift=0.1em]en
_
cnn.90) -- ([yshift=-0.1em]en
_
add.-90);
\draw
[->,thick]
([yshift=0.1em]en
_
add.90) -- ([yshift=-0.1em]en
_
sa.-90);
\draw
[->,thick]
([yshift=0.1em]en
_
sa.90) -- ([yshift=-0.1em]en
_
add1.-90);
\draw
[->,thick]
([yshift=0.1em]en
_
add1.90) -- ([yshift=-0.1em]en
_
ffn.-90);
\draw
[->,thick]
([yshift=0.1em]en
_
ffn.90) --([yshift=-0.1em]en
_
add2.-90);
\draw
[->,rounded corners=2pt,thick]
([yshift=-0.6em]en
_
sa.south)--([yshift=-0.6em,xshift=-4.0em]en
_
sa.south)--([xshift=-0.43em]en
_
add1.west)--(en
_
add1.west);
\draw
[->,rounded corners=2pt,thick]
([yshift=-0.6em]en
_
ffn.south)--([yshift=-0.6em,xshift=-4.0em]en
_
ffn.south)--([xshift=-0.43em]en
_
add2.west)--(en
_
add2.west);
\node
[draw,circle,inner sep=0pt, minimum size=1em,anchor=west]
(de
_
add) at ([xshift=7em]en
_
add.east)
{}
;
\draw
[]
(de
_
add.90) -- (de
_
add.-90);
\draw
[]
(de
_
add.0) -- (de
_
add.180);
%%%%decoder
\node
[draw,circle,inner sep=0pt, minimum size=1em,anchor=west,thick]
(de
_
add) at ([xshift=9em]en
_
add.east)
{}
;
\draw
[thick]
(de
_
add.90) -- (de
_
add.-90);
\draw
[thick]
(de
_
add.0) -- (de
_
add.180);
\node
[layer,anchor=south,fill=red!20]
(de
_
sa) at ([yshift=1.4em]de
_
add.north)
{
Masked
\\
Multi-Head
\\
Attention
}
;
\node
[layer,anchor=south,fill=red!20]
(de
_
ca) at ([yshift=1.4em]de
_
sa.north)
{
Multi-Head
\\
Attention
}
;
\node
[layer,anchor=south,fill=green!20]
(de
_
ffn) at ([yshift=1.4em]de
_
ca.north)
{
Feed Forward
\\
Network
}
;
\node
[layer,anchor=south,fill=blue!20]
(sf) at ([yshift=1.6em]de
_
ffn.north)
{
Softmax
}
;
%\node[layer,anchor=south,fill=orange!20] (output) at ([yshift=1.4em]sf.north){STLoss};
\node
[anchor=north,font=\scriptsize,align=center]
(en
_
input) at ([yshift=-1em]en
_
cnn.south)
{
Source-language speech features
\\
(FBank/MFCC)
}
;
\node
[anchor=south,layer,fill=yellow!20]
(de
_
add1) at ([yshift=1.0em]de
_
sa.north)
{
Add
\&
LayerNorm
}
;
\node
[layer,anchor=south,fill=red!20]
(de
_
ca) at ([yshift=1.0em]de
_
add1.north)
{
Multi-Head
\\
Attention
}
;
\node
[anchor=south,layer,fill=yellow!20]
(de
_
add2) at ([yshift=1.0em]de
_
ca.north)
{
Add
\&
LayerNorm
}
;
\node
[layer,anchor=south,fill=green!20]
(de
_
ffn) at ([yshift=1.0em]de
_
add2.north)
{
Feed Forward
\\
Network
}
;
\node
[anchor=south,layer,fill=yellow!20]
(de
_
add3) at ([yshift=1.0em]de
_
ffn.north)
{
Add
\&
LayerNorm
}
;
\node
[layer,anchor=south,fill=blue!20]
(sf) at ([yshift=1.2em]de
_
add3.north)
{
Softmax
}
;
\node
[anchor=north,font=\scriptsize,align=center]
(de
_
input) at ([yshift=-1.1em]de
_
add.south)
{
Target-language text
\\
encoded representation
}
;
\node
[anchor=east,font=\scriptsize,align=center]
(en
_
pos) at ([xshift=-2em]en
_
add.west)
{
Positional encoding
}
;
\node
[anchor=west,font=\scriptsize,align=center]
(de
_
pos) at ([xshift=2em]de
_
add.east)
{
Positional encoding
}
;
\draw
[->]
(en
_
input.90) -- ([yshift=-0.1em]en
_
cnn.-90);
\draw
[->]
([yshift=0.1em]en
_
cnn.90) -- ([yshift=-0.1em]en
_
add.-90);
\draw
[->]
([yshift=0.1em]en
_
add.90) -- ([yshift=-0.1em]en
_
sa.-90);
\draw
[->]
([yshift=0.1em]en
_
sa.90) -- ([yshift=-0.1em]en
_
ffn.-90);
\draw
[->]
(de
_
input.90) -- ([yshift=-0.1em]de
_
add.-90);
\draw
[->]
([yshift=0.1em]de
_
add.90) -- ([yshift=-0.1em]de
_
sa.-90);
\draw
[->]
([yshift=0.1em]de
_
sa.90) -- ([yshift=-0.1em]de
_
ca.-90);
\draw
[->]
([yshift=0.1em]de
_
ca.90) -- ([yshift=-0.1em]de
_
ffn.-90);
\draw
[->]
([yshift=0.1em]de
_
ffn.90) -- ([yshift=-0.1em]sf.-90);
\draw
[->]
([yshift=0.1em]sf.90) -- ([yshift=1.5em]sf.90);
\draw
[->]
([xshift=0.1em]en
_
pos.0) -- ([xshift=-0.1em]en
_
add.180);
\draw
[->]
([xshift=-0.1em]de
_
pos.180) -- ([xshift=0.1em]de
_
add.0);
\draw
[->,rounded corners=2pt]
([yshift=0.1em]en
_
ffn.90) -- ([yshift=2em]en
_
ffn.90) -- ([xshift=4em,yshift=2em]en
_
ffn.90) -- ([xshift=-1.5em]de
_
ca.west) -- ([xshift=-0.1em]de
_
ca.west);
\draw
[->,thick]
(de
_
input.90) -- ([yshift=-0.1em]de
_
add.-90);
\draw
[->,thick]
([yshift=0.1em]de
_
add.90) -- ([yshift=-0.1em]de
_
sa.-90);
\draw
[->,thick]
([yshift=0.1em]de
_
sa.90) -- ([yshift=-0.1em]de
_
add1.-90);
\draw
[->,thick]
([yshift=0.1em]de
_
add1.90) -- ([yshift=-0.1em]de
_
ca.-90);
\draw
[->,thick]
([yshift=0.1em]de
_
ca.90) -- ([yshift=-0.1em]de
_
add2.-90);
\draw
[->,thick]
([yshift=0.1em]de
_
add2.90) -- ([yshift=-0.1em]de
_
ffn.-90);
\draw
[->,thick]
([yshift=0.1em]de
_
ffn.90) -- ([yshift=-0.1em]de
_
add3.-90);
\draw
[->,thick]
([yshift=0.1em]de
_
add3.90) -- ([yshift=-0.1em]sf.-90);
\draw
[->,thick]
([yshift=0.1em]sf.90) -- ([yshift=1.0em]sf.90);
\draw
[->,thick]
([xshift=0.1em]en
_
pos.0) -- ([xshift=-0.1em]en
_
add.180);
\draw
[->,thick]
([xshift=-0.1em]de
_
pos.180) -- ([xshift=0.1em]de
_
add.0);
\draw
[->,rounded corners=2pt,thick]
([yshift=-0.6em]de
_
sa.south)--([yshift=-0.6em,xshift=4.0em]de
_
sa.south)--([xshift=0.43em]de
_
add1.east)--(de
_
add1.east);
\draw
[->,rounded corners=2pt,thick]
([yshift=-0.6em]de
_
ca.south)--([yshift=-0.6em,xshift=4.0em]de
_
ca.south)--([xshift=0.43em]de
_
add2.east)--(de
_
add2.east);
\draw
[->,rounded corners=2pt,thick]
([yshift=-0.6em]de
_
ffn.south)--([yshift=-0.6em,xshift=4.0em]de
_
ffn.south)--([xshift=0.43em]de
_
add3.east)--(de
_
add3.east);
\draw
[->,rounded corners=2pt,thick]
([yshift=0.1em]en
_
add2.90) -- ([yshift=1.5em]en
_
add2.90) -- ([xshift=5.0em,yshift=1.5em]en
_
add2.90) -- ([xshift=-1.5em]de
_
ca.west) -- ([xshift=-0.1em]de
_
ca.west);
\begin{pgfonlayer}
{
background
}
\node
[draw=ugreen,rounded corners=2pt,inner xsep=6pt,inner ysep=8pt,dashed,thick
][fit=(en_sa)(en_ffn)]
{}
;
\node
[draw=red,rounded corners=2pt,inner xsep=6pt,inner ysep=8pt,dashed,thick
][fit=(de_sa)(de_ca)(de_ffn)]
{}
;
\node
[draw=ugreen,rounded corners=2pt,inner xsep=6pt,inner ysep=8pt,dashed,thick
,xshift=-0.2em,yshift=-0.2em][fit=(en_add1)(en_add2)(en_sa)(en_ffn)]
(box1)
{}
;
\node
[draw=red,rounded corners=2pt,inner xsep=6pt,inner ysep=8pt,dashed,thick
,xshift=0.2em,yshift=-0.2em][fit=(de_sa)(de_ca)(de_ffn)(de_add3)]
(box2)
{}
;
\end{pgfonlayer}
\node
[anchor=east,font=\scriptsize,text=ugreen]
at ([xshift=-0.1em]box1.west)
{$
N
\times
$}
;
\node
[anchor=west,font=\scriptsize,text=red]
at ([xshift=0.1em]box2.east)
{$
\times
N
$}
;
\node
[anchor=east,font=\scriptsize]
at ([xshift=-0.1em]en
_
cnn.west)
{$
2
\times
$}
;
\node
[anchor=east,font=\scriptsize,align=center,text=ugreen]
at ([xshift=-0.1em,yshift=3em]box1.west)
{
ST
\\
Encoder
}
;
\node
[anchor=west,font=\scriptsize,align=center,text=red]
at ([xshift=0.1em,yshift=5em]box2.east)
{
ST
\\
Decoder
}
;
\node
[anchor=east,font=\scriptsize,align=center,text=ugreen]
at ([xshift=-0.1em,yshift=3em]box1.west)
{
ST
\\
Encoder
}
;
\node
[anchor=west,font=\scriptsize,align=center,text=red]
at ([xshift=0.1em,yshift=5em]box2.east)
{
ST
\\
Decoder
}
;
\end{tikzpicture}
\ No newline at end of file
Chapter17/Figures/figure-speech-translation-model-based-on-CTC.tex
View file @ ed87fd41
\begin{tikzpicture}
\tikzstyle
{
layer
}
=[draw,rounded corners=2pt,font=
\scriptsize
,align=center,minimum width=
5
em]
\tikzstyle
{
layer
}
=[draw,rounded corners=2pt,font=
\scriptsize
,align=center,minimum width=
7.1
em]
\tikzstyle
{
word
}
=[font=
\scriptsize
]
%%%%encoder
\node
[layer,fill=red!20]
(en
_
sa) at (0,0)
{
Multi-Head
\\
Attention
}
;
\node
[layer,anchor=south,fill=green!20]
(en
_
ffn) at ([yshift=1.4em]en
_
sa.north)
{
Feed Forward
\\
Network
}
;
\node
[draw,circle,inner sep=0pt, minimum size=1em,anchor=north]
(en
_
add) at ([yshift=-1.4em]en
_
sa.south)
{}
;
\draw
[]
(en
_
add.90) -- (en
_
add.-90);
\draw
[]
(en
_
add.0) -- (en
_
add.180);
\node
[layer,anchor=north,fill=yellow!20]
(en
_
cnn) at ([yshift=-1.4em]en
_
add.south)
{
CNN
}
;
\node
[anchor=south,layer,fill=yellow!20]
(en
_
add1) at ([yshift=1.0em]en
_
sa.north)
{
Add
\&
LayerNorm
}
;
\node
[layer,anchor=south,fill=green!20]
(en
_
ffn) at ([yshift=1.0em]en
_
add1.north)
{
Feed Forward
\\
Network
}
;
\node
[anchor=south,layer,fill=yellow!20]
(en
_
add2) at ([yshift=1.0em]en
_
ffn.north)
{
Add
\&
LayerNorm
}
;
\node
[layer,anchor=south,fill=blue!20]
(en
_
sf) at ([yshift=2.4em]en
_
add2.north)
{
Softmax
}
;
\node
[layer,anchor=south,fill=orange!20]
(en
_
output) at ([yshift=1.0em]en
_
sf.north)
{
CTC Output
}
;
\node
[draw,circle,inner sep=0pt, minimum size=1em,anchor=north,thick]
(en
_
add) at ([yshift=-1.4em]en
_
sa.south)
{}
;
\draw
[thick]
(en
_
add.90) -- (en
_
add.-90);
\draw
[thick]
(en
_
add.0) -- (en
_
add.180);
\node
[layer,anchor=north,fill=yellow!20]
(en
_
cnn) at ([yshift=-1.0em]en
_
add.south)
{
CNN
}
;
\node
[anchor=east,font=\scriptsize,align=center]
(en
_
pos) at ([xshift=-2em]en
_
add.west)
{
Positional encoding
}
;
\node
[anchor=north,font=\scriptsize,align=center]
(en
_
input) at ([yshift=-1em]en
_
cnn.south)
{
Source-language speech features
\\
(FBank/MFCC)
}
;
\draw
[->,thick]
(en
_
input.90) -- ([yshift=-0.1em]en
_
cnn.-90);
\draw
[->,thick]
([yshift=0.1em]en
_
cnn.90) -- ([yshift=-0.1em]en
_
add.-90);
\draw
[->,thick]
([yshift=0.1em]en
_
add.90) -- ([yshift=-0.1em]en
_
sa.-90);
\draw
[->,thick]
([yshift=0.1em]en
_
sa.90) -- ([yshift=-0.1em]en
_
add1.-90);
\draw
[->,thick]
([yshift=0.1em]en
_
add1.90) -- ([yshift=-0.1em]en
_
ffn.-90);
\draw
[->,thick]
([yshift=0.1em]en
_
ffn.90) --([yshift=-0.1em]en
_
add2.-90);
\draw
[->,thick]
([yshift=0.1em]en
_
add2.90) -- ([yshift=-0.1em]en
_
sf.-90);
\draw
[->,thick]
([yshift=0.1em]en
_
sf.90) -- ([yshift=-0.1em]en
_
output.-90);
\draw
[->,rounded corners=2pt,thick]
([yshift=-0.6em]en
_
sa.south)--([yshift=-0.6em,xshift=-4.0em]en
_
sa.south)--([xshift=-0.43em]en
_
add1.west)--(en
_
add1.west);
\draw
[->,rounded corners=2pt,thick]
([yshift=-0.6em]en
_
ffn.south)--([yshift=-0.6em,xshift=-4.0em]en
_
ffn.south)--([xshift=-0.43em]en
_
add2.west)--(en
_
add2.west);
\node
[draw,circle,inner sep=0pt, minimum size=1em,anchor=west]
(de
_
add) at ([xshift=7em]en
_
add.east)
{}
;
\draw
[]
(de
_
add.90) -- (de
_
add.-90);
\draw
[]
(de
_
add.0) -- (de
_
add.180);
%%%%decoder
\node
[draw,circle,inner sep=0pt, minimum size=1em,anchor=west,thick]
(de
_
add) at ([xshift=9em]en
_
add.east)
{}
;
\draw
[thick]
(de
_
add.90) -- (de
_
add.-90);
\draw
[thick]
(de
_
add.0) -- (de
_
add.180);
\node
[layer,anchor=south,fill=red!20]
(de
_
sa) at ([yshift=1.4em]de
_
add.north)
{
Masked
\\
Multi-Head
\\
Attention
}
;
\node
[layer,anchor=south,fill=red!20]
(de
_
ca) at ([yshift=1.4em]de
_
sa.north)
{
Multi-Head
\\
Attention
}
;
\node
[layer,anchor=south,fill=green!20]
(de
_
ffn) at ([yshift=1.4em]de
_
ca.north)
{
Feed Forward
\\
Network
}
;
\node
[anchor=south,layer,fill=yellow!20]
(de
_
add1) at ([yshift=1.0em]de
_
sa.north)
{
Add
\&
LayerNorm
}
;
\node
[layer,anchor=south,fill=red!20]
(de
_
ca) at ([yshift=1.0em]de
_
add1.north)
{
Multi-Head
\\
Attention
}
;
\node
[anchor=south,layer,fill=yellow!20]
(de
_
add2) at ([yshift=1.0em]de
_
ca.north)
{
Add
\&
LayerNorm
}
;
\node
[layer,anchor=south,fill=green!20]
(de
_
ffn) at ([yshift=1.0em]de
_
add2.north)
{
Feed Forward
\\
Network
}
;
\node
[anchor=south,layer,fill=yellow!20]
(de
_
add3) at ([yshift=1.0em]de
_
ffn.north)
{
Add
\&
LayerNorm
}
;
\node
[layer,anchor=south,fill=blue!20]
(sf) at ([yshift=1.2em]de
_
add3.north)
{
Softmax
}
;
\node
[anchor=north,font=\scriptsize,align=center]
(de
_
input) at ([yshift=-1.1em]de
_
add.south)
{
Target-language text
\\
encoded representation
}
;
\node
[layer,anchor=south,fill=blue!20]
(en
_
sf) at ([yshift=3em]en
_
ffn.north)
{
Softmax
}
;
\node
[layer,anchor=south,fill=blue!20]
(sf) at ([yshift=2em]de
_
ffn.north)
{
Softmax
}
;
\node
[layer,anchor=south,fill=orange!20]
(en
_
output) at ([yshift=1.4em]en
_
sf.north)
{
CTC Output
}
;
%\node[layer,anchor=south,fill=orange!20] (output) at ([yshift=1.4em]sf.north){ST Output};
\node
[anchor=west,font=\scriptsize,align=center]
(de
_
pos) at ([xshift=2em]de
_
add.east)
{
Positional encoding
}
;
\node
[anchor=north,font=\scriptsize,align=center]
(en
_
input) at ([yshift=-1em]en
_
cnn.south)
{
Speech features
\\
(FBank/MFCC)
}
;
\node
[anchor=north,font=\scriptsize,align=center]
(de
_
input) at ([yshift=-1em]de
_
add.south)
{
Annotated text
\\
encoded representation
}
;
\draw
[->,thick]
(de
_
input.90) -- ([yshift=-0.1em]de
_
add.-90);
\draw
[->,thick]
([yshift=0.1em]de
_
add.90) -- ([yshift=-0.1em]de
_
sa.-90);
\draw
[->,thick]
([yshift=0.1em]de
_
sa.90) -- ([yshift=-0.1em]de
_
add1.-90);
\draw
[->,thick]
([yshift=0.1em]de
_
add1.90) -- ([yshift=-0.1em]de
_
ca.-90);
\draw
[->,thick]
([yshift=0.1em]de
_
ca.90) -- ([yshift=-0.1em]de
_
add2.-90);
\draw
[->,thick]
([yshift=0.1em]de
_
add2.90) -- ([yshift=-0.1em]de
_
ffn.-90);
\draw
[->,thick]
([yshift=0.1em]de
_
ffn.90) -- ([yshift=-0.1em]de
_
add3.-90);
\draw
[->,thick]
([yshift=0.1em]de
_
add3.90) -- ([yshift=-0.1em]sf.-90);
\draw
[->,thick]
([yshift=0.1em]sf.90) -- ([yshift=1.0em]sf.90);
\draw
[->,thick]
([xshift=0.1em]en
_
pos.0) -- ([xshift=-0.1em]en
_
add.180);
\draw
[->,thick]
([xshift=-0.1em]de
_
pos.180) -- ([xshift=0.1em]de
_
add.0);
\draw
[->,rounded corners=2pt,thick]
([yshift=-0.6em]de
_
sa.south)--([yshift=-0.6em,xshift=4.0em]de
_
sa.south)--([xshift=0.43em]de
_
add1.east)--(de
_
add1.east);
\draw
[->,rounded corners=2pt,thick]
([yshift=-0.6em]de
_
ca.south)--([yshift=-0.6em,xshift=4.0em]de
_
ca.south)--([xshift=0.43em]de
_
add2.east)--(de
_
add2.east);
\draw
[->,rounded corners=2pt,thick]
([yshift=-0.6em]de
_
ffn.south)--([yshift=-0.6em,xshift=4.0em]de
_
ffn.south)--([xshift=0.43em]de
_
add3.east)--(de
_
add3.east);
\draw
[->,rounded corners=2pt,thick]
([yshift=0.1em]en
_
add2.90) -- ([yshift=1.5em]en
_
add2.90) -- ([xshift=5.0em,yshift=1.5em]en
_
add2.90) -- ([xshift=-1.5em]de
_
ca.west) -- ([xshift=-0.1em]de
_
ca.west);
\node
[anchor=east,font=\scriptsize,align=center]
(en
_
pos) at ([xshift=-2em]en
_
add.west)
{
Positional encoding
}
;
\node
[anchor=west,font=\scriptsize,align=center]
(de
_
pos) at ([xshift=2em]de
_
add.east)
{
Positional encoding
}
;
\draw
[->]
(en
_
input.90) -- ([yshift=-0.1em]en
_
cnn.-90);
\draw
[->]
([yshift=0.1em]en
_
cnn.90) -- ([yshift=-0.1em]en
_
add.-90);
\draw
[->]
([yshift=0.1em]en
_
add.90) -- ([yshift=-0.1em]en
_
sa.-90);
\draw
[->]
([yshift=0.1em]en
_
sa.90) -- ([yshift=-0.1em]en
_
ffn.-90);
\draw
[->]
(de
_
input.90) -- ([yshift=-0.1em]de
_
add.-90);
\draw
[->]
([yshift=0.1em]de
_
add.90) -- ([yshift=-0.1em]de
_
sa.-90);
\draw
[->]
([yshift=0.1em]de
_
sa.90) -- ([yshift=-0.1em]de
_
ca.-90);
\draw
[->]
([yshift=0.1em]de
_
ca.90) -- ([yshift=-0.1em]de
_
ffn.-90);
\draw
[->]
([yshift=0.1em]en
_
ffn.90) -- ([yshift=-0.1em]en
_
sf.-90);
\draw
[->]
([yshift=0.1em]en
_
sf.90) -- ([yshift=-0.1em]en
_
output.-90);
\draw
[->]
([yshift=0.1em]de
_
ffn.90) -- ([yshift=-0.1em]sf.-90);
\draw
[->]
([yshift=0.1em]sf.90) -- ([yshift=1.5em]sf.90);
\draw
[->]
([xshift=0.1em]en
_
pos.0) -- ([xshift=-0.1em]en
_
add.180);
\draw
[->]
([xshift=-0.1em]de
_
pos.180) -- ([xshift=0.1em]de
_
add.0);
\draw
[->,rounded corners=2pt]
([yshift=2em]en
_
ffn.90) -- ([xshift=4em,yshift=2em]en
_
ffn.90) -- ([xshift=-1.5em]de
_
ca.west) -- ([xshift=-0.1em]de
_
ca.west);
\begin{pgfonlayer}
{
background
}
\node
[draw=ugreen,rounded corners=2pt,inner xsep=6pt,inner ysep=8pt,dashed,thick
][fit=(en_sa)(en_ffn)]
{}
;
\node
[draw=red,rounded corners=2pt,inner xsep=6pt,inner ysep=8pt,dashed,thick
][fit=(de_sa)(de_ca)(de_ffn)]
{}
;
\node
[draw=ugreen,rounded corners=2pt,inner xsep=6pt,inner ysep=8pt,dashed,thick
,xshift=-0.2em,yshift=-0.2em][fit=(en_add1)(en_add2)(en_sa)(en_ffn)]
(box1)
{}
;
\node
[draw=red,rounded corners=2pt,inner xsep=6pt,inner ysep=8pt,dashed,thick
,xshift=0.2em,yshift=-0.2em][fit=(de_sa)(de_ca)(de_ffn)(de_add3)]
(box2)
{}
;
\end{pgfonlayer}
\node
[anchor=east,font=\scriptsize,text=ugreen]
at ([xshift=-0.1em]box1.west)
{$
N
\times
$}
;
\node
[anchor=west,font=\scriptsize,text=red]
at ([xshift=0.1em]box2.east)
{$
\times
N
$}
;
\node
[anchor=east,font=\scriptsize]
at ([xshift=-0.1em]en
_
cnn.west)
{$
2
\times
$}
;
\node
[anchor=east,font=\scriptsize,align=center,text=ugreen]
at ([xshift=-0.1em,yshift=3em]box1.west)
{
ST
\\
Encoder
}
;
\node
[anchor=west,font=\scriptsize,align=center,text=red]
at ([xshift=0.1em,yshift=5em]box2.east)
{
ST
\\
Decoder
}
;
\node
[anchor=east,font=\scriptsize,align=center,text=ugreen]
at ([xshift=-0.1em,yshift=3em]box1.west)
{
ST
\\
Encoder
}
;
\node
[anchor=west,font=\scriptsize,align=center,text=red]
at ([xshift=0.1em,yshift=5em]box2.east)
{
ST
\\
Decoder
}
;
\end{tikzpicture}
\ No newline at end of file
Chapter17/chapter17.tex
View file @ ed87fd41
...
...
@@ -75,7 +75,7 @@
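\parinterval As the description above shows, audio is represented as a very long sequence of sample points, which makes it difficult to process audio sequences directly with existing deep learning techniques. Moreover, the raw audio signal may contain considerable noise, ambient sound, or redundant information, which can also interfere with the model. Therefore, the audio sequence is usually processed to extract acoustic features: the long sequence of sample points is converted into a short sequence of feature vectors, which is then passed to downstream systems. Although some work does not rely on feature extraction and performs acoustic modeling and model training directly on the raw sample sequence\upcite{DBLP:conf/interspeech/SainathWSWV15}, the mainstream approach is still to model on acoustic features\upcite{DBLP:conf/icassp/MohamedHP12}.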
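\parinterval The first step of acoustic feature extraction is preprocessing, which mainly consists of pre-emphasis, framing, and windowing of the audio. Pre-emphasis boosts the high-frequency part of the audio signal in order to make the spectrum smoother. Framing (illustrated in Figure \ref{fig:17-3}) is based on the short-time stationarity assumption: according to biological characteristics, the speech signal changes slowly, so a signal segment of 10ms$\thicksim$30ms is relatively stationary. Based on this assumption, every 25ms of signal is usually taken as one frame for feature extraction, and this duration is called the {\small\bfnew{Frame Length}}\index{Frame Length}. In addition, to keep the signal smooth across frames, every two adjacent frames overlap to some extent. A frame is usually taken every 10ms, and this interval is called the {\small\bfnew{Frame Shift}}\index{Frame Shift}. To mitigate the spectral leakage caused by framing, each frame is windowed so that its amplitude tapers toward 0 at both ends; the {\small\bfnew{Hamming}}\index{Hamming} window is the usual choice\upcite{洪青阳2020语音识别原理与应用}.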
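\parinterval The first step of acoustic feature extraction is preprocessing, which mainly consists of pre-emphasis, framing, and windowing of the audio. Pre-emphasis counteracts the suppression of high-frequency components in speech by boosting the high-frequency part of the audio signal, which makes the spectrum smoother. Framing (illustrated in Figure \ref{fig:17-3}) is based on the short-time stationarity assumption: according to biological characteristics, the speech signal changes slowly, so a signal segment of 10ms$\thicksim$30ms is relatively stationary. Based on this assumption, every 25ms of signal is usually taken as one frame for feature extraction, and this duration is called the {\small\bfnew{Frame Length}}\index{Frame Length}. In addition, to keep the signal smooth across frames, every two adjacent frames overlap to some extent. A frame is usually taken every 10ms, and this interval is called the {\small\bfnew{Frame Shift}}\index{Frame Shift}. To mitigate the spectral leakage caused by framing, each frame is windowed so that its amplitude tapers toward 0 at both ends; the {\small\bfnew{Hamming}}\index{Hamming} window is the usual choice\upcite{洪青阳2020语音识别原理与应用}.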
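The preprocessing steps described in this paragraph (pre-emphasis, framing with a 25ms frame length and 10ms frame shift, and Hamming windowing) can be sketched roughly as follows. This is only an illustrative sketch and not code from the book: the 16 kHz sample rate, the pre-emphasis coefficient 0.97, and the function names `pre_emphasis`, `hamming`, and `frame_and_window` are all assumptions made for the example.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is an assumed, commonly used value; the text gives none.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def hamming(length):
    """Hamming window w[n] = 0.54 - 0.46 * cos(2*pi*n / (length - 1)).

    The end points are 0.08, so the taper approaches but does not reach 0.
    """
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def frame_and_window(signal, sample_rate=16000, frame_ms=25, shift_ms=10):
    """Cut the signal into overlapping frames and window each one."""
    frame_len = sample_rate * frame_ms // 1000   # 25 ms -> 400 samples at 16 kHz
    shift = sample_rate * shift_ms // 1000       # 10 ms -> 160 samples at 16 kHz
    window = hamming(frame_len)
    frames = []
    start = 0
    # Only complete frames are kept; a trailing partial frame is dropped.
    while start + frame_len <= len(signal):
        chunk = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
        start += shift
    return frames


# Usage: one second of a 440 Hz tone sampled at the assumed 16 kHz.
sr = 16000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
frames = frame_and_window(pre_emphasis(tone), sample_rate=sr)
print(len(frames), len(frames[0]))
```

Because the frame shift is shorter than the frame length, adjacent frames share 15ms of signal, which is the overlap the paragraph refers to. Dropping the last partial frame is one possible design choice; padding it with zeros would be an equally reasonable alternative.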
%----------------------------------------------------------------------------------------------------
\begin{figure}
[htp]
\centering
...
...