单韦乔 / Toy-MT-Introduction
Commit e1a8ace5 authored Jan 02, 2020 by xiaotong
binarized trees
parent c5d54017
Showing 2 changed files with 253 additions and 8 deletions
Section04-Phrasal-and-Syntactic-Models/section04-test.tex (+84, -8)
Section04-Phrasal-and-Syntactic-Models/section04.tex (+169, -0)
Section04-Phrasal-and-Syntactic-Models/section04-test.tex @ e1a8ace5
...
...
@@ -181,7 +181,7 @@
 \end{center}
 \begin{itemize}
-\item <2-> A concrete example
+\item <2-> An example
 \end{itemize}
 \vspace{-1.0em}
...
...
@@ -206,16 +206,16 @@
 \node [anchor=north] (tw3) at ([yshift=-2em]sw3.south) {Trump};
 \draw [-,dashed] (sw1.south) -- (tw1.north);
-\draw [-] (sw2.south) -- (tw2.north);
-\draw [-] (sw3.south) -- (tw3.north);
-\draw [-] (sw4.south) -- (tw3.north);
+\draw [-,dashed] (sw2.south) -- (tw2.north);
+\draw [-,dashed] (sw3.south) -- (tw3.north);
+\draw [-,dashed] (sw4.south) -- (tw3.north);
-\node [anchor=west] (rulelabel1) at ([xshift=1in,yshift=0em]n1.east) {\footnotesize{\textbf{Extracted rules:}}};
+\node [anchor=west] (rulelabel1) at ([xshift=1in,yshift=0.3em]n1.east) {\footnotesize{\textbf{Extracted rules:}}};
 \node [anchor=north west] (rule1) at (rulelabel1.south west) {NP(NNP$_1$ NN$_2$ NN(唐纳德) NN(特朗普))};
 \node [anchor=north west] (rule1t) at ([yshift=0.2em]rule1.south west) {$\to$ NNP$_1$ NN$_2$ Trump};
 \node [anchor=north west] (rule2) at (rule1t.south west) {NP(NNP$_1$ NN(总统) NN(唐纳德) NN(特朗普))};
 \node [anchor=north west] (rule2t) at ([yshift=0.2em]rule2.south west) {$\to$ NNP$_1$ President Trump};
-\node [anchor=north west] (rulelabel2) at (rule2t.south west) {\footnotesize{\textbf{Rules that \alert{cannot} be extracted:}}};
+\node [anchor=north west] (rulelabel2) at ([yshift=-0.3em]rule2t.south west) {\footnotesize{\textbf{Rules that \alert{cannot} be extracted:}}};
 \node [anchor=north west] (rule3) at (rulelabel2.south west) {NP(NN(唐纳德) NN(特朗普)) $\to$ Trump};
 \end{scope}
...
...
@@ -231,11 +231,87 @@
%%% tree binarization (cont.)
\begin{frame}{More Rules - Syntax Tree Binarization (cont.)}
\begin{itemize}
\item Parse trees produced by a syntactic parser can be very flat, which makes the extracted rules ``large'' and impossible to decompose further
\item One remedy is to binarize the tree, which makes its structure deeper
\vspace{-1.5em}
\begin{center}
\begin{tikzpicture}
{\scriptsize
\begin{scope}[sibling distance=4pt, level distance=25pt]
\Tree[.\node(n1){NP};
       [.NNP \node(sw1){美国}; ]
       [.NN \node(sw2){总统}; ]
       [.NN \node(sw3){唐纳德}; ]
       [.NN \node(sw4){特朗普}; ]
     ]
\node [anchor=north] (tw1) at ([yshift=-2em]sw1.south) {U.S.};
\node [anchor=north] (tw2) at ([yshift=-2em]sw2.south) {President};
\node [anchor=north] (tw3) at ([yshift=-2em]sw3.south) {Trump};
\draw [-,dashed] (sw1.south) -- (tw1.north);
\draw [-,dashed] (sw2.south) -- (tw2.north);
\draw [-,dashed] (sw3.south) -- (tw3.north);
\draw [-,dashed] (sw4.south) -- (tw3.north);
\draw [->,very thick] ([xshift=1em]sw4.east) -- ([xshift=5em]sw4.east) node [pos=0.5,above] {\tiny{binarization}};
\end{scope}
\begin{scope}[xshift=2.2in,sibling distance=10pt, level distance=15pt]
\Tree[.\node(n1){NP};
       [.NNP \node(sw1){美国}; ]
       [.NP-BAR
         [.NN \node(sw2){总统}; ]
         [.NP-BAR
           [.NN \node(sw3){唐纳德}; ]
           [.NN \node(sw4){特朗普}; ]
         ]
       ]
     ]
\node [anchor=north] (tw1) at ([yshift=-4.5em]sw1.south) {U.S.};
\node [anchor=north] (tw2) at ([yshift=-2.75em]sw2.south) {President};
\node [anchor=north] (tw3) at ([yshift=-1em]sw3.south) {Trump};
\draw [-,dashed] (sw1.south) -- (tw1.north);
\draw [-,dashed] (sw2.south) -- (tw2.north);
\draw [-,dashed] (sw3.south) -- (tw3.north);
\draw [-,dashed] (sw4.south) -- (tw3.north);
\end{scope}
}
\end{tikzpicture}
\end{center}
\visible<2->{
\small{Binarization introduces more admissible nodes, which in turn yields new rules}
\begin{center}
{\footnotesize
\vspace{0.3em}
NP-BAR(NN(唐纳德) NN(特朗普)) $\to$ Trump\\
\vspace{0.3em}
NP-BAR(NN$_1$ NP-BAR$_2$) $\to$ NN$_1$ NP-BAR$_2$
\vspace{0.3em}
}
\end{center}
}
\item <3-> Tree binarization has become a standard technique in syntax-based machine translation models
\begin{itemize}
\item For example, very wide subtrees are common in CTB
\item Many strategies exist: left-heavy, right-heavy, head-driven, and so on
\item Binarization yields more (fine-grained) rules, helping ensure rule coverage
\end{itemize}
\end{itemize}
\end{frame}
%%%------------------------------------------------------------------------------------------------------------
...
...
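The right-hand tree in the frame above can be reproduced with a small right-heavy binarization routine. The Python sketch below is illustrative only and is not part of this commit: the `Node` class and the `binarize_right` and `show` helpers are hypothetical names, and right-heavy folding is just one of the strategies the slide lists (left-heavy and head-driven variants would differ in which children are grouped first).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A parse-tree node; leaves carry a word and have no children."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: str = ""


def binarize_right(node: Node) -> Node:
    """Right-heavy binarization: repeatedly fold the two rightmost
    children under a new LABEL-BAR node until at most two remain."""
    kids = [binarize_right(child) for child in node.children]
    while len(kids) > 2:
        virtual = Node(node.label + "-BAR", kids[-2:])
        kids = kids[:-2] + [virtual]
    return Node(node.label, kids, node.word)


def show(node: Node) -> str:
    """Render a tree in the bracketed notation used on the slides."""
    if not node.children:
        return f"{node.label}({node.word})"
    return f"{node.label}(" + " ".join(show(c) for c in node.children) + ")"


# The flat NP from the slide: 美国 总统 唐纳德 特朗普
flat = Node("NP", [
    Node("NNP", word="美国"),
    Node("NN", word="总统"),
    Node("NN", word="唐纳德"),
    Node("NN", word="特朗普"),
])

print(show(binarize_right(flat)))
# NP(NNP(美国) NP-BAR(NN(总统) NP-BAR(NN(唐纳德) NN(特朗普))))
```

The two inserted NP-BAR nodes are exactly the ones drawn in the binarized tree; the original flat tree is left unmodified because the function builds new nodes.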
Section04-Phrasal-and-Syntactic-Models/section04.tex @ e1a8ace5
...
...
@@ -3823,6 +3823,175 @@ VP(P(对) NP(NN(局势)) VP$_1$) $\to$ VP$_1$ about the situation \\
\end{frame}
%%%------------------------------------------------------------------------------------------------------------
%%% tree binarization
\begin{frame}{More Rules - Syntax Tree Binarization}
\begin{itemize}
\item Parse trees produced by a syntactic parser can be very flat, which makes the extracted rules ``large'' and impossible to decompose further
\begin{itemize}
\item For example, very wide subtrees are common in CTB
\end{itemize}
\end{itemize}
\begin{center}
\begin{tikzpicture}
{\scriptsize
\begin{scope}[scale = 0.9, sibling distance=20pt, level distance=30pt]
{\footnotesize
\Tree[.IP
       [.NP ]
       [.VP ]
       [., ]
       [.VP ]
       [., ]
       [.VP ]
       [., ]
       [.VP ]
       [.{.{\color{white}V}} ]
     ]
}
\end{scope}
}
\end{tikzpicture}
\end{center}
\begin{itemize}
\item <2-> An example
\end{itemize}
\vspace{-1.0em}
\begin{center}
\begin{tikzpicture}
\visible<2->{
{\scriptsize
\begin{scope}[sibling distance=4pt, level distance=25pt]
{\footnotesize
\Tree[.\node(n1){NP};
       [.NNP \node(sw1){美国}; ]
       [.NN \node(sw2){总统}; ]
       [.NN \node(sw3){唐纳德}; ]
       [.NN \node(sw4){特朗普}; ]
     ]
}
\node [anchor=north] (tw1) at ([yshift=-2em]sw1.south) {U.S.};
\node [anchor=north] (tw2) at ([yshift=-2em]sw2.south) {President};
\node [anchor=north] (tw3) at ([yshift=-2em]sw3.south) {Trump};
\draw [-,dashed] (sw1.south) -- (tw1.north);
\draw [-,dashed] (sw2.south) -- (tw2.north);
\draw [-,dashed] (sw3.south) -- (tw3.north);
\draw [-,dashed] (sw4.south) -- (tw3.north);
\node [anchor=west] (rulelabel1) at ([xshift=1in,yshift=0.3em]n1.east) {\footnotesize{\textbf{Extracted rules:}}};
\node [anchor=north west] (rule1) at (rulelabel1.south west) {NP(NNP$_1$ NN$_2$ NN(唐纳德) NN(特朗普))};
\node [anchor=north west] (rule1t) at ([yshift=0.2em]rule1.south west) {$\to$ NNP$_1$ NN$_2$ Trump};
\node [anchor=north west] (rule2) at (rule1t.south west) {NP(NNP$_1$ NN(总统) NN(唐纳德) NN(特朗普))};
\node [anchor=north west] (rule2t) at ([yshift=0.2em]rule2.south west) {$\to$ NNP$_1$ President Trump};
\node [anchor=north west] (rulelabel2) at ([yshift=-0.3em]rule2t.south west) {\footnotesize{\textbf{Rules that \alert{cannot} be extracted:}}};
\node [anchor=north west] (rule3) at (rulelabel2.south west) {NP(NN(唐纳德) NN(特朗普)) $\to$ Trump};
\end{scope}
}
}
\end{tikzpicture}
\end{center}
\end{frame}
%%%------------------------------------------------------------------------------------------------------------
%%% tree binarization (cont.)
\begin{frame}{More Rules - Syntax Tree Binarization (cont.)}
\begin{itemize}
\item One remedy is to binarize the tree, which makes its structure deeper
\vspace{-1.5em}
\begin{center}
\begin{tikzpicture}
{\scriptsize
\begin{scope}[sibling distance=4pt, level distance=25pt]
\Tree[.\node(n1){NP};
       [.NNP \node(sw1){美国}; ]
       [.NN \node(sw2){总统}; ]
       [.NN \node(sw3){唐纳德}; ]
       [.NN \node(sw4){特朗普}; ]
     ]
\node [anchor=north] (tw1) at ([yshift=-2em]sw1.south) {U.S.};
\node [anchor=north] (tw2) at ([yshift=-2em]sw2.south) {President};
\node [anchor=north] (tw3) at ([yshift=-2em]sw3.south) {Trump};
\draw [-,dashed] (sw1.south) -- (tw1.north);
\draw [-,dashed] (sw2.south) -- (tw2.north);
\draw [-,dashed] (sw3.south) -- (tw3.north);
\draw [-,dashed] (sw4.south) -- (tw3.north);
\draw [->,very thick] ([xshift=1em]sw4.east) -- ([xshift=5em]sw4.east) node [pos=0.5,above] {\tiny{binarization}};
\end{scope}
\begin{scope}[xshift=2.2in,sibling distance=10pt, level distance=15pt]
\Tree[.\node(n1){NP};
       [.NNP \node(sw1){美国}; ]
       [.NP-BAR
         [.NN \node(sw2){总统}; ]
         [.NP-BAR
           [.NN \node(sw3){唐纳德}; ]
           [.NN \node(sw4){特朗普}; ]
         ]
       ]
     ]
\node [anchor=north] (tw1) at ([yshift=-4.5em]sw1.south) {U.S.};
\node [anchor=north] (tw2) at ([yshift=-2.75em]sw2.south) {President};
\node [anchor=north] (tw3) at ([yshift=-1em]sw3.south) {Trump};
\draw [-,dashed] (sw1.south) -- (tw1.north);
\draw [-,dashed] (sw2.south) -- (tw2.north);
\draw [-,dashed] (sw3.south) -- (tw3.north);
\draw [-,dashed] (sw4.south) -- (tw3.north);
\end{scope}
}
\end{tikzpicture}
\end{center}
\visible<2->{
\small{Binarization introduces more admissible nodes, which in turn yields new rules}
\begin{center}
{\footnotesize
\vspace{0.3em}
NP-BAR(NN(唐纳德) NN(特朗普)) $\to$ Trump\\
\vspace{0.3em}
NP-BAR(NN$_1$ NP-BAR$_2$) $\to$ NN$_1$ NP-BAR$_2$
\vspace{0.3em}
}
\end{center}
}
\item <3-> Tree binarization has become a standard technique in syntax-based machine translation models
\begin{itemize}
\item Many strategies exist: left-heavy, right-heavy, head-driven, and so on
\item Binarization yields more (fine-grained) rules, helping ensure rule coverage
\end{itemize}
\end{itemize}
\end{frame}
%%%------------------------------------------------------------------------------------------------------------
%%% tree-based translation grammars - tree-to-tree
...
...
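The slides state that binarization adds admissible nodes and therefore new rules, without spelling out the check. The Python sketch below illustrates one simplified, GHKM-style span-consistency test under stated assumptions: the word alignment is read off the dashed lines in the figure (美国-U.S., 总统-President, 唐纳德-Trump, 特朗普-Trump), trees are written as nested tuples, unaligned spans are treated as not admissible, and all function names are hypothetical rather than taken from the course material.

```python
# (source index, target index) pairs, assumed from the dashed lines in the slide.
ALIGNMENT = [(0, 0), (1, 1), (2, 2), (3, 2)]

FLAT = ("NP", ("NNP", "美国"), ("NN", "总统"), ("NN", "唐纳德"), ("NN", "特朗普"))
BINARIZED = ("NP", ("NNP", "美国"),
             ("NP-BAR", ("NN", "总统"),
              ("NP-BAR", ("NN", "唐纳德"), ("NN", "特朗普"))))


def node_spans(tree, start=0, out=None):
    """Collect (label, (i, j)) source spans for every node, in post-order."""
    out = [] if out is None else out
    label, rest = tree[0], tree[1:]
    if len(rest) == 1 and isinstance(rest[0], str):  # leaf: (tag, word)
        end = start + 1
    else:
        end = start
        for child in rest:
            end, _ = node_spans(child, end, out)
    out.append((label, (start, end)))
    return end, out


def is_admissible(span, alignment):
    """A source span [i, j) passes if every target word inside its
    target range is aligned only to source words inside the span."""
    i, j = span
    targets = {t for s, t in alignment if i <= s < j}
    if not targets:
        return False  # simplification: unaligned spans are rejected
    lo, hi = min(targets), max(targets)
    return all(i <= s < j for s, t in alignment if lo <= t <= hi)


def admissible_nodes(tree, alignment):
    """Return the (label, span) pairs of nodes that pass the check."""
    _, spans = node_spans(tree)
    return [(label, span) for label, span in spans if is_admissible(span, alignment)]


print(admissible_nodes(FLAT, ALIGNMENT))
print(admissible_nodes(BINARIZED, ALIGNMENT))
```

Under these assumptions, the flat tree has no node covering only 唐纳德 特朗普 (source span 2 to 4), so NP(NN(唐纳德) NN(特朗普)) $\to$ Trump has nothing to attach to. The binarized tree supplies an NP-BAR node over that span, which passes the check and licenses the new rule shown on the slide.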