Commit d664c0a0 by xuchen

1. add macro to implement unary function 2. add sub and div function 3. merge code with the latest branch of xiaotong-working
parent 7e9d7015
<script type="text/javascript" async src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML"> </script>
# NiuTrans.Tensor: A Tensor Computation Library
## NiuTrans.Tensor
## What is a tensor
In computer science, a tensor is usually defined as a quantity in an $n$-dimensional space with $n$ components; in essence it is a multidimensional array. The order (or rank) of a tensor is the number of dimensions of this array, or simply the number of indices needed to address one of its elements. By convention, a 0th-order tensor is a scalar, a 1st-order tensor is a vector, and a 2nd-order tensor is a matrix. For example, in three-dimensional space a 1st-order tensor is the vector $(x,y,z)$ representing a point, where $x$, $y$ and $z$ are its coordinates on the three axes.
Tensors are an efficient modeling tool that expresses complex problems in a unified, concise way. Suppose Jiang Yingjun needs 2 jin of beef and 5 jin of potatoes for dinner, and at the market beef costs 32 yuan per jin and potatoes 2 yuan per jin; the total cost is $2 \times 32 + 5 \times 2 = 74$ yuan. Described with tensors, a 1st-order tensor $a=(2,5)$ holds the amount of each food, another 1st-order tensor $b=(32,2)$ holds the prices, and a 0th-order tensor $c$ holds the total cost, computed as
$$
\begin{aligned}
c & = a \times b^T \\
& = \left(\begin{matrix}2 & 5\end{matrix}\right) \times \left(\begin{matrix}32 \\ 2\end{matrix}\right) \\
& = 2 \times 32 + 5 \times 2 \\
& = 74
\end{aligned}
$$
where $b^T$ is the transpose of the row vector $b$, i.e. a column vector, and $\times$ denotes vector multiplication. The next day Jiang Yingjun goes to another market, where beef costs 35 yuan per jin and potatoes 1 yuan per jin. To get the total cost at each of the two markets, redefine $b$ as the 2nd-order tensor $\left(\begin{matrix}32 & 2 \\ 35 & 1\end{matrix}\right)$ and the total cost $c$ as a 2nd-order tensor as well. Then
$$
\begin{aligned}
c & = a \times b^T \\
& = \left(\begin{matrix}2 & 5\end{matrix}\right) \times \left(\begin{matrix}32 & 35 \\ 2 & 1\end{matrix}\right) \\
& = \left(\begin{matrix}74 & 75\end{matrix}\right)
\end{aligned}
$$
That is, the two markets cost 74 and 75 yuan respectively. Tensors let us model varied and complex problems: the definitions of $a$, $b$ and $c$ above can be extended to higher-order tensors covering different days, markets and recipes, and the same formula $c = a \times b^T$ still describes the problem.
Many real-world problems can be written as tensor expressions, i.e. combinations of tensors and operations on them expressed as arithmetic expressions. This kind of modeling underlies modern neural networks and deep learning; in many machine-learning toolkits, tensor computation is the basic unit of forward and backward propagation and is used very widely.
## How to define a tensor
If you use C/C++ or Python, defining a tensor with NiuTrans.Tensor in your program is straightforward. First, download the NiuTrans.Tensor package and unpack it to any directory, for example ~/NTS. The NTS directory contains a source subdirectory that holds the source code, organized as follows:
* ~/NTS/source/XTensor.h - defines the tensor structure XTensor and the interfaces for creating and destroying it
* ~/NTS/source/core - source files with the declarations and implementations of tensor operations
    * arithmetic - source files for arithmetic operations
    * getandset - source files for getting and setting data
    * math - source files for mathematical operations
    * movement - source files for data movement
    * reduce - source files for reduction operations
    * shape - source files for shape transformation
    * sort - source files for sorting operations
* ~/NTS/source/function - source files for the various activation functions
* ~/NTS/source/test - source files for the unit tests
* ~/NTS/source/*.h(cpp) - not related to tensor definition; described later :)
}
```
Next, compile the program above; the compiler needs the directory that contains XTensor.h. For example, with g++:
```
g++ sample.cpp -I~/NTS/source -o sample
```
| Create a 4-D dense tensor | XTensor * NewTensor4D(<br>const int d0, const int d1, const int d2, const int d3, <br> const TENSOR_DATA_TYPE myDataType = X_FLOAT, <br>const int myDevID = -1, XMem * myMem = NULL) | d0 - size of dimension 1 <br> d1 - size of dimension 2 <br> d2 - size of dimension 3 <br> d3 - size of dimension 4 <br> myDataType - data type of the tensor <br> myDevID - ID of the device the tensor lives on <br> myMem - memory pool used by the tensor |
| Create a 5-D dense tensor | XTensor * NewTensor5D(<br>const int d0, const int d1, const int d2, <br> const int d3, const int d4, <br> const TENSOR_DATA_TYPE myDataType = X_FLOAT, <br>const int myDevID = -1, XMem * myMem = NULL) | d0 - size of dimension 1 <br> d1 - size of dimension 2 <br> d2 - size of dimension 3 <br> d3 - size of dimension 4 <br> d4 - size of dimension 5 <br> myDataType - data type of the tensor <br> myDevID - ID of the device the tensor lives on <br> myMem - memory pool used by the tensor |
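As a quick illustration (not part of the original manual), the sketch below creates a small 4-D dense tensor with NewTensor4D, using the default data type X_FLOAT and device ID -1 (CPU) listed above, and releases it with DelTensor:
```
/* create a 2 x 3 x 4 x 5 dense tensor on the CPU (devID = -1) */
XTensor * t = NewTensor4D(2, 3, 4, 5, X_FLOAT, -1, NULL);

/* ... use the tensor ... */

/* free the tensor and its data */
DelTensor(t);
```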
## Devices
## Accessing the contents of a tensor
In C/C++ the contents of a tensor are accessed through XTensor.h; including this single header file in a source program is all that is needed to define tensors.
| Member | Description |
| - | - |
| void * data | array holding the elements |
| int devID | device ID: the CPU or GPU device on which the tensor's memory is allocated; -1 means CPU |
| int order | number of dimensions; e.g. a matrix (2 dimensions) is a 2nd-order tensor |
| int dimSize[ ] | size of each dimension; index 0 is the first dimension |
| TENSOR_DATA_TYPE dataType | data type of each data unit |
| int unitSize | size of a data unit, similar to sizeof() |
| int unitNum | number of data units |
| bool isSparse | dense or sparse: an n * m dense matrix stores n * m values, while the storage of a sparse matrix depends on the number of non-zero elements |
| float denseRatio | density: the fraction of non-zero units, a real number between 0 and 1; 0 means all units are zero, 1 means all units are non-zero |
Some of the methods defined in XTensor.h are listed below; see the appendix for details:
| Operation | Function | Parameters |
| - | - | - |
| Check whether two tensors have the same data type and size | static bool IsIdentical(<br> XTensor * a, XTensor * b) | a - first tensor to compare <br> b - second tensor to compare |
| Check whether three tensors have the same data type and size | static bool IsIdentical(<br> XTensor * a, XTensor * b, XTensor * c) | a - first tensor <br> b - second tensor <br> c - third tensor |
| Set the size of each dimension | void SetDim(int * myDimSize) | myDimSize - size of each dimension |
| Get the size of a given dimension | int GetDim(const int dim) | dim - the dimension |
| Reshape the tensor | void Reshape(<br> const int order, const int * myDimSize) | order - number of dimensions <br> myDimSize - size of each dimension |
| Get the number of elements | int GetSize() | N/A |
| Get the data-unit size of a given data type | int GetUnitSize(<br> TENSOR_DATA_TYPE myDataType) | myDataType - the data type |
| Set all elements to 0 | void SetZeroAll(XStream * stream = NULL) | stream - multi-thread stream |
| Fill the tensor from an array | void SetData(<br> const void * d, int num, int beg = 0) | d - source array <br> num - array size <br> beg - position in the tensor at which filling starts |
| Initialize the tensor from a uniform distribution | void SetDataRand(<br> DTYPE lower, DTYPE upper) | lower - lower bound <br> upper - upper bound |
| Initialize the tensor from a normal distribution | void SetDataRandn(<br> DTYPE mean, DTYPE standardDeviation) | mean - mean <br> standardDeviation - standard deviation |
| Set the elements of a given dimension in ascending order | void SetAscendingOrder(int dim) | dim - the dimension |
| Get a value from a 2-D tensor | DTYPE Get2D(int ni, int mi = 0) | ni - row index <br> mi - column index |
| Set a cell of a 2-D tensor | bool Set2D(DTYPE value, int ni, int mi = 0) | value - cell value <br> ni - row index <br> mi - column index |
| Add to a cell of a 2-D tensor | bool Add2D(DTYPE value, int ni, int mi = 0) | value - value to add <br> ni - row index <br> mi - column index |
| Resize the tensor | bool Resize(<br> const int myOrder, <br> const int * myDimSize, <br> const TENSOR_DATA_TYPE myDataType = DEFAULT_DTYPE, <br> const float myDenseRatio = 1.0F) | myOrder - number of dimensions <br> myDimSize - size of each dimension, index 0 is the first dimension <br> myDataType - data type <br> myDenseRatio - density, 1 for a dense tensor |
| Resize the tensor to match another tensor | bool Resize(<br> const XTensor * myTensor) | myTensor - the reference tensor |
| Create a new tensor from a given tensor | XTensor * NewTensor(<br>XTensor * a, bool isFilledData = true) | a - the given tensor <br> isFilledData - whether to allocate the data space |
| Free the data space of a given tensor | void DelTensor(<br>const XTensor * tensor) | tensor - the given tensor |
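For orientation, here is a small sketch (not part of the original manual) that exercises a few of these methods; it assumes the InitTensor2D helper used in the memory-pool example later in this document:
```
/* a 2 x 2 matrix */
XTensor t;
InitTensor2D(&t, 2, 2);

/* fill it from an array, then read and modify single cells */
DTYPE values[4] = {1.0F, 2.0F, 3.0F, 4.0F};
t.SetData(values, 4);
DTYPE v = t.Get2D(0, 1);    /* v == 2.0 */
t.Set2D(v + 1.0F, 1, 0);    /* write 3.0 into row 1, column 0 */
```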
## Tensor operations
NiuTrans.Tensor provides functions for tensor computation, mainly basic tensor operations and activation functions; this section describes these functions and how to use them. Taking element-wise multiplication (Multiply) as an example, an operation usually comes in several forms:
* _Multiply: the output tensor must be supplied by the caller; forward computation only
* MultiplyMe: the result overwrites the input tensor; forward computation only
* Multiply: the output tensor is returned to the caller; supports both forward and backward computation
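A minimal sketch of the three calling styles, using the Multiply signatures documented later in this section (a, b and c are assumed to be already-initialized XTensor pointers of matching shape):
```
/* low-level form: the caller supplies the output tensor c */
_Multiply(a, b, c, 0);

/* in-place form: a is overwritten with the element-wise product of a and b */
_MultiplyMe(a, b);

/* high-level form: a new tensor holding the result is returned */
XTensor d = Multiply(*a, *b);
```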
### Arithmetic (arithmetic)
This part covers basic arithmetic: addition, subtraction, multiplication, division, negation, and so on.
#### Matrix multiplication (MatrixMul)
##### What is matrix multiplication between tensors?
Matrix multiplication multiplies two matrices and produces a new result matrix. Multiplying a $2 \times 3$ matrix by a $3 \times 2$ matrix works as shown below; the result is $2 \times 2$:
$$
\left(\begin{matrix}1.0 & 2.0 & 3.0\\-4.0 & 5.0 & 6.0\end{matrix}\right) ×
\left(\begin{matrix}0.0 & -1.0\\1.0 & 2.0\\2.0 & 1.0\end{matrix}\right) \rightarrow
\left(\begin{matrix}8.0 & 6.0\\17.0 & 20.0\end{matrix}\right)
$$
##### Calling matrix multiplication
NiuTrans.Tensor implements matrix multiplication in NiuTrans.Tensor/Tensor/core/arithmetic. The function computes
> c_{i,j} = trans(a_i) * trans(b_j) * alpha + c_{i,j} * beta
The calling conventions and parameters are:
```
void _MatrixMul(XTensor * a, MATRIX_TRANS_TYPE transposedA, XTensor * b, MATRIX_TRANS_TYPE transposedB, XTensor * c, DTYPE alpha = (DTYPE)1.0, DTYPE beta = 0)
XTensor MatrixMul(const XTensor &a, MATRIX_TRANS_TYPE transposedA, const XTensor &b, MATRIX_TRANS_TYPE transposedB, DTYPE alpha = (DTYPE)1.0, DTYPE beta = 0)
```
Parameters:
* a - input tensor 1
* transposedA - whether a is transposed
* b - input tensor 2
* transposedB - whether b is transposed
* c - output tensor
* alpha - coefficient α
* beta - coefficient β
##### A matrix multiplication snippet
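The snippet that belongs here is elided in this excerpt; the following hedged sketch shows a typical call, modeled on the sample program at the end of this document:
```
/* a: 2 x 3, b: 3 x 2 */
XTensor a, b, c;
InitTensor2D(&a, 2, 3);
InitTensor2D(&b, 3, 2);
a.SetDataRand(-1.0F, 1.0F);
b.SetDataRand(-1.0F, 1.0F);

/* call MatrixMul function: c = a * b */
c = MatrixMul(a, X_NOTRANS, b, X_NOTRANS);
```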
For the complete example see:
NiuTrans.Tensor/Tensor/test/TMatrixMul.cpp
#### Element-wise multiplication (Multiply)
##### What is element-wise multiplication of tensors?
Element-wise multiplication multiplies the elements of two tensors position by position; multiplying two $2 \times 2$ tensors works like this:
$$
\left(\begin{matrix}0.0 & 1.0\\2.0 & 3.0\end{matrix}\right) ·
\left(\begin{matrix}0.0 & 1.0\\2.0 & 3.0\end{matrix}\right) \rightarrow
\left(\begin{matrix}0.0 & 1.0\\4.0 & 9.0\end{matrix}\right)
$$
##### Calling element-wise multiplication
NiuTrans.Tensor provides element-wise multiplication of tensors, defined in NiuTrans.Tensor/Tensor/core/arithmetic. The calling conventions and parameters are:
```
_Multiply(XTensor * a, XTensor * b, XTensor * c, int leadingDim, DTYPE alpha = 0)
void _MultiplyMe(XTensor * a, const XTensor * b, DTYPE alpha = 0, int leadingDim = 0)
XTensor Multiply(const XTensor &a, const XTensor &b, DTYPE alpha = 0, int leadingDim = 0)
```
Parameters:
* a - input tensor 1
* b - input tensor 2
* c - output tensor
* leadingDim - the dimension along which the element-wise product is taken
* alpha - coefficient
##### An element-wise multiplication snippet
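The snippet that belongs here is elided in this excerpt; a minimal sketch using the signature above (a, b and c are assumed to be initialized tensors of the same shape):
```
/* call Multiply function: c = a * b, element by element */
_Multiply(a, b, c, 0);
```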
For the complete example see:
NiuTrans.Tensor/Tensor/test/TMultiply.cpp
#### Negation (Negate)
##### What is tensor negation?
Negation negates every element of a tensor, and the new elements form the result tensor; negating a $3 \times 2$ tensor works like this:
$$
\left(\begin{matrix}1.0 & -2.0\\-3.0 & 4.0\\5.0 & -6.0\end{matrix}\right) \rightarrow
\left(\begin{matrix}-1.0 & 2.0\\3.0 & -4.0\\-5.0 & 6.0\end{matrix}\right)
$$
##### Calling negation
NiuTrans.Tensor provides element-wise negation, defined in NiuTrans.Tensor/Tensor/core/arithmetic. The calling convention and parameters are:
```
void _Negate(XTensor * a)
```
Parameters:
* a - input tensor
##### A negation snippet
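The snippet that belongs here is elided in this excerpt; a minimal sketch (a is assumed to be an initialized tensor; the operation is in place):
```
/* call Negate function: negate every element of a */
_Negate(a);
```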
For the complete example see:
NiuTrans.Tensor/Tensor/test/TNegate.cpp
#### Addition (Sum)
##### What is tensor addition?
Tensor addition adds n tensors to produce a new result tensor: each element of the result is the sum of the corresponding elements of the operands, and the operands have the same dimensions as the result. Adding two $2 \times 3$ tensors works like this:
$$
\left(\begin{matrix}0.0 & 1.0 & 2.0 \\ 3.0 & 4.0 & 5.0\end{matrix}\right) +
\left(\begin{matrix}0.5 & 1.5 & 2.5 \\ 3.5 & 4.5 & 5.5\end{matrix}\right) \rightarrow
\left(\begin{matrix}0.5 & 2.5 & 4.5 \\ 6.5 & 8.5 & 10.5\end{matrix}\right)
$$
##### Calling tensor addition
NiuTrans.Tensor provides tensor addition, defined in NiuTrans.Tensor/Tensor/core/arithmetic; it adds tensors element by element and produces the result tensor. The calling conventions are:
```
void _Sum(const XTensor * a, const XTensor * b, XTensor * c, DTYPE beta = (DTYPE)1.0)
void _SumMe(XTensor * a, const XTensor * b, DTYPE beta = (DTYPE)1.0)
XTensor Sum(const XTensor &a, const XTensor &b, DTYPE beta = (DTYPE)1.0)
```
Here a and b are the input tensors and c is the result; if c is NULL the result is stored in a. beta is a scaling parameter: c = a + b * beta, with beta defaulting to 1.0. The parameters are:
Parameters:
* a - input tensor 1
* b - input tensor 2
* c - output tensor
* beta - scaling parameter
##### A tensor addition snippet
Calling Sum to add two tensors looks like this; in this example the result is stored in c:
```
/* call sum function */
_Sum(a, b, c);
```
For the complete example see:
NiuTrans.Tensor/Tensor/test/TSum.cpp
##### What is SumByColumnTV?
SumByColumnTV adds a tensor and a vector column by column; the result has the same dimensions as the tensor. For a $2 \times 4$ tensor and a $2 \times 1$ vector:
$$
\left(\begin{matrix}0.0 & 1.0 & 2.0 & 3.0\\4.0 & 5.0 & 6.0 & 7.0\end{matrix}\right) + \left(\begin{matrix}1.0\\0.0\end{matrix}\right) \rightarrow
\left(\begin{matrix}1.0 & 2.0 & 3.0 & 4.0\\4.0 & 5.0 & 6.0 & 7.0\end{matrix}\right)
$$
##### Calling SumByColumnTV
NiuTrans.Tensor provides the SumByColumnTV operation; the calling convention and parameters are:
```
void _SumByColumnTV(XTensor * a, XTensor * b, XTensor * c, DTYPE beta)
```
Parameters:
* a - input tensor
* b - input vector
* c - output tensor
* beta - scaling parameter
The computation performed by SumByColumnTV is c_col = a_col + b * beta.
##### A SumByColumnTV snippet
SumByColumnTV example code; a is the input tensor, b is the input vector, and c is the column-wise sum of a and b:
```
/* call SumByColumnTV function */
_SumByColumnTV(a, b, c);
```
For the complete example see:
NiuTrans.Tensor/Tensor/test/TSumByColumnTV.cpp
##### What is SumByColumnVT?
SumByColumnVT adds a vector and a tensor column by column; the result has the same dimensions as the vector. For a $2 \times 1$ vector and a $2 \times 4$ tensor:
$$
\left(\begin{matrix}1.0\\0.0\end{matrix}\right) + \left(\begin{matrix}0.0 & 1.0 & 2.0 & 3.0\\4.0 & 5.0 & 6.0 & 7.0\end{matrix}\right) \rightarrow
\left(\begin{matrix}7.0\\22.0\end{matrix}\right)
$$
##### Calling SumByColumnVT
NiuTrans.Tensor provides the SumByColumnVT operation; the calling convention and parameters are:
```
_SumByColumnVT(XTensor * a, XTensor * b, XTensor * c, DTYPE beta)
```
Parameters:
* a - input vector
* b - input tensor
* c - output vector
* beta - scaling parameter
The computation performed by SumByColumnVT is c = a + \sum_{col} b_col * beta.
##### A SumByColumnVT snippet
```
/* call SumByColumnVT function */
_SumByColumnVT(a, b, c);
```
For the complete example see:
NiuTrans.Tensor/Tensor/test/TSumByColumnVT.cpp
### Getting and setting data (getandset)
This part covers data-type conversion and functions for setting and getting data.
#### Selection (Select)
##### What is tensor selection?
Select picks elements at specified positions along a specified dimension of a tensor. In the example below, the elements at indices 1 and 2 along dimension 2 of a $2 \times 2 \times 4$ tensor are selected into the target tensor, giving a $2 \times 2 \times 2$ tensor:
$$
\begin{aligned}
\Biggl(
& \left(
\begin{matrix}0.0 & 1.0 & 2.0 & 3.0\\4.0 & 5.0 & 6.0 & 7.0\end{matrix}\right),\\
& \left(
\begin{matrix}1.0 & 2.0 & 3.0 & 4.0\\5.0 & 6.0 & 7.0 & 8.0\end{matrix}
\right)
\Biggr)
\end{aligned} \rightarrow
\begin{aligned}
\Biggl(
& \left(
\begin{matrix}1.0 & 2.0\\5.0 & 6.0\end{matrix}
\right),\\
& \left(
\begin{matrix}2.0 & 3.0\\6.0 & 7.0\end{matrix}
\right)
\Biggr)
\end{aligned}
$$
##### Calling tensor selection
NiuTrans.Tensor provides tensor selection; the calling conventions and parameters are as follows.
The first form selects with an index matrix of 0s and 1s:
```
void _Select(const XTensor * a, XTensor * c, XTensor * indexCPU)
XTensor Select(const XTensor &a, XTensor &indexCPU)
```
Parameters:
* a - input tensor
* c - output tensor
* indexCPU - selection flags
The second form selects a range of positions:
```
void _SelectRange(const XTensor * a, XTensor * c, int dim, int low, int high)
XTensor SelectRange(const XTensor &a, int dim, int low, int high)
```
Parameters:
* a - input tensor
* dim - the dimension along which selection is performed
* low - lower bound of the selected range
* high - upper bound of the selected range
* c - output tensor
> Note that a selection range of [1,3] selects the values at index positions 1 and 2 (the upper bound is exclusive).
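A minimal sketch of the range form, matching the $2 \times 2 \times 4$ example above (a and c are assumed to be initialized tensors of the appropriate shapes):
```
/* call SelectRange function: keep indices 1 and 2 of dimension 2 */
_SelectRange(a, c, 2, 1, 3);
```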
For the complete example see:
NiuTrans.Tensor/Tensor/test/TSelect.cpp
#### SetData
##### What is SetData?
SetData initializes a tensor, for example with random values in a given range; initializing a $2 \times 4$ tensor in the range [0.0, 1.0] looks like:
$$
\left(\begin{matrix}0.0 & 0.0 & 0.0 & 0.0\\0.0 & 0.0 & 0.0 & 0.0\end{matrix}\right) \rightarrow
\left(\begin{matrix}0.1 & 0.5 & 0.3 & 0.9\\0.8 & 0.5 & 0.5 & 0.2\end{matrix}\right)
$$
##### Calling SetData
NiuTrans.Tensor provides the SetData family of operations; the calling conventions and parameters are as follows.
Set the tensor to a fixed value:
``` ```
void SetDataFixed(XTensor * tensor, void * valuePointer)
``` ```
Parameters:
* tensor - input tensor
* valuePointer - pointer to the value
Set the tensor to an integer value:
```
void SetDataFixedInt(XTensor * tensor, int p)
```
Parameters:
* tensor - input tensor
* p - the integer value
Set the tensor to a single-precision float value:
```
void SetDataFixedFloat(XTensor * tensor, float p)
```
Parameters:
* tensor - input tensor
* p - the float value
Set the tensor to a double-precision float value:
```
void SetDataFixedDouble(XTensor * tensor, double p)
```
Parameters:
* tensor - input tensor
* p - the double value
Initialize the tensor with random values from a uniform distribution:
```
void SetDataRand(XTensor * tensor, DTYPE low, DTYPE high)
```
Parameters:
* tensor - input tensor
* low - lower bound
* high - upper bound
Initialize the tensor from a normal distribution:
```
void SetDataRandN(XTensor * tensor, DTYPE mean, DTYPE standardDeviation)
```
Parameters:
* tensor - input tensor
* mean - mean
* standardDeviation - standard deviation
##### A SetData snippet
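A minimal sketch using the functions above (s is assumed to be an already-initialized XTensor pointer):
```
/* set every element of s to 1.0 */
SetDataFixedFloat(s, 1.0F);

/* then overwrite s with uniform random values in [0.0, 1.0] */
SetDataRand(s, 0.0F, 1.0F);
```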
For the complete example see:
NiuTrans.Tensor/Tensor/test/TSetData.cpp
### Mathematical operations (math)
This part covers operations beyond basic algebra, such as log, exp and abs.
#### Normalize
NiuTrans.Tensor provides the Normalize operation; the calling conventions and parameters are:
```
void _Normalize(const XTensor * input, XTensor * output, int dim, const XTensor * mean, const XTensor * var, const XTensor * a, const XTensor * b, DTYPE epsilon)
void _NormalizeMe(XTensor * input, int dim, const XTensor * mean, const XTensor * var, const XTensor * a, const XTensor * b, DTYPE epsilon)
XTensor Normalize(const XTensor &input, int dim, const XTensor &mean, const XTensor &var, const XTensor &a, const XTensor &b, DTYPE epsilon)
```
Parameters:
* input - input tensor
* output - output tensor
* dim - the dimension along which normalization is performed
* mean - the mean
* var - the variance
* a - the scale parameter
* b - the bias parameter
* epsilon - a small constant added for numerical stability
For the complete example see:
NiuTrans.Tensor/Tensor/test/TNormalize.cpp
#### Power
##### What is the tensor power operation?
Exponentiation raises each element of a tensor to a power, producing a new tensor; raising a $3 \times 2$ tensor to the power 2.0 looks like:
$$
\left(\begin{matrix}1.0 & 2.0\\3.0 & 4.0\\5.0 & 6.0\end{matrix}\right) \rightarrow
\left(\begin{matrix}1.0 & 4.0\\9.0 & 16.0\\25.0 & 36.0\end{matrix}\right)
$$
##### Calling the power operation
NiuTrans.Tensor provides an element-wise power operation; the calling convention is:
```
void _Power(XTensor * a, DTYPE p)
```
Here a is the tensor being operated on and p is the exponent. The parameters are:
Parameters:
* a - input tensor
* p - the exponent
##### A power operation snippet
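A minimal sketch matching the $3 \times 2$ example above (a is assumed to be an initialized tensor; the operation is in place):
```
/* call Power function: square every element of a */
_Power(a, 2.0F);
```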
For the complete example see:
NiuTrans.Tensor/Tensor/test/TPower.cpp
#### Scale and shift (ScaleAndShift)
##### What is scale-and-shift?
Scale-and-shift computes p = p * scale + shift, where scale and shift are the scaling and shifting parameters. A $2 \times 4$ tensor scaled by 2.0 and shifted by 0.5 becomes:
$$
\left(\begin{matrix}0.0 & 1.0 & 2.0 & 3.0\\4.0 & 5.0 & 6.0 & 7.0\end{matrix}\right) \rightarrow
\left(\begin{matrix}0.5 & 2.5 & 4.5 & 6.5\\8.5 & 10.5 & 12.5 & 14.5\end{matrix}\right)
$$
##### Calling scale-and-shift
NiuTrans.Tensor provides the scale-and-shift operation; the calling conventions are:
```
void _ScaleAndShift(const XTensor * a, XTensor * b, DTYPE scale, DTYPE shift = 0)
void _ScaleAndShiftMe(XTensor * a, DTYPE scale, DTYPE shift = 0)
XTensor ScaleAndShift(const XTensor &a, DTYPE scale, DTYPE shift = 0)
```
The result is p = p * scale + shift, where scale and shift are the scaling and shifting parameters. The parameters are:
Parameters:
* a - input tensor
* b - output tensor
* scale - scale parameter
* shift - shift parameter
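A minimal sketch matching the example above (input and output are assumed to be initialized tensors of the same shape):
```
/* call ScaleAndShift function: output = input * 2.0 + 0.5 */
_ScaleAndShift(input, output, 2.0F, 0.5F);
```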
For the complete example see:
NiuTrans.Tensor/Tensor/test/TScaleAndShift.cpp
### Data movement (movement)
This part covers the data-copying functions.
#### Copy (CopyValues)
##### What is tensor copying?
Copying assigns the values of one tensor to another tensor; copying a $2 \times 4$ tensor looks like:
$$
\left(\begin{matrix}5.0 & 1.0 & 2.0 & 8.0\\4.0 & 3.0 & 7.0 & 6.0\end{matrix}\right) \rightarrow
\left(
\begin{matrix}5.0 & 1.0 & 2.0 & 8.0\\4.0 & 3.0 & 7.0 & 6.0\end{matrix}\right)
$$
##### Calling tensor copy
NiuTrans.Tensor provides the tensor copy operation; the calling conventions and parameters are:
```
void _CopyValues(const XTensor * s, XTensor * t, XStream * stream = NULL)
XTensor CopyValues(const XTensor &s, XStream * stream = NULL)
```
Parameters:
* s - input tensor
* t - output tensor
* stream - multi-thread stream
##### A tensor copy snippet
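A minimal sketch (s and t are assumed to be initialized tensors of the same shape):
```
/* call CopyValues function: t = s */
_CopyValues(s, t);
```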
For the complete example see:
NiuTrans.Tensor/Tensor/test/TCopyValues.cpp
#### CopyIndexed
##### What is CopyIndexed?
CopyIndexed copies a tensor at specified index positions. In the example below a $2 \times 2 \times 3$ tensor is copied along dimension 2, taking 1 element at each of the start indices 0 and 2, which yields a $2 \times 2 \times 2$ tensor:
$$
\begin{aligned}
\Biggl(
& \left(
\begin{matrix}0.0 & -1.0 & 2.0\\2.0 & 1.0 & 3.0\end{matrix}\right),\\
& \left(
\begin{matrix}1.0 & 2.0 & 4.0\\3.0 & 1.0 & 2.0\end{matrix}
\right),\\
& \left(
\begin{matrix}-1.0 & 3.0 & 2.0\\1.0 & -1.0 & 0.0\end{matrix}
\right)
\Biggr)
\end{aligned} \rightarrow
\begin{aligned}
\Biggl(
& \left(
\begin{matrix}0.0 & 2.0\\2.0 & 3.0\end{matrix}\right),\\
& \left(
\begin{matrix}1.0 & 4.0\\3.0 & 2.0\end{matrix}
\right),\\
& \left(
\begin{matrix}-1.0 & 2.0\\1.0 & 0.0\end{matrix}
\right)
\Biggr)
\end{aligned}
$$
##### Calling CopyIndexed
NiuTrans.Tensor provides the CopyIndexed operation; the calling conventions and parameters are:
```
void _CopyIndexed(const XTensor * s, XTensor * t, int dim, int * srcIndex, int indexSize, int * tgtIndex, int copyNum)
XTensor CopyIndexed(const XTensor &s, int dim, int * srcIndex, int indexSize, int * tgtIndex, int copyNum)
```
Parameters:
* s - input tensor
* t - output tensor
* dim - the dimension along which CopyIndexed is performed
* srcIndex - source indices, i.e. the indices along dim of the values to copy
* indexSize - number of source indices
* tgtIndex - target indices, i.e. the positions in the target tensor that receive the copied values
* copyNum - number of elements copied for each index
##### A CopyIndexed snippet
```
/* call CopyIndexed function */
_CopyIndexed(s, t, 2, srcIndex, indexSize, tgtIndex, 1);
```
For the complete example see:
NiuTrans.Tensor/Tensor/test/TCopyIndexed.cpp
### Reduction (reduce)
#### Reduce maximum (ReduceMax)
##### What is reduce-max?
Reduce-max takes the maximum of a tensor along a given dimension; reducing a $2 \times 4$ tensor along dimension 0 and along dimension 1 works as follows:
$$
\left(\begin{matrix}0.0 & 1.0 & 2.0 & 3.0\\4.0 & 5.0 & 6.0 & 7.0\end{matrix}\right) \rightarrow
\left(\begin{matrix}4.0 & 5.0 & 6.0 & 7.0\end{matrix}\right)
$$

$$
\left(\begin{matrix}0.0 & 1.0 & 2.0 & 3.0\\4.0 & 5.0 & 6.0 & 7.0\end{matrix}\right) \rightarrow
\left(\begin{matrix}3.0\\7.0\end{matrix}\right)
$$
##### Calling reduce-max
NiuTrans.Tensor provides the ReduceMax operation, which takes the maximum along a specified dimension; the calling conventions are:
```
void _ReduceMax(const XTensor * input, XTensor * output, int dim)
XTensor ReduceMax(const XTensor &input, int dim)
```
Parameters:
* input - input tensor
* output - output tensor
* dim - the dimension along which the reduction is performed
For the complete example see:
NiuTrans.Tensor/Tensor/test/TReduceMax.cpp
#### Reduce sum (ReduceSum)
##### What is reduce-sum?
Reduce-sum sums a tensor along a given dimension; summing a $2 \times 4$ tensor along dimension 0 and along dimension 1 works as follows:
$$
\left(\begin{matrix}0.0 & 1.0 & 2.0 & 3.0\\4.0 & 5.0 & 6.0 & 7.0\end{matrix}\right) \rightarrow
\left(\begin{matrix}4.0 & 6.0 & 8.0 & 10.0\end{matrix}\right)
$$

$$
\left(\begin{matrix}0.0 & 1.0 & 2.0 & 3.0\\4.0 & 5.0 & 6.0 & 7.0\end{matrix}\right) \rightarrow
\left(\begin{matrix}6.0\\22.0\end{matrix}\right)
$$
##### Calling reduce-sum
NiuTrans.Tensor provides the ReduceSum operation; the calling conventions are:
```
void _ReduceSum(const XTensor * input, XTensor * output, int dim, const XTensor * shift = NULL, DTYPE power = (DTYPE)1.0F, bool isExp = false)
XTensor ReduceSum(const XTensor &input, int dim, const XTensor &shift = NULLTensor, DTYPE power = (DTYPE)1.0F, bool isExp = false)
```
shift defaults to NULL, power to 1.0F and isExp to false. The parameters are:
Parameters:
* input - input tensor
* output - output tensor
* dim - the dimension along which the reduction is performed
* shift - an optional tensor subtracted from the input before reduction (default NULL)
* power - the exponent applied to each (shifted) element before summation (default 1.0F)
* isExp - whether each (shifted) element is exponentiated before summation (default false)
For the complete example see:
NiuTrans.Tensor/Tensor/test/TReduceSum.cpp
#### Reduce mean (ReduceMean)
##### What is reduce-mean?
Reduce-mean computes the mean of a tensor along a given dimension; taking the mean of a $2 \times 4$ tensor along dimension 0 and along dimension 1 works as follows:
$$
\left(\begin{matrix}0.0 & 1.0 & 2.0 & 3.0\\4.0 & 5.0 & 6.0 & 7.0\end{matrix}\right) \rightarrow
\left(\begin{matrix}2.0 & 3.0 & 4.0 & 5.0\end{matrix}\right)
$$

$$
\left(\begin{matrix}1.0 & 1.0 & 3.0 & 3.0\\4.0 & 4.0 & 6.0 & 6.0\end{matrix}\right) \rightarrow
\left(\begin{matrix}2.0\\5.0\end{matrix}\right)
$$
##### Calling reduce-mean
NiuTrans.Tensor provides the ReduceMean operation; the calling conventions are:
```
void _ReduceMean(const XTensor * input, XTensor * output, int dim)
XTensor ReduceMean(const XTensor &input, int dim)
```
ReduceMean computes the mean of the elements along a specified dimension. The parameters are:
Parameters:
* input - input tensor
* output - output tensor
* dim - the dimension along which the reduction is performed
For the complete example see:
NiuTrans.Tensor/Tensor/test/TReduceMean.cpp
#### Reduce sum of squares (ReduceSumSquared)
##### What is ReduceSumSquared?
ReduceSumSquared reduces a tensor along a given dimension by summing the squared differences between the elements and a given shift; reducing a $2 \times 4$ tensor along dimension 0 looks like:
$$
\left(\begin{matrix}0.0 & 1.0 & 2.0 & 3.0\\4.0 & 5.0 & 6.0 & 7.0\end{matrix}\right) \rightarrow
\left(\begin{matrix}8.0 & 8.0 & 8.0 & 8.0\end{matrix}\right)
$$
##### Calling ReduceSumSquared
NiuTrans.Tensor provides the ReduceSumSquared operation; the calling conventions are:
```
void _ReduceSumSquared(const XTensor * input, XTensor * output, int dim, const XTensor * shift)
XTensor ReduceSumSquared(const XTensor &input, int dim, const XTensor &shift)
```
ReduceSumSquared computes, along a specified dimension, the sum of the squared differences between the elements and shift. The parameters are:
Parameters:
* input - input tensor
* output - output tensor
* dim - the dimension along which the reduction is performed
* shift - the tensor subtracted from the input before squaring
For the complete example see:
NiuTrans.Tensor/Tensor/test/TReduceSumSquared.cpp
#### Reduce variance (ReduceVariance)
##### What is reduce-variance?
Reduce-variance computes, along a given dimension, the variance of the elements relative to a given mean; reducing a $2 \times 4$ tensor along dimension 0 looks like:
$$
\left(\begin{matrix}0.0 & 1.0 & 2.0 & 3.0\\4.0 & 5.0 & 6.0 & 7.0\end{matrix}\right) \rightarrow
\left(\begin{matrix}4.0 & 4.0 & 4.0 & 4.0\end{matrix}\right)
$$
##### Calling ReduceVariance
NiuTrans.Tensor provides the ReduceVariance operation; the calling conventions are:
```
void _ReduceVariance(const XTensor * input, XTensor * output, int dim, const XTensor * mean)
XTensor ReduceVariance(const XTensor &input, int dim, const XTensor &mean)
```
ReduceVariance computes, along a specified dimension, the variance of the elements relative to mean. The parameters are:
Parameters:
* input - input tensor
* output - output tensor
* dim - the dimension along which the reduction is performed
* mean - the mean tensor
##### A ReduceVariance snippet
```
/* call ReduceVariance function along dimension 0 */
_ReduceVariance(input, output, 0, mean);
```
For the complete example see:
NiuTrans.Tensor/Tensor/test/TReduceVariance.cpp
### Shape transformation (shape)
This part covers shape-changing functions such as split, merge and reshape.
#### Concatenation (Concatenate)
##### What is concatenation?
Concatenation joins a sequence of tensors, or all tensors in a list, along a given dimension into one larger tensor; concatenating tensors of size $2 \times 1$ and $2 \times 2$ works like this:
$$
\left(\begin{matrix}0.0\\1.0\end{matrix}\right) +
\left(\begin{matrix}2.0 & 3.0\\4.0 & 5.0\end{matrix}\right) \rightarrow
\left(\begin{matrix}0.0 & 2.0 & 3.0\\1.0 & 4.0 & 5.0\end{matrix}\right)
$$
##### Calling concatenation
NiuTrans.Tensor provides concatenation of tensors in two forms.
In the first form the operands are given as a list: the tensors to concatenate are stored in the list smalls and the result is stored in big:
```
void _Concatenate(const XList * smalls, XTensor * big, int dim)
XTensor Concatenate(const XList &smalls, int dim)
```
Parameters:
* smalls - the list of tensors to concatenate
* big - output tensor
* dim - the dimension along which to concatenate
In the second form the operands are individual tensors rather than a list:
```
void _Concatenate(const XTensor * smallA, const XTensor * smallB, XTensor * big, int dim)
XTensor Concatenate(const XTensor &smallA, const XTensor &smallB, int dim)
```
Parameters:
* smallA - input tensor 1
* smallB - input tensor 2
* big - output tensor
* dim - the dimension along which to concatenate
##### A concatenation snippet
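A minimal sketch of the second form, matching the $2 \times 1$ and $2 \times 2$ example above (smallA, smallB and big are assumed to be initialized tensors):
```
/* call Concatenate function: join smallA and smallB along dimension 1 */
_Concatenate(smallA, smallB, big, 1);
```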
For the complete example see:
NiuTrans.Tensor/Tensor/test/TConcatenate.cpp
#### Split
##### What is splitting?
Splitting works along a given dimension of a tensor: a tensor can be split into another tensor with an extra dimension, or a large tensor can be split into a list of n smaller tensors.
In the first case, splitting a $4 \times 3$ tensor along dimension 0 into 2 parts gives a $2 \times 2 \times 3$ tensor:
$$
\left(\begin{matrix}0.0 & 1.0 & 2.0\\3.0 & 4.0 & 5.0\\0.1 & 1.1 & 2.1\\3.1 & 4.1 & 5.1\end{matrix}\right) \rightarrow
\begin{aligned}
\Biggl( & \left(
\begin{matrix}0.0 & 1.0 & 2.0\\3.0 & 4.0 & 5.0\end{matrix}\right),
\\ & \left(
\begin{matrix}0.1 & 1.1 & 2.1\\3.1 & 4.1 & 5.1\end{matrix}
\right) \Biggr)
\end{aligned}
$$
In the second case, splitting a $4 \times 3$ tensor along dimension 0 into 2 parts gives two $2 \times 3$ tensors:
$$
\left(\begin{matrix}0.0 & 1.0 & 2.0\\3.0 & 4.0 & 5.0\\0.1 & 1.1 & 2.1\\3.1 & 4.1 & 5.1\end{matrix}\right) \rightarrow
\left(\begin{matrix}0.0 & 1.0 & 2.0\\3.0 & 4.0 & 5.0\end{matrix}\right) + \left(\begin{matrix}0.1 & 1.1 & 2.1\\3.1 & 4.1 & 5.1\end{matrix}\right)
$$
##### Calling split
NiuTrans.Tensor provides two forms of splitting.
The first form splits one dimension of the source tensor s; the result is the tensor t, whereToSplit is the dimension to split, and splitNum is the number of parts, e.g. (N, M) -> (N/3, M, 3). The parameters are:
```
void _Split(const XTensor * s, XTensor * t, int whereToSplit, int splitNum)
XTensor Split(const XTensor &s, int whereToSplit, int splitNum)
```
Parameters:
* s - input tensor
* t - output tensor
* whereToSplit - the dimension along which to split
* splitNum - number of parts
The second form splits the tensor big along the dimension whereToSplit into a list smalls of smaller tensors; splitNum is the number of parts, e.g. (N, M) -> 2 * (N/2, M). The parameters are:
```
void _Split(const XTensor * big, XList * smalls, int whereToSplit, int splitNum)
XList SplitList(const XTensor &big, int whereToSplit, int splitNum)
```
Parameters:
* big - input tensor
* smalls - the list that receives the split tensors
* whereToSplit - the dimension along which to split
* splitNum - number of parts
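A minimal sketch of the first form, matching the $4 \times 3$ example above (s and t are assumed to be initialized tensors):
```
/* call Split function: cut dimension 0 of s into 2 parts */
_Split(s, t, 0, 2);
```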
For the complete example see:
NiuTrans.Tensor/Tensor/test/TSplit.cpp
#### Merge
##### What is merging?
Merging is similar to concatenation: along a given dimension, one tensor can be merged into another tensor with different dimensions, or all tensors in a list can be merged into one larger tensor.
In the first case, a $2 \times 2 \times 3$ tensor is merged at dimension 1, with dimension 0 as the dimension being merged, giving a $4 \times 3$ tensor:
$$
\begin{aligned}
\Biggl( & \left(
\begin{matrix}0.0 & 1.0 & 2.0\\3.0 & 4.0 & 5.0\end{matrix}\right),
\\ & \left(
\begin{matrix}0.1 & 1.1 & 2.1\\3.1 & 4.1 & 5.1\end{matrix}
\right) \Biggr)
\end{aligned} \rightarrow
\left(\begin{matrix}0.0 & 1.0 & 2.0\\3.0 & 4.0 & 5.0\\0.1 & 1.1 & 2.1\\3.1 & 4.1 & 5.1\end{matrix}\right)
$$
In the second case, two $2 \times 3$ tensors are merged along dimension 0 into a $4 \times 3$ tensor:
$$
\left(\begin{matrix}0.0 & 1.0 & 2.0\\3.0 & 4.0 & 5.0\end{matrix}\right) + \left(\begin{matrix}0.1 & 1.1 & 2.1\\3.1 & 4.1 & 5.1\end{matrix}\right) \rightarrow
\left(\begin{matrix}0.0 & 1.0 & 2.0\\3.0 & 4.0 & 5.0\\0.1 & 1.1 & 2.1\\3.1 & 4.1 & 5.1\end{matrix}\right)
$$
##### Calling merge
NiuTrans.Tensor provides merging in two forms.
The first form merges one dimension of the source tensor s into another dimension; the result is t, whereToMerge is the dimension into which to merge, and leadingDim is the dimension being merged, e.g. (N/2, 2, M) -> (N, M). The parameters are:
```
void _Merge(const XTensor * s, XTensor * t, int whereToMerge, int leadingDim = -1)
XTensor Merge(const XTensor &s, int whereToMerge, int leadingDim = -1)
```
Parameters:
* s - input tensor
* t - output tensor
* whereToMerge - the dimension into which to merge
* leadingDim - the dimension that is merged
In the second form the tensors to merge are stored in the list smalls and the result is big; whereToMerge is the dimension along which to merge, e.g. 2 * (N/2, M) -> (N, M). The parameters are:
```
void _Merge(const XList * smalls, XTensor * big, int whereToMerge)
XTensor Merge(const XList &smalls, int whereToMerge)
```
Parameters:
* smalls - the list of tensors to merge
* big - output tensor
* whereToMerge - the dimension along which to merge
For the complete example see:
NiuTrans.Tensor/Tensor/test/TMerge.cpp
#### Unsqueeze
##### What is Unsqueeze?
Unsqueeze returns a new tensor with an extra dimension of a given size inserted at a given position; the returned tensor shares the same underlying data as the source tensor. Unsqueezing a $2 \times 3$ tensor at dimension 1 and at dimension 2, inserting a new dimension of size 2 in each case, looks like:
$$
\left(\begin{matrix}0.0 & 1.0 & 2.0\\3.0 & 4.0 & 5.0\end{matrix}\right) \rightarrow
\begin{aligned}
\Biggl( & \left(
\begin{matrix}0.0 & 1.0 & 2.0\\0.0 & 1.0 & 2.0\end{matrix}\right),
\\ & \left(
\begin{matrix}3.0 & 4.0 & 5.0\\3.0 & 4.0 & 5.0\end{matrix}
\right) \Biggr)
\end{aligned}
$$

$$
\left(\begin{matrix}0.0 & 1.0 & 2.0\\3.0 & 4.0 & 5.0\end{matrix}\right) \rightarrow
\begin{aligned}
\Biggl( & \left(
\begin{matrix}0.0 & 0.0\\1.0 & 1.0\\2.0 & 2.0\end{matrix}\right),
\\ & \left(
\begin{matrix}3.0 & 3.0\\4.0 & 4.0\\5.0 & 5.0\end{matrix}
\right) \Biggr)
\end{aligned}
$$
##### Calling Unsqueeze
NiuTrans.Tensor provides the Unsqueeze operation; the calling conventions and parameters are:
```
void _Unsqueeze(const XTensor * a, XTensor * b, int dim, int dSize)
XTensor Unsqueeze(const XTensor &a, int dim, int dSize)
```
Parameters:
* a - input tensor
* b - output tensor
* dim - the position at which the new dimension is inserted
* dSize - size of the inserted dimension
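A minimal sketch matching the $2 \times 3$ example above (s, t1 and t2 are assumed to be initialized tensors of the appropriate shapes):
```
/* call Unsqueeze function: insert a dimension of size 2 at position 1 and at position 2 */
_Unsqueeze(s, t1, 1, 2);
_Unsqueeze(s, t2, 2, 2);
```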
For the complete example see:
NiuTrans.Tensor/Tensor/test/TUnsqueeze.cpp
### Sorting (sort)
This part covers sorting-related functions such as sort and topk.
#### Sort
##### What is Sort?
Sort sorts the elements of a tensor along a given dimension; sorting a $2 \times 4$ tensor along dimension 0 looks like:
$$
\left(\begin{matrix}0.0 & 1.0 & 2.0 & 3.0\\4.0 & 5.0 & 6.0 & 7.0\end{matrix}\right) \rightarrow
\left(\begin{matrix}4.0 & 5.0 & 6.0 & 7.0\\0.0 & 1.0 & 2.0 & 3.0\end{matrix}\right)
$$
##### Calling Sort
NiuTrans.Tensor provides the Sort operation; the calling convention and parameters are:
```
void _Sort(XTensor * a, XTensor * index, int dim)
```
Parameters:
* a - input tensor
* index - indices of the elements in the output tensor
* dim - the dimension along which to sort
##### A Sort snippet
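A minimal sketch (a and b are assumed to be initialized tensors of the same shape; the sort is along dimension 0, as in the example above):
```
/* call Sort function */
_Sort(a, b, 0);
```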
#### TopK
##### What is TopK?
TopK sorts the elements of a tensor and returns the k largest (or smallest) values together with their indices; TopK can be applied along a given dimension. Taking the top 2 of a $2 \times 4$ tensor along dimension 0 looks like:
$$
\left(\begin{matrix}5.0 & 1.0 & 2.0 & 8.0\\4.0 & 3.0 & 7.0 & 6.0\end{matrix}\right) \rightarrow
\begin{aligned}
outputAnswer: & \left(
\begin{matrix}5.0 & 3.0 & 7.0 & 8.0\\4.0 & 1.0 & 2.0 & 6.0\end{matrix}\right)\\
indexAnswer: & \left(
\begin{matrix}0 & 1 & 1 & 0\\1 & 0 & 0 & 1\end{matrix}\right)
\end{aligned}
$$
##### Calling TopK
NiuTrans.Tensor provides the TopK operation; the calling convention and parameters are:
```
void _TopK(XTensor * a, XTensor * b, XTensor * index, int dim, int k)
```
Parameters:
* a - input tensor
* b - output tensor
* index - indices of the output values
* dim - the dimension along which TopK is performed
* k - the number of largest values to take
##### A TopK snippet
```
/* call TopK function */
_TopK(input, outputA, indexA, dim, k);
```
For the complete TopK example see NiuTrans.Tensor/Tensor/test/TTopK.cpp
### Activation functions (function)
This part covers the activation functions and loss functions.
#### Rectify
NiuTrans.Tensor provides the Rectify activation function; the calling convention and parameters are:
```
void _Rectify(const XTensor * x, XTensor * y)
```
Parameters:
* x - input tensor
* y - output tensor
Rectify example code; x is the input tensor and y is the output tensor:
```
/* call Rectify function */
_Rectify(x, y);
```
A detailed code example for Rectify is provided in the test directory.
#### HardTanH
NiuTrans.Tensor provides the HardTanH activation function; the calling convention and parameters are:
```
void _HardTanH(const XTensor * x, XTensor * y)
```
Parameters:
* x - input tensor
* y - output tensor
HardTanH example code; x is the input tensor and y is the output tensor:
```
/* call hardtanh function */
_HardTanH(x, y);
```
A detailed code example for HardTanH is provided in the test directory.
#### Identity
NiuTrans.Tensor provides the Identity activation function; the calling convention and parameters are:
```
void _Identity(const XTensor * x, XTensor * y)
```
Parameters:
* x - input tensor
* y - output tensor
Identity example code; x is the input tensor and y is the output tensor:
```
/* call Identity function */
_Identity(x, y);
```
A detailed code example for Identity is provided in the test directory.
#### LogSoftmax
NiuTrans.Tensor provides the LogSoftmax activation function; the calling convention and parameters are:
```
void _LogSoftmax(const XTensor * x, XTensor * y, int leadDim)
```
Parameters:
* x - input tensor
* y - output tensor
* leadDim - the dimension along which LogSoftmax is computed
LogSoftmax example code; x is the input tensor, y is the output tensor, and LogSoftmax is computed along dimension 1:
```
/* call LogSoftmax function */
_LogSoftmax(x, y, 1);
```
A detailed code example for LogSoftmax is provided in the test directory.
#### Sigmoid
NiuTrans.Tensor provides the Sigmoid activation function; the calling convention and parameters are:
```
void _Sigmoid(const XTensor * x, XTensor * y)
```
Parameters:
* x - input tensor
* y - output tensor
Sigmoid example code; x is the input tensor and y is the output tensor:
```
/* call Sigmoid function */
_Sigmoid(x, y);
```
A detailed code example for Sigmoid is provided in the test directory.
#### Softmax
NiuTrans.Tensor provides the Softmax activation function; the calling convention and parameters are:
```
void _Softmax(const XTensor * x, XTensor * y, int leadDim)
```
Parameters:
* x - input tensor
* y - output tensor
* leadDim - the dimension along which Softmax is computed
Softmax example code; x is the input tensor, y is the output tensor, and Softmax is computed along dimension 1:
```
/* call Softmax function */
_Softmax(x, y, 1);
```
A detailed code example for Softmax is provided in the test directory.
#### Loss
NiuTrans.Tensor provides loss computation; the calling convention and parameters are:
```
DTYPE LossCompute(XTensor * gold, XTensor * output, LOSS_FUNCTION_NAME LFName, bool isLogOutput, int leadDim, int gBeg, int gLen, int oBeg)
```
Parameters: Parameters:
有关Loss的详细代码示例见:
NiuTrans.Tensor/Tensor/test/TLoss.cpp
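除测试代码外,下面给出一个按上述函数签名整理的最小调用示例。注意,这只是一个示意性的写法:其中损失函数名CROSSENTROPY以及各位置参数的取值均为假设,具体请以LOSS_FUNCTION_NAME的定义和TLoss.cpp中的用法为准。
```
/* a sketch of calling LossCompute; the enum value CROSSENTROPY and
   the position arguments below are illustrative assumptions */
XTensor gold;
XTensor output;
/* ... initialize gold and output with the same shape ... */
DTYPE error = LossCompute(&gold, &output, CROSSENTROPY, false, 0, 0, gold.GetDim(0), 0);
```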
### 内存池
内存作为计算机软件运行过程中不可或缺的一项重要资源,在软件开发过程中具有十分重要的地位。对于一个软件系统而言,如何更高效地进行内存管理将对系统整体性能,尤其是运行速度方面产生很大程度的影响。对于内存的管理一般来说主要包括分配、追踪以及释放,通过相应的接口即可简单地在内存空间上进行变量的定义、使用以及删除等操作。
虽然目前而言,主流编程语言均会为开发人员提供相应的系统级接口(如C语言中的malloc和free,C++中的new和delete等),但这类接口在设计时需要兼顾各种使用情况,并不一定能够最适用于当前的使用需求(如对速度具有较高要求等),因此直接使用系统级的内存管理接口存在以下弊端:
1. 内存申请、释放时间消耗大:由于操作系统在进行内存管理的时候需要保证内存空间得到有效地使用,因此在执行内存申请操作的时候,系统将会根据“最先匹配”或“最优匹配”等算法在内存空间中找到一处闲置内存进行分配。同理,在对内存空间进行释放的时候,为方便后续空间的申请,系统也会在释放的过程中适时地合并空闲内存区域,保证系统中存在大块连续内存。诸如此类的操作虽然说能够使得内存空间的使用更加高效,但也给这些操作带来了许多额外的时间开销,导致频繁地对内存进行操作耗时较大。
2. 程序执行效率低:由于所申请内存块的大小不定,当频繁使用系统级接口进行内存管理的时候容易在存储空间中产生大量内存碎片,拖慢系统的执行效率。
3. 易发生内存泄漏:使用系统级接口对内存空间进行申请的时候,一般来说需要程序开发人员显性地对空间进行释放,一旦疏忽将导致内存泄漏情况的发生,严重情况下会使得软件甚至系统发生崩溃。因此使用系统级接口进行内存管理需要谨慎对存储空间的使用情况进行分析,使用相关检测工具对内存泄漏情况进行有效地核查。
此外,当系统中存在对GPU设备上的显存空间进行管理的时候,申请、释放操作所产生的时间代价相对普通内存来说更大。不同于内存空间的申请,在申请或释放显存的时候需要对CPU正在执行的操作进行中断,交由GPU设备进行显存的操作,因此这部分产生的时间消耗远比内存申请来说大得多,最终导致频繁地对显存空间进行操作会更严重地拖慢系统整体的执行效率。
针对以上问题,本系统支持使用内存池(Memory Pool)来对系统中的存储空间(包括内存和显存)进行管理。内存池的概念主要是在对存储空间进行使用之前,预先从系统中申请一整块的空间,由程序自身(内存池)对这部分空间进行管理。这样做的好处在于对存储空间的申请、释放等操作不需要频繁调用系统的相应接口,降低了其中中断、搜寻最优块等操作的耗时,同时也不易产生内存碎片。此外,由于内存池的申请是一次性的操作,因此不会在系统全局产生大规模内存泄漏的情况,对系统的稳定性会有所助益。
具体来说,想要在NiuTrans.Tensor的工具包中使用内存池(XMem)进行操作,只需要三个步骤:内存池的定义,使用以及释放。
* 内存池的定义
定义一个内存池,最简单的方式是仅指定一个设备ID,下面是一段示例代码。
```
// 定义一个内存池mem,它的类型是XMem
XMem * mem = new XMem(devID);
```
若需要更具体地指定内存池的信息,可以在定义内存池的时候通过myMode、myBlockSize、myBlockNum、myBufSize等参数设置内存池的使用模式、内存块大小、内存块数量以及缓存区大小。
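例如,下面是一段带有完整参数的定义示例。其中各参数的取值仅作示意,具体含义请以XMem构造函数的声明为准。
```
// 在编号为0的GPU设备上定义一个内存池
// 依次指定设备ID、使用模式、内存块大小、内存块数量以及缓存区大小(取值仅作示意)
XMem * mem = new XMem(0, UNI_FREE, MILLION * 64, 1024, MILLION * 64);
```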
* 内存池的使用
在定义好内存池之后,我们即可在该空间上进行变量的定义及使用了,这里以张量的定义为例,下面是一段示例代码。
```
// 声明一个变量tensor,它的类型是XTensor
XTensor tensor;
// 在内存池上初始化这个变量为50行*100列的矩阵(2阶张量)
InitTensor2D(&tensor, 50, 100, X_FLOAT, -1, mem);
```
我们可以看到,上述代码相对之前未使用内存池时的定义方式而言,仅需在定义的时候指定所使用的内存池即可,无需更复杂的操作。
* 内存池的释放
当希望完全释放内存池的时候,我们仅需直接将其删除即可,下面是一段示例代码。
```
// 删除内存池mem
delete mem;
```
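将上述三个步骤合在一起,一个最小的完整用例如下(仅作示意):
```
// 在CPU(devID为-1)上定义一个内存池
XMem * mem = new XMem(-1);

// 在内存池上初始化一个50行*100列的矩阵(2阶张量)
XTensor tensor;
InitTensor2D(&tensor, 50, 100, X_FLOAT, -1, mem);

// ... 在tensor上进行各种计算 ...

// 使用完毕后删除内存池
delete mem;
```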
## 实例1:矩阵乘法
NiuTrans.Tensor提供的矩阵乘法实例如下所示,详细代码见NiuTrans.Tensor/Tensor/sample/mul/
```
#include "mul.h"
namespace nts
{
void sampleMUL()
{
DTYPE aData[2][3] = { { 1.0F, 2.0F, 3.0F },
{ -4.0F, 5.0F, 6.0F } };
DTYPE bData[3][2] = { { 0.0F, -1.0F },
{ 1.0F, 2.0F },
{ 2.0F, 1.0F } };
DTYPE answer[2][2] = { { 8.0F, 6.0F },
{ 17.0F, 20.0F } };
XTensor a;
XTensor b;
XTensor result;

InitTensor2D(&a, 2, 3);
InitTensor2D(&b, 3, 2);

a.SetData(aData, 6);
b.SetData(bData, 6);
result = MatrixMul(a, X_NOTRANS, b, X_NOTRANS);
result.Dump(stderr, "result:");
if (result.CheckData(answer, 4))
fprintf(stderr, "answer is right\n");
}
void sampleMUL1()
{
DTYPE aData[2][3] = { { 1.0F, 2.0F, 3.0F },
{ -4.0F, 5.0F, 6.0F } };
DTYPE bData[3][2] = { { 0.0F, -1.0F },
{ 1.0F, 2.0F },
{ 2.0F, 1.0F } };
DTYPE answer[2][2] = { { 8.0F, 6.0F },
{ 17.0F, 20.0F } };
/* a source tensor of size (2, 3) */
int aOrder = 2;
int * aDimSize = new int[aOrder];
aDimSize[0] = 2;
aDimSize[1] = 3;
int aUnitNum = 1;
for (int i = 0; i < aOrder; i++)
aUnitNum *= aDimSize[i];
/* a source tensor of size (3, 2) */
int bOrder = 2;
int * bDimSize = new int[bOrder];
bDimSize[0] = 3;
bDimSize[1] = 2;
int bUnitNum = 1;
for (int i = 0; i < bOrder; i++)
bUnitNum *= bDimSize[i];
/* a target tensor of size (2, 2) */
int resultOrder = 2;
int * resultDimSize = new int[resultOrder];
resultDimSize[0] = 2;
resultDimSize[1] = 2;
int resultUnitNum = 1;
for (int i = 0; i < resultOrder; i++)
resultUnitNum *= resultDimSize[i];
XTensor * a = NewTensor(aOrder, aDimSize);
XTensor * b = NewTensor(bOrder, bDimSize);
XTensor * result = NewTensor(resultOrder, resultDimSize);
a->SetData(aData, aUnitNum);
b->SetData(bData, bUnitNum);
result->SetZeroAll();
_MatrixMul(a, X_NOTRANS, b, X_NOTRANS, result);
result->Dump(stderr, "result:");
}
}
```
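上述两段代码计算的均是同一个矩阵乘法,其期望结果(即代码中的answer)可以手工验证:
$$
\left(\begin{matrix}1 & 2 & 3 \\ -4 & 5 & 6\end{matrix}\right) \times \left(\begin{matrix}0 & -1 \\ 1 & 2 \\ 2 & 1\end{matrix}\right) = \left(\begin{matrix}8 & 6 \\ 17 & 20\end{matrix}\right)
$$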
## 实例2:前馈神经网络
NiuTrans.Tensor提供的语言模型任务上的前馈神经网络实例部分代码如下所示,主要是关于前馈神经网络语言模型上前向和反向训练的处理过程,详细代码见NiuTrans.Tensor/Tensor/sample/fnnlm/
```
/*
forward procedure
>> inputs - input word representations
>> output - output probability
>> model - the fnn model
>> net - the network that keeps the internal tensors generated in the process
*/
void Forward(XTensor inputs[], XTensor &output, FNNModel &model, FNNNet &net)
{
int batchSize = -1;
int n = model.n;
int depth = model.hDepth;
XList eList(n - 1);
/* previous n - 1 words */
for(int i = 0; i < n - 1; i++){
XTensor &input = inputs[i];
XTensor &w = model.embeddingW;
XTensor &embedding = net.embeddings[i];
if(batchSize == -1)
batchSize = input.dimSize[0];
else{
CheckErrors(batchSize == input.dimSize[0], "Wrong input word representations!");
}
/* embedding output tensor of position i */
InitModelTensor2D(embedding, batchSize, model.eSize, model);
/* generate word embedding of position i:
embedding = input * w */
_MatrixMul(&input, X_NOTRANS, &w, X_NOTRANS, &embedding);
eList.Add(&net.embeddings[i]);
}
/* concatenate word embeddings
embeddingcat = cat(embedding_0...embedding_{n-1}) */
InitModelTensor2D(net.embeddingCat, batchSize, (n - 1) * model.eSize, model);
_Concatenate(&eList, &net.embeddingCat, 1);
/* go over each hidden layer */
for(int i = 0; i < depth; i++){
XTensor &h_pre = i == 0 ? net.embeddingCat : net.hiddens[i - 1];
XTensor &w = model.hiddenW[i];
XTensor &b = model.hiddenB[i];
XTensor &h = net.hiddens[i];
XTensor &s = net.hiddenStates[i];
InitModelTensor2D(h, batchSize, model.hSize, model);
InitModelTensor2D(s, batchSize, model.hSize, model);
/* generate hidden states of layer i:
s = h_pre * w */
_MatrixMul(&h_pre, X_NOTRANS, &w, X_NOTRANS, &s);
/* make a 2d tensor for the bias term */
XTensor b2D;
InitTensor(&b2D, &s);
_Unsqueeze(&b, &b2D, 0, batchSize);
/* introduce bias term:
s = s + b
NOTE: the trick here is to extend b to a 2d tensor
to fit into the 2d representation in tensor summation */
_Sum(&s, &b2D, &s);
/* pass the state through the hard tanh function:
h = tanh(s) */
_HardTanH(&s, &h);
}
/* generate the output Pr(w_{n-1}|w_0...w_{n-2}):
y = softmax(h_last * w)
Note that this is the same implementation as that in Bengio et al.'s paper.
TODO: we add bias term here */
{
XTensor &h_last = depth > 0 ? net.hiddens[depth - 1] : net.embeddingCat;
XTensor &w = model.outputW;
XTensor &b = model.outputB;
XTensor &s = net.stateLast;
XTensor &y = output;
InitModelTensor2D(s, batchSize, model.vSize, model);
InitModelTensor2D(y, batchSize, model.vSize, model);
/* s = h_last * w */
_MatrixMul(&h_last, X_NOTRANS, &w, X_NOTRANS, &s);
XTensor b2D;
InitTensor(&b2D, &s);
_Unsqueeze(&b, &b2D, 0, batchSize);
_Sum(&s, &b2D, &s);
/* y = softmax(s) */
_LogSoftmax(&s, &y, 1);
}
}
/*
backward procedure
>> inputs - input word representations
>> output - output probability
>> gold - gold standard
>> loss - loss function name
>> model - the fnn model
>> grad - the model that keeps the gradient information
>> net - the network that keeps the internal tensors generated in the process
*/
void Backward(XTensor inputs[], XTensor &output, XTensor &gold, LOSS_FUNCTION_NAME loss,
FNNModel &model, FNNModel &grad, FNNNet &net)
{
int batchSize = output.GetDim(0);
int n = model.n;
int depth = model.hDepth;
/* back-propagation for the output layer */
XTensor &y = output;
XTensor &s = net.stateLast;
XTensor &x = depth > 0 ? net.hiddens[depth - 1] : net.embeddingCat;
XTensor &w = model.outputW;
XTensor &dedw = grad.outputW;
XTensor &dedb = grad.outputB;
XTensor deds(&y);
XTensor dedx(&x);
/* for y = softmax(s), we get dE/ds
where E is the error function (define by loss) */
_LogSoftmaxBackward(&gold, &y, &s, NULL, &deds, 1, loss);
/* for s = x * w, we get
dE/w_{i,j} = dE/ds_j * ds/dw_{i,j}
= dE/ds_j * x_{i}
(where i and j are the row and column indices, and
x is the top most hidden layer)
so we know
dE/dw = x^T * dE/ds */
_MatrixMul(&x, X_TRANS, &deds, X_NOTRANS, &dedw);
/* gradient of the bias: dE/db = dE/ds * 1 = dE/ds
specifically dE/db_{j} = \sum_{i} dE/ds_{i,j} */
_ReduceSum(&deds, &dedb, 0);
/* then, we compute
dE/dx_{j} = \sum_j' (dE/ds_{j'} * ds_{j'}/dx_j)
= \sum_j' (dE/ds_{j'} * w_{j, j'})
i.e.,
dE/dx = dE/ds * w^T */
_MatrixMul(&deds, X_NOTRANS, &w, X_TRANS, &dedx);
XTensor &gradPassed = dedx;
XTensor dedsHidden;
XTensor dedxBottom;
if (depth > 0)
InitTensor(&dedsHidden, &dedx);
InitTensor(&dedxBottom, &net.embeddingCat);
/* back-propagation from top to bottom in the stack of hidden layers
for each layer, h = f(s)
s = x * w + b */
for (int i = depth - 1; i >= 0; i--) {
XTensor &h = net.hiddens[i];
XTensor &s = net.hiddenStates[i];
XTensor &x = i == 0 ? net.embeddingCat : net.hiddenStates[i - 1];
XTensor &w = model.hiddenW[i];
XTensor &dedh = gradPassed; // gradient passed through the previous layer
XTensor &dedx = i == 0 ? dedxBottom : dedh;
XTensor &deds = dedsHidden;
XTensor &dedw = grad.hiddenW[i];
XTensor &dedb = grad.hiddenB[i];
/* backpropagation through the activation function:
dE/ds = dE/dh * dh/ds */
_HardTanHBackward(NULL, &h, &s, &dedh, &deds, NOLOSS);
/* gradient of the weight: dE/dw = x^T * dE/ds */
_MatrixMul(&x, X_TRANS, &deds, X_NOTRANS, &dedw);
/* gradient of the bias: dE/db = dE/ds * 1 = dE/ds
specifically dE/db_{j} = \sum_{i} dE/ds_{i,j} */
_ReduceSum(&deds, &dedb, 0);
/* gradient of the input: dE/dx = dE/ds * w^T */
_MatrixMul(&deds, X_NOTRANS, &w, X_TRANS, &dedx);
if (i > 0)
_CopyValues(&dedx, &gradPassed);
}
XList eList(n - 1);
/* back-propagation for the embedding layer */
for (int i = 0; i < n - 1; i++) {
XTensor * dedy = NewTensor2D(batchSize, model.eSize, X_FLOAT, model.devID, model.mem);
eList.Add(dedy);
}
/* gradient of the concatenation of the embedding layers */
XTensor &dedyCat = depth > 0 ? dedxBottom : dedx;
/* split the concatenation of gradients of the embeddings */
_Split(&dedyCat, &eList, 1, n - 1);
/* go over for each word */
for (int i = 0; i < n - 1; i++) {
XTensor * dedy = (XTensor*)eList.GetItem(i);
XTensor &x = inputs[i];
XTensor &dedw = grad.embeddingW;
/* gradient of the embedding weight: dE/dw += x^T * dE/dy
NOTE that we accumulate dE/dw here because the matrix w
is shared by several layers (or words) */
_MatrixMul(&x, X_TRANS, dedy, X_NOTRANS, &dedw, 1.0F, 1.0F);
delete dedy;
}
}
```
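上述前向过程可以概括为如下计算(按代码逻辑整理,省略维度细节):
$$
\begin{aligned}
e_i & = x_i \times W_{emb} \\
h_0 & = {\rm concat}(e_0, \dots, e_{n-2}) \\
h_k & = {\rm HardTanH}(h_{k-1} \times W_k + b_k) \\
y & = {\rm LogSoftmax}(h_{depth} \times W_{out} + b_{out})
\end{aligned}
$$
其中$W_{emb}$、$W_k$、$b_k$、$W_{out}$、$b_{out}$分别对应代码中的model.embeddingW、model.hiddenW[i]、model.hiddenB[i]、model.outputW和model.outputB。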
## 实例3:循环神经网络

## 致谢
| 成员变量 | 功能 |
| - | - |
| int id | 张量标识 |
| XMem * mem | 张量所使用的内存池 |
| void * data | 保存元素的数据数组 |
| void * dataHost | 主机内存上的数据副本,只在GPU上运行时被激活 |
| void ** dataP | 指向数据地址的指针 |
| int devID | 设备ID,指张量所申请的空间所在CPU或者GPU设备的编号,-1表示CPU |
| int order | 张量的维度,例如:一个矩阵(维度为2)是一个二维张量 |
| int dimSize[ ] | 张量中每一维度的大小,索引0表示第1维 |
| int dimSizeRDI[ ] | 转置模式下张量中每一维度的大小,索引0表示第1维 |
| TENSOR_DATA_TYPE dataType | 每个数据单元的数据类型 |
| int unitSize | 数据单元的大小,类似于sizeof() |
| int unitNum | 数据单元的数量 |
| int unitNumNonZero | 稀疏矩阵中非零元素个数 |
| float denseRatio | 稠密度,指非零单元的比例,是介于0和1之间的一个实数,0表示所有单元全为零,1表示全为非零单元。|
| bool isShared | 标志数据数组是否被其他张量所共享 |
| bool isDefaultDType | 矩阵中使用的数据类型是否是属于默认数据类型 |
| bool isInGlobalMem | 标志数据是否在全局内存而不是内存池中 |
| bool isAllValued[ ] | 标志稀疏矩阵中是否每个维度都具有非零元素 |
| bool isInit | 张量是否被初始化 |
| bool isTmp | 张量是否为临时创建 |
| bool isGrad | 当使用模型参数时张量是否保持梯度 |
| unsigned int visitMark | 节点访问标志 |
| XTensor * grad | 反向传播的梯度 |
| XLink income | 超边的入边 |
| XLink outgo | 超边的出边 |
在XTensor.h头文件中定义的方法说明:

| 功能 | 函数 | 参数 |
| - | - | - |
| 构造函数 | XTensor() | N/A |
| 析构函数 | ~XTensor() | N/A |
| 初始化成员变量 | void Init() | N/A |
| 销毁数据 | void DestroyData() | N/A |
| 张量的浅层复制 | void ShallowCopy(<br>const XTensor &tensor) | tensor - 进行复制的张量 |
| 重载等于符号 | XTensor& operator= (<br>const XTensor &tensor) | tensor - 重载的张量 |
| 重载加法符号 | XTensor operator+ (<br>const XTensor &tensor) | tensor - 重载的张量 |
| 重载乘法符号 | XTensor operator* (<br>const XTensor &tensor) | tensor - 重载的张量 |
| 线性变换 | XTensor Lin(<br>DTYPE scale, DTYPE shift = 0) | scale - 缩放参数 <br> shift - 偏移参数 |
| 判断两个张量数据类型<br>和大小是否相同 | static bool IsIdentical(<br> XTensor * a, XTensor * b) | a - 进行比较的第一个张量 <br> b - 进行比较的第二个张量 |
| 判断三个张量数据类型<br>和大小是否相同 | static bool IsIdentical(<br> XTensor * a, XTensor * b, XTensor * c) | a - 进行比较的第一个张量 <br> b - 进行比较的第二个张量 <br> c - 进行比较的第三个张量 |
| 设置张量每一维度的大小 | void SetDim(<br>int * myDimSize) | myDimSize - 张量每一维度的大小 |
| 得到张量中给定的维度大小 | int GetDim(<br>const int dim) | dim - 张量的维度 |
| 重新调整矩阵维度 | void Reshape(<br> const int order, const int * myDimSize) | order - 张量的维度 <br> myDimSize - 张量每一维的大小 |
| 得到张量中元素数量 | int GetSize() | N/A |
| 得到内存使用大小 | int GetDataSizeInChar() | N/A |
| 设置张量服从均匀分布 | void SetDataRand(<br> DTYPE lower, DTYPE upper) | lower - 最小值 <br> upper - 最大值 |
| 设置张量服从正态分布 | void SetDataRandn(<br> DTYPE mean, DTYPE standardDeviation) | mean - 均值 <br> standardDeviation - 标准差 |
| 检查张量中元素是否相同 | bool CheckData(<br> const void * answer, int num, int beg = 0) | answer - 给定数组 <br> num - 数组大小 <br> beg - 赋值时从张量的第几位开始 |
| 设置数据指针 | void SetDataPointer() | N/A |
| 将给定维度中元素<br> 设置为升序 | void SetAscendingOrder(<br>int dim) | dim - 给定维度 |
| 得到索引指向的单元的值 | DTYPE Get(int index[], int size = -1) | index - 给定索引 <br> size - 矩阵大小 |
| 获取张量中元素指针 | void * GetCell(<br>int * index, int size) | index - 元素位置 <br> size - 矩阵大小 |
| 获取一维张量中元素的<br>默认类型值 | DTYPE Get1D(<br>int i) | i - 第一维 |
| 获取二维张量中元素的<br>默认类型值 | DTYPE Get2D(<br>int ni, int mi) const | ni - 第一维 <br> mi - 第二维 |
| 获取三维张量中元素的<br>默认类型值 | DTYPE Get3D(<br>int d0, int d1, int d2) | d0 - 第一维 <br> d1 - 第二维 <br> d2 - 第三维 |
| 获取一维张量中元素的<br>整形值 |int Get1DInt(<br>int i) | i - 第一维 |
| 获取二维张量中元素的<br>整形值 | int Get2DInt(<br>int ni, int mi) | ni - 第一维 <br> mi - 第二维 |
| 获取三维张量中元素的整形值 | int Get3DInt(<br>int d0, int d1, int d2) | d0 - 第一维 <br> d1 - 第二维 <br> d2 - 第三维 |
| 获取稀疏张量的值 | DTYPE GetInSparse(int i) | i - 稀疏矩阵中非0元素位置 |
| 获取稀疏张量中<br> 元组的键值 | int GetKeyInSparse(int i) | i - 稀疏矩阵中非0元素位置 |
| 设置单元中的值 | bool Set(<br>DTYPE value, int index[], int size = -1) | value - 值 <br> index - 元素位置 <br> size - 矩阵大小 |
| 设置一维张量中的单元值 | bool Set1D(<br>DTYPE value, int i) | value - 值 <br> i - 第一维 |
| 设置二维张量中的单元值 | bool Set2D(<br>DTYPE value, int ni, int mi) | value - 值 <br> ni - 第一维 <br> mi - 第二维 |
| 设置三维张量中的单元值 | bool Set3D(<br>DTYPE value, int d0, int d1, int d2) | value - 值 <br> d0 - 第一维 <br> d1 - 第二维 <br> d2 - 第三维 |
| 增加二维张量中<br> 的单元值 | bool Add2D(<br>DTYPE value, int ni, int mi = 0) | value - 单元值 <br> ni - 行值 <br> mi - 列值 |
| 获取稀疏矩阵中<br> 非零元素数量 | int GetNonzeroSize() | N/A |
| 设置张量为临时变量 | void SetTMP(<br>bool myIsTmp = true) | myIsTmp - 是否为临时变量 |
| 张量是否保持梯度 | void SetGrad(<br>bool myIsGrad = true) | myIsGrad - 是否保持梯度 |
| 将矩阵重置为特定大小 | bool Resize(<br> const int myOrder, <br> const int * myDimSize, <br> const TENSOR_DATA_TYPE myDataType = DEFAULT_DTYPE, <br> const float myDenseRatio = 1.0F) | myOrder - 张量的维度 <br> myDimSize - 张量每一维的大小,索引0表示第一维 <br> myDataType - 张量的数据类型 <br> myDenseRatio - 张量的稠密度,1表示稠密张量 |
| 将矩阵重置为特定大小<br>并不申请新空间 | bool ResizeWithNoData(<br> const int myOrder, <br> const int * myDimSize, <br> const TENSOR_DATA_TYPE myDataType = DEFAULT_DTYPE, <br> const float myDenseRatio = 1.0F) | myOrder - 张量的维度 <br> myDimSize - 张量每一维的大小,索引0表示第一维 <br> myDataType - 张量的数据类型 <br> myDenseRatio - 张量的稠密度,1表示稠密张量 |
| 将矩阵重置为<br> 另一矩阵大小 | bool Resize(<br> const XTensor * myTensor) | myTensor - 重置矩阵大小的参考矩阵 |
| 在缓冲区创建张量 | XTensor * NewTensorBuf( <br> const int myOrder, <br> const int * myDimSize, XMem * myMem, <br> const TENSOR_DATA_TYPE myDataType = <br> X_FLOAT, const float myDenseRatio = 1.0F) | myOrder - 张量的维度 <br> myDimSize - 张量每一维的大小,索引0表示第一维 <br> myMem - 张量所使用的内存池 <br> myDataType - 张量的数据类型 <br> myDenseRatio - 张量的稠密度,1表示稠密张量 |
| 依据给定张量<br>复制一个新的张量 | XTensor * NewTensor(<br>XTensor * a, bool isFilledData = true) | a - 给定张量 <br> isFilledData - 是否申请张量中的数据空间 |
| 依据给定张量<br>释放数据空间 | void DelTensor(<br>const XTensor * tensor) | tensor - 给定张量 |
| 依据给定张量<br>在缓存中释放数据空间 | void DelTensorBuf(<br>const XTensor * tensor) | tensor - 给定张量 |
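结合上表,下面给出一段创建并释放张量的简单示例(仅作示意):
```
/* create a 2 * 3 tensor on the default device (CPU) */
int dimSize[2] = {2, 3};
XTensor * t = NewTensor(2, dimSize);

/* initialize it with zeros */
t->SetZeroAll();

/* release the tensor when it is no longer needed */
DelTensor(t);
```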
...@@ -21,6 +21,7 @@ ...@@ -21,6 +21,7 @@
#include <stdio.h>
#include "XNet.h"
#include "../tensor/XUtility.h"
#include "../tensor/function/FHeader.h"
#include "../tensor/core/CHeader.h"
#include "../sample/fnnlm/FNNLM.h"
...@@ -29,13 +30,20 @@ ...@@ -29,13 +30,20 @@
//#include <stdlib.h>
//#include <crtdbg.h>

void TransposeTest();
void SumDimTest();

using namespace nts;
using namespace fnnlm;

int main( int argc, const char ** argv )
{
//TransposeTest();
//return 0;
//SumDimTest();
//return 0;
if(argc > 1 && !strcmp(argv[1], "-test")) if(argc > 1 && !strcmp(argv[1], "-test"))
1;//Test(); 1;//Test();
else if(argc > 1 && !strcmp(argv[1], "-fnnlm")) else if(argc > 1 && !strcmp(argv[1], "-fnnlm"))
...@@ -47,6 +55,8 @@ int main( int argc, const char ** argv ) ...@@ -47,6 +55,8 @@ int main( int argc, const char ** argv )
fprintf(stderr, "Or run this program with \"-fnnlm\" for sample FNNLM!\n"); fprintf(stderr, "Or run this program with \"-fnnlm\" for sample FNNLM!\n");
} }
return 0;
XNet net; XNet net;
XTensor a; XTensor a;
XTensor b; XTensor b;
...@@ -80,3 +90,116 @@ int main( int argc, const char ** argv ) ...@@ -80,3 +90,116 @@ int main( int argc, const char ** argv )
return 0;
}
void TransposeTest()
{
#ifdef USE_CUDA
XMem mem0(0, UNI_FREE, MILLION * 64, 1024, MILLION * 64);
//XMem mem1(1, UNI_FREE, MILLION * 64, 1024, MILLION * 64);
XTensor x;
XTensor y;
XTensor z;
int loops = 2000;
int B = 3 * 2 * 4;
int K = 8 * 1;
int N = 50;
int H = 512 * 4;
int nnn = GDevs.nGPU;
InitTensor3D(&x, B, N, H, X_FLOAT, 0);
InitTensor4D(&y, K, B, N, H/K, X_FLOAT, 0);
InitTensor3D(&z, B, N, H, X_FLOAT, 0);
cudaEvent_t ctime0;
cudaEvent_t ctime1;
cudaEvent_t ctime2;
cudaEvent_t ctime3;
cudaEvent_t ctime4;
cudaEvent_t ctime5;
float elapsedSplit = 0.0;
float elapsedMerge = 0.0;
float elapsedSum = 0.0;
cudaEventCreate(&ctime0);
cudaEventCreate(&ctime1);
cudaEventCreate(&ctime2);
cudaEventCreate(&ctime3);
cudaEventCreate(&ctime4);
cudaEventCreate(&ctime5);
cudaEventRecord(ctime0, 0);
double time0 = GetClock();
for(int i = 0; i < loops; i++)
_Split(&x, &y, 2, K);
double time1 = GetClock();
cudaEventRecord(ctime1, 0);
cudaEventSynchronize(ctime1);
cudaEventElapsedTime(&elapsedSplit, ctime0, ctime1);
cudaEventRecord(ctime2, 0);
double time2 = GetClock();
for(int i = 0; i < loops; i++)
_Merge(&y, &x, 3);
double time3 = GetClock();
cudaEventRecord(ctime3, 0);
cudaEventSynchronize(ctime3);
cudaEventElapsedTime(&elapsedMerge, ctime2, ctime3);
cudaEventRecord(ctime4, 0);
double time4 = GetClock();
for(int i = 0; i < loops; i++)
_Sum(&x, &z, &x);
double time5 = GetClock();
cudaEventRecord(ctime5, 0);
cudaEventSynchronize(ctime5);
cudaEventElapsedTime(&elapsedSum, ctime4, ctime5);
fprintf(stderr, "split:%f merge:%f sum:%f\n", time1 - time0, time3 - time2, time5 - time4);
fprintf(stderr, "split:%f merge:%f sum:%f\n", elapsedSplit, elapsedMerge, elapsedSum);
#endif
}
void SumDimTest()
{
XTensor x;
XTensor y;
XTensor z;
int a = 5;
int b = 7;
int c = 3;
InitTensor3D(&x, a, b, c, X_FLOAT, -1);
InitTensor1D(&y, c, X_FLOAT, -1);
InitTensor3D(&z, a, b, c, X_FLOAT, -1);
x.SetZeroAll();
y.SetZeroAll();
z.SetZeroAll();
float * data = new float[x.unitNum];
for(int i = 0; i < x.unitNum; i++)
data[i] = (DTYPE)i;
x.SetData(data, x.unitNum);
for(int i = 0; i < y.unitNum; i++)
data[i] = -(DTYPE)i;
y.SetData(data, y.unitNum);
_SumDim(&x, &y, &z, 2);
z.Dump(stderr, "z:");
delete[] data;
}
...@@ -63,6 +63,8 @@ void XFuncGrad::MakeGrad(XTensor * node) ...@@ -63,6 +63,8 @@ void XFuncGrad::MakeGrad(XTensor * node)
else{
ShowNTErrors("Wrong activation function type!");
}

node->visitMark = NODE_FINISHED;
}

/* indicates whether the node is for an activation function */
...@@ -37,10 +37,46 @@ void XMathGrad::MakeGrad(XTensor * node) ...@@ -37,10 +37,46 @@ void XMathGrad::MakeGrad(XTensor * node)
if(operID == MATH_SUM)
GradSum(node);
else if(operID == MATH_SUMDIM)
GradSumDim(node);
else if(operID == MATH_MULTIPLY) else if(operID == MATH_MULTIPLY)
GradMultiply(node); GradMultiply(node);
else if(operID == MATH_MATRIXMUL) else if(operID == MATH_MATRIXMUL)
GradMatrixMul(node); GradMatrixMul(node);
else if (operID == MATH_LOG)
GradLog(node);
else if (operID == MATH_POWER)
GradPower(node);
else if (operID == MATH_NEGATE)
GradNegate(node);
else if (operID == MATH_SCALEANDSHIFT)
GradScaleAndShift(node);
else if (operID == MATH_DIV)
GradDiv(node);
else if (operID == MATH_SUB)
GradSub(node);
else if (operID == MATH_SIN)
GradSin(node);
else if (operID == MATH_COS)
GradCos(node);
else if (operID == MATH_TAN)
GradTan(node);
else if (operID == MATH_EXP)
GradExp(node);
else if (operID == MATH_NORMALIZE)
GradNormalize(node);
else if (operID == MATH_ABSOLUTE)
GradAbsolute(node);
else if (operID == MATH_SIGN)
GradSign(node);
else if (operID == REDUCE_REDUCEMEAN)
GradReduceMean(node);
else if (operID == REDUCE_REDUCESUM)
GradReduceSum(node);
else if (operID == REDUCE_REDUCESUMSQUARED)
GradReduceSumSquared(node);
else if (operID == REDUCE_REDUCEVARIANCE)
GradReduceVariance(node);
else{
ShowNTErrors("TODO!");
}
...@@ -70,11 +106,108 @@ void XMathGrad::GradSum(XTensor * node) ...@@ -70,11 +106,108 @@ void XMathGrad::GradSum(XTensor * node)
XTensor * a = income.tails[0];
XTensor * b = income.tails[1];
DTYPE beta = income.GetParam(0);

XNoder::MakeGrad(a);
XNoder::MakeGrad(b);

_Sum(a->grad, node->grad, a->grad);
_Sum(b->grad, node->grad, b->grad, beta);
node->visitMark = NODE_FINISHED;
}
/*
gradient for sum with one dimension
c = a + b * \beta
where the size of b is equal to dimension n of a, i.e., |b| = a.dimSize[n]
dE/da = dE/dc
dE/db = dE/dc.reduce(0,...,n-1,n+1,...) * \beta
*/
void XMathGrad::GradSumDim(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 2, "Wrong input tensor number for SUMDIM!");
XTensor * a = income.tails[0];
XTensor * b = income.tails[1];
int n = income.GetParamInt(0);
DTYPE beta = income.GetParam(1);
XNoder::MakeGrad(a);
XNoder::MakeGrad(b);
_Sum(a->grad, node->grad, a->grad);
int order = a->order;
int dimSize[MAX_TENSOR_DIM_NUM];
memcpy(dimSize, a->dimSize, sizeof(int) * a->order);
if(n == order - 1){
int reshapedSize[MAX_TENSOR_DIM_NUM];
reshapedSize[0] = a->unitNum/dimSize[order - 1];
reshapedSize[1] = dimSize[order - 1];
/* we reshape dE/dc to a matrix whose column number is equal to the
size of b. Then we can reduce the matrix into a row vector. */
node->grad->Reshape(2, reshapedSize);
if(b->outgo.tailNum > 1){
XTensor * bGradTMP = NewTensorBuf(b->grad, b->devID, b->mem);
_ReduceSum(node->grad, bGradTMP, 0);
if(beta != 1.0F)
_ScaleAndShiftMe(bGradTMP, beta);
_Sum(bGradTMP, b->grad, b->grad);
DelTensorBuf(bGradTMP);
}
else{
_ReduceSum(node->grad, b->grad, 0);
if(beta != 1.0F)
_ScaleAndShiftMe(b->grad, beta);
}
node->grad->Reshape(order, dimSize);
}
else{
int reshapedSize[MAX_TENSOR_DIM_NUM];
reshapedSize[0] = 1;
reshapedSize[1] = dimSize[n];
reshapedSize[2] = 1;
for(int i = 0; i < order; i++){
if(i < n)
reshapedSize[0] *= dimSize[i];
}
reshapedSize[2] = a->unitNum / (reshapedSize[0] * reshapedSize[1]);
/* we reshape dE/dc to a 3D tensor of size (x, y, z) where y = |b|.
Then reduce along with z and x to obtain dE/db. */
node->grad->Reshape(3, reshapedSize);
XTensor * interGrad = NewTensorBuf(2, reshapedSize, b->dataType, b->denseRatio, b->devID, b->mem);
_ReduceSum(node->grad, interGrad, 2);
if(b->outgo.tailNum > 1){
XTensor * bGradTMP = NewTensorBuf(b->grad, b->devID, b->mem);
_ReduceSum(interGrad, bGradTMP, 0);
if(beta != 1.0F)
_ScaleAndShiftMe(bGradTMP, beta);
_Sum(bGradTMP, b->grad, b->grad);
DelTensorBuf(bGradTMP);
}
else{
_ReduceSum(interGrad, b->grad, 0);
if(beta != 1.0F)
_ScaleAndShiftMe(b->grad, beta);
}
node->grad->Reshape(order, dimSize);
DelTensorBuf(interGrad);
}
node->visitMark = NODE_FINISHED;
} }
/* /*
...@@ -99,6 +232,8 @@ void XMathGrad::GradMultiply(XTensor * node) ...@@ -99,6 +232,8 @@ void XMathGrad::GradMultiply(XTensor * node)
CheckNTErrors(XTensor::IsSameShaped(a, b), "Wrong sized input tensors!");

_Multiply(node->grad, b, a->grad, 1.0F);
_Multiply(node->grad, a, b->grad, 1.0F);

node->visitMark = NODE_FINISHED;
}

/*
...@@ -167,6 +302,557 @@ void XMathGrad::GradMatrixMul(XTensor * node) ...@@ -167,6 +302,557 @@ void XMathGrad::GradMatrixMul(XTensor * node)
/* dE/db = a * dE/dc * \alpha */
_MatrixMul(a, X_NOTRANS, dedc, X_NOTRANS, dedb, alpha, 1.0F);
}
node->visitMark = NODE_FINISHED;
}
/*
gradient for log
for
c = log(a)
we have
dE/da = dE/dc * 1/a
>> node - the node (c) for backward computation
*/
void XMathGrad::GradLog(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 1, "Wrong input tensor number for LOG!");
XTensor * a = income.tails[0];
XNoder::MakeGrad(a);
_Div(node->grad, a, a->grad, 1.0F);
node->visitMark = NODE_FINISHED;
}
/*
gradient for power
for
c = pow(a,p)
we have
dE/da = (dE/dc) * p*a^(p-1)
>> node - the node (c) for backward computation
*/
void XMathGrad::GradPower(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 1, "Wrong input tensor number for POWER!");
XTensor * a = income.tails[0];
XTensor * b = NewTensor(a);
XTensor * c = NewTensor(a);
DTYPE p = income.GetParam(0);
XNoder::MakeGrad(a);
_Power(a, b, p - 1.0F);
_ScaleAndShift(b, c, p);
_Multiply(node->grad, c, a->grad, 1.0F);
node->visitMark = NODE_FINISHED;
delete b;
delete c;
}
/*
gradient for negate
for
c = -a
we have
dE/da = dE/dc * (-1)
>> node - the node (c) for backward computation
*/
void XMathGrad::GradNegate(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 1, "Wrong input tensor number for NEGATE!");
XTensor * a = income.tails[0];
XTensor * b = NewTensor(a);
XNoder::MakeGrad(a);
_ScaleAndShift(node->grad, b, -1.0F);
_Sum(a->grad, b, a->grad);
node->visitMark = NODE_FINISHED;
delete b;
}
/*
gradient for ScaleAndShift
for
c = a * scale + shift
we have
dE/da = dE/dc * scale
>> node - the node (c) for backward computation
*/
void XMathGrad::GradScaleAndShift(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 1, "Wrong input tensor number for SCALEANDSHIFT!");
XTensor * a = income.tails[0];
XTensor * b = NewTensor(a);
DTYPE scale = income.GetParam(0);
XNoder::MakeGrad(a);
_ScaleAndShift(node->grad, b, scale);
_Sum(a->grad, b, a->grad);
node->visitMark = NODE_FINISHED;
delete b;
}
/*
gradient for minus
for
c = a - b * \beta
we have
dE/da = dE/dc
dE/db = -dE/dc * \beta
>> node - the node (c) for backward computation
*/
void XMathGrad::GradSub(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 2, "Wrong input tensor number for SUBSTRACT!");
XTensor * a = income.tails[0];
XTensor * b = income.tails[1];
DTYPE beta = income.GetParam(0);
XNoder::MakeGrad(a);
XNoder::MakeGrad(b);
_Sum(a->grad, node->grad, a->grad);
_Sum(b->grad, node->grad, b->grad, -beta);
node->visitMark = NODE_FINISHED;
}
/*
gradient for divide
for
c = a / b
we have
dE/da = dE/dc / b
dE/db = dE/dc * a / -b^2
>> node - the node (c) for backward computation
*/
void XMathGrad::GradDiv(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 2, "Wrong input tensor number for DIVIDE!");
XTensor * a = income.tails[0];
XTensor * b = income.tails[1];
XTensor * c = NewTensor(b);
XTensor * d = NewTensor(b);
XTensor * e = NewTensor(b);
XNoder::MakeGrad(a);
XNoder::MakeGrad(b);
CheckNTErrors(XTensor::IsSameShaped(a, b), "Wrong sized input tensors!");
_Div(node->grad, b, a->grad, 1.0F);
_Power(b, c, -2.0F);
_Multiply(a, c, d);
_ScaleAndShift(d, e, -1.0F);
_Multiply(node->grad, e, b->grad, 1.0F);
node->visitMark = NODE_FINISHED;
delete c;
delete d;
delete e;
}
/*
gradient for exp
for
c = exp(a)
we have
dE/da = dE/dc * exp(a)
>> node - the node (c) for backward computation
*/
void XMathGrad::GradExp(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 1, "Wrong input tensor number for EXP!");
XTensor * a = income.tails[0];
XTensor * b = NewTensor(a);
XNoder::MakeGrad(a);
_Exp(a, b);
_Multiply(node->grad, b, a->grad, 1.0F);
node->visitMark = NODE_FINISHED;
delete b;
}
/*
gradient for sin
for
c = sin(a)
we have
dE/da = dE/dc * cos(a)
>> node - the node (c) for backward computation
*/
void XMathGrad::GradSin(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 1, "Wrong input tensor number for SIN!");
XTensor * a = income.tails[0];
XTensor * b = NewTensor(a);
XNoder::MakeGrad(a);
_Cos(a, b);
_Multiply(node->grad, b, a->grad, 1.0F);
node->visitMark = NODE_FINISHED;
delete b;
}
/*
gradient for cos
for
c = cos(a)
we have
dE/da = dE/dc * -sin(a)
>> node - the node (c) for backward computation
*/
void XMathGrad::GradCos(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 1, "Wrong input tensor number for COS!");
XTensor * a = income.tails[0];
XTensor * b = NewTensor(a);
XTensor * c = NewTensor(a);
XNoder::MakeGrad(a);
_Sin(a, b);
_ScaleAndShift(b, c, -1.0F);
_Multiply(node->grad, c, a->grad, 1.0F);
node->visitMark = NODE_FINISHED;
delete b;
delete c;
}
/*
gradient for tan
for
c = tan(a)
we have
dE/da = dE/dc * 1/(cos(a))^2
>> node - the node (c) for backward computation
*/
void XMathGrad::GradTan(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 1, "Wrong input tensor number for TAN!");
XTensor * a = income.tails[0];
XTensor * b = NewTensor(a);
XTensor * c = NewTensor(a);
XNoder::MakeGrad(a);
_Cos(a, b);
_Power(b, c, -2.0F);
_Multiply(node->grad, c, a->grad, 1.0F);
node->visitMark = NODE_FINISHED;
delete b;
delete c;
}
/*
gradient for normalize
>> node - the node (c) for backward computation
*/
void XMathGrad::GradNormalize(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 5, "Wrong input tensor number for NORMALIZE!");
XTensor * input = income.tails[0];
XTensor * mean = income.tails[1];
XTensor * var = income.tails[2];
XTensor * a = income.tails[3];
XTensor * b = income.tails[4];
XTensor * c = NewTensor(a);
XTensor * d = NewTensor(a);
XTensor * e = NewTensor(a);
XTensor * f = NewTensor(a);
XTensor * g = NewTensor(a);
XTensor * h = NewTensor(a);
XTensor * i = NewTensor(a);
XTensor * j = NewTensor(a);
XTensor * k = NewTensor(a);
XTensor * p = NewTensor(a);
XTensor * q = NewTensor(a);
XTensor * r = NewTensor(a);
DTYPE epsilon = income.GetParamInt(0);
int dim = income.GetParamInt(0);
int n = a->GetDim(dim);
XNoder::MakeGrad(input);
XNoder::MakeGrad(mean);
XNoder::MakeGrad(var);
XNoder::MakeGrad(a);
XNoder::MakeGrad(b);
/* dEdinput */
_ScaleAndShift(var, c, 1.0F, epsilon);
_Unsqueeze(c, d, dim, n);
_Power(d, e, -0.5F);
_Multiply(a, e, f);
_Multiply(node->grad, f, input->grad, 1.0F);
/* dEdmean */
_ScaleAndShift(f, g, -1.0F);
_Multiply(node->grad, g, mean->grad, 1.0F);
/* dEdvar */
_Unsqueeze(mean, h, dim, n);
_Sub(input, h, i);
_Multiply(a, i, j);
_Power(var, k, -1.5F);
_ScaleAndShift(k, p, -0.5F);
_Multiply(j, p, q);
_Multiply(node->grad, q, var->grad, 1.0F);
/* dEda */
_Multiply(i, e, r);
_Multiply(node->grad, r, a->grad, 1.0F);
/* dEdb */
_Sum(b->grad, node->grad, b->grad);
node->visitMark = NODE_FINISHED;
delete c;
delete d;
delete e;
delete f;
delete g;
delete h;
delete i;
delete j;
delete k;
delete p;
delete q;
delete r;
}
/*
gradient for absolute
for
c = |a|
we have
dE/da = dE/dc a >= 0
-dE/dc a < 0
>> node - the node (c) for backward computation
*/
void XMathGrad::GradAbsolute(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 1, "Wrong input tensor number for ABSOLUTE!");
XTensor * a = income.tails[0];
XTensor * b = NewTensor(a);
XNoder::MakeGrad(a);
_Sign(a, b);
_Multiply(node->grad, b, a->grad, 1.0F);
node->visitMark = NODE_FINISHED;
delete b;
}
/*
gradient for sign
for
c = sign(a)
we have
dE/da = 0
>> node - the node (c) for backward computation
*/
void XMathGrad::GradSign(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 1, "Wrong input tensor number for SIGN!");
XTensor * a = income.tails[0];
XTensor * b = NewTensor(a);
XNoder::MakeGrad(a);
b->SetZeroAll();
_Sum(a->grad, b, a->grad);
node->visitMark = NODE_FINISHED;
delete b;
}
/*
gradient for reduceMean
for
c = reduceMean(a, dim)
we have
dE/da = Unsqueeze(dE/dc) * 1/dimSizeA[dim]
>> node - the node (c) for backward computation
*/
void XMathGrad::GradReduceMean(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 1, "Wrong input tensor number for Reduce!");
XTensor * a = income.tails[0];
XTensor * b = NewTensor(a);
XTensor * c = NewTensor(a);
int dim = income.GetParamInt(0);
int n = a->GetDim(dim);
XNoder::MakeGrad(a);
_Unsqueeze(node->grad, b, dim, n);
_ScaleAndShift(b, c, 1.0F / n);
_Sum(a->grad, c, a->grad);
node->visitMark = NODE_FINISHED;
delete b;
delete c;
}
/*
gradient for reduceSum
for
c = reduceSum(a, dim)
we have
dE/da = Unsqueeze(dE/dc) * 1
>> node - the node (c) for backward computation
*/
void XMathGrad::GradReduceSum(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 1, "Wrong input tensor number for Reduce!");
XTensor * a = income.tails[0];
XTensor * b = NewTensor(a);
int dim = income.GetParamInt(0);
int n = a->GetDim(dim);
XNoder::MakeGrad(a);
_Unsqueeze(node->grad, b, dim, n);
_Sum(a->grad, b, a->grad);
node->visitMark = NODE_FINISHED;
delete b;
}
/*
gradient for reduceSumSquared
for
c = reduceSumSquared(a, dim, b)
we have
dE/da = Unsqueeze(dE/dc) * 2a
dE/db = Unsqueeze(dE/dc) * (-2b)
>> node - the node (c) for backward computation
*/
void XMathGrad::GradReduceSumSquared(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 2, "Wrong input tensor number for Reduce!");
XTensor * a = income.tails[0];
XTensor * b = income.tails[1];
XTensor * c = NewTensor(a);
XTensor * d = NewTensor(b);
XTensor * e = NewTensor(c);
int dim = income.GetParamInt(0);
int n = a->GetDim(dim);
XNoder::MakeGrad(a);
XNoder::MakeGrad(b);
_ScaleAndShift(a, c, 2.0F);
_ScaleAndShift(b, d, -2.0F);
_Unsqueeze(node->grad, e, dim, n);
_Multiply(e, c, a->grad, 1.0F);
_Multiply(node->grad, d, b->grad, 1.0F);
node->visitMark = NODE_FINISHED;
delete c;
delete d;
delete e;
}
/*
gradient for reduceVariance
for
c = reduceVariance(a, dim, b)
we have
dE/da = Unsqueeze(dE/dc) * 2a/dimSizeA[dim]
dE/db = Unsqueeze(dE/dc) * (-2a/dimSizeA[dim])
>> node - the node (c) for backward computation
*/
void XMathGrad::GradReduceVariance(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 2, "Wrong input tensor number for Reduce!");
XTensor * a = income.tails[0];
XTensor * b = income.tails[1];
XTensor * c = NewTensor(a);
XTensor * d = NewTensor(b);
XTensor * e = NewTensor(a);
int dim = income.GetParamInt(0);
int n = a->GetDim(dim);
XNoder::MakeGrad(a);
XNoder::MakeGrad(b);
_ScaleAndShift(a, c, 2.0F / n);
_ScaleAndShift(b, d, -2.0F / n);
_Unsqueeze(node->grad, e, dim, n);
_Multiply(e, c, a->grad, 1.0F);
_Multiply(node->grad, d, b->grad, 1.0F);
node->visitMark = NODE_FINISHED;
delete c;
delete d;
delete e;
}
}
...@@ -44,6 +44,11 @@ private: ...@@ -44,6 +44,11 @@ private:
static
void GradSum(XTensor * node);
/* gradient for sum with one dimension: c = a + b * \beta
where the size of b is equal to that of one dimension of a */
static
void GradSumDim(XTensor * node);
/* gradient for multiply (dot product): c = a * b */
static
void GradMultiply(XTensor * node);
...@@ -51,6 +56,74 @@ private: ...@@ -51,6 +56,74 @@ private:
/* gradient for matrix multiply: c = matmul(a, b) */
static
void GradMatrixMul(XTensor * node);
/* gradient for log: c = log(a) */
static
void GradLog(XTensor * node);
/* gradient for power */
static
void GradPower(XTensor * node);
/* gradient for negate */
static
void GradNegate(XTensor * node);
/* gradient for ScaleAndShift */
static
void GradScaleAndShift(XTensor * node);
/* gradient for Minus */
static
void GradSub(XTensor * node);
/* gradient for Divide */
static
void GradDiv(XTensor * node);
/* gradient for reduceMean */
static
void GradReduceMean(XTensor * node);
/* gradient for reduceSum */
static
void GradReduceSum(XTensor * node);
/* gradient for reduceSumSquared */
static
void GradReduceSumSquared(XTensor * node);
/* gradient for reduceVariance */
static
void GradReduceVariance(XTensor * node);
/* gradient for sin */
static
void GradSin(XTensor * node);
/* gradient for cos */
static
void GradCos(XTensor * node);
/* gradient for tan */
static
void GradTan(XTensor * node);
/* gradient for exp */
static
void GradExp(XTensor * node);
/* gradient for normalize */
static
void GradNormalize(XTensor * node);
/* gradient for absolute */
static
void GradAbsolute(XTensor * node);
/* gradient for sign */
static
void GradSign(XTensor * node);
};

}
...@@ -43,6 +43,12 @@ void XShapeGrad::MakeGrad(XTensor * node) ...@@ -43,6 +43,12 @@ void XShapeGrad::MakeGrad(XTensor * node)
GradMergeList(node);
else if(operID == SHAPE_UNSQUEEZE)
GradUnsqueeze(node);
else if(operID == SHAPE_SPLIT)
GradSplit(node);
else if(operID == SHAPE_SPLIT_LIST)
GradSplitList(node);
else if (operID == SHAPE_TRANSPOSE)
GradTranspose(node);
else{
ShowNTErrors("TODO!");
}
...@@ -55,6 +61,13 @@ bool XShapeGrad::IsShapeOP(XTensor * node) ...@@ -55,6 +61,13 @@ bool XShapeGrad::IsShapeOP(XTensor * node)
return (income.typeID & DATA_BASE) != 0;
}
/* post processing of a node */
void XShapeGrad::PostProcessing(XTensor * node, int typeID)
{
if(typeID == SHAPE_SPLIT_LIST)
GradSplitListPost(node);
}
/* /*
gradient for merge gradient for merge
for for
...@@ -134,6 +147,8 @@ void XShapeGrad::GradMerge(XTensor * node) ...@@ -134,6 +147,8 @@ void XShapeGrad::GradMerge(XTensor * node)
gradInputSmall.data = NULL;

delete[] dims;
node->visitMark = NODE_FINISHED;
}

/*
...@@ -213,6 +228,120 @@ void XShapeGrad::GradMergeList(XTensor * node) ...@@ -213,6 +228,120 @@ void XShapeGrad::GradMergeList(XTensor * node)
gradSmall.data = NULL;
delete[] dims;
}
node->visitMark = NODE_FINISHED;
}
/*
gradient computation for split:
for
c = split(a)
we have
dE/da = merge(dE/dc)
>> node - the node (c) for backward computation
*/
void XShapeGrad::GradSplit(XTensor * node)
{
XLink &income = node->income;
XTensor * input = income.tails[0];
int whereToSplit = income.GetParamInt(0);
int splitNum = income.GetParamInt(1);
CheckNTErrors(income.tailNum == 1, "Wrong input tensor number for SPLIT!");
CheckNTErrors(node->order == input->order + 1, "Wrong tensor orders!");
CheckNTErrors(splitNum == node->dimSize[0], "Wrong split number!");
XNoder::MakeGrad(input);
/* we can simply merge the gradient tensor
if the input is used in spliting only */
if(input->outgo.tailNum == 1)
_Merge(node->grad, input->grad, whereToSplit + 1, 0);
/* if the tensor is used somewhere else, we need another SUM
for gradient accumulation */
else{
XTensor inputGradTMP(input);
_Merge(node->grad, &inputGradTMP, whereToSplit + 1, 0);
_Sum(input->grad, &inputGradTMP, input->grad);
}
node->visitMark = NODE_FINISHED;
}
/*
gradient computation for spliting
where we return the list of the splits
for
list(c_1, ...) = split(a)
we have
dE/da = merge(dE/c_1, ...)
>> node - the node (c) for backward computation
*/
void XShapeGrad::GradSplitList(XTensor * node)
{
XLink &income = node->income;
XTensor * input = income.tails[0];
CheckNTErrors(income.tailNum == 1, "Wrong input tensor number for SPLIT!");
CheckNTErrors(node->order == input->order + 1, "Wrong tensor orders!");
node->visitMark = NODE_DOING;
}
/*
gradient computation for spliting. We return
the list of the splits : list(c_1, ...) = split(a).
this method is called only when all nodes of spliting
have been processed. We do this in a post-processing
manner because we can fuze multiple memory copy jobs
one time. This is good for system speed up.
>> node - the node (c) for backward computation
*/
void XShapeGrad::GradSplitListPost(XTensor * node)
{
/* we compute the gradient for current node, rather than for
child node, i.e., we use the outgoing edge here */
XLink &outgo = node->outgo;
XList splits(outgo.tailNum);
int whereToSplit = -1;
int splitNum = 0;
for(int i = 0; i < outgo.tailNum; i++){
XTensor * parent = (XTensor*)outgo.tails[i];
XLink &income = parent->income;
if(income.typeID == SHAPE_SPLIT_LIST){
int w = income.GetParamInt(0);
int splitID = income.GetParamInt(1);
if(whereToSplit < 0)
whereToSplit = w;
splitNum++;
CheckNTErrors(whereToSplit == w, "Wrong dimension for spliting");
CheckNTErrors(income.tailNum == 1, "Something wrong with outgoing edge!");
CheckNTErrors(splitNum - 1 == splitID, "Wrong split id!");
splits.Add(parent);
}
}
/* we can simply merge the gradient tensor
if the node is used in spliting only */
if(outgo.tailNum == splitNum){
_Merge(&splits, node->grad, whereToSplit + 1);
}
/* if the tensor is used as input to other nodes
somewhere else, we need another SUM for gradient
accumulation */
else{
XTensor nodeGradTMP(node);
_Merge(&splits, &nodeGradTMP, whereToSplit + 1);
_Sum(node->grad, &nodeGradTMP, node->grad);
}
}

/*
...@@ -239,6 +368,40 @@ void XShapeGrad::GradUnsqueeze(XTensor * node) ...@@ -239,6 +368,40 @@ void XShapeGrad::GradUnsqueeze(XTensor * node)
CheckNTErrors(output->unitNum == input->unitNum * dSize, "Wrong tensor size!");

_ReduceSum(output->grad, input->grad, dim);
node->visitMark = NODE_FINISHED;
}
/*
gradient for transposing a tensor
for
c = Transpose(a)
we have
dE/da = Transpose(dE/dc)
>> node - the node (c) for backward computation
*/
void XShapeGrad::GradTranspose(XTensor * node)
{
XLink &income = node->income;
CheckNTErrors(income.tailNum == 1, "Wrong input tensor number for TRANSPOSE!");
XTensor * output = node;
XTensor * input = income.tails[0];
XTensor * b = NewTensor(input);
XNoder::MakeGrad(input);
int i = income.GetParamInt(0);
int j = income.GetParamInt(1);
CheckNTErrors(input->order > i && i >= 0, "index of dimension is out of scope!");
CheckNTErrors(input->order > j && j >= 0, "index of dimension is out of scope!");
_Transpose(output->grad, b, i, j);
_Sum(input->grad, b, input->grad);
node->visitMark = NODE_FINISHED;
delete b;
} }
}
...@@ -40,18 +40,41 @@ public: ...@@ -40,18 +40,41 @@ public:
static
bool IsShapeOP(XTensor * node);
/* post processing of a node */
static
void PostProcessing(XTensor * node, int typeId);
private:
/* gradient computation for merge: c = merge(a, b, ...) */
static
void GradMerge(XTensor * node);
/* gradient computation for merging a list of tensors : c = merge(list(a, b, ...)) */
static
void GradMergeList(XTensor * node);
/* gradient computation for split: c = split(a) */
static
void GradSplit(XTensor * node);
/* gradient computation for spliting. we return the list of the splits : list(c_1, ...) = split(a) */
static
void GradSplitList(XTensor * node);
/* gradient computation for spliting. we return the list of the splits : list(c_1, ...) = split(a).
this method is called only when all nodes of spliting have been processed. We do this in a post-processing
manner because we can fuze multiple memory copy jobs one time. This is good for system speed up. */
static
void GradSplitListPost(XTensor * node);
/* gradient computation for unsqueezing a tensor : c = unsqueeze(a) */
static
void GradUnsqueeze(XTensor * node);
/* gradient computation for transposing a tensor : c = transpose(a) */
static
void GradTranspose(XTensor * node);
};

}
...@@ -143,7 +143,7 @@ void XNet::Backward(XList &roots, XList &golds, LOSS_FUNCTION_NAME loss) ...@@ -143,7 +143,7 @@ void XNet::Backward(XList &roots, XList &golds, LOSS_FUNCTION_NAME loss)
/* back-propagation from output to input */
for(int i = nodes.count - 1; i >= 0; i--){
XTensor * node = (XTensor*)nodes.Get(i);

if(node->visitMark == NODE_FINISHED)
continue;
...@@ -176,6 +176,10 @@ void XNet::BackwardNode(XTensor * node) ...@@ -176,6 +176,10 @@ void XNet::BackwardNode(XTensor * node)
return;

if(!XNoder::IsLeaf(node)){
/* post processing for parent nodes */
BackwardNodePost(node);
/* process the current node */
if(XMathGrad::IsMathOP(node))
XMathGrad::MakeGrad(node);
else if(XFuncGrad::IsFunc(node))
...@@ -186,8 +190,24 @@ void XNet::BackwardNode(XTensor * node) ...@@ -186,8 +190,24 @@ void XNet::BackwardNode(XTensor * node)
ShowNTErrors("Wrong node type!"); ShowNTErrors("Wrong node type!");
} }
} }
}
/*
backward computation (in post processing) for a given node
>> node - the node whose parent nodes are not processed yet. So
we do the job at the child node.
*/
void XNet::BackwardNodePost(XTensor * node)
{
bool isSplitList = false;
XLink &outgo = node->outgo;
for(int i = 0; i < outgo.tailNum; i++){
if(outgo.tails[i]->income.typeID == SHAPE_SPLIT_LIST)
isSplitList = true;
}
if(isSplitList)
XShapeGrad::PostProcessing(node, SHAPE_SPLIT_LIST);
}

/*
...@@ -73,6 +73,9 @@ struct XNet ...@@ -73,6 +73,9 @@ struct XNet
/* backward computation for a given node */
void BackwardNode(XTensor * node);
/* backward computation (in post processing) for a given node */
void BackwardNodePost(XTensor * node);
/* traverse the net and find the topological order by
depth-first search (Tarjan's algorithm) */
void Traverse(XTensor &root);
...@@ -33,7 +33,7 @@ ...@@ -33,7 +33,7 @@
#include "../../tensor/function/FHeader.h" #include "../../tensor/function/FHeader.h"
#include "../../network/XNet.h" #include "../../network/XNet.h"
namespace samplefnnlm namespace fnnlm
{ {
#define MAX_NAME_LENGTH 1024 #define MAX_NAME_LENGTH 1024
...@@ -57,7 +57,7 @@ void LoadArgs(int argc, const char ** argv, FNNModel &model); ...@@ -57,7 +57,7 @@ void LoadArgs(int argc, const char ** argv, FNNModel &model);
void Init(FNNModel &model);
void Check(FNNModel &model);
void Copy(FNNModel &tgt, FNNModel &src);
void Clear(FNNModel &model, bool isNodeGrad);
void InitModelTensor1D(XTensor &tensor, int num, FNNModel &model);
void InitModelTensor2D(XTensor &tensor, int rowNum, int colNum, FNNModel &model);
void Train(const char * train, bool isShuffled, FNNModel &model);
...@@ -153,43 +153,80 @@ load arguments ...@@ -153,43 +153,80 @@ load arguments
*/
void LoadArgs(int argc, const char ** argv, FNNModel &model)
{
fprintf(stderr, "args:\n");
for(int i = 0; i < argc; i++){
if(!strcmp(argv[i], "-train") && i + 1 < argc){
strcpy(trainFN, argv[i + 1]);
fprintf(stderr, " -train=%s\n", argv[i + 1]);
}
if(!strcmp(argv[i], "-model") && i + 1 < argc){
strcpy(modelFN, argv[i + 1]);
fprintf(stderr, " -model=%s\n", argv[i + 1]);
}
if(!strcmp(argv[i], "-test") && i + 1 < argc){
strcpy(testFN, argv[i + 1]);
fprintf(stderr, " -test=%s\n", argv[i + 1]);
}
if(!strcmp(argv[i], "-output") && i + 1 < argc){
strcpy(outputFN, argv[i + 1]);
fprintf(stderr, " -output=%s\n", argv[i + 1]);
}
if(!strcmp(argv[i], "-n") && i + 1 < argc){
model.n = atoi(argv[i + 1]);
fprintf(stderr, " -n=%d\n", model.n);
}
if(!strcmp(argv[i], "-esize") && i + 1 < argc){
model.eSize = atoi(argv[i + 1]);
fprintf(stderr, " -esize=%d\n", model.eSize);
}
if(!strcmp(argv[i], "-vsize") && i + 1 < argc){
model.vSize = atoi(argv[i + 1]);
fprintf(stderr, " -vsize=%d\n", model.vSize);
}
if(!strcmp(argv[i], "-hdepth") && i + 1 < argc){
model.hDepth = atoi(argv[i + 1]);
fprintf(stderr, " -hdepth=%d\n", model.hDepth);
}
if(!strcmp(argv[i], "-hsize") && i + 1 < argc){
model.hSize = atoi(argv[i + 1]);
fprintf(stderr, " -hsize=%d\n", model.hSize);
}
if(!strcmp(argv[i], "-lrate") && i + 1 < argc){
learningRate = (float)atof(argv[i + 1]);
fprintf(stderr, " -lrate=%f\n", learningRate);
}
if(!strcmp(argv[i], "-nstep") && i + 1 < argc){
nStep = atoi(argv[i + 1]);
fprintf(stderr, " -nstep=%d\n", nStep);
}
if(!strcmp(argv[i], "-nepoch") && i + 1 < argc){
nEpoch = atoi(argv[i + 1]);
fprintf(stderr, " -nepoch=%d\n", nEpoch);
}
if(!strcmp(argv[i], "-minmax") && i + 1 < argc){
minmax = (float)fabs(atof(argv[i + 1]));
fprintf(stderr, " -minmax=%f\n", minmax);
}
if(!strcmp(argv[i], "-batch") && i + 1 < argc){
sentBatch = atoi(argv[i + 1]);
fprintf(stderr, " -batch=%d\n", sentBatch);
}
if(!strcmp(argv[i], "-wbatch") && i + 1 < argc){
wordBatch = atoi(argv[i + 1]);
fprintf(stderr, " -wbatch=%d\n", wordBatch);
}
if(!strcmp(argv[i], "-shuffle")){
shuffled = true;
fprintf(stderr, " -shuffle=true\n");
}
if(!strcmp(argv[i], "-autodiff")){
autoDiff = true;
fprintf(stderr, " -autodiff=true\n");
}
if(!strcmp(argv[i], "-dev") && i + 1 < argc){
model.devID = atoi(argv[i + 1]);
fprintf(stderr, " -dev=%d\n", model.devID);
}
}
for(int i = 0; i < argc; i++){
...@@ -230,16 +267,37 @@ void Copy(FNNModel &tgt, FNNModel &src) ...@@ -230,16 +267,37 @@ void Copy(FNNModel &tgt, FNNModel &src)
}
}

/*
reset model parameters
>> model - the model whose parameter (gradient) is set to 0
>> isNodeGrad - indicates whether the tensor node keeps the
gradient information
*/
void Clear(FNNModel &model, bool isNodeGrad)
{
if (isNodeGrad) {
if(model.embeddingW.grad != NULL)
model.embeddingW.grad->SetZeroAll();
for (int i = 0; i < MAX_HIDDEN_NUM; i++) {
if(model.hiddenW[i].grad != NULL)
model.hiddenW[i].grad->SetZeroAll();
if(model.hiddenB[i].grad != NULL)
model.hiddenB[i].grad->SetZeroAll();
}
if(model.outputW.grad != NULL)
model.outputW.grad->SetZeroAll();
if(model.outputB.grad != NULL)
model.outputB.grad->SetZeroAll();
}
else {
model.embeddingW.SetZeroAll();
for (int i = 0; i < MAX_HIDDEN_NUM; i++) {
model.hiddenW[i].SetZeroAll();
model.hiddenB[i].SetZeroAll();
}
model.outputW.SetZeroAll();
model.outputB.SetZeroAll();
} }
model.outputW.SetZeroAll();
model.outputB.SetZeroAll();
} }
/*
@@ -401,7 +459,7 @@ void Train(const char * train, bool isShuffled, FNNModel &model)
            FNNNet net;

            /* gradient = 0 */
            Clear(grad, false);

            /* forward computation */
            Forward(inputs, output, model, net);
@@ -413,6 +471,9 @@ void Train(const char * train, bool isShuffled, FNNModel &model)
                Update(model, grad, learningRate, false);
            }
            else{
                /* gradient = 0 */
                Clear(model, true);

                /* forward + backward process */
                ForwardAutoDiff(inputs, output, model);
@@ -492,21 +553,24 @@ void Update(FNNModel &model, FNNModel &grad, float epsilon, bool isNodeGrad)
        gradList.Add(&grad.embeddingW);
    }
    else{
        gradList.Add(model.outputW.grad);
        gradList.Add(model.outputB.grad);
        for (int i = 0; i < model.hDepth; i++) {
            gradList.Add(model.hiddenW[i].grad);
            gradList.Add(model.hiddenB[i].grad);
        }
        gradList.Add(model.embeddingW.grad);
    }

    for (int i = 0; i < paraList.count; i++) {
        XTensor * para = (XTensor*)paraList.GetItem(i);
        XTensor * paraGrad = (XTensor*)gradList.GetItem(i);

        //fprintf(stderr, "%d\n", i);
        //paraGrad->Dump(stderr, "grad:", 10);

        /* the delta rule */
        _Sum(para, paraGrad, para, -epsilon);
    }
@@ -911,7 +975,6 @@ forward process (with tensor connections)
*/
void ForwardAutoDiff(XTensor inputs[], XTensor &output, FNNModel &model)
{
    int n = model.n;
    int depth = model.hDepth;
@@ -935,15 +998,13 @@ void ForwardAutoDiff(XTensor inputs[], XTensor &output, FNNModel &model)
    hidden = Merge(hidden, 2, 0);

    /* hidden layers */
    for(int i = 0; i < depth; i++)
        hidden = MMul(hidden, model.hiddenW[i]) + model.hiddenB[i];

    /* output layer */
    output = LogSoftmax(MMul(hidden, model.outputW) + model.outputB, 1);

    //XLink::ShowNetwork(stderr, &output);
}
/*
@@ -1040,18 +1101,23 @@ void Test(const char * test, const char * result, FNNModel &model)
        /* the gold standard */
        XTensor gold;

        if (!autoDiff) {
            /* prepare an empty network for building the fnn */
            FNNNet net;

            /* make the input tensor for position i */
            for (int i = 0; i < model.n - 1; i++)
                MakeWordBatch(inputs[i], ngrams, ngramNum, i, model.vSize, model.devID, model.mem);

            /* make the gold tensor */
            MakeWordBatch(gold, ngrams, ngramNum, model.n - 1, model.vSize, model.devID, model.mem);

            /* forward computation */
            Forward(inputs, output, model, net);
        }
        else {
            ForwardAutoDiff(inputs, output, model);
        }

        /* prediction probabilities */
        XTensor probs;
......
@@ -36,7 +36,7 @@
using namespace nts;

namespace fnnlm
{

#define _EXIT_(x)// exit(x)
@@ -126,7 +126,7 @@ struct FNNNet
    XTensor output;
};

/* entry point of the program */
int FNNLMMain(int argc, const char ** argv);

};
......
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
*/
#include <math.h>
#include "T2TAttention.h"
#include "T2TUtility.h"
#include "../../tensor/core/CHeader.h"
namespace transformer
{
/* constructor */
T2TAttention::T2TAttention()
{
nhead = -1;
dk = -1;
dv = -1;
d = -1;
}
/* deconstructor */
T2TAttention::~T2TAttention()
{
}
/*
initialize the model
>> argc - number of arguments
>> argv - list of pointers to the arguments
>> myDevID - device id
>> myMem - the memory pool
*/
void T2TAttention::InitModel(int argc, const char ** argv, int myDevID, XMem * myMem)
{
devID = myDevID;
mem = myMem;
float minmax = 0;
LoadParamInt(argc, argv, "nhead", &nhead, 8);
LoadParamInt(argc, argv, "dk", &dk, 512);
LoadParamInt(argc, argv, "dv", &dv, 512);
LoadParamInt(argc, argv, "d", &d, 512);
LoadParamFloat(argc, argv, "attminmax", &minmax, 0.08F);
InitTensor2D(&wk, d, dk, X_FLOAT, devID, mem);
InitTensor2D(&wq, d, dk, X_FLOAT, devID, mem);
InitTensor2D(&wv, d, dv, X_FLOAT, devID, mem);
wk.SetDataRand(-minmax, minmax);
wq.SetDataRand(-minmax, minmax);
wv.SetDataRand(-minmax, minmax);
}
/*
make the network
>> k - keys. It might be of size B * L * H
where B = batch size, L = sequence length,
and H = vector size of each position
>> q - queries
>> v - values
<< return - multi-attention result
*/
XTensor * T2TAttention::Make(XTensor * k, XTensor * q, XTensor * v)
{
XTensor k2;
XTensor q2;
XTensor v2;
    /* linear transformation before self-attention */
k2 = MMul(*k, wk);
q2 = MMul(*q, wq);
v2 = MMul(*v, wv);
XTensor kheads;
XTensor qheads;
XTensor vheads;
/* multi head */
kheads = Split(k2, k2.order - 1, nhead);
qheads = Split(q2, q2.order - 1, nhead);
vheads = Split(v2, v2.order - 1, nhead);
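    /* each Split divides the last dimension (dk or dv) into nhead sub-vectors,
       so every attention head works on its own slice of the projected representation */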
XTensor att;
XTensor scalar;
/* scalar = softmax(Q * K^T / sqrt(dk)) * V */
scalar = Softmax(Linear(BMMul(qheads, X_NOTRANS, kheads, X_TRANS), 1/sqrt((float)dk)), -1);
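    /* BMMul(..., X_TRANS) computes Q * K^T for every head, and Linear(..., 1/sqrt(dk))
       applies the scaling factor of scaled dot-product attention before the softmax */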
att = MMul(scalar, vheads);
XTensor * result = new XTensor();
/* concatenate the heads */
*result = Merge(att, -1);
return result;
}
}
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
*/
#ifndef __T2TATTENTION_H__
#define __T2TATTENTION_H__
#include "../../network/XNet.h"
using namespace nts;
namespace transformer
{
/*
multi-head attention
y(Q, K, V) = cat(head_1, head_2, ..., head_n)
where head_i = Attention(Q * w_i^Q, K * w_i^K, V * w_i^V)
attention(Q, K, V) = softmax(Q * K^T/d_k^0.5) V
d_k = dimension size of K
*/
class T2TAttention
{
public:
/* device id */
int devID;
/* memory pool */
XMem * mem;
/* head number */
int nhead;
/* transformation matrix for K */
XTensor wk;
/* transformation matrix for Q */
XTensor wq;
/* transformation matrix for V */
XTensor wv;
/* size of transformed Q and K */
int dk;
/* size of transformed V */
int dv;
/* size of input Q, K and V */
int d;
public:
/* constructor */
T2TAttention();
/* de-constructor */
~T2TAttention();
/* initialize the model */
void InitModel(int argc, const char ** argv, int myDevID = -1, XMem * myMem = NULL);
/* make the network */
XTensor * Make(XTensor * k, XTensor * q, XTensor * v);
};
}
#endif
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
*/
#ifndef __T2TDECODER_H__
#define __T2TDECODER_H__
namespace transformer
{
class T2TDecoder
{
};
class AttDecoder : T2TDecoder
{
public:
/* initialize the model */
void InitModel(int argc, const char ** argv);
};
}
#endif
\ No newline at end of file
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-08-01
*/
#include <math.h>
#include "T2TEmbedding.h"
#include "T2TUtility.h"
#include "../../tensor/core/CHeader.h"
namespace transformer
{
/* constructor */
T2TEmbedder::T2TEmbedder()
{
devID = -1;
mem = NULL;
vSize = -1;
maxLength = -1;
}
/* deconstructor */
T2TEmbedder::~T2TEmbedder()
{
}
/*
initialize the model
>> argc - number of arguments
>> argv - list of pointers to the arguments
>> myDevID - device id
>> myMem - the memory pool
*/
void T2TEmbedder::InitModel(int argc, const char ** argv, int myDevID, XMem * myMem)
{
devID = myDevID;
mem = myMem;
    int d = 0;

    LoadParamInt(argc, argv, "vsize", &vSize, -1);
    LoadParamInt(argc, argv, "maxlen", &maxLength, 256);
    LoadParamInt(argc, argv, "d", &d, 256);

    /* eSize was never initialized before use; assuming the embedding size equals the model dimension d */
    eSize = d;

    InitTensor2D(&w, vSize, eSize, X_FLOAT, devID, mem);
w.SetDataRandn(0, sqrt((float)eSize));
/* create the positional embedding matrix */
MakePosEmbedding(eSize, d, maxLength);
}
/*
make positional embeddings (of size length * eSize)
eSize - embedding size
d - dimension used in the sin/cos denominator
length - length of the sequence
*/
void T2TEmbedder::MakePosEmbedding(int eSize, int d, int length)
{
    InitTensor2D(&posEmbeddingBase, length, eSize, X_FLOAT, devID, mem);
    float * data = new float[posEmbeddingBase.unitNum];
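    /* following "Attention Is All You Need":
       PE(pos, 2i)   = sin(pos / 10000^(2i/d))
       PE(pos, 2i+1) = cos(pos / 10000^(2i/d)) */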
for(int pos = 0; pos < length; pos++){
float * dp = data + pos * eSize;
for(int k = 0; k < eSize; k++){
if(k % 2 == 0){
int i = k/2;
dp[k] = sin(pos/pow(10000.0F, 2.0F*i/d));
}
else{
int i = (k - 1)/2;
dp[k] = cos(pos/pow(10000.0F, 2.0F*i/d));
}
}
}
    posEmbeddingBase.SetData(data, posEmbeddingBase.unitNum);
delete[] data;
}
/*
make the network
*/
XTensor * T2TEmbedder::Make(XTensor * input)
{
CheckNTErrors(input->GetDim(-1) == vSize, "Wrong vocabulary size!");
CheckNTErrors(input->order > 1, "Wrong input tensor size!");
CheckNTErrors(input->dimSize[input->order - 2] < maxLength, "The sequence is too long!");
int dims[MAX_TENSOR_DIM_NUM];
    memcpy(dims, input->dimSize, sizeof(int) * input->order);
dims[0] = eSize;
bool match = (posEmbedding.order == input->order);
if(match){
for(int i = 0; i < input->order; i++){
if(dims[i] != posEmbedding.GetDim(i))
match = false;
}
}
/* we make positional embeddings first */
if(!match){
InitTensor(&posEmbedding, input->order, dims, X_FLOAT, 1.0F, devID, mem);
XTensor * posTMP = NewTensorBuf(2, dims, X_FLOAT, 1.0F, devID, mem);
_CopyValues(&posEmbeddingBase, 0, posTMP->unitNum, posTMP, 0);
int dims2[MAX_TENSOR_DIM_NUM];
dims2[0] = dims[0];
dims2[1] = dims[1];
dims2[2] = posEmbedding.unitNum / (dims[0] * dims[1]);
posEmbedding.Reshape(3, dims2);
_Unsqueeze(posTMP, &posEmbedding, 0, dims2[2]);
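        /* the Unsqueeze call replicates the precomputed table along the leading
           (batch-like) dimension so that posEmbedding matches the input shape */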
posEmbedding.Reshape(input->order, dims);
DelTensorBuf(posTMP);
}
XTensor wordEmbedding;
/* then we make word embeddings */
wordEmbedding = MMul(*input, w);
XTensor * result = new XTensor();
/* we sum over the two embeddings */
*result = wordEmbedding + posEmbedding;
return result;
}
}
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-08-01
*/
#ifndef __T2TEMBEDDING_H__
#define __T2TEMBEDDING_H__
#include "../../network/XNet.h"
using namespace nts;
namespace transformer
{
/*
embedding (of word at position i):
word embedding + positional embedding
*/
class T2TEmbedder
{
public:
/* device id */
int devID;
/* memory pool */
XMem * mem;
/* vocabulary size */
int vSize;
/* embedding size */
int eSize;
/* maximum length of the sequence */
int maxLength;
/* word embedding matrix */
XTensor w;
    /* predefined positional embeddings. They speed up the embedding process
       because the table is computed once and then reused. */
XTensor posEmbeddingBase;
/* positional embeddings */
XTensor posEmbedding;
public:
/* constructor */
T2TEmbedder();
/* de-constructor */
~T2TEmbedder();
/* initialize the model */
void InitModel(int argc, const char ** argv, int myDevID = -1, XMem * myMem = NULL);
/* make positional embeddings */
void MakePosEmbedding(int eSize, int d, int length);
/* make the network */
XTensor * Make(XTensor * input);
};
}
#endif
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
*/
#include <math.h>
#include "T2TEncoder.h"
#include "T2TLayerNormal.h"
#include "T2TUtility.h"
#include "../../tensor/core/CHeader.h"
namespace transformer
{
/* constructor */
AttEncoder::AttEncoder()
{
}
/* de-constructor */
AttEncoder::~AttEncoder()
{
delete[] attentions;
delete[] fnns;
delete[] layerNorms;
}
/*
initialize the model
>> argc - number of arguments
>> argv - list of pointers to the arguments
>> myDevID - device id
>> myMem - the memory pool
*/
void AttEncoder::InitModel(int argc, const char ** argv, int myDevID, XMem * myMem)
{
devID = myDevID;
mem = myMem;
LoadParamInt(argc, argv, "nstack", &nlayer, 6);
LoadParamInt(argc, argv, "hsize", &hSize, 512);
LoadParamInt(argc, argv, "esize", &eSize, 512);
LoadParamInt(argc, argv, "vsize", &vSize, -1);
    CheckNTErrors(nlayer >= 1, "We have one encoding layer at least!");
CheckNTErrors(vSize > 1, "set vocabulary size by \"-vsize\"");
/* embedding model */
embedder.InitModel(argc, argv, devID, mem);
attentions = new T2TAttention[nlayer];
fnns = new T2TFNN[nlayer];
layerNorms = new T2TLN[nlayer];
/* initialize the stacked layers */
for(int i = 0; i < nlayer; i++){
attentions[i].InitModel(argc, argv, myDevID, myMem);
fnns[i].InitModel(argc, argv, myDevID, myMem);
layerNorms[i].InitModel(argc, argv, myDevID, myMem);
}
}
/*
make the encoding network
>> input - the input tensor of the encoder
<< return - the output tensor of the encoder
*/
XTensor * AttEncoder::Make(XTensor * input)
{
XTensor * x = embedder.Make(input);
for(int i = 0; i < nlayer; i++){
XTensor * att;
XTensor * ln;
XTensor * fnn;
XTensor res;
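        /* one encoder layer: self-attention with a residual connection and layer
           normalization, followed by the position-wise FNN with another residual
           connection and normalization (post-norm ordering) */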
/* self attention */
att = attentions[i].Make(x, x, x);
/* residual connection */
res = Sum(*att, *x);
/* TODO: dropout */
/* layer normalization */
ln = layerNorms[i].Make(&res);
/* input of next layer */
x = ln;
/* fnn */
fnn = fnns[i].Make(x);
/* residual connection */
res = Sum(*fnn, *x);
/* TODO: dropout */
/* layer normalization */
ln = layerNorms[i].Make(&res);
/* input of next layer */
x = ln;
}
return x;
}
}
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
*/
#ifndef __T2TENCODER_H__
#define __T2TENCODER_H__
#include "T2TFNN.h"
#include "T2TAttention.h"
#include "T2TEmbedding.h"
#include "T2TLayerNormal.h"
#include "../../network/XNet.h"
using namespace nts;
namespace transformer
{
/*
base class of the encoder
*/
class T2TEncoder
{
public:
virtual
XTensor * Make(XTensor * input) = 0;
};
/*
the encoder based on RNN
*/
class RNNEncoder : T2TEncoder
{
public:
XTensor * Make(XTensor * input);
};
/*
the encoder based on self-attention
*/
class AttEncoder : T2TEncoder
{
public:
/* device id */
int devID;
/* memory pool */
XMem * mem;
/* layer number */
int nlayer;
/* hidden layer size of the FNN layer */
int hSize;
/* embedding size */
int eSize;
/* vocabulary size */
int vSize;
/* embedding of word at each position */
T2TEmbedder embedder;
/* FNN model of each layer */
T2TFNN * fnns;
/* attention model of each layer */
T2TAttention * attentions;
/* layer normalization */
T2TLN * layerNorms;
/* input tensor of the encoder */
XTensor * input;
/* output tensor of the encoder */
XTensor * output;
public:
/* constructor */
AttEncoder();
/* de-constructor */
~AttEncoder();
/* initialize the model */
void InitModel(int argc, const char ** argv, int myDevID = -1, XMem * myMem = NULL);
/* make the encoding network */
XTensor * Make(XTensor * input);
};
}
#endif
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
*/
#include "T2TFNN.h"
#include "T2TUtility.h"
#include "../../tensor/core/CHeader.h"
#include "../../tensor/function/FHeader.h"
namespace transformer
{
/* constructor */
T2TFNN::T2TFNN()
{
inSize = -1;
outSize = -1;
hSize = -1;
}
/* deconstructor */
T2TFNN::~T2TFNN()
{
}
/*
initialize the model
>> argc - number of arguments
>> argv - list of pointers to the arguments
>> myDevID - device id
>> myMem - the memory pool
*/
void T2TFNN::InitModel(int argc, const char ** argv, int myDevID, XMem * myMem)
{
devID = myDevID;
mem = myMem;
float minmax = 0;
LoadParamInt(argc, argv, "d", &inSize, 512);
LoadParamInt(argc, argv, "d", &outSize, 512);
LoadParamInt(argc, argv, "fnnh", &hSize, 512);
LoadParamFloat(argc, argv, "fnnminmax", &minmax, 0.08F);
InitTensor2D(&w1, inSize, hSize, X_FLOAT, devID, mem);
InitTensor1D(&b1, hSize, X_FLOAT, devID, mem);
InitTensor2D(&w2, hSize, outSize, X_FLOAT, devID, mem);
InitTensor1D(&b2, outSize, X_FLOAT, devID, mem);
w1.SetDataRand(-minmax, minmax);
b1.SetDataRand(-minmax, minmax);
w2.SetDataRand(-minmax, minmax);
b2.SetDataRand(-minmax, minmax);
}
/*
make the network
y = max(0, x * w1 + b1) * w2 + b2
>> input - the input tensor
>> return - the output tensor
*/
XTensor * T2TFNN::Make(XTensor * input)
{
XTensor t1;
XTensor * result = new XTensor();
/* t1 = max(0, x * w1 + b1) */
t1 = Rectify(MMul(*input, X_NOTRANS, w1, X_NOTRANS) + b1);
/* result = t1 * w2 + b2 */
*result = MMul(t1, X_NOTRANS, w2, X_NOTRANS) + b2;
return result;
}
}
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
*/
#ifndef __T2TFNN_H__
#define __T2TFNN_H__
#include "../../tensor/XTensor.h"
using namespace nts;
namespace transformer
{
/* a fnn: y = max(0, x * w1 + b1) * w2 + b2 */
class T2TFNN
{
public:
/* device id */
int devID;
/* memory pool */
XMem * mem;
/* size of input vector */
int inSize;
/* size of output vector */
int outSize;
/* size of hidden layers */
int hSize;
/* matrix of transformation 1 */
XTensor w1;
/* bias of transformation 1 */
XTensor b1;
/* matrix of transformation 2 */
XTensor w2;
/* bias of transformation 2 */
XTensor b2;
public:
/* constructor */
T2TFNN();
/* deconstructor */
~T2TFNN();
/* initialize the model */
void InitModel(int argc, const char ** argv, int myDevID = -1, XMem * myMem = NULL);
/* make the network */
XTensor * Make(XTensor * input);
};
}
#endif
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
*/
#include "T2TLayerNormal.h"
namespace transformer
{
/* constructor */
T2TLN::T2TLN()
{
devID = -1;
mem = NULL;
}
/* de-constructor */
T2TLN::~T2TLN()
{
}
/*
initialize the model
>> argc - number of arguments
>> argv - list of pointers to the arguments
>> myDevID - device id
>> myMem - the memory pool
*/
void T2TLN::InitModel(int argc, const char ** argv, int myDevID, XMem * myMem)
{
devID = myDevID;
mem = myMem;
}
/*
make the network
for each layer representation x, we have
y = (x - mean(x)) / sqrt(var(x) + epsilon), optionally followed by a learned scale and bias
>> input - the input tensor
>> return - layer normalization output
*/
XTensor * T2TLN::Make(XTensor * input)
{
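    /* TODO: the normalization itself is not implemented yet; NULL is returned as a placeholder */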
return NULL;
}
}
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
*/
#ifndef __T2TLAYERNORMAL_H__
#define __T2TLAYERNORMAL_H__
#include "../../network/XNet.h"
using namespace nts;
namespace transformer
{
class T2TLN
{
public:
/* device id */
int devID;
/* memory pool */
XMem * mem;
public:
/* constructor */
T2TLN();
/* de-constructor */
~T2TLN();
/* initialize the model */
void InitModel(int argc, const char ** argv, int myDevID = -1, XMem * myMem = NULL);
/* make the network */
XTensor * Make(XTensor * input);
};
}
#endif
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
*/
#include "T2TModel.h"
#include "T2TUtility.h"
namespace transformer
{
/* constructor */
T2TModel::T2TModel()
{
devID = -1;
mem = NULL;
isLM = false;
isMT = false;
}
/* de-constructor */
T2TModel::~T2TModel()
{
delete mem;
}
/*
initialize the model
>> argc - number of arguments
>> argv - list of pointers to the arguments
*/
void T2TModel::InitModel(int argc, const char ** argv)
{
bool useMem = false;
LoadParamInt(argc, argv, "dev", &devID, -1);
LoadParamBool(argc, argv, "mem", &useMem, useMem);
LoadParamBool(argc, argv, "lm", &isLM, true);
LoadParamBool(argc, argv, "mt", &isMT, false);
if(useMem){
delete mem;
mem = new XMem(devID);
}
encoder.InitModel(argc, argv, devID, mem);
outputLayer.InitModel(argc, argv, devID, mem);
}
/*
make the encoding network
>> input - input tensor
<< return - encoding result
*/
XTensor * T2TModel::MakeEncoding(XTensor * input)
{
return encoder.Make(input);
}
/*
make the entire network (with the output softmax layer)
>> input - input tensor
>> output - output tensor (distribution)
*/
void T2TModel::Make(XTensor * input, XTensor * output)
{
if(isLM){
XTensor * encoding = MakeEncoding(input);
outputLayer.Make(encoding, output);
}
else{
ShowNTErrors("TODO!");
}
}
}
\ No newline at end of file
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
*/
#ifndef __T2TMODEL_H__
#define __T2TMODEL_H__
#include "T2TFNN.h"
#include "T2TAttention.h"
#include "T2TEncoder.h"
#include "T2TDecoder.h"
#include "T2TOutput.h"
namespace transformer
{
class T2TModel
{
public:
/* device id */
int devID;
/* memory pool */
XMem * mem;
/* the encoder */
AttEncoder encoder;
/* the decoder */
AttDecoder decoder;
/* output layer */
T2TOutput outputLayer;
/* indicates whether the model is running for language modeling */
bool isLM;
/* indicates whether the model is running for machine translation */
bool isMT;
public:
/* constructor */
T2TModel();
/* de-constructor */
~T2TModel();
/* initialize the model */
void InitModel(int argc, const char ** argv);
/* make the encoding network */
XTensor * MakeEncoding(XTensor * input);
/* make the entire network (with the output softmax layer) */
void Make(XTensor * input, XTensor * output);
};
}
#endif
\ No newline at end of file
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
*/
#include "T2TOutput.h"
#include "T2TUtility.h"
#include "../../tensor/core/CHeader.h"
namespace transformer
{
/* constructor */
T2TOutput::T2TOutput()
{
devID = -1;
mem = NULL;
vSize = -1;
inSize = -1;
hSize = -1;
}
/* de-constructor */
T2TOutput::~T2TOutput()
{
}
/*
initialize the model
>> argc - number of arguments
>> argv - list of pointers to the arguments
>> myDevID - device id
>> myMem - the memory pool
*/
void T2TOutput::InitModel(int argc, const char ** argv, int myDevID, XMem * myMem)
{
devID = myDevID;
mem = myMem;
    LoadParamInt(argc, argv, "vsize", &vSize, -1);
    LoadParamInt(argc, argv, "hsize", &inSize, 512);
    LoadParamInt(argc, argv, "hsize", &hSize, 512);

    /* w is used in Make() but was never allocated; the initialization range is an assumption */
    InitTensor2D(&w, inSize, vSize, X_FLOAT, devID, mem);
    w.SetDataRand(-0.08F, 0.08F);
}
/*
make the network
y = softmax(x * w)
>> input - input tensor
<< return - output tensor
*/
XTensor * T2TOutput::Make(XTensor * input)
{
XTensor &x = *input;
XTensor * result = new XTensor();
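    /* project the input onto the vocabulary with w and normalize with
       log-softmax over the last dimension */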
*result = LogSoftmax(MMul(x, w), -1);
return result;
}
/*
make the network (redefined output tensor)
>> input - input tensor
>> output - output tensor
*/
void T2TOutput::Make(XTensor * input, XTensor * output)
{
XTensor &x = *input;
*output = LogSoftmax(MMul(x, w), -1);
}
}
\ No newline at end of file
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
*/
#ifndef __T2TOUTPUT_H__
#define __T2TOUTPUT_H__
#include "../../tensor/function/FHeader.h"
using namespace nts;
namespace transformer
{
/* output layer */
class T2TOutput
{
public:
/* device id */
int devID;
/* memory pool */
XMem * mem;
/* vocabulary size */
int vSize;
/* input vector size */
int inSize;
/* vector size of the linear transformation */
int hSize;
/* transformation matrix */
XTensor w;
public:
/* constructor */
T2TOutput();
/* de-constructor */
~T2TOutput();
/* initialize the model */
void InitModel(int argc, const char ** argv, int myDevID = -1, XMem * myMem = NULL);
/* make the network */
XTensor * Make(XTensor * input);
/* make the network (redefined output tensor) */
void Make(XTensor * input, XTensor * output);
};
}
#endif
\ No newline at end of file
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-08-02
*/
#include "T2TTrainer.h"
#include "T2TUtility.h"
namespace transformer
{
/* constructor */
T2TTrainer::T2TTrainer()
{
    buf = NULL;
    seqLen = NULL;
    seqOffset = NULL;
    nseqBuf = 0;
    nextSeq = -1;
}

/* de-constructor */
T2TTrainer::~T2TTrainer()
{
    delete[] buf;
    delete[] seqLen;
    delete[] seqOffset;
}
/*
initialization
>> argc - number of arguments
>> argv - list of pointers to the arguments
*/
void T2TTrainer::Init(int argc, const char ** argv)
{
LoadParamFloat(argc, argv, "lrate", &lrate, 0.001F);
LoadParamInt(argc, argv, "sbatch", &sBatchSize, 1);
LoadParamInt(argc, argv, "wbatch", &wBatchSize, 1);
LoadParamInt(argc, argv, "nepoch", &nepoch, 1);
LoadParamInt(argc, argv, "nstep", &nstep, 1);
int maxUnitInBuf;
LoadParamInt(argc, argv, "bufsize", &maxUnitInBuf, 20000);
buf = new int[maxUnitInBuf];
seqLen = new int[maxUnitInBuf];
seqOffset = new int[maxUnitInBuf];
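    /* buf, seqLen and seqOffset share the same capacity: the buffer holds at most
       maxUnitInBuf word ids, and never more sequences than that */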
}
/*
train the model
>> fn - training data file
>> model - model to train
*/
void T2TTrainer::Train(const char * fn, T2TModel * model)
{
}
char line[MAX_SEQUENCE_LENGTH];
/*
load data to buffer
>> file - where to load data
*/
int T2TTrainer::LoadBuf(FILE * file)
{
int lineCount = 0;
int seqCount = 0;
int wordCount = 0;
while(fgets(line, MAX_SEQUENCE_LENGTH - 1, file)){
int len = (int)strlen(line);
if(line[len - 1] == '\r')
line[len - 1] = 0;
len = (int)strlen(line);
if(len == 0)
continue;
/* how many characters are in a word */
int wSize = 0;
/* how many words are in the sentence */
int wNum = 0;
int wNumLocal = 0;
for(int i = 0; i < len; i++){
            /* load word (id) separated by space or tab */
if((line[i] == ' ' || line[i] == '\t' || i == len - 1) && wSize > 0){
line[i] = 0;
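                /* a token made of three '|' characters ("|||") marks the end of a
                   sequence inside the line; anything else is parsed as a word id */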
if(wSize == 3 && line[i - 1] == '|' && line[i - 2] == '|' && line[i - 3] == '|'){
seqLen[seqCount] = wNumLocal;
seqOffset[seqCount] = wordCount + wNum - wNumLocal;
seqCount++;
wNumLocal = 0;
}
else{
buf[wNum++] = atoi(line + i - wSize);
wNumLocal++;
}
wSize = 0;
}
else
wSize++;
}
seqLen[seqCount] = wNumLocal;
seqOffset[seqCount] = wordCount + wNum - wNumLocal;
seqCount++;
wordCount += wNum;
lineCount++;
if(wordCount >= wBatchSize)
break;
if(lineCount >= sBatchSize)
break;
}
nseqBuf = seqCount;
nextSeq = 0;
return lineCount;
}
/*
load a batch of sequences
>> file - the handle to the data file
>> batch - the batch
>> step - the step we go over when move to the next sequence
>> vs - vocabulary size
>> sBatch - batch size of sequences
>> wBatch - batch size of words
>> isSorted - indicates whether the sequences are sorted by length
*/
int T2TTrainer::LoadBatch(FILE * file, XTensor * batch, int step, int vs, int sBatch, int wBatch, bool isSorted)
{
if(nextSeq >= nseqBuf)
LoadBuf(file);
int seq = nextSeq;
int wc = 0;
int sc = 0;
int max = 0;
    while(seq + sc < nseqBuf){
        int len = seqLen[seq + sc];
        wc += len;
        sc += 1;

        /* the longest sequence determines the (padded) length dimension of the batch */
        if(max < len)
            max = len;

        if(sc >= sBatch && wc >= wBatch)
            break;
    }
if(sc > 0){
int dims[MAX_TENSOR_DIM_NUM];
dims[0] = sc;
dims[1] = max;
dims[2] = vs;
if(batch->order != 3 || batch->GetDim(0) != dims[0] ||
batch->GetDim(1) != dims[1] || batch->GetDim(2) != dims[2]){
InitTensor(batch, 3, dims, X_FLOAT, 1.0F, devID, mem);
}
batch->SetZeroAll();
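        /* the batch is a one-hot representation: entry (sequence, position, word id) is set to 1 */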
for(int s = seq; s < seq + sc; s++){
for(int w = 0; w < seqLen[s]; w++){
batch->Set3D(1.0F, s - seq, w, buf[seqOffset[s] + w]);
}
}
}
    /* where the next batch starts in the buffer */
    nextSeq = seq + sc;

    return sc;
}
}
\ No newline at end of file
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-08-02
*/
#ifndef __T2TTRAINER_H__
#define __T2TTRAINER_H__
#include "T2TModel.h"
#include "../../tensor/function/FHeader.h"
#define MAX_SEQUENCE_LENGTH 1024 * 64
using namespace nts;
namespace transformer
{
/* trainer of the T2T model */
class T2TTrainer
{
public:
/* device id */
int devID;
/* memory pool */
XMem * mem;
/* buffer for loading words */
int * buf;
/* length of each sequence */
int * seqLen;
/* offset of the first word for each sequence */
int * seqOffset;
/* number of sequences in the buffer */
int nseqBuf;
/* offset for next sequence in the buffer */
int nextSeq;
/* vocabulary size of the source side */
int vSize;
/* learning rate */
float lrate;
/* sentence batch size */
int sBatchSize;
/* word batch size */
int wBatchSize;
/* training epoch number */
int nepoch;
    /* training step number */
int nstep;
public:
/* constructor */
T2TTrainer();
/* de-constructor */
~T2TTrainer();
/* initialize the trainer */
void Init(int argc, const char ** argv);
/* train the model */
void Train(const char * fn, T2TModel * model);
/* load data to buffer */
int LoadBuf(FILE * file);
/* load a batch of sequences */
int LoadBatch(FILE * file, XTensor * batch, int step, int vs, int sBatch, int wBatch, bool isSorted);
};
}
#endif
\ No newline at end of file
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
namespace transformer
{
void LoadParamString(int argc, const char ** argv, const char * name, char * p, char * defaultP)
{
char vname[128];
vname[0] = '-';
strcpy(vname + 1, name);
bool hit = false;
for(int i = 0; i < argc; i++){
if(!strcmp(argv[i], vname) && i + 1 < argc){
            strcpy(p, argv[i + 1]);
fprintf(stderr, " %s=%s\n", name, argv[i + 1]);
hit = true;
}
}
if(!hit)
strcpy(p, defaultP);
}
void LoadParamInt(int argc, const char ** argv, const char * name, int * p, int defaultP)
{
char vname[128];
vname[0] = '-';
strcpy(vname + 1, name);
bool hit = false;
for(int i = 0; i < argc; i++){
if(!strcmp(argv[i], vname) && i + 1 < argc){
*(int*)p = atoi(argv[i + 1]);
fprintf(stderr, " %s=%s\n", name, argv[i + 1]);
hit = true;
}
}
if(!hit)
*p = defaultP;
}
void LoadParamBool(int argc, const char ** argv, const char * name, bool * p, bool defaultP)
{
char vname[128];
vname[0] = '-';
strcpy(vname + 1, name);
bool hit = false;
for(int i = 0; i < argc; i++){
        if(!strcmp(argv[i], vname)){
            *(bool*)p = true;
            fprintf(stderr, " %s=%s\n", name, "true");
            hit = true;
        }
}
if(!hit)
*p = defaultP;
}
void LoadParamFloat(int argc, const char ** argv, const char * name, float * p, float defaultP)
{
char vname[128];
vname[0] = '-';
strcpy(vname + 1, name);
bool hit = false;
for(int i = 0; i < argc; i++){
        if(!strcmp(argv[i], vname) && i + 1 < argc){
            *p = (float)atof(argv[i + 1]);
            fprintf(stderr, " %s=%s\n", name, argv[i + 1]);
            hit = true;
        }
}
if(!hit)
*p = defaultP;
}
}
\ No newline at end of file
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
*/
#ifndef __T2TUTILITY_H__
#define __T2TUTILITY_H__
#include <stdio.h>
namespace transformer
{
/* load model parameters */
void LoadParamString(int argc, const char ** argv, const char * name, char * p, char * defaultP);
void LoadParamInt(int argc, const char ** argv, const char * name, int * p, int defaultP);
void LoadParamBool(int argc, const char ** argv, const char * name, bool * p, bool defaultP);
void LoadParamFloat(int argc, const char ** argv, const char * name, float * p, float defaultP);
}
#endif
\ No newline at end of file
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
*/
#include "Transformer.h"
namespace transformer
{
int TransformerMain(int argc, const char ** argv)
{
return 0;
}
}
\ No newline at end of file
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
*
 * An implementation of the transformer system. See more details
 * about it in
* "Attention Is All You Need" by Vaswani et al.
* https://arxiv.org/pdf/1706.03762.pdf
*
* $Created by: XIAO Tong (xiaotong@mail.neu.edu.cn) 2018-07-31
* I start writing the code related to NMT - a long time since my last coding
* work on MT
*/
#ifndef __TRANSFORMER_H__
#define __TRANSFORMER_H__
#include "../../tensor/XGlobal.h"
#include "../../tensor/XTensor.h"
#include "../../tensor/core/CHeader.h"
namespace transformer
{
/* entry point of the program */
int TransformerMain(int argc, const char ** argv);
}
#endif
\ No newline at end of file
@@ -29,6 +29,7 @@
#include "XTensor.h"
#include "XDevice.h"
#include "./test/Test.h"
#include "./core/CHeader.h"

//#define CRTDBG_MAP_ALLOC
//#include <stdlib.h>
@@ -36,7 +37,9 @@
using namespace nts;

void SetDataTest();
void SmallTest();
void TransposeTest();

int main( int argc, const char ** argv )
{
@@ -92,3 +95,35 @@ void SmallTest()
    c.Dump(stderr, "c:");
    d.Dump(stderr, "d:");
}
void TransposeTest()
{
XTensor a;
XTensor b;
int I = 2;
int J = 3;
InitTensor4D(&a, 2, 3, 4, 5);
int * dims = new int[a.order];
memcpy(dims, a.dimSize, sizeof(int) * a.order);
dims[I] = a.dimSize[J];
dims[J] = a.dimSize[I];
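    /* b gets the shape of a with dimensions I and J exchanged */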
InitTensor(&b, 4, dims);
a.SetZeroAll();
b.SetZeroAll();
float * data = new float[a.unitNum];
for(int i = 0; i < a.unitNum; i++)
data[i] = (float)i;
a.SetData(data, a.unitNum, 0);
_Transpose(&a, &b, I, J);
b.Dump(stderr, "b:");
    delete[] data;
    delete[] dims;
}
@@ -40,6 +40,7 @@ XDevManager GDevs;
/* constructor */
XDevice::XDevice()
{
    stream = NULL;
    Clear();

#ifdef USE_CUDA
@@ -55,6 +56,8 @@ XDevice::~XDevice()
    MUTEX_DELE(cublasMutex);
    if(isHandleReady)
        cublasDestroy(cublasHandle);
    if(stream != NULL)
        delete stream;
#endif
}
@@ -118,6 +121,8 @@ void XDevice::Init(int myDevID)
    }
    else
        sprintf(name2, "GPU-%d %s", devID, name);

    stream = new XStream(0, devID);
#endif
}
@@ -161,6 +166,14 @@ cublasHandle_t * XDevice::GetCublasHandle()
    return &cublasHandle;
}

/* get the stream of cuda */
cudaStream_t * XDevice::GetCudaStream()
{
    CheckNTErrors(stream != NULL, "the stream is not initialized!");

    return &stream->stream;
}

#endif // USE_CUDA

/* switch to a device */
@@ -311,11 +324,19 @@ void XDevManager::Clear()
/* get the handle of GPU */
cublasHandle_t * XDevManager::GetCudaHandle(const int devID)
{
    CheckNTErrors(devID < nGPU, "index of GPU is out of range.");

    return GPUs[devID].GetCublasHandle();
}

/* get the stream of cuda */
cudaStream_t * XDevManager::GetCudaStream(const int devID)
{
    CheckNTErrors(devID < nGPU, "index of GPU is out of range.");

    return GPUs[devID].GetCudaStream();
}

#endif

/*
@@ -384,13 +405,10 @@ int XDevManager::GetCudaThread2D(const int devID, const int n, const int m, int
    memset(gridSize, 0, sizeof(int) * 3);
    memset(blockSize, 0, sizeof(int) * 3);

    if(n <= 0 || m <= 0)
        return 1;

    CheckNTErrors(devID >= 0 && devID < nGPU, "Invalid GPU device id!");

#ifdef USE_CUDA
......
@@ -25,6 +25,7 @@
#define __XDEVICE_H__

#include "XThread.h"
#include "XStream.h"

#ifdef USE_CUDA
@@ -92,6 +93,9 @@ public:
    /* specify whether Unified Virtual Address Space (UVA) is supported */
    bool isUVASupported;

    /* default stream for the device */
    XStream * stream;

#ifdef USE_CUDA
    /* mutex for handle (GPU cublas) */
@@ -121,6 +125,9 @@ public:
#ifdef USE_CUDA
    /* get cublas handle */
    cublasHandle_t * GetCublasHandle();

    /* get the stream of cuda */
    cudaStream_t * GetCudaStream();
#endif

    /* switch to a device */
@@ -178,6 +185,9 @@ public:
#ifdef USE_CUDA
    /* get the handle of GPU */
    cublasHandle_t * GetCudaHandle(const int devID);

    /* get the stream of cuda */
    cudaStream_t * GetCudaStream(const int devID);
#endif

    /* get grid and block sizes that max potential */
......
@@ -167,7 +167,9 @@ void XLink::SetType(int id)
    type[0] = 0;
    strcpy(type, GetOPName(id));
    typeID = id;
    if(id != 0){
        CheckNTErrors(strcmp(type, "NULL"), "illegal edge type name!");
    }
}

/*
@@ -515,7 +517,7 @@ void XLink::CopyIncoming(const XTensor * reference, XTensor * target)
        tails.Add(tail);
    }

    MakeLink(&tails, target, reference->income.typeID);

    int paraNum = reference->income.paramNum;
    target->income.paramNum = paraNum;
......
@@ -208,22 +208,16 @@ void XList::Insert(int pos, void * item)
/* get the item at position i */
void * XList::GetItem(int i) const
{
    CheckNTErrors(i >= 0 && i < count, "Index of a list item is out of scope!");
    return items[i];
}

/* get the integer-typed item at position i */
int XList::GetItemInt(int i)
{
    CheckNTErrors(isIntList, "An int list is required!");
    CheckNTErrors(i >= 0 && i < count, "Index of a list item is out of scope!");
    return *(int*)(items[i]);
}

/* set the item at position i */
......
@@ -181,7 +181,10 @@ void XMem::Free(int myDevID, void * mem)
    else{
#ifdef USE_CUDA
        SetDevice(myDevID);
        cudaError_t error = cudaFree((char*)mem);
        if(error != cudaSuccess){
            ShowNTErrors("Cannot free the memory.");
        }
#else
        ShowNTErrors("Please specify USE_CUDA for compiling this program.");
#endif
......
@@ -29,20 +29,34 @@ const char * GetOPName(int type)
    if ((type & MATH_BASE) != 0){
        if (type == MATH_ABSOLUTE)
            return "M_ABSOLUTE";
        else if (type == MATH_EXP)
            return "M_EXP";
        else if (type == MATH_LOG)
            return "M_LOG";
        else if (type == MATH_SIN)
            return "M_SIN";
        else if (type == MATH_COS)
            return "M_COS";
        else if (type == MATH_TAN)
            return "M_TAN";
        else if (type == MATH_MATRIXMUL)
            return "M_MATRIXMUL";
        else if (type == MATH_MATRIXMULBATCHED)
            return "M_MATRIXMULBATCHED";
        else if (type == MATH_MULTIPLY)
            return "M_MULTIPLY";
        else if (type == MATH_DIV)
            return "M_DIV";
        else if (type == MATH_NEGATE)
            return "M_NEGATE";
        else if (type == MATH_SIGN)
            return "M_SIGN";
        else if (type == MATH_SUM)
            return "M_SUM";
        else if (type == MATH_SUB)
            return "M_SUB";
        else if (type == MATH_SUMDIM)
            return "M_SUMDIM";
        else if (type == MATH_NORMALIZE)
            return "M_NORMALIZE";
        else if (type == MATH_POWER)
......
@@ -31,15 +31,23 @@ namespace nts { // namespace nts(NiuTrans.Tensor)
/* math operations */
#define MATH_BASE 0x00001000

#define MATH_ABSOLUTE MATH_BASE + 1
#define MATH_EXP MATH_ABSOLUTE + 1
#define MATH_LOG MATH_EXP + 1
#define MATH_SIN MATH_LOG + 1
#define MATH_COS MATH_SIN + 1
#define MATH_TAN MATH_COS + 1
#define MATH_NEGATE MATH_TAN + 1
#define MATH_MATRIXMUL MATH_NEGATE + 1
#define MATH_MATRIXMULBATCHED MATH_MATRIXMUL + 1
#define MATH_MULTIPLY MATH_MATRIXMULBATCHED + 1
#define MATH_DIV MATH_MULTIPLY + 1
#define MATH_SIGN MATH_DIV + 1
#define MATH_SUM MATH_SIGN + 1
#define MATH_SUB MATH_SUM + 1
#define MATH_SUMDIM MATH_SUB + 1
#define MATH_NORMALIZE MATH_SUMDIM + 1
#define MATH_POWER MATH_NORMALIZE + 1
#define MATH_SCALEANDSHIFT MATH_POWER + 1
......
...@@ -84,7 +84,7 @@ void XStream::Create(int priority, int myDevID)
    XDevice::SetGPUDevice(myDevID);
    //cudaStreamCreateWithPriority(&stream, cudaStreamDefault, priority);
    CheckNTErrors((cudaStreamCreate(&stream) == cudaSuccess),
                  "cannot create the cuda stream!");
    XDevice::SetGPUDevice(backupDevID);
#endif
    devID = myDevID;
......
...@@ -426,8 +426,12 @@ get the size of a given dimension
int XTensor::GetDim(const int dim)
{
    CheckNTErrors(dim < order, "dimension is out of range!");

    int d = dim;
    if(dim < 0)
        d = order - 1;

    return dimSize[d];
}
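A small usage sketch (my own, not from the commit) of the new negative-index behaviour; note that any negative argument currently selects the last dimension.

```cpp
int dims[3] = {2, 3, 4};
XTensor * t = NewTensor(3, dims, X_FLOAT);
int last  = t->GetDim(-1);   // 4: a negative dim maps to the last dimension
int first = t->GetDim(0);    // 2
DelTensor(t);
```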
/*

...@@ -1439,6 +1443,21 @@ void XTensor::Dump(FILE * file, const char * label, const int n, const int verbo
}

/*
dump data to a file
>> tensor - tensor whose data is dumped
>> file - where to dump the data
>> label - label of the tensor
>> n - number of items to dump
>> verbose - verbose level
*/
void XTensor::Dump(const XTensor * tensor, FILE * file, const char * label, const int n, const int verbose)
{
XTensor a(tensor->order, tensor->dimSize, tensor->dataType, tensor->denseRatio, tensor->devID, tensor->mem);
_CopyValues(tensor, &a);
a.Dump(file, label, n, verbose);
}
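A usage sketch for the new static overload (my own example): because it copies the tensor into a temporary first, it also works for tensors whose data lives on a GPU.

```cpp
int dims[2] = {2, 2};
XTensor * onGPU = NewTensor(2, dims, X_FLOAT, 1.0F, 0);   // devID 0: first GPU
// ... fill onGPU ...
XTensor::Dump(onGPU, stderr, "onGPU");                    // copy to a temporary, then dump
DelTensor(onGPU);
```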
/*
read data from a file
>> file - where to load the data
>> label - label of the tensor
...@@ -1687,13 +1706,13 @@ void InitTensor(XTensor * tensor,
    dims[0] = -abs(dims[0]);

    if (myDevID == CURRENT_GPU)
        tensor->devID = XDevice::GetGPUDevice();
    else
        tensor->devID = myDevID;

    tensor->Resize(myOrder, dims, myDataType, myDenseRatio);

    if(allocated)
        XTensor::AllocateData(tensor);
}
...@@ -1870,28 +1889,47 @@ generate a XTensor which allocates data on the buffer
>> myDimSize - the size of each dimension
>> myMem - memory pool used to allocating the data array.
           we actually allocate the data on the buffer associated with
           the memory pool
>> devID - device id
>> myDataType - unit size (e.g., int, float, and double)
>> myDenseRatio - how often an element has non-zero value
*/
XTensor * NewTensorBuf(const int myOrder, const int * myDimSize,
                       const TENSOR_DATA_TYPE myDataType, const float myDenseRatio,
                       const int devID, XMem * myMem)
{
    int dims[MAX_TENSOR_DIM_NUM];
    memcpy(dims, myDimSize, sizeof(int) * myOrder);

    dims[0] = -abs(dims[0]);

    XTensor * tensor = NewTensor(myOrder, dims, myDataType, myDenseRatio, devID, myMem);

    if(myMem != NULL)
        tensor->data = myMem->AllocBuf(myMem->devID, tensor->unitNum * tensor->unitSize);
    else
        tensor->data = XMemAlloc(devID, tensor->unitNum * tensor->unitSize);

    return tensor;
}
/*
generate a XTensor which allocates data on the buffer
>> reference - reference tensor
>> devID - device id
>> myMem - memory pool used to allocating the data array.
we actually allocate the data on the buffer associated with
the memory pool
*/
XTensor * NewTensorBuf(const XTensor * reference, int devID, XMem * myMem)
{
return NewTensorBuf(reference->order, reference->dimSize,
reference->dataType, reference->denseRatio,
devID, myMem);
}
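A sketch of how the reference-based overload pairs with DelTensorBuf for scratch storage (my own example; `input` stands for any existing XTensor):

```cpp
// Allocate a scratch tensor shaped like 'input': from the memory pool's buffer
// when input.mem is set, otherwise from plain memory on the given device.
XTensor * tmp = NewTensorBuf(&input, input.devID, input.mem);
_CopyValues(&input, tmp);
// ... use tmp ...
DelTensorBuf(tmp);   // releases the buffer (or frees the raw block) and deletes the object
```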
/*
generate a dense vector generate a dense vector
>> num - number of entries >> num - number of entries
>> myDataType - unit size (e.g., int, float, and double) >> myDataType - unit size (e.g., int, float, and double)
...@@ -2041,7 +2079,7 @@ XTensor * NewTensor(XTensor * a, bool isFilledData)
free the data space of a given tensor
>> tensor - pointer to the tensor
*/
void DelTensor(XTensor * tensor)
{
    delete tensor;
}

...@@ -2050,10 +2088,13 @@ void DelTensor(const XTensor * tensor)
free the data space of a given tensor (on the buffer)
>> tensor - pointer to the tensor
*/
void DelTensorBuf(XTensor * tensor)
{
    if(tensor->mem != NULL)
        tensor->mem->ReleaseBuf(tensor->devID, tensor->unitNum * tensor->unitSize);
    else
        XMemFree(tensor->devID, tensor->data);
    tensor->data = NULL;
    delete tensor;
}
......
...@@ -45,12 +45,13 @@ namespace nts{
struct XLink;

/* define the maximum number of dimensions in a tensor */
#define MAX_TENSOR_DIM_NUM 8
#define USE_BATCHED_STRIDED_MAT_MUL
#define MIN_TENSOR_SPLIT_NUM 0
#define MIN_TENSOR_SPLIT_LIST_NUM 1024
#define MIN_TENSOR_CAT_NUM 8

/* computation flags */
#define UNSAFE_BUT_FAST_MEM
#define FAST_MATRIX
...@@ -328,6 +329,10 @@ public:
    /* dump data to a file */
    void Dump(FILE * file, const char * label = NULL, const int n = -1, const int verbose = 0);

    /* dump data to a file */
    static
    void Dump(const XTensor * tensor, FILE * file, const char * label = NULL, const int n = -1, const int verbose = 0);

    /* read data from a file */
    void Read(FILE * file, const char * label = NULL);
...@@ -386,8 +391,12 @@ XTensor * NewTensor(const int myOrder, const int * myDimSize, const TENSOR_DATA_
                    const float myDenseRatio = 1.0F, const int myDevID = -1, XMem * myMem = NULL);

/* generate a XTensor which allocates data on the buffer */
XTensor * NewTensorBuf(const int myOrder, const int * myDimSize,
                       const TENSOR_DATA_TYPE myDataType = X_FLOAT, const float myDenseRatio = 1.0F,
                       const int myDevID = -1, XMem * myMem = NULL);

/* generate a XTensor which allocates data on the buffer */
XTensor * NewTensorBuf(const XTensor * reference, int devID, XMem * myMem);

/* generate a dense vector */
XTensor * NewTensor1D(const int num, const TENSOR_DATA_TYPE myDataType = X_FLOAT, const int myDevID = -1,

...@@ -417,10 +426,10 @@ XTensor * NewTensor5D(const int d0, const int d1, const int d2, const int d3, co
XTensor * NewTensor(XTensor * a, bool isFilledData = true);

/* free the data space of a given tensor */
void DelTensor(XTensor * tensor);

/* free the data space of a given tensor (on the buffer) */
void DelTensorBuf(XTensor * tensor);

} /* end of the nts (NiuTrans.Tensor) namespace */
......
...@@ -175,29 +175,38 @@ void XMemCopy(void * t, int devIDT, const void * s, int devIDS, size_t size)
        return;
    }
#ifdef USE_CUDA
    else{
        int devID = devIDT < 0 ? devIDS : devIDT;
        int devIDBackup = 0;
        cudaGetDevice(&devIDBackup);
        cudaSetDevice(devID);

        if(devIDT >= 0 && devIDS < 0){
            cudaError_t error = cudaMemcpy(t, s, size, cudaMemcpyHostToDevice);
            if(error != cudaSuccess){
                ShowNTErrors("cudaMemcpy error (cudaMemcpyHostToDevice)");
            }
        }
        else if(devIDT < 0 && devIDS >= 0){
            cudaError_t error = cudaMemcpy(t, s, size, cudaMemcpyDeviceToHost);
            if(error != cudaSuccess){
                ShowNTErrors("cudaMemcpy error (cudaMemcpyDeviceToHost)");
            }
        }
        else{
            //if(devIDT == devIDS){
            cudaError_t error = cudaMemcpy(t, s, size, cudaMemcpyDeviceToDevice);
            if(error != cudaSuccess){
                ShowNTErrors("cudaMemcpy error (cudaMemcpyDeviceToDevice)");
            }
            /*}
            else{
                CheckNTErrors((cudaMemcpyPeer(t, devIDT, s, devIDS, size) == cudaSuccess),
                              "cudaMemcpy error (cudaMemcpyDeviceToDevice)");
            }*/
        }
        cudaSetDevice(devIDBackup);
    }
#else
    ShowNTErrors("Please specify USE_CUDA and recompile the code!");
...@@ -208,6 +217,9 @@ void XMemCopy(void * t, int devIDT, const void * s, int devIDS, size_t size)
#ifdef USE_CUDA
void XMemCopyAsync(void * t, int devIDT, const void * s, int devIDS, size_t size, cudaStream_t stream, int streamDevID)
{
    if(t == s)
        return;

    int devIDBackup = -1;
    if(streamDevID >= 0 && (devIDT >= 0 || devIDS >= 0)){
        CheckNTErrors((cudaGetDevice(&devIDBackup) == cudaSuccess), "Cannot get GPU device id!");

...@@ -220,17 +232,23 @@ void XMemCopyAsync(void * t, int devIDT, const void * s, int devIDS, size_t size
        return;
    }
    else if(devIDT >= 0 && devIDS < 0){
        cudaError_t error = cudaMemcpyAsync(t, s, size, cudaMemcpyHostToDevice, stream);
        if(error != cudaSuccess){
            ShowNTErrors("cudaMemcpyAsync error (cudaMemcpyHostToDevice)");
        }
    }
    else if(devIDT < 0 && devIDS >= 0){
        cudaError_t error = cudaMemcpyAsync(t, s, size, cudaMemcpyDeviceToHost, stream);
        if(error != cudaSuccess){
            ShowNTErrors("cudaMemcpyAsync error (cudaMemcpyDeviceToHost)");
        }
    }
    else{
        //if(devIDT == devIDS){
        cudaError_t error = cudaMemcpyAsync(t, s, size, cudaMemcpyDeviceToDevice, stream);
        if(error != cudaSuccess){
            ShowNTErrors("cudaMemcpyAsync error (cudaMemcpyDeviceToDevice)");
        }
        //}
        /*else{
            CheckNTErrors((cudaMemcpyPeerAsync(t, devIDT, s, devIDS, size, stream) == cudaSuccess),
...@@ -261,18 +279,69 @@ void XMemCopy2D(void * t, size_t tPitch, int devIDT, const void * s, size_t sPit
        return;
    }
#ifdef USE_CUDA
    else{
        int devID = devIDT < 0 ? devIDS : devIDT;
        int devIDBackup = 0;
        cudaGetDevice(&devIDBackup);
        cudaSetDevice(devID);

        if (devIDT >= 0 && devIDS < 0) {
            cudaError_t error = cudaMemcpy2D(t, tPitch, s, sPitch, mSize, n, cudaMemcpyHostToDevice);
            if(error != cudaSuccess){
                ShowNTErrors("cudaMemcpy2D error (cudaMemcpyHostToDevice)");
            }
        }
        else if (devIDT < 0 && devIDS >= 0) {
            cudaError_t error = cudaMemcpy2D(t, tPitch, s, sPitch, mSize, n, cudaMemcpyDeviceToHost);
            if(error != cudaSuccess){
                ShowNTErrors("cudaMemcpy error (cudaMemcpyDeviceToHost)");
            }
        }
        else {
            cudaError_t error = cudaMemcpy2D(t, tPitch, s, sPitch, mSize, n, cudaMemcpyDeviceToDevice);
            if (error != cudaSuccess) {
                ShowNTErrors("cudaMemcpy error (cudaMemcpyDeviceToDevice)");
            }
        }
        cudaSetDevice(devIDBackup);
    }
#else
    ShowNTErrors("Please specify USE_CUDA and recompile the code!");
#endif
}

void XMemCopy2DAsync(void * t, size_t tPitch, int devIDT, const void * s, size_t sPitch, int devIDS, size_t mSize, int n, XStream * stream)
{
    if (t == s)
        return;

    if (devIDT < 0 && devIDS < 0) {
        for(int i = 0; i < n; i++)
            memcpy((char*)t + tPitch * i, (char*)s + sPitch * i, mSize);
        return;
    }
#ifdef USE_CUDA
    else{
        CheckNTErrors(stream != NULL, "No stream found!");
        cudaStream_t &cstream = stream->stream;
        if (devIDT >= 0 && devIDS < 0) {
            cudaError_t error = cudaMemcpy2DAsync(t, tPitch, s, sPitch, mSize, n, cudaMemcpyHostToDevice, cstream);
            if(error != cudaSuccess){
                ShowNTErrors("cudaMemcpy2D error (cudaMemcpyHostToDevice)");
            }
        }
        else if (devIDT < 0 && devIDS >= 0) {
            cudaError_t error = cudaMemcpy2DAsync(t, tPitch, s, sPitch, mSize, n, cudaMemcpyDeviceToHost, cstream);
            if(error != cudaSuccess){
                ShowNTErrors("cudaMemcpy error (cudaMemcpyDeviceToHost)");
            }
        }
        else {
            cudaError_t error = cudaMemcpy2DAsync(t, tPitch, s, sPitch, mSize, n, cudaMemcpyDeviceToDevice, cstream);
            if (error != cudaSuccess) {
                ShowNTErrors("cudaMemcpy error (cudaMemcpyDeviceToDevice)");
            }
        }
    }
#else
......
...@@ -23,6 +23,7 @@
#include <stdio.h>
#include "XGlobal.h"
#include "XDevice.h"

#ifndef __XUTILITY_H__
#define __XUTILITY_H__

...@@ -41,6 +42,7 @@ extern void XMemSet(void * p, int value, size_t size);
extern void XMemSet(int devID, void * p, int value, size_t size);
extern void XMemCopy(void * t, int devIDT, const void * s, int devIDS, size_t size);
extern void XMemCopy2D(void * t, size_t tPitch, int devIDT, const void * s, size_t sPitch, int devIDS, size_t mSize, int n);
extern void XMemCopy2DAsync(void * t, size_t tPitch, int devIDT, const void * s, size_t sPitch, int devIDS, size_t mSize, int n, XStream * stream);
extern void * XMemAlloc(int devID, size_t size);
extern void * XMemAllocOnDev(int devID, size_t size);
extern void XMemFree(int devID, void * p);
......
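For readers new to these helpers, a short sketch (my own) of the device-id convention they share: a negative id means host memory, a non-negative id names a GPU.

```cpp
float hostBuf[16] = {0};
void * devBuf = XMemAlloc(0, sizeof(hostBuf));        // allocate on GPU 0
XMemCopy(devBuf, 0, hostBuf, -1, sizeof(hostBuf));    // host -> device
XMemCopy(hostBuf, -1, devBuf, 0, sizeof(hostBuf));    // device -> host
XMemFree(0, devBuf);
```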
...@@ -26,49 +26,62 @@
#include "../XTensor.h"

#include "arithmetic/Div.h"
#include "arithmetic/MatrixMul.h"
#include "arithmetic/MatrixMul2D.h"
#include "arithmetic/MatrixMul2DMultiTheading.h"
#include "arithmetic/MatrixMul2DParallel.h"
#include "arithmetic/MatrixMulBatched.h"
#include "arithmetic/Multiply.h"
#include "arithmetic/Negate.h"
#include "arithmetic/Sign.h"
#include "arithmetic/Sub.h"
#include "arithmetic/Sum.h"
#include "arithmetic/SumByColumnTV.h"
#include "arithmetic/SumByColumnVT.h"
#include "arithmetic/SumDim.h"
#include "arithmetic/XTensorBLAS.h"
#include "getandset/ConvertDataType.h"
#include "getandset/Select.h"
#include "getandset/SetData.h"
#include "math/Normalize.h"
#include "math/Power.h"
#include "math/ScaleAndShift.h"
#include "math/Unary.h"
#include "movement/CopyBlocks.h"
#include "movement/CopyBlocksInGrid.h"
#include "movement/CopyBlocksOnSite.h"
#include "movement/CopyData2D.h"
#include "movement/CopyIndexed.h"
#include "movement/CopyInGrid.h"
#include "movement/CopyValues.h"
#include "reduce/ReduceMax.h"
#include "reduce/ReduceMean.h"
#include "reduce/ReduceStandardVariance.h"
#include "reduce/ReduceSum.h"
#include "reduce/ReduceSumSquared.h"
#include "reduce/ReduceVariance.h"
#include "shape/Concatenate.h"
#include "shape/ConcatenateSolely.h"
#include "shape/MakeMergeBlockIndex.h"
#include "shape/MakeSplitBlockIndex.h"
#include "shape/Merge.h"
#include "shape/MergeBlockLists.h"
#include "shape/Permute.h"
#include "shape/Split.h"
#include "shape/Transpose.h"
#include "shape/Unsqueeze.h"
#include "sort/Sort.h"
#include "sort/TopK.h"
#include "utilities/XMatrixSegment.h"
#include "utilities/FlushToMem.h"

#endif // __CHEADER_H__
\ No newline at end of file
/* NiuTrans.Tensor - an open-source tensor library
* Copyright (C) 2017, Natural Language Processing Lab, Northestern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: LI Yinqiao (li.yin.qiao.2012@hotmail.com) 2018-7-11
*/
#include <math.h>
#include "../../XTensor.h"
#include "../../XName.h"
#include "Absolute.h"
#include "Absolute.cuh"
namespace nts { // namespace nts(NiuTrans.Tensor)
/*
set every entry to its absolute value
>> a - input tensor we are processing
>> b - output tensor we are processing
*/
void _Absolute(const XTensor * a, XTensor * b)
{
#ifdef USE_CUDA
/* run it on GPUs */
if (a->devID >= 0) {
_CudaAbsolute(a, b);
return;
}
#endif
CheckNTErrors((XTensor::IsSameShaped(a, b)), "Input tensors should have the same type!");
CheckNTErrors((a->dataType == DEFAULT_DTYPE), "TODO!");
DTYPE * d = (DTYPE*)a->data;
DTYPE * db = (DTYPE*)b->data;
for (int i = 0; i < a->unitNum; i++)
db[i] = (DTYPE)fabs(d[i]);
}
/*
set every entry to its absolute value (do it on site)
keep the result in the input tensor a and return nothing
>> a - the tensor we are processing
*/
void _AbsoluteMe(XTensor * a)
{
_Absolute(a, a);
}
/*
set every entry to its absolute value (return a XTensor structure)
make a new tensor to keep the result and return it
>> a - input tensor we are processing
<< return - the absolute value of input tensor
*/
XTensor Absolute(const XTensor & a)
{
XTensor b(&a);
b.SetTMP();
/* call _Absolute function */
_Absolute(&a, &b);
/* tensor connections */
XLink::MakeLink(&a, NULL, &b, MATH_ABSOLUTE);
return b;
}
} // namespace nts(NiuTrans.Tensor)
\ No newline at end of file
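The commit message mentions a macro for implementing unary functions; Absolute above shows the `_Fn` / `_FnMe` / `Fn` pattern such a macro would generate. The sketch below is my own illustration of the idea — the macro name and exact body are assumptions, not necessarily what math/Unary.h defines.

```cpp
// Hypothetical macro: generates the CPU "_Fn" and in-place "_FnMe" pair
// from a scalar function such as exp, log, sin, cos, or tan.
#define _SIMPLE_UNARY_FUNCTION(funcName, origFunc)                     \
void _##funcName(const XTensor * a, XTensor * b)                       \
{                                                                      \
    CheckNTErrors(XTensor::IsSameShaped(a, b), "Unmatched tensors!");  \
    DTYPE * d  = (DTYPE*)a->data;                                      \
    DTYPE * db = (DTYPE*)b->data;                                      \
    for (int i = 0; i < a->unitNum; i++)                               \
        db[i] = (DTYPE)origFunc(d[i]);                                 \
}                                                                      \
void _##funcName##Me(XTensor * a)                                      \
{                                                                      \
    _##funcName(a, a);                                                 \
}

_SIMPLE_UNARY_FUNCTION(Exp, exp)
_SIMPLE_UNARY_FUNCTION(Log, log)
_SIMPLE_UNARY_FUNCTION(Sin, sin)
```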
/* NiuTrans.Tensor - an open-source tensor library
* Copyright (C) 2017, Natural Language Processing Lab, Northestern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: LI Yinqiao (li.yin.qiao.2012@hotmail.com) 2018-7-11
*/
#include "../../XDevice.h"
#include "../../XTensor.h"
#include "Absolute.h"
#include "Absolute.cuh"
namespace nts { // namespace nts(NiuTrans.Tensor)
#ifdef USE_CUDA
/*
set each entry to its absolute value (CUDA Kernel)
>> a - pointer to input data array
>> b - pointer to output data array
>> size - size of the data array
*/
__global__
void KernelAbsolute(DTYPE * a, DTYPE * b, int size)
{
int i = blockDim.x * blockIdx.x + threadIdx.x;
if (i < size)
b[i] = fabs(a[i]);
}
/*
set each entry to its absolute value (CUDA Kernel)
This is for float16 computation
>> a - pointer to input data array
>> b - pointer to output data array
>> size - size of the data array
*/
__global__
void KernelAbsolute(__half * a, __half * b, int size)
{
return;
}
/*
set each entry to its absolute value
>> a - input tensor
>> b - output tensor
*/
void _CudaAbsolute(const XTensor * a, XTensor * b)
{
CheckNTErrors((XTensor::IsSameShaped(a, b)), "Input tensors should have the same type!");
CheckNTErrors((a->isSparse == false), "TODO!");
int gridSize[3];
int blockSize[3];
GDevs.GetCudaThread(a->devID, a->unitNum, gridSize, blockSize);
dim3 blocks(gridSize[0]);
dim3 threads(blockSize[0]);
int devIDBackup;
ProtectCudaDev(a->devID, devIDBackup);
if (a->dataType == DEFAULT_DTYPE) {
KernelAbsolute << <blocks, threads >> >((DTYPE*)a->data, (DTYPE*)b->data, a->unitNum);
}
else if (a->dataType == X_FLOAT16) {
KernelAbsolute << <blocks, threads >> >((__half*)a->data, (__half*)b->data, a->unitNum);
}
else {
ShowNTErrors("TODO!");
}
BacktoCudaDev(a->devID, devIDBackup);
}
#endif // USE_CUDA
} // namespace nts(NiuTrans.Tensor)
/* NiuTrans.Tensor - an open-source tensor library
* Copyright (C) 2017, Natural Language Processing Lab, Northestern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-08-01
*/
#include "../../XTensor.h"
#include "../../XName.h"
#include "Div.h"
#include "Div.cuh"
namespace nts { // namespace nts(NiuTrans.Tensor)
/*
element-wise division of two tensors
c(i) = a(i)/b(i) + \alpha * c(i)
where i is the index of the item
>> a - tensor a
>> b - tensor b
>> c - result tensor
>> alpha - the coefficient
>> leadingDim - the dimension along which we perform broadcasting
*/
void _Div(const XTensor * a, const XTensor * b, XTensor * c, DTYPE alpha, int leadingDim)
{
int leadingDimRDI = a->order - leadingDim - 1;
CheckNTErrors((a->unitNum <= c->unitNum && b->unitNum <= c->unitNum),
"Unmatched tensors in multiplication!");
CheckNTErrors((a->order == b->order && a->order == c->order),
"Unmatched tensors!");
#ifdef USE_CUDA
if (a->devID >= 0 || b->devID >= 0 || c->devID >= 0) {
_CudaDiv(a, b, c, alpha, leadingDim);
return;
}
#endif
int stride = 1;
int blockSizeA = 1;
int blockSizeB = 1;
int blockSizeC = 1;
int blockNum = 1;
int dimensionSizeA = a->dimSizeRDI[leadingDimRDI];
int dimensionSizeB = b->dimSizeRDI[leadingDimRDI];
int dimensionSizeC = c->dimSizeRDI[leadingDimRDI];
for (int i = 0; i < a->order; i++) {
if (i != leadingDimRDI) {
CheckNTErrors((a->dimSizeRDI[i] == b->dimSizeRDI[i] && a->dimSizeRDI[i] == c->dimSizeRDI[i]),
"Unmatched tensors!");
}
if (i < leadingDimRDI)
stride *= a->dimSizeRDI[i];
}
blockSizeA = stride * dimensionSizeA;
blockSizeB = stride * dimensionSizeB;
blockSizeC = stride * dimensionSizeC;
blockNum = a->unitNum / blockSizeA;
if (!a->isSparse && !b->isSparse) {
if (a->dataType == DEFAULT_DTYPE && b->dataType == DEFAULT_DTYPE) {
if (a->unitNum == c->unitNum && b->unitNum == c->unitNum) {
int size = a->unitNum;
DTYPE * ap = (DTYPE*)a->data;
DTYPE * bp = (DTYPE*)b->data;
DTYPE * cp = (DTYPE*)c->data;
if (alpha == 0) {
for (int i = 0; i < size; i++)
cp[i] = ap[i] / bp[i];
}
else {
for (int i = 0; i < size; i++)
cp[i] = ap[i] / bp[i] + alpha * cp[i];
}
}
else {
for (int k = 0; k < blockNum; k++) {
for (int ci = 0, ai = 0, bi = 0; ci < dimensionSizeC; ci++, ai++, bi++) {
if (ai >= dimensionSizeA)
ai = 0;
if (bi >= dimensionSizeB)
bi = 0;
DTYPE * ap = (DTYPE*)a->data + k * blockSizeA + ai * stride;
DTYPE * bp = (DTYPE*)b->data + k * blockSizeB + bi * stride;
DTYPE * cp = (DTYPE*)c->data + k * blockSizeC + ci * stride;
for (int j = 0; j < stride; j++)
cp[j] = ap[j] / bp[j] + cp[j] * alpha;
}
}
}
}
else {
// TODO!!
ShowNTErrors("TODO!");
}
}
else {
// TODO!!
ShowNTErrors("TODO!");
}
}
/*
element-wise division of two tensors (do it on site)
keep the result in the input tensor a and return nothing
a(i) = a(i)/b(i) + \alpha * a(i)
where i is the index of the item
>> a - tensor a (where keep the result)
>> b - tensor b
>> alpha - the coefficient
>> leadingDim - the dimension along which we perform broadcasting
*/
void _DivMe(XTensor * a, const XTensor * b, DTYPE alpha, int leadingDim)
{
_Div(a, b, a, alpha, leadingDim);
}
/*
element-wise division of two tensors (return a XTensor structure)
make a new tensor c to keep the result and return it
c(i) = a(i)/b(i)
where i is the index of the item
>> a - tensor a
>> b - tensor b
>> leadingDim - the dimension along which we perform broadcasting
<< return - the product of the tensors
*/
XTensor Div(const XTensor &a, const XTensor &b, int leadingDim)
{
CheckNTErrors(a.dimSize[leadingDim] == b.dimSize[leadingDim], "TODO!");
XTensor c(&a);
c.SetTMP();
/* call _Div function */
_Div(&a, &b, &c, 0, leadingDim);
/* tensor connections */
XLink::MakeLink(&a, &b, &c, MATH_DIV);
XLink::AddParamToHeadInt(&c, leadingDim);
return c;
}
} // namespace nts(NiuTrans.Tensor)
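A short usage sketch for the new division op (my own example, relying on the NewTensor/DelTensor declarations shown earlier):

```cpp
int dims[2] = {2, 3};
XTensor * a = NewTensor(2, dims, X_FLOAT);
XTensor * b = NewTensor(2, dims, X_FLOAT);
// ... fill a and b with non-zero values ...

XTensor c = Div(*a, *b);   // c(i) = a(i) / b(i); the link is recorded as MATH_DIV
_DivMe(a, b);              // in-place variant: a(i) = a(i) / b(i)

DelTensor(a);
DelTensor(b);
```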
/* NiuTrans.Tensor - an open-source tensor library
* Copyright (C) 2017, Natural Language Processing Lab, Northestern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (email: xiaotong@mail.neu.edu.cn) 2018-04-24
*/
#include "../../XDevice.h"
#include "../../XTensor.h"
#include "Div.h"
#include "Div.cuh"
namespace nts { // namespace nts(NiuTrans.Tensor)
#ifdef USE_CUDA
/*
division of data arrays in an element-wise manner c(i) = a(i)/b(i)
>> a - data array a
>> b - data array b
>> c - result data array
>> size - size of c
*/
__global__
void KernelDivElementWise(DTYPE * a, DTYPE * b, DTYPE * c, int size)
{
int i = blockDim.x * blockIdx.x + threadIdx.x;
if (i < size)
c[i] = a[i] / b[i];
}
/*
division of data arrays in an element-wise manner c(i) = a(i)/b(i) + \alpha*c(i)
>> a - data array a
>> b - data array b
>> c - result data array
>> size - size of c
>> alpha - the coefficient
*/
__global__
void KernelDivElementWiseV2(DTYPE * a, DTYPE * b, DTYPE * c, int size, DTYPE alpha)
{
int i = blockDim.x * blockIdx.x + threadIdx.x;
if (i < size)
c[i] = a[i] / b[i] + alpha * c[i];
}
/*
division of two tensors in an element-wise manner c(i) = a(i)/b(i).
Note that a and b can be of different sizes here, i.e.,
|a_lead| <= |c_lead| and |b_lead| <= |c_lead|
where |a_lead| means the size of the leading dimension of a
>> a - tensor a
>> b - tensor b
>> c - result tensor
>> alpha - the coefficient
>> stride - the number of items we go over when move next along the leading dimension in a block
>> ldSizeA - size of the leading dimension of a
>> ldSizeB - size of the leading dimension of b
>> ldSizeC - size of the leading dimension of c
>> blockNum - number of blocks
*/
template<int nonZeroAlpha> __global__
void KernelDivElementWiseTensorDynamic(DTYPE * a, DTYPE * b, DTYPE * c, DTYPE alpha,
int stride, int ldSizeA, int ldSizeB, int ldSizeC, int blockNum)
{
__shared__ DTYPE* ap[MAX_CUDA_THREAD_NUM_PER_BLOCK];
__shared__ DTYPE* bp[MAX_CUDA_THREAD_NUM_PER_BLOCK];
__shared__ DTYPE* cp[MAX_CUDA_THREAD_NUM_PER_BLOCK];
int i = blockDim.x * blockIdx.x + threadIdx.x;
int j = blockDim.y * blockIdx.y + threadIdx.y;
if (i >= blockNum * stride || j >= ldSizeC)
return;
if (threadIdx.y == 0) {
int block = i / stride;
int size = block * stride;
ap[threadIdx.x] = a + size * ldSizeA;
bp[threadIdx.x] = b + size * ldSizeB;
cp[threadIdx.x] = c + size * ldSizeC;
}
__syncthreads();
int aj = j >= ldSizeA ? j % ldSizeA : j;
int bj = j >= ldSizeB ? j % ldSizeB : j;
int offseti = i % stride;
if (nonZeroAlpha == 0)
cp[threadIdx.x][j * ldSizeC + offseti] = ap[threadIdx.x][aj * ldSizeA + offseti] / bp[threadIdx.x][bj * ldSizeB + offseti];
else
cp[threadIdx.x][j * ldSizeC + offseti] = ap[threadIdx.x][aj * ldSizeA + offseti] / bp[threadIdx.x][bj * ldSizeB + offseti]
+ alpha * cp[threadIdx.x][j * ldSizeC + offseti];
}
/*
element-wise division of two tensors
c(i) = a(i)/b(i) + \alpha * c(i)
where i is the item index
>> a - tensor a
>> b - tensor b
>> c - result tensor
>> alpha - the coefficient
>> leadingDim - dimension along which we perform broadcasting
*/
void _CudaDiv(const XTensor * a, const XTensor * b, XTensor * c, DTYPE alpha, int leadingDim)
{
int leadingDimRDI = a->order - leadingDim - 1;
CheckNTErrors((a->unitNum <= c->unitNum && b->unitNum <= c->unitNum),
"Unmatched tensors in multiplication!");
CheckNTErrors((a->order == b->order && a->order == c->order), "Unmatched tensors!");
int stride = 1;
int blockSizeA = 1;
int blockNum = 1;
int dimensionSizeA = a->dimSizeRDI[leadingDimRDI];
int dimensionSizeB = b->dimSizeRDI[leadingDimRDI];
int dimensionSizeC = c->dimSizeRDI[leadingDimRDI];
for (int i = 0; i < a->order; i++) {
if (i != leadingDimRDI) {
CheckNTErrors((a->dimSizeRDI[i] == b->dimSizeRDI[i] &&
a->dimSizeRDI[i] == c->dimSizeRDI[i]),
"Unmatched tensors!");
}
if (i < leadingDimRDI)
stride *= a->dimSizeRDI[i];
}
blockSizeA = stride * dimensionSizeA;
blockNum = a->unitNum / blockSizeA;
int devIDBackup;
ProtectCudaDev(a->devID, devIDBackup);
if (!a->isSparse && !b->isSparse) {
if (a->dataType == DEFAULT_DTYPE && b->dataType == DEFAULT_DTYPE) {
int cudaGridSize[3];
int cudaBlockSize[3];
if (a->unitNum == c->unitNum && b->unitNum == c->unitNum) {
GDevs.GetCudaThread(a->devID, c->unitNum, cudaGridSize, cudaBlockSize);
dim3 blocks(cudaGridSize[0]), threads(cudaBlockSize[0]);
if (alpha == 0)
KernelDivElementWise << <blocks, threads >> >((DTYPE*)a->data, (DTYPE*)b->data, (DTYPE*)c->data, c->unitNum);
else
KernelDivElementWiseV2 << <blocks, threads >> >((DTYPE*)a->data, (DTYPE*)b->data, (DTYPE*)c->data, c->unitNum, alpha);
}
else {
GDevs.GetCudaThread2D(c->devID, stride * blockNum, dimensionSizeC, MAX_INT, cudaGridSize, cudaBlockSize);
dim3 blocks(cudaGridSize[0], cudaGridSize[1]), threads(cudaBlockSize[0], cudaBlockSize[1]);
if (alpha == 0) {
KernelDivElementWiseTensorDynamic<0> << <blocks, threads >> >
((DTYPE*)a->data, (DTYPE*)b->data, (DTYPE*)c->data, 0,
stride, dimensionSizeA, dimensionSizeB, dimensionSizeC, blockNum);
}
else {
KernelDivElementWiseTensorDynamic<1> << <blocks, threads >> >
((DTYPE*)a->data, (DTYPE*)b->data, (DTYPE*)c->data, alpha,
stride, dimensionSizeA, dimensionSizeB, dimensionSizeC, blockNum);
}
}
}
else {
// TODO!!
ShowNTErrors("TODO!");
}
}
else {
// TODO!!
ShowNTErrors("TODO!");
}
BacktoCudaDev(a->devID, devIDBackup);
}
#endif // USE_CUDA
} // namespace nts(NiuTrans.Tensor)
\ No newline at end of file
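The simple element-wise path above only needs one thread per element; GDevs.GetCudaThread hides the arithmetic, but a minimal sketch of the usual grid-size computation (my own, with an assumed block size) looks like this:

```cpp
// a, b, c are device pointers holding 'unitNum' elements each.
void launchDivSketch(DTYPE * a, DTYPE * b, DTYPE * c, int unitNum)
{
    const int threadsPerBlock = 128;                                       // assumed; GDevs picks this per device
    const int blocks = (unitNum + threadsPerBlock - 1) / threadsPerBlock;  // ceil(unitNum / threadsPerBlock)
    KernelDivElementWise<<<blocks, threadsPerBlock>>>(a, b, c, unitNum);
}
```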
/* NiuTrans.Tensor - an open-source tensor library
* Copyright (C) 2017, Natural Language Processing Lab, Northestern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-08-01
*/
#ifndef __DIV_CUH__
#define __DIV_CUH__
#include "Div.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
#ifdef USE_CUDA
/* division of two tensors in an element-wise manner c(i) = a(i)/b(i) */
__global__
void KernelDivElementWise(DTYPE * a, DTYPE * b, DTYPE * c, int size);

/* division of two tensors in an element-wise manner c(i) = a(i)/b(i) + \alpha*c(i) */
__global__
void KernelDivElementWiseV2(DTYPE * a, DTYPE * b, DTYPE * c, int size, DTYPE alpha);

/* division of two tensors in an element-wise manner c(i) = a(i)/b(i) + \alpha*c(i) */
template<int nonZeroAlpha>__global__
void KernelDivElementWiseTensorDynamic(DTYPE * a, DTYPE * b, DTYPE * c, DTYPE alpha, int stride, int ldSizeA, int ldSizeB, int ldSizeC, int blockNum);
/* element-wise division of two tensors */
void _CudaDiv(const XTensor * a, const XTensor * b, XTensor * c, DTYPE alpha = 0, int leadingDim = 0);
#endif // USE_CUDA
} // namespace nts(NiuTrans.Tensor)
#endif // __DIV_CUH__
...@@ -16,31 +16,39 @@
 */

/*
 * $Created by: Xu Chen (email: hello_master1954@163.com) 2018-08-01
 */

#ifndef __DIV_H__
#define __DIV_H__

#include "../../XTensor.h"

namespace nts { // namespace nts(NiuTrans.Tensor)

/*
element-wise division of two tensors:
c(i) = a(i)/b(i) + \alpha * c(i)
where i is the index of the element
*/
void _Div(const XTensor * a, const XTensor * b, XTensor * c, DTYPE alpha = 0, int leadingDim = 0);

/*
element-wise division of two tensors (do it on site)
keep the result in the input tensor a and return nothing
a(i) = a(i)/b(i) + \alpha * a(i)
where i is the index of the element
*/
void _DivMe(XTensor * a, const XTensor * b, DTYPE alpha = 0, int leadingDim = 0);

/*
element-wise division of two tensors (return a XTensor structure)
make a new tensor to keep the result and return it
c(i) = a(i)/b(i)
where i is the index of the element
*/
XTensor Div(const XTensor &a, const XTensor &b, int leadingDim = 0);

} // namespace nts(NiuTrans.Tensor)

#endif // __DIV_H__
\ No newline at end of file
/* NiuTrans.Tensor - an open-source tensor library
* Copyright (C) 2017, Natural Language Processing Lab, Northestern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (email: xiaotong@mail.neu.edu.cn) 2018-04-24
*/
#include "../../XTensor.h"
#include "MatrixMULBatchedCPU.h"
#include "MatrixMul2D.h"
#include "XTensorBLAS.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
/*
matrix multiplication in batch mode (BLAS)
c_i = trans(a_i) * trans(b_i) * \alpha + c_i * \beta for each i in [0,count-1]
>> a - list of input matrices (2d tensors)
>> transposedA - indicate whether the matrix a is transposed
>> b - another list of input matrices (2d tensors)
>> transposedB - indicate whether the matrix b is transposed
>> c - output matrix (2d tensor)
>> alpha - scalar
>> beta - scalar
*/
void _MatrixMULBatchedCPU(const XList * a, MATRIX_TRANS_TYPE transposedA,
const XList * b, MATRIX_TRANS_TYPE transposedB,
XList * c, DTYPE alpha, DTYPE beta)
{
CheckNTErrors(a && b && c, "Empty input lists!");
CheckNTErrors(a->count == b->count && a->count == c->count, "Input lists must be of the same size!");
if (a->count == 0)
return;
bool isUniform = true;
for (int i = 1; i < a->count; i++) {
XTensor * aim = (XTensor*)a->GetItem(i - 1);
XTensor * bim = (XTensor*)b->GetItem(i - 1);
XTensor * cim = (XTensor*)c->GetItem(i - 1);
XTensor * ai = (XTensor*)a->GetItem(i);
XTensor * bi = (XTensor*)b->GetItem(i);
XTensor * ci = (XTensor*)c->GetItem(i);
if (!XTensor::IsSameShaped(aim, ai) ||
!XTensor::IsSameShaped(bim, bi) ||
!XTensor::IsSameShaped(cim, ci))
{
isUniform = false;
break;
}
}
for (int i = 0; i < a->count; i++) {
XTensor * ai = (XTensor*)a->GetItem(i);
XTensor * bi = (XTensor*)b->GetItem(i);
XTensor * ci = (XTensor*)c->GetItem(i);
CheckNTErrors((ai->order == 2), "2d tensor (i.e., matrix) is required!");
CheckNTErrors((bi->order == 2), "2d tensor (i.e., matrix) is required!");
CheckNTErrors((ci->order == 2), "2d tensor (i.e., matrix) is required!");
#ifdef USE_BLAS
if (useBLAS)
_MatrixMULCPU(ai, transposedA, bi, transposedB, ci, alpha, beta);
else
_MatrixMul2D(ai, transposedA, bi, transposedB, ci, alpha, beta);
#else
_MatrixMul2D(ai, transposedA, bi, transposedB, ci, alpha, beta);
#endif
}
//}
}
} // namespace nts(NiuTrans.Tensor)
\ No newline at end of file
...@@ -24,8 +24,8 @@
#include "../../XName.h"
#include "MatrixMul.h"
#include "MatrixMul2D.h"
#include "XTensorBLAS.h"
#include "MatrixMulBatched.h"

namespace nts { // namespace nts(NiuTrans.Tensor)

...@@ -156,9 +156,9 @@ void _MatrixMul(const XTensor * a, MATRIX_TRANS_TYPE transposedA,
    }
    else {
        CheckNTErrors((a->dataType == DEFAULT_DTYPE), "TODO!");
        _MatrixMulBatchedCPU(aList, transposedA,
                             bList, transposedB,
                             cList, alpha, beta);
    }

    for (int i = 0; i < aList->count; i++) {
......
...@@ -23,8 +23,8 @@
#include "../../XDevice.h"
#include "../../XName.h"
#include "MatrixMulBatched.h"
#include "XTensorBLAS.h"
#include "MatrixMul2D.h"

namespace nts { // namespace nts(NiuTrans.Tensor)

...@@ -57,6 +57,43 @@ void _MatrixMulBatched(const XTensor * a, MATRIX_TRANS_TYPE transposedA,
    CheckNTErrors((a->order == b->order && a->order == c->order),
                  "Input tensor and output tensor must have same order!");
if (a->devID >= 0 || b->devID >= 0 || c->devID >= 0)
_MatrixMulBatchedGPU(a, transposedA, b, transposedB, c, alpha, beta);
else
_MatrixMulBatchedCPU(a, transposedA, b, transposedB, c, alpha, beta);
}
/*
matrix multiplication of the two tensors
optimized for GPU
for each 2-dimensional data array in a (denoted as ai) and
each 2-dimensional data array in b (denoted as bi), we have
ci = trans(ai) * trans(bi) * alpha + ci * beta
where trans() returns the transposed matrix if the flag is fired
>> a - tensor a
>> transposedA - indicates whether the matrices in a are transposed
>> b - tensor b
>> transposedB - indicates whether the matrices in b are transposed
>> c - where we keep a*b
>> alpha - a coefficient
>> beta - another coefficient
*/
void _MatrixMulBatchedGPU(const XTensor * a, MATRIX_TRANS_TYPE transposedA,
const XTensor * b, MATRIX_TRANS_TYPE transposedB,
XTensor * c, DTYPE alpha, DTYPE beta)
{
#ifdef USE_CUDA
CheckNTErrors((a && b && c), "Empty input tensors!");
CheckNTErrors((a->dataType == b->dataType && a->dataType == c->dataType),
"Input tensors should have the same data type!");
CheckNTErrors((a->order >= 2 && b->order >= 2 && c->order >= 2),
"Input tensors must have a order >= 2!");
CheckNTErrors((a->order == b->order && a->order == c->order),
"Input tensor and output tensor must have same order!");
CheckNTErrors(a->devID >= 0 && b->devID >= 0 && c->devID >= 0, "The tensors must be on GPUs");
    int an = transposedA == X_TRANS ? a->dimSizeRDI[0] : a->dimSizeRDI[1];
    int am = transposedA == X_TRANS ? a->dimSizeRDI[1] : a->dimSizeRDI[0];
    int bn = transposedB == X_TRANS ? b->dimSizeRDI[0] : b->dimSizeRDI[1];

...@@ -64,8 +101,7 @@ void _MatrixMulBatched(const XTensor * a, MATRIX_TRANS_TYPE transposedA,
    int cn = c->dimSizeRDI[1];
    int cm = c->dimSizeRDI[0];

    CheckNTErrors((am == bn && an == cn && bm == cm), "Unmatched tensors in multiplication!");

    int aBlockSize = a->dimSizeRDI[0] * a->dimSizeRDI[1];
    int bBlockSize = b->dimSizeRDI[0] * b->dimSizeRDI[1];

...@@ -81,76 +117,154 @@ void _MatrixMulBatched(const XTensor * a, MATRIX_TRANS_TYPE transposedA,
        blockNum *= a->dimSizeRDI[i];
    }
    cublasHandle_t * handle = a->mem != NULL ? a->mem->GetCublasHandle() : GDevs.GetCudaHandle(a->devID);
    _CudaBLASMatrixMULBatchedStrided(handle,
                                     a->data, transposedA, a->dataType, aBlockSize,
                                     b->data, transposedB, b->dataType, bBlockSize,
                                     c->data, c->dataType, cBlockSize, blockNum,
                                     a->dimSizeRDI[1], a->dimSizeRDI[0],
                                     b->dimSizeRDI[1], b->dimSizeRDI[0],
                                     c->dimSizeRDI[1], c->dimSizeRDI[0], alpha, beta);
#endif
}

/*
matrix multiplication of the two tensors
optimized for CPU

for each 2-dimensional data array in a (denoted as ai) and
each 2-dimensional data array in b (denoted as bi), we have
ci = trans(ai) * trans(bi) * alpha + ci * beta
where trans() returns the transposed matrix if the flag is fired

>> a - tensor a
>> transposedA - indicates whether the matrices in a are transposed
>> b - tensor b
>> transposedB - indicates whether the matrices in b are transposed
>> c - where we keep a*b
>> alpha - a coefficient
>> beta - another coefficient
*/
void _MatrixMulBatchedCPU(const XTensor * a, MATRIX_TRANS_TYPE transposedA,
                          const XTensor * b, MATRIX_TRANS_TYPE transposedB,
                          XTensor * c, DTYPE alpha, DTYPE beta)
{
    CheckNTErrors((a && b && c), "Empty input tensors!");
    CheckNTErrors((a->dataType == b->dataType && a->dataType == c->dataType),
                  "Input tensors should have the same data type!");
    CheckNTErrors((a->order >= 2 && b->order >= 2 && c->order >= 2),
                  "Input tensors must have a order >= 2!");
    CheckNTErrors((a->order == b->order && a->order == c->order),
                  "Input tensor and output tensor must have same order!");

    int an = transposedA == X_TRANS ? a->dimSizeRDI[0] : a->dimSizeRDI[1];
    int am = transposedA == X_TRANS ? a->dimSizeRDI[1] : a->dimSizeRDI[0];
    int bn = transposedB == X_TRANS ? b->dimSizeRDI[0] : b->dimSizeRDI[1];
    int bm = transposedB == X_TRANS ? b->dimSizeRDI[1] : b->dimSizeRDI[0];
    int cn = c->dimSizeRDI[1];
    int cm = c->dimSizeRDI[0];

    CheckNTErrors((am == bn && an == cn && bm == cm), "Unmatched tensors in multiplication!");

    int aBlockSize = a->dimSizeRDI[0] * a->dimSizeRDI[1];
    int bBlockSize = b->dimSizeRDI[0] * b->dimSizeRDI[1];
    int cBlockSize = c->dimSizeRDI[0] * c->dimSizeRDI[1];
    int aRealBlockSize = aBlockSize * a->unitSize;
    int bRealBlockSize = bBlockSize * b->unitSize;
    int cRealBlockSize = cBlockSize * c->unitSize;
    int blockNum = 1;

    for (int i = 2; i < a->order; i++) {
        CheckNTErrors((a->dimSizeRDI[i] == c->dimSizeRDI[i]), "Incorrect tensor sizes!");
        CheckNTErrors((b->dimSizeRDI[i] == c->dimSizeRDI[i]), "Incorrect tensor sizes!");
        blockNum *= a->dimSizeRDI[i];
    }

    int aDimSize[2] = {-a->dimSizeRDI[1], a->dimSizeRDI[0]};
    int bDimSize[2] = {-b->dimSizeRDI[1], b->dimSizeRDI[0]};
    int cDimSize[2] = {-c->dimSizeRDI[1], c->dimSizeRDI[0]};

    XTensor * ai = NewTensor2D(aDimSize[0], aDimSize[1], a->dataType, a->devID, a->mem);
    XTensor * bi = NewTensor2D(bDimSize[0], bDimSize[1], b->dataType, b->devID, b->mem);
    XTensor * ci = NewTensor2D(cDimSize[0], cDimSize[1], c->dataType, c->devID, c->mem);

    for (int i = 0; i < blockNum; i++) {
        ai->data = (char*)a->data + i * aRealBlockSize;
        bi->data = (char*)b->data + i * bRealBlockSize;
        ci->data = (char*)c->data + i * cRealBlockSize;
#ifdef USE_BLAS
        if (useBLAS)
            _MatrixMULCPU(ai, transposedA, bi, transposedB, ci, alpha, beta);
        else
            _MatrixMul2D(ai, transposedA, bi, transposedB, ci, alpha, beta);
#else
        _MatrixMul2D(ai, transposedA, bi, transposedB, ci, alpha, beta);
#endif
    }

    ai->data = NULL;
    bi->data = NULL;
    ci->data = NULL;
    delete ai;
    delete bi;
    delete ci;
}

/*
matrix multiplication in batch mode for list inputs (BLAS)
c_i = trans(a_i) * trans(b_i) * \alpha + c_i * \beta for each i in [0,count-1]
>> a - list of input matrices (2d tensors)
>> transposedA - indicate whether the matrix a is transposed
>> b - another list of input matrices (2d tensors)
>> transposedB - indicate whether the matrix b is transposed
>> c - output matrix (2d tensor)
>> alpha - scalar
>> beta - scalar
*/
void _MatrixMulBatchedCPU(const XList * a, MATRIX_TRANS_TYPE transposedA,
                          const XList * b, MATRIX_TRANS_TYPE transposedB,
                          XList * c, DTYPE alpha, DTYPE beta)
{
    CheckNTErrors(a && b && c, "Empty input lists!");
    CheckNTErrors(a->count == b->count && a->count == c->count, "Input lists must be of the same size!");

    if (a->count == 0)
        return;

    bool isUniform = true;
    for (int i = 1; i < a->count; i++) {
        XTensor * aim = (XTensor*)a->GetItem(i - 1);
        XTensor * bim = (XTensor*)b->GetItem(i - 1);
        XTensor * cim = (XTensor*)c->GetItem(i - 1);
        XTensor * ai = (XTensor*)a->GetItem(i);
        XTensor * bi = (XTensor*)b->GetItem(i);
        XTensor * ci = (XTensor*)c->GetItem(i);
        if (!XTensor::IsSameShaped(aim, ai) ||
            !XTensor::IsSameShaped(bim, bi) ||
            !XTensor::IsSameShaped(cim, ci))
        {
            isUniform = false;
            break;
        }
    }

    for (int i = 0; i < a->count; i++) {
        XTensor * ai = (XTensor*)a->GetItem(i);
        XTensor * bi = (XTensor*)b->GetItem(i);
        XTensor * ci = (XTensor*)c->GetItem(i);
        CheckNTErrors((ai->order == 2), "2d tensor (i.e., matrix) is required!");
        CheckNTErrors((bi->order == 2), "2d tensor (i.e., matrix) is required!");
        CheckNTErrors((ci->order == 2), "2d tensor (i.e., matrix) is required!");
#ifdef USE_BLAS
        if (useBLAS)
            _MatrixMULCPU(ai, transposedA, bi, transposedB, ci, alpha, beta);
        else
            _MatrixMul2D(ai, transposedA, bi, transposedB, ci, alpha, beta);
#else
        _MatrixMul2D(ai, transposedA, bi, transposedB, ci, alpha, beta);
#endif
    }
}

/*
......
...@@ -26,6 +26,8 @@
namespace nts { // namespace nts(NiuTrans.Tensor)

#define BMMul MatrixMulBatched

/*
matrix multiplication of the two tensors c = trans(a) * trans(b) * alpha + c * beta

...@@ -37,6 +39,28 @@ where trans() returns the transposed matrix if the flag is fired
void _MatrixMulBatched(const XTensor * a, MATRIX_TRANS_TYPE transposedA, const XTensor * b, MATRIX_TRANS_TYPE transposedB,
                       XTensor * c, DTYPE alpha = (DTYPE)1.0, DTYPE beta = 0, XPRunner * parallelRunner = NULL);
/*
matrix multiplication of the two tensors c = trans(a) * trans(b) * alpha + c * beta
optimized for GPU
*/
void _MatrixMulBatchedGPU(const XTensor * a, MATRIX_TRANS_TYPE transposedA, const XTensor * b, MATRIX_TRANS_TYPE transposedB,
XTensor * c, DTYPE alpha = (DTYPE)1.0, DTYPE beta = 0);
/*
matrix multiplication of the two tensors c = trans(a) * trans(b) * alpha + c * beta
optimized for CPU
*/
void _MatrixMulBatchedCPU(const XTensor * a, MATRIX_TRANS_TYPE transposedA, const XTensor * b, MATRIX_TRANS_TYPE transposedB,
XTensor * c, DTYPE alpha = (DTYPE)1.0, DTYPE beta = 0);
/*
matrix multiplication of the two tensors c = trans(a) * trans(b) * alpha + c * beta (for list inputs)
optimized for CPU
*/
void _MatrixMulBatchedCPU(const XList * a, MATRIX_TRANS_TYPE transposedA, const XList * b, MATRIX_TRANS_TYPE transposedB,
XList * c, DTYPE alpha = (DTYPE)1.0, DTYPE beta = 0);
/* /*
matrix multiplication of the two tensors (return a XTensor structure) c = trans(a) * trans(b) * alpha matrix multiplication of the two tensors (return a XTensor structure) c = trans(a) * trans(b) * alpha
make a new tensor to keep the result and return it make a new tensor to keep the result and return it
......
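A sketch of calling the batched multiplication through the new BMMul alias (my own example; the XTensor-returning MatrixMulBatched wrapper is declared further down in this header):

```cpp
int dimsA[3] = {8, 2, 3};   // 8 matrices of size 2 x 3
int dimsB[3] = {8, 3, 4};   // 8 matrices of size 3 x 4
XTensor * a = NewTensor(3, dimsA, X_FLOAT);
XTensor * b = NewTensor(3, dimsB, X_FLOAT);
// ... fill a and b ...
XTensor c = BMMul(*a, X_NOTRANS, *b, X_NOTRANS);   // 8 matrices of size 2 x 4
DelTensor(a);
DelTensor(b);
```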
...@@ -32,9 +32,9 @@ element-wise product of two tensors
c(i) = a(i)*b(i) + \alpha * c(i)
where i is the index of the item

>> a - tensor a
>> b - tensor b
>> c - result tensor
>> alpha - the coefficient
>> leadingDim - the dimension along which we perform broadcasting
*/
......
...@@ -104,9 +104,9 @@ void KernelMulElementWiseTensorDynamic(DTYPE * a, DTYPE * b, DTYPE * c, DTYPE al
    int offseti = i % stride;

    if (nonZeroAlpha == 0)
        cp[threadIdx.x][j * ldSizeC + offseti] = ap[threadIdx.x][aj * ldSizeA + offseti] * bp[threadIdx.x][bj * ldSizeB + offseti];
    else
        cp[threadIdx.x][j * ldSizeC + offseti] = ap[threadIdx.x][aj * ldSizeA + offseti] * bp[threadIdx.x][bj * ldSizeB + offseti] +
                                                 alpha * cp[threadIdx.x][j * ldSizeC + offseti];
}
......
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-08-01
*/
#include "../../XTensor.h"
#include "../../XName.h"
#include "../../XUtility.h"
#include "Sub.h"
#include "Sub.cuh"
namespace nts { // namespace nts(NiuTrans.Tensor)
/*
tensor subtraction c = a - b * \beta
>> a - a tensor
>> b - another tensor
>> c - where we put a-b*\beta. we save it in a if c is NULL
>> beta - the scaling factor
*/
void _Sub(const XTensor * a, const XTensor * b, XTensor * c, DTYPE beta)
{
CheckNTErrors(a && b && c, "Empty tensor input!");
    CheckNTErrors(a->unitNum == b->unitNum && a->unitNum == c->unitNum,
                  "Unmatched tensors in subtraction!");
    CheckNTErrors(a->dataType == b->dataType && a->dataType == c->dataType,
                  "Unmatched tensors in subtraction!");
if (a->devID >= 0 || b->devID >= 0 || c->devID >= 0) {
#ifdef USE_CUDA
if (a == c) {
int P2PAccesible = 0;
#ifdef CUDA_UVA
cudaDeviceCanAccessPeer(&P2PAccesible, a->devID, b->devID);
#endif
if ((a->devID < 0 && b->devID >= 0) ||
(a->devID >= 0 && b->devID < 0) ||
(a->devID >= 0 && b->devID >= 0 && a->devID != b->devID && !P2PAccesible))
{
ShowNTErrors("Cannot run this method on multiple devices simultaneously!");
}
else
_CudaSub(a, b, c, beta);
}
else
_CudaSub(a, b, c, beta);
#endif
}
else {
if (!a->isSparse && !b->isSparse) {
            CheckNTErrors(!c->isSparse, "Illegal use of sparse tensor in subtraction!");
if (a->dataType == DEFAULT_DTYPE &&
b->dataType == DEFAULT_DTYPE &&
c->dataType == DEFAULT_DTYPE)
{
DTYPE * ap = (DTYPE*)a->data;
DTYPE * bp = (DTYPE*)b->data;
DTYPE * cp = (DTYPE*)c->data;
/* unrolling */
int num = a->unitNum;
if (num % 4 == 0) {
for (int i = 0; i < num; i += 4) {
cp[i] = ap[i] - bp[i] * beta;
cp[i + 1] = ap[i + 1] - bp[i + 1] * beta;
cp[i + 2] = ap[i + 2] - bp[i + 2] * beta;
cp[i + 3] = ap[i + 3] - bp[i + 3] * beta;
}
}
else if (num % 2 == 0) {
for (int i = 0; i < num; i += 2) {
cp[i] = ap[i] - bp[i] * beta;
cp[i + 1] = ap[i + 1] - bp[i + 1] * beta;
}
}
else {
for (int i = 0; i < num; i++) {
cp[i] = ap[i] - bp[i] * beta;
}
}
}
else {
// TODO!!
ShowNTErrors("TODO!");
}
}
else {
// TODO!!
ShowNTErrors("TODO!");
}
}
}
/*
tensor subtraction a = a - b * \beta (do it on site)
keep the result in the tensor a and return nothing
>> a - a tensor
>> b - another tensor
>> beta - the scaling factor
*/
void _SubMe(XTensor * a, const XTensor * b, DTYPE beta)
{
_Sub(a, b, a, beta);
}
/*
tensor subtraction c = a - b * \beta (return a XTensor structure)
make a new tensor c to keep the result and return it
>> a - a tensor
>> b - another tensor
>> beta - the scaling factor
<< return - the result of tensor subtraction
*/
XTensor Sub(const XTensor &a, const XTensor &b, DTYPE beta)
{
XTensor c(&a);
c.SetTMP();
/* call _Sub function */
_Sub(&a, &b, &c, beta);
/* tensor connections */
XLink::MakeLink(&a, &b, &c, MATH_SUB);
XLink::AddParamToHead(&c, beta);
return c;
}
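
/*
Illustrative usage sketch (not part of the original file): how the three
subtraction entry points relate. The NewTensor(order, dimSize, dataType,
denseRatio, devID) constructor and the _SetDataFixedFloat helper are assumed
to be visible here as declared elsewhere in the library; sizes are arbitrary.
*/
void SubUsageSketch()
{
    int dims[2] = {2, 3};
    XTensor * a = NewTensor(2, dims, X_FLOAT, 1.0F, -1);
    XTensor * b = NewTensor(2, dims, X_FLOAT, 1.0F, -1);
    XTensor * c = NewTensor(2, dims, X_FLOAT, 1.0F, -1);

    _SetDataFixedFloat(a, 3.0F);
    _SetDataFixedFloat(b, 2.0F);

    _Sub(a, b, c, 0.5F);             /* c = a - b * 0.5, i.e., every entry becomes 2.0 */
    _SubMe(a, b);                    /* a = a - b, beta defaults to 1.0 */
    XTensor d = Sub(*a, *b, 2.0F);   /* d = a - b * 2.0, with tensor connections recorded */

    delete a; delete b; delete c;
}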
} // namespace nts(NiuTrans.Tensor)
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-08-01
*/
#include "../../XDevice.h"
#include "../../XUtility.h"
#include "Sub.cuh"
namespace nts { // namespace nts(NiuTrans.Tensor)
#ifdef USE_CUDA
/*
subtraction of data arrays (CUDA Kernel)
c = a - b * \beta
>> a - A matrix
>> b - another matrix
>> c - where we put a-b
>> size - the size of a/b/c
>> beta - the coefficient
*/
__global__
void KernelSUB(DTYPE * a, DTYPE * b, DTYPE * c, int size, DTYPE beta)
{
int i = blockDim.x * blockIdx.x + threadIdx.x;
if (i < size)
c[i] = a[i] - b[i] * beta;
}
/*
tensor subtraction c = a - b * \beta (cuda version)
>> a - a tensor
>> b - another tensor
>> c - where we put a-b*\beta.
>> beta - the scaling factor
*/
void _CudaSub(const XTensor * a, const XTensor * b, XTensor * c, DTYPE beta)
{
CheckNTErrors(a && b && c, "Empty tensor input!");
    CheckNTErrors((a->unitNum == b->unitNum && a->unitNum == c->unitNum),
                  "Unmatched tensors in subtraction!");
    CheckNTErrors((a->dataType == b->dataType && a->dataType == c->dataType),
                  "Unmatched tensors in subtraction!");
    CheckNTErrors((a->devID == b->devID && a->devID == c->devID),
                  "The tensors must be on the same device!");
int devIDBackup = XDevice::GetGPUDevice();
XDevice::SetGPUDevice(a->devID);
if (!a->isSparse && !b->isSparse) {
        CheckNTErrors(!c->isSparse, "Illegal use of sparse tensor in subtraction!");
if (a->dataType == DEFAULT_DTYPE &&
b->dataType == DEFAULT_DTYPE &&
c->dataType == DEFAULT_DTYPE)
{
int gridSize[3], blockSize[3];
GDevs.GetCudaThread(a->devID, a->unitNum, gridSize, blockSize);
dim3 blocks(gridSize[0]);
dim3 threads(blockSize[0]);
KernelSUB << <blocks, threads >> >((DTYPE*)a->data, (DTYPE*)b->data, (DTYPE*)c->data, a->unitNum, beta);
}
else {
// TODO!!
ShowNTErrors("TODO!");
}
}
else {
// TODO!!
ShowNTErrors("TODO!");
}
XDevice::SetGPUDevice(devIDBackup);
}
/* subtraction over arrays
tensor subtraction c = a - b * \beta (cuda version) with an input handle
>> devID - device ID (MUST >= 0)
>> handle - cuda handle
>> a - an array
>> b - another array
>> c - where we put a-b
>> size - size of the array
>> beta - the coefficient
*/
void _CudaSubWithHandle(int devID, cublasHandle_t * handle, DTYPE * a, DTYPE * b, DTYPE * c, int size, DTYPE beta)
{
if (size == 0)
return;
if (c == NULL)
c = a;
    CheckNTErrors((a && b && c), "Empty arrays in subtraction!");
int devIDBackup;
ProtectCudaDev(devID, devIDBackup);
    if (c == a) {
        /* axpy computes a = alpha * b + a, so pass -beta to obtain a = a - b * beta */
        DTYPE scale = -beta;
#ifdef DOUBELPRICSION
        cublasDaxpy(*handle, size, &scale, b, 1, a, 1);
#else
        cublasSaxpy(*handle, size, &scale, b, 1, a, 1);
#endif
    }
else {
int gridSize[3], blockSize[3];
GDevs.GetCudaThread(devID, size, gridSize, blockSize);
dim3 blocks(gridSize[0]);
dim3 threads(blockSize[0]);
KernelSUB<<<blocks, threads>>>((DTYPE*)a, (DTYPE*)b, (DTYPE*)c, size, beta);
}
BacktoCudaDev(devID, devIDBackup);
}
#endif // USE_CUDA
} // namespace nts(NiuTrans.Tensor)
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-08-01
*/
#ifndef __SUB_CUH__
#define __SUB_CUH__
#include "Sub.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
#ifdef USE_CUDA
/* subtraction of data arrays (CUDA Kernel) */
__global__
void KernelSUB(DTYPE * a, DTYPE * b, DTYPE * c, int size, DTYPE beta = (DTYPE)1.0);
/* tensor subtraction c = a - b * \beta (cuda version) */
void _CudaSub(const XTensor * a, const XTensor * b, XTensor * c = NULL, DTYPE beta = (DTYPE)1.0);
/* tensor subtraction c = a - b * \beta (cuda version) with an input handle */
void _CudaSubWithHandle(int devID, cublasHandle_t * handle, DTYPE * a, DTYPE * b, DTYPE * c, int size, DTYPE beta = (DTYPE)1.0);
#endif // USE_CUDA
} // namespace nts(NiuTrans.Tensor)
#endif // __SUB_CUH__
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
 * All rights reserved.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/*
 * $Created by: Xu Chen (email: hello_master1954@163.com) 2018-08-01
 * Today is the first day of August. It's still very hot.
 */

#ifndef __SUB_H__
#define __SUB_H__

#include "../../XTensor.h"

namespace nts { // namespace nts(NiuTrans.Tensor)

/* tensor subtraction c = a - b * \beta */
void _Sub(const XTensor * a, const XTensor * b, XTensor * c, DTYPE beta = (DTYPE)1.0);

/*
tensor subtraction a = a - b * \beta
keep the result in the input tensor a and return nothing
*/
void _SubMe(XTensor * a, const XTensor * b, DTYPE beta = (DTYPE)1.0);

/*
tensor subtraction c = a - b * \beta
make a new tensor c to keep the result and return it
*/
XTensor Sub(const XTensor &a, const XTensor &b, DTYPE beta = (DTYPE)1.0);

} // namespace nts(NiuTrans.Tensor)

#endif // __SUB_H__
@@ -24,6 +24,7 @@
#include "../../XUtility.h"
#include "Sum.h"
#include "Sum.cuh"
#include "SumDim.h"

namespace nts { // namespace nts(NiuTrans.Tensor)
@@ -67,7 +68,7 @@ void _Sum(const XTensor * a, const XTensor * b, XTensor * c, DTYPE beta)
    }
    else {
        if (!a->isSparse && !b->isSparse) {
            CheckNTErrors(!c->isSparse, "Illegal use of sparse tensor in addition!");

            if (a->dataType == DEFAULT_DTYPE &&
                b->dataType == DEFAULT_DTYPE &&
@@ -123,6 +124,33 @@ void _SumMe(XTensor * a, const XTensor * b, DTYPE beta)
{
    _Sum(a, b, a, beta);
}
/*
return the dimension index if the summation can be performed as SumDim (see SumDim.h for more details)
>> a - a tensor
>> b - another tensor for sum
*/
int GetSumDimIndex(const XTensor &a, const XTensor &b)
{
if(a.order < b.order)
return -1;
int hitCount = 0;
int hitDim = -1;
for(int i = 0; i < b.order; i++){
if(b.dimSize[b.order - 1 - i] == 1)
continue;
else if(b.dimSize[b.order - 1 - i] == a.dimSize[a.order - 1 - i]){
hitCount++;
hitDim = a.order - b.order + i;
}
}
if(hitCount == 1)
return hitDim;
else
return -1;
}
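
/*
Worked example (illustrative, not part of the original file): for a with shape
(8, 4, 5) and b with shape (5), the loop above compares trailing dimensions,
finds exactly one match (size 5), and returns hitDim = a.order - b.order + i
= 3 - 1 + 0 = 2, so Sum() below dispatches to _SumDim(&a, &b, &c, 2, beta).
If b matched more than one dimension (say b had shape (4, 5)), hitCount would
be 2 and -1 would be returned, so Sum() would not use the broadcasting path.
*/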
/*
tensor summation c = a + b * \beta (return a XTensor structure)
@@ -137,13 +165,29 @@ XTensor Sum(const XTensor &a, const XTensor &b, DTYPE beta)
{
    XTensor c(&a);
    c.SetTMP();

    int n = GetSumDimIndex(a, b);

    if(n == -1){
        /* call _Sum function */
        _Sum(&a, &b, &c, beta);

        /* tensor connections */
        XLink::MakeLink(&a, &b, &c, MATH_SUM);
        XLink::AddParamToHead(&c, beta);
    }
    else if(n >= 0 && n < a.order){
        /* call _SumDim function */
        _SumDim(&a, &b, &c, n, beta);

        /* tensor connections */
        XLink::MakeLink(&a, &b, &c, MATH_SUMDIM);
        XLink::AddParamToHeadInt(&c, n);
        XLink::AddParamToHead(&c, beta);
    }
    else{
        ShowNTErrors("Something is wrong!");
    }

    return c;
}
...
@@ -20,6 +20,7 @@
 */

#include "../../XDevice.h"
#include "../../XUtility.h"
#include "Sum.cuh"

namespace nts { // namespace nts(NiuTrans.Tensor)
...
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (email: xiaotong@mail.neu.edu.cn) 2018-07-29
*/
#include "Sum.h"
#include "SumDim.h"
#include "SumDim.cuh"
#include "../../XName.h"
#include "../movement/CopyValues.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
/*
tensor summation
c = a + b * \beta
where the size of b is equal to the n-th dimension of a,
i.e., a is summed with b by broadcasting
>> a - a tensor
>> b - another tensor whose size is equal to that of dimension n of a
>> c - where we put a+b*\beta. we save it in a if c is NULL
>> n - the dimension index
>> beta - the scaling factor
*/
void _SumDim(const XTensor * a, const XTensor * b, XTensor * c, int n, DTYPE beta)
{
CheckNTErrors(a && b && c, "Empty tensor input!");
CheckNTErrors(a->unitNum == c->unitNum, "Unmatched tensors in addition!");
CheckNTErrors(a->dataType == b->dataType && a->dataType == c->dataType,
"Unmatched data types in addition!");
CheckNTErrors(a->order == c->order, "The input tensors do not have the same order in addition!");
CheckNTErrors(!a->isSparse && !b->isSparse && !c->isSparse, "Dense tensors are required!");
CheckNTErrors(a->dimSize[n] == b->unitNum, "Wrong tensor size!");
if(beta == 0){
_CopyValues(a, c);
return;
}
if(XTensor::IsSameShaped(a, b)){
_Sum(a, b, c, beta);
return;
}
if(a->devID >= 0 || b->devID >= 0 || c->devID >= 0){
#ifdef USE_CUDA
_CudaSumDim(a, b, c, n, beta);
#else
ShowNTErrors("Please specify USE_CUDA and recompile the code!");
#endif
}
else{
int stride = 1;
int blockSize = a->dimSize[n];
int blockNum = 1;
for(int i = a->order - 1; i >= 0; i--){
if(i > n)
stride *= a->dimSize[i];
else if(i < n)
blockNum *= a->dimSize[i];
}
if (a->dataType == DEFAULT_DTYPE){
int num = a->unitNum;
if(stride > 1){
for(int i = 0, j = 0; i < num; i += stride, j++){
DTYPE * ap = (DTYPE*)a->data + i;
DTYPE bv = *((DTYPE*)b->data + j % blockSize) * beta;
DTYPE * cp = (DTYPE*)c->data + i;
for(int k = 0; k < stride; k++)
cp[k] = ap[k] + bv;
}
}
else if(stride == 1){
DTYPE * bp = (DTYPE*)b->data;
for(int i = 0; i < num; i += blockSize){
DTYPE * ap = (DTYPE*)a->data + i;
DTYPE * cp = (DTYPE*)c->data + i;
if(beta == 1.0F){
for(int j = 0; j < blockSize; j++)
cp[j] = ap[j] + bp[j];
}
else{
for(int j = 0; j < blockSize; j++)
cp[j] = ap[j] + bp[j] * beta;
}
}
}
else{
ShowNTErrors("Something is wrong!");
}
}
else {
ShowNTErrors("TODO!");
}
}
}
/*
tensor summation (do it on site)
keep the result in the input tensor and return nothing
a = a + b * \beta
where the size of b is equal to the n-th dimension of a,
i.e., a is summed with b by broadcasting
>> a - a tensor
>> b - another tensor whose size is equal to that of dimension n of a
>> n - the dimension index
>> beta - the scaling factor
*/
void _SumDim(XTensor * a, const XTensor * b, int n, DTYPE beta)
{
_SumDim(a, b, a, n, beta);
}
/*
tensor summation (return a XTensor structure and make tensor connections)
make a new tensor to keep the result and return it
c = a + b * \beta
where the size of b is equal to the n-th dimension of a,
i.e., a is summed with b by broadcasting
>> a - a tensor
>> b - another tensor whose size is equal to that of dimension n of a
>> n - the dimension index
>> beta - the scaling factor
<< return - the result tensor by tensor summation
*/
XTensor SumDim(const XTensor &a, const XTensor &b, int n, DTYPE beta)
{
XTensor c(&a);
c.SetTMP();
    /* call _SumDim function */
_SumDim(&a, &b, &c, n, beta);
/* tensor connections */
XLink::MakeLink(&a, &b, &c, MATH_SUMDIM);
XLink::AddParamToHeadInt(&c, n);
XLink::AddParamToHead(&c, beta);
return c;
}
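
/*
Illustrative usage sketch (not part of the original file): adding a bias
vector to every row of a 2D tensor via the broadcasting summation above.
NewTensor(order, dimSize, dataType, denseRatio, devID) and _SetDataFixedFloat
are assumed to be visible here as declared elsewhere in the library.
*/
void SumDimUsageSketch()
{
    int xDims[2] = {64, 512};
    int bDims[1] = {512};
    XTensor * x = NewTensor(2, xDims, X_FLOAT, 1.0F, -1);
    XTensor * bias = NewTensor(1, bDims, X_FLOAT, 1.0F, -1);

    _SetDataFixedFloat(x, 1.0F);
    _SetDataFixedFloat(bias, 0.5F);

    /* y[i][j] = x[i][j] + bias[j]: bias is broadcast along dimension 1 of x */
    XTensor y = SumDim(*x, *bias, 1);

    /* in-place variant: x = x + bias * 0.1 along dimension 1 */
    _SumDim(x, bias, 1, 0.1F);

    delete x;
    delete bias;
}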
} // namespace nts(NiuTrans.Tensor)
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (email: xiaotong@mail.neu.edu.cn) 2018-07-29
*/
#include "SumDim.cuh"
#include "../../XDevice.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
#ifdef USE_CUDA
/*
tensor summation of a tensor and a row vector
c = a + b * \beta
where a is a tensor and b is a row vector
>> a - pointer to the data array of a
>> b - pointer to the data array of b
>> c - pointer to the data array of c
>> rowNum - number of rows of a and c
>> colNum - number of columns of a and c (i.e., the size of b)
>> beta - the scaling factor
*/
template <class T, bool betaFired>
__global__
void KernelAddWithRow(T * a, T * b, T * c, int rowNum, int colNum, T beta)
{
__shared__ T bv[MAX_CUDA_THREAD_NUM_PER_BLOCK];
int col = blockDim.x * blockIdx.x + threadIdx.x;
int row = blockDim.y * blockIdx.y + threadIdx.y;
if(col >= colNum || row >= rowNum)
return;
if(threadIdx.y == 0)
bv[threadIdx.x] = b[col];
__syncthreads();
int offset = colNum * row + col;
if(betaFired)
c[offset] = a[offset] + bv[threadIdx.x] * beta;
else
c[offset] = a[offset] + bv[threadIdx.x];
}
/*
tensor summation of a tensor and a column vector
c = a + b * \beta
where a is a tensor and b is a column vector
>> a - pointer to the data array of a
>> b - pointer to the data array of b
>> c - pointer to the data array of c
>> rowNum - number of rows of a and c (i.e., the size of b)
>> colNum - number of columns of a and c
>> blockSize - size of a block (matrix), i.e., rowNum * colNum
>> blockNum - number of matrices
>> beta - the scaling factor
*/
template <class T, bool betaFired>
__global__
void KernelAddWithCol(T * a, T * b, T * c, int rowNum, int colNum, int blockSize, int blockNum, T beta)
{
__shared__ T bv[MAX_CUDA_THREAD_NUM_PER_BLOCK];
int colIndex = blockDim.x * blockIdx.x + threadIdx.x;
int row = blockDim.y * blockIdx.y + threadIdx.y;
int col = colIndex % colNum;
int block = colIndex / colNum;
if(row >= rowNum || block >= blockNum)
return;
if(threadIdx.x == 0)
bv[threadIdx.y] = b[row];
__syncthreads();
int offset = block * blockSize + row * colNum + col;
if(betaFired)
c[offset] = a[offset] + bv[threadIdx.y] * beta;
else
c[offset] = a[offset] + bv[threadIdx.y];
}
/*
tensor summation (cuda version)
c = a + b * \beta
where the size of b is equal to the n-th dimension of a,
i.e., a is summed with b by broadcasting
>> a - a tensor
>> b - another tensor whose size is equal to that of dimension n of a
>> c - where we put a+b*\beta. we save it in a if c is NULL
>> n - the dimension index
>> beta - the scaling factor
*/
void _CudaSumDim(const XTensor * a, const XTensor * b, XTensor * c, int n, DTYPE beta)
{
CheckNTErrors(a && b && c, "Empty tensor input!");
CheckNTErrors(a->unitNum == c->unitNum, "Unmatched tensors in addition!");
CheckNTErrors(a->dataType == b->dataType && a->dataType == c->dataType,
"Unmatched data types in addition!");
CheckNTErrors(a->order == c->order, "The input tensors do not have the same order in addition!");
CheckNTErrors(!a->isSparse && !b->isSparse && !c->isSparse, "Dense tensors are required!");
CheckNTErrors(a->dimSize[n] == b->unitNum, "Wrong tensor size!");
int stride = 1;
int blockSize = a->dimSize[n];
int blockNum = 1;
for(int i = a->order - 1; i >= 0; i--){
if(i > n)
stride *= a->dimSize[i];
else if(i < n)
blockNum *= a->dimSize[i];
}
int cudaGrids[3];
int cudaBlocks[3];
int devIDBackup = 0;
ProtectCudaDev(a->devID, devIDBackup);
if (a->dataType == DEFAULT_DTYPE){
if(stride > 1){
GDevs.GetCudaThread2D(a->devID, stride * blockNum, blockSize, MAX_INT, cudaGrids, cudaBlocks);
if(beta == (DTYPE)1.0F)
KernelAddWithCol<DTYPE, false> <<<dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1])>>>
((DTYPE*)a->data, (DTYPE*)b->data, (DTYPE*)c->data,
blockSize, stride, blockSize * stride, blockNum, beta);
else
KernelAddWithCol<DTYPE, true> <<<dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1])>>>
((DTYPE*)a->data, (DTYPE*)b->data, (DTYPE*)c->data,
blockSize, stride, blockSize * stride, blockNum, beta);
}
else if(stride == 1){
GDevs.GetCudaThread2D(a->devID, blockSize, blockNum, MAX_INT, cudaGrids, cudaBlocks);
if(beta == (DTYPE)1.0F)
KernelAddWithRow<DTYPE, false> <<<dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1])>>>
((DTYPE*)a->data, (DTYPE*)b->data, (DTYPE*)c->data,
blockNum, blockSize, beta);
else
KernelAddWithRow<DTYPE, true> <<<dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1])>>>
((DTYPE*)a->data, (DTYPE*)b->data, (DTYPE*)c->data,
blockNum, blockSize, beta);
}
else{
ShowNTErrors("Something is wrong!");
}
}
else {
ShowNTErrors("TODO!");
}
BacktoCudaDev(a->devID, devIDBackup);
}
#endif
} // namespace nts(NiuTrans.Tensor)
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (email: xiaotong@mail.neu.edu.cn) 2018-07-29
*/
#ifndef __SUMDIM_CUH__
#define __SUMDIM_CUH__
#include "../../XTensor.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
#ifdef USE_CUDA
/* tensor summation c = a + b * \beta where the size of b is equal to the n-th dimension of a,
i.e., a is summed with b by broadcasting (cuda version) */
void _CudaSumDim(const XTensor * a, const XTensor * b, XTensor * c, int n, DTYPE beta = (DTYPE)1.0);
#endif
} // namespace nts(NiuTrans.Tensor)
#endif // __SUMDIM_CUH__
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2018, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (email: xiaotong@mail.neu.edu.cn) 2018-07-29
 * It reached 39 degrees centigrade around 3:00 pm in Shenyang
*/
#ifndef __SUMDIM_H__
#define __SUMDIM_H__
#include "../../XTensor.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
/* tensor summation c = a + b * \beta where the size of b is equal to the n-th dimension of a,
i.e., a is summed with b by broadcasting */
void _SumDim(const XTensor * a, const XTensor * b, XTensor * c, int n, DTYPE beta = (DTYPE)1.0);
/* tensor summation c = a + b * \beta where the size of b is equal to the n-th dimension of a,
i.e., a is summed with b by broadcasting. we keep the result in the input tensor a and return nothing */
void _SumDim(XTensor * a, const XTensor * b, int n, DTYPE beta = (DTYPE)1.0);
/* tensor summation c = a + b * \beta where the size of b is equal to the n-th dimension of a,
i.e., a is summed with b by broadcasting. We make a new tensor c to keep the result and return it */
XTensor SumDim(const XTensor &a, const XTensor &b, int n, DTYPE beta = (DTYPE)1.0);
} // namespace nts(NiuTrans.Tensor)
#endif // __SUMDIM_H__
@@ -20,6 +20,7 @@
 * $Created by: XIAO Tong (email: xiaotong@mail.neu.edu.cn) 2018-05-08
 */

#include <math.h>
#include "SetData.h"
#include "SetData.cuh"
#include "../../XUtility.h"
@@ -37,6 +38,43 @@
namespace nts{ // namespace nts(NiuTrans.Tensor)
/*
Fills the input Tensor or Variable with values according to the method described in
"Understanding the difficulty of training deep feedforward neural networks" - Glorot, X. & Bengio, Y. (2010),
using a uniform distribution. The resulting tensor will have values sampled from :math:`U(-a, a)`
where :math:`a = gain \times \sqrt{2 / (fan\_in + fan\_out)} \times \sqrt{3}`. Also known as Glorot initialisation.
>> tensor - the tensor whose data array would be initialized
>> gain - an optional scaling factor
*/
void _SetDataFanInOut(XTensor * tensor, DTYPE gain)
{
CheckNTErrors(tensor->dataType == X_FLOAT, "the tensor must be in X_FLOAT!");
CheckNTErrors(tensor->order >= 2, "the tensor dimension must be no less than 2!");
int fanIn = 1;
int fanOut = 1;
int order = tensor->order;
if (order == 2) {
fanIn = tensor->dimSize[1];
fanOut = tensor->dimSize[0];
}
else {
int numInputFmaps = tensor->dimSize[1];
int numOutputFmaps = tensor->dimSize[0];
int receptiveFieldSize = 0;
for (int i = 2; i < order; i++)
receptiveFieldSize += tensor->dimSize[i];
fanIn = numInputFmaps * receptiveFieldSize;
fanOut = numOutputFmaps * receptiveFieldSize;
}
DTYPE std = gain * sqrt(2.0/(fanIn + fanOut));
DTYPE a = sqrt(3.0) * std;
_SetDataRand(tensor, -a, a);
}
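
/*
Worked example (illustrative, not part of the original file): for a 2D weight
tensor of shape (512, 1024) with gain = 1, fanOut = 512 and fanIn = 1024, so
    std = sqrt(2 / (1024 + 512)) ~= 0.0361
    a   = sqrt(3) * std          ~= 0.0625
and every entry is drawn uniformly from [-0.0625, 0.0625].
*/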
/*
generate data items with a fixed value p
>> tensor - the tensor whose data array would be initialized
@@ -65,7 +103,7 @@ void _SetDataFixed(XTensor * tensor, void * valuePointer)
        }
        else{
#ifdef USE_CUDA
            _CudaSetDataFixedInt(tensor, p);
#endif
        }
    }
@@ -88,7 +126,7 @@ void _SetDataFixed(XTensor * tensor, void * valuePointer)
        }
        else{
#ifdef USE_CUDA
            _CudaSetDataFixedFloat(tensor, p);
#endif
        }
    }
@@ -111,7 +149,7 @@ void _SetDataFixed(XTensor * tensor, void * valuePointer)
        }
        else{
#ifdef USE_CUDA
            _CudaSetDataFixedDouble(tensor, p);
#endif
        }
    }
@@ -137,7 +175,7 @@ generate data items with a fixed value p (in integer)
*/
void _SetDataFixedInt(XTensor * tensor, int p)
{
    CheckNTErrors(tensor->dataType == X_INT, "the tensor must be in X_INT!");

    if(p == 0)
        tensor->SetZeroAll();
@@ -152,7 +190,7 @@ generate data items with a fixed value p (in float)
*/
void _SetDataFixedFloat(XTensor * tensor, float p)
{
    CheckNTErrors(tensor->dataType == X_FLOAT, "the tensor must be in X_FLOAT!");

    if(p == 0)
        tensor->SetZeroAll();
@@ -167,7 +205,7 @@ generate data items with a fixed value p (in double)
*/
void _SetDataFixedDouble(XTensor * tensor, double p)
{
    CheckNTErrors(tensor->dataType == X_DOUBLE, "the tensor must be in X_DOUBLE!");

    if(p == 0)
        tensor->SetZeroAll();
@@ -183,6 +221,8 @@ generate data items with a uniform distribution in [low,high]
*/
void _SetDataRand(XTensor * tensor, DTYPE low, DTYPE high)
{
    CheckNTErrors(high > low, "the high value must be greater than low value!");

    if(tensor == NULL)
        return;
@@ -215,10 +255,13 @@ void _SetDataRand(XTensor * tensor, DTYPE low, DTYPE high)
    TODO: generate data points on GPUs straightforwardly.
    */
    else{
#ifdef USE_CUDA
        _CudaSetDataRand(tensor, low, high);
#endif
        //XTensor * t2 = NewTensor(tensor->order, tensor->dimSize, tensor->dataType, tensor->denseRatio, -1);
        //_SetDataRand(t2, low, high);
        //_CopyValues(t2, tensor);
        //delete t2;
    }
}
...
@@ -21,7 +21,10 @@
 * I'm surprised that I did not write this file till today.
 */

#include <curand.h>
#include <time.h>
#include "SetData.cuh"
#include <curand_kernel.h>
#include "../../XDevice.h"

namespace nts { // namespace nts(NiuTrans.Tensor)
@@ -46,7 +49,7 @@ generate data items with a fixed value p (in int)
>> tensor - the tensor for initialization
>> p - the initial value
*/
void _CudaSetDataFixedInt(XTensor * tensor, int p)
{
    CheckNTErrors(tensor->dataType == X_INT, "the tensor must be in X_INT!");
@@ -86,7 +89,7 @@ generate data items with a fixed value p (in float)
>> tensor - the tensor for initialization
>> p - the initial value
*/
void _CudaSetDataFixedFloat(XTensor * tensor, float p)
{
    CheckNTErrors(tensor->dataType == X_FLOAT, "the tensor must be in X_FLOAT!");
@@ -126,7 +129,7 @@ generate data items with a fixed value p (in double)
>> tensor - the tensor for initialization
>> p - the initial value
*/
void _CudaSetDataFixedDouble(XTensor * tensor, double p)
{
    CheckNTErrors(tensor->dataType == X_DOUBLE, "the tensor must be in X_DOUBLE!");
@@ -146,4 +149,115 @@ void _CudaSetDataFixedDouble(XTensor * tensor, double p)
    BacktoCudaDev(tensor->devID, devIDBackup);
}
/*
call curand_init function on each kernel with the same random seed
and init the rng states
*/
__global__
void KernelInitializeCurand(curandState * state, unsigned long seed)
{
int i = blockDim.x * blockIdx.x + threadIdx.x;
curand_init(seed, i, 0, &state[i]);
}
/* draw a uniformly distributed random float from the curand state at position i */
__device__
float GenerateFloat(curandState* globalState, int i)
{
//copy state to local mem
curandState localState = globalState[i];
//apply uniform distribution with calculated random
float randNum = curand_uniform(&localState);
//update state
globalState[i] = localState;
//return value
return randNum;
}
/* draw a uniformly distributed random double from the curand state at position i */
__device__
double GenerateDouble(curandState* globalState, int i)
{
//copy state to local mem
curandState localState = globalState[i];
//apply uniform distribution with calculated random
double randNum = curand_uniform_double(&localState);
//update state
globalState[i] = localState;
//return value
return randNum;
}
/*
set data array with a uniform distribution in [low, high]
>> deviceStates - the state of curand
>> d - float datatype pointer to the data array
>> size - size of the array
>> low - low value of the range
>> variance - the width of the range, i.e., high - low
*/
__global__
void KernelSetDataRandFloat(curandState* deviceStates, float * d, int size, DTYPE low, DTYPE variance)
{
int i = blockDim.x * blockIdx.x + threadIdx.x;
if (i < size) {
float randNum = GenerateFloat(deviceStates, i);
d[i] = randNum * variance + low;
}
}
/*
set data array with a uniform distribution in [low, high]
>> deviceStates - the state of curand
>> d - double datatype pointer to the data array
>> size - size of the array
>> low - low value of the range
>> variance - the width of the range, i.e., high - low
*/
__global__
void KernelSetDataRandDouble(curandState* deviceStates, double * d, int size, DTYPE low, DTYPE variance)
{
int i = blockDim.x * blockIdx.x + threadIdx.x;
if (i < size){
double randNum = GenerateDouble(deviceStates, i);
d[i] = randNum * variance + low;
}
}
/*
generate data items with a uniform distribution in [low,high]
>> tensor - the tensor whose data array would be initialized
>> low - lower value of the range
>> high - higher value of the range
*/
void _CudaSetDataRand(XTensor * tensor, DTYPE low, DTYPE high)
{
CheckNTErrors(high > low, "the high value must be greater than low value!");
int gridSize[3];
int blockSize[3];
GDevs.GetCudaThread(tensor->devID, tensor->unitNum, gridSize, blockSize);
dim3 blocks(gridSize[0]);
dim3 threads(blockSize[0]);
int devIDBackup;
ProtectCudaDev(tensor->devID, devIDBackup);
    curandState * deviceStates;
    /* one curand state per launched thread (a single state would overflow in the init kernel) */
    cudaMalloc(&deviceStates, sizeof(curandState) * gridSize[0] * blockSize[0]);
    DTYPE variance = high - low;

    KernelInitializeCurand<<<blocks, threads>>>(deviceStates, unsigned(time(NULL)));

    if (tensor->dataType == X_FLOAT)
        KernelSetDataRandFloat <<<blocks, threads >>>(deviceStates, (float*)tensor->data, tensor->unitNum, low, variance);
    else if (tensor->dataType == X_DOUBLE)
        KernelSetDataRandDouble<<<blocks, threads >>>(deviceStates, (double*)tensor->data, tensor->unitNum, low, variance);

    cudaFree(deviceStates);

    BacktoCudaDev(tensor->devID, devIDBackup);
}
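
/*
Illustrative usage sketch (not part of the original file): filling a tensor
that lives on GPU 0 with values drawn uniformly from [-0.1, 0.1]. The generic
NewTensor(order, dimSize, dataType, denseRatio, devID) constructor is assumed
here; the host-side _SetDataRand front end dispatches to _CudaSetDataRand
above whenever the tensor sits on a GPU.
*/
void SetDataRandUsageSketch()
{
    int dims[2] = {128, 256};
    XTensor * w = NewTensor(2, dims, X_FLOAT, 1.0F, 0);   /* dense tensor on device 0 */
    _SetDataRand(w, -0.1F, 0.1F);                         /* runs the curand kernels above */
    delete w;
}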
} // namespace nts(NiuTrans.Tensor)
@@ -29,13 +29,16 @@
namespace nts { // namespace nts(NiuTrans.Tensor)

/* generate data items with a fixed value p (in int) */
void _CudaSetDataFixedInt(XTensor * tensor, int p);

/* generate data items with a fixed value p (in float) */
void _CudaSetDataFixedFloat(XTensor * tensor, float p);

/* generate data items with a fixed value p (in double) */
void _CudaSetDataFixedDouble(XTensor * tensor, double p);

/* generate data items with a uniform distribution in [low,high] */
void _CudaSetDataRand(XTensor * tensor, DTYPE low, DTYPE high);

} // namespace nts(NiuTrans.Tensor)
...
@@ -27,6 +27,9 @@
namespace nts { // namespace nts(NiuTrans.Tensor)

/* generate data items with a xavier initialization */
void _SetDataFanInOut(XTensor * tensor, DTYPE gain = 1.0F);

/* generate data items with a fixed value p */
void _SetDataFixed(XTensor * tensor, void * valuePointer);
...
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: LI Yinqiao (li.yin.qiao.2012@hotmail.com) 2018-7-11
*/
#include "../../XTensor.h"
#include "../../XName.h"
#include "Log.h"
#include "Log.cuh"
#include <math.h>
namespace nts { // namespace nts(NiuTrans.Tensor)
/*
set every entry to its log value (do it on site)
>> a - input tensor we are processing
>> b - output tensor we are processing
*/
void _Log(const XTensor * a, XTensor * b)
{
#ifdef USE_CUDA
/* run it on GPUs */
if (a->devID >= 0) {
_CudaLog(a, b);
return;
}
#endif
CheckNTErrors((XTensor::IsSameShaped(a, b)), "Input tensors should have the same type!");
CheckNTErrors((a->dataType == DEFAULT_DTYPE), "TODO!");
DTYPE * d = (DTYPE*)a->data;
DTYPE * db = (DTYPE*)b->data;
for (int i = 0; i < a->unitNum; i++)
db[i] = (DTYPE)log(d[i]);
}
/*
set every entry to its log value
keep the result in the input tensor a and return nothing
>> a - the tensor we are processing
*/
void _LogMe(XTensor * a)
{
_Log(a, a);
}
/*
set every entry to its log value (return a XTensor structure)
make a new tensor to keep the result and return it
>> a - input tensor we are processing
<< return - the log value of the input tensor
*/
XTensor Log(const XTensor & a)
{
XTensor b(&a);
b.SetTMP();
/* call _Log function */
_Log(&a, &b);
/* tensor connections */
XLink::MakeLink(&a, NULL, &b, MATH_LOG);
return b;
}
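
/*
Illustrative usage sketch (not part of the original file): the three Log entry
points follow the same pattern as the other unary operations. NewTensor and
_SetDataFixedFloat are assumed to be visible here; no clamping is done, so
non-positive entries follow the usual C math rules (log(0) = -inf, log of a
negative number = NaN).
*/
void LogUsageSketch()
{
    int dims[2] = {2, 2};
    XTensor * a = NewTensor(2, dims, X_FLOAT, 1.0F, -1);
    _SetDataFixedFloat(a, 2.718282F);   /* roughly e, so the logs come out near 1 */

    XTensor b = Log(*a);                /* new tensor holding log(a), MATH_LOG is recorded */
    _LogMe(a);                          /* in-place: a now holds the log of its old values */

    delete a;
}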
} // namespace nts(NiuTrans.Tensor)
\ No newline at end of file
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: LI Yinqiao (li.yin.qiao.2012@hotmail.com) 2018-7-11
*/
#include "../../XDevice.h"
#include "../../XTensor.h"
#include "Log.h"
#include "Log.cuh"
namespace nts { // namespace nts(NiuTrans.Tensor)
#ifdef USE_CUDA
/*
set each entry to its log value (CUDA Kernel)
>> a - pointer to input data array
>> b - pointer to output data array
>> size - size of the data array
*/
__global__
void KernelLog(DTYPE * a, DTYPE * b, int size)
{
int i = blockDim.x * blockIdx.x + threadIdx.x;
if (i < size)
b[i] = log(a[i]);
}
/*
set each entry to its log value (CUDA Kernel)
This is for float16 computation
>> a - pointer to input data array
>> b - pointer to output data array
>> size - size of the data array
*/
__global__
void KernelLog(__half * a, __half * b, int size)
{
return;
}
/*
set each entry to its log value
>> a - input tensor
>> b - output tensor
*/
void _CudaLog(const XTensor * a, XTensor * b)
{
CheckNTErrors((XTensor::IsSameShaped(a, b)), "Input tensors should have the same type!");
CheckNTErrors((a->isSparse == false), "TODO!");
int gridSize[3];
int blockSize[3];
GDevs.GetCudaThread(a->devID, a->unitNum, gridSize, blockSize);
dim3 blocks(gridSize[0]);
dim3 threads(blockSize[0]);
int devIDBackup;
ProtectCudaDev(a->devID, devIDBackup);
if (a->dataType == DEFAULT_DTYPE) {
KernelLog << <blocks, threads >> >((DTYPE*)a->data, (DTYPE*)b->data, a->unitNum);
}
else if (a->dataType == X_FLOAT16) {
KernelLog << <blocks, threads >> >((__half*)a->data, (__half*)b->data, a->unitNum);
}
else {
ShowNTErrors("TODO!");
}
BacktoCudaDev(a->devID, devIDBackup);
}
#endif // USE_CUDA
} // namespace nts(NiuTrans.Tensor)
#include <math.h>
#include "../../XName.h"
#include "Unary.h"
#include "Unary.cuh"
namespace nts{
#ifdef USE_CUDA
/* define three macros separately, specifying the respective function names */
#define _SIMPLE_UNARY_FUNCTION(_funcName, _cudaFuncName, origFunc) \
void _funcName(const XTensor * a, XTensor * b) \
{ \
/* run it on GPUs */ \
if (a->devID >= 0) { \
_cudaFuncName(a, b); \
return; \
} \
CheckNTErrors((XTensor::IsSameShaped(a, b)), \
"Input tensors should have the same type!"); \
CheckNTErrors((a->dataType == DEFAULT_DTYPE), "TODO!"); \
DTYPE * d = (DTYPE*)a->data; \
DTYPE * db = (DTYPE*)b->data; \
for (int i = 0; i < a->unitNum; i++) \
db[i] = (DTYPE)origFunc(d[i]); \
}
#define _SIMPLE_UNARY_FUNCTION_ME(_funcNameMe, _funcName) \
void _funcNameMe(XTensor * a) \
{ \
_funcName(a, a); \
}
#define SIMPLE_UNARY_FUNCTION(funcName, _funcName, operationId) \
XTensor funcName(const XTensor &a) \
{ \
XTensor b(&a); \
b.SetTMP(); \
_funcName(&a, &b); \
XLink::MakeLink(&a, NULL, &b, operationId); \
return b; \
}
_SIMPLE_UNARY_FUNCTION(_Absolute, _CudaAbsolute, fabs)
_SIMPLE_UNARY_FUNCTION_ME(_AbsoluteMe, _Absolute)
SIMPLE_UNARY_FUNCTION(Absolute, _Absolute, MATH_ABSOLUTE)
_SIMPLE_UNARY_FUNCTION(_Exp, _CudaExp, exp)
_SIMPLE_UNARY_FUNCTION_ME(_ExpMe, _Exp)
SIMPLE_UNARY_FUNCTION(Exp, _Exp, MATH_EXP)
_SIMPLE_UNARY_FUNCTION(_Log, _CudaLog, log)
_SIMPLE_UNARY_FUNCTION_ME(_LogMe, _Log)
SIMPLE_UNARY_FUNCTION(Log, _Log, MATH_LOG)
_SIMPLE_UNARY_FUNCTION(_Sin, _CudaSin, sin)
_SIMPLE_UNARY_FUNCTION_ME(_SinMe, _Sin)
SIMPLE_UNARY_FUNCTION(Sin, _Sin, MATH_SIN)
_SIMPLE_UNARY_FUNCTION(_Cos, _CudaCos, cos)
_SIMPLE_UNARY_FUNCTION_ME(_CosMe, _Cos)
SIMPLE_UNARY_FUNCTION(Cos, _Cos, MATH_COS)
_SIMPLE_UNARY_FUNCTION(_Tan, _CudaTan, tan)
_SIMPLE_UNARY_FUNCTION_ME(_TanMe, _Tan)
SIMPLE_UNARY_FUNCTION(Tan, _Tan, MATH_TAN)
#else
/* define three macros separately, specifying the respective function names */
#define _SIMPLE_UNARY_FUNCTION(_funcName, origFunc) \
void _funcName(const XTensor * a, XTensor * b) \
{ \
CheckNTErrors((XTensor::IsSameShaped(a, b)), \
"Input tensors should have the same type!"); \
CheckNTErrors((a->dataType == DEFAULT_DTYPE), "TODO!"); \
DTYPE * d = (DTYPE*)a->data; \
DTYPE * db = (DTYPE*)b->data; \
for (int i = 0; i < a->unitNum; i++) \
db[i] = (DTYPE)origFunc(d[i]); \
}
#define _SIMPLE_UNARY_FUNCTION_ME(_funcNameMe, _funcName) \
void _funcNameMe(XTensor * a) \
{ \
_funcName(a, a); \
}
#define SIMPLE_UNARY_FUNCTION(funcName, _funcName, operationId) \
XTensor funcName(const XTensor &a) \
{ \
XTensor b(&a); \
b.SetTMP(); \
_funcName(&a, &b); \
XLink::MakeLink(&a, NULL, &b, operationId); \
return b; \
}
_SIMPLE_UNARY_FUNCTION(_Absolute, fabs)
_SIMPLE_UNARY_FUNCTION_ME(_AbsoluteMe, _Absolute)
SIMPLE_UNARY_FUNCTION(Absolute, _Absolute, MATH_ABSOLUTE)
_SIMPLE_UNARY_FUNCTION(_Exp, exp)
_SIMPLE_UNARY_FUNCTION_ME(_ExpMe, _Exp)
SIMPLE_UNARY_FUNCTION(Exp, _Exp, MATH_EXP)
_SIMPLE_UNARY_FUNCTION(_Log, log)
_SIMPLE_UNARY_FUNCTION_ME(_LogMe, _Log)
SIMPLE_UNARY_FUNCTION(Log, _Log, MATH_LOG)
_SIMPLE_UNARY_FUNCTION(_Sin, sin)
_SIMPLE_UNARY_FUNCTION_ME(_SinMe, _Sin)
SIMPLE_UNARY_FUNCTION(Sin, _Sin, MATH_SIN)
_SIMPLE_UNARY_FUNCTION(_Cos, cos)
_SIMPLE_UNARY_FUNCTION_ME(_CosMe, _Cos)
SIMPLE_UNARY_FUNCTION(Cos, _Cos, MATH_COS)
_SIMPLE_UNARY_FUNCTION(_Tan, tan)
_SIMPLE_UNARY_FUNCTION_ME(_TanMe, _Tan)
SIMPLE_UNARY_FUNCTION(Tan, _Tan, MATH_TAN)
#endif
}
\ No newline at end of file
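
/*
Illustrative usage sketch (not part of the original file): every name passed to
the macros above expands to the same three-function pattern, so Absolute, Exp,
Log, Sin, Cos and Tan are all called the same way. The sketch assumes the nts
namespace, NewTensor and _SetDataFixedFloat are visible at this point.
*/
using namespace nts;

void UnaryUsageSketch()
{
    int dims[1] = {8};
    XTensor * a = NewTensor(1, dims, X_FLOAT, 1.0F, -1);
    XTensor * b = NewTensor(1, dims, X_FLOAT, 1.0F, -1);
    _SetDataFixedFloat(a, 0.5F);

    XTensor s = Sin(*a);   /* returns a new tensor and records MATH_SIN in the network */
    _CosMe(a);             /* overwrites a with cos(a) */
    _Exp(a, b);            /* writes exp(a) into a pre-allocated output tensor */

    delete a;
    delete b;
}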
#include <math.h>
#include "../../XDevice.h"
#include "../../XName.h"
#include "Unary.cuh"
namespace nts {
#define SIMPLE_UNARY_FUNCTION_GPU(funcName, origFunc) \
__global__ \
void Kernel##funcName(DTYPE * a, DTYPE * b, int size) \
{ \
int i = blockDim.x * blockIdx.x + threadIdx.x; \
\
if (i < size) \
b[i] = (DTYPE)origFunc(a[i]); \
} \
__global__ \
void Kernel##funcName(__half * a, __half * b, int size) \
{ \
return; \
} \
void _Cuda##funcName(const XTensor * a, XTensor * b) \
{ \
CheckNTErrors((XTensor::IsSameShaped(a, b)), \
"Input tensors should have the same type!"); \
CheckNTErrors((a->isSparse == false), "TODO!"); \
\
int gridSize[3]; \
int blockSize[3]; \
\
GDevs.GetCudaThread(a->devID, a->unitNum, gridSize, blockSize); \
\
dim3 blocks(gridSize[0]); \
dim3 threads(blockSize[0]); \
\
int devIDBackup; \
ProtectCudaDev(a->devID, devIDBackup); \
\
if (a->dataType == DEFAULT_DTYPE) { \
Kernel##funcName << <blocks, threads >> > \
((DTYPE*)a->data, (DTYPE*)b->data, a->unitNum); \
} \
else if (a->dataType == X_FLOAT16) { \
Kernel##funcName << <blocks, threads >> > \
((__half*)a->data, (__half*)b->data, a->unitNum); \
} \
else { \
ShowNTErrors("TODO!"); \
} \
\
BacktoCudaDev(a->devID, devIDBackup); \
} \
SIMPLE_UNARY_FUNCTION_GPU(Absolute, fabs)
SIMPLE_UNARY_FUNCTION_GPU(Exp, exp)
SIMPLE_UNARY_FUNCTION_GPU(Log, log)
SIMPLE_UNARY_FUNCTION_GPU(Sin, sin)
SIMPLE_UNARY_FUNCTION_GPU(Cos, cos)
SIMPLE_UNARY_FUNCTION_GPU(Tan, tan)
}
\ No newline at end of file
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-31
*/
#ifndef __UNARY_CUH__
#define __UNARY_CUH__
#include "../../XTensor.h"
#include "Unary.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
#ifdef USE_CUDA
/* set each entry to its absolute value (CUDA Kernel) */
__global__
void KernelAbsolute(DTYPE * a, DTYPE * b, int size);
/* set each entry to its absolute value (CUDA Kernel) with float16 data type*/
__global__
void KernelAbsolute(__half * a, __half * b, int size);
/* set each entry to its absolute value */
void _CudaAbsolute(const XTensor * a, XTensor * b);
/* set each entry to its exponent value (CUDA Kernel) */
__global__
void KernelExp(DTYPE * a, DTYPE * b, int size);
/* set each entry to its exponent value (CUDA Kernel) with float16 data type*/
__global__
void KernelExp(__half * a, __half * b, int size);
/* set each entry to its exponent value */
void _CudaExp(const XTensor * a, XTensor * b);
/* set each entry to its logarithm value (CUDA Kernel) */
__global__
void KernelLog(DTYPE * a, DTYPE * b, int size);
/* set each entry to its logarithm value (CUDA Kernel) with float16 data type*/
__global__
void KernelLog(__half * a, __half * b, int size);
/* set each entry to its logarithm value */
void _CudaLog(const XTensor * a, XTensor * b);
/* set each entry to its sine value (CUDA Kernel) */
__global__
void KernelSin(DTYPE * a, DTYPE * b, int size);
/* set each entry to its sine value (CUDA Kernel) with float16 data type*/
__global__
void KernelSin(__half * a, __half * b, int size);
/* set each entry to its sine value */
void _CudaSin(const XTensor * a, XTensor * b);
/* set each entry to its cosine value (CUDA Kernel) */
__global__
void KernelCos(DTYPE * a, DTYPE * b, int size);
/* set each entry to its cosine value (CUDA Kernel) with float16 data type*/
__global__
void KernelCos(__half * a, __half * b, int size);
/* set each entry to its cosine value */
void _CudaCos(const XTensor * a, XTensor * b);
/* set each entry to its tangent value (CUDA Kernel) */
__global__
void KernelTan(DTYPE * a, DTYPE * b, int size);
/* set each entry to its tangent value (CUDA Kernel) with float16 data type*/
__global__
void KernelTan(__half * a, __half * b, int size);
/* set each entry to its tangent value */
void _CudaTan(const XTensor * a, XTensor * b);
#endif // USE_CUDA
} // namespace nts(NiuTrans.Tensor)
#endif // __UNARY_CUH__
\ No newline at end of file
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-31
*/
#ifndef __UNARY_H__
#define __UNARY_H__
#include "../../XTensor.h"
namespace nts{
/* set every entry to its absolute value */
void _Absolute(const XTensor * a, XTensor * b);
/*
set every entry to its absolute value (do it on site)
keep the result in the input tensor a and return nothing
*/
void _AbsoluteMe(XTensor * a);
/*
set every entry to its absolute value (return a XTensor structure)
make a new tensor to keep the result and return it
*/
XTensor Absolute(const XTensor & a);
/* set every entry to its exponent value */
void _Exp(const XTensor * a, XTensor * b);
/*
set every entry to its exponent value (do it on site)
keep the result in the input tensor a and return nothing
*/
void _ExpMe(XTensor * a);
/*
set every entry to its exponent value (return a XTensor structure)
make a new tensor to keep the result and return it
*/
XTensor Exp(const XTensor & a);
/* set every entry to its logarithm value */
void _Log(const XTensor * a, XTensor * b);
/*
set every entry to its logarithm value (do it on site)
keep the result in the input tensor a and return nothing
*/
void _LogMe(XTensor * a);
/*
set every entry to its logarithm value (return a XTensor structure)
make a new tensor to keep the result and return it
*/
XTensor Log(const XTensor & a);
/* set every entry to its sine value */
void _Sin(const XTensor * a, XTensor * b);
/*
set every entry to its sine value (do it on site)
keep the result in the input tensor a and return nothing
*/
void _SinMe(XTensor * a);
/*
set every entry to its sine value (return a XTensor structure)
make a new tensor to keep the result and return it
*/
XTensor Sin(const XTensor & a);
/* set every entry to its cosine value */
void _Cos(const XTensor * a, XTensor * b);
/*
set every entry to its cosine value (do it on site)
keep the result in the input tensor a and return nothing
*/
void _CosMe(XTensor * a);
/*
set every entry to its cosine value (return a XTensor structure)
make a new tensor to keep the result and return it
*/
XTensor Cos(const XTensor & a);
/* set every entry to its tangent value */
void _Tan(const XTensor * a, XTensor * b);
/*
set every entry to its tangent value (do it on site)
keep the result in the input tensor a and return nothing
*/
void _TanMe(XTensor * a);
/*
set every entry to its tangent value (return a XTensor structure)
make a new tensor to keep the result and return it
*/
XTensor Tan(const XTensor & a);
}
#endif //end __UNARY_H__
\ No newline at end of file
...@@ -35,24 +35,33 @@ copy a number of blocks to target positions ...@@ -35,24 +35,33 @@ copy a number of blocks to target positions
>> target - target data array >> target - target data array
>> targetBlocks - target positions of the copy >> targetBlocks - target positions of the copy
>> myMem - the memory pool >> myMem - the memory pool
>> devID - device id
*/ */
void _CopyBlocks(void * source, int blockSize, int blockNum, void * target, int * targetBlocks, XMem * myMem) void _CopyBlocks(void * source, int blockSize, int blockNum, void * target, int * targetBlocks, XMem * myMem, int devID)
{ {
if (myMem != NULL && myMem->devID >= 0) { if (myMem != NULL)
devID = myMem->devID;
if (devID >= 0) {
#ifdef USE_CUDA #ifdef USE_CUDA
/* copy the index from host to device */ /* copy the index from host to device */
int * targetBlocksTMP = (int*)myMem->AllocBuf(myMem->devID, blockNum * sizeof(int)); int * targetBlocksTMP = myMem != NULL ?
(int*)myMem->AllocBuf(myMem->devID, blockNum * sizeof(int)):
(int*)XMemAlloc(devID, blockNum * sizeof(int));
XMemCopy(targetBlocksTMP, myMem->devID, targetBlocks, -1, blockNum * sizeof(int)); XMemCopy(targetBlocksTMP, myMem->devID, targetBlocks, -1, blockNum * sizeof(int));
_CopyBlocksOnSite(source, blockSize, blockNum, target, targetBlocksTMP, myMem); _CopyBlocksOnSite(source, blockSize, blockNum, target, targetBlocksTMP, devID);
myMem->ReleaseBuf(myMem->devID, blockNum * sizeof(int)); if(myMem != NULL)
myMem->ReleaseBuf(myMem->devID, blockNum * sizeof(int));
else
XMemFree(devID, targetBlocksTMP);
#else #else
ShowNTErrors("Please specify USE_CUDA and recompile the code!"); ShowNTErrors("Please specify USE_CUDA and recompile the code!");
#endif #endif
} }
else { else {
_CopyBlocksOnSite(source, blockSize, blockNum, target, targetBlocks, myMem); _CopyBlocksOnSite(source, blockSize, blockNum, target, targetBlocks, devID);
} }
} }
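/*
Hedged sketch of the device-resolution rule used by the reworked _CopyBlocks above:
a memory pool, when given, decides where the copy runs; otherwise the explicit devID
argument is used, with a negative value meaning host memory. ResolveCopyDevice is a
hypothetical helper that only spells the rule out.
*/
int ResolveCopyDevice(XMem * myMem, int devID)
{
    if (myMem != NULL)
        devID = myMem->devID;   /* the pool's location overrides the argument */
    return devID;               /* >= 0: GPU id, < 0: CPU */
}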
...@@ -65,11 +74,12 @@ copy a number of blocks source source positions to target positions ...@@ -65,11 +74,12 @@ copy a number of blocks source source positions to target positions
>> target - target data array >> target - target data array
>> targetBlocks - target positions of the copy >> targetBlocks - target positions of the copy
>> myMem - the memory pool >> myMem - the memory pool
>> devID - device id
*/ */
void _CopyBlocks(void * source, int blockSize, int * sourceBlocks, int blockNum, void * target, int * targetBlocks, XMem * myMem, int devID) void _CopyBlocks(void * source, int blockSize, int * sourceBlocks, int blockNum, void * target, int * targetBlocks, XMem * myMem, int devID)
{ {
if (myMem != NULL) if (myMem != NULL)
CheckNTErrors((myMem->devID == devID), "DevIDs are different between memory pool and input devID!"); devID = myMem->devID;
if (devID >= 0) { if (devID >= 0) {
#ifdef USE_CUDA #ifdef USE_CUDA
......
...@@ -27,7 +27,7 @@ ...@@ -27,7 +27,7 @@
namespace nts { // namespace nts(NiuTrans.Tensor) namespace nts { // namespace nts(NiuTrans.Tensor)
/* copy a number of blocks to target positions */ /* copy a number of blocks to target positions */
void _CopyBlocks(void * source, int blockSize, int blockNum, void * target, int * targetBlocks, XMem * myMem); void _CopyBlocks(void * source, int blockSize, int blockNum, void * target, int * targetBlocks, XMem * myMem, int devID);
/* copy a number of blocks from source positions to target positions */ /* copy a number of blocks from source positions to target positions */
void _CopyBlocks(void * source, int blockSize, int * sourceBlocks, int blockNum, void * target, int * targetBlocks, XMem * myMem, int devID); void _CopyBlocks(void * source, int blockSize, int * sourceBlocks, int blockNum, void * target, int * targetBlocks, XMem * myMem, int devID);
......
...@@ -223,8 +223,11 @@ void _CudaCopyBlocksInGrid(void * source, int blockSize, int blockNum, int gridN ...@@ -223,8 +223,11 @@ void _CudaCopyBlocksInGrid(void * source, int blockSize, int blockNum, int gridN
int cudaGrids[3]; int cudaGrids[3];
int cudaBlocks[3]; int cudaBlocks[3];
int threadNum = MIN(MAX(blockSize, blockNum), MAX_CUDA_THREAD_NUM_PER_BLOCK); int threadNum = MIN(MAX(blockSize, blockNum), MAX_CUDA_THREAD_NUM_PER_BLOCK);
int devIDBackup;
ProtectCudaDev(myMem->devID, devIDBackup);
GDevs.GetCudaThread2D(myMem->devID, threadNum, gridNum * blockNum, INT_MAX, cudaGrids, cudaBlocks); GDevs.GetCudaThread2D(myMem->devID, threadNum, gridNum * blockNum, INT_MAX, cudaGrids, cudaBlocks);
cudaBlocks[1] = 1; cudaBlocks[1] = 1;
...@@ -237,39 +240,41 @@ void _CudaCopyBlocksInGrid(void * source, int blockSize, int blockNum, int gridN ...@@ -237,39 +240,41 @@ void _CudaCopyBlocksInGrid(void * source, int blockSize, int blockNum, int gridN
if (blockNum == 4) { if (blockNum == 4) {
if ((SHARED_MEMORY_SIZE / itemSize - 2 * MAX_CUDA_THREAD_NUM_PER_BLOCK) >= 2 * cudaBlocks[0] * blockNum) if ((SHARED_MEMORY_SIZE / itemSize - 2 * MAX_CUDA_THREAD_NUM_PER_BLOCK) >= 2 * cudaBlocks[0] * blockNum)
KernelCopyBlocksInGridFast<int, 4, 2> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> > KernelCopyBlocksInGridFast<int, 4, 2> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> >
((int*)source, blockSize, blockNum, gridNum, (int*)target, index); ((int*)source, blockSize, blockNum, gridNum, (int*)target, index);
else else
KernelCopyBlocksInGridFast<int, 4, 1> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> > KernelCopyBlocksInGridFast<int, 4, 1> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> >
((int*)source, blockSize, blockNum, gridNum, (int*)target, index); ((int*)source, blockSize, blockNum, gridNum, (int*)target, index);
} }
else if (blockNum == 6) { else if (blockNum == 6) {
if ((SHARED_MEMORY_SIZE / itemSize - 2 * MAX_CUDA_THREAD_NUM_PER_BLOCK) >= 2 * cudaBlocks[0] * blockNum) if ((SHARED_MEMORY_SIZE / itemSize - 2 * MAX_CUDA_THREAD_NUM_PER_BLOCK) >= 2 * cudaBlocks[0] * blockNum)
KernelCopyBlocksInGridFast<int, 6, 2> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> > KernelCopyBlocksInGridFast<int, 6, 2> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> >
((int*)source, blockSize, blockNum, gridNum, (int*)target, index); ((int*)source, blockSize, blockNum, gridNum, (int*)target, index);
else else
KernelCopyBlocksInGridFast<int, 6, 1> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> > KernelCopyBlocksInGridFast<int, 6, 1> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> >
((int*)source, blockSize, blockNum, gridNum, (int*)target, index); ((int*)source, blockSize, blockNum, gridNum, (int*)target, index);
} }
else if (blockNum == 8) { else if (blockNum == 8) {
if ((SHARED_MEMORY_SIZE / itemSize - 2 * MAX_CUDA_THREAD_NUM_PER_BLOCK) >= 2 * cudaBlocks[0] * blockNum) if ((SHARED_MEMORY_SIZE / itemSize - 2 * MAX_CUDA_THREAD_NUM_PER_BLOCK) >= 2 * cudaBlocks[0] * blockNum)
KernelCopyBlocksInGridFast<int, 8, 2> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> > KernelCopyBlocksInGridFast<int, 8, 2> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> >
((int*)source, blockSize, blockNum, gridNum, (int*)target, index); ((int*)source, blockSize, blockNum, gridNum, (int*)target, index);
else else
KernelCopyBlocksInGridFast<int, 8, 1> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> > KernelCopyBlocksInGridFast<int, 8, 1> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> >
((int*)source, blockSize, blockNum, gridNum, (int*)target, index); ((int*)source, blockSize, blockNum, gridNum, (int*)target, index);
} }
else if (blockNum == 12) { else if (blockNum == 12) {
if ((SHARED_MEMORY_SIZE / itemSize - 2 * MAX_CUDA_THREAD_NUM_PER_BLOCK) >= 2 * cudaBlocks[0] * blockNum) if ((SHARED_MEMORY_SIZE / itemSize - 2 * MAX_CUDA_THREAD_NUM_PER_BLOCK) >= 2 * cudaBlocks[0] * blockNum)
KernelCopyBlocksInGridFast<int, 12, 2> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> > KernelCopyBlocksInGridFast<int, 12, 2> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> >
((int*)source, blockSize, blockNum, gridNum, (int*)target, index); ((int*)source, blockSize, blockNum, gridNum, (int*)target, index);
else else
KernelCopyBlocksInGridFast<int, 12, 1> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> > KernelCopyBlocksInGridFast<int, 12, 1> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> >
((int*)source, blockSize, blockNum, gridNum, (int*)target, index); ((int*)source, blockSize, blockNum, gridNum, (int*)target, index);
} }
else { else {
KernelCopyBlocksInGrid<int> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> > KernelCopyBlocksInGrid<int> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> >
((int*)source, blockSize, blockNum, gridNum, (int*)target, index); ((int*)source, blockSize, blockNum, gridNum, (int*)target, index);
} }
BacktoCudaDev(myMem->devID, devIDBackup);
} }
#endif // USE_CUDA #endif // USE_CUDA
......
...@@ -34,29 +34,35 @@ all the data has been on the device (CPU/GPU) already. ...@@ -34,29 +34,35 @@ all the data has been on the device (CPU/GPU) already.
>> blockNum - number of blocks >> blockNum - number of blocks
>> target - target data array >> target - target data array
>> targetBlocks - target positions of the copy >> targetBlocks - target positions of the copy
>> myMem - the memory pool >> devID - device id
*/ */
void _CopyBlocksOnSite(void * source, int blockSize, int blockNum, void * target, int * targetBlocks, XMem * myMem) void _CopyBlocksOnSite(void * source, int blockSize, int blockNum, void * target, int * targetBlocks, int devID)
{ {
if (myMem != NULL && myMem->devID >= 0) { if (devID >= 0) {
#ifdef USE_CUDA #ifdef USE_CUDA
_CudaCopyBlocks(source, blockSize, blockNum, target, targetBlocks, myMem); _CudaCopyBlocks(source, blockSize, blockNum, target, targetBlocks, devID);
#else #else
ShowNTErrors("Please specify USE_CUDA and recompile the code!"); ShowNTErrors("Please specify USE_CUDA and recompile the code!");
#endif #endif
} }
else { else {
int devID = myMem != NULL ? myMem->devID : -1;
/* /*
The following code should be fine with GPUs, but too many The following code should be fine with GPUs, but too many
kernel calls would slow down the system. We prefer to use kernel calls would slow down the system. We prefer to use
one kernel to do block copy in batch (kernel fusion). one kernel to do block copy in batch (kernel fusion).
*/ */
for (int i = 0, b = 0; i < blockNum; i++, b += blockSize) { if(blockSize == sizeof(int)){
XMemCopy((char*)target + targetBlocks[i] * blockSize, devID, for (int i = 0, b = 0; i < blockNum; i++, b += blockSize) {
(char*)source + b, devID, blockSize); *(int*)((char*)target + targetBlocks[i] * blockSize) =
*(int*)((char*)source + b);
}
}
else{
for (int i = 0, b = 0; i < blockNum; i++, b += blockSize) {
XMemCopy((char*)target + targetBlocks[i] * blockSize, devID,
(char*)source + b, devID, blockSize);
}
} }
} }
} }
} // namespace nts(NiuTrans.Tensor) } // namespace nts(NiuTrans.Tensor)
\ No newline at end of file
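/*
CPU sketch of the fast path added above: when blockSize equals sizeof(int), each block
is a single 4-byte value, so the scatter reduces to plain assignments and avoids one
XMemCopy call per block. ScatterIntBlocks is a hypothetical reference, not library code.
*/
void ScatterIntBlocks(const int * source, int * target, const int * targetBlocks, int blockNum)
{
    for (int i = 0; i < blockNum; i++)
        target[targetBlocks[i]] = source[i];   /* targetBlocks[i] is the destination block index */
}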
...@@ -36,39 +36,48 @@ NOTE that this version makes more use of the 2d threads in cuda ...@@ -36,39 +36,48 @@ NOTE that this version makes more use of the 2d threads in cuda
>> target - target data array >> target - target data array
>> targetBlocks - target positions of the copy >> targetBlocks - target positions of the copy
*/ */
template<int miniBlockSize> template<class T>
__global__ __global__
void KernelCopyBlocks(DTYPE * source, int blockSize, int blockNum, DTYPE * target, int * targetBlocks) void KernelCopyBlocks(T * source, int blockSize, int blockNum, T * target, int * targetBlocks)
{ {
/* entry index in the block */ /* entry index in the block */
int i = (blockDim.x * blockIdx.x + threadIdx.x) * miniBlockSize; int i = blockDim.x * blockIdx.x + threadIdx.x;
/* block index */ /* block index */
int j = blockDim.y * blockIdx.y + threadIdx.y; int j = blockDim.y * blockIdx.y + threadIdx.y;
if (j >= blockNum) if (i >= blockSize || j >= blockNum)
return; return;
/* target position */ T * s = source + blockSize * j;
int k = targetBlocks[j]; T * t = target + blockSize * targetBlocks[j];
DTYPE * s = source + blockSize * j; t[i] = s[i];
DTYPE * t = target + blockSize * k; }
if (i < blockSize) { /*
if (miniBlockSize == 4) { copy a number of blocks to target positions
t[i] = s[i]; NOTE that this version makes more use of the 2d threads in cuda
t[i + 1] = s[i + 1]; >> source - data array (head of the blocks) to copy from
t[i + 2] = s[i + 2]; >> blockSize - size of block
t[i + 3] = s[i + 3]; >> blockNum - number of blocks
} >> target - target data array
else if (miniBlockSize <= 1) { >> targetBlocks - target positions of the copy
t[i] = s[i]; */
} template<class T>
else { __global__
printf("something wrong!"); void KernelCopyBlocksV2(T * source, int blockSize, int blockNum, int totalSize, T * target, int * targetBlocks)
} {
} /* entry index in the block */
int i = blockDim.x * blockIdx.x + threadIdx.x;
if (i >= totalSize)
return;
int targetBlockID = targetBlocks[i / blockSize];
int targetOffset = i % blockSize;
*(target + blockSize * targetBlockID + targetOffset) = source[i];
} }
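/*
Host-side reference (hypothetical, for reading the kernel only) of the flattened indexing
used by KernelCopyBlocksV2: element i belongs to source block i / blockSize and lands at
offset i % blockSize inside the block chosen by targetBlocks.
*/
template<class T>
void CopyBlocksV2Reference(const T * source, int blockSize, int totalSize, T * target, const int * targetBlocks)
{
    for (int i = 0; i < totalSize; i++) {
        int targetBlockID = targetBlocks[i / blockSize];
        int targetOffset  = i % blockSize;
        target[blockSize * targetBlockID + targetOffset] = source[i];
    }
}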
/* /*
...@@ -78,29 +87,42 @@ copy a number of blocks to target positions (cuda version) ...@@ -78,29 +87,42 @@ copy a number of blocks to target positions (cuda version)
>> blockNum - number of blocks >> blockNum - number of blocks
>> target - target data array >> target - target data array
>> targetBlocks - target positions of the copy (on the device) >> targetBlocks - target positions of the copy (on the device)
>> myMem - memory pool >> devID - device id
*/ */
void _CudaCopyBlocks(void * source, int blockSize, int blockNum, void * target, int * targetBlocks, XMem * myMem) void _CudaCopyBlocks(void * source, int blockSize, int blockNum, void * target, int * targetBlocks, int devID)
{ {
CheckNTErrors((myMem != NULL), "No memory pool!"); CheckNTErrors(devID >= 0, "Wrong device to run!");
CheckNTErrors((myMem->devID >= 0), "Wrong device to run!");
CheckNTErrors((blockSize % sizeof(DTYPE) == 0), "Unsupported block size!");
int cudaGrids[3]; int cudaGrids[3];
int cudaBlocks[3]; int cudaBlocks[3];
int bSize = blockSize / sizeof(DTYPE);
if (bSize % 4 == 0) { int devIDBackup;
GDevs.GetCudaThread2D(myMem->devID, bSize / 4, blockNum, MAX_INT, cudaGrids, cudaBlocks); ProtectCudaDev(devID, devIDBackup);
KernelCopyBlocks<4> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> >
((DTYPE*)source, bSize, blockNum, (DTYPE*)target, targetBlocks); if(blockSize % sizeof(double) == 0){
int bSize = blockSize / sizeof(double);
GDevs.GetCudaThread(devID, bSize * blockNum, cudaGrids, cudaBlocks);
KernelCopyBlocksV2<double> <<<dim3(cudaGrids[0]), dim3(cudaBlocks[0]) >>>
((double*)source, bSize, blockNum, bSize * blockNum, (double*)target, targetBlocks);
//GDevs.GetCudaThread2D(devID, bSize, blockNum, MAX_INT, cudaGrids, cudaBlocks);
//KernelCopyBlocks<double> <<<dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >>>
// ((double*)source, bSize, blockNum, (double*)target, targetBlocks);
}
else
if(blockSize % sizeof(float) == 0){
int bSize = blockSize / sizeof(float);
GDevs.GetCudaThread(devID, bSize * blockNum, cudaGrids, cudaBlocks);
KernelCopyBlocksV2<float> <<<dim3(cudaGrids[0]), dim3(cudaBlocks[0]) >>>
((float*)source, bSize, blockNum, bSize * blockNum, (float*)target, targetBlocks);
//GDevs.GetCudaThread2D(devID, bSize, blockNum, MAX_INT, cudaGrids, cudaBlocks);
//KernelCopyBlocks<float> <<<dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >>>
// ((float*)source, bSize, blockNum, (float*)target, targetBlocks);
} }
else { else{
GDevs.GetCudaThread2D(myMem->devID, bSize, blockNum, MAX_INT, cudaGrids, cudaBlocks); ShowNTErrors("Unsupported block size!");
KernelCopyBlocks<1> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> >
((DTYPE*)source, bSize, blockNum, (DTYPE*)target, targetBlocks);
} }
BacktoCudaDev(devID, devIDBackup);
} }
#endif // USE_CUDA #endif // USE_CUDA
} // namespace nts(NiuTrans.Tensor) } // namespace nts(NiuTrans.Tensor)
\ No newline at end of file
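/*
Hedged sketch of the dispatch used by _CudaCopyBlocks above: the copy runs on the widest
unit that divides the block size, which halves the thread count for 8-byte-aligned blocks.
PickCopyUnitSize is a hypothetical helper that only restates that decision.
*/
int PickCopyUnitSize(int blockSizeInBytes)
{
    if (blockSizeInBytes % (int)sizeof(double) == 0)
        return (int)sizeof(double);   /* launch KernelCopyBlocksV2<double> */
    if (blockSizeInBytes % (int)sizeof(float) == 0)
        return (int)sizeof(float);    /* launch KernelCopyBlocksV2<float> */
    return -1;                        /* unsupported block size -> error */
}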
...@@ -28,15 +28,11 @@ namespace nts { // namespace nts(NiuTrans.Tensor) ...@@ -28,15 +28,11 @@ namespace nts { // namespace nts(NiuTrans.Tensor)
#ifdef USE_CUDA #ifdef USE_CUDA
/* copy a number of blocks to target positions */
__global__
void KernelCopyBlocks(DTYPE * source, int blockSize, int blockNum, DTYPE * target, int * targetBlocks);
/* copy a number of blocks to target positions (cuda version) */ /* copy a number of blocks to target positions (cuda version) */
void _CudaCopyBlocks(void * source, int blockSize, int blockNum, void * target, int * targetBlocks, XMem * myMem); void _CudaCopyBlocks(void * source, int blockSize, int blockNum, void * target, int * targetBlocks, int devID);
#endif // USE_CUDA #endif // USE_CUDA
} // namespace nts(NiuTrans.Tensor) } // namespace nts(NiuTrans.Tensor)
#endif // __COPYBLOCKS_CUH__ #endif // __COPYBLOCKS_CUH__
\ No newline at end of file
...@@ -27,7 +27,7 @@ ...@@ -27,7 +27,7 @@
namespace nts { // namespace nts(NiuTrans.Tensor) namespace nts { // namespace nts(NiuTrans.Tensor)
/* copy a number of blocks to target positions (on site) */ /* copy a number of blocks to target positions (on site) */
void _CopyBlocksOnSite(void * source, int blockSize, int blockNum, void * target, int * targetBlocks, XMem * myMem); void _CopyBlocksOnSite(void * source, int blockSize, int blockNum, void * target, int * targetBlocks, int devID);
} // namespace nts(NiuTrans.Tensor) } // namespace nts(NiuTrans.Tensor)
......
...@@ -75,6 +75,9 @@ void _CudaCopyBlocksSelected(void * source, int blockSize, int * sourceBlocks, i ...@@ -75,6 +75,9 @@ void _CudaCopyBlocksSelected(void * source, int blockSize, int * sourceBlocks, i
CheckNTErrors(devID >= 0, "Wrong device to run!"); CheckNTErrors(devID >= 0, "Wrong device to run!");
CheckNTErrors((blockSize % sizeof(DTYPE) == 0), "Unsupported block size!"); CheckNTErrors((blockSize % sizeof(DTYPE) == 0), "Unsupported block size!");
int devIDBackup;
ProtectCudaDev(devID, devIDBackup);
/* copy the index to the GPU memory */ /* copy the index to the GPU memory */
int * sourceBlocksTMP = myMem != NULL ? (int*)myMem->AllocBuf(myMem->devID, blockNum * sizeof(int)) : (int *)XMemAlloc(devID, blockNum * sizeof(int)); int * sourceBlocksTMP = myMem != NULL ? (int*)myMem->AllocBuf(myMem->devID, blockNum * sizeof(int)) : (int *)XMemAlloc(devID, blockNum * sizeof(int));
int * targetBlocksTMP = myMem != NULL ? (int*)myMem->AllocBuf(myMem->devID, blockNum * sizeof(int)) : (int *)XMemAlloc(devID, blockNum * sizeof(int)); int * targetBlocksTMP = myMem != NULL ? (int*)myMem->AllocBuf(myMem->devID, blockNum * sizeof(int)) : (int *)XMemAlloc(devID, blockNum * sizeof(int));
...@@ -97,6 +100,8 @@ void _CudaCopyBlocksSelected(void * source, int blockSize, int * sourceBlocks, i ...@@ -97,6 +100,8 @@ void _CudaCopyBlocksSelected(void * source, int blockSize, int * sourceBlocks, i
XMemFree(devID, sourceBlocksTMP); XMemFree(devID, sourceBlocksTMP);
XMemFree(devID, targetBlocksTMP); XMemFree(devID, targetBlocksTMP);
} }
BacktoCudaDev(devID, devIDBackup);
} }
#endif // USE_CUDA #endif // USE_CUDA
......
...@@ -37,8 +37,8 @@ copy indexed sub-tensors ...@@ -37,8 +37,8 @@ copy indexed sub-tensors
>> indexSize - length of srcIndex (and tgtIndex) >> indexSize - length of srcIndex (and tgtIndex)
>> tgtIndex - index of the target sub-tensors >> tgtIndex - index of the target sub-tensors
>> copyNum - number of the sub-tensors we copy for each source index, >> copyNum - number of the sub-tensors we copy for each source index,
e.g., for srcIndex = [1,4] and copyNum = 2, e.g., for srcIndex = [1,4] and copyNum = 2,
we actually copy the source sub-tensors 1, 2, 4, 5 we actually copy the source sub-tensors 1, 2, 4, 5
*/ */
void _CopyIndexed(const XTensor * s, XTensor * t, int dim, int * srcIndex, int indexSize, int * tgtIndex, int copyNum) void _CopyIndexed(const XTensor * s, XTensor * t, int dim, int * srcIndex, int indexSize, int * tgtIndex, int copyNum)
{ {
...@@ -73,17 +73,23 @@ void _CopyIndexed(const XTensor * s, XTensor * t, int dim, int * srcIndex, int i ...@@ -73,17 +73,23 @@ void _CopyIndexed(const XTensor * s, XTensor * t, int dim, int * srcIndex, int i
int * realSrcIndex = new int[realIndexSize]; int * realSrcIndex = new int[realIndexSize];
int * realTgtIndex = new int[realIndexSize]; int * realTgtIndex = new int[realIndexSize];
for (int i = 0; i < indexOffsetNum; i++) { for (int i = 0; i < indexOffsetNum; i++) {
int base = i * indexSize * copyNum;
int baseSrc = i * leadDimSizeSrc;
int baseTgt = i * leadDimSizeTgt;
for (int j = 0; j < indexSize; j++) { for (int j = 0; j < indexSize; j++) {
int offset = base + j * copyNum;
int * rsi = realSrcIndex + offset;
int * rti = realTgtIndex + offset;
for (int k = 0; k < copyNum; k++) { for (int k = 0; k < copyNum; k++) {
realSrcIndex[i * indexSize * copyNum + j * copyNum + k] = i * leadDimSizeSrc + srcIndex[j] + k; rsi[k] = baseSrc + srcIndex[j] + k;
realTgtIndex[i * indexSize * copyNum + j * copyNum + k] = i * leadDimSizeTgt + tgtIndex[j] + k; rti[k] = baseTgt + tgtIndex[j] + k;
} }
} }
} }
for (int i = 0; i < indexSize; i++) { for (int i = 0; i < indexSize; i++) {
CheckNTErrors((srcIndex[i] < blockNumSrc), "Index is out of range!"); CheckNTErrors((srcIndex[i] < blockNumSrc), "Index is out of scope!");
CheckNTErrors((tgtIndex[i] < blockNumTgt), "Index is out of range!"); CheckNTErrors((tgtIndex[i] < blockNumTgt), "Index is out of scope!");
} }
_CopyBlocks(s->data, blockSizeSrc * s->unitSize, realSrcIndex, realIndexSize, t->data, realTgtIndex, s->mem, s->devID); _CopyBlocks(s->data, blockSizeSrc * s->unitSize, realSrcIndex, realIndexSize, t->data, realTgtIndex, s->mem, s->devID);
......
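/*
Worked example (hypothetical code) of the index expansion performed by _CopyIndexed above,
matching the header comment "srcIndex = [1,4] and copyNum = 2 copies sub-tensors 1, 2, 4, 5";
a single index offset (indexOffsetNum = 1) is assumed so the base terms vanish.
*/
void ExpandIndexExample()
{
    int srcIndex[2] = {1, 4}, tgtIndex[2] = {0, 2};
    int indexSize = 2, copyNum = 2;
    int realSrcIndex[4], realTgtIndex[4];
    for (int j = 0; j < indexSize; j++) {
        for (int k = 0; k < copyNum; k++) {
            realSrcIndex[j * copyNum + k] = srcIndex[j] + k;   /* -> 1, 2, 4, 5 */
            realTgtIndex[j * copyNum + k] = tgtIndex[j] + k;   /* -> 0, 1, 2, 3 */
        }
    }
}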
...@@ -20,6 +20,7 @@ ...@@ -20,6 +20,7 @@
*/ */
#include "../../XName.h" #include "../../XName.h"
#include "../../XUtility.h"
#include "CopyValues.h" #include "CopyValues.h"
#include "CopyValues.cuh" #include "CopyValues.cuh"
...@@ -42,7 +43,7 @@ void _CopyValues(const XTensor * s, XTensor * t, XStream * stream) ...@@ -42,7 +43,7 @@ void _CopyValues(const XTensor * s, XTensor * t, XStream * stream)
if ((s->dataType == X_FLOAT16 && t->dataType == X_FLOAT) || if ((s->dataType == X_FLOAT16 && t->dataType == X_FLOAT) ||
(s->dataType == X_FLOAT && t->dataType == X_FLOAT16)) { (s->dataType == X_FLOAT && t->dataType == X_FLOAT16)) {
CheckNTErrors(((s->devID < 0 && t->devID < 0) || s->devID == t->devID), CheckNTErrors(((s->devID < 0 && t->devID < 0) || s->devID == t->devID),
"The code must be run on the same device!"); "The code must be run on the same device!");
CheckNTErrors((s->isSparse || t->isSparse), "TODO!"); CheckNTErrors((s->isSparse || t->isSparse), "TODO!");
ConvertDataType(s->devID, s->data, s->dataType, t->data, t->dataType, s->unitNum); ConvertDataType(s->devID, s->data, s->dataType, t->data, t->dataType, s->unitNum);
} }
...@@ -69,6 +70,34 @@ void _CopyValues(const XTensor * s, XTensor * t, XStream * stream) ...@@ -69,6 +70,34 @@ void _CopyValues(const XTensor * s, XTensor * t, XStream * stream)
} }
/* /*
copy a segment of s to t
>> s - source
>> sBeg - beginning of the segment
>> sLen - length of the segment
>> t - target
>> tBeg - beginning of the segment on the target side
>> stream - the stream for creating the job pipeline
*/
void _CopyValues(const XTensor * s, const int sBeg, const int sLen, XTensor * t, const int tBeg, XStream * stream)
{
CheckNTErrors(s != NULL && t != NULL, "The input tensor and output tensor must be nonempty!");
CheckNTErrors(s->data != NULL && t->data != NULL, "Cannot copy from an empty data array!");
CheckNTErrors(s->unitSize == t->unitSize, "The input tensors must be of the same unit size!");
CheckNTErrors(sBeg >= 0 && sLen >= 0 && sBeg + sLen <= s->unitNum, "Wrong segment on the source side");
CheckNTErrors(tBeg >= 0 && tBeg + sLen <= t->unitNum, "Wrong segment on the target side");
if (!s->isSparse && !t->isSparse) {
XMemCopy((char*)t->data + tBeg * t->unitSize, t->devID,
(char*)s->data + sBeg * s->unitSize, s->devID,
s->unitSize * sLen);
}
else {
ShowNTErrors("TODO!");
}
}
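/*
Hedged usage sketch of the segment-copy overload above (CopySegmentSketch and the chosen
offsets are hypothetical): copy 16 units of s, starting at unit 8, to the beginning of t;
the copy is device-aware because it goes through XMemCopy.
*/
void CopySegmentSketch(const XTensor * s, XTensor * t)
{
    _CopyValues(s, 8, 16, t, 0, NULL);   /* s[8..23] -> t[0..15] */
}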
/*
copy s to t (return a XTensor structure) copy s to t (return a XTensor structure)
make a new tensor to keep the result and return it make a new tensor to keep the result and return it
......
...@@ -29,6 +29,9 @@ namespace nts { // namespace nts(NiuTrans.Tensor) ...@@ -29,6 +29,9 @@ namespace nts { // namespace nts(NiuTrans.Tensor)
/* copy s to t */ /* copy s to t */
void _CopyValues(const XTensor * s, XTensor * t, XStream * stream = NULL); void _CopyValues(const XTensor * s, XTensor * t, XStream * stream = NULL);
/* copy a segment of s to t */
void _CopyValues(const XTensor * s, const int sBeg, const int sLen, XTensor * t, const int tBeg, XStream * stream = NULL);
/* /*
copy s to t (return a XTensor structure) copy s to t (return a XTensor structure)
make a new tensor to keep the result and return it make a new tensor to keep the result and return it
......
...@@ -33,14 +33,14 @@ set target data block index for the data movement in merge ...@@ -33,14 +33,14 @@ set target data block index for the data movement in merge
>> splitSizeInGrid - size of each data array to merge >> splitSizeInGrid - size of each data array to merge
>> gridSize - number of blocks in a grid (here a grid is a higher-level organization of blocks) >> gridSize - number of blocks in a grid (here a grid is a higher-level organization of blocks)
>> gridNum - number of grids >> gridNum - number of grids
>> mem - the memory pool >> devID - device id
*/ */
void _MakeMergeBlockIndex(int * blockIndex, int blockNum, int blockNumInMerge, void _MakeMergeBlockIndex(int * blockIndex, int blockNum, int blockNumInMerge,
int splitSizeInGrid, int gridSize, int gridNum, XMem * mem) int splitSizeInGrid, int gridSize, int gridNum, int devID)
{ {
if (mem != NULL && mem->devID >= 0) { if (devID >= 0) {
#ifdef USE_CUDA #ifdef USE_CUDA
_CudaMakeMergeBlockIndex(mem->devID, blockIndex, blockNum, blockNumInMerge, splitSizeInGrid, gridSize, gridNum); _CudaMakeMergeBlockIndex(devID, blockIndex, blockNum, blockNumInMerge, splitSizeInGrid, gridSize, gridNum);
#else #else
ShowNTErrors("Please specify USE_CUDA and recompile the code!"); ShowNTErrors("Please specify USE_CUDA and recompile the code!");
#endif #endif
......
...@@ -28,7 +28,7 @@ namespace nts { // namespace nts(NiuTrans.Tensor) ...@@ -28,7 +28,7 @@ namespace nts { // namespace nts(NiuTrans.Tensor)
/* set target data block index for the data movement in merge */ /* set target data block index for the data movement in merge */
void _MakeMergeBlockIndex(int * blockIndex, int blockNum, int blockNumInMerge, void _MakeMergeBlockIndex(int * blockIndex, int blockNum, int blockNumInMerge,
int splitSizeInGrid, int gridSize, int gridNum, XMem * mem); int splitSizeInGrid, int gridSize, int gridNum, int devID);
} // namespace nts(NiuTrans.Tensor) } // namespace nts(NiuTrans.Tensor)
......
...@@ -31,13 +31,13 @@ set target data block index for the data movement in split ...@@ -31,13 +31,13 @@ set target data block index for the data movement in split
>> splitNum - number of splits >> splitNum - number of splits
>> blockSplitSize - size of the splitted block >> blockSplitSize - size of the splitted block
>> blockNum - number of data blocks >> blockNum - number of data blocks
>> mem - the memory pool >> devID - device id
*/ */
void _MakeSplitBlockIndex(int * blockIndex, int splitNum, int blockSplitSize, int blockNum, XMem * mem) void _MakeSplitBlockIndex(int * blockIndex, int splitNum, int blockSplitSize, int blockNum, int devID)
{ {
if (mem != NULL && mem->devID >= 0) { if (devID >= 0) {
#ifdef USE_CUDA #ifdef USE_CUDA
_CudaMakeSplitBlockIndex(mem->devID, blockIndex, splitNum, blockSplitSize, blockNum); _CudaMakeSplitBlockIndex(devID, blockIndex, splitNum, blockSplitSize, blockNum);
#else #else
ShowNTErrors("Please specify USE_CUDA and recompile the code!"); ShowNTErrors("Please specify USE_CUDA and recompile the code!");
#endif #endif
......
...@@ -27,7 +27,7 @@ ...@@ -27,7 +27,7 @@
namespace nts { // namespace nts(NiuTrans.Tensor) namespace nts { // namespace nts(NiuTrans.Tensor)
/* set target data block index for the data movement in split */ /* set target data block index for the data movement in split */
void _MakeSplitBlockIndex(int * blockIndex, int splitNum, int blockSplitSize, int blockNum, XMem * mem); void _MakeSplitBlockIndex(int * blockIndex, int splitNum, int blockSplitSize, int blockNum, int devID);
} // namespace nts(NiuTrans.Tensor) } // namespace nts(NiuTrans.Tensor)
......
...@@ -42,10 +42,13 @@ e.g., (N/3, M, 3) -> (N, M) ...@@ -42,10 +42,13 @@ e.g., (N/3, M, 3) -> (N, M)
*/ */
void _Merge(const XTensor * s, XTensor * t, int whereToMerge, int leadingDim) void _Merge(const XTensor * s, XTensor * t, int whereToMerge, int leadingDim)
{ {
int whereToMergeRDI = s->order - whereToMerge - 1; if(leadingDim < 0)
int leadingDimRDI = s->order - leadingDim - 1; leadingDim = 0;
int whereToMergeRDI = s->order - whereToMerge - 1;
int leadingDimRDI = s->order - leadingDim - 1;
if (leadingDimRDI < 0) if (leadingDimRDI < 0)
leadingDimRDI = s->order - 1; leadingDimRDI = s->order - 1;
CheckNTErrors((s != NULL && t != NULL), "Invalid tensors!"); CheckNTErrors((s != NULL && t != NULL), "Invalid tensors!");
CheckNTErrors((s->devID == t->devID || (s->devID < 0 && t->devID < 0)), CheckNTErrors((s->devID == t->devID || (s->devID < 0 && t->devID < 0)),
...@@ -60,8 +63,12 @@ void _Merge(const XTensor * s, XTensor * t, int whereToMerge, int leadingDim) ...@@ -60,8 +63,12 @@ void _Merge(const XTensor * s, XTensor * t, int whereToMerge, int leadingDim)
CheckNTErrors((t->dimSizeRDI[i] == s->dimSizeRDI[i] * s->dimSizeRDI[leadingDimRDI]), CheckNTErrors((t->dimSizeRDI[i] == s->dimSizeRDI[i] * s->dimSizeRDI[leadingDimRDI]),
"Unmatched tensor sizes!"); "Unmatched tensor sizes!");
} }
else if (i < leadingDimRDI){
CheckNTErrors((s->dimSizeRDI[i] == t->dimSizeRDI[i]),
"Unmatched tensor sizes!");
}
else if (i > leadingDimRDI) { else if (i > leadingDimRDI) {
CheckNTErrors((s->dimSizeRDI[i - 1] == t->dimSizeRDI[i]), CheckNTErrors((s->dimSizeRDI[i] == t->dimSizeRDI[i - 1]),
"Unmatched tensor sizes!"); "Unmatched tensor sizes!");
} }
} }
...@@ -119,28 +126,24 @@ void _Merge(const XTensor * s, XTensor * t, int whereToMerge, int leadingDim) ...@@ -119,28 +126,24 @@ void _Merge(const XTensor * s, XTensor * t, int whereToMerge, int leadingDim)
int realBlockSize = blockSize * t->unitSize; int realBlockSize = blockSize * t->unitSize;
int * blockIndex = (int*)(mem != NULL ? int * blockIndex = (int*)(mem != NULL ?
mem->AllocBuf(mem->devID, blockNum * gridNum * sizeof(int)) : mem->AllocBuf(mem->devID, blockNum * gridNum * sizeof(int)) :
XMemAlloc(mem->devID, blockNum * gridNum * sizeof(int))); XMemAlloc(s->devID, blockNum * gridNum * sizeof(int)));
_MakeMergeBlockIndex(blockIndex, blockNum, blockNumInMerge, splitSizeInGrid, gridSize, gridNum, mem); _MakeMergeBlockIndex(blockIndex, blockNum, blockNumInMerge, splitSizeInGrid, gridSize, gridNum, s->devID);
_CopyBlocksOnSite(s->data, realBlockSize, blockNum, dataTMP, blockIndex, mem); _CopyBlocksOnSite(s->data, realBlockSize, blockNum * gridNum, dataTMP, blockIndex, s->devID);
if (mem != NULL) if (mem != NULL)
mem->ReleaseBuf(mem->devID, blockNum * gridNum * sizeof(int)); mem->ReleaseBuf(mem->devID, blockNum * gridNum * sizeof(int));
else else
XMemFree(mem->devID, blockIndex); XMemFree(s->devID, blockIndex);
/* copy from tmp to target */
XMemCopy(t->data, t->devID, dataTMP, s->devID, size);
if (!isOnSameDevice) { if (!isOnSameDevice) {
XMemCopy(t->data, t->devID, dataTMP, s->devID, size); XMemCopy(t->data, t->devID, dataTMP, s->devID, size);
if (mem != NULL) if (mem != NULL)
mem->ReleaseBuf(mem->devID, size); mem->ReleaseBuf(mem->devID, size);
else else
XMemFree(mem->devID, dataTMP); XMemFree(s->devID, dataTMP);
} }
} }
} }
...@@ -163,7 +166,7 @@ XTensor Merge(const XTensor &s, int whereToMerge, int leadingDim) ...@@ -163,7 +166,7 @@ XTensor Merge(const XTensor &s, int whereToMerge, int leadingDim)
CheckNTErrors(leadingDim < whereToMerge, "Invalid leading dimension!"); CheckNTErrors(leadingDim < whereToMerge, "Invalid leading dimension!");
if (leadingDim < 0) if (leadingDim < 0)
leadingDim = 0; leadingDim = 0;
int order = s.order - 1; int order = s.order - 1;
int * dimSize = new int[order]; int * dimSize = new int[order];
...@@ -205,7 +208,7 @@ merge small tensors into a big tensor ...@@ -205,7 +208,7 @@ merge small tensors into a big tensor
*/ */
void _Merge(const XList * smalls, XTensor * big, int whereToMerge) void _Merge(const XList * smalls, XTensor * big, int whereToMerge)
{ {
CheckNTErrors((smalls != NULL), "Invalid list!"); CheckNTErrors((smalls != NULL), "Invalid list!");
CheckNTErrors((smalls->count > 0), "Empty list!"); CheckNTErrors((smalls->count > 0), "Empty list!");
bool uniform = true; bool uniform = true;
...@@ -233,7 +236,7 @@ void _Merge(const XList * smalls, XTensor * big, int whereToMerge) ...@@ -233,7 +236,7 @@ void _Merge(const XList * smalls, XTensor * big, int whereToMerge)
int mergedNum = smalls->count; int mergedNum = smalls->count;
XTensor * s0 = (XTensor*)smalls->GetItem(0); XTensor * s0 = (XTensor*)smalls->GetItem(0);
int whereToMergeRDI = s0->order - whereToMerge - 1; int whereToMergeRDI = s0->order - whereToMerge - 1;
for (int i = 0; i < s0->order; i++) { for (int i = 0; i < s0->order; i++) {
if (i <= whereToMergeRDI) if (i <= whereToMergeRDI)
blockSize *= s0->dimSizeRDI[i]; blockSize *= s0->dimSizeRDI[i];
...@@ -268,10 +271,10 @@ void _Merge(const XList * smalls, XTensor * big, int whereToMerge) ...@@ -268,10 +271,10 @@ void _Merge(const XList * smalls, XTensor * big, int whereToMerge)
} }
/* merging with fewer kernel/api calls??? (i'm not sure about it!! may remove this later) */ /* merging with fewer kernel/api calls??? (i'm not sure about it!! may remove this later) */
else { else {
int* dimSizeTMP = new int[MAX_TENSOR_DIM_NUM]; int* dimSizeTMP = new int[smallsItem0->order + 1];
for (int i = 0; i < MAX_TENSOR_DIM_NUM; i++) for (int i = 0; i < smallsItem0->order; i++)
dimSizeTMP[i] = -smallsItem0->dimSizeRDI[i]; dimSizeTMP[i + 1] = -smallsItem0->dimSize[i];
dimSizeTMP[smallsItem0->order] = -mergeNum; dimSizeTMP[0] = -mergeNum;
XMem * mem = smallsItem0->mem; XMem * mem = smallsItem0->mem;
XTensor * tensorTMP = new XTensor(smallsItem0->order + 1, dimSizeTMP, XTensor * tensorTMP = new XTensor(smallsItem0->order + 1, dimSizeTMP,
...@@ -283,7 +286,7 @@ void _Merge(const XList * smalls, XTensor * big, int whereToMerge) ...@@ -283,7 +286,7 @@ void _Merge(const XList * smalls, XTensor * big, int whereToMerge)
if (uniform) if (uniform)
dataTMP = smallsItem0->data; dataTMP = smallsItem0->data;
else else
dataTMP = mem != NULL ? mem->AllocBuf(mem->devID, size) : XMemAlloc(mem->devID, size); dataTMP = mem != NULL ? mem->AllocBuf(mem->devID, size) : XMemAlloc(big->devID, size);
tensorTMP->data = dataTMP; tensorTMP->data = dataTMP;
...@@ -295,18 +298,17 @@ void _Merge(const XList * smalls, XTensor * big, int whereToMerge) ...@@ -295,18 +298,17 @@ void _Merge(const XList * smalls, XTensor * big, int whereToMerge)
} }
} }
_Merge(tensorTMP, big, whereToMerge); _Merge(tensorTMP, big, whereToMerge + 1);
delete[] dimSizeTMP; delete[] dimSizeTMP;
tensorTMP->data = NULL;
dataTMP = NULL;
tensorTMP->data = NULL;
delete tensorTMP; delete tensorTMP;
if ((!uniform) && (mem != NULL)) if ((!uniform) && (mem != NULL))
mem->ReleaseBuf(mem->devID, size); mem->ReleaseBuf(mem->devID, size);
else else
XMemFree(mem->devID, dataTMP); XMemFree(big->devID, dataTMP);
} }
} }
......
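/*
Worked shape example (hypothetical, sizes only) of the reshape trick above: a list of
mergeNum = 3 small tensors of shape (2, 4) is viewed as one temporary tensor of shape
(3, 2, 4) whose data is the concatenation of the smalls, so the list merge reduces to a
single two-tensor _Merge call; merging along the first dimension of the smalls then
yields a big tensor of shape (6, 4). Allocation and the negative-size convention are
left out here.
*/
void MergeListShapeExample()
{
    int mergeNum = 3;
    int smallDimSize[2] = {2, 4};
    int dimSizeTMP[3];
    dimSizeTMP[0] = mergeNum;                 /* extra leading dimension: one slot per small tensor */
    for (int i = 0; i < 2; i++)
        dimSizeTMP[i + 1] = smallDimSize[i];  /* -> (3, 2, 4) */
}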
...@@ -24,6 +24,7 @@ ...@@ -24,6 +24,7 @@
#include "MakeSplitBlockIndex.h" #include "MakeSplitBlockIndex.h"
#include "../../XName.h" #include "../../XName.h"
#include "../../XTensor.h" #include "../../XTensor.h"
#include "../../XDevice.h"
#include "../../XUtility.h" #include "../../XUtility.h"
#include "../movement/CopyBlocksOnSite.h" #include "../movement/CopyBlocksOnSite.h"
...@@ -88,10 +89,33 @@ void _Split(const XTensor * s, XTensor * t, int whereToSplit, int splitNum) ...@@ -88,10 +89,33 @@ void _Split(const XTensor * s, XTensor * t, int whereToSplit, int splitNum)
int n = blockNum / splitNum; int n = blockNum / splitNum;
int sStep = blockSize * s->unitSize; int sStep = blockSize * s->unitSize;
int tStep = n * tPitch; int tStep = n * tPitch;
for (int k = 0; k < splitNum; k++) { if(t->devID < 0){
XMemCopy2D((char*)t->data + k * tStep, tPitch, t->devID, for (int k = 0; k < splitNum; k++) {
(char*)s->data + k * sStep, sPitch, s->devID, XMemCopy2D((char*)t->data + k * tStep, tPitch, t->devID,
mSize, n); (char*)s->data + k * sStep, sPitch, s->devID,
mSize, n);
}
}
else{
#ifdef USE_CUDA
#ifdef STREAMED_MEMCPOPY
XStream * stream = GDevs.GPUs[t->devID].stream;
for (int k = 0; k < splitNum; k++) {
XMemCopy2DAsync((char*)t->data + k * tStep, tPitch, t->devID,
(char*)s->data + k * sStep, sPitch, s->devID,
mSize, n, stream);
}
stream->StreamSynchronize();
#else
for (int k = 0; k < splitNum; k++) {
XMemCopy2D((char*)t->data + k * tStep, tPitch, t->devID,
(char*)s->data + k * sStep, sPitch, s->devID,
mSize, n);
}
#endif
#else
ShowNTErrors("Please specify USE_CUDA and recompile the code!");
#endif
} }
} }
else { else {
...@@ -108,17 +132,17 @@ void _Split(const XTensor * s, XTensor * t, int whereToSplit, int splitNum) ...@@ -108,17 +132,17 @@ void _Split(const XTensor * s, XTensor * t, int whereToSplit, int splitNum)
int blockSplitSize = blockNum / splitNum; int blockSplitSize = blockNum / splitNum;
int * blockIndex = (int*)(mem != NULL ? int * blockIndex = (int*)(mem != NULL ?
mem->AllocBuf(mem->devID, blockNum * sizeof(int)) : mem->AllocBuf(mem->devID, blockNum * sizeof(int)) :
XMemAlloc(mem->devID, blockNum * sizeof(int))); XMemAlloc(s->devID, blockNum * sizeof(int)));
_MakeSplitBlockIndex(blockIndex, splitNum, blockSplitSize, blockNum, mem); _MakeSplitBlockIndex(blockIndex, splitNum, blockSplitSize, blockNum, s->devID);
_CopyBlocksOnSite(s->data, realBlockSize, blockNum, dataTMP, blockIndex, mem); _CopyBlocksOnSite(s->data, realBlockSize, blockNum, dataTMP, blockIndex, s->devID);
if (mem != NULL) if (mem != NULL)
mem->ReleaseBuf(mem->devID, blockNum * sizeof(int)); mem->ReleaseBuf(mem->devID, blockNum * sizeof(int));
else else
XMemFree(mem->devID, blockIndex); XMemFree(s->devID, blockIndex);
/* copy from tmp to target */ /* copy from tmp to target */
if (!isOnSameDevice) { if (!isOnSameDevice) {
...@@ -127,7 +151,7 @@ void _Split(const XTensor * s, XTensor * t, int whereToSplit, int splitNum) ...@@ -127,7 +151,7 @@ void _Split(const XTensor * s, XTensor * t, int whereToSplit, int splitNum)
if (mem != NULL) if (mem != NULL)
mem->ReleaseBuf(mem->devID, size); mem->ReleaseBuf(mem->devID, size);
else else
XMemFree(mem->devID, dataTMP); XMemFree(s->devID, dataTMP);
} }
} }
} }
...@@ -226,20 +250,46 @@ void _Split(const XTensor * big, XList * smalls, int whereToSplit, int splitNum) ...@@ -226,20 +250,46 @@ void _Split(const XTensor * big, XList * smalls, int whereToSplit, int splitNum)
int n = blockNum / splitNum; int n = blockNum / splitNum;
int sStep = blockSize * big->unitSize; int sStep = blockSize * big->unitSize;
int tStep = 0; int tStep = 0;
for (int k = 0; k < splitNum; k++) {
XTensor * t = (XTensor*)smalls->GetItem(k); if(big->devID < 0){
XMemCopy2D((char*)t->data + k * tStep, tPitch, t->devID, for (int k = 0; k < splitNum; k++) {
(char*)big->data + k * sStep, sPitch, big->devID, XTensor * t = (XTensor*)smalls->GetItem(k);
mSize, n); XMemCopy2D((char*)t->data + k * tStep, tPitch, t->devID,
(char*)big->data + k * sStep, sPitch, big->devID,
mSize, n);
}
}
else{
#ifdef USE_CUDA
#ifdef STREAMED_MEMCPOPY
XStream * stream = GDevs.GPUs[big->devID].stream;
for (int k = 0; k < splitNum; k++) {
XTensor * t = (XTensor*)smalls->GetItem(k);
XMemCopy2DAsync((char*)t->data + k * tStep, tPitch, t->devID,
(char*)big->data + k * sStep, sPitch, big->devID,
mSize, n, stream);
}
stream->StreamSynchronize();
#else
for (int k = 0; k < splitNum; k++) {
XTensor * t = (XTensor*)smalls->GetItem(k);
XMemCopy2D((char*)t->data + k * tStep, tPitch, t->devID,
(char*)big->data + k * sStep, sPitch, big->devID,
mSize, n);
}
#endif
#else
ShowNTErrors("Please specify USE_CUDA and recompile the code!");
#endif
} }
} }
/* splitting with fewer kernel/api calls??? (i'm not sure about it!! may remove this later) */ /* splitting with fewer kernel/api calls??? (i'm not sure about it!! may remove this later) */
else { else {
int* dimSizeTMP = new int[MAX_TENSOR_DIM_NUM]; int* dimSizeTMP = new int[big->order + 1];
for (int i = 0; i < MAX_TENSOR_DIM_NUM; i++) for (int i = 0; i < big->order; i++)
dimSizeTMP[i] = -big->dimSize[i]; dimSizeTMP[i + 1] = -big->dimSize[i];
dimSizeTMP[whereToSplit] /= splitNum; dimSizeTMP[whereToSplit + 1] /= splitNum;
dimSizeTMP[big->order] = -splitNum; dimSizeTMP[0] = -splitNum;
XMem * mem = big->mem; XMem * mem = big->mem;
XTensor* tensorTMP = new XTensor(big->order + 1, dimSizeTMP, big->dataType, big->denseRatio, big->devID, mem); XTensor* tensorTMP = new XTensor(big->order + 1, dimSizeTMP, big->dataType, big->denseRatio, big->devID, mem);
...@@ -251,7 +301,7 @@ void _Split(const XTensor * big, XList * smalls, int whereToSplit, int splitNum) ...@@ -251,7 +301,7 @@ void _Split(const XTensor * big, XList * smalls, int whereToSplit, int splitNum)
dataTMP = first->data; dataTMP = first->data;
} }
else { else {
dataTMP = mem != NULL ? mem->AllocBuf(mem->devID, size) : XMemAlloc(mem->devID, size); dataTMP = mem != NULL ? mem->AllocBuf(mem->devID, size) : XMemAlloc(big->devID, size);
} }
tensorTMP->data = dataTMP; tensorTMP->data = dataTMP;
...@@ -270,13 +320,12 @@ void _Split(const XTensor * big, XList * smalls, int whereToSplit, int splitNum) ...@@ -270,13 +320,12 @@ void _Split(const XTensor * big, XList * smalls, int whereToSplit, int splitNum)
delete[] dimSizeTMP; delete[] dimSizeTMP;
tensorTMP->data = NULL; tensorTMP->data = NULL;
dataTMP = NULL;
delete tensorTMP; delete tensorTMP;
if ((!uniform) && (mem != NULL)) if ((!uniform) && (mem != NULL))
mem->ReleaseBuf(mem->devID, size); mem->ReleaseBuf(mem->devID, size);
else else
XMemFree(mem->devID, dataTMP); XMemFree(big->devID, dataTMP);
} }
} }
......
...@@ -26,6 +26,8 @@ ...@@ -26,6 +26,8 @@
namespace nts { // namespace nts(NiuTrans.Tensor) namespace nts { // namespace nts(NiuTrans.Tensor)
#define STREAMED_MEMCPOPY
/* /*
transform a tensor by splitting it transform a tensor by splitting it
e.g., (M, N) -> (M, N/3, 3) e.g., (M, N) -> (M, N/3, 3)
......
/* NiuTrans.Tensor - an open-source tensor library
* Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: XIAO Tong (email: xiaotong@mail.neu.edu.cn) 2018-07-28
* It is extremely hot these days and I cannot sleep well. Fortunately we had
* a good lunch of Steamed Cold Noodles. This made me feel much better!
*/
#include "Transpose.h"
#include "Merge.h"
#include "../../XUtility.h"
#include "../../XName.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
/*
tensor transposition of dimensions i and j
b = transposed(a)
For an input tensor a, we transpose its dimensions i and j.
E.g., let a be a tensor of size x * y * z, i = 0, j = 2,
then the output will be a tensor of size z * y * x.
>> a - the input tensor
>> b - the output tensor by transpose tensor a with specified dimensions i and j
>> i - the first dimension to transpose
>> j - the second dimension to transpose
*/
void _Transpose(const XTensor * a, XTensor * b, const int i, const int j)
{
CheckNTErrors(a && b, "Empty tensors");
CheckNTErrors(a->order == b->order, "Wrong tensor orders");
CheckNTErrors(a->unitNum == b->unitNum && a->unitSize == b->unitSize, "Wrong tensor sizes");
CheckNTErrors(a->order > i && i >= 0, "index of dimension is out of scope!");
CheckNTErrors(a->order > j && j >= 0, "index of dimension is out of scope!");
for(int k = 0; k < a->order; k++){
if(k == i){
CheckNTErrors(a->dimSize[k] == b->dimSize[j], "Wrong dimension size in transposition");
}
else if(k == j){
CheckNTErrors(a->dimSize[k] == b->dimSize[i], "Wrong dimension size in transposition");
}
else{
CheckNTErrors(a->dimSize[k] == b->dimSize[k], "Wrong dimension size in transposition");
}
}
if(i == j){
XMemCopy(b->data, b->devID, a->data, a->devID, b->unitNum * b->unitSize);
}
else{
int I = MIN(i, j);
int J = MAX(i, j);
int * dims = new int[a->order + 1];
for(int k = 0; k <= J; k++)
dims[k] = a->dimSize[k];
dims[J + 1] = -1;
for(int k = J + 1; k < a->order; k++)
dims[k + 1] = a->dimSize[k];
/* reshape tensor a from (..., n_I, ..., n_J, ...) => (..., n_I, ..., n_J, 1, ...) */
XTensor * aTMP = new XTensor(a->order + 1, dims, a->dataType, a->denseRatio, a->devID, a->mem);
aTMP->data = a->data;
for(int k = 0; k < I; k++)
dims[k] = a->dimSize[k];
for(int k = I + 1; k <= J; k++)
dims[k - 1] = a->dimSize[k];
dims[J] = a->dimSize[I];
for(int k = J + 1; k < a->order; k++)
dims[k] = a->dimSize[k];
/* reshape tensor b from (..., m_I, ..., m_J, ...) => (..., m_J, m_I, ...) */
b->Reshape(b->order, dims);
/* tensor (..., n_I, ..., n_J, 1, ...) => tensor (..., m_J, m_I, ...) */
_Merge(aTMP, b, J + 1, I);
memcpy(dims, a->dimSize, sizeof(int) * a->order);
dims[I] = a->dimSize[J];
dims[J] = a->dimSize[I];
/* reshape tensor b from (..., m_J, m_I, ...) => (..., m_J, ..., m_I, ...) */
b->Reshape(b->order, dims);
aTMP->data = NULL;
delete[] dims;
delete aTMP;
}
}
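/*
Worked trace of the reshape+merge strategy above for a of size x * y * z with i = 0, j = 2
(so I = 0, J = 2):
    aTMP view of a:          (x, y, z, 1)
    b reshaped, step 1:      (y, z, x)    - dimension I is moved behind dimension J
    _Merge(aTMP, b, J+1, I)  fills b's data in that layout
    b reshaped, step 2:      (z, y, x)    - the final transposed shape
*/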
/*
tensor transposition of dimensions i and j (return a XTensor structure).
make a new tensor to keep the result and return it.
b = transposed(a)
For an input tensor a, we transpose its dimensions i and j.
E.g., let a be a tensor of size x * y * z, i = 0, j = 2,
then the output will be a tensor of size z * y * x.
>> a - the input tensor
>> i - the first dimension to transpose
>> j - the second dimension to transpose
<< return - the output tensor by transpose tensor a with specified dimensions i and j
*/
XTensor Transpose(const XTensor &a, const int i, const int j)
{
CheckNTErrors(a.order > i && i >= 0, "index of dimension is out of scope!");
CheckNTErrors(a.order > j && j >= 0, "index of dimension is out of scope!");
int order = a.order;
int * dimSize = new int[order];
for(int k = 0; k < order; k++){
if(k == i)
dimSize[k] = a.dimSize[j];
else if(k == j)
dimSize[k] = a.dimSize[i];
else
dimSize[k] = a.dimSize[k];
}
float dr = (!a.isSparse) ? 1.0F : a.denseRatio;
XTensor b(order, dimSize, a.dataType, dr, a.devID, a.mem);
b.SetTMP();
/* call _Transpose function */
_Transpose(&a, &b, i, j);
/* tensor connection */
XLink::MakeLink(&a, NULL, &b, SHAPE_TRANSPOSE);
XLink::AddParamToHeadInt(&b, i);
XLink::AddParamToHeadInt(&b, j);
/* destroy variables */
delete[] dimSize;
return b;
}
}
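/*
Hedged usage sketch (TransposeUsageSketch is hypothetical): swapping dimensions 0 and 2 of
an x * y * z tensor yields a z * y * x tensor; the functional form allocates the result and
records the operation through XLink for backward computation.
*/
void TransposeUsageSketch(const XTensor & a)
{
    XTensor b = Transpose(a, 0, 2);   /* b.dimSize = { a.dimSize[2], a.dimSize[1], a.dimSize[0] } */
}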
...@@ -27,27 +27,18 @@ ...@@ -27,27 +27,18 @@
namespace nts { // namespace nts(NiuTrans.Tensor) namespace nts { // namespace nts(NiuTrans.Tensor)
#define transpose _Transpose_
/* /*
generate a transposed 1D/2D tensor tensor transposition of dimensions i and j
b = transposed(a) b = transposed(a)
*/ */
void _Transpose(XTensor * a, XTensor * b); void _Transpose(const XTensor * a, XTensor * b, const int i, const int j);
/*
transpose a 1D/2D tensor (do it on site).
keep the result in the input tensor and return nothing.
a = transposed(a)
*/
void _TransposeMe(XTensor * a);
/* /*
make a transposed 1D/2D tensor (return a XTensor structure). tensor transposition of dimensions i and j (return a XTensor structure).
make a new tensor to keep the result and return it. make a new tensor to keep the result and return it.
b = transposed(a) b = transposed(a)
*/ */
XTensor Transpose(XTensor &a); XTensor Transpose(const XTensor &a, const int i, const int j);
} // namespace nts(NiuTrans.Tensor) } // namespace nts(NiuTrans.Tensor)
......
...@@ -32,12 +32,108 @@ namespace nts { // namespace nts(NiuTrans.Tensor) ...@@ -32,12 +32,108 @@ namespace nts { // namespace nts(NiuTrans.Tensor)
insert a dimension by copying the blocks n times (where n is the size of the inserted dimension) insert a dimension by copying the blocks n times (where n is the size of the inserted dimension)
>> s - pointer to the source data array >> s - pointer to the source data array
>> blockSize - size of a block >> blockSize - size of a block
>> totalSize - total size of the blocks (i.e., blockSize * n)
>> t - pointer to the target data array
>> n - number of blocks to copy data
*/
template<class T>
__global__
void KernelUnsqueezeFlat(void * s, int blockSize, int totalSize, void * t, int n)
{
/* index of data items */
int i = blockDim.x * blockIdx.x + threadIdx.x;
if (i >= blockSize)
return;
T value = ((T*)s)[i];
T * tData = (T*)t;
__syncthreads();
for (int k = i; k < totalSize; k += blockSize)
tData[k] = value;
}
/*
insert a dimension by copying the blocks n times (where n is the size of the inserted dimension)
>> s - pointer to the source data array
>> blockSize - size of a block
>> totalSize - total size of the blocks (i.e., blockSize * n)
>> t - pointer to the target data array
>> n - number of blocks to copy data
*/
template<class T>
__global__
void KernelUnsqueezeFlatBigram(void * s, int blockSize, int totalSize, void * t, int n)
{
/* index of data items */
int i = (blockDim.x * blockIdx.x + threadIdx.x) * 2;
if (i >= blockSize)
return;
T value = ((T*)s)[i];
T value2 = ((T*)s)[i + 1];
T * tData = (T*)t;
__syncthreads();
for (int k = i; k < totalSize; k += blockSize){
tData[k] = value;
tData[k + 1] = value2;
}
}
/*
insert a dimension by copying the blocks n times (where n is the size of the inserted dimension)
>> s - pointer to the source data array
>> blockSize - size of a block
>> totalSize - total size of the blocks (i.e., blockSize * n)
>> t - pointer to the target data array
>> n - number of blocks to copy data
*/
template<class T>
__global__
void KernelUnsqueezeFlat2D(void * s, int blockSize, int totalSize, void * t, int n)
{
__shared__ T data[MAX_CUDA_THREAD_NUM_PER_BLOCK];
__shared__ int offsets[MAX_CUDA_THREAD_NUM_PER_BLOCK];
/* index of data items */
int i = blockDim.x * blockIdx.x + threadIdx.x;
/* index of data items */
int j = blockDim.y * blockIdx.y + threadIdx.y;
if (i >= blockSize || j >= n)
return;
if(threadIdx.y == 0)
data[threadIdx.x] = ((T*)s)[i];
if(threadIdx.x == 0)
offsets[threadIdx.y] = blockSize * j;
__syncthreads();
((T*)t)[offsets[threadIdx.y] + i] = data[threadIdx.x];
}
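/*
Host-side reference (hypothetical) of what the "Flat" unsqueeze kernels above compute:
a single source block of blockSize elements is replicated n times in the target; the
kernels only differ in how threads are laid out over this loop nest.
*/
template<class T>
void UnsqueezeFlatReference(const T * s, int blockSize, T * t, int n)
{
    for (int i = 0; i < blockSize; i++)
        for (int k = 0; k < n; k++)
            t[k * blockSize + i] = s[i];
}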
/*
insert a dimension by copying the blocks n times (where n is the size of the inserted dimension)
>> s - pointer to the source data array
>> blockSize - size of a block
>> blockNum - number of the blocks >> blockNum - number of the blocks
>> totalSize - total size of the blocks (i.e., blockSize * n)
>> t - pointer to the target data array >> t - pointer to the target data array
>> n - number of blocks to copy data
*/ */
template<class T> template<class T>
__global__ __global__
void KernelUnsqueeze(void * s, int blockSize, int blockNum, void * t, int n) void KernelUnsqueeze(void * s, int blockSize, int blockNum, int totalSize, void * t, int n)
{ {
/* index of data items */ /* index of data items */
int i = blockDim.x * blockIdx.x + threadIdx.x; int i = blockDim.x * blockIdx.x + threadIdx.x;
...@@ -51,11 +147,10 @@ void KernelUnsqueeze(void * s, int blockSize, int blockNum, void * t, int n) ...@@ -51,11 +147,10 @@ void KernelUnsqueeze(void * s, int blockSize, int blockNum, void * t, int n)
MTYPE offset = blockSize * j; MTYPE offset = blockSize * j;
T value = ((T*)s)[offset + i]; T value = ((T*)s)[offset + i];
T * tData = (T*)t + offset * n; T * tData = (T*)t + offset * n;
int length = blockSize * n;
__syncthreads(); __syncthreads();
for (int k = i; k < length; k += blockSize) for (int k = i; k < totalSize; k += blockSize)
tData[k] = value; tData[k] = value;
} }
...@@ -83,21 +178,71 @@ void _CudaUnsqueeze(const XTensor * a, XTensor * b, int dim, int dSize) ...@@ -83,21 +178,71 @@ void _CudaUnsqueeze(const XTensor * a, XTensor * b, int dim, int dSize)
int cudaGrids[3]; int cudaGrids[3];
int cudaBlocks[3]; int cudaBlocks[3];
GDevs.GetCudaThread2D(a->devID, blockSize, blockNumA, MAX_INT, cudaGrids, cudaBlocks);
int devIDBackup = 0; int devIDBackup = 0;
ProtectCudaDev(a->devID, devIDBackup); ProtectCudaDev(a->devID, devIDBackup);
if (a->dataType == X_FLOAT && a->dataType == X_FLOAT) { if(blockNumA > 1){
KernelUnsqueeze<float> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> > GDevs.GetCudaThread2D(a->devID, blockSize, blockNumA, MAX_INT, cudaGrids, cudaBlocks);
(a->data, blockSize, blockNumA, b->data, dSize);
if (a->dataType == X_FLOAT && a->dataType == X_FLOAT) {
KernelUnsqueeze<float> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> >
(a->data, blockSize, blockNumA, blockSize * dSize, b->data, dSize);
}
else if (a->dataType == X_INT && a->dataType == X_INT) {
KernelUnsqueeze<int> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> >
(a->data, blockSize, blockNumA, blockSize * dSize, b->data, dSize);
}
else {
ShowNTErrors("TODO!");
}
}
else if(blockNumA == 1 && blockSize < MAX_CUDA_THREAD_NUM_PER_BLOCK){
GDevs.GetCudaThread2D(a->devID, blockSize, dSize, MAX_CUDA_THREAD_NUM_PER_BLOCK/4, cudaGrids, cudaBlocks);
if (a->dataType == X_FLOAT && a->dataType == X_FLOAT) {
KernelUnsqueezeFlat2D<float> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> >
(a->data, blockSize, blockSize * dSize, b->data, dSize);
}
else if (a->dataType == X_INT && a->dataType == X_INT) {
KernelUnsqueezeFlat2D<int> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> >
(a->data, blockSize, blockSize * dSize, b->data, dSize);
}
else {
ShowNTErrors("TODO!");
}
}
else if(blockNumA == 1 && blockSize % 2 == 0){
GDevs.GetCudaThread(a->devID, blockSize/2, cudaGrids, cudaBlocks);
if (a->dataType == X_FLOAT && a->dataType == X_FLOAT) {
KernelUnsqueezeFlatBigram<float> << <dim3(cudaGrids[0]), dim3(cudaBlocks[0]) >> >
(a->data, blockSize, blockSize * dSize, b->data, dSize);
}
else if (a->dataType == X_INT && a->dataType == X_INT) {
KernelUnsqueezeFlatBigram<int> << <dim3(cudaGrids[0]), dim3(cudaBlocks[0]) >> >
(a->data, blockSize, blockSize * dSize, b->data, dSize);
}
else {
ShowNTErrors("TODO!");
}
} }
else if (a->dataType == X_INT && a->dataType == X_INT) { else if(blockNumA == 1){
KernelUnsqueeze<int> << <dim3(cudaGrids[0], cudaGrids[1]), dim3(cudaBlocks[0], cudaBlocks[1]) >> > GDevs.GetCudaThread(a->devID, blockSize, cudaGrids, cudaBlocks);
(a->data, blockSize, blockNumA, b->data, dSize);
if (a->dataType == X_FLOAT && a->dataType == X_FLOAT) {
KernelUnsqueezeFlat<float> << <dim3(cudaGrids[0]), dim3(cudaBlocks[0]) >> >
(a->data, blockSize, blockSize * dSize, b->data, dSize);
}
else if (a->dataType == X_INT && a->dataType == X_INT) {
KernelUnsqueezeFlat<int> << <dim3(cudaGrids[0]), dim3(cudaBlocks[0]) >> >
(a->data, blockSize, blockSize * dSize, b->data, dSize);
}
else {
ShowNTErrors("TODO!");
}
} }
else { else{
ShowNTErrors("TODO!"); ShowNTErrors("Something is wrong!");
} }
BacktoCudaDev(a->devID, devIDBackup); BacktoCudaDev(a->devID, devIDBackup);
......
...@@ -117,7 +117,7 @@ void CudaGPUToCPUFlush(XTensor * tensor) ...@@ -117,7 +117,7 @@ void CudaGPUToCPUFlush(XTensor * tensor)
else { else {
tensor->dataHost = new char[tensor->unitNum * tensor->unitSize]; tensor->dataHost = new char[tensor->unitNum * tensor->unitSize];
if (tensor->data != NULL) if (tensor->data != NULL)
cudaMemcpy(tensor->dataHost, tensor->data, tensor->unitNum * tensor->unitSize, cudaMemcpyDeviceToHost); XMemCopy(tensor->dataHost, -1, tensor->data, tensor->devID, tensor->unitNum * tensor->unitSize);
else else
memset(tensor->dataHost, 0, tensor->unitNum * tensor->unitSize); memset(tensor->dataHost, 0, tensor->unitNum * tensor->unitSize);
} }
......
...@@ -38,6 +38,17 @@ log scale softmax y = log(e^x / \sum_{i} e^{x_i}) ...@@ -38,6 +38,17 @@ log scale softmax y = log(e^x / \sum_{i} e^{x_i})
*/ */
void _LogSoftmax(const XTensor * x, XTensor * y, int leadDim) void _LogSoftmax(const XTensor * x, XTensor * y, int leadDim)
{ {
CheckNTErrors(x && y, "Empty input tensors!");
CheckNTErrors(!x->isSparse && !y->isSparse, "TODO!");
if(leadDim < 0)
leadDim = x->order - 1;
if(y->dimSize[leadDim] == 1){
y->SetZeroAll();
return;
}
int leadDimRDI = x->order - leadDim - 1; int leadDimRDI = x->order - leadDim - 1;
if (!x->isSparse && !y->isSparse && if (!x->isSparse && !y->isSparse &&
x->dataType == DEFAULT_DTYPE && y->dataType == DEFAULT_DTYPE) x->dataType == DEFAULT_DTYPE && y->dataType == DEFAULT_DTYPE)
...@@ -68,25 +79,27 @@ void _LogSoftmax(const XTensor * x, XTensor * y, int leadDim) ...@@ -68,25 +79,27 @@ void _LogSoftmax(const XTensor * x, XTensor * y, int leadDim)
blockSize = stride * dimensionSize; blockSize = stride * dimensionSize;
blockNum = y->unitNum / blockSize; blockNum = y->unitNum / blockSize;
max = NewTensor(x->order - 1, dimSize, x->dataType, x->denseRatio, x->devID, mem); max = NewTensorBuf(x->order - 1, dimSize, x->dataType, x->denseRatio, x->devID, mem);
sum = NewTensor(x->order - 1, dimSize, x->dataType, x->denseRatio, x->devID, mem); sum = NewTensorBuf(x->order - 1, dimSize, x->dataType, x->denseRatio, x->devID, mem);
max->data = mem != NULL ? (char*)mem->AllocBuf(mem->devID, max->unitNum * max->unitSize) : XMemAlloc(max->devID, max->unitNum * max->unitSize);
sum->data = mem != NULL ? (char*)mem->AllocBuf(mem->devID, sum->unitNum * sum->unitSize) : XMemAlloc(sum->devID, sum->unitNum * sum->unitSize);
_ReduceMax(x, max, leadDim); _ReduceMax(x, max, leadDim);
_ReduceSum(x, sum, leadDim, max, 1.0F, true); _ReduceSum(x, sum, leadDim, max, 1.0F, true);
if (x->devID >= 0) { if (x->devID >= 0) {
int dims[2]; if(leadDimRDI == 0){
dims[0] = -stride; blockSize = y->unitNum;
dims[1] = dimensionSize; blockNum = 1;
blockx = NewTensor(2, dims, x->dataType, x->denseRatio, x->devID, mem); blockx = NewTensor2D(blockSize/dimensionSize, -dimensionSize, x->dataType, x->devID, mem);
blocky = NewTensor(2, dims, x->dataType, x->denseRatio, x->devID, mem); blocky = NewTensor2D(blockSize/dimensionSize, -dimensionSize, x->dataType, x->devID, mem);
dims[0] = -stride; blockMax = NewTensor2D(blockSize/dimensionSize, -1, x->dataType, x->devID, mem);
dims[1] = 1; blockSum = NewTensor2D(blockSize/dimensionSize, -1, x->dataType, x->devID, mem);
blockMax = NewTensor(2, dims, x->dataType, x->denseRatio, x->devID, mem); }
blockSum = NewTensor(2, dims, x->dataType, x->denseRatio, x->devID, mem); else{
blockx = NewTensor2D(-stride, dimensionSize, x->dataType, x->devID, mem);
blocky = NewTensor2D(-stride, dimensionSize, x->dataType, x->devID, mem);
blockMax = NewTensor2D(-stride, 1, x->dataType, x->devID, mem);
blockSum = NewTensor2D(-stride, 1, x->dataType, x->devID, mem);
}
} }
for (int k = 0; k < blockNum; k++) { for (int k = 0; k < blockNum; k++) {
...@@ -123,7 +136,10 @@ void _LogSoftmax(const XTensor * x, XTensor * y, int leadDim) ...@@ -123,7 +136,10 @@ void _LogSoftmax(const XTensor * x, XTensor * y, int leadDim)
blockMax->data = mp; blockMax->data = mp;
blockSum->data = sp; blockSum->data = sp;
#ifdef USE_CUDA #ifdef USE_CUDA
_CudaLogSoftmaxSumMax(blockx, blocky, leadDim, blockSum, blockMax); if(leadDimRDI == 0)
_CudaLogSoftmaxSumMax(blockx, blocky, 1, blockSum, blockMax);
else
_CudaLogSoftmaxSumMax(blockx, blocky, leadDim, blockSum, blockMax);
#else #else
ShowNTErrors("Please specify USE_CUDA and recompile the code!"); ShowNTErrors("Please specify USE_CUDA and recompile the code!");
#endif #endif
...@@ -135,18 +151,8 @@ void _LogSoftmax(const XTensor * x, XTensor * y, int leadDim) ...@@ -135,18 +151,8 @@ void _LogSoftmax(const XTensor * x, XTensor * y, int leadDim)
} }
if (x->devID < 0) { if (x->devID < 0) {
if (mem != NULL) { DelTensorBuf(max);
mem->ReleaseBuf(mem->devID, max->unitNum * max->unitSize); DelTensorBuf(sum);
mem->ReleaseBuf(mem->devID, sum->unitNum * sum->unitSize);
}
else {
XMemFree(max->devID, max->data);
XMemFree(sum->devID, sum->data);
max->data = NULL;
sum->data = NULL;
}
delete max;
delete sum;
} }
else { else {
delete blockx; delete blockx;
...@@ -184,6 +190,27 @@ XTensor LogSoftmax(const XTensor &x, int leadDim) ...@@ -184,6 +190,27 @@ XTensor LogSoftmax(const XTensor &x, int leadDim)
return y; return y;
} }
/*
log scale softmax y = log(e^x / \sum_{i} e^{x_i})
keep the result in the output tensor y (this overload does not create and return a new tensor)
>> x - input vector
>> y - output vector
>> leadDim - leading dimension (along which we perform reduction)
*/
void LogSoftmax(const XTensor &x, XTensor &y, int leadDim)
{
if(!XTensor::IsSameShaped(&x, &y))
InitTensor(&y, &x);
/* call _LogSoftmax function */
_LogSoftmax(&x, &y, leadDim);
/* tensor connection */
XLink::MakeLink(&x, NULL, &y, FUNC_LOGSOFTMAX);
XLink::AddParamToHeadInt(&y, leadDim);
}
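The overload above fills a caller-supplied tensor `y` instead of returning a new one, re-initializing `y` when its shape differs from `x`. A minimal usage sketch (illustrative only; it reuses calls that appear elsewhere in this commit, and the data values are made up):

```cpp
/* sketch: apply the new overload to a small (3, 2) tensor,
   normalizing along the last dimension (leadDim = 1) */
int order = 2;
int dimSize[2] = {3, 2};

XTensor * x = NewTensor(order, dimSize);
XTensor y;                 /* shaped inside LogSoftmax via InitTensor */

DTYPE xData[3][2] = { {0.0F, 1.0F},
                      {2.0F, 3.0F},
                      {4.0F, 5.0F} };
x->SetData(xData, 6);

LogSoftmax(*x, y, 1);      /* y now holds log(e^x / \sum_i e^{x_i}) row by row */

delete x;
```

Internally, `_LogSoftmax` appears to use the usual max-shifted form $y_i = (x_i - m) - \log \sum_j e^{x_j - m}$ with $m = \max_j x_j$, which is what the `_ReduceMax` and shifted `_ReduceSum` calls above compute.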
/* /*
backward computation for dense matrices with default data type backward computation for dense matrices with default data type
......
...@@ -33,6 +33,9 @@ void _LogSoftmax(const XTensor * x, XTensor * y, int leadDim); ...@@ -33,6 +33,9 @@ void _LogSoftmax(const XTensor * x, XTensor * y, int leadDim);
/* log scale softmax y = log(e^x / \sum_{i} e^{x_i}) (return a XTensor structure) */ /* log scale softmax y = log(e^x / \sum_{i} e^{x_i}) (return a XTensor structure) */
XTensor LogSoftmax(const XTensor &x, int leadDim); XTensor LogSoftmax(const XTensor &x, int leadDim);
/* log scale softmax y = log(e^x / \sum_{i} e^{x_i}) (with both x and y given as arguments) */ /* log scale softmax y = log(e^x / \sum_{i} e^{x_i}) (with both x and y given as arguments) */
void LogSoftmax(const XTensor &x, XTensor &y, int leadDim);
/* de/dx */ /* de/dx */
void _LogSoftmaxBackward(XTensor * gold, XTensor * y, XTensor * x, void _LogSoftmaxBackward(XTensor * gold, XTensor * y, XTensor * x,
XTensor * dedy, XTensor * dedx, XTensor * dedy, XTensor * dedx,
......
...@@ -24,7 +24,7 @@ ...@@ -24,7 +24,7 @@
#include "../XDevice.h" #include "../XDevice.h"
#include "../core/math/Power.h" #include "../core/math/Power.h"
#include "../core/math/ScaleAndShift.h" #include "../core/math/ScaleAndShift.h"
#include "../core/math/Log.h" #include "../core/math/Unary.h"
#include "../core/arithmetic/Negate.h" #include "../core/arithmetic/Negate.h"
#include "../core/arithmetic/Sum.h" #include "../core/arithmetic/Sum.h"
#include "../core/arithmetic/Multiply.h" #include "../core/arithmetic/Multiply.h"
......
...@@ -37,6 +37,9 @@ softmax y = e^x / \sum_{i} e^{x_i} ...@@ -37,6 +37,9 @@ softmax y = e^x / \sum_{i} e^{x_i}
*/ */
void _Softmax(const XTensor * x, XTensor * y, int leadDim) void _Softmax(const XTensor * x, XTensor * y, int leadDim)
{ {
if(leadDim < 0)
leadDim = x->order - 1;
int leadDimRDI = x->order - leadDim - 1; int leadDimRDI = x->order - leadDim - 1;
if(!x->isSparse && !y->isSparse && x->dataType == y->dataType){ if(!x->isSparse && !y->isSparse && x->dataType == y->dataType){
int * dimSize = new int[x->order - 1]; int * dimSize = new int[x->order - 1];
......
...@@ -19,6 +19,7 @@ ...@@ -19,6 +19,7 @@
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-12 * $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-12
*/ */
#include "../core/math/Unary.h"
#include "TAbsolute.h" #include "TAbsolute.h"
namespace nts { // namespace nts(NiuTrans.Tensor) namespace nts { // namespace nts(NiuTrans.Tensor)
...@@ -30,14 +31,14 @@ Set every entry to its absolute value. ...@@ -30,14 +31,14 @@ Set every entry to its absolute value.
bool TestAbsolute1() bool TestAbsolute1()
{ {
/* a tensor of size (3, 2) */ /* a tensor of size (3, 2) */
int aOrder = 2; int order = 2;
int * aDimSize = new int[aOrder]; int * dimSize = new int[order];
aDimSize[0] = 3; dimSize[0] = 3;
aDimSize[1] = 2; dimSize[1] = 2;
int aUnitNum = 1; int unitNum = 1;
for (int i = 0; i < aOrder; i++) for (int i = 0; i < order; i++)
aUnitNum *= aDimSize[i]; unitNum *= dimSize[i];
DTYPE aData[3][2] = { {1.0F, -2.0F}, DTYPE aData[3][2] = { {1.0F, -2.0F},
{0.5F, -4.0F}, {0.5F, -4.0F},
...@@ -50,14 +51,14 @@ bool TestAbsolute1() ...@@ -50,14 +51,14 @@ bool TestAbsolute1()
bool cpuTest = true; bool cpuTest = true;
/* create tensors */ /* create tensors */
XTensor * a = NewTensor(aOrder, aDimSize); XTensor * a = NewTensor(order, dimSize);
XTensor * b = NewTensor(aOrder, aDimSize); XTensor * b = NewTensor(order, dimSize);
XTensor * aMe = NewTensor(aOrder, aDimSize); XTensor * aMe = NewTensor(order, dimSize);
XTensor bUser; XTensor bUser;
/* initialize variables */ /* initialize variables */
a->SetData(aData, aUnitNum); a->SetData(aData, unitNum);
aMe->SetData(aData, aUnitNum); aMe->SetData(aData, unitNum);
/* call Absolute function */ /* call Absolute function */
_Absolute(a, b); _Absolute(a, b);
...@@ -65,21 +66,21 @@ bool TestAbsolute1() ...@@ -65,21 +66,21 @@ bool TestAbsolute1()
bUser = Absolute(*a); bUser = Absolute(*a);
/* check results */ /* check results */
cpuTest = b->CheckData(answer, aUnitNum, 1e-4F) && aMe->CheckData(answer, aUnitNum, 1e-4F) && bUser.CheckData(answer, aUnitNum, 1e-4F); cpuTest = b->CheckData(answer, unitNum, 1e-4F) && aMe->CheckData(answer, unitNum, 1e-4F) && bUser.CheckData(answer, unitNum, 1e-4F);
#ifdef USE_CUDA #ifdef USE_CUDA
/* GPU test */ /* GPU test */
bool gpuTest = true; bool gpuTest = true;
/* create tensor */ /* create tensor */
XTensor * aGPU = NewTensor(aOrder, aDimSize, X_FLOAT, 1.0F, 0); XTensor * aGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor * bGPU = NewTensor(aOrder, aDimSize, X_FLOAT, 1.0F, 0); XTensor * bGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor * aMeGPU = NewTensor(aOrder, aDimSize, X_FLOAT, 1.0F, 0); XTensor * aMeGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor bUserGPU; XTensor bUserGPU;
/* Initialize variables */ /* Initialize variables */
aGPU->SetData(aData, aUnitNum); aGPU->SetData(aData, unitNum);
aMeGPU->SetData(aData, aUnitNum); aMeGPU->SetData(aData, unitNum);
/* call Absolute function */ /* call Absolute function */
_Absolute(aGPU, bGPU); _Absolute(aGPU, bGPU);
...@@ -87,7 +88,7 @@ bool TestAbsolute1() ...@@ -87,7 +88,7 @@ bool TestAbsolute1()
bUserGPU = Absolute(*aGPU); bUserGPU = Absolute(*aGPU);
/* check results */ /* check results */
gpuTest = bGPU->CheckData(answer, aUnitNum, 1e-4F) && aMeGPU->CheckData(answer, aUnitNum, 1e-4F) && bUserGPU.CheckData(answer, aUnitNum, 1e-4F); gpuTest = bGPU->CheckData(answer, unitNum, 1e-4F) && aMeGPU->CheckData(answer, unitNum, 1e-4F) && bUserGPU.CheckData(answer, unitNum, 1e-4F);
/* destroy variables */ /* destroy variables */
delete a; delete a;
...@@ -96,7 +97,7 @@ bool TestAbsolute1() ...@@ -96,7 +97,7 @@ bool TestAbsolute1()
delete aGPU; delete aGPU;
delete bGPU; delete bGPU;
delete aMeGPU; delete aMeGPU;
delete[] aDimSize; delete[] dimSize;
return cpuTest && gpuTest; return cpuTest && gpuTest;
#else #else
...@@ -104,7 +105,7 @@ bool TestAbsolute1() ...@@ -104,7 +105,7 @@ bool TestAbsolute1()
delete a; delete a;
delete b; delete b;
delete aMe; delete aMe;
delete[] aDimSize; delete[] dimSize;
return cpuTest; return cpuTest;
#endif // USE_CUDA #endif // USE_CUDA
......
...@@ -22,7 +22,6 @@ ...@@ -22,7 +22,6 @@
#ifndef __TEST_ABSOLUTE_H__ #ifndef __TEST_ABSOLUTE_H__
#define __TEST_ABSOLUTE_H__ #define __TEST_ABSOLUTE_H__
#include "../core/arithmetic/Absolute.h"
namespace nts { // namespace nts(NiuTrans.Tensor) namespace nts { // namespace nts(NiuTrans.Tensor)
......
/* NiuTrans.Tensor - an open-source tensor library
* Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-31
*/
#include "../core/math/Unary.h"
#include "TCos.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
/*
case 1: test Cos function.
Set every entry to its cosine value.
*/
bool TestCos1()
{
/* a tensor of size (3, 2) */
int order = 2;
int * dimSize = new int[order];
dimSize[0] = 3;
dimSize[1] = 2;
int unitNum = 1;
for (int i = 0; i < order; i++)
unitNum *= dimSize[i];
DTYPE aData[3][2] = { {1.0F, 2.0F},
{-1.0F, -2.0F},
{0.0F, 0.5F} };
DTYPE answer[3][2] = { {0.5403F, -0.4161F},
{0.5403F, -0.4161F},
{1.0F, 0.8776F} };
/* CPU test */
bool cpuTest = true;
/* create tensors */
XTensor * a = NewTensor(order, dimSize);
XTensor * b = NewTensor(order, dimSize);
XTensor * aMe = NewTensor(order, dimSize);
XTensor bUser;
/* initialize variables */
a->SetData(aData, unitNum);
aMe->SetData(aData, unitNum);
/* call Cos function */
_Cos(a, b);
_CosMe(aMe);
bUser = Cos(*a);
/* check results */
cpuTest = b->CheckData(answer, unitNum, 1e-4F) && aMe->CheckData(answer, unitNum, 1e-4F) && bUser.CheckData(answer, unitNum, 1e-4F);
#ifdef USE_CUDA
/* GPU test */
bool gpuTest = true;
/* create tensor */
XTensor * aGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor * bGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor * aMeGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor bUserGPU;
/* Initialize variables */
aGPU->SetData(aData, unitNum);
aMeGPU->SetData(aData, unitNum);
/* call Cos function */
_Cos(aGPU, bGPU);
_CosMe(aMeGPU);
bUserGPU = Cos(*aGPU);
/* check results */
gpuTest = bGPU->CheckData(answer, unitNum, 1e-4F) && aMeGPU->CheckData(answer, unitNum, 1e-4F) && bUserGPU.CheckData(answer, unitNum, 1e-4F);
/* destroy variables */
delete a;
delete b;
delete aMe;
delete aGPU;
delete bGPU;
delete aMeGPU;
delete[] dimSize;
return cpuTest && gpuTest;
#else
/* destroy variables */
delete a;
delete b;
delete aMe;
delete[] dimSize;
return cpuTest;
#endif // USE_CUDA
}
/* other cases */
/*
TODO!!
*/
/* test for Cos Function */
bool TestCos()
{
XPRINT(0, stdout, "[TEST Cos] set every entry to its cosine value \n");
bool returnFlag = true, caseFlag = true;
/* case 1 test */
caseFlag = TestCos1();
if (!caseFlag) {
returnFlag = false;
XPRINT(0, stdout, ">> case 1 failed!\n");
}
else
XPRINT(0, stdout, ">> case 1 passed!\n");
/* other cases test */
/*
TODO!!
*/
if (returnFlag) {
XPRINT(0, stdout, ">> All Passed!\n");
}
else
XPRINT(0, stdout, ">> Failed!\n");
XPRINT(0, stdout, "\n");
return returnFlag;
}
} // namespace nts(NiuTrans.Tensor)
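The tests in this file (and the Sin/Exp/Log tests below) call three flavours of each unary operation: `_Cos(a, b)`, the in-place `_CosMe(a)`, and the returning `Cos(*a)`. According to the commit message these are now generated by a macro in `core/math/Unary.h`, which is not shown in this diff. A rough, purely illustrative sketch of such a macro for CPU float tensors (names, paths, and details are assumptions, not the repository's actual definition):

```cpp
#include <math.h>
#include "../../XTensor.h"   /* for XTensor and DTYPE (assumed path) */

/* illustrative only: generate _Op(a, b) and the in-place _OpMe(a)
   from a single definition; the real Unary.h may differ */
#define SIMPLE_UNARY_FUNCTION(funcName, origFunc)              \
void _##funcName(const XTensor * a, XTensor * b)               \
{                                                              \
    DTYPE * ad = (DTYPE*)a->data;                              \
    DTYPE * bd = (DTYPE*)b->data;                              \
    for (int i = 0; i < a->unitNum; i++)                       \
        bd[i] = (DTYPE)origFunc(ad[i]);                        \
}                                                              \
void _##funcName##Me(XTensor * a)                              \
{                                                              \
    _##funcName(a, a);                                         \
}

SIMPLE_UNARY_FUNCTION(Cos, cos)
SIMPLE_UNARY_FUNCTION(Sin, sin)
SIMPLE_UNARY_FUNCTION(Exp, exp)
```

The returning form used by the tests (`bUser = Cos(*a)`) would additionally create the result tensor and record the operation with `XLink::MakeLink`, as the `LogSoftmax` overload earlier in this diff does.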
/* NiuTrans.Tensor - an open-source tensor library
* Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-31
*/
#ifndef __TEST_SIN_H__
#define __TEST_SIN_H__
namespace nts { // namespace nts(NiuTrans.Tensor)
/* test for Sin Function */
bool TestSin();
} // namespace nts(NiuTrans.Tensor)
#endif // __TEST_SIN_H__
/* NiuTrans.Tensor - an open-source tensor library
* Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-31
*/
#ifndef __TEST_COS_H__
#define __TEST_COS_H__
namespace nts { // namespace nts(NiuTrans.Tensor)
/* test for Cos Function */
bool TestCos();
} // namespace nts(NiuTrans.Tensor)
#endif // __TEST_COS_H__
/* NiuTrans.Tensor - an open-source tensor library
* Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-08-01
*/
#include "TDiv.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
/*
case 1: element-wise division of two tensors
c(i) = a(i)/b(i) + \alpha * c(i)
In this case, (2, 2) (2, 2) -> (2, 2), leadingDim=0, alpha=0.
*/
bool TestDiv1()
{
/* a source tensor of size (2, 2) */
int sOrder1 = 2;
int * sDimSize1 = new int[sOrder1];
sDimSize1[0] = 2;
sDimSize1[1] = 2;
int sUnitNum1 = 1;
for (int i = 0; i < sOrder1; i++)
sUnitNum1 *= sDimSize1[i];
/* a source tensor of size (2, 2) */
int sOrder2 = 2;
int * sDimSize2 = new int[sOrder2];
sDimSize2[0] = 2;
sDimSize2[1] = 2;
int sUnitNum2 = 1;
for (int i = 0; i < sOrder2; i++)
sUnitNum2 *= sDimSize2[i];
/* a target tensor of size (2, 2) */
int tOrder = 2;
int * tDimSize = new int[tOrder];
tDimSize[0] = 2;
tDimSize[1] = 2;
int tUnitNum = 1;
for (int i = 0; i < tOrder; i++)
tUnitNum *= tDimSize[i];
DTYPE sData1[2][2] = { {0.0F, 1.0F},
{2.0F, 3.0F} };
DTYPE sData2[2][2] = { {1.0F, 1.0F},
{4.0F, 9.0F} };
DTYPE answer[2][2] = { {0.0F, 1.0F},
{0.5F, 0.3333F} };
/* CPU test */
bool cpuTest = true;
/* create tensors */
XTensor * s1 = NewTensor(sOrder1, sDimSize1);
XTensor * s2 = NewTensor(sOrder2, sDimSize2);
XTensor * t = NewTensor(tOrder, tDimSize);
XTensor * tMe = NewTensor(tOrder, tDimSize);
XTensor tUser;
/* initialize variables */
s1->SetData(sData1, sUnitNum1);
tMe->SetData(sData1, sUnitNum1);
s2->SetData(sData2, sUnitNum2);
t->SetZeroAll();
/* call Div function */
_Div(s1, s2, t, 0, 0);
_DivMe(tMe, s2, 0, 0);
tUser = Div(*s1, *s2, 0);
/* check results */
cpuTest = t->CheckData(answer, tUnitNum, 1e-4F) &&
tMe->CheckData(answer, tUnitNum, 1e-4F) &&
tUser.CheckData(answer, tUnitNum, 1e-4F);
#ifdef USE_CUDA
/* GPU test */
bool gpuTest = true;
/* create tensor */
XTensor * sGPU1 = NewTensor(sOrder1, sDimSize1, X_FLOAT, 1.0F, 0);
XTensor * sGPU2 = NewTensor(sOrder2, sDimSize2, X_FLOAT, 1.0F, 0);
XTensor * tGPU = NewTensor(tOrder, tDimSize, X_FLOAT, 1.0F, 0);
XTensor * tMeGPU = NewTensor(tOrder, tDimSize, X_FLOAT, 1.0F, 0);
XTensor tUserGPU;
/* Initialize variables */
sGPU1->SetData(sData1, sUnitNum1);
tMeGPU->SetData(sData1, sUnitNum1);
sGPU2->SetData(sData2, sUnitNum2);
tGPU->SetZeroAll();
/* call Div function */
_Div(sGPU1, sGPU2, tGPU, 0, 0);
_DivMe(tMeGPU, sGPU2, 0, 0);
tUserGPU = Div(*sGPU1, *sGPU2, 0);
/* check results */
gpuTest = tGPU->CheckData(answer, tUnitNum, 1e-4F) &&
tMeGPU->CheckData(answer, tUnitNum, 1e-4F) &&
tUserGPU.CheckData(answer, tUnitNum, 1e-4F);
/* destroy variables */
delete s1;
delete s2;
delete t;
delete tMe;
delete sGPU1;
delete sGPU2;
delete tGPU;
delete tMeGPU;
delete[] sDimSize1;
delete[] sDimSize2;
delete[] tDimSize;
return cpuTest && gpuTest;
#else
/* destroy variables */
delete s1;
delete s2;
delete t;
delete tMe;
delete[] sDimSize1;
delete[] sDimSize2;
delete[] tDimSize;
return cpuTest;
#endif // USE_CUDA
}
/* other cases */
/*
TODO!!
*/
/* test for Div Function */
bool TestDiv()
{
XPRINT(0, stdout, "[TEST Div] element-wise division of two tensors \n");
bool returnFlag = true, caseFlag = true;
/* case 1 test */
caseFlag = TestDiv1();
if (!caseFlag) {
returnFlag = false;
XPRINT(0, stdout, ">> case 1 failed!\n");
}
else
XPRINT(0, stdout, ">> case 1 passed!\n");
/* other cases test */
/*
TODO!!
*/
if (returnFlag) {
XPRINT(0, stdout, ">> All Passed!\n");
}
else
XPRINT(0, stdout, ">> Failed!\n");
XPRINT(0, stdout, "\n");
return returnFlag;
}
} // namespace nts(NiuTrans.Tensor)
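For reference, the expected values in case 1 follow directly from the element-wise rule $c(i) = a(i)/b(i) + \alpha \, c(i)$ with $\alpha = 0$:

$$
c = \left(\begin{matrix}0/1 & 1/1 \\ 2/4 & 3/9\end{matrix}\right) = \left(\begin{matrix}0 & 1 \\ 0.5 & 0.3333\end{matrix}\right)
$$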
...@@ -16,19 +16,20 @@ ...@@ -16,19 +16,20 @@
*/ */
/* /*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-06-15 * $Created by: Xu Chen (email: hello_master1954@163.com) 2018-08-01
*/ */
#ifndef __TEST_MATRIXMULBATCHEDCPU_H__ #ifndef __TEST_DIV_H__
#define __TEST_MATRIXMULBATCHEDCPU_H__ #define __TEST_DIV_H__
#include "../core/arithmetic/MatrixMULBatchedCPU.h" #include "../core/arithmetic/Div.h"
namespace nts { // namespace nts(NiuTrans.Tensor) namespace nts { // namespace nts(NiuTrans.Tensor)
/* test for MatrixMulBatchedCPU Function */ /* test for Div Function */
extern "C" extern "C"
bool TestMatrixMulBatchedCPU(); bool TestDiv();
} // namespace nts(NiuTrans.Tensor) } // namespace nts(NiuTrans.Tensor)
#endif // __TEST_MATRIXMULBATCHEDCPU_H__
#endif // __TEST_DIV_H__
/* NiuTrans.Tensor - an open-source tensor library
* Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-31
*/
#include "../core/math/Unary.h"
#include "TExp.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
/*
case 1: test Exp function.
Set every entry to its exponential value.
*/
bool TestExp1()
{
/* a tensor of size (3, 2) */
int order = 2;
int * dimSize = new int[order];
dimSize[0] = 3;
dimSize[1] = 2;
int unitNum = 1;
for (int i = 0; i < order; i++)
unitNum *= dimSize[i];
DTYPE aData[3][2] = { {1.0F, 2.0F},
{-1.0F, -2.0F},
{0.0F, 0.5F} };
DTYPE answer[3][2] = { {2.7183F, 7.3891F},
{0.3679F, 0.1353F},
{1.0F, 1.6487F} };
/* CPU test */
bool cpuTest = true;
/* create tensors */
XTensor * a = NewTensor(order, dimSize);
XTensor * b = NewTensor(order, dimSize);
XTensor * aMe = NewTensor(order, dimSize);
XTensor bUser;
/* initialize variables */
a->SetData(aData, unitNum);
aMe->SetData(aData, unitNum);
/* call Exp function */
_Exp(a, b);
_ExpMe(aMe);
bUser = Exp(*a);
/* check results */
cpuTest = b->CheckData(answer, unitNum, 1e-4F) && aMe->CheckData(answer, unitNum, 1e-4F) && bUser.CheckData(answer, unitNum, 1e-4F);
#ifdef USE_CUDA
/* GPU test */
bool gpuTest = true;
/* create tensor */
XTensor * aGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor * bGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor * aMeGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor bUserGPU;
/* Initialize variables */
aGPU->SetData(aData, unitNum);
aMeGPU->SetData(aData, unitNum);
/* call Exp function */
_Exp(aGPU, bGPU);
_ExpMe(aMeGPU);
bUserGPU = Exp(*aGPU);
/* check results */
gpuTest = bGPU->CheckData(answer, unitNum, 1e-4F) && aMeGPU->CheckData(answer, unitNum, 1e-4F) && bUserGPU.CheckData(answer, unitNum, 1e-4F);
/* destroy variables */
delete a;
delete b;
delete aMe;
delete aGPU;
delete bGPU;
delete aMeGPU;
delete[] dimSize;
return cpuTest && gpuTest;
#else
/* destroy variables */
delete a;
delete b;
delete aMe;
delete[] dimSize;
return cpuTest;
#endif // USE_CUDA
}
/* other cases */
/*
TODO!!
*/
/* test for Exp Function */
bool TestExp()
{
XPRINT(0, stdout, "[TEST Exp] set every entry to its exponent value \n");
bool returnFlag = true, caseFlag = true;
/* case 1 test */
caseFlag = TestExp1();
if (!caseFlag) {
returnFlag = false;
XPRINT(0, stdout, ">> case 1 failed!\n");
}
else
XPRINT(0, stdout, ">> case 1 passed!\n");
/* other cases test */
/*
TODO!!
*/
if (returnFlag) {
XPRINT(0, stdout, ">> All Passed!\n");
}
else
XPRINT(0, stdout, ">> Failed!\n");
XPRINT(0, stdout, "\n");
return returnFlag;
}
} // namespace nts(NiuTrans.Tensor)
...@@ -16,20 +16,16 @@ ...@@ -16,20 +16,16 @@
*/ */
/* /*
* $Created by: XIAO Tong (email: xiaotong@mail.neu.edu.cn) 2018-04-24 * $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-31
*/ */
#ifndef __MATRIXMULBATCHEDCPU_H__ #ifndef __TEST_EXP_H__
#define __MATRIXMULBATCHEDCPU_H__ #define __TEST_EXP_H__
#include "../../XTensor.h"
namespace nts { // namespace nts(NiuTrans.Tensor) namespace nts { // namespace nts(NiuTrans.Tensor)
/* matrix multiplication in batch mode (CPU code) */ /* test for Exp Function */
void _MatrixMULBatchedCPU(const XList * a, MATRIX_TRANS_TYPE transposedA, const XList * b, MATRIX_TRANS_TYPE transposedB, bool TestExp();
XList * c, DTYPE alpha = (DTYPE)1.0, DTYPE beta = 0);
} // namespace nts(NiuTrans.Tensor) } // namespace nts(NiuTrans.Tensor)
#endif // __TEST_EXP_H__
#endif // __MATRIXMULBATCHEDCPU_H__
\ No newline at end of file
...@@ -19,6 +19,7 @@ ...@@ -19,6 +19,7 @@
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-12 * $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-12
*/ */
#include "../core/math/Unary.h"
#include "TLog.h" #include "TLog.h"
namespace nts { // namespace nts(NiuTrans.Tensor) namespace nts { // namespace nts(NiuTrans.Tensor)
...@@ -30,14 +31,14 @@ Set every entry to its log value. ...@@ -30,14 +31,14 @@ Set every entry to its log value.
bool TestLog1() bool TestLog1()
{ {
/* a tensor of size (3, 2) */ /* a tensor of size (3, 2) */
int aOrder = 2; int order = 2;
int * aDimSize = new int[aOrder]; int * dimSize = new int[order];
aDimSize[0] = 3; dimSize[0] = 3;
aDimSize[1] = 2; dimSize[1] = 2;
int aUnitNum = 1; int unitNum = 1;
for (int i = 0; i < aOrder; i++) for (int i = 0; i < order; i++)
aUnitNum *= aDimSize[i]; unitNum *= dimSize[i];
DTYPE aData[3][2] = { {1.0F, 2.0F}, DTYPE aData[3][2] = { {1.0F, 2.0F},
{0.5F, 4.0F}, {0.5F, 4.0F},
...@@ -50,14 +51,14 @@ bool TestLog1() ...@@ -50,14 +51,14 @@ bool TestLog1()
bool cpuTest = true; bool cpuTest = true;
/* create tensors */ /* create tensors */
XTensor * a = NewTensor(aOrder, aDimSize); XTensor * a = NewTensor(order, dimSize);
XTensor * b = NewTensor(aOrder, aDimSize); XTensor * b = NewTensor(order, dimSize);
XTensor * aMe = NewTensor(aOrder, aDimSize); XTensor * aMe = NewTensor(order, dimSize);
XTensor bUser; XTensor bUser;
/* initialize variables */ /* initialize variables */
a->SetData(aData, aUnitNum); a->SetData(aData, unitNum);
aMe->SetData(aData, aUnitNum); aMe->SetData(aData, unitNum);
/* call Log function */ /* call Log function */
_Log(a, b); _Log(a, b);
...@@ -65,21 +66,21 @@ bool TestLog1() ...@@ -65,21 +66,21 @@ bool TestLog1()
bUser = Log(*a); bUser = Log(*a);
/* check results */ /* check results */
cpuTest = b->CheckData(answer, aUnitNum, 1e-4F) && aMe->CheckData(answer, aUnitNum, 1e-4F) && bUser.CheckData(answer, aUnitNum, 1e-4F); cpuTest = b->CheckData(answer, unitNum, 1e-4F) && aMe->CheckData(answer, unitNum, 1e-4F) && bUser.CheckData(answer, unitNum, 1e-4F);
#ifdef USE_CUDA #ifdef USE_CUDA
/* GPU test */ /* GPU test */
bool gpuTest = true; bool gpuTest = true;
/* create tensor */ /* create tensor */
XTensor * aGPU = NewTensor(aOrder, aDimSize, X_FLOAT, 1.0F, 0); XTensor * aGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor * bGPU = NewTensor(aOrder, aDimSize, X_FLOAT, 1.0F, 0); XTensor * bGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor * aMeGPU = NewTensor(aOrder, aDimSize, X_FLOAT, 1.0F, 0); XTensor * aMeGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor bUserGPU; XTensor bUserGPU;
/* Initialize variables */ /* Initialize variables */
aGPU->SetData(aData, aUnitNum); aGPU->SetData(aData, unitNum);
aMeGPU->SetData(aData, aUnitNum); aMeGPU->SetData(aData, unitNum);
/* call Log function */ /* call Log function */
_Log(aGPU, bGPU); _Log(aGPU, bGPU);
...@@ -87,7 +88,7 @@ bool TestLog1() ...@@ -87,7 +88,7 @@ bool TestLog1()
bUserGPU = Log(*aGPU); bUserGPU = Log(*aGPU);
/* check results */ /* check results */
gpuTest = bGPU->CheckData(answer, aUnitNum, 1e-4F) && aMeGPU->CheckData(answer, aUnitNum, 1e-4F) && bUserGPU.CheckData(answer, aUnitNum, 1e-4F); gpuTest = bGPU->CheckData(answer, unitNum, 1e-4F) && aMeGPU->CheckData(answer, unitNum, 1e-4F) && bUserGPU.CheckData(answer, unitNum, 1e-4F);
/* destroy variables */ /* destroy variables */
delete a; delete a;
...@@ -96,7 +97,7 @@ bool TestLog1() ...@@ -96,7 +97,7 @@ bool TestLog1()
delete aGPU; delete aGPU;
delete bGPU; delete bGPU;
delete aMeGPU; delete aMeGPU;
delete[] aDimSize; delete[] dimSize;
return cpuTest && gpuTest; return cpuTest && gpuTest;
#else #else
...@@ -104,7 +105,7 @@ bool TestLog1() ...@@ -104,7 +105,7 @@ bool TestLog1()
delete a; delete a;
delete b; delete b;
delete aMe; delete aMe;
delete[] aDimSize; delete[] dimSize;
return cpuTest; return cpuTest;
#endif // USE_CUDA #endif // USE_CUDA
......
...@@ -22,8 +22,6 @@ ...@@ -22,8 +22,6 @@
#ifndef __TEST_LOG_H__ #ifndef __TEST_LOG_H__
#define __TEST_LOG_H__ #define __TEST_LOG_H__
#include "../core/math/Log.h"
namespace nts { // namespace nts(NiuTrans.Tensor) namespace nts { // namespace nts(NiuTrans.Tensor)
/* test for Log Function */ /* test for Log Function */
......
...@@ -16,8 +16,8 @@ ...@@ -16,8 +16,8 @@
*/ */
/* /*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-02 * $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-02
*/ */
#ifndef __TEST_LOGSOFTMAX_H__ #ifndef __TEST_LOGSOFTMAX_H__
#define __TEST_LOGSOFTMAX_H__ #define __TEST_LOGSOFTMAX_H__
......
/* NiuTrans.Tensor - an open-source tensor library
* Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-06-15
*/
#include "TMatrixMULBatchedCPU.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
/*
case 1: matrix multiplication in batch mode (CPU code).
In this case, aList=2*(2, 3), bList=2*(3, 2) -> c=2*(2, 2), transposedA=X_NOTRANS, transposedB=X_NOTRANS.
*/
bool TestMatrixMulBatchedCPU1()
{
/* create list */
XList * aList = new XList();
XList * bList = new XList();
XList * cList = new XList();
/* a source tensor of size (2, 3) */
int aOrder = 2;
int * aDimSize = new int[aOrder];
aDimSize[0] = 2;
aDimSize[1] = 3;
int aUnitNum = 1;
for (int i = 0; i < aOrder; i++)
aUnitNum *= aDimSize[i];
/* a source tensor of size (3, 2) */
int bOrder = 2;
int * bDimSize = new int[bOrder];
bDimSize[0] = 3;
bDimSize[1] = 2;
int bUnitNum = 1;
for (int i = 0; i < bOrder; i++)
bUnitNum *= bDimSize[i];
/* a target tensor of size (2, 2) */
int cOrder = 2;
int * cDimSize = new int[cOrder];
cDimSize[0] = 2;
cDimSize[1] = 2;
int cUnitNum = 1;
for (int i = 0; i < cOrder; i++)
cUnitNum *= cDimSize[i];
DTYPE aData1[2][3] = { {1.0F, 2.0F, 3.0F},
{-4.0F, 5.0F, 6.0F} };
DTYPE aData2[2][3] = { {1.0F, -2.0F, -3.0F},
{-4.0F, 3.0F, 2.0F} };
DTYPE bData1[3][2] = { {0.0F, -1.0F},
{1.0F, 2.0F},
{2.0F, 1.0F} };
DTYPE bData2[3][2] = { {0.0F, 1.0F},
{3.0F, 2.0F},
{2.0F, 1.0F} };
DTYPE answer1[2][2] = { {8.0F, 6.0F},
{17.0F, 20.0F} };
DTYPE answer2[2][2] = { {-12.0F, -6.0F},
{13.0F, 4.0F} };
/* CPU test */
bool cpuTest = true;
/* create tensors */
XTensor * a1 = NewTensor(aOrder, aDimSize);
XTensor * a2 = NewTensor(aOrder, aDimSize);
XTensor * b1 = NewTensor(bOrder, bDimSize);
XTensor * b2 = NewTensor(bOrder, bDimSize);
XTensor * c1 = NewTensor(cOrder, cDimSize);
XTensor * c2 = NewTensor(cOrder, cDimSize);
/* initialize variables */
a1->SetData(aData1, aUnitNum);
a2->SetData(aData2, aUnitNum);
b1->SetData(bData1, aUnitNum);
b2->SetData(bData2, aUnitNum);
c1->SetZeroAll();
c2->SetZeroAll();
/* add tensors to list */
aList->Add(a1);
aList->Add(a2);
bList->Add(b1);
bList->Add(b2);
cList->Add(c1);
cList->Add(c2);
/* call MatrixMULBatchedCPU function */
_MatrixMULBatchedCPU(aList, X_NOTRANS, bList, X_NOTRANS, cList);
/* check results */
cpuTest = c1->CheckData(answer1, cUnitNum) && c2->CheckData(answer2, cUnitNum);
#ifdef USE_CUDA
/* GPU test */
bool gpuTest = true;
/* create tensors */
XTensor * aGPU1 = NewTensor(aOrder, aDimSize, X_FLOAT, 1.0F, 0);
XTensor * aGPU2 = NewTensor(aOrder, aDimSize, X_FLOAT, 1.0F, 0);
XTensor * bGPU1 = NewTensor(bOrder, bDimSize, X_FLOAT, 1.0F, 0);
XTensor * bGPU2 = NewTensor(bOrder, bDimSize, X_FLOAT, 1.0F, 0);
XTensor * cGPU1 = NewTensor(cOrder, cDimSize, X_FLOAT, 1.0F, 0);
XTensor * cGPU2 = NewTensor(cOrder, cDimSize, X_FLOAT, 1.0F, 0);
/* initialize variables */
aGPU1->SetData(aData1, aUnitNum);
aGPU2->SetData(aData2, aUnitNum);
bGPU1->SetData(bData1, aUnitNum);
bGPU2->SetData(bData2, aUnitNum);
cGPU1->SetZeroAll();
cGPU2->SetZeroAll();
/* clear list */
aList->Clear();
bList->Clear();
cList->Clear();
/* add tensors to list */
aList->Add(aGPU1);
aList->Add(aGPU2);
bList->Add(bGPU1);
bList->Add(bGPU2);
cList->Add(cGPU1);
cList->Add(cGPU2);
/* call MatrixMULBatchedCPU function */
_MatrixMULBatchedCPU(aList, X_NOTRANS, bList, X_NOTRANS, cList);
/* check results */
gpuTest = cGPU1->CheckData(answer1, cUnitNum) && gpuTest;
gpuTest = cGPU2->CheckData(answer2, cUnitNum) && gpuTest;
/* destroy variables */
delete a1;
delete a2;
delete b1;
delete b2;
delete c1;
delete c2;
delete aGPU1;
delete aGPU2;
delete bGPU1;
delete bGPU2;
delete cGPU1;
delete cGPU2;
delete[] aDimSize;
delete[] bDimSize;
delete[] cDimSize;
return cpuTest && gpuTest;
#else
/* destroy variables */
delete a1;
delete a2;
delete b1;
delete b2;
delete c1;
delete c2;
delete[] aDimSize;
delete[] bDimSize;
delete[] cDimSize;
return cpuTest;
#endif // USE_CUDA
}
/* other cases */
/*
TODO!!
*/
/* test for MatrixMulBatchedCPU Function */
extern "C"
bool TestMatrixMulBatchedCPU()
{
XPRINT(0, stdout, "[TEST MATRIXMULBATCHEDCPU] matrix multiplication in batch mode (CPU code) \n");
bool returnFlag = true, caseFlag = true;
/* case 1 test */
caseFlag = TestMatrixMulBatchedCPU1();
if (!caseFlag) {
returnFlag = false;
XPRINT(0, stdout, ">> case 1 failed!\n");
}
else
XPRINT(0, stdout, ">> case 1 passed!\n");
/* other cases test */
/*
TODO!!
*/
if (returnFlag) {
XPRINT(0, stdout, ">> All Passed!\n");
}
else
XPRINT(0, stdout, ">> Failed!\n");
XPRINT(0, stdout, "\n");
return returnFlag;
}
} // namespace nts(NiuTrans.Tensor)
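The expected values in this batched test are two independent $2 \times 3$ by $3 \times 2$ products computed pair by pair over the lists; written out for the first pair:

$$
c_1 = a_1 \times b_1 = \left(\begin{matrix}1 & 2 & 3 \\ -4 & 5 & 6\end{matrix}\right) \times \left(\begin{matrix}0 & -1 \\ 1 & 2 \\ 2 & 1\end{matrix}\right) = \left(\begin{matrix}8 & 6 \\ 17 & 20\end{matrix}\right)
$$

and `answer2` is obtained the same way from `aData2` and `bData2`.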
...@@ -25,133 +25,10 @@ namespace nts { // namespace nts(NiuTrans.Tensor) ...@@ -25,133 +25,10 @@ namespace nts { // namespace nts(NiuTrans.Tensor)
/* /*
case 1: element-wise product of two tensors case 1: element-wise product of two tensors
c(i) = a(i)*b(i) + \alpha * c(i)
In this case, (2, 1) (2, 1) -> (2, 1), leadingDim=0, alpha=0.
*/
bool TestMultiply1()
{
/* a source tensor of size (2, 1) */
int sOrder1 = 2;
int * sDimSize1 = new int[sOrder1];
sDimSize1[0] = 2;
sDimSize1[1] = 1;
int sUnitNum1 = 1;
for (int i = 0; i < sOrder1; i++)
sUnitNum1 *= sDimSize1[i];
/* a source tensor of size (2, 1) */
int sOrder2 = 2;
int * sDimSize2 = new int[sOrder2];
sDimSize2[0] = 2;
sDimSize2[1] = 1;
int sUnitNum2 = 1;
for (int i = 0; i < sOrder2; i++)
sUnitNum2 *= sDimSize2[i];
/* a target tensor of size (2, 1) */
int tOrder = 2;
int * tDimSize = new int[tOrder];
tDimSize[0] = 2;
tDimSize[1] = 1;
int tUnitNum = 1;
for (int i = 0; i < tOrder; i++)
tUnitNum *= tDimSize[i];
DTYPE sData1[2][1] = { {0.0F},
{1.0F} };
DTYPE sData2[2][1] = { {2.0F},
{3.0F} };
DTYPE answer[2][1] = { {0.0F},
{3.0F} };
/* CPU test */
bool cpuTest = true;
/* create tensors */
XTensor * s1 = NewTensor(sOrder1, sDimSize1);
XTensor * s2 = NewTensor(sOrder2, sDimSize2);
XTensor * t = NewTensor(tOrder, tDimSize);
XTensor * tMe = NewTensor(tOrder, tDimSize);
XTensor tUser;
/* initialize variables */
s1->SetData(sData1, sUnitNum1);
tMe->SetData(sData1, sUnitNum1);
s2->SetData(sData2, sUnitNum2);
t->SetZeroAll();
/* call Multiply function */
_Multiply(s1, s2, t, 0, 0);
_MultiplyMe(tMe, s2, 0, 0);
tUser = Multiply(*s1, *s2, 0);
/* check results */
cpuTest = t->CheckData(answer, tUnitNum)
&& tMe->CheckData(answer, tUnitNum) && tUser.CheckData(answer, tUnitNum);
#ifdef USE_CUDA
/* GPU test */
bool gpuTest = true;
/* create tensor */
XTensor * sGPU1 = NewTensor(sOrder1, sDimSize1, X_FLOAT, 1.0F, 0);
XTensor * sGPU2 = NewTensor(sOrder2, sDimSize2, X_FLOAT, 1.0F, 0);
XTensor * tGPU = NewTensor(tOrder, tDimSize, X_FLOAT, 1.0F, 0);
XTensor * tMeGPU = NewTensor(tOrder, tDimSize, X_FLOAT, 1.0F, 0);
XTensor tUserGPU;
/* Initialize variables */
sGPU1->SetData(sData1, sUnitNum1);
tMeGPU->SetData(sData1, sUnitNum1);
sGPU2->SetData(sData2, sUnitNum2);
tGPU->SetZeroAll();
/* call Multiply function */
_Multiply(sGPU1, sGPU2, tGPU, 0, 0);
_MultiplyMe(tMeGPU, sGPU2, 0, 0);
tUserGPU = Multiply(*sGPU1, *sGPU2, 0);
/* check results */
gpuTest = tGPU->CheckData(answer, tUnitNum)
&& tMeGPU->CheckData(answer, tUnitNum) && tUserGPU.CheckData(answer, tUnitNum);
/* destroy variables */
delete s1;
delete s2;
delete t;
delete tMe;
delete sGPU1;
delete sGPU2;
delete tGPU;
delete tMeGPU;
delete[] sDimSize1;
delete[] sDimSize2;
delete[] tDimSize;
return cpuTest && gpuTest;
#else
/* destroy variables */
delete s1;
delete s2;
delete t;
delete tMe;
delete[] sDimSize1;
delete[] sDimSize2;
delete[] tDimSize;
return cpuTest;
#endif // USE_CUDA
}
/*
case 2: element-wise product of two tensors
c(i) = a(i)*b(i) + \alpha * c(i) c(i) = a(i)*b(i) + \alpha * c(i)
In this case, (2, 2) (2, 2) -> (2, 2), leadingDim=0, alpha=0. In this case, (2, 2) (2, 2) -> (2, 2), leadingDim=0, alpha=0.
*/ */
bool TestMultiply2() bool TestMultiply1()
{ {
/* a source tensor of size (2, 2) */ /* a source tensor of size (2, 2) */
int sOrder1 = 2; int sOrder1 = 2;
...@@ -212,8 +89,9 @@ bool TestMultiply2() ...@@ -212,8 +89,9 @@ bool TestMultiply2()
tUser = Multiply(*s1, *s2, 0); tUser = Multiply(*s1, *s2, 0);
/* check results */ /* check results */
cpuTest = t->CheckData(answer, tUnitNum) cpuTest = t->CheckData(answer, tUnitNum) &&
&& tMe->CheckData(answer, tUnitNum) && tUser.CheckData(answer, tUnitNum); tMe->CheckData(answer, tUnitNum) &&
tUser.CheckData(answer, tUnitNum);
#ifdef USE_CUDA #ifdef USE_CUDA
/* GPU test */ /* GPU test */
...@@ -270,113 +148,6 @@ bool TestMultiply2() ...@@ -270,113 +148,6 @@ bool TestMultiply2()
#endif // USE_CUDA #endif // USE_CUDA
} }
/*
case 3: element-wise product of two tensors, c(i) = a(i)*b(i) + \alpha * c(i)
In this case, (2, 2) (2, 2) -> (2, 2), leadingDim=1, alpha=0.
*/
bool TestMultiply3()
{
/* a source tensor of size (2, 2) */
int sOrder1 = 2;
int * sDimSize1 = new int[sOrder1];
sDimSize1[0] = 2;
sDimSize1[1] = 2;
int sUnitNum1 = 1;
for (int i = 0; i < sOrder1; i++)
sUnitNum1 *= sDimSize1[i];
/* a source tensor of size (2, 2) */
int sOrder2 = 2;
int * sDimSize2 = new int[sOrder2];
sDimSize2[0] = 2;
sDimSize2[1] = 2;
int sUnitNum2 = 1;
for (int i = 0; i < sOrder2; i++)
sUnitNum2 *= sDimSize2[i];
/* a target tensor of size (2, 2) */
int tOrder = 2;
int * tDimSize = new int[tOrder];
tDimSize[0] = 2;
tDimSize[1] = 2;
int tUnitNum = 1;
for (int i = 0; i < tOrder; i++)
tUnitNum *= tDimSize[i];
DTYPE sData1[2][2] = { {0.0F, 1.0F},
{2.0F, 3.0F} };
DTYPE sData2[2][2] = { {0.0F, 1.0F},
{2.0F, 3.0F} };
DTYPE answer[2][2] = { {0.0F, 1.0F},
{4.0F, 9.0F} };
/* CPU test */
bool cpuTest = true;
/* create tensors */
XTensor * s1 = NewTensor(sOrder1, sDimSize1);
XTensor * s2 = NewTensor(sOrder2, sDimSize2);
XTensor * t = NewTensor(tOrder, tDimSize);
/* initialize variables */
s1->SetData(sData1, sUnitNum1);
s2->SetData(sData2, sUnitNum2);
t->SetZeroAll();
/* call MultiplyElementWise function */
_Multiply(s1, s2, t, 0, 1);
/* check results */
cpuTest = t->CheckData(answer, tUnitNum);
#ifdef USE_CUDA
/* GPU test */
bool gpuTest = true;
/* create tensor */
XTensor * sGPU1 = NewTensor(sOrder1, sDimSize1, X_FLOAT, 1.0F, 0);
XTensor * sGPU2 = NewTensor(sOrder2, sDimSize2, X_FLOAT, 1.0F, 0);
XTensor * tGPU = NewTensor(tOrder, tDimSize, X_FLOAT, 1.0F, 0);
/* Initialize variables */
sGPU1->SetData(sData1, sUnitNum1);
sGPU2->SetData(sData2, sUnitNum2);
tGPU->SetZeroAll();
/* call MultiplyElementWise function */
_Multiply(sGPU1, sGPU2, tGPU, 0, 1);
/* check results */
gpuTest = tGPU->CheckData(answer, tUnitNum);
/* destroy variables */
delete s1;
delete s2;
delete t;
delete sGPU1;
delete sGPU2;
delete tGPU;
delete[] sDimSize1;
delete[] sDimSize2;
delete[] tDimSize;
return cpuTest && gpuTest;
#else
/* destroy variables */
delete s1;
delete s2;
delete t;
delete[] sDimSize1;
delete[] sDimSize2;
delete[] tDimSize;
return cpuTest;
#endif // USE_CUDA
}
/* other cases */ /* other cases */
/* /*
TODO!! TODO!!
...@@ -398,26 +169,6 @@ bool TestMultiply() ...@@ -398,26 +169,6 @@ bool TestMultiply()
else else
XPRINT(0, stdout, ">> case 1 passed!\n"); XPRINT(0, stdout, ">> case 1 passed!\n");
/* case 2 test */
caseFlag = TestMultiply2();
if (!caseFlag) {
returnFlag = false;
XPRINT(0, stdout, ">> case 2 failed!\n");
}
else
XPRINT(0, stdout, ">> case 2 passed!\n");
/* case 3 test */
caseFlag = TestMultiply3();
if (!caseFlag) {
returnFlag = false;
XPRINT(0, stdout, ">> case 3 failed!\n");
}
else
XPRINT(0, stdout, ">> case 3 passed!\n");
/* other cases test */ /* other cases test */
/* /*
TODO!! TODO!!
......
...@@ -19,16 +19,17 @@ ...@@ -19,16 +19,17 @@
* $Created by: Lin Ye (email: linye2015@outlook.com) 2018-06-15 * $Created by: Lin Ye (email: linye2015@outlook.com) 2018-06-15
*/ */
#ifndef __TEST_MULTIPLYELEMENTWISE_H__ #ifndef __TEST_MULTIPLY_H__
#define __TEST_MULTIPLYELEMENTWISE_H__ #define __TEST_MULTIPLY_H__
#include "../core/arithmetic/Multiply.h" #include "../core/arithmetic/Multiply.h"
namespace nts { // namespace nts(NiuTrans.Tensor) namespace nts { // namespace nts(NiuTrans.Tensor)
/* test for MultiplyElementWise Function */ /* test for Multiply Function */
extern "C" extern "C"
bool TestMultiply(); bool TestMultiply();
} // namespace nts(NiuTrans.Tensor) } // namespace nts(NiuTrans.Tensor)
#endif // __TEST_MULTIPLYELEMENTWISE_H__
#endif // __TEST_MULTIPLY_H__
/* NiuTrans.Tensor - an open-source tensor library
* Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-31
*/
#include "../core/math/Unary.h"
#include "TSin.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
/*
case 1: test Sin function.
Set every entry to its sine value.
*/
bool TestSin1()
{
/* a tensor of size (3, 2) */
int order = 2;
int * dimSize = new int[order];
dimSize[0] = 3;
dimSize[1] = 2;
int unitNum = 1;
for (int i = 0; i < order; i++)
unitNum *= dimSize[i];
DTYPE aData[3][2] = { {1.0F, 2.0F},
{-1.0F, -2.0F},
{0.0F, 0.5F} };
DTYPE answer[3][2] = { {0.8415F, 0.9093F},
{-0.8415F, -0.9093F},
{0.0F, 0.4794F} };
/* CPU test */
bool cpuTest = true;
/* create tensors */
XTensor * a = NewTensor(order, dimSize);
XTensor * b = NewTensor(order, dimSize);
XTensor * aMe = NewTensor(order, dimSize);
XTensor bUser;
/* initialize variables */
a->SetData(aData, unitNum);
aMe->SetData(aData, unitNum);
/* call Sin function */
_Sin(a, b);
_SinMe(aMe);
bUser = Sin(*a);
/* check results */
cpuTest = b->CheckData(answer, unitNum, 1e-4F) && aMe->CheckData(answer, unitNum, 1e-4F) && bUser.CheckData(answer, unitNum, 1e-4F);
#ifdef USE_CUDA
/* GPU test */
bool gpuTest = true;
/* create tensor */
XTensor * aGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor * bGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor * aMeGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor bUserGPU;
/* Initialize variables */
aGPU->SetData(aData, unitNum);
aMeGPU->SetData(aData, unitNum);
/* call Sin function */
_Sin(aGPU, bGPU);
_SinMe(aMeGPU);
bUserGPU = Sin(*aGPU);
/* check results */
gpuTest = bGPU->CheckData(answer, unitNum, 1e-4F) && aMeGPU->CheckData(answer, unitNum, 1e-4F) && bUserGPU.CheckData(answer, unitNum, 1e-4F);
/* destroy variables */
delete a;
delete b;
delete aMe;
delete aGPU;
delete bGPU;
delete aMeGPU;
delete[] dimSize;
return cpuTest && gpuTest;
#else
/* destroy variables */
delete a;
delete b;
delete aMe;
delete[] dimSize;
return cpuTest;
#endif // USE_CUDA
}
/* other cases */
/*
TODO!!
*/
/* test for Sin Function */
bool TestSin()
{
XPRINT(0, stdout, "[TEST Sin] set every entry to its sine value \n");
bool returnFlag = true, caseFlag = true;
/* case 1 test */
caseFlag = TestSin1();
if (!caseFlag) {
returnFlag = false;
XPRINT(0, stdout, ">> case 1 failed!\n");
}
else
XPRINT(0, stdout, ">> case 1 passed!\n");
/* other cases test */
/*
TODO!!
*/
if (returnFlag) {
XPRINT(0, stdout, ">> All Passed!\n");
}
else
XPRINT(0, stdout, ">> Failed!\n");
XPRINT(0, stdout, "\n");
return returnFlag;
}
} // namespace nts(NiuTrans.Tensor)
...@@ -16,26 +16,16 @@ ...@@ -16,26 +16,16 @@
*/ */
/* /*
* $Created by: LI Yinqiao (li.yin.qiao.2012@hotmail.com) 2018-7-11 * $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-31
*/ */
#include "Absolute.h" #ifndef __TEST_SIN_H__
#define __TEST_SIN_H__
namespace nts { // namespace nts(NiuTrans.Tensor) namespace nts { // namespace nts(NiuTrans.Tensor)
#ifdef USE_CUDA /* test for Sin Function */
bool TestSin();
/* set each entry to its absolute value (CUDA Kernel) */ } // namespace nts(NiuTrans.Tensor)
__global__ #endif // __TEST_SIN_H__
void KernelAbsolute(DTYPE * a, DTYPE * b, int size);
/* set each entry to its absolute value (CUDA Kernel) with float16 data type*/
__global__
void KernelAbsolute(__half * a, __half * b, int size);
/* set each entry to its absolute value */
void _CudaAbsolute(const XTensor * a, XTensor * b);
#endif // USE_CUDA
} // namespace nts(NiuTrans.Tensor)
\ No newline at end of file
/* NiuTrans.Tensor - an open-source tensor library
* Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-08-01
*/
#include "TSub.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
/* case 1: tensor subtraction c = a - b * \beta */
bool TestSub1()
{
/* a tensor of size (2, 4) */
int order = 2;
int * dimSize = new int[order];
dimSize[0] = 2;
dimSize[1] = 4;
int unitNum = 1;
for (int i = 0; i < order; i++)
unitNum *= dimSize[i];
DTYPE aData[2][4] = { {0.0F, 1.0F, 2.0F, 3.0F},
{4.0F, 5.0F, 6.0F, 7.0F} };
DTYPE bData[2][4] = { {1.0F, -1.0F, -3.0F, -5.0F},
{-7.0F, -9.0F, -11.0F, -13.0F} };
DTYPE answer[2][4] = { {-1.0F, 2.0F, 5.0F, 8.0F},
{11.0F, 14.0F, 17.0F, 20.0F} };
/* CPU test */
bool cpuTest = true;
/* create tensors */
XTensor * a = NewTensor(order, dimSize);
XTensor * b = NewTensor(order, dimSize);
XTensor * c = NewTensor(order, dimSize);
XTensor * cMe = NewTensor(order, dimSize);
XTensor cUser;
/* initialize variables */
a->SetData(aData, unitNum);
cMe->SetData(aData, unitNum);
b->SetData(bData, unitNum);
c->SetZeroAll();
/* call Sub function */
_Sub(a, b, c);
_SubMe(cMe, b);
cUser = Sub(*a, *b);
/* check results */
cpuTest = c->CheckData(answer, unitNum)
&& cMe->CheckData(answer, unitNum) && cUser.CheckData(answer, unitNum);
#ifdef USE_CUDA
/* GPU test */
bool gpuTest = true;
/* create tensor */
XTensor * aGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor * bGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor * cGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor * cMeGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor cUserGPU;
/* Initialize variables */
aGPU->SetData(aData, unitNum);
cMeGPU->SetData(aData, unitNum);
bGPU->SetData(bData, unitNum);
cGPU->SetZeroAll();
/* call Sub function */
_Sub(aGPU, bGPU, cGPU);
_SubMe(cMeGPU, bGPU);
cUserGPU = Sub(*aGPU, *bGPU);
/* check results */
gpuTest = cGPU->CheckData(answer, unitNum, 1e-4F)
&& cMeGPU->CheckData(answer, unitNum, 1e-4F) && cUserGPU.CheckData(answer, unitNum, 1e-4F);
/* destroy variables */
delete a;
delete b;
delete c;
delete cMe;
delete aGPU;
delete bGPU;
delete cGPU;
delete cMeGPU;
delete[] dimSize;
return cpuTest && gpuTest;
#else
/* destroy variables */
delete a;
delete b;
delete c;
delete cMe;
delete[] dimSize;
return cpuTest;
#endif // USE_CUDA
}
/* case 2: tensor subtraction c = a - b * \beta */
bool TestSub2()
{
/* a tensor of size (2, 4) */
int order = 2;
int * dimSize = new int[order];
dimSize[0] = 2;
dimSize[1] = 4;
int unitNum = 1;
for (int i = 0; i < order; i++) {
unitNum *= dimSize[i];
}
DTYPE aData[2][4] = { {0.0F, 1.0F, 2.0F, 3.0F},
{4.0F, 5.0F, 6.0F, 7.0F} };
DTYPE bData[2][4] = { {1.0F, -1.0F, -3.0F, -5.0F},
{-7.0F, -9.0F, -11.0F, -13.0F} };
DTYPE answer[2][4] = { {-0.5F, 1.5F, 3.5F, 5.5F},
{7.5F, 9.5F, 11.5F, 13.5F} };
float beta = 0.5F;
/* CPU test */
bool cpuTest = true;
/* create tensor */
XTensor * a = NewTensor(order, dimSize);
XTensor * b = NewTensor(order, dimSize);
XTensor * c = NewTensor(order, dimSize);
XTensor * cMe = NewTensor(order, dimSize);
XTensor cUser;
/* initialize variables */
a->SetData(aData, unitNum);
cMe->SetData(aData, unitNum);
b->SetData(bData, unitNum);
c->SetZeroAll();
/* call Sub function */
_Sub(a, b, c, beta);
_SubMe(cMe, b, beta);
cUser = Sub(*a, *b, beta);
/* check results */
cpuTest = c->CheckData(answer, unitNum)
&& cMe->CheckData(answer, unitNum) && cUser.CheckData(answer, unitNum);
#ifdef USE_CUDA
/* GPU test */
bool gpuTest = true;
/* create tensor */
XTensor * aGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor * bGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor * cGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor * cMeGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor cUserGPU;
/* Initialize variables */
aGPU->SetData(aData, unitNum);
cMeGPU->SetData(aData, unitNum);
bGPU->SetData(bData, unitNum);
cGPU->SetZeroAll();
/* call Sub function */
_Sub(aGPU, bGPU, cGPU, beta);
_SubMe(cMeGPU, bGPU, beta);
cUserGPU = Sub(*aGPU, *bGPU, beta);
/* check results */
gpuTest = cGPU->CheckData(answer, unitNum, 1e-4F)
&& cMeGPU->CheckData(answer, unitNum, 1e-4F) && cUserGPU.CheckData(answer, unitNum, 1e-4F);
/* destroy variables */
delete a;
delete b;
delete c;
delete cMe;
delete aGPU;
delete bGPU;
delete cGPU;
delete cMeGPU;
delete[] dimSize;
return cpuTest && gpuTest;
#else
/* destroy variables */
delete a;
delete b;
delete c;
delete cMe;
delete[] dimSize;
return cpuTest;
#endif // USE_CUDA
}
/* other cases */
/*
TODO!!
*/
/* test for Sub Function */
bool TestSub()
{
XPRINT(0, stdout, "[TEST SUB] tensor subtraction c = a - b * beta\n");
bool returnFlag = true, caseFlag = true;
/* case 1 test */
caseFlag = TestSub1();
if (!caseFlag) {
returnFlag = false;
XPRINT(0, stdout, ">> case 1 failed!\n");
}
else
XPRINT(0, stdout, ">> case 1 passed!\n");
/* case 2 test */
caseFlag = TestSub2();
if (!caseFlag) {
returnFlag = false;
XPRINT(0, stdout, ">> case 2 failed!\n");
}
else
XPRINT(0, stdout, ">> case 2 passed!\n");
/* other cases test */
/*
TODO!!
*/
if (returnFlag) {
XPRINT(0, stdout, ">> All Passed!\n");
}
else
XPRINT(0, stdout, ">> Failed!\n");
XPRINT(0, stdout, "\n");
return returnFlag;
}
} // namespace nts(NiuTrans.Tensor)
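The expected values in case 2 follow from $c = a - b\,\beta$ with $\beta = 0.5$:

$$
c = \left(\begin{matrix}0 & 1 & 2 & 3 \\ 4 & 5 & 6 & 7\end{matrix}\right) - 0.5 \left(\begin{matrix}1 & -1 & -3 & -5 \\ -7 & -9 & -11 & -13\end{matrix}\right) = \left(\begin{matrix}-0.5 & 1.5 & 3.5 & 5.5 \\ 7.5 & 9.5 & 11.5 & 13.5\end{matrix}\right)
$$

Case 1 uses the default $\beta = 1$, so there `answer` is simply $a - b$.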
/* NiuTrans.Tensor - an open-source tensor library
* Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-08-01
*/
#ifndef __TEST_SUB_H__
#define __TEST_SUB_H__
#include "../core/arithmetic/Sub.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
/* test for Sub Function */
bool TestSub();
} // namespace nts(NiuTrans.Tensor)
#endif // __TEST_SUB_H__
...@@ -16,8 +16,8 @@ ...@@ -16,8 +16,8 @@
*/ */
/* /*
* $Created by: LI Yinqiao (li.yin.qiao.2012@hotmail.com) 2018-04-30 * $Created by: LI Yinqiao (li.yin.qiao.2012@hotmail.com) 2018-04-30
*/ */
#include "TSum.h" #include "TSum.h"
...@@ -59,14 +59,14 @@ bool TestSum1() ...@@ -59,14 +59,14 @@ bool TestSum1()
b->SetData(bData, unitNum); b->SetData(bData, unitNum);
c->SetZeroAll(); c->SetZeroAll();
/* call sum function */ /* call Sum function */
_Sum(a, b, c); _Sum(a, b, c);
_SumMe(cMe, b); _SumMe(cMe, b);
cUser = Sum(*a, *b); cUser = Sum(*a, *b);
/* check results */ /* check results */
cpuTest = c->CheckData(answer, unitNum) cpuTest = c->CheckData(answer, unitNum)
&& cMe->CheckData(answer, unitNum) && cUser.CheckData(answer, unitNum); && cMe->CheckData(answer, unitNum) && cUser.CheckData(answer, unitNum);
#ifdef USE_CUDA #ifdef USE_CUDA
/* GPU test */ /* GPU test */
...@@ -85,14 +85,14 @@ bool TestSum1() ...@@ -85,14 +85,14 @@ bool TestSum1()
bGPU->SetData(bData, unitNum); bGPU->SetData(bData, unitNum);
cGPU->SetZeroAll(); cGPU->SetZeroAll();
/* call sum function */ /* call Sum function */
_Sum(aGPU, bGPU, cGPU); _Sum(aGPU, bGPU, cGPU);
_SumMe(cMeGPU, bGPU); _SumMe(cMeGPU, bGPU);
cUserGPU = Sum(*aGPU, *bGPU); cUserGPU = Sum(*aGPU, *bGPU);
/* check results */ /* check results */
gpuTest = cGPU->CheckData(answer, unitNum) gpuTest = cGPU->CheckData(answer, unitNum)
&& cMeGPU->CheckData(answer, unitNum) && cUserGPU.CheckData(answer, unitNum); && cMeGPU->CheckData(answer, unitNum) && cUserGPU.CheckData(answer, unitNum);
/* destroy variables */ /* destroy variables */
delete a; delete a;
...@@ -155,14 +155,14 @@ bool TestSum2() ...@@ -155,14 +155,14 @@ bool TestSum2()
b->SetData(bData, unitNum); b->SetData(bData, unitNum);
c->SetZeroAll(); c->SetZeroAll();
/* call sum function */ /* call Sum function */
_Sum(a, b, c, beta); _Sum(a, b, c, beta);
_SumMe(cMe, b, beta); _SumMe(cMe, b, beta);
cUser = Sum(*a, *b, beta); cUser = Sum(*a, *b, beta);
/* check results */ /* check results */
cpuTest = c->CheckData(answer, unitNum) cpuTest = c->CheckData(answer, unitNum)
&& cMe->CheckData(answer, unitNum) && cUser.CheckData(answer, unitNum); && cMe->CheckData(answer, unitNum) && cUser.CheckData(answer, unitNum);
#ifdef USE_CUDA #ifdef USE_CUDA
/* GPU test */ /* GPU test */
...@@ -181,14 +181,14 @@ bool TestSum2() ...@@ -181,14 +181,14 @@ bool TestSum2()
bGPU->SetData(bData, unitNum); bGPU->SetData(bData, unitNum);
cGPU->SetZeroAll(); cGPU->SetZeroAll();
/* call sum function */ /* call Sum function */
_Sum(aGPU, bGPU, cGPU, beta); _Sum(aGPU, bGPU, cGPU, beta);
_SumMe(cMeGPU, bGPU, beta); _SumMe(cMeGPU, bGPU, beta);
cUserGPU = Sum(*aGPU, *bGPU, beta); cUserGPU = Sum(*aGPU, *bGPU, beta);
/* check results */ /* check results */
gpuTest = cGPU->CheckData(answer, unitNum) gpuTest = cGPU->CheckData(answer, unitNum)
&& cMeGPU->CheckData(answer, unitNum) && cUserGPU.CheckData(answer, unitNum); && cMeGPU->CheckData(answer, unitNum) && cUserGPU.CheckData(answer, unitNum);
/* destroy variables */ /* destroy variables */
delete a; delete a;
......
...@@ -16,8 +16,8 @@ ...@@ -16,8 +16,8 @@
*/ */
/* /*
* $Created by: LI Yinqiao (li.yin.qiao.2012@hotmail.com) 2018-04-30 * $Created by: LI Yinqiao (li.yin.qiao.2012@hotmail.com) 2018-04-30
*/ */
#ifndef __TEST_SUM_H__ #ifndef __TEST_SUM_H__
#define __TEST_SUM_H__ #define __TEST_SUM_H__
......
/* NiuTrans.Tensor - an open-source tensor library
* Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-30
*/
#include "TSumDim.h"
#include "../core/arithmetic/SumDim.h"
#include "../XTensor.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
/*
case 1: tensor summation c = a + b * \beta
where the size of b is equal to the n-th dimension of a,
i.e., a is summed with b by broadcasting
*/
bool TestSumDim1()
{
/* a tensor of size (2, 4) */
int aOrder = 2;
int * aDimSize = new int[aOrder];
aDimSize[0] = 2;
aDimSize[1] = 4;
int aUnitNum = 1;
for (int i = 0; i < aOrder; i++)
aUnitNum *= aDimSize[i];
/* a tensor of size (2) */
int bOrder = 1;
int * bDimSize = new int[bOrder];
bDimSize[0] = 2;
int bUnitNum = 1;
for (int i = 0; i < bOrder; i++)
bUnitNum *= bDimSize[i];
DTYPE aData[2][4] = { {0.0F, 1.0F, 2.0F, 3.0F},
{4.0F, 5.0F, 6.0F, 7.0F} };
DTYPE bData[2] = {1.0F, -1.0F};
DTYPE answer[2][4] = { {1.0F, 2.0F, 3.0F, 4.0F},
{3.0F, 4.0F, 5.0F, 6.0F} };
/* CPU test */
bool cpuTest = true;
/* create tensors */
XTensor * a = NewTensor(aOrder, aDimSize);
XTensor * b = NewTensor(bOrder, bDimSize);
XTensor * c = NewTensor(aOrder, aDimSize);
XTensor * cMe = NewTensor(aOrder, aDimSize);
XTensor cUser;
/* initialize variables */
a->SetData(aData, aUnitNum);
cMe->SetData(aData, aUnitNum);
b->SetData(bData, bUnitNum);
c->SetZeroAll();
/* call SumDim function */
_SumDim(a, b, c, 0);
_SumDim(cMe, b, 0);
cUser = SumDim(*a, *b, 0);
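/*
three interfaces are exercised: _SumDim writes the result into the pre-allocated
tensor c, the two-argument form accumulates b into cMe in place, and SumDim
returns a new tensor that is assigned to cUser
*/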
/* check results */
cpuTest = c->CheckData(answer, aUnitNum)
&& cMe->CheckData(answer, aUnitNum)
&& cUser.CheckData(answer, aUnitNum);
#ifdef USE_CUDA
/* GPU test */
bool gpuTest = true;
/* create tensor */
XTensor * aGPU = NewTensor(aOrder, aDimSize, X_FLOAT, 1.0F, 0);
XTensor * bGPU = NewTensor(bOrder, bDimSize, X_FLOAT, 1.0F, 0);
XTensor * cGPU = NewTensor(aOrder, aDimSize, X_FLOAT, 1.0F, 0);
XTensor * cMeGPU = NewTensor(aOrder, aDimSize, X_FLOAT, 1.0F, 0);
XTensor cUserGPU;
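/* the extra NewTensor arguments request X_FLOAT data and presumably select a dense
   ratio of 1.0F and device 0 (the first GPU), in contrast to the default CPU tensors created above */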
/* Initialize variables */
aGPU->SetData(aData, aUnitNum);
cMeGPU->SetData(aData, aUnitNum);
bGPU->SetData(bData, bUnitNum);
cGPU->SetZeroAll();
/* call SumDim function */
_SumDim(aGPU, bGPU, cGPU, 0);
_SumDim(cMeGPU, bGPU, 0);
cUserGPU = SumDim(*aGPU, *bGPU, 0);
/* check results */
gpuTest = cGPU->CheckData(answer, aUnitNum)
&& cMeGPU->CheckData(answer, aUnitNum)
&& cUserGPU.CheckData(answer, aUnitNum);
/* destroy variables */
delete a;
delete b;
delete c;
delete cMe;
delete aGPU;
delete bGPU;
delete cGPU;
delete cMeGPU;
delete[] aDimSize;
delete[] bDimSize;
return cpuTest && gpuTest;
#else
/* destroy variables */
delete a;
delete b;
delete c;
delete cMe;
delete[] aDimSize;
delete[] bDimSize;
return cpuTest;
#endif // USE_CUDA
}
/*
case 2: tensor summation c = a + b * \beta
where the size of b is equal to the n-th dimension of a,
i.e., a is summed with b by broadcasting
*/
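/*
note: here "size" means the total number of entries: b is a (2, 2) tensor with
4 units, matching the size of dimension 1 of a, so (as the answer below shows)
b is flattened to {1, -1, -1, 1} and added to every row of a
*/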
bool TestSumDim2()
{
/* a tensor of size (2, 4) */
int aOrder = 2;
int * aDimSize = new int[aOrder];
aDimSize[0] = 2;
aDimSize[1] = 4;
int aUnitNum = 1;
for (int i = 0; i < aOrder; i++)
aUnitNum *= aDimSize[i];
/* a tensor of size (2, 2) */
int bOrder = 2;
int * bDimSize = new int[bOrder];
bDimSize[0] = 2;
bDimSize[1] = 2;
int bUnitNum = 1;
for (int i = 0; i < bOrder; i++)
bUnitNum *= bDimSize[i];
DTYPE aData[2][4] = { {0.0F, 1.0F, 2.0F, 3.0F},
{4.0F, 5.0F, 6.0F, 7.0F} };
DTYPE bData[2][2] = { {1.0F, -1.0F},
{-1.0F, 1.0F} };
DTYPE answer[2][4] = { {1.0F, 0.0F, 1.0F, 4.0F},
{5.0F, 4.0F, 5.0F, 8.0F} };
/* CPU test */
bool cpuTest = true;
/* create tensors */
XTensor * a = NewTensor(aOrder, aDimSize);
XTensor * b = NewTensor(bOrder, bDimSize);
XTensor * c = NewTensor(aOrder, aDimSize);
XTensor * cMe = NewTensor(aOrder, aDimSize);
XTensor cUser;
/* initialize variables */
a->SetData(aData, aUnitNum);
cMe->SetData(aData, aUnitNum);
b->SetData(bData, bUnitNum);
c->SetZeroAll();
/* call SumDim function */
_SumDim(a, b, c, 1);
_SumDim(cMe, b, 1);
cUser = SumDim(*a, *b, 1);
/* check results */
cpuTest = c->CheckData(answer, aUnitNum)
&& cMe->CheckData(answer, aUnitNum)
&& cUser.CheckData(answer, aUnitNum);
#ifdef USE_CUDA
/* GPU test */
bool gpuTest = true;
/* create tensor */
XTensor * aGPU = NewTensor(aOrder, aDimSize, X_FLOAT, 1.0F, 0);
XTensor * bGPU = NewTensor(bOrder, bDimSize, X_FLOAT, 1.0F, 0);
XTensor * cGPU = NewTensor(aOrder, aDimSize, X_FLOAT, 1.0F, 0);
XTensor * cMeGPU = NewTensor(aOrder, aDimSize, X_FLOAT, 1.0F, 0);
XTensor cUserGPU;
/* Initialize variables */
aGPU->SetData(aData, aUnitNum);
cMeGPU->SetData(aData, aUnitNum);
bGPU->SetData(bData, bUnitNum);
cGPU->SetZeroAll();
/* call SumDim function */
_SumDim(aGPU, bGPU, cGPU, 1);
_SumDim(cMeGPU, bGPU, 1);
cUserGPU = SumDim(*aGPU, *bGPU, 1);
/* check results */
gpuTest = cGPU->CheckData(answer, aUnitNum)
&& cMeGPU->CheckData(answer, aUnitNum)
&& cUserGPU.CheckData(answer, aUnitNum);
/* destroy variables */
delete a;
delete b;
delete c;
delete cMe;
delete aGPU;
delete bGPU;
delete cGPU;
delete cMeGPU;
delete[] aDimSize;
delete[] bDimSize;
return cpuTest && gpuTest;
#else
/* destroy variables */
delete a;
delete b;
delete c;
delete cMe;
delete[] aDimSize;
delete[] bDimSize;
return cpuTest;
#endif // USE_CUDA
}
/* other cases */
/*
TODO!!
*/
/* test for SumDim Function */
bool TestSumDim()
{
XPRINT(0, stdout, "[TEST SUMDIM] tensor summation c = a + b * beta by broadcasting\n");
bool returnFlag = true, caseFlag = true;
/* case 1 test */
caseFlag = TestSumDim1();
if (!caseFlag) {
returnFlag = false;
XPRINT(0, stdout, ">> case 1 failed!\n");
}
else
XPRINT(0, stdout, ">> case 1 passed!\n");
/* case 2 test */
caseFlag = TestSumDim2();
if (!caseFlag) {
returnFlag = false;
XPRINT(0, stdout, ">> case 2 failed!\n");
}
else
XPRINT(0, stdout, ">> case 2 passed!\n");
/* other cases test */
/*
TODO!!
*/
if (returnFlag) {
XPRINT(0, stdout, ">> All Passed!\n");
}
else
XPRINT(0, stdout, ">> Failed!\n");
XPRINT(0, stdout, "\n");
return returnFlag;
}
} // namespace nts(NiuTrans.Tensor)
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-30
 * I finished my summer holidays and went back to study.
*/
#ifndef __TEST_SUMDIM_H__
#define __TEST_SUMDIM_H__
#include "../core/arithmetic/SumDim.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
/* test for SumDim Function */
extern "C"
bool TestSumDim();
} // namespace nts(NiuTrans.Tensor)
#endif // __TEST_SUMDIM_H__
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-31
*/
#include "../core/math/Unary.h"
#include "TTan.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
/*
case 1: test Tan function.
Set every entry to its tangent value.
*/
bool TestTan1()
{
/* a tensor of size (3, 2) */
int order = 2;
int * dimSize = new int[order];
dimSize[0] = 3;
dimSize[1] = 2;
int unitNum = 1;
for (int i = 0; i < order; i++)
unitNum *= dimSize[i];
DTYPE aData[3][2] = { {1.0F, 2.0F},
{-1.0F, -2.0F},
{0.0F, 0.5F} };
DTYPE answer[3][2] = { {1.5574F, -2.1850F},
{-1.5574F, 2.1850F},
{0.0F, 0.5463F} };
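/* the expected values are the element-wise tangents of aData (tan(1.0) = 1.5574,
   tan(2.0) = -2.1850, tan(0.5) = 0.5463, rounded to four decimals); they are
   compared below with a tolerance of 1e-4F */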
/* CPU test */
bool cpuTest = true;
/* create tensors */
XTensor * a = NewTensor(order, dimSize);
XTensor * b = NewTensor(order, dimSize);
XTensor * aMe = NewTensor(order, dimSize);
XTensor bUser;
/* initialize variables */
a->SetData(aData, unitNum);
aMe->SetData(aData, unitNum);
/* call Tan function */
_Tan(a, b);
_TanMe(aMe);
bUser = Tan(*a);
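/* _Tan writes into b, _TanMe overwrites aMe in place, and Tan returns a new
   tensor assigned to bUser; all three results are checked against the same answer */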
/* check results */
cpuTest = b->CheckData(answer, unitNum, 1e-4F) && aMe->CheckData(answer, unitNum, 1e-4F) && bUser.CheckData(answer, unitNum, 1e-4F);
#ifdef USE_CUDA
/* GPU test */
bool gpuTest = true;
/* create tensor */
XTensor * aGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor * bGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor * aMeGPU = NewTensor(order, dimSize, X_FLOAT, 1.0F, 0);
XTensor bUserGPU;
/* Initialize variables */
aGPU->SetData(aData, unitNum);
aMeGPU->SetData(aData, unitNum);
/* call Tan function */
_Tan(aGPU, bGPU);
_TanMe(aMeGPU);
bUserGPU = Tan(*aGPU);
/* check results */
gpuTest = bGPU->CheckData(answer, unitNum, 1e-4F) && aMeGPU->CheckData(answer, unitNum, 1e-4F) && bUserGPU.CheckData(answer, unitNum, 1e-4F);
/* destroy variables */
delete a;
delete b;
delete aMe;
delete aGPU;
delete bGPU;
delete aMeGPU;
delete[] dimSize;
return cpuTest && gpuTest;
#else
/* destroy variables */
delete a;
delete b;
delete aMe;
delete[] dimSize;
return cpuTest;
#endif // USE_CUDA
}
/* other cases */
/*
TODO!!
*/
/* test for Tan Function */
bool TestTan()
{
XPRINT(0, stdout, "[TEST Tan] set every entry to its tangent value \n");
bool returnFlag = true, caseFlag = true;
/* case 1 test */
caseFlag = TestTan1();
if (!caseFlag) {
returnFlag = false;
XPRINT(0, stdout, ">> case 1 failed!\n");
}
else
XPRINT(0, stdout, ">> case 1 passed!\n");
/* other cases test */
/*
TODO!!
*/
if (returnFlag) {
XPRINT(0, stdout, ">> All Passed!\n");
}
else
XPRINT(0, stdout, ">> Failed!\n");
XPRINT(0, stdout, "\n");
return returnFlag;
}
} // namespace nts(NiuTrans.Tensor)
@@ -16,31 +16,16 @@
*/
/*
- * $Created by: LI Yinqiao (li.yin.qiao.2012@hotmail.com) 2018-7-11
+ * $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-31
*/
-#ifndef __LOG_CUH__
-#define __LOG_CUH__
-#include "Log.h"
+#ifndef __TEST_TAN_H__
+#define __TEST_TAN_H__
namespace nts { // namespace nts(NiuTrans.Tensor)
-#ifdef USE_CUDA
-/* set each entry to its log value (CUDA Kernel) */
-__global__
-void KernelLog(DTYPE * a, DTYPE * b, int size);
-/* set each entry to its log value (CUDA Kernel) with float16 data type*/
-__global__
-void KernelLog(__half * a, __half * b, int size);
-/* set each entry to its log value */
-void _CudaLog(const XTensor * a, XTensor * b);
-#endif // USE_CUDA
+/* test for Tan Function */
+bool TestTan();
} // namespace nts(NiuTrans.Tensor)
-#endif // __LOG_CUH__
+#endif // __TEST_TAN_H__
\ No newline at end of file
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-12
*/
#include "TTranspose.h"
#include "../core/movement/CopyValues.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
/*
case 1: test Transpose function.
tensor transposition of dimensions i and j
*/
bool TestTranspose1()
{
/* a tensor of size (3, 2) */
int aOrder = 2;
int * aDimSize = new int[aOrder];
aDimSize[0] = 3;
aDimSize[1] = 2;
int aUnitNum = 1;
for (int i = 0; i < aOrder; i++)
aUnitNum *= aDimSize[i];
/* a tensor of size (2, 3) */
int bOrder = 2;
int * bDimSize = new int[bOrder];
bDimSize[0] = 2;
bDimSize[1] = 3;
int bUnitNum = 1;
for (int i = 0; i < bOrder; i++)
bUnitNum *= bDimSize[i];
DTYPE aData[3][2] = { {1.0F, 2.0F},
{3.0F, 4.0F},
{5.0F, 6.0F} };
DTYPE answer[2][3] = { {1.0F, 3.0F, 5.0F},
{2.0F, 4.0F, 6.0F} };
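/* for a 2-D tensor, transposing dimensions 0 and 1 is the ordinary matrix
   transpose, so the (3, 2) input above becomes this (2, 3) answer */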
/* CPU test */
bool cpuTest = true;
/* create tensors */
XTensor * a = NewTensor(aOrder, aDimSize);
XTensor * b = NewTensor(bOrder, bDimSize);
XTensor bUser;
/* initialize variables */
a->SetData(aData, aUnitNum);
/* call Transpose function */
_Transpose(a, b, 0, 1);
bUser = Transpose(*a, 0, 1);
/* check results */
cpuTest = b->CheckData(answer, aUnitNum, 1e-4F)
&& bUser.CheckData(answer, aUnitNum, 1e-4F);
#ifdef USE_CUDA
/* GPU test */
bool gpuTest = true;
/* create tensor */
XTensor * aGPU = NewTensor(aOrder, aDimSize, X_FLOAT, 1.0F, 0);
XTensor * bGPU = NewTensor(bOrder, bDimSize, X_FLOAT, 1.0F, 0);
XTensor bUserGPU;
/* Initialize variables */
aGPU->SetData(aData, aUnitNum);
/* call Transpose function */
_Transpose(aGPU, bGPU, 0, 1);
bUserGPU = Transpose(*aGPU, 0, 1);
/* check results */
gpuTest = bGPU->CheckData(answer, aUnitNum, 1e-4F)
&& bUserGPU.CheckData(answer, aUnitNum, 1e-4F);
/* destroy variables */
delete a;
delete b;
delete aGPU;
delete bGPU;
delete[] aDimSize;
delete[] bDimSize;
return cpuTest && gpuTest;
#else
/* destroy variables */
delete a;
delete b;
delete[] aDimSize;
delete[] bDimSize;
return cpuTest;
#endif // USE_CUDA
}
/*
case 2: test Transpose function.
tensor transposition of dimensions i and j
*/
bool TestTranspose2()
{
/* a tensor of size (4, 3, 2) */
int aOrder = 3;
int * aDimSize = new int[aOrder];
aDimSize[0] = 4;
aDimSize[1] = 3;
aDimSize[2] = 2;
int aUnitNum = 1;
for (int i = 0; i < aOrder; i++)
aUnitNum *= aDimSize[i];
/* a tensor of size (2, 3, 4) */
int bOrder = 3;
int * bDimSize = new int[bOrder];
bDimSize[0] = 2;
bDimSize[1] = 3;
bDimSize[2] = 4;
int bUnitNum = 1;
for (int i = 0; i < bOrder; i++)
bUnitNum *= bDimSize[i];
DTYPE aData[4][3][2] = { { {1.0F, 2.0F},
{3.0F, 4.0F},
{5.0F, 6.0F} },
{ {2.0F, 4.0F},
{4.0F, 7.0F},
{6.0F, 8.0F} },
{ {1.0F, 2.0F},
{3.0F, 4.0F},
{5.0F, 6.0F} },
{ {2.0F, 4.0F},
{4.0F, 7.0F},
{6.0F, 8.0F} },};
DTYPE answer[2][3][4] = { { {1.0F, 2.0F, 1.0F, 2.0F},
{2.0F, 4.0F, 2.0F, 4.0F},
{3.0F, 4.0F, 3.0F, 4.0F} },
{ {4.0F, 7.0F, 4.0F, 7.0F},
{5.0F, 6.0F, 5.0F, 6.0F},
{6.0F, 8.0F, 6.0F, 8.0F} } };
/* CPU test */
bool cpuTest = true;
/* create tensors */
XTensor * a = NewTensor(aOrder, aDimSize);
XTensor * b = NewTensor(bOrder, bDimSize);
XTensor bUser;
/* initialize variables */
a->SetData(aData, aUnitNum);
/* call Transpose function */
_Transpose(a, b, 0, 2);
bUser = Transpose(*a, 0, 2);
/* check results */
cpuTest = b->CheckData(answer, aUnitNum, 1e-4F)
&& bUser.CheckData(answer, aUnitNum, 1e-4F);
#ifdef USE_CUDA
/* GPU test */
bool gpuTest = true;
/* create tensor */
XTensor * aGPU = NewTensor(aOrder, aDimSize, X_FLOAT, 1.0F, 0);
XTensor * bGPU = NewTensor(bOrder, bDimSize, X_FLOAT, 1.0F, 0);
XTensor bUserGPU;
/* Initialize variables */
aGPU->SetData(aData, aUnitNum);
/* call Transpose function */
_Transpose(aGPU, bGPU, 0, 2);
bUserGPU = Transpose(*aGPU, 0, 2);
/* check results */
gpuTest = bGPU->CheckData(answer, aUnitNum, 1e-4F)
&& bUserGPU.CheckData(answer, aUnitNum, 1e-4F);
/* destroy variables */
delete a;
delete b;
delete aGPU;
delete bGPU;
delete[] aDimSize;
delete[] bDimSize;
return cpuTest && gpuTest;
#else
/* destroy variables */
delete a;
delete b;
delete[] aDimSize;
delete[] bDimSize;
return cpuTest;
#endif // USE_CUDA
}
/* other cases */
/*
TODO!!
*/
/* test for Transpose Function */
bool TestTranspose()
{
XPRINT(0, stdout, "[TEST TRANSPOSE] tensor transposition with specified dimensions \n");
bool returnFlag = true, caseFlag = true;
/* case 1 test */
caseFlag = TestTranspose1();
if (!caseFlag) {
returnFlag = false;
XPRINT(0, stdout, ">> case 1 failed!\n");
}
else
XPRINT(0, stdout, ">> case 1 passed!\n");
/* case 2 test */
caseFlag = TestTranspose2();
if (!caseFlag) {
returnFlag = false;
XPRINT(0, stdout, ">> case 2 failed!\n");
}
else
XPRINT(0, stdout, ">> case 2 passed!\n");
/* other cases test */
/*
TODO!!
*/
if (returnFlag) {
XPRINT(0, stdout, ">> All Passed!\n");
}
else
XPRINT(0, stdout, ">> Failed!\n");
XPRINT(0, stdout, "\n");
return returnFlag;
}
} // namespace nts(NiuTrans.Tensor)
/* NiuTrans.Tensor - an open-source tensor library
 * Copyright (C) 2017, Natural Language Processing Lab, Northeastern University.
* All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* $Created by: Xu Chen (email: hello_master1954@163.com) 2018-07-30
*/
#ifndef __TEST_TRANSPOSE_H__
#define __TEST_TRANSPOSE_H__
#include "../core/shape/Transpose.h"
namespace nts { // namespace nts(NiuTrans.Tensor)
/* test for Transpose Function */
bool TestTranspose();
} // namespace nts(NiuTrans.Tensor)
#endif // __TEST_TRANSPOSE_H__
@@ -32,15 +32,17 @@ bool Test()
wrong = !TestAbsolute() || wrong;
wrong = !TestConcatenate() || wrong;
wrong = !TestConcatenateSolely() || wrong;
+wrong = !TestCos() || wrong;
wrong = !TestConvertDataType() || wrong;
wrong = !TestCopyIndexed() || wrong;
wrong = !TestCopyValues() || wrong;
+wrong = !TestDiv() || wrong;
+wrong = !TestExp() || wrong;
wrong = !TestLog() || wrong;
wrong = !TestMatrixMul() || wrong;
wrong = !TestMatrixMul2D() || wrong;
wrong = !TestMatrixMul2DParallel() || wrong;
wrong = !TestMatrixMulBatched() || wrong;
-wrong = !TestMatrixMulBatchedCPU() || wrong;
wrong = !TestMerge() || wrong;
wrong = !TestMultiply() || wrong;
wrong = !TestNegate() || wrong;
@@ -56,11 +58,16 @@ bool Test()
wrong = !TestSetAscendingOrder() || wrong;
wrong = !TestSetData() || wrong;
wrong = !TestSign() || wrong;
+wrong = !TestSin() || wrong;
wrong = !TestSort() || wrong;
wrong = !TestSplit() || wrong;
+wrong = !TestSub() || wrong;
wrong = !TestSum() || wrong;
wrong = !TestSumByColumnTV() || wrong;
wrong = !TestSumByColumnVT() || wrong;
+wrong = !TestSumDim() || wrong;
+wrong = !TestTan() || wrong;
+wrong = !TestTranspose() || wrong;
wrong = !TestTopK() || wrong;
wrong = !TestUnsqueeze() || wrong;
wrong = !TestXMem() || wrong;
...
@@ -25,15 +25,17 @@
#include "TAbsolute.h"
#include "TConcatenate.h"
#include "TConcatenateSolely.h"
+#include "TCos.h"
#include "TConvertDataType.h"
#include "TCopyIndexed.h"
#include "TCopyValues.h"
+#include "TDiv.h"
+#include "TExp.h"
#include "TLog.h"
#include "TMatrixMul.h"
#include "TMatrixMul2D.h"
#include "TMatrixMul2DParallel.h"
#include "TMatrixMulBatched.h"
-#include "TMatrixMULBatchedCPU.h"
#include "TMerge.h"
#include "TMultiply.h"
#include "TNegate.h"
@@ -49,11 +51,16 @@
#include "TSetAscendingOrder.h"
#include "TSetData.h"
#include "TSign.h"
+#include "TSin.h"
#include "TSort.h"
#include "TSplit.h"
+#include "TSub.h"
#include "TSum.h"
#include "TSumByColumnTV.h"
#include "TSumByColumnVT.h"
+#include "TSumDim.h"
+#include "TTan.h"
+#include "TTranspose.h"
#include "TTopK.h"
#include "TUnsqueeze.h"
#include "TXMem.h"
...