Merge into xiao: clip/scaleandshift (float16/int/int8), logsoftmax/hardtanh (float16); modify XGlobal __int8