d8bb3ff5b4  fix(cuda): fix fp16 tensorcore gemm split k workspace (Megvii Engine Team, 3 years ago; GitOrigin-RevId: d04a0e0985)
d7b0994a3e  feat(cuda): add fp16 compute 16 kernel (Megvii Engine Team, 3 years ago; GitOrigin-RevId: e03435be02)
8a2e92bd6c  refactor(cuda): depthwise large kernel (Megvii Engine Team, 3 years ago; GitOrigin-RevId: dade8710b4)
6b8a69d5b6  feat(cuda): float16 depthwise large kernel conv compute fp32 (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 3050d48f26)
bc385b5374  feat(cuda): support float16 depthwise large kernel conv (Megvii Engine Team, 3 years ago; GitOrigin-RevId: fdc1b15fbc)
7d2063e35a  perf(cuda): speedup conv backward data with small feature map and large filter size (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 85592bca6b)
72403e8929  perf(cuda): speedup chanwise conv with small feature map and large filter size (Megvii Engine Team, 3 years ago; GitOrigin-RevId: e65b2ce856)
ab6d12caff  feat(mge): add conv padding mode (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 147ced856e)
47fe766310  feat(dnn/cuda): add implicit bmm kernels for large kernel depthwise convolution backward filter opr (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 932e7689e8)
dcc9693582  feat(dnn/cuda): add heuristic rule for implicit batched gemm large kernel dwconv2d kernels (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 2d2c213bfd)
6cefabe734  fix(dnn/cuda): fix ci (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 8267e5f9dd)
888f4e46ae  feat(dnn/cuda): add implicit bmm large kernel dwconv2d dgrad kernels (Megvii Engine Team, 3 years ago; GitOrigin-RevId: fcb7974d62)
08d8635ff5  feat(dnn/cuda): add implicit bmm large kernel dwconv2d fprop impl (Megvii Engine Team, 3 years ago; GitOrigin-RevId: feb09ebb58)
260923e11c  perf(aarch64): optimize aarch64 uint16 relayout with block_w==3 (Megvii Engine Team, 3 years ago; GitOrigin-RevId: fe6aaaac0c)
95ac055538  feat(dnn,mgb,imperative): add diag opr implement (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 43016ffa2b)
39d77fb55a  feat(arm): add arm rnn_cell/lstm_cell/lstm optimized kernel (Megvii Engine Team, 3 years ago; GitOrigin-RevId: b9bb7352bc)
f509b1be9b  fix(build): split elemwise_multi_type cpp (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 13267e9db6)
3251f50114  fix(mgb/cuda-stub): add libcuda-wrap_11.4.h to fit the CUDA11.4 toolchain (Megvii Engine Team, 3 years ago; GitOrigin-RevId: efa38f00d1)
ee0b95e935  feat(dnn/elemwise/arm_common): support part of arm ternary elemwise multithread, BCAST111C_VEC_BCAST111C and BCAST101_VEC_BCAST101 (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 0e26553c90)
cbbca5fb10  feat(mge): add softmax op use cudnn api (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 7734ebf8c4)
20b42a8c3b  fix(dnn): add naive lstm kernel (Megvii Engine Team, 3 years ago; GitOrigin-RevId: f08ef810cf)
2faa6ea5a9  Merge pull request #213 from kxz18:rnn (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 9e9215c115)
82be0aaced  test(dnn): fix compute capability requirement for NCHWX test (Megvii Engine Team, 3 years ago; GitOrigin-RevId: d2f8022be1)
3b41840b68  fix(mgb): change caffepooling log level (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 290d243ef5)
1999307015  feat(mgb/opr): add dropout kernel (Megvii Engine Team, 3 years ago; GitOrigin-RevId: d248bd2005)
32717b0ca4  fix(build): split some cpp files that consume too much memory when building; makes -j8 builds possible in an 8 GB RAM environment (Megvii Engine Team, 3 years ago; GitOrigin-RevId: d0c442b41d)
a93741815b  feat(mgb/opr): add layernorm forward and backward kernel (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 0cd484e753)
a404cd7d06  fix(mgb/src): add tensorRT version check (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 7abfd30cab)
c53cad2049  feat(cmake): format all cmake file (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 0a4ecab99b)
a5803058b4  fix(dnn/x86): opt algo order (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 6dd14f9a96)
93310c0e4b  fix(mgb/gopt): fix cpu global layout transform fastrun error (Megvii Engine Team, 3 years ago; GitOrigin-RevId: ea254297e5)
c90e0b54be  perf(arm): optimize arm uint16 relayout with n=4 (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 5779c6b9c1)
f6d9909460  feat(dnn): add elemwise multi type support i16xf32 and u8xf32 (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 2fe469bb4e)
d9a46ea47b  fix(dnn): correct behaviour of floor div for int tensor (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 1444f69cce)
0ad5eeaedd  feat(mgb/gopt): global layout transform support opencl (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 132605c7d9)
8f48da7ffe  feat(mgb/opr): add cell level rnn/lstm and sequence level rnn/lstm (kxz@thumt102-1, 3 years ago)
2881934cb8  feat(dnn/check_non_finite): addmul scale to check_non_finite opr (Megvii Engine Team, 3 years ago; GitOrigin-RevId: c35a219e52)
6bb5409976  feat(dnn/src): add images2neibs kernel of opencl and related test (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 82242b7437)
6ce4a34403  feat(dnn): add fallback postprocess (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 4201a0f158)
c96dbd29b8  fix(dnn/arm_common): support more monotonous case in arm typecvt for performance (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 9e28a64d93)
ead611e11d  perf(dnn): slightly improve arm neon transcendental function performance (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 210d88f81e)
0d16952470  fix(mgb/cuda): fix conv error when the input tensor is too large (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 1b1d693795)
02d5f46d90  fix(mgb/x86): fix convbias crash on X86 (Megvii Engine Team, 3 years ago; GitOrigin-RevId: cc7283c6a2)
accb2d8d47  fix(mgb/serialize): fix flatbuffer compatibility issues (Megvii Engine Team, 3 years ago; GitOrigin-RevId: e4771d6bc4)
5e07e1e0f9  fix(dnn/fallback): let cpu be able to execute int4 model (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 1a6b78f3b6)
2696e4efaa  feat(dnn): add float16 for remap backward (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 0263030051)
1f0cc891b0  feat(dnn): enable eye to support bool (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 76d874d5b7)
11d75fecb5  feat(dnn/check_non_finite): add batch check_non_finite (Megvii Engine Team, 3 years ago; GitOrigin-RevId: e108133282)
2318ea3f15  fix(dnn): fix naive average pooling overflow bug for int8 type (Megvii Engine Team, 3 years ago; GitOrigin-RevId: b60a7b6cf8)
2d54ad185b  feat(lite): add global layout transform interface for load and run (Megvii Engine Team, 3 years ago; GitOrigin-RevId: 65c2430ec2)