Megvii Engine Team
|
f7994683bd
|
feat(cuda): add large kernel direct conv to heuristic algo chooser
GitOrigin-RevId: bc927b6df7
|
3 years ago |
Megvii Engine Team
|
6dc0c0b9cc
|
fix(dnn): fix the sync problem in some kernels
GitOrigin-RevId: df3f7dc51b
|
3 years ago |
Megvii Engine Team
|
04193e3bd1
|
feat(dnn): add nearest mode for remap and resize
GitOrigin-RevId: 31e7b72a78
|
3 years ago |
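Nearest mode for resize picks the closest source pixel for each output pixel instead of interpolating between neighbours. The sketch below illustrates the idea on a 2-D list; the floor mapping `src = dst * in_size // out_size` is one common convention and is an assumption here, not confirmed to match MegEngine's exact rounding. `resize_nearest` is a hypothetical helper name.

```python
def resize_nearest(src, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list of rows (illustrative sketch).

    Assumes the floor mapping src_index = dst_index * in_size // out_size;
    other libraries may use half-pixel centres or rounding instead.
    """
    in_h, in_w = len(src), len(src[0])
    out = []
    for y in range(out_h):
        # Map the output row back to a source row; clamp defensively.
        sy = min(y * in_h // out_h, in_h - 1)
        row = []
        for x in range(out_w):
            sx = min(x * in_w // out_w, in_w - 1)
            row.append(src[sy][sx])
        out.append(row)
    return out


print(resize_nearest([[1, 2], [3, 4]], 4, 4))
```

Upscaling a 2x2 image to 4x4 this way simply repeats each source pixel in a 2x2 block, since no new values are interpolated.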
Megvii Engine Team
|
93c7e45188
|
feat(arm): delete the redundant implementation
GitOrigin-RevId: ff32a3dc8b
|
3 years ago |
Megvii Engine Team
|
e34a642b31
|
feat(fallback): reduce supports general intrinsics
GitOrigin-RevId: f250aa7b2a
|
3 years ago |
Megvii Engine Team
|
10f23778a8
|
feat(fallback): add simd general intrinsic
GitOrigin-RevId: ad78ba689f
|
3 years ago |
Megvii Engine Team
|
286051ede1
|
feat(dnn): differentiate sass kernels by cuda version
GitOrigin-RevId: 40bb4423b8
|
3 years ago |
Megvii Engine Team
|
f78b60ec10
|
feat(bazel): make bazel gensass depend on cuda toolchain version automatically
GitOrigin-RevId: 9433f21a91
|
3 years ago |
Megvii Engine Team
|
f48227c07d
|
feat(mgb): show more details for cuda driver api call
GitOrigin-RevId: 40e63d9dac
|
3 years ago |
Megvii Engine Team
|
d8bb3ff5b4
|
fix(cuda): fix fp16 tensorcore gemm split k workspace
GitOrigin-RevId: d04a0e0985
|
3 years ago |
Megvii Engine Team
|
d7b0994a3e
|
feat(cuda): add fp16 compute 16 kernel
GitOrigin-RevId: e03435be02
|
3 years ago |
Megvii Engine Team
|
8a2e92bd6c
|
refactor(cuda): depthwise large kernel
GitOrigin-RevId: dade8710b4
|
3 years ago |
Megvii Engine Team
|
6b8a69d5b6
|
feat(cuda): float16 depthwise large kernel conv compute fp32
GitOrigin-RevId: 3050d48f26
|
3 years ago |
Megvii Engine Team
|
bc385b5374
|
feat(cuda): support float16 depthwise large kernel conv
GitOrigin-RevId: fdc1b15fbc
|
3 years ago |
Megvii Engine Team
|
7d2063e35a
|
perf(cuda): speedup conv backward data with small feature map and large filter size
GitOrigin-RevId: 85592bca6b
|
3 years ago |
Megvii Engine Team
|
72403e8929
|
perf(cuda): speedup chanwise conv with small feature map and large filter size
GitOrigin-RevId: e65b2ce856
|
3 years ago |
Megvii Engine Team
|
ab6d12caff
|
feat(mge): add conv padding mode
GitOrigin-RevId: 147ced856e
|
3 years ago |
Megvii Engine Team
|
47fe766310
|
feat(dnn/cuda): add implicit bmm kernels for large kernel depthwise convolution backward filter opr
GitOrigin-RevId: 932e7689e8
|
3 years ago |
Megvii Engine Team
|
dcc9693582
|
feat(dnn/cuda): add heuristic rule for implicit batched gemm large kernel dwconv2d kernels
GitOrigin-RevId: 2d2c213bfd
|
3 years ago |
Megvii Engine Team
|
6cefabe734
|
fix(dnn/cuda): fix ci
GitOrigin-RevId: 8267e5f9dd
|
3 years ago |
Megvii Engine Team
|
888f4e46ae
|
feat(dnn/cuda): add implicit bmm large kernel dwconv2d dgrad kernels
GitOrigin-RevId: fcb7974d62
|
3 years ago |
Megvii Engine Team
|
08d8635ff5
|
feat(dnn/cuda): add implicit bmm large kernel dwconv2d fprop impl
GitOrigin-RevId: feb09ebb58
|
3 years ago |
Megvii Engine Team
|
260923e11c
|
perf(aarch64): optimize aarch64 uint16 relayout with block_w==3
GitOrigin-RevId: fe6aaaac0c
|
3 years ago |
Megvii Engine Team
|
95ac055538
|
feat(dnn,mgb,imperative): add diag opr implementation
GitOrigin-RevId: 43016ffa2b
|
3 years ago |
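A diag operator usually has two directions: a 1-D input becomes a matrix with the values on one diagonal, and a 2-D input yields the values of one diagonal. The sketch below assumes `numpy.diag`-style semantics with an offset `k`; whether MegEngine's diag opr matches this exactly is not confirmed by the log, and `diag` here is a hypothetical pure-Python helper.

```python
def diag(x, k=0):
    """numpy.diag-style semantics (assumed, illustrative only).

    1-D input -> square matrix with x on the k-th diagonal.
    2-D input -> list holding the k-th diagonal of the matrix.
    """
    if x and isinstance(x[0], list):
        rows, cols = len(x), len(x[0])
        # Start at (0, k) for k >= 0, or (-k, 0) for k < 0, then walk down-right.
        r, c = (0, k) if k >= 0 else (-k, 0)
        out = []
        while r < rows and c < cols:
            out.append(x[r][c])
            r += 1
            c += 1
        return out
    n = len(x) + abs(k)
    mat = [[0] * n for _ in range(n)]
    for i, v in enumerate(x):
        # Element i goes to column i + k (k >= 0) or row i - k (k < 0).
        r, c = (i, i + k) if k >= 0 else (i - k, i)
        mat[r][c] = v
    return mat


print(diag([1, 2], 1))
print(diag([[1, 2], [3, 4]]))
```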
Megvii Engine Team
|
39d77fb55a
|
feat(arm): add arm rnn_cell/lstm_cell/lstm optimized kernel
GitOrigin-RevId: b9bb7352bc
|
3 years ago |
Megvii Engine Team
|
f509b1be9b
|
fix(build): split elemwise_multi_type cpp
GitOrigin-RevId: 13267e9db6
|
3 years ago |
Megvii Engine Team
|
3251f50114
|
fix(mgb/cuda-stub): add libcuda-wrap_11.4.h to fit the CUDA11.4 toolchain
GitOrigin-RevId: efa38f00d1
|
3 years ago |
Megvii Engine Team
|
ee0b95e935
|
feat(dnn/elemwise/arm_common): support multithreading for part of arm ternary elemwise:
BCAST111C_VEC_BCAST111C and BCAST101_VEC_BCAST101
GitOrigin-RevId: 0e26553c90
|
3 years ago |
Megvii Engine Team
|
cbbca5fb10
|
feat(mge): add softmax op using the cuDNN API
GitOrigin-RevId: 7734ebf8c4
|
3 years ago |
Megvii Engine Team
|
20b42a8c3b
|
fix(dnn): add naive lstm kernel
GitOrigin-RevId: f08ef810cf
|
3 years ago |
Megvii Engine Team
|
2faa6ea5a9
|
Merge pull request #213 from kxz18:rnn
GitOrigin-RevId: 9e9215c115
|
3 years ago |
Megvii Engine Team
|
82be0aaced
|
test(dnn): fix compute capability requirement for NCHWX test
GitOrigin-RevId: d2f8022be1
|
3 years ago |
Megvii Engine Team
|
3b41840b68
|
fix(mgb): change caffepooling log level
GitOrigin-RevId: 290d243ef5
|
3 years ago |
Megvii Engine Team
|
1999307015
|
feat(mgb/opr): add dropout kernel
GitOrigin-RevId: d248bd2005
|
3 years ago |
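A dropout kernel zeroes each element with some probability during training. The sketch below uses the common "inverted dropout" convention, scaling survivors by `1 / (1 - drop_prob)` so the expected output equals the input; that scaling choice, and the names `dropout` and `dropout_backward`, are assumptions for illustration rather than a description of MegEngine's kernel. The keep-mask is returned because the backward pass must reuse the same mask.

```python
import random


def dropout(x, drop_prob, rng):
    """Inverted dropout on a flat list (illustrative sketch).

    Each element is zeroed with probability drop_prob; survivors are scaled
    by 1 / (1 - drop_prob) so every output has the same expected value as x.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    scale = 1.0 / (1.0 - drop_prob)
    # rng.random() is uniform in [0, 1), so keep-probability is 1 - drop_prob.
    mask = [rng.random() >= drop_prob for _ in x]
    out = [v * scale if keep else 0.0 for v, keep in zip(x, mask)]
    return out, mask


def dropout_backward(grad_out, mask, drop_prob):
    """Gradient flows only through kept elements, with the same scale."""
    scale = 1.0 / (1.0 - drop_prob)
    return [g * scale if keep else 0.0 for g, keep in zip(grad_out, mask)]


out, mask = dropout([1.0, 2.0, 3.0, 4.0], 0.5, random.Random(0))
grad = dropout_backward([1.0] * 4, mask, 0.5)
```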
Megvii Engine Team
|
32717b0ca4
|
fix(build): split some cpp files that consume too much memory when building
this makes the build possible in an 8G DDR environment with -j8
GitOrigin-RevId: d0c442b41d
|
3 years ago |
Megvii Engine Team
|
a93741815b
|
feat(mgb/opr): add layernorm forward and backward kernel
GitOrigin-RevId: 0cd484e753
|
3 years ago |
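LayerNorm normalizes each feature vector by its own mean and variance, then applies a per-feature scale `gamma` and shift `beta`. The sketch below shows the forward pass and the standard closed-form backward pass for a single vector; it is a reference formulation under these assumptions (biased variance, `eps` added inside the square root), not MegEngine's CUDA or CPU kernel, and the function names are hypothetical. Caching `mean` and `rstd` from the forward pass is what lets the backward pass avoid recomputing statistics.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    """Forward: y_i = (x_i - mean) * rstd * gamma_i + beta_i.

    mean and the biased variance are taken over the whole vector and
    rstd = 1 / sqrt(var + eps). Returns y plus the cached mean and rstd.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    rstd = 1.0 / math.sqrt(var + eps)
    y = [(v - mean) * rstd * g + b for v, g, b in zip(x, gamma, beta)]
    return y, mean, rstd


def layer_norm_backward(dy, x, gamma, mean, rstd):
    """Backward using the cached statistics.

    With xhat = (x - mean) * rstd and dxhat = dy * gamma:
      dx_i     = rstd * (dxhat_i - mean(dxhat) - xhat_i * mean(dxhat * xhat))
      dgamma_i = dy_i * xhat_i
      dbeta_i  = dy_i
    """
    n = len(x)
    xhat = [(v - mean) * rstd for v in x]
    dxhat = [d * g for d, g in zip(dy, gamma)]
    m1 = sum(dxhat) / n
    m2 = sum(d * h for d, h in zip(dxhat, xhat)) / n
    dx = [rstd * (d - m1 - h * m2) for d, h in zip(dxhat, xhat)]
    dgamma = [d * h for d, h in zip(dy, xhat)]
    dbeta = list(dy)
    return dx, dgamma, dbeta


y, mean, rstd = layer_norm([1.0, 2.0, 3.0], [1.0] * 3, [0.0] * 3)
```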
Megvii Engine Team
|
a404cd7d06
|
fix(mgb/src): add tensorRT version check
GitOrigin-RevId: 7abfd30cab
|
3 years ago |
Megvii Engine Team
|
c53cad2049
|
feat(cmake): format all cmake files
GitOrigin-RevId: 0a4ecab99b
|
3 years ago |
Megvii Engine Team
|
a5803058b4
|
fix(dnn/x86): opt algo order
GitOrigin-RevId: 6dd14f9a96
|
3 years ago |
Megvii Engine Team
|
93310c0e4b
|
fix(mgb/gopt): fix cpu global layout transform fastrun error
GitOrigin-RevId: ea254297e5
|
3 years ago |
Megvii Engine Team
|
c90e0b54be
|
perf(arm): optimize arm uint16 relayout with n=4
GitOrigin-RevId: 5779c6b9c1
|
3 years ago |
Megvii Engine Team
|
f6d9909460
|
feat(dnn): add elemwise multi type support i16xf32 and u8xf32
GitOrigin-RevId: 2fe469bb4e
|
3 years ago |
Megvii Engine Team
|
d9a46ea47b
|
fix(dnn): correct behaviour of floor div for int tensor
GitOrigin-RevId: 1444f69cce
|
3 years ago |
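Floor division rounds toward negative infinity, while integer `/` in C and C++ truncates toward zero; the two disagree whenever the division is inexact and the operands have opposite signs (for example, -7 / 2 is -3 with truncation but -4 with floor). Assuming this fix concerns that rounding direction, the sketch below shows the usual correction applied on top of a truncating quotient. It emulates C semantics in Python for illustration only; `trunc_div` and `floor_div` are hypothetical names, not MegEngine functions.

```python
def trunc_div(a, b):
    """Integer division rounding toward zero, as C/C++ `/` does on ints."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def floor_div(a, b):
    """Integer division rounding toward negative infinity.

    Starts from the truncated quotient and subtracts one when the
    remainder is nonzero and its sign differs from the divisor's.
    """
    q = trunc_div(a, b)
    r = a - q * b  # remainder carries the sign of a (C semantics)
    if r != 0 and (r < 0) != (b < 0):
        q -= 1
    return q


for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2), (-6, 3)]:
    print(a, b, trunc_div(a, b), floor_div(a, b))
```

Only the inexact, mixed-sign cases such as (-7, 2) and (7, -2) change; exact divisions like (-6, 3) give the same result either way.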
Megvii Engine Team
|
0ad5eeaedd
|
feat(mgb/gopt): global layout transform support opencl
GitOrigin-RevId: 132605c7d9
|
3 years ago |
kxz@thumt102-1
|
8f48da7ffe
|
feat(mgb/opr): add cell level rnn/lstm and sequence level rnn/lstm
|
3 years ago |
Megvii Engine Team
|
2881934cb8
|
feat(dnn/check_non_finite): add mul scale to check_non_finite opr
GitOrigin-RevId: c35a219e52
|
3 years ago |
Megvii Engine Team
|
6bb5409976
|
feat(dnn/src): add images2neibs kernel of opencl and related test
GitOrigin-RevId: 82242b7437
|
3 years ago |
Megvii Engine Team
|
6ce4a34403
|
feat(dnn): add fallback postprocess
GitOrigin-RevId: 4201a0f158
|
3 years ago |
Megvii Engine Team
|
c96dbd29b8
|
fix(dnn/arm_common): support more monotonous case in arm typecvt for performance
GitOrigin-RevId: 9e28a64d93
|
3 years ago |
Megvii Engine Team
|
ead611e11d
|
perf(dnn): slightly improve arm neon transcendental function performance
GitOrigin-RevId: 210d88f81e
|
3 years ago |