Megvii Engine Team
|
d9c4ef59fe
|
perf(imperative): using simple hash key in heuristic cache
GitOrigin-RevId: 6fddd612e7
|
3 years ago |
Megvii Engine Team
|
26ea33c6a7
|
perf(imperative): improve convbwd performance
GitOrigin-RevId: cfc8623d7a
|
3 years ago |
Megvii Engine Team
|
3949d425fb
|
feat(core): always show MegEngine version and git commit id
GitOrigin-RevId: 4daa5be6d6
|
3 years ago |
Megvii Engine Team
|
b6ad457269
|
feat(cuda): support int1 simplewq conv
GitOrigin-RevId: 9c37c41bc7
|
3 years ago |
Megvii Engine Team
|
331567af5d
|
fix(opencl/ci): misc optimizations and fixes:
1: fix megbrain test failure on mali 2.1 devices
2: reduce ci time (by about 20min)
GitOrigin-RevId: 4dcdcd48a6
|
3 years ago |
Megvii Engine Team
|
ff6a3bb819
|
fix(fallback): delete the repeated opcaller in fallback and arm_common
GitOrigin-RevId: 87046b8197
|
3 years ago |
Megvii Engine Team
|
547945e854
|
feat(fallback): support general intrinsic in elemwise in fallback
GitOrigin-RevId: 96ff2e88cc
|
3 years ago |
Megvii Engine Team
|
a017bed3aa
|
fix(fallback): rename general intrinsic type and add more intrinsics
GitOrigin-RevId: 37409bae9a
|
3 years ago |
Megvii Engine Team
|
fd6f8e58b0
|
feat(mgb/dtype): add dtype qint1
GitOrigin-RevId: abe9fb68b1
|
3 years ago |
Megvii Engine Team
|
2a900a69cb
|
perf(imperative): improve reduce op performance
GitOrigin-RevId: 26d982a7b8
|
3 years ago |
Megvii Engine Team
|
72a70dd6a7
|
perf(imperative): specialize convolution implementation
GitOrigin-RevId: 33634c550f
|
3 years ago |
Megvii Engine Team
|
2b80806f21
|
perf(imperative/src): improve dot performance
GitOrigin-RevId: 35b5bd164f
|
3 years ago |
Megvii Engine Team
|
3c3fc6f33c
|
refactor(imperative): move python code of elemwise/reduce/conv2d/bn to c++
GitOrigin-RevId: 01b5324392
|
3 years ago |
Megvii Engine Team
|
e400b7ffe5
|
perf(imperative): enable memory forwarding for imperative
GitOrigin-RevId: 7c1993979c
|
3 years ago |
Megvii Engine Team
|
1ce78aa09b
|
fix(imperative): destruct dnn handles at last
GitOrigin-RevId: 7a67c68c55
|
3 years ago |
Megvii Engine Team
|
3228fb75a5
|
fix(cuda): fix conv algo heuristic choice
GitOrigin-RevId: 95c5e7d627
|
3 years ago |
Megvii Engine Team
|
8c415f4ed7
|
feat(dnn): cuda nhwc nearest resize supports channel counts other than 1 or 3
GitOrigin-RevId: 764504c341
|
3 years ago |
Megvii Engine Team
|
6fb5a34360
|
build(flatbuffer/cx2): fix cx2 build and fix uclibc build flatbuffer
GitOrigin-RevId: af851e155f
|
3 years ago |
Megvii Engine Team
|
87de704a46
|
feat(gopt): fuse conv h_swish
GitOrigin-RevId: a3d12991fb
|
3 years ago |
Megvii Engine Team
|
3726f5cc92
|
feat(gopt): merge consecutive relayout and dimshuffle into one relayout to optimize CD4 performance
GitOrigin-RevId: a058776be3
|
3 years ago |
Megvii Engine Team
|
ac26bdcef5
|
fix(cuda): fix direct conv speed and memory problem
GitOrigin-RevId: 6faeeff3b8
|
3 years ago |
Megvii Engine Team
|
f7994683bd
|
feat(cuda): add large kernel direct conv to heuristic algo chooser
GitOrigin-RevId: bc927b6df7
|
3 years ago |
Megvii Engine Team
|
6dc0c0b9cc
|
fix(dnn): fix the sync problem in some kernels
GitOrigin-RevId: df3f7dc51b
|
3 years ago |
Megvii Engine Team
|
04193e3bd1
|
feat(dnn): add nearest mode for remap and resize
GitOrigin-RevId: 31e7b72a78
|
3 years ago |
Megvii Engine Team
|
93c7e45188
|
feat(arm): delete the redundant implementation
GitOrigin-RevId: ff32a3dc8b
|
3 years ago |
Megvii Engine Team
|
e34a642b31
|
feat(fallback): reduce support general intrinsic
GitOrigin-RevId: f250aa7b2a
|
3 years ago |
Megvii Engine Team
|
10f23778a8
|
feat(fallback): add simd general intrinsic
GitOrigin-RevId: ad78ba689f
|
3 years ago |
Megvii Engine Team
|
286051ede1
|
feat(dnn): differentiate sass kernel with cuda version
GitOrigin-RevId: 40bb4423b8
|
3 years ago |
Megvii Engine Team
|
f78b60ec10
|
feat(bazel): make bazel gensass depend on cuda toolchain version automatically
GitOrigin-RevId: 9433f21a91
|
3 years ago |
Megvii Engine Team
|
f48227c07d
|
feat(mgb): show more details for cuda driver api call
GitOrigin-RevId: 40e63d9dac
|
3 years ago |
Megvii Engine Team
|
d8bb3ff5b4
|
fix(cuda): fix fp16 tensorcore gemm split k workspace
GitOrigin-RevId: d04a0e0985
|
3 years ago |
Megvii Engine Team
|
d7b0994a3e
|
feat(cuda): add fp16 compute 16 kernel
GitOrigin-RevId: e03435be02
|
3 years ago |
Megvii Engine Team
|
8a2e92bd6c
|
refactor(cuda): depthwise large kernel
GitOrigin-RevId: dade8710b4
|
3 years ago |
Megvii Engine Team
|
6b8a69d5b6
|
feat(cuda): float16 depthwise large kernel conv compute fp32
GitOrigin-RevId: 3050d48f26
|
3 years ago |
Megvii Engine Team
|
bc385b5374
|
feat(cuda): support float16 depthwise large kernel conv
GitOrigin-RevId: fdc1b15fbc
|
3 years ago |
Megvii Engine Team
|
7d2063e35a
|
perf(cuda): speedup conv backward data with small feature map and large filter size
GitOrigin-RevId: 85592bca6b
|
3 years ago |
Megvii Engine Team
|
72403e8929
|
perf(cuda): speedup chanwise conv with small feature map and large filter size
GitOrigin-RevId: e65b2ce856
|
3 years ago |
Megvii Engine Team
|
ab6d12caff
|
feat(mge): add conv padding mode
GitOrigin-RevId: 147ced856e
|
3 years ago |
Megvii Engine Team
|
47fe766310
|
feat(dnn/cuda): add implicit bmm kernels for large kernel depthwise convolution backward filter opr
GitOrigin-RevId: 932e7689e8
|
3 years ago |
Megvii Engine Team
|
dcc9693582
|
feat(dnn/cuda): add heuristic rule for implicit batched gemm large kernel dwconv2d kernels
GitOrigin-RevId: 2d2c213bfd
|
3 years ago |
Megvii Engine Team
|
6cefabe734
|
fix(dnn/cuda): fix ci
GitOrigin-RevId: 8267e5f9dd
|
3 years ago |
Megvii Engine Team
|
888f4e46ae
|
feat(dnn/cuda): add implicit bmm large kernel dwconv2d dgrad kernels
GitOrigin-RevId: fcb7974d62
|
3 years ago |
Megvii Engine Team
|
08d8635ff5
|
feat(dnn/cuda): add implicit bmm large kernel dwconv2d fprop impl
GitOrigin-RevId: feb09ebb58
|
3 years ago |
Megvii Engine Team
|
260923e11c
|
perf(aarch64): optimize aarch64 uint16 relayout with block_w==3
GitOrigin-RevId: fe6aaaac0c
|
3 years ago |
Megvii Engine Team
|
95ac055538
|
feat(dnn,mgb,imperative): add diag opr implementation
GitOrigin-RevId: 43016ffa2b
|
3 years ago |
Megvii Engine Team
|
39d77fb55a
|
feat(arm): add arm rnn_cell/lstm_cell/lstm optimized kernel
GitOrigin-RevId: b9bb7352bc
|
3 years ago |
Megvii Engine Team
|
f509b1be9b
|
fix(build): split elemwise_multi_type cpp
GitOrigin-RevId: 13267e9db6
|
3 years ago |
Megvii Engine Team
|
3251f50114
|
fix(mgb/cuda-stub): add libcuda-wrap_11.4.h to fit the CUDA11.4 toolchain
GitOrigin-RevId: efa38f00d1
|
3 years ago |
Megvii Engine Team
|
ee0b95e935
|
feat(dnn/elemwise/arm_common): support part of arm ternary elemwise multithread
BCAST111C_VEC_BCAST111C and BCAST101_VEC_BCAST101
GitOrigin-RevId: 0e26553c90
|
3 years ago |
Megvii Engine Team
|
cbbca5fb10
|
feat(mge): add softmax op using cudnn api
GitOrigin-RevId: 7734ebf8c4
|
3 years ago |