761 Commits (v1.11.1)

Author SHA1 Message Date
  Megvii Engine Team 2b80806f21 perf(imperative/src): improve dot performance 3 years ago
  Megvii Engine Team 3c3fc6f33c refactor(imperative): move python code of elemwise/reduce/conv2d/bn to c++ 3 years ago
  Megvii Engine Team e400b7ffe5 perf(imperative): enable memory forwarding for imperative 3 years ago
  Megvii Engine Team 1ce78aa09b fix(imperative): destruct dnn handles at last 3 years ago
  Megvii Engine Team 3228fb75a5 fix(cuda): conv algo heuristic choose 3 years ago
  Megvii Engine Team 8c415f4ed7 feat(dnn): cuda nhwc nearest resize support not 1 or 3 channel 3 years ago
  Megvii Engine Team 6fb5a34360 build(flatbuffer/cx2): fix cx2 build and fix uclibc build flatbuffer 3 years ago
  Megvii Engine Team 87de704a46 feat(gopt): fuse conv h_swish 3 years ago
  Megvii Engine Team 3726f5cc92 feat(gopt): merger consecutive relayout and dimshuffle to one relayout to optimize CD4 performarce 3 years ago
  Megvii Engine Team ac26bdcef5 fix(cuda): fix direct conv speed and memory problem 3 years ago
  Megvii Engine Team f7994683bd feat(cuda): add large kernel direct conv to heuristic algo chooser 3 years ago
  Megvii Engine Team 6dc0c0b9cc fix(dnn): fix the sync problem in some kernels 3 years ago
  Megvii Engine Team 04193e3bd1 feat(dnn): add nearest mode for remap and resize 3 years ago
  Megvii Engine Team 93c7e45188 feat(arm): delete the reduant implement 3 years ago
  Megvii Engine Team e34a642b31 feat(fallback): reduce support general intrinsic 3 years ago
  Megvii Engine Team 10f23778a8 feat(fallback): add simd general intrinsic 3 years ago
  Megvii Engine Team 286051ede1 feat(dnn): differentiate sass kernel with cuda version 3 years ago
  Megvii Engine Team f78b60ec10 feat(bazel): make bazel gensass depend on cuda toolchain version automatically 3 years ago
  Megvii Engine Team f48227c07d feat(mgb): show more details for cuda driver api call 3 years ago
  Megvii Engine Team d8bb3ff5b4 fix(cuda): fix fp16 tensorcore gemm split k workspace 3 years ago
  Megvii Engine Team d7b0994a3e feat(cuda): add fp16 compute 16 kernel 3 years ago
  Megvii Engine Team 8a2e92bd6c refactor(cuda): depthwish large kernel 3 years ago
  Megvii Engine Team 6b8a69d5b6 feat(cuda): float16 depthwise large kernel conv compute fp32 3 years ago
  Megvii Engine Team bc385b5374 feat(cuda): support float16 depthwise large kernel conv 3 years ago
  Megvii Engine Team 7d2063e35a perf(cuda): speedup conv backward data with small feature map and large filter size 3 years ago
  Megvii Engine Team 72403e8929 perf(cuda): speedup chanwise conv with small feature map and large filter size 3 years ago
  Megvii Engine Team ab6d12caff feat(mge): add conv padding mode 3 years ago
  Megvii Engine Team 47fe766310 feat(dnn/cuda): add implicit bmm kernels for large kernel depthwise convolution backward filter opr 3 years ago
  Megvii Engine Team dcc9693582 feat(dnn/cuda): add heuristic rule for implicit batched gemm large kernel dwconv2d kernels 3 years ago
  Megvii Engine Team 6cefabe734 fix(dnn/cuda): fix ci 3 years ago
  Megvii Engine Team 888f4e46ae feat(dnn/cuda): add implicit bmm large kernel dwconv2d dgrad kernels 3 years ago
  Megvii Engine Team 08d8635ff5 feat(dnn/cuda): add implicit bmm large kernel dwconv2d fprop impl 3 years ago
  Megvii Engine Team 260923e11c perf(aarch64): optimize aarch64 uint16 relayout with block_w==3 3 years ago
  Megvii Engine Team 95ac055538 feat(dnn,mgb,imperative): add diag opr implement 3 years ago
  Megvii Engine Team 39d77fb55a feat(arm): add arm rnn_cell/lstm_cell/lstm optimized kernel 3 years ago
  Megvii Engine Team f509b1be9b fix(build): split elemwise_multi_type cpp 3 years ago
  Megvii Engine Team 3251f50114 fix(mgb/cuda-stub): add libcuda-wrap_11.4.h to fit the CUDA11.4 toolchain 3 years ago
  Megvii Engine Team ee0b95e935 feat(dnn/elemwise/arm_common): support part of arm ternary elemwise multithread 3 years ago
  Megvii Engine Team cbbca5fb10 feat(mge): add softmax op use cudnn api 3 years ago
  Megvii Engine Team 20b42a8c3b fix(dnn): add naive lstm kernel 3 years ago
  Megvii Engine Team 2faa6ea5a9 Merge pull request #213 from kxz18:rnn 3 years ago
  Megvii Engine Team 82be0aaced test(dnn): fix compute capability requirement for NCHWX test 3 years ago
  Megvii Engine Team 3b41840b68 fix(mgb): change caffepooling log level 3 years ago
  Megvii Engine Team 1999307015 feat(mgb/opr): add dropout kernel 3 years ago
  Megvii Engine Team 32717b0ca4 fix(build): split some cpp, which consume two many mem when build 3 years ago
  Megvii Engine Team a93741815b feat(mgb/opr): add layernorm forward and backward kernel 3 years ago
  Megvii Engine Team a404cd7d06 fix(mgb/src): add tensorRT version check 3 years ago
  Megvii Engine Team c53cad2049 feat(cmake): format all cmake file 3 years ago
  Megvii Engine Team a5803058b4 fix(dnn/x86): opt algo order 3 years ago
  Megvii Engine Team 93310c0e4b fix(mgb/gopt): fix cpu global layout transform fastrun error 3 years ago