2350 Commits (286051ede10bdb3603a0873f1bf5ed5f94feac16)
 

Author SHA1 Message Date
  Megvii Engine Team 286051ede1 feat(dnn): differentiate sass kernel with cuda version 3 years ago
  Megvii Engine Team f78b60ec10 feat(bazel): make bazel gensass depend on cuda toolchain version automatically 3 years ago
  Megvii Engine Team f48227c07d feat(mgb): show more details for cuda driver api call 3 years ago
  Megvii Engine Team bb5af9b475 feat(lite): hidden lar gflags symbols for static link 3 years ago
  Megvii Engine Team d8bb3ff5b4 fix(cuda): fix fp16 tensorcore gemm split k workspace 3 years ago
  Megvii Engine Team 597efed40b feat(lite): add get last error code interface in lite c 3 years ago
  Megvii Engine Team 90c8a58cca docs(docstring): add pad docstring 3 years ago
  Megvii Engine Team 5f4501e0f3 fix(gopt): fix conv bias fuse 2 noline 3 years ago
  Megvii Engine Team ac2f548c9a docs(imperative/dataloader): update preload description 3 years ago
  Megvii Engine Team 73b518b718 feat(lite): add get physic addr interface in lite 3 years ago
  Megvii Engine Team f67086adde fix(lite): fix lite global layout transform symvar replace error 3 years ago
  megvii-mge 4462953fba feat(mge/third_party): update cutlass version 3 years ago
  Megvii Engine Team d7b0994a3e feat(cuda): add fp16 compute 16 kernel 3 years ago
  Megvii Engine Team 8a2e92bd6c refactor(cuda): depthwise large kernel 3 years ago
  Megvii Engine Team 6b8a69d5b6 feat(cuda): float16 depthwise large kernel conv compute fp32 3 years ago
  Megvii Engine Team bc385b5374 feat(cuda): support float16 depthwise large kernel conv 3 years ago
  Megvii Engine Team 7d2063e35a perf(cuda): speedup conv backward data with small feature map and large filter size 3 years ago
  Megvii Engine Team 72403e8929 perf(cuda): speedup chanwise conv with small feature map and large filter size 3 years ago
  Megvii Engine Team 28d48f2f7a fix(mgb/src): fix megbrain cmake unsupport android_nn 3 years ago
  Megvii Engine Team ab6d12caff feat(mge): add conv padding mode 3 years ago
  Megvii Engine Team 177001d5e5 refactor(dispatch): allow dynamic type creation 3 years ago
  Megvii Engine Team 150a6a6151 perf(dispatch/trace): remove unnecessary h2d for constant 3 years ago
  Megvii Engine Team 81d8c73a41 perf(dispatch/trace): several tricks to speed up trace 3 years ago
  Megvii Engine Team 4fa6162027 perf(dispatch): improve performance of dispatch system 3 years ago
  Megvii Engine Team ca00177719 perf(dispatch): speed up dispatch system 3 years ago
  Megvii Engine Team 187c1dc081 fix(jit): copy aux var when shallow copying JITExecutor 3 years ago
  Megvii Engine Team 7bd848ce04 fix(subgraph): fix hand-written backward for several jit-elemwise ops 3 years ago
  Megvii Engine Team 7be7656c9f fix(imperative): explicitly manage global structures 3 years ago
  Megvii Engine Team 62034fb262 fix(imperative): make CompNode finalize happen before global object destructor 3 years ago
  Megvii Engine Team 59cbf9583d fix(subgraph): use CompiledOp in cpu to avoid workspace error 3 years ago
  Megvii Engine Team b6ce02a152 fix(subgraph): fall back to cg if jit unsupported 3 years ago
  Megvii Engine Team 21f5a7fcc0 fix(subgraph): fix device recognition and scalar propagate 3 years ago
  Megvii Engine Team 27346b0b65 test(opr): add scalar check for opr_test 3 years ago
  Megvii Engine Team 225045236b perf(imperative): improve shape inference 3 years ago
  Megvii Engine Team df3474ca1d perf(functional): rewrite several elemwise ops with jit subgraph 3 years ago
  Megvii Engine Team c55fda9a7c fix(fastrun): don't kill profiling worker 3 years ago
  Megvii Engine Team 2775f4580c feat(subgraph): subgraph builder supports jit and custom grad 3 years ago
  Megvii Engine Team 3c61e0e02a feat(ops): add JITFusion op 3 years ago
  Megvii Engine Team aa587446fc feat(subgraph): support shape inference for CompiledOp 3 years ago
  Megvii Engine Team 1c1e9b002d fix(rng): init layout strides 3 years ago
  Megvii Engine Team 9527859cc8 feat(opcache): add ndim and has_value to cache key 3 years ago
  Megvii Engine Team cbb47089a6 perf(interpreter): add fastpath for GetVarShape 3 years ago
  Megvii Engine Team b458178847 feat(opr): add mutable tensor opr 3 years ago
  Megvii Engine Team 47fe766310 feat(dnn/cuda): add implicit bmm kernels for large kernel depthwise convolution backward filter opr 3 years ago
  Megvii Engine Team dcc9693582 feat(dnn/cuda): add heuristic rule for implicit batched gemm large kernel dwconv2d kernels 3 years ago
  Megvii Engine Team 6cefabe734 fix(dnn/cuda): fix ci 3 years ago
  Megvii Engine Team 888f4e46ae feat(dnn/cuda): add implicit bmm large kernel dwconv2d dgrad kernels 3 years ago
  Megvii Engine Team 08d8635ff5 feat(dnn/cuda): add implicit bmm large kernel dwconv2d fprop impl 3 years ago
  Megvii Engine Team 93ceb80ad2 refactor(imperative): fix broadcast,reshape,reduce 3 years ago
  Megvii Engine Team d919aaebc7 test(imperative): reopen special interpolate test and sync when test rng 3 years ago