name | about | labels |
---|---|---|
Bug Report | Use this template for reporting a bug | kind/bug |
On the 910B backend, inference of a MatMul operator network through the GE flow fails when the operators are fused in NZ format
Hardware Environment (Ascend/GPU/CPU):
Please delete the backend not involved:
/device ascend
Software Environment (Mandatory):
-- MindSpore version (e.g., 1.7.0.Bxxx) :
-- Python version (e.g., Python 3.7.5) :
-- OS platform and distribution (e.g., Linux Ubuntu 16.04):
-- GCC/Compiler version (if compiled from source):
Execute Mode (Mandatory) (PyNative/Graph):
Please delete the mode not involved:
/mode graph
../prototxt/akg/test_mindir_akg_general_ascend_910_matmul_cpp_func_nz_ge_001.prototxt
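1. Install akg and the custom operators, and enable MS_DEV_DUMP_GRAPH_KERNEL_IR so that the fused subgraphs can be inspected:
export MS_DEV_DUMP_GRAPH_KERNEL_IR=on
pip uninstall akg -y
pip install akg/akg*.whl
bash custom_kernels/ascend/tbe_and_aicpu/install.sh
2. Set the GE backend in the context.
3. Before building the model, load the config file akg_matmul_nz_ge.config: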
[acl_init_options]
ge.exec.formatMode=0
[ascend_context]
provider=ge
[graph_kernel_param]
opt_level=2
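Before running the test case, it can help to confirm that the config file parses and contains the keys this reproduction depends on. The following is a minimal sketch, assuming the file is INI-like enough for Python's `configparser` and that the `[ascend_context]` key is spelled `provider`. The `EXPECTED` checklist and the helper names `parse_config` and `missing_keys` are hypothetical and are not part of MindSpore Lite.

```python
import configparser

# Config text as listed in the steps above; `provider` is the assumed key spelling.
CONFIG_TEXT = """\
[acl_init_options]
ge.exec.formatMode=0
[ascend_context]
provider=ge
[graph_kernel_param]
opt_level=2
"""

# Keys this report relies on, per section (a hypothetical checklist, not an official schema).
EXPECTED = {
    "acl_init_options": ["ge.exec.formatMode"],
    "ascend_context": ["provider"],
    "graph_kernel_param": ["opt_level"],
}


def parse_config(text):
    """Parse INI-style text into {section: {key: value}} while keeping key case."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # preserve case, e.g. "formatMode"
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def missing_keys(config, expected=EXPECTED):
    """List "section.key" entries from the checklist that the config lacks."""
    return [
        f"{section}.{key}"
        for section, keys in expected.items()
        for key in keys
        if key not in config.get(section, {})
    ]


config = parse_config(CONFIG_TEXT)
print(config)
print("missing:", missing_keys(config))
```

A misspelled key shows up in the `missing` list instead of being silently ignored, which is the main reason to run such a check before the model build.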
Expected result: the operators are fused, model inference succeeds, and accuracy is reasonable.
INFO: Step [cpp_predict], cmd : cd /data/local/tmp/; source env.sh; ./test_basic_predict ms_predict /data/local/tmp/ /data/local/tmp/test_mindir_akg_general_ascend_910_matmul_cpp_func_nz_ge_001.prototxt > tmp.log 2>&1 && echo Success || echo Failed; cat tmp.log
==================== WARNING: Skipping akg as it is not installed.
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[notice] A new release of pip is available: 23.1.1 -> 24.0
[notice] To update, run: pip install --upgrade pip
INFO: Step [cpp_predict], res: Looking in indexes: https://mirrors.tools.huawei.com/pypi/simple
Processing ./akg/akg-2.2-cp37-cp37m-linux_aarch64.whl
Requirement already satisfied: scipy>=1.5.2 in /home/ci/miniconda3/envs/torch_1.12_ci3.7/lib/python3.7/site-packages (from akg==2.2) (1.7.0)
Requirement already satisfied: numpy>=1.17.0 in /home/ci/miniconda3/envs/torch_1.12_ci3.7/lib/python3.7/site-packages (from akg==2.2) (1.21.2)
Requirement already satisfied: decorator>=4.4.0 in /home/ci/miniconda3/envs/torch_1.12_ci3.7/lib/python3.7/site-packages (from akg==2.2) (5.1.1)
Installing collected packages: akg
Successfully installed akg-2.2
[runtime] [2024-04-28 09:30:02] [INFO] install package to /usr/local/Ascend/latest/opp/vendors
[runtime] [2024-04-28 09:30:02] [INFO] [ops_custom] process the framework
[runtime] [2024-04-28 09:30:02] [INFO] create /usr/local/Ascend/latest/opp/vendors/mslite_tbe_and_aicpu/framework.
[runtime] [2024-04-28 09:30:02] [INFO] copy new ops framework files ......
[runtime] [2024-04-28 09:30:02] [INFO] [ops_custom] process the op proto
[runtime] [2024-04-28 09:30:02] [INFO] create /usr/local/Ascend/latest/opp/vendors/mslite_tbe_and_aicpu/op_proto.
[runtime] [2024-04-28 09:30:02] [INFO] copy new ops op_proto files ......
[runtime] [2024-04-28 09:30:02] [INFO] [ops_custom] process the op impl
[runtime] [2024-04-28 09:30:02] [INFO] create /usr/local/Ascend/latest/opp/vendors/mslite_tbe_and_aicpu/op_impl.
[runtime] [2024-04-28 09:30:02] [INFO] copy new ops op_impl files ......
[runtime] [2024-04-28 09:30:02] [INFO] no need to upgrade custom.proto files
[runtime] [2024-04-28 09:30:02] [INFO] SUCCESS
Failed
[ptest]Running ms_predictLoadPrototxtConfig
load prototxt file success
Load context info: cpu_bind_mode = 0
SetDeviceID: 0
SetProvider: ge
Config file path is: /data/local/tmp/data/akg_matmul_nz_ge.config
LoadConfig StatusCode:0
Model file path is: /data/local/tmp/data/akg_matmul.mindir
x:resize = false
y0:resize = false
y1:resize = false
[common.cpp] Loading data from: /data/local/tmp//data/akg_matmul_0.bin
[common.cpp]Read Binary Data Over, get tensorSize as: 16384
inputs[i].DataSize() VS size: 16384:16384
Loading data from /data/local/tmp//data/akg_matmul_0.bin to model tensor: x
x:transpose = none
x ---- 0.828125 0.600098 0.157349 0.816406 0.752441 0.419189 0.487793 0.68457 0.876953 0.612793
shape(16,512,)
[common.cpp] Loading data from: /data/local/tmp//data/akg_matmul_1.bin
[common.cpp]Read Binary Data Over, get tensorSize as: 65536
inputs[i].DataSize() VS size: 65536:65536
Loading data from /data/local/tmp//data/akg_matmul_1.bin to model tensor: y0
y0:transpose = none
y0 ---- 0.106445 0.558105 0.843262 0.338623 0.550781 0.361816 0.750977 1.00098 0.727051 0.187622
shape(512,64,)
[common.cpp] Loading data from: /data/local/tmp//data/akg_matmul_2.bin
[common.cpp]Read Binary Data Over, get tensorSize as: 65536
inputs[i].DataSize() VS size: 65536:65536
Loading data from /data/local/tmp//data/akg_matmul_2.bin to model tensor: y1
y1:transpose = none
y1 ---- 0.0227814 0.529297 0.774902 0.550293 0.0733643 0.89502 0.211914 0.0552673 0.292725 0.995605
shape(512,64,)
[ERROR] ME(750599,ffff11452810,test_basic_predict):2024-04-28-09:30:33.943.409 [mindspore/lite/src/extendrt/delegate/ascend_ge/ge_graph_executor.cc:1544] operator()] RunAsync failed.E40021: 2024-04-28-09:30:33.937.537 Failed to compile Op [Default/GraphKernel_MatMul_split-op1Fused_x2_y1]. (oppath: [Compile /usr/local/Ascend/CANN-7.2/opp/vendors/mslite_tbe_and_aicpu/op_impl/ai_core/tbe/mslite_tbe_and_aicpu_impl/custom.py failed with errormsg/stack: File "/home/ci/miniconda3/envs/torch_1.12_ci3.7/lib/python3.7/site-packages/akg/utils/tbe_codegen_utils.py", line 97, in build_npu_for_akg
from tbe.tvm.driver.cce_build_module import _count_time, generate_cce_code
ImportError: cannot import name 'generate_cce_code' from 'tvm.driver.cce_build_module' (/home/ci/miniconda3/envs/torch_1.12_ci3.7/lib/python3.7/site-packages/tbe/tvm/driver/cce_build_module.py),
], optype: [Fused_x2_y1])
Solution: See the host log for details, and then check the Python stack where the error log is reported.
TraceBack (most recent call last):
Compile op[Default/GraphKernel_MatMul_split-op1Fused_x2_y1] failed, oppath[/usr/local/Ascend/CANN-7.2/opp/vendors/mslite_tbe_and_aicpu/op_impl/ai_core/tbe/mslite_tbe_and_aicpu_impl/custom.py], optype[Fused_x2_y1], taskID[14]. Please check op's compilation error message.[FUNC:ReportBuildErrMessage][FILE:fusion_manager.cc][LINE:748]
[SubGraphOpt][Compile][ProcFailedCompTask] Thread[281469532465168] recompile single op[Default/GraphKernel_MatMul_split-op1Fused_x2_y1] failed[FUNC:ProcessAllFailedCompileTasks][FILE:tbe_op_store_adapter.cc][LINE:962]
[SubGraphOpt][Compile][ProcFailedCompTask] Thread[281469532465168] recompile single op[Default/GraphKernel_MatMul_split-op0Fused_x2_y1] failed[FUNC:ProcessAllFailedCompileTasks][FILE:tbe_op_store_adapter.cc][LINE:962]
[SubGraphOpt][Compile][ParalCompOp] Thread[281469532465168] process fail task failed[FUNC:ParallelCompileOp][FILE:tbe_op_store_adapter.cc][LINE:1010]
[SubGraphOpt][Compile][CompOpOnly] CompileOp failed.[FUNC:CompileOpOnly][FILE:op_compiler.cc][LINE:1119]
[GraphOpt][FusedGraph][RunCompile] Failed to compile graph with compiler Normal mode Op Compiler[FUNC:SubGraphCompile][FILE:fe_graph_optimizer.cc][LINE:1385]
Call OptimizeFusedGraph failed, ret:-1, engine_name:AIcoreEngine, graph_name:partition1_rank1_new_sub_graph1[FUNC:OptimizeSubGraph][FILE:graph_optimize.cc][LINE:126]
subgraph 0 optimize failed[FUNC:OptimizeSubGraphWithMultiThreads][FILE:graph_manager.cc][LINE:1021]
[ERROR] ME(750599,ffff7812e010,test_basic_predict):2024-04-28-09:30:33.943.533 [mindspore/lite/src/extendrt/delegate/ascend_ge/ge_graph_executor.cc:1814] RunGraph] Exec compute graph failed, graph id 0
[ERROR] ME(750599,ffff7812e010,test_basic_predict):2024-04-28-09:30:33.943.607 [mindspore/lite/src/extendrt/session/delegate_session.cc:262] RunGraph] GraphSinkSession::RunGraph run graph failed
[ERROR] ME(750599,ffff7812e010,test_basic_predict):2024-04-28-09:30:33.943.639 [mindspore/lite/src/extendrt/cxx_api/model/model_impl.cc:653] Predict] ModelImpl::Predict RunGraph failed with Common error code.
((predict_ret)==(kSuccess))Expectation Failed
Testcase Name: ms_predict
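The traceback in the log points at `akg/utils/tbe_codegen_utils.py` failing to import `generate_cce_code` from `tbe.tvm.driver.cce_build_module`. One possible reading is that the installed akg wheel and the installed CANN `tbe` package do not match, but the log alone does not confirm this. The sketch below is one way to check the import in the same Python environment before running the test case. The module and symbol names are copied from the traceback; the helper `check_symbol` is hypothetical.

```python
import importlib


def check_symbol(module_name, symbol):
    """Return 'ok', 'missing module', or 'missing symbol' for module_name.symbol."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return "missing module"
    return "ok" if hasattr(module, symbol) else "missing symbol"


# Names taken from the traceback in the log above.
CHECKS = [
    ("akg", "__name__"),
    ("tbe.tvm.driver.cce_build_module", "_count_time"),
    ("tbe.tvm.driver.cce_build_module", "generate_cce_code"),
]

for module_name, symbol in CHECKS:
    print(f"{module_name}.{symbol}: {check_symbol(module_name, symbol)}")
```

If the last line reports `missing symbol` while the module itself imports, the failure seen here should be reproducible without building the model at all.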
Please assign a maintainer to check this issue.
@fengyue25
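CI test case:
MatMul operator through the GE flow with akg enabled (NZ format): inference succeeds and accuracy is reasonable
test_mindir_akg_general_ascend_910_matmul_cpp_func_nz_ge_001
Version: 2.3 B230
Hardware: 910B
CI run result: pass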