name | about | labels |
---|---|---|
Bug Report | Use this template for reporting a bug | kind/bug |
[vgg16][graph][x86-910 8p] FPS: 4560 < 4600, network performance regresses on X86-910
Model repository: https://gitee.com/mindspore/models/tree/master/official/cv/VGG/vgg16
Network performance regresses on X86-Ascend910 but meets the target on ARM-Ascend910.
vgg16:
X86-Ascend910: 4560 < 4600
ARM-Ascend910: 4868 > 4600
Hardware environment: /device ascend x86/
run package: HiAI/HISI_C30/20230720/
mindspore: r2.1_20230725161523_15377429
Execution mode: /mode graph
Test case repository: solution_test/case/02network/00cv/vgg16/train
Test case: test_ms_vgg16_cifar10_train_infer_910_8p_0001
1. Get the code from models
2. cd models/official/cv/VGG/vgg16
3. Usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATA_PATH] [cifar10|imagenet2012]
4. Verify that the network trains successfully
The network trains successfully and performance reaches 4600 FPS
Train epoch time: 88755.073 ms, per step time: 915.001 ms
Train epoch time: 10426.437 ms, per step time: 107.489 ms
Train epoch time: 10416.589 ms, per step time: 107.388 ms
Train epoch time: 10422.537 ms, per step time: 107.449 ms
Train epoch time: 12669.396 ms, per step time: 130.612 ms
Train epoch time: 10413.061 ms, per step time: 107.351 ms
Train epoch time: 10419.519 ms, per step time: 107.418 ms
Train epoch time: 10429.868 ms, per step time: 107.524 ms
Train epoch time: 10420.229 ms, per step time: 107.425 ms
Train epoch time: 12760.309 ms, per step time: 131.550 ms
Train epoch time: 10408.571 ms, per step time: 107.305 ms
Train epoch time: 10438.841 ms, per step time: 107.617 ms
Train epoch time: 10424.552 ms, per step time: 107.470 ms
Train epoch time: 10422.712 ms, per step time: 107.451 ms
Train epoch time: 12748.803 ms, per step time: 131.431 ms
Train epoch time: 10416.548 ms, per step time: 107.387 ms
Train epoch time: 10420.336 ms, per step time: 107.426 ms
Train epoch time: 10415.631 ms, per step time: 107.378 ms
Train epoch time: 10424.064 ms, per step time: 107.465 ms
Train epoch time: 12739.141 ms, per step time: 131.331 ms
Train epoch time: 10411.635 ms, per step time: 107.336 ms
Train epoch time: 10429.574 ms, per step time: 107.521 ms
Train epoch time: 10410.773 ms, per step time: 107.328 ms
Train epoch time: 10423.095 ms, per step time: 107.455 ms
Train epoch time: 12755.915 ms, per step time: 131.504 ms
Train epoch time: 10414.946 ms, per step time: 107.371 ms
Train epoch time: 10414.219 ms, per step time: 107.363 ms
Train epoch time: 10410.813 ms, per step time: 107.328 ms
Train epoch time: 10409.670 ms, per step time: 107.316 ms
Train epoch time: 12731.053 ms, per step time: 131.248 ms
Train epoch time: 10416.225 ms, per step time: 107.384 ms
Train epoch time: 10419.394 ms, per step time: 107.416 ms
Train epoch time: 10421.844 ms, per step time: 107.442 ms
Train epoch time: 10423.710 ms, per step time: 107.461 ms
Train epoch time: 12872.563 ms, per step time: 132.707 ms
Train epoch time: 10424.155 ms, per step time: 107.466 ms
Train epoch time: 10421.570 ms, per step time: 107.439 ms
Train epoch time: 10437.652 ms, per step time: 107.605 ms
Train epoch time: 10407.323 ms, per step time: 107.292 ms
Train epoch time: 12741.421 ms, per step time: 131.355 ms
Train epoch time: 10407.239 ms, per step time: 107.291 ms
Train epoch time: 10418.948 ms, per step time: 107.412 ms
Train epoch time: 10404.158 ms, per step time: 107.259 ms
Train epoch time: 10413.049 ms, per step time: 107.351 ms
Train epoch time: 12725.214 ms, per step time: 131.188 ms
Train epoch time: 10396.183 ms, per step time: 107.177 ms
Train epoch time: 10411.662 ms, per step time: 107.337 ms
Train epoch time: 10416.259 ms, per step time: 107.384 ms
Train epoch time: 10418.939 ms, per step time: 107.412 ms
Train epoch time: 12768.704 ms, per step time: 131.636 ms
Train epoch time: 10415.551 ms, per step time: 107.377 ms
Train epoch time: 10399.587 ms, per step time: 107.212 ms
Train epoch time: 10426.311 ms, per step time: 107.488 ms
Train epoch time: 10414.772 ms, per step time: 107.369 ms
Train epoch time: 12765.006 ms, per step time: 131.598 ms
Train epoch time: 10415.831 ms, per step time: 107.380 ms
Train epoch time: 10401.095 ms, per step time: 107.228 ms
Train epoch time: 10418.307 ms, per step time: 107.405 ms
Train epoch time: 10411.587 ms, per step time: 107.336 ms
Train epoch time: 12744.037 ms, per step time: 131.382 ms
Train epoch time: 10395.420 ms, per step time: 107.169 ms
Train epoch time: 10407.376 ms, per step time: 107.293 ms
Train epoch time: 10422.577 ms, per step time: 107.449 ms
Train epoch time: 10420.676 ms, per step time: 107.430 ms
Train epoch time: 12731.580 ms, per step time: 131.253 ms
Train epoch time: 10427.068 ms, per step time: 107.496 ms
Train epoch time: 10435.553 ms, per step time: 107.583 ms
Train epoch time: 10423.365 ms, per step time: 107.457 ms
Train epoch time: 10410.719 ms, per step time: 107.327 ms
Train epoch time: 12686.112 ms, per step time: 130.785 ms
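The log above can be summarized with a short script. The following is a minimal sketch and not part of the original report: it parses lines in the format shown above, drops the first epoch as warm-up, and reports the mean per-step time together with the epochs that are notably slower than the median. The function names, the one-epoch warm-up, and the 10% threshold are illustrative assumptions. `samples_per_step` must be supplied by the reader, since the report does not state the batch size.

```python
import re
from statistics import mean, median

# Matches the log format shown above.
LINE_RE = re.compile(r"Train epoch time: ([\d.]+) ms, per step time: ([\d.]+) ms")

def parse_step_times(log_text):
    """Return the per-step times in ms, in the order they appear."""
    return [float(m.group(2)) for m in LINE_RE.finditer(log_text)]

def summarize(step_times, warmup_epochs=1, slow_factor=1.10):
    """Summarize steady-state epochs; warmup_epochs and slow_factor are assumed values."""
    steady = step_times[warmup_epochs:]
    med = median(steady)
    # 1-based epoch numbers whose per-step time exceeds the median by slow_factor.
    slow = [i + warmup_epochs + 1 for i, t in enumerate(steady) if t > med * slow_factor]
    return {"mean_ms": mean(steady), "median_ms": med, "slow_epochs": slow}

def throughput(mean_step_ms, samples_per_step):
    """Samples per second; samples_per_step is not given in the report."""
    return samples_per_step / (mean_step_ms / 1000.0)

# First six lines of the log above, used as sample input.
SAMPLE = """\
Train epoch time: 88755.073 ms, per step time: 915.001 ms
Train epoch time: 10426.437 ms, per step time: 107.489 ms
Train epoch time: 10416.589 ms, per step time: 107.388 ms
Train epoch time: 10422.537 ms, per step time: 107.449 ms
Train epoch time: 12669.396 ms, per step time: 130.612 ms
Train epoch time: 10413.061 ms, per step time: 107.351 ms
"""

if __name__ == "__main__":
    print(summarize(parse_step_times(SAMPLE)))
```

In the log above, every fifth epoch (5, 10, ..., 70) runs at about 131 ms per step instead of about 107 ms, which this summary would flag in `slow_epochs`.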
Assign to 代宇鑫
Please assign maintainer to check this issue.
@sunjiawei999
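Operator execution times differ little, by only 1 ms. Two runs gave essentially the same result. According to the profiling data, the main gap is in the tail phase, about 7 ms. The owner of AllReduce needs to help analyze the environment differences.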
[Profiling screenshots: arm, x86]
Single-card performance differs little. The per step times are:
x86: 69.127750 ms
arm: 69.805500 ms
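2023/7/27 CCB:
Reason for deferral: Performance is normal on ARM, while on X86 it is 0.9% below the baseline. Most external deployments currently use ARM and the drop is small, so the impact is controllable. By CCB decision, the issue is deferred.
Impact: vgg16 performance in static graph mode drops by 0.9% on X86.
Workaround: If users have questions, reply in the community with the performance optimization plan, making clear that there is currently a performance regression on X86 while performance on ARM is normal.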
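The 0.9% figure is consistent with the throughput numbers in the title. As a quick check, assuming the baseline is the 4600 FPS threshold (the report does not say so explicitly), the relative drop can be computed as follows. `relative_drop` is an illustrative helper, not part of any tool.

```python
def relative_drop(measured, baseline):
    """Fraction by which measured falls below baseline (negative if above)."""
    return (baseline - measured) / baseline

# FPS values taken from the issue description; 4600 is assumed to be the baseline.
x86_drop = relative_drop(4560, 4600)  # X86-Ascend910
arm_drop = relative_drop(4868, 4600)  # ARM-Ascend910, negative means above baseline
print(f"x86: {x86_drop:.1%}, arm: {arm_drop:.1%}")
```

For X86 this gives 40 / 4600, which rounds to 0.9%.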
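2023.08.04 CCB:
Stop monitoring on X86 for now and monitor on ARM instead.
Supplementary experiment: run the single-card case in 8 processes and check whether the performance jitter across cards is consistent between ARM and X86.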
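One possible way to compare jitter in the supplementary experiment is the coefficient of variation of per-step time for each process. The sketch below is only an illustration: the CCB note does not specify a metric, and the function names and the sample numbers are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(step_times_ms):
    """Population standard deviation divided by the mean (dimensionless)."""
    return pstdev(step_times_ms) / mean(step_times_ms)

def jitter_by_process(times_by_process):
    """Map each process name to the coefficient of variation of its step times."""
    return {name: coefficient_of_variation(t) for name, t in times_by_process.items()}

# Hypothetical step times in ms, not measurements from this issue.
example = {"rank0": [69.0, 69.2, 69.1], "rank1": [69.0, 75.0, 69.0]}
print(jitter_by_process(example))
```

Running this once per platform and comparing the per-process values would show whether one platform has noticeably more step-time variation than the other.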