MindSpore / mindspore

[MT][NET][910A][8p][pvt_v2_b0] Training accuracy does not improve

DONE
Bug-Report
Created on 2024-04-27 15:41
name: Bug Report
about: Use this template for reporting a bug
labels: kind/bug

Describe the current behavior (Mandatory)

[pvt_v2_b0] Training accuracy does not improve.

Environment (Mandatory)

  • Hardware Environment (Ascend/GPU/CPU):

Please delete the backend not involved:
/device ascend

  • Software Environment (Mandatory):
    -- MindSpore version (e.g., 1.7.0.Bxxx): master_20240426230938_a87635b6f65bc6225173fec53883351ebf449b66
    -- Python version (e.g., Python 3.7.5): Python 3.7.6
    -- OS platform and distribution (e.g., Linux Ubuntu 16.04): 4.19.90-vhulk2211.3.0.h1543.eulerosv2r10.aarch64
    -- GCC/Compiler version (if compiled from source): gcc version 7.3.0 (GCC)
    -- run package: Milan_C17/20240414
    -- MindSpore package: master_20240426230938_a87635b6f65bc6225173fec53883351ebf449b66

  • Execute Mode (Mandatory) (PyNative/Graph):

Please delete the mode not involved:
/mode graph

Related testcase (Mandatory)

test_ms_lab_pvt_v2_b0_acc3_ascend_train_8p_0008

Steps to reproduce the issue (Mandatory)

1. Get the code from solution_test.
2. cd solution_test/cases/02network/00cv/pvt_v2_b0/train/
3. pytest -s test_ms_lab_pvt_v2_b0_acc3_ascend_train_8p_0008.py
4. Check whether the training accuracy is normal.

Describe the expected behavior (Mandatory)

[pvt_v2_b0] Training accuracy improves normally.

Related log / screenshot (Mandatory)

[2024-04-27 05:10:13] mindcv.utils.callbacks INFO - Epoch: [34/34], batch: [1251/1251], loss: 9.847733, lr: 0.000996, time: 389.866129s
[2024-04-27 05:10:23] mindcv.utils.callbacks INFO - Validation Top_1_Accuracy: 0.1000%, Top_5_Accuracy: 0.5840%, time: 10.636981s
[2024-04-27 05:10:25] mindcv.utils.callbacks INFO - Saving model to ./ckpt/pvt_v2_b0-34_1251.ckpt
[2024-04-27 05:10:27] mindcv.utils.callbacks INFO - Total time since last epoch: 404.670255(train: 389.894190, val: 10.636981)s, ETA: 0.000000s
[2024-04-27 05:10:27] mindcv.utils.callbacks INFO - --------------------------------------------------------------------------------
[INFO] RUNTIME(66342,python):2024-04-27-05:13:32.464.990 [engine.cc:1709] 71397 ReportTimeoutProc: report timeout! streamId=1, taskId=357, execId=355, pendingNum=9, reportCount=427, parseTaskCount=427, msec=7991049, curSec=8175284
[INFO] RUNTIME(66342,python):2024-04-27-05:16:36.785.136 [engine.cc:1709] 71397 ReportTimeoutProc: report timeout! streamId=1, taskId=357, execId=355, pendingNum=9, reportCount=427, parseTaskCount=427, msec=7991049, curSec=8359604
[2024-04-27 05:17:30] mindcv.utils.callbacks INFO - Epoch: [35/34], batch: [1251/1251], loss: 10.055557, lr: 0.000996, time: 422.013266s
[2024-04-27 05:17:40] mindcv.utils.callbacks INFO - Validation Top_1_Accuracy: 0.1000%, Top_5_Accuracy: 0.5840%, time: 10.416377s
[2024-04-27 05:17:41] mindcv.utils.callbacks INFO - Saving model to ./ckpt/pvt_v2_b0-35_1251.ckpt
[2024-04-27 05:17:44] mindcv.utils.callbacks INFO - Total time since last epoch: 436.063551(train: 422.062537, val: 10.416377)s, ETA: -436.063551s
[2024-04-27 05:17:44] mindcv.utils.callbacks INFO - --------------------------------------------------------------------------------
[INFO] RUNTIME(66342,python):2024-04-27-05:20:48.684.866 [engine.cc:1709] 71397 ReportTimeoutProc: report timeout! streamId=1, taskId=531, execId=529, pendingNum=9, reportCount=587, parseTaskCount=587, msec=8427122, curSec=8611503
[INFO] RUNTIME(66342,python):2024-04-27-05:23:53.004.959 [engine.cc:1709] 71397 ReportTimeoutProc: report timeout! streamId=1, taskId=531, execId=529, pendingNum=9, reportCount=587, parseTaskCount=587, msec=8427122, curSec=8795824
[2024-04-27 05:24:37] mindcv.utils.callbacks INFO - Epoch: [36/34], batch: [1251/1251], loss: 10.048895, lr: 0.000996, time: 413.378937s
[2024-04-27 05:24:48] mindcv.utils.callbacks INFO - Validation Top_1_Accuracy: 0.1000%, Top_5_Accuracy: 0.5840%, time: 11.029083s
[2024-04-27 05:24:50] mindcv.utils.callbacks INFO - Saving model to ./ckpt/pvt_v2_b0-36_1251.ckpt
[2024-04-27 05:24:52] mindcv.utils.callbacks INFO - Total time since last epoch: 428.123073(train: 413.405590, val: 11.029083)s, ETA: -856.246147s
[2024-04-27 05:24:52] mindcv.utils.callbacks INFO - --------------------------------------------------------------------------------
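
The symptoms in the log above can be checked mechanically. The following is a minimal sketch, not part of the original report: it parses the mindcv callback lines quoted here and flags three things visible in them, namely that the epoch counter runs past the configured total (35/34, 36/34), that validation accuracy is identical in every epoch, and that the negative ETA values are consistent with (total epochs - current epoch) x time per epoch. The ETA relation is inferred from the numbers only, not from the mindcv source. `NUM_CLASSES = 1000` is an assumption (the report does not name the dataset); under it, a Top-1 of 0.1000% equals random-guess accuracy. All function names are hypothetical helpers.

```python
import re

# mindcv callback lines copied from the log above (RUNTIME and checkpoint lines omitted).
LOG = """\
[2024-04-27 05:10:13] mindcv.utils.callbacks INFO - Epoch: [34/34], batch: [1251/1251], loss: 9.847733, lr: 0.000996, time: 389.866129s
[2024-04-27 05:10:23] mindcv.utils.callbacks INFO - Validation Top_1_Accuracy: 0.1000%, Top_5_Accuracy: 0.5840%, time: 10.636981s
[2024-04-27 05:10:27] mindcv.utils.callbacks INFO - Total time since last epoch: 404.670255(train: 389.894190, val: 10.636981)s, ETA: 0.000000s
[2024-04-27 05:17:30] mindcv.utils.callbacks INFO - Epoch: [35/34], batch: [1251/1251], loss: 10.055557, lr: 0.000996, time: 422.013266s
[2024-04-27 05:17:40] mindcv.utils.callbacks INFO - Validation Top_1_Accuracy: 0.1000%, Top_5_Accuracy: 0.5840%, time: 10.416377s
[2024-04-27 05:17:44] mindcv.utils.callbacks INFO - Total time since last epoch: 436.063551(train: 422.062537, val: 10.416377)s, ETA: -436.063551s
[2024-04-27 05:24:37] mindcv.utils.callbacks INFO - Epoch: [36/34], batch: [1251/1251], loss: 10.048895, lr: 0.000996, time: 413.378937s
[2024-04-27 05:24:48] mindcv.utils.callbacks INFO - Validation Top_1_Accuracy: 0.1000%, Top_5_Accuracy: 0.5840%, time: 11.029083s
[2024-04-27 05:24:52] mindcv.utils.callbacks INFO - Total time since last epoch: 428.123073(train: 413.405590, val: 11.029083)s, ETA: -856.246147s
"""

NUM_CLASSES = 1000  # ASSUMPTION: not stated in the report

EPOCH_RE = re.compile(r"Epoch: \[(\d+)/(\d+)\].*?loss: ([\d.]+)")
VAL_RE = re.compile(r"Top_1_Accuracy: ([\d.]+)%, Top_5_Accuracy: ([\d.]+)%")
ETA_RE = re.compile(r"Total time since last epoch: ([\d.]+)\(.*ETA: (-?[\d.]+)s")


def parse(log):
    """Group the Epoch / Validation / Total-time lines into one dict per epoch."""
    records, current = [], None
    for line in log.splitlines():
        m = EPOCH_RE.search(line)
        if m:
            current = {"epoch": int(m.group(1)), "total": int(m.group(2)),
                       "loss": float(m.group(3))}
            records.append(current)
            continue
        if current is None:
            continue
        m = VAL_RE.search(line)
        if m:
            current["top1"], current["top5"] = float(m.group(1)), float(m.group(2))
            continue
        m = ETA_RE.search(line)
        if m:
            current["epoch_time"], current["eta"] = float(m.group(1)), float(m.group(2))
    return records


def overrun_epochs(records):
    """Epochs whose index exceeds the configured total."""
    return [r["epoch"] for r in records if r["epoch"] > r["total"]]


def accuracy_is_flat(records):
    """True if Top-1 accuracy is the same in every parsed epoch."""
    return len({r["top1"] for r in records}) == 1


def eta_matches(record, tol=1e-5):
    """Check the inferred relation ETA = (total - epoch) * epoch_time."""
    expected = (record["total"] - record["epoch"]) * record["epoch_time"]
    return abs(record["eta"] - expected) < tol


if __name__ == "__main__":
    recs = parse(LOG)
    chance_top1 = 100.0 / NUM_CLASSES  # percent, under the assumption above
    print("epochs past total:", overrun_epochs(recs))
    print("accuracy flat:", accuracy_is_flat(recs))
    print("top-1 at chance level:", all(abs(r["top1"] - chance_top1) < 1e-9 for r in recs))
    print("ETA consistent:", all(eta_matches(r) for r in recs))
```

Pointing `parse` at the full training log instead of the excerpt would show from which epoch the accuracy stopped moving, which the three epochs quoted here cannot tell.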

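Special notes for this issue (Optional)

This may be related to the following issue:
https://e.gitee.com/mind_spore/issues/list?issue=I9HVXV

Last version in which accuracy improved normally, i.e. the previous CI-dispatched run of this test case:
run package: Milan_C17/20240414
MindSpore package: master_20240417142516_74e1f3ea86a988bceb71bd7f02b674af4c97c89a/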
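
Since the run package is the same in both runs and only the MindSpore package differs, the two package names bound the range of commits worth inspecting. Below is a hedged sketch of how that range could be derived. It assumes the package names follow a `<branch>_<YYYYmmddHHMMSS>_<40-char commit>` pattern, which is read off the two identifiers in this report rather than taken from any documented naming scheme; `parse_build` is a hypothetical helper. The `A..B` range in the printed command is standard git revision syntax.

```python
import re
from datetime import datetime

# ASSUMPTION: package names look like <branch>_<YYYYmmddHHMMSS>_<40-hex commit>,
# optionally followed by a trailing slash (as in the "last good" entry above).
PKG_RE = re.compile(
    r"^(?P<branch>[A-Za-z0-9.\-]+)_(?P<stamp>\d{14})_(?P<commit>[0-9a-f]{40})/?$"
)


def parse_build(name):
    """Split a package identifier into branch, build time, and commit hash."""
    m = PKG_RE.match(name.strip())
    if m is None:
        raise ValueError(f"unrecognized package name: {name!r}")
    return {
        "branch": m.group("branch"),
        "built": datetime.strptime(m.group("stamp"), "%Y%m%d%H%M%S"),
        "commit": m.group("commit"),
    }


# Identifiers copied verbatim from this report.
GOOD = "master_20240417142516_74e1f3ea86a988bceb71bd7f02b674af4c97c89a/"
BAD = "master_20240426230938_a87635b6f65bc6225173fec53883351ebf449b66"

good, bad = parse_build(GOOD), parse_build(BAD)
window = bad["built"] - good["built"]

if __name__ == "__main__":
    # Commits reachable from the failing build but not from the last good one.
    print(f"git log --oneline {good['commit']}..{bad['commit']}")
    print("time between builds:", window)
```

The printed command only lists candidate commits; whether the regression actually lies in that range is not confirmed by this report.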

Comments (6)

chentangyu created the Bug-Report
chentangyu added the kind/bug label
chentangyu added the attr/accuracy label
chentangyu added the device/ascend label
chentangyu added the v2.3.0.rc2 label
chentangyu added collaborator Shawny
chentangyu added collaborator chentangyu

Please assign a maintainer to check this issue.
@chentangyu

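Thank you for your question. You can comment //mindspore-assistant to get help faster:

  1. If you are new to MindSpore, you may find the answer in the tutorials.
  2. If you are an experienced PyTorch user, you may need:
  3. If you run into a dynamic graph (PyNative) problem, you can set set_context(pynative_synchronize=True) to view the error stack and help locate it.
  4. For model accuracy tuning problems, refer to the tuning guide on the official website.
  5. If you are reporting a framework bug, please confirm that the issue provides the information needed to locate it: the MindSpore version, the backend type (CPU, GPU, Ascend), the environment, an official link to the training code, and the launch command that reproduces the error.
  6. If you have already identified the root cause, you are welcome to submit a PR to the MindSpore open source community; we will review it as soon as possible.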
chentangyu edited the description
Shawny changed the assignee from PingqiLi to bantao
Shawny added collaborator PingqiLi
Shawny changed the milestone from B-SIG-Kit to B-SIG-OPS

Same problem as I9HVXV.

i-robot added the foruda label
chentangyu removed the v2.3.0.rc2 label
chentangyu added the v2.3.0 label
wangbixing added the v2.3.0.rc2 label
i-robot added the gitee label
bantao changed the milestone from B-SIG-OPS to B-MDTest
bantao changed the task status from TODO to VALIDATION
bantao added collaborator bantao
bantao changed the assignee from bantao to chentangyu
bantao removed collaborator chentangyu
chentangyu removed the v2.3.0 label

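The master-branch 430 CI package has not been released yet, so the PR package provided by the developers was used for verification.
Regression passed on 910A.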

chentangyu changed the task status from VALIDATION to DONE
