name | about | labels |
---|---|---|
Bug Report | Use this template for reporting a bug | kind/bug |
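The SSD tutorial fails to train on macOS. On the test machine, ulimit -a reports a file descriptor limit of 256, which is the macOS default, and training still fails after the machine is rebooted.
Tutorial URL: https://www.mindspore.cn/tutorials/application/zh-CN/master/cv/ssd.html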
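For anyone reproducing this, the limit reported by ulimit -a can also be read and, within the hard limit, raised from inside the Python process. The following is a minimal sketch using the standard resource module; the helper name raise_nofile_limit and the target value 4096 are illustrative assumptions, not part of the tutorial or of MindSpore. Raising the limit only works around the symptom and does not explain why the newer build needs more descriptors than the last passing one.

```python
import resource


def raise_nofile_limit(target=4096):
    """Try to raise the soft RLIMIT_NOFILE to `target`; return the resulting soft limit.

    `target` is an illustrative value, not a recommendation from the issue.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to do

    # The soft limit may never exceed the hard limit.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft <= soft:
        return soft  # current limit is already at least as high

    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    except (ValueError, OSError):
        # Some systems reject the request (for example when a kernel-level
        # per-process maximum is lower); keep the existing limit in that case.
        pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft limit before:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
    print("soft limit after: ", raise_nofile_limit())
```

Calling the helper before the dataset pipeline is created would show whether the failure is purely a matter of the 256 default.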
Hardware Environment (Ascend/GPU/CPU):
/device CPU
Software Environment (Mandatory):
-- MindSpore version (e.g., 1.7.0.Bxxx) :
-- Python version (e.g., Python 3.7.5) :
-- OS platform and distribution (e.g., Linux Ubuntu 16.04):
-- GCC/Compiler version (if compiled from source):
Failing version: r2.3.q1_20240329061516_c99698ba26
Last passing version: r2.3_20240315121520_a24a055ea90a9
Execute Mode (Mandatory) (PyNative/Graph):
/mode pynative
/mode graph
solution_test/cases/03subject_test/06document/02network_cases/test_ms_tutorial_cv_ssd_0001.py
Expected result: the network trains successfully.
Traceback (most recent call last):
File "/Users/jenkins/solution_test/cases/03subject_test/06document/02network_cases/test_ms_tutorial_cv_ssd_0001_PYNATIVE_MODE/ssd.py", line 1004, in <module>
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/site-packages/mindspore/dataset/engine/iterators.py", line 152, in __next__
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/site-packages/mindspore/dataset/engine/iterators.py", line 301, in _get_next
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/site-packages/mindspore/dataset/engine/datasets.py", line 3458, in launch
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/site-packages/mindspore/dataset/engine/datasets.py", line 3488, in create_pool
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/site-packages/mindspore/dataset/engine/datasets.py", line 3569, in _launch_watch_dog
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/multiprocessing/process.py", line 121, in start
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/multiprocessing/context.py", line 277, in _Popen
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/multiprocessing/popen_fork.py", line 64, in _launch
OSError: [Errno 24] Too many open files
Traceback (most recent call last):
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/multiprocessing/util.py", line 300, in _run_finalizers
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/multiprocessing/util.py", line 224, in __call__
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/multiprocessing/util.py", line 133, in _remove_temp_dir
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/shutil.py", line 711, in rmtree
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/shutil.py", line 709, in rmtree
OSError: [Errno 24] Too many open files: '/var/folders/yj/z1vw4dv17bx3y4yvlh0dtml40000gp/T/pymp-ag8kn2jf'
Exception ignored in: <function MapDataset.__del__ at 0x122613af0>
Traceback (most recent call last):
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/site-packages/mindspore/dataset/engine/datasets.py", line 3737, in __del__
self.process_pool.terminate()
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/site-packages/mindspore/dataset/engine/datasets.py", line 3494, in terminate
self.abort_watchdog()
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/site-packages/mindspore/dataset/engine/datasets.py", line 3589, in abort_watchdog
_PythonMultiprocessing._terminate_processes([self.cleaning_process])
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/site-packages/mindspore/dataset/engine/datasets.py", line 3363, in _terminate_processes
p._popen.wait() # pylint: disable=W0212
AttributeError: 'NoneType' object has no attribute 'wait'
Exception ignored in: <function _PythonMultiprocessing.__del__ at 0x12260fdc0>
Traceback (most recent call last):
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/site-packages/mindspore/dataset/engine/datasets.py", line 3293, in __del__
self.terminate()
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/site-packages/mindspore/dataset/engine/datasets.py", line 3494, in terminate
self.abort_watchdog()
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/site-packages/mindspore/dataset/engine/datasets.py", line 3589, in abort_watchdog
_PythonMultiprocessing._terminate_processes([self.cleaning_process])
File "/Users/jenkins/miniconda3/envs/ci/lib/python3.9/site-packages/mindspore/dataset/engine/datasets.py", line 3363, in _terminate_processes
p._popen.wait() # pylint: disable=W0212
AttributeError: 'NoneType' object has no attribute 'wait'
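The two AttributeError traces above look like a secondary effect of the first failure: a multiprocessing.Process whose start() failed (or was never called) has no underlying popen object, so cleanup code that waits on it unconditionally fails. The sketch below illustrates a defensive cleanup pattern with the standard library only. It is an assumption about how such a guard could look, not MindSpore's actual code, and the function name terminate_processes is hypothetical.

```python
import multiprocessing


def terminate_processes(processes):
    """Terminate and join started processes; return the ones that were never started."""
    never_started = []
    for p in processes:
        # Process.pid is None until the process has actually been spawned,
        # which is the case when start() raised (e.g. Errno 24).
        if p.pid is None:
            never_started.append(p)
            continue
        if p.is_alive():
            p.terminate()
        p.join()
    return never_started


if __name__ == "__main__":
    # A process object that was created but never started.
    p = multiprocessing.Process(target=print, args=("hello",))
    skipped = terminate_processes([p])
    print("skipped processes:", len(skipped))
```

With a guard of this kind, the original OSError would remain the only error reported, which makes the log easier to read.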
Assigning to 郭志建.
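We have confirmed that the number of handles used by the dataset module has not increased, and the dataset module currently has unit tests that guard its handle usage.
Unit tests: https://gitee.com/mindspore/mindspore/blob/r2.3/tests/st/dataset/test_dataset_with_multiprocessing.py
https://gitee.com/mindspore/mindspore/blob/r2.3/tests/ut/python/dataset/test_datasets_generator.py : test_generator_multiprocessing_with_fixed_handle
https://gitee.com/mindspore/mindspore/blob/r2.3/tests/ut/python/dataset/test_map.py : test_map_multiprocessing_with_fixed_handle
https://gitee.com/mindspore/mindspore/blob/r2.3/tests/ut/python/dataset/test_var_batch_map.py : test_batch_multiprocessing_with_in_out_rowsize
However, it is not yet clear which module causes the handle count of the overall training process to increase. We suggest that the test team bisect the problem to locate the responsible module.
Based on the above, this issue is being routed back to testing first so that the exact module can be identified and fixed.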
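One way to support the bisection suggested above is to record how many file descriptors the training process holds before and after each stage of the script (dataset creation, network construction, the first training step, and so on). The following is a minimal sketch under stated assumptions: the helper names count_open_fds and track_fds are hypothetical, and it relies on /dev/fd (macOS) or /proc/self/fd (Linux) listing the descriptors of the current process. Descriptors held only by worker subprocesses are not counted.

```python
import contextlib
import os


def count_open_fds():
    """Count descriptors open in this process.

    The listing itself briefly uses a descriptor, so the absolute number may be
    off by one; the difference between two calls is what matters.
    """
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            return len(os.listdir(fd_dir))
    raise RuntimeError("no per-process fd directory found on this platform")


@contextlib.contextmanager
def track_fds(label, report):
    """Store in report[label] how many descriptors the wrapped block left open."""
    before = count_open_fds()
    try:
        yield
    finally:
        report[label] = count_open_fds() - before


if __name__ == "__main__":
    report = {}
    with track_fds("open_devnull", report):
        fd = os.open(os.devnull, os.O_RDONLY)  # stands in for a real pipeline stage
    os.close(fd)
    for label, delta in report.items():
        print(f"{label}: {delta:+d} descriptors")
```

Wrapping each stage of ssd.py in track_fds on both the last passing and the failing build, then comparing the two reports, should narrow down which stage started consuming more descriptors.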
An issue must not be routed back to testing before it has been resolved.