Hello,
I’m getting a weird message when running my training script on a p3.8xlarge instance. The message does not occur on a p3.2xlarge instance. I don’t know whether this is an AMI, Docker image, or instance issue, so feel free to move my post to the right section.
I’m not sure whether the message has any impact on my training, because everything seems to be running fine, but I’d rather ask :-)
Unexpected end of /proc/mounts line `overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/HDKKNJFWH3N4KASB6A45SSP6CY:/var/lib/docker/overlay2/l/R7JYTFIN5UAGQWX26JNXG6YP5W:/var/lib/docker/overlay2/l/HJWZVSQB72SY42K7RHHCWGJ26T:/var/lib/docker/overlay2/l/WK3VOV5D7YBN5M75PAQMMB6PLB:/var/lib/docker/overlay2/l/R2ZKF77GYBONTEO2EBRSCLECI5:/var/lib/docker/overlay2/l/2ALSS6OFGBX2IZ2M2CID5KZWT5:/var/lib/docker/overlay2/l/KAZFDKEQYECLJVXNCU6CNCKSQF:/var/lib/docker/overlay2/l/72FJ5PC4LULGFTBBKBTCNN73WH:/var/lib/docker/overlay2/l/4YXTSXF42BN4E'
Unexpected end of /proc/mounts line `DSR4UGDLMG42F:/var/lib/docker/overlay2/l/7EMX6CD7JXBXJH2VIWNXMTOVIV:/var/lib/docker/overlay2/l/E2IQH5TDAK7GGMT7UW42Y4HDKD:/var/lib/docker/overlay2/l/BPZCHVZ5US2MJZJWHIM66DRAZO:/var/lib/docker/overlay2/l/MDB3K2O6XCTTFNHVML5A6UAOXJ:/var/lib/docker/overlay2/l/VS3X5J2EUB4P7R6JUJUDOCFQL5:/var/lib/docker/overlay2/l/2KCEL7WCOFRP7CFCYBRMWRKOZC:/var/lib/docker/overlay2/l/XDO5EL3TFTKGTKRQVTUPBKY4O3:/var/lib/docker/overlay2/l/66CEBDJAIZJ2ZXSDV6YPNZ3RMB:/var/lib/docker/overlay2/l/FHA6MAWU3R3Z3UPWLDKHP7M3VB:/var/lib/do'
Unexpected end of /proc/mounts line `cker/overlay2/l/HJDBAOVOKVPDCIP2LAFL4VAXEC:/var/lib/docker/overlay2/l/KLCT3CSJ4XS2VBNP4ZBP4PZ3AW:/var/lib/docker/overlay2/l/IFEMT2MMRRNPOFUX62OAHQLFYP:/var/lib/docker/overlay2/l/JC2WISYN2KYQ7LF4R7ZG57KG4I:/var/lib/docker/overlay2/l/AHK5K5EAGHVRDGPA4R3PMQ5HLN:/var/lib/docker/overlay2/l/4EVNGFBNV4S3FKKIGN3XDWTE4Y:/var/lib/docker/overlay2/l/5EXPUJMDKQ2G7TVGRPBVTAQJOQ:/var/lib/docker/overlay2/l/TNUPFWPGBHINQCKSTRA3KALRQE:/var/lib/docker/overlay2/l/SON3ZQOPOUXUC6FCBQT5DWFLFT:/var/lib/docker/overlay2/l/EOJQKLPNK'
Unexpected end of /proc/mounts line `4KJAADXWNESTV6DV5:/var/lib/docker/overlay2/l/4TZIIEAJQOSGLGTI7TTEDAKMHM:/var/lib/docker/overlay2/l/K2HLSA7XBOY4TGWIOTBP3HTZSA:/var/lib/docker/overlay2/l/RMXRW72LOIIYBEFEEPGMC6FONI:/var/lib/docker/overlay2/l/K65UY2CD3J4B2WM5OQKEY5NQE6:/var/lib/docker/overlay2/l/JMTQMCNGJHMHG5LU3LICEVMA4P:/var/lib/docker/overlay2/l/ARFTDPLJWJXJEWVL44QTTSYRYK:/var/lib/docker/overlay2/l/VCQ4VXQNQ7VO55W3Z27ATO54FX:/var/lib/docker/overlay2/l/BOY46UWZQ72JAFHJSOM4NXACY3:/var/lib/docker/overlay2/l/GQJUZY566ZHAZH4QZMGBT5S2IA:/var/li'
- Docker image: nvcr.io/nvidia/pytorch:17.11 with my scripts inside
- Training dataset on EFS
- The message does not appear on p3.2xlarge
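
In case it helps with diagnosis: each fragment in the messages above is 511 characters long, which makes me suspect that something reads /proc/mounts through a fixed-size buffer and the overlay line is simply too long for it. That is only a guess on my part. Below is a minimal sketch to list the over-long lines from inside the container. The function name `long_mount_lines` and the 511-character limit are my own assumptions, not taken from any library.

```python
from pathlib import Path


def long_mount_lines(text, limit=511):
    """Return (line_number, length, device) for each line longer than `limit` characters.

    `limit` is an assumed value based on the fragment length seen in the log.
    """
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > limit:
            fields = line.split()
            # The first field of a mounts entry is the device or filesystem name.
            hits.append((number, len(line), fields[0] if fields else ""))
    return hits


if __name__ == "__main__":
    mounts_path = Path("/proc/mounts")
    if mounts_path.exists():
        for number, length, device in long_mount_lines(mounts_path.read_text()):
            print(f"line {number}: {length} characters, device {device}")
```

If the overlay entry is the only line reported, that would at least confirm the message is about the length of the Docker layer list and not about the dataset on EFS.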