"Unexpected end of /proc/mounts line `overlay" on p3.8xlarge

Hello,

I’m getting a strange message when running my training script on a p3.8xlarge instance. The message does not occur on a p3.2xlarge instance. I don’t know whether this is an AMI, Docker image, or instance issue, so feel free to move my post to the right section.

I’m not sure whether this message affects my training, because everything seems to be running fine, but I’d rather ask :-)

Unexpected end of /proc/mounts line `overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/HDKKNJFWH3N4KASB6A45SSP6CY:/var/lib/docker/overlay2/l/R7JYTFIN5UAGQWX26JNXG6YP5W:/var/lib/docker/overlay2/l/HJWZVSQB72SY42K7RHHCWGJ26T:/var/lib/docker/overlay2/l/WK3VOV5D7YBN5M75PAQMMB6PLB:/var/lib/docker/overlay2/l/R2ZKF77GYBONTEO2EBRSCLECI5:/var/lib/docker/overlay2/l/2ALSS6OFGBX2IZ2M2CID5KZWT5:/var/lib/docker/overlay2/l/KAZFDKEQYECLJVXNCU6CNCKSQF:/var/lib/docker/overlay2/l/72FJ5PC4LULGFTBBKBTCNN73WH:/var/lib/docker/overlay2/l/4YXTSXF42BN4E'
Unexpected end of /proc/mounts line `DSR4UGDLMG42F:/var/lib/docker/overlay2/l/7EMX6CD7JXBXJH2VIWNXMTOVIV:/var/lib/docker/overlay2/l/E2IQH5TDAK7GGMT7UW42Y4HDKD:/var/lib/docker/overlay2/l/BPZCHVZ5US2MJZJWHIM66DRAZO:/var/lib/docker/overlay2/l/MDB3K2O6XCTTFNHVML5A6UAOXJ:/var/lib/docker/overlay2/l/VS3X5J2EUB4P7R6JUJUDOCFQL5:/var/lib/docker/overlay2/l/2KCEL7WCOFRP7CFCYBRMWRKOZC:/var/lib/docker/overlay2/l/XDO5EL3TFTKGTKRQVTUPBKY4O3:/var/lib/docker/overlay2/l/66CEBDJAIZJ2ZXSDV6YPNZ3RMB:/var/lib/docker/overlay2/l/FHA6MAWU3R3Z3UPWLDKHP7M3VB:/var/lib/do'
Unexpected end of /proc/mounts line `cker/overlay2/l/HJDBAOVOKVPDCIP2LAFL4VAXEC:/var/lib/docker/overlay2/l/KLCT3CSJ4XS2VBNP4ZBP4PZ3AW:/var/lib/docker/overlay2/l/IFEMT2MMRRNPOFUX62OAHQLFYP:/var/lib/docker/overlay2/l/JC2WISYN2KYQ7LF4R7ZG57KG4I:/var/lib/docker/overlay2/l/AHK5K5EAGHVRDGPA4R3PMQ5HLN:/var/lib/docker/overlay2/l/4EVNGFBNV4S3FKKIGN3XDWTE4Y:/var/lib/docker/overlay2/l/5EXPUJMDKQ2G7TVGRPBVTAQJOQ:/var/lib/docker/overlay2/l/TNUPFWPGBHINQCKSTRA3KALRQE:/var/lib/docker/overlay2/l/SON3ZQOPOUXUC6FCBQT5DWFLFT:/var/lib/docker/overlay2/l/EOJQKLPNK'
Unexpected end of /proc/mounts line `4KJAADXWNESTV6DV5:/var/lib/docker/overlay2/l/4TZIIEAJQOSGLGTI7TTEDAKMHM:/var/lib/docker/overlay2/l/K2HLSA7XBOY4TGWIOTBP3HTZSA:/var/lib/docker/overlay2/l/RMXRW72LOIIYBEFEEPGMC6FONI:/var/lib/docker/overlay2/l/K65UY2CD3J4B2WM5OQKEY5NQE6:/var/lib/docker/overlay2/l/JMTQMCNGJHMHG5LU3LICEVMA4P:/var/lib/docker/overlay2/l/ARFTDPLJWJXJEWVL44QTTSYRYK:/var/lib/docker/overlay2/l/VCQ4VXQNQ7VO55W3Z27ATO54FX:/var/lib/docker/overlay2/l/BOY46UWZQ72JAFHJSOM4NXACY3:/var/lib/docker/overlay2/l/GQJUZY566ZHAZH4QZMGBT5S2IA:/var/li'
  • Docker image: nvcr.io/nvidia/pytorch:17.11 with my scripts inside
  • Training dataset on EFS
  • No weird message with p3.2xlarge

This is caused by a bug in the hwloc library (which is used by NVML and other things to detect the system topology) where it cannot correctly parse long lines in /proc/mounts. Container file systems tend to create those long lines on a regular basis due to the long hashes they use in file and directory names. Fortunately this has no impact on the hwloc functionality we actually need.

So while the error messages may look alarming, they are harmless and can be safely ignored.
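
If you want to see which mount entries are long enough to trigger the warning, something like the following may help. It is only a rough sketch: the 512-character threshold is an assumption picked for illustration, not hwloc's actual buffer size, and `long_mount_lines` is a hypothetical helper name rather than anything from hwloc or NVML.

```python
from pathlib import Path

# Assumed threshold for illustration only; this is NOT hwloc's real buffer size.
ASSUMED_LIMIT = 512


def long_mount_lines(mounts_text, limit=ASSUMED_LIMIT):
    """Return (mount_point, line_length) for each mounts entry longer than limit."""
    hits = []
    for line in mounts_text.splitlines():
        if len(line) > limit:
            # /proc/mounts fields: device, mount point, fs type, options, dump, pass
            fields = line.split()
            mount_point = fields[1] if len(fields) > 1 else "?"
            hits.append((mount_point, len(line)))
    return hits


if __name__ == "__main__":
    mounts = Path("/proc/mounts")
    if mounts.exists():
        for mount_point, length in long_mount_lines(mounts.read_text()):
            print(f"{mount_point}: {length} characters")
```

Run inside the container, this would be expected to list the overlay root mount, since its lowerdir option chains together every image layer.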

Thanks,
Cliff

Thanks for the information Cliff :-)

I understand it doesn’t impact functionality, but I’m curious to know whether this will be fixed at some point.

This is fixed in our 396 and later driver series. Thanks!
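
To check whether a machine already runs a driver from the fixed series, a small script along these lines might be useful. It is a sketch under assumptions: it relies on `nvidia-smi` being on the PATH, and `has_hwloc_fix` and `installed_driver_versions` are hypothetical helper names. The only fact taken from this thread is the 396 cutoff.

```python
import shutil
import subprocess

# From the reply above: fixed in the 396 and later driver series.
FIXED_IN_MAJOR = 396


def has_hwloc_fix(driver_version):
    """True if a version string such as '396.26' is in the 396 series or later."""
    major = int(driver_version.strip().split(".")[0])
    return major >= FIXED_IN_MAJOR


def installed_driver_versions():
    """Ask nvidia-smi for each GPU's driver version; empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for version in installed_driver_versions():
        status = "includes the fix" if has_hwloc_fix(version) else "predates the fix"
        print(f"driver {version}: {status}")
```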