Profiling¶
CPU¶
jobstats¶
htop¶
projplot¶
GPU¶
nvidia-smi¶
nvidia-smi dmon -o DT
nvidia-smi --format=noheader,csv --query-compute-apps=timestamp,gpu_name,pid,name,used_memory --loop=1 -f sample_run.log
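The log written by the command above can be summarized after the run. The following is a minimal sketch, assuming each line holds the five queried fields in order (timestamp, gpu_name, pid, name, used_memory) and that used_memory is reported as a number followed by MiB. The function name peak_memory_per_pid and the sample lines are hypothetical and only for illustration.

```python
import csv
from collections import defaultdict


def peak_memory_per_pid(lines):
    """Return {pid: peak used_memory in MiB} from nvidia-smi CSV log lines."""
    peaks = defaultdict(int)
    # skipinitialspace drops the blank that follows each comma in the log.
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) != 5:
            continue  # skip blank or malformed lines
        _timestamp, _gpu_name, pid, _name, used_memory = row
        value, _, unit = used_memory.partition(" ")
        if not pid.isdigit() or not value.isdigit() or unit != "MiB":
            continue  # skip entries where memory is not reported as "<n> MiB"
        peaks[int(pid)] = max(peaks[int(pid)], int(value))
    return dict(peaks)


# Hypothetical sample lines in the assumed format, not real measurements.
sample = [
    "2024/01/01 12:00:00.000, NVIDIA A100-SXM4-40GB, 4242, python, 1024 MiB",
    "2024/01/01 12:00:01.000, NVIDIA A100-SXM4-40GB, 4242, python, 2048 MiB",
    "2024/01/01 12:00:01.000, NVIDIA A100-SXM4-40GB, 5151, python, 512 MiB",
]
print(peak_memory_per_pid(sample))
```

For a real run, pass the open log file instead of the sample list, for example `with open("sample_run.log") as f: peak_memory_per_pid(f)`.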
nvtop¶
PyTorch¶
memory_viz¶
Use the CUDA memory snapshot and memory profiler to debug CUDA out-of-memory (OOM) errors.
Start: torch.cuda.memory._record_memory_history(max_entries=100000)
Save: torch.cuda.memory._dump_snapshot(f"{file_name}.pickle")
Stop: torch.cuda.memory._record_memory_history(enabled=None)
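The start, save, and stop calls above can be wrapped so that a snapshot is written even when the profiled code raises, for example on a CUDA OOM. This is a minimal sketch rather than an official PyTorch helper: the name record_cuda_memory and the memory_api parameter are hypothetical, and only the three torch.cuda.memory calls listed above are used. The underscore-prefixed functions are private APIs and may change between PyTorch versions.

```python
from contextlib import contextmanager


@contextmanager
def record_cuda_memory(file_name, max_entries=100000, memory_api=None):
    """Record CUDA allocation history and dump a snapshot on exit."""
    if memory_api is None:
        # Imported lazily; assumes PyTorch with CUDA support is installed.
        import torch
        memory_api = torch.cuda.memory
    # Start recording allocation events.
    memory_api._record_memory_history(max_entries=max_entries)
    try:
        yield
    finally:
        try:
            # Save the snapshot, also when the body raised (e.g. an OOM).
            memory_api._dump_snapshot(f"{file_name}.pickle")
        finally:
            # Stop recording.
            memory_api._record_memory_history(enabled=None)


# Usage sketch (model and batch are placeholders for your own code):
# with record_cuda_memory("train_step"):
#     loss = model(batch).sum()
#     loss.backward()
```

The memory_api argument exists only so the wrapper can be exercised without a GPU by passing a stand-in object; in normal use leave it unset.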
To visualize the snapshot file, PyTorch has a tool hosted at https://pytorch.org/memory_viz.
Read more: https://pytorch.org/blog/understanding-gpu-memory-1/
TensorFlow¶
TensorBoard¶
tensorboard and tensorboard-data-server are available as a module.
Read more: https://uppmax.github.io/uppmax4DL/tensorboard/