Parsing Logs
The following examples use a small 150-kilobyte Traefik access.log.
Bad
import tracemalloc

tracemalloc.start()
with open("access.log", "r") as f:
    # f.readlines() loads every line of the file into a list at once
    for line in f.readlines():
        print(line.split()[0])
print(tracemalloc.get_traced_memory())
tracemalloc.stop()

The peak memory consumption was 238.501 kilobytes.
It is important not to load the whole file into memory with f.readlines(); instead, read from the buffer to reduce memory consumption.
Good
import tracemalloc

tracemalloc.start()
with open("access.log", "r") as f:
    # iterating over the file object reads one buffered line at a time
    for line in f:
        print(line.split()[0])
print(tracemalloc.get_traced_memory())
tracemalloc.stop()

The peak memory consumption was 22.762 kilobytes, about 91% smaller than with f.readlines().
Reading from the buffer keeps memory consumption low. In this case, however, the runtime of the two programs was very similar (~0.025 seconds). With larger programs or larger log files, loading everything into memory can incur a performance penalty: if fast RAM runs out, the slower swap memory is used.
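The same streaming idea extends naturally to parsing: a generator can yield one parsed field per line, so memory use stays flat regardless of log size. A minimal sketch, assuming the first whitespace-separated field of each Traefik access-log line is the client address and the file is named access.log:

def client_addresses(path):
    """Yield the first field of each log line without loading the whole file."""
    with open(path, "r") as f:
        for line in f:          # reads from the buffer, one line at a time
            fields = line.split()
            if fields:          # skip blank lines
                yield fields[0]

if __name__ == "__main__":
    for addr in client_addresses("access.log"):
        print(addr)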