RotatingFileHandler replacement with concurrency, gzip and Windows support
This package provides an additional log handler for Python's standard logging package (PEP 282). This handler will write log events to a log file which is rotated when the log file reaches a certain size. Multiple processes can safely write to the same log file concurrently. Rotated logs can be gzipped if desired. Both Windows and POSIX systems are supported. An optional threaded queue logging handler is provided to perform logging in the background.
This is a fork of Lowell Alleman's ConcurrentLogHandler 0.9.1 that fixes a hanging/deadlocking problem.
Summary of other changes:
The package is named concurrent_log_handler (abbreviated CLH in this file).
A use_gzip option to compress rotated logs.
A dependency on the portalocker package, which (on Windows only) depends on PyWin32.
The main use case this is designed to support is when you have a Python application that runs in multiple processes, potentially on multiple hosts connected with a shared network drive, and you want to write all log events to a central log file and have those files rotated based on size and/or time, e.g. daily or hourly.
However, this is not the only way to achieve shared logging from multiple processes. You can also centralize logging by using cloud logging services like Azure Log Monitor, Logstash, etc. Or you can implement your own remote logging server as shown here:
Concurrent-Log-Handler includes a QueueHandler and QueueListener implementation that can be used to perform logging in the background asynchronously, so the thread or process making the log statement doesn't have to wait for its completion. See this section. Using that example code, each process still locks and writes the file separately, so there is no centralized writer. You could also write code to use QueueHandler and QueueListener to queue up log events within each process to be sent to a central server, instead of CLH's model where each process locks and writes to the log file.
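The last idea above, queueing events within each process and forwarding them to a central server, can be sketched with the standard library alone. This is only an illustration and not part of CLH: the function names and the server address are hypothetical, and it assumes a log server is already listening at that address.

```python
import logging
import logging.handlers
import queue


def start_background_forwarding(logger, target_handler):
    """Route the logger's records through an in-process queue.

    The calling thread only pays for a queue put; a listener thread
    hands each record to target_handler (for example a SocketHandler).
    """
    log_queue = queue.Queue(-1)  # unbounded queue
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    listener = logging.handlers.QueueListener(log_queue, target_handler)
    listener.start()
    return listener  # call listener.stop() at shutdown to drain the queue


def make_socket_handler(host="logs.example.internal",
                        port=logging.handlers.DEFAULT_TCP_LOGGING_PORT):
    # Hypothetical central log server address. SocketHandler does not
    # connect until the first record is sent.
    return logging.handlers.SocketHandler(host, port)
```

A process would call start_background_forwarding(logging.getLogger(), make_socket_handler()) once at startup and listener.stop() before exiting. With this design the central server is the only writer, so no file locking is needed on the log file itself.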
The main ConcurrentRotatingFileHandler class supports size-based rotation only. In addition, a ConcurrentTimedRotatingFileHandler class is provided that supports both time-based and size-based rotation. By default, it does hourly time-based rotation and no size rotation. See this section for more details.
You can download and install the package with pip using the following command:
pip install concurrent-log-handler
This will also install the portalocker module, which on Windows in turn depends on pywin32.
If installing from source, use the following command:
python setup.py install
If you plan to modify the code, you should follow this procedure:
Clone the repository.
Create a virtual environment (venv) and activate it.
Install the package in editable mode with the [dev] option: pip install -e .[dev]
Run the tests with tox, or run pytest directly.
Or manually run a single pass of the stress test with specific options:
python tests/stresstest.py --help
python tests/stresstest.py --gzip --num-processes 12 --log-calls=5000
python setup.py clean --all build sdist bdist_wheel
# Copy the .whl file from under the "dist" folder
# or upload with twine:
pip install twine
twine upload dist/concurrent-log-handler-0.9.23.tar.gz dist/concurrent_log_handler-0.9.23-py3-none-any.whl
Concurrent Log Handler (CLH) is designed to allow multiple processes to write to the same logfile concurrently. Each process involved MUST follow these requirements:
You can't serialize a handler instance and reuse it in another process. This means you cannot, for example, pass a CLH handler instance from parent process to child process using the multiprocessing package in spawn mode (or similar techniques that use serialized objects). Each child process must initialize its own CLH instance.
When using the multiprocessing module in "spawn" (non-fork) mode, each child process must create its OWN instance of the handler (ConcurrentRotatingFileHandler). The child target function should call code that initializes a new CLH instance.
This requirement does not apply to threads within a given process. Different threads within a process can use the same CLH instance. Thread locking is handled automatically.
This also does not apply to fork()-based child processes such as gunicorn --preload. Child processes of a fork() call should be able to inherit the CLH object instance.
This limitation exists because the CLH object can't be serialized, passed over a network or pipe, and reconstituted at the other end.
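The spawn-mode requirement can be sketched as follows. This is a minimal example under stated assumptions, not code from the package: the log path, logger name and function names are hypothetical, and the rotation settings are copied from the direct usage example in this file. Only plain data (the task id) crosses the process boundary; each child builds its own handler.

```python
import logging
import multiprocessing as mp
import os

LOGFILE = os.path.abspath("shared_app.log")  # hypothetical shared log path


def init_worker_logging(logfile=LOGFILE):
    """Create a fresh CLH instance inside the current (child) process."""
    # Import and construct here, in the child; no handler is ever pickled.
    from concurrent_log_handler import ConcurrentRotatingFileHandler

    logger = logging.getLogger("worker")
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = ConcurrentRotatingFileHandler(
            logfile, "a", maxBytes=512 * 1024, backupCount=5
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


def worker(task_id):
    # Child target function: set up logging first, then do the work.
    log = init_worker_logging()
    log.info("task %d handled by pid %d", task_id, os.getpid())


def run_workers(count=4):
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=worker, args=(i,)) for i in range(count)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()

# Spawn mode re-imports the main module in each child, so call
# run_workers() only from under an `if __name__ == "__main__":` guard.
```

Every child uses the same file name and the same rotation settings, which the requirements in this section also call for.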
Every process or thread writing to a given logfile must use the same settings, especially those related to file rotation. Also, do not mix different handler classes writing to the same file; for example, do not also use a RotatingFileHandler on the same file.
Special attention may need to be paid when the log file being written to resides on a network shared drive or a cloud synced folder (Dropbox, Google Drive, etc.). Whether the multiprocess advisory lock technique (via portalocker) works in these folders may depend on the details of your configuration.
Note that a lock_file_directory setting (kwarg) now exists (as of v0.9.21) which lets you place the lockfile at a different location from the main logfile. This might solve problems related to trying to lock files in network shares or cloud folders (Dropbox, Google Drive, etc.). However, if multiple hosts are writing to the same shared logfile, they must also have access to the same lock file.
Alternatively, you may be able to set your cloud sync software to ignore all .lock files.
A separate handler instance is needed for each individual log file. For instance, if your app writes to two different log files you will need to set up two CLH instances per process.
Here is a simple direct usage example:
from logging import getLogger, INFO
from concurrent_log_handler import ConcurrentRotatingFileHandler
import os
log = getLogger(__name__)
# Use an absolute path to prevent file rotation trouble.
logfile = os.path.abspath("mylogfile.log")
# Rotate log after reaching 512K, keep 5 old copies.
rotateHandler = ConcurrentRotatingFileHandler(logfile, "a", 512 * 1024, 5)
log.addHandler(rotateHandler)
log.setLevel(INFO)
log.info("Here is a very exciting log message, just for you")
See also the file src/example.py for a configuration and usage example. This shows both the standard non-threaded non-async usage, and the use of the asyncio background logging feature. Under that option, when your program makes a logging statement, it is added to a background queue and may not be written immediately and synchronously. This queue can span multiple processes using multiprocessing or concurrent.futures, and spanning multiple hosts works due to the use of file locking on the log file. Note that with this async logging feature, there is currently no way for the caller to know when the logging statement completed (no "Promise" or "Future" object is returned).
To use this module from a logging config file, use a handler entry like this:
[handler_hand01]
class = handlers.ConcurrentRotatingFileHandler
level = NOTSET
formatter = form01
args = ("rotating.log", "a")
kwargs = {'backupCount': 5, 'maxBytes': 1048576, 'use_gzip': True}
That sets the files to be rotated at about 1 MB (1048576 bytes), and to keep the last 5 rotations. It also turns on gzip compression for rotated files.
Please note that Python 3.7 and higher accepts keyword arguments (kwargs) in a logging config file, but earlier versions of Python only accept positional args.
Note: you must have an import concurrent_log_handler before you call fileConfig(). For more information see Python docs on log file formats.
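A complete config file needs a few more sections than the handler entry shown above. The following sketch wraps that entry in the minimal set of sections that fileConfig() expects and loads it. The logging.ini file name, the format string and the helper function names are assumptions made for illustration.

```python
import ast
import configparser
import logging
import logging.config

CONFIG_TEXT = """\
[loggers]
keys = root

[handlers]
keys = hand01

[formatters]
keys = form01

[logger_root]
level = INFO
handlers = hand01

[handler_hand01]
class = handlers.ConcurrentRotatingFileHandler
level = NOTSET
formatter = form01
args = ("rotating.log", "a")
kwargs = {'backupCount': 5, 'maxBytes': 1048576, 'use_gzip': True}

[formatter_form01]
format = %(asctime)s %(process)d %(levelname)s %(message)s
"""


def handler_kwargs(config_text=CONFIG_TEXT, section="handler_hand01"):
    """Parse the handler's kwargs entry, e.g. to double-check maxBytes."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return ast.literal_eval(parser[section]["kwargs"])


def configure_logging(config_path="logging.ini"):
    """Write the config file (hypothetical name) and load it."""
    with open(config_path, "w", encoding="utf-8") as config_file:
        config_file.write(CONFIG_TEXT)
    # This import has to run before fileConfig() so that the
    # handlers.ConcurrentRotatingFileHandler name can be resolved.
    import concurrent_log_handler  # noqa: F401
    logging.config.fileConfig(config_path, disable_existing_loggers=False)
```

Calling handler_kwargs()["maxBytes"] before deployment is a cheap way to confirm the rotation size in bytes that the handler will actually receive.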
The size-based rotation limit (maxBytes) is not strict. The files may become slightly larger than maxBytes. How much larger depends on the size of the log message being written when the rollover occurs.
By contrast, the base RotatingFileHandler class tries to ensure that the log file is always kept under maxBytes, taking into account the size of the current log message being written. This limitation may be changed in the future.
For best performance, avoid setting the backupCount (number of rollover files to keep) too high. What counts as "too high" is situational, but a good rule of thumb might be to keep around a maximum of 20 rollover files. If necessary, increase maxBytes so that each file can hold more. Too many rollover files can slow down the rollover process due to the mass file renames, and the rollover occurs while the file lock is held for the main logfile.
How big to allow each file to grow (maxBytes) is up to your needs, but generally a value of 10 MB (10485760) to 100 MB (104857600) is reasonable.
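When picking these two settings it can help to work out the byte values and a rough disk budget. The helper below is a hypothetical convenience, not part of CLH. It ignores gzip compression, and because maxBytes is not a strict limit, real usage can be slightly higher.

```python
MB = 1024 * 1024  # bytes per (binary) megabyte


def approx_max_disk_usage(max_bytes, backup_count):
    """Active log file plus every kept rollover file, uncompressed."""
    return max_bytes * (backup_count + 1)


small = 10 * MB    # 10485760 bytes
large = 100 * MB   # 104857600 bytes

# 10 MB files with 20 rollovers kept: about 210 MB on disk.
budget = approx_max_disk_usage(small, 20)
print(small, large, budget)
```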
Gzip compression is turned off by default. If enabled it will reduce the storage needed for rotated files, at the cost of some minimal CPU overhead. Use of the background logging queue shown below can help offload the cost of logging to another thread.
Sometimes you may need to place the lock file at a different location from the main log file. A lock_file_directory setting (kwarg) now exists (as of v0.9.21) which lets you place the lockfile at a different location. This can often solve problems related to trying to lock files in cloud folders (Dropbox, Google Drive, OneDrive, etc.). However, in order for this to work, each process writing to the log must have access to the same lock file location, even if they are running on different hosts.
You can set the namer attribute of the handler to customize the naming of the rotated files, in line with the BaseRotatingHandler class. See the Python docs for more details.
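As an illustration of what a namer looks like: it is a callable that receives the default rotated filename and returns the name to use instead. The function below is a hypothetical example that moves the numeric suffix in front of the extension. How a custom name interacts with gzip and with CLH's cleanup of old files is not covered here, so test it against your own settings.

```python
import os


def suffix_before_extension(default_name):
    """Turn 'app.log.3' into 'app.3.log'; leave other names unchanged."""
    directory, filename = os.path.split(default_name)
    stem, _, index = filename.rpartition(".")
    if not stem or not index.isdigit():
        return default_name  # no numeric rotation suffix to move
    base, ext = os.path.splitext(stem)
    return os.path.join(directory, f"{base}.{index}{ext}")


# Attach it to an existing handler instance:
#     handler.namer = suffix_before_extension
print(suffix_before_extension("app.log.3"))
```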
By default, the logfile will have line endings appropriate to the platform. On Windows the line endings will be CRLF ('\r\n') and on Unix/Mac they will be LF ('\n').
It is possible to force another line ending format by using the newline and terminator arguments.
The following would force Windows-style CRLF line endings on Unix:
kwargs={'newline': '', 'terminator': '\r\n'}
The following would force Unix-style LF line endings on Windows:
kwargs={'newline': '', 'terminator': '\n'}
An alternative class ConcurrentTimedRotatingFileHandler is also provided which supports time-based rotation, defaulting to hourly. Like the main class, it uses advisory file locking both to ensure that only one process/thread is writing to the log file at a time, and to coordinate the rollover time between processes.
By default, it has maxBytes set to 0, which means that it will not rotate based on file size, but it is possible to set maxBytes to a value to limit the size of each file in addition to the time-based rotation. When files are rotated based on size, they may have an additional numeric suffix like .1 added to the filename. Note that, as with the main CLH class, the file size limits are not strictly adhered to.
All the same settings are available for this class as for the main class, including maxBytes, use_gzip, lock_file_directory, newline, and terminator. However, the ordering of the arguments is different, so it's recommended to use keyword arguments when using or configuring this class. The arguments shared with TimedRotatingFileHandler are in the same order as the base class, and the extra CLH arguments come after that, although not in the exact same order due to some overlap.
For configuration, see the configuration section above, but substitute in class=handlers.ConcurrentTimedRotatingFileHandler and other appropriate settings like when and interval. See the Python docs for TimedRotatingFileHandler for more details.
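Because the argument order differs from the main class, a keyword-only construction is the safest way to use the timed handler. The sketch below assumes that ConcurrentTimedRotatingFileHandler can be imported from the top-level concurrent_log_handler package. The specific values and the make_timed_handler name are illustrative.

```python
import os

# Keyword arguments only, so positional ordering never matters.
TIMED_KWARGS = {
    "when": "h",                   # hourly rotation (the documented default)
    "interval": 1,
    "backupCount": 48,             # illustrative retention count
    "maxBytes": 10 * 1024 * 1024,  # also rotate once a file reaches ~10 MB
    "use_gzip": True,
}


def make_timed_handler(logfile):
    """Build a time- and size-rotating handler for an absolute log path."""
    # Assumed import location; adjust if your installed version differs.
    from concurrent_log_handler import ConcurrentTimedRotatingFileHandler

    return ConcurrentTimedRotatingFileHandler(os.path.abspath(logfile), **TIMED_KWARGS)
```

The same dictionary can be pasted as the kwargs entry of a logging config file, with class=handlers.ConcurrentTimedRotatingFileHandler.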
To use the background logging queue, you must call this code at some point in your app after it sets up logging configuration. Please read the doc string in the file concurrent_log_handler/queue.py for more details. This requires Python 3. See also src/example.py.
from concurrent_log_handler.queue import setup_logging_queues
# convert all configured loggers to use a background thread
setup_logging_queues()
This module is designed to function well in a multi-threaded or multi-process concurrent environment. However, all writers to a given log file should be using the same class and the same settings at the same time, otherwise unexpected behavior may result during file rotation.
This may mean that if you change the logging settings at any point you may need to restart your app service so that all processes are using the same settings at the same time.
The ConcurrentRotatingFileHandler class is a drop-in replacement for Python's standard log handler RotatingFileHandler. This module uses file locking so that multiple processes can concurrently log to a single file without dropping or clobbering log events. This module provides a file rotation scheme like that of RotatingFileHandler. Extra care is taken to ensure that logs can be safely rotated before the rotation process is started. (This module works around the file rename issue with RotatingFileHandler on Windows, where a rotation failure means that all subsequent log events are dropped.)
This module attempts to preserve log records at all costs. This means that log files may grow larger than the specified maximum (rotation) size. So if disk space is tight, you may want to stick with RotatingFileHandler, which will strictly adhere to the maximum file size.
Important:
If you have multiple instances of a script (or multiple scripts) all running at the same time and writing to the same log file, then all of the scripts should be using ConcurrentRotatingFileHandler. You should not attempt to mix and match RotatingFileHandler and ConcurrentRotatingFileHandler.
The file locking is advisory only - it is respected by other Concurrent Log Handler instances, but does not prevent outside processes (or different Python logging file handlers) from writing to a log file in use.
See CHANGELOG.md
The original version was written by Lowell Alleman.
Other contributors are listed in CONTRIBUTORS.md.
See the LICENSE file