Diff Match and Patch
Google's Diff Match and Patch library, packaged for modern Python.
diff-match-patch is supported on Python 3.7 or newer. You can install it from PyPI:
python -m pip install diff-match-patch
Generating a patchset (analogous to unified diff) between two texts:
from diff_match_patch import diff_match_patch
dmp = diff_match_patch()
patches = dmp.patch_make(text1, text2)
diff = dmp.patch_toText(patches)
Applying a patchset to a text can then be done with:
from diff_match_patch import diff_match_patch
dmp = diff_match_patch()
patches = dmp.patch_fromText(diff)
new_text, _ = dmp.patch_apply(patches, text)
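The two snippets above can be combined into a single round trip. The following is a minimal sketch, assuming two short sample strings chosen only for illustration. It also inspects the second value returned by `patch_apply`, a list with one boolean per patch indicating whether that patch applied.

```python
from diff_match_patch import diff_match_patch

dmp = diff_match_patch()

# Sample texts, chosen only for illustration.
text1 = "The quick brown fox jumps over the lazy dog."
text2 = "The quick red fox leaps over the lazy dog."

# Build the patches and serialize them to a textual patchset.
patches = dmp.patch_make(text1, text2)
diff = dmp.patch_toText(patches)

# Later, possibly in another process: parse the patchset and apply it.
parsed = dmp.patch_fromText(diff)
new_text, results = dmp.patch_apply(parsed, text1)

# results holds one boolean per patch; False means that patch did not apply.
if not all(results):
    raise RuntimeError("some patches failed to apply")
```

Checking `results` matters mostly when the target text has drifted from the text the patches were made against, since patches are applied on a best-effort basis.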
The Diff Match and Patch libraries offer robust algorithms to perform the operations required for synchronizing plain text.
Originally built in 2006 to power Google Docs, this library is now available in C++, C#, Dart, Java, JavaScript, Lua, Objective C, and Python.
Although each language port of Diff Match and Patch uses the same API, there are some language-specific notes.
A standardized speed test tracks the relative performance of diffs in each language.
This library implements Myers' diff algorithm, which is generally considered to be the best general-purpose diff. A layer of pre-diff speedups and post-diff cleanups surrounds the diff algorithm, improving both performance and output quality.
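As a rough illustration of the diff step and one of the post-diff cleanups, the sketch below calls `diff_main` and then `diff_cleanupSemantic` on two sample strings (the strings are assumptions made for the example). The result is a list of `(operation, text)` tuples, where the operation is one of the class constants `DIFF_DELETE`, `DIFF_EQUAL`, or `DIFF_INSERT`.

```python
from diff_match_patch import diff_match_patch

dmp = diff_match_patch()

# Sample texts, chosen only for illustration.
old = "The cat sat on the mat."
new = "The dog sat on the rug."

# Compute the raw diff as a list of (operation, text) tuples.
diffs = dmp.diff_main(old, new)

# Optional post-diff cleanup; modifies the list in place to favor
# human-readable edits over minimal ones.
dmp.diff_cleanupSemantic(diffs)

# Map the numeric operation codes to readable labels.
labels = {
    diff_match_patch.DIFF_DELETE: "delete",
    diff_match_patch.DIFF_EQUAL: "equal",
    diff_match_patch.DIFF_INSERT: "insert",
}
for op, text in diffs:
    print(f"{labels[op]:>6}: {text!r}")

# The source and destination texts can be rebuilt from the diff.
rebuilt_old = dmp.diff_text1(diffs)
rebuilt_new = dmp.diff_text2(diffs)
```

Whether to run a cleanup pass depends on the use: a display for human readers usually benefits from it, while a diff consumed only by a program may not need it.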
This library also implements a Bitap matching algorithm at the heart of a flexible matching and patching strategy.
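A minimal sketch of the matching step, assuming a sample text and patterns chosen only for illustration: `match_main` takes the text, a pattern, and an expected location, and returns the index of the best match near that location, or -1 if nothing acceptable is found.

```python
from diff_match_patch import diff_match_patch

dmp = diff_match_patch()

# Sample text and patterns, chosen only for illustration.
text = "The quick brown fox jumps over the lazy dog."
pattern = "brown fox"

# Search for the pattern, starting from an expected location of 0.
loc = dmp.match_main(text, pattern, 0)

# A pattern with a typo can still match, depending on how strict the
# Match_Threshold and Match_Distance settings are.
fuzzy_loc = dmp.match_main(text, "brown fax", 0)

if loc != -1:
    print("exact pattern found at index", loc)
if fuzzy_loc != -1:
    print("approximate pattern found at index", fuzzy_loc)
```

The same matching is what lets `patch_apply` locate where a patch belongs when the target text differs somewhat from the text the patch was created from.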