Detect confusable usage of unicode homoglyphs, prevent homograph attacks.

`[doc] <http://confusable-homoglyphs.readthedocs.io/en/latest/>`__

.. image:: https://img.shields.io/travis/vhf/confusable_homoglyphs.svg
    :target: https://travis-ci.org/vhf/confusable_homoglyphs

.. image:: https://img.shields.io/pypi/v/confusable_homoglyphs.svg
    :target: https://pypi.python.org/pypi/confusable_homoglyphs

.. image:: https://readthedocs.org/projects/confusable_homoglyphs/badge/?version=latest
    :target: http://confusable-homoglyphs.readthedocs.io/en/latest/
    :alt: Documentation Status

*a homoglyph is one of two or more graphemes, characters, or glyphs with
shapes that appear identical or very similar*
`wikipedia:Homoglyph <https://en.wikipedia.org/wiki/Homoglyph>`__

Unicode homoglyphs can be a nuisance on the web. Your most popular client, AlaskaJazz, might be upset to be impersonated by a trickster who deliberately chose the username ΑlaskaJazz.

-  ``AlaskaJazz`` is single script: only Latin characters.
-  ``ΑlaskaJazz`` is mixed-script: the first character is a Greek
   letter.

You might also want to avoid people being tricked into entering their
password on ``www.microsоft.com`` or ``www.faϲebook.com`` instead of
``www.microsoft.com`` or ``www.facebook.com``.
`Here is a utility <http://unicode.org/cldr/utility/confusables.jsp>`__
to play with these confusable homoglyphs.

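
To make the single-script versus mixed-script distinction concrete, here is a
minimal sketch that uses only the standard library. It is not this library's
implementation: it assumes Python 3 and approximates a character's script by
the first word of its Unicode name, which is a rough heuristic. The helper
names ``script_of``, ``scripts_in`` and ``looks_mixed_script`` are
hypothetical.

```python
import unicodedata


def script_of(ch):
    """Approximate a character's script by the first word of its Unicode name.

    Assumption: names such as 'LATIN SMALL LETTER A' or 'GREEK CAPITAL LETTER
    ALPHA' start with the script. This is a heuristic, not the Scripts.txt data.
    """
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def scripts_in(text):
    """Return the set of approximate scripts used by the letters in text."""
    return {script_of(ch) for ch in text if ch.isalpha()}


def looks_mixed_script(text):
    """True when the letters of text come from more than one script."""
    return len(scripts_in(text)) > 1


genuine = "AlaskaJazz"
impostor = "\u0391laskaJazz"      # starts with GREEK CAPITAL LETTER ALPHA
spoofed = "www.micros\u043eft.com"  # contains CYRILLIC SMALL LETTER O

print(looks_mixed_script(genuine))
print(looks_mixed_script(impostor))
print(sorted(scripts_in(spoofed)))
```

Punctuation and digits are skipped here so that the dots in a domain name do
not count as a separate script.
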
Not all mixed-script strings have to be ruled out, though: you could exclude only those mixed-script strings that contain characters which might be confused with a character from Unicode blocks of your choosing.

-  ``Allo`` and ``ρττ`` are fine: single script.
-  ``AlloΓ`` is fine when our preferred script alias is 'latin': mixed
   script, but ``Γ`` is not confusable.
-  ``Alloρ`` is dangerous: mixed script and ``ρ`` could be confused with
   ``p``.

This library is compatible with Python 2 and Python 3.

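
The rule illustrated by ``Allo``, ``AlloΓ`` and ``Alloρ`` can be sketched as
follows. This is an illustration under stated assumptions, not the library's
code: it assumes Python 3, fixes the preferred script to Latin, guesses
scripts from Unicode character names, and uses a tiny hand-picked lookalike
table (``CONFUSABLE_WITH_LATIN``, a hypothetical name) in place of the full
confusables data.

```python
import unicodedata

# Hypothetical, hand-picked excerpt for illustration only; the real
# confusables data is far larger.
CONFUSABLE_WITH_LATIN = {
    "\u03c1": "p",  # GREEK SMALL LETTER RHO
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u03f2": "c",  # GREEK LUNATE SIGMA SYMBOL
}


def script_of(ch):
    """Heuristic: take the first word of the Unicode name as the script."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def looks_dangerous(text):
    """True when text mixes scripts AND contains a Latin lookalike."""
    scripts = {script_of(ch) for ch in text if ch.isalpha()}
    mixed = len(scripts) > 1
    has_lookalike = any(ch in CONFUSABLE_WITH_LATIN for ch in text)
    return mixed and has_lookalike


for sample in ["Allo", "\u03c1\u03c4\u03c4", "Allo\u0393", "Allo\u03c1"]:
    print(repr(sample), looks_dangerous(sample))
```

Only the last sample is flagged: the first two are single script, and the
third is mixed script but its Greek capital gamma has no entry in the
lookalike table.
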
`API documentation <http://confusable-homoglyphs.readthedocs.io/en/latest/apidocumentation.html>`__

Yep.

The Unicode block aliases and names for each character are extracted
from `this file <http://www.unicode.org/Public/UNIDATA/Scripts.txt>`__
provided by the Unicode Consortium.

The matrix of which characters can be confused with which other
characters is built using
`this file <http://www.unicode.org/Public/security/latest/confusables.txt>`__
provided by the Unicode Consortium.

This data is stored in two JSON files: ``categories.json`` and
``confusables.json``. If you delete them, they will both be recreated by
downloading and parsing the two above-mentioned files and stored as JSON
files again.

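
As a rough idea of what the parsing step involves, the sketch below turns a
few lines written in the ``confusables.txt`` layout (semicolon-separated
hexadecimal code points followed by a ``#`` comment) into a dictionary and
serialises it as JSON. The sample lines are embedded so that nothing is
downloaded, Python 3 is assumed, and both the function name
``parse_confusables`` and the resulting JSON layout are assumptions made for
illustration; they do not describe the layout of this library's
``confusables.json``.

```python
import json

# Illustrative sample lines in the confusables.txt layout.
SAMPLE_LINES = [
    "# comment lines and blank lines are skipped",
    "",
    "03C1 ; 0070 ; MA # GREEK SMALL LETTER RHO -> LATIN SMALL LETTER P",
    "043E ; 006F ; MA # CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O",
]


def decode(field):
    """Convert space-separated hex code points into a string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


def parse_confusables(lines):
    """Map each source string to the list of strings it can be confused with."""
    table = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        source, target = [field.strip() for field in data.split(";")[:2]]
        table.setdefault(decode(source), []).append(decode(target))
    return table


table = parse_confusables(SAMPLE_LINES)
serialised = json.dumps(table, sort_keys=True)
print(serialised)
```

Reading the JSON back with ``json.loads`` yields the same dictionary, which is
what makes a cache file of this kind cheap to reuse between runs.
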
Initial release.

-  ``allowed_categories`` renamed to ``allowed_aliases``

Courtesy of Ryan P Kilby, via https://github.com/vhf/confusable_homoglyphs/pull/6 :

-  ``confusable_homoglyphs`` package directory instead of the user's current working directory.
-  ``cli`` bundle, eg. ``pip install confusable_homoglyphs[cli]``