Project: wiki-fetch

Parser for Wikipedia.org

Project Details

Latest version: 0.1.1
Home Page: https://github.com/d3z-the-dev/wiki-fetch
PyPI Page: https://pypi.org/project/wiki-fetch/

Project Popularity

PageRank: 0.0017802739998305523
Number of downloads: 242418

wiki-fetch


Installation

  • PyPI
pip install wiki-fetch
  • Source
git clone git@github.com:d3z-the-dev/wiki-fetch.git
cd wiki-fetch && poetry build
pip install ./dist/*.whl

Usage

CLI

Options for use in console
Option            Flag  Long      Default  Example
Wiki's page link  -u    --url     None     https://en.wikipedia.org/wiki/The_Doors
Search query      -q    --query   None     The Doors (band)
Page language     -l    --lang    English  English
Part of the page  -p    --part    all      infobox
Parts by order    -i    --item    all      first
Output format     -o    --output  text     text
wiki-fetch -q 'The Doors (band)' -p infobox -i first
Output:
Infobox: 
    The Doors: 
        The Doors: 
            Image: https://upload.wikimedia.org/wikipedia/commons/thumb/6/69/The_Doors_1968.JPG/250px-The_Doors_1968.JPG
            Caption: The Doors in 1966: Morrison (left), Densmore (centre), Krieger (right) and Manzarek (seated)
        Background information: 
            Origin: Los Angeles, California, U.S.
            Genres: 
                Psychedelic Rock
                Blues Rock
                Acid Rock
            Years active: 
                1965-1973
                1978
            Labels: 
                Elektra
                Rhino
            Spinoffs: 
                The Psychedelic Rangers
                Butts Band
                Nite City
                Manzarek-Krieger
            Spinoff of: Rick & the Ravens
            Past members: 
                Jim Morrison
                Ray Manzarek
                Robby Krieger
                John Densmore
            Website: thedoors.com
URL: https://en.wikipedia.org/?search=The Doors (Band)
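
To call the CLI from a script, the long flags from the options table can be assembled into an argument list. This is a minimal sketch: `build_command` is a hypothetical helper written for this example, not part of wiki-fetch, and it relies only on the flags listed in the table.

```python
import shlex

# Long flag names taken from the options table above.
FLAGS = ("url", "query", "lang", "part", "item", "output")


def build_command(**options):
    """Build an argument list for the wiki-fetch CLI (hypothetical helper)."""
    unknown = set(options) - set(FLAGS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    command = ["wiki-fetch"]
    for name in FLAGS:  # iterate over FLAGS to keep a stable flag order
        value = options.get(name)
        if value is not None:
            command += [f"--{name}", str(value)]
    return command


cmd = build_command(query="The Doors (band)", part="infobox", item="first")
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

With wiki-fetch installed, the list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`; passing a list rather than a single string avoids shell-quoting problems with queries that contain spaces or parentheses.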

Python

Arguments of function and class
Argument  Values                                            Description
url       str                                               URL of any Wikipedia page
query     str                                               Any search query string
lang      str                                               Any of the available languages
part      infobox, paragraph, table, list, thumb, toc, all  Part of the page to return
item      first, last, all                                  Which occurrence of the part to return
from wiki_fetch.driver import Wiki

output = Wiki(lang='English').search(query='The Doors (band)', part='infobox', item='first')
print(output.json)
Output:
{
    "Infobox": [
        {
            "The Doors": {
                "The Doors": {
                    "Image": "https://upload.wikimedia.org/wikipedia/commons/thumb/6/69/The_Doors_1968.JPG/250px-The_Doors_1968.JPG",
                    "Caption": "The Doors in 1966: Morrison (left), Densmore (centre), Krieger (right) and Manzarek (seated)"
                },
                "Background information": {
                    "Origin": "Los Angeles, California, U.S.",
                    "Genres": [
                        "Psychedelic Rock",
                        "Blues Rock",
                        "Acid Rock"
                    ],
                    "Years active": [
                        "1965-1973",
                        "1978"
                    ],
                    "Labels": [
                        "Elektra",
                        "Rhino"
                    ],
                    "Spinoffs": [
                        "The Psychedelic Rangers",
                        "Butts Band",
                        "Nite City",
                        "Manzarek-Krieger"
                    ],
                    "Spinoff of": "Rick & the Ravens",
                    "Past members": [
                        "Jim Morrison",
                        "Ray Manzarek",
                        "Robby Krieger",
                        "John Densmore"
                    ],
                    "Website": "thedoors.com"
                }
            }
        }
    ],
    "URL": "https://en.wikipedia.org/?search=The Doors (Band)"
}
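
The nested result can be awkward to search, so it may help to flatten it into path/value pairs. The sketch below works on an abridged copy of the JSON shown above rather than on a live request. It assumes that `output.json` is a JSON string, as the printed output suggests; if it is already a dict, skip the `json.loads` call. `flatten` is a hypothetical helper, not part of wiki-fetch.

```python
import json

# Abridged copy of the JSON output shown above, used as sample input.
sample = """
{
    "Infobox": [
        {
            "The Doors": {
                "Background information": {
                    "Origin": "Los Angeles, California, U.S.",
                    "Genres": ["Psychedelic Rock", "Blues Rock", "Acid Rock"],
                    "Website": "thedoors.com"
                }
            }
        }
    ],
    "URL": "https://en.wikipedia.org/?search=The Doors (Band)"
}
"""


def flatten(node, prefix=""):
    """Yield (path, value) pairs for every leaf in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}/{key}" if prefix else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{prefix}[{index}]")
    else:
        yield prefix, node


data = json.loads(sample)
flat = dict(flatten(data))
for path, value in flat.items():
    print(f"{path}: {value}")
```

Each leaf ends up under a key such as `Infobox[0]/The Doors/Background information/Origin`, which makes a single field easy to look up or compare across pages.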

Specification

Available options
Parts of page  Output formats  Language
infobox        text            English
paragraph      json            Ukrainian
table          dict            Russian
list                           Polish
thumb                          German
toc                            Nederlands
                               Swedish
                               Spanish
                               French
                               Italian
                               Japanese
                               Chinese
                               Cebuano
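
Because the accepted values form a small fixed set, they can be checked before a request is made. The sketch below is an illustration only: `check_options` is a hypothetical helper, not part of wiki-fetch, and the value sets and defaults are copied from the tables in this README. Languages are left out because the exact spellings the library accepts are not confirmed here.

```python
# Values copied from the tables in this README.
PARTS = {"infobox", "paragraph", "table", "list", "thumb", "toc", "all"}
ITEMS = {"first", "last", "all"}
OUTPUTS = {"text", "json", "dict"}


def check_options(part="all", item="all", output="text"):
    """Raise ValueError on an unlisted value; return the options otherwise."""
    for name, value, allowed in (
        ("part", part, PARTS),
        ("item", item, ITEMS),
        ("output", output, OUTPUTS),
    ):
        if value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    return {"part": part, "item": item, "output": output}


print(check_options(part="infobox", item="first"))
```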