
YURIIDE


BeautifulSoup: REPLACEMENT CHARACTER

BeautifulSoup: Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER
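The warning means some bytes had no valid decoding under the encoding that was tried, so U+FFFD (REPLACEMENT CHARACTER) was substituted for them. A minimal standard-library sketch of the same effect, using a made-up sample string; it illustrates the symptom and is not Beautiful Soup's own code:

```python
# Hypothetical sample: Cyrillic text stored as windows-1251 bytes.
raw = "Привет".encode("windows-1251")

# Decoding with the wrong codec and errors="replace" substitutes U+FFFD
# for every byte sequence that is invalid in that codec.
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the codec the bytes were written in recovers the text.
right = raw.decode("windows-1251")

print("\ufffd" in wrong)  # True
print(right)              # Привет
```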

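Use UnicodeDammit to convert the markup to Unicode with a list of candidate encodings before passing it to BeautifulSoup. More: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#unicode-dammit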

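from bs4 import BeautifulSoup, UnicodeDammit

# Inside a class method; `content` is assumed to hold the raw page bytes.
# Note: "latin-1" and "iso-8859-1" name the same codec, and it accepts any
# byte sequence, so candidates listed after it are likely never tried.
self.bs = BeautifulSoup(
    UnicodeDammit(
        content,
        ["latin-1", "iso-8859-1", "windows-1251"]
    ).unicode_markup,
    "html.parser")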
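To check whether the conversion actually worked, UnicodeDammit exposes `original_encoding` and `contains_replacement_characters`. A small standalone sketch, assuming the page is really windows-1251 (the sample markup is made up for illustration):

```python
from bs4 import BeautifulSoup, UnicodeDammit

# Hypothetical sample page encoded as windows-1251.
raw = "<p>Привет</p>".encode("windows-1251")

# Pass the candidate encodings to try first.
dammit = UnicodeDammit(raw, ["windows-1251"])

# Which encoding was used, and whether any U+FFFD had to be substituted.
print(dammit.original_encoding)
print(dammit.contains_replacement_characters)

# Parse the already-decoded markup.
soup = BeautifulSoup(dammit.unicode_markup, "html.parser")
print(soup.p.get_text())
```

If `contains_replacement_characters` is True, none of the candidates decoded the input cleanly, and the list of encodings is probably wrong for that page.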
