Commit 7ce4176

Set bozo bit on non-fatal errors
Previously these errors were ignored, since an exception is raised only on fatal errors. With this change, when a non-fatal error occurs, the bozo bit is still set, but the feed is not reparsed with the loose parser. Background and discussion here: lemon24/reader#350
1 parent 11990ea commit 7ce4176

2 files changed, +10 -2 lines changed

Diff for: feedparser/api.py

+5 -1
@@ -327,8 +327,12 @@ def _parse_file_inplace(
             saxparser.parse(source)
         except xml.sax.SAXException as e:
             result["bozo"] = 1
-            result["bozo_exception"] = feed_parser.exc or e
+            result["bozo_exception"] = e
             use_strict_parser = False
+        else:
+            if feed_parser.bozo:
+                result["bozo"] = 1
+                result["bozo_exception"] = feed_parser.exc
 
     # The loose XML parser will be tried if the strict XML parser was not used
     # (or if it failed to parse the feed).
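
The try/except/else flow in the hunk above can be sketched in isolation with the standard library's `xml.sax`. This is a minimal illustration, not feedparser's code: `RecordingHandler`, `parse_strict`, the `needs_loose_parser` key, and the `simulate_nonfatal` switch are all hypothetical names invented here. The switch exists because Expat, the default backend, reports only fatal errors, so the non-fatal path is triggered by hand.

```python
import io
import xml.sax
import xml.sax.handler


class RecordingHandler(xml.sax.handler.ContentHandler,
                       xml.sax.handler.ErrorHandler):
    """Hypothetical stand-in for a strict feed parser that tracks errors."""

    def __init__(self):
        super().__init__()
        self.bozo = 0
        self.exc = None

    def error(self, exc):
        # Non-fatal error: remember it and let parsing continue.
        self.bozo = 1
        self.exc = exc

    def fatalError(self, exc):
        # Fatal error: remember it, then abort parsing by raising.
        self.error(exc)
        raise exc


def parse_strict(data, simulate_nonfatal=False):
    """Parse bytes strictly and report bozo state (illustrative only)."""
    result = {"bozo": 0, "needs_loose_parser": False}
    handler = RecordingHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.setErrorHandler(handler)
    try:
        parser.parse(io.BytesIO(data))
        if simulate_nonfatal:
            # Assumption: mimic a backend (such as libxml2) that reports a
            # recoverable error without stopping the parse.
            handler.error(xml.sax.SAXException("simulated non-fatal error"))
    except xml.sax.SAXException as e:
        # Fatal error: set the bozo bit and fall back to the loose parser.
        result["bozo"] = 1
        result["bozo_exception"] = e
        result["needs_loose_parser"] = True
    else:
        # No exception was raised, but a non-fatal error may have been
        # recorded: set the bozo bit without requesting a reparse.
        if handler.bozo:
            result["bozo"] = 1
            result["bozo_exception"] = handler.exc
    return result


clean = parse_strict(b"<rss/>")
nonfatal = parse_strict(b"<rss/>", simulate_nonfatal=True)
fatal = parse_strict(b"<rss>")  # unclosed element, a fatal error
```

In this sketch only the `fatal` result asks for the loose parser; the `nonfatal` result has the bozo bit set but keeps the strict parser's output, which is the behavior the commit message describes.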

Diff for: tests/illformed/always_strip_doctype.xml

+5 -1
@@ -1,7 +1,11 @@
 <?xml version="1.0" encoding="utf-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://site.invalid/">
 <!--
 Description: unstripped invalid doctype
-Expect: not bozo and feed['title'] == 'found'
+Expect: feed['title'] == 'found'
+--><!--
+The Expect-line above doesn't check if the bozo bit is set, because it depends
+on the XML parser used: libxml2 issues an error for the invalid doctype, while
+Expat does not.
 -->
 <rss>
 <channel>
