-
Notifications
You must be signed in to change notification settings - Fork 16.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
community[minor]: [GoogleApiYoutubeLoader] Replace API used in _get_document_for_channel from search to playlistItem #24034
Conversation
TupleType
commented
Jul 9, 2024
- Description: Search has a limit of 500 results, playlistItems doesn't. Added a class in except clause to catch another common error.
- Issue: None
- Dependencies: None
- Twitter handle: @TupleType
…Items because search has a limit of 500 results. Added a class in except clause to catch another common error.
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
I got a non descriptive linting error, can someone please help? |
@@ -7,6 +7,7 @@ | |||
from pathlib import Path | |||
from typing import Any, Dict, Generator, List, Optional, Sequence, Union | |||
from urllib.parse import parse_qs, urlparse | |||
from xml.etree.ElementTree import ParseError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do this:
from xml.etree.ElementTree import ParseError # OK: user-must-opt-in
The linter is flagging the fact that the underlying library is relying on the built-in XML library.
It's essentially surfacing this:
Depending on user environment the built in may or may not have vulnerabilities:
https://docs.python.org/3/library/xml.html#xml-vulnerabilities
Do you know if the underlying library reading from a google provided XML file? If so, while not amazing this is probably fine.
Could you add a security note to the GoogleAPIClient saying that it relies on the standard xml library, but we're viewing the input as trusted in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know the library well enough to determine that but this is the error I'm getting for https://www.youtube.com/watch?v=pM8e0Dbzopk:
File "C:\Users\user\repos\rag-demo\loader.py", line 71, in _get_document_for_playlist
transcript = self._get_transcripe_for_video_id(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\repos\rag-demo\.venv\Lib\site-packages\langchain_community\document_loaders\youtube.py", line 410, in _get_transcripe_for_video_id
transcript_pieces = transcript.fetch()
^^^^^^^^^^^^^^^^^^
File "C:\Users\user\repos\rag-demo\.venv\Lib\site-packages\youtube_transcript_api\_transcripts.py", line 292, in fetch
return _TranscriptParser(preserve_formatting=preserve_formatting).parse(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\repos\rag-demo\.venv\Lib\site-packages\youtube_transcript_api\_transcripts.py", line 358, in parse
for xml_element in ElementTree.fromstring(plain_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\xml\etree\ElementTree.py", line 1338, in XML
parser.feed(text)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 57, column 99
dependency checks failed but it doesn't have logs |
@@ -452,34 +463,32 @@ def _get_document_for_channel(self, channel: str, **kwargs: Any) -> List[Documen | |||
) | |||
|
|||
channel_id = self._get_channel_id(channel) | |||
request = self.youtube_client.search().list( | |||
uploads_playlist_id = self._get_uploads_playlist_id(channel_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change backwards compatible? I don't use the youtube API at all. Will this break anything for existing users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it's backward compatible
…ocument_for_channel from search to playlistItem (langchain-ai#24034) - **Description:** Search has a limit of 500 results, playlistItems doesn't. Added a class in except clause to catch another common error. - **Issue:** None - **Dependencies:** None - **Twitter handle:** @TupleType --------- Co-authored-by: asi-cider <88270351+asi-cider@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>