
Differentiating Automatically-generated Caption #199

Open
teron131 opened this issue Aug 26, 2024 · 0 comments
Labels: enhancement (New feature or request)


teron131 commented Aug 26, 2024

How can the language code of a fetched caption indicate whether the caption was automatically generated?

Currently:

```python
from pytubefix import YouTube

# sample_urls: a list of video URLs defined elsewhere (not shown).
# Using a URL for which only an auto-generated English caption is available:
yt = YouTube(sample_urls[0])
yt.caption_tracks[0]
# <Caption lang="English" code="en">

yt.captions["en"]
# <Caption lang="English" code="en">
```

I see that the source code includes this in captions.py:

```python
from typing import Dict


class Caption:
    """Container for caption tracks."""

    def __init__(self, caption_track: Dict):
        """Construct a :class:`Caption <Caption>`.

        :param dict caption_track:
            Caption track data extracted from ``watch_html``.
        """
        self.url = caption_track.get("baseUrl")

        # Certain videos have runs instead of simpleText
        #  this handles that edge case
        name_dict = caption_track['name']
        if 'simpleText' in name_dict:
            self.name = name_dict['simpleText']
        else:
            for el in name_dict['runs']:
                if 'text' in el:
                    self.name = el['text']

        # Use "vssId" instead of "languageCode", fix issue #779
        self.code = caption_track["vssId"]
        # Remove preceding '.' for backwards compatibility, e.g.:
        # English -> vssId: .en, languageCode: en
        # English (auto-generated) -> vssId: a.en, languageCode: en
        self.code = self.code.strip('.')
```
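Going by the comments in this code, `strip('.')` only drops the leading dot of a manually created track's vssId (`.en` becomes `en`), while the `a.` prefix of an auto-generated track (`a.en`) survives because its dot sits in the middle. A minimal sketch of that mapping; `normalize_vss_id` and `is_auto_generated` are hypothetical helper names (not pytubefix APIs), and treating the `a.` prefix as the auto-generated marker is an assumption inferred from the comment:

```python
# Sketch of how the quoted Caption.__init__ turns a vssId into Caption.code.
# Sample vssId values come from the comments in captions.py.


def normalize_vss_id(vss_id: str) -> str:
    """Mirror `caption_track["vssId"].strip('.')` from the quoted constructor."""
    # str.strip('.') removes dots from both ends only, so ".en" -> "en"
    # while "a.en" is unchanged (its dot is in the middle).
    return vss_id.strip(".")


def is_auto_generated(code: str) -> bool:
    """Hypothetical check: assume an "a." prefix marks an auto-generated track."""
    return code.startswith("a.")


for vss_id in (".en", "a.en"):
    code = normalize_vss_id(vss_id)
    print(vss_id, "->", code, "auto-generated:", is_auto_generated(code))
```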

How can I make these calls return "a.en" for the auto-generated track?
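Because manual and auto-generated tracks share the same `languageCode` (both `en` in the comments above), one way to keep them apart is to group the raw track dicts by `languageCode` and flag each one by its vssId prefix. A sketch under that assumption: `group_tracks` and the sample dicts are hypothetical, using only the `vssId` and `name` keys the constructor reads plus the `languageCode` key named in its comments:

```python
from typing import Dict, List


def group_tracks(tracks: List[Dict]) -> Dict[str, Dict[str, Dict]]:
    """Group raw caption-track dicts by languageCode, separating tracks whose
    vssId starts with "a." (assumed auto-generated) from the others (manual)."""
    grouped: Dict[str, Dict[str, Dict]] = {}
    for track in tracks:
        kind = "auto" if track["vssId"].startswith("a.") else "manual"
        grouped.setdefault(track["languageCode"], {})[kind] = track
    return grouped


# Hypothetical sample data shaped after the comments in captions.py.
sample_tracks = [
    {"vssId": ".en", "languageCode": "en", "name": {"simpleText": "English"}},
    {"vssId": "a.en", "languageCode": "en",
     "name": {"simpleText": "English (auto-generated)"}},
]

by_lang = group_tracks(sample_tracks)
print(sorted(by_lang["en"]))  # ['auto', 'manual']
```

Keying by `languageCode` with an explicit `auto`/`manual` flag keeps a plain language lookup possible while still exposing which variant was fetched.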

teron131 added the enhancement (New feature or request) label Aug 26, 2024
Projects
Status: in progress
Development

No branches or pull requests

1 participant