Speech for PDF documents #1131

Draft: wants to merge 23 commits into base: master

Conversation

michalrentka
Contributor

No description provided.

@yexingsha

iOS-read-aloud-mobile.mp4
iOS-read-aloud-tablet.mp4

Mocks in Figma.

  • The panel is a sheet on mobile, and a popover on tablet. It can be closed or minimized (more on that later) by tapping outside, or tapping its button, or dragging the handle down on mobile.
  • Toggling Reading Mode automatically closes the panel. When Reading Mode is on, the button icon is inverted.
  • The buttons in the top bar in Reading Mode reflect an ideal state; right now we just need this one button, so that you can access Read Aloud there.
  • Tapping "Read Aloud" will start the voice-over and expand the group to show the controls. Tapping the close button in the group will revert it to a single button. I designed it this way so that 1) if you mainly use this panel for Reading Mode, you don't get distracted by the Read Aloud controls every time; 2) it's clear when no Read Aloud is in progress, as opposed to when a Read Aloud in progress has been paused. This distinction is necessary for the next point.
  • One major issue I have with Safari's TTS approach is that you have to open the panel to get access to the Read Aloud controls, making it hard to read and listen at the same time. In the design above, when you minimize the panel while Read Aloud is in progress, it turns into a toolbar. Tapping the toolbar or the button brings the panel back.
    • On mobile, it's a bottom toolbar, very straightforward.
    • On tablet, it floats in the top left corner, so contents will get pushed down (if page transition is "continuous", then only the first page gets pushed down). It's a bit awkward, but the toolbar has to stay at roughly the same position as the popover for the transition to work. I tried moving the controls into the top navigation bar and it was much worse than this.
  • I think it's necessary for us to provide voice options here, since PDFs might not set their language correctly. Ideally, the selected voice should persist per language.
  • As for the button's icon, I think doc.plaintext is fine, since both Reading Mode and Read Aloud are basically outputting plain text, in a sense. I'm not sure we should copy Safari's icon, since it doesn't convey any clear meaning and feels like a quirky hamburger icon in both appearance and function.
  • Also: we should change the gearshape icon to textformat.alt, since it basically does what the appearance icon does on desktop. In the future, it will contain more options for EPUB and Reading Mode, like line height, font size, etc.
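
To make the "voice should persist per language" point concrete, here is a minimal Swift sketch. Everything in it is an assumption for illustration: the `VoicePreferences` type, its method names, and the choice to key on the base language code (so "en-US" and "en_GB" share one preference) do not come from this PR. In an app, the stored identifier could be passed to `AVSpeechSynthesisVoice(identifier:)`, and the dictionary could be persisted rather than kept in memory.

```swift
// Hypothetical sketch: type and method names are illustrative, not from this PR.
struct VoicePreferences {
    // Maps a base language code (e.g. "en") to the voice identifier the user chose.
    private(set) var voiceByLanguage: [String: String] = [:]

    // Assumption: regional variants share one preference, so "en-US" and
    // "en_GB" are both reduced to "en".
    static func baseLanguage(of code: String) -> String {
        let separators: Set<Character> = ["-", "_"]
        let prefix = code.prefix { !separators.contains($0) }
        return String(prefix).lowercased()
    }

    // Remember the voice picked while reading a document in the given language.
    mutating func setVoice(_ identifier: String, forLanguage code: String) {
        voiceByLanguage[Self.baseLanguage(of: code)] = identifier
    }

    // Returns nil when nothing was chosen yet, so the caller can fall back
    // to a default voice for that language.
    func voice(forLanguage code: String) -> String? {
        voiceByLanguage[Self.baseLanguage(of: code)]
    }
}
```

With this shape, overriding a PDF's incorrectly declared language only changes the lookup key; the voices chosen for other languages stay untouched.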

@michalrentka
Contributor Author

michalrentka commented Jun 24, 2025

Thanks for that, @yexingsha, looks good! I just wonder whether it's worth "hiding" the speech interface behind another button. Wouldn't it be simpler to show the speech interface immediately? If you don't want speech, it's easy to ignore, and if you do want it, you can start it with two taps instead of three.

Edit: Actually, never mind. I just re-read it, and the button would immediately start speech, so that's all good, I'd say.
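
The interaction agreed on in this thread (tapping "Read Aloud" starts speech right away and reveals the controls, and minimizing the panel mid-session leaves a toolbar) could be modeled as a small state machine. This is only a sketch under stated assumptions: `ReadAloudState`, `PanelPresentation`, `ReadingPanel`, and all method names are hypothetical and not taken from the PR, and behavior the thread does not specify (such as closing Read Aloud from the toolbar) is left out.

```swift
// Hypothetical model of the panel described in this thread; names are illustrative.
enum ReadAloudState: Equatable {
    case inactive   // the group shows a single "Read Aloud" button
    case playing
    case paused     // still counts as "in progress"
}

enum PanelPresentation: Equatable {
    case closed
    case expanded   // sheet on mobile, popover on tablet
    case toolbar    // minimized while Read Aloud is in progress
}

struct ReadingPanel {
    private(set) var readAloud: ReadAloudState = .inactive
    private(set) var presentation: PanelPresentation = .closed

    mutating func openPanel() { presentation = .expanded }

    // Tapping "Read Aloud" starts speech immediately and expands the controls.
    mutating func tapReadAloud() {
        guard presentation == .expanded, readAloud == .inactive else { return }
        readAloud = .playing
    }

    mutating func togglePlayPause() {
        switch readAloud {
        case .playing: readAloud = .paused
        case .paused: readAloud = .playing
        case .inactive: break
        }
    }

    // The close button in the group ends the session and restores the single button.
    mutating func closeReadAloud() { readAloud = .inactive }

    // Tapping outside, tapping the button, or dragging the handle down:
    // the panel closes when idle and minimizes to a toolbar mid-session.
    mutating func dismissPanel() {
        guard presentation == .expanded else { return }
        presentation = (readAloud == .inactive) ? .closed : .toolbar
    }

    // Tapping the toolbar (or the button) brings the panel back.
    mutating func tapToolbar() {
        if presentation == .toolbar { presentation = .expanded }
    }
}
```

Keeping `inactive` separate from `paused` is what lets `dismissPanel()` decide between closing the panel and leaving a toolbar behind.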

2 participants