Channel: GeekTips
sudo apt install python-is-python3
This is an automatic way to make python run python3: instead of having to symlink python3 to python yourself, it sets it all up.
substitcher202309.zip (14.4 MB)
requirements
pip3 install pysubs2 titlecase
The purpose is to identify hallucinations, repeating subs, stuck timecodes and repeating timecodes.
sudo apt install jq kid3-cli rename ffmpeg
flatpak install flathub org.freac.freac
The biggest difference I've noticed is that medium hallucinates less than medium.en and large (which is large-v2). I've also tried a q5 quantized model, but its accuracy is as bad as small, so you might as well just use small in that case.
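Before stitching, a quick pass over the whisper.cpp output can catch the kind of problems listed under requirements. Here's a rough sketch in plain awk, not Substitcher's own code: check_srt is a hypothetical name, and it assumes ordinary SRT cues (number, timing line, text, blank line). It flags text repeated from the previous cue, cues whose end time isn't after their start time, and cues that start at the same time as the previous one.

```shell
#!/usr/bin/env bash
# check_srt FILE...: hypothetical helper (not part of Substitcher) that flags
# repeated cue text, stuck timecodes (end not after start) and repeated start
# times. Assumes standard SRT layout: number, timing line, text, blank line.
check_srt() {
  awk '
    function report() {
      n++
      if (n > 1 && text == prev_text)
        printf "%s: cue %d: repeated text: %s\n", file, n, text
      if (start >= end)        # fixed-width timecodes compare correctly as strings
        printf "%s: cue %d: stuck timecode: %s --> %s\n", file, n, start, end
      if (n > 1 && start == prev_start)
        printf "%s: cue %d: repeated start time: %s\n", file, n, start
      prev_text = text; prev_start = start
    }
    FNR == 1 { if (in_cue) report(); in_cue = 0; n = 0; file = FILENAME }
    { sub(/\r$/, "") }                              # tolerate CRLF files
    / --> / { start = $1; end = $3; text = ""; in_cue = 1; next }
    in_cue && $0 != "" { text = text (text == "" ? "" : " ") $0; next }
    in_cue { report(); in_cue = 0 }                 # a blank line closes the cue
    END { if (in_cue) report() }                    # last cue without a blank line
  ' "$@"
}
```

Run it as check_srt *.srt; no output means nothing was flagged. A repeated line isn't always a hallucination, so treat hits as spots to check by ear.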
Substitcher comes with a sample LibriVox audiobook so you can quickly play around with the options. You put all your srt or vtt subs into the root directory along with the opus audio segments and stitch them together. With the included audiobook, extract by chapters and rename them 001.opus, 002.opus and so on (option h); those names will correspond to the whisper.cpp transcribed vtt or srt files.
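If you'd rather do the renaming from a terminal, something like this works as a sketch. It's not how option h) is implemented: number_chapters is a hypothetical name, it sorts with sort -V (GNU coreutils) so "Chapter 2" comes before "Chapter 10", and it assumes the extracted chapters are the only files with that extension in the folder.

```shell
#!/usr/bin/env bash
# number_chapters [EXT]: hypothetical helper, not Substitcher's option h.
# Renames every *.EXT file in the current directory (default: opus) to
# 001.EXT, 002.EXT, ... in natural order. Relies on sort -V and assumes
# no file name contains a newline. Avoids mapfile so bash 3.2 works too.
number_chapters() {
  local ext=${1:-opus} f i=0 n new
  local files=()
  while IFS= read -r f; do
    [[ -e $f ]] && files+=("$f")     # skip the literal pattern if nothing matched
  done < <(printf '%s\n' *."$ext" | sort -V)
  # Two passes, so a file that is already called e.g. 002.opus is never overwritten.
  for f in "${files[@]}"; do
    i=$((i + 1))
    mv -- "$f" ".renumber_$i.$ext"
  done
  for ((n = 1; n <= i; n++)); do
    printf -v new '%03d.%s' "$n" "$ext"
    mv -- ".renumber_$n.$ext" "$new"
  done
}
```

Run it inside the folder with the extracted chapters: number_chapters (or number_chapters m4a for other formats).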
Play audiobooks with subs (with a black cover image) in: Linux SMPlayer, Windows PotPlayer, Mac IINA, Android mpv-android, iOS nPlayer ($) or Liquid Player.
ffmpeg -i a.vtt a.srt
ffmpeg -i a.mkv -map 0:s:0 -c:s srt a.srt
Now reverse it, from srt to vtt:
ffmpeg -i a.srt a.vtt
This might be better than pysubs2 a.vtt -t srt
because it won't put in an index number. I'm thinking this alone would let Substitcher run on Mac and most likely Windows, as I wouldn't need those two sed commands.
for f in *.vtt ; do ffmpeg -i "$f" "${f%.*}.srt" ; done
is the equivalent of
pysubs2 *.vtt -t srt
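A slightly more defensive take on the for loop, as a sketch: convert_all is a hypothetical name, nullglob makes it do nothing when there are no matching files (instead of running ffmpeg on a literal *.vtt), and -nostdin keeps ffmpeg from reading the loop's input.

```shell
#!/usr/bin/env bash
# convert_all FROM TO: hypothetical helper. Converts every *.FROM file in the
# current directory to *.TO with ffmpeg, e.g. "convert_all vtt srt" or
# "convert_all srt vtt". The body runs in a subshell ( ... ) so the nullglob
# setting does not leak into your shell, and it stops at the first failure.
convert_all() (
  shopt -s nullglob                  # no matching files -> the loop is skipped
  for f in *."$1"; do
    # -nostdin: do not read stdin; -y: overwrite an existing output file
    ffmpeg -nostdin -loglevel error -y -i "$f" "${f%.*}.$2" || exit
  done
)
```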
And the last one, which lets you download the subs (srt), is Youtube Subtitle Downloader https://www.youtube.com/watch?v=GMcDekiRKs8 (this is the one that has some subs in other languages).
substitcher202310.zip (14.4 MB)
Substitcher 202310: finally got it to work on Mac. The main thing was switching subtitle conversion from pysubs2 to ffmpeg, so I no longer need the two sed scripts that were trouble on Mac. pysubs2 is still used for shifting time.
Removing metadata should work much better now, as all chapter data is deleted. The title is added based on the filename. Added a bunch more time scripts to each job so it shows how long each job took. Cleaned up the script a bit.
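For anyone curious what stitching involves: each chapter's subs have to be pushed later by the total length of all the audio before it, and the cue numbers have to keep counting up. Substitcher does the shifting with pysubs2 (its command line has a --shift option); the sketch below does the same idea in plain awk instead, purely as an illustration. The function names are hypothetical, durations come from ffprobe, and it assumes each NNN.opus has a matching NNN.srt.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not Substitcher's code: stitch 001.srt, 002.srt, ...
# into one SRT whose timecodes line up with 001.opus, 002.opus, ... joined.

# audio_offsets FILE...: print each file's start offset in ms when the files
# are played back to back, e.g. "0,61250,122500". Needs ffprobe.
audio_offsets() {
  local total=0 list="" f ms
  for f in "$@"; do
    list+="${list:+,}$total"
    ms=$(ffprobe -v error -show_entries format=duration \
           -of default=noprint_wrappers=1:nokey=1 "$f" |
         awk '{ printf "%d", $1 * 1000 + 0.5 }')
    total=$((total + ${ms:-0}))
  done
  printf '%s\n' "$list"
}

# stitch_srt OFFSETS FILE...: print the SRT files as one, shifting every
# timecode in the Nth file by the Nth offset (ms) and renumbering the cues.
stitch_srt() {
  local offsets=$1; shift
  awk -v offsets="$offsets" '
    function to_ms(t,  a) {                 # "HH:MM:SS,mmm" -> milliseconds
      split(t, a, "[:,]")
      return ((a[1] * 60 + a[2]) * 60 + a[3]) * 1000 + a[4]
    }
    function to_tc(ms,  h, m, s) {          # milliseconds -> "HH:MM:SS,mmm"
      h = int(ms / 3600000); ms -= h * 3600000
      m = int(ms / 60000);   ms -= m * 60000
      s = int(ms / 1000);    ms -= s * 1000
      return sprintf("%02d:%02d:%02d,%03d", h, m, s, ms)
    }
    BEGIN { split(offsets, o, ","); for (i = 1; i < ARGC; i++) off[ARGV[i]] = o[i] }
    { sub(/\r$/, "") }                      # tolerate CRLF files
    FNR == 1 { if (NR > 1 && !blank) print ""; blank = 1 }  # keep files apart
    blank && /^[0-9]+$/ { print ++idx; blank = 0; next }    # cue number
    / --> / {
      d = off[FILENAME]
      print to_tc(to_ms($1) + d) " --> " to_tc(to_ms($3) + d)
      blank = 0; next
    }
    { print; blank = ($0 == "") }
  ' "$@"
}

# Usage, from the folder holding 001.opus/001.srt, 002.opus/002.srt, ...:
#   audio=( [0-9][0-9][0-9].opus )
#   stitch_srt "$(audio_offsets "${audio[@]}")" "${audio[@]/%.opus/.srt}" > book.srt
```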
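The per-job timing can be done with bash's own time keyword; another option is a small wrapper around bash's built-in SECONDS counter. This is a hypothetical helper, not the script's actual code, and its resolution is whole seconds.

```shell
#!/usr/bin/env bash
# timed LABEL COMMAND [ARGS...]: hypothetical wrapper that runs COMMAND,
# prints how long it took to stderr, and passes COMMAND's exit status through.
timed() {
  local label=$1 start=$SECONDS status elapsed
  shift
  "$@"
  status=$?
  elapsed=$((SECONDS - start))
  printf '%s took %dm %02ds\n' "$label" $((elapsed / 60)) $((elapsed % 60)) >&2
  return "$status"
}

# Example: timed "Converting subs" ffmpeg -i a.vtt a.srt
```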
mkdir output; parallel --tag -j 2 ocrmypdf -O 3 -s --skip-big .1 '{}' 'output/{}' ::: *.pdf
Batch PDF image optimization to reduce file sizes. I usually use level 2 (-O 2), but sometimes 3 is warranted, as long as you don't have text in images; otherwise it looks bad. In these PDFs only the first two images look kind of bad; the rest throughout the PDFs are just fine. File savings are great: 2.5 GB -> 1.4 GB (44% overall reduction).
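To check the savings yourself, here's a small sketch that totals the *.pdf files in the input and output folders and prints the reduction. pdf_savings is a hypothetical name; with the folders from the parallel/ocrmypdf command you'd run pdf_savings . output.

```shell
#!/usr/bin/env bash
# pdf_savings IN_DIR OUT_DIR: hypothetical helper that sums the sizes of the
# *.pdf files in both folders and prints the overall reduction.
pdf_savings() {
  local before after
  # With several files wc -c adds a "total" line; the last line holds the sum.
  before=$(wc -c -- "$1"/*.pdf | tail -n 1 | awk '{ print $1 }')
  after=$(wc -c -- "$2"/*.pdf | tail -n 1 | awk '{ print $1 }')
  awk -v b="$before" -v a="$after" 'BEGIN {
    printf "%.1f MB -> %.1f MB (%.0f%% overall reduction)\n", b / 1e6, a / 1e6, (b - a) * 100 / b
  }'
}
```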
Adding bookmarks to PDFs with the booky script.
Create booky.sh and booky.py, chmod +x both, and put them in /usr/local/bin.
Then make a text file with the chapters / bookmarks, putting a comma (,) before each page number, and save it as bookmarks.txt.
Now just run booky and a new PDF file with the bookmarks is created
booky.sh some.pdf bookmarks.txt
This creates some_new.pdf.
bookmarks.txt (7.8 KB)
The attached bookmarks.txt gives you an example: just put bookmarks between { }. You can make multiple levels that expand by nesting more { }.
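A quick sanity check of bookmarks.txt before running booky.sh could look like this. It's a hypothetical helper, not part of booky, and it only assumes the format described here: every bookmark line ends with a comma and a page number, and nested levels are wrapped in { and } placed on their own lines (check the attached example for the exact layout).

```shell
#!/usr/bin/env bash
# check_bookmarks FILE: hypothetical pre-flight check for a booky bookmarks
# file. Assumes "Title, page" lines and { } on their own lines for nesting.
# Prints each problem it finds and exits 1 if there were any.
check_bookmarks() {
  awk '
    { line = $0; sub(/\r$/, "", line); gsub(/^[ \t]+|[ \t]+$/, "", line) }
    line == "" { next }
    line == "{" { depth++; next }
    line == "}" {
      if (--depth < 0) { printf "line %d: } without matching {\n", NR; bad = 1; depth = 0 }
      next
    }
    line !~ /,[ \t]*[0-9]+$/ { printf "line %d: no comma + page number: %s\n", NR, line; bad = 1 }
    END {
      if (depth > 0) { printf "%d unclosed {\n", depth; bad = 1 }
      exit bad
    }
  ' "$1"
}
```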