Channel: GeekTips
bookmarks.txt
5.7 KB
booky.sh some.pdf bookmarks.txt
Here are the bookmarks with the 7 sections, with the offset adjusted by +1 for page numbers, to give an idea.
I had a CHM file and uploaded it to an online site, Zamzar, which automatically generated bookmarks, so I wanted to mimic that. Finally found pdf.tocgen.
pip3 install pdf.tocgen
Extract some metadata to help make a recipe. This does level one (-a 1) on page 78 (-p 78) to figure out the font name, font size, and/or font color to search for, so the pattern automatically generates the bookmarks.
pdfxmeta -a 1 -p 78 input.pdf >> recipe.toml
Now add the ToC / bookmarks to the PDF
pdftocio input.pdf < toc
and it'll output a PDF with bookmarks named input_out.pdf
Another PDF I did went up 1.5MB in file size with tons of bookmarks. This one went from 11MB to 242MB. No idea why, so I compressed it and it went down to 5MB.
ocrmypdf -O 3 -s --skip-big .1 --jbig2-lossy input.pdf output.pdf
pdftocgen -v volume_10_surahs_12-15.pdf < recipe.toml > toc
pdftocio volume_10_surahs_12-15.pdf < toc
pdftocgen -v volume_11_surahs_16-20.pdf < recipe.toml > toc
pdftocio volume_11_surahs_16-20.pdf < toc
pdftocgen -v volume_12_surahs_21-25.pdf < recipe.toml > toc
pdftocio volume_12_surahs_21-25.pdf < toc
pdftocgen -v volume_13_surahs_26-32.pdf < recipe.toml > toc
pdftocio volume_13_surahs_26-32.pdf < toc
pdftocgen -v volume_14_surahs_33-39.pdf < recipe.toml > toc
pdftocio volume_14_surahs_33-39.pdf < toc
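The repeated pairs of commands above could be wrapped in a loop. A minimal sketch, assuming pdftocgen and pdftocio are on PATH and recipe.toml is in the current directory; the function name bookmark_volumes and the per-volume .toc file names are made up for illustration (the commands above reuse a single file named toc).

```shell
# Hypothetical helper: run the same recipe over every PDF passed as an argument.
# Assumes pdftocgen and pdftocio (from pdf.tocgen) are installed and that
# recipe.toml exists in the current directory.
bookmark_volumes() {
    for pdf in "$@"; do
        toc="${pdf%.pdf}.toc"                        # one ToC file per volume
        pdftocgen -v "$pdf" < recipe.toml > "$toc" || return 1
        pdftocio "$pdf" < "$toc" || return 1         # writes <name>_out.pdf
    done
}

# Usage:
#   bookmark_volumes volume_*.pdf
```

Keeping one ToC file per volume means each one can still be inspected or hand-edited afterwards and fed to pdftocio again.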
Used this recipe
[[heading]]
level = 1
greedy = true
font.name = "BookAntiqua-Italic"
font.size = 24.0
[[heading]]
level = 1
greedy = true
font.name = "BookAntiqua-Bold"
font.size = 24.0
[[heading]]
level = 1
greedy = true
font.name = "BookAntiqua"
font.size = 18.0
[[heading]]
level = 2
greedy = true
font.name = "BookAntiqua"
font.size = 23.377168655395508
font.size_tolerance = 0.2
[[heading]]
level = 2
greedy = true
font.name = "BookAntiqua"
font.size = 24.0
[[heading]]
level = 3
greedy = true
font.name = "BookAntiqua-Bold"
font.size = 12.0
And this is the result. Still need to edit the TOC a tad before writing it to a new PDF. Couldn't imagine manually adding thousands of bookmarks in this 18 volume book.
regex to cut any bookmark titles longer than 60 characters down to 60
search:
(^"\d+:\d+.\D{60})(.*?")
replace:
$1"
to replace all non-verse lines (verses look like 7:23, 8:23, 110:11 for instance)
(^"\D+\.*?\d+ \d+.\d+)
(^"\D+.*?$)
or just select the verses with Ctrl+Shift+L in VS Code using this regex, cut them, delete the rest of the lines, then paste them back
(^"\d+:\d+.*?$)
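The same two edits can also be done from the shell instead of the editor. A rough sketch, assuming sed with -E (extended regex) and grep are available and that verse lines in the ToC start with a quoted reference such as "7:23; the function names trim_titles and keep_verses are hypothetical. sed has no \d, \D, or non-greedy .*?, so [0-9], [^0-9], and [^"]* stand in for them.

```shell
# Cut a verse title down to its first 60 non-digit characters and re-close the quote.
# Lines with shorter titles do not match and pass through unchanged.
trim_titles() {
    sed -E 's/^("[0-9]+:[0-9]+.[^0-9]{60})[^"]*"/\1"/'
}

# Keep only lines that start with a quoted verse reference such as "7:23
keep_verses() {
    grep -E '^"[0-9]+:[0-9]+'
}

# Usage (toc is the file written by pdftocgen; toc.edited is a made-up name):
#   keep_verses < toc | trim_titles > toc.edited
```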
Switched Substicher down to 99 max for the last month or so, but now it's back up to 200, since no matter what I did (removing silence, medium or large model, translate or no translate, or another language then translate), lo and behold, it would hallucinate (repeat). So now I'm combating it by keeping each audio segment under 30 mins. Hopefully when I stitch all 200 subtitles back into one, it'll be in sync.
This particular audiobook I'm transcribing is 84 hours, so 200 chunks is about 25m each. Found out that the large model with ComputeAllUnites for CoreML works fine, but then I can't really use the M1 for anything else, so using the -ng (no GPU) option lets me keep using the laptop while whisper.cpp goes a tad slower. Worth it for me though.
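The chunk length is simple arithmetic: 84 hours split into 200 pieces. A quick sketch; chunk_seconds is a hypothetical helper, and the ffmpeg segment command in the comment is only one possible way to do the split (the post doesn't say how the audio was actually chunked).

```shell
# Print the length of each chunk in whole seconds: total hours * 3600 / chunk count.
chunk_seconds() {
    echo $(( $1 * 3600 / $2 ))
}

chunk_seconds 84 200    # 1512 seconds, a bit over 25 minutes

# One possible split using ffmpeg's segment muxer (file names are placeholders):
#   ffmpeg -i audiobook.mp3 -f segment -segment_time 1512 -c copy chunk_%03d.mp3
```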
yt-dlp -f wa -o "%(autonumber)03d - %(title)s.%(ext)s" "someyoutubeplaylist"
If the videos in the playlist you're downloading only have chapter names as titles, then the files will sort out of order. Numbering them by modified date might not always work either.
The Cow [HDenOsoOJXQ].mp4
The Family of Amran [jUondpleUD8].mp4
The Opening [KlCyXnSCcyM].mp4
The Women [rY8l3LkcLKw].mp4
So with autonumber, instead of the files getting out of order (like above), the playlist order is maintained:
001 - The Opening.mp4
002 - The Cow.mp4
003 - The Family of Amran.mp4
004 - The Women.mp4
005 - The Food.mp4
006 - The Cattle.mp4
007 - The Elevated Places.mp4
Download a playlist and number it in reverse, since the playlist is in reverse order.
yt-dlp -f wa --split-chapters --playlist-reverse -o "%(playlist_autonumber)03d - %(title)s.%(ext)s" "someyoutubeplaylist URL"
parallel examples. Notice that what comes after the ::: is the input.
I just processed 105 mp3 files to remove silence and hiss. The first method I used, on a Mac M1:
Removing silence and hiss took 01h:52m:05s
Now I still might use this method, but obviously with parallel.
Now for an A/B comparison I used only ffmpeg to remove silence, apply dynamic audio normalization, and remove hiss:
Removing silence and hiss took 00h:40m:23s
Removing silence and hiss using parallel took 00h:10m:42s
So yeah, about 4x faster.
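To put a number on the speedup, the timings can be converted to seconds and divided. A small sketch; to_seconds and speedup are hypothetical helper names, and awk is assumed to be available.

```shell
# Convert a duration like 00h:40m:23s to seconds.
to_seconds() {
    echo "$1" | tr -d 'hms' | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print how many times longer the first timing is than the second, to one decimal.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

speedup 00h:40m:23s 00h:10m:42s    # ffmpeg only: serial vs parallel
speedup 01h:52m:05s 00h:10m:42s    # first method vs ffmpeg with parallel
```

The ffmpeg-only comparison comes out a little under 4x.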
Example: convert a for-in loop using ffmpeg to parallel
Convert to parallel (below to above example)
dtmove=$( date +%Y_%m_%d_%H_%M_%S)
[ ! -d output ] && mkdir output
mkdir output/"$dtmove"
parallel --bar ffmpeg -i {} -hide_banner -c:a libopus -b:a 32k -af "highpass=200,lowpass=3000,afftdn=tr=1,volume=8dB,dynaudnorm" output/$dtmove/{/} ::: output/$dt/*.opus
The original for loop:
for i in *.opus ; do ffmpeg -i "$i" -hide_banner -c:a libopus -b:a 32k -af "highpass=200,lowpass=3000,afftdn=tr=1,volume=8dB,dynaudnorm" ../$dtmove/"$i" ; done
mpvconfig02152024mac.zip
3.7 MB
mpvconfig 02152024 Mac 3.7MiB ... mainly for opus chaptered audiobooks. Includes iptv lists.