How to apply an offset to page numbers, e.g. when the actual page number is off by +1 (sometimes it's +3 or even +30). In VSCode, first select all the numbers you wish to increase; here we only want the numbers at the end of each line.
First find all the numbers with the regex (\d+)$ and they'll be highlighted, but not yet selected.
Press Cmd+Shift+L on Mac (Ctrl+Shift+L on Linux/Windows) to turn all the regex matches into selections.
Then, with the Text Power Tools extension installed, choose the increase/decrease decimal number option with a custom value and put in 1 to increase. Notice all the page numbers have been incremented by 1.
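If you'd rather apply the offset outside the editor, a one-line perl substitution does the same thing. A minimal sketch, assuming the page number is the last thing on each line (change the + 1 for other offsets; the output filename is made up):

perl -pe 's/(\d+)\s*$/$1 + 1 . "\n"/e' bookmarks.txt > bookmarks_offset.txt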
Final result for the bookmarked PDF: bookmarks.txt (5.7 KB).
booky.sh some.pdf bookmarks.txt

Here are the bookmarks with the 7 sections, page numbers offset by +1, to give an idea.
I had a CHM file and uploaded it to an online converter (Zamzar), and it automatically generated bookmarks. I wanted to mimic that, and finally found pdf.tocgen.
pip3 install pdf.tocgen

Extract some metadata to help build a recipe. This runs at level one (-a 1) on page 78 (-p 78), to figure out which font, font size, and/or font color to search for, so the pattern can be used to generate the bookmarks automatically.
pdfxmeta -a 1 -p 78 input.pdf >> recipe.toml
This is the recipe.toml: the main chapters are at level 1 and the verses at level 2. Playing with the tolerance took a while, but now I kinda know what to do.
Generate the toc (table of contents) using the recipe you created:

pdftocgen doc.pdf < recipe.toml > toc

Clean up toc if necessary.

Just had to do some regex to get rid of unwanted text after the verses and some other unwanted text, then delete any blank lines, and the toc is ready to go.
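Deleting the blank lines can also be done outside the editor with sed. A sketch, assuming GNU sed (on macOS use sed -i '' or gsed):

sed -i '/^[[:space:]]*$/d' toc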
Now add the toc/bookmarks to the PDF:
pdftocio input.pdf < toc
and it'll output a PDF with bookmarks named input_out.pdf

Another PDF I did went up 1.5MB in file size with tons of bookmarks. This one went from 11MB to 242MB. No idea why, so I compressed it and it went down to 5MB.

ocrmypdf -O 3 -s --skip-big .1 --jbig2-lossy input.pdf output.pdf
pdftocgen -v volume_10_surahs_12-15.pdf < recipe.toml > toc
pdftocio volume_10_surahs_12-15.pdf < toc
pdftocgen -v volume_11_surahs_16-20.pdf < recipe.toml > toc
pdftocio volume_11_surahs_16-20.pdf < toc
pdftocgen -v volume_12_surahs_21-25.pdf < recipe.toml > toc
pdftocio volume_12_surahs_21-25.pdf < toc
pdftocgen -v volume_13_surahs_26-32.pdf < recipe.toml > toc
pdftocio volume_13_surahs_26-32.pdf < toc
pdftocgen -v volume_14_surahs_33-39.pdf < recipe.toml > toc
pdftocio volume_14_surahs_33-39.pdf < toc
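The same two steps repeat for every volume, so a loop covers them all. A sketch, assuming all the volumes share the same recipe.toml:

for f in volume_*.pdf ; do pdftocgen -v "$f" < recipe.toml > toc ; pdftocio "$f" < toc ; done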


Used this recipe

[[heading]]
level = 1
greedy = true
font.name = "BookAntiqua-Italic"
font.size = 24.0

[[heading]]
level = 1
greedy = true
font.name = "BookAntiqua-Bold"
font.size = 24.0

[[heading]]
level = 1
greedy = true
font.name = "BookAntiqua"
font.size = 18.0

[[heading]]
level = 2
greedy = true
font.name = "BookAntiqua"
font.size = 23.377168655395508
font.size_tolerance = 0.2

[[heading]]
level = 2
greedy = true
font.name = "BookAntiqua"
font.size = 24.0

[[heading]]
level = 3
greedy = true
font.name = "BookAntiqua-Bold"
font.size = 12.0

And this is the result. Still need to edit the TOC a tad before writing it to a new PDF. Couldn't imagine manually creating the thousands of bookmarks in this 18-volume book.

Regex to truncate any bookmark titles longer than 60 characters:
search:
(^"\d+:\d+.\D{60})(.*?")
replace:
$1"

To match all the lines that are not verses (verses look like 7:23, 8:23, 110:11 for instance):
(^"\D+\.*?\d+ \d+.\d+)

(^"\D+.*?$)

To select just the verse lines, use this regex in the VSCode find (regex mode) and press Ctrl+Shift+L to select all matches. Cut them, delete the rest of the lines, then paste them back:
(^"\d+:\d+.*?$)
Switched Substicher down to a max of 99 for the last month or so, but now it's back up to 200. No matter what I did (removing silence, medium or large model, translate or no translate, or transcribing in another language then translating), lo and behold it would hallucinate (repeat itself). So now I'm combating that by keeping each audio segment under 30 mins. Hopefully when all 200 subtitles are stitched back into one, it'll be in sync.
This particular audiobook I'm transcribing is 84 hours, so 200 chunks is ~25 min each. Found out that the large model with all Core ML compute units works fine, but then I can't really use the M1 for anything else, so the -ng (no GPU) option lets me keep using the laptop while whisper.cpp goes a tad slower. Worth it for me though.
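Splitting the audio into sub-30-minute chunks is easy with ffmpeg's segment muxer. A sketch, assuming 25-minute (1500 s) pieces with stream copy (the filenames are made up):

ffmpeg -i book.opus -f segment -segment_time 1500 -c copy chunk_%03d.opus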
yt-dlp -f wa -o "%(autonumber)03d - %(title)s.%(ext)s" "someyoutubeplaylist"

If the videos in the playlist only have chapter-style names, the downloaded files will sort out of order. Numbering them by modified date might not always work either.

The Cow [HDenOsoOJXQ].mp4
The Family of Amran [jUondpleUD8].mp4
The Opening [KlCyXnSCcyM].mp4
The Women [rY8l3LkcLKw].mp4


So instead of ending up out of order (like above), the playlist order is maintained:

001 - The Opening.mp4
002 - The Cow.mp4
003 - The Family of Amran.mp4
004 - The Women.mp4
005 - The Food.mp4
006 - The Cattle.mp4
007 - The Elevated Places.mp4


Download a playlist and number it in reverse, since the playlist itself is in reverse order:

yt-dlp -f wa --split-chapters --playlist-reverse -o "%(playlist_autonumber)03d - %(title)s.%(ext)s" "someyoutubeplaylist URL"
parallel isn't all that hard; I actually decided to read the man page once. I used to specify -j2 or -j8 for the number of jobs. No need: it defaults to one job per CPU core.
parallel examples. Notice that what comes after the ::: is the input.
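A minimal example of the shape (gzip here is an arbitrary stand-in):

parallel gzip ::: *.log

Every file after the ::: becomes one job, run across all cores.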

I just processed 105 MP3 files to remove silence and hiss. The first method I used took, on a Mac M1:

Removing silence and hiss took 01h:52m:05s

Now, I still might use this method, but obviously with parallel.

Now for an A/B comparison, I used only ffmpeg to remove silence, remove hiss, and apply dynamic audio normalization:

Removing silence and hiss took 00h:40m:23s

Removing silence and hiss using parallel took 00h:10m:42s

So yeah, roughly 4x faster.
Notice it removed almost 4 hours of silent segments in 80 hours of content. That can be adjusted with the dB threshold; I think I'm happy with -30dB though.
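For reference, silence removal in ffmpeg is the silenceremove filter. A sketch with a -30dB threshold; the stop_periods/stop_duration values here are assumptions, tune to taste:

ffmpeg -i in.opus -af "silenceremove=stop_periods=-1:stop_duration=2:stop_threshold=-30dB" out.opus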
Example: converting a for-in loop using ffmpeg to parallel.

dtmove=$(date +%Y_%m_%d_%H_%M_%S)
[ ! -d output ] && mkdir output
mkdir output/"$dtmove"

# $dt is the timestamped directory from the earlier run that holds the input files
parallel --bar ffmpeg -i {} -hide_banner -c:a libopus -b:a 32k -af "highpass=200,lowpass=3000,afftdn=tr=1,volume=8dB,dynaudnorm" output/"$dtmove"/{/} ::: output/"$dt"/*.opus

For comparison, here is the original for loop that the parallel line above replaces:

for i in *.opus ; do ffmpeg -i "$i" -hide_banner -c:a libopus -b:a 32k -af "highpass=200,lowpass=3000,afftdn=tr=1,volume=8dB,dynaudnorm" ../"$dtmove"/"$i" ; done
Attached: mpvconfig02152024mac.zip (3.7 MB), mainly for opus chaptered audiobooks. Includes IPTV lists.