CSDb User Forums


Forums > C64 Coding > On the relative pros and cons of various RLE schemes
2024-04-21 17:12
ChristopherJam

Registered: Aug 2004
Posts: 1382
On the relative pros and cons of various RLE schemes

So over in Native crunch/decrunch code, I just suggested encoding
abbcdddeffffg as
abb{0}cdd{1}eff{2}g

Raistlin kindly liked the idea; Krill quite sensibly didn't want us to hijack that thread and suggested a new one for the new topic. So... have at it!
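To make the scheme concrete, here is a minimal Python sketch of that encoding, assuming the count follows every repeated pair and gives the number of extra repeats beyond the pair (tuples stand in for the raw count bytes so they stay distinct from literals):

```python
def rle_encode(data: bytes) -> list:
    """After any two equal bytes, emit a count of further repeats."""
    out, i = [], 0
    while i < len(data):
        out.append(data[i])
        if i + 1 < len(data) and data[i + 1] == data[i]:
            out.append(data[i])                # second copy signals a run
            j = i + 2
            while j < len(data) and data[j] == data[i]:
                j += 1
            out.append(('count', j - i - 2))   # repeats beyond the pair
            i = j
        else:
            i += 1
    return out

def rle_decode(tokens) -> bytes:
    out = bytearray()
    for t in tokens:
        if isinstance(t, tuple):               # ('count', n): n more copies
            out.extend(out[-1:] * t[1])
        else:
            out.append(t)
    return bytes(out)
```

So `abbcdddeffffg` becomes the literal stream `a b b {0} c d d {1} e f f {2} g`, and a run of two costs only one extra count byte.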
 
 
2024-04-23 10:06
enthusi

Registered: May 2004
Posts: 675
Just to add an alternative which is a bit special but cute:
Maniac Mansion and ZMK use some RLE depacker that resides in ZP and encodes most-common bytes, which I felt works rather well for graphics. I wrote something about that here:
https://www.pagetable.com/?p=603

A byte < $40 = count of raw bytes following.
$40 to < $80 represents the length of the run of the byte to follow (+$3f of course).
$80 to < $a0 is the number of runs of most common byte 1,
$a0 to < $c0, $c0 to < $e0 and $e0-$ff for common bytes 2, 3 and 4 respectively.
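One possible reading of that control-byte layout, sketched in Python. The range boundaries and the four-entry `common` table are assumptions from the post above, not checked against the actual game code:

```python
def mm_depack(data: bytes, common: bytes) -> bytes:
    """Hedged sketch of a range-dispatched RLE depacker.
    common holds the four most common bytes of the payload."""
    out = bytearray()
    i = 0
    while i < len(data):
        c = data[i]; i += 1
        if c < 0x40:                      # literal block: c raw bytes follow
            out += data[i:i + c]; i += c
        elif c < 0x80:                    # run of next byte, length c - $3f
            out += bytes([data[i]]) * (c - 0x3f); i += 1
        elif c < 0xa0:                    # run of most common byte 1
            out += common[0:1] * (c - 0x7f)
        elif c < 0xc0:                    # run of most common byte 2
            out += common[1:2] * (c - 0x9f)
        elif c < 0xe0:                    # run of most common byte 3
            out += common[2:3] * (c - 0xbf)
        else:                             # run of most common byte 4
            out += common[3:4] * (c - 0xdf)
    return out
```

Runs of the four pre-agreed bytes cost a single byte each, which is why it does so well on graphics dominated by a handful of values.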
2024-04-23 15:21
ChristopherJam

Registered: Aug 2004
Posts: 1382
ooh, I like that one enthusi.

Really interesting, the historical divide between packers and crunchers, especially the way intros tended to be displayed between the one and the other. It does indeed reduce the impact of run-length limits considerably. Not sure why I ever bothered trying to improve how my higher-ratio crunchers handle empty space now :P
2024-04-23 16:35
Martin Piper

Registered: Nov 2007
Posts: 646
In the old days, dictionary based compression would be quite slow and memory hungry. The usual way of speeding it up would be to limit the distance searched and limit the dictionary.

Using a RLE compression pass before a dictionary compression stage would reduce the total input data, it would reduce the strain on the dictionary, and would pass some benefit along to the dictionary based compression.

As dictionary algorithms improve, and also as time and memory becomes less of an issue, the dictionary size can be increased and this means there is less benefit to shrinking the input data with a RLE pre-pass.
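A rough sketch of such a pre-pass: a naive (value, count) RLE feeding zlib as a stand-in for the dictionary stage. Note that in this toy version literals double in size; real packers use an escape or control byte instead, as in the scheme enthusi describes:

```python
import zlib

def rle_prepass(data: bytes) -> bytes:
    """Naive (value, count) pair RLE; runs are capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        out += bytes([data[i], j - i])    # value, then run length
        i = j
    return bytes(out)

# Long runs collapse before the dictionary stage ever sees them:
data = b"\x00" * 4096 + b"some actual payload " * 64
packed_then_crunched = zlib.compress(rle_prepass(data), 9)
```

With a large modern dictionary the pre-pass can even hurt the final ratio on mixed data; as noted above, the historical win was mostly speed and memory, not ratio.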
2024-04-23 18:46
Fungus

Registered: Sep 2002
Posts: 629
Main reason for packing first was that before crunchers had REU versions with 2MHz or 8MHz support, they were slow as hell and took many hours. So smaller input files crunched way faster. Also, crunching large files tended to bug out. The results generally were smaller too, so it became the norm.
2024-05-29 11:39
Starfox

Registered: Jul 2014
Posts: 34
I researched a lot about RLE a year ago.

Forgot most of it, lol!

But you can preprocess the data before run-length encoding it, to make the RLE more successful.

There were a couple of schemes used, which were probably mostly optimized for text. Burrows-Wheeler Transform for instance.

I have my notes at home, let me see if there's anything useful.
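For illustration, the textbook (naive) forward BWT, which clusters equal bytes so that a following RLE pass finds longer runs. A sketch only, assuming a unique zero sentinel and sorting full rotations, which is hopeless at C64 scale but shows the effect:

```python
def bwt(s: bytes, sentinel: int = 0) -> bytes:
    """Naive BWT: append a unique smallest sentinel, sort all
    rotations, keep the last column. O(n^2 log n), demo only."""
    assert sentinel not in s
    t = s + bytes([sentinel])
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return bytes(r[-1] for r in rotations)
```

Equal bytes end up adjacent because rotations sharing the same following context sort together, e.g. `banana` transforms to `annb\x00aa`, with the n's and a's grouped.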
2024-05-29 14:20
Martin Piper

Registered: Nov 2007
Posts: 646
Quote: I researched a lot about RLE a year ago.

Forgot most of it, lol!

But you can preprocess the data before run-length encoding it, to make the RLE more successful.

There were a couple of schemes used, which were probably mostly optimized for text. Burrows-Wheeler Transform for instance.

I have my notes at home, let me see if there's anything useful.


If I recall, reversing BWT-encoded data needs a few sorts for each stored string, which makes it quite expensive? Creating the BWT data also needs a sort, but I don't care about that so much.
2024-05-29 20:13
Bansai

Registered: Feb 2023
Posts: 36
Quoting Martin Piper
If I recall, reversing BWT-encoded data needs a few sorts for each stored string, which makes it quite expensive? Creating the BWT data also needs a sort, but I don't care about that so much.
https://marknelson.us/posts/1996/09/01/bwt.html

...BWT is off topic from RLE, but Nelson breaks compression and decompression up into phases that can be piped, so it's easy to move them around, omit them entirely, etc., to see what they do.

His "The Data Compression Book" was a decent reference at the time it was published: https://marknelson.us/pages/tdcb.html. It's a good intro text for those who want to understand various forms of data compression.
2024-05-30 20:15
Starfox

Registered: Jul 2014
Posts: 34
I found some links in my notes. Take them as they are.

It's mostly BWT though:

https://www.baeldung.com/cs/bwt-rle-compression-algorithm-for-s..

https://www.fileformat.info/mirror/egff/ch09_03.htm

https://www.youtube.com/watch?v=4n7NPk5lwbI

I had some other pdf's/videos with some other methods. If only I could find them... (same youtube channel)

Here's one "FM index": https://www.youtube.com/watch?v=kvVGj5V65io (It's also based on BWT).

I can't find the description I read of a scheme for encoding the number of bytes in a row, where you store the number of equal bytes in a bit representation so it can be any number. I'll post the link if I find it.

I think it's the unary/Elias coding in this video, where he explains it:

https://www.youtube.com/watch?v=3jKLjmV1bL8

ok last link about text transforms before encoding:
https://www.youtube.com/watch?v=Q2pinaj3i9Y

p.s. sorry for the messy post :)
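The "any number" run-length code described above sounds like Elias gamma coding: a unary prefix giving the bit length, then the value itself in binary. A sketch over bit strings, for clarity rather than speed:

```python
def elias_gamma(n: int) -> str:
    """Elias gamma code for n >= 1: (bitlen - 1) zeros, then n in binary."""
    assert n >= 1
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

def elias_gamma_decode(bits: str) -> tuple:
    """Decode one gamma code from the front: (value, bits consumed)."""
    zeros = len(bits) - len(bits.lstrip("0"))
    width = zeros + 1
    return int(bits[zeros:zeros + width], 2), zeros + width
```

Short runs cost few bits (1 encodes as a single `1` bit) while arbitrarily long runs remain representable, which is exactly the "any number" property.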
2024-05-31 04:34
ChristopherJam

Registered: Aug 2004
Posts: 1382
Cheers for the links, Starfox!

This is getting a bit off topic (ironic, given that this thread was created to shift all the RLE talk away from another discussion) but yeah, I've been looking at FM-index and BWT for implementing a native cruncher that doesn't require an REU just to get compression time down to something workable. There's been a lot of interesting research in the past 20 years!

Have you done any work on implementing BWT in a C64 context?
2024-05-31 20:41
Burglar

Registered: Dec 2004
Posts: 1051
Quoting enthusi
Just to add an alternative which is a bit special but cute:
Maniac Mansion and ZMK use some RLE depacker that resides in ZP and encodes most-common bytes, which I felt works rather well for graphics. I wrote something about that here:
https://www.pagetable.com/?p=603

A byte < $40 = count of raw bytes following.
$40 to < $80 represents the length of the run of the byte to follow (+$3f of course).
$80 to < $a0 is the number of runs of most common byte 1,
$a0 to < $c0, $c0 to < $e0 and $e0-$ff for common bytes 2, 3 and 4 respectively.
that is interesting indeed, also lovely article, even Ron Gilbert himself replied :)

not having looked at the unpacker, i'm assuming lotsa CMPs, so possibly custom ranges could generate great results for various inputs.