Endianness, a constant source of conflict for decades
ghusbands 2021-08-18 10:50:39 +0000 UTC [ - ]
A big-endian system gives: DEADBEEF
And a little-endian system gives: EFBEADDE
Only one of those reads naturally, as long as your increasing memory order goes from left to right, as is the norm.
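For illustration, a minimal C sketch (just for this example) that dumps those bytes in increasing address order:
===
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t value = 0xDEADBEEF;
    const unsigned char *p = (const unsigned char *)&value;
    /* Print bytes from the lowest address to the highest: a big-endian
       machine prints DE AD BE EF, a little-endian one prints EF BE AD DE. */
    for (size_t i = 0; i < sizeof value; i++)
        printf("%02X ", (unsigned)p[i]);
    putchar('\n');
    return 0;
}
===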
zepearl 2021-08-18 20:09:52 +0000 UTC [ - ]
But in little-endian systems the order of the individual bits in a byte is still (kind of) big-endian, with the most significant bit on the left, right?
Personally I always think first about how the individual bits are stored in a byte, which as far as I know is with the most significant bit on the left, regardless of whether the system is big- or little-endian, for example:
===
0: 0000 0000
1: 0000 0001
2: 0000 0010
...
255: 1111 1111
===
Therefore, if the number/value gets bigger than a byte, it feels more natural to me to keep adding bits to the left of the sequence (i.e. big-endian style), for example:
===
256: 1 0000 0000
257: 1 0000 0001
...
===
ghusbands 2021-08-18 21:21:41 +0000 UTC [ - ]
It has no bearing on the layout of bits in the RAM chips, caches, register files or data/address lines in a computer. Endianness pretty much only affects the operation of transferring values from/to memory.
zepearl 2021-08-18 21:55:29 +0000 UTC [ - ]
To me it would make sense to use little-endian if the same ordering were applied at the level of bits within a byte as well. Using little-endian at the byte level but big-endian at the bit level (within a byte) is quite confusing for me. Just my subjective opinion.
yakubin 2021-08-18 23:39:21 +0000 UTC [ - ]
Single bits are not addressable, so from a programmer's point of view the "order of bits" isn't well-defined. There is no way for a programmer to observe the endianness of bits.
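To illustrate, a small C sketch: the language only exposes bits through arithmetic, never through addresses, so the same expression picks out the most significant bit on any machine:
===
#include <stdio.h>

int main(void) {
    unsigned char x = 0x80;  /* binary 1000 0000 */
    /* Bits are reached only via shifts and masks, so this prints
       msb=1 lsb=0 on every architecture, big- or little-endian. */
    printf("msb=%d lsb=%d\n", (x >> 7) & 1, x & 1);
    return 0;
}
===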
api 2021-08-18 16:26:49 +0000 UTC [ - ]
Big-endian is pretty much dead. I am not aware of any BE systems on sale right now outside of small MIPS routers and embedded devices. If the Internet were designed today, network byte order would probably be little-endian. New cryptographic algorithms are being designed little-endian-first.
classichasclass 2021-08-18 23:35:17 +0000 UTC [ - ]
recursive 2021-08-18 16:21:16 +0000 UTC [ - ]
PennRobotics 2021-08-18 09:00:46 +0000 UTC [ - ]
At least for me, reading 16- and/or 32-bit multi-channel sensor data and then transceiving via an 8-bit radio (atmega32u4) is at least a little easier when all the endians align. No byte swapping. No jumping a pointer ahead for each byte in a packet. Most importantly, no calling swapbytes (and its relevant data casts) in Matlab when reading the serial data directly off the microcontroller.
While this is all relatively straightforward to code, each extra line is another tiny chance of error, and the failure modes in streaming data are not always obvious, e.g. the FIFO size is a power of two but the channel data is an odd number of bytes/words. Do you drop data or pause until the FIFO empties? Do circular buffers increment or decrement?
Luckily, modern sensors (e.g. IMUs) I've used usually have registers to byte swap, drop LSBs when not needed, choose right or left zero bit padding, change channel order, alter FIFO behavior, interrupt at a FIFO threshold, and so on.
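For reference, a rough C sketch of the kind of per-sample byte swap that aligned endianness lets you skip (the buffer and function names are made up for this example):
===
#include <stdint.h>
#include <stddef.h>

/* Convert a buffer of 16-bit samples from the sensor's byte order to the
   host's by swapping the two bytes of each sample in place. */
void swap16_buffer(uint16_t *samples, size_t count) {
    for (size_t i = 0; i < count; i++) {
        uint16_t s = samples[i];
        samples[i] = (uint16_t)((s << 8) | (s >> 8));
    }
}
===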
JoachimS 2021-08-18 09:17:03 +0000 UTC [ - ]
kzrdude 2021-08-18 10:04:58 +0000 UTC [ - ]
Yes, 55 is "half threesh" in the same way; that is, we say 5 + (half 3 x 20), but half 3 == 2.5, of course!
dandellion 2021-08-18 09:56:08 +0000 UTC [ - ]
So you call any number "five and half of thirty"? That sounds pretty easy to be honest.
JoachimS 2021-08-18 11:49:17 +0000 UTC [ - ]
agent327 2021-08-18 10:21:20 +0000 UTC [ - ]
PennRobotics 2021-08-18 10:25:24 +0000 UTC [ - ]
TacticalMalice 2021-08-18 09:08:56 +0000 UTC [ - ]
Dutch is similar and this is a source of mistakes when writing down (phone) numbers. I've resorted to calling out the digits in LTR order.
PaulIH 2021-08-18 10:47:35 +0000 UTC [ - ]
BoxOfRain 2021-08-18 09:16:33 +0000 UTC [ - ]
junon 2021-08-18 09:14:41 +0000 UTC [ - ]
Another thing: years in German aren't spoken as "twenty twenty-one" as we commonly do in English, but instead the number is spoken out fully - "two-thousand one-and-twenty" ("zweitausendeinundzwanzig").
twic 2021-08-18 09:53:28 +0000 UTC [ - ]
darrenf 2021-08-18 16:58:10 +0000 UTC [ - ]
IME road number† pronunciation in England goes out of its way to avoid "thousand" and "hundred" – with exceptions, of course. Off the top of my head, I reckon I say them like this:
* one or two digits = spoken as the number rather than the digits: A three, M twenty five, etc.
* three digits = sometimes spoken as digits: A two-one-seven - but sometimes broken into two numbers: B one-eleven
* four digits = sometimes the number A thirty-one-hundred (never three-thousand-one-hundred!), sometimes digits B triple-one three, sometimes year-style B thirteen eighteen
There are probably more variations that I can't think of right now too. It's a mess :D
† and bus route numbers, for that matter
skydhash 2021-08-18 11:45:45 +0000 UTC [ - ]
PennRobotics 2021-08-18 10:33:31 +0000 UTC [ - ]
Hearing Google Assistant/Maps mispronounce German street or city names in an American accent is very grating to the ears. The pronunciation of a location name shouldn't depend on the language the assistant is set to speak, right? (Ignore for a moment the edge cases, like München vs Munich... although the voice says, "Munchin'," which is wrong in both languages!) And it can't be too complicated to borrow phonemes from another language where they don't exist... Right? When your American text-to-speech algorithm encounters an umlaut, it should generate the correct waveforms from a language that has umlauts.
(I'm sure someone reading this is jumping up and down, yelling about the "photo of a bird" xkcd.)
Akronymus 2021-08-18 10:53:30 +0000 UTC [ - ]
(I have my phone set to English because I prefer it like that, despite living in Austria, Europe. Street names are one of the reasons I rarely ever use Google Maps for navigation.)
simtel20 2021-08-18 10:56:22 +0000 UTC [ - ]
PennRobotics 2021-08-18 16:19:09 +0000 UTC [ - ]
(The German Michael is kinda... Michh-aye-ehl'.)
maxerickson 2021-08-18 11:04:56 +0000 UTC [ - ]
PennRobotics 2021-08-18 10:24:37 +0000 UTC [ - ]
I guess this is exactly what we're talking about---mistakes because you are not natively familiar with a particular system, and then you miss the non-base case. For me, I got the tens digit right but not the ten thousands digit.
In the memory case, it's knowing to adjust a pointer because a 32-bit value will start or end at a different address than a 64-bit value.
agent327 2021-08-18 10:22:24 +0000 UTC [ - ]
__del__ 2021-08-18 16:12:57 +0000 UTC [ - ]
ex. 415-222-9670 becomes: sub universal (one less than the answer to life, the universe and everything) deck (52 cards) swift (she's feeling 22) resolution (old dpi on windows) top speed (California speed limit)
now isn't "sub universal deck swift resolution top speed" easier than googling twitter hq? ;] granted, the associations have to make sense to you. for me, 96 was a toss up between nashville (code name of windows 96) and the resolution i had to train myself to remember after moving from the mac's 72.
skerit 2021-08-18 16:10:38 +0000 UTC [ - ]
thaumasiotes 2021-08-18 16:04:34 +0000 UTC [ - ]
What's unique about that?
Sing a song of sixpence, a pocket full of rye
Four and twenty blackbirds baked in a pie
When the pie was opened, the birds began to sing
Wasn't that a dainty dish to set before the king?
8ytecoder 2021-08-18 16:32:10 +0000 UTC [ - ]
ithkuil 2021-08-18 11:39:10 +0000 UTC [ - ]
I remember the VMS EXAMINE command, but also printed DEC manuals showing hex dumps where the columns were numbered RTL. The ASCII view on the right half of the hex dump followed LTR order, so basically the two views were mirrored. (see example at http://www0.mi.infn.it/~calcolo/OpenVMS/ssb71/4556/4556p004....)
With such a rendering, little endian does indeed look natural.
We do it all the time when rendering bit positions:
bit pos: 3210
bit val: 1100
Extending this layout to byte indices is quite natural indeed.
Miiko 2021-08-18 07:31:40 +0000 UTC [ - ]
* "big-endian" should have "big" (most significant) part on the end
* and "little-endian" should have "little" (least significant) bits at the end
Is there different mnemonics to remember what is what?
yetihehe 2021-08-18 07:35:14 +0000 UTC [ - ]
Miiko 2021-08-18 07:45:14 +0000 UTC [ - ]
kevin_thibedeau 2021-08-18 08:28:05 +0000 UTC [ - ]
yetihehe 2021-08-18 09:55:16 +0000 UTC [ - ]
kevin_thibedeau 2021-08-18 15:20:55 +0000 UTC [ - ]
anyfoo 2021-08-18 16:43:53 +0000 UTC [ - ]
kevin_thibedeau 2021-08-18 17:26:27 +0000 UTC [ - ]
anyfoo 2021-08-18 17:37:44 +0000 UTC [ - ]
kazinator 2021-08-18 08:03:01 +0000 UTC [ - ]
However, this effect can hide bugs under little endian, which will instantly reproduce on big endian.
Suppose that, say, a function expects a 32-bit parameter, but the caller thinks it is passing a byte, whose value is XX. Suppose that by fluke the memory is all zeros. Under little endian, the caller puts XX at the right memory location on the stack, resulting in XX 00 00 00. And, by golly, the callee gets the correct 32-bit value XX.
Under big endian, even if by fluke the memory is all zeros, the caller will put the XX byte in place, resulting in the same XX 00 00 00. But this now looks like a huge 32-bit value to the callee, hopefully caught in testing.
The apparently correct little-endian value, by contrast, will not be caught in testing.
Little endian would need nonzero values in the extra bytes instead of the fluky zeros in order to see a bad value.
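A contrived C sketch of that scenario (the all-zero slot and the single-byte store are assumptions for the example):
===
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    unsigned char slot[4] = {0, 0, 0, 0};  /* the "fluky" all-zero memory */
    slot[0] = 0x42;                        /* the caller stores just one byte */

    uint32_t value;
    memcpy(&value, slot, sizeof value);    /* the callee reads a full 32-bit word */
    /* Little endian: 0x00000042, looks correct and hides the bug.
       Big endian:    0x42000000, obviously wrong, likely caught in testing. */
    printf("callee sees 0x%08X\n", (unsigned)value);
    return 0;
}
===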
rwmj 2021-08-18 11:57:40 +0000 UTC [ - ]
classichasclass 2021-08-18 23:37:15 +0000 UTC [ - ]
That said, my POWER9 runs little, though it's perfectly capable of running big.
scratcheee 2021-08-18 09:25:03 +0000 UTC [ - ]
Maybe I'm just dumb, but surely the actual language order is entirely irrelevant? If hypothetically we wrote in English right-to-left instead, then we'd write our numbers right-to-left, and our memory dumps right-to-left, so then we'd find that little-endian caused the data to start with the smallest byte first (on the right).
Mirroring the language doesn't undo a mirroring within the language, and that's what little-endian is, so a more accurate statement would be:
>Binary dumps look more in line with how humans with single-direction scripts expect to read numbers.
cies 2021-08-18 09:45:46 +0000 UTC [ - ]
If you read the article, the author shows that in RTL languages (where our current number system originated) the numbers were also written RTL. We just stuck to the convention.
Interesting how this little bit of RTL snuck into Europe's otherwise LTR languages, to the extent that when I type a number in a spreadsheet it changes the alignment to fit with this... Interesting/insightful article!
thaumasiotes 2021-08-18 16:20:16 +0000 UTC [ - ]
Well, no, the author says this:
> Our modern numbering system has its roots in the Hindu numbering system, which was invented somewhere between the 1st and 4th century. Like the dominant writing system of the time, numbers were written right-to-left
This is not obvious - it appears that there was a right-to-left Indic script centered around Pakistan ( https://en.wikipedia.org/wiki/Kharosthi ) and a left-to-right one ( https://en.wikipedia.org/wiki/Brahmi_script ) farther south / east.
Before that, Sanskrit was written from left to right. It seems far more likely to me, in any event, that the order in which numbers are written, when the system is first devised, will reflect the order in which they are spoken in whatever language, not directly the order in which the language is written down.
Over time they will always develop a big-endian order, because that allows sorting them.
renox 2021-08-18 13:46:38 +0000 UTC [ - ]
When we speak we don't really use big endian or little endian; we say 'six thousand one hundred', not 'six one zero zero', and obviously we prefer to start with the 'big' part because it's the most important for the listener: I don't really care if your price is '6 thousand and one' or '6 thousand and two'; the listener hears '6 thousand' and then switches off.
And written language followed oral language, of course.
swiley 2021-08-18 08:24:35 +0000 UTC [ - ]
dataflow 2021-08-18 08:31:11 +0000 UTC [ - ]
Lvl999Noob 2021-08-18 08:46:44 +0000 UTC [ - ]
Here, I think 0 would be the most significant nibble and would be written at the leftmost position.
dataflow 2021-08-18 09:00:58 +0000 UTC [ - ]
vardump 2021-08-18 09:09:07 +0000 UTC [ - ]
Bytes 67 45 23 01 interpreted as 16-bit words in:
Little endian, LE, or what x86 uses: 4567 0123
Big endian, BE, or what 68k uses: 6745 2301
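A quick C sketch reproducing that, assembling each 16-bit word both ways from the same four bytes:
===
#include <stdint.h>
#include <stdio.h>

int main(void) {
    const unsigned char bytes[4] = {0x67, 0x45, 0x23, 0x01};

    for (int i = 0; i < 4; i += 2) {
        uint16_t le = (uint16_t)(bytes[i] | (bytes[i + 1] << 8));   /* low byte first */
        uint16_t be = (uint16_t)((bytes[i] << 8) | bytes[i + 1]);   /* high byte first */
        printf("word %d: LE = %04X, BE = %04X\n", i / 2, le, be);
    }
    return 0;
}
===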
dataflow 2021-08-18 09:55:52 +0000 UTC [ - ]
Let me try illustrating a different situation, hopefully without that confusion.
Let's say your data starts with the byte sequence 67 45 23 01...
If you assume these represent some LE numbers, and want to multiply by 256 (decimal), you end up with 00 67 45 23 01... it really doesn't matter (and you don't need to know) what the word sizes were. That's the only sane result, and the byte at offset 2 would end up being 45h... end of story. Even if your number was only supposed to be N bytes and now it's N+1 bytes, you can just chop it back to N bytes and your result will still be correct (modulo 256^N) and as intuitive as it could be.
But if you start working in BE, suddenly things get confusing fast. Imagine what this operation would be for 2-byte BE words. The first word in BE is 6745 and now becomes 674500, and you overflowed by 1 byte. So which part do you keep and which part do you overflow to the next word? If you keep the 6745, then the 00 ends up affecting the second word rather than the first one, which is just completely nonsensical. The other option is to keep 4500 and shove the 67 into the next word, turning it from 2301 into 230167. Now you have to repeat the same procedure with the 23, etc. until you reach the end of the data.
Now look at what just happened in the BE case. You have the bizarre situation where your words are internally BE (i.e. "go to offset 0" would now land on the 00 byte, which is not the first 2 characters in the editor!). And across words, they're still treated like LE: the inter-word overflows are still moving bytes to higher-order words on the right, not the left! There's just no sane way to do math with N-byte words and avoid LE entirely; even if you treat each word as BE, you're absolutely forced to treat the word sequence as LE. The only way to actually avoid all LE is to interpret the whole thing as 1 gigantic bignum, where "multiply by 256" ends up being translated into "append 00 to the end of the stream". That's great if your data really was 1 gigantic bignum, but not so much if your data was just typical ints or longs.
If this is still confusing (I realize it might be) then I'm not sure how else to put my thoughts into words unfortunately (no pun intended). Hopefully you can kind of see what I'm getting at though, even if I'm explaining some portions of it poorly (sorry).
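If it helps, here's a minimal C sketch of the LE case (the helper name is made up): multiplying an n-byte little-endian buffer by 256 is just a shift toward higher offsets plus a zero byte at offset 0, independent of any word size:
===
#include <stddef.h>
#include <string.h>

/* buf holds an n-byte little-endian integer; the result is kept modulo 256^n,
   exactly the "chop it back to N bytes" case described above. */
void mul256_le(unsigned char *buf, size_t n) {
    if (n == 0)
        return;
    memmove(buf + 1, buf, n - 1);  /* every byte moves up one place value */
    buf[0] = 0x00;                 /* new least significant byte */
}
===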
nybble41 2021-08-18 19:29:13 +0000 UTC [ - ]
In other words, your problem is that your hex editor is incorrectly assuming that the numbers are little-endian. If it interpreted them as big-endian then nothing would be swapped since the order of the bytes matches the standard conventions for numbers in most European writing systems. This is not a big-endian problem, it's a little-endian problem. A decent hex editor will allow you to set the byte order to match your file. (And as the article points out, the issue would be reversed if the bytes were displayed right-to-left following the same conventions that we use for numbers… but then all your strings would be reversed.)
Both representations have strengths and weaknesses depending on what you want to do. Most arbitrary-precision math works better with LE. On the other hand, hexadecimal string formatting works better with the BE encoding, where LE would require either the input or output to be reversed.
dataflow 2021-08-18 20:12:12 +0000 UTC [ - ]
The most natural order/sequence out there is that of natural numbers with zero (aka whole numbers), i.e. N0 = 0, 1, 2, 3, ...
So I'm basically arguing that the most natural order for the digit places is the same one, i.e. the coefficients would be ordered as 256^n: 256^0, 256^1, 256^2, 256^3, ...
nybble41 2021-08-18 23:53:46 +0000 UTC [ - ]
For most arbitrary-precision math operations LE is easier to work with. The only thing making BE more "natural" in some situations (in particular, reading numbers out of a byte-oriented hex listing) is the historical accident that we borrowed our number system from right-to-left languages, where they were written little-endian with the one's place on the right, without reversing the order to match the surrounding text. Which leads to the alignment issues highlighted in the article.
dandanua 2021-08-18 08:41:53 +0000 UTC [ - ]
[1] https://github.com/dandanua/little-endian-vs-big-endian-in-q...
baybal2 2021-08-18 09:19:37 +0000 UTC [ - ]
bitwize 2021-08-18 07:24:59 +0000 UTC [ - ]
iainmerrick 2021-08-18 08:38:15 +0000 UTC [ - ]
Although to borrow from minusf’s point, it’s good for software robustness that file formats and hardware use different endianness, as it forces you to read things byte-by-byte rather than lazily assuming you can just read 4 bytes and cast them directly to an int32.
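Something like this minimal C sketch (the function name is just illustrative), which decodes a 32-bit little-endian field byte by byte and is correct on any host endianness and any alignment:
===
#include <stdint.h>

/* Decode a 32-bit little-endian field from a byte buffer without casting
   the pointer to an int type. */
uint32_t read_u32_le(const unsigned char *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
===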
walki 2021-08-18 08:48:48 +0000 UTC [ - ]
Except that it is very bad for performance. As far as CPUs are concerned, little-endian has definitely won; most CPU architectures that were big endian in the past (e.g. PowerPC) are now little endian by default.
If all new CPU architectures are little endian this means that within a decade or two there won't be any operating systems that support big endian anymore.
classichasclass 2021-08-18 23:39:48 +0000 UTC [ - ]
iainmerrick 2021-08-18 09:11:43 +0000 UTC [ - ]
(That may even happen already on some architectures for all I know)
walki 2021-08-18 09:26:58 +0000 UTC [ - ]
Yes, most CPUs have special instructions for swapping between little- and big-endian byte order. GCC has the __builtin_bswap64(x) builtin for accessing this instruction. However, this is an additional instruction that needs to be executed for each read of a 64-bit word that needs converting; in some workloads this can double the number of executed instructions and hence add significant overhead.
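For example, a rough GCC/Clang-specific sketch (the function name is made up) of loading a big-endian on-disk value on a little-endian host, paying one byte swap per read:
===
#include <stdint.h>

/* Load a 64-bit big-endian value, e.g. from a memory-mapped file; on a
   little-endian host this costs one extra byte-swap instruction per read. */
uint64_t load_u64_be(const uint64_t *p) {
    uint64_t raw = *p;
#if defined(__GNUC__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    raw = __builtin_bswap64(raw);
#endif
    return raw;
}
===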
Supporting big-endian CPUs in systems programming sucks beyond imagination. There are virtually no big-endian users anymore, and making sure your software works fine on big endian requires testing it on a big-endian CPU. However, it is not possible to buy one anymore, as there are no consumer big-endian CPUs left. For this reason I still have a PowerPC Mac from 2003 at home running an ancient version of Mac OS X. But over the last two years I have stopped testing my software on big endian; I just don't care about big endian anymore...
foxfluff 2021-08-18 12:34:44 +0000 UTC [ - ]
Why is this good? How does the extra work make software more robust?
flohofwoe 2021-08-18 08:51:41 +0000 UTC [ - ]
Indeed, reading file headers byte by byte also avoids alignment issues on some CPUs. At least older ARM CPUs trapped misaligned reads (not sure if this is still the case though).
walki 2021-08-18 09:09:59 +0000 UTC [ - ]
No, this is not the case anymore. Nowadays support for unaligned memory accesses is very good on ARM and most other CPU architectures. On x86, aligned memory used to be very important for SIMD, but now there are even special SIMD instructions for unaligned data, and the performance overhead of unaligned memory accesses is generally very small in my experience.
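When alignment does still matter, one common pattern is to go through memcpy; a minimal sketch (the function name is illustrative):
===
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly misaligned address by copying through
   memcpy; compilers typically turn this into a plain load on CPUs where
   unaligned access is cheap. */
uint32_t load_u32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
===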
minusf 2021-08-18 07:34:48 +0000 UTC [ - ]
iainmerrick 2021-08-18 08:34:37 +0000 UTC [ - ]
(edit to fix interesting autocorrect glitch... “bug-endian” indeed!)
edflsafoiewq 2021-08-18 07:42:57 +0000 UTC [ - ]
saurik 2021-08-18 08:06:24 +0000 UTC [ - ]
agent327 2021-08-18 20:10:19 +0000 UTC [ - ]
KingOfCoders 2021-08-18 08:29:31 +0000 UTC [ - ]
Only if you write left to right, and not right to left.
nybble41 2021-08-18 19:37:33 +0000 UTC [ - ]
nly 2021-08-18 08:54:59 +0000 UTC [ - ]
a_t48 2021-08-18 09:01:21 +0000 UTC [ - ]
rkangel 2021-08-18 09:57:36 +0000 UTC [ - ]
This misses the original reason for it: big endian is more convenient if you're reading the value into a shift register. You don't need to know in advance how big the value is, because you just shift the contents left each time you get a new byte and you end up with the appropriate zero-padded value.
Basically for a period of history, big endian was easier to implement for comms and little endian for processors. The big endian comms reasons have generally died out while the processor ones remain.
Network byte order (big endian in communication) is a very strong convention though. If you define a network protocol and use little endian then I will be very sad.
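A small C sketch of that shift-register style accumulation (the function name is made up): each incoming byte just shifts the running value left, so the final width never has to be known up front:
===
#include <stdint.h>
#include <stddef.h>

/* Accumulate a big-endian value one byte at a time; works for any width
   up to 8 bytes without knowing the width in advance. */
uint64_t accumulate_be(const unsigned char *bytes, size_t n) {
    uint64_t value = 0;
    for (size_t i = 0; i < n; i++)
        value = (value << 8) | bytes[i];
    return value;
}
===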
AnIdiotOnTheNet 2021-08-18 13:58:11 +0000 UTC [ - ]
rkangel 2021-08-18 14:13:50 +0000 UTC [ - ]
If I'm building a network stack and at every layer I'm pulling values out of the header in network byte order, except in one special case where I have to pull them the other way (for no good reason), then that's the protocol author's fault.
If there is a good reason for using middle-endian EBCDIC (compatibility or technical), then fine. If not, then please use the thing that's the strong convention and therefore simplest to work with.
dahfizz 2021-08-18 17:55:58 +0000 UTC [ - ]