lab_encoding/questions.md
mdecker62 f84f6b13b3 d
2026-03-03 14:42:56 -05:00

# Boolean questions
Create the following variables.
```
a = Bits("11110000")
b = Bits("10101010")
```
For each of the following bytes, give an equivalent
expression which uses only `a`, `b`, and bit operators.
The answers to the first two questions are given.
1. 01010101
~b
2. 00000101
~a & ~b
3. 00000001
~a & ~b & ~(b << 1)
4. 10000000
a & b & (a << 2)
5. 01010000
a & ~b
6. 00001010
~a & b
7. 01010000
a & ~b
8. 10101011
~(b << 1)
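
One way to double-check answers like these without the `Bits` class is to evaluate each expression on plain Python integers and keep only the low 8 bits. This is only a sketch: `MASK` and `byte` are names made up for this example, and it assumes that masking to 8 bits matches how `Bits` treats a byte. Python's `~` on an int gives a negative number, which is why the mask is needed.

```python
MASK = 0xFF  # hypothetical helper: keeps only the low 8 bits

def byte(x):
    """Format an integer as an 8-bit binary string."""
    return format(x & MASK, "08b")

a = 0b11110000
b = 0b10101010

print(byte(~b))       # 01010101
print(byte(~a & ~b))  # 00000101
print(byte(a & ~b))   # 01010000
print(byte(~a & b))   # 00001010
print(byte(b << 1))   # 01010100 (the top bit falls off after masking)
```

Any candidate expression can be passed to `byte` in the same way and compared against the target byte.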
## Integer questions
These questions are difficult! Try exploring ideas with `Bits`
in Terminal, with paper and pencil, and on a whiteboard. And definitely talk with others.
9. If `a` represents a positive integer, and `one = Bits(1, length=len(a))`, give an expression equivalent to `-a`, but which does not use negation.
Answer: To find `-a` without using the negative sign, flip all the bits of `a` and then add 1. Flipping the bits means turning every 1 into a 0 and every 0 into a 1, which is what `~a` does. Then add `one`, which is just the number 1 written with the same number of bits as `a`. So the expression `~a + one` gives the same result as `-a`. This works because negative numbers are stored in two's complement: `a + ~a` is always all 1s, which is the pattern for -1, so `~a` equals `-a - 1` and adding 1 gives `-a`.
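
The flip-and-add-one rule can be tried out on plain Python integers. This is a sketch that assumes an 8-bit width; `BITS`, `MASK`, and `to_signed` are names invented for the example and are not part of `Bits`.

```python
BITS = 8
MASK = (1 << BITS) - 1  # 0b11111111

def to_signed(x):
    """Interpret the low 8 bits of x as a two's complement integer."""
    x &= MASK
    return x - (1 << BITS) if x & (1 << (BITS - 1)) else x

a = 5
one = 1
negated = (~a + one) & MASK  # flip the bits, add one, keep 8 bits

print(format(a, "08b"))        # 00000101
print(format(negated, "08b"))  # 11111011
print(to_signed(negated))      # -5
```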
10. It is extremely easy to double a binary number: just shift all the bits to the left. (`a << 1` is twice `a`.) Explain why this trick works.
Answer: Shifting a binary number to the left doubles it because of how place value works in base 2. In binary, each position represents a power of 2, just like each position in a decimal number represents a power of 10. When you shift all the bits one place to the left, every 1 moves to the next higher power of 2, so each part of the number becomes twice as large. That is why `a << 1` multiplies the number by 2, as long as the result still fits in the available bits.
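
A small worked example of the place-value argument, using plain Python integers instead of `Bits` (the variable names are made up for illustration):

```python
a = 0b00010110    # 16 + 4 + 2 = 22
doubled = a << 1  # every bit moves one place to the left

print(format(a, "08b"), a)              # 00010110 22
print(format(doubled, "08b"), doubled)  # 00101100 44

# Each set bit moves from weight 2**k to weight 2**(k + 1).
weights_before = [2**k for k in range(8) if (a >> k) & 1]
weights_after = [2**k for k in range(8) if (doubled >> k) & 1]
print(weights_before)  # [2, 4, 16]
print(weights_after)   # [4, 8, 32]
```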
11. Consider the following:
```
>>> hundred = Bits(100, 8)
>>> hundred
01100100
>>> (hundred + hundred)
11001000
>>> (hundred + hundred).int
-56
```
Apparently 100 + 100 = -56. What's going on here?
Answer: This happens because we're working with 8-bit signed numbers, which have a limited range. In 8 bits using two's complement, the largest positive number you can store is 127. When you add 100 + 100, you get 200, but 200 is too big to fit in that range. Since there isn't enough space to store it, the number “wraps around” and the leftmost bit becomes 1, which means the number is now interpreted as negative. The result 11001000 is what 200 looks like as an unsigned 8-bit number, but in two's complement that pattern represents 200 - 256 = -56. So nothing is mathematically wrong. It's just overflow happening because the number is too large to fit in one signed byte.
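
The wraparound can be reproduced with plain Python integers. This is a sketch; `to_signed8` is a hypothetical helper that reads the low 8 bits as a two's complement value.

```python
def to_signed8(x):
    """Interpret the low 8 bits of x as a two's complement integer."""
    x &= 0xFF
    return x - 256 if x >= 128 else x

total = 100 + 100

print(total)                         # 200
print(format(total & 0xFF, "08b"))   # 11001000
print(to_signed8(total))             # -56
```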
12. What is the bit representation of negative zero? Explain your answer.
Answer: In two's complement, there is no separate bit pattern for negative zero. Zero is represented as all 0s (for example, 00000000 in one byte), and that is the only way zero is stored. If you try to compute negative zero using the two's complement method (flip all the bits of zero and add 1), flipping gives 11111111, and adding 1 carries out of the top bit and leaves 00000000 again. Because of how two's complement is designed, positive zero and negative zero are the same, so there is only one representation of zero.
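
The flip-and-add-one steps for zero, sketched with plain Python integers and an assumed 8-bit width (`MASK` is a made-up name):

```python
MASK = 0xFF  # keep only the low 8 bits

zero = 0
flipped = ~zero & MASK      # all eight bits set
plus_one = flipped + 1      # carries into a ninth bit
neg_zero = plus_one & MASK  # the carry is discarded

print(format(flipped, "08b"))   # 11111111
print(format(plus_one, "b"))    # 100000000
print(format(neg_zero, "08b"))  # 00000000
```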
13. What's the largest integer that can be represented in a single byte?
Explain your reasoning.
Answer: The largest integer that can be represented in a single byte (8 bits) is 127, assuming the byte is read as a signed two's complement number. The leading bit acts as the sign bit, so the largest positive value has that bit set to 0 and the remaining seven bits set to 1: 01111111. That is 64 + 32 + 16 + 8 + 4 + 2 + 1 = 127 in decimal. If the first bit were 1, the number would be negative, so 127 is the largest value you can store in one signed byte. (Read as an unsigned number, a byte could go up to 11111111 = 255.)
14. What's the smallest integer that can be represented in a single byte?
Explain your reasoning.
Answer: The smallest integer that can be represented in a single byte (8 bits) is -128. In signed 8-bit numbers using two's complement, the first bit is the sign bit, and it has a weight of -128 instead of +128. When that first bit is 1 and all the other bits are 0, the pattern is 10000000, which represents -128. Setting any other bit would only add a positive amount. The negative side goes one number farther than the positive side because zero uses up one of the 128 patterns that start with 0, which is why the range is from -128 to 127.
15. What's the largest integer that can be represented in `n` bits?
Explain your reasoning.
Answer: The largest integer that can be represented in n bits (using signed two's complement) is 2ⁿ⁻¹ − 1. This is because one of the bits is used for the sign, leaving n − 1 bits to store the positive value. The largest positive number happens when the first bit (the sign bit) is 0 and all the remaining bits are 1, and n − 1 ones add up to 2ⁿ⁻¹ − 1. For n = 8 this gives 2⁷ − 1 = 127.
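
The signed range for `n` bits, -2^(n-1) through 2^(n-1) - 1, can be tabulated with a short sketch. The function name `signed_range` is made up for this example, and it assumes two's complement.

```python
def signed_range(n):
    """Return (smallest, largest) two's complement integers for n bits."""
    return -(2 ** (n - 1)), 2 ** (n - 1) - 1

for n in (4, 8):
    smallest, largest = signed_range(n)
    # The largest value is a 0 sign bit followed by n - 1 ones.
    print(n, smallest, largest, format(largest, f"0{n}b"))
# 4 -8 7 0111
# 8 -128 127 01111111
```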
## Text questions
16. Look at the bits for a few different characters using the `utf8` encoding.
You will notice they have different bit lengths:
```
>>> Bits('a', encoding='utf8')
01100001
>>> Bits('ñ', encoding='utf8')
1100001110110001
>>> Bits('♣', encoding='utf8')
111000101001100110100011
>>> Bits('😍', encoding='utf8')
11110000100111111001100010001101
```
When it's time to decode a sequence of utf8-encoded bits, the decoder somehow needs to decide when it has read enough bits to decode a character, and when it needs to keep reading. For example, the decoder will produce
'a' after reading 8 bits but after reading the first 8 bits of 'ñ', the decoder realizes it needs to read 8 more bits.
Make a hypothesis about how this could work.
Answer: My hypothesis is that UTF-8 uses the first few bits of the first byte to signal how many bytes the character will take. If the first byte starts with a 0, that tells the decoder it's a single-byte character (like regular English letters), so it can stop after 8 bits. But if the byte starts with multiple 1s followed by a 0, that pattern tells the decoder it needs to keep reading more bytes. For example, a starting pattern like 110 means the character uses two bytes, 1110 means three bytes, and 11110 means four bytes. In the examples above, every additional byte starts with 10, which makes it clear it is part of the same character and not the start of a new one. So the decoder doesn't guess. It looks at the beginning of the first byte to know exactly how many bytes to read.
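
The hypothesis can be tested against Python's built-in UTF-8 encoder. This is a sketch: `utf8_length` is a made-up function that only looks at the leading bits of the first byte, and it is not how any particular decoder is actually implemented.

```python
def utf8_length(first_byte):
    """Guess how many bytes a UTF-8 character uses from its first byte."""
    if first_byte >> 7 == 0b0:      # 0xxxxxxx
        return 1
    if first_byte >> 5 == 0b110:    # 110xxxxx
        return 2
    if first_byte >> 4 == 0b1110:   # 1110xxxx
        return 3
    if first_byte >> 3 == 0b11110:  # 11110xxx
        return 4
    raise ValueError("not a valid leading byte")

for ch in "añ♣😍":
    encoded = ch.encode("utf8")
    # Compare the guess with the real encoded length.
    print(ch, format(encoded[0], "08b"), utf8_length(encoded[0]), len(encoded))
    # Every byte after the first should start with the bits 10.
    assert all(byte >> 6 == 0b10 for byte in encoded[1:])
```

For all four sample characters the guessed length matches the real number of encoded bytes.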