From c841583d86ec24ed469a97645ed5986a780df4c9 Mon Sep 17 00:00:00 2001
From: zoeyande2
Date: Mon, 2 Mar 2026 17:26:02 -0500
Subject: [PATCH] Finished checkpoint 3, hypothesis on utf8

---
 questions.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/questions.md b/questions.md
index 01290bc..11c94f6 100644
--- a/questions.md
+++ b/questions.md
@@ -116,4 +116,6 @@ I was a bit hesitant about this one, because my math brain tells me that there i
 Make a hypothesis about how this could work.
+There must be some sort of pattern to the way characters are encoded in the first place. It must have something to do with how many zeros are next to each other, like maybe there's a rule that there can't be more than 4 zeros in a row in a given byte? So when it sees more than that in a byte, it stops and decides to decode up until that point?
+I decided to look this up after I hypothesized because I was struggling to find a pattern, and the actual way it works is super cool! It uses the first few bits of the first byte to determine how many bytes long the character is, and then it reads only that many!
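
To check my understanding of the rule I looked up, here is a minimal sketch in Python. It assumes standard UTF-8, where a first byte starting with `0` means 1 byte, `110` means 2, `1110` means 3, and `11110` means 4. The names `utf8_sequence_length` and `split_characters` are ones I made up for this example; they are not from the assignment or from any library.

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Return how many bytes the character starting with first_byte occupies."""
    if (first_byte & 0b1000_0000) == 0b0000_0000:  # 0xxxxxxx: plain ASCII
        return 1
    if (first_byte & 0b1110_0000) == 0b1100_0000:  # 110xxxxx
        return 2
    if (first_byte & 0b1111_0000) == 0b1110_0000:  # 1110xxxx
        return 3
    if (first_byte & 0b1111_1000) == 0b1111_0000:  # 11110xxx
        return 4
    # 10xxxxxx is a continuation byte, so it cannot start a character.
    raise ValueError(f"not a valid UTF-8 first byte: {first_byte:#010b}")


def split_characters(data: bytes) -> list:
    """Walk the bytes, reading exactly as many as each first byte asks for."""
    chars = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chars.append(data[i:i + n].decode("utf-8"))
        i += n
    return chars


encoded = "aé€😀".encode("utf-8")
lengths = [utf8_sequence_length(ch.encode("utf-8")[0]) for ch in "aé€😀"]
print(lengths)
print(split_characters(encoded))
```

The sketch only inspects the first byte of each character. It does not check that the following bytes start with `10`, which a real decoder would also verify.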