blog.ratterobert.com

falsifian (www.falsifian.org)

James Cook. Time-space trader and software hipster.

falsifian (www.falsifian.org)

@movq Wow, I use Firefox and didn't realize this existed! Thanks for pointing it out. I noticed at least one bug cited a webcompat.com report; I wonder if someone at Mozilla monitors those. https://webcompat.com/issues?page=1&per_page=50&state=open&stage=all&sort=created&direction=desc

In reply to: #65xtmrq 6 months ago
falsifian (www.falsifian.org)

@prologic Here's mine. A pile of three laptops, a switch and a fibre modem connected by ethernet cables, sitting on a table in an unfinished basement.

In reply to: #jb5qaka 7 months ago
falsifian (www.falsifian.org)

@lyse I am a big fan of "obvious" math facts that turn out to be wrong. If you want to understand how reusing space actually works, you are mostly stuck reading complexity theory papers right now. Ian wrote a good survey: https://iuuk.mff.cuni.cz/~iwmertz/papers/m23.reusing_space.pdf . It's written for complexity theorists, but some of it will make sense to programmers comfortable with math. Alternatively, I wrote an essay a few years ago explaining one technique, with (math-loving) programmers as the intended audience: https://www.falsifian.org/blog/2021/06/04/catalytic/ .

In reply to: #jd3p3nq 7 months ago
falsifian (www.falsifian.org)

@lyse Still melting! An irregular chunk of ice at the bottom of a metal sink.

In reply to: #3vtnszq 7 months ago
falsifian (www.falsifian.org)

@sorenpeter Sorry, I realized that shortly after posting. Here's another attempt to post the images: A large, irregular block of ice in a kitchen sink. It is the top of a group of icicles. Another view of the large, irregular block of ice in a kitchen sink. An open door revealing snow packed neatly against where the door rests when closed.

In reply to: #3vtnszq 7 months ago
falsifian (www.falsifian.org)

@kh1b Welcome to twtxt!

In reply to: #q7fdwia 9 months ago
falsifian (www.falsifian.org)

@bender I try to avoid editing. I guess I would write 5/4, 6/4, etc, and hopefully my audience would be sympathetic to my failing.

Anyway, I don't think my eccentric decision to number my twts in the style of other social media platforms is the only context where someone might write 1/4 not meaning a quarter. E.g. January 4, to Americans.

I'm happy to keep overthinking this for as long as you are :-P

In reply to: #xmq2anq 11 months ago
falsifian (www.falsifian.org)

@bender @prologic I'm not exactly asking yarnd to change. If you are okay with the way it displayed my twts, then by all means, leave it as is. I hope you won't mind if I continue to write things like 1/4 to mean "first out of four".

What has text/markdown got to do with this? I don't think Markdown says anything about replacing 1/4 with ¼, or other similar transformations. It's not needed, because ¼ is already a unicode character that can simply be directly inserted into the text file.

What's wrong with my original suggestion of doing the transformation before the text hits the twtxt.txt file? @prologic, I think it would achieve what you are trying to achieve with this content-type thing: if someone writes 1/4 on a yarnd instance or any other client that wants to do this, it would get transformed, and other clients simply wouldn't do the transformation. Every client that supports displaying unicode characters, including Jenny, would then display ¼ as ¼.

Alternatively, if you prefer yarnd to pretty-print all twts nicely, even ones from simpler clients, that's fine too and you don't need to change anything. My 1/4 -> ¼ thing is nothing more than a minor irritation which probably isn't worth overthinking.
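A rough sketch of the compose-time idea from this twt: the client that accepts the user's input rewrites 1/4 to ¼ before the line is appended to twtxt.txt, so readers on any client see exactly what was stored. The function name and the set of fractions are made up for illustration; this is not yarnd's code.

```python
import re

# Hypothetical mapping; a real client might support more fractions.
VULGAR = {"1/4": "¼", "1/2": "½", "3/4": "¾"}

# Only match a fraction that stands alone, so "11/4" and "1/4/2024"
# are left untouched.
_FRACTION = re.compile(r"(?<![\d/])(1/4|1/2|3/4)(?![\d/])")


def prettify_on_compose(text: str) -> str:
    """Replace plain fractions with single Unicode characters.

    Meant to run once, in the composing client, before the twt is
    written to the feed file. Reading clients do nothing special.
    """
    return _FRACTION.sub(lambda m: VULGAR[m.group(1)], text)
```

A client that skips this step just publishes the text as typed, so a hand-written "1/4" meaning "first out of four" stays as it is.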

In reply to: #gctrz4q 11 months ago
falsifian (www.falsifian.org)

@prologic I'm not a yarnd user, so it doesn't matter a whole lot to me, but FWIW I'm not especially keen on changing how I format my twts to work around yarnd's quirks.

I wonder if this kind of postprocessing would fit better between composing (via yarnd's UI) and publishing. So, if a yarnd user types 1/4, it could get changed to ¼ in the twtxt.txt file for everyone to see, not just people reading through yarnd. But when I type 1/4, meaning first out of four, as a non-yarnd user, the meaning wouldn't get corrupted. I can always type ¼ directly if that's what I really intend.

(This twt might be easier to understand if you read it without any transformations :-P)

Anyway, again, I'm not a yarnd user, so do what you will, just know you might not be seeing exactly what I meant.

In reply to: #ovlagaa 11 months ago
falsifian (www.falsifian.org)

@prologic I wrote 1/4 (one slash four) by which I meant "the first out of four". twtxt.net is showing it as ¼, a single character that IMO doesn't have that same meaning (it means 0.25). Similarly, 3/4 got replaced with ¾ in another twt. It's not a big deal. It just looks a little wrong, especially beside the 2/4 and 4/4 in my other two twts.

In reply to: #slyb5qq 11 months ago
falsifian (www.falsifian.org)

@bender It's the experience of an ordinary person in a strange place where memories are disappearing with the help of the Memory Police. The setting feels contemporary (to the book's 1994 publication date) rather than futuristic, except for some unexplained stuff about memories.

In reply to: #t5sq5rq 1 year ago
falsifian (www.falsifian.org)

@prologic Thanks for pointing out it lasts four hours. That's a big window! I wonder when most people will be on. I might aim for halfway through unless I hear otherwise. (12:00Z is a bit early for me.)

In reply to: #yjv73uq 1 year ago
falsifian (www.falsifian.org)

@movq Yes, the tools are surprisingly fast. Still, magrep takes about 20 seconds to search through my archive of 140K emails, so to speed things up I would probably combine it with an indexer like mu, mairix or notmuch.

In reply to: #hpxs42q 1 year ago
falsifian (www.falsifian.org)

@prologic Thanks for writing that up!

I hope it can remain a living document (or sequence of draft revisions) for a good long time while we figure out how this stuff works in practice.

I am not sure how I feel about all this being done at once, vs. letting conventions arise.

For example, even today I could reply to twt abc1234 with "(#abc1234) Edit: ..." and I think all you humans would understand it as an edit to (#abc1234). Maybe eventually it would become a common enough convention that clients would start to support it explicitly.

Similarly we could just start using 11-digit hashes. We should iron out whether it's sha256 or whatever but there's no need to get all the other stuff right at the same time.

I have similar thoughts about how some users could try out location-based replies in a backward-compatible way (append the replyto: stuff after the legacy (#hash) style).

However I recognize that I'm not the one implementing this stuff, and it's less work to just have everything determined up front.

Misc comments (I haven't read the whole thing):

  • Did you mean to make hashes hexadecimal? You lose 11 bits that way compared to base32. I'd suggest gaining 11 bits with base64 instead.

  • "Clients MUST preserve the original hash" --- do you mean they MUST preserve the original twt?

  • Thanks for phrasing the bit about deletions so neutrally.

  • I don't like the MUST in "Clients MUST follow the chain of reply-to references...". If someone writes a client as a 40-line shell script that requires the user to piece together the threading themselves, IMO we shouldn't declare the client non-conforming just because they didn't get to all the bells and whistles.

  • Similarly I don't like the MUST for user agents. For one thing, you might want to fetch a feed without revealing your identity. Also, it raises the bar for a minimal implementation (I'm again thinking of the 40-line shell script).

  • For "who follows" lists: why must the long, random tokens be only valid for a limited time? Do you have a scenario in mind where they could leak?

  • Why can't feeds be served over HTTP/1.0? Again, thinking about simple software. I recently tried implementing HTTP/1.1 and it wasn't too bad, but 1.0 would have been slightly simpler.

  • Why get into the nitty-gritty about caching headers? This seems like generic advice for HTTP servers and clients.

  • I'm a little sad about other protocols being not recommended.

  • I don't know how I feel about including markdown. I don't mind too much that yarn users emit twts full of markdown, but I'm more of a plain text kind of person. Also it adds to the length. I wonder if putting it in a separate document would make more sense; that would also help with the length.
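The bit counts in the first bullet above (hexadecimal vs. base32 vs. base64) can be checked with a little arithmetic. This is just a sketch assuming an 11-character hash, as mentioned earlier in this twt; the helper name is made up.

```python
def bits_for(chars: int, alphabet_size: int) -> int:
    """Bits carried by `chars` characters over a power-of-two alphabet."""
    bits_per_char = alphabet_size.bit_length() - 1  # 16 -> 4, 32 -> 5, 64 -> 6
    return chars * bits_per_char


HASH_LEN = 11  # assumed hash length in characters

hex_bits = bits_for(HASH_LEN, 16)     # 44
base32_bits = bits_for(HASH_LEN, 32)  # 55
base64_bits = bits_for(HASH_LEN, 64)  # 66
```

So going from base32 to hexadecimal drops 11 bits at the same length, and going to base64 adds 11.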

In reply to: #zqpkfla 1 year ago
falsifian (www.falsifian.org)

@prologic Wikipedia claims sha1 is vulnerable to a "chosen-prefix attack", which I gather means I can write any two twts I like, and then cause them to have the exact same sha1 hash by appending something. I guess a twt ending in random junk might look suspicious, but perhaps the junk could be worked into an image URL like screenshot. If that's not possible now maybe it will be later.

git only uses sha1 because they're stuck with it: migrating is very hard. There was an effort to move git to sha256 but I don't know its status. I think there is progress being made with Game Of Trees, a git clone that uses the same on-disk format.

I can't imagine any benefit to using sha1, except that maybe some very old software might support sha1 but not sha256.

In reply to: #vqgs4zq 1 year ago
falsifian (www.falsifian.org)

@movq Agreed that hashes have a benefit. I came up with a similar example when I twted about an 11-character hash collision. Perhaps hashes could be made optional somehow. Like, you could use the "replyto" idea and then additionally put a hash somewhere if you want to lock in which version of the twt you are replying to.

In reply to: #ce4g4qa 1 year ago
falsifian (www.falsifian.org)

@prologic Why sha1 in particular? There are known attacks on it. sha256 seems pretty widely supported if you're worried about support.

In reply to: #vqgs4zq 1 year ago
falsifian (www.falsifian.org)

@prologic

There's a simple reason all the current hashes end in a or q: the hash is 256 bits, the base32 encoding chops that into groups of 5 bits, and 256 isn't divisible by 5. The last character of the base32 encoding just has that left-over single bit (256 mod 5 = 1).

So I agree with #3 below, but do you have a source for #1, #2 or #4? I would expect any lack of variability in any part of a hash function's output would make it more vulnerable to attacks, so designers of hash functions would want to make the whole output vary as much as possible.

Other than the divisible-by-5 thing, my current intuition is it doesn't matter what part you take.
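The divisible-by-5 thing is easy to see in code. This is a small sketch: SHA-256 stands in here for "some 256-bit hash" purely as an assumption, and the helper name is made up. The last base32 character holds one real bit plus four zero bits of fill, so its value is either 0 or 16, which base32 spells as a or q.

```python
import base64
import hashlib


def base32_digest(data: bytes) -> str:
    """Lowercase, unpadded base32 of a 256-bit digest (SHA-256 assumed)."""
    digest = hashlib.sha256(data).digest()  # 32 bytes = 256 bits
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


# Hash a batch of arbitrary inputs and collect the final characters.
samples = [f"twt number {i}".encode() for i in range(200)]
endings = {base32_digest(s)[-1] for s in samples}
```

Every encoded digest is 52 characters long (51 full groups of 5 bits plus the 1 left-over bit), and `endings` never contains anything other than a and q.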

  1. Hash Structure: Hashes are typically designed so that their outputs have specific statistical properties. The first few characters often have more entropy or variability, meaning they are less likely to have patterns. The last characters may not maintain this randomness, especially if the encoding method has a tendency to produce less varied endings.

  2. Collision Resistance: When using hashes, the goal is to minimize the risk of collisions (different inputs producing the same output). By using the first few characters, you leverage the full distribution of the hash. The last characters may not distribute in the same way, potentially increasing the likelihood of collisions.

  3. Encoding Characteristics: Base32 encoding has a specific structure and padding that might influence the last characters more than the first. If the data being hashed is similar, the last characters may be more similar across different hashes.

  4. Use Cases: In many applications (like generating unique identifiers), the beginning of the hash is often the most informative and varied. Relying on the end might reduce the uniqueness of generated identifiers, especially if a prefix has a specific context or meaning.

In reply to: #zaazoeq 1 year ago
falsifian (www.falsifian.org)

@quark It looks like the part about traditional topics has been removed from that page. Here is an old version that mentions it: https://web.archive.org/web/20221211165458/https://dev.twtxt.net/doc/twtsubjectextension.html . Still, I don't see any description of what is actually allowed between the parentheses. May be worth noting that twtxt.net is displaying the twts with the subject stripped, so some piece of code is recognizing it as a subject (or, at least, something to be removed).

In reply to: #sbg7p7a 1 year ago
falsifian (www.falsifian.org)

It should be fixed now. Just needed some unusual quoting in my httpd.conf: https://mail-archive.com/misc@openbsd.org/msg169795.html

In reply to: #y2t2tnq 1 year ago
falsifian (www.falsifian.org)

@movq

Maybe I’m being a bit too purist/minimalistic here. As I said before (in one of the 1372739 posts on this topic – or maybe I didn’t even send that twt, I don’t remember 😅), I never really liked hashes to begin with. They aren’t super hard to implement but they are kind of against the beauty of the original twtxt – because you need special client support for them. It’s not something that you could write manually in your twtxt.txt file. With @sorenpeter’s proposal, though, that would be possible.

Tangentially related, I was a bit disappointed to learn that the twt subject extension is now never used except with hashes. Manually-written subjects sounded so beautifully ad-hoc and organic as a way to disambiguate replies. Maybe I'll try it some time just for fun.

In reply to: #uscpzpq 1 year ago
falsifian (www.falsifian.org)

@movq @prologic Another option would be: when you edit a twt, prefix the new one with (#[old hash]) and some indication that it's an edited version of the original tweet with that hash. E.g. if the hash used to be abcd123, the new version should start "(#abcd123) (redit)".

What I like about this is that clients that don't know this convention will still stick it in the same thread. And I feel it's in the spirit of the old pre-hash (subject) convention, though that's before my time.

I guess it may not work when the edited twt itself is a reply, and there are replies to it. Maybe that could be solved by letting twts have more than one (subject) prefix.
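To make the "more than one (subject) prefix" idea concrete, here is a minimal sketch of how a client might peel any number of leading (#hash) prefixes off a twt. The function name and the assumption that hashes are lowercase letters and digits are mine, not part of any spec.

```python
import re

# Assumed shape of a subject prefix: "(#" + lowercase letters/digits + ")".
_SUBJECT = re.compile(r"\(#([a-z0-9]+)\)\s*")


def split_subjects(text: str) -> tuple[list[str], str]:
    """Return (hashes from leading "(#hash)" prefixes, remaining text)."""
    hashes = []
    pos = 0
    while True:
        m = _SUBJECT.match(text, pos)
        if m is None:
            break
        hashes.append(m.group(1))
        pos = m.end()
    return hashes, text[pos:]
```

A client that only understands one prefix would still thread the twt under the first hash and show the rest as ordinary text.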

But the great thing about the current system is that nobody can spoof message IDs.

I don't think twtxt hashes are long enough to prevent spoofing.

In reply to: #bawn2ca 1 year ago
falsifian (www.falsifian.org)

@prologic The headline is interesting and sent me down a rabbit hole understanding what the paper (https://aclanthology.org/2024.acl-long.279/) actually says.

The result is interesting, but the Neuroscience News headline greatly overstates it. If I've understood right, they are arguing (with strong evidence) that the simple technique of making neural nets bigger and bigger isn't quite as magically effective as people say --- if you use it on its own. In particular, they evaluate LLMs without two common enhancements, in-context learning and instruction tuning. Both of those involve using a small number of examples of the particular task to improve the model's performance, and they turn them off because they are not part of what is called "emergence": "an ability to solve a task which is absent in smaller models, but present in LLMs".

They show that these restricted LLMs only outperform smaller models (i.e. demonstrate emergence) on certain tasks, and then (end of Section 4.1) discuss the nature of those few tasks that showed emergence.

I'd love to hear more from someone more familiar with this stuff. (I've done research that touches on ML, but neural nets and especially LLMs aren't my area at all.) In particular, how compelling is this finding that zero-shot learning (i.e. without in-context learning or instruction tuning) remains hard as model size grows?

In reply to: #b242aea 1 year ago
falsifian (www.falsifian.org)

@prologic Thanks. It's from a non-Euclidean geometry project: https://www.falsifian.org/blog/2022/01/17/s3d/

In reply to: #tm726iq 1 year ago