Friday, October 5, 2012

Q: How many unique Tweets can ever be posted?

At least 2^{4200} (a number with 1,265 decimal digits), using the contents of the tweet alone. Plus there is all the metadata, which probably adds at least a thousand bits.
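As a quick sanity check on the size of that bound, here is a minimal sketch that counts the decimal digits of 2^{4200} and works out how many bits per character the exponent implies. The 140-character limit is an assumption (it is not stated here, though it was Twitter's limit at the time), and the variable names are only illustrative.

```python
# Sanity check on the 2^4200 lower bound.
# Assumption: tweets are limited to 140 characters (not stated in the post).
TWEET_LENGTH = 140
BOUND_BITS = 4200  # exponent quoted in the post

bound = 2 ** BOUND_BITS

# Python integers are arbitrary precision, so the digit count is exact.
decimal_digits = len(str(bound))

# Bits of freedom per character implied by the bound.
bits_per_char = BOUND_BITS / TWEET_LENGTH

print(f"2^{BOUND_BITS} has {decimal_digits} decimal digits")
print(f"implied bits per character: {bits_per_char}")
```

Under that assumption the bound corresponds to a round number of bits per character, which is comfortably below the "almost 31 bits" figure and so leaves room for the values Twitter rejects.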

See https://blogs.oracle.com/ksplice... . It turns out that Twitter allows almost 2^31 choices per "character," at least when a tweet is first posted. (They decay over time...)

Unicode itself is a 20.1-bit system, but Twitter doesn't allow literally all Unicode scalar values. (E.g. it messes with < and >.) On the other hand, Twitter does allow huge values beyond the Unicode range, that is to say above U+10FFFF (just over 2^20) but below 2^31, which are not Unicode at all. (This is almost 31 bits anyway.)
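The "20.1-bit" figure can be reproduced from the size of the Unicode code space. The sketch below is only a back-of-the-envelope check: it counts Unicode scalar values (code points U+0000 through U+10FFFF minus the surrogate range) and compares the per-character bit count with the 31-bit ceiling. The 140-character limit is again an assumption, and the sketch ignores the handful of values Twitter rejects.

```python
import math

# Unicode code points run from U+0000 to U+10FFFF.
UNICODE_CODE_POINTS = 0x110000
# Surrogates (U+D800..U+DFFF) are code points but not scalar values.
SURROGATES = 0xE000 - 0xD800
UNICODE_SCALAR_VALUES = UNICODE_CODE_POINTS - SURROGATES

# Bits of information carried by one freely chosen Unicode scalar value.
bits_unicode = math.log2(UNICODE_SCALAR_VALUES)

# Upper limit mentioned in the post: values below 2^31.
bits_extended = 31

# Assumption: 140 characters per tweet (not stated in the post).
TWEET_LENGTH = 140

print(f"Unicode scalar values: {UNICODE_SCALAR_VALUES:,}")
print(f"bits per Unicode character: {bits_unicode:.2f}")
print(f"tweet bits, Unicode only: {TWEET_LENGTH * bits_unicode:.0f}")
print(f"tweet bits, 31-bit values: {TWEET_LENGTH * bits_extended}")
```

The gap between the two totals shows how much of the estimate depends on Twitter accepting values outside the Unicode range.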

Disclaimer: I have not checked this myself since writing that blog post in March 2010.
