Wednesday, March 2, 2011

Q: How many read/writes does Twitter do a second?

It depends how you count. I'm no Twitter expert, but here's my understanding from a tech talk by Raffi Krikorian and colleagues in September 2010. I'd happily correct this if anybody from Twitter wants to chime in.
  • The volume of actual tweets is relatively small -- the most ever in a single second was fewer than 7,000 tweets, which is a peak load of less than 8 Mbps. Average load is about 1.6 Mbps, or 17 GB a day of tweets, and about 1,300 tweets per second. This is about 2% of the average trade frequency on Nasdaq.
  • Twitter's architecture generally fans out on write: a reference to each incoming tweet is immediately written to the "timeline" structure of each recipient. Lady Gaga has 8.5 million followers, so when she tweets to the world, that one tweet causes 8.5 million updates to recipients' timelines.
  • Twitter's real load is not from tweets, but from changes in the social graph. "The rate of operations in the social graph is actually much faster than the incoming tweets we have," they said, meaning follow and unfollow events occur more often than tweets. Their FlockDB graph store, built on MySQL, handled a peak load of about 20,000 writes per second and 100,000 reads per second as of April 2010. (Apparently a common pattern for Twitter users is to sign up, follow a bunch of people, but rarely tweet themselves.)
  • One difficult case is "directed" tweets, i.e. @-replies. Twitter shows the reply only to people who follow both the sender and the recipient, so the service has to compute the intersection of the two follower sets at post time. "A bad case for us is if Lady Gaga responds to Justin Bieber. We have to compute the intersection between 6.xx million followers and 5.1 million followers. So you know Flock is having a bad day when celebrities start tweeting each other."
  • The volume of tweets is only 17 gigabytes a day, but Twitter stores a lot more data than that -- the company tracks user behavior comprehensively, generating 12 terabytes of data a day. (I assume not all of this is stored for very long.)
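The bandwidth figures in the first bullet are easy to sanity-check. As a back-of-the-envelope exercise (the ~140 bytes per tweet is my assumption, not a Twitter figure -- 140 characters plus a little overhead):

```python
# Rough consistency check of the numbers above. BYTES_PER_TWEET is an
# assumption on my part; Twitter didn't state an average wire size.
BYTES_PER_TWEET = 140
SECONDS_PER_DAY = 86_400

avg_tps, peak_tps = 1_300, 7_000  # tweets per second, from the talk

avg_mbps = avg_tps * BYTES_PER_TWEET * 8 / 1e6    # megabits per second
peak_mbps = peak_tps * BYTES_PER_TWEET * 8 / 1e6
gb_per_day = avg_tps * BYTES_PER_TWEET * SECONDS_PER_DAY / 1e9

print(f"average: {avg_mbps:.2f} Mbps")   # ~1.5 Mbps, close to the 1.6 quoted
print(f"peak:    {peak_mbps:.2f} Mbps")  # just under 8 Mbps
print(f"per day: {gb_per_day:.1f} GB")   # in the neighborhood of 17 GB
```

The numbers line up within rounding, which suggests the quoted figures all derive from tweet counts times a ~140-150 byte average payload.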
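The fan-out-on-write pattern from the second bullet can be sketched in a few lines. This is a toy illustration, not Twitter's actual code; all names here are hypothetical, and real timelines live in a distributed store rather than an in-memory dict:

```python
from collections import defaultdict, deque

# author -> set of follower IDs
followers = defaultdict(set)
# user -> their timeline: a bounded list of recent tweet IDs
timelines = defaultdict(lambda: deque(maxlen=800))

def follow(follower, author):
    followers[author].add(follower)

def post(author, tweet_id):
    # Fan-out on write: one incoming tweet becomes one timeline
    # write per follower -- cheap reads, expensive celebrity posts.
    for user in followers[author]:
        timelines[user].appendleft(tweet_id)

follow("fan1", "ladygaga")
follow("fan2", "ladygaga")
post("ladygaga", "t1")
print(list(timelines["fan1"]))  # ['t1']
```

The trade-off is visible even in the toy: `post` costs O(followers), which is why a single Lady Gaga tweet turns into 8.5 million writes.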
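The directed-tweet rule is essentially a set intersection. A minimal sketch of the visibility computation (function and variable names are mine; Twitter computes this inside FlockDB, not in application-level Python):

```python
def reply_audience(sender_followers, recipient_followers):
    # Deliver an @-reply only to users who follow BOTH parties.
    # Fast for ordinary users, painful when two celebrities with
    # millions of followers reply to each other.
    return sender_followers & recipient_followers

gaga = {"alice", "bob", "carol", "dave"}
bieber = {"bob", "carol", "erin"}
print(sorted(reply_audience(gaga, bieber)))  # ['bob', 'carol']
```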