Sunday, October 16, 2011

Q: How does an investigative reporter get started?

Many investigative reporters got where they are by starting as beat reporters. This is a good way to get story ideas and familiarity with an area.

On "day one," you have an assigned "beat" -- an area of coverage. Maybe it's city government, or the courthouse, or local business. At the WSJ we had reporters covering finance, medicine and healthcare, technology, education, mergers and acquisitions, automobiles, energy, etc. We had a database where most Fortune 500 companies were assigned to exactly one reporter as the responsible person for any news involving that company.

We subscribed to the newswires (AP, Reuters, Dow Jones) and filtered for any news involving our companies and beats. We read the SEC filings.

On "day one," I called my companies and introduced myself as the new WSJ reporter covering them. They would invite me to their HQ to meet the CEO, one or two other executives, and the PR apparatus. Sometimes they would complain in a friendly way about previous WSJ coverage or about too-favorable coverage of their competitor. Sometimes they would invite me to visit their factory or meet their customers. They would encourage me to attend their industry conferences or trade shows.

Some companies will go all-out. When UPS gets a new WSJ reporter covering them, they order the person a brown uniform in their size and have them ride in a delivery truck for a day so the reporter gets a sense of how the company works.

Saturday, July 2, 2011

Q: What are the most iconic images from astrophysics?

Orbital decay of a binary pulsar compared with the prediction of general relativity. This won the Nobel Prize in physics in 1993.

Power spectral density of the cosmic microwave background radiation, showing perfect correspondence with a black body. The measurement was taken from the first nine minutes of data from COBE, launched in 1989.

Maps of the cosmic microwave background radiation (with and without the dipole from our own motion), as measured by COBE in its first four years. The above two graphs won the Nobel Prize in physics in 2006.

Angular power spectrum of the magnitude of the cosmic microwave background radiation, here as measured by WMAP, showing the "third peak" and giving clues to what went on in the early universe.

Saturday, June 18, 2011

Q: Is there any technical data available to support claims of “best HD picture quality” when choosing a satellite, cable or IPTV service provider?

I have done a real-time comparison between the HDTV signals transmitted by local broadcasters in the Boston area (NBC, ABC, CBS) over broadcast ATSC and the corresponding signals transmitted over Comcast service in QAM digital cable.

The two streams were bit-for-bit identical.

I'm sure that for at least some, and maybe most, "cable-only" channels, the different providers do vary in signal quality and in how much they "re-compress" the signal.

But I can tell you that for the over-the-air channels, at least in Boston on Comcast, there is no difference between getting the signal over the air and getting it from cable. The two streams were exactly the same.[*]

A rigorous comparison of "cable-only" channels would be more difficult because it's hard to get these in unencrypted form from any provider. Just counting the bitrate (if available on a set-top box) is probably a reasonable proxy, but keep in mind that some providers (like DirecTV) use H.264 (aka MPEG-4 part 10) for at least some channels. That's much more efficient than the MPEG-2 part 2 used by digital cable and broadcast HDTV channels.

[*] The MPEG-2 video elementary stream was bit-for-bit identical. I didn't examine the audio but suspect it was the same. The systems-layer framing and other streams were different, if only because Comcast's channels are at 38.8 Mbps and broadcast ATSC is 19.4 Mbps and because the program IDs were rearranged.
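For readers who want to try a comparison like this, here is a minimal sketch of one way it could be done. It is not necessarily the tooling used for the test above. It assumes you already have two unencrypted transport stream captures and know the video program ID (PID) in each; the function names are hypothetical. It strips the 188-byte transport packet framing and the PES headers so that only the video elementary stream bytes remain. It does not handle resynchronization or a PES header that straddles two transport packets.

```python
TS_PACKET_SIZE = 188   # fixed MPEG transport stream packet length
SYNC_BYTE = 0x47       # first byte of every transport packet


def iter_payloads(ts_bytes, pid):
    """Yield (unit_start, payload) for each transport packet carrying `pid`."""
    for off in range(0, len(ts_bytes) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = ts_bytes[off:off + TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            continue  # simplification: a real tool would resynchronize here
        packet_pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        if packet_pid != pid:
            continue
        adaptation_control = (pkt[3] >> 4) & 0x3
        if not adaptation_control & 0x1:
            continue  # this packet carries no payload
        start = 4
        if adaptation_control & 0x2:
            start += 1 + pkt[4]  # skip the adaptation field (stuffing, PCR, ...)
        yield bool(pkt[1] & 0x40), pkt[start:]


def video_elementary_stream(ts_bytes, pid):
    """Concatenate the elementary stream bytes carried on one video PID."""
    es = bytearray()
    for unit_start, payload in iter_payloads(ts_bytes, pid):
        if unit_start:
            # A new PES packet begins here: 6-byte prefix, 2 flag bytes,
            # 1 header-length byte, then optional header data (timestamps).
            if len(payload) < 9 or payload[:3] != b"\x00\x00\x01":
                continue
            payload = payload[9 + payload[8]:]
        es += payload
    return bytes(es)
```

Two captures will rarely start on the same frame, so in practice you would still need to align the two byte strings (for example, by searching for a chunk of one inside the other) before checking that the overlapping region is identical.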

Tuesday, May 31, 2011

Q: What was the first digital newspaper in the world?

This is a provocative question, and it really depends on what you mean by a digital newspaper.

Arguably any written language formed out of a well-understood dictionary of symbols (like letters) could constitute a digital newspaper, since the signal distinguishing feature of "digital" technology is that a static discipline ensures that marginal inputs can become perfect outputs. This makes perfect copies possible and allowed scribes to preserve our culture for thousands of years, longer than the life of any physical medium.

Under this definition, the earliest handwritten gazettes, or even the texts of ancient civilizations could count.

But a major leap happened when perfect copies became not just possible, but easy and capable of mass production -- so maybe we should focus on the first typeset newspapers.

Yet another leap happened when news began to be distributed not just in mass-produced digital form (the typeset broadsheet) that could be "easily" copied, but in electronic form that could be "instantly" copied.

Here we would be talking about the rise of telegraphy, which allowed the first stock tickers and the Associated Press and other newswires.

In the 60s and 70s, newspapers began to distribute national and international editions, sending their pages by satellite to printing presses across the country or in Europe. A battle ensued between the Wall Street Journal and the New York Times over the appropriate "digital" way to distribute their copy. The New York Times "digitized" its pages by scanning them and preparing a facsimile raster image, transmitting that pixel by pixel, and having it lithographed at printing presses across the country. The Wall Street Journal digitized its pages by coding the contents of each article letter-by-letter and sending that, along with a layout, to typesetters at each printing press who would have to re-typeset the pages before lithographing them. In John Hess's book "My Times," he describes how the NYT facsimile approach proved impractical with the technology of the day and was soundly beaten by the more data-frugal WSJ approach.

The 70s and 80s also saw the rise of electronic distribution to businesses and the public, with newspapers transmitting news copy to databases like Nexis and posting articles on online services like Compuserve and AOL. The 90s obviously saw extraordinary growth in the electronic distribution of news, as newspapers began posting their entire editions on the Internet.

I think if I had to pick one development along this timeline that signaled the birth of "digital" newspapers, it would probably be the invention of typesetting and the printing revolution, which made the mass-produced newspaper possible. It looks like that means I have to pick Carolus's "Relation," whose first edition was published in 1605.

But surely the invention of the telegraphic newswire is also a major, major advance that you could call the first "digital" newspaper.

And I think the anthropologists might point to the invention of written language in the first place -- and what it meant, namely the ability to perfectly copy texts and preserve them for longer than the life of any physical artifact -- as the signal development in our ability to use "digital" means to distribute and preserve our culture.

Saturday, May 28, 2011

Q: Is the Charles River safe to swim in?

It depends -- on the day, your location in the river basin, and who you are.

The short answer is that if you are young and have a robust immune system, swimming east of the Harvard Bridge, without recent rainfall, you should be fine. I have done it on hot summer days and it was beautiful. The government, which plays these things pretty conservatively, agrees that on most days, the river basin meets the standards for swimmability below the Mass. Ave. bridge.

The longer answer is that the conditions vary depending on where you are in the river and the day. For centuries, the Charles was practically Boston's sewer, not just for human waste but also for all manner of industrial heavy metals.

In 2010, most sites east of Magazine Beach would have been "swimmable" about 75% of the time. See this presentation from last October by the Massachusetts Water Resources Authority (http://www.charlesriverconservan..., slide 14).

Unfortunately, last year we had a problem with "Harmful Algae Blooms," perhaps caused by hot water runoff from power plants. The state posted an algae advisory from July through September. See http://www.charlesriverconservan..., slides 14-16.

During the summer, the Charles River Watershed Association maintains a system of weekly monitoring and flies flags to show whether the water is "safe for boating" at nine places. You can get the data at

As I understand it, the standard for "safe for boating" is less than 630 colony-forming units of E. coli per 100 mL, plus an acceptable level of blue-green algae. The EPA standard for swimming is tighter: around 225 colony-forming units per 100 mL, as I understand it, plus a limit on Enterococcus.

Sunday, May 8, 2011

Q: Do the odds in a horse race add up to more than 100%?

It doesn't quite make sense to sum odds. But if we talk about converting the odds into the probabilities of each horse's winning, then yes, they do add to more than 100% -- because the house takes more than 16% of every dollar bet!

For example, let's say we had two horses in a race, equally favored to win. A "fair" race chart, with no house take, would give each horse 1:1 odds against winning -- meaning a successful $1 bet will pay back a total of $2. The probability that corresponds to 1:1 odds is 1/(1+1), or 50% -- and two of these sum to 100%. Say you want to be guaranteed to walk away with $1. Then you need to bet 1/2 dollar on each horse. Exactly one horse will win (paying off 1:1, so you'll get a dollar back), so you will break even.

In reality, the odds won't be 1:1 for each horse. It will be something like 2:3 odds for each horse, meaning a successful $3 bet will pay back a total of $5. The probability that corresponds to 2:3 odds is 3/5, or 60%. Two of these sum to 120%! Say you want to be guaranteed to walk away with $5. Then you must bet $3 on the first horse, and $3 on the second horse. You've bet $6 to be assured of winning $5 -- the house has taken 16.7%.
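The arithmetic in the two-horse example can be checked with a short sketch. The function names are made up for illustration, and odds are written as (profit, stake) pairs, so 2:3 means a winning $3 bet earns $2 profit.

```python
from fractions import Fraction


def implied_probability(profit, stake):
    """Probability implied by odds of profit:stake against winning."""
    return Fraction(stake, profit + stake)


def house_take(odds):
    """Fraction of the money bet that the house keeps, given every horse's odds."""
    total = sum(implied_probability(profit, stake) for profit, stake in odds)
    return 1 - 1 / total


fair_race = [(1, 1), (1, 1)]   # two equally favored horses, no house take
real_race = [(2, 3), (2, 3)]   # the same race as the track might post it

print(sum(implied_probability(p, s) for p, s in fair_race))  # 1
print(sum(implied_probability(p, s) for p, s in real_race))  # 6/5
print(house_take(real_race))                                 # 1/6
```

Using exact fractions instead of floats keeps the 120% and 16.7% figures from picking up rounding noise.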

We can see this if you look at the chart from yesterday's running of the Kentucky Derby:

The race went off like this:
  • Animal Kingdom, 20.90 (odds $1)
  • Nehro, 8.50
  • Mucho Mucho Man, 9.30
  • Shackleford, 23.10
  • Master of Hounds, 16.80
  • Santiva, 34.70
  • Brilliant Speed, 27.90
  • Dialed In, 5.20
  • Pants On Fire, 8.10
  • Twice the Appeal, 11.90
  • Soldat, 11.90
  • Stay Thirsty, 17.20
  • Derby Kitten, 36.30
  • Decisive Moment, 39.30
  • Archarcharch, 12.50
  • Midnight Interlude, 9.60
  • Twinspired, 32.90
  • Watch Me Go, 33.60
  • Comma to the Top, 35.80

Let's say on each horse we made a bet "to win" in the amount of what it took to be assured of walking away with $1. Exactly one bet will succeed, so the amount we need to bet is the reciprocal of one plus the odds. (E.g. if the odds are 9.60 against 1, we can bet 1/10.60 dollars to receive $1 if it's a success.)

The total we need to bet is

1/21.90 + 1/9.50 + 1/10.30 + 1/24.10 + 1/17.80 + 1/35.70 + 1/28.90 + 1/6.20 + 1/9.10 + 1/12.90 + 1/12.90 + 1/18.20 + 1/37.30 + 1/40.30 + 1/13.50 + 1/10.60 + 1/33.90 + 1/34.60 + 1/36.80 = 1.195...

So it takes about $1.20 to be assured of walking away with exactly $1. The house is taking about 16%.
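To reproduce that sum, here is a small sketch using the odds from the chart above. The variable and function names are my own choices for illustration.

```python
# Final odds against each horse, per $1 bet, copied from the chart above.
DERBY_ODDS = {
    "Animal Kingdom": 20.90, "Nehro": 8.50, "Mucho Mucho Man": 9.30,
    "Shackleford": 23.10, "Master of Hounds": 16.80, "Santiva": 34.70,
    "Brilliant Speed": 27.90, "Dialed In": 5.20, "Pants On Fire": 8.10,
    "Twice the Appeal": 11.90, "Soldat": 11.90, "Stay Thirsty": 17.20,
    "Derby Kitten": 36.30, "Decisive Moment": 39.30, "Archarcharch": 12.50,
    "Midnight Interlude": 9.60, "Twinspired": 32.90, "Watch Me Go": 33.60,
    "Comma to the Top": 35.80,
}


def stake_to_collect_one_dollar(odds_to_one):
    """Amount to bet so that a win returns exactly $1 (stake plus profit)."""
    return 1 / (1 + odds_to_one)


total_stake = sum(stake_to_collect_one_dollar(o) for o in DERBY_ODDS.values())
house_take = 1 - 1 / total_stake  # share of the money bet that is never paid out

print(f"total bet to collect $1: ${total_stake:.3f}")  # $1.195
print(f"house take: {house_take:.1%}")                 # 16.3%
```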

(To be fair, the house is not just one house -- OTB facilities take a cut, etc. etc. But from the perspective of the gambler, the house advantage in horseracing is much higher than even the worst casino games.)

Wednesday, March 2, 2011

Q: How many read/writes does Twitter do a second?

It depends how you count. I'm no Twitter expert, but here's my understanding from a tech talk by Raffi Krikorian and colleagues in September 2010. Would happily correct this if anybody from Twitter wants to chime in.
  • The volume of actual tweets is relatively small -- the most ever in a single second was fewer than 7,000 tweets, which is a peak load of less than 8 Mbps. Average load is about 1.6 Mbps, or 17 GB a day of tweets, and about 1,300 tweets per second. This is about 2% of the average trade frequency on Nasdaq.
  • Twitter's architecture generally writes a reference to each incoming tweet immediately to the "timeline" structure of each recipient. Lady Gaga has 8.5 million followers, so when she tweets to the world, that causes 8.5 million updates to the recipients' timelines.
  • Twitter's real load is not from tweets, but from changes in the social graph. "The rate of operations in the social graph is actually much faster than the incoming tweets we have," they said, meaning follow and unfollow events occur more often than tweets. Their FlockDB graph store, built on MySQL, handles peak load of about 20,000 writes per second and 100,000 reads per second as of April 2010. (Apparently a common pattern for Twitter users is to sign up, follow a bunch of people, but rarely tweet themselves.)
  • One difficult task comes from how Twitter handles "directed" tweets. Twitter only shows the tweet to people who follow both the sender and recipient, so the service has to compute the intersection between both groups at post time. "A bad case for us is if Lady Gaga responds to Justin Bieber. We have to compute the intersection between 6.xx million followers and 5.1 million followers. So you know Flock is having a bad day when celebrities start tweeting each other."
  • The volume of tweets is only 17 gigabytes a day, but Twitter stores a lot more data than that -- the company tracks user behavior comprehensively, generating 12 terabytes of data a day. (I assume not all of this is stored for very long.)
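To make the write amplification concrete, here is a toy in-memory model of the two behaviors described above: writing a reference into every follower's timeline at post time, and delivering a directed tweet only to the intersection of the two follower sets. This is purely illustrative and is not Twitter's code; the class and method names are invented, and the real system is distributed rather than held in Python dictionaries.

```python
from collections import defaultdict


class ToyTimelineService:
    """Hypothetical fan-out-on-write model, for illustration only."""

    def __init__(self):
        self.followers = defaultdict(set)   # user -> set of that user's followers
        self.timelines = defaultdict(list)  # user -> tweet ids shown to that user
        self.timeline_writes = 0            # counts individual timeline updates

    def follow(self, follower, followee):
        self.followers[followee].add(follower)

    def unfollow(self, follower, followee):
        self.followers[followee].discard(follower)

    def post(self, sender, tweet_id):
        """Ordinary tweet: one timeline write per follower of the sender."""
        self._deliver(self.followers[sender], tweet_id)

    def post_reply(self, sender, recipient, tweet_id):
        """Directed tweet: shown only to users who follow both parties."""
        audience = self.followers[sender] & self.followers[recipient]
        self._deliver(audience, tweet_id)

    def _deliver(self, audience, tweet_id):
        for user in audience:
            self.timelines[user].append(tweet_id)
            self.timeline_writes += 1
```

In this model a single `post` from an account with 8.5 million followers costs 8.5 million timeline writes, and a `post_reply` between two celebrities first has to intersect two multi-million-member sets, which is the "bad day" the engineers described.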

Thursday, February 10, 2011

Q: Given that the universe is expanding and the Solar System is hurtling through space, is there any frame of reference for zero motion?

The laws of physics (Maxwell's equations, etc.) work the same in any inertial reference frame, so in that sense, no, there is no inertial reference frame that is the unique one with "zero motion." Space could just as easily be hurtling past the Solar System as the Solar System is past space!

However, one thing we can observe is the light from shortly after the Big Bang (about 300,000 years after). This "Cosmic Microwave Background Radiation" was discovered in the 1960s and has dimmed considerably over the last 14 billion years. It has the spectrum of a "black body" -- like the light you get off a hot piece of metal, like a lightbulb filament, except that this piece of metal is 2.725 Kelvins.

The COBE and WMAP satellites have observed that the light is slightly bluer in one direction in space (l = 264 degrees, b = 48 degrees) by about 3.4 milliKelvins and slightly redder in the opposite direction. We believe this is like the Doppler shift that happens when an ambulance drives by and shifts in pitch -- in other words, that the Solar System is flying through this background radiation, left over from the Big Bang, at a particular speed and direction.

We can calculate what that velocity must be:

2.725 K + 3.4 mK = \frac{(2.725 K) \sqrt{1 - (v/c)^2}}{1 - \frac{v}{c}}

Solving for v with the unrounded dipole amplitude of about 3.35 mK, we get v = 229 miles per second (roughly 369 km/s).
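The equation above can be solved in closed form. Writing r for the ratio of the shifted temperature to the rest temperature, the right-hand side simplifies to sqrt((1 + v/c) / (1 - v/c)), so v/c = (r^2 - 1) / (r^2 + 1). Here is a short sketch of that calculation; the function name is mine, and the constants are the standard speed of light and kilometers per mile.

```python
C_KM_PER_S = 299_792.458   # speed of light
KM_PER_MILE = 1.609344


def speed_from_dipole(rest_temp_k, dipole_k):
    """Speed in km/s implied by a Doppler dipole on a black body spectrum."""
    r = (rest_temp_k + dipole_k) / rest_temp_k
    beta = (r**2 - 1) / (r**2 + 1)   # v/c from r = sqrt((1 + beta) / (1 - beta))
    return beta * C_KM_PER_S


for dipole_mk in (3.4, 3.35):
    km_s = speed_from_dipole(2.725, dipole_mk / 1000)
    print(f"{dipole_mk} mK -> {km_s:.0f} km/s, {km_s / KM_PER_MILE:.0f} miles/s")
```

The rounded 3.4 mK figure gives about 232 miles per second, while the more precise 3.35 mK gives about 229. Because v is so much smaller than c, the simple approximation v/c ≈ ΔT/T lands within a fraction of a percent of the exact answer.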

So, to sum up, in theory anybody in the universe who can measure the CMB precisely enough can agree on a "CMB rest frame" that is moving, relative to our Solar System, in the opposite direction from the galactic coordinates l = 264 degrees, b = 48 degrees, at the speed of 229 miles per second.