A NEW EXCOGITATION
Using Spam Blockers To Target HIV, Too
A Microsoft researcher and his team
make a surprising new assault on the AIDS epidemic
Cut-rate painkillers! Unclaimed
riches in Nigeria!! Most of us quickly identify such e-mail messages as spam.
But how would you teach that skill to a machine? David Heckerman needed to
know. Early this decade, Heckerman was leading a spam-blocking team at
Microsoft Research. To build their tool, team members meticulously mapped out
thousands of signals that a message might be junk. An e-mail featuring
"Viagra," for example, was a good bet to be spam--but things got
complicated in a hurry.
If spammers saw that
"Viagra" messages were getting zapped, they switched to V1agra, or Vi
agra. It was almost as if spam, like a living thing, were mutating.
This parallel between spam and
biology resonated for Heckerman, a physician as well as a PhD in computer
science. It didn't take him long to realize that his spam-blocking tool could
extend far beyond junk e-mail, into the realm of life science. In 2003, he
surprised colleagues in Redmond, Wash., by refocusing the spam-blocking
technology on one of the world's deadliest, fastest-mutating conundrums: HIV,
the virus that leads to AIDS.
Heckerman was plunging into
medicine--and carrying Microsoft (MSFT) with him. When he brought his
plan to Bill Gates, the company chairman "got really excited,"
Heckerman says. Well versed on HIV from his philanthropy work, Gates lined up
Heckerman with AIDS researchers at Massachusetts General Hospital, the
University of Washington, and elsewhere.
Since then, the 50-year-old
Heckerman and two colleagues have created their own biology niche at Microsoft,
where they build HIV-detecting software. These are research tools to spot
infected cells and correlate the viral mutations with the individual's genetic
profile. Heckerman's team runs mountains of data through enormous clusters of
320 computers, operating in parallel. Thanks to smarter algorithms and more
powerful machines, they're sifting through the data 480 times faster than a
year ago. In June, the team released its first batch of tools for free on the
Internet.
A new industry for the behemoth to
conquer? Not exactly. Heckerman's nook in Redmond represents just one small
node in a global AIDS research effort marked largely by cooperation. "The
Microsoft group has a different perspective and a good statistical
background," says Bette Korber, an HIV researcher at Los Alamos National
Laboratory. The key quarry they all face is the virus itself, which is
proving wilier than any of Microsoft's corporate foes. While Heckerman has high
hopes that his tools will lead to vaccines that can be tested on humans within
three years, his research sits outside of Microsoft's business plan. "It
has nothing to do with Microsoft," he says, "except that we can
help." From the company's perspective, the sums invested in HIV research
amount to a rounding error--only a couple million dollars per year in a
research and development budget of $7 billion. The potential payoff would be to
contribute to the holy grail of AIDS research, successful vaccines. In the
optimal scenario, drug companies would distill such research into targeted
varieties of vaccines, which would help defend millions around the world from
the scourge. The business payoff? Well, if helping to conquer a plague doesn't
justify the effort--and burnish Microsoft's image--a virus-sniffing tool might
just drive spam into submission.
If it seems strange that
spam-blockers would end up studying nucleic acids, it shouldn't. Research is
growing increasingly quantitative. Nearly everything these days, from atoms and
cells on up, is described in data. When the work involves finding statistical relationships
in mountains of bits, two things happen: First, mathematicians and computer
scientists gain sway, which means an expanding role in research for powerhouses
such as Microsoft and IBM (IBM). Second, as researchers find
common patterns, they start jumping from one discipline to the next.
CALCULATING ODDS
The battle against HIV draws loads
of such jumpers. Several scientists at Los Alamos, for example, were teaching
machines to recognize patterns in satellite imagery. This led them to HIV,
where they're building tools along the lines of Microsoft's. And many of the
800 researchers at Microsoft cross disciplines every which way. One of them,
Michael Cohen, started out building software to stitch photos into a panorama.
Now he's piecing thousands of brain scans into 3D models for scientists.
For Heckerman, the connections
between spam and HIV boil down to mathematics. He analyzes both scourges by
studying statistical relationships among their ever-changing features. Consider
the word "Viagra." Sometimes it shows up in legitimate e-mails. Often
it appears in spam. If researchers study thousands of e-mails, they can
calculate the percentage of e-mails with that word that are spam. That's one
clue. But the spam-filtering machine needs to know more than that. What other
features in an e-mail signal that it's spam? Are certain fonts particularly
spammy? What about e-mail addresses or types of punctuation? The trick is to
figure out which combinations of these features identify an e-mail as spam.
Each decision can involve thousands of variables and millions of different
calculations.
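To make the arithmetic concrete, here is a minimal sketch of that kind of calculation in Python. It is not Heckerman's filter: the six sample messages, the feature names, and the choice of a simple naive Bayes model with add-one smoothing are all assumptions made for illustration. The first function computes the share of messages containing a given feature that are spam; the second combines every feature into a single estimate.

```python
import math

# Hypothetical toy corpus: (features seen in the message, is it spam?)
MESSAGES = [
    ({"viagra", "all_caps_subject"}, True),
    ({"viagra", "many_exclamations"}, True),
    ({"nigeria", "many_exclamations"}, True),
    ({"viagra"}, False),  # a legitimate e-mail can mention the word too
    ({"meeting"}, False),
    ({"meeting", "many_exclamations"}, False),
]


def spam_rate(feature, messages):
    """Fraction of the messages containing `feature` that are spam."""
    labels = [is_spam for feats, is_spam in messages if feature in feats]
    return sum(labels) / len(labels) if labels else None


def spam_probability(features, messages):
    """Naive Bayes estimate that a message with `features` is spam.

    Assumes features are independent given the label, which real
    e-mail features are not; this is a simplification for illustration.
    """
    n_spam = sum(1 for _, is_spam in messages if is_spam)
    n_ham = len(messages) - n_spam
    log_spam = math.log(n_spam / len(messages))
    log_ham = math.log(n_ham / len(messages))

    vocabulary = set().union(*(feats for feats, _ in messages))
    for feat in vocabulary:
        in_spam = sum(1 for f, s in messages if s and feat in f)
        in_ham = sum(1 for f, s in messages if not s and feat in f)
        # Add-one smoothing keeps every probability strictly between 0 and 1.
        p_spam = (in_spam + 1) / (n_spam + 2)
        p_ham = (in_ham + 1) / (n_ham + 2)
        if feat in features:
            log_spam += math.log(p_spam)
            log_ham += math.log(p_ham)
        else:
            log_spam += math.log(1 - p_spam)
            log_ham += math.log(1 - p_ham)

    # Convert the two log scores into a probability of spam.
    return 1 / (1 + math.exp(log_ham - log_spam))


if __name__ == "__main__":
    print(spam_rate("viagra", MESSAGES))
    print(spam_probability({"viagra", "many_exclamations"}, MESSAGES))
    print(spam_probability({"meeting"}, MESSAGES))
```

With only five features the sums are trivial. A working filter weighs thousands of features at once, and a model this simple ignores how features interact, which is the part of the problem the article describes as the real trick.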
From Heckerman's perspective, HIV is
like a cagey spammer. After attacking a cell, it injects its own genetic
material and proceeds (much like a spam jockey who has commandeered an
unprotected computer) to manufacture thousands of copies of the virus. It's a
notoriously sloppy copier, but that adds to its vigor. Each mistake launches
mutant viruses into the system. Many fail. Some, though, survive--and resist
the drugs.
One challenge for HIV researchers is
to find the variables that point to an infected cell. Ordinarily, the first
clues--the cellular equivalent of the variations in fonts and words that
Heckerman has discovered in his spam research--are bits of protein that sit
atop each cell. These communicate to passersby, including armies of antibodies,
what's going on inside the cell. For years, researchers have been striving to
single out the combinations of protein that point to an HIV-infected cell. Once
they do, the next step is to package those bits of protein into a vaccine. In
theory, this would introduce a person's immune system to an entire gang of
undesirables, so that it could recognize and attack those cells.
The trouble? Complexity and
mutations. HIV-infected cells often wear mutated nameplates that immune systems
haven't learned to read. In this sense, vaccines have been like faulty spam
filters, the ones that block e-mails promoting "Viagra" while letting
ads for "V1agra" scoot through. This leads some researchers to throw
up their hands. "We've thrown billions down the black hole of AIDS
vaccines," laments Leroy Hood, co-founder of the Institute for Systems
Biology in Seattle.
But Heckerman is upbeat. He argues
that by revving up the computing power and blending thousands of new
variables--including dizzying genetic differences in each patient--researchers
are making progress. One key, he says, is to map the patterns of mutation and
incorporate them into medicine. These mutations, he says, appear to vary
according to a person's immune system. If researchers can find the patterns,
they'll be closer to making effective vaccines. Yet if they conclude that the
mutations are utterly random, then "we're in big trouble," says
Heckerman.
The hunt goes on. No one is betting
on miracles from Microsoft. But in a research community desperate for answers,
the hum of those computers churning in Redmond is a welcome sound.