Stephen Jay Gould was the JK Galbraith of biology

If you’re reading this, you’re probably somewhat familiar with Stephen Jay Gould.  Most likely, you know of him as a paleontologist and evolutionary biologist, and you may even have read one of his (excellent) popular-science books like The Panda’s Thumb.  You probably assumed, as I did, that between this work and his Harvard professorship, he was an eminent figure within academia as well.

Well, no.

I remember first learning about Gould’s most famous theory, “punctuated equilibrium”, quite some years back.  I won’t digress into the details here, but when I thought I’d understood it, I remember thinking something like, “OK, this might be true, but then what?”  I assumed I was missing something and didn’t think much more of it.

Then a few years ago, the scales fell from my eyes.  This letter, written in reply to a couple of Gould’s, summarizes how “real” evolutionary biologists felt about him:

John Maynard Smith, one of the world’s leading evolutionary biologists, recently summarized in the NYRB the sharply conflicting assessments of Stephen Jay Gould: “Because of the excellence of his essays, he has come to be seen by non-biologists as the preeminent evolutionary theorist. In contrast, the evolutionary biologists with whom I have discussed his work tend to see him as a man whose ideas are so confused as to be hardly worth bothering with, but as one who should not be publicly criticized because he is at least on our side against the creationists.” (NYRB, Nov. 30th 1995, p. 46). No one can take any pleasure in the evident pain Gould is experiencing now that his actual standing within the community of professional evolutionary biologists is finally becoming more widely known.

In other words, evolutionary biologists considered Gould what we might call a “useful idiot”.  The letter continues with one of the meanest paragraphs I’ve ever read:

Now, given the foregoing, one is left with the puzzle of why Gould so customarily reverses the truth in his writing. We suggest that the best way to grasp the nature of Gould’s writings is to recognize them as one of the most formidable bodies of fiction to be produced in recent American letters. Gould brilliantly works a number of literary devices to construct a fictional “Gould” as the protagonist of his essays and to construct a world of “evolutionary biology” every bit as imaginary and plausible as Faulkner’s Yoknapatawpha County. Most of the elements of Gould’s writing make no sense if they are interpreted as an honest attempt to communicate about science (e.g., why would he characterize so many researchers as saying the opposite of what they actually do) but come sharply into focus when understood as necessary components of a world constructed for the fictional “Gould” to have heroic fantasy adventures in — adventures during which the admirable character of “Gould” can be slowly revealed.

Wow.  (To be fair, Gould really did bring this on himself.  Read the entire discussion.)  So, stay for Gould’s pop essays, pass on his “theory”.

Around the same time, I found a talk by Paul Krugman entitled What Economists Can Learn from Evolutionary Theorists.  Krugman had the same unpleasant surprise about Gould that I did, and dropped another nugget:

I am not sure how well this is known. I have tried, in preparation for this talk, to read some evolutionary economics, and was particularly curious about what biologists people reference. What I encountered were quite a few references to Stephen Jay Gould, hardly any to other evolutionary theorists. Now it is not very hard to find out, if you spend a little while reading in evolution, that Gould is the John Kenneth Galbraith of his subject. That is, he is a wonderful writer who is beloved by literary intellectuals and lionized by the media because he does not use algebra or difficult jargon. Unfortunately, it appears that he avoids these sins not because he has transcended his colleagues but because he does not seem to understand what they have to say; and his own descriptions of what the field is about – not just the answers, but even the questions – are consistently misleading.

Now, you’re probably less likely to be familiar with John Kenneth Galbraith.  I first encountered him in William F. Buckley’s excellent Firing Line archive.  (Go watch for a while; I’ll be here when you come back.)  Galbraith played the role of “Serious Liberal Economist”, counterpoint to Buckley and his guru the (actual) Serious Conservative Economist Milton Friedman.  Before life as a public intellectual, Galbraith did extensive work in public policy and politics, in addition to economics.  But as Krugman introduces before the paragraph above,

And I guess it is no secret that even John Kenneth Galbraith, still the public’s idea of a great economist, looks to most serious economists like an intellectual dilettante who lacks the patience for hard thinking.

Ouch.  (Although I’ve read that Krugman’s dismissal of Galbraith is somewhat more controversial than the dismissal of Gould, and that Galbraith’s public-policy work had real substance.)

Both of these were quite surprising to me — I hope you learned something as well.  And it’s always worth checking the credentials of the popular “TV experts”.


Paying for our free press

The concept of a “free press” comprises

  • free, as in free speech
  • independent, as in free from conflict of interest
  • pluralistic, as in free marketplace of ideas

Take away any of those elements and the system is weakened.  Authoritarian states always restrict free speech, and the results are well known.  If your press consists of multiple free-speech publishers all dependent on the government, for example, then you would have to read their publications with heightened scrutiny.  And if you take away pluralism by gating access to journalism through a single large content distributor, for example, then your views become colored by the biases of the distributor.  This is all old news, of course (pun intended).

As the “press” has moved online, people have come to expect another “freedom”: free as in beer, no charge.  That’s eroded publishers’ traditional revenue streams, as many have written about already, and now the news has become a tough business to be in.  This is also not a new observation.

What happens if, in a worst-case scenario, high-quality investigative journalism becomes financially unsustainable and collapses?  Previously I’ve imagined this leaving behind a news vacuum, to be filled with clickbait rotgut.  (Maybe some think we’re heading into that territory already.)  This would be an Idiocracy-style tragedy, to be sure.  But now there’s something new on the horizon.

In this US election cycle, attempts by foreign governments to influence American thought through slanted reporting (or “propaganda”, if you’re so inclined) have become higher profile.  Now, there’s absolutely nothing wrong with that.  In fact, it’s something we should eagerly welcome into a pluralistic free press: at best it provides another perspective to consider, and at worst we can roll our eyes and ignore it.  It’s also a big middle finger to authoritarian regimes, who would ban, and have banned, the same at home.  The challenge comes from the fact that news outlets financially sustained by foreign governments don’t face the same revenue-generation constraints as independent outlets.

Not to dabble in tinfoil-hat paranoia, but imagine the worst-case scenario above again, this time with foreign-government-controlled news outlets, with their own ulterior motives, ready to fill the vacuum.  By remaining free-as-in-no-charge and presenting the semblance of traditional journalism, they might be able to exert real influence.  That would have been completely unthinkable thirty years ago, for a variety of reasons.

What’s the solution?  We have to financially support good reporting.  With the way online news is evolving, it’s almost a patriotic duty now as well; that’s not a thought that had occurred to me before.  I don’t want to get too sidetracked on the mechanics of financial support, since lots has been written about that too.  Subscriptions are fine but have some flaws; crowdfunded journalism is interesting; micropayments, as implemented in the Brave browser for example, are a newer idea that might have potential.  But really it can be all of the above.  The important thing is,

Please, pay for our free press!

ES6 + react + flow: achievement unlocked

This weekend I refactored a couple of pieces of my react web app.  I kept whacking at the code until the flow type checker stopped erroring.  And after that, … my app still worked!  Achievement unlocked.  This is a major improvement over the dark-ages web-app workflow that went something like: edit; refresh; see what broke; repeat.  (Of course, not to poke the embers of that religious war, but fanciers of statically typed languages, like myself, will be saying “well, finally”.)

In the react + flow development workflow though, type errors don’t block the app from updating on changes.  So I can quickly hack up something half-broken to try out an idea, and then if it works go back and fix up the code to be less broken.  (Gradual/optional typing enthusiasts won’t be surprised at this either.)  But of course, type errors do block deployment in my setup.

And I’ve been finding ES6 to be a pretty nice language to work with; a big advance over vanilla JavaScript / ES5, which is not my cup of tea.  Lots has been written on this topic, and I won’t go down that rabbit hole here.

Then finally, enabling all these goodies nowadays is as simple as

npm install -g create-react-app
create-react-app foo

Previously, you had to be something of a node.js pseudo-build-system ninja to set up this kind of environment, although it was possible.

In all probability, this environment won’t graduate beyond the prototype I’m hacking up right now (more on this later), but current mood:


Donald Trump is Ellis from Die Hard

In this actual, real-life, archival footage:

we see the candidate:

  • brag about closing deals
  • display his smooth dealings with members of the opposite sex
  • brag more about negotiating deals
  • open negotiations with Vladimir Putin

If you’ve seen Die Hard, you remember how this ends.

Word trivia: what’s the craic, crack shot?

This summer I learned the slang “what’s the craic” from this interview:

Craic is pronounced like crack in English, and the whole phrase means something like “what’s new” or “what’s up”.  Delightful!  I regret only having learned it this recently.

Craic would be an odd English spelling.  Turns out it was borrowed from Irish pretty recently, within the last 40 years or so.  But before that, craic was borrowed into Irish from the English crack, as a Gaelicized spelling of it, not too long before craic was re-borrowed into English.

Crack in that sense of “news” or “chat” came from Northern England and Scotland, as a softening of an earlier usage of crack that meant “loud boasting”.  And that crack traces back to the Middle English word crak, meaning the same.  Crak in turn comes from Old English cracian, meaning to make a sudden sharp noise.

What about “crack shot”, you’re now asking.  From Northern England and Scotland, usage of crack to mean “loud boasting” worked its way south.  As it did, its meaning shifted more towards describing what was being boasted about.  So if your “crack” was being a good marksman, then eventually I could call you a “crack shot”.

But remember this ultimately derives from cracian, a sudden sharp noise; a word you would use to describe a gunshot.  So modern English in essence re-purposed a term from Old English, that was borrowed via Middle English into Scots and Northern English.


Doggie DNA database: dystopian?

My apartment building started requiring DNA samples from tenants’ dogs:

Now, through a simple cheek swab, all of your dogs will have their DNA registered with our vendor Mr. Dog Poop (yes that is the name of the company). We will be working with all new leases and renewals to get their dogs registered. Once a dog is registered this program will allow us to test the DNA of the “poop” left on our property, so we can trace it back to the correct owners. If the sample is found to belong to one of our resident animals their owners will be charged $110 to cover our cost of the DNA collection and testing. We would really like to avoid charging this cost so we encourage everyone to scoop their poop.

This was quite stunning to me.  I don’t have a dog, but I probably wouldn’t have moved into this building if the program had been going on.  I can’t even really explain my strong visceral reaction to this; I guess this blog post is a way for me to think “out loud”.

Is the program warranted?  We see an occasional pile of poop, but it doesn’t seem like a frequent problem.  But we’re not the building maintenance people, so maybe they see more of it.

Is it cost effective?  Judging by Mr. Dog Poop’s website, I’d estimate the DNA registration cost for our building would be on the order of a few thousand dollars.  That works out to a few hundred hours of labor, and I can’t imagine anyone spends hours a day on this problem.  Hmm.  If the building does catch someone not picking up after their dog, then it collects a nice margin: $110 against the $30 or so service charge for matching the DNA.  So they’d need to catch 50 or so violators for the program to pay for itself.
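As a sanity check on that math, here’s the arithmetic spelled out; every number below is my own rough estimate, not a quoted figure from the building or from Mr. Dog Poop:

```python
# Back-of-envelope break-even calculation; all numbers are my guesses.
registration_cost = 4000        # assumed one-time cost to register the building's dogs
fine = 110                      # what the building charges per matched sample
test_cost = 30                  # approximate per-sample DNA matching fee
margin = fine - test_cost       # what the building nets per catch: $80
violators_to_break_even = registration_cost / margin
print(violators_to_break_even)  # → 50.0
```

So unless the poop problem is much worse than it looks, the program is a money-loser on its face — which makes the data itself look like the more plausible payoff.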

The DNA certainly has some value on its own.  If I knew your dog’s genetic profile, then I could target various kinds of advertising at you, for example.  I wasn’t sure at first whether the profile Mr. Dog Poop collects ties back to the dog’s owner, but it seems that it does.

Looking at this from another perspective, there’s certainly an issue with cigarettes at our property; it’s non-smoking, but people smoke all the time and leave cigarette butts strewn about.  If the apartment had a DNA database of all tenants, then it’d be easy to catch violators and fine them.  But even the mention of that is quite chilling, and feels out of proportion to the problem.  Not to mention the privacy violation and the whole can of worms that opens.

So I dunno.  I wish I could end with a strong conclusion and call to action, but I’m rather left with a general sense of uneasiness.  Watch out for these kinds of policies next time you move, and I hope we can vote with our wallets and stamp this out before it becomes common.

What do you think?

A lovely Wikipedia, just for me

If you ever look into using Wikipedia content for analysis — and wow, there’s tons of cool stuff to do with it — you’ll want a local copy.  You can’t scale an analysis that queries the public server, and it’s pretty rude to even try.  Setting up a local copy takes some time, but at the end you’ll have a goodly portion of accumulated human knowledge, and metadata relating it, right there on your local drive.


Quick caveat: there’s a difference between a Wikipedia mirror — which is a faithful duplicate of all the content on Wikipedia — and what I’m calling a Wikipedia local copy: page text and some of the metadata.  A cache, basically.  Other guides describe how to set up a mirror, and it allegedly takes twelve days.  That was far beyond my budget.

Another caveat: even a not-perfect-fidelity Wikipedia local copy occupies a lot of disk space.  The page contents themselves are about 50GB.  Depending on how much metadata you import (see below), usage can go up quite a bit from there.  My local copy is just over 200GB, for reference.

Here are the steps.

Download the “pages-articles” dump

Navigate to the Wikimedia database dump list.  Choose any project you wish; for this post I’ll assume English Wikipedia (“enwiki”), which happens to be the largest Wikimedia project.  Here’s the latest enwiki dump (20160820) as of the time of this writing.

Search that page for the file named something like “enwiki-20160820-pages-articles.xml.bz2” and start downloading it.  This is an archive of the contents of all the project’s pages.  The 20160820 pages-articles dump is 12.3GB, for example, so it’ll take a while.
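If you’d rather script the download, here’s a sketch; note that the dumps.wikimedia.org URL layout below is my assumption inferred from the dump index page, so verify it against the actual download link before pulling down 12GB.

```python
# Construct the dump file URL for a given project and dump date.
# The URL layout is assumed from browsing the dump index pages;
# double-check it against the real link on the page.
wiki = "enwiki"
dump_date = "20160820"
filename = "{}-{}-pages-articles.xml.bz2".format(wiki, dump_date)
url = "https://dumps.wikimedia.org/{}/{}/{}".format(wiki, dump_date, filename)
print(url)

# To actually fetch it (slow; a resumable downloader like curl -C - is nicer):
# import urllib.request
# urllib.request.urlretrieve(url, filename)
```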

Keep this tab open: you might want to download more files from here later.

Install MediaWiki software

On OS X, this is quite painless: install the Bitnami MediaWiki Stack and follow its configuration instructions.  When you set a username and password, make sure to remember the password (or write it down): you’ll need it below.  Make sure to start the services.

Bitnami has installers for other platforms, and cloud VMs ready to launch out of the box.  There are also plenty of other setup guides out there, particularly for MediaWiki on Linux.

After setup, you should be able to load http://localhost:8080/mediawiki/Main_Page and see … well nothing really, just an empty MediaWiki main page.  We’ll import the real stuff below.  Don’t create any content here: we’re going to wipe the DB.

Build the pages-articles import tool

Unfortunately, this is a bit of a pain.  First you need Java >= 1.7.  Download the JDK from Oracle’s site and follow the installer’s instructions.  (On Linux, you may want to use your distribution’s Java installation method.)  On OS X, add this line to your .profile or .bashrc:

export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"

(On Linux, you may need to update your preferred Java version.)  Next you need to install git and maven.  On OS X, homebrew is very convenient for this:

brew install git maven

(And likewise on Linux.)  Now we can build the pages-articles importer.

mkdir ~/wikipedia
cd ~/wikipedia
git clone <URL of the mwdumper repository>
cd mwdumper
mvn package

OK!  If you see a compiler error about an unsupported Java version, ensure that “javac -version” says something like “javac 1.8.0_102”; otherwise make sure you followed the steps above to set your preferred Java.

Import pages-articles

Has your pages-articles download finished yet?  No?  OK, come back here when it does.

For simplicity, let’s say that you’ve chosen the 20160820 enwiki dump, and you saved the pages-articles archive to “~/Downloads”.  These instructions assume a Bitnami install on OS X, but will work on Linux with small tweaks to the tool paths and DB names etc.

First we need to wipe any existing content from the MediaWiki DBs.  [Ed: you can probably skip the DB wipe, but I did it anyway.]

export PATH="/Applications/mediawiki-1.26.3-1/mysql/bin/:$PATH"
mysql -u root -p bitnami_mediawiki
Enter password:

Here you’ll need to provide the password you chose during configuration — you remembered it, right? — but ignore the username part of the command (“-u root”); it’s always “root” here, no matter what username you chose above.

mysql> DELETE FROM page; DELETE FROM text; DELETE FROM revision;
mysql> quit
cd /Applications/mediawiki-1.26.3-1/apps/mediawiki/htdocs/maintenance/
php rebuildall.php

Now we can import pages-articles:

cd ~/wikipedia
mkdir enwiki-20160820
cd enwiki-20160820
mv ~/Downloads/enwiki-20160820-pages-articles.xml.bz2 ./
java -jar ../mwdumper/target/mwdumper-1.25.jar \
  --format=mysql:1.25 enwiki-20160820-pages-articles.xml.bz2 \
  | mysql -u root -p bitnami_mediawiki

This will take several hours at least; you may want to run it overnight.  See you later!

[WARNING: when I imported a dump several months older than 20160820, the import tool died with a parse error just before the import finished.  This seems to have resulted in some articles not being imported, but didn’t noticeably impact my project at the time.  YMMV.  I don’t know if this is still an issue.]

After the import finishes, you should be able to load an arbitrary article, for example http://localhost:8080/mediawiki/War_hammer (from enwiki).  Of course, the displayed page will look considerably different than it does on the public server, because our local copy only has the page text.

Import metadata (optional)

Depending on your use case, you may want more than just the article text.  Details about the available metadata archives are beyond the scope of this article, but they’re hosted on the same dump page from which you downloaded the pages-articles archive.

Let’s say you want to import the redirect list metadata.  Search the dump page for a file called something like “enwiki-20160820-redirect.sql.gz” and download it.  Then import it with

export PATH="/Applications/mediawiki-1.26.3-1/mysql/bin/:$PATH"
cd ~/wikipedia/enwiki-20160820
mv ~/Downloads/enwiki-20160820-redirect.sql.gz ./
gunzip -c enwiki-20160820-redirect.sql.gz | mysql -u root -p bitnami_mediawiki

To import another metadata archive, follow the steps above, replacing “redirect” with the name of the archive you want.  Be aware that some of the larger metadata archives can take hours to import.

Have fun

That’s it!  Now you have a local copy of Wikipedia accessible offline, and — more interestingly to me — query-able through the MediaWiki API with minimal latency and no throttling.  (And the raw DB tables are available to power users.)
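As a sketch of what “query-able with no throttling” looks like, here’s a minimal client.  The api.php path is my guess extrapolated from the Bitnami URL scheme above (http://localhost:8080/mediawiki/…), so adjust it if your install serves the API elsewhere.

```python
import json
import urllib.parse
import urllib.request

# Endpoint path assumed from the Bitnami URL scheme used earlier;
# adjust for your install.
API = "http://localhost:8080/mediawiki/api.php"

def query(params):
    """Issue a single MediaWiki API request against the local copy."""
    qs = urllib.parse.urlencode(dict(params, format="json"))
    with urllib.request.urlopen("{}?{}".format(API, qs)) as resp:
        return json.load(resp)

# Example: fetch the wikitext of one article, with no rate limiting.
# Uncomment once your local server is up:
# result = query({"action": "query", "titles": "War_hammer",
#                 "prop": "revisions", "rvprop": "content"})
```

Hammer on this in a tight loop to your heart’s content — that’s exactly what the public server can’t offer.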

Protip: I found it quite handy to design my queries using the Wikipedia API Sandbox on the public server, and then to try them locally.  The public server has all the metadata and secondary content, which makes queries easier to debug, and comparing public vs. local results shows where you may need to import more metadata archives.  But keep in mind that the public server’s content is constantly changing, so you won’t always get exactly the same results locally, even when all dependencies have been imported properly.