Biotech

Published on June 2016 | Categories: Types, Brochures | Downloads: 76 | Comments: 0 | Views: 2275



www.nature.com/naturebiotechnology

EDITORIAL OFFICE
[email protected]
75 Varick Street, Fl 9, New York, NY 10013-1917
Tel: (212) 726 9200, Fax: (212) 696 9635
Chief Editor: Andrew Marshall
Deputy Editor: Kathy Aschheim (Research)
Senior Editors: Laura DeFrancesco (News & Features), Markus Elsner (Research), Susan Jones
(Research), Michael Francisco (Resources and Special Projects), Victor Bethencourt (Special Projects),
Christine Borowski (Research), Craig Mak (Research)
Locum Editor: Mary Muers
Business Editor: Brady Huggett
Senior News Editor: Lisa Melton
Editorial Assistant: Miranda Dubner
Editor-at-Large: John Hodgson
Contributing Editors: Mark Ratner, Chris Scott
Contributing Writer: Jeffrey L. Fox
Editorial Advisor: Mark Kessel
Statistical Advisor: Theresa Hyslop
Senior Copy Editor: Teresa Moogan
Managing Production Editor: Renee Lucas
Production Editors: Brandy Cafarella, Carol Evangelista, Ivelisse Robles
Senior Illustrator: Katie Vicari
Illustrator: Marina Corral Spence
Cover design: Marina Corral Spence
MANAGEMENT OFFICES
NPG New York
75 Varick Street, Fl 9, New York, NY 10013-1917
Tel: (212) 726 9200, Fax: (212) 696 9006
Publisher: Melanie Brazil
Executive Editor: Veronique Kiermer
Commercial Director/EVP of Sales: Dean Sanderson
Head of Nature Research & Reviews Marketing: Sara Girard
Circulation Manager: Stacey Nelson
Production Coordinator: Diane Rios
Head of Web Services: Anthony Barrera
Senior Web Production Editor: Laura Goggin
NPG London
The Macmillan Building, 4 Crinan Street, London N1 9XW
Tel: 44 207 833 4000, Fax: 44 207 843 4996
Managing Director: Steven Inchcoombe
Publishing Director: Peter Collins
Editor-in-Chief, Nature Publications: Philip Campbell
Director of Web Publishing: Dan Pollock
NPG Nature Asia-Pacific
Chiyoda Building, 2-37 Ichigayatamachi, Shinjuku-ku, Tokyo 162-0843
Tel: 81 3 3267 8751, Fax: 81 3 3267 8754
Regional Managing Director — Asia-Pacific: David Swinbanks
Director — Asia-Pacific: Antoine E. Bocquet
Operations Director: Hiroshi Minemura
Senior Marketing Manager: Sachiyo Ikeda
Asia-Pacific Sales Director: Kate Yoneyama
Asia-Pacific Sales Manager: Ken Mikami
DISPLAY ADVERTISING
[email protected] (US/Canada)

[email protected] (Europe)

[email protected] (Asia)
Global Head of Advertising and Sponsorship: Andrew Douglas, Tel: 44 207 843 4975,
Fax: 44 207 843 4996
Global Head of Display Advertising and Sponsorship: Gerard Preston, Tel: 44 207 843 4965,
Fax: 44 207 843 4749
Asia-Pacific Sales Director: Kate Yoneyama, Tel: 81 3 3267 8765, Fax: 81 3 3267 8752
Display Account Managers:
New England: Sheila Reardon, Tel: (617) 399 4098, Fax: (617) 426 3717
New York/Mid-Atlantic/Southeast: Jim Breault, Tel: (212) 726 9334, Fax: (212) 696 9481
Midwest: Mike Rossi, Tel: (212) 726 9255, Fax: (212) 696 9481
West Coast: George Lui, Tel: (415) 781 3804, Fax: (415) 781 3805
Germany/Switzerland/Austria: Sabine Hugi-Fürst, Tel: 41 52761 3386, Fax: 41 52761 3419
UK/Ireland/Scandinavia/Spain/Portugal: Evelina Rubio-Hakansson, Tel: 44 207 843 4079,
Fax: 44 207 843 4749
France/Belgium/The Netherlands/Luxembourg/Italy/Israel/Other Europe: David Watson,
Tel: 44 207 843 4959, Fax: 44 207 843 4749
Asia-Pacific Sales Manager: Ken Mikami, Tel: 81 3 3267 8765, Fax: 81 3 3267 8752
Greater China/Singapore: Gloria To, Tel: 852 2811 7191, Fax: 852 2811 0743
SPONSORSHIP [email protected]
Global Head of Sponsorship: Gerard Preston, Tel: 44 207 843 4965, Fax: 44 207 843 4749
Business Development Executive: David Bagshaw, Tel: (212) 726 9215, Fax: (212) 696 9591
Business Development Executive: Patrick Murphy, Tel: (617) 475 9216, Fax: (617) 494 4960
Business Development Executive: Reya Silao, Tel: 44 207 843 4977, Fax: 44 207 843 4996
Business Development Manager: Will Piper, Tel: 44 207 014 4181
NATUREJOBS
[email protected] (US/Canada)

[email protected] (Europe)

[email protected] (Asia)
US Sales Manager: Ken Finnegan, Tel: (212) 726 9248, Fax: (212) 696 9482
European Sales Manager: Dan Churchward, Tel: 44 207 843 4966, Fax: 44 207 843 4596
Asia-Pacific Sales & Business Development Manager: Yuki Fujiwara, Tel: 81 3 3267 8765,
Fax: 81 3 3267 8752
SITE LICENSE BUSINESS UNIT
Americas: Tel: (888) 331 6288
Asia/Pacific: Tel: 81 3 3267 8769
Australia/New Zealand: Tel: 61 3 9825 1160
India: Tel: 91 124 2881054/55
ROW: Tel: 44 207 843 4759


CUSTOMER SERVICE
www.nature.com/help
Senior Global Customer Service Manager: Gerald Coppin
For all print and online assistance, please visit www.nature.com/help
Purchase subscriptions:
Americas: Nature Biotechnology, Subscription Dept., 75 Varick Street, Fl 9, New York, NY 10013-1917, USA. Tel: (866) 363 7860, Fax: (212) 334 0879
Europe/ROW: Nature Biotechnology, Subscription Dept., Macmillan Magazines Ltd., Brunel Road,
Houndmills, Basingstoke RG21 6XS, United Kingdom. Tel: 44 1256 329 242, Fax: 44 1256 812 358
Asia-Pacific: Nature Biotechnology, NPG Nature Asia-Pacific, Chiyoda Building,
2-37 Ichigayatamachi, Shinjuku-ku, Tokyo 162-0843. Tel: 81 3 3267 8751, Fax: 81 3 3267 8752
India: Nature Biotechnology, NPG India, 3A, 4th Floor, DLF Corporate Park, Gurgaon 122002, India.
Tel: 91 124 2881054/55, Tel/Fax: 91 124 2881052
REPRINTS [email protected]
Nature Biotechnology, Reprint Department, Nature Publishing Group, 75 Varick Street, Fl 9,
New York, NY 10013-1917, USA.
For commercial reprint orders of 600 or more, please contact:
UK Reprints: Tel: 44 1256 302 923, Fax: 44 1256 321 531
US Reprints: Tel: (617) 494 4900, Fax: (617) 494 4960

volume 31 number 8 AUGUST 2013

Editorial
661 Open to interpretation

Circular reconstructions with images
of retinal tissue stained for various
photoreceptor and retinal markers.
Ali and colleagues generate mature
photoreceptors by transplantation
of mouse embryonic stem cells
differentiated in three-dimensional
culture (p 741).
Credit: Anai Gonzalez-Cordero

663 Myriad decision aftershocks ripple through biotech
664 UCLA and GSK reconcile
665 BARDA to pick and choose next-generation antibiotics
666 French scorn Sunshine
666 Melanoma combination therapies ward off tumor resistance
667 Dengue clinches Takeda deal
668 Indiana’s game-changing venture
669 Volunteer GM wheat, mischief or carelessness?
669 Paper firm to improve poor farmers’ crops
670 Around the world in a month
671 SARS-like virus reignites ownership feuds
672 DATA PAGE: 2Q13—an IPO revival
673 NEWS FEATURE: Spreading biotech dollars around Washington

Myriad ruling and its
consequences for biotech, p 663

Bioentrepreneur
Building a business
676 Stock options and beyond
John J Cannon III & Mark Kessel

Opinion and comment

npg

© 2013 Nature America, Inc. All rights reserved.

news

CORRESPONDENCE
681 Heritable gene targeting in the mouse and rat using a CRISPR-Cas system
684 Simultaneous generation and germline transmission of multiple gene mutations in
rat using CRISPR-Cas systems
686 Targeted genome modification of crop plants using a CRISPR-Cas system
688 Multiplex and homologous recombination–mediated genome editing in
Arabidopsis and Nicotiana benthamiana using guide RNA and Cas9
691 Targeted mutagenesis in the model plant Nicotiana benthamiana using Cas9
RNA-guided endonuclease
694 Chinese hamster genome sequenced from sorted chromosomes OPEN

Biotech industry engages in influence
peddling, p 673

Nature Biotechnology (ISSN 1087-0156) is published monthly by Nature Publishing Group, a trading name of Nature America Inc. located at 75 Varick Street,
Fl 9, New York, NY 10013-1917. Periodicals postage paid at New York, NY and additional mailing post offices. Editorial Office: 75 Varick Street, Fl 9, New York,
NY 10013-1917. Tel: (212) 726 9335, Fax: (212) 696 9753. Annual subscription rates: USA/Canada: US$250 (personal), US$4,677 (institution), US$5,382
(corporate institution). Canada add 5% GST #104911595RT001; Euro-zone: €202 (personal), €3,713 (institution), €4,634 (corporate institution); Rest of world
(excluding China, Japan, Korea): £130 (personal), £2,400 (institution), £2,990 (corporate institution); Japan: Contact NPG Nature Asia-Pacific, Chiyoda Building,
2-37 Ichigayatamachi, Shinjuku-ku, Tokyo 162-0843. Tel: 81 (03) 3267 8751, Fax: 81 (03) 3267 8746. POSTMASTER: Send address changes to Nature
Biotechnology, Subscriptions Department, 75 Varick Street, 9th Floor, New York, NY 10013-1917. Authorization to photocopy material for internal or personal
use, or internal or personal use of specific clients, is granted by Nature Publishing Group to libraries and others registered with the Copyright Clearance Center
(CCC) Transactional Reporting Service, provided the relevant copyright fee is paid direct to CCC, 222 Rosewood Drive, Danvers, MA 01923, USA. Identification
code for Nature Biotechnology: 1087-0156/04. Back issues: US$45, Canada add 7% for GST. CPC PUB AGREEMENT #40032744. Printed by Publishers
Press, Inc., Lebanon Junction, KY, USA. Copyright © 2013 Nature America, Inc. All rights reserved. Printed in USA.

Commentary
696 CASE STUDY: The rarest of bounties
Brady Huggett

FEATURE
697 Public biotech 2012—the numbers
Brady Huggett

Patents
704 The European BRCA patent oppositions and appeals: coloring inside the lines
Gert Matthijs, Isabelle Huys, Geertrui Van Overwalle & Dominique Stoppa-Lyonnet
711 Recent patent applications in drug discovery automation

Silencing indirect effects in networks,
p 720

NEWS AND VIEWS
712 From embryonic stem cells to mature photoreceptors
David M Gamm & Lynda S Wright
see also p 741
714 Network cleanup
Babak Alipanahi & Brendan J Frey
see also pp 720, 726
716 Taking the fish out of fish oil
James P Wynn
see also p 734
717 A caffeine fix for human nuclear transfer?
Anthony C F Perry
719 Research highlights

Computational biology
ANALYSIS
720 Network link prediction by global silencing of indirect correlations
Baruch Barzel & Albert-László Barabási
see also p 714
726 Network deconvolution as a general method to distinguish direct dependencies in
networks
Soheil Feizi, Daniel Marbach, Muriel Médard & Manolis Kellis
see also p 714

A filter for improving
network analyses, p 726
[Figure: fatty acid pathway diagram showing palmitic acid (C16:0), stearic acid (C18:0), oleic acid (C18:1), LA (C18:2, ω-6), GLA (C18:3, ω-6), EDA (C20:2, ω-6), ALA (C18:3, ω-3), DGLA (C20:3, ω-6), ARA (C20:4, ω-6), ETrA (C20:3, ω-3), ETA (C20:4, ω-3), STA (C18:4, ω-3) and EPA (C20:5, ω-3), connected by elongase (C16/18, C18/20, ∆9) and desaturase (∆5, ∆6, ∆8, ∆9, ∆12, ∆17) steps]
Sustainable production of omega-3
fatty acids, p 734

Research
ARTICLES
734 Production of omega-3 eicosapentaenoic acid by metabolic engineering of
Yarrowia lipolytica
Zhixiong Xue, Pamela L Sharpe, Seung-Pyo Hong, Narendra S Yadav,
Dongming Xie, David R Short, Howard G Damude, Ross A Rupert, John E Seip,
Jamie Wang, Dana W Pollak, Michael W Bostick, Melissa D Bosak, Daniel J Macool,
Dieter H Hollerbach, Hongxiang Zhang, Dennis M Arcilla, Sidney A Bledsoe,
Kevin Croker, Elizabeth F McCord, Bjorn D Tyreus, Ethel N Jackson & Quinn Zhu
see also p 716

Letters
741 Photoreceptor precursors derived from three-dimensional embryonic stem cell
cultures integrate and mature within adult degenerate retina
Anai Gonzalez-Cordero, Emma L West, Rachael A Pearson, Yanai Duran,
Livia S Carvalho, Colin J Chu, Arifa Naeem, Samuel J I Blackford,
Anastasios Georgiadis, Jorn Lakowski, Mike Hubank, Alexander J Smith,
James W B Bainbridge, Jane C Sowden & Robin R Ali
see also p 712
748 Single-cell gene expression analysis reveals genetic associations masked in
whole-tissue experiments
Quin F Wills, Kenneth J Livak, Alex J Tipping, Tariq Enver, Andrew J Goldson,
Darren W Sexton & Chris Holmes
753 Bispecific antibodies with natural architecture produced by co-culture of bacteria
expressing two distinct half-antibodies
Christoph Spiess, Mark Merchant, Arthur Huang, Zhong Zheng, Nai-Ying Yang,
Jing Peng, Diego Ellerman, Whitney Shatz, Dorothea Reilly, Daniel G Yansura &
Justin M Scheer

Single-cell SNP phenotypes, p 748

RESOURCE
759 Genomic landscapes of Chinese hamster ovary cell lines as revealed by the
Cricetulus griseus draft genome OPEN
Nathan E Lewis, Xin Liu, Yuxiang Li, Harish Nagarajan, George Yerganian,
Edward O’Brien, Aarash Bordbar, Anne M Roth, Jeffrey Rosenbloom, Chao Bian,
Min Xie, Wenbin Chen, Ning Li, Deniz Baycin-Hizal, Haythem Latif, Jochen Forster,
Michael J Betenbaugh, Iman Famili, Xun Xu, Jun Wang & Bernhard O Palsson

Draft genome of the Chinese hamster,
p 759

[Figure: mass spectra, intensity (10e-5) versus mass (amu), for samples without NEM and with NEM; labeled peak masses 72,548.69 and 72,548.73; hinge sequence DKTHTCPPCPAPELLG; annotation: NEM does not react with oxidized hinge cysteines]
Bispecific antibodies made by
bacterial co-culture, p 753

Careers and recruitment
766 Second-quarter biotech job picture
Michael Francisco
768 People

Editorial

Open to interpretation
An international alliance to enable secure sharing of human genomic and clinical data merits both broad support
and financial backing from the global research and clinical communities.


Imagine a world where genomics and clinical data travel seamlessly
between repositories at different institutions around the world; where
harmonized standardized data formats and consent processes enable pooling of sequence data; and where standardized guidelines exist for informing patients and their families of the pathogenic significance of variations
in their genome sequences. Making such a world a reality is the aim of the
Global Alliance, an international initiative that aims to create universal
interoperability standards and guidelines for genomics and medical data.
In recent weeks, the alliance (http://www.broadinstitute.org/news/
globalalliance) published a white paper outlining its draft mission, goals
and core principles. This was accompanied by a letter of intent with >70
signatories from medical, research and advocacy organizations in 41 countries, including such funders as the US National Institutes of Health, the
UK’s Wellcome Trust and Genome Canada. Since that time, another 20
organizations have signed the letter and the alliance has received numerous other expressions of interest.
The initiative is important because it recognizes a major impediment
to human genomics research—the sequestration of data within individual institutions and the difficulty of sharing data securely in the public
domain. Both databases and the literature remain replete with descriptions of sequence variants that are ill-defined or inappropriate, despite
the existence of standardized nomenclatures. These issues, and the lack
of international data formatting and exchange standards, not only make
sharing and pooling of data problematic, but also impede progress in interpreting what human genetic variation means.
The ability to share data is important because it is becoming clear that
sequence data from tens of thousands of people will be required to unravel
the clinical significance of common genetic variation; indeed, millions of
study participants will likely be required to identify interactions between
genes and lifestyle risk factors and to unravel additive, epistatic interactions. One need only look at the recent COGS (Collaborative Oncological
Gene-environment Study) collaboration (http://www.nature.com/icogs/),
which gathered sequence data from over 200,000 people to identify 74 new
susceptibility loci for breast, ovarian and prostate cancer.
Currently, COGS—and studies like it—take years to complete because
no one institution has the resources to gather data on this scale, and
pooling nonstandardized genetic data from different institutions is long
and grueling work. These difficulties also impair progress in rare disease
research, which could benefit from access to sequence data worldwide.
The problem of data silos, if left unattended, is only going to get worse.
More and more sequencing is taking place in academic medical centers,
blurring the line between the research setting and the clinic. Clinical
research requires much more stringent quality criteria and attention to
ethical norms (confidentiality/anonymity of patient information, consent procedures and patient consultation as to the clinical significance
of findings). As a result, much of the data obtained from clinical work
in the past has been off-limits to research. But this doesn’t make much
sense going forward and the Global Alliance emphasizes the importance
of autonomy—that patients should decide whether and how their data are
shared—as potentially tens, perhaps hundreds, of thousands of exomes
are sequenced in the clinic. Even today, diagnostic laboratories have more
clinical information on sequence variants than is present in the literature—
the question remains how to incentivize them to spend the time and effort
in posting variant data in open repositories like ClinVar.
Finally, and perhaps most importantly, the forces of commercial balkanization and annexation provide even greater urgency to efforts to
encourage genome data exchange in the public domain. The past year
has witnessed extensive consolidation in the sequencing and molecular
diagnostics sectors, with a handful of companies now monopolizing the
market. In 2012, Life Technologies acquired direct-to-consumer (DTC)
genomics provider Navigenics for its clinical testing services and 23andMe
consolidated its stranglehold on the DTC market through lowball pricing. Earlier this year, Illumina also bought the diagnostic startup Verinata
and has kept on undercutting competitors through its Illumina Genome
Network for clinical sequencing. At the same time, there has been a boom
in startups, such as GenomeQuest, Knome, Omicia, Personalis, Real Time
Genomics, Strand Life Sciences, SV Bio and SimulConsult, that offer software or services for clinical interpretation of genetic variation.
There’s nothing wrong with this in and of itself, of course. But many of
these businesses are currently being built around in-house databases that
sequester data surrounding genetic variants and claim it as a trade secret.
One need look no further than Myriad Genetics. Since the company
stopped sharing its BRCA1/BRCA2 variant information with the Breast
Cancer Information Core database in 2006, it has been building a secret
database that contains information on >14,300 variants of the two genes.
And despite the recent establishment of the open Sharing Clinical Reports
Project for BRCA1/BRCA2 data (see p. 713), many believe it will take years
to accumulate sufficient data in the public domain to rival the power of
Myriad’s repository.
Another salutary lesson comes from electronic health records (EHRs).
Spurred by the US Health Information Technology for Economic and
Clinical Health Act, the adoption of EHRs has increased from 10% to
~40% in the past four years. The problem is, though, that most commercial
EHR software cannot interface with that of competitors. Why? Because
interoperability standards simply came too late to the conversation.
For all these reasons, it is imperative that the global research and clinical
communities embrace and participate in the Global Alliance, engage in its
working groups and find a means to fund it sustainably. The effort is necessary and it is necessary to act now. Anything less and human genomics
research and clinical translation may be held back for years, even decades.
Let’s make sure it is not only industry that reaps the profits from interpreting our genomic heritage.

news
in this section
Melanoma combination therapies ward off tumor resistance p666
Volunteer GM wheat, mischief or carelessness? p669
SARS-like virus reignites ownership feuds p671

Myriad decision aftershocks ripple through biotech

Is it only human genes that are not patent eligible? The Supreme Court’s decision leaves a host of unsettled
questions that could have implications for the biotech industry.
Credit: Ikon Images / Alamy

The weight of the US Supreme Court’s June
13 decision, that a naturally occurring DNA
segment is a product of nature and not patent eligible merely because it has been isolated, may lie more in what the Court didn’t
say than in what it did (Nat. Biotechnol. 31,
574, 2013). The decision struck down patent
claims by Myriad Genetics, of Salt Lake City,
Utah, on isolated DNA covering the genes
BRCA1 and BRCA2 used in its breast and
ovarian cancer diagnostic test. But in their
ruling, the justices did not provide guidance
about the extent to which a sequence must be
modified to become patent eligible, nor did
they suggest how to view the patentability
of other molecules—naturally occurring or
synthesized—that have a naturally occurring
counterpart. The Court’s reasoning almost
certainly validates the patent eligibility of
highly engineered DNAs such as those coding for humanized or chimeric antibodies,
according to an analysis by attorneys for
Foley Hoag in Boston.
The decision may create difficulties for natural products, agbiotech and generally for the
patenting of isolated proteins. Indirectly, the
circumstances of the case also highlight how
biotech monopolies limit access to clinical
data, which stifles innovation and reduces the
transparency of decision making predicated
on complex molecular tests (Box 1).
“The Court was being deliberately narrow,”
says patent attorney Hathaway Russell of Foley
Hoag. The justices acknowledged that isolated
DNA molecules do not exist in nature because
they have been removed from a chromosome,
and therefore from their natural context. But
they also concluded that those changes were
minimal. cDNA, on the other hand, is patentable in the Court’s view not because it is made
by a laboratory technician but because it is
manifestly different from the natural gene.
“That’s one reading of the case,” says
Hans Sauer of the Biotechnology Industry
Organization (BIO) in Washington, DC. But a
broad interpretation could extend the Court’s
logic to other molecules, he says. Indeed, its
reasoning could translate pretty directly to
molecules such as the anti-cancer drug Taxol
(paclitaxel), which is isolated from the bark of
the Pacific yew tree, or an antibiotic. In fact,
an antibiotic molecule or medicinal molecule
is much closer to the natural thing than the
isolated DNA is to genomic DNA, he says.
“Natural products will have bigger problems
with patentability now,” concludes David
Resnick of the Boston-based law firm Nixon
Peabody. The ruling is not good for innovation, Resnick says. “For 30 years, people
have been patenting isolated DNA and naturally occurring proteins like Taxol.” For the
Supreme Court to come along now and change
the rules in the middle of the game “is not
good for anything,” he says. “This is a debate
we should have had 15 years ago.”
Indeed, the arguments heard in Myriad are
being echoed by advocacy groups opposed
to patenting human embryonic stem (hES)
cells (Box 2). On July 2, the public interest groups Consumer Watchdog of Santa
Monica, California (CW), and the New
York-based Public Patent Foundation (a
plaintiff in the Myriad case) asked a federal
appeals court to invalidate a patent on hES
cells held by the Wisconsin Alumni Research
Foundation (WARF) because the claimed in
vitro cell culture, they say, is a product of
nature. CW has been contesting the patent
since it was issued in 2006, claiming WARF’s
assertion of it, which CW says gives WARF
the potential to preempt all uses of hES
cells, has put a burden on taxpayer-funded
research (Nat. Biotechnol. 26, 393, 2008).
The unintended fallout of the Court’s ruling could reach areas that have nothing to do
with human genetics. Although the Myriad case

in brief


UCLA and GSK reconcile
In May, the University of California, Los
Angeles (UCLA) barred its scientists from
taking part in GlaxoSmithKline’s (GSK)
Discovery Fast Track (DFT) competition
just as the University of California, San
Francisco (UCSF) proudly announced an
expansion of its Centers for Therapeutic
Innovation (CTI) drug discovery alliance
with Pfizer of New York. Two months later,
UCLA and GSK reconciled their differences
and UCLA finally sanctioned its scientists
taking part in the program. So why was
one approach seen as good and the other
not? The UCLA tech transfer office was
concerned about the potential for disclosure
of confidential information and conflict
with rights to the researchers’ discoveries,
under the terms of participation in the GSK
program. GSK’s DFT competition builds on
the Discovery Partnerships with Academia
(DPAc) program. Launched in the UK in late
2010, DPAc invites academic partners with
deep understanding of disease biology to
become members of drug discovery teams.
GSK brings its industrial approach to drug
discovery and funds the activities. This is
similar to Pfizer’s CTI, which also involves
joint discovery (Nat. Biotechnol. 29, 3–4,
2011) and access to the pharma partner’s
expertise, compound libraries and biological
assays. Pearl Huang, global head of DPAc,
at King of Prussia, Pennsylvania, explains
that collaborations through DPAc involve
complicated contract negotiations that can
take time to put in place. As a way to avoid
this bottleneck, DFT was conceived “as a
means to rapidly identify the most promising
hypotheses in academia,” she says. It is,
in other words, a giant fishing trip in which
academics are invited to submit a one-page application describing a novel drug
development concept. Ten winners will get
access to GSK’s screening facilities, and
academics whose screens are successful will
be offered DPAc contracts. Because UCLA
requires that researchers refer potential
inventions to its Office of Intellectual
Property and Industry Sponsored Research
before discussing them with companies,
entering GSK’s DFT could easily breach the
rules. But Huang says there was no intention
to bypass technology transfer offices. Under
the amended terms, tech transfer offices will
electronically monitor submissions to ensure
nothing confidential is disclosed to pharma.
“The main thing is to be good at
balancing,” says Susan Searle, formerly
CEO of Imperial Innovations, the technology
commercialization and investment group.
“It’s about getting the partnership right and
sharing rewards,” she says.
Nuala Moran


was “served up with a narrative about human
genes and diagnostics, there is nothing in
the decision that limits it to only nucleic
acids with human sequences,” Sauer says.
Relatively few gene patents are owned by
diagnostic services companies like Myriad,
he points out. “The majority don’t have anything to do with human genes at all. That’s
something that troubles us that the Supreme
Court didn’t acknowledge.”
Agbiotech companies may feel the impact
most of all. A recent study on the changing
landscape of gene patent ownership in the
US as of October 2012 lists Dupont/Pioneer
Hi-Bred of Johnston, Iowa, as the largest
holder of such patents, both in terms of number and number in force (Nat. Biotechnol. 31,
404, 2013).
In agbiotech, DNA isolated from nature
is still essential in the R&D process, says
Dominic Muyldermans, senior legal consultant for CropLife International in Brussels.
“It confers a lot of useful and desired new
plant products,” he says.
Although companies that work in agriculture and bioenergy no longer patent every

Box 1 Cracking data monopolies
Myriad’s defeat in the Supreme Court has already spurred several laboratories to
launch their own BRCA testing: Ambry Genetics of Aliso Viejo, California; GeneDx of
Gaithersburg, Maryland; Pathway Genomics of San Diego; the University of Washington
in Seattle; Gene by Gene in Houston; and Quest Laboratories of Madison, New Jersey.
In response, in July, Myriad filed infringement suits against at least two of these rival
companies, Ambry and Gene by Gene.
Another strike at the patent monopoly held by Myriad’s $3,000 BRACAnalysis
test comes from outside the commercial setting. In April, Robert Nussbaum of
the University of California, San Francisco, unveiled a Web site, http://www.
sharingclinicalreports.org/, designed to collect information on BRCA variants and
make this information publicly available in the ClinVar database of the National
Center for Biotechnology Information. Already, 6,000 reports have come in,
Nussbaum says.
BRCA test results from any laboratory, including Myriad, are included in the public
database. Its supporters argue that such transparency is necessary to understand, for
example, the sequence calls Myriad makes to identify BRCA variants and the clinical
actions it advises on that basis.
“The lack of access to the data was a hidden issue behind the gene patent,” says
Nussbaum. “Only by sharing data can others apply informatics tools and studies on
Myriad’s calls and what other people might be thinking, to figure out where Myriad
might be wrong,” says Peter Kolchinsky of RA Capital in Boston. Because patents do
not assure disclosure of data, “merely saying you can sequence BRCA genes doesn’t
enable the public to do anything,” he says. “The threat of competition encourages
innovation.”
At one time, Myriad contributed data on BRCA mutations to the Breast Cancer
Information Core, an open access online mutation database for breast cancer
susceptibility genes. It stopped the practice “as the information was supposed to be
for research use only and the database was not validated for providing test results to
patients in a clinical setting, which posed regulatory and quality system concerns,” a
company spokesperson said in an e-mail.
Nussbaum calls the Myriad statement facetious. “If these are clinical reports that
they are standing by and sending to their doctors, and serious decisions are going to
be made, of course they are going to be used for clinical purposes. What they are
trying to say is they are trying to own the use of these data for clinical purposes. It’s
not that they think it’s being misused for clinical purposes, it’s being used by people
who are not paying them.”
Among those praising the Supreme Court decision was US Congresswoman Debbie
Wasserman-Schultz, who tested positive for a BRCA2 mutation. Because of Myriad’s
patents, “I was unable to get a second opinion on the test,” she said in a statement
following the decision. Wasserman-Schultz shepherded legislation requiring the
US Patent and Trademark Office to conduct a study on ways to remove barriers for
patients to get access to second opinions on genetic testing, the results of which
should be released this summer.
MR


Box 2 The Myriad soap opera
The case against Myriad was driven in part by an antimonopoly sentiment, fueled by
plaintiffs who were not happy that Myriad Genetics was charging a lot of money and
that some people weren’t covered, says David Resnick of the law firm Nixon Peabody.
Those emotions ran alongside the feeling that, fundamentally, parts of the human body
should not be covered by patents. “Essentially it was like a soap opera,” he says.
Myriad was an atypical case. Normally in a patent suit, Myriad would have sued
an infringer and the other party would have explained what they’d done, how they’d
extracted primers, made amplicons using PCR, or whatever, says Hans Sauer of BIO,
so the judge would have had a good understanding of what had happened. But this
was a declaratory judgment case where the plaintiffs claimed that Myriad’s BRCA
patents prevented people from examining, studying, testing and researching genes.
“It was a relatively abstract proposition and invited a relatively abstract answer,” he
says.
Many observers felt the Court oversimplified the issues before it. In the oral
argument, the questions the Justices put to the attorneys sought to move the
discussion away from chromosomes and complex biological materials by analogizing
DNA to the ingredients of a chocolate chip cookie.
“I found it frightening,” Resnick says. Whenever the attorneys from Myriad tried to
talk about the science “they were turned back,” he says. “Absolutely ridiculous. The
science is hard but it is not that hard.”
With a better grasp of the science, the Supreme Court “might have been able to
write a less ambiguous decision,” adds Sauer. “But I think at the end of the day the
outcome probably would have been the same because the outcome the Court had was
the outcome that they probably wanted.”
MR

trait they uncover as a protective measure
against potential competition, gene patenting is still relevant to enable the development
of new traits, creating benefits to farmers,
Muyldermans says.
In practice, a 2012 Supreme Court decision
in Mayo Collaborative Services v. Prometheus
Laboratories had a much broader effect on
diagnostics by raising the bar for obtaining patent method claims (Nat. Biotechnol. 30, 373–
374, 2012). But taken together, the Prometheus
and Myriad decisions make the US one of the
most restrictive jurisdictions. “Now in the US
your diagnostic methods are limited, and your
ability to claim naturally occurring molecules
is limited as well,” says Resnick. “We used to say
Europe was a jurisdiction that gave you limited
patent protection. [But] you can patent all this
stuff in Europe.”

Moving forward, Sauer thinks there will be
patent protection for most essential claims in
the industry. “Composition claims for DNA
are still possible,” he says. “That’s one thing I
think is really important.”
“The Court is saying you have protection
but it will be limited,” says Resnick. If DNA
is in a vector, for example, the bases could be
modified and the DNA made more stable.
“That does not exist in nature,” he says. A
transformed host cell also could be claimed.
“What concerns me are the unintended consequences with respect to isolated proteins,” he
says. “Our goal is to always make something
as close to human as possible. But now, the
closer you get, the greater the chances are it’s
not going to be patent eligible.”
Mark Ratner Cambridge, Massachusetts

BARDA to pick and choose
next-generation antibiotics
In May, the Biomedical Advanced Research
and Development Authority (BARDA), the US
government agency charged with developing
countermeasures to bioterrorist threats,
struck a novel type of collaboration deal with
GlaxoSmithKline (GSK) to develop several
antibacterial agents. The agreement gives
the London-based pharma $40 million over
the first 18 months and up to $200 million
over five years, under a new type of flexible
structure. Instead of focusing on a single
medical countermeasure, BARDA can shift
funds around GSK’s antibacterial portfolio.
The “portfolio approach” is a more efficient
way to partner with the company, says BARDA
director Robin Robinson, of Washington, DC.
With the new partnership, “if one or more
drugs do not meet our requirements we will
replace them with others in the GSK pipeline,”
says Robinson. The partnership, funded by
BARDA’s Broad Spectrum Antimicrobial
Program, allows the government agency to
decide which drug candidates to include in the
portfolio. GSK will conduct the preclinical and
clinical studies to develop antibacterials for
bioterrorism indications such as anthrax, plague
and tularemia as well as address antibiotic
resistance—part of BARDA’s strategic plan since
2011. One new class of antibiotic investigated
under this program is GSK’944 to treat bacterial
infections acquired in hospital and community
settings. “Because of economic and regulatory
barriers, very few pharmaceutical companies
pursue antibiotic R&D,” says Amanda Jezek at
the Infectious Diseases Society of America. “This
type of public-private collaboration is critical to
help leverage government and industry funding
for antibiotic R&D.” Also in May, BARDA signed
a $75.7-million deal with Cempra of Chapel
Hill, North Carolina, to develop solithromycin
(licensed from Optimer Pharmaceuticals, of
Jersey City, New Jersey), a next-generation
fluoroketolide antibiotic in phase 3 trials to treat
community-acquired bacterial pneumonia, and
potentially anthrax and tularemia infections,
in children. BARDA currently has 140 drug
candidates in its pipeline, 80 of which
are directed against chemical, biological,
radiological and nuclear threats. The majority
were developed through partnerships with
companies, says Robinson.
Gunjan Sinha

in their words
“There is this perception
that the key to the next
breakthrough is from
someone finding a gene that
is sitting somewhere and
someone having a eureka
moment. What I learned
is that it doesn’t usually
happen that way.” Brad
Margus, advocate for rare diseases and former CEO
of Perlegen and Envoy Therapeutics, speaking of
the importance of sharing genome data. (New York
Times, 5 June 2013)
“We don’t have a lot of questions on drugs
because they’re slam dunks. It’s not if we’re
going to approve them. It’s how fast we’re going
to approve them.” Richard Pazdur, director of
the FDA’s Office of Oncology and Hematology
Products, on the need for speed in approving
new cancer drugs. (Forbes, 23 June 2013)

“Talk about personal genomics. It doesn’t get any
more personal than trying to figure out what’s
wrong with your own kid.” Gary Schroth, an
R&D director at Illumina in San Diego. Illumina
participated in a nine-year quest by a father for
the cause of his daughter’s undiagnosed malady.
The answer (mutation in transforming growth
factor β-3) came out of exome sequencing.
(Nature, 26 June 2013)

NEWS

in brief
The French Ministry of Health has published the
French Sunshine Act requiring all companies
in the healthcare sector to declare contracts or
gifts worth €10 or more (including tax). The new
law issued on May 21 has been greeted by the
biotech industry with derision and even outright
hostility. The Bertrand law, so-called for former
health minister Xavier Bertrand, came in the
wake of the scandal over the diabetes drug
Mediator (benfluorex), made by the Suresnes-based Servier, alleged to have caused hundreds
of deaths before being withdrawn in November
2009. The new transparency requirements are
aimed at restoring public confidence in France’s
drug regulatory body, by exposing financial ties
between drug firms and doctors or experts.
Contracts must be approved in advance by
professional supervisory bodies for doctors and
pharmacists, and all will ultimately be posted on
a single website. “Implementation will involve
an enormous bureaucratic machine,” says
Renaud Vaillant, CEO of Theravectys, a vaccine
developer located on the outskirts of Paris.
“The intention is good, but no one is happy
with the result,” he adds. And the €10 ($12.8)
threshold “is a big mistake. No expert will be
influenced for little more than the price of a cup
of coffee and a couple of croissants.” Failure
to comply will result in fines of up to €30,000
($38,500). “Controls already exist in France,”
says Judith Greciet, CEO of BioAlliance Pharma
in Paris, and the new rules will represent a hefty
workload for a firm like hers, where about two-thirds of the 50-plus employees are researchers,
and administration is pared to a minimum.
But the move is a sign of the times, as other
European countries are also trying to crack
down on conflicts of interest, notes Alexandre
Regniault, a partner in the Simmons & Simmons
law firm in Paris. The Netherlands, too,
implemented a Central Transparency Register
for the healthcare industry, known as the Dutch
Sunshine Act, in January. In the UK, the Ethical
Standards in Health and Life Sciences Group,
a gathering of 20 healthcare organizations, is
working on the idea, says Regniault. “But the
burden on life sciences companies is already
considerable, and minor variations in the
obligations imposed by different countries is
going to be extremely tiresome. The question
should really be taken up at a European level,”
he adds.
Barbara Casassus

On July 9, GlaxoSmithKline of London filed
an application with the US Food and Drug
Administration (FDA) seeking approval of
BRAF inhibitor Tafinlar (dabrafenib) in combination with MEK inhibitor Mekinist (trametinib) for treating adults with metastatic
melanoma with specific mutations. In May, the
FDA had already approved the individual drugs
as single agents with a genetic test to determine
if the melanoma cells have the BRAF V600E or
V600K mutation. The London-based pharma
has moved quickly to file for approval of the two
small-molecule drugs combined, relying only on
phase 1/2 combination trial results (New Engl.
J. Med. 367, 1694–1703, 2012), not waiting for
results from phase 3 trials already underway.
The rush is understandable, as this is the first
combination trial in any cancer driven by a
mutant oncogene to significantly delay acquired
resistance to a targeted therapy. In an indication
that has seen four drugs approved in the past
two years, GSK and other industry players are
increasingly looking to drug cocktails to gain a
market advantage over rival therapies.
But FDA approval of the twinned BRAF- and
MEK-targeting drugs is not assured. Oncologist
Paul Chapman, at the Memorial Sloan-Kettering Cancer Center in New York, doubts
the agency will approve the combination based
on current data. “I think the data in the New
England Journal paper was fairly unimpressive,”
Chapman says, noting that the combination
improved median progression-free survival by
only 3.6 months. “The combination was associated with significant toxicity, and also astounding expense—because you know [the treatment]
is going to be very expensive.” Mekinist will cost
$8,700 a month and Tafinlar $7,600 a month,
wholesale, individually.
Keith Flaherty, of the Dana-Farber/Harvard
Cancer Center in Boston, Massachusetts, says
the FDA should approve. “The FDA could
give accelerated approval to the combo and
then withdraw that if the phase 3 data did not
corroborate the phase 1/2 data,” he writes in an
email.
Regardless of the outcome, industry is
already moving on (Table 1). Its goal is to
identify and overcome drug resistance to
both targeted therapy and immunotherapy in
individual melanoma patients. For targeted
therapy, doubling down on the RAS-RAF-MEK-ERK signaling pathway, which drives
most melanomas, is the rationale for the GSK
combination and several others in development. “We have to hit that pathway hard at
the beginning with more than one drug,” said
Jeffrey Sosman, of the Vanderbilt-Ingram

in their words
“We could not find the evidence [of corruption]
in their accounts. They used travel agents as a
money platform. But I must make it clear that
among these partners, GSK is the main party
responsible. It is like a criminal organization,
there is always a boss. In this game, GSK is the
godfather.” Gao Feng, head of the economic
crimes investigation unit at the Chinese
Ministry of Public Security, comments on the
3 billion yuan in bribes allegedly dispensed by
GlaxoSmithKline’s operations in China. (The
Telegraph, 15 July 2013)

OJO Images Ltd / Alamy

French scorn Sunshine

Melanoma combination therapies ward off
tumor resistance

Metastatic melanoma cells invade a blood vessel. Targeted therapies and immunotherapy have made
impressive inroads in melanoma, but companies are now turning to combination therapies to circumvent
drug resistance.

news

Table 1 Selected melanoma combination trials

Company sponsor | Agents | Cancer type | Stage
GlaxoSmithKline | Dabrafenib (BRAF inhibitor) and trametinib (MEK inhibitor) | BRAF V600E/K mutant melanoma | Phase 3
Bristol-Myers Squibb | Ipilimumab and nivolumab (anti-PD1 mAb) | Melanoma | Phase 3
Bristol-Myers Squibb | Ipilimumab and Zelboraf (sequentially) | Melanoma | Phase 2
Novartis | MEK162 and AMG 479 (ganitumab) (anti-IGFR mAb) | BRAF-mutated melanoma; colorectal and pancreatic | Phase 2
Novartis | MEK162 and LEE011 (CDK 4/6 inhibitor) | NRAS mutant melanoma |
Novartis | LGX818 (BRAF inhibitor) and MEK162, LEE011, BGJ398 (FGFR inhibitor), BKM120 (PI3K inhibitor) and INC280 (c-Met inhibitor) | BRAF-mutated melanoma; individual combination based on documented molecular resistance mechanism | Phase 2 pending
Novartis | LGX818 and MEK162 | BRAF-dependent solid tumors | Phase 1b/2
Novartis | MEK162 and RAF265 | Solid tumors with RAS or BRAF mutations | Phase 1
Genentech | MPDL3280A (anti-PD-L1 mAb) and Zelboraf | Melanoma | Phase 1b
Genentech | MPDL3280A (anti-PD-L1) and Avastin (bevacizumab) | Solid tumors | Phase 1
Bristol-Myers Squibb, Genentech | Ipilimumab and Avastin | Melanoma | Phase 1

Phase 2

Source: Clinicaltrials.gov

Cancer Center in Nashville, Tennessee, at
the American Society of Clinical Oncology
(ASCO) annual meeting in Chicago, in June.
Inhibiting MEK along with BRAF helps
overcome the upstream RAS mutations and
BRAF splice variants that confer drug resistance in about half of patients. (RAS so far
is undruggable.)
The nature of the emerging resistance
remains unknown, because BRAF, a signal
transduction protein kinase, unlike driver
oncogenes in other tumors, doesn’t mutate
the drug binding site. Growth factors in the
tumor microenvironment seem to be implicated. In July 2012, two groups reported
that growth factor signaling from the tumor
microenvironment drives receptor tyrosine
kinase activation, restoring RAS-ERK signaling and activating the PI3 kinase pathway, and conferring immediate resistance
to BRAF inhibitors (Nature 487, 500–504
and 505–509, 2012). The main culprit in
these studies was hepatocyte growth factor
(HGF), but other growth factors have also
been implicated.
The stage is now set for rationally choosing combinations to overcome drug resistance in individual patients. Novartis in
Basel is preparing to launch an innovative
phase 2 melanoma trial of its new BRAF
inhibitor, LGX818, together with experimental inhibitors of MEK, CDK 4/6, FGFR, PI3K
and c-Met, the receptor for HGF. The drug
combination used for any given patient will
depend on the exact form of that individual’s
molecular resistance to LGX818, determined
from biopsy samples. “It’s a very ambitious
trial design,” says Chapman.
Meanwhile, melanoma immunotherapy
is moving just as fast. Here the resistance
problem is different. Because immunotherapy targets the immune system, not
the tumor, mutations in driver oncogenes
like RAS and BRAF don’t matter. Instead,
in response to the persistent presentation of
tumor-associated antigens, T cells upregulate
negative checkpoint receptors that dampen
T-cell activity and allow tumor cells to escape
immune cell killing. Such T-cell “exhaustion” appears to be a fundamental mechanism of tumor resistance to immunotherapy.
Two such exhaustion-inducing negative
checkpoint receptors are CTLA4 (cytotoxic
T-lymphocyte-associated antigen 4) and
PD-1 (programmed cell death protein-1).
The FDA approved the anti-CTLA4 antibody Yervoy (ipilimumab), from Bristol-Myers Squibb in New York, for melanoma in
March 2011 (Nat. Biotechnol. 29, 75, 2011),
and the pharma’s anti-PD-1 monoclonal
antibody nivolumab has shown impressive
single-agent activity (Nat. Biotechnol. 30,
729–730, 2012). Yervoy and nivolumab
produce objective response rates of 11% and
31%, respectively, including some long-term
responses and even apparent cures. (One
phase 1 trial patient has been tumor free for
almost 12 years.) Researchers are struggling
to understand why some people respond but
the majority do not.
One possible reason is the functional
redundancy of CTLA4 and PD-1. The two
molecules work differently to disable T cells.
“PD-1 seems to interfere with the wiring of

in brief
Dengue clinches Takeda
deal
Takeda Pharmaceutical of Osaka, Japan,
has acquired vaccine developer Inviragen, a
biotech company specializing in innovative
vaccines for emerging infectious diseases.
Under the May 7 deal, Takeda will pay
$35 million upfront and up to $215 million
in clinical and commercial milestones to
the Fort Collins, Colorado–based biotech.
Takeda was attracted to Inviragen’s vaccine
expertise, and primarily its lead candidate
DENVax, a tetravalent recombinant dengue
fever vaccine in phase 2 testing, according to
Rajeev Venkayya, executive vice president and
head of Takeda’s vaccine business division.
The acquisition brings recombinant vaccine
technology and additional live-attenuated
vaccine capabilities to the big pharma. Last
October Takeda also acquired Bozeman,
Montana–based LigoCyte Pharmaceuticals for
its virus-like particle and inactivated vaccine
technologies, as well as a phase 1/2 vaccine
candidate to prevent norovirus gastroenteritis.
In addition to innovative technologies, each
acquisition provides a distinct geographical
reach, Venkayya explains. The norovirus
vaccine could initially target populations
in more developed countries, whereas
the Inviragen candidates “would be more
immediately beneficial to middle- and lower-income developing countries,” Venkayya says.
Viren Mehta, founder and managing partner of
Mehta Partners, a strategic and institutional
advisory firm based in New York, says Takeda
is taking a wise step towards the global
vaccines market. More than half the world’s
population is at risk of becoming infected by
the mosquito-borne dengue viruses, which are
endemic throughout subtropical and tropical
regions and cause more than 20,000 deaths
each year. “Ten years ago, nobody wanted to
hear about how developing markets would
drive growth similar to the developed world,”
Mehta says. “The Inviragen and LigoCyte
acquisitions will be positioning Takeda over
the next several years with the right set of
innovations.”
Aaron Bouchie

in their words
“We were devastated
when we found out about
the diagnosis and locked
ourselves in our house for
three days. We decided
to stop feeling sorry for
ourselves and decided
to fight back against
SMA.” Vincent Gaynor,
whose three-year-old daughter, Sophie, suffers
from spinal muscular atrophy. Gaynor, with the
help of his union of steamfitters, has been
raising money for Sophie’s Cure Foundation.
(Wantagh-Seaford Patch, 10 June 2013)

in brief

Indiana’s game-changing
venture
An industry-driven biosciences research
institute, in partnership with the state’s
research universities, opened on May 30 in
Indiana, focused on commercializing homegrown healthcare innovations. The Indiana
Biosciences Research Institute (IBRI),
supported by the region’s BioCrossroads—
an initiative to strengthen life sciences
businesses across the state—will collaborate
with Indiana University, Purdue University
and the University of Notre Dame. On its
board of directors sit industry players,
including Indianapolis-based Eli Lilly,
Dow AgroSciences and Roche Diagnostics,
Warsaw-based Biomet Biologics, and Cook
of Bloomington. As for funding, the state
of Indiana has appropriated $25 million of
the $50 million biennium startup costs, and
Lilly Endowment, an Indianapolis-based
private philanthropic foundation, donated
$10 million in June. The remainder is being
sought from corporate and philanthropic
sources, to reach an estimated $360 million.
Industry is anticipated to provide up to a
third of the annual operating budget through
sponsored research, a level unprecedented
anywhere else in the country, according
to David Johnson, BioCrossroads’ CEO.
Additional operating cash flow will come
from IBRI endowment proceeds and federally
funded research. In Indiana, the diverse
life sciences cluster and university assets
are unlike those in any other state, creating
opportunities for collaboration towards
common scientific discoveries, according
to Darren Carroll, vice president for Eli
Lilly’s corporate business development, and
also on IBRI’s board. IBRI will focus on
cardiovascular disease, diabetes, obesity and
nutrition. Details about how discoveries will
be taken forward remain sketchy, although
Carroll says each will be handled on a
case-by-case basis, and intellectual property
policies have yet to be finalized. Emma Dorey

in their words
“This is more of Myriad’s
arrogant monopolist
behavior that is so
harmful to patients,
science and health-care
costs.” Dan Ravicher,
president of the New
York-based Public Patent
Foundation, comments
on the filing by Myriad of two suits for patent
infringement against companies offering
BRCA1/2 testing. (liveMint, 10 July 2013)

the ignition switch, if you will, rather than… putting the brakes on,” says immunologist James Allison of the MD Anderson Cancer Center in Houston. So combining the two inhibiting antibodies is a rational approach, despite the theoretical risk of unleashing a massive autoimmune response. In preclinical models, “the combination of CTLA4 blockade and PD-1 appears to be synergistic in rescuing those exhausted cells,” says Nils Lonberg, senior vice president of biologics discovery at Bristol-Myers Squibb. Nivolumab activates the exhausted cells, Lonberg explains, but Yervoy plays a key role, too. “If you have a T cell that is exhausted and is expressing both CTLA4 and PD-1, you have to overcome both,” he says. Yervoy not only removes the brake on T-cell activation, but also kills regulatory T cells in the tumor microenvironment, further boosting the antitumor response (J. Exp. Med., in press).
Whatever the mechanism, combining Yervoy and nivolumab in the clinic was spectacular. Given concurrently in the phase 1 trial, the combination produced an objective response rate of 53% at the optimal dose level (New Engl. J. Med., 369, 122–133, 2013). All nine responders at the second dose level had more than 80% tumor shrinkage, including three complete responses, a rare occurrence for either drug given alone. “The rate of [antitumor] activity was incredible,” says trial co-lead investigator Mario Sznol, of Yale University Medical School in New Haven, Connecticut.
Sznol cautions that patient numbers are small and that a phase 3 trial, already underway, is needed to confirm that the combination is superior to nivolumab alone. Fortunately, “synergy in therapeutic benefit… was not accompanied by a synergistic increase in toxicity,” says Allison. The drug combination produced more toxicity than either drug alone, but the side effects were manageable and generally reversible.
With combination therapy making clear inroads on drug resistance to both immunotherapy and targeted therapy in melanoma, an obvious step is to combine both approaches. Bristol-Myers Squibb and Genentech in South San Francisco, California, jointly sponsored such a trial, combining Yervoy with the targeted therapy Zelboraf (vemurafenib), an FDA-approved, small-molecule BRAF inhibitor from Daiichi Sankyo in Tokyo. BRAF inhibitors like Tafinlar and Zelboraf quickly shrink BRAF-mutant tumors, but responses are rarely complete or durable. Yervoy is the opposite: relatively few responses, which tend to be late, deep and lasting. The combination promised the best of both worlds. At the cellular level, rapid tumor cell killing by Zelboraf should release dozens of tumor “neoantigens” generated by genomic instability, and thus provoke a strong, multipronged immune response. “The targeted therapies make really good vaccines,” says Allison.
But the phase 1/2 clinical trial of concurrent Zelboraf and Yervoy ended earlier this year when six out of the first ten patients developed grade 3 elevations in aminotransferase levels, indicating possible liver damage. (The symptoms were reversible.) A phase 2 trial of the drugs, given sequentially, continues to accrue patients. Although waiting between treatments may help minimize toxicity, Allison stresses that Yervoy must be given very soon after targeted therapy, before the vaccine effect disappears. “Timing has got to be really, really quick,” he says. “Dendritic cells don’t hang around for very long, after they get loaded with antigen. So you’ve got to give the immunotherapy while they’re still there.”
Sznol is confident that the issues can be solved. “As a field we’ll work out dose and schedule issues, and be able to get those combinations forward,” he says.
Given the impossibility of testing more than a fraction of the possible combinations in clinical trials, continued progress rides on individualized approaches like Novartis’s pending LGX818 trial, and similar efforts in the immunotherapy realm. For example, researchers are testing patient tumor expression of PD-L1, the ligand for PD-1, to predict response to anti-PD-1 antibodies, but results so far have been inconclusive. At this stage, “there are more questions than answers,” said Walter Urba, of Providence Portland Medical Center in Portland, Oregon, at the ASCO meeting. As the current clinical trials unfold, that may soon change.
Ken Garber Ann Arbor, Michigan


Local scientists and US Department of
Agriculture (USDA) officials in May reported
finding genetically modified (GM) soft white
wheat growing as weeds on an Oregon farm.
Uncovering Roundup Ready wheat growing
where it should not is proving puzzling for
industry, agricultural and university experts
as well as activists and investigators, particularly because this GM crop was never given the
commercial go-ahead. Although some express
mere perplexity, others make more pointed
allegations of sabotage and eco-terrorism. In
any case, wheat farmers faced more immediate worries when both Japanese and South
Korean buyers suspended plans to purchase
US-grown soft wheat varieties that resemble
the GM-contaminated variety, despite its
being found only in minuscule amounts on a
single farm. In addition, lawsuits are being filed
against Monsanto of St. Louis, the ultimate if
indirect source of the GM wheat. More broadly,
this incident has dealt another setback to the
commercial development of GM wheat of any
kind, largely on hold for the past few years (Nat.
Biotechnol. 27, 974–976, 2009).
This episode began when an Oregon farmer,
whose identity and farm’s location are carefully
kept secrets, noticed that at the time of spraying

Volunteer GM wheat, mischief or
carelessness?

Gregory Bergman / Alamy

The wheat incident in Oregon has dealt another
setback to the commercial development of
genetically-modified wheat.

a fallow field to clear weeds in preparation for
planting, ‘volunteer’ wheat plants (not planted
by the farmer but that sprout spontaneously
among other weeds) did not die when treated
with the herbicide glyphosate, which Monsanto
markets as Roundup; crops engineered to tolerate it carry the Roundup Ready (RR) trait. The farmer
had planted two kinds of soft wheat the previous growing season, neither of which was RR
tolerant as such wheat seed is not commercially
available.
The farmer sent samples of those herbicide-resistant volunteer plants to scientists at Oregon
State University (OSU) in Corvallis for analysis. “My first reaction was ‘no, it can’t be,’ but
the plants tested positive as transgenic for the
CP4 gene [cp4 epsps], which is present in all RR
crops,” says OSU weed science professor Carol
Mallory-Smith, the first to analyze those samples. “How that trait got there, I have no idea,
and it is still a mystery.”
OSU scientists in turn contacted USDA,
where further analysis became the responsibility of officials in the Animal and Plant Health
Inspection Service (APHIS). Their investigation of the GM wheat from Oregon, which
began early in May, is “active and ongoing,” and
will continue until “we run all the leads down,”
says Brian Mabry of USDA. Although spare on
details, APHIS officials confirm the presence of
the RR trait in wheat samples from that single
farm in Oregon but no others. They also confirm that this trait corresponds to cp4 epsps that
Monsanto tested in USDA-authorized field trials
of several GM wheat varieties in 16 states from
1998 to 2005. Furthermore, the GM wheat from
Oregon poses no food safety concerns, and “all
information collected so far shows no indication of the presence of GM wheat in commerce,”
they say. Although USDA sanctioned numerous
GM wheat field trials in past years, no such GM
variety was ever “deregulated,” that is, approved
for commercial production.
Soon after this information was made public,
Japanese and South Korean wheat buyers said
that they would postpone purchasing US soft
wheat from Oregon, Washington and Idaho,
whereas officials from Taiwan said they would
not buy US soft wheat from Oregon, according
to Blake Rowe who heads the Oregon Wheat
Grower’s League in Pendleton. Meanwhile, two
national wheat organizations, the US Wheat
Associates (USW) and the National Association
of Wheat Growers (NAWG) took steps to shore
up “the trust we’ve earned with our customers
at home and around the world.”
Without doubt, wheat growers are watching these matters closely. “Our crops in

in brief
Paper firm to improve poor
farmers’ crops
A Brazilian forestry company will be sharing
yield-enhancement traits used in woody
crops with a nonprofit research institute
to improve the resilience of staple crops
grown by small farmers in arid and semiarid regions of Asia, Africa and Brazil. On
May 29, Sao Paulo-based FuturaGene, a
wholly owned subsidiary of forestry and
paper company Suzano Pulp and Paper,
based in Sao Paulo, signed an agreement
with the Donald Danforth Plant Science
Center, a large nonprofit research institute
based in St. Louis. The Danforth Center will
use the agbiotech company’s technology,
already tested in genetically modified (GM)
eucalyptus and poplar, to boost plant biomass
levels, improve crop adaptation to climate
change and facilitate processing for animal
feed in strategic crops. The technology
hinges on the endo-β-1,4-glucanase CEL1
gene isolated from Arabidopsis thaliana,
which encodes an enzyme implicated in
cell wall metabolism. Expressing this gene
relaxes the crystalline matrix of the plant cell
wall facilitating cell expansion. Eucalyptus
variants overexpressing this glucanase gene
are currently in regulatory trials in Brazil.
The Danforth Center expects to introduce
the transgene into the model grass setaria
and, if successful, to millet, sorghum and
cassava. “We can envisage applications
to increase biomass accumulation or to
reduce crop-cycle duration,” says Mike May,
FuturaGene vice president of public affairs.
This alliance “is an example to follow on
what is possible when the public and private
sector break down the barriers and join forces
towards putting advanced technologies in the
hands of resource-poor farmers,” says Marc
Van Montagu, chairman of the FuturaGene
scientific advisory board and recent winner
of the 2013 World Food Prize. “I believe that
Brazil has a powerful role to play in guiding
and training other countries, particularly in
Anna Meldolesi
Africa,” he says 

in their words
“Activists have been
working on this [GM food]
labeling issue for a long
time because they see
it as a way to influence
industry behavior. And
they haven’t had a lot of
success in the United
States otherwise.”
Rachel A. Schurman, a sociology professor
at the University of Minnesota, comments on
the expected focus on GM foods in trade talks
between the US and the EU. (The New York
Times, 10 July 2013)

669

Montana are not affected,” says one of them,
Ryan McCormick of Kremlin, Montana, who
is the current president of the Montana Grain
Growers Association, an affiliate of NAWG.
Although these organizations support GM
developments, in terms of their application
to wheat, growers generally prefer to “wait for
the market to accept” such products and only
after they “go through a regulatory assessment
before being released,” he says. “Eighty percent
of Montana wheat is exported, and those markets are key to us.”
Meanwhile, Ernest Barnes, a wheat farmer
near Elkhart, Kansas, early in June filed a lawsuit against Monsanto in the US District Court
in Kansas, claiming he “lost money and his
livelihood is now at serious risk as a result of
Monsanto’s negligence or gross negligence,”
leading to the threat of foreign buyers cancelling their orders. Barnes seeks “a sum in excess
of $100,000” as well as costs. A few days later,
activist Andrew Kimbrell of the Center for Food
Safety in Washington, DC, joined with a group
of wheat farmers in Washington state to file a
class action lawsuit against Monsanto, seeking
damages as well as “relief and forcing Monsanto
[to] take measures to clean up the contamination and ensure it never happens again.”
“You have to wonder how the contamination occurred and in only one field,” says Doug
Gurian-Sherman of the Union of Concerned
Scientists in Washington, DC. Moreover, that
farm did not partake in field trials involving GM
wheat and is said not to be near fields that were
used for such trials, which ended years ago. “It’s
really puzzling,” he says. “The huge number of
unanswered questions defies entropy.”
However, Robert Fraley, chief technology
officer at Monsanto, some wheat farmers and
a few industry consultants say that sabotage
or eco-terrorism might explain this mystery.
“The fact pattern and the agronomic data that
we know about planting, harvesting and volunteer management indicates the strong possibility that someone intentionally introduced
wheat seed containing the glyphosate tolerance
[trait],” Fraley says. “Our testing for the original RR wheat technology in Oregon ended 12
years ago, [and] the program for closing out
[that] program was rigorous, documented and
audited. Seed from the field research programs
was destroyed or shipped to a USDA/ARS
[Agricultural Research Service] Colorado facility or to Monsanto in St. Louis.”
“I personally believe this was eco-terrorism,
but not so much sinister as borne of passion and
frustration,” says one wheat farmer, noting his
conclusion comes from a “gut feeling…predicated on my understanding of farming practices
and how seed is grown and distributed.” Other
farmers “are being very careful not to speculate
publicly,” he says, adding, “Yes, it is possible this
was carelessness. But it looks like mischief to me.
I doubt this will be definitively solved.”
“Sabotage is possible, but it’s unusual for it
[the GM wheat] to be only in one place,” says
geneticist and wheat specialist Peggy G. Lemaux
of the University of California, Berkeley.
Another unexplained oddity is that the GM
volunteer wheat was a winter wheat variety,
whereas Monsanto’s GM field-tests in Oregon
were planted with a spring wheat variety. “That’s
weird,” she says. “Where did it [the GM wheat]
come from?”
“I don’t buy the conspiracy theories,” says
Mallory-Smith of OSU. “I don’t buy that someone held onto seeds since Monsanto shut down
its RR wheat program. There were lots of field
trials and plenty of opportunities for seeds to be
mixed with batches meant for seed production.
Conspiracy theories throw off reporters and
make no more sense than any other scenarios.”
Conspiracy theories notwithstanding,
Lemaux sees this episode as a distraction from
“much bigger problems” as well as a setback for
serious efforts to improve wheat crops. “We’ve
done things in wheat that would be extremely
valuable in the face of climate change that can’t
be done with conventional breeding, and they
also increase yields by about 10%,” she says. “But
because they involve GM, no one wants to deal
with it. Our mind-set is in the wrong place.”
Jeffrey L Fox Washington, DC

Around the world in a month
UK
A new government-owned company called
Genomics England is set up to oversee the
government’s effort to sequence 100,000 genomes.
Genomics England will manage a massive database to match
DNA and clinical data, and handle other genome-focused
health projects. The program has already received a
government pledge of £100 million ($149 million), which was
previously provided for as part of the 100K Genome Project.

TURKEY
Three Turkish universities and China’s BGI
sign agreements to advance genomics and
its clinical applications. BGI will collaborate with
Bogazici University on human genetics, plant and
animal genomics, with Acibadem University on
medical research and subsequent clinical applications, and with Cankiri Karatekin University on an
International Olive genome project.

MALAYSIA
Five major plantation
companies sign a Biomass
Joint Venture to produce second-generation biofuel. Agensi Inovasi
Malaysia, the brain behind the project,
has identified about 70 to 120-plus palm
oil mills in Sabah that generate abundant
empty fruit bunches, biomass waste that
can be converted into biomass pellet or
bioethanol. The companies involved are
Teck Guan Group, Bell Group, Genting,
Kelas Wira and Golden Elate.

BRAZIL
The Ministry of Health agrees to
build a new facility at Fundação
Oswaldo Cruz, known as Fiocruz, to produce
the Protalix-Pfizer enzyme replacement
therapy Uplyso (Elelyso; taliglucerase alfa), a
treatment for Gaucher disease. As part of the
licensing agreement, Fiocruz must purchase
$280 million worth of the drug from Protalix.

CHINA
German pharma Boehringer
invests $46 million to set up a
cGMP mammalian cell culture facility for
biopharmaceutical production in China in
partnership with Zhangjiang Biotech &
Pharmaceutical Base Development. The
facility could open in 2016.

SARS-like virus reignites ownership feuds

A coronavirus isolated from a patient in Saudi
Arabia is posing questions on IP ownership.

A new human coronavirus isolated from a
patient in Saudi Arabia is raising questions
over how to handle the intellectual property
(IP) of newly emerging infectious diseases. As
Nature Biotechnology went to press, the World
Health Organization (WHO) had been notified of 81 cases and 45 deaths globally since
September 2012 attributed to the Middle East
respiratory syndrome coronavirus (MERS-CoV). Ali Mohamed Zaki, a microbiologist
at Soliman Fakeeh Hospital in Jeddah, Saudi
Arabia, who isolated the virus from a patient,
has lost his job after announcing the existence
of the virus through a public medium. Saudi
officials accuse him of mailing a virus sample
to a laboratory in The Netherlands without permission. They also claim that patents filed by
the Dutch researchers have delayed the Saudi
health response.
Back in June 2012, Zaki sent samples from
a patient with a severe respiratory infection
to both the Saudi Ministry of Health and the
laboratory of Ron Fouchier of Erasmus Medical
Center in Rotterdam, The Netherlands, for
routine help diagnosing the isolates. The Saudi
Ministry of Health told Zaki that the virus
did not match any pathogens in their screen.
In early September, Fouchier’s lab told Zaki
that the sample was a new kind of coronavirus, similar to the one that causes severe acute respiratory syndrome (SARS), which killed at least 774 people in
2003 and infected over 8,000. On September
15 Zaki posted the news to ProMED, an email
listserv hosted by the International Society for
Infectious Diseases.
Zaki’s employer soon asked for his resignation. Saudi Deputy Minister for Public
Health Ziad Memish wrote in October 2012 to
ProMED that “internal reporting mechanisms
were either intentionally or inadvertently circumvented” and warned that panicked reporting on SARS had harmed Canada’s economy
in 2003. Saudi Arabia hosts several million
religious pilgrims in Mecca each year.
Erasmus researcher Fouchier says that Zaki’s
ProMED post was “a little surprising” but
“defensible” because the Saudi government had
not made an official announcement.
Fouchier’s laboratory, which last year
published a controversial paper describing
mutations in H5N1 avian influenza virus that
allowed it to spread among ferrets by airborne
droplets, meanwhile sequenced the virus and
applied for patents on potential tests, vaccines
and drugs based on the virus genome sequence.
Future diagnostics or treatments could be
“partially dependent on our patents,” Fouchier
says, but “the IP we filed at no point in time has
stood in the way of test development.” By May
2013 at least 20 people had died of the virus.
At the World Health Assembly held in Geneva
that month, Memish said that “patenting of
the virus” was causing diagnostics delays, even
though patent applications are unenforceable
while pending. And as the application listed
investigators from both Erasmus Medical
Center and Soliman Fakeeh Hospital as inventors, it would normally be managed by their
host institutions, Fouchier says, meaning the
Saudis would have had a hand in controlling
the IP until Zaki left.
If the Saudis do not find a way to co-manage
the patent application through Zaki and
Soliman Fakeeh Hospital, they may be able
to claim ownership under the Convention
on Biological Diversity (CBD). In a 2007 outbreak of avian influenza (H5N1), Indonesia
claimed ownership of the virus under the CBD
and demanded fair access to any vaccines that
resulted from sharing virus samples. The WHO
issued a resolution covering influenza samples,
but the resolution does not apply to MERS-CoV, a
coronavirus. So far, Saudi Arabia has not
made a similar demand. In fact, Fouchier says
the Saudis have not asked for help or material,
which he has given free of charge to researchers
in over 100 countries.
In response to Memish’s statement, however, WHO director-general Margaret Chan
called on researchers not to let IP slow healthcare. The WHO has also asked governments
in MERS-CoV host countries to share epidemiology data. So far, fewer people have been
harmed by MERS-CoV than by SARS, Memish
notes. Animal screening and risk profiling are
underway in partnership with the US National
Institutes of Health.
Lucas Laursen Madrid

data page

2Q13—an IPO revival
Walter Yang
Biotech stock indices continued their climb to new heights, posting gains of >20% in 1H13. Industry raised $1.2 billion from 19 IPOs in
2Q13, including listings from Cambridge, Mass.–based innovators
bluebird bio and Epizyme. The last time IPOs pulled in >$1 billion
was 2Q07, when 22 new listings raised $1.4 billion. Together with a
nearly fourfold uptick in follow-on deals, $10.2 billion was raised this
quarter, excluding partnership monies. Venture money was also up, to
$1.9 billion.

Stock market performance
The BioCentury 100 and NASDAQ Biotechnology indices were up 6–9%, whereas
the Dow and S&P 500 were up only 2%.
Chart: index by month for the NASDAQ Biotech, BioCentury 100, NASDAQ, Swiss Market, S&P 500 and Dow Jones indices.

Global biotech initial public offerings
19 companies raised $1.2 billion via IPOs in 2Q13 versus three raising
$163.5 million in 2Q12.

Amount raised in IPOs (millions), by financial quarter
              2Q12    3Q12    4Q12    1Q13    2Q13
Americas      $139    $135    $241    $305    $1,005
Europe        $24     $16     $32     $0      $142
Asia-Pacific  $0      $32     $36     $24     $74

Number of IPOs
              2Q12    3Q12    4Q12    1Q13    2Q13
Americas      2       2       4       5       13
Europe        1       1       3       0       4
Asia-Pacific  0       1       3       1       2

Table indicates number of IPOs. Source: BCIQ: BioCentury Online Intelligence.

Global biotech venture capital investment
Private biotechs raised $1.9 billion in 2Q13, up 2x from 1Q13 and up 38% from
a year ago.

VC amount raised (millions), by financial quarter
              2Q12    3Q12    4Q12    1Q13    2Q13
Americas      $894    $1,370  $1,582  $774    $1,249
Europe        $459    $348    $202    $128    $628
Asia-Pacific  $19     $9      $96     $38     $15

Number of VC investments
              2Q12    3Q12    4Q12    1Q13    2Q13
Americas      63      47      77      62      67
Europe        34      20      24      29      45
Asia-Pacific  5       3       5       5       2

Table indicates number of VC investments and includes rounds where the amount raised was not disclosed.
Source: BCIQ: BioCentury Online Intelligence.

Global biotech industry financing
Excluding partnerships, biotechs raised $10.2 billion in 2Q13, up 42% from $7.2
billion in 2Q12.

Amount raised (billions), by financial quarter
      Partnership  Debt and other  Venture  Follow-on  IPO    PIPEs
2Q12  $5.12        $4.04           $1.37    $1.05      $0.16  $0.53
3Q12  $12.28       $9.20           $1.73    $1.31      $0.18  $0.61
4Q12  $10.08       $2.76           $1.88    $1.81      $0.31  $0.65
1Q13  $8.92        $1.83           $0.94    $2.35      $0.33  $0.62
2Q13  $9.63        $2.23           $1.89    $4.16      $1.22  $0.66

Source: BCIQ: BioCentury Online Intelligence, Burrill & Co.

Notable 2Q13 deals

IPOs
Company (lead underwriters)                                Amount raised  Change in stock    Date
                                                           (millions)     price since offer  completed
PTC Therapeutics (JPMorgan; Credit Suisse)                 $144.4         0%                 20-Jun
Portola Pharmaceuticals (Morgan Stanley; Credit Suisse)    $140.4         69%                22-May
Chimerix (Morgan Stanley; Cowen)                           $117.9         73%                11-Apr
bluebird bio (JPMorgan; BofA Merrill Lynch)                $116.1         47%                18-Jun
Prosensa (JPMorgan; Citigroup)                             $89.7          48%                27-Jun
Epizyme (Citigroup; Cowen; Leerink)                        $88.7          88%                30-May
Receptos (Credit Suisse; Leerink; BMO Capital Markets)     $85.0          42%                8-May
Esperion Therapeutics (Credit Suisse; Citigroup)           $80.5          1%                 26-Jun

Venture capital
Company (lead investors)                                   Amount raised  Round   Date
                                                           (millions)     number  closed
Intrexon (not disclosed)                                   $150           6       1-May
Symphogen (Novo A/S; PKA) [a]                              $53            5       2-May
Trevena (Forest Laboratories)                              $60            3       9-May
Natera (OrbiMed; Harmony Partners)                         $55            5       1-May
Auris Medical (Sofinnova Partners; Sofinnova Ventures)     $51            3       16-Apr
Karyopharm Therapeutics (undisclosed private investor)     $48            2       20-May
Effector Therapeutics (U.S. Venture Partners;              $45            1       20-May
  Abingworth; Novartis Venture Funds; SR One)

Mergers and acquisitions
Target                  Acquirer                  Value [b]     Date
                                                  (millions)    announced
Life Technologies       Thermo Fisher Scientific  $13,600       15-Apr
Pearl Therapeutics      AstraZeneca               $1,150        10-Jun
Aragon Pharmaceuticals  Johnson & Johnson         $1,000        17-Jun
SARcode Bioscience      Shire                     $675          25-May

Licensing/collaboration
Researcher            Investor              Value [b] (millions)
  Deal description

MorphoSys             Celgene               $818
  Global co-development and co-commercialization in Europe
  of MorphoSys’ multiple myeloma anti-GMCSF monoclonal
  antibody (mAb) (MOR202)

CytomX                Pfizer                $635
  Develop and commercialize multiple oncology drugs conjugated
  to probody (mAb with CDRs occluded by disease
  protease–sensitive mask)

MorphoSys             GlaxoSmithKline       $579
  Exclusive, global rights to develop and commercialize
  anti-GMCSF mAb (MOR103) in inflammatory diseases

Seattle Genetics      Bayer                 $520
  Global rights to auristatin-based antibody-drug conjugate
  technology in multiple cancer targets

Forma Therapeutics    Celgene               >$515
  Discover, develop and commercialize drugs that regulate
  protein homeostasis for multiple diseases, with emphasis
  on cancer

Cytokinetics          Astellas              $490
  Develop and commercialize small-molecule skeletal muscle
  activators to treat muscle weakness conditions (plus rights
  to CK-2127107 in non-neuromuscular indications)

Trevena               Forest Laboratories   $460
  Exclusive option on global development and commercialization
  rights of small molecule beta-arrestin stimulator/angiotensin II
  (AT-1 receptor) antagonist in acute heart failure

[a] Series E extension. [b] Values include milestones. Source: BCIQ: BioCentury Online Intelligence.

Walter Yang is Research Director at BioCentury.

news feature

The drug industry leads the US in lobbying spending. What does
all that money get them? Melanie Senior investigates.

Where the action is in Washington lobbying circles.

In June 2013, Oregon became the fifth US state
to introduce legislation requiring physician
notification when pharmacists substitute biologics
with cheaper biosimilars. It was another
small victory for innovator biologics companies
seeking to limit biosimilar uptake, and
another example of the impact of lobbying on
state lawmaking, the latest biosimilar battleground
(Box 1).
The biosimilar war has showcased how lobbying
can influence healthcare policy and legislation.
The recent state-level advocacy mirrors
that of the more powerful American Legislative
Exchange Council (ALEC), a club of big corporations
and state legislators covering (and
shaping) a host of policy areas across all sectors.
ALEC has come under growing scrutiny,
and its membership is shrinking (Box 2). Yet
although lobbying remains a dirty word in
many quarters, it’s a reality for industry in any
democracy. Biotech is no exception. So, beyond
biosimilars, how does advocacy serve the
sector, especially the small, pre-commercial
firms?

big reasons for this: (i) drugs are highly regulated,
and (ii) the government is also the biggest
payer, even in the US. Top company spenders
Eli Lilly, based in Indianapolis, and New
York’s Pfizer, each forked out more than
$10 million in 2012 lobbying Congress
on a host of issues including healthcare
reform, patent laws and Medicare reimbursement,
according to the CRP website
OpenSecrets.org. For larger companies,
most of the reported money goes to professional
lobbying firms in Washington; the rest
goes to the salaries of employees engaged in advocacy.
In biotech, Amgen, based in Thousand Oaks,
California, easily tops the lobbying league in
terms of spending and, thus, influence: it forked
out $9.3 million in 2012 (Table 1). Much of
Amgen’s activities, mediated by more than two

These sums may be a drop in the ocean for
larger firms, but many smaller biotechs, too,
need to make their voices heard among policymakers and regulators, to maximize the chances
of their products reaching the intended customers. This is especially true for those working on
novel technologies, in new therapy areas or in
areas where regulatory standards are unclear.
Politicians can’t prioritize everything. For
these companies, which are often operating
at a loss, lobbying costs can be much higher,
relatively speaking. Their interests are often too
product- or indication-specific to be covered
by the Biotechnology Industry Organization
(BIO), the industry’s Washington, DC–based
trade group, which itself spent
$7.5 million on lobbying activities in
2012, roughly a third of which goes to
pay outside lobbyists.
Current BIO priorities include FDA and
National Institutes of Health funding, protecting
reimbursement (including
Medicare) and ensuring that tax reforms continue to encourage innovation.
Orexigen Therapeutics of La Jolla, California,
a pre-commercial, publicly traded biotech
focused on obesity, invested $486,000 in lobbying
in 2012, up from $400,000 the year before.
Its CEO, Mike Narachi, firmly believes in the
Lobbying basics
The US drug industry spends more than dozen lobbying firms as well as its own team, value of advocacy. Granted, Narachi also hapany other sector on lobbying each year: were related to biosimilar legislation, implemen- pens to be a board member of both BIO and the
$234 million in 2012, according to the Center tation of the Patient Protection and Affordable Pharmaceutical Research and Manufacturers of
for Responsive Politics (CRP), a nonprofit Care Act, US Food and Drug Administration America (PhRMA), which represents researchresearch group in Washington, DC. (Insurers (FDA) funding and reimbursement for dialysis focused companies. And he spent the first 20
years of his career at lobby-leader Amgen. “I
were second, with $152 million.) There are two drugs.
have come to appreciate and understand how
important it is to influence effectively,” he says.
Box 1 David versus Goliath: the US biosimilars act
In early 2011, Orexigen’s obesity drug, naltrexone/bupropion (Contrave) received a comThe FDA is officially open for biosimilar business, but it has yet to receive its first
plete response letter from the FDA calling for a
submission. That’s because the regulatory pathway itself—which eventually emerged as part
preapproval trial for cardiovascular outcomes.
of President Barack Obama’s Patient Protection and Affordable Care Act, signed into law in
But the agency didn’t initially specify what safety
2010—is highly innovator friendly. That, in turn, reflects the David-versus-Goliath nature of
standards the drug had to meet. So, given the
this battle, with multibillion-dollar biotechs Amgen and Genentech on one side (combined
ambiguity, “we started the influence process,”
annual 2012 lobbying spending: $13.8 million) and a far less coordinated patchwork of
says Narachi. The company did so, he elaborates,
companies and organizations on the other (annual 2012 lobbying spending by the Generic
not because “we wanted to challenge FDA’s stanPharmaceutical Association, Teva and Hospira: about $6 million).
dard, but because we wanted clarity on what it
The pathway requires generics companies to reveal their (otherwise confidential)
would take to meet it.”
submission packages to the originators, allowing the latter to prepare patent-infringement
Alamy Stock Photos)

© 2013 Nature America, Inc. All rights reserved.

Spreading biotech dollars around
Washington

claims. It doesn’t guarantee interchangeability and grants 12 years’ data exclusivity to
originator biologics—considerably more than the five years granted under the Hatch-Waxman
Act for small molecules. “Essentially, the innovators got what they wanted. In fact, if they’d
served a road map of their ideal bill, this would have been 90% of it,” notes a spokesperson
from the generics camp.

Table 1 Top spenders of 2012

Biotech company                                      Spending ($ millions)
Amgen                                                9.3
Biogen Idec (Cambridge, Massachusetts)               1.7
Cubist Pharmaceuticals (Lexington, Massachusetts)    1.6
Gilead Sciences                                      1.6
Genzyme                                              1.5

Pharma company                                       Spending ($ millions)
Lilly                                                11.1
Pfizer                                               10.2
Merck (Whitehouse Station, New Jersey)               9.5
Amgen                                                9.3
Sanofi                                               7.7

Source: CRP

Start with the FDA, then go for the Hill
Requesting clarity is fair enough: the company needed to know what goals to meet for
naltrexone/bupropion to have any chance of
reaching patients. So, what, in Narachi’s view
at least, is ‘the influence process’? It involves,
first of all, talking to those close to the issue
at hand, including the head of the FDA and/
or the FDA’s Center for Drug Evaluation and
Research, and perhaps also some of those on
the relevant FDA advisory committee. If no
progress is made there, one may have to go
higher, explains Narachi. That means turning
to Capitol Hill, congressional committees and/
or individual members of congress, and persuading them, in turn, to put pressure on the
FDA. “Savvy industry lobbyists know how crucial it is to target those committees, and their
members, that have jurisdiction over industry
priorities,” notes Celia Wexler, who lobbies for
accountability and transparency at federal agencies as senior Washington representative at the
Union of Concerned Scientists, headquartered
in Cambridge, Massachusetts. For biotech, the
most influential committees are the Senate
Committee on Health, Education, Labor and
Pensions, and the House Energy and Commerce
Committee’s subcommittee on health.
Using Capitol Hill to influence the FDA
is a delicate, if effective, lobbying technique.
It received a near-fatal blow in 2009, when a
knee-repair device manufactured by ReGen
Biologics of Hackensack, New Jersey, that had
been repeatedly rejected by FDA advisers was
approved as a result of pressure from four New
Jersey congressmen whom ReGen had persistently lobbied and donated campaign funds
to. The product, Menaflex (a collagen scaffold
designed to replace damaged cartilage), was
withdrawn two years later owing to adverse
events and long recovery periods. That saga
“chilled the [lobbying] industry,” says James
Ravitz, partner at law and lobbying firm Arent
Fox in Washington, DC. It recovered, although
lobbying activities—and even the word itself—
continue to arouse suspicion. Several companies
were unwilling to comment for this article, and
even BIO was tight-lipped about its lobbying
priorities.
Orexigen’s Narachi points to what he claims
are concrete results from his group’s advocacy
efforts, some of which were in conjunction
with a coalition of other obesity drug developers, academic researchers and patient groups. At
a macro level, the reauthorization in July 2012
of the Prescription Drug User Fee Act (which
secured the FDA’s funding through 2017)
included a paragraph highlighting the need to
clarify the development path of obesity drugs.
No details of that path were provided in the bill,
but “at least we got Congress to say they needed
to do something about it,” says Narachi. More
specifically to Orexigen, the company received
a clear answer about the safety standards its
product had to meet, albeit by following the formal dispute-resolution process with the FDA.
Orexigen expects to file naltrexone/bupropion
for approval in the US and the European Union
(EU) this year.
Thus far, the safety hurdles outlined for naltrexone/bupropion haven’t been expanded into
an updated guidance for obesity drugs. But
Orexigen, along with a cluster of other obesity
drug developers, patient advocacy organizations, academics and professionals, has a bigger, more ambitious and ultimately even more
important lobbying goal: achieving Medicare
reimbursement for obesity drugs. At present,
anorectics are specifically excluded from reimbursement under Medicare Part D. Elderly,
Medicare-eligible patients aren’t the biggest
consumers of obesity drugs, but Medicare’s coverage policy strongly influences the policies of
commercial payers. They do matter.
Changing reimbursement policy requires a
change in law. That means targeting Congress.
But “to influence that [legislative] process, you
have to talk to everyone” dealing with the cost
burden of obesity, says Narachi. That, of course,
includes the Centers for Medicare & Medicaid
Services (CMS). But it also means addressing
key employers, such as the US Army, which
purchases insurance on behalf of its employees.
Granted, this moves away from political lobbying and into the realm of commercialization
and market access. But it’s all part of the effort
to influence the system as a whole.
Two recent developments, announced within
days of each other in June, suggest that this influence peddling is working: the American Medical
Association’s decision to classify obesity as a disease may help persuade payers to cover treatments and encourage the FDA to approve them
faster. And the bipartisan
Treat and Reduce Obesity Act, introduced in the Senate and
House of Representatives on 19 June, would allow
the CMS to cover obesity drugs under Medicare
Part D. For Narachi, this outcome “is an example
of many constituents…using our collective voice
to help influence policy and law changes over a
long period of time.”
Most smaller biotechs can’t afford to invest
in multiyear advocacy efforts. Yet, especially
for companies working on new (and sometimes controversial) technologies such as gene
therapy or stem cells, “it pays to ensure you have
political support from healthcare leaders in
Washington, so they’re receptive to your product
going forward,” says Michael Werner, partner at
law firm Holland and Knight in Washington,
DC, and executive director of the Alliance for
Regenerative Medicine, which promotes patient
access to regenerative medicines. Precisely when
firms should start thinking about lobbying will
depend on their focus area. “If you’re doing
something really controversial, you might want
folk helping you from the start. Others tend to
wait until they have a program in phase 1/2,
and then start thinking about how it might be
approved and reimbursed,” advises Werner.

Box 2 Companies and state lawmakers snub federal government
The American Legislative Exchange Council (ALEC) is a 40-year-old, nonpartisan partnership of conservative state lawmakers and corporations. It claims to support free-market enterprise and limited government, declaring its distaste for the “bloated federal government in Washington, DC.” It boasts an “unmatched record of achieving ground-breaking changes in public policy,” working across a huge range of policy areas including energy, education, public-sector pay and civil liberties.
But in the past couple of years ALEC’s activities have come under fire from civil rights groups and journalists. They accuse the group of driving laws designed to increase corporate profits at the expense of individuals and the environment and of undermining the public’s democratic freedom to shape its country. Several dozen members have left the organization as a result of the negative PR, including, most recently, London-based GlaxoSmithKline. That company was previously a member, along with Pfizer, Bayer (based in Leverkusen, Germany) and the Pharmaceutical Research and Manufacturers of America, of ALEC’s private enterprise advisory council. Amgen left ALEC in August 2012.

Smaller firms’ lobbying dollars tend to be
more focused on a particular project or issue,
which can mean highly variable year-to-year
spending. Isis Pharmaceuticals of Carlsbad,
California, for instance, spent $200,000 in 2011
on lobbying for patent reform and to promote
policies encouraging the use of novel technologies such as its own antisense platform. In 2012,
spending by Isis dropped to just $10,000.
Similarly, lobbying spending by
Organogenesis, a regenerative medicine firm
based in Canton, Massachusetts, peaked in 2011
and 2012 at $200,000 and $150,000, respectively
(from just $60,000 in 2010), as the company’s
novel dentistry product, Gintuit, approached
and emerged from regulatory review. Gintuit
was approved by the FDA in March 2012 as the
first allogeneic cell–based product for therapeutic human use.
The money pit
Isis’s reported 2011 lobbying investment all went
to law firm McDermott, Will and Emery, based
in New York (many lobbyists work at law firms).
Not all biotechs hire external lobbyists, though.
At Orexigen, Narachi says, “we’re the ones
knocking on doors in Congress, visiting agencies
and policymakers, and writing letters, including
to FDA.” His experience and position at BIO and
PhRMA make that more likely, perhaps, but in
any case, “the most effective [advocacy] interactions are when the individuals involved sit down
and share views with persons of influence,” he
argues. Even if they don’t do the actual persuading, however, experts can help companies to
navigate the most appropriate course to achieve
their aims. This may include helping to draft a
letter to the FDA or Congress, or advising on the
process of getting a bill onto Capitol Hill.
Still, for those who can afford them, the best
professional lobbyists “are profoundly good
advocates,” notes one Washington, DC–based
lawyer. “They learn their cases, marshal evidence and provide packages to senators that
convincingly justify their positions.” One high-level Republican lobbyist is reported to have quit
his previous job as a trial lawyer because “it’s
basically the same job, but with better hours.”
Box 3 EU lobbying on the sneak
Lobbying is as much a part of the ecosystem in Brussels, the heart of EU lawmaking, as it is in Washington, DC. The cities have similar numbers of lobbyists (15,000–30,000), although Washington wins outright in terms of the amount of money sloshing around. Firms lobby European commissioners (who propose legislation) or members of the European Parliament (who debate and amend legislation) just as they would members of congress or senators. But comparisons are tricky; drug companies aren’t required, as they are in the US, to disclose lobbying expenditure. Efforts are under way to make European lobbying more transparent: there’s a code of conduct for lobbyists, and the European Parliament and Commission set up a lobbyists’ register two years ago (http://europa.eu/transparencyregister/). All lobbyists must be registered to enter the Parliament or Commission buildings, but the register is voluntary and has been boycotted by some large law firms.
In the UK, a long-promised statutory lobbyists’ register is back on the agenda after another scandal involving a member of parliament who raised a legislative issue in exchange for cash from a special-interest group. The planned register would include only third-party lobbyists, however, and require minimal information.

External lobbyists also offer firms the means
to invest sufficient time in building trust-based
relationships with individual lawmakers, time
that CEOs and top management can’t often
spare. These established relationships make it
easier to capture politicians’ attention regarding specific issues and thus to influence policy
decisions. Individual members of congress
and senators typically deal with thousands
of issues (from malfunctioning traffic lights
in their districts to big-ticket health policy)
and, for the most part, “struggle to get into the
kind of detail which really matters,” notes the
Washington lawyer. That means that when
they’re forced to make a big decision in a specialist field, such as the length of biologic exclusivity guaranteed as part of the laws that set up
a regulatory pathway for biosimilars, many will
end up relying on the companies, brands and
individuals they know and trust. “As a congressman, am I going to trust Pfizer, Amgen or any
other big name, or am I going to trust this guy
representing a [smaller, lesser-known] generics
firm?” illustrates the lawyer.
The biggest, best-established biopharma
brands will probably also make considerable
campaign contributions. “Especially in states
where there’s a big industry presence, many
companies will contribute to the campaigns
of senators or their opponents, depending on
their position,” explains one Washington-based
policy expert. These contributions amount to
issue-focused advocacy; in effect, “you’re lobbying to put a particular person in or out of
office,” explains the expert. There are legal limits
on donations to particular candidates or committees, but not on contributions to ‘superPACs’:
political action committees that do not donate
directly to candidates but can spend unlimited
sums on issue advocacy.
According to the CRP, which also lists campaign contributions, Amgen committed more
than $2.5 million to political action committees and candidates in the 2012 election cycle.
That’s not as much as the $4 million donated
by Pfizer (which the CRP classifies as a ‘heavy
hitter’—one of the top 140 overall donors to
federal elections since 1990), but it dwarfs
sums donated by global generics firms Teva
Pharmaceutical Industries, based in Petah
Tikva, Israel ($334,000), or Hospira, based in
Lake Forest, Illinois ($176,000). Other biotechs
paying considerable sums include Celgene, of
Summit, New Jersey ($417,000), Foster City,
California–based Gilead Sciences ($202,300),
and even Vertex Pharmaceuticals of Cambridge,
Massachusetts ($62,000).
Bottom line
It is tricky, if not impossible, to calculate the return on investment (ROI)
for lobbying. Disclosure rules, plus organizations such as the CRP, have shone a bright light
on the overall sums involved in lobbying in the
US—information that’s not often available to the
same extent in other countries (Box 3). But even
in the US, it remains impossible to link specific
expenditure with specific goals, making this “an
imperfect system at best,” according to Wexler of
the Union of Concerned Scientists.
For firms such as Amgen or Genentech, based
in South San Francisco, California, with billions
of dollars at risk from generics or biosimilars,
the figures involved seem to be worthwhile, even
if they delay unfriendly legislation by just a few
months. For smaller companies, it’s often less
clear cut. “It’s hard to generalize on what lobbying money most effectively buys,” concludes
Ravitz. Success is not guaranteed, no matter
how biotechs spend their money—lobbying or
otherwise.
Yet lobbying is clearly a reality in today’s business. “There’s no question that you can improve
your odds, and mitigate risk, by having good
advocates who represent you before policymakers and help you navigate that world,” says
Werner, who equates lobbying with investor or
public relations in its value to a company. It just
has a different audience: government policymakers instead of investors or the media. Lobbying
also remains much more secretive—not least
because, according to one advocate within a
science-focused nonprofit, “the best lobbyists
are those who leave the fewest fingerprints.”
Melanie Senior, London

BUILDING A BUSINESS

Stock options and beyond
John J Cannon III & Mark Kessel

Throwing some light on the Byzantine rules surrounding stock options and other equity associated with startups.

John J. Cannon III is a partner and Mark Kessel is counsel at Shearman & Sterling LLP, New York, New York, USA.
e-mail: [email protected] or [email protected]

If you decide to found a company, you need to understand how your investment in time, vision, savings, and blood, sweat and tears is likely to be compensated via equity-based remuneration, particularly stock options. You also need to appreciate how to attract, retain and motivate your employees through appropriately designed equity incentives. Similarly, if you are a researcher leaving academia to work in a startup or even an established biotech company, it would behoove you to familiarize yourself with the stock options or other equity awards that you may be offered as part of your compensation package and the reasons why you and your employer may have different preferences regarding the selection of the form of equity award.
In this article, we summarize the advantages and disadvantages of stock options, the important accounting, US tax and other regulatory constraints that you need to appreciate, and possible modifications or alternatives to stock option programs that may be available to you. (Also provided, in Table 1, is a list of some of the not entirely familiar compensation terminology with which you may have to acquaint yourself when you leave academia for a startup.)

Table 1 Common terms relating to stock options and equity

Term: Stock option
Description: A contractual right granted to an employee entitling the recipient to buy shares of stock at a specified price (the ‘exercise’ or ‘strike’ price), typically subject to a vesting schedule whereby the option is not exercisable (and may be forfeited upon termination of employment) until it ‘vests’. Once the option has vested, the holder typically may exercise the option (that is, buy the underlying stock by paying the exercise price) at any time within the remaining term of the option. Options customarily have a total maximum term of five to ten years, but usually remain exercisable for only a limited period of time following termination of employment. If, as of a given time after the grant of the option, the shares subject to the option have a value that is greater than the exercise price (such excess is known as the ‘spread’), the options are referred to as being ‘in the money’; if, on the other hand, the exercise price is greater than the value of the underlying stock, the options are described as ‘out of the money’ or ‘underwater’.

Term: Restricted stock
Description: Shares of stock that are issued initially to an employee but remain subject to potential forfeiture upon termination of employment until they vest; in private companies such shares typically are also subject to transfer restrictions that extend beyond the vesting period and lapse upon a sale or IPO of the company. May also be subject to objective performance-based conditions to vesting: for example, successful clinical trials or the achievement of financial objectives (in which case it may be labeled ‘performance stock’ or a similar designation). Usually granted for no value other than services.

Term: Restricted stock unit (RSU)
Description: A contractual promise to deliver stock on a specified date in the future to an employee, subject to forfeiture upon termination of employment until they vest. RSUs may also be subject to the satisfaction of performance conditions (in which case they may be labeled ‘performance units’, ‘performance share units’ (PSUs) or a similar designation).

Term: Stock appreciation right (SAR)
Description: A contractual right granted to an employee entitling the recipient to receive (either in cash or stock) the positive difference (if any) between the value of stock at the time of grant and the value at the time of exercise, typically subject to a vesting schedule whereby the SAR is not exercisable (and may be forfeited upon termination of employment) until it vests. An SAR is effectively the economic equivalent of a stock option, but it does not require the payment of an exercise price; instead, the analog of the exercise price is simply deducted from the value of the shares at the time of exercise of the SAR to determine the amount payable.

Term: Phantom share
Description: A contractual promise to pay the cash value of a share of stock on a specified date in the future to an employee, subject to forfeiture upon termination of employment until they vest. In effect, a cash-settled RSU. Can be made subject to performance conditions as well.

Term: Non-qualified deferred compensation
Description: Various compensation arrangements in which payment is made in a taxable year later than the year in which the relevant services were performed by an employee. Frequently, non-qualified deferred compensation (NQDC) takes the form of supplemental executive pensions (SERPs) and deferred compensation plans pursuant to which employees may elect to receive part of their salaries or bonuses in future years rather than currently. These arrangements are labeled ‘non-qualified’ to distinguish them from ‘qualified’ plans, which include 401(k) plans and traditional pension plans. RSUs, SARs and phantom shares all are forms of non-qualified deferred compensation.

The basics
Over the past quarter century (at least until fairly recently) stock options have been the equity award of choice for US companies, particularly in the biotech space. Stock options provide employees the ability to participate in equity appreciation without an up-front investment of money and with control over the timing of recognition of taxable income. As a founder or other employee of a biotech startup, you may correctly consider these features attractive. That being said, it is important that you understand that your ability to ‘cash out’ of these options is dependent on whether your company can find a suitable exit through the sale of the company or an initial public offering (IPO) of its securities, events sadly not as common as they once were.
Even if your company does manage to complete an IPO, you may not be able to divest all your holdings or otherwise monetize your options. Moreover, there is no guarantee that all (or even any) of your options will be ‘in the money’ (that is, with an exercise price below the price of the underlying stock) at the time of an IPO or will remain in the money after the IPO. This is perhaps the greatest shortcoming of stock options in an industry where equity values can be so volatile and unpredictable.
If you are a founder, and wearing your employer hat, you also need to recognize that the tax and accounting treatment of stock options is more complex and less favorable than it once was. In short, stock options may not be the perfect incentive device that they often have been touted as representing.

Illustration: For any member of a startup, understanding the accounting, US tax and other regulatory constraints associated with stock option programs and their alternatives is paramount to maximize the financial reward. (James Endicott © Images.com/Corbis)

Differences between stock options
Stock options may be either non-qualified or tax qualified, with the latter being governed by Section 422 of the Internal Revenue Code and labeled ‘incentive stock options’ (ISOs). Whether one is considering non-qualified or tax-qualified options, and despite changes in accounting rules, described below, that have removed certain extra incentives for the use of options, the ability of an option holder to delay income recognition (and the related need for liquidity to pay associated taxes) until exercise or later continues to make these financial instruments very attractive to founders and other employees of nascent biotech companies.
A non-qualified stock option that is not subject to Section 409A of the US tax code (see below) generally is not taxable upon grant or vesting; instead, ordinary income tax is due upon exercise of the option based on the positive difference between the exercise price and the fair market value of the stock underlying the exercised option.
In contrast, provided that various conditions are satisfied, most notably an exercise price no lower than fair market value as of the date of grant (higher for certain substantial shareholder employees) and holding period requirements, ISOs are not taxable upon grant, vesting or exercise. Instead, the first related taxable event occurs upon the sale of the shares received upon exercise, with any appreciation above the exercise price paid being taxed as a capital gain. These potential benefits are what give ISOs their cachet, and are why you may have heard industry peers talk about them at social events.
However, in reality, typically very few ISOs qualify for this favorable tax treatment because employees rarely satisfy the requirement to hold the shares for at least one year from exercise and
two years from grant of the ISO. Employees generally do not like to put up the cash to exercise
options without almost simultaneously selling
the shares received, or at least enough of them to
cover the exercise price paid. For these reasons,
as well as unattractive and often unanticipated
Alternative Minimum Tax treatment of ISO
exercises and the unavailability of corporate
deductions for those ISOs that satisfy the conditions for favorable employee treatment, ISOs
are rarely used by public biotech companies and
probably are a questionable choice for private
entities as well. In our experience, ISOs in effect
make promises that they cannot keep.
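
To make the timing rules in this section concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not tax advice: the function names and sample figures are hypothetical, the holding-period test follows this article's wording (at least one year from exercise and two years from grant), and the sketch ignores tax rates, the Alternative Minimum Tax and every other condition for ISO treatment.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def spread(exercise_price: float, fair_market_value: float) -> float:
    """Per-share spread; zero when the option is out of the money (underwater)."""
    return max(fair_market_value - exercise_price, 0.0)


def nqso_ordinary_income(shares: int, exercise_price: float,
                         fmv_at_exercise: float) -> float:
    """Ordinary income recognized when a non-qualified option is exercised:
    the positive difference between FMV and exercise price, times shares."""
    return shares * spread(exercise_price, fmv_at_exercise)


def iso_holding_periods_met(grant: date, exercise: date, sale: date) -> bool:
    """True if the shares were held at least one year from exercise and
    at least two years from grant (the wording used in this article)."""
    return sale >= add_years(exercise, 1) and sale >= add_years(grant, 2)


if __name__ == "__main__":
    # Hypothetical grant: 1,000 options at a $2.00 strike, exercised at $10.00 FMV.
    print(nqso_ordinary_income(1000, 2.00, 10.00))
    # Selling six months after exercise fails the one-year holding period.
    print(iso_holding_periods_met(date(2010, 1, 15), date(2012, 3, 1), date(2012, 9, 1)))
```

The same-day "exercise and sell" pattern described above always fails the holding-period check, which is one way to see why so few ISOs end up with the favorable treatment.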
Tax considerations and Section 409A
Non-qualified deferred compensation arrangements have been commonplace in corporate
America for many years, but the US Internal
Revenue Service (IRS; Washington, DC) has
long been concerned by what it perceived as frequent abuses and tax avoidance schemes hiding
under the deferred compensation label. Scandals
at companies like WorldCom (Clinton, MS) and
Enron (Houston, TX), in which (among other
things) executives accelerated the payment of
deferred compensation to avoid their employer’s
default while rank-and-file employees had their
savings stuck in 401(k) plans invested in soon-to-be-worthless company stock, spurred the US
Congress to take action.
This action resulted in Section 409A of the
US Internal Revenue Code and the complicated
and lengthy associated regulations subsequently
adopted by the IRS. Section 409A governs all
forms of non-qualified deferred compensation, which it defines broadly to include many
arrangements—such as certain severance plans
and agreements and equity-based incentive
awards—not usually understood by researchers
(or business people, for that matter!) to constitute deferred compensation.
In general, Section 409A imposes strict rules
on the timing of deferral elections, permissible
payment events and the ability to accelerate or
further defer compensation once the original
deferral terms have been set. The gist of these
rules is to prevent the manipulation of the timing of income recognition. Although Section
409A does not prohibit the deferral of compensation, the statute greatly circumscribes the flexibility that previously applied. Failure to comply
with Section 409A’s rigid requirements has serious consequences for employees: accelerated
income recognition and taxation; imposition of
an additional 20% tax; and an interest charge.
Included within Section 409A’s coverage are
stock options with an exercise price per share
below the fair market value of the underlying
stock as of the grant date. If a typical US stock
option were deemed to be non-qualified
deferred compensation, it would violate
Section 409A because of the absence of predetermined payment dates or events and consequently the holder would be taxed at ordinary
income rates upon vesting of the options and
be subject to the 20% extra tax plus interest.
Accordingly, you should make certain that any
stock options granted to you are exempt from
Section 409A.
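To give a sense of the stakes, the following Python sketch estimates the tax due when a discounted option falls under Section 409A and is taxed at vesting. It is an illustration only: the share count, prices and the 37% ordinary income rate are hypothetical inputs, and the interest charge is left out because its calculation depends on facts not covered here.

```python
ADDITIONAL_409A_TAX = 0.20  # the additional 20% tax described in the text


def section_409a_cost(shares, exercise_price, fmv_at_vesting, ordinary_rate):
    """Estimate tax due at vesting for an option that violates Section 409A.

    ordinary_rate is a hypothetical marginal rate chosen by the caller.
    Interest is deliberately omitted from this sketch.
    """
    # Income recognized is the spread between share value and exercise price.
    spread = max(fmv_at_vesting - exercise_price, 0.0) * shares
    ordinary_tax = spread * ordinary_rate
    additional_tax = spread * ADDITIONAL_409A_TAX
    return {
        "income_recognized": spread,
        "ordinary_tax": ordinary_tax,
        "additional_tax": additional_tax,
        "total_before_interest": ordinary_tax + additional_tax,
    }


# Hypothetical example: 10,000 options struck at $1.00, shares worth $3.00 at vesting.
result = section_409a_cost(10_000, 1.00, 3.00, ordinary_rate=0.37)
print(result)
```

In this made-up case the holder owes tax on $20,000 of income without having sold, or even exercised, anything.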
As you can imagine, the determination of
the fair market value of a startup company
can be a difficult and occasionally speculative undertaking. Under Section 409A, in the
case of stock not “readily tradable on an established securities market,” fair market value
means “a value determined by the reasonable
application of a reasonable valuation method.”
There is a presumption that an independent
appraisal results in a reasonable valuation for
a period of 12 months. This presumption may
be rebutted by the IRS upon a showing that the
valuation method or application of the method
was grossly unreasonable. This may occur, for
example, if a board of directors relies on a valuation even though the board has good reason to
believe there have been fundamental changes
to the business since the date of the valuation.
Given the substantial adverse tax consequences under Section 409A of an option
being deemed in the money on the date of
grant, great care must be taken by startups to
establish strong support for the valuation used
to set the exercise price of options.
Other considerations
Although this is perhaps of minor interest to
a non-management employee, if you end up
founding a company or otherwise being part

© 2013 Nature America, Inc. All rights reserved.

b u i l d i n g a b us i n e ss
of a biotech company’s management team, you
should familiarize yourself with some of the
other technical considerations that can affect the
choice of whether to use options or other forms
of equity awards, particularly if you stay
with the company long enough for it to raise
finance by floating on the public markets.
These considerations include the deductibility of
incentive compensation under Section 162(m)
of the Internal Revenue Code, and the accounting treatment of stock options and other equity
awards, which can have a profound impact on
your company’s earnings or profits reflected in
its financial statements provided to investors.
Following a post-IPO transition period, a
newly public biotech company will become
subject to Section 162(m) of the Internal
Revenue Code, which limits the deductibility
of annual compensation in excess of $1 million
paid to the company’s CEO and three (or two,
in the case of “Emerging Growth Companies”:
newly listed companies with annual total
gross revenues of less than $1 billion) most
highly compensated executive officers—
other than the CFO—listed in the summary
compensation table contained in the compa-

ny’s proxy statement (the disclosure document
distributed to public company shareholders in
connection with the company’s annual meeting and the matters, including the election
of directors, to be voted on). The most substantial relief from this deduction limitation
is the exception for “qualified performance-based compensation,” and the easiest way for
a company to deliver incentive compensation to executive officers in compliance with
this exception is by granting at-the-market
stock options. Unlike other forms of exempt
performance-based compensation, stock
options need not be subject to pre-established, shareholder-approved performance
conditions—the mere fact that they have no
realizable value unless the company’s stock
price increases is enough for the Section
162(m) deduction limitation not to apply. This
fact, together with the favorable accounting
treatment accorded options until the adoption
of new accounting standards described below,
is the principal reason for the explosion in the
use of stock options in the US in the 1990s
through the mid-2000s.
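The arithmetic of the deduction limit can be sketched in a few lines of Python. This is a simplified illustration of the rule as described above, not tax advice: the function name and the pay figures are hypothetical, and the sketch assumes that qualified performance-based pay (such as gains on at-the-market options) sits entirely outside the $1 million cap.

```python
SECTION_162M_CAP = 1_000_000  # annual limit per covered executive, per the text


def deductible_compensation(non_performance_pay, qualified_performance_pay):
    """Return (deductible, non_deductible) for one covered executive.

    Assumption: qualified performance-based pay is exempt from the cap,
    and all other pay is deductible only up to $1 million.
    """
    capped = min(non_performance_pay, SECTION_162M_CAP)
    non_deductible = max(non_performance_pay - SECTION_162M_CAP, 0)
    return capped + qualified_performance_pay, non_deductible


# Hypothetical executive: $1.5M salary and time-vested stock, $2M option gains.
deductible, lost = deductible_compensation(1_500_000, 2_000_000)
print(deductible, lost)
```

Here the company loses the deduction on $500,000 of pay, while the full option gain remains deductible.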

Box 1 Possible fixes for underwater options
Underwater options are less of a headache for those working for a startup than for those
at a publicly held company. Private companies may re-price underwater options relatively
easily, although, as with public companies, incremental accounting expense will be
recognized in an amount equal to the excess, if any, of the fair value of the option as
modified over the fair value immediately before modification. The most direct approach
is simply to lower the price of the existing options. For purposes of Section 409A, such
a modification would result in a deemed new grant but, so long as the new exercise
price is not lower than the fair market value of the underlying stock as of the date of the
modification, no violation of Section 409A will result.
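The two tests in the paragraph above, the incremental accounting expense and the Section 409A condition, can be expressed as a short Python sketch. The function name and the per-option values are hypothetical; in practice the fair values would come from an option pricing model.

```python
def repricing_check(fair_value_before, fair_value_after,
                    new_exercise_price, fmv_on_modification_date):
    """Summarize a direct repricing of one underwater option.

    Incremental expense is the excess, if any, of the option's fair value
    as modified over its fair value immediately before modification.
    The repricing avoids a Section 409A violation only if the new exercise
    price is not below the stock's fair market value on the modification date.
    """
    incremental_expense = max(fair_value_after - fair_value_before, 0.0)
    avoids_409a_violation = new_exercise_price >= fmv_on_modification_date
    return {
        "incremental_expense": incremental_expense,
        "avoids_409a_violation": avoids_409a_violation,
    }


# Hypothetical repricing: option value rises from $0.40 to $1.10 when the
# exercise price is reset to the current $2.00 fair market value.
print(repricing_check(0.40, 1.10, 2.00, 2.00))
```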
In contrast, public companies face substantial practical constraints in dealing with
underwater stock options. Among other things, the stock exchanges require shareholder
approval of ‘repricings’ (a term defined very broadly to include not only direct reductions
in exercise price but also various transactions by which underwater options are replaced by
new awards with substantially identical accounting value) unless the relevant equity plan
approved by shareholders expressly authorizes repricings. This, however, is highly unlikely
to happen, as the two major proxy advisory firms, ISS (Rockville, MD) and Glass Lewis
(San Francisco, CA), will recommend against shareholder approval of any equity plan that
permits repricings. In addition, those firms and institutional shareholders will not abide
simple exercise price reductions because of both the incremental accounting expense and
ever-increasing governance concerns.
As a practical matter, then, any public company that wants to implement a repricing
will need to obtain specific shareholder approval of an exchange of the underwater options
for new awards of equal value. Often, this involves the issuance of a number of at-the-market options lower than the number of underwater options being replaced or of even
fewer restricted shares or restricted stock units. Furthermore, unless the relevant plan
would permit unilateral action by the company to replace the outstanding awards (which
is improbable), a typical value-for-value exchange is likely to require participant elections
through a registered exchange offer filed with and cleared by the US Securities and Exchange
Commission (SEC), a not-so-simple undertaking. This is because the new awards have a
different economic profile than the underwater options that they would replace, so that the
employees’ election involves an investment decision between different securities, thereby
triggering an SEC filing.


Accounting implications
The Financial Accounting Standards Board
(Stamford, CT; a not-for-profit self-regulatory
organization of the US accounting profession,
known as FASB), with the blessing of the US
Securities and Exchange Commission (SEC;
Washington, DC), establishes the accounting
standards (US GAAP) applicable to audited
financial statements in the United States,
including those contained in securities filings
by listed companies. As a consequence, earnings and other financial measures of company
performance must be determined in accordance with US GAAP, and in turn earnings
can be materially affected by how compensation expense is measured. Until late 2004, the
then applicable US GAAP rule (APB 25) provided that an option with a strike price set at or
above the fair market value of the underlying
stock at grant would generate no compensation expense to the issuing company. In other
words, the issuance of options would not reduce
the earnings of the issuing company released to
the public. This was because under APB 25’s
‘intrinsic value’ accounting, the only expense
generated by a typical stock-based award was
the ‘spread’ at grant. For a restricted share or
restricted stock unit granted for services rather
than cash, the application of this methodology
generally meant that the expense would be the
stock price at grant; for standard options, the
expense would be zero. For many companies,
particularly startups, stock options seemed like
‘funny money’—a form of compensation that
was perceived by employees as highly valuable
but that required no outlay of cash by employers
and no accounting charges which would reduce
company earnings.
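The ‘intrinsic value’ calculation is simple enough to show directly. The Python sketch below is a minimal illustration of the rule as described above; the function name and the figures are hypothetical.

```python
def intrinsic_value_expense(stock_price_at_grant, price_paid_per_share, units):
    """Compensation expense under intrinsic value accounting: the spread at grant.

    For an option, price_paid_per_share is the exercise price.
    For restricted stock granted for services, it is zero.
    """
    spread = max(stock_price_at_grant - price_paid_per_share, 0.0)
    return spread * units


# Hypothetical grants of 1,000 units with the stock at $10.00 per share.
print(intrinsic_value_expense(10.00, 10.00, 1_000))  # at-the-market option
print(intrinsic_value_expense(10.00, 0.00, 1_000))   # restricted stock for services
```

The at-the-market option produces no expense at all, whereas the restricted stock is expensed at its full grant-date price.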
Although the accounting profession recognized for many years that APB 25’s treatment
of options did not reflect economic reality, the
move by FASB to fair value accounting (which
attributes to the option itself, for purposes of
calculating compensation expense, a ‘fair value’
based on Black-Scholes or other option pricing
models developed by academic economists)
was delayed for many years by intense lobbying efforts from the private sector (particularly
the technology industry) and members of the
US Congress representing districts with large
concentrations of tech startups. The opponents
of fair value accounting argued, among other
things, that the favorable accounting treatment
of options under APB 25 was a key driver of
the success of the US technology industry and
prophesied disastrous consequences if it were to
be abandoned. These prophecies have not come
true, but the accounting change did affect the
prevalence of options.
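For readers curious about what a ‘fair value’ calculation looks like, here is a minimal Python sketch of the textbook Black-Scholes formula for a European call option on a stock that pays no dividends. It is a simplification: valuations prepared for financial statements typically involve further inputs and judgments (expected term, forfeitures, dividends), and all the numbers below are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, volatility, years):
    """Textbook Black-Scholes value of a European call, no dividends."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * volatility ** 2) * years) / (
        volatility * math.sqrt(years)
    )
    d2 = d1 - volatility * math.sqrt(years)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


# Hypothetical at-the-market option: stock and strike at $100, 5% interest
# rate, 20% volatility, one year to expiry.
print(round(black_scholes_call(100.0, 100.0, 0.05, 0.20, 1.0), 2))
```

The point of the exercise is that an option struck at the current stock price has zero spread at grant yet a clearly positive fair value.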
Once the US GAAP rule (FAS123(R), subsequently redesignated as ASC 718) came into


effect in December 2004, many companies (and
almost all public issuers) moved away from
an exclusive reliance on stock options. Today
most use a mix of options and other forms of
equity awards, predominantly restricted stock
or restricted stock units (which, if also made
subject to performance-based conditions, are
often referred to as ‘performance shares’ or
‘performance share units’). Although this can
in part be attributed to the change in accounting rules, it also reflects the impact of the options
backdating scandals (in which some companies
were found to have retroactively set grant dates
to take advantage of lower stock—and hence
option exercise—prices), corporate malfeasance and meltdowns at WorldCom and Enron
(arguably contributed to by executives who tried
to artificially prop up stock prices) and, more
recently, the financial crisis.
Pitfalls of stock options
When you are offered or elect to take stock
options in a biotech company, you need to keep
in mind some key issues. The foremost of these
relate to company liquidity and the underperformance of company stock.
Private company liquidity issues. As noted
above, the ability to defer taxation until the
exercise of stock options (and potentially later
in the case of ISOs) is one advantage of stock
options as a means of compensation. However,
the deferral potential of a stock option is not
unlimited, as most options have a maximum
term of between five and ten years. Moreover,
to the extent that they are not forfeited upon
termination of employment, options typically
are (and to qualify for ISO treatment must be)
exercisable for only a limited time following
termination. This means that a stock option
holder may be required to put up cash to exercise options before having the ability to sell the
shares to recoup the cost. (Shares in a private
company typically are subject to transfer restrictions preventing shareholders from selling their
shares until an IPO or sale of the company.)
Private equity, venture capital and other financial investors in biotech startups are unlikely to
allow the startup to extend loans to fund
employees’ exercises of stock options, and in any
event such loans would run afoul of Section 402
of the Sarbanes-Oxley legislation if the company were to go public. (Section 402 prohibits
employer loans or other extensions of credit to
officers arranged by US public companies.) As a
result, risk-averse employees may leave options
unexercised, particularly where, as is often the
case with biotech companies, the prospect of a
corporate liquidity event (that is, an IPO or sale
of the company) may seem too distant.
Founders and their management teams that

seek to extend the term of stock options to
mitigate liquidity problems need to consider
the consequences under Section 409A of the
Internal Revenue Code. Although the amendment of a non-qualified option’s terms to extend
post-termination exercisability (but not beyond
the ultimate, maximum term of the option or
ten years from grant, if earlier) will not endanger
the exception from Section 409A, an extension
of the maximum (typically five- to ten-year)
term of an in-the-money non-qualified stock
option will result in the option being deemed
subject to, and in violation of, Section 409A
from the date of grant.
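The distinction drawn above can be written out as a small decision rule. The Python sketch below encodes only what this paragraph states; the function and label names are hypothetical, and the case of extending the maximum term of an option that is not in the money is flagged as not addressed rather than guessed at.

```python
from datetime import date

EXEMPT = "exempt"
VIOLATION = "violation"
NOT_ADDRESSED = "not addressed in the text"


def add_years(d, years):
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def classify_extension(grant_date, original_expiry, new_deadline, in_the_money):
    """Classify an amendment extending a non-qualified option's exercise period."""
    # The outer limit is the earlier of the original maximum term and
    # ten years from grant.
    outer_limit = min(original_expiry, add_years(grant_date, 10))
    if new_deadline <= outer_limit:
        return EXEMPT
    if in_the_money:
        return VIOLATION
    return NOT_ADDRESSED


# Hypothetical seven-year option granted on 1 January 2010.
grant, expiry = date(2010, 1, 1), date(2017, 1, 1)
print(classify_extension(grant, expiry, date(2016, 6, 1), in_the_money=True))
print(classify_extension(grant, expiry, date(2018, 1, 1), in_the_money=True))
```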
Underwater options. The shortcoming of
share ownership most commonly experienced
by founders and employees of biotech startups
is the phenomenon of out-of-the-money, or
‘underwater’, options. This especially plagues
biotech firms, the financial results and stock
valuation of which are highly volatile and subject to frequent and sometimes lengthy setbacks
due to difficulties and delays in the regulatory
or commercialization process. What companies
can do, if anything, about underwater options is
a more difficult, technical question.
Box 1 outlines some measures companies can
take to fix them.
Alternatives to stock options
Many biotech companies now supplement or
even replace the use of options with so-called
‘full value awards’, equity awards that correspond to a share rather than an option or stock
appreciation right, giving the holder a stake in
downside as well as upside.
Restricted stock. The most traditional form of
full value award is shares of restricted stock—
actual shares granted or sold to employees subject to transfer and forfeiture restrictions and a
vesting schedule. The vesting of restricted stock
usually is time based, but performance conditions also can be applied. Restricted stock is subject to taxation under Section 83 of the Internal
Revenue Code, which provides that restricted
stock is taxed as ordinary income when it vests
and ceases to be subject to a “substantial risk of
forfeiture,” although the recipient of restricted
property can also accelerate taxation to the date
of grant (thereby assuring that any subsequent
appreciation will be taxed as capital gains at a
lower rate than ordinary income) by making an
election under Section 83(b) that is filed with
the employer and the IRS within 30 days following the date of grant. Being subject to Section 83
means restricted stock is categorically excluded
from coverage under Section 409A. Even during
the heyday of stock options restricted stock was
fairly frequently used at startups, particularly


for the senior-most executives and where initial
valuation was low, making a Section 83(b) election attractive.
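A worked comparison shows why a low initial valuation makes the election attractive. The Python sketch below is illustrative only: the tax rates, share count and prices are hypothetical, the shares are assumed to be granted for services at no cost and never forfeited, any holding period needed for capital gains treatment is assumed to be met, and the filing deadline is computed simply as 30 calendar days after grant.

```python
from datetime import date, timedelta


def section_83b_deadline(grant_date):
    """Filing deadline, taken here as 30 calendar days after the grant date."""
    return grant_date + timedelta(days=30)


def total_tax(shares, fmv_at_grant, fmv_at_vest, sale_price,
              ordinary_rate, capital_gains_rate, elect_83b):
    """Total tax on restricted stock from grant through eventual sale."""
    # With the election, ordinary income is measured at grant;
    # without it, ordinary income is measured at vesting.
    ordinary_basis = fmv_at_grant if elect_83b else fmv_at_vest
    ordinary_tax = shares * ordinary_basis * ordinary_rate
    capital_gain = max(sale_price - ordinary_basis, 0.0)
    return ordinary_tax + shares * capital_gain * capital_gains_rate


# Hypothetical founder grant: 100,000 shares worth $0.10 at grant,
# $2.00 at vesting and $5.00 at sale; 37% ordinary and 20% capital gains rates.
args = (100_000, 0.10, 2.00, 5.00, 0.37, 0.20)
print(total_tax(*args, elect_83b=True))
print(total_tax(*args, elect_83b=False))
print(section_83b_deadline(date(2013, 8, 1)))
```

The trade-off not captured here is risk: tax paid at grant under the election is not recovered if the shares are later forfeited or lose their value.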
Note that time-based vesting restricted stock,
unlike stock options and performance-based
restricted stock, is subject to the deduction
limitations of Section 162(m) of the Internal
Revenue Code discussed above. This is because
the performance-based exception to Section
162(m) can only be satisfied by either (a) awards
the value of which is based solely on the appreciation of the stock from the date of grant (such
as stock options and stock appreciation rights
that are not in the money on the date of grant)
or (b) awards the value and payment of which
is based on the achievement of predetermined,
objective performance goals. Unlike those two
categories, non-performance vesting restricted
stock has embedded value and is not contingent
on the satisfaction of performance goals, and
therefore is not exempt from Section 162(m).
Restricted stock units. Restricted stock units
(RSUs) are economically identical to restricted
stock but subject to a different tax regime.
Unlike restricted stock, they represent a contractual promise to deliver actual shares (or,
less frequently, the cash value of shares) in the
future rather than a current transfer of shares. As
such, they are not subject to Section 83 and are
taxable upon payment or settlement rather than
upon vesting. That being said, RSUs issued by
public companies often are settled upon vesting,
thereby eliminating any real-life tax difference
between RSUs and restricted stock.
Section 409A of the Internal Revenue Code
will apply to any RSUs that are payable later than
March 15 of the year following vesting (the
deadline for application of the so-called ‘short-term deferral’ exception to Section 409A). As
noted in the discussion of Section 409A above,
Section 409A imposes strict rules on the timing
of deferral elections and the ability to accelerate
or further defer compensation once the original deferral terms have been set. Furthermore,
deferred compensation subject to Section 409A
generally can only be paid on a specified date or
dates or on certain permissible events, namely,
separation from service, death, disability or a
change in control (which does not include an initial public offering). Thus, if Section 409A applies
to an award of RSUs, there will be very limited
ability to accelerate or further defer the payment
of RSUs after grant. Lastly, public company
officers also may be subject to a mandatory six-month delay of payment upon separation from
service. Despite these Section 409A–based limitations, many private companies avail themselves
of the ability to defer settlement and taxation
(and the related employee liquidity issues that
they create) to Section 409A–compliant dates or

events likely to occur substantially in the future,
when it is hoped that liquidity will be available.
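The March 15 test described above is easy to check mechanically. The Python sketch below is a minimal illustration using calendar years; the function names and dates are hypothetical, and it ignores any refinements to the short-term deferral rule beyond what is stated in this article.

```python
from datetime import date


def short_term_deferral_deadline(vesting_date):
    """March 15 of the calendar year following the year of vesting."""
    return date(vesting_date.year + 1, 3, 15)


def rsu_subject_to_409a(vesting_date, payment_date):
    """True if the RSU is payable after the short-term deferral deadline."""
    return payment_date > short_term_deferral_deadline(vesting_date)


# Hypothetical RSU vesting on 30 June 2013.
vest = date(2013, 6, 30)
print(short_term_deferral_deadline(vest))
print(rsu_subject_to_409a(vest, date(2014, 3, 15)))   # settled on the deadline
print(rsu_subject_to_409a(vest, date(2016, 1, 1)))    # settlement deferred
```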
RSUs often are subject to performance conditions, in which case they often are referred to
as performance (stock) units (PSUs). Because
time-based RSUs are subject to Section 162(m)’s
deduction limitation, and institutional shareholders and shareholder advisory firms prefer
performance-based awards, PSUs currently are
the most frequently employed replacement or
supplement to stock options at public biotech
companies. The performance objectives usually are financial, but can also include product
development milestones. Product-related performance goals can be particularly useful at biotech firms, where financial results may be less
important in the short to medium term than
making progress toward regulatory approval or
commercialization.
Other incentive arrangements. Other forms
of long-term incentives include cash- or stock-settled stock appreciation rights (SARs), cash-settled RSUs and PSUs, and other long-term
incentive plans paying bonuses based on the
level of achievement of various financial, operational and product development metrics. Other
than stock-settled SARs, which are accounted
for in the same manner as stock options under

ASC 718, these other cash-settled forms of
award are considered ‘liability awards’, requiring
so-called ‘mark-to-market’ expensing under US
GAAP, whereby the accounting expense associated with an award, rather than being fixed at
grant, is adjusted over time to reflect the award’s
changing value. For this reason, as well as the
cash-poor nature of many private biotech companies and young public companies and the
preference of institutional shareholders for the
greater stockholder-management alignment
of interests produced by equity-settled awards,
these awards are used relatively infrequently.
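To see what ‘mark-to-market’ expensing means in practice, consider the simplified Python sketch below. It assumes, as an approximation, that at each reporting date the cumulative expense equals the award's current fair value multiplied by the fraction of the vesting period served, so that each period's expense is the change in that cumulative figure. The function name and the fair values are hypothetical.

```python
def liability_award_expense(fair_values, vesting_periods):
    """Per-period expense for a cash-settled award remeasured each period.

    fair_values: the award's fair value at each successive reporting date.
    vesting_periods: number of reporting periods in the vesting schedule.
    """
    expenses = []
    recognized = 0.0
    for period, fair_value in enumerate(fair_values, start=1):
        served = min(period, vesting_periods)
        # Cumulative expense tracks the current value, not the grant-date value.
        cumulative = fair_value * served / vesting_periods
        expenses.append(cumulative - recognized)
        recognized = cumulative
    return expenses


# Hypothetical award vesting over three periods, with its fair value
# measured at $120, then $90, then $150.
print(liability_award_expense([120.0, 90.0, 150.0], vesting_periods=3))
```

An equity-settled award, by contrast, would be expensed on its grant-date value alone, so its periodic charge would not swing with the stock price in this way.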
Conclusions
Although stock options continue to be a
popular employee incentive device, in the
past few years their advantages have been
diminished through accounting and tax law
changes, whereas their shortcomings have
become more apparent in the biotech sector—in which a consistently growing stock
price is far from assured, or even likely. As a
consequence, biotech firms are moving away
from an exclusive reliance on stock options
and instead are using a mix of equity-based
incentives, most commonly a combination
of stock options and performance-based
stock units.

From the perspective of a founder or other
employee, the shift to a combination of stock
options and some form of restricted stock or
stock units should be welcome, making it less
likely that the employee’s awards will have no
value at all. Unlike the corporate employer, an
employee would prefer that restricted stock or
stock units not be subject to performance conditions. As for a preference between restricted stock
and restricted stock units, if the underlying value of
the stock at grant is low enough that the employee
could afford to make a Section 83(b) election
(and thereby have future appreciation taxed
entirely at capital gains rates), then restricted
stock, rather than RSUs, is the way to go.
If you find the complexity of the rules
described above daunting, seeking the advice
of a financial advisor upon grant, and certainly
before exercising or dealing, may be advisable.
In some cases, the financial stakes involved
could be sizeable.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

For more content on bioentrepreneurism,
visit our Trade Secrets blog.

http://blogs.nature.com/trade_secrets/

Startups on the menu
In 2011, Steve Finkbeiner, of the University of California, San
Francisco Gladstone Institutes and Taube-Koret Center, participated in
the Bay Area SciCafé following publication of his paper describing small
molecules that stimulate autophagy as possible treatments for neurodegenerative disease (Nat. Med. 16, 1227, 2010). Key to this
discovery was the invention of a patented high-throughput
single-cell imaging platform that makes it possible to
track the development of brain cells from patient-derived
induced pluripotent stem cells.
Nature Biotechnology: How have you built on the work
described in the Nature Medicine paper?
Steve Finkbeiner: Initially, our efforts were directed at
developing leads from our internal academic programs
far enough that they warranted industry partnerships, using financial
support from philanthropists or other non-dilutive funding sources.
The goal was to catalyze the discovery of therapeutics by carrying out
the early-stage discovery and development work necessary to de-risk
the leads. However, as we developed innovative tools and deep biology
expertise to do this work, industry sought access to our platform to
advance their own programs.
NBT: What types of challenges does commercialization of neuroscience research pose?
SF: Early-stage central nervous system drug discovery is viewed as risky,
so the extent to which discoveries must be de-risked is especially
high. Collaboration and open innovation are ways to manage
risk because they reduce the investment necessary to have an effective
development infrastructure. Philanthropy is absolutely critical as well.
It makes it possible to carry out the development of promising leads
without adding encumbrances that would ultimately make
those leads difficult to partner out. Industry partnerships
are essential because they are uniquely resourced to afford
and execute clinical trials. My impression is that philanthropy in this area is growing, and I hope that the message
that philanthropists have the opportunity to make a major
difference and can see the impact of their efforts entices
even greater investment.
NBT: What led you to pursue translational applications
as well as fundamental research?
SF: Part of my work as an academic scientist led naturally to a focus on
mechanisms of disease, which in turn led to the discovery of potential
therapeutic targets. A few years ago, I was fortunate to be approached
by philanthropists interested in one of the diseases we study, and with
their help, created an infrastructure for developing discoveries with
therapeutic potential from the academic research program. We raise
about $5 from other sources for every $1 we receive in philanthropy.
For example, the invention of a first-generation high-throughput stem
cell platform was made possible with philanthropy. Our early successes
using it attracted the resources to develop the technology further and
attract pharma partnerships and sponsored research agreements.


correspondence


Heritable gene targeting in the mouse and rat using a
CRISPR-Cas system
To the Editor:
CRISPR-Cas systems have been developed
as an efficient gene editing technology in
cells and model organisms. Here we use
a CRISPR-Cas system to induce genomic
DNA fragment deletion in mice by coinjecting two single-guide RNAs (sgRNAs)
targeting the Uhrf2 locus with Cas9 mRNA.
Furthermore, we report the generation of
a Mc3R and Mc4R double-gene knockout
rat by means of a single microinjection.
High germline-transmission efficiency was
observed in both mice and rats.
The clustered, regularly interspaced,
short palindromic repeats (CRISPR)-associated protein (Cas) system has evolved
in bacteria and archaea as an RNA-based
adaptive immune system against viral
and plasmid invasion1. Based on gene
conservation and locus organization,
three major types of CRISPR systems have
been identified2,3. In the type II systems,
the complex of a CRISPR RNA (crRNA)
annealed to a trans-activating crRNA
(tracrRNA) is sufficient to guide the
Cas9 endonuclease to a specific genomic
sequence to generate double-strand
breaks in target DNA4. Previous studies
established a strategy for multiplex genome
engineering with the Cas9 RNA-guided

endonuclease in mammalian cells5,6.
Recently, efficient genome editing by the
CRISPR-Cas system has been shown in
multiple organisms, including zebrafish,
mice and bacteria7–9. Several groups have
demonstrated that compared with zinc
finger nucleases (ZFNs) and transcription
activator-like effector nucleases (TALENs),
CRISPR-Cas–mediated gene targeting

has similar or greater efficiency in cells
and zebrafish5–7,10. Although it has been
demonstrated that multiple genes can be
disrupted in individual mouse embryos
using CRISPR-Cas–mediated systems9,
germline transmission of Cas9-mediated
mutations in animals has not yet been
reported. In addition, whether long,
specific, genomic DNA target fragments

Figure 1 Generation of gene mutant rats with
a CRISPR-Cas system. (a) Constructs of the Cas9/
RNA system used in this study for DNA (left) and
RNA (right) injections. Spacer, nuclease guide
sequence; DR, direct repeat separating individual
spacers; NLS, nuclear localization signal.
(b) Detection of mutations in F0 rats generated by
injection of gRNA:Cas9 targeting Mc4r, before (–)
or after (+) T7E1 digestion of PCR products
amplified from tail genomic DNA of Mc4r F0 rats.
Arrowheads, mutant bands. M, DNA molecular
weight marker. (c) DNA sequences of the Mc3r
or Mc4r genomic loci in founders. Red boxes
enclose nucleotide substitutions. The change in
the base-pair sequence is shown at right. Six TA
clones of the PCR products amplified from each
founder were analyzed by DNA sequencing. The
incidence of each genotype among the six clones
is listed in the rightmost column.

Table 1 Generation of knockout mice and rats via the CRISPR-Cas system

                                                                                                Mutation rate
Gene    Strain          Dose (ng/µl)    Number injected/    Total number of  Mutant   As a percentage of  As a percentage of
                                        transferred (%)     newborns (%)     number   total newborns      total injected embryos
DNA
Th      FVB/N           1               125/50 (40)         10 (8)           0        0                   0
Th      FVB/N           2.5             350/75 (21)         11 (3)           1        9                   0.3
RNA
Th      B6              25/12.5         120/100 (83)        9 (7.5)          8        90                  6.7
Rheb    B6              25/12.5         115/45 (40)         4 (3.5)          3        75                  2.6
Uhrf2   FVB/N           25/12.5         105/33 (31)         12 (11.5)        11       92                  10.5
Co-injection
Mc4r    Sprague-Dawley  25/12.5 + 12.5  122/68 (56)         15 (12)          13       87                  10.6
Mc3r    Sprague-Dawley  (as Mc4r)       (as Mc4r)           (as Mc4r)        1        7                   0.8

sgRNA:Cas9 system for each target gene was injected into fertilized eggs. DNA was injected into male pronuclei and RNA was injected into the cytoplasm of mouse or rat zygotes which were
then transferred into pseudopregnant females. Mutations of the newborns were confirmed by sequencing after weaning.

can be deleted by the CRISPR-Cas system is
still unknown. Moreover, the utility of the
CRISPR-Cas system for gene targeting in
other mammalian models, for example the
laboratory rat, still needs to be determined.
Here, we report the generation of highly
efficient, heritable, gene knockout in mice
and rats by using a CRISPR-Cas system.
To test the activity of Cas9-mediated
gene targeting in mice, a genomic Th
site that has been previously targeted
efficiently in mouse cells4,5 was selected
for the initial experiments in knockout
mouse generation. First, we injected
different concentrations of linearized DNA
(encoding humanized Cas9, target specific
crRNA and tracrRNA) (Fig. 1a) into
the male pronuclei of FVB strain mouse
embryos and assayed the genomic mutation
status of the Th locus in the resulting pups.
Similar to ZFNs and TALENs, the CRISPR-Cas system induces double-strand DNA
breaks that are repaired mainly by error-prone nonhomologous end joining (NHEJ).
Only 1/11 (9%) of the pups generated from
high-concentration DNA injection (2.5 ng/
μl) was modified at the Th locus, with only
wild-type pups generated by injection of
a lower DNA concentration (Table 1 and
Supplementary Fig. 1), suggesting that the
mutation rate is low when the CRISPR-Cas
system is delivered as linearized DNA.
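
The two mutation-rate columns of Table 1 can be recomputed from the raw counts. The following is a minimal sketch in Python, not part of the original letter; the class and method names are illustrative, and the counts are copied from three rows of Table 1.

```python
from dataclasses import dataclass


@dataclass
class InjectionResult:
    """One row of Table 1 (names are illustrative, not from the letter)."""
    label: str
    injected: int   # embryos injected
    newborns: int   # pups born
    mutants: int    # pups carrying a mutation at the target locus

    def rate_per_newborn(self) -> float:
        """Mutants as a percentage of newborns."""
        return 100.0 * self.mutants / self.newborns if self.newborns else 0.0

    def rate_per_injected(self) -> float:
        """Mutants as a percentage of injected embryos."""
        return 100.0 * self.mutants / self.injected if self.injected else 0.0


rows = [
    InjectionResult("Th, DNA, 2.5 ng/ul", injected=350, newborns=11, mutants=1),
    InjectionResult("Th, RNA", injected=120, newborns=9, mutants=8),
    InjectionResult("Uhrf2, RNA", injected=105, newborns=12, mutants=11),
]

for r in rows:
    print(f"{r.label}: {r.rate_per_newborn():.1f}% of newborns, "
          f"{r.rate_per_injected():.1f}% of injected embryos")
```

Computed this way, the Th RNA row gives about 89% of newborns, which Table 1 and the text report as 90%.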
To improve efficiency, we next injected
RNAs synthesized in vitro. First, we
constructed Cas9 expression vectors for
in vitro transcription of Cas9 mRNA by
subcloning a DNA fragment harboring
the SP6 promoter sequence into a
vector containing the nuclear-targeted
humanized Cas9 coding sequence5. We
also constructed a fusion of the crRNA
and tracrRNA expression vectors that
enable T7 promoter-driven production
of a customizable synthetic single-guide RNA (sgRNA) with
20 nucleotides of target-specific sequence
followed by tracrRNA-derived sequences
at the 3ʹ end (Fig. 1a). Concentrations
of Cas9-encoding mRNA comparable to
those used in TALEN studies11 (25 ng/
μl), together with Th-targeting sgRNA
(12.5 ng/μl), were microinjected into
the cytoplasm of one-cell-stage C57BL/6
mouse embryos. Ninety percent (8 of
9) of the pups from RNA injection were
founders bearing mutations at the Th
locus, as determined by a T7 endonuclease
I (T7E1) assay and DNA sequencing
(Supplementary Fig. 1). The longest
deletion was observed in founder 6,
bearing a 70-bp deletion. Similarly, founder
mice bearing insertion/deletion mutations
(indels) at the Rheb genomic locus were
generated with high efficiency using the
same strategy (Table 1 and Supplementary
Fig. 2).
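
The sgRNA layout described above (20 nucleotides of target-specific sequence followed by tracrRNA-derived sequence) suggests a simple way to list candidate target sites. The sketch below is illustrative only: it assumes the 5ʹ-NGG-3ʹ PAM recognized by Streptococcus pyogenes Cas9 (the letter does not state the PAM used), scans only the strand supplied, and uses a made-up toy sequence. The function names are hypothetical.

```python
import re


def find_protospacers(seq: str, spacer_len: int = 20):
    """Return (start, protospacer, pam) for each site followed by an NGG PAM.

    Only the given strand is scanned; a real design tool would also scan
    the reverse complement. The NGG PAM is an assumption of this sketch.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


def spacer_rna(protospacer: str) -> str:
    """RNA form of the target-specific part of the sgRNA (T replaced by U)."""
    return protospacer.upper().replace("T", "U")


toy_locus = "ACGTACGTACGTACGTACGTTGGAA"  # toy sequence, not a real locus
sites = find_protospacers(toy_locus)
for start, protospacer, pam in sites:
    print(start, protospacer, pam, spacer_rna(protospacer))
```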
One of the most important advantages
of CRISPR-Cas systems is that the Cas9
protein can be guided by individual gRNAs
to modify multiple genomic target loci
simultaneously5. To test this in mice, we
then designed and injected two sgRNAs
targeting adjacent sites spanning 86 bp
in the Uhrf2 locus with Cas9 mRNA into
embryos to make deletions. Eleven of 12
F0 mice had mutations in the Uhrf2 locus,
and 6 of these founders had a total of 7
different disruptions of both these targets
on the same allele (Supplementary Fig. 3).
Three of the six founders modified at both
sites had large deletions (Supplementary
Fig. 3). These large deletions were probably
generated by simultaneous DNA cleavage

of these two sites, followed by end joining
ligation of the broken ends.
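
The proposed mechanism, two simultaneous cuts followed by joining of the outer ends, can be modeled on a sequence string. This is a minimal sketch under stated assumptions: Cas9 is taken to cut 3 bp upstream of the PAM, both protospacers are taken to lie on the same strand, and the sequence and coordinates are hypothetical rather than the Uhrf2 sites used in the letter.

```python
def cut_index(protospacer_start: int, spacer_len: int = 20) -> int:
    """Index of the blunt cut, assumed to lie 3 bp upstream of the PAM."""
    return protospacer_start + spacer_len - 3


def delete_between(seq: str, start_a: int, start_b: int, spacer_len: int = 20):
    """Join the outer ends after two cuts; return (allele, deleted_length)."""
    cut_a, cut_b = sorted(
        (cut_index(start_a, spacer_len), cut_index(start_b, spacer_len))
    )
    return seq[:cut_a] + seq[cut_b:], cut_b - cut_a


# Hypothetical 150-bp locus with protospacers starting at positions 10 and 60.
locus = ("ACGT" * 40)[:150]
allele, deleted = delete_between(locus, 10, 60)
print(f"deleted {deleted} bp, allele length {len(allele)} bp")
```

Real alleles often differ from this idealized product because end joining can add or remove a few bases at the junction.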
The mutations generated by ZFNs and
TALENs in founders are transmitted
efficiently to the next generation11–15, but
to the best of our knowledge, the germline
transmission efficiency of a CRISPR-Cas
system has not been reported in animals.
To investigate this issue, we crossed
Th founders to wild-type mice and the
genotypes of the pups or fetuses were
determined by T7E1 digestion or DNA
sequencing. Although only two mutations
were identified in the tail DNA of the
founder generated by DNA injection, in
6 of 10 F1 fetuses we found a total of five
different mutations (Supplementary
Fig. 4). These data imply that the founder
was a mosaic due to the delay in DNA
cleavage by Cas9 in the embryos. They also
suggest that sequencing of six clones
of PCR products from founder tail
DNA would not reveal all the mutations
generated. Another two founders generated
by mRNA injection also transmitted
the mutation to the next generation
(Supplementary Fig. 4). These data
demonstrate that a CRISPR-Cas system is
a useful genetic tool to generate heritable
mutant mice with very high efficiency.
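
The remark that six clones may miss some alleles in a mosaic founder can be made quantitative with a simple sampling model. The sketch below assumes each sequenced clone is an independent draw and that an allele is present at a fixed fraction of the PCR product; both are simplifications introduced here, and the 10% allele frequency is a hypothetical example.

```python
import math


def prob_allele_missed(freq: float, n_clones: int) -> float:
    """Probability that an allele at fraction `freq` appears in none of n clones."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    return (1.0 - freq) ** n_clones


def clones_needed(freq: float, detect_prob: float = 0.95) -> int:
    """Smallest number of clones that detects the allele with the given probability."""
    if not 0.0 < freq < 1.0 or not 0.0 < detect_prob < 1.0:
        raise ValueError("freq and detect_prob must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - detect_prob) / math.log(1.0 - freq))


# Hypothetical allele making up 10% of the amplified tail DNA.
print(f"missed in 6 clones: {prob_allele_missed(0.10, 6):.2f}")
print(f"clones for 95% detection: {clones_needed(0.10)}")
```

Under these assumptions a 10% allele is missed in six clones about half the time, which is consistent with more mutations appearing in the F1 fetuses than in the founder's tail DNA.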
The laboratory rat is important
for modeling diseases and has many
advantages over mouse models in
toxicology and pharmacology. Previous
studies have successfully generated
knockout rats through both ZFNs and
TALENs14,15. Here, we attempt to generate
knockout rats using a CRISPR-Cas system.
Two sgRNAs targeting rat melanocortin
3 receptor (Mc3r) and melanocortin 4
receptor (Mc4r) were synthesized. Cas9

mRNA and a mixture of sgRNAs were
injected into one-cell-stage Sprague-Dawley rat embryos, which were then
transferred to pseudopregnant females.
Pup genomic DNA was extracted for PCR
amplification of the target loci. A T7E1
assay and DNA sequencing data show that
both the Mc3r and Mc4r loci were modified
by our CRISPR-Cas system (Fig. 1b,c).
However, the activities of these two
Cas9-based nucleases are quite different.
Thirteen of 15 F0 pups were identified as
the founders of Mc4r mutant rats by T7E1
digestion and subsequent sequencing
(Fig. 1b). Founders containing large
deletions were easily detected by PCR
without digestion. No founder with a Mc3r
mutation was identified by T7E1 digestion,
but one rat that had a Mc4r mutation also
had a single-nucleotide deletion in the
Mc3r locus as determined by sequencing
(Fig. 1c). We investigated the body weight,
food intake, insulin level and leptin mRNA
level of Mc4r founders and found that
the biallelic mutants exhibited a similar
phenotype to the N-ethyl-N-nitrosourea
(ENU)-induced Mc4r mutant rat that has
been used as an obesity animal model16
(Supplementary Fig. 5). These data
suggest that a CRISPR-Cas system can
generate gene knockout rats, and that the
efficiency depends on the target site. In
addition, a single injection is capable of
inducing disruption of at least two different
genes in the rat. To determine the germline
transmission capability of Cas9-mediated
gene mutation in rats, we crossed Mc4r
mutant rat founder 12 with a wild-type rat
and determined the Mc4r target sequence
of the fetuses. Three of six fetuses carried
mutations, with two different mutations identified
(Supplementary Fig. 6), suggesting high
efficiency of germline transmission in rats
with Cas9-mediated gene mutation.
Another important issue for genome
editing is the in vivo specificity. Previous
studies indicated two possible rules
for how mismatches affected Cas9-mediated DNA cleavage. One is that
single-base mismatches up to 12 bp 5ʹ
of the protospacer adjacent motif (PAM)
completely abolished Cas9-mediated
DNA cleavage. The other is that a stretch
of at least a 13-bp match between gRNA
and target DNA proximal to the PAM

is required for efficient cleavage, and
mismatches outside this motif can be
tolerated4,5. We investigated the specificity
of our CRISPR-Cas system for its targets
by analyzing potential off-target sites in the
mouse genome in these two categories. As it
is difficult to analyze all potential off-target
sites, the two sites with fewer than four
mismatches or with a contiguous PAM-proximal
match of >11 bp were selected
(Supplementary Fig. 7). No mutations were
observed at these potential off-target sites
in the 12 founders that were analyzed by
sequencing (Supplementary Fig. 7).
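
The two selection criteria used above (fewer than four mismatches, or a contiguous PAM-proximal match of more than 11 bp) can be written as a small filter. This is a sketch only: it assumes guide and candidate site are given 5ʹ to 3ʹ with the PAM at the 3ʹ end, it ignores the PAM itself, and the sequences are toy examples. It is not the prediction method the authors used, which the letter does not describe.

```python
def mismatches(guide: str, site: str) -> int:
    """Number of positions at which guide and site differ."""
    return sum(1 for g, s in zip(guide, site) if g != s)


def pam_proximal_match(guide: str, site: str) -> int:
    """Length of the uninterrupted match counted from the PAM-proximal (3') end."""
    run = 0
    for g, s in zip(reversed(guide), reversed(site)):
        if g != s:
            break
        run += 1
    return run


def is_candidate_off_target(guide: str, site: str,
                            max_mismatches: int = 3, min_run: int = 12) -> bool:
    """Apply the two criteria: fewer than four mismatches, or a >11-bp PAM-proximal match."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return (mismatches(guide, site) <= max_mismatches
            or pam_proximal_match(guide, site) >= min_run)


guide = "ACGTACGTACGTACGTACGT"  # toy 20-nt guide
candidates = {
    "one 5' mismatch": "TCGTACGTACGTACGTACGT",
    "four 5' mismatches": "TGCAACGTACGTACGTACGT",
    "mismatch at PAM end": "TGCAACGTACGTACGTACGA",
}
for name, site in candidates.items():
    print(name, is_candidate_off_target(guide, site))
```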
In cells and zebrafish, CRISPR-Cas
system–mediated gene disruption has
similar activity to that of TALENs5,7;
however, our study, together with a previous
report9, suggests that the activity of the
CRISPR-Cas system is greater than that
of ZFNs or TALENs, at least in mice. The
mutation rate of TALEN-modified mice
ranges from 13% to 67% in our previous
study11, but the gRNA:Cas9-induced
mutation rate is usually >70%. We also note
that the toxicity (referring to the viability
immediately after microinjection) of a
CRISPR-Cas system (Table 1) is slightly
greater than that of TALENs11,12,14. The
germline transmission rate of the CRISPR-Cas system is similar to that of TALENs.
As with TALENs, we also found that some
targets cannot be disrupted by CRISPR-Cas
for unknown reasons (data not shown), but
the potential targets are more flexible than
with TALENs. Although the targets are
shorter than those of TALENs and ZFNs,
no off-target mutation event using the
CRISPR-Cas system was found in our study,
suggesting it is a reliable technique for gene
editing in animals. More comprehensive
studies, such as the one recently published
by Fu et al.17, will be needed to establish the
relative advantages and disadvantages of the
various genome editing systems.
In this study, we successfully generated
specific gene knockout mice of distinct
genetic backgrounds as well as a gene-specific knockout in Sprague-Dawley rats
using a CRISPR-Cas system. Featuring
highly efficient genomic modification
activity and germline transmission, the
RNA-guided CRISPR-Cas system is a
potentially useful genetic tool for functional
genomic research in mammalian organisms.

Note: Supplementary information is available in the
online version of the paper (doi:10.1038/nbt.2661).
ACKNOWLEDGMENTS
We thank F. Zhang of the Broad Institute of MIT
and Harvard for kindly providing us with the Cas9
expression vector. We thank S. Siwko for comments
and advice. We also thank S.S. Bae and J.-S. Kim of
Seoul National University for helping us to predict
the potential off-target sites. This work was partially
supported by grants from the State Key Development
Programs of China (2012CB910400 to M.L.,
2010CB945403 to D.L.), grants from the National
Natural Science Foundation of China (31171318
to D.L. and 30930055 to M.L.), a grant from the
Science and Technology Commission of Shanghai
Municipality (11DZ2260300) and grants from the
Program for Changjiang Scholars and Innovative
Research Team in University (IRT1119 and IRT1128).
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Dali Li1,4, Zhongwei Qiu1,4, Yanjiao Shao1,
Yuting Chen1, Yuting Guan1, Meizhen Liu1,
Yongmei Li1, Na Gao1, Liren Wang1,
Xiaoling Lu2, Yongxiang Zhao2 &
Mingyao Liu1,3
1Shanghai Key Laboratory of Regulatory Biology,

Institute of Biomedical Sciences and School of
Life Sciences, East China Normal University,
Shanghai, China. 2Biological Targeting Diagnosis
and Therapy Research Center, Guangxi
Medical University, Nanning, Guangxi, China.
3The Institute of Biosciences and Technology,
Texas A&M University Health Science
Center, Houston, Texas, USA. 4These authors
contributed equally to this work.
e-mail: [email protected],
[email protected] or
[email protected]
1. Garneau, J.E. et al. Nature 468, 67–71 (2010).
2. Makarova, K.S. et al. Nat. Rev. Microbiol. 9, 467–477
(2011).
3. Wiedenheft, B., Sternberg, S.H. & Doudna, J.A. Nature
482, 331–338 (2012).
4. Jinek, M. et al. Science 337, 816–821 (2012).
5. Cong, L. et al. Science 339, 819–823 (2013).
6. Mali, P. et al. Science 339, 823–826 (2013).
7. Hwang, W.Y. et al. Nat. Biotechnol. 31, 227–229
(2013).
8. Jiang, W., Bikard, D., Cox, D., Zhang, F. & Marraffini,
L.A. Nat. Biotechnol. 31, 233–239 (2013).
9. Wang, H. et al. Cell 153, 910–918 (2013).
10. Cho, S.W., Kim, S., Kim, J.M. & Kim, J.S. Nat.
Biotechnol. 31, 230–232 (2013).
11. Qiu, Z. et al. Nucleic Acids Res. e120 (2013).
12. Sung, Y.H. et al. Nat. Biotechnol. 31, 23–24 (2013).
13. Cui, X. et al. Nat. Biotechnol. 29, 64–67 (2011).
14. Tesson, L. et al. Nat. Biotechnol. 29, 695–696 (2011).
15. Geurts, A.M. et al. Science 325, 433 (2009).
16. Mul, J.D. et al. Obesity (Silver Spring) 20, 612–621
(2012).
17. Fu, Y. et al. Nat. Biotechnol. advance online publication
http://www.nature.com/doifinder/10.1038/nbt.2623
(23 June 2013).

Simultaneous generation and
germline transmission of multiple
gene mutations in rat using
CRISPR-Cas systems
To the Editor:
CRISPRs are clustered, regularly interspaced,
short palindromic repeats present in many
bacterial and archaeal genomes. Proteins
encoded by CRISPR-associated (Cas) genes
serve as guardians of the genome, which
target foreign DNA at specific sites by means
of small CRISPR RNA (crRNA)-guided DNA
recognition and degradation1–4. Recently,
several groups described how CRISPR-Cas
systems efficiently create site-specific gene
modifications in whole organisms such as
Streptococcus pneumoniae, Escherichia coli,
Danio rerio (zebrafish) and mice, suggesting
their potential application in the production of
genetically engineered organisms5–7, although
germline transmission of the mutations
remains to be shown. Here, we report the
use of CRISPR-Cas systems to generate
multiple gene mutations in rats in a
germline-competent manner.

The laboratory rat is an important model
organism for human disease research.
Several technologies including zinc finger
nuclease (ZFN), transcription activator-like
effector nuclease (TALEN) and isolation
of rat embryonic stem cells have enabled
targeted gene mutation of the rat genome8–10.
Compared with these techniques, CRISPR-Cas
systems provide a gene editing tool that
can more easily be targeted to one or more
genomic loci, as targeting is mediated by
a small RNA that can be simply expressed or
transfected with the Cas9 nuclease.

We chose the Tet family genes as targets
to demonstrate the feasibility of
CRISPR-Cas–mediated mutagenesis in rat. The
three Tet genes encode proteins with
DNA hydroxymethylase functions and
play key roles in the regulation of many
important biological processes11. The
Cas9 endonuclease functions either with a
combination of a short crRNA and a
trans-activating crRNA (tracrRNA), or with a
chimeric single guide RNA (sgRNA). We
designed six sgRNAs targeting six different
genomic sites encoding rat Tet1 (sgTet1-1 and
sgTet1-2), Tet2 (sgTet2-1 and sgTet2-2) and
Tet3 (sgTet3-1 and sgTet3-2), respectively.
Two sgRNAs were chosen for each gene to
increase the likelihood of successful targeting.
The sgRNAs were designed to contain
20-nucleotide customized spacer sequences

[Figure 1, panels a and b: T7-driven Cas9 (NLS, SV40 polyA) and sgRNA constructs with the 20-bp Tet1 target site 1 (GTGGAATATGAAGACATTGC) and its PAM; PCR and Surveyor assay gels for Tet1 (site1), Tet2 (site1) and Tet3 (site1), with indel (%) given under the lanes.]

Figure 1 Targeted disruption in the Tet gene
loci in rat by means of CRISPR-Cas systems.
(a) Engineered CRISPR-Cas systems and their
target Tet1 gene locus used in this study. The
protospacer adjacent motif (PAM) is shown in red
letters, and the 20-base-pair spacer sequence is
shown in purple letters. NLS, nuclear localization
signal. (b) The PCR products encompassing the
targeted sites and their Surveyor assay for indels
in the PCR products. All three Tet genes of each
rat are assayed and presented in the same order.
Individuals with heteroduplexes of wild-type (WT)
and mutated alleles have three bands in the Surveyor
assay; individuals with large indels have PCR
and Surveyor products with shifted sizes. Indel
rate is calculated and labeled under the lanes
with mutation. *, rats with mutations in all three
target sites. (c) Sanger sequencing of targeted
gene alleles in each of the triple-mutated rats
induced by injection of Cas9 enzyme together
with sgTet1-1/sgTet2-1/sgTet3-1. The PAM sites
are red, upper case bold; insertions are red, lower
case; point mutations are blue, lower case. The
sizes of the point mutations (p), insertions (+)
or deletions (Δ) are shown to the right of each
allele. The fractions indicate the read number of
the mutant allele (numerator) out of total read
number (denominator). The targeted locus is
labeled as “biallelic mutation” if no wild-type
allele reads were recovered, as “monoallelic
mutation” if a wild-type allele was recovered, and
as “mosaic” if more than two types of alleles were
recovered (Supplementary Discussion).

[Figure 1, panel c: aligned wild-type and mutant sequence reads at Tet1 (site1), Tet2 (site1) and Tet3 (site1) for Rat 1, Rat 2 and Rat 3, each annotated with mutation size, read fraction and genotype call (biallelic mutation, monoallelic mutation or mosaic).]

Table 1 Multiple gene disruption in rats by means of CRISPR-Cas systems

                                                                                 Mutated rats (%)
                                                                                 Single                           Double                           Triple
Injected sgRNAs             Injected  Transferred  Newborns            Assayed  Tet1       Tet2       Tet3       Tet1/Tet2  Tet1/Tet3  Tet2/Tet3  Tet1/Tet2/Tet3
                            embryos   embryos      (% of transferred)  rats
sgTet3-1/sgTet3-2           130       100 (76.9)   42 (42)             18       -          -          18 (100)   -          -          -          -
sgTet1-1/sgTet2-1           140       105 (75.0)   30 (28.6)           24       16 (67.7)  23 (95.8)  -          15 (62.5)  -          -          -
sgTet1-2/sgTet2-2           120       80 (66.7)    22 (31.2)           20       16 (80.0)  19 (95.0)  -          15 (75.0)  -          -          -
sgTet1-1/sgTet2-1/sgTet3-1  90        70 (77.8)    22 (31.4)           22       15 (68.2)  20 (90.9)  16 (72.7)  15 (68.2)  13 (59.1)  16 (72.7)  13 (59.1)

together with tracrRNA-derived sequences
as previously described1,5 (Supplementary
Table 1 and Supplementary Sequences). For
the production of mRNA for microinjection,
we cloned the Cas9 gene and sgRNA
sequences into separate expression vectors
with a prokaryotic T7 promoter (Fig. 1a).
To generate rats with mutations in Tet3,
we first microinjected the two Tet3-targeting
sgRNAs together with Cas9 mRNA into the
cytoplasm of one-cell-stage rat embryos to
test the in vivo function of the sgRNA:Cas9
system. We transferred 100 out of 130 injected
embryos into pseudopregnant mothers, and
42 pups were obtained (Table 1). All but one
pup died within 2 days. We randomly chose
18 dead pups and used the SURVEYOR assay
to detect insertions or deletions (indels)
within the Tet3 coding region. All of these
(18/18, 100%) had mutations within the
Tet3 coding region, with both designed
sgRNA targeted sites within Tet3 containing
indels. No wild-type Tet3 allele was obtained
in the sequence reads of any examined
dead pups, indicating they had biallelic
mutations of Tet3, which is consistent with
the previously reported neonatal lethality of
a Tet3 knockout12 (Supplementary Fig. 1,
Supplementary Table 2 and Supplementary
Discussion).
As the target specificity of Cas9 is
determined by a small RNA, it should be
relatively straightforward to target
multiple genomic locations simultaneously, a
potential advantage of the Cas9 technology.
We tested the efficiency of the CRISPR-Cas
systems in mutating two genes simultaneously
by co-injecting the Tet1- and Tet2-targeting
sgRNAs (only one sgRNA was chosen for
each gene) together with the Cas9 mRNA
into one-cell-stage rat embryos. We obtained
30 rats from embryos injected with one set
of sgRNAs (set 1: sgTet1-1 and sgTet2-1) and
22 rats from embryos injected with another
set of sgRNAs (set 2: sgTet1-2 and sgTet2-2)
(Table 1). We selected 24 and 20 rats from
each group, respectively, to examine their
mutations. SURVEYOR assays showed that
~70% (set 1, 16/24; set 2, 16/20) of the rats
contained Tet1 mutations and 95% (set 1,
23/24; set 2, 19/20) contained Tet2 mutations,
of which >65% (set 1, 15/24; set 2, 15/20)
contained both mutations (Table 1). Sanger
sequencing confirmed that the targeted sites
within both Tet1 and Tet2 in the double-mutant rats contained indels. Analysis of
sequence reads by rat revealed that most
double-mutant rats had at least one allele
mutated for each target gene, and >40% of
the mutated rats of the sgTet1-2 and sgTet2-2
(set 2) group had biallelic mutations of both
genes, showing the CRISPR-Cas systems
can simultaneously produce two-gene,
biallelic mutations in rats (Supplementary
Figs. 2 and 3, Supplementary Table 2 and
Supplementary Discussion). All the mutated
rats survived, consistent with a previous
functional report of Tet1 and Tet2 knockout
in mice13.
Next, we analyzed the triple-gene mutation
efficiency of the CRISPR-Cas systems through
co-injection of sgRNAs for the three Tet genes
(sgTet1-1, sgTet2-1 and sgTet3-1) together
with Cas9 mRNA into one-cell-stage rat
embryos. A total of 22 newborn pups were
obtained from the 70 transferred embryos.
We examined all the rats and found that 13
rats (13/22, 59.1%) contained mutations
of all three Tet genes (Fig. 1b). All these
triple-mutated rats contained site-specific
indels in all three gene coding regions, and
~60% of them had a biallelic or monoallelic
mutation for each targeted locus (Fig. 1c,
Supplementary Fig. 4 and Supplementary
Table 2). Some of the mutations led to
truncated proteins lacking the catalytic
domain, potentially disrupting gene function
(Supplementary Fig. 5). Consequently,
we observed reduced 5-hydroxymethyl
cytosine levels in some triple-mutated rats
(Supplementary Fig. 6). These results
demonstrate that simultaneous triple-gene
mutations can be achieved in rat by the
CRISPR-Cas systems.
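
The genotype labels used for each targeted locus follow a rule stated in the Figure 1 legend: biallelic if no wild-type reads were recovered, monoallelic if a wild-type allele was recovered, and mosaic if more than two types of alleles were recovered. A minimal sketch of that rule follows; the function name, allele labels and read counts are illustrative, and treating "mosaic" as a qualifier added to the other two labels is an assumption of this sketch.

```python
from collections import Counter


def classify_locus(reads, wild_type="WT"):
    """Label one locus from per-read allele calls, following the legend's rule."""
    counts = Counter(reads)
    mutant_reads = sum(n for allele, n in counts.items() if allele != wild_type)
    if mutant_reads == 0:
        return "wild-type"
    label = ("monoallelic mutation" if counts.get(wild_type, 0) > 0
             else "biallelic mutation")
    if len(counts) > 2:  # more than two allele types recovered
        label += " (mosaic)"
    return label


# Illustrative read sets; "d7" means a 7-bp deletion, "p1" a point mutation.
examples = {
    "two deletions, no WT": ["d7"] * 10 + ["d8"] * 5,
    "WT plus one deletion": ["WT"] * 8 + ["d3"] * 6,
    "WT plus two mutant alleles": ["WT"] * 5 + ["p1"] * 4 + ["d3"] * 1,
}
for name, reads in examples.items():
    print(name, "->", classify_locus(reads))
```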

We observed that all six sgRNA:Cas9
constructs could efficiently introduce targeted
indels, including point mutations, insertions
and deletions ranging from ~1 bp to ~180 bp
(Supplementary Figs. 1–4). Previous studies
suggested that CRISPR-Cas systems may
tolerate sequence mismatches distal from
the protospacer adjacent motif (PAM) at the
5ʹ end of sgRNAs, which would probably
induce off-target mutations. To detect off-target effects of the sgRNA:Cas9 constructs
in vivo, we screened the genome of 13 triple-mutated rats for all off-target sites with
more than 14-bp sequence identity to the six
sgRNAs. Only four such sites were identified
for all six sgRNAs (Supplementary Fig. 7).
The SURVEYOR assay revealed that only
one potential site was mutated by sgTet1-1
in all the examined triple-mutated rats
(Supplementary Fig. 8). The high birth rate
and survival rate of the mutant rats (except
for the neonatal lethal gene Tet3) indicated
that the sgRNA:Cas9 construct had very low
toxicity to rat embryos (Table 1).
Successful germline transmission of
genetic mutations is essential for establishing
genetically modified animal models. To test
the transmission ability of the mutation, we
produced fertilized embryos with sperm
from Tet1/Tet2 double-mutant rats and wild-type oocytes. Of seven embryos successfully
assayed by Sanger sequencing, we identified
five inherited mutations in the Tet2 site only,
and two inherited mutations in both the
Tet1 and Tet2 sites, indicating that mutations
in the founder rats were transmitted to the
next generation (Supplementary Fig. 9).
Because a high ratio of mutant rats had
biallelic mutations or nonmosaic monoallelic
mutations for all the targeted genes, we expect
that the mutations introduced by sgRNA:Cas9
could be efficiently transmitted into further
generations (Supplementary Table 2).
In summary, our results demonstrate
that CRISPR-Cas systems efficiently and
simultaneously generated single and multiple
gene mutations in vivo in rats. During the
revision process of this work, an independent
study reported the simultaneous generation
of multiple mutations in mice7. Our work,
together with the mouse work, demonstrates
that it should be feasible to produce gene-targeted models in rodents and probably
other mammalian species using CRISPR-Cas systems.
Note: Supplementary information is available in the
online version of the paper (doi:10.1038/nbt.2652).

ACKNOWLEDGMENTS
This study was supported by grants from the National
Basic Research Program of China 2012CBA01300,
“Strategic Priority Research Program” of the Chinese
Academy of Sciences XDA01020101, the National
Science Foundation of China 90919060 and the
Ministry of Science and Technology of China
2011CBA01101 (all to Q.Z.).
AUTHOR CONTRIBUTIONS
Q.Z. designed the experiments, supervised laboratory
work, analyzed and interpreted data; Q.Z. and W.L.
wrote the paper; W.L., F.T. and T.L. performed the
experiments.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Wei Li1,3, Fei Teng1–3, Tianda Li1 & Qi Zhou1
1State Key Laboratory of Reproductive Biology,

Institute of Zoology, Chinese Academy of
Sciences, Beijing, China. 2University of Chinese
Academy of Sciences, Beijing, China. 3These
authors contributed equally to this work.
e-mail: [email protected]
1. Jinek, M. et al. Science 337, 816–821 (2012).
2. Cho, S.W., Kim, S., Kim, J.M. & Kim, J.S. Nat.
Biotechnol. 31, 230–232 (2013).
3. Cong, L. et al. Science 339, 819–823 (2013).
4. Mali, P. et al. Science 339, 823–826 (2013).
5. Hwang, W.Y. et al. Nat. Biotechnol. 31, 227–229
(2013).
6. Jiang, W., Bikard, D., Cox, D., Zhang, F. & Marraffini,
L.A. Nat. Biotechnol. 31, 233–239 (2013).
7. Wang, H. et al. Cell 153, 910–918 (2013).
8. Geurts, A.M. et al. Science 325, 433 (2009).
9. Tong, C., Li, P., Wu, N.L., Yan, Y. & Ying, Q.L. Nature
467, 211–213 (2010).
10. Tesson, L. et al. Nat. Biotechnol. 29, 695–696 (2011).
11. Wu, H. & Zhang, Y. Genes Dev. 25, 2436–2452
(2011).
12. Gu, T.P. et al. Nature 477, 606–610 (2011).
13. Dawlaty, M.M. et al. Dev. Cell 24, 310–323 (2013).

Targeted genome modification of
crop plants using a CRISPR-Cas
system
To the Editor:
Although genome editing technologies
using zinc finger nucleases (ZFNs)1
and transcription activator-like effector
nucleases (TALENs)2 can generate genome
modifications, new technologies that are
robust, affordable and easy to engineer are
needed. Recent advances in the study of
the prokaryotic adaptive immune system,
involving type II clustered, regularly
interspaced, short palindromic repeats
(CRISPR), provide an alternative genome
editing strategy3. Type II CRISPR systems
are widespread in bacteria; they use a single
endonuclease, a CRISPR-associated protein
Cas9, to provide a defense against invading
viral and plasmid DNAs4. Cas9 can form a
complex with a synthetic single-guide RNA
(sgRNA), consisting of a fusion of CRISPR
RNA (crRNA) and trans-activating crRNA.
The sgRNA guides Cas9 to recognize
and cleave target DNA. Cas9 has a HNH
nuclease domain and a RuvC-like domain;
each cleaves one strand of a double-stranded DNA. It can be used as an RNA-guided endonuclease to perform sequence-specific genome editing in bacteria, human
cells, zebrafish and mice5–11. Here we
show that customizable sgRNAs can direct
Cas9 to induce sequence-specific genome
modifications in the two most widely
cultivated food crops, rice (Oryza sativa)
and common wheat (Triticum aestivum).
We first codon-optimized Streptococcus
pyogenes Cas9 (SpCas9), attached nuclear
localization signals (NLSs) at both ends
(Fig. 1a and Supplementary Fig. 5)
and expressed sgRNA transcripts (Fig.
1a, Supplementary Methods and
Supplementary Fig. 4). To disrupt
endogenous genes in rice protoplasts, we
designed two sgRNAs, SP1 and SP2, which
target different DNA strands of the rice
phytoene desaturase gene OsPDS (Fig. 1b
and Supplementary Table 4). Efficient,
targeted mutagenesis (15%) was detected
starting at 18 h of protoplast cultivation,
and similar, if not higher, efficiencies
were observed from 24 h through 72
h (Supplementary Fig. 1a,b). PCR/
restriction enzyme (PCR/RE) assays were
carried out to detect mutations in both
target regions (Supplementary Methods
and Supplementary Table 5). Digestion-resistant bands were detected in both
sgRNA targets with efficiencies ranging
from 14.5% to 20.0%, as estimated by band
intensities (Fig. 1c and Supplementary
Methods). Cloning and sequencing of these
uncut bands revealed indels in the targeted
OsPDS gene. The highest frequency of
mutations was obtained with an sgRNA
with 20 nucleotides (nts) of sequence
complementary to the OsPDS-SP1 target
site (P = 0.039) (Supplementary Fig. 1c,d).
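
The band-intensity estimate used in the PCR/RE assay can be sketched as follows. This is a minimal illustration only: it assumes that the mutation frequency is the intensity of the digestion-resistant (uncut) band divided by the summed intensity of the uncut and cut bands. The exact formula is given in the Supplementary Methods and is not reproduced here; the function name `indel_percent` and the intensity values are hypothetical.

```python
def indel_percent(uncut, cut_bands):
    """Estimate indel frequency (%) from gel band intensities.

    Assumption (not taken from the paper): frequency = uncut / (uncut + sum(cut)).
    """
    total = uncut + sum(cut_bands)
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * uncut / total


# Hypothetical intensities in arbitrary units, chosen for illustration only.
print(round(indel_percent(145.0, [500.0, 355.0]), 1))
```

Because a mutation at the cleavage site destroys the restriction site, mutant amplicons stay uncut, which is why the uncut fraction serves as the estimate in this sketch.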
We targeted another three rice genes
(OsBADH2, Os02g23823 and OsMPK2) and
one wheat gene (TaMLO) (Supplementary
Tables 1 and 4) in protoplasts, with indel
frequencies of 26.5–38.0% (Supplementary
Fig. 2 and Supplementary Table 2).
The frequency of mutations induced by
sgRNA:Cas9 in Os02g23823 was lower
(26.0%) than that induced by TALENs
(36.5%) (Supplementary Fig. 1e), whereas,
in OsBADH2, it was considerably higher12
(26.5% versus 8.0%) (Supplementary Fig. 2
and Supplementary Table 7). Our results
suggest that a customized sgRNA:Cas9
efficiently induces sequence-specific
modifications in plants. Moreover, only a
single customized sgRNA, encoded by a
sequence of ~100 nt, is required to target a
specific sequence, and Cas9 does not have
to be reengineered for each new target
site. The sgRNA:Cas9 system is therefore
much more straightforward than ZFNs or
TALENs.
To test whether sgRNA:Cas9 can induce
gene knockouts in rice plants, we bombarded
rice callus cells with Cas9 plasmid and
sgRNA expression plasmids designed
to cleave either OsPDS or OsBADH2
(Supplementary Methods). Transformed,
hygromycin-tolerant calli were grown into
whole plants. Mutations in OsPDS-SP1 were
identified in 9 of 96 independent transgenic
plants (9.4%), and mutations in OsBADH2
in 7 of 98 transgenic plants (7.1%) (Fig. 1d
and Supplementary Table 2). In addition,
biallelic mutations were identified in 3 of
the 9 plants mutated in OsPDS-SP1. Two
of them were homozygous for the same
one-nucleotide insertion (Fig. 1d), and all
three had the albino and dwarf phenotype
(Fig. 1e), showing that the rice phytoene
desaturase gene had been disrupted.
To examine homology-directed repair
(HDR), we designed a single-stranded oligo
with a KpnI + EcoRI site to be introduced
into OsPDS (Fig. 2a and Supplementary
Table 6). To detect such mutations, we
used a PCR/RE assay that preferentially
amplifies mutated DNA sequences.
Protoplast genomic DNA was cleaved with
PstI before PCR amplification to enrich
for sgRNA:Cas9-induced mutations.

volume 31 NUMBER 8 AUGUST 2013 nature biotechnology

correspondence

© 2013 Nature America, Inc. All rights reserved.

Figure 1 Genome editing in rice and wheat using an engineered type II
CRISPR-Cas system. (a) Schematic illustrating the engineered type II
CRISPR-Cas system. The Cas9 HNH and RuvC-like domains each cleave one
strand of the sequence targeted by the sgRNA, providing that the correct
protospacer-adjacent motif sequence (PAM) is present at the 3ʹ end. NLSs,
nuclear localization signals. (b) Schematic of the OsPDS gene with the two
sgRNA:Cas9 targets (blue) and corresponding PAMs (red). A PstI site is
underlined. (c) PCR/RE assay to detect engineered sgRNA:Cas9-induced
mutations in protoplasts. Lanes 1 and 2, PCR products of samples treated
with the respective sgRNA:Cas9. Lanes 3 and 4, undigested and digested
wild-type controls, respectively. Red arrowhead indicates the band used for
quantification. The numbers at the bottom of the gels indicate mutation
frequencies measured by band intensities. Deletions and insertions are
indicated by dashes and red letters, respectively. Numbers on the side
indicate types of mutation and numbers of nucleotides involved. Percent
indels (%) were calculated from band intensities. (d) sgRNA:Cas9-induced
OsPDS-SP1 and OsBADH2 mutations in transgenic rice plants. DNA
samples from independent transgenic rice seedlings were analyzed for
mutations by the PCR/RE assay and sequencing. In the top gel, lanes 4,
5, 16, 19 and 20 are monoallelic mutants of OsPDS; lanes 8 and 13 are
biallelic mutants of OsPDS. In the bottom gel, lanes 4, 8, 10 and 12 are
monoallelic mutants of OsBADH2. Red arrowheads indicate bands used for
mutation identification. (e) Phenotypes of the pds mutants.
(1) Nontransgenic wild-type rice plant. (2) Monoallelic mutant. (3) Biallelic
homozygous mutant. (4) Biallelic heterozygous mutant. Mutants 3 and 4
have the albino and dwarf phenotype.

PCR products were verified by cloning,
restriction digestion with KpnI or EcoRI,
and DNA sequencing. Two of 29 single
colonies had the expected insertion of the
KpnI + EcoRI site into OsPDS (Fig. 2b–d),
demonstrating the possibility of HDR-mediated genome modification by co-transformation of Cas9, sgRNA and single-stranded DNA oligos into plant cells.
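
The layout of such a donor oligo can be sketched in a few lines. The KpnI (GGTACC) and EcoRI (GAATTC) recognition sequences together give the 12-bp insertion; everything else here is an assumption for illustration: the helper `build_donor` is hypothetical, the homology arms are placeholder sequences rather than OsPDS sequence, and the 60 nt of homology in the 72-nt oligo are assumed to be split evenly into two 30-nt arms.

```python
KPNI = "GGTACC"   # KpnI recognition site
ECORI = "GAATTC"  # EcoRI recognition site


def build_donor(left_arm, right_arm, insert=KPNI + ECORI):
    """Join two homology arms around the restriction-site insert."""
    parts = (("left_arm", left_arm), ("right_arm", right_arm), ("insert", insert))
    for name, seq in parts:
        if set(seq.upper()) - set("ACGT"):
            raise ValueError(f"{name} contains non-ACGT characters")
    return (left_arm + insert + right_arm).upper()


# Placeholder arms (hypothetical, not the OsPDS sequence), 30 nt each.
left = "ACGT" * 7 + "AC"
right = "TGCA" * 7 + "TG"
donor = build_donor(left, right)
print(len(donor))      # 72
print(donor[30:42])    # GGTACCGAATTC
```

Clones carrying the insertion then become cleavable by KpnI and EcoRI, which is the readout used in the digestion assay.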
We next evaluated potential off-target
effects of two sgRNA:Cas9 constructs
targeting the OsMPK2 or OsPDS genes.
Three nearly identical sequences, PDS_NI-1,
MPK2_NI-1 and MPK2_NI-2, with one-base
or three-base mismatches to the PDS-SP1
and MPK2 target sites were identified from
the rice genome (Supplementary Fig. 3a).

Using the PCR/RE assay (Supplementary
Table 5), no evidence of sgRNA:Cas9-induced
mutation was found in PDS_NI-1 and
MPK2_NI-2. In contrast, several deletion
events were identified from the MPK2 target
site to the MPK2_NI-1 site, 30 bp downstream
of the MPK2 target site (Supplementary
Fig. 3b). These deletion events could be the
products of rejoining broken ends generated
by separate cleavages at both the MPK2_NI-1
and MPK2 target sites, suggesting that off-target cleavage can occur in homologous
sequences13. They could also be explained
by homologous recombination between the
nearly identical sites induced by a single
sgRNA:Cas9-mediated cleavage in the
target site. Comprehensive studies using
genome-wide approaches are required to
thoroughly address the off-target issue for the
sgRNA:Cas9 system.
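
A simple way to list candidate near-identical sites, such as those with one-base or three-base mismatches, is a sliding-window mismatch count. The sketch below is not the authors' search procedure: the function names are hypothetical, it scans only the forward strand, it ignores PAM requirements, and the toy sequences are invented for illustration.

```python
def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def near_identical_sites(genome, target, max_mismatches=3):
    """Return (position, site, mismatches) for windows that differ from
    the target by 1..max_mismatches bases (exact matches are skipped)."""
    genome, target = genome.upper(), target.upper()
    k = len(target)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        d = hamming(window, target)
        if 0 < d <= max_mismatches:
            hits.append((i, window, d))
    return hits


# Toy example (hypothetical sequences): one exact site and one 1-mismatch site.
toy_target = "AAAAACCCCC"
toy_genome = "GGGG" + "AAAAACCCCC" + "GGGG" + "AAAATCCCCC" + "GGGG"
print(near_identical_sites(toy_genome, toy_target, max_mismatches=1))
# [(18, 'AAAATCCCCC', 1)]
```

Sites found this way are only candidates; whether they are cleaved still has to be tested experimentally, for example with the PCR/RE assay.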
The system described can in principle
target any sequences, such as 5ʹ-A-N(20)-GG-3ʹ and 5ʹ-G-N(20)-GG-3ʹ in rice and
wheat, respectively. Use of the rice U3
promoter and the wheat U6 promoter
constrains the first positions in the
corresponding RNA transcripts to be ‘A’
and ‘G’, respectively. A computer search
generated ~3,183,497 and ~566,367
sequences specifically targetable by sgRNAs
in the rice genome and rice cDNAs,
respectively, representing nearly nine targets
per cDNA (Supplementary Table 3).
Loosening these constraints to target
sequences of the form 5ʹ-A-N(19–21)-GG-3ʹ
identified 32 targets on average per cDNA.
The wheat A and D genomes yielded similar
results (Supplementary Table 3). Our
findings establish that the sgRNA:Cas9
system can be used for rice and wheat
genome modification, the first plants shown
to be amenable to this gene editing approach.

Figure 2 HDR-mediated genome modification in rice protoplasts.
(a) Schematic of the oligo targeting site in OsPDS. The sgRNA targeting
sequence is in blue, and the PAM sequence in red. The 72-bp donor oligo is
shown under the target site, with 12-bp insertions (KpnI + EcoRI site) in
green. (b) PCR amplification of the protoplast genomic DNA predigested with
PstI to enrich for sgRNA:Cas9-induced mutations. Specific 1F and 1R primers
were used. HR, homologous recombination. (c) Targeted integration of the
KpnI and EcoRI restriction sites. The enrichment PCR product (+Cas9, +sgRNA,
+HR template) was cloned into pEASY-Blunt vector (TransGen). Lanes 1–12,
representative PCR products of cloned alleles for digesting assay; –RE, PCR
amplification of colonies with M13F/R primers; +PstI, +KpnI and +EcoRI, PCR
products digested with PstI, KpnI and EcoRI, respectively. Two cloned alleles
(no. 3 and no. 8, arrowhead) with KpnI and EcoRI insertions were identified.
(d) Sanger sequencing results for cloned alleles no. 3 and no. 8 show
HDR-mediated targeting. Inserted sequences are labeled in red.
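
The target-site search described in the text (5ʹ-A-N(20)-GG-3ʹ for rice, 5ʹ-G-N(20)-GG-3ʹ for wheat, and the loosened N(19–21) form) can be approximated with a regular-expression scan of both strands. This is a sketch under stated assumptions, not the authors' program: the function names and the toy sequence are hypothetical, and the scan does not check that a site is unique in the genome, which the reported counts of specifically targetable sequences would require.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_targets(seq, first_base="A", n_min=20, n_max=20):
    """List (strand, start, site) for 5'-<first_base>-N(n)-GG-3' sites.

    Start is 0-based on the scanned strand; for "-" it indexes the
    reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for n in range(n_min, n_max + 1):
            # Lookahead keeps overlapping sites that a plain search would skip.
            pattern = re.compile(rf"(?=({first_base}[ACGT]{{{n}}}GG))")
            for m in pattern.finditer(s):
                hits.append((strand, m.start(), m.group(1)))
    return hits


toy = "TT" + "A" + "C" * 20 + "GG" + "TT"  # hypothetical sequence
for hit in find_targets(toy):
    print(hit)
```

Calling `find_targets(seq, first_base="G")` gives the wheat U6 form, and `find_targets(seq, n_min=19, n_max=21)` gives the loosened form.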
Note: Supplementary information is available in the
online version of the paper (doi:10.1038/nbt.2650).
ACKNOWLEDGMENTS
This work was supported by the National Natural
Science Foundation of China (201263, 383601
and 31200273), the Ministry of Agriculture of
China (2011ZX08002-004 and 2013ZX08010-002) and Chinese Academy of Sciences (KSCX2-EW-N-06, KSCX2-EW-J-6). We thank X. Wang for
bioinformatics analysis.
AUTHOR CONTRIBUTIONS
Q.S., Y.W., J.L., Y.Z., K.C., Z.L., J.J.X., J.-L.Q. and C.G.
designed the experiments; Q.S., Y.W., J.L., Y.Z., K.Z.
and J.L. performed experiments; Q.S., Y.W., J.L.,
J.-L.Q. and C.G. wrote the paper.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Qiwei Shan1,4, Yanpeng Wang1,4, Jun Li1,4,
Yi Zhang1, Kunling Chen1, Zhen Liang1,
Kang Zhang1, Jinxing Liu1, Jianzhong Jeff Xi2,
Jin-Long Qiu3 & Caixia Gao1
1State Key Laboratory of Plant Cell and

Chromosome Engineering, Institute of Genetics
and Developmental Biology, Chinese Academy of
Sciences, Beijing, China. 2Institute of Molecular
Medicine, Peking University, Beijing, China.
3State Key Laboratory of Plant Genomics,
Institute of Microbiology, Chinese Academy
of Sciences, Beijing, China. 4These authors
contributed equally to this work.
e-mail: [email protected] or [email protected]
1. Zhang, F. et al. Proc. Natl. Acad. Sci. USA 107, 12028–12033 (2010).
2. Chen, K. & Gao, C. J. Genet. Genomics 40, 271–279 (2013).
3. Mussolino, C. & Cathomen, T. Nat. Biotechnol. 31, 208–209 (2013).
4. Barrangou, R. et al. Science 315, 1709–1712 (2007).
5. Jinek, M. et al. Science 337, 816–821 (2012).
6. Gasiunas, G., Barrangou, R., Horvath, P. & Siksnys, V. Proc. Natl. Acad. Sci. USA 109, E2579–E2586 (2012).
7. Cong, L. et al. Science 339, 819–823 (2013).
8. Jiang, W., Bikard, D., Cox, D., Zhang, F. & Marraffini, L.A. Nat. Biotechnol. 31, 233–239 (2013).
9. Hwang, W.Y. et al. Nat. Biotechnol. 31, 227–229 (2013).
10. Chang, N. et al. Cell Res. 23, 465–472 (2013).
11. Wang, H. et al. Cell 153, 910–918 (2013).
12. Shan, Q. et al. Mol. Plant 6, 1365–1368 (2013).
13. Fu, Y. et al. Nat. Biotechnol. advance online publication, doi:10.1038/nbt.2623 (23 June 2013).

Multiplex and homologous
recombination–mediated genome
editing in Arabidopsis and
Nicotiana benthamiana using
guide RNA and Cas9
To the Editor:
Elucidation and manipulation of human,
animal and plant genomes is key to basic
biology research, medical advances and crop
improvement. The development of targeted
genome editing, particularly homologous
recombination–based gene replacement,
is of great value in all organisms. Recent
advances in engineered nucleases with
programmable DNA-binding specificities,
such as zinc finger nucleases (ZFNs)
and transcription activator-like effector
nucleases (TALENs), have provided valuable
means to create targeted mutations in
metazoan and plant genomes with high
specificity1–6. However, these technologies
demand elaborate design and assembly
of individual DNA-binding proteins for
each DNA target site1–6. Recently, a simple,
versatile and efficient genome engineering
technology has been developed based on the
bacterial clustered, regularly interspaced,
short palindromic repeats (CRISPR)-associated protein (Cas) adaptive immune
systems7. In a type II CRISPR-Cas system
from Streptococcus pyogenes, a single
Cas9 endonuclease guided by a duplex of
mature CRISPR RNA (crRNA) and trans-activating crRNA (tracrRNA) cleaves
trespassing DNA from bacteriophage or
plasmids in a sequence-specific manner7.
By reconstitution of the S. pyogenes Cas9
(SpCas9) and an artificial chimera of crRNA
and tracrRNA called synthetic-guide RNA
(sgRNA) in eukaryotic cells, including yeast,
zebrafish, mouse and human cells, targeted
genome editing has been achieved through
either error-prone nonhomologous end
joining (NHEJ) or homology-directed repair
(HDR) of the intended cleavage site7–14.
Here, we show the feasibility and efficacy
of sgRNA:Cas9–based genome editing

technology in the model plants Arabidopsis
thaliana and Nicotiana benthamiana.

Figure 1 panels (image not reproduced): (a) Protoplast co-transfection
constructs; labels: 35SPPDK, FLAG, NLS, IV2 intron, pcoCas9, NLS, NOS; U6
(Pol III), guide, sgRNA scaffold, TTTTTT. (b) Cas9 and sgRNA on chromosomal
AtPDS3 exon 6, with target sequence and PAM. (c) AtPDS3, sgRNA:pcoCas9 = 1:1,
5.6% (10/180) mutated. (d) AtFLS2, sgRNA:pcoCas9 = 1:1, 1.1% (2/190) mutated.
(e) NbPDS3 target 1, sgRNA:pcoCas9 = 1:1, 37.7% (43/114) mutated. (f) NbPDS3
target 2, sgRNA:pcoCas9 = 1:1, 38.5% (25/65) mutated. The other ratios shown
in c–f gave 0% mutated (0/180, 0/93, 0/191, 0/95, 0/58 and 0/67). (g) Plant
seedling/leaf agroinfiltration plasmids; labels: LB, 35S, pcoCas9, NOS, RB;
U6, guide, sgRNA scaffold, TTTTTT. (h,i) AtPDS3 and NbPDS3 target 1; 2.7%
(9/336), 4.8% (5/105), 0% (0/94) and 0% (0/76) mutated.

Figure 1 Targeted plant genome editing by sgRNA:pcoCas9. (a) sgRNA:pcoCas9 constructs for protoplast
co-transfection. NLS, nuclear localization sequence. (b) Diagram of the sgRNA:pcoCas9 complex
targeting the Arabidopsis AtPDS3 exon 6. (c–f) Targeted genome editing on AtPDS3 (c) and AtFLS2
(d) in Arabidopsis protoplasts and NbPDS (e,f) in Nicotiana benthamiana protoplasts. (g) Binary plasmids
for genome editing of AtPDS3 and NbPDS in Arabidopsis and N. benthamiana plants, respectively,
by Agrobacterium-mediated transient gene expression. (h,i) Targeted genome editing on AtPDS3 in
Arabidopsis seedlings (h) and NbPDS in N. benthamiana leaves (i). The mutation rate in c–g was
calculated based on the mutant/total alleles of randomly selected clonal amplicons of the target locus. In
c–f, h and i, blue shadow marks the target sequence recognized by cognate sgRNA. PAM, the protospacer
adjacent motif. DNA insertions, deletions and point mutations are shown in red as upper case letters,
dashes and lower case letters, respectively. The upright arrow and number in red indicate a long insertion.
To explore the use of sgRNA:Cas9
technology for plant genome engineering,
we first expressed a plant codon–optimized
SpCas9 (pcoCas9) and an sgRNA
targeting Arabidopsis thaliana PDS3
(PHYTOENE DESATURASE) (Fig. 1a,b and
Supplementary Sequences) in Arabidopsis
mesophyll protoplasts, which are freshly
isolated leaf cells without cell walls. The
protoplast transient expression system
supports highly efficient DNA co-transfection
and protein expression15. The pcoCas9 was
expressed under the hybrid constitutive
35SPPDK promoter15, whereas the sgRNA
was transcribed from the Arabidopsis U6
polymerase III promoter (Fig. 1a). Notably,
pcoCas9 was expressed at a substantially
higher level than the humanized SpCas9
(ref. 9) using the same expression vector in
Arabidopsis protoplasts (Supplementary
Fig. 1). In addition, pcoCas9 encodes nuclear
localization sequences at both protein
termini (Fig. 1a) for optimal protein nuclear
localization8. A potato IV2 intron (Fig. 1a)
was inserted to minimize adverse effects on
bacterial growth16 resulting from potential
leaky expression and nuclease activities of
pcoCas9 in Escherichia coli during cloning.
To determine the mutagenesis efficiency
of the sgRNA:pcoCas9 system in Arabidopsis
protoplasts, we cloned and Sanger-sequenced
genomic PCR (gPCR) amplicons of the
target region using total genomic DNA
(gDNA) from transfected protoplasts as
templates (Supplementary Methods).
With a DNA ratio of sgRNA:pcoCas9 at
1:1 during co-transfection, we detected ten
mutated AtPDS3 target alleles among 180
randomly sequenced amplicons, reaching an
approximate mutagenesis frequency of 5.6%
(Fig. 1c). Of note, a ratio of sgRNA:pcoCas9
at 1:19 failed to induce any mutation in 180
sequenced amplicons, and no mutation was
detected among 93 sequenced amplicons
when pcoCas9 was expressed alone (Fig. 1c).
For a second gene, AtFLS2 (FLAGELLIN
SENSITIVE 2), tested in Arabidopsis
protoplasts, the sgRNA:pcoCas9-mediated
mutagenesis also only occurred with a DNA
ratio of sgRNA:pcoCas9 at 1:1 but not at 1:19
(Fig. 1d). In this case, a lower mutagenesis
frequency (1.1%) was observed (Fig. 1d).
Taken together, these results suggest that
sgRNA expression is the limiting factor for
optimal targeting and mutagenesis in plant
cells, as in human cells12.
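
The frequencies above are simple ratios of mutated to sequenced amplicons, and a confidence interval shows how much uncertainty such small clone counts carry. The sketch below recomputes the AtPDS3 value (10 of 180) and the AtFLS2 value (2 of 190, as labeled in Fig. 1d). The Wilson score interval is an addition for illustration and is not part of the reported analysis; the function names are hypothetical.

```python
import math


def mutation_frequency(mutated, total):
    """Percentage of mutated alleles among sequenced amplicons."""
    return 100.0 * mutated / total


def wilson_interval(mutated, total, z=1.96):
    """Approximate 95% Wilson score interval, returned in percent."""
    p = mutated / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return 100 * (centre - half), 100 * (centre + half)


for label, mutated, total in (("AtPDS3", 10, 180), ("AtFLS2", 2, 190)):
    low, high = wilson_interval(mutated, total)
    freq = mutation_frequency(mutated, total)
    print(f"{label}: {freq:.1f}% (interval {low:.1f}% to {high:.1f}%)")
```

The interval does not reach zero width when no mutations are seen, so 0 of 180 amplicons at the 1:19 ratio still leaves room for a low, undetected mutation rate.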
To extend the application of
sgRNA:pcoCas9-mediated genome editing
to other plant systems, we carried out
a parallel study using N. benthamiana
protoplasts. Notably, we targeted NbPDS (the
N. benthamiana ortholog of AtPDS3) at two
different sites, and obtained substantially
higher mutagenesis frequencies than in
Arabidopsis, namely 37.7% for the first
target site (Fig. 1e) and 38.5% for the second
target site (Fig. 1f). The sgRNA:pcoCas9-induced mutagenesis frequently led to
considerable DNA deletions or insertions
but rare single-nucleotide (nt) substitutions
in N. benthamiana cells (Fig. 1e,f and
Supplementary Fig. 2a,b), as in animal
and human cells displaying relatively high
mutation rates (e.g., 37.6% in K562 cells
and 24.6% in 293T cells)7–14. In contrast,
single-nucleotide deletions, insertions or
substitutions were most frequently detected in
Arabidopsis cells with relatively low mutation
rates ranging from 1.1% to 5.6% (Fig. 1c,d).
The use of high-fidelity DNA polymerase
in amplifying these short (~300 bp) target
regions (Supplementary Methods) and the
absence of mutagenesis in control experiments
(Fig. 1c,d) excluded the possibility that the
single-nucleotide mutations observed in
Arabidopsis protoplasts were introduced
through PCR amplification. Our results
demonstrate that the sgRNA:pcoCas9 system
is effective in plant cells. Whether the different
genome mutagenesis frequencies and patterns
in Arabidopsis and N. benthamiana are due to
distinct plant genotypes or physiological states
requires future investigation.

Figure 2 panels (image not reproduced): (a) AtRACK1a, AtRACK1b and AtRACK1c
target sequences with PAM and seed sequence. (b) AtRACK1a, 0% (0/94) mutated;
AtRACK1b, 2.7% (2/75) mutated; AtRACK1c, 2.5% (2/79) mutated. (c) Tandem
construct; labels: U6 (Pol III), guide 1, sgRNA scaffold, TTTTTT; U6 (Pol III),
guide 2, sgRNA scaffold, TTTTTT. (d) AtPDS3 target 1 and target 2 with PAMs;
7.7% (7/91) mutated. (e) NbPDS locus and HDR template; labels: 533 bp, 114 bp,
AvrII. (f) Gel with lanes marked for pcoCas9, sgRNA, HDR template, CYCD3 and
AvrII; HDR (%): 0, 0, 10.7, 11.1; band sizes 197, 141 and 56 bp. (g) NbPDS,
9.0% (14/155) with AvrII.

Figure 2 Multiplex and HDR-mediated genome editing by sgRNA:pcoCas9 in plant cells. (a) sgRNA
targeting two genes. AtRACK1b and AtRACK1c but not AtRACK1a from the Arabidopsis RACK1 family
are sgRNA targets. The target sequence recognized by sgRNA is highlighted in blue and the sgRNA
seed sequence is underlined. (b) Targeted mutations induced by sgRNA:pcoCas9 in AtRACK1b and
AtRACK1c but not AtRACK1a in Arabidopsis protoplasts. (c) A tandem sgRNA construct. (d) Large
genomic deletions are induced by double sgRNAs targeting the AtPDS3 locus. In b and d, DNA
insertions, deletions and point mutations are shown in red as upper case letters, dashes and lower case
letters, respectively. (e) HDR strategy. Successful HDR creates an AvrII site in the target sequence of
the NbPDS locus. The arrows represent the primers for gPCR amplification of the target region. (f) AvrII
digestion products (marked by asterisks) of NbPDS target amplicons exist upon successful HDR in
N. benthamiana protoplasts. Arabidopsis cyclin D–type 3 (CYCD3), a master activator of the cell cycle.
(g) DNA sequencing evidence of successful HDR in the presence of pcoCas9, sgRNA and HDR template.
To validate the occurrence of the
sgRNA:pcoCas9-induced targeted
mutagenesis in PDS in planta, we transiently
co-expressed pcoCas9 and AtPDS3- or
NbPDS-targeting sgRNA on a single binary
plasmid (Fig. 1g) in intact leaves of 2-week-old Arabidopsis seedlings or 5-week-old N.
benthamiana plants through Agrobacterium
leaf infiltration (agroinfiltration). Biallelic
disruption of PDS in the Arabidopsis or N.
benthamiana genome would be expected
to abolish carotenoid biosynthesis and
promote chlorophyll oxidation, leading
to a photobleached phenotype. We did
not observe any visible albino spot on
agroinfiltrated leaves from Arabidopsis or
N. benthamiana plants 7 days after
infiltration. This suggests that there were
either no cells with biallelic disruption of
PDS or the population of photobleached
cells was too small, as cell division might
have ceased in the infiltrated leaves. Careful
screens of single cells after the degradation of
existing chlorophyll are necessary for further
characterization using fluorescence microscopy.
By sequencing target gPCR amplicons, we
did detect precise genomic mutations in
the AtPDS3 and NbPDS target sequence
in cells from agroinfiltrated leaves with a
mutagenesis frequency of 2.7% for AtPDS3
and 4.8% for NbPDS (Fig. 1g). Considering
that agroinfiltration has lower efficiency and
higher variability in gene transfer than
protoplast transfection15, these mutagenesis
frequencies might reflect dilution of the
targeted mutations by wild-type gDNA from
leaf cells without successful DNA delivery.
Notably, the different sgRNA:pcoCas9-induced mutagenesis patterns in Arabidopsis
and N. benthamiana protoplasts were also
observed in corresponding whole plants.
Although targeted mutations in Arabidopsis

seedlings were frequently single-nucleotide
substitutions (Fig. 1h), those in N.
benthamiana plants often involved longer
DNA deletions (Fig. 1i). Leaves infiltrated
with Agrobacteria expressing pcoCas9 alone
showed no mutations in the target
regions (Fig. 1g). These data show that the
sgRNA:pcoCas9 system is also effective in
planta.
To test whether the sgRNA:pcoCas9
system allows multiplex genome editing in
Arabidopsis protoplasts, we first identified
an identical sgRNA target site (target
candidate no. 2, Supplementary Fig. 3)
for both AtRACK1b and AtRACK1c, two
members of the Arabidopsis RECEPTOR FOR
ACTIVATED C KINASE 1 (RACK1) family
(Fig. 2a). By co-expressing pcoCas9 and the
cognate sgRNA, we observed mutations in
both target genes with a similar mutagenesis
frequency (2.5–2.7%; Fig. 2b). Only single-nucleotide substitutions or insertions were
detected in these Arabidopsis genes (Fig.
2b). Notably, no mutation was detected in
a homologous sequence from AtRACK1a
(Fig. 2b), which contains a valid protospacer
adjacent motif (PAM) but two mismatches to
the 12-nt seed sequence governing the sgRNA
specificity7–9,13,16,17 (Fig. 2a), illustrating
the high specificity of the sgRNA:pcoCas9-directed genome editing in plant cells. We
further co-expressed pcoCas9 and tandem
sgRNAs aiming for two juxtaposed targets
in AtPDS3 with a 24-bp spacer (Fig. 2c).
Interestingly, this simultaneous targeting
with two sgRNAs led to deletions of genomic
segments of up to 48 bp between
these two target sites, with a
mutation frequency of 7.7% (Fig.
2d and Supplementary Fig. 2c). Taken
together, these results demonstrated that
the sgRNA:pcoCas9 system could facilitate
multiplex genome editing in plants.
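
The specificity argument above (a valid PAM, but two mismatches within the 12-nt seed of AtRACK1a) can be illustrated with a minimal Python sketch. It assumes that the seed is the PAM-proximal 12 nt at the 3' end of the 20-nt protospacer and that the PAM follows NGG. The sequences are hypothetical placeholders, not the AtRACK1 sequences, and the function names are illustrative only.

```python
SEED_LENGTH = 12  # seed length quoted in the text; its 3' (PAM-proximal) position is an assumption


def has_ngg_pam(pam: str) -> bool:
    """Return True if a 3-nt PAM matches the NGG consensus."""
    pam = pam.upper()
    return len(pam) == 3 and pam[1:] == "GG"


def seed_mismatches(guide: str, site: str, seed_length: int = SEED_LENGTH) -> int:
    """Count mismatches between guide and site within the PAM-proximal seed."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    guide_seed = guide.upper()[-seed_length:]
    site_seed = site.upper()[-seed_length:]
    return sum(a != b for a, b in zip(guide_seed, site_seed))


def is_likely_target(guide: str, site: str, pam: str, max_seed_mismatches: int = 0) -> bool:
    """Crude filter: NGG PAM present and no more than the allowed seed mismatches."""
    return has_ngg_pam(pam) and seed_mismatches(guide, site) <= max_seed_mismatches


# Hypothetical 20-nt sequences, for illustration only.
GUIDE = "GATTACAGATTACAGATTAC"
PERFECT_SITE = GUIDE
SEED_VARIANT = "GATTACAGATCACAGAGTAC"   # two substitutions inside the seed
DISTAL_VARIANT = "CATTACAGATTACAGATTAC"  # one substitution outside the seed

for name, site in [("perfect", PERFECT_SITE), ("seed variant", SEED_VARIANT),
                   ("distal variant", DISTAL_VARIANT)]:
    print(name, seed_mismatches(GUIDE, site), is_likely_target(GUIDE, site, "TGG"))
```

A zero-mismatch seed threshold is a deliberately strict choice for this sketch; how many seed mismatches Cas9 tolerates in plants would have to be determined experimentally.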
We next addressed whether the presence
of a DNA donor upon sgRNA:pcoCas9-mediated generation of a double-strand break
would lead to gene replacement by HDR,
which could precisely integrate an intended
mutation from the DNA donor into the
target site. We co-expressed pcoCas9 and the
sgRNA aiming for the NbPDS target 1 in N.
benthamiana protoplasts and concurrently
supplied a double-stranded DNA donor
that contains a unique AvrII site flanked by
a 533-bp left homology arm and a 114-bp
right homology arm to the NbPDS locus
(Fig. 2e). AvrII digestion of gPCR amplicons
spanning the NbPDS target site revealed
AvrII incorporation in the target locus with
a frequency of 10.7%, and this incorporation
strictly relied on both sgRNA and the DNA

donor (Fig. 2f). Sanger sequencing further
verified the anticipated creation of the AvrII
site in the target sequence without additional
modifications and indicated an HDR-mediated gene replacement at a frequency
of 9.0% (Fig. 2g). In addition, we detected
NHEJ-mediated targeted mutagenesis at
the NbPDS locus with a frequency of 14.2%
(Supplementary Fig. 4). As mesophyll
protoplasts are isolated from differentiated
leaves without active cell division, we
tested the possibility of enhancing HDR by
triggering ectopic cell division. Co-expression
of Arabidopsis CYCD3 (CYCLIN D-TYPE
3), a master activator of the cell cycle, hardly
promoted HDR in N. benthamiana
protoplasts (Fig. 2f). Exploration of HDR
in Arabidopsis protoplasts was unsuccessful,
presumably owing to intrinsically low
efficiency of HDR in Arabidopsis18.
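
The frequencies quoted above follow directly from the clone counts printed in Figure 2. A short Python sketch of that arithmetic is given below. The counts are those shown in the figure labels; the dictionary keys and function name are illustrative, and treating 197 bp as the uncut amplicon with 141 bp and 56 bp as the AvrII products is an interpretation of the size labels in Figure 2f.

```python
def percent(mutated: int, total: int) -> float:
    """Frequency of mutated clones as a percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * mutated / total, 1)


# (mutated clones, sequenced clones) as printed in Figure 2.
CLONE_COUNTS = {
    "AtRACK1a": (0, 94),
    "AtRACK1b": (2, 75),
    "AtRACK1c": (2, 79),
    "AtPDS3 tandem sgRNAs": (7, 91),
    "NbPDS HDR (AvrII site)": (14, 155),
}

for label, (mutated, total) in CLONE_COUNTS.items():
    print(f"{label}: {percent(mutated, total)}% ({mutated}/{total})")

# Size labels from Figure 2f: the two smaller fragments should add up to the larger one.
UNCUT_BP, CUT_FRAGMENTS_BP = 197, (141, 56)
print("fragments consistent:", sum(CUT_FRAGMENTS_BP) == UNCUT_BP)
```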
To facilitate genome-wide application of the
sgRNA:pcoCas9 technology in Arabidopsis,
we generated, using bioinformatics, a database
containing a total of 1,466,718 unique
sgRNA target sequences in Arabidopsis
exons (Supplementary Database), which
cover >99% (26,942 out of 27,206) of the
nuclear protein-encoding genes defined
by TAIR10 (The Arabidopsis Information
Resource 10, http://arabidopsis.org/portals/
genAnnotation/gene_structural_annotation/
annotation_data.jsp/). Targeting efficacy
and specificity of selected sgRNA target
candidates from this database need to be
experimentally determined each time during
future implementation. We also introduced
a facile method to manually design a shared
sgRNA target site specific for multiple
homologous target genes by aligning their
coding sequences and carrying out a BLAST
search to evaluate off-target possibilities
(Supplementary Fig. 3). The sgRNA:pcoCas9
technology enables an easy reprogramming
of DNA targeting specificity by changing
the 20-nt guide sequence in the sgRNA
without modifying the pcoCas9 protein.
We have established a simple and rapid
procedure to create a custom sgRNA through
overlapping PCR (Supplementary Fig. 5 and
Supplementary Table 1). Thus, it is feasible
to use single or tandemly expressed sgRNAs
(Fig. 2c) to simultaneously target multigene
families, which is not easily done with ZFNs
and TALENs.
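
As a rough illustration of how candidate sgRNA target sites might be enumerated, the Python sketch below scans both strands of a sequence for 20-nt protospacers followed by an NGG PAM. The exact criteria used to build the Supplementary Database are not stated here, so the pattern, the function names and the toy sequence are assumptions for illustration only.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_targets(seq: str, guide_len: int = 20):
    """List (strand, start, protospacer, pam) for every N20-NGG site.

    `start` is the 0-based offset within the strand that was scanned
    (the input for '+', its reverse complement for '-').
    """
    # A lookahead lets overlapping sites be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    hits = []
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


# Toy sequence, for illustration only.
TOY_EXON = "A" * 20 + "TGG"
HITS = find_targets(TOY_EXON)
UNIQUE_GUIDES = {protospacer for _, _, protospacer, _ in HITS}
print(HITS, len(UNIQUE_GUIDES))

# Gene coverage reported in the text: 26,942 of 27,206 genes.
GENE_COVERAGE = 100 * 26942 / 27206
print(f"{GENE_COVERAGE:.2f}% of genes covered")
```

Candidates found this way would still need the off-target evaluation and experimental validation described above.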
We have tested a total of seven target
sequences in five target genes in Arabidopsis
or N. benthamiana, and obtained targeted
mutagenesis in all cases. The variation in
mutagenesis efficiency among different
genes in Arabidopsis may stem from distinct
sgRNA binding strength to individual target

sequences or distinct chromatin structure
and epigenetic state at individual target loci,
which requires future investigation. We have
demonstrated that plant protoplasts provide a
useful system to rapidly evaluate the efficiency
of the sgRNA:pcoCas9-mediated genome
editing at a specific genomic locus. Our data
also suggest that targeting an Arabidopsis
gene with multiple sgRNAs could improve
the success rate of targeted mutagenesis and
generate deletions to ensure gene knockout.
Notably, sgRNA:pcoCas9 achieved high
efficiency of HDR-mediated gene replacement
in N. benthamiana protoplasts. The simplicity
and versatility of the sgRNA:pcoCas9
technology demonstrated in this work
promise marker gene–independent and
antibiotic selection–free genome engineering
with high precision in diverse plant species to
advance basic science and biotech.
Note: Supplementary information is available in the
online version of the paper (doi:10.1038/nbt.2654).
ACKNOWLEDGMENTS
We thank F. Ausubel for critical reading of the article,
D. Voytas for discussion on the HDR strategy and
Y. Xiong for the CYCD3 expression plasmid. J.-F.L.
is supported by the MGH ECOR Postdoctoral
Fellowship for Medical Discovery. The research is
supported by the Department of Energy grant DE-FG02-02ER63445 to G.M.C., the National Science
Foundation grant IOS-0843244 and the National
Institutes of Health grants R01 GM60493 and R01
GM70567 to J.S.
Author contributions
J.-F.L. and J.S. designed experiments; J.-F.L. and D.Z.
performed experiments; J.A., J.E.N., M.M. and G.M.C.

conducted bioinformatics analyses; J.B. supplied plant
materials; J.-F.L. and J.S. wrote the manuscript.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Jian-Feng Li1,2, Julie E Norville2,3, John Aach2,3,
Matthew McCormack1,2, Dandan Zhang1,2,
Jenifer Bush1,2, George M Church2,3 &
Jen Sheen1,2
1Department of Molecular Biology and Center

for Computational and Integrative Biology,
Massachusetts General Hospital, Boston,
Massachusetts, USA. 2Department of Genetics,
Harvard Medical School, Boston, Massachusetts,
USA. 3Wyss Institute for Biologically Inspired
Engineering, Harvard University, Cambridge,
Massachusetts, USA.
e-mail: [email protected]
1. Zhang, F. et al. Proc. Natl. Acad. Sci. USA 107, 12028–12033 (2010).
2. Zhang, Y. et al. Plant Physiol. 161, 20–27 (2013).
3. Li, T. et al. Nat. Biotechnol. 30, 390–392 (2012).
4. Gaj, T. et al. Trends Biotechnol. 31, 397–405 (2013).
5. Mussolino, C. & Cathomen, T. Nat. Biotechnol. 31, 208–209 (2013).
6. Streubel, J. et al. Nat. Biotechnol. 30, 593–595 (2012).
7. Jinek, M. et al. Science 337, 816–821 (2012).
8. Cong, L. et al. Science 339, 819–823 (2013).
9. Mali, P. et al. Science 339, 823–826 (2013).
10. Hwang, W.Y. et al. Nat. Biotechnol. 31, 227–229 (2013).
11. Cho, S.W. et al. Nat. Biotechnol. 31, 230–232 (2013).
12. Jinek, M. et al. eLife 2, e00471 (2013).
13. DiCarlo, J.E. et al. Nucleic Acids Res. 41, 4336–4343 (2013).
14. Wang, H. et al. Cell 153, 910–918 (2013).
15. Yoo, S.D. et al. Nat. Protoc. 2, 1565–1572 (2007).
16. Qi, L.S. et al. Cell 152, 1173–1183 (2013).
17. Jiang, W. et al. Nat. Biotechnol. 31, 233–239 (2013).
18. de Pater, S. et al. Plant Biotechnol. J. 11, 510–515 (2013).

Targeted mutagenesis in the model
plant Nicotiana benthamiana using
Cas9 RNA-guided endonuclease
To the Editor:
Sustainable intensification of crop production
is essential to ensure food demand is matched
by supply as the human population continues
to increase1. This will require high-yielding
crop varieties that can be grown sustainably
with fewer inputs on less land. Both plant
breeding and genetic modification (GM)
methods make valuable contributions to
varietal improvement, but targeted genome
engineering promises to be critical to
elevating future yields. Most such methods
require targeting DNA breaks to defined
locations followed by either nonhomologous
end joining (NHEJ) or homologous
recombination2. Zinc finger nucleases (ZFNs)
and transcription activator-like effector

nucleases (TALENs) can be engineered to
create such breaks, but these systems require
two different DNA binding proteins flanking
a sequence of interest, each with a C-terminal
FokI nuclease module. We report here that
the bacterial clustered, regularly interspaced,
short palindromic repeats (CRISPR) system,
comprising a CRISPR-associated (Cas)9
protein and an engineered single guide RNA
(sgRNA) that specifies a targeted nucleic acid
sequence3, is applicable to plants to induce
mutations at defined loci.
To test the potential of the Cas9 system
to induce gene knockouts in plants, we took
advantage of Agrobacterium tumefaciens–
mediated transient expression assays
(agroinfiltration) to co-express a Cas9 variant
with a eukaryotic nuclear localization signal
and an sgRNA in the model plant Nicotiana
benthamiana4. First, we constructed a green
fluorescent protein (GFP)-tagged version of
Cas9 using a previously described clone5. We
expressed GFP-Cas9 in N. benthamiana
leaf tissue using standard agroinfiltration
protocols6 and observed a clear nuclear
localization (Supplementary Fig. 1)
consistent with the nuclear localization
previously observed in human cells7. We
then generated an sgRNA with the guide
sequence matching a 20-bp region within
the phytoene desaturase (PDS) gene in
Nicotiana benthamiana (Fig. 1a). The
sgRNA was placed under an Arabidopsis
U6 promoter (Supplementary Fig. 2).
Both GFP-Cas9 and sgRNA were co-expressed in N. benthamiana leaf tissue
using A. tumefaciens as a vector. The tissue
was harvested 2 days later and DNA was
extracted. To easily detect sgRNA-guided,
Cas9-induced mutations at the PDS locus,
we used the restriction enzyme site loss
method2; as the target sequence within the
PDS gene overlaps with an MlyI restriction
site, we digested the genomic DNA with
MlyI and then performed a polymerase
chain reaction (PCR) with primers flanking
the target site (Supplementary Table 1).
By doing so, we greatly reduced unaltered
wild-type DNA in the sample and enriched
for DNA molecules carrying mutations that
remove the MlyI site.
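
The logic of the restriction enzyme site loss method can be sketched in a few lines of Python: molecules that still carry the recognition site are cut before PCR, so only molecules that have lost the site remain amplifiable across the target. The sketch assumes GAGTC as the MlyI recognition sequence, which this letter does not spell out, and the sequences are hypothetical rather than taken from the PDS locus.

```python
MLYI_SITE = "GAGTC"  # assumed MlyI recognition sequence; not stated in the letter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_site(seq: str, site: str = MLYI_SITE) -> bool:
    """True if the recognition site occurs on either strand."""
    seq = seq.upper()
    return site in seq or reverse_complement(site) in seq


def enrich_resistant(molecules, site: str = MLYI_SITE):
    """Keep only molecules that would survive digestion (site absent)."""
    return [m for m in molecules if not has_site(m, site)]


# Hypothetical target-region sequences, for illustration only.
WILD_TYPE = "TTGCAGAGTCCATGGA"   # contains GAGTC
DELETION_2BP = "TTGCAGTCCATGGA"  # 2-bp deletion that destroys the site

POOL = [WILD_TYPE, WILD_TYPE, DELETION_2BP]
print(enrich_resistant(POOL))
```

In practice digestion is never complete, which is why the text speaks of greatly reducing, not eliminating, wild-type DNA.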

The presence of both Cas9 and the sgRNA
resulted in increased levels of the PCR
product (Fig. 1b, lane 1) compared with
negative control treatments (Fig. 1b, lanes 2
and 3). Nondigested N. benthamiana genomic
DNA was used as a positive control (Fig. 1b,
lane 4). The assay was robust and reproducible:
we detected MlyI-resistant amplicons
in three additional independent experiments
using different plants (Supplementary Figs. 3
and 4). The PCR products from Figure 1b,
lanes 1 and 4 were cloned into a high-copy
vector and individual clones sequenced.
Sequence analysis of 20 clones derived from
the PCR product in lane 1 revealed the
presence of indels in 17 of them. The indels
can be grouped into nine different types
ranging from 1- to 9-bp deletions to 1-bp
insertions (Fig. 1c and Supplementary Fig. 5).
All recovered indels abolish the MlyI
restriction site within the target region. With
regard to 1-bp indels, we cannot totally rule
out the possibility that these mutations were
introduced by the DNA polymerase during
the PCR amplification step. Sequences of the
eight clones derived from the control PCR
product shown in lane 4 were all wild type.
To estimate the efficiency of targeted
mutagenesis, we amplified nondigested
genomic DNA from negative controls and
N. benthamiana leaves expressing both Cas9
and sgRNA, digested the amplicons with MlyI
and subjected them to gel electrophoresis.
We then measured the intensity of the uncut

band relative to the intensity of all detectable
bands in a gel lane as described previously8
(Fig. 2a and Supplementary Fig. 6). We
estimated the mutation rate to be in the range
of 1.8% to 2.4% (2.1% average) based on four
independent experiments.
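
The band-intensity estimate described above reduces to a simple ratio, sketched here in Python. The intensity values are hypothetical, chosen only so that the result matches the 1.9% shown for one replicate in Figure 2a. The function name is illustrative.

```python
def mutation_rate(uncut_intensity: float, cut_intensities) -> float:
    """Percentage of signal in the MlyI-resistant (uncut) band.

    The rate is the uncut band intensity divided by the summed
    intensity of all detectable bands in the lane.
    """
    total = uncut_intensity + sum(cut_intensities)
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * uncut_intensity / total


# Hypothetical densitometry values in arbitrary units, for illustration only.
UNCUT = 19.0
CUT_BANDS = [600.0, 381.0]
print(f"mutation rate: {mutation_rate(UNCUT, CUT_BANDS):.1f}%")
```

Because the estimate depends on complete digestion of wild-type amplicons, any undigested wild-type DNA would inflate the apparent rate.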
We also examined whether plants could
be regenerated from cells modified using the
Cas9 system. N. benthamiana leaf sections
expressing Cas9 and the sgRNA were excised
and placed on selective medium to regenerate
plantlets (Supplementary Methods). DNA
extracted from leaf tissue of the regenerated
plants was used to detect sgRNA-guided,
Cas9-induced mutations with the restriction
enzyme site loss method described above.
Increases of MlyI-resistant PCR product were
observed in 2 out of 30 plants regenerated
from the sgRNA:Cas9-expressing tissue
but not in the negative control treatments
(Fig. 2b). To determine which mutations are
present in the PDS locus of transgenic plants
2 and 3, we cloned DNA fragments from
PCR products amplified using MlyI-digested
genomic DNA (Fig. 2b). In the case of plant 2,
only one type of mutation was found, whereas
sequencing reads from plant 3 revealed four
different mutations (Supplementary Fig. 7).
Both plants appear to carry the wild-type
PDS locus given that the PCR products
amplified using the nondigested genomic
DNA were partially cut by MlyI (Fig. 2b).
Therefore, plant 3 is clearly mosaic with
multiple mutations in addition to the wild-type sequence, whereas plant 2 could be either
mosaic or heterozygous. Overall, these results
suggest that Cas9 and the sgRNA are not
toxic and that the induced mutations can be
transferred to whole plants.

[Figure 1 panel labels. (a) Assay scheme: infiltrate Agrobacteria carrying Cas9 and sgRNA expression constructs into an N. benthamiana leaf; extract genomic DNA and digest with MlyI; PCR-amplify across the target site in PDS (primers F and R); clone PCR products; sequence. (b) Lanes 1–4 with sgRNA and Cas9 combinations; MlyI-digested genomic DNA; size markers 700, 500, 400 and 300 bp. (c) Positions of the MlyI site and PAM.]
Figure 1 Targeted mutagenesis in planta using the Cas9 RNA-guided endonuclease. (a) Assay scheme. (b) DNA gel with PCR bands obtained upon
amplification using primers flanking the target site within the PDS gene of N. benthamiana. In lanes 1–3 the template genomic DNA was digested with MlyI,
whereas in lane 4 nondigested genomic DNA was used. (c) Alignment of reads with Cas9-induced indels in PDS obtained from lane 1 of b. The wild-type
sequence is shown at the top. The sequence targeted by the synthetic sgRNA is shown in red whereas the mutations are shown in blue. PAM, the protospacer-adjacent motif, was selected to follow the consensus sequence NGG. The changes in length and sequence are shown to the right. Three additional replicates
of this experiment are presented in Supplementary Figure 3.

[Figure 2 panel labels. (a) Lanes with sgRNA targeting PDS or GFP, with Cas9; MlyI (–) and MlyI (+); size markers 500, 400, 300, 200 and 75 bp; mutation rate: 1.9%. (b) Nondigested and MlyI-digested genomic DNA templates; lanes 1, 2, 3 and c; size markers 500, 400, 300, 200 and 75 bp.]
Figure 2 Measurement of the mutation rate induced by Cas9 RNA-guided endonuclease and
transgenic N. benthamiana plants carrying mutations in the PDS gene. (a) The PDS locus was
amplified using nondigested genomic DNA from leaf tissue expressing Cas9 and sgRNA targeting
PDS as well as from negative controls (Cas9 plus an sgRNA targeting GFP, and Cas9 on its own).
To measure the mutation rate, we divided the intensity of the uncut band by the intensity of all bands
in the lane. Three additional replicates of this experiment are presented in Supplementary Figure 6.
(b) N. benthamiana plants were transformed using A. tumefaciens carrying Cas9 and an sgRNA
targeting the PDS locus as described in the Supplementary Methods. The two plantlets carrying
mutations in the PDS gene were analyzed alongside negative controls. The PDS locus was amplified
using either nondigested or MlyI-digested genomic DNA as a template. The resulting amplicons were
then digested with MlyI. Plant 1 does not carry mutations in the PDS locus, whereas plants 2 and
3 do. Lane ‘c’ corresponds to nontransformed N. benthamiana. The arrowhead indicates the
MlyI-resistant band; the asterisk indicates the band resulting from star (nonspecific DNA cleavage)
activity of MlyI.

Given that the target sequence is 20 bp, the
sgRNA:Cas9 system may not be as specific
as TALEN-induced mutagenesis, which can
be tailored to target longer sequences9. We
identified a total of 98 potential off-target
sequences by searching the N. benthamiana
genome database against the 20-bp target
sequence within the PDS locus using the
BLASTN tool (Supplementary Table 2). We
managed to assay 18 of the identified off-target sites using the restriction enzyme site
loss method described above (Supplementary
Methods). These sites have 14- to 17-bp out of
20-bp identity to the targeted PDS sequence.
None of the 18 amplicons showed evidence

of sgRNA-guided, Cas9-induced MlyI
restriction site loss as observed with the PDS
target sequence (Supplementary Table 3 and
Supplementary Fig. 8). We therefore did not
detect any Cas9 activity with the subset of off-target sequences tested. Nevertheless, more
comprehensive analyses of off-target activity
are required to address this issue further,
especially considering recent findings9.
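
The "14- to 17-bp out of 20-bp identity" criterion can be made concrete with a small Python sketch that counts matching positions between the 20-bp target and a candidate site. This is a simplified, ungapped position-by-position comparison and not a reimplementation of BLASTN. The sequences and function names are hypothetical.

```python
def identity(target: str, candidate: str) -> int:
    """Number of identical positions between two equal-length sequences."""
    if len(target) != len(candidate):
        raise ValueError("sequences must have the same length")
    return sum(a == b for a, b in zip(target.upper(), candidate.upper()))


def in_assayed_range(matches: int, low: int = 14, high: int = 17) -> bool:
    """True if the identity falls within the 14- to 17-bp range assayed in the text."""
    return low <= matches <= high


# Hypothetical 20-bp sequences, for illustration only.
TARGET = "GATTACAGATTACAGATTAC"
CANDIDATE = "CATTAGAGATAACAGTTTAC"  # four substitutions relative to TARGET

MATCHES = identity(TARGET, CANDIDATE)
print(MATCHES, in_assayed_range(MATCHES))
```

A fuller analysis would also weight mismatches by position, since mismatches near the PAM are generally considered less tolerated than distal ones.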
These data clearly indicate that Cas9
and an engineered sgRNA can direct DNA
breaks at defined chromosomal locations in
plants. The rapid and robust transient assay
we have developed will enable plant-specific
optimization of the Cas9 system. Relative to
other methods of plant genome engineering,
the CRISPR-Cas9 system has the potential
to simplify the process of plant genome
engineering and editing because only a short
fragment in the sgRNA needs to be designed

to target a new locus. This creates a valuable
new tool for plant biologists and breeders,
and it hastens the prospects of achieving
routine targeted genome engineering for
basic and applied science.
Note: Supplementary information is available in the
online version of the paper (doi:10.1038/nbt.2655).
ACKNOWLEDGMENTS
We thank S. Marillonnet and Icon Genetics, Halle,
Germany for providing plasmid vectors, J. Win and S.
Dong for help with figure preparation, and M. Smoker
for help with the plant transformation. This work
was supported by the Gatsby Charitable Foundation,
the European Research Council (ERC), and the
Biotechnology and Biological Sciences Research
Council (BBSRC).
Author contributions
V.N. performed the experiments. V.N. and J.D.G.J.
designed the constructs. V.N., J.D.G.J. and S.K. wrote
the manuscript. V.N., B.S., D.W., J.D.G.J. and S.K.
contributed to the design of the study.
COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests:
details are available in the online version of the paper
(doi:10.1038/nbt.2655).

Vladimir Nekrasov1, Brian Staskawicz2,
Detlef Weigel3, Jonathan D G Jones1,4 &
Sophien Kamoun1,4
1The Sainsbury Laboratory, Norwich Research

Park, Norwich, UK. 2Department of Plant and
Microbial Biology, University of California,
Berkeley, California, USA. 3Max Planck Institute
for Developmental Biology, Tübingen, Germany.
4These authors contributed equally to this work.
e-mail: [email protected] or
[email protected]
1. Griggs, D. et al. Nature 495, 305–307 (2013).
2. Voytas, D. Annu. Rev. Plant Biol. 64, 327–350 (2013).
3. Mussolino, C. & Cathomen, T. Nat. Biotechnol. 31,
208–209 (2013).
4. Goodin, M. et al. Mol. Plant Microbe Interact. 21,
1015–1026 (2008).
5. Mali, P. et al. Science 339, 823–826 (2013).
6. Van der Hoorn, R. Mol. Plant Microbe Interact. 13,
439–446 (2000).
7. Cong, L. et al. Science 339, 819–823 (2013).
8. Qi, Y. et al. Genome Res. 23, 547–554 (2013).
9. Fu, Y. et al. Nat. Biotechnol. advance online publication, doi:10.1038/nbt.2623 (2013).


Chinese hamster genome
sequenced from sorted
chromosomes
To the Editor:
In recent years, the number of published
genome sequences has increased substantially
owing to major developments in next-generation sequencing (NGS) technologies,
concomitant reduction of sequencing costs
and improvements in assembly strategies. In
2011, your journal published the genome of
Chinese hamster ovary (CHO)-K1 cells, the
most frequently used mammalian production
cell line for biopharmaceutical products1.
In this issue, the genomes of several related
CHO cell lines as well as the genome of
the Chinese hamster are also presented2.
Although this information provides long-awaited and necessary insights for scientists
working with these important production
hosts, it also highlights a major drawback
of short-read NGS technology, namely, the
difficulty of assembling short-read data
and scaffolding these sequences into a
fully structured genome. This is especially
critical for CHO cells, which are known
to be genomically unstable, with frequent
chromosome rearrangements and loss3,4. In
the following correspondence, we describe
how a chromosome sorting approach can
facilitate genome assembly from short-read
sequences.
The effects of chromosome rearrangements on behavior relevant to individual
bioprocesses of different CHO cell lines
are not clear and will require more detailed
analysis in the future. Although it seems less
likely that large segments of genomic DNA
are lost completely, which would entail the
loss of necessary cellular functions, presumably leading to cell death, rearrangements
probably lead to subtle changes in transcription patterns. These may affect cellular
properties relevant to bioprocessing, such
as growth, robustness and productivity of
CHO cell lines and clones. For future studies
on these changes and their impact on cell
behavior in industrial cell lines, it is thus of
prime importance to have, on the one hand,
a reference genome that includes the allocation of scaffolds and contigs to chromosomes and, on the other hand, a method that
enables characterization of chromosomal
translocations present in CHO cell lines
being sequenced.

Current NGS technology yields short-read sequences typically in the range of
100–500 bp, so that common repeats cannot
be assembled and the precise location
of duplicated sequences is likely to be
missed5. De novo assembly generates, on
average, scaffolds of 1–2 Mb if genome
coverage is sufficiently high (50- to 100-fold). As chromosomes are several fold
larger (typically 90–200 Mb), chromosomal
rearrangements and translocations can be
captured only in part.
Here, we address this dilemma by
isolating individual chromosomes by flow
cytometric cell sorting, followed by NGS of
the obtained material in separate sequencing
reactions. After curation and assembly, the
resulting scaffolds can be assigned to specific
chromosomes. We applied our approach to
cells from the Chinese hamster strain 17A/GY
and came across several challenges, such as
cross-contamination by chromosomes that
were too close in the flow histogram and
which required a bioinformatic procedure
for curation (Fig. 1). The most severely
affected chromosomes in this respect were
chromosomes 5 and 6. Chromosomes 9 and
10 could only be separated as a pool and
chromosome Y was not sorted at all. For
library construction, we obtained 80–620 ng
of DNA for each sorted chromosome and
prepared, in addition, a 5,000-bp mate-pair
sequencing library from whole genome
DNA. We sequenced the libraries on an
Illumina (San Diego) Genome Analyzer IIx,
using TruSeq PE Cluster Kit v5-CS-GA and
TruSeq SBS Kit v5-GA and generated ~70-fold genome coverage, assuming a genome
size of 2.8 Gb for the Chinese hamster6.
Subsequently, 1.4 billion reads were
assembled into a draft sequence for
the separated chromosomes using
ALLPATHS-LG7. As mentioned above,
sequencing libraries from separated
chromosomes might be contaminated with
sequences from other hamster chromosomes.
The separated chromosome assemblies were
therefore analyzed to identify and eliminate
contaminating scaffolds from the data. This
filtering led to high-quality assemblies of
separated Chinese hamster chromosomes

with the total number of scaffolds ranging
from 517 for chromosome 8 to 5,348 for
chromosomes 9+10, and a total genome size
of 2.33 Gb (Table 1).
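
The coverage and N50 figures quoted here follow from simple arithmetic, which can be sketched as below. This is only an illustration: it assumes that all 1.4 billion reads contributed to the ~70-fold coverage figure, and the scaffold lengths in the N50 example are hypothetical toy values, not data from the assembly.

```python
def implied_read_length(coverage: float, genome_size_bp: float, n_reads: float) -> float:
    """Mean read length (bp) implied by: fold coverage = total bases / genome size."""
    return coverage * genome_size_bp / n_reads


def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L contain at least half of all bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Figures quoted in the text: ~70-fold coverage, 2.8 Gb genome, 1.4 billion reads.
# Assumption: every read counts toward the coverage estimate.
read_length = implied_read_length(70, 2.8e9, 1.4e9)
print(f"implied mean read length: {read_length:.0f} bp")

# Hypothetical toy scaffold lengths (not taken from Table 1).
toy_scaffolds = [2, 3, 4, 5, 6]
print(f"toy N50: {n50(toy_scaffolds)}")
```

The N50 rows in Table 1 are statistics of this kind, computed per sorted chromosome over contigs or scaffolds.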
We mapped scaffolds of the separated
hamster chromosome libraries to the mouse
genome together with the published CHO-K1 genomic sequence1 (Supplementary
Fig. 1). This revealed that, in principle, the
entire genome of the mouse can be covered
by Chinese hamster sequences, even though
complex chromosomal rearrangements
have occurred. The only exceptions are
mouse chromosomes 7, 14, 17 and X, which
are incompletely covered by both Chinese
hamster and CHO-K1 sequences. Gaps
detected between the Chinese hamster
scaffolds and mouse chromosomes occur
primarily in regions with a high frequency
of interspersed repeats and low complexity
regions, which cannot be assembled
properly from short sequence reads. As the
missing regions on mouse chromosomes 7
and 14 are in part covered by short scaffolds
and as the corresponding CHO-K1 genome
has even more sequences mapping to
these locations, it seems likely that these
sequences are not missing in the Chinese
hamster, but might have been difficult
to assemble owing to sequence repeats.
Also notable is that despite the severe
chromosomal rearrangements that have
occurred in CHO-K1 (refs. 3,4), no major
parts of the genome are completely missing:
gaps relative to the mouse chromosomes
[Figure 1 plot: Hoechst 33258 fluorescence (y-axis) versus chromomycin A3 fluorescence (x-axis), both scaled 0–1,000.]

Figure 1 Bivariate flow cytometric analysis
of Chinese hamster chromosomes. Fibroblast
cultures were established from strain 17A/GY.
Staining was performed with Hoechst 33258 and
chromomycin A3. Fluorescence intensity is plotted
for 30,000 events. Numbers and letters refer to
the respective chromosomes. The X chromosome
and all autosomes except chromosomes 9 and 10
(sorted as a pool) show individual peaks. The Y
chromosome peak was very close to chromosome
5 and therefore not sorted.

volume 31 NUMBER 8 AUGUST 2013 nature biotechnology

Table 1 Assembly statistics of separated Chinese hamster chromosomes
Columns 1 to X are the separated chromosome assemblies; Sum is the sum of separated chromosome assemblies.

Feature                                         1       2       3       4       5       6       7       8    9+10       X      Sum
Number of contigs                          87,097  48,331  36,687  38,813  31,629  15,138  19,179   9,296   8,120  24,872  319,162
Number of scaffolds                         4,663   3,367   2,835   2,452   4,520     818   1,009     517   5,348   3,235   28,764
Total contig length (Mb)                      492     432     247     190     190     153     125      94      52     114    2,089
Total scaffold length, with gaps (Mb)         563     464     278     228     215     160     137      98      54     137    2,333
N50 contig length (kb)                      10.99   15.41   11.55    7.64   11.85   17.03   10.35   16.79   12.54    7.75     11.9
N50 scaffold length, with gaps (kb)         1,149   1,496     821     577   1,647   3,421   2,484   2,601   2,288     622    1,245
Median length of gaps in scaffolds (nt)       500     360     590     682     503     222     315     164     251     651      472
Percentage of Ns                            12.50    6.79   11.29   16.50   11.87    4.84    8.64    3.89    3.40   16.21    10.45


occur at the same positions of high repeat
density as for the Chinese hamster reference
genome, and only very small regions are
missing in CHO-K1 that are present in
the Chinese hamster genome. Homologies
between the Chinese hamster chromosome
sequences and mouse chromosomes
identified by sequence mapping
compare well to reciprocal chromosome
painting results of hamster and mouse
chromosomes8.
The sequence of the Chinese hamster
provides a reference for future research of
sufficient quality and precision to enable
characterization and study of chromosomal
rearrangements and stability in CHO cell
lines. In addition, the results of this study
suggest that the approach of using sorted
chromosomes for library generation may
prove beneficial for sequencing of complex
reference genomes of other eukaryotes.

Accession code. GenBank: APMK00000000.
The version described in this paper is the first
version, APMK01000000.

This work is licensed under a
Creative Commons Attribution-NonCommercial-ShareAlike 3.0
Unported License. To view a copy of
this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.

Note: Supplementary information is available in the
online version of the paper (doi:10.1038/nbt.2645).

ACKNOWLEDGMENTS
F.K. acknowledges the receipt of a scholarship from
the CLIB Graduate Cluster Industrial Biotechnology.
Part of this research was supported by ACIB, the
Austrian Center of Industrial Biotechnology, a K2
competence center within the COMET program of
the Austrian FFG (the Austrian Research Promotion
Agency).

AUTHOR CONTRIBUTIONS
N.B., J.G., J.E.M. and A.P. originated the concept of
the study. H.L. contributed the chromosome sorting
strategy. The project was further developed by W.E.B.,
D.M., T.J., M.L. and B.H. K.B., T.N., A.T. and A.G.
carried out the sequencing project design. W.E. and
F.H. contributed to study planning and generated
samples of cells and genomic DNA of the Chinese
hamster. R.K. and J.W. sorted Chinese hamster
chromosomes. H.L. and S.R. prepared DNA from
sorted chromosomes. O.R., F.K. and B.L. performed
data analysis. All authors contributed to drafting and
reviewing the manuscript.
COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests:
details are available in the online version of the paper
(doi:10.1038/nbt.2645).

Karina Brinkrolf 1,2,7, Oliver Rupp1,7,
Holger Laux3,7, Florian Kollin1,


Wolfgang Ernst4, Burkhard Linke1,
Rudolf Kofler5, Sandrine Romand3,
Friedemann Hesse4, Wolfgang E Budach3,
Sybille Galosy6, Dethardt Müller4, Thomas Noll1,
Johannes Wienberg5, Thomas Jostock3,
Mark Leonard6, Johannes Grillari4,
Andreas Tauch1,2, Alexander Goesmann1,
Bernhard Helk3, John E Mott6, Alfred Pühler1 &
Nicole Borth2,4
1Center for Biotechnology, Bielefeld University,

Germany. 2ACIB, Austrian Center of
Industrial Biotechnology, Austria. 3Novartis
Pharma, Basel, Switzerland. 4Department of
Biotechnology, University of Natural Resources
and Life Sciences, Vienna, Austria. 5Molecular
Cytogenetics, Chrombios, Nussdorf, Germany.
6Pfizer, New York, New York, USA. 7These
authors contributed equally to this work.
e-mail: [email protected] or
[email protected]
1. Xu, X. et al. Nat. Biotechnol. 29, 735–741 (2011).
2. Lewis, N.E. et al. Nat. Biotechnol. 31, 759–765
(2013).
3. Derouazi, M. et al. Biochem. Biophys. Res. Commun.
340, 1069–1077 (2006).
4. Cao, Y. et al. Biotechnol. Bioeng. 109, 1357–1367
(2012).
5. Alkan, C. et al. Nat. Methods 8, 61–65 (2011).
6. Omasa, T. et al. Biotechnol. Bioeng. 104, 986–994
(2009).
7. Gnerre, S. et al. Proc. Natl. Acad. Sci. USA 108,
1513–1518 (2011).
8. Yang, F. et al. Chromosome Res. 8, 219–227 (2000).

COMMENTARY

case study

The rarest of bounties
Brady Huggett

Last year, Alexion Pharmaceuticals reported
modest net income of about $255 million.
This would hardly seem to mark Alexion as
a biotech star, yet the company’s stock ended
2012 at $93.74, with a market cap greater than
$18 billion. Alexion thus ranked in market cap
right below biotech bellwether Biogen Idec,
which for 2012 reported net income of about
$1.4 billion, fifth highest of today’s public biotechs. In July rumors surfaced that Alexion
might be bought, with analysts speculating the
price-per-share for any purchase could climb as
high as $130.
What has given Alexion these lofty valuations?
The company was founded in 1992 to pursue complement inhibition therapies. Early on,
it developed the humanized monoclonal antibody Soliris (eculizumab), which is designed
to block the production of C5a and C5b-9,
both mediators of the inflammatory process.
After false starts testing the drug in asthma and
autoimmune disorders, the company launched
a trial in 2002 to test the use of Soliris in the
fatal condition paroxysmal nocturnal hemoglobinuria (PNH). It would go on to run a pilot
phase 2 study in PNH, and two phase 3 trials
(TRIUMPH and SHEPHERD). Of the 195
people enrolled in those trials, everyone who
received Soliris had an objective improvement
in hemolysis, and reductions in blood transfusions, anemia and the risk for thrombosis.
Gauging these impressive data, Alexion knew
it had a gold mine on its hands.
And it has mined it. In 2007, Soliris became
the only drug approved to treat PNH to reduce
hemolysis, and Alexion aggressively priced it at
around $400,000 annually, giving it the dubious honor of being the world’s most expensive
drug. (Alexion provides the drug free to those
who cannot afford it.) A second indication
came in 2011: atypical hemolytic uremic syndrome (aHUS). Here, too, Soliris is the first and
only drug approved to treat a small but needy
patient population. Dosing is slightly higher in

Brady Huggett is Business Editor at
Nature Biotechnology.


this indication, which pushes the annual price
tag to around $450,000. The company reported
net sales of $1.13 billion in 2012.
Charging a premium for efficacious drugs
in unmet niche indications is not new—
Genzyme paved this business path long ago. Its
first orphan drug, Ceredase (alglucerase), was
approved in 1991 for Gaucher disease, and the
second-generation drug, Cerezyme (imiglucerase), in 1994. The company charged around
$350,000 per year per patient, and through the
years it brought in billions of dollars.
Genzyme’s success produced business
model copycats, such as BioMarin, which was
founded in 1997. Like Genzyme, BioMarin
focuses on enzyme-replacement therapies
for niche indications and charges premium
prices. Its four approved products—Naglazyme
(galsulfase), Kuvan (sapropterin dihydrochloride), Aldurazyme (laronidase) and Firdapse
(approved only in EU; amifampridine phosphate)—earned ~$497 million in net revenue
in 2012.
But Alexion, of Cheshire, Connecticut, has
surpassed these successes in two ways. First, it
has been exceptionally effective at simultaneously controlling costs and casting a wide net
for rare patients. Last year, it reported cost of
sales of ~$126 million (representing only 11%
of net product revenue—a low figure for producing a biologic). The company has kept costs
around this level since the drug was introduced,
which means that even as Alexion expands its
global reach—it has operations in 50 countries and is earmarking such areas as Korea,
Turkey and Latin America for expansion—
the bottom line has held steady.
In comparison, when Genzyme introduced Cerezyme, it had net product sales of
~$72 million against a cost of those products
sold of $33.2 million, or 46%. This has improved
over the years (but was still ~29% in 2010, the
last full year of reporting before Sanofi bought
it). BioMarin has done better—its cost of products sold was 18% in 2012—but that still cannot
touch Alexion. Whether the reason is company
size; an easier, high-yielding manufacturing
process; a focused sales force; or Soliris’s higher

In a land where drugs for ultra-rare indications are the new blockbusters, Alexion’s Soliris is king.

Figure 1 The rise of Alexion stock. Closing share price ($) at year end: 2008, 18.1; 2009, 24.41; 2010, 40.28; 2011, 71.5; 2012, 93.74.

price, Alexion is pocketing more per sale than
others.
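
The cost comparison above reduces to one ratio, cost of sales divided by net product sales. A minimal sketch using only the figures quoted in this article (in $ millions) follows; it assumes that Alexion's "net product revenue" is the $1.13 billion in net sales reported for 2012.

```python
def cost_share(cost_of_sales: float, net_product_sales: float) -> float:
    """Cost of sales as a percentage of net product sales."""
    return 100 * cost_of_sales / net_product_sales


# (cost of sales, net product sales) in $ millions, as quoted in the article.
# Assumption: Alexion's net product revenue equals its reported $1.13 billion net sales.
figures = {
    "Alexion, 2012": (126.0, 1130.0),
    "Genzyme, Cerezyme launch year": (33.2, 72.0),
}

shares = {name: cost_share(cost, sales) for name, (cost, sales) in figures.items()}
for name, share in shares.items():
    print(f"{name}: {share:.0f}% of net product sales")
```

Rounded to whole percentages, the two ratios match the 11% and 46% cited in the text.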
Second, Alexion’s research has been remarkably serendipitous. In Soliris, it has the rarest of
rare things—an ultra-orphan drug that works
exceptionally well in another orphan indication
lacking treatment options. Net sales for Soliris
last year were up 45% from 2011, and though
sales were heavily slanted to the more established PNH indication, the company believes
the aHUS indication will eventually be as large
as PNH and is currently investigating nine
other indications—no wonder investors are
salivating. In contrast, the orphan indication
enzyme replacement therapies of Genzyme and
BioMarin are lifesaving and valuable revenue
drivers, but they are one-trick ponies.
Of course, there are threats to Alexion’s premium pricing model. It is currently awaiting
review by the UK National Institute for Health
and Care Excellence and other reimbursement
authorities across the broader European Union.
And skeptics warn that this rapid patient discovery and revenue growth cannot continue.
For now, though, Alexion’s stock is on a steep
upward climb (Fig. 1). And the company has
diversified through the acquisition of Enobia in
2011, bringing aboard ENB-0040 (asfotase alfa),
an enzyme replacement therapy for the ultra-rare, genetic metabolic disease hypophosphatasia. The move oozes synergy. But it would be
foolish to expect that drug, or any other for that
matter, to perform like Soliris. Soliris is a once-in-a-lifetime drug for the patients it helps, and
it’s a once-in-a-lifetime drug for the company it
buoys.

FEATURE

Public biotech 2012—the numbers
Brady Huggett
Initial public offerings are back. And big biotech is getting bigger: another year of profitability, increasing drug sales
and bustling partnering activity.

Brady Huggett is Business Editor at
Nature Biotechnology.

For publicly listed biotech companies,
2012 was a year to remember. Follow-on
offerings were historically impressive; the
initial public offering (IPO) market warmed;
stocks rose nearly the entire year; partnering
activity, though more frugal, was robust; and
sales of biotech’s best-selling drugs increased
yet again. Above all, last year the public sector (as measured by our criteria; Box 1 and
Supplementary Table 1) achieved its fifth
straight year of profitability. This success was
powered, as usual, by large-cap companies,
which recorded $75.7 billion in revenue.
Overall, our contingent of public biotechs
brought in more than $103 billion in revenue, spent more than $25 billion on R&D,
and reported a collective profit of more than
$7.7 billion.
Nestled amid this performance, however,
is perhaps the greatest indicator of biotech’s
maturation: an ability to repeatedly produce
large-cap companies that are both profitable
and provide jobs.

Box 1 The numbers
Nature Biotechnology has published a report on public biotech companies in its pages since
1996. Our definition of what constitutes a biotech company has changed with the industry,
as have our methods for gathering the information that powers this article. We generally
include companies built upon applications of biological organisms, systems or processes,
or the provision of specialist services to facilitate the understanding thereof. We exclude
pharmaceutical companies, medical device firms and contract research organizations to
better focus on the unique attributes and situations that make up the biotech sector.
The data were provided by Ernst & Young. The top ten lists and other aggregate lists are
sourced appropriately, although mostly they are generated by an analysis of data supplied
by BioCentury (San Carlos, CA, USA). In this regard, because investors do not stratify the
biotech sector as stringently as Nature Biotechnology, we use money figures from across the
biotech and biopharmaceutical arena to best highlight trends. Companies delisted in 2012
from major exchanges were excluded.

Out of the stocks
The NASDAQ biotech index (NBI) ended 2012
at 1,430.81, nearly 350 points higher than its
2011 close and a 31.9% increase for the year.
That far outpaces even the sizable growth seen
in other indexes: the Dow Jones industrial
average climbed 7.2% last year, the S&P 500
moved up 13.4% and NASDAQ rose 15.9%.
It was a good year to be in biotech stocks,
then. But the past decade suggests it almost
always is (Fig. 1). Public biotech stocks have
been on a steady upward climb, with a hiccup
only around the collapse of the financial markets in 2008.
Perhaps that is why many investors find the
mid-to-late life sciences space so alluring. Our
money chart (Fig. 2) shows that venture capital
investing in the broader biotech field last year
was the best since 2007 and the second best
in ten years. Follow-on offerings ($6.3 billion)
were at the highest point in a decade, whereas
money raised through IPOs bettered 2011.
Certainly, the public markets provide a monetary backbone for biotech, but the majority of
financial support for public and private companies comes through collaborations.

Figure 1 NASDAQ biotech index over time. The data cover the year ending
December 31. [Plot: closing share price of NBI by year, 2003–2012.]

Figure 2 Global biotech industry financing. (a) Partnering data for 2004–2009 includes only US
companies. PIPEs, private investments in public entities. (b) Partnering data for 2009–2012 includes
global deals. Sources: BCIQ BioCentury Online Intelligence. Partnership figures from Burrill &
Company. BioCentury updates its financing data on an ongoing basis.
[Plot: amount raised ($ billions) by year, stacked by partnerships, debt, PIPEs & other, follow-on, IPO and venture capital.]

Swing your partner
Money linked to partnering deals last year
reached $37.9 billion, a slight dip from 2011.
Although the total amount was nearly
the same as in 2011, it reached it through
more total deals. Data from Recap, of San
Francisco, show that the amounts per deal,
including upfront payments and potential milestones, were,
on average, smaller. This was a result of
more early-stage deals, which carry
lower valuations and less money upfront, and
more focused deals centered around a single
asset, bringing down potential milestone payments.
The partnership at the top of our 2012 list
(Table 1) comes from unique circumstances,
as it results from the Bristol-Myers Squibb
(BMS; Princeton, NJ, USA) buyout of Amylin.
After the acquisition, Amylin’s main assets
were folded into a development collaboration
with AstraZeneca, which paid $3.4 billion in
cash for the privilege. Any profits or losses
will be split, and AstraZeneca can pay another

$135 million to gain equal governing rights
on decision making. So although this is a big-money deal for Amylin, the company is no
longer a freestanding biotech.

Table 1 Top ten partnerships in 2012
Researcher          Investor       Date    Value(a) ($ millions)
Amylin(b)           AstraZeneca    7/2     3,400
Galapagos           Abbott         2/1     1,350
Genmab              J&J            8/30    1,135
Endocyte            Merck          4/16    1,000
Regulus             AstraZeneca    8/16      882
Evotec              Bayer          10/1      762
Isis                Biogen Idec    1/4       630
Symphogen           Merck KGaA     9/6       623
Threshold           Merck KGaA     2/3       550
Ablynx (Belgium)    Merck KGaA     10/2      590
(a) Value includes milestones. (b) Amylin had been acquired by BMS prior to this partnership deal.
Source: BCIQ BioCentury Online Intelligence and Burrill & Company.

The Genmab–J&J (New Brunswick, NJ,
USA) Biotech licensing deal, the third largest
on our list, centers on daratumumab, a human
CD38 monoclonal antibody, but includes a
backup human CD38 antibody. Daratumumab
is being developed for multiple myeloma,
though it may have potential in acute myeloid
leukemia. Genmab (Copenhagen) will
receive $55 million upfront in the deal, with
Johnson & Johnson also buying an $80-million equity stake
(5.4 million shares) in Genmab, which
could potentially pick up $1 billion in
milestone payments for development,
approval and sales targets, plus tiered
double-digit royalties.
The Genmab deal joins three others on
our list that, when milestone payments
are factored in, could reach $1 billion or
more in total value. And like the Genmab
deal, half of the collaborations (Table 1) have
at least one aspect covering oncology; two of
those are antibody partnerships.
The deals in our top ten list have a total
potential value of nearly $11 billion, or
$1.09 billion per deal, when the Amylin outlier is included. This betters 2011, when the
total value of the top ten deals was nearly
$10.6 billion, with an average of ~$1.06 billion.
Discount the Amylin outlier, though, and the
remaining nine deals total
$7.5 billion, or $835.8 million per deal, again
pointing to smaller deals than in recent years.

Table 2 Top mergers and acquisitions of 2012
Target                      Acquirer                     Date completed   Upfront ($ millions)
Pharmasset                  Gilead Sciences              1/17             11,200
Amylin Pharmaceuticals      BMS                          8/9               5,300
Gen-Probe                   Hologic                      8/1               3,800
Human Genome Sciences       GlaxoSmithKline              8/3               3,600
Inhibitex                   BMS                          2/12              2,500
Micromet                    Amgen                        3/7               1,160
Devgen                      Syngenta                     9/21                523
Ista Pharmaceuticals        Bausch + Lomb                6/6                 500
Proximagen Neuroscience     Upsher-Smith Laboratories    8/14                347
Allos Therapeutics          Spectrum Pharmaceuticals     9/20                194
Data are matched for the definition of biotech in Box 1. Source: BCIQ BioCentury Online Intelligence.

Early indicators for 2013 show an increase
in partnering activity. This would follow estimates by the consultancy Campbell Alliance
(New York) which, in their “Dealmakers’
Intentions 2013” survey, questioned 129 licensing professionals about expectations for collaborations. Both in-licensors and out-licensors
anticipated an uptick in deal making, and, in
particular, more deals involving de-risking
through milestones and more early-stage deals,
thus translating to lower payouts.
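
The top-ten partnership totals and per-deal averages quoted in this article can be re-derived from the Table 1 values. The sketch below copies those values in $ millions; treating the Amylin deal as the outlier follows the article's own framing, and the variable names are illustrative only.

```python
# Potential deal values from Table 1, in $ millions (milestones included).
deal_values = {
    "Amylin": 3400, "Galapagos": 1350, "Genmab": 1135, "Endocyte": 1000,
    "Regulus": 882, "Evotec": 762, "Isis": 630, "Symphogen": 623,
    "Threshold": 550, "Ablynx": 590,
}

total_all = sum(deal_values.values())
mean_all = total_all / len(deal_values)

# Drop the Amylin outlier, as the article does, and recompute.
without_outlier = [v for name, v in deal_values.items() if name != "Amylin"]
total_rest = sum(without_outlier)
mean_rest = total_rest / len(without_outlier)

print(f"all ten: ${total_all / 1000:.2f} billion, ${mean_all / 1000:.2f} billion per deal")
print(f"without Amylin: ${total_rest / 1000:.1f} billion, ${mean_rest:.1f} million per deal")
```

Removing a single large deal lowers the average by roughly a quarter, which is why the article reports both figures.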
Partnerships often lead to acquisitions
(Table 2). The largest buyout of 2012 was the
purchase by Gilead (Foster City, CA, USA)
of Pharmasset (Princeton, NJ, USA) for
$11.2 billion, picking up the lead candidate PSI-7977, a nucleotide analog polymerase inhibitor for hepatitis C. (The deal was announced
in November 2011 but closed in early 2012.)
Gilead, which also completed its purchase of
YM Biosciences (Mississauga, ON, Canada)
for $510 million earlier this year, is a leader in
HIV therapies but has drugs in development
for oncology and inflammation, respiratory
disorders and cardiovascular indications, as
well as a handful of other hepatitis C, hepatitis
B and liver fibrosis drugs. The price per share
($137) for Pharmasset was nearly 90% more
than Pharmasset’s trading price the day before
the announcement, leading some to label the
acquisition pricey. Gilead has submitted the
lead product, now called sofosbuvir,
to US and EU marketing authorities, with a Prescription Drug User
Fee Act (PDUFA) date this
December.
BMS’s purchase of Amylin
Pharmaceuticals last year for $31
per share, or $5.3 billion, removed
another profitable biotech from the
landscape. BMS sank $1.7 billion
more into the deal to cover Amylin
debt and a payment obligation to
Eli Lilly (Indianapolis; related to
an old collaboration), moving the
total amount involved to $7 billion.
The assets in the collaboration are

Amylin’s glucagon-like peptide 1 (GLP-1)
agonists, Byetta (exenatide) and Bydureon
(exenatide extended release); metreleptin, now
under priority review by the US Food and Drug
Administration (FDA) for metabolic disorders
associated with inherited or acquired lipodystrophy; and Symlin (pramlintide acetate),
approved for type 1 and 2 diabetes in patients
with inadequate glycemic control already taking meal-time insulin.
BMS also bought Inhibitex last year for
$2.5 billion upfront, hoping to plant a flag in
the hepatitis C space. That move has already
backfired, however, as work on Inhibitex’s lead
drug, now called BMS-986094, a nucleotide
polymerase (NS5B) inhibitor, was discontinued after a patient in a phase 2 trial suffered
heart failure.
Other notable buyouts in 2012 include
London-based GlaxoSmithKline’s (GSK)
purchase of Human Genome Sciences (HGS;
Rockville, MD, USA) for $3.6 billion. The
deal took months to price, with HGS refusing an initial offer at $13 a share and GSK
eventually taking it directly to shareholders, before the two sides agreed on $14.25 a
share, a hefty 99% premium to HGS’s stock
price before the deal was announced. It gave
GSK Benlysta (belimumab), a lupus drug,
and a cardiovascular and diabetes drug still
in development; the companies had been
partnered on all.
The joy of being public
The 17 IPOs last year, by our count, helped
offset the 25 ‘casualties’—companies acquired,
delisted, gone bankrupt or otherwise removed
from the public biotech scene (Tables 3 and 4).
This meant the total number of public biotechs
in 2012 shrank marginally to 427, down from
439 in 2011 and 460 in 2010.
The 17 IPOs raised an average of $53.6 million (Fig. 3), an increase over 2011 in total
IPOs, but less raised per event than in 2010,
when the sector produced 19 IPOs, averaging
$57 million apiece in raised funds.
Eight of these IPOs came in the last quarter
of 2012, and this momentum has carried over
to 2013, with 18 biotechs having gone public
through the first six months of the year, and
another 9 on file and waiting to price their
shares. This group has raised, on average, $68.8
million apiece. If these types of numbers are
duplicated in the second half, 2013 will rank as
the busiest in offerings since 2007, but raising
more on average.
The largest biotech IPO of the year was
secured by Merrimack Pharmaceuticals, selling
14.3 million shares at $7 apiece for $100 million. Merrimack (Cambridge, MA, USA) uses
a systems biology approach to fight cancer. It

Table 3 IPOs of 2012
Date
completed

Amount raised
($ millions)

Adocia (Lyon, France)

2/14

36.29

Phase 2

Atossa Genetics (Seattle)

11/8

4

Market (diagnostic)
Phase 3

Company (location)

Cempra (Chapel Hill, NC, USA)

Development status

2/2

57.96

2/22

65

Market

2/8

63.75

Phase 3

DBV Technologies (Bagneux, France)

3/28

53.78

Market (diagnostic)

Durata Therapeutics (Chicago)

7/19

77.63

Phase 3

Intercept Pharmaceuticals (New York)

10/15

86.25

Phase 3

Kythera (Calabasas, CA, USA)

10/16

80.96

Phase 3

3/28

100.1

Phase 3

10/24

18.48

Phase 1
Preclinical

Ceres (Thousand Oaks, CA, USA)
ChemoCentryx (Mountain View, CA, USA)

Merrimack Pharmaceuticals (Cambridge, MA, USA)
Nanobiotix (Paris)
Regulus Therapeutics (San Diego)
Taiwan Liposome Company (Taipei City, Taiwan)
Tesaro (Waltham, MA, USA)
TheraDiag (Marne-la-Vallée, France)
UMN Pharma (Akita, Japan)
Verastem (Cambridge, MA, USA)

10/4

70

12/21

25.10

Market

6/27

86.8

Phase 3
Market (diagnostic)

12/6

10.66

12/11

2.36

Approved

1/26

63.25

Phase 1/2

Table 4 Causalities of 2012
Company

Reason removed from list of public biotech companies

Advanced Cell Technologies

Delisted

Affitech

Delisted

Allos Therapeutics

Acquired by Spectrum Pharmaceuticals

Amsterdam Molecular Therapeutics

Taken over by uniQure

Asterand

Acquired by DiscoveRx

Amylin

Acquired by BMS

AVI BioPharma

Changed name to Sarepta

Bio-Bridge Science

Ceased operations

Callisto Pharmaceuticals

Acquired by Synergy Pharmaceuticals

Complete Genomics

Acquired by Beijing Genomics Institute

Devgen

Acquired by Syngenta

ExonHit Therapeutics

Changed name to Diaxonhit

Gen-Probe

Acquired by Hologic

Genta

Filed for bankruptcy

Hana Biosciences

Changed name to Talon Therapeutics

Harbor Biosciences

Delisted

Hexima

Delisted

Human Genome Sciences

Acquired by GSK

Ipsogen

Acquired by Qiagen

Ista Pharmaceuticals

Acquired by Bausch & Lomb

LifeCycle Pharma

Changed name to Veloxis Pharmaceuticals

Micromet

Acquired by Amgen

NABI Biopharmaceuticals

Merged with Biota Holdings to form Biota Pharmaceuticals

NextGen

Delisted

Ore Holdings

Transformed into a management company

Pharmasset

Acquired by Gilead Sciences

Probiomics

Reverse takeover by Hunter Immunology to form Bioxyne

Pro-Pharmaceuticals

Changed name to Galectin Therapeutics

Protox Therapeutics Inc.

Changed name to Sophiris Bio

Proximagen Neuroscience

Acquired by Upsher-Smith Laboratories

Renovo

Transformed into an investment company

Select Vaccines

Transformed into mining company

Sembiosys

Delisted

Virax Holdings

Delisted

nature biotechnology volume 31 NUMBER 8 AUGUST 2013

feature

© 2013 Nature America, Inc. All rights reserved.

Figure 3 Global biotech IPOs through the years. [Chart: number of IPOs and average amount raised ($ millions) per year, 2002–2012.]

has six oncology products in the clinic, with the
lead product, MM-121, a monoclonal antibody
targeting ErbB3, being developed for non–small-cell lung, breast and ovarian cancers.
Tesaro (Waltham, MA, USA), another oncology company, was founded in March 2010 by
the former management group of Abraxis
BioScience (Los Angeles) and picked up
$20 million in Series A funding a couple of
months later. In 2011, it raised $101 million
through a Series B round of funding, then
priced its public offering at $13.50 per share,
bringing in $86.8 million, including the overallotment for underwriters (an arrangement
allowing a company to issue as many as 15%
more shares than originally planned in an offering). At the top of its pipeline sits the small-molecule NK1 receptor antagonist rolapitant,
in phase 3 for chemotherapy-induced vomiting and nausea, and the small-molecule poly
(ADP-ribose) polymerase (PARP) 1/2 inhibitor
niraparib, with planned phase 3 trials in ovarian and breast cancer. It also has TSR-011 in
phase 2 testing for solid tumors. All three drugs
are in-licensed, and this year Tesaro went back
to the public markets, raising ~$91 million in
a secondary offering.
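
As a rough illustration of the overallotment arithmetic described above, the Python sketch below computes gross proceeds for an offering with an optional overallotment. The function names and the default 15% cap are illustrative assumptions; the sample figures (5 million shares at $15 apiece plus 750,000 overallotment shares) are those reported for Intercept's October 2012 IPO, and the result ignores underwriting fees and expenses.

```python
from decimal import Decimal


def max_overallotment_shares(base_shares: int, cap_percent: int = 15) -> int:
    """Largest number of extra shares allowed under a percentage cap (assumed 15%)."""
    return base_shares * cap_percent // 100


def gross_proceeds(base_shares: int, price: str, overallotment_shares: int = 0) -> Decimal:
    """Gross proceeds in dollars, before underwriting fees and expenses."""
    return Decimal(price) * (base_shares + overallotment_shares)


base = 5_000_000   # shares in the base offering
extra = 750_000    # shares sold through the overallotment option

# The overallotment must not exceed the cap on the base offering.
assert extra <= max_overallotment_shares(base)

total = gross_proceeds(base, "15.00", extra)
print(f"${total:,.2f}")
```

With these inputs the overallotment is exactly 15% of the base offering, and the total comes to the $86.25 million quoted for Intercept.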
Intercept Pharmaceuticals (New York)
raised $86.25 million in October, selling 5 million shares at $15 apiece and another 750,000
shares in an overallotment option. The company’s IPO story is in stark contrast to those going
public even just a year earlier, when companies
were forced to drop their price-per-share range
and number of shares offered to fit lackluster
demand, and even then watched their shares
struggle on the open market. Intercept priced
at the high end of its anticipated range, and the
company was able to increase the number of
shares it planned to sell. The stock spiked on
first trading and kept climbing. Intercept raised another $57.1 million in a follow-on offering this June, selling shares at $33.01 per share, not including the overallotment option.
The post-IPO stock performance is indicative of the current appetite for biotech stocks. Of the 17 biotech IPOs that priced in 2012, 11 were trading above their offering price at the halfway point of 2013, and some, like Tesaro and Intercept, by very large margins. This has the potential to draw previously reluctant institutional investors back into the mix.
The robust stock market for biotech in 2012 helped push follow-on financings to a 10-year high, raising $6.3 billion, $200 million more than the amount raised in 2009 (Table 5). Our list of follow-on offerings puts BioMarin at the top, but the more interesting tale is that of Vivus (Mountain View, CA, USA), which sold 9 million shares at $22.50 per share, bringing aboard $202.5 million. Vivus opened 2012 with about $147 million in cash, its stock around $10, and a beleaguered obesity drug in front of the FDA. The regulatory authority had turned it down once for safety concerns, but on February 22, 2012, the FDA’s advisory panel voted 20–2 in favor of the drug (Qsymia; extended-release phentermine and topiramate), and Vivus’ stock roared as high as $21.44 on the next day of trading. Vivus priced its offering at $22.50 a week after the advisory panel announcement, and said it would use the funds to hire a sales force. However, the drug (Qsiva in Europe) was rejected in Europe in 2013 and sales have not been what the company hoped in the United States. At mid-year 2013, the company’s stock was back below $13.

Table 5 Top ten follow-on offerings of 2012
Company name | Date completed | Amount raised ($ millions)
BioMarin Pharmaceuticals | 5/31 | 248.89
Vivus | 2/29 | 202.50
Idenix Pharmaceuticals | 8/2 | 176
Infinity Pharmaceuticals | 12/31 | 172.50
Rigel Pharmaceuticals | 10/3 | 130.08
Exelixis | 8/9 | 127.50
ImmunoGen | 7/12 | 100
Alnylam Pharmaceuticals | 2/14 | 92.72
Ironwood Pharmaceuticals | 2/9 | 91.11
Neurocrine Biosciences | 1/19 | 88.49
Data are matched for the definition of biotech in Box 1. Source: BCIQ BioCentury Online Intelligence.

Table 6 Top ten debt financing in 2012
Company | Financing type | Date completed | Amount raised ($ millions)
Gilead Sciences | Other | 1/17 | 2,150
Celgene | Sr. notes | 8/7 | 1,500
Elan | Sr. notes | 9/25 | 600
Alkermes | Other | 9/19 | 375
Exelixis | Sr. convertible notes | 8/9 | 250
Isis Pharmaceuticals | Sr. convertible notes | 8/8 | 201
Amarin | Sr. convertible notes | 1/4 | 150
Sequenom | Sr. convertible notes | 9/12 | 130
Nektar Therapeutics | Sr. notes | 7/11 | 125
Affymetrix | Sr. convertible notes | 6/20 | 105
Data are matched for the definition of biotech in Box 1. Source: BCIQ BioCentury Online Intelligence.

Infinity Pharmaceuticals (Cambridge, MA, USA) went to the public markets twice last year, raising $88.3 million in August and more than $172 million in December. Over the course of 2012, Infinity reported preliminary data for a phase 1 trial of a small-molecule phosphoinositide-3 kinase delta/gamma inhibitor IPI-145 in hematologic malignancies, started a phase 2a trial in asthma for the same drug and completed enrollment in a phase 2 trial for retaspimycin HCl (a heat shock protein 90 inhibitor) in non–small-cell lung cancer. The company began 2012 at $9, but as its share price rose, it sold stock at $14.50 and then $26.33. It ended 2012 at $35 a share, and rode that cycle as high as $50 this year, but less-than-impressive trial data have since devalued the stock.

Table 7 Top-ten-selling biologic drugs of 2012
Name | Lead company | Molecule type | Approved indication(s) | 2012 worldwide sales ($ millions)
Humira (adalimumab) | AbbVie | mAb | Rheumatoid arthritis (RA), juvenile rheumatoid arthritis, Crohn’s disease, psoriatic arthritis (PA), psoriasis, ankylosing spondylitis, ulcerative colitis (UC), Behçet syndrome | 9,266
Enbrel (etanercept) | Amgen | Protein | RA, psoriasis, ankylosing spondylitis, PA, juvenile rheumatoid arthritis | 7,967
Rituxan (rituximab) | Roche | mAb | RA, chronic lymphocytic leukemia/small cell lymphocytic lymphoma, non-Hodgkin’s lymphoma, antineutrophil cytoplasmic antibodies associated vasculitis, indolent non-Hodgkin’s lymphoma, diffuse large B-cell lymphoma | 7,049
Remicade (infliximab) | J&J | mAb | RA, Crohn’s disease, psoriasis, UC, ankylosing spondylitis, Behçet syndrome, PA | 6,564
Herceptin (trastuzumab) | Roche | mAb | Breast cancer, gastric cancer | 6,188
Avastin (bevacizumab) | Roche | mAb | Colorectal cancer, non–small cell lung cancer, renal cell cancer, brain cancer (malignant glioma; anaplastic astrocytoma, glioblastoma multiforme) | 6,059
Neulasta (pegfilgrastim) | Amgen | Protein | Neutropenia/leukopenia | 4,092
Lucentis (ranibizumab) | Roche | mAb | Wet age-related macular degeneration, diabetic macular edema, retinal vein occlusion | 4,003
Avonex (interferon beta-1a) | Biogen IDEC | Protein | Multiple sclerosis | 2,913
Rebif (interferon beta-1a) | Merck | Protein | Multiple sclerosis | 2,408
mAb, monoclonal antibody. Source: BioMedTracker.

One notable entrant, not only in the list
of follow-on offerings but also on our list
of top ten debt deals (Table 6) is Exelixis
(S. San Francisco, CA, USA). It raised nearly
$70 million in public funds in February, and
then again in August for $127.5 million,
when it also brought in another $250 million
through senior convertible notes. According to
BioCentury’s BCIQ Online Intelligence database, this brings the amount of money raised
by Exelixis in its existence to a hefty $1.3 billion, including venture capital rounds. That
$1.3 billion has partly financed the company’s
first approval, the small-molecule multikinase
inhibitor Cometriq (cabozantinib), for metastatic medullary thyroid cancer, and the company has a pipeline of partnered products.
Debt deals in general were in favor last year,
though they dropped to below $20 billion,
down from $36 billion in 2011 and $22 billion
in 2010. That high total in 2011 was influenced
mightily by financings by large-caps Amgen,
and to a lesser degree, Gilead; indeed 2012 was
notable in that for the first time since 2008,
Amgen did not raise money by debt financing.
Gilead, which brought in $4.7 billion in debt in
2011, raised slightly less in 2012 ($2.15 billion)
to help pay for its acquisition of Pharmasset.
This decrease in activity from these two
large, profitable biotechs dropped the total
amount raised in debt deals last year.
Celgene, which last raised money through
debt in 2010, picked up $1.5 billion through
a senior unsecured note deal, and Alkermes,
which raised $375 million in debt, used the
money to simply retire the balance of a previous $450-million debt deal. Isis (Carlsbad, CA,
USA) did something similar, pricing an offer
of $201.25 million in convertible senior notes,
with the plan to use net proceeds to redeem
outstanding convertible subordinated notes,
and for general corporate and working capital
purposes.
At the drugstore
The list of top-selling biologics in 2012 is led
by Humira (adalimumab), which brought
in nearly $9.3 billion worldwide in 2012, up
almost 17% over the prior year (Table 7).
Humira gained a new approval in 2012—
ulcerative colitis—and it now has eight across
the globe, including Behçet Syndrome in Japan.
Its revenue growth is mostly in the US market,
however; there, sales moved from $3.4 billion
in 2011 to $4.37 billion last year, a jump of 28%.
Humira’s sales have slowed in Europe, however, up just 9% in 2012, after jumping 23%
and 24% in the two preceding years. AbbVie
(Deerfield, IL, USA) attributes the fall last year
to the austerity measures in Europe, which have
affected healthcare spending and the pricing of
pharmaceuticals. There could be more of this
ahead for Humira, and generic competition,
as well (Humira begins to lose patent protection in 2016). Sagient Research Systems’ (San
Diego) BioMedTracker projects peak sales for
the drug in 2014.
For now, though, every drug on our list
of top sellers improved its haul from 2011,
though a few, such as the interferon beta-1a
Rebif (up $54 million) and the granulocyte colony-stimulating factor
(G-CSF) Neulasta (pegfilgrastim; up $140
million) did not improve theirs much. There
is one change in the rankings from last year—
Herceptin (trastuzumab; $6.188 billion) overtook Avastin (bevacizumab; $6.059 billion) in
2012, reversing their positions from 2011.
Indeed, the 25 drugs grossing the highest revenues changed little year on year, but
there are a few newer drugs that are growing
fast: Alexion Pharmaceuticals’ (Cheshire, CT,
USA) high-priced, anti-complement factor 5
humanized monoclonal antibody Soliris (eculizumab) brought in $1.134 billion in 2012—a
jump of 44.8% from 2011; and Biogen Idec’s
(Cambridge, MA, USA) anti–alpha-4 integrin humanized monoclonal antibody Tysabri
(natalizumab), approved for multiple sclerosis
in 2004 and Crohn’s disease in 2008, brought
in $1.6 billion last year in worldwide sales, up
32.6% from 2010.
In March, Biogen also received US approval
for Tecfidera (dimethyl fumarate), an oral,
second-generation fumarate derivative for
multiple sclerosis. The drug is expected to
receive approval in Europe this year and has
been quick out of the gate, beating analyst estimates for patient uptake. BioMedTracker estimates a high of more than $6 billion in sales for
the drug by 2020.
But there continues to be pushback on high
drug pricing in both Europe and the United
States. Sanofi, of Paris, and partner Regeneron,
received FDA approval for Zaltrap (ziv-aflibercept) in August 2012 for metastatic
colon cancer. The drug was priced at more than
$10,000 a month, but in October, three doctors
from the Memorial Sloan-Kettering Cancer
Center wrote an opinion piece in The New
York Times announcing that they would not use
Zaltrap for their cancer patients, owing to its
high cost versus small increase in survival, and
because a similar drug, Genentech’s Avastin, provided a comparable benefit at a lower price. In response, Sanofi offered fifty-percent-off vouchers to doctors using the drug. Zaltrap was not expected to be a big earner for Sanofi or Regeneron (they split the revenue), and though it is a single example, it was a high-profile one, and could suggest more resistance ahead.

Table 8 Top ten gainers and losers of 2012
Company | 2012 revenue ($ millions) | 2011 revenue ($ millions) | Change in revenue ($ millions) | Percent change
Gainers
Amgen | 17,265 | 15,582 | 1,683 | 11
Monsanto | 13,504 | 11,822 | 1,682 | 14
Gilead Sciences | 9,703 | 8,385 | 1,317 | 16
Regeneron | 1,378 | 446 | 933 | 209
Celgene | 5,507 | 4,842 | 665 | 14
Biogen Idec | 5,516 | 5,038 | 479 | 10
Shire | 4,681 | 4,263 | 418 | 10
Alexion Pharmaceuticals | 1,134 | 783 | 351 | 45
Endo Pharmaceuticals | 3,027 | 2,730 | 297 | 11
Biomerieux | 2,070 | 1,854 | 216 | 12
Losers
Diamyd Medical | 0 | 44 | –44 | –100
Infinity Pharmaceuticals | 47 | 93 | –46 | –49
Progenics Pharmaceuticals | 14 | 85 | –71 | –83
Amyris | 74 | 147 | –73 | –50
Onyx Pharmaceuticals | 362 | 447 | –85 | –19
Acrux | 9 | 96 | –87 | –91
ViroPharma | 428 | 544 | –116 | –21
AVEO Pharmaceuticals | 19 | 165 | –146 | –88
Momenta Pharmaceuticals | 64 | 283 | –219 | –77
Exelixis | 47 | 290 | –242 | –84

Ups and downs
In the realm of public biotech companies, the largest total revenue gainers in 2012 were some of the biggest names: Amgen, Gilead, Celgene and Biogen (Table 8).
Amgen, with its ten approved products, including five in the 25 top-selling biologics from 2012, has a range of revenue streams to produce the growth shareholders want. It posted a $1.683-billion revenue increase in 2012, 11% higher than in 2011. Gilead, jumping $1.317 billion in revenue from 2011, grew 16%. Celgene rose $665 million, or 14%. Biogen, increasing revenue $479 million, or 10%, moved to $5.516 billion in total revenue in 2012.
But the fastest risers on the list are Alexion, which had a 45% increase in revenue to $1.134 billion in 2012, and Regeneron, which grew 209%, or $933 million, to $1.378 billion. Alexion achieved this growth through additions of new patients for the paroxysmal nocturnal hemoglobinuria indication for the company’s flagship drug, Soliris, as well as increasing the number of patients with atypical hemolytic uremic syndrome receiving Soliris. The company says growth for Soliris is in the early part of an extended run. Most analysts agree.
Regeneron had its first full year of profitability in 2012, mostly due to US sales of vascular endothelial growth factor (VEGF) trap Eylea (aflibercept) of $838 million. At the end of June, the company’s stock sat at $225 a share, and its market cap was over $22 billion. Eylea, a recombinant decoy receptor comprising portions of the VEGF receptors 1 and 2 extracellular domains fused to the Fc portion of human IgG1, is approved for wet age-related macular degeneration and retinal vein occlusion, and Regeneron expects the drug could do $1.2 billion to $1.3 billion in sales this year.

Figure 4 Public biotech barometers. (a) Public biotech company revenue, R&D spending, net profits and loss. (b) Number of companies and employees by market cap. Large cap, ≥$5 billion; mid-cap, $1 billion to <$5 billion; small cap, $250 million to <$1 billion; microcap, <$250 million. [Panel a: amount ($ billions) for revenues, R&D and profit/loss by market-cap group. Panel b: number of companies and number of employees for large, mid, small and micro caps.]

In terms of revenue losses, Exelixis took the biggest hit,
sinking $242 million, or 84%, to $47.5 million mostly due to exceptional and one-off
events (that is, lump payments of license revenue from termination of agreements with
BMS and Sanofi, together with a payment for
transferring development activities for two
compounds to Sanofi) that inflated Exelixis’
revenues to $289.6 million in 2011.
Aveo Oncology had a drop in revenues due
to similar windfalls the previous year. In 2011,
it reported $164.8 million in collaboration
revenue, a combination of payments from signing its agreement with Astellas and revenue from other collaborations. Revenues fell back to $19.3
million last year, most of which was a $15-million
milestone it received from partner Astellas
after Aveo filed a new drug application for the
small-molecule multikinase inhibitor tivozanib
in advanced renal cell carcinoma.
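
The dollar and percentage changes quoted for Exelixis and Aveo (and listed in Table 8) follow from simple year-on-year arithmetic. The Python sketch below shows one way to reproduce them; the function name is illustrative, and the inputs are the revenue figures in $ millions given in the text above.

```python
def revenue_change(current: float, previous: float) -> tuple[float, float]:
    """Return (absolute change, percent change) relative to the previous year."""
    if previous == 0:
        raise ValueError("percent change is undefined when previous revenue is zero")
    delta = current - previous
    return delta, 100.0 * delta / previous


# Revenue in $ millions as (2012, 2011), taken from the article text.
reported = {
    "Exelixis": (47.5, 289.6),
    "Aveo": (19.3, 164.8),
}

for company, (rev_2012, rev_2011) in reported.items():
    delta, pct = revenue_change(rev_2012, rev_2011)
    print(f"{company}: {delta:+.1f} million ({pct:+.0f}%)")
```

Rounded to whole numbers, this gives a drop of about $242 million (84%) for Exelixis and about $146 million (88%) for Aveo, in line with the figures reported.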
Combine these winners and losers together,
and our collection of public biotechs put forth
another profitable year in 2012. This is due
to the large caps, profitable by a wide margin
(Fig. 4); 31 mid-caps were collectively in the
red by around half a billion ($593 million), a
slight increase over their losses in 2011; and
the small caps and microcaps were, as usual,
far from profitability. Small caps, in fact,
increased their bleeding to over $3 billion
in 2012, from $1.69 billion in 2011, but the
microcaps improved to lose $4.1 billion last
year, compared to $4.68 billion in 2011.
The 18 large caps in our group turned
a profit of more than $15 billion in
2012, up from the $13.7 billion profit
they collectively recorded in 2011. With
those large profits come the ability to
spend—the large caps burned through
$14.7 billion in R&D in 2012, more than the
three other categories of public biotechs (midcaps, small caps and microcaps) combined.
By our count, there were 427 public biotechs
at the end of 2012, and a little more than half
(215) were located in the United States (Fig. 5).
The vast majority of these companies were
microcap companies, but across all subsectors, the companies have been getting leaner
over the past five years. In 2008, a large-cap
company averaged about 8,500 employees;
last year, it was only a little more than 5,800.
The average workforce of a mid-cap company
also shrank, from 1,900 workers in 2008 to
1,767 last year. Small caps went from 429
employees to 260. Microcaps also trended
down from 97 employees to an average of 70
in 2012.
This is likely a direct result of the fiscal crisis
that hit in 2008—layoffs and downsizing came
in waves in the United States and abroad. At the
time, this was suggested as the silver lining of
the economic crisis—companies would learn
to do more with less and be more efficient with
human and capital resources. Five years later, it
appears to be true.
The one percent increases in size
What’s keeping the executives of public biotech companies up at night? BDO (Brussels),
a global consultancy, surveyed the most recent
filings from the 100 largest life sciences companies on NASDAQ, and their findings show that
many companies are concerned that the Patient
Protection and Affordable Care Act will increase
regulatory burdens and drive up operating costs.
Another 87% fret over reimbursement, showing how important the payor has become to the
biotech equation. In general, there’s a continuing
fear that austerity measures will plague healthcare, both in Europe and the United States, and
drive down the prices of drugs.

Figure 5 Public biotech companies by geographical location. ROW, rest of world. [Chart categories: United States, Europe, Canada, Australia, ROW.]

There’s also the fear that the indifferent IPO
market for biotechs over recent years, especially
for those lacking human efficacy trial data, and the dwindling number of pharma
companies able to acquire biotechs have combined to shrink the number of viable exits for
venture capitalists so far that investing in the
classic university biotech spinout is becoming
unrealistic. Certainly, the public biotech sector
has bounced back, but the types of companies
going public today often look quite different
from the types of enterprises (many of which
came from universities) that went public in previous decades.
This is a concern. Many of today’s most profitable large-cap companies would simply not
exist if current criteria for listing on a public
stock exchange had been applied in previous
decades. This suggests that the profitable public
biotechs of tomorrow might never be built—in
essence, the industry of today may never create another Genentech, thus depriving us all of
innovative drugs. Factor this in with consolidation among global pharma, which limits the
number of transactions and exits for biotechs,
and biotech is looking at far fewer innovative
shots at the goal for its collective pipeline.
But there is a wrinkle. The number of profitable, large-cap biotech companies is actually
increasing. Consider that in 2003, our survey
counted 11 large-cap public biotechs. Five years
later, there were nine. Since then, the numbers
have increased yearly: 13, 15, 14 and then 18 in
2012. Some of the jump in 2012 can be attributed to the buoyant year in stocks in general,
driving up market caps, but over the past ten
years, the number of large-cap biotechs has
increased, not decreased. And it has increased
amid churn. Of the 11 large-cap biotechs existing in 2003, only Amgen, Monsanto, Gilead,
Biogen, Celgene and Shire remain as freestanding entities today. That means that the holes
left by the five large-cap biotechs that were
acquired (Chiron, Genzyme, MedImmune,
Millennium and Genentech) were not only
filled, but another seven were added. The 12 newcomers are CSL,
Alexion, Regeneron, Vertex, Novozymes,
Life Technologies, Illumina, BioMarin, Elan,
Actelion, IDEXX Laboratories and Onyx. At
the end of 2012, these companies all had a market cap greater than $5 billion.
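
The bookkeeping behind these counts can be checked with a few set operations. The Python sketch below is only an illustration; the variable names are our own, and the company lists are copied from the paragraph above.

```python
# Large caps from 2003 that remain freestanding, per the text.
still_independent = {"Amgen", "Monsanto", "Gilead", "Biogen", "Celgene", "Shire"}

# Large caps from 2003 that were later acquired, per the text.
acquired = {"Chiron", "Genzyme", "MedImmune", "Millennium", "Genentech"}

# Companies that became large caps after 2003, per the text.
newcomers = {
    "CSL", "Alexion", "Regeneron", "Vertex", "Novozymes", "Life Technologies",
    "Illumina", "BioMarin", "Elan", "Actelion", "IDEXX Laboratories", "Onyx",
}

count_2003 = len(still_independent | acquired)   # large caps counted in 2003
count_2012 = len(still_independent | newcomers)  # large caps counted in 2012
net_gain = count_2012 - count_2003               # growth after replacing acquired firms

print(count_2003, count_2012, net_gain)
```

The 12 newcomers replace the five acquired companies and add seven more, which takes the count from 11 in 2003 to 18 in 2012.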
At press time, Onyx is on the block, having
received a $10-billion buyout offer from fellow
biotech Amgen. It rejected the bid, but contacted its financial adviser to look into potential buyers. It’s likely Onyx will be purchased
and its life as an independent public biotech
will end. But if the trend shown by our survey
is correct, this is nothing to grieve. There may
never be another Genentech, but there will
almost certainly be another Onyx.
Note: Supplementary Information files are available in the online version of the paper.

Patents

The European BRCA patent oppositions and appeals:
coloring inside the lines
Gert Matthijs, Isabelle Huys, Geertrui Van Overwalle & Dominique Stoppa-Lyonnet

The patents on BRCA1 and BRCA2 held by Myriad Genetics have been the subject of much attention in the United
States recently, but the fire was lit in Europe more than a decade ago.

The patents on the BRCA1 and BRCA2
genes relating to the diagnosis of familial breast and ovarian cancers have attracted
worldwide attention. A fierce race to find
and patent the genes (refs. 1,2; see Box 1) was
followed by an intense struggle by the patentees to safeguard their rights. A suit was
initiated by the American Civil Liberties
Union together with the Association
for Molecular Pathology, the American
College of Medical Genetics, the American
Society for Clinical Pathology, the College
of American Pathologists, Breast Cancer
Action and the Boston Women’s Health
Book Collective in 2009 (ref. 3), and the case
eventually reached the US Supreme Court,
which issued its momentous decision on
13 June 2013 (ref. 4). On the European side,
however, BRCA patent oppositions and
appeals began in early 2001, after the first
BRCA1 patent was granted (Fig. 1). They
were supported by several genetics and
cancer institutes, national genetics societies, patients’ associations, nongovernmental
organizations and governments (ref. 5). The most
recent action dates from September 2010,
with a decision of the European Patent Office
(EPO) on the BRCA2 patent. Here, we summarize the decade-long series of legal actions
against the Myriad patents (ref. 6) in Europe, comparing the original European patents granted
to the final ones, as well as to the US patents.

Gert Matthijs is at the Center for Human
Genetics of the University of Leuven, Belgium;
Isabelle Huys is in the Faculty of Pharmaceutical
Sciences and the Centre for Intellectual Property
Rights (CIR) of the University of Leuven,
Belgium; Geertrui Van Overwalle is at the CIR
of the University of Leuven, Belgium, and at
the Tilburg Institute for Law, Technology and
Society of Tilburg University, The Netherlands;
Dominique Stoppa-Lyonnet is at the Université
Paris Descartes and the Service de Génétique at
the Institut Curie, Paris, France. G.M. and I.H.
contributed equally to this work.
e-mail: [email protected] or
[email protected]

Oppositions and appeals to BRCA patents
The grounds for opposition covered, among
others, criteria for patenting listed in the
European Patent Convention (EPC), such as
lack of novelty, lack of inventive step, lack of
industrial applicability and incomplete disclosure. Apart from the issue relating to whether
or not the BRCA1 gene had been effectively
isolated at the priority date, no real discussion
about the patentability of human genes
(as occurred in the US) took place before the
EPO. Hence, challenges to claims on genes and
diagnostic methods based on a link between a
genetic defect and a disease as subject matter
of such patents have never been successful, at
least not before the EPO, even though the patentability of this type of genetic invention has
been strongly contested by geneticists, medical
professionals, politicians and in the scientific
and the popular press in Europe.
BOX 1 The race for the identification of BRCA genes
Different genes, germline mutations and variants have been associated with an elevated risk of breast and/or ovarian cancer, but two genes count heavily in causing an autosomal dominant form of familial predisposition: BRCA1 and BRCA2.
Efforts to identify these genes started more than 25 years ago, when in 1988 researchers from the UK, the US, Canada, France and The Netherlands joined efforts in the International Breast Cancer Linkage Consortium (BCLC) (ref. 32). The BRCA1 locus was mapped to chromosome 17 by linkage analysis, with a lod score (logarithm of the likelihood ratio for linkage) of nearly 6, by a group led by Mary-Claire King (University of California, Berkeley) in 1990 (refs. 1,33). The location of the gene was refined, thanks to the international and collaborative efforts of the BCLC, and described by Easton and colleagues in 1993 (ref. 34). Narod and colleagues linked this gene to hereditary forms of ovarian cancer (ref. 35). A race to identify and clone the gene was on (ref. 36). One research group, led by Mark Skolnick at the University of Utah (part of the BCLC), went private through the creation of Myriad Genetics, and in 1994 isolated and sequenced the BRCA1 gene using private and federal support (refs. 10,11). Confirmatory results from parallel investigations appeared two months later in Nature Genetics (ref. 37).
Knowledge of the BRCA1 gene and progress in genetic molecular cloning by the Human Genome Project quickly led to the localization of the BRCA2 locus on chromosome 13, by researchers from the University of Utah, the Institute for Cancer Research and the Wellcome Trust Sanger Institute (ref. 38). The BRCA2 gene was partially sequenced in 1995 by the Cancer Research Campaign (now Cancer Research UK) led by Mike Stratton (ref. 12). Its sequence was later completed by Myriad (ref. 39).
For the clinical and diagnostic importance of these genes, see refs. 40 and 41.

Outside the EPO, however, wide concerns
were raised about the potential negative
effects of the BRCA patents. First, geneticists
and cancer specialists voiced fears about barriers to patients’ access to genetic breast cancer susceptibility tests at fair and reasonable
prices (see Box 2). Myriad was marketing an
allegedly expensive BRACAnalysis test, and
samples were required to be sent overseas to
the US to be analyzed. Second, at that time
the BRACAnalysis test was not comprehensive: it did not include deletion and duplication analysis (first developed in Europe, in
public genetics laboratories), so patients were
not offered the best available tests (refs. 7,8). It was
not until January 2013 that Myriad’s ‘integrated’ test included the BRACAnalysis Large
Rearrangement Test (BART) (ref. 9). In addition,
patients’ access to independently developed
BRCA tests was impeded, making it harder
to confirm test results or find more affordable alternatives. Finally, Myriad’s direct-to-consumer sales strategy (ref. 7) was a serious concern
for the genetics community.
A chronological and in-depth analysis of the
BRCA patent dispute is worthwhile and will
be useful in the context of new developments
in European as well as US case law on human
genes and genetic diagnostic testing.
Patenting of the BRCA genes
In spite of the existence of the Breast Cancer
Linkage Consortium (see Box 1), Myriad
applied independently for a number of
US patents on the BRCA1 gene, BRCA1
mutations and the genetic diagnostic test,
among which was US300266 (here referred
to as priority document 2 (P2)), filed on
2 September 1994 (see Supplementary Table 1).
Importantly, the BRCA1 sequence provided
in the first patent application contained
sequencing errors; P2 discloses a BRCA1
sequence (SEQ ID NO: 1) differing in 15

7 Sept. 2010

13 Nov. 2008
19 Nov. 2008

27 Sep. 2007

13 Mar. 2007

9 Jun. 2005
29 Jun. 2005
19 Sep. 2005

17 May 2004

11 Feb. 2004

8 Jan. 2003

28 Nov. 2001

23 May 2001

10 Jan. 2001

25 Nov. 1996
17 Dec. 1996

8 Aug. 1995

Figure 1 Timeline of
1995–1996
2001–2004
2004–2007
2007–2010
the application, grant,
FILING
GRANTING
OPPOSITION
APPEAL
opposition and appeal
events of the various
BRCA1 and BRCA2 patents.
The number of claims at
EP699754
BRCA1 - DIAGN
each stage is indicated.
EP705903
For the BRCA1 patents,
BRCA1 - MUT
the priority documents
BRCA1
were P1, US289221
EP705902
BRCA1 - GENE
(filed 12.08.1994);
P2, US300266 (filed
EP785216
2.09.1994); P3, US308104
BRCA2
(filed 16.09.1994);
BRCA2
P4, US348824 (filed
EP858467
29.11.1994);
BRCA2 (CRUK)
P5, US409305 (filed
24.03.1995);
P6, US483554 (filed
7.06.1995); P7, US487022 (filed 7.06.1995); P8, US488011 (filed 7.06.1995). P5 discloses
SEQ ID NO: 2 (the incorrect sequence) of BRCA1.

nucleotides from the BRCA1 sequence disclosed in the European patent applications.
Nine of the sequence deviations in P2 lead
to amino acid substitutions in the predicted
protein, and six are silent. The relevant scientific publication, which contained (only) the
protein sequence of the 17q-linked BRCA1
gene, appeared in Science on 7 October 1994
(refs. 10,11). Myriad released the (correct)
nucleotide sequence through GenBank at the
time of the Science publication. However, it
was only in March 1995 that Myriad filed a
further US patent application (here referred
to as P5) that contained, effectively, the correct sequence. On 11 August 1995, within 12

months of the P2 filing, three European BRCA1
patents (EP699754, EP705903 and EP705902)
were filed, claiming priority from the different
US applications.
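
The distinction drawn above between nucleotide differences that change an amino acid and those that are silent can be sketched in Python using the standard genetic code. The codon pairs below are toy examples chosen for illustration, not positions from the BRCA1 sequence or from P2, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_difference(ref_codon: str, alt_codon: str) -> str:
    """Label a codon difference as silent or as an amino acid substitution."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "silent" if ref_aa == alt_aa else f"substitution {ref_aa}->{alt_aa}"


# Toy (reference, variant) codon pairs, assumed for illustration only.
toy_pairs = [("GAA", "GAG"), ("GAA", "GAC"), ("CTG", "CCG")]
results = [classify_difference(ref, alt) for ref, alt in toy_pairs]
silent_count = sum(r == "silent" for r in results)

for (ref, alt), label in zip(toy_pairs, results):
    print(ref, alt, label)
```

Applied to a real alignment, a tally of this kind would split a set of nucleotide differences into substitutions and silent changes, as in the nine-versus-six breakdown reported for P2.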
In Europe, the diagnostic methods, a number of mutations and their diagnosis and the
BRCA1 polynucleotides and proteins were ab
initio divided among these three patent applications. Use of the BRCA1 gene to diagnose,
by any technique, a predisposition to breast
and ovarian cancer was patented through
EP699754; a series of BRCA1 mutations was
claimed in EP705903; and the BRCA1 gene
itself and a plethora of possible applications
(such as the development of antibodies and
transgenic mice) were claimed in EP705902.
The precise content, problems with the filing
dates and subsequent amendments to these patents are discussed below. It will become clear
that, if P2 were a valid priority document, then
the subsequent Science publication (ref. 10), and the
disclosure of the correct sequence in GenBank
in October 1994, would not have been novelty
destroying for the European BRCA1 patent
application and granting procedures.
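The date reasoning in this paragraph can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the dates are those given in the text and in the figure caption, the function names are hypothetical, and the novelty test is reduced to "published before the effective priority date", which ignores the rest of the legal analysis.

```python
from datetime import date

# Filing dates of three of the US priority documents, as listed in the
# figure caption (written day.month.year in the source).
PRIORITY_FILINGS = {
    "P1": date(1994, 8, 12),
    "P2": date(1994, 9, 2),
    "P5": date(1995, 3, 24),
}
EUROPEAN_FILING = date(1995, 8, 11)      # filing date of the three European BRCA1 applications
SCIENCE_PUBLICATION = date(1994, 10, 7)  # Science publication mentioned in the text


def within_priority_year(priority: date, later_filing: date) -> bool:
    """True if later_filing falls within 12 months of the priority filing."""
    # Simplification: none of the dates used here is 29 February.
    deadline = priority.replace(year=priority.year + 1)
    return priority < later_filing <= deadline


def is_novelty_destroying(publication: date, effective_date: date) -> bool:
    """Simplified rule: a disclosure made before the effective date is prior art."""
    return publication < effective_date


for name, filed in PRIORITY_FILINGS.items():
    print(name,
          within_priority_year(filed, EUROPEAN_FILING),
          is_novelty_destroying(SCIENCE_PUBLICATION, filed))
```

Under these assumptions, the European filing falls within the priority year of all three documents, but the Science publication predates only P5. That is why the validity of P2 as a priority document matters so much in what follows.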
For the BRCA2 gene, patent applications were filed by two parties. The first
priority filings leading to the BRCA2 gene
patent EP858467 were submitted in the
United Kingdom by Cancer Research UK on
23 November 1995. The content of these UK
priority filings was published in Nature on
21 December in the same year (ref. 12). This patent did
not contain a complete BRCA2 sequence, but it
disclosed the method by which the sequence
had been identified (including a ‘product-by-process’ claim). It is known as the ‘Stratton
patent’, after the main inventor and principal
investigator on the patent and publications. In
December 1996, Myriad filed a European patent application (leading to EP785216) claiming
the BRCA2 sequence, a number of mutations
that had been discovered in a research context
and a wide range of diagnostic applications.

BOX 2 Licensing policy of Myriad and co-owners
After the publication of the BRCA1 and BRCA2 sequences in GenBank in 1994 and
1995, laboratories worldwide began designing in-house diagnostic tests for detection of
breast cancer–associated mutations and providing services to patients, largely without
knowledge of pending patent rights. At the time, US applications were not published and
European patent applications remained unpublished up to 18 months after the earliest
filing (March 1996 for BRCA1).
Soon after the discovery of the BRCA genes, Myriad Genetics began marketing the
Multisite 3 BRACAnalysis (ref. 42), designed for the Ashkenazi Jewish population. Myriad set up
a business and promoted its diagnostic tests and educated physicians to use its testing
services. Its business model was to conduct all proband sequence analysis at its Utah
laboratory and to license out the less expensive, single-mutation detection in the US or in
other countries (ref. 43).
Meanwhile, Myriad began sending cease-and-desist letters to several laboratories
performing the test by other methods; the first was directed to the University of
Pennsylvania’s Genetic Diagnostics Laboratory. Similar letters were sent to Canadian labs.
To our knowledge, no European lab has received such a letter, although Myriad offered
a license to public laboratories in France and elsewhere to conduct genetic testing (ref. 5).
Published figures have mentioned an up-front payment of $15,000 plus $50 per test
for a test of five mutations (ref. 44) but no one has accepted the licensing conditions. However,
the threat of being sued was an additional trigger for European opposition actions. At the
time, many genetic discoveries were being made and patented, and public debate on the
patenting of biotechnological inventions and access thereto was rife (ref. 45).

BOX 3 Patent granting and opposition procedure
For a European patent to be granted, the invention must be new,
inventive and industrially applicable. Findings contrary to the
ordre public or morality are excluded from patentability (Articles
52–57, European Patent Convention (EPC)). Further, inventions
must be fully disclosed in the patent application, completely
described and clear enough for it to be carried out by a person
skilled in the art (Articles 83 and 84 EPC). Analyses applied
to claims on genes and diagnostic methods were published by
Huys et al. (refs. 13,14,46).
Opposition is a procedure at the European Patent Office
(EPO) to challenge European patents (Article 100 EPC). An
opposition can be filed up to nine months after mention of grant
of the patent, for the following reasons: the subject matter is
not patentable within the terms of Articles 52–57 EPC, the
patent does not disclose the invention in a manner sufficiently
clear and complete or the subject matter of the patent extends
beyond the content of the application as filed. Appeals can be
filed against the decisions of the examining office or opposition
division. During opposition and appeal, the parties may propose
alternative claims (main or auxiliary requests) to the opponents.
Doubts with regard to the public interest, however, are not
official grounds for opposition or appeal before the EPO.
Thus, the patent system, and especially the procedure for
attacking patents, differs from that in the US, where granted
patents are typically challenged in court rather than before the
patent office. In Europe, third parties will more often make
use of the opposition procedure to challenge the validity of the
patents before the EPO.

Understanding the course of the process
Opposition and appeal procedures (see Box 3)
were launched before the EPO with respect to
the three BRCA1 patents (EP699754, EP705903
and EP705902) and the BRCA2 patents
(EP858467 and EP785216). A recurring factor
in the decisions of the EPO opposition division
and boards of appeal was the importance and
questionable validity of P2 (US300266). What
happened is not an uncommon scenario in the
deposition of genetic sequencing results: given
the importance of the ‘first-to-file’ principle in
Europe, the identification of a genetic sequence
often triggered a rush by the inventors to file a
patent application to get the earliest possible priority date. In this case, the priority date was thus
linked to P2. Despite the errors in the sequence,
P2 contained the correct open reading frame
(ORF) for the BRCA1 gene. Therefore, by subsequently limiting the claims to the ORF rather
than to the correct complete sequence, the patent owners managed to push back the priority date of the patent from March 1995 (P5) to
September 1994 (P2) so that the Science publication could no longer be considered novelty
destroying. This saved the diagnostic-method
claims in EP699754: the board of appeal held
that the exact nucleic acid sequence is not decisive for the validity of the priority date when the
method is meant to detect frameshift mutations
and when the ORF is correct. This explains why,
eventually, a patent was granted on a method
for determining frameshift mutations. But for
claims to oligonucleotides (that is, probes), such
as those in EP705902 and EP705903, errors in
the sequence did undermine their validity.
These proceedings had one particularly
intriguing feature that is also important for
understanding this review and its complexity: the timing of the oppositions and appeals
regarding the three BRCA1 patents (that is,
the dates set by the EPO for the oral hearings)
changed over the course of the procedure. The
patent that would most affect genetic diagnostics (EP699754) was issued first, and, logically, the oral hearings before the Opposition
Division were scheduled first. During these oral
hearings, the patent was revoked in its entirety.
Shortly thereafter, the patentee significantly
reduced the scope of the claims of the two other
BRCA1 patents (EP705902 and EP705903) prior
to the oral hearings on each patent’s opposition.
But the oral hearings for the second (EP705903)
and third (EP705902) patent were handled in
reverse order. The EPO’s rationale for this decision has not been disclosed. Nevertheless, both
patents were further reduced in scope during
the oral proceedings. Interestingly, the order
in which the oral proceedings for the appeal
procedures were scheduled was again different, such that the most important of the three,
diagnostic patent EP699754, was handled last.
It would appear that there were good reasons
for altering the order. It can be argued, however,
that this alteration was of considerable advantage to the patent owner.
For the sake of clarity, we will discuss the outcome of each patent in the order of the respective decisions in appeal. Note that because
validity was first challenged before the EPO, the
case has not (yet) reached the national courts
in Europe.
European actions against BRCA1 patents
Claims on the BRCA1 gene (EP705902). The
original European patent application EP705902
was filed with 50 claims covering, among others, a handful of mutations found to be linked
to familial breast and ovarian cancer, and the
use of these mutations in a method for diagnosing predisposition to these cancers (the most
relevant claims are included in Supplementary
Table 2). The claims were found inventive by
the EPO, although they relate to mutations
that could be easily and immediately found
in different patients in different populations.
The patent was granted with 34 claims that
referred directly or indirectly to the BRCA1
gene sequence “or an amino acid sequence
with at least 95% identity to the amino acid
sequence of SEQ. ID. NO: 2” (Supplementary
Table 1, EP705902 claim 1). This description
was inserted to encompass all anticipated polymorphic BRCA1 alleles. The proprietor tried
to use this variability to accommodate the
sequencing errors in P2. But the 5% margin
of error is staggeringly high compared to the
variation in a human population. For reference,
the divergence in coding sequences between
individuals is on the order of 0.1%; between
humans and chimpanzees it is less than 2%.
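To put the 5% margin in perspective, the arithmetic can be sketched as follows. The protein length of 1,863 amino acids is an assumption brought in for illustration (it is the commonly cited length of the BRCA1 protein and is not stated in this article), and the function names are hypothetical.

```python
def max_mismatches(length: int, min_identity_percent: int) -> int:
    """Largest number of differing positions that still meets the identity threshold."""
    return length * (100 - min_identity_percent) // 100


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two pre-aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


BRCA1_PROTEIN_LENGTH = 1863  # assumption: commonly cited length, not taken from the article

# Number of residues that may differ under an "at least 95% identity" claim.
print(max_mismatches(BRCA1_PROTEIN_LENGTH, 95))

# Toy peptides (not BRCA1 sequence) differing at one of ten positions.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```

On that assumed length, a 95% threshold tolerates 93 differing residues, whereas a divergence of about 0.1% corresponds to roughly two positions over the same length.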
Eight oppositions were filed against the patent as granted (ref. 5). This was more than for the
other patents, probably because it was the last
case to be handled, hence the response from
clinical and research communities, national
governments and nongovernmental organizations had accumulated. Prior to the oral hearing and very much aware—from the parallel
proceedings—of the problems with the priority date, the patentee filed a new main request
(see Box 3). In this request, the original product claim (claim 1 of the patent as granted)
was reformulated as a product-by-process
claim (indicated by wordings such as “obtainable by”), a similar approach to that taken by
Cancer Research UK in the Stratton patent.
An auxiliary request was filed as well, limited
to claims 6, 10 and 12 of the original patent,
directed to a probe, a cloning vector and host
cells, respectively. However, the opponents’
main concerns were related to the product-by-process claim in the newly filed main request,
which was considered to be unsupported by the
application as filed and, arguably, beyond the
scope of the original filing (and thus contrary
to EPC Article 123(2), 123(3) and 84). The patentee put forward that, by using the specified
probe to screen a genomic library, one could
find the BRCA1 coding sequence, and that
from this point of view the invention had been
disclosed sufficiently to obtain a patent. But the
opposition division agreed with the opponents
and decided that the main request did not meet
the requirements of the EPC. For this reason,
EP705902 was limited to claims covering only
the probe, the cloning vector and the host cells
(Supplementary Table 1).
From a molecular diagnostic standpoint, a
patent claim on a probe to isolate a gene would
not interfere with the sequencing process to
detect BRCA1; thus, the impact of EP705902
was significantly reduced.
The patent proprietor and the opponents
lodged appeals against the EPO’s decision in
March and November 2005. Subsequently,
a new main request and three other auxiliary
requests were filed (Supplementary Table 1).
On 12 December 2007, the appeal board decided
the case (ref. 15). The board first considered the ‘main
request’, which was again written in a product-by-process format, and which was again found
by the appeal board not to meet the requirements of Article 123(2) EPC and, hence, not
supported by the original application as filed.
Next, the board considered auxiliary request I,
which contained a disclaimer (Supplementary
Table 1) that selectively excluded the differences
between the correct and erroneous sequences
of BRCA1, and found it, too, not compliant
with Article 123(2) EPC. The board then discussed in detail the issue of ‘priority’ in view
of auxiliary request II. According to the board,
which followed earlier case law (ref. 16), the same date
of priority could be claimed only for the same
invention (Article 87(1) EPC). Thus, no priority
right should be claimed from an earlier application (such as P2) disclosing a DNA sequence
unless its deviation from the claimed sequence
is within the margin of error of the sequencing
method used. In addition, support for a claim in
an application is acknowledged only if a skilled
person can derive the subject matter of the
claim ‘directly and unambiguously’, using common general knowledge, from the application
as a whole. The patentee admitted during the
appeal procedures that there were differences
between the sequences in P2 and the application filed with the EPO; however, according to
the patentee, this difference occurred within
the margin of error of Sanger sequencing and
had no effect on the use of the DNA sequence
to diagnose breast and ovarian cancer. However,
the board held that a narrow and strict interpretation should be applied to the concept of
‘the same invention’ (following earlier case law,
G2/98 and T70/05) and thus refused the second
auxiliary request.
With regard to the third auxiliary request,
the discussion briefly addressed whether
the claimed probe was a discovery or a patentable invention. The board concluded
that the probes are isolated elements of the
human body and are therefore patentable
(Supplementary Table 1) and, thus, accepted
auxiliary request III, which closed the proceedings on this patent.
Overall, the patent was significantly limited,
to claims to a probe derived from the BRCA1
gene, a cloning vector or host cells. These
claims entail no specific problems for genetic
diagnostics.
From multiple claims to one gene claim
(EP705903). On 11 August 1995, Myriad
Genetics and co-applicants filed a European
patent application claiming rights to a mutant
BRCA1 gene, a probe, a cloning vector and, in
addition, a method for identifying a mutant
BRCA1 nucleotide sequence (already patented in EP699754, hence a form of ‘selection
invention’; the most important independent
claims are listed in Supplementary Table 1).
Essentially, a claim including 34 individual BRCA mutations was granted, as were
claims to methods for detecting these mutations. Oppositions were filed by six parties (ref. 5;
Greenpeace and the government of The
Netherlands joined the parties that had previously opposed EP699754).
Oral proceedings were held on
24–25 January 2005. The first patent
(EP699754) had already been revoked by
the opposition division (on 17 May 2004),
and the patentee arguably tried to recover
the original method claim of the EP699754
patent by submitting an entirely new main
request and three auxiliary requests for
EP705903. The main request now contained
only one claim for a method for diagnosing
breast and ovarian cancer by determining the
presence of one mutation (instead of 34) in
the BRCA1 gene—185delAG, a frameshift
mutation that occurs frequently, particularly
in Ashkenazi Jewish populations. But because
the claim still relied on the priority document containing the wrong BRCA1 sequence
(P2), the opposition division decided that it
could not be accepted. Hence, EP705903
was amended before the opposition division
(on 9 June 2005) to contain three specific
claims, whereby the method claim for the
detection of the 185delAG mutation was
changed to refer to a probe “consisting of
15 to 30 nucleotides of SEQ ID NO: 1 and
containing the mutation 185delAG>ter39”
(deletion of AG nucleotides at the 185th
position (185delAG); the two-base-pair
deletion introduces a stop codon at the
39th amino acid residue; Supplementary
Table 1). In practice, this means that the patentee could exclude third parties from using
such a probe to detect the mutation, but one
could still detect the mutation by sequencing
across exon 2 of BRCA1 without infringing
the patent.
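Why a two-base deletion such as 185delAG leads to an early stop codon can be illustrated with a toy example. The DNA string below is hypothetical and is not BRCA1 sequence, and the helper names are assumptions made for this sketch. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def delete_bases(dna: str, start: int, count: int) -> str:
    """Return dna with `count` bases removed, beginning at 0-based index `start`."""
    return dna[:start] + dna[start + count:]


def is_frameshift(indel_length: int) -> bool:
    """An insertion or deletion shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


WILD_TYPE = "ATGAAAGAGCTTAACGGCTGA"     # toy open reading frame, hypothetical
MUTANT = delete_bases(WILD_TYPE, 7, 2)  # removes the "AG" at indices 7 and 8

print(translate(WILD_TYPE))  # full-length toy peptide
print(translate(MUTANT))     # truncated: the shifted frame reaches a stop codon early
print(is_frameshift(2), is_frameshift(3))
```

In this toy case, the deletion brings a TAA stop codon into frame at the fourth codon. The same mechanism, on the real sequence, is what the notation 185delAG>ter39 describes.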
The patent owner and the opponents
appealed against the decision of the opposition
division. The board of appeal concluded the
issue on 13 November 2008 (ref. 17). During
the oral proceedings, the patentee requested
the patent be maintained on the basis of nine
claims, wherein the product claim on the
probe was replaced (again) by a diagnostic
method claim on the ‘Ashkenazi mutation’
(Supplementary Table 1). The opponents
again identified sequencing errors in intron
and exon regions of the BRCA1 sequence
in one of the priority documents. However,
the board of appeal in this case took a less
stringent interpretation of the ‘same invention’ requirement. With respect to priority,
the board concluded that none of the 15
sequence deviations in the priority document
could have any effect on the claimed method
of diagnosing this specific mutation (as the
closest deviation occurred several nucleotides
away from the 185delAG mutation). Thus, it
was decided that with respect to the defining
features of the claimed invention, the disclosure of the sequence in the relevant priority
document (P2) and the patent in suit were
identical.
This decision elaborated previous case law
on priority claims, particularly a case concerning tissue plasminogen activator (t-PA; ref. 18),
and on the difference between a product claim
and a method claim in the context of entitlement to priority. In the t-PA case (containing a product claim for the PLAT gene, which
encodes t-PA in humans), the board denied
entitlement to priority on the basis of three
nucleotide differences (resulting in three
amino acid differences) between the priority sequence of the PLAT gene and the
corrected sequence in the European patent. In
that case, the board of appeal stated that “the
primary amino acid sequence of a protein (or
nucleotide sequence of a DNA) constitutes a
true technical feature and relying on a given
sequence rather than on another one for the
definition of the subject matter in a claim
makes a critical difference”. This view was
confirmed in several subsequent cases (refs. 19–21)
and even in the appeal decision of the
BRCA1 patent EP705902 (ref. 15). However,
for a (use) claim to a probe, because absolute identity is not necessary to make such a
probe useful, small differences in sequences
are tolerated. This means that the importance of mismatches between the sequences
filed and those granted is weighted against
the type of claim at hand (protein sequence
versus probe).
The decision of the opposition division
ultimately was set aside and the patent was
amended according to the new main request.
Its claims were limited, however, with a potentially higher impact on diagnostics for certain
Jewish populations.
From a comprehensive diagnostic method
to claims for detecting frameshift mutations
(EP699754). From a genetic diagnostic standpoint, patent EP699754 has always been the
most important. Myriad filed a patent application on 11 August 1995 with claims broadly
covering the use of BRCA1 for in vitro diagnostic testing, whatever the technique chosen in the laboratory. The patent was granted
on 10 January 2001 to Myriad Genetics, the
University of Utah Research Foundation
and the US Department of Health and
Human Services, and it contained a total of 29
claims (claims 1, 2, 25 and 26 are shown in
Supplementary Table 1).
Oppositions were launched before the
EPO by three French institutes (the Institut
Curie, the Assistance Publique–Hôpitaux de
Paris and the Institut Gustave Roussy); by
an informal Europe-wide consortium led
by the Belgian Society of Human Genetics;
and by an Italian initiative with the Angela
Serra Association for Cancer Research, which
represented an Italian patient organization (ref. 5).
Grounds for opposition were similar to those
for the other patents. In addition, the opponents cited prevention of innovation, lack of
guarantee of the test’s availability for certain
women and unfair advantage in the creation
of a private mutation databank by Myriad.
During the oral proceedings before the opposition division, the proprietor filed several
auxiliary requests (Supplementary Table 1),
some of which contained wording that
would allow for a removal of the sequence
information on the BRCA1 gene from the
claim. Interestingly, during the oral proceedings, Myriad failed to introduce the
‘frameshift approach’ (that is, the limitation
of the claims to frameshift mutations only)
that was later proposed during appeal (and
upheld by the board of appeal, as mentioned
earlier). The opposition division found that
all these requests lacked clarity or extended
the scope of the patent, contrary to Article
123(2) or 123(3) EPC, and revoked the
patent on 17 May 2004. This decision was
widely covered in the press.
During the proceedings of the board of
appeal, the rephrasing of the claim led to a
reinstatement of the patent, now claiming “a
method for diagnosing a predisposition for
breast and ovarian cancer in a human subject
which comprises determining in a tissue sample of said subject whether there is a germline alteration that is a frameshift mutation in
the sequence of the BRCA1 gene coding for a
BRCA1 polypeptide altering the open reading
frame for SEQ ID NO: 2, said alteration being
indicative of a predisposition to said cancer”
(Supplementary Table 1).
Notably, the amended patent now also
includes claims specifically addressing
the detection of the 5385insC mutation in
BRCA1. The opponents argued that there
was a lack of clarity because the sequence
in SEQ ID NO: 2 defined an amino acid
sequence, whereas an ORF is a feature of a
nucleic acid sequence. But the board concluded that ‘frameshift mutation’ would be
clearly understood by the skilled person
without ambiguity (ref. 22).
With regard to the errors in the coding
sequence in P2, the board argued that a
skilled person would be able to anneal the
primers to the target sequence using other
selected experimental conditions. Thus, the
board concluded that the same test results
would be obtained by a skilled person performing the method of claim 1 of the main
request using the sequence information from
the disputed P2 or from the patent in suit.
This contrasts with the decision in the related
T1213/05 case on EP705902. The claim in
the BRCA patent EP705902 was a product
claim, conferring absolute product protection for the claimed DNA sequence—that
is, also covering uses where the differences
were of vital importance. In EP699754, the
invention is a diagnostic method in which
the information is necessary as a reference
for the determination of frameshift mutations. According to the board, this comparison does not require the exact sequence. As
a final result of the appeal procedure, the
patent was maintained in its amended form,
including the ‘frameshift claim’ and six additional claims.
As frameshift mutations are the most frequently occurring pathogenic mutations in
the BRCA1 and BRCA2 genes (ref. 23), this patent
would potentially interfere with genetic diagnostic services.

European actions against BRCA2 patents
EP785216. The first BRCA2 patent, granted
to Myriad Genetics and the University of
Pennsylvania, related to BRCA2, several
disease-associated BRCA2 mutations, and
methods of sequencing allelic variants
suspected to be associated with a predisposition to breast cancer. Oppositions were filed
by several parties, including the French- and
Belgian-led groups mentioned earlier (ref. 5).
During the opposition proceedings, the
patentee filed a new main request comprising a single claim, which read: “Use of an isolated nucleic acid which comprises the coding
sequence set forth in SEQ ID NO: 1 from
nucleotide position 229 to nucleotide position
10482 and further comprising the mutation
associated with a predisposition to breast
cancer, wherein T at nucleotide position 6174
is deleted, for diagnosing a predisposition to
breast cancer in Ashkenazi-Jewish women
in vitro” (Supplementary Table 1). Although
it is understandable that the patentee wanted
to safeguard a claim on this mutation because
of its commercial interest, the fact that the
mutation as such had been described prior to
the filing date of this patent necessitated the
inclusion of a distinguishing technical feature to make the claim novel and inventive.
The proprietor argued that the reference to
Ashkenazi Jewish women constituted such a
feature. In other words, the claim was granted
on a method for diagnosing Ashkenazi Jewish
women with this mutation, on the basis of
the argument that no one had previously
associated this mutation with this particular
subpopulation. From a molecular geneticist’s
perspective, it could be argued that this is not
much more than a coincidence.
The opposition division considered this new
request, limited to detection of the 6174delT
mutation in women of Ashkenazi descent, valid;
the patent was maintained in amended form (ref. 24).
The opposition division stated during the oral
hearings that the claim meets Article 53(a)
EPC—the morality clause. In the weeks afterward, the European Society of Human Genetics
and several Jewish women issued press releases
and sent letters to the EPO expressing concerns
about the specific maintenance of the Ashkenazi
Jewish people in the claims. This ruffled feathers at the EPO.
Because the BRCA2 patent (EP785216)
was thought to contain several inventions, a
BRCA2 divisional application (EP1260520)
was filed focusing on the use aspect, but the
examining division considered the claims for
genetic diagnostic methods not to be patentable, as not supported by the application as
filed (because of a difference in the reference
material). On appeal, the patentee rephrased
its method claim to “a method for determining
variation in the open reading frame (ORF)…”
(claim 1 of Main Request on EP1260520) in
line with the previous BRCA decisions. In
a communication issued in June 2010, the
board expressed a positive view of the claims
in the new main request. The appeal board
considered itself in line with the rulings of the
board in T666/05 and T80/05, two other cases
involving determination of frameshift mutations where the reference ORF was defined in
relation to a specific sequence, and proposed
to remit the case to the department of first
instance for further prosecution.

Figure 2 Limited coverage of BRCA patents in Europe (as of 1 July 2013). For a patent to be valid, the
proprietor has to pay annual fees per country (these renewal fees are paid to each national patent office,
which then pays half of the proceeds back to the EPO). Thus, a patent proprietor typically selects the
countries in which to maintain the patents. The coverage for the Myriad patents has changed over time.
Patent data can be retrieved at https://register.epo.org/espacenet/regviewer. Yellow: patent coverage of
BRCA1 and BRCA2 patents. Grey: patent coverage of only BRCA2 patents.
[Figure legend labels: BRCA1 (EP699754, EP705902, EP705903); BRCA2 (EP785216).]

EP858467. The second BRCA2 patent
granted—the ‘Stratton patent’—asserted
rights over BRCA2 in all its allelic variants
through a product-by-process claim (claim 3)
and contained a claim to a method of diagnosis by determination of mutations in part of
the BRCA2 gene (claim 16). As with BRCA1,
incomplete BRCA2 sequences were initially
filed by Cancer Research UK. This patent
was opposed by Myriad Genetics. The opposition division decided that priority for claims
referring to the full-length BRCA2 sequence
could validly be claimed from the first two
priority documents (GB 9523959.6, filed
23 November 1995, and GB 9525555.0, filed
14 December 1995), though the first priority
document represented only about 10% of the
full-length BRCA2 sequence, and the second
represented about 75% of the full sequence.
During opposition, the patent was amended
(Supplementary Table 1).
The decision of the opposition division was
appealed by Myriad, and the patent EP858467
was revoked (ref. 25). Thus, at present, only one
BRCA2 patent is still effective, namely Myriad’s
patent.

Consequences of the EPO decisions
The EPO decisions in the oppositions and
appeals described above may influence genetic
diagnostics on the BRCA genes but have limited
effect on genetic diagnostics in general (refs. 8,13,26).
With regard to BRCA testing, the
European patents on BRCA1 and BRCA2
held by Myriad have been drastically reduced
in scope. Myriad now owns patents covering the detection of an individual ‘Ashkenazi
mutation’ in the BRCA1 gene (EP705903);
methods for detecting frameshift mutations
in BRCA1 (EP699754); probes, cloning vectors and host cells relating to the BRCA1
gene (EP705902); and methods for detecting mutations in BRCA2 (EP785216). The
patents no longer cover the BRCA1 gene
sequence as such. It is noted that the patent on the frameshift mutations creates legal
uncertainty, as it is unclear how to interpret
the claim from a diagnostic perspective (ref. 14).
When a screening for genes associated with
breast cancer is performed, the results may
include a frameshift, but most of them will
not, given that BRCA1 and BRCA2 mutations
are found in only a fraction of cases (depending on the inclusion criteria for testing). For
the BRCA2 gene, no claims on frameshift
mutations are on file at present, but there is
a chance that they will be introduced for the
divisional filings.
With regard to technical matters, it appears
that for product claims on DNA or amino acid
sequences, entitlement to priority can be lost
on the basis of sequence deviations between
a priority document and a subsequently filed
European patent application. But this is not
necessarily the case for claims on methods in
which DNA or amino acid sequences are used,
as long as the sequence deviations do not affect
the method claimed.

Although Myriad has consistently communicated that it has no intention of attacking
academic researchers who use its inventions (ref. 27),
certain (commercial) diagnostic laboratories could now potentially be prosecuted
for infringing these patents. However, the
infringing laboratories have to be brought
before a national court, and it remains to be
seen whether the patents would be considered valid after a further exhaustive review in
such courts. In this regard, the Court of First
Instance in Paris has held that “the French
courts are not bound by the decision of the
EPO so that these decisions—even if issued
by the Enlarged Board of Appeal—are merely
indications of the analysis made by the EPO
to grant European patents” (ref. 28). Thus, one may
expect that the Myriad patents will be subjected
to a thorough de novo investigation into patentability if these patents are ever enforced before
a national court in Europe. Interestingly, the
patents have been maintained only in selected
countries (Fig. 2). Clearly, this creates additional disparity for genetic laboratories and
patients.
Outlook
Was it worth it, in the end, to oppose the BRCA
patents before the EPO? The decade-long challenge to the patents in Europe arguably kickstarted the worldwide discussion of their legal
and ethical validity that has now culminated in
the US with the recent Supreme Court ruling.
The focus, however, seems to have shifted from
patents on genes to patents on biomarkers—
often genetic (refs. 29,30).
In Europe, the legislation on patenting—
and, thus, the legal framework in which the
EPO operates—has not changed. There is no
way to color outside the lines of patent law
during opposition and appeal, though society’s appreciation of genes and patents may well
have changed, and gene patents are probably
less valued than they were before the recent
evolutions in the US.
In the meantime, Myriad Genetics has
opened a laboratory in Europe and seems to
have decided to compete commercially with
European genetics laboratories rather than
legally challenge them. New issues, such as the
creation of proprietary databases of human
genetic sequences, are arising as a result of
the monopoly in genetic testing caused by the
patenting31.
Note: Any Supplementary Information and Source Data
files are available in the online version of the paper
(doi:10.1038/nbt.2644).
ACKNOWLEDGMENTS
The authors would like to acknowledge many people
for advice throughout the years, and specifically
D. Halley, E. Girodon-Boulandet and
E. Van Zimmeren for critical comments on the
manuscript. This article is based on the examination
documents publicly available on the register of the
EPO (www.epoline.org or https://register.epo.org/espacenet/regviewer) and related literature.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Davies, K. & White, M. Breakthrough: The Race to Find the Breast Cancer Gene (J. Wiley, Hoboken, NJ, USA, 1996).
2. Dalpé, R. et al. Sci. Technol. Human Values 28, 187–216 (2003).
3. Association for Molecular Pathology et al. v. Myriad Genetics, Inc., et al. 702 F.Supp. 2d 181, 192–211 (SDNY, 2010).
4. Association for Molecular Pathology et al. v. Myriad Genetics, Inc. et al. US 12-398 (2013).
5. Matthijs, G. Fam. Cancer 5, 95–102 (2006).
6. In 2004, Myriad Genetics transferred its rights on the different patents to the University of Utah Research Foundation. Strictly speaking, one can thus no longer call them the “Myriad patents.” Still, it is generally known that Myriad is the exclusive licensee, so for the purpose of this survey, the change in patent ownership is of no relevance.
7. Matloff, E.T. & Brierley, K.L. Lancet 376, 314–315 (2010).
8. Cook-Deegan, R. et al. Genet. Med. 12, S15–S38 (2010).
9. Myriad. Integrated BRACAnalysis to include BART. <http://d1izdzz43r5o67.cloudfront.net/sales-aids/Integrated+BRACAnalysis+to+Include+BART.pdf> (2012).
10. Miki, Y. et al. Science 266, 66–71 (1994).
11. Futreal, P.A. et al. Science 266, 120–122 (1994).
12. Wooster, R. et al. Nature 378, 789–792 (1995).
13. Huys, I. et al. Eur. J. Hum. Genet. 19, 1104–1107 (2011).
14. Huys, I. et al. Nat. Rev. Genet. 13, 441–448 (2012).
15. T1213/05 (Breast and ovarian cancer/UNIVERSITY OF UTAH) of 27.9.2007.
16. G2/98, OJ EPO 2001, 413.
17. T0666/05 (Mutation/UNIVERSITY OF UTAH) of 13.11.2008.
18. T923/92, OJ EPO 1996.
19. T0351/01 (Tissue Factor Protein/GENENTECH) of 2.7.2003.
20. T70/05 (Apoptosis receptors/GENENTECH) of 7.2.2006.
21. T0030/02 (Xylanase/NOVOZYMES) of 9.10.2006.
22. T0080/05 (Method of diagnosis/UNIVERSITY OF UTAH) of 19.11.2008.
23. Caux-Moncoutier, V. et al. Hum. Mutat. 32, 325–334 (2011).
24. T0156/08 (BRCA2/UNIVERSITY OF UTAH) of 14.1.2011.
25. T0902/07 (BRCA2/CANCER RESEARCH TECHNOLOGY) of 7.9.2010.
26. Cho, M. Trends Biotechnol. 28, 548–551 (2010).
27. Rimmer, M. Intellectual Property and Biotechnology: Biological Inventions (Edward Elgar, Cheltenham, UK, 2008).
28. Actavis Group & Alfred E. Tiefenbacher GmbH v. Merck Sharp & Dohme Corp. Tribunal de Grande Instance de Paris, third chamber, 28 September 2010, N° RG: 07/16296.
29. Hopkins, M.M. & Hogarth, S. Nat. Biotechnol. 30, 498–500 (2012).
30. Graff, G.D. et al. Nat. Biotechnol. 31, 404–410 (2013).
31. Cook-Deegan, R., Conley, J.M., Evans, J.P. & Vorhaus, D. Eur. J. Hum. Genet. 21, 585–588 (2012).
32. Lenoir, G.M. et al. Cancer Res. 50, 4448–4449 (1990).
33. Hall, J.M. et al. Science 250, 1684–1689 (1990).
34. Easton, D.F. et al. Am. J. Hum. Genet. 52, 678–701 (1993).
35. Narod, S.A. et al. Lancet 338, 82–83 (1991).
36. Albertsen, H. et al. Am. J. Hum. Genet. 54, 516–525 (1994).
37. Friedman, L.S. et al. Nat. Genet. 8, 399–404 (1994).
38. Wooster, R. et al. Science 265, 2088–2090 (1994).
39. Tavtigian, S.V. et al. Nat. Genet. 12, 333–337 (1996).
40. Lalloo, F. & Evans, D.G. Clin. Genet. 82, 105–114 (2012).
41. Ripperger, T. et al. Eur. J. Hum. Genet. 17, 722–731 (2009).
42. Williams-Jones, B. Health Law J. 10, 123–146 (2002).
43. Gold, E.R. & Carbone, J. Myriad: In the Eye of the Policy Storm. (The Innovation Partnership and the McGill Centre for Intellectual Property Policy, 2008).
44. Munktell, P. Compulsory Patent Licensing. (Master’s thesis, Univ. Lund, Sweden, 2004).
45. Crichton, M. Next 418–419 (Harper Collins, 2008).
46. Huys, I. et al. Nat. Biotechnol. 27, 903–909 (2009).


volume 31 NUMBER 8 AUGUST 2013 nature biotechnology

Recent patent applications in drug discovery automation
Patent number: WO 2012026929, EP 2609209, US 20130173503
Description: A computerized method for analysis of compound data used for drug discovery, involving providing a computer having memory, providing to the memory training data from a training data set containing at least one training compound with at least one property value for each training compound and providing to the memory data identifying each training compound as achieving or not achieving the objective (preferably drug discovery objective).
Assignee: Optibrium (Cambridge, UK), Hashimoto T, Segall M
Inventor: Hashimoto T, Segall M
Priority application date: 8/25/2010
Publication date: 3/1/2012, 7/3/2013, 7/4/2013

Patent number: US 20130090268
Description: A microarray culture system comprising a microfluidic device with functioning pneumatically controllable valves and enabled to interface with automated robotic systems, where the device is integrated with housing or a holder that facilitates putting reservoirs of fluids in communication with the device, and connects fluid lines and pneumatic controls to the device; useful in drug discovery.
Assignee: EMD Millipore (Billerica, MA, USA)
Inventor: Hung PJ, Lee PJ
Priority application date: 1/4/2006
Publication date: 4/11/2013

Patent number: EP 2546644, US 20130014566
Description: An operation optimization method for a liquid chromatography (LC) system whereby a sample is injected from the autosampler to an LC apparatus after transmission of a signal from the LC apparatus to the autosampler when no errors are encountered during autosampler operation; useful in delivering a liquid sample applied to the fields of drug discovery and development, environmental testing and diagnostics.
Assignee: Thermo Finnegan (San Jose, CA, USA), Marks AN
Inventor: Marks AN
Priority application date: 7/15/2011
Publication date: 1/16/2013, 1/17/2013

Patent number: JP 2013001460
Description: An automatic storage cabinet for drug discovery sample units, with a cam-groove portion extended along the cam plate driving direction and connected to another cam-groove portion diagonally intersected to the longitudinal direction of the guide rail.
Assignee: Tsubakimoto Chain (Osaka, Japan)
Inventor: Tsutsumi K
Priority application date: 6/10/2010
Publication date: 1/7/2013

Patent number: US 20110256630
Description: A monitoring system for conducting automated sampling, sample preparation and/or sample analysis in a multiwell plate assay format, comprising sample collection modules fluidically connected to a detection module, and a plate handling subsystem.
Assignee: Clinton CM
Inventor: Clinton CM
Priority application date: 4/19/2010
Publication date: 10/20/2011

Patent number: JP 2011027465
Description: A trace-amount liquid fractionation device with several micro side-flow paths that are arranged at one side of the side-flow paths; useful in drug discovery.
Assignee: Kyushu University (Fukuoka, Japan)
Inventor: Yasuda T
Priority application date: 7/22/2009
Publication date: 2/10/2011

Patent number: GB 2472252
Description: A microplate holder for an automated microplate processing system with a platform for inclining the microplate and positioning the microplate at a certain angle range so that one edge is vertically raised relative to another edge of the microplate.
Assignee: Stafford S
Inventor: Stafford S
Priority application date: 7/31/2009
Publication date: 2/2/2011

Patent number: GB 2479628, WO 2011128228
Description: A system for automated determination of motion of a biological object that generates a time series of subtractive images, derives measurements from the time series and analyzes the measurements to quantify motion in the time series.
Assignee: GE Healthcare UK (Little Chalfont, UK)
Inventor: Thomas N
Priority application date: 4/12/2010
Publication date: 10/19/2010, 10/20/2010

Patent number: US 20100211211, EP 2224248, JP 2010210237
Description: A drug discovery screening apparatus with conveyance arms conveying a plate onto a fixed stage and a plate onto an XY stage so as to allow respective plates to cross each other.
Assignee: Yokogawa Electric (Tokyo)
Inventor: Kei T, Nedu T, Suzuki T, Yamamoto K
Priority application date: 2/13/2009
Publication date: 8/19/2010, 9/1/2010, 9/24/2010

Patent number: WO 2009145532, KR 2009122836
Description: A protein separation apparatus for genomic drug discovery with an isoelectric focusing portion that is connected to a flat channel through parallel openings, enabling simultaneously isolating proteins in multichannels, and enhancing protein isolation speed. The proteins can be isolated by pI and molecular weight, and are not denatured by protein isolation. The amphoteric electrolyte used in isoelectric point isolation can be automatically removed.
Assignee: Yonsei University Industry-Academic Cooperation Foundation (Seoul)
Inventor: Kim KH, Moon MH
Priority application date: 5/26/2008
Publication date: 12/3/2009, 12/1/2009

Patent number: JP 2009216741
Description: A drug discovery screening device with an information processing unit that emits a drive command with respect to a well plate based on driven information of the well plate and focus error signal between the objective lens and well plate.
Assignee: Yokogawa Electric (Tokyo)
Inventor: Jing HZ, Mikuriya K, Niimi T, Yokoyama Y, Kei K
Priority application date: 3/7/2008
Publication date: 9/24/2009

Source: Thomson Scientific Search Service. The status of each application is slightly different from country to country. For further details, contact Thomson Reuters (Search
Service), 1925 Ballenger Avenue, Suite 400, Alexandria, VA 22314, USA. Tel: 1 (800) 337-9368 (http://thomsonreuters.com/).


news and views

From embryonic stem cells to mature photoreceptors
David M Gamm & Lynda S Wright

Photoreceptor precursors generated from embryonic stem cells integrate and mature after transplantation into the
diseased mouse retina.
The prospect of restoring vision in people with
certain forms of blindness is an exciting frontier
of regenerative cell therapy. Already, expanded
autologous limbal cells are used to correct vision
in patients with damaged corneas1, and clinical
trials are underway involving retinal pigment
epithelial cells derived from human embryonic
stem cells (ESCs). In the field of neuroretinal-cell replacement, the group of Ali, Sowden and
colleagues achieved considerable functional
improvement in mice by transplantation of
postnatal photoreceptor precursor cells2,3. Now,
working with a more clinically relevant cell type,
the same group reports in this issue that transplanted photoreceptor precursors derived from
mouse ESCs can integrate and mature in mice
with distinct forms of retinal dysfunction4.
The study marks a notable advance in efforts
to define and optimize conditions for mammalian photoreceptor replacement using an ESC
source. Together with earlier findings from this
team2,3,5,6, it also underscores the importance
of continuously refining our understanding of
donor cell populations and stringently evaluating their engraftment behavior in multiple
disease models.
David M. Gamm is in the Department of
Ophthalmology and Visual Sciences and the
McPherson Eye Research Institute, University of
Wisconsin, Madison, Wisconsin, USA. Lynda S.
Wright is at the Waisman Center, University of
Wisconsin, Madison, Wisconsin, USA.
e-mail: [email protected]

The array of variables to consider when
planning, executing and evaluating a cell
transplant study is daunting, particularly
when one is working with multiple donor cell
candidates and complex host tissue environments. Determining the ideal cell population
to transplant for a particular therapeutic application is especially challenging when the cell
source is immature primary tissue or pluripotent stem cells, both of which usually contain
heterogeneous cell types at different stages of
development. In previous work, Ali, Sowden
and colleagues2,3 showed that the developmental
stage of donor mouse retinal cells is an important
parameter in the efficiency of integration, as are
the quantity and the purity of cells. In a subsequent paper, they investigated a photoreceptor
cell source more akin to what might be used
clinically in humans—rod precursors differentiated from ESCs5. However, these precursors,
made from mouse ESCs using an adherent (‘two-dimensional’) culture protocol, did not integrate
into normal or degenerated mouse retina5.
In the present study, the authors test rod
precursors produced from mouse ESCs by
a different method—a recently published
suspension-culture system from the Sasai
group7, which generates retinal cell types in
three-dimensional structures that are remarkably similar to the developing mouse optic cup.
Unlike cells produced by two-dimensional
culture, GFP-labeled early rod precursors isolated from these structures by flow cytometry
integrate within the photoreceptor layer of host
mice4. Furthermore, following transplantation,
the donor cells express photoreceptor-specific
proteins, display typical mGluR8-dependent
calcium responses and develop structural characteristics of highly differentiated rods.
Although it is tempting to conclude from
these results that only rod precursors generated
from ESCs in a three-dimensional, tissue-like
environment acquire properties conducive to
integration, other factors, such as variations in
formulation of culture media4,5, may have had
a role. Regardless, comparative analyses of rod
precursor populations that can and cannot integrate in vivo could reveal cellular attributes critical to subsequent cell migration into host tissue
and synapse formation. Indeed, the authors
find that early rod precursors produced in the
optic cup–like structures express more mature
rod markers in culture than those generated
in adherent cultures4. It is possible that the
histologically correct cell polarity and intercellular
interactions achieved in three-dimensional
tissue-like structures help prime early rod precursors to differentiate more fully in vivo.
In addition to highlighting the importance
of the differentiation method used to generate
donor rods from ESCs, the study also reinforces
earlier observations regarding the optimal stage
of photoreceptor differentiation for retinal
engraftment. Previously, the authors examined
donor retina from mice at different ages from
embryonic day 11.5 to adulthood and found that
early postmitotic rod precursors from postnatal
days 4 to 6 integrate most robustly2. Similarly,
rod precursors isolated from day 26 mouse ESC
optic cups, which correspond developmentally
to mouse retina at postnatal days 4 to 6, integrate
into host retina with higher efficiency than cells
from older (day 34) mouse ESC cultures4. These
results add to the growing body of evidence that
a specific time window of photoreceptor differentiation will be needed to achieve maximal cell
replacement in human patients.
Just as not all donor photoreceptors are
created equal, not all host retinal tissues are
equally receptive to cell transplantation. Failure
to examine a variety of diseases and disease
stages as potential environments for cell-based
therapeutics may result in falsely optimistic
or pessimistic conclusions. The authors’ most
striking results in this4 and previous2,3,5,6
publications involve mouse models of visual
impairment in which the photoreceptor cell
layer is relatively intact, albeit nonfunctional.
Improvements in vision have been documented
in more severe models of photoreceptor degeneration8, although without the seamless donor
rod integration seen in more hospitable hosts.
Ali, Sowden and colleagues have also investigated other conditions affecting cell integration in degenerating mouse retina, such as the
continuity of the host outer limiting membrane
and the presence of activated glia9. With these
studies, the authors have begun the important
work of establishing a plausible rationale for the
selection of patients for future photoreceptor-replacement clinical trials.

[Figure 1 schematic. Host selection: Gnat1−/− mouse, which lacks rod α-Transducin. Donor cell preparation: whole embryoid bodies (wEBs) containing stratified retina-like tissue are labeled with AAV2/9 Rhodopsin-GFP and dissociated; the cell mixture, containing GFP-labeled WT rod precursors and residual virions, is either left unsorted or sorted by FACS. Outcome evaluation: subretinal transplantation of unsorted cells gives falsely labeled, α-Transducin-negative host photoreceptors; subretinal transplantation of GFP-sorted cells gives true donor-derived photoreceptors with α-Transducin-positive outer segments (red).]

Figure 1 Considerations in photoreceptor transplantation. Factors such as the choice of retinal disease model and stage of disease, the method and timing
of donor rod production, and the controls used for outcome evaluation are important considerations when designing a retinal cell–replacement strategy.
A schematic of an experiment used by Ali, Sowden and colleagues4 to distinguish true donor-rod integration from false host-rod labeling is shown.
AAV, adeno-associated virus; FACS, fluorescence-activated cell sorting; wEB, whole embryoid body; WT, wildtype.

The effort invested in characterizing populations of donor cells and in testing them in
different disease models is worthwhile only if
the methods used to monitor the cells’ fate and
function are reliable and free of confounding
artifacts. Thus, for all of the important advances
made by this research team, it is their description of the possible pitfalls in the interpretation of retinal transplantation results that may
prove most valuable. In a previous publication,
they sounded a note of caution concerning the
sensitivity of tests of retinal function, which
led them to predict the minimal numbers of
integrated donor photoreceptors needed for
functional recovery3. In the present study, they
show clearly that virus particles used to fluorescently label rods in mixed donor-cell suspensions can be unintentionally carried into the
subretinal space, where they spuriously label
host photoreceptors (Fig. 1).
To investigate host uptake of contaminating
virus particles, the authors transduce differentiated cells from a CBA.YFP mouse ESC line with
an adeno-associated viral vector carrying an
RFP reporter under the control of a Rhodopsin
promoter (AAV2/9.Rhop.RFP), which specifically labels rods. Subretinal injection of
the unsorted cell mixture results in numerous
virus-labeled, RFP+YFP– cells, indicative of
host photoreceptor transduction.
The authors also distinguish virus-labeled
host photoreceptors from integrated donor
rods by using the Gnat1−/− mouse model of
stationary night blindness. These mice lack rod
α-Transducin; therefore, only integrated donor
rods would be expected to express this protein
in their outer segments. After transplantation of
unsorted differentiated populations transduced
with AAV2/9.Rhop.GFP, ~99% of the GFP+ cells
in the retinas of Gnat1−/− hosts do not express
α-Transducin and thus are endogenous photoreceptors, not integrated donor rods. In contrast,
when the cells are sorted for GFP expression
before transplantation, which presumably
reduces contaminating virus, ~80% of the GFP+
cells in the host photoreceptor layer express
properly localized α-Transducin, reflecting
their donor origin. The integrated donor cells
exhibit characteristic features of mature rods,
such as outer segments oriented toward the
retinal pigment epithelium and basal processes
with synaptic bouton–like structures near afferent terminals of rod bipolar cells. These findings
provide the most definitive evidence to date that
ESCs can serve as a source of early rod precursors suitable for transplantation in at least some
mouse models of vision impairment.
Taken together, the recent studies from Ali,
Sowden and colleagues2–6,9 represent a masterful demonstration of the value of a systematic
approach, careful attention to detail and a willingness to reassess results with the advent of new
knowledge, techniques and tools. Ultimately,
the payoff for taking such a deliberate and contemplative approach to photoreceptor transplantation at early stages of investigation will be
greater confidence in the design of future clinical trials. Translation to human patients is on
the horizon, as the Sasai group and researchers
in our laboratory have already described three-dimensional methods to derive photoreceptor
precursors from human ESCs10,11 and induced
pluripotent stem cells11,12. Furthermore, human
ESC-derived retinal cells expressing the rod
marker NRL, but lacking inner or outer segments, were shown to integrate in a mouse
model of early childhood blindness13.
The critical next steps toward clinical trials
include improvements in survival and integration efficiency of donor cells, demonstration of efficacy, additional assurance of safety,
and adoption of clinical Good Manufacturing
Practices for human photoreceptor production. In addition, the potential for confounding
variables in transplant studies should be kept
in mind. For example, the present report does
not specifically rule out the occurrence of fusion
between donor and host cells. However, the
expression and precise localization of outer segment proteins (α-Transducin, Peripherin and
Rhodopsin) in the corresponding null mouse
mutants argue against simple fusion as
the cause of the observed donor cell integration.
Once the safety and efficacy of photoreceptor transplantation are established, attention will
be directed toward combined replacement of
photoreceptors and the adjacent retinal pigment epithelium, which provides supportive
functions to photoreceptors. This is an important future goal, as both of these cell types
are eventually lost in age-related macular
degeneration. Looking beyond the eye, lessons learned on the arduous road to human
retinal-cell replacement, which is paved by
studies such as the one presented here, may
also accelerate efforts to repair other tissues
of the central nervous system. Such advances
will be necessary to achieve the National Eye
Institute’s recently announced ‘audacious goal’
of developing effective treatments for vision
disabilities resulting from neuron loss within
the eye and visual system.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
1. Rama, P. et al. N. Engl. J. Med. 363, 147–155 (2010).
2. MacLaren, R.E. et al. Nature 444, 203–207 (2006).
3. Pearson, R.A. et al. Nature 485, 99–103 (2012).
4. Gonzalez-Cordero, A. et al. Nat. Biotechnol. 31, 741–747 (2013).
5. West, E.L. et al. Stem Cells 30, 1424–1435 (2012).
6. Barber, A.C. et al. Proc. Natl. Acad. Sci. USA 110, 354–359 (2013).
7. Eiraku, M. et al. Nature 472, 51–56 (2011).
8. Singh, M.S. et al. Proc. Natl. Acad. Sci. USA 110, 1101–1106 (2013).
9. Pearson, R.A. et al. Cell Transplant. 19, 487–503 (2010).
10. Nakano, T. et al. Cell Stem Cell 10, 771–785 (2012).
11. Meyer, J.S. et al. Stem Cells 29, 1206–1218 (2011).
12. Phillips, M.J. et al. Invest. Ophthalmol. Vis. Sci. 53, 2007–2019 (2012).
13. Lamba, D.A. et al. Cell Stem Cell 4, 73–79 (2009).


Network cleanup
Babak Alipanahi & Brendan J Frey

The erroneous links in networks inferred from data can be efficiently eliminated under certain conditions.
Networks offer an alluring simplicity for
representing complex systems of interacting
parts1. But when networks are constructed
from biological data through statistical inference, it is often unclear how faithfully they
represent the real systems. In many cases,
true links between nodes are obscured by a
sea of noise in the form of erroneous links.
In this issue, two studies by Feizi et al.2 and
Barzel et al.3 describe efficient, easily implemented methods for identifying and removing erroneous links, thereby producing
more accurate networks. Both papers demonstrate the application of their techniques
to large-scale practical problems, such as the
DREAM5 gene regulatory network inference
challenge4. In addition, Feizi et al.2 explore
other applications by analyzing networks of
interacting residues in protein structures and
social networks of scientists.
The problem of erroneous links in inferred
networks was described in 1921 by the geneticist—and founder of the field of network
inference—Sewall Wright, who said, “The
degree of correlation between two variables
can be calculated by well-known methods, but
when it is found it gives merely the resultant
of all connecting paths of influence”5. As an
example, suppose one gene directly controls
a second gene, which in turn directly controls
a third gene. Correlation analysis will erroneously indicate that the first gene directly
influences the third gene. Other methods for
linking variables, such as mutual information
and distance correlation, are limited by the
same problem. The goal of network inference is to identify the direct links and their
strengths while suppressing the indirect,
or transitive, associations. This problem is
difficult because experimental techniques
often cannot distinguish between direct and
indirect effects.
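The three-gene chain described above can be sketched numerically. The following is a minimal illustration under stated assumptions, not an analysis from either study: the linear model, the coefficients 0.8 and 0.6, the sample size and the function names `simulate_chain` and `partial_correlation` are all hypothetical choices made for the example. It simulates gene 1 driving gene 2 and gene 2 driving gene 3, then compares the plain correlation between genes 1 and 3 with their partial correlation given gene 2.

```python
import numpy as np


def simulate_chain(n_samples=20000, seed=0):
    """Simulate a linear chain gene1 -> gene2 -> gene3 (hypothetical model)."""
    rng = np.random.default_rng(seed)
    g1 = rng.normal(size=n_samples)
    # Each downstream gene depends only on its immediate upstream neighbor.
    g2 = 0.8 * g1 + 0.6 * rng.normal(size=n_samples)
    g3 = 0.8 * g2 + 0.6 * rng.normal(size=n_samples)
    return np.column_stack([g1, g2, g3])


def partial_correlation(corr):
    """Partial correlations from a correlation matrix via its inverse."""
    prec = np.linalg.inv(corr)
    d = np.sqrt(np.diag(prec))
    pc = -prec / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc


data = simulate_chain()
corr = np.corrcoef(data, rowvar=False)
pcorr = partial_correlation(corr)

# Genes 1 and 3 are clearly correlated although no direct link exists,
# whereas their partial correlation (given gene 2) is close to zero.
print("correlation(gene1, gene3):        ", round(corr[0, 2], 3))
print("partial correlation(gene1, gene3):", round(pcorr[0, 2], 3))
```

In this toy setting the indirect association disappears once gene 2 is accounted for; real expression data are noisier and rarely this linear.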
Babak Alipanahi and Brendan J. Frey are
with the Departments of Electrical and
Computer Engineering and the Donnelly Centre
for Cellular and Biomolecular Research at
the University of Toronto, Toronto, Ontario,
Canada.
e-mail: [email protected] or
[email protected]

[Figure 1 schematic. (a) Six-node networks: the input network, G; the output networks, S, obtained with “network deconvolution” (Feizi et al.) and “indirect-link silencing” (Barzel et al.); and the true network. (b) A source node j, a neighbor k and a target node i, annotated with the total effect through the network of source on neighbor ($G_{kj}$), the direct effect of neighbor on target ($S_{ik}$) and the direct effect of source on target ($S_{ij}$).]

Figure 1 Cleaning up networks. (a) An example of how the erroneous links in an input network (G)
derived using pairwise correlation can be suppressed using the methods described by Feizi et al.2
(called network deconvolution) and Barzel et al.3 (called silencing). The network of true relationships (S)
is shown on the right. Edge thickness corresponds to the connection strength. (b) How the network of
total, measured effects G is related to the unknown network of direct effects, S. The total effect of node j
on node i, $G_{ij}$, can be obtained by adding the direct effect of j on i ($S_{ij}$) to the sum of the direct effects of
each neighbor k on i ($S_{ik}$) multiplied by the total effect of j on k in the appropriate subnetwork ($G_{kj}$).

Feizi et al.2 and Barzel et al.3 tackle the problem of network inference from conceptually
quite different starting points. Feizi et al.2
view the measured correlations as a consequence of flows along the edges in the true
network. In contrast, Barzel et al.3 treat the
measured correlations as small perturbations
that result from adding up the small perturbations induced along edges in the true
network. In both cases, the authors turn the
seemingly intractable problem of network
inference into easily implemented algorithms
that invert these processes to obtain the true
network from the measured correlations.
To illustrate the methods, we have implemented them, applied them to a ‘toy’ gene
regulatory network problem and compared
the results with the known true network of
interactions (Fig. 1a).
Both methods account for how the total,
measured effect of a source node on a target
node is mediated by the direct neighbors
of the target (Fig. 1b). In addition, if the
source is directly connected to the target
node, that direct effect is accounted for too.
This accounting is not quite correct because
of loops, but it is reasonably accurate if the
strengths of indirect effects decay substantially as they propagate around the loops.
Box 1 Mathematical basis of suppressing indirect effects in networks
A network can be represented as a matrix with each node corresponding to one row and column and edges represented as entries in the
matrix. This allows mathematical analysis of the network using tools from linear algebra. Using notation from Barzel et al.3, S is the true
matrix of direct associations, where each entry $S_{ij}$ is the rate of change of variable i with respect to variable j, assuming all other variables
are held constant. The correlation matrix G is constructed from measurements of the total effect of each variable on every other variable,
and it can be related to S as follows (also see Fig. 1b). The total effect of j on i, $G_{ij}$, can be obtained by summing up the effects mediated
through the direct neighbors of i in the true network, S. Each direct neighbor, k, is connected to j through a subnetwork whose total effect
is approximately equal to $G_{kj}$, giving a relationship used in both studies
$$G_{ij} = S_{ij} + \sum_{k:\,k \neq j} G_{kj} S_{ik}$$
Network inference entails finding a network S that satisfies the above equation for all i ≠ j. Both Barzel et al.3 and Feizi et al.2 describe
unique, but approximate, closed-form solutions for S in terms of G.
It turns out that both approaches are related to the method of partial correlation, which is the correlation between the residual errors for
two variables when they are linearly predicted from all other variables5,6. If the input G is a correlation matrix and $P = G^{-1}$, then for i ≠ j,
Feizi et al.’s solution has the form $S_{ij} = -P_{ij}$, Barzel et al.’s solution has the form $S_{ij} = -\left(1 - \sum_{k \neq i} G_{ik}^{2}\right) P_{ij}$ and the partial correlation is
$S_{ij} = -P_{ij} / \sqrt{P_{ii} P_{jj}}$. So, the different methods scale the inverse correlation matrix differently. The proposed methods can be applied to other
types of input matrix, such as one derived using mutual information.

As a concrete example, consider the network shown in Figure 1a, in which circles
represent genes and links between genes
represent regulatory effects. These effects
may be mediated by different molecular
mechanisms, such as direct binding of a transcription factor to the promoter of a target
gene or phosphorylation of a target protein
by a kinase. In the example, the effect of gene
1 on gene 4 can be broken down into the sum
of three terms: (i) the direct effect of gene
2 on gene 4 multiplied by the total effect of
gene 1 on gene 2, (ii) the direct effect of gene
3 on gene 4 multiplied by the total effect of
gene 1 on gene 3 and (iii) the direct effect of
gene 6 on gene 4 multiplied by the total effect
of gene 1 on gene 6. If there were a direct link
between genes 1 and 4, it would be included
in the sum.
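Written out with the relationship from Box 1, the decomposition just described takes the following form. This is only a restatement of the text in the box's notation, with target i = 4 and source j = 1; it assumes, as the example does, that genes 2, 3 and 6 are the direct neighbors of gene 4.

```latex
% Total effect of gene 1 on gene 4, mediated by the direct neighbors of gene 4:
G_{41} = S_{42}\,G_{21} + S_{43}\,G_{31} + S_{46}\,G_{61}

% If genes 1 and 4 were directly linked, the direct term S_{41} would be added:
G_{41} = S_{41} + S_{42}\,G_{21} + S_{43}\,G_{31} + S_{46}\,G_{61}
```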
We found that the linear algebra used to
derive the methods of Feizi et al.2 and Barzel
et al.3 can be reformulated so as to obtain
insights into the similarities and differences between the methods and into their
relationship to previous work on partial
correlation5,6. The methods differ in how the
inferred, direct effect associated with each
link is scaled (Box 1). The partial correlation
scales each link according to its source and
target. The method of Barzel et al.3 scales the
strength of each link according to its target,
whereas the method of Feizi et al.2 does not
scale the links. The strength of each link is
used as a proxy for the significance of the
association, so links with small strengths are
discarded. The scaling factors for partial correlation are determined from the inverse of
the correlation matrix, and the scaling factors used by Barzel et al.3 can be computed
directly from the correlation matrix. As a
consequence of these differences, the methods can output quite different solutions.
Further work is required to determine the
relative advantages of each approach.
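The three scalings compared above can be sketched directly from the closed-form expressions summarized in Box 1. The code below is an illustration of those Box 1 formulas only, under the assumption that the input is an invertible correlation matrix; it is not the published implementation of either method, and the function name `direct_effect_estimates` and the toy matrix `G_chain` are hypothetical.

```python
import numpy as np


def direct_effect_estimates(G):
    """Off-diagonal direct-effect estimates from a correlation matrix G,
    using the three forms stated in Box 1 (diagonals are set to zero)."""
    G = np.asarray(G, dtype=float)
    P = np.linalg.inv(G)                      # inverse correlation matrix
    off = ~np.eye(G.shape[0], dtype=bool)     # mask selecting i != j

    # Form attributed to Feizi et al. in Box 1: S_ij = -P_ij (no scaling).
    deconv = np.where(off, -P, 0.0)

    # Form attributed to Barzel et al. in Box 1:
    # S_ij = -(1 - sum_{k != i} G_ik^2) P_ij, a scale set by the target row i.
    row_scale = 1.0 - (np.sum(G**2, axis=1) - np.diag(G) ** 2)
    silenced = np.where(off, -row_scale[:, None] * P, 0.0)

    # Partial correlation: S_ij = -P_ij / sqrt(P_ii P_jj),
    # a scale set by both source and target.
    d = np.sqrt(np.diag(P))
    partial = np.where(off, -P / np.outer(d, d), 0.0)

    return deconv, silenced, partial


# Toy correlation matrix for a chain 1 - 2 - 3 with neighbor correlation 0.4,
# so the indirect 1-3 correlation is 0.4 * 0.4 = 0.16.
G_chain = np.array([[1.00, 0.40, 0.16],
                    [0.40, 1.00, 0.40],
                    [0.16, 0.40, 1.00]])

deconv, silenced, partial = direct_effect_estimates(G_chain)
for name, S in [("deconvolution", deconv), ("silencing", silenced), ("partial corr.", partial)]:
    print(name, np.round(S, 3).tolist())
```

For this toy matrix all three estimates assign essentially zero strength to the indirect 1-3 link and differ only in how the retained links are scaled; the row-wise scaling in the second form makes its output asymmetric.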
How well can we expect these methods,
and others based on partial correlation, to
work, and what are their limitations? If the
physical system being analyzed is linear and
all variables are observed, a nonzero partial
correlation between two variables indicates
that they are dependent when all other variables are held constant8. Whether or not
this implies a physically relevant link should
be answered in the context of the problem of causal interpretations of networks,
where an intervention-based approach
may be needed to identify causal links with
some guarantees9.
Both proposed methods, and related ones,
will be highly sensitive to missing variables
because of Simpson’s paradox, which states
that a statistical relationship between two
variables may be reversed or eliminated
when additional variables are included10.
For example, one may observe that the
expression of gene 1 is negatively correlated with that of gene 2 but that after the
expression value is adjusted to account for
the cell cycle, gene 1 is positively correlated
with gene 2. Yet, when additional adjustments are made for chromatin structure,
the genes may no longer appear correlated.
Another limitation is that the methods do
not output confidence levels for links or,
more generally, distributions over possible
networks, which could be useful for downstream analyses.
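The sign reversal described by Simpson's paradox is easy to reproduce with toy numbers. The sketch below assumes hypothetical expression values for two genes measured in two cell-cycle phases; the data and names are invented for illustration. It compares the correlation in the pooled data with the correlation within each phase.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical expression values (arbitrary units) in two cell-cycle phases.
phase_a = {"gene1": [1.0, 2.0, 3.0], "gene2": [10.0, 11.0, 12.0]}
phase_b = {"gene1": [6.0, 7.0, 8.0], "gene2": [2.0, 3.0, 4.0]}

# Ignoring the phase (the missing variable) pools all samples together.
pooled_r = pearson(phase_a["gene1"] + phase_b["gene1"],
                   phase_a["gene2"] + phase_b["gene2"])

# Adjusting for the phase means looking within each phase separately.
within_a = pearson(phase_a["gene1"], phase_a["gene2"])
within_b = pearson(phase_b["gene1"], phase_b["gene2"])

print("pooled:", round(pooled_r, 3))
print("within phase A:", round(within_a, 3))
print("within phase B:", round(within_b, 3))
```

The pooled correlation is negative although the two genes rise together within each phase, which is why an unobserved variable can mislead any method built on correlations.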
The methods discussed here, and related
ones, should be viewed as exploratory tools
that can be applied to guide research rather
than to draw scientific conclusions. As shown

nature biotechnology volume 31 number 8 AUGUST 2013

by the example in Figure 1a, the methods
can be used to direct attention to links that
are more likely to have direct, possibly
causal effects, so that a more careful analysis, possibly including additional data, can
be conducted. For example, by using these
methods to infer networks of RNA-binding
proteins7, it is possible to identify potential regulators and incorporate them into
an accurate regulatory model of splicing11.
There are other frameworks that can be used
to infer networks, including Bayesian reasoning, which would generate a distribution of
possible networks, and information theory,
which provides guarantees based on asymptotic analysis12. Ultimately, the processes
of link hypothesis generation, causal testing and network refinement can be formalized using structural equation modeling5,9
and structural causal modeling9, two techniques from the statistics and artificial
intelligence communities.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
1. Lopes, C.T. et al. Bioinformatics 26, 2347–2348 (2010).
2. Feizi, S. et al. Nat. Biotechnol. 31, 726–733 (2013).
3. Barzel, B. et al. Nat. Biotechnol. 31, 720–725 (2013).
4. Marbach, D. et al. Nat. Methods 9, 796–804 (2012).
5. Wright, S. J. Agric. Res. 20, 557–585 (1921).
6. Raveh, A. Am. Stat. 39, 39–42 (1985).
7. Ray, D. et al. Nature 499, 172–177 (2013).
8. Pearl, J. Sociol. Methods Res. 27, 226–284 (1998).
9. Pearl, J. Stat. Surv. 3, 96–146 (2009).
10. Simpson, E.H. J. Roy. Stat. Soc. B 13, 238–241 (1951).
11. Barash, Y. et al. Nature 465, 53–59 (2010).
12. Santhanam, N.P. et al. IEEE Trans. Inf. Theory 58, 4117–4134 (2012).


news and views

Taking the fish out of fish oil
James P Wynn

npg

© 2013 Nature America, Inc. All rights reserved.

Extensive metabolic engineering of the yeast Yarrowia lipolytica results in a sustainable and commercially viable
alternative to fish oil for the supply of eicosapentaenoic acid.
Alternatives to fish oil for the supply of the long-chain polyunsaturated fatty acid (PUFA) eicosapentaenoic acid (EPA) in the human diet have
remained stubbornly elusive. Natural organisms that synthesize this fatty acid in isolation
(without docosahexaenoic acid; DHA) have
proven unsuitable for commercial production.
In this issue, Xue et al.1 describe how rational
use of metabolic engineering, together with a
touch of serendipity, results in the production
of the richest available source of this nutritionally essential fatty acid to date. Their work
involves the heterologous expression of multiple copies of an array of biosynthetic genes not
usually found in the yeast Yarrowia lipolytica
(their production host of choice) and the identification of a gene that has a crucial role in the
regulation of PUFAs. The resulting product is a
single cell oil that contains EPA at levels higher
than those found in any naturally occurring
oil. This is now available to US consumers in
the form of both a human nutritional supplement and an aquaculture feed as a sustainable
replacement for fish-based sources of EPA.
Fish oils have long been recognized as beneficial to human health and fitness. For much
of the 20th century this benefit was associated
with the vitamin D content of these oils, but
since the beginning of this millennium, attention has switched to the fatty-acid composition
of these oils and in particular the use of marine
fish oils as rich sources of the omega-3 (n–3)
long-chain PUFAs (n–3 LCPUFAs), DHA and
EPA. Whereas DHA is noted predominantly for
its role in brain and eye health, EPA appears to
be mostly responsible for the anti-inflammatory
activity of fish oils, contributing to cardiovascular and joint health2,3.
As consumer recognition of the health benefits of these two n–3 LCPUFAs has increased,
so the demand for nutritional supplements
and foods rich in these fatty acids has grown.
The market for n–3 fatty acid oil ingredients
is projected to increase at a compound annual
growth rate of >13% through 2016 (ref. 4).
However, marine fish oils, which currently
dominate the market, are not considered a
sustainable resource as decades of overfishing
have resulted in drastically diminished fish
stocks and the imposition of quotas for many
species. Use of fish oils has also been hampered
by the possibility that they contain fat-soluble
environmental contaminants (for example,
polychlorinated biphenyls), which can persist
even in highly purified oils5. Problems with
environmental contaminants, and in particular sustainability concerns, have prompted the
search for alternative sources of both DHA and
EPA to replace fish oils.

James P. Wynn is at MBI, Lansing, Michigan, USA.
e-mail: [email protected]

It is ironic that marine fish do not actually
synthesize the n–3 LCPUFAs for which their
oils are now so highly valued. These fatty acids
are obtained from marine algae and bacteria
at the base of the marine food chain. Indeed,
these fatty acids are as nutritionally important for marine fish as they are for humans,
with the result that a major end-point of the
n–3 LCPUFAs procured from wild-caught
fish is aquaculture feed (in the form of either
fishmeal or oil) to support the production of
farmed fish.
Biological producers of oils rich in DHA
in the form of marine bacteria and algae have
been known for many years6. These have
been commercialized by the multinational
DSM (Heerlen, the Netherlands) in the form
of oils purified from two marine microalgae:
Crypthecodinium cohnii and Schizochytrium sp.
Other sources of oils rich in EPA produced by
organisms amenable to large-scale production have proven elusive, however. Previous
attempts to generate sustainable and scalable
EPA-rich oils have included extensive metabolic engineering of oil-seed crop plants7 and
even the conversion of a photosynthetic alga
known to be capable of producing oil containing substantial EPA levels into a heterotrophic
organism to facilitate mass production of oil
in traditional stirred-tank fermentors8. None
of these attempts have resulted in commercial production of an EPA-rich oil to replace
fish oils.
Recently, a fermented oil from another
marine microalgal source containing a mix
of DHA and EPA has been launched (also by
DSM); however, an oil rich only in EPA was
unavailable until the success of the work presented here by Xue et al.1.
The production of EPA-rich oil in Y. lipolytica
can be seen as a metabolic-engineering tour
de force, requiring as it did transformation of
the yeast with 21 heterologous genes encoding five different activities. In so doing, the
authors engineer a completely rerouted PUFA
biosynthetic pathway that proceeds via a ∆9
elongation followed by a ∆8 desaturation, a
∆5 desaturation and finally a ∆17 desaturation (all activities not found in the wild-type
organism) into the oleaginous host. The use
of this alternate pathway ensures that the
rate-limiting step of the engineered pathway
is the first heterologous reaction, resulting
in minimal accumulation of unnatural and
unwanted intermediates.

Figure 1 An overview of the metabolic
engineering approach to the efficient production
of EPA in the oleaginous yeast Yarrowia lipolytica.
The activities to the left of the scheme
(in black) are native; the activities to the
right of the scheme (in red) are activities that
were added via metabolic engineering. Native
biosynthetic reactions are denoted with black
arrows, reactions augmented by expression of
heterologous genes are denoted with orange
arrows and reactions not naturally observed
in Yarrowia lipolytica (added by metabolic
engineering) are denoted with red arrows.
Pathway shown: acetyl-CoA + 7 × malonyl-CoA → (fatty acid synthase) → 16:0 → (C16 elongase; native, plus 1 introduced copy) → 18:0 → (∆9 desaturase; native) → 18:1 → (∆12 desaturase; native, plus 5 introduced copies) → 18:2 n-6 → (∆9 elongase; 7 introduced copies) → 20:2 n-6 → (∆8 desaturase; 8 introduced copies) → 20:3 n-6 → (∆5 desaturase; 5 introduced copies) → 20:4 n-6 → (∆17 desaturase; 3 introduced copies) → 20:5 n-3.

Although overexpression of the heterologous fatty acid biosynthetic genes is in and of
itself a considerable achievement and results
in biosynthesis of a substantial amount of EPA,
serendipity also had a role in the success of
this project. During the integration of a multitude
of biosynthetic genes into Y. lipolytica,
it appears that the peroxisomal biogenesis
factor 10 (PEX10) gene was inadvertently interrupted. When Xue et al.1 examined more closely
the mechanism by which EPA synthesis was
elevated, they deduced that PEX10 has a critical
role in controlling accumulation of PUFAs—a
finding that may have implications for
metabolic engineering of other sustainable
PUFAs. The function of PEX10 is in the biogenesis of peroxisomes, and its deletion interrupts the cells’ ability to degrade intracellular
fatty acids by β-oxidation. It will be interesting to see how further characterization of the
cellular function of this gene may facilitate the
engineering of other novel PUFA-synthesizing
organisms in the future.
Thus, a mixture of expertise in metabolic
engineering and serendipitous identification
of the regulatory role of PEX10 has resulted in
an organism that is the richest source of EPA oil
available on the planet, containing in excess of
50% of the total fatty acids as EPA. This organism has already been translated by DuPont
(Wilmington, DE) into an industrial process
whose products are replacing fish oil in both
human nutritional supplementation (as New
Harvest gel capsules) and aquaculture feed (as
Verlasso). The work described in this paper1
exemplifies the way in which biotech can generate novel products and bring to market alternatives that can replace those from non-sustainable
traditional sources. This use of metabolic engineering to produce a key chemical currently
sourced from a non-sustainable resource is an
example of the power of biotechnology in the
push towards a sustainable society.
COMPETING FINANCIAL INTERESTS
The author declares no competing financial interests.
1. Xue, Z. et al. Nat. Biotechnol. 31, 734–740 (2013).
2. Uauy, R. & Dangour, A.D. Nutrition Rev. 64, S24–S33 (2006).
3. Nobre, M.E.P. et al. Nutrition Res. 33, 422–433 (2013).
4. Shanahan, C. Natural Products Insider (21 May 2013) <http://www.naturalproductsinsider.com/articles/2013/05/dissecting-omega-3-dietarysupplement-products.aspx>.
5. Ashley, J.T.F. et al. Food Additives and Contaminants; Part A 30, 506–514 (2013).
6. Wynn, J.P. & Ratledge, C. in Bailey’s Industrial Oil and Fat Products, 3, 6th edn. (ed., F. Shahidi) 121–153 (Wiley Press, 2005).
7. Haslam, R.P. et al. Plant Biotechnol. J. 11, 157–168 (2013).
8. Zaslavskaia, L.A. et al. Science 292, 2073–2075 (2001).

A caffeine fix for human nuclear
transfer?
Anthony C F Perry
Efficient human nuclear-transfer embryo cloning is here at last, prompting
renewed efforts to explore hidden variables in the cloning process.
After years of failed attempts and one infamous claim of success, the derivation of
human nuclear-transfer embryonic stem
(ntES) cells has now been achieved. Writing in
Cell, Mitalipov and co-workers1 describe fusing human nucleus donor and recipient egg
cells and deriving ntES cells from resultant
blastocyst-stage embryos. This work overcomes
the sea of challenges attendant upon cloning
human embryos. Yet it does little to explain
why previous attempts2 failed or why this one
worked. The nuclear-transfer protocol is based
on a standard one3, and although an additional
and central step is the inclusion of caffeine,
those interested in nuclear transfer remain in
the dark as to the mechanistic contribution
caffeine made. As much as anything, the study
hints at an expanse of variables, hidden and
apparent, that continue to confound human
nuclear transfer, and leaves open questions on
the relative merits and demerits of human ntES
cells and induced pluripotent stem (iPS) cells.

Anthony C.F. Perry is at the Laboratory
of Mammalian Molecular Embryology,
Department of Biology and Biochemistry,
University of Bath, Bath, UK.
e-mail: [email protected]

To begin their study, Mitalipov and co-workers1 used a rhesus macaque model in
which adult-derived fibroblasts were fused with
fertilizable metaphase II eggs whose spindle-chromosome assemblies had been removed
(a process called enucleation). The best blastocyst
development (17.5% of reconstructions)
occurred when fusion was promoted by
inactivated Sendai virus and reconstructed
cells were subjected to an electric pulse in
the presence of the serine/threonine kinase
inhibitor 6-dimethylaminopurine. Subsequent
incubation included the histone deacetylase
inhibitor trichostatin A.
But when the group applied this protocol to
introduce human fetal fibroblast nuclei into
recipient human oocytes from healthy, young
volunteers (aged 23–31), development was
poor. It was suspected that enucleation had
impaired reprogramming by destabilizing
metaphase II arrest. The group had previously
ameliorated this problem in monkey nuclear
transfer with the phosphatase inhibitor caffeine4. This solution worked for human nuclear
transfer as well. When 1.25 mM caffeine was
included, spindle formation appeared efficient
and development improved, with 23.5% blastocyst formation (out of 42).
Oocytes from one egg donor (‘donor A’)
yielded blastocysts at an impressive rate
(71.4%), of which 80% (4 of 5) produced ntES
cells; these were karyotypically normal (46XX),
with ES-cell characteristics including OCT4,
NANOG and SOX2 expression. In another
series, 62.5% (of 8) nuclear-transfer embryos
developed to blastocysts, of which 80% yielded
ntES cell lines. As a clinical proof of principle,
skin fibroblasts from a patient (age unstated)
with Leigh syndrome—a necrotizing encephalomyelopathy—yielded two ntES cell lines.
The paper reports ten human ntES cell lines
in total.
This protocol echoes the first report of mammalian cloning, which compared fusion by
Sendai virus with electrical fusion3, although the
electrical pulse here1 probably triggered oocyte
activation. Oocyte activation classically refers
to the early embryonic phase that links the initiating intracellular release of calcium to the first
mitotic prometaphase. This embryonic phase
includes meiotic resumption (metaphase II
exit) and chromatin remodeling (pronucleus formation), and because it establishes
totipotency—the ability of a cell to give rise to
an entire individual—it is pivotal to nuclear
transfer cloning. However, oocyte activation is
also a dynamic process, and the ability of recipient oocytes to reprogram incoming somatic-cell nuclei rapidly declines after its onset, in the
mouse at least5. Although this suggests that
changes in the cytoplasmic milieu close the
reprogramming window, the window reopens
to somatic-cell nuclei during the first mitotic
M-phase5, highlighting cell cycle regulation
as an important determinant of somatic-cell
nuclear reprogramming and cloning success.
Consistent with a key role for the cell cycle in
nuclear transfer, meiotic regulation is front and
center in the protocol reported by Mitalipov
and co-workers1: nuclear transfer was inefficient until metaphase II had been stabilized
by caffeine. This finding highlights the sweeping paucity of information on human meiotic
exit. For example, the dynamics of calcium
and calmodulin kinase II (CAMKII), which
transduces the calcium signal in Xenopus and
mouse fertilization6, are poorly characterized
in humans. Meiotic regulators conserved in
Xenopus and mice are often spindle associated,
including the cyclin-dependent kinase Cdc2
(cyclin B-Cdc2; Cdc2 is known as CDK1 in
humans), which imposes metaphase II arrest,
and the cytostatic factor Emi2, which stabilizes
cyclin B-Cdc2 by inhibiting the anaphase-promoting complex (APC)6. Notwithstanding
the proteins’ pivotal meiotic roles, few, if any,
reports characterize CAMKII, EMI2 or APC
subunits in healthy human eggs or in meiotic
or early embryonic dysfunction. Patients with
impaired fertility constitute a heterogeneous
source of naturally occurring meiotic and early
epigenetic defects not afforded by inbred animal models, yet basic mechanisms in primate
oocyte activation remain obscure.
The role of the spindle is relevant to human
nuclear transfer not least because depletion
of meiotic factors during spindle removal
may have been what caused the precocious
metaphase II progression observed in human
oocytes1. Meiotic exit is in turn important
if, as suggested, the human metaphase II
cytoplasm contains chromatin remodeling
factors whose activities rapidly diminish
during meiotic progression1, and the spindle
itself may be enriched for remodeling factors
in human eggs7. Analysis of spindle samples
after human oocyte enucleation could reveal
the presence or absence of candidate remodeling factors7 and whether unspecified protein8
or mRNA9 content correlates with developmental potential.

Figure 1 Generation of human induced pluripotent stem
(iPS) and nuclear transfer ES (ntES) cells. (a,b) Principal
steps and processes are shown for iPS cells (a) and
caffeine-enhanced ntES cell production1 (b). Pluripotent
iPS cells emerge after 2–3 weeks, but the mechanisms
underlying this emergence are unknown. The duration of
reprogramming following nuclear transfer (b) is thought to
be several hours and may be less, but the oocyte factors
responsible are also unknown. Additional unknowns in
nuclear transfer are suggested in Table 1. DMAP, 2 mM
6-dimethylaminopurine; TSA, 10 nM trichostatin A.
Labels in panel a (reprogramming to iPS cells): fibroblasts; introduction of reprogramming transcription factors (OCT4, SOX2 and others); drug selection for pluripotency; isolation of iPS cells.
Labels in panel b (reprogramming to ntES cells): metaphase II oocyte; spindle removal; 1.25 mM caffeine; Sendai virus; nucleus donor cell fusion; electroactivation; TSA; DMAP; pronuclear zygote; development in vitro; nuclear transfer blastocyst; nuclear transfer ES cells.
Illustration: Katie Vicari.

How might caffeine exert its beneficial effect
to fix human nuclear transfer? Mitalipov and
co-workers1 indicated that caffeine impairs the
exit of human cells from metaphase II by acting as a protein phosphatase inhibitor, but they
did not evaluate this putative inhibitory activity. Although Emi2-mediated metaphase II
arrest and exit in frogs and mice involves cascades of phosphorylation and dephosphorylation6, potential cascade targets in human eggs
have not been identified. Caffeine may therefore have promoted human nuclear transfer
through any one of a broad range of the other
direct and indirect intracellular effects it has
been reported to elicit. These include inhibition of kinases (including CAMKII) and
phosphodiesterases. Caffeine can also synergistically dysregulate the cell cycle and render
cells more sensitive to mutagenesis. As the
process of nuclear reprogramming may itself
engender mutagenesis10, this argues for careful
evaluation of ntES cell genomes generated by
the caffeine protocol.
In mouse and human early embryos,
increased oocyte histone acetylation correlates with poor development, although
development is paradoxically improved in
mouse nuclear transfer by inhibiting histone
deacetylases with trichostatin A. Mitalipov and
colleagues1 employed 10 nM trichostatin A,
within the 5–100 nM range successfully used
in mouse nuclear transfer, so it is reasonable to
ask whether there was any net global effect on
histone acetylation in human nuclear transfer
embryos. Histone acetylation is a key mediator
of transcription, but although the initiation of
embryonic transcription (at around the four-cell stage in humans) is critical for development, it is unknown whether the inclusion of
trichostatin A (or caffeine) affected the transcriptomes of human nuclear-transfer embryos.
This issue is addressable in an age of single-cell
transcriptomics9, and it is predicted that transcriptomes resembling those after fertilization
would be more common in human nuclear
transfer embryos arising from caffeine treatment. In addition to histone acetylation, other
transcription-regulatory chromatin modifications such as DNA methylation, DNA
hydroxymethylation and histone methylation
may also collectively produce useful indicators
of preimplantation developmental potential but
are under-reported in human early embryos.
Like ntES cells, iPS cells10,11 have a broad
range of developmental commitment options,
and the two cell types share potential utility in
disease evaluation and regenerative therapy
(for a comparison of human ES and iPS cells,
see ref. 10). Whereas the derivation of ntES cells
requires an oocyte to reprogram an incoming
nucleus for embryogenesis, iPS cells are generated by introducing certain transcription or
other factors to reprogram somatic cells such as
skin-derived fibroblasts in situ, circumventing
the need for oocytes (Fig. 1). However, in the
mouse at least, early-passage iPS cells retain a
transient epigenetic memory of their somatic-cell provenance12,13, whereas ntES cells have
undergone more complete reprogramming13.
Reprogramming mechanisms in nuclear
transfer and iPS cell generation are unknown
(Fig. 1, Table 1 ). In caffeine-enhanced human
nuclear transfer1, exogenous nuclei were

volume 31 number 8 AUGUST 2013 nature biotechnology

news and v i ews

Table 1 Hidden, cryptic and undisclosed variables in human nuclear transfer
Variable

Possible relevance of variable

Microinjectionist

Unidentified differences in competence produce variation in
developmental outcomes
Is cell fusion preferable to microinjection in humans?
Oocyte donor age and diet are among poorly understood influences on
developmental potential
Unknown or possibly adverse effects on oocyte quality
Can adult-derived nuclei work efficiently (as in mice)? What are the
cell cycle and other key culture parameters?
More prevalent in oocytes that fail in IVF, but the reason is uncertain
<1 h improves development, but why?

Micromanipulation
Oocyte source
Hormone hyperstimulation
Nucleus donor cell
First polar body
Oocyte collection–
micromanipulation interval
Cellular reprogramming
environment
Cytoskeletal integrity
Histone acetylation
Histone methylation

npg

© 2013 Nature America, Inc. All rights reserved.

DNA methylation
Embryo culture conditions
Other environmental factors
Preimplantation development

Blastocyst quality
Pharmacological agents (e.g.,
caffeine)

Is metaphase II essential, and how long do human oocytes take for
sufficient remodeling?
Poorly characterized spindle defects cause aneuploidy, but little is
known about cytoskeletal behavior apart from that of lamins
Hyperacetylation correlates with poor development, but why?
Key determinant of gene expression in other systems, but few data
exist for humans
Essential for global and imprinted gene expression control in mice,
but effects in humans are unknown
Effects on epigenetic regulation and potentially on development
Air quality and overlaying mineral oil may affect development
Development in vitro not guarantee of quality; efficient mouse
blastocyst parthenogenesis may be followed by developmental
failure
For example, cell number and trophectoderm (CDX2+) and inner cell
mass (OCT4+) lineage specification rarely recorded
What are their relevant targets in human eggs and embryos? How are
they affected and are the agents specific?

Some hidden developmental influences are easier to discover than others and potentially include factors contributed by the
biology of human cells and the techniques used to manipulate them.

remodeled by recipient oocyte factors in a process taking hours (although remodeling could continue during embryogenesis) with no drug selection and with an overall ntES cell derivation rate of ~12%. The genesis of human iPS cells from somatic cells is initiated by overexpressing the transcription factors OCT4 and usually SOX2 with one or more other factors, includes selection, and takes 2–3 weeks, with a success rate of ~0.1% and often much lower (ref. 11). Induced pluripotency may reflect a balanced equilibrium between competing differentiation forces (ref. 14) and is promoted by the expression of oncogenes (of which OCT4 and SOX2 are examples) or the removal of tumor suppressors (ref. 11). It is not known to what extent the reprogramming pathways that lead to ntES and iPS cell generation overlap, if at all. What is more apparent is that these pathways represent black boxes whose contents are often concealed by hidden variables (Fig. 1, Table 1).
Looked at one way, the efficiencies of cloned embryo and ntES cell derivation here (ref. 1) are impressively high. This may reflect the youth of oocyte donors. The four carefully characterized ntES cell lines were all derived from the eggs of the same donor (‘donor A’), but these oocytes may have been anomalously proficient nucleus recipients. Moreover, there are cases in mammals where ES cells can be derived after efficient formation of developmentally incompetent blastocysts. For example, mouse parthenogenetic blastocysts develop efficiently (and parthenogenetic ES cells can be derived from them), but developmental catastrophe ensues and mammalian parthenogenetic offspring have not been reported. This shows that neither blastocyst development nor (nt)ES cell derivation necessarily reflects developmental normality, a difficulty that is exacerbated in man because much of the fundamental biology of human oocytes and the technical aspects of their manipulation remain hidden (Table 1).
The availability, albeit limited, of naturally impaired human oocytes and early embryos represents a unique opportunity to delineate molecular processes that direct the initiation of embryogenesis in fertilization. These processes are likely to include key determinants of genome reprogramming in nuclear transfer, and as long as they are unknown, human oocytes will remain refractory to prescriptive manipulation. This is a call to redouble efforts, not to give up: weaknesses and strengths inherent to human ntES and iPS cells argue that each should be pursued absent an alternative to both. To succeed, such efforts will have to produce a fix that is a lot stronger than caffeine.
COMPETING FINANCIAL INTERESTS
The author declares no competing financial interests.
1. Tachibana, M. et al. Cell 153, 1228–1238 (2013).
2. Stojkovic, M. et al. Reprod. Biomed. Online 11, 226–231 (2005).
3. Willadsen, S.M. Nature 320, 63–65 (1986).
4. Mitalipov, S.M. et al. Hum. Reprod. 22, 2232–2242 (2007).
5. Egli, D. et al. Nature 447, 679–685 (2007).
6. Perry, A.C.F. & Verlhac, M.-H. EMBO Rep. 9, 246–251 (2008).
7. Noggle, S. et al. Nature 478, 70–75 (2011).
8. Han, Z. et al. J. Proteome Res. 9, 6025–6032 (2010).
9. VerMilyea, M.D. et al. EMBO J. 30, 1841–1851 (2011).
10. Nazor, K.L. et al. EMBO Rep. 13, 890–894 (2012).
11. Masip, M. et al. Mol. Hum. Reprod. 16, 856–868 (2010).
12. Polo, J.M. et al. Nat. Biotechnol. 28, 848–855 (2010).
13. Kim, K. et al. Nature 467, 285–290 (2010).
14. Shu, J. et al. Cell 153, 963–975 (2013).

Research Highlights
Papers from the literature selected by the Nature Biotechnology editors. (Follow us on
Twitter, @NatureBiotech #nbtHighlight)
Glycan receptor binding of the influenza A virus H7N9 hemagglutinin
Tharakaraman, K. et al. Cell 153, 1486–1493 (2013)
Structural determinants for naturally evolving H5N1 hemagglutinin to switch its receptor
specificity
Tharakaraman, K. et al. Cell 153, 1475–1485 (2013)
Vascularized and functional human liver from an iPSC-derived organ bud transplant
Takebe, T. et al. Nature doi:10.1038/nature12271 (3 July 2013)
Whole-genome sequence–based analysis of high-density lipoprotein cholesterol
Morrison, A.C. et al. Nat. Genet. doi:10.1038/ng.2671 (16 June 2013)
Nanoscopy with more than 100,000 ‘doughnuts’
Chmyrov, A. et al. Nat. Methods doi:10.1038/nmeth.2556 (7 July 2013)
Biosynthesis of antinutritional alkaloids in solanaceous crops is mediated by clustered genes
Itkin, M. et al. Science 341, 175–179 (2013)

nature biotechnology volume 31 number 8 AUGUST 2013

Analysis

Network link prediction by global silencing of indirect
correlations

© 2013 Nature America, Inc. All rights reserved.

Baruch Barzel (1,2) & Albert-László Barabási (1–3)

(1) Center for Complex Network Research and Departments of Physics, Computer Science and Biology, Northeastern University, Boston, Massachusetts, USA. (2) Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts, USA. (3) Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA. Correspondence should be addressed to A.-L.B. ([email protected]).
Received 22 October 2012; accepted 23 April 2013; published online 14 July 2013; doi:10.1038/nbt.2601

Predictions of physical and functional links between cellular components are often based on correlations between experimental measurements, such as gene expression. However, correlations are affected by both direct and indirect paths, confounding our ability to identify true pairwise interactions. Here we exploit the fundamental properties of dynamical correlations in networks to develop a method to silence indirect effects. The method receives as input the observed correlations between node pairs and uses a matrix transformation to turn the correlation matrix into a highly discriminative silenced matrix, which enhances only the terms associated with direct causal links. Against empirical data for Escherichia coli regulatory interactions, the method enhanced the discriminative power of the correlations by twofold, yielding >50% predictive improvement over traditional correlation measures and 6% over mutual information. Overall this silencing method will help translate the abundant correlation data into insights about a system’s interactions, with applications ranging from link prediction to inferring the dynamical mechanisms governing biological networks.

The currently incomplete maps of molecular interactions between cellular components limit our understanding of the molecular mechanisms behind human disease (refs. 1–6). Ultimately, high-throughput projects (refs. 7–10) are expected to provide the accurate maps of interactomes necessary to systematically unlock disease mechanisms. Yet, as a complete interaction map is currently not at hand, we need to develop tools that allow us to infer the structure of cellular networks from empirically obtained biological data (refs. 11,12). Many current tools designed to infer functional and physical interactions in the cell rely on the global response matrix,

$$G_{ij} = \frac{dx_i}{dx_j}, \tag{1}$$

which captures the change in node i’s activity in response to changes in node j’s (ref. 13). This matrix can be measured directly from gene knockout or overexpression experiments, or inferred indirectly using related measures such as Pearson or Spearman correlations (ref. 14), mutual information (refs. 15,16) or Granger causality (ref. 17). Traditional methods for predicting links (refs. 15,16,18,19) assume that the magnitude of Gij correlates with the likelihood of a direct functional or physical link between nodes i and j. Yet Gij cannot distinguish between direct and indirect relationships: a path i → k → j can result in a measurable response observed between i and j, falsely suggesting the existence of a direct link between them (Fig. 1a,b).
Several methods to correct for such effects have been proposed. Information theory approaches evaluate the association between nodes by measuring the entropy of their mutual activities, where a low entropy indicates a statistical dependence between the node activities (refs. 16,18,20); probabilistic models, such as the graphical Gaussian model, allow one to evaluate the correlation between i and j, while controlling for the state of node k, and thereby provide a more indicative measure of direct linkage (refs. 21–25); other models rely on assumptions pertaining to the network topology, such as the tendency of real networks to exhibit strong degree correlations (ref. 26). The ultimate solution, however, should enable us to fully unwind the direct from the indirect effects, providing a measure that distinctly indicates the existence of direct links. Consequently, we focus here on the local response matrix

$$S_{ij} = \frac{\partial x_i}{\partial x_j}, \tag{2}$$

in which the contribution of indirect effects is eliminated. In contrast with equation (1), which allows for global changes in i and j’s environment, here the “∂” indicates that Sij is defined to capture only local effects, namely the response of i to changes in j when all surrounding nodes except i and j remain unchanged. Hence Sij > 0 implies a direct link between i and j.
We derive a method for calculating the local response matrix (2)
from experimentally accessible correlation measures, allowing us
to mathematically discriminate direct from indirect links (Fig. 1).
We show that the resulting Sij matrix, in which the contribution of
indirect paths is silenced, is more discriminative than the empirically
obtained Gij matrix, enhancing our ability to extract direct links from
experimentally collected correlation data.
RESULTS
The silencing method
To extract Sij from the experimentally accessible Gij, we formally link equations (1) and (2) via

$$\begin{cases} \dfrac{dx_i}{dx_i} = 1 \\[6pt] \dfrac{dx_i}{dx_j} = \displaystyle\sum_{k=1}^{N} \dfrac{\partial x_i}{\partial x_k}\,\dfrac{dx_k}{dx_j}, & i \neq j. \end{cases} \tag{3}$$

Equation (3) is exact and the sum accounts for all network paths connecting i and j (Supplementary Note, I.1–2). It is of limited use, however, as it requires us to solve N^2 coupled algebraic equations. In Supplementary Note, I.1, we show that equation (3) can be reformulated as

$$S = \left(G - I + D(S \cdot G)\right) G^{-1}, \tag{4}$$

where I is the identity matrix and D(M) sets the off-diagonal terms of M to zero. To obtain an approximate solution for S, we use the fact that, typically, perturbations decay rapidly as they propagate through the network, so that the response observed between two nodes is dominated by the shortest path between them. This allows us to approximate D(S · G) with D((G − I)G) (Supplementary Note, I.3), obtaining

$$S = \left(G - I + D((G - I)G)\right) G^{-1}. \tag{5}$$
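The step from equation (3) to equation (4) can be written out in matrix form. The lines below are a sketch that uses only the definitions given above, with D(M) denoting the diagonal part of M and G assumed to be invertible; the derivation in the Supplementary Note is not reproduced here and may proceed differently.

```latex
% Equation (3) with G_{ij} = dx_i/dx_j and S_{ik} = \partial x_i/\partial x_k:
%   G_{ii} = 1, and G_{ij} = (SG)_{ij} for i \neq j.
\begin{align*}
G - D(G) &= SG - D(SG) && \text{(off-diagonal terms of equation (3))} \\
D(G) &= I && \text{(diagonal terms of equation (3))} \\
\Rightarrow\quad SG &= G - I + D(SG) \\
\Rightarrow\quad S &= \bigl(G - I + D(SG)\bigr)\,G^{-1} && \text{(equation (4))}
\end{align*}
```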

Equation (5), our main result, provides Sij from the experimentally
accessible Gij. It achieves this through a ‘silencing effect’, in which
direct response terms are preserved, whereas indirect responses are
silenced. To understand this, consider a specific term in Gij, documenting the response of node i to j’s perturbation. As indicated by
equation (3), this response is a consequence of all direct and indirect
paths leading from j to i. As we document below, the transformation (5) detects the indirect paths and silences them, maintaining
only the contribution of the direct paths (Fig. 1d–f). An alternative
method to approximate D(S · G) in equation (4) is using an iterative
scheme, in which Sij is evaluated first via equation (5) and then used
as input in equation (4), repeating the process until sufficient accuracy
is achieved (Supplementary Note, I.1).
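As an illustration of how equation (5) could be applied in practice, the following minimal Python sketch computes the silenced matrix with NumPy. The function name `silence`, the four-node directed cascade and the link weight `w` are illustrative assumptions, not part of the original analysis. The sketch assumes that G is square, invertible and has a unit diagonal.

```python
import numpy as np

def silence(G):
    """Approximate the local response matrix S from G via equation (5).

    Assumes G is square, invertible and has a unit diagonal (G_ii = 1).
    """
    G = np.asarray(G, dtype=float)
    I = np.eye(G.shape[0])
    # D(M) keeps the diagonal of M and sets all off-diagonal terms to zero.
    D = np.diag(np.diag((G - I) @ G))
    return (G - I + D) @ np.linalg.inv(G)

# Toy check (hypothetical numbers): a directed cascade 0 -> 1 -> 2 -> 3
# in which every direct link has local response w.
w, n = 0.5, 4
S_true = w * np.eye(n, k=-1)            # S_true[i, i-1] = w, zero elsewhere
G = np.linalg.inv(np.eye(n) - S_true)   # G[i, j] = w**(i - j) for i >= j
S_est = silence(G)
```

In this feed-forward toy case the global response between the two ends of the cascade, `G[3, 0]`, is nonzero although the nodes are not directly linked, whereas the corresponding term of `S_est` vanishes up to rounding error. For networks with feedback loops, equation (5) is an approximation rather than an exact inversion.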
Figure 1  Silencing indirect links. (a) The experimentally observed global response matrix, Gij, accounts for direct as well as indirect correlations, with no clear separation between them. The source of Gij could be gene coexpression data, statistical correlations or genetic perturbation experiments. (b) In the absence of a clear separation in Gij assigned to direct and indirect correlations, our ability to infer direct physical links (solid lines) is limited. Simple thresholding, that is, accepting all links for which Gij exceeds a predefined threshold, is known to predict spurious links (thick dashed lines) and overlook true links (thin solid lines). (c) Although the average Gij terms associated with direct links are higher than the average terms associated with indirect links, as captured by the discrimination ratio, ∆G, the difference is not sufficient to fully discriminate between direct and indirect links. (d) Silencing is achieved through equation (5), which exploits the flow of information in the network: the flow from the source (j) to the target (i) is carried through the indirect effect Gkj (brown) coupled with the direct impact Sik of the target’s nearest neighbor k. By silencing the indirect contributions, equation (5) provides the local response matrix, Sij, whose nonzero elements correspond to direct links. (e,f) In Sij the terms associated with indirect links are silenced, allowing us to detect only the direct links of the underlying network. (g) As indirect terms become much smaller in Sij, we obtain a greater discrimination ratio, ∆S. (h) The degree of silencing, κ, captures the increase observed in the discrimination ratio by the transition from Gij to Sij (through equation (5)).

Silencing in model systems
To demonstrate the predictive power of equation (5), we implemented Michaelis-Menten dynamics on a model network (Supplementary Note, III), as commonly used to model gene regulation (refs. 27,28). We obtained Gij by perturbing the activity of each node and then calculated Sij using equation (5). Figure 2a shows the Gij and Sij terms associated with interacting and noninteracting node pairs. Although Gij is higher for direct interactions, the overlap between the orange and the green symbols indicates a lack of a clear threshold q that separates direct and indirect interactions. In contrast, Sij displays a clear separation between direct and indirect interactions, accurately predicting each direct link. Indeed, the receiver operating characteristic (ROC) curve derived from Gij (Fig. 2b) has an area of AUROC = 0.91, reflecting inherent limitations in separating direct from indirect interactions based on Gij only. In contrast, for Sij we obtain AUROC = 0.997 (blue), where the true-positive rate reaches 100% with a false-positive rate of <10^−3. Also, as opposed to Gij, for which precision increases gradually with the threshold q (Fig. 2c), Sij’s precision jumps to 1 for q > 10^−4. Hence, in our well-controlled model system, any nonzero Sij corresponds effectively to a direct link.
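The two performance measures used above, the area under the ROC curve and the precision at a threshold q, can be sketched as follows. This is a minimal illustration, assuming that each node pair carries a score (a Gij or Sij term) and a label saying whether the pair is truly linked. The function names and the six scores are hypothetical and are not the data behind Figure 2.

```python
import numpy as np

def auroc(scores, is_link):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    scores = np.asarray(scores, dtype=float)
    is_link = np.asarray(is_link, dtype=bool)
    pos, neg = scores[is_link], scores[~is_link]
    # Fraction of (link, non-link) pairs ranked correctly; ties count one half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def precision_at(scores, is_link, q):
    """Fraction of pairs with a score above threshold q that are true links."""
    scores = np.asarray(scores, dtype=float)
    is_link = np.asarray(is_link, dtype=bool)
    predicted = scores > q
    return is_link[predicted].mean() if predicted.any() else float("nan")

# Hypothetical scores for six node pairs, three of which are true links.
is_link = np.array([1, 1, 1, 0, 0, 0], dtype=bool)
g_scores = np.array([0.9, 0.4, 0.2, 0.5, 0.3, 0.1])    # overlapping, like Gij
s_scores = np.array([0.9, 0.4, 0.2, 1e-5, 1e-6, 0.0])  # separated, like Sij
```

With overlapping scores no threshold separates the two classes, so the AUROC stays below 1 and the precision depends on q. With well-separated scores any small positive threshold recovers exactly the true links.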
The performance of equation (5) is due to the silencing effect.
It leaves Gij unchanged if i and j are linked, whereas it systematically lowers all Gij not rooted in a direct interaction. To quantify this
effect we measured the discrimination ratio ∆G = ⟨Gij⟩Dir/⟨Gij⟩Indir
(∆S = ⟨Sij⟩Dir/⟨Sij⟩Indir), which captures the ratio between Gij (Sij) terms
associated with direct links and those associated with indirect links.
We find that Sij is much more discriminative than Gij owing to its
silencing of indirect responses. This effect can be quantitatively measured through the silencing metric

$$\kappa = \frac{\Delta S}{\Delta G}, \tag{6}$$

which captures the increased power of Sij to discriminate between
direct and indirect links compared to Gij (Fig. 1h). In our model system
we find that κ = 15, a silencing of more than an order of magnitude
(Fig. 2d). Furthermore, the longer the distance dij between two nodes,
the larger is the silencing (Fig. 2e). As an illustration, consider a linear
cascade in which changes in any node result in a finite response Gij by
all other nodes (Fig. 2f). Equation (5) silences all indirect responses,
while leaving the response of direct links effectively unchanged, offering a discriminative measure that enables a perfect reconstruction of
the original network.
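The discrimination ratio and the silencing metric of equation (6) can be computed as in the sketch below. It assumes that the averages are taken over the absolute values of the off-diagonal terms, which is one plausible reading of the definition rather than a confirmed detail. The three-node chain and both matrices are made-up illustrative numbers; the matrix S here is not the output of equation (5).

```python
import numpy as np

def discrimination_ratio(M, adjacency):
    """Mean |M_ij| over direct pairs divided by the mean over indirect pairs."""
    M = np.abs(np.asarray(M, dtype=float))
    A = np.asarray(adjacency, dtype=bool)
    off_diag = ~np.eye(M.shape[0], dtype=bool)
    direct = M[A & off_diag].mean()
    indirect = M[~A & off_diag].mean()
    return direct / indirect

def silencing(G, S, adjacency):
    """Silencing metric of equation (6): kappa = Delta_S / Delta_G."""
    return discrimination_ratio(S, adjacency) / discrimination_ratio(G, adjacency)

# Hypothetical chain 0 - 1 - 2; nodes 0 and 2 are linked only indirectly.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
G = np.array([[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]])
S = np.array([[0.0, 0.5, 0.01], [0.5, 0.0, 0.5], [0.01, 0.5, 0.0]])
kappa = silencing(G, S, A)
```

In this toy case the direct terms are identical in both matrices (0.5) while the indirect term drops from 0.25 to 0.01, so the discrimination ratio rises from 2 to 50 and the silencing is 25.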
Predicting molecular interactions in E. coli
To test the predictive power of equation (5) on real data, we used the E. coli data sets distributed by the DREAM5 network inference challenge (ref. 19). The input data include a compendium of microarray experiments measuring the expression levels of 4,511 E. coli genes (141 of which are known transcription factors) under 805 different experimental conditions (Supplementary Note, IV.1). We constructed three separate global response matrices Gij between the 141 transcription factors and their 4,511 potential target genes, based on (i) Pearson correlations, (ii) Spearman rank correlations and (iii) mutual information, which are three commonly used methods for link detection (Supplementary Note, IV.3). From each of the three Gij matrices, we obtained Sij via equation (5), and compared the performance of Gij with the pertinent Sij. To validate our predictions we relied on the gold standard used in the DREAM5 challenge, consisting of 2,066 established gene regulatory interactions. Measuring AUROC from Gij and Sij, we found an improvement of 56% for Pearson correlations (Fig. 3a), 67% for Spearman rank correlations (Fig. 3b) and a smaller improvement of 6% for mutual information (Fig. 3c), allowing us to improve upon the top-performing inference methods (ref. 19).

Figure 2  Network inference in model systems. We numerically simulated Michaelis-Menten dynamics on a scale-free network (refs. 40–42), extracting the correlations Gij between all pairs of nodes (Supplementary Note, III). (a) Gij and Sij associated with interacting and noninteracting node pairs. Sij silences the correlations associated with indirect interactions, resulting in a clear separation between direct and indirect interactions, a phenomenon absent from Gij. (b) ROC curve obtained from Gij (red, AUROC = 0.91) and Sij (blue, AUROC = 0.997). The Sij network reaches 100% accuracy with a negligible amount of false positives. TPR, true-positive rate; FPR, false-positive rate. (c) Precision obtained for threshold q for Gij and Sij. The gradual rise of the Gij-based precision indicates that for a broad range of thresholds only a small fraction of the links will be identified. In contrast, the steep rise in precision for Sij indicates its enhanced discriminative power between direct and indirect links; virtually any nonzero Sij corresponds to a directly interacting pair. (d) The discrimination ratio, ∆, is much higher in Sij compared to Gij. This indicates that Sij is a much better predictor of direct versus indirect interactions. The silencing metric (6), which captures the increase in the discrimination ratio, is κ = 15.0. (e) Silencing increases with the path length dij between i and j, so that the more indirect the link, the more dramatic the silencing. (f) The source of Sij’s success is the silencing effect, here illustrated on correlations measured for a linear cascade. The reconstruction of the cascade from Gij is confounded by numerous nonvanishing indirect correlations. In Sij the indirect correlations are silenced, providing a perfect reconstruction.

Figure 3  Inferring regulatory interactions in E. coli. (a) Starting from gene expression data, we used Pearson correlations in expression patterns to construct Gij for 4,511 E. coli genes, obtaining Sij via equation (5). We compared our predictions to a gold standard of experimentally verified genetic regulatory links (ref. 19). The area under the ROC curve (AUROC) is increased from 0.59 to 0.64 in the transition from Gij to Sij, representing a 56% improvement (above the baseline of 0.5 for a random guess). TPR, true-positive rate; FPR, false-positive rate. (b) An improvement of 67% is observed for Spearman rank correlations. (c) A less dramatic improvement of 6% is shown when Gij is constructed using mutual information (MI). (d) The discrimination ratio for all three methods compared with that obtained from the pertinent Sij matrix. The transition to Sij increases the discrimination between direct and indirect interactions by a factor of 2 or more, so that indirect interactions have a considerably lower expression in Sij. (e,f) This observation becomes even more dramatic when focusing on two specific motifs: cascades and co-regulators. In Gij the indirect correlation between X and Y, which is induced by the intermediate node, I, may lead to the false prediction of the spurious X-Y link. Thanks to silencing, the discrimination between the direct and indirect links in these motifs is increased by a factor of 3 or more for Pearson and Spearman correlations, and by a factor of ~2 for mutual information.
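The percentage improvements quoted for Figure 3 are measured relative to the random-guess baseline of 0.5 rather than to the raw AUROC. The short sketch below reproduces that arithmetic. The Pearson values (0.59 and 0.64) are stated in the caption; the Spearman (0.59 to 0.65) and mutual information (0.67 to 0.68) values are an assumed reading of the panel labels and should be treated as such. The function name is hypothetical.

```python
def improvement_over_random(auroc_g, auroc_s, baseline=0.5):
    """Relative AUROC gain of Sij over Gij, measured above the random baseline."""
    return (auroc_s - auroc_g) / (auroc_g - baseline)

# Pearson values are quoted in the Figure 3 caption; the Spearman and mutual
# information values are an assumed reading of the panel labels.
pearson = improvement_over_random(0.59, 0.64)
spearman = improvement_over_random(0.59, 0.65)
mutual_info = improvement_over_random(0.67, 0.68)
```

For Pearson correlations this gives (0.64 − 0.59)/(0.59 − 0.5), or about 56%, in line with the caption. Under the assumed panel values the same formula gives about 67% and 6% for the other two measures.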



We further tested the discrimination ratio, ∆, and the silencing, κ, for each of these methods, finding that indirect correlations are subject to an average of twofold silencing in the transition from Gij to Sij (Fig. 3d). Silencing is especially crucial in the presence of the cascade and co-regulation motifs (Fig. 3e,f), where most inference methods indicate a spurious link between X and Y, owing to the indirect correlation mediated by node I. Indeed, the transformation (5) silences these indirect correlations by a factor of three or more for Pearson and Spearman correlations and by a smaller factor (1.6 or 2.1) for mutual information, overcoming one of the most common hurdles of inference methods, which tend to over-represent triadic motifs (ref. 19).
The effects of noise and uncertainty
As all experimental data are subject to noise, the global response
matrix, Gij, is characterized by some degree of uncertainty. To test the
performance of our methodology in the presence of noise, we repeated
the numerical experiment of Figure 2, this time adding Gaussian
noise to Gij, which allows us to calculate silencing as a function of
increasing the signal-to-noise ratio θ (Fig. 4). As expected, silencing
is unaffected by small values of θ, so that κ features a plateau below
θ  0.1. For large θ, silencing decays as κ ~ θ−1, demonstrating that
the performance of the method decreases slowly as the signal-to-noise

a

hC ≈ 1 −


,
⟨k⟩

(7)

where Ω = 2 ln( 2 + 2) ≈ 1.7 (Supplementary Note, V.2). This equation indicates that for large ⟨k⟩ the method will be reliable even if a
large fraction of the nodes are hidden.
To test this prediction, we revisited the numerically obtained Gij
analyzed in Figure 2 and measured the degree of silencing after randomly removing an increasing fraction of nodes. In each case we also

Figure 5  Performance with hidden nodes. (a) A network with N = 8 nodes, of which a
fraction η = 1/4 are hidden. The observable subnetwork has 6 nodes, 5 forming a connected
component (with 10 connected node pairs) and 1 isolated (6 isolated pairs). The ratio between
isolated and connected node pairs here is ρ = 6/10. Equation (5), applied to the observable
network, successfully silences the indirect Gij terms among the nodes of the connected
component. However, the correlations between the isolated node and the rest of the network,
lacking an indirect path, are not silenced. (b) To test the silencing in the presence of
hidden nodes, we used the numerically obtained Gij (Fig. 2) from which we eliminated a
fraction η of the nodes, obtaining an observable network with 10⁴ isolated node pairs (ρ ≈ 10⁻³). After applying
equation (5) to the remaining nodes, we found that the silencing of Gij terms associated with connected node pairs is unaffected (orange bar), whereas
for the isolated node pairs, silencing drops to κ = 1, namely no silencing (purple bar). Hence, for the isolated node pairs, Sij is not more predictive
than Gij. (c) Increasing the fraction of hidden nodes, η (top horizontal axis), we measured κ versus ρ. As expected, silencing is observed as long as
most node pairs are connected via finite paths (ρ < 1). However, when the number of hidden nodes is increased to the point that the isolated pairs
dominate (ρ > 1), silencing is no longer observed (κ = 1). The critical fraction of hidden nodes, ηC, corresponds to ρ = 1, the point at which silencing
no longer plays an important role. Here we find ηC ≈ 0.57 (blue arrow), in agreement with the prediction of equation (7).
[Figure 5 plot labels: η (fraction of hidden nodes); ρ (isolated/connected node pairs); hidden nodes; isolated node; connected and isolated bars; no silencing; ηC ≈ 0.57.]

© 2013 Nature America, Inc. All rights reserved.

Figure 4  Silencing in a noisy environment. To test the method’s performance
in the presence of a noisy input we added Gaussian noise to the numerically
obtained Gij, and measured the silencing, κ, versus the signal-to-noise
ratio θ. For low noise levels (θ ≲ 0.1), silencing is relatively unharmed.
At higher noise levels, silencing decreases as κ ~ θ⁻¹, a slow decay that
supports the robustness of the method. Silencing is lost at θC ≈ 0.75,
when the signal is almost fully driven by the noise.

ratio is increased. Indeed, as opposed to a rapid exponential decay, the
observed, slower, power-law dependence indicates that the method is
rather tolerant to noise. Silencing is lost only when the noise reaches
the critical level θC ≈ 0.75, when the signal is almost completely overridden by noise, leading to κ = 1 (Supplementary Note, V.1).
Hidden nodes offer another source of uncertainty. They represent
the fact that in most cases we are unable to read the states of all nodes
in the system29. To illustrate the effect of the hidden nodes on the
performance of the silencing method, we consider the case of a simple
cascade i → k → j, where the intermediate node k is hidden. In this
scenario, equation (5) will not be able to silence the indirect i → j link,
because in the observable system, the Gij term cannot be attributed
to any indirect path. Hence, absent any other information about the
system, it is mathematically impossible to infer the indirectness of Gij,
as the removal of k isolates i from j30. This touches upon the fundamental mechanism of silencing: the silencing transformation (5)
exploits the flow of information through indirect paths (Fig. 1 and
Supplementary Note, I.2). Consequently, if as a result of hidden
nodes, the network fragments into several components such that the
node pair i and j become isolated from each other, then all indirect
paths between them become hidden and the pertinent Gij term will
not be silenced (Fig. 5a,b). Hence silencing is expected to fail only
when the network breaks into many isolated components so that most
node pairs become isolated. Fortunately, a fundamental property of
complex networks is that with average degree ⟨k⟩ ≫ 1, one needs to
remove a large fraction of the nodes to fragment the underlying giant
connected component31–34. Therefore we can build on percolation
theory, which allows us to analytically predict how the size of the
largest connected component changes with the random removal of
a certain fraction of nodes35,36. The calculation shows that silencing
is maintained as long as the fraction of hidden nodes is smaller than

nature biotechnology VOLUME 31  NUMBER 8  AUGUST 2013


measured the ratio between isolated and connected node pairs (ρ).
We found that, as predicted, the degree of silencing is driven mainly
by ρ, approaching κ ≈ 1 (no silencing) when ρ ≥ 1, namely, when the
isolated pairs begin to dominate the network (Fig. 5c). Here as ⟨k⟩ = 4,
equation (7) predicts ηC ≈ 0.57, that is, the method will fail only when
almost 60% of the nodes are hidden. Note that for biological networks,
⟨k⟩ is expected to be in the range of ⟨k⟩ ≳ 10 (ref. 37), predicting
ηC ≳ 0.8. Namely, one needs to lose access to 80% of the nodes for
silencing to lose its effectiveness.
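The measurement of ρ described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' procedure: it assumes an Erdős–Rényi random graph, hides a random fraction of nodes, and counts node pairs inside and across the connected components of the observable subnetwork. The function name `pair_ratio` and all parameter values are hypothetical, and the crossover point it produces depends on the graph model and size, so it should not be expected to reproduce the value of ηC quoted in the text.

```python
import random


def pair_ratio(n_nodes, mean_degree, hidden_fraction, seed=0):
    """Return rho = (isolated node pairs) / (connected node pairs)
    in the observable part of a random graph (illustrative only)."""
    rng = random.Random(seed)
    p = mean_degree / (n_nodes - 1)  # edge probability giving the requested mean degree
    edges = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)
             if rng.random() < p]

    # Hide a random subset of nodes; only the remaining nodes are observable.
    n_hidden = int(round(hidden_fraction * n_nodes))
    hidden = set(rng.sample(range(n_nodes), n_hidden))
    parent = {v: v for v in range(n_nodes) if v not in hidden}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Merge components along edges whose two endpoints are both observable.
    for i, j in edges:
        if i in parent and j in parent:
            parent[find(i)] = find(j)

    sizes = {}
    for v in parent:
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1

    m = len(parent)
    connected = sum(s * (s - 1) // 2 for s in sizes.values())
    isolated = m * (m - 1) // 2 - connected
    return float("inf") if connected == 0 else isolated / connected


rho_few_hidden = pair_ratio(300, 4, 0.0)
rho_many_hidden = pair_ratio(300, 4, 0.9)
```

With few hidden nodes most observable pairs share a component and ρ stays well below 1; when most nodes are hidden the observable subnetwork fragments and ρ rises above 1, which is the regime in which the text expects silencing to fail.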
DISCUSSION
With computational complexity O(N³), equation (5) is scalable and
requires no assumptions about the network topology. By silencing
indirect effects, it turns the raw correlation data into a predictive Sij
matrix, dominated by direct interactions. It is especially suited to treat
perturbation data, such as genetic perturbation experiments, in which
case Gij describes the response of all genes (dxi) as a consequence of
the perturbation of the source gene (dxj)38. In practice, however, Gij
could be the result of a broader set of experimental realizations where
other measures are used to evaluate the association between nodes,
typically statistical measures such as Pearson or Spearman correlation
coefficients. Still, our empirical results (Fig. 3) clearly show that the
transformation (5) successfully applies to these empirically accessible
measures as well. Hence, silencing is largely insensitive to the specific
process by which Gij was constructed.
The method’s broad applicability is rooted in the fact that it does
not depend on the value of each specific term in Gij, but rather on
the global relationships between them. Indeed, the global structure
of Gij reflects the patterns of propagation of the perturbations along
the network. Equation (5) helps uncover these paths from the raw
data, disentangling the direct from the indirect effects. These patterns
of information flow are inherent to the underlying network structure and should not depend on the specific experimental realization
of equation (1). For instance, a cascade i → j → k will be characterized by a decreasing correlation propagating along the arrows,
a large correlation between i and j and a weaker one between i
and k. Although the magnitude of these correlations might depend
on the size or the form of i’s perturbation as well as on the statistical measure we used to evaluate them, the decay pattern required to
infer the structure of the cascade is an inherent property of the network flow and can be successfully detected by the silencing method
(Supplementary Note, I.4).
The silencing transformation is derived from fundamental mathematical principles of dynamical correlations in networks. Hence it
is expected to apply under rather general conditions. However, as
equation (5) indicates, it requires that the input matrix, Gij, is invertible. This imposes some limitations when constructed from statistical correlation measures. For instance, in the empirical results of
Figure 3a, we constructed Gij from Pearson correlations, using the
states of 4,511 nodes measured under 805 experimental conditions.
In general, if the number of experimental conditions is smaller than
the number of nodes, the resulting Pearson correlation matrix may
be singular. In this case, additional processing will be required before
equation (5) can be applied. In this work, following the DREAM5
protocol, we only focused on the correlations between the 141 known
transcription factors and the rest of the nodes, which leads to an invertible Gij (Supplementary Note, IV). Other means to ensure Gij’s invertibility are discussed in Supplementary Note, IV.4.
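The invertibility caveat above can be checked numerically. The sketch below uses synthetic Gaussian data (an assumption, not the expression data of Figure 3) and hypothetical sizes; it shows that a Pearson correlation matrix built from fewer conditions than nodes is rank deficient, whereas one built from many conditions is generically full rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes = 20

# Fewer experimental conditions than nodes: rows are nodes, columns are conditions.
few_conditions = 8
states_few = rng.normal(size=(n_nodes, few_conditions))
corr_few = np.corrcoef(states_few)            # n_nodes x n_nodes Pearson matrix
rank_few = np.linalg.matrix_rank(corr_few)    # at most few_conditions - 1 after centering

# More conditions than nodes: the matrix is generically invertible.
many_conditions = 50
states_many = rng.normal(size=(n_nodes, many_conditions))
corr_many = np.corrcoef(states_many)
rank_many = np.linalg.matrix_rank(corr_many)
```

Centering each node's measurements removes one degree of freedom, which is why the rank in the first case cannot exceed the number of conditions minus one.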
Isolating indirect effects in correlation data, a fundamental challenge of network inference, is typically approached through local
probabilistic tools12,14–18. In contrast, the success of the silencing
method is rooted in its exploitation of the global network topology39. It relies on the fundamental principles of network structure
and dynamics to identify and silence the effects of indirect paths.
The ability to extract Sij from Gij could also have implications for our
understanding of network dynamics. Indeed, Gij is a global network
measure, as its magnitude is determined by the numerous indirect
paths connecting i and j. Hence, for a given dynamics, the Gij matrix
will take a different form depending on the network topology, making
it a poor predictor of the system’s dynamics. By eliminating indirect
effects, Sij measures the effect gene i would have on gene j had they
been isolated from the rest of the network. It thus helps us quantify
the dynamical mechanism that governs individual pairwise interactions, avoiding the convolution of dynamical and topological effects
present in experimental data. For instance, consider a set of perturbation experiments providing Gij. The structure of Gij reflects the
microscopic mechanisms that govern the pairwise interactions, for
example, genetic regulation and biochemical processes. It is difficult,
however, to extract this information from Gij because its terms are a
convolution of many interactions, reflecting the many paths leading
from i to j. The transition to Sij, via equation (5), allows us to treat
each isolated interaction on its own, providing a direct observation
of the microscopic interaction mechanism. A direct application of this
fact could be the derivation of a rate equation that governs the system’s
dynamics from Gij, as well as predicting the universality class and the
scaling laws governing the system’s response to perturbations. Hence
equation (5) helps translate the ever-growing amount of data on global
correlations into valuable local information.
METHODS
Methods and any associated references are available in the online
version of the paper.
Note: Supplementary information is available in the online version of the paper.
Acknowledgments
We thank B. Alipanahi and B. Frey for their valuable insights, A. Sharma, F. Simini,
J. Menche, S. Rabello, G. Ghoshal, Y.-Y. Liu, T. Jia, M. Pósfai, C. Song, Y.-Y. Ahn,
N. Blumm, D. Wang, Z. Qu, M. Schich, D. Ghiassian, S. Gil, P. Hövel, J. Gao,
M. Kitsak, M. Martino, R. Sinatra, G. Tsekenis, L. Chi, B. Gabriel, Q. Jin and Y. Li
for discussions, and S.S. Aleva, S. Morrison, J. De Nicolo and A. Pawling for their
support. This work was supported by the US National Institutes of Health (NIH),
Center of Excellence of Genomic Science (CEGS), Grant number NIH CEGS
1P50HG4233; and the NIH, award number 1U01HL108630-01; DARPA Grant
Number 11645021; DARPA Social Media in Strategic Communications project
under agreement number W911NF-12-C-0028; the Network Science Collaborative
Technology Alliance sponsored by the US Army Research Laboratory under
agreement number NS-CTA W911NF-09-02-0053; the Office of Naval Research
under agreement number N000141010968; and the Defense Threat Reduction
Agency awards WMD BRBAA07-J-2-0035 and BRBAA08-Per4-C-2-0033.
AUTHOR CONTRIBUTIONS
Both authors designed the research and wrote the paper. B.B. analyzed the
empirical data, and did the analytical and numerical calculations.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
Reprints and permissions information is available online at http://www.nature.com/
reprints/index.html.
1. Buchanan, M., Caldarelli, G., De Los Rios, P., Rao, F. & Vendruscolo, M. (eds).
Networks in Cell Biology (Cambridge University Press, 2010).
2. Ideker, T. & Sharan, R. Protein networks in disease. Genome Res. 18, 644–652
(2008).
3. Kann, M.G. Protein interactions and disease: computational approaches to uncover
the etiology of diseases. Brief. Bioinform. 8, 333–346 (2007).
4. Albert, R. Scale-free networks in cell biology. J. Cell Sci. 118, 4947–4957
(2005).


5. Barabási, A.-L. & Oltvai, Z.N. Network biology: understanding the cell’s functional
organization. Nat. Rev. Genet. 5, 101–113 (2004).
6. Vidal, M., Cusick, M.E. & Barabási, A.-L. Interactome networks and human disease.
Cell 144, 986–998 (2011).
7. Rual, J.F. et al. Towards a proteome-scale map of the human protein-protein
interaction network. Nature 437, 1173–1178 (2005).
8. Yu, H. et al. High-quality binary protein interaction map of the yeast interactome
network. Science 322, 104–110 (2008).
9. Braun, P. et al. An experimentally derived confidence score for binary protein-protein
interactions. Nat. Methods 6, 91–97 (2009).
10. Krogan, N.J. et al. Global landscape of protein complexes in the yeast Saccharomyces
cerevisiae. Nature 440, 637–643 (2006).
11. Costanzo, M. et al. The genetic landscape of a cell. Science 327, 425–431
(2010).
12. Ramani, A.K. et al. A map of human protein interactions derived from co-expression
of human mRNAs and their orthologs. Mol. Syst. Biol. 4, 180–195 (2008).
13. Barzel, B. & Biham, O. Quantifying the connectivity of a network: the network
correlation function method. Phys. Rev. E 80, 046104 (2009).
14. Eisen, M.B. et al. Cluster analysis and display of genome-wide expression patterns.
Proc. Natl. Acad. Sci. USA 95, 14863–14868 (1998).
15. Butte, A.J. & Kohane, I.S. Mutual information relevance networks: functional
genomic clustering using pairwise entropy measurements. Pac. Symp. Biocomput.
5, 415–426 (2000).
16. Margolin, A.A. et al. ARACNE: an algorithm for the reconstruction of gene regulatory
networks in a mammalian cellular context. BMC Bioinformatics 7, S7 (2006).
17. Guo, S. et al. Uncovering interactions in the frequency domain. PLoS Comput. Biol.
4, e1000087 (2008).
18. Faith, J.J. et al. Large-scale mapping and validation of Escherichia coli  transcriptional
regulation from a compendium of expression profiles. PLoS Biol. 5, e8 (2007).
19. Marbach, D. et al. Wisdom of crowds for robust gene network inference.
Nat. Methods 9, 796–804 (2012).
20. Lezon, T.R. et al. Using the principle of entropy maximization to infer genetic
interaction networks from gene expression patterns. Proc. Natl. Acad. Sci.
USA 103, 19033–19038 (2006).
21. Ma, S. et al. An Arabidopsis gene network based on the graphical Gaussian model.
Genome Res. 17, 1614–1625 (2007).
22. Han, L. & Zhu, J. Using matrix of thresholding partial correlation coefficients to
infer regulatory network. Biosystems 91, 158–165 (2008).

23. Chen, L. & Zheng, S. Studying alternative splicing regulatory networks through
partial correlation analysis. Genome Biol. 10, R3 (2009).
24. Peng, J. et al. Partial correlation estimation by joint sparse regression models.
J. Am. Stat. Assoc. 104, 735–746 (2009).
25. Yuan, Y. et al. Directed Partial Correlation: inferring large-scale gene regulatory
network through induced topology disruptions. PLoS ONE 6, e16835 (2011).
26. Adamic, L.A. & Adar, E. Friends and neighbors on the web. Soc. Networks 25,
211–230 (2003).
27. Alon, U. An Introduction to Systems Biology: Design Principles of Biological Circuits
(Chapman & Hall, London, 2006).
28. Karlebach, G. & Shamir, R. Modeling and analysis of gene regulatory networks.
Nat. Rev. Mol. Cell Biol. 9, 770–780 (2008).
29. Caldarelli, G., Capocci, A., De Los Rios, P. & Muñoz, M.A. Scale-free networks from
varying vertex intrinsic fitness. Phys. Rev. Lett. 89, 258702 (2002).
30. Liu, Y.-Y., Slotine, J.-J. & Barabási, A.-L. Observability of complex systems.
Proc. Natl. Acad. Sci. USA 110, 2460–2465 (2013).
31. Erdős, P. & Rényi, A. On the evolution of random graphs. Publications Math. Inst.
Hungarian Acad. Sci. 5, 17–61 (1960).
32. Albert, R., Jeong, H. & Barabási, A.-L. Error and attack tolerance of complex
networks. Nature 406, 378–382 (2000).
33. Cohen, R., Erez, K., Ben-Avraham, D. & Havlin, S. Resilience of the Internet to
random breakdowns. Phys. Rev. Lett. 85, 4626–4628 (2000).
34. Bollobás, B. The Evolution of Random Graphs—the Giant Component. in Random
Graphs 2nd ed. (Cambridge University Press, 2001).
35. Stauffer, D. & Aharony, A. Introduction to Percolation Theory (CRC Press, 1994).
36. Cohen, R. & Havlin, S. Complex Networks: Structure, Robustness and Function
(Cambridge University Press, 2010).
37. Venkatesan, K. et al. An empirical framework for binary interactome mapping.
Nat. Methods 6, 83–90 (2009).
38. Kauffman, S. The ensemble approach to understand genetic regulatory networks.
Physica A 340, 733–740 (2004).
39. Marks, D.S., Hopf, T.A. & Sander, C. Protein structure prediction from sequence
variation. Nat. Biotechnol. 30, 1072–1080 (2012).
40. Barabási, A.-L. & Albert, R. Emergence of scaling in random networks.
Science 286, 509–512 (1999).
41. Albert, R. & Barabási, A.-L. Statistical mechanics of complex networks. Rev. Mod.
Phys. 74, 47–97 (2002).
42. Caldarelli, G. Scale-free Networks (Oxford University Press, 2007).


Analysis

Network deconvolution as a general method to
distinguish direct dependencies in networks


Soheil Feizi1–3, Daniel Marbach1,2, Muriel Médard3 & Manolis Kellis1,2

1Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts
Institute of Technology (MIT), Cambridge, Massachusetts, USA. 2Broad Institute
of MIT and Harvard, Cambridge, Massachusetts, USA. 3Research Laboratory of
Electronics at MIT, Cambridge, Massachusetts, USA. Correspondence should be
addressed to M.K. ([email protected]).

Received 12 September 2012; accepted 11 June 2013; published online
14 July 2013; doi:10.1038/nbt.2635

Recognizing direct relationships between variables connected
in a network is a pervasive problem in biological, social and
information sciences as correlation-based networks contain
numerous indirect relationships. Here we present a general
method for inferring direct effects from an observed
correlation matrix containing both direct and indirect
effects. We formulate the problem as the inverse of network
convolution, and introduce an algorithm that removes the
combined effect of all indirect paths of arbitrary length in
a closed-form solution by exploiting eigen-decomposition
and infinite-series sums. We demonstrate the effectiveness
of our approach in several network applications: distinguishing
direct targets in gene expression regulatory networks; recognizing
directly interacting amino-acid residues for protein structure
prediction from sequence alignments; and distinguishing
strong collaborations in co-authorship social networks using
connectivity information alone. In addition to its theoretical
impact as a foundational graph theoretic tool, our results suggest
network deconvolution is widely applicable for computing direct
dependencies in network science across diverse disciplines.

Network science has been widely adopted in recent years in diverse
settings, including molecular and cell biology1, social sciences2, information science3, document mining4 and other data mining applications. Networks provide an efficient representation for variable
interdependencies, represented as weighted edges between pairs of
nodes, with the edge weight typically corresponding to the confidence
or the strength of a given relationship. Given a set of observations
relating the values that elements of the network take in different conditions, a network structure is typically inferred by computing the
pairwise correlation, mutual information or other similarity metrics
between each pair of nodes.
The resulting edges include numerous indirect dependencies owing
to transitive effects of correlations. For example, if there is a strong
dependency between nodes 1 and 2, and between nodes 2 and 3 in the
true (direct) network, high correlations will also be visible between
nodes 1 and 3 in the observed (direct and indirect) network, thus
inferring an edge from node 1 to node 3, even though there is no
direct information flow between them (Fig. 1a). Moreover, even if a
true relationship exists between a pair of nodes, its strength may be
over-estimated owing to additional indirect relationships, and distinguishing the convolved direct and indirect contributions is a daunting
task. As the size of networks increases, a very large number of indirect edges may be due to second-order, third-order and higher-order
interactions, resulting in diffusion of the information contained in
the direct network, and leading to inaccurate network structures and
network weights in many applications1,5–11.
Several approaches have been proposed to infer direct dependencies among variables in a network. For example, partial correlations
have been used to characterize conditional relationships among small
sets of variables12–14, and probabilistic approaches, such as maximum
entropy models, have been used to identify informative network
edges10,15,16. Other works use graphical models and message-passing
algorithms to characterize direct information flows in a network17,18,
or variations of Granger causality19 to capture the dynamic relationships among variables20–22. Alternative approaches formulated
the problem of separating direct from indirect dependencies as a
general feature-selection problem23–25, using Bayesian networks26–28,
or using an information-theoretic approach to eliminate indirect
information flow in the network29. These methods are limited to
relatively low-order interaction terms29, or are computationally very
expensive12–14, or are designed for specific applications10,15–17,30,31,
thus limiting their applicability.
In this paper, we formulate the problem of network deconvolution
in a graph-theoretic framework. Our goal is a systematic method for
inferring the direct dependencies in a network, corresponding to true
interactions, and removing the effects of transitive relationships that
result from indirect effects. When the matrix of direct dependencies
is known, all transitive relationships can be computed by summing
this direct matrix and all its powers, corresponding to the transitive
closure of a weighted adjacency matrix, which convolves all direct
and indirect paths at all lengths (Fig. 1b). Given an observed matrix
of correlations that contains both direct and indirect effects, our task
is to recover the original direct matrix that gave rise to the observed
matrix. For a weighted network where edge weights represent the
confidence, mutual information or correlation strength relating two
elements in the network, the inverse problem seeks to recognize the
fraction of the weight of each edge attributable to direct versus indirect contributions, rather than to keep or remove unit-weight edges.
This inverse problem is dramatically harder than the forward problem
of transitive closure, as the original matrix is not known.

We introduce an algorithm for network deconvolution that can efficiently solve the inverse problem of transitive closure of a weighted
adjacency matrix, by use of decomposition principles of eigenvectors
and eigenvalues, and by exploiting the closed form solution of infinite
Taylor series. We demonstrate the effectiveness of this approach and
our algorithm in several large-scale networks from different domains
and with different properties (Supplementary Table 1). First, we seek
to distinguish likely direct targets in gene regulatory networks as a postprocessing step for diverse gene network inference methods, and show
that network deconvolution improves both global and local network
quality. Second, we show the effectiveness of network deconvolution in
distinguishing directly interacting amino-acid residues based on pairwise mutual information data in multispecies protein alignments. Third,
we apply network deconvolution to a social network setting using a coauthorship network that contains solely connectivity information, and
show that the resulting edge weights are able to distinguish strong and
weak ties independently inferred based on the number of joint papers
and additional co-authors. The wide applicability of network deconvolution suggests that such a closed-form solution is not only of important
theoretical use in reversing the effect of matrix transitive closure, but also
of wide practical applicability in a diverse set of real-world networks.

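[Figure 1 panel labels]
(a) True network (Gdir) and observed network (Gobs), related by transitive closure (TC) and network deconvolution (ND); direct effects and indirect effects are marked.
(b) Transitive closure (indirect effects, series closed form): $G_{obs} = G_{dir} + G_{dir}^{2} + G_{dir}^{3} + \cdots = G_{dir}(I - G_{dir})^{-1}$. Network deconvolution: $G_{dir} = G_{obs}(I + G_{obs})^{-1}$.
(c) True network: $G_{dir} = U \Sigma_{dir} U^{-1}$ with $\Sigma_{dir} = \mathrm{diag}(\lambda_1^{dir}, \lambda_2^{dir}, \ldots, \lambda_n^{dir})$. Observed network: $G_{obs} = U \Sigma_{obs} U^{-1}$ with $\Sigma_{obs} = \mathrm{diag}(\lambda_1^{obs}, \lambda_2^{obs}, \ldots, \lambda_n^{obs})$. Network deconvolution: $\lambda_i^{dir} = \lambda_i^{obs} / (1 + \lambda_i^{obs})$.
(d) Direct network; transitive closure (linear, additive); observed network with linear indirect flows; ND; deconvolved network.
(e) Direct network; observed network with nonlinear indirect flows; ND; deconvolved network.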

Figure 1  Network deconvolution overview. (a) Direct edges in a network
(solid blue arrows) can lead to indirect relationships (dashed red arrows)
as a result of transitive information flow. These indirect contributions
can be of length 2 (e.g., 1→2→3), 3 (e.g., 1→2→3→5) or higher, and
can combine both direct and indirect effects (e.g., 2→4), and multiple
indirect effects along varying paths (e.g., 2→3→5, 2→4→5). Self-loops
are excluded from networks. Network deconvolution seeks to reverse the
effect of transitive information flow across all indirect paths in order to
recover the true direct network (blue edges, Gdir) based on the observed
network (combined blue and red edges, Gobs). (b) Algebraically, the
transitive closure of a network can be expressed as an infinite sum of
the true direct network and all indirect effects along paths of increasing
lengths, which can be written in a closed form as an infinite-series sum.
Network deconvolution exploits this closed form to express the direct
network Gdir as a function of the observed network Gobs. (c) To efficiently
compute this inverse operation, we express both the true and observed
networks Gdir and Gobs by decomposition into their eigenvectors and
eigenvalues, which enables each eigenvalue $\lambda_i^{dir}$ of the original network to
be expressed as a nonlinear function of a single corresponding eigenvalue
$\lambda_i^{obs}$ of the convolved observed network. (d,e) Network deconvolution
assumes that indirect flow weights can be approximated as the product
of direct edge weights, and that observed edge weights are the sum of
direct and indirect flows. When these assumptions hold (d), network
deconvolution removes all indirect flow effects and infers all direct
interactions and weights exactly. Even when these assumptions do not
hold (e), network deconvolution infers 87% of direct edges, showing
robustness to nonlinear effects.
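The step from the closed form of the transitive closure to the deconvolution formula described in (b) and (c) can be written out explicitly. This is a short sketch under two stated assumptions that are not spelled out in the caption: the series converges (all eigenvalues of $G_{dir}$ have magnitude below 1), and $I + G_{obs}$ is invertible.

```latex
\begin{aligned}
G_{obs} &= \sum_{k \ge 1} G_{dir}^{k} = G_{dir}\,(I - G_{dir})^{-1} \\
G_{obs}\,(I - G_{dir}) &= G_{dir} \\
G_{obs} &= G_{dir} + G_{obs}\,G_{dir} = (I + G_{obs})\,G_{dir} \\
G_{dir} &= (I + G_{obs})^{-1}\,G_{obs} = G_{obs}\,(I + G_{obs})^{-1}
\end{aligned}
```

The last equality holds because $G_{obs}$ commutes with any function of itself. Applied to a shared eigenvector basis, the same algebra gives $\lambda_i^{obs} = \lambda_i^{dir} / (1 - \lambda_i^{dir})$ and therefore $\lambda_i^{dir} = \lambda_i^{obs} / (1 + \lambda_i^{obs})$.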

[Figure 1 legend, panels d and e (panel e: non-linear effects): direct interactions, correctly recovered (true positives); length-2 indirect interactions (false positives); length n > 2 indirect interactions (false positives); true interactions removed by ND (false negatives).]

RESULTS
Resolving direct and indirect dependencies in a graph
Mathematically, we model the weights of an observed network
Gobs, whose diagonal is set to zero, as the sum of both direct weights
in the true network Gdir, and indirect weights due to indirect paths
of increasing length in $G_{dir}^{2}$, $G_{dir}^{3}$ and others (Fig. 1a). The inverse
problem of inferring the direct network from the observed network
is seemingly intractable, as the direct information has now diffused through the observed network beyond recognition. However,
expressing Gobs as an infinite sum of the exponentially decreasing
contributions of increasingly indirect paths leads to a closed form
solution for Gobs as a function of Gdir using an infinite-series summation (Fig. 1b). Moreover, by decomposing the observed network
into its eigenvalues and eigenvectors, which provide a factorization of
the connectivity matrix into its canonical form, we can express each
eigenvalue of the direct matrix as a function of the corresponding
eigenvalue of the observed matrix (Fig. 1c). This decomposition leads
to a simple closed-form solution for Gdir and provides a framework for
an efficient globally optimal algorithm to deconvolve the contributions of direct and indirect edges given an observed network (Online
Methods and Supplementary Note, 1).
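The forward and inverse operations described in this paragraph can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a hypothetical symmetric 4-node chain with edge weight 0.4 (so the series converges), applies the closed form for the transitive closure, and then recovers the direct matrix by mapping each eigenvalue through λ/(1 + λ). It does not zero the diagonal of the observed matrix or apply any scaling, and the function names are illustrative.

```python
import numpy as np


def transitive_closure(g_dir):
    """Forward model: G_obs = G_dir (I - G_dir)^{-1}.
    Assumes every eigenvalue of g_dir has magnitude below 1."""
    identity = np.eye(g_dir.shape[0])
    return g_dir @ np.linalg.inv(identity - g_dir)


def deconvolve(g_obs):
    """Inverse model for a symmetric g_obs:
    each eigenvalue is mapped to lambda / (1 + lambda)."""
    eigvals, eigvecs = np.linalg.eigh(g_obs)   # symmetric eigendecomposition
    dir_vals = eigvals / (1.0 + eigvals)
    return (eigvecs * dir_vals) @ eigvecs.T    # U diag(dir_vals) U^T


# Hypothetical direct network: chain 1-2-3-4 with weight 0.4 on each edge.
g_dir = np.zeros((4, 4))
for i in range(3):
    g_dir[i, i + 1] = g_dir[i + 1, i] = 0.4

g_obs = transitive_closure(g_dir)   # nodes 1 and 3 now appear linked
g_rec = deconvolve(g_obs)           # indirect weight is removed again
```

In this exact setting `g_obs[0, 2]` is nonzero although nodes 1 and 3 share no direct edge, and `g_rec` matches `g_dir` up to floating-point error. Real inputs violate the model assumptions to some degree, so recovery there is approximate.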
The resulting network deconvolution algorithm can be viewed as
a nonlinear filter over eigenvalues of a locally observed network to
compute global edge significance, removing indirect flow effects for
each eigenvalue by computing the inverse of a Taylor series expansion (Supplementary Fig. 1). This results in the decrease of large positive eigenvalues of the observed dependency matrix that are inflated
owing to indirect effects. The eigenvalue/eigenvector matrix decomposition holds for all symmetric matrices, including all correlation
or information-based matrices, and also for some asymmetric input
matrices, such as those in Supplementary Note, 1.4.1. For nondecomposable matrices, we present an iterative conjugate gradient descent
approach for network deconvolution that converges to a globally optimal solution by convex optimization (Supplementary Note, 1.4.2 and
Supplementary Fig. 2).
Our formulation of network deconvolution has two underlying
modeling assumptions: first that indirect flow weights can be
approximated as the product of direct edge weights, and second, that
observed edge weights are the sum of direct and indirect flows. When
these assumptions hold, network deconvolution provides an exact

Figure 2  Deconvolution of gene regulatory networks. (a) Network deconvolution applied to
the inferred networks of top-scoring methods from DREAM5 leads to consistent improvements
for mutual information (MI) and correlation-based methods (average performance increase,
59%). Network deconvolution also improves other top-scoring methods (11% on average),
including the best-performing method of the DREAM5 challenge (GENIE3), thus
leading to a new overall highest performance. Moreover, the community network obtained by
integrating network predictions from individual methods (1–9) before network deconvolution
is outperformed by the community network based on deconvolved networks by ~22%.
(b) Network motif analysis showing the relative performance of inference methods for cascades
(casc.) and feed-forward loops (FFL) before and after network deconvolution. Red and
blue correspond to increased and decreased prediction accuracy, respectively, of the two
motif types relative to the overall performance of the method before network deconvolution
(measured by AUROC; Supplementary Note, 2.4). The original methods (before network
deconvolution, left side) have different
relative performances for cascades and FFLs, for example, MI–based network inference tends to include feed-forward edges (red arrow), resulting in
higher accuracy for FFLs but lower accuracy for cascades, whereas the opposite is true for the Inferelator and ANOVerence. The deconvolved networks
(after network deconvolution, right side) show significantly higher accuracy (AUROC) for true cascade network motifs for all methods, and moderately
improved accuracy for FFLs on average, showing that network deconvolution successfully eliminates spurious indirect feed-forward edges for true
cascade motifs, without sacrificing accuracy for true FFLs.
[Figure 2 plot labels: panel a shows the overall score, in silico score, E. coli score and S. cerevisiae score before and after ND for MI and correlation methods (CLR, ARACNE, MI, Pearson, Spearman), other inference methods (GENIE3, TIGRESS, Inferelator, ANOVerence) and the community network, together with the average improvement. Panel b shows relative performance (AUROC, −5% to +5%) for casc. and FFL before and after ND; a feed-forward loop (FFL) contains a feed-forward edge from A to C, whereas a cascade (casc.) lacks it.]

Application to gene regulatory networks
We first apply our network deconvolution algorithm to gene regulatory networks, which are pervasively used in molecular biology
to describe regulatory relationships between transcription factors
(regulators) and their target genes1. Regulatory network inference
from high-throughput gene expression data1,6,32, or by integrating
complementary types of data sets33–35, is a well-studied problem in
computational molecular biology26,29,36,37, enabling us to benefit from
available data sets and community efforts for direct method comparisons1,6. Perhaps the largest such comparison is the recently published network inference challenge part of the Dialogue on Reverse
Engineering Assessment and Methods (DREAM) project5.
In the DREAM5 network inference challenge5, different methods
were applied to reconstruct networks for the bacterium Escherichia
coli and the single-celled eukaryote Saccharomyces cerevisiae based on
experimental data sets, and to reconstruct an in silico network based
on simulated data sets (Supplementary Note, 2.1 and Supplementary
Fig. 6). True positive interactions were defined as a set of experimentally validated interactions from the RegulonDB database for E. coli38,
and a high-confidence set of interactions supported by genome-wide
transcription-factor binding data (ChIP-chip) and evolutionarily conserved binding motifs for S. cerevisiae39. All methods were evaluated
using the same four performance evaluation metrics: (i) the area under
the precision-recall curve; (ii) the area under the receiver operating
characteristic curve; (iii) a combined per-network score that utilizes
both previous metrics for each individual network; and (iv) an overall
per-method score that summarizes the combined performance across
all three networks (Online Methods and Supplementary Note, 2.3).
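As an illustration of the first two metrics only, the following minimal sketch computes an area under the ROC curve and a step-wise estimate of the area under the precision-recall curve from a ranked list of predicted edges. It is a generic implementation, not the DREAM5 scoring code; the function names are hypothetical, tied scores are assumed to be absent, and the combined per-network and per-method scores are not reproduced here.

```python
import numpy as np


def auroc(scores, labels):
    """Area under the ROC curve via the rank-sum identity (assumes no tied scores)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    # Rank 1 is the lowest score, rank n the highest.
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(scores, labels):
    """Mean precision at the rank of each true edge (a step-wise AUPR estimate)."""
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float))  # best score first
    hits = labels[order]
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision[hits].mean()
```

Here `scores` would hold the confidence assigned to each candidate edge and `labels` would mark the edges present in the gold standard.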
The DREAM5 challenge provides an ideal benchmark for evaluating
network deconvolution, given the uniform benchmarks for network
reconstruction used, and the participation of many of the research teams
at the forefront of network inference research, with a total of 35 different
prediction methods applied across a wide array of methodologies.
Given that network deconvolution is designed as a way to
eliminate indirect edge weights in mutual information–based and


VOLUME 31  NUMBER 8  AUGUST 2013 nature biotechnology

closed-form solution for completely removing all indirect flow effects
and inferring all direct interactions and weights exactly (Fig. 1d).
We show that network deconvolution performs well even when
these assumptions do not hold, by inclusion of nonlinear effects
through simulations when the direct edges are known (Fig. 1e and
Supplementary Note, 1.3) and by application to diverse real-world
biological and social networks where additional properties can be
independently evaluated. Our Taylor series closed-form solution
assumes that all eigenvalues of the direct dependency matrix are
between –1 and 1, which leads to a geometric decrease in the contributions of indirect paths of increasing lengths (Supplementary Note,
1.2). This assumption can be achieved for any matrix by scaling the
observed input network by a function of the magnitude of its eigenvalues (Supplementary Note, 1.6 and Supplementary Fig. 4).
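The eigenvalue view of this paragraph can be sketched in a few lines. The sketch assumes a symmetric observed matrix equal to the sum of all powers of the direct matrix, so that each observed eigenvalue λ maps to λ/(1 + λ). The scaling step is one possible choice, derived here from the requirement that every direct eigenvalue has magnitude at most β; the function name and the parameter `beta` are hypothetical, and this is not the authors' released code.

```python
import numpy as np


def deconvolve(g_obs, beta=None):
    """Sketch of network deconvolution for a symmetric observed matrix.

    Assumes g_obs = g_dir + g_dir^2 + g_dir^3 + ..., so each eigenvalue
    maps as lam_dir = lam_obs / (1 + lam_obs). If beta is given
    (0 < beta < 1), g_obs is first multiplied by a factor alpha chosen so
    that every resulting direct eigenvalue has magnitude at most beta.
    """
    g_obs = np.asarray(g_obs, dtype=float)
    lam, vecs = np.linalg.eigh(g_obs)  # real eigenpairs of a symmetric matrix

    if beta is not None:
        lam_pos = max(lam.max(), 0.0)   # largest positive eigenvalue
        lam_neg = max(-lam.min(), 0.0)  # magnitude of most negative eigenvalue
        bounds = []
        if lam_pos > 0:
            bounds.append(beta / ((1.0 - beta) * lam_pos))
        if lam_neg > 0:
            bounds.append(beta / ((1.0 + beta) * lam_neg))
        alpha = min(bounds) if bounds else 1.0
        lam = alpha * lam

    lam_dir = lam / (1.0 + lam)        # per-eigenvalue nonlinear filter
    return (vecs * lam_dir) @ vecs.T   # rebuild the matrix from its eigenpairs
```

Without `beta`, the function inverts the series exactly whenever the direct eigenvalues lie strictly between –1 and 1. With `beta`, it can be applied to inputs such as a binary adjacency matrix whose eigenvalues would otherwise violate that condition.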
We also provide a useful generalization of network deconvolution when the observation dependency matrix is itself noisy
(Supplementary Note, 1.5). Although direct dependency weights cannot be recovered exactly from the noisy observations, we show that the
resulting estimates are close to true weights for moderate noise levels
in the input data sets (Supplementary Fig. 3). We also present two
extensions of the network deconvolution algorithm (Supplementary
Note, 1.7) that make it scalable to very large networks: the first exploits
the sparsity of eigenvalues of low rank networks, and the second parallelizes network deconvolution over potentially overlapping subgraphs
of the network (Supplementary Fig. 5).
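One way to read the low-rank extension is that eigenvalues equal to zero are mapped to zero by the filter λ/(1 + λ), so only the eigenpairs with non-negligible eigenvalues need to be computed. The sketch below illustrates that idea under this assumption; the function name is hypothetical, and a full decomposition is used purely for brevity, whereas on a very large network one would substitute an iterative solver that returns only the leading eigenpairs.

```python
import numpy as np


def deconvolve_top_k(g_obs, k):
    """Deconvolve a symmetric matrix using its k largest-magnitude eigenpairs."""
    if k < 1:
        raise ValueError("k must be at least 1")
    lam, vecs = np.linalg.eigh(np.asarray(g_obs, dtype=float))
    keep = np.argsort(np.abs(lam))[-k:]          # indices of the k leading eigenvalues
    lam_k, vecs_k = lam[keep], vecs[:, keep]
    # Discarded eigenvalues are treated as zero, which the filter maps to zero.
    return (vecs_k * (lam_k / (1.0 + lam_k))) @ vecs_k.T
```

For an input of exact rank k the truncated result coincides with the full deconvolution; for approximately low-rank inputs it is an approximation whose quality depends on the discarded eigenvalues.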
We next apply our network deconvolution approach to three settings:
(i) inferring gene regulatory networks, (ii) inferring protein structural
constraints and (iii) inferring weak and strong ties in social networks.


© 2013 Nature America, Inc. All rights reserved.

for feed-forward loops relative to MI, highlighting the difficulty of
distinguishing transitive edges from true feed-forward edges. If network deconvolution can accurately identify spurious indirect edges
but preserve true feed-forward edges, we should expect substantially
increased accuracy for cascades, and no decrease in accuracy for feed-forward edges. Indeed, we found that deconvolved networks lead to
improved prediction accuracy for true cascades for each method, thus
correctly eliminating spurious A → C edges (Fig. 2b). Importantly, the
improved performance on cascades did not lead to an increased error
rate on feed-forward loops, where prediction accuracy remained similar or improved in most deconvolved networks, with the exception
of TIGRESS, which was also the only method where network deconvolution did not lead to an improved overall performance. Taken
together, these results show that network deconvolution effectively
distinguishes direct from indirect edges, improving the predictions of
a wide range of gene regulatory network inference approaches.
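The distinction between the two motifs can be made concrete with a toy example. The sketch below assumes a three-node directed network with a hypothetical edge weight `w`, builds the observed matrix as the sum of all powers of the direct matrix, and then inverts it in closed form. In the cascade the observed A → C weight is purely transitive (w²) and is removed; in the feed-forward loop the observed weight is w + w² and the direct part w is retained.

```python
import numpy as np


def observed_from_direct(g_dir):
    """Sum of all path contributions: g_dir + g_dir^2 + ... = g_dir (I - g_dir)^-1."""
    n = g_dir.shape[0]
    return g_dir @ np.linalg.inv(np.eye(n) - g_dir)


def deconvolve(g_obs):
    """Closed-form inverse of the sum above: g_obs (I + g_obs)^-1."""
    n = g_obs.shape[0]
    return g_obs @ np.linalg.inv(np.eye(n) + g_obs)


w = 0.4  # hypothetical edge weight; rows are sources, columns are targets
cascade = np.array([[0, w, 0], [0, 0, w], [0, 0, 0]], dtype=float)  # A->B, B->C
ffl = np.array([[0, w, w], [0, 0, w], [0, 0, 0]], dtype=float)      # plus A->C

for name, g_dir in [("cascade", cascade), ("FFL", ffl)]:
    g_obs = observed_from_direct(g_dir)
    g_rec = deconvolve(g_obs)
    print(name, "observed A->C:", g_obs[0, 2], "deconvolved A->C:", g_rec[0, 2])
```

Real inferred networks are noisy and do not satisfy the series model exactly, so the clean separation in this toy case should be read as the idealized behavior rather than as a performance claim.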

correlation-based networks, we first applied it to the networks
predicted by the top-scorers of such methods, including CLR37,
ARACNE29 and basic mutual information (relevance networks)40. In
all cases, we found that network deconvolution substantially improved
the performance of each method according to all metrics used and for
all networks tested in DREAM5 (Fig. 2a). The average per-method
score increased by 59%, and the per-network scores increased by 53%,
78% and >300-fold in the in silico, E. coli and S. cerevisiae networks,
respectively (the large S. cerevisiae improvements are due to low
scores for all methods). It is notable that ARACNE, which seeks to
remove transitive edges by studying feed-forward loops directly,
showed a 75% improvement by network deconvolution, indicating
that these indirect effects are not always detectable at the local level
but instead require a global network deconvolution approach. As
information theory methods are among the most widely used network inference approaches5,6, their use in combination with network
deconvolution can be of great general use.
We next applied network deconvolution to other top-performing
inference methods that are not based on mutual information or correlation. These include ANOVerence41, which uses a nonparametric
nonlinear similarity metric between transcription factors and target
genes; GENIE3 (ref. 23), which uses regression and a tree-based ensemble method; TIGRESS42, which uses a sparse regression formulation and
feature selection; and Inferelator32, which uses regression and variable
selection based on expression data. We found that network deconvolution was effective even when applied to these methods, leading
to an overall performance increase of 11% on average using the same
metric. The performance was increased for three of the four methods,
including for the top-performing method (GENIE3), which increased
by 13%. As GENIE3 was the overall top-performing method, this
suggests that the combination of GENIE3 and network deconvolution
provides the new top-performing method, outperforming all other
35 methods that were assessed in the DREAM5 challenge5. We also
applied network deconvolution in combination with the community
prediction method from DREAM5 (ref. 5). We found that community
prediction after network deconvolution showed 22% greater performance than community prediction on the original networks, suggesting
that network deconvolution maintains the complementary aspects of
these networks important in community prediction approaches. We
note that the community prediction approach is not the best predictor
here, with or without network deconvolution, probably owing to the
insufficiently diverse nature of the original networks. Overall, these
results suggest that despite the ability of even the best-performing
methods to recover high-quality networks, strong indirect effects
remain, which can be reduced by use of network deconvolution.
We next studied how network deconvolution affects the prediction
of local network connectivity patterns. We specifically focused on
the ability to correctly predict feed-forward loops, which truly contain
both an indirect A→B→C path and a feed-forward A→C edge, and
regulatory cascades, for which A and C are only connected through
B (Supplementary Note, 2.4). Consistent with previous studies5, we
found that network inference methods tend to perform better on one
or the other network motif, based on their approach for dealing with
indirect information (Fig. 2b). For example, mutual information–
based network inference (MI) is biased toward including feed-forward
edges, leading to increased accuracy for feed-forward loops, but many
spurious transitive edges for cascades, whereas the Inferelator and
ANOVerence are biased toward excluding feed-forward edges, leading
to increased accuracy for cascades but many missing feed-forward edges
in feed-forward loops. Notably, the ARACNE algorithm, which seeks
to directly remove transitive edges, shows a decreased performance

Application to protein structural constraints
We next applied network deconvolution (ND) to infer structural constraints between pairs of
amino acids for protein structure prediction43–45. Prior work used
evolutionary information to reveal pairs of amino acid residues that
are proximal in the three-dimensional protein structure. However, the
pairwise evolutionary correlation matrix may contain many transitive
relationships between pairs of residues7–10,17,31,46–49. For example,
if two amino-acid residues both interact with an intermediate residue, but are not directly interacting with each other, they will show
high mutual information owing to indirect effects. One approach to
remove transitive noise is to use a probabilistic maximum entropy
solution10 that is specifically designed for inferring directly interacting residues15,16,30. Our aim here is to demonstrate the effectiveness of
using network deconvolution as a general method to infer directly
interacting residues over protein contact networks.
As strong clusters of high mutual information have been shown
to hinder identification of directly interacting residues, we reasoned
that network deconvolution may be able to break up these clusters
and reveal directly interacting residues, by distinguishing those correlations that can be explained by transitive relationships. Here, we
build on an approach which uses comparative genomics information
of residue co-variation across evolutionarily diverged species.
We applied network deconvolution to predict contact maps on
fifteen proteins in different folding classes with sizes ranging from
50 to 260 residues15 (Supplementary Table 2). In our input network,
the nodes represent amino acid residues, and each edge between a
pair of residues represents their co-variation across multiple sequence
alignments spanning 2,000–72,000 sequences, quantified by their
mutual information. Applying network deconvolution to a mutual
information network leads to a systematic and substantial increase
in the discovery rate of interacting amino acids, based on nonadjacent amino-acid contact maps for known structures (Fig. 3a and
Supplementary Fig. 7). High mutual information residue pairs contain both physically interacting residues and non-interacting residues,
presumably owing to indirect interactions. Application of network
deconvolution reduces the scores of non-interacting pairs and facilitates distinguishing directly interacting ones (Fig. 3b).
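The input network described above can be sketched as follows. This is a minimal illustration of pairwise mutual information between alignment columns, assuming plain frequency counts with no pseudocounts, sequence weighting or special treatment of gaps; the function name is hypothetical and the sketch is not the pipeline used to produce the reported results.

```python
from collections import Counter
from math import log

import numpy as np


def column_mutual_information(msa):
    """Pairwise mutual information (in nats) between alignment columns.

    msa is a list of equal-length aligned sequences. Frequencies are plain
    counts with no pseudocounts, sequence weighting or gap handling.
    """
    n_seq, n_col = len(msa), len(msa[0])
    columns = [[seq[j] for seq in msa] for j in range(n_col)]
    mi = np.zeros((n_col, n_col))
    for i in range(n_col):
        count_i = Counter(columns[i])
        for j in range(i + 1, n_col):
            count_j = Counter(columns[j])
            count_ij = Counter(zip(columns[i], columns[j]))
            # sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) p(b)))
            value = sum(
                (c / n_seq) * log(c * n_seq / (count_i[a] * count_j[b]))
                for (a, b), c in count_ij.items()
            )
            mi[i, j] = mi[j, i] = value
    return mi
```

The resulting symmetric matrix plays the role of the observed network: nodes are residue positions and edge weights are mutual information values, to which a deconvolution step can then be applied.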
We also applied network deconvolution to a weighted interaction network based on direct information15. Although using network deconvolution over direct information led to a small improvement over the top
predictions, which was especially consistent for non-redundant interacting pairs (Supplementary Fig. 7b), a robust performance assessment requires comparison of predicted protein 3D structures, which
is beyond the scope of this study (Supplementary Figs. 8–10).


[Figure 3 graphic. (a) Residue contact maps for 15 proteins (PDB IDs 1hzx, 3tgi, 5p21, 1f21, 1e6k, 1bkr, 2it6, 2o72, 1r9h, 1odd, 1wvn, 5pti, 2hda, 1g2e and 1rqm); both axes: amino acid position; legend: known contacts, MI, MI+ND, DI, DI+ND, ND advantage, ND disadvantage. (b) Cumulative distribution versus edge weights for interacting and noninteracting pairs, before ND (3% discrimination) and after ND (15% discrimination).]

Figure 3  Application to protein structure prediction. (a) Applying network deconvolution to predict experimentally determined residue contacts (gray
dots) based on amino-acid sequence alignments on 15 proteins in different folding classes with sizes ranging from 50 to 260 residues in human. We
applied network deconvolution to networks derived by mutual information (MI) and direct information15 (DI). Each plot shows the full residue contact
map twice, with the lower left triangle showing network deconvolution (ND) applied to MI, and the upper right triangle showing ND applied to DI. Arrows
highlight distinct residue interactions captured by each method, showing the improvement over both MI and DI. (b) Cumulative distributions of graph
weights for interacting and noninteracting amino acid pairs, for both MI (blue) and network deconvolution (ND, red), for all proteins except 1hzx, for which all
methods have very low performance prior to network deconvolution, possibly owing to its poor sequence alignment. Network deconvolution assigns higher
weights to true-positive edges and lower weights to false negatives, leading to fivefold higher discrimination between true contacts and indirect ones for
the 10% of edges with highest scores.

Application to co-authorship collaboration relationships
We next applied our network deconvolution approach to a social
network of co-authorship information50 to distinguish strong and
weak collaborations, which can play different key roles in social networks11,51–53. Given the recent surge of social networks like Facebook
or ResearchGate, recognizing weak and strong ties is increasingly
important for recommending friends or colleagues, recognizing
conflicts of interest or evaluating an author’s contribution to a team.
Previous approaches have defined strong ties using shared indirect
contacts54, edges that increase network distance upon removal or
edges connecting nodes within the same module52. In co-authorship
networks, strong ties have been defined by using additional information beyond network connectivity (Supplementary Note, 4),
including the number of co-authored papers and the number of other
co-authors of these papers50,55.
We used an unweighted input network of 1,589 scientists working in the field of network science55, in which two authors are
connected by an edge if they have co-authored at least one paper.
We then applied our network deconvolution approach directly on the
edges provided by the co-authorship network, to recognize whether
network connectivity information alone is sufficient to capture additional information about strong and weak ties previously computed

on the same network. Our assumption is that edges resulting from
indirect paths likely correspond to weak collaborations, diluted over
many other co-authors, whereas edges with low indirect contributions are more likely to correspond to meaningful collaborations.
Application of network deconvolution to this unit-weight network
led to a weighted network whose transitive closure most closely captures the input network information, and whose weights represent
the inferred strength of likely direct interactions. We then ranked all
co-authorship edges according to the weight assigned to each by the
network deconvolution approach.
We found that the resulting edge weights indeed capture co-authorship tie strengths previously computed by summing the number
of co-authored papers and down-weighting each paper by the number
of additional co-authors55. We defined true strong ties as edges with
Newman’s weight ≥ 0.5 (36% of edges), a weight that incorporates additional
publication information, and predicted strong ties as the same fraction of edges ranked highest by
network deconvolution weight (network deconvolution weight ≥ 0.64). We found that network deconvolution correctly recovered 77% of strong co-authorship ties solely by use
of the network topology, demonstrating that additional information
about collaboration strength lies within network connectivity information, and that network deconvolution is very well-suited for discovering

[Figure 4 graphic. (a) Co-authorship network with edges classified as strong correct, strong incorrect, weak correct or weak incorrect. (b) Scatter plot of ND predicted collaboration strength (vertical axis, 0 to 1.0) against true collaboration strength (horizontal axis, 0 to 1.0), with regions labeled strong correct, strong incorrect, weak correct and weak incorrect.]

Figure 4  Application to co-authorship social network. (a) Use of network deconvolution to distinguish strong ties from weak ties in the largest
connected component of a co-authorship network containing 379 authors. True collaboration strengths were computed by summing the number of
co-authored papers and down-weighting each paper by the number of additional co-authors. Network deconvolution only had access to unweighted
co-authorship edges, but by exploiting transitive relationships to down-weight weak ties it yielded 77% accurate predictions (solid lines) and only 23%
inaccurate predictions (dashed lines), demonstrating that this information lies within the network edges, and that network deconvolution is well-suited for discovering it. (b) Beyond the binary classification of strong and weak ties, we found a strong correlation (R² = 0.76) across all 2,742 edges
connecting 1,589 authors, between the weights assigned by network deconvolution (ND) and the true collaboration strengths obtained using additional
publication details.

it (Fig. 4a). Beyond the binary classification of edges into strong and
weak, we found a strong overall agreement between the rank obtained
by the true collaboration strength and the rank provided by the network deconvolution weight (correlation coefficient R² = 0.76, Fig. 4b).
The exception was a population of edges that had strong collaboration scores but weak network deconvolution weights, likely due to the
number of co-authored publications that factors in the collaboration
score but is not available in the network deconvolution network input.
Indeed, collaborators connected by a strong edge that were incorrectly
predicted by network deconvolution had on average co-authored sixfold more papers per author than collaborators correctly predicted as
weak, suggesting a very strong additional bias beyond the information
provided by the topology. With the widespread availability of social
networks and the current interest in predicting strong and weak social
ties, we expect that network deconvolution will be widely useful in
many social network applications beyond co-authorship.
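The matched-fraction evaluation of strong ties described in this section can be sketched as follows. The sketch assumes two aligned arrays holding, for each co-authorship edge, the true collaboration strength and the predicted weight; the function and argument names are hypothetical. True strong ties are edges at or above a cutoff, and the same number of edges is taken from the top of the predicted ranking, which is equivalent to choosing the predicted threshold that selects the same fraction of edges.

```python
import numpy as np


def strong_tie_recall(true_strength, predicted_weight, cutoff=0.5):
    """Fraction of true strong ties found among the top-ranked predictions.

    True strong ties are edges with true_strength >= cutoff. The same
    number of edges is then taken from the top of the predicted ranking.
    Ties in predicted_weight are broken arbitrarily by the sort.
    """
    true_strength = np.asarray(true_strength, dtype=float)
    predicted_weight = np.asarray(predicted_weight, dtype=float)
    strong = true_strength >= cutoff
    k = int(strong.sum())
    if k == 0:
        raise ValueError("no edge reaches the strong-tie cutoff")
    top_k = np.argsort(-predicted_weight)[:k]  # k highest predicted weights
    return strong[top_k].mean()
```

Because the number of predicted strong ties equals the number of true strong ties, the returned fraction is simultaneously the precision and the recall of the strong-tie prediction.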
DISCUSSION
Network deconvolution provides a general framework for computing direct dependencies in a network by use of observed similarities.
It can recognize and remove spurious transitive edges due to indirect
effects, decrease edge weights that are overestimated owing to indirect relationships, and assign edge weights corresponding to direct
dependencies to the remaining edges. Thereby, network deconvolution can improve the quality of a broad range of observed networks
that are tainted by indirect edge weights because of transitive effects.
We introduced an efficient and scalable algorithm for deconvolving an
observed network based on a nonlinear filter computing the inverse of
a Taylor series expansion over each eigenvalue. We demonstrated that

network deconvolution is effective for gene regulatory network inference, protein contact prediction based on protein sequence alignment
and inference of collaboration strength from co-authorship social
networks. In each case, even though we did not use domain-specific
knowledge, network deconvolution was effective, illustrating the
generality and wide applicability of the approach.
The problem of indirect spurious edges has been widely recognized
in network inference, but characterized mostly at the local level. In
particular, even top-performing network inference methods have
been shown to contain many false transitive edges in cascade network
motifs, and efforts to remedy this situation lead to incorrect removal
of true edges in feed-forward loops5. At this local level, we have shown
that network deconvolution has the ability to correctly remove spurious transitive edges in true cascade network motifs, while maintaining
true feed-forward edges in feed-forward network motifs. In contrast
to previous methods that make well-documented tradeoffs in sensitivity versus specificity for these transitive edges5, network deconvolution reduces the number of false positives on indirect interactions,
while maintaining true positives in feed-forward loops.
However, network deconvolution has a much broader effect
than simply removing local indirect edges. In contrast to previous
approaches that study local patterns of dependencies to recognize
potential indirect edges, network deconvolution takes a global
approach by directly inverting the transitive closure of the true network. Previous algorithms29 have sought local approximations to the
removal of indirect effects, and these have been restricted to short indirect paths
(typically of length 2), owing to the computational complexity of enumerating and evaluating all higher-order
paths, and the lack of a systematic way to compute their combined
effects. By exploiting eigenvector decomposition and Taylor series
closed form solutions, network deconvolution provides four advantages over local approaches: (i) it leads to a much more computationally efficient solution, (ii) it has the power to remove indirect effects
over paths with arbitrary lengths, (iii) it can remove the combined
effects of arbitrarily many indirect paths between two nodes and
(iv) it eliminates the need for iterative network refinement. These
advantages are due to the fact that network deconvolution is essentially a single global operation to subtract the transitive effects of all
powers of an adjacency matrix, rather than testing only pairwise
relationships or small network motifs one at a time.
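The single global operation can be written out explicitly. The derivation below is a sketch under the linear assumption that the observed matrix is the sum of all powers of the direct matrix and that the series converges; the symbols G_obs, G_dir and λ_i are notation introduced here for illustration.

```latex
\begin{align}
G_{\mathrm{obs}} &= G_{\mathrm{dir}} + G_{\mathrm{dir}}^{2} + G_{\mathrm{dir}}^{3} + \cdots
  = G_{\mathrm{dir}}\,(I - G_{\mathrm{dir}})^{-1}, \\
G_{\mathrm{dir}} &= G_{\mathrm{obs}}\,(I + G_{\mathrm{obs}})^{-1}, \\
\lambda_{i}^{\mathrm{dir}} &= \frac{\lambda_{i}^{\mathrm{obs}}}{1 + \lambda_{i}^{\mathrm{obs}}}.
\end{align}
```

The second line follows from the first by multiplying both sides by (I − G_dir) and collecting terms in G_dir; the third line is the same relation expressed on each eigenvalue, because G_obs and G_dir share eigenvectors.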
Moreover, we showed that network deconvolution can be applied to
networks with very different properties. The networks used here were
of different size, density, clustering coefficient or network centrality,
showing that network deconvolution is robust to these parameters.
The input networks were also based on different properties, including
mutual information and correlation that network deconvolution was
designed for, but also networks based on regression, tree-based ensemble methods, feature selection approaches and other nonlinear similarity metrics. We also applied network deconvolution to both weighted
and unweighted networks, and used the results both for reweighting
of edges and for edge classification, demonstrating the discrete and
continuous applications of the approach. More generally, network
deconvolution is not just about edge inclusion or removal, but about
probabilistic weighting of individual edges to reveal direct interactions
based on observed relationships across the complete network.
We believe that the network deconvolution algorithm introduced
here will serve as a foundational graph theoretic tool for computing
direct dependencies in many problems in network science and other
fields. Although the forward problem of repeated matrix multiplication,
also known as network convolution or matrix interpolation in applied
fields, has been a key graph theoretical tool, the inverse problem has
received relatively little attention. Matrix interpolation has been used
in protein-protein interaction networks to propagate functional information through the network56, in movies and shopping applications to
make recommendations for users based on previous actions57 and in
social networks to make friend recommendations. We similarly expect
network deconvolution to lead to a rich set of applications in network
science, molecular and cell biology and many other fields.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. De Smet, R. & Marchal, K. Advantages and limitations of current network inference
methods. Nat. Rev. Microbiol. 8, 717–729 (2010).
2. Newman, M.E.J. The structure and function of complex networks. SIAM Rev. 45,
167–256 (2003).
3. Koetter, R. & Médard, M. An algebraic approach to network coding. IEEE/ACM
Trans. Netw. 11, 782–795 (2003).
4. Witten, I.H., Frank, E. & Hall, M.A. Data Mining: Practical Machine Learning Tools
and Techniques (Morgan Kaufmann, 2011).
5. Marbach, D. et al. Wisdom of crowds for robust gene network inference.
Nat. Methods 9, 796–804 (2012).
6. Marbach, D. et al. Revealing strengths and weaknesses of methods for gene network
inference. Proc. Natl. Acad. Sci. USA 107, 6286–6291 (2010).
7. Dunn, S.D., Wahl, L.M. & Gloor, G.B. Mutual information without the influence
of phylogeny or entropy dramatically improves residue contact prediction.
Bioinformatics 24, 333–340 (2008).
8. Burger, L. & van Nimwegen, E. Disentangling direct from indirect co-evolution of
residues in protein alignments. PLoS Comput. Biol. 6, e1000633 (2010).
9. Giraud, B.G., Heumann, J.M. & Lapedes, A.S. Superadditive correlation. Phys.
Rev. 59, 4983–4991 (1999).
10. Lapedes, A.S., Giraud, B.G., Liu, L. & Stormo, G.D. Correlated mutations in models
of protein sequences: phylogenetic and structural effects. IMS Lecture Notes-Monograph Series 33, 236–256 (1999).
11. Friedkin, N.E. Information flow through strong and weak ties in intra-organizational
social networks. Soc. Networks 3, 273–285 (1982).
12. de la Fuente, A., Bing, N., Hoeschele, I. & Mendes, P. Discovery of meaningful
associations in genomic data using partial correlation coefficients. Bioinformatics
20, 3565–3574 (2004).
13. Hemelrijk, C.K. A matrix partial correlation test used in investigations of reciprocity
and other social interaction patterns at group level. J. Theor. Biol. 143, 405–420
(1990).
14. Veiga, D.F.T., Vicente, F.F.R., Grivet, M., De la Fuente, A. & Vasconcelos, A.T.R.
Genome-wide partial correlation analysis of Escherichia coli microarray data.
Genet Mol. Res. 6, 730–742 (2007).
15. Marks, D.S. et al. Protein 3D structure computed from evolutionary sequence
variation. PLoS ONE 6, e28766 (2011).
16. Hopf, T.A. et al. Three-dimensional structures of membrane proteins from genomic
sequencing. Cell 149, 1607–1621 (2012).
17. Weigt, M., White, R.A., Szurmant, H., Hoch, J.A. & Hwa, T. Identification of direct
residue contacts in protein-protein interaction by message passing. Proc. Natl.
Acad. Sci. USA 106, 67–72 (2009).
18. Wainwright, M.J. & Jordan, M.I. Graphical models, exponential families, and
variational inference. Found. Trends Mach. Learn. 1, 1–305 (2008).
19. Seth, A. Granger causality. Scholarpedia 2, 1667 (2007).
20. Quinn, C.J., Coleman, T.P., Kiyavash, N. & Hatsopoulos, N.G. Estimating the
directed information to infer causal relationships in ensemble neural spike train
recordings. J. Comput. Neurosci. 30, 17–44 (2011).
21. Ding, M., Truccolo, W.A. & Bressler, S.L. Evaluating causal relations in neural
systems: Granger causality, directed transfer function and statistical assessment of
significance. Biol. Cybern. 157, 145–157 (2001).
22. Pearl, J. Causality: Models, Reasoning, and Inference (Cambridge Univ Press,
2000).
23. Huynh-Thu, V.A., Irrthum, A., Wehenkel, L. & Geurts, P. Inferring regulatory
networks from expression data using tree-based methods. PLoS ONE 5, e12776
(2010).
24. Meinshausen, N. & Bühlmann, P. High dimensional graphs and variable selection
with the Lasso. Ann. Stat. 34, 1436–1462 (2006).
25. Pinna, A., Soranzo, N. & de la Fuente, A. From knockouts to networks: establishing
direct cause-effect relationships through graph analysis. PLoS ONE 5, e12912
(2010).
26. Friedman, N., Linial, M., Nachman, I. & Pe’er, D. Using Bayesian networks to
analyze expression data. J. Comput. Biol. 7, 601–620 (2000).
27. Friedman, N. Inferring cellular networks using probabilistic graphical models.
Science 303, 799–805 (2004).
28. Hartemink, A., Gifford, D., Jaakkola, T.S. & Young, R.A. Using graphical models
and genomic expression to statistically validate models of genetic regulatory
networks. Pac. Symp. Biocomput. 6, 422–433 (2001).
29. Margolin, A.A. et al. ARACNE: an algorithm for the reconstruction of gene
regulatory networks in a mammalian cellular context. BMC Bioinformatics 7, S7
(2006).
30. Marks, D.S., Hopf, T.A. & Sander, C. Protein structure prediction from sequence
variation. Nat. Biotechnol. 30, 1072–1080 (2012).
31. Jones, D., Buchan, D., Cozzetto, D. & Pontil, M. PSICOV: precise structural contact
prediction using sparse inverse covariance estimation on large multiple sequence
alignments. Bioinformatics 28, 184–190 (2012).
32. Bonneau, R. et al. The Inferelator: an algorithm for learning parsimonious
regulatory networks from systems-biology data sets de novo. Genome Biol. 7, R36
(2006).
33. Bar-Joseph, Z. et al. Computational discovery of gene modules and regulatory
networks. Nat. Biotechnol. 21, 1337–1342 (2003).


Methods
Methods and any associated references are available in the online
version of the paper.
Accession codes. All code and data sets are available at <http://compbio.mit.edu/nd> and in Supplementary Data.
Note: Supplementary information is available in the online version of the paper.
Acknowledgments
We thank B. Holmes for suggestions on the initial aspects of this work, M. Bansal
on the protein structural constraints, Y. Liu for initial analysis of DREAM5
networks and R. Küffner for discussions and code used for network motif analysis.
The work was supported by US National Institutes of Health grants R01 HG004037
and HG005639 to M.K., a Swiss National Science Foundation fellowship to
D.M. and National Science Foundation CAREER award 0644282 to M.K.
Author contributions
S.F. and M.K. developed the method, analyzed results and wrote the paper.
D.M. contributed to DREAM5 data sets, gene network inference and network
motif analysis. M.M. contributed to correctness proof and robustness analysis.

34. Reiss, D.J., Baliga, N.S. & Bonneau, R. Integrated biclustering of heterogeneous
genome-wide datasets for the inference of global regulatory networks. BMC
Bioinformatics 7, 280 (2006).
35. Greenfield, A., Madar, A., Ostrer, H. & Bonneau, R. DREAM4: Combining genetic
and dynamic information to identify biological networks and dynamical models.
PLoS ONE 5, e13397 (2010).
36. di Bernardo, D. et al. Chemogenomic profiling on a genome-wide scale using reverse-engineered gene networks. Nat. Biotechnol. 23, 377–383 (2005).
37. Faith, J.J. et al. Large-scale mapping and validation of Escherichia coli
transcriptional regulation from a compendium of expression profiles. PLoS Biol.
5, e8 (2007).
38. Gama-Castro, S. et al. RegulonDB version 7.0: transcriptional regulation of
Escherichia coli K-12 integrated within genetic sensory response units (Gensor
Units). Nucleic Acids Res. 39, D98–D105 (2011).
39. MacIsaac, K.D. et al. An improved map of conserved regulatory sites for
Saccharomyces cerevisiae. BMC Bioinformatics 7, 113 (2006).
40. Butte, A.J. & Kohane, I.S. Mutual information relevance networks: functional
genomic clustering using pairwise entropy measurements. Pac. Symp. Biocomput.
426, 418–429 (2000).
41. Küffner, R., Petri, T., Tavakkolkhah, P., Windhager, L. & Zimmer, R.
Inferring gene regulatory networks by ANOVA. Bioinformatics 28, 1376–1382
(2012).
42. Haury, A.C., Mordelet, F., Vera-Licona, P. & Vert, J.P. TIGRESS: trustful
inference of gene regulation using stability selection. BMC Syst. Biol. 6, 145
(2012).
43. Altschuh, D., Lesk, A., Bloomer, A. & Klug, A. Correlation of co-ordinated amino
acid substitutions with function in viruses related to tobacco mosaic virus. J. Mol.
Biol. 193, 693–707 (1987).
44. Göbel, U., Sander, C., Schneider, R. & Valencia, A. Correlated mutations and residue
contacts in proteins. Proteins 18, 309–317 (1994).
45. Neher, E. How frequent are correlated changes in families of protein sequences?
Proc. Natl. Acad. Sci. USA 91, 98–102 (1994).

46. Nugent, T. & Jones, D.T. Accurate de novo structure prediction of large
transmembrane protein domains using fragment-assembly and correlated mutation
analysis. Proc. Natl. Acad. Sci. USA 109, E1540–E1547 (2012).
47. Morcos, F. et al. Direct-coupling analysis of residue coevolution captures
native contacts across many protein families. Proc. Natl. Acad. Sci. USA 108,
E1293–E1301 (2011).
48. Lapedes, A., Giraud, B. & Jarzynski, C. Using sequence alignments to predict protein
structure and stability with high accuracy. Preprint at ⟨http://arXiv.org/abs/1207.2484⟩
(2012).
49. Ekeberg, M., Lövkvist, C., Lan, Y., Weigt, M. & Aurell, E. Improved contact prediction
in proteins: using pseudolikelihoods to infer Potts models. Phys. Rev. E 87, 012707
(2013).
50. Newman, M.E.J. Finding community structure in networks using the eigenvectors
of matrices. Phys. Rev. E 74, 036104 (2006).
51. Granovetter, M. The strength of weak ties: a network theory revisited. Sociol.
Theory 1, 201–233 (1983).
52. Ferrara, E., De Meo, P., Fiumara, G. & Provetti, A. The role of strong and weak ties
in Facebook: a community structure perspective. Preprint at ⟨http://arXiv.org/abs/1203.0535⟩
(2012).
53. Tang, J., Sun, J., Wang, C. & Yang, Z. Social influence analysis in large-scale
networks. Proceedings of the 15th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining KDD ’09, 807–816 ⟨doi:10.1145/1557019.1557108⟩
(2009).
54. Shi, X. Networks of strong ties. Physica A 378, 33–47 (2007).
55. Newman, M.E.J. Scientific collaboration networks. II. Shortest paths, weighted
networks, and centrality. Phys. Rev. E 64, 016132 (2001).
56. Sharan, R., Ulitsky, I. & Shamir, R. Network-based prediction of protein function.
Mol. Syst. Biol. 3, 88 (2007).
57. Song, X., Tseng, B.L., Lin, C.-Y. & Sun, M.-T. Personalized recommendation driven
by information flow. Proceedings of the 29th Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval SIGIR ’06,
509–516 ⟨doi:10.1145/1148170.1148258⟩ (2006).

nature biotechnology VOLUME 31  NUMBER 8  AUGUST 2013


ONLINE METHODS

Network deconvolution. The network deconvolution framework is outlined in
Figure 1 (full description in Supplementary Note, 1). A perennial challenge
in inferring networks is that observed similarity weights are the sum of both
direct and indirect relationships. A direct information flow modeled by an
edge in Gdir can give rise to indirect flows of second or higher order. Such indirect
flows are captured in Gindir:

$$ G_{\mathrm{indir}} = G_{\mathrm{dir}}^{2} + G_{\mathrm{dir}}^{3} + \cdots $$

where the power associated with each term in Gindir corresponds to the number
of edges in the indirect paths. Together, Gdir + Gindir capture both direct and indirect dependencies, which make up the observed dependencies. Note
that the observed dependency matrix is linearly scaled so that the largest
absolute eigenvalue of Gdir is less than 1. Therefore, the effects of indirect information
flows decrease exponentially with the length of indirect paths (Supplementary
Notes, 1.2 and 1.6). Self-loops of the observed dependency network are excluded
by setting its diagonal components to zero.
Suppose Gobs represents the matrix of observed dependencies: a properly
scaled similarity matrix between variables (nodes in the network). Gobs can
be derived from different pairwise similarity metrics, such as correlation or mutual information, and is scaled linearly on the basis of the largest absolute eigenvalue of the unscaled similarity matrix. The observed dependency
matrix captures both direct and indirect effects; that is, Gobs = Gdir + Gindir.
Note that the indirect dependency matrix, Gindir, is itself a function of the
unknown Gdir. The main question is how to compute Gdir from the tainted
observed similarities Gobs.
Although Gindir may at first appear intractable because it is an infinite sum,
one may note that, similarly to Taylor series expansions, under mild conditions
(Supplementary Notes, 1.1 and 1.2) that are generally present in the setting
that we consider, we have:
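$$ G_{\mathrm{obs}} = G_{\mathrm{dir}} + G_{\mathrm{indir}} = G_{\mathrm{dir}} \left( I + G_{\mathrm{dir}} + G_{\mathrm{dir}}^{2} + \cdots \right) = G_{\mathrm{dir}} \left( I - G_{\mathrm{dir}} \right)^{-1}. $$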

The above observation leads to a simple closed-form expression for
Gdir (Fig. 1b):
$$ G_{\mathrm{dir}} = G_{\mathrm{obs}} \left( I + G_{\mathrm{obs}} \right)^{-1}. $$
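The intermediate algebra is not spelled out here, so the following is a short reconstruction of how the closed form can be obtained from the series identity. It assumes that I − Gdir is invertible, which holds when every eigenvalue of Gdir has absolute value below 1; under that same condition I + Gobs is invertible as well.

```latex
\begin{aligned}
G_{\mathrm{obs}} &= G_{\mathrm{dir}} (I - G_{\mathrm{dir}})^{-1} \\
G_{\mathrm{obs}} (I - G_{\mathrm{dir}}) &= G_{\mathrm{dir}} \\
G_{\mathrm{obs}} &= G_{\mathrm{dir}} + G_{\mathrm{obs}} G_{\mathrm{dir}} = (I + G_{\mathrm{obs}})\, G_{\mathrm{dir}} \\
G_{\mathrm{dir}} &= (I + G_{\mathrm{obs}})^{-1} G_{\mathrm{obs}} = G_{\mathrm{obs}} (I + G_{\mathrm{obs}})^{-1}
\end{aligned}
```

The last equality uses the fact that a matrix commutes with the inverse of the identity plus itself.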
For symmetric input matrices and some asymmetric ones, we show that the
observed dependency matrix Gobs can be decomposed into its eigenvalues and
eigenvectors (Supplementary Note, 1.4). Let U and Σobs denote the matrix
of eigenvectors and the diagonal matrix of eigenvalues of Gobs, respectively. The i-th
diagonal component of Σobs is the i-th eigenvalue $\lambda_i^{\mathrm{obs}}$ of
the observed dependency matrix Gobs. Then, by the eigendecomposition
principle, we have $G_{\mathrm{obs}} = U \Sigma_{\mathrm{obs}} U^{-1}$.
In this framework, the direct dependencies can be computed optimally
in the following steps, which form the core of the
proposed network deconvolution algorithm (Fig. 1c):
Step 1 (decomposition step). Decompose the observed dependency matrix
Gobs into its eigenvalues and eigenvectors such that $G_{\mathrm{obs}} = U \Sigma_{\mathrm{obs}} U^{-1}$.
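Step 2 (deconvolution step). Form a diagonal matrix Σdir whose i-th
diagonal component is
$$ \lambda_i^{\mathrm{dir}} = \frac{\lambda_i^{\mathrm{obs}}}{1 + \lambda_i^{\mathrm{obs}}}. $$
Then, the output direct dependency matrix is $G_{\mathrm{dir}} = U \Sigma_{\mathrm{dir}} U^{-1}$.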
We show that this algorithm finds a globally optimal direct dependency
matrix without error (Supplementary Note, 1.2).
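The two steps above can be sketched in a few lines of NumPy for the symmetric case. This is a minimal illustration, not the authors' released code. The function names and the `beta` parameter are assumptions introduced here: `beta` is the largest absolute eigenvalue allowed for Gdir, and the scaling factor is chosen so that the mapping λ → λ/(1 + λ) stays within that bound for both the largest positive and the most negative eigenvalue.

```python
import numpy as np


def scale_observed(similarity, beta=0.9):
    """Zero the diagonal and rescale a symmetric similarity matrix.

    beta (hypothetical parameter, 0 < beta < 1) bounds the largest absolute
    eigenvalue of the deconvolved matrix.
    """
    g = np.array(similarity, dtype=float)
    np.fill_diagonal(g, 0.0)  # exclude self-loops
    lam = np.linalg.eigvalsh(g)
    limits = []
    if lam.max() > 0:
        # alpha*lam/(1 + alpha*lam) <= beta for the largest positive eigenvalue
        limits.append(beta / ((1.0 - beta) * lam.max()))
    if lam.min() < 0:
        # |alpha*lam/(1 + alpha*lam)| <= beta for the most negative eigenvalue
        limits.append(beta / ((1.0 + beta) * -lam.min()))
    return g * min(limits) if limits else g


def deconvolve(g_obs):
    """Steps 1 and 2: eigendecompose, map each eigenvalue, recompose."""
    lam_obs, u = np.linalg.eigh(g_obs)       # g_obs = u @ diag(lam_obs) @ u.T
    lam_dir = lam_obs / (1.0 + lam_obs)      # deconvolution step
    return (u * lam_dir) @ u.T               # scales column i of u by lam_dir[i]


# Toy check: build observed dependencies from a known direct network.
g_dir_true = np.array([[0.0, 0.4, 0.0],
                       [0.4, 0.0, 0.3],
                       [0.0, 0.3, 0.0]])
g_obs = g_dir_true @ np.linalg.inv(np.eye(3) - g_dir_true)
g_dir_est = deconvolve(g_obs)
```

In the toy check the direct network is a three-node chain whose eigenvalues lie within (−1, 1), so `deconvolve` should return the original matrix up to floating-point error and agree with the closed form Gobs(I + Gobs)−1. For symmetric input the eigenvector matrix is orthogonal, which is why the sketch uses the transpose in place of the inverse.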
Performance metrics for gene regulatory networks. A detailed description
of gene regulatory network performance metrics is given in Supplementary
Note, 2.3. Network predictions were evaluated as binary classification tasks
in which edges were predicted to be present or absent. Standard performance metrics from machine learning were then used: precision-recall (PR) and
receiver operating characteristic (ROC) curves. As in DREAM5 (ref. 5),
only the top 100,000 edge predictions were accepted. The areas under the ROC and PR curves (AUROC and
AUPR) were then separately transformed into P-values by simulating a null distribution for 25,000 random networks. To compute an overall score that summarizes the performance over the three networks with available gold standards
(E. coli, S. cerevisiae and in silico), we used the same metric as in the DREAM5
project, which is defined as the mean of the (log-transformed) network-specific P-values:
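$$ \mathrm{ROC}_{\mathrm{score}} = \frac{1}{3} \sum_{i=1}^{3} -\log_{10}\left( p_{\mathrm{ROC}_i} \right) $$

$$ \mathrm{PR}_{\mathrm{score}} = \frac{1}{3} \sum_{i=1}^{3} -\log_{10}\left( p_{\mathrm{PR}_i} \right) $$

$$ \mathrm{score} = \left( \mathrm{ROC}_{\mathrm{score}} + \mathrm{PR}_{\mathrm{score}} \right) / 2 $$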
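As a small worked illustration of the overall score, the sketch below averages the negative log10 P-values across networks and then averages the ROC and PR scores. The function name and the P-values are invented for the example and are not results from the paper.

```python
import math


def overall_score(p_roc, p_pr):
    """Return (ROC score, PR score, overall score) from per-network P-values."""
    roc_score = sum(-math.log10(p) for p in p_roc) / len(p_roc)
    pr_score = sum(-math.log10(p) for p in p_pr) / len(p_pr)
    return roc_score, pr_score, (roc_score + pr_score) / 2.0


# Hypothetical P-values for three networks (for example E. coli, S. cerevisiae, in silico).
roc_score, pr_score, score = overall_score(
    p_roc=[1e-2, 1e-4, 1e-6],
    p_pr=[1e-1, 1e-2, 1e-3],
)
```

With these made-up inputs the ROC score is the mean of 2, 4 and 6, the PR score is the mean of 1, 2 and 3, and the overall score is their average.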



doi:10.1038/nbt.2635

Articles

Production of omega-3 eicosapentaenoic acid by
metabolic engineering of Yarrowia lipolytica


Zhixiong Xue1, Pamela L Sharpe2, Seung-Pyo Hong2, Narendra S Yadav1, Dongming Xie2, David R Short1,
Howard G Damude3, Ross A Rupert2, John E Seip2, Jamie Wang2, Dana W Pollak2, Michael W Bostick2,
Melissa D Bosak2, Daniel J Macool1, Dieter H Hollerbach2, Hongxiang Zhang1, Dennis M Arcilla2,
Sidney A Bledsoe1, Kevin Croker2, Elizabeth F McCord4, Bjorn D Tyreus2, Ethel N Jackson2 & Quinn Zhu2
The availability of the omega-3 fatty acids eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) is currently limited
because they are produced mainly by marine fisheries that cannot keep pace with the demands of the growing market for these
products. A sustainable non-animal source of EPA and DHA is needed. Metabolic engineering of the oleaginous yeast Yarrowia
lipolytica resulted in a strain that produced EPA at 15% of dry cell weight. The engineered yeast lipid comprises EPA at 56.6%
and saturated fatty acids at less than 5% by weight, which are the highest and the lowest percentages, respectively, among known
EPA sources. Inactivation of the peroxisome biogenesis gene PEX10 was crucial in obtaining high EPA yields and may increase
the yields of other commercially desirable lipid-related products. This technology platform enables the production of lipids with
tailored fatty acid compositions and provides a sustainable source of EPA.
Omega-3 long-chain polyunsaturated fatty acids (LCPUFAs), which
include EPA (C20:5n-3) and DHA (C22:6n-3), are natural products that are thought to be essential for human health. In addition
to their use as health supplements, EPA and DHA are also used in
pharmaceutical, aquaculture, terrestrial animal feed, pet food and
personal care products. The demand for EPA and DHA is growing, but most
commercially available EPA and DHA are produced using wild-caught
ocean fish. We report the metabolic engineering of Y. lipolytica to
provide a commercially viable, sustainable, land-based source of EPA
and other valuable LCPUFAs.
EPA and DHA have crucial roles in the structure and function of
cellular membranes and are precursors to several important eicosanoids, including prostacyclins, leukotrienes and prostaglandins1,2.
Human clinical studies have documented the health benefits of
omega-3 LCPUFAs3. Of particular interest is the Japan EPA Lipid
Intervention Study (JELIS), which showed that ingestion of EPA
reduced major coronary events by 19% in patients with a history of
coronary artery disease4, and the AMR101 study, which showed that
ingestion of pure EPA reduced triglyceride (TAG) concentrations in
adult patients with severe hypertriglyceridemia5. Highly purified EPA,
such as Epadel and Vascepa, or mixtures of EPA and DHA, such as
Lovaza, are prescribed to reduce TAG concentrations. Clinical studies using EPA lipids derived from the engineered Y. lipolytica strain
that we describe in this report confirmed that the ingestion of EPA
reduced TAG concentrations and also showed that EPA and DHA are
functionally distinct and have different physiological and pharmacological roles in human health6,7.

EPA and DHA are synthesized de novo in marine microorganisms and phytoplankton and accumulate in other species through
the food chain. EPA and DHA are synthesized by either an anaerobic
polyketide synthase pathway8 or an aerobic desaturase and elongase
pathway9–11. The aerobic pathway can be further classified into a
∆-6 desaturase pathway (the ∆-6 pathway, found in algae, mosses,
fungi and others) or a ∆-9 elongase and ∆-8 desaturase pathway (the
∆-9 pathway, found in euglenoids). Land-based production of EPA
and DHA using a variety of organisms ranging from algae to plants
has been investigated as a sustainable alternative to production from
fish11–20. DHA has been commercially produced from microalgae
such as Crypthecodinium cohnii and Schizochytrium sp.12,13, but so far,
attempts to produce EPA have not generated sufficiently high yields.
Genetic engineering of Saccharomyces cerevisiae for EPA production19
yielded less than 1% of the total fatty acids (TFA) as EPA. Transgenic
plants have been reported to produce EPA at about 3% of the TFA in
Arabidopsis leaves14 and at about 25% of the TFA in Brassica seeds17.
Commercial production of EPA from transgenic plants has not yet
been achieved.
We report the engineering of the oleaginous yeast Y. lipolytica for
commercial production of EPA. The engineered strain produced
lipids with EPA at 56.6% of the TFA (by weight) and accumulated
lipids at up to 30% of the dry cell weight (DCW). Two commercial
products have been developed using this yeast: NewHarvest EPA-rich oil (a human nutritional supplement) and Verlasso sustainably farmed salmon. To our knowledge, this is the first example of metabolically engineered yeast that has been used to produce a commercial product; it is also the first example of such a product being used to replace an animal-derived product.

1Industrial Biosciences, E.I. du Pont de Nemours and Company, Wilmington, Delaware, USA. 2Biochemical Sciences and Engineering, Central Research and Development, E.I. du Pont de Nemours and Company, Wilmington, Delaware, USA. 3Crop Genetics, E.I. du Pont de Nemours and Company, Wilmington, Delaware, USA. 4Corporate Center for Analytical Science, E.I. du Pont de Nemours and Company, Wilmington, Delaware, USA. Correspondence should be addressed to Q.Z. ([email protected]).
Received 19 December 2012; accepted 29 May 2013; published online 21 July 2013; doi:10.1038/nbt.2622

RESULTS
Metabolic engineering of Y. lipolytica to produce EPA
The fatty acid profile of the Y. lipolytica wild-type strain ATCC#20362 (Fig. 1a) revealed that it can synthesize linoleic acid (C18:2n-6). Introduction of either the ∆-6 or ∆-9 pathway genes (Fig. 1b) into the wild-type strain should allow the production of EPA through desaturation and elongation of the native fatty acid species. However, cells engineered with the ∆-6 pathway genes accumulated high amounts of γ-linolenic acid (GLA; C18:3n-6), which indicated that the elongation step from GLA to dihomo-γ-linolenic acid (DGLA; C20:3n-6) was rate limiting11. To avoid GLA buildup, we selected the ∆-9 pathway to generate strain Y4305, which produced lipids with EPA at 56.6% of the TFA (Fig. 1c).
To produce EPA from linoleic acid through the ∆-9 pathway, we needed a ∆-9 elongase, a ∆-8 desaturase, a ∆-5 desaturase and a ∆-17 desaturase (Fig. 1b). We identified genes encoding these enzymes from a variety of LCPUFA-producing microorganisms such as algae, euglenoids and fungi11,16,21,22 (unpublished data). To ensure efficient expression of heterologous genes in Y. lipolytica, we constructed integration vectors by combining the codon-optimized coding regions11 with a suite of strong and regulable Y. lipolytica promoters, including EXP1, FBAINm, GPAT, GPD and YAT23,24 (Supplementary Table 1).
Insertion of DNA fragments into the Y. lipolytica genome tends to occur by nonhomologous end joining25. We used genome walking and genome sequencing to determine all the integration sites of gene expression cassettes that were used for the development of strain Y4305 (Supplementary Table 2). We sequentially integrated four expression cassettes into the genome of strain Y2224 (ura3 derivative of the wild-type strain ATCC#20362) to construct the EPA-producing intermediate strain Y4086 (Fig. 2). First, we used pZKLeuN-29E3, which harbors one copy of a ∆-12 desaturase gene, two copies of a ∆-9 elongase gene and one copy of a C16 elongase (converting palmitic acid to stearic acid) gene, to transform strain Y2224. We inserted the integration cassette into the LEU2 locus (YALi0C00407g, GenBank accession number AF260230). The resulting strain Y4001 (leu2) produced eicosadienoic acid (EDA; C20:2n-6) at 23.8% of the TFA, which indicated that the heterologous genes that had been integrated into the genome were expressed.
In the second step, we transformed strain Y4001U (ura3 derivative of strain Y4001) with pKO2U-F8289. This introduced one copy of a ∆-12 desaturase gene, one copy of a ∆-9 elongase gene and two copies of a synthetic mutant ∆-8 desaturase gene26 to enable the production of DGLA. The synthetic mutant ∆-8 desaturase gene contains
25 amino-acid changes from the wild-type gene of Euglena gracilis10 and has slightly better substrate conversion activity. The resulting strain, Y4036, produced DGLA at 18.2% of the TFA (Fig. 2).

[Figure 1: (a) gas chromatogram of wild-type strain ATCC#20362 and (c) gas chromatogram of engineered strain Y4305, each plotted as detector response versus retention time; (b) pathway diagram.]
Figure 1  Fatty acid profiles of Y. lipolytica strains and EPA biosynthetic pathways. (a) Fatty acid profile of the Y. lipolytica wild-type strain ATCC#20362
determined by gas chromatography analyses (Online Methods). (b) Schematic diagram of the aerobic pathways for EPA biosynthesis. The native pathway
in Y. lipolytica is indicated with gray shading, and the engineered ∆-9 pathway for EPA biosynthesis is indicated with yellow shading. The ∆-17 desaturase
used in this study has strong activity to convert ARA to EPA; it also has weak activity to convert linoleic acid (LA) to ALA, EDA to ETrA, and DGLA to ETA21.
STA, stearidonic acid. (c) Fatty acid profile of the engineered EPA-producing strain Y4305 determined by gas chromatography analyses (Online Methods).

We used pKSL-555R to transform Y4036U
(ura3 derivative of strain Y4036) in a third step to create strain Y4070.
This introduced three copies of a ∆-5 desaturase gene22, which enabled the production of arachidonic acid (ARA; C20:4n-6) at 11.9%
of the TFA. We then introduced three copies of a ∆-17 desaturase
gene21 into strain Y4070 using pZP3-PA777U to enable the production of EPA. The resulting strain, Y4086, contained a total of 14 copies
of 7 different chimeric genes (Supplementary Table 1) and produced
EPA at 9.8% of the TFA (Fig. 2).

[Figure 2: strain lineage ATCC#20362 → Y2224 → Y4001 → Y4036 → Y4070 → Y4086 → Y4128 → Y4217 → Y4259 → Y4305, with a bar chart of fatty acid composition (percentage TFA) for each strain. Key: C16:0, C16:1, C18:0, C18:1, C18:2, ALA, EDA, DGLA, ARA, ETrA, ETA, EPA.]
Figure 2  Schematic showing the construction of the EPA-producing strain Y4305. The construction of strain Y4305 from the Y. lipolytica wild-type strain ATCC#20362 and the fatty acid composition (TFA (%)) of the intermediate strains obtained by gas chromatography analyses (Online Methods) are shown. Fatty acid species are labeled as indicated in the key. Descriptions of the plasmids used are given in Supplementary Table 1. The genes added in each step are indicated. C16E, C16 elongase gene; CPT, cholinephosphotransferase gene; ∆9E, ∆-9 elongase gene; ∆5D, ∆-5 desaturase gene; ∆8DM, ∆-8 desaturase gene mutant26 derived from E. gracilis10; ∆12D, ∆-12 desaturase gene; ∆17D, ∆-17 desaturase gene. The numbers in parentheses indicate the number of genes added, when not 1.

Alteration of Pex10p function affects EPA production
To further improve the flow of fatty acids into the engineered pathway for EPA biosynthesis, we introduced one copy of a ∆-12 desaturase gene, two copies of a synthetic mutant ∆-8 desaturase gene and one copy of a ∆-9 elongase gene into strain Y4086 using pZP2-2988. One transformant, designated strain Y4128, had an unusually high EPA titer of about 38% of the TFA instead of the approximately 16% measured for the other transformants (Figs. 2 and 3a). Genome walking from the border sequences of the integration cassette of plasmid pZP2-2988 (Online Methods) revealed that the cassette had inserted into the coding region of the PEX10 gene (YALi0C01023g, GenBank accession number CAG81606), resulting in the truncation of the C-terminal 32 amino acids that form part of the PEX10 RING finger motif27,28. Evaluation of growth on various carbon sources revealed that strain Y4128 could not use oleic acid (oleate; C18:1n-9) as a sole carbon source (data not shown), which suggested that β-oxidation was impaired in strain Y4128. Transformation of strain Y4128U (ura3 derivative of strain Y4128) with pPEX10-2, which carries the wild-type PEX10 gene, restored the growth of strain Y4128 on oleate medium (data not shown), and the EPA titer reverted to about 13% of the TFA (Fig. 3a). This confirmed that loss of Pex10p function resulted in both a more than doubled EPA titer and the β-oxidation defect in strain Y4128.

Deletion of PEX10 increases EPA titer
We constructed a pair of isogenic EPA-producing strains, Y4184 and Y4184 (pex10∆), to assess the effects of PEX10 deletion (Fig. 3b–d). Both strains grew normally on glucose-containing medium with no other obvious phenotype. Compared with strain Y4184, the EPA titer was increased by nearly twofold in strain Y4184 (pex10∆) (Fig. 3b,d). Strain Y4184 (pex10∆) also did not grow on oleate medium (data not shown). Therefore, complete deletion of PEX10 in strain Y4184 (pex10∆) resulted in the same phenotype as that observed in strain Y4128, suggesting that the truncation of Pex10p led to a loss of function. The lipid content of strain Y4184 (pex10∆) reached its maximal value after 3 d in high-glucose medium (HGM) and then remained steady (Fig. 3c). In contrast, the lipid content of strain Y4184 reached its maximal value after 3 d and then declined, such that by day 6 it was about the same as it was at day 2. This suggests that a defect in β-oxidation has a substantial impact on lipid accumulation.
The increased EPA titer in strains with nonfunctional Pex10p suggested that production of other LCPUFAs might benefit from mutation of PEX10 (Fig. 3). To test this hypothesis, we deleted PEX10 in the DGLA-producing strain Y4036. The resulting strain, Y4036 (pex10∆), produced DGLA at 42% of the TFA compared with strain Y4036, which produced DGLA at 18% of the TFA (Fig. 3e). Introduction of three copies of a ∆-5 desaturase gene into strains Y4036 and Y4036 (pex10∆) produced strains Y4049 and Y8051, respectively. Strain Y8051 produced ARA at 35% of the TFA compared with strain Y4049, which produced ARA at 13.5% of the TFA (Fig. 3e).
Pex10p is involved in the import of peroxisomal matrix proteins and is required for normal peroxisome proliferation in Y. lipolytica27. We introduced a GFP fusion gene with a peroxisome-targeting signal (PTS1) at the C terminus (GFP-SKL) into strains Y2224 and Y4128U using pZP2-GFP-SKL to assess peroxisome protein import. Fluorescence microscopy revealed that GFP without PTS1 was distributed uniformly in the cytosol in strains Y2224 and Y4128U (Fig. 4a). GFP-SKL was properly localized to the peroxisome in the PEX10 strain Y2224, as shown by the punctate fluorescence signal pattern. However, GFP-SKL was distributed uniformly throughout the cytosol in the pex10 strain Y4128U (Fig. 4a). Therefore, peroxisome protein localization seems to be dysfunctional in strain Y4128. Electron microscopy analyses of strain Y4128 showed that normal peroxisomes were absent from the cells. Instead, there were unidentified membrane-like structures that might have been deformed nonfunctional peroxisomes (Fig. 4b).

[Figure 3: panels a–f. (a, b) Bar charts of fatty acid composition (percentage TFA); (c) lipid (% DCW) and (d) EPA (% TFA) versus days in HGM; (e) DGLA or ARA (% TFA); (f) lipid (% DCW) with and without DGAT2 overexpression.]

Figure 3  Fatty acid profiles and lipid content of strains with PEX10 mutations. (a) Fatty acid profiles of strains Y4128 (pex10) and Y4128 (pex10) +
PEX10; the latter denotes strain Y4128 (pex10) expressing wild-type PEX10. (b) Fatty acid profiles of strains Y4184
and Y4184 (pex10∆). (c) Time courses of lipid production in strains Y4184 and Y4184 (pex10∆). (d) Time courses of EPA production in strains Y4184
and Y4184 (pex10∆). (e) DGLA content of strains Y4036 and Y4036 (pex10∆) and ARA content of strains Y4049 and Y8051 (pex10∆). (f) Lipid
content in PEX10 strains ATCC#20362 (wild type, WT) and WT with overexpression of DGAT2; also shown is lipid content in the pex10 strains Y4305
and Y4305 with overexpression of DGAT2.

Generation of commercial EPA production strain Y4305
Further engineering of strain Y4128 led to the construction of Y4305, a
strain with a high EPA titer that was suitable for commercial production
(Figs. 1c and 2). We transformed strain Y4128U with pZKL25U89GC (Supplementary Table 1), which contains one copy each
of a ∆-5 desaturase gene, a ∆-9 elongase gene, a synthetic mutant
∆-8 desaturase gene and a cholinephosphotransferase gene (CPT1,
YALi0C10989g), to create strain Y4217 with an EPA titer of 42% of
the TFA. Compared with Y4128, we achieved a marked reduction
of the amount of intermediates such as α-linolenic acid (ALA; C18:3n-3),
EDA, DGLA, eicosatrienoic acid (ETrA; C20:3n-3) and
eicosatetraenoic acid (ETA; C20:4n-3), reflecting the increased
activities of ∆-9 elongase, ∆-8 desaturase and ∆-5 desaturase. The
addition of CPT1 has been shown to increase the desaturation of fatty
acids in Y. lipolytica in separate experiments (unpublished data).

Figure 4  Peroxisome morphology and protein import in strains with PEX10 mutations. (a) GFP localization in different Y. lipolytica strains. Top row, localization of GFP in Y2224 (ura3) and Y4128U (ura3, pex10) cells transformed with plasmid pZP2-GFP and expressing wild-type GFP. Bottom row, localization of GFP-SKL in Y2224 (ura3) and Y4128U (ura3, pex10) cells transformed with plasmid pZP2-GFP-SKL and expressing GFP tagged with PTS1 (SKL) at the C terminus. A description of the plasmids used is given in Supplementary Table 3. (b) Cellular and organelle morphology of different Y. lipolytica strains. Shown are electron microscopy pictures of ATCC#20362 and Y4128 (pex10) cells. The peroxisome (P) in strain ATCC#20362 and unknown membrane vesicles (MV) in strain Y4128 (pex10) are indicated with arrows. Scale bars, 0.5 µm.

[Figure 5a: engineered strain Y4305; lipid (% DCW) and EPA (% TFA) versus days in HGM.]
Figure 5b  Y4305 fatty acid composition

Fatty acid     Notation            Percentage TFA
Palmitic       C16:0                2.8
Palmitoleic    C16:1                0.7
Stearic        C18:0                1.3
Oleic          C18:1                4.4
Linoleic       C18:2               17.2
ALA            C18:3                2.3
EDA            C20:2                3.4
DGLA           C20:3 (omega-6)      2.0
ARA            C20:4 (omega-6)      0.6
ETrA           C20:3 (omega-3)      0.7
ETA            C20:4 (omega-3)      1.7
EPA            C20:5 (omega-3)     56.6

We transformed strain Y4217U (ura3 derivative of Y4217) with pZKL1-2SP98C (Fig. 2), which contains one copy each of a ∆-12 desaturase gene, a ∆-9 elongase gene, a synthetic mutant ∆-8 desaturase gene and a CPT1 gene, to produce strain Y4259. This strain has an EPA titer of 46.5% of the TFA and showed further reduction of the amount of DGLA and ETA in the TFA. We then transformed Y4259U (ura3 derivative of Y4259) with pZKD2-5U89A2 (Fig. 2), which contains one copy each of a ∆-5 desaturase gene, a synthetic mutant ∆-8 desaturase gene, a ∆-9 elongase gene and a ∆-12 desaturase gene, to generate the final strain Y4305 (Fig. 2). Strain Y4305 contains 30 copies of 9 different genes and produces EPA at 56.6% of the TFA.
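The copy numbers quoted for the successive transformations can be tallied to check the totals stated in the text. The sketch below transcribes the genes added at each step as described above; the short enzyme labels (for example `d12D` for the ∆-12 desaturase gene) are shorthand introduced here. It counts enzyme classes only, so it is not expected to reproduce the count of "9 different genes", which presumably distinguishes gene variants that the main text does not enumerate.

```python
from collections import Counter

# (resulting strain, gene copies added in that step), transcribed from the text.
STEPS = [
    ("Y4001", {"d12D": 1, "d9E": 2, "C16E": 1}),
    ("Y4036", {"d12D": 1, "d9E": 1, "d8D": 2}),
    ("Y4070", {"d5D": 3}),
    ("Y4086", {"d17D": 3}),
    ("Y4128", {"d12D": 1, "d8D": 2, "d9E": 1}),
    ("Y4217", {"d5D": 1, "d9E": 1, "d8D": 1, "CPT1": 1}),
    ("Y4259", {"d12D": 1, "d9E": 1, "d8D": 1, "CPT1": 1}),
    ("Y4305", {"d5D": 1, "d8D": 1, "d9E": 1, "d12D": 1}),
]


def cumulative_copies(steps):
    """Return (total copies per strain, copies per enzyme class at the end)."""
    running = Counter()
    totals = {}
    for strain, added in steps:
        running.update(added)
        totals[strain] = sum(running.values())
    return totals, running


totals, per_enzyme = cumulative_copies(STEPS)
for strain, n_copies in totals.items():
    print(strain, n_copies)
```

Under this transcription the running total reaches 14 copies at strain Y4086 and 30 copies at strain Y4305, matching the totals given in the text.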
Characterization of strain Y4305
We analyzed the fatty acid profile of strain Y4305 in shake flasks
(Figs. 1c and 5). In this strain, the EPA titer was as high as 56.6% of
the TFA, and the total saturated fatty acids, C16:0 and C18:0, were
only 4.1% of the TFA. The only major intermediate was C18:2, at
17.2% of the TFA. We tracked a typical time course of lipid and EPA
accumulation for strain Y4305 (Fig. 5a) and found that the lipid content reached its maximum after 6 d in HGM, whereas the EPA titer continued to rise. At day 6, strain Y4305 produced lipids at up to 30% of the DCW and EPA at up to 56.6% of the TFA. Lipid distribution analyses showed that >85% of the fatty acids were in TAG form (Fig. 6a). Whereas the EPA content in TAGs closely resembled the EPA content of TFA, the EPA content in phospholipids was only about 22% of the TFA (Fig. 6b).

Figure 5  EPA and lipid production in strain Y4305. (a) Time course of EPA and lipid production in strain Y4305. (b) The final fatty acid profile of Y4305 after 6 d in HGM.

[Figure 6a: lipid classes separated by TLC: 87% TAG, 5.6% PL, 4.3% FFA, 3.3% DAG.]

We analyzed the distribution of EPA in TAGs at the three positions of the glycerol backbone by 13C NMR on a lipid sample containing EPA at 48.5% of the TFA. We extracted the lipid from the
biomass of strain Y4305 after 90 h of fermentation (Fig. 6c). Because NMR
is sensitive to molecular structure, it has been used to analyze TAGs
and the lipids from which they are derived29. Peaks corresponding
to the α- and β-carbons of various fatty acids can also be identified
and quantified. The result showed that EPA is 55% of the TFA at
the sn – 1 and sn – 3 positions and is only 35% of the TFA at the
sn – 2 position, demonstrating that EPA is enriched at the sn – 1
and sn – 3 positions.
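As a rough consistency check on these positional values, one can average the EPA percentages over the three glycerol positions and compare the result with the 48.5% EPA content of the sample. The sketch assumes that the 55% figure applies to sn-1 and sn-3 alike and that each position carries one third of the fatty acids; the function name is illustrative.

```python
def mean_epa_over_positions(sn1_sn3_pct, sn2_pct):
    """Average EPA percentage over sn-1, sn-2 and sn-3, weighting each position equally."""
    return (2.0 * sn1_sn3_pct + sn2_pct) / 3.0


estimate = mean_epa_over_positions(55.0, 35.0)
print(round(estimate, 1))
```

Under these assumptions the positional average is about 48.3%, close to the 48.5% of the TFA reported for the whole lipid sample.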
Determination of integration sites (Supplementary Table 2)
showed that in addition to the PEX10 and LEU2 genes, three other
open reading frames were disrupted by integration events. Of interest are LIP1, encoding lipase 1 (YALi0E10659g, GenBank accession number Z50020), and SCP2, encoding sterol carrier protein
(YALi0E12989g, GenBank accession number AJ431362). The third
open reading frame is the YALi0C18711g locus, a
nonessential gene of unknown function. Both Lip1p and Scp2p are
involved in lipid metabolism. The total lipid content of strain Y4305
was about 20% higher than that of strain Y4217 (data not shown);
this suggests that disruptions of LIP1 and SCP2 have positive effects
on lipid accumulation.
Overexpression of diacylglycerol acyltransferase (DGAT) markedly increased the lipid production in Y. lipolytica30. Overexpression
of Y. lipolytica DGAT2 (YALI0E32769g) in the PEX10 strain Y2224
resulted in a strain that was capable of accumulating lipid at 39%
of the DCW compared to 13% accumulation in the parent strain
(Fig. 3f). Overexpression of DGAT2 in the pex10 strain Y4305
resulted in lipid accumulation to more than 53% of the DCW
(Fig. 3f). These data demonstrate that there was an additive effect of
the pex10 mutation and overexpression of DGAT2 in lipid accumulation. However, the EPA titer remained at about 15% of the DCW, as
EPA in the lipids was reduced from 56.6% to about 30% of the TFA
(data not shown).
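The relationship between these figures can be illustrated with simple arithmetic: EPA as a share of dry cell weight is approximately the lipid share of DCW multiplied by the EPA share of the TFA. The sketch treats the lipid mass as if it were entirely fatty acid, which is a simplifying assumption made here and not a method taken from the paper, so the results are only rough estimates.

```python
import math


def epa_pct_dcw(lipid_pct_dcw, epa_pct_tfa):
    """Approximate EPA as % of DCW, treating lipid mass as all fatty acid."""
    return lipid_pct_dcw * epa_pct_tfa / 100.0


y4305 = epa_pct_dcw(30.0, 56.6)        # strain Y4305
y4305_dgat2 = epa_pct_dcw(53.0, 30.0)  # Y4305 with DGAT2 overexpression
```

Both estimates come out in the range of 16–17% of the DCW, in the neighborhood of the roughly 15% quoted in the text, which illustrates why the higher lipid content did not raise the EPA titer.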

Figure 6 panel labels recovered from the graphic: DAG fraction and PL fraction, with fatty acids C18:1, C18:2, ALA, ETrA, ETA and EPA; (c) engineered strain Y4305, with NMR peaks labeled α-C20:5 (∆5), β-C20:5 (∆5), α-satd/C18:0/C18:1/C18:2/C18:3 and β-satd/∆9,8.

Figure 6  Fatty acid distribution of lipid species from strain Y4305. (a) Lipid separation by thin layer chromatography (TLC) analysis. FFA, free fatty
acids; PL, phospholipids. (b) Fatty acid profile of each lipid species. Bands corresponding to each lipid species were excised from the TLC plate and
analyzed by gas chromatography as described in the Online Methods. (c) NMR spectrum of purified TAGs from the biomass of strain Y4305. satd,
saturated. The TAG extraction procedure and NMR analyses are described in the Online Methods.


VOLUME 31  NUMBER 8  AUGUST 2013  nature biotechnology


© 2013 Nature America, Inc. All rights reserved.

DISCUSSION
So far, neither natural nor engineered EPA-producing microorganisms
or plants have achieved the productivity necessary for commercial
EPA production11,14,17,19,31. High amounts of EPA were produced
solely from fish oil by expensive separation and enrichment methods. Through metabolic engineering, we have generated a Y. lipolytica
strain, Y4305, that makes land-based commercial production of EPA
possible. The engineered strain produces EPA at 15% of the DCW
and produces lipids with EPA at 56.6% of the TFA. Both titers are the
highest among the currently known EPA sources. This achievement
was a result of careful selection of host organism and EPA production
pathways, balanced expression of pathway genes and modification of
host metabolism to improve lipid accumulation and remodeling.
Host selection was a crucial factor in the success of this project.
Y. lipolytica has a metabolism that is well suited to fatty acid production and lipid accumulation. Under nitrogen starvation and high
glucose concentration, Y. lipolytica produces lipids to about 30% of its
DCW. It has an established history of robust, commercial-scale fermentation performance and an excellent safety record32,33. Y. lipolytica
is one of the most studied unconventional yeasts, with a complete
genome sequence and many tools for genetic manipulation available34–37. It has also been used as a model system for studying hydrophobic substrate utilization, peroxisome biogenesis, lipid metabolism
and biolipid production30,37. These factors made Y. lipolytica an ideal
host for engineered production of omega-3 fatty acids.
The lipid produced by strain Y4305 has a fatty acid profile that
is extremely low in saturated fatty acids and contains only small
amounts of intermediates. This is due to both the characteristics of
the ∆-9 pathway and the carefully balanced expression of different
genes in the pathway. In engineered Y. lipolytica strains, the substrate
conversion efficiency (calculated as product/(product + substrate) ×
100%) of the introduced desaturases is substantially higher than that
of the elongases. This difference reflects the substrates used and their
availability: desaturation occurs on the acyl moiety of phospholipids, whereas
elongation occurs on the acyl moiety of acyl-CoAs, and because the acyl-CoA pool is limited, elongases convert their substrate less efficiently38. The selection of the ∆-9 pathway ensured that rate-limiting
elongation is the first step of the engineered pathway. Accumulation
of intermediates is therefore kept to a minimum, which is in contrast
to cells engineered with the ∆-6 pathway, in which the first step is
not rate limiting, and accumulation of GLA becomes substantial11.
Efficient and balanced expression of pathway genes is also crucial for
ensuring minimal build up of intermediates. Wild-type cells accumulated substantial amounts of C18:1 and C16 fatty acids, suggesting that
∆-12 desaturation and C16 elongation were limiting steps. We introduced five copies of the Fusarium moniliforme ∆-12 desaturase gene39
and one copy of the Mortierella alpina C16 elongase gene into strain
Y4305. The efficiency of heterologous gene expression was enhanced
through codon optimization11 and the use of strong and regulated
promoters23,24 (Supplementary Table 1). The combination of efficient overexpression of the introduced ∆-12 desaturase gene and the
C16 elongase gene effectively eliminated the bottleneck and ‘pushed’
the fatty acids into the engineered pathway, leading to a reduction of
C18:1 from greater than 40% to about 4% of the TFA and of C16 fatty
acids from about 30% to less than 4% of the TFA.
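The substrate conversion efficiency defined above, product/(product + substrate) × 100%, can be written as a small function. This is a minimal sketch: the function name and the input values are hypothetical, and the paper does not specify which fatty acids are pooled as product for each step.

```python
def conversion_efficiency(product: float, substrate: float) -> float:
    """Percent substrate conversion: product / (product + substrate) * 100.

    Inputs are amounts in the same units, for example % of TFA.
    """
    total = product + substrate
    if total <= 0:
        raise ValueError("product + substrate must be positive")
    return 100.0 * product / total


# Hypothetical amounts (not measured values from the paper):
# 90 units of product and 10 units of remaining substrate.
example = conversion_efficiency(90.0, 10.0)
print(f"Conversion efficiency: {example:.1f}%")
```

Because the measure is a ratio, it is independent of total lipid content, so steps can be compared across strains that accumulate different amounts of lipid.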
We created a strong ‘pull’ of intermediates toward EPA by the
introduction of three copies of an efficient ∆-17 desaturase gene21.
The substrate conversion efficiency of the ∆-17 desaturase reached
as high as 95.7%. The addition of these genes moved intermediates
along the pathway toward EPA, leading to an exceptionally high EPA
titer. Similarly, we achieved a more than 95% conversion efficiency
for the ∆-8 and ∆-5 desaturation steps through introduction of seven
and five copies of the desaturase genes, respectively. This effectively
prevented build up of intermediates.
The mutation in the PEX10 gene was the key to improvements
in EPA yield. Fluorescence and electron microscopy studies showed
that protein import into the peroxisome was disrupted and peroxisome morphology and integrity were compromised in the pex10
strains. The pex10 strains were unable to grow on oleate medium,
indicating a defect in β-oxidation. This confirmed a previous report
of loss of peroxisomes in a Y. lipolytica pex10 mutant40. The effect
of the pex10 mutation on lipid content was probably the result of a
defect in β-oxidation. Under oleaginous conditions, the source of
acyl-CoAs, the key intermediates in TAG biosynthesis, is both newly
synthesized fatty acids and fatty acids released from TAGs by lipases.
Acyl-CoAs can then be incorporated into either TAGs or phospholipids by various acyltransferases or degraded by β-oxidation. Lipid
content is determined by the relative strength of the synthesis and
degradation of acyl-CoAs. In PEX10 strains, β-oxidation activity is
substantial because of incomplete glucose repression in Y. lipolytica41.
As time progressed under nitrogen starvation, the total lipid content
also began to decline as a result of reduced TAG biosynthesis and/or
increased β-oxidation (Fig. 3d). In pex10 strains, β-oxidation was
impaired, resulting in higher and stable lipid content (Fig. 3d). When
DGAT2 was overexpressed in strain Y4305, the transformed cells
accumulated lipids at 53% of the DCW (Fig. 3f). These data suggest
that there is potential in engineering Y. lipolytica as an efficient host
to produce fatty acids and lipid-based products such as biodiesel and
waxes for other applications30,37.
The most notable feature of the pex10 mutant is the high EPA
titer in lipids. This is probably owing to the combination of an efficient EPA biosynthesis pathway with impaired β-oxidation and TAG
recycling. In pex10 cells, fatty acids might be released from TAGs,
but acyl-CoAs would be used as substrates for fatty acid elongases
and acyltransferases only, as there is no β-oxidation. This was evident from the 45% increase in ∆-9 elongation efficiency in strain
Y4184 (pex10∆) compared with strain Y4184. The reincorporation
of elongated acyl-CoAs into phospholipids allowed further desaturation, leading to enhanced EPA production. We observed similar
enrichment of DGLA and ARA with pex10 derivatives of strain Y4036
(Fig. 3e). The end product of the introduced pathway was produced
in very high amounts in pex10 mutants. These results suggest that
the high EPA titer in strain Y4305 (Figs. 1c and 5) was not due to
any EPA-preferred acyltransferase but rather to the efficient pathway
introduced and strong TAG recycling.
EPA in lipids produced by strain Y4305 was not evenly distributed
in the different lipid species (Fig. 6). The EPA content was about 55%
of the TFA in the TAG fraction but was only about 22% of the TFA in
the phospholipid fraction. These data suggest that Y. lipolytica has a
mechanism to control the amount of EPA in the phospholipids that
form cell membranes. EPA is more concentrated at the sn – 1 and
sn – 3 positions of the glycerol backbone in TAGs (Fig. 6c). Fatty acid
desaturation is believed to occur at the sn – 2 position of phospholipids38,42. Thus, EPA and other unsaturated fatty acids should concentrate at position 2 of the glycerol backbone. The enrichment of EPA
at positions 1 and 3, not at position 2, strongly suggests that there is
extensive lipid remodeling between the phospholipid and TAG fractions in engineered strains, leading to the redistribution of fatty acids
among lipid fractions and positions on the glycerol backbone.
Clinical trials have shown the safety and efficacy of the EPA-containing lipids produced with our engineered strain, Y4305
(refs. 6,7). This strain allowed us to realize the benefits of microbial
production of omega-3 fatty acid, namely a tailored fatty acid profile
high in desired omega-3 fatty acid and low in saturated fatty acids;
high productivity for cost-effective production that is unaffected by
natural and environmental events; an absence of ocean-borne contaminants; and sustainable land-based production with no negative
impact on wild fish resources.
Although yeast has been used for the commercial production of
proteins encoded by a single gene, this is the first example, to our
knowledge, of engineered yeast with a metabolic pathway being
used to produce a commercial product. Our work has paved the way
for further development of strains with varying LCPUFA compositions tailored for specific applications and for developing a versatile
platform for the production of other high-value lipid products.

1. Ma, D.W.L. et al. n-3 PUFA and membrane microdomains: a new frontier in bioactive
lipid research. J. Nutr. Biochem. 15, 700–706 (2004).
2. Funk, C.D. Prostaglandins and leukotrienes: advances in eicosanoid biology. Science
294, 1871–1875 (2001).
3. Deckelbaum, R.J. & Torrejon, C. The omega-3 fatty acid nutritional landscape:
health benefits and sources. J. Nutr. 142, 587S–591S (2012).
4. Yokoyama, M. et al. Effects of eicosapentaenoic acid on major coronary events in
hypercholesterolaemic patients (JELIS): a randomised open-label, blinded endpoint
analysis. Lancet 369, 1090–1098 (2007).
5. Bays, H.E. et al. Eicosapentaenoic acid ethyl ester (AMR101) therapy in patients
with very high triglyceride levels (from the multi-center, plAcebo-controlled,
randomized, double-blINd, 12-week study with an open-label extension [MARINE]
trial). Am. J. Cardiol. 108, 682–690 (2011).
6. Schaefer, E.J. et al. Effects of eicosapentaenoic acid, docosahexaenoic acid,
and olive oil on cardiovascular disease risk factors. Circulation 122, A20007
(2010).
7. Gillies, P.J., Harris, W.S. & Kris-Etherton, P.M. Omega-3 fatty acids in food and
pharma: the enabling role of biotechnology. Curr. Atheroscler. Rep. 13, 467–473
(2011).
8. Metz, J.G. et al. Production of polyunsaturated fatty acids by polyketide synthases
in both prokaryotes and eukaryotes. Science 293, 290–293 (2001).
9. Meesapyodsuk, D. & Qiu, X. The front-end desaturase: structure, function, evolution
and biotechnological use. Lipids 47, 227–237 (2012).

10. Wallis, J.G. & Browse, J. The ∆8-desaturase of Euglena gracilis: an alternate pathway
for synthesis of 20-carbon polyunsaturated fatty acids. Arch. Biochem. Biophys.
365, 307–316 (1999).
11. Zhu, Q. et al. Metabolic engineering of an oleaginous yeast for the production of
omega-3 fatty acids. in Single Cell Oils: Microbial and Algal Oils (eds. Cohen, Z.
& Ratledge, C.) 51–73 (AOCS press, Urbana, 2010).
12. Barclay, W.R., Meager, K.M. & Abril, J.R. Heterotrophic production of long chain
omega-3 fatty acids utilizing algae and algae-like microorganisms. J. Appl. Phycol.
6, 123–129 (1994).
13. Barclay, W., Weaver, C. & Metz, J. Development of a docosahexaenoic acid production
technology using Schizochytrium: a history perspective. in Single Cell Oils (eds.
Cohen, Z. & Ratledge, C.) 36–73 (AOCS press, Champaign, Illinois, 2005).
14. Qi, B. et al. Production of very long chain polyunsaturated omega-3 and omega-6
fatty acids in plants. Nat. Biotechnol. 22, 739–745 (2004).
15. Napier, J.A. & Sayanova, O. The production of very-long-chain PUFA biosynthesis
in transgenic plants: towards a sustainable source of fish oils. Proc. Nutr. Soc. 64,
387–393 (2005).
16. Damude, H.G. & Kinney, A.J. Engineering oilseeds to produce nutritional fatty acids.
Physiol. Plant. 132, 1–10 (2008).
17. Cheng, B. et al. Towards the production of high levels of eicosapentaenoic acid in
transgenic plants: the effects of different host species, genes and promoters.
Transgenic Res. 19, 221–229 (2010).
18. Petrie, J.R. et al. Metabolic engineering of omega-3 long-chain polyunsaturated
fatty acids in plants using an acyl-CoA ∆6-desaturase with ω3-preference from the
marine microalga Micromonas pusilla. Metab. Eng. 12, 233–240 (2010).
19. Tavares, S. et al. Metabolic engineering of Saccharomyces cerevisiae for production
of eicosapentaenoic acid, using a novel ∆5-desaturase from Paramecium tetraurelia.
Appl. Environ. Microbiol. 77, 1854–1861 (2011).
20. Adarme-Vega, T.C. et al. Microalgal biofactories: a promising approach towards
sustainable omega-3 fatty acid production. Microb. Cell Fact. 11, 96 (2012).
21. Xue, Z. et al. Identification and characterization of new ∆-17 fatty acid desaturases.
Appl. Microbiol. Biotechnol. 97, 1973–1985 (2013).
22. Pollak, D.W. et al. Isolation of a ∆5 desaturase gene from Euglena gracilis and
functional dissection of its HPGG and HDASH motifs. Lipids 47, 913–926 (2012).
23. Hong, S.-P. et al. Engineering Yarrowia lipolytica to express secretory invertase with
strong FBA1IN promoter. Yeast 29, 59–72 (2011).
24. Blazeck, J., Liu, L., Redden, H. & Alper, H. Tuning gene expression in Yarrowia
lipolytica by a hybrid promoter approach. Appl. Environ. Microbiol. 77, 7905–7914
(2011).
25. Weterings, E. & Chen, D.J. The endless tale of non-homologous end-joining.
Cell Res. 18, 114–124 (2008).
26. Damude, H.G., He, H., Liao, D.-I. & Zhu, Q.Q. Mutant ∆8 desaturase genes
engineered by targeted mutagenesis and their use in making polyunsaturated fatty
acids. US patent 7,709,239 (2010).
27. Prestele, J. et al. Different functions of the C3HC4 zinc RING finger peroxins
PEX10, PEX2, and PEX12 in peroxisome formation and matrix protein import.
Proc. Natl. Acad. Sci. USA 107, 14915–14920 (2010).
28. Titorenko, V.I., Smith, J.J., Szilard, R.K. & Rachubinski, R.A. Peroxisome biogenesis
in the yeast Yarrowia lipolytica. Cell Biochem. Biophys. 32, 21–26 (2000).
29. Aursand, M., Standal, I.B. & Axelson, D.E. High-resolution 13C nuclear magnetic
resonance spectroscopy pattern recognition of fish oil capsules. J. Agric. Food Chem.
55, 38–47 (2007).
30. Tai, M. & Stephanopoulos, G. Engineering the push and pull of lipid biosynthesis
in oleaginous yeast Yarrowia lipolytica for biofuel production. Metab. Eng. 15, 1–9
(2013).
31. Wen, Z. & Chen, F. Prospects for eicosapentaenoic acid production using
microorganisms. in Single Cell Oils (eds. Cohen, Z. & Ratledge, C) 138–160 (AOCS
Press, Champaign, Illinois, 2005).
32. Ratledge, C. Single cell oils for the 21st century. in Single Cell Oils (eds. Cohen,
Z. & Ratledge, C.) 1–20 (AOCS Press, Champaign, Illinois, 2005).
33. Groenewald, M. et al. Yarrowia lipolytica: safety assessment of an oleaginous yeast
with a great industrial potential. Crit. Rev. Microbiol. (2013).
34. Barth, G. et al. Functional genetics of Yarrowia lipolytica. in Functional Genetics
of Industrial Yeasts: Topics in Current Genetics (ed. de-Winde, H.) 227–271
(Springer Verlag, Berlin, Germany, 2003).
35. Dujon, B. et al. Genome evolution in yeasts. Nature 430, 35–44 (2004).
36. Beopoulos, A., Nicaud, J.-M. & Gaillardin, C. An overview of lipid metabolism in
yeasts and its impact on biotechnological processes. Appl. Microbiol. Biotechnol.
90, 1193–1206 (2011).
37. Nicaud, J.-M. Yarrowia lipolytica. Yeast 29, 409–418 (2012).
38. Domergue, F. et al. Acyl carriers used as substrates by the desaturases and elongases
involved in very long-chain polyunsaturated fatty acids biosynthesis reconstituted
in yeast. J. Biol. Chem. 278, 35115–35126 (2003).
39. Damude, H.G. et al. Identification of bifunctional ∆12/ω3 fatty acid desaturases
for improving the ratio of ω3 to ω6 fatty acids in microbes and plants. Proc. Natl.
Acad. Sci. USA 103, 9446–9451 (2006).
40. Sumita, T. et al. Peroxisome deficiency represses the expression of n-alkane–
inducible YlALK1 encoding cytochrome P450ALK1 in Yarrowia lipolytica. FEMS
Microbiol. Lett. 214, 31–38 (2002).
41. Flores, C.-L. & Gancedo, C. Yarrowia lipolytica mutants devoid of pyruvate carboxylase
activity show an unusual growth phenotype. Eukaryot. Cell 4, 356–364 (2005).
42. Griffiths, G., Stobart, A.K. & Stymne, S. ∆6- and ∆12-desaturase activities and
phosphatidic acid formation in microsomal preparations from the developing cotyledons
of common borage (Borago officinalis). Biochem. J. 252, 641–647 (1988).


Methods
Methods and any associated references are available in the online
version of the paper.


Note: Supplementary information is available in the online version of the paper.
Acknowledgments
We are grateful to H. Bryndza and J. Pierce for their strong support. We thank
A. Kinney and S. Picataggio for their suggestions, K. Czymmek and J. Li for their
technical help and D. Chesire for critical reading of this manuscript.
AUTHOR CONTRIBUTIONS
Q.Z. was responsible for strain-construction strategy, codon optimization
of synthetic genes and design and construction of integration plasmids, and
served as the lead for the strain-development team; Z.X., N.S.Y., H.G.D.,
E.N.J. and Q.Z. jointly conceived the concepts for gene isolation, selection and
pathway engineering; Z.X., P.L.S., S.-P.H. and Q.Z. determined integration
sites; R.A.R., J.E.S., J.W., D.W.P., M.D.B., D.J.M. and H.Z. performed molecular
biology experiments, transformation, primary screening, flask assays and gas
chromatography analyses; D.H.H. performed the analyses of fatty acid profiles,
lipid content and different lipid classes; P.L.S. and M.D.B. designed and performed
homologous recombination experiments for targeted PEX10 gene disruption;
Z.X., D.J.M. and K.C. performed cell biology experiments; D.X., D.R.S., D.M.A.,
S.A.B. and B.D.T. designed and performed fermentation experiments;
D.X. and B.D.T. developed models for fermentation experiments; E.F.M. performed
the NMR analysis; Z.X., M.W.B., S.-P.H., N.S.Y., E.N.J. and Q.Z. prepared the
manuscript; M.W.B. prepared the figures.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
Reprints and permissions information is available online at http://www.nature.com/
reprints/index.html

ONLINE METHODS

Strains and plasmids. Wild-type Y. lipolytica strain ATCC#20362 was
purchased from the American Type Culture Collection. Genetically engineered derivatives of strain ATCC#20362 and their key features are listed in
Supplementary Table 2. The integration plasmids used for the construction
of EPA-producing strains are described in Supplementary Table 1. Other
plasmids are listed in Supplementary Table 3. The desaturases and elongases
were isolated from various host organisms10,11,21,22. Y. lipolytica promoters
and terminators have been previously described23,24.


Chemicals. Unless otherwise specified, chemicals were from Sigma-Aldrich
(St. Louis, MO). Medium components were from Difco (Lawrence, KS).
Fatty acid methyl ester standards for gas chromatography analyses were from
Nu-Chek Prep (Elysian, MN). 5-Fluoroorotic acid was from Zymo Research
Corp. (Orange, CA). The Qiagen miniprep DNA preparation kit and PCR
purification kit were from Qiagen (Valencia, CA). Restriction enzymes
were from Promega (Madison, WI) or New England Biolabs (Ipswich, MA).
The Universal Genome Walker kit was from BD Clontech (Palo Alto, CA).
Media and culture conditions. Escherichia coli and yeast culture media
were prepared according to standard recipes40,41. The fermentation medium
contained, per liter, 6.7 g yeast nitrogen base (without amino acids and
with ammonium sulfate), 5.0 g yeast extract, 6.0 g KH2PO4, 2.0 g K2HPO4,
1.5 g MgSO4·7H2O, 1.5 mg thiamine hydrochloride and 20.0 g glucose. The
HGM contained, per liter, 80 g glucose, 6.3 g KH2PO4 and 27.0 g K2HPO4.
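The per-liter recipes above can be scaled to a working volume, such as the 25-ml flask cultures used later in the Methods. The sketch below is illustrative; the dictionary and function names are hypothetical, and only the quantities come from the text.

```python
# Grams per liter, as stated in the text (1.5 mg thiamine = 0.0015 g).
FERMENTATION_MEDIUM_G_PER_L = {
    "yeast nitrogen base (without amino acids, with ammonium sulfate)": 6.7,
    "yeast extract": 5.0,
    "KH2PO4": 6.0,
    "K2HPO4": 2.0,
    "MgSO4·7H2O": 1.5,
    "thiamine hydrochloride": 0.0015,
    "glucose": 20.0,
}

HGM_G_PER_L = {
    "glucose": 80.0,
    "KH2PO4": 6.3,
    "K2HPO4": 27.0,
}


def scale_recipe(recipe_g_per_l: dict, volume_ml: float) -> dict:
    """Return grams of each component needed for volume_ml of medium."""
    return {name: grams * volume_ml / 1000.0 for name, grams in recipe_g_per_l.items()}


# Amounts for one 25-ml flask of HGM.
for component, grams in scale_recipe(HGM_G_PER_L, 25.0).items():
    print(f"{component}: {grams:.4f} g")
```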
General molecular biology techniques. Standard recombinant DNA techniques were used as described43,44. Y. lipolytica transformation was carried out
as described45. Genomic sequencing of selected Y. lipolytica strains was done
using 454-Life Sciences and Illumina Next-Generation Sequencing platforms.
Codon-optimized genes were designed on the basis of the codon usage of
Y. lipolytica11 and synthesized by GenScript (Piscataway, NJ). Genome walking was performed using the Universal Genome Walker kit from BD Clontech
(Palo Alto, CA) according to the manufacturer’s protocol.
Deletion of PEX10 in strain Y4184. The PEX10 gene was deleted from an
engineered EPA-producing strain, Y4184, by homologous recombination using
vector pYPS161 (ref. 46). Ura+ Y4184 transformants were screened by PCR using
primers Pex10 del1 3′F (5′-CCAACATGAGCGACAATACG-3′) and Pex10 del2
5′R (5′-CAAGTTCTGCTCTCTCACAC-3′) for the presence of a 2.8-kb fragment, which indicates the replacement of PEX10 with the URA3 marker.
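Basic properties of the two screening primers can be checked with a short script. This is a rough sketch: the helper names are hypothetical, and the Wallace rule (2 °C per A/T, 4 °C per G/C) is a crude estimate of melting temperature that the paper does not itself report.

```python
# Primer sequences exactly as given in the text.
PRIMERS = {
    "Pex10 del1 3'F": "CCAACATGAGCGACAATACG",
    "Pex10 del2 5'R": "CAAGTTCTGCTCTCTCACAC",
}


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Approximate melting temperature (deg C) by the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.0f}%, Tm ~{wallace_tm(seq)} C")
```

Matching length and GC content between the two primers is a common design goal, since it keeps their annealing temperatures close.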
Analyses of fatty acid profiles and lipid content. For the primary screen
of transformants, cells were cultivated in 24-well blocks with 3 ml selection
medium for 2 d at 30 °C and 250 r.p.m. The cells were then collected by centrifugation, resuspended in 3 ml HGM and incubated for 5 more days at 30 °C
and 250 r.p.m. Cells from 1 ml of the culture were collected by centrifugation
for the preparation of fatty acid methyl esters (FAMEs) as described47, except
that the FAMEs were extracted with 0.5 ml of heptane. Gas chromatography
analysis of FAMEs was done as previously described39,47.
Standard flask cultures were grown in 125-ml flasks containing 25 ml fermentation medium on an orbital shaker at 30 °C and 250 r.p.m. for 2 d. After 2 d
of cultivation, cells from 6 ml of the culture were collected by centrifugation,
resuspended in 25 ml of HGM as described above and allowed to grow for
5 more days at 30 °C and 250 r.p.m. in 125-ml flasks. Cells were then harvested
by centrifugation and washed with water. We used 10 ml of the culture to determine the DCW. One milliliter of each culture was used for gas chromatography
analysis of fatty acid content and profile39,47.

doi:10.1038/nbt.2622
Determination of lipid classes. The analyses of different lipid classes were
performed on lipid samples extracted from yeast biomass using a modified
Bligh and Dyer extraction method48. Lipid extraction, TLC separation of various lipid species and gas chromatography analysis of the fatty acid profiles of
each lipid species were done as previously described49.
Fluorescence and electron microscopy analysis of wild-type and pex10 cells.
Y. lipolytica ATCC#20362 and Y4128 cells carrying plasmid pZP2-GFP or
pZP2-GFP-SKL were grown in standard fermentation medium and analyzed
by fluorescence microscopy using a ZEISS AXIOPLAN fluorescence microscope. For electron microscopy, Y4128 or ATCC#20362 cells were grown as
described above. Samples were taken after 24 h and analyzed at the imaging
facility at University of Delaware as described50.
NMR analysis of the positional distribution of fatty acids on the
glycerol backbone. The biomass of strain Y4305 was extracted with hexane
in a Swedish tube51. Extracted lipids were analyzed by 13C NMR for the distribution of EPA on the glycerol backbone of TAGs. The 13C NMR spectra
were obtained on a Varian 700 MHz Direct Drive NMR spectrometer with
a 10-mm carbon or BB probe at 25 °C using 1 g of sample and 60 mg of
CrAcAc (0.05 M) dissolved in 3.1 ml total volume with chloroform-d1 with
an acquisition time of 1.2 s, a recycle delay time of 5 s, a 90° pulse of about 15.6 µs,
a spectral width of 44.6 kHz, inverse-gated Waltz 1H decoupling and 5,680
transients averaged. Spectra were referenced to CDCl3 carbon at 77 p.p.m.
Spectra were processed with a line broadening of 0.5 Hz and zero filled to
512,000 complex data points.
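A few derived quantities follow from the acquisition parameters above. The sketch below is an illustration under stated assumptions: complex points are taken to be sampled at a rate equal to the spectral width, and pulse length and other per-scan overhead are ignored; the function names are hypothetical.

```python
SPECTRAL_WIDTH_HZ = 44_600.0    # 44.6 kHz
ACQUISITION_TIME_S = 1.2
RECYCLE_DELAY_S = 5.0
TRANSIENTS = 5_680
ZERO_FILLED_POINTS = 512_000


def acquired_complex_points(spectral_width_hz: float, acquisition_time_s: float) -> int:
    """Complex points per transient, assuming sampling rate equals spectral width."""
    return round(spectral_width_hz * acquisition_time_s)


def experiment_time_hours(transients: int, acquisition_time_s: float, delay_s: float) -> float:
    """Total time, counting only acquisition plus recycle delay for each transient."""
    return transients * (acquisition_time_s + delay_s) / 3600.0


def hz_per_point(spectral_width_hz: float, points: int) -> float:
    """Spacing between points in the processed spectrum."""
    return spectral_width_hz / points


print(acquired_complex_points(SPECTRAL_WIDTH_HZ, ACQUISITION_TIME_S), "complex points acquired")
print(f"{experiment_time_hours(TRANSIENTS, ACQUISITION_TIME_S, RECYCLE_DELAY_S):.2f} h total")
print(f"{hz_per_point(SPECTRAL_WIDTH_HZ, ZERO_FILLED_POINTS):.4f} Hz per point after zero filling")
```

On these assumptions, each transient yields far fewer points than the 512,000 used in processing, so zero filling interpolates the spectrum substantially, and the full experiment runs for several hours.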
43. Sherman, F. Getting started with yeast. in Methods in Enzymology (eds. Guthrie, C.
& Fink, G.R.) 194, 3–20 (Academic Press, New York, 1991).
44. Ausubel, F.M. et al. Current Protocols in Molecular Biology (John Wiley, New York,
2010).
45. Chen, D.C., Beckerich, J.M. & Gaillardin, C. One-step transformation of the
dimorphic yeast Yarrowia lipolytica. Appl. Microbiol. Biotechnol. 48, 232–235
(1997).
46. Fickers, P. et al. New disruption cassettes for rapid gene disruption and marker
rescue in the yeast Yarrowia lipolytica. J. Microbiol. Methods 55, 727–737
(2003).
47. Cahoon, E.B., Ripp, K.G., Hall, S.E. & Kinney, A.J. Formation of conjugated ∆8,
∆10-double bonds by ∆12-oleic acid desaturase-related enzymes: biosynthetic origin
of calendic acid. J. Biol. Chem. 276, 2637–2643 (2001).
48. Zhang, H., Damude, H.G. & Yadav, N.S. Three diacylglycerol acyltransferases
contribute to oil biosynthesis and normal growth in Yarrowia lipolytica. Yeast 29,
25–38 (2012).
49. Christie, W.W. Lipid Analysis, 3rd ed. (The Oily Press, Bridgwater, UK, 2003).
50. Walther, P. & Ziegler, A. Freeze substitution of high-pressure frozen samples: the
visibility of biological membranes is improved when the substitution medium
contains water. J. Microsc. 208, 3–10 (2002).
51. Troëng, S. Oil determination of oilseed. Gravimetric routine method. J. Am. Oil
Chem. Soc. 32, 124–126 (1955).


letters

Photoreceptor precursors derived from three-dimensional embryonic stem cell cultures integrate
and mature within adult degenerate retina


Anai Gonzalez-Cordero1,5, Emma L West1,5, Rachael A Pearson1, Yanai Duran1, Livia S Carvalho1,
Colin J Chu1, Arifa Naeem1, Samuel J I Blackford1, Anastasios Georgiadis1, Jorn Lakowski2, Mike Hubank3,
Alexander J Smith1, James W B Bainbridge1, Jane C Sowden2 & Robin R Ali1,4
Irreversible blindness caused by loss of photoreceptors may
be amenable to cell therapy. We previously demonstrated
retinal repair1 and restoration of vision through transplantation
of photoreceptor precursors obtained from postnatal retinas
into visually impaired adult mice2,3. Considerable progress
has been made in differentiating embryonic stem cells (ESCs)
in vitro toward photoreceptor lineages4–6. However, the
capability of ESC-derived photoreceptors to integrate after
transplantation has not been demonstrated unequivocally.
Here, to isolate photoreceptor precursors fit for transplantation,
we adapted a recently reported three-dimensional (3D)
differentiation protocol that generates neuroretina from
mouse ESCs6. We show that rod precursors derived by this
protocol and selected via a GFP reporter under the control
of a Rhodopsin promoter integrate within degenerate retinas
of adult mice and mature into outer segment–bearing
photoreceptors. Notably, ESC-derived precursors at a
developmental stage similar to postnatal days 4–8 integrate
more efficiently compared with cells at other stages. This
study shows conclusively that ESCs can provide a source
of photoreceptors for retinal cell transplantation.
Many studies by our group and others have demonstrated integration into wild-type and degenerate mouse retinas of photoreceptor
precursors isolated from early postnatal retinas1–3,7–12. Moreover, we
have shown that transplantation of a purified population of postnatal photoreceptor precursors can restore rod-mediated vision in
mice2. Both the number of cells transplanted and the stage of their
development at the time of transplantation are important parameters
in achieving efficient integration1,2. The requisite next step toward
clinical translation is to prove that pluripotent stem cell lines, which
represent a renewable source of cells for transplantation, can provide equivalent transplantation-competent photoreceptor precursors. Although progress has been made in developing protocols for
in vitro differentiation of ESCs and induced pluripotent stem cells
(iPSCs) toward photoreceptor lineages4–6,13–16, no study has proved
that ESCs can give rise to mature photoreceptors bearing an outer segment. A feature of mature photoreceptors, outer segments are formed
of stacked membranous discs packed with the visual pigment and
enzymes required for phototransduction. They are essential for mediating efficient light-evoked responses. Using an optimized adherent,
two-dimensional (2D) culture system that generates retinal cells5,17,
we were unable to demonstrate the integration of GFP-labeled mouse
ESC–derived photoreceptors after transplantation18. These findings
led us to conclude that although current 2D ESC culture systems
produce cells expressing a selection of photoreceptor markers, they
do not faithfully re-enact developmental processes and are therefore unlikely to provide a robust source of photoreceptor precursors
equivalent to those from the developing retina.
In 2011, groundbreaking work6 described a 3D embryoid body–
based differentiation protocol that mimicked normal development
of embryonic retinal tissue and raised the possibility of generating
authentically specified and correctly staged photoreceptors for transplantation6,19. Here we have optimized and scaled up the generation
of mouse ESC-derived photoreceptors in 3D synthetic retinal tissue,
enabling us to transplant purified populations of photoreceptors from
defined stages of development and to investigate the potential of the
cells to integrate within the adult recipient retina and to mature into
new photoreceptors.
A schematic of in vitro retinal differentiation is shown in Figure 1a.
Continuous neuroepithelium-like structures were detected as early as
day 5 of differentiation (Fig. 1b). At day 7, the presumptive eye fields
evaginated from the embryoid bodies, forming hemispherical optic
vesicle–like structures (Fig. 1c). At around day 9, the optic vesicles
invaginated to form optic cup–like structures (Fig. 1d,e). Pigmented
retinal pigment epithelium cells were first detected at days 11–12
(Fig. 1f). Transparent neuroepithelial structures were still present
within the embryoid bodies at later time points of culture (Fig. 1g).
Early eye development occurs through a series of morphogenetic
events. A region of the diencephalon committed to form the eye,

1Department of Genetics, UCL Institute of Ophthalmology, London, UK. 2Developmental Biology Unit, Institute of Child Health, University College London, London, UK.
3UCL Genomics Institute of Child Health, University College London, London, UK. 4Molecular Immunology Unit, Institute of Child Health, University College London,
London, UK. 5These authors contributed equally to this work. Correspondence should be addressed to R.R.A. ([email protected]) or E.L.W. ([email protected]).

Received 13 March; accepted 22 June; published online 21 July 2013; doi:10.1038/nbt.2643


Figure 1 panel labels recovered from the graphic: (a) early retinal differentiation (days 0, 1, 5, 7 and 9), wEBs transfer, late retinal differentiation (days 9, 14 and 34); 3,000 ESCs; embryoid body; eye field; optic vesicle; optic cup; neural retina; RPE; NE; apical; basal; 2% Matrigel; 12 wEBs per well; 24-well plate, DMEM/F12/N2; 96-well plate, DMEM/1.5% KSR; RA/Tau. (h–j) Percent of embryoid bodies with neuroepithelia, optic vesicle–like and optic cup–like structures.
Figure 1  Efficient photoreceptor differentiation
in wEB 3D cultures. (a) Schematic of early
retinal 3D differentiation showing eye field,
optic vesicle and optic cup stages cultured in
96-well plates. wEBs were transferred from the
individual wells at day 9. (b–g) Representative
images of embryoid bodies in early stages of
retinal differentiation; neuroepithelium (b),
optic vesicle (c), optic cup–like (d,e) stages,
wEB showing pigmented retinal pigment
epithelium (RPE) at day 12 of differentiation (f)
and showing transparent optic vesicle in further
suspension culture (g). (h–j) Quantification of
embryoid bodies containing neuroepithelia (h),
optic vesicle (i) and optic cup (j) structures.
Error bars, mean ± s.e.m.; ANOVA, *P < 0.05,
**P < 0.01, ***P < 0.001; N = 4 independent
experiments with n = 288 embryoid bodies
counted per experiment. (k) RT-PCR analyses
showing expression of photoreceptor markers at
day 26 of culture. (l,m) Low magnification image
of a wEB showing Crx-positive photoreceptor
precursors (red). (n,o) Neuroepithelium at
days 20 (n) and 24 (o) showing increase in
Crx-positive photoreceptor precursors. Nuclei
were stained with DAPI (blue). Scale bars,
25 µm (n,o), 100 µm (b–f,l,m) and 200 µm (g).

© 2013 Nature America, Inc. All rights reserved.

[Figure 1 image, panels h–o. Panels h–j, x-axis: day 5, day 7, day 9. Panel k
RT-PCR lanes: +ctrl, d0, d26; transcripts: Crx, Nrl, Nr2e3, Rho, Rcvrn, Gnat1,
Nte5, Actb. Panels l–o labels: day 20, day 24; Crx, DAPI.]

known as the eye field, evaginates bilaterally
to form optic vesicles. These vesicles then
invaginate to form bi-layered optic cups,
generating the presumptive retinal pigment
epithelium (RPE) and neural retina20.
To investigate this progressive retinal
differentiation, we quantified the number of
embryoid bodies containing eye-field stage,
optic vesicle and optic cup–like structures at
days 5, 7 and 9 of differentiation (Fig. 1h–j,
respectively). Neuroepithelium and optic
vesicle–like structures decreased from days 5
to 9 in culture (Fig. 1h,i, P < 0.01 and
P < 0.001, ANOVA N = 4, respectively), but
the proportion of optic cup–like structures,
characterized by a hinge region produced by the inward folding
of the neuroepithelia, increased substantially over the same period
(Fig. 1j, P < 0.001, ANOVA N = 4). Although an incomplete invagination
was observed in some instances, this did not interfere with
further neural retinal specification and photoreceptor differentiation,
similar to other studies using human ESCs and iPSCs15,16. Unlike a
previously developed protocol6, we kept embryoid bodies as intact
structures, referred to here as whole embryoid bodies (wEBs), for the
entire period of differentiation, as manual excision of optic cup–like
structures from the embryoid bodies did not allow the scaling-up
required to produce large numbers of transplantable photoreceptors.
In addition, in contrast to the earlier protocol6, wEB cultures
were grown under atmospheric oxygen levels (20% O2; 5% CO2). At
days 9–12 of differentiation, optic cup–like structures demonstrated
apical-basal polarity, with the apical side facing the interior of the
wEB. Large numbers of dividing cells were observed, and mitosis
occurred at the apical surface (Supplementary Fig. 1). Similar to eye
development in vivo, the majority of cells within the ESC-derived
neuroepithelia analyzed at days 7–12 expressed Rax, Pax6 and Vsx2
(Chx10), indicating that they were proliferating retinal progenitor
cells (Supplementary Fig. 2a–d). Also similar to development
in vivo, co-localization of Pax6 and Mitf was widespread throughout
the neuroepithelium of day 7 optic vesicles (Supplementary
Fig. 2e–h). Mitf+ retinal pigment epithelium progenitors became
progressively restricted to defined proximal portions of the invaginating
neuroepithelium by day 12 (Supplementary Fig. 2i,j). Retinal
differentiation was further confirmed by RT-PCR analysis of eye-field
transcription factors and retinal progenitor cell markers between
day 0 and day 16 of differentiation (Supplementary Fig. 2l). From
day 14 onward, wEBs were cultured in serum-free conditions and
in the presence of retinoic acid and taurine, factors reported to
promote rod photoreceptor fate21–23. These conditions increased
expression of rod-specific genes compared with both pulse application
of retinoic acid and taurine on days 14–16 of differentiation
and culturing with fetal bovine serum throughout the culture period
(Supplementary Fig. 3).
We next determined whether the retinal progenitor cells generated
using our 3D wEB differentiation system were capable of further
differentiation into mature retinal cell types, despite the presence of
other nonretinal neuronal and glial cell types (Supplementary Fig. 4).
At day 26 of culture, markers for ganglion, amacrine, horizontal
and bipolar cells were found in a single layer at the basal side of the
optic cup–like structures. Photoreceptors were observed in a well-developed layer resembling the outer nuclear layer (ONL)
(Supplementary Fig. 5) and robust expression of a variety of
photoreceptor-specific markers was detected (Fig. 1k). By day 24
of culture, increasing from day 20, the majority of cells in the wEB
expressed Crx, a marker of postmitotic photoreceptor precursors
(Fig. 1l–o). All wEBs examined contained at least one neural retina–
like region, and these, without exception, expressed markers of
photoreceptor differentiation (n > 500 wEBs). Rod photoreceptors
were abundant, and very few cones were detected.
We sought to determine how closely ESC-derived photoreceptor
development within the 3D system compared with normal photoreceptor
development in vivo by analyzing the time-course of expression of a number of photoreceptor-specific proteins in ESC-derived
photoreceptors and photoreceptors from wild-type C57Bl/6J postnatal
retinas. There is a peak of Crx expression from postnatal day (P) 3
to P6 in the early postnatal retina, which diminishes in more mature
photoreceptors24–26. Similarly, in our wEB cultures the number
of Crx+ photoreceptor precursors increased markedly between
days 20 and 24 and decreased after day 26 (Fig. 2a). The reduction
in Crx protein levels was accompanied by a substantial increase in
the presence of Rhodopsin and Recoverin (Fig. 2a), placing cells at
day 26 of culture at a stage similar to the P4–P6 stage of development
(Supplementary Fig. 6). In vivo, both rod α-Transducin (Gnat1) and
Peripherin-2 protein levels increased substantially between P8 and
P12, coincident with the onset of outer-segment formation (Fig. 2b,c).
A similar pattern was observed in vitro; at day 28 there were few
positively labeled cells, but by day 36 the majority of cells were rod
α-Transducin and Peripherin-2 positive (Fig. 2d,e, respectively).
To further analyze the degree of similarity between the differentiation states of rod photoreceptors derived from ESCs and from
postnatal retinas, we compared their gene expression profiles by
microarray analysis. We used an adeno-associated viral vector
(pseudotype 2/9) carrying a GFP reporter under the control of
a Rhodopsin promoter (AAV2/9.Rhop.GFP) (Fig. 2f) to select the
rod photoreceptors. AAV2/9.Rhop.GFP-positive (Rhop.GFP+) rods
were sorted by fluorescence-activated cell sorting (FACS) at day 26 of
culture, day 34 of culture and P12 (Supplementary Fig. 7). The ESC-derived populations (days 26 and 34) expressed genes enriched in
transplantation-competent P4 rod photoreceptor precursors10, consistent with photoreceptor cell differentiation. Day 34 cultures showed
higher expression of genes encoding structural components of outer
segments and phototransduction, such as Gnat1, Rho, Pde6a
and Prph2. Hierarchical cluster analyses demonstrated that the
day 34 Rhop.GFP+ cells were more mature as they more closely resembled P12 photoreceptors than the earlier-stage day 26 cells (Fig. 2g).
Therefore, we sought to establish whether these late postnatal
Rhop.GFP+ rods formed outer segments. Peripherin-2, a marker
for outer segments in vivo, was found in the segment region of the
ESC-derived rods (Fig. 2h,i). However, although ultrastructural
examination of day 36 wEBs demonstrated the presence of inner-segment and cilium-like structures, no outer segments were observed
(Fig. 2j). Cross-sections of these structures showed inner segments
packed full of mitochondria (Fig. 2j,k) and a typical photoreceptor
cilium, which contained the 9+0 microtubular arrangement (Fig. 2j,l).
Together, these findings confirm the survival and differentiation of
photoreceptors derived by the wEB 3D protocol to a stage equivalent
to late postnatal development.
We next examined the capability of wEB ESC–derived photoreceptor
precursors to integrate into adult retina and form new
mature photoreceptors. To ensure a robust evaluation of integration
and maturation, we assessed expression of outer-segment proteins
after transplantation in recipient retinas deficient in these proteins.
We transplanted ~200,000 Rhop.GFP+ FACS-sorted precursors (days 26–29) by means of subretinal injection into the adult
Gnat1−/− mouse, a model of stationary night blindness, which lacks
rod function because of the absence of rod α-Transducin phototransduction protein27. Three weeks after transplantation, Rhop.GFP+-sorted photoreceptor precursors had migrated and integrated into the
recipient ONL (Fig. 3a). Integrated ESC-derived rods were correctly
oriented within the ONL and were usually found in small clusters,
a characteristic frequently seen in transplants using donor-derived
photoreceptor precursors2,9. Moreover, integrated rods displayed
morphological features typical of mature photoreceptors, including
inner and outer segments projected toward the host retinal pigment
epithelium (Fig. 3a) and rod spherules in the outer plexiform layer
(Fig. 3b). The identity and number of the integrated ESC-derived photoreceptors were established by counting GFP+ cells that also expressed
rod α-Transducin (Rhop.GFP+/Gnat1+). Integrated rods were found
predominantly around the cell mass near the injection site (Fig. 3c).
These cells were still present 6 weeks after transplantation.
Transplanted human ESC-derived cultures labeled with GFP
viruses have been reported to integrate within adult mouse retinas28.
The integrated cells resembled mouse photoreceptor cells in size,
and outer segments did not form after transplantation in the Crx−/−
mouse model28. As we recently demonstrated that it is possible
for contaminating viral particles to be injected with transplanted
cells and to label endogenous photoreceptors18, potentially leading to false-positive results, we formally excluded this possibility
in our experiments. To determine the number of virus-labeled host
photoreceptors after transplantation, we used a control ESC-derived CBA.YFP+ FACS-sorted neuronal population transduced
with an AAV2/9.Rhop.RFP virus and quantified the YFP−/RFP+
photoreceptors. Transplantation of CBA.YFP+/Rhop.RFP− FACS-sorted cells resulted in a negligible number of virally transduced host
photoreceptors (8 ± 2 photoreceptors per retina, n = 16). Without
FACS we noted significantly greater numbers of viral-labeled host
photoreceptors after transplantation (113 ± 29 versus 8 ± 2 photoreceptors; Mann-Whitney U, P < 0.0001, n ≥ 8) (Supplementary
Fig. 8a–c). Moreover, we also examined the transplantation of
unsorted ESC-derived mixed populations containing Rhop.GFP+
rods. We observed 113 ± 25 GFP-labeled photoreceptors in the
recipient ONL. However, only a small percentage of these cells (1 ±
0.6%) were verified to be ESC-derived Rhop.GFP+/Gnat1+ integrated
photoreceptors, suggesting that the majority of Rhop.GFP+/Gnat1−
cells were endogenous virus-labeled photoreceptors (Supplementary
Fig. 8d,e). In contrast, transplantation of a pure FACS-sorted
Rhop.GFP+ population showed that ~80% of ESC-derived Rhop.GFP+
integrated rods were also Gnat1+, similar to P4–8 Nrlp.GFP+ donor-derived transplants (Supplementary Fig. 9). These experiments highlight the importance of stringent controls to identify true ESC-derived
integrated photoreceptors.
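The reasoning behind this control can be made concrete with some rough arithmetic. The sketch below uses only the mean values quoted in the preceding paragraph and ignores the reported errors; the variable names are illustrative and do not come from the study.

```python
# Rough arithmetic on the means quoted in the text (errors ignored).
# All variable names are illustrative, not taken from the paper.

gfp_labeled_unsorted = 113           # GFP+ cells in the recipient ONL, unsorted transplant
gnat1_pos_fraction_unsorted = 0.01   # ~1% of these were Rhop.GFP+/Gnat1+

# Approximate number of verified ESC-derived rods among the GFP+ cells.
true_integrated = gfp_labeled_unsorted * gnat1_pos_fraction_unsorted
# Remainder: Gnat1-negative cells, interpreted in the text as virus-labeled host rods.
likely_host_labeled = gfp_labeled_unsorted - true_integrated

# Host-cell labeling without versus with FACS (mean photoreceptors per retina).
host_labeled_without_facs = 113
host_labeled_with_facs = 8
fold_reduction = host_labeled_without_facs / host_labeled_with_facs

print(f"verified ESC-derived rods (unsorted): ~{true_integrated:.1f}")
print(f"likely virus-labeled host cells:      ~{likely_host_labeled:.1f}")
print(f"fold reduction in host labeling with FACS: ~{fold_reduction:.0f}x")
```

On these means, roughly one GFP-labeled cell per unsorted transplant would be a verified ESC-derived rod, which is why GFP signal alone is not sufficient evidence of integration.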
Early postnatal rod precursors integrate into the host ONL with
greater efficiency than do embryonic, late postnatal or adult mature
photoreceptors1. To determine whether ESC-derived photoreceptor
precursors behave in a similar manner, we transplanted FACS-sorted
Rhop.GFP+ cells at stages equivalent to early postnatal (days 26 and 29
in culture) and late postnatal (day 34 in culture) retina. The number
of integrated Rhop.GFP+/Gnat1+ photoreceptors from day 26 (420 ±
98 photoreceptors, n = 16) and day 29 (236 ± 44 photoreceptors,
n = 19) cultures was significantly greater than that obtained from
day 34 cultures (P < 0.05, ANOVA; 30 ± 6 photoreceptors, n = 24)

[Figure 2 image, panels a–l. Recoverable labels: (a) Crx/Rhodopsin and
Recoverin/Rhodopsin staining at days 18, 20, 22, 24, 26 and 28; (b–e) DAPI with
rod transducin or Peripherin in P8 and P12 retinas and in day 28 and day 36
cultures (ONL, INL); (f) labeling and sorting schematic (AAV2/9.Rhop.GFP; FACS
for Rhodopsinp.GFP; plot axes GFP Log and RFP Log; rods; IS; OLM); (g) heat map
columns day 26, day 34 and P12 Rhop.GFP, rows Gnat1, Rcvn, Pde6a, Rho, Pde6g,
Prph2, Rom1, Nr2e3, Rpgrip1, Pde6b, Nrl, Crx; (h,i) day 36, DAPI, Rhop.GFP,
Peripherin; (j–l) electron micrographs with scale labels 500 nm, 100 nm and
300 nm.]

Figure 2  Time course of photoreceptor genesis in wEB 3D differentiation system. (a) Temporal expression of Crx and Rhodopsin (red and
green, respectively) and Recoverin- and Rhodopsin-positive photoreceptors (red and green, respectively) at different time points of culture.
(b–e) Immunohistochemical analysis for rod α-Transducin (b,d, red) and Peripherin-2 (c,e, red) in P8 and P12 retinas and ESC-derived
photoreceptors at day 28 and 36 of culture, respectively. (f) Schematic of viral labeling and FACS. Light image of day 29 wEBs showing areas of
neuroepithelium (black arrows). Fluorescent image of Rhop.GFP+ wEBs (white arrows). Sections confirmed Rhop.GFP+ in the neuroepithelium.
Representative FACS plot of Rhop.GFP+ photoreceptors (green) selected by flow cytometry. (g) Hierarchical clustering and heat map of 12
photoreceptor-associated transcripts at days 26 and 34 ESC-derived and P12 donor-derived Rhop.GFP+ cells. (h,i) Day 36 viral-labeled Rhop.GFP+
photoreceptors showing Peripherin-2 (red) at the base of the inner segments. High magnification of a single Rhop.GFP+ photoreceptor stained for
Peripherin-2 (i, arrowhead). (j) Ultrastructural sagittal section of a day 36 photoreceptor showing inner segment–like structures (black line),
cilium-like (dashed line) structures and the lack of outer segment (black arrowhead). (k,l) Representative images showing transverse sections through an
inner segment containing many mitochondria (k, asterisks) and a photoreceptor cilium with a 9+0 microtubular organization (l). Nuclei were stained
with DAPI (blue). Scale bars, 3 µm (i), 25 µm (a–e, f insert, h) and 200 µm (f).

[Figure 3 image, panels a–p. Recoverable labels: stains DAPI, Rhop.GFP, rod
transducin, Peripherin, Rhodopsin, PKCα, Dystrophin; layers INL, ONL, OPL, IS,
OS, SRS; (d) y-axis: number of integrated Rhop.GFP+/Gnat1+ photoreceptors per
eye, x-axis: stage of culture (day 26, day 29, day 34); (e–g) recipients
Gnat1−/−, Prph2rd2/rd2 and Rho−/−; (n,o) calcium responses to DCPG, DCPG after
wash, DCPG + CPPG and NMDA in integrated Rhop.GFP+, Gnat1−/− and wild-type
photoreceptors, with N (eyes) and n (cells) given per condition in the figure;
trace scale 20% ∆F/F, 50 s.]

Figure 3  Integration and connectivity of ESC-derived photoreceptor precursors. (a,b) Rhop.GFP+
integrated photoreceptors showing mature
morphology with outer segments (OS) stained
for rod α-Transducin (a, inset) and spherule
formation in the outer plexiform layer (OPL) (b,
arrowhead). INL, inner nuclear layer; ONL, outer
nuclear layer; IS, inner segment. (c) Rhop.GFP+/
Gnat1+ integrated cells close to the cell mass
in the subretinal space (SRS). (d) Histogram
showing the number of Rhop.GFP+/Gnat1+
ESC-derived integrated rods from transplants of
days 26, 29 and 34 of culture. Error bars,
mean ± s.e.m; ANOVA, *P < 0.05, ***P < 0.001.
(e–g) Integration of ESC-derived Rhop.GFP+/Gnat1+
photoreceptors into the Gnat1−/− (e), Prph2rd2/rd2 (f)
and Rho−/− (g) degenerate models as
demonstrated by rod α-Transducin, Peripherin-2
and Rhodopsin segment staining, respectively
(red). (h) Rhop.GFP+/Gnat1+ integrated rod
spherule in close proximity to bipolar cells
(PKCα+, red). (i) Rhop.GFP+ rod spherule
localized with ribbon synaptic marker
Dystrophin (red). (h,i, insets) High-magnification
single confocal sections of boxed region. (j) 3D
confocal image of Gnat1−/− retinal flatmount
showing Rhop.GFP+ integrated rod stained for
rod α-Transducin (red) and synaptic marker
Ribeye (purple). (k–m) 3D reconstruction of the
integrated rod, highlighting morphology and
arrangement of the rod spherule and ribbon
synapse (white arrows). (n) Intracellular calcium
changes in integrated Rhop.GFP+, Gnat1−/− host
and WT photoreceptors are similarly evoked by
the mGluR8 agonist DCPG and blocked by the
specific antagonist CPPG. Error bars, mean
± s.e.m. (o,p) Mean traces (o) of integrated
Rhop.GFP+ (white circles) and recipient
photoreceptors (yellow circles) shown in (p).
N = number of eyes; n = number of cells
imaged. Nuclei were stained with DAPI (blue).
Scale bars, 3 µm (i,l,m), 5 µm (e–g,j),
10 µm (a,b,h) and 25 µm (c,p).

(Fig. 3d and Supplementary Fig. 9b), indicating that the developmental stage of the
donor photoreceptor is important in determining its ability to integrate.
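To put the stage dependence in proportion, the mean counts reported above for day 26, day 29 and day 34 donor cells can be expressed relative to the day 34 value. This is a minimal sketch on the quoted means only; it ignores the s.e.m. and sample sizes, and the names used are illustrative.

```python
# Mean integrated Rhop.GFP+/Gnat1+ photoreceptors per eye, as quoted in the text.
mean_integrated = {"day 26": 420, "day 29": 236, "day 34": 30}

# Express each stage relative to the day 34 (late postnatal-equivalent) mean.
reference = mean_integrated["day 34"]
fold_vs_day34 = {stage: count / reference for stage, count in mean_integrated.items()}

for stage, fold in fold_vs_day34.items():
    print(f"{stage}: {mean_integrated[stage]} cells, {fold:.1f}x the day 34 mean")
```

The ratios are descriptive only; the statistical comparison is the ANOVA reported in the text.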
We have recently shown, by transplanting
Nrlp.GFP+ rods derived from the early postnatal retina into mouse models of degeneration,
that different disease environments have distinct and marked impacts
on the morphology of transplanted photoreceptors3. To confirm the
identity of the ESC-derived integrated rods in other models lacking
endogenous, photoreceptor-specific proteins, and to determine whether
ESC-derived precursors can integrate into different disease environments, we transplanted day 29 Rhop.GFP+ cells into two additional
models of inherited retinal degeneration. Compared with transplantation into Gnat1−/− mice, far fewer ESC-derived rods integrated into
the ONL of 2-month-old Peripherin-2 null mutant (Prph2rd2/rd2) mice
and 3-week-old rhodopsin knockout (Rho−/−) mice (Supplementary
Fig. 10). Transplanted rods integrated within the Gnat1−/− recipient
formed long outer segments (Fig. 3e). In contrast, integrated cells in
the Prph2rd2/rd2 and Rho−/− models formed shorter segments (Fig. 3f,g,
respectively), consistent with our previous findings in transplanting
donor-derived precursors3. Notably, ESC-derived photoreceptors
expressed the outer-segment proteins missing in the endogenous rods
in each of the respective knockout models examined.
To establish whether integrated ESC-derived rod precursors were
able to connect with the existing retinal circuitry, we examined transplanted eyes for the presence of synapses. Integrated Rhop.GFP+
photoreceptors extended basal processes that terminated as round,
synaptic bouton–like structures, which were in close proximity to the
afferent terminals of PKCα+ rod bipolar cells in the outer plexiform
layer (Fig. 3h). These synapse-like structures expressed the rod ribbon
synapse markers Dystrophin and Ribeye (Fig. 3i,j). 3D reconstruction of individual integrated cells in retinal flat mounts demonstrated
the correct morphology and anatomical localization of integrated
Rhop.GFP+/Gnat1+ cells and highlighted the correct spatial alignment
and morphology of the ribbon synapse in relation to the rod spherule
(Fig. 3j–m and Supplementary Movie 1).

Finally, we assessed whether the transplanted cells could respond
to pharmacological stimuli in a manner similar to that of endogenous
rods. In the ONL, the metabotropic glutamate receptor mGluR8 is
expressed on photoreceptor presynaptic terminals, and its activation
leads to a characteristic decrease in intracellular calcium in these
cells29,30 (Fig. 3n–p). (S)-3,4-dicarboxyphenylglycine (DCPG),
an agonist with high specificity for the mGluR8 subtype, consistently evoked appropriate decreases in intracellular calcium in both
endogenous Gnat1−/− rods and integrated ESC-derived Rhop.GFP+
precursor cells that were virtually indistinguishable from those seen
in wild-type rods (Fig. 3n,o), both in their profile and in the proportion of cells responding. In all cases, these decreases could be
blocked by the mGluR8-specific antagonist (RS)-α-cyclopropyl-4-phosphonophenylglycine (CPPG) (Fig. 3n,o). Conversely, specific
agonists of another glutamate receptor, the N-methyl-d-aspartate
(NMDA) receptor, which is expressed by other retinal neurons but
not by photoreceptors, had no effect (Fig. 3n).
In this study, we generated optic cup–like structures from
3D-cultured mouse ESCs and isolated from the derived retinas a
population of pure photoreceptor precursors capable of integrating and maturing into new photoreceptors within a recipient retina
after transplantation. Notably, we establish that rods obtained from
stages in culture similar to P4–8 integrate more efficiently than do
mature rods expressing later phototransduction markers. Compared
with photoreceptors obtained from our previously described 2D
method18, the 3D-differentiated cells expressed significantly greater
levels of postnatal rod genes (Supplementary Fig. 11), supporting
our hypothesis that developmental stage is crucial for photoreceptor
integration1,18. Here, the ESC origin of the integrated photoreceptors
was verified by detection of proteins absent in the photoreceptors
of the recipient retinas in three different disease models. We also
confirmed that transplantation of a heterogeneous population of
ESC-derived cells leads to greatly reduced numbers of integrated
photoreceptors, consistent with our earlier observations in transplanting donor-derived cells2,18.
In our experiments, we transplanted ~200,000 ESC-derived photoreceptor precursors and observed ~0.3% of these cells integrating
into the retina. The number of integrated photoreceptors and their
morphology were similar to those in our earlier studies using donor-derived photoreceptor precursors1,9,31. Assessment of visual function
will require further optimization to achieve higher numbers of integrating cells. Based on our earlier work, reliable electroretinographic
responses are achieved only with rescue of 150,000 functioning rods.
Restoration of visual function was demonstrated in the Gnat1−/−
model containing an average of 25,000 integrated cells (through testing by means of a water maze and an OptoMotry device), yet electroretinographic responses were not detectable even with this number
of new cells2. We find it difficult to reconcile our findings with previous reports of restoration of mouse electroretinographic responses
with as few as 3,000 integrated human ESC-derived Nrl+ cells that do
not form outer segments or with transplants of mixed cell populations
derived from mouse iPSCs28,32.
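The gap between the integration achieved here and the cell numbers associated with functional readouts can be estimated from the figures quoted in this paragraph. The sketch below is back-of-envelope arithmetic under the stated approximations (~200,000 cells transplanted, ~0.3% integrating); the variable names are illustrative.

```python
# Back-of-envelope arithmetic from the approximate figures quoted in the text.
transplanted_cells = 200_000   # ESC-derived precursors injected per eye
integration_percent = 0.3      # ~0.3% of transplanted cells integrated

integrated_cells = transplanted_cells * integration_percent / 100

behavior_benchmark = 25_000    # integrated cells in the earlier behavioral study
erg_benchmark = 150_000        # functioning rods cited for reliable ERG responses

fold_to_behavior = behavior_benchmark / integrated_cells
fold_to_erg = erg_benchmark / integrated_cells

print(f"integrated cells per eye: ~{integrated_cells:.0f}")
print(f"fold increase needed for behavioral benchmark: ~{fold_to_behavior:.0f}x")
print(f"fold increase needed for ERG benchmark: ~{fold_to_erg:.0f}x")
```

On these approximations, integration would need to rise by roughly one to two orders of magnitude before functional assessment becomes informative, in line with the call for further optimization.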
In conclusion, the 3D culture system described here provides
a robust and consistent method of differentiating ESCs into photoreceptor
precursors8. Our data now demonstrate unequivocally that
ESC-derived photoreceptor precursor cells have the capability to
integrate and mature to form outer segments and synaptic connections after transplantation into the degenerate adult mouse retina. We
present clear evidence to support the utility of ESC-derived cells for
photoreceptor replacement therapy. Similar 3D protocols have been
developed to generate photoreceptors from human ESCs33; future
transplantation studies will seek to establish that similar integration
can be achieved using human cells.
Methods
Methods and any associated references are available in the online
version of the paper.
Accession codes. ArrayExpress: E-MEXP-3921 and E-MEXP-3922.
Note: Supplementary information is available in the online version of the paper.
Acknowledgments
This work was supported by the Medical Research Council UK (MR/J004553/1,
G0901550), RP Fighting Blindness (GR566), The Miller’s Trust and Moorfields
Eye Charity through a generous private donation. A.G.-C. is a Wellcome Trust
PhD student (087256/Z08/Z). R.A.P. is a Royal Society University Research Fellow.
J.C.S. is supported by Great Ormond Street Hospital Children’s Charity. R.R.A. is
partly funded by the Department of Health’s National Institute for Health Research
Biomedical Research Centre at Moorfields Eye Hospital and Alcon Research
Institute. We thank A. Eddaoudi, A. Rose and T. Adejumo for FACS assistance;
S. Azam and S. Haria for virus purification; S. Sharma for performing the
Affymetrix microarray; and P. Munro for EM assistance. The mouse EK.CCE
ESC line34 (129/SvEv) was a kind gift of E. Robertson. The following mouse
lines were kind gifts: Gnat1−/− was provided by J. Lem, Tufts University School
of Medicine; Prph2rd2/rd2 by G. Travis, UCLA; Rho−/− by P. Humphries, Trinity
College Dublin and Nrlp.GFP+/+ by A. Swaroop, University of Michigan.
AUTHOR CONTRIBUTIONS
A.G.-C. and E.L.W. contributed equally to the concept, design, execution and
analysis of all experiments and manuscript writing. R.A.P. performed subretinal
transplantation and calcium imaging, and contributed to the concept and design
of the experiments, funding and manuscript writing. Y.D. performed subretinal
transplantations and histological processing. L.S.C., A.G. and J.L. contributed
to experimental execution. C.J.C. performed IMARIS reconstruction. A.N. and
S.J.I.B. provided technical assistance. M.H. performed microarray data analysis.
J.W.B.B., A.J.S., J.C.S. and R.R.A. contributed to the concept and design of the
experiments, funding and to manuscript writing.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
Reprints and permissions information is available online at http://www.nature.com/
reprints/index.html.
1. MacLaren, R.E. et al. Retinal repair by transplantation of photoreceptor precursors.
Nature 444, 203–207 (2006).
2. Pearson, R.A. et al. Restoration of vision after transplantation of photoreceptors.
Nature 485, 99–103 (2012).
3. Barber, A.C. et al. Repair of the degenerate retina by photoreceptor transplantation.
Proc. Natl. Acad. Sci. USA 110, 354–359 (2013).
4. Lamba, D.A., Karl, M.O., Ware, C.B. & Reh, T.A. Efficient generation of retinal
progenitor cells from human embryonic stem cells. Proc. Natl. Acad. Sci. USA 103,
12769–12774 (2006).
5. Osakada, F. et al. Toward the generation of rod and cone photoreceptors from
mouse, monkey and human embryonic stem cells. Nat. Biotechnol. 26, 215–224
(2008).
6. Eiraku, M. et al. Self-organizing optic-cup morphogenesis in three-dimensional
culture. Nature 472, 51–56 (2011).
7. Bartsch, U. et al. Retinal cells integrate into the outer nuclear layer and differentiate
into mature photoreceptors after subretinal transplantation into adult mice. Exp.
Eye Res. 86, 691–700 (2008).
8. Lakowski, J. et al. Cone and rod photoreceptor transplantation in models of the
childhood retinopathy Leber congenital amaurosis using flow-sorted Crx-positive
donor cells. Hum. Mol. Genet. 19, 4545–4559 (2010).
9. Pearson, R.A. et al. Targeted disruption of outer limiting membrane junctional
proteins (Crb1 and ZO-1) increases integration of transplanted photoreceptor
precursors into the adult wild-type and degenerating retina. Cell Transplant. 19,
487–503 (2010).
10. Lakowski, J. et al. Effective transplantation of photoreceptor precursor cells
selected via cell surface antigen expression. Stem Cells 29, 1391–1404 (2011).
11. Eberle, D., Schubert, S., Postel, K., Corbeil, D. & Ader, M. Increased integration
of transplanted CD73-positive photoreceptor precursors into adult mouse retina.
Invest. Ophthalmol. Vis. Sci. 52, 6462–6471 (2011).
12. Singh, M.S. et al. Reversal of end-stage retinal degeneration and restoration of
visual function by photoreceptor transplantation. Proc. Natl. Acad. Sci. USA 110,
1101–1106 (2013).

13. Meyer, J.S. et al. Modeling early retinal development with human embryonic and
induced pluripotent stem cells. Proc. Natl. Acad. Sci. USA 106, 16698–16703
(2009).
14. Hirami, Y. et al. Generation of retinal cells from mouse and human induced
pluripotent stem cells. Neurosci. Lett. 458, 126–131 (2009).
15. Meyer, J.S. et al. Optic vesicle-like structures derived from human pluripotent stem
cells facilitate a customized approach to retinal disease treatment. Stem Cells 29,
1206–1218 (2011).
16. Phillips, M.J. et al. Blood-derived human iPS cells generate optic vesicle-like
structures with the capacity to form retinal laminae and develop synapses. Invest.
Ophthalmol. Vis. Sci. 53, 2007–2019 (2012).
17. Osakada, F., Ikeda, H., Sasai, Y. & Takahashi, M. Stepwise differentiation of
pluripotent stem cells into retinal cells. Nat. Protoc. 4, 811–824 (2009).
18. West, E.L. et al. Defining the integration capacity of embryonic stem cell-derived
photoreceptor precursors. Stem Cells 30, 1424–1435 (2012).
19. Ali, R.R. & Sowden, J.C. Regenerative medicine: DIY eye. Nature 472, 42–43
(2011).
20. Martinez-Morales, J.R. & Wittbrodt, J. Shaping the vertebrate eye. Curr. Opin. Genet.
Dev. 19, 511–517 (2009).
21. Hyatt, G.A., Schmitt, E.A., Fadool, J.M. & Dowling, J.E. Retinoic acid alters
photoreceptor development in vivo. Proc. Natl. Acad. Sci. USA 93, 13298–13303
(1996).
22. Kelley, M.W., Williams, R.C., Turner, J.K., Creech-Kraft, J.M. & Reh, T.A. Retinoic
acid promotes rod photoreceptor differentiation in rat retina in vivo. Neuroreport
10, 2389–2394 (1999).
23. Lombardini, J.B. Taurine: retinal function. Brain Res. Brain Res. Rev. 16, 151–169
(1991).
24. Chen, S. et al. Crx, a novel Otx-like paired-homeodomain protein, binds to
and transactivates photoreceptor cell-specific genes. Neuron 19, 1017–1030
(1997).
25. Furukawa, T., Morrow, E.M. & Cepko, C.L. Crx, a novel otx-like homeobox gene,
shows photoreceptor-specific expression and regulates photoreceptor differentiation.
Cell 91, 531–541 (1997).

26. Blackshaw, S. et al. Genomic analysis of mouse retinal development. PLoS
Biol. 2, e247 (2004).
27. Calvert, P.D. et al. Phototransduction in transgenic mice after targeted deletion of
the rod transducin α-subunit. Proc. Natl. Acad. Sci. USA 97, 13913–13918
(2000).
28. Lamba, D.A., Gust, J. & Reh, T.A. Transplantation of human embryonic stem
cell-derived photoreceptors restores some visual function in Crx-deficient mice.
Cell Stem Cell 4, 73–79 (2009).
29. Koulen, P., Kuhn, R., Wässle, H. & Brandstätter, J.H. Modulation of the intracellular
calcium concentration in photoreceptor terminals by a presynaptic metabotropic
glutamate receptor. Proc. Natl. Acad. Sci. USA 96, 9909–9914 (1999).
30. Koulen, P. & Brandstätter, J.H. Pre- and postsynaptic sites of action of mGluR8a
in the mammalian retina. Invest. Ophthalmol. Vis. Sci. 43, 1933–1940 (2002).
31. West, E.L. et al. Pharmacological disruption of the outer limiting membrane leads
to increased retinal integration of transplanted photoreceptor precursors. Exp. Eye
Res. 86, 601–611 (2008).
32. Tucker, B.A. et al. Transplantation of adult mouse iPS cell-derived photoreceptor
precursors restores retinal structure and function in degenerative mice. PLoS
ONE 6, e18992 (2011).
33. Nakano, T. et al. Self-formation of optic cups and storable stratified neural retina
from human ESCs. Cell Stem Cell 10, 771–785 (2012).
34. Evans, M.J. & Kaufman, M.H. Establishment in culture of pluripotential cells from
mouse embryos. Nature 292, 154–156 (1981).
35. Gao, G.-P. et al. Rep/Cap gene amplification and high-yield production of AAV in
an A549 cell line expressing Rep/Cap. Mol. Ther. 5, 644–649 (2002).
36. Davidoff, A.M. et al. Purification of recombinant adeno-associated virus type 8
vectors by ion exchange chromatography generates clinical grade vector stock.
J. Virol. Methods 121, 209–215 (2004).
37. Luhmann, U.F.O. et al. Differential modulation of retinal degeneration by Ccl2 and
Cx3cr1 chemokine signalling. PLoS ONE 7, e35551 (2012).
38. Tschernutter, M. et al. Long-term preservation of retinal function in the RCS
rat model of retinitis pigmentosa following lentivirus-mediated gene therapy.
Gene Ther. 12, 694–701 (2005).

nature biotechnology  VOLUME 31  NUMBER 8  AUGUST 2013


ONLINE METHODS

3D ESC retinal differentiation culture. A mouse EK.CCE ESC line34
(129/SvEv; a kind gift of E. Robertson) was maintained as previously
described18. For 3D retinal differentiation, 3 × 10^4 dissociated ESCs were
resuspended per milliliter of differentiation medium (GMEM containing 1.5%
KSR, 0.1 mM NEAA, 1 mM pyruvate, 0.1 mM 2-mercaptoethanol), plated
into 96-well low-binding (Corning) plates and incubated at 37 °C, 5% CO2.
This was defined as day 0 of culture. Growth factor-reduced Matrigel (BD
Biosciences) was added to embryoid-body cell aggregates on day 1 of culture
to a final concentration of 2% (v/v). For wEB retinal differentiation, wEBs
were transferred into retinal maturation medium (DMEM/F12 Glutamax
containing N2 supplement and Pen/strep) at day 9, plated in low-binding
plates at a density of 6 wEBs/cm2 and incubated at 37 °C, 5% CO2. The medium
was changed every 2–3 d, with the addition of 1 mM taurine (Sigma) and
500 nM retinoic acid (Sigma) from day 14 of culture onward. For quantification
of early stages in culture the following criteria were used: neuroepithelia were
characterized by a continuous neuroepithelium around the whole circumference of the embryoid body; optic vesicle–like structures contained thickened
regions of neuroepithelium that protruded from the embryoid body; optic
cup–like structures included complete and incomplete invaginated embryoid
bodies and were characterized by the presence of a hinge region, whereby the
inward folding of the neuroepithelia produced an angle of less than 90°. For
further retinal differentiation, wEBs were transferred to fresh low-binding
plates at day 27 of culture, at a density of 3 wEBs/cm2, with 50% media changes
every 2–3 d. For experiments quantifying the number of host viral labeled
photoreceptors, a CBA.YFP ESC line (a variant of R1 ESCs; 7AC5/EYFP, from
ATCC) was differentiated using the protocol above.
Production of recombinant AAV2/9 Rhop.GFP/RFP. A pD10/Rhodopsin
promoter-GFP or pD10/Rhodopsin promoter-RFP construct containing
AAV-2 inverted terminal repeats was used to generate AAV2/9 Rhop.GFP or
Rhop.RFP. Recombinant AAV2/9 serotype particles were produced through
a previously described tripartite transfection method into HEK293T cells35,
followed by purification using ion exchange chromatography36. Viral particle
titers were determined using dot-blot analysis of purified virus DNA and
plasmid controls of known concentrations. wEBs were infected at day 22 of
culture with 1 × 10^10 viral particles per wEB in retinal maturation medium.
FACS analysis. wEBs were dissociated with 0.25% Trypsin at various time
points and FACS-sorted for Rhop.GFP+ or CBA.YFP+/Rhop.RFP− cells, and
collected in retinal maturation medium containing 10% FBS, for further analysis. Cell sorting was done on a MoFlo XDP (Beckman Coulter) fitted with a
200 mW 488 nm blue laser (adjusted to 150 mW) to excite GFP and RFP. GFP
was collected in the 530/40 nm channel and RFP in the 613/20 nm channel.
Real-time and RT-PCR analysis. RNA was extracted with RNeasy Micro/
Mini Kit (QIAGEN) and reverse-transcribed using QuantiTect Reverse
Transcription Kit (QIAGEN). The cDNA was amplified with gene-specific
primers (Supplementary Table 1). PCRs were conducted using at least three
separate RNA preparations. Real-time quantitative RT-PCR was performed
with a thermal cycler (7900HT; Applied Biosystems) as previously described18.
Reagents were obtained from Roche Diagnostics and primers were designed
for specific probe-binding regions using the Roche Universal Probe Library.
Samples were run in triplicate and at least three independent differentiation
cultures were analyzed.
Microarray. AAV2/9.Rhodopsin.GFP virus was subretinally injected in
P2-P4 wild-type mice and day 22 wEBs. Three independent experiments were
used to obtain P12, day 26 and 34 rod precursors, which were dissociated
as described above and isolated by FACS. Total RNA was isolated using a
mirVana miRNA Isolation Kit (Ambion) and labeled for Affymetrix whole
transcriptome microarray analysis using the Ambion WT expression kit
(Invitrogen) and Affymetrix Mouse Gene 1.0 ST genechips. Raw data were
normalized using RMA in Expression Console 1.2. Array QC (PCA, signal
distribution and controls) was performed using Expression Console and
Bioconductor. Hierarchical clustering and heat map representations were
performed using GeneSpring 12.5.


Photoreceptor transplantation. C57Bl/6J (Harlan, UK), Gnat1−/− (J. Lem,
Tufts University School of Medicine), Prph2rd2/rd2 (G. Travis, UCLA), Rho−/−
(P. Humphries, Trinity College Dublin) and Nrlp.GFP+/+ (A. Swaroop,
University of Michigan) mice were maintained in the animal facility at
University College London. Experiments were conducted in accordance with
the Policies on the Use of Animals and Humans in Neuroscience Research,
revised and approved by the ARVO statement for the Use of Animals in
Ophthalmic and Vision Research. For the dystrophic models, Gnat1−/− mice
received transplanted cells at 8–12 weeks of age, Rho−/− mice (n = 10) at
3–4 weeks and Prph2rd2/rd2 mice (n = 8) at 8 weeks of age. This was in keeping
with the optimal recipient age as determined by donor-derived precursor cell
transplantation into these dystrophic models3. All mice were kept on a standard 12 h light-dark cycle. wEBs from various stages of culture were dissociated
and FACS-sorted as described above and Rhop.GFP+ cells were resuspended at
a concentration of 200,000 cells/µl in sterile HBSS and DNase (0.05%) before
injection. Animals were anesthetized and surgery was performed under direct
ophthalmoscopy, as described previously1,2,18,36. Mice were euthanized 3 weeks
after transplantation.
Immunohistochemistry. wEBs and eye cups were fixed for 1 h in 4% paraformaldehyde (PFA) and embedded in OCT (Raymond A. Lamb Ltd.).
Cryosections were cut (18 µm thick) and all sections were collected for analysis.
For immunohistochemistry, sections were blocked in 5% goat serum and
1% bovine serum albumin in PBS. Primary antibody (Supplementary Table 2)
was incubated overnight at 4 °C. Sections were incubated with secondary
antibody for 2 h at room temperature, washed and counterstained with DAPI
(Sigma-Aldrich). Retinal flatmounts were stained by adapting a published
protocol37. Following dissection, the retinas were placed into 100% methanol
overnight, then blocked and stained in 1% BSA (Sigma-Aldrich), 3% Triton
X-100 and 5% goat serum. Alexa-Fluor 488, 546 and 633 secondary antibodies
(Invitrogen-Molecular Probes) were used at a 1:500 dilution.
Image acquisition. Retinal section and flatmount images were acquired by
confocal microscopy (Leica DM5500Q). A series of XY optical sections, 1.0 µm
apart, throughout the depth of the section were taken and built into a stack to give
a projection image. LAS-AF image software was used. 3D reconstruction of high-resolution Z-stack images was performed with Imaris software (Bitplane).
Electron microscopy. wEBs were fixed in 3% glutaraldehyde/1% PFA at
4 °C for 48 h and processed, as previously described9,18,38. Briefly, following
osmium fixation and ethanol dehydration, the specimens were embedded in
araldite and cured at 60 °C. Semithin (0.7 µm) and ultrathin (0.07 µm) sections were cut using a Leica ultracut S microtome fitted with an appropriate
diamond knife (Diatome histoknife Jumbo/Ultrathin). Ultrathin sections were
collected on copper grids (100 mesh, Agar Scientific), contrast-stained with 1%
uranyl acetate and lead citrate and analyzed using a JEOL 1010 transmission
electron microscope (80 kV).
Calcium imaging. Calcium imaging was performed using methods described
previously1 with minor modifications. Briefly, whole-mount neural retinas
were loaded with Fura-Red AM (15 µM, Molecular Probes)/Pluronic Acid
F127 (0.03% w/v, Sigma) in artificial cerebrospinal fluid (ACSF), which contained in mM: 119.0 NaCl, 26.2 HEPES, 11 D-glucose, 2.5 KCl, 1.0 K2HPO4,
2.5 CaCl2, 1.3 MgCl2 for 1 h at 37 °C and then de-esterified in ACSF alone
for 30 min at 37 °C. Retinas were transferred to an inverted microscope (SP2,
Leica) and held flat under a nylon-strung platinum wire ‘harp’, photoreceptor
side down, and perfused with oxygenated ACSF (36 °C) using a pressurized
perfusion system (Harvard Apparatus Ltd). Drugs were applied through the
perfusion system and included DCPG ((S)-3,4-dicarboxyphenylglycine,
40 µM), CPPG ((RS)-alpha-cyclopropyl-4-phosphonophenylglycine, 100 µM)
and NMDA (N-methyl-d-aspartate, 200 µM) (all Tocris). GFP+ cells were
located using epifluorescence before taking confocal XY images of DAPI,
GFP and Fura-Red AM to confirm the location of GFP+ cells within the
recipient ONL. Only cells with a highly condensed nucleus (typical of rods)
located within the recipient ONL were included in the analysis. Fura-Red fluorescence
was acquired at 4 s intervals and analyzed off-line. NB: when Fura-Red
is excited at 488 nm, its emission undergoes an increase in fluorescence
as [Ca2+]i decreases. Endogenous GFP− rods were randomly selected from
the Hoechst image. Changes in fluorescence were normalized against the
fluorescence at time 0 s and a change of >10% above baseline was considered
a response. Analysis was performed masked such that only the timing,
not the identity of the drug applied, was known at the time of analysis.
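
The response criterion described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' analysis code: the function names, the choice of the first sample as the time-0 reading and the example trace are all assumptions. Only the normalization to fluorescence at time 0 and the >10% threshold come from the text.

```python
def normalize_trace(trace):
    """Express each Fura-Red reading as a fractional change from the time-0 value."""
    baseline = trace[0]  # assumption: the first sample is the reading at time 0 s
    if baseline <= 0:
        raise ValueError("baseline fluorescence must be positive")
    return [(value - baseline) / baseline for value in trace]


def is_response(trace, threshold=0.10):
    """Return True if any reading rises more than `threshold` above baseline."""
    return any(change > threshold for change in normalize_trace(trace))


# Hypothetical trace sampled at 4 s intervals (arbitrary fluorescence units).
example_trace = [100.0, 101.0, 104.0, 113.0, 108.0]
print(is_response(example_trace))
```

Because Fura-Red fluorescence rises as [Ca2+]i falls, a positive call in this sketch corresponds to a decrease in intracellular calcium.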

Statistical analysis. All means are presented ± s.e.m., unless otherwise stated;
N, number of animals or independent experiments performed; n, number of
eyes, embryoid bodies (prior to day 9 of culture) or wEBs (after day 9 of culture)
examined, where appropriate. For assessment of integration efficiency, statistical analysis is based on at least three independent transplantation sessions (cell
preparation, FACS and transplantation). Statistical significance was assessed
using GraphPad Prism 5 software and denoted as *P < 0.05, **P < 0.01 and
***P < 0.001. Appropriate statistical tests were applied including t-test, Mann-Whitney U and ANOVA with Tukey’s correction for multiple comparisons.


Cell counts. Counts of integrated cells were taken 3 weeks after transplantation
using a fluorescence microscope (ObserverZ.1, Zeiss). The average number
of integrated cells per eye was determined by counting all the integrated
Rhop.GFP+ cells or rod α-Transducin+ cells in alternate serial sections through
each eye. This was doubled to give an estimate of the mean number of integrated cells per eye. Cells were considered to be integrated if the whole
cell body was correctly located within the outer nuclear layer, and at least
one of the following was visible: spherule synapse, inner/outer processes,
inner/outer segments. Animals were omitted from quantification analysis only
if there was clear evidence of an injection occurring intravitreally or if no cell
mass was evident in the subretinal space. Counts were done masked such that
the identity of the transplanted cells was not known.
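
The counting rule above (count alternate serial sections, then double) amounts to the arithmetic below. This is a sketch under stated assumptions: the function names and the example counts are hypothetical, and the averaging across eyes is simply one reading of "mean number of integrated cells per eye".

```python
def estimate_integrated_cells(counts_in_alternate_sections):
    """Estimate integrated cells in one eye from counts in every second serial section."""
    # Only half of the sections are counted, so the total is doubled.
    return 2 * sum(counts_in_alternate_sections)


def mean_cells_per_eye(per_eye_section_counts):
    """Average the per-eye estimates across all eyes examined."""
    estimates = [estimate_integrated_cells(c) for c in per_eye_section_counts]
    return sum(estimates) / len(estimates)


# Hypothetical counts of integrated cells per counted section, for two eyes.
eyes = [[10, 12, 8], [5, 5]]
print(mean_cells_per_eye(eyes))
```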

doi:10.1038/nbt.2643


letters

Single-cell gene expression analysis reveals genetic
associations masked in whole-tissue experiments


Quin F Wills1, Kenneth J Livak2, Alex J Tipping3, Tariq Enver3, Andrew J Goldson4, Darren W Sexton5 &
Chris Holmes1,6–8
Gene expression in multiple individual cells from a tissue
or culture sample varies according to cell-cycle, genetic,
epigenetic and stochastic differences between the cells.
However, single-cell differences have been largely neglected
in the analysis of the functional consequences of genetic
variation. Here we measure the expression of 92 genes affected
by Wnt signaling in 1,440 single cells from 15 individuals to
associate single-nucleotide polymorphisms (SNPs) with gene-expression phenotypes, while accounting for stochastic and
cell-cycle differences between cells. We provide evidence that
many heritable variations in gene function—such as burst size,
burst frequency, cell cycle–specific expression and expression
correlation/noise between cells—are masked when expression
is averaged over many cells. Our results demonstrate how
single-cell analyses provide insights into the mechanistic and
network effects of genetic variability, with improved statistical
power to model these effects on gene expression.
Human, clinical, genome-wide association studies (GWAS) have been
used to correlate genetic variants with disease and pharmacogenomic
traits, typically in a hypothesis-free manner. To investigate the mechanisms underpinning these statistical associations, molecular phenotyping technologies have also been used to associate traits such as
gene expression with genetic variants. Progress has been slow, with
tissue specificity, the low resolution of DNA genotypes and the technical challenges of assaying molecular traits all thought to be important
limiting factors1.
The most commonly studied molecular trait is baseline, whole-tissue gene expression. In these studies, genetic variants that are associated with variation in gene expression are referred to as expression
quantitative trait loci (eQTLs)2. A trait with ‘smooth’ variation between
individuals due to many small contributing factors is treated as being
quantitative (continuous) in epidemiological models. However, this
is a simplistic treatment of gene expression as a trait, as its genetic
perturbation need not result in the smooth variation of expression.
This is most commonly seen when cancer somatic variations result in
pronounced transcriptomic changes. Theoretical discussions around
concepts such as self-organizing criticality3 have proven popular as
attempts at explaining this observed complexity of gene expression
and its regulation. The same lack of ‘smoothness’ can be seen with
the environmental perturbation of gene expression—as recently
demonstrated with lipopolysaccharide-exposed immune cells4—and
is likely to become an important theme in the pharmacogenomics of
gene function.
Although a number of disease-associated genetic variants have
been shown to be linked to eQTLs5, in most cases no such correlation exists. We hypothesize that instead of influencing the average gene expression of a gene in a whole organism or a specific cell
type or tissue, these variants might change the cell-to-cell variability,
temporal dynamics or cell cycle dependence of gene expression at
the single-cell level.
Here we explore whether studying individual cells can begin to provide greater mechanistic insights into how SNPs quantitatively affect
gene function, as opposed to just assaying their effects on average
tissue expression. We refer to these variants as single-cell quantitative
trait loci (scQTLs).
To demonstrate the importance of cell-to-cell variability, we measured
gene expression of selected genes in fresh, naive B lymphocytes
from three individuals. Gene expression typically had much greater
variability between cells within an individual than between individuals,
and the distribution of gene expression values was very different
between individuals for some genes (Fig. 1a). The currently understood reasons for this large cell-to-cell noise are thermodynamic,
regulatory and cellular (Supplementary Fig. 1).
As a basis for studying the association of single-cell phenotypes
with genetic variants, we first sought to generate high-quality data in
a large population of cells (1,440 cells). We measured gene expression
using highly parallel qPCR validated with digital PCR, as single-cell
RNA sequencing still faces notable technical challenges6,7. We focused
on 92 genes affected by Wnt signaling, a major regulator of the cell
cycle that has been highlighted as a key pathway in clinical GWAS
and cancer epidemiology (Supplementary Fig. 2). Of the 92 genes
studied, 46 are listed in the Catalog of Genome-Wide Association
Studies (http://www.genome.gov/gwastudies). Wnt pathway genes
thus together form an ideal model for studying a clinically relevant
system where we expect to be able to identify genetic drivers of expression
behavior. In addition to Wnt system genes, classical reference
genes such as GAPDH were assayed and found to demonstrate substantial
variability in expression between cells. This makes it impossible
to use such genes for traditional data normalization, and has
been discussed elsewhere8.

Figure 1  Single-cell gene expression distributions. (a) Wnt pathway genes from
naive B lymphocytes in G0 phase of their cell cycle were assayed in three human
donors. Genes with expression in at least 50% of the cells are shown (the
percentage of cells without detectable expression is shown in parentheses).
The box plot provides the expression interquartile ranges, with the values for
the three human donors per gene plotted together. Each of the genes shows
greater variability between cells than between individuals, with a level of
‘noise’ that is different for each gene. (b) The expression distributions of
PPP2R1A are shown in 15 lymphoblastoid cell lines perturbed with a GSK3
inhibitor. Each cell line generated a median of 47 cells with PPP2R1A
expression. Each curve represents a kernel density estimate of the distribution
of the PPP2R1A gene expression of a single sample, with the gray curve
providing the combined distribution of all samples. The primary data, shown as
tick marks below the distributions, give the expression values for individual
cells, and the two red points are the mean estimates for a mixture of two
Poisson distributions. If the skewed distribution represents mostly promoter
switching (Supplementary Fig. 3), the Poisson means and mixing proportion can
be used as markers of gene burst size and frequency. The box plots provide the
expression distributions for cells in G1 versus early S phase. (c) Genes from
15 lymphoblastoid cell lines with statistically significant differential
expression (when chemically perturbed) are compared. The black lines show
change in median expression and the blue bars provide the inter-quartile
expression ranges across cells. From these examples it can be seen that genes
change not only whole-tissue expression but also their expression noise.
[Figure 1 axes: (a) and (c) Expression (log10 scaled); (b) Density versus
Perturbed expression (median of 47 cells with expression); (c) compares
Baseline with Wnt activated for GTSE1, ADAR, CDH1, VEGFC, MYC and TCF7L2.]

1Department of Statistics, University of Oxford, Oxford, UK. 2Fluidigm Corporation, South San Francisco, California, USA. 3Stem Cell Laboratory, UCL Cancer
Institute, University College London, London, UK. 4UEA Flow Cytometry Services, BioMedical Research Centre, School of Biological Sciences, University of East
Anglia, Norwich, UK. 5BioMedical Research Centre, Norwich Medical School, University of East Anglia, Norwich, UK. 6Wellcome Trust Centre for Human Genetics,
University of Oxford, Oxford, UK. 7Nuffield Department of Medicine, University of Oxford, Oxford, UK. 8Medical Research Council Harwell, Harwell Science and
Innovation Campus, UK. Correspondence should be addressed to Q.F.W. ([email protected]).

Received 8 February; accepted 14 June; published online 21 July 2013; doi:10.1038/nbt.2642

HapMap lymphoblastoid cell lines derived from 15 unrelated individuals of European descent9 were perturbed for 24 h with 10 µM
SB216763, a Wnt pathway agonist that inhibits GSK3 (ref. 10).
Forty-eight cells in each of 30 samples (15 baseline and 15 perturbed
cell lines) were assayed using a combination of flow cytometry and
microfluidic gene expression chips8 (Online Methods).
We observed variability both in the average gene expression values and in the distribution of expression values between the 15 individuals
in both the unperturbed and inhibitor-treated cell populations
(Fig. 1b,c). A number of parameters can be deduced from these data,
and their correlation with genetic variants can be tested.
Considering gene expression phenotypes in terms of single-cell
distributions provides important information about gene regulation,
as the noise from the regulation of transcription can be considered
separately from the noise of RNA turnover (Supplementary Fig. 1).
Thus constitutively expressed genes, for example, are expected to be
less noisy, demonstrating mostly the thermodynamic noise of RNA
turnover in the absence of variable regulation (Supplementary Fig. 1).
Analyzing four, public, single-cell RNA sequencing data sets6,7, we
found that this is indeed the case (Supplementary Notes, section 1).
A gene that is not constitutively expressed can be described in terms of
how often it switches ‘on’ (burst frequency), the amount of RNA produced when ‘on’ (burst size) and the rate at which its RNA is degraded.
A recent study11 of 8,000 human loci found that almost all of them
exhibited such ‘bursty’ expression, with certain loci modulating burst
frequency and others modulating burst size.
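
One common way to formalize burst frequency and burst size is the Gamma-Poisson (negative binomial) model that this letter itself adopts, with Gamma shape κ tracking burst frequency and Gamma scale φ tracking burst size. The short derivation below is a sketch under that parameterization; it is standard mixture algebra, not a reproduction of the authors' Supplementary Notes.

```latex
\begin{align}
\lambda &\sim \mathrm{Gamma}(\kappa, \varphi), &
X \mid \lambda &\sim \mathrm{Poisson}(\lambda), \\
\mathbb{E}[X] &= \mathbb{E}[\lambda] = \kappa\varphi, &
\mathrm{Var}[X] &= \mathbb{E}[\lambda] + \mathrm{Var}[\lambda]
                 = \kappa\varphi + \kappa\varphi^{2}, \\
\text{Fano factor} &= \frac{\mathrm{Var}[X]}{\mathbb{E}[X]} = 1 + \varphi .
\end{align}
```

Under these assumptions, raising κ (burst frequency) scales the mean but leaves the Fano factor unchanged, whereas raising φ (burst size) increases both the mean and the Fano factor.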
Our data suggest that the genes in this study differ from each other
in terms of burst size (Fig. 2a). Whereas an increase in both burst
size and frequency elevates mean gene expression, an increase in
only burst size raises the expression variance between cells relatively
more than the expression mean. This is evidenced by an increase in
the expression Fano factor (variance/mean). As genetic variants are
expected to affect these dynamics, we propose that it is important
for eQTL studies to begin considering expression dynamics in terms
of parameters from models describing cell-to-cell variability. Gene
expression between cells has been described as being log-Normal
or Gamma distributed12,13. Although the log-Normal model maps
to the standard Gaussian distribution, an advantage of the Gamma
model is that its parameters relate directly to gene burst frequency and
size (Supplementary Notes, section 2). We suggest that a more complete model should describe the Poisson-distributed thermodynamic
contribution to a gene’s noise and its mixing owing to gene bursting (Supplementary Figs. 3 and 4). Gene expression could, thus, be
modeled as overdispersed Poisson noise (variance greater than would
be expected with Poisson noise). For this, one suitable distribution
is the negative binomial, which is equivalent to a Gamma-Poisson
continuous mixture model where the Gamma distribution parameters of κ (shape) and φ (scale) increase with burst frequency and
size, respectively. For this work, a discrete three-parameter Poisson
mixture was used as an approximation to the slower fitting, four-parameter,
Beta-Poisson mixture (Supplementary Fig. 4). Such a Poisson mixture model
not only describes the long-tailed behavior described by the negative
binomial, but allows for the expected expression bimodality in genes with
low burst frequency. These models and their rationales are further detailed
in the Online Methods and Supplementary Notes, section 2.
Gene expression distributions can also be described in terms of
heterogeneous cell subpopulations. We considered cells in different stages
of the cell cycle, and their varying proportions between samples. Using
flow cytometry, we excluded cells with increased DNA content, as would be
expected in the S and G2 cell cycle phases. We further subdivided cells
into G1 and early S-phase based on their expression of GTSE1, a cell
division molecular switch that becomes highly expressed in the S and G2
phases14. Almost two-thirds of the genes demonstrated altered expression
between G1 and early S-phase (PPP2R1A is shown in Fig. 1b), raising the
question of how many associations are driven by differences in cell cycle
subpopulation proportions between samples. Per cell culture, the proportion
of cells without increased DNA content was found to be significantly
anti-correlated with cell density after 48 h of growth (Spearman’s ρ,
−log10P = 5.00). We used this as a marker of cell line growth, which is a
known confounder15,16 and was hence adjusted for in the SNP associations.
Growth was noted to be sample specific, batch specific, and correlated with
the expression and noise of several genes (Supplementary Fig. 5). Counts of
active Epstein-Barr virus replication (the agent used to immortalize the
cells) were assessed but did not correlate with growth.

Figure 2  Properties of gene expression
noise. (a) Fano factors (variance/mean) of the
study’s genes are plotted against their mean
expression. Orange points are genes better
fitted with Poisson than overdispersed
Poisson distributions (Online Methods).
The Fano factors of highly expressed genes
are proportionally higher, in keeping with an
overdispersed Gamma-Poisson model of gene
expression (Supplementary Fig. 4). The Fano
factor for a Gamma-Poisson model increases
by φ + 1, where φ is the scale parameter of the
Gamma distribution, which increases with gene
burst size. If genes differ by burst frequency,
rather than burst size, the data would scatter
along a line parallel to the Poisson line.
(b) Baseline cell-to-cell expression correlations
of |ρ| > 0.5 are shown for ~200 cells from
sample GM10860. Genes are ordered clockwise
according to increasing number of correlations
(network connectivity). Red edges are
negative correlations, whereas green edges are
positive correlations (ρ > 0.7 are dark green).
Correlations that increase with perturbation
are plotted with bold lines. The right-hand
side cluster is a hub of highly connected
genes that tend to increase their correlations
with perturbation. We refer to this as a ‘noise
regulon’. Notably, the regulon correlations vary
without detectable change in mean expression
of any genes except ADAR.
[Figure 2 axes and labels: (a) Fano factor (log10 scaled) versus Mean
expression (log10 scaled), with reference lines labeled Poisson and
Gamma-Poisson; (b) gene network ordered by increasing connectivity, with
asterisks marking the noise regulon.]

Figure 3  The heritability of single-cell
expression. As shown in Figure 1b, the
single-cell expression distribution of PPP2R1A
is highly variable between individuals,
suggesting heritable drivers. (a) PPP2R1A has a
nominally significant additive SNP association
with its baseline SOX17 correlation. The left-hand
side plot shows each individual’s genotype
versus their PPP2R1A-SOX17 correlation. The
right-hand side plots show the correlations for
two selected points. In both plots, the x-axis is
the ranked PPP2R1A expression and the y-axis
is the ranked SOX17 expression. (b) Single-cell
correlations are a useful phenotype to study
heritable gene relationships, but more complete
interpretation requires temporal studies.
Gene expression correlations have time lags
and, as shown for two theoretical genes, this may be small (top plot) or large (bottom plot). The altered SOX17 correlation in a may simply represent a
change in the time lag with the putative altered PPP2R1A transcription rate by rs8111131. (c) The mean perturbed correlations of 21 genes in T allele
homozygotes for PPP2R1A are ordered clockwise by increasing Spearman correlation. Those genes with ρ > 0.5 are in green, those with ρ > 0.6 are in
orange, whereas those with ρ > 0.7 are in violet. All correlations decrease in individuals with a C allele of rs9304726, with those genes connected by
dashed lines dropping below a mean correlation of 0.5.
[Figure 3 axes and labels: (a) SOX17 baseline correlation versus rs8111131
(additive) genotype (A/A, A/G, G/G); (b) Expression versus Time; (c) genes
ordered by increasing correlation around PPP2R1A.]

In addition to considering expression distributions and cell
cycle subpopulations, single-cell gene expression can be used to
generate an expression network per sample. This is in contrast to
most systems biology approaches in human genetic epidemiology
that describe a network of all samples combined, rather than the
more explicit comparison of network parameters between samples.
Gene expression noise can thus be treated as a third form of perturbation,
together with genetic and chemical perturbation.
Figure 2b plots an example of cell-to-cell gene correlations in ~200
cells from one of the lymphoblastoid cell lines, for genes detected
in at least 50% of the cells. This network provides an example of
expression behavior detected only at the single-cell resolution: the
correlated and anti-correlated expression between cells. One can
define ‘noise regulons’, that is, groups of genes co-regulated within
this expression heterogeneity, whose Spearman correlations alter
with chemical perturbation, but without detectable change in mean
(whole-tissue) expression.
For the SNP association testing we considered how such noisy
correlations (co-expression of different genes between genetically
homogeneous cells) of genes vary between individuals. We also
considered how the network connectivity of each gene varies
(the number of genes it correlates with).
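
The two network phenotypes just described, pairwise cell-to-cell correlation and per-gene connectivity, can be illustrated with a small standard-library sketch. The gene names and expression values are hypothetical, and the tie handling (average ranks) and the |ρ| > 0.5 cutoff applied to every gene pair are assumptions made for illustration; the authors' own implementation is not given in this section.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant gene carries no rank information
    return cov / (var_x * var_y) ** 0.5


def connectivity(expression, threshold=0.5):
    """Count, for each gene, the other genes with |rho| above the threshold."""
    genes = list(expression)
    return {
        g: sum(
            1
            for h in genes
            if h != g and abs(spearman(expression[g], expression[h])) > threshold
        )
        for g in genes
    }


# Hypothetical expression values for five cells from a single sample.
cells = {
    "GENE_A": [1, 2, 3, 4, 5],
    "GENE_B": [2, 4, 5, 8, 9],
    "GENE_C": [9, 7, 6, 3, 1],
    "GENE_D": [3, 1, 3, 2, 3],
}
print(connectivity(cells))
```

Computing these values separately for each individual's cells gives one correlation strength and one connectivity count per gene per sample, which can then be treated as phenotypes in an association test.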
Overall, for each gene the following phenotypes were measured
and associated with SNPs within 50 kb of the gene: (i) whole-tissue
expression, measured as the mean value over all cells; (ii) expression
heterogeneity/noise, measured as the gene’s Fano factor; (iii) burst
size, inferred from a discrete Poisson mixture; (iv) burst frequency,
inferred from a discrete Poisson mixture; (v) individual Spearman
correlation strengths with the five most correlated genes; (vi) network connectivity, measured as number of correlations of |ρ| > 0.5;
(vii) G1 and S-phase expression based on detectable GTSE1 expression and (viii) the number of cells for which expression could not
be detected.
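Several of the phenotypes listed above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy expression values, the use of population variance in the Fano factor and the treatment of a zero value as "not detected" are all assumptions made for illustration, and the burst-size and burst-frequency phenotypes (which require fitting a Poisson mixture) are left out.

```python
from statistics import mean, pvariance

def fano_factor(values):
    """Expression noise as variance / mean (population variance assumed)."""
    return pvariance(values) / mean(values)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks.
    Undefined (division by zero) if either input is constant."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def connectivity(gene, expression, threshold=0.5):
    """Number of other genes whose |rho| with `gene` exceeds the threshold."""
    return sum(
        1
        for other, values in expression.items()
        if other != gene and abs(spearman(expression[gene], values)) > threshold
    )

def undetected(values):
    """Number of cells with no detectable expression (zero assumed here)."""
    return sum(1 for v in values if v == 0)

# Hypothetical expression values for three genes across six cells.
cells = {
    "GENE_A": [0, 2, 4, 6, 8, 10],
    "GENE_B": [1, 2, 3, 5, 8, 13],
    "GENE_C": [9, 7, 0, 6, 1, 4],
}

summary = {
    gene: {
        "whole_tissue": mean(values),
        "fano": fano_factor(values),
        "connectivity": connectivity(gene, cells),
        "undetected": undetected(values),
    }
    for gene, values in cells.items()
}
```

Each entry of `summary` is then one row of per-gene phenotypes for one individual; in the study these would be compared across individuals and tested against nearby SNPs.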
The motivation for the latter phenotype is that gene expression may
be too low to be reliably detected, and thus zero-inflated, as shown in
Supplementary Figures 6 and 7. All phenotypes were analyzed in the
baseline state, perturbed state and as the log ratio of the perturbed to
baseline states. The top 374 SNP associations (above −log10P = 3) are
tabulated with their corresponding P values in Supplementary Table 1.
To control for multiple testing, we considered and permuted only
the most statistically significant association per phenotype. For each
permutation, the maximum −log10P was used to generate a null distribution of most significant P values, and so provide a family-wise error
correction. As most of the genes are affected by a single pathway (that
is, are not independent), a conservative global significance threshold of
−log10P = 4 was used. We found 47 significant associations (0.9% of
all association tests), which are detailed in Supplementary Table 2,
where seven genes found in clinical GWAS studies that are without
previously known eQTLs are highlighted. When considering only
whole-tissue expression (and reducing the multiple testing threshold
to −log10P = 3), this resulted in an eightfold reduction in the number
of hits to 6, suggesting that a large portion of SNP effects typically
go undetected. It is interesting to note from the list of hits the overrepresentation of Wnt receptor genes that have altered correlations
with downstream genes, adding to the validity of correlations as a
phenotype in genetic epidemiology, as one would expect altered
correlations to reflect signaling pathway structure.
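The family-wise correction described above (permute the phenotype, keep the maximum statistic over all SNPs, and compare the observed maximum against that null distribution) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the absolute Pearson correlation stands in for −log10P as the per-SNP statistic, and the genotypes, phenotype values, function names and permutation count are hypothetical.

```python
import random

def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5

def max_statistic(phenotype, genotypes):
    """Strongest association of the phenotype over all tested SNPs."""
    return max(abs_corr(g, phenotype) for g in genotypes)

def familywise_p(phenotype, genotypes, n_perm=500, seed=1):
    """Permutation P value for the most significant association.

    The phenotype is shuffled across individuals and the maximum statistic
    is recorded each time, giving a null distribution of 'best hits'.
    """
    rng = random.Random(seed)
    observed = max_statistic(phenotype, genotypes)
    shuffled = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_statistic(shuffled, genotypes) >= observed:
            exceed += 1
    # Add-one estimate so the P value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)

# Hypothetical data: 15 individuals, two SNPs coded as allele counts.
genotypes = [
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
]
phenotype = [1.0, 1.2, 0.9, 1.1, 1.0, 2.1, 1.9, 2.0, 2.2, 1.8,
             3.0, 3.1, 2.9, 3.2, 2.8]

p_value = familywise_p(phenotype, genotypes)
```

Because every permutation contributes only its single best statistic, correlation among the tested SNPs is handled automatically, which is the usual motivation for this kind of correction.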
We also noted that clinical GWAS genes demonstrate greater G1 and
early S-phase inter-individual variability compared with other genes
(−log10P = 4.17 and −log10P = 5.27, Mann-Whitney testing of baseline
expression), which is more significant than the observed variability at
the whole-tissue level (−log10P = 3.13). At the systems level, clinical
GWAS genes appear, also, to have greater interindividual variability
of their network connectivities (−log10P = 2.37 for perturbed expression (Supplementary Fig. 8), −log10P = 2.24 for baseline expression).
Although we describe genes of only a single pathway, we speculate that
such results could be used to identify genes that are key modulators
of pathology risk and prognosis. Using cross-validation for model
selection, growth was included as a variable in most of the significant
associations, demonstrating a larger effect size than the co-associated
SNP for 36% of scQTLs. This pervasive association with growth is not
surprising considering the strong cell cycle activity of these genes. If
considered individually, the mean R² values for genotype and growth
associations in the 47 hits suggest a statistical power of 5.6% and 2.9%,
respectively. When modeled together this raises the study power substantially to 36%, using almost half the sample size that would be
required to achieve this power if growth were not considered.
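The cross-validated choice between a genotype-only, a growth-only and a genotype-plus-growth model can be sketched in a few lines. This is a hedged illustration only: it uses ordinary least squares fitted through the normal equations and a mean squared leave-one-out error, whereas the Online Methods state that the Theil-Sen estimate was also used for model selection; all names and data values below are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - partial) / m[r][r]
    return x

def ols_fit(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

def loocv_error(rows, y):
    """Mean squared prediction error when each sample is held out in turn."""
    total = 0.0
    for i in range(len(y)):
        beta = ols_fit(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        predicted = beta[0] + sum(b * v for b, v in zip(beta[1:], rows[i]))
        total += (y[i] - predicted) ** 2
    return total / len(y)

def select_model(genotype, growth, y):
    """Return the candidate model with the lowest leave-one-out error."""
    candidates = {
        "genotype": [[g] for g in genotype],
        "growth": [[w] for w in growth],
        "genotype+growth": [[g, w] for g, w in zip(genotype, growth)],
    }
    errors = {name: loocv_error(rows, y) for name, rows in candidates.items()}
    return min(errors, key=errors.get), errors

# Hypothetical phenotype that depends on both genotype and growth rate.
genotype = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
growth = [1.0, 1.4, 0.8, 1.2, 1.1, 0.9, 1.3, 1.0, 1.5, 0.7]
phenotype = [1 + 2 * g + 3 * w for g, w in zip(genotype, growth)]

best, errors = select_model(genotype, growth, phenotype)
```

With a phenotype constructed from both predictors, the joint model predicts held-out samples almost exactly, while either single-predictor model leaves the other effect in the residual.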
In addition to the cell cycle phase, noise and growth associations, the
results allowed us to propose mechanistic and systems-level hypotheses not possible with whole-tissue associations. Using PPP2R1A as
an example, our results suggest the interindividual variability shown
in Figure 1b to be a result of gene burst size associations with SNPs
rs8111131 and rs8108607. These are listed together with other associations in Supplementary Table 1. The latter SNP is immediately
downstream of a binding site for the transcriptional repressor CTCF (ref. 17),
with a much smaller effect that would not be statistically significant
if considered without the sample growth effect. Although not globally
significant, rs8111131 also appears to drive a reversal of correlation
with SOX17 that may be a temporal effect (−log10P = 2.7, Fig. 3a,b).
Both genes negatively regulate cell growth by inhibiting Wnt signaling. Added to this, the C allele of rs9304726 (in linkage disequilibrium
with rs8108607) was found to almost halve PPP2R1A’s network connectivity from 21 to 11 (Fig. 3c). An interpretation of these results
is that variation of PPP2R1A’s transcription properties might have
broad systemic consequences, making it a promising candidate for
more detailed follow-up studies to investigate the molecular basis of
the heritability of variations in Wnt signaling.
In conclusion, gene expression is not only different between individuals but also between cells. Genes display complex and heritable
spatiotemporal expression variability, which we propose is largely
masked without techniques that offer higher resolution. Using a clinically relevant model pathway, and the largest study known to us of
individual cells to date (1,440 cells), we have provided evidence that
this masking is likely to be important enough to require the inclusion
of single-cell technologies as part of the standard genetic epidemiology toolbox.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details are available in the online
version of the paper.

1. Nica, A.C. et al. The architecture of gene regulatory variation across multiple human
tissues: the MuTHER study. PLoS Genet. 7, e1002003 (2011).
2. Li, H. & Deng, H. Systems genetics, bioinformatics and eQTL mapping. Genetica
138, 915–924 (2010).
3. Bak, P. et al. Self-organized criticality: an explanation of the 1/f noise. Phys. Rev.
Lett. 59, 381–384 (1987).
4. Shalek, A.K. et al. Single-cell transcriptomics reveals bimodality in expression and
splicing in immune cells. Nature 498, 236–240 (2013).
5. Nicolae, D.L. et al. Trait-associated SNPs are more likely to be eQTLs: annotation
to enhance discovery from GWAS. PLoS Genet. 6, e1000888 (2010).
6. Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly
multiplex RNA-seq. Genome Res. 21, 1160–1167 (2011).
7. Ramskold, D. et al. Full-length mRNA-Seq from single-cell levels of RNA and
individual circulating tumor cells. Nat. Biotechnol. 30, 777–782 (2012).
8. Livak, K.J. et al. Methods for qPCR gene expression profiling applied to 1440
lymphoblastoid single cells. Methods 59, 71–79 (2013).
9. International HapMap 3 Consortium. Integrating common and rare genetic variation
in diverse human populations. Nature 467, 52–58 (2010).
10. Coghlan, M.P. et al. Selective small molecule inhibitors of glycogen synthase kinase-3 modulate glycogen metabolism and gene transcription. Chem. Biol. 7, 793–803
(2000).
11. Dar, R.D. et al. Transcriptional burst frequency and burst size are equally modulated
across the human genome. Proc. Natl. Acad. Sci. USA 109, 17454–17459 (2012).
12. Bengtsson, M., Stahlberg, A., Rorsman, P. & Kubista, M. Gene expression profiling
in single cells from the pancreatic islets of Langerhans reveals lognormal distribution
of mRNA levels. Genome Res. 15, 1388–1392 (2005).
13. Taniguchi, Y. et al. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science 329, 533–538 (2010).
14. Bublik, D.R.R., Scolz, M., Triolo, G., Monte, M. & Schneider, C. Human GTSE-1
regulates p21(CIP1/WAF1) stability conferring resistance to paclitaxel treatment.
J. Biol. Chem. 285, 5274–5281 (2010).
15. Choy, E. et al. Genetic analysis of human traits in vitro: drug response and gene
expression in lymphoblastoid cell lines. PLoS Genet. 4, e1000287 (2008).
16. Im, H.K.K. et al. Mixed effects modeling of proliferation rates in cell-based models:
consequence for pharmacogenomics and cancer. PLoS Genet. 8, e1002525
(2012).
17. Cuddapah, S. et al. Global analysis of the insulator binding protein CTCF in
chromatin barrier regions reveals demarcation of active and repressive domains.
Genome Res. 19, 24–32 (2009).

Methods
Methods and any associated references are available in the online
version of the paper.
Note: Supplementary information is available in the online version of the paper.
Acknowledgments
Many thanks to L. Toji at the Coriell Institute for her valuable input on the cell line
growth and transformation characteristics. Also, thanks to the following people
at Fluidigm: B. Jones for his overall support, G. Harris and D. Wang for their
help with primer design, and the meticulous technical assistance of K. Datta and
R. Mittal. C.H. and T.E. are funded by the Medical Research Council of the UK.
T.E. is also funded by Leukaemia Lymphoma Research and EuroSyStem.
AUTHOR CONTRIBUTIONS
Q.F.W. and C.H. conceived and designed the study. A.J.T. and T.E. ran the initial flow
cytometry characterization and cell culture optimization. A.J.G. and D.W.S. ran the main
study’s cell culture and flow cytometry, further optimizing the sample characterization.
K.J.L. designed and optimized the single-cell RNA assays, and generated the gene
expression chip data. Q.F.W. analyzed the data and wrote the manuscript.

© 2013 Nature America, Inc. All rights reserved.

Reprints and permissions information is available online at http://www.nature.com/
reprints/index.html.

ONLINE METHODS

Culture and perturbation of lymphoblastoid cell lines. Fifteen lymphoblastoid
cell lines from unrelated HapMap individuals of European descent
were supplied by the Coriell Institute. Further details on these samples are
provided in the Supplementary Data. An initial 20 samples were selected
based on age (samples from older subjects tend to grow more poorly) and no
known poor growth characteristics. Four samples were removed based on slow
growth characteristics and unsuitable IgM immunophenotypes (see “Flow
cytometry and sample stratification”). Slow growth was found to broadly
correlate with poor cell viability based on ATP content (CellTiterGlo,
http://www.promega.com). All samples were seeded at 4 × 10^5 cells/ml in standard
media (RPMI 1640 containing L-glutamine (http://www.lifetechnologies.com/, 21875),
15% fetal calf serum (http://www.gehealthcare.com, A15-104),
and penicillin/streptomycin (100 units per ml/100 mg per ml final concentration; http://www.lifetechnologies.com/, 15140-122)). In order to avoid batch-to-batch
variation in cell growth, the standard media for all cell cultures
were obtained from single batches of each of the cell culture constituents.
Cells were initially passaged in T-25 flasks with all perturbations occurring
in 24-well plates. Passage numbers were the same for all cell lines used and
never exceeded 6. Treatment with 22.5 mg/ml acycloguanosine (Acyclovir)
to suppress Epstein-Barr virus (EBV) activity was not found to have any
observable effect on growth or gene expression and was, thus, omitted (data
not shown). Seeded cells were grown for an initial 24 h, then perturbed with
10 µM SB216763 (http://www.sigmaaldrich.com/) or left unperturbed (baseline) for a further 24 h, before sorting.
Flow cytometry and sample stratification. Protein expression flow cytometry
markers suggested widespread heterogeneity within and between samples;
however, the implication of this is not clear. Although it is feasible that EBV
transformation is not monoclonal, some of the markers appeared to drift (vary
in proportion) within samples over time. Nevertheless, the following three
markers were selected to minimize unwanted heterogeneity:
1. DNA content (phase of cell cycle). G1 cells were selected using nuclear
Hoechst staining. This also helped protect against ‘doublets’ (deposits of
two cells instead of one).
2. IgM expression. In keeping with the nondifferentiated nature of the
lymphoblasts, most samples cultured predominantly IgM− cells18. These
IgM− cells were selected to rule out any EBV-related heterogeneity that might
arise from the inclusion of IgM+ cells.
3. CD27 expression. CD27 expression is a marker for memory and plasma
B cells19, and, so, would not be expected in naive and undifferentiated cells.
CD27+ cells were excluded.
A BD FACS Aria II (Becton Dickinson) flow cytometer was used to perform single-cell sorting following the manufacturer’s aseptic sort protocol. Cells were counted and viability assessed using a hemocytometer
and trypan blue dye exclusion before staining. Nuclear DNA was stained
using Hoechst 33342 (2 µg/ml) in buffer (pH 7.2) containing HBSS, 20 mM
HEPES (http://www.invitrogen.com/), 5.55 mM glucose, 10% fetal calf
serum, 50 µM Verapamil for 90 min at 37 °C, with gentle vortexing every
15 min. Cells were subsequently stained with PE-Cy7 CD27 (http://www.
ebioscience.com) and Biotin IgM (http://www.bdbiosciences.com) antibodies
for 20 min and Streptavidin APC-eFluor 780 (http://www.ebioscience.com)
secondary antibody staining for a further 15 min. Antibody concentrations used were those recommended by the manufacturer and all antibody
staining was done on ice in the Hoechst buffer specified above. Hoechst
33342 staining was detected using 375 nm laser illumination and 450/40 nm
band pass–filtered detection; PE-Cy7 CD27 was detected using 488 nm
laser excitation and 780/60 nm band pass–filter detection; and IgM APC-eFluor 780 was detected using 633 nm laser excitation and 780/60 nm band
pass filter detection. Individual cells were sorted using the following gating criteria: debris discrimination using forward and orthogonal 488 nm
laser scatter (cells selected), doublet discrimination using orthogonal pulse
height and width (individual cells selected), and the above listed markers.
In order to obtain maximum purity, cells were sorted twice using the defined
gating strategy. Initially, sorted cells were collected as a pooled sample and
subsequently re-sorted for single-cell deposition directly into pre-aliquoted
lysis solution (see the section on “highly parallel qPCR”).

doi:10.1038/nbt.2642

Details of flow cytometry and EBV assays related to cell growth are
provided in the Supplementary Notes, section 3. Details of the fresh
naive B lymphocyte isolation and phenotyping are also provided in the
Supplementary Notes, section 4, and Supplementary Figure 10.
Cell culture reproducibility. The hit genes listed in the Supplementary Notes,
section 8 were validated by comparing their expression in six of the cell lines
(GM12239, GM11881, GM12752, GM06991, GM07029, GM07019) with a
duplicated cell culture batch. Cell culture duplicate QQ plots of Cq values
from the combined samples are shown in Supplementary Figure 9 for each
gene. The adjacent bar plots show the interquartile ranges in the six baseline
(B) and six perturbed (P) samples. Duplicate samples from the two cell culture
batches are plotted in the same color. Most genes were found to have highly
reproducible expression, except for the three marked with borders. These were
genes expressed at low levels. As can be seen with some of the other genes, poor
reproducibility occurs with Cq values >18 (marked in gray on the QQ plots).
This is due to library generation and qPCR effects of very low starting RNA;
these are discussed in the Supplementary Notes, section 5.
Tests of SNP association. Generation, QC and normalization of the expression data are described in the Supplementary Notes, sections 5–7, and
Supplementary Figure 11. All phenotypes per gene were associated with the
publicly available HapMap SNP genotypes (http://hapmap.ncbi.nlm.nih.gov)
located 50 kb either side of the gene. Associations with fewer than 10 genotype-phenotype pairwise complete values were omitted from further analysis. As
genes were selected for absence of nearby CNVs, these were not considered.
Additive, dominant and recessive genotype effects were tested against the
described phenotypes together with the growth effect described in the main
text. Using leave-one-out cross-validation for each genotype, the model with
the lowest predictive error was selected (genotype-only model versus growth-only model versus genotype-plus-growth model). In addition to ordinary least-squares (OLS) regression, robust Theil-Sen estimation and Kendall’s τ were
used to improve on the potential type II error rate with associations departing
from parametric assumptions. The Theil-Sen estimate of association is an
unbiased nonparametric linear regression approach20, being distribution free
while still retaining a high precision. As a measure of association it is simply
the median of all pairwise slopes. Under parametric assumptions the Theil-Sen estimate demonstrates a 91% Pitman efficiency, and has been shown to be
more efficient than OLS regression when data are not normal and skewed21.
This efficiency and robustness make the Theil-Sen approach an attractive
option. As recommended by Sen22, Kendall’s τ was used to determine significance, whereas the Theil-Sen estimate was used for model selection. Theil-Sen multiple regression was performed using the linear combination of genotype and
growth that minimized the variability of Theil-Sen pairwise slopes. The exact
P value from Kendall’s τ tested the null hypothesis that τ = 0, whereas OLS
regression tested the null hypothesis that the slope coefficient β = 0. If the
Theil-Sen estimate was more statistically significant than the OLS estimate, it
was the estimate taken forward for multiple testing correction. To control for
multiple testing, only the most statistically significant association per phenotype was considered. If the −log10P proved greater than 3, the phenotype was
permuted and retested against all genotypes 10^4 times. For each permutation, the maximum −log10P was used to generate a null distribution of most
significant P values, and so provide a family-wise error correction. As most
of the genes are affected by a single pathway (that is, are not independent),
a conservative corrected significance threshold of −log10P = 4 was used.
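The two nonparametric quantities used above, the Theil-Sen slope (the median of all pairwise slopes) and Kendall’s τ, can be written out as a short Python sketch. It is illustrative only and makes assumptions the text does not state: the tie-corrected τ-b variant is used because genotype values repeat, pairs with equal genotype are skipped when forming slopes, and the exact P value for τ is not computed. Function names and data are hypothetical.

```python
from statistics import median

def theil_sen_slope(x, y):
    """Median of the slopes between all pairs of points with distinct x."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(len(x))
        for j in range(i + 1, len(x))
        if x[j] != x[i]
    ]
    return median(slopes)

def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, corrected for ties."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[j] - x[i]
            dy = y[j] - y[i]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx != 0 and dy != 0:
                if dx * dy > 0:
                    concordant += 1
                else:
                    discordant += 1
    total_pairs = n * (n - 1) // 2
    denominator = ((total_pairs - ties_x) * (total_pairs - ties_y)) ** 0.5
    return (concordant - discordant) / denominator

# Hypothetical example: allele counts for six individuals and a phenotype.
genotype = [0, 0, 1, 1, 2, 2]
phenotype = [1.0, 1.2, 2.1, 1.9, 3.0, 3.3]

slope = theil_sen_slope(genotype, phenotype)
tau = kendall_tau_b(genotype, phenotype)
```

Because the slope is a median, a single outlying individual moves the estimate far less than it would move an OLS coefficient, which is the robustness property the text relies on.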
18. Hardy, R.R. & Hayakawa, K. B cell development pathways. Annu. Rev. Immunol.
19, 595–621 (2001).
19. Wu, B., Piatkevich, K.D., Lionnet, T., Singer, R.H. & Verkhusha, V.V.
Modern fluorescent proteins and imaging technologies to study gene expression,
nuclear localization, and dynamics. Curr. Opin. Cell Biol. 23, 310–317 (2011).
20. Siegel, A.F. Robust regression using repeated medians. Biometrika 69, 242–244
(1982).
21. Johnstone, I.M. & Velleman, P.F. The resistant line and related regression methods.
J. Am. Stat. Assoc. 80, 1041–1054 (1985).
22. Sen, P.K. Estimates of the regression coefficient based on Kendall’s Tau. J. Am.
Stat. Assoc. 63, 1379–1389 (1968).

Bispecific antibodies with natural architecture produced
by co-culture of bacteria expressing two distinct
half-antibodies

Christoph Spiess1,6, Mark Merchant2,6, Arthur Huang1,5, Zhong Zheng2, Nai-Ying Yang2, Jing Peng2,
Diego Ellerman3, Whitney Shatz3, Dorothea Reilly4, Daniel G Yansura1 & Justin M Scheer3
By enabling the simultaneous engagement of two distinct
targets, bispecific antibodies broaden the potential utility
of antibody-based therapies. However, bispecific-antibody
design and production remain challenging, owing to the need
to incorporate two distinct heavy and light chain pairs while
maintaining natural nonimmunogenic antibody architecture.
Here we present a bispecific-antibody production strategy that
relies on co-culture of two bacterial strains, each expressing
a half-antibody. Using this approach, we produce 28 unique
bispecific antibodies. A bispecific antibody against the
receptor tyrosine kinases MET and EGFR binds both targets
monovalently, inhibits their signaling, and suppresses MET
and EGFR-driven cell and tumor growth. Our strategy allows
rapid generation of bispecific antibodies from any two existing
antibodies and yields milligram to gram quantities of bispecific
antibodies sufficient for a wide range of discovery and
preclinical applications.
The modular architecture of IgG immunoglobulins allows engineering
of bispecific antibodies. The bivalent, though monospecific, nature
of an IgG antibody is achieved by two identical antigen-binding Fab
arms connected to a constant Fc domain. Although this geometry
provides the immune system with many advantages, in a therapeutic
context the direct pairing of two independent antibody specificities
to create a new bispecific antibody can provide novel and superior
modes of action. For example, it can facilitate recruitment of cytotoxic
T cells to tumor cells1, simultaneously inhibit two signaling pathways
or serve as a transport mechanism to shuttle an antibody across the
blood-brain barrier2.
Unfortunately, it is difficult to produce therapeutically useful
bispecific antibodies because most proposed bispecific antibody formats
have substantial drawbacks. For example, because some bispecific
antibody fragments (e.g., the anti-CD19-CD3 single-chain fragment
blinatumomab1) are expressed as a single polypeptide chain they
include potentially immunogenic linkers. In addition, because they
lack natural Fc regions, they cannot bind to the neonatal FcRn receptor;
binding to FcRn delays antibody clearance and improves pharmacokinetic (PK) properties. Strategies to create more stable bispecific
antibodies with an Fc resulted in generation of hybrid molecules (e.g.,
the rat-murine hybrid molecule catumaxomab, which is specific for
epithelial cell adhesion molecule (EPCAM) and CD3 (ref. 3)), dual
variable domain IgGs (DVD)4,5, and ‘two-in-one’ IgGs (DAF)6,7. Each
of these approaches has disadvantages. The rat-murine hybrid format
introduces nonhuman sequences, which may result in immunological
responses that accelerate clearance and inhibit the function of the
antibody in humans. The DVD and DAF formats create molecules
that bind bivalently to both target antigens; although this could be
an advantage in cases where crosslinking and agonism of a target
is desired, it can be a disadvantage in cases when blockade and/or
antagonism of a target is the goal. These approaches also require complex engineering and production processes.
Knobs-into-holes heterodimerization technology offers a way to
address the shortcomings of bispecific antibody production by efficiently creating antibodies that have independent Fab arms (which
bind monovalently to target antigens) and retain natural IgG architecture8,9. Mutations in the CH3 interface promote the assembly
of heavy chains from two different parental antibodies (one with a
‘knob’ mutation and the other with ‘hole’ mutations) into a single
new bispecific antibody. This concept was recently modified to utilize electrostatic steering and strand-exchanged engineered domains
to drive heterodimerization of the two different heavy chains10,11.
Although these approaches can suppress the formation of Fc heavy
chain homodimers, they cannot prevent the mispairing of the light
chains associated with the two different heavy chains. The only way
to prevent light chain mispairing is by partial exchange of heavy and
light chain domains, so-called domain crossovers12. Unfortunately
molecules generated with this approach have unnatural domain junctions and lose natural antibody architecture. Therefore, if one wishes
to retain natural antibody architecture, one can use these knobs-into-holes approaches only with bispecific antibodies composed
of distinct heavy chains and a common light chain13,14. Screening
for a common light chain is time consuming, labor intensive, and

1Department of Antibody Engineering, Genentech, Inc., South San Francisco, California, USA. 2Department of Translational Oncology, Genentech, Inc., South San Francisco,
California, USA. 3Department of Protein Chemistry, Genentech, Inc., South San Francisco, California, USA. 4Department of Early Stage Cell Culture, Genentech, Inc.,
South San Francisco, California, USA. 5Present address: Laboratory for Circuit and Behavioral Physiology, RIKEN Brain Science Institute, Wako-shi, Saitama, Japan.
6These authors contributed equally to this work. Correspondence should be addressed to C.S. ([email protected]) or J.M.S. ([email protected]).

Received 12 September 2012; accepted 20 May 2013; published online 7 July 2013; doi:10.1038/nbt.2621

[Figure 1 panel labels: (a) lanes M, C, FL, K and H for anti-EGFR and anti-MET, with bands H2L2, HL and H; (b) anti-EGFR and anti-MET, with and without DTT, with bands H2L2, HL and H; (c) Abs280 (mAu) versus time (min) for anti-EGFR (knob) and anti-MET (hole), with dimer and monomer peaks marked.]

Figure 1  Production of knob and hole half-antibodies in E. coli.
(a) Representative western blot of the full-length (FL) antibody or the
knob (K) or hole (H) mutant versions of the anti-EGFR and anti-MET
molecules. The HL label denotes a half-antibody species consisting of
only one heavy and one light chain, whereas the H2L2 label denotes a
full antibody. M, molecular weight standard; C, control consisting of no
antibody expression plasmid. This is representative of three independent
western blots. (b) SDS-PAGE analysis of affinity purified anti-MET
(hole) and anti-EGFR (knob) half-antibodies. Both half-antibodies were
expressed in E. coli, extracted by microfluidization at neutral pH and
captured on Protein-A affinity resin. The proteins were analyzed in
both the direct affinity elution and under reducing conditions (+DTT).
Two independent experiments were done at this scale with the same
results. Additional independent experiments at larger culture volumes
were done with similar results. (c) Protein-A affinity purified anti-MET
(hole) and anti-EGFR (knob) half-antibodies were analyzed by size
exclusion chromatography directly after affinity purification, at 1 mg/ml
concentration in a PBS mobile phase containing 150 mM sodium
chloride. Dimer and monomer peaks are indicated, and molecular
weights were verified by light scattering. Two independent analyses
were done with the same results.

constrains antibody discovery efforts to include only those antibodies
with sequence diversity in only one chain. This limits potential for
maturation into high-affinity, target-selective antibodies.
To facilitate production of bispecific antibodies having two distinct
light chains, one group recently exploited the concept of arm exchange,
which occurs naturally in antibodies of the IgG4 isotype15. Briefly, an
IgG4 antibody exchanges one of its own heavy chains and the light
chain attached to this heavy chain (a ‘half-antibody’) with a heavy-light chain pair from another IgG4 antibody; this process results in
the generation of a new bispecific antibody. Particular sequences in
the CH3 and core hinge regions are essential for arm exchange, but
these sequences are present only in antibodies of the IgG4 isotype.
By engineering the relevant CH3 and core hinge sequences into IgG1
and IgG2 antibodies, one can facilitate arm exchange in antibodies of
these isotypes16. However, the impact of these engineered mutations
on IgG1 and IgG2 hinge region flexibility and immunogenicity is
unknown. These IgG1 bispecific antibodies containing hinge region
mutations showed faster serum clearance in rats than the parental
IgG1 antibodies16.
Here we describe an approach for efficient generation of nonimmunogenic, stable bispecific antibodies with a natural IgG architecture.
A prime motivator was the desire to develop an antibody able to
simultaneously block signaling through MET and EGFR. MET and
EGFR drive the growth of a marked proportion of non-small cell
lung cancer tumors. MET and EGFR are often co-expressed and coactivated, and MET signaling can compensate for loss of EGFR signaling and vice versa17,18. Importantly, resistance to inhibitors of either
receptor can be mediated by signaling through the other receptor19–21.
Both MET and EGFR are known oncogenes, and multiple therapies
aimed at targeting these receptors are in clinical use. Onartuzumab is a
MET-specific monovalent antibody developed using knobs-into-holes
technology and antibody components expressed in the bacterial periplasm22. This antibody functions similarly to traditional antibodies
with respect to clearance and half-life, and its immunogenicity seems
to be minimal and not directed against the ‘knob’ and ‘hole’ mutations.
Owing to its knobs-into-holes design, onartuzumab is modular in
nature and therefore may be well suited to assembly with the potent
EGFR-specific antibody, D1.5, to form a new bispecific agent7.
We hypothesized that half-antibodies could initially be expressed
in separate cells (thereby preserving the natural pairing of heavy
and light chains), and could later be combined to generate an intact

bispecific antibody. Because the protein quality control system of
eukaryotic cells may not efficiently produce incomplete antibodies23,
we attempted to express these in Escherichia coli.
Full-length human antibodies have been expressed in E. coli24,25.
Using similar constructs, we expressed full-length anti-EGFR (clone
D1.5) and anti-MET (clone 5D5) antibodies. As expected, the light
and heavy chains assembled into intact IgG antibodies with a molecular weight of 150 kD (H2L2) (Fig. 1a). During antibody folding,
heavy chains rapidly dimerize before association of light and heavy
chains26. Therefore, to suppress the self-assembly of heavy chains and
to maintain the proteins as half-antibodies, we introduced T366W
(knob) or T366S, L368A and Y407V (hole) mutations into the CH3
domains of the antibodies. These half-antibodies migrate with an
apparent molecular mass of 75 kD (HL), indicating that the knob
and hole mutations efficiently suppress heavy chain dimerization
and full-length antibody formation, but not heavy-light formation (Fig. 1a). The knob and hole containing half-antibodies were
expressed in similar amounts.
Because half-antibodies appear stable and soluble in bacterial
whole cell lysates, we attempted to purify, characterize and combine
these distinct half-antibodies in vitro to generate an intact bispecific
antibody. Both knob and hole half-antibodies could be efficiently
isolated by Protein-A capture. Reducing and nonreducing SDS-PAGE
analysis of the purified half-antibodies indicated that the majority of
captured protein corresponded to the expected 75-kD size (Fig. 1b).
Size-exclusion chromatography (SEC) analysis indicated that the half-antibodies purified by Protein-A capture do not aggregate and appear
stable in solution (Fig. 1c). Their molecular masses were confirmed
by electrospray ionization-time of flight (ESI-TOF) mass spectrometry with no sign of disulfide adducts on cysteine residues in the
hinge region. However, upon mixing knob and hole half-antibodies,
neither air nor oxidizing agents such as dehydroascorbic acid successfully oxidized the interchain disulfides (needed to form interchain
disulfide linkages). To determine if the hinge region cysteines were

[Figure 2 panel labels: (a) mass spectra labeled No NEM and + NEM, with peaks at 72,548.69 and 72,548.73 amu and the annotation “NEM does not react with oxidized hinge cysteines”; (b) bands H2L2, HL and H; (c) Abs280 (mAu) versus time (min) with IEF inset, and intensity versus mass (amu) with a peak at 146,056.15 amu and the K/K and H/H positions marked.]

Figure 2  In vitro assembly of knob and hole half-antibodies into an intact bispecific
antibody. (a) Oxidation state of half-antibodies produced in E. coli. Half-antibodies purified
by Protein-A were treated with the efficient thiol-reactive compound N-ethylmaleimide
(NEM; molecular weight 125 Da). Before and after reaction with NEM, half-antibodies were
analyzed by ESI-TOF mass spectrometry. These data are representative of more than
three similar experiments. (b) SDS-PAGE analysis (nonreducing) was used to monitor
the assembly process. Blot shows the knob (anti-EGFR) and hole (anti-MET) starting
materials, the mixed starting materials, and
the mix after DTT addition (labeled ‘reduced’) and subsequent oxidation (labeled ‘oxidized’). This experiment
is representative of two identical experiments and of results from many other similar independent experiments.
(c) The final bispecific product was analyzed by isoelectric focusing (IEF) and the gel is shown in the inset of the
top panel. The knob and the hole half-antibodies, as well as the bispecific, each migrate to distinct points on the gel.
The bispecific shows a pattern that is intermediate between the knob and the hole, though closer to the migration point of the hole. SEC analysis
produced the top chromatographic trace, which was used to quantify protein aggregate content. The bottom panel shows ESI-TOF mass spectrometry
analysis of the same product. The predicted locations for knob-knob (K/K) or hole-hole (H/H) homodimers are shown in the inset. The inset is a
zoomed-in view of the same mass spectrum. The isoelectric focusing, SEC and ESI-TOF data are representative of more that three independent
analyses of multiple independent preparations.

in fact still reduced, the protein was reacted at a neutral pH with
1 mM N-ethylmaleimide (NEM, only reacts with reduced cysteines)
for 1 h before analysis by mass spectrometry. The mass of the protein
was unchanged (Fig. 2a) indicating that the hinge cysteines were not
reduced and instead were oxidized to each other as an intrachain
disulfide. Therefore we developed a protocol to first reduce the intrachain disulfides. We mixed purified half-antibodies together at equal
mass, incubated them (at 1 mg/ml) at 37 °C for 3 h, reduced them with
dithiothreitol (DTT) at 24 °C for 2 h, and then concentrated and reoxidized
them (Fig. 2b). The resulting bispecific antibody was purified on a
cation-exchange (CEX) column. SEC analysis showed that the product
was monodisperse and not aggregated (Fig. 2c, top panel). Isoelectric
focusing further supports the conclusion that this process produced a
pure preparation of bispecific antibody, as the final product focuses to
a pH intermediate between that of the parent half-antibodies (Fig. 2c,
top panel inset). Furthermore, mass spectrometry confirmed that the
only detectable intact antibody exhibited a molecular mass closely
matching the predicted mass of the heterodimeric anti-MET-EGFR
antibody (Fig. 2c, bottom panel). There was no evidence of knob-knob or hole-hole homodimer formation. The bispecific antibody
also maintained the correct pairing of the light chains to cognate
heavy chains, as confirmed by Lys-C digestion (which hydrolyzes
Fab domains from the Fc) followed by mass spectrometry analysis
(Supplementary Fig. 1). Starting from about 2 mg each of the knob and hole parental
half-antibodies, this procedure yielded ~0.5–1 mg of bispecific antibody.
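The NEM result and the step yield reported above both come down to simple arithmetic. The sketch below is illustrative only: it assumes two hinge cysteines per half-antibody, uses the average molecular mass of NEM (125.13 Da, which the Figure 2 legend rounds to 125 Da), and treats the two half-antibody peak labels legible in Figure 2 (72,548.69 and 72,548.73) as the masses without and with NEM. The function names and the 5-Da tolerance are hypothetical.

```python
# Average molecular mass of N-ethylmaleimide (C6H7NO2); the paper rounds this to 125 Da.
NEM_MASS_DA = 125.13


def expected_nem_shift(free_thiols: int, nem_mass: float = NEM_MASS_DA) -> float:
    """Mass gain expected if every free thiol is alkylated by one NEM molecule."""
    return free_thiols * nem_mass


def hinge_is_reduced(mass_before: float, mass_after: float,
                     free_thiols: int = 2, tolerance_da: float = 5.0) -> bool:
    """True if the observed shift matches full NEM labeling of the free thiols.

    free_thiols=2 is an assumption (two hinge cysteines per half-antibody).
    """
    observed_shift = mass_after - mass_before
    return abs(observed_shift - expected_nem_shift(free_thiols)) <= tolerance_da


def percent_yield(input_mg: float, output_mg: float) -> float:
    """Recovered product as a percentage of total protein put into the step."""
    return 100.0 * output_mg / input_mg


if __name__ == "__main__":
    # Peak labels from Figure 2, assumed here to be the masses without and with NEM.
    print(hinge_is_reduced(72548.69, 72548.73))
    # 2 mg knob + 2 mg hole in, 0.5 to 1 mg bispecific out.
    print(percent_yield(4.0, 0.5), percent_yield(4.0, 1.0))
```

Under these assumptions a fully reduced hinge would add roughly 250 Da, so an essentially unchanged mass is consistent with oxidized cysteines, and the redox procedure recovers about 12.5–25% of the input protein as bispecific antibody.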
Because the initial extraction and capture of the two half-antibodies
were identical, we hypothesized that we could reduce preparation
time and sample handling by combining two bacterial cell pellets,
each containing one of the half-antibodies, before bacterial cell lysis
and Protein-A capture. Protein-A isolates of the extraction from such
a mixed bacterial cell pellet revealed a 75-kD band characteristic of
each half-antibody and, surprisingly, a 150-kD band characteristic
of an intact antibody (Supplementary Fig. 2).
Because co-extraction resulted in production of an intact (presumably bispecific) antibody, we explored whether co-culture of two
bacterial strains, one expressing the knob and one expressing the hole
half-antibody, might also result in production of an intact bispecific
antibody (Fig. 3a). To determine the optimal ratio of the two bacterial
strains in the co-culture, we inoculated co-cultures with starter cultures
(each having identical A600) at ratios ranging from 1:10 to 10:1 (anti-EGFR:anti-MET). As shown by western blot analysis in nonreducing
conditions, the production of each half-antibody in co-culture correlated with the inoculation ratio (Supplementary Fig. 3). To determine
the potential for scaling the co-culture process, we started 1-liter shake-flask cultures with anti-EGFR/anti-MET ratios of 3:2, 1:1 and 2:3,
while keeping the total seed volume constant. After co-culture extraction and Protein-A capture we observed a direct correlation between
the starting ratio and the final product in terms of the abundance
of each half-antibody (data not shown). Importantly, we also noted
spontaneous assembly of the bispecific antibody with an excess of the
75-kD anti-EGFR half-antibody. Hydrophobic interaction chromatography (HIC) effectively isolated the intact bispecific antibody from
excess half-antibodies. Analysis by mass spectrometry and isoelectric
focusing indicated no detectable homodimer species. Therefore, the
co-culture of two half-antibodies provided the fastest and most efficient route to generate large quantities of bispecific antibody of high
quality (Supplementary Fig. 4).
To scale up to 10-liter, high-density, fed-batch cultures (needed to produce sufficient material for in vivo studies),
we inoculated with varying ratios of bacteria expressing anti-EGFR and anti-MET.
The whole-cell broth was harvested and cells lysed by microfluidization and the antibody was purified by Protein-A chromatography.
Hydrophobic interaction chromatography (HIC) was used to remove
excess 75-kD half-antibodies. To optimize the culture process, HIC
analysis was used to identify the optimal starting culture ratio.
A 50:50 ratio yielded a product containing the bispecific antibody and
excess anti-EGFR (Fig. 3b). After integrating the peak absorbance at
280 nm to determine protein concentrations, we calculated that a starting ratio of 70:30 (anti-MET:anti-EGFR) would compensate for the
lower levels of anti-MET produced. Unfortunately, 70:30 co-cultures
produced excess anti-MET (Fig. 3b). Therefore we adjusted the
inoculation ratio to 60:40 (anti-MET:anti-EGFR), which produced
an optimal amount of bispecific antibody (Fig. 3b). As shown by
SEC and mass spectrometry analysis, the bispecific antibody produced by 60:40 co-cultures showed similar monomeric stability
and heterodimer purity as the bispecific antibody produced by the
half-antibody redox method (Figs. 2c and 3c). We used this method

[Figure 3 image not reproduced. Recoverable labels: workflow steps Co-inoculate, Grow, Induce, Lyse and Purify (a); Abs280 (mAu) vs. Time (min) for the 50:50, 70:30 and 60:40 ratios (b); Abs280 (mAu) vs. Time (min) and Intensity vs. Mass (amu), with peak label 146,053.13 and K/K and H/H positions marked (c).]

Figure 3  Making bispecific antibodies using bacterial co-culture. (a) Schematic of the process for bispecific assembly using bacterial co-culture. Five steps are involved: inoculating the co-culture (co-inoculate), growing the co-culture without protein expression (grow), inducing half-antibody expression (induce), and finally lysis of the biomass (lyse) and purification (purify). Two bacterial cell types, one expressing the knob and the other expressing the hole half-antibody, are mixed together to inoculate shake-flask culture. After standard growth conditions used to expand cell density, protein expression in the co-culture is induced by depletion of phosphate and addition of IPTG. After 23–72 h of induction, the biomass is lysed and the protein extracted by standard technique and purified by Protein-A chromatography. (b) Hydrophobic interaction chromatography (HIC) analysis of Protein-A–purified antibody from 10 L cultures inoculated with the indicated starting ratios of cells producing each half-antibody. The knob (anti-EGFR) is shown in red and the hole (anti-MET) is shown in blue. This panel represents the results from three independent experiments at the 50:50 ratio, two independent experiments at the 70:30 ratio and four independent experiments at the 60:40 ratio. (c) Characterization by size exclusion chromatography, top panel, and ESI-TOF mass spectrometry, bottom panel, of the product from the 60:40 co-culture. The arrows indicate the expected mass location for potential knob-knob (K/K) and hole-hole (H/H) homodimers. These data represent two independent preparations for both the size exclusion and the mass spectrometry analysis.

to generate a wide range of bispecific antibodies (Supplementary
Table 1) and produced up to 1 g amounts of intact bispecific antibody using fed-batch co-cultures. Half-antibody yields from initial
Protein-A capture steps were routinely >100 mg/L without any culture
optimization (Supplementary Table 1).
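The ratio search described above (50:50, then 70:30, then 60:40) can be read as a first-order correction followed by a midpoint refinement. The following is a minimal sketch under the assumption that each strain's half-antibody output scales linearly with its share of the inoculum; the function names and the example productivity ratio are hypothetical and are not taken from the paper.

```python
def compensating_fraction(productivity_ratio: float) -> float:
    """Inoculum fraction for the weaker strain that balances the two outputs.

    productivity_ratio is (output of stronger strain) / (output of weaker strain)
    measured at a 50:50 inoculum. Assumes output is proportional to inoculum share.
    """
    if productivity_ratio <= 0:
        raise ValueError("productivity_ratio must be positive")
    return productivity_ratio / (1.0 + productivity_ratio)


def midpoint_ratio(low_pct: float, high_pct: float) -> float:
    """Next inoculum percentage to try, halfway between two bracketing ratios."""
    return (low_pct + high_pct) / 2.0


if __name__ == "__main__":
    # Hypothetical input: anti-EGFR half-antibody is 7/3-fold more abundant at 50:50.
    first_guess = 100.0 * compensating_fraction(7.0 / 3.0)
    print(f"first guess: {first_guess:.0f}% anti-MET")
    # 50% left excess anti-EGFR and 70% left excess anti-MET, so try the midpoint.
    print(f"refined: {midpoint_ratio(50.0, 70.0):.0f}% anti-MET")
```

That the 70:30 cultures overshot suggests the linear assumption is only a starting point, possibly because the two strains do not grow independently in co-culture; an empirical bracket-and-refine step is then still needed.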
As both the anti-EGFR (D1.5) and anti-MET (5D5, onartuzumab)
parental antibodies are ligand-blocking antibodies7,22, we determined
whether the bispecific antibody produced by bacterial co-culture
inhibited hepatocyte growth factor (HGF) and transforming growth
factor-α (TGF-α)-mediated cell proliferation of the KP4 ductal pancreatic cancer cell line (which produces HGF in an autocrine manner),
the A549 non-small-cell lung carcinoma cell line (which expresses
wild-type versions of MET and EGFR), the A431 epidermoid cell line
(which harbors an EGFR amplification), and the NCI-H596 cell line,
which harbors a loss of MET exon 14 (MET∆ex14), resulting in the loss of the
Cbl-binding domain required for targeting MET for ubiquitin-mediated turnover, thereby rendering this line hyper-responsive to HGF.
We compared the bispecific antibody to anti-MET (onartuzumab)
or to anti-EGFR (D1.5 IgG, which is bivalent against EGFR) alone or
together. Because the bispecific antibody is monovalent against both
MET and EGFR, we also compared it to a monovalent (mono) version
of anti-EGFR (generated using the knob and hole platform).
Although neither anti-EGFR IgG nor anti-MET fully inhibited proliferation of any of the cell lines stimulated with both HGF and TGF-α,
the MET-EGFR bispecific antibody and the combination of anti-MET
with either anti-EGFR (IgG) or anti-EGFR (mono) resulted in substantial inhibition of ligand-driven cell proliferation (Fig. 4a). As expected,
some cell lines demonstrated a greater sensitivity to either anti-MET
(e.g., KP4) or anti-EGFR (e.g., A431), whereas in other cell lines the
contribution of MET and EGFR appeared more balanced (e.g., A549).
With some cell lines (e.g., KP4 and A549) the MET-EGFR bispecific
antibody more potently inhibited proliferation compared to the combination of anti-MET and anti-EGFR, whereas with other cell lines the
reverse was true (e.g., A431 and NCI-H596). However, in these latter
cell lines the bispecific antibody performed equivalently to the most
appropriate control of anti-MET plus anti-EGFR (mono).
In A549 cells, the bispecific antibody inhibited phosphorylation
of MET by HGF and EGFR by TGF-α, respectively, in a manner
similar to that of anti-MET or anti-EGFR IgG against their respective ligands (Fig. 4b). In addition, the bispecific antibody, or the
combination of anti-EGFR IgG and anti-MET, but neither anti-EGFR IgG nor anti-MET alone, blocked phosphorylation induced
by treatment with both HGF and TGF-α (Fig. 4b). Collectively these
data suggest that the bispecific format does not impede anti-MET
or anti-EGFR function, and that bivalency enhances the activity of
anti-EGFR IgG.
Pharmacokinetic studies demonstrated that the bispecific antibody
has linear and dose-proportional pharmacokinetics between 5 and
100 mg/kg (Supplementary Fig. 5a) resembling that of other antibodies.
Note that because the D1.5 anti-EGFR IgG binds to both mouse and
human EGFR (the anti-MET parental antibody binds only human
MET), its abundance is influenced by target-mediated clearance in
mice (Supplementary Fig. 5b).
We next examined the in vivo anti-tumor activity of the MET-EGFR
bispecific antibody in MET-driven (KP4), EGFR-driven (A431) and
MET-EGFR co-dependent (NCI-H596) xenograft models. In the MET-driven KP4 tumor model, 50 mg/kg of the bispecific antibody showed
weaker activity than anti-MET alone, but 100 mg/kg of the bispecific
antibody showed activity equivalent to anti-MET (Fig. 4c). As KP4
cells were more sensitive to the bispecific antibody than to anti-MET
in vitro, target-mediated clearance due to binding of the bispecific antibody to mouse EGFR may account for its weaker activity in vivo.
In the EGFR-driven A431 tumor model, the bispecific antibody
showed activity consistent with half doses of the single-agent anti-EGFR at both the 50 mg/kg and 100 mg/kg doses (Fig. 4c). Consistent
with the bispecific antibody being monovalent against EGFR, these
data suggest that the bispecific antibody must be present at twice
the concentration of the single agent anti-EGFR IgG to completely
saturate EGFR expressed on the tumor.
In the NCI-H596 tumor model, dependent on both MET and
EGFR, and grown in hHGFTg-SCID (severe combined immunodeficient) mice27, the bispecific antibody demonstrated potent anti-tumor
activity at both the 50 mg/kg (106% tumor-growth inhibition (TGI),
60% partial responses (PRs), 20% complete responses (CRs)) and
100 mg/kg doses (107% TGI, 100% PRs), with the highest dose resulting in a more prolonged tumor response (Fig. 4c). Taken together,

[Figure 4 image not reproduced. Recoverable labels: Proliferation (%) vs. Antibody (µM) for KP4, A549, A431 and NCI-H596, with treatments Anti-MET, Anti-EGFR IgG, Anti-EGFR (mono), Anti-MET + anti-EGFR IgG, Anti-MET + anti-EGFR (mono) and Anti-MET/EGFR bispecific (a); blots for p-MET, MET, p-EGFR and EGFR with or without HGF and TGF-α (b); Tumor volume (mm³, log2 scale) vs. Day for KP4, A431 and NCI-H596, with groups Vehicle, Anti-MET (33 mg/kg), Anti-EGFR IgG (25 mg/kg), Anti-MET (33 mg/kg) + anti-EGFR IgG (25 mg/kg), Anti-MET/EGFR bispecific (50 mg/kg) and Anti-MET/EGFR bispecific (100 mg/kg) (c).]

Figure 4  Characterization of the MET-EGFR bispecific antibodies in vitro and in vivo. (a) The MET-EGFR bispecific antibody, or parental anti-MET, anti-EGFR IgG or monovalent anti-EGFR (mono) alone or together were added to KP4, A549, A431 or NCI-H596 cancer cell lines treated with both HGF (0.5 nM) and TGF-α (0.86 nM) in serum-free conditions. Data were normalized to the no-treatment control and are plotted as percent of control (%). Data are plotted as group mean ± s.e.m. (n = 3/group) and are representative of at least three separate experiments. (b) A549 cells were either untreated (control) or pretreated with 1 µM of anti-MET (onartuzumab), the anti-EGFR IgG, anti-MET (onartuzumab) and anti-EGFR IgG, or the MET/EGFR bispecific antibody for 2 h in serum-free conditions followed by stimulation with HGF (0.5 nM) and/or TGF-α (0.86 nM) for 10 min. Cells were lysed and subjected to western blot analysis with antibodies specific for total or phosphorylated MET and EGFR. The western blot was done at least two times with comparable results. (c) KP4 and A431 cell lines were injected into nude (nu/nu) mice and the NCI-H596 cell line was injected into hHGFTg-C3H-SCID mice. Ten, 9 and 11 d after tumor cell injection for the KP4, A431 and NCI-H596 tumor models, respectively, animals were injected intraperitoneally (once a week for 3 weeks with a 2× loading dose) with vehicle, with anti-MET (onartuzumab) or anti-EGFR IgG alone or in combination, or with the MET/EGFR bispecific antibody. Tumor volume was monitored over time. Data are plotted as group mean ± s.e.m. (n = 10/group). Please see Supplementary Table 2 for more detailed information. Xenograft studies were done once for KP4 and NCI-H596 and twice for A431.

these in vivo data demonstrate that the MET-EGFR bispecific antibody has activity against both the MET and EGFR pathways.
Beyond the MET-EGFR bispecific antibody, we applied the
co-culture approach to a diverse set of antibodies. A subset of 27
additional examples and their characterization is shown in
Supplementary Table 1. The robust nature of the approach is illustrated by its success in producing antibodies incorporating various
different human VL and VH framework families as well as murine
chimeric antibodies, which comprise a large fraction of antibodies
of primary research interest.
The half-antibody bacterial co-culture approach described here has
many advantages over existing approaches for producing bispecific
antibodies. It is simple, can be used with virtually any two existing
antibodies, eliminates the need for a common light chain, and requires
no reengineering of antibody functional domains. The resulting
bispecific antibodies maintain the native architecture of a typical antibody and therefore probably retain the favorable pharmacokinetics
properties and specificity of conventional antibodies.
This approach is scalable for manufacturing with a process similar
to that of other therapeutic antibodies, but would not require extensive Chinese hamster ovary cell line development as required for
other bispecific antibody approaches discussed earlier. The antibody
titers achieved with this method are substantially higher than those in
previously published studies using E. coli25,28,29, and are sufficient for
research activities ranging from proof-of-concept in vitro analysis to
toxicology studies in nonhuman primates requiring gram amounts of
antibody (Supplementary Table 1). Although the initial yields achieved
here are not always sufficient for the start of clinical trials, additional
optimization of translation rates, chaperones, strains and culture
conditions can usually improve titers up to the required level30.
Regardless, because E. coli is widely used for making recombinant
proteins, this technology may be of broad interest for both academic
research and industrial development.
Methods
Methods and any associated references are available in the online
version of the paper.
Note: Supplementary information is available in the online version of the paper.
Acknowledgments
We thank members of the Antibody Engineering, Protein Chemistry,
Pharmacokinetics and Pharmacodynamics, Biochemical Pharmacology and
Translational Oncology departments at Genentech for technical support
and/or advice and stimulating discussions. In particular we would like to
thank H. Xiang, E. Mai, J. Young, D. Delarosa, B. Wilson, and K. Billeci for help
in assessing the pharmacokinetics of antibodies. We would also like to thank
P.J. Carter and D. Vandlen for useful discussions and comments on the manuscript.
AUTHOR CONTRIBUTIONS
Conception and design: C.S., M.M., D.G.Y. and J.M.S.; development of
methodology: C.S., D.G.Y. and J.M.S.; acquisition of data: C.S., A.H., Z.Z.,
N.-Y.Y., J.P., D.E., W.S. and J.M.S.; analysis and interpretation of data: C.S.,
M.M., D.R., D.G.Y. and J.M.S.; writing, review and/or revision of the manuscript:
C.S., M.M., D.R., D.G.Y. and J.M.S.
COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details are available in the online
version of the paper.
Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.


1. Bargou, R. et al. Tumor regression in cancer patients by very low doses of a
T cell-engaging antibody. Science 321, 974–977 (2008).
2. Yu, Y.J. et al. Boosting brain uptake of a therapeutic antibody by reducing its affinity
for a transcytosis target. Sci. Transl. Med. 3, 84ra44 (2011).
3. Chelius, D. et al. Structural and functional characterization of the trifunctional
antibody catumaxomab. MAbs 2, 309–319 (2010).
4. Wu, C. et al. Molecular construction and optimization of anti-human IL-1alpha/beta
dual variable domain immunoglobulin (DVD-Ig) molecules. MAbs 1, 339–347
(2009).
5. Wu, C. et al. Simultaneous targeting of multiple disease mediators by a dual-variable-domain immunoglobulin. Nat. Biotechnol. 25, 1290–1297 (2007).
6. Bostrom, J. et al. Variants of the antibody Herceptin that interact with HER2 and
VEGF at the antigen binding site. Science 323, 1610–1614 (2009).
7. Schaefer, G. et al. A two-in-one antibody against HER3 and EGFR has superior
inhibitory activity compared with monospecific antibodies. Cancer Cell 20, 472–486
(2011).
8. Ridgway, J.B., Presta, L.G. & Carter, P. ‘Knobs-into-holes’ engineering of antibody CH3
domains for heavy chain heterodimerization. Protein Eng. 9, 617–621 (1996).
9. Atwell, S., Ridgway, J.B., Wells, J.A. & Carter, P. Stable heterodimers from
remodeling the domain interface of a homodimer using a phage display library.
J. Mol. Biol. 270, 26–35 (1997).
10. Gunasekaran, K. et al. Enhancing antibody Fc heterodimer formation through
electrostatic steering effects: applications to bispecific molecules and monovalent
IgG. J. Biol. Chem. 285, 19637–19646 (2010).
11. Davis, J.H. et al. SEEDbodies: fusion proteins based on strand-exchange engineered
domain (SEED) CH3 heterodimers in an Fc analogue platform for asymmetric binders or
immunofusions and bispecific antibodies. Protein Eng. Des. Sel. 23, 195–202 (2010).
12. Schaefer, W. et al. Immunoglobulin domain crossover as a generic approach for
the production of bispecific IgG antibodies. Proc. Natl. Acad. Sci. USA 108,
11187–11192 (2011).
13. Merchant, A.M. et al. An efficient route to human bispecific IgG. Nat.
Biotechnol. 16, 677–681 (1998).
14. Jackman, J. et al. Development of a two-part strategy to identify a therapeutic
human bispecific antibody that inhibits IgE receptor signaling. J. Biol. Chem. 285,
20850–20859 (2010).
15. Labrijn, A.F. et al. Therapeutic IgG4 antibodies engage in Fab-arm exchange with
endogenous human IgG4 in vivo. Nat. Biotechnol. 27, 767–771 (2009).
16. Strop, P. et al. Generating bispecific human IgG1 and IgG2 antibodies from any
antibody pair. J. Mol. Biol. 420, 204–219 (2012).

17. Guo, A. et al. Signaling networks assembled by oncogenic EGFR and c-Met.
Proc. Natl. Acad. Sci. USA 105, 692–697 (2008).
18. Xu, H. et al. Dual blockade of EGFR and c-Met abrogates redundant signaling and
proliferation in head and neck carcinoma cells. Clin. Cancer Res. 17, 4425–4438
(2011).
19. Engelman, J.A. et al. MET amplification leads to gefitinib resistance in lung cancer
by activating ERBB3 signaling. Science 316, 1039–1043 (2007).
20. Turke, A.B. et al. Preexistence and clonal selection of MET amplification in EGFR
mutant NSCLC. Cancer Cell 17, 77–88 (2010).
21. Zhang, Y.-W. et al. MET kinase inhibitor SGX523 synergizes with epidermal
growth factor receptor inhibitor erlotinib in a hepatocyte growth factor-dependent fashion to suppress carcinoma growth. Cancer Res. 70, 6880–6890
(2010).
22. Jin, H. et al. MetMAb, the one-armed 5D5 anti-c-Met antibody, inhibits orthotopic
pancreatic tumor growth and improves survival. Cancer Res. 68, 4360–4368
(2008).
23. Feige, M.J., Hendershot, L.M. & Buchner, J. How antibodies fold. Trends Biochem.
Sci. 35, 189–198 (2010).
24. Simmons, L.C. et al. Expression of full-length immunoglobulins in Escherichia coli:
rapid and efficient production of aglycosylated antibodies. J. Immunol. Methods
263, 133–147 (2002).
25. Mazor, Y., Van Blarcom, T., Mabry, R., Iverson, B.L. & Georgiou, G. Isolation
of engineered, full-length antibodies from libraries expressed in Escherichia coli.
Nat. Biotechnol. 25, 563–565 (2007).
26. Baumal, R., Potter, M. & Scharff, M.D. Synthesis, assembly, and secretion of gamma
globulin by mouse myeloma cells. 3. Assembly of the three subclasses of IgG.
J. Exp. Med. 134, 1316–1334 (1971).
27. Zhang, Y.-W. et al. Enhanced growth of human met-expressing xenografts in a new
strain of immunocompromised mice transgenic for human hepatocyte growth factor/
scatter factor. Oncogene 24, 101–106 (2005).
28. Makino, T., Skretas, G., Kang, T.H. & Georgiou, G. Comprehensive engineering
of Escherichia coli for enhanced expression of IgG antibodies. Metab. Eng. 13,
241–251 (2011).
29. Chan, C.E.Z., Lim, A.P.C., Chan, A.H.Y., MacAry, P.A. & Hanson, B.J. Optimized
expression of full-length IgG1 antibody in a common E. coli strain. PLoS ONE 5,
e10261 (2010).
30. Reilly, D.E. & Yansura, D.G. Production of antibodies and antibody fragments
in Escherichia coli. in Antibody Engineering (eds. Kontermann, R. & Dübel, S.)
331–344 (Springer Berlin Heidelberg, 2010).


ONLINE METHODS


Plasmid construction and expression of antibodies. Antibodies were cloned
into expression vectors described previously24. Briefly, the STII signal sequence
with a translation initiation strength of 1 for both the heavy chain and light
chain preceded the sequence coding for the mature antibody. For antibody
expression, an overnight culture in a suitable W3110 derivative30 was grown
at 30 °C in Luria Bertani media (100 µg/ml carbenicillin), diluted 1:100 into
expression media30 (100 µg/ml carbenicillin) and grown for 24 h at 30 °C.
For scale-up to 10-liter culture24, initial starter cultures (500 ml) were grown
for both half-antibodies separately. After reaching stationary phase, the two
starter cultures were mixed together for a total 500-ml inoculum (250 ml each
for a 50:50 ratio) to start 10-liter cultures.
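The inoculum volumes for other starting ratios follow directly from the fixed 500-ml total. The helper below is a hypothetical illustration, not part of the published protocol; it splits a fixed inoculum volume between the two starter cultures for a given anti-MET:anti-EGFR ratio.

```python
def split_inoculum(total_ml: float, part_a: float, part_b: float) -> tuple[float, float]:
    """Split total_ml between two starter cultures in the ratio part_a:part_b."""
    if part_a < 0 or part_b < 0 or part_a + part_b == 0:
        raise ValueError("ratio parts must be non-negative and not both zero")
    volume_a = total_ml * part_a / (part_a + part_b)
    return volume_a, total_ml - volume_a


if __name__ == "__main__":
    # Ratios are written anti-MET:anti-EGFR, as in the main text.
    for ratio in [(50, 50), (70, 30), (60, 40)]:
        met_ml, egfr_ml = split_inoculum(500.0, *ratio)
        print(f"{ratio[0]}:{ratio[1]} -> {met_ml:.0f} ml anti-MET, {egfr_ml:.0f} ml anti-EGFR")
```

Holding the total at 500 ml keeps the seed volume constant across ratios, so only the relative share of each strain changes between runs.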
Antibody isolation and purification. For half-antibody production, the bacterial cells expressing knob and hole antibodies were grown in separate cultures.
The two antibody halves were paired and reassembled into the bispecific antibody by a process of annealing, reduction and oxidation. First, the molecules
were mixed together at a 1:1 mass ratio in annealing buffer (10 mM Tris, pH
7.5 and 100 mM NaCl), heated to 37 °C for 25 min, cooled to 24 °C for 30 min,
and reduced by addition of 2 mM dithiothreitol. After 2 h the antibody was
concentrated to 10 mg/ml and oxidized by the addition of 5 mM dehydroascorbic acid for 30 min. Unreacted half-antibodies were removed by SP-FF cation
exchange chromatography (CEX, GE Healthcare). Interim analysis showed the
majority of product was intact bispecific antibody (data not shown). For further
analytical characterization and purification, the assembled mixture was isolated
by CEX with pH 5.8 acetate. Antibody preparations were dialyzed into 10 mM
histidine-acetate buffer, pH 5.9, 0.01% Tween 20, and 240 mM sucrose. The
final product was quality checked by isoelectric focusing, liquid chromatography (LC)-ESI-TOF mass spectrometry and size exclusion chromatography in
line with multi-angle laser light scattering (SEC-MALS).
To produce bispecific antibodies by bacterial co-culture, overnight LB cultures of bacterial cells carrying the knob or hole antibody plasmids
were grown separately and then mixed to seed 500 ml of expression medium; the co-culture was grown and protein was extracted as
described above. Lysates were purified over Protein-A affinity resin and analyzed
by SDS-PAGE (nonreducing) and LC-ESI-TOF as described14. To quantify the
percentage of bispecific antibody in comparison to excess half-antibody in these
preparations, hydrophobic interaction chromatography (HIC) on a ProPac HIC-10 column was used
(Dionex, 2.1 × 100 mm). Briefly, the Protein-A column eluate was dialyzed
into 25 mM Tris-HCl, pH 7.5 containing 150 mM NaCl. The buffer-exchanged
protein was then diluted 1:1 with 1 M ammonium sulfate in 25 mM sodium phosphate pH 6.5. The conditioned protein was filtered and loaded on the HIC column
and resolved with a linear gradient of (NH4)2SO4 in sodium phosphate pH 6.5 and
25% isopropyl alcohol. For protein purification a similar procedure was used with
a custom ProPac HIC column from Dionex (ThermoFisher). The bispecific antibody was further purified by CEX chromatography to remove endotoxin and other
bacterial contaminants before formulating in 10 mM histidine-acetate buffer, pH
5.9, 0.01% Tween 20, and 240 mM sucrose.
For co-culturing in high-density cultures, starter cultures were prepared by
mixing equal volumes of the two bacterial strains (each of which produced knob
or hole antibodies) grown to stationary phase at 1:1 ratio (5 ml of each culture for
a total of 10 ml). The 10-liter co-cultures were inoculated with the 10-ml starter
culture mixtures and grown under conditions identical to those for the half-antibody
cultures, as previously described23. Antibodies were extracted either from the cell
pellet or from whole broth, with a typical 10-liter culture resulting in 2.5 kg of
cell biomass. Cell pellets were resuspended in 5 liters of buffer containing 25 mM
Tris, pH 7.5 and 125 mM NaCl using a Polytron mixer. Reconstituted cells were
microfluidized, clarified with 0.4% PEI, and prepared for Protein-A capture.
In addition, whole broth extractions can be done with similar results.
A simplified expression and purification scheme was developed
(Supplementary Fig. 4). Briefly, after Protein-A capture, antibody was diluted
1:1 with a buffer containing 1.5 M ammonium sulfate and 25 mM sodium phosphate pH 6.5 and loaded onto a HIC column (Dionex ProPac HIC-10, 4.6 mm ×
100 mm). A 15-column-volume gradient from 30% to 60% (buffer A composed of
25 mM sodium phosphate, pH 6.95, and 1.5 M ammonium sulfate and buffer
B composed of 25 mM sodium phosphate, pH 6.95, and 25% isopropyl alcohol)
was used to separate the Protein-A elution pool. The protein separated into
two major species, one containing the intact bispecific antibody and the other
containing the excess anti-EGFR IgG (D1.5) half-antibody. Fractions containing
doi:10.1038/nbt.2621

intact antibody were pooled and treated to remove any remaining contaminating
endotoxin by adherence to an S-FF column in a 25 mM sodium acetate buffer
at pH 5.0, washing with acetate buffer containing 0.1% Triton X114, and then
removing the detergent by washing with acetate buffer. The protein was eluted
from the S-FF column using 25 mM Tris pH 8.0, pooled, and analyzed by SDS-PAGE, mass spectrometry and kinetic chromogenic limulus amebocyte lysate
(LAL) assays for endotoxin. The protein contained <0.05 EU/mg of endotoxin
in the final preparation, indicating that it is suitable for in vivo applications.
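The buffer compositions given for the simplified HIC scheme allow the salt and solvent levels along the gradient to be worked out. The sketch below assumes that the 30–60% range refers to the percentage of buffer B, that the gradient is linear over the 15 column volumes, and that the Protein-A eluate carries no ammonium sulfate of its own; all function and constant names are hypothetical.

```python
AMMONIUM_SULFATE_A_M = 1.5   # buffer A: 1.5 M ammonium sulfate
ISOPROPANOL_B_PCT = 25.0     # buffer B: 25% isopropyl alcohol
START_B, END_B = 0.30, 0.60  # assumed to be fractions of buffer B
GRADIENT_CV = 15.0           # gradient length in column volumes


def fraction_b(column_volumes: float) -> float:
    """Fraction of buffer B after a given number of column volumes of gradient."""
    if not 0.0 <= column_volumes <= GRADIENT_CV:
        raise ValueError("column_volumes must lie within the gradient")
    return START_B + (END_B - START_B) * column_volumes / GRADIENT_CV


def composition(column_volumes: float) -> tuple[float, float]:
    """Return (ammonium sulfate in M, isopropyl alcohol in %) at that point."""
    b = fraction_b(column_volumes)
    return AMMONIUM_SULFATE_A_M * (1.0 - b), ISOPROPANOL_B_PCT * b


def diluted_concentration(stock_m: float, parts_sample: float, parts_buffer: float) -> float:
    """Salt concentration after mixing a salt-free sample with salt-containing buffer."""
    return stock_m * parts_buffer / (parts_sample + parts_buffer)


if __name__ == "__main__":
    print(f"load: {diluted_concentration(1.5, 1, 1):.2f} M ammonium sulfate")
    for cv in (0.0, 7.5, 15.0):
        salt_m, ipa_pct = composition(cv)
        print(f"{cv:>4} CV: {salt_m:.2f} M ammonium sulfate, {ipa_pct:.1f}% isopropyl alcohol")
```

Under these assumptions the sample is loaded at 0.75 M ammonium sulfate, and the gradient runs from 1.05 M salt with 7.5% isopropyl alcohol to 0.60 M salt with 15% isopropyl alcohol, with 25 mM sodium phosphate throughout.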
Cell lines. The KP4 (Riken, RCB1005), A549 (ATCC, CCL-185), A431 (ATCC,
CRL-1555) and NCI-H596 (ATCC, HTB-178) cell lines were obtained from either
the American Type Culture Collection (ATCC) (Manassas, VA) or the Riken
Bioresource Center Cell Bank (Ibaraki, Japan). All cells were grown in normal
growth media (RPMI media + 10% fetal calf serum (FCS) + 2 mM L-glutamine) in a tissue
culture incubator kept at 37 °C with 5% CO2.
Cell proliferation assays. Cells were seeded at 2.5 × 10^5/ml in a volume of
100 µl in normal growth media without FCS with HGF (0.5 nM) and/or TGFα
(0.86 nM) in 96-well plates. Cells were treated with the indicated doses of
antibodies for 72 h in normal growth conditions before analysis by CellTiter-Glo
according to manufacturer’s recommendations (Promega, Madison, WI).
Data were normalized to untreated control samples and fitted by nonlinear
regression (Prism 5.0, GraphPad, San Diego, CA).
Immunoblotting. Lysates were generated from 200 µl of bacterial culture in 100 µl of NR-lysis buffer
(88 µl PopCulture Reagent (Novagen), 10 µl 100 mM iodoacetamide, 2 µl lysonase
reagent (EMD Biosciences)) and analyzed by SDS-PAGE under nonreducing conditions.
Samples were incubated for 10 min at room temperature and cleared by centrifugation for 2 min at 9,300×g, and supernatants were mixed 1:1 in 2×
SDS sample buffer (Invitrogen). Samples were heated for 5 min at 95 °C and centrifuged
for 1 min at 16,000×g before gel electrophoresis using NuPAGE 4–12% Bis-Tris/MES
gels (Invitrogen). Gels were transferred by iBlot (Invitrogen) onto nitrocellulose
membrane, immunoblotted with IRDye800CW conjugated anti-human IgG Fc
antibody (Rockland) and imaged with a LI-COR Odyssey Imager.
Human cells were seeded at 0.5–1 × 10^6 cells per well in a 6-well plate
overnight. Media was removed and replaced with fresh media without FCS
for 3 h. The test therapeutic antibodies were then all added to a final concentration of 1 µM for 1 h before challenge with HGF (0.5 nM) and/or TGFα
(0.86 nM) for 10 min. Cells were washed once with cold PBS and lysed in 1×
Cell Extraction Buffer (Biosource, Carlsbad, CA) supplemented with protease inhibitors (Roche, Mannheim, Germany), 1 mM phenylmethylsulfonyl
fluoride (PMSF; Sigma-Aldrich, St. Louis, MO), and Phosphatase Inhibitor
Cocktails 1 and 2 (Sigma, St. Louis, MO). Determination of protein concentration was done using the BCA Protein Assay Kit (Pierce, Rockford, IL).
For western blot analysis, equal amounts of protein were separated by electrophoresis through NuPAGE Bis-Tris 4–12% gradient gels (Millipore, Bedford,
MA), and proteins were transferred onto nitrocellulose membranes using the iBlot
system and protocol from Invitrogen (Carlsbad, CA). Antibodies to MET
(#3148) and pMET (#3077) were obtained from Cell Signaling (Danvers, MA),
the antibody to EGFR (#MI-12-1) from MBL and the antibody to pEGFR (#2234) from Cell Signaling.
The β-actin and GAPDH antibodies were obtained from Sigma (St. Louis, MO).
All primary antibodies were used at 1:1,000 dilutions except for the β-actin,
which was used at 1:10,000. The anti-Fab and anti-Fc Abs were from Jackson
ImmunoResearch (West Grove, PA). Specific antigen-antibody interaction was
detected with a HRP-conjugated secondary antibody IgG using ECL detection
reagents (Amersham Biosciences, Pittsburgh, PA). The p-MET, total MET,
p-EGFR and total EGFR band intensities were quantified using the LI-COR
Odyssey immunoblotting system (Lincoln, NE), using an Alexa goat anti-rabbit
antibody (#A21109, Invitrogen) or an IRDye goat anti-mouse antibody 800CW
(#926-32210, LI-COR), both used at a 1:5,000 dilution.
Xenograft tumor studies. All experimental procedures conformed to the
guiding principles of the American Physiological Society and were approved
by Genentech’s Institutional Animal Care and Use Committee. The KP4 and
A431 cells (5 × 10^6 total) were mixed in Hank’s Balanced Salt Solution and then
inoculated subcutaneously (s.c.) in the rear right flank of female nude (nu/nu)
mice (Charles River Laboratories). The NCI-H596 tumor cells (2 × 10^6 total)
were implanted in a similar fashion into hHGFTg-C3H-SCID mice27.
nature biotechnology

resource
OPEN

Genomic landscapes of Chinese hamster ovary cell lines
as revealed by the Cricetulus griseus draft genome

npg

© 2013 Nature America, Inc. All rights reserved.

Nathan E Lewis1,14, Xin Liu2,3,14, Yuxiang Li2,14, Harish Nagarajan1,14, George Yerganian4,5, Edward O’Brien1,
Aarash Bordbar1, Anne M Roth6,13, Jeffrey Rosenbloom6,13, Chao Bian2, Min Xie2, Wenbin Chen2, Ning Li2,3,7,
Deniz Baycin-Hizal8, Haythem Latif1, Jochen Forster9, Michael J Betenbaugh8,9, Iman Famili6,13, Xun Xu2,3,
Jun Wang2,10–12 & Bernhard O Palsson1,9
Chinese hamster ovary (CHO) cells, first isolated in 1957, are the preferred production host for many therapeutic proteins.
Although genetic heterogeneity among CHO cell lines has been well documented, a systematic, nucleotide-resolution
characterization of their genotypic differences has been stymied by the lack of a unifying genomic resource for CHO cells.
Here we report a 2.4-Gb draft genome sequence of a female Chinese hamster, Cricetulus griseus, harboring 24,044 genes.
We also resequenced and analyzed the genomes of six CHO cell lines from the CHO-K1, DG44 and CHO-S lineages. This analysis
identified hamster genes missing in different CHO cell lines, and detected >3.7 million single-nucleotide polymorphisms
(SNPs), 551,240 indels and 7,063 copy number variations. Many mutations are located in genes with functions relevant to
bioprocessing, such as apoptosis. The details of this genetic diversity highlight the value of the hamster genome as the reference
upon which CHO cells can be studied and engineered for protein production.
Recombinant therapeutic proteins are increasingly important to the
pharmaceutical industry. Global spending on biologics, such as antibodies, hormones and blood factors, reached $138 billion in
2010 (ref. 1). CHO cell lines are the preferred host expression system
for many therapeutic proteins2, and the cells have been repeatedly
approved by regulatory agencies. Moreover, they can be easily cultured
in suspension and can produce high titers of human-compatible
therapeutic proteins3.
Most improvements in CHO-based recombinant protein titer and
quality have been achieved by random cell-line mutagenesis and
media optimization4. Meanwhile, efforts to engineer mouse cells have
greatly benefited from numerous genomic tools and technologies,
owing in large part to the availability of the Mus musculus reference
genome sequence. Genomic resources are also becoming available for
CHO cells, such as the CHO-K1 genome5, expressed sequence tag6,7
and bacterial artificial chromosome (BAC) libraries8, and compendia of proteomic9–11 and transcriptomic data7,12–16. However, much
like how murine cell line data are routinely studied in the context of
the Mus musculus reference genome, there is a need for a standard
reference for all CHO cell lines to contextualize all of these valuable
genomic resources.
Many recombinant protein–producing CHO cell lines were derived
from the CHO-K1, CHO-S and DG44 lineages. Each has undergone
extensive mutagenesis and clonal selection17. Hence, a standard reference genome that is representative of the genomic sequence of all
native CHO genes and regulatory elements would be advantageous for
the successful implementation of genomic resources in CHO-based
bioprocessing4,17. To address this need, we present a draft genome
sequence of the C. griseus (Chinese hamster) colony from which the
CHO cell lines have been derived. This reference sequence is used to
analyze the genomic composition and mutational diversity among
seven CHO cell lines, and to study how sequence variations may affect
cellular processes that are of bioprocessing relevance. The C. griseus
genome may serve along with the previously published CHO-K1
genome as primary reference resources in future analyses of omics
data sets derived from CHO cells. This will also aid in bioprocessing
systems analysis and in cell line engineering studies.
RESULTS
Genome assembly
Female Chinese hamster DNA was acquired from various tissues and
sequenced using the Illumina HiSeq 2000 platform, yielding 347.5 Gb
of raw data (Supplementary Tables 1 and 2). Using SOAPdenovo, we
assembled 2.4 Gb of the genome with a contig N50 (the length of the shortest
sequence among the longest sequences that together contain at least half of the assembled bases)
of 26.5 kb and scaffold N50 of 1.54 Mb (Table 1). The genome was

1CHOmics, Inc., San Diego, California, USA. 2BGI-Shenzhen, Shenzhen, People’s Republic of China. 3BGI Europe, BGI-Shenzhen, Copenhagen Bio Science Park,
Copenhagen, Denmark. 4Cytogen Research and Development, Inc., West Roxbury, Massachusetts, USA. 5Foster Biomedical Research Laboratory, Brandeis University,
Waltham, Massachusetts, USA. 6GT Life Sciences, San Diego, California, USA. 7BGI Europe Institute, BGI-Shenzhen, Copenhagen Bio Science Park, Copenhagen,
Denmark. 8Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA. 9The Novo Nordisk Foundation Center
for Biosustainability, Technical University of Denmark, Hørsholm, Denmark. 10The Novo Nordisk Foundation Center for Basic Metabolic Research, University of
Copenhagen, Copenhagen, Denmark. 11Department of Biology, University of Copenhagen, Copenhagen, Denmark. 12King Abdulaziz University, Jeddah, Saudi Arabia.
13Present addresses: Life Technologies, Carlsbad, California, USA (A.M.R.), and Cell Engineering Unit, Intrexon Corporation, San Diego, California, USA (J.R. and
I.F.). 14These authors contributed equally to this work. Correspondence should be addressed to B.O.P. ([email protected]) or J.W. ([email protected]).
Received 7 August 2012; accepted 3 June 2013; published online 21 July 2013; doi:10.1038/nbt.2624

nature biotechnology  VOLUME 31  NUMBER 8  AUGUST 2013

Table 1  Assembly statistics

                          Contig                          Scaffold                        Super-scaffolds
                          Size (bp)       Number          Size (bp)       Number          Size (bp)       Number
N90                       6,390           91,476          346,540         1,637           443,523         1,091
N80                       11,724          65,207          656,362         1,156           939,760         723
N70                       16,531          48,549          950,835         853             1,417,091       519
N60                       21,461          36,190          1,249,430       634             1,994,221       378
N50                       26,761          26,456          1,544,832       461             2,491,721       271
Longest (bp)              219,443                         8,324,132                       10,797,402
Total size (bp)           2,332,459,831                   2,393,115,851                   2,400,585,184
Total number (≥100 bp)                    458,620                         287,210                         286,619
Total number (≥2 kb)                      128,107                         6,947                           6,356

further assembled into super-scaffolds with optical mapping, yielding an N50 of 2.49 Mb. Ninety percent of the genome assembly was
included in the 1,091 longest super-scaffolds (Table 1). The overall
size of the hamster genome was estimated to be 2.7 Gb using the
k-mer estimation method (Supplementary Fig. 1). Optical mapping
data were further combined with published BAC-based fluorescence
in situ hybridization data8 to successfully associate 26% of the genome
sequence data to specific hamster chromosomes (Supplementary
Tables 3 and 4).
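A k-mer-based genome size estimate of the kind mentioned above is commonly obtained by dividing the total number of k-mers in the reads by the k-mer depth at the main peak of the k-mer frequency histogram. The sketch below illustrates that general idea on a toy histogram. The function name, the `min_depth` cutoff used to ignore likely error k-mers, and the data are illustrative assumptions; this is not the authors' pipeline, which is summarized in Supplementary Fig. 1.

```python
def estimate_genome_size(kmer_hist, min_depth=3):
    """Estimate genome size from a k-mer frequency histogram.

    kmer_hist maps k-mer depth -> number of distinct k-mers observed at
    that depth. Depths below min_depth are treated as sequencing-error
    k-mers and ignored (an assumption made for this sketch).
    Returns (estimated_size, peak_depth).
    """
    usable = {d: c for d, c in kmer_hist.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers approximates the sequencing
    # depth of single-copy genomic k-mers.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences contributed by the usable part of the histogram.
    total_kmers = sum(d * c for d, c in usable.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (hypothetical numbers): error k-mers at depths 1-2,
# genomic peak centered at depth 20.
toy_hist = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 600, 21: 300, 22: 100}
size, peak = estimate_genome_size(toy_hist)
print(f"peak depth = {peak}, estimated size = {size:.0f} k-mer positions")
```

On real data the peak is usually located after smoothing the histogram, and heterozygosity or repeats can produce secondary peaks that this sketch does not handle.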
To assess the coverage of the hamster transcripts in the assembly,
we sequenced mRNA from a pool of hamster tissues and assembled
the transcriptome de novo into 98,116 contigs (Online Methods).
Mapping RNA-seq contigs to the genome assembly demonstrated that
>90% of the assembled transcripts could be associated with annotated
genes (Supplementary Table 5).
Genome annotation
We annotated repeat features and identified endogenous retroviral
elements (Supplementary Notes and Supplementary Tables 6–9).
We next predicted genes using homology-based approaches,
de novo gene prediction algorithms and transcriptome-based methods
(Supplementary Table 10 and Supplementary Fig. 2). The final
gene set consisted of 24,044 genes in the hamster genome, which
is similar to that of the CHO-K1 cell line5. Of these predicted
genes, 23,473 clustered into 21,628 gene families (Fig. 1a), and
3,052 (14.1%) gene families contained more than one gene in the
hamster. Only 20 gene families were unique to the hamster, when
compared to the rat, mouse and CHO-K1 genomes (Fig. 1b). We
functionally annotated 82% (19,775) of the predicted genes using
InterPro, Swiss-Prot, TrEMBL, Gene Ontology (GO) and KEGG
(Supplementary Table 11).


Comparison between hamster and CHO-K1 genomes
Mutations and structural variations are common in mammalian cell
line genomes17–19. Although large chromosomal rearrangements
have been shown in CHO cell lines previously8, the extent of these
changes at the sequence level remains unknown. Thus, we compared
the structure and gene content of the Chinese hamster genome and
the published genome of CHO-K1 cells from the American Type
Culture Collection (ATCC)5. To facilitate this comparison, we aligned
all large hamster and CHO-K1 scaffolds to the mouse chromosomes.
Numerous chromosomal translocations have occurred through evolution since the mouse and hamster diverged (Fig. 2a). However,
no large sections of the mouse chromosomes were missing in the
hamster (Fig. 2b). On the other hand, CHO-K1 scaffolds failed to
align to portions of mouse chromosomes 5, 7, 15 and 19 (Fig. 2b).
Meanwhile, Illumina sequencing reads from CHO-K1 (ref. 5) aligned
to the hamster scaffolds corresponding to these regions. This result
suggests that these regions may be present in CHO-K1, albeit
considerably mutated or rearranged. We next directly assessed the
scope of mutations by comparing the CHO-K1 genome to the hamster
genome. CHO-K1 contained 25,711 structural variations, including
13,735 insertions and 11,976 deletions (Supplementary Notes and
Supplementary Table 12). Despite the large number of structural
variations in CHO-K1, the set of annotated genes in the hamster
and CHO-K1 were highly similar. Specifically, there was a 99% overlap in gene content between the two genomes, and an assessment
of GOslim terms for these genes confirmed the similarity in gene
content (Fig. 2c).
Variation between different CHO cell lines
Despite the similarity in gene content, numerous genomic variations were detected in CHO-K1 relative to the hamster. To elucidate

[Figure 1 graphic not reproduced. Panel (a): number of genes (0–30,000) per species for B. taurus, E. caballus, R. norvegicus, M. musculus, H. sapiens, O. anatinus, CHO-K1, C. griseus and C. familiaris, divided into single-copy orthologs, multiple-copy orthologs, unique paralogs, other orthologs and unique genes. Panel (b): Venn diagram of orthologous gene clusters, labeled C. griseus (15,646), CHO-K1 (14,795), M. musculus (16,435) and R. norvegicus (15,981).]


Figure 1  Gene families across C. griseus and
several mammalian genomes. (a) The majority
of mammalian genes are orthologous, with
more than 5,000 preserved as single copies
in each species (dark blue). A few thousand
have species-specific duplications (light
blue), whereas other orthologs were shared by
only some of the nine mammals studied here
(orange). A small fraction of genes were unique
to just one species (green), and occasionally
had paralogs in that one species (pink). (b) The
overlap of orthologous gene clusters is shown
among the CHO-K1, C. griseus, M. musculus
and Rattus norvegicus genomes. ENSEMBL
(v58) annotated genes were used for the CHO-K1,
M. musculus and R. norvegicus genomes.


Nx contig (scaffold) size is the length of the smallest contig (scaffold) S in the sorted list of all contigs (scaffolds) where the cumulative length from the largest contig to contig S
is at least x% of the total assembly length.
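
The Nx definition in the Table 1 note can be made concrete with a short calculation. The sketch below sorts sequence lengths from longest to shortest and returns the length at which the running total first reaches x% of the assembly, together with the number of sequences needed (the "Size" and "Number" columns of Table 1). The function name and the toy lengths are assumptions for illustration only.

```python
def nx_statistic(lengths, x=50):
    """Return (Nx size, number of sequences) for a list of sequence lengths.

    Sequences are accumulated from longest to shortest; the Nx size is the
    length of the sequence at which the cumulative length first reaches
    x% of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    cumulative = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        cumulative += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if cumulative * 100 >= total * x:
            return length, count


# Toy assembly (hypothetical contig lengths, total = 54).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
n50 = nx_statistic(toy_lengths, 50)
n90 = nx_statistic(toy_lengths, 90)
print("N50 (size, number):", n50)
print("N90 (size, number):", n90)
```

Because larger x values require more sequences, the size shrinks and the count grows from N50 to N90, which is the pattern seen across the rows of Table 1.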

[Figure 2 graphic not reproduced. Panels: (a) C. griseus scaffolds assigned to M. musculus chromosomes; (b) C. griseus and CHO-K1 scaffolds aligned to M. musculus chromosomes (1–19, X, Y, mt); (c) number of genes in each GO slim "Biological Process" category for C. griseus and CHO-K1. Axis labels recovered: Number of scaffolds; M. musculus chromosomes; C. griseus chromosomes; Number of genes in GO category.]

[Figure 3 graphic not reproduced. Panels: (a) family tree of CHO cell lines, with nodes labeled Chinese hamster; Original CHO line (Puck, 1957); CHO-K1 (Kao & Puck, 1968); CHO-K1 ATCC (Puck, ATCC CCL-61); CHO-K1 ECACC (85051005); CHO-K1/SF (ECACC 93061607); CHO protein free (ECACC 00102307); CHO variant (Tobey, 1962); CHO pro3- (DHFR+) (Flintoff, 1976); CHO-MTXRIII DHFR mutant; CHO-DG44 (DHFR-) (Urlaub & Chasin, 1983; available from L. Chasin); CHO DG44 (cGMP banked; Life Technologies, A1097101); CHO-S (Tilkins, 1991); CHO-S (cGMP banked; Life Technologies, A1136401); and C0101 (production cell line), with treatments labeled EMS exposure and gamma rays; (b) normalized sequencing read depth versus distance from DHFR gene start (bp) for each cell line and C. griseus; (c) DHFR PCR products by cell line; (d) SNP-based phylogeny of the cell lines; (e) relative SNP abundance (SNP count/expected SNP count) by chromosome (1–10, X); (f) relative indel abundance (indel count/expected indel count) by chromosome (1–10, X).]

the extent of genomic heterogeneity across
other cell lines, we sequenced six additional
CHO cell lines (Fig. 3a) to >9× depth, covering ~95% of each genome. Including the
previously sequenced CHO-K1 genome, the
seven cell lines accounted for three different
lineages and several different phenotypic features, for example, cells adapted to different
media, suspension-grown cells and antibody-producing cells (Supplementary Table 13).
To initially validate our cell line resequencing data, we inspected
the genotype related to an important phenotypic marker for CHO cell
lines. Certain cell lines lack dihydrofolate reductase (DHFR) activity20, and cannot grow without glycine, hypoxanthine and thymidine
(GHT). However, when an exogenous DHFR gene is coupled to a
gene encoding a desired protein product on the same plasmid, the
GHT media and methotrexate can be used to select for clones that
overproduce DHFR and the recombinant protein of interest. Among
the cell lines sequenced here, only the DG44 cell line is known to carry
the DHFR-negative phenotype20. Consistent with this characteristic,


Figure 2  Genome comparison between mouse,
Chinese hamster and CHO-K1. Conserved
sequences among the mouse, CHO-K1 and
C. griseus genomes were determined by aligning
their scaffolds (larger than 1 Mb) to the mouse
genome. (a) Assignment of C. griseus scaffolds
to M. musculus chromosomes. The C. griseus
scaffolds with chromosomal assignment
(accounting for more than a quarter of the
2.4 Gb of genomic sequence) were compared
to mouse chromosomes to assess the scale of
chromosomal rearrangement. (b) Alignment of
CHO-K1 and C. griseus genomes. Few large DNA
stretches are missing in the hamster, whereas
there are more regions to which CHO-K1
scaffolds could not align. (c) Gene annotation.
The number of genes was determined for each
“Biological Process” GO slim category in both
the C. griseus and CHO-K1 genomes.

10
2
1
1/2
1/10

1 2 3 4 5 6 7 8 9 10 X
Chromosome

Figure 3  Mutation landscape of CHO cell lines. CHO cell lines have diverged over time due to numerous iterations of mutation, selection and clonal
isolation. (a) The family tree of a few cell lines is shown here, with the sequenced lines highlighted in blue. Where known, the name of those who
isolated the strain and the year it was done are given in parentheses. (b) Sequencing read depth (normalized by the average read depth for the cell line,
and averaged over 100 bp bins) was assessed for the DHFR gene, a selectable marker for some CHO cell lines. The DHFR gene was clearly deleted in
the DG44 cell line, as no DG44 reads aligned to this region and (c) no PCR product was obtained for the gene. Mutations were further analyzed on a
genome-wide scale. (d) A phylogenetic reconstruction based on the diversity of SNPs recapitulates the known historical divergence of these CHO cell
lines from inferred ancestral cell lines (gray parent nodes). (e,f) Furthermore, the abundance of SNPs (e) and indels (f) varied between the hamster
chromosomes, as determined using all scaffolds that could be assigned to specific chromosomes (~26% of the sequence data). ECACC, European
Collection of Cell Cultures; ATCC, American Type Culture Collection.
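
The read-depth normalization described for panel (b) can be sketched as follows: per-base depth is averaged in 100 bp bins and divided by the average read depth of the cell line, so that bins near 1 indicate normal coverage and bins near 0 indicate a deletion such as the DHFR locus in DG44. The function names, the deletion threshold and the toy depth values are assumptions for illustration, not the authors' implementation.

```python
def normalized_binned_depth(per_base_depth, mean_depth, bin_size=100):
    """Average per-base read depth in fixed-size bins, scaled by mean depth.

    per_base_depth: list of read depths, one per reference position.
    mean_depth: average read depth for the whole cell line (must be > 0).
    Returns one normalized value per bin; the last bin may be shorter.
    """
    if mean_depth <= 0:
        raise ValueError("mean_depth must be positive")
    bins = []
    for start in range(0, len(per_base_depth), bin_size):
        window = per_base_depth[start:start + bin_size]
        bins.append(sum(window) / len(window) / mean_depth)
    return bins


def candidate_deleted_bins(bins, threshold=0.05):
    """Indices of bins whose normalized depth falls below a threshold.

    The threshold is an arbitrary illustrative cutoff.
    """
    return [i for i, value in enumerate(bins) if value < threshold]


# Toy region: 100 bp at typical depth, 100 bp with no reads, 50 bp at 2x depth.
toy_depth = [10] * 100 + [0] * 100 + [20] * 50
depth_bins = normalized_binned_depth(toy_depth, mean_depth=10)
deleted = candidate_deleted_bins(depth_bins)
print(depth_bins, deleted)
```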

[Figure 4 graphic not reproduced. Panel (a): normalized expression (log10) of pro-apoptotic and anti-apoptotic factors in hamster and CHO-K1. Panel (b): difference in expression level from hamster to CHO-K1 (increased or decreased expression) for individual pro-apoptotic and anti-apoptotic genes. Panel (c) legend: direct and indirect interactions; cell lines C0101, CHO-S, DG44, K1 ATCC, K1 ECACC, K1/SF and protein free.]

Figure 4  Expression changes and CNVs of key
members of the apoptotic pathways. Apoptosis
is a complex network of proteins that integrates
several external and internal signals to make
decisions about programmed cell death. (a) On
average, gene expression levels of pro-apoptotic
genes are only slightly lower in CHO-K1, in
comparison to the Chinese hamster. However,
anti-apoptotic gene expression is significantly
higher in CHO-K1 (*: P < 0.02, Wilcoxon
rank-sum test). (b) When assessing expression
of individual genes, pro-apoptotic genes (red)
tend to more frequently decrease mRNA
expression, whereas anti-apoptotic genes (blue)
more frequently increase expression. (c) Many
major pro-apoptotic (red) and anti-apoptotic
(blue) proteins are represented here in the
context of the extrinsic (brown), intrinsic (red)
or survival (blue) pathways. Proteins that have
CNVs are plotted in bar graphs with each bar
representing a unique cell line as detailed in
the legend, and copy numbers are normalized to
the copy number in hamster. Thus, a value less
than one suggests a loss of a gene copy, whereas
a value greater than one suggests duplication.
Details on each gene abbreviation are included
in Supplementary Table 21.
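
The copy-number normalization described in the caption can be illustrated with a few lines of code: each cell line's copy number for a gene is divided by the hamster copy number, and ratios below or above 1 are read as loss or duplication. The function name and the tolerance band around 1 are assumptions introduced for this sketch; the caption itself only states the interpretation of values below and above 1.

```python
def classify_copy_number(cell_line_copies, hamster_copies, tolerance=0.25):
    """Normalize a gene's copy number to the hamster and label the change.

    Returns (ratio, label) where label is "loss", "duplication" or
    "unchanged". The tolerance band is an illustrative assumption.
    """
    if hamster_copies <= 0:
        raise ValueError("hamster_copies must be positive")
    ratio = cell_line_copies / hamster_copies
    if ratio < 1 - tolerance:
        return ratio, "loss"
    if ratio > 1 + tolerance:
        return ratio, "duplication"
    return ratio, "unchanged"


# Hypothetical copy numbers for one gene in three cell lines
# (hamster has 2 copies).
examples = {name: classify_copy_number(copies, 2)
            for name, copies in {"line_A": 4, "line_B": 1, "line_C": 2}.items()}
print(examples)
```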

[Figure 4c graphic not reproduced: diagram of the extrinsic, intrinsic and survival apoptosis pathways (including Ca2+-induced, stress-induced and DNA damage–induced cell death, cleavage of caspase substrates and DNA fragmentation), with bar charts of hamster-normalized copy number per cell line for proteins that have CNVs.]

all cell lines had genomic sequence data for the DHFR gene, except for DG44 (Fig. 3b).
This DG44-specific deletion was further confirmed by PCR (Fig. 3c).
To assess the genome-wide differences between these CHO cell lines, we used the
hamster genome assembly as the reference sequence. This reference
sequence allowed us to determine SNPs, short insertions and deletions
(indels) and gene copy number variations (CNVs) (Supplementary
Table 14). Across the cell lines, we identified 3,715,639 SNPs, and a
phylogenetic reconstruction based on these SNPs accurately recapitulated
the cell line history (Fig. 3d). We also identified 551,240 indels
shorter than 5 bp, 319 of which are predicted to be frame-shifting indels
in coding regions. SNPs and indels did not occur uniformly, and some
hamster chromosomes were more affected than others (Fig. 3e,f).
We also found 3,383 nonredundant duplicated regions in at
least one cell line and 177 duplicated regions in all seven cell lines
(Supplementary Table 15). In total, 4,241 genes resided entirely
within these 3,383 duplicated regions. Moreover, 113 genes were
found to have a reduced copy number in one or more cell lines.
In addition, 17 hamster genes were completely missing in at least one
cell line, and the missing genes often differed between the lineages
(Supplementary Table 16).
A variety of genes are associated with mutations and CNVs
(Supplementary Tables 17–20). Of the SNPs, 5,487 (0.15%) were
nonsynonymous and significantly enriched in many GO classes (false
discovery rate < 0.01), such as olfactory genes and G protein–coupled
receptors (P < 2 × 10^-25 and 6 × 10^-21, respectively; hypergeometric
test), whereas genes in these same classes were rarely duplicated
(P < 1 × 10^-5 and 0.02, respectively; hypergeometric test). In addition,
proteins involved in cell adhesion were also enriched in SNPs
(P < 0.004; hypergeometric test). It is possible that these mutations influence
the ability of CHO cells to grow in suspension cultures without
adhesion factors.
Other genes were protected from SNPs, such as genes associated
with DNA binding transcription factor activity and metabolism
(P < 0.006 and P < 9 × 10^-5, respectively; hypergeometric test).
Notably, some signaling pathways were insulated from SNPs, such
as the WNT and mTOR signaling pathways (P < 0.02 and P < 0.002,
respectively; hypergeometric test) and autophagy (P < 0.01). These
pathways all contribute to the proliferative and immortalized phenotypes
in cancer cells21–23 and likely play a similar role in CHO cell
lines. Protein glycosylation was also significantly insulated from
SNPs in all cell lines (mean hypergeometric P = 0.018). Thus, the
distribution of mutations and CNVs seems consistent with traits
that make CHO cell lines desirable protein production hosts
(that is, high proliferation rate, suspension growth and protected
protein glycosylation).
Using the genome to study the apoptosis pathway
CHO production strains can be grown to high cell densities in fed-batch
cultures with serum-free media. Bioprocessing limitations in
nutrients in these environments can lead to apoptosis, thereby limiting
viable cell density and volumetric productivity. To improve
bioprocessing efficiencies, many researchers have sought to improve
cell-line longevity by suppressing apoptosis in CHO cells. These
efforts involve modulating protein activity by overexpressing anti-apoptotic
pathways24 and blocking pro-apoptotic pathways with
chemicals25, short interfering RNA (siRNA)26 and gene deletions27.
However, the complex nature of apoptosis has made it nontrivial to
optimize in CHO cells. Thus, a more complete view of gene expression
and mutations in the apoptosis system could facilitate bioprocessing
and cell engineering efforts to control cell death.
To assess changes in apoptosis in CHO cells, we first identified
homologs for anti- and pro-apoptotic proteins in the C. griseus
genome (Supplementary Table 21). Of the 62 KEGG orthologous


DISCUSSION
Genomic resources have provided a wealth of tools in biotechnology4,
ranging from phenotyping tools, such as transcriptomics, to genome
editing technologies. These resources have transformed our ability
to study and modify the functions of human cells (e.g., cancer and
human embryonic kidney cells) and other model organisms. Similar
tools are becoming available for CHO cells16,39,40, but maximizing
their potential requires a clear picture of the genomic landscape of
CHO cells. Here, we demonstrated how the C. griseus genome can
provide a sequence-level view of genomic heterogeneity between cell
lines and yield a more comprehensive picture of the variants in a cell
line of choice.
Numerous studies have shown large chromosomal rearrangements
in CHO cells, using banding techniques41–43 and fluorescence in situ
hybridization8,44–48. These approaches identified large translocations
in CHO cells, providing a coarse-grained view of genomic variations in these unstable genomes. We present, for the first time to our
knowledge, a whole-genome, sequence-level view of the heterogeneity between CHO cell lines. We showed that each cell line harbors
a unique set of mutations, including SNPs, indels, CNVs and missing genes. CNVs were particularly heterogeneous, with 48% (mostly
duplications) being unique to one cell line (Supplementary Table 16).
We also found that mutations rapidly accumulate during development
of production cell lines. For example, during the development of the
C0101 antibody-producing cell line from CHO-S, 301,753 new SNPs
arose, representing 9% of the SNPs in that cell line.
The nonuniform distribution of mutations in each cell line seemed
to have some phenotypic relevance. Indeed, several processes associated with proliferation and immortalized phenotypes were more
insulated from mutation. These included the WNT and mTOR
signaling pathways and processes such as autophagy. Mutations
in other pathways such as glycosylation and viral susceptibility
(Supplementary Notes) varied between cell lines and might influence desired phenotypic properties, although careful biochemical
studies are needed. Duplications were also seen for many apoptotic genes. Notably, many of the sequence variations were shared
between members of the same family of CHO cells (that is, CHO-K1, DG44 or CHO-S), but these were frequently not shared across
CHO cell families (Supplementary Tables 16 and 22–24). A detailed
knowledge of mutations in each cell line may be valuable for cell
line selection, characterization and engineering, as well as bioprocess and media optimization. This knowledge for each cell line may
further improve the success of siRNAs, zinc finger nucleases and other
cell-line engineering tools. Additionally, as more sequence variation
data are collected on diverse cell lines, it may be possible to associate
cell phenotypes with different mutations (as is commonly done in
model organisms49).
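
The statements that some gene classes are enriched in, or insulated from, mutations rest on hypergeometric tests. A minimal sketch of such a test is given below: given N genes in total, K of them in a class of interest, and n mutated genes, it computes the probability of seeing at least k (enrichment) or at most k (depletion) mutated genes in the class by chance. The function names and the toy counts are assumptions for illustration; the exact gene sets and any multiple-testing correction used in the study are not reproduced here.

```python
from math import comb


def hypergeom_pmf(i, N, K, n):
    """P(X = i) when drawing n genes without replacement from N genes,
    K of which belong to the class of interest."""
    return comb(K, i) * comb(N - K, n - i) / comb(N, n)


def enrichment_p(k, N, K, n):
    """Upper-tail P(X >= k): class contains at least k mutated genes."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(max(k, 0), min(K, n) + 1))


def depletion_p(k, N, K, n):
    """Lower-tail P(X <= k): class contains at most k mutated genes."""
    lowest = max(0, n - (N - K))  # smallest feasible overlap
    return sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, min(k, K, n) + 1))


# Toy numbers (hypothetical): 10 genes, 4 in the class, 3 mutated genes.
p_enriched = enrichment_p(2, N=10, K=4, n=3)   # at least 2 of 3 in the class
p_depleted = depletion_p(0, N=10, K=4, n=3)    # none of 3 in the class
print(p_enriched, p_depleted)
```

For genome-scale counts the same sums apply unchanged, since Python integers do not overflow, although a statistics library would usually be faster.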
To fully detail the sequence variations, it is necessary to have a
well-defined reference genome with relevance to all CHO cell lines.
The reference genome should exhibit several properties. First, it
must contain the genomic sequence of all native CHO genes and
their regulatory elements. We found that CHO-K1 seems to be missing certain hamster genes, and that cell lines from other lineages are
missing other genes (Supplementary Table 16). Although we focused
on genes that are entirely missing, many more truncated genes and
disrupted promoter elements may be found in each cell line as gene
models are improved and as regulatory elements are discovered.
Second, it is often desirable to identify all variants in a cell line,
and not just the genomic differences between two cell lines. There are
clear ultrastructural differences between the hamster and CHO cells.
Some chromosomal translocations are conserved among cell lines8.
These structural variations are likely conserved because CHO cells
from the CHO-K1, DG44 and CHO-S lineages share a common highly
mutated ancestor. Indeed, we found that 67% of SNPs (~2.5 million)
were shared among all CHO cell lines. These shared variants would
be missed if the CHO-K1 genome were used as the sole reference.
Mutations with deleterious effects on expression and/or activity can
be more comprehensively cataloged using the hamster genome as the
reference. Thus, endemic loss-of-function mutations in CHO could be
identified and remedied as needed for a desired phenotype.

nature biotechnology  VOLUME 31  NUMBER 8  AUGUST 2013

763

gene identifiers in apoptosis, 92% were in the hamster genome.
Consistent with observations in mouse, caspase-10 was missing28.
Other missing genes included interleukin-3, interleukin-3 receptor
alpha and interleukin-1 alpha. Although these genes were undetected,
apoptosis utilizes redundant pathways, and the lack of these genes
should not hinder the system.
In the CHO-K1 cell line, no additional genes for anti- and pro-apoptotic proteins were lost relative to the hamster. Instead, apoptotic
gene expression significantly changed. Pro-apoptotic genes exhibited
slightly lower gene expression in CHO-K1 in comparison to C. griseus,
although this was not statistically significant. However, anti-apoptotic
genes in CHO-K1 exhibited significantly higher median expression (P < 0.02; Wilcoxon rank-sum test; Fig. 4a). Apoptotic genes
with the greatest increase in expression tended to be anti-apoptotic
(e.g., NF-κB, protein kinase A, Akt and Bcl-XL), whereas repressed
genes tended to be pro-apoptotic (e.g., endonuclease G, IκBα, BAX,
and p53) (Fig. 4b). Thus, CHO-K1 suppresses apoptosis, and we
anticipate that similar gene expression changes occur in other CHO
cell lines.
In addition to changes in apoptotic gene expression, CNVs also
frequently occur in apoptotic genes in mammalian cell lines29. As
CNVs can complicate efforts to engineer cell lines, we also analyzed
CHO CNVs in the context of the apoptosis pathways.
The apoptotic network is stimulated by external signals through the
extrinsic pathway, or internal stress signals (e.g., increases in cytosolic
Ca²⁺ or DNA damage) through the intrinsic pathway. The diverse signals transmitted by each pathway converge upon the caspase proteases,
which cleave protein targets and lead to cell death28. As a strategy
to increase CHO cell longevity, caspase activation has been targeted
with chemical inhibitors30 and caspase-inhibiting proteins24,31–33.
We found that several cell lines contained extra copies of various
caspases (Fig. 4c). Thus, efforts to remove pro-apoptotic genes, such
as caspases, should account for potential CNVs for those genes. Some
anti-apoptotic genes were duplicated only in individual cell lines,
which may lead to these lines being more resilient against apoptosis activation. For example, the inhibitors of apoptosis (IAP) family
of proteins inhibit caspases34, and we found that one IAP gene, BIRC7,
is duplicated in all cell lines. In addition, another anti-apoptotic
factor, phosphoinositide 3-kinase (PI3K), also showed cell type–
specific CNVs.
In general, CNVs occur in various pathways, such as apoptosis and
glycosylation (Supplementary Fig. 3) and can differ between cell
lines (Supplementary Tables 22 and 23). Knowledge of CNVs can
help researchers avoid unexpected genomic changes35–37 when using
nucleases in duplicated regions. CNVs can be clone-specific as gene
copy numbers in a single cell line vary considerably during growth
media adaptation or after several cell passages29,38. Thus, clone-specific genomic data may indicate which cell line modifications will
be effective in developing a particular production cell line.

npg

© 2013 Nature America, Inc. All rights reserved.

resource
Third, a reference genome must be amenable to improvement over
time. The chromosomes of CHO cell lines are unstable, with nonnegligible karyotypic differences even in the same culture17,43. Thus, it
will be much easier to develop and maintain a gold standard reference
sequence of the more stable Chinese hamster genome. This resource
will be valuable for characterizing CHO cell lines and using omic
technologies, akin to how the M. musculus genome is used for studying murine cell lines. Furthermore, although regulatory challenges
remain for cell line engineering, whole-genome resequencing against
a reference genome will provide transparency as regulatory agencies
assess products from engineered cell lines for approval.
There are important differences in genomic content among CHO
cell lines that can influence cell line traits. These are likely to be further influenced by differences in gene expression levels. As a result,
genome-scale viewpoints will likely become increasingly relevant for
CHO-based bioprocessing, as they have for microbe-based manufacturing over the past decade. Although these approaches can require
expensive phenotyping and omic technologies, costs are rapidly
decreasing. Thus, genome-scale analyses may enhance our ability to
understand the production characteristics of CHO cell lines and aid
in the production of therapeutic proteins in the coming decades.

1. IMS Institute for Healthcare Informatics. The Global Use of Medicines: Outlook
Through 2015 ⟨http://www.imshealth.com/ims/Global/Content/Insights/IMS%20
Institute%20for%20Healthcare%20Informatics/Documents/The_Global_Use_of_
Medicines_Report.pdf⟩ (IMS, 2011).

2. Walsh, G. Biopharmaceutical benchmarks 2010. Nat. Biotechnol. 28, 917–924
(2010).
3. De Jesus, M. & Wurm, F.M. Manufacturing recombinant proteins in kg-ton quantities
using animal cells in bioreactors. Eur. J. Pharm. Biopharm. 78, 184–188
(2011).
4. Wuest, D.M., Harcum, S.W. & Lee, K.H. Genomics in mammalian cell culture
bioprocessing. Biotechnol. Adv. 30, 629–638 (2012).
5. Xu, X. et al. The genomic sequence of the Chinese hamster ovary (CHO)-K1 cell
line. Nat. Biotechnol. 29, 735–741 (2011).
6. Wlaschin, K.F. et al. EST sequencing for gene discovery in Chinese hamster ovary
cells. Biotechnol. Bioeng. 91, 592–606 (2005).
7. Kantardjieff, A. et al. Developing genomic platforms for Chinese hamster ovary cells.
Biotechnol. Adv. 27, 1028–1035 (2009).
8. Cao, Y. et al. Construction of BAC-based physical map and analysis of chromosome
rearrangement in Chinese hamster ovary cell lines. Biotechnol. Bioeng. 109,
1357–1367 (2012).
9. Baycin-Hizal, D. et al. Proteomic analysis of Chinese hamster ovary cells.
J. Proteome Res. 11, 5265–5276 (2012).
10. Meleady, P. et al. Utilization and evaluation of CHO-specific sequence databases
for mass spectrometry based proteomics. Biotechnol. Bioeng. 109, 1386–1394
(2012).
11. Wei, Y.Y. et al. Proteomics analysis of Chinese hamster ovary cells
undergoing apoptosis during prolonged cultivation. Cytotechnology 63, 663–677
(2011).
12. Becker, J. et al. Unraveling the Chinese hamster ovary cell line transcriptome by
next-generation sequencing. J. Biotechnol. 156, 227–235 (2011).
13. Birzele, F. et al. Into the unknown: expression profiling without genome sequence
information in CHO by next generation sequencing. Nucleic Acids Res. 38,
3999–4010 (2010).
14. Clarke, C. et al. CGCDB: A web-based resource for the investigation of
gene coexpression in CHO cell culture. Biotechnol. Bioeng. 109, 1368–1370
(2011).
15. Hackl, M. et al. Computational identification of microRNA gene loci and
precursor microRNA sequences in CHO cell lines. J. Biotechnol. 158, 151–155
(2012).
16. Kildegaard, H.F., Baycin-Hizal, D., Lewis, N.E. & Betenbaugh, M.J. The emerging
CHO systems biology era: harnessing the ‘omics revolution for biotechnology.
Curr. Opin. Biotechnol. doi:10.1016/j.copbio.2013.02.007 (20 March 2013).
17. Wurm, F.M. & Hacker, D. First CHO genome. Nat. Biotechnol. 29, 718–720
(2011).
18. Mayshar, Y. et al. Identification and classification of chromosomal
aberrations in human induced pluripotent stem cells. Cell Stem Cell 7, 521–531
(2010).
19. Pleasance, E.D. et al. A comprehensive catalogue of somatic mutations from a
human cancer genome. Nature 463, 191–196 (2010).
20. Urlaub, G., Kas, E., Carothers, A.M. & Chasin, L.A. Deletion of the diploid
dihydrofolate reductase locus from cultured mammalian cells. Cell 33, 405–412
(1983).
21. Rosenfeldt, M.T. & Ryan, K.M. The multiple roles of autophagy in cancer.
Carcinogenesis 32, 955–963 (2011).
22. Sabatini, D.M. mTOR and cancer: insights into a complex relationship. Nat. Rev.
Cancer 6, 729–734 (2006).
23. Yu, M. et al. RNA sequencing of pancreatic circulating tumour cells implicates
WNT signalling in metastasis. Nature 487, 510–513 (2012).
24. Becker, E., Florin, L., Pfizenmaier, K. & Kaufmann, H. Evaluation of a combinatorial
cell engineering approach to overcome apoptotic effects in XBP-1(s) expressing
cells. J. Biotechnol. 146, 198–206 (2010).
25. Fussenegger, M., Schlatter, S., Datwyler, D., Mazur, X. & Bailey, J.E. Controlled
proliferation by multigene metabolic engineering enhances the productivity of
Chinese hamster ovary cells. Nat. Biotechnol. 16, 468–472 (1998).
26. Kim, S.H. & Lee, G.M. Down-regulation of lactate dehydrogenase-A by siRNAs for
reduced lactic acid formation of Chinese hamster ovary cells producing
thrombopoietin. Appl. Microbiol. Biotechnol. 74, 152–159 (2007).
27. Cost, G.J. et al. BAK and BAX deletion using zinc-finger nucleases yields
apoptosis-resistant CHO cells. Biotechnol. Bioeng. 105, 330–340 (2010).
28. Ghavami, S. et al. Apoptosis and cancer: mutations within caspase genes. J. Med.
Genet. 46, 497–510 (2009).
29. Laurent, L.C. et al. Dynamic changes in the copy number of pluripotency and cell
proliferation genes in human ESCs and iPSCs during reprogramming and time in
culture. Cell Stem Cell 8, 106–118 (2011).
30. Arden, N. et al. Chemical caspase inhibitors enhance cell culture viabilities and
protein titer. Biotechnol. Prog. 23, 506–511 (2007).
31. Dorai, H. et al. Combining high-throughput screening of caspase activity with anti-apoptosis genes for development of robust CHO production cell lines. Biotechnol.
Prog. 26, 1367–1381 (2010).
32. Kim, Y.G., Kim, J.Y. & Lee, G.M. Effect of XIAP overexpression on sodium butyrate-induced apoptosis in recombinant Chinese hamster ovary cells producing
erythropoietin. J. Biotechnol. 144, 299–303 (2009).
33. Wang, Z., Park, J.H., Park, H.H., Tan, W. & Park, T.H. Enhancement of therapeutic
monoclonal antibody production in CHO cells using 30Kc6 gene. Process Biochem.
45, 1852–1856 (2010).
34. Dasgupta, A., Alvarado, C.S., Xu, Z. & Findley, H.W. Expression and functional role
of inhibitor-of-apoptosis protein livin (BIRC7) in neuroblastoma. Biochem. Biophys.
Res. Commun. 400, 53–59 (2010).


Methods
Methods and any associated references are available in the online
version of the paper.
Accession codes. GenBank: AMDS00000000; the version described
in this study is AMDS00000000.1. Accession codes for the sequencing data for the cell lines and the hamster transcriptome are listed in
Supplementary Table 25.
Note: Supplementary information is available in the online version of the paper.
Acknowledgments
The authors would like to thank K.C. Hayes, at Brandeis University, for enabling
the continued housing of the Chinese hamster colony from which CHO cells
were derived. B. Monroe, L. Chasin and S. Gorfien kindly aided in delineating
the history of the cell lines used in this study, and T. Omasa gave guidance in
the chromosomal assignments of scaffolds. This work was funded in part by the
China National GeneBank-Shenzhen, the Shenzhen Engineering Laboratory
for Genomics-Assisted Animal Breeding, and the Shenzhen Key Laboratory of
Transomics Biotechnologies (NO.CXB201108250096A). This work and the Center
for Biosustainability at the Danish Technical University were also funded with
generous support from the Novo Nordisk Foundation. Female Chinese hamsters
were kindly provided by G. Yerganian. The authors would also like to thank
L. Donahue-Hjelle for kindly providing their cell lines.
Author contributions
B.O.P., I.F., X.X., J.W., M.J.B. and J.F. conceived, designed and guided the study.
N.E.L. and H.N. wrote the manuscript. G.Y., A.M.R., J.R., D.B.-H. and
N.L. prepared tissue and cells for sequencing. X.X., X.L., Y.L., C.B., W.C. and
M.X. performed the genome assembly, optical mapping and annotation. N.E.L.,
X.L., H.N., Y.L., E.O’B., A.B. and H.L. analyzed the variant and transcriptomic data.
All authors read and approved the manuscript.
COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details are available in the online
version of the paper.
Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this
license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/

35. Bueno, C. et al. Etoposide induces MLL rearrangements and other chromosomal
abnormalities in human embryonic stem cells. Carcinogenesis 30, 1628–1637 (2009).
36. Lee, H.J., Kweon, J., Kim, E., Kim, S. & Kim, J.S. Targeted chromosomal duplications
and inversions in the human genome using zinc finger nucleases. Genome Res. 22,
539–548 (2012).
37. Piganeau, M. et al. Cancer translocations in human cells induced by zinc finger
and TALE nucleases. Genome Res. 23, 1182–1193 (2013).
38. Narva, E. et al. High-resolution DNA analysis of human embryonic stem cell lines
reveals culture-induced copy number changes and loss of heterozygosity.
Nat. Biotechnol. 28, 371–377 (2010).
39. Griffin, T.J., Seth, G., Xie, H., Bandhakavi, S. & Hu, W.S. Advancing mammalian
cell culture engineering using genome-scale technologies. Trends Biotechnol. 25,
401–408 (2007).
40. Datta, P., Linhardt, R.J. & Sharfstein, S.T. An ‘omics approach towards CHO cell
engineering. Biotechnol. Bioeng. 110, 1255–1271 (2013).
41. Deaven, L.L. & Petersen, D.F. The chromosomes of CHO, an aneuploid Chinese
hamster cell line: G-band, C-band, and autoradiographic analyses. Chromosoma 41,
129–144 (1973).
42. Derouazi, M. et al. Genetic characterization of CHO production host DG44 and
derivative recombinant cell lines. Biochem. Biophys. Res. Commun. 340,
1069–1077 (2006).
43. Worton, R.G., Ho, C.C. & Duff, C. Chromosome stability in CHO cells. Somatic Cell
Genet. 3, 27–45 (1977).
44. Balajee, A.S., Dominguez, I. & Natarajan, A.T. Construction of Chinese hamster
chromosome specific DNA libraries and their use in the analysis of spontaneous
chromosome rearrangements in different cell lines. Cytogenet. Cell Genet. 70,
95–101 (1995).
45. Davies, J. & Reff, M. Chromosome localization and gene-copy-number quantification
of three random integrations in Chinese-hamster ovary cells and their amplified cell
lines using fluorescence in situ hybridization. Biotechnol. Appl. Biochem. 33,
99–105 (2001).
46. Simi, S., Xiao, Y., Campagna, M., Doehmer, J. & Darroudi, F. Dual-colour FISH
analysis to characterize a marker chromosome in cytochrome P450 2B1 recombinant
V79 Chinese hamster cells. Mutagenesis 14, 57–61 (1999).
47. Xiao, Y., Slijepcevic, P., Arkesteijn, G., Darroudi, F. & Natarajan, A.T. Development
of DNA libraries specific for Chinese hamster chromosomes 3, 4, 9, 10, X, and Y
by DOP-PCR. Cytogenet. Cell Genet. 75, 57–62 (1996).
48. Cao, Y. et al. Fluorescence in situ hybridization using bacterial artificial chromosome
(BAC) clones for the analysis of chromosome rearrangement in Chinese hamster
ovary cells. Methods 56, 418–423 (2012).
49. Rosenberg, N.A. et al. Genome-wide association studies in diverse populations.
Nat. Rev. Genet. 11, 356–366 (2010).

ONLINE METHODS

Sample preparation and DNA sequencing. Female Chinese hamsters were
kindly provided by G. Yerganian. Genomic DNA was isolated from multiple tissues using a modified SDS method50. Seven different paired-end libraries were
constructed with 170 bp, 500 bp, 800 bp, 2 kb, 5 kb, 10 kb and 20 kb insert sizes,
using the standard protocol provided by Illumina (San Diego). The sequencing was done using Illumina HiSeq 2000 according to the manufacturer’s
standard protocol. The raw data were filtered to remove low-quality reads,
reads with adaptor sequences, and duplicated reads before de novo genome
assembly (Supplementary Notes).


Optical mapping. High molecular weight DNA was obtained from Chinese
hamster tissues. Whole genome shotgun, single-molecule restriction maps
were generated using the automated Argus system (OpGen Inc., Maryland,
USA), based on the optical mapping technology51,52. Individual DNA molecules were deposited onto silane-derivatized glass surfaces in MapCards
(OpGen Inc., MD, USA) and digested by BamHI enzyme. DNA was subsequently stained with JOJO fluorescence dye (Invitrogen, CA, USA) and imaged
within the Argus system. A total of 28 MapCards were processed. The DNA
molecules were marked up and restriction fragment size was determined
by image processing in parallel with image acquisition. This yielded ~26×
optical data.
Genome assembly. Similar to the assembly of the CHO-K1 genome, SOAPdenovo
v.1.06 (ref. 53) was used to assemble the hamster genome into contigs and scaffolds as well as for gap closure. The final genome assembly was 2.4 Gb in length,
which is about 89% of the estimated genome size. The contig N50 (the length of the
shortest contig in the set of longest contigs that together contain at least half of the assembled sequence) was
26.5 kb and the scaffold N50 was 1.54 Mb (see Table 1 for assembly statistics).
Optical mapping data were used to further assemble the genome
into super-scaffolds. The scaffolds were extended according to the optical maps
to determine overlapping regions between scaffolds and their relative location and orientation. First, the sequence scaffolds were converted into restriction maps by in silico restriction enzyme digestion by BamHI. These in silico
restriction maps were used as seeds to identify single-molecule restriction
maps of DNA from the corresponding genomic regions by map-to-map alignment. These single-molecule maps were then assembled together by using the
in silico maps, to produce elongated consensus maps (extended scaffolds). The
low coverage regions near the ends of the extended scaffolds were trimmed off
to maintain high extension quality. To generate sufficient extension length, we
repeated the alignment-assembly process 4–5 times, using the extended scaffolds as seeds for each subsequent iteration. All of the extended scaffolds were
then aligned to each other. Any pair-wise alignments above an empirically
decided confidence threshold were considered as initial candidates for scaffold
connection. Alignments that overlapped substantially with the initial scaffolds
were excluded from the candidates. Among the remaining alignments, those
with the highest score were considered. The relative location and orientation
of each pair of connected scaffolds were used to generate super-scaffolds. This
resulted in 6,356 super-scaffolds (>2 kb) with N50 of 2.49 Mb (Table 1).
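Two of the quantities used in this assembly step can be illustrated with a short sketch: the N50 statistic and the in silico BamHI digestion that turns a scaffold sequence into a restriction map. This is an illustration only and not the pipeline used in the study; the function names are hypothetical, the N50 follows the common convention (the length at which the cumulative sum of lengths, sorted longest first, reaches half of the total), and the digest assumes the canonical BamHI site GGATCC cut after the first G.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


def bamhi_fragment_lengths(sequence, site="GGATCC", cut_offset=1):
    """Return ordered fragment lengths from an in silico BamHI digest.

    The site is palindromic, so scanning one strand finds every cut.
    """
    sequence = sequence.upper()
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cut_positions + [len(sequence)]
    return [right - left for left, right in zip(boundaries, boundaries[1:])]


# Hypothetical toy inputs, for illustration only.
toy_lengths = [2, 3, 4, 5, 6]
toy_scaffold = "AAAGGATCCTTTTGGATCCA"
print(n50(toy_lengths))
print(bamhi_fragment_lengths(toy_scaffold))
```

The ordered list of fragment lengths is the kind of in silico restriction map that can be compared with single-molecule optical maps; real map-to-map alignment additionally has to tolerate sizing error and missed or false cuts, which this sketch does not attempt.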
Chromosomal assignment of scaffolds. To assign scaffolds to their respective chromosomes, our optical mapping data were used in conjunction with
published BAC end-sequencing and fluorescence in situ hybridization8.
Specifically, chromosomal assignments were obtained for each BAC, and then
blastn was used to find scaffolds with the highest homology to the BAC end-sequences (E-value < 1 × 10⁻⁵). Scaffolds aligned to BACs from more than one
chromosome were filtered from the analysis. Once chromosomal assignments
were obtained for scaffolds (Supplementary Table 3), they were extended to
super-scaffolds based on optical mapping data (Supplementary Table 4). From
this analysis, we were able to reliably localize 26% of the genomic sequence to
specific hamster chromosomes.
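The filtering rule in this step (keep a scaffold only if all of its BAC hits point to a single chromosome) can be sketched as follows. This is a simplified illustration under stated assumptions: the input is taken to be a list of already-selected best blastn hits, and the function name, the tuple layout and the BAC-to-chromosome dictionary are hypothetical rather than taken from the study.

```python
from collections import defaultdict


def assign_scaffolds(hits, bac_to_chromosome, max_evalue=1e-5):
    """Map each scaffold to a chromosome when its BAC hits agree.

    hits: iterable of (scaffold_id, bac_id, evalue) tuples.
    bac_to_chromosome: dict of BAC id -> chromosome label.
    Scaffolds hitting BACs from more than one chromosome are dropped.
    """
    chromosomes_by_scaffold = defaultdict(set)
    for scaffold, bac, evalue in hits:
        if evalue < max_evalue and bac in bac_to_chromosome:
            chromosomes_by_scaffold[scaffold].add(bac_to_chromosome[bac])
    return {
        scaffold: next(iter(chromosomes))
        for scaffold, chromosomes in chromosomes_by_scaffold.items()
        if len(chromosomes) == 1
    }


# Hypothetical identifiers, for illustration only.
example_hits = [
    ("scaffold1", "bacA", 1e-20),
    ("scaffold1", "bacB", 1e-30),
    ("scaffold2", "bacA", 1e-10),
    ("scaffold2", "bacC", 1e-12),
]
example_map = {"bacA": "chr1", "bacB": "chr1", "bacC": "chr2"}
print(assign_scaffolds(example_hits, example_map))
```

Dropping ambiguous scaffolds instead of resolving them by majority vote trades coverage for reliability, which is consistent with only part of the genomic sequence being localized to chromosomes.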
RNA sequencing and assembly. RNA was isolated from eight tissues from
several Chinese hamsters. Total RNA was extracted using Trizol (Invitrogen,
USA). The isolated RNA was then treated by RNase-free DNase. The RNA
was subsequently mixed and treated using the Illumina mRNA-Seq Prep Kit
following the manufacturer’s instructions. The insert size of the RNA libraries
was about 170 bp, and the sequencing was done using Illumina HiSeq 2000.
Raw reads were filtered out if they contained contamination or were of low
quality (more than 10% of the bases with unknown quality). The resulting 5 Gb
of RNA-seq data were assembled into transcriptional fragments by Trinity54
(version: r2011-08-20). We then assessed the coverage of the transcripts in the
genome assembly by mapping the assembled transcriptional fragments to the
genome assembly using BLAT55.
Gene annotation. We predicted gene models using de novo, homology-based
and transcriptome-aided prediction approaches. For de novo gene prediction,
we used a repeat-masked genome assembly. We used AUGUSTUS (version
2.03)56, GlimmerHMM (version 3.02) and Genscan (version 1.0) for de novo
gene annotation. For homology-based prediction, we mapped the protein
sequences from the CHO-K1 cell line using BLAT, with an E-value cutoff of
10⁻², followed by GeneWise57 (version 2.2.0) for gene annotation. Genes with
less than 70% identity and 80% coverage in the BLAT alignment were filtered.
Transcriptome-aided annotation was done by mapping all RNA-seq reads back
to the reference genome using Tophat58 (version 1.3.3), implemented with
bowtie59 (version 0.12.5). The transcripts were assembled using Cufflinks60
(version 1.2.1). Taken together with the assembled transcripts from Cufflinks,
we identified the genomic regions covered by the transcriptome. De novo genes
with less than 50% coverage in the transcriptome data were filtered. Finally,
the nonredundant gene sets were merged with the homology-based method
genes and de novo genes, while filtering transposable element genes identified
in the functional annotation. Gene functions were assigned according to the best
match of the alignments using blastp (E-value ≤ 10⁻⁵) against the Swiss-Prot
and UniProt databases (release 15.10). The motifs and domains of genes were
determined by InterProScan61 (version 4.5) against protein databases. Gene
Ontology IDs for each gene were obtained from the corresponding InterPro
entry. All genes were aligned against KEGG (release 48.2) proteins, and the
pathway in which the gene might be involved was derived from the matching
genes in KEGG. If the best hit of a gene was “function unknown,” “putative,”
etc., the second best hit was used to assign function until there were no more
hits meeting the alignment criteria (then this gene would be annotated as functionally unknown). Repeat features, transposable elements and endogenous retroviral genes were also identified and annotated (Supplementary Notes and
Supplementary Figs. 4 and 5).
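The fallback rule for functional assignment (skip uninformative best hits and use the next hit that still meets the alignment criteria) can be sketched as below. The list of uninformative terms contains only the two examples named in the text; the full list used in the study is not given, so both the tuple of terms and the function name are assumptions made for illustration.

```python
# Only the two terms quoted in the text; the study's full list is not stated.
UNINFORMATIVE_TERMS = ("function unknown", "putative")


def assign_function(ranked_hits):
    """Pick the first informative description from best-first ranked hits.

    ranked_hits: descriptions of hits that already pass the alignment
    criteria, ordered from best to worst.
    """
    for description in ranked_hits:
        lowered = description.lower()
        if not any(term in lowered for term in UNINFORMATIVE_TERMS):
            return description
    return "function unknown"


# Hypothetical hit descriptions, for illustration only.
print(assign_function(["Putative uncharacterized protein", "Caspase-3"]))
```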
Genome comparison. The assembled Chinese hamster and CHO-K1 scaffolds (>1 kb) were masked by RepeatMasker to remove repeat elements. The
repeat-masked mouse genome62 was downloaded from ENSEMBL (release
60). The repeat-masked hamster and the CHO-K1 assemblies were aligned
to the mouse genome as previously described63. The LASTZ pair-wise whole
genome alignment software (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html) was used with the parameters:
K = 4,500, L = 3,000, Y = 15,000, E = 150, H = 0, O = 600, T = 2. The Chain/Net
package64 was subsequently used to process the alignment. With the hamster chromosomal assignments (Supplementary Fig. 6) for many scaffolds,
comparisons on chromosomal localization were made between the mouse and
hamster (Fig. 2a, Supplementary Notes and Supplementary Fig. 7). Structural
variations between the hamster and CHO-K1 genomes were found using a
procedure previously applied to compare two human genomes65. Large masked
scaffolds (larger than 1 megabase in length) were processed with LASTZ using
the aforementioned parameter set. These alignments between the hamster and
CHO-K1 were corrected for inaccurately predicted gaps in the assembly and
other alignment errors. Using the corrected alignments, the best match for each
location on the CHO-K1 scaffolds was chosen by the option “axtBest.” This
deploys a dynamic programming algorithm using the same substitution matrix
as used during the alignment. The hits that contributed most to the colinearity
between the large scaffolds of the Chinese hamster and CHO-K1 were selected,
and discrepancies between the aligned sections were called as insertions and
deletions, exhibiting a wide range of lengths (Supplementary Fig. 8).
Detection of sequence variation among cell lines. We sequenced six different
CHO cell lines to assess the extent of genomic divergence from the hamster
genome. The cell lines were grown on their respective media (Supplementary
Table 13), after which their DNA was harvested and sequenced to greater

doi:10.1038/nbt.2624


was based on KEGG ortholog assignments (Supplementary Table 21).
Additional analyses of glycosylation and viral susceptibility genes
(Supplementary Notes and Supplementary Fig. 9) were based on homology
to gene lists published previously5 (Supplementary Notes and Supplementary
Tables 26 and 27).
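The hypergeometric test used above to find Gene Ontology classes and KEGG pathways that are over- or under-represented among mutated genes can be sketched with the standard library alone. The parametrization is an assumption, since the text does not spell it out: `population` is the number of annotated genes, `class_size` the number of genes in the class, `mutated` the number of mutated genes, and `observed` the number of mutated genes that fall in the class. No multiple-testing correction is shown.

```python
from math import comb


def hypergeom_pmf(k, population, class_size, mutated):
    """P(exactly k of the mutated genes fall in the class)."""
    return (comb(class_size, k) * comb(population - class_size, mutated - k)
            / comb(population, mutated))


def enrichment_p_values(population, class_size, mutated, observed):
    """Return (P[X >= observed], P[X <= observed]) under the null."""
    lowest = max(0, mutated - (population - class_size))
    highest = min(mutated, class_size)
    if not lowest <= observed <= highest:
        raise ValueError("observed count is impossible for these totals")
    over = sum(hypergeom_pmf(k, population, class_size, mutated)
               for k in range(observed, highest + 1))
    under = sum(hypergeom_pmf(k, population, class_size, mutated)
                for k in range(lowest, observed + 1))
    return over, under


# Hypothetical counts, for illustration only.
p_over, p_under = enrichment_p_values(
    population=20000, class_size=150, mutated=3000, observed=8)
print(p_over, p_under)
```

A small upper-tail value suggests the class carries more mutations than expected by chance, and a small lower-tail value suggests it is insulated from mutation.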

50. Peng, J., Wang, H., Haley, S.D., Peairs, F.B. & Lapitan, N.L.V. Molecular mapping of the
Russian wheat aphid resistance gene in wheat. Crop Sci. 47, 2418–2429 (2007).
51. Dong, Y. et al. Sequencing and automated whole-genome optical mapping of the
genome of a domestic goat (Capra hircus). Nat. Biotechnol. 31, 135–141
(2013).
52. Schwartz, D.C. et al. Ordered restriction maps of Saccharomyces cerevisiae
chromosomes constructed by optical mapping. Science 262, 110–114 (1993).
53. Li, R. et al. De novo assembly of human genomes with massively parallel short
read sequencing. Genome Res. 20, 265–272 (2010).
54. Grabherr, M.G. et al. Full-length transcriptome assembly from RNA-Seq data without
a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
55. Kent, W.J. BLAT–the BLAST-like alignment tool. Genome Res. 12, 656–664
(2002).
56. Stanke, M. & Morgenstern, B. AUGUSTUS: a web server for gene prediction in
eukaryotes that allows user-defined constraints. Nucleic Acids Res. 33,
W465–W467 (2005).
57. Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14,
988–995 (2004).
58. Trapnell, C., Pachter, L. & Salzberg, S.L. TopHat: discovering splice junctions with
RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
59. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient
alignment of short DNA sequences to the human genome. Genome Biol. 10, R25
(2009).
60. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals
unannotated transcripts and isoform switching during cell differentiation.
Nat. Biotechnol. 28, 511–515 (2010).
61. Quevillon, E. et al. InterProScan: protein domains identifier. Nucleic Acids Res.
33, W116–W120 (2005).
62. Waterston, R.H. et al. Initial sequencing and comparative analysis of the mouse
genome. Nature 420, 520–562 (2002).
63. Jex, A.R. et al. Ascaris suum draft genome. Nature 479, 529–533 (2011).
64. Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W. & Haussler, D. Evolution’s cauldron:
duplication, deletion, and rearrangement in the mouse and human genomes.
Proc. Natl. Acad. Sci. USA 100, 11484–11489 (2003).
65. Li, Y. et al. Structural variation in two human genomes mapped at single-nucleotide
resolution by whole genome de novo assembly. Nat. Biotechnol. 29, 723–730
(2011).
66. Abyzov, A., Urban, A.E., Snyder, M. & Gerstein, M. CNVnator: an approach to
discover, genotype, and characterize typical and atypical CNVs from family and
population genome sequencing. Genome Res. 21, 974–984 (2011).


than the minimum recommended depth of 9× for each cell line, to ensure
that enough coverage was obtained to resolve heterozygous SNPs. Sequencing
data can be obtained from the NCBI Sequence Read Archive (see Supplementary
Table 25 for accession numbers).
Missing genes in the six resequenced cell lines and the previously sequenced
CHO-K1 ATCC genome5 were detected as follows. Sequencing reads from
the seven cell lines and the hamster were mapped to the hamster assembly with BWA
(version 0.5.9). The read depth of each gene was calculated using the ‘depth’ tool of
SAMtools (version 0.1.18). A gene was declared to be deleted if it met
the following criteria. First, when mapping the hamster reads to the assembled
hamster genome scaffolds, the read depth of the gene had to be greater than
half of the mean read depth across all hamster genes. Second, the read depth
of the gene for a given cell line had to be less than 0.1. SOAPaligner (version
2.21) was also used for a repeat trial. The resulting read depth distribution was
consistent with that derived from BWA.
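The two deletion criteria can be expressed as a short sketch. It assumes per-gene read depths have already been computed (for example from depth output averaged over each gene) and are supplied as dictionaries; the function and argument names are hypothetical, and a gene absent from the cell-line dictionary is treated as having zero depth.

```python
def call_deleted_genes(hamster_depth, cell_line_depth, max_cell_line_depth=0.1):
    """Return genes that are well covered in hamster but absent in a cell line.

    hamster_depth, cell_line_depth: dict of gene id -> mean read depth.
    Criterion 1: hamster depth is greater than half the mean hamster depth.
    Criterion 2: cell-line depth is less than max_cell_line_depth.
    """
    mean_hamster = sum(hamster_depth.values()) / len(hamster_depth)
    deleted = []
    for gene, depth in hamster_depth.items():
        well_covered = depth > 0.5 * mean_hamster
        absent = cell_line_depth.get(gene, 0.0) < max_cell_line_depth
        if well_covered and absent:
            deleted.append(gene)
    return sorted(deleted)


# Hypothetical depths, for illustration only.
hamster = {"geneA": 20.0, "geneB": 18.0, "geneC": 4.0, "geneD": 22.0}
cell_line = {"geneA": 15.0, "geneB": 0.05, "geneC": 0.0, "geneD": 0.0}
print(call_deleted_genes(hamster, cell_line))
```

The first criterion guards against calling a deletion where the reference itself is poorly covered, for instance in regions that are hard to map.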
To detect SNPs, indels and CNVs, the raw reads from each cell line were
mapped to the hamster genome assembly to determine sequence variations. To
aid the process of variant detection, the hamster scaffolds were concatenated in
a random fashion to obtain 12 pseudo chromosomes. SOAP was used to align
the sequencing reads from each cell line to the reference hamster assembly.
The alignments were subsequently split into pseudochromosomes and sorted
according to the mapped position. SOAPsnp was used to identify SNPs in
each cell line. To further refine the predicted SNPs, we adopted an alternative
approach using BWA to align the reads to the hamster assembly. The ‘mpileup’
tool of SAMtools was applied to summarize the aligned bases at each genomic position in the different samples, and BCFtools from the same package was used for
variant calling. The two SNP data sets were subsequently combined to make
the final SNP data set. For each library, we filtered SNPs with depth less than
half of the mean depth. We also filtered SNPs that were located within 5 bp
of another SNP. In total, we identified 3,715,639 SNPs. SNPs were used to
reconstruct the phylogeny of the CHO cell lines. The Jukes-Cantor pairwise
distance was computed between all pairs of cell lines, and the phylogenetic tree was built
using the unweighted pair group method with arithmetic mean (UPGMA). The alignments were further
processed using SOAPindel (http://soap.genomics.org.cn/soapindel.html) to
identify indels and analyzed using CNVnator to detect CNVs66.
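The SNP filters and the distance measure described in this paragraph can be sketched as follows. Several details are assumptions made for illustration: the depth filter is applied before the proximity filter, "within 5 bp" is read as a position difference of at most 5, both SNPs of a close pair are removed, and the Jukes-Cantor distance is computed from the fraction of compared sites at which two samples differ. The function names are hypothetical, and the UPGMA tree-building step is not shown.

```python
from math import log


def filter_snps(snps, mean_depth, min_gap=5):
    """Drop low-depth SNPs, then SNPs lying close to another SNP.

    snps: iterable of (chromosome, position, depth) tuples for one library.
    Returns the surviving (chromosome, position) pairs in sorted order.
    """
    deep = sorted((chrom, pos) for chrom, pos, depth in snps
                  if depth >= mean_depth / 2)
    kept = []
    for i, (chrom, pos) in enumerate(deep):
        near_previous = (i > 0 and deep[i - 1][0] == chrom
                         and pos - deep[i - 1][1] <= min_gap)
        near_next = (i + 1 < len(deep) and deep[i + 1][0] == chrom
                     and deep[i + 1][1] - pos <= min_gap)
        if not (near_previous or near_next):
            kept.append((chrom, pos))
    return kept


def jukes_cantor_distance(mismatches, sites):
    """Jukes-Cantor distance d = -(3/4) ln(1 - (4/3) p), with p = mismatches / sites."""
    p = mismatches / sites
    if p >= 0.75:
        raise ValueError("distance is undefined when p >= 0.75")
    return -0.75 * log(1 - 4 * p / 3)


# Hypothetical SNP calls, for illustration only.
calls = [("pc1", 100, 30), ("pc1", 103, 30), ("pc1", 200, 30), ("pc1", 300, 5)]
print(filter_snps(calls, mean_depth=20))
print(jukes_cantor_distance(mismatches=30, sites=100))
```

The resulting pairwise distances would form the matrix that a UPGMA implementation clusters into a tree.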
Nonsynonymous SNPs, frame-shifting indels, and gene-containing CNVs
were identified and analyzed. The hypergeometric test was used to identify
gene classes that were over- or under-represented in mutations in all Gene
Ontology classes and KEGG pathways, based on our genome annotation
(Supplementary Tables 17–20). More detailed analysis on apoptotic pathways


careers and recruitment

Second-quarter biotech job picture
Michael Francisco

In the second quarter of 2013, the number of advertised biotech and
pharma sector jobs fell slightly in the three job databases tracked by
Nature Biotechnology (Tables 1 and 2). Compared with the previous
quarter (Nat. Biotechnol. 31, 465, 2013), listings for the top 25 biotech
companies dipped on Monster, LinkedIn and Naturejobs. Pharma company listings were more mixed, staying almost identical on Monster while
increasing by 59% on LinkedIn and decreasing by more than 90% on
Naturejobs. Of special note this quarter was the first appearance in these databases of two Asian companies. Bangalore, India–based
Biocon posted 10 open positions and WuXi PharmaTech of Shanghai,
China, posted 25, all on LinkedIn’s job board. LinkedIn was the
most utilized database among the top 25 biotechs, with all but 3 of the
companies represented.
In May, gene therapy developer GenVec (Gaithersburg, MD, USA)
announced that its board had approved a plan to dissolve the company
and liquidate its assets after reporting a 47% year-over-year drop in annual
revenue for 2012. The company plans to distribute all remaining cash
to shareholders after liabilities and other company obligations are paid.
In Europe, the sale of the former headquarters of Merck Serono in
Geneva to the consortium behind the Campus Biotech initiative has been
finalized. Campus Biotech, consisting of a biotech research center with
two local universities as well as a business incubator, is the brainchild of
Hansjorg Wyss, founder of Synthes, and Ernesto Bertarelli, former CEO of
Serono. The sale of the site cushions the blow of Merck Serono’s announcement in April that it would shut down the facility and relocate its R&D
activities to Germany, the United States and China.

Table 1 Who’s hiring? Advertised openings at the 25 largest biotech companies
                                         Advertised openings(b)
Company(a)                    Employees  Monster  LinkedIn  Naturejobs
Monsanto                         21,400        2        27           0
Amgen                            17,250        9        79           1
Life Technologies                11,000       12        34         823
Genzyme                          10,100      200       244           0
CSL                               9,992        0         3           0
Bio-Rad Laboratories              6,880        9         6           0
bioMerieux                        6,378       23        25           0
PerkinElmer                       6,200       38        13           0
Novozymes                         5,655        0        13           0
Biogen Idec                       4,850        5        50           0
IDEXX Laboratories                4,800       17         0           0
Biocon                            4,478        0        10           0
WuXi PharmaTech                   4,465        0        25           0
Shire                             4,183       58       125           0
Celgene                           4,182       35        71         208
Gilead Sciences                   4,000        0        62           0
Cephalon                          3,726        0       181           0
Qiagen                            3,587        0        11           0
Endo Pharmaceuticals              2,947        0        11           0
Actelion                          2,441        2         0           0
Illumina                          2,100       79        20           1
Vertex Pharmaceuticals            1,691        2        13           0
Biotest Pharmaceuticals           1,627        0        53           0
Dendreon                          1,497        0        30           0
Albany Molecular Research         1,421        0         0           0
Total                                        489     1,075       1,033
(a) As defined in Nature Biotechnology’s survey of public companies (29, 585–591, 2011). (b) As
searched on Monster.com, LinkedIn.com and Naturejobs.com, 18 July 2013. Jobs may overlap.

Finally, Roche has said it will dissolve its Roche Applied Science division
and integrate its product portfolio into the company’s other diagnostics
business areas by year’s end. The division’s PCR technology and nucleic
acid product lines will fall under Roche Molecular Diagnostics, while the
portfolio of platforms and reagents will be moved to Roche Professional
Diagnostics. The changes will result in a head count reduction of about
170. Other notable second-quarter downsizings within the life science
industry are shown in Table 3.
Table 2 Advertised job openings at the ten largest pharma companies
                                 Advertised openings(b)
Company(a)            Employees  Monster  LinkedIn  Naturejobs
Johnson & Johnson       119,200      622       111           0
Bayer                   106,200        3         0           8
GlaxoSmithKline         103,483        0        72           1
Sanofi                   99,495       77         5           4
Novartis                 98,200       86        81           1
Pfizer                   86,600        3       234          57
Roche                    78,604        0       201           9
Abbott Laboratories      68,697        1       152           1
AstraZeneca              67,400       53        52           0
Merck & Co.              59,800        0        62           0
Total                                845       970          81
(a) Data obtained from MedAdNews. (b) As searched on Monster.com, LinkedIn.com and Naturejobs.com,
18 July 2013. Jobs may overlap.

Table 3 Selected biotech and pharma downsizings
Company                 Employees cut  Details
Addex Therapeutics      17             Reduced head count by 89%, to 2, to conserve cash. In the interim,
                                       an operational subcommittee of the board will supervise the company.
Affymax                 NA             Notified the remaining 25% of its workforce of dates of separation and
                                       termination of executive officers as it continues restructuring. Hired
                                       new CEO, CFO and president and named new chairman in June.
Biota Pharmaceuticals   ~30            Will reduce head count by 27%, to 80, over the next several quarters
                                       to save cash. The cuts will be concentrated on R&D but will
                                       include general and administrative.
Endo Health Solutions   ~700           Will reduce head count by about 15% worldwide as part of a plan to
                                       reduce annual operating expenses by $325 million by mid-2014.
Orion Corp.             <80            Finnish subsidiary Orion Diagnostica Oy will reduce head count to
                                       streamline operations and is considering closing its Turku site.
Pharmaxis               NA             Reduced head count by 30% to cut expenses and focus on partnership
                                       strategies and is consolidating its manufacturing facilities into
                                       one production facility.
Predictive Biosciences  91             Ceasing operations and laying off all employees after Medicare
                                       decided to deny coverage for the company’s bladder cancer
                                       diagnostics.
Zogenix                 55             Will reduce head count by 37%, to 93, to reduce operating
                                       expenses and achieve key business milestones, including gaining
                                       US Food & Drug Administration approval of a New Drug Application
                                       for pain product Zohydro ER hydrocodone bitartrate.
Source: BioCentury. NA, not available.


Michael Francisco is a Senior Editor at Nature Biotechnology.

volume 31 NUMBER 8 AUGUST 2013 nature biotechnology


people

Genomic drug developer Warp Drive Bio (Cambridge, MA, USA) has
named Gregory Verdine CEO, succeeding Alexis Borisy. Verdine
is a co-founder of the company, as well as the Erving Professor
of Chemistry in the Harvard University Departments of Stem Cell
and Regenerative Biology, Chemistry and Chemical Biology, and
Molecular and Cellular Biology. Borisy will remain on the board of
directors and assume the position of executive chairman. In addition,
Warp Drive Bio has appointed former president and COO of Resolvyx
Pharmaceuticals James Nichols to the newly created role of COO,
and Julian Adams, president of R&D at Infinity Pharmaceuticals, to the board.
“In the year since the launch of Warp Drive Bio, the team has invented a completely
new way of discovering ‘nature’s drugs’, which are hiding in plain sight literally in the
soil of our own backyards,” says Verdine. “Warp Drive Bio is leading the rebirth of natural
products, a field that has historically made a powerful contribution to the betterment of
human health but had worked itself into seeming obsolescence. We believe Warp Drive Bio’s
proprietary genomic technologies will propel natural products into the forefront of the future
pharmacopeia.”

OphthaliX (Petach Tikva, Israel) has announced
the appointment of Michael Belkin to the company’s board of directors. He is a professor of
ophthalmology at Tel Aviv University and the
director of the ophthalmic technologies laboratory at the university’s Eye Research Institute at
the Sheba Medical Center.
Prosonix (Oxford, UK) has announced the
appointment of Frank Condella as nonexecutive director. Condella has over 30 years
of experience in the pharma and healthcare
industry. He serves as president and CEO of
Columbia Laboratories and nonexecutive
chairman of Skyepharma, where he was CEO
between 2006 and 2008.
Cancer vaccine developer Immunicum
(Gothenburg, Sweden) has appointed Henrik
Elofsson COO, responsible for preclinical and
clinical trial project management and for the
establishment of a large-scale production process. Elofsson was most recently at Arterion,
where he worked as vice president in charge of
R&D and in the establishment of new production processes.
Edward Hodgkin, a partner with venture capital
firm Syncona Partners, has been named chairman
of the UK’s BioIndustry Association (London)
effective October 30, 2013. A nonexecutive
director on the BIA board since 2010, Hodgkin
succeeds Tim Edwards, who will formally step
down at the October board meeting after three
years as BIA chairman. Edwards will remain a
member of the BIA board.
Molecular diagnostics company CombiMatrix
(Irvine, CA, USA) has named Robert E.
Hoffman to its board of directors. He currently
serves as senior vice president, finance and CFO
of Arena Pharmaceuticals.
Finox Biotech (Burgdorf, Switzerland) has
named Gavin Jelic-Masterton as CEO.
Previously the company’s senior vice president of marketing & commercial operations,
Jelic-Masterton joined Finox in February 2013
and has more than 15 years of experience in pharmaceutical marketing and sales.
Orexo (Uppsala, Sweden) has announced the
appointment of Henrik Juuel as executive vice
president and CFO, succeeding Carl-Johan
Blomberg. In addition, Robert A. DeLuca has
been named president of Orexo’s US subsidiary. DeLuca will become a member of Orexo’s
executive management team.
Richard Koenig has joined privately held contract research organization Rho (Chapel Hill,
NC, USA) as vice president of operations. He has
close to 30 years of operations management and
clinical development experience, most recently
as vice president, operations for ClinStar.

Oxford BioTherapeutics (Oxford, UK) has
named Bryan G. Morton as its new nonexecutive
chairman. Morton has over 30 years
of industry experience, most recently serving as
CEO of EUSA Pharma, a global specialty oncology company acquired by Jazz Pharmaceuticals
in 2012. Previously, he founded and was
appointed CEO of Zeneus Pharma.
Novavax (Rockville, MD, USA) has appointed
Barclay A. “Buck” Phillips to the position of
senior vice president and CFO, with responsibility for managing the company’s finance,
treasury and communications functions.
Previously, Phillips was senior vice president
and CFO of Micromet, which was acquired by
Amgen in 2012.
Clinical stage ophthalmology company
Amakem (Diepenbeek, Belgium) has appointed
Kieran Rooney as vice president, business
development. Rooney brings to Amakem
over 25 years of experience in the biotech and
pharma industries. He is the founder and managing director of Halo BioConsulting.
Brad Thompson has been appointed to the
board of directors of Lorus Therapeutics
(Toronto). He has held the positions of chairman of the board and president and CEO of
Oncolytics Biotech since 1999. He is also currently a board member of Immunovaccine.
Metabolix (Cambridge, MA, USA) has
announced the promotion of Johan van Walsem
to the position of COO, a newly created role.
He most recently served as the company’s vice
president of manufacturing and product development.
Sidney Wolfe, longtime director of the Public
Citizen’s Health Research Group (Washington),
has relinquished his leadership of the consumer advocacy group, though he will continue his work under a new title: founder and
senior adviser. Wolfe founded HRG with Ralph
Nader in 1971, becoming one of the pharma
industry’s greatest antagonists on the issues of
drug safety, patient access to care and medical
board oversight of doctors. His successor at the
Health Research Group is Michael Carome,
who had served as Wolfe’s deputy director
since 2010.
