II. From First Principles — Root Cause Analysis and Non-Consensus Findings
“The things best to know are first principles and causes, but these things are perhaps the most difficult for men to grasp, for they are farthest removed from the senses.” — Aristotle
Introduction
In my previous post, Reimagining and Replatforming Science for the Era of AI, I described the origins of our journey — how, six years ago, we set out to fundamentally reimagine how science could be conducted in the age of AI.
That post was about conviction: the belief that science must be rebuilt from first principles, not retrofitted with superficial AI overlays.
This essay begins the analytical work behind that conviction.
Here, I present our Root Cause Analysis and Non-Consensus Findings that emerged from those early days of investigation and debate — a first-principles dissection of why scientific productivity has stagnated despite breathtaking advances in compute, models, and molecular biology.
What follows is not a critique of individuals or institutions but a systemic diagnosis: a structural explanation of why the scientific enterprise, as currently architected, generates diseconomies of scale instead of economies of scale; why it produces more data yet less knowledge; and why Eroom’s Law continues to worsen in an era when Moore’s Law should have made it obsolete.
Only by confronting these root causes can we design the architecture capable of reversing them, the subject of the following essay, III. From First Principles — A New Architecture and Roadmap for Scientific AI.
I. The Architecture of Decline
For decades, the life sciences have lived inside a paradox.
Every scientific advance, from sequencing and automation to robotics, IoT, and AI, has expanded the frontier of possibility. And yet R&D productivity continues to fall, even as investment and computational power explode.
This contradiction is codified in Eroom’s Law (Moore’s Law spelled backward): the observation that the inflation-adjusted cost of developing a new drug doubles roughly every nine years, while computing power doubles roughly every 18 months. I’ll dive further into Eroom’s Law in a future essay.
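To make the divergence concrete, here is a back-of-the-envelope sketch that compounds the two doubling periods quoted in the Eroom’s Law statement (about 18 months for compute, about nine years for drug development cost). It assumes clean exponential growth at exactly those rates. Real-world data follow these trends only roughly, so treat the output as an illustration of the shape of the curves, not as a forecast.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplicative growth after `years` for a quantity that doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


# Assumed, rounded doubling periods taken from the prose above.
COMPUTE_DOUBLING_YEARS = 1.5  # computing power: roughly every 18 months
COST_DOUBLING_YEARS = 9.0     # cost per new drug: roughly every nine years

for years in (9, 18, 36):
    compute = growth_factor(years, COMPUTE_DOUBLING_YEARS)
    cost = growth_factor(years, COST_DOUBLING_YEARS)
    print(f"{years:>2} yrs: compute x{compute:,.0f}, cost per drug x{cost:,.0f}")
```

Under these assumptions, a single nine-year doubling of drug development cost coincides with a 64-fold increase in compute.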
The two curves are not merely diverging; they are tearing apart. The prevailing explanations are familiar: biology is complex, regulation is slow, capital is misallocated.
But these are symptoms, not causes. The root cause is architectural and cultural.
Science today operates on an outdated, artisanal, and fragmented system design fundamentally incompatible with the era of AI.
The industry has accumulated massive compute, vast data, and armies of talent — yet operates within an architecture that ensures those assets decay rather than compound.
II. The Artisanal Colony
Across discovery, development, and manufacturing, the scientific enterprise resembles a colony of artisans, not an industrial ecosystem.
Each lab is a self-contained workshop.
Each biopharma reinvents the same workflows, integrations, and analyses. Over and over and over again.
Each vendor acts as a micro-sovereign with a zero-sum mindset, focused on defending its silo.
This artisanal colony is populated by instruments, ELNs, LIMS, IoT devices, and robots that collectively propagate more than ten million silos of proprietary and unstructured data instead of producing scientific intelligence.
Scientists, instead of experimenting, spend their days stitching together CSVs, reconciling inconsistent metadata, and wrangling incompatible file formats.
The human capital of science — our most precious resource — is consumed by digital janitorial work.
The industry’s underlying architecture was never designed for scale, interoperability, or machine reasoning, and it serves no purpose in the era of AI.
It was designed for compliance, record-keeping, and local optimization, an anti-scale architecture that compounds complexity and cost.
The result is systemic diseconomies of scale: every additional experiment increases friction; every new system multiplies incompatibility; every merger adds entropy.
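One way to see why every new system multiplies incompatibility is a simple counting model. This is an illustrative assumption, not a measured result. If every pair of systems needs its own bespoke connector, the number of connectors grows quadratically. If instead each system maps once to a single shared schema, the count grows linearly. The function names below are hypothetical.

```python
def point_to_point_integrations(n_systems: int) -> int:
    """Bespoke connectors needed if every pair of systems is integrated directly: n(n-1)/2."""
    return n_systems * (n_systems - 1) // 2


def shared_schema_integrations(n_systems: int) -> int:
    """Connectors needed if every system maps once to one shared schema: n."""
    return n_systems


for n in (5, 10, 20, 40):
    pairwise = point_to_point_integrations(n)
    shared = shared_schema_integrations(n)
    print(f"{n:>2} systems: {pairwise:>3} point-to-point vs {shared:>2} shared-schema")
```

Under this model, adding an eleventh system to ten existing ones requires ten new point-to-point connectors but only one new mapping to the shared schema.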
Science, the very enterprise meant to industrialize knowledge, remains trapped in a pre-industrial mode of production.
III. The Myth of Quantity
For years, biopharma has conflated data quantity with data quality. Organizations boast about terabytes and petabytes of experimental data, as if volume were virtue.
But not all data are the same.
Financial, consumer, and clickstream data are narrow, repetitive, and structured. Scientific data are the opposite: heterogeneous, high-dimensional, contextual, and uniquely fragmented.
Every instrument, assay, and experiment generates data with its own geometry, semantics, and lineage.
What matters is not how much data you store, but whether it can be interpreted, linked, productized, industrialized, and reused.
Without structure, data are not knowledge — they are noise. Without context, AI cannot reason. At best, it can only approximate.
Biopharma’s vast data lakes have become data swamps, and AI’s promise collapses into vanity projects and board-level CYA efforts that never scale beyond curated pilots.
The limiting factor in Scientific AI is not compute, not models, not imagination — it is data architecture and cultural inertia.
IV. The Insufficiency of Compute
The broader AI community often assumes that progress scales with model size and compute speed, and that larger networks and faster GPUs will eventually solve everything. But compute and models, while necessary ingredients, are entirely insufficient to achieve Scientific AI.
In most digital domains such as language, vision, and commerce, data are abundant, homogeneous, and self-similar. Words, pixels, and transactions all obey shared grammars; they can be aggregated, tokenized, and synthetically expanded without losing meaning. These systems thrive on statistical regularity.
Scientific data live in a different universe. They are heterogeneous, context-dependent, and ontologically diverse, spanning thousands of instruments, assay types, units, and physical phenomena, and found in many hundreds of use cases and workflows across the value chain. Each dataset carries its own causal semantics, experimental lineage, and measurement uncertainty. Change an instrument, a reagent, or a calibration curve, and the meaning of the data changes with it.
Unlike social or linguistic data, scientific data do not describe opinions or patterns; they describe reality — and reality has dimensional structure. In this domain, context is content. A temperature without a unit, a concentration without a method, or a signal without calibration isn’t just incomplete, it’s meaningless.
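To illustrate “context is content,” the sketch below models a measurement as a value bundled with the context the text says it needs: a unit, a method, the instrument, and its calibration. The record and its field names (`unit`, `method`, `instrument_id`, `calibration_id`) are hypothetical and chosen for illustration. They are not drawn from any particular standard or vendor schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class Measurement:
    """Hypothetical record: a numeric value plus the context needed to interpret it."""
    value: float
    unit: Optional[str] = None            # e.g. "degC"; without it, 37.0 is ambiguous
    method: Optional[str] = None          # assay or protocol that produced the value
    instrument_id: Optional[str] = None   # which instrument took the reading
    calibration_id: Optional[str] = None  # calibration in effect at measurement time


def missing_context(m: Measurement) -> List[str]:
    """Return the names of context fields that were never recorded."""
    return [f.name for f in fields(m) if f.name != "value" and getattr(m, f.name) is None]


def is_interpretable(m: Measurement) -> bool:
    """Treat a value as usable for machine reasoning only if all context is present."""
    return not missing_context(m)


bare = Measurement(37.0)
contextualized = Measurement(
    37.0,
    unit="degC",
    method="example-protocol",            # hypothetical identifiers throughout
    instrument_id="example-probe",
    calibration_id="example-calibration",
)

print(missing_context(bare))             # ['unit', 'method', 'instrument_id', 'calibration_id']
print(is_interpretable(contextualized))  # True
```

Requiring every context field is a deliberately strict choice for this sketch. A real schema would likely distinguish required from optional context for each measurement type.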
You can scale parameters by orders of magnitude, but if the data feeding those models are inconsistent, unstructured, or semantically incoherent, no model will generalize. Models cannot infer what the data never encode.
GPUs can accelerate computation, but they cannot repair epistemology. They cannot infer the causal scaffolding that was never captured, or reconstruct provenance that was never recorded. They cannot reconcile two vendors’ proprietary schemas or correct for experimental inconsistency across labs, instruments, or conditions.
Without AI-native, contextualized, and standardized scientific data, the outputs of so-called “Scientific AI” will remain brittle — clever prototypes, impressive demos, and empty promises that collapse when confronted with real-world heterogeneity.
The bottleneck in science is neither compute nor models; it is the architecture, fidelity, and semantics of the data supply chain — the ability to transform experimental exhaust into epistemic fuel for machine reasoning.
V. The Economic Trap
The entire life sciences industry is built on deeply misaligned incentive structures and on self-imposed, manifestly unnecessary structural tension.
Biopharma operates as an N-of-1 economy: every organization builds bespoke data infrastructures, bespoke use cases, bespoke integrations — all for internal use.
There is no shared platform, no compounding ecosystem, no network effect. The consequence is catastrophic redundancy and shareholder value destruction.
Thousands of organizations replicate the same work, solving the same problems with marginal variations, wasting tens of billions of dollars annually on duplicated effort.
In any other industry, this would be irrational. In science, it’s normalized because the market structure has historically rewarded perceived exclusivity, not interoperability and scalability. In reality, the industry can have both.
Roughly 90% of the R&D and manufacturing data and use-case stack is common across biopharma, yet the industry treats 100% of it as bespoke.
Despite this overwhelming commonality, pharmas reap none of the benefits of shared innovation and amortized investment. They get only duplicative project costs and universal pain.
Vendor Economics and Myopia
Each scientific vendor mistakenly believes that its business model depends on what it deems proprietary data: the tighter the control, the stronger the revenue moat. Vendors believe that commercial survival relies on building walled gardens where data flows inward but rarely outward. Interoperability, openness, and data liberation are marketing gimmicks, framed externally as features but treated internally as existential threats.
Every “integration” is a hostage negotiation: limited APIs, opaque schemas, contractual friction. The objective is not to accelerate science but to maximize dependency — to make it easier to buy more from the same vendor than to connect with anyone else.
Each organization’s cultural echo chamber, meanwhile, prizes autonomy over standardization, reinforcing the fragmentation vendors believe they depend on.
The result is a system where collective learning and industry-wide innovation are economically disincentivized: a marketplace optimized for captivity, not collaboration.
This N-of-1 pharma paradigm, coupled with the deeply irrational, zero-sum mentality of walled-garden architects, precludes treating data and use cases as products, prevents shared investment and continuous improvement, and ensures that Eroom’s Law persists.
Closing Thought — The Diagnosis
The crisis of scientific productivity is not biological, computational, or even financial, per se.
It is architectural and cultural, a failure of legacy systems and of the mindsets and incentive structures that sustain them.
Until biopharma abandons its artisanal operating system, and the vendor ecosystem that profits from persistent data fragmentation abandons its zero-sum mentality, the industry will remain trapped in a spiral of self-inflicted inefficiency.
Only when all stakeholders understand that everyone can participate in a marketplace of abundance, and commit to a shared, AI-native infrastructure, can true scientific compounding begin.
The first step toward reversing Eroom’s Law is to recognize that the disease is design, and its most powerful ally is inertia.
Next in the series: III. From First Principles — A New Architecture and Roadmap for Scientific AI

Speaking as an industry insider with an external perspective: there's an awful lot of truth in these observations. I disagree with some of the smaller points and comparisons, and have questions about others. None of those topics are suitable for debate in Substack's comment system, so I'll defer them to another time and place and simply note that I agree fully with the overall tone and high-level conclusions.
One thing I'll add, though. This statement:
"Biopharma operates as an N-of-1 economy: every organization builds bespoke data infrastructures, bespoke use cases, bespoke integrations — all for internal use."
One possible response to that is that this is solved within Big Pharma, i.e. that the large pharmas can effectively be viewed as large portfolios of projects (each comparable to an individual biotech), and that Big Pharma solves this N-of-1 problem across those projects by creating shared assets within the company: data assets, computing assets, knowledge assets, scientific assets, common methods, all contained within (largely) the same IP ownership space.
And yet, even within the walls of those pharmas, everything you're saying here is true. While there absolutely are shared assets, there are also walled gardens of data. There are endlessly repeated experiments. There is an enabled artisanal culture. There's deep competition within the companies (to the detriment of patients) often enabled by data obscurity. I've lost track of the number of times that I, as a digital leader within pharma, had to tell a scientist, "I'm sorry, but it's not your data. It's the company's data."
There are also places where collaboration and data reuse are working incredibly well, thanks to people trying hard to work that way, or where the costs make it obviously the right thing to do. But it's very difficult to do so even within one large company, for all the reasons listed in this essay: systemic architecture, misaligned incentives, and deeply rooted culture. That, and, it turns out, scientific data is hard. People have known this for decades; the pharmas that finally get this right are going to have a huge advantage, even more so in the age of AI.
Excellent analysis of the situation. I would add that these structural and cultural problems flow from how scientists are trained, at least in the US and most of the EU: in an academic and funding model that prepares them to operate as agents in a discovery cottage industry and trains them to behave as isolated practitioners in competition with essentially all other scientists in their field.
Research departments in universities are built around the careers of individual faculty members. These principal investigators are responsible for their own funding (and often to a large degree their income) and that of many or all of their trainees. This leads to a culture of control over every aspect of one’s lab: type of equipment, software, assays, chemical and biological materials, and most of all, the resulting data. Those data are the key to publications, and those papers are in turn the means to the next grant.
In this ecosystem, siloing behavior may be entirely rational, and the founders of and incumbents in our biopharma industry are products of this system. There are many encouraging examples of more collaborative science in academia. One early example in the life sciences that comes to mind is a consortium formed to study Huntington’s disease, where group success was measured by overall progress rather than individual achievement, and successes were shared by all members of the group.
While there have been a growing number of examples of collaborative science, the basic structure and culture has not changed. Can industry help lead this change, or is a change in academia a necessary precursor to rewriting the operating model in industry?