LONDON and CAMBRIDGE, Mass., June 11, 2025 (GLOBE NEWSWIRE) -- Basecamp Research, an AI company dedicated to using nature to solve the most pressing challenges in the life sciences, today announced the discovery of over 1 million new species as part of BaseData™, the world’s largest biological protein sequence database, and the first to be purpose-built to power the next era of generative biology.

“The rise of generative biology – using AI foundation models to design, generate, and annotate proteins, pathways and therapeutics – creates unprecedented demand for large, diverse biological sequence databases,” said Glen Gowers, Ph.D., CEO and Co-Founder of Basecamp Research. “The million plus new species we’ve identified to date gives us an unrivaled understanding of how life on Earth has evolved over billions of years. We believe it will help the life sciences sector overcome today’s critical data bottleneck and build a truly generative biology ecosystem with applications across nearly every area of human and planetary health.”



Today, research relies heavily on public, open access biological sequence databases. Designed originally as academic collaboration tools, these databases are accessed over 100 million times per day and are the source of over 50% of all life science patents. This usage has grown by 10x in the last decade and continues to accelerate thanks in large part to the increasing use of data-hungry AI models in biopharma research.

However, compared to all life on Earth, these databases are incomplete, riddled with redundancies, and highly biased – 70% of all sequence data comes from just 10 species. Furthermore, the information in these databases is growing slowly, and companies using these databases for commercial purposes face increasing scrutiny on various legal fronts.



This lack of growth and diversity holds back model performance and is a fundamental problem that slows research advancements in the life sciences: the data wall.

Basecamp Research has broken through this data wall by pioneering an economic partnership-based model that incentivises the collection of samples across the planet’s most extreme and biodiverse environments, working with 125+ communities in 26 countries. In a milestone disclosure, the company is unveiling the results of its biodiscovery initiative — BaseData has identified 9.8 billion new protein sequences. Once redundant sequences are removed, BaseData is currently over 10x bigger than all public databases combined. It continues to grow rapidly, offering a fundamentally new understanding of life on Earth.

“Biological foundation models are the key to continued progress in the life sciences, but their growth is slowing and performance is suffering,” said Oliver Vince, Ph.D., Co-Founder of Basecamp Research. “It’s this data wall that discourages life science teams from building and using huge biological foundation models. At Basecamp, we’ve demonstrated a new, scalable economic framework for expanding our knowledge of life on Earth and we show it is possible to overcome the data wall that’s limiting progress for others in bioAI. Training our own foundation models on this data is enabling collaborations across the life sciences sector to help solve pressing challenges in drug discovery.”

Basecamp Research scientists have uncovered more than one million new microbial species to date in some of the planet’s most remote and extreme environments, expanding the view of the tree of life by over 10 times. The previously hidden biological insights in this data, combined with foundational AI models, will help shape everything from environmental sustainability and repair to therapeutics development. Examples of such newly discovered species include:

On a World War II shipwreck, a new species of Burkholderia, a type of bacteria known for its ability to remove heavy metals from the environment. This could help improve pollution control and deepen our understanding of how bacteria resist antibiotics.

In acidic hot springs near an active volcano, a new member of the Sulfolobaceae family that thrives in searing temperatures. Its specialized proteins, stress-response systems and stability near boiling point could help develop new ways to deliver medicine or preserve biological materials under harsh conditions.

In Antarctic soil, a new species of Candidatus Eremiobacterota, a bacterium that survives by drawing nourishment from the air, generating its own water using hydrogen as an energy source — a finding that could inform novel gas-based drug delivery systems or therapeutic approaches.



Basecamp Research is sharing further details in a pre-print paper and plans to offer life sciences researchers early access to its unique data to help them further their research. Researchers can express interest via the company’s website, www.basecamp-research.com.



