About

Significance

A functional genomics database and web service for energy microalgae (EnergyAlgaeDB; http://www.bioenergychina.org:8989/) should be of significant value to a worldwide user base across multiple disciplines, for four main reasons.

(1) Microalgae are considered a promising feedstock for biofuels (liquid fuels derived from triacylglycerol (TAG) and gaseous fuels such as hydrogen). Their production scale is potentially large enough to replace fossil diesel while simultaneously capturing industrial CO2. However, the structural and functional diversity of microalgal genomes remains ill-defined. For example, few oleaginous algal genomes have been published to date, although dozens are being sequenced around the world. A database with comprehensive search and comparison functions, such as EnergyAlgaeDB, is therefore an essential community resource for cataloguing, annotating and distributing genomic information.

(2) Algae exhibit tremendous ecological, evolutionary and organismal diversity, and together account for an estimated 40% of photosynthesis on Earth. The green algae among them belong, together with land plants, to the green plant lineage (Viridiplantae), and diverged from the Streptophytes (land plants and their close algal relatives) over a billion years ago. Algae thus represent an evolutionarily crucial group of organisms, and a database such as EnergyAlgaeDB should also be of significant value to those interested in eukaryotic ecology and evolution.

(3) To date, few functional-genomics model organisms have been established for such “energy-algae”. Traditional laboratory models for algal physiology, such as Chlamydomonas reinhardtii (http://www.chlamy.org/chlamydb.html), do not produce large amounts of oil and are usually not amenable to large-scale outdoor cultivation, so new research models for robust biofuel production are urgently needed. A database such as EnergyAlgaeDB, as a centralized infrastructure for storing, integrating and cross-validating the various kinds of “omics” data, is a prerequisite for the research community to jointly establish such new microalgal model organisms.

(4) So far there are only a few publicly accessible databases and web services for microalgae, and despite rapidly emerging genomic resources there has been no genomics database covering the wide range of energy-algae. Most available algal databases provide ecological, phylogenetic and phenotypic curation of microalgal strains found in nature, but no genomics or functional genomics information (e.g. Algae Resource Database, http://www.shigen.nig.ac.jp/algae/; AlgaeBase, http://www.algaebase.org/). The few that do serve genomics resources are dedicated solely to Chlamydomonas reinhardtii, the established laboratory model organism (e.g. http://www.chlamy.org/chlamydb.html).

All of these research needs and challenges point to the urgency of developing a functional genomics database for energy microalgae. We have therefore created EnergyAlgaeDB, which is not only a database but also a web service. It was designed to store, curate, integrate, search, compare and distribute functional genomics information for energy-algae.

The database

On the database side, the functional genomics data currently provided include genomes (complete or draft) and gene expression profiles. By February 1, 2012, the database will include both mRNA and microRNA expression profiles (both microarray-based and RNA-Seq-based). It can readily be extended to include proteomes, metabolomes, regulomes and interactomes. The data come from primary-data producers such as us (BioEnergy Genome Center, CAS-QIBEBT; see Table 1 for our ongoing energy-algae genome projects) as well as from the public domain. Such energy-algae include species producing liquid fuels (e.g. from TAG) and species producing gaseous fuels such as hydrogen, and may also include species producing high-value products such as carotenoids, eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) (see Table 1 for examples).

Nannochloropsis, a phylogenetically distinct group of wild unicellular microalgae, accounts for a prominent series of genome entries in the current version of EnergyAlgaeDB. The genus belongs to the class Eustigmatophyceae, and many of its members are capable of rapid growth and robust production of triacylglycerol (TAG), which can be readily converted into advanced biofuels. We have sequenced, and present in EnergyAlgaeDB, an eight-member “Nannochloropsis PhyloGenome” (each genome is about 30 Mb). It comprises two N. oceanica strains (OZ-1 and CCMP531), two N. salina strains (CCMP537 and CCMP1776), and one strain from each of the other recognized species: N. gaditana (CCMP527), N. oculata (CCMP525), N. limnetica (CCMP505) and N. granulata (CCMP529) (Table 1). Furthermore, for one of these strains, we present at single-base resolution the transcriptomic program underpinning the full course of nitrogen-depletion-induced TAG production. All of these data are novel and previously unpublished. This unprecedented microalgal phylogenome dataset reveals the nature and degree of genome conservation and divergence at the strain, species and genus levels, and uncovers genomic and transcriptomic “signatures” of oleaginous microalgae. The rich genomic resources, compact genome, wide ecological adaptation and capacity for large-scale cultivation establish Nannochloropsis as a valuable new model organism for the photosynthetic production of renewable fuels and chemicals.

For all eight Nannochloropsis strains, the genome sequences, gene models (predicted and experimentally validated) and functional annotations are currently provided in EnergyAlgaeDB. Moreover, so that users can dynamically compare the diversity and evolution of gene and genome structure and function across microalgal lineages, the current database also includes several previously published microalgal genomes (Chlamydomonas reinhardtii, Cyanidioschyzon merolae, Phaeodactylum tricornutum, Thalassiosira pseudonana, etc.). In the future, we plan to include all unicellular microalgae, whether sequenced by us or obtained from public sources, that produce not only biofuels but also platform chemicals and other high-value products (see Table 1 for a few examples).

Furthermore, building on the current EnergyAlgaeDB release (July 2011), we are integrating a series of full time-course lipidome profiles with the genome and transcriptome data for N. oceanica OZ-1 (one of the eight Nannochloropsis genomes we sequenced). This task is expected to be completed by March 2012.

The web service

The database is implemented with the MySQL database system and in-house scripts. All genome sequences and gene structures can be downloaded as text files from the EnergyAlgaeDB website (http://www.bioenergychina.org:8989/, under the “Download” tab). The database is accompanied by a genome browser interface and a set of search tools.

   Firstly, the full genome sequences (or scaffolds and contigs), together with their gene structures and the functional annotations of these genes, can be browsed at both single-base resolution and whole-genome scale. Online users can view and annotate genomic regions and gene details interactively (zooming in and out, etc.).

   Secondly, genes can be searched by gene ID or gene name (under the “GBrowse” tab).

   Thirdly, the genome sequences in the database can be searched with BLAST, using either DNA or protein sequences as queries (under the “Blast” tab).

   Fourthly, users can freely upload custom tracks containing their own genomic or transcriptomic data (in GTF format) and compare them interactively with the gene structures and annotations already in the database (under the “GBrowse” tab).

   Finally, to handle the rapidly increasing number of queries to this database, we have implemented a high-performance computing platform based on multi-core CPUs and GPUs, available as an option for processing queries efficiently (under the “Blast” tab).
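The genome sequences offered under the “Download” tab are described only as text files. Assuming they are distributed in FASTA format (an assumption, not stated by the database), a downloaded file could be summarized locally with a short script such as the following. The names `read_fasta`, `n50` and `assembly_stats` are illustrative and not part of EnergyAlgaeDB.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {sequence_id: sequence}, upper-casing bases.

    Assumes FASTA input; the ID is the first whitespace-delimited token after '>'.
    """
    parts: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            parts[current] = []
        elif current is not None:
            parts[current].append(line.upper())
    return {seq_id: "".join(chunks) for seq_id, chunks in parts.items()}


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that sequences of length >= L cover half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def assembly_stats(seqs: Dict[str, str]) -> Dict[str, float]:
    """Number of sequences, total length in bp, GC fraction and N50."""
    lengths = [len(s) for s in seqs.values()]
    total = sum(lengths)
    gc = sum(s.count("G") + s.count("C") for s in seqs.values())
    return {
        "n_sequences": len(seqs),
        "total_bp": total,
        "gc_fraction": gc / total if total else 0.0,
        "n50": n50(lengths),
    }
```

To summarize a downloaded file, open it and pass the file handle to `read_fasta`, then pass the result to `assembly_stats`; the reported `total_bp` can be compared with the genome sizes quoted for each strain.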
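The search by gene ID or gene name described above is served from a MySQL database whose schema is not documented here. As a hypothetical sketch only, the following uses Python's built-in `sqlite3` module as a stand-in for MySQL, with an invented `genes` table and invented example gene IDs and names, to show how an exact-ID-or-partial-name lookup can be expressed in SQL.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical schema; EnergyAlgaeDB's actual MySQL tables are not documented.
SCHEMA = """
CREATE TABLE genes (
    gene_id   TEXT PRIMARY KEY,
    gene_name TEXT,
    scaffold  TEXT NOT NULL,
    start_pos INTEGER NOT NULL,   -- 1-based, inclusive
    end_pos   INTEGER NOT NULL,
    strand    TEXT CHECK (strand IN ('+', '-'))
);
CREATE INDEX idx_genes_name ON genes (gene_name);
"""

GeneRow = Tuple[str, str, str, int, int, str]

# Invented example rows; IDs, names and coordinates are illustrative only.
EXAMPLE_ROWS: List[GeneRow] = [
    ("NO_g0001", "DGAT2A", "scaffold_1", 1200, 3400, "+"),
    ("NO_g0002", "DGAT2B", "scaffold_1", 8000, 9500, "-"),
    ("NO_g0003", "LACS1", "scaffold_2", 500, 2100, "+"),
]


def open_db(rows: Iterable[GeneRow]) -> sqlite3.Connection:
    """Create an in-memory database and load gene rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO genes VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def search_genes(conn: sqlite3.Connection, term: str) -> List[GeneRow]:
    """Exact match on gene ID, or case-insensitive substring match on gene name."""
    # Escape LIKE wildcards so that '%' and '_' in the term match literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    cur = conn.execute(
        "SELECT gene_id, gene_name, scaffold, start_pos, end_pos, strand "
        "FROM genes WHERE gene_id = ? OR gene_name LIKE ? ESCAPE '\\' "
        "ORDER BY gene_id",
        (term, f"%{escaped}%"),
    )
    return cur.fetchall()
```

Indexing `gene_name` keeps exact-name lookups fast, although a leading-wildcard `LIKE` pattern still scans the table; a production system might add a full-text index instead.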
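The BLAST search described above returns alignment hits, but the output format of the web interface is not specified. Assuming the hits are available in the standard 12-column NCBI BLAST+ tabular format (`-outfmt 6`), for example from running BLAST+ locally against sequences from the “Download” tab, a sketch like the following keeps the best-scoring hit for each query. The E-value cutoff of 1e-5 is an illustrative choice, not a database default.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Hit:
    """One row of BLAST+ tabular output (-outfmt 6, default 12 columns)."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


# Converters for the 12 default columns, in the order BLAST+ writes them.
_TYPES = (str, str, float, int, int, int, int, int, int, int, float, float)


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated BLAST+ hits; extra trailing columns are ignored."""
    hits: List[Hit] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines (as in -outfmt 7)
        cols = line.split("\t")
        if len(cols) < 12:
            raise ValueError(f"expected 12 tab-separated columns: {line!r}")
        hits.append(Hit(*(conv(value) for conv, value in zip(_TYPES, cols))))
    return hits


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Highest-bitscore hit per query among hits with evalue <= max_evalue."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        current = best.get(hit.qseqid)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.qseqid] = hit
    return best
```

Ranking by bit score rather than E-value avoids ties among hits whose E-values underflow to 0.0.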
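Custom tracks are uploaded in GTF format. A GTF line has nine tab-separated columns: seqname, source, feature, start, end, score, strand, frame and attributes. Coordinates are 1-based and inclusive, and GTF2.2 requires the `gene_id` and `transcript_id` attributes. The sketch below writes exon records from a user's own transcriptome data as GTF lines. The `Exon` class, the `user_rnaseq` source label and the example IDs are hypothetical, and any additional requirements imposed by EnergyAlgaeDB's uploader are not covered here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Exon:
    seqname: str          # scaffold/contig name, matching the database's sequence IDs
    start: int            # 1-based, inclusive (GTF convention)
    end: int              # 1-based, inclusive
    strand: str           # '+' or '-'
    gene_id: str
    transcript_id: str
    score: Optional[float] = None  # e.g. an expression value; written as '.' if absent


def to_gtf_line(exon: Exon, source: str = "user_rnaseq") -> str:
    """Format one exon as a 9-column, tab-separated GTF line."""
    if exon.start < 1 or exon.end < exon.start:
        raise ValueError(f"invalid coordinates {exon.start}-{exon.end}")
    if exon.strand not in ("+", "-"):
        raise ValueError(f"invalid strand {exon.strand!r}")
    score = "." if exon.score is None else f"{exon.score:g}"
    attributes = f'gene_id "{exon.gene_id}"; transcript_id "{exon.transcript_id}";'
    columns = [exon.seqname, source, "exon", str(exon.start), str(exon.end),
               score, exon.strand, ".", attributes]  # frame is '.' for non-CDS features
    return "\t".join(columns)


def write_gtf(exons: Iterable[Exon], source: str = "user_rnaseq") -> str:
    """Return GTF text with records sorted by sequence name and start position."""
    ordered = sorted(exons, key=lambda e: (e.seqname, e.start))
    return "".join(to_gtf_line(e, source) + "\n" for e in ordered)
```

Sorting by sequence name and start position is a convenience, since many genome browsers and indexing tools expect sorted input.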

Commitment

We are committed in the long term to developing, maintaining and expanding this database service, free of charge, for the worldwide research community. New data from our Center (e.g. Table 1) and from public sources worldwide, such as new energy-microalgal genomes, metagenomes, transcriptomes and other omics data, will be regularly added to and updated in the database. EnergyAlgaeDB should therefore be a valuable community resource and service for those interested in the structure, function, regulation, evolution and ecology of microalgal functional elements, genomes and organisms. It already serves more than 10 long-term research institutes around the world (Table 1), and its user base should also include the rapidly growing academic and industrial research communities developing microalgae-based biofuels and chemicals.

Table 1. Completed and ongoing energy-algae genome projects at BioEnergy Genome Center, CAS-QIBEBT (as of December 29, 2011), as well as the number of long-term users for each project.

| No. | Species/Strain | Funding source | Collaborators of CAS-QIBEBT | Motivation for sequencing | Sequencing status | Database status | No. of long-term users |
|---|---|---|---|---|---|---|---|
| 1 | Nannochloropsis OZ-1 | National Natural Science Foundation of China (NSFC) and Chinese Academy of Sciences (CAS) | Arizona State University, USA; University of Maryland, USA | oil (TAG) and EPA production | completed | completed | 5 |
| 2 | Nannochloropsis CCMP 531 | NSFC and CAS | Arizona State University | oil and EPA production | completed | completed | 3 |
| 3 | Nannochloropsis CCMP 529 | NSFC and CAS | Arizona State University | oil and EPA production | completed | completed | 3 |
| 4 | Nannochloropsis CCMP 525 | NSFC and CAS | Arizona State University | oil and EPA production | completed | completed | 3 |
| 5 | Nannochloropsis CCMP 505 | NSFC and CAS | Arizona State University | oil and EPA production | completed | completed | 3 |
| 6 | Nannochloropsis CCMP 527 | NSFC and CAS | Arizona State University | oil and EPA production | completed | completed | 3 |
| 7 | Nannochloropsis CCMP 537 | NSFC and CAS | Arizona State University | oil and EPA production | completed | completed | 3 |
| 8 | Nannochloropsis CCMP 1776 | Solix Biofuels Inc., USA | Solix Biofuels Inc., USA | oil and EPA production | completed | ongoing | 3 |
| 9 | Pseudochlorococcum sp. | NSFC and CAS | Arizona State University | oil production | ongoing | ongoing | 3 |
| 10 | Scenedesmus sp. | NSFC and CAS | Arizona State University | oil production | ongoing | ongoing | 3 |
| 11 | Haematococcus pluvialis Str-001 | CAS | Arizona State University | astaxanthin production | ongoing | ongoing | 3 |
| 12 | Haematococcus pluvialis Str-002 | CAS | Arizona State University | astaxanthin production | ongoing | ongoing | 3 |
| 13 | Chlorella sp. | Ministry of Science and Technology of China (MoST) | East China Univ. of Science and Technology | oil production | ongoing | ongoing | 2 |
| 14 | Chlorella sp. | CAS | Institute of Hydrobiology, CAS | oil production | ongoing | ongoing | 2 |
| 15 | Crypthecodinium cohnii | CAS | CAS-QIBEBT internal | DHA production | ongoing | ongoing | 1 |
| 16 | Neochloris sp. | CAS | Wageningen Univ., Netherlands | oil production | ongoing | ongoing | 2 |
| 18 | Nannochloropsis sp. | CAS | Technical Univ. of Denmark | oil and EPA production | ongoing | ongoing | 2 |