GSoC 2020: Implementing user-friendly search features


The NCBI Sequence Read Archive (SRA) is the primary archive of next-generation sequencing datasets. SRA makes metadata and raw sequencing data available to the research community to encourage reproducibility and to provide avenues for testing novel hypotheses on publicly available data[1].

pysradb provides a simple way to programmatically access metadata and download sequencing data from SRA and the European Bioinformatics Institute’s European Nucleotide Archive (ENA).

This project implements a search module for pysradb that sends the user’s search query to NCBI’s SRA and GEO DataSets (GDS) databases and to EBI’s ENA, and returns the search results to the user. The module can also generate statistics and graphs that give a quick summary of the search results.
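
To make the summary step concrete, here is a minimal sketch of how per-field statistics could be computed from a list of search hits. It is not pysradb's implementation: the function name summarize_results, the field names, and the sample records are all hypothetical and were invented for illustration.

```python
from collections import Counter


def summarize_results(records, fields):
    """Count how often each value occurs for the given metadata fields.

    `records` is a list of dicts, one per search hit (a hypothetical
    layout, not the structure pysradb actually returns).
    """
    summary = {}
    for field in fields:
        # Records that lack the field are grouped under "missing".
        counts = Counter(record.get(field, "missing") for record in records)
        # most_common() orders values from most to least frequent.
        summary[field] = dict(counts.most_common())
    return summary


# Hypothetical search hits, invented for illustration only.
records = [
    {"run_accession": "RUN0001", "platform": "ILLUMINA", "library_strategy": "RNA-Seq"},
    {"run_accession": "RUN0002", "platform": "ILLUMINA", "library_strategy": "OTHER"},
    {"run_accession": "RUN0003", "platform": "OXFORD_NANOPORE", "library_strategy": "RNA-Seq"},
    {"run_accession": "RUN0004", "platform": "ILLUMINA"},
]

summary = summarize_results(records, ["platform", "library_strategy"])
for field, counts in summary.items():
    print(field, counts)
```

Counts of this kind are also the natural input for a bar chart, which is one plausible way the graphs mentioned above could be drawn.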


1. Choudhary, Saket. “pysradb: A Python Package to Query next-Generation Sequencing Metadata and Data from NCBI Sequence Read Archive.” F1000Research, vol. 8, F1000 (Faculty of 1000 Ltd), Apr. 2019, p. 532.