Our new paper, “A Case for Data Commons: Toward Data Science as a Service,” was published in a special issue on Science as a Service in the Sep/Oct volume of Computing in Science and Engineering.
As the amount of scientific data continues to grow at ever faster rates, the research community is increasingly in need of flexible computational infrastructure that can support the entirety of the data science life cycle, including long-term data storage, data exploration and discovery services, and compute capabilities to support data analysis and reanalysis as new data is added and scientific pipelines are refined. The authors describe their experience developing data commons: interoperable infrastructure that co-locates data, storage, and compute with common analysis tools. Across the presented case studies, several common requirements emerge, including the need for persistent digital identifier and metadata services, APIs, data portability, pay-for-compute capabilities, and data peering agreements between data commons. Although many challenges remain, including sustainability and the development of appropriate standards, interoperable data commons bring us one step closer to effective data science as a service for the scientific research community.
Read the paper here: