Right here in Columbus is a series of databases that some scientists and business leaders call the world’s most valuable collection of scientific data. This week, they’ll be used to train some of the world’s best young scientists.
Just how much information is in a petabyte?
“If it was in standard mp3 music format, it would be a song that would take 2,000 years to listen to,” says Kris Woods.
Woods is the infrastructure and workplace director at CAS, or Chemical Abstracts Service, a division of the American Chemical Society. Woods is charged with keeping database computers humming at CAS, and it's a big job.
Computers inside the CAS campus on Olentangy River Road actually house three petabytes, or 3,000 terabytes, of data. Information on more than 130 million chemical compounds is stored inside 2,600 physical servers that require their own massive air conditioning system, as well as two diesel generators and a room full of batteries to ensure backup power in the event of an emergency.
CAS’s Jonathan Taylor says the databases have to be accessible 24 hours a day, seven days a week, because more than 500,000 scientists around the world depend on the information.
“That’s important information for researchers who are trying to understand things like what’s been discovered before, or how has something been made,” Taylor says.
Getting raw data into a digestible and searchable form requires a lot of legwork. It’s undertaken by scores of scientists, developers, technicians and others housed in several CAS buildings.
Training The Next Generation
Taylor helps run CAS’s SciFinder Future Leaders program, which brings in some of the brightest young scientific minds from around the world for a week’s worth of training. The program is named after SciFinder, CAS’s best-known database.
“We talk to them about leadership principles, we try to teach them a little bit about how science information is built and created,” Taylor says. “But we also learn from them and they have an impact on the databases.”
Taylor says the program has an acceptance rate “in the single digits.” This year it invited 30 Ph.D. students and postdoctoral researchers. Their training gets into full swing on Tuesday afternoon, when they meet with the CAS scientists who curate SciFinder.
Aside from learning the ins and outs of SciFinder and other CAS databases, Future Leaders enrollees also learn about market forces within the scientific community, how to get articles published, and how to maximize the impact of their work through the media.
The week-long program ends Sunday when enrollees present their own work to colleagues. Presentation topics this year include everything from “Addressing the gender gap in STEM disciplines” to “Accurately evaluating the photolytic fate of agrochemicals in natural waters.”