Diehard Coders Just Rescued NASA's Earth Science Data | WIRED
MEGAN MOLTENI SCIENCE 02.13.17 5:35 PM
DIEHARD CODERS JUST RESCUED NASA'S EARTH SCIENCE DATA
AMIE LYONS.
ON SATURDAY MORNING, the white stone buildings on UC Berkeley’s campus radiated
with unfiltered sunshine. The sky was blue, the campanile was chiming. But instead
of enjoying the beautiful day, 200 adults had willingly sardined themselves into a
fluorescent-lit room in the bowels of Doe Library to rescue federal climate data.
Like similar groups across the country—in more than 20 cities—they believe that the
Trump administration might want to disappear this data down a memory hole. So
these hackers, scientists, and students are collecting it to save outside government
servers.
But now they’re going even further. Groups like DataRefuge and the Environmental
Data and Governance Initiative, which organized the Berkeley hackathon to collect
data from NASA's earth sciences programs and the Department of Energy, are doing
more than archiving. Diehard coders are building robust systems to monitor ongoing
changes to government websites. And they’re keeping track of what’s been removed
—to learn exactly when the pruning began.
https://www.wired.com/2017/02/diehard-coders-just-saved-nasas-earth-science-data/
Tag It, Bag It
The data collection is methodical, mostly. About half the group immediately sets web
crawlers on easily copied government pages, sending their text to the Internet
Archive, a digital library made up of hundreds of billions of snapshots of webpages.
They tag more data-intensive projects—pages with lots of links, databases, and
interactive graphics—for the other group. Called “baggers,” these coders write
custom scripts to scrape complicated data sets from the sprawling, patched-together
federal websites.
It’s not easy. “All these systems were written piecemeal over the course of 30 years.
There’s no coherent philosophy to providing data on these websites,” says Daniel
Roesler, chief technology officer at UtilityAPI and one of the volunteer guides for the
Berkeley bagger group.
One coder who goes by Tek ran into a wall trying to download multi-satellite
precipitation data from NASA’s Goddard Space Flight Center. Starting in August,
access to Goddard Earth Science Data required a login. But with a bit of totally legal
digging around the site (DataRefuge prohibits outright hacking), Tek found a buried
link to the old FTP server. He clicked and started downloading. By the end of the day
he had data for all of 2016 and some of 2015. It would take at least another 24 hours to
finish.
The non-coders hit dead-ends too. Throughout the morning they racked up “404 Page
not found” errors across NASA’s Earth Observing System website. And they more
than once ran across empty databases, like the Global Change Data Center’s reports
archive and one of NASA’s atmospheric CO₂ datasets.
And this is where the real problem lies. They don’t know when or why this data
disappeared from the web (or if anyone backed it up first). Scientists who understand
it better will have to go back and take a look. But in the meantime, DataRefuge and EDGI
understand that they need to be monitoring those changes and deletions. That’s more
work than a human could do.
So they’re building software that can do it automatically.
Future Farming
Later that afternoon, two dozen or so of the most advanced software builders
gathered around whiteboards, sketching out tools they’ll need. They worked out
filters to separate mundane updates from major shake-ups, and explored blockchain-
like systems to build auditable ledgers of alterations. Basically it’s an issue of what
engineers call version control—how do you know if something has changed? How do
you know if you have the latest? How do you keep track of the old stuff?
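To make the whiteboard ideas concrete, here is a minimal Python sketch of the two pieces described above: a filter that sorts mundane updates from major shake-ups, and a hash-chained ledger that makes the record of alterations auditable. It is an illustration only, not what the volunteers built (no code was written that day). The names `classify_change` and `Ledger`, the 0.9 similarity threshold, and the example URL are all assumptions made up for this example.

```python
import hashlib
import json
from difflib import SequenceMatcher


def sha256(text: str) -> str:
    """Hex SHA-256 digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def classify_change(old: str, new: str, major_below: float = 0.9) -> str:
    """Label a page change by text similarity (the threshold is an assumed value)."""
    if old == new:
        return "unchanged"
    ratio = SequenceMatcher(None, old, new).ratio()
    return "major" if ratio < major_below else "minor"


class Ledger:
    """Append-only log in which each entry commits to the hash of the one before it."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self.entries = []  # ledger entries, oldest first
        self.latest = {}   # url -> most recently recorded page text

    def record(self, url: str, text: str, timestamp: str) -> dict:
        """Store a snapshot's hash and how it differs from the last snapshot."""
        previous = self.latest.get(url)
        change = "new" if previous is None else classify_change(previous, text)
        body = {
            "url": url,
            "timestamp": timestamp,
            "content_hash": sha256(text),
            "change": change,
            "prev_hash": self.entries[-1]["entry_hash"] if self.entries else self.GENESIS,
        }
        entry = dict(body, entry_hash=sha256(json.dumps(body, sort_keys=True)))
        self.entries.append(entry)
        self.latest[url] = text
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if entry["prev_hash"] != prev:
                return False
            if sha256(json.dumps(body, sort_keys=True)) != entry["entry_hash"]:
                return False
            prev = entry["entry_hash"]
        return True


if __name__ == "__main__":
    ledger = Ledger()
    url = "https://example.gov/precipitation"  # hypothetical page
    ledger.record(url, "Precipitation dataset index. Updated 2017-02-10.", "2017-02-10")
    ledger.record(url, "Precipitation dataset index. Updated 2017-02-11.", "2017-02-11")
    ledger.record(url, "", "2017-02-12")  # the page has been emptied out
    for e in ledger.entries:
        print(e["timestamp"], e["change"], e["content_hash"][:12])
    print("ledger intact:", ledger.verify())
```

The ledger answers the three version-control questions in order: a differing content hash shows that something changed, the last entry for a URL is the latest known state, and the chain of earlier entries keeps track of the old stuff. Because each entry includes the hash of its predecessor, quietly rewriting history would require recomputing every later entry, which is easy to catch if copies of the ledger are held by many volunteers.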
There wasn’t enough time for anyone to start actually writing code, but a handful of
volunteers signed on to build out tools. That’s where DataRefuge and EDGI organizers
really envision their movement going—a vast decentralized network from all 50
states and Canada. Some volunteers can code tracking software from home. And
others can simply archive a little bit every day.
By the end of the day, the group had collectively loaded 8,404 NASA and DOE
webpages onto the Internet Archive, effectively covering the entirety of NASA’s earth
science efforts. They'd also built backdoors in to download 25 gigabytes from 101
public datasets, and were expecting even more to come in as scripts on some of the
larger datasets (like Tek’s) finished running. But even as they celebrated over pints of
beer at a pub on Euclid Street, the mood was somber.
There was still so much work to do. “Climate change data is just the tip of the
iceberg,” says Eric Kansa, an anthropologist who manages archaeological data
archiving for the non-profit group Open Context. “There are a huge number of other
datasets being threatened with cultural, historical, sociological information.” A
panicked friend at the National Park Service had tipped him off to a huge data portal
that contains everything from park visitation stats to GIS boundaries to inventories
of species. While he sat at the bar, his computer ran scripts to pull out a list of
everything in the portal. When it’s done, he'll start working his way through each
quirky dataset.
UPDATE 5:00pm Eastern, 2/15/17: Phrasing in this story has been updated to clarify
when changes were made to federal websites. Some data is missing, but it is still
unclear when that data was removed.