Originally on the Huffington Post.

This blog post was co-written with Jason Daniel Schwartz, a journalist and writer. He recently completed a Master of Environmental Management at the Yale School of Forestry and Environmental Studies.

Big Data has emerged as a game-changing presence in commerce and politics. What used to be the vast and unknown cosmos of individual behavior and preferences can now be parsed for patterns and trends to aid in decision-making. Where policies used to be based on gut-checks and intuition, Big Data is now being translated into decisions that result in great profit, political gain, or, according to its more sanguine proponents, saving the world.

But forests don’t tweet, and whales don’t shop on Amazon. So what does Big Data mean for the environment and sustainability?

As creators of the Environmental Performance Index (EPI), we have yet to see the Big Data revolution enter the environmental domain. We sift through a plethora of globally available, national datasets that measure a suite of environmental issues, ranging from climate change to air quality and forests. Despite the data available, we are still woefully plagued with gaps in knowledge, imperfect data, and uncertainty. We lack, for example, global datasets for national recycling rates, waste management, and toxic chemicals.

That leaves us frequently creating indicators based on incomplete or imperfect data. These indicators are meant to provoke policymakers to act on an environmental issue.  One danger in creating these proxy measures is that issues with data gaps are often ignored because the underlying problems are masked.

So how can we bring Big Data to environmental decision-making? What is needed to invigorate the same kind of massive data collection that tech companies and the private sector are harnessing to their advantage?

We pondered this question at The Economist’s Big Data Information Forum in San Francisco last month. Panelists ranged from tech-world luminaries like Google and Intel, to representatives from Silicon Valley startups, to local government representatives, including Michael Flowers, the Director of Mayor Bloomberg’s Office of Policy and Strategic Planning. Flowers, who oversees Bloomberg’s “Geek Squad,” highlighted Big Data’s role in catching illegal oil dumpers, cleaning up trees after storms like Hurricane Sandy, and determining potential building code violations in New York City. The key to Big Data collection, he said, is government regulation.

This strikes us as counterintuitive, particularly given the urgency of many environmental issues. If we waited on the US Congress to pass climate legislation, would we know that atmospheric carbon dioxide concentrations recently surpassed 400 parts per million (ppm), a threshold that some experts warn will lead to catastrophic global warming? Simply put, regulation and legislation are often reactive and slow in a way that tech companies and the private sector are not. Private sector companies are pushing the limits of Big Data for targeted solutions and predictive power. Why can’t the same be done for the environment?

Another challenge is that we don’t yet know what environmental Big Data will look like or where it will come from. There are, however, a few emerging suggestions. Crowd-sourcing and citizen science projects like Dangermap, a crowd-sourced environmental pollution map making ripples in China, are increasingly popular tools for creating information where there previously was none. Open hardware and the Arduino platform offer exciting prospects for widely distributed, inexpensive tools to enable crowd-sourced data collection. The World Resources Institute has teamed with the Center for Global Development to aggregate vast amounts of satellite data on forest cover, developing algorithms that will detect when deforestation might be happening in any part of the world. If those algorithms and data are wrong, the Global Forest Watch 2.0 platform allows users to contribute their own observations. The National Ecological Observatory Network, or NEON, is aggregating and designing communicative platforms for the information we already have about climate change, land use change, and invasive species impacts. They are doing it in a way that keeps their resources open and adaptable to new information as it comes in.

Still, we have a long way to go. Stock market data is updated in close to real time, but there is no analogous platform or set of indicators for the environment. That’s a serious problem. Though many environmental phenomena manifest slowly over time, by the time we are able to perceive them it is often already too late.

In this 400 ppm era, we need to start thinking about how we can enlist Big Data for the environment.