Data Minimization for Green Tech: How Much Does Data Storage Affect The Environment, and What Can Be Done?
Data storage centers are the physical form of the Cloud, enabling fast and easy transmission and storage of data around the world. However, data storage has a massive environmental impact that many users might not be aware of. In addition to the electricity needed to run servers, data centers consume a massive amount of power and water for cooling. Green IT practices and resources are needed to reduce these effects, and well-implemented data minimization is a key part of greening the Cloud.
Data Center Electricity Demands
Data centers need to run continuously and without error. A breakdown or overheating incident without failsafes could lead to downtime and interrupted service that costs the center thousands of dollars a minute.
Running servers requires a great deal of electricity, but the true energy hog of data storage is cooling. The flow of electricity through the servers generates significant heat, and if the servers are not continuously cooled they can shut down or be permanently damaged. Since the servers need to run 24/7, cooling systems need to function 24/7 as well. Multiple backups need to be on standby to prevent mass overheating in the event that the main systems fail. This constant electricity use means that data centers have a very large carbon footprint.
Most data centers rely on air conditioning to cool their rooms of servers; some even have specialized systems designed for cooling computer rooms. In most data centers, over 40% of electricity consumption goes to cooling. As Steven Gonzalez Monserrate notes in his paper "The Cloud is Material: On the Environmental Impacts of Computation and Data Storage," this adds up dramatically. The Cloud's carbon footprint is larger than that of the entire airline industry, and it consumes 200 terawatt-hours of electricity a year--more than some countries. Just one data center can consume as much electricity as 50,000 homes. As little as 6 to 12 percent of that energy is used on active computation; the rest is spent on cooling and on powering the many failsafes and backup systems needed to avoid downtime. Studies have calculated the energy cost of data storage and transfer to range from 3.1 to 7 kWh per gigabyte. Storing 100 GB of data in the US for a year would emit approximately 0.2 tons of CO2 (assuming an average rate of carbon emissions for electricity).
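The 100 GB figure can be sanity-checked with a short calculation. The sketch below multiplies the per-gigabyte energy range cited above by a grid carbon intensity. The value of 0.4 kg of CO2 per kWh is an assumed, approximate US average chosen for illustration and does not come from the studies cited here; the function name is likewise hypothetical.

```python
# Rough estimate of storage-related emissions from the per-gigabyte energy range.
KWH_PER_GB_LOW = 3.1   # lower bound cited in the article
KWH_PER_GB_HIGH = 7.0  # upper bound cited in the article

# ASSUMPTION: approximate average US grid carbon intensity (kg CO2 per kWh).
# This value is illustrative and is not taken from the article's sources.
KG_CO2_PER_KWH = 0.4


def storage_emissions_tonnes(gigabytes, kwh_per_gb, kg_co2_per_kwh=KG_CO2_PER_KWH):
    """Return the estimated metric tons of CO2 for storing `gigabytes` of data."""
    energy_kwh = gigabytes * kwh_per_gb
    return energy_kwh * kg_co2_per_kwh / 1000.0  # convert kg to metric tons


low = storage_emissions_tonnes(100, KWH_PER_GB_LOW)
high = storage_emissions_tonnes(100, KWH_PER_GB_HIGH)
print(f"100 GB: {low:.2f} to {high:.2f} t CO2")
```

Under that assumed intensity, the result brackets the roughly 0.2 tons quoted above. A cleaner or dirtier local grid would shift the range proportionally.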
There is little opportunity to cut energy consumption in cooling. Cutting too many corners on cooling to save energy could lead to malfunctions and costly downtime. "Hyperscale" data centers can be powered by Big Tech-funded renewable energy investments and engineered for modern energy and heat efficiency, but these options aren't available for traditional data centers operating in older buildings. Some experts think that data storage facilities should be relocated to naturally cold countries like Iceland, but this would create latency problems and has already made electricity prices in Iceland spike.
Data Center Water Consumption
Water is a better heat absorber than air, so many data centers use chilled water to cool their servers. While this means lower electricity consumption and a smaller carbon footprint, the cost in water is just as significant. Data center liquid cooling places a lot of pressure on the local water supply. This is especially taxing in the Western US and other naturally dry areas, where climate change is accelerating water scarcity problems. As a particularly harrowing example, the community of Bluffdale, Utah, faces regular blackouts and water shortages due to the NSA's nearby Utah Data Center, which consumes seven million gallons of water every day.
As with air cooling, there is only so much room for improvement in liquid cooling systems. Only hyperscale data centers have the resources to invest in "closed-loop" cooling systems that recapture evaporated water, and even then most of it is still lost. Big Tech companies have pledged to invest in water efficiency systems and infrastructure, but corporate pledges are unenforceable and Big Tech has failed other environmental pledges in the past.
How Data Minimization Can Help
Data minimization is a principle of data privacy that holds that organizations should only collect the minimum amount of data they need for their stated purpose. It is a requirement of major data privacy laws like GDPR, and it also appears in proposed legislation such as the ADPPA. Data minimization can also have significant impacts on system efficiency. By collecting and storing only the minimum amount of data necessary and eliminating redundant data points, an organization can dramatically reduce its storage needs. Most organizations tend to store more data than they need due to inefficient systems or lack of prioritization in data management.
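As a minimal sketch of what this looks like in practice, the example below trims each record to the fields needed for a stated purpose and then drops records that become exact duplicates. The field names, the sample records, and the `minimize` function are all hypothetical and chosen only for illustration; a real system would derive the required fields from its documented processing purposes.

```python
# ASSUMPTION: hypothetical set of fields needed to fulfil an order.
REQUIRED_FIELDS = {"order_id", "email", "shipping_address"}


def minimize(records, required_fields=REQUIRED_FIELDS):
    """Keep only the required fields and drop exact duplicate records."""
    seen = set()
    minimized = []
    for record in records:
        # Discard every field that the stated purpose does not need.
        trimmed = {k: v for k, v in record.items() if k in required_fields}
        # Build a hashable fingerprint so redundant copies can be detected.
        key = tuple(sorted(trimmed.items()))
        if key not in seen:
            seen.add(key)
            minimized.append(trimmed)
    return minimized


# Illustrative sample data, not real customer records.
raw = [
    {"order_id": 1, "email": "a@example.com", "shipping_address": "1 Main St",
     "birthdate": "1990-01-01", "browser": "Firefox"},
    {"order_id": 1, "email": "a@example.com", "shipping_address": "1 Main St",
     "birthdate": "1990-01-01", "browser": "Chrome"},
    {"order_id": 2, "email": "b@example.com", "shipping_address": "2 Oak Ave",
     "birthdate": "1985-05-05", "browser": "Safari"},
]

result = minimize(raw)
print(len(raw), "records in,", len(result), "records kept")
```

Here the first two records differ only in a field that is not needed, so after trimming they collapse into one. Both the number of records and the size of each record shrink.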
Reducing demand via data minimization may be the best option for reducing the environmental impact of data storage. Cooling needs are hard to control. Cooling can't be cut without endangering the data, and as the world becomes more connected, the need for reliable data management will only increase. Better cooling technologies may raise efficiency, but the pace of those gains is difficult to predict. The rate of improvement might not be enough to offset the increased demands of data storage and processing--data center energy consumption is projected to triple from 2016 to 2026. Since efficiency improvements have their limits, reducing the load on the system is the best way to reduce cooling needs.
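To put the tripling projection in annual terms, the sketch below converts it into an implied compound growth rate, then computes the yearly reduction that would be needed just to hold consumption flat. It assumes steady year-over-year growth, which is a simplification. The function names are hypothetical.

```python
def implied_annual_growth(multiple, years):
    """Compound annual growth rate that yields `multiple` over `years`."""
    return multiple ** (1.0 / years) - 1.0


def offsetting_annual_reduction(growth_rate):
    """Yearly cut (via efficiency or reduced load) that cancels the growth."""
    return 1.0 - 1.0 / (1.0 + growth_rate)


# Projection cited in the article: consumption triples from 2016 to 2026.
growth = implied_annual_growth(3.0, 2026 - 2016)
reduction = offsetting_annual_reduction(growth)

print(f"Implied growth: {growth:.1%} per year")
print(f"Reduction needed to stay flat: {reduction:.1%} per year")
```

Under the steady-growth assumption, tripling in ten years works out to roughly 11.6% growth per year, so efficiency gains or load reductions of about 10.4% every year would be needed merely to keep consumption from rising.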
If more organizations integrate data minimization principles into their operations and implement technologies to simplify that integration, then the benefits of data minimization will lead to saved costs, streamlined data management, and reduced environmental impact. Data minimization can be applied at any level, from Big Tech hyperscale data centers to smaller companies that can't afford carbon offsets or renewable energy investments. Tech can't be green without the intelligent application of data minimization.
About Ardent Privacy
Ardent Privacy is an "Enterprise Data Privacy Technology" solutions provider based in the Maryland/DC region of the United States and Pune, India. Ardent harnesses the power of AI to aid companies with data discovery and automated compliance with RBI Security Guidelines, GDPR (EU), CCPA/CPRA (California), and other global regulations by taking a data-driven approach. Ardent Privacy's solution utilizes machine learning and artificial intelligence to identify, inventory, map, minimize, and securely delete data in enterprises to reduce legal and financial liability.
For more information and resources, visit https://ardentprivacy.ai/.
Ardent Privacy articles should not be considered legal advice on data privacy regulations or any other specific facts or circumstances.