Tracking solar farms with satellites across a 2.7M square kilometer area

Dymaxion Labs
Jul 7, 2020

The renewable energy industry has grown significantly worldwide, driven by the drop in generation costs over the last decade. According to the International Renewable Energy Agency (IRENA), the transition to clean energy could meet 90% of the Paris Agreement’s goals for reducing emissions from the energy industry. Argentina has great solar and wind potential, which would allow the country to focus increasingly on generating energy from these sources. Between 2015 and 2019, renewables went from 2% to 8% of the energy matrix, driven mainly by the growth of solar and wind power.

At Dymaxion Labs, we are developing a geospatial analytics API (DYMAX). Our goal is to make machine learning modeling with multispectral satellite imagery scalable. In partnership with La Nacion, we set out to use artificial intelligence to map the country’s solar farms and monitor their development.

In this article we describe the methodology we developed to create the map, showcasing the power of our platform and how it can be applied to real-world problems such as detecting solar farms over large areas.

Problem definition

First, we need to understand what kind of problem we are trying to solve. We aim to detect and localize every solar farm in satellite imagery covering Argentina’s 2.7M square kilometers of territory. This is an object detection problem. There is only one type of object we are interested in, so the model has a single label. Note that this example focuses on solar farm detection, but the same approach can detect other kinds of objects, such as cattle, roads, and swimming pools.

Datasets involved

We used Sentinel-2 imagery as the source of satellite data. The Ministry of Energy and Mining of Argentina provided a list of places where solar farms would be built. However, the number of solar farms already built by December 2019 was not enough to generate a training dataset. To solve this, we looked for additional sources of solar farm locations, such as OpenStreetMap.

Chile has a more developed renewable energy program than Argentina. Since the geography of the areas where solar farms are located is similar in both countries, we used the Chilean locations available in OpenStreetMap to grow the training dataset.
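The article does not say how the OpenStreetMap locations were extracted. One common route is the Overpass API, so the sketch below only builds an Overpass QL query string for features tagged as solar power plants (`power=plant` with `plant:source=solar`). The function name, the timeout, and the bounding box values are illustrative assumptions, and no network request is made.

```python
def build_solar_farm_query(south: float, west: float, north: float, east: float,
                           timeout_s: int = 120) -> str:
    """Return an Overpass QL query for solar power plants inside a bounding box.

    Overpass expects bounding boxes as (south, west, north, east) in degrees.
    """
    bbox = f"({south},{west},{north},{east})"
    # OSM tagging convention for solar power plants; individual farms may be
    # tagged differently, so results would still need manual review.
    tag_filter = '["power"="plant"]["plant:source"="solar"]'
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        "(\n"
        f"  way{tag_filter}{bbox};\n"
        f"  relation{tag_filter}{bbox};\n"
        ");\n"
        "out center;"
    )


# Rough, hypothetical bounding box over northern Chile (not from the article).
query = build_solar_farm_query(-30.0, -72.0, -17.5, -67.0)
print(query)
```

The resulting string could be sent to an Overpass endpoint with any HTTP client. `out center;` asks for one representative coordinate per feature, which is enough to seed a manual review like the one described here.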

By bringing these two subsets of data together, we were able to find 200 solar farms that were manually reviewed by both teams in an internal hackathon.

Nonogasta Solar Farm (-29.332365, -67.425823). La Rioja, Argentina. Source: Google Maps.

Modeling

On the modeling side, the biggest challenge is deploying the model over Argentina’s 2.7M square kilometers. To solve that, we use our API to process Sentinel-2 chips for the whole country. Although Sentinel-2 has only moderate spatial resolution (10 m), utility-scale solar farms have distinguishing features, such as a large footprint and a recognizable pattern, which make them great candidates for detection by computer vision algorithms. That said, higher-resolution images would improve the results and allow the detection of smaller installations. In fact, the La Nacion team double-checked our results with PlanetScope (3 m) data.
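To make the resolution argument concrete, here is a small sketch of how many pixels a farm of a given footprint covers at each resolution. The 1 km² footprint is a hypothetical value chosen for round numbers, not a figure from the project.

```python
def pixels_covered(area_km2: float, gsd_m: float) -> int:
    """Approximate number of pixels covering a ground area at a given pixel size.

    gsd_m is the ground sampling distance (pixel side length) in meters.
    """
    area_m2 = area_km2 * 1_000_000
    return round(area_m2 / (gsd_m ** 2))


farm_area_km2 = 1.0  # hypothetical footprint, for illustration only

for sensor, gsd_m in [("Sentinel-2", 10.0), ("PlanetScope", 3.0)]:
    print(sensor, pixels_covered(farm_area_km2, gsd_m))
```

At 10 m a 1 km² farm spans about 10,000 pixels, which is plenty for a pattern to emerge, while a 1-hectare installation would cover only about 100. At 3 m the same areas cover roughly eleven times as many pixels, which is why smaller installations become detectable.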

To use these images for training our model, a few annotation steps were needed to differentiate our object of interest from its surroundings.

1. We drew polygons around known solar farms, based on the dataset we had built.

2. We cut the geolocated solar farm contours out of satellite imagery captured from June 2019 to mid-January 2020, and combined the images to obtain the best captures.

3. We used image segmentation techniques to assist in the process. The images were cut into 100 x 100 pixel chips containing positive and negative examples of solar farms, so that the algorithm could learn to distinguish solar farms from other patterns, such as agricultural areas, mountains, cities, and bare soil. The images were cropped so that the algorithm could pick up patterns such as borders and color combinations more accurately.

4. Finally, we deployed our model on images across the country, in tiles of 100 x 100 pixels. A total of seven million chips were processed.
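The chip-cutting step can be sketched as follows. This is a minimal illustration with NumPy on synthetic data, not the DYMAX implementation: the function name, the non-overlapping stride, and the 5% threshold for calling a chip positive are all assumptions.

```python
import numpy as np

CHIP_SIZE = 100  # pixels per side, as described in the article


def cut_chips(image, mask, size=CHIP_SIZE, min_positive_fraction=0.05):
    """Yield (row, col, chip, label) for non-overlapping size x size windows.

    image: array of shape (height, width, bands)
    mask:  boolean array of shape (height, width), True inside annotated polygons
    label: 1 if at least min_positive_fraction of the chip is annotated, else 0
    Partial windows at the right and bottom edges are skipped.
    """
    height, width = mask.shape
    for row in range(0, height - size + 1, size):
        for col in range(0, width - size + 1, size):
            chip = image[row:row + size, col:col + size]
            positive_fraction = mask[row:row + size, col:col + size].mean()
            label = int(positive_fraction >= min_positive_fraction)
            yield row, col, chip, label


# Synthetic 4-band scene with one fake "solar farm" rectangle in the mask.
rng = np.random.default_rng(0)
image = rng.random((300, 200, 4), dtype=np.float32)
mask = np.zeros((300, 200), dtype=bool)
mask[120:180, 20:90] = True

chips = list(cut_chips(image, mask))
labels = {(row, col): label for row, col, _, label in chips}
print(len(chips), "chips,", sum(labels.values()), "positive")
```

The same windowing serves both training, where the labels come from the annotated polygons, and country-wide prediction, where only the chips are needed and the model supplies the label.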

In the pictures below, you can see how data augmentation techniques are used to distinguish solar farms from crop fields using the infrared band instead of the classic RGB bands.

Animation Source: La Nacion Data.
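One simple way to see why the infrared band helps is a false-color composite, where near-infrared is displayed as red. Healthy vegetation reflects strongly in near-infrared, so crop fields tend to show up bright red, while solar panels generally do not. The sketch below assumes Sentinel-2 bands B08 (near-infrared), B04 (red), and B03 (green) already loaded as arrays. The percentile stretch is an illustrative choice, not the processing used for the animation.

```python
import numpy as np


def false_color_composite(nir, red, green, low_pct=2.0, high_pct=98.0):
    """Stack NIR, red and green bands into a displayable (H, W, 3) image in [0, 1]."""

    def stretch(band):
        # Linear stretch between two percentiles, clipped to [0, 1].
        band = band.astype(np.float32)
        low, high = np.percentile(band, [low_pct, high_pct])
        if high <= low:  # constant band: avoid division by zero
            return np.zeros_like(band)
        return np.clip((band - low) / (high - low), 0.0, 1.0)

    # Display channels: R <- NIR, G <- red, B <- green
    return np.dstack([stretch(nir), stretch(red), stretch(green)])


# Synthetic reflectance values standing in for real Sentinel-2 bands.
rng = np.random.default_rng(1)
bands = rng.integers(0, 10_000, size=(3, 50, 50)).astype(np.uint16)
composite = false_color_composite(bands[0], bands[1], bands[2])
print(composite.shape)
```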

Concluding remarks

In numbers, 7,000,000 images were processed and 2,780,400 km² were analyzed over the course of the project.
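These two figures can be related with a quick back-of-the-envelope calculation, assuming each 100 x 100 pixel chip is cut from 10 m Sentinel-2 pixels:

```python
CHIP_SIZE_PX = 100            # chip side in pixels (from the article)
PIXEL_SIZE_M = 10             # Sentinel-2 resolution quoted in the article
COUNTRY_AREA_KM2 = 2_780_400  # analyzed area (from the article)
CHIPS_PROCESSED = 7_000_000   # processed chips (from the article)

chip_side_km = CHIP_SIZE_PX * PIXEL_SIZE_M / 1000  # 1.0 km per side
chip_area_km2 = chip_side_km ** 2                  # 1.0 km² per chip

# Minimum number of chips if they tiled the area with no overlap at all.
min_chips = COUNTRY_AREA_KM2 / chip_area_km2
ratio = CHIPS_PROCESSED / min_chips

print(f"{min_chips:,.0f} non-overlapping chips; processed/minimum = {ratio:.2f}")
```

Under that assumption each chip covers 1 km², so about 2.78 million chips would be the bare minimum. The article does not explain why roughly 2.5 times as many were processed. Overlapping windows or scenes that extend beyond the border are plausible reasons, but they are guesses.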

10,999 chips were used to train the model. Of these, 70% belonged to the negative class, meaning the chip does not contain a solar farm.

A separate set of 1,222 chips was used only to evaluate the results (the algorithm did not see these images during training). With Dymaxion’s API, the training process lasted 30 hours and used 30 graphics processing unit (GPU) nodes. The precision of the algorithm was 94%.
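For reference, the sketch below works through the split and the metric. It assumes "precision" is used in its usual classification sense (the share of chips flagged as solar farms that really are solar farms). The confusion counts are hypothetical, chosen only to illustrate a 94% result, since the article does not publish them.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of predicted positives that are actually positive."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        raise ValueError("precision is undefined when nothing is predicted positive")
    return true_positives / predicted_positives


TRAIN_CHIPS = 10_999  # from the article
EVAL_CHIPS = 1_222    # from the article

# Share of all labeled chips held out for evaluation (about 10%).
eval_share = EVAL_CHIPS / (TRAIN_CHIPS + EVAL_CHIPS)

# Approximate class balance of the training set (70% negative).
negative_train = round(TRAIN_CHIPS * 0.70)
positive_train = TRAIN_CHIPS - negative_train

# Hypothetical counts: 94 correct detections out of 100 chips flagged.
example_precision = precision(true_positives=94, false_positives=6)

print(f"eval share: {eval_share:.1%}")
print(f"train chips: {negative_train} negative, {positive_train} positive")
print(f"example precision: {example_precision:.0%}")
```

Precision alone says nothing about missed farms, so recall on the same held-out chips would be the natural companion metric.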

The resulting map shows the 20 solar farms that are officially operating and generating energy, plus two other private solar farms in the provinces of Santa Fe and San Luis. You can access the results in this Google Spreadsheet.

Heatmap of solar farm locations in Argentina

Thanks to the La Nacion Data team, especially Florencia Coelho and Mathias Felipe, for choosing our tools to run this project.
