Fill Missing Precipitation Data with Machine Learning in Python and Scikit-Learn – Tutorial

Hydrate

Evaluation of hydrological processes as evapotranspiration, runoff, routing and infiltration require data as precipitation, flow, temperature and radiation on a daily basis. Required data for hydrological modeling need to be accurate and must be completed over the study period. It is common that historical data from hydrological stations are incomplete and has many gaps that can be filled by the use of machine learning algorithms like Scikit-Learn in Python3. This tutorial shows a applied procedure to run a complete script for the filling of missing precipitation in one station by the use of data from 2 nearby stations. The script will be run on Python 3.9 on a Anaconda Prompt environment.

Tutorial

Code

#!pip install sklearn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

precipStations =  pd.read_csv('../Data/Est1_Est2_Est3.csv',index_col=0,parse_dates=True)
precipStations.head()

	Est1	Est2	Est3
Fecha
2014-07-25	0.0	NaN	0.0
2014-07-26	0.6	NaN	0.0
2014-07-27	0.0	NaN	0.0
2014-07-28	0.0	NaN	0.0
2014-07-29	0.2	NaN	0.2

fig, axs=plt.subplots(3,1,figsize=(12,8),sharex=True,sharey=True)
axs[0].plot(precipStations.index,precipStations['Est1'],label='Est1')
axs[0].legend()
axs[1].plot(precipStations.index,precipStations['Est2'],label='Est2')
axs[1].legend()
axs[2].plot(precipStations.index,precipStations['Est3'],label='Est3')
axs[2].legend()
plt.xticks(rotation='vertical')
plt.show()

precipNotNan = precipStations.dropna()
precipNotNan

	Est1	Est2	Est3
Fecha
2014-08-23	0.0	0.0	0.0
2014-08-24	0.0	0.0	0.0
2014-08-25	0.0	0.0	0.0
2014-08-26	0.0	0.0	0.0
2014-08-27	0.0	0.0	0.0
…	…	…	…
2015-06-21	0.0	0.0	0.0
2015-06-22	0.0	0.0	0.0
2015-06-23	0.0	0.0	0.0
2015-06-24	0.0	0.0	0.0
2015-06-25	3.8	2.2	0.0

229 rows × 3 columns

xTrain = precipNotNan[['Est1','Est3']]
yTrain = precipNotNan[['Est2']].values.flatten()

print(xTrain[:10])
print(yTrain[:10])

Est1  Est3
Fecha                 
2014-08-23   0.0   0.0
2014-08-24   0.0   0.0
2014-08-25   0.0   0.0
2014-08-26   0.0   0.0
2014-08-27   0.0   0.0
2014-08-28   0.0   0.0
2014-08-29   0.0   0.0
2014-08-30   0.0   0.0
2014-08-31   0.0   0.0
2014-09-01   2.8   3.4
[0.  0.  0.  0.  0.  0.  0.  0.  0.  3.4]

scaler = StandardScaler().fit(xTrain.values)

xTrainScaled = scaler.transform(xTrain.values)

print(xTrainScaled[:10])

[[-0.53894584 -0.50809201]
 [-0.53894584 -0.50809201]
 [-0.53894584 -0.50809201]
 [-0.53894584 -0.50809201]
 [-0.53894584 -0.50809201]
 [-0.53894584 -0.50809201]
 [-0.53894584 -0.50809201]
 [-0.53894584 -0.50809201]
 [-0.53894584 -0.50809201]
 [-0.04157671  0.123051  ]]

#check scaler
print(xTrainScaled.mean(axis=0))
print(xTrainScaled.std(axis=0))

[ 1.00841218e-16 -4.65421006e-17]
[1. 1.]

#regressor
regr = MLPRegressor(random_state=1, max_iter=5000).fit(xTrainScaled, yTrain)

#test
xTest = precipStations[['Est1','Est3']].dropna()
xTestScaled = scaler.transform(xTest.values)

print(xTest.describe())
print(xTestScaled[:10])

Est1        Est3
count  431.000000  431.000000
mean     2.376798    2.364269
std      4.948763    4.888289
min      0.000000    0.000000
25%      0.000000    0.000000
50%      0.000000    0.000000
75%      2.200000    2.400000
max     36.000000   43.800000
[[-0.53894584 -0.50809201]
 [-0.43236674 -0.50809201]
 [-0.53894584 -0.50809201]
 [-0.53894584 -0.50809201]
 [-0.50341948 -0.47096595]
 [-0.50341948 -0.43383989]
 [ 0.98868793  0.97695037]
 [-0.53894584 -0.50809201]
 [-0.53894584 -0.50809201]
 [-0.53894584 -0.50809201]]

#regression
yPredict = regr.predict(xTestScaled)
print(yPredict[:10])

[0.05938569 0.15278002 0.05938569 0.05938569 0.17877184 0.26853265
 8.31785189 0.05938569 0.05938569 0.05938569]

#comparison of station 2
fig, ax=plt.subplots(figsize=(12,8),sharex=True,sharey=True)
ax.plot(precipStations.index,precipStations['Est2'],label='Est2')
ax.plot(xTest.index,yPredict,label='Est2Predict')
plt.legend()
plt.xticks(rotation='vertical')
plt.show()

#insert predicted values
precipStations.head()

	Est1	Est2	Est3
Fecha
2014-07-25	0.0	NaN	0.0
2014-07-26	0.6	NaN	0.0
2014-07-27	0.0	NaN	0.0
2014-07-28	0.0	NaN	0.0
2014-07-29	0.2	NaN	0.2

#create new column
precipStations['Est2Completed'] = 0

#fill the new column with original and predicted values for Est2
for index, row in precipStations.iterrows():
    if np.isnan(row['Est2']) and ~np.isnan(row['Est1']) and ~np.isnan(row['Est3']):
        rowScaled = scaler.transform([[row['Est1'],row['Est3']]])
        precipStations.loc[index,['Est2Completed']] = regr.predict(rowScaled)
    elif ~np.isnan(row['Est2']):
        precipStations.loc[index,['Est2Completed']] = row['Est2']
    else:
        row['Est2Completed'] = np.nan

#show original and filled values
fig, axs=plt.subplots(4,1,figsize=(12,8),sharex=True,sharey=True)
axs[0].plot(precipStations.index,precipStations['Est1'],label='Est1')
axs[0].legend()
axs[1].plot(precipStations.index,precipStations['Est2'],label='Est2',color='orangered')
axs[1].legend()
axs[2].plot(precipStations.index,precipStations['Est2Completed'],label='Est2Completed',color='darkorange')
axs[2].legend()
axs[3].plot(precipStations.index,precipStations['Est3'],label='Est3')
axs[3].legend()
plt.xticks(rotation='vertical')
plt.show()

Input data

You can download the input data from this link.

—————
TRANSCOM ISP – Free Sigma HSE Email
Level 6 Domain Names are FREE – Register Now.

Share on Social Media

Hydrate

Basic tutorial of online geological modeling with Gempy in AquiferApp

Gempy is an open-source library for modeling geology written in Python. The library is capable of creating complex 3D geological models including structures, fault networks, and unconformities and it can be coupled with uncertainty analysis. Hatarilabs have developed an online

May 4, 2024

extreme

Hydrate

Large Language models in Earth Observation

If you grabbed the concepts previous post, you are probably ready to try to understand how you can use them proficiently in you hydrological work.

May 3, 2024

extreme

Hydrate

Can Large Language Models be useful in Hydrology ?

Work in progressI am thinking if LLM can be useful in Hydrology. More than thinking actually, it requires some readings and some understanding of the

May 1, 2024

extreme

Hydrate

Exploring the Soil-Plant-Atmosphere Continuum: Advancements, Integrated Modeling and Ecohydrological Insights, a Ph.D. Thesis by C. D’Amato

This is a placeholder for the incoming Thesis draft by Concetta D’Amato
—————
TRANSCOM ISP – Free Sigma HSE Email
Level 6 Domain Names are FREE – Register

April 30, 2024

extreme

Tutorial

Code

Input data

extreme

Catch Up

Related Posts