There are many Python AutoML libraries that speed up and simplify machine learning tasks, such as H2O and PyCaret. Here we discuss Salesforce's Merlion library, which focuses on time series and offers several features that some other libraries lack:
It supports univariate and multivariate time series forecasting.
It automatically detects outliers and anomalies.
It supports hyperparameter tuning.
It has inbuilt ensemble techniques.
It has built-in visualization.
The library's main aim is to provide fast and accurate machine learning models for specific time series problems.
There are several modules to boost ease-of-use, including visualization, anomaly score calibration to boost interpretability, AutoML for hyperparameter tuning and model selection, and model ensembling.
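As a rough illustration of what model ensembling means, the sketch below combines the forecasts of several models by taking the mean or median at each timestamp. It uses only the Python standard library. The function name combine_forecasts and the sample numbers are hypothetical and are not part of Merlion's API; Merlion ships its own ensemble models.

```python
from statistics import mean, median


def combine_forecasts(forecasts, how="mean"):
    """Combine equally long forecasts from several models, one value per timestamp."""
    if not forecasts:
        raise ValueError("need at least one forecast")
    if len({len(f) for f in forecasts}) != 1:
        raise ValueError("all forecasts must have the same length")
    combiner = {"mean": mean, "median": median}[how]
    # zip(*forecasts) groups the predictions that share a timestamp
    return [combiner(step) for step in zip(*forecasts)]


# Hypothetical predictions from three models over three timestamps
model_a = [10.0, 12.0, 14.0]
model_b = [11.0, 13.0, 15.0]
model_c = [30.0, 12.5, 14.5]  # the first value is an outlying prediction

mean_forecast = combine_forecasts([model_a, model_b, model_c], how="mean")
median_forecast = combine_forecasts([model_a, model_b, model_c], how="median")
print("mean:  ", mean_forecast)
print("median:", median_forecast)
```

The median is less sensitive than the mean to a single model that is badly off at one timestamp, which is one reason an ensemble may offer more than one way to combine its members.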
The project has two main packages: ts_datasets, for loading time series datasets, and merlion, for machine learning tasks such as data processing, data visualization, anomaly detection, hyperparameter tuning, and forecasting.
The ts_datasets loaders return data as pandas.DataFrame objects.
You can install the library from PyPI by running pip install salesforce-merlion in your command prompt or Anaconda prompt. You can also install it from source by following the instructions on the official GitHub page of salesforce-merlion.
Next, install the external dependencies of the library, as described below.
Because we will use ensemble models, we also need LightGBM, which can be installed with pip install lightgbm.
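Some of Merlion's anomaly detection models also need the Java Development Kit (JDK). The JDK is normally installed outside pip, through your system package manager or conda (for example, conda install -c conda-forge openjdk). Make sure the java executable can be found on your PATH afterwards.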
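Before moving on, it can help to confirm that the dependencies are visible from Python. The following is a minimal sketch using only the standard library. The helper name check_dependencies is hypothetical, and the check only tests whether the packages are importable and whether a java executable is on PATH, not whether the versions are suitable.

```python
import importlib.util
import shutil


def check_dependencies():
    """Report whether the packages and tools discussed above appear to be available."""
    return {
        # find_spec returns None when a top-level package cannot be imported
        "merlion": importlib.util.find_spec("merlion") is not None,
        "lightgbm": importlib.util.find_spec("lightgbm") is not None,
        # shutil.which returns None when no `java` executable is on PATH
        "java": shutil.which("java") is not None,
    }


status = check_dependencies()
for name, available in status.items():
    print(f"{name}: {'found' if available else 'missing'}")
```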
Here we walk through a practical example using a time series from Merlion's built-in dataset loaders.
Begin by loading a time series with the data loader. An anomaly is an outlier: a data point that does not fit the distribution of the rest of the data.
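To make the idea of an outlier concrete, here is a minimal sketch that flags points lying far from the mean, measured in standard deviations (a z-score rule). This is a textbook technique used only for illustration and is not how Merlion's detectors work. The function name zscore_outliers, the threshold of 2.5, and the sample readings are assumptions.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.5):
    """Return the indices of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings with one obvious spike at index 7
readings = [10, 11, 9, 10, 12, 10, 11, 50, 10, 9]
outlier_indices = zscore_outliers(readings)
print(outlier_indices)
```

A simple rule like this is sensitive to the threshold, because a large spike also inflates the standard deviation it is compared against. That is one motivation for using purpose-built detectors instead.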
from merlion.utils import TimeSeries
from ts_datasets.anomaly import NAB

time_series, metadata = NAB(subset="realKnownCause")[3]
train_data = TimeSeries.from_pd(time_series[metadata.trainval])
test_data = TimeSeries.from_pd(time_series[~metadata.trainval])
test_labels = TimeSeries.from_pd(metadata.anomaly[~metadata.trainval])
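In the code above, metadata.trainval acts as a boolean mask: rows where it is true go to the training set, and ~ inverts the mask to select the test rows. The plain-Python sketch below mimics that split with lists so the idea is visible without pandas. The helper name split_by_mask and the sample values are hypothetical.

```python
def split_by_mask(series, mask):
    """Split a series into (train, test) using a boolean mask, where True means train."""
    if len(series) != len(mask):
        raise ValueError("series and mask must have the same length")
    train = [v for v, keep in zip(series, mask) if keep]
    test = [v for v, keep in zip(series, mask) if not keep]  # the inverted mask
    return train, test


# Hypothetical values and mask: the first four points are used for training
values = [3.1, 2.9, 3.4, 3.0, 9.7, 3.2]
trainval = [True, True, True, True, False, False]

train_part, test_part = split_by_mask(values, trainval)
print("train:", train_part)
print("test: ", test_part)
```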
Before detecting anomalies, we initialize a default detector model and train it on the training data. We then ask it for anomaly labels on the test data.
from merlion.models.defaults import DefaultDetectorConfig, DefaultDetector

model = DefaultDetector(DefaultDetectorConfig())
model.train(train_data=train_data)
test_pred = model.get_anomaly_label(time_series=test_data)
Next we call Merlion's plotting functions. The plot shows the test series alongside the model's anomaly predictions, and plot_anoms overlays the ground-truth anomaly labels so that we can compare the two visually.
from merlion.plot import plot_anoms
import matplotlib.pyplot as plt

fig, ax = model.plot_anomaly(time_series=test_data)
plot_anoms(ax=ax, anomaly_labels=test_labels)
plt.show()
Merlion also provides model evaluation metrics. Here we compute the precision, recall, F1 score, and mean time to detect for our model.
from merlion.evaluate.anomaly import TSADMetric

p = TSADMetric.Precision.value(ground_truth=test_labels, predict=test_pred)
r = TSADMetric.Recall.value(ground_truth=test_labels, predict=test_pred)
f1 = TSADMetric.F1.value(ground_truth=test_labels, predict=test_pred)
mttd = TSADMetric.MeanTimeToDetect.value(ground_truth=test_labels, predict=test_pred)
print(f"Precision: {p:.4f}, Recall: {r:.4f}, F1: {f1:.4f}\n"
      f"Mean Time To Detect: {mttd}")
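For readers unfamiliar with these metrics, the sketch below computes precision, recall, and F1 from their plain pointwise definitions, using the standard library only. The function name pointwise_scores and the sample labels are hypothetical. Merlion's TSADMetric values may be computed differently (for example, by treating a contiguous anomaly window as a single event), so the numbers are not expected to match.

```python
def pointwise_scores(ground_truth, predicted):
    """Return (precision, recall, f1) for two equally long sequences of 0/1 labels."""
    if len(ground_truth) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(ground_truth, predicted))
    tp = sum(1 for g, p in pairs if g and p)      # anomalies correctly flagged
    fp = sum(1 for g, p in pairs if p and not g)  # false alarms
    fn = sum(1 for g, p in pairs if g and not p)  # missed anomalies
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels: 1 marks an anomalous point
labels = [0, 0, 1, 1, 0, 0, 1, 0]
predictions = [0, 1, 1, 0, 0, 0, 1, 0]

precision, recall, f1_score = pointwise_scores(labels, predictions)
print(f"Precision: {precision:.4f}, Recall: {recall:.4f}, F1: {f1_score:.4f}")
```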
The last feature we cover is forecasting, one of Merlion's core tasks. We can forecast future values for a given set of time stamps. Here we initialize a default forecaster with its default configuration, train it, and forecast over the time stamps of the test data.
from merlion.utils import TimeSeries
from ts_datasets.forecast import M4

# Data loader returns pandas DataFrames, which we convert to Merlion TimeSeries
time_series, metadata = M4(subset="Hourly")[0]
train_data = TimeSeries.from_pd(time_series[metadata.trainval])
test_data = TimeSeries.from_pd(time_series[~metadata.trainval])

from merlion.models.defaults import DefaultForecasterConfig, DefaultForecaster

model = DefaultForecaster(DefaultForecasterConfig())
model.train(train_data=train_data)
test_pred, test_err = model.forecast(time_stamps=test_data.time_stamps)
After training the model, we plot its forecast against the test data, including the forecast uncertainty.
import matplotlib.pyplot as plt

fig, ax = model.plot_forecast(time_series=test_data, plot_forecast_uncertainty=True)
plt.show()
Below is a summary of a typical forecasting workflow in Merlion. The example above covers the first four steps.
Initialize a forecasting model (including ensembles and automatic model selectors)
Train the model
Produce a forecast with the model
Visualize the model's predictions
Quantitatively evaluate the model
Save and load a trained model
Simulate the live deployment of a model
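The "quantitatively evaluate the model" step usually means comparing the forecast with held-out values using an error metric. As a minimal sketch under stated assumptions, the code below computes two common metrics, RMSE and sMAPE, in plain Python. The function names and sample numbers are hypothetical, and the sMAPE shown is one common definition, scaled to the range 0 to 200. Merlion provides its own forecast metrics, which should be preferred in practice.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def smape(actual, forecast):
    """Symmetric mean absolute percentage error, on a 0 to 200 scale."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    terms = []
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        # Treat a point where both values are zero as a perfect prediction
        terms.append(0.0 if denom == 0 else abs(a - f) / denom)
    return 200.0 * sum(terms) / len(terms)


# Hypothetical held-out values and the corresponding forecast
actual = [100.0, 110.0, 120.0, 130.0]
forecast = [102.0, 108.0, 123.0, 126.0]

rmse_value = rmse(actual, forecast)
smape_value = smape(actual, forecast)
print(f"RMSE: {rmse_value:.4f}, sMAPE: {smape_value:.4f}")
```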