Hands-On Guide To Darts – A Python Tool For Time Series Forecasting

In this article, we will learn about Darts and apply it to a time-series dataset.

Data collected over a period of time is called time-series data. The data points are usually recorded at regular intervals and have some correlation with the target. Many datasets, such as sales records or stock prices, contain date, month or day columns that are important for making predictions. The problem is how to turn such time-series data into a format a model can work with. The Python package Darts makes this process a lot simpler.

Introduction to Darts

For many datasets, forecasting the time-series columns plays an important role in decision making. Unit8 (unit8.co) developed a library called darts to make time-series forecasting easy. The idea was to make darts as simple to use for time series as sklearn is for other machine learning tasks. Darts aims to smooth the overall process of working with time series in machine learning.

The basic principles of darts are:

  1. There are two types of models in darts:

Regression models: these predict the output based on a set of input time-series.

Forecasting models: these predict a future output based on past values.

  2. Darts has a class called TimeSeries, which is immutable like strings.
  3. A TimeSeries can be either one-dimensional or multi-dimensional. Some models, such as neural networks, operate on multiple dimensions, while simpler models work with just one.
  4. Methods like fit() and predict() are unified across all models, from neural networks to ARIMA.
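To see why a shared fit()/predict() interface is convenient, here is a minimal sketch in plain Python. The two classes below are hypothetical toy models written for this illustration; they are not part of darts and only mimic the calling convention described above.

```python
from statistics import mean


class NaiveLastValue:
    """Hypothetical toy model: repeats the last observed value."""

    def fit(self, series):
        self.last_ = series[-1]
        return self

    def predict(self, n):
        return [self.last_] * n


class NaiveMean:
    """Hypothetical toy model: repeats the mean of the training values."""

    def fit(self, series):
        self.mean_ = mean(series)
        return self

    def predict(self, n):
        return [self.mean_] * n


history = [10.0, 12.0, 14.0, 16.0]  # made-up training values

# Both models expose the same two methods, so the loop body never changes
# when a model is swapped for another.
forecasts = {
    type(model).__name__: model.fit(history).predict(3)
    for model in (NaiveLastValue(), NaiveMean())
}
print(forecasts)
# {'NaiveLastValue': [16.0, 16.0, 16.0], 'NaiveMean': [13.0, 13.0, 13.0]}
```

The same pattern is what lets us replace one darts model with another without touching the rest of the code.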

Implementation of darts on time-series data

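Darts is open-source and can be installed with pip. The package is published on PyPI as darts; the u8darts name used when this article was written is still available. To install darts use:

pip install darts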

Dataset

Next, choose a time-series dataset. I have selected the monthly beer production in Australia dataset. To download this click here. Let us now load the dataset and import the libraries needed.

from google.colab import drive

drive.mount('/content/gdrive/')

import pandas as pd

from darts import TimeSeries

beer_data = pd.read_csv('/content/gdrive/My Drive/beer.csv')

beer_data.head()

The dataset contains two columns: the month (with the year) and the beer production in that month.

Train-test split

Let us now use the TimeSeries class and split the data into train and test sets. The from_dataframe method builds a TimeSeries from the dataframe; we pass it the name of the time column and the name of the value column. Then, we will split the data at a point in time. The dataset has around 477 rows, so I chose to make the split at roughly the 275th time period (1978-10).

get_data = TimeSeries.from_dataframe(beer_data, 'Month', 'Monthly beer production')

traindata, testdata = get_data.split_before(pd.Timestamp('1978-10'))
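To make the split semantics concrete, here is a rough pandas-only sketch of the same idea. It assumes, as I understand the darts documentation, that split_before puts the split timestamp itself into the second (test) part. The 24 rows below are synthetic stand-ins for the beer data, not real production figures.

```python
import pandas as pd

# Synthetic stand-in for the beer dataset: 24 monthly rows, made-up values.
frame = pd.DataFrame({
    "Month": pd.date_range("1978-01-01", periods=24, freq="MS").strftime("%Y-%m"),
    "Monthly beer production": range(100, 124),
})

# Parse the month strings into timestamps and use them as the index.
frame["Month"] = pd.to_datetime(frame["Month"])
series = frame.set_index("Month")["Monthly beer production"]

split_point = pd.Timestamp("1978-10")

# Everything strictly before the split point is training data;
# the split point itself becomes the first test observation.
train = series[series.index < split_point]
test = series[series.index >= split_point]

print(len(train), len(test))
print(train.index[-1].date(), test.index[0].date())
```

Splitting by time rather than at random matters here: the model must only ever see values that come before the period it is asked to forecast.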

Modelling

Training a model is very simple with darts. Here, an exponential smoothing model is fitted to the data. As in sklearn, the fit() method is used to train the model on the dataset.

from darts.models import ExponentialSmoothing

beer_model = ExponentialSmoothing()

beer_model.fit(traindata)
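For intuition about what the model is doing, here is a minimal sketch of simple exponential smoothing in plain Python. This is only the most basic variant, written for illustration; to my knowledge the darts ExponentialSmoothing model is a richer Holt-Winters implementation that also handles trend and seasonality, so this is not the code darts runs. The function name, the alpha value and the sample numbers are all assumptions.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed level after each observation.

    level_t = alpha * value_t + (1 - alpha) * level_{t-1}
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]  # initialise the level with the first observation
    levels = [level]
    for value in values[1:]:
        # Blend the new observation with the previous level.
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels


observations = [10.0, 20.0, 30.0]  # made-up values
levels = simple_exponential_smoothing(observations, alpha=0.5)
print(levels)  # [10.0, 15.0, 22.5]

# In this basic variant, the forecast for every future step is the last level.
forecast = levels[-1]
```

A larger alpha makes the level react faster to recent observations, while a smaller alpha gives a smoother, slower-moving estimate.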

This completes the training part. Let us now make predictions and plot the graph.

prediction = beer_model.predict(len(testdata))

print("predicted", prediction[:5])

print("actual", testdata[:5])

import matplotlib.pyplot as plt

get_data.plot(label='actual')

prediction.plot(label='predict', lw=3)

plt.legend()

Here the monthly values from 1978-10 onward are forecast by the exponential smoothing model. In the plot, the predictions track the actual series closely.
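A plot gives a visual impression, but an error metric puts a number on the accuracy. Below is a small sketch of the mean absolute percentage error (MAPE) in plain Python. The function and the sample values are illustrative assumptions, not results from the beer dataset. Darts ships its own metric helpers in the darts.metrics module (mape among them, to my knowledge), which work directly on TimeSeries objects.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


actual = [100.0, 200.0, 400.0]     # made-up observed values
predicted = [110.0, 180.0, 400.0]  # made-up forecasts

score = mape(actual, predicted)
print(round(score, 2))  # 6.67
```

A lower value means the forecast is, on average, closer to the actual series in relative terms.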

Darts can also be used with neural-network models and multivariate series.

Conclusion

In this article, we saw how to use the darts library to forecast a time series with just a few simple lines of code. The library is quick to use and saves time compared with building the same workflow by hand on top of Pandas. It also offers backtesting, regression models and even automatic model selection. It is a great way to handle time-series datasets.
