Release of Greykite 1.0.0 (#129)
sayanpatra authored Jan 16, 2024
1 parent a1cb0cd commit dac9237
Showing 99 changed files with 15,294 additions and 328 deletions.
8 changes: 4 additions & 4 deletions AUTHORS.rst
@@ -16,12 +16,12 @@ Authors
* Sayan Patra
* Yi Su
* Rachit Arora
* Brian Vegetabile
* Qiang Fei
* Phil Gaudreau
* Yi-Wei Liu

Other Contributors
------------------
* Qiang Fei
* Saad Eddin Al Orjany
* Rachit Kumar
* Phil Gaudreau
* Yi-Wei Liu
* Katherine Li
38 changes: 35 additions & 3 deletions HISTORY.rst
@@ -2,10 +2,41 @@
History
=======

1.0.0 (2024-01-07)
------------------

* Greykite AD (Anomaly Detection) is now available.
It improves upon the out-of-the-box confidence intervals generated by Silverkite by automatically tuning the confidence intervals
and other filters (e.g. based on ``Absolute Percentage Error (APE)``) using expected alert rate information and/or anomaly labels, if available.
It allows users to define a robust objective function, constraints and a parameter space to optimize the confidence intervals.
For example, users can target a minimum recall of 80% while maximizing precision. Additionally, users can specify a
minimum error level to filter out anomalies that are not business relevant. The motivation for including criteria other than
statistical significance is to build material/business impact into the detection.

* @Reza Hosseini: Devised the core anomaly detection library structure. Added base ``Detector`` module.
* @Reza Hosseini: Added `~greykite.detection.detector.reward.Reward` that allows users to specify and optimize robust anomaly detection objectives.
* @Sayan Patra: Added ``GreykiteDetector`` module that builds anomaly detection based on Greykite forecasting.
* @Sayan Patra: Added tutorials for Greykite anomaly detection.
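A minimal sketch of the kind of constrained objective described above (maximize precision subject to recall of at least 80%), assuming a symmetric interval of `forecast +/- width * sigma` and a small grid of candidate widths. The helper names are hypothetical; this is not Greykite's `Reward` class or its actual tuning procedure.

```python
import numpy as np


def precision_recall(pred, labels):
    """Precision and recall of boolean predictions against boolean labels."""
    pred = np.asarray(pred, dtype=bool)
    labels = np.asarray(labels, dtype=bool)
    tp = np.sum(pred & labels)
    precision = tp / pred.sum() if pred.sum() > 0 else 0.0
    recall = tp / labels.sum() if labels.sum() > 0 else 0.0
    return float(precision), float(recall)


def tune_interval_width(actual, forecast, sigma, labels, widths, min_recall=0.8):
    """Hypothetical helper: choose the width multiplier that maximizes precision
    subject to recall >= min_recall. Returns (width, precision, recall) or None."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    best = None
    for w in widths:
        # A point is flagged when it falls outside forecast +/- w * sigma.
        pred = np.abs(actual - forecast) > w * sigma
        precision, recall = precision_recall(pred, labels)
        if recall < min_recall:
            continue  # constraint violated, skip this candidate
        if best is None or precision > best[1]:
            best = (w, precision, recall)
    return best


actual = [10, 11, 30, 10, 9, 2, 10, 12]
forecast = [10] * 8
labels = [False, False, True, False, False, True, False, False]
best = tune_interval_width(actual, forecast, sigma=1.0, labels=labels, widths=[0.5, 1.5, 5, 10])
print(best)  # (5, 1.0, 1.0): width 10 is narrower in alerts but drops recall to 0.5
```

Treating recall as a hard constraint and precision as the quantity to maximize mirrors the example in the text; other objectives would only change the selection rule inside the loop.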

* New features and methods
* @Reza Hosseini: Added `~greykite.common.features.outlier.ZScoreOutlierDetector` and `~greykite.common.features.outlier.TukeyOutlierDetector`, two improved outlier detection modules.
* @Sayan Patra: Added `~greykite.detection.common.pickler.GreykitePickler`. This improves pickling for Greykite models and allows storing the model in a single file.
* @Yi-Wei Liu: Added ``DifferenceBasedOutlierTransformer`` that can identify outliers in the ``sklearn`` pipeline.
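The z-score and Tukey rules behind detectors of this kind can be sketched with plain NumPy. This shows the generic techniques only; the function names and the `z_cutoff` / `iqr_lambda` parameters are illustrative assumptions, not the signatures of `ZScoreOutlierDetector` or `TukeyOutlierDetector`.

```python
import numpy as np


def zscore_outliers(x, z_cutoff=3.0):
    """Flag points whose absolute z-score exceeds z_cutoff."""
    x = np.asarray(x, dtype=float)
    std = x.std()  # population standard deviation (ddof=0)
    if std == 0:
        return np.zeros(x.shape, dtype=bool)
    return np.abs(x - x.mean()) / std > z_cutoff


def tukey_outliers(x, iqr_lambda=1.5):
    """Flag points outside Tukey's fences [Q1 - k * IQR, Q3 + k * IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - iqr_lambda * iqr) | (x > q3 + iqr_lambda * iqr)


x = [10, 11, 9, 10, 12, 10, 11, 100]
print(tukey_outliers(x))        # only the last point is flagged
print(zscore_outliers(x))       # nothing is flagged: the spike inflates the std
print(zscore_outliers(x, 2.5))  # a looser cutoff flags the spike
```

With only eight points, the spike inflates the standard deviation enough that its own z-score stays below 3, while the quartile-based Tukey fences still catch it. This masking effect is one reason to offer both rules.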

* Library enhancements
* @Kaixu Yang: Added ``scipy`` solver to make quantile regression more stable.
* @Qiang Fei: Updated ``auto_holiday`` functionality to use holiday groupers for improved forecast performance in holiday periods.
* @Katherine Li: Improved changepoint detection method that can identify level shifts.
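Linear quantile regression minimizes the pinball loss, which can be written as a linear program by splitting each residual into nonnegative parts `u` and `v` (residual = `u - v`). A minimal sketch with `scipy.optimize.linprog`, assuming a small dense design matrix; this is a generic formulation, not Greykite's solver code or its parameters.

```python
import numpy as np
from scipy.optimize import linprog


def quantile_regression(X, y, tau=0.5):
    """Fit linear quantile regression at quantile tau by solving the pinball-loss LP."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Decision vector: [beta (p, free), u (n, >= 0), v (n, >= 0)].
    # Objective: sum(tau * u + (1 - tau) * v), the pinball loss of the residuals.
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    # Constraint: X @ beta + u - v = y, so y - X @ beta = u - v.
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


X = np.ones((5, 1))  # intercept-only design
y = [1.0, 2.0, 3.0, 4.0, 100.0]
print(quantile_regression(X, y, tau=0.5))  # approximately [3.], the sample median
```

An LP solver handles the non-differentiable loss exactly, which is one plausible reason a dedicated solver can be more stable than gradient-based fitting for this problem.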

* Bug fixes
* @Reza Hosseini @Sayan Patra @Yi Su @Qiang Fei @Kaixu Yang @Phil Gaudreau: Other library enhancements and bug fixes.


0.5.1 (2023-06-01)
------------------

Loosen dill requirements
Loosen dill package requirements.


0.5.0 (2023-04-03)
------------------
@@ -32,6 +63,7 @@ Python 3.10 support.
* @Yi Su, @Sayan Patra: Now ``train_end_date`` is always respected if specified by the user. Previously it was ignored if there were trailing NAs in the training data or if ``anomaly_df`` imputed the anomalous points to NA. Also, ``train_end_date`` now accepts a string value.
* @Yi Su: The seasonality order now takes `None` without raising an error. It will be treated the same as `False` or zero.


0.4.0 (2022-07-15)
------------------

@@ -41,7 +73,7 @@ Python 3.10 support.
* @Kaixu Yang: Auto model components. (1) seasonality inferrer (2) holiday inferrer (3) automatic growth.
* @Kaixu Yang: Lag-based estimator. Supports lag-based forecasts such as week-over-week.
* @Reza Hosseini: Fast simulation option. Provides better accuracy and speed for mean prediction when simulation is used in autoregression.
* @Kaixu Yang: Quantile regression option for Silverkite `fit_algorithm`.
* @Kaixu Yang: Quantile regression option for Silverkite ``fit_algorithm``.

* New model templates
* @Kaixu Yang: AUTO. Automatically chooses templates based on the data frequency, forecast horizon and evaluation configs.
@@ -55,7 +87,7 @@ Python 3.10 support.

* Library enhancements and bug fixes
* The SILVERKITE template has been updated to include automatic autoregression and changepoint detection.
* Renamed `SilverkiteMultistageEstimator` to `MultistageForecastEstimator`.
* Renamed ``SilverkiteMultistageEstimator`` to ``MultistageForecastEstimator``.
* Renamed the normalization method "min_max" to "zero_to_one".
* @Reza Hosseini: Added normalization methods: "minus_half_to_half", "zero_at_origin".
* @Albert Chen: Updated tutorials.
18 changes: 15 additions and 3 deletions README.rst
@@ -1,4 +1,4 @@
Greykite: A flexible, intuitive and fast forecasting library
Greykite: A flexible, intuitive and fast forecasting and anomaly detection library

.. raw:: html

@@ -21,6 +21,16 @@ evaluation, benchmarking, and plotting.
Other open source algorithms can be supported through Greykite’s interface to take advantage of this framework,
as listed below.

Greykite AD (Anomaly Detection) is an extension of the Greykite Forecasting library. It provides users with an interpretable,
fast, robust and easy-to-use interface to monitor their metrics with minimal effort.

Greykite AD improves upon the out-of-the-box confidence intervals generated by Silverkite by automatically tuning the confidence intervals
and other filters (e.g. based on ``APE``) using expected alert rate information and/or anomaly labels, if available.
It allows users to define a robust objective function, constraints and a parameter space to optimize the confidence intervals.
For example, users can target a minimum recall of 80% while maximizing precision. Additionally, users can specify a
minimum error level to filter out anomalies that are not business relevant. The motivation for including criteria other than
statistical significance is to build material/business impact into the detection.
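To illustrate the minimum-error filter, here is a sketch that flags a point only when it falls outside the interval and its APE reaches a user-chosen minimum. It assumes APE is computed as `|actual - forecast| / |actual|`; the function and its arguments are hypothetical, not Greykite's API.

```python
import numpy as np


def flag_anomalies(actual, forecast, lower, upper, min_ape=None):
    """Hypothetical helper: flag points outside [lower, upper], optionally
    keeping only those whose APE is at least min_ape."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    outside = (actual < np.asarray(lower, dtype=float)) | (actual > np.asarray(upper, dtype=float))
    if min_ape is None:
        return outside
    # APE is taken here as |actual - forecast| / |actual|.
    # A zero actual yields inf (kept) or nan (dropped, since nan >= x is False).
    with np.errstate(divide="ignore", invalid="ignore"):
        ape = np.abs(actual - forecast) / np.abs(actual)
    return outside & (ape >= min_ape)


actual = [100.0, 130.0, 96.0, 104.5]
forecast = [100.0] * 4
lower, upper = [95.0] * 4, [104.0] * 4
print(flag_anomalies(actual, forecast, lower, upper))                # flags 130.0 and 104.5
print(flag_anomalies(actual, forecast, lower, upper, min_ape=0.05))  # flags only 130.0
```

Here 104.5 is statistically outside the interval but only about 4.3% off the forecast, so a 5% minimum error drops it as not business relevant.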

For a demo, please see our `quickstart <https://linkedin.github.io/greykite/get_started>`_.

Distinguishing Features
@@ -47,7 +57,8 @@ Distinguishing Features

Algorithms currently supported within Greykite’s modeling framework:

* Silverkite (Greykite’s flagship algorithm)
* Silverkite (Greykite’s flagship forecasting algorithm)
* Greykite Anomaly Detection (Greykite's flagship anomaly detection algorithm)
* `Facebook Prophet <https://facebook.github.io/prophet/>`_
* `Auto Arima <https://alkaline-ml.com/pmdarima/>`_

@@ -62,6 +73,7 @@ libraries or even outside the forecasting context.
* SimpleSilverkiteForecast() - Silverkite algorithm with `forecast_simple` and `predict` methods.
* SilverkiteForecast() - low-level interface to Silverkite algorithm with `forecast` and `predict` methods.
* ReconcileAdditiveForecasts() - adjust a set of forecasts to satisfy inter-forecast additivity constraints.
* GreykiteDetector() - simple interface for optimizing anomaly detection performance based on Greykite forecasts.

Usage Examples
--------------
@@ -164,4 +176,4 @@ License
-------

Copyright (c) LinkedIn Corporation. All rights reserved. Licensed under the
`BSD 2-Clause <https://opensource.org/licenses/BSD-2-Clause>`_ License.
`BSD 2-Clause <https://opensource.org/licenses/BSD-2-Clause>`_ License.
16 changes: 14 additions & 2 deletions README_PYPI.rst
@@ -1,4 +1,4 @@
Greykite: A flexible, intuitive and fast forecasting library
Greykite: A flexible, intuitive and fast forecasting and anomaly detection library

.. image:: https://raw.githubusercontent.com/linkedin/greykite/master/LOGO-C8.png
:height: 300px
@@ -22,6 +22,16 @@ evaluation, benchmarking, and plotting.
Other open source algorithms can be supported through Greykite’s interface to take advantage of this framework,
as listed below.

Greykite AD (Anomaly Detection) is an extension of the Greykite Forecasting library. It provides users with an interpretable,
fast, robust and easy-to-use interface to monitor their metrics with minimal effort.

Greykite AD improves upon the out-of-the-box confidence intervals generated by Silverkite by automatically tuning the confidence intervals
and other filters (e.g. based on ``APE``) using expected alert rate information and/or anomaly labels, if available.
It allows users to define a robust objective function, constraints and a parameter space to optimize the confidence intervals.
For example, users can target a minimum recall of 80% while maximizing precision. Additionally, users can specify a
minimum error level to filter out anomalies that are not business relevant. The motivation for including criteria other than
statistical significance is to build material/business impact into the detection.
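The description also mentions tuning from expected alert rate information. In its simplest form, one could pick the absolute-residual threshold as an empirical quantile so that roughly the expected fraction of historical points would be flagged. The sketch below assumes residuals from an already fitted forecast and uses a hypothetical helper name; it is not how Greykite AD performs its tuning, which this text does not detail.

```python
import numpy as np


def threshold_for_alert_rate(residuals, alert_rate):
    """Hypothetical helper: absolute-residual threshold that flags roughly
    `alert_rate` of the given points (empirical (1 - alert_rate) quantile)."""
    abs_res = np.abs(np.asarray(residuals, dtype=float))
    return float(np.quantile(abs_res, 1.0 - alert_rate))


residuals = np.arange(1, 101)  # illustrative residuals 1, 2, ..., 100
threshold = threshold_for_alert_rate(residuals, alert_rate=0.05)
flagged = np.abs(residuals) > threshold
print(threshold, flagged.mean())  # threshold is about 95.05; 5% of points exceed it
```

Ties and small samples make the realized rate only approximate, so in practice the expected rate is better treated as a target or constraint than as an exact count.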

For a demo, please see our `quickstart <https://linkedin.github.io/greykite/get_started>`_.

Distinguishing Features
@@ -49,6 +59,7 @@ Distinguishing Features
Algorithms currently supported within Greykite’s modeling framework:

* Silverkite (Greykite’s flagship algorithm)
* Greykite Anomaly Detection (Greykite's flagship anomaly detection algorithm)
* `Facebook Prophet <https://facebook.github.io/prophet/>`_
* `Auto Arima <https://alkaline-ml.com/pmdarima/>`_

@@ -63,6 +74,7 @@ libraries or even outside the forecasting context.
* SimpleSilverkiteForecast() - Silverkite algorithm with `forecast_simple` and `predict` methods.
* SilverkiteForecast() - low-level interface to Silverkite algorithm with `forecast` and `predict` methods.
* ReconcileAdditiveForecasts() - adjust a set of forecasts to satisfy inter-forecast additivity constraints.
* GreykiteDetector() - simple interface for optimizing anomaly detection performance based on Greykite forecasts.

Usage Examples
--------------
@@ -165,4 +177,4 @@ License
-------

Copyright (c) LinkedIn Corporation. All rights reserved. Licensed under the
`BSD 2-Clause <https://opensource.org/licenses/BSD-2-Clause>`_ License.
`BSD 2-Clause <https://opensource.org/licenses/BSD-2-Clause>`_ License.
4 changes: 2 additions & 2 deletions docs/index.rst
@@ -12,11 +12,11 @@ Welcome to Greykite! View the contents to get started.
.. toctree::
:maxdepth: 1
:caption: Greykite Info
:caption: Overview
:hidden:
:glob:

pages/greykite/overview
pages/overview/*

.. toctree::
:maxdepth: 2
@@ -367,7 +367,8 @@
holiday_df = get_holidays(countries=["US"], year_start=year_start, year_end=year_end)["US"]

# Defines the number of pre / post days that a holiday has impact on.
# If not specified, (0, 0) will be used.
# If not specified, numbers specified by ``holiday_impact_pre_num_days`` and
# ``holiday_impact_post_num_days`` will be used.
holiday_impact_dict = {
"Christmas Day": (4, 3), # 12/25.
"Independence Day": (4, 4), # 7/4.
@@ -390,8 +391,10 @@
holiday_df=holiday_df,
holiday_date_col="date",
holiday_name_col="event_name",
holiday_impact_pre_num_days=0,
holiday_impact_post_num_days=0,
holiday_impact_dict=holiday_impact_dict,
get_suffix_func="dow_grouped"
get_suffix_func="wd_we"
)

# Runs holiday grouper using k-means with diagnostics.
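The updated tutorial comment says that holidays missing from ``holiday_impact_dict`` fall back to ``holiday_impact_pre_num_days`` and ``holiday_impact_post_num_days``. A minimal sketch of that fallback rule, using a hypothetical helper that is not part of Greykite, could look like this:

```python
from datetime import date, timedelta


def expand_holiday_windows(holidays, impact_dict, default_pre=0, default_post=0):
    """Hypothetical helper: map each (name, day) holiday to the dates it affects."""
    windows = {}
    for name, day in holidays:
        # Holidays missing from impact_dict fall back to the default (pre, post) window.
        pre, post = impact_dict.get(name, (default_pre, default_post))
        windows[name, day] = [day + timedelta(days=k) for k in range(-pre, post + 1)]
    return windows


holidays = [("Christmas Day", date(2023, 12, 25)), ("Labor Day", date(2023, 9, 4))]
impact = {"Christmas Day": (4, 3)}  # 4 days before, 3 days after
windows = expand_holiday_windows(holidays, impact)
print(len(windows["Christmas Day", date(2023, 12, 25)]))  # 8 days: 12/21 through 12/28
print(windows["Labor Day", date(2023, 9, 4)])  # only the holiday itself
```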
153 changes: 153 additions & 0 deletions docs/nbpages/quickstart/0200_simple_anomaly_detection.py
@@ -0,0 +1,153 @@
"""
Simple Anomaly Detection
========================
You can create and evaluate an anomaly detection model with just a few lines of code.
Provide your timeseries as a pandas dataframe with timestamp and value.
Optionally, you can also provide the anomaly labels as a column in the dataframe.
For example, to detect anomalies in daily sessions data, your dataframe could look like this:

.. code-block:: python

    import pandas as pd
    df = pd.DataFrame({
        "date": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
        "sessions": [10231.0, 12309.0, 12104.0],
        "is_anomaly": [False, True, False]
    })

The time column can be any format recognized by `pandas.to_datetime`.
In this example, we'll load a dataset representing ``log(daily page views)``
on the Wikipedia page for Peyton Manning.
It contains values from 2007-12-10 to 2016-01-20. More dataset info
`here <https://facebook.github.io/prophet/docs/quick_start.html>`_.
"""

import warnings

import plotly
from greykite.common.data_loader import DataLoader
from greykite.detection.detector.config import ADConfig
from greykite.detection.detector.data import DetectorData
from greykite.detection.detector.greykite import GreykiteDetector
from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.model_templates import ModelTemplateEnum

warnings.filterwarnings("ignore")

# Loads dataset into pandas DataFrame
dl = DataLoader()
df = dl.load_peyton_manning()

# specify dataset information
metadata = MetadataParam(
time_col="ts", # name of the time column ("date" in example above)
value_col="y", # name of the value column ("sessions" in example above)
freq="D" # "H" for hourly, "D" for daily, "W" for weekly, etc.
# Any format accepted by `pandas.date_range`
)

# %%
# Create an Anomaly Detection Model
# ---------------------------------
# Similar to forecasting, you need to provide a forecast config and an
# anomaly detection config. You can choose any of the available forecast model
# templates (see :doc:`/pages/stepbystep/0100_choose_model`).
#
# In this example, we choose the "AUTO" model template for the forecast config,
# and the default anomaly detection config.
# The Silverkite "AUTO" model template chooses the parameter configuration
# given the input data frequency, forecast horizon and evaluation configs.

forecast_config = ForecastConfig(
model_template=ModelTemplateEnum.AUTO.name,
forecast_horizon=7, # forecasts 7 steps ahead
coverage=None, # Confidence Interval will be tuned by the AD model
metadata_param=metadata)

ad_config = ADConfig() # Default anomaly detection config

detector = GreykiteDetector(
forecast_config=forecast_config,
ad_config=ad_config,
reward=None)

# %%
# Train the Anomaly Detection Model
# ---------------------------------
# You can train the anomaly detection model by calling the ``fit`` method.
# This method takes a ``DetectorData`` object as input.
# The ``DetectorData`` object contains the time series as a pandas dataframe.
# Optionally, you can also provide the anomaly labels as a column in the dataframe.
# The anomaly labels can also be provided as a list of boolean values.
# The anomaly labels are used to evaluate the model performance.

train_size = 2700  # number of initial rows used for training
df_train = df[:train_size].reset_index(drop=True)
train_data = DetectorData(df=df_train)
detector.fit(data=train_data)

# %%
# Predict with the Anomaly Detection Model
# ----------------------------------------
# You can predict anomalies by calling the ``predict`` method.

test_data = DetectorData(df=df)
test_data = detector.predict(test_data)

# %%
# Evaluate the Anomaly Detection Model
# ------------------------------------
# The outputs of the anomaly detection model are stored as attributes
# of the ``GreykiteDetector`` object.
# (The interactive plots are generated by ``plotly``: **click to zoom!**)


# %%
# Training
# ^^^^^^^^
# The ``fitted_df`` attribute contains the result on the training data.
# You can plot the result by calling the ``plot`` method with ``phase="train"``.
print(detector.fitted_df)

fig = detector.plot(
phase="train",
title="Greykite Detector Peyton Manning - fit phase")
plotly.io.show(fig)

# %%
# Prediction
# ^^^^^^^^^^
# The ``pred_df`` attribute contains the predicted result.
# You can plot the result by calling the ``plot`` method with ``phase="predict"``.

print(detector.pred_df)

fig = detector.plot(
phase="predict",
title="Greykite Detector Peyton Manning - predict phase")
plotly.io.show(fig)

# %%
# Model Summary
# ^^^^^^^^^^^^^^^^^
# Model summary allows inspection of individual model terms.
# Check parameter estimates and their significance for insights
# on how the model works and what can be further improved.
# You can call the ``summary`` method to see the model summary.
summary = detector.summary()
print(summary)

# %%
# What's next?
# ------------
# If you're satisfied with the anomaly detection performance, you're done!
#
# For a complete example of how to tune this anomaly detection model, see
# :doc:`/gallery/tutorials/0400_anomaly_detection_tutorial`.