- Paul Boniol, Inria, ENS, PSL University, CNRS
- Donato Tiano, Università degli Studi di Modena e Reggio Emilia
- Angela Bonifati, Lyon 1 University, IUF, Liris CNRS
- Themis Palpanas, Université Paris Cité, IUF
The easiest solution to install kGraph is to run the following command:
pip install kgraph-ts
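Graphviz and pyGraphviz can be used to obtain better visualisations of the graphs built by kGraph. First install Graphviz. On macOS (Homebrew):
brew install graphviz
On Debian/Ubuntu Linux:
sudo apt install graphviz
On Windows, stable install packages are listed here.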
Once Graphviz is installed, you can install pygraphviz as follows:
pip install pygraphviz
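Before plotting, it can help to confirm that both optional pieces are visible from Python. The sketch below uses only the standard library: it looks for the Graphviz dot executable on the PATH and checks whether the pygraphviz module can be located. The helper name graphviz_status is hypothetical and is not part of kGraph.

```python
import importlib.util
import shutil


def graphviz_status() -> dict:
    """Report whether the optional Graphviz tools are visible from this Python process.

    graphviz_status is a hypothetical helper for illustration; it is not part of kGraph.
    """
    return {
        # Graphviz ships the `dot` layout program; shutil.which searches PATH for it.
        "dot_on_path": shutil.which("dot") is not None,
        # find_spec returns None when the module cannot be located (nothing is imported).
        "pygraphviz_importable": importlib.util.find_spec("pygraphviz") is not None,
    }


if __name__ == "__main__":
    for name, ok in graphviz_status().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If pygraphviz_importable is False, the usual cause is that pip install pygraphviz was run in a different environment from the one running kGraph.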
You can also install kGraph manually. First create and activate the conda environment, then install the Python dependencies:
conda env create --file environment.yml
conda activate kgraph
pip install -r requirements.txt
You can then install kGraph locally with the following command:
pip install .
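In order to play with kGraph, you need a local copy of the UCR Archive (2018 version). The following example clusters the Trace dataset; set path to the location of your copy of the archive: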
import sys
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from sklearn.metrics import adjusted_rand_score
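# Make the helpers in the repository's utils/ folder importable
sys.path.insert(1, './utils/')
from utils import fetch_ucr_dataset
from kgraph import kGraph
# Location of your local copy of the UCR Archive (2018 version)
path = "/Path/to/UCRArchive_2018/"
data = fetch_ucr_dataset('Trace',path)
# Merge the train and test splits; the labels y are only used to set n_clusters and compute the ARI
X = np.concatenate([data['data_train'],data['data_test']],axis=0)
y = np.concatenate([data['target_train'],data['target_test']],axis=0)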
# Executing kGraph
clf = kGraph(n_clusters=len(set(y)),n_lengths=10,n_jobs=4)
clf.fit(X)
print("ARI score: ",adjusted_rand_score(clf.labels_,y))
Example output:
Running kGraph for the following length: [36, 72, 10, 45, 81, 18, 54, 90, 27, 63]
Graphs computation done! (36.71151804924011 s)
Consensus done! (0.03878021240234375 s)
Ensemble clustering done! (0.0060100555419921875 s)
ARI score: 0.986598879940902
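The ARI can compare clf.labels_ directly with the ground-truth y even though kGraph's cluster identifiers are arbitrary, because the score depends only on which series are grouped together, not on the label values themselves. To make that concrete, here is a from-scratch sketch of the pair-counting Adjusted Rand Index (the quantity sklearn.metrics.adjusted_rand_score computes), written with the standard library only; the function name ari is ours, not part of kGraph or scikit-learn.

```python
from collections import Counter
from math import comb


def ari(labels_a, labels_b):
    """Adjusted Rand Index via pair counting (Hubert & Arabie formulation)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: the partitions trivially agree

    # Contingency counts n_ij: samples in cluster i of A and cluster j of B
    contingency = Counter(zip(labels_a, labels_b))
    index = sum(comb(c, 2) for c in contingency.values())

    # Pairs placed together within each partition separately
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / total_pairs  # expected index under random labelling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both all-singletons or both a single cluster
    return (index - expected) / (max_index - expected)


# Renaming the clusters does not change the score:
print(ari([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
print(ari([0, 0, 1, 1], [0, 0, 1, 2]))  # 4/7, about 0.571
```

A score of 1 means identical partitions up to relabelling, while values near 0 (or below) mean agreement no better than chance.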
For datasets of variable-length time series, initialize kGraph with variable_length=True:
clf = kGraph(n_clusters=len(set(y)),variable_length=True,n_lengths=10,n_jobs=4)
We provide visualization methods to plot the graph and the identified clusters (i.e., graphoids). After running kGraph (i.e., calling fit), the graph and its graphoids can be plotted as follows:
clf.show_graphoids(group=True,save_fig=True,namefile='Trace_kgraph')
Instead of visualizing the graph, we can directly retrieve the most representative nodes for each cluster with the following code:
nb_patterns = 1

# Get the most representative nodes of each cluster
nodes = clf.interprete(nb_patterns=nb_patterns)

plt.figure(figsize=(10,4*nb_patterns))
count = 0
for j in range(nb_patterns):
    for i,node in enumerate(nodes.keys()):
        # Get the time series for the corresponding node
        mean,sup,inf = clf.get_node_ts(X=X,node=nodes[node][j][0])
        count += 1
        plt.subplot(nb_patterns,len(nodes.keys()),count)
        # Shade the band between the inf and sup curves, then draw the mean and both bounds
        plt.fill_between(x=list(range(int(clf.optimal_length))),y1=inf,y2=sup,alpha=0.2)
        plt.plot(mean,color='black')
        plt.plot(inf,color='black',alpha=0.6,linestyle='--')
        plt.plot(sup,color='black',alpha=0.6,linestyle='--')
        plt.title('node {} for cluster {}: \n (representativity: {:.3f} \n exclusivity: {:.3f})'.format(nodes[node][j][0],node,nodes[node][j][3],nodes[node][j][2]))
plt.tight_layout()
plt.savefig('Trace_cluster_interpretation.jpg')
plt.close()
You may find a script containing all the code above here.
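If you prefer a table to a figure, the output of interprete() can be flattened into rows. The sketch below assumes the structure implied by the plotting example: a dict mapping each cluster to a list of nb_patterns entries, where entry[0] is the node, entry[2] its exclusivity and entry[3] its representativity (entry[1] is not used there, so it is skipped). The helper patterns_to_rows is hypothetical, and the demo values are made up; if the entry layout differs in your kGraph version, adjust the indices.

```python
def patterns_to_rows(nodes):
    """Flatten a {cluster: [entry, ...]} mapping into a list of row dicts.

    Hypothetical helper: the entry positions are assumed from the plotting example
    (index 0 = node, 2 = exclusivity, 3 = representativity); not a documented API.
    """
    rows = []
    for cluster, patterns in nodes.items():
        for rank, entry in enumerate(patterns):
            rows.append({
                "cluster": cluster,
                "rank": rank,  # position in the list returned for this cluster
                "node": entry[0],
                "exclusivity": entry[2],
                "representativity": entry[3],
            })
    return rows


# With a fitted model (clf from the example above), one could run:
#   for row in patterns_to_rows(clf.interprete(nb_patterns=2)):
#       print(row)

# Self-contained demo with made-up values standing in for real kGraph output:
fake_nodes = {
    0: [("n1", None, 0.9, 0.8)],
    1: [("n7", None, 0.6, 0.7)],
}
for row in patterns_to_rows(fake_nodes):
    print(row)
```

Keeping the rows as plain dicts makes it easy to hand them to pandas.DataFrame later if a tabular view is needed.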