I am trying to apply the data preparation techniques discussed in this notebook:
https://colab.research.google.com/github/timeseriesAI/tsai/blob/master/tutorial_nbs/00_How_to_efficiently_work_with_very_large_numpy_arrays.ipynb
The np.memmap approach works, but the itemify approach does not, and I have found no other resources that give an example of it or discuss how to apply it. If someone could clarify how itemify connects to model training, I would greatly appreciate it.
Code used:
```python
import numpy as np
import pandas as pd
from tsai.all import *
from fastcore.foundation import L

# Example data preparation: 10 devices x 100 time steps, 2 features, binary target
df = pd.DataFrame({
    'device': np.repeat(np.arange(10), 100),
    'region': np.tile(np.repeat(['A', 'B'], 5), 100),
    'time': np.tile(np.arange(100), 10),
    'var_0': np.random.randn(1000),
    'var_1': np.random.randn(1000),
    'target': np.random.randint(0, 2, 1000)
})
```
```python
# Determine the total number of windows across all devices
window_len = 5
stride = 1
n_windows = 0
for device in df['device'].unique():
    n_device_windows = (len(df[df['device'] == device]) - window_len) // stride + 1
    n_windows += n_device_windows
```
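With 100 rows per device, `window_len=5` and `stride=1`, the formula in that loop gives (100 - 5) // 1 + 1 = 96 windows per device, i.e. 960 windows for the 10 devices. A small pure-Python check of the count against brute-force enumeration (the helper names are illustrative, not from tsai):

```python
def count_windows(n_rows: int, window_len: int, stride: int) -> int:
    """Number of complete windows of length window_len taken every `stride` rows."""
    if n_rows < window_len:
        return 0  # the bare formula can go negative for very short series
    return (n_rows - window_len) // stride + 1


def window_starts(n_rows: int, window_len: int, stride: int) -> list:
    """Brute force: every start index whose window fits inside the series."""
    return [s for s in range(0, n_rows, stride) if s + window_len <= n_rows]


per_device = count_windows(100, window_len=5, stride=1)
total = 10 * per_device
print(per_device, total)  # 96 960

# The closed form agrees with the enumeration on a few cases
for n_rows, w, s in [(100, 5, 1), (100, 5, 3), (7, 5, 1), (3, 5, 1)]:
    assert count_windows(n_rows, w, s) == len(window_starts(n_rows, w, s))
```

The guard matters only if some device has fewer rows than `window_len`; for example, 3 rows would give (3 - 5) // 1 + 1 = -1 without it.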
```python
# Apply SlidingWindowPanel to a small sample first to get the shape of the resulting arrays
sample_df = df.iloc[:window_len + 2].copy()  # copy to avoid SettingWithCopyWarning
sample_X, sample_y = SlidingWindowPanel(
    window_len=window_len,
    unique_id_cols=['device'],
    stride=stride,
    start=0,
    get_x=df.columns[3:5],  # var_0, var_1
    get_y=['target'],
    horizon=0,
    seq_first=True,
    sort_by=['time'],
    ascending=True
)(sample_df)

# Verify the shapes
print("Sample X shape:", sample_X.shape)
print("Sample y shape:", sample_y.shape)
```
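As a rough picture of what each window contains, the sketch below builds windows for a single, time-sorted device with NumPy's `sliding_window_view`, producing the (windows, variables, steps) layout that the `X_shape` comment below assumes. `windows_for_device` is a hypothetical helper, not part of tsai, and labelling each window with the target at its last step is an assumption about what `horizon=0` means.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def windows_for_device(x, target, window_len, stride=1):
    """Hypothetical helper. x: (n_steps, n_vars) time-sorted features for one device;
    target: (n_steps,). Returns X (n_windows, n_vars, window_len) and y (n_windows,)."""
    # Windowing along axis 0 appends the window axis last: (n_steps - window_len + 1, n_vars, window_len)
    X = sliding_window_view(x, window_len, axis=0)[::stride]
    # Assumption: each window is labelled with the target at its last time step
    y = target[window_len - 1::stride][: len(X)]
    return X, y


n_steps, n_vars = 7, 2  # same size as sample_df (window_len + 2 rows)
x = np.arange(n_steps * n_vars, dtype="float32").reshape(n_steps, n_vars)
target = np.array([0, 1, 0, 1, 1, 0, 1])

X, y = windows_for_device(x, target, window_len=5)
print(X.shape, y.shape)  # (3, 2, 5) (3,)
print(X[0, 0])           # [0. 2. 4. 6. 8.]  first variable over steps 0..4
print(y)                 # [1 0 1]
```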
```python
# Initialize memory-mapped files
X_shape = (n_windows, sample_X.shape[1], sample_X.shape[2])  # (n_windows, features, steps)
y_shape = (n_windows,)  # y is 1D: one label per window

import os

# Specify the full paths
X_memmap_path = os.path.abspath('C:/AIML/TSAI Study/X_data.memmap')
y_memmap_path = os.path.abspath('C:/AIML/TSAI Study/y_data.memmap')

# Remove any existing files to avoid conflicts
if os.path.exists(X_memmap_path):
    os.remove(X_memmap_path)
if os.path.exists(y_memmap_path):
    os.remove(y_memmap_path)

# Create memory-mapped files
X_memmap = np.memmap(X_memmap_path, dtype='float32', mode='w+', shape=X_shape)
y_memmap = np.memmap(y_memmap_path, dtype='float32', mode='w+', shape=y_shape)
```
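`mode='w+'` creates (or overwrites) the file and sizes it for the whole array up front, so the files are allocated at full size, zero-filled, before any window is written. A small check, with a temporary directory standing in for `C:/AIML/TSAI Study` and the 960 x 2 x 5 shape this example data should produce:

```python
import os
import tempfile

import numpy as np

X_shape = (960, 2, 5)  # 10 devices x 96 windows, 2 variables, 5 steps

with tempfile.TemporaryDirectory() as tmp:
    X_path = os.path.join(tmp, "X_data.memmap")
    X_mm = np.memmap(X_path, dtype="float32", mode="w+", shape=X_shape)
    # The file already holds the full array: 960 * 2 * 5 values * 4 bytes each
    file_bytes = os.path.getsize(X_path)
    expected_bytes = int(np.prod(X_shape)) * np.dtype("float32").itemsize
    initial_sum = float(X_mm.sum())  # freshly created file is zero-filled
    del X_mm  # release the mapping before the directory is removed (required on Windows)

print(file_bytes, expected_bytes, initial_sum)  # 38400 38400 0.0
```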
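```python
# Process the DataFrame in chunks and write to memory-mapped files.
# The chunk size is fixed at one device's 100 rows, so n_chunks is simply
# len(df) // chunk_size and no window straddles two devices.
chunk_size = 100
n_chunks = len(df) // chunk_size
window_idx = 0
for i in range(n_chunks):
    start_idx = i * chunk_size
    end_idx = min((i + 1) * chunk_size, len(df))
    df_chunk = df.iloc[start_idx:end_idx].copy()  # copy to avoid SettingWithCopyWarning
    # Window this chunk with the same settings as the sample call, then write
    # the result at the running offset in the memmaps
    X_chunk, y_chunk = SlidingWindowPanel(
        window_len=window_len, unique_id_cols=['device'], stride=stride, start=0,
        get_x=df.columns[3:5], get_y=['target'], horizon=0, seq_first=True,
        sort_by=['time'], ascending=True,
    )(df_chunk)
    n = len(X_chunk)
    X_memmap[window_idx:window_idx + n] = X_chunk
    y_memmap[window_idx:window_idx + n] = y_chunk
    window_idx += n
```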
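The chunk-and-offset pattern can be sketched without tsai: each device's rows are windowed with NumPy's `sliding_window_view` and written into a preallocated memmap at the running `window_idx`, so only one device's windows are in memory at a time. Random features stand in for `var_0`/`var_1`; this illustrates the pattern, not what `SlidingWindowPanel` does internally.

```python
import os
import tempfile

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

window_len, stride = 5, 1
n_devices, rows_per_device, n_vars = 10, 100, 2
n_windows = n_devices * ((rows_per_device - window_len) // stride + 1)  # 960

rng = np.random.default_rng(0)
features = rng.standard_normal((n_devices * rows_per_device, n_vars)).astype("float32")

with tempfile.TemporaryDirectory() as tmp:
    X_mm = np.memmap(os.path.join(tmp, "X.memmap"), dtype="float32",
                     mode="w+", shape=(n_windows, n_vars, window_len))
    window_idx = 0
    for i in range(n_devices):  # one chunk == one device's rows
        chunk = features[i * rows_per_device:(i + 1) * rows_per_device]
        # (rows, vars) -> (windows, vars, window_len)
        X_chunk = sliding_window_view(chunk, window_len, axis=0)[::stride]
        n = len(X_chunk)
        X_mm[window_idx:window_idx + n] = X_chunk  # write only this slice
        window_idx += n
    X_mm.flush()
    # Window 96 should be the first window of device 1, i.e. rows 100..104 transposed
    first_of_device_1 = np.array(X_mm[96])
    del X_mm  # release the mapping before the directory is removed

print(window_idx == n_windows)  # True
```

If `window_idx` does not end at `n_windows`, the precomputed window count and the per-chunk windowing disagree, and the tail of the memmap would silently stay zero.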
```python
# Flush changes to disk
X_memmap.flush()
y_memmap.flush()

# Read back the data using np.memmap (read-only)
X_memmap = np.memmap(X_memmap_path, dtype='float32', mode='r', shape=X_shape)
y_memmap = np.memmap(y_memmap_path, dtype='float32', mode='r', shape=y_shape)
```
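A `.memmap` file created this way stores only raw bytes with no header, which is why the read-back calls repeat `dtype='float32'` and `shape=...`. A small round trip in a temporary directory illustrates this, along with what happens when the dtype is wrong:

```python
import os
import tempfile

import numpy as np

shape = (4, 2, 5)
data = np.arange(np.prod(shape), dtype="float32").reshape(shape)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "X_data.memmap")
    w = np.memmap(path, dtype="float32", mode="w+", shape=shape)
    w[:] = data
    w.flush()  # push pending writes to the file
    del w      # close the writable mapping

    # dtype and shape must be supplied again when reopening
    r = np.memmap(path, dtype="float32", mode="r", shape=shape)
    round_trip_ok = bool(np.array_equal(r, data))
    read_only = not r.flags.writeable  # mode='r' maps the file read-only

    # The same 160 bytes read as float64 give 20 meaningless values, with no error
    wrong = np.memmap(path, dtype="float64", mode="r")
    wrong_len = wrong.size
    del r, wrong

print(round_trip_ok, read_only, wrong_len)  # True True 20
```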
```python
# Convert y_memmap to integers and then to strings (class labels)
y_memmap = y_memmap.astype(int).astype(str)

# Verify the shapes again
print("X_memmap shape:", X_memmap.shape)
print("y_memmap shape:", y_memmap.shape)

splits = get_splits(y_memmap, valid_size=0.2, stratify=True, random_state=42)
```
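`stratify=True` asks `get_splits` to keep the class proportions of `y_memmap` similar in the training and validation sets. A generic NumPy sketch of that idea, using made-up labels, shows what to expect with `valid_size=0.2`. `stratified_holdout` is a hypothetical helper, not tsai's implementation.

```python
import numpy as np


def stratified_holdout(y, valid_size=0.2, seed=42):
    """Hypothetical helper: put about valid_size of EACH class into the validation set."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    train_parts, valid_parts = [], []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)  # positions of this class
        rng.shuffle(idx)
        n_valid = int(round(len(idx) * valid_size))
        valid_parts.append(idx[:n_valid])
        train_parts.append(idx[n_valid:])
    return np.sort(np.concatenate(train_parts)), np.sort(np.concatenate(valid_parts))


# Made-up labels: 600 windows of class "0", 360 of class "1"
y = np.array(["0"] * 600 + ["1"] * 360)
train_idx, valid_idx = stratified_holdout(y, valid_size=0.2)
print(len(train_idx), len(valid_idx))  # 768 192
print((y[valid_idx] == "1").mean())    # 0.375, the same share as in y overall
```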
```python
# Create TSDatasets and TSDataLoaders
tfms = [None, [Categorize()]]
dsets = TSDatasets(X_memmap, y_memmap, tfms=tfms, splits=splits)
dls = TSDataLoaders.from_dsets(dsets.train, dsets.valid, bs=[64, 128], num_workers=0)

# Example of using TSAI with the DataLoaders
model = build_ts_model(InceptionTimePlus, dls=dls)
learn = Learner(dls, model, metrics=accuracy)
learn.fit_one_cycle(25, lr_max=1e-3)
```
Up to this point the code works: the data loads into the model and training runs.
The attempt to apply itemify below, however, fails, and I cannot find any helpful information about the error. There is no example of itemified data objects being turned into TSDatasets.
```python
# Use itemify to handle large np.memmap arrays efficiently
def itemify(*x): return L(*x).zip()

X_items = itemify(X_memmap)
y_items = itemify(y_memmap)
splits = get_splits(y_items, valid_size=0.2, stratify=True, random_state=42)

# Create TSDatasets and TSDataLoaders
tfms = [None, [Categorize()]]
dsets = TSDatasets(X_items, y_items, tfms=tfms, splits=splits)
```
```
Traceback (most recent call last):

  Cell In[117], line 1
    dsets = TSDatasets(X_items, y_items, tfms=tfms, splits=splits)

  File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\tsai\data\core.py:450 in __init__
    X = to3d(X)

  File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\tsai\utils.py:172 in to3d
    if isinstance(o, (np.ndarray, pd.DataFrame)): return to3darray(o)

  File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\tsai\utils.py:151 in to3darray
    assert False, f'Please, review input dimensions {o.ndim}'

AssertionError: Please, review input dimensions 4
```
```python
dls = TSDataLoaders.from_dsets(dsets.train, dsets.valid, bs=[64, 128], num_workers=0)

# Example of using TSAI with the DataLoaders
model = build_ts_model(InceptionTimePlus, dls=dls)
learn = Learner(dls, model, metrics=accuracy)
learn.fit_one_cycle(25, lr_max=1e-3)
```
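Judging from its definition, `itemify(*x) = L(*x).zip()` appears to be meant for X and y passed together, so that each item is an `(x_i, y_i)` pair. The sketch below imitates that with plain `zip` (fastcore is not imported), assuming `L(X, y).zip()` behaves like `zip(X, y)` and `L(X).zip()` like `zip(*X)`. Under those assumptions, `itemify(X_memmap)` on its own does not yield one item per sample, and either way the result is a list of tuples rather than the 3D array that `TSDatasets` validates with `to3d` in the traceback. `itemify_like` is a hypothetical stand-in.

```python
import numpy as np


def itemify_like(*arrays):
    """Hypothetical stand-in for `L(*x).zip()` with two or more arrays:
    one (x_i, y_i, ...) tuple per sample."""
    return list(zip(*arrays))


X = np.arange(3 * 2 * 5, dtype="float32").reshape(3, 2, 5)  # (samples, variables, steps)
y = np.array(["0", "1", "0"])

items = itemify_like(X, y)  # X and y paired in ONE call
x0, y0 = items[0]
print(len(items), x0.shape, y0)  # 3 (2, 5) 0
print(np.shares_memory(x0, X))   # True: items are views, nothing is copied

# With a single array, the zip runs over the rows of X instead (zip(*X)),
# grouping along the variables axis rather than giving one item per sample.
single = list(zip(*X))
print(len(single))               # 2, not 3
```

Because each `x_i` is a view, pairing a memmap this way does not load the data; the values are only read from disk when an item is actually used.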