I converted my TensorFlow models (.h5 format) to TensorFlow Lite, and also created pruned and quantized versions; the model is SqueezeNet. I compared the performance of the original, Lite, quantized, and pruned models in terms of validation accuracy, inference time, and model size. For model size, I used gzip for all models.
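For context, validation accuracy for the .tflite variants is measured roughly like this. This is a minimal sketch rather than my exact evaluation code; x_val and y_val are placeholders for my validation split.

import numpy as np
import tensorflow as tf

def tflite_accuracy(tflite_path, x_val, y_val):
    # Run the .tflite model over the validation set and count correct argmax predictions
    interpreter = tf.lite.Interpreter(model_path=str(tflite_path))
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]['index']
    output_index = interpreter.get_output_details()[0]['index']
    correct = 0
    for x, y in zip(x_val, y_val):
        # The interpreter expects a batch dimension and float32 input
        interpreter.set_tensor(input_index, x[np.newaxis, ...].astype(np.float32))
        interpreter.invoke()
        probs = interpreter.get_tensor(output_index)[0]
        correct += int(np.argmax(probs) == y)
    return correct / len(y_val)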
My Questions:
Inference Time:
I observe that: Original model > Lite model = Pruned model > Quantized model
Question 1: Why does the pruned model have the same inference time as the Lite model? I expected the pruned model to have a lower inference time than the Lite model.
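For reference, this is roughly how the inference times were measured. The snippet below is a minimal sketch rather than my exact benchmarking code; the repeat count and warm-up are placeholders, and for a model that needs Select TF ops the Flex delegate may have to be loaded as in the script further down. The Keras model is timed through predict(), the .tflite variants (Lite, pruned, quantized) through interpreter.invoke().

import time
import numpy as np
import tensorflow as tf

def time_keras_model(model, sample, repeats=100):
    # sample: one batched input matching the model's input shape
    model.predict(sample, verbose=0)  # warm-up so graph tracing is not timed
    start = time.perf_counter()
    for _ in range(repeats):
        model.predict(sample, verbose=0)
    return (time.perf_counter() - start) / repeats  # seconds per inference

def time_tflite_model(tflite_path, sample, repeats=100):
    interpreter = tf.lite.Interpreter(model_path=str(tflite_path))
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]['index']
    output_index = interpreter.get_output_details()[0]['index']
    sample = sample.astype(np.float32)
    interpreter.set_tensor(input_index, sample)
    interpreter.invoke()  # warm-up
    start = time.perf_counter()
    for _ in range(repeats):
        interpreter.set_tensor(input_index, sample)
        interpreter.invoke()
        _ = interpreter.get_tensor(output_index)
    return (time.perf_counter() - start) / repeats  # seconds per inference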
Model Size:
I observe that: Original model = Lite model > Pruned model > Quantized model
Question 2: Why does the Lite model have the same size as the original model? I expected it to be smaller than the original.
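For reference, the size comparison is done essentially like this. This is a minimal sketch with placeholder file names (my actual paths differ): each saved model file is compressed with DEFLATE (gzip-style) and the compressed sizes are compared in bytes.

import os
import tempfile
import zipfile

def gzipped_size(path):
    # Write the file into a DEFLATE-compressed zip archive and return the archive size in bytes
    _, zipped = tempfile.mkstemp('.zip')
    with zipfile.ZipFile(zipped, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(path)
    return os.path.getsize(zipped)

# Placeholder file names -- not my actual paths
for path in ['squeezenet_original.h5',
             'squeezenet_lite.tflite',
             'squeezenet_pruned.tflite',
             'squeezenet_quantized.tflite']:
    if os.path.exists(path):
        print(path, gzipped_size(path), 'bytes (compressed)')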
Update:
I set up the TensorFlow Lite Flex delegate (Select TF ops) to support operations not natively supported by the TFLite builtins. I have provided a minimal, non-working example below to illustrate my second question about model size: why does the Lite model have the same size as the original model? I expected it to be smaller than the original.
I am unsure how to resolve this issue. Interestingly, when I perform the same steps with a simple CNN model, everything works: the Lite model is smaller than the original and the code runs without error. However, it fails with this SqueezeNet example.
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Concatenate, Add, GlobalAveragePooling2D, Dropout
from tensorflow.keras import backend as K
import os
import tempfile
import zipfile
import pathlib
import numpy as np
from tensorflow.keras.utils import register_keras_serializable

# Function to print library versions
def print_library_versions():
    print(f"TensorFlow Version: {tf.__version__}")

# Print library versions
print_library_versions()

# Model definition functions
def SqueezeNetSimple(input_shape, num_classes, use_bypass=False, dropout_rate=None):
    input_img = Input(shape=input_shape)
    x = Conv2D(16, (3, 3), activation='relu', padding='same', name='conv1')(input_img)
    x = MaxPooling2D(pool_size=(2, 2), name='maxpool1')(x)
    x = create_fire_module(x, 8, name='fire2')
    x = create_fire_module(x, 8, name='fire3', use_bypass=use_bypass)
    if dropout_rate:
        x = Dropout(dropout_rate)(x)
    x = Conv2D(num_classes, (1, 1), activation='relu', padding='same', name='conv2')(x)
    x = GlobalAveragePooling2D(name='avgpool2')(x)
    return Model(inputs=input_img, outputs=x)

def create_fire_module(x, nb_squeeze_filter, name, use_bypass=False):
    nb_expand_filter = 4 * nb_squeeze_filter
    squeeze = Conv2D(nb_squeeze_filter, (1, 1), activation='relu', padding='same', name='%s_squeeze' % name)(x)
    expand_1x1 = Conv2D(nb_expand_filter, (1, 1), activation='relu', padding='same', name='%s_expand_1x1' % name)(squeeze)
    expand_3x3 = Conv2D(nb_expand_filter, (3, 3), activation='relu', padding='same', name='%s_expand_3x3' % name)(squeeze)
    axis = -1 if K.image_data_format() == 'channels_last' else 1
    x_ret = Concatenate(axis=axis, name='%s_concatenate' % name)([expand_1x1, expand_3x3])
    if use_bypass:
        x_ret = Add(name='%s_concatenate_bypass' % name)([x_ret, x])
    return x_ret

# Function to get gzipped model size
def get_gzipped_model_size(model_name):
    _, zipped_file = tempfile.mkstemp('.zip')
    with zipfile.ZipFile(zipped_file, 'w', compression=zipfile.ZIP_DEFLATED) as f:
        f.write(model_name)
    return os.path.getsize(zipped_file)  # Size in bytes

# Function to save TFLite model
def save_tflite_model(model, a):
    path_ = "../source"
    tflite_models_dir = pathlib.Path(path_)
    tflite_models_dir.mkdir(exist_ok=True, parents=True)
    namee = f"squeezenet_opportunity_{a}_sil.tflite"
    tflite_model_file = tflite_models_dir / namee
    tflite_model_file.write_bytes(model)
    print("Model saved: ", namee)
    return tflite_model_file

# Function to create and save TFLite model
def get_saved_model1(model_dir, a):
    converter = tf.lite.TFLiteConverter.from_saved_model(model_dir)
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS
    ]
    # Disable resource variables
    converter.experimental_enable_resource_variables = False
    # Enable debugging and detailed logging
    converter.experimental_new_converter = True
    converter.experimental_new_quantizer = True
    lite_model = converter.convert()
    model_path = save_tflite_model(lite_model, a)
    return model_path

# Example data (replace with your actual data)
x_train = np.random.rand(100, 30, 45, 1)
y_train = np.random.randint(0, 5, 100)  # Dummy labels for 5 classes
num_classes = 5

# Create the simplified model
model = SqueezeNetSimple(x_train.shape[1:], num_classes, use_bypass=True, dropout_rate=0.5)

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model with dummy data
model.fit(x_train, y_train, epochs=1, batch_size=16)  # Adjust epochs and batch_size as needed

# Save the model in TensorFlow SavedModel format
saved_model_path = '../models/Original/squeezenet/saved_squeezenet_simple'
tf.saved_model.save(model, saved_model_path)

# Convert the SavedModel to TFLite
model_path = get_saved_model1(saved_model_path, "m1_simple")
print("Model name: ", model_path)
print("Model size: ", os.path.getsize(model_path))
print("Model size gzip: ", get_gzipped_model_size(model_path))
print("The end")

# Load TFLite model and allocate tensors with Flex delegate
def load_tflite_model_with_flex(model_path, flex_delegate_path):
    try:
        # model_path may be a pathlib.Path; the Interpreter expects a string path
        interpreter = tf.lite.Interpreter(
            model_path=str(model_path),
            experimental_delegates=[tf.lite.experimental.load_delegate(flex_delegate_path)])
        interpreter.allocate_tensors()
        return interpreter
    except Exception as e:
        print(f"Failed to load delegate: {e}")
        return None

# Specify the correct path to the Flex delegate library
flex_delegate_path = 'bazel-bin/tensorflow/lite/delegates/flex/libtensorflowlite_flex_delegate.so'
if not os.path.exists(flex_delegate_path):
    raise FileNotFoundError(f"Flex delegate library not found at {flex_delegate_path}")

interpreter1 = load_tflite_model_with_flex(model_path, flex_delegate_path)
if interpreter1:
    # Test the model on random input data
    input_details = interpreter1.get_input_details()
    output_details = interpreter1.get_output_details()
    input_shape = input_details[0]['shape']
    input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
    interpreter1.set_tensor(input_details[0]['index'], input_data)
    interpreter1.invoke()
    output_data = interpreter1.get_tensor(output_details[0]['index'])
    print("Output from model 1: ", output_data)
    print("The end")
This is the output I got (I am using a MacBook Pro with an M3 Pro chip):
(tf_env) username@User-MBP source % python myscript.py
TensorFlow Version: 2.16.1
7/7 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - accuracy: 0.0782 - loss: 3.1644
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1719010068.685977 2618997 tf_tfl_flatbuffer_helpers.cc:390] Ignored output_format.
W0000 00:00:1719010068.686002 2618997 tf_tfl_flatbuffer_helpers.cc:393] Ignored drop_control_dependency.
2024-06-22 01:47:48.686209: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: ../models/Original/squeezenet/saved_squeezenet_simple
2024-06-22 01:47:48.687045: I tensorflow/cc/saved_model/reader.cc:51] Reading meta graph with tags { serve }
2024-06-22 01:47:48.687053: I tensorflow/cc/saved_model/reader.cc:146] Reading SavedModel debug info (if present) from: ../models/Original/squeezenet/saved_squeezenet_simple
2024-06-22 01:47:48.696450: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-06-22 01:47:48.697860: I tensorflow/cc/saved_model/loader.cc:234] Restoring SavedModel bundle.
2024-06-22 01:47:48.735235: I tensorflow/cc/saved_model/loader.cc:218] Running initialization op on SavedModel bundle at path: ../models/Original/squeezenet/saved_squeezenet_simple
2024-06-22 01:47:48.746486: I tensorflow/cc/saved_model/loader.cc:317] SavedModel load for tags { serve }; Status: success: OK. Took 60277 microseconds.
2024-06-22 01:47:48.756733: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY to enable.
loc(fused["ReadVariableOp:", "functional_1_1/conv1_1/Reshape/ReadVariableOp@__inference_serving_default_3237"]): error: missing attribute 'value'
LLVM ERROR: Failed to infer result type(s).
zsh: abort python myscript.py