openvinotoolkit · alexsu52 · Oct 16, 2024 · Oct 17, 2024 · Oct 18, 2024 · Oct 21, 2024
@@ -1,8 +1,7 @@
 datasets
-whowhatbench @ git+https://github.com/andreyanufr/who_what_benchmark.git
-numpy>=1.23.5
+whowhatbench @ git+https://github.com/openvinotoolkit/openvino.genai.git#subdirectory=tools/who_what_benchmark
+numpy>=1.23.5,<2
 openvino==2024.5
-optimum-intel[openvino]>=1.13.0
+optimum-intel>=1.13.0
 transformers>=4.35.2
 onnx==1.17.0
-numpy<2
@@ -1,16 +1,12 @@
 # Compress TinyLLama model using synthetic data
 
-This example demonstrates how to optimize Large Language Models (LLMs) using NNCF weight compression API & synthetic data for the advanced algorithms usage. The example applies 4/8-bit mixed-precision quantization & Scale Estimation algorithm to weights of Linear (Fully-connected) layers of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) model.
-To evaluate the accuracy of the compressed model we measure similarity between two texts generated by the baseline and compressed models using [WhoWhatBench](https://github.com/openvinotoolkit/openvino.genai/tree/master/llm_bench/python/who_what_benchmark) library.
+This example demonstrates how to optimize Large Language Models (LLMs) using NNCF weight compression API & synthetic data for the advanced algorithms usage. The example applies 4/8-bit mixed-precision quantization & Scale Estimation algorithm to weights of Linear (Fully-connected) layers of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) model. This leads to a significant decrease in model footprint and performance improvement with OpenVINO.
 
 The example includes the following steps:
 
-- Prepare `wikitext` dataset.
 - Prepare `TinyLlama/TinyLlama-1.1B-Chat-v1.0` text-generation model in OpenVINO representation using [Optimum-Intel](https://huggingface.co/docs/optimum/intel/inference).
-- Compress weights of the model with NNCF Weight compression algorithm with Scale Estimation & `wikitext` dataset.
 - Prepare `synthetic` dataset using `nncf.data.generate_text_data` method.
 - Compress weights of the model with NNCF Weight compression algorithm with Scale Estimation & `synthetic` dataset.
-- Measure the similarity of the two models optimized with different datasets.
 
 ## Install requirements
 

@@ -77,7 +77,6 @@ def main():
         scale_estimation=True,
     )
 
-    # Verify the model output in comparison to floating-point one
     input_ids = tokenizer("What is Python? ", return_tensors="pt").to(device=hf_model.device)
     max_new_tokens = 100
 

@@ -1,8 +1,6 @@
 torch==2.5.1
-datasets==3.0.1
-numpy>=1.23.5
+numpy>=1.23.5,<2
 openvino==2024.5
-optimum-intel[openvino]>=1.13.0
+optimum-intel>=1.13.0
 transformers>=4.35.2
 onnx==1.17.0
-numpy<2
@@ -9,13 +9,13 @@ pytest-forked
 
 librosa==0.10.0
 memory-profiler==0.61.0
-optimum-intel==1.15.2
-optimum==1.17.1
+optimum-intel==1.20.1
+optimum==1.23.3
 scikit-learn>=1.2.2,<=1.5.0
 soundfile==0.12.1
 tensorboard==2.13.0
 tensorflow-io==0.32.0
 timm==0.9.2
-transformers==4.38.2
-whowhatbench @ git+https://github.com/andreyanufr/who_what_benchmark@456d3584ce628f6c8605f37cd9a3ab2db1ebf933
+transformers==4.46.3
+whowhatbench @ git+https://github.com/openvinotoolkit/openvino.genai.git@7d8912ff9df9bcfacf0044d108963cb7618bff69#subdirectory=tools/who_what_benchmark
 datasets==2.21.0
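
Note on the requirements changes above: the example requirements files previously listed `numpy` twice (`numpy>=1.23.5` and, further down, `numpy<2`), and this change folds them into a single `numpy>=1.23.5,<2` entry. A minimal sketch of that consolidation, using only the standard library, might look like the following. The helper name `merge_duplicate_requirements` is hypothetical and is not part of NNCF, pip, or any tool used in this PR. It only handles plain `name<op>version` lines; entries with extras, environment markers, or direct references such as `whowhatbench @ git+https://...` are passed through unchanged.

```python
import re

# Matches simple requirement lines such as "numpy>=1.23.5" or "numpy<2".
# Illustrative only: real tooling should parse requirements with the
# `packaging` library rather than a regular expression.
_SIMPLE_REQ = re.compile(r"^([A-Za-z0-9_.-]+)\s*([<>=!~][^@\s]*)$")


def merge_duplicate_requirements(lines):
    """Merge repeated simple requirements into one line, keeping first-seen order."""
    specs_by_name = {}  # package name -> list of version specifiers
    slots = []          # output order: ("pkg", name) or ("raw", line)

    for line in lines:
        stripped = line.strip()
        match = _SIMPLE_REQ.match(stripped)
        if match is None:
            # Bare names, direct references, comments: keep as-is.
            slots.append(("raw", stripped))
            continue

        name, spec = match.group(1), match.group(2)
        if name not in specs_by_name:
            specs_by_name[name] = []
            slots.append(("pkg", name))
        for part in spec.split(","):
            if part and part not in specs_by_name[name]:
                specs_by_name[name].append(part)

    result = []
    for kind, value in slots:
        if kind == "raw":
            result.append(value)
        else:
            result.append(value + ",".join(specs_by_name[value]))
    return result
```

Applied to the old requirements list, the two `numpy` lines collapse into one entry at the position of the first occurrence, which matches the shape of the `+numpy>=1.23.5,<2` line in the diff. The sketch does not check whether the combined specifiers are satisfiable; it only concatenates them.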