Skip to content

Releases: erfanzar/EasyDeL

EasyDeL version 0.0.80

04 Dec 15:05
Compare
Choose a tag to compare

EasyDeL 0.0.80 brings enhanced flexibility, expanded model support, and improved performance with the introduction of vInference and optimized GPU/TPU integration. This version offers a significant speed and performance boost, with benchmarks showing improvements of over 4.9%, making EasyDeL more dynamic and easier to work with.

New Features:

  • Platform and Backend Flexibility: Users can now specify the platform (e.g., TRITON) and backend (e.g., GPU) to optimize their workflows.
    Expanded Model Support: We have added support for new models including olmo2, qwen2_moe, mamba2, and others, enhancing the tool's versatility.
  • Enhanced Trainers: Trainers are now more customizable and hackable, providing greater flexibility for project-specific needs.
  • New Trainer Types: Introduced sequence-to-sequence trainers and sequence classification trainers to support a wider range of training tasks.
  • vInference Engine: A robust inference engine for LLMs with Long-Term Support (LTS), ensuring stability and reliability.
  • vInferenceApiServer: A backend for the inference engine that is fully compatible with OpenAI APIs, facilitating easy integration.
  • Optimized GPU Integration: Leverages custom, direct TRITON calls for improved GPU performance, speeding up processing times.
  • Dynamic Quantization Support: Added support for quantization types NF4, A8BIT, A8Q, and A4Q, enabling efficiency and scalability.

Performance Improvements:

  • EasyDeL 0.0.80 has been optimized for speed and performance, with benchmarks showing improvements of over 4.9% compared to previous versions.
  • The tool is now more dynamic and easier to work with, enhancing the overall user experience.

This release is a significant step forward in making EasyDeL a more powerful and flexible tool for machine learning tasks. We look forward to your feedback and continued support.

Documentation:

Comprehensive documentation is available at https://easydel.readthedocs.io/en/latest/

Example Usage:

Load any of the 40+ available models with EasyDeL:

 
sharding_axis_dims = (1, 1, 1, -1)  # sequence sharding for better inference and training
max_length = 2**15
pretrained_model_name_or_path = "AnyEasyModel"
dtype = jnp.float16
model, params = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path,
    input_shape=(len(jax.devices()), max_length),
    auto_shard_params=True,
    sharding_axis_dims=sharding_axis_dims,
    config_kwargs=EasyDeLBaseConfigDict(
        use_scan_mlp=False,
        attn_dtype=jnp.float16,
        freq_max_position_embeddings=max_length,
        mask_max_position_embeddings=max_length,
        attn_mechanism=ed.AttentionMechanisms.VANILLA,
        kv_cache_quantization_method=ed.EasyDeLQuantizationMethods.A8BIT,
        use_sharded_kv_caching=False,
        gradeint_checkpointing=ed.EasyDeLGradientCheckPointers.NONE,
    ),
    quantization_method=ed.EasyDeLQuantizationMethods.NF4,
    quantization_block_size=256,
    platform=ed.EasyDeLPlatforms.TRITON,
    partition_axis=ed.PartitionAxis(),
    param_dtype=dtype,
    dtype=dtype,
    precision=lax.Precision("fastest"),
)

This release marks a significant advancement in making EasyDeL a more powerful and flexible tool for machine learning tasks. We look forward to your feedback and continued support.

Note

This might be the last release of EasyDeL that incorporates HF/Flax modules. In future versions, EasyDeL will transition to its own base
modules and may adopt Equinox or Flax NNX, provided that NNX meets sufficient performance standards. Users are encouraged to
provide feedback on this direction.
This release represents a significant step forward in making EasyDeL a more powerful and flexible tool for machine learning tasks. We
look forward to your feedback and continued support.

EasyDeL version 0.0.69

04 Jul 15:21
Compare
Choose a tag to compare

This release brings significant scalability improvements, new models, bug fixes, and usability enhancements to EasyDeL.

Highlights:

  • Multi-host GPU Training: EasyDeL now scales seamlessly across multiple GPUs and hosts for demanding training workloads.
  • New Models: Expand your NLP arsenal with the addition of Gemma2, OLMo, and Aya models.
  • Improved KV Cache Quantization: Enjoy a substantial accuracy boost with enhanced KV cache quantization, achieving +21% accuracy compared to the previous version.
  • Simplified Model Management: Load and save pretrained models effortlessly using the new model.from_pretrained and model.save_pretrained methods.
  • Enhanced Generation Pipeline: The GenerationPipeLine now supports streaming token generation, ideal for real-time applications.
  • Introducing the ApiEngine: Leverage the power of the new ApiEngine and engine_client for seamless integration with your applications.

Other Changes:

  • Fixed GPU Flash Attention bugs for increased stability.
  • Updated required jax version to >=0.4.28 for optimal performance. Versions 0.4.29 or higher are recommended if available.
  • Streamlined the structure import process and resolved multi-host training issues.

Upgrade:

To upgrade to EasyDeL v0.0.69, use the following command:

pip install --upgrade easydel==0.0.69

EasyDeL - 0.0.67

02 Jun 14:24
d33d2e8
Compare
Choose a tag to compare
  • New Features

    • GenerationPipeLine was added for fast streaming and easy generation with JAX.
    • Using Int8Params instead of LinearBitKernel.
    • Better GPU support.
    • EasyDeLState is now better and supports more general options.
    • Trainers now support .save_pretrained(to_torch) and training logging.
    • EasyDeLState supports to_8bit.
    • All of the models support to_8bit for params.
    • imports are now 91x times faster in EasyDeL version 0.0.67.
  • Removed API

    • JAXServe is no longer available.
    • PyTorchServe is no longer available.
    • EasyServe is no longer available.
    • LinearBitKernel is no longer available.
    • EasyDeL partitioners are no longer available.
    • Llama/Mistral/Falcon/Mpt static convertors or transforms are no longer available.
  • Known Issues

    • Lora Kernel Sometimes Crash.
    • GenerationPipeLine has a compiling problem when the number of available devices is more than 4 and using 8_bit params.
    • Most of the features won't work for TPU-v3 and GPUs with compute capability lower than 7.5.
    • Kaggle session will crash after importing EasyDeL (Kaggle's latest environment is not stable it's not related to EasyDeL). (Fixed in EasyDeL version 0.0.67)

Pallas Fusion: GPU Turbocharged 🚀

16 May 09:33
Compare
Choose a tag to compare

EasyDeL version 0.0.65

  • New Features

    • Pallas Flash Attention on CPU/GPU/TPU via FJFormer and supports bias.
    • ORPO Trainer is added and now it's in your bag.
    • WebSocket Serve Engine.
    • Now EasyDeL is 30% faster on GPUs.
    • No JAX-Triton is now needed to run GPU kernels.
    • Now you can specify the backward kernel implementation for Pallas Attention.
    • now you have to import EasyDeL as easydel instead of EasyDel.
  • New Models

    • OpenELM model series are now present.
    • DeepseekV2 model series are now present.
  • Fixed Bugs

    • CUDNN FlashAttention Bugs are now fixed.
    • Llama3 Model 8Bit quantization of parameters had a lot of improvements.
    • Splash Attention bugs on TPUs are now fixed .
    • Dbrx Model Bugs are fixed.
    • DPOTrainer Bugs are Fixed (creating dataset).
  • Known Bugs

    • Splash Attention won't work on TPUv3.
    • Pallas Attention won't work on TPUv3.
    • You need to install flash_attn in order to convert HF DeepseekV2 to EasyDeL (bug in DeepseekV2 implementation from original authors).
    • Some Examples are out dated.

Full Changelog: 0.0.63...0.0.65

0.0.63

27 Apr 12:56
Compare
Choose a tag to compare

whats changed

  • Phi3 Model Added.
  • Dbrx Model Added.
  • Arctic Model Added.
  • Lora Fine-Tuning Bugs Fixed.
  • Vanilla Attention is Optimized.
  • Sharded Vanilla is the default attention mechanism now.

Full Changelog: 0.0.61...0.0.63

EasyDeL-0.0.61 Dynamic Changes

17 Apr 15:45
Compare
Choose a tag to compare

What's Changed

  • Add support for iterable dataset loading by @yhavinga in #138
  • SFTTrainer bugs are fixed.
  • Parameter quantization is now available for all of the models.
  • AutoEasyDeLModelForCausalLM now supports load_in_8bit.
  • Memory Management improved.
  • Gemma Models Generation Issue is now Fixed.
  • Trainers are now 2~8% faster.
  • Attention Operation is improved.
  • The Cohere Model is now present.
  • JAXServer is improved.
  • Due to recent changes a lot of examples of documentation have changed and will be changed soon.

Full Changelog: 0.0.60...0.0.61

EasyDeL Version 0.0.60

06 Apr 15:50
Compare
Choose a tag to compare

What's Changed

  • SFTTrainer is now available.
  • VideoCausalLanguageModelTrainer is now available.
  • New models such as Grok-1, Qwen2Moe, Mamba, Rwkv, and Whisper are available.
  • MoE models had some speed improvements.
  • Training Speed is now 18%~42% faster.
  • Normal Attention is now faster by 12%~30% #131 .
  • DPOTrainer Bugs Fixed.
  • CausalLanguageModelTrainer is now more customizable.
  • WANDB logging has improved.
  • Performace Mode is added to Training Arguments.
  • Model configs pass attributes to PretrainedConfig to prevent override… by @yhavinga in #122
  • Ignore token label smooth z loss by @yhavinga in #123
  • Time the whole train loop instead of only call to train step function by @yhavinga in #124
  • Add save_total_limit argument to delete older checkpoints by @yhavinga in #127
  • Add gradient norm logging, fix metric collection on multi-worker setup by @yhavinga in #135

Full Changelog: 0.0.55...0.0.60

EasyDeL Version 0.0.55

03 Mar 09:30
Compare
Choose a tag to compare

EasyDeL Version 0.0.55

  • JAX DPOTrainer Bugs Fixed
  • StableLM Models are supported with FlashAttention and RING-Attention
  • RingAttention is supported for Up to 512K or 1M token training and inference
  • chunk MLP Is Supported for Up to 512K or 1M token training and inference
  • now all the Models support shared key and value caching for high context length interface and can be accessed via use_sharded_kv_caching=True in model config (see examples).
  • EasyDeL successfully passed 1256000 Context Length Inference on TPUs (Llama Model Tested)
  • Vision Trainer is added, you might except some bugs from that.

Full Changelog: 0.0.50...0.0.55

0.0.50 Mixture of EasyDeL experts

08 Feb 11:40
Compare
Choose a tag to compare

What's Changed

  • Optimize mean loss and accuracy calculation by @yhavinga in #100
  • Mixtral Models are fully supported and they are PJIT-compatible
  • A Wider range of models now support FlashAttention on TPU
  • Qwen 1, Qwen 2, PHI 2, Robert is new Added Models which support FlashAttention on TPU and EasyBIT
  • LoRA support for the trainer is now Added (EasyDeLXRapTureConfig)
  • Adding EasyDel Serve Engine APIs
  • Adding Prompter (Beta and might be removed in future updates)
  • The Training Process is now 21 % Faster in 0.0.50 than 0.0.42.
  • Transform Functions are now Automated for all the models (Except MosaicMPT for this one you still have to use static methods)
  • The Trainer APIs have changed and now it's faster, more dynamic, and more hackable.
  • Default Version of the JAX now changed to 0.4.22 for FJFormer custom Pallas kernels usage.

New Contributors

Full Changelog: 0.0.42...0.0.50

Version 0.0.42 Easy State

11 Jan 12:56
Compare
Choose a tag to compare

New Features:

  • EasyDelState is added
  • Auto Convertors from torch > huggingface > jax > flax > EasyDel are added
  • Trainer has a lot of improvements

Full Changelog: 0.0.41...0.0.42