Few improvements to README

mlverse · Sep 10, 2024 · cbc0d38
1 parent 6b7a00f

Showing 2 changed files with 28 additions and 33 deletions.
diff --git a/README.Rmd b/README.Rmd
@@ -20,7 +20,7 @@ library(DBI)
 mall::llm_use("ollama", "llama3.1", seed = 100)
 ```
 
-# `mall`
+# mall
 
 <!-- badges: start -->
 <!-- badges: end -->
@@ -32,17 +32,15 @@ table_of_contents()
 
 
 <!-- toc: start -->
-- [Intro](#intro)
-    - [Databricks integration](#databricks-integration)
 - [Motivation](#motivation)
-- [Examples](#examples)
+- [LLM functions](#llm-functions)
     - [Sentiment](#sentiment)
     - [Summarize](#summarize)
     - [Classify](#classify)
-    - [Extract ](#extract-)
+    - [Extract ](#extract)
     - [Translate](#translate)
     - [Custom prompt](#custom-prompt)
-    - [Init](#init)
+- [Initialize session](#initialize-session)
 - [Key considerations](#key-considerations)
 - [Performance](#performance)
 - [Vector functions](#vector-functions)
@@ -80,7 +78,7 @@ runs your text data directly against the LLM.  The LLM's flexibility, allows for
 it to adapt to the subject of your data, and provide surprisingly accurate predictions. 
 This saves the data scientist the need to write and tune an NLP model. 
 
-## Examples
+## LLM functions
 
 We will start with a very small table with product reviews:
 
@@ -221,7 +219,7 @@ reviews |>
   llm_custom(review, my_prompt)
 ```
 
-### Init
+## Initialize session
 
 Invoking an `llm_` function will automatically initialize a model selection
 if you don't have one selected yet. If there is only one option, it will 
@@ -296,7 +294,7 @@ library(tictoc)
 tic()
 reviews_llm <- book_reviews |>
   llm_sentiment(
-    x = review,
+    col = review,
     options = c("positive", "negative"),
     pred_name = "predicted"
   )

diff --git a/README.md b/README.md
@@ -1,23 +1,21 @@
 
 <!-- README.md is generated from README.Rmd. Please edit that file -->
 
-# `mall`
+# mall
 
 <!-- badges: start -->
 <!-- badges: end -->
 <!-- toc: start -->
 
-- [Intro](#intro)
-  - [Databricks integration](#databricks-integration)
 - [Motivation](#motivation)
-- [Examples](#examples)
+- [LLM functions](#llm-functions)
   - [Sentiment](#sentiment)
   - [Summarize](#summarize)
   - [Classify](#classify)
-  - [Extract](#extract-)
+  - [Extract](#extract)
   - [Translate](#translate)
   - [Custom prompt](#custom-prompt)
-  - [Init](#init)
+- [Initialize session](#initialize-session)
 - [Key considerations](#key-considerations)
 - [Performance](#performance)
 - [Vector functions](#vector-functions)
@@ -58,7 +56,7 @@ The LLM’s flexibility, allows for it to adapt to the subject of your
 data, and provide surprisingly accurate predictions. This saves the data
 scientist the need to write and tune an NLP model.
 
-## Examples
+## LLM functions
 
 We will start with a very small table with product reviews:
 
@@ -132,10 +130,10 @@ number of words to output (`max_words`):
 reviews |>
   llm_summarize(review, max_words = 5)
 #> # A tibble: 3 × 2
-#>   review                                   .summary                        
-#>   <chr>                                    <chr>                           
-#> 1 This has been the best TV I've ever use… excellent tv with great features
-#> 2 I regret buying this laptop. It is too … laptop is too slow noisy        
+#>   review                                   .summary                       
+#>   <chr>                                    <chr>                          
+#> 1 This has been the best TV I've ever use… very good tv experience overall
+#> 2 I regret buying this laptop. It is too … slow and noisy laptop purchase 
 #> 3 Not sure how to feel about my new washi… mixed feelings about new washer
 ```
 
@@ -146,10 +144,10 @@ argument. This works with the other `llm_` functions as well.
 reviews |>
   llm_summarize(review, max_words = 5, pred_name = "review_summary")
 #> # A tibble: 3 × 2
-#>   review                                   review_summary                  
-#>   <chr>                                    <chr>                           
-#> 1 This has been the best TV I've ever use… excellent tv with great features
-#> 2 I regret buying this laptop. It is too … laptop is too slow noisy        
+#>   review                                   review_summary                 
+#>   <chr>                                    <chr>                          
+#> 1 This has been the best TV I've ever use… very good tv experience overall
+#> 2 I regret buying this laptop. It is too … slow and noisy laptop purchase 
 #> 3 Not sure how to feel about my new washi… mixed feelings about new washer
 ```
 
@@ -197,7 +195,6 @@ to be defined. The translation accuracy will depend on the LLM
 ``` r
 reviews |>
   llm_translate(review, "spanish")
-#> ■■■■■■■■■■■ 33% | ETA: 3s ■■■■■■■■■■■■■■■■■■■■■ 67% | ETA: 1s
 #> # A tibble: 3 × 2
 #>   review                                   .translation                         
 #>   <chr>                                    <chr>                                
@@ -230,18 +227,18 @@ reviews |>
 #> 3 Not sure how to feel about my new washi… No
 ```
 
-### Init
+## Initialize session
 
 Invoking an `llm_` function will automatically initialize a model
 selection if you don’t have one selected yet. If there is only one
 option, it will pre-select it for you. If there are more than one
 available models, then `mall` will present you as menu selection so you
 can select which model you wish to use.
 
-Calling `llm_use()` directly will let you specify the model and
-backend to use. You can also setup additional arguments that will be
-passed down to the function that actually runs the prediction. In the
-case of Ollama, that function is
+Calling `llm_use()` directly will let you specify the model and backend
+to use. You can also setup additional arguments that will be passed down
+to the function that actually runs the prediction. In the case of
+Ollama, that function is
 [`generate()`](https://hauselin.github.io/ollama-r/reference/generate.html).
 
 ``` r
@@ -316,16 +313,16 @@ library(tictoc)
 tic()
 reviews_llm <- book_reviews |>
   llm_sentiment(
-    x = review,
+    col = review,
     options = c("positive", "negative"),
     pred_name = "predicted"
   )
-#>  ■                                  1% |  ETA:  3m ■■                                 2% |  ETA:  2m … ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■   99% |  ETA:  2s   ! There were 1 predictions with invalid output, they were coerced to NA
+#> ! There were 1 predictions with invalid output, they were coerced to NA
 ```
 
 ``` r
 toc()
-#> 175.028 sec elapsed
+#> 169.546 sec elapsed
 ```
 
 As far as **time**, on my Apple M3 machine, it took about 3 minutes to