Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance problem with error message feature #48

Open
andreasabel opened this issue Feb 17, 2023 · 7 comments
Open

Performance problem with error message feature #48

andreasabel opened this issue Feb 17, 2023 · 7 comments
Labels
error reporting Concering errors reported by fix-whitespace performance
Milestone

Comments

@andreasabel
Copy link
Member

After PR #44, performance has degraded for large files with lots of violations.

Reproducer:

runhaskell test/GenerateViolations.hs 200000
FW=$(cabal list-bin fix-whitespace)
time $FW --check 200000-violations.txt 2> /dev/null

Takes 30sec on my machine. The released (0.0.11) finishes in a fraction of a second.

Proposed procedure:

  1. Try to fix the performance problem
  2. Guard error reporting under --verbose flag.
@andreasabel andreasabel added error reporting Concering errors reported by fix-whitespace performance labels Feb 17, 2023
@andreasabel andreasabel added this to the 0.1 milestone Feb 17, 2023
@ulysses4ever
Copy link
Contributor

After a couple of shots in the dark, it seems to me that the most straightforward way to go about it is to print violations in the main loop, right away, without building up a list of violations.

Flag sounds good! (I honestly thought that you'd require it from the original PR.) I wouldn't call it as generically as --verbose, but something like --print-violations.

@andreasabel
Copy link
Member Author

After a couple of shots in the dark, it seems to me that the most straightforward way to go about it is to print violations in the main loop, right away, without building up a list of violations.

I think it is good to keep a pure interface, and do IO only in the layer using that interface.

transform
:: TabSize -- ^ Expand tab characters to so many spaces. Keep tabs if @<= 0@.
-> Text -- ^ Text before transformation.
-> (Text, [LineError]) -- ^ Text after transformation and violating lines if any.

Flag sounds good! (I honestly thought that you'd require it from the original PR.) I wouldn't call it as generically as --verbose, but something like --print-violations.

Ok, we can bikeshed it. For a --print-XXX flag I would expect stuff to be printed to stdout for later consumption by another program. (Kind of the proper output of the program.)
There isn't too much written about naming options. There is https://www.gnu.org/prep/standards/html_node/Command_002dLine-Interfaces.html . The subsequent option table mentions --quiet / --silent and ``--verbose`. This would be my first take, unless further diversification is needed.

ulysses4ever added a commit to ulysses4ever/fix-whitespace that referenced this issue Mar 18, 2023
Alleviates agda#48 but a proper performance fix would be better.
ulysses4ever added a commit to ulysses4ever/fix-whitespace that referenced this issue Mar 18, 2023
Alleviates agda#48 but a proper performance fix would be better.
ulysses4ever added a commit to ulysses4ever/fix-whitespace that referenced this issue Mar 18, 2023
Alleviates agda#48 but a proper performance fix would be better.
@ulysses4ever
Copy link
Contributor

#49 provides a way to work around the performance issue by resurrecting the old implementation and putting the new one under a flag. This is largely independent of the the issue itself.

I must say I find few things in live as sad as tuning performance of Haskell applications. Below are couple observations of my failed attempts at this in our case.

First, many times slow applications generate excessive amount of garbage and productivity goes down. Surprising as it is we don't experience this. Productivity stays above 99%.

Another observation: a lot of slowdown comes from just printing out stuff. I performed a little experiment where I don't do any analysis, and just print out every input line with the decoration displayLineError:

fix :: Mode -> Verbose -> TabSize -> FilePath -> IO Bool
fix mode verbose _tabSize f = do
  s <- Text.readFile f
  pure (CheckViolation s (buildVs s))
    >>= \case

    CheckViolation s vs ->  do
      Text.hPutStrLn stderr (msg vs)
      when (mode == Fix) $
        withFile f WriteMode $ \h -> do
          hSetEncoding h utf8
          Text.hPutStr h s
      return True

  where
    buildVs = zipWith LineError [1..] . Text.lines

    msg vs
      | mode == Fix =
        "[ Violation fixed ] " <> ft

      | otherwise =
        "[ Violation detected ] " <> ft <>
        (if not verbose then "" else
           ":\n" <> Text.unlines (map (displayLineError ft) vs))

    ft = Text.pack f
fix-whitespace on  performance-experiments-2 [?] via λ 9.2.6 
❯ time $FW --check 20000-violations.txt +RTS -s -p -RTS 2>/dev/null

________________________________________________________
Executed in    6.22 millis    fish           external
   usr time    1.31 millis  341.00 micros    0.97 millis
   sys time    5.00 millis  142.00 micros    4.86 millis


fix-whitespace on  performance-experiments-2 [?] via λ 9.2.6 
❯ time $FW -v --check 20000-violations.txt +RTS -s -p -RTS 2>/dev/null

________________________________________________________
Executed in  433.11 millis    fish           external
   usr time  253.06 millis    0.18 millis  252.88 millis
   sys time  180.00 millis    1.08 millis  178.91 millis

(i also use profiling but without profiling numbers are not much different)

@ulysses4ever
Copy link
Contributor

My idea for improving performance of the output would be to try the Builder type, perhaps.

@ulysses4ever
Copy link
Contributor

Unfortunately, the basic lazy builder didn't seem to help anything (the code is here ulysses4ever@faabc6a). I'm thinking to give this a try: https://github.com/Bodigrim/linear-builder.

@ulysses4ever
Copy link
Contributor

I tried an even simpler experiment, and it does look like just printing stuff out is embarrassingly slow. Here's a little program that reads stdin, breaks it into lines, adds the index to every line and prints out the result:

-- test.hs
module Main where

import TextShow
import qualified Data.Text                    as T
import qualified Data.Text.IO                 as T

main = T.getContents >>= T.putStrLn . procText

procText =
  T.unlines .
  zipWith (\i l -> showt i <> l) [1::Int ..] .
  T.lines

It's pretty slow already:

fix-whitespace on  performance-experiments-2 [?] via λ 9.2.6 
❯ time cat 20000-violations.txt | runghc test.hs >/dev/null

________________________________________________________
Executed in  387.92 millis    fish           external
   usr time  300.18 millis    0.22 millis  299.96 millis
   sys time   86.46 millis    2.10 millis   84.37 millis


fix-whitespace on  performance-experiments-2 [?] via λ 9.2.6 
❯ time cat 200000-violations.txt | runghc test.hs >/dev/null

________________________________________________________
Executed in    1.72 secs    fish           external
   usr time    1.20 secs    0.00 millis    1.20 secs
   sys time    0.52 secs    2.90 millis    0.52 secs

@ulysses4ever
Copy link
Contributor

Oops, using runghc was a slip. If I compile the above program, it runs on par with Rust analog. Back to square one...

andreasabel pushed a commit that referenced this issue Aug 7, 2023
Alleviates #48 but a proper performance fix would be better.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
error reporting Concering errors reported by fix-whitespace performance
Projects
None yet
Development

No branches or pull requests

2 participants