AltScanner

A version of bufio.Scanner that works with lines of arbitrary length.

Why

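If you're getting a `bufio.Scanner: token too long` error, this may be what you want. A bufio.Scanner stops with this error (bufio.ErrTooLong) when a single line is longer than its maximum token size, which defaults to bufio.MaxScanTokenSize (64 KiB).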

How

If your code used to look like this:

```go
import "bufio"

s := bufio.NewScanner(myIoReader)
for s.Scan() {
    // Do work
}
```

You can now handle very long lines without errors by changing to:

```go
import "github.com/turtlemonvh/altscanner"

s := altscanner.NewAltScanner(myIoReader)
for s.Scan() {
    // Do work
}
```

Caveats

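  • Only splits input on newlines; custom split functions (as with bufio.Scanner's Split method) are not supported.
  • Builds each line by appending bytes to a growing byte slice instead of using a fixed, reusable buffer, so memory use grows with the length of the line.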

Alternatives

If you have a good idea of how long your lines can be and are running Go 1.6 or later (the Scanner.Buffer method was introduced in Go 1.6), you probably just want to raise the maximum buffer size of the scanner. For example:

```go
// Create a scanner whose maximum token size is 10x the default
// (640 KiB instead of 64 KiB). Buffer must be called before the first Scan.
scanner := bufio.NewScanner(file)
scanner.Buffer(make([]byte, bufio.MaxScanTokenSize), bufio.MaxScanTokenSize*10)
```

However, if you need to support Go versions before 1.6, or you have no idea how long your lines may be, AltScanner's approach works pretty well.

Performance

It is robust, but not very fast. The benchmarks below measure reading 5 lines of input, where each line is either 30 bytes (short) or 300K bytes (long). The default bufio.Scanner is only benchmarked with short lines; 300K-byte lines exceed its 64 KiB default limit.

```
$ go test -test.bench=Scanner -test.run=^$ -test.benchmem
BenchmarkBufioScannerSmall-8             1000000          1061 ns/op        4128 B/op          2 allocs/op
BenchmarkBufferedBufioScannerSmall-8     1000000          1059 ns/op        4128 B/op          2 allocs/op
BenchmarkAltScannerSmall-8               1000000          1779 ns/op        5824 B/op          8 allocs/op
BenchmarkBufferedBufioScannerLong-8        50000         28077 ns/op      127008 B/op          6 allocs/op
BenchmarkAltScannerLong-8                   2000       1142195 ns/op     7032704 B/op         78 allocs/op
PASS
ok      github.com/turtlemonvh/altscanner   13.458s
```

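AltScanner is significantly slower, makes many more allocations, and allocates significantly more bytes per operation than a bufio.Scanner with an enlarged buffer (BufferedBufioScanner). In the results above it takes about 1.7x as long on short lines (1779 vs. 1059 ns/op) and about 41x as long on long lines (1142195 vs. 28077 ns/op), while allocating about 55x as many bytes per operation on long lines. In short: if you are using Go 1.6+ and are confident about the maximum possible size of a line, it is faster to use Scanner.Buffer to raise the buffer size.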
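The repository's benchmark code is not reproduced here. As a hypothetical sketch of how a comparable measurement could be taken outside `go test`, the program below uses testing.Benchmark to time a bufio.Scanner with a 640 KiB limit reading five lines of 30 or 300,000 bytes. The function name benchmarkScanner is an assumption, and absolute numbers will vary by machine:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"testing"
)

// benchmarkScanner is a hypothetical reconstruction, not the repository's
// benchmark: it measures a bufio.Scanner with a 640 KiB token limit
// reading five lines of lineLen bytes each.
func benchmarkScanner(lineLen int) testing.BenchmarkResult {
	// Build the input once, outside the timed loop.
	input := strings.Repeat(strings.Repeat("x", lineLen)+"\n", 5)
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			s := bufio.NewScanner(strings.NewReader(input))
			s.Buffer(make([]byte, 0, bufio.MaxScanTokenSize), bufio.MaxScanTokenSize*10)
			lines := 0
			for s.Scan() {
				lines++
			}
			if lines != 5 || s.Err() != nil {
				b.Fatalf("read %d lines, err %v", lines, s.Err())
			}
		}
	})
}

func main() {
	short := benchmarkScanner(30)
	long := benchmarkScanner(300000)
	fmt.Println("short:", short, short.MemString())
	fmt.Println("long: ", long, long.MemString())
}
```

Running the same loop body with AltScanner in place of bufio.Scanner would give the other half of the comparison.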

License

MIT
