Skip to content

Automated web browsing with javascript / cookies support.

License

Notifications You must be signed in to change notification settings

picklingjar/spynner

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

95 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

= Introduction =

Spynner is a stateful programmatic web browser module for Python. It is based upon [http://www.qtsoftware.com/ Qt] and [http://webkit.org/ WebKit], so it supports Javascript, AJAX, and every other technology that !WebKit is able to handle (Flash, SVG, ...). Spynner takes advantage of [http://jquery.com jQuery], a powerful Javascript library that makes the interaction with pages and event simulation really easy.  

Using Spynner you would able to simulate a web browser with no GUI (though a browsing window can be opened for debugging purposes), so it may be used to implement crawlers or acceptance testing tools.

= Dependencies =

  * [http://www.python.org Python] (>=2.5)
  * [http://www.riverbankcomputing.co.uk/software/pyqt/download PyQt] (>=4.4.3): Python wrappers for the [http://www.qtsoftware.com/ Qt] framework.

= Install =

  * A [http://code.google.com/p/spynner/downloads/list release] version (recommended):

{{{
$ wget http://spynner.googlecode.com/files/spynner-VERSION.tgz
$ tar xvzf spynner-VERSION.tgz
$ cd spynner-VERSION
$ sudo python setup.py install
}}}

or 

{{{
$ sudo easy_install spynner
}}}

  * The bleeding edge version:

{{{
$ svn checkout http://spynner.googlecode.com/svn/trunk/ spynner-trunk
$ cd spynner-trunk
$ sudo python setup.py install
}}}

= API =

http://tokland.freehostia.com/googlecode/spynner/api/

You can generate the API locally (will create docs/api directory):

$ python setup.py gen_doc

= Usage =

A basic example:

{{{
import spynner

browser = spynner.Browser()
browser.load("http://www.wordreference.com")
browser.runjs("console.log('I can run Javascript')")
browser.runjs("console.log('I can run jQuery: ' + jQuery('a:first').attr('href'))")
browser.select("#esen")
browser.fill("input[name=enit]", "hola")
browser.click("input[name=b]")
browser.wait_page_load()
print browser.url, browser.html
browser.close()
}}}

Sometimes you'll want to see what is going on:

{{{
browser = spynner.Browser()
browser.debug_level = spynner.DEBUG
browser.create_webview()
browser.show()
...
}}}

See more examples in the repository: 

http://code.google.com/p/spynner/source/browse/#svn/trunk/examples

= Running Javascript = 

Spynner uses jQuery to make Javascript interface easier. By default, two modules are injected to every loaded page:

  * [http://docs.jquery.com/Downloading_jQuery jQuery core]: Amongst other things, it adds the powerful [http://docs.jquery.com/Selectors jQuery selectors], which are used internally by some Spynner methods (_fill_, _select_, _click_, _check_, ...). Of course you can also use jQuery when you inject your own code into a page.

  * [http://code.google.com/p/jqueryjs/source/browse/trunk/plugins/simulate jQuery _simulate_ plugin]: Makes it possible to simulate mouse and keyboard events (for now spynner uses it only in the _click_ action). Look up the library code to see which kind of events you can fire.

Note that you must use __jQuery(...)_ instead of _jQuery(...)_  or the common shortcut _$(...)_. That prevents name clashing with the jQuery library used by the page.

= Cook your soup: parsing the HTML = 

You can parse the HTML of a webpage with your favorite parsing library ([http://www.crummy.com/software/BeautifulSoup BeautifulSoup], [http://codespeak.net/lxml/ lxml], ...). Since we are already using Jquery for Javascript, it feels just natural to work with [http://pypi.python.org/pypi/pyquery pyquery], its Python counterpart:

{{{
import spynner
import pyquery

browser = spynner.Browser()
...
d = pyquery.Pyquery(browser.html)
d.make_links_absolute(browser.get_url())
href = d("#somelink").attr("href")
browser.download(href, open("/path/outputfile", "w"))
}}}

= Running Spynner without X11 ==

Spynner needs a X11 server to run. If you are running it in a server without X11 you must install the virtual [http://en.wikipedia.org/wiki/Xvfb Xvfb server]. Debian users can use the small wrapper (xvfb-run). If you are not using Debian, you can download it here:

http://www.mail-archive.com/debian-x@lists.debian.org/msg69632/x-run

{{{
$ xvfb-run python myscript_using_spynner.py
}}}

= Feedback =

Open an [http://code.google.com/p/spynner/issues/list issue] to report a bug or request a new feature. Other comments and suggestions can be directly emailed to me: [mailto://tokland@gmail.com tokland@gmail.com].

About

Automated web browsing with javascript / cookies support.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published