YQL and Scraperwiki sitting in a tree…

April 12, 2010

Scraperwiki is brilliant. YQL is brilliant. Now, they can get together and make lots of datababies.

Using the simple webservice I’ve written, it’s a bit easier to use scraperwiki data in YQL queries and to mash up scraperwiki data with other YQL sources.

YQL

YQL presents a uniform query interface, modelled on SQL, for various web APIs. You can run queries like select * from flickr.photos.recent to get a list of recent Flickr photos, as either JSON or XML.
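If you'd rather call YQL from a script than from the console, a query is just a GET request with the statement in a parameter. Here's a minimal sketch in Python that only builds the request URL. The endpoint and the q and format parameter names are my assumptions about the public YQL endpoint, so check them against the YQL docs before relying on them; yql_url is a made-up helper name.

```python
from urllib.parse import urlencode

# Assumption: the public (keyless) YQL endpoint. Not taken from this post.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def yql_url(query, fmt="json"):
    """Build the GET URL for a YQL statement (hypothetical helper).

    `fmt` picks the response format: "json" or "xml".
    """
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    # urlencode percent-encodes the statement so it is safe in a query string.
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": fmt})


if __name__ == "__main__":
    print(yql_url("select * from flickr.photos.recent"))
```

Fetching that URL with any HTTP client should hand back the result set in the chosen format.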

Data can be mashed together from multiple tables/urls, like so:

    select * from search.web where query in
        (select title from rss
            where url="http://rss.news.yahoo.com/rss/topstories"
            | truncate(count=1))
    limit 1

[run that in the YQL console]

Non-Yahoo! services can be queried through YQL too: someone just has to publish an adaptor, which is a chunk of XML and JavaScript describing how to talk to the service.

There are already adaptors published for many web APIs. YQL calls these things ‘datatables’. And you can use them like this:

    USE "http://myserver.com/mytables.xml" AS mytable;
    SELECT * FROM mytable WHERE...

With me so far?

Scraperwiki

Scraperwiki is a new, still-in-beta service for building web-scrapers and sharing the data they scrape. More scrapers are appearing daily and the site provides a useful API for querying the data created.

Using scraperwiki in YQL

I’ve created a service that automatically generates YQL datatable definitions from scraperwiki scrapers. You can find the definition for any scraperwiki scraper at a url that looks like this: http://swikiyql.heroku.com/SCRAPERWIKI_SHORT_NAME.xml
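The URL pattern is simple enough to generate in code. This is a small Python sketch, not part of swikiyql itself; datatable_url is a hypothetical helper name, and the percent-encoding is just a precaution on my part since I'm not assuming anything about which characters a short name may contain.

```python
from urllib.parse import quote

# Base URL of the swikiyql service, as given in the post.
SWIKIYQL_BASE = "http://swikiyql.heroku.com"


def datatable_url(short_name):
    """Return the datatable definition URL for a scraper (hypothetical helper)."""
    if not short_name:
        raise ValueError("short_name must not be empty")
    # Percent-encode the name so it is always a single, safe path segment.
    return "%s/%s.xml" % (SWIKIYQL_BASE, quote(short_name, safe=""))


if __name__ == "__main__":
    print(datatable_url("wikipedia-2010-uk-election-candidates"))
```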

You’ll need a scraperwiki API key – you can sign up for one on their site. Once you have it, you should be able to run YQL queries like this:

    use 'http://swikiyql.heroku.com/wikipedia-2010-uk-election-candidates.xml'
        AS candidates;
    SELECT * FROM candidates
        WHERE party='UKIP'
        AND sw_api_key='YOUR_SW_API_KEY';

which queries the data from this scraperwiki scrape. And using this, you should be able to mash together different scraperwiki scrapes, or mash scrapes with any other YQL tables.
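If you're generating these queries from a script, the statement can be assembled from its parts. A rough Python sketch follows: scraper_query is a hypothetical helper, the filter columns are whatever your scraper happens to store, and I've chosen to reject values containing a single quote rather than guess at YQL's escaping rules.

```python
# Base URL of the swikiyql service, as given in the post.
SWIKIYQL_BASE = "http://swikiyql.heroku.com"


def scraper_query(short_name, alias, api_key, **filters):
    """Build a USE + SELECT statement for one scraper (hypothetical helper).

    Each keyword argument becomes a column='value' condition.
    """
    conditions = dict(filters)
    # The API key is passed as one more WHERE condition, as in the example above.
    conditions["sw_api_key"] = api_key
    for value in conditions.values():
        if "'" in value:
            # Assumption: refuse quotes instead of guessing how YQL escapes them.
            raise ValueError("values must not contain single quotes")
    where = " AND ".join("%s='%s'" % (k, v) for k, v in conditions.items())
    table_url = "%s/%s.xml" % (SWIKIYQL_BASE, short_name)
    return "use '%s' AS %s; SELECT * FROM %s WHERE %s;" % (
        table_url, alias, alias, where)


if __name__ == "__main__":
    print(scraper_query("wikipedia-2010-uk-election-candidates",
                        "candidates", "YOUR_SW_API_KEY", party="UKIP"))
```

The string it prints is the same query as above on one line, ready to paste into the YQL console.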

That’s it really – the source code for swikiyql (pronounced swikiyql) is on github. There are probably lots of things wrong with it, but it works for me.

Now, go, mash things up!

2 Comments
  1. Sym says:

    (Scraperwiki co-creator here)

    This is great!

    Do you mind if (in a couple of months) we install this on scraperwiki itself?

    Let’s talk on email :)

  2. steve white says:

    do you have example of use of it