Some quick and dirty notes on using the Google Spreadsheet API in Perl circa March 2007
The typical login techniques used on many websites are either:
Google take another approach.
my $ua = LWP::UserAgent->new;
my $response = $ua->post(
'https://www.google.com/accounts/ClientLogin',
{
accountType => 'GOOGLE',
Email => $email,
Passwd => $passwd,
service => $service,
source => 'anything.that.identifies.your.app',
}
);
die "Failed to login to google service $service as $email ", $response->status_line
if !$response->is_success;
my $auth = extract_auth($response->content);
$ua->default_header('Authorization' => "GoogleLogin auth=$auth");
I am aware this is a bit dense, but it did the job for me:
sub extract_auth {
# Split the input into lines, loop over and return the value for the one starting Auth=
for (split /\n/, shift) { return $1 if $_ =~ /\AAuth=(.*)\z/; }
return '';
}
Provided you now send subsequent requests using $ua you should be authenticated (until Google decide you need to re-authenticate).
Note: I'm doing this "quick and dirty" below using regexs. I know, it's nasty, and you should really be using XPaths to pull out the values.
First up, a little helper sub for fetching urls
sub fetch {
my ($ua, $url) = @_;
my $response = $ua->get($url);
die "Failed to fetch $url " . $response->status_line
if !$response->is_success;
return $response->content;
}
Get the list of spreadsheets for the current account
my $spreadsheets_xml = XMLin( fetch($ua, 'http://spreadsheets.google.com/feeds/spreadsheets/private/full'), ForceArray => 1, );
This extracts the url for the first spreadsheet that is called $title.
my ($spreadsheet) = grep { $_->{title}->[0]->{content} eq $title } @{$spreadsheets_xml->{entry}};
my ($spreadsheet_url) = grep
{
$_->{type} eq 'application/atom+xml'
&&
$_->{rel} =~ m{\A http://schemas.google.com/spreadsheets/\d+#worksheetsfeed \z}x
}
@{$spreadsheet->{link}};
Then go fetch that particular spreadsheet's XML
my $spreadsheet_xml = XMLin(
fetch($ua, $spreadsheet_url->{href}),
ForceArray => 1,
);
Watch out for the s on the end (or not) and don't confuse spreadsheets_xml with spreadsheet_xml. What is the collective noun for a gaggle of spreadsheets?
Next, pull out the url for the worksheet called $worksheet_title from our spreadsheet.
my ($worksheets) = grep { $_->{title}->[0]->{content} eq $worksheet_title } @{$spreadsheet_xml->{entry}};
my ($worksheet_url) = grep
{
$_->{type} eq 'application/atom+xml'
&&
$_->{rel} =~ m{\A http://schemas.google.com/spreadsheets/\d+#listfeed \z}x
}
@{$worksheets->{link}};
Nearly there now, we just need to get the data for our worksheet. Those still paying attention will obviously be thirsty for some degree of abstraction to factor out all this duplicate code. Like XPaths this is left as an exercise for you to have fun with ;)
my $data_xml = XMLin(
fetch($ua, $worksheet_url->{href}),
ForceArray => 1,
);
And as if by magic, you now have a typically bloated XML document representing what could be such a lean, mean array of arrays!
The documentation is (was?) helpfully bone-headed about this, so here goes:
First find the url of the cell or row you want to delete. You'll have to delve into the XML yourself to find that ...
$edit = the xml reference for the row/cell you want to delete.
Make sure you are using the $ua object we created above with the added authentication header stuffed in.
And don't be hoping to delete multiple cells or rows in a single call. Easy tiger!
Annoyingly we can't use $ua->post() because we need to munge the headers yet further.
my $headers = $ua->default_headers;
$headers->header('X-HTTP-Method-Override' => 'DELETE');
$headers->header('Content-length' => 0);
my $request = HTTP::Request->new(POST => $edit->{href}, $headers, '');
my $response = $ua->request($request);
die "Failed to post to $edit->{href} ", $request->as_string, $response->as_string
if !$response->is_success;
Cannot tell you how long that took to figure out! I will say tcpick rocks for reverse engineering the HTTP protocol out of the Java client examples ;)
... at least at the time of writing.
It'll work within Google Spreadsheets (the web version), Microsoft Excel and Open Office but your Google Spreadsheet API feed will terminate at the first blank line in your data. How very annoying!
One cell or row can be updated/deleted per http request. Fixing this is going to make a HUGE difference to the functionality imho. And perhaps take a while?
(anyone else thinking "row level locking"?)
I am almost certainly being dumb and mis-reading someones spec, but damn it there is a tonne of replication and redundant data in the XML documents.
The edit, post and fetch URLs are all the same. Identical. Why repeat them? Future proof my ass, as Bender might quip.