Work programmatically with Google Spreadsheets Part 2

A while back I wrote a short post on how you can read from and write to Google Spreadsheets programmatically using Python and the package 'gspread'.

Last time, the reading was done by first building arrays with the addresses of where the values could be found in the spreadsheet, and then running through all of them and replacing the addresses with the values. It worked fine, but it is not best practice, nor very efficient, as it makes one API request per cell. In part two, I will share a short tip on how to read the values in one go instead of iterating through a range of cells.
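To illustrate the difference, here is a minimal sketch with a made-up stub class standing in for a gspread worksheet (the class and its request counter are purely illustrative and not part of the gspread API): the old approach issues one request per cell, while a batch read fetches everything in a single request.

```python
# A tiny stub standing in for a gspread worksheet, counting "API requests"
# to show why per-cell reads are slow. Names here are illustrative only.
class FakeWorksheet(object):
    def __init__(self, rows):
        self.rows = rows
        self.requests = 0

    def acell(self, label):
        # One request per call, like the real per-cell API
        self.requests += 1
        col = ord(label[0]) - ord("A")
        row = int(label[1:]) - 1
        return self.rows[row][col]

    def get_all_values(self):
        # A single request for the whole sheet
        self.requests += 1
        return self.rows

ws = FakeWorksheet([["a", "b"], ["c", "d"]])
cells = [ws.acell("A1"), ws.acell("B2")]   # two separate requests
per_cell_requests = ws.requests

ws2 = FakeWorksheet([["a", "b"], ["c", "d"]])
everything = ws2.get_all_values()          # one request, all values
batch_requests = ws2.requests
```

With thousands of rows, the difference between one request per cell and one request in total is what makes the refactoring below worthwhile.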

Here is the excerpt dealing with retrieving values. (NB: see the original blog post for the gspread initialization.)

#Running through the values
get_val = []
set_name = []
set_country = []
for a in range(2, 5124):
    v = "B%s" % a
    sn = "H%s" % a
    sc = "G%s" % a
    get_val.append(v)
    set_name.append(sn)
    set_country.append(sc)

# The address lists are zero-indexed, so iterate over their length
for i in range(len(get_val)):
    try:
        name = worksheet.acell(get_val[i]).value
        res = getCountry(name)
        if res:
            print res
            country, last_id, name = res
            worksheet.update_acell(set_name[i], name)
            worksheet.update_acell(set_country[i], country)
    except Exception as e:
        print e

In a recent script we only wanted to download values from a Google spreadsheet (yes, we could have exported the files to .csv with a similar result, but with a script we can expand and parse the data if needed), and this gave some time for refactoring the code as well.

The gspread function worksheet.get_all_values() returns a list of lists with the values. The outer list contains the rows, and each row list contains the column values, indexed by column position. In this example num_streams is the second column, so its position is [1], as list indices start at zero.
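As a hedged illustration, here is the indexing on some invented sample rows standing in for the API response:

```python
# Sample rows standing in for worksheet.get_all_values() -- the outer list
# holds the rows, each inner list holds that row's column values.
all_val = [
    ["id", "num_streams", "artistid", "name"],   # header row
    ["1", "12345", "a1", "Artist One"],
    ["2", "678", "a2", "Artist Two"],
]

# num_streams is the second column, so its position is [1] (lists start at zero)
streams = [row[1] for row in all_val[1:]]        # skip the header row
```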

Also note the nifty way of writing UTF-8 formatted strings to the file. UTF-8 can often cause a headache, but put a "u" character before the string literal and open the stream with codecs.open("filename", "mode", "encoding").
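A minimal sketch of that pattern, using a throwaway temp file in place of a real output name:

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

# A u""-prefixed string guarantees we are handling unicode text, and
# codecs.open takes care of encoding it on the way out to the file.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
line = u"Röyksopp;NO\n"

out = codecs.open(path, "w", "utf-8-sig")
out.write(line)
out.close()

# Reading it back with the same codec returns the original unicode string
inp = codecs.open(path, "r", "utf-8-sig")
content = inp.read()
inp.close()
```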

The new way of retrieving data from a Google Docs Spreadsheet:

[sourcecode language="python"]
# -*- coding: UTF-8 -*-
import gspread
import codecs

# Global variables for accessing the resource
G_USERNAME = 'user_email'
G_PASSWORD = 'password'
G_IDENTIFIER = 'spreadsheet_identifier'

# Connecting to the data source
gc = gspread.login(G_USERNAME, G_PASSWORD)
sht1 = gc.open_by_key(G_IDENTIFIER)
worksheet = sht1.get_worksheet(0)

all_val = worksheet.get_all_values()

output = codecs.open('output_norwegian_artists.csv', 'wb', 'utf-8-sig')

for l in all_val:
    num_streams, artistid, name = (l[1], l[2], l[3])
    norwegian = l[4]
    if len(norwegian) < 3:
        norwegian = 'NULL'

    string = u"{0};{1};{2};{3}\n".format(num_streams, artistid, name, norwegian)
    output.write(string)

output.close()
[/sourcecode]

Picture licensed under a Creative Commons Attribution license by the brilliant organization Open Knowledge Foundation. Picture retrieved through a CC search on Flickr.

Work programmatically with Google Spreadsheets

Some time back I wrote a script that reads through a CSV-formatted list and, based on the artists' names, tries to decide each artist's nationality by querying the Last.fm search engine and parsing the XML-structured result.

The script worked, finding and returning about 80% of the artists, and around 80% of these again matched the intended artist. What if the alteration could be done in the document itself? For the CSV-based script the artists had to be extracted from the database, then parsed, and then put back into the document. With this approach we can skip the middle step, and let the script run while the data remains available to view for the users.

If you haven't already used Google Docs, you should consider it, as it is a very convenient way of working with documents, especially when several users are involved. Through Google Docs you can collaborate on documents, and since they are stored in the cloud all users will instantly have the latest version. Another advantage is that when you are working simultaneously, each user is editing the same document, so merging different versions is not a problem.

A great tool for combining Google Spreadsheets with the Python programming environment is the gspread module. After installing this package you only need to import it, and then with very few lines of code you can retrieve and update information in the spreadsheet cells.

The specifics for doing this task are pretty much these few lines: import the package, log in, find the correct document and do whatever you need to. (This excerpt won't work on its own; check the whole script below. NB: mind the indentation, as it may not be displayed correctly in the browser.)

[sourcecode language="python"]
import gspread

G_USERNAME = 'your@gmail.com'
G_PASSWORD = 'yourPassword'
G_IDENTIFIER = 'document_identifier_checkdocument_url_in_your_browser'

# Connecting to the data source
gc = gspread.login(G_USERNAME, G_PASSWORD)
sht1 = gc.open_by_key(G_IDENTIFIER)
worksheet = sht1.get_worksheet(0)

for i in range(len(get_val)):
    try:
        name = worksheet.acell(get_val[i]).value
        res = getCountry(name)
        if res:
            print res
            country, last_id, name = res
            worksheet.update_acell(set_name[i], name)
            worksheet.update_acell(set_country[i], country)
    except Exception as e:
        print e
[/sourcecode]

Above are the lines related to connecting Python to Google Docs; below you can see the whole script and how the method I mentioned in an earlier post is used in this setting.

[sourcecode language="python"]
#!/usr/bin/python
# -*- coding:utf-8 -*-

"""
Clouds & Concerts – 2012
Ola Loevholm

Initialized from the command line. Runs through the Google Docs spreadsheet with the top 5000 artists, runs the
parsing query against the MusicBrainz search engine, and enters the country and search string (for validation)
into the Google doc.

"""

G_USERNAME = 'your@gmail.com'
G_PASSWORD = 'yourPassword'
G_IDENTIFIER = 'document_identifier_checkdocument_url_in_your_browser'

import sys, urllib, string, csv, time
import xml.etree.ElementTree as ET
import gspread

# Loads a dictionary with ISO 3166-1 abbreviations and codes
COUNTRIES = {"AF":"AFGHANISTAN","AX":"ÅLAND ISLANDS","AL":"ALBANIA","DZ":"ALGERIA","AS":"AMERICAN SAMOA","AD":"ANDORRA","AO":"ANGOLA","AI":"ANGUILLA","AQ":"ANTARCTICA","AG":"ANTIGUA AND BARBUDA","AR":"ARGENTINA","AM":"ARMENIA","AW":"ARUBA","AU":"AUSTRALIA","AT":"AUSTRIA","AZ":"AZERBAIJAN","BS":"BAHAMAS","BH":"BAHRAIN","BD":"BANGLADESH","BB":"BARBADOS","BY":"BELARUS","BE":"BELGIUM","BZ":"BELIZE","BJ":"BENIN","BM":"BERMUDA","BT":"BHUTAN","BO":"BOLIVIA, PLURINATIONAL STATE OF","BQ":"BONAIRE, SINT EUSTATIUS AND SABA","BA":"BOSNIA AND HERZEGOVINA","BW":"BOTSWANA","BV":"BOUVET ISLAND","BR":"BRAZIL","IO":"BRITISH INDIAN OCEAN TERRITORY","BN":"BRUNEI DARUSSALAM","BG":"BULGARIA","BF":"BURKINA FASO","BI":"BURUNDI","KH":"CAMBODIA","CM":"CAMEROON","CA":"CANADA","CV":"CAPE VERDE","KY":"CAYMAN ISLANDS","CF":"CENTRAL AFRICAN REPUBLIC","TD":"CHAD","CL":"CHILE","CN":"CHINA","CX":"CHRISTMAS ISLAND",
"CC":"COCOS (KEELING) ISLANDS","CO":"COLOMBIA","KM":"COMOROS","CG":"CONGO","CD":"CONGO, THE DEMOCRATIC REPUBLIC OF THE","CK":"COOK ISLANDS","CR":"COSTA RICA","CI":"CÔTE D’IVOIRE","HR":"CROATIA","CU":"CUBA","CW":"CURAÇAO","CY":"CYPRUS","CZ":"CZECH REPUBLIC","DK":"DENMARK","DJ":"DJIBOUTI","DM":"DOMINICA","DO":"DOMINICAN REPUBLIC","EC":"ECUADOR","EG":"EGYPT","SV":"EL SALVADOR","GQ":"EQUATORIAL GUINEA","ER":"ERITREA","EE":"ESTONIA","ET":"ETHIOPIA","FK":"FALKLAND ISLANDS (MALVINAS)","FO":"FAROE ISLANDS","FJ":"FIJI","FI":"FINLAND","FR":"FRANCE","GF":"FRENCH GUIANA","PF":"FRENCH POLYNESIA","TF":"FRENCH SOUTHERN TERRITORIES","GA":"GABON","GM":"GAMBIA","GE":"GEORGIA","DE":"GERMANY","GH":"GHANA","GI":"GIBRALTAR","GR":"GREECE","GL":"GREENLAND","GD":"GRENADA","GP":"GUADELOUPE","GU":"GUAM","GT":"GUATEMALA","GG":"GUERNSEY","GN":"GUINEA","GW":"GUINEA-BISSAU","GY":"GUYANA","HT":"HAITI","HM":"HEARD ISLAND AND MCDONALD ISLANDS",
"VA":"HOLY SEE (VATICAN CITY STATE)","HN":"HONDURAS","HK":"HONG KONG","HU":"HUNGARY","IS":"ICELAND","IN":"INDIA","ID":"INDONESIA","IR":"IRAN, ISLAMIC REPUBLIC OF","IQ":"IRAQ","IE":"IRELAND","IM":"ISLE OF MAN","IL":"ISRAEL","IT":"ITALY","JM":"JAMAICA","JP":"JAPAN","JE":"JERSEY","JO":"JORDAN","KZ":"KAZAKHSTAN","KE":"KENYA","KI":"KIRIBATI","KP":"KOREA, DEMOCRATIC PEOPLE’S REPUBLIC OF","KR":"KOREA, REPUBLIC OF","KW":"KUWAIT","KG":"KYRGYZSTAN","LA":"LAO PEOPLE’S DEMOCRATIC REPUBLIC","LV":"LATVIA","LB":"LEBANON","LS":"LESOTHO","LR":"LIBERIA","LY":"LIBYA","LI":"LIECHTENSTEIN","LT":"LITHUANIA","LU":"LUXEMBOURG","MO":"MACAO","MK":"MACEDONIA, THE FORMER YUGOSLAV REPUBLIC OF","MG":"MADAGASCAR","MW":"MALAWI","MY":"MALAYSIA","MV":"MALDIVES","ML":"MALI","MT":"MALTA","MH":"MARSHALL ISLANDS","MQ":"MARTINIQUE","MR":"MAURITANIA","MU":"MAURITIUS","YT":"MAYOTTE","MX":"MEXICO","FM":"MICRONESIA, FEDERATED STATES OF",
"MD":"MOLDOVA, REPUBLIC OF","MC":"MONACO","MN":"MONGOLIA","ME":"MONTENEGRO","MS":"MONTSERRAT","MA":"MOROCCO","MZ":"MOZAMBIQUE","MM":"MYANMAR","NA":"NAMIBIA","NR":"NAURU","NP":"NEPAL","NL":"NETHERLANDS","NC":"NEW CALEDONIA","NZ":"NEW ZEALAND","NI":"NICARAGUA","NE":"NIGER","NG":"NIGERIA","NU":"NIUE","NF":"NORFOLK ISLAND","MP":"NORTHERN MARIANA ISLANDS","NO":"NORWAY","OM":"OMAN","PK":"PAKISTAN","PW":"PALAU","PS":"PALESTINIAN TERRITORY, OCCUPIED","PA":"PANAMA","PG":"PAPUA NEW GUINEA","PY":"PARAGUAY","PE":"PERU","PH":"PHILIPPINES","PN":"PITCAIRN","PL":"POLAND","PT":"PORTUGAL","PR":"PUERTO RICO","QA":"QATAR","RE":"RÉUNION","RO":"ROMANIA","RU":"RUSSIAN FEDERATION","RW":"RWANDA","BL":"SAINT BARTHÉLEMY","SH":"SAINT HELENA, ASCENSION AND TRISTAN DA CUNHA","KN":"SAINT KITTS AND NEVIS","LC":"SAINT LUCIA","MF":"SAINT MARTIN (FRENCH PART)","PM":"SAINT PIERRE AND MIQUELON","VC":"SAINT VINCENT AND THE GRENADINES",
"WS":"SAMOA","SM":"SAN MARINO","ST":"SAO TOME AND PRINCIPE","SA":"SAUDI ARABIA","SN":"SENEGAL","RS":"SERBIA","SC":"SEYCHELLES","SL":"SIERRA LEONE","SG":"SINGAPORE","SX":"SINT MAARTEN (DUTCH PART)","SK":"SLOVAKIA","SI":"SLOVENIA","SB":"SOLOMON ISLANDS","SO":"SOMALIA","ZA":"SOUTH AFRICA","GS":"SOUTH GEORGIA AND THE SOUTH SANDWICH ISLANDS","SS":"SOUTH SUDAN","ES":"SPAIN","LK":"SRI LANKA","SD":"SUDAN","SR":"SURINAME","SJ":"SVALBARD AND JAN MAYEN","SZ":"SWAZILAND","SE":"SWEDEN","CH":"SWITZERLAND","SY":"SYRIAN ARAB REPUBLIC","TW":"TAIWAN, PROVINCE OF CHINA","TJ":"TAJIKISTAN","TZ":"TANZANIA, UNITED REPUBLIC OF","TH":"THAILAND","TL":"TIMOR-LESTE","TG":"TOGO","TK":"TOKELAU","TO":"TONGA","TT":"TRINIDAD AND TOBAGO","TN":"TUNISIA","TR":"TURKEY","TM":"TURKMENISTAN","TC":"TURKS AND CAICOS ISLANDS","TV":"TUVALU","UG":"UGANDA","UA":"UKRAINE","AE":"UNITED ARAB EMIRATES","GB":"UNITED KINGDOM","US":"UNITED STATES",
"UM":"UNITED STATES MINOR OUTLYING ISLANDS","UY":"URUGUAY","UZ":"UZBEKISTAN","VU":"VANUATU","VE":"VENEZUELA, BOLIVARIAN REPUBLIC OF","VN":"VIET NAM","VG":"VIRGIN ISLANDS, BRITISH","VI":"VIRGIN ISLANDS, U.S.","WF":"WALLIS AND FUTUNA","EH":"WESTERN SAHARA","YE":"YEMEN","ZM":"ZAMBIA","ZW":"ZIMBABWE"}


# Connecting to the data source
gc = gspread.login(G_USERNAME, G_PASSWORD)
sht1 = gc.open_by_key(G_IDENTIFIER)
worksheet = sht1.get_worksheet(0)
# Iterates through the XML structure and removes the namespace, for easier navigation in getCountry()'s ElementTree.findall()
def remove_namespace(doc, namespace):
    """Remove namespace in the passed document in place."""
    ns = u'{%s}' % namespace
    nsl = len(ns)
    for elem in doc.getiterator():
        if elem.tag.startswith(ns):
            elem.tag = elem.tag[nsl:]
# getCountry – where the magic happens. Encodes the artist name into the URL, then queries the MusicBrainz search engine,
# parses the XML answer and gets the name, id and country of the first returned element (the one with the highest weight).
# Returns the country name if a) the artist is found through the search engine and b) the artist has a country
# associated with the profile; otherwise returns False.
def getCountry(name):
    name = urllib.quote_plus(name)
    BASE_URL = "http://musicbrainz.org/ws/2/artist/?query=%s&format=xml&method=advanced" % (name)
    print "Querying: %s" % (BASE_URL)
    try:
        search_input = urllib.urlopen(BASE_URL)
        # Checks whether the HTTP response code is 200 – if not, sleeps for 5 seconds and retries // Intended for 503 codes
        http_code = search_input.code
        if http_code != 200:
            # print "Could not access: %s \t Got HTTP Code: %s. 5 second cool-down" % (name, http_code)
            time.sleep(5)
            return getCountry(name)
    except Exception:
        print "GETTING_ERROR: Something went wrong while getting HTTP"
        return False
    #search_xml = search_input.read()
    #print search_xml
    try:
        tree = ET.parse(search_input)
        remove_namespace(tree, u'http://musicbrainz.org/ns/mmd-2.0#')
        feed = tree.getroot()
        elem = feed.findall("./artist-list/")
        #print elem[0].find('name').text
        #print elem[0].get('id')
    except Exception:
        print "PARSE_ERROR: Something went wrong while parsing HTTP"
        return False
    try:
        if elem[0].find('country') != None:
            # print COUNTRIES[elem[0].find('country').text]
            try:
                country = COUNTRIES[elem[0].find('country').text]
            except Exception:
                print "Could not find key in country list"
                return False
            return [country, elem[0].get('id'), elem[0].find('name').text]
        else:
            print elem[0].find('name').text + " has no country associated\n"
            return False
    except (IndexError, ValueError):
        print "ERROR – COULD NOT GET DATA FROM %s\n" % (name)
        return False

#Running through the values
get_val = []
set_name = []
set_country = []
for a in range(2, 5124):
    v = "B%s" % a
    sn = "H%s" % a
    sc = "G%s" % a
    get_val.append(v)
    set_name.append(sn)
    set_country.append(sc)

# The address lists are zero-indexed, so iterate over their length
for i in range(len(get_val)):
    try:
        name = worksheet.acell(get_val[i]).value
        res = getCountry(name)
        if res:
            print res
            country, last_id, name = res
            worksheet.update_acell(set_name[i], name)
            worksheet.update_acell(set_country[i], country)
    except Exception as e:
        print e

[/sourcecode]

Useful Research Tools

Writing a paper can be both a joy and a pain, as writing and editing sessions oscillate between a flow of excellent sentences conveying the propositional content clearly and graciously, and slow, word-by-word typing of sentences that feel neither elegant nor correct nor clear. The tools I have found, and today deeply appreciate, may not help you achieve a perfect state of mind with great ideas, nor a continuous flow of great words, since that is something you have to find for yourself, but the tools introduced below may help you handle your information and your text, and let you spend time on the things that really matter.

Zotero

This online service and Firefox plug-in is a great tool for saving websites and book details for later reference. Since most operating systems support Firefox (I use it with Windows, Mac OS X and Ubuntu Linux) you can bring your references across platforms, and as your data is stored in the cloud you can keep a synchronised list across your computers. You can also visit the website, where you can review, organise and add more content if you are working on a computer without Firefox, the Zotero plug-in or administrative privileges. With Zotero you can also export your bibliography to locally installed software, e.g. BibDesk.

TexShop

LaTeX is a great way of working with larger documents, as it takes care of the layout so you can focus on the content. It takes some time to learn and get used to, but once you have internalised the instructions you will save much time on everything from keeping fonts consistent throughout the document to adding references and citations, as LaTeX will help you do so. TexShop is easy to install through a package named MacTeX. This package is fairly large (1.8 GB) compared to what you would expect if you just want a program to write your 1.5-page papers, but today storage is almost free, and the ease of installation and potential for expansion make this a good package if you are a Mac user and want to create great document layouts where you just have to contribute the great content.
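For a flavour of how little is needed to get a laid-out document with a citation, a minimal LaTeX file looks roughly like this (the citation key and bibliography file name are placeholders):

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}

\begin{document}
As argued by \cite{somekey2012}, layout should not get in the way of writing.

% 'references' points to a references.bib file in the same folder
\bibliographystyle{plain}
\bibliography{references}
\end{document}
```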

BibDesk

If you are using both Zotero and TexShop, or similar software for reference aggregation and LaTeX authoring, there is no reason not to combine the power of the two. With BibDesk you can keep track of your references and make them accessible to LaTeX. Since both the text and the bibliography can be written in any text editor, BibDesk is a tool for making the structured plain-text BibTeX format more organised for human use. The program lets you define the source type and fill in the relevant fields for each source, which can later be processed automatically into a bibliography. Neat and time-saving.
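For reference, a BibTeX entry of the kind BibDesk manages looks like this; the fields vary with the source type you choose:

```bibtex
@book{sennett2008,
  author    = {Richard Sennett},
  title     = {The Craftsman},
  publisher = {Yale University Press},
  year      = {2008}
}
```

In the LaTeX document you would then refer to it with \cite{sennett2008}.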

Dropbox

You probably know that you should take back-ups, and with good reason. Even if nothing dramatic happens, a computer hard drive, at least a mechanical one, will eventually break down, and at that point you will wish you had your data saved somewhere else as well. Back-up was a lot of work when Zip, Jaz or floppy disks were the storage media, but with an increasing number of cloud services back-ups can be made in seconds and without leaving the office chair. Dropbox is a great tool for sharing data between computers and people, but also a good back-up tool. There are plenty of similar services competing directly with Dropbox or covering other uses (I use Google's Picasa to back up my photos), but few have user interfaces and tailored plug-ins similar to Dropbox's. These are very useful. Dropbox also expands the size of your account as you recruit new members, which may encourage you to use its ingenious sharing capabilities.

Google Docs

Google Docs is a great tool. I personally do not prefer to write in the Google Docs web app, but for sharing documents, and even more when collaborating in real time, this free office tool is practical. Think of Google Docs as a web-based office suite. You can work together with others in real time manipulating documents, and you can export these documents into most recognised file formats, if you do not prefer to just share a link to the document, as Google Docs supports a clever permission system. Google made web apps a new standard with the Ajax capabilities in Google Maps, and with Google Docs they show that programs mostly thought of as "native" can be brought into the web domain. With the recent developments in HTML5 technologies we can expect to see even more functionality being pushed into the browser.

Transcriber 

From time to time you find yourself in a situation where you need to interview someone. This is a great way of getting second-hand experience and of asking your questions directly to the interviewee, but do not get tricked: you may feel that you can remember most of the conversation as you sit there nodding your head, but you will discover that pure memory or simple notes on a piece of paper will not suffice. A good idea is to bring a recorder and ask permission to record the conversation; as long as your intention is to have an accurate record of what took place during the interview for when you write, and not to publish the material, this should be OK. Transcribing the interview can be a time-consuming task, but software intended for the purpose can make it easier. Transcriber is an open-source tool which lets you play the sound file and write down what is being said at various places in the file. This linking is good, as you can later go back and revise a particular location in the sound file without having to search through all of it. It also makes transcribing easier, as you can both write and start and stop the sound clip from the same program.

FocusWriter

I got to know about this program through Joe (thank you, Joe) one late evening working in Alison House. Most of the time in front of computers at our master's working environment has been spent on Macs, and perhaps the biggest difference, now that Apple is using Intel processors and much of the same hardware as Windows-based PCs, is the user experience and the focus on the human touch. Perhaps more experimental ethnomethodology is conducted in Cupertino than in Redmond, perhaps it is different target groups, or just different ideas about what computers are and how they should work. Anyway, FocusWriter is a different kind of text editor.


Here is a short introduction to some of the programs which made my days easier when I wrote my master's thesis. I cannot promise that these tools will write your thesis, or make your day, but hopefully they can help you pay attention where it ought to be paid: to your tasks. PS: I didn't put in links to any of the tools (except my favourite distribution package of LaTeX), so you can use perhaps the most idiosyncratic tool of this decade: Google.


The picture is taken by the Flickr user Anoldent and released under a Creative Commons Attribution Share-Alike licence.