Browsed by
Tag: spreadsheet

Work programmatically with Google Spreadsheets Part 2

Work programmatically with Google Spreadsheets Part 2

A while back I wrote a short post on how you can write and read to Google Spreadsheets programmatically using Python and the package ‘gspread’.

Last time the reading was done by first creating arrays with the addresses to where the values could be found in the spreadsheet, and then run through all the values and replace the addresses with the values. It worked fine, but it’s not best practice or very efficient as it makes many single requests on the API. In part two, I will share a short tip on how to read the values in one go instead of iterating through a range of values.

Here is the excerpt dealing with retrieving values. (NB: see original blogpost for gspread initialization).

#Running through the values
get_val = []
set_name = []
set_country = []
for a in range(2,5124):
v = "B%s" % a
sn = "H%s" % a
sc = "G%s" % a
get_val.append(v)
set_name.append(sn)
set_country.append(sc)

for a in range(2,5124):
try:
name = worksheet.acell(get_val[a]).value
res = getCountry(name)
if res:
print res
country, last_id, name = res
worksheet.update_acell(set_name[a], name)
worksheet.update_acell(set_country[a], country)
except Exception as e:
print e

In a recent script we only wanted to download values from a Google spreadsheet (yes, we could have exported the files to .csv with similar result, but with a script we may expand and parse if needed), and this gave some time for refactoring the code as well.

The gspread function worksheet.get_all_values() returns a list of lists with the values. The outer list contains the rows, and the row list contains the specific value of the column at the numerical value for the column. In this example num_streams is the second column, and the position is hence [1] as the list starts at zero.

Also note the nifty way of writing utf-8 formatted strings to the file. UTF-8 can often cause an headache, but put a “u”-character before the string and open the stream with codecs.open(“filename”,”mode”,”encoding”).

The new way of retrieving data from a Google Docs Spreadsheet:

# -*- coding: UTF-8 -*-
import gspread
import codecs

# Global variables for accessing resource
G_USERNAME = 'user_email'
G_PASSWORD = 'password'
G_IDENTIFIER = 'spreadsheet_identifier'

# Connecting to the data source
gc = gspread.login(G_USERNAME,G_PASSWORD)
sht1 = gc.open_by_key(G_IDENTIFIER)
worksheet = sht1.get_worksheet(0)

all_val = worksheet.get_all_values()

output = codecs.open('output_norwegian_artists.csv','wb', "utf-8-sig")

for l in all_val:
num_streams, artistid, name = (l[1],l[2],l[3])
norwegian = l[4]
if len(norwegian) < 3:
norwegian = 'NULL'

string = u"{0};{1};{2};{3}\n".format(num_streams, artistid, name, norwegian)
output.write(string)

output.close()

 

Picture licensed under a creative commons attribution license by the brilliant organization Open Knowledge Foundation. Picture retrieved through a CC-search on Flickr

Work programmatically with Google Spreadsheets

Work programmatically with Google Spreadsheets

Some time back I authored a script which reads through a CSV formatted list, and based on the artists’ names tried to decide the nationality of the artists by querying the last.FM search engine and parsing the XML structured result.

The script worked, and found and returned about 80% of the artists, and around 80% of these again had the conceptually similar artist. What if the alteration could be done in the document itself? For the CSV based script the artists had to be extracted from the database, then parsed and then put into the document. With this attempt we can skip the middle step, and let the script run simultaneous with data being available to view for the users.

If you haven’t already used Google Docs, you should consider it, as it is a very convenient way of working with documents, especially where there are more users. Through Google Docs you can collaborate on working on documents, and since they are stored in clouds all users will instantly have the latest version. Another advantage is that whey you are working on the document simultaneously each user is editing the same document, so merging different versions is not a problem.

A great tool for combining Google Spreadsheets with the Python programming environment is the gspread module. After installing this package you only need to import the code and then you can with very few lines of code retrieve and update information in the spreadsheet cells.

The specifics for doing this task is pretty much these few lines. Import the package. Login, find correct document and do whatever you need to. (this excerpt won’t work. Check the whole script under – NB: mind the indentation. It may not be correct displayed in the browser.)

[sourcecode language=”python”]
import gspread

G_USERNAME = ‘your@gmail.com’
G_PASSWORD = ‘yourPassword’
G_IDENTIFIER = ‘document_identifier_checkdocument_url_in_your_browser’

# Connecting to the data source
gc = gspread.login(G_USERNAME,G_PASSWORD)
sht1 = gc.open_by_key(G_IDENTIFIER)
worksheet = sht1.get_worksheet(0)

for a in range(2,5124):
try:
name = worksheet.acell(get_val[a]).value
res = getCountry(name)
if res:
print res
country, last_id, name = res
worksheet.update_acell(set_name[a], name)
worksheet.update_acell(set_country[a], country)
except Exception as e:
print e

[/sourcecode]

Above is the lines related to connecting Python to the google docs, under you can see the whole script and how the method I mentioned in a post earlier is used in this setting.

[sourcecode language=”python”]
#!/usr/bin/python
# -*- coding:utf-8 -*-

"""
Clouds & Concerts – 2012
Ola Loevholm

Initialized from the commandline. Runs through The Google doc spreadsheet with topp 5000 artists, and
runs the parsing query against the Last.FM browser then enters the country and search string (for validation)
into the google docs.

"""

G_USERNAME = ‘your@gmail.com’
G_PASSWORD = ‘yourPassword’
G_IDENTIFIER = ‘document_identifier_checkdocument_url_in_your_browser’

import sys, urllib, string, csv, time
import xml.etree.ElementTree as ET
import gspread

# Loads a dictionary with ISO 3166-1 abbreviations and codes
COUNTRIES = {"AF":"AFGHANISTAN","AX":"ÅLAND ISLANDS","AL":"ALBANIA","DZ":"ALGERIA","AS":"AMERICAN SAMOA","AD":"ANDORRA","AO":"ANGOLA","AI":"ANGUILLA","AQ":"ANTARCTICA","AG":"ANTIGUA AND BARBUDA","AR":"ARGENTINA","AM":"ARMENIA","AW":"ARUBA","AU":"AUSTRALIA","AT":"AUSTRIA","AZ":"AZERBAIJAN","BS":"BAHAMAS","BH":"BAHRAIN","BD":"BANGLADESH","BB":"BARBADOS","BY":"BELARUS","BE":"BELGIUM","BZ":"BELIZE","BJ":"BENIN","BM":"BERMUDA","BT":"BHUTAN","BO":"BOLIVIA, PLURINATIONAL STATE OF","BQ":"BONAIRE, SINT EUSTATIUS AND SABA","BA":"BOSNIA AND HERZEGOVINA","BW":"BOTSWANA","BV":"BOUVET ISLAND","BR":"BRAZIL","IO":"BRITISH INDIAN OCEAN TERRITORY","BN":"BRUNEI DARUSSALAM","BG":"BULGARIA","BF":"BURKINA FASO","BI":"BURUNDI","KH":"CAMBODIA","CM":"CAMEROON","CA":"CANADA","CV":"CAPE VERDE","KY":"CAYMAN ISLANDS","CF":"CENTRAL AFRICAN REPUBLIC","TD":"CHAD","CL":"CHILE","CN":"CHINA","CX":"CHRISTMAS ISLAND",
"CC":"COCOS (KEELING) ISLANDS","CO":"COLOMBIA","KM":"COMOROS","CG":"CONGO","CD":"CONGO, THE DEMOCRATIC REPUBLIC OF THE","CK":"COOK ISLANDS","CR":"COSTA RICA","CI":"CÔTE D’IVOIRE","HR":"CROATIA","CU":"CUBA","CW":"CURAÇAO","CY":"CYPRUS","CZ":"CZECH REPUBLIC","DK":"DENMARK","DJ":"DJIBOUTI","DM":"DOMINICA","DO":"DOMINICAN REPUBLIC","EC":"ECUADOR","EG":"EGYPT","SV":"EL SALVADOR","GQ":"EQUATORIAL GUINEA","ER":"ERITREA","EE":"ESTONIA","ET":"ETHIOPIA","FK":"FALKLAND ISLANDS (MALVINAS)","FO":"FAROE ISLANDS","FJ":"FIJI","FI":"FINLAND","FR":"FRANCE","GF":"FRENCH GUIANA","PF":"FRENCH POLYNESIA","TF":"FRENCH SOUTHERN TERRITORIES","GA":"GABON","GM":"GAMBIA","GE":"GEORGIA","DE":"GERMANY","GH":"GHANA","GI":"GIBRALTAR","GR":"GREECE","GL":"GREENLAND","GD":"GRENADA","GP":"GUADELOUPE","GU":"GUAM","GT":"GUATEMALA","GG":"GUERNSEY","GN":"GUINEA","GW":"GUINEA-BISSAU","GY":"GUYANA","HT":"HAITI","HM":"HEARD ISLAND AND MCDONALD ISLANDS",
"VA":"HOLY SEE (VATICAN CITY STATE)","HN":"HONDURAS","HK":"HONG KONG","HU":"HUNGARY","IS":"ICELAND","IN":"INDIA","ID":"INDONESIA","IR":"IRAN, ISLAMIC REPUBLIC OF","IQ":"IRAQ","IE":"IRELAND","IM":"ISLE OF MAN","IL":"ISRAEL","IT":"ITALY","JM":"JAMAICA","JP":"JAPAN","JE":"JERSEY","JO":"JORDAN","KZ":"KAZAKHSTAN","KE":"KENYA","KI":"KIRIBATI","KP":"KOREA, DEMOCRATIC PEOPLE’S REPUBLIC OF","KR":"KOREA, REPUBLIC OF","KW":"KUWAIT","KG":"KYRGYZSTAN","LA":"LAO PEOPLE’S DEMOCRATIC REPUBLIC","LV":"LATVIA","LB":"LEBANON","LS":"LESOTHO","LR":"LIBERIA","LY":"LIBYA","LI":"LIECHTENSTEIN","LT":"LITHUANIA","LU":"LUXEMBOURG","MO":"MACAO","MK":"MACEDONIA, THE FORMER YUGOSLAV REPUBLIC OF","MG":"MADAGASCAR","MW":"MALAWI","MY":"MALAYSIA","MV":"MALDIVES","ML":"MALI","MT":"MALTA","MH":"MARSHALL ISLANDS","MQ":"MARTINIQUE","MR":"MAURITANIA","MU":"MAURITIUS","YT":"MAYOTTE","MX":"MEXICO","FM":"MICRONESIA, FEDERATED STATES OF",
"MD":"MOLDOVA, REPUBLIC OF","MC":"MONACO","MN":"MONGOLIA","ME":"MONTENEGRO","MS":"MONTSERRAT","MA":"MOROCCO","MZ":"MOZAMBIQUE","MM":"MYANMAR","NA":"NAMIBIA","NR":"NAURU","NP":"NEPAL","NL":"NETHERLANDS","NC":"NEW CALEDONIA","NZ":"NEW ZEALAND","NI":"NICARAGUA","NE":"NIGER","NG":"NIGERIA","NU":"NIUE","NF":"NORFOLK ISLAND","MP":"NORTHERN MARIANA ISLANDS","NO":"NORWAY","OM":"OMAN","PK":"PAKISTAN","PW":"PALAU","PS":"PALESTINIAN TERRITORY, OCCUPIED","PA":"PANAMA","PG":"PAPUA NEW GUINEA","PY":"PARAGUAY","PE":"PERU","PH":"PHILIPPINES","PN":"PITCAIRN","PL":"POLAND","PT":"PORTUGAL","PR":"PUERTO RICO","QA":"QATAR","RE":"RÉUNION","RO":"ROMANIA","RU":"RUSSIAN FEDERATION","RW":"RWANDA","BL":"SAINT BARTHÉLEMY","SH":"SAINT HELENA, ASCENSION AND TRISTAN DA CUNHA","KN":"SAINT KITTS AND NEVIS","LC":"SAINT LUCIA","MF":"SAINT MARTIN (FRENCH PART)","PM":"SAINT PIERRE AND MIQUELON","VC":"SAINT VINCENT AND THE GRENADINES",
"WS":"SAMOA","SM":"SAN MARINO","ST":"SAO TOME AND PRINCIPE","SA":"SAUDI ARABIA","SN":"SENEGAL","RS":"SERBIA","SC":"SEYCHELLES","SL":"SIERRA LEONE","SG":"SINGAPORE","SX":"SINT MAARTEN (DUTCH PART)","SK":"SLOVAKIA","SI":"SLOVENIA","SB":"SOLOMON ISLANDS","SO":"SOMALIA","ZA":"SOUTH AFRICA","GS":"SOUTH GEORGIA AND THE SOUTH SANDWICH ISLANDS","SS":"SOUTH SUDAN","ES":"SPAIN","LK":"SRI LANKA","SD":"SUDAN","SR":"SURINAME","SJ":"SVALBARD AND JAN MAYEN","SZ":"SWAZILAND","SE":"SWEDEN","CH":"SWITZERLAND","SY":"SYRIAN ARAB REPUBLIC","TW":"TAIWAN, PROVINCE OF CHINA","TJ":"TAJIKISTAN","TZ":"TANZANIA, UNITED REPUBLIC OF","TH":"THAILAND","TL":"TIMOR-LESTE","TG":"TOGO","TK":"TOKELAU","TO":"TONGA","TT":"TRINIDAD AND TOBAGO","TN":"TUNISIA","TR":"TURKEY","TM":"TURKMENISTAN","TC":"TURKS AND CAICOS ISLANDS","TV":"TUVALU","UG":"UGANDA","UA":"UKRAINE","AE":"UNITED ARAB EMIRATES","GB":"UNITED KINGDOM","US":"UNITED STATES",
"UM":"UNITED STATES MINOR OUTLYING ISLANDS","UY":"URUGUAY","UZ":"UZBEKISTAN","VU":"VANUATU","VE":"VENEZUELA, BOLIVARIAN REPUBLIC OF","VN":"VIET NAM","VG":"VIRGIN ISLANDS, BRITISH","VI":"VIRGIN ISLANDS, U.S.","WF":"WALLIS AND FUTUNA","EH":"WESTERN SAHARA","YE":"YEMEN","ZM":"ZAMBIA","ZW":"ZIMBABWE"}

 

# Connecting to the data source
gc = gc = gspread.login(G_USERNAME,G_PASSWORD)
sht1 = gc.open_by_key(G_IDENTIFIER)
worksheet = sht1.get_worksheet(0)
# Iterates through XML-structure and removes the namespace, for easier navigation in getCountry()s ElementTree.findall()
def remove_namespace(doc, namespace):
"""Remove namespace in the passed document in place."""
ns = u'{%s}’ % namespace
nsl = len(ns)
for elem in doc.getiterator():
if elem.tag.startswith(ns):
elem.tag = elem.tag[nsl:]
# getCountry – where the magic happens. Encodes string with artistname to url, then query musicbrainz search engine.
# parses the XML-answer and get the name, id and country of the first returned element (with highest weight)
# returns country name i a) artist is found through the search engine b) artist has a country associated to the profile, otherwise returns False
def getCountry(name):
name = urllib.quote_plus(name)
BASE_URL = "http://musicbrainz.org/ws/2/artist/?query=%s&format=xml&method=advanced" % (name)
print "Querying: %s" % (BASE_URL)
try:
search_input = urllib.urlopen(BASE_URL)
# Checks whether HTTP Request Code is 200 – if not goes to sleep for 5 seconds // Inded for 503 Code
http_code = search_input.code
if http_code != 200:
# print "Could not access: %s \t Got HTTP Code: %s. 5 second cool-down" % (name, http_code)
time.sleep(5)
getCountry(name)
except Exception:
print "GETTING_ERROR: Something went wrong while getting HTTP"
return False
#search_xml = search_input.read()
#print search_xml
try:
tree = ET.parse(search_input)
remove_namespace(tree, u’http://musicbrainz.org/ns/mmd-2.0#’)
feed = tree.getroot()
elem = feed.findall("./artist-list/")
#print elem[0].find(‘name’).text
#print elem[0].get(‘id’)
except Exception:
print "PARSE_ERROR: Something went wrong while parsing HTTP"
return False
try:
if elem[0].find(‘country’) != None:
# print COUNTRIES[elem[0].find(‘country’).text]
try:
country = COUNTRIES[elem[0].find(‘country’).text]
except Exception:
print "Could not find key in countrylist error"
return False
return [country,elem[0].get(‘id’),elem[0].find(‘name’).text]
else:
print elem[0].find(‘name’).text + " has not any country associated\n"
return False
except IndexError, ValueError:
print "ERROR – COULD NOT GET DATA FROM %s\n" % (name)
return False

#Running through the values
get_val = []
set_name = []
set_country = []
for a in range(2,5124):
v = "B%s" % a
sn = "H%s" % a
sc = "G%s" % a
get_val.append(v)
set_name.append(sn)
set_country.append(sc)

for a in range(2,5124):
try:
name = worksheet.acell(get_val[a]).value
res = getCountry(name)
if res:
print res
country, last_id, name = res
worksheet.update_acell(set_name[a], name)
worksheet.update_acell(set_country[a], country)
except Exception as e:
print e

[/sourcecode]