Useful Research Tools

Writing a paper can be both a joy and a pain, as the writing and editing sessions oscillate between a flow of excellent sentences conveying the propositional content clearly and gracefully, and slow, word-by-word typing of sentences that feel neither elegant, correct nor clear. The tools I have found, and today deeply appreciate, will not give you a perfect state of mind with great ideas, nor a continuous flow of great words (that is something you have to find for yourself), but the tools introduced below may help you handle your information and your text, and let you spend time on the things that really matter.


Zotero

This online service and Firefox plug-in is a great tool for saving websites and book details for later reference. Since most operating systems support Firefox (I use it on Windows, Mac OS X and Ubuntu Linux) you can bring your references with you across platforms, and as your data is stored in the cloud you can keep a synchronised list across your computers. You can also visit the website to review, organise and add more content if you are working on a computer without Firefox, the Zotero platform or administrative privileges. With Zotero you can also export your bibliography to locally installed software, e.g. BibDesk.


LaTeX and TexShop

LaTeX is a great way of working with larger documents, as it takes care of the layout so you can focus on the content. It takes some time to learn and get used to, but once you have internalised the instructions you will save much time on everything from keeping fonts consistent throughout the document to adding references and citations. TexShop is easy to install through a package named MacTeX. This package is fairly large (1.8 GB) compared to what you would expect if you just want a program for writing 1.5-page papers. Today, however, storage is almost free, and the ease of installation and the potential for expansion make this a good package if you are a Mac user and want great document layout where you only have to contribute the content.
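As a taste of what working in LaTeX looks like, here is a minimal document; the class and package choices below are just common defaults, not the only options:

```latex
\documentclass[12pt]{article}
\usepackage[utf8]{inputenc}

\title{Useful Research Tools}
\author{A. Student}

\begin{document}
\maketitle

\section{Introduction}
LaTeX handles section numbering, cross-references and layout,
so the source file stays focused on the content itself.

\end{document}
```

Compiling this file (for instance by pressing Typeset in TexShop) produces a fully laid-out PDF with the title block and numbered section generated automatically.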


BibDesk

If you are using both Zotero and TexShop, or similar software for reference aggregation and LaTeX writing, there is no reason not to combine the power of the two. With BibDesk you can keep track of which references you have and make them accessible to LaTeX. Since both the text and the bibliography can be written in any text editor, BibDesk is essentially a tool for making the BibTeX format (a structured plain-text format, somewhat like JSON) easier for humans to work with. The program lets you define the source type and fill in the relevant fields for each source, which can later be processed automatically into a bibliography. Neat and time-saving.
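For illustration, a record managed in BibDesk is stored as a plain BibTeX entry like the one below (the details are invented for the example), which can then be cited from the LaTeX source:

```latex
% A BibTeX record as managed by BibDesk (details invented for the example):
@book{shirky2008,
  author    = {Shirky, Clay},
  title     = {Here Comes Everybody},
  publisher = {Penguin},
  year      = {2008}
}

% In the LaTeX document you would then write something like:
%   ... as Shirky argues \cite{shirky2008} ...
%   \bibliography{references}
% and BibTeX formats the bibliography for you.
```

The point of the tool is that you fill in labelled fields in a form instead of hand-editing this syntax, while LaTeX still gets the plain-text file it expects.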


Dropbox

You probably know that you should take back-ups, and with good reason. Even if nothing dramatic happens, a computer hard drive, at least a mechanical one, will eventually break down, and at that point you will wish you had your data saved somewhere else as well. Back-up was a lot of work when Zip, Jaz or floppy disks were the storage media, but with the increasing number of cloud services, back-ups can be made in seconds without leaving the office chair. Dropbox is a great tool for sharing data between computers and people, but also a good back-up tool. There are plenty of similar services competing directly with Dropbox or targeting other uses (I use Google's Picasa to back up my photos), but few have a user interface and tailored plug-ins like Dropbox's. These are very useful. Dropbox also expands the size of your account as you recruit new members, which may encourage you to use its ingenious sharing capabilities.

Google Docs

Google Docs is a great tool. I personally do not prefer to write in the Google Docs web app, but for sharing documents, and even more for collaborating in real time, this free office tool is practical. Think of Google Docs as a web-based office suite: you can work on documents together with others in real time, and you can export these documents to most recognised file formats, or simply share a link to the document, as Google Docs supports a clever permission system. Google made web apps a new standard with the Ajax capabilities of Google Maps, and with Google Docs they show that programs mostly thought of as "native" can be brought into the web domain. With the recent developments in HTML5 technologies we can expect to see even more functionality pushed into the browser.


Transcriber

From time to time you find yourself in a situation where you need to interview someone. This is a great way of getting second-hand experience and of putting your questions directly to the interviewee, but do not be tricked: while sitting in the situation nodding your head you may feel that you will remember most of the conversation, but you will discover that pure recollection, or simple notes on a piece of paper, will not suffice. A good idea is to bring a recorder and ask permission to record the conversation; as long as your intention is to have a correct record of what took place during the interview for your writing, and not to publish the material, this should be OK. Transcribing the interview can be a time-consuming task, but software intended for the purpose can make it easier. Transcriber is an open-source tool which lets you play the sound file and write down what is being said at various places in the file. This linking is useful, as you can later go back and revise a particular location in the sound file without having to search through all of it. It also makes transcription easier, as you can write, and start and stop the sound clip, from the same program.


FocusWriter

I got to know about this program through Joe (thank you, Joe) one late evening working in Alison House. Most of the time spent in front of computers in our master's working environment has been spent on Macs, and perhaps the biggest difference, now that Apple is using Intel processors and much of the same hardware as Windows-based PCs, is the user experience and the focus on the human touch. Perhaps more experimental ethnomethodology is conducted in Cupertino than in Redmond, perhaps it is a matter of different target groups, or just different ideas about what computers are and how they should work. In any case, FocusWriter is a different kind of text editor.


Here is a short introduction to some of the programs that made my days easier while I wrote my master's thesis. I cannot promise that these tools will write your thesis, or make your day, but hopefully they can help you pay attention where it ought to be paid: to your tasks. PS: I did not put in links to any of the tools (except my favourite LaTeX distribution), so you can use perhaps the most idiosyncratic tool of this decade: Google.


The picture is taken by the Flickr user Anoldent and released under a Creative Commons attribution share-alike licence.


Open Data Phenomenon: Trend Literature

On Tuesday it is just one month until the master's thesis is to be handed in; hence the next weeks will be spent in front of the keyboard, writing down findings from literature and interviews, editing the text, making connections, and hopefully completing a text that can convey something interesting about what I perceive as an interesting trend in society.

At the current stage the chapter headings are written in section and subsection markup, but they are not written in stone. Trying to unravel a relatively new phenomenon is hard, especially when the intersections between disciplines, methodology, theories and praxis are so varied. Since 2009 a new trend has emerged in which governments share raw data from the public sector with the public sphere. How did this trend emerge? How does it work? What are the motivations, and how are the data used? In which context does this trend emerge? Many questions are to be asked, and hopefully the research and literature review will yield a descriptive title.

I want to divide the literature I have read into specific and general literature. The general literature is the many books that write about broad trends in society related to what I hope to understand; the specific literature, on the other hand, is employed to answer smaller and more concentrated questions. After conducting one of the interviews I reviewed Edward Tufte's The Visual Display of Quantitative Information. I think this book holds a key function in understanding computer visualisations, a popular use of open public data, but it cannot help me understand how, why and by which means the phenomenon came into existence. Books with a general scope are those of Clay Shirky: both Here Comes Everybody and Cognitive Surplus seem to describe a general trend in our current society, and perhaps it is in the same ideological field that the interest in using public data starts. The alteration of our spare-time habits, the consumer/producer dichotomy and the abolition of self-supporting organisational structures are very interesting theories, and may explain phenomena such as Twitter, Wikipedia and Flickr, but can they also explain an "unprecedented level of openness in Government", to cite President Barack Obama? In the same category of thought we also find Jeff Howe's book on crowdsourcing, titled after its subject of discussion, and Charles Leadbeater's We Think, focusing on the rise of mass creativity. These books exemplify and describe the rise of the entrepreneurial web culture, both for profit and non-profit, intentionally or not. Perhaps these books can explain the 2003 EU directive encouraging governments to make their information available for re-use, as the axiom of crowds and content is that unexpected value creation will eventually happen.

The culture of the Internet is important. Not only the culture as in the content provided by companies addressing their old customers in a new arena, but also the inherent culture of the Internet: the techno-meritocratic, the hacker, the communitarian and the entrepreneurial cultures, as described by Manuel Castells in The Internet Galaxy. Perhaps the openness of the hacker culture is gaining prevalence at the expense of the entrepreneurs, or have the two merged? Bill Gates's open letter to hobbyists seems to have been written a long time ago considering today's remix culture, where the software companies developing operating systems provide distribution platforms for smaller apps developed by everyone from gigantic organisations to independent developers. For the mass market this came with the App Store on the iPhone, and later on Mac OS X, while Google has provided the Android Market for Android users. (For the hacker community, apt-like services have been available for longer, and with open-source code repositories such as Git and Subversion the newest versions have been available to download and compile for free.) Interestingly, governmental data can also be used for free, as in free speech, by people who intend to make a profit on services in which the data are included. An established rule in open-source development and testing is that "given enough eyeballs, all bugs are shallow": Linus's Law, named after the father of the Linux kernel and bazaar development by Eric S. Raymond in his book The Cathedral and the Bazaar. If source code is available, why shouldn't governmental information be? The whistle-blower website Wikileaks got much attention after releasing US diplomatic cables and after the arrest of its leader Julian Assange, formerly known as Mendax. The hacker culture is not about what the culture itself would define as cracking, or in most cases disillusioned script kiddies; it is about making changes to and improving software, and about learning the craft of computing.
Could a parallel be drawn to society, or to knowledge in general? Does FOI, Freedom of Information, have something in common with the famous quote from Emacs programmer and free-software guru Richard Stallman: "free as in free speech, not as in free beer"? In his book on Wikileaks, WikiLeaks and the Age of Transparency, Micah Sifry dedicates a chapter to the open government and open government data trend. If you are sceptical of transparency, or want a good reflection on why it is not all good, I can recommend Lawrence Lessig's article "Against Transparency" in The New Republic.

Governments sharing their information online is not only interesting for the hacker community, or for former consumers now emancipated from the fetters of Falcon Crest; it also adds up to a large body of data that can be used in what is considered the next major iteration of the World Wide Web: the semantic web. Sir Tim Berners-Lee, the father of the World Wide Web, has in an online draft stipulated a rating system for how data can contribute to a web of data. The web as we know it is made up of documents linked through URLs, but what if it could also contain data? If you want to see how documents and data function differently, try comparing Google to Wolfram Alpha: the latter is a search engine presenting results based on data, while Google's PageRank engine gives you results according to the textual information on a page and its keywords.

If you are interested in the subject, I have added my literature list below. Please note that web links are not included in these files. If you have any books to recommend that are not on the list, please post the references in the comment field.


Literature as of 18 July 2011 (PDF)


Literature as of 18 July 2011 (BibTeX, zipped)


The illustrative photo, in which none of the books are those mentioned in this chapter, is licensed under Creative Commons by Sanford Kearns and was found on Flickr. Please refer to the link for more information.


Last week Prime Minister David Cameron announced that more governmental information produced by the UK public sector will be opened to the public. According to the Guardian, this will make the UK repository the largest in the world, exceeding the US repository and equivalents in other countries.

I decided two months ago to take a closer look at the emerging trend of governments sharing their data with the public, and yesterday's statement adds to my impression of the importance this process has gained. Several countries and organisations are now sharing data they have generated and aggregated, through governmental repositories, with the public sphere.

In a letter on transparency and open data sent to the ministers of the Cabinet, the Prime Minister writes, addressing the secretaries of state:

As you know, transparency is at the heart of our agenda for Government. We recognise that transparency and open data can be a powerful tool to help reform public services, foster innovation and empower citizens. We also understand that transparency can be a significant driver of economic activity, with open data increasingly enabling the creation of valuable new services and applications. (The whole document containing a list of public data made available can be found here.)

The wording of this introductory excerpt is similar to that used by President Barack Obama in his memorandum on Transparency and Open Government, sent to the heads of executive departments and agencies, in which he states that government should be transparent, participatory and collaborative, and instructs the departments and agencies to adopt these principles.

The goal of the governmental data repositories is to make open, high-value data produced in the public sector available to the public, but what do open and high-value mean?


Open Data

Here it can be useful to make a distinction between public and open data. Open data can be considered data fulfilling the list below, and can come from any entity. Public data, however, has two meanings: it can be either data released into the public sphere or data produced and aggregated by the public sector. I will as far as possible try to keep the distinction clear by spelling out whether the public data is produced by the public sector or released to the public sphere. The relationship can also be reciprocal, when the public sphere produces data which is later used by the public sector, as in FixMyStreet, developed by mySociety.

What is open data, and what delineates open from closed data? Is it enough for the data to be accessible for the public to see, or do other factors also play a part? To decide what should be considered open data we need to set a threshold. Definitions are not neutral, and they bring implications with them.

The Open Knowledge Foundation has, on the site Open Definition, defined what should be considered open. The Open Knowledge Definition contains an eleven-point list of what characterises open knowledge. This list is influenced by, and hence similar to, the Open Source Definition created by the Open Source Initiative, and inherits much of the ideology and principles behind the open-source movement which evolved and manifested itself in computer culture. Whether all, fewer or more criteria need to be fulfilled for an element of knowledge to be considered open is a normative question, but for practical reasons let us accept the Open Knowledge Definition as our definition.

Absence of Technological Restriction
No Discrimination Against Persons or Groups
No Discrimination Against Fields of Endeavour
Redistribution of Licence
Licence Must Not Be Specific to a Package
Licence Must Not Restrict the Distribution of Other Works

The data has to be accessible to all, and should be published under a licence granting the user the right to redistribute and build upon the data. The licence should not be discriminatory, and should make it possible for the user to put the data to any purpose. Richard Stallman, developer of Emacs and the man behind the GNU General Public Licence and the Free Software Foundation, is famous for a quote summarising what should be considered open by comparing two senses of free: free as in free speech, not as in free beer.

An important motivation for the release of data to the public sphere is the idea that data can be a valuable asset in the hands of the public. In Directive 2003/98/EC on the re-use of public sector information, the European Union sets guidelines asking member states to make public sector information available.


Semantic data

Another important aspect of the release of open governmental data is that it has to be readable by both computers and humans. This may seem mundane, but if data are released as images, or embedded in Portable Document Format documents or Flash, the data are not machine-readable, or it would take an unnecessary effort to screen-scrape them from the site; on the other hand, if data are stored only in a binary format they are not directly readable by humans.

The organisation working on developing standards and leading the web to its full potential, the W3C (World Wide Web Consortium), has suggested several ways of making data available to the public. It encourages governments to enrich their online presence with semantics, meta-data and identifiers, to open data in open formats and industry standards, especially XML, and to allow the information to be cited electronically.

In addition to XML- and HTML-based files, many of the services are based on comma- or tab-separated lists for static data, and on application programming interfaces for data that are frequently updated. Data sharers are also encouraged to release their data with semantics and relevant meta-data. This is important for placing the data in an understandable context, and for practical reasons such as ensuring more accurate search-engine results or improving intelligent applications' understanding of the data. This feeds into the idea of a semantic web, where computers are more aware of the semantic content of the data they represent. By marking data up with semantic notation such as the Resource Description Framework or the Web Ontology Language, developers can use data-mining techniques to find connections between data-sets by applying computer intelligence.
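As a toy illustration of the semantic-web idea just described, statements can be expressed as (subject, predicate, object) triples in the spirit of RDF; the data-set names below are made up for the example:

```python
# Statements expressed as (subject, predicate, object) triples,
# in the spirit of RDF. The data-set names are invented.
triples = [
    ("dataset:health-spending", "publishedBy", "Department of Health"),
    ("dataset:health-spending", "format", "CSV"),
    ("dataset:school-results", "publishedBy", "Department for Education"),
    ("dataset:school-results", "format", "CSV"),
]

def objects(subject, predicate):
    """All objects matching a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# With data expressed this way, a program can answer simple questions
# without knowing anything about the individual data-sets in advance:
print(objects("dataset:health-spending", "publishedBy"))
```

Real RDF adds globally unique identifiers (URIs) and shared vocabularies on top of this basic triple structure, which is what lets independent data-sets be connected to each other.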

A five-star scheme has been developed to rank the semantic-web value of data-sets and their meta-information according to how they have been released on the web. The data gets one star as soon as it is released under an open licence. To get two stars the data has to be made available in a machine-readable, structured format, and if this format is non-proprietary, e.g. CSV or XML instead of Excel, the set is ranked three stars. The last two stars are reserved for data that fulfil the criteria for three stars and in addition carry semantic markers: the four- and five-star evaluation is based on whether the data is placed in the context of other people's data, and to gain the top score the data has to be linked into that context (T. B. Lee, Linked Data).
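The cumulative logic of the scheme can be sketched as a small function; this is an illustrative reading of the five-star ladder as described above, not an official tool:

```python
# Illustrative sketch of Berners-Lee's five-star ladder for open data.
# Each star presupposes the previous one, so the checks are nested.
def five_star_rating(open_licence, machine_readable=False,
                     non_proprietary=False, uses_semantic_markers=False,
                     linked_to_other_data=False):
    """Return 0-5 stars for a data-set release."""
    if not open_licence:
        return 0                      # closed data earns no stars
    stars = 1                         # open licence: one star
    if machine_readable:
        stars = 2                     # structured, e.g. Excel rather than a scanned PDF
        if non_proprietary:
            stars = 3                 # open format, e.g. CSV or XML
            if uses_semantic_markers:
                stars = 4             # semantic markers, e.g. RDF
                if linked_to_other_data:
                    stars = 5         # placed in the context of other data
    return stars

print(five_star_rating(True, True, True))   # a CSV file under an open licence
```

For example, a spreadsheet published as CSV under an open licence scores three stars, while the same data would drop to two stars if only released as a proprietary Excel file.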


Governmental Data Repositories

The data repositories of the United States, the United Kingdom and Norway are just three examples of governmental repositories aggregating data from the public sector of each country. We will review these three examples later, but first it can be beneficial to mention some traits shared by all the examples included in this paper, and by other versions as well.

A naming convention seems to have been established through the adoption of the data prefix followed by the domain of the public services. The repository sites are divided into sections where users can find raw data and applications made from these data. Both sections offer a search function where users can find data or applications based on criteria such as format, data publisher and topic. The data can also be sorted by rating, number of visits and number of downloads.

data.gov.uk is the data repository for the government of the United Kingdom, and has been formally online since January 2010. The data can be sorted according to which department or agency owns them, and by doing so we find that the Department of Health is the largest contributor with 1,001 data-sets, followed by the Department for Communities and Local Government (781 data-sets) and the UK Statistics Authority (716 data-sets). The site has 157 apps registered. The work on the site is overseen by the Transparency Board, whose members include Sir Tim Berners-Lee, Dr Rufus Pollock, one of the founders of the Open Knowledge Foundation, and Francis Maude, the Minister for the Cabinet Office. The implementation of data.gov.uk is led by the Transparency and Digital Engagement Team in the Cabinet Office.

The data published on data.gov.uk is licensed under the Open Government Licence. This licence has been developed to make reuse of public data easy, and is maintained by the National Archives.

Data.gov is the data repository of the United States of America. It was opened following the memorandum on transparency and open government sent to the administrative entities after the inauguration of President Barack Obama in 2009, in which the new president urged an unprecedented level of openness in government. The site officially opened on 21 May the same year, with 47 data-sets in its first release. Focus at the top administrative level is said to have been important for the quick release of data. The American government and its agencies have taken an open attitude to the sharing of data, and the democratic aspect has been emphasised through transparency and accountability. Data.gov has also released a substantial amount of geographical data in addition to the other data-sets; these data contain over 390,000 records.

data.norge.no is the future site of the Norwegian data repository, for now hosting a blog and lists of beta data-sets and applications.

The Ministry of Government Administration, Reform and Church Affairs administers a blog that lays out the development of the opening of public data. The blog, and some beta features including XX data-sets and applications, can be found at the address following the naming convention already used by the US and UK governments. The blog was opened with a post by Minister Rigmor Aasrud on 19 April 2010.

The Norwegian government has chosen to develop its own licence for sharing data. This has met with critique, as a universal licence would have been preferred, but creating a universal licence available in the local language can also be a better solution if the alternative is for each local organisation to use a separate, customised licence.

The Norwegian solution focuses less on transparency and accountability, and more on value creation and new, innovative, entrepreneurial solutions. The sharing of public data has not received the same attention as in the US and the UK, but it has been listed as a focus area (fellesføring) for the central governmental agencies and directorates.

Other countries
By May 2011, 16 nations and several other governments and organisations, ranging from local municipalities to international bodies, had opened similar repositories to the world. The countries that have released public data are mainly located in Europe, North America and Oceania, but some African and South American countries have also opened data.

A list of open data catalogues can be found on datacatalogs.org. The list is curated by open-data experts from different branches of government, organisations and NGOs. It was launched at the Open Knowledge Conference in June 2011, and contains over 125 references to open data repositories. Among the organisations that have released high-value public data are the World Bank and the United Nations. The site is developed by the Open Knowledge Foundation, the same organisation that has developed the Comprehensive Knowledge Archive Network (CKAN), the data store behind the data.gov.uk catalogue.


This document is still under development, so if you have any questions, feedback or corrections please get in touch. The illustration picture is made by the Sunlight Foundation and borrowed from their homepage. Please read this blog post for more information.

The Writing Epiphany

I've been back in Edinburgh for over a week since my little trip to Norway, and like an epiphany came the sudden realisation of how much I have to write before mid-August. I have picked up many interesting points during the last week from experimenting with open data and visualisations, learned from the book Visualizing Data by Ben Fry. In addition I have read the first half of Clay Shirky's Here Comes Everybody (a book recently listed among the Guardian's 100 best non-fiction books).

It is a common conception that a text will be easy to write as soon as the thoughts are structured in the mind. I have several times been a victim of this idea: I have read far too much without getting a word onto the paper (or, in my case, into the text editor). This is a dangerous habit, as words tend to change as they leave the dynamic, flexible realm of the internal mind and manifest themselves as an entity of their own, in the form of a text on paper or on a computer's hard drive.

This time I wanted to challenge this common conception and instead try to start writing early. Already in May I started to gather notes from the literature, but much of what I wrote then now seems redundant. From the same trip I also have a good conversation with Sverre Lunde-Danbolt, the project leader for data.norge.no (thank you for your help), which I need to transcribe. I also have some questions to formulate for further interviews, but I feel I have more knowledge about the subject area now than when I decided to write about the phenomenon (if you read this: do you have a better term for this phenomenon, one that does not contain the word 'phenomenon'?).

Today's goal is to have a good text on the governmental data sites of the UK, US and Norwegian governments.


The picture is taken by Mike McKay, and licensed under Creative Commons. Please refer to his Flickr-page for more information.

Master thesis

I'm now finished with two semesters of coursework for my master's, and only the thesis is left to write before I will, hopefully, be entitled to receive my master's degree. I have learned much from the two previous semesters, and hope to learn even more from the process leading up to the dissertation being printed and handed in. As part of my documentation and thought process I will use this blog to write about topics and findings related to my thesis. This way I can contribute to the Internet community (or a tiny, tiny fragment of it) by sharing the knowledge I acquire, while gaining more experience in writing English and in conveying academic ideas and results. Since I already update this blog, I think it is a better solution to write here than to create a new, dedicated master's blog. To make it easier for you to find information related to my master's thesis, I have created tags to refer to articles linked to this final paper: the tag "master project" will lead you to general posts about my project, and the tag "master thesis" is dedicated to information about the specific paper.

During the two previous semesters I have written about, among other subjects: what information and data are, the blogosphere and the agenda of the mass media, advertising and branding in the contemporary digital-media society, technological determinism, and the philosophy of technology. I have also revitalised my interest in programming, so for my final project and thesis I want to combine knowledge from the courses I have taken with a practical element.

I want to look into governments sharing their data repositories with Internet users, and various topics related to this. Here is my project description at its current stage:




The liberal democracy and the modern nation state are ideas indebted to the ideals of the Enlightenment. As the power in society shifted from the feudal state to the bourgeoisie, a structure in which political decisions were rooted in the citizens became a goal. The philosopher Jürgen Habermas has described the conditions that led up to, and followed from, the rise of the bourgeois public sphere. In a period from the late 18th to the early 19th century, democratic decisions were the result of discussions, exchanges of opinion and political activity in coffee houses, in pamphlets and in private arrangements. With changed conditions in the mid 19th century, the bourgeois public sphere declined. Core ideals of the liberal democracy are information, education, equality and impartiality. Good decisions are often those that are well informed, and information is a key asset for making weighted decisions. Habermas' concern about contemporary society is, among other things, directed towards the commodification of the mass media. Critique is also directed towards public management, and the transformation whereby citizens are now users of public services or consumers of private commodities.

Many have asserted that the wide diffusion of the personal computer and similar devices, providing interactive means of gathering, sharing, commenting upon, analysing and treating information, in combination with the Internet, providing a two-way channel of information, will reinvigorate democracy. Users are now free to build their own information from their own sources, and to create their own knowledge communities outside the domain of the mass media.

Open Government:

The United States government opened the Data.gov service in 2009, and in 2010 the United Kingdom's equivalent was released. Other governments have also created similar initiatives, sharing data repositories with the public. These repositories contain data aggregated from various state agencies, ministries, governmental organisations and political administrations, ranging from the national to the local level. The UK government has shared 6,900 data-sets, and the US shares 250,000. The sets are released under an open licence, and the sites encourage users to take advantage of the collections in creative ways by creating communities and sharing applications made from the data-sets.


Various open frameworks have been developed and made available for users to create their own representations of the data-sets shared by open governments. I want to use two tools which are not exclusively visualisation tools: Processing, with its HTML5 canvas port Processing.js (where JS abbreviates JavaScript), and Python. In addition I may use Google's Maps API for integration with geographical data.


Research aim:

I will examine how Open Government data is and can be used to democratise the representation and interpretation of information gathered and produced by governments. I will look at what information the US, UK and Norwegian governments share on their data sites and review some of the ways these data are used. I will mainly look at examples where data have been used in new and alternative ways to convey information about society, and at how this is visualised. I will look into why data is shared, which motivations lie behind the sharing, how it can be used, and how it fits with democratic ideas. I also want to look into the semantic-web aspect of such services.



My research will consist of two approaches.

First, I want to get in touch with people who have combined governmental data in applications, to learn more about their motivations, aims and process, and to ask about the results and their opinions on digital media. I will also, at an early stage of the process, get in touch with government sources working with Open Government to learn more about the motivations, challenges and practices. To conduct these interviews I will use qualitative semi-structured research interviews that I will record and transcribe.

Second, I want to take a closer look at how these data are combined and what data are available, and to combine two or three different data-sets in an application. I will write an application taking data from two or more sources and combining them to create meaning from the data. In this process I hope to experience first-hand how these repositories can be used, and to learn more about the technical aspects of gathering, structuring and displaying data.
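A minimal sketch of this second approach, in Python, joins two data sources on a shared key; the file contents below are invented for illustration, whereas real repository data-sets would be downloaded as CSV:

```python
# Combining two (invented) open data-sets on a shared 'area' column
# to derive a new figure: libraries per 100,000 inhabitants.
import csv
import io

population_csv = "area,population\nEdinburgh,486000\nGlasgow,598000\n"
libraries_csv = "area,libraries\nEdinburgh,28\nGlasgow,33\n"

def read_csv(text):
    """Index a CSV file's rows by their 'area' column."""
    return {row["area"]: row for row in csv.DictReader(io.StringIO(text))}

population = read_csv(population_csv)
libraries = read_csv(libraries_csv)

# Join on the areas present in both data-sets.
combined = {}
for area in population.keys() & libraries.keys():
    per_capita = int(libraries[area]["libraries"]) / int(population[area]["population"])
    combined[area] = round(per_capita * 100_000, 1)

print(combined["Edinburgh"])  # → 5.8
```

The derived figure exists in neither source data-set on its own, which is the point of combining repositories: new meaning emerges from the join.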

In addition I will read relevant literature, ranging from the practical, with DIY handbooks and blog-post tutorials, to philosophy on democracy, the public sphere, information and rationality.


Learning outcomes:

I want to combine the theoretical, normative ideals of democracy, transparency, creativity and knowledge with the practical approach to sharing information and data. The theoretical aspects will be beneficial for understanding information and its importance for democracy. The practical aspects will give insights into a widespread technology that is expected to be utilised in more areas and to a greater extent. The project is also interesting because many relevant theories within the field of digital media can be combined, e.g. intellectual property, crowd-sourcing, decentralisation of knowledge, bricolage and the semantic web. The project also potentially holds an interesting philosophical debate in the epistemological questions that can be raised about "data as truth" and instrumental reason.


The article illustration is licensed under a Creative Commons licence and is the property of opensourceway. It was found through Flickr. Please refer to the link for more information about the illustration.