Browsed by
Category: data-work

Load database from CSV with columns and tables read from file.

Load database from CSV with columns and tables read from file.

The issue at hand here is to load a database with CSV-files in a generic way without having to write separate script for each table. This short Groovy-script is working based on two conventions: The name of the CSV-file matches the name of the table in the database. The name of the CSV-header matches the name of the column in the table. The script requires the CSV-files to be read in the folder “to_digest”. The connection details is placed in…

Read More Read More

SQL: Setup a Learning Workbench with SQLite and SQuirreL

SQL: Setup a Learning Workbench with SQLite and SQuirreL

Install SQLite To download SQLite you only need to download a file from the project homepage of SQLite. Select download, and precompiled download for your prefered operating system. Unzip the folder with the binary, and place it in a folder that is accessible from your command line interface (The terminal on Mac/*nix systems and Powershell or cmd on Windows). This is a folder that is in the path variable. Here is how you can add a folder to the path…

Read More Read More

Work programmatically with Google Spreadsheets Part 2

Work programmatically with Google Spreadsheets Part 2

A while back I wrote a short post on how you can write and read to Google Spreadsheets programmatically using Python and the package ‘gspread’. Last time the reading was done by first creating arrays with the addresses to where the values could be found in the spreadsheet, and then run through all the values and replace the addresses with the values. It worked fine, but it’s not best practice or very efficient as it makes many single requests on…

Read More Read More

API-scrape images from Instagram

API-scrape images from Instagram

An image can say more than a thousand words, especially when you add a retro-filter and a score of hashtags to go with it. That is a basic explanation to the functionality of Instagram; the power app which revolutionised peoples creativity when it came to documenting dietary habits… and popularised images in social media. Instagram brought social into photography in a way other more desktop-oriented photo sharing applications like Picasa and Flick never managed. It is social and users can…

Read More Read More

Prepare Your Data For Analysis: Load Data With Spoon

Prepare Your Data For Analysis: Load Data With Spoon

Note: The instructions described here is learnt and utilized while working on the Clouds & Concerts project at University of Oslo in close collaboration with Xin Jian, Johannes Bjelland og Pål Roe Sundsøy under supervision by Arnt Maasø. The credit for methodology and SQL ninja tricks is therefore to be attributed to all members of the fantastic stream team. Inconveniences or mistakes are solely my own. Technical set up We chose to use Amazon as a solution for doing our…

Read More Read More

Useful Terminal Commands for work with data

Useful Terminal Commands for work with data

Please, see my terminal-tools tag for more articles on nifty terminal tools. Enclose values in quotes, and add comma Say you work with a large datafile where you get all values printed on consecutive lines, and you want to copy these lines into an array, list or other data structure where strings need to be quoted, and values needs to be separated by commas. Here is a sed script that may help you with that task. sed ‘s/\(.*\)/”\1″,/g’ all_uuid.csv >…

Read More Read More

Safe functions for casting numbers and timestamp in Oracle SQL

Safe functions for casting numbers and timestamp in Oracle SQL

In the process of importing data which can be irregular from a flat file it is convenient to import all values as character strings, in this case Oracle’s varchar2, and then cast them into the proper format using the built-in datatype converters as e.g. to_number. The problem occurs when some value which is not convertible is stored in the column. This could be either because table column-overflow in the import process or because the value is optional and may be…

Read More Read More

Screenscrape av Øya-programmet

Screenscrape av Øya-programmet

Forskningsprosjektet Sky & Scene, hvor jeg jobber, ser blant mye annet nærmere på strømmetallene fra WiMP før, under og etter Øya-festivalen. For å gjøre dette trenger vi en liste over hvilke artister som spiller, hvilken dag de spiller og når på dagen de spiller. Før dataene kan analyseres må disse dataene være tilgjengelige i Excel-ark og i CSV-format og i databasen hvor strømmetallene finnes. Dataene må hentes og struktureres i et bestemt format. Et godt utgangspunkt er å samle dataene…

Read More Read More

Norske illustrasjonskart på fylkes- og kommunenivå med SVG

Norske illustrasjonskart på fylkes- og kommunenivå med SVG

Vektor- og rastergrafikk SVG – Scalable Vector Graphics – er, som gitt av navnet, et skalerbart vektorbasert grafikk-markup. Vektorgrafikk er bedre egnet til skalering enn raster, da bildet ikke består av små bildeelementer (Pixel – kort for nettopp picture element), men formler for opptegning av grafikken. Nå har det seg riktignok slik at ikke alt fungerer like godt i begge grafikkverdener. Siden vektorer er basert på forskjellige punkter på et lerret og forbindelsen mellom disse, enten igjennom punkter, rette linjer,…

Read More Read More

Work programmatically with Google Spreadsheets

Work programmatically with Google Spreadsheets

Some time back I authored a script which reads through a CSV formatted list, and based on the artists’ names tried to decide the nationality of the artists by querying the last.FM search engine and parsing the XML structured result. The script worked, and found and returned about 80% of the artists, and around 80% of these again had the conceptually similar artist. What if the alteration could be done in the document itself? For the CSV based script the…

Read More Read More