Useful Terminal Commands for Working with Data

Please see my terminal-tools tag for more articles on nifty terminal tools.

Enclose values in quotes and add a comma

Say you are working with a large data file where all values are printed on consecutive lines, and you want to copy them into an array, list or other data structure where strings need to be quoted and values need to be separated by commas. Here is a sed command that can help you with that task.

sed 's/\(.*\)/"\1",/g' all_uuid.csv > all_ready.txt

This snippet runs sed, the stream editor, with the s (substitute) command. The command has the form s/pattern/replacement/flags, so the forward slashes are the delimiters. The pattern \(.*\) is a catch-all regular expression wrapped in a group. In sed's basic regular expressions the grouping parentheses must be escaped with backslashes, and since .* matches everything, the group captures the whole line. After the second slash comes the replacement: an opening quote, then \1, which refers to the text captured by the group, followed by a closing quote and a comma. sed applies the substitution to every input line on its own. The g flag only means "replace every match within a line", so it is not strictly needed here, because .* already consumes the whole line. Finally we pass in the file we want to process and use the > operator to redirect the output into the file all_ready.txt.
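To see what the substitution produces, here is a minimal sketch that wraps the same sed expression in a small shell function. The function names quote_lines and quote_lines_no_trailing and the sample values are made up for illustration. Both functions read the files given as arguments, or stdin when none are given. The second variant is an assumption about what you might need: it adds a second expression that drops the comma after the last value, for targets such as JSON that do not allow a trailing comma.

```shell
#!/bin/sh
# Hypothetical helper: wrap every input line in double quotes and append a comma.
# Reads the files named as arguments, or stdin when no arguments are given.
quote_lines() {
    sed 's/\(.*\)/"\1",/' "$@"
}

# Hypothetical variant: same substitution, then remove the comma on the last line ($).
quote_lines_no_trailing() {
    sed -e 's/\(.*\)/"\1",/' -e '$ s/,$//' "$@"
}

# Usage with made-up sample values:
printf 'alpha\nbeta\ngamma\n' | quote_lines
# "alpha",
# "beta",
# "gamma",
```

With a file, the call looks like the original command: quote_lines all_uuid.csv > all_ready.txt.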

Copy a specific number of lines from one file to another

When working with large data you often want to try something new, but running the process on the full data set takes too much time. How can you cut the file down without opening a text editor? The head command prints the beginning of a file, and with the -n option you define exactly how many lines it prints. The lines go to stdout, but with the greater-than operator (>) you can redirect this output into a new file.

head -n 50000 oldfile > newfile

By default head prints the first 10 lines of a file. Here we ask for 50,000 lines from oldfile and use the greater-than operator to write them into newfile. Many head implementations also accept the older short form head -50000 oldfile, but -n is the portable spelling.
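As a quick sanity check, here is a minimal sketch that builds a throwaway numbered file with seq and confirms that head, using the portable -n spelling, keeps exactly the requested number of lines. The file names oldfile and newfile are the placeholders from the command above; the numbered contents are made up for the test.

```shell
#!/bin/sh
# Build a throwaway file with the numbers 1..100000, one per line.
seq 1 100000 > oldfile

# Keep only the first 50,000 lines.
head -n 50000 oldfile > newfile

# The last kept line should be 50000.
tail -n 1 newfile                  # prints 50000

# Arithmetic expansion strips the padding some wc implementations add.
echo $(( $(wc -l < newfile) ))     # prints 50000
```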

Getting the total number of lines in a text file

When working with data and transferring it from one system to another, you will often find that the data is saved as flat files. Flat files can be comma-separated (the well-known CSV files), tab-separated or delimited in other ways. The general pattern is that the attributes of the data (the columns) are separated by a predefined delimiter, and the occurrences (the rows) are separated by newlines. The command below only gives a meaningful answer for this layout, so use e.g. head to verify that each occurrence sits on its own line. Then run the command, and if your flat file has a header line, subtract 1 from the number of lines. Keep in mind that wc -l counts newline characters: a last line without a trailing newline is not counted, and quoted fields that contain line breaks add extra lines. The result can then easily be compared with the result of count(*) in the SQL database after the data is imported.

wc -l filename
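If your flat file has a header line, the subtraction can be done in the shell as well. This is a minimal sketch: the function name count_data_rows and the sample file are hypothetical, and it assumes one record per line, exactly one header line and a trailing newline at the end of the file. Reading the file through < makes wc print only the number, without the file name.

```shell
#!/bin/sh
# Hypothetical helper: number of data rows in a flat file with one header line.
# $(( ... )) also strips the leading spaces that some wc implementations print.
count_data_rows() {
    echo $(( $(wc -l < "$1") - 1 ))
}

# Usage with a small made-up CSV file (header plus three rows):
printf 'id,name\n1,alice\n2,bob\n3,carol\n' > sample.csv
count_data_rows sample.csv   # prints 3
```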

Getting the total number of files in a directory

In this example we use the command from above, but this time we pipe in the output of the directory listing (ls) with the one-entry-per-line option (-1, the digit one). Piping sends the output of one command as the input of a second one. This kind of chaining makes *nix systems very convenient, as you can do complex things by combining simple commands. Here wc -l receives the list of files and directories in a directory and prints how many there are. Be careful not to use -l (lowercase L) instead: that is the long listing format, which begins with an extra "total" line and makes the count one too high. Also note that ls skips hidden files (names starting with a dot) unless you add -A.

ls -1 [directory-name] | wc -l
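To see how the choice of ls option affects the count, here is a minimal sketch that builds a scratch directory (its contents are made up) and compares the one-per-line listing with the long listing. Arithmetic expansion is used only to strip the padding that some wc implementations put in front of the number.

```shell
#!/bin/sh
# Create a scratch directory with three files and one subdirectory.
dir=$(mktemp -d)
touch "$dir/a.csv" "$dir/b.csv" "$dir/c.csv"
mkdir "$dir/archive"

# One entry per line: counts the files and the directory.
entries=$(( $(ls -1 "$dir" | wc -l) ))

# Long format starts with a "total" line, so the count is one too high.
long=$(( $(ls -l "$dir" | wc -l) ))

echo "$entries $long"   # prints 4 5

# Clean up the scratch directory.
rm -r "$dir"
```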

Illustration by Travis Isaacs, licensed under a Creative Commons Attribution licence. Found on Flickr.

The Raspberry Pi

The Raspberry Pi has been on the market for a while now, providing a very affordable, small computer for computer enthusiasts and hobbyist hackers. The idea of the Raspberry Pi came about in 2006, when teachers at the University of Cambridge’s Computer Laboratory became concerned about the decline in numbers and skill levels among the A-level students entering computer science. The tinkerers of the early computer generation had been replaced as interfaces became well designed and easy to use, and in the school curriculum, computer science classes were mostly filled with sessions on Excel spreadsheets, word processing and other tools with a high level of abstraction. The idea of the Raspberry Pi was to give those who wanted it an affordable but capable platform for tinkering. Interestingly, this coincides with the maker movement, which has embraced DIY electronics, from the Arduino platform to 3D printing.

What is the Raspberry Pi?

It’s a plain computer, heavily scaled down and very affordable. The Model A costs around 25 pounds, and for 30 pounds you get the Model B, which comes with 512 MB RAM, two USB sockets and an Ethernet adapter. Both versions have an SD card reader (the SD card, not included in the price, serves as the Pi’s hard drive), an HDMI port for video (and sound) and a 3.5 mm jack for sound output. Both devices are powered through a micro-USB connector and come with a 700 MHz ARM processor (a Broadcom BCM2835 with an ARM1176JZF-S core with FPU and a VideoCore IV GPU). The dedicated graphics unit supports OpenGL ES 2.0 and 1080p30 H.264 video, which is great, since a 700 MHz processor is not too fond of playing high-resolution video in addition to running the OS. The Pi also has an RCA composite video output and GPIO (general-purpose input/output) pins; the latter have good potential if you want the Raspberry Pi to work as a stand-alone, Arduino-like device.

Perhaps the most important thing about the Raspberry Pi is that it can be used for many things and that its entry cost is low. With the HDMI and RCA outputs you could use it as a media center, or you could buy an SNES controller adapter and turn it into a Super Nintendo with the ZSNES emulator. This makes it a brilliant tinkering device.

After ordering an HDMI-to-DVI adapter, I finally got my Raspberry Pi up and running, and I will be back with some projects built on this platform soon. Stay tuned!

Getting it up and running is just plug-and-play if you have a preloaded SD card. If not, it is a bit more of a hassle. I used this introduction on how to install the operating system onto the SD card, and following its instructions in the Mac OS X Terminal to load Raspbian onto the card worked well.
