API-scrape images from Instagram

An image can say more than a thousand words, especially when you add a retro filter and a score of hashtags to go with it. That is a basic explanation of how Instagram works: the power app that revolutionised people’s creativity when it came to documenting dietary habits… and popularised images in social media.

Instagram brought the social into photography in a way that more desktop-oriented photo-sharing applications such as Picasa and Flickr never managed. Users can like and comment on each other’s pictures. Instagram also enhances images by reducing their photographic clarity (let’s emulate far less technologically advanced cameras by adding a filter), but then again, this gives the images style and makes some of them look cool. However, I will let the filters and pastulation (the digital emulation of an analogue past; coined by moi, but please let me know if there is a buzzword for this and I may conform) rest for now. Let us instead focus (pun intended) on something else: the tagging of images, and how to retrieve images with a certain tag.

Adding context and indices with tags

An Instagram picture may be tagged with several hashtags: words or concatenated words prefixed with a hash sign (#). The great thing about hashtags is that they do two things. 1) They provide a social signifier telling users that this is a tag, so users can use the tag to organise their content and create a social context in which the photo exists, e.g. #instafood (a picture of a meal), #selfie (a person taking a picture of him/herself, usually together with..), #duckface (quack quack pouting) and #onedirection (popular teenage idols). Tags can be of any kind, from current affairs to more general stuff. 2) They provide a token that declares something indexable to the Instagram server and other technical resources. Once the computer system knows it is a tag, it may group the tags together, analyse the tag and the users associated with it, aggregate statistics on the tag, and do other things to enhance the user experience. In our case tagging is great because we want to retrieve images with a given tag.
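To make the idea of a tag as an indexable token a bit more concrete, here is a minimal sketch of how hashtags could be pulled out of a caption. The helper name extract_hashtags and the sample caption are my own inventions for illustration, and the pattern assumes tags consist of ASCII letters, digits and underscores only; Instagram's own rules may well differ.

```ruby
# Hypothetical helper, not part of the script below: extract hashtags from a caption.
# Assumption: a tag is '#' followed by ASCII letters, digits or underscores.
def extract_hashtags(caption)
  caption.scan(/#(\w+)/)   # each match is a one-element array holding the tag text
         .flatten
         .map(&:downcase)  # treat #Selfie and #selfie as the same tag
         .uniq
end

caption = "Lunch time! #instafood #Selfie #instafood"
puts extract_hashtags(caption).inspect
# prints ["instafood", "selfie"]
```

Lower-casing and de-duplicating is a design choice that suits grouping and counting; drop those steps if the original spelling matters.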

The #InstagramTagGetterScript

Below is a script which takes the tag name as an argument and downloads the images and metadata associated with that tag. To get it to work you will need to obtain an API key (an access token) from Instagram’s developer page. This token goes into the URL of the initial request sent to the server (the URL stored in the next_url variable). We use the tags endpoint to download the images.
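As a rough sketch of how that first request URL can be put together, the snippet below builds it from a tag and a token. The helper name first_page_url and the placeholder token are assumptions of mine; the endpoint path and the access_token and min_id parameters are the ones used in the script further down.

```ruby
require 'uri'

# Hypothetical helper: build the URL for the first page of the tags endpoint.
# The tag and the query values are percent-encoded separately.
def first_page_url(tag, access_token)
  encoded_tag = URI.encode_www_form_component(tag)
  query = URI.encode_www_form(access_token: access_token, min_id: 1)
  "https://api.instagram.com/v1/tags/#{encoded_tag}/media/recent?#{query}"
end

# "YOUR_ACCESS_TOKEN" is a placeholder, not a working token.
puts first_page_url("instafood", "YOUR_ACCESS_TOKEN")
# prints https://api.instagram.com/v1/tags/instafood/media/recent?access_token=YOUR_ACCESS_TOKEN&min_id=1
```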

The rough outline of the script is as follows: 

First we define a class, InstaEntry, to store each entry. The class comes with the functionality to retrieve and store the image and metadata, as well as to dump the data to disk and load it back from disk. It holds all the variables we are interested in collecting, and once an entry is instantiated these variables are set unless they are missing for that image.
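The dump-and-load part can be illustrated with a cut-down stand-in for the class. MiniEntry below is a hypothetical two-field version written only for this example; it shows how marshal_dump and marshal_load let Ruby's Marshal serialise an object and rebuild it.

```ruby
# Hypothetical two-field stand-in for InstaEntry, for illustration only.
class MiniEntry
  attr_accessor :id, :username

  def initialize(id)
    @id = id
  end

  # Marshal.dump calls this to decide what gets serialised.
  def marshal_dump
    [@id, @username]
  end

  # Marshal.load calls this on a freshly allocated object to restore its state.
  def marshal_load(values)
    @id, @username = values
  end
end

entry = MiniEntry.new("123_456")
entry.username = "someuser"

copy = Marshal.load(Marshal.dump(entry))
puts copy.id        # prints 123_456
puts copy.username  # prints someuser
```

Listing the fields explicitly keeps the dump format stable even if more instance variables are added to the class later.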

Once the structure is created, some initial parameters are set: the tag and our initial request URL. The folders in which we will store the data are also created. When everything is set up we run a loop which keeps running as long as there is data available and we get responses with HTTP status 200 (OK). The loop instantiates an InstaEntry for each image and then downloads the image and metadata on the fly. The objects are retained until the program has finished executing, but all large data (that is, the images) are written straight to disk and not kept in memory.
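The pagination loop is the heart of the script, so here is a minimal sketch of just that part with the network taken out. The PAGES hash and the fetch_page helper are made up for the example and stand in for real API responses; only the shape (a data list plus a pagination next_url) follows what the script below expects.

```ruby
# Hypothetical canned responses standing in for the API, keyed by URL.
PAGES = {
  "page1" => { "data" => [{ "id" => "a" }, { "id" => "b" }],
               "pagination" => { "next_url" => "page2" } },
  "page2" => { "data" => [{ "id" => "c" }],
               "pagination" => {} }  # no next_url: this is the last page
}

# Stand-in for the HTTP request; raises KeyError for an unknown URL.
def fetch_page(url)
  PAGES.fetch(url)
end

# Follow next_url from page to page and collect the id of every item.
def collect_ids(first_url)
  ids = []
  next_url = first_url
  until next_url.nil?
    page = fetch_page(next_url)
    page["data"].each { |item| ids << item["id"] }
    next_url = page["pagination"]["next_url"]
  end
  ids
end

puts collect_ids("page1").inspect
# prints ["a", "b", "c"]
```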

Please contact me if you want to use this script, tailor it, or have any questions related to it.

#!/usr/bin/ruby
# encoding: UTF-8

require 'json'           # JSON.pretty_generate
require 'active_support' # ActiveSupport::JSON.decode
require 'restclient'
require 'csv'
require 'open-uri'
require 'fileutils'

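class InstaEntry
  attr_accessor :id, :username, :picture_url, :likes, :filter, :location, :type, :caption, :tags, :fullname, :user_id, :created_time, :link

  # Download threads shared by all entries. Initialised once here, so that
  # creating a new entry does not throw away the list of running threads.
  @@threads = []

  def initialize(id)
    @id = id
  end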

  def marshal_dump
    [@id, @username, @picture_url, @likes, @filter, @location, @type, @caption, @tags, @fullname, @user_id, @created_time, @link]
  end

  def marshal_load(variables)
    @id = variables[0]
    @username = variables[1]
    @picture_url = variables[2]
    @likes = variables[3]
    @filter = variables[4]
    @location = variables[5]
    @type = variables[6]
    @caption = variables[7]
    @tags = variables[8]
    @fullname = variables[9]
    @user_id = variables[10]
    @created_time = variables[11]
    @link = variables[12]
  end

  def to_arr
    [@id, @username, @picture_url, @likes, @filter, @location, @type, @caption, @tags, @fullname, @user_id, @created_time, @link]
  end

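  # Download the image in a background thread so the main loop is not blocked.
  def self.get_image(obj, tag)
    @@threads << Thread.new(obj, tag) do |entry, dir_tag|
      begin
        extension = entry.picture_url.match(/\.(jpe?g|gif|png)/)[1]
        File.open("images_#{dir_tag}/#{entry.id}_#{entry.username}_.#{extension}", "wb") do |file|
          # URI.open comes from open-uri; plain Kernel#open no longer fetches URLs on Ruby 3.
          file << URI.open(entry.picture_url).read
        end
      rescue StandardError => e
        puts "ERROR: #{entry.id} triggered an Exception in get_image method: #{e.message}"
      end
    end
  end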

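  # Write the metadata of one entry to its own text file.
  # This is a class method, so the id and username must be read from obj.
  def self.print_metadata(obj, tag)
    File.open("md_#{tag}/#{obj.id}_#{obj.username}.txt", "wb") do |file|
      file.print(obj.to_arr)
    end
  end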

end #end InstaEntry class

#
# This block sets the parameters and reads the tag name from the first command-line argument
#

raise ArgumentError, "Missing name of tag to download" if ARGV.length < 1

$tag = ARGV[0]

output = open("output.json","wb")
# Only the tag needs percent-encoding; URI::encode was removed in Ruby 3.0.
next_url = "https://api.instagram.com/v1/tags/#{URI.encode_www_form_component($tag)}/media/recent?access_token=51998418.d146264.e77441adc4a04399874a19b48bb91e71f&min_id=1"
# NB: The access token above looks like a real token but is obfuscated. Get your own by registering a developer account at Instagram.
puts next_url

unless File.directory?("md_#{$tag}")
  FileUtils.mkdir_p("md_#{$tag}")
end

unless File.directory?("images_#{$tag}")
  FileUtils.mkdir_p("images_#{$tag}")
end

count = 0
instas = {}

#
# This block runs through all the subsequent pagination pages. It stops when the response no longer contains a next_url.
# RestClient raises an exception if the HTTP status code is not a success code.
#
begin
  response = RestClient.get(next_url)
  json = ActiveSupport::JSON.decode(response)
  pretty_json = JSON.pretty_generate(json)
  puts "Status code #{json['meta']['code']} for URL #{next_url}.. Fetching"
  next_url = json['pagination']['next_url']
  sleep 2

  # Loop through the data elements
  json['data'].each do |item|
    puts item['link']
    puts item['user']['full_name']
    ie = InstaEntry.new(item['id'])
    instas[item['id']] = ie

    ie.username = item['user']['username']
    ie.picture_url = item['images']['standard_resolution']['url']
    ie.likes = item['likes']['count']
    ie.filter = item['filter']
    ie.location = item['location']
    ie.type = item['type']
    ie.caption = item['caption']['text'] unless item['caption'].nil? or item['caption']['text'].nil?
    ie.tags = item['tags']
    ie.fullname = item['user']['full_name']
    ie.user_id = item['user']['id']
    ie.created_time = item['created_time']
    ie.link = item['link']

    InstaEntry.get_image(ie,$tag)
    InstaEntry.print_metadata(ie,$tag)
  end

  count += 1

  output << pretty_json

  puts "Now checked __ #{count} __ files and __#{instas.length}__ number of instas"
  puts "*****Ending with #{count} __ files and __#{instas.length}__ number of instas****" if next_url.nil?

end while not next_url.nil?

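output.close

# Wait for the image download threads to finish; otherwise they are killed when the script exits.
(Thread.list - [Thread.current]).each(&:join)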

File.open("instadump_#{$tag}",'wb') do |f|
  f.write Marshal.dump(instas)
end

# Options are passed as keywords; a positional options hash is rejected on Ruby 3.
CSV.open("output_#{$tag}.csv", "wb", col_sep: "\t") do |csv|
  instas.each_value do |entry|
    csv << entry.to_arr
  end
end

Disclaimer: Enabling you to download images associated with tags does not mean I am saying that you can do whatever you want with them. First, please refer to the Instagram guidelines to confirm that you are actually allowed to download images. Second, respect the individual users’ privacy and intellectual property rights, and do not use images in a publishing context without the users’ consent. Generally: be nice, and do good.

Please help me understand more about Social Media Usage

As a student of digital media it is important for me to stay up to date with recent developments in the online community. Much has happened in the last five years, and statistics on usage are not always up to date. As a result I created this survey using the web service SurveyMonkey, which I understand to be one of the major online survey companies on the web. Since I do not have much research money I chose the free version, and because of this the survey is limited to 10 questions. Please help me understand more, and as a free gift you can help improve the validity of what I write on this page. Facts for me, and facts for you.

Please take your time (it’s just ten questions) and help me fill out this form:

Listomania

“Think less but see it grow like a riot, like a riot, oh
I’m not easily offended
It’s not hard to let it go from a mess to the masses”

The title of this article is a slightly modified song title by Phoenix, and the lines above are the chorus of the same song. Still, this is not a tribute to the popular French band but rather an explanation of something I have seen as a growing phenomenon in my own, and others’, Internet habits: the growing need to fill out lists and to compete and compare with friends online. These Internet services help you coordinate your cultural and geographical memory, but they can also be used to brag about how culturally superior you are. How you use them is your choice, but be advised: no matter how you use them, they are terribly time-consuming.

The Spotify Syndrome
A highly contagious matter that spreads through the social medium Facebook and results in endless amounts of time spent listening to collaborative “indie” playlists, party hits and seasonal variations consisting of the best and well-known classics and the worst and hidden katzenjammer of the last five decades. Sending your friends a song is now as easy as a poke on Facebook, sending them into sleepless nights of bongo-drum trance and pan-flute ballads.

The iCheckMovies Disorder
This disorder creates an addiction to watching the canonised films voted forward by fellow film lovers on the great service IMDb. Not just the famous top 250 list, but also the lists sorted by decade, genre and how much the films grossed, generate hours of work, from The Great Train Robbery to Toy Story 3. In addition to the democratically elected movies, iCheckMovies also provides lists of the award-winning movies from the Academy, BAFTA, Cannes and other festivals and institutions. This is a time-consuming activity, but for your hard and beneficial work (beneficial for yourself and your own general education) you are rewarded with different prizes depending on what percentage of the films in each category you have seen.

The Foursquare Virus
Both Spotify (in its most advanced version) and iCheckMovies are services confined to a computer screen on top of a lap or a desk. The next virus on our time-consuming list is mobile, which means that it follows you around in your very own pocket. If you have a cell phone with an app-oriented operating system, such as iOS or Android, and a wireless or mobile network connection, this service can follow you anywhere. And anywhere is also the point of the service, since the goal (the point-scoring activity) is to check in at different establishments. This gives you points, badges and ultimately first-life perks. The latter can be found at e.g. Pizza Hut and Starbucks, which will give you free pizza and coffee if you manage to reach the rank of mayor, in other words if you are the customer who checks in most frequently during a 30-day timespan.

Summary
The services above are time-consuming and can create an addiction. On the other hand, they will provide you with hours of fun, a systematic approach to listening to or watching something, and a diary of what you have done. The last two are interesting since they produce no value of their own beyond referring to other products, except of course if you are a mayor. All the services let you see what your friends have been doing, even though many users are probably more interested in how they present themselves. Another interesting observation is that the meta-actions are becoming more important. The fact that many users, me included, spend hours updating what we have done, what we have listened to, what we have watched, where we have been, who we know, where we have worked, how far we have jogged and more takes up time that could be used doing more of those things. I know for my own part that I could probably watch one or two full-length feature films each month instead of finding movies I have already watched and ticking them off the list, but there is little I can do about it. I am a listomaniac. Are you?

Links

Foursquare, Spotify, iCheckMovies

iPhone goes large: iPad

During the three weeks since the iPad was announced, the technology press has been flooded with articles about the iPad, Apple’s newest gadget. The media have been asking questions like: What is the iPad going to change? Will it replace the need for a printing press? Which existing technologies does it compete against? Which applications will be developed? And what will we be able to do with the iPad? It is clear that there is a lot of speculation about the iPad and what it will achieve. So now I will try to explain some of my ideas about the iPad.

The physical pad

My first impression when Steve Jobs showed off the iPad to the public was that this gadget seemed an awful lot like a big iPhone. The screen covering most of the device and the characteristic home button, together with the soft-edge profile similar to the newer iPhones, made it clear that the iPhone design trend was still standing strong within Apple. This could be a result of the enormous popularity of the iPhone and the saying, ‘Don’t fix it if it ain’t broke’. At least that’s what I believe. Apple has found a design that expresses its ideology of making technology that is easy to use and gets the job done.

The pad comes in several varieties that differ in storage space and wireless network support. The difference in storage space is quite simple: more space for more money. When it comes to wireless network support, there is more to look into. There will be a version which supports only the 802.11 standard, meaning that it supports the same kind of wireless networking that home routers use. The other version will, in addition to the Wi-Fi support, come with built-in 3G support, which makes it possible to be online through the cellular phone network as well. The iPad is an enlarged iPhone without the opportunity to make regular phone calls, but with different specifications comes different usage.

What can you do with the iPad?

“It’s true: when something exceeds your ability to understand how it works, it sort of becomes magical. And that’s exactly what the iPad is.” – Jonathan Ive, Senior Vice President of Industrial Design, Apple

I do, like Ive, see potential in the iPad, but with that said, I do not see it as a magical invention. I perceive it as an iPhone gone large. I do not think that it will challenge the book or renew the printing press in any major way, as many have suggested. It is true that the pad provides a possibility to show electronic articles, but there is nothing new about that. I read articles on my desktop and laptop every day. So when Philip Schiller, Senior Vice President of Worldwide Product Marketing, says in the presentation video that “it’s going to change the way we do the things we do every day”, I think that could be an overstatement. With that said, I’m looking forward to seeing how the new touch screen works, because this is where the biggest difference between the Phone and the Pad lies. Apple promotes a web browser, games, news and user-made applications on its iPad website. The games can be played through a touch user interface or with the motion sensor. The most discussed usage is the news applications. The New York Times has already published an app where the user can read the newspaper, and together with Apple’s own bookstore this could mean that there will be a focus on text applications on the iPad. The bigger screen gives the Pad an advantage over cell phones and PDAs with similar functionality, but with a use time of approximately 10 hours between charges it loses the competition against more dedicated products in the electronic book market, such as Amazon’s Kindle.

I think that the iPad is going to satisfy an already existing market for gadgets that change their usability with the applications installed on them. It’s not going to be a PC, but some of the tasks we normally do on a PC can now be done on an iPad. The screen, which is bigger than on an iPhone or similar phones, will make it more comfortable to use with social media services such as Twitter, Facebook and YouTube.

With the popularity of its other products, Apple has a big advantage when it comes to reputation among users. I guess many iPads will be sold, but I doubt the iPad will replace any existing technology; rather, I think it will slot in between mini-devices such as PDAs and cell phones on the one hand and portable computers on the other.

Video source: http://www.apple.com/ipad/ipad-video/

Image: Glenn Fleishman