
What do your Twitter followers look like?

I like Twitter. It’s the virtual world’s answer to Post-it notes. Well, not quite, but the nature of the site keeps people from droning on and on about a topic. The limit on the number of characters in a tweet enforces brevity, which is ideal when the number of people you follow keeps growing and the live feed of tweets updates several times per minute. The character limit has also fostered a certain repurposing of signs: the @ (at-sign) works as a reference to users, and the # (hash-sign) is used to gather commentary about a topic.

From a programming perspective, the decoupled nature of Twitter is interesting. While Facebook.com and other Facebook-created applications are the main places to access Facebook data, Twitter is built around a more open model. Facebook is complicated, with several ways of interacting, many modalities, third-party applications and a login requirement. Twitter is the opposite: it has a very simple, easy-to-understand structure and is open (users may protect their feed so only accepted followers see it, but this is not the default and luckily not widespread). These differences lead, at least for me, to Facebook being a personal and private social network, while Twitter is used more for interests, both professional and private.

The decoupled nature of Twitter has led to an abundance of third-party applications that interact with the Twitter API. It’s easy to access Twitter data and, more importantly, easy to understand what it all means. Twitter is in that sense a good place to start if you want to play around with data, Internet protocols and programming. A couple of posts ago I wrote about Mining the Social Web and screen scraping with Python (in Norwegian). The post you are now reading belongs in the same category. This code example does not mine any of the data from Twitter; it simply finds the IDs of your followers, fetches data about each of them and downloads their pictures. The code may still be used for further data analysis if you save the data gathered by the requests somewhere, instead of doing what I do here: keeping it on the heap while the script executes, so it disappears when the script terminates. Writing the data to a file takes just two lines of code; writing it to a database probably four or five.
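To make that last claim concrete, here is a minimal sketch of such persistence. The sample follower hashes and the followers.json filename are my own illustration, and I use Ruby’s built-in json library to keep the sketch self-contained:

```ruby
require 'json'

# Hypothetical sample of the kind of data the script gathers per follower.
followers = [
  { 'id' => 1, 'name' => 'Alice', 'lang' => 'en' },
  { 'id' => 2, 'name' => 'Bob',   'lang' => 'no' }
]

# Writing the gathered data to a file really is just two lines:
serialized = JSON.generate(followers)
File.open('followers.json', 'w') { |f| f.write(serialized) }

# Reading it back later for analysis is equally short:
restored = JSON.parse(File.read('followers.json'))
```
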

[sourcecode language="ruby"]

#encoding: utf-8

# Libraries: net/http for the API calls, active_support for JSON
# decoding, open-uri/uri for downloading the profile pictures.
dependencies = %w(net/http active_support open-uri uri)
dependencies.each { |m| require m }

class Follower
  attr_accessor :name, :created_at, :profile_image_url, :location, :url, :lang, :geo_enabled, :description

  def initialize(id)
    @id = id
  end

  def say_hi
    puts "Hello, my name is #{@name}, and I have the id #{@id}. Oh, by the way. I was created at #{@created_at}"
  end

  # Fetch the follower's profile picture and store it as pictures/<id>.jpg.
  def download_picture
    puts "downloading #{@id} : #{@name} from #{@profile_image_url} \n"
    unless @profile_image_url.nil?
      open(URI.escape(@profile_image_url)) do |f|
        File.open("pictures/#{@id}.jpg", "wb") do |file|
          file.write f.read # write, not puts: puts would append a newline to the binary image data
        end
      end
    end
  end
end

class TwitterGetter

  def initialize(name)
    @name = name
    # Make sure the download directory exists before we fetch any pictures.
    unless File.directory?("pictures")
      Dir.mkdir("pictures", 0755)
    end
  end

  # Fetch the IDs of all followers, then look each of them up in turn,
  # sleeping between requests to stay within the hourly rate limit.
  def get_follower_list
    response = Net::HTTP.get("api.twitter.com", "/1/followers/ids.json?cursor=-1&screen_name=#{@name}")
    @followers = ActiveSupport::JSON.decode(response)
    sleep_time = ((60 * 60) / 75) + 2
    puts "set sleeptime to: ", sleep_time, "\n"

    @followers['ids'].each do |id|
      sleep(sleep_time)
      lookup_user(id)
    end
  end

  # Fetch the full user record for one follower and download the picture.
  def lookup_user(id)
    response = Net::HTTP.get("api.twitter.com", "/1/users/show.json?user_id=#{id}&include_entities=true")
    info = ActiveSupport::JSON.decode(response)
    f = Follower.new(id)
    f.name = info["name"]
    f.created_at = info["created_at"]
    f.profile_image_url = info["profile_image_url"]
    f.location = info["location"]
    f.url = info["url"]
    f.lang = info["lang"]
    f.geo_enabled = info["geo_enabled"]
    f.description = info["description"]
    f.say_hi
    f.download_picture
  end
end

tg = TwitterGetter.new("olovholm")
tg.get_follower_list

[/sourcecode]

So, how does this little script work? First it instantiates the work-horse class TwitterGetter, which takes as argument the name of the user whose followers’ pictures you want to download. The constructor creates the directory the pictures will be downloaded into. Once the object is instantiated, we call the get_follower_list method, which fetches the follower IDs through the REST API and parses the JSON response from Net::HTTP.get("api.twitter.com", "/1/followers/ids.json?cursor=-1&screen_name=#{@name}"). Once the list is downloaded, the script runs through it and gathers data from the Twitter API on each of the users. Due to the limit on how many requests one can make to Twitter each hour, I put the script to sleep between each lookup for

sleep time = (seconds per hour / requests allowed per hour) + 2 = (3600 / 75) + 2 = 50 seconds
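The same arithmetic as in get_follower_list, spelled out (the constant names are mine):

```ruby
SECONDS_PER_HOUR  = 60 * 60   # 3600 seconds in an hour
REQUESTS_PER_HOUR = 75        # the effective request budget the script assumes

# Integer division, plus a two-second safety margin, as in get_follower_list.
sleep_time = (SECONDS_PER_HOUR / REQUESTS_PER_HOUR) + 2
# 3600 / 75 = 48, so the script waits 50 seconds per follower.
```
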

(You may make 150 requests to Twitter per hour, but lookup_user also calls download_picture, which fetches the image, so each follower may cost two requests; that is why the script budgets only 75 lookups per hour. It may be that the image downloads are excluded from the restrictions.) The Follower class stores the data for each user and is responsible for downloading that user’s picture. It also contains a say_hi method, which would perhaps be better as a somewhat more formal to_s method. The code is a proof of concept and does not contain error handling for the JSON data returned by Twitter (which, believe me, fails quite often), so that would be a good place to start if you want to expand on this code.
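As a starting point for that error handling, here is one way such a guard could look. The parse_user helper and the error payloads are my own illustration, not part of the script above, and I use Ruby’s built-in json library instead of ActiveSupport to keep the sketch self-contained:

```ruby
require 'json'

# Defensive JSON decoding: Twitter sometimes answers with an error
# document (or plain HTML) instead of the user record we asked for.
def parse_user(response_body)
  info = JSON.parse(response_body)
  # Twitter error payloads carry an "error" key instead of user fields.
  return nil if info.is_a?(Hash) && info.key?('error')
  info
rescue JSON::ParserError
  nil # not valid JSON at all; treat it like an error response
end

good    = parse_user('{"id":1,"name":"Alice"}')
limited = parse_user('{"error":"Rate limit exceeded"}')
garbage = parse_user('<html>Twitter is over capacity</html>')
```

A caller like lookup_user could then simply skip a follower when parse_user returns nil instead of crashing mid-run.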