Tuesday, September 7, 2010

Split and join (in Javascript)

This article...

I want to take a quick look at splitting and joining text using javascript.

Splitting

Suppose you want to split some text. A language like javascript (and many other languages besides) makes this very easy:

  ',,a,,,,b,,c,,'.split(/,/) // case (I)
  => ["", "", "a", "", "", "", "b", "", "c", "", ""]

  ',,a,,,,b,,'.split(/,+/) // case (II)
  => ["", "a", "b", ""]

You can recreate the string for (I) using the built-in join method:

  ',,a,,,,b,,c,,'.split(/,/).join(',')
  => ',,a,,,,b,,c,,'

Case (II) can't be put back the way it was: /,+/ matches a variable number of characters (commas in this case), so for any given joining point we no longer know how many commas were there.
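To make the difference concrete, here is a small sketch. The capturing-group variant at the end is not part of the approach described in this article; it is only one possible way to keep the separators around, and the variable names are made up for illustration.

```javascript
var original = ',,a,,,,b,,';

// Case (II): the split throws away the length of each run of commas,
// so joining with a single comma gives back a different string.
var lossy = original.split(/,+/).join(',');

// Assumption (not from the article): wrapping the pattern in a
// capturing group makes split keep each matched run in the result,
// so joining with '' restores the original text.
var pieces = original.split(/(,+)/);
var restored = pieces.join('');

console.log(lossy);                 // differs from original
console.log(pieces);                // terms interleaved with comma runs
console.log(restored === original);
```

The rest of this article sticks to case (I), where every separator is a single character and nothing needs to be captured.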

Joining for case (I)

Sometimes, you want to join a split as in case (I) but not back into a string. When I first tried to do this I ended up writing a horrifically complicated function.

Looking at this case again:

  ',,a,,,,b,,c,,'.split(/,/) // case (I)
  => ["", "", "a", "", "", "", "b", "", "c", "", ""]

The thing to remember is that "" represents the gaps between the commas in ',,a,,,,b,,c,,', including the gap before the very first comma and the gap after the very last comma. "a", "b" etc. are filled-in gaps. This is probably what is confusing about manually joining such a split array: it's easy to fall into thinking that the ""-terms represent the commas rather than the gaps.

Algorithm for manual joining

From an algorithmic point of view we want to map over the array produced in case (I) and process both the "" and non-"" terms.

The commas in the string may signify a point where we want to insert something. In my case, the strings I was splitting were text nodes from preformatted text (in pre-tags) that contained line feeds (\n or \r\n). I was tokenizing the text and wanted to preserve line feeds in the form of individual span tokens. So in this case the commas in case (I) would represent line feeds, e.g. '\n\na\n\n\n\nb\n\nc\n\n' instead of ',,a,,,,b,,c,,'.

Going back to case (I), the terms (or gaps) are the best indication of where the commas are; if there are n commas, then there will be n+1 gaps (including filled in ones). Keeping this in mind the rules we could follow as we map over the array might be:

when we have a ""-term we insert a comma
when we have a non-"" term we insert the term followed by a comma
at the last position in the array we don't insert a comma
- if the last position in the array is a "" then do nothing
- if the last position in the array is a filled-in gap, process it but don't insert a comma

Functional approach

There are some nice ways to do this in javascript. ECMAScript 5 adds array methods such as map, forEach and reduce that might assist, but here is a manual version that, whilst not overly functional, facilitates a functional style when used (using the term 'functional' in a very loose sense):

  // Join elements that have been split by String.prototype.split(...).
  var join = function(arr,unsplit,process) {
      var i,l=arr.length;
      for(i=0;i<l;i++) {
          if(arr[i]!=='') process(arr[i],this);
          if(i!=l-1) unsplit(this);
      }
  }

Notes:

unsplit is a function that represents the "insert comma" operation
process is a function that represents the "insert term" operation which we apply to filled-in gaps like "a"
in addition, we pass this to both unsplit and process, since doing so can facilitate sharing privileged information between unsplit and process, although it isn't necessary.

We could run join like this:

join(arr,f,g)

for some array arr and functions f and g.
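As a minimal sketch of such a call, the snippet below rebuilds the original string from the split array. The accumulator out and the bodies of f and g are assumptions made up for illustration; join is repeated only so that the snippet runs on its own.

```javascript
// Copy of the join function from above.
var join = function(arr, unsplit, process) {
    var i, l = arr.length;
    for (i = 0; i < l; i++) {
        if (arr[i] !== '') process(arr[i], this);
        if (i != l - 1) unsplit(this);
    }
};

var out = [];                              // hypothetical accumulator
var f = function() { out.push(','); };     // "insert comma"
var g = function(item) { out.push(item); }; // "insert term"

join(',,a,,,,b,,c,,'.split(/,/), f, g);

var rebuilt = out.join('');
console.log(rebuilt === ',,a,,,,b,,c,,');
```

Here f and g ignore the extra argument that join passes them and simply close over out, which is fine for a one-off but does not scale well if they live in a different scope.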

But suppose we want to accumulate a result as join maps over arr, or otherwise share privileged information between f and g. This is where the this reference could be used:

var module1 = function() {
  var prog1 = function(text) {
    ...
    var someObj = {};
    ... initialize someObj ...
    var arr = text.split(...);
    join.call(someObj,arr,unsplit,process);
    ...
  }     
  var unsplit = function(obj) {
    ...
  }     
  var process = function(item,obj) {
    ...
  }     
}();

In the above we have a function prog1 inside a module that performs a split on some text. We invoke join using call, passing someObj as the first argument; this becomes the this reference within join, which in turn passes it to unsplit and process.
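Here is one way the skeleton above might be filled in, using the line feed case mentioned earlier. The token shapes, the split pattern and the returned object exposing prog1 are all assumptions for the sake of a runnable example; the original skeleton leaves these parts elided.

```javascript
// Copy of the join function from above.
var join = function(arr, unsplit, process) {
    var i, l = arr.length;
    for (i = 0; i < l; i++) {
        if (arr[i] !== '') process(arr[i], this);
        if (i != l - 1) unsplit(this);
    }
};

var module1 = function() {
    var prog1 = function(text) {
        var someObj = { tokens: [] };          // hypothetical shared state
        var arr = text.split(/\r\n|\n/);       // assumed separator pattern
        join.call(someObj, arr, unsplit, process);
        return someObj.tokens;
    };
    // Called once per line feed; obj is the object passed to join.call.
    var unsplit = function(obj) {
        obj.tokens.push({ type: 'linefeed' });
    };
    // Called once per non-empty piece of text.
    var process = function(item, obj) {
        obj.tokens.push({ type: 'text', value: item });
    };
    return { prog1: prog1 };                   // assumption: expose prog1
}();

console.log(module1.prog1('a\n\nb'));
```

Each call to prog1 gets a fresh someObj, so unsplit and process can stay at module level without holding any state of their own.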

Variations

We could skip using call/this and simply add an extra parameter to join to allow us to pass an object in.

Or we could also invoke unsplit and process with call, so that this inside them refers to the shared object. This removes the need to specify the obj parameter in these two functions:

  // Join elements that have been split by String.prototype.split(...).
  var join = function(arr,unsplit,process) {
      var i,l=arr.length;
      for(i=0;i<l;i++) {
          if(arr[i]!=='') process.call(this,arr[i]);
          if(i!=l-1) unsplit.call(this);
      }
  }
  var unsplit = function() {
    ... do something with 'this' ...
  }     
  var process = function(item) {
    ... do something with 'this' ...
  }

We could also define unsplit and process within prog1 giving these functions privileged access to someObj. These functions would be generated every time prog1 is invoked. But there would be no need to mess about with an extra parameter or this.
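A rough sketch of that last variation follows. The simplified join (no this, no extra argument) and the contents of someObj are assumptions chosen to keep the example short.

```javascript
// Simplified join: the callbacks get everything they need via closure.
var join = function(arr, unsplit, process) {
    var i, l = arr.length;
    for (i = 0; i < l; i++) {
        if (arr[i] !== '') process(arr[i]);
        if (i !== l - 1) unsplit();
    }
};

var prog1 = function(text) {
    var someObj = { lineFeeds: 0, words: [] };  // hypothetical state

    // Both functions are recreated on every call to prog1 and
    // have direct access to someObj.
    var unsplit = function() { someObj.lineFeeds++; };
    var process = function(item) { someObj.words.push(item); };

    join(text.split(/\n/), unsplit, process);
    return someObj;
};

var result = prog1('a\n\nb\n');
console.log(result.lineFeeds, result.words);
```

The trade-off is the one noted above: the closures are rebuilt per call, in exchange for simpler signatures.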

Wednesday, September 1, 2010

Surviving the twitter OAuthcalypse on the commandline (using Ruby)

In this article...

I try to cover how to use the twitter APIs from the (linux) commandline via OAuth, using the ruby twitter gem.

Warning:

I'm a very light user of web services and social media in general
I have little knowledge of OAuth other than a general appreciation of what it is trying to do

Quick background...

I woke up to the OAuthcalypse today.

Up till now I had been using twitter in a very innocent, low-cal kind of way from the commandline (via an ungodly combination of curl and bash/shell utilities) and also from emacs. Both methods mysteriously failed today leaving me with blank screens and cryptic error messages.

Whilst I should probably have given up at this point and embraced one of the popular twitter services, I instead ended up wasting half a day wrestling with OAuth in a bid to get my twitter commandline working again.

Ruby twitter gem

I've given up, for the moment, on using shell utilities to access twitter as I did before the OAuthcalypse, although this might still be possible. Instead, I'm going to use ruby, which for me is rapidly turning into the new perl.

There are probably numerous libraries in ruby for doing twitter but I chose John Nunemaker's twitter gem.

I found it helpful to get the actual source which I git cloned
- this turned out to be useful because the source includes a number of example files that are worth looking at
That being done, I installed the twitter gem in the normal way
```
  gem install twitter
```
- This will load several other gems; in my case:
  - oauth-0.4.2
  - hashie-0.2.2
  - crack-0.1.6
  - httparty-0.5.2
  - yajl-ruby-0.7.7
  - twitter-0.9.8
At this point you should be able to do a require 'twitter' successfully

OAuth Terminology

Just to be clear, here are the main protagonists in an OAuth exchange:

service is an oauth enabled web service like twitter
user is a person that has an account with a service and who is using a consumer (or app) to access that service
consumer - consumes a service; a consumer may itself be some sort of service or a client application that the user is using; the consumer has to use oauth protocols to access the user's information in service
app is an alternative name for a consumer; I use both interchangeably

OAuth 1.0a and "out of band" (oob) processing

I'm going to cover the OAuth 1.0a process as it pertains to twitter and as best I can understand it after one day of head pounding.

Note:

Here's a description of how OAuth (1.0) works.
twitter implemented OAuth 1.0a a bit over a year ago.
there's an additional security check required by OAuth 1.0a (compared to OAuth 1.0) and which is discussed here (see oauth_verifier)
- in particular, twitter's pin (mentioned below) is part of a special-case process flow referred to as out of band processing, which I cover below and which is intended for desktop (or in our case, commandline) apps/consumers

Back to OAuth:

OAuth requires 3 sets of tokens; each set consists of a token and a shared secret:

Consumer token / secret ctoken/csecret
- this is a once-only token and shared secret that identifies the app (consumer)
- you only need one for your app; so once you get it, you stash it somewhere where your app can load it
- in the case of twitter you can set up access privileges when setting these up; for twitter this is whether the consumer will have read-only or read/write access
- you can arrange for twitter to generate the ctoken/csecret pair for you by registering your app at http://twitter.com/oauth_clients/new
  - twitter will require you to give it some information such as the application name and a description
    - interestingly, you can't leave the description blank and twitter doesn't like you putting in an app name that has 'twitter' in it; I think you are also required to put in a url for the app
    - I never had to bother with this when I was using my old commandline app with http auth api so I am wondering if this is the only way to proceed now
Request token / secret rtoken/rsecret
- this is a transient token/secret pair that appears to represent the act of requesting an atoken/asecret pair; once you've used it to request an atoken/asecret pair (and hopefully succeeded), you can dump it
- you need to present a valid ctoken/csecret pair to twitter before you can get a rtoken/rsecret pair
- to "activate" this rtoken/rsecret pair the user is required to authenticate with the service (in the case of twitter via a specially crafted login url)
  - the user will be asked to login and then specify whether the consumer that initiated the request token should be allowed to proceed (allow or deny)
    - note: it may also be possible here to specify authorization privileges but in the case of twitter, this was done during the consumer token phase above
  - the user authenticates successfully and clicks 'allow',
  - because we're using out of band (oob) processing, the service will display a pin number; the user needs to (manually) give this to the app (our commandline twitter-gem based app) in order for it to proceed
    - the pin number is part of the "out of band" process flow; since we're trying to access twitter from the command line, we are very much out of band
Access token / secret atoken/asecret
- the app uses the pin and the associated rtoken/rsecret pair from the previous step, to authenticate itself with the service; all going well, the service should provide an atoken/asecret pair
- Once the app/consumer has an atoken/asecret pair, it can access the user's data from the service; this pair is acting like a substitute username/password.
  - note that the app/consumer never gets to see the real username/password of the user's account for the service and that the access pair can be easily revoked or set to expire

Using ruby twitter gem

Setting up config

If you looked at the examples section of the source for the twitter gem, there is a helpers/ directory containing config_store.rb
- This defines a small class called ConfigStore that can be configured to load information (such as stored tokens) from a yaml file
- Here's an example yaml file
```
  --- 
  ctoken: random-string-of-characters
  csecret: random-string-of-characters
  atoken: random-string-of-characters
  asecret: random-string-of-characters
```
- You'll need to set ctoken and csecret by visiting twitter to register your application.
- You won't be able to set atoken or asecret; these will be stored by ConfigStore when you do a successful authentication so leave these out for now
- I use a slightly modified version of config_store.rb which I copied from examples/ into my directory of choice

Managing an OAuth session

The first thing we've got to do is manage an OAuth session.

I've managed to boil it down to one of two routes:

if you don't have a valid atoken/asecret, then you'll need to do a "full login", which means requesting an rtoken/rsecret pair and then getting a pin out-of-band and feeding it back to our commandline app
if we have a valid atoken/asecret, then we can skip the above rigmarole and access the service directly since the atoken/asecret is acting like a temporary username/password.

I've encapsulated this behaviour in a TwitterSession class:


require 'twitter'
require File.join(File.dirname(__FILE__), 'config_store')
require 'pp'

# Handles OAuth authentication with twitter.
#
# @config must already contain a valid 'ctoken' and 'csecret'
# which you can get from twitter: http://twitter.com/oauth_clients/new

class TwitterSession

  attr_reader :oauth,:config

  def initialize config
    @config = ConfigStore.new(config)

    # Request rtoken/rsecret and login url from service:
    @oauth = Twitter::OAuth.new(@config['ctoken'], @config['csecret'])
    @config.update({ 'rtoken'  => @oauth.request_token.token,
                     'rsecret' => @oauth.request_token.secret, })
  end

  # Request new atoken.
  #
  # @config must already contain a valid 'ctoken' and 'csecret'
  # which you can get from twitter: http://twitter.com/oauth_clients/new
  #
  # You will need to do an out-of-band process which
  # will load a browser (lynx) to log the user into
  # twitter and which will provide a pin.

  def login

    # Get user to login and allow the consumer to proceed:
    #%x(firefox #{@oauth.request_token.authorize_url})
    system %{lynx #{@oauth.request_token.authorize_url}}

    STDOUT.print "> what was the PIN twitter provided you with? "
    pin = STDIN.gets.chomp

    @oauth.authorize_from_request(@oauth.request_token.token,
                                  @oauth.request_token.secret,
                                  pin)
    @config.update({ 'atoken'  => @oauth.access_token.token,
                     'asecret' => @oauth.access_token.secret,
                   }).delete('rtoken', 'rsecret')
  end

  # Login with existing atoken.

  def login_with_atoken
    if(@config['atoken'] && @config['asecret'])
      @oauth.authorize_from_access(@config['atoken'], @config['asecret'])
    else
      login
    end
  end

end

In the above file:

In addition to requiring the twitter gem, I also require my version of config_store.rb which is almost identical to the one in examples/
we initialize TwitterSession by giving it a name of the config store
- I have multiple accounts which, for the moment, I'll manage in separate stores and which we will instantiate separate TwitterSession instances for
initialize makes an oauth call to get the request token/secret pair
login is the full login method
- it uses ruby's system to call lynx which loads up the authentication/authorization page on twitter where we will get the pin; this will all get done in the same console;
- we copy the pin to the clipboard
- we then quit lynx
- the procedure will then ask for the pin and read it from STDIN
- login will then attempt to get atoken/asecret using the rtoken/rsecret pair and associated pin
login_with_atoken is the quick login method which will only work if you have an existing valid atoken/asecret in your config store
- You'll be able to run this most of the time after doing a single login.

Getting your timeline

We take the above TwitterSession class and use it like so:


require File.join(File.dirname(__FILE__), 'session')
sess = TwitterSession.new("/home/danb/twitter2/config_store")

# Force full login if we specify -l on commandline.
if ARGV[0]=='-l'
  sess.login
else
  sess.login_with_atoken
end

client = Twitter::Base.new(sess.oauth)
pp client.user_timeline