I want to take a quick look at splitting and joining text using JavaScript.
Splitting
Suppose you want to split some text. JavaScript (like many other languages) makes this very easy:
',,a,,,,b,,c,,'.split(/,/) // case (I)
=> ["", "", "a", "", "", "", "b", "", "c", "", ""]
Or
',,a,,,,b,,'.split(/,+/) // case (II)
=> ["", "a", "b", ""]
You can recreate the string for (I) using the built-in join function:
',,a,,,,b,,c,,'.split(/,/).join(',')
=> ',,a,,,,b,,c,,'
Case (II) can't be put back the way it was because, for any given joining point, we do not know the size of the joining item: /,+/ matches a variable number of characters (commas in this case).
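A quick way to see the ambiguity in case (II) is to split two different strings and compare the results. This is only a small sketch; the variable names are mine and chosen for illustration:

```javascript
// Two inputs that differ only in how many commas sit between the terms.
var s1 = ',,a,,,,b,,';
var s2 = ',a,b,';

// Splitting on a run of commas discards the length of each run.
var r1 = s1.split(/,+/);
var r2 = s2.split(/,+/);

console.log(JSON.stringify(r1)); // ["","a","b",""]
console.log(JSON.stringify(r2)); // ["","a","b",""]

// Joining with a single comma rebuilds s2, not s1.
var rejoined = r1.join(',');
console.log(rejoined === s1); // false
console.log(rejoined === s2); // true
```

Both arrays are identical, so the array alone cannot tell us which string it came from.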
Joining for case (I)
Sometimes, you want to join a split as in case (I) but not back into a string. When I first tried to do this I ended up writing a horrifically complicated function.
Looking at this case again:
',,a,,,,b,,c,,'.split(/,/) // case (I)
=> ["", "", "a", "", "", "", "b", "", "c", "", ""]
The thing to remember is that "" represents the gaps between the commas in ',,a,,,,b,,c,,', including the gap before the very first comma and the gap after the very last comma. "a", "b" etc. are filled-in gaps. This is probably what is confusing about manually joining such a split array: it's easy to fall into thinking that the ""-terms represent commas instead of the gaps.
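To make the gap idea concrete, here is a small sketch that counts the commas and the gaps for the case (I) string. The variable names are mine:

```javascript
var text = ',,a,,,,b,,c,,';
var parts = text.split(/,/);

// Count the commas directly in the original string.
var commaCount = (text.match(/,/g) || []).length;

console.log(commaCount);   // 10
console.log(parts.length); // 11, one more gap than there are commas

// The empty strings are the unfilled gaps; the rest are filled-in gaps.
var emptyGaps = parts.filter(function (p) { return p === ''; }).length;
console.log(emptyGaps);                // 8
console.log(parts.length - emptyGaps); // 3 ("a", "b" and "c")
```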
Algorithm for manual joining
From an algorithmic point of view we want to map over the array produced in case (I) and process both the "" and non-"" terms.
The commas in the string may signify a point where we want to insert something.
In my case, the strings I was splitting were text nodes from preformatted text (in pre-tags) that contained line feeds (\n or \r\n). I was tokenizing the text and wanted to preserve line feeds in the form of individual span tokens. So in this case the commas in case (I) would represent line feeds, e.g. '\n\na\n\n\n\nb\n\nc\n\n' instead of ',,a,,,,b,,c,,'.
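As a sketch of the line-feed version, the split below produces an array with the same shape as case (I). The regular expression /\r?\n/ is my assumption for matching both \n and \r\n; the exact pattern used in the original tokenizer isn't shown here:

```javascript
// Same shape as ',,a,,,,b,,c,,' but with line feeds instead of commas.
var text = '\n\na\n\n\n\nb\n\nc\n\n';

// Assumed pattern: an optional carriage return followed by a line feed.
var lines = text.split(/\r?\n/);
console.log(JSON.stringify(lines));
// ["","","a","","","","b","","c","",""]

// Windows-style line endings split at the same points.
var winLines = 'a\r\n\r\nb'.split(/\r?\n/);
console.log(JSON.stringify(winLines)); // ["a","","b"]
```

Note that this pattern treats each \n or \r\n as one separator, so the array does not record which of the two forms was in the text.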
Going back to case (I), the terms (or gaps) are the best indication of where the commas are; if there are n commas, then there will be n+1 gaps (including filled-in ones). Keeping this in mind, the rules we could follow as we map over the array might be:
- when we have a ""-term we insert a comma
- when we have a non-"" term we insert the term followed by a comma
- at the last position in the array don't insert a comma
  - if the last position in the array is a "" then do nothing
  - if the last position in the array is a filled-in gap, process it but don't insert a comma
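The rules above can be sketched as a plain loop. To keep it checkable, this version just rebuilds the original string; rebuild is a hypothetical helper name of mine, not something from a library:

```javascript
// Rebuild the original string from a case (I) split by following the rules.
function rebuild(arr) {
  var out = '';
  for (var i = 0; i < arr.length; i++) {
    var isLast = (i === arr.length - 1);
    if (arr[i] !== '') {
      out += arr[i]; // filled-in gap: insert the term
    }
    if (!isLast) {
      out += ',';    // every position except the last is followed by a comma
    }
  }
  return out;
}

var original = ',,a,,,,b,,c,,';
var rebuilt = rebuild(original.split(/,/));
console.log(rebuilt === original); // true
```

Swapping the two `out +=` lines for something else (creating tokens, say) is what turns this from a string join into a general one.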
Functional approach
There are some nice ways to do this in JavaScript. ECMAScript 5 has array methods such as map, forEach and reduce that might assist, but here is a manual version that, whilst not overly functional, facilitates a functional style when used (using the term 'functional' in a very loose sense):
// Join elements that have been split by String.prototype.split(...).
var join = function(arr,unsplit,process) {
    var i,l=arr.length;
    for(i=0;i<l;i++) {
        if(arr[i]!=='') process(arr[i],this);
        if(i!=l-1) unsplit(this);
    }
};
Notes:
- unsplit is a function that represents the "insert comma" operation
- process is a function that represents the "insert term" operation which we apply to filled-in gaps like "a"
- in addition, we pass this to both unsplit and process, as this can facilitate sharing privileged information between unsplit and process, although this isn't necessary.
We could run join like this:
join(arr,f,g)
for some array arr and functions f and g.
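Here is a minimal sketch of such a call. The join function is repeated so the snippet runs on its own; the particular f and g, and the out array they write to, are made up for illustration:

```javascript
// The join function from above, repeated so this snippet is self-contained.
var join = function (arr, unsplit, process) {
  var i, l = arr.length;
  for (i = 0; i < l; i++) {
    if (arr[i] !== '') process(arr[i], this);
    if (i != l - 1) unsplit(this);
  }
};

var arr = 'a,,b'.split(/,/); // ["a", "", "b"]
var out = [];

// f plays the role of unsplit, g the role of process.
var f = function () { out.push('<sep>'); };
var g = function (item) { out.push(item.toUpperCase()); };

join(arr, f, g);
console.log(JSON.stringify(out)); // ["A","<sep>","<sep>","B"]
```

In this sketch f and g simply ignore the extra argument that join passes them.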
But suppose we want to accumulate a result as join maps over arr, or otherwise share privileged information between f and g. This is where this could be used:
var module1 = function() {
    var prog1 = function(text) {
        ...
        var someObj = {};
        ... initialize someObj ...
        var arr = text.split(...);
        join.call(someObj,arr,unsplit,process);
        ...
    }
    var unsplit = function(obj) {
        ...
    }
    var process = function(item,obj) {
        ...
    }
}();
In the above we have a function prog1 inside a module that performs a split on some text. We invoke join using call, passing someObj as the first argument; this becomes the this reference within join, which in turn passes this to unsplit and process.
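To show the pattern end to end, here is one way the elided parts might be filled in. Everything specific is my assumption for the sake of a runnable sketch: the tokens array on someObj, the token shapes, the /\r?\n/ pattern, and the fact that the module returns prog1 so it can be called from outside:

```javascript
var join = function (arr, unsplit, process) {
  var i, l = arr.length;
  for (i = 0; i < l; i++) {
    if (arr[i] !== '') process(arr[i], this);
    if (i != l - 1) unsplit(this);
  }
};

var module1 = function () {
  var prog1 = function (text) {
    var someObj = { tokens: [] };      // hypothetical accumulator
    var arr = text.split(/\r?\n/);     // assumed separator: line feeds
    join.call(someObj, arr, unsplit, process);
    return someObj.tokens;
  };
  // Receives someObj via the argument that join passes along.
  var unsplit = function (obj) {
    obj.tokens.push({ type: 'linefeed' });
  };
  var process = function (item, obj) {
    obj.tokens.push({ type: 'text', value: item });
  };
  return { prog1: prog1 };             // exposed only so the sketch can be run
}();

var tokens = module1.prog1('a\n\nb');
console.log(tokens.length);                          // 4
console.log(tokens.map(function (t) { return t.type; }).join(' '));
// text linefeed linefeed text
```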
Variations
We could skip using call/this and simply add an extra parameter to join to allow us to pass an object in.
Or we could also use call when invoking unsplit and process, so that this inside them refers to the same object. This removes the need to specify the obj parameter in these two functions:
// Join elements that have been split by String.prototype.split(...).
var join = function(arr,unsplit,process) {
    var i,l=arr.length;
    for(i=0;i<l;i++) {
        if(arr[i]!=='') process.call(this,arr[i]);
        if(i!=l-1) unsplit.call(this);
    }
};
var unsplit = function() {
    ... do something with 'this' ...
}
var process = function(item) {
    ... do something with 'this' ...
}
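A runnable sketch of this variation might look like the following. The acc object and what unsplit and process do with it are assumptions of mine, chosen only to show that this is shared:

```javascript
var join = function (arr, unsplit, process) {
  var i, l = arr.length;
  for (i = 0; i < l; i++) {
    if (arr[i] !== '') process.call(this, arr[i]);
    if (i != l - 1) unsplit.call(this);
  }
};

// Both functions read and write the shared object through 'this'.
var unsplit = function () {
  this.separators += 1;
};
var process = function (item) {
  this.terms.push(item);
};

var acc = { terms: [], separators: 0 }; // hypothetical accumulator
join.call(acc, 'a,,b'.split(/,/), unsplit, process);

console.log(JSON.stringify(acc.terms)); // ["a","b"]
console.log(acc.separators);            // 2
```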
We could also define unsplit and process within prog1, giving these functions privileged access to someObj. These functions would be generated every time prog1 is invoked. But there would be no need to mess about with an extra parameter or this.
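As a sketch of this last variation, join below takes no notice of this at all, and unsplit and process are closures created inside prog1. The parts array and the '|' marker are made up for illustration:

```javascript
// A join that neither uses 'this' nor passes an extra object along.
var join = function (arr, unsplit, process) {
  for (var i = 0, l = arr.length; i < l; i++) {
    if (arr[i] !== '') process(arr[i]);
    if (i != l - 1) unsplit();
  }
};

var prog1 = function (text) {
  var someObj = { parts: [] }; // hypothetical accumulator

  // Created on every call to prog1; both close over someObj.
  var unsplit = function () { someObj.parts.push('|'); };
  var process = function (item) { someObj.parts.push(item); };

  join(text.split(/,/), unsplit, process);
  return someObj.parts.join('');
};

console.log(prog1(',a,,b')); // |a||b
```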