Bumblebees in Sandy, 2012

Following on from my other enthralling posts about grasshoppers and bush-crickets, here is one about bumblebees. I always used to think there were two sorts of bee: honey bees and bumblebees. I later thought there were two sorts of bumblebee: buff-tailed and red-tailed. However, it turns out that there are loads of bumblebees: about 25 species in the UK, although some of them are rare. Like the grasshoppers, bumblebees can be tricky to identify as they vary according to whether they are male or female or what kind of female they are: queen or worker. There are also considerable variations within species, while some different species look the same as each other: see the first one below, which is impossible to positively identify from a photo, or at least my photo. I got myself an excellent book recommended by Emily Heath* and submitted records to Beewatch, which has tools for identification as well as adding to national distribution data. Like the orthoptera scheme they also email you with confirmation of whether you got it right or not. I saw seven confirmed species of bumblebee in Bedfordshire over the summer, six of those in Sandy, and four in the garden.

I have followed the book’s practice of using the scientific name of each species as there is no consistency in common names. And it saves me some hassle. I have also noted whether each species is a social bumblebee (queen, workers, and males living in a nest a bit like a honey bee hive) or a cuckoo bumblebee (only females and males: the females take over social bumblebee nests whose workers raise the cuckoo female’s young). I never dreamt that such things as cuckoo bumblebees existed.

Bombus vestalis or Bombus bohemicus

Bombus vestalis or Bombus bohemicus

A cuckoo bumblebee, but it is uncertain precisely which species it is. These two species are very difficult to tell apart without catching them and examining them properly. From the photo, the Beewatch people could not be definite which it was. In Sandy, just off Sunderland Road.

Bombus pratorum

Bombus pratorum

A social bumblebee. In Sandy, in the garden.

Bombus terrestris

Bombus terrestris

The buff-tailed bumblebee, a social bumblebee. In Sandy, in the garden.

Bombus rupestris

Bombus rupestris

A cuckoo bumblebee. In Sandy, near the station.

Bombus hypnorum

Bombus hypnorum

Tree bumblebee, a social bumblebee. First seen in the UK in 2001. In Sandy, in the garden.

Bombus campestris (probably)

Bombus campestris

A cuckoo bumblebee. In Willington (between Sandy and Bedford).

Bombus pascuorum

Bombus pascuorum

Bombus pascuorum male

A social bumblebee. In Sandy, in the garden. Beewatch confirmed the first picture and I’m pretty sure about the id for the male (ginger beard and very round body), which makes it the first time I’ve seen a male bee and known it was a male.

* Edwards and Jenner. Field guide to the bumblebees of Great Britain & Ireland. 2005

Bush-crickets in Sandy, 2012

Following on from my write-up of grasshoppers I’ve seen in Sandy, I would like to do the same for their close relatives, bush-crickets. Bush-crickets are also orthoptera, but I find them more interesting. They are generally shorter but bulkier and larger overall, and have really long antennae, hence their old-fashioned name: long-horned grasshoppers; in America I believe they are also known as katydids. They are a lot easier to identify than grasshoppers: colours tend to be more consistent (although most of them seem to be green), and the females in particular have long ovipositors at the back whose shape tends to give the species away. Some of them tend to hang around on tops of leaves if you keep your eyes open (especially dark bush-crickets and speckled bush-crickets), and some have repetitive (dark bush-cricket) or long (Roesel’s bush-cricket) songs, which helps in tracking them down. Some of them have been living in our garden for years, which helps. All these photos were taken in Sandy, several in our garden.

Dark bush-cricket (Pholidoptera griseoaptera)

Female dark bush-cricket

Male dark bush-cricket

The dark bush-cricket is commonly found on brambles, sometimes sitting on top of the leaves. Their song is quite distinctive: a short repeated buzz. I once went for a run at twilight past about 100 yards of brambles. I’ve never seen any there and couldn’t see any then as it was too dark, but I heard loads all the way along. The male and female look quite different. Only the male sings and so is the only one with any wings to speak of, although these are hardly there either. The female has a clearly visible, long, curved ovipositor.

More on this at the Orthoptera & Allied Insects site and Wikipedia.

Speckled bush-cricket (Leptophyes punctatissima)

Female speckled bush-cricket

Last year I managed to get loads of photos of this species as they were active and breeding in the lavender in our garden (e.g. this one from above, this one shedding its skin, these two mating, and so on) but they weren’t around so much this year, presumably because there was less sun. Like the dark bush-cricket, they are also found on brambles, and I’ve seen them together a few times. The picture above shows the wonderful crazy eyes bush-crickets have, especially when their antennae are going all over the place. They are also, as one might expect, speckled, although it’s not as obvious to the naked eye as it is on a photo. Although you can’t see it here, the ovipositor is sickle shaped. It does sing but its wings are so small that the song is barely audible without a bat detector.

More on this at the Orthoptera & Allied Insects site and Wikipedia.

Oak bush-cricket (Meconema thalassinum)

Female oak bush-cricket

This one lives in trees, but I found it on a path near the health centre in Sandy. If you go near a bush-cricket, they normally jump away more violently than their awkward walking would suggest. This one though actually walked onto my hand like a ladybird would. It kept walking and was happy to keep walking over my hands and my coat. I was on my way to pick up my daughter from school and it stayed on my coat the whole way there, while I was waiting, and all the way home. I put it down on the pebbles in the back garden, where I got the above photo. It was so tame, it let me put it on the rosemary and, when I’d changed my mind, onto the apple tree where I thought it’d be happier. I haven’t seen it since though.

You can see it has a straighter ovipositor than the two above. Despite the larger wings, it doesn’t sing at all but (apparently) drums its foot on a leaf.

More on this at the Orthoptera & Allied Insects site and Wikipedia.

Long-winged conehead (Conocephalus discolor)

Male long-winged conehead

Female long-winged conehead

This is one I tracked down by sound on a patch of waste ground near the railway line. This is one example of an insect whose range has expanded hugely in recent years, presumably as a result of climate change. They have an excellent name, and there is indeed a short-winged conehead.* The ovipositor is almost straight.

More on this at the Orthoptera & Allied Insects site and Wikipedia.

Roesel’s bush-cricket (Metrioptera roeseli)

I sadly didn’t get any pictures of these this year although I saw a few and heard loads more. These are relatively easy to find as their song is a long aggressive buzzing, so you can home in on them quickly. They hide in long grass, though, so it’s hard to get a camera near them without a blade of grass getting in the way. They are also somewhat more jumpy than the oak bush-cricket mentioned above so if you get too close and alarm them they jump and disappear in a flash. However, here is one from 2011:

Male roesel's bush-cricket

It looks a little like a dark bush-cricket at first glance although it has a distinctive pale U-shape behind the head. The female has a sharply curved ovipositor. They normally have shortish wings, but in good sunny years long-winged (macropterous) individuals appear and the one above is such a macropterous example. You can see and hear this singing in this dodgy video I took.

More on this at the Orthoptera & Allied Insects site and Wikipedia.

Hopefully next year there will be more sun, so there are more insects and more light for the camera!

* It has short wings.

Grasshoppers in Sandy, 2012

I don’t recall as a child ever seeing a grasshopper or cricket except for some locusts in the school science lab. Like most nature I assumed I didn’t live in the right place, or that these things were too scarce, too shy, or hard to find. I’ve always been useless at spotting birds, even when pointed out to me. Trailing my own children round the countryside and waste ground round Sandy and trying to find where in the grass some insect noises were actually coming from, I discovered that these things are not that hard to find. Grasshoppers and crickets (orthoptera) are actually quite common, distinctive, relatively large, and also very inclined to stay still, which makes photography a hell of a lot easier. Although common, there are not that many species (36 breeding species*) in the UK, so identifying them is not impossibly difficult.

That said, grasshoppers are problematic to identify as the differences between species can be subtle (e.g. shape of the pronotum behind their head, wing length, bulges on wings, and the shape of the antennae), even with a decent photograph. However, I have started submitting records to the Orthoptera & Allied Insects Recording Scheme. I like schemes like this as it means I can contribute something to SCIENCE (especially in view of climate change which seems to be having real effects as some crickets in particular are quickly spreading north) while also getting expert confirmation of my identifications.

After getting the bug** in 2011, I was really looking forward to summer 2012 as I knew the good sites around Sandy and had a good idea what I was looking for. I was also hoping that I might know how my camera works by now. However, 2012 was a notoriously bad year for insects. I don’t think grasshoppers are in quite the same bad situation as they don’t feed on nectar like butterflies and bees, but I didn’t see too many, possibly more due to the rain stopping me going out to look for them as much as I would have liked. I did get loads of pictures of young grasshoppers (nymphs) so they must have been around.

I only saw two confirmed species of grasshopper. The first two pictures below are confirmed by the recording scheme. All photos were taken in Sandy, Bedfordshire.

Meadow grasshopper (Chorthippus parallelus)

Meadow Grasshopper, Sandy

Female meadow grasshopper (Chorthippus parallelus)

This is a relatively distinctive grasshopper, although don’t ask me to explain why (parallel pronotal keels and short wings are a start). The ones I’ve seen have all had the good manners to be green which this species tends to be: grasshoppers have a tendency to be all kinds of colours, even pink (photo by buzzbee4826). This one has the misfortune to only have five legs, which seems to be a relatively common affliction.

More on this at the Orthoptera & Allied Insects site and Wikipedia.

Field grasshopper (Chorthippus brunneus)

Field grasshopper nymph

Field grasshopper nymph (Chorthippus brunneus)

This seems to be the most common grasshopper in Sandy. When I think I’ve found something a bit different it normally turns out to be one of these. If nothing else, its wings are generally longer than the meadow grasshopper and the colours are normally all over the shop, none of which helps with identifying the nymphs. However, this photo was confirmed by an expert. The other photo is an adult, on a fencepost next to some vegetation growing over a path a stone’s throw from the house. Once you start looking for these things they turn up all over the place.

More on this at the Orthoptera & Allied Insects site and Wikipedia (although not a lot more).

Field grasshopper

Field grasshopper adult (Chorthippus brunneus)

* Evans and Edmondson. A photographic guide to the grasshoppers & crickets of Britain & Ireland. 2007. p. 7.

** LOL!!!

Running

For my own personal record as much as anything, here are the races I’ve entered and the chip times recorded:

Cambridge Cambourne 10K
3 April 2011
00:56:39

Silverstone Half Marathon
11 March 2012
02:13:46

Standalone 10K (2012)
7 October 2012
00:56:23

MRV MARC Record Viewer

I have finally completed a multiple record MARC Record Viewer. This has been rather long in the making but is essentially a quick and practical tool for looking at and assessing MARC records without having to load them into specialist software like MarcEdit or an LMS. It is essentially the same as the viewer built for my Codecademy project except that:

  • It reads multiple records in one file, rather than just one, and provides a count.
  • It has an input box so the records don’t have to be hard-coded into the script.

Some example .mrc records of varying lengths can be found here.
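
For anyone wondering how several records can be pulled out of one file, here is a minimal sketch. It is not the viewer's actual code, and the function names `splitRecords` and `readDirectory` are made up for illustration. It relies on two features of the MARC 21 exchange format: every record ends with a record terminator character (hex 1D), and every record starts with a 24-character leader followed by a directory of 12-character entries (a 3-character tag, a 4-digit field length, and a 5-digit starting position), which is closed by a field terminator (hex 1E).

```javascript
var RECORD_TERMINATOR = "\x1D";
var FIELD_TERMINATOR = "\x1E";

// Split the contents of a .mrc file into individual raw records
function splitRecords(raw) {
  return raw.split(RECORD_TERMINATOR).filter(function (record) {
    // Drop the empty string left after the final terminator
    return record.length > 0;
  });
}

// Read the directory of one raw record into a list of {tag, length, start}
function readDirectory(record) {
  var entries = [];
  // The directory sits between the 24-character leader and the first field terminator
  var directory = record.substring(24, record.indexOf(FIELD_TERMINATOR));
  for (var i = 0; i + 12 <= directory.length; i += 12) {
    entries.push({
      tag: directory.substring(i, i + 3),
      length: parseInt(directory.substring(i + 3, i + 7), 10),
      start: parseInt(directory.substring(i + 7, i + 12), 10)
    });
  }
  return entries;
}
```

The record count is then just `splitRecords(fileContents).length`. Note that the lengths and positions in the directory are counted in bytes, so a real viewer has to take care with records containing non-ASCII characters.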

It is written in client-side Javascript, so you can view source and see how it works, copy it, and do what you like with it (although I would love to know if you do so). I quite defiantly haven’t used JQuery for this, which would probably have made the whole thing a bit easier; instead it uses proper old skool DOM scripting. It uses a minimal amount of CSS, in two files: a generic one, and one that roughly mimics how MARC records look in an Aleph editing screen. It should be fairly trivial to change this file to suit other purposes.

Thank you to those who have already had a shufti at earlier versions of this, especially on different browsers, and provided feedback! Please do let me know if you have any comments on this, suggestions for improvements, or if you come across errors. I have some ideas for improvements, mainly for making user input easier, and offering different formatting of results. I hope to start using JQuery for these too, and perhaps a later conversion of the whole thing would be in order.

One record in lots of data formats

For a Dev8d session I did with Owen Stephens in February I presented data for a single book and followed how it had changed as standards changed, trying above all to explain to non-cataloguers why catalogue records look and work the way they do. At least one person found it useful. I am now drafting an internal session at work on the future of cataloguing and am planning to take a similar approach to briefly explain how we got to AACR2 and MARC21, and where we are heading. I took the example I used at Dev8d and hand-crafted some RDA examples, obtained a raw .mrc MARC21 file, and used the RDF from Worldcat to come up with a linked data example.

I have tried to avoid notes on the examples themselves. However, do note the following: the examples only generally use the same simple set of data elements, basically the bits you might find on a basic catalogue card: no subjects, few notes, etc.; the book is quite old so there is no ISBN anyway. The original index card is from our digitised card catalogue. The linked data example was compiled by copying the RDFa from the Worldcat page for the book; this was then put into this RDFa viewer (suggested by Manu Sporny) to extract the raw RDF/Turtle; I manually hacked this further to replace full URIs with prefixes as much as possible in an attempt to make it more readable (I suspect this is where some errors may have crept in). The example itself is of course a conversion from an AACR2/MARC21 record. C.M. Berners-Lee is Tim’s dad.

Feel free to use this and to point out mistakes. I would particularly welcome anyone spotting anything amiss in the RDA and linked data, where I am sure I have mangled the punctuation in both.

Harvard Citation

Berners-Lee, C.M. (ed.) 1965, Models For Decision: a Conference under the Auspices of the United Kingdom Automation Council Organised by the British Computer Society and the Operational Research Society, English Universities Press, London.

Pre-AACR2 on Index Card

BERNERS-LEE, C.M., [ed.].

Models for decision; a conference under the auspices of the United Kingdom Automation Council organised by the British Computer Society and the Operational Research Society.

London, 1965.

x, 149p. illus. 22cm.

AACR2 on Index Card

Models for decision : a conference under the auspices of the United Kingdom Automation Council organised by the British Computer Society and the Operational Research Society / edited by C.M. Berners-Lee. -- London : English Universities Press, 1965.

x, 149 p. : ill. ; 23 cm.

Includes bibliographical references.

-       Berners-Lee, C. M.

AACR2 in MARC21 (raw .mrc)

00788nam a2200181 a 4500001002700000005001700027008004100044024001500085245021000100260004900310300003200359504004100391650003300432700002300465710003900488710003000527710004900557_UCL01000000000000000477125_20061112120300.0_850710s1965    enka     b    000 0 eng  _8 _ax280050495_00_aModels for decision :_ba conference under the auspices of the United Kingdom Automation Council organised by the British Computer Society and the Operational Research Society /_cedited by C.M. Berners-Lee._  _aLondon :_bEnglish Universities Press,_c1965._  _ax, 149 p. :_bill. ;_c23 cm._  _aIncludes bibliographical references._ 0_aDecision making_vCongresses._1 _aBerners-Lee, C. M._2 _aUnited Kingdom Automation Council._2 _aBritish Computer Society._2 _aOperational Research Society (Great Britain)__

AACR2 in MARC21

245 00 $a Models for decision :
$b a conference under the auspices of the United Kingdom Automation Council organised by the British Computer Society and the Operational Research Society /
$c edited by C.M. Berners-Lee.
260 __ $a London :
$b English Universities Press,
$c 1965.
300 __ $a x, 149 p. :
$b ill. ;
$c 23 cm.
504 __ $a Includes bibliographical references.
700 1_ $a Berners-Lee, C. M.

RDA

Title proper Models for decision
Other title information a conference under the auspices of the United Kingdom Automation Council organised by the British Computer Society and the Operational Research Society
Statement of responsibility relating to title proper edited by C.M. Berners-Lee
Place of publication London
Publisher’s name The English Universities Press Limited
Date of publication 1965
Copyright date ©1965
Media type unmediated
Carrier type volume
Extent x, 149 pages
Dimensions 23 cm
Content type text
Illustrative content Illustrations
Supplementary content Includes bibliographical references.
Contributor Berners-Lee, C. M.
Relationship designator editor of compilation

RDA in MARC21

245 00 $a Models for decision :
$b a conference under the auspices of the United Kingdom Automation Council organised by the British Computer Society and the Operational Research Society /
$c edited by C.M. Berners-Lee.
264 _1 $a London :
$b The English Universities Press Limited,
$c 1965.
264 _4 $c ©1965
300 __ $a x, 149 pages :
$b illustrations ;
$c 23 cm.
336 __ $a text
$2 rdacontent
337 __ $a unmediated
$2 rdamedia
338 __ $a volume
$2 rdacarrier
504 __ $a Includes bibliographical references.
700 1_ $a Berners-Lee, C. M.,
$e editor of compilation.

Linked data


@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix schema: <http://schema.org/> .
@prefix worldcat: <http://www.worldcat.org/oclc/> .
@prefix library: <http://purl.org/library/> .
@prefix viaf: <http://viaf.org/viaf/> .
@prefix lc_authorities: <http://id.loc.gov/authorities/names/> .
@prefix madsrdf: <http://www.loc.gov/mads/rdf/v1#> .

worldcat:221944758
  rdf:type schema:Book;
  library:oclcnum "221944758";
  schema:name "Models for decision : a conference under the auspices of the United Kingdom Automation Council organised by the British Computer Society and the Operational Research Society";
  library:placeOfPublication _:1;
  schema:publisher _:4;
  schema:datePublished "[1965]";
  schema:numberOfPages "149";
  schema:contributor viaf:149407214;
  schema:contributor viaf:130073090;
  schema:contributor viaf:137135158;
  schema:contributor viaf:36887201 .
_:1
  rdf:type schema:Place;
  schema:name "London :" .
_:4
  rdf:type schema:Organization;
  schema:name "English Universities Press" .
viaf:149407214
  rdf:type schema:Organization;
  madsrdf:isIdentifiedByAuthority lc_authorities:n79056431;
  schema:name "British Computer Society." .
viaf:130073090
  rdf:type schema:Organization;
  madsrdf:isIdentifiedByAuthority lc_authorities:n85076053;
  schema:name "Operational Research Society." .
viaf:137135158
  rdf:type schema:Organization;
  madsrdf:isIdentifiedByAuthority lc_authorities:n79063901;
  schema:name "Institution of Electrical Engineers." .
viaf:36887201
  rdf:type schema:Person;
  schema:name "Berners-Lee, C. M." .

How big is my book: Mashcat session

At Mashcat on 5 July in Cambridge I gave an afternoon session on getting computer readable information from the textual information held in MARC21 300 fields using Javascript and regular expressions. I intended this to be useful for cataloguers who might have done some of Codecademy’s Code Year programme as well as an exploration of how data is entered into catalogue records, its problems, and potential solutions.

AACR2/MARC (and RDA) records store much quantitative information as text, usually as a number followed by units, e.g. “31 cm.” or “xi, 300 p”. This is not easy for computers to deal with. For instance, a computer programme cannot compare two sizes (e.g. “23 cm.” and “25 cm.”) without first extracting a number out of the string (23 and 25) as well as determining the units used (cm). In some cases, units might vary: in AACR2 books below 10 cm. are measured in mm., and non-book materials are often measured in inches (abbreviated to in.). Potential uses for better quantitative data in the 300$c include planning shelving for reclassification and more easily finding books by size or range.

Before the session, I sketched out a possible solution using Javascript and regular expressions to make this conversion for dimensions in the 300$c. I have put up a version of a script to find the size of an item in mm. based on the 300$c, with the addition of an extra row which you can fill in to test your own examples without having to edit the script.

If you do want to look at how it works or try editing it yourself you can view source, copy all the HTML, then paste it into a text editor. Save it, then open the file using a browser to test it. Refresh the browser when you change the file.

The heart of the script looks like this:

var dollar_c = [
  "9 mm",
  "4 in.",
  "4 3/4 in.",
  "30 cm.",
  "1/2 in.",
  "20 x 40 cm."
];

// Convert text to mm
function text_to_mm (text) {
  // Convert mixed fractions (e.g. "4 3/4") to decimals (4.75)
  text = text.replace(/(\d+) (\d+)\/(\d+)/, function(str, p1, p2, p3) {return parseFloat(p1) + p2 / p3;});
  // Convert simple fractions (e.g. "1/2") to decimals (0.5)
  text = text.replace(/(\d+)\/(\d+)/, function(str, p1, p2) {return p1 / p2;});
  // Extract the size of the book as a number
  var size = parseFloat(text.replace(/([\d\.]*).*/, "$1"));
  // Extract the units (the last two lower-case letters in a row)
  var units = text.replace(/.*([a-z]{2}).*/g, "$1");
  // Convert from various units to mm
  var mm;
  if (units === "mm") {
    mm = size;
  } else if (units === "cm") {
    mm = size * 10;
  } else if (units === "in") {
    mm = size * 25.4;
  }
  // Round down to a whole number of mm
  return Math.floor(mm);
}

It starts with a declaration of an array of examples to be tested: you can alter this with your own if you prefer. text_to_mm is the function that does all the work. It takes in the text from a 300$c, converts fractions (e.g. 4 3/4) to decimals (4.75), finds a number, finds a unit, then performs calculations on the size depending on what the unit is to produce a standard figure in mm. At Mashcat, Owen Stephens managed to plug an adaptation of this script into Blacklight to create an index of book sizes. Using this he could do things like find the most common sizes or the largest book in a collection.
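
To give a feel for what a figure in mm makes possible, here is a rough sketch of finding the largest item and the most common sizes in a handful of records. This is not what was built in Blacklight: `sizeInMm` is a hypothetical, cut-down stand-in for text_to_mm that only copes with a whole or decimal number directly followed by mm, cm, or in, and the list of sizes is invented.

```javascript
// Simplified stand-in for text_to_mm (assumption: no fractions, one dimension only)
function sizeInMm(text) {
  var match = text.match(/([\d.]+)\s*(mm|cm|in)/);
  if (!match) {
    return null; // nothing recognisable found
  }
  var factors = { mm: 1, cm: 10, "in": 25.4 };
  return Math.floor(parseFloat(match[1]) * factors[match[2]]);
}

// Invented 300$c values from a handful of records
var sizes = ["23 cm.", "25 cm.", "9 mm", "4 in.", "23 cm."];

// Find the largest item by comparing the converted figures
var largest = sizes.reduce(function (best, current) {
  return sizeInMm(current) > sizeInMm(best) ? current : best;
});

// Count how often each size in mm occurs
var counts = {};
sizes.forEach(function (text) {
  var mm = sizeInMm(text);
  counts[mm] = (counts[mm] || 0) + 1;
});
```

Here `largest` ends up as "25 cm." and `counts` records two items at 230 mm, neither of which could be worked out by comparing the original strings.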

The main focus of my session, however, was on a similar script to figure out how many actual pages there are in a book, given the contents of a 300$a, e.g. “300 p.”, “ix, 350 p.”, “100 p., [45] leaves of plates” (a page being one side of a sheet of paper; a leaf being a sheet of paper only printed on one side, so counting as two pages). I have also published a version of a script to find the absolute no. of pages based on the 300$a, with the similar addition of a row for easy user testing. Potential uses for recording page numbers rather than pagination include planning shelving space, easier-to-understand displays for users, and finding books of specified lengths.

The script starts with a similar array of examples to be tested:

// An array of test examples
var dollar_a = [
  "9 p.",
  "9p",
  "30 leaves",
  "30 p., 20 leaves",
  "xiv, 20 p.",
  "20, 30 p.",
  "20, 30, 40 p.",
  "xv, 20, 30, 40 p., 5, 5 leaves of plates",
  "clviii, ii, 4, vi p."
];

The main function is called text_to_pages. The first thing it does is convert any Roman numerals to Arabic ones. The heavy lifting for this is done by a function by Steven Levithan which does the actual number conversion. However, we still need to identify and extract the Roman numerals from the pagination in order to convert them. This line does the extraction and makes a list of the Roman numerals:

var roman_texts=text.match(/[ivxlc]*[, ]/g);

The session I gave concentrated on regular expressions (a bit like the wildcards you use on library databases but turned up to eleven) which in all cases here are contained within slashes, and I made a simple introductory guide to regular expressions (.docx). There are many guides to regular expressions on the web too, and useful testers to play with such as this one. The regular expression in the line above can be broken down as follows:

  • [ivxlc] uses square brackets to look for any one of the characters listed within them.
  • The following * means to look for any number of these in a row
  • [, ] any of a comma or a space, again using square brackets. Obviously these characters are not used in Roman numerals but they are a convenient method of isolating these characters as numbers rather than, say, the “l” in leaves which would also match otherwise.
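
To see what this expression actually returns, the following sketch runs it on two of the test examples. One thing it shows is that, because * also allows zero characters, a bare space or comma matches on its own; the variant with + (one or more) is an alternative offered here for comparison, not what the script uses.

```javascript
// The expression as used in the script: * allows zero Roman numeral characters
var withStar = "xiv, 20 p.".match(/[ivxlc]*[, ]/g);
// withStar is ["xiv,", " ", " "]: the numeral plus two bare spaces

// A variant using + insists on at least one Roman numeral character
var withPlus = "xiv, 20 p.".match(/[ivxlc]+[, ]/g);
// withPlus is ["xiv,"]

// The "l" and "v" in "leaves" are not followed by a comma or space, so nothing matches
var inLeaves = "30 leaves".match(/[ivxlc]+[, ]/g);
// inLeaves is null
```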

The next few lines work through the list, replace any instances of [, ] with “” (i.e. nothing) to leave the bare Roman numerals, convert all the numbers in the list using Steven Levithan’s functions, then do the replacements on the pagination given in text:

  if (roman_texts) {
    for (var i=0; i<roman_texts.length; i++) {
      // Remove the comma or space
      roman_texts[i]=roman_texts[i].replace(/[, ]/,"");
      var arabic_text = deromanize(roman_texts[i])+" ";
      text = text.replace(roman_texts[i],arabic_text+" ");
    }
  }
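
For anyone who wants to try this step in isolation, here is a self-contained sketch. It takes a different route from the script: a single replace with a callback instead of a loop over a list, and `romanToArabic` is a simple hypothetical stand-in for Steven Levithan’s function that only handles lower-case numerals built from i, v, x, l, and c. The \b at the start is an addition to stop the end of an ordinary word being mistaken for a numeral.

```javascript
// Hypothetical stand-in for deromanize: lower-case numerals using i, v, x, l, c only
function romanToArabic(roman) {
  var values = { i: 1, v: 5, x: 10, l: 50, c: 100 };
  var total = 0;
  for (var i = 0; i < roman.length; i++) {
    var current = values[roman.charAt(i)];
    var next = values[roman.charAt(i + 1)] || 0;
    // A smaller value before a larger one is subtracted (iv = 4, xl = 40)
    total += current < next ? -current : current;
  }
  return total;
}

// Replace each run of Roman numeral characters that is followed by a comma or space
function replaceRomanNumerals(text) {
  return text.replace(/\b([ivxlc]+)(?=[, ])/g, function (match, roman) {
    return String(romanToArabic(roman));
  });
}
```

With this, "clviii, ii, 4, vi p." becomes "158, 2, 4, 6 p." while "30 leaves" is left alone. It does no validation, so a nonsense sequence such as "iiii" is simply added up.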

Like the size script above, the rest of the conversion needs to do two things: find the numbers and find the units. To do this we need to find the sequences involved. While this is easy with something like “24 p.” (number is 24, unit is p) or even “xv leaves” (number is 15, unit is leaves), it becomes troublesome when you get something like “23, 100 p.”: the first number is 23 but there is no unit associated with it, only a comma to signify that it is a sequence at all. The following lines try to get round this problem by looking for sequences where the comma appears to be the unit and then looking ahead to find the next unit. In the “23, 100 p.” example the script would keep looking forward past the 100 until it gets to the “p”.

// Convert 20, 30 p. to 20 p. 30 p
  while (text.match(/\d*,/)) {
    text = text.replace(/(\d*),(.*?(p|leaves))/, "$1 $3 $2");
  }

The first regular expression in the while line looks for:

  • \d* any number of digits. \d is any digit and * looks for any number of them, followed by
  • , a comma

So as long as the script finds any sequences of numbers followed by a comma, it will carry on making the replacement underneath it. The replacement line itself looks for

  • \d* any number of digits again, followed by
  • , a comma
  • .*? which is . any character * any number of times. The ? makes sure that the smallest matching group of characters is matched; otherwise the expression will think that the units corresponding to the number 15 in the pagination “15, 25 p., 50 leaves” is “leaves” rather than “p”.
  • p|leaves either p or leaves. The pipe means either match on the left of it or the right of it. Because this is in a set of round brackets, the pipe only applies there, rather than the whole expression.

Brackets also capture subsets which is really useful here: the first set of () brackets captures the number of pages and stores it as $1, the second set captures everything between the comma and the end of the units as $2, and the third set captures the units only, either “p” or “leaves”, and stores it as $3. So in the example “15, 25 p., 50 leaves”, $1 is “15”, $2 is “ 25 p”, and $3 is “p”. The replacement puts these back in a different order, i.e. “$1 $3 $2”, which would be “15 p 25 p” (give or take a space).
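
The capture groups can be checked directly with match, which returns them as a list. The sketch below does that, and then wraps the loop in a hypothetical function, `expandImpliedUnits`, with two small changes that are suggestions rather than part of the script: it uses \d+ so that only a number followed by a comma triggers a replacement, and it stops if a replacement changes nothing, so that pagination such as “20, 30” with no unit at all cannot make the loop run forever. It also uses "$1 $3$2" to avoid a doubled space.

```javascript
// The capture groups from the replacement line, shown on one example
var example = "15, 25 p., 50 leaves";
var groups = example.match(/(\d*),(.*?(p|leaves))/);
// groups[1] is "15", groups[2] is " 25 p", groups[3] is "p"

// Hypothetical variant of the while loop that cannot spin forever
function expandImpliedUnits(text) {
  // \d+ (rather than \d*) so that only a number followed by a comma counts
  while (/\d+,/.test(text)) {
    var replaced = text.replace(/(\d+),(.*?(p|leaves))/, "$1 $3$2");
    if (replaced === text) {
      break; // no unit found after the comma, so give up rather than loop
    }
    text = replaced;
  }
  return text;
}
```

For instance, `expandImpliedUnits("20, 30, 40 p.")` gives "20 p 30 p 40 p.".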

Now that all the sequences will be in number-unit pairs, we can get on with making a list of them to work through:

 // Find sequences
  var sequences = text.match(/\d+.*?(,|p|leaves)/g);

This looks for:

  • \d+ at least one digit
  • .*? any number of any characters, although not being greedy
  • (,|p|leaves) any of a comma, “p”, or “leaves”. Obviously, if the while loop above has worked, then the comma isn’t needed, but I’ll confess this is a hangover from a previous version of the script…
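
As a quick check of what this expression picks out, here it is run on two strings. The first is pagination already in number-unit pairs; the second is one of the raw examples from earlier, included to show that the non-greedy .*? happily skips over things like a square bracket between the number and its unit.

```javascript
var paired = "20 p 30 p 40 p.".match(/\d+.*?(,|p|leaves)/g);
// paired is ["20 p", "30 p", "40 p"]

var bracketed = "100 p., [45] leaves of plates".match(/\d+.*?(,|p|leaves)/g);
// bracketed is ["100 p", "45] leaves"]
```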

The next section goes through each of the sequences found and extracts the number and then the unit:

// go through sequences
  var pages = 0;
  for (var i=0; i<sequences.length; i++) {
    // Extract no
    var number = parseFloat(sequences[i].match(/\d+/g)[0]);
    var units = sequences[i].match(/(p|leaves)/g)[0];
    if (units == "p") {
      pages+=number;
    }
    if (units == "leaves") {
      pages+=number*2;
    }
  }

The regular expression to find the number is straightforward:

  • \d+ at least one digit

The parseFloat converts the digits as a string to a Javascript number. The regular expression to find the unit is also simple:

  • (p|leaves) either “p” or “leaves”

If the units are “p”, then the variable pages is incremented by the value of the number found; if “leaves”, then pages is incremented by twice that number.

The programme should cope with the loss of abbreviations in RDA: “p.” is expanded to “pages”, but the regular expression to find the units will still find the “p” at the beginning, much as it isn’t put off by the full stop after the “p”. It could be expanded to look for other variations and I will do so if I can:

  • “S.” for German “Seite” or “Seiten”.
  • “leaf”, as in “1 leaf of plates”
  • sequences which start in the middle of larger ones, like journal issues with “xii, p. 546-738”. This one will be the most complicated as it goes against the basic flow of the existing code.

I also haven’t properly tested folded sheets or multiple volume works. Other improvements are needed in failing more gracefully when it doesn’t find what it’s expecting: the programme should really test the existence of the arrays it makes before looping through them, but this would make it harder to understand at a glance or demonstrate on screen so I didn’t do it.
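
As a rough idea of what testing for the arrays before looping might look like, here is a sketch of the counting step with those checks added. The function name `countPages` and the choice to return null when nothing can be counted are assumptions made for illustration; the expressions themselves are the ones used above, and the text is assumed to have had Roman numerals and commas dealt with already.

```javascript
// Count pages in text that is already in number-unit pairs; returns null if nothing is found
function countPages(text) {
  var sequences = text.match(/\d+.*?(,|p|leaves)/g);
  if (!sequences) {
    return null; // no number-unit sequences at all
  }
  var pages = 0;
  var counted = 0;
  for (var i = 0; i < sequences.length; i++) {
    var number = sequences[i].match(/\d+/);
    var units = sequences[i].match(/p|leaves/);
    if (!number || !units) {
      continue; // e.g. a number followed only by a comma
    }
    // A leaf counts as two pages
    pages += parseInt(number[0], 10) * (units[0] === "leaves" ? 2 : 1);
    counted++;
  }
  return counted > 0 ? pages : null;
}
```

So `countPages("30 p., 20 leaves")` gives 70, while text with no recognisable sequences gives null instead of an error.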

The scripts are written in Javascript for several reasons: it is the language that Codecademy focusses on for beginners; it requires no specialist environment, server, or even a web connection: you just need a basic text editor and a browser; it is easy to adapt for a web page if you do manage to build something; and it is the language I am most confident working in. It would be fairly easy to port to other languages though, and Owen changed the size script with some other modifications to work in BeanShell/Java in Blacklight.

I can’t speak for the attendees, but I learnt a lot, and much became clearer, from playing around with these scripts and talking to people at Mashcat:

  • Quite how dependent AACR2 and RDA (and consequently MARC21) are on textual information, even for what appears to be quantitative data.
  • That even for what appears to be standard number-unit data, there are too many complications that make it non-trivial to extract data:
    • fractions (not even decimals) in 300$c
    • differing units: book sizes in mm. or cm. depending on how big the book is; disc sizes in in.; extent in pages or leaves (or volumes or atlases or sheets…)
    • sequences with implied units, such as those with commas.
  • There is frequently a lack of clarity about what is actually being measured:
    • For books the dimension recorded is normally height (although this is not explicit from a user’s point of view; sometimes it’s height and width, and for a folded sheet it could be all sorts of things); for a disc it’s the diameter.
    • For the 300$a what’s being recorded is pagination, something entirely different from number of pages. Although important for things like rare books, how important is complete pagination for most users compared to a robust idea of how large a book is? Amazon provide a number of pages. More importantly, how understandable is pagination? During my demonstration, some of my audience of librarians were left cold by the meanings of square brackets for example (and square brackets can mean any number of things depending on context). Perhaps there is room for both.

I suppose this latter point is a potential conclusion. Ed Chamberlain asked me what I thought should be done. I don’t know to be honest. I think, like much of the catalogue record, lots more research is needed to see what users (both human and computer) actually want or need. It should be said that entering pagination is in many ways easier for the cataloguer. However, I do think we need:

  • quantitative data entered as numbers with clear and standard units. For instance, record all book heights as mm. and convert to cm. for display if needed.
  • more data elements to properly make clear what is being recorded. Instead of a generic dimension, we need height, width, depth?, diameter, etc. Instead of pagination, we could have separate elements for pagination, number of pages, and number of volumes (50 volumes each of 10 pages is not the same as 4 volumes of 1000 pages each). Obviously all of them wouldn’t be needed for all items.
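As a rough sketch of what such a record might look like: every field name below is hypothetical and not drawn from MARC21, AACR2, or RDA, and rounding up to the next whole centimetre for display is an assumption too.

```javascript
// Sketch only: all field names are invented for illustration.
var item = {
  pagination: "xii, 350 p.", // the textual statement, kept for those who need it
  numberOfPages: 362,        // 12 preliminary pages + 350
  numberOfVolumes: 1,
  heightMm: 234              // always recorded in millimetres
};

function displayHeightCm(record) {
  // Assumption: round up to the next whole centimetre for display.
  return Math.ceil(record.heightMm / 10) + " cm";
}

console.log(displayHeightCm(item)); // 24 cm
```

The stored value stays a plain number in one unit; only the display function knows about centimetres.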

The research to enable us to choose what to record, why we’re recording it, and for whose benefit would be the best starting point for this as well as many other questions in cataloguing and metadata.

Territorial Mercenaries discography

Over the weekend I came across the recorded musical output of my first band: the Territorial Mercenaries. The Mercenaries consisted of me and my friend Simon. I played the keyboard (and Spectrum loading sequence, etc.) and cobbled together the music; Simon did the singing and wrote the lyrics. We took our inspiration from a number of sources, in particular an Island Records compilation tape and a Public Enemy song that someone was playing in a school lunchtime revision session that sounded like someone torturing a donkey (hence all the Dobbin/donkey references). It was all recorded on a tape to tape player, using ancient microphones that were sellotaped to the speakers of the keyboard so they didn’t jump and bang. The earlier ones used all manner of sophisticated layering using the tape to tape player’s full potential, but these songs ended up being about 90% hiss, which is probably merciful to be honest. Later songs were generally done in one take.

Below is our discography, including Simon’s cover art. Technically speaking, it is not a discography as these recordings only exist on cassette (and I no longer have a player). There is also in existence an additional compilation I made for an over-curious university friend. He did say he was going to put it onto CD at some point. Maybe it’s best for all concerned if I don’t remind him. I’ve been meaning to put this up somewhere on the web since about 1996 when making lists of bands’ output on the web was the thing to do (those were the days when I made a few HTML lists and had arguably one of the best Radiohead sites on the web).*

Albums

In Bed with Dobbin (1992)

Side A

  1. Indeedy
  2. Chinese Water Torture
  3. Viel Vergenugen
  4. Locomotion (Twin Peaks Karoake Mix)
  5. Sit
  6. Spot the Song

Side B

  1. Full Woolen Cardigan
  2. Full Woolen Cardigan (Reprise)
  3. Famous Ladies
  4. 1812 Underture
  5. Toxin in Loco Parentis
  6. The Last Political Waltz
  7. The Krypto Factor
  8. In Bed with Dobbin

Donkey Mafia Records DM1 (1992)

Dobbin sans Frontiers

Side I

  1. Epilogue
  2. Dobbin sans Frontiers
  3. Entice the Judicature with Dough
  4. Dobbin to Q4
  5. The One to Blame
  6. Nightmare
  7. Pump up the Aussie
  8. Stop the Snog?

Side II

  1. Tipping the Balance of the Scales
  2. Over Rated Stoat
  3. Shakepeare’s Second Cousin
  4. Norma Major
  5. Jive Dobbin
  6. Beeline for the Grave
  7. The American Dream
  8. Prologue

Donkey Mafia Records DM2 (1992)

Singles

Dobbin Is Dead

1

  • Dobbin is Dead

2

  • Indeedy (No Song Swansong Remix)

Donkey Mafia Records DM2 1/2 (1992)

Indeedy (No Song Swansong Remix)

Full sleeve notes explaining the works would doubtless be beneficial although unwanted. As a taster, Indeedy is a reference to a maths teacher’s catchphrase. Full Woolen Cardigan is a Cagean concrete poetry-style piece using a knitting pattern. Dobbin to Q4 is a follow up to the latter with obvious chess inspiration. Chinese Water Torture is roughly what you would expect…

If you would like to hear any of this, you will need a means of converting cassette to CD, and an awful lot of persuasion.

* At my first successful library job interview in 1997 (I’m still here) I was asked something like whether I thought the web was any good for academia. I said no, as it was just full of things like band websites. LOL.