Brief Introduction to Regular Expressions

This is a web version, for reference, of a docx file I originally produced for my Mashcat 2012 session, How Big Is My Book. It has been resurrected to form the manual for Meret, a regular expressions tutorial based on Marcedit examples.


Characters as you type them. E.g. i will look for a letter “i”. ii will look for two letter “i”s in a row. Eldorado will look for the exact string “Eldorado”, and 1234s will look for “1234s”.

Types of Character

There are a number of ways of looking for specific types of character:

. looks for any single character. It could be a letter, number, punctuation or anything.

[] looks for any one of the characters in square brackets, so m[ae]rc will match “marc” and “merc”. You can also specify ranges, e.g. [a-z] will find any letter from “a” to “z”, so [a-d]ad will match “aad”, “bad”, “cad”, and “dad”. Putting a ^ after the [ will look for any character that isn’t in the square brackets: u[^ks]marc will not match “ukmarc” or “usmarc” but will match “unmarc”.

\d a digit, same as [0-9]. Like all the following, counts as one character although written as two.

\D not a digit, same as [^0-9].

\w alphanumeric, including underscore, same as [A-Za-z0-9_]

\W non-alphanumeric, same as [^A-Za-z0-9_]

\s whitespace characters, e.g. spaces, tabs

\S non-whitespace characters

\b word boundary: the beginning or end of words (i.e. strings of alphanumeric characters), or the beginning or end of strings.

\ is also used before a special character so you can search on it. E.g. searching on . will look for any character and will match “.”, “d”, or “5”. To look for a full stop, put \ in front: \. (a backslash followed by a full stop).
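
As a rough illustration, here is how some of the patterns above behave in Javascript. The sample strings are my own choices for the sketch, and test() simply reports whether a pattern matches anywhere in a string.

```javascript
// Each pair is [pattern, sample string]; the samples are made up for illustration.
const samples = [
  [/m[ae]rc/, "merc"],          // [ae] matches "a" or "e"
  [/u[^ks]marc/, "unmarc"],     // [^ks] matches any character except "k" or "s"
  [/u[^ks]marc/, "ukmarc"],     // so this one does not match
  [/\d\d\d/, "marc21"],         // no match: only two digits in a row
  [/\w\s\w/, "a b"],            // alphanumeric, whitespace, alphanumeric
  [/\bcat\b/, "the cat sat"],   // "cat" as a whole word
  [/\bcat\b/, "cataloguing"],   // no match: "cat" is followed by another letter
  [/1\.5/, "1x5"],              // no match: \. only matches a full stop
  [/1.5/, "1x5"],               // unescaped . matches any character
];

// test() returns true if the pattern matches somewhere in the string.
const results = samples.map(([pattern, text]) => pattern.test(text));

console.log(results);
// [true, true, false, false, true, true, false, false, true]
```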

Starts and Ends

^ matches the start of any string. So, in “marc must die” ^marc will match “marc” but ^must will match nothing.

$ matches the end of any string. So, in “marc must die” die$ will match “die” but must$ will match nothing.
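
The “marc must die” examples can be tried out in Javascript with something like the following sketch. The variable names are just for illustration.

```javascript
const text = "marc must die";

// ^ anchors the pattern to the start of the string.
const startMarc = text.match(/^marc/);  // finds "marc"
const startMust = text.match(/^must/);  // null: "must" is not at the start

// $ anchors the pattern to the end of the string.
const endDie = text.match(/die$/);      // finds "die"
const endMust = text.match(/must$/);    // null: "must" is not at the end

console.log(startMarc[0], startMust, endDie[0], endMust);
```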

Numbers of Characters

* matches the preceding element zero or more times, e.g. catalogu*ing will match “cataloging”, “cataloguing”, as well as “cataloguuing” and “cataloguuuuuuuuuuing”.

? matches the preceding element zero or one times, e.g. catalogu?ing will match “cataloging” and “cataloguing” but not “cataloguuing”. See also ? below.

+ matches the preceding element one or more times, e.g. catalogu+ing will match “cataloguing”, “cataloguuing”, and “cataloguuuuuuuuuuing”, but not “cataloging”.

{n} matches the preceding element exactly n times, e.g. catalogu{10}ing will match “cataloguuuuuuuuuuing” but not “cataloging”, “cataloguing”, or “cataloguuing”.

{m,n} matches the preceding element at least m times and no more than n times.

? also has a special meaning to restrict matches of multiple characters, e.g. looking for catalog.*ing in “cataloguing is ace. I love cataloguing” will greedily find the whole of “cataloguing is ace. I love cataloguing”, as the .* matches as much as it can (“uing is ace. I love catalogu”) rather than as little as it can (“u”). Amending the regular expression to catalog.*?ing will find only “cataloguing”.
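
Here is a small Javascript sketch of the quantifiers above. I have wrapped the word patterns in ^ and $ (my addition) so that each pattern has to match the whole word, and {1,2} is just one example of {m,n}.

```javascript
const words = ["cataloging", "cataloguing", "cataloguuing"];

// Keep only the words that a pattern matches in full.
const matching = (pattern) => words.filter((word) => pattern.test(word));

const star = matching(/^catalogu*ing$/);       // zero or more "u": all three words
const optional = matching(/^catalogu?ing$/);   // zero or one "u": the first two
const plus = matching(/^catalogu+ing$/);       // one or more "u": the last two
const range = matching(/^catalogu{1,2}ing$/);  // one or two "u": the last two

const sentence = "cataloguing is ace. I love cataloguing";

// Greedy: .* grabs as much as it can while still letting "ing" match.
const greedy = sentence.match(/catalog.*ing/)[0];  // the whole sentence

// Lazy: .*? grabs as little as it can.
const lazy = sentence.match(/catalog.*?ing/)[0];   // "cataloguing"
```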


Grouping

() groups characters together. This has a variety of uses. The group can be used as a single character, e.g. (meta)* looks for the string “meta” zero or more times. It can also be used for capturing smaller parts of the expression for later use, e.g. catalog(.*) will match anything starting “catalog” but will also store what comes afterwards as $1.

| [pipe] allows alternatives either side of it, e.g. marc|rdf will match “marc” or “rdf”. Smaller alternatives can be matched with brackets, e.g. (uk|us)marc will match “ukmarc” or “usmarc” (and if there is a match will store “uk” or “us” as $1).
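
A short sketch of groups and alternatives in Javascript follows. Note that with match() the captured groups come back as array elements 1, 2, and so on, while $1 and $2 are used in replacement strings. The sample strings are my own.

```javascript
// Alternatives inside a group: element 0 is the whole match, element 1 the group.
const m = "usmarc".match(/(uk|us)marc/);
const whole = m[0];   // "usmarc"
const prefix = m[1];  // "us"

// Capturing whatever follows "catalog".
const rest = "cataloguing".match(/catalog(.*)/)[1];   // "uing"

// A group treated as a single unit and repeated with *.
const meta = "metametadata".match(/(meta)*data/)[0];  // "metametadata"

// Captured groups reused in a replacement string.
const swapped = "ukmarc".replace(/(uk|us)(marc)/, "$2-$1");  // "marc-uk"
```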

Regular Expressions in Javascript

To get matches, use string.match(//). The regular expression goes between the forward slashes. Put a g after the second slash to search for all matches, rather than just the first one. Put an i after the second slash to do a case-insensitive search. String.match returns an array of matches, or null if it finds nothing.

var hits = "team".match(/i/g);

hits is null as there is no “i” in “team”.

var text = "Fox in socks in box on Knox";
var hits = text.match(/\w*ox\b/g);

hits is an array of three elements, all a series of words ending in “ox”: [“Fox”, “box”, “Knox”].

To search and replace within a string, use string.replace(//, ""). The regular expression goes between the forward slashes. The g and i work in the same way. The string to replace matches with goes after the comma. You can insert subexpressions captured with round brackets by using $1 for the first, $2 for the second, and so on (see Grouping above and the example below). String.replace returns the string with replacements made:

var text = "I love MARC. I think MARC is the future.";
text = text.replace(/MARC/g, "linked data");

text is now “I love linked data. I think linked data is the future.”

var text = "UKMARC is better than USMARC";
text = text.replace(/(.*?MARC) is better than (.*?MARC)/gi, "$2 is better than $1");

Now, “USMARC is better than UKMARC”. Run the replacement again, and history is reset.


ISBN (from Thingology blog) ([0-9]{9}[0-9X]|(978|979)[0-9]{10})

UK Postcode (from Wikipedia) (GIR 0AA|[A-PR-UWYZ]([0-9][0-9A-HJKPS-UW]?|[A-HK-Y][0-9][0-9ABEHMNPRV-Y]?) [0-9][ABD-HJLNP-UW-Z]{2})
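
As a sketch, the two expressions above could be used in Javascript like this. I have added ^ and $ so the whole string has to match. The patterns only check the shape of the value, not whether the ISBN check digit is right or the postcode actually exists, and the 13-digit sample is made up.

```javascript
// The two patterns from above, anchored so the whole string must match.
const isbn = /^([0-9]{9}[0-9X]|(978|979)[0-9]{10})$/;
const ukPostcode = /^(GIR 0AA|[A-PR-UWYZ]([0-9][0-9A-HJKPS-UW]?|[A-HK-Y][0-9][0-9ABEHMNPRV-Y]?) [0-9][ABD-HJLNP-UW-Z]{2})$/;

const isbnResults = [
  "0198750315",     // ten digits
  "019875031X",     // nine digits and an X
  "9780000000002",  // made-up thirteen-digit string starting 978
  "12345",          // too short
].map((value) => isbn.test(value));
// [true, true, true, false]

const postcodeResults = [
  "GIR 0AA",   // the special case listed first in the pattern
  "SW1A 1AA",
  "M1 1AE",
  "QQ1 1AA",   // Q is not allowed as the first letter
].map((value) => ukPostcode.test(value));
// [true, true, true, false]
```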

Importing URLs into a large MARC file with Marcedit

This is a brief documentation of how I used Marcedit to import correct URLs from an Excel spreadsheet into a large file of MARC records. The name of the ebook supplier has been changed to protect the innocent. The values below worked for me on the Excel spreadsheet I used.

Problem. Ebook supplier (EBS) supplies MARC records of generally good quality for a package of 600 ebooks. However, the URLs are inconsistent: there are between one and four in each record; several ebook suppliers are represented, not just EBS; and many of the DOIs for EBS, the only URLs that are consistent, do not work. We do have an Excel spreadsheet listing OCLC numbers and valid URLs for all titles.

General plan. To delete all the 856 fields in the MARC file and replace them with those from the spreadsheet. To do this, convert the relevant bits of the spreadsheet to a simple MARC file and merge the two using Marcedit.

Delete the URLs from the original file
Load/convert the original file as an .mrk file. Use the Tools>Add/Delete Field option to delete all the 856 fields in the original file.

Convert the spreadsheet to MARC.

  • In Marcedit (version 6), select Export Tab Delimited Text.
  • Choose the spreadsheet for the Source File
  • Choose a filename for the Marc text (.mrk ) file to be created
  • Specify the name of the sheet for an Excel file (e.g. in my case EBS)
  • Choose the delimiter that separates the data (in my case I left this alone as Tab. It worked)
  • Choose options (I left the LDR/008 and character encoding alone as I don’t think they mattered)
  • Next. The data snapshot shows the columns numbered Fields 0 to whatever. I needed columns A (OCLC number) and P (URL), so this meant Fields 0 and 15. The fields to select and how they work is done by using the Settings section to create Arguments. For this, I needed two arguments, one for each field:
  • First Argument (OCLC control number to go into the 001 field): Select = “Field 0” ; Map to = “001” ; Indicators = “\\” ; Term. punctuation = “” ; Constant Data & Repeatable Subfield = “”
  • Add Argument when done
  • Second Argument (URL to go into the 856 field): Select = “Field 15” ; Map to = “856$u” ; Indicators = “40” ; Term. punctuation = “” ; Constant Data & Repeatable Subfield = “”
  • Add Argument when done
  • Finish. This disconcertingly takes you back to the previous screen but if you open up the .mrk file in the MarcEditor it should be all done. Each record will look something like this:

=LDR 00000nam 2200000Ia 45e0
=001 123456789
=008 140812s9999\\\\xx\\\\\\\\\\\\000\0\und\d
=856 40$u

Edit the new .mrk
As the OCLC numbers in the original MARC records were in the form “ocn123456789” (rather than simply “123456789”), I needed to do a find for “=001 “ and replace it with “=001 ocn” on the new file, then save it.
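
The same find and replace could be sketched as a regular expression in Javascript, assuming the .mrk file has been read into a string. The sample record, the example.org URL, and the variable names are made up for illustration. The m flag, not covered in the tutorial above, makes ^ and $ match at the start and end of each line rather than of the whole string.

```javascript
// A made-up record in .mrk form, standing in for the contents of the new file.
const mrk = [
  "=LDR  00000nam  2200000Ia 45e0",
  "=001  123456789",
  "=856  40$uhttp://example.org/ebook/1",
].join("\n");

// $1 is the "=001" tag plus its spacing, $2 is the bare number.
// Only 001 lines made up purely of digits are changed, so running
// the replacement twice does not add a second "ocn".
const addOcn = (text) => text.replace(/^(=001 +)(\d+)$/gm, "$1ocn$2");

const fixed = addOcn(mrk);
console.log(fixed.split("\n")[1]);  // =001  ocn123456789
```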


  • From the Tools menu of Marcedit, select Merge Records
  • Choose the .mrk of the original MARC records as the Source File (I don’t know if the .mrc would work too)
  • Choose the newly created .mrk file as the Merge File
  • Choose a filename for the newly merged file to be created
  • Leave Record identifier as 001. If you were matching on the ISBN, presumably the 020 would work, but I haven’t tried it. The other options are 010, 020, 022, 035, and MARC21 (?)
  • Next.
  • Select the Merge Selected Field option
  • Next
  • Specify the 856 and move it to the Merge Fields box
  • “Merge Completed”

Ta da!
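
For anyone curious about what the merge amounts to, here is a rough Javascript sketch of the idea: match records on the 001 and copy the 856 fields across. This is not how Marcedit does it internally, just my own illustration. It assumes both files are .mrk text with records separated by blank lines, and all the function names and sample data are hypothetical.

```javascript
// Split .mrk text into records (separated by blank lines), each an array of lines.
function parseRecords(mrk) {
  return mrk.trim().split(/\n\s*\n/).map((block) => block.split("\n"));
}

// Return the value of the first field with the given tag, or null if absent.
function fieldValue(lines, tag) {
  const line = lines.find((l) => l.startsWith("=" + tag));
  return line ? line.slice(4).trim() : null;
}

// Append the 856 fields from the merge file to the source record with the same 001.
function merge856(sourceMrk, mergeMrk) {
  const urlsById = new Map();
  for (const record of parseRecords(mergeMrk)) {
    const urls = record.filter((l) => l.startsWith("=856"));
    urlsById.set(fieldValue(record, "001"), urls);
  }
  return parseRecords(sourceMrk)
    .map((record) => {
      const urls = urlsById.get(fieldValue(record, "001")) || [];
      return record.concat(urls).join("\n");
    })
    .join("\n\n");
}

// Made-up sample data: two source records, one of which has a URL to merge in.
const source = [
  "=LDR  x", "=001  ocn1", "=245  10$aTitle one", "",
  "=LDR  y", "=001  ocn2", "=245  10$aTitle two",
].join("\n");
const urls = ["=LDR  z", "=001  ocn1", "=856  40$uhttp://example.org/1"].join("\n");

const merged = merge856(source, urls);
```

In this sample, the record with 001 “ocn1” gains the 856 field and the other record is left as it was.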

Bookmarklet to search Google Books from an HTML element

Install the bookmarklet by dragging the link to your bookmarks toolbar:

Or, create a bookmark manually, and change the Location property to the following:


To use it, go to a page which has an element with an id of “isbn” then click on the bookmark.

You can edit the bookmark if the id is called something else (change the bit in brackets and quotes from isbn to something else) or you want to search on another index (change q=isbn to q=somethingelse).
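
The bookmarklet code itself is not reproduced on this page, so the following is only a sketch of how such a bookmarklet might work, not the original. The base URL, the “isbn:” form of the query, and the function names are all assumptions on my part.

```javascript
// Assumed search address; the address used by the original bookmarklet is not shown here.
const SEARCH_BASE = "https://books.google.com/books";

// Build a search URL of the form ...?q=index:value (hypothetical helper).
function buildSearchUrl(value, index) {
  return SEARCH_BASE + "?q=" + encodeURIComponent(index + ":" + value.trim());
}

// Read the text (or form value) of the element with the given id and turn it
// into a search URL. Returns null if the page has no such element.
function searchFromElement(doc, elementId, index) {
  const element = doc.getElementById(elementId);
  if (!element) return null;
  const text = element.value || element.textContent || "";
  return buildSearchUrl(text, index);
}

// In a bookmarklet, the last step would be something like:
// window.location.href = searchFromElement(document, "isbn", "isbn");
```

Changing the second argument would look in an element with a different id, and changing the third would search on a different index.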

RLUK/European Library linked data sample

RLUK and the European Library (of which the RLUK is now a member) have just released 17 million records as linked open data. They have released three sets (via Mike Mertens), for which links to the RDF turtle versions are below:

I’ve tried to have a quick look at the last just to get an idea and I’ve isolated what I think is all the data for one book, chosen at random. The whole block of turtle prefixes from the start of the file is included:

@prefix rdaa: <> .
@prefix rdac: <> .
@prefix rdae: <> .
@prefix rdam: <> .
@prefix rdaw: <> .
@prefix rdau: <> .
@prefix dcterms: <> .
@prefix edm: <> .
@prefix foaf: <> .
@prefix frbrer: <> .
@prefix ore: <> .
@prefix owl: <> .
@prefix rdf: <> .
@prefix rdfs: <> .
@prefix skos: <> .
@prefix wgs84pos: <> .

<> a dcterms:BibliographicResource ;
      rdam:P30004 "local identifier:" ;
      rdau:P60049 <> ;
      rdam:P30003 "single unit"^^<> ;
      rdau:P60520 "Unkown"@en ;
      rdam:P30004 "isbn: 0198750315" ;
      rdam:P30156 "The philosophy of history" ;
      rdau:P60339 "edited by Patrick Gardiner." ;
      rdam:P30157 "Oxford readings in philosophy" ;
      rdau:P60398 _:node18kdvnimbx4386 .

_:node18kdvnimbx4386 a rdac:C10004 ;
      rdaa:P50111 "Patrick L. Gardiner" ;
      rdaa:P50121 "1922" .

<> rdau:P60073 "1974" ;
      rdau:P60099 <> ;
      rdau:P60163 _:node18kdvnimbx4387 .

_:node18kdvnimbx4387 rdau:P60366 "Oxford University Press" .

<> rdau:P60444 _:node18kdvnimbx4388 .

_:node18kdvnimbx4388 a rdac:C10005 ;
      rdaa:P50032 "London" .

<> rdau:P60163 <> ;
      dcterms:subject _:node18kdvnimbx4389 .

_:node18kdvnimbx4389 a frbrer:C1007 ;
      rdfs:label "History, Philosophy." ;
      dcterms:hasPart _:node18kdvnimbx4390 .

_:node18kdvnimbx4390 a frbrer:C1007 ;
      rdfs:label "History" .

<> dcterms:extent "224 p. ;" , "21 cm." ;
      rdau:P60470 "Includes index." ;
      dcterms:description "Bibliography: p. [218]-222." .

Some initial observations:

A short snippet from another book showing a blank node asserted as being the same as a VIAF entity, having a relationship with a work using RDA, and the detailed RDA data elements for the name:

_:node18kdvnimbx245 owl:sameAs <> .

<> rdau:P60398 _:node18kdvnimbx245 .

_:node18kdvnimbx245 a rdac:C10004 ;
      rdaa:P50111 "Niccolo Pagliarini" ;
      rdaa:P50121 "1717" ;
      rdaa:P50120 "1795" .

Bumblebees in Sandy, 2013

Last year I managed to see six species of bumblebee in Sandy* and another one further afield in Bedfordshire**. This year I managed to spot eight in Sandy, all but one in the garden. I’m hoping to have lots more wild flowers in the garden this year so hope to attract the bees to go with them.

I’ve submitted all the following as records to Beewatch, which is also very useful in getting confirmation of IDs.

Bombus hortorum

The garden bumblebee, seen in the garden. This is the first one of these I’ve seen, despite them being very common.

Bombus hypnorum

The tree bumblebee. We in fact had two nests in our house and garden. The bee above is coming out of one they made in an old bird box in the garden. The birds never used it but these bees did. We had a second nest in the roof too.

Bombus lapidarius

Bombus lapidarius (worker)

Bombus lapidarius (male)

The red-tailed bumblebee. Both in the garden.

Bombus pratorum

The early bumblebee, one of the smaller species. In the garden. They seem to like flatter flower heads, like on this senetti.

Bombus lucorum

Bombus lucorum (female)

Bombus lucorum (worker)

Bombus lucorum (male)

White-tailed bumblebee. The queens and workers look practically identical to the buff-tailed bumblebee (B. terrestris) although the males are very much more yellow and quite striking. In the garden.

Bombus pascuorum

Common carder bee. In the garden.

Bombus terrestris

Bombus terrestris (queen)

Buff-tailed bumblebee. The buff tail is more obvious in the queen especially just forward of the tail.

Bombus vestalis

The southern cuckoo bumblebee. It takes over nests of B. terrestris. Sadly not seen in the garden but the bee on the bramble flower was on a piece of waste ground next to a path a stone’s throw away. The one on clover was near the railway station.

* Bombus vestalis?, B. pratorum, B. terrestris, B. rupestris, B. hypnorum, B. pascuorum

** B. campestris?

Automatic table of contents for RDA Toolkit workflows

Below is described a way to add tables of contents to RDA Toolkit workflows automatically, i.e. without manually adding anchors and creating a list. You can see an example of it in action on this workflow (although of course I can’t guarantee that this workflow will always be around or look like this).

It uses some Javascript but requires no knowledge of it as it can be dropped in. It is 95% a script written by Stuart Langridge (@sil) with some minor amendments to get round some strange internal linking behaviour and to provide links to the top of the document throughout the workflow.

Instructions follow and some caveats are below.

  1. Open an RDA Toolkit workflow for editing
  2. Click on Source
  3. Insert the following snippet of HTML where you want the table of contents to appear:
    <div class="generate_from_h2" id="generated-toc"><a name="top"></a></div>
  4. If you have access to a local web server:
    1. Copy the Javascript file generate_toc_rda.js and put it somewhere sensible.
    2. At the very end of the workflow, put the following HTML snippet, changing the URL to where your copy of generate_toc_rda.js now lives:
      <script type="text/javascript" src=""></script>
  5. If you don’t have access to a local server:
    1. At the very end of the workflow, put the following HTML snippet:
      <script type="text/javascript">
    2. Copy the complete contents of the Javascript file generate_toc_rda.js and paste it on the next line. There will be a lot of it.
    3. On another line underneath, i.e. right at the end, put the following snippet of HTML:
      </script>
  6. Save the workflow.
  7. Click on the workflow in the Toolkit to refresh it.
  8. Buy Stuart some beer next time you see him, e.g. some gueuze, or give him some custom.

Caveats: it is not official, and while the script was designed to work on any web page, these things always depend on the approach taken by the encompassing page being logical and consistent over time, and this can be particularly unpredictable in a CMS, which the Toolkit basically is. I am also unsure of the publisher’s attitude towards dropping Javascript into workflows, although I cannot see why there should necessarily be objections to this. Lastly, using this approach also means removing any existing apparatus of table of contents or links to the top. It would be advisable to back up everything, including the source of generated tocs, although in the worst case, it would probably be possible to move the contents of a workflow to an external file, run the toc script on it, then re-import the HTML source.

Please do let me know if you try this and how you get on. I might be amenable to making changes to it, time and circumstances allowing. Stuart released the original toc script “under an X11 licence. What this boils down to is: do what you like with it. You can use the script in commercial environments, you can use it on your intranet, you can use it anywhere you like.” Sounds good to me too.

Minimising images in a Twitter feed on Firefox

Recent changes to the standard Twitter timeline have resulted in all images in posts being expanded, thereby interrupting the compressed textual display. Here is a simple way to minimise embedded images in posts on a Twitter feed on Firefox. It uses Firefox’s ability to set up a custom stylesheet so that styles can be applied on top of or instead of those supplied by the website’s creator.

  1. Find your Firefox profile folder[*] and look for a folder called chrome. If it’s not there, create it.
  2. In the chrome folder, create a file called userContent.css and open it in a text editor.
  3. Add the following line to the file: a.twitter-timeline-link img {width:15% !important;}
  4. Save userContent.css.
  5. Restart Firefox
  6. Hurrah, hopefully. It works for me.

It should reduce the large images down to a little blob. It’s not the most elegant looking blob, but it’s small and clickable.

[*] The profile folder will normally live somewhere in a path like Appdata\Mozilla\Firefox, at least on Windows.

Red Mason Bee Sealing a Nest

This year I have a bee hotel attached to the house which provides somewhere for solitary bees to nest. The most commonly seen garden solitary bee is the red mason bee (Osmia bicornis), which finds cavities, such as in walls or canes, in which to lay its eggs. It adds some pollen for food and seals up the cell with mud.

Yesterday I managed to catch a female red mason bee sealing off one of the tubes in the nest box and took some photos. They are not amazing photos, my main excuses being failing light and rubbish photography skills; I was also watering the garden and getting some washing in at the same time, etc etc. A selection of photos in chronological order is below, with the time taken in the caption.

In the first photo, you can see the female deep inside the tube.


Twenty minutes later the start of a wall is in place and the bee has gone off to get some more mud.


A few minutes later and she’s back constructing the wall.


Again she’s off, and there is now a complete ring.


Three minutes later and the ring is closing in.



The female is back with more mud.


Now the hole is clearly too small for her to get in or out.


More work…


…and the hole is sealed!


The bee continues to add more mud for a better seal.


The last photo is from over an hour later and shows the cap jutting out from the wall, not entirely neatly.


There are now six holes filled up in the bee hotel. I understand that each of the holes will have a number of cells, one in front of the other. Next spring, the small males, whose eggs are laid near the front, will hatch first, followed shortly by the larger females. There are hundreds of sorts of solitary bee in the UK, although only the larger ones will use the bee hotel. I saw some leaf-cutter bees, such as the one below, last year.

Leaf-cutter bee cutting a leaf

They use mashed-up leaf pulp instead of mud to do much the same thing, so I hope they might pop by too.


BIBFRAME has worked on modelling works as Works within the BIBFRAME model, similar to the RDA modelling work, itself modelled on the work on the FRBR model of Works and Expressions. A BIBFRAME Work is a creative work, perhaps a FRBR Work, or an RDA FRBR Work but it also expresses a FRBR Expression, and of course an RDA FRBR Expression. A Work may express another Work based on others’ work, not just a FRBR Work or an RDA Work. That also works. FRBR Works or RDA Works expressed as BIBFRAME Works can relate to FRBR Expressions (BIBFRAME Works or RDA Expressions). So, Works are works that can be Works but also Expressions linked to Works that really are Works.