I have come up with two bookmarklets that allow you to search for an author’s works in a library catalogue from the author’s Wikipedia page in one click. A bookmarklet is a browser bookmark that does something with the page you’re looking at rather than just taking you to a web page: see the helpful Firefox guide for more information. The bookmarklets are identical, except that one searches UCL’s Explore (Primo) service, and the other searches COPAC. To try them:
Install one of the bookmarklets by dragging the link to your bookmarks toolbar:
You can rename them to something more snappy if you like. Next, go to a Wikipedia page for an author. The bookmarklets only work on Wikipedia pages with VIAF or LC Authorities links in them, but most major authors should be fine. Some examples to try:
How it works. The bookmarklet itself is only a short snippet of javascript; all it does is look for any links that might be VIAF or LC Authorities links. It then appends this information as a query string to a URL for a remote PHP script. The PHP script does all the hard work. It first works out the URI for the VIAF entry and has a look at it using ARC2. It looks at the RDF for the authorised LC heading, constructs a search URL for either UCL or COPAC, then redirects itself there, where its work is done. If there is a problem with the VIAF entry it tries the LC link, if there is one, in a similar way. If there is nothing, it will fail and offer to go back to the Wikipedia page or forward to the catalogue you wanted to search.
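As a rough illustration of the bookmarklet half of this, the link-finding step might be sketched as below. This is not the actual bookmarklet: the function name, the `viaf` and `lc` parameter names, the URL patterns, and the script address are all assumptions made for the example.

```javascript
// Hypothetical address of the remote PHP script (an assumption for this sketch).
var SCRIPT_URL = "http://example.org/authorsearch.php";

// Given the href values of the links on a page, pick out the first VIAF and
// LC Authorities links and append them to the script URL as a query string.
// Returns null if neither kind of link is present.
function buildLookupUrl(hrefs, scriptUrl) {
  var viaf = null;
  var lc = null;
  for (var i = 0; i < hrefs.length; i++) {
    if (!viaf && /viaf\.org\/viaf\/\d+/.test(hrefs[i])) {
      viaf = hrefs[i];
    }
    if (!lc && /id\.loc\.gov\/authorities\//.test(hrefs[i])) {
      lc = hrefs[i];
    }
  }
  if (!viaf && !lc) {
    return null;
  }
  var params = [];
  if (viaf) { params.push("viaf=" + encodeURIComponent(viaf)); }
  if (lc) { params.push("lc=" + encodeURIComponent(lc)); }
  return scriptUrl + "?" + params.join("&");
}

// In a browser, the hrefs could be gathered from the page like this:
// var hrefs = Array.prototype.map.call(document.links, function (a) { return a.href; });
// var target = buildLookupUrl(hrefs, SCRIPT_URL);
// if (target) { window.location.href = target; }
```

Returning null when no link is found lets the bookmarklet stop early rather than calling the remote script for nothing.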
Why. One of the promises of linked data and BIBFRAME and all the rest of it is that data from different sources can be linked together and work with each other. Since VIAF links were recently added to Wikipedia, I’ve wondered what could be done to take advantage of this in a practical way. The link does mean that from a Wikipedia page and its uncontrolled (or at least only consistent within Wikipedia) names, you can find out the authorised form of an author’s name. Charles Darwin (the famous one) is only called Charles Darwin in the title of his Wikipedia article. Search for that on a library catalogue and you’ll get all his works plus the stuff written by other Charles Darwins. With the VIAF data, we know that he is known in most (or at least a huge number of) English-language catalogues as “Darwin, Charles, 1809-1882” as opposed to another Charles Darwin, who is “Darwin, Charles, 1758-1778”. Although most catalogues or discovery systems don’t use linked data and non-textual identifiers, the ubiquity and uniqueness of an LC heading does almost perform a similar function (although there are caveats galore).
Many of the caveats are in the way library systems search. Both the examples used are imperfect. The UCL one, as I’ve done it, uses a facet search on top of the bare search. This eliminates some incorrect hits (where “Steve”, “Jones”, and “1944-” appear coincidentally as author elements in a search) but also misses a few hits, depending on which field the author appears in in the record (this is, I think, a fixable glitch which I intend to get fixed). The COPAC one is an author free-text search, but I’ve tried to reduce the potential for false hits by putting all searches in quotes.
Improvements. These are legion, but a few sketched ideas below:
- Implement this as a browser extension. This was my original intention, so that someone could be browsing any old Wikipedia page and when they come across one with a VIAF (or other service) link at the bottom, a search link is created at the top of the page for them to click on. This could easily be extended in several ways:
- Add subject searches. Should be straightforward, although would require moar bookmarklets, relying on the PHP script offering options, or a proper browser extension
- Add more catalogues/discovery interfaces. This is again straightforward to add to the PHP script if you can figure out the web API for a search service but is subject to the same caveats as subject searches.
- Add more than just VIAF and LC Authorities. There are other links appearing at the bottom of some Wikipedia pages, most notably Worldcat. The bookmarklet itself could be easily adapted to accommodate these so avoiding a further profusion of bookmarklets, as well as more back up when services are down (VIAF went down twice while I was testing). Adding services to the PHP is a matter of knowing the structure of the RDF, which shouldn’t be too painful.
- Improve how errors are dealt with and reported, especially so the bookmarklet handles more of them and prevents the PHP script being called unnecessarily.
Feedback. I appreciate this is highly unlikely to set the world on fire, but I would be interested in any feedback or ideas of how it could be developed. Of course, please do let me know if you come across any mistakes or problems: it’s becoming almost traditional for me to get the most crucial link wrong in blog posts.
I have programmed a simple RDF viewer called RDFV RDF Viewer for viewing RDF. Copy and paste the contents of an RDF turtle or n-triples file into the box. The viewer will let you click on an element to highlight it and other instances of the same value, as well as triples with the same subject.
Why? The viewer has three purposes:
1) to make the analysis of RDF files easier. Although turtle/n-triples (I shall refer only to turtle) files are the easiest RDF files to read (certainly compared to RDF/XML which is impenetrable), they can still be complex, especially when there are lots of blank nodes. In particular, I have been trying to get to grips with BIBFRAME data, which can have things like this:
<http://id.loc.gov/resources/bibs/10342843>
bf:creator _:bnode2049831104 ;
bf:subject _:bnode1676317824 ,
_:bnode942225664 ;
The bnodes to which these refer can be many lines away and it is not trivial to match them up. I tried printing data out and drawing lines between them, but this got lost in secret symbols and incoherent scribbles. I thought there must be a better way. With the viewer, you can click on, for example, “_:bnode1676317824” and it will highlight it in bold and in red text, and do the same for all other occurrences of “_:bnode1676317824”. At the top, the number of instances will be shown.
2) To make demonstrating and training easier. I am trying to keep colleagues up to speed with linked data and BIBFRAME and it is especially useful I think to show people real linked data as much as possible. RDF is a trifle intimidating to say the least at first glance, so it is helpful to isolate and highlight sections if possible. As well as highlighting bnodes so you can show how one bit relates to another, the viewer also highlights all the triples with the same subject. So, in the above example, all the triples with the subject “<http://id.loc.gov/resources/bibs/10342843>” will be displayed with a shaded background colour when you click on it.
3) As a way of engaging with RDF, linked data, and BIBFRAME in particular, as well as programming in general. This is my first defence against the person who points out that Tool X already exists to do this and is in fact much better.
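The instance count mentioned under the first purpose can be sketched in a few lines. This is a simplified illustration and not the viewer's own code: it assumes values are separated by white space, so literals containing spaces would be split up, and the last line of the sample data is made up for the example.

```javascript
// Split Turtle-like text into white-space-separated tokens.
// A simplification: quoted literals containing spaces are broken up.
function tokenize(turtle) {
  return turtle.split(/\s+/).filter(function (token) {
    return token.length > 0;
  });
}

// Count how many tokens are exactly equal to the clicked value.
function countInstances(turtle, value) {
  return tokenize(turtle).filter(function (token) {
    return token === value;
  }).length;
}

var sample = [
  "<http://id.loc.gov/resources/bibs/10342843>",
  "    bf:creator _:bnode2049831104 ;",
  "    bf:subject _:bnode1676317824 ,",
  "        _:bnode942225664 .",
  "_:bnode1676317824 bf:label \"Example\" ."
].join("\n");

console.log(countInstances(sample, "_:bnode1676317824")); // prints 2
```

Matching whole tokens rather than substrings means that a shorter bnode label is not counted inside a longer one.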
How to use it. Simply copy and paste the contents of an RDF turtle file into the box and click Submit. The triples will then be displayed underneath. Click on the data itself to highlight various bits. There are two sets of sample data included, both for the same book (Models for decision by C.M. Berners-Lee):
- OCLC sample data from the linked data example given and explained in my post about One record in lots of metadata formats.
- BIBFRAME data produced from the BIBFRAME website and run through a converter to turn it into turtle.
I hope to add some more examples. To look at your own BIBFRAME examples, follow the steps below:
- Go to http://bibframe.org/tools/compare/
- Enter an LC system ID into the box (e.g. 10342843) and click on Search.
- Click on BIBFRAME RDF/XML.
- Select and copy everything in the box.
- Go to http://www.rdfabout.com/demo/validator/ and make sure Input Format is set to RDF/XML
- Paste the BIBFRAME data into the box, overwriting anything that’s already there.
- Click on Validate. You will get three versions of the RDF: as Notation 3 (of which Turtle is a subset); N-Triples; and RDF/XML.
- Select and copy either all of the Notation 3 or the N-Triples. Notation 3 is by far the easiest to read.
- Go to http://www.aurochs.org/rdfv/rdfv.html
- Paste the RDF data into the box.
- Click on Submit.
Technical stuff. The code accepts anything which is in turtle or turtle-like in nature. However, there are probably some strange characters it won’t like and some data structures that will fool it. In particular, it tries to shoehorn everything into three main columns: subject, predicate, object; there are also punctuation and language but the point is that nested triples won’t necessarily look particularly impressive. Moreover, it is only a viewer and it does not understand RDF: it can’t make inferences based on the data and won’t even know, given the prefix statement
@prefix bf: <http://bibframe.org/vocab/> .
that
bf:subject
and
http://bibframe.org/vocab/subject
are the same thing.
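For comparison, this is roughly what a tool would need to do to recognise that equivalence. It is only a sketch under simplifying assumptions (one `@prefix` declaration per statement, no `@base`, no escaped characters), and `readPrefixes` and `expand` are names invented for the example, not part of RDFV.

```javascript
// Collect "@prefix name: <uri> ." declarations into a lookup table.
function readPrefixes(turtle) {
  var prefixes = {};
  var pattern = /@prefix\s+([\w-]*):\s*<([^>]*)>\s*\./g;
  var match;
  while ((match = pattern.exec(turtle)) !== null) {
    prefixes[match[1]] = match[2];
  }
  return prefixes;
}

// Expand a prefixed name such as "bf:subject" to a full URI.
// Anything without a known prefix is returned unchanged.
function expand(name, prefixes) {
  var colon = name.indexOf(":");
  if (colon === -1) {
    return name;
  }
  var prefix = name.slice(0, colon);
  if (!Object.prototype.hasOwnProperty.call(prefixes, prefix)) {
    return name;
  }
  return prefixes[prefix] + name.slice(colon + 1);
}

var prefixes = readPrefixes("@prefix bf: <http://bibframe.org/vocab/> .");
console.log(expand("bf:subject", prefixes)); // prints http://bibframe.org/vocab/subject
```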
RDFV will not rewrite or abbreviate the data, with two exceptions: it formats its own white space and, if the punctuation at the end of a line immediately follows some data, it inserts a space before it, more for its own parsing ability than anything else. Please do let me know if you see anything wrong with it.
The viewer is written entirely in Javascript.
Improvements. I am planning and/or hoping to do some further improvements, including some greater control over the formatting from the page itself, some more examples (especially as BIBFRAME evolves).
I have finally completed a multiple record MARC Record Viewer. This has been rather long in the making but is essentially a quick and practical tool for looking at and assessing MARC records without having to load them into specialist software like MARCEdit or an LMS. It is essentially the same as the viewer built for my Codecademy project except that:
- It reads multiple records in one file, rather than just one, and provides a count.
- It has an input box so the records don’t have to be hard-coded into the script.
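Reading several records from one file mostly comes down to splitting on the record terminator, which in MARC21 is the character with code 1D hex. A minimal sketch, assuming the whole file has been pasted in as a single string (`splitRecords` is a name made up for this example and is not taken from the viewer):

```javascript
var RECORD_TERMINATOR = "\x1D"; // ends every MARC21 record

// Split the pasted text into individual records, dropping any empty
// remainder left after the final terminator.
function splitRecords(data) {
  return data.split(RECORD_TERMINATOR).filter(function (record) {
    return record.trim().length > 0;
  });
}

// The count is then simply the length of the resulting array:
// var records = splitRecords(pastedText);
// console.log(records.length + " records found");
```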
Some example .mrc records of varying lengths can be found here.
It is written in client-side Javascript, so you can view source and see how it works, copy it, and do what you like with it (although I would love to know if you do so). I quite defiantly haven’t used JQuery for this, which would probably have made the whole thing a bit easier; instead it uses proper old skool DOM scripting. It uses a minimal amount of CSS, in two files: a generic one, and one that roughly mimics how MARC records look in an Aleph editing screen. It should be fairly trivial to change this file to suit other purposes.
Thank you to those who have already had a shufti at earlier versions of this, especially on different browsers, and provided feedback! Please do let me know if you have any comments on this, suggestions for improvements, or if you come across errors. I have some ideas for improvements, mainly for making user input easier, and offering different formatting of results. I hope to start using JQuery for these too, and perhaps a later conversion of the whole thing would be in order.
At Mashcat on 5 July in Cambridge I gave an afternoon session on getting computer readable information from the textual information held in MARC21 300 fields using Javascript and regular expressions. I intended this to be useful for cataloguers who might have done some of Codecademy’s Code Year programme as well as an exploration of how data is entered into catalogue records, its problems, and potential solutions.
AACR2/MARC (and RDA) records store much quantitative information as text, usually as a number followed by units, e.g. “31 cm.” or “xi, 300 p”. This is not easy for computers to deal with. For instance, a computer programme cannot compare two sizes (e.g. “23 cm.” and “25 cm.”) without first extracting a number out of each string (23 and 25) as well as determining the units used (cm). In some cases, units might vary: in AACR2 books below 10 cm. are measured in mm., and non-book materials are often measured in inches (abbreviated to in.). Potential uses for better quantitative data in the 300$c include planning shelving for reclassification and more easily finding books by size or range.
Before the session, I sketched out a possible solution using Javascript and regular expressions to make this conversion for dimensions in the 300$c. I have put up a version of A script to find the size of an item in mm. based on the 300$c, with the addition of an extra row which you can fill in to test your own examples without having to edit the script.
If you do want to look at how it works or try editing it yourself you can view source, copy all the HTML, then paste it into a text editor. Save it, then open the file using a browser to test it. Refresh the browser when you change the file.
The heart of the script looks like this:
var dollar_c = [
  "9 mm",
  "4 in.",
  "4 3/4 in.",
  "30 cm.",
  "1/2 in.",
  "20 x 40 cm."
];
// Convert text to mm
function text_to_mm (text) {
  // Convert fractions to decimals
  text = text.replace(/(\d*) (\d)\/(\d)/, function(str, p1,p2,p3) {return parseFloat(p1)+p2/p3});
  text = text.replace(/(\d)\/(\d)/, function(str, p1,p2) {return parseFloat(p1/p2)});
  // Extract the size of the book
  var size = text.replace (/([\d\.]*).*/, "$1");
  // Extract the units
  var units = text.replace(/.*([a-z]{2}).*/g, "$1");
  // Convert from various units to mm
  var mm;
  if (units === "mm") {
    mm = size;
  }
  if (units === "cm") {
    mm = size * 10;
  }
  if (units === "in") {
    mm = size * 25.4;
  }
  mm=Math.floor(mm);
  return mm;
}
It starts with a declaration of an array of examples to be tested: you can alter this with your own if you prefer. text_to_mm is the function that does all the work. It takes in the text from a 300$c, converts fractions (e.g. 4 3/4) to decimals (4.75), finds a number, finds a unit, then performs calculations on the size depending on what the unit is to produce a standard figure in mm. At Mashcat, Owen Stephens managed to plug an adaptation of this script into Blacklight to create an index of book sizes. Using this he could do things like find the most common sizes or the largest book in a collection.
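Once sizes are numbers, comparing and sorting them is trivial. The sketch below is a variant written for this illustration rather than the published script: `sizeInMm` and `MM_PER_UNIT` are made-up names, it only knows about mm, cm, and in, and it returns null instead of NaN when it cannot find a number and a unit.

```javascript
var MM_PER_UNIT = { mm: 1, cm: 10, "in": 25.4 };

// Return the first dimension in a 300$c string as whole millimetres,
// or null if no number or no known unit is found.
function sizeInMm(text) {
  // "4 3/4" becomes "4.75", then a bare "1/2" becomes "0.5"
  var decimal = text
    .replace(/(\d+) (\d+)\/(\d+)/, function (all, whole, top, bottom) {
      return String(Number(whole) + Number(top) / Number(bottom));
    })
    .replace(/(\d+)\/(\d+)/, function (all, top, bottom) {
      return String(Number(top) / Number(bottom));
    });
  var number = decimal.match(/\d+(?:\.\d+)?/);
  var unit = decimal.match(/\b(mm|cm|in)\b/);
  if (!number || !unit) {
    return null;
  }
  return Math.floor(Number(number[0]) * MM_PER_UNIT[unit[1]]);
}

// Sort some 300$c values from smallest to largest.
var sorted = ["25 cm.", "4 in.", "23 cm."].sort(function (a, b) {
  return sizeInMm(a) - sizeInMm(b);
});
console.log(sorted); // [ '4 in.', '23 cm.', '25 cm.' ]
```

Like the original, this only looks at the first number, so “20 x 40 cm.” is treated as 200 mm.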
The main focus of my session, however, was on a similar script to figure out how many actual pages there are in a book, given the contents of a 300$a, e.g. “300 p.”, “ix, 350 p.”, “100 p., [45] leaves of plates” (a page being one side of a sheet of paper; a leaf being a sheet of paper only printed on one side, so therefore counting as two pages). I have also published a version of A script to find the absolute no. of pages based on the 300$a with the similar addition of a row for easy user testing. Potential uses for recording page numbers rather than pagination include planning shelving space, easier to understand displays for users, and finding books of specified lengths.
The script starts with a similar array of examples to be tested:
// An array of test examples
var dollar_a = [
  "9 p.",
  "9p",
  "30 leaves",
  "30 p., 20 leaves",
  "xiv, 20 p.",
  "20, 30 p.",
  "20, 30, 40 p.",
  "xv, 20, 30, 40 p., 5, 5 leaves of plates",
  "clviii, ii, 4, vi p."
];
The main function is called text_to_pages. The first thing it does is convert any Roman numerals to Arabic ones. The heavy lifting for this is a function by Stephen Levithan which does the actual number conversion. However, we still need to identify and extract the Roman numerals from the pagination in order to convert them. This line does the extraction and makes a list of the Roman numerals:
var roman_texts=text.match(/[ivxlc]*[, ]/g);
The session I gave concentrated on regular expressions (a bit like the wildcards you use on library databases but turned up to eleven) which in all cases here are contained within slashes, and I made a simple introductory guide to regular expressions (.docx). There are many guides to regular expressions on the web too, and useful testers to play with such as this one. The regular expression in the line above can be broken down as follows:
- [ivxlc] uses square brackets to look for any one of the characters listed within them.
- The following * means to look for any number of these in a row
- [, ] any of a comma or a space, again using square brackets. Obviously these characters are not used in Roman numerals but they are a convenient method of isolating these characters as numbers rather than, say, the “l” in leaves which would also match otherwise.
The next few lines work through the list, replace any instances of [, ] with “” (i.e. nothing) to leave the bare Roman numerals, convert all the numbers in the list using Stephen Levithan’s functions, then do the replacements on the pagination given in text:
if (roman_texts) {
  for (var i=0; i<roman_texts.length; i++) {
    // Remove the trailing comma or space
    roman_texts[i]=roman_texts[i].replace(/[, ]/,"");
    var arabic_text = deromanize(roman_texts[i])+" ";
    text = text.replace(roman_texts[i],arabic_text+" ");
  }
}
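A more compact way of doing the same job is to let a single replace call find and convert each numeral. The sketch below is an alternative written for illustration, not the code used in the session: `romanToNumber` is a simple stand-in rather than Stephen Levithan’s function, it only handles lower-case numerals up to “c”, and it does no validation.

```javascript
// Convert a lower-case Roman numeral (i, v, x, l, c only) to a number.
// A smaller value placed before a larger one is subtracted, as in "xiv".
function romanToNumber(roman) {
  var values = { i: 1, v: 5, x: 10, l: 50, c: 100 };
  var total = 0;
  for (var i = 0; i < roman.length; i++) {
    var current = values[roman[i]];
    var next = values[roman[i + 1]] || 0;
    total += current < next ? -current : current;
  }
  return total;
}

// Replace every Roman numeral that starts a word and is directly followed
// by a comma or a space with its Arabic equivalent.
function convertRomanNumerals(text) {
  return text.replace(/\b[ivxlc]+(?=[, ])/g, function (numeral) {
    return String(romanToNumber(numeral));
  });
}

console.log(convertRomanNumerals("clviii, ii, 4, vi p.")); // prints 158, 2, 4, 6 p.
```

Using + rather than * means the expression never matches a bare space or comma, and the lookahead (?=[, ]) checks for the following comma or space without swallowing it. The “l” in “leaves” is still ignored because it is followed by “e”.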
Like the size script above, the rest of the conversion needs to do two things: find the numbers and find the units. To do this we need to find the sequences involved. While this is easy with something like “24 p.” (number is 24, unit is p) or even “xv leaves” (number is 15, unit is leaves), it becomes troublesome when you get something like “23, 100 p.”: the first number is 23 but there is no unit associated with it, only a comma to signify that it is a sequence at all. The following lines try to get round this problem by looking for sequences where the comma appears to be the unit and then looking ahead to find the next unit. In the “23, 100 p.” example the script would keep looking forward past the 100 until it gets to the “p”.
// Convert 20, 30 p. to 20 p. 30 p
while (text.match(/\d*,/)) {
  text = text.replace(/(\d*),(.*?(p|leaves))/, "$1 $3 $2");
}
The first regular expression in the while line looks for:
- \d* any number of digits. \d is any digit and * looks for any number of them, followed by
- , a comma
So as long as the script finds any sequences of numbers followed by a comma, it will carry on making the replacement underneath it. The replacement line itself looks for
- \d* any number of digits again, followed by
- , a comma
- .*? which is . any character * any number of times. The ? makes sure that the smallest matching group of characters is matched; otherwise the expression will think that the units corresponding to the number 15 in the pagination “15, 25 p., 50 leaves” is “leaves” rather than “p”.
- p|leaves either p or leaves. The pipe means either match on the left of it or the right of it. Because this is in a set of round brackets, the pipe only applies there, rather than the whole expression.
Brackets also capture subsets which is really useful here: the first set of () brackets captures the number of pages and stores it as $1, the second set captures everything between the comma and the end of the units as $2, the third set captures the units only, either “p” or “leaves”, and stores it as $3. So in the example “15, 25 p., 50 leaves”, $1 is “15”, $2 is “ 25 p”, and $3 is “p”. The replacement puts these back in a different order, i.e. “$1 $3 $2” which would be “15 p 25 p”.
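The captured groups can be inspected directly by running the same expression outside the loop. This snippet only applies the replacement once to the example string; the variable names are just for illustration.

```javascript
var pattern = /(\d*),(.*?(p|leaves))/;
var before = "15, 25 p., 50 leaves";

// match[1], match[2] and match[3] correspond to $1, $2 and $3
var match = before.match(pattern);
console.log(JSON.stringify(match.slice(1))); // prints ["15"," 25 p","p"]

// One pass of the replacement used in the while loop
var after = before.replace(pattern, "$1 $3 $2");
console.log(JSON.stringify(after));
```

Because $2 begins with a space, the rewritten text has a doubled space between “15 p” and “25 p”, and the rest of the string (“., 50 leaves”) is left as it was for the next pass of the loop. The extra space is harmless as the later expressions skip over it.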
Now that all the sequences will be in number-unit pairs, we can get on with making a list of them to work through:
// Find sequences
var sequences = text.match(/\d+.*?(,|p|leaves)/g);
This looks for:
- \d+ at least one digit
- .*? any number of any characters, although not being greedy
- (,|p|leaves) any of a comma, “p”, or “leaves”. Obviously, if the while loop above has worked, then the comma isn’t needed, but I’ll confess this is a hangover from a previous version of the script…
The next section goes through each of the sequences found and extracts the number and then the unit:
// go through sequences
var pages = 0;
for (var i=0; i<sequences.length; i++) {
  // Extract no
  var number = parseFloat(sequences[i].match(/\d+/g)[0]);
  var units = sequences[i].match(/(p|leaves)/g)[0];
  if (units == "p") {
    pages+=number;
  }
  if (units == "leaves") {
    pages+=number*2;
  }
}
The regular expression to find the number is straightforward:
- \d+ at least one digit
The parseFloat converts the digits as a string to a Javascript number. The regular expression to find the unit is also simple:
- (p|leaves) either “p” or “leaves”
If the units are “p”, then the variable pages is incremented by the value of the number found; if “leaves”, then pages is incremented by twice that number.
The programme should cope with the loss of abbreviations in RDA as “p.” is expanded to “pages” but the regular expression to find the units will still find the “p” at the beginning much as it isn’t put off by the full stop after the “p”. It could be expanded to look for other variations and I will do so if I can:
- “S.” for German “Seite” or “Seiten”.
- “leaf”, as in “1 leaf of plates”
- sequences which start in the middle of larger ones, like journal issues with “xii, p. 546-738”. This one will be the most complicated as it goes against the basic flow of the existing code.
I also haven’t properly tested folded sheets or multiple volume works. Other improvements are needed in failing more gracefully when it doesn’t find what it’s expecting: the programme should really test the existence of the arrays it makes before looping through them, but this would make it harder to understand at a glance or demonstrate on screen so I didn’t do it.
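One way of failing more gracefully is to avoid building arrays that might be null in the first place. The sketch below is a different technique from the script described above, offered only as an illustration: it walks through number-unit pairs with exec and carries forward any numbers that are followed by a comma until a unit turns up. It assumes Roman numerals have already been converted, and `countPages` is a name made up for the example.

```javascript
// Count pages in a 300$a string containing Arabic numerals only.
// Returns 0 when nothing recognisable is found, rather than throwing an error.
function countPages(text) {
  // A number, then any non-digits, then a comma, "p" or "leaves"
  var pattern = /(\d+)\D*?(,|p|leaves)/g;
  var pending = 0; // numbers followed by a comma, still waiting for a unit
  var pages = 0;
  var match;
  while ((match = pattern.exec(text)) !== null) {
    var number = parseInt(match[1], 10);
    var unit = match[2];
    if (unit === ",") {
      pending += number;
    } else {
      // A leaf counts as two pages
      var pagesPerUnit = unit === "leaves" ? 2 : 1;
      pages += (pending + number) * pagesPerUnit;
      pending = 0;
    }
  }
  return pages;
}

console.log(countPages("20, 30 p."));        // prints 50
console.log(countPages("30 p., 20 leaves")); // prints 70
console.log(countPages("no pagination"));    // prints 0
```

Numbers left waiting at the end with no unit after them are simply ignored, which may or may not be what is wanted.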
The scripts are written in Javascript for several reasons: it is the language that Codecademy focusses on for beginners; it requires no specialist environment, server, or even a web connection: you just need a basic text editor and a browser; it is easy to adapt for a web page if you do manage to build something; and, it is the language I am most confident working in. It would be fairly easy to port to other languages though, and Owen changed the size script with some other modifications to work in BeanShell/Java in Blacklight.
I can’t speak for the attendees, but I learnt a lot, and much was made more clear, from playing around with these scripts and talking to people at Mashcat:
- Quite how dependent AACR2 and RDA (and consequently MARC21) are on textual information, even for what appears to be quantitative data.
- That even for what appears to be standard number-unit data, there are too many complications that make it non-trivial to extract data:
- fractions (not even decimals) in 300$c
- differing units: book sizes in mm. or cm. depending on how big the book is; disc sizes in in.; extent in pages or leaves (or volumes or atlases or sheets…)
- sequences with implied units, such as those with commas.
- there is frequently a lack of clarity about what is actually being measured:
- for books the dimension recorded is normally height (although this is not explicit from a user’s point of view, sometimes it’s height and width, and for a folded sheet it could be all sorts of things); for a disc it’s the diameter.
- For the 300$a what’s being recorded is pagination, something entirely different from number of pages. Although important for things like rare books, how important is complete pagination for most users compared to a robust idea of how large a book is? Amazon provide a number of pages. More importantly, how understandable is pagination? During my demonstration, some of my audience of librarians were left cold by the meanings of square brackets for example (and square brackets can mean any number of things depending on context). Perhaps there is room for both.
I suppose this latter point is a potential conclusion. Ed Chamberlain asked me what I thought should be done. I don’t know to be honest. I think, like much of the catalogue record, lots more research is needed to see what users (both human and computer) actually want or need. It should be said that entering pagination is in many ways easier for the cataloguer. However, I do think we need:
- quantitative data entered as numbers with clear and standard units. For instance, record all book heights as mm. and convert to cm. for display if needed.
- more data elements to properly make clear what is being recorded. Instead of a generic dimension, we need height, width, depth?, diameter, etc. Instead of pagination, we could have separate elements for pagination, number of pages, and number of volumes (50 volumes each of 10 pages is not the same as 4 volumes of 1000 pages each). Obviously all of them wouldn’t be needed for all items.
The research to enable us to choose what to record, why we’re recording it, and for whose benefit would be the best starting point for this as well as many other questions in cataloguing and metadata.
I have created a Codecademy project (with a lot of help in corrections and improvements from Esther Arens!) that builds a short script to read a raw MARC record and display it in a more readable format. Try it here: http://www.codecademy.com/courses/marc-viewer/.
This is by no means the last word in reading a MARC file and is basically a walk through of one way to do it. There are other ways and better ways that use more advanced coding, or allow more sophisticated re-use of the bits and pieces that are pulled out of the MARC record. There are also entire programmes and programming utilities designed to do this kind of thing and to manipulate MARC records, not least library management systems and things like MARCEdit. Moreover, there are limitations in formatting on the Codecademy platform that can easily be overcome by adapting the script to be run directly in an HTML file (I have done a direct simple adaptation without any further elaboration (view the HTML source to see the code and the alterations)).
I hope, if nothing else, that it gives cataloguing coders an idea of what a MARC21 record looks like under the hood and helps clarify the cataloguer’s opinions as to whether MARC must or mustn’t die. (HINT: it must).
Please see the following notes below before proceeding:
1. This project was designed for someone who has done the first few weeks of the Code Year course. By necessity it introduces some new things and an attempt has been made to explain them and encourage the cataloguer to enter the actual lines of Javascript that make up the programme. In any case, the Hints always contain the correct code needed to proceed.
2. Output will often consist of many lines, so sometimes you will have to scroll up in the console to see what has happened.
3. Some lines (including line 1!) will always produce errors, although the script will still run. This is because MARC uses BAD and DANGEROUS characters. BAD and DANGEROUS characters are of course common in the world of cataloguing (mentioning no names…).
There are many ways this could be improved or extended if you wanted the challenge, e.g.:
- Take the HTML version and use more HTML and CSS to make it clearer and prettier (e.g. more spacing, colour, bolding of codes). Try making it look like a specific LMS editing screen.
- Make it capture the elements in more detail and in a more re-usable way. For instance, try making each field an object with tags, indicators, subfields, etc. as properties. This would enable more interesting things, such as…
- A simple OPAC or even a card index display.
- Adapt it to read MARC files with more than one record. This isn’t as hard as it sounds, in that each record ends with a specific terminator (see the guide to record structure below).
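As a starting point for the idea of making each field an object, the sketch below turns the raw text of one field into an object with the tag, indicators, and subfields as properties. It assumes the field has already been located via the directory and had its field terminator removed; `parseField` and the property names are made up for this example, and the sample data is invented.

```javascript
var SUBFIELD_DELIMITER = "\x1F"; // precedes every subfield code in MARC21

// Build an object from a tag and the raw contents of one field.
function parseField(tag, raw) {
  var field = { tag: tag };
  // Control fields (001-009) have no indicators or subfields
  if (/^00\d$/.test(tag)) {
    field.value = raw;
    return field;
  }
  var parts = raw.split(SUBFIELD_DELIMITER);
  // The first part holds the two indicators
  field.ind1 = parts[0].charAt(0);
  field.ind2 = parts[0].charAt(1);
  // Each remaining part is a one-character code followed by its data
  field.subfields = parts.slice(1).map(function (part) {
    return { code: part.charAt(0), value: part.slice(1) };
  });
  return field;
}

var title = parseField("245", "10\x1FaModels for decision /\x1Fcby C.M. Berners-Lee.");
console.log(title.subfields[0].value); // prints Models for decision /
```

With fields in this shape, an OPAC-style display becomes a matter of picking out the tags and subfield codes you want and ignoring the rest.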
For full technical details of how a MARC21 bibliographic record is put together, see the MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media Record Structure. For details of the contents and use of MARC21 fields, see LC’s MARC Standards page. For a HTML version of the completed MARC Viewer script, see my adaptation.
The Code Year programme is part of Codecademy, an online set of programming lessons. Cataloguers interested in learning to programme will find the independent CatCode Wiki useful for extra information, advice, and support. See also the #catcode hashtag on Twitter.
Do let me know if you come across any problems with it or have any comments on the project.
Thank you again to Esther for her help.
I have tidied up and moved my ancient flexi hours calculator which now lives at http://www.aurochs.org/flexi/flexi.html. Several people have emailed me out of the blue recently about setting up an Excel version, which I have done and which now lives at http://www.aurochs.org/flexi/flexi.xlsx.
This is a version of one of the oldest useful programmes I ever wrote. My dad used to manage an office where everyone filled in paper flexi-time forms. My dad then had to add them all up at the end of every week, which he did manually. I wrote something in AmigaBasic to make it easier for him. The main point of it was that it had to be easy and quick to enter the times in, which is why it uses the simple four figure times and no drop downs (although I’m not sure how or if you can do drop down lists in AmigaBasic (nor do I intend to find out now)). Eventually my dad got the IT people at work to replace their version of a flexi calculator with something based instead on mine, which is silly considering how simple this was. Sometime after starting at my current employer (1997), I thought it would be a useful exercise to convert the programme to work on the web as I was learning Javascript at the time. It still exists and hasn’t changed a lot since then, barring a bit of explanation and some atrocious styling: the last time that file was touched was in 2003.
Although it is really hard to find (I can never remember where I left it and it seems impossible to Google), people do seem to come across it quite often and find it useful. I’ve been emailed a couple of times about getting bespoke versions done in Excel which led me to create a version initially to record multiple users, and another with additional days and more complicated working patterns. I’ve put a more standard version up with seven days and two sessions per day. It can easily be altered if you’re into Excel or, if you have something particular in mind, do let me know and I might be able to do something with it.
The original online version is written in Javascript. If you’re learning Javascript, please don’t look at it as it is a most outdated and inefficient way of adding Javascript to a webpage. However, it works, and the effort of making it all elegant would I think be counterproductive. It works on the idea that the first two digits of a four digit time (HHMM) are hours, the second two digits minutes. It converts these both to minutes (HH * 60 + MM) and does all the necessary maths. The slightly more tricky bit is converting a total of minutes back into hours and minutes. This is one rare (for me) real life use of the modulo (%) operation (although looking at the code source I seem to have invented my own weird convoluted version of the same thing). If you have x number of minutes, the number of hours will be x divided by 60 with any remainder taken off, i.e. Math.floor(x / 60); the number of minutes will be that remainder, i.e. x % 60. These can be put together into a pretty string.
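The arithmetic described above fits in two small functions. This is a fresh sketch rather than the code in the calculator itself: the function names are made up, and it assumes well-formed four-digit times and a total that is not negative.

```javascript
// Convert a four-digit time such as "0930" to minutes since midnight.
function toMinutes(hhmm) {
  var hours = parseInt(hhmm.slice(0, 2), 10);
  var minutes = parseInt(hhmm.slice(2, 4), 10);
  return hours * 60 + minutes;
}

// Convert a total number of minutes back to an "H:MM" string.
function toHoursAndMinutes(total) {
  var hours = Math.floor(total / 60);
  var minutes = total % 60;
  return hours + ":" + (minutes < 10 ? "0" : "") + minutes;
}

// A session from 09:00 to 17:15
var worked = toMinutes("1715") - toMinutes("0900");
console.log(toHoursAndMinutes(worked)); // prints 8:15
```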
I’m always interested to know if people find this useful, come across problems with it, or would find a slightly different version useful.
I am not a trained programmer, coding is not part of my job description, and I have little direct access to cataloguing and metadata databases at work outside of normal catalogue editing and talking to the systems team, but I thought it might be worth making the point of how useful programming can be in all sorts of little ways. Of course, the most useful way is in gaining an awareness of how computers work, appreciating why some things might be more tricky than others for the systems team to implement, seeing why MARC21 is a bastard to do anything with even if editing it in a cataloguing module is not really that bad, and how the new world of FRDABRDF is going to be glued together. However, some more practical examples that I managed to cobble together include:
- Customizing Classification Web with Greasemonkey. This is a couple of short scripts using Javascript, which is what the default Codecademy lessons use. Javascript is designed for browsers and is a good one to start with as you can do something powerful very quickly with a short script or even a couple of lines (think of all the 90s image rollovers). It’s also easy to have a go if you don’t have your own server, or even if you’re confined to your own PC.
- Aleph-formatted country and language codes. I wrote a small PHP script to read the XML files for the MARC21 language and country codes and convert them into an up-to-date list of preferred codes in a format that Aleph can read, basically a text file which needs line breaks and spaces in the right places. It is easy to tweak or run again in the event of any minor changes. I don’t have this publicly available anywhere though. PHP is not the most elegant language but is relatively easy to dip into if you ever want to go beyond Javascript and do more fancy things, although it can be harder to get access to a server running PHP.
- MARC21 .mrc file viewer. I occasionally need to quickly look at raw .mrc files to assess their quality and to figure out what batch changes we want to make before importing them into our catalogue. This is an attempt to create something that I could copy and paste snippets of .mrc files into for a quick look. It is written in PHP and is still under construction. There are other better tools for doing much the same thing to be honest, but coding this myself has had the advantages of forcing me to see how a MARC21 file is put together and realising how fiddly it can be. Try this with an .mrc which has some large 520 or 505 fields in it (there are some zipped ones here, to pick at random) and watch the indicators mysteriously degrade thereafter. I will get to the bottom of this…
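To give a flavour of how a MARC21 file is put together, here is a minimal sketch of reading a single record. The viewer mentioned above is written in PHP; this is a separate Javascript illustration, not that code, and parseMarcRecord is a made-up name. It assumes one well-formed, UTF-8 encoded record passed in as a Uint8Array, and relies only on the standard layout: a 24-character leader, a directory of 12-character entries (tag, length, start position), then the field data.

```javascript
const SUBFIELD = "\x1f"; // subfield delimiter

// Parse one MARC21 (ISO 2709) record held in a Uint8Array.
function parseMarcRecord(bytes) {
  const decoder = new TextDecoder("utf-8");
  const text = (from, to) => decoder.decode(bytes.subarray(from, to));

  const leader = text(0, 24);
  // Leader positions 12-16: byte offset at which the field data begins.
  const baseAddress = parseInt(leader.slice(12, 17), 10);
  const fields = [];

  // The directory runs from byte 24 up to the terminator just before baseAddress.
  for (let pos = 24; pos + 12 <= baseAddress - 1; pos += 12) {
    const tag = text(pos, pos + 3);
    const length = parseInt(text(pos + 3, pos + 7), 10); // bytes, including terminator
    const start = parseInt(text(pos + 7, pos + 12), 10); // bytes from baseAddress
    const from = baseAddress + start;
    const data = text(from, from + length - 1);          // drop the field terminator

    if (tag < "010") {
      // Control fields (001-009) have no indicators or subfields.
      fields.push({ tag, value: data });
    } else {
      // Two indicator characters, then subfields each introduced by the delimiter.
      const subfields = data.slice(2).split(SUBFIELD).slice(1)
        .map(s => ({ code: s[0], value: s.slice(1) }));
      fields.push({ tag, ind1: data[0], ind2: data[1], subfields });
    }
  }
  return { leader, fields };
}
```

Note that the lengths and start positions in the directory count bytes rather than characters, so the slicing has to happen on the raw bytes before decoding whenever a record contains multi-byte UTF-8 text.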
The following examples are less useful for my own practical purposes but have been invaluable for learning about metadata and cataloguing, in particular, RDF/linked data. I was very interested in LD when I first heard about it. Being able to actually try something out with it (even if the results are not mind-blowing) rather than just read about it, has been very useful. Both are written in PHP and further details are available from the links:
Nothing to do with cataloguing, but what I am most proud of is this, written in Javascript: Cowthello. Let me know if you beat it.
Update: Shana McDanold also wrote an excellent post on why a cataloguer should learn to code with lots of practical examples.
Behold: a version of Tom’s Excellent Javascript Snow (unobtrusive and customisable javascript snow for web pages using no images) that works on all websites you open on your browser (provided your browser is Firefox or something else that can run Greasemonkey scripts)!
To install it:
- Install Greasemonkey add-on for Firefox: https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/
- Make sure the monkey (probably at the top-right) is happy and colourful. Click on it if not.
- Install the tomsnow script by going to http://www.aurochs.org/zlib/js/userjs/tomsnow.user.js then
- Click on the Install button.
- Go and look at a new web page or reload one.
If you want to turn Greasemonkey off altogether, click on the monkey so he’s grey. If you want to stop individual scripts, click on the monkey, click on Manage User Scripts, and click on Disable next to the script.
These instructions were tested on Firefox 3.6.24 on Linux although I imagine they would be fine on any recent version of Firefox. I would be interested to hear anything confirming or undermining that assertion.
If you’re happy to play around, the snow is very customisable: you can easily alter the amount, speed, and style of snow, and so forth:
- Click on the monkey
- Click on Manage User Scripts
- Select tomsnow from the list
- Click on Options
- Click on Edit this user script (you will probably have to select a text editor at this point)
- Look for the section under the line of asterisks where more instructions can be found on how to make customisations.
- Save the file and reload any pages to see changes.
The Greasemonkey version of the script uses slightly different default settings to the previous version, in particular using a lower density of flakes as a huge blizzard of snow is not likely to be welcome if used on all sites one browses.
I haven’t found any particular problems and it doesn’t seem to stop any sites working, although sites that are already very script-heavy are obviously less happy about running more: Twitter, for example, is fine but can get sticky, although Gmail seems curiously OK. My cPanel was the only one which was really not happy. When you manage a Greasemonkey script, you will see a box where you can specify websites that you don’t want it to work on. For example, put https://twitter.com/* and it will stop tomsnow working on any URL that starts with https://twitter.com/.
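Greasemonkey scripts can also carry the same kind of rule in their own metadata block, using an @exclude line. The header below is purely illustrative and is not copied from tomsnow.user.js; matchesRule is a hypothetical helper that only approximates how a * wildcard in such a rule behaves, and is not Greasemonkey's own matching code.

```javascript
// ==UserScript==
// @name        tomsnow (illustrative header only)
// @include     *
// @exclude     https://twitter.com/*
// ==/UserScript==

// Simplified illustration: treat "*" in a rule as "any run of characters".
function matchesRule(rule, url) {
  // Escape regular expression special characters, leaving "*" alone.
  const escaped = rule.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
  return pattern.test(url);
}

console.log(matchesRule("https://twitter.com/*", "https://twitter.com/home")); // true
console.log(matchesRule("https://twitter.com/*", "https://example.org/"));     // false
```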
Classification Web is ace, but there are a couple of things about the interface that annoy me and, in one colleague’s case, seriously put him off using it, in particular:
- The opening of a new tab/window when you click on the MARC view for a subject or name.
- The confusing menu. We don’t use LCC or DDC, and the browse options don’t really add much, so we only really need two options: Search LC Subject Headings and Search LC Name Headings.
I managed to work out a simple way of modifying how Classification Web works on Firefox using the Greasemonkey add-on and a couple of simple scripts, all of which is quick and easy to install:
- Install Greasemonkey: https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/
- Make sure the monkey in the bottom-right corner is happy and colourful. Click on it if not.
- If you want to prevent the MARC view opening a new window, install the classweb_no_new_window script by going to http://www.aurochs.org/zlib/js/userjs/classweb_no_new_window.user.js then
- Click on the Install button
- If you want to reduce the main menu, install the classweb_prune_menu script by going to http://www.aurochs.org/zlib/js/userjs/classweb_prune_menu.user.js then
- Click on the Install button
- Reload/refresh Classweb if it’s still open and it should work.
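For the curious, a script that stops links opening a new window can be very short. The sketch below is not the contents of classweb_no_new_window.user.js; it assumes the new window comes from a target attribute on the link, which may not be how Classification Web actually does it, and stripLinkTargets is a made-up name.

```javascript
// Remove the target attribute from each link so it opens in the current tab.
// Returns the number of links that were changed.
function stripLinkTargets(links) {
  let changed = 0;
  for (const link of links) {
    if (link.hasAttribute("target")) {
      link.removeAttribute("target");
      changed += 1;
    }
  }
  return changed;
}

// In a userscript this would run against the page itself.
if (typeof document !== "undefined") {
  stripLinkTargets(document.querySelectorAll("a[target]"));
}
```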
If you want to turn Greasemonkey off altogether, click on the monkey so he’s sad and grey. If you want to stop individual scripts, right click on the monkey, click on Manage User Scripts, select a script from the list, and un-tick the Enabled box in the lower left corner.
These instructions were tested on Firefox 3.5.3 although I imagine they would be fine on any recent version of Firefox. I would be interested to hear anything confirming or undermining that assertion.
If you’re happy to play around, these scripts can be further altered. In particular, you can choose which menu items appear in the pruned menu script:
- Right click on the monkey
- Click on Manage User Scripts
- Select classweb_prune_menu from the list
- Click on Edit (you will probably have to select a text editor at this point)
- Edit the list of pages under the line var menu_items_to_keep = Array (. Enter each page you want to appear on the menu on a separate line in quotes, with a comma at the end of each line except the last line. The menu item must appear exactly as it does on the Classification Web menu, including capitals. E.g., the default set up looks like this:
var menu_items_to_keep = Array ( // end each line with a comma except the last line
"Search LC Subject Headings",
"Search LC Name Headings"
);
- Save the file, and reload Classification Web.
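The reason the menu items have to match exactly, capitals and all, is easier to see with a sketch of how such a list might be used. This is an illustration under assumptions rather than the real script: pruneMenu is a hypothetical function, and it assumes each menu entry is an element whose visible text is the menu label.

```javascript
const menu_items_to_keep = [
  "Search LC Subject Headings",
  "Search LC Name Headings"
];

// Hide every menu entry whose visible text is not in the keep list.
// The comparison is exact, so "search lc name headings" would not match.
function pruneMenu(entries, keep) {
  for (const entry of entries) {
    const label = entry.textContent.trim();
    if (!keep.includes(label)) {
      entry.style.display = "none";
    }
  }
}
```

In a userscript the entries would come from the page, for example via document.querySelectorAll with whatever selector fits the menu markup.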
If anyone else finds this useful or can think of more customizations let me know.
Behold: Tom’s Excellent Javascript Snow: unobtrusive and customisable javascript snow for web pages using no images! If you’re looking at this directly, rather than through an RSS aggregator, you should see it falling now. Unless, that is, you’re watching this on Dave, in which case it will be some time past Christmas and I might have taken the snow away again. However, it is always snowing at Tom’s Excellent Javascript Snow page!
I’ve been meaning to write this since last year when my attempts were full of fail. The idea was to create simple unobtrusive javascript snow that could be added to any page and that didn’t require any images. I think Tom’s Excellent Javascript Snow fulfills these criteria and is therefore full of win. Furthermore, it is very customisable, so you can easily alter the amount, speed, and style of snow, and so forth. Incidentally, it uses the asterisk character (*) by default. There is in fact a Unicode Tight Trifoliate Snowflake character, but it is only available in a few fonts by the looks of it and I haven’t tried it. The script depends on the DOM and kind of uses CSS, but it is all defined through the Javascript: so many of the properties are different for individual snowflakes or change while the script is running, that it is not worth having a general style. It also means you only need one file to do everything.
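As a rough idea of what "all defined through the Javascript" means in practice, here is a small sketch of one flake. It is not taken from tomsnow_v1.js: the function names, the speed and size ranges, and the choice to wrap back to the top are all assumptions made for the example.

```javascript
// Create the state for one flake (ranges are illustrative).
function makeFlake(width, random = Math.random) {
  return {
    x: random() * width,                  // horizontal position in px
    y: 0,                                 // start at the top
    speed: 1 + random() * 2,              // px moved per tick
    size: 10 + Math.floor(random() * 10)  // font size in px
  };
}

// Move a flake down one tick, wrapping to the top once it passes the bottom.
function stepFlake(flake, height) {
  const y = flake.y + flake.speed;
  return { ...flake, y: y > height ? 0 : y };
}

// Apply a flake's state to an element using inline styles only,
// so no stylesheet or image is needed.
function drawFlake(el, flake) {
  el.textContent = "*";
  el.style.position = "fixed";
  el.style.left = flake.x + "px";
  el.style.top = flake.y + "px";
  el.style.fontSize = flake.size + "px";
}
```

Because each flake has its own position, speed, and size, and these change on every tick, setting them inline per element is simpler than maintaining a shared style.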
To use it, copy tomsnow_v1.js to a directory on your web-server, and add the following code to the head of any pages on which you would like snow:
<script type="text/javascript" src="http://www.yourdomain.com/path/tomsnow_v1.js"></script>
<script type="text/javascript">
function init () {
snow();
}
window.onload=init;
</script>
I’m sure Stuart will tell me there’s a better way of doing it…
I know this works on Firefox 2 and Internet Explorer 6 on Windows as well as Firefox 3 on Linux.
In the unlikely event you do use this, do let me know for the sake of my own vanity. Any comments generally are welcome. I do have some ideas for version 2, maybe for next year, mostly around wind effects such as better horizontal drifting, prevailing winds, and gusts. Ideally, I would like to make the snow lay in some way, as in the snow at St Pancras (you might have to wait for it to kick in), but that is quite unlikely given the trouble I had with page heights as it was.