The Shore Project Group
Computer Sciences Department
UW-Madison
Madison, WI
Thu Nov 3 14:18:07 CST 1994
This tutorial explains, through the use of a detailed example, how to write application programs in C++ that store and manipulate complex data structures in Shore.
This tutorial illustrates many aspects of Shore, including
This tutorial is not a general introduction to Shore, its goals, structure or status. You should read An Overview of Shore before reading any further. This tutorial is also not a reference manual. Reference manuals exist or are being written for all aspects of Shore. See The Shore Alpha Release for an index to the rest of the documentation. Finally, this tutorial does not even attempt to demonstrate all the features of Shore, just a subset sufficient to get you started.
The example program, called stree, uses an unbalanced binary search tree as an inverted index to a set of documents. The tree contains one node for each distinct word appearing in any of the documents. Associated with each word is a set of citations, each of which indicates a document and the offset within the document of the start of a line containing the word. Each tree node, citation, and document is a separate object stored in the Shore database. Although the documents are actually stored in the Shore database, they are stored in such a way that existing Unix programs can manipulate them as if they were ordinary files.
The program has options to add and remove documents from the database and to list the lines containing a given word. It also has a debugging option that dumps all the objects in the database. This option illustrates how to write a maintenance program that iterates through the objects in the database in a ``raw'' form.
The example program illustrates how to define objects with methods (``member functions'' in C++) linked together with pointers (called ``references'' in Shore). It also shows how to use relationships, which generalize pointers, adding the ability to represent ``1-to-N'' and ``M-to-N'' associations and automatically maintain inverse ``pointers''. It also illustrates the Unix-compatibility features of Shore.
First and foremost, this program does not illustrate the best-or even a good-way to build inverted indices. Shore has a built-in index feature (not illustrated by this program) that is far more efficient than a binary search tree, which is only efficient for main-memory structures (and even then, a hash table would be better). The example was chosen to illustrate how a program that manipulates linked data structures can easily be adapted to make those structures persistent. However, making a program that manipulates main-memory data structures run efficiently with persistent data generally requires a careful re-design of data structures and algorithms.
This example also does not illustrate all the features of Shore or the SDL data-definition language. It does not use all the available pre-defined data structures (such sequences or bags), the ``module'' facility for managing large complex designs, or inheritance (SDL supports multiple inheritance).
This tutorial walks through the example program sources in detail. The sources, as well as associated test programs and data, may be found in the src/examples/stree sub-directory of the distribution. Throughout this tutorial, we will assume (as does the Shore Software Installation Manual), that the environment variable $SHROOT contains the absolute path name of the root directory of the installed Shore software.
The first step in building an application to run under Shore is to define the data types of all persistent data that will be used by the application. These declarations are written in a type-definition language called SDL (for Shore Data-definition Language). The file stree.sdl contains the type definitions for the binary search tree example. This file contains one module (called stree), which defines four types: SearchTree, Word, Document, and Cite. These types are defined by interface definitions, which are quite similar to class definitions in C++.
SearchTree is the top-level object. The example program will create exactly one object of this type. Its public interface consists of three operations: initialize, insert, and find. It also defines one private operation, an alternate (overloaded) version of insert, and one attribute, a ref (persistent pointer) to the Word object that is the root of the tree. The comments in stree.sdl explain the semantics of the operations and attributes; we will remark here only on aspects of the definition that illustrate features of SDL.
The initialize operation of a SearchTree should be invoked once immediately after it is created. SDL does not yet have the counterpart of constructors and destructors in C++. (This deficiency may be rectified in a future version.) The insert operation is intended to be invoked with one argument, which is a (transient) C++ character string. We use type type lref<char>, which translates to char * in the C++ binding. This is a bit of a kludge. Future versions of SDL will probably have a more general mechanism for declaring operations that have parameters of ``opaque'' (i.e., non-SDL) types.
Word represents a node in the search tree. The private part of the interface is similar to the way one might define a binary search tree node in C++ or C. A Word has a string value and pointers to its left and right subtrees. The type string is a pre-defined type in SDL representing an arbitrary-length character string. The relationship declaration indicates that Words participate in an N-M (many-to-many) relationship with Cites. That is, for each instance w of Word, w.cited_by is a set of zero or more references to Cite objects. Moreover, Word::cited_by and Cite::cites are to be kept consistent: w.cited_by should contain a reference to an instance c of Cite if and only if c.cites contains a reference to w. The operations find_or_add and find are intended only to be called from the operations of SearchTree. They would be in the private part of the interface if SDL had the equivalent of the friend declaration of C++.
A Cite object represents a citation (an occurrence of a word in a line of a document). If SDL supported first class relationships in the sense of the entity-relationship model-that is, if relationships could have attributes-Cite would have been declared as a relationship between Word and Document with attribute offset.
Finally, Document represents an actual document stored in the repository. The type text of the attribute body is the same as string, but has the additional function of declaring that when a Document is accessed through the Unix compatibility interface, it will appear to be a file whose contents are the contents of this field.
The second step in building an application is to write the code that implements the operations of its interfaces. Currently, this code must be written in C++. The Shore project intends to support other implementation languages in future releases. The implementation code for the search tree example is contained in four files, one for each interface: tree.C, word.C, cite.C, and document.C.
On the whole, this code is similar to the code one would write to implement the C++ classes generated from the interface definitions. However, there are several points to note:
The main program of our sample application is in main.C. Most of the code should be clear to any experienced C++ programmer. We will only concentrate on those statements that exercise Shore features.
Any program that interacts with Shore must call the static member function Shore::init exactly once before doing any Shore operations. This function has one argument, the name of the program (generally argv[0]). Like many Shore interface functions, it returns a value of type shrc (``rc'' stands for ``return code'').
The macro SH_DO is handy for calling functions that are not expected to fail. It evaluates its argument and verifies that the result is RCOK. If not, it prints (on cerr) an error message and aborts the program.
Every Shore operation except Shore::init must be executed inside a transaction. A transaction groups a set of interactions with the database into a single atomic unit. Shore ensures that transactions running concurrently by multiple programs have a net effect that is equivalent to running them one at a time. (This property is called ``serializability''). Moreover, if a transaction should fail, Shore guarantees that all changes to the database performed by the transaction are undone. A program starts a transaction by invoking the macro SH_BEGIN_TRANSACTION. Its argument is a variable of type shrc. When it has successfully completed all the actions in a transaction, it invokes the parameterless macro SH_COMMIT_TRANSACTION to make all of its changes to the database permanent and to release any locks on database objects that Shore may have obtained to ensure serializability. In exceptional circumstances, Shore may reject the attempt to commit the transaction. Therefore, it returns an shrc value. Since we do not want to try any fancy recovery actions if SH_COMMIT_TRANSACTION fails in our application, we invoke it with SH_DO.
On occasion, an application program may discover that a transaction cannot be completed for application-specific reasons. In such occasions, the program can explicitly abort the transaction by calling the macro SH_ABORT_TRANSACTION(rc), where rc is a value of type shrc. In addition to requesting Shore to undo all changes to persistent data and release all locks, this macro performs a longjmp, returning control to the statement following the most recently executed SH_BEGIN_TRANSACTION and assigning the shrc value supplied to SH_ABORT_TRANSACTION to the result parameter of SH_BEGIN_TRANSACTION. Since any transaction may be aborted in this manner, each call to SH_BEGIN_TRANSACTION should be followed by code that tests the resulting shrc and takes corrective action if it is not RCOK. The member function shrc::fatal prints an appropriate message and aborts the program. The macro SH_DO previously described behaves somewhat differently if a transaction is active and an error is detected: Instead of terminating the program, it invokes SH_ABORT_TRANSACTION.
After beginning a transaction, our example program calls Shore::chdir, to go to directory search_tree. Shore::chdir is similar to the chdir system call of Unix: It alters the current Shore working directory. Note that the program has two ``current working directories'': one that is applies to Unix system calls and one that applies to all path names used in calls to Shore.
Like the Unix system call, Shore::chdir will fail if the the directory does not exist. In the case of our example program, Shore::chdir fails for this reason the first time the program is run. It recovers by creating the directory (using Shore::mkdir, which is similar to the Unix function of that name), and reissues the chdir request). A failure of the first chdir operation for any other reason is a catastrophic error. Thus the program checks that the return code is either RCOK or SVAS_NotFound (``SVAS'' stands for ``Shore Value-Added Server'').
When run for the first time, our example program also creates two ``registered'' objects: an instance of SearchTree registered under the path name search_trees/repository and a pool named search_trees/pool. The pool will be used to allocate ``anonymous'' instances of Word and Cite. See An Overview of Shore for more information about registered and anonymous objects and pools. The SearchTree object is created by a form of the C++ new operator applied to the class name SearchTree using C++ ``placement syntax'' to supply the path name and permission bits for the new object. If the operation should fail for any reason (such as permission denied), it will cause an abort of the current transaction, returning control to the statement following the most recent SH_BEGIN_TRANSACTION. Otherwise, a reference to the new object is assigned to the global variable repository.
The creation of the pool illustrates an alternative way of creating a registered object. The variable nodes is declared to have type REF(Pool), where Pool is a pre-defined Shore type. The class REF(T), for any T, has several static member functions, such as create_registered and create_anonymous, and create_pool. Each one has parameters to supply a path name and protection mode, as well as a result parameter to receive a reference to the created object. In this case we call nodes.create_pool to create a new Pool object. (We could have written equivalently REF(Pool)::create_pool). Each of these functions returns an shrc result to indicate success or failure.
The differing failure modes of these two ways of creating registered objects illustrate a general design principle of Shore. Many Shore operations are invoked implicitly. Another example of an implicit operation is dereferencing a REF(T) value. When any of them fails (for example, if the reference is null or dangling), Shore responds by aborting the current transaction. If the program needs more precise control-in particular, if it wants a chance to recover from the error-it must use an alternative interface by explicitly calling a Shore function that yields a return code.
If the directory search_tree already exists, the program expects to find existing registered objects search_tree/repository and search_tree/pool. Each reference class REF(T) has a static member function lookup, with a path-name input parameter and an output parameter of type REF(T). This function looks for a registered object with the given name, and if one is found, checks that its type (as indicated by data stored in the database) matches T. If both checks succeed, a reference to the object is returned in the result parameter. The initializations of repository and nodes illustrate two ways of invoking this function.
Finally, the main program performs one of four operations depending on a command-line switch. It either adds one or more documents to the repository, looks up a word, removes a document from the repository, or dumps all anonymous objects.
In a typical Shore application, the vast majority of objects created will not have path names. Unlike registered objects, which can be accessed either by path name or by references from other objects, these ``anonymous'' objects can only be accessed by following references. To assist in clustering, and to allow the application (and system administrators) to keep track of all allocated space, Shore requires each anonymous object to be allocated from a pool, which is a registered object. Our example program uses just one pool for all anonymous objects. A more sophisticated program might use a separate pool for each type extent, or for each major component of a complex data structure.
The function SearchTree::insert(char *fname) in tree.C shows how to create anonymous objects. The expression ``new (nodes) Cite'' allocates a new instance of interface Cite from the pool referenced by nodes. The function Document::finalize in document.C shows how to destroy an anonymous object: If p is a reference (an instance of REF(T), for some type T), p.destroy() destroys the object referenced by p. Registered objects cannot be explicitly destroyed; like Unix files, they are deleted by the system when they have no path names designating them. An example of code to delete a registered object may be found in the function delete_file in main.C The call Shore::unlink(fname) removes the name fname from a registered object. Since this program does not use Shore::link to create multiple aliases (``hard'' links) for objects, this operation will also cause the object to be destroyed.
The definitions in stree.sdl include two bidirectional relationships. One links words to their citations and the other links citations to the documents they cite. A bidirectional relationship has two names, one for each direction. For example, the relationship between citations and documents is called ``doc'' in the Cite-to-Document direction and ``cited_by'' in the reverse direction. This relationship is declared by the declaration
relationship ref<Document> doc inverse cited_by;in interface Cite, and by the declaration
relationship set<Cite> cited_by inverse doc;in interface Document. The SDL compiler checks that the two declarations are consistent. The use of ``ref'' rather than ``set'' in the first of these declarations indicates a functional dependency from Cite to Document; each Cite is related to at most one Document.
In the C++ binding, these declarations give rise to data members Cite::doc, of type REF(Document) and Document::cited_by, of type SET(Cite). Similarly, the relationship between words and citations is represented by Word::cited_by and Cite::cites.
The type SET(T) represents a set of zero or more references to T objects. It has member functions to add and delete values of type REF(Cite) and to iterate through its contents. The exact details of the interface are likely to change in future releases of Shore. Right now, the only documentation on the interface is the file $SHROOT/include/sdl_templates.h (see the definition of the macro SET_DCL(T)). An example of the use of the current interface may be seen in word.C. In Word::occurs_on, a citation of word w is recorded by adding the reference cite to w.cited_by. The runtime support automatically adds a reference to w to cite->cites. The function Word::occurrence uses the member function SET(Cite)::get_elt to retrieve (a reference to) one of the citations of a word, while Word::count uses SET(Cite)::get_size to determine how many citations there are. A reference can be deleted from a set with SET(T)::del. Document::finalize uses an alternative interface: The function SET(T)::delete_one deletes an arbitrary reference from the set and returns it as its value.
The function Document::finalize is called just before destroying a Document object. Although the runtime system automatically updates one end of an bidirectional relationship when the other end is updated by assignment, it does not yet update inverse relationships properly when an object is destroyed. (This is a bug; it will be fixed in a future release). However, even if it did so, there might be application-specific cleanup operations required. In our example program, we would like to ``garbage collect'' the Cite objects associated with the document being removed. Document::finalize iterates through the citations of the document, invoking Cite::finalize on each one and then destroying it. Cite::finalize simply removes all references from the citation to Word objects, thereby removing the citation from the cited_by set of each word. The example program does not remove words from the binary search tree when their citation counts drop to zero. Doing so would not be hard, but it does not illustrate any additional features of Shore.
The pre-defined type string is implemented as a char * pointer and a length (so strings can contain null bytes). When a persistent object containing strings is written to disk, the actual string data is appended to the object and the pointers are converted to a form appropriate for storage on disk. When it is brought back into memory, the pointers are restored (``swizzled'') to memory addresses. When an ordinary C++ (null-terminated) string is assigned to a Shore string, the bytes (up to and including the terminating null byte) are copied to dynamically allocated space. See for example, Word::initialize. When an object containing strings is removed from the object cache, its string space is freed. Thus Shore strings have value semantics.
Standard library string functions such as strcmp, strncmp, strlen, etc., as well as memcpy and bcopy are overloaded to work with Shore strings. In addition to strlen, strings support an operation blen which returns the total length (including null bytes). It is also possible to assign a character or string to an arbitrary offset in a Shore string. The target string is expanded if necessary to accommodate the data. For example, Document::append extends the body field of a document by invoking the sdl_string::set function (Document::body is actually of type text, but text and string are the same for the purposes of this discussion). See $SHROOT/include/sdl_string.h for more information.
The example program supports an option (-p) for dumping all the anonymous objects in the pool created by the program. This last option is useful for verifying that the object deletion code is working correctly, and illustrates how one might write administrative programs for maintaining a complex database. The function pool_list in main.C creates a PoolScan object to scan the contents of the pool, and tests whether the creation was successful. (If for example, the named pool did not exist or permission was denied, the scan object would be created in an ``invalid'' state, and would test as false when converted to Boolean.) The function PoolScan::next returns a reference to the ``next'' object in the pool (according to some arbitrary ordering) in its result parameter. It returns some shrc value other than RCOK when no more objects remain. The result parameter must be of type REF(any), the persistent analogue of void *-a reference to an object of unknown type. The actual type of object can be tested dynamically with the function TYPE(T)::isa(REF(any) &ref). Each interface T defined in an SDL definition gives rise to a type object (or meta-type), which is available as a global variable named TYPE_OBJECT(T) (of type TYPE(T)). One of the member functions of this object is isa, which accepts a parameter of type REF(any), tests whether it is a reference to an object of type T, and if so returns a reference of type REF(T) to it. Otherwise, isa return a null reference. It should be noted that this interface for dynamic type checking is provisional; it may be replaced with a facility more nearly resembling the dynamic_cast syntax for run-time type identification (RTTI) recently added to the proposed C++ standard.
After checking that the type of returned object conforms to one of the expected types (the program only creates anonymous objects of type Word and Cite), pool_list uses the cast reference to call the appropriate print function (Word::print or Cite::print).
To build the example program, you must have a copy of the Shore server running. The document Shore Software Installation Manual, particularly the section Testing Your Installation, gives simple instructions on how to start a server. You probably want to do this in a separate window. The server will accept interactive commands from the keyboard. The only one you will need for this demonstration is ``bye'', which causes the server to shut down cleanly and exit. The server will also occasionally produce debugging output.
The file Makefile.template is a sample Makefile for building the example program. Copy it to Makefile and edit it to set INSTALL_DIR to the root of the Shore installation directory ($SHROOT). Then type ``make''. You should see something like this
rm -f stree.h /usr/local/shore/bin/sdl <stree.sdl /usr/local/shore/bin/sdl_cxx stree > stree.h g++ -g -I/usr/local/shore/include -c main.C g++ -g -I/usr/local/shore/include -c stree_defs.C g++ -g -I/usr/local/shore/include -c tree.C g++ -g -I/usr/local/shore/include -c word.C g++ -g -I/usr/local/shore/include -c cite.C g++ -g -I/usr/local/shore/include -c document.C g++ -g main.o stree_defs.o tree.o word.o cite.o document.o /usr/local/shore/lib/libshore.a -o stree
The second line invokes the SDL compiler. It reads the SDL specification stree.sdl from standard input and builds a compiled version of the module stree in the Shore database. (The module is a registered object named /type/stree. Future versions of the SDL compiler will have options to control placement of its output.) The next line invokes the tool sdl_cxx, which builds a C++ language binding from the module object named by the command line argument (the directory /types is implied). We have redirected the output to the file stree.h, which is included by all of our source files. We then compile all of the source files and link them together, along with the Shore runtime support library. Any C++ compiler should be usable, but the current release is only tested to work with the GNU compiler (g++) version 2.6.0.
We have already explained the source files main.C, tree.C, word.C, cite.C, and document.C. The first of these is the main program, while the rest define the member functions for each of the classes corresponding to interfaces defined in stree.sdl. The file stree_defs.C is a small file containing just two lines:
#define MODULE_CODE #include "stree.h"
The generated file stree.h contains some function definitions and initializations of global variables. Compiling it with MODULE_CODE defined generates these functions and initializations for linking with the rest of the program.
First use stree to add the files test1, test2, and test3 to the repository.
% stree -av test?The output should look like this
Indexing file test1 Indexing file test2 Indexing file test3 about to commit committedNext, use the -lv (list verbose) option to look up some words.
% stree -lv six ========== six test2: two six test2: three six test2: six two test2: six three test2: six six test2: six seven test2: seven six test3: four six test3: five six test3: six four test3: six five test3: six six test3: six seven test3: seven six % stree -lv eight ========== eight **** Not foundUse the -d option to remove some of the documents.
% stree -d test2 % stree -lv six ========== six test3: four six test3: five six test3: six four test3: six five test3: six six test3: six seven test3: seven six % stree -d test1 % stree -lv seven ========== seven test3: four seven test3: five seven test3: six seven test3: seven four test3: seven five test3: seven six test3: seven sevenUse the -p option to see what anonymous objects remain in the pool.
% stree -p Word 'one' occurs on 0 lines Word 'three' occurs on 0 lines Word 'five' occurs on 7 lines Word 'seven' occurs on 7 lines Word 'two' occurs on 0 lines Word 'six' occurs on 7 lines Cite, offset 0 in file test3 cites four Word 'four' occurs on 7 lines Cite, offset 10 in file test3 cites four five Cite, offset 20 in file test3 cites four six Cite, offset 29 in file test3 cites four seven Cite, offset 40 in file test3 cites five four Cite, offset 50 in file test3 cites five Cite, offset 60 in file test3 cites five six Cite, offset 69 in file test3 cites five seven Cite, offset 80 in file test3 cites six four Cite, offset 89 in file test3 cites six five Cite, offset 98 in file test3 cites six Cite, offset 106 in file test3 cites six seven Cite, offset 116 in file test3 cites seven four Cite, offset 127 in file test3 cites seven five Cite, offset 138 in file test3 cites seven six Cite, offset 148 in file test3 cites sevenRemove the remaining document from the repository and verify that the pool contains only Word objects.
% stree -d test3 % stree -p Word 'one' occurs on 0 lines Word 'three' occurs on 0 lines Word 'five' occurs on 0 lines Word 'seven' occurs on 0 lines Word 'two' occurs on 0 lines Word 'six' occurs on 0 lines Word 'four' occurs on 0 lines
The stree directory has a sub-directory called sonnets which contains all 154 of Shakespeare's sonnets, one per file. For this test, add sonnets 10 through 19 to the repository.
% stree -av sonnets/sonnet01? Indexing file sonnets/sonnet010 Indexing file sonnets/sonnet011 Indexing file sonnets/sonnet012 Indexing file sonnets/sonnet013 Indexing file sonnets/sonnet014 Indexing file sonnets/sonnet015 Indexing file sonnets/sonnet016 Indexing file sonnets/sonnet017 Indexing file sonnets/sonnet018 Indexing file sonnets/sonnet019 about to commit committed % stree -lv summers ========== summers sonnet012: And summer's green all girded up in sheaves sonnet018: Shall I compare thee to a summer's day? sonnet018: And summer's lease hath all too short a date: % stree -l summers sonnet012 sonnet018 sonnet018Note that sonnet 18 is listed twice, since ``summers'' appears on two different lines in that sonnet.
To illustrate access to the Shore database from existing Unix utilities, mount the Shore database as a Unix file system, as explained in the Shore Software Installation Manual, in the section NFS-Mounting the Shore File System. If you follow the instructions there, you will have the Shore database mounted as /shoremnt.
% ls -l /shoremnt total 2 drwxrwxr-x 1 solomon 68 Oct 28 06:39 sample drwxr-xr-x 1 solomon 376 Oct 28 06:45 search_trees drwxr-xr-x 1 solomon 128 Oct 28 06:44 types % ls -l /shoremnt/search_trees total 6 prw-r--r-- 1 solomon 0 Oct 28 06:47 pool -rw-r--r-- 1 solomon 0 Oct 28 06:47 repository -rw-r--r-- 1 solomon 650 Oct 28 06:47 sonnet010 -rw-r--r-- 1 solomon 709 Oct 28 06:47 sonnet011 -rw-r--r-- 1 solomon 657 Oct 28 06:47 sonnet012 -rw-r--r-- 1 solomon 637 Oct 28 06:47 sonnet013 -rw-r--r-- 1 solomon 623 Oct 28 06:47 sonnet014 -rw-r--r-- 1 solomon 647 Oct 28 06:47 sonnet015 -rw-r--r-- 1 solomon 630 Oct 28 06:47 sonnet016 -rw-r--r-- 1 solomon 677 Oct 28 06:47 sonnet017 -rw-r--r-- 1 solomon 656 Oct 28 06:47 sonnet018 -rw-r--r-- 1 solomon 662 Oct 28 06:47 sonnet019
Note that there are 12 registered objects in the directory search_trees: 10 sonnets (objects of class Document), the object repository (of class SearchTree), and the pool object. The pool and repository show up under Unix as having zero size, since neither has a text member, but each of the sonnets shows up as a file whose contents are the same as its text member body.
% cat /shoremnt/search_trees/sonnet018 Shall I compare thee to a summer's day? Thou art more lovely and more temperate: Rough winds do shake the darling buds of May, And summer's lease hath all too short a date: Sometime too hot the eye of heaven shines, And often is his gold complexion dimmed, And every fair from fair sometime declines, By chance, or nature's changing course untrimmed: But thy eternal summer shall not fade, Nor lose possession of that fair thou ow'st, Nor shall death brag thou wand'rest in his shade, When in eternal lines to time thou grow'st, So long as men can breathe or eyes can see, So long lives this, and this gives life to thee.
The shell script swc illustrates how Shore applications can be combined with ``legacy'' Unix programs.
% swc summers 14 118 657 sonnet012 14 114 656 sonnet018 28 232 1313 totalThis script uses the output of stree -l (piped through sort -u to remove duplicates) as the list of arguments to a standard Unix utility (in this case wc) which accesses the objects as if they were ordinary Unix files. Note that there is no need to use a special version of wc or even to re-link wc with a special library.
A slightly more sophisticated example is afforded by sedit, which invokes an editor on the set of sonnets containing a given word.
% sedit summers MR Buffer Size Mode File -- ------ ---- ---- ---- . sonnet018 656 Text /shoremnt/search_trees/sonnet018 sonnet012 657 Text /shoremnt/search_trees/sonnet012 *scratch* 0 Lisp Interaction * *Buffer List* 274 Text --%%-Emacs: *Buffer List* 6:54am 0.23 (Buffer Menu)--All-------------------- Shall I compare thee to a summer's day? Thou art more lovely and more temperate: Rough winds do shake the darling buds of May, And summer's lease hath all too short a date: Sometime too hot the eye of heaven shines, And often is his gold complexion dimmed, And every fair from fair sometime declines, By chance, or nature's changing course untrimmed: But thy eternal summer shall not fade, Nor lose possession of that fair thou ow'st, Nor shall death brag thou wand'rest in his shade, -----Emacs: sonnet018 6:54am 0.23 (Text Fill)--Top----------------------- Commands: d, s, x, u; f, o, 1, 2, m, v; ~, %; q to quit; ? for help.
module stree { // interfaces defined here interface SearchTree; // the top-level construct interface Word; // a binary search tree node -- represents one word interface Document; // a document stored in the repository interface Cite; // a reference to a line in a document // A binary search tree of Word objects interface SearchTree { private: attribute ref<Word> root; // the root of the tree // Update the entry matching WORD to add a citation. // Add a new Word if necessary. void insert(in lref<char> word, in ref<Cite> cite); public: // Constructor: make an empty tree void initialize(); // Insert a new Document into the repository. The argument is a // pathname to be interpreted in the Unix name space as the name of a // Unix file containing the raw data. A new Document object with // the same base name is created in the current Shore directory, // filled with a copy of the file's context, and indexed by all of // its words. void insert(in lref<char> src); // Retrieve the Word object matching the argument. // Return NULL if not found. ref<Word> find(in lref<char> word) const; }; // There is one Word object for each distinct word appearing in any // document in the repository. interface Word { private: attribute string value; attribute ref<Word> left, right; public: relationship set<Cite> cited_by inverse cites; // Constructor: empty occurrences list void initialize(in lref<char> word); // How many occurrences? long count() const; // Get ith occurrence (returns NULL if not that many) ref<Cite> occurrence(in long i) const; // The following methods are meant to be used only by SearchTree. // Find decendant matching WORD creating one if necessary ref<Word> find_or_add(in lref<char> word); // Find only, return NULL on not found ref<Word> find(in lref<char> word) const; // Debugging dump void print(in long verbose) const; // Add an occurence void occurs_on(in ref<Cite> cite); }; // A Cite object represents a citation. There is one Cite object for each // line of each document in the repository. There is thus a many-many // relationship from Cite to Word and a many-one relationship from Cite // to Document. interface Cite { private: attribute long offset; public: relationship ref<Document> doc inverse cited_by; relationship set<Word> cites inverse cited_by; // Constructor void initialize(in ref<Document> d, in long o); // Print the referenced line void print(in long vflag) const; // Destructor void finalize(); }; // A Document is a chunk of text that looks like a Unix file. // We also record the file name under which it was created. // (The need to record the name may go away when Shore adds a way to find // the pathname of a registered object given a Ref to it.) interface Document { private: attribute text body; attribute string name; public: relationship set<Cite> cited_by inverse doc; // Constructor: The body is empty. void initialize(in lref<char> base_name); // Add some text to the end of the body. void append(in lref<char> str); // Read-only access to the file name. lref<char> get_name() const; // Current length of text long size() const; // Print a line starting at OFFSET void print_line(in long offset) const; // Destructor void finalize(); }; }
#include <iostream.h> #include <fstream.h> #include <std.h> #include "stree.h" REF(SearchTree) repository; REF(Pool) nodes; // Place to create new anonymous objects const char *DEMO_DIR = "search_trees"; char *argv0; int verbose; extern "C" int optind; enum OPERATION { OP_NONE, OP_ADD, OP_LIST, OP_DEL, OP_POOL_LIST } operation = OP_NONE; static void add_files(int argc, char *const*argc); static void list_files(char *str); static void delete_file(char *fname); static void pool_list(); void usage() { cerr << "usage:" << endl; cerr << "\t" << argv0 << " -a[v] fname [fname ...]" << endl; cerr << "\t" << argv0 << " -l[v] word" << endl; cerr << "\t" << argv0 << " -d fname" << endl; cerr << "\t" << argv0 << " -p" << endl; exit(1); } int main(int argc, char *const argv[]) { argv0 = argv[0]; int arg; shrc rc; // get command-line options int c; while ((c = getopt(argc,argv,"aldpvt")) != EOF) switch(c) { case 'a': operation = OP_ADD; break; case 'l': operation = OP_LIST; break; case 'd': operation = OP_DEL; break; case 'p': operation = OP_POOL_LIST; break; case 'v': verbose++; break; default: usage(); } if (operation == OP_NONE) usage(); // initialize connection to server SH_DO(Shore::init(argv0)); // Start a transaction for initialization SH_BEGIN_TRANSACTION(rc); if (rc) rc.fatal(); // this terminates the program with extreme prejudice // Check that our demo directory exists rc = Shore::chdir(DEMO_DIR); if (rc != RCOK) { if (rc.err_num() != SVAS_NotFound) SH_ABORT_TRANSACTION(rc); // Not found. Must be the first time through. // Create the directory SH_DO(Shore::mkdir(DEMO_DIR, 0755)) SH_DO(Shore::chdir(DEMO_DIR)) // Make a new SearchTree object ... repository = new("repository", 0644) SearchTree; repository.update()->initialize(); // ... and a pool for allocating Nodes. SH_DO(nodes.create_pool("pool", 0644, nodes)); } else { // not first time // Get the repository root from the database ... SH_DO(REF(SearchTree)::lookup("repository",repository)); // ... and the pool for creating nodes SH_DO(nodes.lookup("pool", nodes)); } SH_DO(SH_COMMIT_TRANSACTION); switch (operation) { case OP_ADD: add_files(argc-optind, argv+optind); break; case OP_LIST: if (optind != argc-1) usage(); list_files(argv[optind]); break; case OP_DEL: if (optind != argc-1) usage(); delete_file(argv[optind]); break; case OP_POOL_LIST: pool_list(); break; default: break; } return 0; } // main // Add all the named files to the repository static void add_files(int argc, char *const*argv) { shrc rc; SH_BEGIN_TRANSACTION(rc); if (rc) rc.fatal(); for (int i=0; i<argc; i++) repository.update()->insert(argv[i]); if (verbose) cout << "about to commit" << endl; SH_DO(SH_COMMIT_TRANSACTION); if (verbose) cout << "committed" << endl; } // add_files // List all uses of a word static void list_files(char *str) { shrc rc; SH_BEGIN_TRANSACTION(rc); if (rc) rc.fatal(); REF(Word) w = repository->find(str); if (verbose) cout << "========== " << str << endl; if (w && w->count() > 0) { REF(Cite) c; for (int i=0; c = w->occurrence(i); i++) c->print(verbose); } else if (verbose) cout << "**** Not found" << endl; SH_DO(SH_COMMIT_TRANSACTION); } // list_files // Removed the named file from the repository static void delete_file(char *fname) { shrc rc; SH_BEGIN_TRANSACTION(rc); if (rc) rc.fatal(); REF(Document) d; rc = d.lookup(fname,d); if (rc) cerr << rc; else { d.update()->finalize(); SH_DO(Shore::unlink(fname)); } SH_DO(SH_COMMIT_TRANSACTION); } // delete_file static void pool_list() { shrc rc; SH_BEGIN_TRANSACTION(rc); if (rc) rc.fatal(); REF(any) ref; REF(Word) w; REF(Cite) c; { PoolScan scan("pool"); if (scan != RCOK) SH_ABORT_TRANSACTION(scan.rc()); while (scan.next(ref, true) == RCOK) { if (w = TYPE_OBJECT(Word).isa(ref)) { w->print(1); } else if (c = TYPE_OBJECT(Cite).isa(ref)) { c->print(2); } else cout << " Unknown type of object" << endl; } } SH_DO(SH_COMMIT_TRANSACTION); } // pool_list
// Member functions of the SearchTree class #include <iostream.h> #include <string.h> #include <ctype.h> #include "stree.h" extern REF(Pool) nodes; static int getword(const char *&p, char *res, int size); extern int verbose; // defined in main.C void SearchTree::initialize() { root = NULL; } void SearchTree::insert(char *s, REF(Cite) c) { REF(Word) w; if (root) { w = root.update()->find_or_add(s); } else { root = new(nodes) Word; root.update()->initialize(s); w = root; } w.update()->occurs_on(c); } void SearchTree::insert(char *fname) { shrc rc; if (verbose) cout << "Indexing file " << fname << endl; // Open input file ifstream in(fname); if (!in) { perror(fname); SH_ABORT_TRANSACTION(rc); } // Create target document // Strip leading path from file name; char *base_name = strrchr(fname, '/'); if (base_name) base_name++; else base_name = fname; REF(Document) doc; rc = doc.new_persistent(base_name, 0644, doc); if (rc) { perror(base_name); SH_ABORT_TRANSACTION(rc); } doc.update()->initialize(base_name); // for each line of the document ... char linebuf[1024]; while (in.getline(linebuf, sizeof linebuf -1)) { long off = doc->size(); // copy the line to the body of the document doc.update()->append(linebuf); doc.update()->append("\n"); // allocate a new Cite object for this line REF(Cite) cite = new (nodes) Cite; cite.update()->initialize(doc, off); // for each word on the line ... char word[100]; const char *p = linebuf; while (getword(p, word, sizeof word)) { // link the citation to the word insert(word, cite); } while (getword(p, word, sizeof word)); } } REF(Word) SearchTree::find(char *str) const { if (root) return root->find(str); return NULL; } // Copy a word of at most SIZE characters (including terminating null) // in to the buffer starting at RES. Start searching at location P. // Words are delimited by white space. The result is translated to lower // case, with all non-letters eliminated. // P is updated to point to the first character not copied. // The result is 1 if a word is found, 0 if '\0' is encountered first. static int getword(const char *&p, char *res, int size) { for (;; ) { // skip leading white space while (isspace(*p)) p++; // check for eoln if (*p == 0) return 0; // gather non-space characters, translating to lower case and // ignoring non-alpha characters for (int len = 0; len < size-1 && *p && !isspace(*p); p++) { if (isupper(*p)) res[len++] = tolower(*p); else if (islower(*p)) res[len++] = *p; } if (len > 0) { res[len] = 0; return 1; } // otherwise, word was all digits and punctuation, so try again. } }
// Member functions of the Word class #include <iostream.h> #include <string.h> #include "stree.h" extern REF(Pool) nodes; void Word::initialize(char *word) { value = word; left = NULL; right = NULL; } long Word::count() const { return cited_by.get_size(); } REF(Cite) Word::occurrence(long i) const { return cited_by.get_elt(i); } REF(Word) Word::find_or_add(char *s) { int i = strcmp(s,value); if (i == 0) return this; if (i < 0) { if (left) return left.update()->find_or_add(s); else { left = new(nodes) Word; left.update()->initialize(s); return left; } } else { if (right) return right.update()->find_or_add(s); else { right = new(nodes) Word; right.update()->initialize(s); return right; } } } REF(Word) Word::find(char *s) const { int i = strcmp(s,value); if (i == 0) return this; if (i < 0) return left ? left->find(s) : (REF(Word))NULL; return right ? right->find(s) : (REF(Word))NULL; } void Word::occurs_on(REF(Cite) cite) { cited_by.add(cite); } void Word::print(long verbose) const { if (verbose) { int s = cited_by.get_size(); cout << "Word '" << (char *)value << "' occurs on " << s << " line" << (s==1 ? "" : "s") << endl; } else cout << (char *)value; }
// Member functions of the Cite class #include <iostream.h> #include "stree.h" void Cite::initialize(REF(Document) d, long o) { doc = d; offset = o; } void Cite::print(long v) const { switch (v) { default: case 0: // just the file name cout << doc->get_name() << endl; break; case 1: // the file name and the corresponding line cout << doc->get_name() << ": "; doc->print_line(offset); break; case 2: // debugging version cout << "Cite, offset " << offset << " in file " << doc->get_name() << " cites"; { int count = cites.get_size(); for (int i = 0; i < count; i++) { REF(Word) w = cites.get_elt(i); cout << " "; w->print(0); } cout << endl; } break; } } void Cite::finalize() { while (cites.delete_one()) {} }
// Member functions of the Document class #include <iostream.h> #include <string.h> #include "stree.h" void Document::initialize(char *base_name) { body = 0; name = base_name; } void Document::append(char *str) { body.set(str, body.blen(), ::strlen(str)); } char *Document::get_name() const { return name; } long Document::size() const { return body.strlen(); } void Document::print_line(long offset) const { char buf[100]; body.get(buf, offset, sizeof buf); buf[sizeof buf - 1] = 0; char *p = strchr(buf, '\n'); if (p) *++p = 0; cout << buf; } void Document::finalize() { REF(Cite) p; while (p = cited_by.delete_one()) { p.update()->finalize(); p.destroy(); } }
#define MODULE_CODE #include "stree.h"
#!/bin/sh mountpoint=/shoremnt stree=`pwd`/stree cd $mountpoint/search_trees wc `$stree -l $1 | sort -u`
#!/bin/sh mountpoint=/shoremnt files="`stree -l $1`" || exit 1 if [ -n "$files" ] then cd $mountpoint/search_trees emacs $files else echo $1 not found fi