Lesson 14: Data retrieval - The query tool

The difference between a query and a text search

Applications such as ATLAS/ti greatly simplify the coding of large amounts of qualitative data and the elaboration of semantic networks, which is very welcome. Nevertheless, you end up with analytic constructions of a size and complexity that can hardly be surveyed by simple visual inspection, let alone verified for consistency and reliability. This calls for an additional tool, often referred to as a "retrieval feature", because it is intended to retrieve a selection of the quotations you created, depending on indicators (codes) that you name. The main difference from a simple text search is that the retrieval feature does not look for formal indicators in the data itself; instead it retrieves all text sections that were assigned an external, formal indicator (the code).
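The distinction can be illustrated with a minimal Python sketch. It is not ATLAS/ti code; the sample quotations, the code names, and both function names are hypothetical and serve only to contrast the two kinds of lookup.

```python
# Hypothetical sample data: each quotation has a text and a set of assigned codes.
quotations = [
    {"id": 1, "text": "My pay barely covers the rent.", "codes": {"work-salary"}},
    {"id": 2, "text": "I enjoy the salary negotiations.", "codes": {"work-motivation"}},
    {"id": 3, "text": "The salary is fine.", "codes": {"work-salary"}},
]

def text_search(term):
    """Text search: look for the term inside the data itself."""
    return [q["id"] for q in quotations if term.lower() in q["text"].lower()]

def retrieve_by_code(code):
    """Retrieval: look only at the codes that were assigned to the quotations."""
    return [q["id"] for q in quotations if code in q["codes"]]

print(text_search("salary"))            # → [2, 3]
print(retrieve_by_code("work-salary"))  # → [1, 3]
```

Quotation 1 never mentions the word "salary" but is retrieved through its code, while quotation 2 contains the word but was coded differently.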

Within ATLAS/ti this tool is known as the "Query Tool", and from a user's perspective it is definitely the most complex tool of the application. A query typically consists of a number of more or less complex operator/operand combinations that specify the conditions quotations must match in order to be retrieved. Even outside the Query Tool, ATLAS/ti continuously carries out retrieval procedures: each double-click on a code in the code list results in a search for all quotations that were assigned that code. If there is a single hit, the quotation is immediately displayed in its context. If there is more than one hit, a list of quotations to choose from is presented. The Query Tool is dedicated to more complex search operations in which you want to look up more than a single code and search for certain logically definable combinations of codes. Let's take a look at the user interface of this tool, which is launched by clicking the appropriate button (the "telescope") on the main toolbar:

Query Tool: user-interface

You will notice a window divided into five sections. On the upper left is a list of code-families, and on the lower left is a list of codes. Both of these object types are the basic operands for searches. On the upper right are two panes, separated by a movable line, in which the query is displayed as an expression in different ways. On the lower right is the results window, which shows the retrieved quotations at every stage of the search (even while you are still building the search expression step by step). This arrangement is complemented by a number of toolbars and smaller button groups. Note in particular the toolbar with the grouped operators along the vertical left side. On the upper right is a group of six buttons that are needed when forming a search expression. Between the search-expression window and the results window are buttons for the so-called Supercode feature (see below) and for switching the display mode of the lower search-expression pane (Infix or Prefix mode). To the right of the results window are the icons for deleting and printing results that you already know from other contexts. Here you can also, besides accessing the help button, make the textbase selection (the process of pre-selecting the texts to be investigated) and update the window's display.

The query-language

The query language is a very powerful tool, but learning to use it will take some getting used to.

All queries are constructed using so-called "Reverse Polish Notation" (RPN). This may sound more difficult than it actually is. You will certainly not need to learn the Polish language (although it would do no harm). The most significant difference between this notation and the "infix" notation familiar from pocket calculators and various database query systems is that expressions are not nested inside brackets. Instead, expressions are formed incrementally, step by step: first the operands are entered (usually two; for a few operators such as NOT, UP and DOWN only one), and then the operator to be applied to them. Interestingly, each click or double-click on an operator or an operand already produces a result, which is immediately displayed in the results window. At the same time, it is virtually impossible to produce a syntactically incorrect query (the content of the query may still be meaningless), something that can easily happen when nesting expressions inside brackets.
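The principle of RPN can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the way ATLAS/ti is implemented: the table `CODED`, the set `ALL_QUOTATIONS`, and the function `evaluate_rpn` are hypothetical, and quotations are represented simply by numbers.

```python
# Hypothetical data: each code maps to the ids of the quotations assigned to it.
CODED = {
    "work-salary": {1, 2, 3},
    "work-motivation": {2, 3, 4},
    "family": {3, 5},
}
ALL_QUOTATIONS = {1, 2, 3, 4, 5, 6}

# Operators that consume two operands from the stack.
BINARY = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

def evaluate_rpn(tokens):
    """Evaluate a query in Reverse Polish Notation and return the hit set."""
    stack = []
    for token in tokens:
        if token in BINARY:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[token](left, right))
        elif token == "NOT":
            # NOT needs only one operand.
            stack.append(ALL_QUOTATIONS - stack.pop())
        else:
            # Anything else is treated as a code name (an operand).
            stack.append(set(CODED[token]))
    return stack[-1]

# Infix equivalent: (work-salary AND work-motivation) OR family
result = evaluate_rpn(["work-salary", "work-motivation", "AND", "family", "OR"])
print(sorted(result))  # → [2, 3, 5]
```

No brackets are needed: the order in which operands and operators are entered already determines how the expression is grouped.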

You are actually already used to the logic of this query language from mouse-controlled application interfaces, where the same procedure applies: an object is selected first, and then the action to be carried out on it (e.g. copying) is chosen.

Forming a query

Any query begins by choosing one, but usually two, operands from the code list or the code-family list. Each operand is displayed as soon as you click it. At the same time, the list of quotations assigned to it is displayed in the results window. As soon as you choose, from the operator list, the operator to be applied to the two operands, the operator appears in the upper right window as part of the complete query expression along with the selected operands. The results window then displays the result of the operation (with the Boolean AND operation, for example, the intersection of the quotation references of the two selected codes). The query results can not only be inspected immediately in the list; by double-clicking a particular quotation reference you can also view and edit the quotation in its context in the primary window.

Arrangement of the query-window

The upper query window is organized as a stack: each expression appears as one element (layer) of the stack, and the most recently selected expression is stored on top (for example, two clicked codes are stored there one above the other). As soon as a valid query expression has been formed by adding an appropriate operator, however, the entire expression is presented as a single layer of the stack (codes plus operator). This also makes clear that the expression can from now on be used as an operand itself (as is the case with most operators).

The search can now be extended step by step and tentatively steered in the right direction. The previous result provides the first operand for the continued query. By adding a second operand and a new operator to be applied to the two, you receive a new result. In this way the query can be continued and refined as often as necessary.
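The stack behaviour described above can be sketched as follows. The class `QueryStack`, its methods, and the sample codes A, B and C are hypothetical; the sketch only models how two operand layers collapse into a single layer that then serves as the operand of the next step.

```python
# Hypothetical data: quotation ids per code.
CODED = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 9}}

class QueryStack:
    """Each layer of the stack is an (expression, hits) pair."""

    def __init__(self):
        self.layers = []

    def top(self):
        return self.layers[-1]

    def push_code(self, name):
        # Clicking a code puts it on top and shows its quotations at once.
        self.layers.append((name, set(CODED[name])))
        return self.top()

    def apply_and(self):
        # The operator consumes the two topmost layers and leaves one layer.
        right_expr, right_hits = self.layers.pop()
        left_expr, left_hits = self.layers.pop()
        combined = (f"({left_expr} AND {right_expr})", left_hits & right_hits)
        self.layers.append(combined)
        return self.top()

query = QueryStack()
query.push_code("A")
query.push_code("B")
print(len(query.layers))   # two separate layers
print(query.apply_and())   # one layer holding the whole expression
query.push_code("C")       # the previous result is now the first operand
print(query.apply_and())
```

Every call returns the current topmost layer, which mirrors the way the results window is refreshed after each click.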

If an operation has produced an unsatisfactory result and you would like to take it back, an "undo" button on the upper right undoes the previous operation without altering the rest of the search expression. If you then realize that the operation was not a mistake after all, you can restore it by using "redo".

Operators

Three different types of operators are offered. They are located in the left, vertical toolbar within the query-tool. The arrangement from top to bottom is as follows:

Four Boolean operators
allow a mathematical-logical combination of search terms.
Three semantic operators
allow you to examine the network structure of the relations between the codes. Such operators are also known as Thesaurus operators.
Six "Proximity" operators
are intended to examine the spatial relation between text segments. You can search for embeddedness, overlapping, and co-occurrence of the text segments of interest.

There is also "ToolTip" help that returns information on a particular operator. Especially with the proximity operators, the question always arises in what order the operands are passed to the operator. The following is a brief overview of the individual operators.

Boolean operators

The following Boolean operators are available: OR, XOR, AND, and NOT. As binary operators, the first three each require two operands. Only the NOT operation requires a single operand. The operands themselves, however, may already have a very complex structure (for example, if an operand is a code-family, a "Supercode" (see below), or a previously composed query result).

OR
The simple OR operation retrieves all quotations that are coded with at least one of the operands, including those that are linked to more than one of the operands in question. This operation typically produces a very large output, but the output is correspondingly less precise.
XOR
While the OR operation can basically be translated as "at least one of", the XOR function, which is not commonly found in query systems, is closer to the everyday use of the word "or". XOR retrieves all text passages that are coded with either one operand or the other. A passage carrying both codes will not be displayed, because XOR deals with the "either-or" case.
AND
The AND operation retrieves only those quotations that match all the conditions specified by the operands. It is therefore highly selective and usually produces a rather small output.
NOT
The NOT operator tests for the absence of a condition. The results are the quotations that are not coded with the code in question.

When a code-family is used within the query tool, think of it as one large OR operation over all the codes assigned to it. Since code-families are not mutually exclusive (a code may be contained in several code-families at the same time), using them as part of complex retrieval procedures may lead to semantic nonsense. It is already logically senseless to carry out an XOR operation with two code-families as operands as soon as even a single code exists in both families.
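In terms of sets, the four Boolean operators and the treatment of code-families can be sketched as follows. The code names, the quotation ids, and the helper `family_hits` are hypothetical; the sketch assumes that every quotation is identified by a number.

```python
# Hypothetical data: quotation ids per code, plus the set of all quotations.
quotations_by_code = {
    "salary": {1, 2},
    "bonus": {2, 3},
    "pension": {4},
}
all_quotations = {1, 2, 3, 4, 5}

a = quotations_by_code["salary"]
b = quotations_by_code["bonus"]
print(sorted(a | b))               # OR:  → [1, 2, 3]
print(sorted(a ^ b))               # XOR: → [1, 3]
print(sorted(a & b))               # AND: → [2]
print(sorted(all_quotations - a))  # NOT salary: → [3, 4, 5]

def family_hits(member_codes):
    """A code-family behaves like one large OR over its member codes."""
    hits = set()
    for code in member_codes:
        hits |= quotations_by_code[code]
    return hits

# Two families that share the code "bonus".
income = family_hits(["salary", "bonus"])
benefits = family_hits(["bonus", "pension"])
print(sorted(income ^ benefits))   # → [1, 4]
```

Quotation 3 carries only the single code "bonus", yet the XOR of the two families drops it, because "bonus" is a member of both families. This is the kind of semantic nonsense the paragraph above warns about.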

Semantic- or Thesaurus-operators

Operators of the semantic type make partial use of the hierarchical structure of the semantic network that you constructed during your earlier, tentative theory-building work.

DOWN
The DOWN operator searches the network from conceptually higher-level codes down to the basic ones and collects all quotations that are found along the way. Only transitive relations between codes (such as "is a" or "is part of") are followed; all other relations are ignored. The DOWN operator usually achieves a relatively high hit rate. Since the linked codes are backed by a theory developed from the data, the result is generally more accurate than that of the OR operator described above.
UP
The UP operator retrieves all quotations that were assigned either to the selected code or to one of its higher-level codes. Please note: unlike a classic one-dimensional thesaurus structure (the ideal type of a classification system), a code in a semantic network may have several higher-level terms (depending, for example, on the dimension the assignment relates to: "work-salary" may be part of "work-motivation", but also part of "types of income").
SIBlings
The SIBlings operator retrieves all quotations that are coded with the selected code or with any other code that shares a higher-level code with it. Let's stick with the work-salary example: the SIBlings function would retrieve all quotations for the various types of income and for the various types of work-motivation. There are certainly more semantically useful queries, but this case simply serves to explain the principle.

IMPORTANT: For understandable reasons, you may use only codes as operands for such semantic operations, not code-families, Supercodes, or previous search results. Otherwise the result would be relationships that cannot be clearly determined.
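The three semantic operators can be sketched as walks over a small code network. Everything here is a hypothetical illustration: the dictionaries `PARENTS` and `QUOTES` and all function names are invented, only transitive links are modelled, and the sketch assumes that UP looks at the directly linked higher-level codes only.

```python
# Hypothetical network: each code maps to its higher-level codes
# (only transitive links such as "is a" or "is part of" are modelled).
PARENTS = {
    "work-salary": {"work-motivation", "types of income"},
    "recognition": {"work-motivation"},
    "pension": {"types of income"},
}
# Hypothetical quotation ids per code.
QUOTES = {
    "work-salary": {1},
    "recognition": {2},
    "pension": {3},
    "work-motivation": {4},
    "types of income": {5},
}

def children_of(code):
    return {child for child, parents in PARENTS.items() if code in parents}

def hits(codes):
    """Union of the quotations of all given codes."""
    result = set()
    for code in codes:
        result |= QUOTES.get(code, set())
    return result

def down(code):
    """The code itself plus all lower-level codes, followed transitively."""
    seen, todo = set(), [code]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(children_of(current))
    return hits(seen)

def up(code):
    """The code itself plus its higher-level codes (assumed: direct ones only)."""
    return hits({code} | PARENTS.get(code, set()))

def siblings(code):
    """The code itself plus every code sharing a higher-level code with it."""
    related = {code}
    for parent in PARENTS.get(code, set()):
        related |= children_of(parent)
    return hits(related)

print(sorted(down("work-motivation")))  # → [1, 2, 4]
print(sorted(up("work-salary")))        # → [1, 4, 5]
print(sorted(siblings("work-salary")))  # → [1, 2, 3]
```

Because "work-salary" has two higher-level codes, SIBlings collects quotations from both branches, as in the example in the text.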

Proximity-operators

Operators of this type relate to the relative nearness or distance of two quotations to each other and therefore require exactly two operands for a valid query. Since proximity operators are not commutative, the order in which the operands are entered plays an important role. While "A or B" is equivalent to "B or A", "A follows B" is naturally different from "B follows A". As a rule of thumb, always enter the operands in the order of the natural-language form: "A follows B" requires that you click A first and then B before the operator is finally appended. Because they are not commutative, the proximity operators always exist in two versions (A to B and B to A).

Embedding operators
look up quotations in which one quotation contains or encloses the other, each being coded with one of the specified codes.
WITHIN
finds all quotations that are coded with A and are enclosed by a quotation coded with B (A WITHIN B).
ENCLOSES
in contrast locates all quotations coded with A that enclose a quotation coded with B (A ENCLOSES B).
Overlapping operators
search for quotations that overlap each other.
OVERLAPPED_BY
finds all passages in which the first named operand (A) is overlapped by the second named operand (B) (A OVERLAPPED_BY B).
OVERLAPS
by contrast looks up the opposite, that is, passages in which the first named operand (or rather the quotations coded with it) overlaps the second operand (A OVERLAPS B).
Sequence operators
search on two criteria: first, a particular order in which the quotations coded with operand A and operand B follow one another; and second, a certain maximum "distance" between the quotations (expressed as a number of lines that you specify).
FOLLOWS
finds quotations that are coded with A and follow a passage coded with B (A FOLLOWS B).
PRECEDES
by contrast looks up all quotations coded with A that precede quotations coded with B (A PRECEDES B).
IMPORTANT: The number of lines as a distance indicator is of course only an expedient. There is no methodological argument for why one specific distance is still significant while another no longer is, nor is a measurement in lines a stable indicator across different types of text and formatting. It is therefore only an imprecise heuristic tool, and you have to decide from your own analytic work how to deploy it usefully.
CO-OCCURS
finally is the only operator of its kind that exists in a single version only, because its less specific purpose is merely to determine whether the quotations of two specified codes overlap in either order or enclose one another. Therefore "A CO-OCCURRING WITH B" is semantically identical to "B CO-OCCURRING WITH A". As a result, this operator returns the combined hits of the four overlapping and embedding operators.
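The proximity operators can be sketched as predicates on line ranges. This is a simplified model, not the ATLAS/ti implementation: a quotation is represented as a hypothetical (first line, last line) pair, the function names and sample ranges are invented, and the exact boundary rules (for example, that A OVERLAPS B means A starts before B and ends inside it) are assumptions. The default distance of 5 lines matches the default mentioned for the "Distance in lines" setting.

```python
# A quotation is modelled as (first_line, last_line), both inclusive.

def within(a, b):
    """A lies completely inside B."""
    return b[0] <= a[0] and a[1] <= b[1]

def encloses(a, b):
    """A completely contains B."""
    return within(b, a)

def overlaps(a, b):
    """Assumption: A starts before B and ends inside B."""
    return a[0] < b[0] <= a[1] < b[1]

def overlapped_by(a, b):
    """The mirror image of overlaps."""
    return overlaps(b, a)

def follows(a, b, max_lines=5):
    """A starts after B ends, at most max_lines later."""
    return 0 < a[0] - b[1] <= max_lines

def precedes(a, b, max_lines=5):
    """A ends before B starts, at most max_lines earlier."""
    return follows(b, a, max_lines)

def co_occurs(a, b):
    """Union of the four embedding and overlapping relations."""
    return within(a, b) or encloses(a, b) or overlaps(a, b) or overlapped_by(a, b)

def retrieve(op, a_quotes, b_quotes):
    """Quotations of A that stand in relation op to at least one quotation of B."""
    return [a for a in a_quotes if any(op(a, b) for b in b_quotes)]

# Hypothetical quotations coded with A and with B.
A = [(1, 4), (10, 12), (20, 30), (40, 41)]
B = [(3, 8), (9, 15), (22, 25), (36, 38)]

print(retrieve(within, A, B))     # → [(10, 12)]
print(retrieve(encloses, A, B))   # → [(20, 30)]
print(retrieve(overlaps, A, B))   # → [(1, 4)]
print(retrieve(precedes, A, B))   # → [(1, 4)]
print(retrieve(co_occurs, A, B))  # → [(1, 4), (10, 12), (20, 30)]
```

Swapping the arguments changes the result for every operator except `co_occurs`, which is why the order of operand entry matters.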

Additional functions

Besides the operators described above, the query tool offers a number of additional functions. Some of them work much like the keys of a pocket calculator and make it more convenient to form a query:

  • C deletes the entire stack of expressions (in the upper right query window)
  • S swaps the two topmost elements of the stack (especially useful whenever you have entered operands in the wrong order)
  • P copies the topmost expression of the stack and stores it once again on top; especially useful if you want to reuse a complex expression without re-entering it "click by click".
  • Recalc recalculates the query result without starting a new query from scratch, in case new quotations were created for a code while it was being used in an active query session.
  • Undo removes the topmost expression of the stack from the query.
  • Redo reverses the undo function by putting the most recently removed expression back onto the stack.
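The calculator-like functions listed above can be sketched as operations on a simple stack. The class `ExpressionStack` and its method names are hypothetical, and expressions are represented as plain strings; Recalc is left out because it concerns the data rather than the stack.

```python
class ExpressionStack:
    """Minimal model of the expression stack; the top is the end of the list."""

    def __init__(self):
        self.items = []
        self.undone = []  # expressions removed by undo, kept for redo

    def push(self, expression):
        self.items.append(expression)

    def clear(self):
        """C: delete the entire stack."""
        self.items.clear()
        self.undone.clear()

    def swap(self):
        """S: exchange the two topmost elements."""
        self.items[-1], self.items[-2] = self.items[-2], self.items[-1]

    def duplicate(self):
        """P: copy the topmost expression and store it on top again."""
        self.items.append(self.items[-1])

    def undo(self):
        """Undo: remove the topmost expression."""
        self.undone.append(self.items.pop())

    def redo(self):
        """Redo: put the most recently removed expression back."""
        self.items.append(self.undone.pop())

stack = ExpressionStack()
stack.push("A")
stack.push("B")
stack.swap()
print(stack.items)  # → ['B', 'A']
stack.duplicate()
print(stack.items)  # → ['B', 'A', 'A']
stack.undo()
print(stack.items)  # → ['B', 'A']
stack.redo()
print(stack.items)  # → ['B', 'A', 'A']
```

Swapping is mainly of interest before a non-commutative operator such as FOLLOWS is applied, since the order of the two topmost operands then decides the result.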
Additionally the following functionality within the query tool window is provided through buttons:
Super-Code
allows the creation of so-called Supercodes (see below for further details)
Prefix Display
The lower part of the two-pane query window displays the search expression in either Prefix or Infix style; Infix is the default setting. The button lets you toggle between the two display modes.
Distance in lines
specifies the number of lines used by the FOLLOWS and PRECEDES operators as distance-criteria (default: 5 lines).
Refresh
allows you to update the code list and the code-family list within the query tool in case you have made changes to the HU (Hermeneutic Unit) in the meantime.
The "Textbase Selection" button
finally opens another, similarly structured window that lets you specify which primary texts or primary-text families to include in the search. Once you have sorted the various types of data (interviews, observation protocols, documents, etc.) into different PD-families, you can restrict the search to part of the available data types (for example, only interview and observation protocols but no documents). In the same way you can select the data of a specific case for your search.

Supercodes

What exactly is behind the repeatedly mentioned "Supercode" function? It is the ability to store an entire query as a dynamic code. What does this mean? A Supercode does not save a particular query result (the hit list for later use); instead, the entire query itself becomes a code that searches for hits on its own during the subsequent analytic work. This works as follows: after forming a satisfactory query which, in your opinion, accurately reflects a certain aspect of the evolving theory (Thomas Muhr characterizes this as a "frozen hypothesis"), you can create a new kind of code that is listed in the code list and can also be used within the network editor. Clicking the Supercode button and entering a Supercode name generates this code. The search expression behind the Supercode is stored as a code comment and is therefore easy to view. Whenever, during your analytic work, you code with one of the codes used as an operand in the Supercode, or relate it to other codes, the "result" of the Supercode query changes accordingly. You will notice this effect not only when relaunching the query tool but already when you simply activate the Supercode in the list. This is because the query behind the Supercode is carried out each time the Supercode is activated.
However, you must be careful when using Supercodes, because you can easily produce a loop, for example a Supercode containing a code-family that in turn contains the same Supercode. It also makes little sense to code data directly with Supercodes. The ability to code directly with Supercodes has therefore been disabled as of version 4.1 (build 51).
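The idea of a stored, dynamically evaluated query, and the danger of loops, can be sketched as follows. This is a hypothetical model, not ATLAS/ti's internal mechanism: the dictionaries `codings`, `families` and `supercodes`, the function `evaluate`, and all names in them are invented, and a Supercode is assumed to be stored as a list of tokens in Reverse Polish Notation.

```python
# Hypothetical project data.
codings = {"salary": {1, 2}, "motivation": {2, 3}}
families = {}
# A Supercode stores the query itself (in RPN), not its result.
supercodes = {"paid-and-motivated": ["salary", "motivation", "AND"]}

def evaluate(name, active=()):
    """Return the current hits of a code, code-family, or Supercode."""
    if name in active:
        # The same object is already being evaluated further up: a loop.
        raise ValueError("loop: " + " -> ".join(active + (name,)))
    path = active + (name,)
    if name in families:
        # A code-family acts as a large OR over its members.
        hits = set()
        for member in families[name]:
            hits |= evaluate(member, path)
        return hits
    if name in supercodes:
        stack = []
        for token in supercodes[name]:
            if token in ("AND", "OR"):
                right, left = stack.pop(), stack.pop()
                stack.append(left & right if token == "AND" else left | right)
            else:
                stack.append(evaluate(token, path))
        return stack[-1]
    return set(codings[name])

before = evaluate("paid-and-motivated")
codings["salary"].add(3)            # further coding work changes the data
after = evaluate("paid-and-motivated")
print(sorted(before), sorted(after))  # → [2] [2, 3]

# A Supercode that uses a family which in turn contains the Supercode.
supercodes["circular"] = ["pay-family", "motivation", "AND"]
families["pay-family"] = ["salary", "circular"]
try:
    evaluate("circular")
except ValueError as error:
    print(error)  # → loop: circular -> pay-family -> circular
```

Because the query is re-run on every call, the Supercode reflects new codings without being redefined, and the `active` path is what allows a circular definition to be detected instead of recursing without end.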