Saturday, April 23, 2005
How to showing the equivalent Tandem (NSK) process id of the OSS ?process?
jojo@\rat:~ $ ps -l
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME
CMD
000 0002 5300 318308420 1 - 175 - 0,347 - - #ptcaamr
00:00 /bin/-sh
000 2002 5300 234422338 318308420 - 175 - 0,491 - - #ptcaamr
00:00 ps
You could also use the '-o addr' option:
jojo@\rat:~ $ ps -o addr,command
ADDR COMMAND
0,347 /bin/-sh
0,410 ps -o addr,command
Binder Timestamp Versus SQL Compile Time
We found this information after much experimentation. The last modified time of a SQL program shows when it was last SQL-compiled - it does not reflect when the source language program was actually compiled. (The opposite is also true: the Binder timestamp does not reflect when the SQL-compile was performed but this information is much less important.) We ran into some problems where the wrong version of a program was used, but it was not possible to tell this from the last modified time information. Using the Binder you can obtain the date and time of both the source language compilation and the SQL-compilation.
To show the source language compilation time, issue the following command:
BIND SHOW SET TIMESTAMP FROM object
To determine the SQL-compilation time, use the following undocumented option of the Binder's INFO INCLUDE command:
BIND FILE object; INFO INCLUDE SQL_OBJ SQL-OBJECT, DETAIL
Try this in tandem oss -
Copy and paste the following and check it out for yourself..
echo ' 82 43/25 43+65P80P82P73P76P32P70P79P79P76P10P ' dc
Friday, March 18, 2005
TAL Stack Issues
In TAL, you need to define a stack large enough to hold the SQL statements. To solve the problem, use a stack size of 32 (the full lower half of the stack). We did not try to optimize the stack size, so it may be possible to run with a smaller stack size. This problem is unique to TAL programs; the COBOL and C compilers handle the stack sizing for you.
Friday, March 11, 2005
Howzz HP Tandem Cobol is different from Other standard Cobol?
Feature 1:-
------------
Receive Control paragraph is in input-output section of environmental division.
RECEIVE-CONTROL : -
To define the two tables used by $RECEIVE, include a RECEIVE-CONTROL paragraph in the Environment Division of your server process.
In the CRE, the RECEIVE-CONTROL paragraph used for a run unit is the one for the program that first opens $RECEIVE.
In the non-CRE environment, the RECEIVE-CONTROL paragraph used for a run unit is the one in the main program.
Here's the Syntax
.RECEIVE-CONTROL
EXTERNAL .
TABLE OCCURS phrase
SYNCDEPTH LIMIT phrase
REPLY CONTAINS phrase
ERROR CODE phrase
MESSAGE SOURCE phrase
REPORT phrase
Receive-Control Table: -
The receive-control table, an internal table, is required for $RECEIVE operation. Its purpose is to identify, by the PROCESS-ID, which requesting processes have opened the server process.
---------------------------------
Receive-Control Table Example
----------------------------------
Entry Number Requesting Process
1 REQUESTER PROCESS 1
2 REQUESTER PROCESS 2
3 REQUESTER PROCESS 3
……
100* REQUESTER PROCESS 100
--------------------------------
* The number of entries in the receive-control table is defined by table-length.
Reply TABLE: -
When reply messages are sent back to the requesting processes, the COBOL compiler constructs a second internal table in which to save the replies. The reply message includes the sync-idand the contents of the reply. In the COBOL85 environment, if the RECEIVE-CONTROL paragraph’s TABLE OCCURS phrase includes EXTENDED-STORAGE, the reply table is in the Extended-Storage Section.
Thursday, March 10, 2005
Hidden SQL/MP Features
Example:
>>select $zznorm on;
*** ERROR from SQL [-2014]: The SQL compiler was called in test mode.
This *** error is used to exit the compiler without further SQL processing.
>>select * from auditlog
>>where user_id BETWEEN "A" AND "ZZZZZZZZ"
>>browse access;
SELECT TIME_STAMP , USER_ID , MENU_ACTION
FROM \TDMDEV.$TST03.SQLSUBV.AUDITLOG
WHERE ( USER_ID >= "A" ) AND ( USER_ID <= "ZZZZZZZZ" ) FOR BROWSE ACCESS --- 0 row(s) selected. >>
2) add define =_sql_cmp_explain_cat, class map, file $wrk0.wietse.k;
After the defines was added the explain plan will display more information. It will return all the catalog information the compiler has used for its descission.
This information includes:-
Primary key information
- All index information-
Uec, high and low values for each column-
And much more
Sunday, March 06, 2005
SOAP - An Overview
SOAP is a lightweight protocol for the exchange of information in a decentralized, distributed environment.
A SOAP message is a transmission of information from a sender to a receiver.
SOAP messages are combined to perform request/response patterns.
SOAP is transport protocol independent.
SOAP can be run over existing Internet infrastructure (for example SOAP over HTTP).
SOAP enables the binding and usage of discovered Web services by defining a message path for routing messages.
SOAP is used to query UDDI for Web services.
SOAP is an XML-based protocol that defines three parts to every message:
Envelope: The envelope defines a framework for describing what is in a message and how to process it. A SOAP message is an envelope containing zero or more headers and exactly one body. The envelope is the top element of the XML document, providing a container for control information, the address of a message, and the message itself. Headers transport any control information such as quality-of-service attributes. The body contains the message identification and its parameters. Both the headers and the body are child elements of the envelope.
Encoding rules: The set of encoding rules expresses instances of application-defined data types. Encoding rules define a serialization mechanism that can be used to exchange instances of application-defined data types. SOAP defines a programming language-independent data type scheme based on XSD plus encoding rules for all data types defined according to this model.
Communication styles: Communications can follow a remote procedure call (RPC) or message-oriented (Document) format.
Communication styles:-
SOAP supports two different communication styles:
Remote procedure call (RPC). Invocation of an operation returning a result.
Message-Oriented. Also known as document-oriented or document style. This style provides a lower layer of abstraction, and requires more programming work. The request parameter is any XML document, the response can be anything or nothing.
Saturday, March 05, 2005
Pointers in COBOL85
>> In my program, I was getting structures from the MEASURE subsystem.
> Since each NSK version and MEASURE version has different structure lengths, I used pointers to line up properly.
>> In the WORKING-STORAGE section
I have the following:
>> 01 CPU-PNTR USAGE POINTER VALUE IS NULL.
> 01 DISC-PNTR USAGE POINTER VALUE IS NULL.
> 01 DISCOPEN-PNTR USAGE POINTER VALUE IS NULL.
> 01 PROCESS-PNTR USAGE POINTER VALUE IS NULL.
>> In the actual code:
>> SET CPU-PNTR TO ADDRESS OF MEASREAD-BUFFER.
>> where MEASREAD-BUFFER is the buffer returned by MEASREAD.
>> Also,
>> SET ADDRESS OF CPU-DEFINITION TO CPU-PNTR.
> SET CPU-PNTR UP BY CPU-ENTITY-LENGTH OF VERSION-TABLE> (VERSION-INDEX).
>> where CPU-DEFINITION is a structure of the CPU entity.
> CPU-ENTITY-LENGTH is the length of the cpu-buffer as returned by> MEASREADCONF.
This length varies from version to version.
>> I also used pointers to check if certain fields were present in the> structure.
>> IF (ADDRESS OF CPU-SUBTYPE OF CPU-DEFINITION <>
MOVE CPU-SUBTYPE OF CPU-DEFINITION TO CPU-POWER-SEARCH-SUBTYPE> ELSE> MOVE ZERO TO CPU-POWER-SEARCH-SUBTYPE>
END-IF.>>
NOTE: CPU-PNTR was set earlier to point to the next structure element.
> This IF verifies if the current MEASURE version has the SUBTYPE element> or not.
History of Tandem Computers
Tandem Computers was an early manufacturer of fault tolerant computer systems, marketed to the growing number of transaction processing customers who used them for ATMs, banks, stock exchanges and other similar needs. Tandem systems used a number of redundant processors and storage devices to provide high-speed "failover" in the case of a hardware failure, an architecture that they called NonStop. Over the two decades from the 1970s into the mid-90s, Tandem systems evolved into software-only solutions running on other platforms. The company was eventually purchased by Compaq in 1997 in order to provide that company with more robust server offerings. Today their software is still known as NonStop, as a separate product line offered by Hewlett-Packard.
History
Tandem Computers was founded in 1974 by a group of engineers from Hewlett-Packard, led by James Treybig. Their business plan called for systems that were proof from "single point failures" that were only slightly more expensive than competing non-fault tolerant systems. Tandem considered this to be very important to their business model, as customers invariably developed procedural solutions to downtime when the price was too high.
Design of their "NonStop I" system was complete in 1975, and the first example was sold to Citibank in 1976. The NonStop consisted of between 2 and 16 processor modules, each capable of about 0.7 MIPS with their own memory, I/O controllers, and dual connections to their custom inter-CPU computer bus, Dynabus. The modules were constructed so that failure would always leave at least one of the busses (both I/O and Dynabus), free for use by the other modules. The CPU's themselves were fairly simple. The basic design was patterned on the HP3000 CPU, a 16-bit stack-based machine with 32-bit addressing. In reality the full 32-bit address space could not be addressed, due to the use of a number of bits acting as status flags. Like the HP3000, the NonStop CPU added a number of registers for fast access, in this case programmer-specified global variables.
The "NonStop I" ran a custom operating system called Guardian that was key to the system's failover modes. A number of other companies had introduced failover that operated by restarting programs on other CPU's, but in Guardian all operations used message passing and were checkpointed for every operaton. That is, Guardian could restart from any instruction in the program, a key feature that the stack-based processor made fairly easy to construct because it had little "state" to move from machine to machine. All instructions consisted of taking data from the stack and putting it back on when it completed, and if the later failed the stack could be copied to another processor and restarted at that instruction.
While conventional systems of the era, including mainframes, had failure rates on the order of a few days, the NonStop system was designed to fail 100 times less, with "uptimes" measured in years. Nevertheless the NonStop was deliberately designed to be price-competitive with conventional systems, with a simple 2-CPU system priced at just over two times that of a competing single-processor mainframe, as opposed to four or more times of most competing solutions.
"NonStop I" was followed by the "NonStop II" in 1981, a slight improvement in speed to 0.8 MIPS, but a more measurable upgrade in memory from a maximum of 384kB per CPU in the I, to 2MB in the II, and the addition of a complete virtual memory system allowing for considerably larger address spaces. The same basic system, including the physical packaging, was used in 1983's NonStop TXP system that over doubled the speed to 2.0 MIPS, and the physical memory to 8MB. In all of these machines the same Dynabus system was used, which had been overdesigned in the NonStop I so they could avoid changing it in the future.
Introduced along with the TXP was a new fibre optic bus system, FOX. FOX allowed a number of TXP and NonStop II systems to be connected together to form a larger system with up to 14 nodes. Like the CPU modules within the computers, Guardian could failover entire task sets to other machines in the network.
In 1986 a major upgrade to the system was introduced, the NonStop VLX. VLX used a new Dynabus, increasing speed from 13MBps to 40MBps (total, 20MBps per independent bus). They also introduced FOX II, increasing the size of the networks from 1Km to 4Km. Using the original FOX VLX systems could be used with the older NonStop II and TPX's, but these systems were not supported on FOX II.
VLX was partnered with the NonStop CLX, a minicomputer sized machine for smaller installations. The CLX had roughly the same performance as the earlier TXP, but was much smaller and less expensive. By the end of it's lifetime the CLX had increased in speed considerably, and competed with the VLX, 1991's CLX 800 was only about 20% slower than the VLX, with the main difference being more limited expansion abilities.
In 1986 Tandem also introduced the first fault-tolerant SQL database, NonStop SQL. Developed on the famous Ingres code base, NonStop SQL added a number of features based on Guardian to ensure data validity across nodes. NonStop SQL was famous for scaling linearily in performance with the number of nodes added to the system, whereas most databases of the era had performance that plateaued quite quickly, often after two CPUs. A later version released in 1989 added transactions that could be spread over nodes, a feature that remained unique for some time.
The NonStop Cyclone was introduced in 1989, introducing a new superscalar CPU design. It was otherwise similar to earlier systems, although much faster. In general terms the Cyclone was about four times as fast as the CLX 800, which Tandem used as their benchmark. On the downside the new CPU was complex and expensive, requiring four circuit boards to implement a single CPU.
In 1991 followed this with RISC-implementations of Guardian, running on MIPS R3000-based CPU modules in the Cyclone/R and CLX/R. Programs written for the earlier stack-based CPU design were automatically translated on the fly into R3000 code in an interpreter, although they ran considerably slower than on earlier machines. Tandem also provided a number of tools to easily port existing object code to the new systems, resulting in code that was some 25% slower than the original Cyclone. Source code compilers were also available. While slower, the new system was considerably less expensive, and it was clear that RISC performance was outpacing CISC. By making the move when they did, they were banking on increases in MIPS performance quickly wiping out any performance disadvantages the system had at the time. Using the same basic hardware, Tandem also shipped NonStop Integrity, replacing Guardian with a modified Unix.
In 1993 Tandem introduced the NonStop Himalaya, also known as the S-Series. The Himalaya was the first system that changed the underlying architechture of the NonStop system, basing both the I/O and inter-CPU busses on their new ServerNet system. Whereas Dynabus and FOX linked the CPU's together into a ring, ServerNet was a true peer-to-peer network replacing both, and ran at much higher speeds. Another addition was the use of "lockstep processors"; each processor in the system had two MIPS CPUs running the same code, and if the results coming out ever disagreed, the processor was considered to be faulting and instantly stopped. At that point Guardian would move that task to another processor as in earlier systems, with lockstep guarenteeing that bad data was never written out.
Tandem was acquired by Compaq in 1997. In an ironic full-circle, Compaq was later acquired by HP in 2002, bringing Tandem back to its original roots. As of 2003, the NonStop product line continues to be produced, under the HP name.
The early years of Tandem
Treybig followed Perkins, and became Kleiner Perkins' fifth employee, spending two years there working on the Tandem idea. At HP, Treybig saw customers lash two HP servers together so one would take over if the primary system failed. These customers would replace the standard operating system with custom-built software, causing all sorts of support problems. Treybig thought such a setup could be created with standard computer hardware and software, replacing these exotic one-off systems.
"You could build an operating system that wouldn't fail and would be standard software," Treybig said. But to make the idea feasible, the systems had to be designed with the goliath input-output capabilities of mainframe computers, he said.
Treybig launched Tandem in 1974, with half the first 18 employees coming from HP. He also hired three gurus to design the system. Mike Green had also been at HP, where he had created its first time-sharing computer, a system that could work on several jobs simultaneously by quickly switching between different tasks. Jim Katzman, too, had been at HP but had become head of mainframe maker Amdahl's input-output work. And Davie Mackie had been working on custom systems for discount retailers in Europe.
"The three of them put together the architecture of the Tandem. There's still nothing like it," Treybig said.
Indeed, Illuminata's Eunice said, Tandem succeeded at an extremely difficult task, one nobody has yet been able to do as well. Tandem's technology can simultaneously distribute a gigantic database across numerous independent processors while supporting high volumes of transactions.
"Tandem's the only one who whacked at vastly parallel databases and was able to do online transaction processing at the same time," Eunice said.
But Tandem couldn't sustain its early growth. By the mid-1990s, revenue had flattened out at about $500 million a quarter, and Tandem had undergone rounds of layoffs, cutting hundreds from a staff that had grown to 9,500.
"In the early days," consultant Bonhiver said, "Tandem had much better growth because there were so many different kinds of (computer systems) out there, it didn't stand out as this weird, niche platform. Now everything is all Sun or (Windows) NT and some mainframe."
Hewlett-Packard never offered to buy Tandem, but it's clear HP was always in the back of Treybig's mind. In fact, recurring visions of HP would come to Treybig in his sleep.
"I used to dream that we failed, and (HP) put me in a cage in the front entryway," Treybig said.
"It was my nightmare."
Treybig initially proposed a buyout to Compaq, thinking Tandem couldn't survive on its own and that Compaq's thrust toward services and direct sales matched Tandem well. But Compaq wasn't interested at the time, and Treybig left Tandem in 1996, the year before Compaq changed its mind. Treybig now is a partner at Austin Ventures in Texas.
Tuesday, February 22, 2005
Working with SQL/MX EXPLAIN
Step One – Using Defaults
The SQL/MX Query guide describes several default values that can be used to address more global problems with the optimization of queries. For example, if a user experiences a general problem with too many random accesses into tables with nested joins or insert/selects, the defaults PREFERRED_PROBING_ORDER_ FOR_NESTED_JOIN and UPD_ORDERED can be used to produce more plans that lead to a more sequential access. The most simple change, however, is to set the OPTIMIZATION_LEVEL default to MAXIMUM. For larger joins (greater than five tables) this allows the optimizer to look at all potential plans. For all following steps it is advisable to leave the optimization level at MAXIMUM. Otherwise the optimizer may fail to produce the desired plan because that plan was not found within the optimizer’s time limit. Other joins that switch off compiler heuristics are CROSS_PRODUCT_CONTROL, ZIG_ZAG_JOINS, and DATA_FLOW_OPTIMIZATION. When tuning a query plan it is best to start by leaving the optimizer maximal freedom (once the desired plan is found one can always limit that freedom again).
control query default OPTIMIZATION_LEVEL ‘MAXIMUM’;control query default DATA_FLOW_OPTIMIZATION ‘OFF’;control query default CROSS_PRODUCT_CONTROL ‘OFF’;control query default ZIG_ZAG_JOINS ‘ON’;
Another very powerful yet simple default is JOIN_ORDER_BY_USER. With this default, a user can determine the join order of a query plan with the sequence in which the tables appear in the FROM clause.
Step Two – Start Using Control Query Shape
The SQL/MX optimizer picks up the most efficient plan it can, based on the search space it is allowed to traverse; the search space is affected by the control query defaults that are in effect at that time. Sometimes, customer might know a good plan for a given query. The customer can then influence the optimizer through the CQS statement. Writing a shape for a more complex query, however, is a complex task. Starting with a working shape, even if it is not the plan we want, will usually help quite a bit. The SHOWSHAPE command will usually provide such a shape, describing the plan chosen by the optimizer. We can then make incremental modifications to this shape. Before starting with these modifications, we can do two more things to simplify the task. First, we can make the shape easier to work with by formatting it in a readable way. Second, we can remove any parts that we do not need to force, such as extra arguments to scan operators and entire subtrees, e.g. for subqueries (replace such subtrees with the word “anything”).
Step Three – Making Incremental Modifications
Finally, we do want to incrementally change the shape to get the desired plan. The SQL/MX Query Guide contains detailed information on how to approach the use of CQS. It is a good idea to break this task up into small changes and to test the shape after each change. Unfortunately there are many reasons why a modification might not work. Here are some common ones:
Missing Enforcers
A changed plan may require a sort or a repartitioning step that has to be added to the shape as a sort or exchange node. Also, a partition_access node needs to be used above any operators that are to be executed in DP2.
Missing Root Groupby
Groupby operators can often be pushed down to DP2 or be executed in parallel, but in many cases a second “root” groupby operator is required to finalize the results from the parallel instances. Rather than moving such a groupby operator it needs to be split into two.
Missing or Extra Index Join
Accessing a base table through an index is modeled as a nested join between the index and the base table in SQL/MX and is sometimes called an “index join”. SQL/MX uses index joins only where necessary, and a shape needs to contain index joins if and only if one is needed. Index joins are necessary if the index itself does not supply the columns needed in the query.
Understanding SQL/MX EXPLAIN
When moving to SQL/MX, most users are at first surprised, overwhelmed, and sometimes annoyed by the results of the EXPLAIN stored function, the source of the equivalent information in SQL/MX. This paper is intended to take a closer look at SQL/MX EXPLAIN output and to provide an approach to its interpretation.
EXPLAIN Is Simple
All query plans in SQL/MX are managed as trees of operators (such as joins, unions, scans). This becomes visible to users at two places, the EXPLAIN table-valued stored function, and the use of CONTROL QUERY SHAPE commands to influence plans.
While the initial reaction to EXPLAIN is completely understandable, it is really simpler and in many ways easier to read than its SQL/MP counterpart. All the information is returned in normalized form as a single tree, with no exceptions (no subqueries, sorts, initialization steps, etc.). Unlike SQL/MP EXPLAIN, the MX version can be retrieved at any time, for any program, and it is the actual plan, not a view of what the optimizer would choose today. Since the interface is SQL, the returned information can be customized in many ways. We’ll see examples of this below.
Which Tools to Use
Unfortunately, the normalized form of EXPLAIN is also what makes the raw output hard to read. Like HTML text, raw EXPLAIN output is really meant to be the input for some tool that presents the information in a more human-readable format. In most cases that tool is Visual Query Planner (or VQP, for short), but there are a few more. See the NonStop SQL/MX Installation and Management Guide on how to install VQP; this tool is included with SQL/MX. In case an ODBC connection is not available or VQP cannot be installed on the PC, a way to explain a query through MXCI is more desirable. Let’s talk about some such ways:
Here is a simple example query (How many employees do we have in each department?) that we’ll use in the next few sections:
PREPARE XX FROM
SELECT d_name, count(*)
FROM emp, dept
WHERE emp.d_num = dept.d_num
GROUP BY dept.d_num, d_name;
First, using the SET SHOWSHAPE command in MXCI can help to get an idea on the query plan. For our sample query above, MXCI might generate the following output, which you may want to simplify for better readability, like it is done below:
control query shape
esp_exchange (
hybrid_hash_join (
sort_groupby (
split_top_pa (
sort_groupby (
scan(path ‘CAT.SCH.EMPX’)))),
esp_exchange (
partition_access (
scan(path ‘CAT.SCH.DEPT’)))));
Alternatively, we could look at the output of the EXPLAIN function. We could print only selected columns, like with this query:
SELECT seq_num, left_child_seq_num AS l,
right_child_seq_num AS r, operator, tname
FROM TABLE (EXPLAIN (NULL, ’XX’));
An undocumented option of SQL/MX does something similar, but with some more helpful information:
DISPLAY_EXPLAIN OPTIONS ’f’ XX;
This could produce the following result for our query:
LC RC OP OPERATOR OPT DESCRIPTION CARD
11 . 12 root 2.39E3
10 . 11 esp_exchange 1:8 (range) 2.39E3
9 3 10 hybrid_hash_join 2.39E3
8 . 9 map_value_ids 2.40E3
7 . 8 sort_partial_groupby 2.39E3
6 . 7 split_top 8 (range):16 (logph) 2.40E3
5 . 6 partition_access 2.40E3
4 . 5 sort_partial_groupby 2.40E3
. . 4 index_scan fs EMPX (s) 9.99E5
2 . 3 esp_exchange 8 (range):4 (range) 2.50E3
1 . 2 partition_access 2.50E3
. . 1 file_scan fs fr DEPT(s) 2.50E3
Again, please note that the above command is currently not supported by HP, has some limitations, and may change in the future. This command internally uses a complex SQL query to produce the output shown. The DESCRIPTION field contains information that depends on the operator type. For scan operators the table name is shown, while exchange operators show the number of parallel instances for the corresponding fragments.
Finally, if we decide to retrieve the full EXPLAIN information, another unsupported tool called FixExplain is available to format the result in a human-readable form.
Getting an Overview
When we look at the output of any of the commands or tools mentioned above we can see the general shape of the query. We can see that a hash join is used, with EMPX (an index on the employee table) on the left and with the department table on the right. This gives us some general idea about how tables are joined together and which indexes are selected.
Some of the displays above show the estimated cardinality of the intermediate results. Another important parameter to look at is cost. VQP is the best way to examine this information. Two numbers are given for each node in the tree, the cost of the operator itself and the cost of the operator and all of its children. SQL/MX considers overlapping operations, and, therefore, the total cost of an operator may be less than the sum of the children’s cost and the operator’s own cost.
Understanding Parallelism
Now that we know the basics of the query plan, it is time to look at how the optimizer has distributed the work over the CPUs. The first step to answering this question is to find the fragments of the query, determining the distribution of the work among different processes.
Operators, such as joins and scans, typically pass their result rows in memory to their parent operators. This means that they have to execute in the same process. The only way a row in SQL/MX can move from one process (such as DP2 or HP NonStop Server Disk Process, an ESP or Executor Server Process, or the application process) to another is by passing through an EXCHANGE operator. EXCHANGE operators consist of two (top and bottom) parts, one in each of the participating processes, and the rows get passed inside the operator. But, there is more to an EXCHANGE operator. It can sometimes also let multiple processes talk with each other, and can repartition data coming from N sources to M destinations.
SQL/MX has three flavors of EXCHANGE operators. Two flavors of the PARTITION_ACCESS (PA in short) operator communicate with DP2 and an ESP_EXCHANGE operator talks to ESPs and provides repartitioning. The two PA operator flavors can be distinguished by the node above the PA. If it’s a SPLIT_TOP node then we are dealing with a parallel flavor that can talk to multiple partitions at the same time. In all other cases we are dealing with an operator that talks to DP2s one at a time. Note that CONTROL QUERY SHAPE commands combine split top and PA node to a single operator called SPLIT_TOP_PA. Either way, SPLIT_TOP and PA form a unit whenever they appear together.
It is often useful to draw the query tree on a piece of paper and to indicate the process boundaries by drawing bubbles around those parts of the tree that execute in the same process. Those parts are called fragments. Here is how to find them. Each fragment has a root operator or an EXCHANGE operator at its top, and, conversely, each root node and EXCHANGE operator is at the top of a fragment. The bottom of a fragment is determined by a “leaf” node such as a scan operator, or by the start of another fragment. Here is the tree of our earlier query drawn in such a way: We see five fragments, one root fragment, two ESP fragments and two DP2 fragments. As you can see, we type the fragments based on where they execute. The top nodes of the fragments (ROOT, ESP_EXCHANGE, PARTITION_ACCESS) indicate this. These top nodes are shown as hollow circles in the diagram. SQL/MX is inherently parallel, and fragments have everything to do with this parallelism. While there is only one root fragment, representing the application program, ESP and DP2 fragments can have multiple parallel instances, running in multiple ESPs or DP2s. The SQL/MX optimizer has only limited influence on how many parallel DP2 fragments there are, since that is determined by the partitioning of the data. However, it has some freedom on the choice of the number of ESPs. The tree above indicates that the ESP fragment performing the hash join has eight parallel instances while the ESP fragment below it has only four parallel instances.
Managing NonStop SQL/MX Modules
As a result of that we have a separate preprocessor step that separates the language source code from the SQL statements. Then the standard language compiler is used to compile the host language source code to generate an object file. Multiple object files can then be bound together into a single executable. This executable is not an SQL object and is not registered into the SQL metadata. The SQL statements are put into a module definition file that is compiled by the SQL compiler, mxcmp, to generate the plan that it stores in a module file. An alternative method for module definitions is provided in SQL/MX Release 2.0 (discussed later). The executable has a reference to the name of this module file that it then loads at execution time. The module file does not know what executables are using it since the object file associated with the module (usually a temporary artifact) can be linked independently into many executables.
By contrast, SQL/MP uses an architecture that stores query plans in the executable program file. SQL/MP uses a proprietary compiler that creates an object file with the embedded SQL statements separated out within it. The preprocessor in effect is built into the compiler. SQLCOMP, the SQL compiler, then uses these statements within the object to generate an executable containing the SQL plans. This executable is registered into the SQL/MP catalog and the executable is marked as a SQL object. The executable has to be dropped like any other SQL object.
So let's see what some of the differences are between SQL/MP and SQL/MX before we discuss how to manage some of the complexity that having separate modules has introduced.
What the SQL/MP model provides:
1. A simple compilation model
2. A single self-contained executable to move to the production environment
3. The registration of a SQL executable in the catalog which facilitates "where used" reports for executables using certain SQL objects .
4.Invalidation of plans when DDL operations are performed.
What the SQL/MX model provides:
1. Independence from the language compiler being used
2. Standard object files not requiring proprietary handling by other tools, such as NOFT, NLD
3. The ability to link multiple objects with embedded SQL into a single executable
4. The ability to develop these multiple objects in different languages
5. The capability to create reusable embedded SQL component libraries.
Before a discussion on the issues with managing modules and some potential solutions, a bit of background would help. The module file is designed to have an ANSI three-part name composed of the catalog, schema, and module name since it is considered a SQL object. The catalog and schema for modules is currently completely independent of catalog and schema associated with the SQL objects it uses. This schema may not even exist.
The module name can be fully or partially qualified. If a catalog and schema is not specified, the preprocessor assumes a default catalog and schema name. The default catalog and schema name when preprocessing on the NSK under OSS is the group and user name respectively. The default names when preprocessing is done on a workstation, say using ETK, are SQLMX_DEFAULT_CATALOG_
.SQLMX_DEFAULT_SCHEMA_. Preprocessor run-time options -g moduleCatalog and -g moduleSchema can be used to provide specific catalog and schema names. The third part of the name is the name specified in the MODULE statement in the source program. The MODULE statement can specify the catalog and schema name as well. These will override the preprocessor options if specified. If a MODULE statement is not specified the preprocessor will generate a module name based on the timestamp, e.g., SQLMX_DEFAULT_MODULE_211954938763281467.
Issue: Grouping, Versioning, and Re-targeting Modules
What if I want to create a new version of a program without overlaying the old version? What if I want to use the same program but compile it to run against a different set of tables? What if I just wanted to group my modules in some way?
To address these issues, the first set of module management features was introduced in Release 1.8 with preprocessor options to add suffixes or prefixes to the module name. The options are -g moduleVersion, -g moduleTableSet, -g moduleGroup that result in a module name
This allows you to have a new version of the same program by using a suffix, such as v2, so that a different module is generated than the older version. It does not overlay the older module which can still be used. Similarly you can create a new module and an executable for the same program that runs against a different set of tables by using a suffix for it. Finally, you can provide a prefix so that within the catalog and schema you have used the modules will be grouped together for easier management.
Issue: Doing All of the Above Without Preprocessing
What if I have multiple environments on the system for the same application, such as development and QA? The above scheme would work but it would require preprocessing all of the programs with potentially a different prefix or suffix for the module names. What if instead I want to just replicate my application from development to QA and not have development updates of the modules affect my QA work?
To address this issue in SPR AAV we released the capability during a SQL compilation to place modules globally or locally. The syntax at SQL compile time is to use -g moduleGlobal or -g moduleLocal [=
The above paragraph used development and QA just as examples of two environments that you might maintain. But of course, these multiple environments could be of your own choosing with potentially no overlap of modules between them. In some sense global and local module support supersede the functionality provided by the Version, Table Set, and Group support provided earlier. However, you could potentially use both of those capabilities together. That is, you could have globally or locally placed modules with versions, groups, or table sets being maintained in the same directory.
Issue: Targeting a Different Catalog and Schema
So I can have multiple environments, such as development and QA on the same machine. However, if I specified a catalog and schema in my source file how can I target a different catalog and schema for my SQL objects for each environment without changing my source files or re-preprocessing?
There are SQL compile time options available since R1.8 using Control Query Default parameters (-d option) to specify a different catalog and schema name at SQL compile time. These options are different from the module catalog and schema options available during preprocessing. They work only if SQL objects referenced in your program are not fully qualified. If a SQL object is fully qualified these compile time options do not have any effect and do not remap the catalog and schema for that object. Declaring a catalog and schema within your program fully qualifies a SQL object and is equivalent to using its three-part ANSI name. With SQL/MP tables this is relevant only if MP aliases are being used. If DEFINEs are being used the question is irrelevant since the target objects will have to be specified as run-time DEFINEs.
Issue: Securing Modules
How can I prevent someone else from deleting or updating my module files?
Besides the fact that you may not want someone to clobber your module files, these files need to be secured since they have privileged access. There is a way using the OSS commands, such as chmod, chown, and umask, to secure the module directory using the sticky bit. Using this facility any non-privileged user can create module files in this directory. But once the module is created only that user (besides SUPER) can update or delete that module file. See Support Note S04015: Securing SQL/MX User Modules or the SQL/MX Release 2.0 manuals for details.
Issue: Managing Multiple Module Definition Files
I link multiple objects into a single executable and therefore have multiple module definition files associated with a single executable. If I have to move my application into production I have to move the executable and all its related module definition files into production and then recompile them individually. I liked the MP way!
With SQL/MX Release 2.0 you can now get functionality similar to SQL/MP. The preprocessor annotates the output source file so that the SQL statements that would have been in the module definition files now reside in the output file of the preprocessor. When this file is compiled with the language processor an object file is created. A utility called mxCompileUserModule now replaces mxcmp (it actually invokes mxcmp on your behalf). This utility can be run against the object file or the executable itself. If an object file will be bound into multiple executables it may be best to run the utility against that object file. Otherwise, it is best, especially when the executable is moved to production, to run the utility against the executable. You do need to be careful that when using the local directory capability that all the required modules are collocated with the executables that need them. It is still different than SQL/MP in the sense that SQL/MP puts the compiled plan into the executable whereas SQL/MX still creates separate module files.
Issue: Getting Module Usage Information
If I will be performing a DDL or Utility operation on a table how would I know which modules will be impacted by such an operation?
DISPLAY USE OF is available for modules starting with SPR ABC so that you can find out what objects are being used by a module or what modules are using a specific object. For the latter case this utility goes through all the module files to determine if the object is being accessed. Considering the infrequency of the need for this information the performance of this utility should be acceptable.What Is in the Future for Module Management?
We consider that the issues discussed above have been satisfactorily addressed. If a customer is not satisfied with the current solutions to address them we would need to get a Request for Enhancement for us to work on those issues further.
One issue that has not been addressed is module invalidation that was available with SQL/MP. It is not clear how important this is for our customers since most customers do not want an application down because an invalidated program fails to execute in production. However, if the requirement is to understand what modules will require recompilation in order to execute after a DDL or Utility operation we understand that requirement. The challenging part of addressing that requirement is that the development environment is not as tight and controlled. The modules do not really know which executables they ended up in. Users can delete modules or executables, and there is nothing that guarantees consistency and currency of the objects in the development environment. So a solution that tracks the modules that would be "invalidated" by a DDL operation is handicapped by the fact that the modules that were compiled may not exist or may not even be used with an executable. Certainly SQL is not aware of the executables and their movement or deletion. So resolving this issue needs a qualification of the requirement and a fair amount of deliberation. If you use consistent naming conventions for source code, modules, and executables it becomes somewhat easier to determine the relationships between your application objects. It may also make it easier to determine obsolete objects for deletion.
With the availability of SQL/MX native tables we would like users to be able to map a three part SQL object name to a different name at run-time. While most customers like to pre-compile their programs against the target SQL objects they will run against, many rely on late name binding to execute against a different set of SQL objects at run-time. DEFINEs can be used for SQL/MP tables to map a table to a different one at run-time. With SQL/MX tables this mapping can be achieved using PROTOTYPE support that uses host variables populated with the correct object names at run-time. This requires knowing the names of the run-time objects at compile time. It also requires logic in the application to choose the correct run-time object names if there are multiple environments that the application may run in. The SQL syntax for prototype is an extension of the ANSI standard syntax. The preferred mechanism would be similar to DEFINEs that would simply remap the SQL objects, or even just the catalog and schema, at run-time. We are looking at providing this functionality sometime in the future.
-Rohit Jain
Database Architect and Evangelist
HP
An Open-Source Web Services Solution for NonStop
When choosing a target host platform, we must carefully balance reliability, scalability, availability, performance, and security requirements against the implementation cost. This sometimes means that an application we would prefer to deploy to NonStopAA must instead be targeted to other platforms.
Recently, however, the HP NonStop Java J2EE platform, along with Open-Source Web Services components, has not only made it more feasible for us to target the HP NonStop platform, it has also simplified our integration with heterogeneous client platforms such as Microsoft .NET and Oracle, as well as Java J2EE.
The Business Case
We recently had a business requirement to implement a new fraud detection and prevention system within my company. The technical requirements were simple: We had to enable and disable merchant terminals, based on the results of an automated fraud analysis program. An Oracle database running on an HP-UX server would perform the fraud analysis and disable merchant terminals when fraud was detected. Subsequently, support staff would manually re-enable the terminals using a Web browser, accessing a Microsoft .NET Web application. The transaction engine, i.e., the NonStop system, would perform the actual enabling and disabling.
There are many integration solutions available for NonStop that could have met our functional requirements. But most of the solutions are complex, involving several tiers and layers upon layers of integration. Third-party software licensing contracts are required, along with all the associated custom integration development and support contracts. And, in the end, my company, Paymentech, would become more dependent on outside vendors. In addition, the fraud detection project was a good example of exactly the type of development that we prefer to keep in-house for security reasons.
After some analysis, we determined we needed a heterogeneous messaging interface, so the three systems (Oracle, Microsoft .NET, and NonStop) could efficiently interoperate. Gazing further down the road, we saw potential that the terminal control interface might be reused for many more internal projects. With the entire IT industry moving in that direction, a Web Services or Service Oriented Architecture (SOA) became the obvious choice.
We looked at several different Web Services infrastructure solutions. They were, for the most part, very expensive and/or required outside development and service contracts. We weren't prepared to invest heavily in third-party solutions on our first foray into the NonStop Web Services world. So, to get the ball rolling, we looked at available Open-Source solutions.
iTP Web Server. The iTP Web Server is a general-purpose Web server that, more significantly, is the required front-end for NSJSP. The iTP Web Server can static .html documents and CGI programs written in a variety of languages, including C and COBOL. In our case, we already had iTP Web Server installed. In the Open-Source world, the iTP server would be analogous to the Apache Web server (Tomcat runs under the Apache Web server; on NonStop, it runs under iTP).
NonStop Java Server (NSJ) 4.0. NonStop Java Server is HP NonStop's implementation of the Java compiler and Java Virtual Machine (JVM). It is, essentially, the Java JDK Version 4.0 for the NonStop platform. NSJ is required for all NonStop Java development. For us, the JVM was the most important component, since we develop and compile on our local workstations then deploy to the NonStop JVM.
NonStop Java Servlets for JSP (NSJSP). NSJSP is a port of the Apache Tomcat Web Application Server to the NonStop platform. NSJSP is a full servlet engine, capable of running many Open-Source components, including the Apache Axis Web Services. In our initial testing with Axis under NonStop, we ftp'd the axis.war file from our local Windows workstation to the webapps directory under NSJSP. Axis automatically deployed, and we were able to view the Axis home page running under NonStop. The steps for configuring Axis under NonStop were identical to those for installing under Tomcat on our local machines. On the Axis home page, there is a link called HappyAxis that validates the installation. The HappyAxis link works the same on NonStop as it does for other platforms. HappyAxis identified that we needed to tweak the location of a couple of archive (.jar) files, but this step was identical to the actions required for other platforms. Within minutes, we had Axis up and running on NonStop.
NonStop Java Toolkit (JToolkit). Of course, the real meat of our application required access to legacy NonStop Pathway servers. But how do you connect from a Java program (in the Axis case, a servlet) to a Pathway program written in COBOL? The answer was simple. HP NonStop has provided Java APIs for accessing all legacy servers and databases.
These APIs are included in a package called JToolkit:
JPathsend - Communicates with legacy Pathway serversJEnscribe - Directly accesses legacy Enscribe databasesJPathway - Framework for Java Pathway servers
In addition, JToolkit provides a utility program, ddl2java, which generates Java interface classes from the same DDL files that are used to generate the Pathsend messages for the legacy servers.
While the HP NSJSP application server and related products provide a complete Tomcat environment that will run Apache Axis, the NonStop Java JToolkit enables the key requirement of direct access to legacy Pathway servers and Enscribe tables.
Web Services is a powerful strategy for NonStop integration solutions in a heterogeneous environment. The HP NonStop Java development suite provides all the tools required to run the Open-Source Apache Axis Web Services distribution. The HP JToolkit APIs provide all the tools required to integrate directly with legacy COBOL Pathway servers. The Open-Source Apache Axis is therefore an efficient and cost-effective Web Services solution for the NonStop platform.
- Sam Ayers