Tuesday, June 14, 2016

Tandem Computers, Inc.

The Tandem computer was the granddaddy of fault-tolerant systems. Tandem’s first system was delivered in 1976. Forty years later, its original architecture remains the dominant fault-tolerant technology. Then, as now, a Tandem system was a loosely coupled multiprocessor system containing anywhere from two to sixteen independent processors in a node. Up to 255 nodes could be included in a single network, linked via Tandem’s Expand communication network.


The processors in a node were linked via a duplexed, interprocessor messaging bus called the Dynabus, capable of a 26 megabyte/second data rate.

All device controllers were dual-ported so that there was always a path to a device even if a processor failed. All critical processes ran as process pairs in two different processors. One process was the primary process, and one was the backup process. The primary process kept its backup process synchronized via checkpointing messages. Should the primary process fail (presumably due to a processor failure), the backup process took over and continued processing with no apparent interruption to the user. (Tandem’s later inclusion of the Pathway process monitor eliminated the need for application programmers to write checkpointed process pairs.)

With Tandem’s second release of its product, the Tandem NS2, each processor could be configured with two megabytes of memory. Each mirrored disk pair could provide 128 megabytes of storage (yes, that’s megabytes, not gigabytes).

Tandem was acquired by Compaq in 1997, which then was acquired by HP in 2002. Tandem computers are now known as HPE NonStop computers. (HPE is HP Enterprise, one of the two companies that resulted from the split in 2015 of Hewlett Packard into HP, Inc., which sells HP personal computers and printers, and HPE, which markets HP server, storage, and networking systems.) 

Sunday, June 12, 2016

Techniques That Can Influence the SQL Compiler's Choice of Execution Plan

The following techniques can influence the SQL compiler's choice of execution plan.

  1. Performing Update Statistics
  2. Creating or Modifying Indexes
  3. Changing the Query Text
  4. Using SQL CONTROL Statements

1. Performing Update Statistics


The SQL compiler uses table statistics to help estimate the cost of potential execution plan candidates. (By the way, don't confuse table statistics with the execution statistics discussed below; table statistics describe the distribution of data values in the columns of a table.) Table statistics are not maintained automatically. You have to tell NonStop SQL to recalculate them on an ongoing basis, using the UPDATE STATISTICS command.

Not only is it important to keep table statistics up to date, you must also generate the right statistics. In general, it is a good idea to update statistics for all key columns. If you are tuning a poorly performing query, make sure there are statistics for the columns involved in the filter conditions. For MX, be sure to include multicolumn statistics when a table filter refers to more than one column.
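For example, here is a minimal sketch of the command (the table and column names are hypothetical, and the exact options available differ between SQL/MP and SQL/MX):

UPDATE STATISTICS FOR TABLE SALES.ORDERS ON EVERY KEY;

-- SQL/MX: multicolumn statistics for a filter that references both columns
UPDATE STATISTICS FOR TABLE SALES.ORDERS ON (CUSTNUM, ORDERDATE);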


2. Creating or Modifying Indexes

Adding an index is often the best way to improve query performance.

However, you need to consider the additional cost that will be incurred for INSERT, UPDATE, and DELETE (IUD) operations against the table. An insert or delete against the table will also cause an insert or delete against all of its indexes. An update will cause an index to be updated if the index includes any columns that are changed by the update.

For a table that is not too volatile, creating an index to support an important query is often worth the extra cost incurred by IUD operations.

You might consider creating an index to:
• Support a filter, avoiding a full table scan.
• Support a join, if you suspect a nested join supported by an index will perform better than a hash join.
• Support an aggregate, enabling the use of an index only scan instead of a table scan.

You might consider adding columns to an index to avoid a table scan. If an index contains all of the columns required by a query, an index-only scan can be used; this is almost always more efficient than a table scan (since index rows tend to be much smaller than table rows).
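As a sketch (the index, table, and column names are hypothetical), an index like the following could support a filter on CUSTNUM, and also covers a query that reads only CUSTNUM and ORDERDATE, enabling an index-only scan:

CREATE INDEX XORDCUST ON SALES.ORDERS (CUSTNUM, ORDERDATE);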

3. Changing the Query Text

It is often possible to write a query in more than one way, yet still get the same rows in the result. Changing the way that a query is written can allow the SQL compiler to choose a better execution plan.



Here is a trivial example to illustrate the point:

Consider this query:

SELECT * FROM DEPT D 
WHERE D.DEPTNUM IN (1000, 3000, 4000)



The DEPT table has a primary (cluster) key of DEPTNUM.

The compiler generates a plan that performs a full table scan on
DEPT, and applies the filter predicate to each row.

If we re-write the query like this:
SELECT * FROM DEPT D
WHERE D.DEPTNUM IN (1000, 3000, 4000) AND
D.DEPTNUM >= 1000 AND
D.DEPTNUM <= 4000

This returns the same results, but the compiler generates a 
plan that does a subset scan that only reads the rows between 
DEPTNUM 1000 and 4000.


4. Using SQL Control Statements


You can use SQL CONTROL statements to influence the behavior of the SQL compiler. You can use them to directly control join order, join methods, access paths, and parallelism. There are also less direct options, such as specifying selectivity estimates.

The MX CONTROL QUERY DEFAULT (CQD) statement supports a large number of options, including a number that are useful in query tuning.

The JOIN_ORDER_BY_USER attribute allows you to control the join order used in the execution plan. The tables are joined in the order in which they appear in the FROM
clause.

The HASH_JOINS, NESTED_JOINS, and MERGE_JOINS attributes can be used to enable or disable the corresponding join methods. Unfortunately, they apply to the query as a
whole, and not to individual tables.

The INTERACTIVE_ACCESS and INDEX_ELIMINATION attributes can be used to encourage or discourage the use of indexes.

The ATTEMPT_ESP_PARALLELISM attribute controls the generation of parallel plans.
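As a sketch, here is how some of these attributes might be set in an MXCI session before preparing a query (the values shown are illustrative; consult the SQL/MX reference for the defaults on your release):

CONTROL QUERY DEFAULT JOIN_ORDER_BY_USER 'ON';
CONTROL QUERY DEFAULT HASH_JOINS 'OFF';
CONTROL QUERY DEFAULT ATTEMPT_ESP_PARALLELISM 'OFF';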

The MX CONTROL QUERY SHAPE (CQS) statement provides a means to force the use of a particular execution plan. However, its use is generally discouraged. A change to the database or the SQL compiler may make the forced plan inoperable. CQS statements are hard to write, especially from scratch. The best approach is to use CQD statements (including the JOIN_ORDER_BY_USER option) to get a plan that is as close as possible to the desired plan. Then use the SHOWSHAPE statement to generate the CQS statement that reflects the current plan. Finally, make adjustments to the generated CQS statement.
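A minimal sketch of that workflow in MXCI (the query and table names are placeholders):

CONTROL QUERY DEFAULT JOIN_ORDER_BY_USER 'ON';
SHOWSHAPE SELECT O.ORDERNUM, P.PARTNAME
FROM ORDERS O, PARTS P
WHERE O.PARTNUM = P.PARTNUM;
-- SHOWSHAPE displays the CONTROL QUERY SHAPE statement for the current plan.
-- Edit the generated statement as needed, then issue it before PREPARE.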

The MP CONTROL TABLE statement provides direct control over the join order, join methods, and access paths used in the execution plan. The CONTROL EXECUTOR
statement allows for the selection of parallel or serial plans.

Using CONTROL statements is a two-edged sword. They give you a way to influence the execution plan produced by the compiler, but at the same time they may prevent the compiler
from finding a better plan.

Measuring Query Performance


In query tuning, there are two motivations for measuring performance:

• To compare the relative performance of your tuning experiments. This enables you to determine if an experiment improves performance compared to the baseline.

• To measure the amount of work being performed when executing the query. This can help identify unnecessary work, for example database records that are read, but not used.

The gold standard for monitoring performance is to use Measure.

You can use Measure to collect very detailed information on all aspects of query performance. However, it is quite labor intensive to gather and analyze a complete set of Measure data for a query.
This is especially true for non-trivial queries that access many tables and involve many ESP and DP2 processes.

A more practical approach is to use the execution statistics provided by SQL. SQL execution statistics include the elapsed time for compiling and executing the query, as well as per-table statistics. The per-table statistics include:

• Records read and used. The number of records read from a table and the number that were actually used.

• Number of messages and message bytes. This is a measure of message system activity, and is a good analog of I/O load.

• Lock waits and escalations. This is an indicator of lock contention. The response time of the query will suffer if it encounters lock waits.

Unfortunately, neither VQP nor VQA will gather the SQL execution statistics for you. If you are using one of these tools for the execution plan analysis, you will need to use MXCI to gather the execution statistics. The easiest way to do this is to use the SET STATISTICS command. (MP users can use the SQLCI SET SESSION STATISTICS command.)

Once the statistics option is turned on, the execution statistics will be displayed automatically after preparing and/or executing the query. The SQLXPress Visual Query Tuner automatically collects and stores execution statistics as part of performing a tuning experiment.
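For example, a minimal MXCI session might look like this (using the DEPT table from the earlier example):

SET STATISTICS ON;
PREPARE Q1 FROM SELECT * FROM DEPT D WHERE D.DEPTNUM >= 1000;
EXECUTE Q1;
-- The compile/execute elapsed times and per-table statistics
-- are displayed automatically after each step.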

Query Tuning

Query tuning is a multi-step process that generally goes something like this:



1. Establish a baseline. Generate the execution plan and establish the current performance characteristics of the existing query. This gives you a base for comparison when you perform tuning experiments. 

2. Analyze the current execution plan. You should understand the path to the data that is being taken by NonStop SQL. 

3. Identify problems with the existing plan, and come up with a better plan.

4. Make changes to get the plan you want. This is not always obvious. You may have to perform multiple tuning experiments to arrive at the solution.

5. Having implemented the changes, generate a new execution plan, and re-measure the performance. 

6. Compare the results with the baseline. If things have improved, incorporate the change into the new baseline, otherwise abandon the change. 

Repeat steps 2 through 6 until you are satisfied with the query performance.

Why Does an SQL Query Run Slowly?

Introduction 

Query tuning is the process by which you answer the question "How can I make this query run faster?"

There are many possible reasons why an SQL query runs slowly.

• The system may simply not have enough processing power, memory, or disk bandwidth to handle the application load.

• There may be a problem with the system configuration, such as insufficient disk cache, MX buffer space, or kernel managed swap file space.

• The files used to store the database tables may be badly fragmented.

Such "system-level" problems are outside the scope of query tuning, and must be addressed separately before query tuning can be effective.

The query tuning process treats the query text as the sole statement of the problem. Its goal is to produce an execution plan that runs the query faster, while returning the same set of rows as before.

Why Queries Run Slowly



From a business process perspective, most reasonable queries will return a moderate number of rows.

For a reasonable OLTP query, the maximum number of rows returned is around one hundred or so (and is often much less than this).

The user of an interactive query doesn't want to browse through many more rows than this.

Even for a reasonable batch or reporting query, the user does not want to browse through more than a few hundred (or, at most, a few thousand) rows.

If a batch-type query returns more than a few thousand rows, it needs to be broken down into multiple more closely filtered queries, perhaps aimed at multiple classes of user. Modern NonStop systems are quite capable of executing a query that returns a few hundred (even a few thousand) rows in a reasonably short time.

So the question arises: why does a query that returns a modest number of rows take too long to run? One answer is that the query accesses and processes too many rows that are eventually discarded, and don't contribute to the final query result.

The goal in this case is to avoid performing such non-useful work.

Another such class of query is one that returns a reasonable number of rows, but which is required to aggregate a very large number of rows in order to produce the required results.

A query containing a GROUP BY clause is a common example of this. These queries can be difficult to tune. A common solution is to pre-aggregate the data in the database, and to query the summary data without accessing the details. Triggers, used carefully, can keep the redundant summary data synchronized.
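Here is a minimal sketch of the pre-aggregation idea (the SALES and SALES_SUMMARY tables are hypothetical, and the ANSI-style trigger syntax shown is an assumption to be checked against your SQL/MX release):

CREATE TRIGGER ADDSALE
AFTER INSERT ON SALES
REFERENCING NEW AS N
FOR EACH ROW
UPDATE SALES_SUMMARY
SET TOTAL_AMOUNT = TOTAL_AMOUNT + N.AMOUNT
WHERE REGION = N.REGION;

A reporting query can then read SALES_SUMMARY directly, instead of aggregating the SALES detail rows with GROUP BY.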