Using Google Chrome?

Download my
free Chrome Extension, Power Notes Searcher, to make searching for and evaluating SAP notes, much easier.

Saturday, June 25, 2011

Oracle Home Space Cleanup (how random that sounds)

Just a quick post to make you aware of a fantastic option I recently discovered in opatch.
I look after a large number of database systems, some having multiple Oracle homes on the same drive (I know I know, but when you're limited on drives...).
These homes consist of things like Oracle RDBMS (obvious one), Oracle 10g EM Agent and finally Oracle 10gAS (and 10g OEM if you like to think that OMS is separate, but it's a 10gAS so I didn't).

Generally the Oracle home for a 10gR2 RDBMS Enterprise Edition binary codeset seems to be around 1GB in size.
Apply a couple of opatches and you could be looking at 2GB!
All of a sudden, the partition is looking quite full (it used to be 9i on there when it was originally designed!!).

Lets bring in a new opatch command that came in as part of an opatch upgrade in 10gR2 (opatch patch 6880880).
The command is part of the "opatch util" command and is a sub-command called "cleanup".
It supposedly removes the duplicated binary files taken during an opatch patch application.

Oracle doc 550522.1 states "Starting with 10.2, Opatch does not backup only the affected modules, it also takes a backup of the complete affected libraries to $ORACLE_HOME/.patch_storage//backup//.".

The cleanup command only removes files that are not required as part of a patch rollback.

Syntax: "$ opatch util cleanup".

You will be prompted to select the oracle home and confirm the cleanup. It will also inform you of the amount of space saving.
Enjoy the space you get back, while you can!

Friday, June 24, 2011

Checking Lack of Bind Variables & Adhoc SQL in Oracle

The following Oracle SQL is a crude method that can be used to check the V$SQLAREA view for SQL statements that have the same execution plan, but where the SQL is unique. e.g. SQL statements where only the predicate changes in the WHERE clause.  These are otherwise known as adhoc queries (something that SQL Server 2008 is supposed to be very good at detecting).

The statement was constructed based on the "Understanding Shared Pool Memory" document by Oracle (

The doc basically says that a count of the PLAN_HASH_VALUE column versus the number of SQL_IDs using the plan indicates how bad the SQL needs bind variables since it should have only one or two SQL statements (the lower the better) per plan.

In my opinion, systems that return >60% for ACTION "Could use bind variables." should either consider revising the application SQL code (permanent fix), or using the CURSOR_SHARING=FORCE init parameter (temporary fix!).

SELECT 'Multiple SQLs in SQLAREA using same plan:' DESCRIPTION,sum(pct_hash_use) PCT,'Could use bind variables.' ACTION FROM (
select hc.plan_hash_value,
round((hc.hash_use_count/sc.sql_count*100),2) pct_hash_use
(select s.plan_hash_value, count(s.plan_hash_value) hash_use_count
from v$sqlarea s
group by s.plan_hash_value
having plan_hash_value > 0
) hc,
(select count(1) sql_count
from v$sqlarea t
where plan_hash_value > 0
) sc
order by pct_hash_use desc
) WHERE hash_use_count > 1
SELECT 'Single SQLs in SQLAREA using same plan:' Description,sum(pct_hash_use) PCT,'No action needed.' ACTION FROM (
select hc.plan_hash_value,
round((hc.hash_use_count/sc.sql_count*100),2) pct_hash_use
(select s.plan_hash_value, count(s.plan_hash_value) hash_use_count
from v$sqlarea s
group by s.plan_hash_value
having plan_hash_value > 0
) hc,
(select count(1) sql_count
from v$sqlarea t
where plan_hash_value > 0
) sc
order by pct_hash_use desc
) WHERE hash_use_count <= 1;

You can also use the following query to determine roughly how much memory is wasted on single SQL statements that have been executed only once (you should use this on a well bedded in system that has been running for a while):

select count(1) num_sql,
sum(decode(executions, 1, 1, 0)) num_1_use_sql,
sum(sharable_mem)/1024/1024 mb_sql_mem,
sum(decode(executions, 1, sharable_mem, 0))/1024/1024 mb_1_use_sql_mem
from v$sqlarea
where sharable_mem >0;

You should double check the output with the SQL text in V$SQLAREA to ensure that the SQL is truly ad-hoc.

Remember to get some hard and fast values for the "Concurrency" wait class before and after the change (bind variables introduced or init parameter changed).

You can use SQL: "select * from v$system_event where wait_class#=4 order by average_wait desc;" to determine the average wait time.

I would expect the library cache operation wait times to reduce as hard parses are reduced.

Here is an example of a system that is not using bind variables to optimise SQL:

When running the script we get:

You should note that an OLTP system such as SAP ERP should have a much lower count for adhoc SQL  compared to a SAP BW system where users can write their own queries.

Oracle Explain Plans: Cardinality & Histograms

This is one of the better white papers on how to read an Oracle explain plan:
The most important point is made on cardinality on page 10 when explaining "Data Skew".
The term "Data Skew" is used to refer to the imbalance of values with respect to a normal distribution (
The imbalance is shown as the Mode ( of the values in a column of a table.


   1 | A <== Mode = most frequent.
   2 | A
   3 | A <== Median = middle one
   4 | B
   5 | B

Histogram for COL2:

Value |Number
A     |###
B     |##

Data Skew means that more of (usually it takes a lot more, like +60%) the values in COL2 have a value of "A" than of "B".
As the Oracle whitepaper points out "One of the simplest formulas is used when there is a single equality predicate in a single table query (with no histogram). In this case the Optimizer assumes a uniform distribution and calculates the cardinality for the query by dividing the total number of rows in the table by the number of distinct values in the column used in the where clause predicate.".

In example "SELECT * FROM A_TABLE WHERE COL2='A'" this would equate to: Cardinality = 5 / 2 A cardinality value of 2.5 would be rounded to 3.
Accurate enough.

The problem with this is that the larger the number of rows the more skewed the cardinality will become.
Imagine the table with values like so:

Value |Number
A     |###############
B     |##

Now in example "SELECT * FROM A_TABLE WHERE COL2='A'" this would equate to: Cardinality = 17 / 2
A cardinality value of 8.5 would be rounded to 9.
Definitely not 15.

To solve this problem, Oracle has the capability to collect statistics for individual columns.
You can use the DBMS_STATS.GATHER_TABLE_STATS command to collect stats on the column (

The below statement will collect stats on the COL2 column of A_TABLE in the OE schema with a bucket size of 20 buckets.  These column stats are histograms of the distribution of values.



Two types of HISTOGRAM are available in Oracle, FREQUENCY and HEIGHT.
Oracle uses HEIGHT based histograms *only* when the number of distinct values of a column is greater than the number of buckets specified.
"In a height-balanced histogram, the column values are divided into bands so that each band contains approximately the same number of rows. The useful information that the histogram provides is where in the range of values the endpoints fall.".

Oracle will use FREQUENCY based histograms by default when the number of distinct values of a column is less than or equal to the number of buckets specified.
"In a frequency histogram, each value of the column corresponds to a single bucket of the histogram.".

It is possible to view the collected histogram for a column:

SELECT column_name, num_distinct, num_buckets, histogram
WHERE table_name = 'A_TABLE' AND column_name = 'COL2';

SAP RFC Gateway Error 10060

Yesterday I spent the majority of the morning tracking a problem in the SAP ERP system.
It has been occurring for a while but I have had far more important issues.  Finally I got round to looking at it.

In hosta SM21 system log, the gateway process (RD) had registered a network connect error (Q0I) "Operating system call connect failed (error no. 10060)":

SM21 Q01 operating system call connect failed 10060

The architecture of the SAP systems involved in the landscape looks like the following diagram, which shows an SAP ERP single ABAP stack (hosta), which is connected to an SAP Enterprise Portal system which is clustered (hostb & hostc):

Looking in the gateway error log (TX: SMGW), I saw the network interface (Ni) timeout error when looking up the hostname for IP address


The business process involved means that an RFC connection is established from the EP server processes (hostb) to the ERP gateway process (hosta).
As you can see in the architecture diagram, the IP is the private IP address of the hostb server in the EP cluster.
The private IP address is only used as a heartbeat address as part of the Microsoft Cluster service.
So the question was, why was the SAP ERP gateway process receiving inbound connections from hostb via the private IP address?

The method of analysis involves understanding how the IP to hostname lookups are performed on the Windows servers:

First, if you open a command prompt and use NSLOOKUP to check the hostname ("C:> nslookup hostb") it will return the hostname from the Name Service configured for the Windows Server. Usually this is in the order of hosts file first, then DNS. It performs the lookup using the appropriate network interface according to the network routings ("C:> route print").
In our example, the nslookup correctly returned the public IP address.

Second, you can issue a PING of hostb from hostb (itself) using "C:> ping hostb".
When using PING, it sends a network packet using the appropriate network interface according to the network routings ("C:> route print"). Since we are pinging ourselves, it will use the first network interface in bind order of the Windows network connections.
In our case, it was pinging the *private* IP address.
According to Microsoft, the PING command in Windows uses gethostbyname() to find the IP address for the destination to be ping'd(

So we have established that the private IP address is seen when we ping our own servername on hostb, why does this mean that the private IP is seen as the client in a SAP RFC gateway connection?
Well, in the same way that PING uses gethostbyname(), the RFC connection that is established through jlaunch.exe (host of the JAVA stack server processes) also uses gethostbyname(). Of course, we know what IP address a gethostbyname() call returns, as this is used in the PING command .

(findstr /M gethostbyname jlaunch.exe)

So, the RFC connection performed by the JAVA stack is created from the jlanuch.exe binary, which uses gethostbyname() during the bind() call, which returns the private IP.
The RFC connection is established and part of the RFC connection process the jlanuch.exe process passes it's calling IP address. Which is the private IP address.
The ERP system gateway at the other end of the connection receives the incoming connection and notes that it is coming from Should the ERP system on hosta wish to make an RFC callback (
(, it has no chance of returning the call to the private IP address on hostb. This is where we get the error "NiHsLGetHostByName failed".

So how do we resolve the problem?
We could code the lookup into the hosts file (172.x.x.2 hostb) on hostb, but this might cause other issues.
A quick look into the SAP documentation for installing Netweaver 7.0 SR3 in a Microsoft Clustered system reveals that a simple step was missed during installation on hostb.
In the document it states "The card of the public network must be displayed before that of the private network. If necessary, change the order in which the cards are listed by using the Move Up and Move Down arrows".

So I check the binding orders on hostb in Control Panel -> Network Connections -> Advanced Settings:

Cluster heartbeat adapters and bindings

They are the wrong way around. The "Cluster Heartbeat" (Private IP) should be beneath the Public IP ("Local Area Connection" in our example).
The Microsoft article here ( lists the process to change it and provides alternative solutions (I would recommend to follow the SAP document).

Changing the binding order on Windows Server 2003 does not require a reboot, but the change will not be effective until a reboot is performed.

Monday, June 20, 2011

Detecting & Reducing Space Used For SAP Business Workplace Documents

Within the SAP Netweaver 7.0 system is a complete e-mail tool that allows users to send/receive electronic documents.
These documents are visible in the SAP Business Workplace (TX: SBWP) inbox/outbox and also in the SAP Connect transmission requests (TX: SOST).

As time goes by and users accumulate documents in their inbox and internal/external mails get sent through SAPConnect, space in the database will be used.

SAP Note: 966854 recommends running RSBCS_REORG to clear down these documents and free the space.
Unfortunately it doesn't tell you how much space is taken up and how much you could get back.

Using the details in SAP Note: 706478 it is possible to check some of the tables at the database level.
Once again, the note doesn't list all tables involved.
Instead, running the RSBCS_REORG report in "Test" mode reveals the full extent of the tables where records will be deleted:

So using this information, the following constructed SQL query will return the current size of the database segments (Oracle) for the selected tables (FOR ECC 6.0 based systems):

select sum(s.bytes)/1024/1024 size_mb ,
s.segment_name table_name,
t.ddtext descr
from dba_segments s,
sapsr3.dd02t t
where s.segment_name in ('BCST_BOR',
and s.segment_name = t.tabname
and ddlanguage='E'
group by s.segment_name, s.segment_type, s.segment_name, t.ddtext
union all
select sum(s.bytes)/1024/1024 size_mb ,
t.ddtext descr
from dba_segments s,
sapsr3.dd02t t,
dba_indexes i
where i.table_name in (
and i.table_name = t.tabname
and i.index_name = s.segment_name
and ddlanguage='E'
group by s.segment_name, s.segment_type, i.table_name, t.ddtext
) t1
ORDER BY t1.table_name ASC, t1.segment_type DESC;

For R/3 4.7 you can use the query below (removes the table and index descriptions):

select sum(s.bytes)/1024/1024 size_mb ,
s.segment_name table_name
from dba_segments s
where s.segment_name in ('BCST_BOR',
group by s.segment_name, s.segment_type, s.segment_name
union all
select sum(s.bytes)/1024/1024 size_mb ,
from dba_segments s,
dba_indexes i
where i.table_name in (
and i.index_name = s.segment_name
group by s.segment_name, s.segment_type, i.table_name
) t1
ORDER BY t1.table_name ASC, t1.segment_type DESC;

You may just be able to see that the SOC3 table is using ~7GB of space with an index of 144MB and a lob object of 64KB.

Now if you run the RSBCS_REORG report using "Test" mode, with the settings to remove the documents that are >60 days old, then you can estimate what percentage of records will be removed from table SOC3 and therefore estimate the space saving.

As a rule of thumb, it may be wise to remove the documents from users who have left the company.
Generally the SAP account will be locked, so you can pull the account names using the following SQL query then add them into the RSBCS_REORG report "User" field:

SELECT DISTINCT bname FROM sapsr3.usr02 WHERE uflag =128;

NOTE: When running RSBCS_REORG, it will not remove assigned workflows from the "Workflow" folder (or sub-folders).

You may also consider running the report for the user who runs the SAPConnect background job step.  As this user will have the majority of occupied space.

Once you've deleted the records that are not required, the space in the database tables will be freed.
However, this will not release the space to the rest of the database, only the tables from where the records were deleted.
Stay tuned for my up-coming post on how to free the table segment space after you've removed thousands of records from the Oracle database.

Useful SAP Reports

It's always good to have the exact report handy, just incase the transaction is not in your SAP role:

BTCTRNS1 - Suspend All Jobs For an Upgrade
BTCTRNS2 - Un-Suspend All Jobs For an Upgrade
RDBMIDOC - Create IDocs from change pointers and send.
RBDMOIND - Status conversion of successfull IDoc communication.
RSALDBRG - Reorganise CCMS Alert Database
RSALERTTEST - Test alerts
RSAVGL00 - Table adjustment across clients
RSINCL00 - Extended program list
RSBDCSUB - Release batch-input sessions automaticly
RSCPINST - Check language installation config.

RSCSAUTH - Authorisations assigned to ABAP reports.

RSTXSCRP - Transport SAPscript files across systems
RGUGBR00 - Substitution/Validation utility
RSBCSRE03 - Clean old SOST records (send deliveries)
RSBDCOS0 - Execute Shell command (non-interactive).
RSBTCDEL - Clean the old background job records
RSDBCREO - Clean batch input session log
RSEIDOCA - Active IDoc Monitoring
RSORAPATCHINFO - Oracle patch level.
RSPARAM - Display all instance parameters
RSPO0041 - Removing old spooling objects
RSSNAPDL - Clean the old ABAP error dumps
RSSLG200 - Remove expired application logs (SLG1).
RSSLGK90 - Delete application logs created before the specified date.
RSSOREST - Clean old SOST records (send deliveries)
RSPO1041 - Cross-Client Spool Request Cleanup.
RSPOR_SETUP - BW / BI Configuration Consistency Check
RSTXICON - List all icons in the system
RSTRFCEG - Consistency check of outbound queues (RFC/trfc/qrfc).
RSUSR002 - Roles by complex selection (new)
RSUSR003 - Check the passwords of users SAP* and DDIC in all clients
RSUSR006 - List users last login
RSUSR020 - Profiles by complex selection
RSUSR050 - Compare users
RSUSR070 - Roles by complex selection
RSUSR200 - List users by login date.
RSWUWFML2 - Delivers work items via users e-mail address.
RSWWWIDE - Remove work items and related sub-items.
RSXMB_CANCEL_MESSAGES - Cancel messages in error.
RSXMB_CHECK_MSG_QUEUE - Check messages in queue.
RSXMB_SHOW_STATUS - Show statuses of XI messages in the persistence layer.
SSF_ALERT_CERTEXPIRE - Certificate expiry check.

Tuesday, June 14, 2011

Oracle vs SQL Server - I like it!

I found this today
It shows a good comparison of basic syntactical differences in SQL Server and Oracle when looking to compare performance / features.
I especially like the use of blockrecover in RMAN towards the end.  Nice.

Just take a look at the tags for it's hosting blog page and it's obvious which one will win without even reading the slide.

Useful SAP Transactions

Here are a collection of SAP transactions that I find useful:

AISUSER - Solution Manager maintain S-user for maint-opt.
AL08 - All Users Logged On
AL13 - Display Shared Memory
ALRTCATDEF - Alert category definitions
ALRTDISP - Alert display
ALTINBOX - Alert inbox
BDLSS - Shows all items associated with the logical system name.
BDM7 - ALE Audit Statistics
BD10 - Send Material Master
BD12 - Send Customer Master
BD14 - Send Vendor Master
BD22 - Reorganise (Delete) Change Pointers
BD87 - ALE Message Monitor
BF24 - Create Product (User exits)
BF34 - Register user exit event to function module.
CG3Z - Transfer file from frontend.
CMOD - List user exits
DB02 - Database Tables/Tablespace Statistics
DB13 - Database Planning Calendar
DB14 - Database Operations Log
DB26 - Database Parameters
DBCO - External Database Connections in SAP.
F.10 - GL Accounts
F110 - BACS Payments
FBL1N - List Invoice Documents
IDX1 - Port Maintenance In IDoc Adapter
IDX2 - Meta Data Overview in IDoc Adapter
IDX5 - XI IDoc search.
LISTSCHEMA - BW Lists underlying tables for InfoCubes.
MMRV - Show current posting period
OAAD - Archivelink Administration of Stored Documents
OAER - Business Document Navigator
OAM1 - ArchiveLink Monitor
OAC0 - Define content servers
OAWD - Archivelink Document Storage.
OB29 - Define Posting Periods (See also SPRO -> Fianncial Accounting -> Financial Accounting Global Settings -> Fiscal Year)
OMJ3 - Printer Determination by Plant.
OMJR - Printer Determination by Output Type.
OS01 - Ping servers
ORA_PERF - Analyse tables, delete statistics.
OY18 - Reports for table logging.
PA30 - Maintain HR Master
PFAL - HR ALE Transfer of Master Data.
PFCG - Role Maintenance
PFTC - Workflow task maintenance.
PFTC_CHG - Change a workflow task.
PFUD - Profile consistency (mass user comparison) check (schedule job).
RMPS_SET_SUBSTITUTE - Set Workflow substitute.
PPOME - HR Mini Master (Personnel Management)
PPOMW - HR Mini Master (Organisational Management)
RRMX - Launch BEX Analyser
RSA7 - BW Delta Queue Maintenance
RSCSAUTH  - Maintain authorisation groups.
RSM37 - Display background jobs with parameters.
RSRDA - Stop all daemons for real-time data acquisition in BW.
RZ01 - Job Scheduling Monitor.
RZ03 - Display/ Manully Switch Operation Mode (start/stop instance).
RZ04 - Maintain Operations Modes
RZ12 - Maintain RFC groups.
RZ21 - Monitoring Properties
RZ70 - SLD Administration
SA38 - ABAP Reporting
SAPL0ARC - Archiving Customising For Object.
SCCL - Local Client Copy.
SCEM - CATT initial screen.
SCU3 - Analyse table logging logs
SDCCN - Service Data Control Centre
SECR - AIS (Audit Information System)
SE03 - System Change Option
SE11 - Data Dictionary Display
SE21 - Package Builder.
SE61 - Change display screen text.
SE93 - Transaction lookup.
SE37 - ABAP Function Modules
SE54 - Generate Table Maintenance.
SBIW - Maintain Business Information Warehouse Data Sources
SCU0 - Customizing Cross System Viewer
SLDCHECK - Check status of the SLD
SLG1 - Analyse application log
SLG2 - Application Logs Delete
SMLG - Logon groups.
SMGW - Gateway monitoring
SMICM - ICM Monitor
SMQ1 - qRFC Monitor (Outbound Queue)
SMQ2 - qRFC Monitor (Inbound Queue)
SMQ3 - qRFC Monitor (Saved E-Queue)
SMQE - qRFC Administration
SMSY - SOLMAN - Landscape Directory
SMSY_SETUP - Solution Directory Setup
SMX - Own Jobs
SM18 - Security Audit Log Reorganisation
SM19 - Security Audit Log Configuration
SM20 - Security Audit Log Analysis
SM35 - Batch Input Log (LSMW etc)
SM58 - Asynchornous RFC Error Log
SM63 - Maintain Operation Modes Calendar
SM64 - Background Jobs Events.
SM65 - Background processing check.
SM66 - All processes across system.
SM69 - Maintain External Commands
SO36 - Automatic System Forwarding for E-Mails
SOA0 - ArchiveLink Document Types.
SOLMAN_WORKCENTER - Solution Manager Maintenance Optimizer.
SP12 - TemSe administration
SSAA - System Administrator Assistant
ST02 - SAP Buffer Tuning
ST03 - Global Workload Monitor
ST04 - Database Statistics
ST06 - Operating System Monitor
SUCU - Table / View Authorisations.
SUGR - User Group Maintenance
SWDD - Workflow Builder.
SWDM - Workflow Builder
SWI1 - Workflow Instance Log
SWE2 - Workflow linkages
SWEL - Display Event Trace
SWELS - Switch on/off event trace
SWEQADM - Event Queue Administration
SWEQBROWSER - Event queue browser
SWO1 - Business Object Builder
SWU0 - Workflow event simulation
SWU3 - Workflow customizing
SWUS - Test Workflow
SWWL - Delete workflow
SXMB_MONI - Integration Engine Monitoring
SXMB_MONI_BPE - Process Engine Monitoring
SXMB_IFR - Call Integration Builder
SXMB_ADM - Integration Engine Admin
SXI_CACHE - XI Directory Cache
SXI_MONITOR - XI Message Monitoring
TU02 - Change parameters
USMM - User Measurement Statistics.
WE02 - IDoc Display
WE07 - IDoc Monitoring
WE05 - IDoc List.
WE09 - IDoc Search.
WE11 - Delete IDocs
WE19 - IDoc Test Tool
WE20 - IDoc Partner Profiles
WE40 - Idoc administration setup
WE41 - Outbound Process codes.
WE42 - Inbound process codes.
WE46 - IDoc Admin
WE60 - IDoc Types Documentation
XK02 - Changer Vendor Details

Useful SAP Tables

During my time as a SAP BASIS administrator, I have accumulated a number of useful database tables.

ADR6 - User Address Data (cross with USR21)
AGR_1250 - Authorization data for the activity group
AGR_USERS - Assignment of roles to users
AGR_SELECT - Assignment of roles to Tcode
AGR_PROF - Profile name for role
AGR_DEFINE - Role definition
BDCPS - Change Pointers
DDLOG - Table buffer changes for sync between dialog instances.
DD02T - SAP Tables and descriptions.
DD09L - SAP Tables (with logging PROTOKOLL=X)
PA0105 - HR Communications InfoType (includes USERID to PERNO).
SH_PRIN - Printer list
TADIR - Repository object table.
TBRG - All Authorisation Groups.
TDDAT - Tables to Authorisation Groups.
TBTCP - Background Job Steps (see field AUTHCKNAM for job user)
TOLE - OLE Link table
USR02 - User passwords.
USR01 - Users.
USR04 - User Master Authorisations.
USR10 - Authorisation Profiles.
UST04 - Users to profiles.
TDEVC - Package table.
TSP03D - Printer Output Devices
TST03 - Spool requests
TSTC - Transactions
TVARV - Variant Variables
VARI - Variants

Monday, June 13, 2011

Use of Oracle AWR / ASH leading to bad coding?

I had a brief email exchange with another Oracle guru the other day.
He suggested that the quality of Oracle coding in PL*SQL and Plain Jane ( SQL had gone down hill.

This could be attributed to two factors:
1, The level of coding experience has dropped.  Older more experienced coders have filled into the new architect roles and the void is being filled quickly by newer in-experienced coders.
2, The rigour with which debugging, testing and tuning is performed has become somewhat lax ( ) because there's just no emphasis on the developers to tune their code when the DBA has such great tools to do it for them.

Is it possible that the use of the additionally licensed tools such as AWR (Automatic Workload Repository) and ASH (Active Session History) introduced in Oracle 10g, have provided an easy mechanism for DBAs to seek out better performance.
I don't think these tools are just for DBAs, but the way they are marketed makes me feel they are pushed that way.

Sunday, June 12, 2011

Running Oracle (Windows vs UNIX/Linux)

For most of my IT career I've been using UNIX/Linux (let's call this ULix to save my fingers) to run Oracle.
Humans are creatures of habit (like cats and so why would I want to change this.
Running Oracle on ULix is a measured quantity.  It works, it's reliable, you can tune it and then you can really tune it.

Whilst browsing my book library I came across a book I've had for years:
"Configuring & Tuning Databases on the Solaris Platform" by Allan N. Packer
If you have ever wondered what the possibilities are in tuning an Oracle system on ULix, then read this book.  It might be old, but it's a good one, and it establishes all the basic principles of tuning.

Now if we were to consider running Oracle on Windows (, then out-of-the-box ( it would work perfectly fine.

Microsoft have made vast improvements to the monitoring/tuning capabilities in newer Windows Server versions, but I don't think this can compare to the flexibility of monitoring/tuning in ULix.
I find it very difficult to profile a database on Windows.
Think about it, where can I see the individual shadow processes ( at the OS level?

On Tanel Poder's company's website, there's a great whitepaper "Understanding LGWR, Log File Sync Waits and Commit Performance".
In this whitepaper, Tanel mentions monitoring and tuning the Oracle log writer (LGWR) process to get better CPU time.  I thought to myself, how would someone do this on Windows?

I really don't think you can.
To summarise: I like Windows, it can run Oracle RDBMS perfectly fine, but it's not as flexible as running on ULix.

Friday, June 10, 2011

Basic Performance Tuning Guide - SAP NetWeaver 7.0 - Part I - Finding Slow Running Programs

For me, as a BASIS administrator, it's the most fun part of the job.  Performance tuning!
I enjoy making things faster, more efficient.  It seems German ( in some ways, and I guess that I must have a little OCD in me, but it's rewarding.

I can't believe how simple performance tuning can be sometimes.  If I write the process down it even seems simple:
1, Find slow running programs.
2, Analyse why they are slow or poor performers.
3, Tune them.
4, Cup of tea.
5, See them fly.

For those Lean Six Sigma followers, this fits in with DMAIC ( quite nicely.
I bet it would even look good on a fish bone diagram ( (sorry, I did mean "cause & effect diagram" honest).

I'm going to break these steps down.

Let's start with number 1:
Find Slow Running Programs
So you have your new SAP system all installed.
The consultancy that briskly implemented it and have now rolled-off, have left it running beautifully (
Why wouldn't they.
As an operations specialist (, you get called in.
Of course, you're an expert and you know that profiling a system is one of those tasks that needs to be done to help you troubleshoot performance problems.  But being an expert, you also know that there are bigger fish to fry after a fresh implementation of anything.
So we assume that it is not plausible to use a comparative profile of before and after performance.

Let's assume that a report/transaction has been reported as running slowly today by a single user (one helpdesk ticket
You're on the case and you can look at the problem right now.
What's you're first call?
Wrong!  Crying won't get you anywhere!
We need to see if the problem is caused by the system in general, the user (sometimes you do get them) or something else.
I always use the following methods:

1, Check SM66 (global work process overview). It's easier than checking SM50, then remembering that you're on a system with Apps servers (additional dialog instances).
What's currently running?
Look specifically at the "Time" and "Action" columns:

SM66 global work process overview

If there are any long running programs (large "Time" values) in the list, what "Action" are they performing?  Could it affect the report that the user has identified as being slow?

2, Check SM21 (System Log) always good to keep abreast of potential system wide issues!  Brownie points ( from the big boss for being so quick!
Any problems in here will also mention short dumps in ST22, so no need to check ST22 unless it's in the system log.

3, Check STAD (SAP Workload - Single Business Transactions).  This shows specific transactions that the user has completed in the <n>hours (after this they get aggregated and shown in ST03N/ST03G depending on your collector settings).
Use the "Show Business Transaction Tots" option, it's easier to see the wood for the trees (
Enter the username into the filter and adjust the time period to search for (when did the problem happen?).

 STAD single business transactions

Great!  You can see the individual transactions performed by the user (even ITS based services e.g. BSPs will be shown):

TIP: Try it without the filter on a small system to see what you can actually get.  It's quite a lot of information.  RFCs, background jobs, HTTP.

STAD business transaction analysis

Check out the "Response Time", "CPU time" and "DB req time" columns.
See if any specific transaction stands out above the rest for these three values.

If you expand the "+" folder on the very left, you will see each dialog step (screen) that the user has been through:

 STAD business transaction analysis output

The time gap between each screen is the previous screens response time + roll-time + front-end time + "Think Time".

So now you've found the slow program, you need to analyse it.
Part II of this guide shows how to read and decipher the single record statistics using STAD.
Part III of this guide shows how to perform an SQL Trace.

Are you sure I'm not a consultant

I'd like to mention the growing (in my mind anyway) disparity between an operational support person and a consultant.
Having spent some time in contact with various recruitment agencies, I can understand how single words specified on a requirements brief from the client must be followed as accurately as possible.
The client, after all, wants the perfect candidate don't they?!

I've worked on both sides of the coin you see.
I started off as an apprentice in an IT department in the late nineties.
Customer satisfaction is key when you're in an IT department working the helpdesk.  Even if you're second line (which infers a military comparison, doesn't it?, you may be rated on your comms skills, how efficient you were and how accurate (first time fix).
So I can understand that supporting something that is already installed and working would be viewed as operational support and not consultancy.

I've also worked for a consultancy, quite a big one.  I wasn't called a consultant, but if you work for a consultancy you must be? right?

But wait, doesn't an internal IT department upgrade stuff?  performance tune?  generate ideas?
Are we saying that IT departments just do these things without talking to the business side.  I think not.  I remember many meetings where new ideas had been conceived in the IT department and were conveyed to the business.
Is this not consulting? Advising what would be best for the business and the technology.  I don't know any IT department today that would contemplate changing anything without consulting with the business first.

So if we're saying that IT departments do consult, the difference between a consultant and operational support must be in the *type* of consulting right?
You could argue that a consultant has more "soft" skills.  Um, like a salesman perhaps?
Or would you argue that a consultant may have broader knowledge.  Possibly.  More of a fish eye? (,_master_of_none).
Hmm, I can't see how this is different either.  You are either very good at these things (IT skills) or you're not.
The only difference is when you consider a company wanting change, but not having the skills internally.  Wouldn't that force them to use an external consultancy?  Someone *else* to consult with.

I think a "consultant" today is someone who has managed to escape the IT department.  He's a free radical (  But without the other elements he's not got anything to bond with.  He's able to offer unbounded ideas and generally gets the biggest budget (and sometimes some training), but he's not able to evolve into anything more complex.  He came, he saw, he implemented, he left.  There's a bigger side to this implementation lark, and it happens to be operational support!
Hurrah to the helpdesk operators, the backup tape changers and the email mailbox size limiting people (

Some of the best, most talented people I've ever worked with have been working in operational support.  Fantastic, creative ideas ingeniously devised in the smallest of rooms at the eleventh hour ( to stop the world ending (or the system from crashing) on a system that was implemented 20+ years ago.  How can these people not be consultants?  Maybe they're too honest ;-)