Pre-Con #SQLSAT130 Notes w/ @kekline

Today I attended Kevin Kline’s [blog|Twitter]  Pre-Con for SQL Saturday 130 in Jacksonville. Below are my unedited, raw notes from the seminar.

Troubleshooting & Performance Tuning Microsoft SQL Server

http://bit.ly/kekline – scripts galore!

This session is about finding problems, and then acting upon it!

– Monitoring

-Benefits of Benchmarking and Baselining

-Preventative Maintenance

**Where do DB problems come from? Hardware Problems 5%; Design Problems 15%; SQL Code Problems 80%

**Rocks, Gravel and Sand Method

Check the stupid stuff FIRST – EVERY TIME!

Process of elimination provides a strong, linear curve of fixes

The phone is not a good indicator of customer satisfaction – not being proactive!

Use quantitative, real information, actionable information

First place to check: Error Logs, save time first! Windows Application Event Viewer, then SQL Server Error Log and SQL Server Agent Log.

**Acting Upon Error Logs

Error Message

Severity Level 1-18 Informational, about 20 serious problem

What happens if there is no error message?

SQL Server Agent – New Agent – Configure agent to notify alerts over level 18-25

Coffee Break!

sqlcmd Utility

:connect server_name

download PowerGui for a GUI based powershell app

Windows Performance Monitor – Benefits: Shows the rate of resource consumption or activity in a wide variety of areas of the server

  • Disk IO; Memory; Network
  • SQL Server activity – locking, blocking, and deadlocking

PerfMon – DMV sys.dm_os_performance_counters

Select cntr_value

from sys.dm_os_performance_counters

where object_name = ‘sqlserver:buffer manager’

and counter_name = ‘page life expectancy’

How busy are my CPUs? System Processor Queue Length – Very Useful!

Disk I/O probably the number one bottle neck for SQL Server

W/ SSD Drives, Fragmentation is going to be a thing of the past – because with traditional HDs the disk appeture must move to get data, when data becomes fragmentated. W/ SSD this no longer occurs.

Check the following perfmon counters: logical disk – avg. disk sec reads/writes

we want these to be <5ms on a transaction log drive and <10ms on a data file drive

-10-20ms is generally acceptable

->20ms is considered to be unacceptable

Most people forget about TempDB

TempDB is a mjor source of contention on many multi-database system

Run PerfMon counters against TempDB as you would any other production database

Fill Factor – sp_configure set to say 70%  to avoid Page Splits: Number of 8k pages that filled and split into two new pages. Should be <20 per 100 batch requests/sec

Creating a Fill Factor of 70% would cause 30% more pages – causing more/higher I/O – something to note.

CodePlex – SQL Server 2008 Extended Events SSMS addin

http://www.extendedeventmanager.codeplex.com

**Server-side traces/SQL Profiler

Warning, Profiler can be overwhelmed by a high throughput

Server-side trace = GUI-less

Saving Profiler to a file allows you to import into PerfMon – then you can visually see when the CPU Spikes, you can see what script is running to cause it

Start following these blogs:

Glenn Berry

http://www.SQLSkills.com

http://www.SQLCat.com

Lunch!

**What About DMVs?

select * from sys.dm_os_performance_counters

Examples might include:

Network Interface – Bytes Total/Sec

Processor – % Processor Time

SQLServer: Databases Application Database  – Batch Requests/Sec

-General Statisitc – User Connections

-Latches – Average Latch Wait Time

-Locks Average Wait Time

What is a Latch? Short term, light weight locks on specific addresses in memory. IF you see a lot of Latches, you may have memory issues. It doesnt mean you dont have enough memory, just something is wrong with it.

Troubleshooting Sieve – Perfmon, SQL Profiler and Wait Stats

Wait Stats, its all about bottlenecks

Wait Stats – analogy a drive thru window. Think of a CPU as the drive thru window. A process is assigned a SPID.

SQL is waiting to services the process – goes into suspended (Resource Waits)

Runnable (Signal Waits) process ready to be run

Statuses a SPID will have, Running, Sleeping, Suspended, SPIDS reserved for 50 and under are system related.

SQL SERVER 2k5 has 230 Wait Types

SQL SERVER 2k8 has 475 Wait Types

sys.dm_os_wait_stats

-wait information

-cumulative by wait type

-persistent data

-transient data

Wait stat values can be cleared with: (w/0 restarting sql server)

dbcc sqlperf (‘sys.dm_os_wait_stats’, clear);

sys.dm_os_waiting_tasks (how much time has a SPID been waiting and why?)

PAGELATCH_SH (Shared) or IX (Intent) EX (Exclusive)

LATCH_XX commonly arise from contention on resources other than buffer pool, especially due to heaps or text data types

CXPACKET – Query parallelism due to splitting and merging overhead – how much time SQL takes to split threads across multiple CPU/CPU Cores

MAX DOP (Degrees of Parallelism) default is 0 set in sp_configure which means -1 cpu

COST THRESHOLD for Parallelism

**IO Bottlenecks

WRITELOG – writing transaction to the log on disk

PAGEIOLATCH_XX – respresent memory to disk transfers

IO_COMPLETION – Awaiting I/O task completion

SLEEP_BPOOL_FLUSH – Checkpoint IO throttling

select * from sys.dm_db_index_usage_stats – find out good information about indexes

http://www.SQLBLOG.com – Adam Machanic sp_whoisactive

**Storage Quick Check

sys.dm_io_virtual_file_stat – which of the files are getting used – how fast

sys.dm_io_pending_io_requests – who is waiting to get work done?

RAID10 will provide the best performance and availability

backups are single threaded, cannot take advantage of Parallelism

sys.dm_exec_query_stats – what is in the cache and what resources are being consumed

http://www.SQLBits.com – Disk IO Tuning for MS SQL SERVER Newbie

Raid5 is good with Reads

W/ Raid5 three disks required, one disk dies, we are good.

Raid1 two disks, one for one (Mirrored) one disk dies, not good

Raid10 one plus 0, mirrored and stripped

Disk Sector Alignment can yield  up to 40% improvement in some scenarios on pre-w2k systems Google Jimmy May – Disk Sector Alignment

SET NOCOUNT ON – allows you to suppress the message for all subsequent transactions in your session, until you issue the SET NOCOUNT OFF command

Make this your first line of code in any stored procedure

9999 rows into the sales table

finished in 5.1 seconds (5176 milliseconds) with SET NOCOUNT OFF

finished in 1.6 with SET NOCOUNT ON

Select …

From…

Where x = ? OR y = ?

This query was taking 29% of the cost

replaced y = ? w/ UNION ALL and broke one query into two. reduced cost

**Controlling and Monitoring Plan Caching

Warm Cache and Cold Cache Testing

DBCC FreeProcCache – Warm Cache Test

DBCC FlushProcInDB(dbid)

DBCC DropCleanBuffers – Cold Cache Testing

When Query Tuning, Watch out for…

-Execution plan red flags, such as: bookmark looups, key lookups and rid lookups

-table spools

-table scan (When a useful index exists)

-parallelization operations

SQLSentry – free tool Plan Explorer good alternative to Execution Plan

Non-Clustered index constructed in a B-tree

Clustered index –

Concatinated Indexes are column specific

LName, Fname, emp-id

where lname = ‘white’ good

where fname = ‘allan’ bad  (cannot use index because it is a concatinated index and the lname column needs to be referenced first)

Summary

Is anything broken? Error Logs

Is the problem inside or outside of SQL Server? PerfMon

Whats our Bottleneck inside of SQL Server? Wait Stats

Root-cause Analysis? DMVs

We also covered related topics:

Monitoring

Benefits of benchmarking and baselining

 

Leave a comment