Today I attended Kevin Kline’s [blog|Twitter] Pre-Con for SQL Saturday 130 in Jacksonville. Below are my unedited, raw notes from the seminar.
Troubleshooting & Performance Tuning Microsoft SQL Server
http://bit.ly/kekline – scripts galore!
This session is about finding problems, and then acting upon it!
– Monitoring
-Benefits of Benchmarking and Baselining
-Preventative Maintenance
**Where do DB problems come from? Hardware Problems 5%; Design Problems 15%; SQL Code Problems 80%
**Rocks, Gravel and Sand Method
Check the stupid stuff FIRST – EVERY TIME!
Process of elimination provides a strong, linear curve of fixes
The phone is not a good indicator of customer satisfaction – not being proactive!
Use quantitative, real information, actionable information
First place to check: Error Logs, save time first! Windows Application Event Viewer, then SQL Server Error Log and SQL Server Agent Log.
**Acting Upon Error Logs
Error Message
Severity Level 1-18 Informational, about 20 serious problem
What happens if there is no error message?
SQL Server Agent – New Agent – Configure agent to notify alerts over level 18-25
Coffee Break!
sqlcmd Utility
:connect server_name
download PowerGui for a GUI based powershell app
Windows Performance Monitor – Benefits: Shows the rate of resource consumption or activity in a wide variety of areas of the server
- Disk IO; Memory; Network
- SQL Server activity – locking, blocking, and deadlocking
PerfMon – DMV sys.dm_os_performance_counters
Select cntr_value
from sys.dm_os_performance_counters
where object_name = ‘sqlserver:buffer manager’
and counter_name = ‘page life expectancy’
How busy are my CPUs? System Processor Queue Length – Very Useful!
Disk I/O probably the number one bottle neck for SQL Server
W/ SSD Drives, Fragmentation is going to be a thing of the past – because with traditional HDs the disk appeture must move to get data, when data becomes fragmentated. W/ SSD this no longer occurs.
Check the following perfmon counters: logical disk – avg. disk sec reads/writes
we want these to be <5ms on a transaction log drive and <10ms on a data file drive
-10-20ms is generally acceptable
->20ms is considered to be unacceptable
Most people forget about TempDB
TempDB is a mjor source of contention on many multi-database system
Run PerfMon counters against TempDB as you would any other production database
Fill Factor – sp_configure set to say 70% to avoid Page Splits: Number of 8k pages that filled and split into two new pages. Should be <20 per 100 batch requests/sec
Creating a Fill Factor of 70% would cause 30% more pages – causing more/higher I/O – something to note.
CodePlex – SQL Server 2008 Extended Events SSMS addin
http://www.extendedeventmanager.codeplex.com
**Server-side traces/SQL Profiler
Warning, Profiler can be overwhelmed by a high throughput
Server-side trace = GUI-less
Saving Profiler to a file allows you to import into PerfMon – then you can visually see when the CPU Spikes, you can see what script is running to cause it
Start following these blogs:
Glenn Berry
Lunch!
**What About DMVs?
select * from sys.dm_os_performance_counters
Examples might include:
Network Interface – Bytes Total/Sec
Processor – % Processor Time
SQLServer: Databases Application Database – Batch Requests/Sec
-General Statisitc – User Connections
-Latches – Average Latch Wait Time
-Locks Average Wait Time
What is a Latch? Short term, light weight locks on specific addresses in memory. IF you see a lot of Latches, you may have memory issues. It doesnt mean you dont have enough memory, just something is wrong with it.
Troubleshooting Sieve – Perfmon, SQL Profiler and Wait Stats
Wait Stats, its all about bottlenecks
Wait Stats – analogy a drive thru window. Think of a CPU as the drive thru window. A process is assigned a SPID.
SQL is waiting to services the process – goes into suspended (Resource Waits)
Runnable (Signal Waits) process ready to be run
Statuses a SPID will have, Running, Sleeping, Suspended, SPIDS reserved for 50 and under are system related.
SQL SERVER 2k5 has 230 Wait Types
SQL SERVER 2k8 has 475 Wait Types
sys.dm_os_wait_stats
-wait information
-cumulative by wait type
-persistent data
-transient data
Wait stat values can be cleared with: (w/0 restarting sql server)
dbcc sqlperf (‘sys.dm_os_wait_stats’, clear);
sys.dm_os_waiting_tasks (how much time has a SPID been waiting and why?)
PAGELATCH_SH (Shared) or IX (Intent) EX (Exclusive)
LATCH_XX commonly arise from contention on resources other than buffer pool, especially due to heaps or text data types
CXPACKET – Query parallelism due to splitting and merging overhead – how much time SQL takes to split threads across multiple CPU/CPU Cores
MAX DOP (Degrees of Parallelism) default is 0 set in sp_configure which means -1 cpu
COST THRESHOLD for Parallelism
**IO Bottlenecks
WRITELOG – writing transaction to the log on disk
PAGEIOLATCH_XX – respresent memory to disk transfers
IO_COMPLETION – Awaiting I/O task completion
SLEEP_BPOOL_FLUSH – Checkpoint IO throttling
select * from sys.dm_db_index_usage_stats – find out good information about indexes
http://www.SQLBLOG.com – Adam Machanic sp_whoisactive
**Storage Quick Check
sys.dm_io_virtual_file_stat – which of the files are getting used – how fast
sys.dm_io_pending_io_requests – who is waiting to get work done?
RAID10 will provide the best performance and availability
backups are single threaded, cannot take advantage of Parallelism
sys.dm_exec_query_stats – what is in the cache and what resources are being consumed
http://www.SQLBits.com – Disk IO Tuning for MS SQL SERVER Newbie
Raid5 is good with Reads
W/ Raid5 three disks required, one disk dies, we are good.
Raid1 two disks, one for one (Mirrored) one disk dies, not good
Raid10 one plus 0, mirrored and stripped
Disk Sector Alignment can yield up to 40% improvement in some scenarios on pre-w2k systems Google Jimmy May – Disk Sector Alignment
SET NOCOUNT ON – allows you to suppress the message for all subsequent transactions in your session, until you issue the SET NOCOUNT OFF command
Make this your first line of code in any stored procedure
9999 rows into the sales table
finished in 5.1 seconds (5176 milliseconds) with SET NOCOUNT OFF
finished in 1.6 with SET NOCOUNT ON
Select …
From…
Where x = ? OR y = ?
This query was taking 29% of the cost
replaced y = ? w/ UNION ALL and broke one query into two. reduced cost
**Controlling and Monitoring Plan Caching
Warm Cache and Cold Cache Testing
DBCC FreeProcCache – Warm Cache Test
DBCC FlushProcInDB(dbid)
DBCC DropCleanBuffers – Cold Cache Testing
When Query Tuning, Watch out for…
-Execution plan red flags, such as: bookmark looups, key lookups and rid lookups
-table spools
-table scan (When a useful index exists)
-parallelization operations
SQLSentry – free tool Plan Explorer good alternative to Execution Plan
Non-Clustered index constructed in a B-tree
Clustered index –
Concatinated Indexes are column specific
LName, Fname, emp-id
where lname = ‘white’ good
where fname = ‘allan’ bad (cannot use index because it is a concatinated index and the lname column needs to be referenced first)
Summary
Is anything broken? Error Logs
Is the problem inside or outside of SQL Server? PerfMon
Whats our Bottleneck inside of SQL Server? Wait Stats
Root-cause Analysis? DMVs
We also covered related topics:
Monitoring
Benefits of benchmarking and baselining