SQL Server: Why is my query slow in production, but not when I run it in Management Studio? Behind this question lies one of the biggest query-optimization problems. Keep in mind that the database engine is adaptive: it adjusts to the available resources and to the volume of data in the tables. In most cases this is beneficial. Who would want to use 8 processors to run a query on a server already at 100% CPU? But that is generally not the cause of the phenomenon described above. Indeed, even when server activity is quiet and the same query with the same parameters is run a few seconds apart from your application and from Management Studio, the behavior changes. The reason is that an ADO.NET application and Management Studio do not share the same connection settings, and that is the origin of the problem. You can see those settings with:
select * from sys.dm_exec_sessions

Among others, the following columns:

quoted_identifier, arithabort, ansi_null_dflt_on, ansi_defaults, ansi_warnings, ansi_padding, ...

are considered such connection settings. SQL Server memorizes some of them along with the query's execution plan. As a result, a cached plan is reused only if the settings match. The following query will therefore produce two different execution plans:
SET ANSI_NULLS ON
SELECT * FROM MaTable WHERE MaCol = NULL
SET ANSI_NULLS OFF
SELECT * FROM MaTable WHERE MaCol = NULL

Here the problem becomes visible if a large number of rows store NULL in the MaCol column and an index exists on that same column: the two plans will then be very different from each other. The issue is twofold. On the one hand, the connection settings differ between the two applications (ADO.NET and Management Studio); on the other hand, depending on its complexity, the query can be compiled into several different execution plans. One fix is to remove the faulty plan (the slower of the two), for example by flushing the procedure cache (DBCC FREEPROCCACHE), but this is only temporary: the problem will come back sooner or later. The durable fix is to tell SQL Server that the query we are working on is likely to generate different execution plans depending on context, and simply ask it to build a fresh plan on every execution:
SELECT * FROM MaTable WHERE MaCol = NULL
OPTION ( RECOMPILE )

Or, in the case of a stored procedure, add WITH RECOMPILE to its definition. Happy troubleshooting...
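As a side note, you can also observe the phenomenon directly: the SET options in effect at compile time are stored with each cached plan and exposed through sys.dm_exec_plan_attributes. A minimal sketch (MaTable is the example table from above; the set_options value is a bitmask, and the ARITHABORT difference between SSMS and ADO.NET is a classic culprit):

```sql
-- Sketch: list cached plans with the SET options bitmask stored at
-- compile time. Two sessions with different options will produce two
-- distinct rows for the same query text.
SELECT st.text      AS query_text,
       cp.usecounts,
       pa.value     AS set_options   -- bitmask of SET options
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) AS pa
WHERE pa.attribute = 'set_options'
  AND st.text LIKE '%MaTable%';      -- filter on the query in question
```

If the same statement shows up twice with different set_options values, you are looking at exactly the duplicate-plan situation described above.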
Troubleshooting Slow Servers

Missing indexes, an inadequate storage I/O subsystem, or a slow network are only some of the possible reasons why a SQL Server database engine might slow down, which is why finding the true cause of a performance bottleneck is vital. Resolving these problems might seem difficult, but if you have the right tools available, they can be easy enough to fix so that performance will improve.
Denny Cherry | Apr 21, 2012
When troubleshooting a performance problem within a SQL Server database instance, there are a variety of places that you can look to determine its root cause. Some of the biggest performance bottlenecks usually boil down to several causes:
Missing indexes
Poor index design
Poorly designed database schema
Inadequate storage I/O subsystem
Buffer pool too small
Slow network
Wrong technology used
Resolving these problems might seem difficult, but if you have the right tools available, they can be easy enough to fix so that performance will improve.

Missing Indexes
Disk performance problems and an increased number of deadlocks are some of the potential indicators that you might have indexes missing. One way to easily determine if there are missing indexes is to use two dynamic management views (DMVs): sys.dm_db_missing_index_details and sys.dm_db_missing_index_columns. Between these two DMVs, you can easily identify many of the indexes that need to be created to improve performance. However, there's a catch: The system doesn't understand that indexes already in existence might only need another column added to their included columns list. So, SQL Server might recommend that another entire index be created when all that's needed is a new column in the existing index. Because of this, you shouldn't just take the information from the DMVs and create all the specified indexes. The reported indexes should be compared against the indexes that already exist in the database to ensure that duplicate indexes aren't being created. Duplicate indexes add overhead to every insert, update, and delete, causing performance problems. A better approach is to use a query like that in Listing 1 to identify the indexes that are missing from the database on which the query is being run. This query uses the db_id() system function to restrict the output to that from the current database. So, to run it in your database, simply specify that database in the Available Databases drop-down box in SQL Server Management Studio (SSMS) or add a USE statement before the query.

-- Listing 1: Identifying missing indexes in the current database
SELECT a.avg_user_impact * a.avg_total_user_cost * a.user_seeks,
       db_name(c.database_id),
       OBJECT_NAME(c.object_id, c.database_id),
       c.equality_columns,
       c.inequality_columns,
       c.included_columns,
       c.statement,
       'USE [' + DB_NAME(c.database_id) + ']; CREATE INDEX mrdenny_'
         + replace(replace(replace(replace(ISNULL(equality_columns, '')
             + ISNULL(c.inequality_columns, ''), ', ', '_'), '[', ''), ']', ''), ' ', '')
         + ' ON [' + schema_name(d.schema_id) + '].['
         + OBJECT_NAME(c.object_id, c.database_id) + '] ('
         + ISNULL(equality_columns, '')
         + CASE WHEN c.equality_columns IS NOT NULL
                 AND c.inequality_columns IS NOT NULL THEN ', ' ELSE '' END
         + ISNULL(c.inequality_columns, '') + ') '
         + CASE WHEN included_columns IS NOT NULL
                THEN 'INCLUDE (' + included_columns + ')' ELSE '' END
         + ' WITH (FILLFACTOR=70, ONLINE=ON)'
FROM sys.dm_db_missing_index_group_stats a
JOIN sys.dm_db_missing_index_groups b ON a.group_handle = b.index_group_handle
JOIN sys.dm_db_missing_index_details c ON b.index_handle = c.index_handle
JOIN sys.objects d ON c.object_id = d.object_id
WHERE c.database_id = db_id()
ORDER BY DB_NAME(c.database_id),
         ISNULL(equality_columns, '') + ISNULL(c.inequality_columns, ''),
         a.avg_user_impact * a.avg_total_user_cost * a.user_seeks DESC
When reviewing the output from the query in Listing 1, look for queries that have a high value in the first column. The higher the number, the more savings that will be seen by adding an index.

Poor Index Design
Poor index designs for database tables will usually manifest as slow running queries and queries that have a high execution plan cost. The easiest way to identify these problems is to query the procedure cache, because queries' execution plans remain in the cache as long as there is enough memory to store them. Listing 2 shows a sample query that looks for queries with a high total cost. When reviewing the output from Listing 2, look at the StatementSubTreeCost column. The higher the number, the more expensive the execution plan.

-- Listing 2: Finding high-cost execution plans in the procedure cache
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT query_plan AS CompleteQueryPlan,
       n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') AS StatementSubTreeCost,
       dm_ecp.usecounts
FROM sys.dm_exec_cached_plans AS dm_ecp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS dm_eqp
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qp(n)
ORDER BY n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') DESC
GO
In SQL Server 2012, you can set up an Extended Events session to capture the execution plans for queries. However, it isn't recommended that Extended Events be used to capture execution plans in high-load systems, as doing so could overload the system by 21% or more by capturing the showplan_xml column. For more information on how to properly design indexes, see SQL Server 2008 Query Performance Tuning Distilled (Apress, 2009) by Grant Fritchey and Sajal Dam. Another book you might want to check out is Expert Performance Indexing for SQL Server 2012 (Apress) by Jason Strate and Ted Krueger, which is scheduled to be released in May 2012.

Poorly Designed Database Schema
One of the biggest killers of database performance is a poorly designed database schema. Evidence of a poorly designed database schema can be seen when lots of I/O requests occur while queries are being run. The easiest way to look for this evidence is to use SQL Server Profiler to capture the queries that have the highest I/O load, then run those queries in SSMS with SET STATISTICS IO enabled. Between the table I/O outputs and the execution plans, you'll get a good indication of which table is having a performance problem.
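As an example of the SSMS step, the check might look like the following sketch (dbo.SalesHistory and the query itself are hypothetical placeholders for whatever statement Profiler flagged as I/O-heavy):

```sql
-- Sketch: measure the I/O a suspect query generates.
-- dbo.SalesHistory is a hypothetical table used for illustration.
SET STATISTICS IO ON;

SELECT CustomerID, SUM(OrderTotal) AS TotalSales
FROM dbo.SalesHistory
WHERE OrderDate >= '20120101'
GROUP BY CustomerID;

SET STATISTICS IO OFF;
-- The Messages tab then reports, per table, the logical reads,
-- physical reads, and read-ahead reads the statement performed.
```

A table that shows a disproportionate number of logical reads in that output is a good candidate for the schema review described below.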
Fixing Schema Problems
After the problematic table has been identified, fixing its schema problem is the next step. Sadly, fixing the problem usually isn't as simple as finding the problem. The table might need to be normalized or denormalized, depending on the table's design. Or perhaps the table needs to be vertically partitioned because there's too much data in it. Conversely, maybe the vertical partitioning was taken a little too far. In case you're unfamiliar with vertical partitioning, it's a technique in which a single table is broken into multiple tables, often using a date as the breakpoint. For example, instead of putting the sales history for every year into a single table, you could put the sales history for each year into a separate table.

Inadequate Storage I/O Subsystem
The most obvious way to determine whether a SQL Server database instance is having storage performance problems is to look in the SQL Server ERRORLOG file. A message about slow I/O in the ERRORLOG file is a good indicator that there's something wrong with the storage I/O subsystem. However, the absence of such errors doesn't rule out the storage I/O subsystem as the cause of a performance bottleneck; it just means that your storage isn't so ridiculously overtasked that it can't keep up. A quick way to look for I/O problems is to download Adam Machanic's free sp_WhoIsActive stored procedure. When you run sp_WhoIsActive, it will give you the current wait type of all the currently running processes in the system. Look for wait types that report I/O as being the problem. Another way to look for I/O problems is to query the sys.dm_os_wait_stats DMV with code such as

SELECT * FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC
If the first few rows returned from this query include some I/O wait types, then a slow I/O subsystem might be the problem. You might have noticed that I said that these wait types might indicate a slow I/O problem and not that they do indicate a slow I/O problem. That's because a poorly indexed database server can also report I/O problems when the I/O subsystem is perfectly sized for the application, but you're asking the I/O subsystem to do a lot more work than it should be doing. In other words, the I/O problem is actually the result of an index problem. To resolve it, you need to add indexes to the system. Because of this, whenever I look at a new system in which the client wants to upgrade the hard drives, I always start with the indexes, because that might very well fix the problem.

After the problem has been identified as actually being a storage I/O subsystem problem, there are a few directions that you can go to fix it. For large reporting databases, table partitioning might help. There are two different techniques that you can consider: partitioning for partition elimination and partitioning for CAPEX savings. No matter which approach is taken, a solid understanding of the data and how it's used is required.

Partitioning for partition elimination. With this technique, the goal is to reduce the amount of data that SQL Server needs to scan through when doing table scans. Typically, this is done by using a static window approach, with many partitions created in the system. For example, you could put the data for each month into its own partition. This way, when users run a query, SQL Server needs to scan through only one partition instead of going through all the data.

Partitioning for CAPEX savings. With this technique, the goal is to put the most frequently used data on really fast disks, such as flash drives, and put the less frequently used data on slower, less expensive disks, such as Serial ATA (SATA) disks.
This often makes flash drives a much more affordable option, because you only need a small number of them. In either case, the same techniques are used to create the partition scheme and function, as discussed in "Data Warehousing: Horizontally Partitioning the Fact Table." No matter which technique you use, you must have a complete understanding of how the data is used to successfully set up table partitioning. Another way to fix an inadequate storage I/O subsystem is to simply buy new hardware that can handle the increased workload. Although this is probably the easiest solution to implement, getting management to sign off on the purchase of new hardware might not be that easy. Several vendors offer storage solutions that can be configured to store data on multiple tiers on a single volume over different speed disks. This gives the same basic benefit as partitioning for CAPEX savings but without needing to dive into how the table needs to be partitioned. The storage array automatically moves the data that's most often used to the fastest disks, which leaves the stale or idle data on the slower, less expensive disks.

Buffer Pool Too Small
The buffer pool is where SQL Server stores data pages that it has read or written to recently. A small buffer pool will generally cause performance problems, because it will put too much pressure on the disk subsystem. There are a couple of different performance monitor counters that you can look at to see if the buffer pool size is healthy. Both counters are located within the SQLServer:Buffer Manager performance counter object. The first counter to look at is Database Pages. This counter tells you how much memory is currently allocated to the buffer pool. This counter is in pages, so to get it into gigabytes, multiply the value by 8, then divide by 1,048,576.
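The same counters can also be read from within SQL Server itself instead of Performance Monitor. A minimal sketch, including the pages-to-gigabytes conversion described above (the object_name prefix varies with the instance name, hence the LIKE):

```sql
-- Sketch: read the buffer manager counters directly from SQL Server
-- and convert Database Pages (8KB pages) into gigabytes.
SELECT counter_name,
       cntr_value,
       CASE WHEN counter_name = 'Database pages'
            THEN cntr_value * 8.0 / 1048576   -- pages -> KB -> GB
       END AS buffer_pool_gb
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name IN ('Database pages', 'Page life expectancy');
```

For example, a Database Pages value of 524,288 works out to 524,288 * 8 / 1,048,576 = 4GB of buffer pool.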
The second counter to look at is Page Life Expectancy. This counter tells you how long, in seconds, SQL Server expects that you'll be keeping the data in memory before releasing the page so that additional data can be loaded into memory. There is no magic number that says that the buffer pool is healthy. There's a myth floating around the Internet that says a Page Life Expectancy of 300 means that the system is healthy. However, this isn't the case. On a system that has 100MB of RAM, a Page Life Expectancy of 300 seconds might be a good number to go by. However, a Page Life Expectancy of 300 on a system that has 256GB of RAM would be a disaster. Fixing a small buffer pool requires adding more memory to the server and configuring SQL Server to use the additional memory. The only other way to increase the amount of data that SQL Server can load from one database into the buffer pool is to move databases from one server to another so that the remaining databases have access to a higher percentage of the database engine's buffer pool.

Slow Network
Another potential server bottleneck is the network. The easiest way to identify this problem is to look at the wait statistics on the system. A wait type of ASYNC_NETWORK_IO indicates that the network between SQL Server and the client computers is having a performance problem. If the client computers are located at a remote site, this could be the cause of the ASYNC_NETWORK_IO wait types. If all the client computers are local, ASYNC_NETWORK_IO wait types might be the result of an older version of SQL Server or an older network switch being used. A slow or misconfigured network port could also be the cause of ASYNC_NETWORK_IO wait types. For example, having a network port configured for 10/half duplex could cause performance problems. In addition, there could be performance problems if a network port is configured to auto-detect the network speed, but the network switch and server don't successfully negotiate the fastest possible network speed.
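A quick way to quantify this is to pull just that wait type out of sys.dm_os_wait_stats and compare it against the total wait time; a sketch:

```sql
-- Sketch: how much of the total wait time is ASYNC_NETWORK_IO?
SELECT w.wait_type,
       w.wait_time_ms,
       w.waiting_tasks_count,
       w.wait_time_ms * 100.0
         / (SELECT SUM(wait_time_ms) FROM sys.dm_os_wait_stats) AS pct_of_total
FROM sys.dm_os_wait_stats AS w
WHERE w.wait_type = 'ASYNC_NETWORK_IO';
```

Keep in mind that this wait type is also raised when the client application is slow to consume the result set, so a high percentage does not automatically implicate the network hardware.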
Fixing a slow network isn't something that a DBA will be able to handle alone. It will require a coordinated effort between the network administrator, the network provider if there's a WAN involved, and the DBA.

Wrong Technology Used
Some database performance problems can be traced back to simply using the wrong technology to get the job in question done. Usually this revolves around running large reports against production OLTP databases or databases that have an OLTP schema. Many of these reports end up doing large, complex aggregations. Although SQL Server can do the aggregations, it isn't the best technology to use for them. For large, complex aggregations, SQL Server Analysis Services (SSAS) is often the more appropriate tool because SSAS pre-aggregates the data when the databases are loaded. As a result, when a query asks for aggregated data, it has already been aggregated and the results can simply be returned instead of being processed.

The Cost of Poor Performance
There are many reasons why SQL Server database engines can slow down. It could be a database design problem, system design problem, infrastructure design problem, or code problem. No matter the reason, it's important to realize that slow SQL Server database engines can have more serious consequences than just people complaining. Customers might not get their orders placed. Application users might be spending minutes instead of seconds waiting for screens to load. Users might be waiting hours instead of minutes for reports to run. All of these things will cost the company money -- and nothing will cause a company to want to get its database performance problems resolved more quickly than being able to quantify the money being lost because of slow performance.
Why is my database server crawling? An article by zinzineti
If the server is crawling, there's a performance problem. Slowness = performance problem. How do we identify the source(s) of the slowness?

/**********************************************************************
-- Description : Diagnosing performance problems
-- Author      : Etienne ZINZINDOHOUE
**********************************************************************/

--> What are the sources of the slowness?
==============================================================

SELECT TOP 20
    wait_type AS [Wait Type]                       -- name of the wait type
   ,wait_time_ms / 1000 AS [Total Wait Time (s)]   -- wait time (in seconds)
   ,waiting_tasks_count AS [Wait Count]            -- number of waits of this type; incremented at the start of each wait
   ,CONVERT(DECIMAL(12,2), wait_time_ms * 100.0 / SUM(wait_time_ms) OVER()) AS [% Wait Time]
FROM sys.dm_os_wait_stats
ORDER BY [% Wait Time] DESC;
On my test server I get the following results:
--> Results
==============================================================

The observed sources of slowness:

SQLTRACE_BUFFER_FLUSH: occurs when a task is waiting for a background task to flush the trace buffers to disk every four seconds.

LAZYWRITER_SLEEP: occurs when lazy-writer tasks are suspended. It measures the time spent by background tasks that are waiting. Do not consider this state when looking for user blocking.

RESOURCE_SEMAPHORE: occurs when a query-memory request cannot be granted immediately because of other concurrent queries. High wait times can indicate too many concurrent queries or excessively large memory-grant requests.

SLEEP_TASK: occurs when a task sleeps while waiting for a generic event to occur.

ASYNC_NETWORK_IO: occurs on network writes when the task is blocked behind the network. Verify that the client is consuming the data from the server.

PAGEIOLATCH_SH: occurs when a task waits on a latch for a buffer that is in an I/O request. The latch request is in shared mode. Long waits can indicate problems in the disk subsystem.

SOS_SCHEDULER_YIELD: occurs when a task voluntarily yields the scheduler so that other tasks can run. During this wait, the task is waiting for its quantum to be renewed.
BROKER_TASK_STOP: occurs when the Service Broker queue task handler tries to shut down the task. The state check is serialized and must first be in a running state.

--> Analyzing the results
==============================================================

The longest wait time is for the SQLTRACE_BUFFER_FLUSH type, which means there's a disk I/O problem, that is, disk resources are not available. This is confirmed by the presence of the PAGEIOLATCH_SH and ASYNC_NETWORK_IO types. Now, the main causes of waits for disk resources are generally problems with:
* virtual memory (insufficient, ...)
* indexes (fragmentation, missing indexes, ...)
* database file growth (tempdb, ...)
* poorly performing queries (INSERT, UPDATE, ...)
* or any other activity that drives hard-disk activity

Notes:
1) The statistics are not kept across a SQL Server restart. /!\ The statistics are therefore lost when the SQL Server services restart /!\
2) The command DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR); resets the sys.dm_os_wait_stats view, i.e. sets all the counters back to zero.

--> So which databases read and write the most logical pages?
==============================================================

Logical page reads and writes, since pages are first written in memory before being written to disk.

SELECT TOP 20
    SUM(total_logical_reads + total_logical_writes) AS [Total IO]
   ,SUM(total_logical_reads + total_logical_writes) / SUM(qs.execution_count) AS [Average IO]
   ,DB_NAME(qt.dbid) AS [DatabaseName]
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
GROUP BY DB_NAME(qt.dbid)
ORDER BY [Total IO] DESC;
In my test environment I get the following results:
--> Analyzing the results
==============================================================

The database with the highest Total IO has the value NULL. This NULL value identifies ad hoc and prepared SQL statements. What does that mean? In other words, this NULL value indicates the level of use of ad hoc SQL, which can itself be the potential cause of many problems:
* query execution plans not being reused
* code not being reused
* security concerns

Next, the msdb and ReportServer databases appear in the result list. Let's look at what's happening in the files of these databases (msdb and ReportServer) using the function:

fn_virtualfilestats(DB_ID(N'msdb'), x)

where x is the file ID (data, log, index, ...).
Note that the msdb database ID = 4 and the ReportServer database ID = 95 (in both cases the data file ID is 1 and the log file ID is 2).

For msdb:
--> IoStallMS, the total time (in milliseconds) spent on read and write I/O on the data file (ID = 1), is over 2 minutes!

For ReportServer:
--> IoStallMS, the total time (in milliseconds) spent on read and write I/O on the data file (ID = 1), is over 1 minute!

So in my case I need to focus on what's happening in the msdb database. A quick look at that database shows that it contains 54 user tables and 103 stored procedures that are used for reporting!!!

--> Toward diagnosing the performance problems
==============================================================

Let's not forget the initial question: why is my server crawling? The first answer, in my case, is that there are problems on the msdb database. But what exactly is the problem with this database? The questions to ask next:

1) Are there index problems on the database?
We can use queries to identify missing indexes, or indexes that are badly used or fragmented.

2) Are there poorly performing queries?
We can use DMVs to identify queries that need optimizing.

3) Should we collect and analyze performance counters?
These are the advanced tools for diagnosing SQL Server performance problems.

4) ...

What should we take away?
==============================================================

Diagnosing the sources of performance problems is expert work. It requires a good command of how the SQL Server engine works and, above all, a good dose of patience. So how do we balance patience and urgency (because when a server starts to crawl, a solution must be found quickly to avoid the worst)? Hence the importance of daily monitoring of the servers' performance counters so that...
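As a postscript, question 1 above can be explored with sys.dm_db_index_physical_stats. A minimal sketch, run in the context of the suspect database (here msdb; the 30% fragmentation threshold and the 100-page floor are common rules of thumb, not hard rules):

```sql
-- Sketch: list fragmented indexes in the current database.
USE msdb;
GO
SELECT OBJECT_NAME(ips.object_id)       AS table_name,
       i.name                           AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 30
  AND ips.page_count > 100              -- ignore tiny indexes
ORDER BY ips.avg_fragmentation_in_percent DESC;
```

Heavily fragmented indexes found this way are candidates for a rebuild or reorganize, which feeds directly back into the disk-wait picture seen in sys.dm_os_wait_stats.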