Database testing. Database load testing. Understanding DataSets and DataTables

How to test and debug databases

Automated unit testing of application code is simple and straightforward. But how do you test a database, or an application that works with one? A database is not just program code: it is an object that preserves its state. If we start changing data in the database during testing (and without that, what kind of testing would it be?!), the database will be different after every test. This can interfere with subsequent tests and permanently corrupt the data.

The key to solving the problem is transactions. One of the features of this mechanism is that as long as the transaction is not completed, you can always undo all changes and return the database to the state at the time the transaction began.

The algorithm is like this:

  1. open a transaction;
  2. carry out any preparatory steps the test needs;
  3. run the unit test (or simply the script whose behavior we want to check);
  4. check the result;
  5. cancel the transaction, returning the database to its original state.

Even if there are unclosed transactions in the code under test, the external ROLLBACK will still roll back all changes correctly.

This works well when we need to test a SQL script or stored procedure. But what if we are testing an application that connects to the database itself, opening a new connection? And if we are debugging, we will probably want to look at the database through the eyes of the application being debugged. What do we do then?

Don't rush to create distributed transactions; there is a simpler solution! Using standard SQL Server tools, you can open a transaction on one connection and continue it on another.

To do this, you need to connect to the server, open a transaction, obtain a token for that transaction, and then pass this token to the application under test. It will join our transaction in its session and from that moment on, in our debugging session we will see the data (and also feel the locks) exactly as the application under test sees it.

The sequence of actions is as follows:

Having started a transaction in the debug session, we must find out its identifier. This is a unique string by which the server distinguishes transactions; in SQL Server it is obtained with sp_getbindtoken, and another session joins the transaction with sp_bindsession. This identifier must somehow be passed to the application under test.

Now the application’s task is to bind to our control transaction before it starts doing what it’s supposed to do.

Then the application starts working, including running its stored procedures, opening its transactions, changing the isolation mode... But our debugging session will all this time be inside the same transaction as the application.

Let's say an application locks a table and starts changing its contents. At this moment, no other connections can look into the locked table. But not our debugging session! From there we can look at the database in the same way as the application does, since the SQL server believes that we are in the same transaction.

While for all other sessions the application's actions are hidden by locks...

Our debugging session passes through the locks (the server thinks they are our own locks)!

Or imagine that the application starts working with its own row versions in SNAPSHOT isolation mode. How do we look into those versions? Even this is possible if we are joined by a common transaction!

Don't forget to roll back the control transaction at the end of this exciting process. This can be done both from the debugging session (if the testing process completes normally) and from the application itself (if something unexpected happens in it).


A few years ago it turned out that SQL was suddenly outdated, and NoSQL solutions began to appear and multiply, discarding the SQL language and the relational storage model. The main arguments in support of this approach: the ability to work with big data, storing data in the most exotic structures and, most importantly, doing all of this very quickly. Let's see how the most popular representatives of the NoSQL world manage it.

How is speed achieved in NoSQL? First of all, it is a consequence of a completely different data storage paradigm. Parsing and translating SQL queries, the work of the optimizer, joining tables and so on greatly increase response time. If you remove all these layers, simplify the queries, read from disk straight to the network, or keep all the data in RAM, you gain speed: the processing time of each request falls and the number of requests per second grows. This is how key-value databases appeared; their most typical and widely known representative is memcached. Yes, this cache, widely used in web applications to speed up data access, is also NoSQL.

NoSQL types

There are four main categories of NoSQL systems:

  • Key-value. A large hash table that allows only writing and reading values by key.
  • Column (column-family). Tables with rows and columns, but unlike SQL, the number of columns can vary from row to row, and the total number of columns can run into billions. Each row also has a unique key. You can think of this structure as a hash table of hash tables: the first key is the row key, the second is the column name. With secondary index support, selection by column value is possible, not just by row key.
  • Document-oriented. Collections of structured documents. Selection by various fields of a document is possible, as is modifying parts of a document. This category also includes search engines, which are indexes but, as a rule, do not store the documents themselves.
  • Graph. Designed specifically for storing mathematical graphs: nodes and the links between them. As a rule, they also let you attach a set of arbitrary attributes to nodes and links and select nodes and links by those attributes, and they support graph traversal algorithms and route construction.
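The "hash table of a hash table" view of the column model can be sketched in a few lines of Python (the row keys and column names here are purely illustrative):

```python
# Toy column-family store: row key -> {column name -> value}.
# Rows may carry different columns; all names are illustrative.
store = {}

def put(row_key, column, value):
    store.setdefault(row_key, {})[column] = value

def get(row_key, column):
    return store.get(row_key, {}).get(column)

put("user:1", "name", "Alice")
put("user:1", "email", "alice@example.com")
put("user:2", "name", "Bob")            # user:2 simply has no email column

print(get("user:1", "email"))           # alice@example.com
print(get("user:2", "email"))           # None
```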

For the test, we took representatives of the first three categories: Aerospike, Couchbase, Cassandra, and MongoDB.

How the test was carried out

We had four server machines at our disposal. Each has an eight-core Xeon, 32 GB of RAM, four Intel SSDs of 120 GB each.

We tested using YCSB (Yahoo! Cloud Serving Benchmark). This benchmark was released by Yahoo! Research in 2010 under the Apache license and was created specifically for testing NoSQL databases. It remains essentially the only popular benchmark for NoSQL, a de facto standard. It is written in Java, by the way. We added a driver for Aerospike to the original YCSB, slightly updated the driver for MongoDB, and also tweaked the output of the results.

INFO

In addition to YCSB, you can test the performance of a NoSQL database using, for example, JMeter.

It took eight client machines to generate enough load on our small cluster: a quad-core i5 and 4 GB of RAM each. One (or two, or three, or four...) clients were not enough to saturate the cluster. It may seem strange, but it's true.

All this ran on a one-gigabit local network. It would probably have been more interesting on a ten-gigabit network, but we didn't have such hardware. More interesting because, when the number of operations per second starts to be measured in hundreds of thousands, we run into the network. With a throughput of one gigabit per second (10^9 bits/s), the network can pass only about 100,000 (10^5) kilobyte-sized packets (~10^4 bits each) per second. That is, we get only 100k operations per second, while we actually wanted a million :).
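The back-of-the-envelope arithmetic from the paragraph above:

```python
# Back-of-the-envelope check of the gigabit bottleneck described above.
link_bits_per_sec = 10**9            # 1 Gbit/s
record_bits = 1024 * 8               # a kilobyte-sized record, ~10^4 bits

max_ops_per_sec = link_bits_per_sec // record_bits
print(max_ops_per_sec)               # 122070, i.e. roughly 10^5 ops/s

# With 100-byte records the ceiling rises about tenfold:
print(link_bits_per_sec // (100 * 8))   # 1250000
```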

Network cards matter too. Proper server network cards have several I/O channels, each with its own interrupt. But by default in Linux, all these interrupts are assigned to a single processor core. Only the folks at Aerospike took care of this subtlety: their database configuration scripts spread the network card interrupts across the processor cores. You can see the network card interrupts and how they are distributed across cores with, for example: cat /proc/interrupts | grep eth.

We should also say a few words about SSDs. We wanted to test NoSQL databases specifically on solid-state drives, to understand whether these drives are really worth their price, that is, whether they deliver correspondingly good performance. We therefore tried to configure the SSDs correctly; you can read more about this in the sidebar.

Setting up the SSD

In particular, SSDs benefit from a measure called overprovisioning. The point is that an SSD has an address translation layer: the block addresses visible to the operating system do not correspond at all to the physical blocks in flash memory. As you know, flash memory has a limited number of rewrite cycles. In addition, a write consists of two stages: erasing (often several blocks at once) and the write itself. Therefore, to ensure longevity of the drive (even wear) and good write speed, the disk controller rotates physical memory blocks on writes. When the operating system writes a block to a certain address, the write physically goes to some clean free block, and the old block is marked as available for later (background) erasure. For all these manipulations the controller needs free blocks: the more, the better. An SSD that is 100% full can be quite slow.

Free blocks can be obtained in several ways. You can use the hdparm command (with the -N switch) to limit the number of disk sectors visible to the operating system; the rest remain at the full disposal of the controller. However, this does not work on every piece of hardware (for example, it does not work in AWS EC2). Another way is to leave part of the disk unoccupied by partitions (that is, partitions created with, say, fdisk); the controller is smart enough to take advantage of that space. The third way is to use file systems and kernel versions that can report free blocks to the controller - this is the TRIM command. On our hardware, hdparm was enough: we gave 20% of the total disk volume to the controller to be torn to pieces.

The I/O scheduler also matters for SSDs. This kernel subsystem groups and reorders I/O operations (mostly disk writes) to improve efficiency. By default, Linux uses CFQ (Completely Fair Queuing), which tries to rearrange writes so that as many blocks as possible are written sequentially. This is good for ordinary spinning disks (that's really what they are called - spinning :)), because for them linear access is noticeably faster than access to random blocks (the heads have to move). But on an SSD, linear and random writes are (in theory) equally efficient, and CFQ only introduces unnecessary delays. Therefore, for SSDs you should enable a different scheduler, such as NOOP, which simply executes I/O commands in the order they arrive. You can switch the scheduler with, for example: echo noop > /sys/block/sda/queue/scheduler, where sda is your disk. To be fair, recent kernels can detect SSDs themselves and enable the correct scheduler for them.

Any DBMS likes to write to disk intensively, and to read intensively too. And Linux really likes read-ahead, speculatively reading data in the hope that, having read one block, you will want the next few. With a DBMS, however, and especially with random reads (which is our case), these hopes are not destined to come true, and the result is unnecessary reading and wasted memory. The MongoDB developers recommend reducing the read-ahead value if possible. You can do this with the command blockdev --setra 8 /dev/sda, where sda is your disk.

Any DBMS also likes to open many, many files. Therefore, the nofile limits (the number of file descriptors available to a user) in /etc/security/limits.conf must be raised significantly, to a value well above 4k.
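On Linux, the effective nofile limits can be inspected (and the soft limit raised) at runtime with Python's standard resource module; this is only a per-process check, the persistent setting still lives in /etc/security/limits.conf:

```python
import resource

# Linux/Unix only: inspect the file-descriptor limits of the current process.
# These correspond to the "nofile" soft/hard limits from /etc/security/limits.conf.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft:", soft, "hard:", hard)

# A process may raise its own soft limit up to the hard limit without privileges:
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```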

An interesting question also arose: how should we use the four SSDs? Aerospike simply attaches them as storage and alternates access between the disks on its own, but the other databases assume a single data directory. (In some cases several directories can be specified, but that does not stripe the data between them.) We had to create a RAID 0 (striped) array with the mdadm utility. One could probably also experiment with LVM, but the database vendors describe only mdadm.

Naturally, the clocks on all machines in the cluster (servers and clients alike) must be synchronized with ntpd. ntpdate is not suitable here, because higher synchronization accuracy is required. For any distributed system it is vital that time be synchronized between nodes. For example, Cassandra and Aerospike store the modification time of each record, and if records with different timestamps exist on different nodes of the cluster, the newest record wins.

The NoSQL databases themselves were configured as follows. The configuration was taken out of the box, and all the recommendations described in the documentation regarding achieving the best performance were applied. In difficult cases, we contacted the database developers. Most often, recommendations related to adjustments to the number of cores and the amount of RAM.

Couchbase is the easiest to set up: it has a web console. It is enough to start the service on all nodes of the cluster, create a bucket (a "basket" for key-value pairs) on one of the nodes, and add the other nodes to the cluster. Everything is done through the web interface, and there are no particularly tricky settings.

Aerospike and Cassandra are configured in roughly the same way: you create a configuration file on each cluster node (the files are almost identical across nodes) and then launch the daemons. If all is well, the nodes join themselves into a cluster. You need a fairly good understanding of the configuration file options, so good documentation is very important here.

The hardest case is MongoDB. In the other databases all nodes are equal; not so in Mongo. We wanted to put all the databases in equal conditions as far as possible, so we set the replication factor everywhere to 2, meaning the cluster stores two copies of the data, for reliability and speed. In the other databases the replication factor is just a setting of the data store (or "bucket", or column family); in MongoDB the number of copies is determined by the structure of the cluster. You can configure a MongoDB cluster correctly only after reading the official documentation on it twice :). In short, you need shards and replica sets. Shards (you have surely heard the term "sharding") are subsets of the whole data set, together with the cluster nodes where each subset is stored. A replica set is MongoDB's term for a set of cluster nodes that store identical copies of data. A replica set has a primary node, which performs write operations, and secondary nodes, to which data from the primary is replicated; in case of failure, the primary role can be transferred to another node of the replica set. For our case (four servers and the desire to store two copies of the data), it follows that we need two shards, each a replica set of two data-bearing servers. In addition, a so-called arbiter must be added to each replica set: it stores no data, but participates in electing a new primary, since the number of voting nodes in a replica set must be odd for elections to work correctly. You also need a small configuration database that stores information about the shards and about which key ranges live on which shard. Technically it is also MongoDB, but compared to the main data it is tiny. We placed the arbiters and the configuration database on the client machines.
And on each client you need to run the mongos daemon (the Mongo router), which consults the configuration database and routes that client's requests between the shards.

Each NoSQL database has its own unique way of representing data and valid operations on it. Therefore, YCSB took the path of maximum generalization of any database (including SQL).

The data set YCSB operates on is keys and values. A key is a string that includes a 64-bit hash: YCSB itself, knowing the total number of records in the database, addresses them by an integer index, while to the database the set of keys looks completely random. A value is a dozen fields of random binary data. By default YCSB generates kilobyte-sized records but, as you remember, on a gigabit network that limits us to about 100k operations per second, so in our tests we reduced the record size to 100 bytes.
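The index-to-key trick can be sketched as follows; SHA-256 truncated to 64 bits stands in here for YCSB's own hash function, and the "user" prefix is illustrative:

```python
import hashlib

def make_key(index):
    """Map a sequential record index to a scattered-looking string key.

    YCSB uses its own 64-bit hash; SHA-256 truncated to 8 bytes is a stand-in.
    """
    digest = hashlib.sha256(str(index).encode()).digest()
    return "user" + str(int.from_bytes(digest[:8], "big"))

print(make_key(0))
print(make_key(1))                     # adjacent indexes, unrelated-looking keys
assert make_key(42) == make_key(42)    # deterministic: index N always maps to the same key
```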

YCSB performs very simple operations on this data: inserting a new record with a key and random data, reading a record by key, and updating a record by key. There are no table joins (only one "table" is assumed anyway), no selections by secondary keys, and no multi-row selections by condition (the only check is a primary key match). This is very primitive, but it can be done in any database.

Immediately before testing, the database is filled with data; YCSB does this itself, essentially running a load consisting only of insert operations. We experimented with two data sets. The first is guaranteed to fit into the RAM of the cluster nodes: 50 million records, roughly 5 GB of raw data. The second is guaranteed not to fit into RAM: 500 million records, roughly 50 GB of raw data.

The test itself - executing a given stream of operations - is run under different types of load. An important parameter is the ratio of operations: how many reads versus how many updates. We used two mixes: Heavy Write (50% reads, 50% updates) and Mostly Read (95% reads, 5% updates). The operation to perform is chosen randomly each time; the percentages set the probability of each choice.
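Choosing the next operation by probability is a one-liner; the mix below is the Mostly Read workload from the text (the helper itself is illustrative):

```python
import random

def pick_operation(read_fraction, rand=random.random):
    """Pick the next operation according to the workload mix."""
    return "read" if rand() < read_fraction else "update"

rng = random.Random(1)                       # fixed seed for reproducibility
# Mostly Read workload: 95% reads, 5% updates
ops = [pick_operation(0.95, rng.random) for _ in range(10_000)]
read_share = ops.count("read") / len(ops)
print(round(read_share, 3))                  # close to 0.95
```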

YCSB can use various algorithms to select the record (key) for an operation. It may be a uniform distribution (any key from the whole data set is chosen with equal probability), an exponential distribution (keys "at the beginning" of the data set are chosen much more often), and some others. But the Yahoo team chose the so-called zipfian distribution as the typical one: a small percentage of keys is selected far more often than all the rest. This simulates popular records on, say, a blog.
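A rough sketch of zipfian popularity: the i-th most popular key gets weight 1/i^θ (θ = 0.99 is the constant YCSB uses; its actual generator is more elaborate than this direct weighting):

```python
import random

def zipf_weights(n, theta=0.99):
    """Popularity weight 1/i^theta for the i-th most popular of n keys."""
    return [1.0 / (i ** theta) for i in range(1, n + 1)]

weights = zipf_weights(1000)
total = sum(weights)
top10_share = sum(weights[:10]) / total
print(round(top10_share, 2))     # the 1% most popular keys get a large share

rng = random.Random(7)
sample = rng.choices(range(1000), weights=weights, k=5)
print(sample)                    # skewed toward small (popular) indexes
```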

YCSB starts several threads, each running the operation loop, all on one machine. With only four cores on a client machine, there is little point in running more than four threads there; that is why we ran YCSB on eight client machines simultaneously. To automate the launches we used fabric and cron (more precisely, at): a small Python script generates the commands needed to run YCSB on each client, and these commands are scheduled with at for the same nearest minute in the future on every client. Then at fires, and YCSB starts (successfully, or not quite, if the parameters are wrong) at the same moment on all eight clients. To collect the results (the YCSB log files), fabric is used again.

Results

So, the raw results are YCSB logs from each client. These logs look something like this (the final part of a file is shown):

Operations, 1187363, Retries, 0, AverageLatency(us), 3876.5493619053314, MinLatency(us), 162, MaxLatency(us), 278190, 95thPercentileLatency(ms), 12, 99thPercentileLatency(ms), 22, Return=0, 1187363, Reconnections, 0.0, RunTime(ms), 303574.0, Operations, 1249984.0, Throughput(ops/sec), 4117.5594747903315

As you can see, there is the number of operations of a given type (reads, in this example), the average, minimum, and maximum latency, the latency within which 95% and 99% of operations completed, the number of successful operations (return code 0), the total test time, the total number of operations, and the average number of operations per second. We are most interested in the average latency (AverageLatency) and the number of operations per second (Throughput).
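Pulling the interesting fields out of such a flat "name, value, name, value" line is straightforward; a simplified sample line is used here:

```python
def parse_ycsb_line(line):
    """Parse a flat 'name, value, name, value, ...' summary line into a dict."""
    parts = [p.strip() for p in line.split(",")]
    metrics = {}
    for name, value in zip(parts[::2], parts[1::2]):
        try:
            metrics[name] = float(value)
        except ValueError:
            metrics[name] = value
    return metrics

line = ("Operations, 1187363, AverageLatency(us), 3876.55, "
        "Throughput(ops/sec), 4117.56")
m = parse_ycsb_line(line)
print(m["AverageLatency(us)"])   # 3876.55
print(m["Throughput(ops/sec)"])  # 4117.56
```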

Using another Python script, data from a bunch of logs was collected into a table, and beautiful graphs were built from the table.





Conclusions

The NoSQL databases split into two groups: fast and slow. As expected, the key-value databases turned out to be the fast ones: Aerospike and Couchbase are far ahead of the competition.

Aerospike is indeed a very fast database. And we almost managed to reach a million operations per second (on data in memory). Aerospike also works quite well on SSDs, especially considering that Aerospike in this mode does not use data caching in memory, but accesses the disk for each request. This means that you can really fit a large amount of data into Aerospike (as long as there are enough disks, not RAM).

Couchbase is fast, but only on in-memory operations. The SSD test graphs show Couchbase's speed with a data volume only slightly larger than the amount of RAM: 200 million records in total, noticeably less than the 500 million used for the other databases. Couchbase simply could not insert any more records: it refused to evict the data cache from memory to disk, and writes started to fail. It is a good cache, but only for data that fits in RAM.

Cassandra is the only database that writes faster than it reads :). This is because a write completes successfully (in the fastest configuration) as soon as it reaches the log (on disk), while a read requires checks, several reads from disk, and selecting the most recent version of the record. Cassandra is a reliable and reasonably fast, scalable data archive.

MongoDB is quite slow to write, but relatively fast to read. If the data (or rather, what is called a working set - a set of current data that is constantly accessed) does not fit into memory, it slows down greatly (and this is exactly what happens when testing YCSB). You also need to remember that MongoDB has a global read/write lock, which can cause problems under very high loads. Overall, MongoDB is a good database for the web.

PS

Let's take a little break from performance issues and look at how SQL and NoSQL solutions will develop further. What we see now is, in fact, a repetition of a well-known story. It all happened before, in the sixties and seventies of the twentieth century: before relational databases there were hierarchical, object and other models. Then standardization was wanted, and SQL appeared; all the serious DBMSs, each of which had supported its own query language and API, switched to SQL. The query language and relational model became the standard. It is curious that SQL is now being grafted onto NoSQL as well, producing both SQL wrappers on top of existing NoSQL stores and completely new databases known as NewSQL.

If NoSQL abandoned the "heavy legacy" of SQL, rethought approaches to data storage, and created completely new solutions, then NewSQL describes the movement to "revive" SQL. Taking ideas from NoSQL, its authors recreated SQL databases at a new level. In the NewSQL world, for example, you often meet databases that store data in memory yet offer full SQL queries, table joins, and other familiar features. To still be able to store large amounts of data, sharding mechanisms are built into these databases.

NewSQL includes VoltDB, TokuDB, MemDB and others. Remember these names, perhaps soon they will also be talked about at every IT conference.

The database testing service helps minimize risks when putting a system into production: you can verify the correctness and security of the database in advance.
During database testing, the operation of the application's database is checked for compliance with functional and non-functional requirements. Database testing is needed for any application whose architecture includes a database: for example, corporate information systems, mobile and web applications.

Database performance is a critical factor in the effectiveness of management and business applications. If searching for or writing data is slow, the application's ability to function normally suffers. The only way to find the cause of poor performance is to take quantitative measurements and determine what is causing the problem.
Identifying database performance bottlenecks is directly tied to metrics, performance measurement methods, and the technology for applying them. For large corporations and large databases, determining database performance has another very important aspect: sizing the IT infrastructure for long-term production operation of applications. This ultimately yields a more accurate estimate of the initial investment in hardware and basic software, since high database performance depends heavily on the platform and equipment, which are purchased and operated for the long term.
The most important metrics for measuring database performance are:

  • the number of transactions per unit of time (for various transaction types);
  • the number of I/O operations (rows read) per transaction and its execution time;
  • the number of rows read per table per transaction;
  • the average number of I/O operations per transaction, by range;
  • SQL statements with a high CPU cost (user and system time);
  • start and end times of statement execution;
  • sort statistics (the number of sorts, the number of sort overflows, the time spent sorting), the statements with the highest elapsed time, and the indexes with the lowest usage efficiency.

Memory usage metrics for tablespace pages and buffer pools (for reading data and for reading indexes), for sorts, for running utilities, and for directories and package caches are, along with the performance metrics above, also important for tuning efficient data access.
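The most basic of these measurements, per-statement elapsed time and rows read, can be taken from the client side. A sketch with sqlite3 standing in for the real DBMS (table and query are made up; production systems would use the DBMS's own monitoring views instead):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO t (v) VALUES (?)",
                 [("row%d" % i,) for i in range(10_000)])

def timed_query(sql, params=()):
    """Return the rows plus the statement's elapsed wall-clock time."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return rows, time.perf_counter() - start

rows, elapsed = timed_query("SELECT * FROM t WHERE v LIKE ?", ("row99%",))
print(len(rows), "rows read,", round(elapsed * 1000, 3), "ms elapsed")
```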

What else should you check when testing the database?

Data mapping

Make sure that the connections in the database correspond to the design documentation. For all CRUD operations, verify that the corresponding tables and records are updated when the user clicks Save, Refresh, Search, or Delete from the application GUI.

ACID transaction properties

The ACID properties of transactions are atomicity, consistency, isolation, and durability. During database testing, all four properties should be checked. This area requires more extensive testing if the database is distributed.

Data integrity

Note that different application modules (such as screens and forms) use the same data and perform CRUD operations differently. Therefore, you need to make sure that the latest state of the data is reflected equally everywhere. The system should show updated values ​​on all forms and screens. This is called data integrity.

Accuracy of business logic implementation

Today, databases are designed for more than just storing records. They have evolved into very powerful tools that provide developers with ample opportunities to implement business logic at the database level. Examples of powerful database features are "referential integrity", relational constraints, triggers and stored procedures. Thus, using these and many other features offered by the database, developers implement business logic at the database level. The tester must ensure that the implemented business logic is correct and works accurately.

How to test a database?

Writing SQL Queries

To organize the database testing process properly, testers must know SQL and DML (Data Manipulation Language) well and have a clear understanding of the internal structure of the database. This is the best and most reliable way to test a database, especially for applications of low and medium complexity, but those two prerequisites must be met. If the application is very complex, it may be difficult or even impossible for the tester to write all the necessary SQL queries alone, so for particularly complex queries the tester can turn to the developer for help. This approach not only gives confidence that testing is done well, but also improves the tester's skill at writing SQL queries.
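A minimal illustration of the approach: drive the "application" code path, then verify the result with the tester's own SQL query. The table, columns, and helper here are made up for the example, and sqlite3 stands in for the production DBMS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")

def app_save_order(item, qty):
    """Stands in for the application code path behind the GUI's Save button."""
    conn.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty))
    conn.commit()

app_save_order("widget", 3)

# The tester's independent verification query:
row = conn.execute("SELECT item, qty FROM orders WHERE item = ?",
                   ("widget",)).fetchone()
assert row == ("widget", 3), "Create operation did not persist the expected record"
print("OK")
```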

Viewing data in tables

If the tester does not know SQL, they can check the result of a CRUD operation performed through the application's GUI by viewing the database tables (relations). This way of checking a database requires good knowledge of the table structure and can be somewhat tedious and cumbersome, especially when the database and its tables hold a lot of data. It can also be difficult when the test data is spread across several tables.

Developer Help

The tester performs CRUD operations in the GUI and verifies their results by executing the corresponding SQL queries written by the developer. This method requires neither good knowledge of SQL nor good knowledge of the application's database structure, and it looks like a simple, sound choice for testing the database. But it has a catch: what if the query written by the developer is semantically incorrect or does not implement the user's requirement correctly? In that case, testing provides no guarantee about the quality of the product.

An example of a methodology for testing database data integrity

Databases and database processes should be tested as an independent subsystem: all such subsystems should be tested without the target user interface acting as the interface to the data. The database management system (DBMS) should also be examined to determine the tools and techniques that support the testing defined in the following table.

Goals of the methodology. Test database access methods and processes independently of the UI, so that malfunctioning algorithms or data corruption can be observed and recorded.

Methodology. Invoke each database access method or process, feeding each one valid and invalid data or data requests. Inspect the database to ensure the data is populated as intended and all database events occur as expected, or inspect the returned data to ensure the correct data is retrieved when necessary.

Oracles (heuristic mechanisms that help identify a problem). Outline one or more strategies the technique can use to observe test results correctly. An oracle combines the method by which an observation is made with the characteristics of a particular outcome that indicate probable success or failure. Ideally, oracles are self-checking, allowing automated tests to make an initial assessment of success or failure. However, be aware of the risks associated with determining results automatically.

Required tools. This technique requires:
  • a test script automation tool;
  • an imaging and baseline-restore tool;
  • backup and recovery tools;
  • installation monitoring tools (registry, hard disk, CPU, memory, and so on);
  • SQL database utilities and tools;
  • data generation tools.

Success criteria. This technique supports testing of all major database access methods and processes.

Special information. Testing may require a DBMS development environment or drivers to enter or change data directly in the database. Processes should be invoked manually. Small or minimum-size databases (with a limited number of records) should be used so that any corruption events are easier to observe.
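A self-checking oracle can be as simple as computing the expected value through an independent path and comparing automatically. A toy sqlite3 sketch (table, data, and process are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO payments (amount) VALUES (?)", [(10,), (20,), (30,)])

def process_under_test():
    """The database process being exercised (here, an aggregate query)."""
    return conn.execute("SELECT SUM(amount) FROM payments").fetchone()[0]

def oracle(observed):
    """Recompute the expected total through an independent path and compare."""
    expected = sum(a for (a,) in conn.execute("SELECT amount FROM payments"))
    return observed == expected

result = process_under_test()
assert oracle(result), "oracle flagged a possible failure"
print(result)  # 60
```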

Translation of the article by Rizwan Jafri

These days, a database is an inevitable part of almost any application. When the application runs, the end user mainly exercises the CRUD operations provided by the database.

C: Create - the 'Create' operation is performed when the user saves any new transaction.
R: Retrieve - the 'Retrieve' operation is performed when the user searches for or views any saved transaction.
U: Update - the 'Update' operation is performed when the user edits or modifies an existing record.
D: Delete - the 'Delete' operation is performed when the user removes any record from the system.

It does not matter which database is used or how the operation is implemented underneath (join or subquery, trigger or stored procedure, query or function). The interesting thing is that every database operation a user performs from the UI of any application boils down to one of these four CRUD operations.
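The four operations can be shown as plain SQL. This sketch uses Python's sqlite3 module; the `orders` table and its contents are invented for illustration.

```python
import sqlite3

# Illustrative sketch of the four CRUD operations as plain SQL
# (table and column names are made up for the example).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")

conn.execute("INSERT INTO orders (item) VALUES ('book')")   # C: Create
item = conn.execute(
    "SELECT item FROM orders WHERE id = 1").fetchone()[0]   # R: Retrieve
assert item == "book"
conn.execute("UPDATE orders SET item = 'pen' WHERE id = 1") # U: Update
conn.execute("DELETE FROM orders WHERE id = 1")             # D: Delete
assert conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0] == 0
```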

What to check when testing a database?

1) Data mapping:
Make sure that the data mappings in the database match the design documentation. For all CRUD operations, verify that the corresponding tables and records are updated when the user clicks Save, Refresh, Search, or Delete in the application GUI.

2) ACID transaction properties:
The ACID properties of transactions are atomicity, consistency, isolation, and durability. During database testing, you should check all four. This area requires more extensive testing if the database is distributed.

3) Data integrity:
Note that different application modules (such as screens and forms) use the same data and perform CRUD operations on it in different ways. Therefore, you need to make sure that the latest state of the data is reflected identically everywhere. The system should show updated values on all forms and screens. This is called data integrity.

4) Accuracy of business rules implementation:
Today, databases are designed for more than just storing records. They have evolved into very powerful tools that provide developers with ample opportunities to implement business logic at the database level. Examples of powerful database features are "referential integrity", relational constraints, triggers and stored procedures. Thus, using these and many other features offered by the database, developers implement business logic at the database level. The tester must ensure that the implemented business logic is correct and works accurately.
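Checking database-level business rules means confirming that constraints and referential integrity actually reject bad data. A minimal sketch, assuming an invented customer/invoice schema with a CHECK constraint and a foreign key:

```python
import sqlite3

# Sketch: verify that rules enforced at the database level (a CHECK
# constraint and referential integrity) reject invalid data.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs FK enforcement enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE invoices (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount INTEGER CHECK (amount > 0))""")
conn.execute("INSERT INTO customers VALUES (1)")

conn.execute("INSERT INTO invoices VALUES (1, 1, 50)")  # valid row is accepted

try:
    conn.execute("INSERT INTO invoices VALUES (2, 1, -5)")   # violates CHECK
    check_enforced = False
except sqlite3.IntegrityError:
    check_enforced = True

try:
    conn.execute("INSERT INTO invoices VALUES (3, 99, 10)")  # unknown customer
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

assert check_enforced and fk_enforced
```

The tester's job is exactly this: attempt the invalid operation and confirm the database, not just the UI, refuses it.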

The points described above are the most important aspects of database testing. Database testing is a critical business task and should never be left to inexperienced employees without proper training.

How to test a database?

1. Writing SQL Queries
To check the database correctly and accurately, the tester must first have very good knowledge of SQL and DML (Data Manipulation Language). Second, the tester must have a good understanding of the internal structure of the database. If these two prerequisites are met, the tester is ready: he/she performs a CRUD operation from the application's UI and then checks the result using SQL queries.
This is the best and most reliable way to test the database, especially for applications with low and medium complexity. But the two described prerequisites must be met. Otherwise, this method of database testing will not work for you.
If the application is very complex, it will be difficult or even impossible for the tester to write all the necessary SQL queries themselves. Therefore, in case of some complex queries, the tester can turn to the developer for help.
This method not only gives confidence that testing is done well, but also improves the skill of writing SQL queries.
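The pattern is: trigger the operation the way the application would, then confirm the result with an independent SQL query. A small sketch (the `users` table and the `app_save_user` helper are invented stand-ins for the real application action):

```python
import sqlite3

# Sketch of the verification pattern: perform an operation as the
# application would, then confirm the result with an independent query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

def app_save_user(email):
    # hypothetical stand-in for the application's own "Save" action
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))

app_save_user("alice@example.com")

# the tester's verification query, written independently of the app code
count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email = ?", ("alice@example.com",)
).fetchone()[0]
assert count == 1
```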

2. Viewing data in tables
If the tester does not know SQL, then he/she can check the result of the CRUD operation using the application's graphical interface by viewing the tables (relations) of the database. This method of checking a database requires good knowledge of table structure and can be a bit tedious and cumbersome, especially when the database and tables have a large amount of data.
Additionally, this way of checking a database can be very difficult for testers if the data to be checked is in multiple tables.

3. Developer Help
This is the easiest way. The tester performs any CRUD operations on the GUI and verifies their results by executing the corresponding SQL queries written by the developer. This method does not require either good knowledge of SQL or good knowledge of the application database structure.
So this method seems simple and a good choice for DB testing. But it has a downside: what if the query written by the developer is semantically incorrect or does not correctly reflect the user's requirements? In this case, testing provides no guarantee about the quality of the product.

Conclusion

The database is the main and most important part of almost every application. Thus, database testing requires close attention, good skills in writing SQL queries, knowledge of the database structure and appropriate preparation.

To be sure that testing is effective, it must be assigned to a person who possesses all four of these qualities. Otherwise, after delivery of the product, there will most likely be incorrect or unintended behavior of the application, errors that the customer will find.


Function testing

Test target function testing should focus on requirements that can be traced directly to use cases or business process functions and rules. The purpose of these tests is to verify the correct acceptance, processing and return of data, as well as the appropriate implementation of business process rules. This type of testing is based on black box techniques, which means testing an application and its internal processes is done by interacting with the application using a Graphical User Interface (GUI) and analyzing the findings or results. The following table defines the testing framework recommended for each application.

Objectives of the methodology:

Test the functionality of the testing target, including navigation, input, processing and return of data to monitor and record the target algorithm.

Methodology:

Execute each use case, use-case flow, or function, using valid and invalid data, to verify that:

when valid data is used, the expected results are obtained

when invalid data is used, the appropriate error or warning messages are displayed

all business process rules are applied appropriately
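The valid/invalid-data check above can be sketched as a small test. The validation rule (an age range for account creation) is invented for illustration:

```python
# Sketch: drive one function with valid and invalid data, checking for the
# expected result or an appropriate error (the validation rule is invented).
def create_account(age):
    if not 18 <= age <= 120:
        raise ValueError("age out of range")
    return {"age": age, "status": "created"}

# valid data produces the expected result
assert create_account(30)["status"] == "created"

# invalid data produces an appropriate error instead of bad data
for bad in (-1, 17, 200):
    try:
        create_account(bad)
        raise AssertionError("invalid input was accepted")
    except ValueError:
        pass
```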

Oracles:

Outline one or more strategies that can be used in the technique to correctly observe test results. An oracle combines elements of both the method by which an observation can be made and the characteristics of a particular outcome that indicate possible success or failure. Ideally, oracles will perform self-checking, allowing for initial assessment of success or failure by automated tests. However, you should be aware of the risks associated with automatically determining results.

Required tools:

This technique requires the following tools:

Test Script Automation Tool

image creation and base recovery tool

backup and recovery tools

installation monitoring tools (registry, hard drive, CPU, memory, etc.)

data generation tools

Success criteria:

This technique supports testing of:

all major use case scenarios

all basic functions

Special information:

Identify or describe those elements or issues (internal or external) that affect the implementation and performance of the feature test.

Business Process Cycle Testing

Business process cycle testing should emulate the tasks performed on <Project Name> over time. You should define a period, such as one year, and perform the transactions and tasks that would occur during that year. This includes all daily, weekly, and monthly cycles, as well as date-based events.

Objectives of the methodology:

Test the test target processes and background processes according to the required business process models and plans to monitor and record the target algorithm.

Methodology:

Testing simulates several cycles of a business process by doing the following:

The tests used to test the features of the test target will be modified or extended to increase the number of times each feature is executed to simulate multiple different users over a given period.

All date or time dependent functions will be performed using valid and invalid dates and periods.

All functions that are executed periodically will be executed or started at the appropriate time.

Testing will use valid and invalid data to test the following:

When valid data is used, the expected results are obtained.

If invalid data is used, appropriate error or warning messages are displayed.

All business process rules are applied accordingly.
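Date-dependent functions are a typical target for cycle testing. A sketch, using an invented "invoice is due at month end" rule, that exercises every month of a simulated leap year, where date-arithmetic bugs typically hide:

```python
import calendar
from datetime import date

# Sketch: exercise a date-dependent rule across a simulated yearly cycle
# (the "due at the last day of the month" rule is invented).
def month_end(d):
    return date(d.year, d.month, calendar.monthrange(d.year, d.month)[1])

# run the rule for every month of a simulated year, including February
# of a leap year
ends = [month_end(date(2024, m, 15)) for m in range(1, 13)]
assert ends[1] == date(2024, 2, 29)   # leap-year February
assert ends[3] == date(2024, 4, 30)
assert all(e.day in (28, 29, 30, 31) for e in ends)
```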

Oracles:

Outline one or more strategies that can be used in the technique to correctly observe test results. An oracle combines elements of both the method by which an observation can be made and the characteristics of a particular outcome that indicate possible success or failure. Ideally, oracles will perform self-checking, allowing for initial assessment of success or failure by automated tests. However, you should be aware of the risks associated with automatically determining results.

Required tools:

This technique requires the following tools:

Test Script Automation Tool

image creation and base recovery tool

backup and recovery tools

data generation tools

Success criteria:

This technique supports testing of all critical business process cycles.

Special information:

System dates and events may require special supporting tasks.

A business process model is required to identify relevant requirements and testing procedures.

User Interface Testing

User interface (UI) testing tests the user's interaction with the software. The goal of UI testing is to ensure that the UI provides the user with appropriate access and navigation to the features of the testing target. In addition, UI testing helps ensure that objects in the UI perform as expected and comply with corporate or industry standards.

Objectives of the methodology:

Test the following to monitor and record compliance with the standards and target algorithm:

Navigation of the test target, reflecting the functions and requirements of the business process, including window-to-window, field-to-field methods, and the use of access methods (tab keys, mouse movements, keyboard shortcuts).

You can test window objects and characteristics such as menus, size, layout, state, and focus.

Methodology:

Create or modify per-window tests to verify correct navigation and object states for each application window and object.

Oracles:

Outline one or more strategies that can be used in the technique to correctly observe test results. An oracle combines elements of both the method by which an observation can be made and the characteristics of a particular outcome that indicate possible success or failure. Ideally, oracles will perform self-checking, allowing for initial assessment of success or failure by automated tests. However, you should be aware of the risks associated with automatically determining results.

Required tools:

This technique requires a test script automation tool.

Success criteria:

This technique supports testing of every home screen or window that will be widely used by users.

Special information:

Not all properties of custom objects and third-party objects can be accessed.

Performance Profiling

Performance profiling is a performance test that measures and evaluates response times, transaction speeds, and other time-sensitive requirements. The purpose of performance profiling is to verify that performance requirements have been achieved. Performance profiling is implemented and performed to profile and refine the performance algorithms of a test target as a function of conditions such as workload or hardware configuration.

Note: The transactions listed in the following table are classified as "logical business process transactions". These transactions are defined as specific functions that an end user is expected to perform using the test target, such as adding or modifying a given contract.

Objectives of the methodology:

Test the algorithms of designated functional transactions or business process functions under varying conditions (such as workload or hardware configuration) to observe and record the target algorithm and application performance data.

Methodology:

Apply testing procedures designed to test business process functions and cycles.

Modify data files to increase the number of transactions, or modify scripts to increase the number of iterations performed in each transaction.

Scripts should be run on the same system (the best option is to start with one user and one transaction) and then repeated with multiple clients (virtual or actual; see the special information below).

Oracles:

Outline one or more strategies that can be used in the technique to correctly observe test results. An oracle combines elements of both the method by which an observation can be made and the characteristics of a particular outcome that indicate possible success or failure. Ideally, oracles will perform self-checking, allowing for initial assessment of success or failure by automated tests. However, you should be aware of the risks associated with automatically determining results.

Required tools:

This technique requires the following tools:

Test Script Automation Tool

application performance profiling tool such as Rational Quantify

installation monitoring tools (registry, hard drive, CPU, memory, etc.)

Success criteria:

This technique supports testing:

Single transaction or single user: Successfully emulate transaction scenarios without failing due to test implementation issues.

Multiple transactions or multiple users: Successfully emulate the workload without failing due to test implementation issues.

Special information:

Comprehensive performance testing includes background load on the server.

There are several methods that can be used, including the following:

"Delivery of transactions" directly to the server, usually in the form of Structured Query Language (SQL) calls.

Creating a "virtual" user load to simulate several clients, usually several hundred. To achieve this load, remote terminal emulation tools are used. This technique can also be used to flood the network with a "data stream".

To create a load on the system, use several physical clients, each of which runs test scripts.

Performance testing should be performed on a dedicated system or at a dedicated time. This provides complete control and accurate measurements.

Databases used for performance testing must either be the actual size or be scaled equally.

Load testing

Load testing is a performance test in which the test target is subjected to different workloads to measure and evaluate the performance algorithms and the ability of the test target to continue to function appropriately under different workloads. The purpose of load testing is to determine and ensure that the system will operate correctly when subjected to the expected maximum operating load. Load testing also evaluates performance parameters such as response time, transaction speed, and other time-related parameters.

Note: The transactions listed in the following table are classified as "logical business process transactions". These transactions are defined as specific functions that the user is expected to perform while using the application, such as adding or changing a given contract.

Objectives of the methodology:

Execute designated transactions or business process variations under varying workload conditions to monitor and record target algorithm and system performance data.

Methodology:

Use transaction test scripts designed to test business process functionality and cycles as a basis, but remove unnecessary iterations and delays.

Modify data files to increase the number of transactions, or modify tests to increase the number of times each transaction is executed.

Workloads should include peak loads (for example, daily, weekly, and monthly peaks).

Workloads should represent both average and peak loads.

Workloads must represent both instantaneous and long-term peak loads.

Workloads should be tested in different test environment configurations.
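One way to keep these workload variants manageable is to describe them as data, so the same test driver can replay each profile. A sketch with invented numbers:

```python
# Sketch: describe workload profiles as data so one test driver can replay
# average, daily-peak, and sustained-peak loads (all numbers are invented).
WORKLOADS = {
    "average":        {"users": 10, "tx_per_user": 20},
    "daily_peak":     {"users": 50, "tx_per_user": 40},
    "sustained_peak": {"users": 50, "tx_per_user": 400},
}

def planned_transactions(profile):
    w = WORKLOADS[profile]
    return w["users"] * w["tx_per_user"]

assert planned_transactions("average") == 200
assert planned_transactions("daily_peak") > planned_transactions("average")
assert planned_transactions("sustained_peak") > planned_transactions("daily_peak")
```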

Oracles:

Outline one or more strategies that can be used in the technique to correctly observe test results. An oracle combines elements of both the method by which an observation can be made and the characteristics of a particular outcome that indicate possible success or failure. Ideally, oracles will perform self-checking, allowing for initial assessment of success or failure by automated tests. However, you should be aware of the risks associated with automatically determining results.

Required tools:

This technique requires the following tools:

Test Script Automation Tool

installation monitoring tools (registry, hard drive, CPU, memory, etc.)

resource limiting tools; for example, Canned Heat

data generation tools

Success criteria:

This technique supports workload emulation testing, which is a successful emulation of the workload without failures due to test implementation issues.

Special information:

Load testing should be performed on a dedicated system or at a dedicated time. This provides complete control and accurate measurements.

Databases used for load testing must either be the actual size or be scaled equally.

Stress testing

Stress testing is a type of performance test implemented and performed to understand how a system fails under conditions at or outside expected tolerances. It usually involves resource scarcity or competition for resources. Low-resource conditions indicate how a test goal fails in a way that is not obvious under normal conditions. Other flaws may result from contention for shared resources, such as database locks or network bandwidth, although some of these tests are typically performed in feature and load testing.

Objectives of the methodology:

Test the functions of the test target under the following stress conditions, in order to observe and record the target algorithm and to identify and document the conditions under which the system fails to continue operating properly:

little or no free memory on the server (RAM and persistent storage)

the maximum physically or actually possible number of connected or simulated users

multiple users perform the same transactions with the same data or accounts

"excessive" transaction volume or a mixture of conditions (see Performance Profiling section above)

Methodology:

To test limited resources, tests should be run on a single system, and the RAM and persistent storage on the server should be reduced or limited.

For the other stress tests, multiple clients should be used, running the same or additional tests to generate the worst-case transaction volume or mixture of transactions.

Oracles:

Outline one or more strategies that can be used in the technique to correctly observe test results. An oracle combines elements of both the method by which an observation can be made and the characteristics of a particular outcome that indicate possible success or failure. Ideally, oracles will perform self-checking, allowing for initial assessment of success or failure by automated tests. However, you should be aware of the risks associated with automatically determining results.

Required tools:

This technique requires the following tools:

Test Script Automation Tool

transaction load planning and management tool

installation monitoring tools (registry, hard drive, CPU, memory, etc.)

resource limiting tools; for example, Canned Heat

data generation tools

Success criteria:

This technique supports stress emulation testing. The system is successfully emulated under one or more conditions defined as stress conditions, and observations of the resulting system state are recorded during and after the condition is emulated.

Special information:

To create stress on the network, network tools may be required to load the network with messages and packets.

The persistent storage used by the system should be temporarily reduced to limit the space available for database growth.

You should synchronize the simultaneous access of clients to the same data records or accounts.
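Synchronized simultaneous access can be arranged with a barrier that releases all clients at the same instant against the same record. A sketch with an invented `accounts` table, using SQLite's busy timeout so blocked writers wait rather than fail:

```python
import os
import sqlite3
import tempfile
import threading

# Sketch: a barrier releases several clients at once so they hit the same
# record simultaneously (account id and client count are invented).
path = os.path.join(tempfile.mkdtemp(), "stress.db")
init = sqlite3.connect(path)
init.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
init.execute("INSERT INTO accounts VALUES (1, 0)")
init.commit()
init.close()

N = 5
barrier = threading.Barrier(N)

def client():
    conn = sqlite3.connect(path, timeout=10)
    barrier.wait()  # all clients proceed at the same instant
    conn.execute("UPDATE accounts SET balance = balance + 1 WHERE id = 1")
    conn.commit()
    conn.close()

threads = [threading.Thread(target=client) for _ in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

balance = sqlite3.connect(path).execute(
    "SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
assert balance == N  # no lost updates despite the simultaneous access
```

A lost update (a final balance below N) would be exactly the kind of failure this stress condition is meant to surface.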

Capacity testing

Capacity testing exposes the test target to large volumes of data to determine whether limits have been reached that cause the system to fail. Capacity testing also determines the continuous maximum load or volume that the test target can handle during a given period. For example, if the test target processes a set of database records to generate a report, the capacity test would use a large test database and verify that the software behaves correctly and produces the correct report.

Objectives of the methodology:

Test the test target's functionality in the following high-capacity scenarios to observe and record the target algorithm:

The maximum (actual or physically possible) number of connected or simulated clients performing the same (worst in terms of performance) business process function over a long period.

The maximum size (actual or scale) of the database has been reached and multiple queries or reporting transactions are running concurrently.

Methodology:

Use tests designed for performance profiling or load testing.

Multiple clients should be used running the same or additional tests to generate the worst case volume of transactions or a mixture of them (see stress testing) over an extended period.

The maximum size of the database (actual, scaled, or populated with representative data) is created and multiple clients are used simultaneously to run queries and reporting transactions over an extended period.
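Populating the database with representative data is usually scripted. A sketch of a data-generation step with an invented `clients` schema and row count (a seeded generator keeps the run reproducible):

```python
import random
import sqlite3
import string

# Sketch of a data-generation step: bulk-load a table with representative
# rows to approximate production volume (schema and row count invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")

random.seed(0)  # reproducible data set

def random_name():
    return "".join(random.choices(string.ascii_lowercase, k=8))

rows = [(random_name(), random.choice(["north", "south", "east", "west"]))
        for _ in range(100_000)]
conn.executemany("INSERT INTO clients (name, region) VALUES (?, ?)", rows)
conn.commit()

total = conn.execute("SELECT COUNT(*) FROM clients").fetchone()[0]
assert total == 100_000
# a reporting-style query should still return correct results at volume
regions = conn.execute("SELECT COUNT(DISTINCT region) FROM clients").fetchone()[0]
assert regions == 4
```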

Oracles:

Outline one or more strategies that can be used in the technique to correctly observe test results. An oracle combines elements of both the method by which an observation can be made and the characteristics of a particular outcome that indicate possible success or failure. Ideally, oracles will perform self-checking, allowing for initial assessment of success or failure by automated tests. However, you should be aware of the risks associated with automatically determining results.

Required tools:

This technique requires the following tools:

Test Script Automation Tool

transaction load planning and management tool

installation monitoring tools (registry, hard drive, CPU, memory, etc.)

resource limiting tools; for example, Canned Heat

data generation tools

Success criteria:

This technique supports capacity emulation testing. Large numbers of users, data, transactions, or other aspects of system usage can be successfully emulated, and the resulting changes in system state can be observed throughout capacity testing.

Special information:

Security and access control testing

Security and access control testing focuses on two key areas of security:

Application-level protection, including access to data or business process functions

System-level security, including login or remote access to the system

Based on the required level of protection, application-level security ensures that actors can access only certain functions or use cases, or that the data available to them is limited. For example, everyone may be allowed to enter data and create new accounts, but only managers may delete them. If there is data-level security, testing ensures that a "type 1 user" has access to all customer information, including financial data, while a "type 2 user" only has access to demographic data about that same customer.

System-level security ensures that only users with system permissions have access to applications, and only through the appropriate gateways.

Objectives of the methodology:

Test the testing target under the following conditions in order to observe and record the target algorithm:

Application-level protection: the subject has access only to those functions and data for which a user of this type has access rights.

System-level security: Access to applications is limited to those with system and application permissions.

Methodology:

Application-level security: Define and list all user types and the functions and data that each type of user is allowed to access.

Create tests for each user type and check all access rights by creating transactions defined for each user type.

Change the user type and rerun the tests for the same users, in each case checking that access to additional functions or data is correctly allowed or denied.

System Level Access: See Special Information below.
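The user-type/function matrix from the first step can drive the tests directly. A sketch with invented roles and rights, echoing the earlier "deletion only for managers" example:

```python
# Sketch: a permission matrix drives one check per user type, verifying that
# each function is correctly allowed or denied (roles and rights invented).
PERMISSIONS = {
    "clerk":   {"create_entry", "view_entry"},
    "manager": {"create_entry", "view_entry", "delete_entry"},
}

def is_allowed(role, action):
    return action in PERMISSIONS.get(role, set())

for role in PERMISSIONS:
    assert is_allowed(role, "view_entry")        # every known type may view
assert not is_allowed("clerk", "delete_entry")   # deletion is managers-only
assert is_allowed("manager", "delete_entry")
assert not is_allowed("guest", "view_entry")     # unknown roles get nothing
```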

Oracles:

Outline one or more strategies that can be used in the technique to correctly observe test results. An oracle combines elements of both the method by which an observation can be made and the characteristics of a particular outcome that indicate possible success or failure. Ideally, oracles will perform self-checking, allowing for initial assessment of success or failure by automated tests. However, you should be aware of the risks associated with automatically determining results.

Required tools:

This technique requires the following tools:

Test Script Automation Tool

"Hacker" tools for testing and finding security holes

OS security administration tools

Success criteria:

This methodology supports testing of the relevant features and data affected by security settings for each known user type.

Special information:

System access should be verified and discussed with appropriate system or network administrators. This testing may not be necessary because it may be part of network or system administration functions.

Disaster recovery testing

Disaster recovery testing verifies that the test target can be successfully recovered from a variety of hardware, software, and network failures without excessive loss of data or data integrity.

For systems that must continue to operate, disaster recovery testing ensures that, in the event of a recovery from a failure, alternative or backup systems correctly "take over" for the system that failed without losing data or transactions.

Recovery testing is an antagonistic testing process in which an application or system is exposed to extreme conditions or simulated conditions that cause failure, such as device I/O failures or invalid database pointers and keys. Recovery processes are invoked and the application or system is monitored and controlled to verify that proper recovery of the application or system and data has been achieved.

Objectives of the methodology:

Simulate failure conditions and test recovery processes (manual and automatic) of the database, applications, and system to the desired known state. To monitor and record the operation algorithm after recovery, the following types of conditions are included in testing:

power interruption in the client system

power interruption in the server system

connection interruption through network servers

interrupted connection or loss of power to DASDs (direct access storage devices) and DASD controllers

incomplete cycles (interruption of data filtering processes, interruption of data synchronization processes)

invalid pointers and database keys

invalid or corrupted data items in the database

Methodology:

You can use tests already created to test business process functionality and cycles as a basis for creating a series of transactions to support recovery testing. The first step is to identify tests for recovery success.

Power interruption in the client system: Power off the PC.

Server system power interruption: Simulates or initiates power-down procedures for the server.

Interrupt via network servers: Simulates or initiates a loss of network connection (physically disconnecting connecting wires or turning off power to network servers or routers).

Interruption of connection or loss of power to DASDs and DASD controllers: simulation or physical loss of connection to one or more DASDs or DASD controllers.

Once the above or simulated conditions are reached, additional transactions should be executed, and at this second test point recovery procedures should be invoked.

When testing incomplete loops, the same methodology is used as described above, except that the database processes themselves must be interrupted or terminated prematurely.

Testing the following conditions requires reaching a known database state. Several database fields, pointers, and keys must be corrupted manually and directly in the database (using database tools). Additional transactions should be performed using tests from application feature testing and business process cycles and fully completed cycles.
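Corrupting keys and then detecting the damage can be illustrated in miniature. This sketch (invented parent/child schema) breaks a reference with enforcement off, the way a corruption tool would, then finds it with SQLite's built-in consistency check:

```python
import sqlite3

# Sketch: deliberately break referential integrity (with enforcement off,
# as a corruption tool would), then detect it as a recovery check might.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parents (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE children (id INTEGER PRIMARY KEY, "
             "parent_id INTEGER REFERENCES parents(id))")
conn.execute("INSERT INTO parents VALUES (1)")
conn.execute("INSERT INTO children VALUES (1, 1)")

# corrupt a "pointer": point the child at a parent that does not exist
conn.execute("PRAGMA foreign_keys = OFF")
conn.execute("UPDATE children SET parent_id = 999 WHERE id = 1")

# the consistency check finds the dangling reference
violations = conn.execute("PRAGMA foreign_key_check").fetchall()
assert len(violations) == 1
assert violations[0][0] == "children"  # the table holding the bad row
```

In a real recovery test, the equivalent check would be run after the recovery procedure to confirm the database reached the known desired state.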

Oracles:

Outline one or more strategies that can be used in the technique to correctly observe test results. An oracle combines elements of both the method by which an observation can be made and the characteristics of a particular outcome that indicate possible success or failure. Ideally, oracles will perform self-checking, allowing for initial assessment of success or failure by automated tests. However, you should be aware of the risks associated with automatically determining results.

Required tools:

This technique requires the following tools:

image creation and base recovery tool

installation monitoring tools (registry, hard drive, CPU, memory, etc.)

backup and recovery tools

Success criteria:

This technique supports testing:

One or more simulated failures involving one or more combinations of applications, database, and system.

One or more simulated recoveries, involving one or more combinations of applications, database, and system, to a known desired state.

Special information:

Recovery testing is largely intrusive. Procedures for disconnecting electrical cables (when simulating power or connection loss) may not be desirable or feasible. Alternative methods such as diagnostic software tools may be required.

Requires resources from systems (or computer operations), databases, and network groups.

These tests should be performed outside of normal business hours or on an isolated system.

Configuration testing

Configuration testing verifies the performance of the test target under different hardware and software configurations. In most work environments, the specific hardware specifications for client workstations, network connections, and database servers may vary. Client workstations may have different software loaded (eg, applications, drivers, and so on), and many different combinations of software may be active at the same time, using different resources.

Objectives of the methodology:

Verifies the test target in the required hardware and software configurations to observe and record the target algorithm in different configurations and determine differences in configuration status.

Methodology:

Apply function tests.

Open and close various software unrelated to the test, such as the Microsoft® Excel® and Microsoft® Word® applications, either as part of the test or before the test is run.

Execute selected transactions to simulate actors interacting with the test target and with the non-test-target software.

Repeat the above process, minimizing the available main memory on the client workstation.
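The last step above can be illustrated with a minimal sketch: repeat the same functional check while holding progressively larger memory ballast, reducing the free main memory available to the test target. The ballast sizes and the check itself are assumptions standing in for real function-test transactions.

```python
def functional_check() -> bool:
    # Stand-in for one function-test transaction against the test target.
    data = [i * 2 for i in range(10_000)]
    return sum(data) == 10_000 * 9_999

results = {}
for ballast_mb in (0, 16, 64):                 # MB of memory held during the run
    ballast = bytearray(ballast_mb * 1024 * 1024)
    results[ballast_mb] = functional_check()
    del ballast                                # release before the next configuration

print(results)
```

In a real configuration test the check would be the predetermined subset of function tests, and the memory pressure would come from the actual coexisting applications rather than an artificial allocation.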

Oracles:

Outline one or more strategies that can be used in the technique to correctly observe test results. An oracle combines elements of both the method by which an observation can be made and the characteristics of a particular outcome that indicate possible success or failure. Ideally, oracles will perform self-checking, allowing for initial assessment of success or failure by automated tests. However, you should be aware of the risks associated with automatically determining results.

Required tools:

This technique requires the following tools:

base image creation and restoration tool

installation monitoring tools (registry, hard drive, CPU, memory, etc.)

Success criteria:

This technique supports the testing of one or more combinations of test-target elements executed in the expected supported deployment environments.

Special information:

What non-target software is required, available, and accessible on the desktop?

What applications are commonly used?

What data do the applications operate on (for example, a large spreadsheet open in Excel, or a 100-page document in Word)?

As part of this test, you should also document the network, network servers, and system databases in general.

Installation testing

Installation testing has two purposes. The first is to make sure that the software can be installed (e.g., as a new installation, an update, and a full or custom installation) under various standard and non-standard conditions. Non-standard conditions include insufficient disk space, insufficient privileges to create directories, and so on. The second purpose is to verify that the software works correctly after installation. Typically this is accomplished by running a series of tests developed for function testing.

Objectives of the methodology:

Perform a test target installation on each required hardware configuration under the following conditions to observe and record installation behavior and configuration state changes:

new installation: a system on which <project name> has never been installed

update: a system on which the same version of <project name> was previously installed

version upgrade: a system on which an earlier version of <project name> was previously installed
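The three conditions above can be distinguished programmatically before choosing the installation path. The sketch below is hypothetical: it assumes the installed version is recorded in a marker file, standing in for whatever the real installer uses (a registry entry, a package database, a manifest).

```python
from pathlib import Path
import tempfile

def installation_scenario(marker: Path, new_version: str) -> str:
    """Classify which installation condition applies on this system."""
    if not marker.exists():
        return "new installation"          # <project name> was never installed
    installed = marker.read_text().strip()
    if installed == new_version:
        return "update"                    # the same version is already present
    return "version upgrade"               # an earlier version is already present

# Usage against a temporary marker file:
marker = Path(tempfile.mkdtemp()) / "installed_version.txt"
first = installation_scenario(marker, "2.0")       # new installation
marker.write_text("2.0")
second = installation_scenario(marker, "2.0")      # update
marker.write_text("1.5")
third = installation_scenario(marker, "2.0")       # version upgrade
print(first, second, third)
```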

Methodology:

Develop automated or manual scripts to verify the condition of the target system:

new: <project name> has never been installed

update: the same or an earlier version of <project name> is already installed

Launch and complete the installation.

Apply a predetermined subset of function-test scenarios, executing their transactions.
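A driver tying these steps together might look like the sketch below. The installer command and the smoke checks are placeholders, not a real installer interface: the point is only the sequence of running the installation, verifying it succeeded, and then applying the function-test subset.

```python
import subprocess, sys

def run_install_test(installer_cmd, smoke_tests) -> bool:
    """Run the installer, then a smoke subset of function tests."""
    result = subprocess.run(installer_cmd, capture_output=True)
    if result.returncode != 0:
        return False                       # the installation itself failed
    return all(check() for check in smoke_tests)

# Example with stand-in values: a no-op "installer" and one trivial check.
ok = run_install_test([sys.executable, "-c", "pass"], [lambda: 1 + 1 == 2])
print(ok)
```

In practice the smoke tests would come from the predetermined function-test subset, and a failed installation would also trigger the configuration-state comparison described above.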

Oracles:

Outline one or more strategies that can be used in the technique to correctly observe test results. An oracle combines elements of both the method by which an observation can be made and the characteristics of a particular outcome that indicate possible success or failure. Ideally, oracles will perform self-checking, allowing for initial assessment of success or failure by automated tests. However, you should be aware of the risks associated with automatically determining results.

Required tools:

This technique requires the following tools:

base image creation and restoration tool

installation monitoring tools (registry, hard drive, CPU, memory, etc.)

Success criteria:

This technique supports testing the installation of a developed product in one or more installation configurations.

Special information:

Which <project name> transactions should be selected to provide a reliable test that the <project name> application was installed successfully and that no important software components are missing?