Oracle Sharding is a data tier architecture that allows horizontal data partitioning across independent databases. Each database is hosted on its own server and is called a shard. All of them together make up a single logical database - a sharded database. From a client point of view, a sharded database looks like a single database.
Sharding is available in Oracle 12.2 and higher.
In a sharded database, a table can be either sharded or duplicated. A sharded table stores its own portion of records in each shard, and the data is distributed between shards based on a value of one or more of its columns, that are called the sharding key. A duplicated table stores a copy of all its content in each shard.
Rows may be split between different shards based on the sharding key hash, lists of sharding key values, or ranges of sharding key values.
Tables with a master-detail relationships that form a hierarchy that ascends to a single table form the table family. For example, there are three tables: Customers, Orders, and OrderLines. Orders is a detail table for Customers, and OrderLines is a detail table for Orders. There also may be other detail tables for Customers, and so on. In this case, the data of entire hierarchy of tables may be split between different shards based on the Customers sharding key. Note that all the table in the table family must have the sharding key columns with values, corresponding to their parents.
Oracle has certain limitation on data types that sharding key columns may have. A sharding key column can have only one of the following data types:
You can use a composite sharding key with one column determining data distribution between shardspaces and the second - between shards. In this case the part of the composite sharding key that is responsible for distributing data between shardspaces is sometimes called super sharding key and only the other sharding key columns are called shard key.
Note that this is somewhat simplified description of database sharding. More advanced aspects, such as table partitioning, tablespaces, and tablespace sets, on which the sharding technology is based, chunks, which are units of data migration between shards, and some other features are not touched here at all. For more detailed sharding description please address to Oracle documentation.
Applications are not requiered to know anything about database sharding, and can connect to a sharded database completely transparently, like with any non-sharded Oracle database. However, in such case queries may often need to run against all the shards, and in certain cases it may lead to inefficient resource use.
Sharded databases provide many performance benefits - linear scalability, geographical data distribution, etc. But the best practice is when an application mostly uses small short-living sessions that work only with a limited set of data from a single shard - perform single-shard queries. In case of the example with Customers, Orders, and OrderLines, described above, each session works with data of a single customer. In this case, such a connection can specify the necessary value of the sharding key of the Customers table family, and this allows connecting to a specific shard, where the needed data is stored, without the need to query other shards. This ensures the best performance and most efficient resource use.
Of course, whenever necessary, an application may connect without specifying sharding key, and even when the sharding key is specified, you are free to execute queries that return data from different shards. But the best practice is to minimize using of such queries.
As described above, sharding support on the client side consists in the ability to pass the required sharding key value. Note that it is not passed as a part of the connection string.
For passing a sharding key, dotConnect for Oracle offers the OracleShardingKey class that represents a sharding key value and the SetShardingKey method of OracleConnection. You create an OracleShardingKey instance or instances and specify the corresponding data types and values for them. Then you set the created keys for the connections. Note that they should be set prior to the first connection opening; otherwise, they are not applied. If you need to change the sharding key values used, you need to create a new connection.
Note that OracleShardingKey constructor and SetShardingKey method, which accept pairs of type/value for the sharding key, accept value arguments only of a specific type, corresponding to the provided type. Acceptable values are:
OracleDbType | Accepted types of data parameter |
---|---|
OracleDbType.Integer | Int32 or OracleNumber |
OracleDbType.Number | Decimal or OracleNumber |
OracleDbType.Date | DateTime or OracleDate or OracleTimeStamp |
OracleDbType.TimeStamp | DateTime or OracleTimeStamp |
OracleDbType.VarChar | String or OracleString |
OracleDbType.Raw | byte[] or OracleBinary |
Here is an example of initializing OracleConnection for performing single shard queries.
dotConnect for Oracle supports sharding only in the OCI mode. If you want to use connection pooling with sharding, you should use OCI session pooling and disable dotConnect for Oracle connection pooling (add "Pooling=false;Oci Session Pooling=true;" to your connection string.