Semi-structured (JSONB)

JSONB: Perfect Combination of Relational and Document Data Models

Relational and document databases each have strengths and weaknesses. Consider a hotel reservation system:

Relational Database: This model is excellent for managing customer information and booking details. However, its structure can become too rigid when adapting to rapidly changing or semi-structured customer needs, such as specific room configurations or additional service requests.

Document Database: This model provides exceptional flexibility for handling semi-structured data. However, issues arise when multiple customers book the same room, leading to redundant room information across individual customer documents, complicating data updates.

Choosing one database type exclusively may not be the most effective solution. Tacnode combines the strengths of relational and document models by introducing the JSONB data type. This dual capability efficiently manages relational data, like customer and booking information, while supporting flexible semi-structured requirements without data duplication.

In short, neither a purely relational nor a purely document model fits this workload well. Tacnode lets you combine both in the same database.
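As a rough illustration of the hotel example, the sketch below keeps booking facts in typed columns and puts per-booking extras in a JSONB column. The table and column names (rooms, bookings, special_requests) and the sample rows are hypothetical and do not come from Tacnode's documentation.

```sql
-- Hypothetical schema: room data lives once in its own table,
-- so it is not duplicated into every customer's document.
CREATE TABLE rooms
(
    room_id   int PRIMARY KEY,
    room_type text NOT NULL
);

CREATE TABLE bookings
(
    booking_id       serial PRIMARY KEY,
    room_id          int  NOT NULL,   -- refers to rooms.room_id
    customer_name    text NOT NULL,
    check_in         date NOT NULL,
    special_requests jsonb            -- free-form, varies per booking
);

INSERT INTO rooms VALUES (101, 'double');
INSERT INTO bookings (room_id, customer_name, check_in, special_requests)
VALUES (101, 'Alice', '2024-05-01', '{"extra_bed": true, "pillow": "firm"}'),
       (101, 'Bob',   '2024-05-08', '{"late_checkout": "14:00"}');

-- A relational join and a JSONB filter in the same query
SELECT b.customer_name, r.room_type, b.special_requests ->> 'pillow' AS pillow
FROM bookings b
JOIN rooms r ON r.room_id = b.room_id
WHERE b.special_requests @> '{"extra_bed": true}';
```

Both customers booked room 101, yet the room's description is stored only once, while each booking carries its own differently shaped request document.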

When to use JSONB

Optimal handling of semi-structured and hierarchical data: JSONB is well suited to storing and querying semi-structured data such as user preferences, configuration settings, multi-level classification systems, and other nested formats. These structures are stored directly in a column instead of being split across multiple relational tables, which can simplify queries and avoid joins, especially for large or deeply nested data.

Supports dynamic or evolving data models: JSONB suits data models whose attributes change frequently or are not known in advance. New or uncommon properties can be stored without altering the existing table schema. This flexibility reduces the maintenance cost and complexity of data model changes.

Streamlines integration with external systems: When collaborating with third-party systems, data often comes in various formats. JSONB provides an effective solution for storing this diverse data while maintaining its original structure and specifications. This capability simplifies importing and exporting data across different systems and enhances future data processing and analysis.
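To make the point about evolving data models concrete, here is a minimal sketch that reuses the json_t table created later on this page. The attribute names (language, theme) and the default value are made up for illustration.

```sql
-- An early row, written before "theme" existed as an attribute
INSERT INTO json_t(json_column) VALUES ('{"language": "en"}');

-- A later release starts recording "theme"; no ALTER TABLE is needed
INSERT INTO json_t(json_column) VALUES ('{"language": "de", "theme": "dark"}');

-- Older rows return NULL for the missing key; COALESCE supplies a default
SELECT id, COALESCE(json_column ->> 'theme', 'light') AS theme
FROM json_t;
```

Because old and new rows coexist, queries have to tolerate missing keys, which is the price paid for skipping the schema migration.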

Comparing JSONB with Other Data Types

Flexibility and Strictness: Traditional data types like integers and characters are known for their strictness, ensuring data consistency and integrity. JSONB's advantage lies in its flexibility, allowing the storage and processing of irregular or unstructured data. The choice of data type should align with your application's specific needs. Traditional types are preferable for cases requiring strong data typing, whereas JSONB excels in variable or unstructured data situations.

Performance: JSONB's flexibility comes at a cost. Queries that filter or sort on a JSONB field can be slower than the same queries on a typed column such as an integer or character column, and the gap is most visible under high query frequency. When designing your schema, review how each field is read and queried, and balance flexibility against performance accordingly.

Storage Efficiency Trade-off: JSONB makes it easy to store complex data structures, but it typically occupies more space than equivalent typed columns, in part because key names are stored with every value. This can matter in environments with limited resources, so weigh JSONB's flexibility against its storage cost for your actual workload.

In summary, specific application needs should guide the choice between JSONB and traditional data types. If your application demands high flexibility in data structure and iteration speed, JSONB could be an excellent option. Nonetheless, it's crucial to also weigh query performance and storage considerations.
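The strictness trade-off can be sketched with one table that stores a price both ways. The table product_t and its columns are hypothetical, and the behavior shown is standard PostgreSQL behavior, which Tacnode is assumed to share.

```sql
CREATE TABLE product_t
(
    id    serial PRIMARY KEY,
    price numeric NOT NULL,  -- strictly typed: non-numeric input is rejected
    attrs jsonb              -- flexible: any valid JSON is accepted
);

INSERT INTO product_t(price, attrs)
VALUES (19.99, '{"price": 19.99, "color": "red"}'),
       (5.00,  '{"price": "n/a"}');  -- JSONB accepts a string where a number was expected

-- Typed column: compared directly as a number
SELECT id FROM product_t WHERE price > 10;

-- JSONB field: extract as text, then cast. The CASE guard skips rows whose
-- "price" is not a JSON number, which would otherwise make the cast fail.
SELECT id
FROM product_t
WHERE CASE
          WHEN jsonb_typeof(attrs -> 'price') = 'number'
              THEN (attrs ->> 'price')::numeric > 10
          ELSE false
      END;
```

The second query needs a type check and a cast that the typed column gets for free. That is a reasonable signal for moving stable, frequently queried fields into ordinary columns.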

JSONB Usage

Create and Manage JSONB Tables

Create a table with JSONB columns

CREATE TABLE json_t
(
    id          serial PRIMARY KEY,
    json_column jsonb
);

Insert JSONB data

INSERT INTO json_t(json_column) VALUES ('{"key": "value"}');

Read JSONB data

SELECT json_column -> 'key' FROM json_t;

Update JSONB data

UPDATE json_t SET json_column = jsonb_set(json_column, '{key}', '"new_value"');

Delete some data inside JSONB

UPDATE json_t SET json_column = json_column - 'key';
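The UPDATE statements above have no WHERE clause, so they modify every row in the table. A minimal sketch of row-scoped variants follows, assuming a row with id = 1 exists.

```sql
-- Change "key" only in the row whose id is 1
UPDATE json_t
SET json_column = jsonb_set(json_column, '{key}', '"new_value"')
WHERE id = 1;

-- Remove "key" only from rows that actually contain it
UPDATE json_t
SET json_column = json_column - 'key'
WHERE json_column ? 'key';

-- Read the row back to confirm the result
SELECT id, json_column FROM json_t WHERE id = 1;
```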

Query JSONB by Operators

Use the -> and ->> operators

  • ->: Returns JSONB array elements or object fields.
SELECT json_column -> 'key' FROM json_t;
SELECT json_column -> 0 FROM json_t; -- Get the first element of an array
  • ->>: Returns a JSONB array element or object field as text.
SELECT json_column ->> 'key' FROM json_t;
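The difference between the two operators is easiest to see on literal values. The results in the comments follow standard PostgreSQL semantics, which Tacnode is assumed to match.

```sql
-- -> keeps the JSONB type, so a string comes back with its JSON quotes
SELECT '{"key": "value"}'::jsonb -> 'key';                  -- "value"

-- ->> returns plain text, without the quotes
SELECT '{"key": "value"}'::jsonb ->> 'key';                 -- value

-- -> can be chained to walk into nested objects and arrays
SELECT '{"a": {"b": [10, 20]}}'::jsonb -> 'a' -> 'b' -> 1;  -- 20

-- Compare ->> output with text, and -> output with JSONB
SELECT * FROM json_t WHERE json_column ->> 'key' = 'value';
SELECT * FROM json_t WHERE json_column -> 'key' = '"value"'::jsonb;
```

As a rule of thumb, use ->> for the last step when the result feeds a text comparison or a cast, and -> for intermediate steps.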

Use @> and <@ to test containment

  • @> tests whether the JSONB value on the left contains the key/value pairs or elements on the right; <@ tests the reverse, whether the left value is contained in the right one.
SELECT * FROM json_t WHERE json_column @> '{"key": "value"}';
SELECT * FROM json_t WHERE json_column <@ '{"key": "value", "another_key": "another_value"}'::jsonb;
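A few literal comparisons show how containment behaves. This is a sketch based on standard PostgreSQL semantics, assumed to carry over to Tacnode.

```sql
SELECT '{"a": 1, "b": 2}'::jsonb @> '{"a": 1}'::jsonb;        -- true
SELECT '{"a": 1}'::jsonb <@ '{"a": 1, "b": 2}'::jsonb;        -- true

-- For arrays, element order and duplicates do not matter
SELECT '[1, 2, 3]'::jsonb @> '[3, 1]'::jsonb;                 -- true

-- Containment must match the nesting structure
SELECT '{"a": {"b": 1}}'::jsonb @> '{"b": 1}'::jsonb;         -- false
SELECT '{"a": {"b": 1}}'::jsonb @> '{"a": {"b": 1}}'::jsonb;  -- true

-- JSON types must match: the number 1 is not the string "1"
SELECT '{"a": 1}'::jsonb @> '{"a": "1"}'::jsonb;              -- false
```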

Using #> and #>> for getting nested data

SELECT json_column #> '{path, to, key}' FROM json_t;
SELECT json_column #>> '{path, to, key}' FROM json_t;
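The path operators can be tried on literal documents as sketched below. The sample documents are made up, and the results assume standard PostgreSQL behavior.

```sql
-- #> returns JSONB, #>> returns text
SELECT '{"path": {"to": {"key": "v"}}}'::jsonb #>  '{path, to, key}';  -- "v"
SELECT '{"path": {"to": {"key": "v"}}}'::jsonb #>> '{path, to, key}';  -- v

-- A path step that is a number addresses an array element (zero-based)
SELECT '{"items": [{"sku": "A1"}, {"sku": "B2"}]}'::jsonb #>> '{items, 1, sku}';  -- B2

-- A path that does not exist yields NULL rather than an error
SELECT '{"path": {}}'::jsonb #> '{path, to, key}';  -- NULL
```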

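Use the ? and ?| operators to test whether keys exist

  • ? tests whether a string exists as a top-level key (or as a string element of a top-level array); ?| tests whether any of the listed strings do. These operators do not match values inside an object.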
SELECT * FROM json_t WHERE json_column ? 'key';
SELECT * FROM json_t WHERE json_column ?| array['key1', 'key2'];
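The existence operators look at top-level keys only, which the following literal examples illustrate. The results assume standard PostgreSQL semantics.

```sql
SELECT '{"key": "value"}'::jsonb ? 'key';    -- true
SELECT '{"key": "value"}'::jsonb ? 'value';  -- false: "value" is a value, not a key
SELECT '{"a": {"key": 1}}'::jsonb ? 'key';   -- false: the key is nested, not top-level
SELECT '["key1", "key2"]'::jsonb ? 'key1';   -- true: string element of a top-level array

SELECT '{"key1": 1}'::jsonb ?| array['key1', 'key2'];  -- true: at least one exists
SELECT '{"key1": 1}'::jsonb ?& array['key1', 'key2'];  -- false: not all exist

-- To test for a key together with its value, use containment instead
SELECT '{"key": "value"}'::jsonb @> '{"key": "value"}'::jsonb;  -- true
```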

Manage JSONB by Functions

  • jsonb_set: Replace the value at the specified path, or add it if the final key does not exist.
SELECT jsonb_set(json_column, '{key}', '"new_value"') FROM json_t;
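  • jsonb_insert: Insert a new value at the specified path without replacing existing data. For an array, the value is inserted before the addressed element by default.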
SELECT jsonb_insert(json_column, '{path, to, key}', '"new_value"') FROM json_t;
  • jsonb_array_elements: Expands a jsonb array into separate rows. It produces a collection where each row represents a JSON element from the original array. Typically, you'll need to use FROM or JOIN to obtain the results with this function.

For example, consider a json_column column containing the following data:

["element1", "element2", "element3"]

Use jsonb_array_elements as follows:

SELECT jsonb_array_elements(json_column)
FROM json_t;
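The FROM-clause form mentioned above might look like the following sketch. The aliases (t, e, elem) are arbitrary. Since json_t may also hold non-array documents, the CASE expression substitutes an empty array for them, because jsonb_array_elements raises an error on non-array input in PostgreSQL.

```sql
-- Store an array so there is something to expand
INSERT INTO json_t(json_column) VALUES ('["element1", "element2", "element3"]');

-- One output row per array element, paired with the id of its source row
SELECT t.id, e.elem
FROM json_t AS t,
     jsonb_array_elements(
         CASE WHEN jsonb_typeof(t.json_column) = 'array'
              THEN t.json_column
              ELSE '[]'::jsonb   -- non-array documents contribute no rows
         END
     ) AS e(elem);
```

Each elem is still JSONB (strings keep their quotes). PostgreSQL also provides jsonb_array_elements_text for plain text output.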

JSONB GIN Indexes

Why and when you need to index JSONB data

  • Indexing can greatly enhance query speed, especially when frequently accessing large volumes of JSONB data.

Create GIN index

The GIN (Generalized Inverted Index) is well-suited for JSONB data because it efficiently manages complex nested data and search queries.

-- Row table, USING gin
CREATE INDEX json_t_jsonbcol_gin ON json_t USING gin(json_column);
 
-- Columnar or hybrid table, USING split_gin
CREATE INDEX json_t_jsonbcol_gin ON json_t USING split_gin(json_column);

Query using GIN index

  • GIN indexes can speed up many types of queries, such as using @> to find records containing a specific key/value:
SELECT * FROM json_t WHERE json_column @> '{"key": "value"}'::jsonb;
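Whether a query benefits from the index depends on the operator it uses. In PostgreSQL, the default GIN operator class for jsonb supports @>, ?, ?| and ?&, but not equality on an extracted field. That Tacnode's gin and split_gin indexes behave the same way is an assumption here.

```sql
-- Extraction plus equality: typically cannot use a GIN index on json_column
SELECT * FROM json_t WHERE json_column ->> 'key' = 'value';

-- Same filter (when the stored value is a JSON string), written with
-- a containment operator that GIN supports
SELECT * FROM json_t WHERE json_column @> '{"key": "value"}'::jsonb;

-- Key-existence checks are GIN-indexable as well
SELECT * FROM json_t WHERE json_column ? 'key';
```

The two forms differ when the stored value is not a string: ->> compares the text rendering, while @> also requires the JSON type to match.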

Delete index

  • If at some point you no longer need this index, you can delete it using the following command:
DROP INDEX json_t_jsonbcol_gin;

Creating indexes is essential for large datasets and frequent queries, as they improve performance. However, they also increase storage demands and may extend insert, update, or delete times due to index updates. Thus, consider the benefits and drawbacks before establishing an index.

Advanced Usage

JSONB path query

  • You can query nested data by passing the path to the target key, as a text array, to the #> operator.
SELECT json_column #> '{path, to, nested, key}' FROM json_t;

Aggregate JSONB data

  • You can apply aggregate functions to combine multiple jsonb data rows into an array or an object.
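-- Aggregate into an array, one array per group
-- (some_column is a placeholder for a grouping column in your table)
SELECT jsonb_agg(json_column) FROM json_t GROUP BY some_column;
 
-- Aggregate key/value pairs into a single object
-- (key_column and value_column are placeholders for columns in your table)
SELECT jsonb_object_agg(key_column, value_column) FROM json_t;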
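The same aggregates can be written against the columns json_t actually has (id and json_column). This is a minimal sketch. The ORDER BY inside the aggregate is standard PostgreSQL syntax and is assumed to be available in Tacnode.

```sql
-- One array holding every document, in id order
SELECT jsonb_agg(json_column ORDER BY id) AS all_docs
FROM json_t;

-- One object keyed by id, shaped like {"1": {...}, "2": {...}}
SELECT jsonb_object_agg(id, json_column) AS docs_by_id
FROM json_t;

-- Group by a field inside the document and collect the ids in each group
SELECT json_column ->> 'key' AS key_value,
       jsonb_agg(id ORDER BY id) AS ids
FROM json_t
GROUP BY json_column ->> 'key';
```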

Modify nested data

  • You can modify nested data at the specified path using jsonb_set.
UPDATE json_t SET json_column = jsonb_set(json_column, '{path, to, nested, key}', '"new_value"');
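A common surprise with jsonb_set is that it only creates the last key of the path, never missing parent objects. The literal examples below sketch this, assuming standard PostgreSQL semantics.

```sql
-- Replaces an existing nested value
SELECT jsonb_set('{"a": {"b": 1}}'::jsonb, '{a, b}', '2');   -- {"a": {"b": 2}}

-- Adds the final key when it is missing (create_missing defaults to true)
SELECT jsonb_set('{"a": {}}'::jsonb, '{a, b}', '2');         -- {"a": {"b": 2}}

-- Does not create missing parents: the input comes back unchanged
SELECT jsonb_set('{}'::jsonb, '{a, b}', '2');                -- {}

-- With the fourth argument set to false, existing values are replaced
-- but no new key is added
SELECT jsonb_set('{"a": {}}'::jsonb, '{a, b}', '2', false);  -- {"a": {}}

-- jsonb_insert places a value before the addressed array position
SELECT jsonb_insert('{"a": [1, 3]}'::jsonb, '{a, 1}', '2');  -- {"a": [1, 2, 3]}
```

If an UPDATE through jsonb_set appears to do nothing, check first whether every parent object along the path exists.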

Array operations

  • Arrays can be modified through designated functions and operators in jsonb.
-- Append an element to the end of the array (json_column must hold a JSON array)
UPDATE json_t SET json_column = json_column || '["new_element"]'::jsonb;
 
-- Get the length of the array
SELECT jsonb_array_length(json_column) FROM json_t;
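The array operators can be checked on literals before running them against a table. This is a sketch under standard PostgreSQL semantics, and the "tags" key in the final statement is a hypothetical field name.

```sql
SELECT '["a", "b"]'::jsonb || '["c"]'::jsonb;         -- ["a", "b", "c"]
SELECT '["a", "b", "c"]'::jsonb - 0;                  -- ["b", "c"]  (remove by index)
SELECT '["a", "b", "c"]'::jsonb - 'b';                -- ["a", "c"]  (remove matching string)
SELECT jsonb_array_length('["a", "b", "c"]'::jsonb);  -- 3

-- If the left side is an object rather than an array, || wraps it into an array
SELECT '{"k": 1}'::jsonb || '["c"]'::jsonb;           -- [{"k": 1}, "c"]

-- Appending to an array nested under a key: combine jsonb_set with ||
UPDATE json_t
SET json_column = jsonb_set(json_column, '{tags}',
                            (json_column -> 'tags') || '["new_element"]'::jsonb)
WHERE jsonb_typeof(json_column -> 'tags') = 'array';
```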

Delete multiple keys

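  • Apart from a single key, the - operator can also remove several keys at once when given a text array. (jsonb_strip_nulls serves a different purpose: it removes object fields whose value is null.)
UPDATE json_t SET json_column = json_column - '{key1, key2}'::text[];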
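The deletion variants differ in what they target, as the literal examples below sketch. The results assume standard PostgreSQL semantics. In PostgreSQL an untyped string on the right of - is resolved as a single text key, so an explicit text[] cast or an array[...] constructor is the safer way to remove several keys.

```sql
-- Remove several top-level keys at once
SELECT '{"a": 1, "b": 2, "c": 3}'::jsonb - '{a, b}'::text[];  -- {"c": 3}
SELECT '{"a": 1, "b": 2, "c": 3}'::jsonb - array['a', 'b'];   -- {"c": 3}

-- Remove a nested key by path with #-
SELECT '{"a": {"b": 1, "c": 2}}'::jsonb #- '{a, b}';          -- {"a": {"c": 2}}

-- jsonb_strip_nulls removes fields whose value is null, whatever their name
SELECT jsonb_strip_nulls('{"a": 1, "b": null}'::jsonb);       -- {"a": 1}
```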

Query for a specific key/value

  • Use the ?, ?&, and ?| operators to check whether specific keys exist.
-- Check if a key exists
SELECT * FROM json_t WHERE json_column ? 'some_key';
 
-- Check if multiple keys exist
SELECT * FROM json_t WHERE json_column ?& array['key1', 'key2'];
 
-- Check if any of the keys exist
SELECT * FROM json_t WHERE json_column ?| array['key1', 'key2'];

Best Practices

  • Choose the Right Data Format: While jsonb offers flexibility, it isn't suitable for every situation. A traditional relational table structure is often better for data with a fixed structure that is frequently used in join queries.

  • Use Indexing: Effective indexing is essential in large datasets with frequent queries. GIN indexes are typically the most efficient for querying jsonb data.

  • Monitor Query Performance: Avoid unnecessary nested queries and make sure your queries can use the available indexes, especially on large datasets.

    Use EXPLAIN to evaluate the query plan. This will help you determine whether indexes are utilized and whether further optimizations are possible.

  • Data Normalization: Although jsonb allows for the flexibility of denormalized data storage, excessive use can lead to data redundancy and performance issues. Ensure data organization and normalization as appropriate.

  • Utilize Appropriate Functions and Operators: Tacnode offers numerous jsonb functions and operators tailored for data manipulation. Please familiarize yourself with these tools and incorporate them as needed.

  • Limit Frequent Modifications: Frequently modifying jsonb column data may degrade performance, because an update often rewrites the entire jsonb object.

  • Storage Size: The jsonb storage size is generally larger than equivalent relational data. Therefore, when storing substantial amounts of data, plan your capacity properly.
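The EXPLAIN advice above can be sketched as follows. The index name matches the one created earlier on this page. The exact shape of the plan output in Tacnode is not documented here, so the comments describe only what to look for.

```sql
-- Show the plan without running the query
EXPLAIN
SELECT * FROM json_t WHERE json_column @> '{"key": "value"}'::jsonb;

-- In the output, look for a scan node that names json_t_jsonbcol_gin.
-- A plain sequential scan on json_t suggests the index is not being used,
-- for example because the predicate uses an operator the index does not
-- support or because the table is small enough that a full scan is cheaper.
```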

FAQ

  1. Data Insertion Error

    Problem: An error arises when inserting invalid JSON data.

    Solution:

    • Confirm that the data you are inserting conforms to valid JSON format.
    • Use online tools or software libraries to validate your JSON data.
    • If an error message appears, verify the JSON format first.
  2. Query Did Not Return Expected Results

    Problem: The expected results are not produced when querying a specific jsonb key or value.

    Solution:

    • Make sure that the jsonb key or value you are querying precisely matches the data in the column, paying close attention to case sensitivity.

    • Format the output with the jsonb_pretty function, which makes the stored structure easier to read when troubleshooting.
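The checks below sketch how both FAQ problems can be reproduced and diagnosed. They rely on standard PostgreSQL behavior, assumed to hold in Tacnode. The two invalid statements are commented out because each one raises an error.

```sql
-- Pretty-print documents to inspect their structure and exact key spelling
SELECT id, jsonb_pretty(json_column) FROM json_t;

-- Keys are case-sensitive
SELECT '{"Key": "value"}'::jsonb ? 'key';         -- false
SELECT '{"Key": "value"}'::jsonb ->> 'key';       -- NULL

-- JSON types must match: the number 1 is not the string "1"
SELECT '{"n": 1}'::jsonb @> '{"n": "1"}'::jsonb;  -- false
SELECT jsonb_typeof('{"n": 1}'::jsonb -> 'n');    -- number

-- Typical invalid JSON that is rejected on insert:
-- SELECT '{''key'': ''value''}'::jsonb;   -- single quotes instead of double quotes
-- SELECT '{"key": "value",}'::jsonb;      -- trailing comma
```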