tacnode

Tutorial - NYC Hotel Business (Semi-structured)

Background

In bustling New York City, the variety of restaurants is vast, ranging from street vendors to upscale dining. For a data analyst, New York is a treasure trove of data, full of opportunities and challenges. By delving into restaurant data, analysts can provide more valuable recommendations to diners, such as which restaurant has a higher food safety rating or which one has better customer reviews during specific time periods.

To conduct such in-depth analysis, data analysts need a database that can store both structured and unstructured data efficiently. Traditional relational databases excel at handling structured data but may face challenges with unstructured data like customer reviews and restaurant descriptions. Conversely, document databases are flexible with unstructured data but may not be as efficient for complex queries as relational databases.

This is where the Tacnode database stands out. It is fully compatible with PostgreSQL syntax and combines the strengths of both relational and document databases. This provides data analysts with an efficient and flexible tool to extract valuable insights from vast amounts of restaurant data.

This tutorial will guide you on how to create and query a New York restaurant dataset in Tacnode, showcasing its powerful data querying and management capabilities.

Data Structure and Sample Data Preparation

You don't need to start from scratch with raw JSON data. To streamline your experience, we have processed the data into a CSV format that matches the table structure. This format can be easily imported into Tacnode, leveraging its robust storage and querying capabilities.

CSV Sample Data:

1   Morris Park Bake Shop  Bronx  Bakery {"coord": [-73.856077, 40.848447], "street": "Morris Park Ave", "zipcode": "10462", "building": "1007"}    [{"date": {"$date": 1393804800000}, "grade": "A", "score": 2}, {"date": {"$date": 1378857600000}, "grade": "A", "score": 6}, {"date": {"$date": 1358985600000}, "grade": "A", "score": 10}, {"date": {"$date": 1322006400000}, "grade": "A", "score": 9}, {"date": {"$date": 1299715200000}, "grade": "B", "score": 14}]   30075445
2  Wendy'S    Brooklyn   Hamburgers {"coord": [-73.961704, 40.662942], "street": "Flatbush Avenue", "zipcode": "11225", "building": "469"} [{"date": {"$date": 1419897600000}, "grade": "A", "score": 8}, {"date": {"$date": 1404172800000}, "grade": "B", "score": 23}, {"date": {"$date": 1367280000000}, "grade": "A", "score": 12}, {"date": {"$date": 1336435200000}, "grade": "A", "score": 12}]    30112340
3  Dj Reynolds Pub And Restaurant Manhattan  Irish  {"coord": [-73.98513559999999, 40.7676919], "street": "West   57 Street", "zipcode": "10019", "building": "351"}   [{"date": {"$date": 1409961600000}, "grade": "A", "score": 2}, {"date": {"$date": 1374451200000}, "grade": "A", "score": 11}, {"date": {"$date": 1343692800000}, "grade": "A", "score": 12}, {"date": {"$date": 1325116800000}, "grade": "A", "score": 12}]    30191841

Before importing the data, ensure you have downloaded this CSV file and know its storage path (e.g., /tmp/restaurants.csv).

Prerequisites and Environment Setup

  • Basic knowledge of SQL.
  • Tacnode environment installed and configured.

Main Steps

1. Create Database and Tables

CREATE DATABASE restaurantdb;
 
\c restaurantdb
 
CREATE TABLE restaurants (
   id SERIAL PRIMARY KEY,
   name text NOT NULL,
   borough text,
   cuisine text,
   address JSONB NOT NULL,
   grades JSONB NOT NULL,
   restaurant_id text
);

2. Import Data

With the JSON data processed into a format matching the table structure, use the COPY command to import it.

\copy restaurants from '/tmp/restaurants.csv';

3. Data Querying

  1. Simple Query
SELECT name, borough, cuisine FROM restaurants WHERE borough = 'Brooklyn';
  1. Query Restaurants with Specific Ratings
SELECT name, grades FROM restaurants WHERE grades @> '[{"grade": "A"}]';
  1. Query Restaurants in a Specific Location
SELECT name FROM restaurants WHERE address->>'zipcode' = '11225';

Advanced Query Tutorial

  1. Query a Specific Type of Restaurant and Sort by Name:
SELECT name, borough FROM restaurants WHERE cuisine = 'Hamburgers' ORDER BY name;
  1. Query All Restaurants with Rating Data:
SELECT name FROM restaurants WHERE jsonb_array_length(grades) > 0;
  1. Query All Restaurants with an "A" Rating:
SELECT borough, name FROM restaurants WHERE grades @> '[{"grade": "A"}]'::jsonb order by borough;
  1. Query the Number of Restaurants in a Specific Zip Code:
SELECT address->>'zipcode' AS zipcode, COUNT(*) FROM restaurants GROUP BY address->>'zipcode';
  1. Query Restaurants Within a Specific Rating Date Range:
SELECT name FROM restaurants WHERE grades @> '[{"date": {"$date": 1419897600000}}]'::jsonb;
  1. Query Restaurants with Multiple Ratings:
SELECT name, grades FROM restaurants WHERE jsonb_array_length(grades) > 1;
  1. Query the Highest Rated Restaurant on a Specific Street:
SELECT name, MAX((grades->0->>'score')::int) AS max_score FROM restaurants WHERE address->>'street' = 'Flatbush Avenue' GROUP BY name;

Deep Dive into Tacnode: Advanced Queries and Data Analysis

In previous sections, we covered setting up data structures and importing data into Tacnode. However, the true value of data lies in how we query and analyze it. Tacnode's appeal is not only in its ability to store both relational and document data seamlessly but also in its efficiency in executing complex queries, especially those involving JSON data.

In this section, we will explore advanced querying techniques. These techniques will demonstrate how to leverage Tacnode's capabilities to conduct in-depth data mining and analysis on the restaurants table. These queries will help analysts better understand restaurant distribution, cuisine popularity, rating trends, and more. Let's get started!

  1. Aggregate Query: Find the number of restaurants for each cuisine.
SELECT cuisine, COUNT(*) as restaurant_count
FROM restaurants
GROUP BY cuisine
ORDER BY restaurant_count DESC;
  1. JSON Operations: Find all restaurants located within a specified coordinate range.
SELECT name, address->>'street' as street
FROM restaurants
WHERE (address->'coord'->>0)::float8 BETWEEN -73.97 AND -73.95
AND (address->'coord'->>1)::float8 BETWEEN 40.66 AND 40.67;
  1. Deep JSON Query: Find all restaurants with a score over 90.
SELECT name
FROM restaurants
WHERE EXISTS (
  SELECT 1
  FROM jsonb_array_elements(grades) grade
  WHERE (grade->>'score')::int > 90
);
  1. Using JSON Aggregate Functions: Find the top 5 restaurants with the highest average scores.
SELECT
    name,
    AVG((grade->>'score')::float) as average_score
FROM
    restaurants, jsonb_array_elements(grades) as grade
WHERE
    grade->>'score' IS NOT NULL
GROUP BY
    name
HAVING
    AVG((grade->>'score')::float) IS NOT NULL
ORDER BY
    average_score DESC
LIMIT 5;
  1. Date and Time Operations: Analyze the top 5 restaurants with the highest scores within a specific time period.
SELECT
    name,
    borough,
    cuisine,
    grades->0->>'score' AS score
FROM
    restaurants
WHERE
    (grades->0->>'score') IS NOT NULL
AND
    (grades->0->'date'->>'$date')::bigint / 1000 BETWEEN EXTRACT(EPOCH FROM '2015-01-01'::timestamp) AND EXTRACT(EPOCH FROM '2015-02-28'::timestamp)
AND
    grades->0->'date'->>'$date' IS NOT NULL
ORDER BY
    (grades->0->>'score')::int DESC
LIMIT 5;

Exploring Tacnode: JSONB Indexing and Optimization

With the widespread use of document-type data in modern applications, effectively querying and indexing JSON structures has become increasingly important. Especially in a high-performance database like Tacnode, leveraging the jsonb data type and its indexing capabilities can provide unprecedented query advantages. In this chapter, we will focus on maximizing the use of jsonb indexing and optimizing queries to ensure your queries are always in top form.

In our scenario, a data analyst is studying information from numerous restaurants stored in the restaurants table, where the address and grades fields use the jsonb data type. To quickly query and analyze these document-type data, we need to index them appropriately.

JSONB Index Types

Tacnode offers several jsonb index types, with GIN (Generalized Inverted Index) being the most common. It is particularly suitable for jsonb columns containing multiple key-value pairs, arrays, and nested structures.

To improve query efficiency, we can create GIN indexes for the address and grades fields:

CREATE INDEX idx_address_gin ON restaurants USING GIN(address);
CREATE INDEX idx_grades_gin ON restaurants USING GIN(grades);

Querying with Indexes

With GIN indexes, we can execute queries involving jsonb fields more efficiently. For example, to find all restaurants in the Brooklyn area with an "A" rating, we can query:

SELECT name, borough
FROM restaurants
WHERE borough = 'Brooklyn' AND grades @> '[{"grade": "A"}]';

Here, @> is the jsonb containment operator, used to check if the left jsonb value contains the right jsonb value.

Query Optimization Tips

  • Use Tacnode's jsonb operators and functions when querying jsonb fields, as they are optimized for GIN indexes.
  • Avoid using conversion functions like ::text, as they prevent index usage and lead to full table scans.
  • Regularly use the EXPLAIN command to check query execution plans to ensure they are effectively using indexes.

In a high-performance database like Tacnode, effectively using jsonb indexing ensures that querying document-type data is as efficient as querying traditional relational data, if not more so. By leveraging Tacnode's features, data analysts can flexibly analyze and explore document-type data without sacrificing query performance.

Exploring Tacnode: Using JSONPath to Query Restaurant Information

With jsonpath, we can query jsonb data more flexibly and intuitively. When conducting complex queries, the key is understanding its internal workings and how to interact with JSON structures. Below are some complex query examples and their detailed explanations:

1. Find Restaurants with at Least One "A" Rating and a Score Over 15

SELECT name
FROM restaurants
WHERE grades @? '$[*] ? (@.grade == "A" && @.score > 15)';

This query uses the [*] wildcard to iterate over each element in the grades array. The condition after the ? operator filters elements that meet the criteria. The @ symbol represents the current element being iterated.

2. Find All Restaurants Without a "B" Rating

SELECT name
FROM restaurants
WHERE NOT grades @? '$[*] ? (@.grade == "B")';

Here, the NOT keyword is used with the existence operator @? to ensure no elements have a "B" rating.

3. Find Restaurants with "Avenue" in the Street Name but Not "Flatbush Avenue"

SELECT name
FROM restaurants
WHERE address @? '$ ? (@.street like_regex "Avenue" flag "i" && !(@.street == "Flatbush Avenue"))';

The like_regex operator is used for regular expression matching, with flag "i" indicating case-insensitive matching. The ! symbol represents logical "not."

4. Query Restaurants with an "A" Rating Within a Specific Date Range

SELECT name
FROM restaurants
WHERE grades @? '$[*] ? (@.grade == "A" && @.date."$date" >= 1419897600000 && @.date."$date" <= 1422748800000)';

This query aims to filter ratings that are "A" and within a specified date range. First, we use the [*] wildcard to iterate over each element in the grades array. Then, we use the condition after the ? operator to filter, where the @ symbol represents the current element being iterated. We check if each rating's grade attribute is "A" and if its date.$date attribute falls within our target date range.

5. Find Restaurants with Scores Over 90 and Their Corresponding Scores

SELECT
    name,
    (jsonb_path_query_first(grades, '$[*] ? (@.score > 90)')->>'score')::int AS score_above_90
FROM
    restaurants
WHERE
    grades @? '$[*] ? (@.score > 90)'
ORDER BY
    score_above_90 DESC;

The entire JSONPath expression $[*] ? (@.score > 90) is used to iterate over each score object in the grades array and select those with a score attribute greater than 90.

jsonpath provides data analysts with a powerful tool to flexibly query nested and structured JSON data without using complex JSON functions and operators. Combined with Tacnode's high-performance features, jsonpath can help data analysts quickly obtain the information they need, enhancing their productivity.

Conclusion and Further Learning

Through this tutorial, we have explored how Tacnode seamlessly combines relational and document data storage and querying capabilities. With this knowledge, you can handle various complex data more flexibly, providing strong support for your data analysis work. For further learning, you can refer to Tacnode's official documentation to explore more advanced features and capabilities.