Skip to main content

3 posts tagged with "database"

View All Tags

· 14 min read
Taehoon Moon

Introduction

GlueSQL is a versatile database project designed for exceptional portability across a broad range of environments, from embedded systems and servers to web and mobile platforms. The core goal is to support diverse storage environments and manage various data types with a standard SQL approach.

Imagine handling files like CSV, JSONL, and Parquet, or transforming key-value or NoSQL databases such as RocksDB, Redis, and MongoDB into SQL-supporting databases—all feasible with GlueSQL. It can also operate with storages supported in web browsers.

GlueSQL's essential feature is providing a management layer for these diverse storage scenarios without requiring data migration. The broader aim is to facilitate portability of GlueSQL to any environment supporting read or read-write operations. This extends to APIs like GitHub, or messengers like Discord or Slack.

GlueSQL supports both structured and unstructured data and is written in Rust for compatibility with various environments. While portability is its core value, the emphasis is on creating an intuitive, comfortable development environment for easy custom storage implementation.

Ultimately, GlueSQL aims to significantly reduce the cost, time, and complexity of developing new databases. By leveraging GlueSQL for the parser, planner, and execution layer, developers can focus on creating specific storage implementations, leading to a more convenient query interface like SQL for many environments.

The Problem: Why Reinvent the Database?

Despite the numerous database implementations that currently exist, the emergence of new databases continues. The primary reason behind this trend is our need for databases for a broad spectrum of distinct purposes. For instance, new databases are surfacing that are specifically optimized for Large Language Models (LLMs) like ChatGPT. The range is wide and diverse, encompassing embedded databases, OLAP for data analysis, OLTP databases optimized for online transactions, databases specialized for time-series data processing, and many more.

With such varied requirements, we find ourselves in constant need of fresh databases. However, constructing a database from scratch is a monumental task. It necessitates defining a query interface for handling the database and implementing a corresponding parser. Moreover, a separate execution layer for running operations must be built. Also, the planning layer, which is responsible for devising execution strategies, is a vital aspect of this process. Let's not forget about the critical storage layer that physically reads and stores the data. In a nutshell, there's a daunting amount of work involved in developing a new database.

Given these circumstances, it's understandable why numerous emerging databases resort to high pricing structures—they need immediate revenue to offset continuous development costs.

But the story doesn't end here. Query interfaces like SQL are indeed useful for serious tasks, but they also provide excellent utility for handling simple log files such as CSV, JSONL, Parquet, and even for utilizing REST APIs for various applications. The issue arises when a complex query interface needs to be provided even for these lighter storage requirements—it necessitates a development process almost identical to building a sophisticated database. Implementing an entire parser and execution layer just to add SQL support to an existing service can seem like an excessive burden.

Whether it's a simple storage environment or a serious task, the key lies in the storage layer, which involves the actual reading and storing of data. So, what if developers focused on implementing these storage mechanisms while the remaining parts could be handled using existing libraries? This is the role that GlueSQL aspires to play.

The Vision of GlueSQL

The GlueSQL project aims to offer a unified query interface for various environments. The goal is to allow anyone to port and use SQL and GlueSQL's proprietary query builder, the AST Builder, in any desired environment. This could range from key-value databases, serious NoSQL databases, log files, and even REST API services. Essentially, if a service supports reading or read-writing data, regardless of the data type, it can readily support a complex query interface via GlueSQL.

Presently, the GlueSQL project itself directly supports a few storage types as reference storages. These include in-memory storage for non-persistent data handling, sled storage, which is a key-value database written in Rust, JSON storage for handling JSON and JSONL files, and a storage that ports SQL to the web browser's IndexedDB. While the GlueSQL Team is primarily developing these, the aim is to allow anyone to create such custom storages for a wide array of purposes, thus enabling them to assemble the database of their choosing.

Imagine using GlueSQL's SQL and AST Builder everywhere, with the simple method of swapping out storages to operate in diverse settings. It could significantly reduce software development costs. Developers wouldn't need to learn the different usage methods for each database. Instead, they could focus solely on implementing business logic using the same interface.

Our vision is to reduce database development costs by 10 times, or even more than 20 times. We aim to gather diverse database creators under the GlueSQL banner, making it the go-to solution for cost-effective database development.

Benefits to Database Users: Unifying Query Interfaces, Streamlining Software Development, and Reducing Costs

From the perspective of the users who engage with databases, there has always been the burden of learning different interfaces to interact with each database. The approach required to work with Redis is different from that necessary for MongoDB. Likewise, handling SQL databases necessitates using SQL. Although SQL databases generally use a common SQL, the SQL they support can considerably vary when examined in detail.

Naturally, there are legitimate reasons for such differences. Each database focuses on different areas, and to cater to specialized functionalities, they incorporate dedicated interface mechanisms. However, not all application development needs to utilize these database-specific core special functionalities.

Let's look at a couple of examples:

Suppose you're developing a back-end application that uses MySQL as the database and Redis for caching. Due to the vast differences in handling SQL databases and Redis, you would have to develop using different methods when storing data.

Here's another scenario: Imagine you're implementing a data migration pipeline between various databases and log files. Let's say you're transferring Parquet to Redis or MongoDB. In this case, you would need to convert data using different methods for each, all of which would be a cumbersome process.

In both of the above examples, GlueSQL can directly address and solve the issues. It offers the convenience of a uniform query interface to deal with these matters. In certain scenarios, even the construction of a data pipeline can potentially be solved with a single SQL query, thanks to GlueSQL.

Benefits to Database Developers: Drastically Lowering Development Costs and Simplifying the Creation of Purpose-Built Databases

If you want to support SQL in the desired environment, using GlueSQL essentially requires you to implement an interface for Storage. There's no need to support all functionalities from the beginning. You can start lightly, choosing and implementing storage features suitable for the environment you want to create. To facilitate this, GlueSQL also provides a library in the form of a test suite to easily validate the storage you've implemented.

Lowering development costs in this way will enable a broader range of developers to support the GlueSQL query interface. As more developers join, a significant synergy can be generated. Designing a query interface from scratch involves a great deal of work, including planning and supporting the interface for different target programming languages.

However, despite all this hard work, it is not easy to attract database users accustomed to different methods.

Consider that the SQL and AST Builder provided by GlueSQL are already securing numerous users. This eliminates the need for efforts to promote a newly planned query interface. Over the years, many new databases have emphasized compatibility with PostgreSQL or MySQL for similar reasons. As GlueSQL places a strong emphasis on portability in its query interface planning, it allows for more flexible configuration according to the desired situation. Through the AST Builder, it also eliminates the cost of porting to different languages.

For many database developers, using GlueSQL can be an optimal choice, as it can save costs and quickly secure users.

Let me mention one more thing: what's convenient for humans... could be applied to AI as well. Rather than making AI write automation code using different databases, providing a common query interface can be much more efficient.

The Future with GlueSQL

GlueSQL has been and will continue to improve and develop new features to enable portability in various environments. Thanks to the schemaless data support added last year, it is now possible to handle both structured data with schema and unstructured data like JSON simultaneously. This has significantly increased the range of storage environments that can be supported.

One of the key features added last year was the AST Builder. This feature allowed us to escape the confines of SQL and provide an interface for comfortably handling data in the programming languages used for development.

Of course, improving existing features is extremely important, and there are many new features to be added. As a major development plan this year, we aim to develop features to effectively attach GlueSQL to NoSQL databases with their own planners and execution layers. The GlueSQL query planner, currently at a basic level, will see significant changes this year. With the expansion of this planner, not only NoSQL databases but also other SQL databases could be supported without sacrificing performance using GlueSQL.

The synergy that arises from the combination of different databases is a significant bonus in this process.

The Journey of the GlueSQL Team

The GlueSQL project was first conceived in the fall of 2019, and since then we have been developing it continuously. Personally, I have created various products in a variety of environments, including game development, backend server, and frontend development over the past decade. The experience gained through this process was a major motivation to start the GlueSQL project.

To put it grandly, the inconveniences felt while using different databases in various environments were a major motivation, wouldn't you say?

The start was actually a bit simple. Around 2019, I was mainly doing web front-end development. However, the lack of a structured database for state management and internal data processing made it very uncomfortable, especially since I couldn't use SQL databases and the like. So I started to lightly create an SQL database that could run on a web browser. Also, I wanted to use Rust, but after failing to introduce it at the company I was working for at the time, I decided to use it in my own project.

But as I started developing, my dreams grew significantly. Beyond a SQL database that simply operates on a web browser, I started envisioning a database that fits the name "Glue", one that can easily be ported to various environments, and I continue that journey to this day.

Whether I took the database project too lightly, or because the features I wanted kept increasing, the content to be developed kept expanding. As a result, I ended up investing full time in the GlueSQL project development for over three years. For a year in between, I even juggled full-time software engineering work alongside GlueSQL development. Currently, I'm back to developing the GlueSQL project full time, alongside various part-time contributors.

Now, we're getting very close to the starting point of the picture I wanted to create through GlueSQL, and thankfully, with contributors joining me, I am not alone.

The Sustainability and Business Aspect of GlueSQL

I believe that what we create through GlueSQL will make a great contribution to the world and make many software engineers happy. This gives me immense strength to continue developing even in difficult situations. However, we cannot accomplish everything with pure passion alone. As much as the GlueSQL project can make a significant contribution, I also see it as holding great business value.

The business strategy of GlueSQL may be somewhat different from other databases. We distribute the project itself as open source under the Apache-2.0 license, so that anyone can use it fully, and we do not consider pricing methods such as restricting features to the storages we support. In fact, if there is any player who can do it better, there's no way to prevent them from taking the GlueSQL project and making it their own.

But we believe that GlueSQL has great potential in this regard. Anyone can participate and they are free to distribute their own storage in the way they want, whether it's open source, private, or commercial. This eliminates the need to create something to replace the GlueSQL project. We aim to prevent the need to recreate the wheel that we provide using GlueSQL.

Moreover, our GlueSQL team seeks to continually expand our group of developers and companies working with us. During this development process, while they can certainly implement everything on their own, there is also no reason not to collaborate with our GlueSQL Team, especially for databases like NoSQL that have their own planners and execution layers. If you have a REST API and want to enhance convenience through SQL support, you can do it yourself or you can collaborate with us.

In addition, for some storages, we can also participate as players in the same position as other custom storage developers. We plan to expand the GlueSQL ecosystem in various ways, such as technical support and storage development.

We are finally ready to provide GlueSQL to users at the production level. We are accelerating the development of GlueSQL. If you are a company interested in storage development like SQL support, or if you resonate with our vision and want to join us, please contact us at taehoon@gluesql.com.

Conclusion

The continued emergence of new databases is driven by the demand for diverse and specialized databases, such as those optimized for Large Language Models (LLMs) and databases catering to unique requirements, like embedded databases, OLAP, OLTP, and time-series data processing. However, developing a new database from scratch is a significant undertaking, requiring extensive work, which often results in high costs.

GlueSQL presents a solution to this challenge by providing a unified query interface that can be ported across various environments, from key-value databases, NoSQL databases, log files, to REST APIs. It allows anyone to create custom storages, reducing the need for developers to build entirely new databases and to learn different usage methods for each database. Instead, they can focus on implementing their business logic using the same interface.

From a user perspective, GlueSQL offers the convenience of a unified query interface, easing the burden of learning different interfaces for each database. This simplification of interface use can also extend to AI, potentially enhancing the efficiency of AI automation.

GlueSQL's development plan includes significant enhancements to its query planner and aims to enable effective attachment of GlueSQL to NoSQL databases. The synergy of combining different databases is a valuable bonus in this process.

Since its inception in the fall of 2019, the GlueSQL team has continuously developed the project, driven by the desire to mitigate the inconveniences encountered while using different databases in various environments. The journey has been a rewarding one, with the GlueSQL project now at a point where it closely resembles the envisioned product.

GlueSQL, distributed under the Apache-2.0 license, is free for anyone to use and adapt. While the GlueSQL team welcomes collaboration with other developers and companies, they also see significant potential for the project as a business venture. The team is working to expand the GlueSQL ecosystem through a variety of initiatives, including technical support and storage development.

With GlueSQL now sufficiently prepared for practical applications, the team invites companies interested in storage development or those who share their vision to join them in their journey of revolutionizing database development.

· 10 min read
Taehoon Moon

Introduction: GlueSQL and Test-Driven Documentation

Recently, the GlueSQL project reached a significant milestone with the release of version 0.14. This new version brings a host of fresh features to the table, yet one of the most notable changes is in the realm of documentation. For the first time, we're proud to announce the launch of our official documentation website. Interested readers can explore the full range of user manuals at https://gluesql.org/docs.

Prior to this update, the only way to navigate GlueSQL was by manually inspecting the test code within the test suite. With the recent release, however, a comprehensive user manual has been made public to facilitate a more user-friendly experience. We hope that this new addition will prove beneficial to a broad spectrum of users.

The task of compiling an entire database manual in one go was daunting due to the sheer volume of content required. Surprisingly, this process turned out to be smoother than initially anticipated, largely due to the invaluable aid of ChatGPT, which was instrumental in automating much of the document creation. Specifically, around 80% of the SQL Syntax section was generated using this tool.

This remarkable feat was only possible due to the solid foundation of test codes previously established in GlueSQL. In this article, we'll share how we managed to leverage ChatGPT in such a unique way. Based on our recent experience of crafting documents grounded in testing, we've begun to consider the possibility of entirely automating document creation, save for the initial stages.

Along with sharing our journey so far, we will also reveal our plans for future test-based automation of documentation within GlueSQL.

Test Codes and Documentation

The GlueSQL project has placed a significant emphasis on writing test codes. This might be a given for a database project; however, the thoroughness of our approach is evident from our line coverage of nearly 99% for core codes. While we devoted considerable effort to creating these test codes, our primary focus has always been on a different aspect: ensuring that anyone can quickly grasp the content of the tests and easily add new ones.

The intent here is to empower newcomers to GlueSQL to understand the functionality of the software solely by examining integration tests, even in the absence of a user manual.

The integration tests for GlueSQL can be found in the test-suite workspace. For example, here's an excerpt of the test code for the INSERT statement:

test_case!(insert, async move {
run!(
"
CREATE TABLE Test (
id INTEGER DEFAULT 1,
num INTEGER NULL,
name TEXT NOT NULL
);"
);

test! {
name: "basic insert - single item",
sql: "INSERT INTO Test (id, num, name) VALUES (1, 2, 'Hi boo');",
expected: Ok(Payload::Insert(1))
};


test! {
sql: "INSERT INTO Test VALUES(17, 30, 'Sullivan');",
expected: Ok(Payload::Insert(1))
};

test! {
sql: "INSERT INTO Test (num, name) VALUES (28, 'Wazowski');",
expected: Ok(Payload::Insert(1))
};

test! {
sql: "SELECT * FROM Test;",
expected: Ok(select!(
id | num | name;
I64 | I64 | Str;
1 2 "Hi boo".to_owned();
3 9 "Kitty!".to_owned();
2 7 "Monsters".to_owned();
17 30 "Sullivan".to_owned();
1 28 "Wazowski".to_owned()
))
};

// ...
});

Despite being written in Rust, these test cases are designed to be comprehensible, even to those unfamiliar with the language. Each test is a complete scenario from start to finish, and the results of each operation are readily observable.

Given that identifying results from SELECT operations in the form of Rust enums and structs can be challenging, we actively utilized macros such as select! and select_with_null!. We've composed the test cases to demonstrate that the INSERT statement can handle a wide range of cases, including specifying all columns, omitting some, or omitting all.

test! {
sql: "INSERT INTO Test (id, num) VALUES (1, 10);",
expected: Err(InsertError::LackOfRequiredColumn("name".to_owned()).into())
};

The tests also include scenarios for expected faulty inputs, indicating the error returns in these situations.

By organizing the tests in this manner, we aim to make it easy for anyone to read and write tests. Our goal was for these tests to serve as "documentation" for GlueSQL contributors. At the time we were writing these tests, we didn't anticipate that they could actually become documentation themselves. But we've come to realize that they have extraordinary potential.

Leveraging ChatGPT

When we first embarked on writing the User Manual, we were overwhelmed by the sheer volume of content we had to generate. Around the same time, ChatGPT was gaining prominence, and we thought it might be worth trying out, if only to lighten our load slightly.

To our surprise, ChatGPT exceeded our expectations. If the test codes were well written, it was capable of automatically crafting an exceptional document based on them, capturing all essential details.

After several trials and errors, we settled on the following prompt for document generation. While it's still a challenge to use the same prompt for all documentation, we made minor modifications to suit different situations:

I'm creating an SQL database documentation website, and I'd like you to help me with one of the pages that introduces the SQL syntax for GlueSQL.

1. Please provide the response content in the "markdown" format, so I can copy and paste it directly. Keep this constraint in mind while writing.
2. Regardless of the language I use, I need the content written in English.
3. I will provide some test case code written in the Rust language, which contains SQL examples. Please write the documentation based on these examples, but feel free to change the table names, column names, and data types as needed. Don't include any Rust related content or text in the response. all the response code example should be in plain SQL.
4. GlueSQL does not have the VARCHAR type. If you want to use that, please use TEXT instead. You don't need to mention this in the response.
5. Wrap the entire response text using <pre> and </pre> tags so I can copy all the content easily.

Now, I'd like you to write the following request:
SQL Statement - "INSERT"

Here's an example test code you can refer to:

test_case!(insert, async move {
run!(...

Generally, we used the template above, and copied and pasted the test code from our existing test-suite. We leveraged this method to harness our many tests to assist in the creation of the user manual. In the next section, we'll showcase a sample of the documentation generated in this manner. It's quite impressive.

Success Case: Automated User Manual of GlueSQL

Thanks to ChatGPT, the resulting INSERT document page can be viewed at the following link. It's important to note that we used ChatGPT 4 for this task. Version 3.5 wasn't quite up to the task, and using version 4 was the minimum requirement.

https://gluesql.org/docs/dev/sql-syntax/statements/data-manipulation/insert

INSERT Statement

The results were quite impressive. ChatGPT neatly categorized the test cases, explained the syntax, outlined constraints, and provided appropriate examples. It didn't stop there; it also skillfully recognized error test cases and incorporated them into the documentation, as shown below.

INSERT Statement

Isn't it amazing?

Future Plans: Fully Automating Documentation Generation

While the current documentation is far from perfect and there are many aspects that can be improved, we see great potential in this approach. We believe it's entirely possible to automate the process of writing this kind of document, and writing in general.

In the past, documents like user manuals required a great deal of effort to maintain once they were written. If a document contained real code examples, verifying that the code still worked was often a tedious task. With the ability to automatically generate documentation as we have done here, these issues are no longer problematic.

Previously, you would have to write tests and then also document them separately. If you had to support multiple languages, that would be an additional task. With a tool like ChatGPT, you can automate all of this. All a developer has to do is write the tests. This alone can be sufficient. You can generate documents automatically based on the tests. Eventually, we can even support automatically translating these documents into multiple languages.

The GlueSQL project repository is currently hosted on GitHub and makes good use of various GitHub Actions. We envision a GitHub Action that automatically regenerates a document corresponding to a test when a user modifies the test and raises a Pull Request. Another GitHub Action could automatically translate updated documents into supported languages and create new Pull Requests. The possibilities are truly endless.

Not only will this help with document generation, but it will also provide clear guidelines for writing better test code. If we can automatically generate documents based on written tests, the quality of those documents can serve as an indicator of the quality of the tests themselves. This means that a document automation tool can play the role of a good reviewer for tests. It can greatly reduce the time and effort required for painstakingly reviewing the quality of each test. Developers can also write tests without pressure, evaluate their test code by looking at the generated document, and improve it.

Furthermore, multi-language support becomes a breeze. In my personal experience as a software engineer over the past decade, developing various products such as games, web services, and applications, I often needed to support multiple languages. Each time, there was no definite solution. The optimal approach varied depending on the situation, and there were many things to consider when entrusting translations, such as effectively communicating the context of the target service to the translator. Moreover, regularly updating content and having it retranslated into various languages was a very tedious process. While I tried to automate as much as possible by creating various tools, I was never fully satisfied. I believe ChatGPT can completely solve these issues. If there is a need to provide project-specific context, all you need to do is prepare a prompt in advance. Instead of having to rely on and wait for professional translators, we can now entrust this task to ChatGPT, and we only need a few people to review the translated content.

We are nearing a world where documentation is no longer a burden for developers.

Conclusion: The Value of Test-Driven Documentation

The use of ChatGPT to generate documentation has proven a significant step forward in the GlueSQL user manual creation process. Through test-driven documentation, we've managed to automate a substantial part of the manual creation process, saving time and effort, and increasing accuracy.

Moreover, this process has unveiled a new potential for documentation: the possibility of fully automating document generation. We've seen that quality tests can become quality documentation with the help of AI, leading to more efficient workflows and possibly better test code as a result.

The journey doesn't stop here. We envision leveraging this capability further to auto-translate our documents into multiple languages, making our product more accessible to a global audience.

As we progress, we hope that our experience can inspire other developers to explore and embrace the benefits of AI-generated, test-driven documentation. It's not just about saving time—it's about improving the way we work, communicate, and share knowledge.

· 11 min read
Taehoon Moon

Introduction

The divide between SQL and NoSQL databases has often presented challenges in database management. GlueSQL, a unique database maker library, aims to blur this boundary, providing a versatile tool for handling these two distinct types of databases.

In this article, we explore how GlueSQL navigates the features of SQL and NoSQL databases, offering an integrated solution that promotes flexibility and efficiency. With its ability to unify disparate database types, GlueSQL heralds a new age of adaptable database creation and management.

The Interface Perspective: SQL & AST Builder

When we talk about SQL databases, it's almost a given that they support SQL - the standard query language. Although there are slight variations between databases, the convenience of using a similar SQL language across multiple databases cannot be overstated. However, from a software engineer's perspective, there's room for improvement. In most software development scenarios, a specific programming language is used. SQL is a separate language, which can cause friction when integrating it into your software. As a result, rather than using raw SQL, many developers employ query builders or ORMs to manipulate SQL conveniently using their preferred programming language. Although it's not efficient to generate SQL using a query builder and then parse it again in the database, it's a practical and effective choice.

On the other hand, NoSQL databases offer different mechanisms. Some of them have their own language similar to SQL, but most provide an interface library developed specifically for each programming language. While SQL databases rely on external query builder libraries to provide an interface for each programming language, NoSQL databases mostly develop and offer these libraries themselves. If we discount the convenience of SQL language, this is one of the major factors that make NoSQL databases more comfortable to use. Since query builder libraries supporting SQL databases often cater to multiple SQL databases, they are limited in fully supporting unique features of each database. NoSQL databases, on the other hand, can freely manage their interface libraries without these restrictions.

Providing a query interface for each programming language is not a fundamental difference between SQL and NoSQL, but we generally accept it implicitly.

Let's see what happens if we break down this boundary, using GlueSQL as an example. As you can see from the SQL postfix, GlueSQL supports SQL and can be classified as an SQL database.

CREATE TABLE Glue (id INTEGER, name TEXT);

INSERT INTO Glue VALUES (1, "hello"), (2, "gluesql");

SELECT * FROM Glue WHERE id = 1;

However, GlueSQL also supports its own query builder, like a NoSQL database. (Currently, only Rust is supported, but we're working on adding support for other languages.)

table("Glue")
.create_table()
.add_column("id INTEGER")
.add_column("name TEXT")
.execute(glue)

table("Glue")
.insert()
.values(vec![
vec![num(1), text("hello")],
vec![num(2), text("gluesql")],
])
.execute(glue)
.await;

table("Glue")
.select()
.filter(col("id").eq(1))
.execute(glue)
.await;

Let's reconsider the implicit distinction between SQL and NoSQL. GlueSQL indeed supports SQL, but it also officially develops and offers its own query builder. This query builder is not a secondary tool for SQL. While most SQL query builder libraries ultimately generate SQL strings, GlueSQL's builder directly creates an AST (Abstract Structure Tree) that is used for execution within GlueSQL. Hence, we call it the AST Builder. This means SQL and the AST Builder are two equally supported interfaces in GlueSQL.

This also offers an additional advantage:

table("Glue")
.select()
// 1.
.filter(col("id").eq(1))
// 2.
.filter("id = 1")
.execute(glue)
.await;

Because GlueSQL already supports SQL, not only can you use the custom interface in the AST Builder, but you can also use familiar SQL syntax in part. Whether you use col("id").eq(1) or "id = 1", you can use it in the way you prefer. The AST Builder interface, although initially unfamiliar, allows a gradual migration similar to writing SQL for your convenience.

Thus, we've dismantled one of the implicit distinctions between SQL and NoSQL. However, it's more of an implicit differentiation than a fundamental one. There are more significant design differences that we'll explore next.

Structured & Unstructured Data

In this section, we'll discuss how SQL and NoSQL handle data. SQL generally deals with structured data, and recently, it's been made to support semi-structured data as well. On the other hand, NoSQL supports schemaless, unstructured data. Then, we'll explain in detail how GlueSQL handles these two types of data. The last part of this section will provide a segue into the next section where we'll discuss the decomposition of database functions.

When talking about SQL databases, one aspect is usually considered together: SQL databases have a defined schema.

CREATE TABLE Foo (
id INTEGER,
name TEXT,
rate FLOAT NULL
);

However, these days, SQL databases tend to support semi-structured data types, such as LIST or JSON. But, supporting completely schemaless, unstructured data is a different matter. SQL databases typically require a minimum schema.

What about NoSQL databases? As NoSQL databases vary significantly, we can't make definitive statements. But let's consider a typical document database like MongoDB. Unlike SQL databases, it doesn't enforce a schema. Essentially, you can insert any form of data directly. Often, NoSQL databases support schemaless data, but they lack features that enforce a schema like SQL. They generally support structure via validation methods, rather than structured access.

Is there no choice but to distinguish between structured data and unstructured, schemaless data so clearly? GlueSQL is being developed with the goal of being adaptable in various environments. Being forced to choose regarding this schema constraint was quite inconvenient. We started pondering if we couldn't benefit from both aspects - supporting both schema and schemaless data simultaneously, and we eventually found the answer. Let's look at how GlueSQL currently solves this issue through familiar SQL examples.

CREATE TABLE Names (id INTEGER, name TEXT);
INSERT INTO Names VALUES (1, 'glue'), (2, 'sql');

You can create a regular table with a schema like this. But GlueSQL's choice for creating a schemaless table is as follows:

CREATE TABLE Logs;
INSERT INTO Logs VALUES
('{ "id": 1, "value": 30 }'),
('{ "id": 2, "rate": 3.0, "list": [1, 2, 3] }'),
('{ "id": 3, "rate": 5.0, "value": 100 }');

It creates a table without column definitions! If you do this, GlueSQL recognizes the table as schemaless and processes it internally.

SELECT id, rate, list[0] FROM Logs WHERE id = 2;

Although the way to create the table was a bit special, using it isn't much different from the regular SQL SELECT statement. Not only can you differentiate between schema and schemaless when creating tables, but you can also use them interchangeably!

SELECT * FROM Names JOIN Logs ON Names.id = Logs.id;
/*
| id | list | name | rate | value |
|----|---------|------|------|-------|
| 1 | | glue | | 30 |
| 2 |[1, 2, 3]| sql | 3 | |
*/

Here's an example of querying data by INNER JOINing the Names table, which has a schema, and the Logs table, which is schemaless. GlueSQL has resolved this problem by allowing the internal execution layer to handle both vector-type data, for cases where each row has a defined schema, and map-type data for schemaless cases.

Thanks to this, the variety of storage that can be supported through GlueSQL has expanded significantly. If there were previously limitations to supporting NoSQL databases that support schemaless data, that is no longer the case. The reference storage where you can directly experience this schemaless data support is JSON Storage. It offers features that allow you to deal directly with unstructured data like JSON using GlueSQL.

If GlueSQL starts from the perspective of an SQL database and expands, by providing the AST Builder directly, it once blurs the boundary, and by supporting unstructured data simultaneously, it knocks down the boundary once more. How do you like it?

Decomposing Database Functionality: Breaking Down SQL and NoSQL Features

The distinction between SQL and NoSQL is not just about whether they support unstructured data. Of course, there are examples like unstructured data, which is mainly supported only in NoSQL, but in many cases, SQL databases tend to support more diverse and complex queries. NoSQL often gains other advantages in exchange for reducing the range of query support provided by SQL databases.

GlueSQL is ambitious. It has devised a rather interesting method to support all of this through SQL and the AST Builder, with the same interface. When we usually say SQL database, it implicitly assumes that a lot of features have been fully implemented. Create tables by specifying a schema, modify schemas with "alter table", support both clustered and non-clustered indexes, and support transactions. And there's so much more. But the functionality that is naturally supported in SQL databases may not be natural in other environments.

Let's think about JSON Storage. GlueSQL's JSON Storage allows you to handle JSON, JSONL files using SQL and the AST Builder. This JSON Storage does not support atomic operations or transactions. Of course, it would be great if it did, but implementing and executing them would be a significant performance burden. In most cases, when you want to simply browse and handle JSONL files, the overhead caused by transactions can be an unnecessary burden. In this case, you want to handle JSON, JSONL files using SQL, but you don't necessarily need transactions.

To meet the requirements of these diverse environments, GlueSQL has separated the functionality of what we usually call an SQL database into multiple independent interfaces. Store, StoreMut, AlterTable, Transaction, .. These are just a few of the various storage interfaces that GlueSQL currently supports. The way it works can be summarized like this: If you implement Store, you can use SELECT. And if you implement both Store and StoreMut, you can support quite a number of basic SQL statements including SELECT. You can manage tables with CREATE TABLE, DROP TABLE, and handle data using INSERT, UPDATE, DELETE statements. If you only need to retrieve data, you only need to implement Store. If you want to support the ALTER TABLE statement, you can additionally implement the AlterTable interface. The Transaction interface works the same way. The interesting part is that, except for Store and StoreMut, all other storage interfaces can be implemented independently. GlueSQL allows you to choose and implement only the features you need. And it's not just about providing interfaces. It also provides integration tests suitable for each situation to verify what you have implemented. You just need to implement the interface and import the corresponding test case for verification.

In addition to supporting both structured and unstructured data simultaneously, GlueSQL provides the ability to divide the functionality of a database into multiple independent features and selectively implement them. This allows GlueSQL to be ported to a wide variety of environments without any burden.

Conclusion

GlueSQL, while serving as a database that provides its own reference storage, is fundamentally a library designed to simplify the creation of databases. One of the substantial challenges GlueSQL had to overcome in order to support a diverse array of environments was to address the distinctive features that separate conventional SQL databases from NoSQL databases. GlueSQL achieved this through several innovative approaches, managing to support both categories simultaneously despite their significantly different characteristics.

It offers support for SQL alongside an AST Builder, and accommodates both structured and unstructured data. Additionally, it decomposes database functionalities into multiple independent features, allowing each environment to selectively implement the functionalities it requires.

These unique attributes enable GlueSQL to live up to its 'Glue' prefix by facilitating effortless porting across various environments. While we have been developing it for several years, there is still much ground to cover. However, the fact that we are now able to introduce it publicly attests to our successful technological validation and completion of a demonstrable level of implementation.

Through GlueSQL, we hope to provide developers with a unified query interface that can be customized according to their needs, thereby enabling them to produce efficient products more effortlessly. There's a promising future ahead for GlueSQL, and we look forward to its contributions to the technology community.