Loading Initial Data with Spring Boot

In this article, we will look at options for loading initial data with Spring Boot. We will look at the different options available with Spring Boot.

Spring Boot makes it really easy to manage any type of database changes. It will search for entities in our packages and create the respective tables if we do not define any custom config and use the default configuration. We can use the data.sql and schema.sql files in spring to gain more control over database alterations. It is a powerful feature that lets you work in different environments. Let’s see how to load this initial data on startup with an example below.

1. Initializing Spring Boot JPA Entities

For loading initial data with Spring Boot, we can use the Spring Boot built in support for JPA. Let’s assume that we have an employee entity, which requires a schema and sample data to be initialized in the database.

public class Employee {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long id;
    
    private String employeeName;
    private String salary;
    private Date createdAt;
    private Date updatedAt;
}

When we run our application, Spring Boot will create an empty table for us, but will not populate it for the above defined Entity. Schemas for your entities can be created automatically by setting spring.jpa.hibernate.ddl-auto in Spring Boot configuration file to create or create-drop. If you set ddl-auto to create or create-drop, Hibernate will generate a schema for your entity based on its mapping. You need to add the following property in your application.properties file.

spring.jpa.hibernate.ddl-auto=create

When the ddl-auto or create-drop values are created, Hibernate looks for import.sql on the class path in order to initialize data. You can add the import.sql file on the classpath:

INSERT INTO employee VALUES 
	('Steve', '50000', '2022-04-04 11:33:30', NULL);
	('Bill', '55000', '2022-04-05 12:33:30', NULL);
	('Mark', '30000', '2022-04-01 04:31:50', '2022-04-08 09:12:32');
	('Josh', '60000', '2022-04-03 09:22:25', '2022-04-07 12:34:54');

The above approach, including the use of JPA entities, has its own drawbacks. In import.sql file, each line should contain a single SQL statement. To make the import.sql work, it should have one statement for each line

1.1 The data.sql File.

As the name suggests, we need to add the data.sql file on the classpath. Spring Boot will scan the classpath and pick the file during the database update operation. Here is how the file might look like:

INSERT INTO employee (employee_name, salary, created_at, updated_at)
VALUES ('Steve', '50000', '2022-04-04 11:33:30', NULL);

INSERT INTO employee (employee_name, salary, created_at, updated_at)
VALUES ('Bill', '55000', '2022-04-05 12:33:30', NULL);

INSERT INTO employee (employee_name, salary, created_at, updated_at)
VALUES ('Mark', '30000', '2022-04-01 04:31:50', '2022-04-08 09:12:32');

INSERT INTO employee (employee_name, salary, created_at, updated_at)
VALUES ('Josh', '60000', '2022-04-03 09:22:25', '2022-04-07 12:34:54');

**Loading Initial Data with Spring Boot**

1.2. The schema.sql File.

If we don’t want to use the default schema creation mechanism, we can create a custom schema.sql file for loading the initial data with Spring Boot. This file will be picked by Spring Boot for schema creation.

CREATE TABLE employee (
  id INTEGER NOT NULL AUTO_INCREMENT, 
  employee_name varchar(45), 
  salary varchar(45) NOT NULL, 
  created_at datetime NOT NULL, 
  updated_at datetime DEFAULT NULL, 
  PRIMARY KEY (id)
);

Note that script-based initializations, that is schema.sql and data.sql, and hibernate initializations together cause some issues.

To disable the Hibernate automatic creation, we can add the following property in the application.properties file. This property will ensure that Spring Boot script-based initialization will use schema.sql and data.sql directly.

spring.jpa.hibernate.ddl-auto=none

We can still have both Hibernate automatic schema generation and script-based schema creation in conjugation by setting the following property in the application.proerties.

spring.jpa.defer-datasource-initialization=true

As a result, once it complete schema creation, schema.sql will be read for any additional schema changes, and data.sql will be executed to populate the database. Any changes in the data.sql file and schema.sql file will affect the actual database and tables as well. The default performs script-based initialization, but this applies to embedded databases only.

If you always want database initialize using the scripts, add spring.sql.init.mode=always in the application.properties file.

2. Loading Data from Multiple DB Vendors

Spring Boot application can create DDL script schemas by using the JDBC Data source. The data-source connection factory automatically creates and initializes the DML scripts. This also loads the SQL as part of the standard classpath scanning for sql files, i.e. schema.sql and data.sql.

2.1. data.sql

We can update the data fields using this file:

INSERT INTO employee (employee_name, salary, created_at, updated_at) 
VALUES ('Steve', '50000', '2022-04-04 11:33:30', NULL);

We can also load the schema.sql file as described in Section 1 for loading Initial Data with Spring Boot. We can also process the schema-${platform}.sql and data-${platform}.sql (platform can be oracle, MySQL, PostgreSQL) files. This allows switching among the database-specific scripts if required. The database initialization happens on the embedded in-memory database by default, though we can set the spring.sql.init mode to always initialize the SQL database. It also enables the fail-fast feature by default for the script-based database initializer, i.e. the application cannot start if the scripts throw exceptions.

These types of script-based data source initialization take place before the creation of any EntityManagerFactory beans. The schema.sql manages the DDL and creates the schema whereas the data.sql manages the DML and populates the database. You can also use high-level DB migration tool like flyway or Liquibase for creating and initializing the schema. These can help you in making scripts with custom names.

3. Hibernate to Control Database Creation

A JPA-Specific property that is provided by Hibernate to control Database creation and perform DDL generation is spring.jpa.hibernate.ddl-auto. We can also use this property for loading initial data with Spring Boot. It has multiple property values that are create, update, create-drop, validate and <em>none</em>. Each of these has different functions and controls the Database creation differently. Let’s see how each of them changes the DDL queries below.

create: Hibernate will drop all the existing tables and then create the new tables from the start.
update: it created objects based upon the mappings that include annotations or XML. This is compared with the existing schema and then used for updating the schema as per the difference. It will not delete any existing tables or remove any columns, even when they are no longer required. It will only update the existing schema, i.e. change the data types and add any columns as required.
create-drop: Similar to the create property value. It will drop the entire database once we complete all the operations. It’s useful for the unit testing.
validate: Validating whether the tables and columns are defined in the .sql file exists in the database or not. It will throw an exception otherwise.
none: Turn off any type of DDL generation.

If no schema manager has been detected, Spring Boot will internally set this parameter value to create-drop, otherwise none for every other case

4. Configuring Database Schema Creation

By default, Spring Boot DataSource will be automatically initialized with a schema. If we want to alter or customize this behavior for loading Initial Data with Spring Boot, we can use the spring.sql.init.mode property. This property has three values:

always: This will always initialize the database.
embedded: Always initialize if an embedded database is in use. This property is set by default if any other property value is not specified.
never: Never initialize any type of database.

Using any non-embedded database like MySQL or PostgreSQL, it becomes necessary to set this property to always if we want to initialize its schema

5. Using @Sql Annotation

The @Sql annotation provides a declarative way of initializing and populating our test schema. For our integration test, let’s create a new table and load it with initial data using the @Sql annotation.

@RunWith(SpringRunner.class)
@SpringBootTest
public class SpringBootInitialLoadIntegrationTest {

    @Autowired
    private EmployeeRepository employeeRepository;

    @Test
    public void testLoadDataForTestClass() {
        assertEquals(4, employeeRepository.findAll().size());
    }
}

Let’s look at the different component of the @SQL annotation:

config – Describes the local configuration for the SQL scripts.
executionPhase – we can specify when the scripts will be executed, either BEFORE_TEST_METHOD or AFTER_TEST_METHOD.
statements – Allows declaring the inline SQL statements to be executed.
scripts – We can declare the paths to the SQL script files to execute.

@Test
@Sql({"/import_senior_employees.sql"})
public void testLoadDataForTestCase() {
    assertEquals(5, employeeRepository.findAll().size());
}

6. Using @SqlConfig

Using the @SqlConfig annotation for loading Initial Data with Spring Boot, we can configure how SQL scripts are parsed and run. We can declare class level configurations as @SqlConfig, where it serves as a global setting for the class. Or, we can use it to set specific @Sql annotations. Here’s an example where we specify the encoding of our SQL scripts, along with the transaction mode in which we execute them:

@Test
@Sql(scripts = {
        "/import_senior_employees.sql"
    },
    config = @SqlConfig(encoding = "utf-8", transactionMode = TransactionMode.ISOLATED))
public void testLoadDataV1ForTestCase() {
    assertEquals(5, employeeRepository.findAll().size());
}

blockCommentStartDelimiter–This represents the delimiter that is used to identify the start of block comments in SQL script files
blockCommentEndDelimiter–In SQL script files, this is used to show the end of block comments
commentPrefix–The prefix used to identify single-line comments in SQL scripts
dataSource–It will run XML scripts and SQL statements against the javax.sql.DataSource bean.
encoding–This represents the encoding that the SQL script files will use. By default, it’s the same as the platform encoding.
errorMode–This mode represents the errorMode that is going to be used whenever an error occurs while running the scripts
separator–This defines the string that is used for separating different individual statements. “-” is used by default.
transactionManager–This defines the bean name of the PlatformTransactionManager that is used by the transactions
transactionMode–Used while executing any scripts in the transaction.

7. Using @Sqlgroup Annotation

In Java 8 and above, multiple annotations are supported. We can use this feature for @Sql annotations for loading Initial Data with Spring Boot. For Java 7 and below, there is a container annotation called @SqlGroup. We can declare multiple @Sql annotations by using the @SqlGroup annotation.

@SqlGroup({
    @Sql(scripts = "/employees_schema.sql",
        config = @SqlConfig(transactionMode = TransactionMode.ISOLATED)),
    @Sql("/import_employees.sql")
})
public class SpringBootSqlGroupAnnotationIntegrationTest {

    @Autowired
    private EmployeeRepository employeeRepository;

    @Test
    public void testLoadDataForTestCase() {
        assertEquals(4, employeeRepository.findAll().size());
    }
}

Summary

In this article, we saw how to use different methods for loading initial data with the Spring Boot. We learned how to set up a schema and populate it with data by using schema.sql and data.sql files. In addition, we looked at how to load test data for tests using @Sql, @SqlConfig, and @SqlGroup annotations. It is important to note that this approach is more suitable for basic and simple scenarios. Any advanced database handling would require more advanced and refined tools like Liquibase and Flyway. As always, you check our GitHub repository for the latest source code.