Nilanchal

Importing CSV Data into PostgreSQL using Spring Batch

Spring Batch is a powerful module of the Spring framework that provides out-of-the-box implementation for batch processing tasks.

It is used in scenarios where data needs to be processed in multiple batches, for example, generating daily reports, periodically importing data into a database, or performing complex calculations and transformations on your data.

A typical batch-processing application involves three broad steps: reading data from a source, processing or transforming it, and writing the results to a destination.


Why Spring Batch?

All of the above steps can be implemented without Spring Batch. However, Spring Batch provides the following benefits, which make a strong case for the framework.

  • You do not need to reinvent the wheel
  • Seamless integration with Spring ecosystem
  • Chunk-based processing
  • Built-in I/O support for a wide range of data sources and targets, including databases, XML, JSON, and flat files.
  • Detailed monitoring and logging: Allows tracking of job and step execution, making it easier to understand the status and performance of batch jobs.

Spring Batch provides two different ways to implement a job. We can process data using Tasklets and Chunks.

A Tasklet represents a single task within a step. The Tasklet interface defines a single method, execute(), which the step invokes repeatedly until it returns RepeatStatus.FINISHED; a simple one-shot task does its work and returns FINISHED on the first call.
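For example, a minimal Tasklet-based step might look like the sketch below (the cleanup task, bean names, and step wiring are illustrative assumptions, not part of the CSV import built later):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class CleanupTaskletConfig {

    // A Tasklet performing one unit of work; returning FINISHED ends the loop
    @Bean
    public Tasklet cleanupTasklet() {
        return (contribution, chunkContext) -> {
            System.out.println("Cleaning up temporary files...");
            return RepeatStatus.FINISHED;
        };
    }

    // Spring Batch 5 style: steps are built with an explicit JobRepository and transaction manager
    @Bean
    public Step cleanupStep(JobRepository jobRepository,
                            PlatformTransactionManager transactionManager) {
        return new StepBuilder("cleanupStep", jobRepository)
                .tasklet(cleanupTasklet(), transactionManager)
                .build();
    }
}
```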

A Chunk-based approach is more suitable for processing large datasets where data can be read, processed, and written in smaller, manageable chunks. This is typically used for reading data from a database or a file and processing it record by record. The chunk model is built around three main components: Reader, Processor, and Writer.

Spring Boot Batch Chunk Processing

The key components of a Spring Boot batch application include:

  • Job
  • Step
  • Readers, Writers, Processors
  • Job Repository
  • Job Launcher

The primary components of Spring Batch and the overall process flow are described below.

JobLauncher:
The JobLauncher is an interface that represents the component responsible for running jobs. It takes care of receiving job parameters and launching a job with those parameters. It is typically used to start a job from different triggers such as an application event, a REST API call, or a scheduler.
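As a rough sketch, a job could be triggered from a REST endpoint through the JobLauncher (this assumes spring-boot-starter-web is on the classpath; the controller, endpoint path, and importCsvJob bean are illustrative):

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class JobTriggerController {

    private final JobLauncher jobLauncher;
    private final Job importCsvJob;

    public JobTriggerController(JobLauncher jobLauncher, Job importCsvJob) {
        this.jobLauncher = jobLauncher;
        this.importCsvJob = importCsvJob;
    }

    @PostMapping("/jobs/import")
    public String runImportJob() throws Exception {
        // A unique parameter makes every invocation a new JobInstance
        JobParameters params = new JobParametersBuilder()
                .addLong("startedAt", System.currentTimeMillis())
                .toJobParameters();
        return jobLauncher.run(importCsvJob, params).getStatus().toString();
    }
}
```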

Job:
A Job in Spring Batch is an entity that encapsulates an entire batch process and is defined by a series of steps.
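A minimal Job definition in the Spring Batch 5 style could look like the following sketch (the importCsvJob name and the single csvImportStep are assumptions for illustration):

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ImportJobConfig {

    // A Job is a named sequence of steps; its runs are tracked in the JobRepository
    @Bean
    public Job importCsvJob(JobRepository jobRepository, Step csvImportStep) {
        return new JobBuilder("importCsvJob", jobRepository)
                .incrementer(new RunIdIncrementer()) // assigns a fresh run.id so the job can be re-run
                .start(csvImportStep)
                .build();
    }
}
```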

Step:
A single job may have one or more steps, where each step typically involves reading data, processing it, and writing the processed data to the output source.
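For a chunk-oriented step, the reader, processor, and writer are wired together roughly as shown below (the chunk size of 100 and the hypothetical Customer record are illustrative choices; in a real project the record would live in its own file):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class CsvImportStepConfig {

    // Hypothetical domain record used throughout these sketches
    public record Customer(String firstName, String lastName, String email) {}

    // Chunk-oriented step: items are read and processed one at a time,
    // then written in chunks of 100 within a single transaction
    @Bean
    public Step csvImportStep(JobRepository jobRepository,
                              PlatformTransactionManager transactionManager,
                              ItemReader<Customer> reader,
                              ItemProcessor<Customer, Customer> processor,
                              ItemWriter<Customer> writer) {
        return new StepBuilder("csvImportStep", jobRepository)
                .<Customer, Customer>chunk(100, transactionManager)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}
```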

ItemReader:
The ItemReader is responsible for reading data from different sources such as databases or files. The ItemReader has a read() method; every time this method is invoked, it returns one item. If there are no more items to read, it returns null to indicate the end of the data input.

FlatFileItemReader can be used for reading data from flat files (like CSV), JdbcCursorItemReader for reading from a database using a JDBC cursor, and JpaPagingItemReader for reading database records using JPA pagination.
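A sketch of a FlatFileItemReader for a CSV file could look like this (the customers.csv file, its columns, and the Customer record are assumptions for illustration):

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

@Configuration
public class CsvReaderConfig {

    // Hypothetical domain record matching the CSV columns
    public record Customer(String firstName, String lastName, String email) {}

    // Reads src/main/resources/customers.csv, skipping the header row,
    // and maps each delimited line onto the Customer record
    @Bean
    public FlatFileItemReader<Customer> customerItemReader() {
        return new FlatFileItemReaderBuilder<Customer>()
                .name("customerItemReader")
                .resource(new ClassPathResource("customers.csv"))
                .linesToSkip(1)
                .delimited()
                .names("firstName", "lastName", "email")
                .targetType(Customer.class)
                .build();
    }
}
```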

ItemProcessor:
The ItemProcessor is completely optional. It is used to validate, transform, or filter the items before passing them to the ItemWriter.
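For example, a processor could normalize the parsed rows and drop invalid ones; returning null from process() filters the item out. A sketch using the same hypothetical Customer record:

```java
import org.springframework.batch.item.ItemProcessor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CustomerProcessorConfig {

    // Hypothetical domain record used throughout these sketches
    public record Customer(String firstName, String lastName, String email) {}

    // Trims names, lower-cases the email, and drops rows without an email;
    // returning null from an ItemProcessor filters the item out of the chunk
    @Bean
    public ItemProcessor<Customer, Customer> customerItemProcessor() {
        return customer -> {
            if (customer.email() == null || customer.email().isBlank()) {
                return null;
            }
            return new Customer(
                    customer.firstName().trim(),
                    customer.lastName().trim(),
                    customer.email().toLowerCase());
        };
    }
}
```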

ItemWriter:
The ItemWriter takes processed items and writes them to a database or a file. The FlatFileItemWriter can be used for writing data to flat files, JdbcBatchItemWriter for batching database operations through JDBC, and JpaItemWriter for handling database operations using JPA.
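A sketch of a JdbcBatchItemWriter that batch-inserts records into PostgreSQL (the customers table, its columns, and the Customer record are illustrative assumptions):

```java
import javax.sql.DataSource;

import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;

@Configuration
public class CustomerWriterConfig {

    // Hypothetical domain record used throughout these sketches
    public record Customer(String firstName, String lastName, String email) {}

    // Inserts each chunk of customers into PostgreSQL as one batched statement
    @Bean
    public JdbcBatchItemWriter<Customer> customerItemWriter(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Customer>()
                .dataSource(dataSource)
                .sql("INSERT INTO customers (first_name, last_name, email) "
                        + "VALUES (:firstName, :lastName, :email)")
                .itemSqlParameterSourceProvider(customer -> new MapSqlParameterSource()
                        .addValue("firstName", customer.firstName())
                        .addValue("lastName", customer.lastName())
                        .addValue("email", customer.email()))
                .build();
    }
}
```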

Job Repository:
The JobRepository does all the hard work such as recording the status of jobs in a database. It keeps track of which jobs are running, which have been completed, and if a job fails, what step it failed at. This is critical for jobs that need to be restarted after a failure, ensuring that the job can pick up where it left off.
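With Spring Boot, the JobRepository is auto-configured against the application's DataSource; usually you only decide how its metadata schema gets created. A sketch of the relevant properties (the connection values are placeholders):

```properties
# application.properties (illustrative values for this setup)
spring.datasource.url=jdbc:postgresql://localhost:5432/batchdemo
spring.datasource.username=postgres
spring.datasource.password=postgres

# Create the Spring Batch metadata tables (BATCH_JOB_INSTANCE, BATCH_STEP_EXECUTION, ...) at startup
spring.batch.jdbc.initialize-schema=always

# Set to false if jobs should be triggered manually instead of on application startup
spring.batch.job.enabled=false
```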

Load CSV Data into PostgreSQL using Spring Batch

This example uses chunk-based processing to read a CSV file, process the records, and store them in PostgreSQL. For managing database migrations, we will use Flyway; a sample migration is shown after the dependency list below.

Use Spring Initializr to bootstrap a Spring Boot project with the required dependencies. This example uses Java 17 and Spring Boot 3.2.4 and has the following dependencies.

  • Spring Batch
  • Spring Data JPA
  • PostgreSQL Driver
  • Lombok - completely optional for reducing boilerplate code.
  • Flyway - Database migration
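With Flyway on the classpath, migrations are picked up from src/main/resources/db/migration by default. As a sketch, the migration that creates the target table could look like this (the customers table and its columns are illustrative):

```sql
-- src/main/resources/db/migration/V1__create_customers_table.sql (illustrative)
CREATE TABLE customers (
    id         BIGSERIAL PRIMARY KEY,
    first_name VARCHAR(255) NOT NULL,
    last_name  VARCHAR(255) NOT NULL,
    email      VARCHAR(255) NOT NULL
);
```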

Watch the step-by-step guide here

Find the full blog post and source code here at stacktips.com
