Opening a File and Reading Data Java
Reading files in Java is the cause of a lot of confusion. There are multiple ways of accomplishing the same task and it's often not clear which file reading method is best to use. Something that's quick and dirty for a small example file might not be the best method to employ when you need to read a very large file. Something that worked in an earlier Java version might not be the preferred method anymore.
This article aims to be the definitive guide for reading files in Java 7, 8 and 9. I'm going to cover all the ways you can read files in Java. Too often, you'll read an article that tells you one way to read a file, only to discover later there are other ways to do that. I'm actually going to cover 15 different ways to read a file in Java. I'm going to cover reading files in multiple ways with the core Java libraries as well as two third party libraries.
But that's not all – what good is knowing how to do something in multiple ways if you don't know which way is best for your situation?
I also put each of these methods to a real performance test and document the results. That way, you will have some hard data to know the performance metrics of each method.
Methodology
JDK Versions
Java code samples don't live in isolation, especially when it comes to Java I/O, as the API keeps evolving. All code for this article has been tested on:
- Java SE 7 (jdk1.7.0_80)
- Java SE 8 (jdk1.8.0_162)
- Java SE 9 (jdk-9.0.4)
When there is an incompatibility, it will be stated in that section. Otherwise, the code works unaltered for different Java versions. The main incompatibility is the use of lambda expressions, which were introduced in Java 8.
Java File Reading Libraries
There are multiple ways of reading from files in Java. This article aims to be a comprehensive collection of all the different methods. I will cover:
- java.io.FileReader.read()
- java.io.BufferedReader.readLine()
- java.io.FileInputStream.read()
- java.io.BufferedInputStream.read()
- java.nio.file.Files.readAllBytes()
- java.nio.file.Files.readAllLines()
- java.nio.file.Files.lines()
- java.util.Scanner.nextLine()
- org.apache.commons.io.FileUtils.readLines() – Apache Commons
- com.google.common.io.Files.readLines() – Google Guava
Closing File Resources
Prior to JDK7, when opening a file in Java, all file resources needed to be closed manually using a try-catch-finally block. JDK7 introduced the try-with-resources statement, which simplifies the process of closing streams. You no longer need to write explicit code to close streams because the stream is closed automatically for you when the try block exits, whether an exception occurred or not. All examples used in this article use the try-with-resources statement for importing, loading, parsing and closing files.
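To make the difference concrete, here's a minimal sketch of the same read written both ways. The class name `CloseResourceSketch`, its two helper methods and the temporary file are made up for this illustration and aren't part of the article's downloadable code.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class CloseResourceSketch {

    // Older style: the reader is closed by hand in a finally block.
    static String firstLineManualClose(File file) throws IOException {
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(
                    new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8));
            return reader.readLine();
        } finally {
            if (reader != null) {
                reader.close();
            }
        }
    }

    // JDK7+ style: the reader is closed automatically when the try block exits.
    static String firstLineTryWithResources(File file) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temp file so the sketch runs without any sample data on disk.
        File file = File.createTempFile("close-sketch", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(), "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));

        System.out.println(firstLineManualClose(file));
        System.out.println(firstLineTryWithResources(file));
    }
}
```

Both methods return the same first line; the second one simply has less code that can go wrong.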
File Location
All examples will read test files from C:\temp.
Encoding
Character encoding is not explicitly saved with text files, so Java makes assumptions about the encoding when reading files. Usually, the assumption is correct, but sometimes you want to be explicit when instructing your programs to read from files. When the encoding isn't correct, you'll see funny characters appear when reading files.
All examples for reading text files use two encoding variations:
- Default system encoding, where no encoding is specified
- Explicitly setting the encoding to UTF-8
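Here's a small sketch of where those "funny characters" come from. It doesn't read a file at all; it just decodes the same bytes with the right and the wrong charset. The class name `EncodingSketch` and the `decode` helper are hypothetical names chosen for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSketch {

    // Decodes raw bytes into text using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The word "cafe" with an accented e, encoded as UTF-8 (the accented letter takes two bytes).
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String correct = decode(utf8Bytes, StandardCharsets.UTF_8);       // right charset
        String garbled = decode(utf8Bytes, StandardCharsets.ISO_8859_1);  // wrong charset

        System.out.println("Default charset on this system: " + Charset.defaultCharset());
        System.out.println(correct.length()); // 4 characters, as written
        System.out.println(garbled.length()); // 5 characters: the accented letter became two
    }
}
```

The same thing happens when a file is read with a default encoding that doesn't match the one it was saved with.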
Download Code
All code files are available from GitHub.
Code Quality and Code Encapsulation
There is a difference between writing code for your personal or work project and writing code to explain and teach concepts.
If I was writing this code for my own project, I would use proper object-oriented principles like encapsulation, abstraction, polymorphism, etc. But I wanted to make each example stand alone and easily understood, which meant that some of the code has been copied from one example to the next. I did this on purpose because I didn't want the reader to have to figure out all the encapsulation and object structures I so cleverly created. That would take away from the examples.
For the same reason, I chose NOT to write these examples with a unit testing framework like JUnit or TestNG because that's not the purpose of this article. That would add another library for the reader to understand that has nothing to do with reading files in Java. That's why all the examples are written inline inside the main method, without extra methods or classes.
My main purpose is to make the examples as easy to understand as possible and I believe that having extra unit testing and encapsulation code will not help with this. That doesn't mean that's how I would encourage you to write your own personal code. It's just the way I chose to write the examples in this article to make them easier to understand.
Exception Handling
All examples declare any checked exceptions in the method's throws clause.
The purpose of this article is to show all the different ways to read from files in Java – it's not meant to show how to handle exceptions, which will be very specific to your situation.
So instead of creating unhelpful try-catch blocks that just print exception stack traces and clutter up the code, all examples declare any checked exception in the calling method. This makes the code cleaner and easier to understand without sacrificing any functionality.
Future Updates
As Java file reading evolves, I will be updating this article with any required changes.
File Reading Methods
I organized the file reading methods into three groups:
- Classic I/O classes that have been part of Java since before JDK 1.7. This includes the java.io and java.util packages.
- New Java I/O classes that have been part of Java since JDK1.7. This covers the java.nio.file.Files class.
- 3rd party I/O classes from the Apache Commons and Google Guava projects.
Classic I/O – Reading Text
1a) FileReader – Default Encoding
FileReader reads in one character at a time, without any buffering. It's meant for reading text files. It uses the default character encoding on your system, so I have provided examples for both the default case, as well as specifying the encoding explicitly.

    import java.io.FileReader;
    import java.io.IOException;

    public class ReadFile_FileReader_Read {
        public static void main(String[] pArgs) throws IOException {
            String fileName = "c:\\temp\\sample-10KB.txt";

            try (FileReader fileReader = new FileReader(fileName)) {
                int singleCharInt;
                char singleChar;
                //read() returns -1 at the end of the file
                while ((singleCharInt = fileReader.read()) != -1) {
                    singleChar = (char) singleCharInt;

                    //display one character at a time
                    System.out.print(singleChar);
                }
            }
        }
    }

1b) FileReader – Explicit Encoding (InputStreamReader)
In Java 7 through 9 it's not possible to set the encoding explicitly on a FileReader (constructors that take a Charset were only added in Java 11), so you have to use the parent class, InputStreamReader, and wrap it around a FileInputStream:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;

    public class ReadFile_FileReader_Read_Encoding {
        public static void main(String[] pArgs) throws IOException {
            String fileName = "c:\\temp\\sample-10KB.txt";
            FileInputStream fileInputStream = new FileInputStream(fileName);

            //specify UTF-8 encoding explicitly
            try (InputStreamReader inputStreamReader =
                    new InputStreamReader(fileInputStream, "UTF-8")) {

                int singleCharInt;
                char singleChar;
                while ((singleCharInt = inputStreamReader.read()) != -1) {
                    singleChar = (char) singleCharInt;
                    System.out.print(singleChar); //display one character at a time
                }
            }
        }
    }

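The example above passes the charset as a string. A variation worth knowing, sketched here on the assumption that you're on JDK7 or later, is to pass the `StandardCharsets.UTF_8` constant instead, so a misspelled charset name can't surface at run time as an UnsupportedEncodingException. The class name `ReadWithCharsetConstantSketch`, the `readAll` helper and the temp file are hypothetical and only there to keep the sketch self-contained.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWithCharsetConstantSketch {

    // Reads the whole file one character at a time, decoding it as UTF-8.
    static String readAll(String fileName) throws IOException {
        StringBuilder text = new StringBuilder();
        try (InputStreamReader reader = new InputStreamReader(
                new FileInputStream(fileName), StandardCharsets.UTF_8)) {
            int singleCharInt;
            while ((singleCharInt = reader.read()) != -1) {
                text.append((char) singleCharInt);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temp file containing a few non-ASCII letters.
        Path tempFile = Files.createTempFile("charset-sketch", ".txt");
        Files.write(tempFile, "gr\u00fc\u00dfe".getBytes(StandardCharsets.UTF_8));

        System.out.println(readAll(tempFile.toString()).length()); // 5 characters
        Files.delete(tempFile);
    }
}
```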
2a) BufferedReader – Default Encoding
BufferedReader reads an entire line at a time, instead of one character at a time like FileReader. It's meant for reading text files.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class ReadFile_BufferedReader_ReadLine {
        public static void main(String[] args) throws IOException {
            String fileName = "c:\\temp\\sample-10KB.txt";
            FileReader fileReader = new FileReader(fileName);

            //closing the BufferedReader also closes the wrapped FileReader
            try (BufferedReader bufferedReader = new BufferedReader(fileReader)) {
                String line;
                //readLine() returns null at the end of the file
                while ((line = bufferedReader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }

2b) BufferedReader – Explicit Encoding
In a similar way to how we set encoding explicitly for FileReader, we need to create FileInputStream, wrap it within InputStreamReader with an explicit encoding and pass that to BufferedReader:

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;

    public class ReadFile_BufferedReader_ReadLine_Encoding {
        public static void main(String[] args) throws IOException {
            String fileName = "c:\\temp\\sample-10KB.txt";

            FileInputStream fileInputStream = new FileInputStream(fileName);
            //specify UTF-8 encoding explicitly
            InputStreamReader inputStreamReader = new InputStreamReader(fileInputStream, "UTF-8");

            try (BufferedReader bufferedReader = new BufferedReader(inputStreamReader)) {
                String line;
                while ((line = bufferedReader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }

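If you're on JDK7 or later, the three-step wrapping above can also be done in one call with `Files.newBufferedReader(Path, Charset)`. This wasn't one of the methods I timed, so treat it as a sketch of an alternative rather than a benchmarked option. The class name `NewBufferedReaderSketch`, the `readLines` helper and the temp file are made up for the illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NewBufferedReaderSketch {

    // Reads every line of the file as UTF-8 text.
    static List<String> readLines(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical three-line temp file.
        Path tempFile = Files.createTempFile("reader-sketch", ".txt");
        Files.write(tempFile, Arrays.asList("one", "two", "three"), StandardCharsets.UTF_8);

        System.out.println(readLines(tempFile)); // [one, two, three]
        Files.delete(tempFile);
    }
}
```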
Classic I/O – Reading Bytes
1) FileInputStream
FileInputStream reads in one byte at a time, without any buffering. While it's meant for reading binary files such as images or audio files, it can still be used to read text files. It's similar to reading with FileReader in that you read one unit at a time as an int (a byte here rather than a character) and you need to cast that int to a char to see the ASCII character.
Note that FileInputStream itself reads raw bytes and applies no character encoding, so there is only one variation of this example. Casting each byte to a char only shows the right character for single-byte (ASCII-range) text.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.IOException;

    public class ReadFile_FileInputStream_Read {
        public static void main(String[] pArgs) throws FileNotFoundException, IOException {
            String fileName = "c:\\temp\\sample-10KB.txt";
            File file = new File(fileName);

            try (FileInputStream fileInputStream = new FileInputStream(file)) {
                int singleCharInt;
                char singleChar;

                //read() returns one byte (0-255) as an int, or -1 at the end of the file
                while ((singleCharInt = fileInputStream.read()) != -1) {
                    singleChar = (char) singleCharInt;
                    System.out.print(singleChar);
                }
            }
        }
    }

2) BufferedInputStream
BufferedInputStream reads a set of bytes all at once into an internal byte array buffer. The buffer size can be set explicitly or use the default, which is what we'll demonstrate in our example. The default buffer size appears to be 8KB but I have not explicitly verified this. All performance tests used the default buffer size so it will automatically re-size the buffer when it needs to.

    import java.io.BufferedInputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.IOException;

    public class ReadFile_BufferedInputStream_Read {
        public static void main(String[] pArgs) throws FileNotFoundException, IOException {
            String fileName = "c:\\temp\\sample-10KB.txt";
            File file = new File(fileName);
            FileInputStream fileInputStream = new FileInputStream(file);

            //default buffer size is used because no size is passed to the constructor
            try (BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream)) {
                int singleCharInt;
                char singleChar;
                while ((singleCharInt = bufferedInputStream.read()) != -1) {
                    singleChar = (char) singleCharInt;
                    System.out.print(singleChar);
                }
            }
        }
    }

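The example above uses the default buffer and still asks for one byte per call. Here's a sketch of the other option mentioned, setting the buffer size explicitly, combined with reading a chunk of bytes per call. The 16KB buffer and 4KB chunk sizes are arbitrary values picked for the illustration, not tuned recommendations, and the class name `BufferedBulkReadSketch` and `countBytes` helper are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BufferedBulkReadSketch {

    // Counts the bytes in a file, reading a chunk per call instead of a single byte.
    static long countBytes(String fileName, int bufferSize) throws IOException {
        long total = 0;
        try (BufferedInputStream in =
                new BufferedInputStream(new FileInputStream(fileName), bufferSize)) {
            byte[] chunk = new byte[4096];
            int bytesRead;
            // read(byte[]) returns how many bytes were copied, or -1 at the end of the file
            while ((bytesRead = in.read(chunk)) != -1) {
                total += bytesRead;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical 10,000-byte temp file standing in for a sample file.
        Path tempFile = Files.createTempFile("bulk-sketch", ".bin");
        Files.write(tempFile, new byte[10_000]);

        System.out.println(countBytes(tempFile.toString(), 16 * 1024)); // 10000
        Files.delete(tempFile);
    }
}
```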
New I/O – Reading Text
1a) Files.readAllLines() – Default Encoding
The Files class is part of the new Java I/O classes introduced in jdk1.7. It only has static utility methods for working with files and directories.
The readAllLines() overload that takes no charset argument was introduced in jdk1.8, so this example will not work in Java 7. Be aware that this overload decodes the file as UTF-8 rather than using the system default encoding.

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.util.List;

    public class ReadFile_Files_ReadAllLines {
        public static void main(String[] pArgs) throws IOException {
            String fileName = "c:\\temp\\sample-10KB.txt";
            File file = new File(fileName);

            //the whole file is loaded into memory as a list of lines
            List<String> fileLinesList = Files.readAllLines(file.toPath());
            for (String line : fileLinesList) {
                System.out.println(line);
            }
        }
    }

1b) Files.readAllLines() – Explicit Encoding

    import java.io.File;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.util.List;

    public class ReadFile_Files_ReadAllLines_Encoding {
        public static void main(String[] pArgs) throws IOException {
            String fileName = "c:\\temp\\sample-10KB.txt";
            File file = new File(fileName);

            //use UTF-8 encoding
            List<String> fileLinesList = Files.readAllLines(file.toPath(), StandardCharsets.UTF_8);

            for (String line : fileLinesList) {
                System.out.println(line);
            }
        }
    }

2a) Files.lines() – Default Encoding
This code was tested to work in Java 8 and 9. It didn't run on Java 7 because of the lack of support for lambda expressions.

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.util.stream.Stream;

    public class ReadFile_Files_Lines {
        public static void main(String[] pArgs) throws IOException {
            String fileName = "c:\\temp\\sample-10KB.txt";
            File file = new File(fileName);

            //the stream holds the file open, so it is closed with try-with-resources
            try (Stream<String> linesStream = Files.lines(file.toPath())) {
                linesStream.forEach(line -> {
                    System.out.println(line);
                });
            }
        }
    }

2b) Files.lines() – Explicit Encoding
Just like in the previous example, this code was tested and works in Java 8 and 9 but not in Java 7.

    import java.io.File;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.util.stream.Stream;

    public class ReadFile_Files_Lines_Encoding {
        public static void main(String[] pArgs) throws IOException {
            String fileName = "c:\\temp\\sample-10KB.txt";
            File file = new File(fileName);

            //use UTF-8 encoding
            try (Stream<String> linesStream = Files.lines(file.toPath(), StandardCharsets.UTF_8)) {
                linesStream.forEach(line -> {
                    System.out.println(line);
                });
            }
        }
    }

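Since Files.lines() hands you a Stream, you aren't limited to forEach. As a rough sketch of what that allows, the following counts the non-blank lines of a file without collecting them into a list first. It needs Java 8 or later, and the class name `FilesLinesFilterSketch`, the `countNonBlankLines` helper and the temp file are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

public class FilesLinesFilterSketch {

    // Counts the lines that contain something other than whitespace.
    static long countNonBlankLines(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.trim().isEmpty()).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temp file with two blank lines mixed in.
        Path tempFile = Files.createTempFile("lines-sketch", ".txt");
        Files.write(tempFile, Arrays.asList("first", "", "   ", "second"), StandardCharsets.UTF_8);

        System.out.println(countNonBlankLines(tempFile)); // 2
        Files.delete(tempFile);
    }
}
```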
3a) Scanner – Default Encoding
The Scanner class was introduced in jdk1.5 and can be used to read from files or from the console (user input).

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Scanner;

    public class ReadFile_Scanner_NextLine {
        public static void main(String[] pArgs) throws FileNotFoundException {
            String fileName = "c:\\temp\\sample-10KB.txt";
            File file = new File(fileName);

            try (Scanner scanner = new Scanner(file)) {
                String line;
                //hasNextLine() returns false once the end of the file is reached
                while (scanner.hasNextLine()) {
                    line = scanner.nextLine();
                    System.out.println(line);
                }
            }
        }
    }

3b) Scanner – Explicit Encoding

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Scanner;

    public class ReadFile_Scanner_NextLine_Encoding {
        public static void main(String[] pArgs) throws FileNotFoundException {
            String fileName = "c:\\temp\\sample-10KB.txt";
            File file = new File(fileName);

            //use UTF-8 encoding
            try (Scanner scanner = new Scanner(file, "UTF-8")) {
                String line;
                while (scanner.hasNextLine()) {
                    line = scanner.nextLine();
                    System.out.println(line);
                }
            }
        }
    }

New I/O – Reading Bytes
Files.readAllBytes()
Even though the documentation for this method states that "it is not intended for reading in large files", I found this to be the absolute best performing file reading method, even on files as large as 1GB.

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;

    public class ReadFile_Files_ReadAllBytes {
        public static void main(String[] pArgs) throws IOException {
            String fileName = "c:\\temp\\sample-10KB.txt";
            File file = new File(fileName);

            //the whole file is loaded into memory as a byte array
            byte[] fileBytes = Files.readAllBytes(file.toPath());
            char singleChar;
            for (byte b : fileBytes) {
                singleChar = (char) b;
                System.out.print(singleChar);
            }
        }
    }

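Casting each byte to a char, as in the example above, only gives the right result for plain ASCII text. If the file might contain other characters, one option is to decode the whole byte array in a single step with an explicit charset. Here's a minimal sketch, assuming the file is UTF-8; the class name `ReadAllBytesToStringSketch`, the `readAsString` helper and the temp file are made up for the illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadAllBytesToStringSketch {

    // Reads the whole file into memory and decodes it as UTF-8 in one step.
    static String readAsString(Path path) throws IOException {
        byte[] fileBytes = Files.readAllBytes(path);
        return new String(fileBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temp file containing one non-ASCII letter.
        Path tempFile = Files.createTempFile("bytes-sketch", ".txt");
        Files.write(tempFile, "na\u00efve text".getBytes(StandardCharsets.UTF_8));

        System.out.println(readAsString(tempFile).length()); // 10 characters
        Files.delete(tempFile);
    }
}
```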
3rd Party I/O – Reading Text
Commons – FileUtils.readLines()
Apache Commons IO is an open source Java library that comes with utility classes for reading and writing text and binary files. I listed it in this article because it can be used instead of the built in Java libraries. The class we're using is FileUtils.
For this article, version 2.6 was used, which is compatible with JDK 1.7+.
Note that you need to explicitly specify the encoding and that the method for using the default encoding has been deprecated.

    import java.io.File;
    import java.io.IOException;
    import java.util.List;

    import org.apache.commons.io.FileUtils;

    public class ReadFile_Commons_FileUtils_ReadLines {
        public static void main(String[] pArgs) throws IOException {
            String fileName = "c:\\temp\\sample-10KB.txt";
            File file = new File(fileName);

            //the encoding is passed explicitly
            List<String> fileLinesList = FileUtils.readLines(file, "UTF-8");
            for (String line : fileLinesList) {
                System.out.println(line);
            }
        }
    }

Guava – Files.readLines()
Google Guava is an open source library that comes with utility classes for common tasks like collections handling, cache management, IO operations, string processing.
I listed it in this article because it can be used instead of the built in Java libraries and I wanted to compare its performance with the Java built in libraries.
For this article, version 23.0 was used.
I'm not going to examine all the different ways to read files with Guava, since this article is not meant for that. For a more detailed look at all the different ways to read and write files with Guava, have a look at Baeldung's in depth article.
When reading a file, Guava requires that the character encoding be set explicitly, just like Apache Commons.
Compatibility note: This code was tested successfully on Java 8 and 9. I couldn't get it to work on Java 7 and kept getting an "Unsupported major.minor version 52.0" error. Guava has a separate API doc for Java 7 which uses a slightly different version of the Files.readLines() method. I thought I could get it to work but I kept getting that error.

    import java.io.File;
    import java.io.IOException;
    import java.util.List;

    import com.google.common.base.Charsets;
    import com.google.common.io.Files;

    public class ReadFile_Guava_Files_ReadLines {
        public static void main(String[] args) throws IOException {
            String fileName = "c:\\temp\\sample-10KB.txt";
            File file = new File(fileName);

            //the encoding is passed explicitly
            List<String> fileLinesList = Files.readLines(file, Charsets.UTF_8);
            for (String line : fileLinesList) {
                System.out.println(line);
            }
        }
    }

Performance Testing
Since there are so many ways to read from a file in Java, a natural question is "What file reading method is the best for my situation?" So I decided to test each of these methods against each other using sample data files of different sizes and timing the results.
Each code sample from this article displays the contents of the file to a string and then to the console (System.out). However, during the performance tests the System.out line was commented out since it would seriously slow down the performance of each method.
Each performance test measures the time it takes to read in the file – line by line, character by character, or byte by byte – without displaying anything to the console. I ran each test 5-10 times and took the average so as not to let any outliers influence each test. I also ran the default encoding version of each file reading method – i.e. I didn't specify the encoding explicitly.
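To give an idea of what such a timing loop looks like, here's a simplified sketch. It is not the exact harness I used: it times only a BufferedReader read, sets UTF-8 explicitly so it behaves the same on any machine, and does no JVM warm-up, so its numbers are only indicative. The class name `ReadTimingSketch`, its helper methods and the generated temp file are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

public class ReadTimingSketch {

    // Reads the file line by line without printing anything; returns the line count.
    static long readQuietly(Path path) throws IOException {
        long lineCount = 0;
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            while (reader.readLine() != null) {
                lineCount++;
            }
        }
        return lineCount;
    }

    // Runs the read several times and returns the average elapsed time in milliseconds.
    static double averageMillis(Path path, int runs) throws IOException {
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            readQuietly(path);
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / runs;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical 1,000-line temp file standing in for the sample data files.
        Path tempFile = Files.createTempFile("timing-sketch", ".txt");
        Files.write(tempFile, Collections.nCopies(1000, "sample line of text"),
                StandardCharsets.UTF_8);

        System.out.println("Average ms over 10 runs: " + averageMillis(tempFile, 10));
        Files.delete(tempFile);
    }
}
```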
Dev Setup
The dev environment used for these tests:
- Intel Core i7-3615QM @ 2.3 GHz, 8GB RAM
- Windows 8 x64
- Eclipse IDE for Java Developers, Oxygen.2 Release (4.7.2)
- Java SE 9 (jdk-9.0.4)
Data Files
GitHub doesn't allow pushing files larger than 100 MB, so I couldn't find a practical way to store my large test files to allow others to replicate my tests. So instead of storing them, I'm providing the tools I used to generate them so you can create test files that are similar in size to mine. Obviously they won't be the same, but you'll generate files that are similar in size to the ones I used in my performance tests.
Random String Generator was used to generate sample text and then I just copy-pasted to create larger versions of the file. When the file started getting too big to manage inside a text editor, I had to use the command line to merge multiple text files into a larger text file:
copy *.txt sample-1GB.txt
I created the following 7 data file sizes to test each file reading method across a range of file sizes:
- 1KB
- 10KB
- 100KB
- 1MB
- 10MB
- 100MB
- 1GB
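If you'd rather not copy-paste and merge files by hand, a test file of a given size can also be generated in code. This is not how I built my own files; it's only a sketch of an alternative, and the 80-character line length, the alphabet, the seed and the class name `SampleFileGeneratorSketch` are all arbitrary choices made for the illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

public class SampleFileGeneratorSketch {

    private static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final int LINE_LENGTH = 80;

    // Writes random 80-character lines until at least targetBytes bytes have been written.
    static void generate(Path path, long targetBytes, long seed) throws IOException {
        Random random = new Random(seed);
        long written = 0;
        try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.US_ASCII)) {
            while (written < targetBytes) {
                StringBuilder line = new StringBuilder(LINE_LENGTH + 1);
                for (int i = 0; i < LINE_LENGTH; i++) {
                    line.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
                }
                line.append('\n');
                writer.write(line.toString());
                written += line.length(); // ASCII text: one byte per character
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Generates a file of roughly 10KB in the system temp directory.
        Path tempFile = Files.createTempFile("sample-10KB-sketch", ".txt");
        generate(tempFile, 10 * 1024, 42L);

        System.out.println(tempFile + " -> " + Files.size(tempFile) + " bytes");
        Files.delete(tempFile);
    }
}
```

The file ends up slightly larger than the target because the last line is always written in full.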
Performance Summary
There were some surprises and some expected results from the performance tests.
As expected, the worst performers were the methods that read in a file character by character or byte by byte. But what surprised me was that the native Java IO libraries outperformed both third party libraries – Apache Commons IO and Google Guava.
What's more – both Google Guava and Apache Commons IO threw a java.lang.OutOfMemoryError when trying to read in the 1GB test file. This also happened with the Files.readAllLines(Path) method but the remaining seven methods were able to read in all test files, including the 1GB test file.
The following table summarizes the average time (in milliseconds) each file reading method took to complete. I highlighted the top three methods in green, the average performing methods in yellow and the worst performing methods in red:
The following chart summarizes the above table but with the following changes:
- I removed java.io.FileInputStream.read() from the chart because its performance was so bad it would skew the entire chart and you wouldn't see the other lines properly
- I summarized the data from 1KB to 1MB because after that, the chart would get too skewed with so many under performers and also some methods threw a java.lang.OutOfMemoryError at 1GB
The Winners
The new Java I/O libraries (java.nio) had the best overall winner (java.nio.file.Files.readAllBytes()) but it was followed closely behind by BufferedReader.readLine() which was also a proven top performer across the board. The other excellent performer was java.nio.file.Files.lines(Path) which had slightly worse numbers for smaller test files but really excelled with the larger test files.
The absolute fastest file reader across all data tests was java.nio.file.Files.readAllBytes(Path). It was consistently the fastest and even reading a 1GB file only took about one second.
The following chart compares performance for a 100KB test file:
You can see that the lowest times were for Files.readAllBytes(), BufferedInputStream.read() and BufferedReader.readLine().
The following chart compares performance for reading a 10MB file. I didn't bother including the bar for FileInputStream.read() because the performance was so bad it would skew the entire chart and you couldn't tell how the other methods performed relative to each other:
Files.readAllBytes() actually outperforms all other methods and BufferedReader.readLine() is a distant second.
The Losers
As expected, the absolute worst performer was java.io.FileInputStream.read() which was orders of magnitude slower than its rivals for most tests. FileReader.read() was also a poor performer for the same reason – reading files byte by byte (or character by character) instead of with buffers drastically degrades performance.
Both the Apache Commons IO FileUtils.readLines() and Guava Files.readLines() crashed with an OutOfMemoryError when trying to read the 1GB test file and they were about average in performance for the remaining test files.
java.nio.file.Files.readAllLines() also crashed when trying to read the 1GB test file but it performed quite well for smaller file sizes.
Performance Rankings
Here's a ranked list of how well each file reading method did, in terms of speed and handling of large files, as well as compatibility with different Java versions.
| Rank | File Reading Method |
|---|---|
| 1 | java.nio.file.Files.readAllBytes() |
| 2 | java.io.BufferedReader.readLine() |
| 3 | java.nio.file.Files.lines() |
| 4 | java.io.BufferedInputStream.read() |
| 5 | java.util.Scanner.nextLine() |
| 6 | java.nio.file.Files.readAllLines() |
| 7 | org.apache.commons.io.FileUtils.readLines() |
| 8 | com.google.common.io.Files.readLines() |
| 9 | java.io.FileReader.read() |
| 10 | java.io.FileInputStream.read() |
Conclusion
I tried to present a comprehensive set of methods for reading files in Java, both text and binary. We looked at 15 different ways of reading files in Java and we ran performance tests to see which methods are the fastest.
The new Java IO library (java.nio) proved to be a great performer but so was the classic BufferedReader.
Source: https://funnelgarden.com/java_read_file/