Reading a file in OCaml
Reading a file in OCaml can be done using the In_channel module.
Consider an input file named input.txt.
Short explanation
Reading a file in full
Reading a file line by line
This returns a list of strings.
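A minimal sketch of both approaches, assuming OCaml 4.14 or later (the release that added In_channel to the standard library). The names read_file and read_lines are illustrative and not taken from the original code.

```ocaml
(* Read the whole file into a single string. *)
let read_file (path : string) : string =
  In_channel.with_open_text path In_channel.input_all

(* Read the file line by line into a list of strings.
   input_line drops the trailing newline and returns None at end of file. *)
let read_lines (path : string) : string list =
  In_channel.with_open_text path (fun ic ->
      let rec loop acc =
        match In_channel.input_line ic with
        | Some line -> loop (line :: acc) (* keep reading *)
        | None -> List.rev acc (* end of file: restore original order *)
      in
      loop [])
```

Usage: read_file "input.txt" gives one string, read_lines "input.txt" gives one string per line. The accumulator keeps the recursion tail-recursive, so large files do not grow the stack.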
In_channel
In_channel is a module[1] that provides functions for working with input channels, which can be a file or standard input. It also provides convenient functions for reading a file. For example, to read a text file we can use In_channel.with_open_text.
This function also takes care of opening and closing the file.
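To show what that saves us, here is a rough hand-written equivalent built from Stdlib's open_in, close_in_noerr and Fun.protect. It is a sketch of the idea, not the library's actual source, and with_file_manually / read_file_manually are hypothetical names.

```ocaml
(* Approximation of In_channel.with_open_text:
   open the file, run the callback, and always close the channel. *)
let with_file_manually (path : string) (f : in_channel -> 'a) : 'a =
  let ic = open_in path in (* opens in text mode *)
  (* ~finally runs whether f returns normally or raises *)
  Fun.protect ~finally:(fun () -> close_in_noerr ic) (fun () -> f ic)

(* Behaves like In_channel.with_open_text path In_channel.input_all *)
let read_file_manually (path : string) : string =
  with_file_manually path In_channel.input_all
```

Forgetting the close, or skipping it when an exception escapes, leaks a file descriptor; the with_open_* functions remove that class of bug.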
Longer explanation
A file can be read as binary or as text, and In_channel provides a function for each:
- In_channel.with_open_bin
- In_channel.with_open_text
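A small sketch of the two variants side by side (read_bytes and read_text are illustrative names). The difference only shows up on Windows, where text mode translates "\r\n" line endings to "\n"; on Unix-like systems both return the same bytes.

```ocaml
(* Binary mode: bytes are returned exactly as stored on disk. *)
let read_bytes (path : string) : string =
  In_channel.with_open_bin path In_channel.input_all

(* Text mode: on Windows, "\r\n" is translated to "\n". *)
let read_text (path : string) : string =
  In_channel.with_open_text path In_channel.input_all
```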
We will focus on reading as text for now.
There are two ways to read a file:
- Full into memory
- Line by line (to process on the fly)
Reading the full file
Here we are using In_channel.with_open_text and In_channel.input_all to read the full contents into memory.
Reading line by line
Here we are using In_channel.with_open_text and an anonymous (lambda) function that calls In_channel.input_line. Since each call reads only one line, we iterate recursively until we reach the end of the file.
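A sketch of this pattern that handles each line as soon as it is read ("on the fly") instead of keeping the whole file in memory. iter_lines and its callback f are hypothetical names; the anonymous function (fun in_channel -> ...) receives the open channel, and the inner loop calls In_channel.input_line until it returns None.

```ocaml
(* Call f on every line of the file, one line at a time. *)
let iter_lines (path : string) (f : string -> unit) : unit =
  In_channel.with_open_text path (fun in_channel ->
      let rec loop () =
        match In_channel.input_line in_channel with
        | Some line ->
            f line; (* process this line *)
            loop () (* tail call: read the next one *)
        | None -> () (* end of file reached *)
      in
      loop ())
```

Usage: iter_lines "input.txt" print_endline prints the file line by line.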
Let’s break down why we need to pass (fun in_channel -> ...) as the second argument to In_channel.with_open_text, instead of passing a function directly as we did with In_channel.input_all, and what is happening with the function arguments.
The function In_channel.with_open_text expects two arguments: a string (the file path) and a function. This function must take an In_channel.t (the file channel) as its argument and return a result (often a string or some data processed from the file); that result is what with_open_text returns.
In_channel.input_all is a function that takes an In_channel.t and returns a string.
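Both signatures can be checked by the compiler with type annotations. In this sketch the result type 'a of with_open_text is specialised to string, which is what happens when input_all is passed to it; the underscore-prefixed names are illustrative.

```ocaml
(* with_open_text : string -> (In_channel.t -> 'a) -> 'a, here with 'a = string *)
let _open_text : string -> (In_channel.t -> string) -> string =
  In_channel.with_open_text

(* input_all : In_channel.t -> string, exactly the shape of the callback above *)
let _input_all : In_channel.t -> string = In_channel.input_all
```

If either annotation disagreed with the real type, compilation would fail, so this doubles as a quick check of the reasoning.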
When we pass In_channel.input_all to In_channel.with_open_text, we pass the function itself without applying it; with_open_text applies it to the channel once the file is open (related: [[ocaml-partial-application]]).
As discussed above, In_channel.with_open_text expects as its second argument a function that accepts an In_channel.t and returns a result. In_channel.input_all has exactly that shape, so we can pass it directly.
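Passing In_channel.input_all directly is equivalent to wrapping it in a lambda that just forwards the channel (an eta-expansion). This sketch shows both forms side by side; read_a and read_b are illustrative names.

```ocaml
(* Point-free: hand input_all itself to with_open_text. *)
let read_a (path : string) : string =
  In_channel.with_open_text path In_channel.input_all

(* Eta-expanded: a lambda that receives the channel and forwards it. *)
let read_b (path : string) : string =
  In_channel.with_open_text path (fun ic -> In_channel.input_all ic)
```

The lambda form only becomes necessary when the callback has to do more than a single call, as in the line-by-line case.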
On the other hand, In_channel.input_line is designed to read a single line from an input channel, and its signature is In_channel.t -> string option.
It returns an option because we can have two results:
- A line was available to be read (Some line)
- The end of the file has been reached (None)
Given this, we need to loop until we have read the whole file, since a single input_line call reads only one line. Therefore, passing In_channel.input_line directly will not give us the full file. But what happens if we do?
NOTE: We could still pass the function and match on its result, but we would still be reading only one line, which is not our goal. This is covered below.
Let's make the changes one by one and observe the results.
- Replacing input_all with input_line
Result
That was obvious.
Next, we remove the extra processing and simply try to print the result, now expecting only one line: the first line of the file.
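A sketch of this step, assuming input.txt as the file name. The commented-out line is the naive attempt: the type checker rejects it because print_string expects a string, while the call produces a string option. The definition below it (first_line, an illustrative name) shows the type the call actually has.

```ocaml
(* Naive attempt, does not compile: the argument is a string option, not a string.
   let () = print_string (In_channel.with_open_text "input.txt" In_channel.input_line) *)

(* What the call really returns: Some first_line, or None for an empty file. *)
let first_line (path : string) : string option =
  In_channel.with_open_text path In_channel.input_line
```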
Result
This is expected, as discussed: In_channel.input_line returns an option, and we need to handle it.
Next, we handle the option with match.
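A sketch of the match step. Both cases of the option are handled explicitly; the "(empty file)" placeholder and the names first_line_text / print_first_line are hypothetical, chosen so the value can be checked separately from the printing.

```ocaml
(* Return the first line, or a placeholder when the file is empty. *)
let first_line_text (path : string) : string =
  In_channel.with_open_text path (fun ic ->
      match In_channel.input_line ic with
      | Some line -> line (* a first line exists *)
      | None -> "(empty file)") (* hypothetical placeholder *)

(* No loop yet, so only the first line is printed. *)
let print_first_line (path : string) : unit =
  print_endline (first_line_text path)
```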
Result
Now the output makes sense: there is no loop, so we print only the first line of the file.
Finally, with enough changes, we can read the full file.
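A sketch of the final shape: the match from the previous step moves into a recursive loop that keeps reading until input_line returns None, printing each line as it goes. print_all_lines is an illustrative name, not the original function.

```ocaml
(* Print every line of the file, reading one line per iteration. *)
let print_all_lines (path : string) : unit =
  In_channel.with_open_text path (fun in_channel ->
      let rec loop () =
        match In_channel.input_line in_channel with
        | Some line ->
            print_endline line; (* print this line *)
            loop () (* then read the next one *)
        | None -> () (* end of file: stop *)
      in
      loop ())
```

Usage: print_all_lines "input.txt".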
Result
Footnotes