Manu

Reading file in OCaml

Reading a file in OCaml can be done using the In_channel module, which is part of the standard library since OCaml 4.14.

Consider an input file, input.txt:

hello
earth

hello
mars

hello
jupiter

Short explanation

Reading file in full

let lines =
  let filename = "input.txt" in
  let contents = In_channel.with_open_text filename In_channel.input_all in
  (* Split the contents on newlines; the result is a string list *)
  String.split_on_char '\n' contents

Reading file line by line

let () =
  let filename = "input.txt" in
  In_channel.with_open_text filename (fun in_channel ->
      let rec loop () =
        match In_channel.input_line in_channel with
        | Some line ->
            print_endline line;
            loop ()
        | None -> print_endline "~~ End of file reached"
      in
      loop ())

This prints each line as it is read, followed by an end-of-file message.

Longer explanation

In_channel

In_channel is a module [1] that provides functions for working with input channels. An input channel can be a file or standard input. The module also provides convenience functions for reading a file. For example, to read a text file we can use:

In_channel.with_open_text "input.txt"

This function opens the file, passes the channel to the function we give it, and closes the file afterwards, even if that function raises an exception.
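To make the open/close handling concrete, here is a rough sketch of what a hand-written equivalent could look like. The name my_with_open_text and the temporary sample file are assumptions made for this illustration; this is not the standard library's actual implementation, only an approximation built from In_channel.open_text, In_channel.close and Fun.protect.

```ocaml
(* Hypothetical stand-in for In_channel.with_open_text, for illustration. *)
let my_with_open_text (path : string) (f : In_channel.t -> 'a) : 'a =
  let ic = In_channel.open_text path in
  (* Fun.protect runs ~finally whether f returns normally or raises. *)
  Fun.protect ~finally:(fun () -> In_channel.close ic) (fun () -> f ic)

(* Write a small sample file so the example is self-contained. *)
let sample_path = Filename.temp_file "sample" ".txt"

let () =
  Out_channel.with_open_text sample_path (fun oc ->
      Out_channel.output_string oc "hello\nearth\n")

let () =
  let via_stdlib = In_channel.with_open_text sample_path In_channel.input_all in
  let via_sketch = my_with_open_text sample_path In_channel.input_all in
  print_string via_sketch;
  (* Both versions should read the same contents. *)
  assert (via_stdlib = via_sketch)
```

In practice there is no reason to write this by hand; the sketch only shows why we never call close ourselves when using with_open_text.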

We can read a file either as binary or as text. There are functions available for both, for example with_open_bin and with_open_text.
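For comparison, a minimal sketch of the binary variant is shown here. The helper names size_in_bytes and read_bytes and the temporary sample file are assumptions for this example; the In_channel and Out_channel calls are the standard ones.

```ocaml
(* Write a small sample file so the example is self-contained. *)
let sample_path = Filename.temp_file "sample" ".bin"

let () =
  Out_channel.with_open_bin sample_path (fun oc ->
      Out_channel.output_string oc "hello\nearth\n")

(* Hypothetical helper: size of the file in bytes, as reported by the channel. *)
let size_in_bytes (path : string) : int =
  In_channel.with_open_bin path (fun ic -> Int64.to_int (In_channel.length ic))

(* Hypothetical helper: the raw bytes of the file, with no newline translation. *)
let read_bytes (path : string) : string =
  In_channel.with_open_bin path In_channel.input_all

let () =
  Printf.printf "%d bytes read, %d bytes on disk\n"
    (String.length (read_bytes sample_path))
    (size_in_bytes sample_path)
```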

We will focus on reading as text for now.

We can read a file in two ways:

  1. Full into memory
  2. Line by line (to process on the fly)

Reading full file

let () =
  let filename = "input.txt" in
  let contents = In_channel.with_open_text filename In_channel.input_all in
  let lst = String.split_on_char '\n' contents in
  List.iter (fun x -> print_endline x) lst

Here we are using In_channel.with_open_text and In_channel.input_all to read the full contents into memory.
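One detail to watch for with this approach: if the file ends with a newline, String.split_on_char '\n' produces an extra empty string at the end of the list. The helper below, lines_of_string, is a hypothetical name for one possible way to drop that trailing element; it is a sketch, not the only way to handle it.

```ocaml
(* Hypothetical helper: split text into lines, ignoring one trailing newline. *)
let lines_of_string (contents : string) : string list =
  match List.rev (String.split_on_char '\n' contents) with
  | "" :: rest -> List.rev rest (* drop the empty string after the last '\n' *)
  | all -> List.rev all

let () =
  (* Plain splitting keeps an empty string after the final newline. *)
  assert (String.split_on_char '\n' "hello\nearth\n" = [ "hello"; "earth"; "" ]);
  (* The helper removes it but keeps blank lines in the middle. *)
  assert (lines_of_string "hello\nearth\n" = [ "hello"; "earth" ]);
  List.iter print_endline (lines_of_string "hello\nearth\n")
```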

Reading line by line

let () =
  let filename = "input.txt" in
  In_channel.with_open_text filename (fun in_channel ->
      let rec read_lines () =
        match In_channel.input_line in_channel with
        | Some line ->
            print_endline line;
            read_lines () (* Continue reading next lines *)
        | None -> print_endline "\n~~End of file"
      in
      read_lines ())

Here we are using In_channel.with_open_text and an anonymous function that calls In_channel.input_line. We need to recurse until we reach the end of the file.
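The same recursive pattern can collect the lines instead of printing them. The sketch below assumes a helper named read_all_lines (a made-up name) and a temporary sample file; it accumulates lines in reverse and flips the list once at the end, so the loop stays tail-recursive.

```ocaml
(* Hypothetical helper: read every remaining line of a channel into a list. *)
let read_all_lines (ic : In_channel.t) : string list =
  let rec loop acc =
    match In_channel.input_line ic with
    | Some line -> loop (line :: acc) (* keep reading, newest line first *)
    | None -> List.rev acc (* end of file: restore the original order *)
  in
  loop []

(* Write a small sample file so the example is self-contained. *)
let sample_path = Filename.temp_file "sample" ".txt"

let () =
  Out_channel.with_open_text sample_path (fun oc ->
      Out_channel.output_string oc "hello\nearth\n\nhello\nmars\n")

let () =
  let lines = In_channel.with_open_text sample_path read_all_lines in
  List.iter print_endline lines
```

Because read_all_lines already has the shape In_channel.t -> string list, it can be passed to with_open_text directly, without wrapping it in an anonymous function.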

Let’s break down why we need to pass (fun in_channel -> ...) as the second argument to In_channel.with_open_text here, whereas In_channel.input_all could be passed directly, and what is happening with the function arguments.

The function In_channel.with_open_text expects two arguments: a string (the file path) and a function. This function must take an In_channel.t (the file channel) as its argument and return a result (often a string or some processed data from the file).
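As a small illustration of that contract, the sketch below passes two different functions to with_open_text and gets two different result types back. The callback count_lines and the temporary sample file are assumptions for this example; the signature in the comment is the one given in the standard library documentation.

```ocaml
(* For reference, the standard library signature is:
   val with_open_text : string -> (In_channel.t -> 'a) -> 'a *)

(* Hypothetical callback: count the lines available on a channel. *)
let count_lines (ic : In_channel.t) : int =
  let rec loop n =
    match In_channel.input_line ic with
    | Some _ -> loop (n + 1)
    | None -> n
  in
  loop 0

(* Write a small sample file so the example is self-contained. *)
let sample_path = Filename.temp_file "sample" ".txt"

let () =
  Out_channel.with_open_text sample_path (fun oc ->
      Out_channel.output_string oc "hello\nearth\n\nhello\nmars\n")

let () =
  (* Here the result type 'a is int. *)
  let n : int = In_channel.with_open_text sample_path count_lines in
  (* Here the result type 'a is string. *)
  let s : string = In_channel.with_open_text sample_path In_channel.input_all in
  Printf.printf "%d lines, %d characters\n" n (String.length s)
```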

In_channel.input_all is a function that takes an In_channel.t and returns a string.

val input_all : In_channel.t -> string

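When we pass In_channel.input_all to In_channel.with_open_text, we are passing a function as a value: we give the function's name without its argument, and with_open_text applies it to the opened channel for us.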

As discussed above, In_channel.with_open_text expects its second argument to be a function that accepts an In_channel.t and returns a result. In_channel.input_all fits this exactly, so we can pass it directly.

let file_contents = In_channel.with_open_text "filename.txt" In_channel.input_all

On the other hand, In_channel.input_line is designed to read a single line from an input channel and its signature is:

val input_line : In_channel.t -> string option

It returns an option because there are two possible results: Some line when a line was read, or None when the end of the file has been reached.

Given this, we need to loop until we have read the whole file, since a single call to input_line reads only one line. Passing In_channel.input_line directly will therefore not read the full file. Now, what if we do it anyway?

NOTE: We can still pass the function directly and match on its result, but we would then read only one line, which is not our goal. This is covered below.

Let's make the changes one by one and observe the results.

Replacing input_all with input_line

let () =
  let filename = "input.txt" in
  (* Changed input_all to input_line *)
  let contents = In_channel.with_open_text filename In_channel.input_line in
  let lst = String.split_on_char '\n' contents in
  List.iter (fun x -> print_endline x) lst

Result

File "./program.ml", line 4, characters 44-52:
4 |         let lst = String.split_on_char '\n' contents in
                                                ^^^^^^^^
Error: This expression has type string option
       but an expression was expected of type string

That was expected: String.split_on_char needs a string, but input_line gave us a string option.

Next, let's remove the extra processing and simply print the result. We now expect only one line, the first line of the file.

let () =
  let filename = "input.txt" in
  let contents = In_channel.with_open_text filename In_channel.input_line in
  print_endline contents

Result

File "./program.ml", line 4, characters 22-30:
4 |         print_endline contents
                          ^^^^^^^^
Error: This expression has type string option
       but an expression was expected of type string

Yes, this is expected as we discussed - we are getting an option from In_channel.input_line and we need to respect that.

Changing further with match

let () =
  let filename = "input.txt" in
  let contents = In_channel.with_open_text filename In_channel.input_line in
  match contents with
  | Some line -> print_endline line
  | None -> print_endline "End of file reached"

Result

❯ ocaml full_file_2.ml
hello

This makes sense: there is no loop, so only the first line of the file is printed.
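When only the first line is needed, the option can also be unwrapped with Option.value instead of a match. The function first_line_or, the default text and the temporary files below are assumptions made for this sketch.

```ocaml
(* Hypothetical helper: first line of a file, or a default if the file is empty. *)
let first_line_or (default : string) (path : string) : string =
  In_channel.with_open_text path In_channel.input_line
  |> Option.value ~default

(* Filename.temp_file creates an empty file for us. *)
let empty_path = Filename.temp_file "empty" ".txt"

(* A second file with some content, so both cases are covered. *)
let sample_path = Filename.temp_file "sample" ".txt"

let () =
  Out_channel.with_open_text sample_path (fun oc ->
      Out_channel.output_string oc "hello\nearth\n")

let () =
  print_endline (first_line_or "(empty file)" sample_path);
  print_endline (first_line_or "(empty file)" empty_path)
```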

Finally, with enough changes to read the full file:

let () =
  let filename = "input.txt" in
  In_channel.with_open_text filename (fun in_channel ->
      let rec loop () =
        match In_channel.input_line in_channel with
        | Some line ->
            print_endline line;
            loop ()
        | None -> print_endline "~~ End of file reached"
      in
      loop ())

Result

❯ ocaml program.ml
hello
earth

hello
mars

hello
jupiter
~~ End of file reached

Fin.

  1. https://v2.ocaml.org/api/In_channel.html