Bash vs ksh pipes

I am stuck on a problem with my ksh scripts. FWIW, the problem I am unable to overcome is that when I use a structure such as this:

command | while read VAR1 
do
   many_commands "$VAR1"   # placeholder for the real loop body
done

I often find that my scripts do not run the loop for every line piped into the while. To test this, I changed the structure to

command > /tmp/tempfile
cat -n /tmp/tempfile >&2
cat /tmp/tempfile | while read VAR1
etc

This proves that there are many lines in the output.

I then added an extra line immediately after the do, like this:

echo DEBUGGING: $VAR1  >&2

This proves that the loop body runs only once. I am really stumped.
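To make that comparison self-checking, here is a minimal diagnostic sketch. It assumes `mktemp` is available, and `produce_lines` is a hypothetical stand-in for the real command. Reading the file with a redirection (`done < file`) instead of `cat file |` keeps the loop in the current shell in every POSIX shell, so the iteration counter is still set after the loop.

```sh
#!/bin/sh
# Diagnostic sketch: compare the number of lines a command produces with the
# number of times the while loop body actually runs.
# produce_lines is a hypothetical stand-in for the real "command".
produce_lines() {
  printf '%s\n' alpha beta gamma
}

tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT

produce_lines > "$tmp"
lines=$(wc -l < "$tmp")
lines=$(($lines + 0))   # strip the padding some wc implementations add

iterations=0
# Redirecting the file into the loop (rather than piping into it) keeps the
# loop in the current shell in every POSIX shell, so $iterations survives.
while IFS= read -r VAR1; do
  iterations=$((iterations + 1))
  echo "DEBUGGING: $VAR1" >&2
done < "$tmp"

echo "lines=$lines iterations=$iterations"
```

If the two numbers differ for the real command, the loop body (not the producer) is losing input.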

One workaround, which is not always viable, is:

for X in $(cat /tmp/tempfile )
do
...
done

This then works correctly, but besides the fact that I hate this for structure, it means the entire input data is expanded onto the command line at once (which has hard limits).
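For comparison, a small sketch of how the two loop forms split the same input (the sample data is made up, and `mktemp` is assumed to be available). The `for` version iterates over whitespace-separated words, each of which is also subject to filename globbing, whereas `while IFS= read -r` consumes one line at a time straight from the file, so the whole input is never expanded into a word list.

```sh
#!/bin/sh
# Sketch: how "for X in $(cat file)" and "while read" split the same input.
# The sample file contents are made up for illustration.
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT
printf '%s\n' 'first line' 'second line' > "$tmp"

for_count=0
# The command substitution is split on whitespace (and each word is also
# subject to filename globbing), so this loop sees four words, not two lines.
for X in $(cat "$tmp"); do
  for_count=$((for_count + 1))
done

while_count=0
# read consumes one line per iteration, straight from the file, so the
# whole file never has to be expanded into a word list.
while IFS= read -r X; do
  while_count=$((while_count + 1))
done < "$tmp"

echo "for: $for_count iterations, while: $while_count iterations"
```

So the `for` form only "works" as long as no line contains spaces, tabs or glob characters.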

It appears that bash handles this kind of thing better than ksh. In particular, it seems that this may be related to read calls failing, and not being retried, when the loop takes a long time to run.

However, it seems that bash does not have a built-in “read”, which means many of my scripts will need to be rewritten. I OFTEN use large structures like

command1 | command2 | while read SOMEVAR; do awk -F: "... long awk program" | sed "long sed program" ; done | sort -u | tail -1 | read FINAL_ANSWER

The problem is that bash uses /usr/bin/read, which, as expected, throws away the value of FINAL_ANSWER as soon as it gets it. The obvious workaround is to replace

| read FINAL_ANSWER

with

> /tmp/final_answer && FINAL_ANSWER="$(cat /tmp/final_answer)"
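In any POSIX shell, a temporary file is not strictly needed for this: command substitution captures the pipeline's output, and the assignment itself runs in the current shell. A minimal sketch, where `produce_records` is a hypothetical stand-in for `command1 | command2 | ...` and `cut` stands in for the long awk/sed stages:

```sh
#!/bin/sh
# Sketch: capture the last stage of a pipeline with command substitution
# instead of "| read FINAL_ANSWER" or a temporary file.
# produce_records is a hypothetical stand-in for "command1 | command2 | ...".
produce_records() {
  printf '%s\n' 'b:2' 'a:1' 'c:3' 'a:1'
}

# The whole pipeline runs inside $( ), and its output is assigned in the
# current shell, so FINAL_ANSWER is visible afterwards in ksh and bash alike.
FINAL_ANSWER=$(produce_records | cut -d: -f1 | sort -u | tail -n 1)
echo "FINAL_ANSWER=$FINAL_ANSWER"
```

Note that command substitution strips trailing newlines from the captured output, which is usually what you want for a single value.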

So... can any scripting gurus here shed more light on this? I deliberately did not post my real scripts, both because they are part of a sensitive solution developed for a customer and because I don’t want the details of the actual scripts to confuse the issue.

I use the “while read” format OFTEN, and it usually works. In fact, I’ve never had a problem with it in 25 years of shell scripting. Now I’m having problems. Very frustrating. Perplexing.

Initially I thought the while read was only receiving, or passing along, the first line of input. But then I discovered a situation where, when I run the script over and over, it gets further and further into the input each time. Specifically, I have something like this:

command | while read NEXT_ONE DONEFLAG
do
   if [ "$DONEFLAG" = "yes" ]
   then
       echo "Already completed work for $NEXT_ONE"
   else
       dowork "$NEXT_ONE" && set_flag "$NEXT_ONE"
   fi
done

It turns out that each run of the script performs dowork only once. It doesn’t really matter what dowork is, as long as it takes more than a few seconds: some kind of shell pipe timeout occurs and the rest of the input then disappears. Google tells me that dtksh may solve this issue (apparently it retries the read/write or something; I didn’t read enough to be sure).
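To check the timeout theory independently of the real `dowork`, a small reproduction script may help. This is a sketch: `dowork`, `set_flag` and `run_jobs` are hypothetical stand-ins, and `dowork` only sleeps for `DELAY` seconds (an assumed knob, default 1). Running it under ksh and under bash with a larger `DELAY` should show whether lines really disappear after a slow iteration.

```sh
#!/bin/sh
# Reproduction sketch for the "only one slow iteration per run" symptom.
# dowork and set_flag are hypothetical stand-ins for the real commands;
# dowork just sleeps for DELAY seconds to simulate a slow job.
DELAY=${DELAY:-1}

dowork() {
  sleep "$DELAY"
  echo "did work for $1"
}

set_flag() {
  echo "flagged $1"
}

# Each input line is "NAME DONEFLAG". Run this file under ksh and under
# bash, e.g. with DELAY=5, and check that every "no" line is processed.
run_jobs() {
  printf '%s\n' 'job1 no' 'job2 yes' 'job3 no' |
  while read NEXT_ONE DONEFLAG
  do
    if [ "$DONEFLAG" = "yes" ]
    then
      echo "Already completed work for $NEXT_ONE"
    else
      dowork "$NEXT_ONE" && set_flag "$NEXT_ONE"
    fi
  done
}

run_jobs
```

If this processes every line but the real script does not, it is worth checking what the real `dowork` does with its standard input.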

I see that dtksh exists as /usr/dt/bin/dtksh.

What is this shell? I don’t like using shells that I don’t know, but it might be worthwhile to split small portions of scripts into sub-scripts with /usr/dt/bin/dtksh as the interpreter.

Any advice?

EDIT: Here is an example of why I cannot use bash as a drop-in replacement for ksh as the interpreter:

sol10-primary> # cat test.sh
#!/bin/ksh
echo hello| read VAR1
echo $VAR1
sol10-primary> # ./test.sh
hello
sol10-primary> # sed 's/ksh/bash/' <test.sh >test2.sh
sol10-primary> # chmod +x test2.sh
sol10-primary> # ./test2.sh

sol10-primary> #
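Two portable rewrites of test.sh that print hello under ksh, bash and other POSIX shells, because in both the assignment or `read` happens in the current shell rather than in the last stage of a pipeline. This is a sketch; `VAR2` is used only to keep the two variants apart.

```sh
#!/bin/sh
# Portable sketch: two ways to get "hello" into a variable that behave the
# same in ksh, bash and other POSIX shells.

# 1. Command substitution: the shell captures the command's output and
#    assigns it in the current shell.
VAR1=$(echo hello)
echo "$VAR1"

# 2. Here-document: read takes its input from a redirection, not a pipe,
#    so it runs in the current shell.
read VAR2 <<EOF
$(echo hello)
EOF
echo "$VAR2"
```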

Your question is a bit rambling. I’ll answer what seems to be the central part, on the difference between ksh and bash that you observe.

You have encountered what is probably the #1 incompatibility between ksh and bash when it comes to scripts. AT&T ksh (both ksh88 and ksh93) and zsh execute the last (rightmost) command in a pipeline in the parent shell, whereas other shells (Bourne, ash, bash, pdksh, mksh) execute all the commands, including the last one, in a subshell.

Here is a simple test program:

msg="a subshell"
true | msg="the parent shell"
echo "This shell runs the last command of a pipeline in $msg"

In AT&T ksh and zsh, the second assignment to msg is executed in the parent shell, so its effect is visible after the pipeline. In other shells, this assignment is executed in a subshell, so the first assignment remains in place in the parent.
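To see which behavior a given system has, the test program can be run under every shell that is installed. A sketch; the list of shell names is an assumption and should be edited to match your system. If `ksh` on your system is actually pdksh or mksh, it will report a subshell.

```sh
#!/bin/sh
# Sketch: run the test program under whichever shells are installed.
# The list of shell names is an assumption; edit it to match your system.
probe_shell() {
  "$1" -c 'msg="a subshell"; true | msg="the parent shell"; echo "$msg"'
}

for sh in ksh ksh93 zsh bash mksh dash; do
  # command -v succeeds only if the shell is found on $PATH
  if command -v "$sh" >/dev/null 2>&1; then
    echo "$sh: $(probe_shell "$sh")"
  fi
done
```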

A workaround is to execute the rest of the script in the pipeline. This is a common idiom for reading data and doing some processing afterward:

output_some_stuff | {
  var=
  while IFS= read -r line; do
    var=$(process "$line")
  done
  use "$var"
}
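A concrete, runnable instance of that idiom with made-up input. `largest` is a hypothetical example function; everything that needs `$max` sits inside the braces, so it works the same whether or not the shell runs the last pipeline stage in the parent.

```sh
#!/bin/sh
# Hypothetical example: find the largest number in some command's output.
# The while loop and the final echo share the same brace group, so $max is
# still set when it is used, even if the group runs in a subshell.
largest() {
  printf '%s\n' 3 17 5 | {
    max=0
    while IFS= read -r n; do
      if [ "$n" -gt "$max" ]; then max=$n; fi
    done
    echo "largest: $max"
  }
}

largest
```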

You appear to have run into a ksh bug. I recommend upgrading to a non-buggy version. If that isn’t possible, try Stephane Chazelas’s workaround. While you can try running your scripts in bash, it is not (and does not pretend to be) a drop-in replacement for ksh; there are plenty of ksh features that bash doesn’t have (and vice versa). Bash and ksh are only compatible in their POSIX core and some other central features (in particular arrays, [[ … ]], and local variables in functions declared by typeset).
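If running under bash is the goal, bash 4.2 and later offer the `lastpipe` shell option, which runs the last command of a pipeline in the current shell when job control is not active (the default for non-interactive scripts). A sketch of test2.sh with it enabled, assuming bash 4.2 or later; older bash versions reject the option name and keep the subshell behavior.

```sh
#!/bin/bash
# Sketch, assuming bash 4.2 or later: lastpipe makes bash run the last
# command of a pipeline in the current shell, as AT&T ksh does.
# It only takes effect when job control is off, as in a normal script.
shopt -s lastpipe

echo hello | read VAR1
echo "$VAR1"
```

This covers only this one incompatibility; the other ksh/bash differences mentioned above still apply.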

You could also try zsh, which, when invoked as ksh, behaves a bit more like ksh than bash does. You may nonetheless run into incompatibilities.

Gilles
