Wednesday, May 11, 2011

[TECH] nonlinear video editing with ruby

I recently had to do some video editing, and I ran into a spot of difficulty: I couldn't find a simple script-based system. Everything seems to be bundled with a complex, cumbersome graphical user interface. So I wrote a Ruby script. It's meant to be included via "require" from a custom script containing configuration data and EDLs (edit decision lists).

It works in two stages. In the first stage, source material is expanded to uncompressed PCM audio and 4:2:0 YUV video (typically generating huge video files). Title page text can also serve as source material (it is rendered using ImageMagick's "convert" tool). In the second stage, audio and video encoders are launched with their input files connected to FIFOs, and the Ruby script feeds these FIFOs with portions of the raw files according to an edit decision list. At the end of the second stage, the encoded audio and video files are muxed using mkvmerge.

The whole thing isn't very user-friendly, but it does what little I needed done and it's a good starting point for anyone who needs to add special-purpose functionality.
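
To make the second stage more concrete, here is a minimal sketch of feeding byte ranges of a raw file to a reader according to an EDL. It is not the code from vided.rb: the helper name feed_section, the toy 4-byte frames and the 10 fps rate are assumptions chosen for illustration, and an IO.pipe stands in for the FIFO that an encoder would read from.

```ruby
require "tempfile"

# Hypothetical helper (not part of vided.rb): copy the frames that fall
# between tc_enter and tc_leave (in seconds) from a raw file into a sink.
def feed_section(raw_path, sink, frame_size, fps, tc_enter, tc_leave)
  first = (tc_enter * fps).round
  count = ((tc_leave - tc_enter) * fps).round
  File.open(raw_path, "rb") do |src|
    # IO.copy_stream(src, dst, length, src_offset) copies a byte range.
    IO.copy_stream(src, sink, count * frame_size, first * frame_size)
  end
end

# Toy "raw file": 10 frames of 4 bytes each, frame n is the byte n repeated.
raw = Tempfile.new("toy-raw")
raw.binmode
10.times { |n| raw.write(n.chr * 4) }
raw.flush

# A pipe stands in for the FIFO an encoder would read from.
reader, writer = IO.pipe
reader.binmode
writer.binmode

# Each entry is [tc_enter, tc_leave]; sections may appear in any order.
edl = [[0.6, 0.8], [0.1, 0.3]]
edl.each do |tc_enter, tc_leave|
  feed_section(raw.path, writer, 4, 10.0, tc_enter, tc_leave)
end
writer.close

fed = reader.read
frames = fed.bytes.each_slice(4).map(&:first)
p frames # prints [6, 7, 1, 2]
```

With a real FIFO the writer blocks until the encoder has consumed the data, so the raw files are only ever read in small pieces.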

The recommended way to use it is to edit one section at a time. That is, to "test" a section of video X...Y, use an EDL like Y-3...Y, X...X+3, which plays the last 3 seconds before the cut-out point followed by the first 3 seconds after the cut-in point. This allows you to adjust X and Y to get the cut points just right for the sequence X...Y and only requires you to encode 6 seconds of video at a time (with --preset ultrafast, it takes on the order of 9 seconds to see the result on a setup where the raw files are stored on very slow external storage). After testing each section, you can put them together into one large EDL and encode them with more aggressive compression settings. The script prints a progress line after each second of video encoded.
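
As a sketch, the test EDL for a section can be generated by a small helper. The name preview_edl and the window parameter are hypothetical and not part of vided.rb; each entry has the same [source name, enter, leave] shape as the EDL entries in the example usage script.

```ruby
# Hypothetical helper (not part of vided.rb): build a short preview EDL for
# the section x...y of source `name`, following the Y-3...Y, X...X+3 recipe.
def preview_edl(name, x, y, window = 3)
  [
    [name, y - window, y], # last seconds before the cut-out point
    [name, x, x + window]  # first seconds after the cut-in point
  ]
end

edl = preview_edl("s", 1021, 1043)
p edl # prints [["s", 1040, 1043], ["s", 1021, 1024]]
```

The result can be passed as the last argument of export_chomp together with fast encoder settings.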

Hints:
  • Source video that has been split into multiple files by your video camera can usually be concatenated with "cat" and the result will work fine for importing.
  • 30fps is rarely 30fps. It is usually 30000/1001fps = 29.(970029)fps.
  • The frame size in bytes for 4:2:0 YUV with 8-bit samples is always 3 * width * height / 2.
  • The frame size for audio depends on the number of channels and the bit depth. 2 * 2 (two channels times two bytes per 16-bit sample) is a very good guess.
  • Video codec junkies will tell you that tweaking the video codec is essential for good quality. They are right. What they typically don't tell you is that the tweaking has already been done by the codec author. Use the --preset option.
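
The arithmetic behind the two frame-size hints can be checked with a few lines of Ruby. This is a sketch assuming 8-bit 4:2:0 video at 1920x1080 and 16-bit stereo audio at 48kHz (the values from the example configuration); the variable names are illustrative and not part of vided.rb.

```ruby
width, height = 1920, 1080
video_fps  = 30000.0 / 1001.0
audio_rate = 48000.0

# 4:2:0 YUV, 8 bits per sample: a full-resolution luma plane plus two
# chroma planes at a quarter of the resolution each.
video_frame_bytes = width * height + 2 * (width / 2) * (height / 2)

# 16-bit stereo PCM: 2 channels * 2 bytes per sample.
audio_frame_bytes = 2 * 2

# Byte offsets of timecode t (in seconds) within each raw file.
t = 1021
video_offset = (t * video_fps).round * video_frame_bytes
audio_offset = (t * audio_rate).round * audio_frame_bytes

puts video_frame_bytes # 3110400, same as 3 * 1920 * 1080 / 2
puts audio_frame_bytes # 4
puts video_offset      # 95175129600
puts audio_offset      # 196032000
```

At roughly 3MB per frame and about 30 frames per second, the raw video grows by about 93MB per second of footage, which is why the first stage produces such large files.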

Here follows the script (it's called vided.rb), and an example usage script (typically called go.rb).

#!/bin/false
# vided.rb
# copyright (c) 2011 by andrei borac

# please define the following (values are examples)
#  $vided_video_wid = 1920;
#  $vided_video_hei = 1080;
#  $vided_video_fsz = 3 * $vided_video_wid * $vided_video_hei / 2;
#  $vided_video_fps = 30000.0 / 1001.0;
#  $vided_audio_fsz = 2 * 2;
#  $vided_audio_fps = 48000.0;

raise("\$vided_video_wid undefined") if (!defined?($vided_video_wid));
raise("\$vided_video_hei undefined") if (!defined?($vided_video_hei));
raise("\$vided_video_fsz undefined") if (!defined?($vided_video_fsz));
raise("\$vided_video_fps undefined") if (!defined?($vided_video_fps));
raise("\$vided_audio_fsz undefined") if (!defined?($vided_audio_fsz));
raise("\$vided_audio_fps undefined") if (!defined?($vided_audio_fps));

def centconv(f)
  f = (10000.0 * f).round;
  f = 0 if (f < 0);
  f = 9999 if (f > 9999);
  
  n = f / 100;
  d = f % 100;
  
  n = n.to_s; n = "0" + n while (n.length < 2);
  d = d.to_s; d = "0" + d while (d.length < 2);
  
  return n.to_s + "." + d + "%";
end

def timeconv(s)
  s = s.round; # forget about fractional seconds
  s = 0 if (s < 0);
  
  m = ((s + 0.0) / 60).to_i; s -= 60 * m;
  h = ((m + 0.0) / 60).to_i; m -= 60 * h;
  d = ((h + 0.0) / 24).to_i; h -= 24 * d;
  
  s = s.to_s; s = "0" + s while (s.length < 2);
  m = m.to_s; m = "0" + m while (m.length < 2);
  h = h.to_s; h = "0" + h while (h.length < 2);
  d = d.to_s; d = "0" + d while (d.length < 3);
  
  return d.to_s + "d" + h.to_s + "h" + m.to_s + "m" + s.to_s + "s";
end

def run(command)
  $stderr.puts("^" + command + "$");
  
  if (!system("bash", "-c", "set -o errexit; set -o nounset; set -o pipefail; " + command))
    $stderr.puts("failed command ^" + command + "\$");
    exit(1);
  end
end

def import_generic(name, from, type)
  if (!File.exist?(type + "-" + name + ".raw.wok"))
    run("echo 0 5 0 > '.chomp." + name + ".edl'");
    run("mencoder -demuxer lavf -ignore-start '" + from + "' -of raw" + type + " -o '" + type + "-" + name + ".raw' -hr-edl-seek -edl '.chomp." + name + ".edl' -oac pcm -ovc raw -vf format=i420");
    run("touch '" + type + "-" + name + ".raw.wok'");
  end
end

def import_audio(name, from)
  import_generic(name, from, "audio");
end

def import_video(name, from)
  import_generic(name, from, "video");
end

def import_streams(name, from)
  import_audio(name, from);
  import_video(name, from);
end

def import_title_screen(name, from)
  if (!File.exist?("audio-" + name + ".raw.wok"))
    run("dd if=/dev/zero of='audio-" + name + ".raw' bs=1M count=1");
    run("touch 'audio-" + name + ".raw.wok'");
  end
  
  if (!File.exist?("video-" + name + ".raw.wok"))
    run("mencoder -of rawvideo -ovc raw -vf format=i420 -o 'video-" + name + ".420' mf://'" + from + "'");
    run("( for i in `seq 1 100`; do cat 'video-" + name + ".420'; done ) > 'video-" + name + ".raw'");
    run("touch 'video-" + name + ".raw.wok'");
  end
end

# w, h - size of title image to write text on
# offL - the left crop amount (image to be expanded this much leftward)
# offT - the top crop amount (image to be expanded this much upward)
# d    - density (dots per inch) passed to convert's -density option
# image is automatically expanded rightward and downward to fill the global wid x hei
def import_title_screen_generate(name, w, h, offL, offT, d, text)
  if (!File.exist?("video-title-" + name + ".raw.wok"))
    run("convert " +
        "-size " + w.to_s + "x" + h.to_s + " " +
        "-background black " +
        "-fill white " +
        "-font Palatino-Roman " +
        "-density " + d.to_s + " " +
        "-pointsize 16 " +
        "-interline-spacing 13 " +
        "-gravity center " +
        "label:'" + text + "' " +
        "-background white " +
        "-gravity southeast " +
        "-extent " + (w + offL).to_s + "x" + (h + offT).to_s + " " +
        "-gravity northwest " +
        "-extent " + $vided_video_wid.to_s + "x" + $vided_video_hei.to_s + " " +
        "title-'" + name + "'.tga");
    import_title_screen("title-" + name, "title-" + name + ".tga");
  end
end

def above_zero(x)
  if (x > 0)
    return x;
  else
    return 0;
  end
end

def calculate_frame_offset(fps, tc_enter)
  fps += 0.0; tc_enter += 0.0;
  return above_zero((tc_enter * fps).round);
end

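# number of frames to write so that the track (tc_track seconds written so
# far) catches up with the ideal timeline (tc_ideal) by the end of this section
def calculate_frame_amount(fps, tc_ideal, tc_track, tc_enter, tc_leave)
  fps += 0.0; tc_ideal += 0.0; tc_track += 0.0; tc_enter += 0.0; tc_leave += 0.0;
  return above_zero(((tc_ideal - tc_track + tc_leave - tc_enter) * fps).round);
end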

def reset_fifo(name)
  run("rm -f '" + name + "'; mkfifo '" + name + "'");
end

def export_chomp(dst_name, mencopts, local_video_wid, local_video_hei, lameopts, x264opts, mkvmopts, sections)
  if (!File.exist?(dst_name + ".mkv"))
    [ ".audio-", ".video-", ".final-" ].each { |prefix|
      [ ".raw", ".mp3", ".mp3.wok", ".264", ".264.wok", ".mkv" ].each { |suffix|
        run("rm -f '" + prefix + dst_name + suffix + "'");
      }
    }
    
    reset_fifo(".video-" + dst_name + ".raw");
    #reset_fifo(".audio-" + dst_name + ".mp3");
    #reset_fifo(".video-" + dst_name + ".264");
    
    audio_fifo_name = ".audio-" + dst_name + ".raw";
    video_fifo_name = ".video-" + dst_name + ".pre";
    
    reset_fifo(audio_fifo_name);
    reset_fifo(video_fifo_name);
    
    #run("( mkvmerge " + mkvmopts + " -o '.final-" + dst_name + ".mkv' '.audio-" + dst_name + ".mp3' --default-duration 0:" + $vided_video_fps.to_s + "fps '.video-" + dst_name + ".264' ; touch '.final-" + dst_name + ".mkv.wok' ) &> '.log-mkvm-" + dst_name + "' &");
    run("( lame -r " + lameopts + " '.audio-" + dst_name + ".raw' '.audio-" + dst_name + ".mp3' ; touch '.audio-" + dst_name + ".mp3.wok' ) &> '.log-lame-" + dst_name + "' &");
    run("( x264 " + x264opts + " -o '.video-" + dst_name + ".264' '.video-" + dst_name + ".raw' " + local_video_wid.to_s + "x" + local_video_hei.to_s + " ; touch '.video-" + dst_name + ".264.wok' ) &> '.log-x264-" + dst_name + "' &");
    run("( mencoder -demuxer rawvideo -rawvideo i420:w=" + $vided_video_wid.to_s + ":h=" + $vided_video_hei.to_s + " '.video-" + dst_name + ".pre' -of rawvideo -o '.video-" + dst_name + ".raw' -nosound -ovc raw -vf " + ((mencopts.length() > 0) ? (mencopts + ",") : ("")) + "format=i420 ) &> '.log-menc-" + dst_name + "' &");
    
    io_audio_fifo = IO.new(IO.sysopen(audio_fifo_name, "wb"), "wb");
    io_video_fifo = IO.new(IO.sysopen(video_fifo_name, "wb"), "wb");
    
    schedule = [];
    
    sections.each { |src_name, tc_enter, tc_leave|
      schedule << \
      [
       IO.new(IO.sysopen("audio-" + src_name + ".raw", "rb"), "rb"),
       IO.new(IO.sysopen("video-" + src_name + ".raw", "rb"), "rb"),
       tc_enter,
       tc_leave
      ]
    }
    
    tc_final = 0.0;
    
    sections.each { |src_name, tc_enter, tc_leave|
      raise("backward section") if (tc_leave <= tc_enter);
      tc_final += (tc_leave - tc_enter);
    }
    
    tc_ideal = 0;
    tc_audio = 0;
    tc_video = 0;
    tc_wrote = 0;
    
    operation_began = Time.now.to_f;
    
    schedule.each { |io_audio_file, io_video_file, tc_enter, tc_leave|
      tc_enter += 0.0; tc_leave += 0.0;
      
      $stderr.puts("tc_enter=" + tc_enter.to_s);
      $stderr.puts("tc_leave=" + tc_leave.to_s);
      
      audio_frame_offset = calculate_frame_offset($vided_audio_fps, tc_enter);
      audio_frame_amount = calculate_frame_amount($vided_audio_fps, tc_ideal, tc_audio, tc_enter, tc_leave);
      
      video_frame_offset = calculate_frame_offset($vided_video_fps, tc_enter);
      video_frame_amount = calculate_frame_amount($vided_video_fps, tc_ideal, tc_video, tc_enter, tc_leave);
      
      $stderr.puts("audio_frame_offset=" + audio_frame_offset.to_s);
      $stderr.puts("audio_frame_amount=" + audio_frame_amount.to_s);
      $stderr.puts("video_frame_offset=" + video_frame_offset.to_s);
      $stderr.puts("video_frame_amount=" + video_frame_amount.to_s);
      
      tc_ideal += (tc_leave - tc_enter);
      tc_audio += (audio_frame_amount + 0.0) / ($vided_audio_fps + 0.0);
      tc_video += (video_frame_amount + 0.0) / ($vided_video_fps + 0.0);
      
      $stderr.puts("tc_ideal=" + tc_ideal.to_s);
      $stderr.puts("tc_audio=" + tc_audio.to_s);
      $stderr.puts("tc_video=" + tc_video.to_s);
      
      while ((audio_frame_amount > 0) || (video_frame_amount > 0))
        audio_frame_atonce = [ audio_frame_amount, [ 1, $vided_audio_fps.round ].max ].min;
        IO.copy_stream(io_audio_file, io_audio_fifo, audio_frame_atonce * $vided_audio_fsz, audio_frame_offset * $vided_audio_fsz);
        audio_frame_offset += audio_frame_atonce;
        audio_frame_amount -= audio_frame_atonce;
        tc_wrote += (audio_frame_atonce + 0.0) / ($vided_audio_fps + 0.0);
        
        video_frame_atonce = [ video_frame_amount, [ 1, $vided_video_fps.round ].max ].min;
        IO.copy_stream(io_video_file, io_video_fifo, video_frame_atonce * $vided_video_fsz, video_frame_offset * $vided_video_fsz);
        video_frame_offset += video_frame_atonce;
        video_frame_amount -= video_frame_atonce;
        tc_wrote += (video_frame_atonce + 0.0) / ($vided_video_fps + 0.0);
        
        elapsed = Time.now.to_f - operation_began;
        completion = ((tc_wrote + 0.0) / (2 * (tc_final + 0.0)));
        $stderr.puts("ETA " + timeconv((elapsed / completion) - elapsed) + " " + centconv(completion));
      end
      
      io_audio_file.close();
      io_video_file.close();
    }
    
    io_audio_fifo.close();
    io_video_fifo.close();
    
    while (!((File.exist?(".audio-" + dst_name + ".mp3.wok")) && (File.exist?(".video-" + dst_name + ".264.wok"))))
      sleep(1);
    end
    
    run("mkvmerge " + mkvmopts + " -o '.final-" + dst_name + ".mkv' '.audio-" + dst_name + ".mp3' --default-duration 0:" + $vided_video_fps.to_s + "fps '.video-" + dst_name + ".264' &> '.log-mkvm-" + dst_name + "'");
    run("mv '.final-" + dst_name + ".mkv' '" + dst_name + ".mkv'");
  end
  
  # here lie outdated notes on how to use mencoder:
  # mencoder -demuxer rawvideo -rawvideo fps=30000/1001:w=1920:h=1080:yv12 video.raw -audio-demuxer rawaudio -rawaudio channels=2:rate=48000:samplesize=2 -audiofile audio.raw -o out.avi -oac mp3lame -ovc x264 -vf scale=512:288
end

#!/usr/bin/ruby
# go.rb

$vided_video_wid = 1920;
$vided_video_hei = 1080;
$vided_video_fsz = 3 * $vided_video_wid * $vided_video_hei / 2; # yuv
$vided_video_fps = 30000.0 / 1001.0; # framerate is not exactly 30fps
$vided_audio_fsz = 2 * 2;   # 16-bit stereo
$vided_audio_fps = 48000.0; # 48kHz

require("/path/to/vided.rb");

import_streams("s", "your-camera-output-video-file.avi");
import_title_screen_generate("main", 1440, 1080, 96, 0, 275,
                             [
                              "The Main Thing",
                              "... etc, etc, etc ..."
                             ].join("\n"));

def mkopts_menc(w, h)
  return "crop=1440:1080:96:0,scale=" + w.to_s + ":" + h.to_s;
end

def mkopts_lame(quality)
  return "-s 48 -m j --bitwidth 16 --signed --little-endian -q 0 --lowpass -1 --highpass -1 --vbr-new -V " + quality.to_s;
end

def mkopts_x264_fast(quality)
  return "--crf " + quality.to_s + " --preset ultrafast --threads 5 --b-pyramid strict";
end

def mkopts_x264_best(quality)
  return "--crf " + quality.to_s + " --preset veryslow --threads 1 --b-pyramid strict";
end

chomp =
   [ [ "title-main", 0, 1 ] ] * 5 +
   [
    [ "s", 1021, 1043 ],
    [ "s", 1165, 1187 ]
   ];
export_chomp("output-main", mkopts_menc(720, 540), 720, 540, mkopts_lame(0), mkopts_x264_fast(15), "", chomp);
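
A note on why the script keeps an ideal timeline (tc_ideal) next to the time actually written (tc_audio, tc_video): section lengths rarely correspond to a whole number of frames, so rounding each section independently lets the error pile up. The following is a minimal sketch of the compensation idea, not the script's exact code; the helper name frames_for_section and the six 1.01-second sections are assumptions made for illustration.

```ruby
# Hypothetical helper (not part of vided.rb): number of frames to emit for a
# section so that the output ends as close as possible to the ideal timeline.
#   ideal   - seconds the output should contain before this section
#   emitted - seconds actually written so far (a whole number of frames)
def frames_for_section(fps, ideal, emitted, tc_enter, tc_leave)
  target = ideal + (tc_leave - tc_enter)
  [((target - emitted) * fps).round, 0].max
end

fps = 30000.0 / 1001.0
sections = [[0.0, 1.01]] * 6 # six 1.01-second sections

ideal = 0.0
emitted = 0.0
counts = sections.map do |tc_enter, tc_leave|
  n = frames_for_section(fps, ideal, emitted, tc_enter, tc_leave)
  ideal += tc_leave - tc_enter
  emitted += n / fps
  n
end

# Rounding every section on its own, without looking at the running totals.
naive = sections.sum { |tc_enter, tc_leave| ((tc_leave - tc_enter) * fps).round }

p counts     # prints [30, 31, 30, 30, 30, 31]
p counts.sum # prints 182
p naive      # prints 180
```

Six sections of 1.01 seconds are 6.06 seconds, or about 181.6 frames; the compensated total of 182 frames stays within half a frame of that, while independent rounding is already almost two frames short.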
