
Wednesday, December 13, 2017

merge - Finding and merging down intervalls in Perl - Stack Overflow

https://stackoverflow.com/questions/42928964/finding-and-merging-down-intervalls-in-perl





#!/usr/bin/env perl

use strict;
use warnings;
use Data::Dumper;

my %ranges;

#iterate line by line. 
while (<>) {
   chomp;
   #split the line on whitespace into name, start and end
   my ( $name, $start_range, $end_range ) = split;
   #set a variable to see if it's within an existing range. 
   my $in_range = 0;
   #iterate all the existing ones. 
   foreach my $range ( @{ $ranges{$name} } ) {

      #merge if the new interval overlaps this range at all; this also
      #covers a new interval that completely encloses the existing range.
      if (    $start_range <= $range->{end}
          and $end_range >= $range->{start} )
      {
         ## the intervals overlap, so widen the existing range to cover both:
         if ( $end_range > $range->{end} ) {
            $range->{end} = $end_range;
         }
         if ( $start_range < $range->{start} ) {
            $range->{start} = $start_range;
         }
         $in_range++;
      }

   }
   #didn't find any matches, so create a new range identity. 
   if ( not $in_range ) {
      push @{ $ranges{$name} }, { start => $start_range, end => $end_range };
   }
}

#debug: dump the merged data structure before printing the table.
print Dumper \%ranges;

#iterate by sample
foreach my $sample ( sort keys %ranges ) {
   #iterate by range (sort by lowest start)
   foreach
     my $range ( sort { $a->{start} <=> $b->{start} } @{ $ranges{$sample} } )
   {
      #join only the fields with tabs, so no stray tab precedes the newline
      print join( "\t", $sample, $range->{start}, $range->{end} ), "\n";
   }
}

With your data, this outputs:

SampleA  100   600
SampleA  700   800
SampleA  900   1100
SampleA  1200  1900
SampleB  700   900
SampleB  1000  1800
SampleB  1900  2600
SampleB  3000  3600

This probably isn't the most efficient algorithm though, because it checks every existing range for each input line. You probably don't need to: because the input data is ordered, you can just check the 'most recent' range for each sample instead.
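
The 'most recent' idea could look something like the sketch below. It is not part of the original answer: the helper name merge_sorted and the demo rows are made up for illustration, and it assumes each sample's lines arrive sorted by start position. If that assumption doesn't hold, the result will be wrong, so sort first or stick with the full scan above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# merge_sorted is a hypothetical helper (not from the original answer).
# ASSUMPTION: rows for each name arrive sorted by ascending start.
# Each row is an array ref: [ name, start, end ].
sub merge_sorted {
    my @rows = @_;
    my %ranges;
    foreach my $row (@rows) {
        my ( $name, $start, $end ) = @$row;

        # Only the most recently stored range for this name can overlap.
        my $last = $ranges{$name} ? $ranges{$name}[-1] : undef;

        if ( $last and $start <= $last->{end} ) {
            # Overlaps (or touches) the last range: extend its end if needed.
            $last->{end} = $end if $end > $last->{end};
        }
        else {
            # No overlap: start a new range.
            push @{ $ranges{$name} }, { start => $start, end => $end };
        }
    }
    return \%ranges;
}

# Illustrative rows only, not the data from the question.
my $merged = merge_sorted(
    [ 'SampleA', 100, 300 ],
    [ 'SampleA', 200, 600 ],
    [ 'SampleA', 700, 800 ],
    [ 'SampleB', 50,  90 ],
);

foreach my $sample ( sort keys %$merged ) {
    foreach my $range ( @{ $merged->{$sample} } ) {
        print join( "\t", $sample, $range->{start}, $range->{end} ), "\n";
    }
}
```

Because the rows are already in start order, each line is compared against a single range, so no inner loop and no final sort are needed. With the demo rows, the first two SampleA intervals collapse into 100 to 600, while 700 to 800 and the SampleB interval stay separate.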

