Merged with the original to update the gem, but keep our customizations#1
Open
grumpit wants to merge 45 commits intoEntropyZero:masterfrom
Open
Merged with the original to update the gem, but keep our customizations#1grumpit wants to merge 45 commits intoEntropyZero:masterfrom
grumpit wants to merge 45 commits intoEntropyZero:masterfrom
Conversation
[ci skip]
These vars were already created at line 13 but they weren't used before.
Use a_start and b_start variables in HashDiff.lcs
A hash key with anything not matching \w (word characters) like -|, etc would not match the regex detecting arrays in the patch path and would not patch correctly.
Fix bug with array under hash key with non-word characters.
fix an error when a hash has mixed types
- Add 2.2, 2.3 and 2.4 to Travis CI. - Modern version of `rake` fails with old RSpec so its version should be restricted. - Fixnum is deprecated in Ruby 2.4.0 so replace it with Integer.
New rubies support.
Mention 2 compelling reasons to start using HashDiff
HashDiff#similar? compares values based on "node count" using
HashDiff#count_nodes and HashDiff#diff (if the counts don't cancel out). If
`a` and `b` are arrays/hashes `count_nodes` recursively counts the elements,
otherwise returns 1. If `a` and `b` are not arrays/hashes, HashDiff#similar?
needlessly counts/diffs values that will ultimately end up passing
through HashDiff#compare_values.
A considerable performance improvement can be had by circumventing the
needless recursion and call HashDiff#compare_values if `a` and `b` are
not arrays/hashes.
This has been benchmarked with this snippet:
```ruby
$LOAD_PATH << File.join(File.dirname(__FILE__), 'lib')
require 'hashdiff'
require 'benchmark'
seq1 = %w(a b c e h j l m n p)
seq2 = %w(a b c d e f j k l m r s t)
n = 10000
Benchmark.bm do |x|
x.report('lcs') { n.times do ; HashDiff.lcs(seq1, seq2); end }
end
```
Before:
```sh
$ ruby benchmark.rb
user system total real
lcs 6.640000 0.010000 6.650000 ( 6.649822)
```
After:
```sh
$ ruby benchmark.rb
user system total real
lcs 1.030000 0.010000 1.040000 ( 1.037542)
```
Greatly improve performance of HashDiff#similar?
add codecov to show coverage
The patches suggested in the README only work when you have string keys on the hash.
This introduces an array_path option that can be used when generating a
diff. This represents the path to aspects of the diff as an array rather
than a string.
eg.
```
x = {'a' => 1}
y = {'a' => 2}
HashDiff.diff(x, y)
=> [["~", "a", 1, 2]]h
HashDiff.diff(x, y, :array_path => true)
=> [["~", ["a"], 1, 2]]
```
This allows there to be more flexibility with the types used as keys in
a hash. Allowing workarounds for issues such as:
liufengyun#25
eg
```
x = {'a'=>1}
y = {:a=>1}
HashDiff.diff(x, y)
=> [["-", "a", 1], ["+", "a", 1]]
HashDiff.diff(x, y, :array_path => true)
=> [["-", ["a"], 1], ["+", [:a], 1]]
```
And improved ability to patch hashes with keys:
eg
```
x = {a: {b: :c}}
y = {a: {b: :d}}
diff = HashDiff.diff(x, y)
=> [["~", "a.b", :c, :d]]
HashDiff.patch!(x, diff)
NoMethodError: undefined method `[]=' for nil:NilClass
diff = HashDiff.diff(x, y, array_path: true)
=> [["~", [:a, :b], :c, :d]]
HashDiff.patch!(x, diff)
=> {:a=>{:b=>:d}}
```
This updates the `patch!` and `unpatch!` methods to accept diffs with
either paths as strings or as arrays.
Introduce an array_path option
The LCS algorithm produces excellent diffs. Unfortunately it has a
complexity, running at n^2 for the number of items in an array - which
can lead to extremely slow computations.
```
> require 'hashdiff';require 'benchmark'
> x = (1..100).map { |i| { key: i, foo: :bar } }
> puts Benchmark.measure { HashDiff.diff(x, x) }
0.420000 0.000000 0.420000 ( 0.430212)
```
If the size of the array is 1000 then we see it get really painful
```
> x = (1..1000).map { |i| { key: i, foo: :bar } }
> puts Benchmark.measure { HashDiff.diff(x, x) }
42.680000 0.590000 43.270000 ( 43.530287)
```
This commit introduces an option to sacrifice the quality of the diff
for a faster computational result with the `use_lcs` option, which can
be set to false to disable use of the LCS algorithm.
With `use_lcs` as false the array comparison is much simpler with a
complexity of at worst 2n for an array.
```
> x = (1..100).map { |i| { key: i, foo: :bar } }
> puts Benchmark.measure { HashDiff.diff(x, x, use_lcs: false) }
0.010000 0.000000 0.010000 ( 0.004894)
> x = (1..1000).map { |i| { key: i, foo: :bar } }
> puts Benchmark.measure { HashDiff.diff(x, x, use_lcs: false) }
0.040000 0.000000 0.040000 ( 0.042547)
```
The linear approach to comparing the array works on the basis that if
arrays are the same length it treats the array as having no additions or
deletions, only changes.
```
> HashDiff.diff([0,1,2], [3,4,5], use_lcs: false)
=> [["~", "[0]", 0, 3], ["~", "[1]", 1, 4], ["~", "[2]", 2, 5]]
```
compared to:
```
> HashDiff.diff([0,1,2], [3,4,5])
=> [["-", "[2]", 2], ["-", "[1]", 1], ["-", "[0]", 0], ["+", "[0]", 3], ["+", "[1]", 4], ["+", "[2]", 5]]
```
Whereas if there are more items in one array than the other it checks
the items surrounding the index for a match to calculate additions and
removals.
```
> HashDiff.diff([0, 3, 5], [0, 1, 2, 3, 4, 5], use_lcs: false)
=> [["+", "[1]", 1], ["+", "[2]", 2], ["+", "[4]", 4]]
> HashDiff.diff([0, 3, 5], [0, 1, 2, 3, 4, 5], use_lcs: false) == HashDiff.diff([0, 3, 5], [0, 1, 2, 3, 4, 5])
=> true
```
For a combination of added and changed items the diff will appear
different to the lcs approach:
```
> HashDiff.diff([0, 1, 2], [0, 2, 2, 3], use_lcs: false)
=> [["~", "[1]", 1, 2], ["+", "[3]", 3]]
> HashDiff.diff([0, 1, 2], [0, 2, 2, 3])
=> [["-", "[1]", 1], ["+", "[1]", 2], ["+", "[3]", 3]]
```
However all diffs produce same results through `patch!` and `unpatch!`
methods.
Option to allow array comparisons in linear complexity
Documentation for the :use_lcs option
update minimum ruby to reflect actual support
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.