Compare two non-matching lists and identify the row with maximum matching elements

Background

I have two lists (of lists), created by reading data from two address tables.

The first element in each row is the unique identifier of the list row and the remaining elements are used for address comparison.

Each list would look somewhat like this:

List 1 (cli add)

['3', 'V8T5G2', 'VICTORIA', 'BR', 'CANADA']
['543', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BR', 'CANADA']
['28', '70', 'RUSHTON', 'RD', 'M6C2X8', 'YK', 'ON', 'CANADA']

List 2 (struct add)

['7H0044', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BC', 'CANADA']
['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA']
['7H0001', '700', 'RUSHTON', 'ROAD', 'M6C2X7', 'YORK', 'ON', 'CANADA']
['7H0034', '217', 'BONNYMUIR', 'DRIVE', 'V7S1L4', 'WEST', 'VANCOUVER', 'BC', 'CANADA']

Task Goal
My task is to compare the addresses from the test list (list 1) with the other list (list 2), and flag all the matching and non-matching records.

I am looping through each row in list 1 and comparing it with each row of list 2, element-wise. If all the elements pulled from a list 1 row are found in any row of list 2, I mark that record as 'matching' and retain the row from list 2. I have been able to identify the completely matching records.
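The complete-match check described above could be sketched with set containment. This is only an illustration: the function name `split_by_full_match` and the sample rows (where the first row is deliberately given 'BC' so that it matches) are assumptions, not part of the original code.

```python
# Hypothetical sample rows; the first element of each row is its identifier.
cli_rows = [
    ['3', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA'],
    ['28', '70', 'RUSHTON', 'RD', 'M6C2X8', 'YK', 'ON', 'CANADA'],
]
struct_rows = [
    ['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA'],
    ['7H0001', '700', 'RUSHTON', 'ROAD', 'M6C2X7', 'YORK', 'ON', 'CANADA'],
]


def split_by_full_match(cli_rows, struct_rows):
    """Return (matched, unmatched) lists of list 1 rows (hypothetical helper)."""
    matched, unmatched = [], []
    for cli_row in cli_rows:
        wanted = set(cli_row[1:])  # ignore the id column
        # A row counts as matched if every element appears in some list 2 row.
        if any(wanted <= set(struct_row[1:]) for struct_row in struct_rows):
            matched.append(cli_row)
        else:
            unmatched.append(cli_row)
    return matched, unmatched


matched, unmatched = split_by_full_match(cli_rows, struct_rows)
print(matched)
print(unmatched)
```

Because sets ignore order and duplicates, this treats a row as matching whenever its distinct elements all occur in one list 2 row, which is the same position-independent test as the loop described above.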

Problem Point
The real challenge is the non-matching rows. For each non-matching record from list 1, I want to identify the most closely matching row from list 2. For example, if a row from list 1 has matching elements in three rows from list 2, I want to pick the list 2 row with the highest number of matching elements.

Expected Outcome
In the data shared above, the second row of list 1 (id 543) has a complete match, but the other two records don't yield a complete match.

I want to be able to create a list of non-matching records and capture the non-matching elements:

[[comparison record from list 1],[Most matching record from list 2],[non-matching elements from list 1]]

So for the data shared above, I need a new list (of lists) which looks something like:

[['3', 'V8T5G2', 'VICTORIA', 'BR', 'CANADA'],['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA'],['BR']]

[['28', '70', 'RUSHTON', 'RD', 'M6C2X8', 'YK', 'ON', 'CANADA'],['7H0001', '700', 'RUSHTON', 'ROAD', 'M6C2X7', 'YORK', 'ON', 'CANADA'],['70','RD','M6C2X8', 'YK']]

The code below gets me part of the way. I have not been able to find a way to filter for the highest matching rows.

Code Snippet
Here is how I found the matching and non-matching records (list 1 is referred to as cli_add_fnl and list 2 as struct_add_fnl). I have also figured out how to list the unmatched elements and the count of matching elements. I just need a way to pull only the rows with the max count for each list 1 row.

### Step 4 - Identifying the matching and non matching addresses ###
validated_addresses_all = []
invalid_addresses_all = []

for cli_add in cli_add_fnl:
    comparison_cli_add = cli_add.copy()
    # removing the id column from comparison
    comparison_cli_add.pop(0)

    for struct_add in struct_add_fnl:
        matching_elements = [address_element for address_element in comparison_cli_add if address_element in struct_add]
        # elements of the list 1 row missing from this list 2 row
        # (definition not shown in the original post; assumed here)
        nonmatching_elements = [address_element for address_element in comparison_cli_add if address_element not in struct_add]

        # capture the matching records
        if matching_elements == comparison_cli_add:
            validated_addresses_all.append(cli_add)
        else:
            invalid_addresses_all.append(cli_add)
            invalid_addresses_all.append(struct_add)
            invalid_addresses_all.append(len(set(comparison_cli_add) & set(struct_add)))
            invalid_addresses_all.append(nonmatching_elements)

# remove the duplicate entries
fnl_validated_addresses = []
for add in validated_addresses_all:
    if add not in fnl_validated_addresses:
        fnl_validated_addresses.append(add)

edited Nov 22 at 18:22

asked Nov 22 at 17:15

Sushant Vasishta

948

You should look into using sets: docs.python.org/3.7/library/…
– Meow
Nov 22 at 18:22

If any of the answers has helped you, please accept them as answers to help close the question. Thanks!
– BernardL
Nov 22 at 21:04


python python-3.x


1 Answer
accepted

This is one way to do it, ignoring position and the first item (the id): compare the values in each row of adds against each row of struct_adds, keeping a counter of the highest number of matches seen so far. Whenever a row beats the current count, the counter is updated and the index of that row is stored; if nothing matches at all, the example below does nothing. The differences between the row from adds and its highest match are then computed.

The results are then appended to a list.

adds = [['3', 'V8T5G2', 'VICTORIA', 'BR', 'CANADA'],
        ['543', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BR', 'CANADA'],
        ['28', '70', 'RUSHTON', 'RD', 'M6C2X8', 'YK', 'ON', 'CANADA']]

struct_adds = [['7H0044', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BC', 'CANADA'],
               ['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA'],
               ['7H0001', '700', 'RUSHTON', 'ROAD', 'M6C2X7', 'YORK', 'ON', 'CANADA'],
               ['7H0034', '217', 'BONNYMUIR', 'DRIVE', 'V7S1L4', 'WEST', 'VANCOUVER', 'BC', 'CANADA']]

results = []

for add in adds:
    match_count = 0
    match_index = 0
    for idx, struct_add in enumerate(struct_adds):
        # one True/False per item of add (id excluded on both sides)
        matches = [add_item in struct_add[1:] for add_item in add[1:]]
        if matches.count(True) > match_count:
            match_count = matches.count(True)
            match_index = idx
    if match_count == 0:
        pass  # no matches
    else:
        highest_match = struct_adds[match_index]
        # skip the id of the matched row too, consistent with the comparison above
        differences = [i for i in add[1:] if i not in highest_match[1:]]
        results.append([add, highest_match, differences])

Or, if you want to use set operations, which should be more efficient as suggested in the comments, you can replace the for block with:

for add in adds:
    match_count = 0
    match_index = 0
    for idx, struct_add in enumerate(struct_adds):
        matches = set(add[1:]) & set(struct_add[1:])
        if len(matches) > match_count:
            match_count = len(matches)
            match_index = idx
    if match_count == 0:
        pass  # no matches
    else:
        highest_match = struct_adds[match_index]
        differences = list(set(add[1:]) - set(highest_match[1:]))
        results.append([add, highest_match, differences])
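As a further sketch along the same lines, the list 2 sets can be built once up front and the best row picked with `max()`. The function name `best_matches` is a hypothetical helper, and keeping the differences in list 1 order (via a list comprehension rather than a set difference) is a design choice made here, not something the answer above prescribes.

```python
adds = [['3', 'V8T5G2', 'VICTORIA', 'BR', 'CANADA'],
        ['543', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BR', 'CANADA'],
        ['28', '70', 'RUSHTON', 'RD', 'M6C2X8', 'YK', 'ON', 'CANADA']]

struct_adds = [['7H0044', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BC', 'CANADA'],
               ['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA'],
               ['7H0001', '700', 'RUSHTON', 'ROAD', 'M6C2X7', 'YORK', 'ON', 'CANADA'],
               ['7H0034', '217', 'BONNYMUIR', 'DRIVE', 'V7S1L4', 'WEST', 'VANCOUVER', 'BC', 'CANADA']]


def best_matches(adds, struct_adds):
    """Pair each list 1 row with its best-overlapping list 2 row (hypothetical helper)."""
    # Build each list 2 set once instead of once per comparison.
    struct_sets = [set(row[1:]) for row in struct_adds]
    results = []
    if not struct_sets:
        return results
    for add in adds:
        add_set = set(add[1:])
        # max() returns the first index with the largest overlap.
        best_idx = max(range(len(struct_sets)),
                       key=lambda i: len(add_set & struct_sets[i]))
        if not add_set & struct_sets[best_idx]:
            continue  # no overlap with any row
        # List comprehension keeps the original order of the list 1 items.
        differences = [item for item in add[1:] if item not in struct_sets[best_idx]]
        results.append([add, struct_adds[best_idx], differences])
    return results


results = best_matches(adds, struct_adds)
for row in results:
    print(row[0][0], '->', row[1][0], row[2])
```

Precomputing the sets avoids rebuilding them for every pair of rows, which may matter when list 2 is large.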

Both yield the same results (with the set version, the order of the items inside each differences list is not guaranteed):

results
>>
[[['3', 'V8T5G2', 'VICTORIA', 'BR', 'CANADA'],
  ['7H0033', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA'],
  ['BR']],
 [['543', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BR', 'CANADA'],
  ['7H0044', '234', '654', 'BELMONT', 'AVENUE', 'V8S3T4', 'VICTORIA', 'BC', 'CANADA'],
  ['BR']],
 [['28', '70', 'RUSHTON', 'RD', 'M6C2X8', 'YK', 'ON', 'CANADA'],
  ['7H0001', '700', 'RUSHTON', 'ROAD', 'M6C2X7', 'YORK', 'ON', 'CANADA'],
  ['70', 'RD', 'M6C2X8', 'YK']]]

I should also add that, to avoid complicating things further, this example takes the first highest match when several rows tie. This is handled by the if clause, which requires the count of matches to be strictly greater than the current highest count.
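If ties matter, one possible variation is to collect every list 2 row that shares the top count instead of only the first. The sketch below is an assumption-laden illustration: `all_best_matches` and the sample rows are hypothetical and chosen so that two rows tie.

```python
def all_best_matches(add, struct_adds):
    """Return (count, rows) for every list 2 row tied at the top overlap (hypothetical helper)."""
    add_set = set(add[1:])  # ignore the id column
    counts = [len(add_set & set(row[1:])) for row in struct_adds]
    best = max(counts, default=0)
    if best == 0:
        return 0, []  # nothing overlaps at all
    return best, [row for row, count in zip(struct_adds, counts) if count == best]


# Hypothetical data: two list 2 rows overlap equally with the list 1 row.
add = ['9', 'VICTORIA', 'BC', 'CANADA']
struct_adds = [
    ['A1', 'V8T5G2', 'VICTORIA', 'BC', 'CANADA'],
    ['A2', 'V8S3T4', 'VICTORIA', 'BC', 'CANADA'],
    ['A3', 'M6C2X7', 'YORK', 'ON', 'CANADA'],
]

best_count, tied_rows = all_best_matches(add, struct_adds)
print(best_count, [row[0] for row in tied_rows])
```

The caller can then decide how to break the tie, for example by preferring the row with the fewest extra elements.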

edited Nov 22 at 18:34

answered Nov 22 at 18:22

BernardL
