
NUTCH-2481 HostDatum deltas (previous step statistics) and Metadata expressions #278

Open
okedoki wants to merge 10 commits into apache:master from okedoki:NUTCH-2481


okedoki commented Jan 17, 2018

This changes the logic of updatehostdb slightly.

If hostdb.deltaExpression is specified, we don't reset the statistics in the mapper. Instead, the previous step's statistics are sent to the reducer first and reset afterwards.

In line 215 of the mapper,

    if (readingCrawlDb)

is replaced by

    if (readingCrawlDb && !isDeltaStatisticCalculated) {
      hostDatum.resetStatistics();
    }
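
For reviewers, here is a rough, self-contained sketch of the flow described above. It is not the actual Nutch code: `HostStatsSketch` is a hypothetical stand-in for HostDatum with just two counters, and `mapSide`, `reduceSide`, and the "new minus previous" delta are assumptions made for illustration only. The only part taken from this PR is the guard condition itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for HostDatum; only two counters are modelled.
class HostStatsSketch {
  long fetched;
  long unfetched;

  HostStatsSketch(long fetched, long unfetched) {
    this.fetched = fetched;
    this.unfetched = unfetched;
  }

  void resetStatistics() {
    fetched = 0;
    unfetched = 0;
  }
}

class DeltaGuardSketch {

  // Mapper side: the guard from this PR. When a delta expression is
  // configured, the previous step's counters are kept so that the
  // reducer can still see them.
  static HostStatsSketch mapSide(HostStatsSketch datum, boolean readingCrawlDb,
      boolean isDeltaStatisticCalculated) {
    if (readingCrawlDb && !isDeltaStatisticCalculated) {
      datum.resetStatistics();
    }
    return datum;
  }

  // Reducer side (assumed behaviour): compute new minus previous for each
  // counter, then reset the previous values and store the new ones.
  static Map<String, Long> reduceSide(HostStatsSketch datum, long newFetched,
      long newUnfetched) {
    Map<String, Long> delta = new HashMap<>();
    delta.put("fetched", newFetched - datum.fetched);
    delta.put("unfetched", newUnfetched - datum.unfetched);
    datum.resetStatistics();
    datum.fetched = newFetched;
    datum.unfetched = newUnfetched;
    return delta;
  }

  public static void main(String[] args) {
    HostStatsSketch previous = new HostStatsSketch(10, 5);
    // Delta expression configured: the mapper leaves the counters untouched.
    HostStatsSketch sent = mapSide(previous, true, true);
    Map<String, Long> delta = reduceSide(sent, 14, 3);
    System.out.println(delta.get("fetched") + " " + delta.get("unfetched"));
  }
}
```

With `isDeltaStatisticCalculated` set to false the sketch falls back to the old behaviour, where the mapper resets the counters immediately. That is the case to check for regressions.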
    

Please verify that this logic doesn't break the current functionality.

okedoki changed the title from "Nutch 2481" to "NUTCH-2481" on Jan 17, 2018
okedoki commented Feb 12, 2018

@YossiTamari
Refactored according to your suggestion. It is quite bad that we had a utility for this and it wasn't being used.

lewismc changed the title from "NUTCH-2481" to "NUTCH-2481 HostDatum deltas (previous step statistics) and Metadata expressions" on Jan 14, 2021