Silver does seem to have treated incumbency advantage as decreasing, but he still gave Heitkamp a large one because North Dakota is a small, demographically distinctive state, and he found that such states have stronger incumbency advantages. Nelson's advantage is only 5.9% because Florida is a large state, and Cruz's is similar at 5.4%. Incumbency advantage in the House is also smaller than in the Senate. You could argue that incumbency advantage is much smaller in 2018, and that might be true, but if there is insufficient evidence for it, then inserting your assumptions into the model would make it less reliable. Besides, this result isn't that surprising: Democratic incumbents have been doing quite well this cycle overall (relative to their states' partisanship), and incumbents have done well even in recent elections.
Let me restate and try to be more specific. The problem with the way incumbency is treated is not just that its effect is decreasing over time (though that is a problem, and is mostly a function of increasing partisanship); that is only one aspect of the issue. More broadly, the problem is that incumbency is standing in for omitted variables in the model. In this sense, the problems with how incumbency is treated in the 538 model are similar to the conceptual problems with the idea of "candidate quality": to a significant degree it is simply a residual variable. Whatever cannot be explained by the other factors in the model tends to get picked up and thrown into the bucket of "incumbency," which means that the variable called "incumbency" in the model is not actually measuring true incumbency.
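To put the residual-variable point in standard textbook terms, here is the classic omitted-variable-bias result for least squares (a generic illustration, not 538's actual specification; "inc" and "z" are symbols I'm introducing just for the example). Suppose the true model is

    y = \beta_0 + \beta_1\,\mathrm{inc} + \beta_2\,z + \varepsilon

but you regress y on inc alone, omitting z. Then the estimated coefficient converges to

    \hat{\beta}_1 \;\xrightarrow{p}\; \beta_1 + \beta_2\,\delta,
    \qquad \delta = \frac{\operatorname{Cov}(\mathrm{inc},\,z)}{\operatorname{Var}(\mathrm{inc})}

If z is something like "this race was not seriously contested," then z is positively correlated with incumbency, \delta > 0, and the "incumbency" coefficient absorbs part of z's effect.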
One of the major and important differences between Congressional/Senate elections and Presidential elections (and here is where domain knowledge in recognizing and understanding this difference comes into play) is that many Senate and House races are not seriously contested. In Presidential elections, it is true that some states are not contested (in the sense that the candidates run no deliberate campaign in those states), but the race is much higher profile, and voters in those states still get far more information and news about it than they do in an uncontested Senate or House race. This is one of the main reasons why Presidential results are generally easier to predict and more systematic: Presidential races are always contested seriously, to pretty much the maximum extent possible, whereas in Senate and House elections this varies a great deal.
And in elections that are not seriously contested, incumbents do very well. This is not because they are "incumbents," and it is not really an "incumbency effect" per se; it is simply that the election was not seriously contested. Nonetheless, if you run a regression on election results that includes, for example, the 57th time Kent Conrad, Pat Leahy, or Chuck Grassley ran for a basically uncontested re-election against a non-serious challenger, you will end up finding an artificially inflated effect of "incumbency." This is because your specification *should* include, essentially, something like a dummy variable for "this election was actually seriously contested" (the simulation sketch below illustrates the mechanism). In the 538 model, fundraising does partially serve that purpose, but it will not ideally capture the true thing that is making the difference here. Name ID, if there were data for it and it were included, could similarly serve as an imperfect proxy variable for this.
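Here is a minimal simulation of that mechanism in Python. Everything in it is an illustrative assumption I am making up for the example (the probabilities, the effect sizes, the use of statsmodels); none of it comes from the 538 model or from real election data.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 5000

    # Hypothetical data-generating process: incumbents draw a serious
    # challenger far less often, and seriously contested races are much closer.
    incumbent = rng.binomial(1, 0.5, n)               # 1 = incumbent on the ballot
    p_contested = np.where(incumbent == 1, 0.4, 0.9)  # assumed probabilities
    contested = rng.binomial(1, p_contested)          # 1 = seriously contested race

    true_incumbency = 3.0    # margin (points) genuinely due to incumbency
    contest_effect = -20.0   # contested races assumed ~20 points closer
    margin = (true_incumbency * incumbent
              + contest_effect * contested
              + rng.normal(0, 5, n))

    # Misspecified regression: the "contested" dummy is omitted.
    naive = sm.OLS(margin, sm.add_constant(incumbent)).fit()
    print(naive.params)  # "incumbency" coefficient comes out near 13, not 3

    # Better-specified regression: include the dummy.
    X = sm.add_constant(np.column_stack([incumbent, contested]))
    print(sm.OLS(margin, X).fit().params)  # incumbency falls back to ~3

The particular numbers are beside the point; the point is that whatever systematic margin the omitted "contested" variable carries gets attributed to incumbency, which is exactly how a residual variable behaves.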
OK, here I just have to say: all modeling necessarily makes assumptions. Models are inherently foundationalist constructs used to explain data. The process of specifying a model consists entirely of making choices about what variables to include and what variables to exclude (and also what functional forms and other methodologies to use). These choices are inherently arbitrary and subjective, and yet some choices are definitely better than others (and that is where domain knowledge, and indeed intuition, come in). So the idea that there is some way to make a model that relies more or less on assumptions is just wrong.
Indeed, a model is not "invalid" just because its findings are surprising (in truth, it is a category error to apply the concept of "validity" to models at all; testing "validity" is just not something that mathematical models do or can do). But if a model's findings are too surprising, that is often a good hint that there is some problem or limitation in your domain knowledge, that you are missing important omitted variables, or that you are using a flawed or inappropriate functional form somewhere. Surprises and things that simply don't make sense are hints that your model can be improved. Experience and time spent looking at these anomalies, and finding order and patterns within the swirl of anomalies, hones your intuition and helps you go back and make better models: models that do not omit important variables, that are specified differently, and that do not overfit. Model making is really, above all, a matter of art and judgement.