After some years of accelerating IPv6 deployment, we are now into a period of slower growth and it’s not clear where we are heading. It is therefore interesting to try to predict the future of IPv6 over the coming years. At Ericsson Research, we have been working on this topic since 2013, but just recently created a forecast model that seems to be quite accurate. However, it gives a disappointing message of a very low final level of IPv6 deployment at less than 30%!
The model is based on the commonly used data set, “percentage of users that access Google over IPv6”, provided by Google, from which we use the time series for native IPv6 traffic. We assume this data set can be used as an approximate indicator of global IPv6 deployment, even though some countries, like China, are not properly represented, due to national regulations. Figure 1 shows a recent snapshot of the Google data set from 2008 to 2019.
Figure 1. Percentage of users that access Google over native IPv6.
The data set is quite noisy, and we also want to avoid impact from the well-known intra-week periodicity and from variations at the start and end of the months. Therefore, we sample the data monthly, using an average of two weeks in the middle of each month.
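The sampling step can be sketched roughly as follows in Python. This is only an illustration: the function name, the input format (a dictionary of daily values) and the choice of the 8th to the 21st as "the two weeks in the middle of each month" are assumptions, not details of the actual implementation. The daily series is synthetic.

```python
from datetime import date, timedelta
from statistics import mean


def midmonth_samples(daily, first_day=8, last_day=21):
    """Average the daily values on days first_day..last_day of each month.

    `daily` maps datetime.date -> percentage. The 8th-21st window is an
    assumption; the text only says "two weeks in the middle of each month".
    """
    buckets = {}
    for day, value in daily.items():
        if first_day <= day.day <= last_day:
            buckets.setdefault((day.year, day.month), []).append(value)
    # One sample per (year, month), in chronological order.
    return {month: mean(values) for month, values in sorted(buckets.items())}


# Synthetic daily series with a weekend bump, purely for illustration.
start = date(2019, 4, 1)
daily = {}
for offset in range(61):  # 1 April 2019 through 31 May 2019
    day = start + timedelta(days=offset)
    weekend_bump = 2.0 if day.weekday() >= 5 else 0.0
    daily[day] = 24.0 + weekend_bump

samples = midmonth_samples(daily)
for month, value in samples.items():
    print(month, round(value, 3))
```

Because the window covers exactly 14 consecutive days, every weekday is counted twice, so the intra-week pattern contributes the same amount to every monthly sample.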
We then create a growth model for the sampled data based on logistic growth. This type of model is common when describing the evolution of new technology, where there is an accelerating phase in the beginning and a decelerating phase at the end, forming an S-shaped curve that approaches a maximum level over time. The results from the model are shown in figure 2, where the predicted values are shown in red, and the real data is shown in blue. The last data point from May 2019 is at 24.5%.
Figure 2. Forecast of the percentage of users that access Google over native IPv6. The red curve indicates the predicted values, while the blue curve shows monthly sampled data.
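For readers who want to see the shape of a logistic curve in numbers, here is a minimal Python sketch. The parameter names and values (a ceiling of 28%, a growth rate of 0.9 per year and a midpoint in mid-2016) are illustrative assumptions, not the fitted parameters of the model.

```python
import math


def logistic(t, ceiling, rate, midpoint):
    """Logistic curve: ceiling / (1 + exp(-rate * (t - midpoint)))."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


# Illustrative parameters only (not fitted values):
# ceiling in percent, rate per year, midpoint as a decimal year.
CEILING, RATE, MIDPOINT = 28.0, 0.9, 2016.5

for year in (2010, 2014, 2016.5, 2019, 2023):
    print(year, round(logistic(year, CEILING, RATE, MIDPOINT), 2))
```

The curve passes through half the ceiling at the midpoint, grows fastest there, and then flattens out towards the ceiling without ever exceeding it.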
We can see in the figure that the predicted S-curve fits the data set quite well. Currently, the model predicts a surprisingly low final level of IPv6 deployment at only around 28%. According to the forecast, the IPv6 share will grow slower over the coming years and be close to this estimated end level in late 2022.
The predicted curve can be interpreted as a single step of growth, going from zero to 28% over a 15-year period. This is a bit unexpected since there has been a lot of hope that IPv6 would replace IPv4 quite soon and then 100% would be the obvious asymptotic end level. If our model is correct, IPv6 will not replace IPv4 - or even be the dominant network protocol - in the foreseeable future!
Model history
Considering the strange forecast, how much can we trust this model? We don’t know, but the model has evolved in our lab and each time the fit of predictions to real data has become better. Let’s look at the model history.
The first model was created back in 2013, when IPv6 deployment had been growing at an accelerating rate for some years. At that time it was natural to create a model based on simple exponential growth. For a year, the predictions were quite accurate, but then the predicted and the real data started to deviate, so this model had to be abandoned. Also, from a theoretical point of view, exponential growth is not sustainable in the long run.
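A small Python sketch shows why a purely exponential model cannot hold in the long run. The fitting approach (a straight-line fit to the logarithm of the data), the function name and the synthetic numbers are assumptions for illustration, not the 2013 model itself.

```python
import math


def fit_exponential(ts, ys):
    """Least-squares fit of log(y) = log(y0) + growth * t; returns (y0, growth)."""
    n = len(ts)
    logs = [math.log(y) for y in ys]
    mean_t, mean_log = sum(ts) / n, sum(logs) / n
    growth = (sum((t - mean_t) * (l - mean_log) for t, l in zip(ts, logs))
              / sum((t - mean_t) ** 2 for t in ts))
    y0 = math.exp(mean_log - growth * mean_t)
    return y0, growth


# Synthetic data: a share that doubles every year, starting at 0.25%.
ts = [0, 1, 2, 3, 4]
ys = [0.25 * 2 ** t for t in ts]

y0, growth = fit_exponential(ts, ys)
doubling_time = math.log(2) / growth
years_to_100 = math.log(100.0 / y0) / growth  # when the extrapolation hits 100%

print(round(doubling_time, 3), round(years_to_100, 2))
```

With these made-up numbers the extrapolated share passes 100% after less than nine years, which no percentage can do, so any exponential fit must break down well before that point.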
The next model was based on logistic growth, which is a commonly used model for all types of growth where there is an upper limit. In our first attempt, we expected the limit to be 100% and used that value as a fixed parameter in our prediction model.
However, the predictions from this model didn’t make a perfect fit either—the real data tended to oscillate around the predicted curve. As a fix, we added a sine-wave oscillator around the logistic-growth curve, estimated from the sinusoidal difference between the growth model and the data. The idea was that if there is some feedback mechanism in the market that creates an oscillation, the model should be able to catch it. Both models are shown in figure 3, where the pure logistic growth model is shown in green and the model with an added oscillator is shown in yellow.
Figure 3. Different growth models for the percentage of users that access Google over native IPv6. Green and yellow curves show a logistic model with a final level of 100%, without and with an added oscillation. Red curve shows a logistic model with an estimated end level of around 28%. Blue curve shows monthly data.
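The oscillating variant can be sketched in Python as a logistic trend plus a sine term. The additive form with constant amplitude, and every parameter value below, are assumptions made for illustration; the text only states that a sine-wave oscillator was added around the logistic curve.

```python
import math


def logistic(t, ceiling, rate, midpoint):
    """Logistic trend that levels out at `ceiling`."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def logistic_with_oscillation(t, ceiling, rate, midpoint,
                              amplitude, period, phase):
    """Logistic trend plus an additive sine term (hypothetical form)."""
    wobble = amplitude * math.sin(2.0 * math.pi * t / period + phase)
    return logistic(t, ceiling, rate, midpoint) + wobble


# Time in months; ceiling fixed at 100% as in the abandoned model.
months = list(range(0, 121))
trend = [logistic(t, 100.0, 0.05, 130.0) for t in months]
wobbly = [logistic_with_oscillation(t, 100.0, 0.05, 130.0, 0.8, 36.0, 0.0)
          for t in months]

# The difference between the two models is the pure sine correction.
residuals = [w - s for w, s in zip(wobbly, trend)]
print(round(max(residuals), 3), round(min(residuals), 3))
```

The sine term adds three free parameters (amplitude, period and phase) that are not tied to any market mechanism, which is one reason such a correction can fit past data and still fail on future data.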
This oscillating model seemed to work for some years, but at the beginning of 2018, the forecasts again started to deviate too much from real data, so this model also had to be abandoned. We had assumed growth in one big step from zero to 100%, but apparently, this is not a correct assumption. Furthermore, the sine-wave correction was just an ad-hoc fix, not based on any specific market mechanism. From figure 3 it is obvious that a model based on a single logistic step up to 100%, with or without sine-wave corrections, is not compatible with the current growth trajectory of real IPv6 data, shown as a blue curve.
In 2018, we therefore decided to drop the idea of a fixed terminal level of 100% and instead tried to fit the data to a pure logistic growth curve with a terminal level not known in advance. The end level is thus estimated from the data set. As we can see from the red curve in figure 3, the new model gives quite a good fit to historical data without any need for corrections. In fact, the previous oscillations can be fully captured by this logistic model having the end level at around 28% instead of at 100%.
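One way to estimate the end level from the data is sketched below in plain Python. This is not necessarily the fitting procedure used for the model: the grid search over candidate ceilings, the straight-line fit on logit-transformed data, the function names and the synthetic series are all assumptions chosen to keep the example self-contained.

```python
import math


def logistic(t, ceiling, rate, midpoint):
    """Logistic curve that levels out at `ceiling`."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_logistic(ts, ys, step=0.1, max_ceiling=100.0):
    """Estimate (ceiling, rate, midpoint) with the ceiling left free.

    For a given ceiling, log(y / (ceiling - y)) is linear in t, so rate and
    midpoint follow from a straight-line fit. The candidate ceiling with the
    smallest squared error against the original data wins.
    """
    highest = max(ys)
    best = None
    for k in range(1, int(round(max_ceiling / step)) + 1):
        ceiling = k * step
        if ceiling <= highest:
            continue  # the logit needs ceiling > every sample
        logits = [math.log(y / (ceiling - y)) for y in ys]
        intercept, slope = fit_line(ts, logits)
        if slope <= 0:
            continue  # not a growth curve
        rate, midpoint = slope, -intercept / slope
        sse = sum((y - logistic(t, ceiling, rate, midpoint)) ** 2
                  for t, y in zip(ts, ys))
        if best is None or sse < best[0]:
            best = (sse, ceiling, rate, midpoint)
    return best[1], best[2], best[3]


# Synthetic, noise-free monthly series (month 0 = start of the series).
ts = list(range(129))
ys = [logistic(t, 28.0, 0.075, 96.0) for t in ts]

ceiling, rate, midpoint = fit_logistic(ts, ys)
print(round(ceiling, 1), round(rate, 3), round(midpoint, 1))
```

The key design point is that the ceiling is just another parameter to be estimated. With noisy real data, a general nonlinear least-squares routine would usually be a better choice than this linearized grid search, which requires strictly positive samples and gives extra weight to points near zero and near the ceiling.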
Model stability
Our new model seems to be quite stable over time. Experiments were performed to see how long a training period is needed to get good predictions. For all experiments, the time series is split into one training period and one test period, both with varying lengths. The training period always starts in September 2008 but ends in different months, while the test period is the remaining part of the data set.
It turns out that, for all experiments having the training period ending in any of the last 13 months (March 2018 or later), the predicted end levels are confined within a very small interval between 27.5% and 28.5%. Even shorter training periods give similar results, but with a larger spread—for training periods ending in any of the last 24 months (May 2017 and later), we get predicted end levels in the interval between 24% and 32%. Our conclusion is that during the last two years, the model is not very sensitive to the length of the training period, indicating a stable logistic growth, with a final level of around 28%.
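The experiment design can be illustrated with the following Python sketch, which refits a logistic curve on training periods of increasing length and reports the spread of the estimated end levels. Everything here is an assumption for illustration: the simple grid-search fit, the function names, and the synthetic series with a small artificial ripple standing in for the real Google data.

```python
import math


def logistic(t, ceiling, rate, midpoint):
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def fit_logistic(ts, ys, step=0.1, max_ceiling=100.0):
    """Grid-search the ceiling; fit rate and midpoint on the logit scale."""
    n = len(ts)
    mean_t = sum(ts) / n
    stt = sum((t - mean_t) ** 2 for t in ts)
    highest = max(ys)
    best = None
    for k in range(1, int(round(max_ceiling / step)) + 1):
        ceiling = k * step
        if ceiling <= highest:
            continue
        logits = [math.log(y / (ceiling - y)) for y in ys]
        mean_l = sum(logits) / n
        slope = sum((t - mean_t) * (l - mean_l)
                    for t, l in zip(ts, logits)) / stt
        if slope <= 0:
            continue
        midpoint = mean_t - mean_l / slope
        sse = sum((y - logistic(t, ceiling, slope, midpoint)) ** 2
                  for t, y in zip(ts, ys))
        if best is None or sse < best[0]:
            best = (sse, ceiling, slope, midpoint)
    return best[1], best[2], best[3]


def end_level_spread(ts, ys, shortest, longest, step=0.25):
    """Refit on the first `end` samples for each end in shortest..longest.

    Returns the lowest and highest estimated ceiling over all refits.
    """
    ceilings = []
    for end in range(shortest, longest + 1):
        ceiling, _, _ = fit_logistic(ts[:end], ys[:end], step=step)
        ceilings.append(ceiling)
    return min(ceilings), max(ceilings)


# Synthetic monthly series: a logistic trend with a 1% multiplicative ripple.
ts = list(range(129))
clean = [logistic(t, 28.0, 0.075, 96.0) for t in ts]
rippled = [y * (1.0 + 0.01 * math.sin(t)) for t, y in zip(ts, clean)]

# Training periods ending in each of the last 13 months of the series.
low, high = end_level_spread(ts, rippled, len(ts) - 12, len(ts))
print("estimated end levels between", low, "and", high)
```

A narrow interval means the estimated end level hardly changes as more months are added to the training period, which is the sense in which the model is called stable. The remaining samples after each training period can be used as the test period.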
The statistical metric R² is consistently very high for all experiments with training periods ending in the last 13 months: 0.99 for the training sets and around 0.85 for the test sets. R² can be interpreted as the share of the variance in the data set that the model explains, where a higher value is better. The high values in our experiments indicate a good fit of the model to the data.
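For reference, the usual definition of R² (one minus the ratio of the residual sum of squares to the total sum of squares) can be written in a few lines of Python. The numbers below are made up for illustration and are not taken from the experiments.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [20.0, 22.0, 24.0]
predicted = [20.5, 22.0, 23.5]
print(r_squared(actual, predicted))  # → 0.9375

# Predicting the mean everywhere explains none of the variance.
print(r_squared(actual, [22.0, 22.0, 22.0]))  # → 0.0
```

On a test set, R² can even become negative if the predictions are worse than simply using the mean of the test data, so a value around 0.85 on unseen months is a meaningful result.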
We also tried to see if the sampling method affected the predicted final level, but it seems to be quite independent of data points being sampled daily, monthly or quarterly.
The future
So, is this it? Will the great hope for the future of the Internet stall at a mere 28% of global deployment? Perhaps this is not the end of the story - there is always a possibility that the growth of IPv6 takes place in steps, like the evolution of many other technologies. One scenario is that, after a couple of years with an IPv6 deployment level of around 28%, a new period of accelerating growth might start, leveling out at a higher percentage. So far, there is no sign of any such next step, but even if there is soon a new boost of IPv6 rollout, we will probably have to wait a long time before IPv6 deployment gets close to 100%.
I also have looked into this space, https://blog.apnic.net/2017/06/06/five-years-ipv6-whither-next-five/
What I find unsatisfying, is that a single figure model appears to need its error bars and margins better defined. We already have Jio/Reliance (mobile) and Sky (broadband) over 95% and they represent large populations. There is also evidence that whilst slow, there is a reactive market element and in both India and the UK, large scale deployment by competitors is happening. So, whilst a global IPv6 deployment average might fit 30% on the projections, the spread includes significant scale economies by GDP and population which have much much larger deployment levels.
I would however agree with your conclusion. Waiting for 100% is not sensible. Deciding how to plot a future in the next 3/5/10 years, of a mixed-protocol world is unavoidable.
I believe that what is going on is a bifurcation into two models of TCO. In one, mixed-protocol costs are assessed as a better fit, and a CGN is deployed but with higher retained levels of IPv4, and the cost burden is acquisition of small amounts of globally routable IPv4 but larger than the other model. The second model is to deploy IPv6 aggressively, almost to single-stack, but accept the smaller burden of a small CGN to cope with what is now a legacy protocol cost. The IPv4 acquisition cost is far lower, and the actual network operation cost is lower, but the conversion cost is very probably higher if you have a large V4 legacy network. Therefore this suits either clean (new) deployment, or an aggressive re-capitalisation.
I think what I’m saying is that the rate of IPv6 deployment reflects the rate and nature of capital and operational investment in the platform.
I think this sentence says it: “there is always a possibility that the growth of IPv6 takes place in steps ...”, although I would say in small jumps.
I think it is clear from several studies, and my personal experience, that IPv6 is happening in each country at a different rate, and basically it starts with one of the major ISPs deploying it, and most of the time the other ones following. This means that there may be no increase in the deployment in several weeks or even months, but then you have a sudden change, because even a single ISP can mean hundreds of thousands or even millions of customers.
Especially on the residential side, when an ISP deploys IPv6, typically over 65% (up to 85%) of the traffic goes to CDNs/caches, which are already IPv6 enabled, so those leaps are of that order.
This is a very intriguing report. It uses a pure mathematical curve-fitting technique to cut through a lot of the confusion created by opinions and interpretations expressed by different interests. When I was first exposed to this Google data around 2014-2015, it did look very much ready to take off exponentially. Then, I got quite puzzled by the increasingly noisy curve with a tendency to lower its slope with time. Thanks for a model that filters out the noise and arrives at an asymptotic prediction. This provides the baseline for a very concise visual of the trend.
A. Instead of using this Google data whose source is somewhat specific, may I recommend that you apply your technique to the IPv6 / IPv4 comparison statistics by AMS-IX that you likely know about at
https://stats.ams-ix.net/sflow/ether_type.html
This data is more general because it is from their peering business serving users across different categories. Although, there would be some other factors influencing the exact meaning of the information, as well. Of course, the challenge will be whether your model has the resolution capability to see the trend of this nearly stationary data, although also chronologically recorded in fine detail.
B. The reason that we have been keeping an eye on these types of statistics is that we are working on an approach that may relieve many of the IPv4-related issues and concerns, thus affecting future IPv6 traffic. Appendix C of the following IETF Draft outlines a snapshot of the current status of our efforts. I believe that its implications are worth your review and comment.
https://tools.ietf.org/html/draft-chen-ati-adaptive-ipv4-address-space-05
A parameter of the Sigmoid function is its “maximum value”, which in the absence of a specific limiting factor can only be 100%. It is a fundamental mistake, I believe, to assume an arbitrary lower value, just because it fits the data. The ceiling you have derived simply doesn’t exist in reality.
This is amply demonstrated by the fact that the USA already exceeds 50% deployment:
https://stats.labs.apnic.net/ipv6/US
There is NOTHING stopping the world average from likewise exceeding 30%, then 50%, and then more. I suggest you try another curve, or else persist with the S-curve, with a maximum value of 100%, and watch how we track into an unpredictable future. That may yet be an interesting analysis.
The statement that IPv6 represents “28% of global deployment” is almost certainly an overestimate. This is just Google traffic, and in many countries Google is not as dominant as in English-speaking countries. The statistics from the Amsterdam Internet Exchange (https://www.ams-ix.net/ams) are probably more representative and they show that the IPv6 share of traffic on July 5, 2019 was 2.3%. Unfortunately they don’t supply historical data, but I have obtained some from the Internet Archive. This shows that IPv6 traffic rose from about 0.5% in late 2013 to 2.2% in late 2017, but since then has stalled. The pattern of the curve may be similar, but at about 1/10th of the Google curve.
David, I am led to believe that quite a lot of traffic in IPv6 flows in direct private peering. The APNIC measure is capability: if given an IPv6 webblot, can you fetch it. When we say at aggregate 20-25% can do this, that's how we account for uptake. It wouldn't surprise me if both things are true: many people are capable of IPv6, and use IPv6 when in direct contact with a source, but for public-routing packet exchange, the figures strongly suggest a historical overarching IPv4 traffic volume.
As a distinct class of behaviour, IPv6 is now at high levels of penetration in the mobile sector, such as Reliance, and there are signs of active competition in provision of IPv6 in the Indian telephony sector. This traffic isn't going to show up at an Exchange very much because the kind(s) of engagement people do in mobile don't flow on public paths. They tend to caches, and direct/embedded service models. Given the deployment of a native IPv6 mobile with CGN either as overlay or dualstack, I could believe the "deployment" figure because by market share, India as mobile is huge, and by market share, Reliance in India is huge. Almost all the significant uptick in IPv6 in China is coming from the mobile sector. Because the federated states model inside China has distinct ASNs, it doesn't show as a single line aggregate and again, peering in China is opaque and unlikely to show true levels of traffic because the modalities of usage are just different. Akamai's figures closely track APNIC and Google, for most ASNs. Akamai are fully independently measuring IPv6 without reference to Google's figures. APNIC's measurement does use Google advertising but it is run distinct from any numbers Google publishes directly and the tested clients are distributed worldwide in mobile, tablet and pc/desktop. They aren't a measurement of fetching of Google assets; they are adverts placed by Google/DoubleClick, but in general purpose websites, games, and other ad-revenue apps.
We need to focus on principles to avoid getting into details that distract the focus, leading to divergence in the discussion:
A. Sigmoid function: This is a mathematical equation used to model many phenomena and events. It has asymptotic maximum and minimum limits which are commonly normalized to be 0 to 100% or +/- 1. By itself, its curve does not have much meaning, until physical quantities of a subject matter are associated with it. If a product is to address a particular field, it can treat the maximum possible demand from that field as the 100% target. If such a field is part of an industrial sector with more than one field, the product cannot expect to fill the entire sector, thus the projected maximum demand for this product has to be less than 100%. Since IPv6 is only one of a few protocols that are currently carrying Internet traffic, it is not unreasonable to accept that IPv6 will handle less than the entire traffic, unless we have definitive knowledge that other protocols will fade out at a certain future time. With the Dual-Stack scheme expected to be in operation for a long time to come, it is clear that IPv6 cannot assume 100% of the Internet traffic at least for the same duration.
B. Deployment vs. Traffic: The APNIC statistics cited above are a chronological record of equipment readiness. They are fundamentally different from the AMS-IX traffic data. The former can be expected to reach 100% someday when all IoTs are IPv6 capable. The latter is normally shared among several protocols. So, IPv6 traffic cannot be 100%. In addition to Dual-Stack, IPv6's backward incompatibility will discourage adoption. Therefore, the percentage of Internet traffic carried by IPv6 will be capped at an even lower level. Until such handicaps are removed, we should not assume that the IPv6 target can be 100%. On the other hand, if some new event comes along that has a negative impact on IPv6, this cap may become lower still.
A possible exercise, as suggested in my initial comment, is to multiply the IPv6 % value in the AMS-IX statistics by a factor of 10. The resulting numbers would then be in the 20's, which is the same range as those in the Google data used by the author of this article. It would be interesting to see if the two curves have similar shape and projection.
C. Backbone Peering: One argument often presented for the sake of IPv6 is that it is deploying very fast and significantly in various sectors. This may be true but is hard to debate, unless the relevant worldwide data is fully disclosed. On the other hand, an article about an IPv6 peering dispute among backbone router businesses sheds an interesting light: https://www.theregister.co.uk/2018/08/28/ipv6_peering_squabbles/ In essence, this article reasoned that because the peering arrangements for IPv6 were not as mature as those for IPv4, a larger portion of IPv6 traffic compared to that of IPv4 was being diverted to the IXs, such as AMS-IX. In other words, if peering arrangements for both IPv4 and IPv6 were about the same, IPv6 traffic seen in the AMS-IX statistics would be even lower!
Abe (2019-07-15 15:51)
Following my speculation, our team went through the exercise of plotting the AMS-IX %IPv6 Traffic history along with the analysis result of the Google data in this article (by extracting graph values every 6 months from Figure 2. through eye-balling) and posted on the following webpage: https://www.avinta.com/phoenix-1/home/IPv6-AMS-IX&Google.pdf It is apparent that both statistics exhibit very similar trends suggested by the Sigmoid model. The difference is just the scales which would be dependent on the basic mix of traffic types in respective environments. Next, it would be interesting if we can get hold of the Global IP Traffic history data to see if there is any similarity in the overall Internet environment.
Thanks for going through the AMS-IX data. It is interesting to see the similarities between the AMS and Google data, however with different scales. Since the two data sets have somewhat different origin I think your graph strengthens the case that we actually have a situation where the IPv6 deployment share is reaching some kind of plateau. Also, your plots with Global IP traffic below seem to show this kind of sigmoid behavior with a projected end level far below 100 %, further supporting the idea.
In your AMS graph I notice a slight difference in time of the inflection points of the two curves. It seems the Google curve is lagging a year or so behind the AMS-IX curve, which is a bit strange if the two curves would capture the same trend. Also, from the figures with Global IP traffic it appears the Global IP curve lags another 12 years behind the Google curve. I see no obvious explanation for these differences. Perhaps the differences are related to the fact that the Google data is reporting the share of IPv6 users (IPv6 addresses), while the AMS and Global IP traces are reporting the share of IPv6 packets.
I have now updated my model with recent Google data and see very similar results to what I presented in my article above in May 2019. When using different partitions of the data set for fitting the model, with lengths varying from ending in May 2019 up to ending in February 2020, the projected upper level of the IPv6 share is still very close to the previous forecasts; all predictions are in a small interval of 28.2 % to 30.1 %, where 30.1 % is the latest prediction. Even when only using a smaller data set ending two years ago, including February 2018, the predicted end level is similar at 29.3 % and the predicted curve is almost identical to the fitted curve we see today. I'm surprised about the stability of this sigmoid behavior of IPv6 deployment share over time. It strongly indicates that there is an unknown limiting factor of IPv6 growth.
My guess is that there is a global equilibrium point in the market between the cost/revenues of deploying IPv6 vs extending the lifetime of IPv4 that defines this level of 30 % IPv6 share. This equilibrium point has obviously been stable for several years and the market has slowly been moving towards it. The global market actually seems to behave like any physical system out of equilibrium. As long as the cost balance does not change it seems we will be stuck at 30 %. Perhaps when the price of IPv4 addresses starts to increase heavily we may see this level moving upwards.
Hi, Christofer:
0. Thanks for sharing your insights. Allow me to present some of our thoughts and speculations.
1. “It is interesting to see the similarities between the AMS and Google data, however with different scales.”: Yes, this was not exactly what we expected when we started out with a naïve guess. However, since many natural events, especially product life cycles, do go through very similar four phases, this correlation is not totally a surprise. What is intriguing to us is the well-defined fit between the Sigmoid model and the Internet events. How could these trends become visible so smoothly within such a short time frame, compared to other historical product statistics as summarized in a figure of the following APNIC blog? https://blog.apnic.net/2017/06/06/five-years-ipv6-whither-next-five/ which cited an earlier work in a Harvard Business Review article: https://hbr.org/2013/11/the-pace-of-technology-adoption-is-speeding-up Of course, with possible future significant events, these Internet event curves will be updated periodically to end up with “ripples” in the long term history, just like other legacy products.
2. “Since the two data sets have somewhat different origin I think your graph strengthens the case that we actually have a situation where the IPv6 deployment share is reaching some kind of plateau. Also, …. further supporting the idea. “: Agreed.
3. “... the Google curve is lagging a year or so behind the AMS-IX curve, ,,, Global IP curve lags another 12 years behind the Google curve. …”: Correct. To start with, since our involvement with the Internet is superficial, we have no idea what could be the real causes of these offsets. Also, we are not clear about how exactly each of these statistics was measured. We can only take their face values of what the respective titles imply. My wild guess is that the Google usage is a special subset of the overall Internet activity. With Google’s commitment to IPv6, the trend of the %IPv6, although already higher, may extend its growth further in time than the general behavior observed by AMS-IX. The Global IP Traffic is actually a different animal. It is basically showing how overall worldwide Internet activity is growing. (I flagged that the AMS-IX and Google %IPv6 traces were included just for visual continuity, not for direct comparison.) With continued new contributions from developing countries, the notable lagging of this graph could be understood. This is why we are quite disappointed by not being able to locate any Global IPv6 Traffic data which would allow us to derive the Global %IPv6 statistics that is the proper format for comparing against those of AMS-IX and Google.
4. “… Even when only using a smaller data set ending two years ago, including February 2018, the predicted end level is similar at 29.3 % and the predicted curve is almost identical to the fitted curve we see today. “: I believe that the future trend of a Sigmoid curve is determined around its Inflection point (the layman’s term “Crossover Date” that I picked up from somewhere is not as elegant). Since the Inflection point in the Google curve is around early 2017, any historical data set that covers dates close to this point will begin to predict very similar future outcomes. From there, the more current data added to the training phase, the more accurate and less spread the predictions will be. Since your training data set already passed the Inflection point, the variations due to data set changes would be minimal. We verified this characteristic when we tried to perform a sanity check on the R2 values of our Sigmoid modeling over both AMS-IX and eye-balled Google graphs.
5. “I'm surprised about the stability of this sigmoid behavior of IPv6 deployment share over time. It strongly indicates that there is an unknown limiting factor of IPv6 growth.”: One thing that comes to my mind is the IPv4 CG-NAT (Carrier Grade Network Address Translation). Its deployment is general and wide-spread enough to create a notable effect on the traffic mix between IPv4 and IPv6.
6. “… Perhaps when the price of IPv4 addresses starts to increase heavily we may see this level moving upwards.“: Yes, this is the possibility that IPv6 promoters are hoping for. On the other hand, if there were an approach to rejuvenate the IPv4 address pool capability such that the need for IPv6 were reduced, the trend would move in the opposite direction. There is such a potential case as mentioned in my initial comment. Please have a quick look at Subsection 3. C. of the following IETF Draft: https://tools.ietf.org/html/draft-chen-ati-adaptive-ipv4-address-space-06 In a nutshell, we started out to study the IPv4 address pool depletion issue that ended up with a solution called EzIP (phonetic for IPv4). To our total surprise, the degenerated form of the EzIP deployment, called RAN (Regional Area Network), offers a lot more than we could have dreamed of! I believe this will have significant implications for the %IPv6 trend that you started this discussion thread with. We would appreciate your thoughts and comments.
Abe (2020-02-26 17:41 EST)
1. It turns out that the task that I proposed is quite challenging. Not only did the Global Internet traffic statistics stop after Year 2016, but Global IPv6 traffic data appears never to have existed. It seems that CISCO decided to date stamp a version of their annual Internet statistics white paper with later dates and then release it as a current version, as well as redirecting some URLs for older versions to this copy, making this set of URLs rather confusing.
2. Courtesy of Wikipedia.org and Web.Archive.org, we managed to gather enough Global IP Traffic data (from 1990 through 2016) to be worth plotting a graph for curve fitting by the Sigmoid model, as posted at the following URL. Note that we included the previous %IPv6 in AMS-IX and Google data in these graphs for the visual continuity in discussion:
https://www.avinta.com/phoenix-1/home/Global_IP_Traffic_Statistics+Forecast.pdf
3. Since the “Actual” data stopped after Year 2016, we decided to make three versions of graphs after that:
Figure 1.: Forecast included till 2022,
Figure 2.: No forecast after 2016, and
Figure 3.: Conservative forecast till 2021 (based on a data set from the year just before the last original white paper)
4. It would be prudent to summarize the key parameters of these three graphs that may not be apparent by visually reviewing them: Asymptote (A), Slope (S), Crossover Date (C), and Statistical Metric (R2).
Figure 1.: A = 7.85, S = 1365, C = 2027-02-23, R2 = 0.99991
Figure 2.: A = 5.02, S = 1697, C = 2027-02-06, R2 = 0.91709
Figure 3.: A = 3.43, S = 1357, C = 2023-10-02, R2 = 0.99984
5. Although the modeling may be a bit premature because the graphs have yet to reach the C region, it is intriguing to list a few interim observations:
5.A. Figures 1. & 2 have very close C values implying these two may be correlated, while Figure 3 has a much earlier C date of 2023-10.
5.B.: Sigmoid model fits much better in Figures 1 & 3 (higher R2 values) with the offsets in Figure 2. clearly visible. This could be the result of fewer data points in the latter.
5.C: It appears to be counterintuitive that the A values of Figures 1. & 3 are on either side of that for Figure 2. Is Figure 3. overly conservative? Or, could Figure 2. be the closest in predicting the trend?
6. This is all we could do with the available data at this juncture. This is kind of disappointing due to the unexpected missing and outdated data. Is there any colleague out there who may have more current data? We would appreciate such additional information to improve this Internet status presentation.
7. As well, we are open to sharing our methodology with whoever is interested in evaluating and keeping an eye on these vital Internet signs.
Abe (2020-02-24 22:54)