RIPE 89
29 October 2024
MAT Working Group
4 p.m.
MASSIMO CANDELA: Hello everybody. So, we are going to start the MAT Working Group in a few minutes, so, just stay with us.
STEPHEN STROWES: Good afternoon, and welcome to the MAT Working Group, we're going to be starting in a couple of minutes. Please grab a seat.
MASSIMO CANDELA: Okay, so good afternoon everybody. Good afternoon and welcome to this new session of the MAT Working Group, which is Measurement, Analysis and Tools. I am one of the Chairs. This is our team: with me chairing this session there is Nina, who unfortunately could not be here this time, and Stephen, who is going to help me through this session.
And I am happy to say that we received really a lot of submissions for presentations, and we have a really packed agenda, more than usual. We usually say that, but this time we kind of exaggerated. So I would say it is good that we immediately start by looking at the agenda.
So, the first presenter will be Jim, who is presenting "Measuring and visualising DNS watersheds". Then we have Etienne Khan presenting "A first look at user-installed residential proxies from a network operator's perspective".
Then we have "IoT bricks over v6". And then we have Lion Steger with "Target acquired? Evaluating target generation algorithms for IPv6".
Then we will go to the next one: "What's in the data? Unboxing the APNIC user populations", which will answer some of the questions that appeared in past sessions about datasets that are commonly used for research. Then we have, last but not least, Emile presenting "RPKI flutter". So I would say that we can immediately go on with introducing the first presenter. We are proud to have here presenting Jim Cowie. Jim was a founder of Renesys in the early 2000s, and is currently a research fellow on Internet and society at Harvard University, where he is working on the preservation of Internet measurement history. He will present "Measuring and visualising DNS watersheds".
JIM COWIE: Hi everybody. I am Jim Cowie, from the Internet history initiative. This is going to be an extended version of a talk that debuted at the Central Asian Peering and Interconnection Forum last month, where being in the right watershed for content, being close to important services, is super important.
This is I think my first time on stage at RIPE since RIPE 70, so it's been a while.
So, just as a quick plug for the initiative. Every time you perform a traceroute, every time you do a DNS resolution or put a ping into Atlas, every time Route Views or RIPE RIS collect a BGP route, every time the Internet Archive scrapes a web page and carefully records the headers from the servers and the IP address where the content came from, we are all collectively creating the historical record of what happened here. And there are a lot of people in the world who will be able to use this record to do important things. Obviously for Internet policy and governance, obviously for us here who study the Internet, but also those who come after us: they will want to understand eventually how the Internet changed, improved, or didn't improve civilisation in this century after we're all gone. Who preserves that data? It's up to us as a community to do some of that.
So, enough of that plug.
So, more Cloud, less cloudy ideas. This is the Atlas probe set today. It is distributed; you see here a kind of log-scale colouring plot, so the darkest reds here have more than 250 probes per hex cell, these are same-size hex cells, and the faintest ones have maybe one. Can we use this for a global study or to detect a global trend? Yes we can, but with some caveats, right. Atlas is a wonderful network for measuring, originally designed I think for network people to understand what's going on with their own network.
The people who run Atlas probes, and bless you all who do, are a unique population, right. We here in this room run Atlas probes. It's not necessarily a perfect model out of the box for how society uses the Internet, and so, we can do global trends, we can do global studies, but we do need to take some care to caution ourselves against thinking that the Atlas network is a perfect representation of reality. It needs a lot more coverage in the global south, for example. But, you know, a work in progress.
There are two specific queries that I discovered. Since I no longer have a company that does Internet measurement, I rely on the kindness of strangers. These are two queries that I found; they have been running since 2017. It was a hackathon, and somebody said we need to figure out which DNS servers the probes are actually reaching, in terms of the final resolver on the way to the authoritative. That's what these queries do. You go to Google or Akamai and they tell you: yeah, your IP address is such-and-such.
And here there is a third measurement that does v6, but today I'll focus on v4. You get this measurement automatically on all probes, once an hour, continuously, for the last seven years, right. So that's a great historical dataset.
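As a rough sketch of how such a resolver-identification query can be reproduced, the snippet below uses dnspython to ask two commonly used beacons which address the recursive resolver presented towards the authoritative side. The beacon names (o-o.myaddr.l.google.com and whoami.akamai.net) are assumptions here and are not necessarily the exact targets used by these Atlas built-in measurements.

    # pip install dnspython
    import dns.resolver

    # Ask two well-known "which resolver reached you?" beacons. Each answer
    # reports the address the final resolver used towards the authoritative.
    google = dns.resolver.resolve("o-o.myaddr.l.google.com", "TXT")
    print("Google sees resolver:", [t.decode() for t in google[0].strings])

    akamai = dns.resolver.resolve("whoami.akamai.net", "A")
    print("Akamai sees resolver:", akamai[0].address)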
So, the question here was: what can we see in this dataset? We see the final resolver that is hitting the authoritative. And so what we are going to have to do is work backwards a little bit to figure out who that was, to figure out what the ultimate Atlas probe was thinking when it made that query, or not thinking, whatever their configuration was.
And this will be somewhat interesting.
So, one basic question is: when you look at it, what would you expect about the distribution of the number of resolvers that do work on your behalf at the authoritative, as a probe? The most common answer to this, which is kind of intuitive, is two: a lot of people have a resolver configured, and they have a backup resolver configured, and those resolvers go straight to the authoritative and the work is done. But if you use any of the big public Anycast DNS instances, you are getting mapped to a region. Over time you may get mapped to different regions. Each of those regions is likely to contain some amount of address space for the resources that make these queries. Maybe it's a /26, or so. And over time, they are going to pull resources out of that pool to make the queries on your behalf. This very rapidly adds up. So the median probe is seeing on the order of 100 over the course of seven years. Which seems large, but over seven years it's not; it's relatively stable.
The high end of this curve should give you pause. You'll note ‑‑ this is a semi‑log cumulative distribution. You'll see there is a little nudge up at the top of this curve. These are people who go to almost 10,000 individual resolvers, making queries on their behalf over the course of seven years.
And so, we don't know what this is. I spent a couple of days at DNS-OARC just before this conference thinking about it with people. The rough consensus is that these might be probes that somehow, I don't know what CPE does this, are behind a Tor gateway, right: they are going through a tunnel, and the exit node of that tunnel, if it's Tor, rotates every ten minutes, and TCP and DNS queries are getting proxied right through that tunnel. So yes, every hour you get a different, what looks like random, resolver to help you out. So, that's a little bit bizarre, so we'll study the rest of the curve here.
So, you know, let's try to draw maybe one broad trend from this. If we look at this plot we have got three lines. The blue line is the percentage of resolution on behalf of probes that we think are using the big open Anycast providers. And by big, I have arbitrarily made a cut-off: this is Quad1, Quad8 and Quad9. If you add them together, and I have graphed five years here, their percentage of the probes' demand for resolution services has grown steadily from about 30% to almost 50%, so that's pretty clear.
At the same time, who are they drawing that from? Again, the temptation is to call it market share. The biggest competition for that is the same AS, right. If you are a probe, and you have a resolver in your same AS, that's the one you would traditionally be going to if you weren't using an open public resolver.
So, what's interesting is that the same-AS percentage has declined, but only very slightly, by about 5%. And then you get other resolver services; this is every other IP address I couldn't classify as either the same AS as the probe or one of the big three.
And that is declining. That's right down in the weeds now, below 10%. My tentative interpretation of this is that there may be some consolidation in the field of public open DNS resolvers. So more market share going to the big three, or big four, wherever you want to cut that tail, and fewer going to random things like being behind a Tor node.
It doesn't really help to look at this regionally. Because regionally, the distribution of Atlas probes is such that either the behaviour is very similar to worldwide, which would be the European region, or there are too few probes to reliably feel good about painting the trend. I will show that for Asia, if you add up all the probes in all regions of Asia, you notice that the crossover point, where big public Anycast becomes dominant, I shouldn't say dominant, exceeds the percentage of same-AS resolvers, takes place just a couple of years earlier. But the trend is the same.
So, just a little bit, because this is, you know, this is a methodology session. So, we should talk about how we actually decided that these final resolvers are on behalf of anybody in particular, including these big three.
So, some Anycast services dedicate a lot of IPv4 space to this problem. They do it all out of a single AS. They publish their geolocations, which makes it really easy. Others are putting these things together out of spare parts in some sense: they may have primary space out of a big AS, but then they are also working with smaller providers around the world who will give them local space to host Anycast instances and do service from there. And they don't publish official lists, so it becomes a challenge to figure out how it gets put back together.
And it's a little bit counter-intuitive, because the final resolver on the dotted line, the last one that hits the authoritative, is not necessarily even from the same AS that you asked your first question of as a probe. If they are a forwarder, any number of things can happen. They can use round robin to multiple big Anycast services. All kinds of things happen.
If you want to replicate some of this and study it, I have included a couple of URLs. If you have never looked at this, Google makes it pretty easy: you can get this as a DNS query or you can look at a web page, and they'll tell you exactly which address ranges they use as the backends for the public DNS, along with airport codes. Which is awfully friendly.
CloudFlare, I have not been able to cleanly get geolocation on, but I accept offers of help. But broadly, they also make it pretty easy, because everything they are using to target the authoritative is coming out of AS13335, so anything we see coming from the CloudFlare AS at the authoritative we will ascribe to them for these plots.
And then Quad9, I was able to do a little bit of reconstruction. They are polite in that you can do a query against 9.9.9.9, and they'll tell you where your probe is getting service, at least in this one moment. And you can find the airport codes in there. You can then go back and turn these back into Unicast addresses. You go back to BGP, like I did, turn them back into the routed prefixes, turn them into ASes, and at some point you have a rough and ready model of what that backend is going to look like. I think we have identified here seven different ASes that, from the Atlas probe set, are usefully hosting backend resources for Quad9.
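A minimal sketch of that final mapping step, matching an observed resolver address against routed prefixes to find its origin AS, might look like the following; the prefixes and AS numbers below are placeholders, not real Quad9 backends, and a production version would use a proper routing table snapshot.

    import ipaddress

    # Hypothetical routed-prefix -> origin-AS table reconstructed from BGP data.
    prefix_to_asn = {
        ipaddress.ip_network("192.0.2.0/24"): 64500,
        ipaddress.ip_network("198.51.100.0/22"): 64501,
    }

    def origin_asn(addr):
        """Longest-prefix match of an observed resolver address."""
        ip = ipaddress.ip_address(addr)
        best = None
        for net, asn in prefix_to_asn.items():
            if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, asn)
        return best[1] if best else None

    print(origin_asn("198.51.100.7"))   # -> 64501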
But, this is an admittedly incomplete map. I would not use these sorts of techniques to do market size surveys or compare providers or anything like that. We grope towards complete understanding of imperfectly revealed sets of resources.
Again, because this is about tools and methodologies, I wanted to recommend to everybody who hasn't tried it yet for visualisation to go look at H3, which was originally invented by Uber as a hierarchical hexagonal grid system. It can make nice maps and it's much better than using grid squares or lat/long cells or any of those things.
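For reference, a minimal example of the kind of hex-cell binning described here, using the h3 Python bindings (v4 API shown; older releases name the function geo_to_h3), with made-up probe coordinates:

    # pip install h3
    from collections import Counter
    import h3

    probes = [(52.37, 4.89), (52.35, 4.91), (40.71, -74.00)]   # illustrative lat/lng

    resolution = 3   # coarse, country-scale hexagons; tune to taste
    cells = Counter(h3.latlng_to_cell(lat, lng, resolution) for lat, lng in probes)
    for cell, count in cells.items():
        print(cell, count)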
Here are a couple of quick maps just before we're out of time. In this one, each hex cell is coloured according to the most popular backend location that serves that cell. So if this worked right, and I think it did, you can basically see for example that Google drags everybody in northern South America, all through the Caribbean and Florida, up to Charleston and so on. Quad9 has the same sort of map; fewer people in the Atlas probe set rely on Quad9 for as many resolutions, so there are a few fewer dots, but they have a lot more backend centres from which to serve. So, you can see a very long list and nice little watersheds forming. So people are getting mapped to the proximate resources. This is Quad9 in Europe. You can see the big green Frankfurt, going all the way out through Russia and into Central Asia, and down into Tehran. But by and large everybody is getting mapped to the place they need to be.
And here is the equivalent for Google. Same kind of result. These are surprising; it was surprising to me when I first saw it, but Milan ends up serving all the way down Italy, across the Mediterranean to Turkey and across the Caucasus. You can see that Google serves all the Russian-speaking world out of Finland, even if you are in Vladivostok. That's how it is. It's not optimal for everybody, but still a pretty good job.
So I won't linger on conclusions, but I love the Atlas historical data, and it needs to be preserved and celebrated, and there are secrets in there. You know, there are those among us who have run experiments that do exactly the same thing, and they are in Atlas and we don't know about each other. It's hard to curate this set and I think we need to do a good job of that as a community.
The trends are clear. I think it's cool how the big Anycasters do establish water sheds and draw people in and get them reasonably low latencies to their local instances.
That's it. I might have time for one question. But thank you.
(Applause)
STEPHEN STROWES: We have approximately 30 seconds.
AUDIENCE SPEAKER: Hi, Silvan from Openfactory. One question: if you update your statistics in the future, it would be interesting to correlate them with local censorship, whether that triggers a move to open resolvers. It definitely has in Switzerland.
JIM COWIE: That's what time series data is great for. I agree.
AUDIENCE SPEAKER: Just to make sure I understand correctly. So these are the resolvers that are configured by the owner of the probe, so what you see is that people who are techies, like interested in this stuff, tend to use open resolvers more.
JIM COWIE: It's hard to say. That is broadly what it says. I think Atlas probe users are more likely to choose something that didn't just show up in their DHCP at their apartment. They want to use a specific one. Maybe they want to use Quad9 for protection, or what have you. But also, at the same time, this hides local forwarders. So, if you are talking to somebody and they, transparently or otherwise, are forwarding your queries to the big three and that wasn't your choice, we're still going to record you as belonging to the big three.
AUDIENCE SPEAKER: Also, you could use one of the big three for your RIPE probe and using other resolvers for other devices you have, right. It doesn't have to be the same choice.
JIM COWIE: That's true. Thank you.
STEPHEN STROWES: Thank you Jim.
Our next speaker is Etienne Khan, who is going to be talking about user-installed residential proxies. He is a Ph.D. student in the DACS group at the University of Twente, that's the Design and Analysis of Communication Systems group, and residential proxies are his research focus.
ETIENNE KHAN: Hi everyone. I am a Ph.D. student and I'd like to compare my Ph.D. a bit to a television show with different episodes. Now it's time for the next episode. The first one for me was at RIPE 86, where I presented my first work about commercial VPN providers and how they unblock geo-blocked content.
Many will probably want to press the SKIP INTRO button, but I need to have one slide on what the motivation for this talk is. Back then I looked at the way commercial VPN providers unblock geo-blocked content, and the conclusion was that they have three methods: using obscure hosting, creating completely fake content providers, or the third thing, which is residential proxies.
So, with that said, the next episode in my Ph.D. TV show will be a first look at user-installed residential proxies from a network operator's perspective.
So, we have a few key terms here and just to be on the same page, we'll go through them.
Residential proxies. I assume most know what a proxy is, but just to be sure: if you are the user on the left and you connect to a site, that site can essentially see what AS your IP belongs to. If you use a residential proxy, there is just something in between, so that your own AS, which could be a data centre for example, gets swapped or masked behind the residential one. And that can help you circumvent a few things.
Now, why does it have user installed in the title?
The reason is, when I did my first research, I was wondering how these VPN providers get access to residential connections. Because I looked at the software they supply, and for the big commercial VPN providers, the software is fine: it loads something like a WireGuard configuration or OpenVPN in the background and there is nothing weird going on. But if you have exit IPs from known big consumer ISPs, they still must have some way of obtaining them. You might come across something like this, which are websites that claim you can earn passive income effortlessly if you give them your Internet connection. They all look different, but all have the same kind of focus on nice golden coins that come flowing into your pocket without you doing anything. Of course, you could also buy proxies there.
So, how did our experiment look? In essence, you have this very big cloud of residential proxies that are part of the system. Our testbed is the square on the lower right. We then tunnel requests from customers that buy these proxies to target sites, and beyond the streaming sites we saw in the beginning, we also want to know what else they are doing. On the left side you can see the residential proxy providers and something I call bandwidth brokers; it is not really known what their relationship is, because the companies who ask you to install these proxies, let's say they are either not registered as a company here in mainland Europe, or registered on some Caribbean island, and you will see later on in the talk why that is the case.
I have collected a lot of data, and to make my analysis easier, something that helped me a lot was Tranalyzer, to create flows; most of my analysis was on flows.
Before we continue, I assume most of the people here have a fast Internet connection. Can you guys raise your hands if you have a fast connection at home. Who would want to earn some passive income with that connection? Okay, that's great. No hands are going up. Because yeah, we'll be swimming in money, right, at the university we have gigabits so we can really pump that connection.
Now, let's have the first look then.
This is a site where you can buy residential proxies, and let's see what we can use these proxies for. So, there is this menu that opened up, and you can use a proxy for eBay, for Craigslist, for Instagram, Facebook, price monitoring, web scraping and some other kinds of use cases. But is that really all? After having collected the data, we also saw a different side of those proxies.
Many of you have probably heard of a new type of scam called pig butchering, where people on dating sites at some point get asked to transfer money to, for example, a crypto exchange or a fake stockbroker, because the person they have matched with on the dating site said they are very good at trading and can multiply your money. It turns out this is a scam, and somehow these people have to be able to create accounts in certain regions, right. When we looked at our dataset, this is something we found for Tinder and its subdomains within our residential proxy set, for example, and we can see this pattern, which gives the impression that whoever is using our proxy for Tinder is very active between eight or nine o'clock in the morning and activity ceases at around ten o'clock, which would make sense if you were targeting people in, for example, the Netherlands or western Europe in general.
Something else we have seen is phishing. This is a phishing toolkit that allows you to phish two-factor authentication sites. You are the victim on the left. You get a phishing link, you click it, and at the same time the phishing engine will load that site for you, and the man in the middle happens basically in between as a plain-text session. But to hide the traces, the phisher might use the residential proxy cloud to load the target site. Because if someone from, let's say, Amsterdam logs into their PayPal, and that request suddenly comes from a data centre in France, that would be weird, so they buy residential proxies for that. Why do we think it's phishing? We make use of two things for this. The first is TLS fingerprinting, which we use to get unique fingerprints for different TLS engines, and we look at the server name indication in the TLS Client Hello.
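As an illustration of those two signals, the sketch below computes a JA3-style fingerprint (an MD5 over ordered ClientHello fields) and checks the SNI against a small watch list; the field values and domain names are illustrative and not taken from the study's dataset.

    import hashlib

    def ja3_style_fingerprint(version, ciphers, extensions, curves, ec_formats):
        # JA3-style: MD5 over the comma-joined, dash-separated ClientHello fields.
        parts = [
            str(version),
            "-".join(map(str, ciphers)),
            "-".join(map(str, extensions)),
            "-".join(map(str, curves)),
            "-".join(map(str, ec_formats)),
        ]
        return hashlib.md5(",".join(parts).encode()).hexdigest()

    WATCH_LIST = {"account.booking.com", "www.paypal.com"}   # login pages, not price pages

    def suspicious_sni(sni):
        return sni in WATCH_LIST

    print(ja3_style_fingerprint(771, [4865, 4866], [0, 11, 10], [29, 23], [0]))
    print(suspicious_sni("account.booking.com"))   # True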
And if you want to do web scraping through a residential proxy, I guess that's fine if you want to do that. But if you open these kinds of domains, I don't think you can scrape any kind of prices on the account.booking.com login page. And the same is true for other landing pages like paypal.com and others.
The last thing we looked at is sophisticated bots, and we have a partner for this research. Essentially there is a technique called denial of inventory. You have all these sites that compare prices, but the question is how can they do that? Well, they get the prices through a proxy, for example. But sometimes some flights might be booked out completely and the next moment they are empty again. That is called denial of inventory. And with our partner, we have seen that some of these proxies actually try to block inventory so that other people can't reserve or register a flight, and then, depending on what is handy for them, release it again.
Now, before I finish up, let's get back to all that money we earned with the proxies.
Because, on the left side, those are the providers we checked. And in the end, after running this whole experiment for eight months and collecting 300 gigabytes of data, we earned $54 which is not enough to cover the costs for the servers we set up.
There is one last part I want to look at, and it's not covered yet, which is the network operator's perspective. Sadly, my testbed consisted of one network operator, which is our university, and as a call to action for this topic, I would ask ISPs to help understand this residential proxy issue, for example by sampling in those networks to find indicators for residential proxy usage.
Just some quick takeaways. No one here wants to do that, but: don't share your Internet connection. It's probably a vehicle for very serious crime. Looking at traces in end-user networks could reveal the true extent of these residential proxies. If you have any questions, please come to me in person or e-mail them to me. Thank you.
(Applause)
MASSIMO CANDELA: I think we have a few questions.
AUDIENCE SPEAKER: Really interesting talk. Vasilis from CloudFlare. What are the indicators for such proxies after traffic leaves the ISP?
ETIENNE KHAN: In our case we have seen, for example, certain kinds of DNS requests to a sort of mother ship, like status dot and then the name of the proxy company. But also, a lot of them have certain backhaul servers at well-known ISPs that allow for a large volume of data to go through, and those IPs are what I would say is a good indicator.
AUDIENCE SPEAKER: Thank you for doing this work, very interesting. I am wondering, is this all kind of bad actors, or do you also see legitimate traffic, and what's the issue?
ETIENNE KHAN: The question is: is everyone a bad actor? I can't answer that question. In the case of the industry partner we worked with, those were bad actors. And then at some point it's a question of whether you think scraping Amazon or another e-commerce site is inherently bad; I think that's open for everyone to interpret on their own. But the fact that you need an IP address, or rather that a data centre IP address is not enough to reach your goal, makes me wonder what your goal is in the first place.
AUDIENCE SPEAKER: I could imagine there are countries where you just can't use YouTube or whatever.
ETIENNE KHAN: In the case of countries where you might be censored, I think that could happen, but usually residential proxies are paid by the gigabyte. You buy data volume, and those prices are more expensive than an off-the-shelf commercial VPN provider, if you want to use that for privacy.
MASSIMO CANDELA: We don't have any more time. We can take this offline. Sorry that we can't...
(Applause)
Now, we have the next presenter, who is the first of this session presenting online, Tianrui Hu. He is a Ph.D. student. His research is on network measurement, IoT security and privacy, specifically analysing the network behaviour of IoT devices to uncover and understand security and privacy concerns. So he is connected.
TIANRUI HU: Thank you for the introduction. So, hello everyone. Today I'm presenting our paper "IoT bricks over v6". This paper is accepted by... so it's a collaboration with these people.
So, we all know IPv6 adoption has been slow over the years, for many reasons. That said, there is a considerable amount of traffic over v6 right now; this chart shows the percentage of users that reach Google over IPv6. And we also know that most common operating systems support IPv6, which means that most of your personal devices are IPv6-ready; most general consumer devices are IPv6-ready.
However, smart homes are becoming more and more popular; the household penetration rate is over 50% in the US and 5% in Europe. IPv6 solves the addressing problem for IoT devices and can provide remote access.
So, what we don't know is whether these popular IoT devices on the market actually support IPv6, and how they use it. In this paper we answer this question.
So we have two goals. Goal 1: are consumer IoT devices ready for IPv6, and if not, why? To what extent are IPv6 features supported? What IP version do IoT devices prefer?
Goal 2: what are the privacy and security implications?
To answer our questions, we first built a testbed. We have IoT devices from seven categories and 45 manufacturers. We designed six experiments to test their functionality under different network conditions: one IPv4-only as a baseline, three IPv6-only and two dual stack, the configurations commonly used in residential networks; please check our paper for details on the methodology. To test whether they are functional, we designed a definition where we consider a device functional if its primary function is working. Depending on the device category, the primary function could be streaming YouTube on a TV, etc.
Okay, so what did we find from our experiments? Only eight devices are considered functional on an IPv6-only network. So I think the answer to this question is no, most of them are not ready for IPv6. But, despite not functioning properly, many of them do show some level of IPv6 support: 59 of them have IPv6 traffic, 51 assign themselves an IPv6 address, 22 of them query DNS over IPv6, 19 send data to IPv6 Internet destinations, and 8 were fully functional. So many of them have some level of IPv6 support. We have 34 with no support at all, 8 with traffic but no address, 29 with an address but no DNS, 3 with DNS but no data, and 11 that have basically everything but are still not functional. Why is that?
There are multiple factors involved in the support for IPv6, and if any of the necessary components is not configured properly, the device won't operate. I will use some examples here to explain.
First, let's look at the 11 devices that support all IPv6 features but remain non‑functional in IPv6‑only networks.
This is because of their reliance on IPv4-only domains. So, although these devices do talk to some IPv6 destinations, the domains that are essential for their functionality are still IPv4 only. Why are those domains IPv4 only? It is because of the failure to provide AAAA DNS records; this is a big gap in IPv6 support. Here are more cases about DNS. First, their DNS clients are not ready: in the IPv6-only network, although they can send DNS queries over IPv6, these devices are not able to send AAAA queries for their domains.
In the dual-stack network, 33 devices queried AAAA records, but only four did so in the IPv6-only networks. Which means that although they do query AAAA records, which is good, they don't do so over IPv6 to the resolvers. Having IPv4 available is still a requirement for these devices to get AAAA DNS responses.
So, how about the server side? We performed AAAA queries for the destination domains. For the 8 functional devices, 73% of the destinations had AAAA records available; for the 85 non-functional devices, only 31%. So this is definitely another important indicator of IPv6 support for IoT devices.
The last example is the 29 devices that have an IPv6 address but do not send DNS over IPv6. This is because they don't have any global IPv6 address. They only use local IPv6 addresses, because some smart home standards, such as HomeKit or the new one, Matter, mandate using IPv6 locally, but they just don't configure a global IPv6 address.
So after the devices that don't work, how about the functional devices? Here are all eight functional devices. It's clear that, first, the manufacturer has a big impact: the eight devices are from these four manufacturers. Similar devices from other manufacturers may not have full IPv6 support; the Fire TV clearly says IPv6 may limit its functionality. For many major IoT manufacturers, we don't find any of their devices functional on IPv6, unlike the four manufacturers on the other side. Many of them even use similar operating systems, for example Android-based ones.
And the device category matters: smart TVs and smart speakers, which run more sophisticated software that is updated regularly, have IPv6 support, compared to home automation, smart cameras, etc.
This also means that their OS and software stack has an impact. The eight devices use OSes that support IPv6; other devices with limited resources may not run software stacks that support it.
Basically, IPv6 is treated as an unnecessary component for now, because most users are still on IPv4, or at least have IPv4 available on their networks.
So, what about their preference between the two IP protocols? We found that, in the dual-stack network, 2.8% of the domains used IPv4 only despite receiving valid AAAA records, and 11% of the domains from the IPv6-only experiments fully switched to IPv4 when it was available. So although RFC 6724 recommends prioritising IPv6 over IPv4, that's still not the case for many smart home devices.
What about privacy and security? Let me give some background here. IPv6 addresses can be formed using the EUI-64 method, which uses the MAC address as the interface ID. The MAC address is stable, unique, and traceable, so it can be used as, and actually is being used as, a tracking ID. The latest RFCs suggest you should not use this method anymore.
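For context, here is a small sketch of how the deprecated EUI-64 interface identifier is derived from a MAC address (flip the universal/local bit of the first octet and insert ff:fe in the middle); the MAC shown is just an example value.

    def mac_to_eui64_iid(mac):
        """Derive the (deprecated) EUI-64 interface identifier from a MAC address."""
        octets = [int(b, 16) for b in mac.split(":")]
        octets[0] ^= 0x02                                   # flip the universal/local bit
        eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]      # insert ff:fe in the middle
        return ":".join("%02x%02x" % (eui64[i], eui64[i + 1]) for i in range(0, 8, 2))

    # A SLAAC address would then be <prefix>:<interface id>
    print(mac_to_eui64_iid("00:25:96:12:34:56"))   # -> 0225:96ff:fe12:3456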
How many of our devices use it? Eight of them use it for DNS and five of them use it for data communication, with 27 destination domains.
We classified these domains based on their owners, and found that one of them is actually a third-party analytics server. So, we disclosed our findings to the manufacturers of these eight devices, and some of them acknowledged our disclosure and indicated potential mitigation in a future release.
So, what's the conclusion here?
First, obviously, smart homes are not fully ready for IPv6. Why is this? Our speculation is that it is due to a lack of incentives. Device vendors, network providers, and administrators all must develop and maintain IPv6 support in their software, DNS, libraries, etc. Is there enough user need? No, IPv4 still meets the connectivity needs of the IoT.
And there might be interoperability issues, privacy and security considerations, so that the cost of supporting IPv6 might be even higher than not supporting IPv6 for IoT manufacturers.
And all these factors make them hesitate to fully support IPv6. So, to encourage the IPv6 transition, we still need joint efforts from all stakeholders; policy makers and standards bodies can also create some incentives for consumer IoT vendors to improve IPv6 support.
And as I mentioned before, a new IoT standard, Matter, has been adopted by most of the major IoT vendors now. It's a local network standard that mandates IoT devices to communicate over IPv6 within the home network, so device to device or device to mobile, not through the Internet, but I believe it's a good first step.
Before I end my presentation, I also want to talk about our research on IoT measurements over several years. We are a research group with a testbed of more than 10 IoT devices, and we have 12 publications on IoT measurements, privacy and security. We have several testbeds with the IoT devices and the clients, so you don't have to start from scratch: there is a testbed already. We also have a remotely accessible IoT test site, and we have collaborations with labs in the EU and in California, and those on the list.
And to close, we are always open to collaboration from either academia or industry, so feel free to reach out to us. If you have any questions, you can find our research and the public datasets on our website. Thank you, I am happy to take any questions.
(Applause)
STEPHEN STROWES: We have one question.
AUDIENCE SPEAKER: Hi, Jen Linkova. One comment and one question. Do I understand correctly that in your v6-only setup you did not have DNS64?
TIANRUI HU: No. Yeah.
JEN LINKOVA: I'm not surprised things did not work, but I guess it's kind of an unrealistic setup; it would be very interesting to see how your results would change if you had DNS64 support, so devices could reach out to v4-only domains. And the question: I am very confused how, why do you see devices which do not have v6 addresses but send v6 traffic? How do they do that?
TIANRUI HU: So, basically, they use an auto-assigned IPv6 address as their source address and send multicast ‑‑
JEN LINKOVA: Jen Linkova, Google. Very quickly: did you just use SLAAC, or did you have both DHCPv6 and SLAAC set up?
TIANRUI HU: We have three different configurations: one with SLAAC, one with DHCPv6 and one with both.
STEPHEN STROWES: I have one very short question. You pointed out that a bunch of IoT devices are still using EUI-64 identifiers. You reached out to vendors and they acknowledged this was something they would fix. Have they fixed it, and do you have any indication of how many devices might pick up those updates?
TIANRUI HU: No. Not yet. I mean, they told us like they might, there might be potential mitigation. So, yeah, I have no idea of their timeline.
STEPHEN STROWES: Then I hope they maybe potentially fix this.
TIANRUI HU: Hopefully.
STEPHEN STROWES: Thank you once again.
(Applause)
Our next speaker is Lion, who is going to be talking about evaluating target generation algorithms in IPv6. Lion is a Ph.D. student at TUM and his research focus is on Internet measurements and privacy-preserving proxy technologies.
LION STEGER: Hi. Thank you for the introduction. I am here to talk to you about some pitfalls of IPv6 measurements and how we used our data and what we found out about them.
So, first of all, what's so interesting and special about IPv6 measurements? For IPv4, Internet measurements are straightforward: you scan through the whole IPv4 address space and that's it. That's not possible for IPv6, since the address space is way too large. Therefore, when conducting IPv6 measurements you rely on an input dataset, which you use for your measurements. This is often called a hit list: a collection of active IPv6 addresses which you can use to feed your scans.
These hit lists are also often used as input for so-called target generation algorithms. These are algorithms that take active addresses as input, analyse them for patterns, and generate new potential addresses, which you can then scan for active addresses.
The question now arises: Can these IPv6 hit lists actually represent the active part of the IPv6 Internet or are they maybe biased in some way? Because that would influence not only scan results but also target generation algorithms.
The main problem here is that a hit list is just a collection of IPv6 addresses, regardless of what kind of device is actually behind each address. It might be a client device, a web server, Internet infrastructure, routers, whatever; it's just represented as an address.
So, we ask ourselves the following questions: are popular hit lists potentially biased towards any of these address or device types? And does that make a difference? It should make a difference, probably those addresses behave differently, but do they? For that, we analysed the data from the IPv6 hit list service, a popular, publicly available hit list. The next question we ask is: does biased input actually influence target generation algorithms, and how do they perform with biased inputs? For that we took ten different target generation algorithms and tested them with biased input. And last, we asked whether we might benefit from categorising such popular hit lists, to identify the bias or maybe even counteract it.
As a background for what we did here, we looked at what the IPv6 hit list service actually is. This service was introduced in 2018 by Gasser et al.; it collects more than 2.4 billion IPv6 addresses at the moment from various sources such as traceroutes, DNS resolutions, public datasets, target generation algorithms and much more.
The hit list service then runs them through filters, for example aliased prefix detection but also a filter for inactive addresses, and afterwards scans these addresses with different probes for responsiveness. At the moment it contains 21 million addresses responsive to at least one of those probes, just for reference.
In 2022, target generation algorithms were already tested and run with the hit list, and their output included in the hit list. Back then this increased the size and the coverage of the hit list by 168%, which shows that such target generation algorithms are indeed a valuable tool for hit lists.
So, what did we analyse with this hit list? We talked about different device types represented in the hit list, but the problem is that the device type can only be estimated; we cannot know for sure what kind of device is behind such an address. We can only estimate it, for example by analysing what kind of network the address comes from. To determine the category of the network we used PeeringDB, which we heard about before here, a community-maintained database where network operators can enter a category for their network on an autonomous system level. It includes 11 different categories, of which we took the most important five and combined the rest.
When we now look at the representation of these different categories in the hit list service, we see that it's not uniform at all. For example, addresses from ISP networks and NSP networks are far over-represented. On the other hand, addresses from university networks or NGO networks are under-represented, regardless of whether you look at the input of the hit list or the active output.
So, we know that this representation is not uniform, but does it make a difference? We analysed the behaviour of these different categories in regards to different metrics. For example temporal stability. Does the category of a network make a difference in the behaviour of the address?
So we first looked at the number of state changes in our hit list, the number of times an address becomes responsive or unresponsive in the hit list, and the sums of down- and uptimes for an address in the hit list. We see that an average address from an ISP network is up for only an average of seven days, which is not very long. On the other hand, for example, CDN addresses have much longer uptimes and fewer downtimes. This means that, for example, if you are conducting longitudinal measurements, so if you want to scan the same set of addresses for a longer time, you don't want to choose ISP addresses; you want to choose, for example, CDN addresses. Otherwise you will definitely waste some of your scanning resources.
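As an illustration of how such stability metrics can be computed, the snippet below takes a per-scan boolean responsiveness history for one address (made-up data) and derives the number of state changes and the mean up and down streak lengths.

    from itertools import groupby

    # One boolean per daily scan: was the address responsive that day? (illustrative)
    history = [True, True, False, False, False, True, True, True, False]

    state_changes = sum(1 for a, b in zip(history, history[1:]) if a != b)
    runs = [(up, sum(1 for _ in group)) for up, group in groupby(history)]
    uptimes = [length for up, length in runs if up]
    downtimes = [length for up, length in runs if not up]

    print("state changes:", state_changes)
    print("mean uptime (days):", sum(uptimes) / len(uptimes))
    print("mean downtime (days):", sum(downtimes) / len(downtimes))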
Another metric we looked at is responsiveness to the different probes we send, and what difference the category makes. So, here you see the response rate to the different probes we send, depending on the category. We see that all address categories share a high responsiveness to ICMP, which is somewhat expected. However, ISP addresses respond to basically nothing else but ICMP, which is also somewhat expected. On the other hand, CDN addresses, often used for web hosting and such, respond more often on average to probes on TCP port 80 or TCP port 443. This again shows that, depending on the use case of your IPv6 measurement, you should probably filter the hit list service by category, for example excluding ISP addresses if you are doing a web scan.
Okay. Now that we have talked about the hit list contents, we also wanted to look into the influences on target generation algorithms. A quick recap of target generation algorithms: such an algorithm starts with a seed dataset, analyses the seed dataset for patterns, and tries to find new, potentially active IPv6 addresses from that. The output is called a candidate dataset, which you can then scan to see whether those addresses are actually active.
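To make the seed-to-candidate idea concrete, here is a deliberately tiny toy generator: it keeps the nybbles that are constant across the seed addresses and enumerates the values seen in the varying positions. It is only an illustration of the concept, not one of the ten published algorithms evaluated in this work.

    import ipaddress
    from itertools import product

    seeds = ["2001:db8::1:1", "2001:db8::2:5", "2001:db8::1:5"]   # toy seed set
    nybbles = [format(int(ipaddress.IPv6Address(s)), "032x") for s in seeds]

    columns = list(zip(*nybbles))                    # nybble values per position
    choices = [sorted(set(col)) for col in columns]  # observed values per position

    candidates = {
        str(ipaddress.IPv6Address(int("".join(combo), 16)))
        for combo in product(*choices)
    }
    print(sorted(candidates - set(seeds)))           # new addresses worth probing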
Some of these algorithms also implement their own scanning mechanisms, so that they can adapt their generation strategy to the results of the scanner.
In order to measure the influence of the input dataset, we first use the full hit list as an input, and compare it with the behaviour when using the categorised versions of the hit list as an input.
We chose ten different algorithms with open source implementations from peer-reviewed publications; their methods include language models, machine learning, and much more, so a very diverse set of algorithms.
Some general observations when running the algorithms: the size of the candidate dataset varies greatly depending on the algorithm but also on the input dataset, from a couple of thousand candidates to multiple hundred million candidates.
If you use the default input. So the full hit list as an input, there is a bias towards ISP addresses. This is most likely because ISP addresses contain patterns which are easier to recognise for such algorithms. But we cannot know for sure.
When we use biased input, so category-dependent input, we see that the response rates, the rate of active addresses among the candidate addresses, depend a lot on the input dataset. Again, we find, first of all, that TGAs behave vastly differently, from an almost 50% rate down to zero, and also that dynamic algorithms such as 6Hit and 6Scan, which have their own scanning mechanisms, tend to have a higher response rate, which shows that scanning yourself and adapting the scanning or generation strategy to the scan output leads to more efficient scanning.
Again, we see that when using ISP addresses as input, higher response rate is the result. This can again mean that ISP addresses have patterns which are easier to recognise for target generation algorithms and also are easier to predict in regards to responsiveness.
Still, this shows a clear bias: when you use biased input, you also get biased output.
So to sum everything up: network categories are not evenly distributed in the IPv6 hit list service. There is a clear bias towards ISP, NSP and CDN addresses.
The categories themselves show very different behaviour in regards to responsiveness to our different probes as well as temporal stability. For example, ISP addresses are much less stable than, for example CDN addresses, which is very important for longitudinal measurements. On the other hand, ISP addresses only respond to ICMP, while for example CDN addresses also respond to other probes.
So, if you want to avoid scanning overhead in your IPv6 measurements, be aware of these categories and filter accordingly.
Target generation algorithms are, by default, biased towards ISP addresses, as these have easier-to-detect patterns. So the default input also leads to ICMP-biased responsiveness.
Response rates, depending on your use case, have to be considered, and they vary greatly depending on input as well as algorithm.
So, filtering the input can avoid biased candidate data sets and also vary them in size.
If you are interested in more of these findings or our general data or the general details of the IPv6 hit list service, please go to this address or scan this QR code. And now I am open for your questions. Thank you.
(Applause)
MASSIMO CANDELA: Thank you very much. So, we have time for some questions. Okay, somebody is coming.
AUDIENCE SPEAKER: Max Planck. I was wondering, you said that these hit lists are biased and that therefore the target generation results are also biased towards ISPs. But I would argue that most likely most active hosts are in ISPs. So if they are proportionally over-represented in a set, would that really be a bias, or just a representation of, well, reality?
LION STEGER: So you say because the biased input set for the ISP is the largest, it should also produce the biggest output, no?
AUDIENCE SPEAKER: I am saying, is it really biased, or is it just that there are more IPs in ISPs?
LION STEGER: Ah, so, sorry, of course we cannot know what the actual IPv6 Internet looks like, the proportion of categories in the real IPv6 Internet; we cannot know that. So we cannot say for sure whether this is a bias. We only know that the representation in the hit list is not uniform. As I said, of course we don't know how the IPv6 Internet looks exactly, but this is our best guess, and it shows that it's just not uniformly represented.
MASSIMO CANDELA: I think it would be better to take it offline. Thank you very much.
Now we go to the next presentation.
He is a research engineer at CloudFlare working on Internet measurement, routing and security. He is also the winner of today's shortest bio for this session. And he will present "What's in the data? Unboxing the APNIC user populations".
VASILIS GIOTSAS: Thank you very much. I am happy to be here. I will present our work on unboxing the estimates of the number of users behind ASes. Before I proceed, let me first...
So, I want to give a shout out to my amazing collaborators, who should get most of the credit for this work. And let's see why it's important to understand user populations.
So in general, not just on the Internet, when we want to understand the impact of an event or an action, we want to put it in the context of how many people are affected by it. If there is a natural disaster, wildfires, hurricanes, we want to understand how many people were displaced or affected. If a government makes a policy change, like a tax cut, how many people will benefit? We want to do the same on the Internet. When there is an incident or a policy action, we want to understand not just how many IPs will be affected, or how many autonomous system networks, but also how many people will experience the phenomenon. If an undersea cable cut stops the Internet in a region, we want to know how many users will either lose connectivity or experience higher latencies, or if a CDN deploys a new presence somewhere, how many users will enjoy lower latencies.
The problem is that the number of users is not directly available in any dataset, so how do we get this number?
Fortunately for the community, APNIC, the registry for the Asia Pacific region, publishes AS user populations through their website. They have been doing it for some years now, and to the best of my knowledge, it is the only available dataset for this metric.
And the fact that it is the only available dataset means that people have to rely on it. So it's not vetted as thoroughly as other data. And this is the topic of our work: we tried to understand when this dataset is accurate, what the nuances are, what the pitfalls are, right.
Before I explain how we tried to do it, let me explain briefly how APNIC estimates user populations.
So they have partnered with Google to deliver Google ads to users across the world. Whenever a user receives an ad, APNIC will take their IP address, geolocate it and map it to an ASN. So for example, if we have three users that receive an ad, APNIC geolocates all three of them in Spain, and one of them is behind Vodafone, two of them behind Telefonica. So we have 33% of the ads served to Vodafone, 66% of the ads served to Telefonica. Now, APNIC will also combine this statistic with the number of users per country provided by the ITU. So they will say, for instance, Spain has 45 million users, and combined with the statistics on ads served they can estimate that Vodafone has about 15 million users, Telefonica 26 million users. It's a very simple but very smart methodology.
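The arithmetic behind this is essentially a proportional split of the ITU country total by ad-sample share; a minimal sketch with made-up AS names and numbers (not the real Spanish figures) looks like this:

    # Illustrative numbers only: ITU country total split by ad-impression share.
    country_users = 40_000_000                      # hypothetical ITU estimate
    ad_samples = {"AS-A": 1_000, "AS-B": 3_000}     # hypothetical impressions per AS

    total_samples = sum(ad_samples.values())
    estimated_users = {
        asn: country_users * samples / total_samples
        for asn, samples in ad_samples.items()
    }
    print(estimated_users)   # {'AS-A': 10000000.0, 'AS-B': 30000000.0}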
However, it may have some pitfalls, some potential biases. For instance, what if the ad placement is not uniform across all networks? Then this direct mapping between samples and populations gets tricky. What if the ITU estimates of country populations are inaccurate? Maybe some governments are not that good at providing statistics. We may also have incorrect IP geolocation: if an IP address is geolocated in Spain but in reality it's in Portugal, then these figures get biased by that. And we may even have incorrect IP-to-AS mapping, which we know is not a trivial task.
We tried to understand how these biases may affect the quality of the APNIC user estimates. To do that, we collected data that reflects user populations, either directly or indirectly, from four different data sources. First, we get data from a very large global CDN with presence across the world in more than 300 points of presence; we get the number of distinct user agents per ASN and the HTTP traffic per ASN. We also did a manual survey of the number of broadband subscribers for ASes that publish their market share; there are many ASes that say "we have this many subscribers", or there are market intelligence reports, and we collected reports like that for 20 countries. We also used data from PeeringDB and M-Lab.
We'll focus on the first two data sets. Let me start by comparing APNIC estimates against the survey we did for broadband subscribers.
So for the 20 countries we tested, we find a strong linear correlation, around 70%, for ten countries. So the headline is: this is good news, right. APNIC seems to be very accurate in the general case. But there are also some notable exceptions where correlation is poor: Russia, Brazil, Japan, Poland, South Korea and China. And we also noticed that, even in countries where you have good correlation, APNIC tends to overestimate the number of users for mobile broadband providers, and we hypothesise that this may be because ad blocking is less popular on mobile devices, so mobile users receive disproportionately more ads compared to desktop users.
Now let's compare the APNIC dataset with the CDN data. Here I show the data obtained from user agents, but the result for HTTP traffic is very similar. We tried to compute two types of correlations between the APNIC user estimates and the same type of estimates, this time derived from the CDN data.
The first one is a rank correlation, which essentially checks whether both datasets identify a similar organisation order: not necessarily that the actual numbers are the same, but at least we want the ranking of the organisations to be the same, so let's say BT is first, Virgin second and something else third, something like that. We also compute a correlation which shows the agreement between the two datasets regarding the most significant networks. So here we focus on the ASes that are the market leaders per country. Maybe they don't agree on all the ASes in a country, for example the US has thousands of ASes, but if they agree on the market leaders, then we say that they have principal AS agreement. So in this map we colour-coded the different types of agreement. In green we have complete agreement, both rank and principal AS agreement. In purple, we have only principal agreement. In blue, we have only rank agreement, and the remaining colour means no agreement. Again, the overall picture is positive: in most countries we have either complete agreement, which is the ideal, or principal AS agreement, so for the market leader they agree. But we also see some notable outliers like Russia, Norway or Cameroon, where we have no agreement, and for some of them the correlation is very poor. So let's look at a couple of them and try to explain why.
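A sketch of the two checks might look as follows; the talk does not name the exact coefficient used, so Spearman rank correlation is assumed here as one standard choice, and the per-AS numbers are made up.

    # pip install scipy
    from scipy.stats import spearmanr

    apnic = {"AS-A": 9_000_000, "AS-B": 5_000_000, "AS-C": 1_000_000}   # illustrative
    cdn   = {"AS-A": 8_500_000, "AS-B": 4_000_000, "AS-C": 2_000_000}   # illustrative

    common = sorted(set(apnic) & set(cdn))
    rho, _ = spearmanr([apnic[a] for a in common], [cdn[a] for a in common])
    print("rank correlation:", rho)

    # "Principal AS agreement": do both datasets name the same market leader?
    print(max(apnic, key=apnic.get) == max(cdn, key=cdn.get))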
Russia is the country where they disagree the most, right. And we believe that this is because of the ongoing war in Ukraine: Google has reduced its presence in Russia, and Russia also tries to reduce its reliance on Google. Google is no longer the market leader for ads there, which means that it does not have enough ad samples to provide accurate estimates for Russian ASes. The other interesting case is Norway, where we have a very pronounced outlier; you see it on the right-hand side plot, in the corner. This is OpenVPN, which provides free VPN exit nodes located in Norway, and APNIC thinks that this network has the most users in Norway, right. So its proportion of users in the country is skewed by a lot. In the CDN data, there is a filter for such IPs, so it identifies the actual providers in the country. There are more examples in the paper, but you see the kind of biases that may happen.
Now, given that most of us don't have perfect data to validate APNIC against, how can we make the best use of this dataset? We identified three key points to take into consideration.
The first is the number of ad samples per estimated user. The second is to remember that APNIC provides estimates smoothed over a period of 60 to 90 days; they don't provide estimates for the current day, but a smoothed number. And third, geolocation is often responsible for errors. Let me focus on the first point.
Users-to-samples ratio per country. In this plot, the X axis is the number of samples, the number of ad impressions; the Y axis is the number of users; and each point is a country. We have the linear fit, the red is the 95% confidence interval, and for the countries that are above this confidence interval it's hard to get accurate estimates: one single ad impression corresponds to 1,000 estimated users, so very small fluctuations in ad impressions can cause big deviations in user estimates, making it hard to create accurate predictions.
In this map, we see the fraction of days across 2024 where the users-to-samples ratio is outside the estimated confidence interval. You see that, again, the news is good overall: most countries are consistently close to zero. But some countries light up; Russia is always outside, and we have many countries in Africa and the Middle East that go outside of this interval frequently.
So, the recommendation we have here is that for these countries that are not always within the confidence interval, look at the 60-day period, find the days where the users-to-samples ratio is the lowest, and use the data from these days to compute the user populations, right.
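In code, that recommendation amounts to picking the days with the fewest estimated users per ad sample; a small sketch with made-up per-day values:

    # Hypothetical per-day data for one country: (date, ad_samples, estimated_users)
    days = [
        ("2024-10-01", 40_000, 30_000_000),
        ("2024-10-02",  2_000, 30_500_000),
        ("2024-10-03", 55_000, 29_800_000),
    ]

    ratios = sorted((users / samples, date) for date, samples, users in days)
    best_days = [date for _, date in ratios[:2]]    # lowest users-to-samples ratio
    print("use these days:", best_days)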
Finally, before I close, one point to be aware of: if you want fine‑grained estimations, like what the population behind an AS is on a specific day, you don't want to use the user estimates, you want to use the samples. To give you an illustrative example, here we see the samples in red and the estimated users in blue for JANET, the UK academic network, the network which the UK universities use to connect, and I have the numbers from the start of October until today. As we would expect, universities are empty during the weekends, so the samples go down, and they have a lot of people during weekdays, so the samples go up. The red line matches our expectation. The blue line doesn't though; it goes up. Why is that? This happens because the data is averaged over a period of two to three months, so it takes into account the summer period when the universities were closed. So if you want accurate estimates for very granular time periods, use the samples and not the estimates.
So, to conclude.
I think the message is positive. APNIC works well for countries with sufficient Google ad impressions. However, we need to be aware of the pitfalls: we need to take into account the users to samples ratio, the sampling fluctuations, and also the type of network that we are studying. So it's important to understand this to avoid misuse, and in the paper we explain how you can use some public data sets to cross validate APNIC, given that this CDN data is not available to everyone. I don't have time to explain it now, but I hope I got you interested enough to go and read the paper.
Thank you very much. I am ready for your questions.
(Applause)
AUDIENCE SPEAKER: Hello, Daniel Arena, Namex. Has this global CDN you speak of considered making their data public?
VASILIS GIOTSAS: That's a good question. Unfortunately this is not possible, at least at this point. But we would really like to somehow help CDNs serve such data in a privacy preserving manner, so that users don't know which CDN the data came from but have access to data from multiple CDNs.
AUDIENCE SPEAKER: Are you already working with other CDNs to do that?
VASILIS GIOTSAS: Not really. But maybe this presentation is a good opportunity to discuss it.
AUDIENCE SPEAKER: Okay. Thank you.
STEPHEN STROWES: Okay, we are just about over time. If you have a quick question.
AUDIENCE SPEAKER: I just have a question very much like the question you asked at the beginning, because if you ask how many people are affected by a change, there is the question of what counts as a single person. For example, take me: I have a mobile phone in my pocket with Internet provider A, I have my connection at home with provider B, and when I'm on the train I use the train wi‑fi, which is provider C. And if you, for example, improve the latency of some service for provider B, did you improve the latency of the service for me? Do I count?
VASILIS GIOTSAS: That's a good question, and it goes back to JANET. The university has many users on weekdays, not many at the weekend. If they improve connectivity ‑‑ you know, we cannot say the exact maximum number of users that is affected, or the minimum number of users, but we can make educated ranges, right. We can see how these fluctuations change over time.
AUDIENCE SPEAKER: Thank you.
(Applause)
STEPHEN STROWES: And our final speaker for today is Emile Aben. He is a data scientist at the RIPE NCC, previously at CAIDA; he joined the RIPE NCC around about RIPE 58, and he works on doing interesting things with Atlas and RIS etc.
EMILE ABEN: Hi. I want to talk to you about RPKI flutter. And this is like butterflies ‑‑ I am still nervous after 15 years of talks at RIPE meetings. But this is also about what butterflies do: they flutter, they make lots of small chaotic movements. And that's also what the RPKI system does.
This is actually a collaboration with my colleagues, and maybe they can get on screen quickly. Ties and Agustín helped put this together, so this is a team effort.
And this started when we looked at a paper that was presented here at the MAT Working Group. It's called "RPKI Time of Flight", by Romain Fontugne and collaborators, and it looked at the time delays in the RPKI system. From that paper, there is a quote: "We observe significant disparities in ISPs' reaction time to new RPKI information, ranging from a few minutes to one hour."
So it's a fascinating paper and observation. And we started wondering: they did this in a controlled fashion, they controlled when they put new information into the RPKI system, so it's a controlled experiment, and we wondered what this looks like in the wild. And that's how this started.
And we specifically wanted to see this part, the circled part, from where things become publicly visible. So after relying parties, or validators, as I like to call them ‑‑ when things come out of the RPKI cloud and go into routers, what is the timing there actually, and how does it affect BGP?
So we have BGP data in RIS of course, but the real granular data for the RPKI system, we didn't collect, and that's what we actually started doing. We started collecting this, so, enter our RPKI flutter system. And we wanted to see the RPKI outputs at high time granularity.
And actually ‑‑ so we started building a prototype, or a minimum viable product. What we found is that you can do this with a really small dataset, and it might fill a gap in realtime, near realtime analysis and postmortem analysis of things happening in the RPKI system.
And I am comparing it to other systems. But first, what are we actually doing? We are polling four, actually five now, Routinator instances for the VRPs, so the stuff that comes out of the RPKI machinery before it goes into routers. In production that is actually the RTR protocol, but Routinator also has a JSON delta API endpoint, which we query every 20 seconds. These are basically like updates in BGP, and just to make it similar to BGP, at the beginning of a day we also create a state dump. That's basically a bview, if you have used BGP data before.
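The collection idea could look roughly like the sketch below: poll a validator's JSON VRP export on a short interval, take a full snapshot at the start of the day, and turn set differences into announce/withdraw style events. The endpoint URL and JSON field names are assumptions modelled on Routinator-style output; the real system uses the delta API rather than repeatedly diffing full dumps.

```python
# Hedged sketch of the collection loop; URL and field names are assumed.
import time
import requests

ENDPOINT = "http://validator.example.net:8323/json"   # hypothetical vantage point

def fetch_vrps():
    roas = requests.get(ENDPOINT, timeout=10).json().get("roas", [])
    # A VRP is essentially the triple (prefix, max length, origin ASN).
    return {(r["prefix"], r["maxLength"], r["asn"]) for r in roas}

state = fetch_vrps()                                   # start-of-day state dump
while True:
    time.sleep(20)                                     # 20-second polling interval
    current = fetch_vrps()
    for vrp in current - state:
        print("announce", vrp)                         # VRP appeared
    for vrp in state - current:
        print("withdraw", vrp)                         # VRP disappeared
    state = current
```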
And for those who don't know what VRPs look like, I put some up there. It's basically the material that a router needs to figure out if a route was originated by the correct owner or owners.
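As a small illustration of how a router uses that material, the sketch below applies the usual route origin validation logic: a route is valid if some VRP covers its prefix, the prefix is not more specific than the VRP's max length, and the origin AS matches; it is invalid if it is covered but no VRP validates it, and not-found otherwise. The VRP values here are just examples.

```python
import ipaddress

# Example VRPs: (prefix, max length, authorised origin AS). Illustrative values.
vrps = [
    ("193.0.0.0/21", 21, 3333),
    ("2001:db8::/32", 48, 64500),
]

def rov_state(route_prefix: str, origin_asn: int) -> str:
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for prefix, max_len, asn in vrps:
        vrp_net = ipaddress.ip_network(prefix)
        if route.version == vrp_net.version and route.subnet_of(vrp_net):
            covered = True
            if route.prefixlen <= max_len and origin_asn == asn:
                return "valid"
    return "invalid" if covered else "not-found"

print(rov_state("193.0.0.0/21", 3333))   # valid
print(rov_state("193.0.4.0/23", 64512))  # invalid: covered, but too specific and wrong origin
```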
What we actually do is we make a Parquet file, a daily Parquet file, and this is really small if you compare it to other RPKI data collection efforts. We ourselves, the RIPE NCC, do a daily snapshot of this, which is available on our website ‑‑ I should have linked it probably. It's one vantage point, it's 750 MB, and it has the pre‑validation data plus the post‑validation data, so it also has all the stuff before whatever magic a relying party does. Then we have the excellent RPKIviews project that does these types of snapshots every 15 minutes, from five vantage points. That's 180 GB per day, and it also has the pre‑validation and post‑validation data. This new thing, I hope, fills a gap here: four, now five, vantage points, and it's 30 MB per day. Do people remember floppy discs? This is kind of like Windows 95 on floppy discs, every day. It's not that bad. It doesn't contain the pre‑validation data, so if you want to dig deep into debugging what the relying parties do, you cannot do that with this data, but what comes out is actually a function of how the global set of validators are functioning ‑‑ at least that's what I hope.
So we first have that dataset ‑‑ I have the URL there, so you can look at it if you understand the files. But it's probably easier if you have a nice interface to look at this, and that's what we also built, on a platform called Observable HQ. We basically load one of these data sets into your browser, and then you just see, okay, for this day there were this many of these fluttering events, where fluttering is basically when a prefix ‑‑ a VRP, I should say ‑‑ goes up or down during a day.
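Counting those fluttering events from one of the daily files could look like the sketch below. The Parquet schema here (column names and an 'event' column with announce/withdraw values) is an assumption for illustration; the real files may be structured differently.

```python
# Hedged sketch: count up/down movements ("flutter") per VRP for one day.
import pandas as pd

def flutter_counts(path: str) -> pd.Series:
    # Assumed columns: 'timestamp', 'vantage_point', 'prefix', 'max_length',
    # 'asn' and 'event' ('announce' or 'withdraw').
    df = pd.read_parquet(path)
    key = ['prefix', 'max_length', 'asn']
    # Every announce/withdraw event is one movement of that VRP.
    return (df.groupby(key)['event']
              .count()
              .sort_values(ascending=False))

# VRPs with many events in a single day are the "fluttering" ones.
```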
And then we compare it to BGP data, so you can see it in near realtime. I have two minutes, so look at the URL, it's self explanatory.
What I find fascinating is that if you start looking at this, there is far more there than what you can actually see in BGP. For instance, Amazon is putting up material in here and retracting it all the time, without a BGP prefix that matches it. So they are actually measuring the RPKI system. And in the pattern up there, each bar is one of the vantage points, and you can see that they see different views of the state of the whole thing.
Another one is an experiment that I did with Max Stucchi, who is here. He wrote a RIPE Labs post about how switching VRPs on and off for a prefix that exists actually causes a lot of BGP churn. You can see it in the bottom graph here, that it causes BGP churn, and if you look closely ‑‑ there is a close‑up here ‑‑ you can see that while debugging this whole system Max put a VRP up, down again and up again, and only two vantage points see this; the other two don't, or they see it at different times. So this is a view into how this works over time.
We also see IP leasing enforcement, or at least that's what this looks like. We see a non‑matching VRP for a prefix that is seen in BGP, and it causes the RIS visibility to drop to half, basically. And if you zoom in, you can actually see how it drops: over a period of 20, 30 minutes, something changes here.
Conclusion:
Notebook is available. I would like people to try that out. The data is available. I would like people to use that. I am certainly going to use it, so I hope others will be able to also use it.
This is not fully realtime: the VRP data is every 15 minutes, and the RIS data is sometimes slightly more delayed. I hope we can make this more realtime. I hope we can add other vantage points, for example rpki‑client, so we can see if Routinator and rpki‑client behave differently. And yeah, if you find an interesting case, please post on social media with a nice tag; I would love to have your feedback. And I am kind of on time. That's it. Thank you.
(Applause)
STEPHEN STROWES: And we are at time, I believe you promised to take questions into the break. Wonderful.
All right. Thank you all. This was the MAT Working Group session for RIPE 89. Felipe will be covering the NCC's activity plan for 2025, and that will include work on RIPE Atlas, so please go to that if you are interested, and please provide feedback to the NCC, they do appreciate it. Please vote on the presentations: if there is stuff that you particularly liked or particularly did not like, let us know, we do find it useful. Send feedback directly to us. Join the mailing lists and talk to the folks on there as well.
That's us. See you in Lisbon. Thank you all.
(Applause)
LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC
DUBLIN, IRELAND.