RIPE 89
Routing Working Group session
31 October 2024
2 p.m.
BEN COX: We'll be starting in one minute. Hello everybody. Welcome to the Routing Working Group. So, we are your Chairs. So, yeah, hi, I am Ben. This is Paul and Ignas; we are your Routing Working Group co‑chairs. I would like to clear up something from last RIPE. In the stenography transcript it says that I said that the Routing Working Group is closed. It turns out that's not the right thing to say. I meant that the session was closed; I didn't mean to destroy the entire Working Group. So I'd like to clarify that we do intend to keep going. I'll say the right thing next time.
We have some really good talks today. Unfortunately the talk from Job Snijders is not going to happen, unfortunately he is sick, but I'm sure he'll appreciate good get well messages.
Please rate the talks as you go. We really appreciate feedback; it makes it much easier to figure out whether talks were good or not. You do have to log in through RIPE. And critically as well, we will be running a poll, or rather a selection, so please fish out the e‑mail: this is running on Meetecho, so it is important that you have access to Meetecho at the end if you want to participate in this selection for the next Routing Working Group co‑chair. So, we currently have 27 in there, and there are a lot of you in here; I would love to see more people in Meetecho able to participate in the poll. And with that, I think we can move on.
JOHANN SCHLAMP: Hi, I am going to talk about BGP parsing, about the current state of BGP parsing. How come? It started with a minor headache in my database: some weird routes that kept popping up over several months, possibly even years, and at some point in time that headache was big enough to have a closer look into those routes. That small look‑up ended in the implementation of my own MRT BGP parser, and I started to compare all those BGP parsers that are out there, and from this process I want to share a few insights.
So, there are autonomous systems in the routing table that are not routed, at least according to some of the parser implementations, and that could be a problem, because obviously these ASes are not assigned and the prefixes are not assigned, so we have to deal with that situation.
For a disclaimer, this talk is not about which parser is best. I won't go into details of any parser implementation, and it's also not about performance; a lot of parsers try to be very fast in parsing BGP, but it's not about performance. All analyses that I'm going to talk about are based on full MRT data sets, meaning that we take the first routing information base table dump and all BGP updates for the day consecutively, and also from all major BGP data providers like Route Views, RIPE RIS and Packet Clearing House, which amounts to about 3 billion routes for a single day that we are analysing. For the time analysis, we consider the first of each month over a time frame of two years.
So, what is the problem with parsing or understanding BGP or MRT data, which is more or less the same thing actually? We have a lot of moving parts.
We have the body of standards, which is complex and has problems of its own. There are ambiguities in the standard, there are some degrees of freedom, and it comes down to the implementation. Then we have BGP speakers that have to implement this standard all over the world. So we have millions of BGP speakers all over the world.
Afterwards, we are looking at a single exporter, meaning a single collector BGP process, an exporter, that takes data, transforms it into the MRT format and puts it down onto the hard disk, and then we have BGP parsers that are trying to reverse that process. So it's a complex chain of possible error sources, and we also have conflicting goals, like: should we be conforming with the standard, or should we try to get the most of the information contained in that data out of it?
We also have different use cases, obviously, like live inserts; we have standalone parsers; we have large ecosystems like BGPStream or even RIPE RIS Live; and also we have the research community and the operations community, which are not always the same. And there are also implementation problems that come into play, like complicated pointer arithmetic or algorithmic decisions. If you ask: is it possible that any AS path is longer than 255 hops? Then the answer is: it depends on your parser. Some do support this, some don't. And putting it all together, there are a few problems I have listed here, namely we have more than 70 RFC standards that have to be taken into account to implement a good MRT parser, and that is difficult.
So, what is it all about? Is it a problem? Actually, it's only software, built with a clear intent and purpose, so we could build it as it should be, right? So there shouldn't be any big problem when looking at the data with any tool that is available to us.
So, if you ask a question like: how big is the Internet? Most of you would have some gut feeling. When we ask different parsers, the answer is: it depends. So we don't have one size for the Internet; we have one size for the Internet from a specific parser. And what would be an acceptable margin of error? If we have a 10% difference, is that a problem or not?
Now, you could argue a couple of /24s, or maybe even smaller prefixes, is not that big of a fuss, so we can also look at the aggregated size of the routed address space, and the answer again is: it depends on the implementation. Some parsers yield an aggregated size of the Internet of a /0.5 and others of a /0.25. So, what is it? That is a 20% difference, and in IPv6 you have an even larger margin of error.
It's actually twice the size for some implementations.
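As a rough back-of-the-envelope check of that 20% figure (my own arithmetic, not shown in the talk): expressing the aggregated routed space as an equivalent fractional prefix length, a /0.25 and a /0.5 aggregate differ by a factor of 2^0.25, about 19%.

    # Compare two "aggregated size of the Internet" estimates expressed as
    # fractional prefix lengths. Illustrative arithmetic only.
    def covered_addresses(prefix_len: float, bits: int = 32) -> float:
        """Number of addresses covered by an aggregate of size /prefix_len."""
        return 2 ** (bits - prefix_len)

    larger = covered_addresses(0.25)   # the parser reporting a /0.25 aggregate
    smaller = covered_addresses(0.5)   # the parser reporting a /0.5 aggregate
    print(f"relative difference: {larger / smaller - 1:.0%}")   # about 19%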
Maybe another question: how many ASes do we have, and how are they interconnected? It's an important question. We have countless analyses that are based on the BGP AS path attribute. There is a whole body of science, there are thousands of students that work on this BGP attribute, and we have graph metrics, AS rankings, I don't want to name everything. We have seen security mechanisms that somehow depend on it, with preliminary AS path analysis. So, how many ASes do we have? How many AS links do we see in the Internet? Does it matter if it's 10% more or less?
So, to relax the problem a bit: we have the best situation now that we have ever had. That's nice. These are mostly recent figures. It was way more complicated in the past, but today we can roughly agree on the size of the Internet. So, there are 173 ASes that are somehow in a half‑alive state. I would be offended if my AS, or my ASes, were one of those that are not counted as routed even though they are connected to the Internet. But maybe others are not. So maybe it's fine to have 99.8% coverage, meaning agreement between those parsers, but regarding prefix sizes and prefix numbers, that is still a bit more of a problem today.
So, why is that? Is it a matter of taste or how come that different parsers arrive at different conclusions?
So, first of all, we observe different types of parsers. Actually, three different types of parsers, and we can distinguish those parsers by the size of their result set. On the left‑hand side we see more or less two parsers, but there are different plots where we can see the different types. So how many routes do we have in the routing table?
During a specific day? It depends. As we can see here, there are certain implementations that agree with each other, and certain other implementations that agree on a different result, but in total, we have three different directions that a parser can take.
So, first, you could parse exactly as standardised, meaning if the input doesn't conform, so be it. You don't care if the input is faulty; it's not your fault.
The second strategy would be that you try to recover as best as possible from errors, and you filter out some really prohibitive ones, like IPv4 prefixes that are longer than a /32, which is actually in the data, or IPv6 prefixes longer than a /128. So, you don't want to have that in your data, but otherwise you try to parse as closely to the standard as possible.
And then we have a third category of parsers that are not trying to implement the standard, but trying to get the information out of the data. So it's possibly even a heuristic approach to get the data out of it, and I wouldn't argue for any of those three strategies. It depends on your use case. If you are asking the question: is anybody on the Internet announcing a /129? Which is a weird question, obviously. But if you filter out all those anomalies in your BGP parser, you wouldn't be able to answer it. Because if you look closely, the standard doesn't forbid it, so, yeah...
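A minimal sketch of what that second, recover-as-best-as-possible strategy can look like when walking an NLRI field (my own Python illustration; the layout assumed here is the plain one-byte-length-plus-prefix NLRI encoding, without add-path):

    # Best-effort NLRI walk: keep prefixes that can be valid, skip the
    # impossible ones (length > 32 for IPv4, > 128 for IPv6) instead of
    # aborting the whole message.
    def split_nlri(data: bytes, afi: int = 1):
        max_len = 32 if afi == 1 else 128
        prefixes, skipped, i = [], 0, 0
        while i < len(data):
            plen = data[i]
            nbytes = (plen + 7) // 8
            chunk = data[i + 1:i + 1 + nbytes]
            if plen <= max_len and len(chunk) == nbytes:
                prefixes.append((plen, chunk))
            else:
                # Strategy one would raise here; strategy three might still
                # try to keep it. We count it and move on, trusting the
                # declared length to stay aligned.
                skipped += 1
            i += 1 + nbytes
        return prefixes, skipped

Whether you raise, skip or keep such an entry is exactly the kind of design decision that makes two parsers disagree on the size of the Internet.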
Let's change the topic. Do we like the Multi-threaded Routing Toolkit, or the MRT format?
Let's put it differently. There is actually not much of a difference between MRT and BGP. MRT is a very thin wrapper around BGP messages; you simply concatenate those BGP messages as MRT entries and you have MRT data. So, if there is a problem in BGP, there is automatically a problem in MRT.
For example, let's take a look at the BGP add‑path capability. It clearly states that you have no way of telling whether a BGP message is add‑path enabled or not. And it's a severe problem, because the 4 bytes before the prefix in the Network Layer Reachability Information are either a path ID or not. So if you don't know whether the first 4 bytes belong to the prefix or not, you do have a problem and you get weird prefixes out of your parser. So we would need a concept of peer capabilities in MRT, which is not there. We don't know which peer has which capability, but the exporters do. And the solution in MRT is that we have record types that model those capabilities, like: we have received a message from a peer that was 4‑byte AS enabled, or add‑path capable, or 'feature 2025' enabled.
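To make that ambiguity concrete, here is a small illustration (my own example bytes, not from the talk) of the same NLRI field decoded with and without the add-path assumption:

    import struct

    # Eight NLRI bytes that encode 192.0.2.0/24 with path ID 1 if the peer
    # negotiated add-path (RFC 7911); example values chosen for illustration.
    nlri = bytes([0x00, 0x00, 0x00, 0x01, 0x18, 0xc0, 0x00, 0x02])

    # Reading 1: add-path assumed, so the first four bytes are the path ID.
    path_id = struct.unpack("!I", nlri[:4])[0]
    plen = nlri[4]
    prefix = nlri[5:5 + (plen + 7) // 8]
    addr = ".".join(str(b) for b in prefix + bytes(4 - len(prefix)))
    print("with add-path: path id", path_id, "->", f"{addr}/{plen}")

    # Reading 2: no add-path assumed, so byte 0 is already a prefix length.
    # The very same bytes now decode into a string of bogus prefixes,
    # ending in an impossible length of 192 -- the "weird prefixes".
    i = 0
    while i < len(nlri):
        plen = nlri[i]
        nbytes = (plen + 7) // 8
        print("without add-path: length", plen, "bytes", nlri[i + 1:i + 1 + nbytes].hex() or "-")
        i += 1 + nbytes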
So, if you go down this road, we will have a lot of BGP or MRT entry types, and that will lead to even more errors in parsing. So, I would argue that this situation is not ideal, and I have put together a few case studies. I'll maybe go a bit quicker over them.
Are there still 2-byte-only BGP speakers, meaning BGP speakers that only support 2-byte AS numbers? Yes. We still observe BGP open negotiations that do not negotiate 4‑byte support, and we also have, not much, but it is in the data, 1.1% of MRT entries that are typed with the wrong 4‑byte or 2-byte type. The consequence is that a message is typed as a 2-byte message but includes 4‑byte AS numbers, or the other way around. And then you are simply putting too many bytes into one AS, or too few bytes into one AS, and that's how you get those weird AS numbers that we had before.
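A small illustration of that effect (my own example values): the same AS_PATH bytes read with 4-byte and with 2-byte AS numbers.

    import struct

    # The body of an AS_PATH segment with two 4-byte AS numbers, 64512 and
    # 65551 (segment type/length header omitted; example values only).
    path = struct.pack("!II", 64512, 65551)

    as_4byte = struct.unpack("!" + "I" * (len(path) // 4), path)
    as_2byte = struct.unpack("!" + "H" * (len(path) // 2), path)

    print("read as 4-byte ASNs:", as_4byte)   # (64512, 65551)
    print("read as 2-byte ASNs:", as_2byte)   # (0, 64512, 1, 15) -- bogus ASes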
We also still see the transitive attribute that was used by non‑4‑byte speakers when receiving 4‑byte AS numbers out there, but those are not the biggest problem; it's more the ill‑typed MRT entries.
Let's look at multiple paths, BGP add‑path, which some of you might have already enabled.
It has been tested by Route Views, at least the data suggests that, and it's also getting checked at Packet Clearing House. There is an interesting point: add‑path will increase data sizes, that's somewhat right, and that's, I think, why some of the providers haven't enabled it. But it will come, because it's important information and researchers do need this information. And when we look at the anomalies that can be attributed to the BGP add‑path feature, or add‑path‑enabled collectors, we see a clear correlation, especially in the IPv6 chart: between 2022 and 2023, during the time frame where collectors enabled the add‑path feature, we had a lot of additional anomalies.
So, it's necessary that both the exporter and the BGP speaker that exports the data and the parser support new capabilities at the same time, otherwise you get anomalies in your data instantly.
So what are the lessons learned? We had to implement our own parser. We don't have too much time left, so I won't go into details. I wanted a parser that has more flexibility. I am a Python fanatic, so it had to be in Python, and it's a nice tool, but I won't go too much into detail here. It has a nice API. You can use it out of the box, and I just published it, so you can give it a try.
What is my summary?
We have raw BGP data that requires interpretation and interpolation. We have dialects and artifacts. We would need knowledge of peer capabilities, but we have no way in MRT to get this information. That's not entirely correct actually, but it's very hard to get this information out of MRT data. Adding new features to the standard can lead to data corruption, but the situation has improved with better exporters, most exporters, not only parsers, but with better exporters; still, historic analyses remain a problem. And if you go down that route, you can even craft certain BGP messages that crash certain parsers, I won't go into details here. This is a work in progress. We are working on a scientific paper that will be submitted this year. We are publishing a preprint soon. We are also looking for collaborators to improve this weird MRT standard, like adding peer capabilities or also RPKI features to the standard.
And you can try out the parser if you like.
Thanks a lot.
(Applause)
AUDIENCE SPEAKER: Tom: Super interesting, thank you for that. I mostly just have a quick question, wondering what your data sources for the MRT data that you are throwing through your parser are. Was it just Route Views, RIPE RIS?
JOHANN SCHLAMP: Maybe you missed the disclaimer: it's RIPE RIS, Packet Clearing House and Route Views, for single days and then multiple days throughout the years. All the data that's publicly out there, yes.
AUDIENCE SPEAKER: Super interesting talk. I am Colin Petrie, I am also the maintainer of bgpdump, and author of the RFC that defined the add‑path extensions for MRT. You were talking at the end about all the issues with the different capabilities and things like that. I would be super happy to talk to you and work with you, especially if you want to do some work on the MRT protocol, things like that.
JOHANN SCHLAMP: Yes, I appreciate that, thank you.
AUDIENCE SPEAKER: Steve Wallace. The link you posted gives me a 404 error on GitHub.
JOHANN SCHLAMP: Maybe there is a typo in there. It will be released in maybe five minutes. I uploaded it yesterday, so ‑‑ sorry.
AUDIENCE SPEAKER: Emile Aben: Do you have any ideas on how route collector projects could record the capabilities?
JOHANN SCHLAMP: Yeah, I do have several ideas on that, yes, but I think it's a complicated issue, because if you want to study a small interval of time, you have a small BGP update file and it doesn't include the capabilities, because you don't want to replicate those capabilities for every one-minute file or every five-minute file. So, in my opinion, it would be best to have a third type, alongside table dumps and update messages: peer capabilities that are stored separately and are updated with a time stamp when needed. But there are other options, so we could discuss that offline. Yes.
AUDIENCE SPEAKER: Andrew: Thank you for the talk, and I have a question: your own parser, how does it address the issues you mentioned?
JOHANN SCHLAMP: Actually, that's a good question, because one of my main points when talking to a scientific audience is: you have to state the parser which you are using, and ideally the parser has to state which design decisions it has made. So, it's a complicated question, because there are hundreds of design decisions. You can abort the parsing at any time, but you somehow have to make clear to the user what you are actually doing. And I don't have a solution for that besides having lots of comments in my code. We maybe have to agree on certain decisions that must be published to understand what a parser is actually doing. But yeah, that's an open issue and a work in progress.
IGNAS BAGDONAS: Last call, any other questions, comments? A round of applause.
TIM BRUIJNZEELS: Hi, I'd like to give you an update today on the RPKI at the RIPE NCC, but I would also like to talk quite a bit about the infrastructure and processes, especially regarding the trust anchor that we have at the RIPE NCC.
So, this is what I just said. I'll start with a small update, talk about the infrastructure and close with highlights of what's next for us.
So, first big thing we did. We, and this has been sent out to the mailing list as well, we recently released a new dashboard for the RPKI. The link is here. Please try it out. We still have the old dashboard around actually, if you really want to use it, or need to. When we deployed this, we got one small feature request, to add something for delegated CAs: the ability to view which resources they had. And Alex reported a small issue that we have since figured out, but other than that, we got no response, which I guess is a good sign. But please do feel free, if you have any comments or questions, to talk to us.
Now, we did all this work mainly, well, for two reasons: we wanted to improve the usability, but we were also facing an ageing technology stack that we essentially needed to redo to be ready for the future and be able to add new features.
When you look at the interface and how it's changed, if you are a regular visitor, what you might notice immediately is that the landing page is now an overview page. We have done this because we want to make it clearer to people that we have these other features that were kind of hidden in the past. Like, you can subscribe to e‑mail alerts, so you get e‑mails about announcements that are not found or invalid compared to the ROAs you configured. And you can also review the history of changes that were made.
The actual management of ROAs is very similar to what we had in terms of functionality. The look and feel is different, but what you'll see most prominently here is that we have an additional tab about pending changes. This is because, well, if you make a ROA now, you get an explicit question: do you want to apply this change now, or do you want to keep making changes? In the previous interface, it would always do the latter. And what would happen is that a button would pop up in the bottom right of your screen saying review your changes in order to apply them, and some people actually missed that. That was one of the things we found in the user experience research that we did: people thought they were done, but they weren't. Now we make this choice much more explicit, and hopefully, if you keep making changes here, you'll realise that you need to look at that tab, pending changes.
Obviously there is a lot more, but I would say the best way to get to know this is by really looking at it. So, please do so, and let us know if you have any questions or comments.
The other big thing that we have been working on, and I won't go into a lengthy report on this because it might not be that interesting to everybody, but it is a big thing: we worked on a SOC 2 Type I assurance report. This is quite important when it comes to regulations, and it was talked about quite a bit yesterday how that is relevant, that we can show that basically we're doing the right thing. It is a bit more than that, if you look at it from a practical, operational point of view; that's what I'll focus on here.
SOC 2 essentially uses these trust service principles, as they call them, that focus on different aspects like security, availability, confidentiality and process integrity. And what you make based on this is essentially controls, which is a fancy way of saying a description of things that you do in terms of policies, processes and your infrastructure that are supposed to meet the challenges defined here on the left‑hand side.
What the SOC 2 Type I assurance report tells us essentially is that (a) we have controls that are applicable to the problem domain and (b) we can provide evidence that we do these things. A Type II assurance report is planned for next year. Essentially, the big difference between Type II and Type I is that Type I is the starting point, and for Type II, well, you need to prove that you continue to do these things. And if you change things in your processes or infrastructure, you can, but you have to document why and how, obviously.
Now, this brings me to the infrastructure and processes that I actually hope are more interesting to this group. And to start off, I want to give a high-level overview of how the different RPKI CAs work in our case.
We have an offline trust anchor operating on a laptop. We do resigning sessions every eight to twelve weeks, eight weeks preferably, because twelve is cutting it close. The requests come out of an online system, and the responses go back there. In the online system, we have an 'all resources' CA, as we call it; the trust anchor and the CA below it claim all the addresses, the IPv4 and IPv6 resources, all the ASNs in the world, because if we resign only every eight weeks, we cannot make changes quickly. So that's why we have this.
Underneath we have an additional CA which limits the set of resources to the resources that the RIPE NCC actively manages. This can change every day, based on transfers between RIRs, for example.
Below this, we have the member CAs and their ROAs. I'll talk about the signing process in a bit more detail now.
So, for the trust anchor, or TA for short, we use a laptop and a USB based HSM connected to it. We keep this laptop turned off, always offline actually even when it's turned on, in a safe, in our office, but in order to use it, we need to have at least three out of ten cardholders present to convince the HSM that signing can happen.
If we look at the online system that we have, we don't actually have a distinct thing in that code called a trust anchor proxy, but in conceptual terms that's what we have. We have something there where one of our engineers can go in and generate a request for the trust anchor to do a resigning of the manifest and CRL, and/or re‑issue a certificate to the underlying CAs.
This request is put on a USB stick, and then we use what we call sneakernet to, you know, go over to this laptop that is offline, so that we can access the request there. We start up the software, three out of ten cardholders have to present their card and password, and after that the signing process happens. When the signing process is complete, we make a backup of everything, which is stored, obviously. This backup is safe, because it only contains data that is in a way public; the actual key is obviously kept in the HSM, so that's not in the backup. But, you know, even though that's the case, we still put it in a secure backup location. We then get a response that is uploaded to the online system, and the online system then takes care of publishing the manifest and whatever the trust anchor needs to publish, because the offline system can't reach our servers, obviously.
So, that's essentially the overview of how that signing process works.
Now, another aspect to look at is the availability of our systems, and I'll talk about the repository in a bit. This is mainly about the CA service, let's say. So the dashboard you look into, where you create your ROAs and so on.
We have two data centres that we use for this, with a redundant setup of front‑end VMs running in each that can connect to the core backend VMs, and if one of them fails, they can connect to the other one. There is a primary node here, like a primary core that is responsible for background jobs; we don't want two systems running background jobs at the same time. So if you have an issue here, our 24/7 engineer can come in and move the role over to the other node. We use network HSMs here, so they don't require cards to be presented on every change, because that would be impractical. But they do provide security in terms of safe randomness, and they ensure that our backups don't contain clear-text keys.
These are also set up in a redundant fashion, so the core machines can connect to both HSMs; if one of them becomes unavailable, they can connect to the other one.
Then finally we have a database layer where we have a primary node, we use PostgreSQL for this, and we have a replication node. The failover from primary to secondary here is a manual process that we actually do when we patch these machines, so that involves some downtime, unfortunately. We have been thinking about doing active-active database setups and we might get there, but yeah, that's not trivial either. And in practice, this has been working quite well for us, except for the unfortunate downtime that is involved when we need to patch these VMs.
Okay. So publication server times 3. I'll try to shed some more light on that.
The CAs from the previous picture, well when they have done all their signing, they want to publish their content somewhere. And the way we have configured this is that we actually have three completely independent publication server instances. They don't share any state. So we have one running in one data centre, one in another and we still have a backup in the Cloud for this. This is public data, so in that sense, there is no concerns around privacy or anything regarding the content.
We prefer to use the internal ones though, but we have that one as a backup.
Furthermore, so on the right‑hand side I am focusing on RRDP, and for those not familiar with RPKI: there are essentially two ways to retrieve your objects, you can use rsync or an HTTPS-based protocol that's called RRDP.
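For those who have never looked at RRDP on the wire, a relying party's first step is roughly the following (a minimal sketch of my own; the notification URL is the publicly documented RIPE NCC RRDP endpoint and should be treated as an assumption here):

    import urllib.request
    import xml.etree.ElementTree as ET

    # Fetch the RRDP notification file and list what it points at
    # (structure per RFC 8182: one snapshot plus a series of deltas).
    NOTIFICATION_URL = "https://rrdp.ripe.net/notification.xml"
    NS = "{http://www.ripe.net/rpki/rrdp}"

    with urllib.request.urlopen(NOTIFICATION_URL) as resp:
        root = ET.fromstring(resp.read())

    print("session", root.get("session_id"), "serial", root.get("serial"))
    print("snapshot:", root.find(NS + "snapshot").get("uri"))
    for delta in root.findall(NS + "delta"):
        print("delta", delta.get("serial"), delta.get("uri"))

A validator keeps the session ID and serial and only fetches the deltas it has not seen yet, which is what makes RRDP cheaper than repeatedly rsyncing the whole repository.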
So for RRDP we have a name in DNS that essentially resolves to a CDN that we use. We have two CDNs, so if we don't want to use one, we set it up the other way. We do weekly switches, so we are comfortable that we can do this, that we can switch over to the other CDN. In part, this is to practice: you do it regularly, so you know how to do it, because when we suspect that there might be an issue between our CDN and validation software out there, instead of spending time to debug this, we can just make the switch, which is usually quicker. Now, if one of the backend publication servers were to go down, then we just need to reconfigure our CDN to use one of the other ones, so that involves logging in and making a change.
When we look at rsync, we have two points of presence for that: rsync VMs that get their content by looking at the RRDP state of the internal primary publication server, primary meaning the same one that is used for the CDN, because we don't want to have differences between what's in rsync and what is in RRDP publicly, if we can help it.
So, that involves a load balancer, if one of the nodes would become unavailable then the other node takes over.
By and large, RPKI relying party software prefers RRDP, so we don't get a lot of traffic on these, but they are there for backup purposes at least, because there might be a fallback. And rsync is also quite useful if you want to debug things in RPKI: if you want to have direct access to certain objects and inspect them, it's quite useful.
And that brings me almost to the end.
Obviously there is more about infrastructure that I could talk about, so if people have questions, please do ask them.
But now I want to move on to what's next? So back to the update part of this presentation.
Future work highlights: one of the things we want to improve is ROA history. We do have history in our system, but you need to go through it line by line essentially, and there is a text representation of what changed. What we would really like is to make it easier for people to see exactly what changed with regard to their ROAs, who did it, when, and maybe cherry-pick, roll back a certain change, or even go back to a point in time, at which point it would tell you: okay, review this, and we believe this is the impact it would have on your current routing based on the information we have. Are you sure? And then of course, if you are sure, you can do it.
Another thing is that the BGP preview information, as we kind of call it, is based on the RIS dumps, and they can be up to eight hours old, so the feedback you are getting here can take a while. But we do have systems that have this information much more quickly internally, so one of the things that we'd like to improve on is to get that information quickly and show it to users.
Then there are also, of course, the RPKI object types that are being discussed or have been discussed. There is ASPA, AS Provider Attestations, which we believe can really help to improve the applicability of all of this. And we'd like to implement that as soon as it is essentially ready in the IETF.
There is BGPsec, which we can implement fairly easily on the API level. And that would be a good start, so that people can work with this in practice and move it along, and I think if the uptake then shows that people want more, we can build a UI for it as well.
And finally, I'm not sure how familiar people are with this one: RSC, the RPKI Signed Checklist. That's an idea where you have detached signatures in the RPKI, saying 'the resource holder signs basically anything', which might be useful for prototyping, for bring-your-own-IP, for that kind of application. But we're still reviewing whether we can do this, so that's still ongoing, because there might be implications here. To shed a bit of light on that: when we talk about ROAs, for example, it's really clear what the problem space is, and with RSC that's much more open, so this needs further analysis.
And with that, I actually got to the end of my slide deck. I'd like to open the floor for questions.
(Applause)
IGNAS BAGDONAS: Any questions or comments?
AUDIENCE SPEAKER: Silvan from Openfactory: If nobody else is asking a question, I would like to ask for the dashboard to not make it red when there is nothing causing issues, because I always stumble over this. It's red, there must be a problem, make it grey please.
TIM BRUIJNZEELS: The dashboard is making things red when there is no problem?
AUDIENCE SPEAKER: Can you go to your slide with the dashboard at the beginning? It's also red when 'causing invalid announcements' is 0; it's red, so I think there is a problem, but there is no problem.
TIM BRUIJNZEELS: Okay, good point. We can look at that. And ‑‑ okay, yes, I understand what you mean. It's saying causing invalid: 0. So perhaps we should not say it, but then that could also be confusing. So, with our UX people in the house, we need to think a bit about how to manage this, perhaps tone it down a bit so that it doesn't jump out but it's still there. Good feedback. We'll look into that.
AUDIENCE SPEAKER: Hi. Ben Cartwright‑Cox: A question I have is, as the NCC moves more stuff to AWS or Cloud providers, are you looking into moving the HSMs as well? Some of them offer Cloud HSM products, which may be easier; I don't know what the state of your HSMs or requirements is.
TIM BRUIJNZEELS: In the short term we have no plans to move the core of the RPKI into the Cloud. We have our own HSMs. I know there are Cloud HSMs, but we're not keen on using a Cloud for this part. We use the Cloud for the data, and that serves us really well. But what we are looking into is that currently we have two data centres, and, well, one of them was taken over by the same provider as the other data centre, so we're actually looking into changing ‑‑ well, moving away from one of them and adding another one, so that we don't have a dependency on one organisation there. In terms of Cloud usage for RPKI, I think it's mainly data, and something like the publication server only contains public data, so we're less reluctant there. But to put our crown jewels, let's say, in the Cloud ‑‑ yeah, we're not keen on doing that; we don't see a huge case for doing that just yet.
RANDY BUSH: I just want to say thank you for the transparency and lucid explanation of what you are doing. I think it's admirable. Thanks.
TIM BRUIJNZEELS: All right. Thank you.
IGNAS BAGDONAS: Any other comments? So, I have one, and I'll move to the correct microphone to ask the questions.
So you were talking about the trust anchor, which is implemented as a very secure laptop hidden somewhere in a concrete bunker, and it is disconnected. Well, if it's disconnected, say the flash-based storage may degrade, or something else might happen, the battery may leak. Well, it's just a technical aspect. What do you do about that? Do you have, say, contingency plans for this?
TIM BRUIJNZEELS: Yes. So we plan the resigning sessions well in advance of when we need to do them. There is a next update time on the manifests that's typically three months; we try to do the signing sessions one month before it would expire. Essentially, we can use commodity hardware to replace any broken hardware, unless of course we do upgrades of the OS and whatnot ‑‑ and that's actually something we have done recently. We bought a new laptop, we wanted to use a new system, and then it turned out that that actually had issues, but we were able to use the current laptop while we were working out the OS and the libraries involved to talk to the HSM, etc. If the current laptop would break, we can essentially just buy one somewhere, install the same software on it, and use it.
For the HSM, we have a contract that essentially ensures that we can get a replacement HSM well within the time period that we need. And then well we would have to do a restore process on that one, that again involves cards, pass phrases, but this is a process that we understand well and can do if we need to.
IGNAS BAGDONAS: Thank you.
AUDIENCE SPEAKER: What kind of contingency plan do you have for the trust anchor backups and cards if for some reason the NCC office exploded? Do you have anything so you could restore if everything in the NCC office was lost?
TIM BRUIJNZEELS: Well, yeah, maybe it depends on the size of the explosion, but generally speaking we keep the laptop and the HSM in the office, but we don't keep the cards in the office. So, if, you know, something would happen to Amsterdam Central Station and our whole office would burn down, we still have access to people with cards and we can get new hardware and, you know, take it from there essentially.
Obviously if you are talking, you know ‑‑ let's not go there, but let's say a global catastrophe, then things may become more difficult, because if you don't have access to your staff any more, then at some point you are out of options. But in this you also look at: okay, what is the likelihood of something happening, what is the impact, and what can you actually do? So, I think we are trying to strike a balance here where we feel confident that, you know, for the most likely issues that could happen, like a fire at Central Station or something, we can definitely recover from that. Like, you know, a big bomb on Amsterdam, yeah, maybe not that.
IGNAS BAGDONAS: Right. Thank you, Tim.
(Applause)
PETROS GIGIS: Hello. Today I will share with you an idea that might be useful for ISPs, and this is joint work with my supervisors, Prof. Mark Handley and Stefan Vissicchio.
Let's consider a network topology consisting of three ASes, and let's take the perspective of AS1. AS1 expects traffic originating from its customer AS2 to traverse their direct link, generating a profit. However, traffic may follow a different path, and this can be due to a misconfiguration, or maybe AS1 failing to advertise a route to AS2. Normally, a network operator could use sFlow or packet counters to detect this and start troubleshooting. What can happen in reality is that traffic with source IPs of AS2 arrives via AS3, and this traffic is spoofed.
In this case, relying solely on packet counters would not reveal what is actually happening, and if we mistakenly consider the spoofed traffic to be legitimate, we may start troubleshooting a routing issue that never existed.
So, the question is: how can we effectively distinguish legitimate traffic from spoofed traffic? Unfortunately, existing techniques cannot solve this problem, and the only solution is to look at the data plane and inspect the traffic.
In this work, we look for legitimate traffic, and we define legitimate traffic as traffic that originates from where it claims to originate. The idea is to use legitimate traffic as a routing signal and build a system on top of this that minimises false alerts and notifies the operator only when legitimate traffic ingresses at an unexpected location.
Apart from this use case, we have envisaged other use cases, which I will talk about later: BGP hijack detection, route leak detection, and detecting suboptimal paths across transit links.
One deployment that an ISP could use to test whether traffic is legitimate or spoofed can be summarised in this diagram. So we have a PoP. Traffic is ingressing from AS3 into AS1 and then proceeds to its destination. The ISP could deploy an x86 box, slice off part of the traffic, analyse it, and send it back to the network. So, the next question is: can we distinguish legitimate traffic from spoofed traffic? Is it easy? The short answer is: it's not easy at all. The traffic that we observe may be impacted by many external factors. These factors include the implementation of the transport protocols, the network conditions, for example packet loss, but also potentially malicious users that send traffic on different paths.
However, there is a fundamental difference between legitimate traffic and spoofed traffic: legitimate traffic consists of some closed‑loop flows. With the term closed‑loop flows, we refer to flows where the sender sends data, elicits a response, and then sends more data. And the key idea behind this work is that we can use the closed‑loop traffic as a proxy to detect legitimate traffic.
So, how can we detect closed‑loop flows? The answer is: we can do it by interacting with the traffic. To do this, we are going to introduce a module called the checker, and the checker sits between the sender and the receiver. Unfortunately, due to routing asymmetries, the checker will see only one direction of a flow. To determine whether the flow is closed‑loop, the checker can tweak the traffic between the sender and the receiver, and then the reaction will propagate from the receiver to the sender and come back past the checker. We consider TCP to be the best candidate protocol, because it is inherently a closed‑loop protocol.
And the easiest and simplest action that we can take in order to trigger a TCP reaction is to drop a packet, and we expect any TCP stack to retransmit the missing bytes.
So the idea sounds very simple so far: we drop a packet, and if we observe a retransmission, the flow is closed‑loop; if we don't, the flow is not closed‑loop. Is it that simple? Unfortunately not. After we drop the packet, many things may go wrong. For example, in the scenario of a closed‑loop flow, we may drop a packet and the retransmitted packet may follow a path that bypasses the checker. On the other hand, a sophisticated attacker may send packets in order to mimic retransmissions: if we drop a packet, we may wrongly characterise a duplicate packet as a retransmission. In this case, we would consider a flow that is not closed‑loop to be closed‑loop.
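In its simplest form, the per-flow check amounts to: after deliberately not forwarding one data packet, does a later packet in the same direction cover the missing sequence range? A toy sketch of that check (my own simplification, not Penny's code):

    # Did any later packet re-cover the byte range of the packet we dropped?
    def retransmission_seen(dropped_seq, dropped_len, later_packets):
        """later_packets: iterable of (seq, length) observed after the drop."""
        end = dropped_seq + dropped_len
        return any(seq <= dropped_seq and seq + length >= end
                   for seq, length in later_packets)

    # Example: we dropped bytes [1000, 1100); a later packet resends [1000, 2460).
    print(retransmission_seen(1000, 100, [(2460, 1460), (1000, 1460)]))  # True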
It becomes obvious that the signal from a single packet drop is weak and noisy. So, is there some way we can improve this? The answer is yes. Instead of dropping a single packet, we can drop a few packets until we get enough confidence about the nature of the traffic. Let's see our approach in practice.
The black line represents the aggregated traffic arriving at the analysis box. Over time, we drop a few packets, indicated by the red marks. These packet drops occur across different flows, minimising the impact on performance but also providing a very strong signal about the nature of the traffic. And now let me introduce our system, Penny. Penny is a traffic checker designed around a model consisting of two competing hypotheses. The first hypothesis is that the traffic is closed‑loop, and the second is that the traffic is not closed‑loop.
Penny drops packets with equal probability and gains confidence in one of the two hypotheses based on the outcome of each packet drop. After a few packet drops, we check whether one of the two hypotheses dominates the other.
Here is the maths of our model. Our model consists of counters, variables and probability parameters. The maths evaluates the likelihood that a closed loop generated the observed retransmission packets, taking into account potential missed retransmissions and the case of receiving duplicated packets. To help you, I'll give you an idea of how the two hypotheses evolve over time.
As in the first figure, when the traffic is not closed‑loop and the number of packets increases, the first hypothesis tends to zero, and when the traffic is closed‑loop, as in the second figure, H2 tends to zero. In other words, it works.
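A minimal sketch of this kind of sequential, two-hypothesis test (my own simplification; the per-drop probabilities and the decision threshold are placeholder assumptions, not the parameters of Penny's actual model):

    # H1: the aggregate is closed-loop (a dropped packet is usually retransmitted).
    # H2: it is not closed-loop (an apparent retransmission is a coincidence).
    P_RETX_H1 = 0.95   # placeholder probability of seeing a retransmission under H1
    P_RETX_H2 = 0.05   # placeholder probability of a mimicking duplicate under H2

    def update(l1, l2, saw_retransmission):
        """Multiply each hypothesis likelihood by the probability of the observation."""
        if saw_retransmission:
            return l1 * P_RETX_H1, l2 * P_RETX_H2
        return l1 * (1 - P_RETX_H1), l2 * (1 - P_RETX_H2)

    l1 = l2 = 1.0
    outcomes = [True] * 11 + [False]        # e.g. 12 drops, 11 retransmissions seen
    for seen in outcomes:
        l1, l2 = update(l1, l2, seen)

    print("closed-loop traffic present" if l1 / (l1 + l2) > 1 - 1e-6
          else "undecided / not closed-loop")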
And before proceeding to the evaluation results, I need to highlight that in order to create this probabilistic mechanism, we had to deal with a lot of challenges, and these challenges have to do with the protocol itself; for example, we had to deal with reset packets and expiration timers, but also with network conditions such as external packet loss or load balancing. But also, we had to deal with the scenario where someone knows exactly how our mechanism works and tries to bypass it. And if you are interested in learning more about the actual model, you can read our second paper.
For the evaluation, in order to check what can go wrong, we tested multiple variants, diverse network conditions and different types of traffic, and we also played with the open parameters.
We found that if you drop exactly twelve packets, the chances of a false alarm are one in a million. In the worst-case scenario, we considered that someone knows exactly how our model works and tries to bypass it by trying to guess which packets we drop.
So, the best attacker strategy is to duplicate packets at a rate just below our duplicate threshold, and we saw that dropping exactly twelve packets makes the chances one in a million.
We also found that we can detect legitimate traffic in a mixed aggregate even when it accounts for only 10% of the traffic. Furthermore, in our performance evaluation, we found that if we run this mechanism across an aggregate, the impact on the completion time of the TCP flows is minimal. This can be shown in the following experiment.
In this experiment, we consider a simple topology with some TCP background traffic and 100 non‑spoofed flows. We plot the CDF of the flow completion time. The blue line represents the flow completion time when Penny is disabled, and the red line when Penny is enabled. As we can see, the difference between the two lines is minimal, and in this scenario we dropped exactly twelve packets.
And now, let me go back and try to describe additional use cases that we thought for this mechanism.
The first one is BGP hijack detection. In this scenario, let's take the perspective of AS1. AS1 expects traffic with source IPs from AS3 to arrive over the direct link, as that generates profit. However, AS1 may observe traffic with source IPs of AS3 coming through the provider AS5. And this can be because some AS did a hijack and now the path is AS3, AS2, AS5, AS1, and if AS1 does not have any monitors in AS3, it's impossible to detect this. However, it may also be the case that the observed traffic is not legitimate, but spoofed traffic using source IPs of AS3. So, in this case, AS1 does not know which case it is in. We consider that testing the traffic on the data plane can provide a signal about its nature.
Another use case that we thought of is route leak detection. In this scenario, AS1 monitors the traffic from AS2 and alerts AS2 about any route that AS2 may have unintentionally leaked. To do this, AS2 provides AS1 with a list of networks for which it doesn't want to provide transit, for example its providers. At some point in time, AS1 observes traffic from AS2 with source IPs that are not on this list. Again, there can be two cases: the first is that AS2 unintentionally leaked some routes to AS3, or someone is sending spoofed traffic. Using Penny, AS1 can test the unexpected traffic and notify AS2 if there is some legitimate traffic.
And finally, another use case would be the detection of unexpected paths, for example transatlantic paths. AS1 expects traffic from AS3, towards a destination IP in AS2, to traverse the link between AS5 and AS1 located in Europe.
We now assume that we observe that traffic, instead of crossing the Europe link, traversing a link in the US. Again, this can either be the outcome of a configuration change in the routing in AS5, or it can be spoofed traffic with source IPs of AS3.
Using Penny, AS1 could check whether there is some legitimate traffic, and if there is, AS1 may want to contact the operators of AS5 and ask them for information about why that traffic arrives over the US link.
Let me now summarise the main takeaways from my presentation today.
Detecting spoofed traffic might be useful to detect and identify routing incidents such as misconfigurations.
This can be done reliably and cheaply on traffic aggregates by dropping a few packets and Penny is our proof of concept.
If you want to learn more about the mechanism, I suggest you read our second paper.
And now I have two questions for you:
The first question is: Will something like this be useful to you? And the second question is:
Can you think of other use cases that the sort of mechanism could be useful? Thank you.
(Applause)
AUDIENCE SPEAKER: Alexander: Thank you for your efforts. But I need to share a concern: that some network, some transit network, in order to get more confidence that my traffic is legitimate, starts dropping packets of my legitimate TCP sessions with my users. And if multiple transit networks follow this advice, it gives me very uneasy thoughts, unfortunately. And the question, just to give away these thoughts, the question is: when you were evaluating dropping TCP packets, were you paying attention to which packets are affected?
PETROS GIGIS: Regarding the first question, we drop a few packets across thousands of packets, and we only do it when we have some form of signal, in terms of seeing traffic coming from somewhere where it already looks suspicious.
And regarding the second part, in our mechanism we carefully pick which packets we drop and mark as dropped, in order to avoid, for example, dropping a packet that is a retransmission of a previously dropped packet, and to avoid causing the TCP flow to back off.
AUDIENCE SPEAKER: What about SYN packets?
PETROS GIGIS: We don't drop SYN packets. We only drop data packets.
AUDIENCE SPEAKER: Okay. Thank you.
AUDIENCE SPEAKER: Hello, Aleksi: I have a concern with your work. It's really great work, but when you take reality into account, spoofed packets are mainly used as a way to perform DDoS amplification, and this kind of attack generally relies on connectionless protocols. So, currently you have done something with a connection-oriented protocol, and it has the goal of being really useful as a way to mitigate some kinds of DDoS attacks, but on the other hand you completely left out the connectionless protocols, where it would be most useful to have this kind of detection mechanism.
PETROS GIGIS: In this work we are not interested in finding spoofed traffic. We are interested in finding legitimate traffic, at least some legitimate traffic, as we think that if there is legitimate traffic arriving at that location, there is a routing path that leads legitimate traffic to that hop. In our mechanism we primarily focus on finding at least some legitimate traffic, not on detecting spoofed traffic.
AUDIENCE SPEAKER: Please help me, my poor understanding of protocols may make me think in the wrong direction. If I understand TCP correctly, you need the closed loop to get to the data section at all, because you have to respond to ACKs and so on. That means if you have data packets with increasing sequence numbers, you already have the closed loop and you know it. If you have spoofed traffic, what's the purpose? The only place we have seen spoofed TCP packets in the real world is TCP resets, where they are trying to stop an existing connection by trying a lot of sequence numbers and hoping that some of them go through. So I do not see the reason for the whole project; besides the academic interest, it's wonderful, but for the practical purpose, I am lost.
PETROS GIGIS: It's a good comment, but do we know, by looking at one direction only, whether there are people mimicking TCP connections? How can we check, by looking only at one direction, just seeing the sequence space increase, that there is a closed‑loop flow? There might be traffic patterns like that already out there, and the question is how we can detect this.
AUDIENCE SPEAKER: I have to think about it. Thank you.
AUDIENCE SPEAKER: There is one question online from Janus Paulis, he asks "what if there is no incoming TCP traffic to drop?"
PETROS GIGIS: That's a very good question. We consider that there might be some other mechanism for QUIC, now that it's getting more and more popular. But the problem with QUIC is that everything in the packet is encrypted, including the header, so dropping a QUIC packet does not provide an insight, even if the next packet is a retransmission. One approach would be to try to play with the spin bit of QUIC, to come up with a statistical model, and by tweaking the spin bit of QUIC packets see if we can get an accurate signal. And another approach, which is a bit more aggressive: if we drop the first packet of a QUIC connection, it might fall back to TCP.
BEN COX: Okay, I think that's it. Thank you very much.
(Applause)
IGNAS BAGDONAS: So, on the agenda the next one we had Job Snijders talking about something interesting, but he cannot be here, so that is not happening and we're moving to the next part of the agenda.
PAUL HOOGSTEDER: Hello. At the last Routing Working Group session, half a year ago, I announced that I would step down as co‑chair of this Working Group, as I have also joined the Connect Working Group. Therefore, we need to do a Chair selection. Over the last two meetings, we talked about what we found important in how to run an actual selection, it's not an election, and we came up with a process.
Before the meeting, I put out a message on the mailing list asking people who are interested in joining as co‑chair to send an e‑mail to the current Chairs, not to the mailing list. We waited two weeks. We confirmed the names of the two candidates on the mailing list, and we hope to select one here.
Antonious Sheridan, I hope I say that correctly, and Sebastian Becker, are you two in the room? If you could come up on stage and very shortly introduce yourself, and state why you want to help Chair this Working Group.
SPEAKER: Hello everyone, I'm Antonious, very good pronunciation on your part. I currently work at Cisco. I have been doing networking for around ten years, and have done security, privacy and networking research as well. I have tried to push for initiatives in IPv6, and RPKI, and I think that's a short enough introduction. Thank you.
SPEAKER: My name is Sebastian Becker, I work for Deutsche Telekom, that small local ISP in Germany. I have been doing routing for more than 22 years at that company, being in operations, configuration, engineering and now finally the one and only peering manager for Deutsche Telekom. I hope that brings some qualification to be a Chair here. Thank you.
PAUL HOOGSTEDER: Thank you both. You forgot friendly ISP. I would like to ask the RIPE NCC to show the poll in Meetecho. People have five minutes to respond after which we will stop the poll. Thank you.
BEN COX: If you need to find your Meetecho, you want to go and search your inbox for Meetecho, that might have been sent to you because they sent one for the General Meeting, assuming you were in there. So look for the first one. Slowly watching the participant number going up.
PAUL HOOGSTEDER: If someone wants to tell a joke, you can come on stage.
BEN COX: I have a joke about golf, but it's not really up to PAR!!!!!!
STENOGRAPHER: What do you call an Irish woman on a clothes line?
Peg!
What did the zero say to the 8?
Nice belt!
(Thought you'd appreciate that one)
AUDIENCE SPEAKER: What makes humans most intelligent? It's the only species in the world that knows why they got extinct.
AUDIENCE SPEAKER: An IPv4 address walks into a bar and asks for a cider and says, I am exhausted!
AUDIENCE SPEAKER: Do you want to hear a TCP joke? I know one. Do you want to hear a TCP joke? I know one.
AUDIENCE SPEAKER: It sounds like we became the not Working Group very quickly.
PAUL HOOGSTEDER: Hopefully work will be done when I'm gone.
AUDIENCE SPEAKER: There are 10 kinds of people: those who know binary and those who don't.
AUDIENCE SPEAKER: Surely the not Working Group is the DNS one!
IGNAS BAGDONAS: We have ten seconds left. This was one of the more memorable meetings in a long time. Typically the Routing Working Group overflows; this was a severe underflow, but that's over. (Blame Job!)
IGNAS BAGDONAS: The results are in, Antonious 31 and Sebastian 41.
Congratulations to Sebastian.
PAUL HOOGSTEDER: I am out. And congratulations to my new co‑chair.
BEN COX: I think that is it. So we will give you back some time. You get a whole eight minutes extra for the coffee break. So thank you everybody for joining. And what's the next session in here? The Community Plenary is next in here.
(Coffee break)