CZ.NIC ODVR problems

Today, 23. 6., since about 8:30 in the morning (Czech time), I have been experiencing problems with DNS resolution. I’ve discovered that the problem is caused by having CZ.NIC configured as the forwarder; switching to Google or Cloudflare solves the problem…

For me it started already yesterday, 22. 6., at about 22:00 CET. I thought it was the problem with kresd that happened a few months ago. Switching to Cloudflare works, switching back to CZ.NIC doesn’t, so the problem is probably with the CZ.NIC servers.

dig @odvr.nic.cz google.com appears to be working whilst

dig @193.17.47.1 google.com and dig @2001:148f:ffff::1 google.com do not, and report

connection timed out; no servers could be reached

So what is the official status from NIC? Pepe from the NIC team has silently corrected my announcement but gave no further information. Does NIC have a status page where we could check the status of the services it offers?

He is with the TO developer team, not with the division that operates the DNS servers, but perhaps he is reaching out to them.


Whilst this does not solve the issue, just as a note: it is advisable not to rely on a single upstream forwarder, since that makes it a single point of failure, this being a case in point.

To reduce the risk of failure, it might be worth specifying multiple upstream servers.
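A minimal sketch of that idea in Python, assuming a client that queries resolvers directly: try each upstream in turn and use the first one that answers without error. This is illustrative only and is not how the router's resolver is configured. `build_query`, `query_udp`, `first_working` and `UPSTREAMS` are hypothetical names; the two CZ.NIC addresses are the ones tested above, and 1.1.1.1 stands in for Cloudflare.

```python
import socket
import struct

# Assumed example list: the two CZ.NIC addresses from this thread,
# plus Cloudflare's public resolver as an independent fallback.
UPSTREAMS = ["193.17.47.1", "2001:148f:ffff::1", "1.1.1.1"]


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet for the A record of `name`."""
    # Header: ID, flags (only RD set), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def query_udp(server, name, timeout=2.0):
    """Send one UDP query to `server` and return the response code.

    Raises OSError (including socket.timeout) if no reply arrives.
    The reply ID is not verified; a real client should check it.
    """
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        response, _ = sock.recvfrom(512)
    if len(response) < 12:
        raise ValueError("reply shorter than a DNS header")
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F  # low 4 bits of the flags word are the RCODE


def first_working(servers, name, query=query_udp):
    """Return the first server that answers `name` with RCODE 0, else None."""
    for server in servers:
        try:
            if query(server, name) == 0:
                return server
        except (OSError, ValueError):
            continue  # timeout, unreachable, or malformed reply: try next
    return None
```

Calling `first_working(UPSTREAMS, "google.com")` returns the first address that answered with NOERROR, or `None` if none did. The `query` parameter exists so the fallback order can be exercised without network access.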

I can confirm this issue.

The problem started yesterday, 2019-06-22; I noticed it around 8:00 CET, exactly 4 hours after my Turris Omnia was updated to 3.11.5 (perhaps related).

I observed the problems with the LXC service. No templates were present in LuCI, and commands over SSH failed when I tried to create a new virtual machine. It was exactly this problem:


Chybějící LXC kontejnery (Missing LXC containers)

and also reported here:

LXC Container - no templates


I temporarily switched to Cloudflare and the containers showed up. Eventually, I switched back to CZ.NIC DNS.


Today, 2019-06-23 around 2:00 CET I had DNS issues inside my LXC containers:

root@PSCorev2:~/dotnet# apt-get update
Err:1 http://security.debian.org/debian-security stretch/updates InRelease
Temporary failure resolving 'security.debian.org'
Err:2 http://deb.debian.org/debian stretch InRelease
Temporary failure resolving 'deb.debian.org'
Reading package lists... Done
W: Failed to fetch http://deb.debian.org/debian/dists/stretch/InRelease  Temporary failure resolving 'deb.debian.org'
W: Failed to fetch http://security.debian.org/debian-security/dists/stretch/updates/InRelease  Temporary failure resolving 'security.debian.org'
W: Some index files failed to download. They have been ignored, or old ones used instead.
root@PSCorev2:~/dotnet# wget https://download.visualstudio.microsoft.com/download/pr/50bc5936-b374-490b-9312-f3ca23c0bcfa/d7680c7a396b115d95ac835334777d02/dotnet-sdk-3.0.100-preview6-012264-linux-arm.tar.gz
--2019-06-23 06:29:17--  https://download.visualstudio.microsoft.com/download/pr/50bc5936-b374-490b-9312-f3ca23c0bcfa/d7680c7a396b115d95ac835334777d02/dotnet-sdk-3.0.100-preview6-012264-linux-arm.tar.gz
Resolving download.visualstudio.microsoft.com (download.visualstudio.microsoft.com)... failed: Temporary failure in name resolution.
wget: unable to resolve host address 'download.visualstudio.microsoft.com'

The problem persisted into the morning.

All problems were solved once I switched DNS in Foris to Cloudflare again.

I asked the people responsible for the CZ.NIC ODVR servers whether they know about this issue. Let’s not forget that it is the weekend, so the response may take a little longer than usual. Maybe @vcunat knows something that could help.

Yes, it is the weekend, but you administer a critical infrastructure service, or at least you present it that way.

@Pepe
@Radovan_Haban has a point! If that somebody were on a two-week vacation, would we be waiting two weeks or more for a fix? :thinking:

The issue is not caused by a TO software or hardware failure but by an external service provider; it is just a coincidence that the umbrella company is the same.

You are at liberty to utilize another service provider that works more to your liking.

What would you do if big G or CF fails - direct your disappointment at the TO developer team too?

@pepe was trying to speed things up, but you could always contact the NIC.CZ team in charge of the DNS service directly and see if that provides a better outcome.

Google has a status page, see https://status.cloud.google.com/
CF has a status page, see https://www.cloudflarestatus.com/

NIC has nothing, and they want to provide cybersecurity for our state…

You could suggest to NIC.CZ that they provide such a feature. Voicing your concern in this forum may not yield a result, since the forum does not cover the development of the DNS service provided by NIC.CZ.

I just got information that the culprit was found and it is fixed. We are sorry for any inconvenience caused by this.

3 posts were split to a new topic: Foris - add option for custom DNS servers

Hi All,

Around 12:07PM US Central time we began experiencing DNS lookup failures via a Turris Omnia 4.0 Beta11 configured for NIC.CZ DNS lookups in Foris.
A quick check from an endpoint showed connectivity by IP Address was working as expected.
Switching DNS Providers on the Turris temporarily resolved the issue as a workaround.

Testing DNS lookups directly from an endpoint to NIC.CZ via IP Address results in this error message:
[193.17.47.1] can’t find google.com: Server failed
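"Server failed" here most likely corresponds to the DNS response code SERVFAIL (RCODE 2), which differs from the earlier "connection timed out" reports: the server is reachable and replies, but could not resolve the name. As a hedged illustration, the Python sketch below reads the RCODE from the low four bits of the flags word in a raw DNS reply header; `rcode_name` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
import struct

# Standard response codes from the basic DNS specification.
RCODE_NAMES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def rcode_name(response: bytes) -> str:
    """Return the symbolic response code of a raw DNS reply."""
    if len(response) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    # Bytes 2-3 hold the flags; the RCODE is the lowest 4 bits.
    (flags,) = struct.unpack("!H", response[2:4])
    code = flags & 0x000F
    return RCODE_NAMES.get(code, f"RCODE{code}")
```

A monitoring check that only tests reachability would pass in the SERVFAIL case, so it helps to inspect the response code as well.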

It appears NIC.CZ may be experiencing another outage.
This was reported to NIC.CZ support via the chat interface a moment ago to request resolution.

A status dashboard would be a great addition to the service.

It would also be informative if we could get NIC.CZ added to the comparison metrics at:

We appreciate all the hard work from the CZ.NIC team and the Turris team!
:slight_smile:

Hi @utc.dm,

Thanks for reaching out to CZ.NIC via the chat interface. We would like more details about the issue you are experiencing. I’m going to send you a PM, as there might be some details that shouldn’t be posted publicly here.

Hi All,

As of around 2:05PM US Central time, the issue has been confirmed resolved.

Thanks to everyone at NIC.CZ and the Turris project for the quick response.

Not sure what it would take to apply for NIC.CZ to be included in the provider list at dnsperf.com, but being able to visualize the uptime statistics is great marketing.

We may set up an automated DNS lookup via our monitoring system in order to detect any NIC.CZ outage ASAP going forward.
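Such a check could look roughly like the Python sketch below, which probes several names instead of one, on the assumption that an outage may affect only some of them. `check_names` and `PROBE_NAMES` are hypothetical; the names are the ones that failed earlier in this thread, and lookups go through whatever resolver the system is configured to use.

```python
import socket

# Assumed probe list: hostnames that failed to resolve earlier in this thread.
PROBE_NAMES = [
    "google.com",
    "deb.debian.org",
    "security.debian.org",
    "download.visualstudio.microsoft.com",
]


def check_names(names, lookup=socket.gethostbyname):
    """Resolve each name and return a dict of {name: error text} for failures.

    `lookup` defaults to the system resolver; resolution errors such as
    socket.gaierror are subclasses of OSError and are recorded, not raised.
    """
    failures = {}
    for name in names:
        try:
            lookup(name)
        except OSError as exc:
            failures[name] = str(exc)
    return failures
```

A scheduled job could call `check_names(PROBE_NAMES)` and raise an alert whenever the returned dict is non-empty.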

I’m glad that it now works for you as it should! Thank you for reporting and for your kind words!

Actually I think an interactive approach is better in this case, i.e. people should be able to see that someone else is reporting a problem, even before anyone reacts – I wonder whether a special category in forum.turris.cz could be a good way.

In any case, we primarily want to avoid issues getting so far that a user notices a problem. Turris 3.x actually uses multiple IPs for our forwarding set up through Foris, so users of that shouldn’t have noticed anything (perhaps a slight slow-down); integration of this into Turris 4.x is planned.

We already had internal monitoring via DNS lookups :slight_smile: but in this case only some names were affected, so unfortunately the problem wasn’t detected. We’re thinking of ways to improve it.

My personal opinion: their way of showing the numbers strongly favors services meant to be fast worldwide, which has not been our aim (so far), so I don’t think we would look that good in there.
