NDB Cluster is a very interesting solution in terms of high availability since there is no single point of failure. In an environment like EC2, where a node can disappear almost without notice, one would think it is a good fit.

It is indeed a good fit, but reality is a bit trickier. The main issue we faced is that IPs are dynamic in EC2, so when an instance restarts, it gets a new IP. What’s the problem with a new IP? Just change the IP in the cluster config and perform a rolling restart, no? In fact, this will not work: since the cluster is already in degraded mode, restarting the surviving node of the degraded node group (NoOfReplicas=2) will cause the NDB cluster to shut down.

This can be solved by using host names instead of IPs in the config.ini file. What needs to be done is to define, in /etc/hosts, one entry per cluster member. Entries for the API nodes are not required. Here is an example:
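(The hostnames and private IPs below are only illustrative; use the ones matching your own cluster.)

127.0.0.1    localhost
10.10.10.10  mgmt
10.20.20.20  data1
10.30.30.30  data2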

The file must be present and identical, at least for the NDB part, on all hosts. Next, the NDB configuration must use the hostnames, like:
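(A sketch only, reusing the illustrative hostnames above; a real cluster obviously needs data directories, memory settings and so on.)

[ndbd default]
NoOfReplicas=2

[ndb_mgmd]
HostName=mgmt

[ndbd]
HostName=data1

[ndbd]
HostName=data2

# API nodes, no HostName so they can connect from anywhere
[mysqld]
[mysqld]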

This is, of course, a very minimalistic configuration, but I am sure you get the point. Let’s go back to our original problem and consider that data2 went down. You spin up a new host and configure it with NDB. Consider, for example, that the IP of the new host is 10.55.55.55. The first thing to do is to update all the /etc/hosts files so that they look like:
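(Same illustrative entries as before; only the data2 line changes.)

127.0.0.1    localhost
10.10.10.10  mgmt
10.20.20.20  data1
10.55.55.55  data2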

And now, you start the ndbd process (or ndbmtd) on data2. Since the management node read the /etc/hosts file before the change to learn the IP from which to expect the connection, you’ll get the “No free node” error. But then, when looping back in its connection handling code, the management node will read the new /etc/hosts file and, the second time, the connection will succeed.
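In practice, on the replacement host, that is something like the following (a sketch; the connect string assumes the illustrative mgmt hostname used above):

# on the new data2 host, start the data node pointing at the management node
ndbd --ndb-connectstring=mgmt

# on the management node, watch the node rejoin its node group
ndb_mgm -e "SHOW"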

Comments
Éric St-Jean

Yves,
No, you wouldn’t need to open up the firewall, because the nodes would still be connecting to each other through their internal IPs.

marcos.albe

Yves,
Hi, long time no see. Interesting exercise, but I’m not sure whether, and for what, a customer would want to run an NDB cluster on EC2.
I am particularly concerned about the network traffic the nodes generate, the I/O during the checkpoints, and how well it can really scale with parallel connections/requests.
How many threads did you set there? Did you also get any numbers/statistics from the different kernel blocks?
I am interested, so if you have any metrics and can share more information on the settings you applied, that would be cool.

Ciao ;o)

Éric St-Jean

You don’t need to change all the host files – you can use elastic IPs.
Now, if you just use elastic IPs in your config, that won’t work – or, rather, it’ll work, but it’ll suck. The EIPs are outside IPs, so all inter-node traffic will go through the outside EC2 routers, which is slower, adds latency, and AWS will ding you for the inter-node traffic.
But the elastic IP will have an associated DNS name, such as ec2-50-16-238-173.compute-1.amazonaws.com. What’s neat about the AWS DNS names is that if you try to resolve them from inside EC2, they resolve to the internal 10/8 IPs. From the outside, they resolve to the actual outside elastic IP.
Those names aren’t pleasant though, so what I do is use another domain and set up CNAMEs. So, if you need 4 nodes in your cluster, you grab 4 elastic IPs and set up 4 CNAMEs, such as mgmn1.example.com, mgmn2.example.com, data1.example.com, and data2.example.com, and you use those in your NDB configs (see the example records below). Whenever they need to look for each other, the nodes will perform the lookup and find the internal IP.
If a node goes down, you restart it and re-assign the elastic IP to the new instance.
Elastic IPs cost nothing when they’re assigned. True, you’ve added a possible point of failure, but you don’t have to go change host files everywhere. If you use Route 53 for your domain’s DNS, the lookups will be super fast, and it’s all on AWS.
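For example, the records could look like this (an illustrative zone snippet; the domain, the names, and the second EIP hostname are placeholders):

data1.example.com.  IN  CNAME  ec2-50-16-238-173.compute-1.amazonaws.com.
data2.example.com.  IN  CNAME  ec2-50-16-239-042.compute-1.amazonaws.com.

and then the ndb config simply uses HostName=data1.example.com, HostName=data2.example.com, and so on.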

Matthew Montgomery

Great, that is reassuring. :)

Matthew Montgomery

Yves,

I haven’t tried this for myself yet. Can you confirm that the surviving node accepts the recovering node into the cluster under the new IP (when using hostnames)? I’m concerned that, in a two-data-node config, the node might recover in a split-brain state on the new IP after StartPartialTimeout and StartPartitionedTimeout. Can you confirm that both ndbd nodes do not appear as “master” in the ndb_mgm> SHOW output?

lee

In your test, did the data node hold real data?
This did not work in my case.
My data node has 1 GB of data.
In config.ini, after the IP changed from 10.0.0.2 to 10.0.0.3, the data node on 10.0.0.3 restarted,
but it does not progress past phase 100.

[ndbd(NDB)] 2 node(s)
id=4 @10.0.0.1 (mysql-5.1.56 ndb-7.1.13, Nodegroup: 0, Master)
id=5 @10.0.0.3 (mysql-5.1.56 ndb-7.1.13, starting, Nodegroup: 0)

ndb_mgm> all status;
Node 4: started (mysql-5.1.56 ndb-7.1.13)
Node 5: starting (Last completed phase 100) (mysql-5.1.56 ndb-7.1.13)

Do you know of any solution to this problem?