Mosaic Game 的分布式场景

space_service上的分布式场景

在mosaic_game中也实现了一个类似的分布式场景管理，这个分布式的支持并不会对所有的场景类型生效，只有对配置表中配置为了is_union_space的场景才会启用。

struct space_type_info
{
	union
	{
		struct 
		{
			std::uint32_t is_union_space:1; //是否是大世界可分块场景
			std::uint32_t is_town_space:1; // 是否是城镇场景
			std::uint32_t is_player_dungeon:1; // 是否是单人副本
			std::uint32_t is_team_dungeon:1; //是否是组队副本
			std::uint32_t is_match_space:1; // 是否是匹配场景
			std::uint32_t auto_select_when_empty_id:1; // 空space_id进入时自动选择负载最低的instance
			std::uint32_t auto_create_new_heavy_load:1; // 高负载下自动创建新场景
			std::uint32_t support_back_return:1; // 是否支持离开后再回来
		};
		std::uint32_t all_flags = 0;
		
		
	};
	std::uint32_t space_type; // 场景类型
	std::uint32_t max_player_load; // 单场景最大玩家数量
};

因此在mosaic_game中，一个space根据这个字段是否为true被分为了mono_space和union_space两种类型，mono_space没有分布式的支持，而union_space支持了分布式。

对于一个支持分布式的场景union_space来说，其逻辑上也是有一个或多个互不相交的cell_space组成的，同时union_space的XZ平面区域会被这些cell_space完整填充。cell_space是在单一space_server上的一个space_entity，而union_space则是将这些cell_space连接起来的结构。归属于同一个union_space的cell_space上的union_space_id都是相同的，都等于所属union_space的space_id；同时他们的space_id又是不同的，作为具体的space_entity的唯一标识符。


struct mono_space_info
{
	std::string space_id;
	std::unordered_map<std::string, player_info_in_space> players;
	std::uint32_t space_no;
	std::string game_id;
	std::string team_id; // 如果有值 代表只能这个队伍的成员能进入
	std::string player_id; // 如果有值 代表只有这个人能进入
	std::map<std::string, std::uint32_t> player_factions; // 如果有值 则代表只能有这些玩家进入
	bool ready = false;
};

struct cell_space_info
{
	std::string union_space_id;
	std::string space_id;
	std::uint32_t space_no;
	std::string game_id;
	bool ready = false;

};
struct union_space_info
{
	const std::string space_id;
	distributed_space::space_cells cells;
	const std::uint32_t space_no;
	std::unordered_map<std::string, player_info_in_space> players;
	bool ready = false;
	union_space_info(const std::string& in_space_id, const distributed_space::cell_bound& bound, const std::uint32_t in_space_no, const std::string& cell_game_id, const std::string& cell_space_id, const double in_ghost_radius)
	: space_id(in_space_id)
	, cells(bound, cell_game_id, cell_space_id, in_ghost_radius)
	, space_no(in_space_no)
	{

	}
};

上面的cell_space_info和union_space_info就是space_service上存储的分布式场景的相关信息，而mono_space_info存储的是一个非分布式场景的信息:

std::unordered_map<std::uint32_t, std::unordered_set<std::string>> m_spaces_by_no;
std::unordered_map<std::string, std::unique_ptr<mono_space_info>> m_mono_spaces;
std::unordered_map<std::string, std::unique_ptr<cell_space_info>> m_cell_spaces;
std::unordered_map<std::string, std::unique_ptr<union_space_info>> m_union_spaces;
std::unordered_map<std::string, const misc::space_type_info*> m_space_types;

仅用集合关系来描述一个union_space内的所有cell_space是不够的，其真正的结构应该是跟bigworld里一样的树形结构。每个union_space对应一棵分割树, cell_space就是这棵树里的叶子节点。为了方便的对union_space进行管理，这里用distributed_space::space_cells来描述分割树。这个distributed_space::space_cells类型由自己编写的一个独立库distributed_space提供，这样独立出去的好处是可以更好的进行测试、调试与可视化。

cell_space管理器space_cells

之前提到了为了描述cell_space之间的关系，需要使用一个二叉KD树来串联所有的cell_space，每个cell_space都对应这个KD树中的一个叶子节点。不过在space_cells中并没有显示的用类型去区分叶子节点和内部节点，统一都使用space_node来描述一个节点的信息，这个与BigWorld中的实现不一样：



class space_node
{
private:
	std::string m_space_id;
	std::string m_game_id;
	// boundary的长宽都需要大于四倍的ghost_radius
	// removing状态下的除外 因此此时会缩小到0.5* ghost_radius
	cell_bound m_boundary; 
	std::array<space_node*, 2> m_children;
	space_node* m_parent = nullptr;
	bool m_ready = false;
	bool m_is_merging = false;
	bool m_is_split_x = false;
};

m_parent与m_children两个字段用来组成树形关系，这里的m_children用的是array而不是vector是为了节省内存，以及避免一些动态内存分配。在这样的设计下，叶子节点的m_children存储的都是nullptr,所以判断一个节点是否是叶子节点就可以利用这个性质：

bool is_leaf_cell() const
{
	return !m_children[0];
}

这里的m_boundary字段代表了这个节点所覆盖的XZ平面范围：

struct point_xz
{
	union
	{
		struct {
			double x;
			double z;
		};
		double val[2];
	};
	
	double& operator[](int index)
	{
		return val[index];
	}
	double operator[](int index) const
	{
		return val[index];
	}
	NLOHMANN_DEFINE_TYPE_INTRUSIVE(point_xz, x, z)
};
struct cell_bound
{
	point_xz min;
	point_xz max;
	NLOHMANN_DEFINE_TYPE_INTRUSIVE(cell_bound, min, max);

	bool cover(const double x, const double z) const;

	bool intersect(const cell_bound& other) const;
};

注意到m_boundary上的注释，默认情况下这个矩形的长和宽都要大于四倍的ghost_radius，这个ghost_radius代表的是ghost_entity创建的容差范围。至于为什么要限定大于4*ghost_radius，这个将会在后续的real-ghost管理中介绍。

m_is_split_x字段代表当前是内部节点时的分割轴方向，但是这里没有存储分割轴的位置信息。如果分割轴为X轴，则m_children[0]存储的是分割轴左边的子节点，m_children[1]存储的是分割轴右边的子节点；如果分割轴为Z轴，则m_children[0]存储的是分割轴下面的子节点，m_children[1]存储的是分割轴上面的子节点。在这个设定下，获取分割轴位置可以通过下面的方法计算出来:

double get_split_pos() const
{
	if(m_is_split_x)
	{
		return m_children[0].max.x;
	}
	else
	{
		return m_children[0].max.z;
	}
}

space_cells中存储了两个unordered_map来通过space_id来查询内部节点与叶子节点，同时使用一个字段m_root_node存储了这棵树的根节点：

class space_cells
{
private:
	std::unordered_map<std::string, space_node*> m_leaf_nodes;
	std::unordered_map<std::string, space_node*> m_internal_nodes;
	// split与merge不会影响root_node
	space_node* m_root_node;
	std::uint64_t m_temp_node_counter = 0;
	// master 代表主逻辑cell 这个cell的生命周期伴随着整个space的生命周期
	// space的主体逻辑由master cell控制
	// 第一个创建的cell就是master_cell
	// 这个cell无法被合并到其他节点 同时也无法被负载均衡删除
	std::string m_master_cell_id;

	// 任何一个cell的长和宽必须大于等于四倍的ghost_radius
	// removing状态下的除外
	double m_ghost_radius;
};

注意到这里有一个m_master_cell_id字段，记录了这个space_cells被创建时的第一个Cell的space_id。由于一个space内需要存储一些全局性的数据以及执行一些全局相关的逻辑，为了保证这些数据与逻辑的有序性，我们需要固定住这些逻辑的承载Cell，也就是这个space_cells的master_cell。一个master_cell将不会在负载均衡时被删除，这样才能保证有序性。

在执行节点的分裂操作的时候，会创建一个内部节点，为了给这个内部节点一个不会与所有叶子节点相冲突的唯一标识符，我们将强制使用数字转成的字符串作为其space_id，而这个数字为了保证当前space_cells内唯一，使用了一个递增计数器m_temp_node_counter：

bool space_cells::check_valid_space_id(const std::string& space_id) const
{
	if (space_id.empty())
	{
		return false;
	}
	return !std::all_of(space_id.begin(), space_id.end(), ::isdigit);
}

const space_cells::space_node* space_cells::split_x(double x, const std::string& origin_space_id, const std::string& new_space_game_id, const std::string& left_space_id, const std::string& right_space_id)
{
	if(!check_valid_space_id(left_space_id) || ! check_valid_space_id(right_space_id))
	{
		return nullptr;
	}
	auto dest_node_iter = m_leaf_nodes.find(origin_space_id);
	if(dest_node_iter == m_leaf_nodes.end())
	{
		return nullptr;
	}
	auto dest_node = dest_node_iter->second;
	if(!dest_node->is_leaf_cell())
	{
		return nullptr;
	}
	m_temp_node_counter++;
	auto result = dest_node->split_x(x, new_space_game_id, left_space_id, right_space_id, std::to_string(m_temp_node_counter));
	if(!result)
	{
		return nullptr;
	}
	m_leaf_nodes.erase(dest_node_iter);
	for(auto one_child: dest_node->children())
	{
		m_leaf_nodes[one_child->space_id()] = one_child;
	}
	m_internal_nodes[dest_node->space_id()] = dest_node;
	return result;
	
}

SpaceCells的状态同步

除了space_service上会存储一个union_space上的space_cells结构外，这个union_space的所有Cell也会存储一个同步副本到space_cell_component中，因为space_entity上有很多逻辑需要知道当前Cell周围的其他Cell的信息：

class Meta(rpc) space_cell_component final: public space_component::sub_class<space_cell_component>
{
private:
	utility::timer_handler m_check_ghost_real_timer;

private:
	std::unique_ptr<distributed_space::space_cells> m_space_cells;
};

当space_service创建一个union_space时，对应的space_cells会被初始化，然后会将这个space_cells的数据填充到场景的init数据中：

std::string space_service::do_create_space(std::uint32_t cur_space_no, std::uint32_t cur_space_type, const std::string& pref_space_id,  const std::string& dest_game_id, const json::object_t &init_info)
{
	auto cur_space_sysd = m_space_config_data->get_row(cur_space_no);
	auto cur_game_id = dest_game_id;
	if(cur_game_id.empty())
	{
		choose_game_for_space(cur_space_no, cur_space_sysd);
	}
	
	auto cur_space_id = pref_space_id;
	if(pref_space_id.empty())
	{
		cur_space_id = get_server()->gen_unique_str();
	}

	m_logger->info("try create space no {} with id {}", cur_space_no, cur_space_id);
	auto cur_space_type_info = misc::space_type_info_mgr::get_space_type_info(cur_space_type);
	std::string cell_space_id;
	std::string union_space_id;
	json::object_t space_init_info = init_info;
	utility::rpc_msg cur_msg;
	cur_msg.cmd = "notify_create_space";
	cur_msg.args.reserve(10);
	if(cur_space_type_info->is_union_space)
	{
		cell_space_id = get_server()->gen_unique_str();
		union_space_id = cur_space_id;
		std::array<mosaic_game::utility::entity_pos_t, 2> map_range;
		if(!cur_space_sysd.expect_value(std::string("map_range"), map_range))
		{
			return {};
		}
		distributed_space::cell_bound cur_map_range;
		cur_map_range.min.x = map_range[0].x;
		cur_map_range.max.x = map_range[1].x;
		cur_map_range.min.z = map_range[0].z;
		cur_map_range.max.z = map_range[1].z;
		std::unique_ptr<union_space_info> cur_union_space_ptr = std::make_unique<union_space_info>(union_space_id, cur_map_range, cur_space_no, cur_game_id, cell_space_id, m_ghost_radius);
		space_init_info["components"]["cell"] = cur_union_space_ptr->cells.encode(); // 填充space_cells数据
		m_union_spaces[union_space_id] = std::move(cur_union_space_ptr);
		m_space_types[union_space_id] = cur_space_type_info;
		m_logger->info("create cell {} for union space {}", cell_space_id, union_space_id);
		
	}
	// 省略后续代码
}

当一个cell_space被初始化的时候，会从init_info中找到这个space_cells数据，来填充space_cell_component里的space_cells:

bool space_cell_component::init(const json& data)
{
	if(!m_owner->is_cell_space())
	{
		return true;
	}
	distributed_space::cell_bound temp_bound;
	temp_bound.min.x = -1;
	temp_bound.min.z = -1;
	temp_bound.max.x = 1;
	temp_bound.max.z = 1;
	
	m_space_cells = std::make_unique<distributed_space::space_cells>(temp_bound, std::string{}, std::string{}, 100);
	if(!m_space_cells->decode(data))
	{
		m_owner->logger()->error("cant decode cell_region");
		return false;
	}
	m_is_master_cell = m_space_cells->cells().size() == 1;
	auto cur_root_cell = m_space_cells->root_node();
	m_master_cell_proxy = std::make_shared<std::string>(anchor_for_cell(cur_root_cell));
	check_ghost_and_real();
	
	return true;
}

这样就做到了初始时第一个Cell的space_cells同步,如果后续创建了更多的Cell，将也会对场景创建信息init_info做同样的填充:

void space_service::after_split_space(std::uint32_t cur_space_no, const std::string& cur_union_space_id, distributed_space::space_cells& cur_space, const std::string& new_space_id, const std::string& dest_game_id)
{
	add_space_load_to_game(cur_space_no, new_space_id, dest_game_id, cur_union_space_id, misc::space_type_info_mgr::get_space_type_info(cur_space_no));
	utility::rpc_msg create_space_msg;
	create_space_msg.cmd = "notify_create_space";
	json::object_t space_init_info;
	space_init_info["components"]["cell"] = cur_space.encode();
	create_space_msg.set_args(new_space_id, cur_space_no, cur_union_space_id, space_init_info);
	call_space_manager(dest_game_id, create_space_msg);
}

同时，space_service上任何对space_cells的修改操作，都会通过notify_cell_change向这个union_space的所有Cell进行广播：

void space_service::notify_cell_change(const std::string &union_space_id, const utility::rpc_msg &notify_msg, const std::string &expect_cell_id)
{
	m_logger->info("notify_cell_change {} msg {} except {}", union_space_id, json(notify_msg).dump(), expect_cell_id);
	auto cur_union_space_iter = m_union_spaces.find(union_space_id);
	if (cur_union_space_iter == m_union_spaces.end())
	{
		return;
	}
	std::vector<std::string> cell_call_anchors;
	const auto &cur_cells = cur_union_space_iter->second->cells.cells();
	cell_call_anchors.reserve(cur_cells.size());
	for (const auto &one_pair : cur_cells)
	{
		if (one_pair.first == expect_cell_id)
		{
			continue;
		}
		cell_call_anchors.push_back(utility::rpc_anchor::concat(one_pair.second->game_id(), one_pair.first));
	}
	get_server()->call_server_multi(notify_msg, cell_call_anchors);
}

当space_cell_component接收到相关的修改RPC之后，本地会将这些修改进行回放:

Meta(rpc) void notify_cell_ready(const utility::rpc_msg& msg, const std::string& cell_id);
Meta(rpc) void notify_split_space(const utility::rpc_msg& msg, bool is_x, double split_value, const std::string& origin_space_id, bool remain_left, const std::string& new_space_game_id, const std::string& new_space_id);
Meta(rpc) void notify_split_space_with_direction(const utility::rpc_msg& msg, const std::string& cell_id, const std::string& new_space_id, int cur_split_direction, const std::string& new_space_game_id);
Meta(rpc) void notify_start_merge_space_node(const utility::rpc_msg& msg, const std::string& cell_id);
Meta(rpc) void notify_finish_merge_space_node(const utility::rpc_msg& msg, const std::string& cell_id);
Meta(rpc) void notify_balance_space(const utility::rpc_msg& msg, const std::string& parent_space_id, double split_value);

这样就保证了对于同一个union_space，space_service上存储的space_cells与其所有Cell里space_cell_component存储的space_cells内容完成同步。

SpaceCells负载均衡

单个CellSpace的负载并不是一成不变的，特别是大量的Actor移动的时候，不同的CellSpace之间的负载分布很可能出现不均匀的情况。即使所有的Actor的位置都不发生改变，由于每个Actor的执行逻辑可能被用户输入、时间等各个因素影响，所以CellSpace的负载在这个情况下也不是固定的。为了避免由于CellSpace之间的负载分配不均匀导致若干进程的CPU过高，需要定期的对相邻的CellSpace做边界调整，有些时候甚至需要执行CellSpace的分裂来让更多的进程来平摊负载。这个负载的调整过程就是SpaceCells的负载均衡。

执行负载均衡的前提是首先需要知道单个CellSpace和单个ActorEntity上的负载度量。最简单的负载度量方法就是在一个ActorEntity(A)执行任意函数funcB之前记录一下开始时间begin_ts，执行完这个函数之后记录一下结束时间end_ts，这样end_ts - begin_ts的时间插值就是执行当前函数funcB的消耗，累积在ActorEntity(A)上作为函数消耗总度量。当这个函数调用耗时统计覆盖到ActorEntity的所有逻辑时，这个函数消耗总度量就是当前ActorEntity的负载。这里有一个非常重要的地方需要关注，即funcA可能调用funcB，如果无脑记录所有函数的消耗并执行累加，会导致funcA和funcB的消耗都累加进入，导致负载记录不准确。实际上只能记录最外层的funcA的消耗，任何内层的函数调用的消耗都不能参与累加。为了方便做到这样的只记录最外层函数消耗的目的，mosaic_game里创建了这样的一个类型entity_load_stat:

class entity_load_stat
{
	// 这里的单位都是ns
	std::uint64_t m_current_load; // 当前时间段内的负载统计
	std::uint64_t m_accumulated_load; // 创建之后的总负载统计
	std::uint32_t m_depth = 0; // 记录嵌套层级 只有为0的时候才会记录时间
	std::uint64_t m_record_begin_ts = 0; // 记录第一层调用时的开始时间点
public:
	entity_load_stat()
	: m_current_load(0)
	, m_accumulated_load(0)
	{

	}

	std::uint64_t current_load() const
	{
		return m_current_load;
	}


	std::uint64_t  accumulated_load() const
	{
		return m_accumulated_load;
	}
	
	void flush_period()
	{
		m_current_load = 0;
	}

	void add_load(std::uint64_t in_load)
	{
		m_current_load += in_load;
		m_accumulated_load += in_load;
	}
	void push_record()
	{
		
		if(m_depth == 0)
		{
			m_record_begin_ts = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::system_clock::now().time_since_epoch()).count();
		}
		m_depth++;
	}

	void pop_record()
	{
		assert(m_depth> 0);
		m_depth--;
		if(m_depth==0)
		{
			auto now_ts = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::system_clock::now().time_since_epoch()).count();
			add_load(now_ts - m_record_begin_ts);
		}
	}

	bool is_recording() const
	{
		return !!m_depth;
	}
};

class server_entity
{
protected:
	utility::entity_load_stat m_entity_load_stat;
public:
	utility::entity_load_stat& entity_load_stat()
	{
		return m_entity_load_stat;
	}
};

每次调用一个函数的时候执行一下这里的push_record，每次执行完一个函数的时候执行一下这里的pop_record。内部使用m_depth这个计数器来记录当前的函数是第几层调用，当m_depth==0的时候代表当前执行的函数是最外层，此时才会记录begin_ts和end_ts，并将时间差值加入到统计。由于push_record与pop_record总是需要配对使用，为了避免人工加入这些代码时的配对错误，这里还提供了一个RAII类型entity_load_stat_recorder来自动的在构造函数里调用push_recrod，同时在析构函数里调用pop_record:

class entity_load_stat_recorder
{
	entity_load_stat& m_dest_stat;
public:
	entity_load_stat_recorder(entity_load_stat& in_dest_stat)
	: m_dest_stat(in_dest_stat)
	{
		m_dest_stat.push_record();
	}
	entity_load_stat_recorder(const entity_load_stat_recorder& other) = delete;
	entity_load_stat_recorder& operator=(const entity_load_stat_recorder& other) = delete;
	~entity_load_stat_recorder()
	{
		m_dest_stat.pop_record();
	}
};

有了这个函数消耗度量工具之后，剩下的任务就是如何尽可能少的修改原来的代码的同时来达到Entity主要逻辑的全覆盖。这里回想一下之前介绍的mosaic_game里的逻辑驱动方式，主要有这四种：RPC驱动，计时器驱动，事件驱动，异步回调驱动。因此只要能够有效的修改这四个逻辑驱动的源代码，就可以准确的对Entity的消耗进行统计。

首先对RPC调用做消耗记录，当前的RPC调用有两个统一的入口，都在entity_manager上，entity_manager会通过传入的entity_id找到对应的entity并在这个entity上做分发，所以只需要在做RPC分发之前插入一个entity_load_stat_recorder即可:

utility::rpc_msg::call_result dispatch_rpc_msg(const std::string& dest, const utility::rpc_msg& msg)
{
	auto cur_entity = get_entity(dest);
	if (!cur_entity)
	{
		return utility::rpc_msg::call_result::dest_not_found;
	}
	utility::entity_load_stat_recorder temp_recorder(cur_entity->entity_load_stat());
	return cur_entity->on_rpc_msg(msg);
}

utility::rpc_msg::call_result dispatch_entity_raw_msg(const std::string& dest, std::uint8_t cmd, std::shared_ptr<const std::string> msg)
{
	auto cur_entity = get_entity(dest);
	if (!cur_entity)
	{
		return utility::rpc_msg::call_result::dest_not_found;
	}
	utility::entity_load_stat_recorder temp_recorder(cur_entity->entity_load_stat());
	return cur_entity->on_entity_raw_msg(cmd, msg);
}

计时器的超时计算刚好也在entity_manager上提供了一个统一的入口，类似的添加entity_load_stat_recorder就好了:

std::size_t poll_timers(utility::ts_t cur_ts)
{
	std::size_t result = 0;
	for(auto one_slot: m_total_entities_vec)
	{
		if(one_slot.first)
		{
			utility::entity_load_stat_recorder temp_recorder(one_slot.first->entity_load_stat());
			result += one_slot.first->poll_timers(cur_ts);
		}
	}

	m_last_poll_ts = cur_ts;
	return result;
}

space_server上处理异步回调也有一个唯一入口，在此添加entity_load_stat_recorder:

void space_server::handle_entity_cb(spiritsaway::mosaic_game::utility::local_entity_id local_eid, const std::shared_ptr<const std::string>&  call_proxy,  utility::mixed_callback_manager::callback_handler entity_cb, const json& data)
{
	auto temp_entity = entity::entity_manager::instance().get_entity(local_eid);
	if (temp_entity)
	{
		utility::entity_load_stat_recorder temp_recorder(temp_entity->entity_load_stat());
		temp_entity->invoke_callback(entity_cb, data);
	}
	else
	{
		if(!call_proxy)
		{
			return;
		}
		else
		{
			utility::rpc_msg cur_msg;
			cur_msg.cmd = "remote_invoke_cb";
			cur_msg.set_args(entity_cb.value(), data);
			call_server(call_proxy, cur_msg);
		}
	}
	
}

最后的事件驱动部分mosaic_game里并没有去做调用统计，因为绝大部分的事件分发都是在actor_entity自身的函数里引发的，没必要徒劳的去记录非最外层的函数调用。

space_entity的函数调用负载除了上述的统计位点之外，还需要在创建actor_entity和actor_entity进出space_entity的时候增加一下额外的统计，因为这三个逻辑很多时候是从space_manager上中转过来的:

actor_entity* space_entity::create_entity(const std::string& entity_type, const std::string& entity_id, json::object_t& init_info, const json::object_t& enter_info, std::uint64_t online_entity_id)
{
	utility::entity_load_stat_recorder temp_recorder(entity_load_stat());
	// 省略后续所有代码
}

void space_entity::enter_space(actor_entity* cur_entity, const json::object_t& enter_info)
{
	utility::entity_load_stat_recorder temp_recorder(entity_load_stat());
	// 省略后续所有代码
}

bool space_entity::leave_space(actor_entity* cur_entity)
{
	utility::entity_load_stat_recorder temp_recorder(entity_load_stat());
	// 省略后续所有代码
}

做好了这些准备工作之后，统计一段时间内的指定space里的所有actor_entity的负载就比较简单了,直接遍历当前场景里的所有actor_entity，获取entity_load_stat上记录的数据即可，为了后续的边界调整需要，这里还加上了每个actor_entity的位置坐标信息:

float space_entity::gather_space_load_with_actors(std::vector<spiritsaway::distributed_space::entity_load>& entity_loads_vec)
{
	float total_entity_load = 0;
	total_entity_load += m_entity_load_stat.current_load();
	m_entity_load_stat.flush_period();
	for(const auto& one_type_ent_vec: m_total_entities)
	{
		for(auto one_entity_pair: one_type_ent_vec)
		{
			auto cur_entity = one_entity_pair.second;
			if(!cur_entity)
			{
				continue;
			}
			auto cur_entity_load = cur_entity->entity_load_stat().current_load();
			cur_entity->entity_load_stat().flush_period();
			if(is_cell_space())
			{
				spiritsaway::distributed_space::entity_load temp_entity_load;
				temp_entity_load.is_real = !cur_entity->is_ghost();
				temp_entity_load.load = cur_entity_load;
				temp_entity_load.pos.x = cur_entity->pos()[0];
				temp_entity_load.pos.z = cur_entity->pos()[2];
				temp_entity_load.name = std::to_string(cur_entity->online_entity_id());
				entity_loads_vec.push_back(std::move(temp_entity_load));
			}
			total_entity_load += cur_entity_load;
			
		}
	}
	return total_entity_load;
}

在这个gather_space_load_with_actors接口之上可以进一步的统计当前进程里的所有space的详细负载信息，只需要对space_manager上的所有space做遍历。这个全进程统计负载的接口暴露成为了一个RPC，方便space_service来定期获取负载状态:

void space_manager::request_gather_space_loads(const utility::rpc_msg& msg)
{
	auto cur_server = m_server;
	if (!cur_server->has_service("space_service"))
	{
		return;
	}
	auto now_ts = utility::timer_manager::now_ts();
	auto delta_ms = now_ts - m_last_report_load_ts;
	m_last_report_load_ts = now_ts;
	auto cur_total_load = m_server->stub_main_thread_load_total();
	auto load_diff = cur_total_load - m_last_report_load_total;
	m_last_report_load_total = cur_total_load;
	auto cur_load = load_diff * 100.0/(delta_ms + 1);
	std::vector<std::pair<std::string, float>> space_loads_arr;
	std::vector<std::vector<spiritsaway::distributed_space::entity_load>> space_entity_loads;
	space_loads_arr.reserve(m_spaces.size());
	for(auto& [one_space_id, one_space_entity]: m_spaces)
	{
		space_entity_loads.push_back({});
		space_loads_arr.push_back(std::make_pair(one_space_id, one_space_entity->gather_space_load_with_actors(space_entity_loads.back())/(delta_ms * 1000 * 1000)));
	}
	

	utility::rpc_msg report_space_load_msg;
	report_space_load_msg.cmd = "report_space_load";
	report_space_load_msg.from = m_server->gen_local_anchor(name());
	report_space_load_msg.args.reserve(3);
	report_space_load_msg.args.push_back(m_server->local_stub_info().name);
	report_space_load_msg.args.push_back(cur_load);
	report_space_load_msg.args.push_back(space_loads_arr);
	report_space_load_msg.args.push_back(space_entity_loads);
	m_server->call_server("space_service", report_space_load_msg);
}

这里上传的负载数据除了gather_space_load_with_actors计算的部分之外，还会加上当前进程的整体负载数据stub_main_thread_load_total，这个变量的更新比较简单，就是在服务器的主循环里对每次loop的执行时间累加到这个stub_main_thread_load_total变量上。

在space_service初始化函数里会添加一个计时器来定期的向所有的space_server上广播这个request_gather_space_loads请求，来做定期的space_server负载收集操作:

void space_service::gather_space_loads()
{
	m_gather_space_loads_timer = add_timer_with_gap(std::chrono::seconds(m_gather_space_loads_gap_seconds), [=]()
	{
		gather_space_loads();
	});
	utility::rpc_msg cur_msg;
	cur_msg.cmd = "request_gather_space_loads";
	for(const auto& [one_game_id, _]: m_game_loads)
	{
		call_space_manager(one_game_id, cur_msg);
	}
	// 收集完成之后再执行负载均衡
	m_balance_cells_timer = add_timer_with_gap(std::chrono::seconds(m_balance_cells_gap), [=]()
	{
		do_balance_space_loads();
	});
}

当每个space_service接收到一个space_server的最新负载记录之后，首先把进程总负载更新到m_game_loads上，然后遍历每个上报过来的space负载，如果这个space对应的是分布式大世界的场景，则需要将这个记录的详细的actor_entity的负载更新到space_cell里:

void space_service::report_space_load(const utility::rpc_msg &data, const std::string &game_id, float cur_load, const std::vector<std::pair<std::string, float>>& space_loads, const std::vector<std::vector<distributed_space::entity_load>>& space_entity_loads)
{
	m_game_loads[game_id].cur_load = cur_load;
	m_game_loads[game_id].max_load = 100;
	if(space_loads.size() != space_entity_loads.size())
	{
		m_logger->error("report_space_load size mismatch space_loads size {} space_entity_loads size {}", space_loads.size(), space_entity_loads.size());
		return;

	}
	for(std::uint32_t i = 0; i< space_loads.size(); i++)
	{
		const auto& [one_space_id, one_space_load] = space_loads[i];
		auto cur_space_type_iter = m_space_types.find(one_space_id);
		if(cur_space_type_iter == m_space_types.end())
		{
			m_logger->error("report_space_load cant find space {} ", one_space_id);
			continue;
		}
		if(!cur_space_type_iter->second->is_union_space)
		{
			continue;
		}
		auto cur_space_iter = m_cell_spaces.find(one_space_id);
		if(cur_space_iter == m_cell_spaces.end())
		{
			m_logger->error("report_space_load cant find cell space {} ", one_space_id);
			continue;
		}
		auto cur_union_space_iter = m_union_spaces.find(cur_space_iter->second->union_space_id);
		if (cur_union_space_iter == m_union_spaces.end())
		{
			m_logger->error("space {} has invalid unity space {}", one_space_id, cur_space_iter->second->union_space_id);
			continue;
		}

		cur_union_space_iter->second->cells.update_cell_load(one_space_id, one_space_load, space_entity_loads[i]);
	}
}

void space_cells::update_cell_load(const std::string& cell_space_id, float cell_load, const std::vector<entity_load>& new_entity_loads)
{
	auto cur_node_iter = m_leaf_nodes.find(cell_space_id);
	if(cur_node_iter == m_leaf_nodes.end())
	{
		return;
	}
	if (cur_node_iter->second->is_leaf_cell())
	{
		cur_node_iter->second->update_load(cell_load, new_entity_loads);
	}

}

由于后续调整space_cell边界的时候需要快速的知道分割轴位置需要调整多大才能使当前space_cell的actor_entity负载降低指定值，所以这里执行update_load的时候会按照x/z两个轴分别对entity_loads数组进行坐标排序:

void space_cells::space_node::update_load(float cur_load, const std::vector<entity_load>& new_entity_loads)
{
	m_cell_load_report_counter++;
	m_cell_loads[m_cell_load_report_counter % m_cell_loads.size()] = cur_load;
	m_entity_loads = new_entity_loads;
	make_sorted_loads();
}

void space_cells::space_node::make_sorted_loads()
{
	m_sorted_entity_load_idx_by_axis[0].resize(m_entity_loads.size(), 0);
	m_sorted_entity_load_idx_by_axis[1].resize(m_entity_loads.size(), 0);
	for (std::uint16_t i = 0; i < m_entity_loads.size(); i++)
	{
		m_sorted_entity_load_idx_by_axis[0][i] = i;
		m_sorted_entity_load_idx_by_axis[1][i] = i;
	}
	std::sort(m_sorted_entity_load_idx_by_axis[0].begin(), m_sorted_entity_load_idx_by_axis[0].end(), [this](std::uint16_t a, std::uint16_t b)
		{
			return this->get_entity_loads()[a].pos.x < this->get_entity_loads()[b].pos.x;
		});
	std::sort(m_sorted_entity_load_idx_by_axis[1].begin(), m_sorted_entity_load_idx_by_axis[1].end(), [this](std::uint16_t a, std::uint16_t b)
		{
			return this->get_entity_loads()[a].pos.z < this->get_entity_loads()[b].pos.z;
		});
}

这里排序的时候并不是创建一个额外的m_entity_loads副本，而是使用m_sorted_entity_load_idx_by_axis这个数组来存储m_entity_loads里的数据索引，比较的时候用索引去获取真正的entity_load，并最终使用pos字段来做坐标排序。

当space_service::gather_space_loads发出广播收集负载请求之后，会等待一段时间，让所有的space_server完成数据上报。当计时器超时之后，可以认为所有负载数据已经上报完毕，可以开始调用do_balance_space_loads来执行最终的负载均衡了：

void space_service::gather_space_loads()
{
	m_gather_space_loads_timer = add_timer_with_gap(std::chrono::seconds(m_gather_space_loads_gap_seconds), [=]()
	{
		gather_space_loads();
	});
	utility::rpc_msg cur_msg;
	cur_msg.cmd = "request_gather_space_loads";
	for(const auto& [one_game_id, _]: m_game_loads)
	{
		call_space_manager(one_game_id, cur_msg);
	}
	// 收集完成之后再执行负载均衡
	m_balance_cells_timer = add_timer_with_gap(std::chrono::seconds(m_balance_cells_gap), [=]()
	{
		do_balance_space_loads();
	});
}

这个do_balance_space_loads代码有点长，但是可以比较独立的分为四个部分:

处理一些非分布式大世界场景的分线按需创建
处理一些分布式大世界里的相邻Cell的边界调整，在不同的Cell间转移负载
处理一些分布式大世界里的高负载Cell的分裂，通过创建新的Cell来平坦负载
处理一些分布式大世界里的低负载Cell的合并，通过减少Cell数量的方式来减少负载

非分布式大世界场景的新分线创建规则是这个场景配置成了自动扩缩容，且所有分线的平均负载人数大于指定阈值。此时会通过不指定game_id的定时来通过do_create_space自动的选择负载最小的场景去创建当前space_no的新实例:

void space_service::check_heavy_load_auto_create()
{
	// 会定期的扫描这些开启了自动扩容的场景里的平均人数负载，如果大于了`80%`则会自动的创建一个新实例：
	std::vector<std::pair<std::uint32_t, std::uint32_t>> need_create_spaces;
	for(auto [cur_space_no, cur_space_type]: m_check_load_create_spaces)
	{
		auto temp_iter = m_spaces_by_no.find(cur_space_no);
		if(temp_iter == m_spaces_by_no.end())
		{
			continue;
		}
		const auto& cur_space_ids = temp_iter->second;
		int space_instance_count = 0;
		int space_player_count = 0;
		for(const auto& one_space_id: cur_space_ids)
		{
			auto cur_mono_space_instance_iter = m_mono_spaces.find(one_space_id);
			if(cur_mono_space_instance_iter != m_mono_spaces.end())
			{
				space_instance_count++;
				space_player_count += cur_mono_space_instance_iter->second->players.size();
			}
		}
		auto cur_space_type_info = misc::space_type_info_mgr::get_space_type_info(cur_space_type);
		if(cur_space_type_info && space_player_count > (space_instance_count * cur_space_type_info->max_player_load) * m_heavy_load_threshold)
		{
			need_create_spaces.push_back(std::make_pair(cur_space_no, cur_space_type));
		}
	}
	m_logger->info("check_heavy_load_auto_create with result {}", serialize::encode(need_create_spaces).dump());
	for(auto one_space_pair: need_create_spaces)
	{
		do_create_space(one_space_pair.first, one_space_pair.second, std::string{}, std::string{}, json::object_t{});
	}
}

在处理相邻space_node的边界调整的时候，需要明确一下一个space_node能够作为边界缩小对象的相关规则space_node::check_can_shrink：

当前节点以及兄弟节点的负载最近一段时间内没有被调整过，这样可以避免对于同一个节点的短期重复调整，因为从调整边界到新的负载最终稳定需要一个过程
当前节点的所有叶子节点的平均负载要大于指定阈值，如果平均负载小于这个指定阈值，则代表这个节点的内部仍然有边界调整的空间，需要优先调整其内部的子节点
当前节点与兄弟节点的平均叶子负载的差值要大于指定阈值，这样才会有明显的负载变化
兄弟节点不能处于等待删除的状态，因为删除状态的节点会自动的调整边界
由于我们在space_cells里为了避免单一space_node所覆盖的区域太小，给space_node增加了正常情况下长宽都要不小于四倍的ghost_radius的限定，而且我们每次调整的最小步长为一个ghost_radius，所以变成调整方向的边长要不小于5*ghost_radius

有了这个check_can_shrink的实现时候，检查一个space_cells能否执行space_node的边界调整就可以以递归的形式从根节点往下查询，这样实现的好处就是会优先执行子节点的边界调整，只有在子节点都无法调整的时候才会尝试去调整上一个层级的节点:

const space_cells::space_node* space_cells::space_node::calc_shrink_node(const std::unordered_map<std::string, float>& game_loads, const cell_load_balance_param& lb_param, const double ghost_radius) const
{
	for (auto one_child : m_children)
	{
		if (one_child)
		{
			auto cur_child_result = one_child->calc_shrink_node(game_loads, lb_param, ghost_radius);
			if (cur_child_result)
			{
				return cur_child_result;
			}
		}
	}
	if (check_can_shrink(game_loads, lb_param, ghost_radius))
	{
		return this;
	}
	return nullptr;
}

const space_cells::space_node* space_cells::get_best_node_to_shrink(const std::unordered_map<std::string, float>& game_loads, const cell_load_balance_param& lb_param)
{
	return m_root_node->calc_shrink_node(game_loads, lb_param, m_ghost_radius);
}

但是这个get_best_node_to_shrink只是筛选出来了收缩边界的最佳节点，具体收缩多少边界还是需要计算的,这里calc_best_shrink_new_split_pos计算新的分割位置的时候会首先以ghost_radius作为基础的收缩大小，然后将剩余的可收缩空间切分为十份，不断的扩张搜索的大小直到通过calc_move_split_offload计算出来的负载转移值会大于指定的阈值lb_param.min_sibling_game_load_diff_when_shrink / 2:

double space_cells::space_node::calc_best_shrink_new_split_pos(const cell_load_balance_param& lb_param, const double ghost_radius) const
{
	bool is_split_pos_smaller = m_parent->m_children[0] == this;
	auto max_move_length = calc_max_boundary_move_length(m_parent->is_split_x(), is_split_pos_smaller);
	max_move_length -= 5 * ghost_radius;
	auto move_unit = max_move_length / 10;
	auto cur_split_pos = m_boundary.min.x;
	for (int i = 0; i < 10; i++)
	{
		if (m_parent->is_split_x())
		{
			if (is_split_pos_smaller)
			{
				cur_split_pos = m_boundary.max.x - ghost_radius - i * move_unit;
			}
			else
			{
				cur_split_pos = m_boundary.min.x + ghost_radius + i * move_unit;
			}
		}
		else
		{
			if (is_split_pos_smaller)
			{
				cur_split_pos = m_boundary.max.z - ghost_radius - i * move_unit;
			}
			else
			{
				cur_split_pos = m_boundary.min.z + ghost_radius + i * move_unit;
			}
		}
		auto cur_split_offload = calc_move_split_offload(cur_split_pos, m_parent->is_split_x(), is_split_pos_smaller);
		if (cur_split_offload > lb_param.min_sibling_game_load_diff_when_shrink / 2)
		{
			return cur_split_pos;
		}
	}
	return cur_split_pos;
}

当新的split_pos被计算出来之后，就需要按照新的边界来调整所有的相关节点,也就是当前调整节点的父节点下面的所有子节点:

bool space_cells::balance(double split_v, const space_cells::space_node* cur_node)
{
	if (!cur_node || cur_node->is_leaf_cell())
	{
		return false;
	}
	double pre_split_pos = cur_node->m_children[0]->boundary().max.x;
	if (!cur_node->is_split_x())
	{
		pre_split_pos = cur_node->m_children[0]->boundary().max.z;
	}
	auto mutable_cur_node = m_internal_nodes[cur_node->space_id()];
	assert(mutable_cur_node);
	mutable_cur_node->children()[0]->update_boundary_with_new_split(split_v, cur_node->is_split_x(), split_v < pre_split_pos, true);
	mutable_cur_node->children()[1]->update_boundary_with_new_split(split_v, cur_node->is_split_x(), split_v < pre_split_pos, false);
	return true;
}

这里的update_boundary_with_new_split是一个递归下降的过程,每次处理到一个节点的时候，都需要将当前的cell_loads计数器重置为1,这样可以避免这些节点的频繁调整，因为调整的条件之一就是m_cell_load_report_counter要大于指定阈值:

void space_cells::space_node::update_boundary_with_new_split(double new_split_pos, bool is_x, bool is_split_pos_smaller, bool is_changing_max)
{
	auto cur_axis = is_x ? 0 : 1;
	if (is_changing_max)
	{
		m_boundary.max[cur_axis] = new_split_pos;
	}
	else
	{
		m_boundary.min[cur_axis] = new_split_pos;
	}
	if (is_leaf_cell())
	{
		
		m_cell_loads[1] = get_latest_load();
		m_cell_load_report_counter = 1;
	}
	else
	{
		if (is_x)
		{
			if (m_is_split_x)
			{
				if (is_changing_max)
				{
					return m_children[1]->update_boundary_with_new_split(new_split_pos, is_x, is_split_pos_smaller, is_changing_max);
				}
				else
				{
					return m_children[0]->update_boundary_with_new_split(new_split_pos, is_x, is_split_pos_smaller, is_changing_max);
				}

			}
			else
			{
				m_children[1]->update_boundary_with_new_split(new_split_pos, is_x, is_split_pos_smaller, is_changing_max);
				m_children[0]->update_boundary_with_new_split(new_split_pos, is_x, is_split_pos_smaller, is_changing_max);
			}
		}
		else
		{
			if (m_is_split_x)
			{
				m_children[1]->update_boundary_with_new_split(new_split_pos, is_x, is_split_pos_smaller, is_changing_max);
				m_children[0]->update_boundary_with_new_split(new_split_pos, is_x, is_split_pos_smaller, is_changing_max);
			}
			else
			{
				if (is_changing_max)
				{
					return m_children[1]->update_boundary_with_new_split(new_split_pos, is_x, is_split_pos_smaller, is_changing_max);
				}
				else
				{
					return m_children[0]->update_boundary_with_new_split(new_split_pos, is_x, is_split_pos_smaller, is_changing_max);
				}
			}
		}

	}
}

负载均衡里最复杂的边界调整部分讲解完了，接下来考虑节点分裂。一个节点如果需要被分裂，需要满足如下几个条件:

这个节点需要是叶子节点
这个节点的在最近多次的负载汇报里都处于高负载状态
这个节点的长和宽都大于8*ghost_radius，这样可以保证分裂出来的两个子节点都维持长宽都不小于4*ghost_radius的限制
这个节点所在的进程负载也需要大于指定阈值，因为如果进程负载还有空余的话就没有必要去执行节点分裂了

最后如果有多个节点满足这个分裂需求，选择其中负载最高的节点，上述条件整体封装成了下面的这个函数:

const space_cells::space_node* space_cells::get_best_cell_to_split(const std::unordered_map<std::string, float>& game_loads, const cell_load_balance_param& lb_param) const
{
	const space_cells::space_node* best_result = nullptr;
	for (const auto& [one_cell_id, one_cell_node] : m_leaf_nodes)
	{
		if (!one_cell_node->is_leaf_cell())
		{
			continue;
		}
		if (one_cell_node->cell_load_report_counter() <= lb_param.min_cell_load_report_counter_when_split)
		{
			continue;
		}
		auto cur_boundary = one_cell_node->boundary();
		if ((cur_boundary.max.x - cur_boundary.min.x < 8 * m_ghost_radius) && (cur_boundary.max.z - cur_boundary.min.z < 8 * m_ghost_radius))
		{
			continue;
		}
		if (one_cell_node->get_smoothed_load() < lb_param.min_cell_load_when_split)
		{
			continue;
		}
		auto temp_game_iter = game_loads.find(one_cell_node->game_id());
		if (temp_game_iter == game_loads.end())
		{
			continue;
		}
		if (temp_game_iter->second < lb_param.min_game_load_when_split)
		{
			continue;
		}
		if (!best_result || one_cell_node->get_smoothed_load() >= best_result->get_smoothed_load())
		{
			best_result = one_cell_node;
		}

	}
	return best_result;
}

如果通过get_best_cell_to_split找到了一个可以用来分裂的节点，执行这个节点的分裂的时候有一个非常重要的问题，就是决定在哪个位置分裂。由于我们设定分裂的时候新创建的节点的最短边长为4*ghost_radius，因此只需要选择分裂的方向即可。可选的分裂方向有四个left_x, right_x, low_z, high_z，我们需要计算这四个方向上分裂出去的entity_load，选择其中最大的作为最佳分裂方向，这个过程被封装成了下面的这个函数:

cell_split_direction space_cells::space_node::calc_best_split_direction(float ghost_radius) const
{
	
	if (m_entity_loads.empty())
	{
		// 选择最长边的一个方向

		if (m_boundary.max.x - m_boundary.min.x > m_boundary.max.z - m_boundary.min.z)
		{
			return cell_split_direction::left_x;
		}
		else
		{
			return cell_split_direction::low_z;
		}

	}
	else
	{
		std::array<float, 4> split_gains;
		std::fill(split_gains.begin(), split_gains.end(), 0);
		
		float temp_acc_loads = 0;
		for (int i = 0; i < 1; i++)
		{
			auto cur_sorted_load_idx_copy = vec_iter_wrapper(m_sorted_entity_load_idx_by_axis[i], false);

			while (cur_sorted_load_idx_copy.valid())
			{
				auto one_index = cur_sorted_load_idx_copy.get_and_next();
				const auto& one_load = m_entity_loads[one_index];
				if (one_load.pos[i] > m_boundary.min[i] + 4 * ghost_radius)
				{
					break;
				}
				else
				{
					temp_acc_loads += one_load.load;
				}
			}
			split_gains[i*2] = temp_acc_loads;
			
			temp_acc_loads = 0;
			auto cur_sorted_load_idx_copy_reverse = vec_iter_wrapper(m_sorted_entity_load_idx_by_axis[i], true);

			while (cur_sorted_load_idx_copy_reverse.valid())
			{
				auto one_index = cur_sorted_load_idx_copy_reverse.get_and_next();
				const auto& one_load = m_entity_loads[one_index];
				if (one_load.pos[i] + 4 * ghost_radius < m_boundary.max[i])
				{
					break;
				}
				else
				{
					temp_acc_loads += one_load.load;
				}
			}
			split_gains[i*2 + 1] = temp_acc_loads;
		}
		
	}
}

这里使用split_gains这个四个元素的array来记录四个方向执行分裂时移出当前节点的负载，计算的时候先使用一个for循环来遍历x/z这两个轴方向，然后在同一个轴里再计算从左侧切割和从右侧切割4*ghost_radius时能够减轻的负载。计算的时候会对m_sorted_entity_load_idx_by_axis[i]这个有序数组做遍历，m_sorted_entity_load_idx_by_axis[i]里存储的是按照对应轴坐标上升的m_entity_loads索引。这里使用一个vec_iter_wrapper来分别对这个坐标上升的数组索引做一次正向遍历和一次反向遍历，刚好对应从坐标轴的左侧切割和从坐标轴的右侧切割。

在计算出四个方向分裂的负载降低值之后，选择其中split_gain最大的作为分裂方向进行返回。为了避免当前节点的分裂方向与父节点的分裂方向一致，会对同分裂方向计算出来的负载乘以一个系数，这样期望可以得到交替方向的节点分裂:

// 避免新的子节点与原来的兄弟节点划分方向相同 以免出现连续多个同方向划分
if (m_parent)
{
	if (m_parent->is_split_x())
	{
		if (this == m_parent->m_children[0])
		{
			split_gains[int(cell_split_direction::right_x)] *= 0.5f;
		}
		else
		{
			split_gains[int(cell_split_direction::left_x)] *= 0.5f;
		}
	}
	else
	{
		if (this == m_parent->m_children[0])
		{
			split_gains[int(cell_split_direction::high_z)] *= 0.5f;
		}
		else
		{
			split_gains[int(cell_split_direction::low_z)] *= 0.5f;
		}
	}
}
int best_dir = 0;
float best_gain = split_gains[0];
for (int i = 1; i < 4; i++)
{
	if (split_gains[i] > best_gain)
	{
		best_gain = split_gains[i];
		best_dir = i;
	}
}
return cell_split_direction(best_dir);

在明确了分裂方向之后，需要从当前所有进程里选出负载最低的进程作为新space_node的归属进程，然后通过split_at_direction来更新space_cells的内部结构:

const space_cells::space_node* space_cells::split_at_direction(const std::string& origin_space_id, cell_split_direction split_direction, const std::string& new_space_id, const std::string& new_space_game_id)
{
	auto cur_cell_iter = m_leaf_nodes.find(origin_space_id);
	if (cur_cell_iter == m_leaf_nodes.end())
	{
		return nullptr;
	}
	auto cur_cell = cur_cell_iter->second;
	if (!cur_cell->ready() || cur_cell->cell_load_report_counter() == 0)
	{
		return nullptr;
	}
	switch (split_direction)
	{
	case cell_split_direction::left_x:
		return split_x(cur_cell->boundary().min.x + 4 * m_ghost_radius, origin_space_id, new_space_game_id, new_space_id, origin_space_id);
	case cell_split_direction::right_x:
		return split_x(cur_cell->boundary().max.x - 4 * m_ghost_radius, origin_space_id, new_space_game_id, origin_space_id, new_space_id);
	case cell_split_direction::low_z:
		return split_z(cur_cell->boundary().min.z + 4 * m_ghost_radius, origin_space_id, new_space_game_id, new_space_id, origin_space_id);
	case cell_split_direction::high_z:
		return split_z(cur_cell->boundary().max.z - 4 * m_ghost_radius, origin_space_id, new_space_game_id, origin_space_id, new_space_id);
	default:
		return nullptr;
	}

}

这里会从分割方向从当前的节点里分裂出4 * m_ghost_radius的部分，然后使用split函数来创建新节点并维护好整体的分割树。

最后的删除低负载space_node的负载均衡最简单，遍历所有的叶子节点，如果这个节点近期汇报的负载都小于阈值lb_param.max_cell_load_when_remove，则选取其中负载最低的作为要合并的节点，相关负载都需要合并到其兄弟节点里去:

const space_cells::space_node* space_cells::get_best_cell_to_merge(const std::unordered_map<std::string, float>& game_loads, const cell_load_balance_param& lb_param)
{
	const space_cells::space_node* best_result = nullptr;
	for (const auto& [one_cell_id, one_cell_node] : m_leaf_nodes)
	{
		if (!one_cell_node->is_leaf_cell())
		{
			continue;
		}
		if (one_cell_node->is_merging())
		{
			continue;
		}
		if (one_cell_node->space_id() == m_master_cell_id)
		{
			continue;
		}
		if (one_cell_node->cell_load_report_counter() <= lb_param.min_cell_load_report_counter_when_remove)
		{
			continue;
		}
		auto cur_sibling = one_cell_node->sibling();
		if (!cur_sibling || cur_sibling->m_min_cell_load_report_counter <= lb_param.min_cell_load_report_counter_when_remove)
		{
			continue;
		}
		if (cur_sibling->is_merging())
		{
			continue;
		}
		if (one_cell_node->get_smoothed_load() > lb_param.max_cell_load_when_remove)
		{
			continue;
		}
		
		if (!best_result || one_cell_node->get_smoothed_load() < best_result->get_smoothed_load())
		{
			best_result = one_cell_node;
		}

	}
	return best_result;
}

这个节点合并并不是一下就执行好的，而是先将这个节点标记为is_merging状态，同时将这个节点的边长缩小为0.5*m_ghost_radius:

bool space_cells::start_merge(const std::string& cell_id)
{
	auto temp_node_iter = m_leaf_nodes.find(cell_id);
	if (temp_node_iter == m_leaf_nodes.end())
	{
		return false;
	}

	auto cur_node = temp_node_iter->second;
	if (!cur_node->parent())
	{
		return false;
	}
	if (cur_node->is_merging())
	{
		return false;
	}
	if(cur_node->sibling()->is_merging())
	{
		return false;
	}
	cur_node->set_is_merging();
	auto cur_parent = cur_node->parent();
	auto remain_radius = 0.5 * m_ghost_radius;
	double new_split_pos = cur_node->boundary().min.x + remain_radius;
	double old_split_pos = 0;
	if (cur_parent->is_split_x())
	{
		if (cur_node == cur_parent->children()[0])
		{
			old_split_pos = cur_node->boundary().max.x;
			new_split_pos = cur_node->boundary().min.x + remain_radius;
		}
		else
		{
			old_split_pos = cur_node->boundary().min.x;
			new_split_pos = cur_node->boundary().max.x - remain_radius;
		}
	}
	else
	{
		if (cur_node == cur_parent->children()[0])
		{
			old_split_pos = cur_node->boundary().max.z;
			new_split_pos = cur_node->boundary().min.z + remain_radius;
		}
		else
		{
			old_split_pos = cur_node->boundary().min.z;
			new_split_pos = cur_node->boundary().max.z - remain_radius;
		}
	}
	cur_parent->children()[0]->update_boundary_with_new_split(new_split_pos, cur_parent->is_split_x(), new_split_pos < old_split_pos, true);
	cur_parent->children()[1]->update_boundary_with_new_split(new_split_pos, cur_parent->is_split_x(), new_split_pos < old_split_pos, false);
	return true;
}

当space_server同步到了最新的space_node的范围之后，被标记为is_merging的节点会在计时器里驱动当前节点的所有entity迁移到周围的space_node中。当所有的entity都迁移完毕之后，会通知回space_service，通过space_cells::finish_merge来最终将这两个叶子节点合并为一个节点，同时通知对应的space对象执行销毁操作：

void space_service::report_finish_cell_merge(const utility::rpc_msg& data,  const std::string& cell_space_id)
{
	auto cur_cell_iter = m_cell_spaces.find(cell_space_id);
	if (cur_cell_iter == m_cell_spaces.end())
	{
		return;
	}
	auto cur_union_space_iter = m_union_spaces.find(cur_cell_iter->second->union_space_id);
	if (cur_union_space_iter == m_union_spaces.end())
	{
		return;
	}
	auto cur_cell_game_id = cur_union_space_iter->second->cells.finish_merge(cell_space_id);
	if (cur_cell_game_id.empty())
	{
		return;
	}
	
	utility::rpc_msg cell_notify_msg;
	cell_notify_msg.cmd = "notify_finish_merge_space_node";
	cell_notify_msg.set_args(cell_space_id);
	notify_cell_change(cur_cell_iter->second->union_space_id, cell_notify_msg, std::string{});


	utility::rpc_msg cur_destroy_msg;
	cur_destroy_msg.cmd = "notify_destroy_space";
	cur_destroy_msg.args.push_back(cell_space_id);
	call_space_manager(cur_cell_game_id, cur_destroy_msg);
	m_game_loads[cur_cell_game_id].cell_spaces.erase(cell_space_id);
	m_cell_spaces.erase(cell_space_id);
	m_space_types.erase(cell_space_id);
}

深入理解游戏服务端:从网络通信到无缝迁移

Mosaic Game 的分布式场景

space_service上的分布式场景

cell_space管理器space_cells

SpaceCells的状态同步

SpaceCells负载均衡