Scene Management in BigWorld

Cell and Space

In BigWorld every scene is represented by a Space object, and every Space has a unique uint32 identifier:

class Space
{
public:
	Space( SpaceID id = 0, bool isNewSpace = true,
		bool isFromDB = false, uint32 preferredIP = 0 );
	~Space();

	void shutDown();

	SpaceID id() const		{ return id_; }

	CellData * addCell( CellApp & cellApp, CellData * pCellToSplit = NULL );
	CellData * addCell();
	void addCell( CellData * pCell );
	CellData * addCellTo( CellData * pCellToSplit );
private:
	SpaceID	id_;
	Cells cells_;

	CM::BSPNode * pRoot_;
};

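For a distributed scene, the logical scene is assembled from several rectangular regions. Each region corresponds to one CellData, and all of them are stored in Cells, a linear container of CellData pointers: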

class Cells
{
private:
	typedef BW::vector< CellData * > Container;

public:
	Cells() {}
	~Cells();

	void add( CellData * pData )		{ cells_.push_back( pData ); }
	void erase( CellData * pData );
private:
	Container cells_;
};

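Besides being stored linearly in Cells, each CellData is also part of a binary tree: it derives from BSPNode, where BSP stands for Binary Space Partitioning. Every Space keeps the root of this partition tree in its CM::BSPNode * pRoot_ member, and every BSPNode has a BW::Rect range_ describing the scene area that the node covers: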

class CellData : public CM::BSPNode
{
public:
	CellData( CellApp & cellApp, Space & space );
	CellData( Space & space, BinaryIStream & data );
	~CellData();
};

class BSPNode : public WatcherProvider
{
public:
	BSPNode( const BW::Rect & range );
	virtual ~BSPNode() {};
protected:
	BW::Rect range_;
	EntityBoundLevels entityBoundLevels_;
	BW::Rect chunkBounds_;
};

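The Binary Space Partitioning Tree

In general, binary partitioning of a 2D plane may split along arbitrary lines. The Binary Space Partitioning used here is restricted to horizontal or vertical splits, so the corresponding addCell interface takes an explicit bool isHorizontal to say which one is wanted: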

virtual CM::BSPNode * addCell( CellData * pCell, bool isHorizontal );

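With this restriction the structure degenerates into a KD-tree:

[Figure: binary space partitioning with a vertical split]

Here is a concrete Space that has been divided by horizontal and vertical splits:

[Figure: example of a partitioned Space]

The corresponding KD-tree looks like this:

[Figure: KD-tree corresponding to the partitioned Space]

Note that when addCell runs, the new Cell's interval along the split axis has length 0, because both min_ and max_ are set to the partition point. The new Cell's Rect therefore has zero area, and it is resized later by load balancing: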

CM::BSPNode * CellData::addCell( CellData * pCell, bool isHorizontal )
{
	const float partitionPt = range_.range1D( isHorizontal ).max_;
	BW::Rect newRange = range_;
	newRange.range1D( isHorizontal ).min_ = partitionPt;
	newRange.range1D( isHorizontal ).max_ = partitionPt;
	pCell->setRange( newRange );

	// TODO: At the moment, the new cell is always added to the right or top. It
	// may be better to choose the side based on which side is unbounded. A
	// simple test might be to check if fabs( min_ ) < fabs( max_ ) of
	// range_.range1D( isHorizontal ).

	return new CM::InternalNode( this, pCell,
			isHorizontal, range_, partitionPt );
}
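
The zero-area starting range can be checked with a small standalone sketch. Range1D, Rect and rangeForNewCell below are hypothetical stand-ins written for this example, not the BW::Rect API, and the mapping of isHorizontal to the y axis is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-ins for BW::Rect and its 1D ranges; not the engine types.
struct Range1D
{
	float min_;
	float max_;
	float length() const { return max_ - min_; }
};

struct Rect
{
	Range1D x_;
	Range1D y_;

	// Picks the axis being split. Mapping isHorizontal to the y axis is an
	// assumption made for this sketch.
	Range1D & range1D( bool isHorizontal ) { return isHorizontal ? y_ : x_; }

	float area() const { return x_.length() * y_.length(); }
};

// Computes the range handed to a newly added cell: the parent's range
// collapsed onto the max edge of the split axis.
inline Rect rangeForNewCell( Rect parent, bool isHorizontal )
{
	const float partitionPt = parent.range1D( isHorizontal ).max_;
	Rect newRange = parent;
	newRange.range1D( isHorizontal ).min_ = partitionPt;
	newRange.range1D( isHorizontal ).max_ = partitionPt;
	return newRange;
}
```

For a parent covering x in [0, 100] and y in [0, 50], a vertical split gives the new cell x in [100, 100] and the unchanged y interval, so its area is 0 until the partition line is moved.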

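Note that the return value is a CM::InternalNode, which also derives from BSPNode. The two CellData objects involved (this and pCell) become the left and right children of the new InternalNode: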

InternalNode::InternalNode( BSPNode * pLeft, BSPNode * pRight,
		bool isHorizontal, const BW::Rect & range, float position ) :
	// Note: There are three constructors.
	BSPNode( range )
{
	this->init();
	pLeft_ = pLeft;
	pRight_ = pRight;
	isHorizontal_ = isHorizontal;
	position_ = position;
}

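So BSPNode has two concrete kinds:

  1. InternalNode, which has two children and is an interior node of the BSP tree. It is not responsible for any scene area itself.
  2. CellData, which has no children and is a leaf of the BSP tree. Each leaf is responsible for one concrete scene area.

The root stored in Space as CM::BSPNode * pRoot_ may be either kind.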
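
The leaf/interior split can be illustrated with a minimal point lookup. This is a simplified sketch, not engine code: Node, Leaf, Internal, findLeaf and makeExampleTree are hypothetical names, and the convention that a horizontal split compares y while a vertical split compares x is an assumption.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical simplified hierarchy; the names echo the article only loosely.
struct Node
{
	virtual ~Node() {}
	// Returns the id of the leaf responsible for the point (x, y).
	virtual int findLeaf( float x, float y ) const = 0;
};

// A leaf owns an area, like CellData in the article.
struct Leaf : Node
{
	explicit Leaf( int id ) : id_( id ) {}
	int findLeaf( float, float ) const override { return id_; }
	int id_;
};

// An interior node only routes the query to one of its two children.
struct Internal : Node
{
	Internal( std::unique_ptr< Node > pLeft, std::unique_ptr< Node > pRight,
			bool isHorizontal, float position ) :
		pLeft_( std::move( pLeft ) ),
		pRight_( std::move( pRight ) ),
		isHorizontal_( isHorizontal ),
		position_( position )
	{}

	int findLeaf( float x, float y ) const override
	{
		// Assumption: horizontal splits compare y, vertical splits compare x.
		const float v = isHorizontal_ ? y : x;
		return (v < position_ ? *pLeft_ : *pRight_).findLeaf( x, y );
	}

	std::unique_ptr< Node > pLeft_;
	std::unique_ptr< Node > pRight_;
	bool isHorizontal_;
	float position_;
};

// Builds a three-leaf tree: the root splits vertically at x = 0 and its
// right child splits horizontally at y = 10.
inline std::unique_ptr< Node > makeExampleTree()
{
	std::unique_ptr< Node > right( new Internal(
		std::unique_ptr< Node >( new Leaf( 2 ) ),
		std::unique_ptr< Node >( new Leaf( 3 ) ),
		true, 10.f ) );
	return std::unique_ptr< Node >( new Internal(
		std::unique_ptr< Node >( new Leaf( 1 ) ),
		std::move( right ), false, 0.f ) );
}
```

With this tree, a point with negative x resolves to leaf 1, and points with non-negative x resolve to leaf 2 or 3 depending on whether y is below 10.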

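Creating a Space

In BigWorld, CellAppMgr is responsible for creating a Space and assigning it to a suitable CellApp. The entry point is createEntityInNewSpace, which is called when CellAppMgr receives a request to create a new Space: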

void CellAppMgr::createEntityInNewSpace( const Mercury::Address& srcAddr,
		const Mercury::UnpackedMessageHeader& header,
		BinaryIStream & data )
{
	bool doesSpaceHavePreferredIP;

	data >> doesSpaceHavePreferredIP;

	uint32 preferredIP = (doesSpaceHavePreferredIP ? srcAddr.ip : 0);

	if (doesSpaceHavePreferredIP)
	{
		TRACE_MSG( "CellAppMgr::createEntityInNewSpace: "
					"Creating space with preferred IP %s\n",
				srcAddr.ipAsString() );
	}

	Space * pSpace = new Space( this->generateSpaceID(),
		/*isNewSpace*/ true, /*isFromDB*/ false,
		preferredIP );
	if (pSpace->addCell())
	{
		this->addSpace( pSpace );
	}
	else
	{
		ERROR_MSG( "CellAppMgr::createEntityInNewSpace: "
				"Unable to add a cell to space %u.\n", pSpace->id() );
		bw_safe_delete( pSpace );
	}

	//passing pSpace==NULL is needed here to send the errors (and is safe)
	this->createEntityCommon( pSpace, srcAddr, header, data );
}

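The first argument of this RPC is bool doesSpaceHavePreferredIP, which says whether the Space should be created on the machine that sent the RPC. If it is false, preferredIP is set to 0, meaning no machine is preferred and the CellApp is chosen by the normal selection logic alone. CellAppMgr then calls generateSpaceID to obtain a new SpaceID as the unique identifier and constructs a Space object from these arguments: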

Space::Space( SpaceID id, bool isNewSpace, bool isFromDB, uint32 preferredIP ) :
	id_( id ),
	pRoot_( NULL ),
	isBalancing_( false ),
	preferredIP_( preferredIP ),
	isFirstCell_( isNewSpace ),
	isFromDB_( isFromDB ),
	hasHadEntities_( !isFromDB ),
	waitForChunkBoundUpdateCount_( 0 ),
	spaceGrid_( 0.f ),
	spaceBounds_( 0.f, 0.f, 0.f, 0.f ),
	artificialMinLoad_( 0.f )
{
}

The constructor initializes pRoot_ to NULL. To keep the tree valid, CellAppMgr calls addCell immediately after constructing the Space, which creates the root node:

CellData * Space::addCell()
{
	CellAppGroup * pGroup = NULL;

	if (!cells_.empty())
	{
		pGroup = cells_.front()->cellApp().pGroup();
	}

	const CellApps & cellApps = CellAppMgr::instance().cellApps();
	CellApp * pCellApp = cellApps.findBestCellApp( this, pGroup );


	return pCellApp != NULL ? this->addCell( *pCellApp ) : NULL;
}

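Space::addCell uses findBestCellApp to pick a suitably loaded CellApp to host the whole Space. It then calls the two-parameter overload of addCell with pCellApp as the only argument, so the second parameter takes its default value of NULL:

CellData * Space::addCell( CellApp & cellApp, CellData * pCellToSplit /* = NULL */ )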
{
	INFO_MSG( "Space::addCell: Space %u. CellApp %u (%s)\n",
			id_, cellApp.id(), cellApp.addr().c_str() );

	if (cellApp.isRetiring())
	{
		WARNING_MSG( "Space::addCell: Adding a cell to CellApp %u (%s) which "
			"is retiring.\n", cellApp.id(), cellApp.addr().c_str() );
	}

	CellData * pCellData = new CellData( cellApp, *this );

	if (pCellToSplit)
	{
		MF_ASSERT( pRoot_ != NULL );
		pRoot_ = pRoot_->addCellTo( pCellData, pCellToSplit );
		MF_ASSERT( pRoot_ != NULL );
	}
	else
	{
		pRoot_ = (pRoot_ ? pRoot_->addCell( pCellData ) : pCellData);
	}

	pRoot_->updateLoad();
	// Remaining code omitted
}

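Because pRoot_ is still NULL at this point, the newly created pCellData becomes pRoot_ directly. A freshly created Space therefore has a BSP tree consisting of a single CellData leaf that covers the whole area.

Later, load balancing reshapes the BSP tree, adding and removing CellData nodes as needed. That is covered in a later chapter.

At present only one place issues this remote call: Base::py_createInNewSpace, which BaseApp exposes to Python scripts: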

/**
 *	This method implements the base's script method to create an associated
 *	entity on a cell in a new space.
 */
PyObject * Base::py_createInNewSpace( PyObject * args, PyObject * kwargs )
{
	const char * errorPrefix = "Base.createEntityInNewSpace: ";

	PyObject * pPreferThisMachine = NULL;

	static char * keywords[] = 
	{
		const_cast< char * >( "shouldPreferThisMachine" ),
		NULL
	};

	if (!PyArg_ParseTupleAndKeywords( args, kwargs,
		"|O:Base.createEntityInNewSpace", keywords, &pPreferThisMachine ))
	{
		return NULL;
	}

	std::auto_ptr< Mercury::ReplyMessageHandler > pHandler(
		this->prepareForCellCreate( errorPrefix ) );

	if (!pHandler.get())
	{
		return NULL;
	}

	bool shouldPreferThisMachine = false;

	if (pPreferThisMachine)
	{
		shouldPreferThisMachine = PyObject_IsTrue( pPreferThisMachine );
	}

	Mercury::Channel & channel = 
		BaseApp::getChannel( BaseApp::instance().cellAppMgrAddr() );

	// We don't use the channel's own bundle here because the streaming might
	// fail and the message might need to be aborted halfway through.
	std::auto_ptr< Mercury::Bundle > pBundle( channel.newBundle() );

	// Start a request to the Cell App Manager.
	pBundle->startRequest( CellAppMgrInterface::createEntityInNewSpace,
			pHandler.get() );

	*pBundle << shouldPreferThisMachine;

	*pBundle << this->channel().version();

	*pBundle << false; /* isRestore */

	// See if we can add the necessary data to the bundle
	if (!this->addCellCreationData( *pBundle, errorPrefix ))
	{
		isCreateCellPending_ = false;
		isGetCellPending_ = false;

		return NULL;
	}

	// Send it to the Cell App Manager.
	channel.send( pBundle.get() );
	pHandler.release(); // Now owned by Mercury.

	Py_RETURN_NONE;
}

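This interface is exposed to Python scripts. It creates a new Space and then creates a new entity inside it. Its only parameter is shouldPreferThisMachine, which says whether the Space should be created on the machine that hosts the current BaseApp. If it is true, the IP address of that machine is passed to CellAppMgr, and when a CellApp is selected, BaseCellTrafficScorer raises the priority of CellApps running on that IP: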

/**
 *	This method calculates the score for a CellApp's base-to-cell traffic.
 *	This is determined by comparing the IP address of the CellApp with the
 *	preferred IP of the space on which a new cell is being added. If this
 *	CellApp is running on the preferred machine, then it is likely that many
 *	of the space's Base entities will exist on that machine. This means that
 *	much of the base-to-cell traffic will occur between processes on the same
 *	machine, reducing network load.
 *	This method returns 1 if the CellApp is on the preferred IP, and 0 if not.
 */
float BaseCellTrafficScorer::getScore( const CellApp * pApp,
		const Space * pSpace ) const
{
	MF_ASSERT( pSpace );

	return (pApp->addr().ip == pSpace->preferredIP()) ? 1.f : 0.f;
}

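This may look odd at first: a BaseApp only manages Base entities, not Cells, so why tell CellAppMgr to prefer the BaseApp's IP? The answer is that BaseApp and CellApp are separate processes, not separate machines. One physical machine can host several BaseApps and CellApps at the same time. Handing the BaseApp's IP to CellAppMgr is therefore perfectly valid, and the benefit is much lower latency between the CellApp and the related BaseApp, because the traffic stays on the local machine.

Destroying a Space

Destroying a Space is also the job of CellAppMgr. It receives a shutDownSpace remote call carrying the SpaceID to destroy, looks up the Space object with findSpace, and calls its shutDown method: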
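
To make the effect of the preferred IP concrete, here is a minimal selection sketch. The source does not show how findBestCellApp combines its scorers, so the rule used here (traffic score first, lower load as the tie-breaker) is purely an assumption for illustration, and AppInfo, trafficScore and pickApp are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of a CellApp candidate.
struct AppInfo
{
	uint32_t ip;
	float load;
};

// Same idea as BaseCellTrafficScorer::getScore: 1 on the preferred IP, else 0.
inline float trafficScore( const AppInfo & app, uint32_t preferredIP )
{
	return (app.ip == preferredIP) ? 1.f : 0.f;
}

// Returns the index of the chosen candidate, or -1 if there is none.
// Assumed rule: higher traffic score wins, lower load breaks ties.
inline int pickApp( const std::vector< AppInfo > & apps, uint32_t preferredIP )
{
	int best = -1;
	for (std::size_t i = 0; i < apps.size(); ++i)
	{
		if (best < 0)
		{
			best = int( i );
			continue;
		}
		const float score = trafficScore( apps[ i ], preferredIP );
		const float bestScore = trafficScore( apps[ best ], preferredIP );
		if (score > bestScore ||
				(score == bestScore && apps[ i ].load < apps[ best ].load))
		{
			best = int( i );
		}
	}
	return best;
}
```

Under this assumed rule, a lightly loaded CellApp on another machine loses to a busier one on the preferred machine, which is the trade-off the comment in BaseCellTrafficScorer describes.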

/**
 *	This method handles a message informing us to shut down a space.
 */
void CellAppMgr::shutDownSpace(
		const CellAppMgrInterface::shutDownSpaceArgs & args )
{
	Space * pSpace = this->findSpace( args.spaceID );

	if (pSpace)
	{
		if (pSpace->hasHadEntities())
		{
			// Delay shutting down the space until the end of tick
			//	don't shutdown twice
			if (spacesShuttingDown_.insert( args.spaceID ).second)
			{
				pSpace->shutDown();
			}
		}
		else
		{
			NOTICE_MSG( "CellAppMgr::shutDownSpace: Not shutting down space "
								"%u since it has not had any entities\n",
							pSpace->id() );
		}
	}
	else
	{
		ERROR_MSG( "CellAppMgr::shutDownSpace: Could not find space %u\n",
			args.spaceID );
	}
}

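spacesShuttingDown_ is a std::set< SpaceID > that records the Spaces currently being destroyed, which prevents shutting the same Space down twice.

Space::shutDown iterates over all of the Space's Cells and tells each Cell's CellApp to destroy the Space: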
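
The duplicate guard relies on std::set::insert returning a pair whose second member is true only when the element was newly inserted. A minimal sketch of the idiom, with ShutdownTracker as a hypothetical stand-in for the relevant part of CellAppMgr:

```cpp
#include <cassert>
#include <cstdint>
#include <set>

typedef uint32_t SpaceID; // local stand-in for the engine typedef

// Hypothetical helper that counts how many shutdowns are actually started.
struct ShutdownTracker
{
	std::set< SpaceID > pending_;
	int shutDownCalls_ = 0;

	// Returns true only for the first request for a given id.
	bool request( SpaceID id )
	{
		// insert().second is false when the id was already in the set.
		if (pending_.insert( id ).second)
		{
			++shutDownCalls_;
			return true;
		}
		return false;
	}
};
```

Requesting the same id twice starts only one shutdown, while a different id is accepted independently.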

/**
 *	This method shuts down this space and removes it from the system.
 */
void Space::shutDown()
{
	INFO_MSG( "Space::shutDown: Shutting down space %u "
				"(remaining cells: %" PRIzu ")\n",
			id_, cells_.size() );

	Cells::iterator iter = cells_.begin();

	while (iter != cells_.end())
	{
		CellApp * pApp = (*iter)->pCellApp();

		if (pApp)
		{
			pApp->shutDownSpace( this->id() );
		}

		++iter;
	}
}

CellApp::shutDownSpace packs the request into a CellAppInterface::shutDownSpace message and sends it to the corresponding CellApp:

/**
 *	This method lets the CellApp know that the space is being destroyed.
 */
void CellApp::shutDownSpace( SpaceID spaceID )
{
	Mercury::Bundle & bundle = this->bundle();
	bundle.startMessage( CellAppInterface::shutDownSpace );
	bundle << spaceID;

	this->send();
}

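When the CellApp receives the CellAppInterface::shutDownSpace message, it calls Space::shutDownSpace. The Space is not destroyed immediately. Instead a timer, shuttingDownTimerHandle_, is registered with a timeout of one second (the value 1000000 passed to addTimer):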

/**
 *	This method handles a message from the CellAppMgr telling us that the space
 *	has been destroyed. It may take some time before all the cells are removed.
 */
void Space::shutDownSpace( BinaryIStream & data )
{
	if (!shuttingDownTimerHandle_.isSet())
	{
		// Register a timer to go off in one second.
		shuttingDownTimerHandle_ =
			CellApp::instance().mainDispatcher().addTimer( 1000000, this, NULL,
			"ShutdownSpace" );
	}
	else
	{
		INFO_MSG( "Space::shutDownSpace: Already shutting down.\n" );
	}
}

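When the timer fires, pCell_->onSpaceGone is called to tell the Cell to start its exit logic. The handler then checks whether this is the only Cell of the Space. If it is, and the Space holds no entities, CellApp::destroyCell is called to destroy the Cell completely: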

/**
 *	This method handles the timer associated with the space.
 *	Currently it is only used for the shutting down timer.
 */
void Space::handleTimeout( TimerHandle handle, void * arg )
{
	if (pCell_)
	{
		pCell_->onSpaceGone();

		if (this->hasSingleCell() && entities_.empty())
		{
			CellApp::instance().destroyCell( pCell_ );
			// when the cell is destructed it will clear our ptr to it
			MF_ASSERT( pCell_ == NULL );
		}
	}
}

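onSpaceGone iterates over all real entities and calls each entity's onSpaceGone script method. Afterwards, if the entity has not been destroyed, is still a RealEntity, and still belongs to this Space, its destroy method is called. The code that follows also shows CellApp::destroyCell, Cells::destroy and the Cell destructor, which destroys any remaining real entities and clears the Space's pointer to the Cell: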

/**
 *	This method is called when this space wants to be destroyed.
 */
void Cell::onSpaceGone()
{
	BW::vector< EntityPtr > entities( realEntities_.size() );
	std::copy( realEntities_.begin(), realEntities_.end(), entities.begin() );

	BW::vector< EntityPtr >::iterator iter = entities.begin();

	while (iter != entities.end())
	{
		EntityPtr pEntity = *iter;

		if (!pEntity->isDestroyed())
		{
			Entity::nominateRealEntity( *pEntity );

			PyObject * pMethod =
				PyObject_GetAttrString( pEntity.get(), "onSpaceGone" );
			Script::call( pMethod, PyTuple_New( 0 ),
					"onSpaceGone", true/*okIfFnNull*/ );

			if (!pEntity->isDestroyed() &&
					pEntity->isReal() &&
					&pEntity->space() == &this->space())
			{
				pEntity->destroy();
			}

			Entity::nominateRealEntityPop();
		}

		++iter;
	}
}
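
Cell::onSpaceGone copies realEntities_ into a local vector before iterating. A likely reason (an inference, the source does not state it) is that script callbacks and destroy() can modify the live container during the loop. The sketch below shows the snapshot idiom with plain ints; forEachSnapshot and removeFromLive are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Visits every element that was present when the call started, even if
// visit() adds to or removes from the live container. Returns the visit count.
inline int forEachSnapshot( std::vector< int > & live,
		const std::function< void( std::vector< int > &, int ) > & visit )
{
	// Iterate over a copy so the callback cannot invalidate our iteration.
	const std::vector< int > snapshot( live );
	int visited = 0;
	for (std::size_t i = 0; i < snapshot.size(); ++i)
	{
		visit( live, snapshot[ i ] );
		++visited;
	}
	return visited;
}

// Example callback: removes the visited id from the live container, similar
// in spirit to an entity being destroyed while it is being processed.
inline void removeFromLive( std::vector< int > & live, int id )
{
	live.erase( std::remove( live.begin(), live.end(), id ), live.end() );
}
```

Iterating the live vector directly while erasing from it would invalidate the loop iterator. The snapshot costs one copy and avoids that. The original additionally skips entries that were already destroyed by an earlier callback.
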
/**
 *	This method kills a cell.
 */
void CellApp::destroyCell( Cell * pCell )
{
	cells_.destroy( pCell );
}

void Cells::destroy( Cell * pCell )
{
	Container::iterator iter = container_.find( pCell->spaceID() );

	MF_ASSERT( iter != container_.end() );

	if (iter != container_.end())
	{
		container_.erase( iter );
		delete pCell;
	}
	else
	{
		ERROR_MSG( "Cells::deleteCell: Unable to kill cell %u\n",
									pCell->spaceID() );
	}

}


/**
 *	The destructor for Cell.
 */
Cell::~Cell()
{
	TRACE_MSG( "Cell::~Cell: for space %u\n", space_.id() );

	while (!realEntities_.empty())
	{
		int prevSize = realEntities_.size();

		realEntities_.front()->destroy();

		MF_ASSERT( prevSize > (int)realEntities_.size() );

		if (prevSize <= (int)realEntities_.size())
		{
			break;
		}
	}

	bw_safe_delete( pReplayData_ );

	MF_ASSERT_DEV( space_.pCell() == this );

	space_.pCell( NULL );
}


At the end of each CellAppMgr tick, the spacesShuttingDown_ set is traversed and every Space in it is forcibly deleted:

{
	SpaceIDs::iterator iter = spacesShuttingDown_.begin();
	while (iter != spacesShuttingDown_.end())
	{
		Spaces::iterator found = spaces_.find( *iter );
		if (found != spaces_.end())
		{
			delete found->second;
			spaces_.erase( found );
		}
		++iter;
	}
	spacesShuttingDown_.clear();
}

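So as soon as a shutdown request arrives, the Space is added to spacesShuttingDown_ and is deleted at the end of that same tick. There is no step that waits for every Cell of the Space to finish being destroyed. Since that destruction is asynchronous, some Cells of the Space may still be shutting down on their CellApps when CellAppMgr has already deleted its Space object.

To avoid problems from this asynchrony, the sender of the shutdown RPC must make sure that the Cells of the Space are no longer needed for any logic. At present the only caller of shutDownSpace is Space::requestShutDown on the CellApp: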
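
The deferred deletion can be modelled in a few lines. This is a sketch under simplifying assumptions: Manager and SpaceStub are hypothetical, ownership is expressed with std::unique_ptr instead of raw pointers, and the notification of CellApps is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <set>

typedef uint32_t SpaceID; // local stand-in for the engine typedef

struct SpaceStub // hypothetical placeholder for Space
{
	SpaceID id;
};

struct Manager // hypothetical placeholder for the relevant CellAppMgr state
{
	std::map< SpaceID, std::unique_ptr< SpaceStub > > spaces_;
	std::set< SpaceID > shuttingDown_;

	void addSpace( SpaceID id )
	{
		spaces_[ id ].reset( new SpaceStub{ id } );
	}

	// Only records the request; nothing is deleted during message handling.
	void requestShutDown( SpaceID id )
	{
		if (spaces_.count( id ) != 0)
		{
			shuttingDown_.insert( id );
		}
	}

	// Called once at the end of a tick: deletes every recorded space without
	// waiting for any acknowledgement from the cells.
	void endOfTick()
	{
		for (std::set< SpaceID >::const_iterator iter = shuttingDown_.begin();
				iter != shuttingDown_.end(); ++iter)
		{
			spaces_.erase( *iter );
		}
		shuttingDown_.clear();
	}
};
```

The space stays visible to other handlers for the rest of the tick and disappears at the sweep, regardless of how far the remote cells have progressed.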

/**
 *
 */
void CellAppMgrGateway::shutDownSpace( SpaceID spaceID )
{
	CellAppMgrInterface::shutDownSpaceArgs args;
	args.spaceID = spaceID;

	channel_.bundle() << args;

	channel_.send();
}


/**
 *	This method sends a request to the CellAppMgr to shut this space down.
 */
void Space::requestShutDown()
{
	if ( CellAppConfig::useDefaultSpace() && this->id() == 1 )
	{
		ERROR_MSG( "Space::requestShutDown: Requesting shut down for "
			"the default space\n" );
	}
	CellApp::instance().cellAppMgr().shutDownSpace( this->id() );
}

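Space::requestShutDown itself has two callers. One is Entity::destroySpace, which is exposed to Python so that game logic can force a scene to be destroyed. The other is Space::checkForShutDown, which checks whether the scene should be destroyed.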


/*~ function Entity destroySpace
	*  @components{ cell }
	*	This method attempts to shut down the space that the entity is in.
	*	It is not possible to shut down the default space.
	*/
PY_METHOD( destroySpace )

/**
 *	This method allows script to destroy a space.
 *
 *	@return Whether we were allowed to destroy the space
 */
bool Entity::destroySpace()
{
	AUTO_SCOPED_THIS_ENTITY_PROFILE;

	if ( CellAppConfig::useDefaultSpace() && this->space().id() == 1)
	{
		PyErr_Format( PyExc_ValueError,
			"destroySpace called on entity %d in default space", int(id_) );
		return false;
	}
	this->space().requestShutDown();
	return true;
}

/**
 *	This method checks whether we should request for this space to shut down.
 *	If we have no entities and we're the only cell, request a shutdown.
 *	We won't actually be deleted however until we've unloaded all our chunks.
 */
void Space::checkForShutDown()
{
	if (this->hasSingleCell() &&
			entities_.empty() && CellApp::instance().hasStarted() &&
			!this->isShuttingDown() &&
			!CellApp::instance().mainDispatcher().processingBroken() &&
			!(CellAppConfig::useDefaultSpace() && id_ == 1)) // Not for the default space.
	{
		INFO_MSG( "Space::checkForShutDown: Space %u is now empty.\n", id_ );
		this->requestShutDown();
	}
}

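The condition in Space::checkForShutDown matches expectations: the scene has no entities, only one Cell is running, and it is not the default scene. Asking CellAppMgr to destroy the scene is safe in that state. checkForShutDown is called from two places: Space::removeEntity, when an Entity leaves the scene, and Space::updateGeometry, when the Cell layout changes: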


/**
 *	This method removes an entity from this space.
 */
void Space::removeEntity( Entity * pEntity )
{
	// Some code omitted

	if (entities_.empty())
	{
		if (pCell_ != NULL)
		{
			this->checkForShutDown();
		}
	}
}

/**
 *	This method handles a message from the server that updates geometry
 *	information.
 */
void Space::updateGeometry( BinaryIStream & data )
{
	bool wasMulticell = !this->hasSingleCell();
	// Some code omitted
	
	// see if we want to expressly shut down this space now
	if (wasMulticell)
	{
		this->checkForShutDown();
	}
}
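
The shutdown condition used by checkForShutDown can be written as a pure predicate, which makes each requirement easy to test in isolation. SpaceState and shouldRequestShutDown are hypothetical names for this sketch; the fields simply flatten the checks from the original function.

```cpp
#include <cassert>

// Hypothetical flattened view of the conditions checked by the original.
struct SpaceState
{
	bool hasSingleCell;     // this cell is the only one in the space
	bool hasEntities;       // the space still contains entities
	bool appStarted;        // the CellApp has finished starting up
	bool isShuttingDown;    // a shutdown is already in progress
	bool processingBroken;  // the dispatcher loop has been broken
	bool isDefaultSpace;    // default space in use and this is space 1
};

// True when an empty, single-cell, non-default space should ask to be shut down.
inline bool shouldRequestShutDown( const SpaceState & s )
{
	return s.hasSingleCell && !s.hasEntities && s.appStarted &&
		!s.isShuttingDown && !s.processingBroken && !s.isDefaultSpace;
}
```

An empty single-cell space satisfies the predicate, while the same space with an entity in it, or the default space, does not.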