并查集

有序表并查集

问题引入

题目：一个矩阵中只有0和1两种值，每个位置都可以和自己的上、下、左、右四个位置相连，如果有一片1连在一起则成为一个岛，求一个矩阵中有几个岛

{0,0,1,0,1,0
1,1,1,0,1,0
1,0,0,1,0,0
0,0,0,0,0,0}
这个矩阵有3个岛

【进阶】如何设计一个并行算法解决

200. 岛屿数量 - 力扣（LeetCode）

int numIslands(vector<vector<char>>& grid) {
    int m=grid.size();
    int n=grid[0].size();
    int result=0;
    for(int i=0;i<m;i++){
        for(int j=0;j<n;j++){
            if(grid[i][j]=='1'){
                result++;
                inflect(grid,m,n,i,j);
            }
        }
    }
    return result;
}

void inflect(vector<vector<char>>& grid,int m,int n,int i, int j){
    if(i>=m||i<0||j>=n||j<0||grid[i][j]!='1') return;
    grid[i][j]='2';
    inflect(grid,m,n,i-1,j);
    inflect(grid,m,n,i+1,j);
    inflect(grid,m,n,i,j-1);
    inflect(grid,m,n,i,j+1);
}

【进阶】===》并查集

操作：

bool isSameSet(a,b)
void union(a,b)

很多结构都可以，但只有并查集两种操作的时间复杂度都是O(1)

bool isSameSet(a,b)

判断a和b的父节点是否相同
void union(a,b)

判断a和b的父节点是否相同，若不同，个数少的集合的顶端挂在个数多的集合顶端之下

findHead(Element a)：往上走找代表结点

改进：往上走的路径记录下来，路径上的结点直接指向父节点

结论：当findHead调用次数逼近或者大于O(N)，时间复杂度趋近于O(1)——越用越快

C++11的for循环有了一种新的用法：
1
2
3
for (auto n : arr){
	std::cout << n << std::endl;
}
上述方式是只读，如果需要修改arr里边的值，可以使用for(auto& n:arr)，for循环的这种使用方式的内在实现实际上还是借助迭代器的，所以如果在循环的过程中对arr进行了~~增加和删除操作~~，那么程序将对出现意想不到的错误。

实现：

#include <iostream>
#include <list>
#include <unordered_map>
#include <unordered_set>
#include <stack>
using namespace std;

class UnionFindSet {
private:
	int findHead(int element) {
		stack<int> path;
		while (element != fatherMap.at(element)) {
			path.push(element);
			element = fatherMap[element];
		}
		while (!path.empty()) {
			fatherMap[path.top()] = element;
			path.pop();
		}
		return element;
	}
public:
	unordered_set<int> set;
	unordered_map<int, int> fatherMap;
	unordered_map<int, int> sizeMap;
	UnionFindSet(list<int> dataList) {
		for (int value : dataList) {
			set.emplace(value);
			fatherMap.emplace(make_pair(value, value));
			sizeMap.emplace(make_pair(value, 1));
		}
	}

	bool isSameSet(int a, int b) {
		if (set.count(a) && set.count(b)) {
			return findHead(a) == findHead(b);
		}
		return false;
	}
	void Union(int a, int b) {
		if (set.count(a) && set.count(b)) {
			int afather = findHead(a);
			int bfather = findHead(b);
			if (afather != bfather) {
				int big = sizeMap[afather] >= sizeMap[bfather] ? afather : bfather;
				int small = big == afather ? bfather : afather;
				fatherMap[small] = big;
				sizeMap[big] = sizeMap[afather] + sizeMap[bfather];
				sizeMap.erase(small);
			}
		}
	}
};

进阶解法：设计一个并行算法解决

问题：左右两个cpu分别感染，左侧cpu会识别两个岛，右侧cpu会识别两个岛，原因：破坏了岛的连通性

左右cpu分别计算的结果：4个岛，接下来进行合并

合并方法：

记录初始感染点（图中绿色），记录边界上被感染点的初始感染点是谁（红色）
四个初始感染点，记为4个集合{A}{B}{C}{D}
比较在边界上相连的感染点
1. 感染点分别为A和C，不在一个集合，进行合并：{AC}{B}{D}，岛数=4-1=3
2. 感染点为B和C，不在一个集合，进行合并：{ABC}{D}，岛数=3-1=2
3. 感染点为B和D，不在一个集合，进行合并：{ABCD}，岛数=2-1=1
4. 感染点为A和D，在一个集合，无需操作
最后，总岛数=1

如果使用很多个CPU，同样的做法：每个CPU分别感染+记录边界上的感染点

有序表 并查集

问题引入

并查集

进阶解法：设计一个并行算法解决

有序表并查集