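1. Reading CSV data for DBSCAN analysis
Read the relevant columns from the CSV file, convert them into the format the algorithm expects, and then run DBSCAN. Plenty of DBSCAN code is publicly available; the version here is adapted from such public code.
The code is as follows: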
    import copy
    import random
    import time

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    # from sklearn.datasets import load_iris


    def find_neighbor(j, x, eps):
        """Return the indices of all points within eps of point j (j included)."""
        n = list()
        for i in range(x.shape[0]):
            temp = np.sqrt(np.sum(np.square(x[j] - x[i])))  # Euclidean distance
            if temp <= eps:
                n.append(i)
        return set(n)


    def dbscan(x, eps, min_pts):
        k = -1
        neighbor_list = []  # eps-neighborhood of every point
        omega_list = []  # set of core points
        gama = set(range(len(x)))  # initially every point is marked unvisited
        cluster = [-1 for _ in range(len(x))]  # cluster labels; -1 means noise
        for i in range(len(x)):
            neighbor_list.append(find_neighbor(i, x, eps))
            if len(neighbor_list[-1]) >= min_pts:
                omega_list.append(i)  # add the sample to the core points
        omega_list = set(omega_list)  # a set is easier to work with below
        while len(omega_list) > 0:
            gama_old = copy.deepcopy(gama)
            j = random.choice(list(omega_list))  # pick a random core point
            k = k + 1
            queue = [j]  # points whose neighborhoods still need expanding
            gama.remove(j)
            while len(queue) > 0:
                q = queue.pop(0)
                if len(neighbor_list[q]) >= min_pts:
                    delta = neighbor_list[q] & gama  # unvisited neighbors of q
                    queue.extend(delta)
                    gama = gama - delta
            ck = gama_old - gama  # points visited in this pass form cluster k
            for i in ck:
                cluster[i] = k
            omega_list = omega_list - ck
        return cluster


    # x = load_iris().data
    data = pd.read_csv("testdata.csv")
    x, y = data['time (sec)'], data['height (m hae)']
    print(type(x))
    n = len(x)
    x = np.array(x).reshape(n, 1)
    y = np.array(y).reshape(n, 1)
    x = np.hstack((x, y))  # shape (n, 2): one row per sample

    eps = 0.08
    min_pts = 5
    begin = time.time()
    c = dbscan(x, eps, min_pts)
    end = time.time()
    plt.figure()
    plt.scatter(x[:, 0], x[:, 1], c=c)
    plt.show()
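The conversion step in the script (two CSV columns turned into an array with one row per sample) can be tried in isolation. The sketch below assumes a tiny in-memory CSV with made-up values in place of testdata.csv; only the two column names come from the script, and `points` is an illustrative variable name.

```python
import io

import numpy as np
import pandas as pd

# Made-up sample rows standing in for testdata.csv (assumption: the real
# file uses the same two column headers as the script above).
csv_text = """time (sec),height (m hae)
0.0,10.5
0.1,10.6
0.2,10.4
"""

data = pd.read_csv(io.StringIO(csv_text))

# Select both columns at once and convert them to a float array of
# shape (n, 2), one row per sample.
points = data[['time (sec)', 'height (m hae)']].to_numpy(dtype=float)

print(points.shape)
```

Selecting both columns in one step gives the same (n, 2) layout as the reshape and hstack calls in the script.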
2. Output
Result after changing the parameters:
    eps = 0.8
    min_pts = 5
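When comparing runs with different eps and min_pts values, a few numbers can complement the scatter plot. The following is a minimal sketch, assuming a label list like the one returned by dbscan(), where -1 marks points that were not assigned to any cluster. `summarize_labels` is a hypothetical helper and the example labels are made up.

```python
from collections import Counter


def summarize_labels(labels):
    """Return (number of clusters, number of noise points, size of each cluster)."""
    counts = Counter(labels)
    n_noise = counts.pop(-1, 0)  # label -1 means "not assigned to any cluster"
    return len(counts), n_noise, dict(counts)


# Made-up labels for illustration; in practice pass the list returned by dbscan().
example_labels = [0, 0, 1, -1, 1, 1, 0, -1]
n_clusters, n_noise, sizes = summarize_labels(example_labels)
print(n_clusters, n_noise, sizes)
```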
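3. Computational efficiency
With a small amount of data the efficiency problem is hardly noticeable, but it becomes pronounced as the data grows: find_neighbor is called once per point and loops over every point in Python, so the neighborhood search alone needs O(n²) distance computations. In its current form the code cannot handle large datasets. A later step will be to optimize the computation or to look for a C++ implementation.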
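One possible first optimization, short of moving to C++, is to replace the per-point Python loop with a single NumPy broadcast. The sketch below is not the method of the original code; `find_all_neighbors` is a hypothetical helper and the sample points are made up. It still computes all pairwise distances and needs O(n²) memory, so it suits moderately sized data only.

```python
import numpy as np


def find_all_neighbors(x, eps):
    """Return a list of sets; entry i holds the indices within eps of point i."""
    # Pairwise differences via broadcasting: shape (n, n, d).
    diff = x[:, None, :] - x[None, :, :]
    # Euclidean distance matrix of shape (n, n).
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    within = dist <= eps
    return [set(np.flatnonzero(row).tolist()) for row in within]


# Small made-up example: two tight pairs that are far apart from each other.
points = np.array([[0.0, 0.0], [0.05, 0.0], [1.0, 1.0], [1.0, 1.05]])
neighbors = find_all_neighbors(points, eps=0.08)
print(neighbors)
```

The returned list has the same structure as neighbor_list in dbscan(), so it could be built once before the clustering loop instead of calling find_neighbor for every point.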