# Analysis of test data using K-Means Clustering in Python

This article illustrates K-means clustering on randomly generated sample data using the OpenCV library.

Prerequisites: NumPy, OpenCV, Matplotlib
Let’s first visualize test data with multiple features using Matplotlib.

```python
# importing required tools
import numpy as np
from matplotlib import pyplot as plt

# creating two test data
X = np.random.randint(10, 35, (25, 2))
Y = np.random.randint(55, 70, (25, 2))
Z = np.vstack((X, Y))
Z = Z.reshape((50, 2))

# convert to np.float32
Z = np.float32(Z)

plt.xlabel('Test Data')
plt.ylabel('Z samples')

plt.hist(Z, 256, [0, 256])

plt.show()
```

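Here ‘Z’ is a 50×2 array (100 values in total) whose values range from 10 to 69; the histogram bins span 0 to 256. Each row of ‘Z’ is one sample and each column is one feature, and the reshape to (50, 2) makes this layout explicit (np.vstack already produces it). This layout becomes more useful when more than one feature is present. The data is then converted to np.float32, the type that OpenCV’s kmeans() expects.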
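A quick way to confirm the layout of the test data is to print its shape, type and value range. The snippet below is a minimal sketch that assumes the same construction as the code above; only NumPy is needed.

```python
import numpy as np

# Same construction as in the article: two groups of 25 two-feature samples.
X = np.random.randint(10, 35, (25, 2))
Y = np.random.randint(55, 70, (25, 2))
Z = np.float32(np.vstack((X, Y)))

print(Z.shape)           # (50, 2): 50 samples, 2 features each
print(Z.dtype)           # float32
print(Z.min(), Z.max())  # random, but always within 10 and 69
```

The upper bound passed to np.random.randint is exclusive, which is why the largest possible value is 69 rather than 70.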

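Output: a histogram of the two feature columns of ‘Z’, with ‘Test Data’ on the x-axis and ‘Z samples’ on the y-axis.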

Now apply the k-means clustering algorithm to test data generated in the same way and observe its behavior.
Steps involved:
1) Create the test data.
2) Define the termination criteria and apply kmeans().
3) Separate the data by cluster label.
4) Plot the data.

```python
import numpy as np
import cv2
from matplotlib import pyplot as plt

X = np.random.randint(10, 45, (25, 2))
Y = np.random.randint(55, 70, (25, 2))
Z = np.vstack((X, Y))

# convert to np.float32
Z = np.float32(Z)

# define criteria and apply kmeans()
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret, label, center = cv2.kmeans(Z, 2, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

# Now separate the data
A = Z[label.ravel() == 0]
B = Z[label.ravel() == 1]

# Plot the data
plt.scatter(A[:, 0], A[:, 1])
plt.scatter(B[:, 0], B[:, 1], c='r')
plt.scatter(center[:, 0], center[:, 1], s=80, c='y', marker='s')
plt.xlabel('Test Data')
plt.ylabel('Z samples')
plt.show()
```

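Output: a scatter plot of the two clusters, one in the default color and one in red, with the cluster centers drawn as yellow squares.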

This example illustrates a case where k-means produces intuitively sensible clusters: the two groups of points are well separated, so each group ends up in its own cluster.
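To see what happens inside a k-means call, the iteration can be sketched in plain NumPy. The function below is an illustrative version of the standard algorithm (Lloyd's iteration), not OpenCV's implementation. The name `kmeans_sketch` and its parameters are hypothetical; `max_iter` and `eps` are only meant to mirror the roles of the iteration limit and epsilon in the criteria tuple used above.

```python
import numpy as np

def kmeans_sketch(data, k, max_iter=10, eps=1.0, seed=0):
    """Plain NumPy k-means (Lloyd's iteration); illustrative only."""
    rng = np.random.default_rng(seed)
    # Assumption of this sketch: start from k samples picked at random.
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Assignment step: distance from every sample to every center.
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned samples.
        new_centers = centers.copy()
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        # Stop early once no center moved by more than eps.
        shift = np.linalg.norm(new_centers - centers, axis=1).max()
        centers = new_centers
        if shift <= eps:
            break
    # Final assignment against the final centers.
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers

# Test data built the same way as in the article, with a fixed seed.
rng = np.random.default_rng(42)
X = rng.integers(10, 45, size=(25, 2))
Y = rng.integers(55, 70, size=(25, 2))
Z = np.vstack((X, Y)).astype(np.float32)

labels, centers = kmeans_sketch(Z, k=2)
print(centers)     # one row per cluster center
print(labels[:5])  # cluster index of the first five samples
```

Unlike this sketch, which runs once from a single random start, the kmeans() call above is given an attempts value of 10, so it tries several initializations and returns the best result.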

Applications:
1) Identifying Cancerous Data.
2) Prediction of Students’ Academic Performance.
3) Drug Activity Prediction.