Chi-square test

• The chi-square test is a popular feature selection method when we have categorical data and classification labels (as opposed to regression).
• In a feature selection context we apply the chi-square test to each feature and rank the features by their chi-square values (or p-values).
• A parallel solution calculates the chi-square statistic for all features at the same time, rather than one feature at a time as in a serial implementation.

Chi-square test: Contingency table

• We have two random variables:
  – Label (L): 0 or 1
  – Feature (F): categorical (here with the two values A and B)
• Null hypothesis: the two variables are independent of each other (unrelated).
• Under independence:
  – P(L,F) = P(L)P(F)
  – P(L=0) = (c1+c2)/n
  – P(F=A) = (c1+c3)/n
• Expected values:
  – x1 = E(X1) = P(L=0) P(F=A) n, and similarly for the other cells.
• Contingency table (observed counts c1..c4, expected counts x1..x4):

                Feature=A                   Feature=B
    Label=0     Observed=c1, Expected=x1    Observed=c2, Expected=x2
    Label=1     Observed=c3, Expected=x3    Observed=c4, Expected=x4

• We can calculate the chi-square statistic for a given feature and the probability that it is independent of the label (using the p-value):

    χ² = Σ_{i=0}^{d−1} (c_i − x_i)² / x_i,   where d is the number of cells in the table.

• Features with very small probabilities deviate significantly from the independence assumption and are therefore considered important.

Parallel GPU implementation of the chi-square test in CUDA

• The key here is to organize the data to enable coalesced memory access.
• We define a kernel function that computes the chi-square value for a given feature (a sketch is given below).
• The CUDA architecture automatically distributes the kernel across different GPU cores so that the features are processed simultaneously.
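To make the parallel scheme concrete, here is a minimal CUDA sketch, not the code behind the slides. It assumes binary labels, two-valued feature codes (0/1 standing for A/B), and a sample-major data layout so that adjacent threads (handling adjacent features) read adjacent addresses, which is what yields coalesced memory access. Names such as chiSquareKernel, d_data, and d_labels are illustrative. Each thread builds the 2x2 contingency table and the chi-square value for one feature.

// Minimal sketch under the assumptions stated above; not the original code.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void chiSquareKernel(const unsigned char *data,   // numSamples x numFeatures, values 0/1, sample-major
                                const unsigned char *labels, // numSamples, values 0/1
                                float *chi2,                 // one chi-square value per feature
                                int numSamples, int numFeatures)
{
    int f = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per feature
    if (f >= numFeatures) return;

    // Observed counts c1..c4 of the 2x2 contingency table for feature f.
    int c[4] = {0, 0, 0, 0};
    for (int s = 0; s < numSamples; ++s) {
        // Adjacent threads (adjacent features) read adjacent addresses: coalesced.
        unsigned char v = data[(size_t)s * numFeatures + f];
        c[labels[s] * 2 + v]++;
    }

    // Expected counts under independence: x_i = P(Label=row) * P(Feature=col) * n.
    float n    = (float)numSamples;
    float row0 = c[0] + c[1], row1 = c[2] + c[3];
    float col0 = c[0] + c[2], col1 = c[1] + c[3];
    float x[4] = { row0 * col0 / n, row0 * col1 / n,
                   row1 * col0 / n, row1 * col1 / n };

    // chi^2 = sum_i (c_i - x_i)^2 / x_i over the four cells.
    float stat = 0.0f;
    for (int i = 0; i < 4; ++i)
        if (x[i] > 0.0f) stat += (c[i] - x[i]) * (c[i] - x[i]) / x[i];
    chi2[f] = stat;
}

int main()
{
    // Tiny toy dataset: 8 samples x 2 features, sample-major layout.
    const int numSamples = 8, numFeatures = 2;
    unsigned char h_data[numSamples * numFeatures] = {
        0,1, 0,0, 0,1, 0,0, 1,1, 1,0, 1,1, 1,0 };
    unsigned char h_labels[numSamples] = { 0,0,0,0, 1,1,1,1 };
    float h_chi2[numFeatures];

    unsigned char *d_data, *d_labels;
    float *d_chi2;
    cudaMalloc((void**)&d_data,   sizeof(h_data));
    cudaMalloc((void**)&d_labels, sizeof(h_labels));
    cudaMalloc((void**)&d_chi2,   sizeof(h_chi2));
    cudaMemcpy(d_data,   h_data,   sizeof(h_data),   cudaMemcpyHostToDevice);
    cudaMemcpy(d_labels, h_labels, sizeof(h_labels), cudaMemcpyHostToDevice);

    // One thread per feature; CUDA schedules the blocks across the GPU cores.
    int threads = 256;
    int blocks  = (numFeatures + threads - 1) / threads;
    chiSquareKernel<<<blocks, threads>>>(d_data, d_labels, d_chi2, numSamples, numFeatures);
    cudaMemcpy(h_chi2, d_chi2, sizeof(h_chi2), cudaMemcpyDeviceToHost);

    for (int f = 0; f < numFeatures; ++f)
        printf("feature %d: chi2 = %.3f\n", f, h_chi2[f]);

    cudaFree(d_data); cudaFree(d_labels); cudaFree(d_chi2);
    return 0;
}

With this layout, when all threads of a warp process sample s they load the consecutive elements data[s*numFeatures + f], data[s*numFeatures + f+1], ..., so the loads coalesce into a few memory transactions; a feature-major layout would instead make each thread stride through memory. Converting the chi-square values to p-values (for a 2x2 table, with (rows−1)(cols−1) = 1 degree of freedom) can then be done on the host before ranking the features.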