Track: Artificial Intelligence
Dr. Mehmet M. Dalkilic, born and raised in Austin, Texas, was the first faculty member in the Luddy School of Informatics, Computing, and Engineering. He is an Associate Professor of Computer Science, an Adjunct in Statistics, and the Director of the Undergraduate Data Science Program in its inaugural semester, Fall 2020. He was responsible for Informatics’ introductory curriculum, I101, as well as a new dual CS introductory class, C200, that also counts toward the Data Science degree and, soon, Statistics. He was the co-creator of the graduate Computational Biology program. His work in data science spans astronomy, geology, marine ecology, transportation, AI/ML, and big data. He is currently a Visiting Faculty member at the Electro-Optics Division, Naval Surface Warfare Center (NSWC), Crane.
Here is the abstract:
Contemporary data mining algorithms are easily overwhelmed by truly big data. While parallelism, improved initialization, and ad hoc data reduction are commonly used and necessary strategies, we note that (1) continually revisiting data and (2) visiting all data are two of the most prominent problems, especially for iterative learning techniques like the expectation-maximization algorithm for clustering (EM-T). To the best of our knowledge, there is no freely available software that specifically focuses on improving the original EM-T algorithm in the context of big data. We demonstrate the utility of the CRAN package DCEM, which implements an improved version of EM-T that we call EM* (EM star). DCEM provides an integrated and minimalistic interface to the EM-T and EM* algorithms and can be used as either (1) a stand-alone program or (2) a pluggable component in existing software. We show that EM* can cluster data both effectively and efficiently as we vary size, dimensionality, and separability.
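To make the two problems named in the abstract concrete, here is a minimal Python sketch of plain expectation-maximization for a one-dimensional Gaussian mixture. It is an illustration only: it is not the DCEM package (which is written for R), it does not implement EM*, and the names `em_cluster_1d`, `make_demo_data`, and `points_visited`, as well as the quantile-based initialization, are assumptions introduced for this example. The counter shows that every iteration touches every data point.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_cluster_1d(data, k, iterations=20):
    """Plain EM for a 1-D Gaussian mixture (illustrative; not DCEM's code)."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: evenly spaced quantiles of the data.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    points_visited = 0

    for _ in range(iterations):
        # E-step: every point is revisited in every iteration.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
            points_visited += 1

        # M-step: re-estimate each component from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, 1e-6)  # keep the variance positive
            weights[j] = nj / n

    return sorted(means), points_visited


def make_demo_data(seed=1, per_cluster=200):
    """Two well-separated synthetic clusters centered at 0 and 10."""
    rng = random.Random(seed)
    low = [rng.gauss(0.0, 1.0) for _ in range(per_cluster)]
    high = [rng.gauss(10.0, 1.0) for _ in range(per_cluster)]
    return low + high


if __name__ == "__main__":
    data = make_demo_data()
    means, visited = em_cluster_1d(data, k=2, iterations=20)
    print("estimated means:", means)
    print("point visits:", visited)
```

In this sketch the number of point visits is always the iteration count times the data size, which is the cost the abstract identifies as a bottleneck on large data sets.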