This document describes DeviceAnalyzer, a system that uses Apache Spark and Apache Lucene to build predictive models in near real time from streaming and batch data. It discusses: 1) integrating Spark with Lucene to index streaming and batch data for fast search and retrieval, so that statistical and predictive models can be built on the retrieved data; 2) a batch workflow that indexes batch data with Lucene, and a streaming workflow that processes streaming queries and compares or augments results; 3) statistical and machine learning operators, such as summation, L1/L2 regularization, and sparse linear algebra, for building models on retrieved device profiles.
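
The index, retrieve, then aggregate pattern summarized above can be illustrated with a minimal sketch. It uses plain Python rather than Spark or Lucene: a small in-memory inverted index stands in for the Lucene index, and dictionaries stand in for sparse vectors. The profile data, feature names, and function names (`build_index`, `search`, `sparse_sum`, `l1_norm`, `l2_norm`) are all hypothetical and are not taken from DeviceAnalyzer. The norm functions show only the L1 and L2 quantities that regularization penalties are built on, not a full regularized model.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical device profiles: device id -> sparse feature vector (feature -> value).
PROFILES = {
    "dev-1": {"os:android": 1.0, "app:maps": 3.0},
    "dev-2": {"os:android": 1.0, "app:mail": 4.0},
    "dev-3": {"os:ios": 1.0, "app:maps": 2.0},
}


def build_index(profiles):
    """Batch step: map each feature term to the set of device ids that contain it."""
    index = defaultdict(set)
    for device_id, features in profiles.items():
        for term in features:
            index[term].add(device_id)
    return index


def search(index, terms):
    """Query step: return the sorted device ids that contain every query term."""
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


def sparse_sum(vectors):
    """Summation operator: element-wise sum of sparse vectors stored as dicts."""
    total = defaultdict(float)
    for vec in vectors:
        for feature, value in vec.items():
            total[feature] += value
    return dict(total)


def l1_norm(vec):
    """Sum of absolute values, the quantity penalized by L1 regularization."""
    return sum(abs(value) for value in vec.values())


def l2_norm(vec):
    """Euclidean length; L2 regularization penalizes its square."""
    return sqrt(sum(value * value for value in vec.values()))


index = build_index(PROFILES)
hits = search(index, ["os:android"])             # retrieve matching devices
summed = sparse_sum(PROFILES[d] for d in hits)   # aggregate their profiles
print(hits, summed, l1_norm(summed), l2_norm(summed))
```

In a deployment along the lines the summary describes, the index would presumably be built and queried through Lucene and the aggregation distributed with Spark; the sketch only shows how retrieval narrows the data before the statistical operators run.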