
Modern multidisciplinary materials science routinely processes scientific workflows that integrate different data resources (e.g., X-ray data, scripts, analytical results). Most such data resources remain isolated in research labs and underutilized. Meanwhile, many machine learning techniques require large amounts of training data, which are not always available in materials research. Inspired by past work on the discovery of high-temperature piezoelectrics, which highlighted the distributed nature of the training and testing data, we are creating CRUX, a CRowdsourced Materials Data Engine for Unpublished X-Ray Diffraction (XRD) Results, in collaboration with our industry partners and developer community. CRUX aims to promote the use of high-quality, unpublished materials science data and to address fundamental challenges of data-driven materials science, such as accessing extensive experimental data and relating discovered materials to their processability. At the core of CRUX is a materials knowledge graph (a semantic network of materials entities and their semantic relationships). It coherently represents factual knowledge abstracted from XRD datasets, answers queries, and self-evolves to recommend newly shared datasets that go beyond the immediate scope of a query, helping users explore the data. Alongside automatic data integration, an exploratory query engine that supports "Why" and "What-if" analysis of XRD data is also a vital part of CRUX. CRUX is powered by coherent data-workflow modeling, knowledge-based resource assembly for workflow search, and data provenance to support workflow exploration. We make the case for CRUX through peak finding in XRD data and aim to inspire novel designs of ML pipelines for data-driven materials science.
CRUX will make materials data resources available to a broad community, including materials scientists, data analysts, developers, and the general public, by enabling new interactive paradigms for exploring and designing workflows.