Keynote Session: Approaches to Achieving Realtime Ingestion and Analysis of Security Events
Today’s enterprise architectures are often composed of a myriad of heterogeneous devices. Bring-Your-Own-Device policies, vendor diversification, and the transition to the cloud all contribute to a sprawling infrastructure whose complexity and scale can only be addressed with modern distributed data processing systems. In this session, we describe the system that Capital One has built to collect, clean, and analyze the security-related events occurring within its digital infrastructure.
Raw data from each component is collected and pre-processed using Apache NiFi flows, then written into an Apache Kafka cluster, which serves as the primary communications backbone of our platform. From Kafka, the raw data is parsed, cleaned, and enriched in realtime via Apache Metron and Apache Storm. The refined data is ingested into Elasticsearch, allowing operations teams to detect and monitor events as they occur. It is also transformed into the Apache ORC data format and stored in Amazon S3, allowing data scientists to perform long-term, batch-based analysis.
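To make the parse, clean, and enrich stage concrete, the following is a minimal sketch in plain Python. It is an illustration only and not the platform's implementation: in the system described here that work is done by Apache Metron on Apache Storm, reading from Kafka. The JSON-lines input format, the field names (`src_ip`, `timestamp`, `action`), and the `ASSET_OWNERS` lookup table are all hypothetical assumptions made for the example.

```python
import json
from typing import Iterable, Iterator, Optional

# Hypothetical enrichment source (assumption): maps an IP address to an owning team.
ASSET_OWNERS = {"10.0.0.5": "payments-team"}


def parse(raw: str) -> Optional[dict]:
    """Parse one raw event, assumed to be a JSON object; return None if malformed."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return event if isinstance(event, dict) else None


def clean(event: dict) -> Optional[dict]:
    """Drop events missing required fields and normalize the remaining ones."""
    if "src_ip" not in event or "timestamp" not in event:
        return None
    try:
        timestamp = int(event["timestamp"])
    except (TypeError, ValueError):
        return None
    return {
        "src_ip": str(event["src_ip"]).strip(),
        "timestamp": timestamp,
        "action": str(event.get("action", "unknown")).lower(),
    }


def enrich(event: dict) -> dict:
    """Attach context that the raw event does not carry (here, an asset owner)."""
    return {**event, "owner": ASSET_OWNERS.get(event["src_ip"], "unassigned")}


def refine(raw_events: Iterable[str]) -> Iterator[dict]:
    """Run each raw event through parse -> clean -> enrich, skipping bad records."""
    for raw in raw_events:
        parsed = parse(raw)
        if parsed is None:
            continue
        cleaned = clean(parsed)
        if cleaned is None:
            continue
        yield enrich(cleaned)
```

In this sketch, malformed or incomplete records are silently skipped; a production pipeline would more likely route them to a separate error topic so that data quality problems stay visible.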
We discuss the challenges involved in architecting and implementing this system, including issues surrounding data quality, performance tuning, and the impact of additional financial regulations relating to data governance. Finally, we describe the results of our efforts and the value that our data platform brings to Capital One as a business.