AI and big data management for Autonomous Driving (AD)

Frank Kraemer, IBM (kraemerf@de.ibm.com)

Extremely Scalable, Cost-Optimized ADAS/AD Data Management and Autonomous Driving Development Infrastructure

The automobile is quickly morphing from an isolated, largely mechanical piece of equipment to one of the most technically sophisticated and connected platforms on the planet. Few technologies have been more anticipated heading into the 2020s than autonomous vehicles. Still perhaps decades from market adoption in some use cases, the technology is as promising as it is misunderstood.

The one thing these initiatives all have in common is data – miles and miles of data. Each sensor and system on a connected car generates a steady stream of information, and the research and development behind future systems requires the analysis of massive files and data sets. Dealing with the volume, velocity and variety of all this data creates a unique challenge. The technologies behind autonomous vehicles can be broken down into three main categories: sensors, edge computing and AI-based control algorithms.

The automotive industry is entering a new, highly competitive, transitional period in which demand for new conveniences, safety capabilities and selling models is driving dramatic change. Once an industry of pure hardware and adrenaline, automotive design is increasingly differentiated by software – with many visits to the dealership replaced by over-the-air bug fixes. At the forefront are Advanced Driver Assistance Systems (ADAS), which introduce disruptive requirements on engineering IT infrastructure – particularly storage, where even entry-level capacities are measured in petabytes.

Extreme scalability demands for AD development

Autonomous vehicle development requires vast amounts of data generated by the vehicle's sensors: a camera generates 20-60 MB/s, sonar 10-100 KB/s, radar upwards of 10 KB/s, LiDAR systems between 10-70 MB/s, and GPS around 50 KB/s. To put these figures into perspective, a self-driving car will consume and generate approximately 50-70 terabytes (TB) of data for every eight hours of driving. Along with these extreme amounts of data, autonomous vehicle development requires storage optimized for cost, access and performance. Hot, or frequently used, data must support the intensive analysis required for fast decisions. Warm data, accessed less frequently, needs to expand when required and perform in an always-on environment where recall may be needed at any time. Cold data, used the least, must provide the lowest-cost option and integrate easily into the total solution.
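As a rough plausibility check, the short Python sketch below converts these rates into aggregate volumes; the per-vehicle sensor counts are illustrative assumptions rather than a reference configuration.

```python
# Back-of-envelope estimate of per-vehicle data volume for an 8-hour drive.
# Sensor rates use midpoints of the ranges above; the sensor counts per
# vehicle are illustrative assumptions, not a reference configuration.
RATES_MB_S = {"camera": 40.0, "lidar": 40.0, "sonar": 0.05, "radar": 0.01, "gps": 0.05}
COUNTS = {"camera": 8, "lidar": 2, "sonar": 8, "radar": 6, "gps": 1}  # assumed

aggregate_mb_s = sum(RATES_MB_S[s] * COUNTS[s] for s in RATES_MB_S)
volume_tb = aggregate_mb_s * 3600 * 8 / 1_000_000  # MB over 8 h -> TB
print(f"{aggregate_mb_s:.0f} MB/s aggregate -> {volume_tb:.1f} TB per 8-hour drive")

# Conversely, the 50-70 TB per eight hours cited above implies a sustained
# ingest rate of roughly 1.7-2.4 GB/s per vehicle:
for tb in (50, 70):
    print(f"{tb} TB / 8 h = {tb * 1e6 / (8 * 3600) / 1000:.1f} GB/s")
```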

Spectrum Scale for the ideal AD development infrastructure

Developing and testing autonomous driving (AD) systems requires the analysis and storage of more data than ever before. Clients who can deliver insights faster while managing rapid infrastructure growth will be the industry leaders. To deliver these insights, the underlying storage technology must support both new big data applications and traditional applications with security, reliability and high performance. To handle massive, unstructured data growth, the solution must scale seamlessly while matching data value to the capabilities and costs of different storage tiers and types. Spectrum Scale meets these challenges and more.

IBM Spectrum Scale is a high-performance Software Defined Storage (SDS) solution for managing data, with the distinctive ability to perform archiving and analytics in place. Spectrum Scale unifies virtualization, analytics, and file and object use cases into a single, scale-out storage solution. It can provide a single namespace for all this data, offering a single point of management with an intuitive graphical user interface.

Spectrum Scale offers system scalability, very high availability and reliability with no single point of failure in large storage infrastructure. Administrators can configure the file system so that it automatically remains available if a disk or server fails.

Proven technology for high-performance data management

With ADAS development, change is inevitable. As vehicles become fully autonomous, performance requirements become even less predictable. Spectrum Scale is a full-featured, software-defined storage solution including advanced storage virtualization, integrated high availability, automated tiering, and the performance to effectively manage very large quantities of file or object data. With the ability to independently scale capacity, performance, protocols and resources, Spectrum Scale is the ideal solution to handle unpredictable ADAS data management workloads.

Spectrum Scale also allows different ADAS or AD data management applications and services to access the same data without movement or alteration. Data can be written and retrieved as files or objects. Rather than use a copy-and-change gateway, Spectrum Scale natively supports both protocols for higher performance and simplified administration. A common storage layer enables most Spectrum Scale features including authentication, encryption and tiering for both object and file storage.
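As a minimal illustration of this dual access, the sketch below writes a file through the POSIX interface and reads the same bytes back through an S3-compatible object endpoint. The mount point, endpoint URL, bucket name and credentials are hypothetical placeholders, and the path-to-bucket mapping is assumed to be configured in the cluster's unified file and object access.

```python
# Minimal sketch of dual file/object access, assuming a Spectrum Scale
# filesystem mounted at /gpfs and an S3-compatible object endpoint exported
# by the cluster. All names, URLs and credentials are placeholders.
import boto3

# 1) An ingest job writes a sensor log as an ordinary POSIX file.
with open("/gpfs/adas/drive_0042/front_camera.bin", "wb") as f:
    f.write(b"\x00" * 1024)  # stand-in for real sensor payload

# 2) An analytics job reads the same data through the object interface,
#    without copying it through a gateway.
s3 = boto3.client(
    "s3",
    endpoint_url="https://scale-object.example.internal",  # assumed endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)
obj = s3.get_object(Bucket="adas", Key="drive_0042/front_camera.bin")
payload = obj["Body"].read()
```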

Advanced data management

ADAS development is subject to contractual and regulatory commitments for test data retention. Keeping tens, and soon hundreds, of petabytes in high-performance data storage is a requirement during the simulation and validation phase. This data must be retained for multiple decades, with service contracts commonly mandating restoration and re-simulation turnaround measured in days.

Spectrum Scale can help improve performance, lower costs, add resiliency and simplify collaboration with algorithmic and policy-driven data movement including copying and caching. Spectrum Scale catalogs data across multiple storage pools including the cloud. It tracks usage profiles, storage latency and a broad range of standard and custom metadata from which data movement policies can be constructed.

Armed with awareness of both data usage and its underlying storage, Spectrum Scale curates data across multiple storage tiers, including tape and cloud. The powerful, data-aware intelligence engine can create optimized, tiered storage pools by grouping devices – flash, solid-state drive (SSD), disk or tape – based on performance, location or cost. Migration policies transparently move data from one storage pool to another without changing the file's location in the directory structure. Automated analysis of data usage patterns can raise data to higher-performance tiers as needed. The information lifecycle management tools built into Spectrum Scale help simplify data management by providing additional control over data placement. These tools include storage pooling and a high-performance, rule-based policy engine.
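Spectrum Scale expresses such rules in its own SQL-like policy language; the Python sketch below merely mimics the logic of an access-recency migration rule, with pool names and thresholds as illustrative assumptions.

```python
# Illustrative model of a rule-based tiering decision. Spectrum Scale
# implements this with its built-in policy engine; the dataclass, pool
# names and thresholds here are simplified assumptions.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class FileRecord:
    path: str
    size_gb: float
    last_access: datetime
    pool: str  # e.g. "flash", "disk", "tape"

def target_pool(f: FileRecord, now: datetime) -> str:
    """Pick a storage pool from access recency; thresholds are examples."""
    age = now - f.last_access
    if age < timedelta(days=7):
        return "flash"   # hot: active simulation and validation data
    if age < timedelta(days=90):
        return "disk"    # warm: always-on, occasionally recalled
    return "tape"        # cold: lowest-cost long-term retention

now = datetime.now()
rec = FileRecord("/gpfs/adas/drive_0042/lidar.bin", 512.0,
                 now - timedelta(days=120), pool="disk")
if (dest := target_pool(rec, now)) != rec.pool:
    print(f"migrate {rec.path}: {rec.pool} -> {dest}")  # directory path unchanged
```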

Remove data-related bottlenecks for Hardware-in-the-Loop (HiL), Software-in-the-Loop (SiL) and Model-in-the-Loop (MiL) testing

Slow storage negatively impacts applications, delays schedules and wastes expensive infrastructure. Spectrum Scale can speed time to results and maximize utilization by providing parallel data access, a requirement for HiL and SiL testing. Further, shared disks and storage-rich servers improve scalability for high-performance workloads. Spectrum Scale is based on a parallel file system with intelligence in the client, spreading the load across all storage cluster nodes – even for individual files. In traditional scale-out NAS, each client can access a given file through only one node at a time. This parallel file system architecture allows Spectrum Scale to seamlessly handle tens of thousands of clients, billions of files and yottabytes of data.
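The sketch below illustrates the client-side effect of such parallelism: byte ranges of a single large recording are read concurrently, and on a parallel file system the stripes behind those ranges are served by different storage nodes. The file path and chunk size are illustrative assumptions.

```python
# Sketch of range-parallel reads against one large drive recording. The
# parallel file system handles the striping; this only shows the client
# issuing concurrent range reads. Path and chunk size are assumptions.
import os
from concurrent.futures import ThreadPoolExecutor

PATH = "/gpfs/adas/drive_0042/full_run.bag"  # hypothetical recording
CHUNK = 256 * 1024 * 1024                    # 256 MiB per worker

def read_range(offset: int, length: int) -> bytes:
    with open(PATH, "rb") as f:
        f.seek(offset)
        return f.read(length)

size = os.path.getsize(PATH)
ranges = [(off, min(CHUNK, size - off)) for off in range(0, size, CHUNK)]
with ThreadPoolExecutor(max_workers=16) as workers:
    chunks = list(workers.map(lambda r: read_range(*r), ranges))
```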

Empower global collaboration

Spectrum Scale enables low latency read and write access to data from anywhere in the world using Active File Management (AFM), distributed routing and advanced caching technology. AFM expands the Spectrum Scale global namespace across geographical distances, providing fast read and write performance with automated namespace management. As data is written or modified at one location, all other locations get the same data with minimal delays. AFM leverages the inherent scalability of Spectrum Scale, providing a high-performance, location-independent solution that masks network failures and hides wide-area latencies and outages. These game-changing capabilities accelerate project schedules and improve productivity for globally distributed teams.

Tiering across storage layers to simplify data management at scale

Spectrum Scale includes integrated management tools and an intuitive graphical user interface to help manage data at scale. The file system can span multiple storage environments and data centers across the world to eliminate data silos and “filer sprawl.” Spectrum Scale can intelligently spread data across multiple heterogeneous storage devices – optimizing available storage utilization, reducing administration and delivering high performance where needed. The software includes multiple deployment and configuration options which accommodate current NFS filers, block storage and storage-rich servers into a global namespace with universal access.

Integration with Hadoop and Spark workloads

Spectrum Scale supports Hortonworks Hadoop workloads and the Hadoop Distributed File System (HDFS) without requiring any changes to applications. With the Spectrum Scale Hadoop connector, multiple Spectrum Scale clusters or other HDFS repositories can be federated into a single HDFS instance. Spectrum Scale reduces the need to move data, simplifying the deployment and workflow of Hadoop, Apache Spark and related packages.
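A minimal PySpark sketch of this in-place access might look as follows; the hdfs:// URI, dataset layout and column name are illustrative assumptions.

```python
# Minimal PySpark sketch reading sensor logs in place through the HDFS
# interface. With the Spectrum Scale Hadoop connector the application is
# unchanged; the URI, dataset and column below are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adas-kpi").getOrCreate()

# Read drive logs exactly as from native HDFS; data stays in Spectrum Scale.
frames = spark.read.parquet("hdfs://scale-cluster/adas/drive_0042/frames")
hard_brakes = frames.filter(frames.hard_brake == True)  # example KPI query
print(hard_brakes.count())
spark.stop()
```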

Data archiving for cost-effective tape storage in ADAS and AD development infrastructure

Spectrum Archive is designed to address data storage inefficiencies by changing storage economics with a layer of intelligent software. It is a cost-effective solution for retaining the large amounts of data generated by ‘smart’ cars, and an ideal infrastructure building block for autonomous vehicle developers: it provides an easy way to move data from test vehicles to cost-effective tape drives and libraries within a tiered storage infrastructure. By using tape libraries instead of disk for Tier 2 and Tier 3 long-term storage, ADAS and AD development infrastructure can improve efficiency and reduce the costs of storing ever-growing amounts of data.
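A toy cost model makes the economics concrete; the per-terabyte figures below are placeholder assumptions for illustration only, not vendor pricing.

```python
# Toy cost comparison for long-term retention tiers. The per-terabyte
# figures are placeholder assumptions for illustration, not quotes.
DISK_COST_PER_TB = 25.0   # assumed $/TB/month for disk-based capacity
TAPE_COST_PER_TB = 5.0    # assumed $/TB/month for a tape library tier

def monthly_cost(capacity_tb: float, hot_fraction: float) -> float:
    """Cost when only `hot_fraction` of data stays on disk, rest on tape."""
    hot = capacity_tb * hot_fraction
    return hot * DISK_COST_PER_TB + (capacity_tb - hot) * TAPE_COST_PER_TB

# 10 PB retained: all on disk vs. 10% on disk with Tier 2/3 on tape.
print(f"all disk:  ${monthly_cost(10_000, 1.0):,.0f}/month")
print(f"90% tape:  ${monthly_cost(10_000, 0.1):,.0f}/month")
```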

Integration with colocation data centers

Colocation data centers from providers such as NTT are a perfect foundation for mobility data. To enable fast and secure uploading of data, a direct-link connection to the major cloud services is essential. Colocation data centers build the bridge between tangible physical systems and globally accessible public cloud services. By offering a multi-service interconnection platform, NTT colocation data centers provide state-of-the-art connectivity to all major public cloud services and hybrid cloud scenarios.

Large-scale AI training and simulation computing for autonomous driving requires that the hardware processing all this information be housed in a secured space. Since high-performance GPU servers and mass data storage systems require high power density and efficient cooling, only colocation data centers can offer a certified and secure environment for these use cases. Clients benefit from secure, lockable surroundings within a scalable, professionally run data center infrastructure offering uninterruptible power supplies and backup generators, as well as redundant cooling and air-conditioning systems.