Cloud Native Summit is CNCF's flagship conference, bringing together users and technical experts from the world's leading open source and cloud native communities. Join us at Cloud Native Summit China at GOTC 2023 to discuss the future and technical direction of cloud native computing.
Recently, developers around the world witnessed the release of a new mode for the Istio service mesh called Ambient Mesh, which is completely different from the sidecar mode. Since Istio was open-sourced in 2017, the sidecar has been regarded as a revolutionary innovation: a zero-intrusion proxy. However, after five years, users have found that sidecars bring many side effects that are difficult to solve. In September of this year, the Istio community announced an additional data plane mode alongside the sidecar, called Ambient Mesh, which aims to simplify operations, increase application compatibility, and reduce infrastructure costs. In this presentation, I will give an overall introduction to the ambient mode, demonstrate how it works, and then compare it with the sidecar mode. Finally, I will share the official views of the Istio community from the perspective of core contributors: how Ambient Mesh will evolve in the future and why the Istio community is building a new lightweight proxy, "ztunnel", in Rust.
Zhonghu Xu | Open source technology expert on Huawei Cloud's cloud-native team, recipient of the Google Open Source Peer Bonus
KubeSkoop: An automated diagnostic system for container network issues
Kubernetes itself is relatively complex, with a high barrier to entry. Users often encounter various problems when they start migrating to containers. Lacking the skills and tools for fault diagnosis, they often become frustrated and may even give up on containerizing their business. Network issues are particularly severe, because Kubernetes network virtualization makes network problems difficult to troubleshoot. KubeSkoop is designed to reduce the difficulty of troubleshooting network issues and to let people without networking knowledge locate them through self-service automation. For a given source and destination address, KubeSkoop automatically builds the access path through the container network, collects and analyzes the configuration of each network node on the link, and combines eBPF kernel monitoring with IaaS-level network configuration checks to identify the root causes of network unavailability. This greatly reduces the time required to locate network problems, so that even users without any networking skills can use it. KubeSkoop is currently deployed in Alibaba Cloud Container Service environments as a self-service tool, where it has solved networking issues encountered by many customers in large-scale Kubernetes clusters. Alibaba Cloud recently open-sourced KubeSkoop, which supports diagnostics for mainstream network plugins and for cloud vendors' Kubernetes clusters worldwide. This talk will introduce how to use KubeSkoop, the architecture of the diagnostic system, and some technical details of how its diagnostic capabilities are implemented.
Bingshen Wang | Alibaba Technical Expert
Cloud-native microservice practice based on Kitex Proxyless and Istio
With the increasing popularity of Istio, the classic sidecar model has become well known. Its biggest highlight is that it is non-intrusive to business code, and it is precisely this advantage that has made the concept of service mesh so widely accepted and able to meet the requirements of most scenarios. However, in performance-sensitive scenarios, the sidecar mode inevitably brings problems such as application protocol binding, performance loss, resource overhead, and increased operational complexity.
CloudWeGo-Kitex is an RPC framework that supports multiple protocols. ByteDance mainly uses the Thrift protocol internally and has optimized it extensively. Kitex aims to help other enterprises build microservices quickly. However, using Kitex-gRPC with the Istio sidecar solution runs into the problems mentioned above. At the same time, we hope that users of the Thrift protocol can also implement service governance based on Istio. Therefore, to support multiple protocols, Kitex provides a Proxyless mode based on Istio. Compared with gRPC's direct integration with Istio, however, this raises some issues, which this session will introduce along with their solutions. We expect Kitex Proxyless to meet performance-sensitive business demands while enriching the deployment options available under a unified governance plane with heterogeneous data planes.
This session will cover everything from the implementation principles of Kitex Proxyless to putting full-chain swimming lanes into practice based on Kitex Proxyless.
Wen Hu | CloudWeGo Reviewer, Senior Cloud Native R&D Engineer at Volcano Engine
Use container tools to build and manage WebAssembly applications
Wasm has emerged as a secure, portable, lightweight, and high-performance runtime sandbox suitable for cloud-native workloads such as microservices and serverless functions. Docker Desktop recently integrated WasmEdge and now supports Wasm containers.
Today, there is already a large number of battle-tested tools that enable developers to create, manage, and deploy Linux container applications in development and production environments. Developers want to use the same tools to manage their Wasm applications to reduce learning curves and operational risks. More importantly, using the same tools will allow Wasm containers to run in parallel with Linux containers. This provides architectural flexibility where some workloads (such as lightweight, stateless, transactional, scalable) can run in Wasm containers while others (such as long-running heavyweight ones) can run in Linux containers.
In this talk, I will introduce how to use Docker Desktop, Podman, containerd, and various versions of Kubernetes to create, publish, share, and deploy real-world Wasm applications. The examples will use mixed container types to show how Wasm containers work alongside existing Linux container applications.
Michael Yuan | Founder & maintainer of WasmEdge
OpenKruise: Comprehensive Enhancement of Cloud-Native Application Management Capability
Cloud-native application workloads are best known in the form of Kubernetes' native workloads (Deployment, StatefulSet). However, from small and medium-sized startups to large Internet companies, we see that the larger the scale of the application scenario, the less these native workloads are able to meet complex business deployment demands.
Therefore, many companies have developed custom workloads suited to their own scenarios. Among them, however, only OpenKruise, open-sourced by Alibaba Cloud, has matured in terms of generality, comprehensiveness, and stability as an open-source component, and it has become a CNCF Incubation project.
In this session, we will start with Kubernetes' native workloads to introduce the responsibilities and implementation basics of cloud-native application workloads. We will then analyze the real demands placed on application workloads in ultra-large-scale business scenarios. Finally, we will discuss the methods OpenKruise uses to meet these needs and its future direction in the open-source ecosystem.
1. Problems and challenges in cloud-native application deployment
2. How does OpenKruise meet deployment demands in large-scale business scenarios?
3. Practical application management with OpenKruise, using Alibaba's application scenarios as an example
Mingshan Zhao | Alibaba Cloud technical expert, OpenKruise community Maintainer
When FinOps Meets Cloud Native - How Tencent Optimizes Cloud Costs Based on Crane
User research shows that more and more companies are migrating their businesses to Kubernetes. However, the packing rate and utilization of cloud resources are far lower than expected, resulting in significant waste of cloud spending. Tencent Cloud follows the "cloud financial management" method of FinOps and practices resource optimization and cost optimization based on Kubernetes. We have summarized these cloud optimization experiences and open-sourced them as Crane: Cloud Resource Analytics and Economics. I will share Tencent's experience in implementing application profiling, cost monitoring, and hybrid deployment in large-scale cluster scenarios based on Crane.
Qiming Hu | Tencent Cloud expert engineer
Cloud-native technology helps to reduce energy consumption and emissions in data centers
Green computing has become a goal pursued across industries. In the digital economy era, "computing power is productivity" has become an important industry consensus. However, as computing power grows, so does the energy consumption of data centers. In the context of carbon peaking and carbon neutrality strategies, improving efficiency and reducing energy consumption is a major challenge.
When it comes to "green computing," attention is generally focused on how to reduce data center PUE, but it also covers how to use computing resources sensibly. For example, provided that service stability is ensured, sensible allocation of computing resources can improve resource utilization and reduce the number of servers in use, thereby reducing carbon emissions.
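The link between utilization and server count can be made concrete with a little arithmetic. The sketch below is illustrative only: the `servers_needed` helper and all of the numbers (1,200 cores of demand, 64-core servers, 15% versus 45% average utilization) are assumptions chosen for the example, not figures from the talk.

```python
def servers_needed(demand_cores, cores_per_server, utilization_percent):
    """Servers required to serve `demand_cores` at an average utilization.

    Integer arithmetic is used throughout so the ceiling division is exact.
    """
    if not 0 < utilization_percent <= 100:
        raise ValueError("utilization_percent must be in (0, 100]")
    usable = cores_per_server * utilization_percent  # usable cores, scaled by 100
    return -(-(demand_cores * 100) // usable)  # ceiling division


# Hypothetical workload: the same demand served at two utilization levels.
before = servers_needed(1200, 64, 15)
after = servers_needed(1200, 64, 45)
print(before, after)
```

Under these assumed numbers, tripling average utilization cuts the server count to roughly a third, which is the effect the paragraph above describes.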
Cloud-native technology has significant advantages over traditional cloud computing technologies in terms of energy consumption through efficient use of computational resources. It has gradually become the mainstream technology foundation for cloud services and provides more advanced solutions for achieving green computing.
The talk covers seven aspects: a comparison of runtime resource utilization; a comparison of static service consumption; a comparison of microservice frameworks; a comparative efficiency analysis of cloud management platforms; an analysis of energy savings in R&D services; an analysis of energy savings from technologies in the cloud-native ecosystem; and other, less obvious key points for comparing energy-saving measures.
Yong Hua | Senior Development Director of Cloud Native Platform
In today's cloud-native application environment, it has become increasingly common for many companies to use multiple Kubernetes clusters to support their applications.
Effective traffic management is crucial to ensuring the reliable and efficient operation of modern cloud-native applications. As these applications become more complex and require support for high levels of user traffic, efficient traffic management is more important than ever before. By properly managing traffic between multiple Kubernetes clusters, organizations can ensure that their applications run smoothly and users have the best possible experience.
"Traffic Management for Multiple Kubernetes Clusters" is an important topic for modern cloud-native applications. Understanding the best practices and tools for managing traffic between clusters can help organizations improve the performance and reliability of their applications.
This session starts from the driving factors behind multi-cluster environments and introduces how to implement cross-cluster application communication in order to achieve high availability, disaster recovery, and global load balancing.
Xiaohui Zhang | Senior Cloud Native Architect and Evangelist
HSM SDS Server is open source software, available at https://github.com/istio-ecosystem/hsm-sds-server. The project is based on the Istio service mesh and follows Envoy's SDS extension standard. It implements an external SDS server for the service mesh backed by a Hardware Security Module (HSM). With this project, users can keep the credentials managed by Istio/Envoy in a more secure environment through an external SDS server. In addition to managing newly created workload credentials, it allows users to upload existing workload credentials and manage them at a higher security level, with functions such as certificate rotation. The project can be used to protect workload credentials in two scenarios: cloud-native service mesh workloads and the service mesh gateway.
This project uses Intel® SGX technology to protect users' workload private keys within the data plane of the service mesh. User private keys are created and stored in SGX enclave memory, and only applications authorized with an SGX key handle can access the keys held in that encrypted memory. As a result, user private keys are never stored in plaintext anywhere on the system, which achieves a higher level of security.
Huailong Zhang | Cloud Native Software Development Engineer
A workflow orchestration engine called JobFlow based on the cloud-native batch computing platform Volcano
Workflow orchestration engines are widely used in high-performance computing, AI, biomedicine, image processing, beauty enhancement, game AGI, scientific computing and other scenarios to help users simplify the management of parallelism and dependency relationships between multiple tasks and significantly improve overall computational efficiency.
JobFlow is a lightweight task flow orchestration engine that focuses on job scheduling for the cloud-native batch computing platform Volcano. It provides Volcano with various types of job dependencies, such as completion dependencies, probes, and job failure rate tolerance dependencies, and supports complex process control primitives such as serial or parallel execution, if-then-else statements, selection statements, and loop execution. In fields such as HPC, AI, and big data analysis, users can use JobFlow to define concise task processing templates that reduce human waiting time and greatly save manpower and time costs.
JobFlow has been applied at a well-known research institute in China, where task flow orchestration solves problems such as user data preheating/recovery, business resource limitations, and node crashes caused by excessive IO, improving computational efficiency on equivalent hardware.
In this sharing session, Wang Yang and Zhou Mingcheng will introduce:
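Completion dependencies of this kind amount to a directed acyclic graph of jobs. As a rough illustration only (this is not JobFlow's API or manifest format; the job names and the `execution_waves` helper are hypothetical), the following Python sketch groups jobs into waves, where every job in a wave has all of its dependencies completed and could therefore run in parallel:

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Group jobs into waves; every job in a wave can run in parallel.

    `dependencies` maps a job name to the set of jobs that must complete
    before it may start (a completion dependency).
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the flow contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as completed
    return waves


# Hypothetical flow: preprocess -> (train_a, train_b) -> evaluate
flow = {
    "preprocess": set(),
    "train_a": {"preprocess"},
    "train_b": {"preprocess"},
    "evaluate": {"train_a", "train_b"},
}
waves = execution_waves(flow)
print(waves)
```

A real engine would also have to handle probes, failure tolerance, and conditional branches; the sketch covers only the ordering implied by completion dependencies.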
1. The main challenges faced by Volcano in workflow orchestration scenarios
2. The design concept and application scenarios of JobFlow
3. Application practice and benefits of JobFlow in production environment
Volcano is the industry's first cloud-native batch computing project, donated by Huawei Cloud to the Cloud Native Computing Foundation (CNCF) in 2019. It is currently at the incubation stage, with participating companies including Huawei, AWS, Baidu, Tencent, Jingdong, and Xiaohongshu.
JobFlow is a sub-project incubated within the Volcano community, led by Boyun with contributions from community developers. We believe this session will show you a different approach to Volcano job scheduling. In addition, the audience can learn about:
1. Boyun's management practice in orchestrating tasks such as AI and big data analysis
2. The design background, difficulties encountered, solutions etc. of JobFlow
Yang Wang | Huawei Cloud Senior Software Engineer, Volcano Community Member
Bing Liang | Platform Architect of BoCloud PaaS Product Line
ByteDance's Large-Scale Cluster Federation Technology Practice Based on Kubernetes
With the evolution of cloud-native within various business systems in ByteDance, the number and scale of k8s clusters have grown rapidly, leading to increasing maintenance costs. Additionally, the numerous and diverse cluster types also bring cognitive burden for users when selecting a deployment cluster. To solve these problems, we have independently developed a large-scale cluster federation system called KubeAdmiral to provide users with a unified service deployment entrance that facilitates task load transfer between multiple clusters. This lays the foundation for creating a unified resource pool and improving resource utilization efficiency.
Shengli Liu | Senior Cloud Native Engineer at Volcano Engine
Edge device management is an important application scenario in edge computing, facing many problems such as edge device lifecycle management, mapping cloud-native digital twin models for edge devices, lightweight edge frameworks, and how to store, distribute and consume data collected from massive edge devices.
KubeEdge is a cloud-native open source platform for edge computing built on Kubernetes and has become a CNCF incubation project. KubeEdge supports the collaboration of cloud-edge applications in complex edge-cloud network environments and provides an Edge Device Management Framework (DMI) that supports various protocols for managing edge devices in the form of cloud-native digital twin models.
This topic introduces the DMI device management framework of KubeEdge. Under the design of the DMI framework, devices are no longer just data sources but are abstracted as microservices that provide data services to device data consumers in a cloud-native way. The device data access under the DMI framework supports multiple scenarios and is very flexible. The DMI framework can provide strong support for managing cloud-native intelligent devices based on KubeEdge.
This topic is a joint presentation with Liu Chenlin, R&D engineer at Shanghai Daoke Network Technology Co., Ltd. and member of the KubeEdge community.
Ran Zhao | PhD from the University of Chinese Academy of Sciences, senior engineer at Huawei Cloud, contributor to the KubeEdge community
Chenlin Liu | Open source technology expert in edge computing at DaoCloud, member of the KubeEdge community
The best practice of machine learning platform storage based on CubeFS
To meet the company's growing needs for AI training, OPPO has created a one-stop machine learning platform. As the business grows rapidly, the diversity and surge of training tasks pose challenges to storage in terms of scalability, cost, and performance, and the storage systems used in the early days struggle to meet them. The speakers will share how they use the cloud-native distributed file system CubeFS to build storage with 50 PB of data capacity holding tens of billions of small files, provide unified storage for the machine learning platform in a hybrid cloud, and support the daily training of AI workloads for 200 teams, with 10,000+ training tasks per day. They will focus on solutions and practical experience with CubeFS in managing the metadata of tens of billions of small files, in storage management and cache acceleration under the hybrid cloud architecture, and in flexibly storing hot and cold data throughout the data lifecycle.
Liang Chang | Storage architect
From load balancing to a cloud-native traffic management platform
This talk will analyze the current state and problems of load balancing, and explore the demand for and development trends of traffic management platforms. Using the BFE open source project, it will analyze the advanced features of application load balancing and its support for Kubernetes. It will also introduce a new generation of security architecture and explain how to integrate security functions into BFE.
Miao Zhang | Founder & CEO of Yingfei Network
Exploration and Practice of Multi-cluster HPA Based on Karmada by Ctrip
With the rapid development of Ctrip's business, its Kubernetes clusters have expanded rapidly to support both online businesses and offline businesses such as big data and machine learning. To improve resource utilization, enhance platform reliability, and reduce cluster operation and maintenance costs, Ctrip has built a new generation of multi-cloud, multi-cluster platform architecture based on Karmada and extended it with key capabilities for cross-cluster elastic scaling of applications. This talk covers Ctrip's multi-cluster architecture as well as its exploration and practice of cross-cluster elastic scaling of applications.
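For readers unfamiliar with cross-cluster elastic scaling, the idea can be sketched in two steps: compute a desired replica count, then divide it among clusters. The first function below follows the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler (ignoring its tolerance band and stabilization behavior). The second, `split_by_weight`, is a hypothetical helper written for this illustration, as are the cluster names and weights; it is not Ctrip's implementation or Karmada's API.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric):
    """HPA scaling rule: ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)


def split_by_weight(total, weights):
    """Divide `total` replicas across clusters in proportion to `weights`.

    Largest-remainder rounding guarantees the parts sum to `total`.
    """
    weight_sum = sum(weights.values())
    shares = {name: total * w // weight_sum for name, w in weights.items()}
    leftover = total - sum(shares.values())
    # Give leftover replicas to the clusters with the largest fractional part.
    by_remainder = sorted(
        weights, key=lambda name: (total * weights[name]) % weight_sum, reverse=True
    )
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


# Hypothetical case: 10 replicas at 90% CPU against a 60% target, 2:1 weights.
total = desired_replicas(current_replicas=10, current_metric=90, target_metric=60)
placement = split_by_weight(total, {"cluster-a": 2, "cluster-b": 1})
print(total, placement)
```

A production system would typically also account for each cluster's available capacity and failover, which this sketch leaves out.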
Jingxue Li | Senior Cloud Native R&D Engineer
Practices and thoughts of China Telecom Network Cloud Native
This talk addresses the problems of closed network elements, cloudification in form only, and low resource utilization that arise in the cloudification of virtualized network elements, and focuses on implementing the network functions of cloud-native network elements themselves. It abstracts the common network functions of CNFs, fully considers the flexibility and elastic scalability of the cloud, proposes a target architecture for cloud-native network elements, and, based on that architecture, puts forward a general framework (Framework of CNF) for cloud-native network elements. It also provides an implementation plan that combines open-source products, together with a feasibility verification. The implementation plan opens up the black box of network elements and changes the form in which they provide services externally, so that they become externally observable.
Wanyi Zhu | Cloud Computing Technology Researcher of China Telecom Research Institute
Clusterpedia - Aggregated Retrieval of Resources in Multi-Cluster Scenarios
The multi-cluster field is developing rapidly. Many projects and tools can already distribute and deploy resources across multiple clusters, but viewing those resources across clusters at the same time remains difficult. Clusterpedia solves this problem: it lets users view resources from multiple clusters simultaneously and supports complex search conditions. Clusterpedia is also compatible with Kubernetes OpenAPI's list/get methods, so existing tools such as kubectl can be used to retrieve data without a UI. For the many management platforms in the multi-cloud ecosystem (such as Karmada, Clusternet, Cluster-API, or self-built cloud management platforms), Clusterpedia provides cluster auto-discovery, which makes it compatible with these platforms and reduces the extra operational maintenance Clusterpedia requires.
Wei Cai | DaoCloud R&D Engineer
FluidTable: Data Table Abstraction and Elastic Cache System in Cloud-Native Environment
Data-intensive applications (such as deep learning and big data queries) face multiple data access challenges on cloud-native platforms. To address them, Fluid, the open-source cloud-native elastic data acceleration system under the CNCF, has proposed technologies such as cloud-native data abstraction, elastic cache scaling, and collaborative orchestration of data and applications. This talk will introduce Fluid's newly released cloud-native data table abstraction and its elastic cache scaling design, along with a performance evaluation.
- Introduction to the kubeflow-chart project
- MLOps IDE based on JupyterLab
- Distributed training with workflow scheduling
- How enterprises can quickly apply kubeflow-chart
Audience: Developers and enterprises with MLOps and AI platform requirements. Professionals in the field of AI.