Vector Database Learning

I learned Rust today and found it a very suitable language for writing databases. I also came across a treasure of a project, TiKV; its GitHub repository is https://github.com/tikv/tikv . It is a very active project, but today’s topic is vector databases in the field of AI. Without further ado, let’s get started. Prerequisites 0.1 Introduction to basic knowledge: understand the basic definition of a vector database and how it differs from a traditional database. 0.2 Data structure basics: learn vectors and other basic data structures, and how to represent and manipulate them in a database. 0.3 Introduction to linear algebra: vector operations, including vector addition, subtraction, and the dot product. 0.4 Similarity measures: learn how to calculate the similarity between vectors, such as cosine similarity. 0.5 Basics of database indexes: the basic concepts of database indexing, especially as applied in vector databases. 0.6 Preliminary search algorithms: learn basic search algorithms and understand how to search large data sets efficiently. 0.7 Application case studies: study how vector databases are applied in different fields (such as recommendation systems and image recognition). ...
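The dot product (0.3), cosine similarity (0.4), and basic search (0.6) listed in the excerpt above can be illustrated with a minimal sketch. It assumes plain Python lists as vectors and a brute-force scan over a small in-memory dictionary. The `store` contents and the function names are hypothetical, chosen only for illustration. A real vector database would replace the scan with an index.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / (norm_a * norm_b)


def nearest(query, vectors):
    """Brute-force search: key of the stored vector most similar to query."""
    return max(vectors, key=lambda k: cosine_similarity(query, vectors[k]))


# Hypothetical toy "database" of three 3-dimensional embeddings.
store = {
    "cat": [1.0, 0.9, 0.1],
    "dog": [0.9, 1.0, 0.2],
    "car": [0.1, 0.2, 1.0],
}
best = nearest([1.0, 1.0, 0.0], store)
```

The scan compares the query against every stored vector, so its cost grows linearly with the data set. That cost is the reason vector databases invest in index structures.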

January 20, 2024 · 39 min · 8125 words · Xinwei Xiong, Me

OpenIM: Version Control & Testing Workflow

The success of an open-source project largely depends on its quality management and collaborative processes. In the OpenIM open-source community, the standardization of project management and testing processes is crucial to ensure the quality and stability of the codebase. This document provides a brief overview of our testing strategy, branch management, quality control policies, and how they are applied to the main branch, PR testing branches, and stable release branches to meet the needs of developers, testers, and community managers. Additionally, we will introduce the standards, testing schemes, and project management strategies of the OpenIM open-source community, aiming to provide clear guidance to ensure project stability and sustainability. ...

January 15, 2024 · 5 min · 1023 words · Xinwei Xiong, Me

Emerging Challenges and Trends in 2024

Large language model sharing session on January 6, 2024. Limitations of the model: deep learning, pre-trained models, large language models. The emergent power of large language models: 💡 Emergence has long been studied in the discipline of complex systems. So, what is an “emergent phenomenon”? When a complex system is composed of many tiny individuals that come together and interact with each other, then once their number is large enough, the system exhibits macro-level behavior that cannot be explained by the individual microscopic parts. This can be called an “emergent phenomenon.” Link: ...

January 14, 2024 · 10 min · 2002 words · Xinwei Xiong, Me

2023 Annual Summary Reflections and Aspirations

My 2023 Annual Summary As 2023 swiftly draws to a close, my university life is nearing its end with just half a year remaining. A friend once said, “What’s frightening is not losing your passion for work, but never being able to find it again.” This year, I encountered many people and experienced numerous events, gradually shaping my world view. I’m fond of Maslow’s hierarchy of needs and often reflect on my own state through it. I enjoy challenges, both in my work and hobbies (like hiking, cycling…). It seems I’ve successfully fulfilled the first four levels of Maslow’s theory: physiological needs, safety needs, social needs, and esteem needs. I’m probably at the stage of self-actualization needs. However, it’s worth mentioning that while Maslow’s theory is hierarchical, human needs aren’t always linear or fixed. For instance, someone at the self-actualization stage might still encounter needs from other levels at different times. If a person loses their job or faces financial difficulties, they may refocus on safety needs like financial security and stability. Similarly, the end of a close relationship or changes in one’s social network might reignite a desire for social needs. Even in everyday life, when we fall ill or feel hungry, our focus might temporarily shift from higher-level needs like self-actualization to physiological needs. ...

December 30, 2023 · 13 min · 2670 words · Xinwei Xiong, Me

Hugo Advanced Tutorial

136: Hugo Advanced. Coming to the advanced part, you need to learn some advanced Hugo techniques in depth. Module: Hugo modules are the core building blocks of Hugo. A module can be your main project or a smaller module that provides one or more of the 7 component types defined in Hugo: static, content, layouts, data, assets, i18n, and archetypes. You can combine modules in any way you like, and you can even mount directories from non-Hugo projects to form a large virtual union file system. ...

November 6, 2023 · 34 min · 7121 words · Xinwei Xiong, Me

Kubernetes for Kustomize Learning

Introduction: Kustomize is an open-source configuration management tool designed specifically for Kubernetes. It helps users customize Kubernetes objects and manage them declaratively without modifying the original YAML files. You keep the base settings for applications and components in place and override defaults with declarative YAML documents called “patches”. Kustomize’s declarative approach aligns with the Kubernetes philosophy and allows Kubernetes configurations to be customized in a reusable, fast, debuggable, and scalable manner. ...
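The base-plus-patch idea in the excerpt above can be sketched as a toy overlay in Python. This is only an illustration of the concept and is not Kustomize's actual merge algorithm. It assumes manifests are nested dictionaries, it replaces lists wholesale, and the `base` and `patch` contents and the `apply_patch` helper are all hypothetical.

```python
import copy


def apply_patch(base, patch):
    """Return a new dict with `patch` overlaid on `base`; `base` is not modified."""
    result = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are mappings: merge them key by key.
            result[key] = apply_patch(result[key], value)
        else:
            # Scalars and lists in the patch simply replace the base value.
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical base manifest, kept untouched.
base = {
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {"replicas": 1, "image": "nginx:1.25"},
}

# Hypothetical production patch: only the fields that differ.
patch = {"spec": {"replicas": 3}}

production = apply_patch(base, patch)
```

The point of the design is that `base` stays byte-for-byte unchanged while each environment carries only its differences.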

October 31, 2023 · 24 min · 4998 words · Xinwei Xiong, Me

OpenIM Use Harbor Build Enterprise Mirror Repositories

Requirements: OpenIM provides various public image registry addresses, such as Aliyun, GitHub, Docker Hub, and more. Read https://github.com/openimsdk/open-im-server/blob/main/docs/conversions/images.md for more image building guidelines. Most enterprises choose to set up their own image repository using Harbor, integrating it into their CI/CD pipeline to eventually replace Docker Hub and further reduce image storage costs. Additionally, in a production environment, Harbor generally enables TLS, so you will also need to prepare a valid domain name. ...

October 25, 2023 · 5 min · 1009 words · Xinwei Xiong, Me

Learn About Automated Testing

Automated testing practices and strategies for GitHub open source Go projects in the cloud native field. Introduction: For OpenIM, a popular project on GitHub, creating value in the cloud native era is very important. OpenIM is a small, high-quality team, and we do not yet have particularly deep experience with automation. Continuous Integration and Continuous Delivery (CI/CD) using GitHub Actions: GitHub Actions provides a platform to automatically build and test Go projects. By configuring GitHub Actions workflows, you can run tests automatically whenever your code changes, ensuring the quality and functionality of your code (https://docs.github.com/en/actions/automating-builds-and-tests). KubeVela project practice: KubeVela is a cloud-native, open-source project written in Go that shows how to organize CI/CD processes, including automated testing, in a cloud-native environment. KubeVela uses declarative workflows to coordinate the CI/CD process. You can refer to KubeVela’s GitHub repository to understand and apply these practices (https://www.alibabacloud.com/blog/kubevela-one-of-the-hottest-golang-cloud-native-and-open-source-project_597465) (https://github.com/kubevela/workflow). Cloud native testing frameworks and tools: In cloud native development, contract testing is a common practice that ensures communication between services complies with predefined API contracts. For example, Pact is used in the Cloud-Native Toolkit for contract testing. By writing and integrating such tests, you can verify that communication between services works as expected. Code coverage checks: When doing automated testing, it is good practice to check code coverage. Many testing frameworks have built-in coverage capabilities and can be configured to report the coverage of your tests. For example, SonarQube can read and report code coverage information.
Utilize open source tools and frameworks: You can use open source tools and frameworks for testing, such as Cypress for testing cloud native applications (https://dev.to/litmus-chaos/cloud-native-application-testing-automation-2bh5). There are other projects and resources, such as the learning-cloud-native-go/myapp repository on GitHub, which provide completed examples of cloud native Go projects. You can refer to these examples to understand and apply cloud native testing practices (https://medium.com/learning-cloud-native-go/lets-get-it-started-dc4634ef03b). Customized automated testing process: By combining GitHub Actions and open source tools, you can customize your project’s CI/CD process, including automated testing and verification steps. Quantify the value of automated testing: Automation means that the manual cost in later stages is very low. In other words, as time goes by and the number of automated runs increases, the value and ROI of automation grow. ...

October 14, 2023 · 60 min · 12628 words · Xinwei Xiong, Me

Kubernetes Control Plane - Scheduler

Scheduler: kube-scheduler is responsible for scheduling and assigning Pods to nodes within the cluster. It watches kube-apiserver for Pods that have not yet been assigned to a Node, and then assigns nodes to these Pods based on scheduling policies (updating the Pod’s NodeName field). The scheduler needs to weigh many factors: fair scheduling; efficient resource utilization; QoS; affinity and anti-affinity; data locality; inter-workload interference; deadlines. kube-scheduler scheduling is divided into two phases, predicate and priority: ...
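The two phases named in the excerpt above can be sketched in a few lines of Python: a predicate phase that filters out nodes that cannot run the Pod, then a priority phase that scores the remaining nodes and picks the best one. This is a toy model, not kube-scheduler's code. The `Node` and `Pod` fields, the resource-only predicate, and the "most free resources left" scoring rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: int  # millicores still available (hypothetical field)
    free_mem: int  # MiB still available (hypothetical field)


@dataclass
class Pod:
    name: str
    cpu: int  # requested millicores
    mem: int  # requested MiB


def fits(pod, node):
    """Predicate phase: can this node run the pod at all?"""
    return node.free_cpu >= pod.cpu and node.free_mem >= pod.mem


def score(pod, node):
    """Priority phase: prefer the node with the most resources left afterwards."""
    return (node.free_cpu - pod.cpu) + (node.free_mem - pod.mem)


def schedule(pod, nodes):
    """Return the chosen node name, or None if no node passes the predicates."""
    feasible = [n for n in nodes if fits(pod, n)]
    if not feasible:
        return None
    return max(feasible, key=lambda n: score(pod, n)).name


nodes = [
    Node("a", free_cpu=500, free_mem=1024),   # too little CPU
    Node("b", free_cpu=2000, free_mem=4096),
    Node("c", free_cpu=4000, free_mem=256),   # too little memory
    Node("d", free_cpu=3000, free_mem=2048),
]
chosen = schedule(Pod("web", cpu=1000, mem=512), nodes)
```

In a real cluster the result would be written back as the Pod's NodeName. Here `schedule` only returns the name.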

September 28, 2023 · 9 min · 1770 words · Xinwei Xiong, Me

In-depth understanding of the components of Kubernetes: kube-apiserver

Deep understanding of kube-apiserver. kube-apiserver is one of the most important core components of Kubernetes and mainly provides the following functions: It provides the REST API for cluster management, including authentication and authorization, data validation, and cluster state changes. It serves as the hub for data exchange and communication between other modules (other modules query or modify data through the API Server; only the API Server operates on etcd directly). Main functions of the API Server: Authentication: determine the identity of the caller. Authorization: check that the caller has permission for the requested operation (CRUD). Admission: extra processing specific to Kubernetes. For example, if a submitted value is not in the standard form it is modified, and the result is validated after modification. Finally, rate limiting is needed so that malicious clients or bugs cannot congest the server. Mutating and validating admission; rate limiting; implementation of APIServer objects. Access control: the API Server is the intermediate hub for all component interactions. ...
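The request path described in the excerpt above (authentication, then authorization, then mutating and validating admission) can be sketched as a small pipeline. This is a toy model under stated assumptions, not the API Server's implementation. The token table, the permission set, the defaulting rule, and every function name are hypothetical, and rate limiting and the etcd write are left out.

```python
class Rejected(Exception):
    """Raised when any stage of the pipeline refuses the request."""


TOKENS = {"token-alice": "alice"}             # hypothetical token -> user table
PERMISSIONS = {("alice", "create", "pods")}   # hypothetical allow-list


def authenticate(request):
    """Authentication: who is calling?"""
    user = TOKENS.get(request.get("token"))
    if user is None:
        raise Rejected("401: unknown identity")
    return user


def authorize(user, request):
    """Authorization: may this user perform this verb on this resource?"""
    if (user, request["verb"], request["resource"]) not in PERMISSIONS:
        raise Rejected("403: forbidden")


def mutate(obj):
    """Mutating admission: fill in defaults on a copy of the object."""
    patched = dict(obj)
    patched.setdefault("namespace", "default")
    return patched


def validate(obj):
    """Validating admission: check the object after mutation."""
    if not obj.get("name"):
        raise Rejected("422: name is required")


def handle(request):
    """Run the stages in order and return the object that would be stored."""
    user = authenticate(request)
    authorize(user, request)
    obj = mutate(request["object"])
    validate(obj)
    return obj


stored = handle({
    "token": "token-alice",
    "verb": "create",
    "resource": "pods",
    "object": {"name": "web"},
})
```

Validation runs after mutation so that it checks the object in its final form, which matches the order given in the excerpt.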

September 28, 2023 · 24 min · 4922 words · Xinwei Xiong, Me