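Using ML.NET, Microsoft's cross-platform machine learning framework, to implement binary classification of Chinese personal names (deciding whether a string is or is not a Chinese personal name)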

microsoft, ml.net, Chinese word binary classification algorithm, binaryclassification, c#

ekoboy2136 2023-10-18 15:00:37 ‧ 1564 views

URL of the sample code provided by Microsoft:
https://learn.microsoft.com/zh-tw/dotnet/machine-learning/how-does-mldotnet-work

using Microsoft.ML;
using Microsoft.ML.Data;

class Program
{
    public class HouseData
    {
        public float Size { get; set; }
        public float Price { get; set; }
    }

    public class Prediction
    {
        [ColumnName("Score")]
        public float Price { get; set; }
    }

    static void Main(string[] args)
    {
        MLContext mlContext = new MLContext();

        // 1. Import or create training data
        HouseData[] houseData = {
               new HouseData() { Size = 1.1F, Price = 1.2F },
               new HouseData() { Size = 1.9F, Price = 2.3F },
               new HouseData() { Size = 2.8F, Price = 3.0F },
               new HouseData() { Size = 3.4F, Price = 3.7F } };
        IDataView trainingData = mlContext.Data.LoadFromEnumerable(houseData);

        // 2. Specify data preparation and model training pipeline
        var pipeline = mlContext.Transforms.Concatenate("Features", new[] { "Size" })
            .Append(mlContext.Regression.Trainers.Sdca(labelColumnName: "Price", maximumNumberOfIterations: 100));

        // 3. Train model
        var model = pipeline.Fit(trainingData);

        // 4. Make a prediction
        var size = new HouseData() { Size = 2.5F };
        var price = mlContext.Model.CreatePredictionEngine<HouseData, Prediction>(model).Predict(size);

        Console.WriteLine($"Predicted price for size: {size.Size * 1000} sq ft= {price.Price * 100:C}k");

        // Predicted price for size: 2500 sq ft= $261.98k
    }
}

The sample above is Microsoft's machine learning example, which uses house size and price data to predict house prices.

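I am trying to modify this code into a binary classifier that recognizes Chinese personal names. For now the goal is to train the model, give it a single name, and print either (is a Chinese personal name) or (is not a Chinese personal name) to the console. If that roughly works, I will add more training data and then use a loop to feed the model many names at once. My modified code is below. I have run into some problems and would appreciate suggestions on where my thinking or my code is wrong.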

using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

class NameData
{
    [LoadColumn(0)]
    public string? Name;

    [LoadColumn(1)]
    public bool IsName = false;
}

class NamePrediction
{
    [ColumnName("PredictedLabel")]
    public bool IsName = false;
}


class Program
{
    static void Main(string[] args)
    {        
        var context = new MLContext();
        string filePath = Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + "\\Name.csv";
        var data = context.Data.LoadFromTextFile<NameData>(filePath, separatorChar: ',');

        
        var pipeline = context.Transforms.Conversion.MapValueToKey("Label", "IsName").Append(context.BinaryClassification.Trainers.SdcaLogisticRegression("Label"));
        
        var model = pipeline.Fit(data);
       
        var engine = context.Model.CreatePredictionEngine<NameData, NamePrediction>(model);
      
        var newData = new NameData() { Name = "張三" };
        var prediction = engine.Predict(newData);
        
        Console.WriteLine($"是否為中文人名? {prediction.IsName}");
    }
}

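[Screenshot of the code]

The data is a handful of simple names I created in Notepad++ to test whether the overall structure works. The file extension is .csv.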

Name, IsName
王小明, 1
張三, 1
張四, 1
張武, 1
張六, 1
張妻, 1
張八, 1
張久, 1

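Here 1 means true and 0 means false, although every row of training data I supply will be 1. I also assumed I would not need any feature engineering, because a Chinese personal name is already a strong feature in itself. I originally tried writing true, but ML.NET currently does not seem to handle a bool Label directly. My guess is that ML.NET needs the Label column to be an integer or floating-point type for binary or multiclass classification. This is only a guess, based on the fact that some problems seemed to go away after I changed the values to 1. The remaining problem is that the model-training line keeps throwing an unhandled exception, System.ArgumentOutOfRangeException: 'Could not find feature column 'Features' Arg_ParamName_Name', with no other error or warning messages, so I do not know where to start.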
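
One way to see why the trainer reports a missing 'Features' column is to list the columns the data actually contains before training. The sketch below is a possible diagnostic, not a confirmed diagnosis. It assumes the Microsoft.ML NuGet package is installed and uses two in-memory rows instead of Name.csv; the helper name SchemaCheck.ColumnNames is hypothetical. ML.NET trainers such as SdcaLogisticRegression look for a feature column named "Features" by default, and neither the loaded data nor the MapValueToKey step in the question creates one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public class NameData
{
    public string Name { get; set; } = "";
    public bool IsName { get; set; }
}

public static class SchemaCheck
{
    // Hypothetical helper: returns the column names of a data view.
    public static List<string> ColumnNames(IDataView data) =>
        data.Schema.Select(column => column.Name).ToList();

    public static void Main()
    {
        var context = new MLContext();

        // In-memory stand-in for Name.csv (assumption: two rows are enough to inspect the schema).
        var rows = new[]
        {
            new NameData { Name = "王小明", IsName = true },
            new NameData { Name = "張三", IsName = true },
        };
        IDataView data = context.Data.LoadFromEnumerable(rows);

        // Apply only the MapValueToKey step from the question and inspect the result.
        IDataView transformed = context.Transforms.Conversion
            .MapValueToKey("Label", "IsName")
            .Fit(data)
            .Transform(data);

        foreach (var column in transformed.Schema)
            Console.WriteLine($"{column.Name}: {column.Type}");

        bool hasFeatures = ColumnNames(transformed).Contains("Features");
        Console.WriteLine($"Has a Features column? {hasFeatures}");
    }
}
```

Under these assumptions the listing should show Name, IsName and Label but no Features column, which is consistent with the exception text.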
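Screenshots are attached below.

Clicking "View Details" on this message shows a long list of values.

In addition, the black console window that appears when the program runs stays completely blank, with no messages at all.

However, if I press Continue or Step Over in the Visual Studio toolbar, the window does show a message, though I do not know whether it is helpful.

It seems to say only that line 36 has a problem, and line 36 is var model = pipeline.Fit(data);, the training line.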
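
The exception message points at the pipeline rather than at the data file: the Microsoft sample builds its "Features" column with Concatenate, and the modified pipeline no longer has a step that creates one. A minimal sketch of one possible fix follows, assuming the Microsoft.ML NuGet package. It swaps in FeaturizeText to turn the Name string into a numeric "Features" vector, and it passes the bool IsName column to the trainer directly as the label, since ML.NET binary classification trainers expect a Boolean label and MapValueToKey is intended for multiclass labels. The negative examples (label false) are made up for illustration and are not part of the original data. A binary classifier needs examples of both classes, so training data that is all 1 is unlikely to produce a useful model. The class name NameClassifierSketch and its Train method are hypothetical.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

public class NameData
{
    public string Name { get; set; } = "";
    public bool IsName { get; set; }
}

public class NamePrediction
{
    [ColumnName("PredictedLabel")]
    public bool IsName { get; set; }

    // Calibrated probability produced by SdcaLogisticRegression.
    public float Probability { get; set; }
}

public static class NameClassifierSketch
{
    public static PredictionEngine<NameData, NamePrediction> Train(MLContext context)
    {
        var samples = new[]
        {
            // Positive examples taken from the question's Name.csv.
            new NameData { Name = "王小明", IsName = true },
            new NameData { Name = "張三", IsName = true },
            new NameData { Name = "張四", IsName = true },
            new NameData { Name = "張六", IsName = true },
            // Hypothetical negative examples, invented for this sketch.
            new NameData { Name = "電腦", IsName = false },
            new NameData { Name = "桌子", IsName = false },
            new NameData { Name = "今天天氣", IsName = false },
            new NameData { Name = "abc123", IsName = false },
        };
        IDataView data = context.Data.LoadFromEnumerable(samples);

        // FeaturizeText creates the "Features" column the trainer looks for.
        var pipeline = context.Transforms.Text
            .FeaturizeText("Features", nameof(NameData.Name))
            .Append(context.BinaryClassification.Trainers.SdcaLogisticRegression(
                labelColumnName: nameof(NameData.IsName),
                featureColumnName: "Features"));

        var model = pipeline.Fit(data);
        return context.Model.CreatePredictionEngine<NameData, NamePrediction>(model);
    }

    public static void Main()
    {
        var context = new MLContext(seed: 0);
        var engine = Train(context);

        // Loop over several inputs, as planned in the question.
        foreach (var name in new[] { "張三", "桌子" })
        {
            var prediction = engine.Predict(new NameData { Name = name });
            Console.WriteLine(
                $"{name}: is a Chinese personal name? {prediction.IsName} (p={prediction.Probability:F2})");
        }
    }
}
```

With so few rows the predictions themselves are not meaningful; the point is only that Fit has a "Features" column to work with. When loading from Name.csv instead, the LoadColumn attributes go back on NameData, and passing hasHeader: true to LoadFromTextFile keeps the "Name, IsName" header row from being read as data. Removing the spaces after the commas in the file is probably safer as well, since they may otherwise end up inside the field values.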

4 answers

alexwilliams9811

iT邦 trainee ‧ 2024-08-03 15:24:38

I often play music games to wake up a vibrant and energetic morning. The game helps me heardle have a happy source of energy and complete my work well.

percyrempel

iT邦 trainee ‧ 2024-09-11 11:09:25

Using ML.NET to classify Chinese names is a fascinating application of machine learning for text recognition. It's impressive to see how versatile and powerful cross-platform tools can be. For a different kind of challenge, dive into Pokerogue and Pokerogue Dex — they offer a great mix of strategic depth and fun!

kololo222

iT邦 trainee ‧ 2024-11-02 15:13:33

Using ML.NET for classifying Chinese names showcases machine learning's capabilities in text recognition, demonstrating the power of cross-platform tools. For a blend of strategy and entertainment, explore Pokerogue and Pokerogue Dex. Additionally, consider checking out Monkey Mart for a unique gaming experience!

rowanlebsack

iT邦 trainee ‧ 2025-01-08 14:44:06

Great to see the implementation of ML.NET for binary classification of Chinese names! This can be really useful for various applications. Speaking of applications, I recently came across the Infinite Craft game, which also utilizes advanced algorithms for enhanced player experiences. It’s fascinating how machine learning can intersect with gaming and language processing. Looking forward to more advancements in AI!