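Today, while syncing my IT鐵人賽 (IT Ironman contest) articles, saving the XML data left a truncated file and threw the exception "hexadecimal value 0x1D is an invalid character". Checking the code showed that the error came from SaveAsync.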
public override async Task SavePost(Post post)
{
    string filePath = GetFilePath(post);
    XDocument doc = new XDocument(
        new XElement("post",
            // ...omitted
            new XElement("content", post.Content),
            // ...omitted
        ));
    // FileMode.Create truncates an existing file before anything is written,
    // so an exception inside SaveAsync leaves a partially written file behind.
    using (var fs = new FileStream(filePath, FileMode.Create, FileAccess.ReadWrite))
    {
        await doc.SaveAsync(fs, SaveOptions.None, CancellationToken.None).ConfigureAwait(false);
    }
}
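To reproduce the failure without touching the real files, here is a minimal sketch as a small top-level C# program (C# 9 or later). It assumes a hypothetical post body containing U+001D and writes to a MemoryStream instead of filePath; otherwise it calls the same Stream overload of SaveAsync as SavePost above.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Xml.Linq;

// Hypothetical post body containing U+001D, the ASCII "Group Separator" control character.
var doc = new XDocument(
    new XElement("post",
        new XElement("content", "before\u001Dafter")));

string? error = null;
using (var ms = new MemoryStream())
{
    try
    {
        // Same overload as SavePost: it builds an XmlWriter with default settings,
        // where CheckCharacters is true.
        await doc.SaveAsync(ms, SaveOptions.None, CancellationToken.None);
    }
    catch (ArgumentException ex)
    {
        // The message contains "hexadecimal value 0x1D, is an invalid character."
        error = ex.Message;
    }
}

Console.WriteLine(error ?? "saved without error");
```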
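The cause: of the 128 characters in the ASCII table, XML 1.0 only allows three of the control characters below 0x20, namely tab (0x09), line feed (0x0A) and carriage return (0x0D). The remaining C0 control characters, including 0x1D (Group Separator), cannot appear in an XML document, so the writer refuses to emit them.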
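One way to check which ASCII characters are affected is to ask XmlConvert.IsXmlChar, which tests a single char against the XML character rules. The sketch below (a top-level C# program; the variable names are only for illustration) walks 0x00 to 0x7F and collects the rejected code points.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Collect the ASCII code points (0x00-0x7F) that XmlConvert.IsXmlChar rejects.
var invalid = new List<int>();
for (int code = 0; code <= 0x7F; code++)
{
    if (!XmlConvert.IsXmlChar((char)code))
    {
        invalid.Add(code);
    }
}

// Expected: 29 code points, 0x00-0x08, 0x0B, 0x0C and 0x0E-0x1F.
// Tab, LF, CR and everything from 0x20 up (including 0x7F) are allowed.
Console.WriteLine($"{invalid.Count} invalid: " +
    string.Join(" ", invalid.ConvertAll(c => $"0x{c:X2}")));
```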
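The most direct fix is to strip the disallowed characters (replace them with an empty string) before building the XElement. Here is a regex based on one posted on Stack Overflow: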
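using System.Text.RegularExpressions;
// C0 control characters that XML 1.0 forbids (all below 0x20 except 0x09, 0x0A, 0x0D).
// The original answer also stripped \x26 ('&'); it is left out here because XElement
// escapes '&' as &amp; automatically, so removing it would silently lose data.
static readonly Regex InvalidXmlChars = new Regex(@"[\x00-\x08\x0B\x0C\x0E-\x1F]", RegexOptions.Compiled);
static string ReplaceHexadecimalSymbols(string txt)
{
    return InvalidXmlChars.Replace(txt, "");
}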
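A minimal sketch of the sanitize-then-save flow, as a top-level C# program. The post body is hypothetical, Sanitize is a hypothetical local stand-in for ReplaceHexadecimalSymbols with the same character class, and a MemoryStream replaces the real file. It saves with the unchanged Stream overload and then loads the result back to confirm the output is well-formed and that '&' survived.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Threading;
using System.Xml.Linq;

// Same character class as the sanitizer: C0 controls except tab, LF and CR.
var invalidXmlChars = new Regex(@"[\x00-\x08\x0B\x0C\x0E-\x1F]");
string Sanitize(string txt) => invalidXmlChars.Replace(txt, "");

// Hypothetical post body: a Group Separator (0x1D) plus an ampersand.
string content = "Tom & Jerry\u001D!";

var doc = new XDocument(
    new XElement("post",
        new XElement("content", Sanitize(content))));

// The Stream overload with default settings now succeeds.
using var ms = new MemoryStream();
await doc.SaveAsync(ms, SaveOptions.None, CancellationToken.None);

// Load it back: the XML is well-formed and '&' was escaped, not removed.
ms.Position = 0;
var reloaded = XDocument.Load(ms);
string roundTripped = reloaded.Root!.Element("content")!.Value;
Console.WriteLine(roundTripped); // Tom & Jerry!
```

Sanitizing at the source keeps the saved file valid XML, so any reader with default settings can load it.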
XDocument.SaveAsync has three overloads, taking a Stream, a TextWriter, or an XmlWriter:
public Task SaveAsync(Stream stream, SaveOptions options, CancellationToken cancellationToken);
public Task SaveAsync(TextWriter textWriter, SaveOptions options, CancellationToken cancellationToken);
public Task SaveAsync(XmlWriter writer, CancellationToken cancellationToken);
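Switching from the Stream overload to the XmlWriter overload lets us supply our own XmlWriterSettings: UTF-8 encoding with character checking turned off. It is CheckCharacters = false that avoids the exception; UTF-8 is already the default encoding of XmlWriterSettings. Async = true is also required, because SaveAsync(XmlWriter, CancellationToken) calls the writer's asynchronous methods.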
// Turn off character checking to avoid the invalid character exception
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings { Async = true, Encoding = Encoding.UTF8, CheckCharacters = false };
using (XmlWriter writer = XmlWriter.Create(filePath, xmlWriterSettings))
{
    await doc.SaveAsync(writer, CancellationToken.None).ConfigureAwait(false);
}
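To see what the writer actually produces with these settings, here is a sketch as a top-level C# program that writes to a MemoryStream instead of filePath (an assumption made so the output can be inspected), using a hypothetical post body containing U+001D.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Xml;
using System.Xml.Linq;

// Hypothetical post body containing U+001D.
var doc = new XDocument(
    new XElement("post",
        new XElement("content", "before\u001Dafter")));

var settings = new XmlWriterSettings { Async = true, Encoding = Encoding.UTF8, CheckCharacters = false };

using var ms = new MemoryStream();
using (XmlWriter writer = XmlWriter.Create(ms, settings))
{
    await doc.SaveAsync(writer, CancellationToken.None);
}

// No exception is thrown: the control character is written as the
// character reference &#x1D; instead of being rejected.
string xml = Encoding.UTF8.GetString(ms.ToArray());
Console.WriteLine(xml);
```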
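Unless you have no other choice, I don't recommend this second approach. With CheckCharacters = false the writer emits the character as a character reference such as &#x1D;, which XML 1.0 does not allow either, so the file is not well-formed and a reader with default settings (XDocument.Load, for example) will reject it. It is still a useful way to learn a bit about XML's character rules.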
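The reading side of that trade-off can be sketched as a top-level C# program. It assumes a hypothetical file content in which 0x1D was written as &#x1D;: the default reader settings reject it, and only a reader created with XmlReaderSettings.CheckCharacters = false accepts it and restores the control character.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical saved content in which 0x1D was written as a character reference.
const string xml = "<post><content>before&#x1D;after</content></post>";

// 1) Default reader settings (CheckCharacters = true): the reference is rejected.
bool defaultReaderFailed = false;
try
{
    XDocument.Parse(xml);
}
catch (XmlException ex)
{
    defaultReaderFailed = true;
    Console.WriteLine(ex.Message);
}

// 2) Only a reader with CheckCharacters = false loads it.
var readerSettings = new XmlReaderSettings { CheckCharacters = false };
using var reader = XmlReader.Create(new StringReader(xml), readerSettings);
var doc = XDocument.Load(reader);
string content = doc.Root!.Element("content")!.Value;
Console.WriteLine(content.Length); // 12: "before" + U+001D + "after"
```

Every consumer of such a file would need the same lenient reader setting, which is why stripping the characters before saving is usually the safer choice.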