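On day 18 we laid out our plan, which has two main parts: the textbook and the supplementary material. Today we build the textbook content, mainly by organizing the Guide pages and saving them as a txt file.
To make the material easier for the LLM to browse, we label each block, for example with "Paragraph:" and "Code Block:". These labels help the language model identify the type of content and so generate more relevant responses.
The url can be changed to match the guide category you want to fetch: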
import requests
from bs4 import BeautifulSoup

# url = 'https://diagrams.mingrammer.com/docs/guides/diagram'
# url = 'https://diagrams.mingrammer.com/docs/guides/node'
# url = 'https://diagrams.mingrammer.com/docs/guides/cluster'
url = 'https://diagrams.mingrammer.com/docs/guides/edge'
response = requests.get(url)
if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'html.parser')
    content_blocks = soup.find_all(['p', 'pre'])
    for block in content_blocks:
        if block.name == 'p':
            print("Paragraph:", block.get_text())
        elif block.name == 'pre':
            print("Code Block:\n", block.get_text())
else:
    print(f"Failed to retrieve the page. Status code: {response.status_code}")
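The script above only prints to the console, while the goal is a txt file. The following is a minimal sketch of how the same labels could be written to disk. The helper names `label_block` and `save_textbook`, the output path, and the `(tag_name, text)` input shape are assumptions made for illustration; they are not part of the original script.

```python
from pathlib import Path


def label_block(tag_name, text):
    """Return one labeled entry, mirroring the print statements above."""
    if tag_name == "p":
        return f"Paragraph: {text}"
    if tag_name == "pre":
        return f"Code Block:\n{text}"
    raise ValueError(f"unexpected tag: {tag_name}")


def save_textbook(pages, path):
    """Write each page as its URL followed by its labeled blocks.

    `pages` (hypothetical shape) maps a URL to a list of
    (tag_name, text) pairs, which the scraper could build with
    [(b.name, b.get_text()) for b in content_blocks].
    Returns the number of entries written, URL lines included.
    """
    entries = []
    for url, blocks in pages.items():
        entries.append(url)
        entries.extend(label_block(tag, text) for tag, text in blocks)
    Path(path).write_text("\n".join(entries) + "\n", encoding="utf-8")
    return len(entries)
```

Calling `save_textbook(pages, "textbook.txt")` with one entry per guide URL would produce a single file in the same layout as the output shown next: a URL line, then that page's labeled blocks.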
https://diagrams.mingrammer.com/docs/guides/diagram
Paragraph: Diagram is a primary object representing a diagram.
Paragraph: Diagram represents a global diagram context.
Paragraph: You can create a diagram context with Diagram class. The first parameter of Diagram constructor will be used for output filename.
Code Block:
from diagrams import Diagram
from diagrams.aws.compute import EC2
with Diagram("Simple Diagram"):
    EC2("web")
Paragraph: And if you run the above script with below command,
Code Block:
$ python diagram.py
Paragraph: It will generate an image file with single EC2 node drawn as simple_diagram.png on your working directory, and open that created image
file immediately.
Paragraph: Diagrams can be also rendered directly inside the notebook as like this:
Code Block:
from diagrams import Diagram
from diagrams.aws.compute import EC2
with Diagram("Simple Diagram") as diag:
    EC2("web")
diag
Paragraph: You can specify the output file format with outformat parameter. Default is png.
Paragraph: (png, jpg, svg, pdf and dot) are allowed.
Code Block:
from diagrams import Diagram
from diagrams.aws.compute import EC2
with Diagram("Simple Diagram", outformat="jpg"):
    EC2("web")
Paragraph: The outformat parameter also support list to output all the defined output in one call.
Code Block:
from diagrams import Diagram
from diagrams.aws.compute import EC2
with Diagram("Simple Diagram Multi Output", outformat=["jpg", "png", "dot"]):
    EC2("web")
Paragraph: You can specify the output filename with filename parameter. The extension shouldn't be included, it's determined by the outformat parameter.
Code Block:
from diagrams import Diagram
from diagrams.aws.compute import EC2
with Diagram("Simple Diagram", filename="my_diagram"):
    EC2("web")
Paragraph: You can also disable the automatic file opening by setting the show parameter as false. Default is true.
Code Block:
from diagrams import Diagram
from diagrams.aws.compute import EC2
with Diagram("Simple Diagram", show=False):
    EC2("web")
Paragraph: It allows custom Graphviz dot attributes options.
Paragraph: graph_attr, node_attr and edge_attr are supported. Here is a reference link.
Code Block:
from diagrams import Diagram
from diagrams.aws.compute import EC2
graph_attr = {
    "fontsize": "45",
    "bgcolor": "transparent"
}
with Diagram("Simple Diagram", show=False, graph_attr=graph_attr):
    EC2("web")
https://diagrams.mingrammer.com/docs/guides/node
Paragraph: Node is a second object representing a node or system component.
Paragraph: Node is an abstract concept that represents a single system component object.
Paragraph: A node object consists of three parts: provider, resource type and name. You may already have seen each part in the previous example.
Code Block:
from diagrams import Diagram
from diagrams.aws.compute import EC2
with Diagram("Simple Diagram"):
    EC2("web")
Paragraph: In above example, the EC2 is a node of compute resource type which provided by aws provider.
Paragraph: You can use other node objects in a similar manner like:
Code Block:
# aws resources
from diagrams.aws.compute import ECS, Lambda
from diagrams.aws.database import RDS, ElastiCache
from diagrams.aws.network import ELB, Route53, VPC
...
# azure resources
from diagrams.azure.compute import FunctionApps
from diagrams.azure.storage import BlobStorage
...
# alibaba cloud resources
from diagrams.alibabacloud.compute import ECS
from diagrams.alibabacloud.storage import ObjectTableStore
...
# gcp resources
from diagrams.gcp.compute import AppEngine, GKE
from diagrams.gcp.ml import AutoML
...
# k8s resources
from diagrams.k8s.compute import Pod, StatefulSet
from diagrams.k8s.network import Service
from diagrams.k8s.storage import PV, PVC, StorageClass
...
# oracle resources
from diagrams.oci.compute import VirtualMachine, Container
from diagrams.oci.network import Firewall
from diagrams.oci.storage import FileStorage, StorageGateway
Paragraph: You can find all available nodes list in Here.
Paragraph: You can represent data flow by connecting the nodes with these operators: >>, << and -.
Code Block:
from diagrams import Diagram
from diagrams.aws.compute import EC2
from diagrams.aws.database import RDS
from diagrams.aws.network import ELB
from diagrams.aws.storage import S3
with Diagram("Web Services", show=False):
    ELB("lb") >> EC2("web") >> RDS("userdb") >> S3("store")
    ELB("lb") >> EC2("web") >> RDS("userdb") << EC2("stat")
    (ELB("lb") >> EC2("web")) - EC2("web") >> RDS("userdb")
Paragraph: Be careful when using the - and any shift operators together, which could cause unexpected results due to operator precedence.
Paragraph:
Paragraph: The order of rendered diagrams is the reverse of the declaration order.
Paragraph: You can change the data flow direction with direction parameter. Default is LR.
Paragraph: (TB, BT, LR and RL) are allowed.
Code Block:
from diagrams import Diagram
from diagrams.aws.compute import EC2
from diagrams.aws.database import RDS
from diagrams.aws.network import ELB
with Diagram("Workers", show=False, direction="TB"):
    lb = ELB("lb")
    db = RDS("events")
    lb >> EC2("worker1") >> db
    lb >> EC2("worker2") >> db
    lb >> EC2("worker3") >> db
    lb >> EC2("worker4") >> db
    lb >> EC2("worker5") >> db
Paragraph:
Paragraph: Above worker example has too many redundant flows. In this case, you can group nodes into a list so that all nodes are connected to other nodes at once.
Code Block:
from diagrams import Diagram
from diagrams.aws.compute import EC2
from diagrams.aws.database import RDS
from diagrams.aws.network import ELB
with Diagram("Grouped Workers", show=False, direction="TB"):
    ELB("lb") >> [EC2("worker1"),
                  EC2("worker2"),
                  EC2("worker3"),
                  EC2("worker4"),
                  EC2("worker5")] >> RDS("events")
Paragraph:
Paragraph: You can't connect two lists directly because shift/arithmetic operations between lists are not allowed in Python.
https://diagrams.mingrammer.com/docs/guides/cluster
Paragraph: Cluster allows you group (or clustering) the nodes in an isolated group.
Paragraph: Cluster represents a local cluster context.
Paragraph: You can create a cluster context with Cluster class. And you can also connect the nodes in a cluster to other nodes outside a cluster.
Code Block:
from diagrams import Cluster, Diagram
from diagrams.aws.compute import ECS
from diagrams.aws.database import RDS
from diagrams.aws.network import Route53
with Diagram("Simple Web Service with DB Cluster", show=False):
    dns = Route53("dns")
    web = ECS("service")
    with Cluster("DB Cluster"):
        db_primary = RDS("primary")
        db_primary - [RDS("replica1"),
                      RDS("replica2")]
    dns >> web >> db_primary
Paragraph:
Paragraph: Nested clustering is also possible.
Code Block:
from diagrams import Cluster, Diagram
from diagrams.aws.compute import ECS, EKS, Lambda
from diagrams.aws.database import Redshift
from diagrams.aws.integration import SQS
from diagrams.aws.storage import S3
with Diagram("Event Processing", show=False):
    source = EKS("k8s source")
    with Cluster("Event Flows"):
        with Cluster("Event Workers"):
            workers = [ECS("worker1"),
                       ECS("worker2"),
                       ECS("worker3")]
        queue = SQS("event queue")
        with Cluster("Processing"):
            handlers = [Lambda("proc1"),
                        Lambda("proc2"),
                        Lambda("proc3")]
    store = S3("events store")
    dw = Redshift("analytics")
    source >> workers >> queue >> handlers
    handlers >> store
    handlers >> dw
Paragraph:
Paragraph: There is no depth limit of nesting. Feel free to create nested clusters as deep as you want.
https://diagrams.mingrammer.com/docs/guides/edge
Paragraph: Edge is representing an edge between Nodes.
Paragraph: Edge is an object representing a connection between Nodes with some additional properties.
Paragraph: An edge object contains three attributes: label, color and style which mirror corresponding graphviz edge attributes.
Code Block:
from diagrams import Cluster, Diagram, Edge
from diagrams.onprem.analytics import Spark
from diagrams.onprem.compute import Server
from diagrams.onprem.database import PostgreSQL
from diagrams.onprem.inmemory import Redis
from diagrams.onprem.aggregator import Fluentd
from diagrams.onprem.monitoring import Grafana, Prometheus
from diagrams.onprem.network import Nginx
from diagrams.onprem.queue import Kafka
with Diagram(name="Advanced Web Service with On-Premise (colored)", show=False):
    ingress = Nginx("ingress")
    metrics = Prometheus("metric")
    metrics << Edge(color="firebrick", style="dashed") << Grafana("monitoring")
    with Cluster("Service Cluster"):
        grpcsvc = [
            Server("grpc1"),
            Server("grpc2"),
            Server("grpc3")]
    with Cluster("Sessions HA"):
        primary = Redis("session")
        primary \
            - Edge(color="brown", style="dashed") \
            - Redis("replica") \
            << Edge(label="collect") \
            << metrics
        grpcsvc >> Edge(color="brown") >> primary
    with Cluster("Database HA"):
        primary = PostgreSQL("users")
        primary \
            - Edge(color="brown", style="dotted") \
            - PostgreSQL("replica") \
            << Edge(label="collect") \
            << metrics
        grpcsvc >> Edge(color="black") >> primary
    aggregator = Fluentd("logging")
    aggregator \
        >> Edge(label="parse") \
        >> Kafka("stream") \
        >> Edge(color="black", style="bold") \
        >> Spark("analytics")
    ingress \
        >> Edge(color="darkgreen") \
        << grpcsvc \
        >> Edge(color="darkorange") \
        >> aggregator
Paragraph:
Finally, we save this as a data file so that it can be provided to the LLM later.
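To illustrate why the labels are useful downstream, here is a rough sketch of reading the labeled text back into typed records before handing it to an LLM pipeline. The function name `parse_textbook` and the record kinds `"url"`, `"paragraph"` and `"code"` are assumptions for illustration. The sketch also assumes that no line inside a code block itself starts with "Paragraph:", "Code Block:" or "https://".

```python
def parse_textbook(text):
    """Split labeled text into (kind, content) records.

    A line starting with "https://" opens a "url" record, "Paragraph:"
    opens a "paragraph" record, and "Code Block:" opens a "code" record.
    Any other line is treated as a continuation of the latest record.
    """
    records = []
    for line in text.splitlines():
        if line.startswith("https://"):
            records.append(["url", line])
        elif line.startswith("Paragraph:"):
            records.append(["paragraph", line[len("Paragraph:"):].strip()])
        elif line.startswith("Code Block:"):
            records.append(["code", ""])
        elif records:
            # Continuation line: keep its indentation and append it.
            content = records[-1][1]
            records[-1][1] = f"{content}\n{line}" if content else line
    return [(kind, content) for kind, content in records]
```

With records in this shape, the code blocks can be filtered with something like `[c for k, c in parse_textbook(text) if k == "code"]`, which is one way the markers make the content type explicit.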