iT邦幫忙

12th iThome Ironman Contest

DAY 6
AI & Data

Machine Learning Study Jam 2020 series, Part 6

Detect Labels, Faces, and Landmarks in Images with the Cloud Vision API

As the old saying goes: a picture is worth a thousand words.

Human beings are more attracted to visuals than to words.
Research shows that images are more easily remembered than text.

No doubt you'd rather watch videos than read, right? XD

Finally, we are moving on to a more interesting topic: DETECTING IMAGES!!!

Wait no more! Let's start~


  1. Open Google Cloud Platform (follow the steps in A Tour of Qwiklabs and Google Cloud)

  2. Activate Cloud Shell
    As we did in the previous lesson.

  3. Create an API Key
    As we did in the previous lesson.

  4. Upload an Image to a Cloud Storage bucket
    Go to Storage in Google Cloud Platform and create a bucket with a globally unique name.
    We use this image, donuts.png, as a sample:
    https://ithelp.ithome.com.tw/upload/images/20200918/20130054kvXhkGvT6n.jpg

Once the image is uploaded, set its permission to public. The file should now show as having public access.

  5. Create a Vision API request
    Create a request.json file with the following content (replace my-bucket-name with the name of your bucket):
{
  "requests": [
      {
        "image": {
          "source": {
              "gcsImageUri": "gs://my-bucket-name/donuts.png"
          }
        },
        "features": [
          {
            "type": "LABEL_DETECTION",
            "maxResults": 10
          }
        ]
      }
  ]
}
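
The same request body could also be generated from a script instead of being typed by hand. The sketch below is only an illustration: the helper name build_request is made up for this example, and my-bucket-name is the placeholder from the article, not a real bucket.

```python
import json


def build_request(gcs_uri, feature_types, max_results=10):
    """Return the JSON body for a single images:annotate request.

    build_request is a hypothetical helper, not part of any Google library.
    """
    features = [{"type": t, "maxResults": max_results} for t in feature_types]
    return {
        "requests": [
            {
                "image": {"source": {"gcsImageUri": gcs_uri}},
                "features": features,
            }
        ]
    }


# "my-bucket-name" is a placeholder; substitute your own bucket name.
body = build_request("gs://my-bucket-name/donuts.png", ["LABEL_DETECTION"])

# Serialize it; writing this string to request.json gives curl the same input.
request_json = json.dumps(body, indent=2)
```

Switching to another detection type then only means passing a different feature name, for example ["WEB_DETECTION"].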
  6. Label Detection
    Call the Vision API (the URL is quoted so the shell does not try to interpret the ? character):
curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json "https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}"

It returns a list of labels (words) describing what's in the image, like this:
https://ithelp.ithome.com.tw/upload/images/20200918/20130054AKudHsp5W3.png

  • description -> the name of the item.
  • score -> a number from 0 to 1 indicating how confident the API is that the description matches what's in the image.
  • mid -> a value that maps to the item's mid in Google's Knowledge Graph. You can use the mid when calling the Knowledge Graph API to get more information on the item.
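
To make these fields concrete, here is a rough Python sketch of the same call plus a filter on score. The function names, the 0.8 threshold, and the SAMPLE data (including its mid values) are all made up for illustration; a real response will contain different labels and scores.

```python
import json
import urllib.request

ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def annotate(body, api_key):
    """POST the request body to the Vision API and return the parsed reply.

    Mirrors the curl command; needs network access and a valid API key.
    """
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def confident_labels(response, min_score=0.8):
    """Keep (description, score) pairs whose score is at least min_score."""
    labels = response["responses"][0].get("labelAnnotations", [])
    return [(l["description"], l["score"]) for l in labels if l["score"] >= min_score]


# Made-up response in the shape the API returns; values are illustrative only.
SAMPLE = {
    "responses": [
        {
            "labelAnnotations": [
                {"mid": "/m/hypothetical1", "description": "Doughnut", "score": 0.95},
                {"mid": "/m/hypothetical2", "description": "Food", "score": 0.90},
                {"mid": "/m/hypothetical3", "description": "Dessert", "score": 0.60},
            ]
        }
    ]
}

print(confident_labels(SAMPLE))
```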
  7. Web Detection
    The Vision API can also search the Internet for additional details on your image. Through the API's webDetection method, you get a lot of interesting data back:
  • A list of entities found in your image, based on content from pages with similar images
  • URLs of exact and partial matching images found across the web, along with the URLs of those pages
  • URLs of similar images, like doing a reverse image search

We use the same image, donuts.png, for web detection.

This time, edit request.json as follows:

{
  "requests": [
      {
        "image": {
          "source": {
              "gcsImageUri": "gs://my-bucket-name/donuts.png"
          }
        },
        "features": [
          {
            "type": "WEB_DETECTION",
            "maxResults": 10
          }
        ]
      }
  ]
}

Notice that type has changed from LABEL_DETECTION to WEB_DETECTION.

Then run the same command as above:

curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json "https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}"

Entities of this image will be listed under webEntities:
https://ithelp.ithome.com.tw/upload/images/20200918/2013005406y4BkF8af.png

If you scroll further down the result, you will see URLs of images similar to the detected image under visuallySimilarImages:
https://ithelp.ithome.com.tw/upload/images/20200918/20130054QvAd0xHwnJ.png
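
The web detection result is fairly long, so a small script can help pull out just the entity names and similar-image URLs. This is a minimal sketch, assuming the response has already been parsed into a dict; summarize_web_detection and the sample data (with example.com URLs) are invented for illustration.

```python
def summarize_web_detection(response, max_items=3):
    """Collect entity descriptions and visually similar image URLs."""
    web = response["responses"][0].get("webDetection", {})
    # Some web entities come back without a description, so skip those.
    entities = [e["description"] for e in web.get("webEntities", []) if "description" in e]
    similar = [img["url"] for img in web.get("visuallySimilarImages", [])]
    return {"entities": entities[:max_items], "similar_urls": similar[:max_items]}


# Made-up response fragment; real entity IDs, scores, and URLs will differ.
SAMPLE_WEB = {
    "responses": [
        {
            "webDetection": {
                "webEntities": [
                    {"entityId": "/m/hypothetical1", "score": 1.2, "description": "Doughnut"},
                    {"entityId": "/m/hypothetical2", "score": 0.7},
                    {"entityId": "/m/hypothetical3", "score": 0.6, "description": "Glaze"},
                ],
                "visuallySimilarImages": [
                    {"url": "https://example.com/a.jpg"},
                    {"url": "https://example.com/b.jpg"},
                ],
            }
        }
    ]
}

print(summarize_web_detection(SAMPLE_WEB))
```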

  8. Face Detection
    The face detection method returns data on faces found in an image, including the emotions of the faces and their location in the image.

Sounds even more magical, right!!!

Let's upload another image, selfie.png, for face detection:
https://ithelp.ithome.com.tw/upload/images/20200918/20130054MtGnFvKDH3.jpg

Once the image is uploaded, set its permission to public. The file should now show as having public access.

Edit request.json to contain the following:

{
  "requests": [
      {
        "image": {
          "source": {
              "gcsImageUri": "gs://my-bucket-name/selfie.png"
          }
        },
        "features": [
          {
            "type": "FACE_DETECTION"
          },
          {
            "type": "LANDMARK_DETECTION"
          }
        ]
      }
  ]
}

Notice we have two types here: FACE_DETECTION and LANDMARK_DETECTION.

Use the same command to call the Vision API:

curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json "https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}"

The API returns an object for each face found in the image. Take a look at the faceAnnotations object in the result:
https://ithelp.ithome.com.tw/upload/images/20200918/20130054sqMPxJ6SiH.png

  • boundingPoly -> the x,y coordinates around the face in the image.
  • fdBoundingPoly -> a smaller box than boundingPoly, focusing on the skin part of the face.
  • landmarks -> an array of objects for each facial feature, some you may not have even known about. This tells us the type of landmark, along with the 3D position of that feature (x,y,z coordinates) where the z coordinate is the depth.
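
As a rough illustration of how these fields might be read, the sketch below turns a boundingPoly into a simple box and collects the 3D landmark positions. The helper names and the SAMPLE_FACE values are made up. It also assumes that a coordinate equal to 0 may be left out of the JSON, which is why missing keys default to 0.

```python
def bounding_box(poly):
    """Return (min_x, min_y, max_x, max_y) for a boundingPoly dict."""
    # Assumption: a coordinate of 0 may be omitted, so default missing keys to 0.
    xs = [v.get("x", 0) for v in poly["vertices"]]
    ys = [v.get("y", 0) for v in poly["vertices"]]
    return min(xs), min(ys), max(xs), max(ys)


def landmark_positions(face):
    """Map each landmark type to its (x, y, z) position; z is the depth."""
    return {
        lm["type"]: (
            lm["position"].get("x", 0),
            lm["position"].get("y", 0),
            lm["position"].get("z", 0),
        )
        for lm in face.get("landmarks", [])
    }


# Made-up face annotation; all coordinates here are illustrative only.
SAMPLE_FACE = {
    "boundingPoly": {
        "vertices": [{"y": 20}, {"x": 110, "y": 20}, {"x": 110, "y": 140}, {"y": 140}]
    },
    "fdBoundingPoly": {
        "vertices": [{"x": 15, "y": 40}, {"x": 100, "y": 40}, {"x": 100, "y": 130}, {"x": 15, "y": 130}]
    },
    "landmarks": [
        {"type": "LEFT_EYE", "position": {"x": 40.0, "y": 60.0, "z": 0.5}},
        {"type": "NOSE_TIP", "position": {"x": 60.0, "y": 80.0, "z": -10.0}},
    ],
    "joyLikelihood": "VERY_LIKELY",
}

print(bounding_box(SAMPLE_FACE["boundingPoly"]))
print(landmark_positions(SAMPLE_FACE)["NOSE_TIP"])
```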
  9. Landmark Annotation
    Landmark detection can identify common (and obscure) landmarks. It returns the name of the landmark, its latitude and longitude coordinates, and the location in the image where the landmark was identified.

Wow, unbelievable, right!!!

Let's upload our last image, city.png, for landmark detection:

https://ithelp.ithome.com.tw/upload/images/20200918/20130054Shl9hXX7eY.png

Once the image is uploaded, set its permission to public. The file should now show as having public access.

Point gcsImageUri in request.json at city.png (the LANDMARK_DETECTION feature is already listed from the previous step), then run the same command one last time:

curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json "https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}"

The result tells us this image was taken in Boston, along with the exact location:

https://ithelp.ithome.com.tw/upload/images/20200918/20130054x9LAlpiZGM.png

boundingPoly -> the region in the image where the landmark was identified.
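
A small sketch of reading the landmark name and coordinates out of the response follows. The function name and the sample values (including the landmark name and the approximate Boston coordinates) are invented for illustration and are not the actual output shown in the screenshot.

```python
def landmark_locations(response):
    """Return (description, latitude, longitude) for each detected landmark."""
    found = []
    for lm in response["responses"][0].get("landmarkAnnotations", []):
        for loc in lm.get("locations", []):
            lat_lng = loc["latLng"]
            found.append((lm["description"], lat_lng["latitude"], lat_lng["longitude"]))
    return found


# Made-up response fragment; the name and coordinates are placeholders.
SAMPLE_LANDMARK = {
    "responses": [
        {
            "landmarkAnnotations": [
                {
                    "mid": "/m/hypothetical",
                    "description": "Example Landmark",
                    "score": 0.9,
                    "boundingPoly": {
                        "vertices": [{"x": 5, "y": 10}, {"x": 200, "y": 10},
                                     {"x": 200, "y": 150}, {"x": 5, "y": 150}]
                    },
                    "locations": [{"latLng": {"latitude": 42.36, "longitude": -71.06}}],
                }
            ]
        }
    ]
}

print(landmark_locations(SAMPLE_LANDMARK))
```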

  10. Explore other Vision API methods
  • Logo detection -> identify common logos and their location in an image.
  • Safe search detection -> determine whether or not an image contains explicit content. This is useful for any application with user-generated content. You can filter images based on four factors: adult, medical, violent, and spoof content.
  • Text detection -> run OCR to extract text from images. This method can even identify the language of text present in an image.
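
For example, a safe search result could be turned into a simple filter for user-generated content. This is a sketch under assumptions: it uses the SAFE_SEARCH_DETECTION feature type and treats the likelihood strings as an ordered scale. The function name, the "LIKELY" cutoff, and the sample annotation are made up for illustration.

```python
# Likelihood values ordered from least to most likely.
LIKELIHOODS = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]

# Feature entries that could go in the "features" list of request.json.
OTHER_FEATURES = [
    {"type": "LOGO_DETECTION"},
    {"type": "SAFE_SEARCH_DETECTION"},
    {"type": "TEXT_DETECTION"},
]


def flagged_categories(annotation, threshold="LIKELY"):
    """Return the categories rated at or above the threshold, sorted by name."""
    cutoff = LIKELIHOODS.index(threshold)
    categories = ("adult", "medical", "violence", "spoof")
    return sorted(
        c for c in categories
        if LIKELIHOODS.index(annotation.get(c, "UNKNOWN")) >= cutoff
    )


# Made-up safeSearchAnnotation; a real image will produce its own ratings.
SAMPLE_SAFE_SEARCH = {
    "adult": "VERY_UNLIKELY",
    "medical": "UNLIKELY",
    "violence": "LIKELY",
    "spoof": "VERY_LIKELY",
}

print(flagged_categories(SAMPLE_SAFE_SEARCH))
```

Where to set the cutoff is a judgment call for each application; a stricter site might flag anything rated POSSIBLE or higher.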

The Vision API does a lot of amazing work!
Each feature type returns different results. How convenient!

Hope you enjoyed this lesson ~

