MR WHY

I am
Wang Hongyang

Snapberry - Help the Blind Hear the World

This is the first release of Snapberry, an individual homework assignment for the IoT course at CMU.
The second release will be a group project exploring more possibilities.

Introduction

Snapberry is a Raspberry Pi project that creates an assistive IoT device to help the blind hear the world. It is built from a camera, a Raspberry Pi, and a speaker. The scene captured by the camera is translated into a text description by the Microsoft vision API and then spoken aloud through the speaker.

Links

  • Project introduction page is here.
  • Project repository on Github is here.
  • Video is here.

Components

Spectacles as Camera

Spectacles by Snap Inc. have a distinctive look, with cameras built into both sides of the frame. This naturally fits Snapberry's need for a wearable camera. Isn't it cool to extend Spectacles to new users and a new market?

Raspberry Pi

In this project, the Raspberry Pi acts as the server that processes captured photos. The server is built with Node.js and Express.

Wireless Earbuds as Speaker

The text description of the scene is generated by the Microsoft vision API and then read out through the speaker. Wireless earbuds such as AirPods, Bragi, or BeatsX let the blind user hear the scene description.

Core Code and Explanation

Front-end

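$(".GetBtn").click(() => {
  // Unix time in seconds; the server uses it to name the photo
  const timeStamp = Math.floor(Date.now() / 1000);

  // get photo from raspberry camera => scene recognition => speech
  getPhoto(timeStamp)
    .then(res => res.blob())
    .then((data) => {
      console.log(data);
      return postForDesc(data);
    })
    .then(res => {
      console.log(res);
      // the first caption is the description of the scene
      const caption = res.description.captions[0].text;
      // insert as plain text so the caption is never parsed as HTML
      $(".res").text(caption);
      responsiveVoice.speak(caption, "US English Female");
    })
    // report a failed photo or recognition request instead of failing silently
    .catch(err => console.error("Snapberry request failed:", err));
});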

The front-end flow has three steps:
  • Call getPhoto() to ask the Raspberry Pi camera to take a photo.
  • Send the returned photo to the Microsoft vision API for image recognition and receive a description of the scene.
  • Call the responsiveVoice API to speak the description aloud for the blind user.
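
The post does not show getPhoto() or postForDesc(), so here is a minimal sketch of what they might look like. Everything in it is an assumption for illustration: the "/photo" path depends on where the router is mounted, the region in the endpoint URL and the subscription key are placeholders, and firstCaption() is a hypothetical helper that adds a spoken fallback when no caption comes back. The fetch function is passed in as a parameter so the helpers can be tried without a network.

```javascript
// Hypothetical helpers; the original project code for these is not shown.
// The region and key are placeholders, not values from the project.
const VISION_URL = "https://westus.api.cognitive.microsoft.com/vision/v1.0/describe";
const VISION_KEY = "<your-subscription-key>";

// Step 1: ask the Pi server to take a photo named after the time stamp.
function getPhoto(timeStamp, fetchImpl = fetch) {
  return fetchImpl("/photo?timeStamp=" + encodeURIComponent(timeStamp));
}

// Step 2: send the raw image bytes to the vision service and parse the JSON reply.
function postForDesc(blob, fetchImpl = fetch) {
  return fetchImpl(VISION_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/octet-stream",
      "Ocp-Apim-Subscription-Key": VISION_KEY
    },
    body: blob
  }).then(res => res.json());
}

// Step 3 input: take the first caption from the reply, or a fallback sentence.
function firstCaption(result) {
  const captions =
    (result && result.description && result.description.captions) || [];
  return captions.length > 0 ? captions[0].text : "No description available";
}
```

With these in place, the click handler could pass firstCaption(res) to responsiveVoice.speak() so the user still hears something when recognition returns no caption.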

Backend

router.get('/photo', function (req, res) {

    var camera = new RaspiCam({
        mode: "photo",
        output: "./photo/image" + req.query.timeStamp + ".png",
        encoding: "png",
        timeout: 0 // take the picture immediately
    });

    camera.on("start", function (err, timestamp) {
        console.log("photo started at " + timestamp);
    });

    camera.on("read", function (err, timestamp, filename) {
        console.log("photo image captured with filename: " + filename);

        // leave 1s for photo to save
        setTimeout(() => {
            camera.stop();
        }, 1000);
    });

    camera.on("exit", function (timestamp) {
        console.log("photo child process has exited at " + timestamp);

        // read the saved file and send it back to the front-end
        var img = fs.readFileSync('./photo/image' + req.query.timeStamp + '.png');
        // the photo is encoded as PNG, so declare it as such
        res.writeHead(200, { 'Content-Type': 'image/png' });
        res.end(img);
    });

    camera.start();
});

The core of the backend is controlling the camera to take and save a photo, and exposing an API that returns the photo to the front-end.

The raspicam package handles communication between the Raspberry Pi and its camera. Note that the front-end sends a time stamp recording when the photo was requested. Naming each photo after its time stamp gives the files meaningful names, which helps with categorizing photos later.
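
Since the time stamp from the query string ends up in a file path, it may be worth checking it before use. The sketch below shows one way to do that and to recover the capture time from a file name later. Both photoPathFor() and capturedAt() are hypothetical helpers, not part of the original project; they assume the "image<timeStamp>.png" naming from the route above and a time stamp in seconds, as produced by the front-end.

```javascript
const path = require("path");

// Hypothetical helper: build the photo path from the time stamp sent by the
// front-end, rejecting anything that is not a plain run of digits.
function photoPathFor(timeStamp, dir = "./photo") {
  if (!/^\d{1,13}$/.test(String(timeStamp))) {
    throw new Error("timeStamp must be a numeric Unix time");
  }
  return path.join(dir, "image" + timeStamp + ".png");
}

// Hypothetical helper: recover the capture time from a photo file name,
// or return null when the name does not follow the naming scheme.
function capturedAt(filename) {
  const match = /^image(\d+)\.png$/.exec(path.basename(filename));
  // the front-end sends seconds, while Date expects milliseconds
  return match ? new Date(Number(match[1]) * 1000) : null;
}
```

In the route, photoPathFor(req.query.timeStamp) could replace both string concatenations, so the camera output and the file read always agree on one path.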

Results

Framework
