Saturday, August 1, 2015

Nice overview is here

Growth in mobile and cloud-based speech recognition fueling embedded speech technologies


Improvements in embedded speech technology yield a five-stage Voice User Interface (VUI) capable of hands-free, eyes-free voice recognition.

The ease of speaking a command as opposed to typing it is not only boosting demand for and investment in cloud-based voice search processing, but also creating the need for embedded speech technologies. By addressing several technology stages, advances in embedded speech recognition can eliminate issues within noisy environments and improve response times in hands-free voice-activated mobile devices.

Many of the largest players in speech technology today are also heavyweights in the mobile phone Operating Systems (OS) market. Microsoft was the first of the software/mobile OS giants to build a speech team. In the early 1990s, Bill Gates preached the benefits of Voice User Interfaces (VUIs) and predicted they would play a role in human interfacing on computers. Google got aggressive by building an elite team of speech technologists early in the 21st century and spurred the mobile industry toward speech interfaces and voice control with its Android release. Apple has always been king of the user experience and, until recently, avoided pushing speech technology because of challenges in accuracy. However, with the acquisition of Siri (a voice concierge service) and incorporation of the company's technology into the iPhone 4S, Apple could be ushering in a new generation of natural language user experiences through voice.

Speech technology has become critical to the mobile industry for a variety of reasons, primarily because it’s easier to speak than type and because the mobile phone form factor is built around talking more so than typing. Additionally, with the enormous revenue potential of mobile search, mobile OS providers are seeing the value of adding voice recognition to their technology portfolios.

Why embedded?

Much of the heavy lifting for VUIs is performed in the cloud. That’s where most of the investment from the big OS players has gone. The cloud offers an environment with virtually unlimited MIPS and memory – two essentials for advanced voice search processing. With this growth of cloud-based speech technology usage, a similar trend appears to be following in the embedded realm.

Embedded speech is the only solution that enables speech control and input when access to the cloud is unavailable – a necessary feature to add to the user experience. Embedded speech also consumes fewer MIPS and less memory, thereby extending a device's battery life.

The optimal scenario for client/cloud speech usage entails voice activation on the client, with the heavy lifting of deciphering text and meaning on the cloud. This can enable a scenario where the device is always on and always listening, so a voice command can be given and executed without having to press a button on the client. This paradigm of “no hands or eyes necessary” is particularly useful in the car for safety purposes and at home for convenience’ sake.

For example, in the recently introduced Galaxy SII Android phone, Samsung’s Voice Talk utilizes Sensory’s TrulyHandsfree Voice Control, an embedded speech technology, to activate the phone with the words “Hey Galaxy.” This phrase calls up the Vlingo cloud-based recognition service that allows the user to give commands and input text without touching the phone.

Speech recognition can be implemented on devices with as little as 10 MIPS and tens of thousands of bytes of memory. Sensory’s line of speech chips includes RISC single chips based on 8-bit microcontrollers and natural language processors that utilize small embedded DSPs (see Figure 1). In general, the more MIPS and memory thrown at speech recognition, the more capabilities (faster response times, larger vocabularies, and more complex grammar) a product can have.
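
As a back-of-the-envelope illustration of that scaling (the numbers below are assumptions, not figures from any vendor), a 10 MIPS budget translates into a fixed count of instructions per audio frame, and a small command set's model data must fit in tens of kilobytes:

```python
# Rough resource budget for a small embedded recognizer.
# Illustrative assumptions only, not vendor figures.

MIPS = 10                    # available processing budget
SAMPLE_RATE_HZ = 16_000      # typical speech sampling rate
FRAME_MS = 10                # analysis frame length

instructions_per_second = MIPS * 1_000_000
instructions_per_frame = instructions_per_second * FRAME_MS / 1000
instructions_per_sample = instructions_per_second / SAMPLE_RATE_HZ

print(f"{instructions_per_frame:,.0f} instructions available per 10 ms frame")
print(f"{instructions_per_sample:,.0f} instructions available per audio sample")

# Memory: tens of kilobytes must hold the models plus working buffers.
vocab_words = 20
bytes_per_word_model = 1_500   # assumed compact word/phrase template
model_bytes = vocab_words * bytes_per_word_model
print(f"~{model_bytes/1024:.0f} KB for a {vocab_words}-word command set (assumed)")
```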




Figure 1: Sensory’s 16-bit DSP-based natural language processor integrates digital and analog processing blocks and several communication interfaces in a single-chip architecture.



The general approaches to speech recognition are similar no matter what platform implements the tasks. Statistical approaches like hidden Markov modeling and neural networks have been the primary methods for speech recognition for a number of years. Moving from the client to the cloud allows statistical language modeling and more complex techniques to be deployed.
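
As a toy illustration of the statistical approach, the sketch below scores observation sequences against a small hidden Markov model using the classic forward recursion. Real recognizers use phoneme-level HMMs with Gaussian-mixture or neural-network observation models; every parameter here is invented purely for illustration.

```python
import numpy as np

# Toy HMM: 3 states, 4 discrete observation symbols (illustrative numbers).
A = np.array([[0.8, 0.2, 0.0],        # state transition probabilities
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
B = np.array([[0.6, 0.2, 0.1, 0.1],   # observation probabilities per state
              [0.1, 0.6, 0.2, 0.1],
              [0.1, 0.1, 0.2, 0.6]])
pi = np.array([1.0, 0.0, 0.0])        # always start in state 0

def forward_likelihood(obs):
    """Probability of an observation sequence under the HMM (forward algorithm)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

# Compare two candidate observation sequences against the model.
print(forward_likelihood([0, 1, 3]))   # plausible sequence -> higher score
print(forward_likelihood([3, 3, 0]))   # implausible sequence -> lower score
```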

The VUI stages

To create a truly hands-free, eyes-free user experience, several technology stages must be addressed (see Figure 2).




Figure 2: Five technology stages require core attributes to achieve a truly hands-free voice user interface.



Stage 1: Voice activation

This essentially is replacing the button press. The recognizer needs to be always on, ready to call Stage 2 into operation, and able to activate in very noisy situations. Another key criterion for this first stage is a very fast response time. Given that delays of more than a few hundred milliseconds can generate accuracy issues caused by users speaking to Stage 2 before the recognizer is listening, the response time of the voice activation must be the same as the response time of a button, which is near instantaneous. Simple command and control functions can be embedded in the client by the Stage 1 recognition system or a more complex Stage 2 system, which could be embedded or cloud-based.
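
In pseudocode terms, a Stage 1 recognizer is a tight frame-by-frame loop that scores each incoming audio frame against the wake phrase and hands off to Stage 2 the instant the score clears a threshold. The sketch below assumes a hypothetical keyword scorer and wake-up callback; the thresholds are illustrative, not any vendor's actual values.

```python
import collections

FRAME_MS = 10
TRIGGER_THRESHOLD = 0.85      # assumed score threshold for the wake phrase
CONSECUTIVE_FRAMES = 3        # require a short run of high scores to limit false alarms

def always_listening(mic_frames, score_keyword, wake_stage2):
    """Run a low-power Stage 1 loop; hand off to Stage 2 on a trigger.

    mic_frames    -- iterable of short audio frames (e.g. 10 ms each)
    score_keyword -- hypothetical function: frame -> score that the wake phrase just ended
    wake_stage2   -- callback that starts the larger Stage 2 recognizer
    """
    recent = collections.deque(maxlen=CONSECUTIVE_FRAMES)
    for frame in mic_frames:
        recent.append(score_keyword(frame))
        if len(recent) == CONSECUTIVE_FRAMES and min(recent) > TRIGGER_THRESHOLD:
            wake_stage2()          # near-instant hand-off: Stage 2 starts listening now
            recent.clear()
```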

Stage 2: Speech recognition and transcription

The more power-hungry and powerful Stage 2 recognizer translates what is spoken into text. If the purpose is text messaging or voice dialing, the process can stop here. If the user wants a question answered or data accessed, the system moves on to Stage 3. Because the Stage 1 recognizer can respond in high noise, it can lower the volume of the car radio or home A/V system to assist Stage 2 recognition.

Stage 3: Intent and meaning

This is probably the biggest challenge in the process. The text is accurately translated, but what does it mean? For example, what is the desired query for an Internet search? Today’s “intelligence” might try to modify the search to better fit what it thinks the user wants. However, computers are remarkably bad at figuring out intent. Apple’s Siri intelligent assistant, developed under the DoD-funded CALO project involving more than 300 researchers, might be today’s best example of intelligent interpretation.

Stage 4: Data search and query

Searching through data and finding the correct results can be straightforward or complex depending on the query. Mapping data and directions can be reliable because the grammar is well understood with a clear goal of a map search. With Google and other search providers pouring money and time into data search functionality, this stage will continue to improve.

Stage 5: Voice response

A voice response to queries is a nice alternative to a display response, which can cause drivers to take their eyes off the road or cause inconvenience in the home. Today’s state-of-the-art text-to-speech systems are highly intelligible and have progressed to sound more natural than previous automated voice systems.

Why has it taken so long for embedded recognizers to replace buttons at Stage 1?

Speech recognition has traditionally required button-push activation rather than voice activation. The main reason for this is that buttons, although distracting, are reliable and responsive, even in noisy environments. These types of environments, such as a car or a busy home, can be challenging for speech recognizers. A voice-activated word must create a response in a car (with windows down, radios on, and road noise) or in a home (with babies crying, music or TV on, and appliances running) without the user having to work for it. Thus, until recently, speech technologies have only been reliable when users are in a quiet environment with the mic close to their mouths.

The requirement of a speedy response time further complicates this challenge. Speech recognizers often need hundreds of milliseconds just to determine if the user is done talking before starting to process the speech. This time delay might be acceptable from a recognition system to yield an answer or reply to the consumer. However, at Stage 1, the response of the activation is calling up another more sophisticated recognizer at Stage 2, and consumers will not accept a delay lasting much more than the time it takes to press a button. The longer the delay, the more likely a recognition failure occurs at Stage 2 because users might start talking before the Stage 2 recognizer is ready to listen.
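
Much of that delay comes from end-of-speech detection: the recognizer waits for a run of trailing silence before deciding the utterance is over. A simplified version of that rule, with assumed thresholds, looks like this; it is exactly this wait that a Stage 1 trigger cannot afford.

```python
END_OF_SPEECH_MS = 400        # trailing silence required before declaring "done" (assumed)
FRAME_MS = 10
SILENCE_ENERGY = 1e-4         # assumed energy threshold separating speech from silence

def utterance_finished(frame_energies):
    """Return True once enough consecutive low-energy frames have been seen."""
    needed = END_OF_SPEECH_MS // FRAME_MS
    silent_run = 0
    for energy in frame_energies:
        silent_run = silent_run + 1 if energy < SILENCE_ENERGY else 0
        if silent_run >= needed:
            return True           # decided ~400 ms after the user stops talking
    return False
```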

Recent advances in embedded speech technology such as Sensory’s TrulyHandsfree Voice Interface provide true VUIs without the need to touch devices. These technologies have eliminated the issues inherent in noisy environments as well as long response times, making voice activation feasible, accurate, and more convenient.

The future of speech in consumer electronics

The rapidly emerging use of hands-free devices with voice triggers will progress into intelligent devices that listen to what we say and decide when it’s appropriate to go from the client to the cloud. They’ll also decide when and how to respond, potentially evolving into assistants that sit in the background listening to everything and deciding when to offer assistance.

Friday, July 31, 2015

A fairly recent interview

Addressing The Biggest Challenge of Going Handsfree – Interview with Todd Mozer, CEO, Sensory Inc.

Sensory, Inc. is a company specializing in consumer speech and vision technologies. Founded in 1994, the company offers software-based biometric security solutions that use voice and facial recognition to authenticate users. Sensory will be at this year’s Consumer Electronics Show, January 6 through 9 in Las Vegas. Sensory will be showcasing its technology in booth MP25547.
In advance of CES 2015 and the launch of the company’s new app, AppLock, FindBiometrics president Peter O’Neill had a chance to interview Sensory’s CEO, Todd Mozer. The conversation details the company’s long history, the biggest challenge in developing handsfree mobile tech and the major innovations Sensory has brought to the biometrics and mobile marketplace.
*

Peter O’Neill, FindBiometrics (FB): Can you please provide our readers with a brief background of your company, Sensory?

Todd Mozer, Sensory: Sensory has been around for more than twenty years. We got started in speech recognition and on the biometric side of things doing speaker verification. We have been targeting the consumer electronics market space and over the last five years we have been very successful in the mobile market in particular. Recently we have expanded our offerings to include vision technologies including face authentication. We call our authentication offering TrulySecure, and we are putting a lot of focus on both speaker verification and face ID. TrulySecure is the biometric fusion of face and voice. Today we have world class solutions in both face and voice authentication. We are combining them to deliver mobile authentication with high accuracy and extreme convenience. We are really focused on the convenience side of things.

FB: Before we get into TrulySecure, you have a history as being an industry leader in the voice area with a lot of innovation, can you tell our readers a little bit about some of those innovative concepts?

Sensory: We have done both speech chips of our own as well as embedded software that we licensed. Probably one of the most innovative things that we have done is what we call TrulyHandsfree.
Up until a few years ago whenever people used speech recognition they had to hit buttons to use it, and that seemed really crazy to us. We wanted a solution that you could just start talking to and have it work, but there were extreme challenges associated with leaving speech recognizers always on. One is that they might respond to the wrong things when you don’t want them to respond, and another is that when you say the right thing you have to have it respond; you can’t have it reject you and give you a false reject. And to do all that with very, very low power consumption (because it must be always on and listening) was considered impossible.
Sensory proved a lot of naysayers wrong and came out with an algorithm that was able to do high accuracy keyword spotting in any kind of noise level and with ultra low power consumption; this made it so that you didn’t have to touch the device to control it by voice.

FB: You were honored at the last Mobile World Congress in Barcelona for this product weren’t you?

Sensory: Yes. TrulyHandsfree has won numerous awards. We were awarded at the Mobile World Congress and I think Speech Tech magazine recently gave us a couple of awards as well.
It has been very, very well received. So much so that really the entire world has started realizing that we need this technology to make products really work. Motorola’s implementation that they called Touchless Control has been one of the top rated mobile apps on the Android store. It’s very flattering to have companies like Google, QUALCOMM and Apple, basically the biggest companies in the world trying to do what Sensory proved can be done…and we are staying ahead of these giants by further lowering power consumption, adding improvements in accuracy, and layering other features like speaker verification.

FB: I am familiar with this product, but the power consumption solution really is truly remarkable. Was that one of the biggest hurdles to overcome?

Sensory: Yes absolutely. Our goal when we designed it was to get down to the sub 2mA range when it was on and running. We knew that we could go to market with something like 10mA but it wouldn’t become really mainstream until we could get it down to about 1mA. Our first implementation was about 5mA, and we’ll have things hitting the market in 2015 that are at 1mA or lower with new features like our low power speech detection.
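
A rough calculation shows why the difference between 10 mA, 5 mA, and 1 mA matters for always-listening products; the battery capacities below are assumed, purely for illustration.

```python
# Always-on listening current vs. battery drain (illustrative capacities).
battery_mah = {"wearable": 300, "smartphone": 2800}

for name, capacity in battery_mah.items():
    for current_ma in (10, 5, 2, 1):
        days = capacity / current_ma / 24
        print(f"{name:10s}: {current_ma:2d} mA always-on drains the whole "
              f"battery in {days:5.1f} days")
```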

FB: Can you tell us more about your next launch, TrulySecure, your new authentication solution that combines face and voice biometrics?

Sensory: What I didn’t say about Sensory earlier, which I’ll add in now, is that a lot of our expertise is really doing small footprint solutions and porting to embedded platforms. I think this could play an increasingly important role as we move into more vision technologies. The face authentication we are doing for example is in a relatively constrained platform like Android, and we have extremely fast response with high accuracy. TrulySecure is a biometric solution that doesn’t have to sacrifice accuracy to be fast, because it has a small footprint. We have made it accurate and very unobtrusive.

FB: That is what the market is looking for isn’t it?

Sensory: We hope so. We certainly did our research on what the market wanted and we quickly concluded that face and voice had certain advantages over fingerprint and some of the other biometrics in the ease of use and low implementation cost areas and that the real challenge was to make it so that they were accurate enough. By combining the best voice verification and the best face authentication together we are achieving false accept rates below .005 percent with over 95 percent detection – and that’s using real world data.
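
A minimal sketch of the score-level fusion idea Mozer describes, with invented weights and threshold; Sensory's actual TrulySecure fusion is proprietary and certainly more sophisticated.

```python
def fused_decision(face_score, voice_score,
                   face_weight=0.5, voice_weight=0.5, threshold=0.9):
    """Combine face and voice match scores (each in [0, 1]) into one accept/reject.

    Weights and threshold are illustrative; in practice they are tuned so that
    the combined false-accept rate is far lower than either biometric alone.
    """
    combined = face_weight * face_score + voice_weight * voice_score
    return combined >= threshold

# A strong face match plus a strong voice match is accepted...
print(fused_decision(0.95, 0.92))   # True
# ...while one weak modality pulls the combined score below threshold.
print(fused_decision(0.95, 0.40))   # False
```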

FB: What vertical markets will you be focusing on for TrulySecure?

Sensory: Good question. Our roots are in the consumer electronic space including mobile phones, cars and home electronics, so certainly we are taking it to our existing customer base, which includes all these types of companies. However, we also see opportunities in new market segments that we haven’t traditionally been selling to. For example, we believe that just as a way to lock applications there is a nice fit for app developers and enterprise applications. Because TrulySecure is all developed by Sensory in house, it is both cost effective and able to run cross-platform on Android, iOS, or any platform you want us to move it to.

FB: If I’m not mistaken everything is performed on device, is that correct?

Sensory: That is correct. No cloud is required and we have intentionally architected it to run on device. Nevertheless, we have started to look at other techniques where we could use some of the advantages of the cloud to enable more security, offer our customers some flexibility about where things are stored, and really take advantage of both client and cloud types of authentication.

FB: You are also announcing a new app in January. Can you tell us about this?

Sensory: TrulySecure is the brand name of our technology and what we are going to be announcing in January is a new application that users can download onto their Android devices and enable them to biometrically lock their applications. AppLock is an application locker and the neat thing about this is that you can have the ease and unobtrusiveness by using face only where literally you can just open up an application and you almost don’t even know that it did a biometric authentication because the camera glances so quickly and opens up. But if you want the highest security then we can combine both the face and voice together and then you open it up with a text dependent password, and that’s where it gets virtually unbreakable.

FB: Well Todd, I was at Money 20/20 in Las Vegas last month and biometrics are starting to play a significant role in the whole financial payment industry. Your solution certainly seems ideal for the financial area as well. So congratulations on being innovative once again, and we look forward to seeing how your launch unfolds in 2015.

Sensory: Thanks very much Peter.

Related News

  1. jfieb


    Sensory blog June 3

    RAMBLING ON… CHIP ACQUISITIONS AND SOFTWARE DIFFERENTIATION
    June 3, 2015

    When I started Sensory over 20 years ago,

    wow, they have been around a LOT longer than I would have believed, kind of like QUIK :)


    I knew how difficult it would be to sell software to cost sensitive consumer electronic OEMs that would know my cost of goods. A chip based method of packaging up the technology made a lot of sense as a turnkey solution that could maintain a floor price by adding the features of a microcontroller or DSP with the added benefit of providing speech I/O. The idea was “buy Sensory’s micro or DSP and get speech I/O thrown in for free”.

    After about 10 years it was becoming clear that Sensory’s value add in the market was really in technology development,

    Look how long it took them to find their way with their ideas...


    and particularly in developing technologies that could run on low cost chips and with smaller footprints, less power, and superior accuracy than other solutions. Our strategy of using trailing IC technologies to get the best price point was becoming useless because we lacked the scale to negotiate the best pricing, and more cutting edge technologies were becoming further out of reach; even getting the supply commitments we needed was difficult in a world of continuing flux between over and under capacity.

    So Sensory began porting our speech technologies onto other people’s chips. Last year about 10% of our sales came from our internal IC’s! Sensory’s DSP, IP, and platform partners have turned into the most strategic of our partnerships.

    Today in the semiconductor industry there is a consolidation occurring that somewhat mirrors Sensory’s thinking over the past 10 years, albeit at a much larger scale. Avago pays $37 billion for Broadcom, Intel pays $16.7B for Altera, NXP pays $12B for Freescale, and the list goes on, dwarfing acquisitions of earlier time periods.

    It used to be the multi-billion dollar chip companies gobbled up the smaller fabless companies, but now even the multibillion-dollar chip companies are being gobbled up. There’s a lot of reasons for this but economies of scale is probably #1. As chips get smaller and smaller, there are increasing costs for design tools, tape outs, prototyping, and although the actual variable per chip cost drops, the fixed costs are skyrocketing, making consolidation and scale more attractive.

    That sort of consolidation strategy is very much a hardware centered philosophy. I think the real value will come to these chip giants through in house technology differentiation. It’s that differentiation that will add value to their chips, enabling better margins and/or more sales.

    I expect that over time the
    In fact, we have already seen Intel, Qualcomm and many other chip giants investing in speech recognition, biometrics, and other user experience technologies, so the change is underway!



    Commentary: Sensory in some ways has a lot of history in common with QUIK. Things have moved their way a HUGE amount. When you evolve into the UI of the next generation of devices you are so important. I don't think they are for sale at all.

    Why?

    Kind of like QUIK, they have existed for 20 yrs and now are at an EOS (dawn) of their own making, and from their blog they have a roadmap of adjacent possibles that are so exciting and that they will just want to experience.

    Would I like to invest in this company?

    You bet. It would be a nice focus holding for the future, but since it's private, it's not possible for a mere average citizen... but is QUIK a proxy for them?

    Yes, I think they are as of NOW. I am really happy as I was worried they would get the IP from who knows where and that it would NOT BE ANY GOOD.
    QUIK has top notch audio hardened into EOS. Audio is a UI. It is a key focus for any smartphone, platform, or wearable.

Thursday, July 30, 2015

Nice read is here, and there should be more...

http://www.eetimes.com/document.asp?doc_id=1327272
and this one has audio details

http://techfocusmedia.net/blog/quicklogic-goes-full-soc-for-sensors/

what is ingenious is how one piece of silicon can be a SoC & a coprocessor. Reading the details of the audio may give you a headache, but it shows how hard they worked!
I thought it would take 2 to cover the SoC and something else for the coprocessor.
Sensory will change the game... here is some material to work through. Worth ALL the time it takes.
Nice material.  QUIK has done better than I had hoped for here.




  1. Nice. A lot of reading to do. I was very curious to find out where the CRUCIAL part of a SoC would come from.
    From Brian's comments it will be worth the time to get a better understanding. I'm glad it's not Audience.

    I will be reading all their blog entries tonight.

    Good Technology Exists – So Why Does Speech Recognition Still Fall Short?
    March 30, 2015

    At Mobile World Congress, I participated in ZTE’s Mobile Voice Alliance panel. ZTE presented data researched in China that basically said people want to use speech recognition on their phones, but they don’t use it because it doesn’t work well enough. I have seen similar data on US mobile phone users, and the automotive industry has also shown data supporting the high level of dissatisfaction with speech recognition.

    In fact, when I bought my new car last year I wanted the state of the art in speech recognition to make navigation easier… but sadly I’ve come to learn that the system used in my Lexus just doesn’t work well — even the voice dialing doesn’t work well.

    As an industry, I feel we must do better than this, so in this blog I’ll provide my two-cents as to why speech recognition isn’t where it should be today, even when technology that works well exists:
    1. Many core algorithms, especially the ones provided to the automotive industry, are just not that good. It’s kind of ironic, but the largest independent supplier of speech technologies actually has one of the worst performing speech engines. Sadly, it’s this engine that gets used by many of the automotive companies, as well as some of the mobile companies.
    2. Even many of the good engines don’t work well in noise. In many tests, Google’s speech recognition would come in as tops, but when the environment gets noisy even Google fails. I use my Moto X to voice dial while driving (at least I try to). I also listen to music while driving. The “OK Google Now” trigger works great (kudos to Sensory!), but everything I say after that gets lost and I see an “it’s too noisy” message from Google. I end up turning down the radio to voice dial or use Sensory’s VoiceDial app, because Sensory always works… even when it’s noisy!
    3. Speech application designs are really bad. I was using the recognizer last week on a popular phone. The room was quiet, I had a great internet connection, and the recognizer was working great, but as a user I was totally confused. I said “set alarm for 4am” and it accurately transcribed “set alarm for 4am,” but rather than confirm that the alarm was set for 4am, it asked me what I wanted to do with the alarm. I repeated the command, it accurately transcribed again, and asked one more time what I wanted to do with the alarm. Even though it was recognizing correctly, it was interfacing so poorly with me that I couldn’t tell what was happening, and it didn’t appear to be doing what I asked it to do. Simple and clear application designs can make all the difference in the world.
    4. Wireless connections are unreliable. This is a HUGE issue. If the recognizer only works when there’s a strong Internet connection, then the recognizer is going to fail A GREAT DEAL of the time. My prediction – over the next couple of years, the speech industry will come to realize that embedded speech recognition offers HUGE advantages over the common cloud based approaches used today – and these advantages exist in not just accuracy and response time, but privacy too!

    Deep learning nets have enabled some amazing progress in speech recognition over the last five years. The next five years will see embedded recognition with high performance noise cancelling and beamforming coming to the forefront, and Sensory will be leading this charge… and just like how Sensory led the way with the “always on” low-power trigger, I expect to see Google, Apple, Microsoft, Amazon, Facebook and others follow suit.



  2. I will put this one up because it's Samsung

    a snip from this year's MWC

    I’d be remiss without mentioning the Galaxy S6. Samsung invited us to the launch and of course they continue to use Sensory in a relationship that has grown quite strong over the years. Samsung continues to innovate with the Edge, and other products that everyone is talking about. It’s amazing how far Apple took the mantle in the first iPhone and how companies like Samsung and the Android system seem to now be leading the charge on innovation!


    Samsung selects Sensory as Key Source for Embedded Speech Technologies
    Santa Clara, CA – September 5, 2014 …Sensory Speech Recognition Deployed by Samsung Across a Wide Range of Phones, Wearables, and Cameras.

    Sensory Inc., the industry leader in speech and vision technologies for consumer products, is pleased to announce that its pioneering TrulyHandsfree™ voice technology is deployed across an array of Samsung’s iconic Galaxy products including smartphones, tablets, cameras, and wearables. TrulyHandsfree™ is the leading always-on, always-listening voice control solution that just works. It enables users to activate and access their phone with an ultra-low power voice trigger. The TrulyHandsfree™ voice control can also enable extremely high accuracy command sets that do not require close talking mics, quiet rooms, or even saying things exactly right. Samsung uses these features to answer calls, use the camera, or perform other functions where talking is easier and more convenient than touching the device. Samsung also uses TrulyHandsfree™ as the voice trigger for S-Voice. The technology is robust enough to work in noisy environments, has a low risk of false starts (won’t make a call when you don’t want it to) and has minimal impact on battery life, making it the ideal voice control solution for mobile and wearable devices. Since its inception, TrulyHandsfree™ trigger technology has become the most widely adopted keyword spotting technology in the speech industry.

    Among the Samsung devices implementing the TrulyHandsfree™ technology is the flagship Galaxy S line of phones. Sensory was first introduced in Galaxy S2 and has been a key part of GS3, GS4, and now GS5.

    Outside of smart phones, other Samsung products incorporating TrulyHandsfree™ include the Galaxy Note 1, 2, 3, and 4 devices and the Galaxy Gear wearables line including Gear 1, Gear 2, and Gear S.

    Sensory is also in cameras and tablets which deploy S-Voice.

    “Samsung continues to be a standard bearer and innovator in an array of dazzling and savvy technology devices which meet the needs of mass consumers,” noted Sensory’s CEO Todd Mozer. “We are very pleased that they have selected Sensory for all their embedded speech needs.”



    “Sensory has emerged as the clear leader in low-power high-accuracy speech recognition, and the widespread adoption across Samsung products is a testament to their success,” said William Meisel, President of TMA Associates, which provides insights and consulting support to companies that want to incorporate speech technologies into their products or services.

    For more information on TrulyHandsfree™ contact sales@sensory.com.

    About Sensory, Inc.
    Sensory, Inc. is the leader in UX technologies for consumer products, offering a complete line of IC and software-only solutions for speech recognition, speech synthesis, speaker verification, vision and more. Sensory’s products are widely deployed in consumer electronics applications including mobile, automotive, wearables, toys, and various home electronics. With its TrulyHandsfree™ voice control, Sensory has set the standard for mobile handset platforms’ ultra-low power “always listening” touchless control. To date, Sensory’s technologies have shipped in over half a billion units of leading consumer products.

    TrulyHandsfree is a trademark of Sensory, Inc.
  3. jfieb




    I read a lot of job offerings; Sensory has one like this

    Senior Software Development Engineer - 5+ Yrs Exp (Loc: Santa Clara CA)
    Sensory, Inc - United States
    Job Code: 15-10
    *Location: Santa Clara CA

    Sensory offers an exciting opportunity to change the world of consumer electronics with best-in-class speech and image recognition technologies and chips. Sensory is a private, growing technology company and the leader in a rapidly expanding market for voice and vision user interfaces. Sensory’s specialties are user interfaces for mobile phones, tablets and notebooks; home automation; automotive and entertainment robotics. Sensory has design wins and products shipping with major OEMs in all these fields, including the largest mobile OEMs. Sensory’s technologies are deployed in hundreds of millions of units world-wide. Visit sensory.com for more details.


    It's important as its the UI for many devices QUIK needs to be in.

    Qualifications

    The ideal candidate combines a high level of creativity and analytic ability, with get-it-done practicality and excellent work quality. This individual will be part of the team that creates and deploys to the market best-in-class speech recognition and natural language solutions on smart phones, tablets, PCs and consumer electronics.

    Primary Duties/Responsibilities

    Software Programming

    • Architects and implements speech recognition, speech synthesis, voice processing and noise management algorithms and software on a variety of DSP-based and ARM-based platforms from the market leaders

    So QUIK has hardened some part of their algo IP
    • Evaluates new processor platforms for feasibility of implementing Sensory technologies
    • Ports technology software to various processor platforms and validates performance
    • Develops and maintains in-house and customer tools to support product application of Sensory technologies
    • Defines and implements simulations and scripts for validating, evaluating, and improving the performance of Sensory technologies.
    Technology Algorithms

    • Learns and understands existing proprietary Sensory technology in depth
    • Collaborates with the theoretical algorithms development group in the development and improvement of speech recognition, synthesis, voice processing and noise management algorithms.
    Requirements

    • MS in EE or CS.
    • Five years programming experience in product development.
    • Proven ability to be very innovative and productive in teams and working solo
    • Proven experience writing DSP algorithms in C/assembly code in shipped products.
    • Working knowledge of a few commercial DSPs or DSP cores.
    • Solid signal processing knowledge
    • Solid knowledge of analytic techniques, statistics, mathematical modeling.
    • Proficiency in assembly language and embedded “C”, demonstrated by a past primary programming role in one or more fully-released products.
    • Ability to develop software from existing code, detailed specification, or general conceptual outline; equally adept at high-level algorithmic software design and low-level code optimization.
    • Good oral and written communication skills, ability to exchange and debate complex technical concepts face-to-face or remotely.
    • U.S. citizen or permanent resident
    Preferences

    • Experience with Perl, Tcl/TK, or similar scripting languages
    • Experience writing Matlab simulations of algorithms
    • Signal processing knowledge for speech
    • Knowledge of digital audio processing
    • Familiarity with embedded systems hardware, ADCs, DACs, ability to read schematics.
    • Experience with Acoustic Echo Canceller (AEC), Beamforming and Noise Reduction Algorithms– such as for mobile, auto, conf call, etc.
    When applying, please reference: Job Code: 15-10 - Senior Software Development Engineer on subject line.



    QUIK could not have an SoC without this, and now it may be an important offload from the AP....that way it's always on
  4. jfieb




    17 July 2015
    Powerful forces, old and new, have come together to dramatically change the way humans interact with devices. The voices of Siri, Cortana, and Echo have heralded this change to consumers and electronics developers alike, potentially marking the end of an era when tap, pinch, slide, and swipe dominate user interfaces. Very soon, the most natural form of communication—speech—will dominate human-machine interactions, and the pace of this change is taking everyone’s breath away.

    “The velocity of the improvements we have made with voice is like nothing I have ever seen before,” says Kenneth Harper, senior director of mobile technical product management at Nuance Communications. “But what we have today is just the tip of the iceberg. This vision of ubiquitous speech will become a reality in the future. In the next year, we are going to see a lot of new interfaces come to market, where speech actually is the primary interface.”

    Again, this was very important for us, and QUIK has done very well?

    The Need for Speech

    The shift to voice-enabled interfaces has been accelerated by the emergence of the Internet of Things (IoT) and broad adoption of mobile and wearable devices. As the IoT takes shape, promising to provide ubiquitous connectivity to almost limitless information, consumers increasingly expect easy and convenient access to data. Unfortunately, traditional device interfaces often hinder, rather than facilitate, such access.


    Something Old and Something New

    To make this leap forward, developers needed a technology that could process the complexities of language and information retrieval in much the same way that the human brain does. This translates into nonlinear, parallel processing that learns from data instead of relying on large sets of rules and programming.

    For this, developers have turned to neural networks—a branch of machine learning that models high-level abstractions using large pools of data. Although neural networks (also known as deep learning) have been sidelined as a computing curiosity for several years, researchers have begun harnessing neural nets’ ability to improve speech-recognition systems.

    Neural nets use algorithms to process language via deeper and deeper layers of complexity, beginning by identifying phonemes (perceptually distinct units of sound), learning the meaning of key words, and progressing to the point where they understand the importance of context. Ultimately, the algorithms put words together to form sentences and paragraphs that conform to the rules of grammar.
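
    To make the bottom layer of that hierarchy concrete, here is a toy sketch that pushes one frame of acoustic features through a small feedforward net to get per-phoneme probabilities; the layer sizes and random weights are placeholders, and production acoustic models are far deeper (often recurrent or convolutional) and trained on the large data sets described next.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 40     # e.g. 40 mel filterbank energies per 10 ms frame (assumed)
N_HIDDEN = 128
N_PHONEMES = 45     # rough size of an English phoneme inventory

# Placeholder weights; a real acoustic model learns these from transcribed speech.
W1 = rng.standard_normal((N_FEATURES, N_HIDDEN)) * 0.1
W2 = rng.standard_normal((N_HIDDEN, N_PHONEMES)) * 0.1

def phoneme_posteriors(frame_features):
    """Map one frame of acoustic features to a probability over phonemes."""
    hidden = np.maximum(0.0, frame_features @ W1)        # ReLU hidden layer
    logits = hidden @ W2
    exp = np.exp(logits - logits.max())                  # softmax output layer
    return exp / exp.sum()

frame = rng.standard_normal(N_FEATURES)                  # stand-in for real features
probs = phoneme_posteriors(frame)
print(probs.shape, probs.sum())                          # (45,) and 1.0
```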

    What makes neural nets so relevant now? Increased use of speech recognition and information retrieval systems like Siri, Cortana, and Echo has created large pools of data that train neural nets. The appearance of this data coincides with the availability of affordable computer systems capable of handling very large data sets. These two resources have enabled the developers to build bigger, more sophisticated models to create more accurate algorithms.

    These new and improved models have increased the effectiveness of voice interfaces in two ways; they have improved speech-recognition systems’ ability to transcribe audio into words, and enabled a technology called natural language understanding, which interprets the meaning and intent of words.

    “.......

    Processors Built for Voice

    While these software developments have greatly enhanced speech-recognition systems, hardware advances also have played a key role. Researchers credit graphics processing units (GPUs) with providing the computing power required to handle the large training data sets, which is essential in developing speech recognition and natural language understanding models. These processors possess qualities that make them ideal for voice systems.

    To begin, GPUs do not burn as much power or take up as much space as CPUs, two critical considerations when it comes to mobile and wearable devices. It is their capacity for parallel computing, however, that makes GPUs so well suited for neural network and voice processing applications. These highly efficient systems provide the bandwidth and power required to convert large training data sets into the models. The graphics processors are not as powerful as CPUs, but developers can still divide larger calculations into small pieces and spread them across each GPU chip. As a result, GPUs routinely speed up common operations, such as large matrix computations, by factors from 5 to 50, outpacing CPUs.

    “As we have gotten more sophisticated GPUs, we have also gotten more sophisticated ways of interacting with products through voice,” says Todd Mozer, CEO of Sensory Inc.

    Cloud vs. Local…or Something in Between

    Speech-recognition systems come in three flavors: cloud-based implementations, locally residing systems, and hybrids. To determine the right design for an application, issues to consider are processing/memory requirements, connectivity, latency tolerance, and privacy.

    The size of a speech-recognition system’s vocabulary determines the RAM capacity requirements. The speech system functions faster if the entire vocabulary resides in the RAM. If the system has to search the hard drive for matches, it becomes sluggish. Processing speed also impacts how fast the system can search the RAM for word matches. The more sophisticated the system, the greater the processing and memory requirements, and the more likely it will be cloud-based. However, this may not be so in the future.
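
    A rough way to picture that RAM relationship, using entirely assumed per-word model sizes:

```python
# Rough RAM estimate for holding a pronunciation lexicon + word models in memory.
# Per-entry sizes are assumptions for illustration only.
BYTES_PER_LEXICON_ENTRY = 60        # spelling + phoneme string
BYTES_PER_WORD_MODEL = 2_000        # compact acoustic/word model

for vocab in (20, 1_000, 100_000):
    ram_mb = vocab * (BYTES_PER_LEXICON_ENTRY + BYTES_PER_WORD_MODEL) / 1e6
    print(f"{vocab:>7,} words -> roughly {ram_mb:8.2f} MB resident in RAM")
```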

    “All of the best intelligent system technology is cloud-based today,” says Expect Labs’ Tuttle. “In the future, that is not necessarily going to be the case. In three to five years, it’s certainly possible that a large percentage of the computation done in the cloud today could conceivably be done locally on your device on the fly. Essentially, you could have an intelligent system that could understand you, provide answers on a wide range of topics, and fit into your pocket, without any connectivity at all.”

    Despite the advantages of cloud-based systems, a number of factors make speech systems residing locally on a device desirable. First and foremost, they do not require connectivity to function. If there is any chance that connectivity will be interrupted, local voice resources are preferable. In addition, local systems often offer significantly better performance because there is no network latency. This means that responses are almost instantaneous. Also, if all data remains on the device, there are no privacy concerns.

    Some companies, however, adopt hybrid configurations in an attempt to cover all contingencies. By combining cloud-based and local resources, the designer gets the best of both worlds. Cloud-based resources provide high accuracy in complex applications, and local resources ensure fast responses in simpler tasks. Hybrid designs also mitigate the issue of unreliable connectivity.

    Predictions of what voice systems will look like in the future indicate that there will be a place for each of these approaches. “The cloud will continue to play a big role for many years,” says Harper. “We will continue to push more advanced capability to the device, but as we do that, we will start inventing new things that we can do only in the cloud. But the cloud will always be a little bit ahead of what you can do on the device.”

    ..................
    “We will look back on this period we are in now, and the next five years, as the golden age of AI,” says Tuttle. “The numbers of advances we are seeing are remarkable, and they look like they will continue for the foreseeable future.”


    This is hard work and just consider that they did some important things in the time it took to get the S3. This was a worry for me and what we have is going to keep us busy (learning more)
  5. jfieb




    QuickLogic and Sensory Partner to Provide Always-Listening, Deeply Embedded Voice Recognition at Ultra-Low Power


    SUNNYVALE, CA--(Marketwired - Jul 30, 2015) - QuickLogic Corporation (NASDAQ: QUIK)


    • Hardened system blocks specifically designed for voice processing applications provide extremely power efficient, always-listening voice capability

    What a lot of work to decide what part of their IP gets hardened, but in getting it right, what it allows...forget the how, but what it allows.

    • Less than 350 microAmps always-on voice trigger
    • Supports advanced voice processing, including voice recognition, without cloud connection requirement
    So this is incremental info....really, really nice.
    QuickLogic Corporation (NASDAQ: QUIK), the innovator of ultra-low power programmable sensor processing solutions, today announced that it is partnering with voice and vision technology industry leader Sensory Inc. to deliver TrulyHandsfree™ software, the world's most advanced voice recognition solution, deeply embedded in its new EOS™ S3 sensor processing platform. The hardened system blocks included in the EOS sensor processing SoC platform are designed to provide integrated voice trigger and command and control functionality at ultra-low power levels, enabling a vast array of voice-driven applications without the need for a connection to cloud services.

    Integrated logic allows digital input from Pulse Density Modulation (PDM) as well as Inter-IC Sound (I2S) microphones, and provides PDM to Pulse Code Modulation (PCM) conversion for processing with Sensory's TrulyHandsfree software. Also hard coded is Sensory's Low Power Sound Detector (LPSD) technology, which allows the speech recognizer to be suspended while an ultra-low power sound detector is running and listening for what could be speech.
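
    For anyone unfamiliar with PDM, the conversion described above amounts to low-pass filtering a 1-bit, very high-rate bitstream and downsampling it to ordinary PCM samples. The toy decimator below uses plain block averaging rather than the cascaded integrator-comb (CIC) and FIR filters real hardware typically uses, just to show the principle.

```python
import numpy as np

def pdm_to_pcm(pdm_bits, decimation=64):
    """Convert a 1-bit PDM stream (0/1 values) to PCM by average-and-downsample.

    Real hardware uses CIC and FIR decimation filters; simple block averaging
    is used here only to illustrate the principle.
    """
    bits = np.asarray(pdm_bits, dtype=float)
    usable = len(bits) - len(bits) % decimation
    blocks = bits[:usable].reshape(-1, decimation)
    return blocks.mean(axis=1) * 2.0 - 1.0       # map [0, 1] pulse density to [-1, 1] PCM

# A PDM stream that is 75% ones decodes to a PCM level near +0.5.
print(pdm_to_pcm([1, 1, 1, 0] * 64, decimation=64))
```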

    The integrated system supports a wide range of features including highly noise robust always-on, always-listening fixed triggers, enrolled fixed triggers, user defined triggers and passphrases, and up to 20 phrase spotted commands that can be accurately detected in silent to extremely noisy environments. Embedding functionality in hardware dramatically reduces power consumption, enabling always-on voice triggering at a draw of less than 350 microAmps.

    "QuickLogic's new EOS sensor platform is groundbreaking, and we are excited to have enhanced its capabilities by providing our TrulyHandsfree voice control technology complemented by our ultra-low power sound detector in the form of an embedded block," said Bernard Brafman, vice president of business development at Sensory.

    "Sensory is the industry leader in voice processing systems for mobile applications," said Dr. Frank A. Shemansky, Jr., senior director of product management at QuickLogic Corporation. "Integration of Sensory's TrulyHandsfree and LPSD technologies with the QuickLogic EOS sensor processing system provides unprecedented always-on voice capability, and will facilitate a new generation of voice-driven applications."


      Sensory's TrulyHandsfree firmware and hardware low power sound detector (LPSD) are included in QuickLogic's advanced EOS sensor processing SoC, which incorporates a revolutionary architecture that enables the industry's most advanced and compute intensive sensor processing capability at a fraction of the power consumption of competing technologies.

      Availability: Initial samples of the EOS platform with integrated voice processing will be available in September 2015. For more information, please visit www.quicklogic.com/EOS.

      About QuickLogic: QuickLogic Corporation is the leading provider of ultra-low power, customizable sensor processing platforms, Display, and Connectivity semiconductor solutions for smartphone, tablet, wearable, and mobile enterprise OEMs. Called Customer Specific Standard Products (CSSPs), these programmable 'silicon plus software' solutions enable our customers to bring hardware-differentiated products to market quickly and cost effectively. For more information about QuickLogic and CSSPs, visit www.quicklogic.com.
  6. jfieb




    want this on the same post as it's very, very important...

    QUIK/Sensory item of today...

    Supports advanced voice processing, including voice recognition, without cloud connection requirement

    Sensory blog.....

    Cloud vs. Local…or Something in Between

    Speech-recognition systems come in three flavors: cloud-based implementations, locally residing systems, and hybrids. To determine the right design for an application, issues to consider are processing/memory requirements, connectivity, latency tolerance, and privacy.

    The size of a speech-recognition system’s vocabulary determines the RAM capacity requirements. The speech system functions faster if the entire vocabulary resides in the RAM. If the system has to search the hard drive for matches, it becomes sluggish. Processing speed also impacts how fast the system can search the RAM for word matches. The more sophisticated the system, the greater the processing and memory requirements, and the more likely it will be cloud-based. However, this may not be so in the future.

    “All of the best intelligent system technology is cloud-based today,” says Expect Labs’ Tuttle. “In the future, that is not necessarily going to be the case. In three to five years, it’s certainly possible that a large percentage of the computation done in the cloud today could conceivably be done locally on your device on the fly. Essentially, you could have an intelligent system that could understand you, provide answers on a wide range of topics, and fit into your pocket, without any connectivity at all.”

    Despite the advantages of cloud-based systems, a number of factors make speech systems residing locally on a device desirable. First and foremost, they do not require connectivity to function. If there is any chance that connectivity will be interrupted, local voice resources are preferable. In addition, local systems often offer significantly better performance because there is no network latency. This means that responses are almost instantaneous. Also, if all data remains on the device, there are no privacy concerns.




    Really nice execution on this key part of the EOS. This is huge in my opinion. What else is there to like....

    these snips......

    In the future

    In three to five years

    QUIK will have cool stuff to work on and add for the S4, S5...this stuff will be on the roadmap and we don't want it to stop.


    Some companies, however, adopt hybrid configurations in an attempt to cover all contingencies. By combining cloud-based and local resources, the designer gets the best of both worlds. Cloud-based resources provide high accuracy in complex applications, and local resources ensure fast responses in simpler tasks. Hybrid designs also mitigate the issue of unreliable connectivity.

    Predictions of what voice systems will look like in the future indicate that there will be a place for each of these approaches. “The cloud will continue to play a big role for many years,” says Harper. “We will continue to push more advanced capability to the device, but as we do that, we will start inventing new things that we can do only in the cloud. But the cloud will always be a little bit ahead of what you can do on the device.”


  7. jfieb




    consider what K Morris has said...



    http://www.eejournal.com/archives/articles/20150405-customizability/

    By Kevin Morris…

    Instead of looking at what’s inside, we should be thinking about what jobs a chip is intended for. If we look at a device’s intended application, that gives us a much more realistic view of the “market” than if we look at the kinds of transistors and the type of architecture inside the chip that lets it accomplish its task. In fact, looking at the “how” can be a dangerous distraction from the “what” – which is where the real competition happens in semiconductors.

    So I will put up some interesting essays on the what of Audio... the key thing to grasp is that it is THE USER INTERFACE, some say BIG time, of the future.

  8. jfieb




    use this one as a mental model of audio...

    Google Glass Needs A Full Audio UI
    July 30, 2014 — markwarren
    Google Glass’ UI requires the touchpad today, yet using it becomes painful after only 1-2 minutes! Glass needs a complete voice command UI to avoid “Glass shoulder” syndrome.

    Google Glass’ touchpad is OK when a verbal command might be awkward, but an audio UI becomes imperative when your hands are occupied (e.g. covered in dough while cooking, busy carrying things, or just relaxing). The Glass UI relies too heavily on the touchpad and this is, literally, painful. Tom Chi concisely explained why in his talk at “Mind the Product 2012″ (http://vimeo.com/55741515, 7m18s):

    [Tom describes the first test subjects trying a prototype gesture UI]
    “…and about a minute and a half in I started seeing them do something weird, they were going like this [Tom rolls his shoulders and kneads them], and I was like “What’s wrong with you?” and they responded “Well my shoulder sort of hurts.” and we learned from this set of experiments that if your hands are above your heart then the blood drains back and you get exhausted from doing this in about a minute or two and you can’t go more than five. “

    That’s why using Glass for more than a minute or two just isn’t practical right now; the touchpad is above your heart, yet much of the UI requires it.

    Hopefully future revisions of Glass will make the entire UI available via audio. A simple test for completeness is covering the display and then using Glass with just your voice and ears (and possibly head movements).

    So it will be like this for many devices of tomorrow.

  9. jfieb




    I want these snips together


    Sensory blog

    Wireless connections are unreliable. This is a HUGE issue. If the recognizer only works when there’s a strong Internet connection, then the recognizer is going to fail A GREAT DEAL of the time. My prediction – over the next couple of years, the speech industry will come to realize that embedded speech recognition offers HUGE advantages over the common cloud based approaches used today – and these advantages exist in not just accuracy and response time, but privacy too!



    QUIK/Sensory item of today...

    Supports advanced voice processing, including voice recognition, without cloud connection requirement

    This will help get the TAM that is possible. :)


    and the roadmap will go on, so plan on the S4 doing a whole lot more along the lines expressed in the Sensory blog. ;)

  10. jfieb




    the related part of the cc



    Okay. As it relates to the presentation, Brian, that you gave, I wanted to ask you if you could give me a little more color on the QuickLogic Sensory TrulyHandsfree program and partnership and describe that in a little more detail if you don’t mind?

    Brian Faith - VP of Worldwide Sales and Marketing
    Sure. So one thing I'll mention is that at the silicon level I talked about the low power sound detector block; that’s a Sensory piece of IP that we have built into our device with their permission by licensing it. That’s just the silicon level.

    If you look at the entire solution, what does that capability give customers? So basically what it does is it allows people to do voice recognition for things like “OK Google, call Bob West” – that entire phrase matching of what I just said can be deeply embedded at very low power in our device without waking up the apps processor.

    The result is that people can have more voice recognition enabled applications in their products, from phones to wearables, knowing that behind the silicon is actually a very reputable voice recognition company in Sensory.

    So it’s an ecosystem partner primarily, customers will continue to license that from them, knowing that it can run in very low power optimized hardware from QuickLogic.

    Robert West - Oak Grove Associates
    Has that been licensed by Sensory and – or any other customer?

    Brian Faith - VP of Worldwide Sales and Marketing
    Yes. The voice recognition software is called TrulyHandsfree; I think we said that over 1 billion smartphones today have already shipped with it. So they definitely have licensed that already for phones and watches. I believe the Moto 360 watch uses it for the “OK Google, what's my step count, what's the temperature today” type functionality, and also some of the larger, very large smartphone OEMs have existing licenses.

    Robert West - Oak Grove Associates
    How is that implemented – is it implemented in a unique part or is it in the software of the SoC?

    Brian Faith - VP of Worldwide Sales and Marketing
    It’s all based in software, and I think that’s one of the beauties of the EOS platform: now we're taking some of these elements that have already been running in production environments and optimizing them for even lower power consumption, which frees up more MIPS of computational capacity to do other algorithms that these OEMs are wanting to do.

    Robert West - Oak Grove Associates
    Okay. Very good. That sounds like a really, unusually good program and potentially a high demand program.

    Brian Faith - VP of Worldwide Sales and Marketing
    I can tell you that’s one of the most exciting elements of the platform for me personally, and also in the interactions with the press that I’ve been briefing.


    This is a game changer; it's on every smartphone, it's the UI. It's better than I had hoped for.

  11. jfieb





    + Brian Faith snip

    one of the most exciting elements of the platform for me personally, and also in the interactions with the press that I've been briefing.

    +

    Sensory has a roadmap

    The next five years will see embedded recognition with high performance noise cancelling and beamforming coming to the forefront, and Sensory will be leading this charge… and just like how Sensory led the way with the “always on” low-power trigger