IT for IP: Japan Patent Office’s Advanced Machine Translation System Supports Companies in Promoting Intellectual Property Strategy
2020/01/08 Toshiba Clip Team
In our highly competitive globalized economy, a successful intellectual property (IP) strategy rests on making sure that other countries know which patents you have been granted, and on getting information on patent filings in those countries. The only way to do both is translation, but relying on traditional manual translation is prohibitive: too much work and too expensive. That’s why the Japan Patent Office (JPO) is placing high hopes on machine translation.
The JPO operates a public website that publishes patent information. In May 2019, it unveiled a new Japanese-English machine translation system, built around a machine translation engine developed by the National Institute of Information and Communications Technology (NICT) and natural language processing technology cultivated by Toshiba Digital Solutions Corporation over many years.
Together, they provide fast, accurate, natural translation that fuses AI and the knowledge of engineers in a breakthrough development.
Yukiko Miyake, Government Solutions Engineering Dept.2, ICT Solutions Division, Toshiba Digital Solutions Corporation
Eiichiro Sumita, NICT fellow, National Institute of Information and Communications Technology (NICT)
Koji Meguro, Deputy Director, Patent Information Policy Planning Office, General Coordination Division, Japan Patent Office
Toshiyuki Nishimoto, Government Sales Dept.4, ICT Solutions Division, Toshiba Digital Solutions Corporation
Proactive IP strategy is driving a need for translation of patent documents
Patent examination is a basic tool for promoting innovation and protecting advances in science and technology. The examiner looks at the claims made in a patent application, and examines them against prior art, all to make an appropriate decision on patent rights.
The JPO has set itself a singular mission, according to Koji Meguro: to provide an examination process that is the fastest and delivers the highest quality in the world. As Deputy Director of the Patent Information Policy Planning Office, he was closely involved in the procurement of the new machine translation system and in the progress of the project.
The JPO has two core tasks, explains Meguro: “We have to keep up with the explosive growth in technical patent documentation we have seen in recent years, and disseminate overseas the results of our own patent examinations, to help the smooth acquisition of foreign rights by Japanese companies.”
Both require translation: examination results must be made available in English, and patent documents from other countries must be translated into Japanese. “This is why we decided to improve J-PlatPat, our patent information platform, and to start building an advanced machine translation system,” explains Meguro. “And because we want the world’s fastest and highest quality examination process, we made translation quality and speed the most important system requirements.”
The project was put out to tender, and the winning bid came from Toshiba Digital Solutions Corporation, the Toshiba Group company that applies system integration, AI and IoT to the delivery of services and solutions. Two people involved in the project from proposal through to release, Toshiyuki Nishimoto, a sales representative, and Yukiko Miyake, who handled technical aspects of translation quality, can tell us more about it.
“Toshiba has been working on machine translation for a long time,” explains Nishimoto. “There are a number of approaches to this, including statistical machine translation (SMT) based on parallel translation data, and neural machine translation (NMT) that uses deep learning technology, but our main area of expertise is the rule-based machine translation (RBMT) engine.
“For the JPO, we brought together our work to date to make an RBMT-based proposal. However, as the tender date approached, we realized we weren’t getting the translation quality we needed. We were faced with the question of what to do next—including abandoning the bid. This is when I recalled a past collaboration on the transfer of translation engine technology with NICT, and I decided to contact Dr. Sumita, a Fellow at NICT.”
While its roots extend back to the 19th century, NICT is firmly positioned on technology’s cutting edge as Japan’s primary national research institute for information and communications technology. Its achievements include the world’s largest database of parallel translation data for patents, several hundred million sentences in all, which is the fruit of a deep relationship with the JPO and the base for an advanced NMT system. As Deputy Director of Research at NICT’s Center for Advanced Speech Translation, Dr. Sumita was interested in Nishimoto’s work, and agreed to provide consultation. That led the way to a novel technology fusion.
“We are very proactive at NICT,” says Sumita. “We want to bring translation engines and other programs developed by the institute to the wider world, to spread the technology far and wide. Of course, we don’t provide our know-how to just anybody—if there is no technology base, technology we have put a lot of effort into will not achieve widespread penetration.
*BLEU (Bilingual Evaluation Understudy) score: a metric that rates machine translation quality by how closely the output matches a reference (correct) translation. Scores range from 0% to 100%; the higher the score, the higher the quality.
“In this respect, Toshiba has a history of developing machine translation technology, and we have experience in transferring SMT. So when they were considering a technology transfer for patent-related machine translation, that is something we were more than happy to discuss.”
Moving forward, the Toshiba team listed seven candidate engines, spanning RBMT, SMT and NMT. With advice from NICT, the search was on for the best engine for patent documents.
“We had to evaluate the engines,” explains Nishimoto. “We did that by using each one to translate over a thousand documents, and then looking at the accuracy of the translations and the pluses and minuses. I compared them myself, with my own eyes, so I was confident when it came to proposing the optimal method.”
Miyake was also confident. “As the development work advanced toward a proposal, I came to realize how the machine translation should be structured to respond to user needs at a very granular level. That required a complex engine structure. For example, while RBMT is suitable for the first part of a document, NMT shows its strengths in the middle, and RBMT is again suitable in the last part. You really cannot guarantee the quality if you do everything with a single engine.”
The result of this tedious and time-consuming testing, translating and scrutinizing thousands of documents, is a hybrid design with NMT at the core, combined with RBMT and SMT engines as appropriate. Natural language processing, long studied and refined by Toshiba, is used for pre- and post-translation processing. Combined with NICT’s advanced engine, this design takes full advantage of Toshiba’s strengths in machine translation.
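As a rough illustration of the position-dependent routing Miyake describes, the sketch below sends the first and last parts of a document to one engine and everything in between to another. The engine functions, their names and the simple first/middle/last rule are placeholders invented for this example; the article does not describe the actual system's routing rules.

```python
from typing import Callable, Dict, List

Engine = Callable[[str], str]


def rbmt(text: str) -> str:
    """Placeholder for a rule-based engine (hypothetical stand-in)."""
    return f"[RBMT] {text}"


def nmt(text: str) -> str:
    """Placeholder for a neural engine (hypothetical stand-in)."""
    return f"[NMT] {text}"


ENGINES: Dict[str, Engine] = {"rbmt": rbmt, "nmt": nmt}


def pick_engine(index: int, total: int) -> str:
    """Assumed rule: RBMT for the first and last part, NMT for the middle."""
    if index == 0 or index == total - 1:
        return "rbmt"
    return "nmt"


def translate_document(parts: List[str]) -> List[str]:
    """Translate each part with the engine chosen for its position."""
    total = len(parts)
    return [ENGINES[pick_engine(i, total)](part) for i, part in enumerate(parts)]
```

Keeping the routing decision in one small function means the rule can be refined, for example per document section or per sentence type, without touching the engines themselves.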
Hybrid machine translation advances and deepens patent translation
In April 2018, the bids were opened and evaluated, and Toshiba Digital Solutions Corporation emerged as the winner. Meguro explains the JPO’s decision: “The result followed a comprehensive evaluation process that covered translation quality and speed, our key requirements, plus pricing and other factors. Toshiba Digital Solutions Corporation proposed the latest NMT, in a system configuration that fully supports it.
“Another important factor is that the system had to be compatible with our patent information platform. That means that instead of translating individual sentences one by one, documents can be thrown into the system several pages at a time, cut up and processed in parallel.” The Toshiba proposal met all these requirements. But this was not the end of the process; it simply set the scene for the next round of development.
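The batch-and-parallelize idea Meguro mentions can be sketched roughly as follows. This is a minimal illustration, not the JPO system's design: the function names, the chunk size and the use of a thread pool are all assumptions, and `translate_page` stands in for whatever engine call is actually made.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_into_chunks(pages: List[str], pages_per_chunk: int) -> List[List[str]]:
    """Cut a document into consecutive groups of pages."""
    return [pages[i:i + pages_per_chunk] for i in range(0, len(pages), pages_per_chunk)]


def translate_pages_in_parallel(
    pages: List[str],
    translate_page: Callable[[str], str],
    pages_per_chunk: int = 2,
    max_workers: int = 4,
) -> List[str]:
    """Translate chunks concurrently and return pages in their original order."""
    chunks = split_into_chunks(pages, pages_per_chunk)

    def translate_chunk(chunk: List[str]) -> List[str]:
        return [translate_page(page) for page in chunk]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the pages can be
        # reassembled without any extra bookkeeping.
        results = list(pool.map(translate_chunk, chunks))
    return [page for chunk in results for page in chunk]
```

For example, `translate_pages_in_parallel(pages, my_engine)` would translate a list of page strings with a hypothetical `my_engine` function while keeping the page order intact.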
With the system scheduled to come on line 13 months later, in May 2019, work to refine the engine began in earnest. There were two goals: achieve the best possible translation quality, and achieve high-speed translation. Toshiba formed a team to handle each one.
Nishimoto takes up the story. “Generally, machine translation is not good with long sentences—the longer the sentence, the longer the processing time. However, quality can be improved by pre-processing to break up long sentences. So in the first step we use RBMT to analyze the Japanese sentence structure and mark points in long sentences where the meaning changes. The marked up document then goes into NICT’s NMT engine. This seamless collaboration ensures translation speed and quality.”
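To make the pre-processing step Nishimoto describes more concrete, here is a minimal sketch of marking split points in long Japanese sentences before they reach a translation engine. It is a deliberately crude stand-in: a real rule-based analyzer parses sentence structure, whereas this example only looks for a few conjunctive endings followed by a comma. The pattern list, the marker string and the length threshold are all assumptions made for illustration.

```python
import re
from typing import List

# Hypothetical clause-boundary cues: conjunctive endings followed by a
# Japanese comma. A real analyzer would rely on syntactic analysis instead.
BOUNDARY = re.compile(r"であり、|が、|し、")
SPLIT_MARK = "<split/>"  # assumed marker; any string absent from the text works


def mark_split_points(sentence: str, max_len: int = 20) -> str:
    """Insert a marker after each boundary cue, but only in long sentences."""
    if len(sentence) <= max_len:
        return sentence
    return BOUNDARY.sub(lambda m: m.group(0) + SPLIT_MARK, sentence)


def split_marked(sentence: str) -> List[str]:
    """Cut a marked sentence into the segments the engine would receive."""
    return [segment for segment in sentence.split(SPLIT_MARK) if segment]
```

Short sentences pass through untouched, so only the sentences that are likely to slow down or confuse the engine are broken up.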
As the example below shows, sentences in patent documents are often split across multiple lines, and unnecessary data, like page numbers, can be positioned between them.
Meguro underlines the importance of this process. “AI has not yet reached a point where it knows which sentences to combine and which to break up. Improving translation quality requires ‘layout analysis’ that can subtly improve the user experience, and what we expected from Toshiba was language processing able to dig deeply into individual sentences.”
“Layout analysis was definitely a focus of the project,” agrees Miyake. “For instance, when we were constructing the system we found ‘Sunrise’ as a translation result. That was confusing, as it had nothing to do with patents, so we investigated, and found an unexpected cause.”
She points to the text reproduced below.
“Patent documents often have this format,” she explains. “Here, the last character of line 1 and first character of line 2 combine as ‘日出.’ That’s ‘sunrise’ in Japanese, and that’s how it was translated. But it’s wrong. Eliminating mistranslations like this required literally hundreds of separate processing steps, and the mistakes could only be found with visual inspection. In a single analysis session, I would look at 200 translated documents. It was exhausting, but I continued to plug away.”
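The kind of line-boundary error Miyake describes can be illustrated with a small sketch. The sample lines below are invented for this example (the text she pointed to is not reproduced here), and the single heuristic used, joining a line to the next one only if it appears to have been cut off by the page width, is an assumption standing in for the hundreds of rules the team actually built.

```python
import re
from typing import List

# Lines that contain only a page number, e.g. "3" or "- 3 -".
PAGE_NUMBER = re.compile(r"^\s*-?\s*\d+\s*-?\s*$")

# Hypothetical extracted lines: two short field labels, then one sentence
# wrapped across a page-number line.
SAMPLE_LINES = ["出願日", "出願人", "本発明は翻訳装置に関し", "- 2 -", "特に特許文献を扱う。"]


def rebuild_units(lines: List[str], wrap_width: int) -> List[str]:
    """Rebuild translation units from raw lines.

    A line is merged into the previous unit only when the previous line
    reached the wrap width without ending a sentence. Short lines such as
    labels therefore stay separate, and page-number lines are dropped.
    """
    units: List[str] = []
    continues = False  # True while the current unit was cut by the wrap width
    for raw in lines:
        line = raw.strip()
        if not line or PAGE_NUMBER.match(line):
            continue  # layout debris; it must not break a wrapped sentence
        if continues and units:
            units[-1] += line
        else:
            units.append(line)
        continues = len(line) >= wrap_width and not line.endswith("。")
    return units
```

Naively concatenating the first two sample lines gives "出願日出願人", which contains the spurious "日出". With `rebuild_units(SAMPLE_LINES, wrap_width=10)`, the two labels remain separate units and the wrapped sentence is stitched back together across the page number.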
Chemical formulas and DNA sequences can appear in science or technology patents, and such runs of letters and figures are one cause of mistranslation. NMT is notably weak at accurately translating strings of symbols: they can suddenly disappear or be mistranslated, or meaningless strings can appear from nowhere. Miyake and her colleagues handled this with a ‘do not translate’ approach, which identifies symbol strings in a sentence and translates only the other parts of the text. They also incorporated a process for recombining the translated text with the symbol strings for output, and succeeded in suppressing this characteristic NMT failure.
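A ‘do not translate’ step of this kind is often implemented by masking: protected strings are swapped for placeholders before translation and restored afterwards. The sketch below shows the general idea only. The regular expression, the placeholder format and the function names are assumptions for illustration, and it presumes the engine passes placeholders through unchanged, which a real system would need to verify.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical pattern: DNA-like runs (8 or more of A/C/G/T) or formula-like
# tokens such as "C6H12O6" that contain at least one digit. ASCII lookarounds
# are used instead of \b because Japanese characters also count as word
# characters in Python's regex engine.
SYMBOL_STRING = re.compile(
    r"(?<![A-Za-z0-9])"
    r"(?:[ACGT]{8,}|(?=[A-Za-z0-9]*\d)(?:[A-Z][a-z]?\d*){2,})"
    r"(?![A-Za-z0-9])"
)


def mask_symbols(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each symbol string with a numbered placeholder."""
    saved: Dict[str, str] = {}

    def stash(match: "re.Match[str]") -> str:
        key = f"__SYM{len(saved)}__"
        saved[key] = match.group(0)
        return key

    return SYMBOL_STRING.sub(stash, text), saved


def unmask_symbols(text: str, saved: Dict[str, str]) -> str:
    """Put the original symbol strings back in place of the placeholders."""
    for key, value in saved.items():
        text = text.replace(key, value)
    return text


def translate_protected(text: str, translate: Callable[[str], str]) -> str:
    """Translate everything except the masked symbol strings."""
    masked, saved = mask_symbols(text)
    return unmask_symbols(translate(masked), saved)
```

Because the engine never sees the formula or sequence itself, it has no opportunity to drop, alter or invent symbols; the trade-off is that the placeholder must survive translation intact.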
There were also hurdles to overcome to get the required translation speed. “NMT and SMT produce high quality machine translations,” says Nishimoto, “but they are computationally complex and require more processing time than RBMT. Simply translating a long document like a patent can take as long as 30 minutes, but our system delivers real-time translation to web users, and people no longer have to wait in front of a computer for half an hour. It took a lot of trial and error on the part of the team to get the speed improvements, but we got to an acceptable level.”
Unending improvement to machine translation accuracy – on to the next stage
Happily, the May 2019 release date was met, and J-PlatPat now has a fully functional system for translating Japanese patent publications into English. It is used by patent examiners overseas who want to see the results of patent examinations by the JPO, and by Japanese applicants who need translated documentation to back up patent applications made overseas.
Before the system was made available to the public, the JPO tested it thoroughly, recalls Meguro. “We really put it through its paces, with tests like translating a huge volume of complex sentences, and when we released it, it was bug-free. Shortly after that, we got a call from an enthusiastic user, who told us he couldn’t believe the quality of the translation. It’s rare for that to happen, and it’s testimony to the dramatic improvement in translation accuracy.
“Of course, the project hasn’t ended. It will continue, with the aim of further improving translation quality, and configuring the system to handle multiple languages. But I am quite comfortable letting them get on with the next phase.”
“Mr. Meguro told me about the end-user reaction, something we don’t get to hear very often, and I was really happy,” recalls Miyake. “In fact, we got a lot of feedback from Mr. Meguro and his colleagues at the JPO, and I really feel that JPO, NICT and Toshiba Digital Solutions Corporation worked well together as a team dedicated to improving translation quality and speed.”
With the system now up and running, the next chapter in the story is about adding more languages. “Development is now centered on Chinese-Japanese and Korean-Japanese, with a target release date of April 2020,” says Nishimoto. “We continue to get feedback from Mr. Sumita and NICT on the engine, and we are also improving our own applications. Machine translation is not yet a finalized service model, and we still can’t see the definitive version. What I want to do is to use the know-how from this project to develop solutions for the translation needs of government agencies, research organizations and corporations.”
Sumita shares this view. “Toshiba’s machine translation systems can operate in secure environments, and we can expect to see them deployed in ministries and agencies that require strict security. There’s also potential in areas that offer a wealth of parallel translation data such as pharmaceuticals, finance and automobiles. At NICT, we want to continue collaboration on technology transfers of high-precision engines.”
The JPO’s efforts to offer the world’s fastest and highest quality examination service are supported by NICT’s advanced technology and Toshiba’s natural language processing and know-how. Continuing development through this partnership offers us a glimpse into the future of machine translation.