AWS execs speak on the top priorities for on-prem to cloud migration – SiliconANGLE News

While organizations are increasingly embracing the value of cloud and hybrid cloud infrastructures for data management, storage and analysis, a few remain reluctant to make the big, bold move of migrating their data over to the cloud.

"We still see many customers that are evaluating how to do their cloud migration strategies, and they're looking for, you know, understanding what services can help them with those migrations," said Mat Mathews (pictured, left), general manager of Transfer Service at Amazon Web Services Inc.

Mathews; Siddhartha Roy (pictured, second from right), general manager of Snow Family at AWS; and Randy Boutin (pictured, right), GM of DataSync at AWS, spoke with Dave Vellante, host of theCUBE, SiliconANGLE Media's livestreaming studio, during the AWS Storage Day event. They discussed the current state of enterprise cloud migration from an inside perspective. (* Disclosure below.)

Several data points clearly signal a shift in favor of cloud storage over conventional on-prem solutions. However, moving petabytes of data at a time can often seem daunting (or even expensive) for some organizations. Where do they start? Which cloud provider is best suited for their needs? These are the sorts of questions that often make the rounds, according to the executive panel.

"I'd recommend customers look at their cool and cold data. If they look at their backups and archives and they have not been used for long, it doesn't make sense to keep them on-prem. Look at how you can move those and migrate those first and then slowly work your way up into, like, warm data and then hot data," Roy stated.

Through its compelling cost savings to customers, long-standing durability record and unwavering flexibility, AWS has proven itself time and again as the de facto industry option in cloud storage services, according to the panel.

How do AWS customers figure out which services to use? It comes down to a combination of things, according to Boutin.

"First is the amount of available bandwidth that you have, the amount of data that you're looking to move, and the timeframe you have in which to do that," he said. "So if you have a high speed, say, gigabit network, you can move data very quickly using DataSync. If you have a slower network or perhaps you don't want to utilize your existing network for this purpose, then the Snow Family of products makes a lot of sense."

Watch the complete video interview below, and be sure to check out more of SiliconANGLE's and theCUBE's coverage of the AWS Storage Day event. (* Disclosure: TheCUBE is a paid media partner for the AWS Storage Day. Neither Amazon Web Services Inc., the sponsor of theCUBE's event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Read the original:
AWS execs speak on the top priorities for on-prem to cloud migration - SiliconANGLE News

Cloud is the ‘new normal’ as businesses boost resiliency and agility beyond COVID – SiliconANGLE News

The COVID-19 pandemic and the business challenges it caused were a catalyst for many companies to accelerate their migration to the cloud, and that trend is not likely to change anytime soon.

Amazon Web Services Inc. is betting on companies' growing interest in building resilience and agility in the cloud beyond pandemic times, according to Mai-Lan Tomsen Bukovec (pictured), vice president of AWS Storage.

"We're going to continue to see that rapid migration to the cloud, because companies now know that in the course of days and months the whole world of your expectations of where your business is going and where, what your customers are going to do, that can change," she said. "And that can change not just for a year, but maybe longer than that. That's the new normal."

Bukovec spoke with Dave Vellante, host of theCUBE, SiliconANGLE Media's livestreaming studio, during the AWS Storage Day event. They discussed how cloud is the new reality for enterprises, how AWS storage fits into the data fabric debate, and what AWS thinks about its storage strategy and about business going hybrid. (* Disclosure below.)

While the cloud is seen as the new normal for businesses, the paths enterprises use to get there remain diverse. AWS customers typically fall into one of three patterns, the fastest being where they choose to move their core business mission to the cloud because they can no longer scale on-premises, according to Bukovec.

"It's not technology that stops people from moving to the cloud as quick as they want to; it's culture, it's people, it's processes, it's how businesses work," she explained. "And when you move the crown jewels into the cloud, you are accelerating that cultural change."

Other companies follow what Bukovec sees as the slower path, which is to take a few applications across the organization and move them to the cloud as a reference implementation. In this model of cloud pilots, the goal is to try to get the people who have done this to generalize the learning across the company.

"It's actually counterproductive to a lot of companies that want to move quickly to the cloud," Bukovec said.

The third pattern is what AWS calls new applications or cloud-first, when a company decides that all new technology initiatives will be in the cloud. That allows the business to be able to see cloud ideas and technology in different parts of its structure, generating a decentralized learning process with a faster culture change than in the previous pattern.

While cloud storage is centralized, it fully fits into the emerging trend known as data mesh, according to Bukovec. As first defined by Zhamak Dehghani, a ThoughtWorks consultant, a data mesh is a type of decentralized data architecture that embraces the ubiquity of data in the enterprise by leveraging a domain-oriented, self-serve design.

"Data mesh presupposes separating the data storage and the characteristics of data from the data services that interact and operate on that storage," Bukovec explained. The idea is to ensure that the decentralized business model can work with this data and innovate faster.

"Our AWS customers are putting their storage in a centralized place because it's easier to track, it's easier to view compliance, and it's easier to predict growth and control costs, but we started with building blocks and we deliberately built our storage services separate from our data services," Bukovec said. "We have a number of these data services that our customers are using to build that customized data mesh on top of that centralized storage."

Here's the complete video interview, part of SiliconANGLE's and theCUBE's coverage of the AWS Storage Day event. (* Disclosure: TheCUBE is a paid media partner for the AWS Storage Day. Neither AWS, the sponsor of theCUBE's event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Read the original post:
Cloud is the 'new normal' as businesses boost resiliency and agility beyond COVID - SiliconANGLE News

Microsoft teases new super simple OneDrive interface – TechRadar

Finding the right files in Microsoft's cloud storage service will soon be even easier as the software giant is currently working on a new interface for OneDrive.

According to a new post on the Microsoft 365 Roadmap, OneDrive will be getting a new command bar later this month.

With this update, OneDrive users will easily be able to identify the right file and access primary commands. Additionally, the simplified view in OneDrive's new interface will help boost productivity, as it allows users to focus on the content they're working on without being distracted by additional menus.

In two separate posts, Microsoft also revealed that OneDrive will be getting a new sharing experience in November of this year.

The company is updating OneDrive's Share menu to provide easy access to additional sharing options such as email, copy link and Teams chat, as well as manage access settings.

However, the Copy link button is set to be replaced by a footer where users will be able to set permissions before copying links and sharing them with recipients.

After releasing the 64-bit version of OneDrive earlier this year, Microsoft has continually updated its cloud storage service and it will be interesting to see how these visual and sharing updates pan out.

See original here:
Microsoft teases new super simple OneDrive interface - TechRadar

Pure Storage tantalises with reinvention possibilities Blocks and Files – Blocks and Files

Pure Storage has flagged a major announcement on September 28th. A financial analysts' briefing is scheduled to follow the announcement, suggesting news that will affect investors' views of Pure's future revenues, costs and underlying profitability measures. The company is saying the announcement is about AIOps, the future of storage, storage and DevOps product innovations, and its as-a-Service offerings. What could it announce that could cause analysts to take stock and form a different view of the company?

We ignored the AIOps aspect, as that would be a fairly incremental move, and came up with a list of potential developments:

Hardware array refreshes would be good. Using the latest Xeon processors, for example, supporting PCIe gen-4, that sort of thing, but they would hardly move the needle for financial analysts. Possibly committing to support DPUs from Pensando or Fungible might do that. Still, not exactly that much impact on a financial analyst's twitch-ometer.

Porting FlashBlade software to one or more public clouds would seem both logical and good sense. It would be additive to the FlashBlade market and we think analysts would concur, nod approvingly and move on. Ditto porting Cloud Block Store to the Google Cloud Platform. Expansion into an adjacent market? Tick. Stronger competition for NetApp's data fabric idea? Tick. What's not to like? Nothing. Move on.

Adding file and object support to Cloud Block Store? Trivially, there would be a naming problem: do we call it Cloud Block File Object Store? It would seem a logical extension of Pure's public cloud capabilities and an improvement in the cross-cloud consistency of Pure's hybrid cloud story. We can't imagine analysts would see a downside here.

It could be achieved with another strategy: make the Purity OS software cloud-native and have it run in the public clouds. That would be a big deal, with a common code tree and four deployment targets: on-premises arrays, AWS, Azure and GCP. It would be a very large software effort and give Pure a great hybrid cloud story with lots of scope for software revenue growth. Cue making sure the analysts understand this. An AIOps extension could be added in to strengthen the story as well.

How about doing a Silk, Qumulo or VAST Data, and walking away from hardware manufacturing, using a close relationship with a contract manufacturer/distributor and certified configurations instead? This would be a major business change, and both analysts and customers would want reassuring that Pure would not lose its hardware design mojo.

A lesser hardware change would be to use commodity SSDs instead of Pure designing its own flash storage drives and controllers. Our instant reaction is a thumbs down, as Pure has consistently said its hardware is superior to that of vendors using COTS SSDs, such as Dell, HPE and NetApp, because it optimises flash efficiency, performance and endurance better than it could if it was limited by SSD constraints.

Such a change would still get analysts in a tizzy. But we don't think it likely, even if Pure could pitch a good cost-saving and no-performance-impact story.

How about a strategic deal with a public cloud vendor similar to the AWS-NetApp FSx for ONTAP deal? That would indeed be a coup: having, say, Pure's block storage available alongside the cloud vendor's native block storage. We don't think it likely, though it has to be on the possibles list.

Expanding the Pure-as-a-Service strategy to include all of Pure's products would be an incremental move and so no big deal to people who had taken the basic idea on board already. Analysts would need a talking-to perhaps, to be persuaded that this was worth doing in Annual Recurring Revenue growth terms. This could be thought of as Pure doing a me-too with HPE's GreenLake and Dell's APEX strategies.

How about Pure acting as a cloud access broker and front-end concierge supplier, rather like NetApp with its Spot-based products? That would be big news and require new software and a concerted marketing and sales effort. AIOps could play a role here too. Our view, based on gut feelings alone, is that this is an unlikely move, although it would be good to see NetApp getting competition.

We are left thinking that the likeliest announcements will be about making more of Pure's software available in the public clouds, plus an extension of Pure's as-a-Service offerings and a by-the-way set of hardware refreshes. We'll see how well our predictions match up with reality on September 28, and mentally prepare for a kicking just in case we are way off base.

Go here to read the rest:
Pure Storage tantalises with reinvention possibilities Blocks and Files - Blocks and Files

Italy says bids for national cloud hub expected this month – iTnews

Italy expects to receive bids by the end of September from companies interested in building a national cloud hub, a 900-million-euro (A$1.4 billion) project to upgrade the country's data storage facilities, a government minister said.

Part of EU-funded projects to help Italy's economy recover from the pandemic, the cloud hub initiative reflects European efforts to make the 27-member bloc less dependent on large overseas tech companies for cloud services.

"I'm confident we will receive some expressions of interest by the end of the month," Innovation Minister Vittorio Colao, a former head of telecom giant Vodafone, told reporters during an annual business conference in Cernobbio on Lake Como.

"Technological independence of Europe is important because it allows the bloc to negotiate (with foreign partners) on an equal footing," Colao said, adding he had discussed the issue with French Finance Minister Bruno Le Maire at the conference.

In the Recovery Plan sent to Brussels in April to access EU funds, Rome earmarked 900 million euros for the cloud hub project, according to sources and documents seen by Reuters.

Sources told Reuters in June that Italian state lender Cassa Depositi e Prestiti was considering an alliance with Telecom Italia and defence group Leonardo in the race to create the cloud hub.

US tech giants such as Google, Microsoft and Amazon, which dominate the data storage industry, could provide their cloud technology to the cloud hub, if licensed to companies taking part in the hub project, officials have said.

Such a structure would be aimed at soothing concerns over the risk of US surveillance in the wake of the adoption of the US CLOUD Act of 2018, which can require US-based tech firms to provide data to Washington even if it is stored abroad.

See the rest here:
Italy says bids for national cloud hub expected this month - iTnews

Broadcom server-storage connectivity sales down but recovery coming Blocks and Files – Blocks and Files

Although Broadcom saw an overall rise in revenues and profit in its latest quarter, sales in the server-to-storage connectivity area were down. It expects a recovery and has cash for an acquisition.

Revenues in Broadcom's third fiscal 2021 quarter, ended August 1, were $6.78 billion, up 16 per cent on the year. There was a $1.88 billion profit, more than doubling last year's $688 million.

We're interested because Broadcom makes server-storage connectivity products such as Brocade host bus adapters (HBAs) and SAS and NVMe connectivity products.

President and CEO Hock Tan's announcement statement said: "Broadcom delivered record revenues in the third quarter reflecting our product and technology leadership across multiple secular growth markets in cloud, 5G infrastructure, broadband, and wireless. We are projecting the momentum to continue in the fourth quarter."

There are two segments to its business: Semiconductor Solutions, which brought in $5.02 billion, up 19 per cent on the year; and Infrastructure Software, which reported $1.76 billion, an increase of ten per cent.

Tan said in the earnings call: "Demand continued to be strong from hyper-cloud and service provider customers. Wireless continued to have a strong year-on-year compare. And while enterprise has been on a trajectory of recovery, we believe Q3 is still early in that cycle, and that enterprise was down year on year."

Inside Semiconductor Solutions, the server storage connectivity area had revenues of $673 million, which was nine per cent down on the year-ago quarter. Tan noted: "Within this, Brocade grew 27 per cent year on year, driven by the launch of new Gen 7 Fibre Channel SAN products."

Overall, Tan said: "Our [Infrastructure Solutions] products here supply mission-critical applications largely to enterprise, which, as I said earlier, was in a state of recovery. That being said, we have seen a very strong booking trajectory from traditional enterprise customers within this segment. We expect such enterprise recovery in server storage."

This will come from aggressive migration in cloud to 18TB disk drives and a transition to next-generation SAS and NVMe products. Tan expects Q4 server storage connectivity revenue to be up low double-digit percentage year on year. Think two to five per cent.

The enterprise segment will grow more, with Tan saying: "Because of strong bookings that we have been seeing now for the last three months, at least from enterprise, which is going through largely on the large OEMs, who particularly integrate the products and sell it to end users, we are going to likely expect enterprise to grow double digits year on year in Q4."

That enterprise business growth should continue throughout 2022, Tan believes: "In fact, I would say that the engine for growth for our semiconductor business in 2022 will likely be enterprise spending, whether it's coming from networking, one sector for us, and/or from server storage, which is largely enterprise, we see both this showing strong growth as we go into 2022."

Broadcom is accumulating cash and could make an acquisition or indulge in more share buybacks. Tan said: "By the end of October, our fiscal year, we'll probably see the cash net of dividends and our cash pool to be up to close to $13 billion, which is something like $6 billion, $7 billion, $8 billion above what we would, otherwise, like to carry on our books."

Let us pronounce that HBAs are NICs (Network Interface Cards) and that an era of SmartNICs is starting. It might be that Broadcom could have an acquisitive interest in the SmartNIC area.

Broadcom is already participating in the DPU (Data Processing Unit) market, developing and shipping specialised silicon engines to drive specialised workloads for hyperscalers. Answering an analyst question, Tan said: "We have the scale. We have a lot of the IP calls and the capability to do all those chips for those multiple hyperscalers who can afford and are willing to push the envelope on specialised, I used to call it offload, computing engines, be they video transcoding, machine learning, even what people call DPUs, smart NICs, otherwise called, and various other specialised engines and security hardware that we put in place in multiple cloud guys."

Better add Broadcom to the list of DPU vendors such as Fungible, Intel and Pensando, and watch out for any SmartNIC acquisition interest.

The rest is here:
Broadcom server-storage connectivity sales down but recovery coming Blocks and Files - Blocks and Files

"Rockset is on a mission to deliver fast and flexible real-time analytics" – JAXenter

JAXenter: Thank you for taking the time to speak with us! Can you tell us more about Rockset and how it works? How does it help us achieve real-time analytics?

Venkat Venkataramani: Rockset is a real-time analytics database that serves low latency applications. Think real-time logistics tracking, personalized experiences, anomaly detection and more.

Rockset employs the same indexing approach used by the systems behind the Facebook News Feed and Google Search, which were built to make data retrieval for millions of users, and on TBs of data, instantaneous. It goes a step further by building a Converged Index (a search index, a columnar store and a row index) on all data. This means sub-second search, aggregations and joins without any performance engineering.

You can point Rockset at any data (structured, semi-structured and time series data) and it will index the data in real time and enable fast SQL analytics. This frees teams from time-consuming and inflexible data preparation. Teams can now onboard new datasets and run new experiments without being constrained by data operations. And, Rockset is fully managed and cloud-native, making a massively distributed real-time data platform accessible to all.
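To make the Converged Index idea concrete, here is a toy Python sketch of keeping the same documents in a row store, a column store and an inverted search index at the same time. It illustrates the concept only and is not Rockset's actual data structures:

from collections import defaultdict

# Toy "converged index": the same documents live in a row store (point lookups),
# a column store (aggregations) and an inverted search index (selective filters).
row_store = {}                      # doc_id -> full document
column_store = defaultdict(dict)    # field  -> {doc_id: value}
search_index = defaultdict(set)     # term   -> set of doc_ids

def index_document(doc_id, doc):
    row_store[doc_id] = doc
    for field, value in doc.items():
        column_store[field][doc_id] = value
        for term in str(value).lower().split():
            search_index[term].add(doc_id)

index_document(1, {"vendor": "Acme Coffee", "amount": 4.5})
index_document(2, {"vendor": "Acme Cloud", "amount": 120.0})

# Roughly "SELECT SUM(amount) WHERE vendor matches 'acme'":
matching = search_index["acme"]
print(sum(column_store["amount"][d] for d in matching))  # 124.5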

SEE ALSO: Shifting toward more meaningful insights means shifting toward proactive analytics

JAXenter: What data sources does it currently support?

Venkat Venkataramani: Rockset has built-in data connectors to data streams, OLTP databases and data lakes. These connectors are all fully-managed and stay in sync with the latest data. That means you can run millisecond-latency SQL queries within 2 seconds of data being generated. Rockset has built-in connectors to Amazon DynamoDB, MongoDB, Apache Kafka, Amazon Kinesis, PostgreSQL, MySQL, Amazon S3 and Google Cloud Storage. Rockset also has a Write API to ingest and index data from other sources.

JAXenter: What's new at Rockset and how will it continue to improve analytics for streaming data?

Venkat Venkataramani: We recently announced a series of product releases to make real-time analytics on streaming data affordable and accessible. With this launch, teams can use SQL to transform and pre-aggregate data in real-time from Apache Kafka, Amazon Kinesis and more.

This makes real-time analytics up to 100X more cost-effective on streaming data. And, we free engineering teams from needing to construct and manage complex data pipelines to onboard new streaming data and experiment on queries. Here's what we've released:

You can delve further into this release by watching a live Q&A with Tudor Bosman, Rockset's Chief Architect. He delves into how we support complex aggregations on rolled-up data and ensure accuracy even in the face of dupes and latecomers.
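As a conceptual illustration of what a streaming rollup does (pre-aggregating events as they arrive and dropping duplicate deliveries so latecomers do not skew results), here is a small Python sketch; the event shape and bucketing are assumptions for the example, not Rockset's implementation:

from collections import defaultdict
from datetime import datetime, timezone

seen_ids = set()                                  # dedupe re-delivered events
rollup = defaultdict(lambda: {"views": 0, "total_ms": 0})

def ingest(event):
    # Pre-aggregate per (minute, page) as the event arrives, instead of
    # storing every raw event and aggregating at query time.
    if event["id"] in seen_ids:
        return
    seen_ids.add(event["id"])
    minute = datetime.fromtimestamp(event["ts"], tz=timezone.utc).strftime("%Y-%m-%d %H:%M")
    bucket = rollup[(minute, event["page"])]
    bucket["views"] += 1
    bucket["total_ms"] += event["duration_ms"]

events = [
    {"id": "a1", "ts": 1631200000, "page": "/pricing", "duration_ms": 1200},
    {"id": "a2", "ts": 1631200010, "page": "/pricing", "duration_ms": 800},
    {"id": "a1", "ts": 1631200000, "page": "/pricing", "duration_ms": 1200},  # duplicate delivery
]
for e in events:
    ingest(e)

print(dict(rollup))  # {('2021-09-09 15:06', '/pricing'): {'views': 2, 'total_ms': 2000}}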

JAXenter: What are some common use cases for real-time data analytics? When is it useful to implement?

Venkat Venkataramani: You experience real-time analytics every day whether you realize it or not. The content displayed in Instagram newsfeeds, the personalized recommendations on Amazon and the promotional offers from Uber Eats are all examples of real-time analytics. Real-time analytics encourages users to take desired actions from reading more content, to adding items to our cart, to using takeout and delivery services for more of our meals.

We think real-time analytics isn't just useful to the big tech giants. It's useful across all technology companies to drive faster time to insight and build engaging experiences. We're seeing SaaS companies in the logistics space provide real-time visibility into the end-to-end supply chain, route shipments and predict ETAs. This ensures that materials arrive on time and within schedule, even in the face of an increasingly complex chain. Or, there are marketing analytics software companies that need to unify data across a number of interaction points to create a single view of the customer. This view is then used for segmentation, personalization and automation of different actions to create more compelling customer experiences.

There's a big misperception in the space that a) real-time analytics is too expensive and b) real-time analytics is only accessible to large tech companies. That's just not true anymore. The cloud offerings, availability of real-time data and the changing resource economics are making this within reach of any digital disrupter.

JAXenter: How is Rockset built under the hood?

Venkat Venkataramani: The Converged Index, mentioned previously, is the key component in enabling real-time analytics. Rockset stores all its data in the search, column-based and row-based index structures that are part of the Converged Index, and so we have to ensure that the underlying storage can handle both reads and writes efficiently. To meet this requirement, Rockset uses RocksDB as its embedded storage engine, with some modifications for use in the cloud. RocksDB enables Rockset to handle high write rates, leverage SSDs for optimal price-performance and support updates to any field.

Another core part of Rockset's design is its use of a disaggregated architecture to maximize resource efficiency. We use an Aggregator-Leaf-Tailer (ALT) architecture, common at companies like Facebook and LinkedIn, where resources for ingest compute, query compute and storage can be scaled independently of each other based on the workload in the system. This allows Rockset users to exploit cloud efficiencies to the full.

SEE ALSO: Codespaces helps developers to focus on what matters most: building awesome things

JAXenter: Personally, what are some of your favorite open source tools that you can't do without?

Venkat Venkataramani: RocksDB! The team at Rockset built and open-sourced RocksDB at Facebook, a high performance embedded storage engine used by other modern data stores like CockroachDB, Kafka and Flink. RocksDB was a project at Facebook that abstracted access to local stable storage so that developers could focus their energies on building out other aspects of the system. RocksDB has been used at Facebook as the embedded storage for spam detection, graph search and message queuing. At Rockset, we've continued to contribute to the project as well as release RocksDB-Cloud to the community.

We are also fans of the dbt community, an open-source tool that lets data teams collaborate on transforming data in their database to ship higher quality data sets, faster. We share a similar outlook on the data space: we think data pipelines are challenging to build and maintain, respect SQL as the lingua franca of analytics and want to make it easy for data to be shared across an organization.

JAXenter: Can you share anything about Rockset's future? What's on the roadmap next, what features and/or improvements are being worked on?

Venkat Venkataramani: Rockset is on a mission to deliver fast and flexible real-time analytics, without the cost and complexity. Our product roadmap is geared towards enabling all digital disrupters to realize real-time analytics.

This requires taking steps to make real-time analytics more affordable and accessible than ever before. A first step towards affordability was the release of SQL-based rollups and transformations, which cut the cost of real-time analytics up to 100X for streaming data. As part of our expansion initiative, we're also expanding Rockset to users across the globe. Follow us as we continue to put real-time analytics within reach of all engineers.

Read the original:
"Rockset is on a mission to deliver fast and flexible real-time analytics" - JAXenter

Lessons Learned: Training and Deploying State of the Art Transformer Models at Digits – insideBIGDATA

In this blog post, we want to provide a peek behind the curtains on how we extract information with Natural Language Processing (NLP). You'll learn how to apply state-of-the-art Transformer models for this problem and how to go from an ML model idea to integration in the Digits app.

Our Plan

Information can be extracted from unstructured text through a process called Named Entity Recognition (NER). This NLP concept has been around for many years, and its goal is to classify tokens into predefined categories, such as dates, persons, locations, and entities.

For example, the transaction below could be transformed into the following structured format:
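As a purely hypothetical illustration of that structured format (the transaction string, entity categories and values below are invented for this example, not Digits' actual schema or the example from the original post), the mapping might look like this:

# Hypothetical example only -- entity categories and values are illustrative.
transaction = "SQ *COFFEE SHOP PORTLAND OR 09/08 CARD 1234"

extracted_entities = {
    "vendor": "Coffee Shop",
    "location": "Portland, OR",
    "date": "09/08",
    "card_suffix": "1234",
}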

We had seen outstanding results from NER implementations applied to other industries and we were eager to implement our own banking-related NER model. Rather than adopting a pre-trained NER model, we envisioned a model built with a minimal number of dependencies. That avenue would allow us to continuously update the model while remaining in control of all moving parts. With this in mind, we discarded available tools like the SpaCy NER implementation or HuggingFace models for NER. We ended up building our internal NER model based only on TensorFlow 2.x and the ecosystem library TensorFlow Text.

The Data

Every Machine Learning project starts with the data, and so did this one. We decided which relevant information we wanted to extract (e.g., location, website URLs, party names, etc.) and, in the absence of an existing public data set, we chose to annotate the data ourselves.

There are a number of commercial and open-source tools available for data annotation, including:

The optimal tool varies with each project, and is a question of cost, speed, and useful UI. For this project, our key driver for our tool selection was the quality of the UI and the speed of the sample processing, and we chose doccano.

At least one human reviewer then evaluated each selected transaction, and that person would mark the relevant sub-strings as shown above. The end-product of this processing step was a data set of annotated transactions together with the start- and end-character of each entity within the string.

Selecting an Architecture

While NER models can also be based on statistical methods, we built our NER model on an ML architecture called Transformers. This decision was based on two major factors:

The initial attention-based model architecture was the Bidirectional Encoder Representations from Transformers (BERT, for short), published in 2019. In the original paper by Google AI, the authors already highlighted potential applications to NER, which gave us confidence that our Transformer approach might work.

Furthermore, we had previously implemented various other deep-learning applications based on BERT architectures and we were able to reuse our existing shared libraries. This allowed us to develop a prototype in a short amount of time.

BERT models can be used as pre-trained models, which are initially trained on multi-lingual corpora on two general tasks: predicting masked tokens and predicting whether the next sentence has a connection to the previous one. Such general training creates a general language understanding within the model. The pre-trained models are provided by various companies, for example, by Google via TensorFlow Hub. The pre-trained model can then be fine-tuned during a task-specific training phase. This requires fewer computational resources than training a model from scratch.

The BERT architecture can compute up to 512 tokens simultaneously. BERT requires WordPiece tokenization, which splits words and sentences into frequent word chunks. The following example sentence would be tokenized as follows:

Digits builds a real-time engine

[b'dig', b'##its', b'builds', b'a', b'real', b'-', b'time', b'engine']
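A rough sketch of this tokenization step with TensorFlow Text (not Digits' internal code; the vocabulary file path is an assumption and must match the pre-trained encoder's WordPiece vocabulary):

import tensorflow as tf
import tensorflow_text as text  # pip install tensorflow-text

# Assumption: "vocab.txt" is a WordPiece vocabulary file (one token per line)
# matching the pre-trained BERT encoder; the path is illustrative only.
tokenizer = text.BertTokenizer("vocab.txt", token_out_type=tf.string, lower_case=True)

pieces = tokenizer.tokenize(["Digits builds a real-time engine"])
print(pieces.to_list())  # nested list of WordPiece tokens, e.g. [b'dig', b'##its', ...]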

There are a variety of pre-trained BERT models available online, but each has a different focus. Some models are language-specific (e.g., CamemBERT for French or Beto for Spanish), and other models have been reduced in their size through model distillation or pruning (e.g., ALBERT or DistilBERT).

Time to Prototype

Our prototype model was designed to classify the sequence of tokens which represent the transaction in question. We converted the annotated data into a sequence of labels that matched the number of tokens generated from the transactions for the training. Then, we trained the model to classify each token label:

In the figure above, you notice the O tokens. Such tokens represent irrelevant tokens, and we trained the classifier to detect those as well.
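A minimal sketch of such a token classifier in Keras, assuming a standard TensorFlow Hub BERT encoder (the hub URL, sequence length and label count are placeholders, not Digits' actual configuration):

import tensorflow as tf
import tensorflow_hub as hub

SEQ_LEN, NUM_LABELS = 128, 9  # placeholders; NUM_LABELS includes the "O" label

# Assumption: any standard TF Hub BERT encoder with this input/output signature works here.
encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4",
    trainable=True)

word_ids = tf.keras.Input(shape=(SEQ_LEN,), dtype=tf.int32, name="input_word_ids")
mask = tf.keras.Input(shape=(SEQ_LEN,), dtype=tf.int32, name="input_mask")
type_ids = tf.keras.Input(shape=(SEQ_LEN,), dtype=tf.int32, name="input_type_ids")

outputs = encoder(dict(input_word_ids=word_ids, input_mask=mask,
                       input_type_ids=type_ids))
# One label per token: project every position of the sequence output.
logits = tf.keras.layers.Dense(NUM_LABELS)(outputs["sequence_output"])

model = tf.keras.Model([word_ids, mask, type_ids], logits)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))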

The prototype model helped us demonstrate a business fit of the ML solution before engaging in the full model integration. At Digits, we develop our prototypes in GPU-backed Jupyter notebooks. Such a process helps us to iterate quickly. Then, once we confirm a business use-case for the model, we focus on the model integration and the automation of the model version updates via our MLOps pipelines.

Moving to Production

In general, we use TensorFlow Extended (TFX) to update our model versions. In this step, we convert the notebook code into TensorFlow Ops, and here we converted our prototype data preprocessing steps into TensorFlow Transform Ops. This extra step allows us to later train our model versions efficiently, avoid training-serving skew, and bake our internal business logic into our ML models. This last benefit helps us to reduce the dependencies between our ML models and our data pipeline or back-end integrations.
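A minimal, hypothetical preprocessing_fn in the TensorFlow Transform style; the feature names and transformations are illustrative, not the actual pipeline described here:

import tensorflow as tf
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """tf.Transform callback: every op defined here is replayed identically at serving time."""
    outputs = {}
    # Illustrative features only; a real pipeline would carry the tokenized transaction text.
    outputs["description_lower"] = tf.strings.lower(inputs["description"])
    outputs["amount_scaled"] = tft.scale_to_z_score(inputs["amount"])
    outputs["currency_id"] = tft.compute_and_apply_vocabulary(inputs["currency"])
    return outputs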

We are running our TFX pipelines on Google Cloud's Vertex AI Pipelines. This managed service frees us from maintaining a Kubernetes cluster for Kubeflow Pipelines (which we had done prior to using Vertex AI).

Our production models are stored in Google Cloud Storage buckets, and TFServing allows us to load model versions directly from cloud storage. Because of the dynamic loading of the model versions, we don't need to build custom containers for our model serving setup; we can use the pre-built images from the TensorFlow team.

Here is a minimal setup for Kubernetes deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow-serving-deployment
spec:
  template:
    spec:
      containers:
        - name: tensorflow-serving-container
          image: tensorflow/serving:2.5.1
          command:
            - /usr/local/bin/tensorflow_model_server
          args:
            - --port=8500
            - --model_config_file=/serving/models/config/models.conf
            - --file_system_poll_wait_seconds=120

Note the additional argument file_system_poll_wait_seconds in the list above. By default, TFServing will check the file system for new model versions every 2s. This can generate large Cloud Storage costs since every check triggers a list operation, and storage costs are billed based on the used network volume. For most applications, it is fine to reduce the file system check to every 2 minutes (set the value to 120 seconds) or disable it entirely (set the value to 0).

For maintainability, we keep all model-specific configurations in a specific ConfigMap. The generated file is then consumed by TFServing on boot-up.

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: ml-deployments
  name: -config
data:
  models.conf: |+
    model_config_list: {
      config: {
        name: ,
        base_path: gs:///,
        model_platform: tensorflow,
        model_version_policy: {
          specific: {
            versions: 1607628093,
            versions: 1610301633
          }
        }
        version_labels {
          key: canary,
          value: 1610301633
        }
        version_labels {
          key: release,
          value: 1607628093
        }
      }
    }

After the initial deployment, we started iterating to optimize the model architecture for high throughput and low latency results. This meant optimizing our deployment setup for BERT-like architectures and optimizing the trained BERT models. For example, we optimized the integration between our data processing Dataflow jobs and our ML deployments, and shared our approach in our recent talk at the Apache Beam Summit 2021.

Results

The deployed NER model allows us to extract a multitude of information from unstructured text and make it available through Digits Search.

Here are some examples of our NER model extractions:

The Final Product

At Digits, an ML model is never itself the final product. We strive to delight our customers with well-designed experiences that are tightly integrated with ML models, and only then do we witness the final product. Many additional factors come into play:

Latency vs. Accuracy

A more recent pre-trained model (e.g., BART or T5) could have provided higher model accuracy, but it would have also increased the model latency substantially. Since we are processing millions of transactions daily, it became clear that model latency is critical for us. Therefore, we spent a significant amount of time on the optimization of our trained models.

Design for false-positive scenarios

There will always be false positives, regardless of how stunning the model accuracy was pre-model deployment. Product design efforts that focus on communicating ML-predicted results to end-users are critical. At Digits, this is especially important because we cannot risk customers' confidence in how Digits is handling their financial data.

Automation of model deployments

The investment in our automated model deployment setup helped us provide model rollback support. All changes to deployed models are version controlled, and deployments are automatically executed from our CI/CD system. This provides a consistent and transparent deployment workflow for our engineering team.

Devise a versioning strategy for release and rollback

To assist smooth model rollout and a holistic quantitative analysis prior to rollout, we deploy two versions of the same ML model and use TFServing's version labels (e.g., release and pre-release tags) to differentiate between them. Additionally, we use an active version table that allows for version rollbacks, made as simple as updating a database record.
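As a sketch of how such labels can be addressed at inference time (the host, model name and request payload below are placeholders; the URL pattern follows TensorFlow Serving's documented REST API):

import requests

HOST = "http://localhost:8501"   # placeholder TFServing REST endpoint
MODEL = "ner_model"              # placeholder model name

def predict(label, instances):
    # /v1/models/<name>/labels/<label>:predict routes the request to the
    # model version currently pinned to that label (e.g. "release" or "canary").
    url = f"{HOST}/v1/models/{MODEL}/labels/{label}:predict"
    resp = requests.post(url, json={"instances": instances})
    resp.raise_for_status()
    return resp.json()["predictions"]

sample = [{"input_word_ids": [101, 2023, 2003, 102]}]  # placeholder token ids
release_preds = predict("release", sample)
canary_preds = predict("canary", sample)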

Assist customers, don't alienate them

Last but not least, the goal for our ML models should always be to assist our customers in their tasks instead of alienating them. That means our goal is not to replace humans or their functions, but to help our customers with cumbersome tasks. Instead of asking people to extract information manually from every transaction, we'll assist our customers by pre-filling extracted vendors, but they will always stay in control. If we make a mistake, Digits makes it easy to overwrite our suggestions. In fact, we will learn from our mistakes and update our ML models accordingly.

Further Reading

Check out these great resources for even more on NER and Transformer models:

About the Author

Hannes Hapke is a Machine Learning Engineer at Digits. As a Google Developer Expert, Hannes has co-authored two machine learning publications: NLP in Action by Manning Publications, and Building Machine Learning Pipelines by O'Reilly Media. At Digits, he focuses on ML engineering and applies his experience in NLP to advance the understanding of financial transactions.

Sign up for the free insideBIGDATA newsletter.

Join us on Twitter: @InsideBigData1 https://twitter.com/InsideBigData1

See original here:
Lessons Learned: Training and Deploying State of the Art Transformer Models at Digits - insideBIGDATA

WhatsApp is adding encrypted backups – The Verge

WhatsApp will let its more than 2 billion users fully encrypt the backups of their messages, the Facebook-owned app announced Friday.

The plan, which WhatsApp is detailing in a white paper before rolling out to users on iOS and Android in the coming weeks, is meant to secure the backups WhatsApp users already send to either Google Drive or Apple's iCloud, making them unreadable without an encryption key. WhatsApp users who opt into encrypted backups will be asked to save a 64-digit encryption key or create a password that is tied to the key.

"WhatsApp is the first global messaging service at this scale to offer end-to-end encrypted messaging and backups, and getting there was a really hard technical challenge that required an entirely new framework for key storage and cloud storage across operating systems," Facebook CEO Mark Zuckerberg said in a statement.

If someone creates a password tied to their account's encryption key, WhatsApp will store the associated key in a physical hardware security module, or HSM, that is maintained by Facebook and unlocked only when the correct password is entered in WhatsApp. An HSM acts like a safety deposit box for encrypting and decrypting digital keys.

Once unlocked with its associated password in WhatsApp, the HSM provides the encryption key that in turn decrypts the account's backup that is stored on either Apple's or Google's servers. A key stored in one of WhatsApp's HSM vaults will become permanently inaccessible if repeated password attempts are made. The hardware itself is located in data centers owned by Facebook around the world to protect from internet outages.

The system is designed to ensure that no one besides an account owner can gain access to a backup, the head of WhatsApp, Will Cathcart, told The Verge. He said the goal of letting people create simpler passwords is to make encrypted backups more accessible. WhatsApp will only know that a key exists in an HSM, not the key itself or the associated password to unlock it.

The move by WhatsApp comes as governments around the world, like India (WhatsApp's largest market), are threatening to break the way that encryption works. "We expect to get criticized by some for this," Cathcart said. "That's not new for us ... I believe strongly that governments should be pushing us to have more security and not do the opposite."

WhatsApp's announcement means the app is going a step further than Apple, which encrypts iMessages but still holds the keys to encrypted backups; that means Apple can assist with recovery, but also that it can be compelled to hand the keys over to law enforcement. Cathcart said WhatsApp has been working on making encrypted backups a reality for the past couple of years, and that while they are opt-in to start, he hopes, over time, to have this be the way it works for everyone.

Visit link:
WhatsApp is adding encrypted backups - The Verge

What Is Fully Homomorphic Encryption (FHE)? – CIO Insight

Company leaders are continually looking for ways to keep data safe without compromising its usability. Fully homomorphic encryption (FHE) could be a step in the right direction.

Fully homomorphic encryption allows analyzing and running processes on data without needing to decrypt it. For example, if someone wanted to process information in the cloud but did not trust the provider, FHE would allow sending the encrypted data for processing without providing a decryption key.

Read more: Creating a Cloud Strategy: Tips for Success

FHE is like other encryption methods that require using a public key to encrypt the data. Only the party with the correct private key can see the information in its unencrypted state. However, FHE uses an algebraic system that allows working with data without requiring decryption first. In many cases, information is represented as integers, while multiplication and addition replace the Boolean functions used in other kinds of encryption.
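To make the algebraic idea concrete, here is a toy Python demo of an additively homomorphic scheme (Paillier). It is a deliberately simplified, insecure sketch meant only to show that arithmetic on ciphertexts maps to arithmetic on the hidden plaintexts; it is not full FHE and not any vendor's implementation (it needs Python 3.9+ for math.lcm and the pow(x, -1, n) modular inverse):

import math
import random

# Toy Paillier setup -- tiny primes chosen for readability, far too small to be secure.
p, q = 1789, 2003
n = p * q
n_sq = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n_sq)), -1, n)   # modular inverse

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c):
    return (L(pow(c, lam, n_sq)) * mu) % n

c1, c2 = encrypt(20), encrypt(22)
c_sum = (c1 * c2) % n_sq                # multiplying ciphertexts adds the plaintexts
print(decrypt(c_sum))                   # 42, computed without ever decrypting the inputs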

Researchers first proposed the idea of homomorphic encryption in the 1970s, and it generated interest even then. However, it has taken substantial time to turn these concepts into feasible real-world applications.

A researcher showed it was plausible in a study published in 2009. However, working with even a tiny amount of data proved too time-intensive. Even now, FHE can require hundreds of times more computing power than an equivalent plaintext data operation.

Data is at a higher risk of becoming compromised when it's not encrypted. FHE keeps the information secure by removing the need to decrypt it before processing.

In one recent example, Google released an FHE-based tool that allows developers to work with encrypted data without revealing any personally identifiable information (PII). Google's blog post on the subject gave the example of FHE allowing medical researchers to examine the data of people with a particular condition without providing any personal details about them.

Encryption takes private information and makes it unreadable by unauthorized third parties. However, something that makes people particularly excited about FHE is that it eliminates the tradeoff between data privacy and usability, preserving both at a high level.

Read more: Data Collection Ethics: Bridging the Trust Gap

Many people familiar with FHE and its potential applications agree that it seems safer than other methods of data protection, which require decrypting data for processing. It could be particularly widely embraced in certain sectors. After all, cloud computing brings in $250 billion per year.

People are continually interested in how to keep their data safe when stored in the cloud. Some experts also believe FHE will emerge as a compelling option in tightly regulated industries because it could become a better safeguard against breaches.

"Past solutions to either completely anonymize data or restrict access through stringent data use agreements have limited the utility of abundant and valuable patient data," IBM notes on its site. "FHE in clinical research can improve the acceptance of data-sharing protocols, increase sample sizes, and accelerate learning from real-world data."

Fully homomorphic encryption could forever change how companies use data. That's crucial, especially considering how many businesses collect it in vast quantities at a time when many consumers feel increasingly concerned about keeping their details safe.

For example, FHE allows keeping information in an encrypted database to make it less vulnerable to hacking without restricting how owners can use it. That approach could limit an organizations risk of regulatory fines due to data breaches and hacks.

It also permits secure data monetization efforts by protecting customers information and allowing services to process peoples information without invading privacy. In such cases, individuals may be more forthcoming about sharing their information, knowing in advance that business representatives cannot see certain private aspects of it.

Using an FHE-based solution also enables sharing data with third-party collaborators in ways that reduce threats and help the company providing the information comply with respective regulations. Thus, this kind of encryption could support research efforts where people across multiple organizations need to work with sensitive content.

Read more: Data Analytics vs Data Science: Whats the Difference?

Fully homomorphic encryption is not widely available in commercial platforms yet. However, some companies offer products based on homomorphic encryption that could eventually work for the use cases discussed earlier.

For example, Intel has such a product that allows segmenting data into secure zones for processing. Similarly, Inpher offers a product with an FHE component. It primarily uses secure multiparty computation, but applies FHE to certain use cases.

Beyond those examples, IBM has a fully homomorphic encryption toolkit that it released for iOS in 2020. That progress primarily occurred after IBM's experts took it upon themselves to make FHE more commercially feasible, addressing the time and computing power that it previously took to use this type of encryption.

The company's representatives say FHE is now adequate for specific use cases and suggested the health care and finance industries as particularly well suited to it.

Since FHE is not widely available via commercial platforms yet, interested parties should not expect to start using it immediately. However, that could change as organizations become increasingly concerned about striking the right balance between data security and usability.

The ideal strategy for businesses to take now is to explore the options currently on the market. They can then determine if any of those options check the boxes for helping them explore fully homomorphic encryption, including what it might do in the future and what capabilities exist now.

Read next: AI vs Machine Learning: What Are Their Differences & Impacts?

Read this article:
What Is Fully Homomorphic Encryption (FHE)? - CIO Insight
