COMP9321 Data Services Engineering Term 1, 2022 Week
1: Introduction 2 Source: * https://www.infoq.com/articles/narayanan-soa-data-services Data Services • What? Data services are software services that encapsulate operations on key data entities relevant to the consumer • Why? Data nowadays is stored in multiple systems and require multiple interfaces or mechanisms to interact with them. There are varying channels (e.g., legacy systems, Online, third-party) and mechanisms (e.g., event driven, on demand, batch process) that need to be served as well adding additional challenges to data services. Without an abstraction layer for data consumers that insulate them from this complexity we will end up with a spaghetti of point to point integrations between data sources and data consumers 3 Let’s Go Deeper Data Recording... The Beginning 4 Source: Wikipedia.org 5 Not that Deep 6 Data-oriented Services ... 7 Information Systems/Applications Integration A set of services and solutions for bringing together disparate application and business processes as needed to meet the diverse information requirements of your customers, partners, suppliers and employees. Motivations: Streamlining business operations, globalisation, competition, mergers and acquisition, new business models, technology development, etc. – e.g., merger of two companies (data + processes) Problems: systems to be integrated are not homogeneous. • theyareindividuallydeveloped(ad-hoc)systemsovertime • someare“off-the-shelf”packages • different execution platforms, technologies and business rules Heterogeneity at different levels: language, platform, schema (data, process) • Dataintegration,Process/Systemsintegration 8 Picture from Mashups: Concepts, Models and Architectures Data Level Integration ... Data integration = combining data from different sources and providing users with a unified view over them Data Level Integration ... 9 Picture from View-based Integration, Yannis Katsis, CSE, UC San Diego Data Level Integration ... 10 Picture from Mashups: Concepts, Models and Architectures Data Level Integration ... 11 Picture from Mashups: Concepts, Models and Architectures System Level Integration ... In enterprise environments, pick any sizeable organisation. You will see many departments performing different functionality In silos, often supported by software systems 12 A Typical Purchase Order Process In reality: communication/coordination between the silos needed 13 An example of (real) Purchase Order Process 14 Going outside of your system boundary ... 15 The evolution of programming abstractions Services: “customer” and “service provider” Lines of code vs. Services - consider software building exercise as ‘building services’, ‘discovering services’ and ‘combining services’ WEB services Web = platform/language neutral 16 17 The evolution of programming abstractions In SOA, we talk about software as a service ... That is, SOA is about building software systems composed of a collection of (software) services A software service: • A software asset that is deployed at an endpoint and is continuously maintained by a provider for user by one or multiple clients • Services have explicit contracts that establish their purpose and how they should be used • Software services are (supposed to be) reusable (“compose-able”) ... – like lego blocks – “my” (the developer) service could be used in scenarios that I never anticipated Simplified view of services (or API ?!) - services. If you can define the services, you can begin to link the services to realize more complicated 'services' Service orientation - a way of integrating your applications as a set of linked 18 19 So again...Why Data Services? 20 Sexy Job “I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s?” “The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it’s going to be a at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data.” Hal Varian, Google’s Chief Economist Data is Popular 21 22 Data is Popular? 23 Data is Popular 24 Data is valuable Economist.com 25 Data is Massive “There are 2.5 quintillion bytes (EB) of data created each day at our current pace, but that pace is only accelerating with the growth of the Internet of Things (IoT).” 1 » 2.7 Zetabytes of data exist in the digital universe today.2 » Facebook stores, accesses, and analyzes 30+ Petabytes of user generated data.3 » Akamai analyzes 75 million events per day to better target advertisements.3 » Walmart handles more than 1 million customer transactions every hour, which is imported into databases estimated to contain more than 2.5 petabytes of data.4 » In 2008, Google was processing 20,000 terabytes of data (20 petabytes) a day.5 1. Forbes, How Much Data Do We Create Every Day? The Mind-Blowing Stats Everyone Should Read 2. Wikibon, The Rapid Growth in Unstructured Data 3. Wikibon, Taming Big Data 4. SAS, Big Data Meets Big Data Analytics 5. TechCrunch, Google Processing 20,000 Terabytes A Day, And Growing Data Every Minute 26 https://dailyinfographic.com/how-much-data-is-generated-every-minute Where does data come from? 27 28 Where does data come from? 29 The connected world 30 Even more data... 31 https://www.statista.com/statistics/871513/worldwide-data-created/ 32 What can we do with the data? Life Sciences Clinical research is a slow and expensive process, with trials failing for a variety of reasons. Advanced analytics, artificial intelligence (AI) and the Internet of Medical Things (IoMT) unlocks the potential of improving speed and efficiency at every stage of clinical research by delivering more intelligent, automated solutions. Banking Financial institutions gather and access analytical insight from large volumes of unstructured data in order to make sound financial decisions. Big data analytics allows them to access the information they need when they need it, by eliminating overlapping, redundant tools and systems. Source: https://www.sas.com/en_au/insights/analytics/big-data-analytics.html 33 What can we do with the data? Manufacturing For manufacturers, solving problems is nothing new. They wrestle with difficult problems on a daily basis - from complex supply chains, to motion applications, to labor constraints and equipment breakdowns. That's why big data analytics is essential in the manufacturing industry, as it has allowed competitive organizations to discover new cost saving opportunities and revenue opportunities. Health Care Big data is a given in the health care industry. Patient records, health plans, insurance information and other types of information can be difficult to manage – but are full of key insights once analytics are applied. That’s why big data analytics technology is so important to heath care. By analyzing large amounts of information – both structured and unstructured – quickly, health care providers can provide lifesaving diagnoses or treatment options almost immediately. Source: https://www.sas.com/en_au/insights/analytics/big-data-analytics.html 34 What can we do with the data? Government Retail Customer service has evolved in the past several years, as savvier shoppers expect retailers to understand exactly what they need, when they need it. Big data analytics technology helps retailers meet those demands. Armed with endless amounts of data from customer loyalty programs, buying habits and other sources, retailers not only have an in-depth understanding of their customers, they can also predict trends, recommend new products – and boost profitability. Certain government agencies face a big challenge: tighten the budget without compromising quality or productivity. This is particularly troublesome with law enforcement agencies, which are struggling to keep crime rates down with relatively scarce resources. And that’s why many agencies use big data analytics; the technology streamlines operations while giving the agency a more holistic view of criminal activity. Source: https://www.sas.com/en_au/insights/analytics/big-data-analytics.html Also Spam/False Information Detection Credit card fraud detection Recommendation systems Human activity recognition/prediction Machine translation Face/Scene recognition Image caption Self-driving cars 35 36 Unraveling Power of Deeply Connected World • Produce a treasure trove of big data » datathatcanhelpcitiespredictaccidentsandcrimes • Give doctors real-time insight into information from pacemakers or biochips » enable optimized productivity across industries through predictive maintenance on equipment and machinery • Create true smart homes with connected appliances • Provide critical communication between self-driving cars •... 37 Looks promising...Yet how? What do you do with all this data? » Too much data to search through it manually or processing in traditional ways... But there is valuable information in the data: » How can we use it for fun, profit, and/or the greater good? Boosting in computing power helps. » Machine learning is key tool we use to make sense of very large datasets. So What is Next? • In order to build a data service you need to know how to work with data • Accessing the data from multiple sources • Exploring the Data • Cleansing the data (e.g., removing corrupted or useless data) • Manipulating the data (e.g., merging, transformation, normalization) • Presenting the data (visualization) 38 39 Useful Reading • View-based Data Integration, Yannis Katsis (http://db.ucsd.edu/wp- content/uploads/pdfs/355.pdf) • Mashups: Concepts, Models and Architectures, Daniel, Florian, Matera, Maristella (https://link.springer.com/book/10.1007%2F978-3-642-55049-2) 40 Q&A